Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In finite model theory and in descriptive complexity theory the Ehrenfeucht-Fraïssé method for first-order logic FO is mainly used to obtain inexpressibility results and hierarchy results. While Fraïssé [9] introduced this method in more algebraic terms, Ehrenfeucht [6] phrased it in an appealing game-theoretic form. Concerning generalizations, games were developed for further logics, mainly for logics relevant in descriptive complexity theory such as least fixed-point logic LFP, (monadic) existential second-order logic (monadic) \(\Sigma ^1_1\), and finite variable logics.

An inexpressibility result for a logic L shows that a given property is not definable (or expressible) in L. A hierarchy result states that a certain increasing sequence \(\text {H}_1\subseteq \text {H}_2\subseteq \ldots \) of classes \(\text {H}_m\) of sentences of a given logic is strict; that is, that for every \(m\in \mathbb N\) there is a property of finite structures expressible by some sentence of \(\text {H}_{m+1}\) but by no sentence of \(\text {H}_{m}\). Often, to obtain such an inexpressibility result, Ehrenfeucht-Fraïssé games have been used. The finite variable hierarchy \((\text {FO}^m)_{m\in \mathbb N}\) is an example of a strict hierarchy. Here \(\text {FO}^m\) consists of those FO-formulas which contain at most m variables.

Suppose we want to show, using the Ehrenfeucht-Fraïssé method, that for (finite) ordered graphs “eveness” of the cardinality of the vertex set is not expressible in FO, or equivalently, that for every \(m\in \mathbb N\) “eveness” is not expressible by an \(\text {FO}_m\)-sentence. Here \(\text {FO}_m\) denotes the set of sentences of first-order logic of quantifier rank at most m. One chooses ordered graphs \({G}_m\) and \({H}_m\) that are paths of length \(2^m+1\) and \(2^m\), respectively, and shows that \({G}_m\equiv _{\text {FO}_m} {H}_m\), that is, that \({G}_m\) and \({H}_m\) satisfy the same sentences of \(\text {FO}_m\). The latter property is shown by playing, more precisely, by analyzing the Ehrenfeucht-Fraïssé game (for first-order logic) with board \(({G}_m, {H}_m)\). It is not hard to show that the size of the board \(({G}_m, {H}_m)\) must be exponential in m.

Let us mention some further results obtained by the Ehrenfeucht-Fraïssé method (or by a probabilistic generalization of it):

  • Reachability in directed graphs is not expressible in monadic \(\Sigma ^1_1\) [1].

  • For ordered graphs connectivity is not expressible in monadic \(\Sigma ^1_1\) [20].

  • The finite variable hierarchy for FO on ordered structures is strict [12, 18].

  • The arity hierarchy is strict for LFP [10].

  • For every \(k\in \mathbb N\) the hierarchy whose mth member consists of formulas with at most m nested k-ary fixed-point operators is strict for LFP [15].

We know (see Theorem 1) that \(\text {P}\ne \text {NP}\) if and only if for every m there are a 3-colorable ordered graph \({G}_m\) and an ordered graph \({H}_m\), which is not 3-colorable, such that \({G}_m\) and \({H}_m\) are indistinguishable by sentences of LFP of “quantifier rank” or length at most m; this last property, denoted by \({G}_m\equiv _{{LFP }_m}{H}_m\), would be shown by the Ehrenfeucht-Fraïssé game for LFP. Let us call such a sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) a \((\textsc {3-Col}, \text {LFP})\)-sequence. Furthermore, \(\text {NP}\ne \text {co-NP}\) if and only if there is a \((\textsc {3-Col},\Sigma ^1_1)\)-sequence, where a \((\textsc {3-Col}, \Sigma ^1_1)\)-sequence is defined in a similar way. In [8], the authors remark:

It is known that \(\Sigma ^1_1\ne \Pi ^1_1\) if and only if such a separation can be proven via second-order Ehrenfeucht-Fraïssé games. Unfortunately, “playing” second-order Ehrenfeucht-Fraïssé games is very difficult, and the above promise is still largely unfulfilled; for example, the equivalence between the \(\text {NP}= \text {co-NP}\) question and the \(\Sigma ^1_1= \Pi ^1_1\) question has not so far led to any progress on either of these questions.

And Kolaitis remarks in [7, page 56]:

Although ...Ehrenfeucht-Fraïssé games yield a sound and complete method for studying ESO-definability [that is, \(\Sigma ^1_1\) -definability] (and thus potentially leading to the separation of NP and co-NP), so far this approach has had rather limited success. The reason is that formidable combinatorial difficulties arise in implementing this method ...when dealing with ESO-formulas in which at least one of the existentially quantified second-order variables has an arity bigger than 1.

Definitely the authors are right with their observation that “playing” second-order Ehrenfeucht-Fraïssé games is very difficult. However, in order to derive the last two hierarchy results mentioned above, the corresponding authors successfully apply games for logics containing nonmonadic second-order quantifiers.

In the example of “eveness” we already observed that the size of a board \(({G}_m, {H}_m)\) of ordered graphs has to be exponential in m. On the other hand, analyzing most of the successful applications of the Ehrenfeucht-Fraïssé method obtained so far, we realized that the boards \(({G}_m, {H}_m)_{m\in \mathbb N}\) could be constructed in polynomial output time, that is, in time \((|V({G}_m)|+ |V({H}_m)|)^{O(1)}\). However, by a simple and standard diagonal argument we show:

  1. (A)

    No \((\textsc {3-Col}, \text {LFP})\) -sequence can be generated in polynomial output time.

Even more, to the best of our knowledge, it is open whether we can get such a sequence of boards by an algorithm more efficient than brute force.

Mostly in successful applications of the Ehrenfeucht-Fraïssé method the main task consisted in constructing boards such that one can find an argument showing, via Ehrenfeucht-Fraïssé games for the given logic, that the corresponding structures are indistinguishable to a certain extent. As mentioned, for a proof of \(\text {P}\ne \text {NP}\) via the Ehrenfeucht-Fraïssé method, already the presumably easier step of merely constructing the sequence of boards (and forgetting about the concrete verification of their indistinguishability) is hard. This makes our “negative” result even stronger with respect to the existence of positive applications of the Ehrenfeucht-Fraïssé method for sufficiently rich logics. It is an interesting challenge, though: how can we use the Ehrenfeucht-Fraïssé method to prove \(\text {P}\ne \text {NP}\) if we must necessarily work with non-constructive boards?

What happens if we allow probabilistic algorithmsFootnote 1 to yield the boards for the Ehrenfeucht-Fraïssé method? Such random constructions have been used for two of the applications mentioned above, namely to show that reachability in directed graphs is not definable in monadic second-order logic and in the proof of Rossman [18] that the finite variable hierarchy for first-order logic on ordered graphs is strict. It turns out that in order to derive a probabilistic generalization of (A) of the type “No \((\textsc {3-Col}, \text {LFP})\)-sequence can be generated by a probabilistic algorithm in polynomial output time” we need a further assumption,Footnote 2 namely that there is a verifier, that is, an algorithm that in a reasonable time verifies that with high probability the board \(({G}_m, {H}_m)\) satisfies

$$\begin{aligned} {G}_m\in \textsc {3-Col}, {H}_m\notin \textsc {3-Col}, \text { and } {G}_m\equiv _{{LFP }_m}{H}_m. \end{aligned}$$

So we get:

  1. (B)

    Assume that there is a pseudorandom generator. No \((\textsc {3-Col},\text {LFP})\) -sequence having a verifier can be generated by a probabilistic algorithm in polynomial output time.

Is the assumption of the existence of a verifier necessary? The question is related to the planted clique conjecture. This conjecture claims that there is no polynomial time algorithm that detects a clique of size \(4\cdot \text {log}\;n\), which has been planted uniformly at random in a random graph with n vertices and edge probability 1 / 2. In this article we introduce a stronger conjecture, a logic version LPCC of the planted clique conjecture. It is not hard to show:

  1. (C)

    If LPCC holds, then a \((\textsc {3-Col},\text {LFP})\) -sequence can be generated by a probabilistic algorithm in polynomial output time.

As already the planted clique conjecture implies \(\text {P}\ne \text {NP}\), so does LPCC. Can we refute LPCC? We show that this is the case for some strengthening of LPCC.

The content of the different sections is the following. After fixing some notation (in Sect. 2), we recall the Ehrenfeucht-Fraïssé method in Sect. 3. In Sect. 4, first we study the minimum size of the board \(({G}_m,{H}_m)\) of a \((\textsc {3-Col},\text {LFP})\)-sequence and then we prove statement (A). Section 5 is devoted to a proof of the probabilistic generalization of this result, stated as (B) above. In Sect. 6 we introduce the logic version LPCC of the planted clique conjecture and derive statement (C) in Sect. 7. In Sect. 8 we show that some strengthened versions of LPCC are refutable. Finally, in the last section we mention extensions of our results and some further results related to the topic of this article. Moreover, we state some conjectures and open questions.

2 Preliminaries

For a natural number n we set \([n]:= \{1, \ldots , n\}\). For a graph G we denote by V(G) and E(G) its vertex set and its edge set, respectively. We speak of an ordered graph G if G comes with an ordering of its vertex set. As already mentioned, in this article graph always means finite graph. A problem (or, property) Q of ordered graphs is a class of ordered graphs closed under isomorphism.

We assume familiarity with basic notions of first-order logic FO and of least fixed-point logic LFP. Concerning LFP, till Sect. 8 essentially we only need the Immerman-Vardi Theorem, which we recall in the next section.

Let L be a logic. A property Q of ordered graphs is definable in L (or, expressible in L) if there is a sentence of L such that Q is its class of models.

3 The Ehrenfeucht-Fraïssé-method

Let us denote by \(\text {FO}_m\) the set of sentences of first-order logic of quantifier rank (\(=\) maximum number of nested quantifiers) at most m and by \(\text {LFP}_m\) the set of LFP-sentences \(\varphi \) of length \(|\varphi |\le m\). Here \(|\varphi |\) denotes the number of symbols in \(\varphi \) (that is, the number of connectives, quantifiers, LFP-operators, variables, ...; however, two occurrences, say, of the same variable in \(\varphi \) count as two symbols).

Let L be one of the logics FO or LFP and denote by \(L_m\) the corresponding set \(\text {FO}_m\) or \(\text {LFP}_m\). The Ehrenfeucht-Fraïssé method relies on the following result.

Theorem 1

For \(L\in \{{FO, LFP }\}\) and a problem Q of ordered graphs the following are equivalent:

  1. (i)

    For all \(m\in \mathbb N\) there are ordered graphs \({G}_m\) and \({H}_m\) with

    $$\begin{aligned} {G}_m\in Q, \quad {H}_m\notin Q, \qquad \text {and}\qquad {G}_m\equiv _{L_m}{H}_m. \end{aligned}$$
    (1)
  2. (ii)

    Q is not definable in L.

So, in order to show that the problem Q is not definable in the logic \(L\in \{\text {FO}, \text {LFP}\}\), it suffices to exhibit a \((Q,L)\)-sequence in the sense of the following definition.

Definition 2

Assume \(L\in \{\text {FO}, \text {LFP}\}\) and let Q be a problem of ordered graphs. A sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) of ordered graphs is a \((Q,L)\) -sequence if for al \(m\in \mathbb N\)

$${G}_m\in Q,\ \ {H}_m\notin Q, \text { and } {G}_m\equiv _{L_m} {H}_m.$$

In many concrete applications of Theorem 1, Ehrenfeucht-Fraïssé-games are applied to show that \({G}_m\equiv _{L_m}{H}_m\). We recall the Ehrenfeucht-Fraïssé-game for FO (see [4, 10, 15] for the Ehrenfeucht-Fraïssé-game for LFP and other extensions of FO by fixed-point operators). Let G and H be ordered graphs and \(m\in \mathbb N\). The Ehrenfeucht-Fraïssé-game \(\text {G}_m(G, H)\) (with boards G and H) is played by two players called Spoiler and Duplicator. The game consists of a sequence of m rounds. In round i of the game, first Spoiler picks a graph (either G or H) and a vertex of his choice in that graph. Duplicator then replies by picking a vertex of his choice in the other graph. Thus, after m rounds, vertices \(u_1, \ldots , u_m\) in V(G) and \(v_1,\ldots , v_m\) in V(H) have been selected, \(u_i\) and \(v_i\) being the vertices chosen in round i. Duplicator wins if the induced ordered subgraphs \(G[\{u_1,\ldots , u_m\}]\) and \(H[\{v_1,\ldots , v_m\}]\) (induced by G on \(\{u_1,\ldots , u_m\}\) and by H on \(\{v_1, \ldots , v_m \}\), respectively) are isomorphic via the mapping \(f(u_i):= v_i\) for \(i\in [m]\). It should be clear what it means that Duplicator has a winning strategy for the game \(\text {G}_m(G, H)\).

Theorem 3

(Ehrenfeucht-Fraïssé-Theorem). Let G and H be ordered graphs and \(m\in \mathbb N\). Then Duplicator has a winning strategy for the game \(\text {G}_m(G, H)\) if and only if \(G\equiv _{\text {FO}_m} H\).

The following simple application of the Ehrenfeucht-Fraïssé-game shows that the class Even of ordered graphs with vertex set of even cardinality is not definable in FO: For \(m\in \mathbb N\) let the ordered graphs \({G}_m\) and \({H}_m\) be paths of length \(2^m+1\) and \(2^m\), respectively. Then Duplicator has a winning strategy for the game \(\text {G}_m({G}_m, {H}_m)\). In fact, in the ith round he picks his vertex, \(u_i\) or \(v_i\), such that for all \(j\in [i-1]\),

$$\begin{aligned} d^{{G}_m}(u_i,u_j)=d^{{H}_m}(v_i,v_j)\ \ \ \text {or } \big (d^{{G}_m}(u_i,u_j)>2^{m-i}\text { and }d^{{H}_m}(v_i,v_j)>2^{m-i}\big ). \end{aligned}$$

Here \(d^{G}(u,u')\) denotes the distance of the vertices u and \(u'\) in the graph G. Thus, \({G}_m\equiv _{\text {FO}_m} {H}_m\) and hence, \(({G}_m, {H}_m)_{m\in \mathbb N}\) is an \((\textsc {Even}, \text {FO})\)-sequence.

The graphs \({G}_m\) and \({H}_m\) just constructed have size exponential in m. We can’t do it better: the sizes of the graphs of every \((Q,\text {FO})\)-sequence for any problem Q of ordered graphs must be exponential in m. This follows from the following result, which can easily been derived.

Proposition 4

Let \(m\in \mathbb N\). If G and H are nonisomorphic ordered graphs, then

$$\begin{aligned} \text {G } {\equiv _{{FO }_{m+3}} H\,\,implies\,\,|V(G)|, |V(H)|> 2^m.} \end{aligned}$$

4 A Logical Reformulation of \(\text {P}\ne \text {NP}\)

Immerman and Vardi have proven that least fixed-point logic LFP captures the complexity class \(\text {P}\) in the following sense.

Theorem 5

(Immerman-Vardi Theorem). A problem of ordered graphs is decidable in polynomial time if and only if it can be defined in least fixed-point logic LFP.

As the problem 3-Col, the 3-colorability problem of ordered graphs, is NP-complete, we get:

Corollary 6

\({P }\ne {NP }\) if and only if 3-Col is not definable in LFP.

We defined \(\varphi \) an \(\text {LFP}_m\)-sentence with \(|\varphi |\le m\). The previous corollary together with Theorem 1 yield:

Corollary 7

\({P }\ne {NP }\) if and only if there is a \((\textsc {3-Col}, {LFP })\)-sequence, that is, a sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) of ordered graphs such that for all m,

$$\begin{aligned} {G}_m\in \textsc {3-Col}, \ \ {H}_m\notin \textsc {3-Col}, \ \ \text { and } \ \ {G}_m\equiv _{{LFP }_m}{H}_m. \end{aligned}$$

Assume \(\text {P}\ne \text {NP}\). What can we say about the minimum size of the graphs of a \((\textsc {3-Col},\text {LFP})\)-sequence and what about the running time of an algorithm generating a \((\textsc {3-Col},\text {LFP})\)-sequence? We set

$$\begin{aligned} \textsc {size}(\textsc {3-Col})(m):= \text {min}\big \{\text {max}\{|V(G)|,&|V(H)|\} \;\big |\;\ \!\text {{ G} and { H} are ordered graphs with}\\&\textit{G }\in \textsc {3-Col}, H\notin \textsc {3-Col}, \text { and } G \equiv _{{LFP }_m}H \big \}. \end{aligned}$$

Recall that a problem Q has circuit size c, where \(c:\mathbb N\rightarrow \mathbb N\), if for \(n\in \mathbb N\), c(n) is the least \(d\in \mathbb N\) such there exists a (Boolean) circuit C with n input variables of size \(\le d\) such that for every x with \(|x|= n\),

$$\begin{aligned} x\in Q\quad \iff \quad C(x)=1 \ \text {(i.e., } C \text { accepts } x) . \end{aligned}$$

In [5] we derived the following lower and upper bound for \(\textsc {size}(\textsc {3-Col})(m)\).

Proposition 8

Assume \({P }\ne {NP }\). Then:

  1. (a)

    There is an \(\varepsilon >0\) such that for all \(m\in \mathbb N\) we have \(2^{\varepsilon \cdot m}\le \textsc {size}(\textsc {3-Col})(m). \)

  2. (b)

    If the circuit size of 3-Col is not in \(2^{o(n)}\), then for all \(\varepsilon >0\) and infinitely many m,

    $$\begin{aligned} \textsc {size}(\textsc {3-Col})(m)\le 2^{(1+\varepsilon )\cdot m\cdot \text {log}\;m}. \end{aligned}$$

Definition 9

An algorithm \(\mathbb A\) generates the sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) if \(\mathbb A\) on input \(m\in \mathbb N\) outputs \(({G}_m, {H}_m)\).

By systematically testing, for \(\ell = 1, 2, \ldots \), all graphs G and H with vertex sets of cardinality \(\le \ell \) whether they satisfy

$$\begin{aligned} G\in \textsc {3-Col},\ \ H\notin \textsc {3-Col},\ \ \text {and} \ \ G\equiv _{{LFP }_m}H, \end{aligned}$$

we obtain from the previous result an upper bound for the time needed to get the graphs of a \((\textsc {3-Col},\text {LFP})\)-sequence, even of a sequence with boards of minimum size:

Proposition 10

([5]). If \({P }\ne {NP }\), then there is an algorithm that generates a \((\textsc {3-Col}, {LFP })\)-sequence in time \(2^{O(\textsc {size}(\textsc {3-Col})(m)^2)}\). The sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) generated by the algorithm satisfies \(\textsc {size}(\textsc {3-Col})(m)= {max }\big \{|V({G}_m)|, |V({H}_m)|\big \}\).

By Proposition 4, the boards of all \((Q,\text {FO})\)-sequences for any problem Q of ordered graphs must have size exponential in m. However we could construct the graphs \({G}_m\) and \({H}_m\) of an \((\textsc {Even}, \text {FO})\)-sequence in polynomial output time, that is, in time \((|V({G}_m)|+ |V({H}_m)|)^{O(1)}\). In fact, we realized that in most successful applications of the Ehrenfeucht-Fraïssé method showing that a property is not definable in a given logic, the boards for the corresponding game can be constructed in polynomial output time. So we ask, is it possible to construct a \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time? By a standard diagonalization argument we show that this is not possible:

Theorem 11

No \((\textsc {3-Col}, {LFP })\)-sequence can be constructed in polynomial output time.

Proof

We sketch the main steps of a proof (for more details see [5]). Assume for a contradiction that the algorithm \(\mathbb A\) generates a \((\textsc {3-Col},\text {LFP})\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) in polynomial output time. By passing to a suitable subsequence (cf. the proof of Lemma,16), we can assume that \(({G}_m, {H}_m)_{m\in \mathbb N}\) is monotone, that is, that it satisfies

$$\begin{aligned} \text {max}\big \{|V({G}_m)|,|V({H}_m)|\big \} < \text {min}\big \{|V(G_{m+1})|, |V(H_{m+1})|\big \}. \end{aligned}$$

Furthermore, we can assume (again by passing to a suitable subsequence) that \(|V({G}_m)|\ge |V({H}_m)|\) for all \(m\in \mathbb N\) or that \(|V({G}_m)|\le |V({H}_m)|\) for all \(m\in \mathbb N\). Then we can transform \(\mathbb A\) into an algorithm \(\mathbb B\) running in polynomial time such that for all \(m\in \mathbb N\),

$$\begin{aligned} \mathbb B\,\,\text {accepts}\,\,{G}_m\quad and\quad \mathbb B \text { rejects } {H}_m. \end{aligned}$$

By the Immerman-Vardi Theorem there is an LFP-sentence \(\varphi _{\mathbb B}\), say \(\varphi _{\mathbb B}\in \text {LFP}_{m_0}\), such that for all ordered graphs G,

$$\begin{aligned} G\,\models \,\varphi _{\mathbb B}\iff \mathbb B \text { accepts } G. \end{aligned}$$

In particular, for all \(m\in \mathbb N\),

$$\begin{aligned} {G}_m\,\models \, \varphi _{\mathbb B} \quad \text { and }\quad {H}_m\not \,\models \,\varphi _{\mathbb B}. \end{aligned}$$

For \(m\ge m_0\), this equivalence contradicts \({G}_m\equiv _{{LFP }_m}{H}_m\).   \(\Box \)

The same proof works for every property Q of ordered graphs (instead of \(\textsc {3-Col}\)), even more: By definition, an \(\text {LFP}\) -sequence is a sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) of ordered graphs \({G}_m\) and \({H}_m\) with

$$\begin{aligned} {{G}_m\not \cong {H}_m\ \ ({G}_m\text { and } {H}_m\text { are not isomorphic)} \text { and } {G}_m\equiv _{{LFP }_m}{H}_m.} \end{aligned}$$

Clearly every \((Q,\text {LFP})\)-sequence for any property Q of ordered graphs is an \(\text {LFP}\)-sequence. We state the following result, which can be derived similarly to Theorem 11.

Theorem 12

([5]). No \({LFP }\)-sequence can be generated in polynomial output time.

We should mention that also for first-order logic there are problems Q such that no \((Q,\text {FO})\)-sequence can be generated in polynomial output time:

Example 13

Let \(B\subseteq \{0,1 \}^*\) be a P-bi-immune set; that is, neither B nor \(\{0,1 \}^*\setminus B\) contains an infinite subset decidable in polynomial time. For \(x\in B\), \(x=x_1\ldots x_s\) with \(x_i\in \{0,1\}\), let G(x) be the ordered graph with vertex set \([s+1]\), with the natural ordering on \([s+1]\), and with edge set \(\{\{i,i+1 \}\mid i\in [s]\text { and }x_i=1\}\). Let Q(B) be the smallest class of ordered graphs containing all G(x) with \(x\in B\) and closed under isomorphism. No \((Q(B),\text {FO})\)-sequence can be generated in polynomial output time. For a contradiction assume that \(({G}_m, {H}_m)_{m\in \mathbb N}\) is a \((Q(B),\text {FO})\)-sequence generated in polynomial output time. As above we can assume that the sequence is monotone and that \(|V({G}_m)|\ge |V({H}_m)|\) for all \(m\in \mathbb N\) or that \(|V({G}_m)|\le |V({H}_m)|\) for all \(m\in \mathbb N\). In the first case, B contains an infinite subset in P and in the second case \(\{0,1 \}^*\setminus B\).

5 On Random (\(\textsc {3-Col}\),\(\text {LFP}\))-Sequences

We have seen that we cannot construct a \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time. What happens if we consider random sequences? There are successful applications of the Ehrenfeucht-Fraïssé-method where the graphs of the corresponding sequences are constructed randomly. For example, in this way it has been shown that reachability in directed graphs is not definable in monadic second-order logic (see [1]) and that the finite variable hierarchy for first-order logic on ordered graphs is strict (see [18]).

We aim at a result showing limitations of the probabilistic Ehrenfeucht-Fraïssé-method similar to Theorem 11. For this purpose we have to take into consideration a further property of such sequences \(({G}_m, {H}_m)_{m\in \mathbb N}\) satisfied in most successful applications of the Ehrenfeucht-Fraïssé-method obtained so far. For \((\textsc {3-Col},\text {LFP})\)-sequences \(({G}_m, {H}_m)_{m\in \mathbb N}\) this property ensures that we can verify that \({G}_m\in \textsc {3-Col}\), \({H}_m\notin \textsc {3-Col}\), and that \({G}_m\equiv _{{LFP }_m}{H}_m\) in a reasonable time. Condition (r2) of the following definition of random \((\textsc {3-Col},\text {LFP})\)-sequence contains the precise formulation.

Definition 14

A probabilistic algorithm \(\mathbb P\) generates a random \((\textsc {3-Col},\text {LFP})\) -sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) if (r1) and (r2) are satisfied.

  • (r1) For every \(m\in \mathbb N\) the algorithm \(\mathbb P\), on input m, first deterministically computes the vertex sets \(V({G}_m)\) and \(V({H}_m)\), and then it constructs the ordered graphs \({G}_m\) and \({H}_m\) probabilistically.

  • (r2) There is an algorithm \(\mathbb V\), the verifier, such that (a)–(c) hold.

    1. (a)

      For all ordered graphs G and H and all \(m\in \mathbb N\),

      $$\text {if } \mathbb V \text { accepts } (G, H, m), \text { then } G\equiv _{{LFP }_m}H, \ \ G\in \textsc {3-Col}, \text { and } H\notin \textsc {3-Col}.$$
    2. (b)

      For sufficiently large \(m\in \mathbb N\) and all \(m'\ge m\),

      $$\Pr \big [\mathbb V\,\mathrm{accepts}\,(G_{m'}, H_{m'}, m)\big ] \ge \frac{1}{\big (|V(G_{m'} )|+ |V(H_{m'})|\big )^{O(1)}}.$$
    3. (c)

      The running time of \(\mathbb V\) on input (GHm) is bounded by \(f(m)\cdot (|V(G)|+|V(H)|)^{O(1)}\) for some computable function \(f:\mathbb N\rightarrow \mathbb N\).

In this section we show:

Theorem 15

Assume that there is a \(2^{\left\lceil \ell /c\right\rceil }\)-pseudorandom generatorFootnote 3 for some natural number \(c \ge 1\). Then there is no probabilistic algorithm that generates a random \((\textsc {3-Col},{LFP })\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) in polynomial output time.

The following lemmas will finally yield a proof of Theorem 15 along the following lines: For a contradiction we assume that there exists a probabilistic algorithm \(\mathbb P\) generating a random \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time. Essentially we use the pseudorandom generator to derandomize the algorithm \(\mathbb P\). In this way we obtain a deterministic algorithm which generates a \((\textsc {3-Col}, \text {LFP})\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) in polynomial output time. This contradicts Theorem 11.

As in the deterministic case we say that a probabilistic algorithm \(\mathbb P\) generates a random monotone \((\textsc {3-Col}, \text {LFP})\)-sequence if it generates a random \((\textsc {3-Col},\text {LFP})\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\), which in addition to (r1) and (r2) satisfies (r3), where

  • (r3) for all \(m\in \mathbb N\), \(\text {max}\{|V({G}_m)|, |V({H}_m)|\}< \text {min}\{|V(G_{m+1})|, |V(H_{m+1})|\}\).

If furthermore (r4) and (r5) hold, where

  • (r4) \(\left\lceil \text {log}\;(|V({G}_m)|+ |V({H}_m)|)\right\rceil < \left\lceil \text {log}\;(|V(G_{m+1})|+ |V(H_{m+1})|)\right\rceil \)

  • (r5) \(f(m)\le \text {max}\{|V({G}_m)|, |V({H}_m)|\}\) (where f is the computable function of (r2)(c) used to bound the running time of the verifier \(\mathbb V\)),

then we speak of a strongly monotone \((\textsc {3-Col},\text {LFP})\) -sequence.

For our proof of Theorem 15 we need to show that we can restrict ourselves to strongly monotone \((\textsc {3-Col},\text {LFP})\)-sequences.

Lemma 16

If there is a probabilistic algorithm generating a random \((\textsc {3-Col},{LFP })\)-sequence in polynomial output time, then there is a probabilistic algorithm that generates a strongly monotone random \((\textsc {3-Col},{LFP })\)-sequence in polynomial output time.

Proof

Similar to Proposition 4 one gets an increasing function \(s: \mathbb N\rightarrow \mathbb N\) such that s(m) is computable in space \(O(\text {log}\;m)\) and such that for all ordered graphs G and H and all \(m\in \mathbb N\),

$$\begin{aligned} \text {if }{G\equiv _{\text {LFP}_{s(m)}} H \text { and }G\not \cong H, then |V(G)|, | V({H)}|> m}. \end{aligned}$$

Assume that the \((\textsc {3-Col},\text {LFP})\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) is generated by the probabilistic algorithm \(\mathbb P\) in polynomial output time. Recall that the universes of \({G}_m\) and \({H}_m\) are obtained deterministically. We define a function \(h:\mathbb N\rightarrow \mathbb N\) inductively by

$$\begin{aligned} h(m):= {\left\{ \begin{array}{ll} s(0), &{} \text { if m=0 }\!\!, \\ s\big (\text {max}\{|V(G_{h(m-1)})|,|V(H_{h(m-1)})|\}\big ), &{} \text { if } m>0. \end{array}\right. } \end{aligned}$$

As \(G_{h(m)}\equiv _{\text {LFP}_{h(m)}} H_{h(m)}\), that is, \(G_{h(m)}\equiv _{\text {LFP}_{s\big (\text {max}\{|V(G_{h(m-1)})|,|V(H_{h(m-1)})|\} \big )}}H_{h(m)}\), we have

$$|V(G_{h(m)})|,|V(H_{h(m)})|> \text {max}\big \{|V(G_{h(m-1)})|, |V(H_{h(m-1)})|\big \}.$$

As \(G_{h(m)}\equiv _{\text {LFP}_{h(m)}} H_{h(m)}\), we have \(G_{h(m)}\equiv _{{LFP }_m}H_{h(m)}\). Therefore, it is routine to show that the probabilistic algorithm, which on input m first computes h(m) and then simulates \(\mathbb P\) on h(m), generates a random monotone \((\textsc {3-Col}, \text {LFP})\)-sequence in polynomial in output time.

So we may assume that the \((\textsc {3-Col},\text {LFP})\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) generated by \(\mathbb P\) is monotone. We will get the sequence satisfying (r4) and (r5) as a subsequence of \(({G}_m, {H}_m)_{m\in \mathbb N}\), therefore it will be itself monotone. We may assume that the function \(f:\mathbb N\rightarrow \mathbb N\) mentioned in (r2) is time constructible. We define \(g:\mathbb N\rightarrow \mathbb N\) by

$$\begin{aligned} g(k):= {\left\{ \begin{array}{ll} \text { the least } m \text { such that } f(0)\le \text {max}\{|V({G}_m)|, |V({H}_m)|\}, &{} \text { if } k=0, \\ \text { the least } m \text { such that } f(k)\le \text {max}\{|V({G}_m)|, |V({H}_m)|\} \text { and }\\ \left\lceil \text {log}\;(|V(G_{g(k-1)})|+ |V(H_{g(k-1)})|)\right\rceil < \left\lceil \text {log}\;(|V({G}_m)|+ |V({H}_m)|)\right\rceil \!, &{} \text { if } k>0. \end{array}\right. } \end{aligned}$$

Again it is routine to show that the probabilistic algorithm, which on input m first computes g(m) and then simulates \(\mathbb P\) on g(m), generates a random and strongly monotone \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time.    \(\Box \)

Before turning to the main step of the proof of Theorem 15, for the reader’s convenience we recall the definition of pseudorandom generator (following [3, Definition 20.2]).

Definition 17

Let \(c\in \mathbb N\). An algorithm \(\mathbb G\) is a \(2^{\left\lceil \ell /c\right\rceil }\)-pseudorandom generator if it satisfies (g1) and (g2).

  • (g1) On every input \(s\in \{0,1\}^*\) the algorithm \(\mathbb G\) computes a string \(\mathbb G(s)\in \{0,1\}^*\) with \(|\mathbb G(s)| = 2^{\left\lceil |s|/c\right\rceil } \) in time \(2^{|s|}\).

  • (g2) For every \(\ell \in \mathbb N\) and every circuit C of size at most \(t^3\), where \(t:= 2^{\left\lceil \ell /c\right\rceil }\), we have

    $$\left| \Pr _{s\in \{0,1\}^{\ell }}\big [C(\mathbb G(s))= 1 \big ]- \Pr _{r\in \{0,1\}^t} \big [C(r)=1\big ]\right| < 1/10.$$

    In the left term we consider the uniform probability space on \(\{0,1\}^{\ell }\), in the right term the uniform probability space on \(\{0,1\}^{t}\).

Lemma 18

Assume

  • there is a \(2^{\left\lceil \ell /c\right\rceil }\)-pseudorandom generator \(\mathbb G\) for some \(c\in \mathbb N\);

  • there is a probabilistic algorithm \(\mathbb P\) that generates a strongly monotone random \((\textsc {3-Col},{LFP })\)-sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) in polynomial output time.

Then there is a deterministic algorithm \(\mathbb A\) such that for every \(m\in \mathbb N\) the algorithm \(\mathbb A\) on input m computes a sequence of pairs

$$\begin{aligned} (G^1_m, H^1_m), \ldots , (G^{t_m}_m, H^{t_m}_m) \end{aligned}$$

of ordered graphs, where all \(G^i_m\) have \(V({G}_m)\) as vertex set, and all \(H^i_m\) have \(V({H}_m)\) as vertex set (recall that \(V({G}_m)\) and \(V({H}_m)\) are the vertex sets deterministically computed by \(\mathbb P\) on input m). Moreover, the following conditions (a1)–(a3) hold:

  • (a1) The algorithm \(\mathbb A\) runs in time \((|V({G}_m)|+|V({H}_m|)^{O(1)}\); in particular, \(t_m= (|V({G}_m)|+|V({H}_m|)^{O(1)}\).

  • (a2) For sufficiently large \(m\in \mathbb N\),

    $$\begin{aligned} \Pr _{p\in [t_m]} \big [G^p_m \equiv _{{LFP }_m}H^p_m, \ G^p_m\in \textsc {3-Col}&\ \text { and }\ H^p_m\notin \textsc {3-Col}\big ] \\[-1mm]&\ge \Pr _{p\in [t_m]} \big [\mathbb V \text { accepts } (G^p_m, H^p_m, m) \big ]>1/2, \end{aligned}$$

    where \(\mathbb V\), the verifier, is the algorithm associated with \(\mathbb P\) and mentioned in condition (r2) of Definition 14. Note that the first inequality holds by this condition.

  • (a3) For every \(m\in \mathbb N\) we have

    • \(-\) \(\text {max}\{|V({G}_m)|, |V({H}_m)|\}< \text {min}\{|V(G_{m+1})|, |V(H_{m+1})|\}\)

    • \(-\) \(\left\lceil \text {log}\;(|V({G}_m)|+ |V({H}_m)|\right\rceil < \left\lceil \text {log}\;(|V(G_{m+1}|+ |V(H_{m+1})|)\right\rceil \);

    • \(-\) \(f(m)\le \text {max}\{|V({G}_m)|, |V({H}_m)|\}\) (where f is the function mentioned in (r2)(c)).

Proof

For the probabilistic algorithm \(\mathbb P\) we choose the verifier \(\mathbb V\) according to (r2). By (r5) we know that \(\mathbb V\) on input \(({G}_m, {H}_m,m)\) runs in time polynomial in \((|V({G}_m)|+ |V({H}_m)|)\). We can assume that \(\mathbb P\) satisfies (r2)(b\('\)) instead of (r2)(b), where

  • (r2) (b\('\)) for sufficiently large \(m\in \mathbb N\), \(\Pr \big [\mathbb V \text { accepts } (G_{m}, H_{m},m)\big ] \ge 4/5\).

This is achieved by the standard amplification method. More precisely, by repeating the algorithm \(\mathbb P\), on input m, polynomial many times, that is, polynomial in \((|V({G}_m)|+ |V({H}_m)|)\) many times, and each time checking whether \(\mathbb V\) accepts \(({G}_m, {H}_m, m)\), where \(({G}_m, {H}_m)\) is the output of \(\mathbb P\).

By the properties of \(\mathbb P\), we know that for some \(d\in \mathbb N\) with \(d\ge 10\):

  • The running time of \(\mathbb P\) on m is bounded by \((|V({G}_m)|+ |V({H}_m)|)^d\).

  • The running time of the algorithms \(\mathbb V\) on inputs (GHm) with \(f(m)\le \text {max}\{|V(G)|\), \(|V(H)|\}\) is bounded by \((|V(G)|+ |V(H)|)^d\).

We let \(\mathbb A\) be the following deterministic algorithm:

figure a

Then (a1) holds as \(2^\ell =(|V({G}_m)|+ |V({H}_m)|)^{O(1)}\). Since \(\mathbb P\) generates strongly monotone sequences, also (a3) holds. It remains to establish (a2). For a contradiction assume that

$$\begin{aligned} \text {for infinitely many } m\in \mathbb N: \Pr _{p\in [t_m]}\big [\mathbb V \text { accepts } (G^p_m, H^p_m, m) \big ]\le 1/2. \end{aligned}$$
(2)

For every \(m\in \mathbb N\) we let

$$\begin{aligned} n_m:= |V({G}_m)|+ |V({H}_m)|. \end{aligned}$$

Clearly there is an algorithm that decides in time \(O(n^{d+1})\) whether a given \(n\in \mathbb N\) is equal to \(n_m\) for some \(m\in \mathbb N\), and if so, outputs \(m \left( \text {which is unique by (a3)}\right) \). We consider the following algorithm \(\mathbb D\):

figure b

By (r2)(b\('\)), for sufficiently large \(m\in \mathbb N\), and hence sufficiently large \(n^*:=2^{\left\lceil d \cdot \text {log}\;n_m\right\rceil }\),

$$\begin{aligned} \Pr _{r\in \{0,1\}^{n^*}}\big [\mathbb D \text { accepts } r\big ] = \Pr _{p\in [t_m]} \big [\mathbb V \text { accepts } (G^p_m, H^p_m, m)\big ] \ge 4/5. \end{aligned}$$
(3)

Furthermore note that by (2),

$$\begin{aligned} \text {for infinitely many } m\,\text {and}\,\ell := c \cdot \left\lceil d \cdot \text {log}\;n_m\right\rceil :\ \ \Pr _{s\in \{0,1\}^{\ell }}\big [\mathbb D(\mathbb G(s))= 1\big ] \le 1/2. \end{aligned}$$
(4)

Moreover, as \(f(m)\le \text {max}\{|V({G}_m)|,|V({H}_m)|\}\) (by the strong monotonicity of the random \((\textsc {3-Col},\text {LFP})\)-sequence computed by \(\mathbb P\)), we see that the running time of \(\mathbb D\) is bounded by \(O(|r|^{1+1/d})\le O(|r|^{1.1})\). Using the Cook-Levin’s reduction, from the algorithm \(\mathbb D\) we can construct, for every \(m\in \mathbb N\) and \(n^*:=2^{\left\lceil d \cdot \text {log}\;n_m\right\rceil }\), a circuit \(C_{n^*}\) such that for every \(r\in \{0,1\}^{n^*}\),

$$\begin{aligned} C_{n^*}(r)= 1 \iff \mathbb D \text { accepts r} \end{aligned}$$
(5)

and such that for the size \(|C_{n^*}|\) of the circuit \(C_{n^*}\) we have

$$\begin{aligned} |C_{n^*}|= O\big ((n^*)^{2.2}\big ). \end{aligned}$$
(6)

By (3) and (5), for sufficiently large \(m\in \mathbb N\), and hence sufficiently large \(n^*= 2^{\left\lceil d \cdot \text {log}\;n_m\right\rceil }\),

$$\begin{aligned} \Pr _{r\in \{0,1\}^{n^*}}\big [C_{n^*}(r)= 1\big ] = \Pr _{p\in [t_m]} \big [\mathbb V \text { accepts } (G^p_m, H^p_m, m)\big ] \ge 4/5. \end{aligned}$$

By (4) and (5), we know that for infinitely many \(m\in \mathbb N\) and \(\ell := c\cdot \left\lceil d \cdot \text {log}\;n_m\right\rceil \) we have for \(n^*=2^{\left\lceil d \cdot \text {log}\;n_m\right\rceil }\),

$$\begin{aligned} \Pr _{s\in \{0,1\}^{\ell }}\big [C_{n^*}(\mathbb G(s))=1\big ]\le 1/2. \end{aligned}$$

Together with the previous inequality, for such an m and the corresponding \(\ell \) and \(n^*\),

$$\begin{aligned} \Big |\Pr _{r\in \{0,1\}^{n^*}}\big [C_{n^*}(r)=1\big ] - \Pr _{s\in \{0,1\}^{\ell }}\big [C_{n^*}(\mathbb G(s))=1\big ]\Big | \ge 4/5-1/2>1/10, \end{aligned}$$

which, by (6), contradicts (g2) in Definition 17.

   \(\Box \)

Proof of Theorem 15: Assume that there is a probabilistic algorithm that generates a random ordered \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time. We show that there is a deterministic algorithm which generates a \((\textsc {3-Col},\text {LFP})\)-sequence in polynomial output time. This contradicts Theorem 11.

By Lemmas 16 and 18 there is an algorithm \(\mathbb A\) with the properties stated in Lemma 18. We show that the following algorithm \(\mathbb S\) generates a \((\textsc {3-Col},\text {LFP})\)-sequence \((G'_m, H'_m)_{m\in \mathbb N}\) in polynomial output time.

figure c

By (a2) of Lemma 18, the algorithm \(\mathbb S\) will halt on input m and yield the desired \((G'_m, H'_m)\). By (a3) of Lemma 18, the algorithm \(\mathbb V\) is applied to inputs (GHm) with \(f(m)\le \text {max}\{|V(G)|,|V(H)|\}\); on such inputs its running time is bounded by \((|V(G)|+|V(H)|)^{O(1)}\). Together with (a1), this shows that \(\mathbb S\) runs in polynomial output time.    \(\Box \)

In contrast to deterministic algorithms generating “standard” \((\textsc {3-Col}, \text {LFP})\)-sequences we require of randomized \((\textsc {3-Col},\text {LFP})\)-sequences \(({G}_m, {H}_m)_{m\in \mathbb N}\) that the property

$${G}_m\equiv _{{LFP }_m}{H}_m, \ \ {G}_m\in \textsc {3-Col}, \text { and } {H}_m\notin \textsc {3-Col}$$

can be checked in a reasonable time (the existence of the verifier, see property (r2) in Definition 14). What happens if we drop this requirement? The following sections address this problem.

6 The Planted Clique Conjecture

In the standard planted clique problem, we are given a graph G whose edges are generated by starting with a random graph with universe [n], then “planting” (adding edges to make) a random clique on k vertices; the problem asks for efficient algorithms finding such a clique of size k. The problem was addressed in [2, 13, 16], the authors of the last paper mention that it was suggested by M. Saks. It has applications in cryptography [14], algorithmic game theory [11, 17], and classical complexity [19]. Here we study some consequences for the Ehrenfeucht-Fraïssé method of a “logic reformulation” of the planted clique problem.

The Erdős-Rényi probability space \(\text {ER}(n,1/2)\) is obtained as follows. We start with the set [n] of vertices. Then we choose every \(e\in \left( {\begin{array}{c}[n]\\ 2\end{array}}\right) \ \ \big (:=\{X\subseteq [n]\mid |X|=2\}\big )\) as an edge with probability 1 / 2, independently of the choices of other edges.

For \(G\in \text {ER}(n,1/2)\) the expected size of a maximum clique is approximately \(2 \cdot \text {log}\;n\). Clearly, the probability that \(G\in \text {ER}(n,1/2)\) contains a clique of size k is bounded by

$$\begin{aligned} {\left( \begin{array}{c} n \\ k\\ \end{array} \right) }\cdot 2^{\tiny {-{\left( \begin{array}{c} k \\ 2\\ \end{array} \right) }}}. \end{aligned}$$

For \(k= 4\cdot \text {log}\;n\) we have

$$\begin{aligned} {\left( \begin{array}{c} n \\ k\\ \end{array} \right) }\cdot 2^{\tiny {-{\left( \begin{array}{c} k \\ 2\\ \end{array} \right) }}} \le n^{4\cdot \text {log}\;n}\cdot 2^{\tiny {-{\left( \begin{array}{c} k \\ 2\\ \end{array} \right) }}} = 2^{4\cdot \text {log}\;^2 n}\cdot 2^{2\cdot \text {log}\;n- 8\cdot \text {log}\;^2 n} \le 2^{- 2\cdot \text {log}\;^2 n}=n^{-2\cdot \text {log}\;n}. \end{aligned}$$

Thus

Proposition 19

\(\Pr _{G\in {ER }(n,1/2)}\big [\text {G contains a clique of size 4}\cdot {log }\,n \big ]\!=\!\!\frac{1}{n^{\Omega ({log } n)}}\).

For any graph G with vertex set [n] and \(A\subseteq [n]\) we denote by \(G+ K(A)\) the graph obtained from G by adding edges such that the subgraph induced on A is a clique. For \(n\in \mathbb N\) and \(k\in [n]\) we consider a second distribution \(\text {ER}(n,1/2,k)\): pick a random (ordered) graph \(G\in \text {ER}(n,1/2)\) and a uniformly random subset A of [n] of size k and plant in a clique on A in G, thus getting \(G+ K(A)\).Footnote 4 We view G and \(G+ K(A)\) as ordered graphs equipped with the natural ordering on [n].

The following decision version \(\text {PCC}(\delta )\) of the planted clique conjecture states that no polynomial time algorithm distinguishes between the distributions \(\text {ER}(n,1/2)\) and \(\text {ER}(n,1/2,\ 4\cdot \text {log}\;n)\) more than \(\delta (n)\).

Conjecture 20

(The Planted Clique Conjecture PCC \((\delta )\) ). Let \(\delta :\mathbb N\rightarrow \mathbb R\) with \(0< \delta (n)< 1\) for all \(n\in \mathbb N\). For every polynomial time algorithm \(\mathbb A\) there is an \(n_0\in \mathbb N\) such that for all \(n\ge n_0\),

$$\begin{aligned} \left| \Pr _{G\in \text {ER}(n,1/2)}\big [\mathbb A \text { accepts G }\big ] - \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )}\big [\mathbb A \text { accepts } G+ K(A) \big ]\right| \le \delta (n). \end{aligned}$$

Clearly, if \(\delta (n)\le \delta '(n)\) for all \(n\in \mathbb N\), then \(\text {PCC}(\delta )\) implies \(\text {PCC}(\delta ') \). In [14] the assumption \(\text {PCC}(1-1/q)\) for some \(q\in \mathbb N[X]\), that is, for some polynomial q with natural numbers as coefficients, has been put to good use.

Proposition 21

For \(q\in \mathbb N[X]\), the statement \(\text {PCC}(1-1/q)\) implies \(\text {P}\ne \text {NP}\).

Proof

By Proposition 19 we know that for sufficiently large n,

$$\begin{aligned} \Pr _{G\in \text {ER}(n,1/2)}\big [G \text { contains a clique of size } 4\cdot \text {log}\;n \big ]< 1/q(n). \end{aligned}$$
(7)

If \(\text {P}= \text {NP}\), then there is a (deterministic) polynomial time algorithm \(\mathbb A\) deciding whether a graph contains a clique of size \(4\cdot \text {log}\;n\). For such an \(\mathbb A\) we have by (7),

$$\begin{aligned} \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [\mathbb A \text { accepts } G+ K(A) \big ]- \Pr _{G\in \text {ER}(n,1/2)} \big [\mathbb A \text { accepts } G\big ] > 1-\frac{1}{q(n)}. \end{aligned}$$

This contradicts to \(\text {PCC}(1-1/q) \).    \(\Box \)

By the Immerman-Vardi Theorem, on ordered graphs polynomial time algorithms correspond to LFP-sentences. Therefore, \(\text {PCC}(\delta )\) just says that for every LFP-sentence \(\varphi \) and all sufficiently large n,

$$\begin{aligned} \left| \Pr _{G\in \text {ER}(n,1/2)}\big [G\,\models \, \varphi \big ] - \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [G+ K(A)\,\models \, \varphi \big ]\right| \le \delta (n). \end{aligned}$$

This holds if

$$\begin{aligned} \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [G\,\models \, \varphi \iff {G}+K(A)\,\models \, \varphi \big ]\ge 1-\delta (n). \end{aligned}$$
(8)

For our intended application to the Ehrenfeucht-Fraïssé-method we need an even stronger assumption, namely that for every \(m\in \mathbb N\) and all sufficiently large n,

$$\begin{aligned} \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [\text {for all } \varphi \in \text {LFP}_m: \ \big (G\,\models \, \varphi \iff G+ K(A)\,\models \, \varphi \big )\big ] \ge 1-\delta (n), \end{aligned}$$

or more succinctly,

$$\begin{aligned} \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )}\big [G\equiv _{{LFP }_m}G+ K(A)\big )\big ]\ge 1-\delta (n). \end{aligned}$$

We shall need an effective bound for the rate of convergence. So we introduce the following logic version \(\text {LPCC}(\varepsilon )\) of the planted clique conjecture.

Conjecture 22

(LPCC \((\varepsilon )\) ). Let \(\varepsilon :\mathbb N\rightarrow \mathbb R\) with \(0<\varepsilon (n)<1\) for all \(n\in \mathbb N\). There is a computable function \(f:\mathbb N\rightarrow \mathbb N\) such that for every \(m\in \mathbb N\) and all \(n\ge f(m)\),

$$\begin{aligned} \Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )}\big [G\equiv _{{LFP }_m}G+ K(A)\big ]\ge \varepsilon (n). \end{aligned}$$

The previous remarks show:

Proposition 23

Let \(\varepsilon :\mathbb N\rightarrow \mathbb R\) with \(0<\varepsilon (n)<1\) for all \(n\in \mathbb N\). Then \({LPCC }(\varepsilon )\) implies \({PCC }(1-\varepsilon ) \).

By this proposition and Proposition 21, we get

Corollary 24

For \(q\in \mathbb N[X]\), \({LPCC }(1/q)\) implies \({P }\ne {NP }\).

Assume that \(\text {LPCC}(\varepsilon )\) holds. By taking a natural number m such that \(\text {LFP}_m\) contains a sentence expressing that the number of edges is even, we see that \(\lim _{n\in \mathbb N}\varepsilon (n)\le 1/2\). In Proposition 26 we generalize this and show that \(\lim _{n\rightarrow \infty }\varepsilon (n)\) must be 0.

7 The Planted Clique Conjecture and \((\textsc {3-Col}\),\(\text {LFP})\)-sequences

The following result shows that, assuming \(\text {LPCC}(1/q)\), there is a probabilistic algorithm yielding a random sequence \(({G}_m, {H}_m)_{m\in \mathbb N}\) such that

$$\begin{aligned} {G}_m\equiv _{{LFP }_m}{H}_m,\ \ {G}_m\in \textsc {3-Col},\ \ \text {and}\ \ {H}_m\notin \textsc {3-Col}\end{aligned}$$
(9)

holds with high probability. By Theorem 15 we cannot have a verifier for this algorithm, that is an efficient algorithm that verifies the properties stated in (9) (assuming the existence of a pseudorandom generator).

Theorem 25

Assume that \({LPCC }(1/q)\) holds for some polynomial \(q\in \mathbb N[X]\). Then there is a probabilistic algorithm \(\mathbb P\) which on input \(m\in \mathbb N\) generates a pair \(({G}_m, {H}_m)\) of ordered graphs in time \((|V({G}_m)|+|V({H}_m)|)^{O(1)}\) such that

$$\begin{aligned} \Pr \big [{G}_m\equiv _{{LFP }_m}{H}_m,\ \ {G}_m\in \textsc {3-Col},\ \ \text {and}\ \ {H}_m\notin \textsc {3-Col}\big ] \ge \frac{1}{\big (|V({G}_m)|+|V({H}_m)|\big )^{O(1)}}. \end{aligned}$$

Moreover, \(\mathbb P\) on input \(m\in \mathbb N\) first deterministically computes the vertex sets of the graphs \({G}_m\) and \({H}_m\).

Proof

Consider the problem

figure d
  • \(\text {LPCC}(1/q)\) for some \(q\in \mathbb N[X]\)” essentially states that there is a probabilistic algorithm \(\mathbb P\) which generates a \(\big (\textsc {Clique}(4\cdot \text {log}\;),\text {LFP}\big )\)-sequence \(({G}_m,{H}_m)_{m\in \mathbb N}\) of ordered graphs in polynomial output time such that

    $$\begin{aligned}&\Pr \big [{G}_m\equiv _{{LFP }_m}{H}_m,\ \ {G}_m\in \textsc {Clique}(4\cdot \text {log}\;), \ \ \text {and}\ \ {H}_m\notin \textsc {Clique}(4\cdot \text {log}\;)\big ]\\&\quad \ge \frac{1}{\big (|V({G}_m)|+|V({H}_m)|\big )^{O(1)}}. \end{aligned}$$
  • As \(\textsc {Clique}(4\cdot \text {log}\;)\) is in NP and \(\textsc {3-Col}\) is NP-complete and has a padding function, we can transform the \(\big (\textsc {Clique}(4\cdot \text {log}\;),\text {LFP}\big )\)-sequence into a \((\textsc {3-Col},\text {LFP})\)-sequence.   \(\Box \)

8 Some Remarks on the Logic Version of the Planted Clique Conjecture

In this section we show (see Lemma 27) that with positive asymptotic probability we can distinguish the \(\text {LFP}_m\)-theory of the graphs G and \(G+ K(A)\) by modulo counting their edges (see Lemma 27 for the precise statement). Using this fact, we refute \(\text {LPCC}(\varepsilon )\) unless \(\lim _{n\in \mathbb N}\varepsilon (n)=0\).

Proposition 26

Let \(\varepsilon :\mathbb N\rightarrow \mathbb R^+\). If \({LPCC }(\varepsilon ) \) holds, then \(\lim _{n\in \mathbb N}\varepsilon (n)= 0\)

Proof

It suffices to show that for every positive \(\delta \in \mathbb R\) there is an \(m\in \mathbb N\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [G\equiv _{{LFP }_m}G+ K(A)\big ]\le \delta . \end{aligned}$$

This is an immediate consequence of the following lemma as there are LFP-sentences expressing in an ordered graph that the number of edges is congruent i modulo \(\ell \) (for \(\ell \in \mathbb N\) and \(i\in \{0,\ldots , \ell -1\}\)).    \(\Box \)

Lemma 27

Let \(\ell \in \mathbb N\) and \(i\in \{0,\ldots , \ell -1\}\). Then for every nondecreasing and unbounded function \(h:\mathbb N\rightarrow \mathbb N\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr _{G+ K(A)\in \text {ER}(n,1/2,\ h(n))} \big [\ |E(G+ K(A))|- |E(G)|\equiv i \mod \ell \big ]= \frac{1}{\ell }. \end{aligned}$$

Proof

Let \(n\in \mathbb N\) and \(k\in [n]\). Then, for every graph G with vertex set [n], every subset A of [n] of size k, and every \(i\in \{0,1, \ldots , \ell -1\}\), we have

$$\begin{aligned} \big |E(G+ K(A))\big |- |E(G)|\equiv i \mod \ell \!\!\iff \!\! \big |E(G) \cap E(K(A))\big | \equiv \left( {\begin{array}{c}k\\ 2\end{array}}\right) - i \mod \ell . \end{aligned}$$
(10)

Here, E(K(A)) denotes the set of edges of the clique on A. We set \(s(k):= \left( {\begin{array}{c}k\\ 2\end{array}}\right) \). Then \(|E({K}(A))|=s(k)\). For every \(r\in \{0,1, \ldots , \ell -1\}\), we let \(a_r(k)\) be the number of those subsets of E(K(A)), whose cardinality is equivalent to r modulo \(\ell \); thus

$$\begin{aligned} a_r(k)= \sum _{0\le j\le s}^{j\equiv r\mod \ell }\left( {\begin{array}{c}s(k)\\ j\end{array}}\right) . \end{aligned}$$

Note that \(a_r(k)\) does not depend on n (and in particular, not on the chosen subset A of [n] of size k). By (10), we get for all \(n\ge k\), all subsets A of [n] of size k, and all \(i\in \{0,1, \ldots , \ell -1\}\),

$$\begin{aligned} \Pr _{G\in \text {ER}(n,1/2)}\Big [\ \big |E(G+ K(A))\big |- \big |E(G)\big |\equiv i \mod \ell \Big ] = \frac{a_{s(k)-i}}{2^{s(k)}}. \end{aligned}$$
(11)

Claim 1. Let \(r\in \{0,1, \ldots , \ell -1\}\). Then (here \(a_\ell (k):=a_0(k)\)),

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{|a_{r+1}(k)- a_{r}(k)|}{2^{s(k)}}= 0. \end{aligned}$$

Proof of Claim 1: First we show that there is a positive \(\iota \in \mathbb R\) such for all sufficiently small positive \(\delta \in \mathbb R\) and all \(n\in \mathbb N\) with \((1/2- \delta )\cdot n\in \mathbb N\),

$$\begin{aligned} \left( {\begin{array}{c}n\\ (1/2- \delta )\cdot n\end{array}}\right) = O\left( \frac{2^{(1- \iota \delta ^2)\cdot n}}{\sqrt{n}}\right) . \end{aligned}$$
(12)

In fact, using Stirling’s formula

$$\begin{aligned} \sqrt{2\pi n}\cdot \left( \frac{n}{e}\right) ^n \le n! \le e\cdot \sqrt{n}\cdot \left( \frac{n}{e}\right) ^n, \end{aligned}$$

we get for \(n\in \mathbb N\) and \(\varepsilon \in \mathbb R\) with \(\varepsilon \cdot n \in \mathbb N\),

$$\begin{aligned} \left( {\begin{array}{c}n\\ \varepsilon \cdot n\end{array}}\right) \le \frac{e\cdot 2^{H(\varepsilon )\cdot n}}{2\pi \cdot \sqrt{\varepsilon \cdot (1- \varepsilon )\cdot n}}. \end{aligned}$$
(13)

Here \(H:(0,1) \rightarrow \mathbb R\) denotes the binary entropy function defined by

$$\begin{aligned} H(\varepsilon ) = - \varepsilon \cdot \text {log}\;\varepsilon - (1-\varepsilon )\cdot \text {log}\;(1- \varepsilon ). \end{aligned}$$

Recall that H attains 1, its maximum value, at \(\varepsilon = 1/2\). We want to bound the values of H in the neighborhood of 1 / 2. Let \(\delta \in \mathbb R\) with \(0\le \delta < 1/2\). Then

$$\begin{aligned} H(1/2- \delta ) = - (1/2- \delta )\cdot \text {log}\;(1/2- \delta ) - (1/2+ \delta )\cdot \text {log}\;(1/2+ \delta ). \end{aligned}$$

Using the Taylor series for \(\text {log}\;x\), we get from this equality that there is an \(\iota \in \mathbb R\) with \(\iota >0\) such that for sufficiently small \(\delta \in \mathbb R\) with \(\delta \ge 0\),

$$\begin{aligned} H(1/2- \delta ) \le 1- \iota \cdot \delta ^2. \end{aligned}$$
(14)

Hence, assuming in addition that \(\delta < 1/\sqrt{8}\) and \((1/2- \delta )\cdot n\in \mathbb N\),

$$\begin{aligned} \left( {\begin{array}{c}n\\ (1/2- \delta )\cdot n\end{array}}\right)&\le \frac{e\cdot 2^{(1- \iota \cdot \delta ^2)\cdot n}}{2\pi \cdot \sqrt{(1/4- \delta ^2)\cdot n}}&\text {(by }(13)\,\,{\text {and}}\,\,(14))\\&= O\left( \frac{2^{(1- \iota \cdot \delta ^2)\cdot n}}{\sqrt{n}}\right)&\text {(as } \delta ^2< 1/8), \end{aligned}$$

which is the desired equality.

Now let \(j,s \in \mathbb N\) satisfy \(0\le j< s\). Note that

$$\begin{aligned} \left( {\begin{array}{c}s\\ j+1\end{array}}\right) - \left( {\begin{array}{c}s\\ j\end{array}}\right) = \frac{s-2j-1}{j+1} \cdot \left( {\begin{array}{c}s\\ j\end{array}}\right) . \end{aligned}$$
(15)

We distinguish two cases.

Case \(j \le s/2 - \root 3 \of {s^2}\): Then \(j\le (1/2-\delta )\cdot s\) for \(\delta \in (s^{-2/3}, s^{-1/3})\). If \((1/2-\delta )\cdot s\in \mathbb N\), we get by (12)

Case \(s/2 - \root 3 \of {s^2}< j< s/2\): Then

Putting all together we get the statement of Claim 1 as follows

$$\begin{aligned} {}&a_{r+1}(k)- a_r(k) = \sum _{0\le j\le s(k)}^{j\equiv r+1\mod \ell }\left( {\begin{array}{c}s(k)\\ j\end{array}}\right) - \sum _{0\le j\le s(k)}^{j\equiv r\mod \ell }\left( {\begin{array}{c}s(k)\\ j\end{array}}\right) \\&\quad \le \sum _{0\le j < s(k)/2}^{j\equiv r\mod \ell }\left( \left( {\begin{array}{c}s(k)\\ j+1\end{array}}\right) - \left( {\begin{array}{c}s(k)\\ j\end{array}}\right) \right) \\&\quad = \sum _{0\le j \le s(k)/2- \root 3 \of { s(k)^2}}^{j\equiv r\mod \ell }\left( \left( {\begin{array}{c} s(k)\\ j+1\end{array}}\right) - \left( {\begin{array}{c}s(k)\\ j\end{array}}\right) \right) \\&\quad {} + \sum _{s(k)/2 - \root 3 \of {s(k)^2}< j< s(k)/2}^{j\equiv r\mod \ell }\left( \left( {\begin{array}{c} s(k)\\ j+1\end{array}}\right) - \left( {\begin{array}{c}s(k)\\ j\end{array}}\right) \right) \\&\quad = O\left( \frac{s(k)\cdot \sqrt{s}\cdot 2^{s(k)}}{2^{\iota \cdot \root 3 \of {s(k)}}}\right) + O\left( \frac{s(k)^{2/3}\cdot 2^{s(k)}}{s(k)^{5/6}}\right) \text { (by the equalities derived above) }\\&\quad = o(2^{s(k)}) \end{aligned}$$

Similarly we can show \(a_r(k)- a_{r+1}(k)= o(2^{s(k)})\). \({\dashv }\)

Claim 2. Let \(\delta >0\). If k is sufficiently large, then for all \(n\ge k\), all subsets A of [n] of size k, and all \(i\in \{0,1, \ldots , \ell -1\}\), we have

$$\begin{aligned} \frac{1}{\ell }-\delta \le \Pr _{G\in \text {ER}(n,1/2)} \Big [\ \big |E(G+ K(A))\big |- \big |E(G)\big |\equiv i \mod \ell \Big ] \le \frac{1}{\ell }+ \delta . \end{aligned}$$

Proof of Claim 2: For every \(i\in \{0,1,\ldots , \ell -1\}\) let

$$\begin{aligned} p_i(k) : = \frac{a_{s(k)-i}(k)}{2^{s(k)}}. \end{aligned}$$

Claim 1 implies that for every \(\iota > 0\) and all sufficiently large k,

$$\begin{aligned} \big |p_{i+1}(k)- p_{i}(k)\big | \le \iota . \end{aligned}$$

Thus,

$$\begin{aligned} p_0(k)- i\cdot \iota \le p_i(k)\le p_0(k)+ i\cdot \iota . \end{aligned}$$
(16)

As \(\sum _{j=0}^{\ell -1} j=\ell \cdot (\ell -1)/2\), we obtain

$$\begin{aligned} \ell \cdot p_0(k) - \frac{\ell \cdot (\ell -1)}{2}\cdot \iota \le \sum _{j=0}^{\ell -1} p_j(k)= 1 \le \ell \cdot p_0(k) + \frac{\ell \cdot (\ell -1)}{2}\cdot \iota \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{\ell }- \frac{(\ell -1)}{2}\cdot \iota \le p_0(k) \le \frac{1}{\ell }+ \frac{(\ell -1)}{2}\cdot \iota . \end{aligned}$$
(17)

Choosing \(\iota \) small enough, (16) and (17) imply for all sufficiently large k and every \(i\in \{0,1,\ldots , \ell -1\}\),

$$\begin{aligned} \frac{1}{\ell }-\delta \le p_i(k)\le \frac{1}{\ell }+ \delta . \end{aligned}$$

As for all \(n\ge k\), all subsets A of [n] of size k, and all \(i\in \{0,1, \ldots , \ell -1\}\), we have (compare (11))

$$\begin{aligned} p_{i}(k)= \frac{a_{s(k)-i}}{2^{s(k)}} =\Pr _{G\in \text {ER}(n,1/2)}\Big [\ \big |E(G+ K(A))\big |- \big |E(G)\big | \equiv i \mod \ell \Big ], \end{aligned}$$

this yields our claim. \({\dashv }\)

Clearly, Claim 2 immediately implies the statement of Lemma 27.    \(\Box \)

9 Further Results and Open Questions

In Sect. 4 we have seen that for no problem Q of ordered graphs there exists a \((Q,\text {LFP})\)-sequence, which can be generated in polynomial output time. Recall that \(\text {LFP}\) captures polynomial time on ordered graphs. More generally, let L be a logic capturing one of the complexity classes LOGSPACE, P, or PSPACE on (ordered) graphs: Then, for no problem Q of (ordered) graphs we can generate a (QL)-sequence \(( {G}_m,{H}_m)\) by an algorithm which satisfies the resource bound in \(|V({G}_m)|+|V({H}_m)|\) characteristic for the corresponding complexity class, e.g., not in space \(O(\text {log}\;(|V({G}_m)|+|V({H}_m)|))\) for LOGSPACE. Furthermore there are extensions of these results to “nondeterministic classes” such as NLOGSPACE and NP and extensions for so-called Ajtai-Fagin games adequate for (monadic) \(\Sigma ^1_1\) (see [5] for most of these results).

We are far from understanding when an efficiently computable (QL)-sequence exists. Even for first-order logic we have no simple and informative characterization of the problems Q with a \((Q,\text {FO})\)-sequence computable in polynomial output time. Besides the “negative” Example 13, we have a positive result: If Q is NP-hard under FO-reductions (a property shared by many natural NP-complete problems), then a \((Q,\text {FO})\)-sequence can be generated in polynomial output time.

In Sect. 5 we have mentioned that in most applications of the Ehrenfeucht-Fraïssé-method the verification that \({G}_m\) and \({H}_m\) satisfy the same sentences of the corresponding logic of “quantifier rank” or length \(\le m\) was done by an algorithm running in time \(f(m)\cdot (|V({G}_m)|+|V({H}_m)|)^{O(1)}\) for some computable function f. In the Appendix of [5], we have shown this explicitly for two (nontrivial) applications of the method. However, this is not always the case; for example, not for the highly nontrivial application of the Ehrenfeucht-Fraïssé-method in [21].

We have seen in Sect. 6 that LPCC(1 / q) for some \(q\in \mathbb N[X]\) implies \(\text {P}\ne \text {NP}\). Can one refute the statement “there is a \(q\in \mathbb N[X]\) with LPCC(1 / q)?” or are there results or insights which make the statement plausible?

Furthermore, we ask: Is it true that for every single LFP-sentence \(\varphi \) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathop {\Pr }\limits _{G+ K(A)\in \text {ER}(n,1/2,\ 4\cdot \text {log}\;n )} \big [G\,\models \, \varphi \iff G+ K(A)\models \varphi \big ]\ge 1/2? \end{aligned}$$