1 Introduction

In a random constraint satisfaction problem (csp), we have n variables taking values in a (finite) alphabet \(\mathcal {X}\), subject to a random set of constraints. In previous works on models of this kind, it has emerged that the space of solutions—a random subset of \(\mathcal {X}^n\)—can have a complicated structure, posing obstacles to mathematical analysis. Major advances in intuition were achieved by statistical physicists, who developed powerful analytic heuristics to shed light on the behavior of random csps ([31] and references therein). Their insights and methods are fundamental to the current understanding of random csps.

One prominent application of the physics heuristic is in giving explicit (non-rigorous) predictions for the locations of satisfiability thresholds in a large class of random csps ([36] and others). Recent works have given rigorous proofs for some of these thresholds: in the random regular nae-sat model [22, 25], in the random k-sat model [23], and in the independent set model on random regular graphs [24]. However, the satisfiability threshold is only one aspect of the rich picture that physicists have developed. There are deep conjectures for the behavior of these models inside the satisfiable regime, and it remains an outstanding mathematical challenge to prove them. In this paper we address one part of this challenge, concerning the total number of solutions for a typical instance in the satisfiable regime.

1.1 Main result

Given a cnf boolean formula, a not-all-equal-sat (nae-sat) solution is an assignment \(\underline{{{\varvec{x}}}}\) of literals to variables such that both \(\underline{{{\varvec{x}}}}\) and its negation \(\lnot \underline{{{\varvec{x}}}}\) evaluate to true—equivalently, such that no clause gives the same evaluation to all its variables. A k-nae-sat problem is one in which each clause has exactly k literals; it is termed d-regular if each variable appears in exactly d clauses. Sampling such a formula uniformly at random gives rise to the random d-regular k-nae-sat model. (The formal definition is given in Sect. 2.) See [3] for important early work on the closely related model of random (Erdős–Rényi) nae-sat. The appeal of this model is that it has certain symmetries making the analysis particularly tractable, yet it is expected to share most of the interesting qualitative phenomena exhibited by other commonly studied problems, including random k-sat and random graph colorings.

Following convention, we fix k and then parametrize the model by its clause-to-variable ratio, \(\alpha \equiv d/k\). The partition function, denoted \(Z\equiv Z_n\), is the number of valid nae-sat assignments for an instance on n variables. It is conjectured that for each \(k\geqslant 3\), the model has an exact satisfiability threshold \(\alpha _\text {sat}(k)\): for \(\alpha <\alpha _\text {sat}\) it is satisfiable (\(Z>0\)) with high probability, but for \(\alpha >\alpha _\text {sat}\) it is unsatisfiable (\(Z=0\)) with high probability. This has been proved [25, Thm. 1] for all k exceeding an absolute constant \(k_0\), together with an exact formula for \(\alpha _\text {sat}\) which matches the physics prediction. It can be approximated as

$$\begin{aligned} \alpha _\text {sat}= \left( 2^{k-1} -\frac{1}{2} -\frac{1}{4\ln 2}\right) \ln 2 + \epsilon _k \end{aligned}$$
(1)

where \(\epsilon _k\) denotes an error tending to zero as \(k\rightarrow \infty \).

We say the model has free energy \({\textsf {f}}(\alpha )\) if \(Z^{1/n}\) converges to \({\textsf {f}}(\alpha )\) in probability as \(n\rightarrow \infty \). A priori, the limit may not be well-defined. If it exists, however, Markov’s inequality and Jensen’s inequality imply that it must be upper bounded by the replica symmetric free energy

$$\begin{aligned} {\textsf {f}}^\textsc {rs}(\alpha ) \equiv (\mathbb {E}Z)^{1/n} = 2 \bigg (1-\frac{2}{2^k}\bigg )^\alpha \,. \end{aligned}$$
(2)

(In this model and in other random regular models, the replica symmetric free energy is the same as the annealed free energy.) One of the intriguing predictions from the physics analysis [38, 44] is that there is a critical value \(\alpha _\text {cond}\) strictly below \(\alpha _\text {sat}\), such that \({\textsf {f}}(\alpha )\) and \({\textsf {f}}^\textsc {rs}(\alpha )\) agree up to \(\alpha =\alpha _\text {cond}\) and differ thereafter. In particular, this implies that the function \({\textsf {f}}(\alpha )\) must be non-analytic at \(\alpha =\alpha _\text {cond}\). This is the condensation transition (or Kauzmann transition), and will be further described below in Sect. 1.2. For all \(0\leqslant \alpha <\alpha _\text {sat}\), the free energy is predicted to be given by a formula

$$\begin{aligned}{\textsf {f}}(\alpha )={\textsf {f}}^\textsc {1rsb}(\alpha ) {\left\{ \begin{array}{ll} ={\textsf {f}}^\textsc {rs}(\alpha ) &{} \text {for }0\leqslant \alpha \leqslant \alpha _\text {cond},\\ <{\textsf {f}}^\textsc {rs}(\alpha ) &{} \text {for }\alpha >\alpha _\text {cond}. \end{array}\right. } \end{aligned}$$

The function \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is quite explicit, although not extremely simple to state; it is formally presented below in Definition 1.3. The formula for \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is derived via the one-step replica symmetry breaking (1rsb) heuristic, discussed further below. Our main result is to prove this prediction for large k:

Theorem 1

In random regular k-nae-sat with \(k\geqslant k_0\), for all \(\alpha <\alpha _\text {sat}(k)\) the free energy \({\textsf {f}}(\alpha )\) exists and equals the predicted value \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

Remark 1.1

We allow for \(k_0\) to be adjusted as long as it remains an absolute constant (so it need not equal the \(k_0\) from [25]). It is assumed throughout the paper that \(k\geqslant k_0\), even when not explicitly stated. The following observations restrict the range of \(\alpha =d/k\) that we must consider:

  • A convenient upper bound on the satisfiable regime is given by

    $$\begin{aligned} \alpha _\text {sat}\leqslant \alpha _{\textsc {rs}}\equiv \frac{\ln 2}{-\ln (1-2/2^k)} < 2^{k-1}\ln 2 \equiv \alpha _\text {ubd}\,. \end{aligned}$$

    This bound is certainly implied by the estimate (1) from [25], but it follows much more easily and directly from the first moment calculation (2). Indeed, we see from (2) that the function \({\textsf {f}}^\textsc {rs}(\alpha )\) is decreasing in \(\alpha \) and satisfies \({\textsf {f}}^\textsc {rs}(\alpha _{\textsc {rs}})=1\), so \((\mathbb {E}Z)^{1/n}<1\) for all \(\alpha >\alpha _{\textsc {rs}}\). Thus, by Markov’s inequality, \(\mathbb {P}(Z>0) \leqslant \mathbb {E}Z\), which tends to zero as \(n\rightarrow \infty \); i.e., the random problem instance is unsatisfiable with high probability.

  • For \(\alpha >\alpha _\text {sat}\) we must have \({\textsf {f}}(\alpha )=0\). On the other hand, we can see by comparing (1) and (2) that \(\alpha _\text {sat}\) is strictly smaller than \(\alpha _{\textsc {rs}}\), and \({\textsf {f}}^\textsc {rs}(\alpha _\text {sat})\) is strictly positive. This suggests that \(\alpha _\text {cond}\) occurs strictly before \(\alpha _\text {sat}\), since \(\alpha _\text {cond}=\alpha _\text {sat}\) would mean that \({\textsf {f}}(\alpha )={\textsf {f}}^\textsc {rs}(\alpha )\) up to \(\alpha _\text {sat}\), and in this case we would expect to have \(\alpha _\text {sat}=\alpha _{\textsc {rs}}\). Formally, it requires further argument to confirm that \(\alpha _\text {cond}<\alpha _\text {sat}\) in random regular nae-sat, and we obtain this as a consequence of results in the present paper. However, the phenomenon of \(\alpha _\text {cond}<\alpha _\text {sat}\) was previously confirmed by [20] and [8] for random hypergraph bicoloring and random regular sat, both of which are very similar to random regular nae-sat. As for the value of \({\textsf {f}}(\alpha )\) at the threshold \(\alpha =\alpha _\text {sat}\), we point out that \(\alpha =\alpha _\text {sat}\) makes sense in the setting of this paper only if \(d_\text {sat}(k)\equiv k\alpha _\text {sat}(k)\) is integer-valued for some k. We have no reason to think that this ever occurs; however, if it does, then the probability for \(Z>0\) is bounded away from both zero and one [25, Thm. 1]. In this case, \(Z^{1/n}\) does not concentrate around a single value but rather on two values,

    $$\begin{aligned}\bigg \{0, \lim _{\alpha \uparrow \alpha _\text {sat}} {\textsf {f}}^\textsc {1rsb}(\alpha ) \bigg \}\,. \end{aligned}$$
  • In [25, Propn. 1.1] it is shown that for \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\equiv (2^{k-1}-2)\ln 2\) and n large enough,

    $$\begin{aligned} \frac{\mathbb {E}(Z^2)}{(\mathbb {E}Z)^2} \leqslant C\equiv C(k,\alpha )<\infty \end{aligned}$$

    where \(Z\equiv Z_n\) and \(C(k,\alpha )\) does not depend on n. Thus, for any fixed \(0<\epsilon <1\) and n large enough,

    $$\begin{aligned}\mathbb {P}(Z\geqslant \epsilon \mathbb {E}Z) {\mathop {\geqslant }\limits ^{\odot }} \frac{(\mathbb {E}(Z \mathbf {1}\{ Z\geqslant \epsilon \mathbb {E}Z\}))^2}{\mathbb {E}(Z^2)} \geqslant \frac{(1-\epsilon )^2(\mathbb {E}Z)^2}{\mathbb {E}(Z^2)} \geqslant \frac{(1-\epsilon )^2}{C} \equiv \delta \, \end{aligned}$$

    where the step marked \(\odot \) is by the Cauchy–Schwarz inequality. The results of [25, Sec. 6] imply the stronger statement that for any \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\),

    $$\begin{aligned} \lim _{\epsilon \downarrow 0} \liminf _{n\rightarrow \infty } \mathbb {P}(Z \geqslant \epsilon \mathbb {E}Z)=1\,. \end{aligned}$$

    On the other hand we already noted in (2) that \(\mathbb {E}(Z^{1/n}) \leqslant (\mathbb {E}Z)^{1/n}={\textsf {f}}^\textsc {rs}(\alpha )\) for all \(\alpha \geqslant 0\) and \(n\geqslant 1\). It follows by combining these facts that \(Z^{1/n}\) converges in probability to \({\textsf {f}}^\textsc {rs}(\alpha )\) for any \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\). That is to say, the result of Theorem 1 is already proved for \(\alpha \leqslant \alpha _\text {lbd}\), with \({\textsf {f}}(\alpha )={\textsf {f}}^\textsc {rs}(\alpha )\). This also implies that the condensation transition \(\alpha _\text {cond}\) must occur above \(\alpha _\text {lbd}\).
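The chain of inequalities above is the Paley–Zygmund bound. As a purely illustrative sanity check, it can be verified numerically on a toy nonnegative random variable (an arbitrary exponential stand-in for \(Z\), not the nae-sat partition function):

```python
import random

random.seed(0)
# Toy nonnegative random variable standing in for Z (an arbitrary
# exponential sample, NOT the nae-sat partition function).
samples = [random.expovariate(1.0) for _ in range(100000)]
EZ = sum(samples) / len(samples)
EZ2 = sum(z * z for z in samples) / len(samples)

eps = 0.5
# Empirical probability that Z exceeds eps * E(Z).
p_emp = sum(z >= eps * EZ for z in samples) / len(samples)
# Paley-Zygmund lower bound: P(Z >= eps*EZ) >= (1-eps)^2 (EZ)^2 / E(Z^2).
pz = (1 - eps) ** 2 * EZ ** 2 / EZ2
print(p_emp, pz)
```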

In summary, we have \(\alpha _\text {lbd}< \alpha _\text {sat}< \alpha _{\textsc {rs}}< \alpha _\text {ubd}\), and it remains to prove Theorem 1 for \(\alpha \in (\alpha _\text {lbd},\alpha _\text {sat})\). Thus, we can assume for the remainder of the paper that

$$\begin{aligned} (2^{k-1}-2)\ln 2 = \alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}= 2^{k-1}\ln 2\,. \end{aligned}$$
(3)
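The ordering of these thresholds can be illustrated numerically. The sketch below uses the leading-order formula (1) with the unknown error \(\epsilon _k\) dropped, at the arbitrary choice \(k=20\); it is a consistency check, not a proof:

```python
import math

k = 20          # arbitrary toy choice of k
ln2 = math.log(2)

# Replica symmetric free energy (2): (E Z)^{1/n} = 2(1 - 2/2^k)^alpha.
f_rs = lambda a: 2 * (1 - 2 / 2 ** k) ** a

a_lbd = (2 ** (k - 1) - 2) * ln2                    # second moment regime boundary
a_sat = (2 ** (k - 1) - 0.5 - 1 / (4 * ln2)) * ln2  # approximation (1), eps_k dropped
a_rs = ln2 / -math.log(1 - 2 / 2 ** k)              # root of f_rs(alpha) = 1
a_ubd = 2 ** (k - 1) * ln2                          # crude upper bound

print(a_lbd < a_sat < a_rs < a_ubd)   # the ordering from Remark 1.1
print(f_rs(a_rs))                     # equals 1 by construction
```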

In the course of proving Theorem 1 we will also identify the condensation threshold \(\alpha _\text {cond}\in (\alpha _\text {lbd},\alpha _\text {sat})\) (characterized in Proposition 1.4 below).

The 1rsb heuristic, along with its implications for the condensation and satisfiability thresholds, has been studied in numerous recent works, which we briefly survey here. The existence of a condensation transition was first shown in random hypergraph bicoloring [20], which as we mentioned above is a model very similar to random nae-sat. We also point out [17] which is the first work to successfully analyze solution clusters within the condensation regime, leading to a very good lower bound on the satisfiability threshold. This was an important precursor to subsequent works [23,24,25] on exact satisfiability thresholds in random regular nae-sat, random sat, and independent sets. Condensation has been demonstrated to occur even at positive temperature in hypergraph bicoloring [11]. However, determining the precise location of \(\alpha _\text {cond}\) is challenging, and was first achieved for the random graph coloring model [10] by an impressive and technically challenging analysis. A related paper pinpoints \(\alpha _\text {cond}\) for random regular k-sat (which again is very similar to nae-sat) [8]. Subsequent work [15] characterizes the condensation threshold in a more general family of models, and shows a correspondence with information-theoretic thresholds in statistical inference problems. The main contribution of this paper is to determine for the first time the free energy throughout the condensation regime \((\alpha _\text {cond},\alpha _\text {sat})\).

1.2 Statistical physics predictions

According to the heuristic analysis by statistical physics methods, the random regular nae-sat model has a single level of replica symmetry breaking (1rsb). We summarize here some of the key phenomena that are predicted from the 1rsb framework [31, 38, 44], referring the reader to [33, Ch. 19] for a full expository account. While much of the following description remains conjectural, the implications at the free energy level are rigorously established by the present paper. Throughout the following we write \(\doteq \) to indicate equality up to subexponential factors (\(\exp \{o(n)\}\)).

Recall that we consider nae-sat with k fixed, parametrized by the clause density \(\alpha \equiv d/k\). Abbreviate \({\texttt {0}}\equiv \textsc {true}\), \({\texttt {1}}\equiv \textsc {false}\). For small \(\alpha \), almost all of the solutions lie in a single well-connected subset of \(\{{\texttt {0}},{\texttt {1}}\}^n\). This holds until a clustering transition (or dynamical transition) \(\alpha _\text {clust}\), above which the solution space becomes broken up into many well-separated pieces, or clusters (see [1, 2, 6, 35]). Informally speaking, clusters are subsets of solutions which are characterized by the property that within-cluster distances are very small relative to between-cluster distances. Conjecturally, \(\alpha _\text {clust}\) also coincides with the reconstruction threshold [28, 31, 39], and is small relative to \(\alpha _\text {sat}\) when k is large, with \(\alpha _\text {clust}/\alpha _\text {sat}\asymp (\ln k)/k\).

For \(\alpha \) above \(\alpha _\text {clust}\) it is expected that the number of clusters of size \(\exp \{ n s\}\) has mean value \(\exp \{ n\Sigma (s;\alpha ) \}\), and is concentrated about this mean. The function \(\Sigma (s)\equiv \Sigma (s;\alpha )\) is referred to as the “cluster complexity.” The 1rsb framework of statistical physics gives an explicit conjecture for \(\Sigma \), discussed below in Sect. 1.3. Then, summing over cluster sizes \(0\leqslant s\leqslant \ln 2\) gives that the total number Z of nae-sat solutions has mean

$$\begin{aligned} \mathbb {E}Z \doteq \sum _s \exp \{ n[s+\Sigma (s)] \} \doteq \exp \{ n[s_1+\Sigma (s_1)] \}, \end{aligned}$$
(4)

where \(s_1={{\,\mathrm{arg\,max}\,}}[s+\Sigma (s)]\). It is expected that \(\Sigma \) is continuous and strictly concave in s, and also that \(s+\Sigma (s)\) has a unique maximizer \(s_1\) with \(\Sigma '(s_1)=-1\). For nae-sat and related models, this explicit calculation reveals a critical value \(\alpha _\text {cond}\in (\alpha _\text {clust},\alpha _\text {sat})\), characterized as

$$\begin{aligned} \alpha _\text {cond}=\sup \{\alpha \geqslant \alpha _\text {clust}: \Sigma (s_1(\alpha );\alpha )\geqslant 0 \}\,. \end{aligned}$$

By contrast, the satisfiability threshold can be characterized as

$$\begin{aligned} \alpha _\text {sat}=\sup \{\alpha \geqslant \alpha _\text {clust}: \max _s\Sigma (s;\alpha ) \geqslant 0\}\,. \end{aligned}$$

For all \(\alpha \geqslant \alpha _\text {clust}\), the expected partition function \(\mathbb {E}Z\) is dominated by clusters of size \(\exp \{n s_1\}\). However, for \(\alpha >\alpha _\text {cond}\), we have \(\Sigma (s_1)<0\), so the expected number of clusters of this size is very small: \(\exp \{n\Sigma (s_1)\}\) tends to zero exponentially fast as \(n\rightarrow \infty \). This means that clusters of size \(\exp \{ns_1\}\) are highly unlikely to appear in a typical realization of the model. Instead, in a typical realization we only expect to see clusters of size \(\exp \{ns\}\) with \(\Sigma (s)\geqslant 0\). As a result the solution space should be dominated (with high probability) by clusters of size \(s_{\max }\) where

$$\begin{aligned}s_{\max } \equiv s_{\max }(\alpha ) \equiv {{\,\mathrm{arg\,max}\,}}\{ s+\Sigma (s) : \Sigma (s)\geqslant 0 \}\,. \end{aligned}$$

Since \(\Sigma \) is continuous, \(s_{\max }\) is the largest root of \(\Sigma \), and for \(\alpha \in (\alpha _\text {cond}, \alpha _\text {sat})\) we should have

$$\begin{aligned}Z\doteq \exp \{n s_{\max }\}\ll \mathbb {E}Z = \exp \{n[s_1+\Sigma (s_1)]\} \end{aligned}$$

(where the approximation for Z holds with high probability). The 1rsb free energy, formally given by Definition 1.3 below, should be interpreted as an expression for the function \({\textsf {f}}^\textsc {1rsb}(\alpha )=s_{\max }(\alpha )\).
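The interplay between \(s_1\), \(s_{\max }\), and the two growth exponents can be illustrated with a toy complexity curve. The quadratic \(\Sigma \) below is hypothetical (not the actual nae-sat complexity), tuned so that \(\Sigma (s_1)<0\), i.e., a condensed regime:

```python
import numpy as np

# Hypothetical concave complexity curve (NOT the nae-sat Sigma),
# tuned so that Sigma(s_1) < 0, i.e. a condensed regime.
c0, s0 = 0.01, 0.40
Sigma = lambda s: c0 - 2 * (s - s0) ** 2

s = np.linspace(0.0, 0.7, 200001)
s1 = s[np.argmax(s + Sigma(s))]      # maximizer of s + Sigma(s): Sigma'(s1) = -1
s_max = s[Sigma(s) >= 0].max()       # largest root of Sigma
print(s1, Sigma(s1))                 # Sigma(s1) < 0 here
print(s_max, s1 + Sigma(s1))         # typical exponent vs annealed exponent
```

In this toy example the annealed exponent \(s_1+\Sigma (s_1)\) strictly exceeds the typical exponent \(s_{\max }\), so \(\mathbb {E}Z\) is exponentially larger than a typical value of \(Z\).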

1.3 The tilted cluster partition function

From the discussion of Sect. 1.2 we see that once the function \(\Sigma (s;\alpha )\) is determined, it is possible to derive \(\alpha _\text {cond}\), \(\alpha _\text {sat}\), and \({\textsf {f}}(\alpha )\). However, previous works have not taken the approach of actually computing \(\Sigma \). Indeed, \(\alpha _\text {sat}\) was determined [25] by an analysis involving only \(\max _s\Sigma (s;\alpha )\), which contains less information than the full curve \(\Sigma \). Related work on the exact determination (in a range of models) of \(\alpha _\text {cond}\) [8, 10, 15] also avoids computing \(\Sigma \), reasoning instead via the so-called “planted model.”

In order to compute the free energy, however, we cannot avoid computing (some version of) the function \(\Sigma \), which we will do by a physics-inspired approach. First consider the \(\lambda \)-tilted partition function

$$\begin{aligned} \bar{\varvec{Z}}_\lambda \equiv \sum _{\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})} |\varvec{\gamma }|^\lambda \,,\end{aligned}$$
(5)

where \(\mathrm {\textsf {{CL}}}(\mathscr {G})\) denotes the set of solution clusters of \(\mathscr {G}\), and \(|\varvec{\gamma }|\) denotes the number of satisfying assignments inside the cluster \(\varvec{\gamma }\). According to the conjectural picture described above, we should have

$$\begin{aligned} \mathbb {E}\bar{\varvec{Z}}_\lambda \doteq \sum _s (\exp \{ns\})^\lambda \exp \{n\Sigma (s)\} \doteq \exp \{n{\mathfrak {F}}(\lambda )\} \end{aligned}$$

where \({\mathfrak {F}}\) is the Legendre dual of \(-\Sigma \):

$$\begin{aligned} {\mathfrak {F}}(\lambda ) \equiv (-\Sigma )^\star (\lambda ) \equiv \max _s \bigg \{ \lambda s + \Sigma (s) \bigg \} = \lambda s_\lambda + \Sigma (s_\lambda )\,,\end{aligned}$$
(6)

where \(s_\lambda \equiv {{\,\mathrm{arg\,max}\,}}_s[\lambda s + \Sigma (s)]\). Moreover, if \(\Sigma (s_\lambda )\geqslant 0\), then \(\bar{\varvec{Z}}_\lambda \) should concentrate near \(\mathbb {E}\bar{\varvec{Z}}_\lambda \), and in this regime physicists have an exact prediction for \({\mathfrak {F}}(\lambda )\), which will be further discussed below in Sect. 1.5. In short, the physics approach to computing \(\Sigma \) is to first compute \({\mathfrak {F}}(\lambda )\) (in the regime where \(\Sigma (s_\lambda )\geqslant 0\)), and then set \(\Sigma = -{\mathfrak {F}}^\star \). Note that by differentiating \({\mathfrak {F}}(\lambda ) = n^{-1}\ln \mathbb {E}\bar{\varvec{Z}}_\lambda \) twice in \(\lambda \) we find that \({\mathfrak {F}}\) is convex, so the resulting \(\Sigma \) will indeed be concave.
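The duality between \(\Sigma \) and \({\mathfrak {F}}\) can be checked numerically on a toy concave curve (again a hypothetical quadratic, not the nae-sat complexity): computing \({\mathfrak {F}}\) by (6) and then inverting via \(\Sigma (s)=\min _\lambda \{{\mathfrak {F}}(\lambda )-\lambda s\}\) recovers \(\Sigma \):

```python
import numpy as np

Sigma = lambda s: 0.05 - (s - 0.3) ** 2   # toy concave curve (hypothetical)

s = np.linspace(-1.0, 1.5, 5001)
lam = np.linspace(-2.0, 2.0, 4001)

# F(lambda) = max_s [lambda*s + Sigma(s)], the Legendre dual of -Sigma, as in (6).
F = np.array([(l * s + Sigma(s)).max() for l in lam])

# Since Sigma is concave, it is recovered as Sigma(s) = min_lambda [F(lambda) - lambda*s].
s_test = 0.45
Sigma_rec = (F - lam * s_test).min()
print(Sigma_rec, Sigma(s_test))   # the two values agree up to discretization
```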

At first glance the reduction to computing \({\mathfrak {F}}(\lambda )\) may not seem to improve matters. It is not immediately clear how “clusters” should be defined. It turns out that in the regime we are studying, a reasonable definition is that two nae-sat solutions are connected if they differ by a single bit, and the clusters are the connected components of the solution space. A typical nae-sat solution will have a positive density of variables which are free, meaning their value can be changed without violating any clause; any such solution must belong to a cluster of exponential size. Each cluster may be a complicated subset of \(\{{\texttt {0}},{\texttt {1}}\}^n\)—changing the value at one free variable may affect whether its neighbors are free, so a cluster need not be a simple subcube of \(\{{\texttt {0}},{\texttt {1}}\}^n\). Nevertheless, we wish to sum over the cluster sizes raised to non-integer powers. This computation is made tractable by constructing more explicit combinatorial models for the nae-sat solution clusters, as we next describe.
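This definition of clusters can be made concrete by brute force on a toy instance. The formula below is hand-picked with \(n=5\) variables and four width-3 clauses (it is not degree-regular, and is far outside the large-\(k\) regime of this paper; it serves only to illustrate the definition):

```python
import itertools

# Hand-picked toy formula (width-3 clauses, NOT degree-regular);
# each clause is a list of (variable, negated) literal pairs.
clauses = [[(0, False), (1, False), (2, True)],
           [(1, True), (2, False), (3, False)],
           [(0, True), (3, True), (4, False)],
           [(2, False), (3, True), (4, True)]]
n = 5

def is_nae(x):
    """True iff no clause evaluates all of its literals equally."""
    return all(len({x[v] ^ neg for v, neg in cl}) == 2 for cl in clauses)

sols = {x for x in itertools.product((0, 1), repeat=n) if is_nae(x)}

# Clusters: connected components of the solution set under single-bit flips.
clusters, seen = [], set()
for x in sols:
    if x in seen:
        continue
    comp, stack = set(), [x]
    while stack:
        y = stack.pop()
        if y in comp:
            continue
        comp.add(y)
        for i in range(n):
            z = y[:i] + (1 - y[i],) + y[i + 1:]
            if z in sols:
                stack.append(z)
    seen |= comp
    clusters.append(comp)

print(len(sols), sorted(len(c) for c in clusters))
```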

1.4 Modeling solution clusters

In our regime of interest (i.e., \(k\geqslant k_0\) and \(\alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}\); see Remark 1.1), the analysis of nae-sat solution clusters is greatly simplified by the fact that in a typical satisfying assignment the vast majority of variables are frozen rather than free. The result of this, roughly speaking, is that a cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) can be encoded by a configuration \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^n\) (representing its circumscribed subcube, so \(x_v={\texttt {f}}\) indicates a free variable) with no essential loss of information. This is formalized by a combinatorial model of “frozen configurations” representing the clusters (Definition 2.2). These frozen configurations can be viewed as the solutions of a certain csp lifted from the original nae-sat problem — so the physics heuristics can be applied again to (the randomized version of) the lifted model. Variations on this idea appear in several places in the physics literature; in the specific context of random csps we refer to [13, 34, 42]. Analyzing the number of frozen configurations, corresponding to (5) with \(\lambda =0\), yields the satisfiability threshold for this model [25].

Analyzing (5) for general \(\lambda \) requires deeper investigation of the arrangement of free variables in a typical frozen configuration \(\underline{{x}}\). For this purpose it is convenient to view an nae-sat instance as a (bipartite) graph \(\mathscr {G}\), where the vertices are given by variables and clauses, and the edges indicate which variables participate in which clauses (the formal description appears in Sect. 2). A key piece of intuition is that if we consider the subgraph of \(\mathscr {G}\) induced by the free variables, together with the clauses through which they interact, then this subgraph is predominantly comprised of disjoint components \(\varvec{T}\) of bounded size. (In fact, the majority of free variables are simply isolated vertices; a smaller fraction occur in linked pairs; a yet smaller fraction occur in components of size three or more.) Each free component \(\varvec{T}\) is surrounded by frozen variables, and we let \(z(\varvec{T})\) be the number of nae-sat assignments on \(\varvec{T}\) which are consistent with the frozen boundary. Since disjoint components \(\varvec{T},\varvec{T}'\) do not interact, the size of the cluster represented by \(\underline{{x}}\) is simply the product of \(z(\varvec{T})\) over all \(\varvec{T}\).

Another key observation is that the random nae-sat graph has few short cycles, so almost all of the free components will be trees. As a result, their weights \(z(\varvec{T})\) can be evaluated recursively by belief propagation (bp), a well-known dynamic programming method (see e.g. [33, Ch. 14]). In the rsb heuristic framework, a cluster is represented by a vector \(\underline{{{\texttt {m}}}}\) of “messages,” indexed by the directed edges of the nae-sat graph \(\mathscr {G}\). Informally, for a given cluster, and for any variable v adjacent to any clause a,

$$\begin{aligned}&{\texttt {m}}_{v\rightarrow a}\text { represents the ``within-cluster law of }{\varvec{x}}_v\text { in absence of }a'';\nonumber \\&{\texttt {m}}_{a\rightarrow v}\text { represents the ``within-cluster law of } {\varvec{x}}_v\text { in absence of }\partial v{\setminus } a'', \end{aligned}$$
(7)

where \(\partial v\) denotes the neighboring clauses of v. Each message is a probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\), and the messages are related to one another via local consistency equations, which are known as the bp equations. A configuration \(\underline{{{\texttt {m}}}}\) which satisfies all the local consistency equations is a bp solution. Thus a cluster \(\varvec{\gamma }\) can be encoded either by a frozen configuration \(\underline{{x}}\) or by a bp solution \(\underline{{{\texttt {m}}}}\); the latter has the key advantage that the size of \(\varvec{\gamma }\) can be easily read off from \(\underline{{{\texttt {m}}}}\), as a certain product of local functions. For the cluster size raised to power \(\lambda \), simply raise each local function to power \(\lambda \). Thus the configurations \(\underline{{{\texttt {m}}}}\) with \(\lambda \)-tilted weights form a spin system (Markov random field), whose partition function is the quantity of interest (5). This new spin system is sometimes termed the “auxiliary model” (e.g. [33, Ch. 19]).

1.5 One-step replica symmetry breaking

In Sect. 1.4 we described informally how a solution cluster \(\varvec{\gamma }\) can be encoded by a frozen configuration \(\underline{{x}}\), or a bp solution \(\underline{{{\texttt {m}}}}\). An important caveat is that the converse need not hold. In the nae-sat model, for any value of \(\alpha \), a trivial bp solution is always given by the “replica symmetric fixed point” (also called the “factorized fixed point”), where every \({\texttt {m}}_{v\rightarrow a}\) is the uniform measure on \(\{{\texttt {0}},{\texttt {1}}\}\). However, above \(\alpha _\text {cond}\), this is a spurious solution. One way to see this is via the heuristic “cavity calculation” of \({\textsf {f}}^\textsc {rs}(\alpha )\), which we now describe to motivate the more complicated expression for \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

Given a random regular nae-sat instance \(\mathscr {G}\) on n variables, choose k variables \(v_1,\ldots ,v_k\) uniformly at random, and assume for simplicity that no two of these share a clause. Remove the k variables along with their kd incident clauses, producing the “cavity graph” \(\mathscr {G}''\). Then add \(d(k-1)\) new clauses to \(\mathscr {G}''\), producing the graph \(\mathscr {G}'\). Under this operation (cf. [7]), \(\mathscr {G}'\) is distributed as a random regular nae-sat instance on \(n-k\) variables. If the free energy \({\textsf {f}}(\alpha )=\lim _{n\rightarrow \infty } Z^{1/n}\) exists, then we would expect it to agree asymptotically with

$$\begin{aligned} \bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}')}\bigg )^{1/k} =\bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}'')}\bigg )^{1/k} \bigg / \bigg (\frac{Z(\mathscr {G}')}{Z(\mathscr {G}'')}\bigg )^{1/k}\,. \end{aligned}$$
(8)

Let U denote the set of “cavity neighbors”: the variables in \(\mathscr {G}''\) of degree \(d-1\), which neighbored the clauses that were deleted from \(\mathscr {G}\). Then \(\mathscr {G}\) and \(\mathscr {G}'\) differ from \(\mathscr {G}''\) only in the addition of a few small subgraphs which are attached to U. Computing \(Z(\mathscr {G})/Z(\mathscr {G}'')\) or \(Z(\mathscr {G}')/Z(\mathscr {G}'')\) reduces to understanding the joint law of the spins \(({\varvec{x}}_u)_{u\in U}\) under the nae-sat model defined by \(\mathscr {G}''\). Since \(\mathscr {G}\) is unlikely to have many cycles, the vertices of U are typically far apart from one another in \(\mathscr {G}''\). Therefore, one plausible scenario is that their spins are approximately independent under the nae-sat model on \(\mathscr {G}''\), with \({\varvec{x}}_u\) marginally distributed according to \({\texttt {m}}_{u\rightarrow a}\) where a is the deleted clause that neighbored u in \(\mathscr {G}\). If this is the case, then each \({\texttt {m}}_{u\rightarrow a}\) must be uniform over \(\{{\texttt {0}},{\texttt {1}}\}\), by the negation symmetry of nae-sat. Under this assumption, we can calculate

$$\begin{aligned} \bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}'')}\bigg )^{1/k} = 2(1-2/2^k)^d,\quad \bigg (\frac{Z(\mathscr {G}')}{Z(\mathscr {G}'')}\bigg )^{1/k} = (1-2/2^k)^{\alpha (k-1)}\,. \end{aligned}$$
(9)

Substituting into (8) gives the replica symmetric free energy prediction \({\textsf {f}}(\alpha )\doteq {\textsf {f}}^\textsc {rs}(\alpha )\), which we know to be false for large \(\alpha \) (in particular, it is inconsistent with the known satisfiability threshold). Thus the replica symmetric fixed point, \({\texttt {m}}_{u\rightarrow a}= \text {unif}(\{{\texttt {0}},{\texttt {1}}\})\) for all \(u\rightarrow a\), is a spurious bp solution. In reality the \({\varvec{x}}_u\) are not approximately independent in \(\mathscr {G}''\), even though the u’s are far apart. This phenomenon of non-negligible long-range dependence may be taken as a definition of replica symmetry breaking (rsb) in this setting, and occurs precisely for \(\alpha \) larger than \(\alpha _\text {cond}\).
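The algebra behind this substitution is a one-line check: with \(d=\alpha k\), dividing the two ratios in (9) leaves \(2(1-2/2^k)^{d-\alpha (k-1)}=2(1-2/2^k)^{\alpha }\), which is exactly (2). A quick numerical confirmation (toy values of k and d chosen arbitrarily):

```python
# Check that substituting (9) into (8) recovers the RS free energy (2):
# 2(1-2/2^k)^d / (1-2/2^k)^{alpha(k-1)} = 2(1-2/2^k)^alpha  when d = alpha*k.
def cavity_ratio(k, d):
    alpha = d / k
    p = 1 - 2 / 2 ** k
    return (2 * p ** d) / p ** (alpha * (k - 1))

def f_rs(k, d):
    return 2 * (1 - 2 / 2 ** k) ** (d / k)

for k, d in [(3, 6), (4, 10), (10, 300)]:
    print(k, d, abs(cavity_ratio(k, d) - f_rs(k, d)) < 1e-12)
```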

Since above \(\alpha _\text {cond}\) the partition function cannot be estimated by (9) due to replica symmetry breaking, a different approach is needed. To this end, the one-step rsb (1rsb) heuristic posits that even when the original nae-sat model exhibits rsb, the (seemingly more complicated) “auxiliary model” of \(\lambda \)-weighted bp solutions \(\underline{{{\texttt {m}}}}\) is replica symmetric, for \(\lambda \) small enough: conjecturally, as long as \(\Sigma (s_\lambda )\geqslant 0\) for \(s_\lambda \equiv {{\,\mathrm{arg\,max}\,}}_s \{ \lambda s+\Sigma (s)\}\) (cf. the discussion below (6)). That is, for such \(\lambda \), the auxiliary model is predicted to have correlation decay, in contrast with the long-range correlations of the original model. This would mean that in the auxiliary model of the cavity graph \(\mathscr {G}''\), the spins \(({\texttt {m}}_{u\rightarrow a})_{u\in U}\) are approximately independent, with each \({\texttt {m}}_{u\rightarrow a}\) marginally distributed according to some law \({\dot{q}}_{u\rightarrow a}\). The model has a replica symmetric fixed point, \({\dot{q}}_{u\rightarrow a}={\dot{q}}_\lambda \) for all \(u\rightarrow a\) (the analogue of \({\texttt {m}}_{u\rightarrow a}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\) for all \(u\rightarrow a\)). If we substitute this assumption into the cavity calculation (the analogues of (8) and (9)), we obtain the replica symmetric prediction for the auxiliary model free energy \({\mathfrak {F}}(\lambda )\), expressed as a function of \({\dot{q}}_\lambda \). As explained above, from \({\mathfrak {F}}(\lambda )\) we can derive the complexity function \(\Sigma (s)\) and the 1rsb nae-sat free energy \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

1.6 The 1RSB free energy prediction

Having described the heuristic reasoning, we now proceed to formally state the 1rsb free energy prediction. We first describe \({\dot{q}}_\lambda \) as a certain discrete probability measure over \({\texttt {m}}\). Since \({\texttt {m}}\) is a probability measure over \(\{{\texttt {0}},{\texttt {1}}\}\), we encode it by \(x\equiv {\texttt {m}}({\texttt {1}})\in [0,1]\). A measure q on \({\texttt {m}}\) can thus be encoded by an element \(\mu \in {\mathscr {P}}\) where \({\mathscr {P}}\) denotes the set of discrete probability measures on [0, 1]. For measurable \(B\subseteq [0,1]\), define

$$\begin{aligned} \hat{\mathscr {R}}_\lambda \mu (B)&\equiv \frac{1}{\hat{{\mathscr {Z}}}(\mu )} \int \bigg (2-\prod _{i=1}^{k-1}x_{i}-\prod _{i=1}^{k-1}(1-x_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \frac{1-\prod _{i=1}^{k-1}x_{i}}{2- \prod _{i=1}^{k-1}x_{i} - \prod _{i=1}^{k-1}(1-x_{i})} \in B\bigg \} \, \prod _{i=1}^{k-1}{\mu }(dx_{i})\,,\nonumber \\ \dot{\mathscr {R}}_\lambda \mu (B)&\equiv \frac{1}{\dot{{\mathscr {Z}}}(\mu )} \int \bigg (\prod _{i=1}^{d-1}y_{i}+\prod _{i=1}^{d-1}(1-y_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \frac{\prod _{i=1}^{d-1}y_{i}}{\prod _{i=1}^{d-1}y_{i}+\prod _{i=1}^{d-1}(1-y_{i})} \in B \bigg \} \, \prod _{i=1}^{d-1}\mu (dy_{i})\,, \end{aligned}$$
(10)

where \(\hat{{\mathscr {Z}}}(\mu )\) and \(\dot{{\mathscr {Z}}}(\mu )\) are the normalizing constants such that \(\hat{\mathscr {R}}_\lambda \mu \) and \(\dot{\mathscr {R}}_\lambda \mu \) are also probability measures on [0, 1]. (In the context of \(\lambda =0\) we take the convention that \(0^0=0\).) Denote \(\mathscr {R}_{\lambda } \equiv \dot{\mathscr {R}}_{\lambda }\circ \hat{\mathscr {R}}_{\lambda }\). The map \(\mathscr {R}_{\lambda }:{\mathscr {P}}\rightarrow {\mathscr {P}}\) represents the bp recursion for the auxiliary model. The following proposition presents a solution for \(\alpha \) in the interval \((\alpha _\text {lbd},\alpha _\text {ubd})\), which we recall (Remark 1.1) is a superset of \((\alpha _\text {cond},\alpha _\text {sat})\).

Proposition 1.2

(proved in “Appendix B” ) For \(\lambda \in [0,1]\), let \(\dot{\mu }_{\lambda ,0}\equiv \frac{1}{2} \delta _0 + \frac{1}{2} \delta _1\in {\mathscr {P}}\), and define recursively \(\dot{\mu }_{\lambda ,l+1} = \mathscr {R}_\lambda \dot{\mu }_{\lambda ,l}\in {\mathscr {P}}\) for all \(l\geqslant 0\). Define \(S_l \equiv ({{\,\mathrm{supp}\,}}\dot{\mu }_{\lambda ,l}) {\setminus } ({{\,\mathrm{supp}\,}}( \dot{\mu }_{\lambda ,0} +\ldots +\dot{\mu }_{\lambda ,l-1} ))\); this is a finite subset of [0, 1]. Regard \(\dot{\mu }_{\lambda ,l}\) as an infinite sequence indexed by the elements of \(S_1\) in increasing order, followed by the elements of \(S_2\) in increasing order, and so on. For \(k\geqslant k_0\) and \(\alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}\), in the limit \(l\rightarrow \infty \), \(\dot{\mu }_{\lambda ,l}\) converges in the \(\ell ^1\) sequence space to a limit \(\dot{\mu }_\lambda \in {\mathscr {P}}\) satisfying the fixed point equation \(\dot{\mu }_\lambda = \mathscr {R}_\lambda \dot{\mu }_\lambda \), as well as the estimates \(\dot{\mu }_\lambda ((0,1))\leqslant 7/2^k\) and \(\dot{\mu }_\lambda (dx) = \dot{\mu }_\lambda (d(1-x))\).
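Since \({\mathscr {P}}\) consists of discrete measures, the recursion (10) can be iterated exactly by enumerating tuples of atoms. The following is a minimal Python sketch (ours, not from the paper), encoding a measure as a dict from atoms in [0, 1] to masses; the choice \(k=d=3\), \(\lambda =1\) is purely illustrative and far below the \(k_0\) assumed in Proposition 1.2.

```python
from itertools import product
from math import prod

def hat_step(mu, k, lam):
    # One application of the clause map in (10): enumerate (k-1)-tuples of
    # atoms of mu, reweight by the lambda-th power, and push forward.
    out = {}
    for xs in product(mu, repeat=k - 1):
        w = 2 - prod(xs) - prod(1 - x for x in xs)
        if w <= 0:  # zero-weight tuples are dropped (convention 0^0 = 0)
            continue
        atom = (1 - prod(xs)) / w
        out[atom] = out.get(atom, 0.0) + w ** lam * prod(mu[x] for x in xs)
    Z = sum(out.values())
    return {a: m / Z for a, m in out.items()}, Z

def dot_step(mu, d, lam):
    # One application of the variable map in (10), on (d-1)-tuples.
    out = {}
    for ys in product(mu, repeat=d - 1):
        w = prod(ys) + prod(1 - y for y in ys)
        if w <= 0:  # zero-weight tuples are excluded (and would give 0/0)
            continue
        atom = prod(ys) / w
        out[atom] = out.get(atom, 0.0) + w ** lam * prod(mu[y] for y in ys)
    Z = sum(out.values())
    return {a: m / Z for a, m in out.items()}, Z

# R_lambda = dot_step o hat_step, iterated from (delta_0 + delta_1)/2
# as in Proposition 1.2.
k, d, lam = 3, 3, 1.0
mu = {0.0: 0.5, 1.0: 0.5}
for _ in range(5):
    hat_mu, _ = hat_step(mu, k, lam)
    mu, _ = dot_step(hat_mu, d, lam)
```

Each iterate remains symmetric under \(x\mapsto 1-x\), in line with the symmetry \(\dot{\mu }_\lambda (dx) = \dot{\mu }_\lambda (d(1-x))\) asserted in Proposition 1.2.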

The limit \(\dot{\mu }_\lambda \) of Proposition 1.2 encodes the desired replica symmetric solution \({\dot{q}}_\lambda \) for the auxiliary model. We can then express \({\mathfrak {F}}(\lambda )\) in terms of \(\dot{\mu }_\lambda \) as follows. Writing \(\hat{\mu }_\lambda \equiv \hat{{\mathscr {R}}}_\lambda \dot{\mu }_\lambda \), let \({\dot{w}}_\lambda ,{\hat{w}}_\lambda ,{\bar{w}}_\lambda \in {\mathscr {P}}\) be defined by

$$\begin{aligned} {\dot{w}}_{\lambda }(B)&= ({\dot{\mathfrak {Z}}}_{\lambda })^{-1} \int \bigg (\prod _{i=1}^{d}y_{i}+\prod _{i=1}^{d}(1-y_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \prod _{i=1}^{d}y_{i} +\prod _{i=1}^{d}(1-y_{i})\in B \bigg \} \prod _{i=1}^{d}\hat{\mu }_{\lambda }(dy_{i})\,,\nonumber \\ {\hat{w}}_{\lambda }(B)&= ({\hat{\mathfrak {Z}}}_{\lambda })^{-1} \int \bigg (1-\prod _{i=1}^{k}x_{i}-\prod _{i=1}^{k}(1-x_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ 1-\prod _{i=1}^{k}x_{i}-\prod _{i=1}^{k}(1-x_{i})\in B \bigg \} \prod _{i=1}^{k}\dot{\mu }_\lambda (dx_{i})\,,\nonumber \\ {\bar{w}}_{\lambda }(B)&= ({\bar{\mathfrak {Z}}}_{\lambda })^{-1} \iint \bigg (xy+(1-x)(1-y)\bigg )^{\lambda } {\mathbf {1}}\Big \{ xy+(1-x)(1-y)\in B \Big \} \dot{\mu }_\lambda (dx)\hat{\mu }_{\lambda }(dy) \,, \end{aligned}$$
(11)

with \(\dot{\mathfrak {Z}}_{\lambda },\hat{\mathfrak {Z}}_{\lambda },\bar{\mathfrak {Z}}_{\lambda }\) the normalizing constants. The analogue of (9) for this model is

$$\begin{aligned}\bigg (\frac{\bar{\varvec{Z}}_\lambda (\mathscr {G})}{\bar{\varvec{Z}}_\lambda (\mathscr {G}'')}\bigg )^{1/k} =\dot{\mathfrak {Z}}_\lambda (\hat{\mathfrak {Z}}_\lambda /\bar{\mathfrak {Z}}_\lambda )^d,\quad \bigg (\frac{\bar{\varvec{Z}}_\lambda (\mathscr {G}')}{\bar{\varvec{Z}}_\lambda (\mathscr {G}'')}\bigg )^{1/k} = (\hat{\mathfrak {Z}}_\lambda )^{\alpha (k-1)}, \end{aligned}$$

and substituting into (8) gives the 1rsb prediction \(\bar{\varvec{Z}}_\lambda \doteq \exp \{{\mathfrak {F}}(\lambda )\}\) where

$$\begin{aligned} {\mathfrak {F}}(\lambda )\equiv {\mathfrak {F}}(\lambda ;\alpha ) \equiv \ln \dot{\mathfrak {Z}}_{\lambda } + \alpha \ln \hat{\mathfrak {Z}}_{\lambda } - k\alpha \ln \bar{\mathfrak {Z}}_{\lambda }\,. \end{aligned}$$
(12)

Further, the maximizer of \(s\mapsto (\lambda s+\Sigma (s))\) is predicted to be given by

$$\begin{aligned} s_\lambda \equiv s_\lambda (\alpha )\equiv \int \ln (x) {\dot{w}}_\lambda (dx) + \alpha \int \ln (x) {\hat{w}}_\lambda (dx) - k\alpha \int \ln (x) {\bar{w}}_\lambda (dx)\,. \end{aligned}$$
(13)

If \(s=s_\lambda \) for \(\lambda \in [0,1]\) then we define

$$\begin{aligned} \Sigma (s) \equiv \Sigma (s;\alpha ) \equiv {\mathfrak {F}}(\lambda ;\alpha ) -\lambda s_\lambda (\alpha )\,. \end{aligned}$$
(14)
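For discrete \(\dot{\mu }_\lambda ,\hat{\mu }_\lambda \) (encoded as dicts of atoms), the normalizers in (11) and the quantities (12)–(14) are finite sums. A hedged numerical sketch, with function name and encoding our own:

```python
from itertools import product
from math import prod, log

def free_energy_terms(dot_mu, hat_mu, d, k, lam):
    # Evaluate the normalizers of (11) together with the integrals
    # int log(x) dw(x) appearing in (13); returns (F, s, F - lam*s).
    def aggregate(tuples, value):
        Z = S = 0.0
        for t, p in tuples:
            v = value(t)
            if v <= 0:  # zero-weight terms are excluded (0^0 = 0)
                continue
            Z += p * v ** lam
            S += p * v ** lam * log(v)
        return Z, S / Z
    dot_tuples = [(t, prod(hat_mu[y] for y in t))
                  for t in product(hat_mu, repeat=d)]
    Zdot, Ldot = aggregate(dot_tuples,
                           lambda ys: prod(ys) + prod(1 - y for y in ys))
    hat_tuples = [(t, prod(dot_mu[x] for x in t))
                  for t in product(dot_mu, repeat=k)]
    Zhat, Lhat = aggregate(hat_tuples,
                           lambda xs: 1 - prod(xs) - prod(1 - x for x in xs))
    bar_tuples = [((x, y), dot_mu[x] * hat_mu[y])
                  for x in dot_mu for y in hat_mu]
    Zbar, Lbar = aggregate(bar_tuples,
                           lambda t: t[0] * t[1] + (1 - t[0]) * (1 - t[1]))
    alpha = d / k
    F = log(Zdot) + alpha * log(Zhat) - k * alpha * log(Zbar)  # (12)
    s = Ldot + alpha * Lhat - k * alpha * Lbar                 # (13)
    return F, s, F - lam * s                                   # (14)
```

As a consistency check, at the replica symmetric point \(\dot{\mu }=\hat{\mu }=\delta _{1/2}\) (the encoding of \({\texttt {m}}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\), which one can check is a fixed point of \(\mathscr {R}_\lambda \)), the formulas give \({\mathfrak {F}}(\lambda )=\lambda \ln {\textsf {f}}^\textsc {rs}\) and \(s_\lambda =\ln {\textsf {f}}^\textsc {rs}\), hence \(\Sigma =0\).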

We then use (14) to define the thresholds

$$\begin{aligned} \alpha _\text {cond}&\equiv \sup \{\alpha : \Sigma (s_1;\alpha )>0\}\,,\\ \alpha _\text {sat}&\equiv \sup \{\alpha : \Sigma (s_0;\alpha )>0\}\,. \end{aligned}$$

We can now formally state the predicted free energy of the original nae-sat model:

Definition 1.3

For \(\alpha \in k^{-1}{\mathbb {Z}}\), the 1rsb free energy prediction \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is defined as

$$\begin{aligned} {\textsf {f}}^\textsc {1rsb}(\alpha ) = {\left\{ \begin{array}{ll} {\textsf {f}}^\textsc {rs}(\alpha ) =2(1-2/2^k)^\alpha &{} \text {for }\alpha \leqslant \alpha _\text {cond},\\ \exp [\sup \{ s : \Sigma (s) \geqslant 0\}] &{} \text {for }\alpha _\text {cond}\leqslant \alpha < \alpha _\text {sat},\\ 0 &{}\text {for }\alpha > \alpha _\text {sat}. \end{array}\right. } \end{aligned}$$
(15)

(In regular k-nae-sat we must have integer \(d=k\alpha \), so we need not consider \(\alpha \notin k^{-1}{\mathbb {Z}}\).)

Proposition 1.4

(proved in “Appendix B” ) Assume \(k\geqslant k_0\) and write \(A\equiv [\alpha _\text {lbd}, \alpha _\text {ubd}] \cap (k^{-1}{\mathbb {Z}})\).

  a.

    For each \(\alpha \in A\), the function \(s\mapsto \Sigma (s;\alpha )\) is well-defined, continuous, and strictly decreasing in s.

  b.

    For each \(0\leqslant \lambda \leqslant 1\), the function \(\alpha \mapsto \Sigma (s_\lambda ;\alpha )= {\mathfrak {F}}(\lambda ) -\lambda s_\lambda \) is strictly decreasing with respect to \(\alpha \in A\). There is a unique \(\alpha _\lambda \in A\) such that \(\Sigma (s_\lambda ;\alpha )\) is nonnegative for all \(\alpha \leqslant \alpha _\lambda \), and is negative for all \(\alpha >\alpha _\lambda \). Taking \(\lambda =0\) we recover the estimate (1); and taking \(\lambda =1\) we obtain in addition

    $$\begin{aligned} \alpha _\text {cond}=\alpha _1 = (2^{k-1}-1)\ln 2+ \epsilon _k\,.\end{aligned}$$
    (16)

(The main purpose of this proposition is to show that \(\Sigma (s_1)<0\) for all \(\alpha \in (\alpha _\text {cond},\alpha _\text {sat})\), i.e., that the “condensation regime” is a contiguous range of values of \(\alpha \). The expansion of \(\alpha _{\text {cond}}\) matches an earlier result of [20], which was obtained for a slightly different but closely related model.)

1.7 Proof approach

Since \({\textsf {f}}={\textsf {f}}(\alpha )\) is a priori not well-defined, the statement \({\textsf {f}}\leqslant \textsf {g}\) means formally that for all \(\epsilon >0\), \(\mathbb {P}( Z^{1/n} \geqslant \textsf {g}+\epsilon )\) tends to zero as \(n\rightarrow \infty \). With this notation, we will prove separately the upper bound \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) and the matching lower bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\). This implies the main result Theorem 1: the free energy \({\textsf {f}}(\alpha )\) is indeed well-defined, and equals \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

The upper bound is proved by an interpolation argument, which we defer to “Appendix E”. This argument builds on similar bounds for spin glasses on Erdős–Rényi graphs [26, 43], together with ideas from [12, 27] for interpolation in random regular models. Let \(Z_n(\beta )\) denote the partition function of nae-sat at inverse temperature \(\beta >0\). The interpolation method yields an upper bound on \(\mathbb {E}\ln Z_n(\beta )\) which is expressed as the infimum of a certain function \({\mathcal {P}}(\mu ;\beta )\), with \(\mu \) ranging over probability measures on [0, 1]. We then choose \(\mu \) according to Proposition 1.2, and take \(\beta \rightarrow \infty \) to obtain the desired bound \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\).

Most of the paper is devoted to establishing the matching lower bound. The proof strategy is inspired by the physics picture described above, and at a high level proceeds as follows. Take any \(\lambda \) such that \(\Sigma (s_\lambda )\) (as defined by (13) and (14)) is nonnegative, and let \(\varvec{Y}_\lambda \) be the number of clusters of size roughly \(\exp \{ns_\lambda \}\). (As discussed in §1.3, we shall think of a cluster as a connected component of the solution space.) The informal statement of what we show is that

$$\begin{aligned} \varvec{Y}_\lambda \doteq \exp \{ n[ \lambda s_\lambda + \Sigma (s_\lambda ) ] \}\,.\end{aligned}$$
(17)

Adjusting \(\lambda \) as indicated by (15) then proves the desired bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\).

Proving a formalized version of (17) occupies a significant part of the present paper. We introduce a slightly modified version of the messages \({\texttt {m}}\) which record the topologies of the free trees \(\varvec{T}\). We then restrict to cluster encodings in which every free tree has fewer than T variables, which limits the distance that information can propagate between free variables. We prove a version of (17) for every fixed T, and show that this yields the sharp lower bound in the limit \(T\rightarrow \infty \). The proof of (17) for fixed T is via the moment method for the auxiliary model, which boils down to a complicated optimization problem over many dimensions. It is known (see e.g. [25, Lem. 3.6]) that stationary points of the optimization problem correspond to “generalized” bp fixed points—these are measures \(Q_{v\rightarrow a}({\texttt {m}}_{v\rightarrow a},{\texttt {m}}_{a\rightarrow v})\), rather than the simpler “one-sided” measures \(q_{v\rightarrow a}({\texttt {m}}_{v\rightarrow a})\) considered in the 1rsb heuristic.

The one-sided property is a crucial simplification in physics calculations (cf. [33, Proposition 19.4]), but is challenging to prove in general. One contribution of this work that we wish to highlight is a novel resampling argument which yields a reduction to one-sided messages, and allows us to solve the moment optimization problem. (We are helped here by the truncation on the sizes of free trees.) Furthermore, the approach allows us to bring in methods from large deviations theory. With these we can show that the objective function has negative-definite Hessian at the optimizer, which is necessary for the second moment method. This resampling approach is quite general and should apply in a broad range of models.

1.8 Open problems

Beyond the free energy, it remains a challenge to establish the full picture predicted by statistical physicists for \(\alpha \leqslant \alpha _\text {sat}\). We refer the reader to several recent works targeted at a broad class of models in the regime \(\alpha \leqslant \alpha _\text {cond}\) [9, 16, 19], and to work on the location of \(\alpha _\text {cond}\) in a general family of models [14]. In the condensation regime \((\alpha _\text {cond},\alpha _\text {sat})\), an initial step would be to show that most solutions lie within a bounded number of clusters. A much more refined prediction is that the mass distribution among the largest clusters forms a Poisson–Dirichlet process. Another question is to show that on a typical problem instance over n variables, if \(\underline{{{\varvec{x}}}}^1,\underline{{{\varvec{x}}}}^2\) are sampled independently and uniformly at random from the solutions of that instance, then the normalized overlap \(R_{1,2}\equiv n^{-1}|\{v:{\varvec{x}}^1_v={\varvec{x}}^2_v\}|\) concentrates on two values (corresponding roughly to the two cases that \(\underline{{{\varvec{x}}}}^1,\underline{{{\varvec{x}}}}^2\) come from the same cluster, or from different clusters)—this criterion is sometimes taken as the precise definition of 1rsb. During the final revision stage of this paper, some of the above questions were addressed by a new preprint [40].

Beyond the immediate context of random csps, understanding the condensation transition may deepen our understanding of the stochastic block model, a model for random networks with underlying community structure. Here again ideas from statistical physics have played an important role [21]. A great deal is now known rigorously for the case of two blocks [32, 37], where there is no condensation regime. For models with more than two blocks, however, it is predicted that condensation can occur, and may define a regime where detection is information-theoretically possible but computationally intractable. A condensation threshold has been established for the anti-ferromagnetic Potts model, corresponding to the disassortative regime of the stochastic block model. An analogous transition is expected in the ferromagnetic (assortative) case, and this remains open.

2 Combinatorial model

In this section we formalize a combinatorial model of clusters, which allows us to rigorously lower bound the tilted cluster partition function (5). We begin by reviewing the (standard) graphical depiction of nae-sat. A not-all-equal-sat (nae-sat) problem instance is naturally represented by a bipartite factor graph \(\mathscr {G}\) with signed edges, as follows. The vertex set of \(\mathscr {G}\) is partitioned into a set \(V=\{v_1,\ldots ,v_n\}\) of variables and a set \(F=\{a_1,\ldots ,a_m\}\) of clauses; we then have a set E of edges joining variables to clauses. For each edge \(e\in E\) we write v(e) for the incident variable, and a(e) for the incident clause; and we assign an edge literal \({\texttt {L}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) to indicate whether v(e) participates affirmatively (\({\texttt {L}}_e={\texttt {0}}\)) or negatively (\({\texttt {L}}_e={\texttt {1}}\)) in a(e). We define all edges to have length one-half, so two variables \(v\ne v'\) lie at unit distance if and only if they appear in the same clause. Throughout this paper we denote \(\mathcal {G}\equiv (V,F,E)\) for the bipartite graph without edge literals, and \(\mathscr {G}\equiv (V,F,E,\underline{{\texttt {L}}})\equiv (\mathcal {G},\underline{{\texttt {L}}})\) for the nae-sat instance.

Formally we regard the edges E as a permutation, as follows. Each variable \(v\in V\) has incident half-edges \(\delta v\), while each clause \(a\in F\) has incident half-edges \(\delta a\). Write \(\delta V\) for the labelled set of all variable-incident half-edges, and \(\delta F\) for the labelled set of all clause-incident half-edges; we require that \(|\delta V|=|\delta F|=\ell \). Then any permutation \({\mathfrak {m}}\) of \([\ell ]\equiv \{1,\ldots ,\ell \}\) defines E by specifying a matching between \(\delta V\) and \(\delta F\). Note that any permutation of \([\ell ]\) is permitted, so multi-edges can occur. In this paper we assume that the graph is \((d,k)\)-regular: each variable has d incident edges, and each clause has k incident edges, so \(|E|=nd=mk\). A random k-nae-sat instance is given by \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) where \(|V|=n\), \(|F|=m\), E is given by a uniformly random permutation \({\mathfrak {m}}\) of [nd], and \(\underline{{\texttt {L}}}\) is a uniformly random sample from \(\{{\texttt {0}},{\texttt {1}}\}^E\). We write \(\mathbb {P}\) and \(\mathbb {E}\) for probability and expectation over the law of \(\mathscr {G}\).
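The half-edge matching just described is straightforward to simulate. A minimal sketch (our own convention: a clause is a list of (variable, literal) pairs):

```python
import random
from collections import Counter

def random_regular_nae_sat(n, d, k, rng=None):
    # Match the n*d variable half-edges to the m*k clause half-edges via
    # a uniform permutation (multi-edges allowed), then attach i.i.d.
    # uniform edge literals; returns m clauses of (variable, literal).
    rng = rng or random.Random()
    assert (n * d) % k == 0
    m = n * d // k
    half = [v for v in range(n) for _ in range(d)]  # owner of each half-edge
    rng.shuffle(half)
    return [[(half[a * k + j], rng.randrange(2)) for j in range(k)]
            for a in range(m)]

clauses = random_regular_nae_sat(n=6, d=3, k=3, rng=random.Random(0))
degrees = Counter(v for cl in clauses for v, L in cl)  # every count is d
```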

Definition 2.1

(solutions and clusters) A solution of the nae-sat problem instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) is any assignment \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) such that for all \(a\in F\), \(({\texttt {L}}_e \oplus {\varvec{x}}_{v(e)})_{e\in \delta a}\) is neither identically \({\texttt {0}}\) nor identically \({\texttt {1}}\). Let \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\subseteq \{{\texttt {0}},{\texttt {1}}\}^V\) denote the set of all solutions of \(\mathscr {G}\), and define a graph on \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\) by connecting any pair of solutions at unit Hamming distance. The (maximal) connected components of this graph are the solution clusters; the set of clusters is hereafter denoted \(\mathrm {\textsf {{CL}}}(\mathscr {G})\).
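On tiny instances, Definition 2.1 can be checked by exhaustive enumeration. A brute-force sketch (ours; exponential in n, illustration only, with clauses encoded as lists of (variable, literal) pairs):

```python
from itertools import product

def nae_solutions(n, clauses):
    # All x in {0,1}^V such that no clause evaluates all-equal: the
    # values L xor x_v within each clause must not be constant.
    return [x for x in product((0, 1), repeat=n)
            if all(len({L ^ x[v] for v, L in cl}) == 2 for cl in clauses)]

def num_clusters(sols):
    # Connected components of the graph on SOL(G) joining solutions at
    # unit Hamming distance (Definition 2.1), by depth-first search.
    todo, comps = set(sols), 0
    while todo:
        comps, stack = comps + 1, [todo.pop()]
        while stack:
            x = stack.pop()
            for i in range(len(x)):
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                if y in todo:
                    todo.remove(y)
                    stack.append(y)
    return comps
```

For a single clause on three variables with all literals \({\texttt {0}}\), the six non-constant assignments form one cluster; a doubled binary clause yields two singleton clusters.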

The aim of this section is to establish that (under a certain restriction) the nae-sat solution clusters can be represented by a combinatorial model of what we will term “colorings.” We will describe the correspondence in a few stages. Informally, the progression is given by

$$\begin{aligned}&\textsc {nae-sat}\, \text {solution clusters }\varvec{\gamma }\leftrightarrow \text {frozen configurations }\underline{{x}} \nonumber \\&\quad \leftrightarrow \text {warning configurations }\underline{{y}} \leftrightarrow \text {message configurations }\underline{{\tau }}\leftrightarrow \text {colorings }\underline{{\sigma }}. \end{aligned}$$
(18)

Each step of (18) is formalized below. As mentioned previously, the key feature of the last model is that the size of a cluster \(\varvec{\gamma }\) can be easily read off from its corresponding coloring \(\underline{{\sigma }}\), as a product of local functions. Some steps of the correspondence (18) appear in existing literature (see [13, 25, 33, 34, 42]) but we present them here in detail for completeness.

2.1 Frozen and warning configurations

We introduce a new value \({\texttt {f}}\) (free), and adopt the convention \({\texttt {0}}\oplus {\texttt {f}}\equiv {\texttt {f}}\equiv {\texttt {1}}\oplus {\texttt {f}}\). For \(l\geqslant 1\) and \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\) let \(I^\textsc {nae}(\underline{{x}})\) be the indicator that \(\underline{{x}}\) is neither identically \({\texttt {0}}\) nor identically \({\texttt {1}}\). Given an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) and an assignment \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\), denote

$$\begin{aligned} I^\textsc {nae}(\underline{{x}};\mathscr {G}) \equiv \prod _{a\in F} I^\textsc {nae}( ({\texttt {L}}_e\oplus x_{v(e)})_{e\in \delta a} )\,. \end{aligned}$$

By Definition 2.1, an nae-sat solution is an assignment \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) satisfying \(I^\textsc {nae}(\underline{{{\varvec{x}}}};\mathscr {G})=1\).

Definition 2.2

(frozen configurations) Given an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), for any \(e\in E\) let \(\mathscr {G}\oplus {\texttt {1}}_e\) denote the instance obtained by flipping the edge label \({\texttt {L}}_e\) to \({\texttt {L}}_e\oplus {\texttt {1}}\). We say that \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) is a valid frozen configuration on \(\mathscr {G}\) if (i) no nae-sat constraint is violated, meaning \(I^\textsc {nae}(\underline{{x}};\mathscr {G})=1\); and (ii) for all \(v\in V\), \(x_v\) takes a value in \(\{{\texttt {0}},{\texttt {1}}\}\) only when forced to do so, meaning there is some \(e\in \delta v\) such that

$$\begin{aligned} I^\textsc {nae}(\underline{{x}};\mathscr {G}\oplus {\texttt {1}}_e)=0\,. \end{aligned}$$
(19)

If no such \(e\in \delta v\) exists then \(x_v={\texttt {f}}\).
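Condition (19) also drives the standard "coarsening" procedure recalled in the next paragraph: every variable with no edge satisfying (19) is freed, and this is iterated to a fixed point. A brute-force sketch (our own encoding: clauses as (variable, literal) pairs, with 'f' for free):

```python
def coarsen(x, clauses):
    # Whitening: repeatedly set x_v = 'f' for every variable that is not
    # forced, i.e. for which no clause would become all-equal if the
    # literal on one of v's edges were flipped (condition (19)).
    x = list(x)
    changed = True
    while changed:
        changed = False
        for v in range(len(x)):
            if x[v] == 'f':
                continue
            forced = False
            for cl in clauses:
                for j, (u, L) in enumerate(cl):
                    if u != v:
                        continue
                    # After flipping L, v's entry is 1 ^ L ^ x[v]; the
                    # clause is violated iff all other entries match it
                    # (an 'f' entry never matches).
                    target = 1 ^ L ^ x[v]
                    if all(x[w] != 'f' and (L2 ^ x[w]) == target
                           for i, (w, L2) in enumerate(cl) if i != j):
                        forced = True
            if not forced:
                x[v] = 'f'
                changed = True
    return tuple(x)
```

Freeing a variable can only remove forcings elsewhere, so the loop reaches a fixed point, which is then a valid frozen configuration.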

It is well known that on any given \(\mathscr {G}\), every nae-sat solution \(\underline{{{\varvec{x}}}}\) can be mapped to a frozen configuration \(\underline{{x}}=\underline{{x}}(\underline{{{\varvec{x}}}})\) via a “coarsening” or “whitening” procedure [42], as follows. Initialize \(\underline{{x}}=\underline{{{\varvec{x}}}}\). Then, whenever \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\) but there exists no \(e\in \delta v\) such that (19) holds, update \(x_v\) to \({\texttt {f}}\). Iterate until no further updates can be made; the result is then a valid frozen configuration. Two nae-sat solutions \(\underline{{{\varvec{x}}}}\), \(\underline{{{\varvec{x}}}}'\) map to the same frozen configuration \(\underline{{x}}\) if and only if they lie in the same cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\). Thus, for any given \(\mathscr {G}\), we have a well-defined mapping from clusters \(\varvec{\gamma }\) to frozen configurations \(\underline{{x}}\). This map is one-to-one but not necessarily onto: for instance, the all-free assignment \(\underline{{x}}\equiv {\texttt {f}}\) is always trivially a valid frozen configuration, but on many instances \(\mathscr {G}\) there is no solution cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) whose coarsening is \(\underline{{x}}\equiv {\texttt {f}}\). Since the aim is to lower bound the cluster partition function, the lack of surjectivity must be addressed. We will do so momentarily (Definition 2.4 below), but first we review a useful alternative representation of frozen configurations:

Definition 2.3

(warning configurations) For integers \(l\geqslant 1\), define functions \(\dot{Y}: \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\rightarrow \{{\texttt {0}},{\texttt {1}},{\texttt {f}},\texttt {z}\}\) and \(\hat{Y}: \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\rightarrow \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}\) by

$$\begin{aligned} \dot{Y}(\underline{{{\hat{y}}}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }{\texttt {0}}\in \{{\hat{y}}_i\} \subseteq \{{\texttt {0}},{\texttt {f}}\},\\ {\texttt {1}}&{}\text {if }{\texttt {1}}\in \{{\hat{y}}_i\}\subseteq \{{\texttt {1}},{\texttt {f}}\},\\ {\texttt {f}}&{}\text {if }\{{\hat{y}}_i\}=\{{\texttt {f}}\},\\ \texttt {z}&{}\text {otherwise;}\end{array}\right. } \quad \hat{Y}(\underline{{{\dot{y}}}}) = {\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }\{{\dot{y}}_i\}=\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }\{{\dot{y}}_i\}=\{{\texttt {0}}\},\\ {\texttt {f}}&{}\text {otherwise.}\end{array}\right. }\end{aligned}$$

Denote \(M\equiv \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^2\). We write \(\underline{{y}}\in M^E\) if \(\underline{{y}}=(y_e)_{e\in E}\) where \(y_e\equiv ({\dot{y}}_e,{\hat{y}}_e)\in M\). If edge e joins variable v to clause a, then \({\dot{y}}_e\) represents a “warning” along e from v to a, while \({\hat{y}}_e\) represents a “warning” along e from a to v. We say that \(\underline{{y}}\in M^E\) is a valid warning configuration on \(\mathscr {G}\) if it satisfies the local equations

$$\begin{aligned} y_e=({\dot{y}}_e,{\hat{y}}_e) =\Big ( \dot{Y}( \underline{{{\hat{y}}}}_{\delta v(e){\setminus } e}), {\texttt {L}}_e \oplus \hat{Y}( (\underline{{\texttt {L}}}\oplus \underline{{{\dot{y}}}})_{\delta a(e){\setminus } e} ) \Big )\end{aligned}$$
(20)

for all \(e\in E\) (with no \({\dot{y}}_e=\texttt {z}\)).
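The maps \(\dot{Y},\hat{Y}\) of Definition 2.3 translate directly into code. A sketch with the symbols \({\texttt {f}},\texttt {z}\) rendered as Python strings:

```python
def dot_Y(hat_ys):
    # Variable-side map: aggregate incoming clause warnings; 'z' marks a
    # contradiction (both a 0-warning and a 1-warning received).
    s = set(hat_ys)
    if 0 in s and 1 not in s:
        return 0
    if 1 in s and 0 not in s:
        return 1
    if s == {'f'}:
        return 'f'
    return 'z'

def hat_Y(dot_ys):
    # Clause-side map: a clause emits a forcing warning only when all the
    # other incident variables take a single value; note the nae flip.
    s = set(dot_ys)
    if s == {1}:
        return 0
    if s == {0}:
        return 1
    return 'f'
```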

It is well known that on any given \(\mathscr {G}\) there is a natural bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {frozen configurations}\\ \underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text {warning configurations}\\ \underline{{y}}\in M^E\end{array} \right\} \,. \end{aligned}$$
(21)

In the forward direction, given a (valid) frozen configuration \(\underline{{x}}\), for any variable v and any edge \(e\in \delta v\) such that (19) holds, set \({\hat{y}}_e=x_v \in \{{\texttt {0}},{\texttt {1}}\}\); then in all other cases set \({\hat{y}}_e={\texttt {f}}\). Then, having defined all the \({\hat{y}}_e\), the \({\dot{y}}_e\) are determined by the local equations (20). One can check that the resulting assignment \(\underline{{y}}\in M^E\) is a warning configuration. Conversely, given a warning configuration \(\underline{{y}}\), a frozen configuration \(\underline{{x}}\) can be obtained by setting \(x_v=\dot{Y}({\hat{y}}_{\delta v})\) for all v.

2.2 Message configurations

We return to the question of surjectivity: does a given frozen configuration \(\underline{{x}}\) encode a (nonempty) solution cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\)? We will now state an easy sufficient condition for this to hold. The condition is not in general necessary, but we will show that it captures enough of the solution space to deliver a sharp lower bound on the free energy.

Definition 2.4

(free cycles) Let \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) be a valid frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\). We say that a clause \(a\in F\) is separating (with respect to \(\underline{{x}}\)) if there exist \(e',e''\in \delta a\) such that \({\texttt {L}}_{e'}\oplus x_{v(e')}={\texttt {0}}\) while \({\texttt {L}}_{e''}\oplus x_{v(e'')}={\texttt {1}}\). In particular, a forcing clause is also separating. A cycle in \(\mathscr {G}\) is a sequence of edges

$$\begin{aligned} e_1 e_2 \ldots e_{2\ell -1} e_{2\ell } e_1, \end{aligned}$$

where, taking indices modulo \(2\ell \), it holds for each integer i that \(e_{2i-1}\) and \(e_{2i}\) are distinct but share a clause, while \(e_{2i}\) and \(e_{2i+1}\) are distinct but share a variable. (In particular, if v is joined to a by two edges \(e' \ne e''\), then \(e'e''\) forms a cycle.) We say the cycle in \(\mathscr {G}\) is free (with respect to \(\underline{{x}}\)) if all its variables are free and all its clauses are non-separating.

Definition 2.5

(free trees) Let \(\underline{{x}}\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that has no free cycles. Let H be the subgraph of \(\mathscr {G}\) induced by the free variables and non-separating clauses of \(\underline{{x}}\). Since \(\underline{{x}}\) has no free cycles, H must be a disjoint union of tree components \(\varvec{t}\), which we term the free trees of \(\underline{{x}}\). For each \(\varvec{t}\), let \(\varvec{T}\equiv \varvec{T}(\varvec{t})\) be the subgraph of \(\mathscr {G}\) induced by \(\varvec{t}\) together with its incident clauses. The subgraphs \(\varvec{T}\) (which can contain cycles) will be termed the free pieces of \(\underline{{x}}\). Each free variable is covered by exactly one free piece. In the simplest case, a free piece consists of a single free variable surrounded by d separating clauses.

Let us say that \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) extends \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) if \({\varvec{x}}_v=x_v\) for all v such that \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\). If \(\underline{{x}}\) is a frozen configuration on \(\mathscr {G}\) with no free cycles, it is easy to extend \(\underline{{x}}\) to valid nae-sat solutions \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\)—we simply extend \(\underline{{x}}\) on each free tree \(\varvec{t}\), since nae-sat on a tree is always solvable; the different free trees do not interact. Let \(\varvec{\gamma }\) denote the set of all valid nae-sat solutions on \(\mathscr {G}\) that extend \(\underline{{x}}\), and denote \(\mathrm {\textsf {{size}}}(\underline{{x}})\equiv |\varvec{\gamma }|\). Meanwhile, let \({\mathfrak {T}}(\underline{{x}})\) denote the set of all free pieces of \(\underline{{x}}\). For each \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\), let \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\) denote the number of valid nae-sat solutions on \(\varvec{T}\) that extend \(\underline{{x}}|_{\varvec{T}}\). It follows from our discussion that \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) with

$$\begin{aligned} |\varvec{\gamma }| = \mathrm {\textsf {{size}}}(\underline{{x}})=\prod _{\varvec{T}\in {\mathfrak {T}}(\underline{{x}})} \mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\,. \end{aligned}$$
(22)
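On small examples, both sides of (22) can be checked by brute force. A sketch (ours; exponential in the number of free variables, with the same (variable, literal) clause encoding used in our earlier illustrations):

```python
from itertools import product

def cluster_size(x, clauses):
    # |cluster| = number of NAE-SAT solutions extending the frozen
    # configuration x (entries in {0, 1, 'f'}), by brute force over the
    # free variables; (22) says this factorizes over the free pieces.
    free = [v for v, xv in enumerate(x) if xv == 'f']
    count = 0
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for v, b in zip(free, bits):
            y[v] = b
        if all(len({L ^ y[v] for v, L in cl}) == 2 for cl in clauses):
            count += 1
    return count
```

Two disjoint free pieces contribute multiplicatively: for two disjoint ternary clauses with all variables free, the count is \(6\times 6=36\), as in (22).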

That is to say, the absence of free cycles is an easy sufficient condition for a frozen configuration to encode a nonempty cluster; and it further ensures that the cluster has a relatively simple product structure (22). As noted previously, the structure within each free piece \(\varvec{T}\) can be understood by dynamic programming (bp). This is a well-known calculation (see e.g. [33, Ch. 14]) but we will review the details for our setting. To this end, we first introduce a combinatorial model of “message configurations” which will map directly to the natural bp variables.

Recall from Definition 2.3 that a warning configuration is denoted \(\underline{{y}}\in M^E\) where each \(y_e\equiv ({\dot{y}}_e,{\hat{y}}_e)\in M\). We denote a message configuration by \(\underline{{\tau }}\in {\mathscr {M}}^E\) where each \(\tau _e=(\dot{\tau }_e,\hat{\tau }_e)\in {\mathscr {M}}\) (for \({\mathscr {M}}\) to be defined below). It will be convenient to let \({\textsc {e}}\) indicate a directed edge, pointing from tail vertex \(t({\textsc {e}})\) to head vertex \(h({\textsc {e}})\). If e is the undirected version of \({\textsc {e}}\), then we denote

$$\begin{aligned}(y_{{\textsc {e}}},\tau _{{\textsc {e}}}) ={\left\{ \begin{array}{ll} ({\dot{y}}_e,\dot{\tau }_e) &{}\text {if }t({\textsc {e}})\text { is a variable,}\\ ({\hat{y}}_e,\hat{\tau }_e) &{}\text {if }t({\textsc {e}})\text { is a clause.} \end{array}\right. }\end{aligned}$$

We will make a definition such that \(\tau _{{\textsc {e}}}\) either takes the value “\(\star \)” or is a bipartite factor tree. The tree is unlabelled except that one vertex is distinguished as the root, and some edges are assigned \({\texttt {0}}\) or \({\texttt {1}}\) values as explained below. The root of \(\tau _{{\textsc {e}}}\) is required to have degree one, and should be thought of as corresponding to the head vertex \(h({\textsc {e}})\).

In the context of message configurations \(\underline{{\tau }}\), we use “\({\texttt {0}}\)” or “\({\texttt {1}}\)” to stand for the tree consisting of a single edge which is labelled \({\texttt {0}}\) or \({\texttt {1}}\) and rooted at the endpoint corresponding to the head vertex—the root is the incident clause in the case of \(\dot{\tau }\), the incident variable in the case of \(\hat{\tau }\). We use \({\texttt {s}}\) to stand for the tree consisting of a single unlabelled edge, rooted at the incident variable; this will be related to the situation of separating clauses from Definition 2.4. Given a collection of rooted trees \(t_1,\ldots ,t_\ell \) whose roots \(o_1,\ldots ,o_\ell \) are all of the same type (either all variables or all clauses), we define \(t=\mathrm {\textsf {{join}}}(t_1,\ldots ,t_\ell )\) by identifying all the \(o_i\) as a single vertex o, then adding an edge which joins o to a new vertex \(o'\). The vertex o has the same type (variable or clause) as the \(o_i\); and the vertex \(o'\) is assigned the opposite type and becomes the root of t.

Definition 2.6

(message configurations) Start with \(\dot{\mathscr {M}}_0\equiv \{{\texttt {0}},{\texttt {1}},\star \}\) and \(\hat{\mathscr {M}}_0\equiv \varnothing \), and suppose inductively that \(\dot{\mathscr {M}}_t,\hat{\mathscr {M}}_t\) have been defined. For \(\underline{{\hat{\tau }}}\in (\hat{\mathscr {M}}_t)^{d-1}\) and \(\underline{{\dot{\tau }}}\in (\dot{\mathscr {M}}_t)^{k-1}\), let us abbreviate \(\{\hat{\tau }_i\}\equiv \{\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1}\}\) and \(\{\dot{\tau }_i\}\equiv \{\dot{\tau }_1,\ldots ,\dot{\tau }_{k-1}\}\). Define

$$\begin{aligned}\dot{T}(\underline{{\hat{\tau }}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }{\texttt {0}}\in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }{\texttt {1}}\in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}}\},\\ \texttt {z}&{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\subseteq \{\hat{\tau }_i\},\\ \star &{}\text {if }\star \in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}},{\texttt {1}}\},\\ \mathrm{{\textsf {{join}}}} \{\hat{\tau }_i\} &{}\text {if } \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}; \end{array}\right. }\quad \hat{T}(\underline{{\dot{\tau }}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }\{\dot{\tau }_i\}=\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }\{\dot{\tau }_i\}=\{{\texttt {0}}\},\\ {\texttt {s}}&{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\subseteq \{\dot{\tau }_i\},\\ \star &{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\not \subseteq \{\dot{\tau }_i\}\text { and } \star \in \{\dot{\tau }_i\},\\ \mathrm{{\textsf {{join}}}} \{\dot{\tau }_i\} &{}\text {otherwise.}\end{array}\right. } \end{aligned}$$

Then, for \(t\geqslant 0\), define recursively the sets

$$\begin{aligned} \hat{\mathscr {M}}_{t+1}&\equiv \hat{\mathscr {M}}_t \cup \hat{T}[(\dot{\mathscr {M}}_t)^{k-1}]\,,\\ \dot{\mathscr {M}}_{t+1}&\equiv \dot{\mathscr {M}}_t \cup \dot{T}[(\hat{\mathscr {M}}_{t+1})^{d-1}]{{\setminus }}\{\texttt {z}\} \end{aligned}$$

We then let \(\dot{\mathscr {M}}\) be the union of all the \(\dot{\mathscr {M}}_t\), let \(\hat{\mathscr {M}}\) be the union of all the \(\hat{\mathscr {M}}_t\), and let \({\mathscr {M}}=\dot{\mathscr {M}}\times \hat{\mathscr {M}}\). On \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), the assignment \(\underline{{\tau }}\in {\mathscr {M}}^E\) is a valid message configuration if (i) it satisfies the local equations

$$\begin{aligned} \tau _e =(\dot{\tau }_e,\hat{\tau }_e) =\Big ( \dot{T}( \underline{{\hat{\tau }}}_{\delta v(e){\setminus } e} ), {\texttt {L}}_e\oplus \hat{T}( (\underline{{\texttt {L}}}\oplus \underline{{\dot{\tau }}})_{\delta a(e){\setminus } e} ) \Big ) \end{aligned}$$
(23)

for all \(e\in E\) (with no \(\dot{\tau }_e=\texttt {z}\)), and (ii) if one element of \(\{\dot{\tau }_e,\hat{\tau }_e\}\) equals \(\star \) then the other element is in \(\{{\texttt {0}},{\texttt {1}}\}\). In (23), we take the convention that \({\texttt {L}}_e\oplus {\texttt {f}}={\texttt {f}}\) and \({\texttt {L}}_e\oplus \star =\star \), and if \(\tau \) is a tree with labels then \({\texttt {L}}_e\oplus \tau \) is defined by applying \({\texttt {L}}_e\oplus \cdot \) entrywise to all labels of \(\tau \). See Fig. 1.
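To make the case analysis of Definition 2.6 concrete, here is a minimal sketch in Python. The encoding is our own assumption: messages are the strings `"0"`, `"1"`, `"s"`, `"*"`, or nested tuples `("join", …)` for joined trees, and the names `dotT`, `hatT`, `join` are ours.

```python
def join(msgs):
    """Identify the roots of the input trees and attach a fresh root o'.

    Children are sorted so the encoding depends only on the multiset of
    subtrees, matching the unordered join of Definition 2.6.
    """
    return ("join",) + tuple(sorted(msgs, key=str))

def dotT(hats):
    """Variable-to-clause map: combine d-1 incoming clause messages."""
    s = set(hats)
    if "0" in s and "1" not in s:
        return "0"
    if "1" in s and "0" not in s:
        return "1"
    if {"0", "1"} <= s:
        return "z"              # contradictory value, excluded from M-dot
    if "*" in s:
        return "*"
    return join(hats)           # all inputs are proper trees

def hatT(dots):
    """Clause-to-variable map: combine k-1 incoming variable messages."""
    s = set(dots)
    if s == {"1"}:
        return "0"
    if s == {"0"}:
        return "1"
    if {"0", "1"} <= s:
        return "s"              # separating clause
    if "*" in s:
        return "*"
    return join(dots)
```

The ordering of the `if` branches reproduces the priority of the cases in the displayed definition; for example `hatT(["1", "1"])` returns `"0"` and `hatT(["0", "1"])` returns `"s"`, as in Fig. 1.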

Fig. 1
figure 1

Examples of messages (Definition 2.6). Variables are indicated by circle nodes, clauses by square nodes, and edges by lines. For simplicity we assume that all edges depicted have literals \({\texttt {L}}_e={\texttt {0}}\). Each message is shown with its root as a filled node at the top. The variable-to-clause messages \(\dot{\tau }\) are rooted at clauses, while the clause-to-variable messages \(\hat{\tau }\) are rooted at variables. The heavy lines indicate edges inside the message that are labelled \({\texttt {0}}\) or \({\texttt {1}}\). In our notation we have \(\dot{\tau }_0={\texttt {0}}\) and \(\dot{\tau }_1={\texttt {1}}\). Next \(\hat{\tau }_0={\hat{T}}(\dot{\tau }_1,\dot{\tau }_1)={\texttt {0}}\), while \(\hat{\tau }_1={\hat{T}}(\dot{\tau }_0,\dot{\tau }_1)={\texttt {s}}\). Finally \(\dot{\tau }_3={\dot{T}}(\hat{\tau }_1,\hat{\tau }_1) =\mathrm {\textsf {{join}}}(\hat{\tau }_1,\hat{\tau }_1)\) and \(\hat{\tau }_2={\hat{T}}(\dot{\tau }_3,\dot{\tau }_0) =\mathrm {\textsf {{join}}}(\dot{\tau }_3,\dot{\tau }_0)\)

Suppose \(\underline{{x}}\) is a frozen configuration on \(\mathscr {G}\), and let \(\underline{{y}}\) be its corresponding warning configuration from (21). Given \(\underline{{y}}\), we define \(\underline{{\tau }}\) in four stages:

1. If \({\dot{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) then set \(\dot{\tau }_e={\dot{y}}_e\); likewise if \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) then set \(\hat{\tau }_e={\hat{y}}_e\).

2. If \((\underline{{\texttt {L}}}\oplus \underline{{{\dot{y}}}})_{\delta a(e){\setminus } e}\) has both \({\texttt {0}}\) and \({\texttt {1}}\) entries, then set \(\hat{\tau }_e={\texttt {s}}\).

3. Apply the local Eq. (23) recursively to define \(\dot{\tau }_e,\hat{\tau }_e\) wherever possible.

4. Lastly, if any \(\dot{\tau }_e\) or \(\hat{\tau }_e\) remains undefined, then set it to \(\star \).

An example with \(\star \) messages is given in Fig. 2.

Fig. 2
figure 2

Example of how \(\star \) messages can arise in the mapping from \(\underline{{y}}\) to \(\underline{{\tau }}\) (Sect. 2.2). The figure shows a subgraph of \(\mathscr {G}\) with variables indicated by circle nodes, clauses by square nodes, and edges by lines. All edges in the figure are assumed to have label \({\texttt {L}}_e={\texttt {0}}\). All variables shown are frozen to \({\texttt {0}}\) or \({\texttt {1}}\), and all clauses shown are separating. To avoid clutter we did not label the edges with the warnings \(y_e\); instead, each variable v is labeled with its frozen configuration spin \(x_v\), according to the \(\underline{{x}}\leftrightarrow \underline{{y}}\) bijection (21). The clauses force the variables along the cycle in the clockwise direction, resulting in \(\star \) values in the final \(\underline{{\tau }}\) in the counterclockwise direction of the cycle. (Note also that \(\underline{{y}}\) can be recovered from \(\underline{{\tau }}\) by changing \(\star \) to \({\texttt {f}}\); cf. Lemma 2.7)

Lemma 2.7

The mapping described above defines a bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {frozen configurations } \underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\\ \text {without free cycles}\end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text {message configurations}\\ \underline{{\tau }}\in {\mathscr {M}}^E\end{array} \right\} \,. \end{aligned}$$

Proof

Let \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) without free cycles, and let \(\underline{{y}}\in M^E\) be the warning configuration which corresponds to \(\underline{{x}}\) via (21). We first check that the mapping \(\underline{{y}}\mapsto \underline{{\tau }}\), as described above, gives a message configuration which is valid, i.e., satisfies conditions (i) and (ii) of Definition 2.6. In the first stage, the mapping procedure sets \(\dot{\tau }_e={\dot{y}}_e\) whenever \({\dot{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\), and \(\hat{\tau }_e={\hat{y}}_e\) whenever \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\). One can argue by induction that the rest of the procedure does not create any additional \({\texttt {0}}\) or \({\texttt {1}}\) messages, so that in the final configuration the \(\{{\texttt {0}},{\texttt {1}}\}\) values of \(\underline{{\tau }}\) will match those of \(\underline{{y}}\). The second and third stages of the procedure are clearly consistent with the local Eq. (23). Note in particular that the third stage does not produce any \(\dot{\tau }_e=\texttt {z}\) message, because it would contradict the assumption that \(\underline{{y}}\) is a valid warning configuration; it also does not produce any \(\star \) message. All \(\star \) messages are created in the fourth stage, and this is clearly consistent with the mapping of \(\star \) messages under \(\dot{T}\) and \(\hat{T}\). This concludes the proof that \(\underline{{\tau }}\) satisfies condition (i) of Definition 2.6. To check condition (ii), suppose \(\tau _{{\textsc {e}}}=\star \), and let \({\textsc {f}}\) denote the reversal of \({\textsc {e}}\). From the above construction, it must be that \(y_{{\textsc {e}}}={\texttt {f}}\) and \(\tau _{{\textsc {e}}'}=\star \) for some \({\textsc {e}}'\) that points to the tail vertex \(t({\textsc {e}})\) but does not equal \({\textsc {f}}\). 
Consequently \({\textsc {e}}\) must belong to a directed cycle \({\textsc {e}}_1 {\textsc {e}}_2\ldots {\textsc {e}}_{2k} {\textsc {e}}_1\) with all the \(\tau _{{\textsc {e}}_i}\) equal to \(\star \). Whenever \({\textsc {e}}\) points from a separating clause a to a free variable v, we must have \(\tau _{{\textsc {e}}}={\texttt {s}}\). As a result, if all the variables along the cycle are free, then none of the clauses can be separating, contradicting the assumption that \(\underline{{x}}\) has no free cycles. Therefore some variable v on the cycle must take value \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\), and by relabelling we may assume \(v=t({\textsc {e}}_1)\). Let \({\textsc {f}}_i\) denote the reversal of \({\textsc {e}}_i\): since \(x_v\ne {\texttt {f}}\) but \(y_{{\textsc {e}}_1}={\texttt {f}}\), it must be that \(y_{{\textsc {f}}_1}=x_v\). This means that the clause \(a=h({\textsc {e}}_1)=t({\textsc {f}}_1)\) is forcing to v, so in particular \(y_{{\textsc {f}}_2}\in \{{\texttt {0}},{\texttt {1}}\}\). Continuing in this way we see that \(y_{{\textsc {f}}_i}\in \{{\texttt {0}},{\texttt {1}}\}\) for all i, and it follows that \(\underline{{\tau }}\) satisfies condition (ii), and so is a valid message configuration.

The mapping from \(\underline{{y}}\) to \(\underline{{\tau }}\) is clearly injective. To see that it is surjective, let \(\underline{{\tau }}\) be any valid message configuration. Projecting \(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}}\}\mapsto {\texttt {f}}\) and \(\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}}\}\mapsto {\texttt {f}}\) yields a valid warning configuration \(\underline{{y}}\), which in turn maps to a valid frozen configuration \(\underline{{x}}\). It remains then to check that \(\underline{{x}}\) has no free cycles. Indeed, along a free cycle, all the warnings (in either direction) must be \({\texttt {f}}\). This means none of the messages can be in \(\{{\texttt {0}},{\texttt {1}}\}\), and as a result none of the messages can be \(\star \), by condition (ii) of Definition 2.6. This means all the messages must be in \(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) or \(\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\). Suppose in one direction of the cycle we have the directed edges \({\textsc {e}}_1{\textsc {e}}_2\ldots {\textsc {e}}_{2k}{\textsc {e}}_1\). By definition of \(\dot{T}\) and \(\hat{T}\), \(\tau _{{\textsc {e}}_i}\) is a proper subtree of \(\tau _{{\textsc {e}}_{i+1}}\) for all i, with indices modulo 2k. Going around the cycle we find that \(\tau _{{\textsc {e}}_1}\) is a proper subtree of \(\tau _{{\textsc {e}}_{2k+1}}=\tau _{{\textsc {e}}_1}\), which gives the contradiction.\(\square \)

2.3 Bethe formula

We now describe the dynamic programming (belief propagation, or bp) calculation which will ultimately take a message configuration \(\underline{{\tau }}\) and evaluate a product of local functions to compute the size of its associated cluster. The first step is to define the dynamic programming variables; these will formalize the measures \({\texttt {m}}\) which were introduced previously in (7). Recall that for \(l\geqslant 1\) and \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\), we write \(I^\textsc {nae}(\underline{{x}})\) for the indicator that the entries of \(\underline{{x}}\) are not identically \({\texttt {0}}\) or identically \({\texttt {1}}\).

Definition 2.8

Recall that message configuration spins belong to the space \({\mathscr {M}}=\dot{\mathscr {M}}\times \hat{\mathscr {M}}\) (Definition 2.6). Let \({\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) denote the space of probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\). Define the mappings \(\dot{{\texttt {m}}}:\dot{\mathscr {M}}\rightarrow {\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) and \(\hat{{\texttt {m}}}:\hat{\mathscr {M}}\rightarrow {\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) as follows. For \(\dot{\tau }\in \{{\texttt {0}},{\texttt {1}}\}\) let \(\dot{{\texttt {m}}}(\dot{\tau })\) be the unit measure supported on \(\dot{\tau }\). Likewise, for \(\hat{\tau }\in \{{\texttt {0}},{\texttt {1}}\}\) let \(\hat{{\texttt {m}}}(\hat{\tau })\) be the unit measure supported on \(\hat{\tau }\). For \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) or \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) we let \(\dot{{\texttt {m}}}(\dot{\tau })\) and \(\hat{{\texttt {m}}}(\hat{\tau })\) be recursively defined: if \(\dot{\tau }=\dot{T}(\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1})\) where no \(\hat{\tau }_j=\star \), define

$$\begin{aligned} \dot{z}(\dot{\tau }) \equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^{d-1} [\hat{{\texttt {m}}}(\hat{\tau }_i)]({\varvec{x}})\,, \quad [\dot{{\texttt {m}}}(\dot{\tau })]({\varvec{x}}) \equiv \frac{1}{\dot{z}(\dot{\tau })} \prod _{i=1}^{d-1} [\hat{{\texttt {m}}}(\hat{\tau }_i)]({\varvec{x}})\,. \end{aligned}$$
(24)

Note that \(\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1}\) can be recovered from \(\dot{\tau }\) modulo permutation of the indices, so these quantities are well-defined. We see inductively that for \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), the normalizing factor \(\dot{z}(\dot{\tau })\) is positive, and \(\dot{{\texttt {m}}}(\dot{\tau })\) is a nondegenerate probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\). Similarly, if \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) equals \(\hat{T}(\dot{\tau }_1,\ldots ,\dot{\tau }_{k-1})\) where none of the \(\dot{\tau }_i\) are \(\star \), then set

$$\begin{aligned} \hat{z}(\hat{\tau })&\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \sum _{\underline{{\dot{{\varvec{x}}}}}\in \{{\texttt {0}},{\texttt {1}}\}^{k-1}} I^\textsc {nae}({\varvec{x}},\underline{{\dot{{\varvec{x}}}}}) \prod _{i=1}^{k-1} [\dot{{\texttt {m}}}(\dot{\tau }_i)](\dot{{\varvec{x}}}_i) =2 -\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\texttt {0}}) -\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\texttt {1}})\,,\nonumber \\ [\hat{{\texttt {m}}}(\hat{\tau })]({\varvec{x}})&\equiv \frac{1}{\hat{z}(\hat{\tau })} \sum _{\underline{{\dot{{\varvec{x}}}}}\in \{{\texttt {0}},{\texttt {1}}\}^{k-1}} I^\textsc {nae}({\varvec{x}},\underline{{\dot{{\varvec{x}}}}}) \prod _{i=1}^{k-1} [\dot{{\texttt {m}}}(\dot{\tau }_i)](\dot{{\varvec{x}}}_i) = \frac{1}{\hat{z}(\hat{\tau })} \bigg (1-\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\varvec{x}})\bigg )\,. \end{aligned}$$
(25)

Again, we see inductively that for \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), the normalizing factor \(\hat{z}(\hat{\tau })\) is positive, and \(\hat{{\texttt {m}}}(\hat{\tau })\) is a nondegenerate probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\). Finally, we will see below that for our purposes we can take \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) to be arbitrary nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\); we therefore define them both to equal the uniform measure on \(\{{\texttt {0}},{\texttt {1}}\}\).
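As a concrete sketch, the recursions (24) and (25) can be implemented directly, storing a measure on \(\{{\texttt {0}},{\texttt {1}}\}\) as the pair \((m({\texttt {0}}),m({\texttt {1}}))\); the function names `dot_m` and `hat_m` are our own. The final check verifies the closed form of \(\hat{z}\) in (25) by direct enumeration against the \(I^\textsc {nae}\) indicator.

```python
from fractions import Fraction
from itertools import product

def dot_m(hat_ms):
    """Recursion (24): combine d-1 incoming clause-to-variable measures."""
    p0 = p1 = Fraction(1)
    for m in hat_ms:
        p0 *= m[0]
        p1 *= m[1]
    z = p0 + p1                       # the normalizer z-dot
    return (p0 / z, p1 / z)

def hat_m(dot_ms):
    """Recursion (25): combine k-1 incoming variable-to-clause measures."""
    p0 = p1 = Fraction(1)
    for m in dot_ms:
        p0 *= m[0]
        p1 *= m[1]
    z = 2 - p0 - p1                   # the normalizer z-hat
    return ((1 - p0) / z, (1 - p1) / z)

# Check the closed form of z-hat against direct enumeration of I^nae,
# for k - 1 = 2 arbitrary nondegenerate input measures:
ms = [(Fraction(1, 3), Fraction(2, 3)), (Fraction(1, 2), Fraction(1, 2))]
direct = sum(
    ms[0][x1] * ms[1][x2]
    for x, x1, x2 in product((0, 1), repeat=3)
    if not x == x1 == x2              # the indicator I^nae(x, x1, x2)
)
assert direct == 2 - ms[0][0] * ms[1][0] - ms[0][1] * ms[1][1]
```

With two point masses at \({\texttt {1}}\) as inputs, `hat_m` returns the point mass at \({\texttt {0}}\), matching \(\hat{T}({\texttt {1}},{\texttt {1}})={\texttt {0}}\) from Fig. 1.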

Given a valid message configuration \(\underline{{\tau }}\) on \(\mathscr {G}\), define \(\underline{{{\texttt {m}}}} = ({\texttt {m}}_e)_{e\in E}\) where \({\texttt {m}}_e\equiv (\dot{{\texttt {m}}}_e,\hat{{\texttt {m}}}_e)\) with \(\dot{{\texttt {m}}}_e\equiv \dot{{\texttt {m}}}(\dot{\tau }_e)\) and \(\hat{{\texttt {m}}}_e\equiv \hat{{\texttt {m}}}(\hat{\tau }_e)\). It follows from Definition 2.8 that \(\underline{{{\texttt {m}}}}\) satisfies the following local consistency equations, which are inherited from the Eq. (23) satisfied by \(\underline{{\tau }}\), in combination with the above definitions (24) and (25). If \(\dot{\tau }_e\ne \star \), then \(\dot{{\texttt {m}}}_e\) is given by the equation

$$\begin{aligned} \dot{{\texttt {m}}}_e({\varvec{x}}) = \frac{1}{\dot{z}(\dot{\tau }_e)} \prod _{e'\in \delta v(e){\setminus } e} \hat{{\texttt {m}}}_{e'}({\varvec{x}}) \end{aligned}$$
(26)

for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). Likewise, if \(\hat{\tau }_e\ne \star \), then \(\hat{{\texttt {m}}}_e\) is given by the equation

$$\begin{aligned} \hat{{\texttt {m}}}_e({\varvec{x}}) = \frac{1}{\hat{z}(\hat{\tau }_e)} \sum _{ \underline{{\dot{{\varvec{x}}}}}_{\delta a(e)} \in \{{\texttt {0}},{\texttt {1}}\}^d } \mathbf {1}\{\dot{{\varvec{x}}}_e={\varvec{x}}\} I^\textsc {nae}( (\underline{{\dot{{\varvec{x}}}}}\oplus \underline{{\texttt {L}}})_{\delta a(e)} ) \prod _{e'\in \delta a(e){\setminus } e} \dot{{\texttt {m}}}_{e'}(\dot{{\varvec{x}}}_{e'}) \end{aligned}$$
(27)

for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). The Eqs. (26) and (27) are known as the bp equations. We now proceed to the calculation of the cluster size (22). To this end, we define the local functions

$$\begin{aligned} \bar{\varphi }(\dot{\tau },\hat{\tau })&\equiv \bigg \{ \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \dot{{\texttt {m}}}[\dot{\tau }]({\varvec{x}}) \cdot \hat{{\texttt {m}}}[\hat{\tau }]({\varvec{x}}) \bigg \}^{-1}\,,\nonumber \\ \hat{\varphi }^\text {lit}(\dot{\tau }_1,\ldots ,\dot{\tau }_k)&\equiv \sum _{\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^k} I^\textsc {nae}(\underline{{{\varvec{x}}}}) \prod _{i=1}^k \dot{{\texttt {m}}}[\dot{\tau }_i]({\varvec{x}}_i) =1-\sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^k \dot{{\texttt {m}}}[\dot{\tau }_i]({\varvec{x}}) = \frac{\hat{z}(\hat{T}( (\dot{\tau }_j)_{j\ne i} )) }{\bar{\varphi }(\dot{\tau }_i,\hat{T}( (\dot{\tau }_j)_{j\ne i} ))} \,,\nonumber \\ \dot{\varphi }(\hat{\tau }_1,\ldots ,\hat{\tau }_d)&\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^d\hat{{\texttt {m}}}[\hat{\tau }_i]({\varvec{x}}) = \frac{\dot{z}(\dot{T}((\hat{\tau }_j)_{j\ne i}))}{\bar{\varphi }(\hat{\tau }_i,\dot{T}((\hat{\tau }_j)_{j\ne i}))}\,, \end{aligned}$$
(28)

where the last identity in the last two lines holds for any choice of i. The bp calculation is summarized by the following:

Lemma 2.9

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles, and let \(\underline{{\tau }}\) be its corresponding message configuration from Lemma 2.7. Let \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) be a free piece of \(\underline{{x}}\), and let \(\varvec{t}\) be the free tree inside it. Then the number of nae-sat extensions of \(\underline{{x}}|_{\varvec{T}}\) on \(\varvec{T}\) is given by

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) =\prod _{v\in V({\varvec{t}})} \bigg \{ \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e\in \delta v}\bar{\varphi }(\tau _e) \bigg \} \prod _{a\in F({\varvec{t}})} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }} \oplus \underline{{\texttt {L}}})_{\delta a}) \end{aligned}$$
(29)

where \(V(\varvec{t})\) and \(F(\varvec{t})\) denote respectively the variables and clauses in \(\varvec{t}\). (An example calculation is worked out in Fig. 3.)

Fig. 3
figure 3

Example of correspondence (Lemma 2.7) between frozen and message configurations. Variables are indicated by circle nodes, clauses by square nodes, and edges by lines. The graph formed by the wavy lines is a free piece \(\varvec{T}\), with the free tree \(\varvec{t}\) in green and \(\varvec{T}{\setminus }\varvec{t}\) in blue (Definition 2.5). Each variable v is labelled with its frozen configuration spin value \(x_v\). The four separating clauses are indicated by filled black squares. The message configuration is only partially shown, with the remaining values given by the obvious symmetries. The clause-to-variable messages \(\hat{\tau }\) are shown in black, and the variable-to-clause messages \(\dot{\tau }\) are shown in purple. Each message is a tree, with root vertex shown as a filled node. The heavy black and purple lines indicate edges inside the messages that are labeled \({\texttt {1}}\). For instance, the message coming up out of the bottom variable is a tree consisting of a single edge, labelled \({\texttt {1}}\) (indicated in the figure by a heavy purple line), rooted at its incident clause. We then calculate on each edge the values \(m=( {\dot{{\texttt {m}}}[\dot{\tau }]({\texttt {1}})}, {\hat{{\texttt {m}}}[\hat{\tau }]({\texttt {1}})})\), and use this to determine the factors \(\dot{\varphi }\), \(\hat{\varphi }^\text {lit}\), \(\bar{\varphi }\) from (28). In this example, \(\varvec{t}\) has two free variables each with \(\dot{\varphi }=1/4\), four separating clauses each with \(\hat{\varphi }^\text {lit}=1\), and one non-separating clause with \(\hat{\varphi }^\text {lit}=3/4\). There are six edges incident to \(\varvec{t}\), each with \(\bar{\varphi }=2\). Multiplying all these factors together (Lemma 2.9) gives \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})=3\). 
Indeed, in this small example it is easy to see that there are exactly three nae-sat assignments extending the frozen configuration \(\underline{{x}}\) on \(\varvec{T}\), since the two free variables cannot both take value \({\texttt {1}}\), but the remaining three possibilities give valid nae-sat assignments (color figure online)
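The arithmetic of this small example is easy to check numerically. The sketch below uses exact rational arithmetic, with the factor values as read off the Fig. 3 caption; it multiplies the factors of Lemma 2.9 and compares against the brute-force count of extensions.

```python
from fractions import Fraction
from itertools import product

# phi-hat for the non-separating clause, via (28): its two free neighbours
# send uniform dot-measures and its frozen neighbour sends a point mass
phi_ns = 1 - Fraction(1, 2) * Fraction(1, 2) * 1 - Fraction(1, 2) * Fraction(1, 2) * 0
assert phi_ns == Fraction(3, 4)

# Lemma 2.9 product: two free variables (phi-dot = 1/4), four separating
# clauses (phi-hat = 1), one non-separating clause (3/4), six edges (bar-phi = 2)
size = Fraction(1, 4) ** 2 * 1 ** 4 * phi_ns * 2 ** 6

# brute force: the two free variables may take any joint value except (1, 1)
count = sum(1 for xs in product((0, 1), repeat=2) if xs != (1, 1))
assert size == count == 3
```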

Proof

As we have mentioned before, this calculation is well known (see e.g. [33, Ch. 14]) but we will review it here, beginning with a minor technical point. As noted in Definition 2.5, \(\varvec{t}\) is a tree but \(\varvec{T}\) has a cycle wherever a variable \(v\in \varvec{T}{\setminus }\varvec{t}\) is joined by more than one edge to \(\varvec{t}\). However, since \(\underline{{x}}|_{\varvec{T}{\setminus }\varvec{t}}\) is \(\{{\texttt {0}},{\texttt {1}}\}\)-valued, these cycles play no role in the question of extending \(\underline{{x}}|_{\varvec{T}}\) to a valid nae-sat assignment on \(\varvec{T}\)—one can simply duplicate variables in \(\varvec{T}{\setminus }\varvec{t}\) so that each one joins to \(\varvec{t}\) by exactly one edge. We may therefore assume for the rest of the proof that all the free pieces \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) are acyclic.

For any \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) and any edge \(e\in \varvec{T}\), delete from \(\varvec{T}\) the edges \(\delta a(e){\setminus } e\), and let \(\dot{\varvec{T}}_e\) denote the component containing e in what remains, rooted at a(e). Likewise, delete from \(\varvec{T}\) the edges \(\delta v(e){{\setminus }} e\), and let \(\hat{\varvec{T}}_e\) denote the component containing e in what remains, rooted at v(e). For each variable \(w\in \dot{\varvec{T}}_e{{\setminus }}\varvec{t}\), let \(\acute{x}_w\in \{{\texttt {0}},{\texttt {1}}\}\) be the boolean sum of \(x_w\) together with all the edge literals \({\texttt {L}}\) on the path joining w to a(e) in \(\dot{\varvec{T}}_e\). Note then that \(\dot{\tau }_e\) encodes the isomorphism class of \(\dot{\varvec{T}}_e\), labelled with boundary data \(\acute{x}_w\) (for all the variables \(w\in \dot{\varvec{T}}_e{\setminus }\varvec{t}\)). A similar relation holds between \(\hat{\tau }_e\) and \(\hat{\varvec{T}}_e\). For each \(e\in \varvec{T}\), let \(\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})\) count the number of valid nae-sat assignments that extend \(\underline{{x}}|_{\dot{\varvec{T}}_e}\) on \(\dot{\varvec{T}}_e\) and take value \({\varvec{x}}\) on v(e). Let \(\hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})\) count the number of valid nae-sat assignments that extend \(\underline{{x}}|_{\hat{\varvec{T}}_e}\) on \(\hat{\varvec{T}}_e\) and take value \({\varvec{x}}\) on v(e). Denote

$$\begin{aligned}\dot{\textsf {s}}_e(\underline{{x}})\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})\,,\quad \hat{\textsf {s}}_e(\underline{{x}}) \equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})\,. \end{aligned}$$

There are two boundary cases: if edge e joins a free variable in \(\varvec{t}\) to a separating clause in \(\varvec{T}{\setminus }\varvec{t}\), then we have \(\hat{\textsf {s}}_e({\texttt {0}};\underline{{x}}) = \hat{\textsf {s}}_e({\texttt {1}};\underline{{x}})=1\). If edge e instead joins a non-separating clause in \(\varvec{t}\) to a frozen variable in \(\varvec{T}{\setminus }\varvec{t}\), then we have \(\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})=\mathbf {1}\{{\varvec{x}}=x_{v(e)}\}\). By induction starting from these boundary cases we find that for all \(e\in \varvec{T}\),

$$\begin{aligned} \dot{{\texttt {m}}}_e({\varvec{x}}) =\frac{\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})}{\dot{\textsf {s}}_e(\underline{{x}})}\,,\quad \hat{{\texttt {m}}}_e({\varvec{x}}) =\frac{\hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})}{\hat{\textsf {s}}_e(\underline{{x}})}\,. \end{aligned}$$

It follows that for any variable \(v\in \varvec{t}\), any clause \(a\in \varvec{t}\), and any edge \(e\in \varvec{T}\), we have the identities

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) = \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e'\in \delta v} \hat{\textsf {s}}_{e'}(\underline{{x}}) = \hat{\varphi }^\text {lit}((\underline{\dot{\tau }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e'\in \delta a} \dot{\textsf {s}}_{e'}(\underline{{x}}) = \frac{\dot{\textsf {s}}_e(\underline{{x}}) \hat{\textsf {s}}_e(\underline{{x}})}{\bar{\varphi }(\tau _e)}\,.\end{aligned}$$

Combining the identities and rearranging gives (writing \(E(\varvec{t})\) for the edges of \(\varvec{t}\))

$$\begin{aligned} \prod _{v\in V({\varvec{t}})} \bigg \{ \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e\in \delta v}\bar{\varphi }(\tau _e) \bigg \} \prod _{a\in F({\varvec{t}})} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }} \oplus \underline{{\texttt {L}}})_{\delta a}) = \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}})^{|V(\varvec{t})|+|F(\varvec{t})|-|E(\varvec{t})|}\, \frac{\displaystyle \prod _{v\in V(\varvec{t})} \prod _{e\in \delta v{\setminus } E(\varvec{t})} \big [ \dot{\textsf {s}}_e(\underline{{x}})/\mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) \big ] }{\displaystyle \prod _{a\in F(\varvec{t})} \prod _{e\in \delta a{\setminus } E(\varvec{t})} \dot{\textsf {s}}_e(\underline{{x}}) }\,. \end{aligned}$$

For \(a\in F(\varvec{t})\) and \(e\in \delta a{\setminus }\varvec{t}\), the variable v(e) is frozen and so we have \(\dot{\textsf {s}}_e(\underline{{x}})=1\). For any \(v\in V(\varvec{t})\) and \(e\in \delta v{\setminus }\varvec{t}\), we have \(\dot{\textsf {s}}_e(\underline{{x}})=\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\). The tree \(\varvec{t}\) has Euler characteristic one, i.e., \(|V(\varvec{t})|+|F(\varvec{t})|-|E(\varvec{t})|=1\). The right-hand side of the above equation then simplifies to \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\), thereby proving the claim.\(\square \)

Corollary 2.10

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles, and let \(\underline{{\tau }}\) be its corresponding message configuration from Lemma 2.7. Then the number of valid nae-sat extensions of \(\underline{{x}}\) is given by the product formula

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}})=\prod _{v\in V} \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{a \in F} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E} \bar{\varphi }(\tau _e)\,. \end{aligned}$$

This identity holds as long as \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) are fixed nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\).

Proof

Let \(V'\) denote the set of free variables, and let \(E'\) denote the set of all edges incident to \(V'\). Let \(F'\) be the set of non-separating clauses. From (22) and Lemma 2.9 we have

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}}) =\prod _{{\varvec{T}}\in {\mathfrak {T}}(\underline{{x}})} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) =\prod _{v\in V'} {\dot{\varphi }}(\underline{\hat{\tau }}_{\delta v}) \prod _{a\in F'} {\hat{\varphi }^\text {lit}}((\underline{\dot{\tau }} \oplus {\underline{{\texttt {L}}}})_{\delta a}) \prod _{e\in E'}\bar{\varphi }(\tau _e)\,. \end{aligned}$$
(30)

For any edge \(e\in E{\setminus } E'\), the incident variable v(e) must lie in \(V{\setminus } V'\), meaning \(x_{v(e)}\in \{{\texttt {0}},{\texttt {1}}\}\). We now partition \(E{\setminus } E'\) into the disjoint union of \(E_{\texttt {r}}\) and \(E_{\texttt {b}}\), as follows. Let \(E_{\texttt {r}}\) be the set of edges \(e\in E{\setminus } E'\) such that \(\hat{{\texttt {m}}}_e\) is fully supported on \(x_{v(e)}\). Let \(E_{\texttt {b}}\) be the set of edges \(e\in E{\setminus } E'\) such that \(\hat{{\texttt {m}}}_e\) is a nondegenerate measure on \(\{{\texttt {0}},{\texttt {1}}\}\); note that \(\dot{{\texttt {m}}}_e\) must then be fully supported on \(x_{v(e)}\). Consider a clause \(a\in F{\setminus } F'\). If a is non-forcing, then \(\delta a\cap E_{\texttt {r}}=\varnothing \) and \(\hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a})=1\). Otherwise, a is forcing in the direction of some edge \(e\in \delta a\), in which case \(\delta a \cap E_{\texttt {r}}= \{e\}\) and \(\hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a}) = \dot{{\texttt {m}}}_e(x_{v(e)}) = 1/\bar{\varphi }(\tau _e)\). We conclude for all \(a\in F{\setminus } F'\) that

$$\begin{aligned} \hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in \delta a \cap E_{\texttt {r}}} \bar{\varphi }(\tau _e) = 1\,. \end{aligned}$$
(31)

For \(v\in V{\setminus } V'\), for all \(e\in \delta v\cap E_{\texttt {b}}\) we have \(\dot{{\texttt {m}}}_e(x_v)=1\) and so \(\bar{\varphi }(\tau _e)=1/\hat{{\texttt {m}}}_e(x_v)\). Thus, for all \(v\in V{\setminus } V'\),

$$\begin{aligned} \dot{\varphi }(\underline{{\hat{\tau }}}_{\delta v}) \prod _{e\in \delta v\cap E_{\texttt {b}}} \bar{\varphi }(\tau _e) =1\,. \end{aligned}$$
(32)

The identities (31) and (32) remain valid even for vertices incident to \(\star \) messages, as long as \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) are fixed nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\). Combining the identities with (30) proves the claim.\(\square \)
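The cancellations (31) and (32) can be checked numerically on small toy configurations. In the sketch below the measure values are illustrative choices of ours, and the helper functions follow the definitions in (28): a clause with \(k=3\) forcing one edge, and a frozen variable with \(d=3\) having one forcing (red) edge and two blue edges.

```python
from fractions import Fraction

def dot_phi(hat_ms):
    """phi-dot from (28): sum over x of the product of incoming hat-measures."""
    p0 = p1 = Fraction(1)
    for m in hat_ms:
        p0 *= m[0]
        p1 *= m[1]
    return p0 + p1

def hat_phi_lit(dot_ms):
    """phi-hat from (28): NAE weight 1 - prod m_i(0) - prod m_i(1)."""
    p0 = p1 = Fraction(1)
    for m in dot_ms:
        p0 *= m[0]
        p1 *= m[1]
    return 1 - p0 - p1

def bar_phi(dot_m, hat_m):
    """bar-phi from (28): reciprocal of the overlap of the two measures."""
    return 1 / (dot_m[0] * hat_m[0] + dot_m[1] * hat_m[1])

delta0 = (Fraction(1), Fraction(0))
delta1 = (Fraction(0), Fraction(1))

# (31): a clause forcing edge e -- the other two variables are frozen to 1
# (after literals), so the clause sends hat_m_e = delta_0 along e
dot_m_e = (Fraction(2, 3), Fraction(1, 3))      # arbitrary nondegenerate
assert hat_phi_lit([dot_m_e, delta1, delta1]) * bar_phi(dot_m_e, delta0) == 1

# (32): a frozen variable with x_v = 0 -- one red edge with hat_m = delta_0,
# two blue edges with nondegenerate hat_m and dot_m = delta_0
blue1 = (Fraction(1, 3), Fraction(2, 3))
blue2 = (Fraction(3, 5), Fraction(2, 5))
lhs = dot_phi([delta0, blue1, blue2]) * bar_phi(delta0, blue1) * bar_phi(delta0, blue2)
assert lhs == 1
```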

2.4 Colorings

We conclude this section by defining the coloring model, building on an encoding introduced by [18]. It is a simplification of the message configuration model (Definition 2.6) that takes advantage of some of the cancellations ((31) and (32)) seen above. In short, following the notation of Corollary 2.10, for edges in \(E{\setminus } E'\) it is not necessary to keep all the information of \(\tau _e\); instead, it suffices to keep track only of whether e belongs to \(E_{\texttt {r}}\) or \(E_{\texttt {b}}\), along with the value of \(x_{v(e)}\in \{{\texttt {0}},{\texttt {1}}\}\). The colorings encode precisely this information. The resulting bijection between colorings and message configurations is the last step of (18).

Recall that messages take values \(\tau \equiv (\dot{\tau },\hat{\tau })\in {\mathscr {M}}\equiv \dot{\mathscr {M}}\times \hat{\mathscr {M}}\) (Definition 2.6), and let \(\{{\texttt {f}}\}\subseteq {\mathscr {M}}\) denote the subset of values \(\tau \in {\mathscr {M}}\) where we have both \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) and \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\). Denote \(\Omega \equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}, {\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup \{{\texttt {f}}\}\). Define a projection \(\textsf {S}: {\mathscr {M}} \rightarrow \Omega \) by

$$\begin{aligned} \textsf {S}(\tau )={\left\{ \begin{array}{ll} {\texttt {r}}_{\texttt {0}}&{} \text {if }\hat{\tau }={\texttt {0}},\\ {\texttt {r}}_{\texttt {1}}&{} \text {if }\hat{\tau }={\texttt {1}},\\ {\texttt {b}}_{\texttt {0}}&{} \text {if }\hat{\tau }\ne {\texttt {0}}\text { and }\dot{\tau }={\texttt {0}},\\ {\texttt {b}}_{\texttt {1}}&{} \text {if }\hat{\tau }\ne {\texttt {1}}\text { and }\dot{\tau }={\texttt {1}},\\ \tau &{} \text {otherwise (meaning that }\tau \in \{{\texttt {f}}\}). \end{array}\right. } \end{aligned}$$
(33)

(Note that \(\textsf {S}(\tau )\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) includes the case \(\dot{\tau }=\star \), and \(\textsf {S}(\tau )\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) includes the case \(\hat{\tau }=\star \).) We define a partial inverse to \(\textsf {S}\) as follows. If \(\sigma \in \{{\texttt {f}}\}\) then define \(\tau \equiv \tau (\sigma )\equiv \sigma \equiv (\dot{\sigma },\hat{\sigma })\). If \(\sigma ={\texttt {r}}_{{\varvec{x}}}\) for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\) then define \(\hat{\tau }\equiv \hat{\tau }(\sigma )\equiv {\varvec{x}}\) and leave \(\dot{\tau }(\sigma )\) undefined. If \(\sigma ={\texttt {b}}_{{\varvec{x}}}\) for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\) then define \(\dot{\tau }\equiv \dot{\tau }(\sigma )\equiv {\varvec{x}}\) and leave \(\hat{\tau }(\sigma )\) undefined. For \(\sigma \in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) we denote \((\dot{\sigma },\hat{\sigma })\equiv (\sigma ,\sigma )\). The coloring model is the image of the message configuration model under the projection \(\textsf {S}\), formally given by the following:
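The case order of (33) translates directly into code. The sketch below uses our own string encoding (`"r0"` for \({\texttt {r}}_{\texttt {0}}\), and so on), with free messages kept as pairs of tree encodings:

```python
def S(dot, hat):
    """Projection (33). The r-cases take priority, so testing hat first
    makes the conditions hat != 0 (resp. hat != 1) automatic below."""
    if hat == "0":
        return "r0"
    if hat == "1":
        return "r1"
    if dot == "0":
        return "b0"
    if dot == "1":
        return "b1"
    return (dot, hat)      # a free message: both components are proper trees
```

In particular `S("*", "0")` gives `"r0"` and `S("0", "*")` gives `"b0"`, matching the parenthetical remark above about \(\star \) messages.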

Definition 2.11

(colorings) For \(\underline{{\sigma }}\in \Omega ^d\), abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _d\}\), and define

$$\begin{aligned}{\dot{I}}(\underline{{\sigma }}) \equiv {\left\{ \begin{array}{ll} 1 &{} \text {if }{\texttt {r}}_{\texttt {0}}\in \{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {b}}_{\texttt {0}}\},\\ 1 &{} \text {if }{\texttt {r}}_{\texttt {1}}\in \{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {1}}\},\\ 1 &{} \text {if }\{\sigma _i\}\subseteq \{{\texttt {f}}\}\text { and } \dot{\sigma }_i=\dot{T}((\hat{\sigma }_j)_{j\ne i})\text { for all }i,\\ 0 &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$

For \(\underline{{\sigma }}\in \Omega ^k\), abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _k\}\), and define

$$\begin{aligned}\hat{I}^\text {lit}(\underline{{\sigma }}) ={\left\{ \begin{array}{ll} 1 &{} \text {if }\exists i : \sigma _i={\texttt {r}}_{\texttt {0}}\text { and } \{\sigma _j\}_{j\ne i} =\{{\texttt {b}}_{\texttt {1}}\},\\ 1 &{} \text {if }\exists i : \sigma _i={\texttt {r}}_{\texttt {1}}\text { and } \{\sigma _j\}_{j\ne i} =\{{\texttt {b}}_{\texttt {0}}\},\\ 1 &{} \text {if }\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \subseteq \{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup \{\sigma \in \{{\texttt {f}}\}: \hat{\sigma }={\texttt {s}}\},\\ 1 &{} \text {if }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}}\}\cup \{{\texttt {f}}\}, |\{\sigma _i\}\cap \{{\texttt {f}}\}|\geqslant 2,\text { and } \hat{\sigma }_i=\hat{T}((\dot{\tau }(\sigma _j))_{j\ne i})\\ {} &{}\text { for all }i\text { where } \sigma _i\ne {\texttt {b}}_{\texttt {0}};\\ 1 &{} \text {if }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}, |\{\sigma _i\}\cap \{{\texttt {f}}\}|\geqslant 2,\text { and } \hat{\sigma }_i=\hat{T}((\dot{\tau }(\sigma _j))_{j\ne i})\\ {} &{}\text { for all }i\text { where } \sigma _i\ne {\texttt {b}}_{\texttt {1}};\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

(In the definition of \(\hat{I}^\text {lit}(\underline{{\sigma }})\) we used that if \(\{\sigma _i\}\subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}\), then \(\dot{\tau }(\sigma _i)\) is defined for all i.) On an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), a configuration \(\underline{{\sigma }}\in \Omega ^E\) is a valid coloring if \({\dot{I}}(\underline{{\sigma }}_{\delta v})=1\) for all \(v\in V\), and \(\hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})=1\) for all \(a\in F\).

Lemma 2.12

On any given nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), we have a bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {message configurations}\\ \underline{{\tau }}\in {\mathscr {M}}^E\end{array}\right\} \longleftrightarrow \left\{ \begin{array}{c} \text {colorings}\\ \underline{{\sigma }}\in \Omega ^E\end{array}\right\} \,.\end{aligned}$$

Proof

Given a valid message configuration \(\underline{{\tau }}\), a valid coloring \(\underline{{\sigma }}\) is obtained by coordinatewise application of the projection map \(\textsf {S}\) from (33). In the other direction, given a valid coloring \(\underline{{\sigma }}\), let \(x_v={\texttt {0}}\) if \(\underline{{\sigma }}_{\delta v}\) has any \({\texttt {r}}_{\texttt {0}}\) entries, \(x_v={\texttt {1}}\) if \(\underline{{\sigma }}_{\delta v}\) has any \({\texttt {r}}_{\texttt {1}}\) entries, and \(x_v={\texttt {f}}\) otherwise. The resulting \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) is a valid frozen configuration, and the argument of Lemma 2.7 implies that it has no free cycles. It then maps by Lemma 2.7 to a valid message configuration \(\underline{{\tau }}\), which completes the correspondence.

\(\square \)
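The inverse direction of this proof rests on a purely local rule for reading off \(x_v\) from the colorings of the edges around \(v\). A minimal sketch (hypothetical string encoding of spins, ours rather than the paper's):

```python
# Sketch of the rule from the proof of Lemma 2.12 recovering the frozen
# value x_v from the edge colorings around v.  Spins are strings; any
# entry other than "r0"/"r1" is treated as a b- or f-spin.  In a valid
# coloring the indicator I-dot guarantees that r0 and r1 never both
# appear around the same variable, so the branches below are exclusive.

def variable_value(sigma_delta_v):
    if "r0" in sigma_delta_v:
        return "0"      # v is forced to 0
    if "r1" in sigma_delta_v:
        return "1"      # v is forced to 1
    return "f"          # v is free
```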

Recall the definitions (28) of \(\bar{\varphi }\), \(\hat{\varphi }^\text {lit}\), and \(\dot{\varphi }\). For \(\underline{{\sigma }}\in \Omega ^d\), let

$$\begin{aligned}\dot{\Phi }(\underline{{\sigma }}) ={\left\{ \begin{array}{ll} \dot{\varphi }(\underline{{\hat{\sigma }}}) &{}\text {if }{\dot{I}}(\underline{{\sigma }})=1\text { and } \{\sigma _i\}\subseteq \{{\texttt {f}}\};\\ 1 &{} \text {if }{\dot{I}} (\underline{{\sigma }})=1\text { and }\{\sigma _i\} \subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\};\\ 0 &{} \text {otherwise (meaning that }{\dot{I}}(\underline{{\sigma }})=0). \end{array}\right. }\end{aligned}$$

(Note if \(\{\sigma _i\}\subseteq \{{\texttt {f}}\}\) then \(\underline{{\hat{\sigma }}}=\underline{{\hat{\tau }}}\) and \(\dot{\varphi }(\underline{{\hat{\sigma }}})\) is well-defined.) For \(\underline{{\sigma }}\in \Omega ^k\), let

$$\begin{aligned} \hat{\Phi }^\text {lit}(\underline{{\sigma }})={\left\{ \begin{array}{ll} \hat{\varphi }^\text {lit}((\dot{\tau }(\sigma _i))_i) &{} \text {if } \hat{I}^\text {lit}(\underline{{\sigma }})=1\text { and }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\};\\ 1 &{} \text {if }\hat{I}^\text {lit}(\underline{{\sigma }})=1\text { and }\{\sigma _i\} \cap \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\ne \varnothing ;\\ 0 &{}\text {otherwise (meaning that } \hat{I}^\text {lit}(\underline{{\sigma }})=0).\end{array}\right. } \end{aligned}$$
(34)

(Note if \(\{\sigma _i\}\subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}\) then \(\dot{\tau }(\sigma _i)\) is well-defined for all i.) Finally, let

$$\begin{aligned}\bar{\Phi }(\sigma ) ={\left\{ \begin{array}{ll} \bar{\varphi }(\sigma ) &{}\text {if }\sigma \in \{{\texttt {f}}\},\\ 1 &{} \text {if }\sigma \in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}, {\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}.\end{array}\right. } \end{aligned}$$

The following is a straightforward consequence of Lemma 2.9:

Lemma 2.13

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles. Let \(\underline{{\sigma }}\) be the coloring that corresponds to \(\underline{{x}}\) by Lemmas 2.7 and 2.12. Then the number of valid nae-sat extensions of \(\underline{{x}}\) is given by \(\mathrm {\textsf {{size}}}(\underline{{x}})=\varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})\) where we define

$$\begin{aligned} \varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})\equiv \prod _{v\in V} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F} \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E} \bar{\Phi }(\sigma _e)\,. \end{aligned}$$
(35)

Proof

This is a rewriting of (30).\(\square \)

Definition 2.14

(T-colorings) If \(\sigma \in \{{\texttt {f}}\}\), then \(\dot{\sigma }\) is a tree rooted at a clause a incident to a single edge e(a), while \(\hat{\sigma }\) is a tree rooted at a variable v incident to a single edge e(v). Glue \(\dot{\sigma }\) and \(\hat{\sigma }\) together by identifying e(a) with e(v), and let \(|\sigma |\) count the number of free variables in the resulting tree. (Note that \(|\sigma |\) must be finite because we only consider colorings of finite nae-sat instances \(\mathscr {G}\).) Thus \(|\sigma |=|\dot{\sigma }|+|\hat{\sigma }|-1\) where \(|\dot{\sigma }|\) is the number of free variables in the tree \(\dot{\sigma }\), and \(|\hat{\sigma }|\) is the number of free variables in the tree \(\hat{\sigma }\). If \(\sigma \in \Omega {\setminus }\{{\texttt {f}}\} =\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) then define \(|\sigma |\equiv 0\). If \(\underline{{\sigma }}\) is a valid coloring on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), then \(|\sigma _e|\) must be finite on every edge \(e\in E\), by Definition 2.11. For \(0\leqslant T\leqslant \infty \) define \(\Omega _T\equiv \{\sigma \in \Omega :|\sigma |\leqslant T\}\); we then call \(\underline{{\sigma }}\) a T-coloring if \(\sigma _e\in \Omega _T\) for all \(e\in E\). Define \(\varvec{Z}_{\lambda ,T}\) to be the partition function of \(\lambda \)-tilted T-colorings,

$$\begin{aligned} \varvec{Z}_{\lambda ,T}\equiv \varvec{Z}_{\lambda ,T}(\mathscr {G}) \equiv \sum _{\underline{{\sigma }}\in (\Omega _T)^E} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,. \end{aligned}$$
(36)

Denote \(\varvec{Z}_\lambda \equiv \varvec{Z}_{\lambda ,\infty }\) and note that as \(T\uparrow \infty \) we have \(\varvec{Z}_{\lambda ,T}\uparrow \varvec{Z}_{\lambda ,\infty }\equiv \varvec{Z}_\lambda \).

Proposition 2.15

On an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), recall \(\mathrm {\textsf {{CL}}}(\mathscr {G})\) denotes the set of solution clusters (connected components of \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\)), and define

$$\begin{aligned} \bar{\varvec{Z}}_\lambda \equiv \bar{\varvec{Z}}_\lambda (\mathscr {G}) \equiv \sum _{\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})}|\varvec{\gamma }|^\lambda \,. \end{aligned}$$
(37)

Then \(\bar{\varvec{Z}}_\lambda \) is lower bounded by \(\varvec{Z}_\lambda \), where \(\varvec{Z}_\lambda \) is the increasing limit of \(\varvec{Z}_{\lambda ,T}\) as defined by (36).

Proof

On \(\mathscr {G}\), the colorings \(\underline{{\sigma }}\) are in bijection (Lemma 2.12) with the message configurations \(\underline{{\tau }}\), which in turn are in bijection (Lemma 2.7) with the frozen configurations \(\underline{{x}}\) that do not have free cycles. Each such frozen configuration defines a distinct cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\), of size \(|\varvec{\gamma }|=\mathrm {\textsf {{size}}}(\underline{{x}})=\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})\). The claimed inequality directly follows.\(\square \)

To summarize what we have obtained so far, note that the quantity \(\bar{\varvec{Z}}_\lambda \) of (37) is a formal definition of the “\(\lambda \)-tilted cluster partition function” introduced in (5). In a sequence (18) of combinatorial mappings, we have produced in (36) a mathematically well-defined quantity \(\varvec{Z}_{\lambda ,T}\) which lower bounds \(\bar{\varvec{Z}}_\lambda \) (Proposition 2.15), and will be much more tractable thanks to the product formula for cluster sizes (Lemma 2.13). The lower bound of Theorem 1 is based on the second moment method applied to \(\varvec{Z}_{\lambda ,T}\). In preparation for the moment calculation, we conclude the current section by discussing some simplifications obtained by averaging over the literals of the nae-sat instance.

2.5 Averaging over edge literals

Our eventual purpose is to calculate \(\mathbb {E}\varvec{Z}_{\lambda ,T}\) and \(\mathbb {E}[(\varvec{Z}_{\lambda ,T})^2]\), where \(\mathbb {E}\) is expectation over the nae-sat instance \(\mathscr {G}\). Recall that \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) where \(\mathcal {G}=(V,F,E)\) is the graph without the edge literals \(\underline{{\texttt {L}}}\). Then \(\mathbb {E}\varvec{Z}_{\lambda ,T}=\mathbb {E}(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G}))\) where

$$\begin{aligned} \mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G}) =\sum _{\underline{{\sigma }}\in (\Omega _T)^E} \mathbb {E}(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,|\,\mathcal {G})\,. \end{aligned}$$

For any \(l\geqslant 1\) and any function \(g:\{{\texttt {0}},{\texttt {1}}\}^l\rightarrow \mathbb {R}\), let \(\mathbb {E}^{\text {lit}} g\) denote the average value of \(g(\underline{{\texttt {L}}})\) over all \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^l\). For any \(\underline{{\sigma }}\in \Omega ^E\), we have \(\mathbb {E}(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,|\,\mathcal {G})= \varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda \) where (compare (35))

$$\begin{aligned} \varvec{w}_\mathcal {G}(\underline{{\sigma }}) \equiv \prod _{v\in V} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F}\hat{\Phi }(\underline{{\sigma }}_{\delta a}) \prod _{e\in E} \bar{\Phi }(\sigma _e)\,,\quad \hat{\Phi }(\underline{{\sigma }}) \equiv \Big ( \mathbb {E}^{\text {lit}} [\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})^\lambda ] \Big )^{1/\lambda } \end{aligned}$$
(38)

— that is to say, even after averaging over \(\underline{{\texttt {L}}}\), the contribution of each \(\underline{{\sigma }}\in \Omega ^E\) is still given by a product formula. This means that \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) is the partition function of a “factor model”:

Definition 2.16

(factor model) On a bipartite graph \(\mathcal {G}=(V,F,E)\), the factor model specified by \(g\equiv ({\dot{g}},{\hat{g}},{\bar{g}})\) is the probability measure \(\nu _\mathcal {G}\) on configurations \(\underline{{\xi }}\in {\mathscr {X}}^E\) defined by

$$\begin{aligned}\nu _\mathcal {G}(\underline{{\xi }}) = \frac{1}{Z} \prod _{v\in V} {\dot{g}}(\underline{{\xi }}_{\delta v}) \prod _{a\in F} {\hat{g}}(\underline{{\xi }}_{\delta a}) \prod _{e\in E} {\bar{g}}(\xi _e), \end{aligned}$$

with Z the normalizing constant.
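For concreteness, here is a brute-force evaluation of \(\nu _\mathcal {G}\) on a toy instance; the graph, alphabet, and weight functions are all invented for illustration and carry no meaning in the nae-sat model.

```python
from itertools import product

# Brute-force factor model (Definition 2.16) on a toy bipartite graph:
# 2 variables, 1 clause, 2 edges, alphabet {0,1}.  Edge 0 joins v0-a0,
# edge 1 joins v1-a0.  All weight functions are invented for illustration.

X = [0, 1]
var_edges = {0: [0], 1: [1]}     # edges incident to each variable (delta v)
fac_edges = {0: [0, 1]}          # edges incident to the single clause (delta a)

def g_dot(xs): return 1.0                              # variable factor
def g_hat(xs): return 1.0 if xs[0] != xs[1] else 0.5   # clause factor
def g_bar(x):  return 2.0 if x == 1 else 1.0           # edge factor

weights = {}
for xi in product(X, repeat=2):              # one spin per edge
    w = 1.0
    for es in var_edges.values():
        w *= g_dot([xi[e] for e in es])
    for es in fac_edges.values():
        w *= g_hat([xi[e] for e in es])
    for e in range(2):
        w *= g_bar(xi[e])
    weights[xi] = w

Z = sum(weights.values())                    # normalizing constant
nu = {xi: w / Z for xi, w in weights.items()}
```

The same enumeration applied with the specification \((\dot{\Phi },\hat{\Phi },\bar{\Phi })^\lambda \) in place of the toy factors would compute \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) on small instances, though only exponentially slowly.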

A further observation is that for \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) and \(\underline{{\sigma }}\in \Omega ^E\), as we go over all possibilities of \(\underline{{\texttt {L}}}\) while keeping \(\mathcal {G}\) fixed, the weight \(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})\) (the size of the cluster encoded by \(\underline{{\sigma }}\) on \(\mathscr {G}\), as given by (35)) does not take more than one positive value. In other words, we can extract the cluster size without referring to the edge literals. The precise statement is as follows:

Lemma 2.17

The function \(\hat{\Phi }^\text {lit}\) of (34) can be factorized as \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})\hat{F}(\underline{{\sigma }})\) for

$$\begin{aligned}\hat{F}(\underline{{\sigma }})\equiv {\left\{ \begin{array}{ll} 1 &{}\text {if }\underline{{\sigma }}\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^k,\\ \displaystyle \frac{\hat{z}(\hat{\sigma }_j)}{\bar{\varphi }(\sigma _j)} &{}\text {if } \underline{{\sigma }}\in \Omega ^k\text { has some }\sigma _j\in \{{\texttt {f}}\}\text { (any such }j\text {; see the proof)}. \end{array}\right. } \end{aligned}$$

As a consequence, the function of (38) satisfies \(\hat{\Phi }(\underline{{\sigma }})^\lambda ={\hat{v}}(\underline{{\sigma }})\hat{F}(\underline{{\sigma }})^\lambda \) where \({\hat{v}}(\underline{{\sigma }})\equiv \mathbb {E}^{\text {lit}} [\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})]\).

Proof

For \(\underline{{\sigma }}\in \Omega ^k\) abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _k\}\). If \(\{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), then the definition (34) implies \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})\) for all \(\underline{{\texttt {L}}}\), so the factorization holds with \(\hat{F}(\underline{{\sigma }})\equiv 1\). If \(\{\sigma _i\}\) nontrivially intersects both \(\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) and \(\{{\texttt {f}}\}\), then \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=0\) for all \(\underline{{\texttt {L}}}\), so we can set \(\hat{F}(\underline{{\sigma }})\) arbitrarily. It remains to consider the case where \(\{\sigma _i\}\) nontrivially intersects \(\{{\texttt {f}}\}\) but does not intersect \(\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\). Recalling the discussion around (33), this means that \(\dot{\tau }_i\equiv \dot{\tau }(\sigma _i)\in \dot{\mathscr {M}}{\setminus }\{\star \}\) is well-defined for all i—if \(\sigma _i\in \{{\texttt {f}}\}\) then \(\dot{\tau }_i=\dot{\sigma }_i\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), and if \(\sigma _i={\texttt {b}}_{{\varvec{x}}}\) then \(\dot{\tau }_i={\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). Following (23), given any \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\), let us define \(\hat{\tau }_{\underline{{\texttt {L}}},i}\equiv {\texttt {L}}_i\oplus \hat{T}((\dot{\tau }_j\oplus {\texttt {L}}_j)_{j\ne i} )\). If \(\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=1\), then it follows from (25), (28), and (34) that

$$\begin{aligned} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})&=\hat{\varphi }^\text {lit}( \underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}}) =\sum _{\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^k} I^\textsc {nae}(\underline{{{\varvec{x}}}}\oplus \underline{{\texttt {L}}}) \prod _{j=1}^k [\dot{{\texttt {m}}}(\dot{\tau }_j)]({\varvec{x}}_j)\nonumber \\&=\hat{z}(\hat{\tau }_{\underline{{\texttt {L}}},i}) \sum _{{\varvec{x}}_i} [\dot{{\texttt {m}}}(\dot{\tau }_i)]({\varvec{x}}_i) \cdot [\hat{{\texttt {m}}}(\hat{\tau }_{\underline{{\texttt {L}}},i})]({\varvec{x}}_i) =\frac{\hat{z}(\hat{\tau }_{\underline{{\texttt {L}}},i})}{\bar{\varphi }(\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})}\,. \end{aligned}$$
(39)

We will have \(\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=1\) if and only if it holds for all \(1\leqslant i\leqslant k\) that \(\hat{\tau }_{\underline{{\texttt {L}}},i}\) is compatible with \(\sigma _i\), in the sense that \(\textsf {S}(\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})=\sigma _i\). In particular, if \(\sigma _i\in \{{\texttt {f}}\}\) (and we assumed \(\underline{{\sigma }}\) has at least one such entry), we must have \((\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})=(\dot{\sigma }_i,\hat{\sigma }_i)\). It follows that for any \(\underline{{\sigma }}\in \Omega ^k\) having at least one entry in \(\{{\texttt {f}}\}\), we can define \(\hat{F}(\underline{{\sigma }})\equiv \hat{z}(\hat{\sigma }_i)/\bar{\varphi }(\sigma _i)\) for any i where \(\sigma _i\in \{{\texttt {f}}\}\). This completes the proof.\(\square \)

Corollary 2.18

On a bipartite graph \(\mathcal {G}=(V,F,E)\), suppose \(\underline{{\sigma }}\in \Omega ^E\) satisfies \({\dot{I}}(\underline{{\sigma }}_{\delta v})=1\) for all \(v\in V\). Then, for \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), it follows from (35) that

$$\begin{aligned} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }}) =\bigg \{\prod _{a\in F} \hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})\bigg \} \varvec{W}_\mathcal {G}(\underline{{\sigma }})\,,\quad \varvec{W}_\mathcal {G}(\underline{{\sigma }})\equiv \prod _{v\in V}\dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F}\hat{F}(\underline{{\sigma }}_{\delta a}) \prod _{e\in E}\bar{\Phi }(\sigma _e)\,. \end{aligned}$$

Combining with (38) gives, with \({\hat{v}}\) as defined by Lemma 2.17,

$$\begin{aligned} \varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda =\varvec{p}_\mathcal {G}(\underline{{\sigma }}) \varvec{W}_\mathcal {G}(\underline{{\sigma }})^\lambda \,,\quad \varvec{p}_\mathcal {G}(\underline{{\sigma }}) \equiv \mathbb {E}\bigg [ \prod _{a\in F} \hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \,\bigg |\,\mathcal {G}\bigg ] =\prod _{a\in F} {\hat{v}}(\underline{{\sigma }}_{\delta a})\,. \end{aligned}$$

Proof

Immediate consequence of Lemma 2.17.\(\square \)

In the notation of Definition 2.16, the conditional first moment \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) is the partition function of the factor model with specification \((\dot{\Phi },\hat{\Phi },\bar{\Phi })^\lambda \) restricted to the alphabet \(\Omega _T\). Similarly, the conditional second moment \(\mathbb {E}[(\varvec{Z}_{\lambda ,T})^2\,|\,\mathcal {G}]\) is the partition function of the factor model on the alphabet \((\Omega _T)^2\) with specification \((\dot{\Phi }_2,\hat{\Phi }_2,\bar{\Phi }_2)^\lambda \), where \(\dot{\Phi }_2\equiv \dot{\Phi }\otimes \dot{\Phi }\), \(\bar{\Phi }_2\equiv \bar{\Phi }\otimes \bar{\Phi }\), and for any \(\underline{{\sigma }}\equiv (\underline{{\sigma }}^1,\underline{{\sigma }}^2)\in \Omega ^{2k}\) we have

$$\begin{aligned} \hat{\Phi }_2(\underline{{\sigma }})\equiv \bigg (\mathbb {E}^{\text {lit}} \Big [ \hat{\Phi }^\text {lit}(\underline{{\sigma }}^1\oplus \underline{{\texttt {L}}})^\lambda \hat{\Phi }^\text {lit}(\underline{{\sigma }}^2\oplus \underline{{\texttt {L}}})^\lambda \Big ]\bigg )^{1/\lambda } ={\hat{v}}_2(\underline{{\sigma }})^{1/\lambda } (\hat{F}\otimes \hat{F})(\underline{{\sigma }})\,, \end{aligned}$$

for \({\hat{v}}_2(\underline{{\sigma }})\equiv \mathbb {E}^{\text {lit}} [\hat{I}^\text {lit}(\underline{{\sigma }}^1\oplus \underline{{\texttt {L}}})\hat{I}^\text {lit}(\underline{{\sigma }}^2\oplus \underline{{\texttt {L}}})]\) (by Corollary 2.18). We emphasize that \(\hat{\Phi },\hat{\Phi }_2\) both depend on \(\lambda \), although we suppress it from the notation. Moreover, \(\hat{\Phi }_2\ne \hat{\Phi }\otimes \hat{\Phi }\) since \(\underline{{\sigma }}^1\) and \(\underline{{\sigma }}^2\) are coupled through their interaction with the same literals \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\). Lastly, we have written \(\underline{{\sigma }}\) in the first moment and \(\underline{{\sigma }}\equiv (\underline{{\sigma }}^1,\underline{{\sigma }}^2)\) in the second moment—this is a deliberate abuse of notation, which allows us to treat the two cases in a unified manner. To distinguish the cases we shall refer to the “first-moment” or “single-copy” model, versus the “second-moment” or “pair” model. We turn next to the analysis of these models.

3 Proof outline

Having formally set up our combinatorial model of nae-sat solution clusters (Sect. 2), we now give a more detailed outline for the (first and second) moment calculation that proves the lower bound of Theorem 1. (As we mentioned before, the upper bound of Theorem 1 is proved by an interpolation argument which builds on prior results in spin glass theory [12, 26, 43]. It does not involve the combinatorial model or the moment method, and is deferred to “Appendix E”.)

3.1 Empirical measures and moments

We use standard multi-index notations in what follows—in particular, for any ordered sequence \(z=(z_1,\ldots ,z_l)\) of nonnegative integers summing to n, we denote

$$\begin{aligned} \left( {\begin{array}{c}n\\ z\end{array}}\right) \equiv n!\bigg / \prod _{i=1}^l z_i!\,. \end{aligned}$$

If \(\pi \) is any nonnegative measure on a discrete space, write \(\mathcal {H}(\pi ) = -\langle \pi ,\ln \pi \rangle \) for its Shannon entropy. It follows from Stirling’s formula that for any fixed \(\pi \), in the limit \(n\rightarrow \infty \) we have

$$\begin{aligned} \left( {\begin{array}{c}n\\ n\pi \end{array}}\right) \asymp \frac{\exp \{n{\mathcal {H}}(\pi )\}}{n^{(|{{\,\mathrm{supp}\,}}\pi |-1)/2}}\,. \end{aligned}$$
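This asymptotic is easy to check numerically. The following sketch (the choice of \(\pi \) is arbitrary and illustrative) confirms that the gap between \(\ln \left( {\begin{array}{c}n\\ n\pi \end{array}}\right) \) and \(n\mathcal {H}(\pi )-\tfrac{1}{2}(|{{\,\mathrm{supp}\,}}\pi |-1)\ln n\) stabilizes to an \(O(1)\) constant as \(n\) grows.

```python
from math import lgamma, log

# Numerical check of the multinomial asymptotic: log binom(n, n*pi)
# should equal n*H(pi) - ((|supp pi| - 1)/2) * log n up to an O(1)
# constant.  The distribution pi below is arbitrary.

def log_multinomial(counts):
    n = sum(counts)
    return lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)

def shannon(pi):
    return -sum(p * log(p) for p in pi if p > 0)

pi = [0.5, 0.3, 0.2]
gaps = {}
for n in [10**3, 10**5]:
    counts = [int(n * p) for p in pi]          # exact integers for these n
    gaps[n] = log_multinomial(counts) - (n * shannon(pi)
                                         - (len(pi) - 1) / 2 * log(n))
# gaps[10**3] and gaps[10**5] agree up to O(1/n) corrections
```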

On a bipartite graph \(\mathcal {G}\), we will summarize colorings \(\underline{{\sigma }}\) according to some “local statistics,” as follows:

Definition 3.1

(empirical measures) Given a bipartite graph \(\mathcal {G}=(V,F,E)\) and \(\underline{{\sigma }}\in \Omega ^E\), define

$$\begin{aligned} \begin{aligned}\dot{H}(\underline{{\dot{\sigma }}})&=|\{v\in V:\underline{{\sigma }}_{\delta v} =\underline{{\dot{\sigma }}}\}|/|V|&\quad \text {for } \underline{{\dot{\sigma }}}\in \Omega ^d,\\ \hat{H}(\underline{{\hat{\sigma }}})&= |\{a\in F: \underline{{\sigma }}_{\delta a} =\underline{{\hat{\sigma }}}\}|/|F|&\quad \text {for } \underline{{\hat{\sigma }}}\in \Omega ^k,\\ \bar{H}(\sigma )&=|\{e\in E: \sigma _e=\sigma \}|/|E|&\quad \text {for } \sigma \in \Omega . \end{aligned} \end{aligned}$$

The triple \(H\equiv H(\mathcal {G},\underline{{\sigma }})\equiv (\dot{H},\hat{H},\bar{H})\) is the empirical measure of \(\underline{{\sigma }}\) on \(\mathcal {G}\).

Recall from (38) that \(\varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda \) is the contribution to \(\mathbb {E}(\varvec{Z}_\lambda \,|\,\mathcal {G})\) from \(\underline{{\sigma }}\in \Omega ^E\). We saw in Corollary 2.18 that \(\varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda =\varvec{p}_\mathcal {G}(\underline{{\sigma }})\varvec{W}_\mathcal {G}(\underline{{\sigma }})^\lambda \) where \(\varvec{p}_\mathcal {G}(\underline{{\sigma }})\) is the probability (conditional on \(\mathcal {G}\)) that \(\underline{{\sigma }}\) is a valid coloring on \((\mathcal {G},\underline{{\texttt {L}}})\); and \(\varvec{W}_\mathcal {G}(\underline{{\sigma }})\) is the size of the cluster encoded by \(\underline{{\sigma }}\), provided \(\underline{{\sigma }}\) is valid. Now all these quantities can be expressed solely in terms of \(H=H(\mathcal {G},\underline{{\sigma }})\): we have \(\varvec{p}_\mathcal {G}(\underline{{\sigma }})=\exp (n\varvec{v}(H))\) and \(\varvec{W}_\mathcal {G}(\underline{{\sigma }})=\exp (n\varvec{s}(H))\) where

$$\begin{aligned} \varvec{v}(H)&\equiv (d/k) \langle \ln {\hat{v}},\hat{H}\rangle =(d/k) \sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}(\underline{{\sigma }}) \ln {\hat{v}}(\underline{{\sigma }}) \,,\\ \varvec{s}(H)&\equiv \langle \ln \dot{\Phi },\dot{H}\rangle +(d/k)\langle \ln \hat{F}, \hat{H}\rangle +d\langle \ln \bar{\Phi },\bar{H}\rangle \,. \end{aligned}$$

Given \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), let \(\varvec{Z}_{\lambda ,T}(H)\) be the contribution to \(\varvec{Z}_{\lambda ,T}\) from colorings \(\underline{{\sigma }}\in (\Omega _T)^E\) such that \(H(\mathcal {G},\underline{{\sigma }})=H\). In what follows we will often suppress the dependence on \(\lambda \) and T, and write simply \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\).

Definition 3.2

(simplex) For d, k, T fixed, the simplex of empirical measures is the space \(\varvec{\Delta }\equiv \varvec{\Delta }(T)\) of triples \(H\equiv (\dot{H},\hat{H},\bar{H})\) satisfying the following conditions: \(\dot{H}\) is a probability measure supported within the set of \(\underline{{\sigma }}\in (\Omega _T)^d\) such that \({\dot{I}}(\underline{{\sigma }})=1\); \(\hat{H}\) is a probability measure supported within the set of \(\underline{{\sigma }}\in (\Omega _T)^k\) such that \({\hat{v}}(\underline{{\sigma }})\) is positive; and both \(\dot{H}\) and \(\hat{H}\) must have marginal \(\bar{H}\), that is,

$$\begin{aligned} \frac{1}{d}\sum _{\underline{{\sigma }}\in \Omega ^d} \dot{H}(\underline{{\sigma }})\sum _{i=1}^d \mathbf {1}\{\sigma _i=\sigma \} =\bar{H}(\sigma ) =\frac{1}{k}\sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}(\underline{{\sigma }})\sum _{j=1}^k \mathbf {1}\{\sigma _j=\sigma \} \end{aligned}$$
(40)

for all \(\sigma \in \Omega \). It follows that \(\bar{H}\) is a probability measure supported on \(\Omega _T\).
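The marginal condition (40) is a linear constraint that is straightforward to verify in examples. A toy check (all numbers invented, with a two-symbol stand-in for \(\Omega _T\) and \(d=k=2\)):

```python
from itertools import product

# Toy verification of the marginal condition (40).  The alphabet, d, k,
# and all probabilities are invented for illustration.

Omega = ["b0", "b1"]
d, k = 2, 2

bar_H = {"b0": 0.6, "b1": 0.4}
# Product-form choices of H-dot and H-hat automatically have marginal bar_H:
dot_H = {s: bar_H[s[0]] * bar_H[s[1]] for s in product(Omega, repeat=d)}
hat_H = {s: bar_H[s[0]] * bar_H[s[1]] for s in product(Omega, repeat=k)}

def marginal(H, arity):
    """Either side of (40): the average one-coordinate marginal of H."""
    out = dict.fromkeys(Omega, 0.0)
    for tup, p in H.items():
        for i in range(arity):
            out[tup[i]] += p / arity
    return out
```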

It follows from Corollary 2.18 that if \(\mathbb {E}\) is expectation over a (dk)-regular nae-sat instance on n variables, then \(\mathbb {E}\varvec{Z}(H)\) is positive if and only if \(H\in \varvec{\Delta }\) and \((n\dot{H},m\hat{H})\) is integer-valued. For such H, it follows from the definition of the random regular nae-sat graph that

$$\begin{aligned} \mathbb {E}\varvec{Z}(H) =\bigg [\bigg \{\left( {\begin{array}{c}n\\ n\dot{H}\end{array}}\right) \left( {\begin{array}{c}m\\ m\hat{H}\end{array}}\right) \bigg / \left( {\begin{array}{c}nd\\ nd\bar{H}\end{array}}\right) \bigg \} \exp \{n\varvec{v}(H)\}\bigg ] \cdot \exp \{ n\lambda \varvec{s}(H) \}\,. \end{aligned}$$
(41)

In (41), the first factor (in square brackets) is the expected number \(\mathbb {E}\varvec{Z}_{\lambda =0,T}\) of valid colorings with empirical profile H. The remaining factor \(\exp \{ n\lambda \varvec{s}(H) \}\) is explained by the fact that any such coloring encodes a cluster of size \(\exp \{n\varvec{s}(H)\}\). By Stirling’s formula, in the limit \(n\rightarrow \infty \) (with T fixed),

$$\begin{aligned} \mathbb {E}\varvec{Z}_{\lambda =0,T} \asymp \frac{\exp \{n [\mathcal {H}(\dot{H}) + (d/k)\mathcal {H}(\hat{H}) -d\mathcal {H}(\bar{H}) +\varvec{v}(H)]\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n\varvec{\Sigma }(H)\}}{n^{\wp (H)/2}} \end{aligned}$$

where \(\wp (H)\equiv |{{\,\mathrm{supp}\,}}\dot{H}|+|{{\,\mathrm{supp}\,}}\hat{H}|-|{{\,\mathrm{supp}\,}}\bar{H}|-1\), and the exponential rate \(\varvec{\Sigma }(H)\) is a formal analogue of the “cluster complexity” function \(\Sigma (s)\) appearing in (4). In analogy with (6) we let

$$\begin{aligned} \varvec{F}\equiv \varvec{F}_{\lambda ,T} \equiv \varvec{\Sigma }(H)+\lambda \varvec{s}(H)\,. \end{aligned}$$
(42)

Then altogether the first moment can be estimated as

$$\begin{aligned} \mathbb {E}\varvec{Z}(H) \asymp \bigg (\frac{\exp \{n\varvec{\Sigma }(H)\}}{n^{\wp (H)/2}}\bigg ) \exp \{n\lambda \varvec{s}(H)\} =\frac{\exp \{n\varvec{F}(H)\}}{n^{\wp (H)/2}}\,. \end{aligned}$$
(43)

Note that \(\asymp \) hides a dependence on T, since we keep T fixed throughout our moment analysis.

3.2 Outline of first moment

For any subset of empirical measures \({\mathbf {H}}\subseteq \varvec{\Delta }\), we write \(\underline{{\sigma }}\in {\mathbf {H}}\) to indicate that \(H(\mathcal {G},\underline{{\sigma }})\in {\mathbf {H}}\), and write \(\varvec{Z}({\mathbf {H}})\equiv \varvec{Z}_{\lambda ,T}({\mathbf {H}})\) for the contribution to \(\varvec{Z}_{\lambda ,T}\) from colorings \(\underline{{\sigma }}\in {\mathbf {H}}\). It then follows from (43) that

$$\begin{aligned}\mathbb {E}\varvec{Z}({\mathbf {H}}) =\sum _{H\in {\mathbf {H}}}\mathbb {E}\varvec{Z}(H) =n^{O(1)} \exp \Big \{n\max \{\varvec{F}(H):H\in {\mathbf {H}}\} \Big \}\,, \end{aligned}$$

for \(\varvec{F}\) as in (42). Thus, calculating the first moment \(\mathbb {E}\varvec{Z}\) essentially reduces to the problem of maximizing \(\varvec{F}\) over \(\varvec{\Delta }\). The physics theory suggests that \(\varvec{F}\) is uniquely maximized at a point \(H_\star \in \varvec{\Delta }\) which is given explicitly in terms of a replica symmetric fixed point for the \(\lambda \)-tilted T-coloring model. (Recall from §1.5 that in the original nae-sat model, the replica symmetric fixed point was described by the measure \({\texttt {m}}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\). In the coloring model, with spins \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega _T\), the replica symmetric fixed point will be characterized by a measure \({\dot{q}}\) on the space \(\Omega _T\) of possible values for \(\dot{\sigma }\).)

There are several obstacles to the rigorous moment computation. From a physics perspective, the replica symmetric fixed point of the \(\lambda \)-tilted coloring model at \(T=\infty \) is equivalent to the fixed point described by Proposition 1.2, which was used to define the 1rsb prediction (12). Mathematically, however, we work with T finite so that \(\varvec{\Delta }\) has finite dimension and \(\wp (H)\) is defined. Therefore we need to explicitly construct a replica symmetric fixed point at finite T, and use it to define \( H_\star =H_{\lambda ,T}\in \varvec{\Delta }\) (Definition 5.6 below). We must then take \(T\rightarrow \infty \) and show that the limit matches the fixed point of Proposition 1.2, so that \(\varvec{F}(H_\star )= \varvec{F}_{\lambda ,T}(H_{\lambda ,T})\) converges as \(T\rightarrow \infty \) to the 1rsb prediction (12). The construction of the fixed point at finite T is stated in Proposition 5.5 below, and proved in “Appendix A”. The correspondence with (12) in the \(T=\infty \) limit is stated in Proposition 3.13 below, and proved in “Appendix B”.

A more difficult problem is to show that \(\varvec{F}\) is in fact maximized at \(H_\star \). The function \(\varvec{F}\) is generally not convex, and must be optimized over a space \(\varvec{\Delta }\) whose dimension grows with d, k, and T. Moreover, an analogous but even more difficult optimization must be solved to compute the second moment \(\mathbb {E}(\varvec{Z}^2)\). The main part of this analysis is carried out in Sects. 4 and 5. In the remainder of this section we make some preparatory calculations and explain how the pieces will be fit together to prove the main result Theorem 1. Recall from Remark 1.1 that we can restrict consideration to \(\alpha \) satisfying (3). In this regime, we make a priori estimates to show that any optimal H satisfies some basic restrictions. Abbreviate \(\{{\texttt {r}}\}\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\), recall \(\{{\texttt {f}}\}\equiv \Omega {\setminus }\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), and let

$$\begin{aligned} \mathbf {N}_\circ&\equiv \bigg \{H\in \varvec{\Delta }: \max \{ \bar{H}({\texttt {f}}), \bar{H}({\texttt {r}}) \} \leqslant \frac{7}{2^k} \bigg \}\,\,,\nonumber \\ \mathbf {N}&=\bigg \{ H\in \mathbf {N}_\circ : \Vert {H-H_\star } \Vert \leqslant \frac{1}{n^{1/3}} \bigg \} \subseteq \mathbf {N}_\circ \,, \end{aligned}$$
(44)

where \(\Vert {\cdot } \Vert \) denotes the \(\ell ^1\) norm throughout this paper. We next show that empirical measures \(H\notin \mathbf {N}_\circ \) give a negligible contribution to the first moment:

Lemma 3.3

For \(k\geqslant k_0\), \(\alpha \equiv d/k\) satisfying (3), and \(0\leqslant \lambda \leqslant 1\), \(\mathbb {E}\varvec{Z}(\varvec{\Delta }{\setminus }\mathbf {N}_\circ )\) is exponentially small in n.

Proof

Let \(Z^{\texttt {f}}\) count the nae-sat solutions \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) which map—via coarsening and the bijection (18)—to warning configurations \(\underline{{y}}\) with more than \(7/2^k\) fraction of edges e such that \({\dot{y}}_e={\hat{y}}_e={\texttt {f}}\). Similarly, let \( Z^{\texttt {r}}\) count nae-sat solutions \(\underline{{{\varvec{x}}}}\) mapping to warning configurations \(\underline{{y}}\) with more than \(7/2^k\) fraction of edges e such that \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\). It follows from Proposition 2.15 that for any \(0\leqslant \lambda \leqslant 1\) we have \(\varvec{Z}(\varvec{\Delta }{\setminus }\mathbf {N}_\circ )\leqslant Z^{\texttt {f}}+ Z^{\texttt {r}}\). For \(\alpha \) satisfying (3), \(\mathbb {E}Z^{\texttt {f}}\) is exponentially small in n by [25, Propn. 2.2]. As for \(Z^{\texttt {r}}\), let us say that an edge \(e\in E\) is blocked under \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) if \({\texttt {L}}_e \oplus x_{v(e)}= {\texttt {1}}\oplus {\texttt {L}}_{e'} \oplus x_{v(e')}\) for all \(e'\in \delta a(e) {\setminus } e\). Note that if \(\underline{{{\varvec{x}}}}\) maps to \(\underline{{y}}\), the only possibility for \(y_e\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) is that e was blocked under \(\underline{{{\varvec{x}}}}\). (The converse need not hold.) If we condition on \(\underline{{{\varvec{x}}}}\) being a valid nae-sat solution, then each clause contains a blocked edge independently with chance \(\theta = 2k/(2^k-2)\); note also that a clause can contain at most one blocked edge. It follows that

$$\begin{aligned} \mathbb {E}Z^{\texttt {r}}\leqslant (\mathbb {E}Z) \mathbb {P}\bigg ({\mathrm {Bin}}( m, \theta ) \geqslant 7 nd/2^k\bigg )\,. \end{aligned}$$

This is exponentially small in n by a Chernoff bound together with the trivial bound \(\mathbb {E}Z \leqslant 2^n\).\(\square \)
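The final step can be checked numerically. The sketch below (with the clause density taken, purely for illustration, at the threshold scale \(\alpha \approx 2^{k-1}\ln 2\), which is our reading of the regime (3)) computes the per-variable binomial large-deviation rate and confirms that it exceeds \(\ln 2\), so that the Chernoff factor overwhelms the trivial bound \(\mathbb {E}Z\leqslant 2^n\):

```python
import math

def kl(a: float, p: float) -> float:
    """Relative entropy KL(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def chernoff_rate_per_variable(k: int) -> float:
    # Clause density alpha = d/k near the threshold scale (illustrative assumption).
    alpha = 2 ** (k - 1) * math.log(2)        # so the number of clauses is m = n * alpha
    theta = 2 * k / (2 ** k - 2)              # chance a clause contains a blocked edge
    target = 7 * k / 2 ** k                   # 7nd/2^k blocked edges out of m = nd/k clauses
    # P(Bin(m, theta) >= m * target) <= exp(-m * KL(target || theta)),
    # so the exponential decay rate per variable is alpha * KL(target || theta).
    return alpha * kl(target, theta)

for k in range(10, 31):
    # The rate must beat ln 2, the growth rate of the trivial bound E Z <= 2^n.
    assert chernoff_rate_per_variable(k) > math.log(2)
```

Already for \(k=10\) the rate is an order of magnitude larger than \(\ln 2\), and it grows linearly in k.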

We assume throughout what follows that \(k\geqslant k_0\), \(\alpha \) satisfies (3), and \(0\leqslant \lambda \leqslant 1\). In this regime, Lemma 3.3 tells us that \(\max \{\varvec{F}(H) : H\notin \mathbf {N}_\circ \}\) is negative. On the other hand, we shall assume that the global maximum of \(\varvec{F}\) is nonnegative, since otherwise \(\mathbb {E}\varvec{Z}\) is exponentially small in n and there is nothing to prove. From this we have that any maximizer H of \(\varvec{F}\) must lie in \(\mathbf {N}_\circ \). In Sects. 4 and 5 we develop a more refined analysis to solve the optimization problem for \(\varvec{F}\) restricted to \(\mathbf {N}_\circ \):

Proposition 3.4

(proved in Sect. 5) Assuming the global maximum of \(\varvec{F}\) is nonnegative, the unique maximizer of \(\varvec{F}\) is an explicitly characterized point \(H_\star \) in the interior of \(\mathbf {N}_\circ \). Moreover, there is a positive constant \(\epsilon =\epsilon (k,\lambda ,T)\) so that for all \(\Vert {H-H_\star } \Vert \leqslant \epsilon \) we have \(\varvec{F}(H) \leqslant \varvec{F}(H_\star )-\epsilon \Vert {H-H_\star } \Vert ^2\).

A consequence of the above is that we can compute the first moment of \(\varvec{Z}\) up to constant factors. In the following, let \(\dot{\wp }\equiv \dot{\wp }(T)\) count the number of d-tuples \(\underline{{\sigma }}\in (\Omega _T)^d\) for which \({\dot{I}}(\underline{{\sigma }})>0\). Let \(\hat{\wp }\equiv \hat{\wp }(T)\) count the number of k-tuples \(\underline{{\sigma }}\in (\Omega _T)^k\) for which \({\hat{v}}(\underline{{\sigma }})>0\). Let \(\bar{\wp }\equiv |\Omega _T|\), and denote \(\wp \equiv \dot{\wp }+\hat{\wp }-\bar{\wp }-1\). Recall from (44) the definition of \(\mathbf {N}\).

Corollary 3.5

The coloring partition function \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\) has first moment \(\mathbb {E}\varvec{Z}\asymp \exp \{ n \varvec{F}(H_\star ) \}\). Moreover the expectation is dominated by \(\mathbf {N}\) in the sense that \(\mathbb {E}\varvec{Z}(\mathbf {N})=(1-o(1)) \mathbb {E}\varvec{Z}\).

Proof

Define the \(\bar{\wp }\times \dot{\wp }\) matrix \({\dot{M}}\) with entries

$$\begin{aligned}{\dot{M}}(\sigma ',\underline{{\sigma }}) \equiv \sum _{i=1}^d \mathbf {1}\{\sigma _i=\sigma '\}\,,\quad \sigma '\in \Omega _T\text { and } \underline{{\sigma }}\in (\Omega _T)^d\cap ({{\,\mathrm{supp}\,}}{\dot{I}})\,. \end{aligned}$$

Similarly define the \(\bar{\wp }\times \hat{\wp }\) matrix \({\hat{M}}\) with entries

$$\begin{aligned}{\hat{M}}(\sigma ',\underline{{\sigma }}) =\sum _{i=1}^k \mathbf {1}\{\sigma _i=\sigma '\}\,,\quad \sigma '\in \Omega _T\text { and } \underline{{\sigma }}\in (\Omega _T)^k\cap ({{\,\mathrm{supp}\,}}{\hat{v}})\,. \end{aligned}$$

Lastly define the \(\bar{\wp }\times (\dot{\wp }+\hat{\wp })\) matrix \(M \equiv \begin{pmatrix}{\dot{M}}&-{\hat{M}}\end{pmatrix}\). It follows from the discussion after Definition 3.2 that \(\mathbb {E}\varvec{Z}(H)\) is positive if and only if (i) \(\dot{H}\) and \(\hat{H}\) are nonnegative; (ii) \(\langle {\mathbf {1}},\dot{H}\rangle =1\); (iii) \((k\dot{H},d\hat{H})\) lies in the kernel of M; and (iv) \((n\dot{H},m\hat{H})\) is integer-valued. Conditions (i)–(iii) are equivalent to \(H\in \varvec{\Delta }\). One can verify that the matrix M has full rank, from which it follows that the space of vectors \((\dot{H},\hat{H})\) satisfying the conditions (ii) and (iii) has dimension \(\wp \). In Lemma 4.4 we will show that M satisfies a stronger condition, which implies that the space of \((\dot{H},\hat{H})\) satisfying (ii)–(iv) is an affine transformation of \((n^{-1}{\mathbb {Z}})^\wp \), where the coefficients of the transformation are uniformly bounded. Then, by substituting the result of Proposition 3.4 into (43), we conclude

$$\begin{aligned} \mathbb {E}\varvec{Z}\asymp \sum _{z\in (n^{-1}{\mathbb {Z}})^\wp } \frac{\exp \{ n[\varvec{F}(H_\star )- \Theta (\Vert {z} \Vert ^2) ]\}}{n^{\wp /2}} \asymp \exp \{n\varvec{F}(H_\star )\}\,. \end{aligned}$$

Empirical measures \(H\notin \mathbf {N}\) correspond to vectors \(z\in (n^{-1}{\mathbb {Z}})^\wp \) with norm \(\Vert {z} \Vert \geqslant n^{-1/3}\). These give a negligible contribution to the above sum, which proves the second claim \(\mathbb {E}\varvec{Z}(\mathbf {N})=(1-o(1))\mathbb {E}\varvec{Z}\).\(\square \)
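The Laplace-type lattice sum in the proof can be illustrated in the one-dimensional case \(\wp =1\) (a toy check, with an arbitrary curvature constant c standing in for the \(\Theta (\cdot )\) coefficient): the sum over \(z\in n^{-1}{\mathbb {Z}}\) of \(\exp \{-ncz^2\}\), normalized by \(n^{1/2}\), converges to the Gaussian integral \((\pi /c)^{1/2}\), so the n-dependence cancels exactly as claimed.

```python
import math

def lattice_gaussian_sum(n: int, c: float = 2.0) -> float:
    """n^{-1/2} * sum over z in n^{-1}Z of exp(-n*c*z^2), truncated where negligible."""
    radius = int(10 * math.sqrt(n / c)) + 1
    total = sum(math.exp(-c * j * j / n) for j in range(-radius, radius + 1))
    return total / math.sqrt(n)

# As n grows this is a Riemann sum for the Gaussian integral (pi/c)^{1/2},
# so the n^{wp/2} normalization leaves a quantity of constant order.
for n in (100, 1000, 10000):
    assert abs(lattice_gaussian_sum(n) - math.sqrt(math.pi / 2.0)) < 1e-6
```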

3.3 Outline of second moment

By a calculation similar to the one above, computing the second moment \(\mathbb {E}(\varvec{Z}^2)\) reduces to the problem of maximizing a function \(\varvec{F}_2\equiv \varvec{F}_{2,\lambda ,T}\) over a space \(\varvec{\Delta }_2\) of pair empirical measures. In fact we will calculate the second moment not of \(\varvec{Z}\) itself, but rather of a more restricted random variable \(\varvec{S}\leqslant \varvec{Z}\), defined below. This leads to a more tractable analysis, as we now explain.

Concretely, \(\varvec{\Delta }_2\) is the space of triples \(H=(\dot{H},\hat{H},\bar{H})\) satisfying the following conditions: \(\dot{H}\) is a probability measure on \((\Omega _T)^{2d}\cap ({{\,\mathrm{supp}\,}}{\dot{I}})^2\), \(\hat{H}\) is a probability measure on \((\Omega _T)^{2k}\cap ({{\,\mathrm{supp}\,}}{\hat{v}}_2)\), and \(\bar{H}\) is a probability measure on \((\Omega _T)^2\) which can be obtained from both \(\dot{H}\) and \(\hat{H}\) as the edge marginal (40). Repeating the derivation of (43) in the second moment setting, we have the following. For any \(H\in \varvec{\Delta }_2\), the expected number of valid coloring pairs \((\underline{{\sigma }},\underline{{\sigma }}')\in (\Omega _T)^{2E}\) of type H is

$$\begin{aligned} \mathbb {E}(\varvec{Z}_{\lambda =0,T})^2(H) \asymp \frac{\exp \{n[\mathcal {H}(\dot{H}) + (d/k)\mathcal {H}(\hat{H})-d\mathcal {H}(\bar{H}) +\varvec{v}_2(H)]\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n\varvec{\Sigma }_2(H)\}}{n^{\wp (H)/2}} \end{aligned}$$

Given a pair empirical measure \(H\in \varvec{\Delta }_2\), take the marginal on the first element of the pair to define the first-copy marginal \(H^1\in \varvec{\Delta }\); define likewise the second-copy marginal \(H^2\in \varvec{\Delta }\). The contribution to \(\varvec{Z}^2\) from any valid pair of type H is given by \(\exp \{n\lambda \varvec{s}_2(H)\}\) where \(\varvec{s}_2(H)\equiv \varvec{s}(H^1)+\varvec{s}(H^2)\). Thus \(\varvec{F}_2\) is given explicitly by

$$\begin{aligned} \mathbb {E}\varvec{Z}^2(H) \asymp \frac{\exp \{n\varvec{F}_2(H)\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n [\varvec{\Sigma }_2(H)+\lambda \varvec{s}_2(H)] \}}{n^{\wp (H)/2}} \end{aligned}$$
(45)

(cf. (42) and (43)). In view of Corollary 3.5, it would suffice for our purposes to calculate the second moment of \(\varvec{Z}(\mathbf {N})\) rather than \(\varvec{Z}\), which amounts to maximizing \(\varvec{F}_2\) on the restricted set

$$\begin{aligned}\mathbf {N}_2 \equiv \Big \{ H\in \varvec{\Delta }_2 : H^1\in \mathbf {N}\text { and } H^2\in \mathbf {N}\Big \}\,.\end{aligned}$$

Following [18] we can simplify the analysis by a further restriction, as follows:

Definition 3.6

(separability) If \(\underline{{\sigma }},\underline{{\sigma }}'\) are valid colorings on \(\mathscr {G}\), define their separation \(\mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}')\) to be the fraction of variables where their corresponding frozen configurations (from Lemmas 2.7 and 2.12) differ. Write \(\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\) if the frozen configuration of \(\underline{{\sigma }}'\) has more free variables than that of \(\underline{{\sigma }}\). We say that a coloring \(\underline{{\sigma }}\) is separable if \(\underline{{\sigma }}\in \mathbf {N}\) (recall this means \(H(\mathcal {G},\underline{{\sigma }})\in \mathbf {N}\)) and

$$\begin{aligned}|\{\underline{{\sigma }}'\in \mathbf {N}: \underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\text { and } \mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \notin I_\text {se} \}| \leqslant \exp \{ (\ln n)^4 \}, \end{aligned}$$

where \(I_\text {se}\equiv [(1-k^4/2^{k/2})/2, (1+k^4/2^{k/2})/2]\) and it is implicit that both \(\underline{{\sigma }}\) and \(\underline{{\sigma }}'\) are valid on \(\mathscr {G}\). Let \(\varvec{S}\equiv \varvec{S}_{\lambda ,T}\) be the contribution to \(\varvec{Z}(\mathbf {N})\) from separable colorings.

Proposition 3.7

(proved in “Appendix D”) The first moment is dominated by separable colorings in the sense that \(\mathbb {E}\varvec{S}= (1-o(1)) \mathbb {E}\varvec{Z}(\mathbf {N})\).

We will apply the second moment method to lower bound \(\varvec{S}\); the result will follow since \(\varvec{S}\leqslant \varvec{Z}(\mathbf {N})\leqslant \varvec{Z}\). For \(H\in \varvec{\Delta }_2\), all pairs \((\underline{{\sigma }},\underline{{\sigma }}')\) of type H have the same separation \(\mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}')\), which we therefore denote \(\mathrm {\textsf {{sep}}}(H)\). Partition \(\mathbf {N}_2\) into \(\mathbf {N}_\text {se}\equiv \{H\in \mathbf {N}_2:\mathrm {\textsf {{sep}}}(H) \in I_\text {se}\}\) (the near-uncorrelated regime) and \(\mathbf {N}_\text {ns}\equiv \mathbf {N}_2{\setminus }\mathbf {N}_\text {se}\) (the correlated regime). Denote the corresponding contributions to \(\varvec{S}^2\) by \(\varvec{S}^2(\mathbf {N}_\text {se})\) and \(\varvec{S}^2(\mathbf {N}_\text {ns})\).

Corollary 3.8

For separable colorings, the second moment contribution from the correlated regime \(\mathbf {N}_\text {ns}\) is bounded by \(\mathbb {E}[\varvec{S}^2(\mathbf {N}_\text {ns})] \leqslant \exp \{n \lambda \varvec{s}(H_\star ) + o(n) \} \,\mathbb {E}\varvec{S}\).

Proof

By the symmetry between the roles of \(\underline{{\sigma }}\) and \(\underline{{\sigma }}'\), and the definition of separability, we have

$$\begin{aligned}\varvec{S}^2(\mathbf {N}_\text {ns}) \leqslant 2\sum _{\begin{array}{c} (\underline{{\sigma }},\underline{{\sigma }}')\in \mathbf {N}_\text {ns},\\ \underline{{\sigma }}\text { separable} \end{array}} \mathbf {1}\{\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\} \varvec{w}^\text {lit}_{\mathscr {G},T}(\underline{{\sigma }})^\lambda \varvec{w}^\text {lit}_{\mathscr {G},T}(\underline{{\sigma }}')^\lambda \\ \leqslant \exp \{ n\varvec{s}(H_\star )\lambda +o(n) \} \varvec{S}\,.\end{aligned}$$

Taking expectations gives the claim.\(\square \)

We then conclude the second moment calculation by computing \(\mathbb {E}(\varvec{Z}^2(\mathbf {N}_\text {se}))\), which is an upper bound on \(\mathbb {E}(\varvec{S}^2(\mathbf {N}_\text {se}))\). Therefore we must maximize the function \(\varvec{F}_2\) on \(\mathbf {N}_\text {se}\). As in the first moment, the physics theory suggests that the unique maximizer of \(\varvec{F}_2\) on \(\mathbf {N}_\text {se}\) is given by a specific pair empirical measure \(H_\bullet \) which is defined in terms of \(H_\star \). In the nae-sat model there is a small complication in this definition: we will have \(\dot{H}_\bullet =\dot{H}_\star \otimes \dot{H}_\star \) and \(\bar{H}_\bullet =\bar{H}_\star \otimes \bar{H}_\star \), but \(\hat{H}_\bullet \ne \hat{H}_\star \otimes \hat{H}_\star \) because the first and second copies interact via the edge literals. It therefore requires a small calculation to argue that \(\varvec{F}_2(H_\bullet )=2\varvec{F}(H_\star )\). We address this by giving a simple sufficient condition for \(H\in \varvec{\Delta }_2\) to satisfy \(\varvec{F}_2(H)=\varvec{F}(H^1)+\varvec{F}(H^2)\) where \(H^j\in \varvec{\Delta }\) are its single-copy marginals.

Lemma 3.9

Consider \(H\in \varvec{\Delta }_2\) with single-copy marginals \(H^1,H^2\in \varvec{\Delta }\). Suppose there are functions \(g^1,g^2\) which are invariant to literals in the sense that \(g^j(\underline{{\sigma }}^j)=g^j(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})\) for all \(\underline{{\sigma }}^j\in (\Omega _T)^k\), \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\), and

$$\begin{aligned}\hat{H}^j(\underline{{\sigma }}^j)={\hat{v}}(\underline{{\sigma }}^j)g^j(\underline{{\sigma }}^j)\,,\quad \hat{H}(\underline{{\sigma }}^1,\underline{{\sigma }}^2) ={\hat{v}}_2(\underline{{\sigma }}^1,\underline{{\sigma }}^2) \prod _{j=1,2}g^j(\underline{{\sigma }}^j)\,. \end{aligned}$$

If in addition \(\dot{H}=\dot{H}^1\otimes \dot{H}^2\) and \(\bar{H}=\bar{H}^1\otimes \bar{H}^2\), then \(\varvec{F}_2(H)=\varvec{F}(H^1)+\varvec{F}(H^2)\).

Proof

Let \(K^j(\underline{{\sigma }}^j,\underline{{\texttt {L}}})\equiv \hat{I}^\text {lit}(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})g^j(\underline{{\sigma }}^j)/2^k\). This defines a probability measure on \((\Omega _T)^k\times \{{\texttt {0}},{\texttt {1}}\}^k\) where the marginal on \((\Omega _T)^k\) is \(\hat{H}^j\), and the marginal on \(\{{\texttt {0}},{\texttt {1}}\}^k\) is uniform by the assumption on \(g^j\). It follows that \(K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}}) = \hat{I}^\text {lit}(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})g^j(\underline{{\sigma }}^j)\) and \(\hat{H}^j(\underline{{\sigma }}^j)=\mathbb {E}^{\text {lit}} [K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}})]\). Let \((X^1,X^2,L)\) be a random variable with law

$$\begin{aligned} \mathbb {P}((X^1,X^2,L)=(\underline{{\sigma }}^1,\underline{{\sigma }}^2,\underline{{\texttt {L}}})) =\frac{1}{2^k}\prod _{j=1,2} K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}})\,. \end{aligned}$$

The marginal law of L is uniform on \(\{{\texttt {0}},{\texttt {1}}\}^k\), and the \(X^j\) are conditionally independent given L. The marginal law of \((X^1,X^2)\) is \(\hat{H}\), and the marginal law of \(X^j\) is \(\hat{H}^j\). The law of L conditional on \(X^j\) is uniform over \(2^k{\hat{v}}(X^j)\) possibilities, whereas the law of L conditional on \((X^1,X^2)\) is uniform over \(2^k{\hat{v}}_2(X^1,X^2)\) possibilities. It follows that

$$\begin{aligned} \mathcal {H}(\hat{H})&=\mathcal {H}(X^1,X^2) =\mathcal {H}(L)-\mathcal {H}(L|X^1,X^2) +\sum _{j=1,2}\mathcal {H}(X^j|L) \\&=-\langle \hat{H},\ln {\hat{v}}_2\rangle +\sum _{j=1,2}\mathcal {H}(X^j|L)\\&= -\langle \hat{H},\ln {\hat{v}}_2\rangle +\sum _{j=1,2} [\mathcal {H}(\hat{H}^j)+ \langle \hat{H}^j,\ln {\hat{v}}\rangle ]\,. \end{aligned}$$

Rearranging gives \(\varvec{\Sigma }_2(H)=\varvec{\Sigma }(H^1)+\varvec{\Sigma }(H^2)\), and the result follows.\(\square \)
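The entropy decomposition in this proof is the chain rule combined with conditional independence of \((X^1,X^2)\) given L. A self-contained numerical check on a toy joint law (with arbitrary seeded conditional distributions standing in for the \(K^j\); purely illustrative):

```python
import math
import random

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

random.seed(0)
L_SIZE, X_SIZE = 4, 3      # toy stand-ins for {0,1}^k and (Omega_T)^k

def random_dist(size):
    w = [random.random() for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

# Conditional laws K^j(. | L): X^1 and X^2 are independent given L, with L uniform.
K1 = [random_dist(X_SIZE) for _ in range(L_SIZE)]
K2 = [random_dist(X_SIZE) for _ in range(L_SIZE)]
joint = {(x1, x2, l): K1[l][x1] * K2[l][x2] / L_SIZE
         for x1 in range(X_SIZE) for x2 in range(X_SIZE) for l in range(L_SIZE)}

# Marginal of (X^1, X^2) -- the analogue of hat H.
marg12 = {}
for (x1, x2, l), p in joint.items():
    marg12[(x1, x2)] = marg12.get((x1, x2), 0.0) + p

h_12 = entropy(marg12.values())
h_L = math.log(L_SIZE)
h_L_given_12 = entropy(joint.values()) - h_12
h_X_given_L = sum(entropy(K[l]) / L_SIZE for K in (K1, K2) for l in range(L_SIZE))

# H(X^1,X^2) = H(L) - H(L | X^1,X^2) + H(X^1 | L) + H(X^2 | L)
assert abs(h_12 - (h_L - h_L_given_12 + h_X_given_L)) < 1e-9
```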

It will be clear from the explicit definition that the measure \(H_\star \) of Proposition 3.4 satisfies \(\hat{H}_\star (\underline{{\sigma }})={\hat{v}}(\underline{{\sigma }})g_\star (\underline{{\sigma }})\) where \(g_\star \) is invariant to literals. Let \(H_\bullet =(\dot{H}_\bullet ,\hat{H}_\bullet ,\bar{H}_\bullet )\) where

$$\begin{aligned}\hat{H}_\bullet (\underline{{\sigma }}^1,\underline{{\sigma }}^2) \equiv {\hat{v}}_2(\underline{{\sigma }}^1,\underline{{\sigma }}^2) \prod _{j=1,2} g_\star (\underline{{\sigma }}^j)\,, \end{aligned}$$

\(\dot{H}_\bullet =\dot{H}_\star \otimes \dot{H}_\star \), and \(\bar{H}_\bullet =\bar{H}_\star \otimes \bar{H}_\star \). The following is the second moment analogue of Proposition 3.4.

Proposition 3.10

(proved in Sect. 5) The unique maximizer of \(\varvec{F}_2\) in \(\mathbf {N}_\text {se}\) is \(H_\bullet \). Moreover, there is a positive constant \(\epsilon =\epsilon (k,\lambda ,T)\) so that for \(\Vert {H-H_\bullet } \Vert \leqslant \epsilon \) we have \(\varvec{F}_2(H) \leqslant \varvec{F}_2(H_\bullet )-\epsilon \Vert {H-H_\bullet } \Vert ^2\).

Corollary 3.11

For the coloring model, the second moment contribution from the near-uncorrelated regime \(\mathbf {N}_\text {se}\) is given by the estimate \(\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\asymp \exp \{2n\varvec{F}(H_\star )\}\).

Proof

Recall from Corollary 3.5 the definition of \((\dot{\wp },\hat{\wp },\bar{\wp })\) for the single-copy model, and define \((\dot{\wp }_2,\hat{\wp }_2,\bar{\wp }_2)\) analogously for the pair model. Let \(\wp _2 \equiv \dot{\wp }_2+\hat{\wp }_2-\bar{\wp }_2-1\). For any \(H^1,H^2\in \mathbf {N}\), let \(\mathbf {N}_{\text {se},H^1,H^2}\) denote the set of \(H\in \mathbf {N}_\text {se}\) with single-copy marginals \(H^1,H^2\). This is a space of dimension \(\wp _2-2\wp \), and it follows from Proposition 3.10 and Lemma 4.4 that

$$\begin{aligned} \mathbb {E}[\varvec{Z}^2(\mathbf {N}_{\text {se},H^1,H^2})] \asymp \frac{\exp \{ n[ \varvec{F}_2(H_\bullet )-\Theta ( \Vert (H^1,H^2)-(H_\star ,H_\star )\Vert ^2)]\}}{n^\wp }\,. \end{aligned}$$

Summing over \(H^1,H^2\in \mathbf {N}\) then gives \(\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\asymp \exp \{ n\varvec{F}_2(H_\bullet ) \}\), which in turn equals \(\exp \{2n\varvec{F}(H_\star )\}\) by applying Lemma 3.9.\(\square \)

3.4 Conclusion of main result

We now explain how the main theorem follows. We continue to assume, as we have done throughout the section, that \(k\geqslant k_0\), \(\alpha \) satisfies (3), and \(0\leqslant \lambda \leqslant 1\). The measure \(H_\star \) of Proposition 3.4 depends on \(\lambda \) and T, and we now make this explicit by writing \(H_\star \equiv H_{\lambda ,T}\).

Corollary 3.12

For any \(0\leqslant \lambda \leqslant 1\) and T finite such that \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, the separable contribution to \(\varvec{Z}_{\lambda ,T}\) is well-concentrated about its mean:

$$\begin{aligned}\lim _{\epsilon \downarrow 0} \liminf _{n\rightarrow \infty } \mathbb {P}\bigg (\epsilon (\mathbb {E}\varvec{S}) \leqslant \varvec{S}\leqslant \frac{\mathbb {E}\varvec{S}}{\epsilon }\bigg )=1\,. \end{aligned}$$

Proof

The upper bound follows trivially from Markov’s inequality, so the task is to show the lower bound. In the first moment, we have \(\mathbb {E}\varvec{S}\asymp \exp \{n\varvec{F}(H_\star )\}\) by Corollary 3.5 and Proposition 3.7. In the second moment, since \(\varvec{S}\leqslant \varvec{Z}\), we have \(\mathbb {E}(\varvec{S}^2) \leqslant \mathbb {E}[\varvec{S}^2(\mathbf {N}_\text {ns})] +\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\). Combining with Corollaries 3.8 and 3.11 gives

$$\begin{aligned} \frac{\mathbb {E}(\varvec{S}^2)}{(\mathbb {E}\varvec{S})^2} \leqslant \frac{\exp \{n \lambda \varvec{s}(H_\star ) + o(n) \}}{\mathbb {E}\varvec{S}} + O(1) \asymp \frac{\exp \{o(n)\}}{\exp \{n \varvec{\Sigma }(H_\star )\}} +O(1)\,, \end{aligned}$$

which immediately implies \(\mathbb {P}(\varvec{S}\geqslant \delta (\mathbb {E}\varvec{S}))\geqslant \delta \) for some positive constant \(\delta \). This can be strengthened to the asserted concentration result by an easy adaptation of the method described in [25, Sec. 6].\(\square \)
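The step from a bounded second-moment ratio to a positive-probability lower bound is the Paley–Zygmund inequality, \(\mathbb {P}(\varvec{S}\geqslant \delta \,\mathbb {E}\varvec{S}) \geqslant (1-\delta )^2 (\mathbb {E}\varvec{S})^2/\mathbb {E}(\varvec{S}^2)\) for nonnegative \(\varvec{S}\) and \(0\leqslant \delta \leqslant 1\). A quick check on seeded random nonnegative distributions (illustrative only):

```python
import random

random.seed(1)

def paley_zygmund_holds(values, probs, delta):
    """Check P(S >= delta * E S) >= (1-delta)^2 (E S)^2 / E(S^2) for a discrete law."""
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if v >= delta * mean)
    return tail >= (1 - delta) ** 2 * mean ** 2 / second - 1e-12

for _ in range(100):
    values = [random.random() for _ in range(6)]         # nonnegative S
    weights = [random.random() for _ in range(6)]
    probs = [w / sum(weights) for w in weights]
    for delta in (0.1, 0.5, 0.9):
        assert paley_zygmund_holds(values, probs, delta)
```

In the proof above the ratio \(\mathbb {E}(\varvec{S}^2)/(\mathbb {E}\varvec{S})^2\) is O(1) once \(\varvec{\Sigma }(H_\star )>0\), which is exactly what makes the bound nontrivial.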

Proposition 3.13

(proved in “Appendix B”) For \(0\leqslant \lambda \leqslant 1\) and \(H_{\lambda ,T}=H_\star \in \varvec{\Delta }\) as given by Proposition 3.4, the triple \((\varvec{s}(H_{\lambda ,T}),\varvec{\Sigma }(H_{\lambda ,T}),\varvec{F}(H_{\lambda ,T}))\) converges as \(T\rightarrow \infty \) to \((s_\lambda ,\Sigma (s_\lambda ),{\mathfrak {F}}(\lambda ))\) from Definition 1.3.

Proof of Theorem 1

In “Appendix E” we prove the upper bound, \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) for all \(0\leqslant \alpha <\alpha _\text {sat}\). For any \(\lambda ,T\) such that \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, Corollary 3.12 gives

$$\begin{aligned} \liminf _{n\rightarrow \infty }(\varvec{Z}_{\lambda ,T}(\mathbf {N}))^{1/n} \geqslant \lim _{n\rightarrow \infty }(\varvec{S}_{\lambda ,T})^{1/n} =\exp \{\varvec{F}(H_{\lambda ,T})\} =\exp \{\varvec{\Sigma }(H_{\lambda ,T}) +\lambda \varvec{s}(H_{\lambda ,T})\}\,. \end{aligned}$$

On the other hand, \(\varvec{Z}_{\lambda ,T}(\mathbf {N})\) consists entirely of clusters of size \(\exp \{n\varvec{s}(H_{\lambda ,T}) + o(n)\}\). Therefore, if \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, it must be that \({\textsf {f}}(\alpha )\geqslant \varvec{s}(H_{\lambda ,T})\). The lower bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) then follows by appealing to Proposition 3.13, so the theorem is proved. \(\square \)

The next two sections are devoted to the optimization of \(\varvec{F}\) in \(\mathbf {N}_\circ \), and of \(\varvec{F}_2\) in \(\mathbf {N}_\text {se}\). In Sect. 4 we show that the optimization of \(\varvec{F}\) and \(\varvec{F}_2\) over small regions can be reduced to an optimization problem on trees. In Sect. 5 we solve the tree optimization problem by connecting it to the analysis of the bp recursion for the coloring model. This allows us to prove Propositions 3.4 and 3.10, thereby completing the proof of the main result Theorem 1.

4 Reduction to tree optimization by local updates

In this section we prove the key reduction that ultimately allows us to compute the (first and second) moments of \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\) (Propositions 3.4 and 3.10). As we have already seen, the calculation reduces to the optimization of functions \(\varvec{F}\) and \(\varvec{F}_2\) from (42) and (45). These functions are generally not convex over the entirety of their domains \(\varvec{\Delta }\) and \(\varvec{\Delta }_2\), but we expect them to be convex in neighborhoods around their maximizers \(H_\star \) and \(H_\bullet \) (as given in Definition 5.6 below). With this in mind, we rely on other means (a priori estimates and separability) to restrict the domains—from \(\varvec{\Delta }\) to \(\mathbf {N}_\circ \) in the first moment (Lemma 3.3), and from \(\varvec{\Delta }_2\) to \(\mathbf {N}_\text {se}\) in the second moment (Corollary 3.8). Within these restricted regions, we will show that \(\varvec{F}\) and \(\varvec{F}_2\) can be optimized by a local update procedure that reduces the (nonconvex) graph optimization to a (convex) tree optimization.

Fig. 4

One step of the local update procedure. In this figure, open circles indicate variable factors \(\dot{\Phi }\), solid squares indicate clause factors \(\hat{\Phi }^\text {lit}\), and each variable-clause edge is bisected by a small dot indicating the edge factor \(\bar{\Phi }\). The initial state (top panel) is a triple \((\mathscr {G},Y,\underline{{\sigma }})\) where \(\mathscr {G}\) is an nae-sat instance, Y is a subset of variables (blue circles), and \(\underline{{\sigma }}\) is a coloring on \(\mathscr {G}\) (not shown). Let \(\mathscr {N}\equiv \mathscr {N}(Y)\) be the subgraph of \(\mathscr {G}\) induced by the variables in Y, together with the clauses neighboring Y and their incident half-edges (shown in blue, above the dashed line). The local update procedure resamples the edge literals on \(\mathscr {N}\), as well as the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\), to produce a modified instance \(\mathscr {G}'\). The coloring is updated accordingly so that we arrive at the new state \((\mathscr {G}',Y,\underline{{\eta }})\) (bottom panel). In both \(\mathscr {G}\) and \(\mathscr {G}'\), the edges cut by the dashed lines will be referred to as cut edges. The half-edges just above the dashed lines will be referred to as the leaf edges of \(\mathscr {N}\) (color figure online)

4.1 Local update

We begin with an overview. Throughout this section, we assume \(1\leqslant T<\infty \). Suppose \(\underline{{\sigma }}\) is a T-coloring on \(\mathscr {G}\). Sample from \(\mathscr {G}\) a subset of variables Y, and let \(\mathscr {N}\equiv \mathscr {N}(Y)\) be the subgraph of \(\mathscr {G}\) induced by Y, together with the clauses neighboring Y and their incident half-edges. The half-edges at the boundary of \(\mathscr {N}\) will be referred to as the leaf edges of \(\mathscr {N}\).

Form a modified instance \(\mathscr {G}'\) (see Fig. 4) by resampling the edge literals on \(\mathscr {N}\) as well as the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\). In both \(\mathscr {G}\) and \(\mathscr {G}'\), we use the term cut edges for the edges \(e=(av)\) where a is a clause in \(\mathscr {N}\) and v is a variable in the complement of \(\mathscr {N}\); these are the edges cut by the dashed lines in Fig. 4. According to our terminology, the leaf edges of \(\mathscr {N}\) are the half-edges that lie just above the dashed lines, so each leaf edge of \(\mathscr {N}\) is half of a cut edge. The coloring is updated accordingly to produce \(\underline{{\eta }}\), a T-coloring on \(\mathscr {G}'\) which agrees as much as possible with \(\underline{{\sigma }}\) on \(\mathscr {G}{\setminus }\mathscr {N}\): in particular, \(\underline{{\sigma }}\) and \(\underline{{\eta }}\) will agree in the variable-to-clause colors on the cut edges. We will define the procedure so that it gives a Markov chain \(\pi \) on triples \((\mathscr {G},Y,\underline{{\sigma }})\) with reversing measure given by \(\mu (\mathscr {G},Y,\underline{{\sigma }})=\mathbb {P}(\mathscr {G})\mathbb {P}(Y\,|\,\mathscr {G})\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \). (Note that \(\mu \) is not normalized to be a probability measure.)

Reversibility implies that for any subset A of the state space, if B is the set of states reachable in one step from A, then

$$\begin{aligned} \mu (A)&= \sum _\mathrm{{\textsc {a}}\in A} \sum _\mathrm{{\textsc {b}}\in B}\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}}) =\sum _\mathrm{{\textsc {a}}\in A}\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}})\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}}) =\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}}) \sum _\mathrm{{\textsc {a}}\in A} \pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})\nonumber \\&=\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}}) \pi (\mathrm {\textsc {b}},A) \leqslant \bigg \{\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}})\bigg \} \bigg \{ \max _\mathrm{{\textsc {b}}\in B}\pi (\mathrm {\textsc {b}},A) \bigg \} = \mu (B)\max _\mathrm{{\textsc {b}}\in B}\pi (\mathrm {\textsc {b}},A) \,. \end{aligned}$$
(46)
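The inequality (46) can be sanity-checked on any small reversible chain. Below, a toy chain built from symmetric conductances (an arbitrary example, not the actual update chain), with \(\mu \) unnormalized as in the text:

```python
# Toy reversible chain: c[x][y] = c[y][x] are symmetric conductances,
# mu(x) = sum_y c[x][y], pi(x, y) = c[x][y] / mu(x), so mu is a reversing measure.
c = [
    [0.0, 2.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 3.0, 0.5, 0.0],
    [1.0, 3.0, 0.0, 1.0, 4.0],
    [0.0, 0.5, 1.0, 0.0, 2.0],
    [0.0, 0.0, 4.0, 2.0, 0.0],
]
mu = [sum(row) for row in c]

def pi(x, y):
    return c[x][y] / mu[x]

A = [0, 1]
# B = states reachable in one step from A.
B = sorted({y for a in A for y in range(5) if pi(a, y) > 0})
pi_b_to_A = {b: sum(pi(b, a) for a in A) for b in B}

lhs = sum(mu[a] for a in A)                              # mu(A)
rhs = sum(mu[b] for b in B) * max(pi_b_to_A.values())    # mu(B) * max_b pi(b, A)
assert lhs <= rhs + 1e-12                                # the bound (46)
```

Reversibility enters through the intermediate identity \(\mu (A)=\sum _{\textsc {b}\in B}\mu (\textsc {b})\pi (\textsc {b},A)\), which holds exactly for this chain.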

We will design the sampling procedure to ensure that (i) the vertices in Y are far from one another and from any short cycles, and (ii) the empirical measure \(H^\text {sm}\) of \(\underline{{\sigma }}\) on \(\mathscr {N}\) is close to \(H^\text {sy}\), a certain symmetrization of the overall empirical measure \(H(\mathcal {G},\underline{{\sigma }})\). Then \(\mathbb {E}\varvec{Z}(H)\approx \mu (A)\) where A is the set of states \((\mathscr {G},Y,\underline{{\sigma }})\) with \(H^\text {sm}\approx H^\text {sy}\). The update produces a state \((\mathscr {G}',Y,\underline{{\eta }})\in B\) with possibly different \(H^\text {sm}\), but with the same empirical measure \(\dot{h}^\text {tr}(H^\text {sm})\) of variable-to-clause colors \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}\). Bounding \(\pi (\mathrm {\textsc {b}},A)\) reduces to calculating the weight of configurations on \(\mathscr {N}\) with empirical measure \(H^\text {sm}\approx H^\text {sy}\), relative to the weight of all configurations on \(\mathscr {N}\) with empirical measure \(\dot{h}^\text {tr}(H^\text {sm})\approx \dot{h}^\text {tr}(H^\text {sy})\) on the leaf edges of \(\mathscr {N}\). Because \(\mathscr {N}\) is a disjoint union of trees, this reduces to a convex optimization problem which lends itself much more readily to analysis. The purpose of the current section is to formalize this graphs-to-trees reduction. We begin with the precise definitions of \(H^\text {sy}\), \(H^\text {sm}\) and \(\dot{h}^\text {tr}(H^\text {sm})\). Recall our notation \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega \) from the discussion following (33). As \(\sigma \) goes over all of \(\Omega _T\), write \(\dot{\Omega }_T\) for the possible values of \(\dot{\sigma }\), and \(\hat{\Omega }_T\) for the possible values of \(\hat{\sigma }\). Let \(\dot{\Omega }\equiv \dot{\Omega }_\infty \) and \(\hat{\Omega }\equiv \hat{\Omega }_\infty \), so

$$\begin{aligned} \dot{\Omega }&\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup (\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \})\,,\\ \hat{\Omega }&\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup (\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}) \end{aligned}$$

Definition 4.1

(sample empirical measures) Given an nae-sat instance \(\mathscr {G}\equiv (\mathcal {G},\underline{{\texttt {L}}})\), a T-coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\), and a nonempty subset of variables \(Y\subseteq V\), we record the local statistics of “\(\underline{{\sigma }}\) around Y” as follows. Let \(\dot{H}^\text {sm}\) be the empirical measure of variable-incident colorings in Y: for \(\underline{{\eta }}\in (\Omega _T)^d\),

$$\begin{aligned} \dot{H}^\text {sm}(\underline{{\eta }}) \equiv \frac{1}{|Y|}\sum _{v\in Y} \mathbf {1}\{\underline{{\sigma }}_{\delta v}=\underline{{\eta }}\}\,. \end{aligned}$$

Let \(\bar{H}^\text {sm}\) be the empirical measure of colors on the edges incident to Y: for \(\eta \in \Omega _T\),

$$\begin{aligned} \bar{H}^\text {sm}(\eta ) \equiv \frac{1}{|Y|d} \sum _{v\in Y}\sum _{e\in \delta v} \mathbf {1}\{\sigma _e=\eta \}\,. \end{aligned}$$

For \(\underline{{\eta }}\in (\Omega _T)^k\) and \(1\leqslant j\leqslant k\) define the rotation \(\underline{{\eta }}^{(j)}\equiv (\eta _j,\ldots ,\eta _k,\eta _1,\ldots ,\eta _{j-1})\). For any \(v\in Y\) and \(e\in \delta v\), let j(e) be the index of e in \(\delta a(e)\). For \(\underline{{\eta }}\in (\Omega _T)^k\) let

$$\begin{aligned} \hat{H}^\text {sm}(\underline{{\eta }})\equiv \frac{1}{|Y|d}\sum _{v\in Y}\sum _{e\in \delta v} \mathbf {1}\{(\underline{{\sigma }}_{\delta a(e)})^{(j(e))}=\underline{{\eta }}\}\,. \end{aligned}$$

Then \(H^\text {sm}\equiv (\dot{H}^\text {sm},\hat{H}^\text {sm},\bar{H}^\text {sm})\) is the sample empirical measure for the state \((\mathscr {G},Y,\underline{{\sigma }})\); we shall write this hereafter as \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\). Note that \(H^\text {sm}\) lies in the space \(\varvec{\Delta }^\text {sm}\) which is defined similarly to \(\varvec{\Delta }\) but with condition (40) replaced by

$$\begin{aligned} \frac{1}{d}\sum _{\underline{{\sigma }}\in \Omega ^d} \dot{H}^\text {sm}(\underline{{\sigma }})\sum _{i=1}^d \mathbf {1}\{\sigma _i=\tau \} =\bar{H}^\text {sm}(\tau ) =\sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}^\text {sm}(\underline{{\sigma }})\mathbf {1}\{\sigma _1=\tau \}\,. \end{aligned}$$
(47)

We emphasize that (40) and (47) differ on the right-hand side. However, if we have \(H=(\dot{H},\hat{H},\bar{H})\in \varvec{\Delta }\) such that \(\hat{H}\) is invariant under rotation of the indices \(1\leqslant j\leqslant k\), then \(H\in \varvec{\Delta }^\text {sm}\) as well. With this in mind, for \(H\in \varvec{\Delta }\) we define \(H^\text {sy}\equiv (\dot{H},\hat{H}^\text {sy},\bar{H})\in \varvec{\Delta }^\text {sm}\) where \(\hat{H}^\text {sy}\) is the average over all k rotations of \(\hat{H}\). Later we will sample Y such that \(H^\text {sm}\) falls very close to \(H^\text {sy}\) with high probability. Lastly, for any \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) we let \(\dot{h}^\text {tr}(H^\text {sm})\) be the measure on \(\dot{\Omega }_T\) given by

$$\begin{aligned}{}[\dot{h}^\text {tr}(H^\text {sm})](\dot{\eta }) \equiv \frac{1}{k-1} \sum _{\underline{{\sigma }}\in (\Omega _T)^k} \sum _{j=2}^k \mathbf {1}\{\dot{\sigma }_j=\dot{\eta }\}\hat{H}^\text {sm}(\underline{{\sigma }})\,. \end{aligned}$$

Thus \(\dot{h}^\text {tr}(H^\text {sm})\) represents the empirical measure of spins \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}\), i.e., the edges cut by the dashed lines in Fig. 4.
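The two operations just introduced, the rotation-average \(\hat{H}\mapsto \hat{H}^\text {sy}\) and the leaf-edge measure \(\dot{h}^\text {tr}\), can be made concrete in a few lines. The following Python sketch uses a toy alphabet of plain labels in place of \(\Omega _T\) and ignores the \((\dot{\sigma },\hat{\sigma })\) decomposition (so the leaf-edge measure is read off from the tuple entries directly); the function names are ours, not the paper's.

```python
from collections import Counter
from fractions import Fraction

def symmetrize(H_hat):
    """The map H_hat -> H_hat^sy: average a measure on k-tuples
    over all k cyclic rotations of the indices."""
    H_sy = Counter()
    for tup, mass in H_hat.items():
        k = len(tup)
        for j in range(k):
            H_sy[tup[j:] + tup[:j]] += Fraction(mass, k)
    return H_sy

def h_tr(H_hat):
    """Toy version of h^tr: empirical measure of the entries in
    positions 2..k, each tuple weighted by H_hat and 1/(k-1)."""
    h = Counter()
    for tup, mass in H_hat.items():
        k = len(tup)
        for entry in tup[1:]:
            h[entry] += Fraction(mass, k - 1)
    return h

# After symmetrizing, the leaf-edge measure agrees with the overall
# frequency of each label, as rotation invariance suggests.
H = {("a", "b", "b"): 1}
leaf = h_tr(symmetrize(H))
```

Here \(k=3\) and the single tuple has one `a` and two `b`'s, so `leaf` assigns mass \(1/3\) to `a` and \(2/3\) to `b`.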

For \(H\in \varvec{\Delta }\), recall from (43) that \(\varvec{F}(H)=\varvec{\Sigma }(H)+\lambda \varvec{s}(H)\) where \(\varvec{\Sigma }(H)\) is the cluster complexity and \(\varvec{s}(H)\) is (the exponential rate of) the cluster size. The tree analogue of \(\varvec{\Sigma }(H)\) is \(\varvec{\Sigma }^\text {tr}(H^\text {sm})\) where

$$\begin{aligned} \varvec{\Sigma }^\text {tr}(H) \equiv \mathcal {H}(\dot{H}) + d\mathcal {H}(\hat{H}) -d\mathcal {H}(\bar{H}) +\varvec{v}(H)\end{aligned}$$
(48)

— the only difference being that \(\varvec{\Sigma }\) has coefficient \(\alpha =d/k\) on the clause entropy term \(\mathcal {H}(\hat{H})\), while \(\varvec{\Sigma }^\text {tr}\) has coefficient d. As we see below, this occurs because the ratio of variables to clauses to edges is 1 : d : d for the disjoint union of trees \(\mathscr {N}\), versus \(1:\alpha :d\) for the full graph \(\mathscr {G}\). We will also see that \(\varvec{\Sigma }^\text {tr}\) is always concave, though \(\varvec{\Sigma }\) need not be. Likewise, the tree analogue of \(\varvec{s}(H)\) is \(\varvec{s}^\text {tr}(H^\text {sm})\) where

$$\begin{aligned} \varvec{s}^\text {tr}(H) \equiv \langle \ln \dot{\Phi },\dot{H}\rangle +d\langle \ln \hat{F}, \hat{H}\rangle +d\langle \ln \bar{\Phi },\bar{H}\rangle \,.\end{aligned}$$

The tree analogue of \(\varvec{F}(H)\) is \(\varvec{\Lambda }(H^\text {sm})\) where

$$\begin{aligned} \varvec{\Lambda }(H) \equiv \varvec{\Sigma }^\text {tr}(H)+ \lambda \varvec{s}^\text {tr}(H)\,.\end{aligned}$$
(49)

Recall Definition 4.1: given \((\mathscr {G},Y,\underline{{\sigma }})\) with sample empirical measure \(H^\text {sm}\), the empirical measure of spins \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}(Y)\) is given by \(\dot{h}^\text {tr}=\dot{h}^\text {tr}(H^\text {sm})\). Then, for any probability measure \({\dot{h}}\) on \(\dot{\Omega }_T\), we let

$$\begin{aligned} \varvec{\Lambda }^\text {op}({\dot{h}}) \equiv \sup \{\varvec{\Lambda }(H) : H\in \varvec{\Delta }^\text {sm}\text { with } \dot{h}^\text {tr}(H) ={\dot{h}} \}\,, \end{aligned}$$

where we emphasize that the supremum is taken over \(\varvec{\Delta }^\text {sm}\) rather than \(\varvec{\Delta }\). For \(H\in \varvec{\Delta }^\text {sm}\) we define

$$\begin{aligned} \varvec{\Xi }(H) \equiv \varvec{\Xi }_{\lambda ,T}(H) \equiv \varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))- \varvec{\Lambda }(H)\,.\end{aligned}$$
(50)

The interpretation of \(\varvec{\Xi }\), formalized below, is that for any \(H\in \varvec{\Delta }\), if A is the set of states with \(H^\text {sm}\approx H^\text {sy}\) and B is the set of states reachable in one step of the chain from A, then \(\max \{\pi (\mathrm {\textsc {b}},A):\mathrm {\textsc {b}}\in B\}\) is approximately \(\exp \{-|Y|\,\varvec{\Xi }(H^\text {sy})\}\), where we note that \(\varvec{\Xi }(H^\text {sy})\geqslant 0\) since \(\varvec{\Xi }\) is nonnegative on all of \(\varvec{\Delta }^\text {sm}\), and \(H^\text {sy}\in \varvec{\Delta }\cap \varvec{\Delta }^\text {sm}\). Formally, we have the following bound:

Theorem 4.2

For \(\epsilon \) small enough (depending only on \(d,k,T\)), it holds for all \(H\in \varvec{\Delta }\) that

$$\begin{aligned}\varvec{F}(H)\leqslant \max \Big \{\varvec{F}(H'): \Vert {H'-H} \Vert \leqslant \epsilon (dk)^{2T}\Big \}- \epsilon \,\varvec{\Xi }(H^\text {sy})\,. \end{aligned}$$

The analogous statement holds in the second moment with \(\varvec{F}_2=\varvec{F}_{2,\lambda ,T}\) and \(\varvec{\Xi }_2=\varvec{\Xi }_{2,\lambda ,T}\).

For the sake of exposition, we will give the proof of Theorem 4.2 for \(\varvec{F}\) only; the assertion for \(\varvec{F}_2\) follows from the same argument with essentially no modifications. The first task is to define the Markov chain that was informally discussed above. There are a few issues to be addressed: how to sample Y ensuring certain desirable properties; how to resample the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\); and how to produce a valid coloring \(\underline{{\eta }}\) on \(\mathscr {G}'\) without changing the spins \(\dot{\sigma }\) on the cut edges. We address the last issue next.

4.2 Tree updates

Recall that in the bipartite factor graph \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), each edge joins a variable to a clause and is defined to have length one-half. For the discussion that follows, it is useful to bisect each edge \(e\in E\) with an artificial vertex indicating the edge factor \(\bar{\Phi }\); these are shown as small dots in Fig. 4. Thus an edge e joining \(a\in F\) to \(v\in V\) becomes two quarter-length edges, (ae) and (ev), where e now refers to the artificial vertex. Given a coloring \(\underline{{\sigma }}\) on the original graph, we obtain a coloring on the new graph by simply duplicating the color on each edge, setting \(\sigma _{ae}=\sigma _e=\sigma _{ev}\). We then define \(\mathscr {N}(v)\) as the (5/8)-neighborhood of variable v, and define \(\mathscr {N}=\mathscr {N}(Y)\) as the union of \(\mathscr {N}(v)\) for all \(v\in Y\): in the top panel of Fig. 4, \(\mathscr {N}\) is the subgraph shown in blue, above the dashed line.

Directly below the same dashed line, the small solid orange dots correspond to the boundary edges, hereafter denoted \(\mathcal {W}\), of the cavity graph \(\mathscr {G}_\partial \equiv \mathscr {G}{\setminus }\mathscr {N}\). For \(e\in \mathcal {W}\), let \(\mathscr {T}\) be its neighborhood in \(\mathscr {G}{\setminus }\mathscr {N}\) of some radius \(\ell >T\) where \(2\ell \) is a positive integer. Assuming e is not close to a short cycle, \(\mathscr {T}\) is what we will call a directed tree rooted at e. In this case we also call \(\mathscr {T}\) a variable-to-clause tree since the root edge has no incident clause; a clause-to-variable tree is similarly defined. We always visualize a directed tree \(\mathscr {T}\) as in Fig. 5, with the root edge e at the top, so that paths leaving the root travel downwards. On an edge \(e=(av)\), the upward color is \(\dot{\sigma }_{av}\) if a lies above v, and \(\hat{\sigma }_{av}\) if v lies above a. We let \(\delta \mathscr {T}\) denote the boundary edges of \(\mathscr {T}\), not including the root edge.

Fig. 5

Two types of directed tree \(\mathscr {T}\). In each case the root edge e is shown at the top, and the boundary \(\delta \mathscr {T}\) is highlighted in purple. For \(e\in \mathcal {W}\), the unit-radius neighborhood of e in \(\mathscr {G}{\setminus }\mathscr {N}\) typically looks like the left panel (color figure online)

Suppose \(\underline{{\sigma }}\) is a valid T-coloring of a directed tree \(\mathscr {T}\) with root spin \(\sigma _e=\sigma \), and consider a new root spin \(\eta \in \Omega _T\). If \(\sigma \) and \(\eta \) agree on the upward color of the root edge, then there is a unique valid coloring

$$\begin{aligned}\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T}) \in (\Omega _T)^{E(\mathscr {T})}\end{aligned}$$

which has root spin \(\eta \), and agrees with \(\underline{{\sigma }}\) in all the upward colors. Indeed, the only possibility for \(\sigma \ne \eta \) is that both \(\sigma ,\eta \in \{{\texttt {f}}\}\). Then, recalling (23), the coloring \(\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\) is uniquely defined by recursively applying the mappings \(\dot{T}\) and \(\hat{T}\), starting from the root and continuing downwards. Since we assumed that \(\underline{{\sigma }}\) was a valid T-coloring and \(\eta \in \Omega _T\), it is easy to verify that the resulting \(\underline{{\eta }}\) is also a valid T-coloring, so the \(\mathrm {\textsf {update}}\) procedure respects the restriction to \(\Omega _T\). From now on we assume all edge colors belong to \(\Omega _T\).
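The root-to-leaves recursion behind \(\mathrm {\textsf {update}}\) can be sketched generically. The following Python toy is schematic only: the tree is given as a parent-to-children map on edges, `up` holds the (unchanged) upward colors, and `local_map` is a stand-in for the recursions \(\dot{T},\hat{T}\) of (23), which we do not reproduce; all names here are hypothetical.

```python
def update(children, up, root_new, local_map):
    """Schematic of eta = update(sigma, eta; T): fix a new value on the
    root edge, then recompute the downward value on every edge by a
    root-to-leaves recursion, keeping the upward values from 'up'."""
    down = {}
    def recurse(e, val):
        down[e] = val
        for c in children.get(e, ()):
            # the new downward value on a child edge is determined by the
            # parent's new value and the (unchanged) upward values of the
            # child's siblings
            sibs = tuple(up[s] for s in children[e] if s != c)
            recurse(c, local_map(val, sibs))
    recurse("root", root_new)
    return down
```

For instance, with `children = {"root": ["a", "b"], "a": ["c"]}`, `up = {"a": 1, "b": 2, "c": 3}`, and the toy rule `local_map = lambda v, sibs: v + sum(sibs)`, starting from the new root value 10 propagates a unique value to every edge below, just as the coloring \(\underline{{\eta }}\) is uniquely determined by its root spin.
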

Lemma 4.3

Suppose \(\underline{{\sigma }}\) is a valid T-coloring of the directed tree \(\mathscr {T}\) with root color \(\sigma \), and \(\eta \in \Omega _T\) agrees with \(\sigma \) on the upward color of the root edge. If \(\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\) agrees with \(\underline{{\sigma }}\) on the boundary \(\delta \mathscr {T}\), then \(\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }})=\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\eta }})\).

Proof

It follows from the construction that on any edge e in the tree, \(\sigma _e\) and \(\eta _e\) agree on the upward color; moreover, if \(\sigma _e\ne \eta _e\) then we must have \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\). For each vertex \(x\in \mathscr {T}\), let e(x) denote the parent edge of x, that is, the unique edge of \(\mathscr {T}\) which lies above x. We then have

$$\begin{aligned}\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }}) = \prod _{e\in \delta \mathscr {T}} \bar{\Phi }(\sigma _e) \prod _{v\in V(\mathscr {T})}\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \bar{\Phi }(\sigma _{e(v)})\bigg \} \prod _{a\in F(\mathscr {T})} \bigg \{ \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \bar{\Phi }(\sigma _{e(a)}) \bigg \}\,. \end{aligned}$$

For a clause a in \(\mathscr {T}\) with \(e(a)=e\), if both \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\) then it follows directly from (39) that

$$\begin{aligned} \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\Phi }(\sigma _e) = \hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\varphi }(\sigma _e) = \hat{z}(\hat{\sigma }_e) = \hat{z}(\hat{\eta }_e) = \hat{\Phi }^\text {lit}((\underline{{\eta }}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\Phi }(\eta _e)\,. \end{aligned}$$

For a variable v in \(\mathscr {T}\) with \(e(v)=e\), if \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\), then a similar calculation as (39) gives

$$\begin{aligned} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \bar{\Phi }(\sigma _e) =\dot{\varphi }(\underline{{\hat{\sigma }}}_{\delta v}) \bar{\varphi }(\sigma _e) =\dot{z}(\dot{\sigma }_e) = \dot{z}(\dot{\eta }_e) =\dot{\Phi }(\underline{{\eta }}_{\delta v}) \bar{\Phi }(\eta _e)\,, \end{aligned}$$

where we used the fact that necessarily we have \(\underline{{\sigma }}_{\delta v}\in \{{\texttt {f}}\}^d\). To conclude, we recall that \(\underline{{\sigma }}\) and \(\underline{{\eta }}\) agree on \(\delta \mathscr {T}\) by assumption, so we have \(\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }})=\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\eta }})\) as claimed. \(\square \)

We also use the directed tree as a device to prove the following lemma, which was used in the proofs of Corollaries 3.5 and 3.11.

Lemma 4.4

Let \({\dot{M}},{\hat{M}}\) be as defined in Corollary 3.5, and let \({\dot{M}}_2,{\hat{M}}_2\) be their analogues in the pair model. For any \(\sigma ,\eta \in \Omega \) there exists an integer-valued vector \((\dot{H},\hat{H})\) so that

$$\begin{aligned} \langle {\mathbf {1}},\dot{H}\rangle =0=\langle {\mathbf {1}},\hat{H}\rangle \quad \text {and}\quad {\dot{M}}\dot{H}-{\hat{M}}\hat{H}={\mathbf {1}}_\sigma -{\mathbf {1}}_\eta \,, \end{aligned}$$

where \({\mathbf {1}}\) denotes the all-ones vector, and \({\mathbf {1}}_\sigma \) denotes the vector which is one in the \(\sigma \) coordinate and zero elsewhere. The analogous statement holds for \(({\dot{M}}_2,{\hat{M}}_2)\).

Proof

We define a graph on \(\Omega _T\) by putting an edge between \(\sigma \) and \(\eta \) if there exist valid colorings \(\underline{{\sigma }},\underline{{\eta }}\) on some directed tree \(\mathscr {T}\) which take values \(\sigma ,\eta \) on the root edge, but agree on the boundary edges \(\delta \mathscr {T}\). If \(\sigma ,\eta \) are connected in this way, then taking

$$\begin{aligned} \dot{H}(\underline{{\rho }})&=\sum _{v\in V(\mathscr {T})} \mathbf {1}\{\underline{{\sigma }}_{\delta v}=\underline{{\rho }}\} -\sum _{v\in V(\mathscr {T})}\mathbf {1}\{\underline{{\eta }}_{\delta v}=\underline{{\rho }}\}\,, \quad \underline{{\rho }}\in (\Omega _T)^d\\ \hat{H}(\underline{{\rho }})&=\sum _{a\in F(\mathscr {T})} \mathbf {1}\{\underline{{\sigma }}_{\delta a}=\underline{{\rho }}\} -\sum _{a\in F(\mathscr {T})} \mathbf {1}\{\underline{{\eta }}_{\delta a}=\underline{{\rho }}\}\,, \quad \underline{{\rho }}\in (\Omega _T)^k. \end{aligned}$$

gives \({\dot{M}}\dot{H}-{\hat{M}}\hat{H}={\mathbf {1}}_\sigma -{\mathbf {1}}_{\eta }\) as required. It therefore suffices to show that the graph we have defined on \(\Omega _T\) is connected (hence complete). First, if \(\dot{\sigma }=\dot{\eta }\), it is clear that \(\sigma \) and \(\eta \) can be connected by colorings \(\underline{{\sigma }},\underline{{\eta }}\) of some variable-to-clause tree \(\mathscr {T}\), with \(\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\). Similarly, if \(\hat{\sigma }=\hat{\eta }\), then \(\sigma \) and \(\eta \) can be connected by a clause-to-variable tree. This implies that \(\{{\texttt {f}}\}\) is connected. Next, if \(\sigma ={\texttt {r}}_{{\varvec{x}}}\) and \(\eta ={\texttt {b}}_{{\varvec{x}}}\), then they can be connected by a variable-to-clause tree rooted at edge e, containing a single variable factor \(v=v(e)\), with \(\underline{{\sigma }}_{\delta v{\setminus } e}\) identically equal to \({\texttt {r}}_{{\varvec{x}}}\). If \(\sigma ={\texttt {b}}_{{\varvec{x}}}\) and \(\eta =(\dot{\tau },{\texttt {s}})\) for any \(\dot{\tau }\in \dot{\Omega }{\setminus }\{{\texttt {r}},{\texttt {b}}\}\), then they can be connected by a clause-to-variable tree rooted at edge e, containing a single clause factor \(a=a(e)\), with any \(\underline{{\sigma }}_{\delta a{\setminus } e}\) such that \((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a{\setminus } e}\) contains both \({\texttt {b}}_{\texttt {0}}\) and \({\texttt {b}}_{\texttt {1}}\) entries. It follows that \(\Omega _T\) is indeed connected, which proves the assertion concerning \(({\dot{M}},{\hat{M}})\). The proof for \(({\dot{M}}_2,{\hat{M}}_2)\) is very similar and we omit the details.\(\square \)

4.3 Markov chain

We now define a Markov chain on tuples \((\mathscr {G},Y,\underline{{\sigma }})\) where \(\mathscr {G}\) is an nae-sat instance, \(\underline{{\sigma }}\) is a valid T-coloring on \(\mathscr {G}\), and \(Y\subseteq V\) is a subset of variables such that

$$\begin{aligned} \text {the subgraphs }B_{2T}(v),\text { for }v\in Y,\text { are mutually disjoint trees,} \end{aligned}$$
(51)

where \(B_{2T}(v)\) is the 2T-neighborhood of \(v\) in \(\mathscr {G}\). Recall that \(\mathscr {N}=\mathscr {N}(Y)\) is the \(\tfrac{5}{8}\)-neighborhood of Y; we write \(\mathscr {N}= (\mathcal {N},\underline{{\texttt {L}}}_\mathcal {N})\) where \(\mathcal {N}\) is the graph without edge literals. Write \(\delta \mathcal {N}\) for the boundary of \(\mathcal {N}\), consisting of clause-incident edges that are not incident to Y (just above the dashed line in Fig. 4). Write \(\underline{{\sigma }}_\mathcal {N}\) for a T-coloring on \(\mathcal {N}\) (including \(\delta \mathcal {N}\)), and let

$$\begin{aligned} \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N}) \equiv \varvec{w}^\text {lit}_\mathscr {N}(\underline{{\sigma }}_\mathcal {N}) \equiv \prod _{v\in Y}\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{e\in \delta v} \Big \{ \hat{\Phi }^\text {lit}( (\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a(e)}) \bar{\Phi }(\sigma _e) \Big \}\bigg \}\,. \end{aligned}$$
(52)

On the other hand we have \(\mathscr {G}{\setminus }\mathscr {N}\equiv \mathscr {G}_\partial \equiv (\mathcal {G}_\partial ,\underline{{\texttt {L}}}_\partial )\) where \(\mathcal {G}_\partial \equiv (V_\partial ,F_\partial ,E_\partial )\), and \(\mathcal {W}\) denotes the boundary of \(\mathscr {G}_\partial \) (just below the dashed line in Fig. 4). Write \(\underline{{\sigma }}_\partial \) for a coloring on \(\mathcal {G}_\partial \) (including \(\mathcal {W}\)), and let

$$\begin{aligned} \varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial ) \equiv \prod _{v\in V_\partial } \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F_\partial } \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E_\partial } \bar{\Phi }(\sigma _e)\,. \end{aligned}$$

By matching \(\delta \mathcal {N}\) to \(\mathcal {W}\) (along the dashed line in Fig. 4), the graphs \(\mathscr {G}_\partial \) and \(\mathscr {N}\) combine to form the original instance \(\mathscr {G}\). If \(\underline{{\sigma }}\) is a valid coloring on \(\mathscr {G}\), then \(\underline{{\sigma }}_{\delta \mathcal {N}}\) and \(\underline{{\sigma }}_\mathcal {W}\) must agree, and we have

$$\begin{aligned} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }}) =\varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial ) \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}| \underline{{\texttt {L}}}_\mathcal {N})\,. \end{aligned}$$
(53)

Let \(\dot{h}^\text {tr}(\underline{{\sigma }}_{\delta \mathcal {N}})=\dot{h}^\text {tr}\) be the empirical measure of the spins \((\dot{\sigma }_e)_{e\in \delta \mathcal {N}}\). Given initial state \((\mathscr {G},Y,\underline{{\sigma }})\), we take one step of the Markov chain as follows:

  1.

    Detach \(\mathscr {N}\) from \(\mathscr {G}\). On \(\mathscr {N}\), sample a new assignment \((\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\) from the probability measure

    $$\begin{aligned} p( (\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\,|\, (\underline{{\texttt {L}}}_\mathcal {N},\underline{{\sigma }}_\mathcal {N})) =\frac{\mathbf {1}\{ \dot{h}^\text {tr}(\underline{{\eta }}_{\delta \mathcal {N}})=\dot{h}^\text {tr}\} \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }{z(|Y|,\dot{h}^\text {tr})}\end{aligned}$$
    (54)

    where the denominator is the normalizing constant obtained by summing over all possible \((\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\).

  2.

    Form the new graph \(\mathscr {G}'\) by sampling a uniformly random matching of \(\delta \mathcal {N}\) with \(\mathcal {W}\), subject to the constraint that \(e\in \mathcal {W}\) must be matched to \(e'\in \delta \mathcal {N}\) with \(\dot{\sigma }_e=\dot{\eta }_{e'}\). The number of such matchings depends only on |Y| and \(\dot{h}^\text {tr}\), so we denote it as \({\mathcal {M}}(|Y|,\dot{h}^\text {tr})\). For each matched pair \((e,e')\) where \(\hat{\sigma }_e\ne \hat{\eta }_{e'}\), let \(\mathscr {T}=\mathscr {T}(e)\) be the radius-2T neighborhood of e in the graph \(\mathscr {G}_\partial \). Let

    $$\begin{aligned} \underline{{\eta }}_{\mathscr {T}}\equiv \mathrm {\textsf {update}}(\underline{{\sigma }}_{\mathscr {T}},\eta _e;\mathscr {T}) \end{aligned}$$

    and note that, since \(\underline{{\sigma }}\) is a valid T-coloring, \(\underline{{\eta }}_{\mathscr {T}}\) and \(\underline{{\sigma }}_{\mathscr {T}}\) must agree at the boundary of \(\mathscr {T}\). Finally, on the rest of \(\mathcal {G}_\partial \) outside the radius-2T neighborhood of \(\mathcal {W}\), we simply take \(\underline{{\eta }}\) and \(\underline{{\sigma }}\) to be the same.

The state of the Markov chain after one step is \((\mathscr {G}',Y,\underline{{\eta }})\) where \(\underline{{\eta }}\) is a valid T-coloring on \(\mathscr {G}'\).

Lemma 4.5

Suppose we have a sampling mechanism for a random subset of variables Y in \(\mathscr {G}\) such that, whenever \((\mathscr {G},Y,\underline{{\sigma }})\) and \((\mathscr {G}',Y,\underline{{\eta }})\) appear in the same orbit of the Markov chain, we have

$$\begin{aligned} \mathbb {P}(Y\,|\,\mathscr {G})=\mathbb {P}(Y\,|\,\mathscr {G}')\,. \end{aligned}$$
(55)

A reversing measure for the Markov chain is then given by \(\mu (\mathscr {G},Y,\underline{{\sigma }}) = \mathbb {P}(\mathscr {G}) \mathbb {P}(Y\,|\,\mathscr {G}) \varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})^\lambda \).

Proof

Given \(\mathrm {\textsc {a}}=(\mathscr {G},Y,\underline{{\sigma }})\), let \(\mathrm {\textsc {b}}=(\mathscr {G}',Y,\underline{{\eta }})\) be any state reachable from \(\mathrm {\textsc {a}}\) in a single step of the chain. By the factorization (53), together with assumption (55) and the fact that \(\mathbb {P}(\mathscr {G})=\mathbb {P}(\mathscr {G}')\),

$$\begin{aligned} \frac{\mu (\mathrm {\textsc {a}})}{\mu (\mathrm {\textsc {b}})} =\frac{\mathbb {P}(\mathscr {G})\mathbb {P}(Y\,|\,\mathscr {G}) \varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\mathbb {P}(\mathscr {G}')\mathbb {P}(Y\,|\,\mathscr {G}') \varvec{w}^\text {lit}_\partial (\underline{{\eta }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda } =\frac{\varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\varvec{w}^\text {lit}_\partial (\underline{{\eta }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda } =\frac{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }\,, \end{aligned}$$

where the last identity is by Lemma 4.3. On the other hand, with \(\pi \) denoting the transition probabilities for the Markov chain, (54) implies

$$\begin{aligned}\frac{\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}})}{\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})} =\frac{ \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }{{\mathcal {M}}(|Y|,\dot{h}^\text {tr}) z(|Y|,\dot{h}^\text {tr})} \frac{{\mathcal {M}}(|Y|,\dot{h}^\text {tr}) z(|Y|,\dot{h}^\text {tr})}{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda } =\frac{\mu (\mathrm {\textsc {b}})}{\mu (\mathrm {\textsc {a}})}\,.\end{aligned}$$

Rearranging proves reversibility, \(\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}}) =\mu (\mathrm {\textsc {b}})\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})\). (We remark that since the Markov chain breaks up into many disjoint orbits, the reversing measure \(\mu \) is not unique.)\(\square \)
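The mechanism is easy to check numerically: within a single orbit the chain resamples its state with probability proportional to \(\varvec{w}^\lambda \), independently of the current state, and any such independence sampler is reversible for the measure proportional to \(\varvec{w}^\lambda \). A toy verification in Python, with made-up state weights:

```python
from itertools import product

def detailed_balance_gap(weights, lam):
    """For a chain whose next state is drawn with probability proportional
    to weight^lam (independently of the current state), return
    max |mu(a) pi(a,b) - mu(b) pi(b,a)| with mu proportional to weight^lam."""
    w = {s: v ** lam for s, v in weights.items()}
    Z = sum(w.values())
    mu = {s: ws / Z for s, ws in w.items()}
    pi = {(a, b): w[b] / Z for a, b in product(w, repeat=2)}
    return max(abs(mu[a] * pi[a, b] - mu[b] * pi[b, a])
               for a, b in product(w, repeat=2))

gap = detailed_balance_gap({"x": 1.0, "y": 2.0, "z": 0.5}, lam=0.7)
```

The gap vanishes identically, since \(\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}})\) reduces to the symmetric quantity \(w(\mathrm {\textsc {a}})^\lambda w(\mathrm {\textsc {b}})^\lambda /Z^2\), exactly as in the display above.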

4.4 From graph to tree optimizations

If Y satisfies condition (51) and we define \(\mathscr {N}=\mathscr {N}(Y)=(\mathcal {N},\underline{{\texttt {L}}})\) as before, then \(\mathcal {N}\) consists of \(|Y|\equiv s\) disjoint copies of the tree \(\mathcal {D}\) shown in Fig. 6. Recall from Definition 4.1 the definition of \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\), and note \(H^\text {sm}\) depends only on \(\underline{{\sigma }}_\mathcal {N}\). For any \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) we let \(\varvec{Z}(H^\text {sm};\mathscr {N})\) be the partition function of all colorings on \(\mathscr {N}\) with empirical measure \(H^\text {sm}\)—the only randomness comes from the literals \(\underline{{\texttt {L}}}_\mathcal {N}\). The expected number of valid colorings is

$$\begin{aligned} \mathbb {E}\varvec{Z}_{\lambda =0,T}(H^\text {sm};\mathscr {N}) =\exp \{sd\langle \hat{H}^\text {sm},{\hat{v}}\rangle \} \left( {\begin{array}{c}s\\ s\dot{H}^\text {sm}\end{array}}\right) \left( {\begin{array}{c}ds\\ ds\hat{H}^\text {sm}\end{array}}\right) \bigg / \left( {\begin{array}{c}ds\\ ds\bar{H}^\text {sm}\end{array}}\right) \, \end{aligned}$$

which by Stirling’s formula is \(s^{O(1)} \exp \{s\,\varvec{\Sigma }^\text {tr}(H^\text {sm})\}\) (see (48)). Any valid coloring \(\underline{{\sigma }}_\mathcal {N}\) with empirical measure \(H^\text {sm}\) contributes weight \(\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda =\exp \{s\lambda \varvec{s}^\text {tr}(H^\text {sm})\}\), so altogether

$$\begin{aligned} \mathbb {E}\varvec{Z}(H^\text {sm};\mathscr {N}) = s^{O(1)}\exp \{s[\varvec{\Sigma }^\text {tr}(H^\text {sm})+\lambda \varvec{s}^\text {tr}(H^\text {sm})]\} =s^{O(1)}\exp \{s\,\varvec{\Lambda }(H^\text {sm})\}\,, \end{aligned}$$
(56)

with \(\varvec{\Lambda }\) as in (49). (This calculation clarifies why we refer to \(\varvec{\Sigma }^\text {tr},\varvec{\Lambda }\) as the “tree analogues” of \(\varvec{\Sigma },\varvec{F}\).)
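The Stirling step can be sanity-checked numerically: \(\tfrac{1}{s}\ln \left( {\begin{array}{c}s\\ s\dot{H}\end{array}}\right) \rightarrow \mathcal {H}(\dot{H})\) as \(s\rightarrow \infty \), with a correction of size \(O((\ln s)/s)\), which is the source of the \(s^{O(1)}\) prefactor. A short Python sketch (the measure `p` and the value of `s` are arbitrary):

```python
from math import comb, log

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i ln p_i (natural log)."""
    return -sum(q * log(q) for q in p if q > 0)

def log_multinomial(s, counts):
    """ln of the multinomial coefficient s! / prod_i counts_i!,
    computed as a product of binomial coefficients."""
    out, rem = 0.0, s
    for c in counts:
        out += log(comb(rem, c))
        rem -= c
    return out

# (1/s) ln multinom(s; s p) approaches H(p) from below; the deficit,
# of order (ln s)/s, is absorbed into the s^{O(1)} prefactor.
p = (0.5, 0.25, 0.25)
s = 400
counts = tuple(int(s * q) for q in p)
rate = log_multinomial(s, counts) / s
```

With these values the exponential rate `rate` already agrees with \(\mathcal {H}(p)\approx 1.04\) to within a few percent.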

Fig. 6

If \(s\equiv |Y|\) and \(\mathscr {N}=\mathscr {N}(Y)=(\mathcal {N},\underline{{\texttt {L}}})\), the graph \(\mathcal {N}\) consists of s disjoint copies of the tree \(\mathcal {D}\) shown here. The boundary \(\delta \mathcal {D}\) is highlighted in purple (color figure online)

Fix \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) and let \(A(H^\text {sm})\) be the set of all \((\mathscr {G},Y,\underline{{\sigma }})\) with \(H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})=H^\text {sm}\). Let \(B(H^\text {sm})\) be the set of states reachable in one step from \(A(H^\text {sm})\). Then, for all \(\mathrm {\textsc {b}}\in B(H^\text {sm})\),

$$\begin{aligned} \pi (\mathrm {\textsc {b}},A(H^\text {sm})) =\frac{\mathbb {E}\varvec{Z}(H^\text {sm};\mathscr {N})}{\sum _{H'\in \varvec{\Delta }^\text {sm}} \mathbf {1}\{\dot{h}^\text {tr}(H')=\dot{h}^\text {tr}(H^\text {sm})\} \mathbb {E}\varvec{Z}(H';\mathscr {N})} =s^{O(1)}\exp \{-s\,\varvec{\Xi }(H^\text {sm})\}\,. \end{aligned}$$
(57)

This is the key calculation for Theorem 4.2. To complete the proof, what remains is to produce a sampling mechanism \(\mathbb {P}(Y\,|\,\mathscr {G})\) which satisfies our earlier conditions (51) and (55), together with some concentration bound to ensure that in most cases Y is large and \(H^\text {sm}\approx H^\text {sy}\). We formalize this as follows:

Definition 4.6

(sampling mechanism) Let \(\mathbb {P}(Y\,|\,\mathscr {G})\) be the probability of sampling the subset of variables Y from the nae-sat instance \(\mathscr {G}\). We call this a good sampling mechanism if the following holds: first, whenever \((\mathscr {G},Y,\underline{{\sigma }})\) and \((\mathscr {G}',Y,\underline{{\eta }})\) appear in the same orbit of the Markov chain, we must have \(\mathbb {P}(Y\,|\,\mathscr {G})=\mathbb {P}(Y\,|\,\mathscr {G}')\) (used in Lemma 4.5 to show reversibility). Next we require that for every \(\mathscr {G}\), and for every Y with \(\mathbb {P}(Y\,|\,\mathscr {G})\) positive, the neighborhoods \(B_{2T}(v)\) for \(v\in Y\) are mutually disjoint trees (condition (51), required for defining the Markov chain). Lastly we require that for all but an exceptional set \({\mathscr {B}}\) of “bad” nae-sat instances, with \(\mathbb {P}(\mathscr {G}\in {\mathscr {B}}) \leqslant \exp \{-n(\ln n)^{1/2}\}\), we have

$$\begin{aligned} \sum _{Y:n\epsilon \leqslant |Y|\leqslant 4n\epsilon } \mathbb {P}(Y\,|\,\mathscr {G}) {\mathbf {1}}\bigg \{ \Big \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }}) -H^\text {sy}(\mathcal {G},\underline{{\sigma }})\Big \Vert \leqslant \frac{1}{(\ln \ln n)^{1/2}}\bigg \} \geqslant \frac{1}{2} \end{aligned}$$
(58)

for all colorings \(\underline{{\sigma }}\) on \(\mathscr {G}\).

Proposition 4.7

Assume the existence of a good sampling mechanism in the sense of Definition 4.6. Then, for \(\epsilon \) small enough (depending only on \(d,k,T\)), it holds for all \(H\in \varvec{\Delta }\) that

$$\begin{aligned}\frac{\mathbb {E}\varvec{Z}(H)}{\mathbb {E}\varvec{Z}(\mathbf {N}_{H,\epsilon })} \leqslant \frac{\exp \{o_n(1)\}}{\exp \{n\epsilon \,\varvec{\Xi }(H^\text {sy})\}} \end{aligned}$$

where \(\mathbf {N}_{H,\epsilon }\equiv \{H'\in \varvec{\Delta }:\Vert H-H' \Vert \leqslant \epsilon (dk)^{2T}\}\) and \(H^\text {sy}\) is the symmetrization of H from Definition 4.1.

Proof

Abbreviate \(\delta \equiv 1/(\ln \ln n)^{1/2}\). Given \(H\in \varvec{\Delta }\) and its symmetrization \(H^\text {sy}\in \varvec{\Delta }\cap \varvec{\Delta }^\text {sm}\), let

$$\begin{aligned}A=\bigg \{ (\mathscr {G},Y,\underline{{\sigma }}): |Y|\geqslant n\epsilon \text { and } \Big \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})-H^\text {sy}\Big \Vert \leqslant 2\delta \bigg \}\,.\end{aligned}$$

With \(\mu \) the reversing measure from Lemma 4.5, we have

$$\begin{aligned} \mu (A) \geqslant \sum _{\mathscr {G}\notin {\mathscr {B}}}\mathbb {P}(\mathscr {G}) \sum _{\underline{{\sigma }}} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \sum _{Y:|Y|\geqslant n\epsilon } \mathbb {P}(Y\,|\,\mathscr {G}) {\mathbf {1}}\bigg \{\quad \begin{array}{c} \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }}) -H^\text {sy}(\mathcal {G},\underline{{\sigma }}) \Vert \leqslant \delta \\ \text {and } \Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \delta \end{array}\quad \bigg \}\,, \end{aligned}$$

using \(\Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \Vert H^\text {sy}(\mathcal {G},\underline{{\sigma }})-H^\text {sy}\Vert \) together with the triangle inequality. Applying (58) gives

$$\begin{aligned}\mu (A) \geqslant \frac{1}{2} \sum _{\mathscr {G}\notin {\mathscr {B}}}\mathbb {P}(\mathscr {G}) \sum _{\underline{{\sigma }}} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda {\mathbf {1}}\bigg \{ \Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \delta \bigg \} \geqslant \frac{\mathbb {E}[\varvec{Z}(H);\mathscr {G}\notin {\mathscr {B}}]}{2} \geqslant \frac{\mathbb {E}\varvec{Z}(H)}{4}\,, \end{aligned}$$

where the last step follows from the bound on \(\mathbb {P}(\mathscr {G}\in {\mathscr {B}})\). If B is the set of states reachable from A in one step of the Markov chain, then crudely \(\mu (B)\leqslant \mathbb {E}\varvec{Z}(\mathbf {N}_{H,\epsilon })\), and

$$\begin{aligned}\max _{{\textsc {b}}\in B} \pi ({\textsc {b}},A) \leqslant \frac{\exp \{o_n(1)\}}{\exp \{n\epsilon \,\varvec{\Xi }(H^\text {sy})\}} \end{aligned}$$

by our earlier calculation (57). The result follows by substituting into (46).\(\square \)

4.5 Sampling mechanism

To complete the proof of Theorem 4.2, it remains for us to define a sampling mechanism satisfying the conditions of Definition 4.6. To this end, given a \((d,k)\)-regular graph \(\mathcal {G}\), let \(V_t\subseteq V\) be the subset of variables \(v\in V\) such that the t-neighborhood \(B_t(v)\) around v is a tree. Recall the following form of the Chernoff bound: if X is a binomial random variable with mean \(\mu \), then for all \(t\geqslant 1\) we have \(\mathbb {P}(X\geqslant t\mu )\leqslant \exp \{ - t\mu \ln (t/e) \}\).
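This form of the bound can be sanity-checked numerically against the exact binomial tail. The sketch below does so in Python; the parameters n, p, t are arbitrary illustrative choices, not taken from the proof.

```python
# Numerical sanity check of the stated Chernoff form: for X ~ Bin(n, p)
# with mean mu = n*p and any t >= 1,  P(X >= t*mu) <= exp(-t*mu*ln(t/e)).
# The parameters n, p, t below are arbitrary illustrative choices.
import math

def binom_upper_tail(n, p, a):
    """Exact P(X >= a) for X ~ Bin(n, p)."""
    a = math.ceil(a)
    return math.fsum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(a, n + 1)
    )

n, p, t = 1000, 0.01, 5.0
mu = n * p  # mean of the binomial, here 10
exact = binom_upper_tail(n, p, t * mu)
bound = math.exp(-t * mu * math.log(t / math.e))
assert 0 < exact <= bound  # the Chernoff bound dominates the exact tail
```

For these parameters the bound is roughly \(6\times 10^{-14}\), already far smaller than the error probabilities tolerated elsewhere in the argument.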

Lemma 4.8

Suppose \(\mathcal {G}\) is sampled from the \((d,k)\)-regular configuration model on n vertices. For any fixed t we have \(\mathbb {P}( |V {\setminus } V_t| \geqslant n/(\ln \ln n) )\leqslant \exp \{ -n (\ln n)^{1/2} \}\) for n large enough (depending on \(d,k,t\)).

Proof

Let \(\gamma \) count the total number of cycles in \(\mathcal {G}\) of length at most 2t. If \(v\notin V_t\) then v must certainly lie within distance t of one of these cycles, so crudely we have

$$\begin{aligned} |V{\setminus } V_t| \leqslant 2t (dk)^t\gamma \,. \end{aligned}$$
(59)

Consider breadth-first search exploration in \(\mathcal {G}\) started from an arbitrary variable, say \(v=1\). At each step of the exploration we reveal one edge, so the exploration takes nd steps total. Conditioned on everything revealed in the first \(s\) steps, the chance that the edge revealed at step \(s+1\) will form a new cycle of length \(\leqslant 2t\) is upper bounded by \((dk)^{2t}/(nd-s)\). It follows that the total number of cycles revealed up to time \(nd(1-\delta )\) is stochastically dominated by a binomial random variable

$$\begin{aligned} \gamma '\sim {\mathrm {Bin}}\bigg ( nd(1-\delta ), \frac{(dk)^{2t}}{nd\delta }\bigg )\,. \end{aligned}$$

The final \(nd\delta \) exploration steps form at most \(nd\delta \) cycles, so \(\gamma \leqslant \gamma ' + nd\delta \). Applying the Chernoff bound (as stated above) with \(\delta =1/(\ln \ln n)^2\), we obtain

$$\begin{aligned} \mathbb {P}(\gamma \geqslant 2nd\delta ) \leqslant \mathbb {P}(\gamma ' \geqslant nd\delta ) \leqslant \exp \bigg \{ -nd\delta \ln \bigg (\frac{nd\delta ^2}{e (dk)^{2t}}\bigg ) \bigg \} \leqslant \exp \{-n(\ln n)^{1/2}\} \end{aligned}$$

for large enough n. Recalling (59) gives the claimed bound.\(\square \)
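The phenomenon behind Lemma 4.8, that short cycles in a sparse random regular graph touch only a vanishing fraction of vertices, can be illustrated in code. The sketch below is a toy version only: a plain d-regular configuration model stands in for the \((d,k)\)-regular factor graphs of the lemma, and the names `config_model` and `ball_is_tree` are ad hoc.

```python
# Toy illustration of the lemma's quantity |V \ V_t| (not the exact model:
# a plain d-regular configuration model stands in for the (d,k)-regular
# factor graph). All names and parameters here are ad hoc.
import random
from collections import deque

def config_model(n, d, rng):
    """d-regular configuration model: pair half-edges uniformly at random."""
    half = [v for v in range(n) for _ in range(d)]
    rng.shuffle(half)
    adj = {v: [] for v in range(n)}
    for i in range(0, n * d, 2):
        u, w = half[i], half[i + 1]
        adj[u].append(w)
        adj[w].append(u)  # self-loops and multi-edges are allowed
    return adj

def ball_is_tree(adj, v, t):
    """True iff breadth-first exploration of B_t(v) never closes a cycle."""
    seen = {v}
    queue = deque([(v, None, 0)])
    while queue:
        u, parent, depth = queue.popleft()
        if depth == t:
            continue
        skipped_parent = False
        for w in adj[u]:
            if w == parent and not skipped_parent:
                skipped_parent = True  # one edge back to the parent is fine
                continue
            if w in seen:
                return False  # a revisited vertex closes a cycle
            seen.add(w)
            queue.append((w, u, depth + 1))
    return True

rng = random.Random(0)
n, d, t = 2000, 3, 2
adj = config_model(n, d, rng)
bad = sum(1 for v in range(n) if not ball_is_tree(adj, v, t))
assert bad < n // 2  # locally tree-like: short cycles touch few vertices
```

The expected number of cycles of length at most 2t stays bounded as n grows, so the fraction of non-tree-like vertices tends to zero, in line with the much stronger quantitative statement of the lemma.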

Given an instance \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), let \(V_t\) be as defined above and take \(V'\equiv V_{4T}\). We then take i.i.d. random variables \(I_v\sim \mathrm {Ber}(\epsilon ')\) indexed by \(v\in V'\) (for \(\epsilon '\) a constant to be determined) and let

$$\begin{aligned} Y_v\equiv \mathbf {1}\{ I_v=1, \text { and } I_u=0 \text { for all } u\in B_{4T}(v){\setminus }\{v\}\}\,. \end{aligned}$$
(60)

We then define \(\mathbb {P}(Y\,|\,\mathscr {G})\) to be the law of the set \(Y=\{v\in V' : Y_v=1\}\). Note that the random variables \(Y_v\), for \(v\in V'\), all have the same expected value, so we can define \(\epsilon \equiv (\mathbb {E}Y_v)/2\).
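A minimal sketch of this thinning, assuming the graph is given as an adjacency dictionary; the cycle graph and all parameters below are illustrative stand-ins, not the model's factor graph.

```python
# Sketch of the thinning (60): each v keeps Y_v = 1 iff its own coin I_v
# is 1 and every other coin in its radius-R ball is 0. Surviving vertices
# are then pairwise at distance greater than R. Toy graph: a cycle.
import random
from collections import deque

def ball(adj, v, radius):
    """Vertices within the given graph distance of v (including v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def sample_Y(adj, V_prime, radius, eps_prime, rng):
    I = {v: rng.random() < eps_prime for v in V_prime}
    return {
        v for v in V_prime
        if I[v] and not any(I.get(u, False) for u in ball(adj, v, radius) - {v})
    }

n, R = 60, 4  # R plays the role of 4T; eps_prime below is arbitrary
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
Y = sample_Y(adj, set(range(n)), R, 0.2, random.Random(1))
for v in Y:  # selected vertices exclude one another from their balls
    assert ball(adj, v, R) & Y == {v}
```

The final assertion records the property the construction is designed for: two selected vertices are never within distance R of each other, so their radius-\(\lfloor R/2\rfloor \) balls are disjoint, which is what condition (51) asks of the neighborhoods \(B_{2T}(v)\).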

Lemma 4.9

Define \({\mathscr {B}}\) to be the set of all \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) with \(|V{\setminus } V_{4T}|\geqslant n/(\ln \ln n)\). For the sampling mechanism described above, condition (58) holds for any \(\mathscr {G}\notin {\mathscr {B}}\) and any coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\).

Proof

Fix an instance \(\mathscr {G}\notin {\mathscr {B}}\) and a coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\). Recalling Definition 4.1, for each \(v\in V\) denote

$$\begin{aligned} X_v\equiv ({\dot{X}}_v,{\hat{X}}_v,{\bar{X}}_v) \equiv H^\text {sm}(\mathcal {G},\{v\},\underline{{\sigma }})\,. \end{aligned}$$

Assume without loss that \(V'\equiv V_{4T} = [n']\equiv \{v_1,\ldots ,v_{n'}\}\), and for \(0\leqslant \ell \leqslant n'\) let \(\mathscr {F}_\ell \) denote the sigma-field generated by \(Y_1,\ldots ,Y_\ell \). Consider

$$\begin{aligned}S \equiv \sum _{v\leqslant n'}A_v Y_v\end{aligned}$$

where we can take different choices of \(A_v\) to prove various different bounds:

  • taking \(A_v=1\) gives \(S=|Y|\) and \(\mathbb {E}S = 2n'\epsilon \);

  • taking \(A_v={\dot{X}}_v(\underline{{\eta }})\) gives \(S=|Y|\dot{H}^\text {sm}(\underline{{\eta }})\) and \(|\mathbb {E}S - 2n'\epsilon \dot{H}(\underline{{\eta }})|\leqslant n-n'\);

  • taking \(A_v={\hat{X}}_v(\underline{{\eta }})\) gives \(S=|Y|\hat{H}^\text {sm}(\underline{{\eta }})\) and \(|\mathbb {E}S - 2n'\epsilon \hat{H}(\underline{{\eta }})| \leqslant n-n'\);

  • taking \(A_v={\bar{X}}_v(\eta )\) gives \(S=|Y|\bar{H}^\text {sm}(\eta )\) and \(|\mathbb {E}S-2n'\epsilon \bar{H}(\eta )|\leqslant n-n'\),

where we recall that \(n-n'=|V{\setminus } V_{4T}|\leqslant n/(\ln \ln n)\). Consider the Doob martingale

$$\begin{aligned} M_\ell \equiv \mathbb {E}(S\,|\,\mathscr {F}_\ell ) \equiv \sum _{v\leqslant n'} A_v \, \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell )\,. \end{aligned}$$

For \(\ell \leqslant n'\), if v lies at distance greater than 8T from any variable in \([\ell ]\equiv \{v_1,\ldots ,v_\ell \}\), then

$$\begin{aligned} \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell ) = \mathbb {E}Y_v = 2\epsilon \,. \end{aligned}$$

Thus, the only possibility for \(\mathbb {E}(Y_v\,|\,\mathscr {F}_{\ell +1})\ne \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell )\) is that v lies within distance 8T of the vertex \(v_{\ell +1}\). The number of such v is at most \((dk)^{8T}\), so we conclude

$$\begin{aligned} |M_{\ell +1}-M_\ell | \leqslant (dk)^{8T} \Vert A \Vert _\infty \leqslant (dk)^{8T}\,. \end{aligned}$$

It follows by the Azuma–Hoeffding martingale inequality that

$$\begin{aligned} \mathbb {P}(| S - \mathbb {E}S | \geqslant x) \leqslant \exp \bigg \{ -\frac{x^2}{2 n' (dk)^{16T}} \bigg \}\,. \end{aligned}$$

The result follows by a union bound over the choices of A listed above, combined with our estimates on \(\mathbb {E}S\) for each choice of A.\(\square \)
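The Azuma–Hoeffding inequality used above can be checked numerically in the simplest bounded-increment case, a fair \(\pm 1\) random walk (increments bounded by 1), where the two-sided tail with the conventional factor 2 is an exact binomial sum. Parameters n, x below are arbitrary illustrative choices.

```python
# Check of the Azuma-Hoeffding shape: for a fair +/-1 random walk S_n,
# P(|S_n| >= x) <= 2*exp(-x^2/(2n)), compared against the exact tail.
import math

def walk_upper_tail(n, x):
    """Exact P(S_n >= x): S_n = 2B - n with B ~ Bin(n, 1/2)."""
    a = math.ceil((n + x) / 2)
    return math.fsum(math.comb(n, i) for i in range(a, n + 1)) / 2**n

n, x = 100, 30
exact_two_sided = 2 * walk_upper_tail(n, x)  # symmetry of the fair walk
azuma = 2 * math.exp(-x**2 / (2 * n))
assert exact_two_sided <= azuma
```

Here the exact tail is about 0.0035 against an Azuma bound of about 0.022: the inequality is loose but of the right exponential order, which is all the proof above requires.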

Proof of Theorem 4.2

It follows from Lemmas 4.8 and 4.9 that the sampling mechanism described by (60) satisfies the conditions of Definition 4.6. The result then follows by taking \(n\rightarrow \infty \) in Proposition 4.7. \(\square \)

5 Solution of tree optimization

From Theorem 4.2 we see that if \(H\in \varvec{\Delta }\) is a local maximizer for the first moment exponent \(\varvec{F}=\varvec{F}_{\lambda ,T}\), then its symmetrization \(H^\text {sy}\) must be a zero of the function \(\varvec{\Xi }=\varvec{\Xi }_{\lambda ,T}\) defined by (50). The analogous statement holds in the second moment with \(\varvec{F}_2\) and \(\varvec{\Xi }_2\). The functions \(\varvec{\Xi },\varvec{\Xi }_2\) correspond to tree optimization problems, which we solve in this section by relating them to the bp recursions for the coloring model.

Proposition 5.1

For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), let \(H_\star \in \varvec{\Delta }\) and \(H_\bullet \in \varvec{\Delta }_2\) be as in Definition 5.6 below.

  (a)

    On \(\{H\in \mathbf {N}_\circ :H = H^\text {sy}\}\), \(\varvec{\Xi }\) is uniquely minimized at \(H=H_\star \), with \(\varvec{\Xi }(H_\star )=0\).

  (b)

    On \(\{H\in \mathbf {N}_\text {se}:H = H^\text {sy}\}\), \(\varvec{\Xi }_2\) is uniquely minimized at \(H=H_\bullet \), with \(\varvec{\Xi }_2(H_\bullet )=0\).

Moreover there is a positive constant \(\epsilon =\epsilon (d,k,T)\) such that

  (1)

    \(\varvec{\Xi }(H) \geqslant \epsilon \Vert {H-H_\star } \Vert ^2\) for all \(H\in \varvec{\Delta }\) with \(H=H^\text {sy}\) and \(\Vert {H-H_\star } \Vert \leqslant \epsilon \), and

  (2)

    \(\varvec{\Xi }_2(H) \geqslant \epsilon \Vert {H-H_\bullet } \Vert ^2\) for all \(H\in \varvec{\Delta }_2\) with \(H=H^\text {sy}\) and \(\Vert H-H_\bullet \Vert \leqslant \epsilon \).

5.1 Tree optimization problem

Recall from the previous section that in the local update procedure, we sample a subset of variables Y and consider its neighborhood \(\mathscr {N}=(\mathcal {N},\underline{{\texttt {L}}}_\mathcal {N})\). Writing \(s=|Y|\), the graph \(\mathcal {N}\) is the disjoint union of \(\mathcal {D}_1,\ldots ,\mathcal {D}_s\) where each \(\mathcal {D}_i\) is a copy of the tree \(\mathcal {D}\) of Fig. 6. Let \(\varvec{\Pi }\) be the space of probability measures on colorings of \(\mathcal {D}\). Any coloring \(\underline{{\sigma }}_\mathcal {N}\) can be summarized by \(\nu \in \varvec{\Pi }\) where \(\nu (\underline{{\sigma }}_\mathcal {D})\) is the fraction of copies \(\mathcal {D}_i\) with \(\underline{{\sigma }}_{\mathcal {D}_i}=\underline{{\sigma }}_\mathcal {D}\). The sample empirical measure \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\) can be obtained as a linear projection of \(\nu \), and we hereafter denote this relation by \(H^\text {sm}=H^\text {tr}(\nu )\). Recalling (52), we have \(\mathbb {E}^{\text {lit}} [(\varvec{w}^\text {lit}_\mathscr {N}(\underline{{\sigma }}_\mathcal {N}))^\lambda ] = \varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_{\mathcal {D}_1})^\lambda \cdots \varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_{\mathcal {D}_s})^\lambda \) where

$$\begin{aligned}\varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_\mathcal {D}) = \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{e\in \delta v}\bigg \{ \bar{\Phi }(\sigma _e) \hat{\Phi }(\underline{{\sigma }}_{\delta a(e)}) \bigg \}\,.\end{aligned}$$

Lemma 5.2

The function \(\varvec{\Lambda }\) of (49) is concave on \(\varvec{\Delta }^\text {sm}\), and can be expressed as

$$\begin{aligned} \varvec{\Lambda }(H)=\sup \Big \{ \mathcal {H}(\nu )+\lambda \langle \ln \varvec{w}_\mathcal {D},\nu \rangle :\nu \in \varvec{\Pi }\text { with } H^\text {tr}(\nu )=H \Big \}\,. \end{aligned}$$
(61)

Proof

The function \(\varvec{\Lambda }(H)\) is the sum of \(\varvec{\Sigma }^\text {tr}(H)\) and the linear function \(\lambda \varvec{s}^\text {tr}(H)\), so it suffices to show that \(\varvec{\Sigma }^\text {tr}\) is concave on \(\varvec{\Delta }^\text {sm}\). For \(H=(\dot{H},\hat{H},\bar{H})\in \varvec{\Delta }^\text {sm}\), if \(X\in \Omega ^k\) is a random variable with law \(\hat{H}\), then the first coordinate \(X_1\) has marginal law \(\bar{H}\) by (47). It follows that for any \(H\in \varvec{\Delta }^\text {sm}\) we can express

$$\begin{aligned} \varvec{\Sigma }^\text {tr}(H) =\mathcal {H}(\dot{H})+d\mathcal {H}(X)-d\mathcal {H}(X_1)+\varvec{v}(H) =\mathcal {H}(\dot{H})+d\mathcal {H}(X\,|\,X_1)+\varvec{v}(H)\,. \end{aligned}$$

The entropy function is concave and \(\varvec{v}\) is linear, so this proves that \(\varvec{\Sigma }^\text {tr}\) (hence \(\varvec{\Lambda }\)) is indeed concave on \(\varvec{\Delta }^\text {sm}\). In fact this can be argued alternatively, as follows. Recalling (56), note that for \(H\in \varvec{\Delta }^\text {sm}\) we have

$$\begin{aligned}s^{O(1)}\exp \{s\varvec{\Lambda }(H)\} =\mathbb {E}\varvec{Z}(H;\mathscr {N}) =\sum _{\nu \in \varvec{\Pi }} \mathbf {1}\{H^\text {tr}(\nu )=H\} \left( {\begin{array}{c}s\\ s\nu \end{array}}\right) (\varvec{w}_\mathcal {D})^{\lambda \nu }\,. \end{aligned}$$

Expanding with Stirling’s formula gives the representation (61), which also implies concavity of \(\varvec{\Lambda }\).\(\square \)

Thus, for \(H\in \varvec{\Delta }^\text {sm}\), we have \(\varvec{\Xi }(H) = \varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))- \varvec{\Lambda }(H)\) where \(\varvec{\Lambda }\) is given by (61), and

$$\begin{aligned} \varvec{\Lambda }^\text {op}({\dot{h}}) = \sup \Big \{\mathcal {H}(\nu ) + \lambda \langle \ln \varvec{w}_{\mathcal {D}},\nu \rangle : \nu \in \varvec{\Pi }\text { with } \dot{h}^\text {tr}(H^\text {tr}(\nu ))={\dot{h}} \Big \}\,. \end{aligned}$$
(62)

Both (61) and (62) fall in the general category of entropy maximization problems subject to linear constraints. In “Appendix C” we review basic calculations for problems of this type. The discussion there, in particular Remark C.7, implies that for any \({\dot{h}}\), there is a unique measure \(\nu =\nu ^\text {op}({\dot{h}})\) achieving the maximum in (62). Moreover, there exists a probability measure \({\dot{q}}\) on \(\dot{\Omega }_T\)—serving the role of Lagrange multipliers for the constrained maximization—such that \(\nu ^\text {op}({\dot{h}})\) can be expressed as

$$\begin{aligned} \nu (\underline{{\sigma }}_\mathcal {D}) =\nu _{\dot{q}}(\underline{{\sigma }}_\mathcal {D}) \equiv \frac{\varvec{w}_\mathcal {D}(\underline{{\sigma }}_\mathcal {D})^\lambda }{Z} \prod _{e\in \delta \mathcal {D}} {\dot{q}}(\dot{\sigma }_e)\,, \end{aligned}$$
(63)

where Z is the normalizing constant. The analogous statement holds for the second moment.
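The structure behind (62)-(63), entropy maximization under linear constraints with a Gibbs-form optimizer indexed by Lagrange multipliers, can be illustrated on a three-point toy problem. All weights, constraints, and parameters below are arbitrary and unrelated to the tree model.

```python
# Minimal instance of the entropy-maximization structure behind (62)-(63):
# maximize H(nu) + lam*<ln w, nu> subject to <a, nu> = h. The optimizer has
# the Gibbs form nu(x) proportional to w(x)**lam * exp(theta*a(x)), with
# the multiplier theta found here by bisection (mean_a is increasing).
import math

w = [1.0, 2.0, 0.5]   # arbitrary positive weights
a = [0.0, 1.0, 2.0]   # arbitrary constraint functional
lam, h = 0.7, 1.2     # target <a, nu> = h, with min(a) < h < max(a)

def gibbs(theta):
    unnorm = [wi**lam * math.exp(theta * ai) for wi, ai in zip(w, a)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def mean_a(theta):
    return sum(ai * ni for ai, ni in zip(a, gibbs(theta)))

lo, hi = -50.0, 50.0  # mean_a is increasing in theta (exponential family)
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_a(mid) < h:
        lo = mid
    else:
        hi = mid
nu = gibbs((lo + hi) / 2)

assert abs(sum(nu) - 1) < 1e-9       # nu is a probability measure
assert abs(mean_a((lo + hi) / 2) - h) < 1e-6  # constraint is met
```

The bisection exploits the standard exponential-family fact that the constrained mean is monotone in the multiplier; in the tree problem the same role is played by the measure \({\dot{q}}\) on \(\dot{\Omega }_T\).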

5.2 BP recursions

We now state the bp recursions for the \(\lambda \)-tilted T-coloring model. In the standard formulation (e.g. [33, Ch. 14]), this is a pair of relations for probability measures \(\dot{\varvec{q}}\), \(\hat{\varvec{q}}\) on \(\Omega _T\):

$$\begin{aligned} \dot{\varvec{q}}(\sigma )&=[\dot{\varvec{B}}_{\lambda ,T}(\hat{\varvec{q}})](\sigma ) \cong \mathbf {1}\{\sigma \in \Omega _T\} \bar{\Phi }(\sigma )^\lambda \sum _{\underline{{\sigma }}\in (\Omega _T)^d} \mathbf {1}\{\sigma _1=\sigma \} \dot{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=2}^d\hat{\varvec{q}}(\sigma _i)\\ \hat{\varvec{q}}(\sigma )&=[\hat{\varvec{B}}_{\lambda ,T}(\dot{\varvec{q}})](\sigma ) \cong \mathbf {1}\{\sigma \in \Omega _T\} \bar{\Phi }(\sigma )^\lambda \sum _{\underline{{\sigma }}\in (\Omega _T)^k} \mathbf {1}\{\sigma _1=\sigma \} \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=2}^k\dot{\varvec{q}}(\sigma _i) \end{aligned}$$

where \(\cong \) denotes equality up to normalization, so that the mapping always outputs a probability measure. Recall from Definition 4.1 that for \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega _T\) we have \(\dot{\sigma }\in \dot{\Omega }_T\) and \(\hat{\sigma }\in \hat{\Omega }_T\). For our purposes we can assume a one-sided dependence, meaning there are probability measures \({\dot{q}}\) on \(\dot{\Omega }_T\) and \(\hat{q}\) on \(\hat{\Omega }_T\) such that \(\dot{\varvec{q}}(\sigma ) \cong {\dot{q}}(\dot{\sigma })\mathbf {1}\{\sigma \in \Omega _T\}\) and \(\hat{\varvec{q}}(\sigma )\cong \hat{q}(\hat{\sigma })\mathbf {1}\{\sigma \in \Omega _T\}\). One can then check (e.g. [33, Ch. 19]) that the bp recursions preserve the one-sided property, so that \(\dot{\varvec{B}}_{\lambda ,T}\) and \(\hat{\varvec{B}}_{\lambda ,T}\) restrict to mappings

$$\begin{aligned} \dot{{\texttt {BP}}}\equiv \dot{{\texttt {BP}}}_{\lambda ,T} : {\mathscr {P}}(\hat{\Omega }_T) \rightarrow {\mathscr {P}}(\dot{\Omega }_T)\,,\quad \hat{{\texttt {BP}}}\equiv \hat{{\texttt {BP}}}_{\lambda ,T} : {\mathscr {P}}(\dot{\Omega }_T) \rightarrow {\mathscr {P}}(\hat{\Omega }_T)\,. \end{aligned}$$
(64)

We also denote \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T} \equiv \dot{{\texttt {BP}}}\circ \hat{{\texttt {BP}}}\). Given any \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\), write \(\hat{q}\equiv \hat{{\texttt {BP}}}{\dot{q}}\), and let \(H\equiv H_{\dot{q}}\) be defined by

$$\begin{aligned} \dot{H}_{\dot{q}}(\underline{{\sigma }}) =\frac{\dot{\Phi }(\underline{{\sigma }})^\lambda }{\dot{\mathfrak {z}}} \prod _{i=1}^d \hat{q}(\hat{\sigma }_i), \quad \hat{H}_{\dot{q}}(\underline{{\sigma }}) =\frac{\hat{\Phi }(\underline{{\sigma }})^\lambda }{\hat{\mathfrak {z}}} \prod _{i=1}^k {\dot{q}}(\dot{\sigma }_i), \quad \bar{H}_{\dot{q}}(\sigma ) =\frac{\bar{\Phi }(\sigma )^{-\lambda }}{\bar{\mathfrak {z}}} {\dot{q}}(\dot{\sigma }) \hat{q}(\hat{\sigma }) \end{aligned}$$
(65)

where \(\dot{\mathfrak {z}}\), \(\hat{\mathfrak {z}}\), and \(\bar{\mathfrak {z}}\) are normalizing constants, all dependent on \({\dot{q}}\). Clearly, \(H_{\dot{q}}=(H_{\dot{q}})^\text {sy}\). If \({\dot{q}}\) is a fixed point of \({\texttt {BP}}\), then \(H_{\dot{q}}\in \varvec{\Delta }\). An entirely similar discussion applies to the pair (second moment) model, where the bp recursion reduces to a pair of mappings between \({\mathscr {P}}((\dot{\Omega }_T)^2)\) and \({\mathscr {P}}((\hat{\Omega }_T)^2)\). If \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) is a fixed point of \({\texttt {BP}}\), then (65) defines an element \(H_{\dot{q}}\in \varvec{\Delta }_2\).
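The normalize-and-iterate structure of these recursions can be sketched on a toy model. Below, the alphabet is \(\{0,1\}\), \(d=k=3\), \(\lambda =1\), \(\bar{\Phi }\equiv 1\), and both \(\dot{\Phi }\) and \(\hat{\Phi }\) are replaced by a bare not-all-equal indicator; this illustrates only the fixed-point iteration, not the actual T-coloring weights.

```python
# Toy BP iteration in the shape of the displayed recursions: alphabet
# {0,1}, d = k = 3, lambda = 1, PhiBar == 1, and PhiDot = PhiHat = the
# not-all-equal indicator (an illustrative stand-in for the true weights).
import itertools

d = k = 3

def nae(sigmas):
    """Not-all-equal indicator on a tuple of spins."""
    return 0.0 if len(set(sigmas)) == 1 else 1.0

def bp_step(q_in, degree):
    """One recursion: out(s) ~ sum over tuples with first coordinate s."""
    out = []
    for s in (0, 1):
        total = 0.0
        for rest in itertools.product((0, 1), repeat=degree - 1):
            prod = 1.0
            for r in rest:
                prod *= q_in[r]
            total += nae((s,) + rest) * prod
        out.append(total)
    z = sum(out)
    return [x / z for x in out]

qdot = [0.7, 0.3]  # asymmetric starting point
for _ in range(50):
    qhat = bp_step(qdot, k)  # clause-side update (hat-BP)
    qdot = bp_step(qhat, d)  # variable-side update (dot-BP)

assert abs(sum(qdot) - 1.0) < 1e-9
assert abs(qdot[0] - 0.5) < 1e-6  # converges to the symmetric fixed point
```

For this toy map one can compute the derivative of the composed update at the symmetric point to be \(4/9<1\), so the iteration contracts locally, in miniature imitation of the contraction estimates of Sect. 5.3.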

Lemma 5.3

For \(1\leqslant T<\infty \), if \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) is any fixed point of \({\texttt {BP}}_{\lambda ,T}\) which has full support on \(\dot{\Omega }_T\), then \(\varvec{\Xi }(H_{\dot{q}})=0\). The analogous statement holds for the second moment.

Proof

Consider the optimization problem (62) for \(\varvec{\Lambda }^\text {op}({\dot{h}})\) with \({\dot{h}}=\dot{h}^\text {tr}(H_{\dot{q}})\). As noted above, \(\nu ^\text {op}({\dot{h}})\) can be written (63) as \(\nu _{\tilde{q}}\) for some measure \(\tilde{q}\in {\mathscr {P}}(\dot{\Omega }_T)\), which may not be unique if the constraint \(\dot{h}^\text {tr}(H^\text {tr}(\nu ))={\dot{h}}\) is rank-deficient. However, if \({\dot{q}}\) has full support on \(\dot{\Omega }_T\), then \({\dot{h}}=\dot{h}^\text {tr}(H_{\dot{q}})\) does also. In this case it is straightforward to check that the constraints are indeed of full rank, so \(\tilde{q}\) is unique. Because \({\dot{q}}\) is a fixed point of \({\texttt {BP}}\), the measure \(\nu _{\dot{q}}\) satisfies \(H^\text {tr}(\nu _{\dot{q}})=H_{\dot{q}}\), so it also satisfies the weaker constraint \(\dot{h}^\text {tr}(H^\text {tr}(\nu _{\dot{q}}))={\dot{h}}\). It follows by the above uniqueness argument that \({\dot{q}}=\tilde{q}\). Therefore, \(\nu _{\dot{q}}\) solves the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H_{\dot{q}}))\), as well as the optimization problem (61) for \(\varvec{\Lambda }(H_{\dot{q}})\), so we conclude \(\varvec{\Xi }(H_{\dot{q}})=0\) as claimed.\(\square \)

Lemma 5.4

For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), let \(\varvec{\Xi }=\varvec{\Xi }_{\lambda ,T}\), \(\varvec{\Xi }_2=\varvec{\Xi }_{2,\lambda ,T}\), and \({\texttt {BP}}={\texttt {BP}}_{\lambda ,T}\).

  (a)

    If \(H\in \mathbf {N}_\circ \) with \(H=H^\text {sy}\) and \(\varvec{\Xi }(H)=0\), then \(H=H_{\dot{q}}\) where \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) is a fixed point of \({\texttt {BP}}\).

  (b)

    If \(H\in \mathbf {N}_\text {se}\) with \(H=H^\text {sy}\) and \(\varvec{\Xi }_2(H)=0\), then \(H=H_{\dot{q}}\) where \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) is a fixed point of \({\texttt {BP}}\).

Proof

Let \(\mu =\nu ^\text {op}(H)\) denote the solution of the optimization problem (61) for \(\varvec{\Lambda }(H)\), and let \(\nu =\nu ^\text {op}(\dot{h}^\text {tr}(H))\) denote the solution of the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))\). Since (62) has a unique optimizer, we have \(\varvec{\Xi }(H)=0\) if and only if \(\mu =\nu \). This means \(H^\text {tr}(\nu )=H\), but also \(\nu =\nu _{\dot{q}}\) from (63), which gives

$$\begin{aligned} \hat{H}(\underline{{\sigma }}) \cong \hat{\Phi }(\underline{{\sigma }})^\lambda (({\texttt {BP}}{\dot{q}})(\dot{\sigma }_1)) \prod _{i=2}^k {\dot{q}}(\dot{\sigma }_i)\,. \end{aligned}$$
(66)

We now claim that in order for \(\hat{H}=\hat{H}^\text {sy}\), we must have \({\texttt {BP}}{\dot{q}}={\dot{q}}\). Note that if \(\hat{\Phi }\) were fully supported on \((\Omega _T)^k\), and both \({\dot{q}}\) and \({\texttt {BP}}{\dot{q}}\) were fully supported on \(\dot{\Omega }_T\), the claim would be obvious. Since \(\hat{\Phi }\) is certainly not fully supported, and we also do not know a priori whether \({\dot{q}}\) and \({\texttt {BP}}{\dot{q}}\) are fully supported, the claim requires some argument, which differs slightly between the first- and second-moment cases:

  (a)

    In the first moment, Lemma 3.3 implies that \({\dot{q}}(\dot{\sigma })\) is positive for at least one \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\). Assume without loss that \({\dot{q}}({\texttt {b}}_{\texttt {0}})\) is positive; it follows that \(({\texttt {BP}}{\dot{q}})(\dot{\sigma })\) is positive for both \(\dot{\sigma }={\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\). For any \(\dot{\sigma }\in \dot{\Omega }\), there exists \(\hat{\sigma }\) such that

    $$\begin{aligned} \hat{\Phi }( (\dot{\sigma },\hat{\sigma }), {\texttt {b}}_{\texttt {0}},\ldots ,{\texttt {b}}_{\texttt {0}})>0\,. \end{aligned}$$
    (67)

    The symmetry of \(\hat{H}\) then gives the relation

    $$\begin{aligned} \frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}})({\texttt {b}}_{\texttt {0}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}})}, \end{aligned}$$

    so it follows that \({\texttt {BP}}{\dot{q}}={\dot{q}}\) in the first moment.

  (b)

    In the second moment, since we restrict to \(H\in \mathbf {N}_\text {se}\), \({\dot{q}}(\dot{\sigma })\) is positive for at least one \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\). Assume without loss that \({\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})\) is positive. For any \(\dot{\sigma }\notin \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\), there exists \(\hat{\sigma }\) such that the second-moment analogue of (67) holds. The preceding argument gives

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} \quad \text {for all } \dot{\sigma }\notin \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\,. \end{aligned}$$

    Since \(({\texttt {BP}}{\dot{q}})(\dot{\sigma })\) is positive for all \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\), it follows that the same holds for \({\dot{q}}\), so

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} \quad \text {for all } \dot{\sigma }\notin \{ {\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}, {\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}}\}\,.\end{aligned}$$

    Combining these, we have for \(\dot{\sigma }\in \{{\texttt {r}}_{{\texttt {0}}}{\texttt {r}}_{{\texttt {1}}},{\texttt {r}}_{{\texttt {1}}}{\texttt {r}}_{{\texttt {0}}}\}\) that

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} =\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} \frac{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} =\frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})}, \end{aligned}$$

    and this proves \({\texttt {BP}}{\dot{q}}={\dot{q}}\) in the second moment.

Altogether, the above proves in both the first- and second-moment settings that \({\dot{q}}\) is a bp fixed point.\(\square \)

5.3 BP contraction and conclusion

The next step is to (explicitly) define a subset \(\varvec{\Gamma }\) of measures \({\dot{q}}\) on which we have a contraction estimate of the form \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \) for a constant \(c<1\). A useful feature of nae-sat is that its bp recursions are self-averaging: if \({\dot{q}}\) is a measure on \(\dot{\Omega }_T\), let

$$\begin{aligned}{\dot{q}}^\text {av}(\dot{\sigma }) \equiv \frac{{\dot{q}}(\dot{\sigma })+{\dot{q}}(\dot{\sigma }\oplus {\texttt {1}})}{2}\,. \end{aligned}$$

Then \(\hat{{\texttt {BP}}}{\dot{q}}=\hat{{\texttt {BP}}}{\dot{q}}^\text {av}\), and consequently \({\texttt {BP}}{\dot{q}}={\texttt {BP}}{\dot{q}}^\text {av}\). The analogous statement holds in the second moment. It then suffices to prove contraction on the measures \({\dot{q}}={\dot{q}}^\text {av}\), since for general \({\dot{q}}\) it implies

$$\begin{aligned}\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert =\Vert {{\texttt {BP}}{\dot{q}}^\text {av}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}^\text {av}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \,. \end{aligned}$$

Abbreviate \(\{{\texttt {r}}\}\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) and \(\{{\texttt {b}}\}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\). In a mild abuse of notation we now write \(\{{\texttt {f}}\}\) for \((\dot{\Omega }\cup \hat{\Omega }){\setminus }\{{\texttt {r}},{\texttt {b}}\}\); so for instance \({\dot{q}}({\texttt {f}}) ={\dot{q}}(\dot{\Omega }{\setminus }\{{\texttt {r}},{\texttt {b}}\}) ={\dot{q}}(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \})\). For the first moment analysis, let \(\varvec{\Gamma }\) be the set of measures \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) satisfying \({\dot{q}}={\dot{q}}^\text {av}\), such that

$$\begin{aligned} \frac{{\dot{q}}({\texttt {r}}) + 2^k{\dot{q}}({\texttt {f}}) }{C} \leqslant {\dot{q}}({\texttt {b}}) \leqslant \frac{{\dot{q}}({\texttt {r}})}{1-C/2^k} \end{aligned}$$
(68)

for C a large constant (to be determined). For the second moment analysis, let \(\varvec{\Gamma }(c,\kappa )\) be the set of measures \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) satisfying \({\dot{q}}={\dot{q}}^\text {av}\), such that

$$\begin{aligned}&|{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})| \leqslant (k^9/2^{ck})\, {\dot{q}}({\texttt {b}}{\texttt {b}}), \text { and } {\dot{q}}({\texttt {f}}{\texttt {f}}) +{\dot{q}}(\{{\texttt {f}}{\texttt {r}},{\texttt {r}}{\texttt {f}}\})/2^k +{\dot{q}}({\texttt {r}}{\texttt {r}})/4^k \leqslant (C/2^k)\, {\dot{q}}({\texttt {b}}{\texttt {b}});&(1\varvec{\Gamma })\\&{\dot{q}}(\{{\texttt {r}}{\texttt {f}},{\texttt {f}}{\texttt {r}}\}) \leqslant (C/2^{k\kappa })\,{\dot{q}}({\texttt {b}}{\texttt {b}}) \text { and } {\dot{q}}({\texttt {r}}{\texttt {r}}) \leqslant C 2^{k(1-\kappa )}\, {\dot{q}}({\texttt {b}}{\texttt {b}});&(2\varvec{\Gamma })\\&{\dot{q}}({\texttt {r}}_{\varvec{x}}\dot{\sigma }) \geqslant [1-C/2^k]\, {\dot{q}}({\texttt {b}}_{\varvec{x}}\dot{\sigma }) \text { and } {\dot{q}}(\dot{\sigma }{\texttt {r}}_{\varvec{x}}) \geqslant [1-C/2^k]\, {\dot{q}}(\dot{\sigma }{\texttt {b}}_{\varvec{x}}) \text { for all } {\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\},\ \dot{\sigma }\in \dot{\Omega }\,.&(3\varvec{\Gamma })\end{aligned}$$

(To clarify the notation: since \({\texttt {b}}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), in (\(1\varvec{\Gamma }\)) the expression \({\dot{q}}({\texttt {b}}{\texttt {b}})\) refers to \({\dot{q}}(\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2)\). Similarly, \({\dot{q}}({\texttt {f}}{\texttt {r}})\) refers to \({\dot{q}}(\{{\texttt {f}}\}\times \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\})\) where in this context \(\{{\texttt {f}}\}=(\dot{\Omega }\cup \hat{\Omega }){\setminus }\{{\texttt {r}},{\texttt {b}}\}\).)

Proposition 5.5

(proved in “Appendix A”) Assume \(0\leqslant \lambda \leqslant 1\). In the first moment, we have:

  (a)

    For any \(1\leqslant T\leqslant \infty \), the map \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T}\) has a unique fixed point \({\dot{q}}_\star \equiv {\dot{q}}_{\lambda ,T}\in \varvec{\Gamma }\). For any \({\dot{q}}\in \varvec{\Gamma }\), we have \({\texttt {BP}}{\dot{q}}\in \varvec{\Gamma }\) also, with \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^2/2^k) \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \).

  (b)

    In the limit \(T\rightarrow \infty \), \(\Vert {{\dot{q}}_{\lambda ,T} - {\dot{q}}_{\lambda ,\infty }} \Vert \rightarrow 0\).

In the second moment, for any \(1\leqslant T\leqslant \infty \), we have the following:

  (A)

    The map \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T}\) has a unique fixed point in \(\varvec{\Gamma }(1,1)\), given by \({\dot{q}}_\star \otimes {\dot{q}}_\star \) with \({\dot{q}}_\star \) as in part a. Moreover, for \(c\in (0,1]\) and k sufficiently large, there is no other fixed point of \({\texttt {BP}}\) in \(\varvec{\Gamma }(c,1)\): if \({\dot{q}}\in \varvec{\Gamma }(c,1)\) then \({\texttt {BP}}{\dot{q}}\in \varvec{\Gamma }(1,1)\), with \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^4/2^k) \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \).

  (B)

    If for some \(c\in (0,1]\) we have \({\dot{q}}\in \varvec{\Gamma }(c,0)\) and \({\dot{q}}= {\texttt {BP}}{\dot{q}}\), then \({\dot{q}}\in \varvec{\Gamma }(c,1)\).

Definition 5.6

(optimal empirical measures) For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), take the fixed point \({\dot{q}}_\star ={\dot{q}}_{\lambda ,T}\) as given by Proposition 5.5a, and use (65) to define \(H_\star =H_{{\dot{q}}_\star }\in \varvec{\Delta }\) and \(H_\bullet =H_{{\dot{q}}_\star \otimes {\dot{q}}_\star }\in \varvec{\Delta }_2\). Note that this agrees with our earlier definition of \(H_\bullet \), in the discussion below Lemma 3.9.

Lemma 5.7

Let \({\dot{q}}\) be any fixed point of \({\texttt {BP}}\) that arises from Lemma 5.4.

  (a)

    If \(H=H_{\dot{q}}\in \mathbf {N}_\circ \), then \({\dot{q}}={\dot{q}}_\star \) and so \(H=H_\star \).

  (b)

    If \(H=H_{\dot{q}}\in \mathbf {N}_\text {se}\), then \({\dot{q}}={\dot{q}}_\star \otimes {\dot{q}}_\star \) and so \(H=H_\bullet \).

Proof

Since \({\dot{q}}={\texttt {BP}}{\dot{q}}\), we must have \({\dot{q}}={\dot{q}}^\text {av}\). Below we argue separately for the first and second moment. In each case we repeatedly take advantage of the fact that \(H=H_{\dot{q}}\) is symmetric.

  a.

    For the first moment, by Proposition 5.5a it suffices to show that \({\dot{q}}\) must lie in the set \(\varvec{\Gamma }\) defined by (68). It follows directly from the relation \({\dot{q}}={\texttt {BP}}{\dot{q}}\) that \({\dot{q}}({\texttt {r}})\geqslant {\dot{q}}({\texttt {b}})\). By definition of \(\mathbf {N}_\circ \) we must have \(\bar{H}({\texttt {r}})\leqslant 7/2^k\) and \(\bar{H}({\texttt {f}})\leqslant 7/2^k\), so the vast majority of clauses must have all incident colors in \(\{{\texttt {b}}\}=\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\):

    $$\begin{aligned}1-\frac{14k}{2^k} \leqslant \hat{H}({\texttt {b}}^k) = \frac{1}{\hat{\mathfrak {z}}}\sum _{\underline{{\sigma }}\in {\texttt {b}}^k} \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=1}^k {\dot{q}}(\dot{\sigma }_i) \leqslant \frac{{\dot{q}}({\texttt {b}})^k}{\hat{\mathfrak {z}}}\,. \end{aligned}$$

    Next, if \(\underline{{\sigma }}\in {\texttt {r}}{\texttt {b}}^{k-1}\), we have \(\hat{\Phi }(\underline{{\sigma }})^\lambda =\mathbb {E}^{\text {lit}} [\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})^\lambda ] \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})= 2/2^k\), so

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}({\texttt {r}})= \hat{H}({\texttt {r}}{\texttt {b}}^{k-1}) \geqslant \frac{{\dot{q}}({\texttt {r}}){\dot{q}}({\texttt {b}})^{k-1}}{2^{k-1} \hat{\mathfrak {z}}} \geqslant \frac{{\dot{q}}({\texttt {r}})}{{\dot{q}}({\texttt {b}})} \frac{2}{2^k} \bigg (1-\frac{14k}{2^k} \bigg )\,,\end{aligned}$$

    which gives \({\dot{q}}({\texttt {r}})/{\dot{q}}({\texttt {b}})\leqslant 4\) for large k. Similarly, if \(\underline{{\sigma }}\in {\texttt {f}}{\texttt {b}}^{k-1}\) with \(\hat{\sigma }_1={\texttt {s}}\) (indicating a separating clause), then \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}}) = 1-4/2^k\), so

    $$\begin{aligned}\frac{7}{2^k} \geqslant \bar{H}({\texttt {f}}) \geqslant \hat{H}({\texttt {f}}{\texttt {b}}^{k-1}) \geqslant \bigg (1-\frac{4}{2^k}\bigg ) \frac{{\dot{q}}({\texttt {f}}){\dot{q}}({\texttt {b}})^{k-1}}{\hat{\mathfrak {z}}} \geqslant \frac{{\dot{q}}({\texttt {f}})}{{\dot{q}}({\texttt {b}})} \bigg (1-\frac{4}{2^k}\bigg ) \bigg (1-\frac{14k}{2^k}\bigg )\,, \end{aligned}$$

    which gives \({\dot{q}}({\texttt {f}})/{\dot{q}}({\texttt {b}}) \leqslant 8/2^k\) for large k. Combining these estimates proves \({\dot{q}}\in \varvec{\Gamma }\).
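The arithmetic behind these two ratio bounds is elementary and can be sanity-checked numerically. The following sketch is purely illustrative (it checks only the displayed inequalities, not the proof itself); the function names are ours:

```python
# Sanity check of the two ratio bounds above (illustrative only).
# From 7/2^k >= (q(r)/q(b)) * (2/2^k) * (1 - 14k/2^k), rearranging gives
#   q(r)/q(b) <= 3.5 / (1 - 14k/2^k),
# which is at most 4 once k is moderately large.  Likewise, from
#   7/2^k >= (q(f)/q(b)) * (1 - 4/2^k) * (1 - 14k/2^k),
# rearranging gives q(f)/q(b) <= 8/2^k for large k.

def ratio_r_bound(k: int) -> float:
    """Upper bound on q(r)/q(b) implied by the first display."""
    return 3.5 / (1 - 14 * k / 2**k)

def ratio_f_bound(k: int) -> float:
    """Upper bound on q(f)/q(b) implied by the second display."""
    return (7 / 2**k) / ((1 - 4 / 2**k) * (1 - 14 * k / 2**k))

for k in range(12, 40):
    assert ratio_r_bound(k) <= 4
    assert ratio_f_bound(k) <= 8 / 2**k
```

Here "large k" already means k of moderate size: the checks pass from k = 12 onward, and both error factors only improve as k grows.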

  b.

    For the second moment, by Proposition 5.5A it suffices to verify \({\dot{q}}\in \varvec{\Gamma }(1,1)\), as defined by (\(1\varvec{\Gamma }\))–(\(3\varvec{\Gamma }\)). Condition (\(3\varvec{\Gamma }\)) is immediate from the relation \({\dot{q}}={\texttt {BP}}{\dot{q}}\). Moreover, by Proposition 5.5B it suffices to show \({\dot{q}}\in \varvec{\Gamma }(1,0)\), in which case condition (\(2\varvec{\Gamma }\)) follows from (\(1\varvec{\Gamma }\)). It remains to verify (\(1\varvec{\Gamma }\)). To this end, we denote \(\mathbb {B}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\), and partition this into \(\mathbb {B}_\texttt {=}\equiv \{{\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}\}\) and \(\mathbb {B}_{\ne }\equiv \{{\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}},{\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {0}}\}\). By definition, for any \(H\in \mathbf {N}_\text {se}\) the single-copy marginals \(H^1,H^2\) lie in \(\mathbf {N}\subseteq \mathbf {N}_\circ \), so the total density of \(\{{\texttt {r}},{\texttt {f}}\}\) edges in either copy is very small. As a result the vast majority of clauses have all incident colors in \(\mathbb {B}\):

    $$\begin{aligned}1-\frac{28k}{2^k} \leqslant \hat{H}(\mathbb {B}^k) \leqslant \frac{{\dot{q}}(\mathbb {B})^k}{\hat{\mathfrak {z}}}\,.\end{aligned}$$

    For \(H\in \mathbf {N}_\text {se}\), we have . The fraction of edges in not-all-\(\mathbb {B}\) clauses is \(O(k/2^k)\), and for \(\underline{{\sigma }}\in \mathbb {B}^k\) we have \(1\geqslant \hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}}) = 1 - O(k/2^k)\), so

    Rearranging gives , which proves the first part of (\(1\varvec{\Gamma }\)) (with \(c=1\)). It remains to show the second part of (\(1\varvec{\Gamma }\)). If we denote \(\mathbb {R}_\texttt {=}\equiv \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}}\}\) and consider \(\underline{{\sigma }}\in \mathbb {R}_\texttt {=}(\mathbb {B}_\texttt {=})^{k-1}\), then (similarly as above) we have \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=2/2^k\), so

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}(\mathbb {R}_\texttt {=}) =\hat{H}(\mathbb {R}_\texttt {=}(\mathbb {B}_\texttt {=})^{k-1}) \geqslant \frac{2}{2^k} \frac{{\dot{q}}(\mathbb {R}_\texttt {=}){\dot{q}}(\mathbb {B}_\texttt {=})^{k-1}}{\hat{\mathfrak {z}}} \geqslant \frac{2}{4^k}\frac{{\dot{q}}(\mathbb {R}_\texttt {=})}{{\dot{q}}(\mathbb {B})} \bigg ( 1-\frac{O(k^5)}{2^{k/2}}\bigg )\,,\end{aligned}$$

    where the last inequality is by the preceding estimates on \({\dot{q}}(\mathbb {B})\) and \({\dot{q}}(\mathbb {B}_\texttt {=})\). The same calculation bounds \({\dot{q}}(\mathbb {R}_{\ne })\) for \(\mathbb {R}_{\ne }\equiv \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\). Next consider \(\sigma =((\dot{\sigma },{\texttt {s}}),{\texttt {r}})\in \{{\texttt {f}}{\texttt {r}}\}\): if \(\underline{{\sigma }}\in \Omega ^k\) with \(\sigma _1=\sigma \) and the other \(k-1\) entries in \(\mathbb {B}\), then \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant 4/2^k\) as long as the other entries are not all \(\mathbb {B}_\texttt {=}\) or all \(\mathbb {B}_{\ne }\). Therefore

    and the same calculation bounds \({\dot{q}}({\texttt {r}}{\texttt {f}})\). Finally, for \(\sigma =((\dot{\sigma }^1,{\texttt {s}}),(\dot{\sigma }^2,{\texttt {s}}))\in \{{\texttt {f}}{\texttt {f}}\}\), we can consider \(\underline{{\sigma }}\in \Omega ^k\) with \(\sigma _1=\sigma \) and the other \(k-1\) entries in \(\mathbb {B}\); therefore

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}({\texttt {f}}{\texttt {f}}) \geqslant \frac{{\dot{q}}({\texttt {f}}{\texttt {f}}){\dot{q}}(\mathbb {B})^{k-1}}{\hat{\mathfrak {z}}} \bigg (1-\frac{4}{2^k}\bigg ) \geqslant \frac{{\dot{q}}({\texttt {f}}{\texttt {f}})}{{\dot{q}}(\mathbb {B})}\bigg (1-\frac{O(k)}{2^k}\bigg )\,.\end{aligned}$$

    Combining these estimates verifies the second part of (\(1\varvec{\Gamma }\)).

Altogether we have shown that if \(H=H_{\dot{q}}\) lies in \(\mathbf {N}_\circ \), then \({\dot{q}}\in \varvec{\Gamma }\) and so \({\dot{q}}={\dot{q}}_\star \); and if \(H=H_{\dot{q}}\) lies in \(\mathbf {N}_\text {se}\) then \({\dot{q}}\in \varvec{\Gamma }(1,1)\) and so \({\dot{q}}={\dot{q}}_\star \otimes {\dot{q}}_\star \). This concludes the proof. \(\square \)

Proof of Proposition 5.1

We will prove the claim in the first moment; the result for the second moment follows by the same argument. It follows from Lemmas 5.3, 5.4, and 5.7 that the unique minimizer of \(\varvec{\Xi }\) on the set \(\{H\in \mathbf {N}_\circ :H=H^\text {sy}\}\) is \(H_\star \) (as given by Definition 5.6), with \(\varvec{\Xi }(H_\star )=0\). It remains to establish that, for \(H\in \varvec{\Delta }\) with \(H=H^\text {sy}\) and \(\Vert {H-H_\star } \Vert \leqslant \epsilon \), we have \(\varvec{\Xi }(H)\geqslant \epsilon \Vert {H-H_\star } \Vert ^2\). As in the proof of Lemma 5.4, let \(\mu =\nu ^\text {op}(H)\) be the solution of the optimization problem (61) for \(\varvec{\Lambda }(H)\), and let \(\nu =\nu ^\text {op}(\dot{h}^\text {tr}(H))\) be the solution of the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))\). We have from (63) that for some \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\),

$$\begin{aligned} \nu (\underline{{\sigma }}_\mathcal {D}) =\nu _{\dot{q}}(\underline{{\sigma }}_\mathcal {D}) = \frac{\varvec{w}_\mathcal {D}(\underline{{\sigma }})^\lambda }{Z} \prod _{e\in \delta \mathcal {D}}{\dot{q}}(\dot{\sigma }_e)\,. \end{aligned}$$

For \(e\in \delta \mathcal {D}\), abbreviate \(g_e(\underline{{\sigma }}_\mathcal {D}) \equiv \ln {\dot{q}}(\dot{\sigma }_e)\). Then, for any probability measure \(\varpi \) on colorings \(\underline{{\sigma }}_\mathcal {D}\), the quantity \(\Lambda (\varpi ) \equiv \mathcal {H}(\varpi )+\lambda \langle \ln \varvec{w}_\mathcal {D},\varpi \rangle \) can be expressed as

$$\begin{aligned} \Lambda (\varpi )= & {} \mathcal {H}(\varpi ) +\bigg \langle \ln \nu +\ln Z -\sum _{e\in \delta \mathcal {D}}g_e,\varpi \bigg \rangle \\= & {} -\mathcal {D}_{\textsc {kl}}(\varpi |\nu )+\ln Z -|\delta \mathcal {D}|\langle \ln {\dot{q}}, \dot{h}^\text {tr}(H^\text {tr}(\varpi ))\rangle \,. \end{aligned}$$

We have \(\dot{h}^\text {tr}(H^\text {tr}(\varpi ))=\dot{h}^\text {tr}(H)\) for both \(\varpi =\nu \) and \(\varpi =\mu \), so \(\varvec{\Xi }(H) =\Lambda (\mu )-\Lambda (\nu )=\mathcal {D}_{\textsc {kl}}(\mu |\nu )\). (For further discussion, see Proposition C.6.) It is well known that \(\mathcal {D}_{\textsc {kl}}(\mu |\nu )\gtrsim \Vert {\mu -\nu } \Vert ^2\), so to conclude it remains for us to show that \(\Vert {\mu -\nu } \Vert \gtrsim \Vert {H-H_\star } \Vert \). To this end, let \(\nu _\star \equiv \nu _{{\dot{q}}_\star }\), and note that \(H=H^\text {tr}(\mu )\) while \(H_\star =H^\text {tr}(\nu _\star )\). Recall from the discussion preceding Lemma 5.2 that \(\varpi \mapsto H^\text {tr}(\varpi )\) is a linear projection, so

$$\begin{aligned}\Vert {H-H_\star } \Vert \lesssim \Vert {\mu -\nu _\star } \Vert \leqslant \Vert {\mu -\nu } \Vert +\Vert {\nu -\nu _\star } \Vert \lesssim \Vert {\mu -\nu } \Vert +\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \,, \end{aligned}$$

where the last bound holds since \(\nu =\nu _{\dot{q}}\) and \(\nu _\star =\nu _{{\dot{q}}_\star }\). Recall from Proposition 5.5a (or Proposition 5.5A for the second moment) that we have the contraction estimate \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \) for \(c\in (0,1)\), so

$$\begin{aligned}(1-c)\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert -\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant \Vert {{\dot{q}}-{\texttt {BP}}{\dot{q}}} \Vert \,. \end{aligned}$$
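Both norm inequalities invoked in this step are elementary, and a quick numerical sketch (illustrative only; the linear map below is a stand-in for \({\texttt {BP}}\), not the actual operator) confirms them: Pinsker's inequality \(\mathcal {D}_{\textsc {kl}}(\mu |\nu )\geqslant \tfrac{1}{2}\Vert \mu -\nu \Vert _1^2\), and the triangle-inequality consequence of contraction displayed above.

```python
import math
import random

random.seed(0)

def kl(mu, nu):
    """KL divergence D(mu | nu) in nats."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)

def l1(u, v):
    """L1 distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def rand_dist(n):
    w = [random.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Pinsker's inequality: KL(mu | nu) >= (1/2) ||mu - nu||_1^2.
for _ in range(1000):
    mu, nu = rand_dist(5), rand_dist(5)
    assert kl(mu, nu) >= 0.5 * l1(mu, nu) ** 2 - 1e-12

# Contraction step: if ||Fq - q*|| <= c ||q - q*|| with c < 1, then by the
# triangle inequality (1 - c) ||q - q*|| <= ||q - Fq||.  Here F is a toy
# linear map F(q) = q* + c (q - q*) contracting toward q* at rate c.
c = 0.3
qstar = rand_dist(5)
for _ in range(1000):
    q = rand_dist(5)
    Fq = [qs + c * (qi - qs) for qi, qs in zip(q, qstar)]
    assert (1 - c) * l1(q, qstar) <= l1(q, Fq) + 1e-12
```

For the toy linear map the contraction step holds with equality, since \(q - Fq = (1-c)(q - q_\star)\) exactly; the proof only needs the inequality.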

Let \(K\equiv ({\dot{K}},{\hat{K}},{\bar{K}})\equiv H^\text {tr}(\nu )\), and note that \({\hat{K}}\) need not be symmetric: if we let \({\hat{K}}'(\underline{{\sigma }})\equiv {\hat{K}}(\sigma _2,\ldots ,\sigma _k,\sigma _1)\) for \(\underline{{\sigma }}\in (\Omega _T)^k\), then \({\hat{K}}\) and \({\hat{K}}'\) need not agree. On the other hand \(H=H^\text {tr}(\mu )=H^\text {sy}\), so

$$\begin{aligned}\Vert {{\hat{K}}-{\hat{K}}'} \Vert \leqslant \Vert {\hat{H}-{\hat{K}}} \Vert +\Vert {\hat{H}-{\hat{K}}'} \Vert =2\Vert {\hat{H}-{\hat{K}}} \Vert \leqslant 2\Vert {H-K} \Vert \lesssim \Vert {\mu -\nu } \Vert \,. \end{aligned}$$

For any k-tuple \(\underline{{{\dot{h}}}}\equiv ({\dot{h}}_1,\ldots ,{\dot{h}}_k)\) of probability measures on \(\dot{\Omega }_T\), consider

$$\begin{aligned}\hat{H}^\text {op}(\underline{{{\dot{h}}}}) \equiv {{\,\mathrm{arg\,max}\,}}_{\hat{\nu }} \bigg \{\mathcal {H}(\hat{\nu }) + \lambda \langle \ln \hat{\Phi },\hat{\nu }\rangle : \hat{\nu }(\dot{\sigma }_1=\cdot )={\dot{h}}_i \text { for all }i \bigg \} \end{aligned}$$

where \(\hat{\nu }\) denotes any probability measure on \({{\,\mathrm{supp}\,}}\hat{\Phi }\subseteq (\Omega _T)^k\). The unique optimizer \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})\) can be described in terms of another k-tuple of probability measures on \(\dot{\Omega }_T\), denoted \(\underline{{{\dot{q}}}}\equiv ({\dot{q}}_1,\ldots ,{\dot{q}}_k)\), which serve as Lagrange multipliers: \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})=\hat{H}(\underline{{{\dot{q}}}})\) where

$$\begin{aligned}{}[\hat{H}(\underline{{{\dot{q}}}})](\underline{{\sigma }})\cong \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=1}^k{\dot{q}}_i(\dot{\sigma }_i)\,. \end{aligned}$$

In particular, \(\hat{H}^\text {op}(\underline{{{\dot{h}}}}_\star )=\hat{H}(\underline{{{\dot{q}}}}_\star )\) for \(\underline{{{\dot{h}}}}_\star \equiv (\dot{h}^\text {tr}(H_\star ),\ldots ,\dot{h}^\text {tr}(H_\star ))\) and \(\underline{{{\dot{q}}}}_\star \equiv ({\dot{q}}_\star ,\ldots ,{\dot{q}}_\star )\). For \(\underline{{{\dot{h}}}}\) near \(\underline{{{\dot{h}}}}_\star \), there is a unique \(\underline{{{\dot{q}}}}\) satisfying \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})=\hat{H}(\underline{{{\dot{q}}}})\), and we can determine this \(\underline{{{\dot{q}}}}\) as a smooth function of \(\underline{{{\dot{h}}}}\). Thus

$$\begin{aligned}\Vert {{\hat{K}}-{\hat{K}}'} \Vert =\Vert {{\hat{H}}({\texttt {BP}}{\dot{q}},{\dot{q}},\ldots ,{\dot{q}}) -{\hat{H}}({\dot{q}},{\texttt {BP}}{\dot{q}},{\dot{q}},\ldots ,{\dot{q}}) } \Vert \gtrsim \Vert {{\dot{q}}-{\texttt {BP}}{\dot{q}}} \Vert \,. \end{aligned}$$

Combining the above inequalities gives \(\Vert {H-H_\star } \Vert \lesssim \Vert {\mu -\nu } \Vert \) as desired.\(\square \)
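The Lagrange-multiplier description of \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})\) can be made concrete on a toy alphabet: maximizing entropy plus \(\lambda \langle \ln \hat{\Phi },\hat{\nu }\rangle \) subject to marginal constraints yields an optimizer of product form \(\hat{\Phi }^\lambda \prod _i {\dot{q}}_i\), and the multipliers can be computed by iterative proportional (Sinkhorn-type) scaling. The sketch below uses k = 2 with made-up weights and marginals, purely for illustration:

```python
# Toy illustration (k = 2): the entropy-optimal measure with fixed
# single-coordinate marginals h1, h2 has the product form
#     nu(s1, s2)  proportional to  Phi(s1, s2)^lam * q1(s1) * q2(s2),
# and the multipliers q1, q2 are found by alternating scaling.
# All numbers below are hypothetical.
lam = 0.8
Phi = [[1.0, 0.5], [0.5, 1.0]]
W = [[Phi[i][j] ** lam for j in range(2)] for i in range(2)]
h1, h2 = [0.6, 0.4], [0.3, 0.7]

q1, q2 = [1.0, 1.0], [1.0, 1.0]
for _ in range(500):
    # rescale q1 so that the first marginal matches h1, then q2 for h2
    for i in range(2):
        q1[i] *= h1[i] / sum(W[i][j] * q1[i] * q2[j] for j in range(2))
    for j in range(2):
        q2[j] *= h2[j] / sum(W[i][j] * q1[i] * q2[j] for i in range(2))

# assemble and normalize the product-form optimizer
nu = [[W[i][j] * q1[i] * q2[j] for j in range(2)] for i in range(2)]
Z = sum(sum(row) for row in nu)
nu = [[x / Z for x in row] for row in nu]

# both marginal constraints are met by the product-form measure
assert all(abs(sum(nu[i]) - h1[i]) < 1e-8 for i in range(2))
assert all(abs(sum(nu[i][j] for i in range(2)) - h2[j]) < 1e-8 for j in range(2))
```

Since the weight matrix is strictly positive, the alternating scaling converges, and near \(\underline{{{\dot{h}}}}_\star \) the resulting multipliers depend smoothly on the prescribed marginals, which is the fact used above.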

Proof of Propositions 3.4 and 3.10

Note that for \(\bar{H}\) fixed, \(\varvec{F}(H)=\varvec{F}(\dot{H},\hat{H},\bar{H})\) is a strictly concave function of \(\dot{H},\hat{H}\). It follows that for all \(H\in \varvec{\Delta }\) we have \(\varvec{F}(H)\leqslant \varvec{F}(L_H)-\epsilon \Vert {H-L_H} \Vert ^2\) for

$$\begin{aligned} L_H\equiv {{\,\mathrm{arg\,max}\,}}_L\Big \{\varvec{F}(L): L\in \varvec{\Delta }\text { with } {\bar{L}}=\bar{H}\Big \}\,. \end{aligned}$$

Clearly \(L_H=(L_H)^\text {sy}\), so it follows from Theorem 4.2 and Proposition 5.1 that

$$\begin{aligned} \varvec{F}(L_H)\leqslant \varvec{F}(H_\star ) - \epsilon \Vert {L_H-H_\star } \Vert ^2. \end{aligned}$$

Combining the inequalities, and noting that \(\Vert {H-H_\star } \Vert ^2\leqslant 2\Vert {H-L_H} \Vert ^2+2\Vert {L_H-H_\star } \Vert ^2\) (which is why \(\epsilon \) must be adjusted), gives \(\varvec{F}(H)\leqslant \varvec{F}(H_\star )-\epsilon \Vert {H-H_\star } \Vert ^2\). This concludes the proof of Proposition 3.4, and Proposition 3.10 follows by exactly the same argument. \(\square \)
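The "adjusting \(\epsilon \)" step rests on the elementary inequality \((a+b)^2\leqslant 2a^2+2b^2\), applied with \(a=\Vert H-L_H\Vert \) and \(b=\Vert L_H-H_\star \Vert \). A minimal numeric check (illustrative only):

```python
# Check of the inequality behind "adjusting epsilon as needed":
# ||H - H*|| <= ||H - L|| + ||L - H*||, and (a + b)^2 <= 2a^2 + 2b^2,
# so a quadratic gain in each of the two distances yields a quadratic
# gain in ||H - H*|| at the cost of halving epsilon.
import random

random.seed(1)
for _ in range(1000):
    a, b = random.random(), random.random()
    assert (a + b) ** 2 <= 2 * a ** 2 + 2 * b ** 2 + 1e-12
```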