1 Introduction

In a random constraint satisfaction problem (csp), we have n variables taking values in a (finite) alphabet \(\mathcal {X}\), subject to a random set of constraints. In previous works on models of this kind, it has emerged that the space of solutions—a random subset of \(\mathcal {X}^n\)—can have a complicated structure, posing obstacles to mathematical analysis. Major advances in intuition were achieved by statistical physicists, who developed powerful analytic heuristics to shed light on the behavior of random csps ([31] and references therein). Their insights and methods are fundamental to the current understanding of random csps.

One prominent application of the physics heuristic is in giving explicit (non-rigorous) predictions for the locations of satisfiability thresholds in a large class of random csps ([36] and others). Recent works have given rigorous proofs for some of these thresholds: in the random regular nae-sat model [22, 25], in the random k-sat model [23], and in the independent set model on random regular graphs [24]. However, the satisfiability threshold is only one aspect of the rich picture that physicists have developed. There are deep conjectures for the behavior of these models inside the satisfiable regime, and it remains an outstanding mathematical challenge to prove them. In this paper we address one part of this challenge, concerning the total number of solutions for a typical instance in the satisfiable regime.

1.1 Main result

Given a cnf boolean formula, a not-all-equal-sat (nae-sat) solution is an assignment \(\underline{{{\varvec{x}}}}\) of literals to variables such that both \(\underline{{{\varvec{x}}}}\) and its negation \(\lnot \underline{{{\varvec{x}}}}\) evaluate to true—equivalently, such that no clause gives the same evaluation to all its variables. A k-nae-sat problem is one in which each clause has exactly k literals; it is termed d-regular if each variable appears in exactly d clauses. Sampling such a formula uniformly at random gives rise to the random d-regular k-nae-sat model. (The formal definition is given in Sect. 2.) See [3] for important early work on the closely related model of random (Erdős–Rényi) nae-sat. The appeal of this model is that it has certain symmetries making the analysis particularly tractable, yet it is expected to share most of the interesting qualitative phenomena exhibited by other commonly studied problems, including random k-sat and random graph colorings.

Following convention, we fix k and then parametrize the model by its clause-to-variable ratio, \(\alpha \equiv d/k\). The partition function, denoted \(Z\equiv Z_n\), is the number of valid nae-sat assignments for an instance on n variables. It is conjectured that for each \(k\geqslant 3\), the model has an exact satisfiability threshold \(\alpha _\text {sat}(k)\): for \(\alpha <\alpha _\text {sat}\) it is satisfiable (\(Z>0\)) with high probability, but for \(\alpha >\alpha _\text {sat}\) it is unsatisfiable (\(Z=0\)) with high probability. This has been proved [25, Thm. 1] for all k exceeding an absolute constant \(k_0\), together with an exact formula for \(\alpha _\text {sat}\) which matches the physics prediction. It can be approximated as

$$\begin{aligned} \alpha _\text {sat}= \left( 2^{k-1} -\frac{1}{2} -\frac{1}{4\ln 2}\right) \ln 2 + \epsilon _k \end{aligned}$$
(1)

where \(\epsilon _k\) denotes an error tending to zero as \(k\rightarrow \infty \).

We say the model has free energy \({\textsf {f}}(\alpha )\) if \(Z^{1/n}\) converges to \({\textsf {f}}(\alpha )\) in probability as \(n\rightarrow \infty \). A priori, the limit may not be well-defined. If it exists, however, Markov’s inequality and Jensen’s inequality imply that it must be upper bounded by the replica symmetric free energy

$$\begin{aligned} {\textsf {f}}^\textsc {rs}(\alpha ) \equiv (\mathbb {E}Z)^{1/n} = 2 \bigg (1-\frac{2}{2^k}\bigg )^\alpha \,. \end{aligned}$$
(2)

(In this model and in other random regular models, the replica symmetric free energy is the same as the annealed free energy.) One of the intriguing predictions from the physics analysis [38, 44] is that there is a critical value \(\alpha _\text {cond}\) strictly below \(\alpha _\text {sat}\), such that \({\textsf {f}}(\alpha )\) and \({\textsf {f}}^\textsc {rs}(\alpha )\) agree up to \(\alpha =\alpha _\text {cond}\) and differ thereafter. In particular, this implies that the function \({\textsf {f}}(\alpha )\) must be non-analytic at \(\alpha =\alpha _\text {cond}\). This is the condensation transition (or Kauzmann transition), and will be further described below in Sect. 1.2. For all \(0\leqslant \alpha <\alpha _\text {sat}\), the free energy is predicted to be given by a formula

$$\begin{aligned}{\textsf {f}}(\alpha )={\textsf {f}}^\textsc {1rsb}(\alpha ) {\left\{ \begin{array}{ll} ={\textsf {f}}^\textsc {rs}(\alpha ) &{} \text {for }0\leqslant \alpha \leqslant \alpha _\text {cond},\\ <{\textsf {f}}^\textsc {rs}(\alpha ) &{} \text {for }\alpha >\alpha _\text {cond}. \end{array}\right. } \end{aligned}$$

The function \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is quite explicit, although not extremely simple to state; it is formally presented below in Definition 1.3. The formula for \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is derived via the one-step replica symmetry breaking (1rsb) heuristic, discussed further below. Our main result is to prove this prediction for large k:

Theorem 1

In random regular k-nae-sat with \(k\geqslant k_0\), for all \(\alpha <\alpha _\text {sat}(k)\) the free energy \({\textsf {f}}(\alpha )\) exists and equals the predicted value \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

Remark 1.1

We allow for \(k_0\) to be adjusted as long as it remains an absolute constant (so it need not equal the \(k_0\) from [25]). It is assumed throughout the paper that \(k\geqslant k_0\), even when not explicitly stated. The following observations restrict the range of \(\alpha =d/k\) that we must consider:

  • A convenient upper bound on the satisfiable regime is given by

    $$\begin{aligned} \alpha _\text {sat}\leqslant \alpha _{\textsc {rs}}\equiv \frac{\ln 2}{-\ln (1-2/2^k)} < 2^{k-1}\ln 2 \equiv \alpha _\text {ubd}\,. \end{aligned}$$

    This bound is certainly implied by the estimate (1) from [25], but it follows much more easily and directly from the first moment calculation (2). Indeed, we see from (2) that the function \({\textsf {f}}^\textsc {rs}(\alpha )\) is decreasing in \(\alpha \) and satisfies \({\textsf {f}}^\textsc {rs}(\alpha _{\textsc {rs}})=1\), so \((\mathbb {E}Z)^{1/n}<1\) for all \(\alpha >\alpha _{\textsc {rs}}\). Thus, by Markov’s inequality, \(\mathbb {P}(Z>0) \leqslant \mathbb {E}Z\), which tends to zero as \(n\rightarrow \infty \); i.e., the random problem instance is unsatisfiable with high probability.

  • For \(\alpha >\alpha _\text {sat}\) we must have \({\textsf {f}}(\alpha )=0\). On the other hand, we can see by comparing (1) and (2) that \(\alpha _\text {sat}\) is strictly smaller than \(\alpha _{\textsc {rs}}\), and \({\textsf {f}}^\textsc {rs}(\alpha _\text {sat})\) is strictly positive. This suggests that \(\alpha _\text {cond}\) occurs strictly before \(\alpha _\text {sat}\), since \(\alpha _\text {cond}=\alpha _\text {sat}\) would mean that \({\textsf {f}}(\alpha )={\textsf {f}}^\textsc {rs}(\alpha )\) up to \(\alpha _\text {sat}\), and in this case we would expect to have \(\alpha _\text {sat}=\alpha _{\textsc {rs}}\). Formally, it requires further argument to confirm that \(\alpha _\text {cond}<\alpha _\text {sat}\) in random regular nae-sat, and we obtain this as a consequence of results in the present paper. However, the phenomenon of \(\alpha _\text {cond}<\alpha _\text {sat}\) was previously confirmed by [20] and [8] for random hypergraph bicoloring and random regular sat, both of which are very similar to random regular nae-sat. As for the value of \({\textsf {f}}(\alpha )\) at the threshold \(\alpha =\alpha _\text {sat}\), we point out that \(\alpha =\alpha _\text {sat}\) makes sense in the setting of this paper only if \(d_\text {sat}(k)\equiv k\alpha _\text {sat}(k)\) is integer-valued for some k. We have no reason to think that this ever occurs; however, if it does, then the probability for \(Z>0\) is bounded away from both zero and one [25, Thm. 1]. In this case, \(Z^{1/n}\) does not concentrate around a single value but rather on two values,

    $$\begin{aligned}\bigg \{0, \lim _{\alpha \uparrow \alpha _\text {sat}} {\textsf {f}}^\textsc {1rsb}(\alpha ) \bigg \}\,. \end{aligned}$$
  • In [25, Propn. 1.1] it is shown that for \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\equiv (2^{k-1}-2)\ln 2\) and n large enough,

    $$\begin{aligned} \frac{\mathbb {E}(Z^2)}{(\mathbb {E}Z)^2} \leqslant C\equiv C(k,\alpha )<\infty \end{aligned}$$

    where \(Z\equiv Z_n\) and \(C(k,\alpha )\) does not depend on n. Thus, for any fixed \(0<\epsilon <1\) and n large enough,

    $$\begin{aligned}\mathbb {P}(Z\geqslant \epsilon \mathbb {E}Z) {\mathop {\geqslant }\limits ^{\odot }} \frac{(\mathbb {E}(Z \mathbf {1}\{ Z\geqslant \epsilon \mathbb {E}Z\}))^2}{\mathbb {E}(Z^2)} \geqslant \frac{(1-\epsilon )^2(\mathbb {E}Z)^2}{\mathbb {E}(Z^2)} \geqslant \frac{(1-\epsilon )^2}{C} \equiv \delta \, \end{aligned}$$

    where the step marked \(\odot \) is by the Cauchy–Schwarz inequality. The results of [25, Sec. 6] imply the stronger statement that for any \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\),

    $$\begin{aligned} \lim _{\epsilon \downarrow 0} \liminf _{n\rightarrow \infty } \mathbb {P}(Z \geqslant \epsilon \mathbb {E}Z)=1\,. \end{aligned}$$

    On the other hand we already noted in (2) that \(\mathbb {E}(Z^{1/n}) \leqslant (\mathbb {E}Z)^{1/n}={\textsf {f}}^\textsc {rs}(\alpha )\) for all \(\alpha \geqslant 0\) and \(n\geqslant 1\). It follows by combining these facts that \(Z^{1/n}\) converges in probability to \({\textsf {f}}^\textsc {rs}(\alpha )\) for any \(0\leqslant \alpha \leqslant \alpha _\text {lbd}\). That is to say, the result of Theorem 1 is already proved for \(\alpha \leqslant \alpha _\text {lbd}\), with \({\textsf {f}}(\alpha )={\textsf {f}}^\textsc {rs}(\alpha )\). This also implies that the condensation transition \(\alpha _\text {cond}\) must occur above \(\alpha _\text {lbd}\).
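The chain of inequalities above is the Paley–Zygmund bound. As a purely illustrative sanity check, it can be verified numerically on a toy nonnegative random variable (an arbitrary exponential stand-in for \(Z\), not the nae-sat partition function):

```python
import random

random.seed(0)
# Toy nonnegative random variable standing in for Z (an arbitrary
# exponential sample, NOT the nae-sat partition function).
samples = [random.expovariate(1.0) for _ in range(100000)]
EZ = sum(samples) / len(samples)
EZ2 = sum(z * z for z in samples) / len(samples)

eps = 0.5
# Empirical probability that Z exceeds eps * E(Z).
p_emp = sum(z >= eps * EZ for z in samples) / len(samples)
# Paley-Zygmund lower bound: P(Z >= eps*EZ) >= (1-eps)^2 (EZ)^2 / E(Z^2).
pz = (1 - eps) ** 2 * EZ ** 2 / EZ2
print(p_emp, pz)
```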

In summary, we have \(\alpha _\text {lbd}< \alpha _\text {sat}< \alpha _{\textsc {rs}}< \alpha _\text {ubd}\), and it remains to prove Theorem 1 for \(\alpha \in (\alpha _\text {lbd},\alpha _\text {sat})\). Thus, we can assume for the remainder of the paper that

$$\begin{aligned} (2^{k-1}-2)\ln 2 = \alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}= 2^{k-1}\ln 2\,. \end{aligned}$$
(3)
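The ordering of these thresholds can be illustrated numerically. The sketch below uses the leading-order formula (1) with the unknown error \(\epsilon _k\) dropped, at the arbitrary choice \(k=20\); it is a consistency check, not a proof:

```python
import math

k = 20          # arbitrary toy choice of k
ln2 = math.log(2)

# Replica symmetric free energy (2): (E Z)^{1/n} = 2(1 - 2/2^k)^alpha.
f_rs = lambda a: 2 * (1 - 2 / 2 ** k) ** a

a_lbd = (2 ** (k - 1) - 2) * ln2                    # second moment regime boundary
a_sat = (2 ** (k - 1) - 0.5 - 1 / (4 * ln2)) * ln2  # approximation (1), eps_k dropped
a_rs = ln2 / -math.log(1 - 2 / 2 ** k)              # root of f_rs(alpha) = 1
a_ubd = 2 ** (k - 1) * ln2                          # crude upper bound

print(a_lbd < a_sat < a_rs < a_ubd)   # the ordering from Remark 1.1
print(f_rs(a_rs))                     # equals 1 by construction
```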

In the course of proving Theorem 1 we will also identify the condensation threshold \(\alpha _\text {cond}\in (\alpha _\text {lbd},\alpha _\text {sat})\) (characterized in Proposition 1.4 below).

The 1rsb heuristic, along with its implications for the condensation and satisfiability thresholds, has been studied in numerous recent works, which we briefly survey here. The existence of a condensation transition was first shown in random hypergraph bicoloring [20], which as we mentioned above is a model very similar to random nae-sat. We also point out [17] which is the first work to successfully analyze solution clusters within the condensation regime, leading to a very good lower bound on the satisfiability threshold. This was an important precursor to subsequent works [23,24,25] on exact satisfiability thresholds in random regular nae-sat, random sat, and independent sets. Condensation has been demonstrated to occur even at positive temperature in hypergraph bicoloring [11]. However, determining the precise location of \(\alpha _\text {cond}\) is challenging, and was first achieved for the random graph coloring model [10] by an impressive and technically challenging analysis. A related paper pinpoints \(\alpha _\text {cond}\) for random regular k-sat (which again is very similar to nae-sat) [8]. Subsequent work [15] characterizes the condensation threshold in a more general family of models, and shows a correspondence with information-theoretic thresholds in statistical inference problems. The main contribution of this paper is to determine for the first time the free energy throughout the condensation regime \((\alpha _\text {cond},\alpha _\text {sat})\).

1.2 Statistical physics predictions

According to the heuristic analysis by statistical physics methods, the random regular nae-sat model has a single level of replica symmetry breaking (1rsb). We summarize here some of the key phenomena that are predicted from the 1rsb framework [31, 38, 44], referring the reader to [33, Ch. 19] for a full expository account. While much of the following description remains conjectural, the implications at the free energy level are rigorously established by the present paper. Throughout the following we write \(\doteq \) to indicate equality up to subexponential factors (\(\exp \{o(n)\}\)).

Recall that we consider nae-sat with k fixed, parametrized by the clause density \(\alpha \equiv d/k\). Abbreviate \({\texttt {0}}\equiv \textsc {true}\), \({\texttt {1}}\equiv \textsc {false}\). For small \(\alpha \), almost all of the solutions lie in a single well-connected subset of \(\{{\texttt {0}},{\texttt {1}}\}^n\). This holds until a clustering transition (or dynamical transition) \(\alpha _\text {clust}\), above which the solution space becomes broken up into many well-separated pieces, or clusters (see [1, 2, 6, 35]). Informally speaking, clusters are subsets of solutions which are characterized by the property that within-cluster distances are very small relative to between-cluster distances. Conjecturally, \(\alpha _\text {clust}\) also coincides with the reconstruction threshold [28, 31, 39], and is small relative to \(\alpha _\text {sat}\) when k is large, with \(\alpha _\text {clust}/\alpha _\text {sat}\asymp (\ln k)/k\).

For \(\alpha \) above \(\alpha _\text {clust}\) it is expected that the number of clusters of size \(\exp \{ n s\}\) has mean value \(\exp \{ n\Sigma (s;\alpha ) \}\), and is concentrated about this mean. The function \(\Sigma (s)\equiv \Sigma (s;\alpha )\) is referred to as the “cluster complexity.” The 1rsb framework of statistical physics gives an explicit conjecture for \(\Sigma \), discussed below in Sect. 1.3. Then, summing over cluster sizes \(0\leqslant s\leqslant \ln 2\) gives that the total number Z of nae-sat solutions has mean

$$\begin{aligned} \mathbb {E}Z \doteq \sum _s \exp \{ n[s+\Sigma (s)] \} \doteq \exp \{ n[s_1+\Sigma (s_1)] \}, \end{aligned}$$
(4)

where \(s_1={{\,\mathrm{arg\,max}\,}}[s+\Sigma (s)]\). It is expected that \(\Sigma \) is continuous and strictly concave in s, and also that \(s+\Sigma (s)\) has a unique maximizer \(s_1\) with \(\Sigma '(s_1)=-1\). For nae-sat and related models, this explicit calculation reveals a critical value \(\alpha _\text {cond}\in (\alpha _\text {clust},\alpha _\text {sat})\), characterized as

$$\begin{aligned} \alpha _\text {cond}=\sup \{\alpha \geqslant \alpha _\text {clust}: \Sigma (s_1(\alpha );\alpha )\geqslant 0 \}\,. \end{aligned}$$

By contrast, the satisfiability threshold can be characterized as

$$\begin{aligned} \alpha _\text {sat}=\sup \{\alpha \geqslant \alpha _\text {clust}: \max _s\Sigma (s;\alpha ) \geqslant 0\}\,. \end{aligned}$$

For all \(\alpha \geqslant \alpha _\text {clust}\), the expected partition function \(\mathbb {E}Z\) is dominated by clusters of size \(\exp \{n s_1\}\). However, for \(\alpha >\alpha _\text {cond}\), we have \(\Sigma (s_1)<0\), so the expected number of clusters of this size is very small: \(\exp \{n\Sigma (s_1)\}\) tends to zero exponentially fast as \(n\rightarrow \infty \). This means that clusters of size \(\exp \{ns_1\}\) are highly unlikely to appear in a typical realization of the model. Instead, in a typical realization we only expect to see clusters of size \(\exp \{ns\}\) with \(\Sigma (s)\geqslant 0\). As a result the solution space should be dominated (with high probability) by clusters of size \(s_{\max }\) where

$$\begin{aligned}s_{\max } \equiv s_{\max }(\alpha ) \equiv {{\,\mathrm{arg\,max}\,}}\{ s+\Sigma (s) : \Sigma (s)\geqslant 0 \}\,. \end{aligned}$$

Since \(\Sigma \) is continuous, \(s_{\max }\) is the largest root of \(\Sigma \), and for \(\alpha \in (\alpha _\text {cond}, \alpha _\text {sat})\) we should have

$$\begin{aligned}Z\doteq \exp \{n s_{\max }\}\ll \mathbb {E}Z = \exp \{n[s_1+\Sigma (s_1)]\} \end{aligned}$$

(where the approximation for Z holds with high probability). The 1rsb free energy, formally given by Definition 1.3 below, should be interpreted as an expression for the function \({\textsf {f}}^\textsc {1rsb}(\alpha )=s_{\max }(\alpha )\).
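The interplay between \(s_1\), \(s_{\max }\), and the two growth exponents can be illustrated with a toy complexity curve. The quadratic \(\Sigma \) below is hypothetical (not the actual nae-sat complexity), tuned so that \(\Sigma (s_1)<0\), i.e., a condensed regime:

```python
import numpy as np

# Hypothetical concave complexity curve (NOT the nae-sat Sigma),
# tuned so that Sigma(s_1) < 0, i.e. a condensed regime.
c0, s0 = 0.01, 0.40
Sigma = lambda s: c0 - 2 * (s - s0) ** 2

s = np.linspace(0.0, 0.7, 200001)
s1 = s[np.argmax(s + Sigma(s))]      # maximizer of s + Sigma(s): Sigma'(s1) = -1
s_max = s[Sigma(s) >= 0].max()       # largest root of Sigma
print(s1, Sigma(s1))                 # Sigma(s1) < 0 here
print(s_max, s1 + Sigma(s1))         # typical exponent vs annealed exponent
```

In this toy example the annealed exponent \(s_1+\Sigma (s_1)\) strictly exceeds the typical exponent \(s_{\max }\), so \(\mathbb {E}Z\) is exponentially larger than a typical value of \(Z\).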

1.3 The tilted cluster partition function

From the discussion of Sect. 1.2 we see that once the function \(\Sigma (s;\alpha )\) is determined, it is possible to derive \(\alpha _\text {cond}\), \(\alpha _\text {sat}\), and \({\textsf {f}}(\alpha )\). However, previous works have not taken the approach of actually computing \(\Sigma \). Indeed, \(\alpha _\text {sat}\) was determined [25] by an analysis involving only \(\max _s\Sigma (s;\alpha )\), which contains less information than the full curve \(\Sigma \). Related work on the exact determination (in a range of models) of \(\alpha _\text {cond}\) [8, 10, 15] also avoids computing \(\Sigma \), reasoning instead via the so-called “planted model.”

In order to compute the free energy, however, we cannot avoid computing (some version of) the function \(\Sigma \), which we will do by a physics-inspired approach. First consider the \(\lambda \)-tilted partition function

$$\begin{aligned} \bar{\varvec{Z}}_\lambda \equiv \sum _{\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})} |\varvec{\gamma }|^\lambda \,,\end{aligned}$$
(5)

where \(\mathrm {\textsf {{CL}}}(\mathscr {G})\) denotes the set of solution clusters of \(\mathscr {G}\), and \(|\varvec{\gamma }|\) denotes the number of satisfying assignments inside the cluster \(\varvec{\gamma }\). According to the conjectural picture described above, we should have

$$\begin{aligned} \mathbb {E}\bar{\varvec{Z}}_\lambda \doteq \sum _s (\exp \{ns\})^\lambda \exp \{n\Sigma (s)\} \doteq \exp \{n{\mathfrak {F}}(\lambda )\} \end{aligned}$$

where \({\mathfrak {F}}\) is the Legendre dual of \(-\Sigma \):

$$\begin{aligned} {\mathfrak {F}}(\lambda ) \equiv (-\Sigma )^\star (\lambda ) \equiv \max _s \bigg \{ \lambda s + \Sigma (s) \bigg \} = \lambda s_\lambda + \Sigma (s_\lambda )\,,\end{aligned}$$
(6)

where \(s_\lambda \equiv {{\,\mathrm{arg\,max}\,}}_s[\lambda s + \Sigma (s)]\). Moreover, if \(\Sigma (s_\lambda )\geqslant 0\), then \(\bar{\varvec{Z}}_\lambda \) should concentrate near \(\mathbb {E}\bar{\varvec{Z}}_\lambda \), and in this regime physicists have an exact prediction for \({\mathfrak {F}}(\lambda )\), which will be further discussed below in Sect. 1.5. In short, the physics approach to computing \(\Sigma \) is to first compute \({\mathfrak {F}}(\lambda )\) (in the regime where \(\Sigma (s_\lambda )\geqslant 0\)), and then set \(\Sigma = -{\mathfrak {F}}^\star \). Note that by differentiating \({\mathfrak {F}}(\lambda ) = n^{-1}\ln \mathbb {E}\bar{\varvec{Z}}_\lambda \) twice in \(\lambda \) we find that \({\mathfrak {F}}\) is convex, so the resulting \(\Sigma \) will indeed be concave.
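The duality between \(\Sigma \) and \({\mathfrak {F}}\) can be checked numerically on a toy concave curve (again a hypothetical quadratic, not the nae-sat complexity): computing \({\mathfrak {F}}\) by (6) and then inverting via \(\Sigma (s)=\min _\lambda \{{\mathfrak {F}}(\lambda )-\lambda s\}\) recovers \(\Sigma \):

```python
import numpy as np

Sigma = lambda s: 0.05 - (s - 0.3) ** 2   # toy concave curve (hypothetical)

s = np.linspace(-1.0, 1.5, 5001)
lam = np.linspace(-2.0, 2.0, 4001)

# F(lambda) = max_s [lambda*s + Sigma(s)], the Legendre dual of -Sigma, as in (6).
F = np.array([(l * s + Sigma(s)).max() for l in lam])

# Since Sigma is concave, it is recovered as Sigma(s) = min_lambda [F(lambda) - lambda*s].
s_test = 0.45
Sigma_rec = (F - lam * s_test).min()
print(Sigma_rec, Sigma(s_test))   # the two values agree up to discretization
```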

At first glance the reduction to computing \({\mathfrak {F}}(\lambda )\) may not seem to improve matters. It is not immediately clear how “clusters” should be defined. It turns out that in the regime we are studying, a reasonable definition is that two nae-sat solutions are connected if they differ by a single bit, and the clusters are the connected components of the solution space. A typical nae-sat solution will have a positive density of variables which are free, meaning their value can be changed without violating any clause; any such solution must belong to a cluster of exponential size. Each cluster may be a complicated subset of \(\{{\texttt {0}},{\texttt {1}}\}^n\)—changing the value at one free variable may affect whether its neighbors are free, so a cluster need not be a simple subcube of \(\{{\texttt {0}},{\texttt {1}}\}^n\). Nevertheless, we wish to sum over the cluster sizes raised to non-integer powers. This computation is made tractable by constructing more explicit combinatorial models for the nae-sat solution clusters, as we next describe.
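This definition of clusters can be made concrete by brute force on a toy instance. The formula below is hand-picked with \(n=5\) variables and four width-3 clauses (it is not degree-regular, and is far outside the large-\(k\) regime of this paper; it serves only to illustrate the definition):

```python
import itertools

# Hand-picked toy formula (width-3 clauses, NOT degree-regular);
# each clause is a list of (variable, negated) literal pairs.
clauses = [[(0, False), (1, False), (2, True)],
           [(1, True), (2, False), (3, False)],
           [(0, True), (3, True), (4, False)],
           [(2, False), (3, True), (4, True)]]
n = 5

def is_nae(x):
    """True iff no clause evaluates all of its literals equally."""
    return all(len({x[v] ^ neg for v, neg in cl}) == 2 for cl in clauses)

sols = {x for x in itertools.product((0, 1), repeat=n) if is_nae(x)}

# Clusters: connected components of the solution set under single-bit flips.
clusters, seen = [], set()
for x in sols:
    if x in seen:
        continue
    comp, stack = set(), [x]
    while stack:
        y = stack.pop()
        if y in comp:
            continue
        comp.add(y)
        for i in range(n):
            z = y[:i] + (1 - y[i],) + y[i + 1:]
            if z in sols:
                stack.append(z)
    seen |= comp
    clusters.append(comp)

print(len(sols), sorted(len(c) for c in clusters))
```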

1.4 Modeling solution clusters

In our regime of interest (i.e., \(k\geqslant k_0\) and \(\alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}\); see Remark 1.1), the analysis of nae-sat solution clusters is greatly simplified by the fact that in a typical satisfying assignment the vast majority of variables are frozen rather than free. The result of this, roughly speaking, is that a cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) can be encoded by a configuration \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^n\) (representing its circumscribed subcube, so \(x_v={\texttt {f}}\) indicates a free variable) with no essential loss of information. This is formalized by a combinatorial model of “frozen configurations” representing the clusters (Definition 2.2). These frozen configurations can be viewed as the solutions of a certain csp lifted from the original nae-sat problem — so the physics heuristics can be applied again to (the randomized version of) the lifted model. Variations on this idea appear in several places in the physics literature; in the specific context of random csps we refer to [13, 34, 42]. Analyzing the number of frozen configurations, corresponding to (5) with \(\lambda =0\), yields the satisfiability threshold for this model [25].

Analyzing (5) for general \(\lambda \) requires deeper investigation of the arrangement of free variables in a typical frozen configuration \(\underline{{x}}\). For this purpose it is convenient to view an nae-sat instance as a (bipartite) graph \(\mathscr {G}\), where the vertices are given by variables and clauses, and the edges indicate which variables participate in which clauses (the formal description appears in Sect. 2). A key piece of intuition is that if we consider the subgraph of \(\mathscr {G}\) induced by the free variables, together with the clauses through which they interact, then this subgraph is predominantly comprised of disjoint components \(\varvec{T}\) of bounded size. (In fact, the majority of free variables are simply isolated vertices; a smaller fraction occur in linked pairs; a yet smaller fraction occur in components of size three or more.) Each free component \(\varvec{T}\) is surrounded by frozen variables, and we let \(z(\varvec{T})\) be the number of nae-sat assignments on \(\varvec{T}\) which are consistent with the frozen boundary. Since disjoint components \(\varvec{T},\varvec{T}'\) do not interact, the size of the cluster represented by \(\underline{{x}}\) is simply the product of \(z(\varvec{T})\) over all \(\varvec{T}\).

Another key observation is that the random nae-sat graph has few short cycles, so almost all of the free components will be trees. As a result, their weights \(z(\varvec{T})\) can be evaluated recursively by belief propagation (bp), a well-known dynamic programming method (see e.g. [33, Ch. 14]). In the rsb heuristic framework, a cluster is represented by a vector \(\underline{{{\texttt {m}}}}\) of “messages,” indexed by the directed edges of the nae-sat graph \(\mathscr {G}\). Informally, for a given cluster, and for any variable v adjacent to any clause a,

$$\begin{aligned}&{\texttt {m}}_{v\rightarrow a}\text { represents the ``within-cluster law of }{\varvec{x}}_v\text { in absence of }a'';\nonumber \\&{\texttt {m}}_{a\rightarrow v}\text { represents the ``within-cluster law of } {\varvec{x}}_v\text { in absence of }\partial v{\setminus } a'', \end{aligned}$$
(7)

where \(\partial v\) denotes the neighboring clauses of v. Each message is a probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\), and the messages are related to one another via local consistency equations, which are known as the bp equations. A configuration \(\underline{{{\texttt {m}}}}\) which satisfies all the local consistency equations is a bp solution. Thus a cluster \(\varvec{\gamma }\) can be encoded either by a frozen configuration \(\underline{{x}}\) or by a bp solution \(\underline{{{\texttt {m}}}}\); the latter has the key advantage that the size of \(\varvec{\gamma }\) can be easily read off from \(\underline{{{\texttt {m}}}}\), as a certain product of local functions. For the cluster size raised to power \(\lambda \), simply raise each local function to power \(\lambda \). Thus the configurations \(\underline{{{\texttt {m}}}}\) with \(\lambda \)-tilted weights form a spin system (Markov random field), whose partition function is the quantity of interest (5). This new spin system is sometimes termed the “auxiliary model” (e.g. [33, Ch. 19]).

1.5 One-step replica symmetry breaking

In Sect. 1.4 we described informally how a solution cluster \(\varvec{\gamma }\) can be encoded by a frozen configuration \(\underline{{x}}\), or a bp solution \(\underline{{{\texttt {m}}}}\). An important caveat is that the converse need not hold. In the nae-sat model, for any value of \(\alpha \), a trivial bp solution is always given by the “replica symmetric fixed point” (also called the “factorized fixed point”), where every \({\texttt {m}}_{v\rightarrow a}\) is the uniform measure on \(\{{\texttt {0}},{\texttt {1}}\}\). However, above \(\alpha _\text {cond}\), this is a spurious solution. One way to see this is via the heuristic “cavity calculation” of \({\textsf {f}}^\textsc {rs}(\alpha )\), which we now describe to motivate the more complicated expression for \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

Given a random regular nae-sat instance \(\mathscr {G}\) on n variables, choose k variables \(v_1,\ldots ,v_k\) uniformly at random, and assume for simplicity that no two of these share a clause. Remove the k variables along with their kd incident clauses, producing the “cavity graph” \(\mathscr {G}''\). Then add \(d(k-1)\) new clauses to \(\mathscr {G}''\), producing the graph \(\mathscr {G}'\). Under this operation (cf. [7]), \(\mathscr {G}'\) is distributed as a random regular nae-sat instance on \(n-k\) variables. If the free energy \({\textsf {f}}(\alpha )=\lim _{n\rightarrow \infty } Z^{1/n}\) exists, then we would expect it to agree asymptotically with

$$\begin{aligned} \bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}')}\bigg )^{1/k} =\bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}'')}\bigg )^{1/k} \bigg / \bigg (\frac{Z(\mathscr {G}')}{Z(\mathscr {G}'')}\bigg )^{1/k}\,. \end{aligned}$$
(8)

Let U denote the set of “cavity neighbors”: the variables in \(\mathscr {G}''\) of degree \(d-1\), which neighbored the clauses that were deleted from \(\mathscr {G}\). Then \(\mathscr {G}\) and \(\mathscr {G}'\) differ from \(\mathscr {G}''\) only in the addition of a few small subgraphs which are attached to U. Computing \(Z(\mathscr {G})/Z(\mathscr {G}'')\) or \(Z(\mathscr {G}')/Z(\mathscr {G}'')\) reduces to understanding the joint law of the spins \(({\varvec{x}}_u)_{u\in U}\) under the nae-sat model defined by \(\mathscr {G}''\). Since \(\mathscr {G}\) is unlikely to have many cycles, the vertices of U are typically far apart from one another in \(\mathscr {G}''\). Therefore, one plausible scenario is that their spins are approximately independent under the nae-sat model on \(\mathscr {G}''\), with \({\varvec{x}}_u\) marginally distributed according to \({\texttt {m}}_{u\rightarrow a}\) where a is the deleted clause that neighbored u in \(\mathscr {G}\). If this is the case, then each \({\texttt {m}}_{u\rightarrow a}\) must be uniform over \(\{{\texttt {0}},{\texttt {1}}\}\), by the negation symmetry of nae-sat. Under this assumption, we can calculate

$$\begin{aligned} \bigg (\frac{Z(\mathscr {G})}{Z(\mathscr {G}'')}\bigg )^{1/k} = 2(1-2/2^k)^d,\quad \bigg (\frac{Z(\mathscr {G}')}{Z(\mathscr {G}'')}\bigg )^{1/k} = (1-2/2^k)^{\alpha (k-1)}\,. \end{aligned}$$
(9)

Substituting into (8) gives the replica symmetric free energy prediction \({\textsf {f}}(\alpha )\doteq {\textsf {f}}^\textsc {rs}(\alpha )\), which we know to be false for large \(\alpha \) (in particular, it is inconsistent with the known satisfiability threshold). Thus the replica symmetric fixed point, \({\texttt {m}}_{u\rightarrow a}= \text {unif}(\{{\texttt {0}},{\texttt {1}}\})\) for all \(u\rightarrow a\), is a spurious bp solution. In reality the \({\varvec{x}}_u\) are not approximately independent in \(\mathscr {G}''\), even though the u’s are far apart. This phenomenon of non-negligible long-range dependence may be taken as a definition of replica symmetry breaking (rsb) in this setting, and occurs precisely for \(\alpha \) larger than \(\alpha _\text {cond}\).
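The algebra behind this substitution is a one-line check: with \(d=\alpha k\), dividing the two ratios in (9) leaves \(2(1-2/2^k)^{d-\alpha (k-1)}=2(1-2/2^k)^{\alpha }\), which is exactly (2). A quick numerical confirmation (toy values of k and d chosen arbitrarily):

```python
# Check that substituting (9) into (8) recovers the RS free energy (2):
# 2(1-2/2^k)^d / (1-2/2^k)^{alpha(k-1)} = 2(1-2/2^k)^alpha  when d = alpha*k.
def cavity_ratio(k, d):
    alpha = d / k
    p = 1 - 2 / 2 ** k
    return (2 * p ** d) / p ** (alpha * (k - 1))

def f_rs(k, d):
    return 2 * (1 - 2 / 2 ** k) ** (d / k)

for k, d in [(3, 6), (4, 10), (10, 300)]:
    print(k, d, abs(cavity_ratio(k, d) - f_rs(k, d)) < 1e-12)
```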

Since above \(\alpha _\text {cond}\) the partition function cannot be estimated by (9) due to replica symmetry breaking, a different approach is needed. To this end, the one-step rsb (1rsb) heuristic posits that even when the original nae-sat model exhibits rsb, the (seemingly more complicated) “auxiliary model” of \(\lambda \)-weighted bp solutions \(\underline{{{\texttt {m}}}}\) is replica symmetric, for \(\lambda \) small enough: conjecturally, as long as \(\Sigma (s_\lambda )\geqslant 0\) for \(s_\lambda \equiv {{\,\mathrm{arg\,max}\,}}_s \{ \lambda s+\Sigma (s)\}\) (cf. the discussion below (6)). That is, for such \(\lambda \), the auxiliary model is predicted to have correlation decay, in contrast with the long-range correlations of the original model. This would mean that in the auxiliary model of the cavity graph \(\mathscr {G}''\), the spins \(({\texttt {m}}_{u\rightarrow a})_{u\in U}\) are approximately independent, with each \({\texttt {m}}_{u\rightarrow a}\) marginally distributed according to some law \({\dot{q}}_{u\rightarrow a}\). The model has a replica symmetric fixed point, \({\dot{q}}_{u\rightarrow a}={\dot{q}}_\lambda \) for all \(u\rightarrow a\) (the analogue of \({\texttt {m}}_{u\rightarrow a}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\) for all \(u\rightarrow a\)). If we substitute this assumption into the cavity calculation (the analogues of (8) and (9)), we obtain the replica symmetric prediction for the auxiliary model free energy \({\mathfrak {F}}(\lambda )\), expressed as a function of \({\dot{q}}_\lambda \). As explained above, from \({\mathfrak {F}}(\lambda )\) we can derive the complexity function \(\Sigma (s)\) and the 1rsb nae-sat free energy \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

1.6 The 1RSB free energy prediction

Having described the heuristic reasoning, we now proceed to formally state the 1rsb free energy prediction. We first describe \({\dot{q}}_\lambda \) as a certain discrete probability measure over \({\texttt {m}}\). Since \({\texttt {m}}\) is a probability measure over \(\{{\texttt {0}},{\texttt {1}}\}\), we encode it by \(x\equiv {\texttt {m}}({\texttt {1}})\in [0,1]\). A measure q on \({\texttt {m}}\) can thus be encoded by an element \(\mu \in {\mathscr {P}}\) where \({\mathscr {P}}\) denotes the set of discrete probability measures on [0, 1]. For measurable \(B\subseteq [0,1]\), define

$$\begin{aligned} \hat{\mathscr {R}}_\lambda \mu (B)&\equiv \frac{1}{\hat{{\mathscr {Z}}}(\mu )} \int \bigg (2-\prod _{i=1}^{k-1}x_{i}-\prod _{i=1}^{k-1}(1-x_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \frac{1-\prod _{i=1}^{k-1}x_{i}}{2- \prod _{i=1}^{k-1}x_{i} - \prod _{i=1}^{k-1}(1-x_{i})} \in B\bigg \} \, \prod _{i=1}^{k-1}{\mu }(dx_{i})\,,\nonumber \\ \dot{\mathscr {R}}_\lambda \mu (B)&\equiv \frac{1}{\dot{{\mathscr {Z}}}(\mu )} \int \bigg (\prod _{i=1}^{d-1}y_{i}+\prod _{i=1}^{d-1}(1-y_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \frac{\prod _{i=1}^{d-1}y_{i}}{\prod _{i=1}^{d-1}y_{i}+\prod _{i=1}^{d-1}(1-y_{i})} \in B \bigg \} \, \prod _{i=1}^{d-1}\mu (dy_{i})\,, \end{aligned}$$
(10)

where \(\hat{{\mathscr {Z}}}(\mu )\) and \(\dot{{\mathscr {Z}}}(\mu )\) are the normalizing constants such that \(\hat{\mathscr {R}}_\lambda \mu \) and \(\dot{\mathscr {R}}_\lambda \mu \) are also probability measures on [0, 1]. (In the context of \(\lambda =0\) we take the convention that \(0^0=0\).) Denote \(\mathscr {R}_{\lambda } \equiv \dot{\mathscr {R}}_{\lambda }\circ \hat{\mathscr {R}}_{\lambda }\). The map \(\mathscr {R}_{\lambda }:{\mathscr {P}}\rightarrow {\mathscr {P}}\) represents the bp recursion for the auxiliary model. The following proposition presents a solution for \(\alpha \) in the interval \((\alpha _\text {lbd},\alpha _\text {ubd})\), which we recall (Remark 1.1) is a superset of \((\alpha _\text {cond},\alpha _\text {sat})\).

Proposition 1.2

(proved in “Appendix B” ) For \(\lambda \in [0,1]\), let \(\dot{\mu }_{\lambda ,0}\equiv \frac{1}{2} \delta _0 + \frac{1}{2} \delta _1\in {\mathscr {P}}\), and define recursively \(\dot{\mu }_{\lambda ,l+1} = \mathscr {R}_\lambda \dot{\mu }_{\lambda ,l}\in {\mathscr {P}}\) for all \(l\geqslant 0\). Define \(S_l \equiv ({{\,\mathrm{supp}\,}}\dot{\mu }_{\lambda ,l}) {\setminus } ({{\,\mathrm{supp}\,}}( \dot{\mu }_{\lambda ,0} +\ldots +\dot{\mu }_{\lambda ,l-1} ))\); this is a finite subset of [0, 1]. Regard \(\dot{\mu }_{\lambda ,l}\) as an infinite sequence indexed by the elements of \(S_1\) in increasing order, followed by the elements of \(S_2\) in increasing order, and so on. For \(k\geqslant k_0\) and \(\alpha _\text {lbd}\leqslant \alpha \leqslant \alpha _\text {ubd}\), in the limit \(l\rightarrow \infty \), \(\dot{\mu }_{\lambda ,l}\) converges in the \(\ell ^1\) sequence space to a limit \(\dot{\mu }_\lambda \in {\mathscr {P}}\) satisfying the fixed point equation \(\dot{\mu }_\lambda = \mathscr {R}_\lambda \dot{\mu }_\lambda \), as well as the estimates \(\dot{\mu }_\lambda ((0,1))\leqslant 7/2^k\) and \(\dot{\mu }_\lambda (dx) = \dot{\mu }_\lambda (d(1-x))\).
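Since \({\mathscr {P}}\) consists of discrete measures, the recursion (10) can be iterated exactly by enumerating tuples of atoms. The following is a minimal Python sketch (ours, not from the paper), encoding a measure as a dict from atoms in [0, 1] to masses; the choice \(k=d=3\), \(\lambda =1\) is purely illustrative and far below the \(k_0\) assumed in Proposition 1.2.

```python
from itertools import product
from math import prod

def hat_step(mu, k, lam):
    # One application of the clause map in (10): enumerate (k-1)-tuples of
    # atoms of mu, reweight by the lambda-th power, and push forward.
    out = {}
    for xs in product(mu, repeat=k - 1):
        w = 2 - prod(xs) - prod(1 - x for x in xs)
        if w <= 0:  # zero-weight tuples are dropped (convention 0^0 = 0)
            continue
        atom = (1 - prod(xs)) / w
        out[atom] = out.get(atom, 0.0) + w ** lam * prod(mu[x] for x in xs)
    Z = sum(out.values())
    return {a: m / Z for a, m in out.items()}, Z

def dot_step(mu, d, lam):
    # One application of the variable map in (10), on (d-1)-tuples.
    out = {}
    for ys in product(mu, repeat=d - 1):
        w = prod(ys) + prod(1 - y for y in ys)
        if w <= 0:  # zero-weight tuples are excluded (and would give 0/0)
            continue
        atom = prod(ys) / w
        out[atom] = out.get(atom, 0.0) + w ** lam * prod(mu[y] for y in ys)
    Z = sum(out.values())
    return {a: m / Z for a, m in out.items()}, Z

# R_lambda = dot_step o hat_step, iterated from (delta_0 + delta_1)/2
# as in Proposition 1.2.
k, d, lam = 3, 3, 1.0
mu = {0.0: 0.5, 1.0: 0.5}
for _ in range(5):
    hat_mu, _ = hat_step(mu, k, lam)
    mu, _ = dot_step(hat_mu, d, lam)
```

Each iterate remains symmetric under \(x\mapsto 1-x\), in line with the symmetry \(\dot{\mu }_\lambda (dx) = \dot{\mu }_\lambda (d(1-x))\) asserted in Proposition 1.2.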

The limit \(\dot{\mu }_\lambda \) of Proposition 1.2 encodes the desired replica symmetric solution \({\dot{q}}_\lambda \) for the auxiliary model. We can then express \({\mathfrak {F}}(\lambda )\) in terms of \(\dot{\mu }_\lambda \) as follows. Writing \(\hat{\mu }_\lambda \equiv \hat{{\mathscr {R}}}_\lambda \dot{\mu }_\lambda \), let \({\dot{w}}_\lambda ,{\hat{w}}_\lambda ,{\bar{w}}_\lambda \in {\mathscr {P}}\) be defined by

$$\begin{aligned} {\dot{w}}_{\lambda }(B)&= ({\dot{\mathfrak {Z}}}_{\lambda })^{-1} \int \bigg (\prod _{i=1}^{d}y_{i}+\prod _{i=1}^{d}(1-y_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ \prod _{i=1}^{d}y_{i} +\prod _{i=1}^{d}(1-y_{i})\in B \bigg \} \prod _{i=1}^{d}\hat{\mu }_{\lambda }(dy_{i})\,,\nonumber \\ {\hat{w}}_{\lambda }(B)&= ({\hat{\mathfrak {Z}}}_{\lambda })^{-1} \int \bigg (1-\prod _{i=1}^{k}x_{i}-\prod _{i=1}^{k}(1-x_{i})\bigg )^{\lambda } {\mathbf {1}}\bigg \{ 1-\prod _{i=1}^{k}x_{i}-\prod _{i=1}^{k}(1-x_{i})\in B \bigg \} \prod _{i=1}^{k}\dot{\mu }_\lambda (dx_{i})\,,\nonumber \\ {\bar{w}}_{\lambda }(B)&= ({\bar{\mathfrak {Z}}}_{\lambda })^{-1} \iint \bigg (xy+(1-x)(1-y)\bigg )^{\lambda } {\mathbf {1}}\Big \{ xy+(1-x)(1-y)\in B \Big \} \dot{\mu }_\lambda (dx)\hat{\mu }_{\lambda }(dy) \,, \end{aligned}$$
(11)

with \(\dot{\mathfrak {Z}}_{\lambda },\hat{\mathfrak {Z}}_{\lambda },\bar{\mathfrak {Z}}_{\lambda }\) the normalizing constants. The analogue of (9) for this model is

$$\begin{aligned}\bigg (\frac{\bar{\varvec{Z}}_\lambda (\mathscr {G})}{\bar{\varvec{Z}}_\lambda (\mathscr {G}'')}\bigg )^{1/k} =\dot{\mathfrak {Z}}_\lambda (\hat{\mathfrak {Z}}_\lambda /\bar{\mathfrak {Z}}_\lambda )^d,\quad \bigg (\frac{\bar{\varvec{Z}}_\lambda (\mathscr {G}')}{\bar{\varvec{Z}}_\lambda (\mathscr {G}'')}\bigg )^{1/k} = (\hat{\mathfrak {Z}}_\lambda )^{\alpha (k-1)}, \end{aligned}$$

and substituting into (8) gives the 1rsb prediction \(\bar{\varvec{Z}}_\lambda \doteq \exp \{{\mathfrak {F}}(\lambda )\}\) where

$$\begin{aligned} {\mathfrak {F}}(\lambda )\equiv {\mathfrak {F}}(\lambda ;\alpha ) \equiv \ln \dot{\mathfrak {Z}}_{\lambda } + \alpha \ln \hat{\mathfrak {Z}}_{\lambda } - k\alpha \ln \bar{\mathfrak {Z}}_{\lambda }\,. \end{aligned}$$
(12)

Further, the maximizer of \(s\mapsto (\lambda s+\Sigma (s))\) is predicted to be given by

$$\begin{aligned} s_\lambda \equiv s_\lambda (\alpha )\equiv \int \ln (x) {\dot{w}}_\lambda (dx) + \alpha \int \ln (x) {\hat{w}}_\lambda (dx) - k\alpha \int \ln (x) {\bar{w}}_\lambda (dx)\,. \end{aligned}$$
(13)

If \(s=s_\lambda \) for \(\lambda \in [0,1]\) then we define

$$\begin{aligned} \Sigma (s) \equiv \Sigma (s;\alpha ) \equiv {\mathfrak {F}}(\lambda ;\alpha ) -\lambda s_\lambda (\alpha )\,. \end{aligned}$$
(14)
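For discrete \(\dot{\mu }_\lambda ,\hat{\mu }_\lambda \) (encoded as dicts of atoms), the normalizers in (11) and the quantities (12)–(14) are finite sums. A hedged numerical sketch, with function name and encoding our own:

```python
from itertools import product
from math import prod, log

def free_energy_terms(dot_mu, hat_mu, d, k, lam):
    # Evaluate the normalizers of (11) together with the integrals
    # int log(x) dw(x) appearing in (13); returns (F, s, F - lam*s).
    def aggregate(tuples, value):
        Z = S = 0.0
        for t, p in tuples:
            v = value(t)
            if v <= 0:  # zero-weight terms are excluded (0^0 = 0)
                continue
            Z += p * v ** lam
            S += p * v ** lam * log(v)
        return Z, S / Z
    dot_tuples = [(t, prod(hat_mu[y] for y in t))
                  for t in product(hat_mu, repeat=d)]
    Zdot, Ldot = aggregate(dot_tuples,
                           lambda ys: prod(ys) + prod(1 - y for y in ys))
    hat_tuples = [(t, prod(dot_mu[x] for x in t))
                  for t in product(dot_mu, repeat=k)]
    Zhat, Lhat = aggregate(hat_tuples,
                           lambda xs: 1 - prod(xs) - prod(1 - x for x in xs))
    bar_tuples = [((x, y), dot_mu[x] * hat_mu[y])
                  for x in dot_mu for y in hat_mu]
    Zbar, Lbar = aggregate(bar_tuples,
                           lambda t: t[0] * t[1] + (1 - t[0]) * (1 - t[1]))
    alpha = d / k
    F = log(Zdot) + alpha * log(Zhat) - k * alpha * log(Zbar)  # (12)
    s = Ldot + alpha * Lhat - k * alpha * Lbar                 # (13)
    return F, s, F - lam * s                                   # (14)
```

As a consistency check, at the replica symmetric point \(\dot{\mu }=\hat{\mu }=\delta _{1/2}\) (the encoding of \({\texttt {m}}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\), which one can check is a fixed point of \(\mathscr {R}_\lambda \)), the formulas give \({\mathfrak {F}}(\lambda )=\lambda \ln {\textsf {f}}^\textsc {rs}\) and \(s_\lambda =\ln {\textsf {f}}^\textsc {rs}\), hence \(\Sigma =0\).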

We then use (14) to define the thresholds

$$\begin{aligned} \alpha _\text {cond}&\equiv \sup \{\alpha : \Sigma (s_1;\alpha )>0\}\,,\\ \alpha _\text {sat}&\equiv \sup \{\alpha : \Sigma (s_0;\alpha )>0\}\,. \end{aligned}$$

We can now formally state the predicted free energy of the original nae-sat model:

Definition 1.3

For \(\alpha \in k^{-1}{\mathbb {Z}}\), the 1rsb free energy prediction \({\textsf {f}}^\textsc {1rsb}(\alpha )\) is defined as

$$\begin{aligned} {\textsf {f}}^\textsc {1rsb}(\alpha ) = {\left\{ \begin{array}{ll} {\textsf {f}}^\textsc {rs}(\alpha ) =2(1-2/2^k)^\alpha &{} \text {for }\alpha \leqslant \alpha _\text {cond},\\ \exp [\sup \{ s : \Sigma (s) \geqslant 0\}] &{} \text {for }\alpha _\text {cond}\leqslant \alpha < \alpha _\text {sat},\\ 0 &{}\text {for }\alpha > \alpha _\text {sat}. \end{array}\right. } \end{aligned}$$
(15)

(In regular k-nae-sat we must have integer \(d=k\alpha \), so we need not consider \(\alpha \notin k^{-1}{\mathbb {Z}}\).)

Proposition 1.4

(proved in “Appendix B” ) Assume \(k\geqslant k_0\) and write \(A\equiv [\alpha _\text {lbd}, \alpha _\text {ubd}] \cap (k^{-1}{\mathbb {Z}})\).

  a.

    For each \(\alpha \in A\), the function \(s\mapsto \Sigma (s;\alpha )\) is well-defined, continuous, and strictly decreasing in s.

  b.

    For each \(0\leqslant \lambda \leqslant 1\), the function \(\alpha \mapsto \Sigma (s_\lambda ;\alpha )= {\mathfrak {F}}(\lambda ) -\lambda s_\lambda \) is strictly decreasing with respect to \(\alpha \in A\). There is a unique \(\alpha _\lambda \in A\) such that \(\Sigma (s_\lambda ;\alpha )\) is nonnegative for all \(\alpha \leqslant \alpha _\lambda \), and is negative for all \(\alpha >\alpha _\lambda \). Taking \(\lambda =0\) we recover the estimate (1); and taking \(\lambda =1\) we obtain in addition

    $$\begin{aligned} \alpha _\text {cond}=\alpha _1 = (2^{k-1}-1)\ln 2+ \epsilon _k\,.\end{aligned}$$
    (16)

(The main purpose of this proposition is to show that \(\Sigma (s_1)<0\) for all \(\alpha \in (\alpha _\text {cond},\alpha _\text {sat})\), i.e., that the “condensation regime” is a contiguous range of values of \(\alpha \). The expansion of \(\alpha _{\text {cond}}\) matches an earlier result of [20], which was obtained for a slightly different but closely related model.)

1.7 Proof approach

Since \({\textsf {f}}={\textsf {f}}(\alpha )\) is a priori not well-defined, the statement \({\textsf {f}}\leqslant \textsf {g}\) means formally that for all \(\epsilon >0\), \(\mathbb {P}( Z^{1/n} \geqslant \textsf {g}+\epsilon )\) tends to zero as \(n\rightarrow \infty \). With this notation, we will prove separately the upper bound \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) and the matching lower bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\). This implies the main result Theorem 1: the free energy \({\textsf {f}}(\alpha )\) is indeed well-defined, and equals \({\textsf {f}}^\textsc {1rsb}(\alpha )\).

The upper bound is proved by an interpolation argument, which we defer to “Appendix E”. This argument builds on similar bounds for spin glasses on Erdős–Rényi graphs [26, 43], together with ideas from [12, 27] for interpolation in random regular models. Let \(Z_n(\beta )\) denote the partition function of nae-sat at inverse temperature \(\beta >0\). The interpolation method yields an upper bound on \(\mathbb {E}\ln Z_n(\beta )\) which is expressed as the infimum of a certain function \({\mathcal {P}}(\mu ;\beta )\), with \(\mu \) ranging over probability measures on [0, 1]. We then choose \(\mu \) according to Proposition 1.2, and take \(\beta \rightarrow \infty \) to obtain the desired bound \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\).

Most of the paper is devoted to establishing the matching lower bound. The proof strategy is inspired by the physics picture described above, and at a high level proceeds as follows. Take any \(\lambda \) such that \(\Sigma (s_\lambda )\) (as defined by (13) and (14)) is nonnegative, and let \(\varvec{Y}_\lambda \) be the number of clusters of size roughly \(\exp \{ns_\lambda \}\). (As discussed in §1.3, we shall think of a cluster as a connected component of the solution space.) The informal statement of what we show is that

$$\begin{aligned} \varvec{Y}_\lambda \doteq \exp \{ n[ \lambda s_\lambda + \Sigma (s_\lambda ) ] \}\,.\end{aligned}$$
(17)

Adjusting \(\lambda \) as indicated by (15) then proves the desired bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\).

Proving a formalized version of (17) occupies a significant part of the present paper. We introduce a slightly modified version of the messages \({\texttt {m}}\) which record the topologies of the free trees \(\varvec{T}\). We then restrict to cluster encodings in which every free tree has fewer than T variables, which limits the distance that information can propagate between free variables. We prove a version of (17) for every fixed T, and show that this yields the sharp lower bound in the limit \(T\rightarrow \infty \). The proof of (17) for fixed T is via the moment method for the auxiliary model, which boils down to a complicated optimization problem over many dimensions. It is known (see e.g. [25, Lem. 3.6]) that stationary points of the optimization problem correspond to “generalized” bp fixed points—these are measures \(Q_{v\rightarrow a}({\texttt {m}}_{v\rightarrow a},{\texttt {m}}_{a\rightarrow v})\), rather than the simpler “one-sided” measures \(q_{v\rightarrow a}({\texttt {m}}_{v\rightarrow a})\) considered in the 1rsb heuristic.

The one-sided property is a crucial simplification in physics calculations (cf. [33, Proposition 19.4]), but is challenging to prove in general. One contribution of this work that we wish to highlight is a novel resampling argument which yields a reduction to one-sided messages, and allows us to solve the moment optimization problem. (We are helped here by the truncation on the sizes of free trees.) Furthermore, the approach allows us to bring in methods from large deviations theory. With these we can show that the objective function has negative-definite Hessian at the optimizer, which is necessary for the second moment method. This resampling approach is quite general and should apply in a broad range of models.

1.8 Open problems

Beyond the free energy, it remains a challenge to establish the full picture predicted by statistical physicists for \(\alpha \leqslant \alpha _\text {sat}\). We refer the reader to several recent works targeted at a broad class of models in the regime \(\alpha \leqslant \alpha _\text {cond}\) [9, 16, 19], and to work on the location of \(\alpha _\text {cond}\) in a general family of models [14]. In the condensation regime \((\alpha _\text {cond},\alpha _\text {sat})\), an initial step would be to show that most solutions lie within a bounded number of clusters. A much more refined prediction is that the mass distribution among the largest clusters forms a Poisson–Dirichlet process. Another question is to show that on a typical problem instance over n variables, if \(\underline{{{\varvec{x}}}}^1,\underline{{{\varvec{x}}}}^2\) are sampled independently and uniformly at random from the solutions of that instance, then the normalized overlap \(R_{1,2}\equiv n^{-1}|\{v:{\varvec{x}}^1_v={\varvec{x}}^2_v\}|\) concentrates on two values (corresponding roughly to the two cases that \(\underline{{{\varvec{x}}}}^1,\underline{{{\varvec{x}}}}^2\) come from the same cluster, or from different clusters)—this criterion is sometimes taken as the precise definition of 1rsb. During the final revision stage of this paper, some of the above questions were addressed by a new preprint [40].

Beyond the immediate context of random csps, understanding the condensation transition may deepen our understanding of the stochastic block model, a model for random networks with underlying community structure. Here again ideas from statistical physics have played an important role [21]. A great deal is now known rigorously for the case of two blocks [32, 37], where there is no condensation regime. For models with more than two blocks, however, it is predicted that condensation can occur, and may define a regime where detection is information-theoretically possible but computationally intractable. A condensation threshold has been established for the anti-ferromagnetic Potts model, corresponding to the disassortative regime of the stochastic block model. An analogous transition is expected in the ferromagnetic (assortative) case, and this remains open.

2 Combinatorial model

In this section we formalize a combinatorial model of clusters, which allows us to rigorously lower bound the tilted cluster partition function (5). We begin by reviewing the (standard) graphical depiction of nae-sat. A not-all-equal-sat (nae-sat) problem instance is naturally represented by a bipartite factor graph \(\mathscr {G}\) with signed edges, as follows. The vertex set of \(\mathscr {G}\) is partitioned into a set \(V=\{v_1,\ldots ,v_n\}\) of variables and a set \(F=\{a_1,\ldots ,a_m\}\) of clauses; we then have a set E of edges joining variables to clauses. For each edge \(e\in E\) we write v(e) for the incident variable, and a(e) for the incident clause; and we assign an edge literal \({\texttt {L}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) to indicate whether v(e) participates affirmatively (\({\texttt {L}}_e={\texttt {0}}\)) or negatively (\({\texttt {L}}_e={\texttt {1}}\)) in a(e). We define all edges to have length one-half, so two variables \(v\ne v'\) lie at unit distance if and only if they appear in the same clause. Throughout this paper we denote \(\mathcal {G}\equiv (V,F,E)\) for the bipartite graph without edge literals, and \(\mathscr {G}\equiv (V,F,E,\underline{{\texttt {L}}})\equiv (\mathcal {G},\underline{{\texttt {L}}})\) for the nae-sat instance.

Formally we regard the edges E as a permutation, as follows. Each variable \(v\in V\) has incident half-edges \(\delta v\), while each clause \(a\in F\) has incident half-edges \(\delta a\). Write \(\delta V\) for the labelled set of all variable-incident half-edges, and \(\delta F\) for the labelled set of all clause-incident half-edges; we require that \(|\delta V|=|\delta F|=\ell \). Then any permutation \({\mathfrak {m}}\) of \([\ell ]\equiv \{1,\ldots ,\ell \}\) defines E by specifying a matching between \(\delta V\) and \(\delta F\). Note that any permutation of \([\ell ]\) is permitted, so multi-edges can occur. In this paper we assume that the graph is \((d,k)\)-regular: each variable has d incident edges, and each clause has k incident edges, so \(|E|=nd=mk\). A random k-nae-sat instance is given by \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) where \(|V|=n\), \(|F|=m\), E is given by a uniformly random permutation \({\mathfrak {m}}\) of [nd], and \(\underline{{\texttt {L}}}\) is a uniformly random sample from \(\{{\texttt {0}},{\texttt {1}}\}^E\). We write \(\mathbb {P}\) and \(\mathbb {E}\) for probability and expectation over the law of \(\mathscr {G}\).
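The half-edge matching just described is straightforward to simulate. A minimal sketch (our own convention: a clause is a list of (variable, literal) pairs):

```python
import random
from collections import Counter

def random_regular_nae_sat(n, d, k, rng=None):
    # Match the n*d variable half-edges to the m*k clause half-edges via
    # a uniform permutation (multi-edges allowed), then attach i.i.d.
    # uniform edge literals; returns m clauses of (variable, literal).
    rng = rng or random.Random()
    assert (n * d) % k == 0
    m = n * d // k
    half = [v for v in range(n) for _ in range(d)]  # owner of each half-edge
    rng.shuffle(half)
    return [[(half[a * k + j], rng.randrange(2)) for j in range(k)]
            for a in range(m)]

clauses = random_regular_nae_sat(n=6, d=3, k=3, rng=random.Random(0))
degrees = Counter(v for cl in clauses for v, L in cl)  # every count is d
```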

Definition 2.1

(solutions and clusters) A solution of the nae-sat problem instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) is any assignment \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) such that for all \(a\in F\), \(({\texttt {L}}_e \oplus {\varvec{x}}_{v(e)})_{e\in \delta a}\) is neither identically \({\texttt {0}}\) nor identically \({\texttt {1}}\). Let \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\subseteq \{{\texttt {0}},{\texttt {1}}\}^V\) denote the set of all solutions of \(\mathscr {G}\), and define a graph on \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\) by connecting any pair of solutions at unit Hamming distance. The (maximal) connected components of this graph are the solution clusters; the set of clusters is hereafter denoted \(\mathrm {\textsf {{CL}}}(\mathscr {G})\).
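On tiny instances, Definition 2.1 can be checked by exhaustive enumeration. A brute-force sketch (ours; exponential in n, illustration only, with clauses encoded as lists of (variable, literal) pairs):

```python
from itertools import product

def nae_solutions(n, clauses):
    # All x in {0,1}^V such that no clause evaluates all-equal: the
    # values L xor x_v within each clause must not be constant.
    return [x for x in product((0, 1), repeat=n)
            if all(len({L ^ x[v] for v, L in cl}) == 2 for cl in clauses)]

def num_clusters(sols):
    # Connected components of the graph on SOL(G) joining solutions at
    # unit Hamming distance (Definition 2.1), by depth-first search.
    todo, comps = set(sols), 0
    while todo:
        comps, stack = comps + 1, [todo.pop()]
        while stack:
            x = stack.pop()
            for i in range(len(x)):
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                if y in todo:
                    todo.remove(y)
                    stack.append(y)
    return comps
```

For a single clause on three variables with all literals \({\texttt {0}}\), the six non-constant assignments form one cluster; a doubled binary clause yields two singleton clusters.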

The aim of this section is to establish that (under a certain restriction) the nae-sat solution clusters can be represented by a combinatorial model of what we will term “colorings.” We will describe the correspondence in a few stages. Informally, the progression is given by

$$\begin{aligned}&\textsc {nae-sat}\, \text {solution clusters }\varvec{\gamma }\leftrightarrow \text {frozen configurations }\underline{{x}} \nonumber \\&\quad \leftrightarrow \text {warning configurations }\underline{{y}} \leftrightarrow \text {message configurations }\underline{{\tau }}\leftrightarrow \text {colorings }\underline{{\sigma }}. \end{aligned}$$
(18)

Each step of (18) is formalized below. As mentioned previously, the key feature of the last model is that the size of a cluster \(\varvec{\gamma }\) can be easily read off from its corresponding coloring \(\underline{{\sigma }}\), as a product of local functions. Some steps of the correspondence (18) appear in existing literature (see [13, 25, 33, 34, 42]) but we present them here in detail for completeness.

2.1 Frozen and warning configurations

We introduce a new value \({\texttt {f}}\) (free), and adopt the convention \({\texttt {0}}\oplus {\texttt {f}}\equiv {\texttt {f}}\equiv {\texttt {1}}\oplus {\texttt {f}}\). For \(l\geqslant 1\) and \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\) let \(I^\textsc {nae}(\underline{{x}})\) be the indicator that \(\underline{{x}}\) is neither identically \({\texttt {0}}\) nor identically \({\texttt {1}}\). Given an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) and an assignment \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\), denote

$$\begin{aligned} I^\textsc {nae}(\underline{{x}};\mathscr {G}) \equiv \prod _{a\in F} I^\textsc {nae}( ({\texttt {L}}_e\oplus x_{v(e)})_{e\in \delta a} )\,. \end{aligned}$$

By Definition 2.1, an nae-sat solution is an assignment \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) satisfying \(I^\textsc {nae}(\underline{{{\varvec{x}}}};\mathscr {G})=1\).

Definition 2.2

(frozen configurations) Given an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), for any \(e\in E\) let \(\mathscr {G}\oplus {\texttt {1}}_e\) denote the instance obtained by flipping the edge label \({\texttt {L}}_e\) to \({\texttt {L}}_e\oplus {\texttt {1}}\). We say that \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) is a valid frozen configuration on \(\mathscr {G}\) if (i) no nae-sat constraint is violated, meaning \(I^\textsc {nae}(\underline{{x}};\mathscr {G})=1\); and (ii) for all \(v\in V\), \(x_v\) takes a value in \(\{{\texttt {0}},{\texttt {1}}\}\) only when forced to do so, meaning there is some \(e\in \delta v\) such that

$$\begin{aligned} I^\textsc {nae}(\underline{{x}};\mathscr {G}\oplus {\texttt {1}}_e)=0\,. \end{aligned}$$
(19)

If no such \(e\in \delta v\) exists then \(x_v={\texttt {f}}\).
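Condition (19) also drives the standard "coarsening" procedure recalled in the next paragraph: every variable with no edge satisfying (19) is freed, and this is iterated to a fixed point. A brute-force sketch (our own encoding: clauses as (variable, literal) pairs, with 'f' for free):

```python
def coarsen(x, clauses):
    # Whitening: repeatedly set x_v = 'f' for every variable that is not
    # forced, i.e. for which no clause would become all-equal if the
    # literal on one of v's edges were flipped (condition (19)).
    x = list(x)
    changed = True
    while changed:
        changed = False
        for v in range(len(x)):
            if x[v] == 'f':
                continue
            forced = False
            for cl in clauses:
                for j, (u, L) in enumerate(cl):
                    if u != v:
                        continue
                    # After flipping L, v's entry is 1 ^ L ^ x[v]; the
                    # clause is violated iff all other entries match it
                    # (an 'f' entry never matches).
                    target = 1 ^ L ^ x[v]
                    if all(x[w] != 'f' and (L2 ^ x[w]) == target
                           for i, (w, L2) in enumerate(cl) if i != j):
                        forced = True
            if not forced:
                x[v] = 'f'
                changed = True
    return tuple(x)
```

Freeing a variable can only remove forcings elsewhere, so the loop reaches a fixed point, which is then a valid frozen configuration.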

It is well known that on any given \(\mathscr {G}\), every nae-sat solution \(\underline{{{\varvec{x}}}}\) can be mapped to a frozen configuration \(\underline{{x}}=\underline{{x}}(\underline{{{\varvec{x}}}})\) via a “coarsening” or “whitening” procedure [42], as follows. Initialize \(\underline{{x}}=\underline{{{\varvec{x}}}}\). Then, whenever \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\) but there exists no \(e\in \delta v\) such that (19) holds, update \(x_v\) to \({\texttt {f}}\). Iterate until no further updates can be made; the result is then a valid frozen configuration. Two nae-sat solutions \(\underline{{{\varvec{x}}}}\), \(\underline{{{\varvec{x}}}}'\) map to the same frozen configuration \(\underline{{x}}\) if and only if they lie in the same cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\). Thus, for any given \(\mathscr {G}\), we have a well-defined mapping from clusters \(\varvec{\gamma }\) to frozen configurations \(\underline{{x}}\). This map is one-to-one but not necessarily onto: for instance, the all-free assignment \(\underline{{x}}\equiv {\texttt {f}}\) is always trivially a valid frozen configuration, but on many instances \(\mathscr {G}\) there is no solution cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) whose coarsening is \(\underline{{x}}\equiv {\texttt {f}}\). Since the aim is to lower bound the cluster partition function, the lack of surjectivity must be addressed. We will do so momentarily (Definition 2.4 below), but first we review a useful alternative representation of frozen configurations:

Definition 2.3

(warning configurations) For integers \(l\geqslant 1\), define functions \(\dot{Y}: \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\rightarrow \{{\texttt {0}},{\texttt {1}},{\texttt {f}},\texttt {z}\}\) and \(\hat{Y}: \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\rightarrow \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}\) by

$$\begin{aligned} \dot{Y}(\underline{{{\hat{y}}}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }{\texttt {0}}\in \{{\hat{y}}_i\} \subseteq \{{\texttt {0}},{\texttt {f}}\},\\ {\texttt {1}}&{}\text {if }{\texttt {1}}\in \{{\hat{y}}_i\}\subseteq \{{\texttt {1}},{\texttt {f}}\},\\ {\texttt {f}}&{}\text {if }\{{\hat{y}}_i\}=\{{\texttt {f}}\},\\ \texttt {z}&{}\text {otherwise;}\end{array}\right. } \quad \hat{Y}(\underline{{{\dot{y}}}}) = {\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }\{{\dot{y}}_i\}=\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }\{{\dot{y}}_i\}=\{{\texttt {0}}\},\\ {\texttt {f}}&{}\text {otherwise.}\end{array}\right. }\end{aligned}$$

Denote \(M\equiv \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^2\). We write \(\underline{{y}}\in M^E\) if \(\underline{{y}}=(y_e)_{e\in E}\) where \(y_e\equiv ({\dot{y}}_e,{\hat{y}}_e)\in M\). If edge e joins variable v to clause a, then \({\dot{y}}_e\) represents a “warning” along e from v to a, while \({\hat{y}}_e\) represents a “warning” along e from a to v. We say that \(\underline{{y}}\in M^E\) is a valid warning configuration on \(\mathscr {G}\) if it satisfies the local equations

$$\begin{aligned} y_e=({\dot{y}}_e,{\hat{y}}_e) =\Big ( \dot{Y}( \underline{{{\hat{y}}}}_{\delta v(e){\setminus } e}), {\texttt {L}}_e \oplus \hat{Y}( (\underline{{\texttt {L}}}\oplus \underline{{{\dot{y}}}})_{\delta a(e){\setminus } e} ) \Big )\end{aligned}$$
(20)

for all \(e\in E\) (with no \({\dot{y}}_e=\texttt {z}\)).
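The maps \(\dot{Y},\hat{Y}\) of Definition 2.3 translate directly into code. A sketch with the symbols \({\texttt {f}},\texttt {z}\) rendered as Python strings:

```python
def dot_Y(hat_ys):
    # Variable-side map: aggregate incoming clause warnings; 'z' marks a
    # contradiction (both a 0-warning and a 1-warning received).
    s = set(hat_ys)
    if 0 in s and 1 not in s:
        return 0
    if 1 in s and 0 not in s:
        return 1
    if s == {'f'}:
        return 'f'
    return 'z'

def hat_Y(dot_ys):
    # Clause-side map: a clause emits a forcing warning only when all the
    # other incident variables take a single value; note the nae flip.
    s = set(dot_ys)
    if s == {1}:
        return 0
    if s == {0}:
        return 1
    return 'f'
```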

It is well known that on any given \(\mathscr {G}\) there is a natural bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {frozen configurations}\\ \underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text {warning configurations}\\ \underline{{y}}\in M^E\end{array} \right\} \,. \end{aligned}$$
(21)

In the forward direction, given a (valid) frozen configuration \(\underline{{x}}\), for any variable v and any edge \(e\in \delta v\) such that (19) holds, set \({\hat{y}}_e=x_v \in \{{\texttt {0}},{\texttt {1}}\}\); then in all other cases set \({\hat{y}}_e={\texttt {f}}\). Then, having defined all the \({\hat{y}}_e\), the \({\dot{y}}_e\) are determined by the local equations (20). One can check that the resulting assignment \(\underline{{y}}\in M^E\) is a warning configuration. Conversely, given a warning configuration \(\underline{{y}}\), a frozen configuration \(\underline{{x}}\) can be obtained by setting \(x_v=\dot{Y}({\hat{y}}_{\delta v})\) for all v.

2.2 Message configurations

We return to the question of surjectivity: does a given frozen configuration \(\underline{{x}}\) encode a (nonempty) solution cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\)? We will now state an easy sufficient condition for this to hold. The condition is not in general necessary, but we will show that it captures enough of the solution space to deliver a sharp lower bound on the free energy.

Definition 2.4

(free cycles) Let \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) be a valid frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\). We say that a clause \(a\in F\) is separating (with respect to \(\underline{{x}}\)) if there exist \(e',e''\in \delta a\) such that \({\texttt {L}}_{e'}\oplus x_{v(e')}={\texttt {0}}\) while \({\texttt {L}}_{e''}\oplus x_{v(e'')}={\texttt {1}}\). In particular, a forcing clause is also separating. A cycle in \(\mathscr {G}\) is a sequence of edges

$$\begin{aligned} e_1 e_2 \ldots e_{2\ell -1} e_{2\ell } e_1, \end{aligned}$$

where, taking indices modulo \(2\ell \), it holds for each integer i that \(e_{2i-1}\) and \(e_{2i}\) are distinct but share a clause, while \(e_{2i}\) and \(e_{2i+1}\) are distinct but share a variable. (In particular, if v is joined to a by two edges \(e' \ne e''\), then \(e'e''\) forms a cycle.) We say the cycle in \(\mathscr {G}\) is free (with respect to \(\underline{{x}}\)) if all its variables are free and all its clauses are non-separating.

Definition 2.5

(free trees) Let \(\underline{{x}}\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that has no free cycles. Let H be the subgraph of \(\mathscr {G}\) induced by the free variables and non-separating clauses of \(\underline{{x}}\). Since \(\underline{{x}}\) has no free cycles, H must be a disjoint union of tree components \(\varvec{t}\), which we term the free trees of \(\underline{{x}}\). For each \(\varvec{t}\), let \(\varvec{T}\equiv \varvec{T}(\varvec{t})\) be the subgraph of \(\mathscr {G}\) induced by \(\varvec{t}\) together with its incident clauses. The subgraphs \(\varvec{T}\) (which can contain cycles) will be termed the free pieces of \(\underline{{x}}\). Each free variable is covered by exactly one free piece. In the simplest case, a free piece consists of a single free variable surrounded by d separating clauses.

Let us say that \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) extends \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) if \({\varvec{x}}_v=x_v\) for all v such that \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\). If \(\underline{{x}}\) is a frozen configuration on \(\mathscr {G}\) with no free cycles, it is easy to extend \(\underline{{x}}\) to valid nae-sat solutions \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\)—we simply extend \(\underline{{x}}\) on each free tree \(\varvec{t}\), since nae-sat on a tree is always solvable; the different free trees do not interact. Let \(\varvec{\gamma }\) denote the set of all valid nae-sat solutions on \(\mathscr {G}\) that extend \(\underline{{x}}\), and denote \(\mathrm {\textsf {{size}}}(\underline{{x}})\equiv |\varvec{\gamma }|\). Meanwhile, let \({\mathfrak {T}}(\underline{{x}})\) denote the set of all free pieces of \(\underline{{x}}\). For each \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\), let \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\) denote the number of valid nae-sat solutions on \(\varvec{T}\) that extend \(\underline{{x}}|_{\varvec{T}}\). It follows from our discussion that \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\) with

$$\begin{aligned} |\varvec{\gamma }| = \mathrm {\textsf {{size}}}(\underline{{x}})=\prod _{\varvec{T}\in {\mathfrak {T}}(\underline{{x}})} \mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\,. \end{aligned}$$
(22)
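On small examples, both sides of (22) can be checked by brute force. A sketch (ours; exponential in the number of free variables, with the same (variable, literal) clause encoding used in our earlier illustrations):

```python
from itertools import product

def cluster_size(x, clauses):
    # |cluster| = number of NAE-SAT solutions extending the frozen
    # configuration x (entries in {0, 1, 'f'}), by brute force over the
    # free variables; (22) says this factorizes over the free pieces.
    free = [v for v, xv in enumerate(x) if xv == 'f']
    count = 0
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for v, b in zip(free, bits):
            y[v] = b
        if all(len({L ^ y[v] for v, L in cl}) == 2 for cl in clauses):
            count += 1
    return count
```

Two disjoint free pieces contribute multiplicatively: for two disjoint ternary clauses with all variables free, the count is \(6\times 6=36\), as in (22).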

That is to say, the absence of free cycles is an easy sufficient condition for a frozen configuration to encode a nonempty cluster; and it further ensures that the cluster has a relatively simple product structure (22). As noted previously, the structure within each free piece \(\varvec{T}\) can be understood by dynamic programming (bp). This is a well-known calculation (see e.g. [33, Ch. 14]) but we will review the details for our setting. To this end, we first introduce a combinatorial model of “message configurations” which will map directly to the natural bp variables.

Recall from Definition 2.3 that a warning configuration is denoted \(\underline{{y}}\in M^E\) where each \(y_e\equiv ({\dot{y}}_e,{\hat{y}}_e)\in M\). We denote a message configuration by \(\underline{{\tau }}\in {\mathscr {M}}^E\) where each \(\tau _e=(\dot{\tau }_e,\hat{\tau }_e)\in {\mathscr {M}}\) (for \({\mathscr {M}}\) to be defined below). It will be convenient to let \({\textsc {e}}\) indicate a directed edge, pointing from tail vertex \(t({\textsc {e}})\) to head vertex \(h({\textsc {e}})\). If e is the undirected version of \({\textsc {e}}\), then we denote

$$\begin{aligned}(y_{{\textsc {e}}},\tau _{{\textsc {e}}}) ={\left\{ \begin{array}{ll} ({\dot{y}}_e,\dot{\tau }_e) &{}\text {if }t({\textsc {e}})\text { is a variable,}\\ ({\hat{y}}_e,\hat{\tau }_e) &{}\text {if }t({\textsc {e}})\text { is a clause.} \end{array}\right. }\end{aligned}$$

We will make a definition such that \(\tau _{{\textsc {e}}}\) either takes the value “\(\star \)” or is a bipartite factor tree. The tree is unlabelled except that one vertex is distinguished as the root, and some edges are assigned \({\texttt {0}}\) or \({\texttt {1}}\) values as explained below. The root of \(\tau _{{\textsc {e}}}\) is required to have degree one, and should be thought of as corresponding to the head vertex \(h({\textsc {e}})\).

In the context of message configurations \(\underline{{\tau }}\), we use “\({\texttt {0}}\)” or “\({\texttt {1}}\)” to stand for the tree consisting of a single edge which is labelled \({\texttt {0}}\) or \({\texttt {1}}\) and rooted at the endpoint corresponding to the head vertex—the root is the incident clause in the case of \(\dot{\tau }\), the incident variable in the case of \(\hat{\tau }\). We use \({\texttt {s}}\) to stand for the tree consisting of a single unlabelled edge, rooted at the incident variable; this will be related to the situation of separating clauses from Definition 2.4. Given a collection of rooted trees \(t_1,\ldots ,t_\ell \) whose roots \(o_1,\ldots ,o_\ell \) are all of the same type (either all variables or all clauses), we define \(t=\mathrm {\textsf {{join}}}(t_1,\ldots ,t_\ell )\) by identifying all the \(o_i\) as a single vertex o, then adding an edge which joins o to a new vertex \(o'\). The vertex o has the same type (variable or clause) as the \(o_i\); and the vertex \(o'\) is assigned the opposite type and becomes the root of t.

Definition 2.6

(message configurations) Start with \(\dot{\mathscr {M}}_0\equiv \{{\texttt {0}},{\texttt {1}},\star \}\) and \(\hat{\mathscr {M}}_0\equiv \varnothing \), and suppose inductively that \(\dot{\mathscr {M}}_t,\hat{\mathscr {M}}_t\) have been defined. For \(\underline{{\hat{\tau }}}\in (\hat{\mathscr {M}}_t)^{d-1}\) and \(\underline{{\dot{\tau }}}\in (\dot{\mathscr {M}}_t)^{k-1}\), let us abbreviate \(\{\hat{\tau }_i\}\equiv \{\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1}\}\) and \(\{\dot{\tau }_i\}\equiv \{\dot{\tau }_1,\ldots ,\dot{\tau }_{k-1}\}\). Define

$$\begin{aligned}\dot{T}(\underline{{\hat{\tau }}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }{\texttt {0}}\in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }{\texttt {1}}\in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}}\},\\ \texttt {z}&{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\subseteq \{\hat{\tau }_i\},\\ \star &{}\text {if }\star \in \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}},{\texttt {1}}\},\\ \mathrm{{\textsf {{join}}}} \{\hat{\tau }_i\} &{}\text {if } \{\hat{\tau }_i\}\subseteq \hat{\mathscr {M}}_t{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}; \end{array}\right. }\quad \hat{T}(\underline{{\dot{\tau }}}) ={\left\{ \begin{array}{ll} {\texttt {0}}&{}\text {if }\{\dot{\tau }_i\}=\{{\texttt {1}}\},\\ {\texttt {1}}&{}\text {if }\{\dot{\tau }_i\}=\{{\texttt {0}}\},\\ {\texttt {s}}&{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\subseteq \{\dot{\tau }_i\},\\ \star &{}\text {if }\{{\texttt {0}},{\texttt {1}}\}\not \subseteq \{\dot{\tau }_i\}\text { and } \star \in \{\dot{\tau }_i\},\\ \mathrm{{\textsf {{join}}}} \{\dot{\tau }_i\} &{}\text {otherwise.}\end{array}\right. } \end{aligned}$$

Then, for \(t\geqslant 0\), define recursively the sets

$$\begin{aligned} \hat{\mathscr {M}}_{t+1}&\equiv \hat{\mathscr {M}}_t \cup \hat{T}[(\dot{\mathscr {M}}_t)^{k-1}]\,,\\ \dot{\mathscr {M}}_{t+1}&\equiv \dot{\mathscr {M}}_t \cup \dot{T}[(\hat{\mathscr {M}}_{t+1})^{d-1}]{{\setminus }}\{\texttt {z}\} \end{aligned}$$

We then let \(\dot{\mathscr {M}}\) be the union of all the \(\dot{\mathscr {M}}_t\), let \(\hat{\mathscr {M}}\) be the union of all the \(\hat{\mathscr {M}}_t\), and let \({\mathscr {M}}=\dot{\mathscr {M}}\times \hat{\mathscr {M}}\). On \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), the assignment \(\underline{{\tau }}\in {\mathscr {M}}^E\) is a valid message configuration if (i) it satisfies the local equations

$$\begin{aligned} \tau _e =(\dot{\tau }_e,\hat{\tau }_e) =\Big ( \dot{T}( \underline{{\hat{\tau }}}_{\delta v(e){\setminus } e} ), {\texttt {L}}_e\oplus \hat{T}( (\underline{{\texttt {L}}}\oplus \underline{{\dot{\tau }}})_{\delta a(e){\setminus } e} ) \Big ) \end{aligned}$$
(23)

for all \(e\in E\) (with no \(\dot{\tau }_e=\texttt {z}\)), and (ii) if one element of \(\{\dot{\tau }_e,\hat{\tau }_e\}\) equals \(\star \) then the other element is in \(\{{\texttt {0}},{\texttt {1}}\}\). In (23), we take the convention that \({\texttt {L}}_e\oplus {\texttt {f}}={\texttt {f}}\) and \({\texttt {L}}_e\oplus \star =\star \), and if \(\tau \) is a tree with labels then \({\texttt {L}}_e\oplus \tau \) is defined by applying \({\texttt {L}}_e\oplus \cdot \) entrywise to all labels of \(\tau \). See Fig. 1.
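To make the case analysis of Definition 2.6 concrete, here is a minimal sketch in Python. The encoding is our own assumption: messages are the strings `"0"`, `"1"`, `"s"`, `"*"`, or nested tuples `("join", …)` for joined trees, and the names `dotT`, `hatT`, `join` are ours.

```python
def join(msgs):
    """Identify the roots of the input trees and attach a fresh root o'.

    Children are sorted so the encoding depends only on the multiset of
    subtrees, matching the unordered join of Definition 2.6.
    """
    return ("join",) + tuple(sorted(msgs, key=str))

def dotT(hats):
    """Variable-to-clause map: combine d-1 incoming clause messages."""
    s = set(hats)
    if "0" in s and "1" not in s:
        return "0"
    if "1" in s and "0" not in s:
        return "1"
    if {"0", "1"} <= s:
        return "z"              # contradictory value, excluded from M-dot
    if "*" in s:
        return "*"
    return join(hats)           # all inputs are proper trees

def hatT(dots):
    """Clause-to-variable map: combine k-1 incoming variable messages."""
    s = set(dots)
    if s == {"1"}:
        return "0"
    if s == {"0"}:
        return "1"
    if {"0", "1"} <= s:
        return "s"              # separating clause
    if "*" in s:
        return "*"
    return join(dots)
```

The ordering of the `if` branches reproduces the priority of the cases in the displayed definition; for example `hatT(["1", "1"])` returns `"0"` and `hatT(["0", "1"])` returns `"s"`, as in Fig. 1.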

Fig. 1
figure 1

Examples of messages (Definition 2.6). Variables are indicated by circle nodes, clauses by square nodes, and edges by lines. For simplicity we assume that all edges depicted have literals \({\texttt {L}}_e={\texttt {0}}\). Each message is shown with its root as a filled node at the top. The variable-to-clause messages \(\dot{\tau }\) are rooted at clauses, while the clause-to-variable messages \(\hat{\tau }\) are rooted at variables. The heavy lines indicate edges inside the message that are labelled \({\texttt {0}}\) or \({\texttt {1}}\). In our notation we have \(\dot{\tau }_0={\texttt {0}}\) and \(\dot{\tau }_1={\texttt {1}}\). Next \(\hat{\tau }_0={\hat{T}}(\dot{\tau }_1,\dot{\tau }_1)={\texttt {0}}\), while \(\hat{\tau }_1={\hat{T}}(\dot{\tau }_0,\dot{\tau }_1)={\texttt {s}}\). Finally \(\dot{\tau }_3={\dot{T}}(\hat{\tau }_1,\hat{\tau }_1) =\mathrm {\textsf {{join}}}(\hat{\tau }_1,\hat{\tau }_1)\) and \(\hat{\tau }_2={\hat{T}}(\dot{\tau }_3,\dot{\tau }_0) =\mathrm {\textsf {{join}}}(\dot{\tau }_3,\dot{\tau }_0)\)

Suppose \(\underline{{x}}\) is a frozen configuration on \(\mathscr {G}\), and let \(\underline{{y}}\) be its corresponding warning configuration from (21). Given \(\underline{{y}}\), we define \(\underline{{\tau }}\) in four stages:

1. If \({\dot{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) then set \(\dot{\tau }_e={\dot{y}}_e\); likewise if \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\) then set \(\hat{\tau }_e={\hat{y}}_e\).

2. If \((\underline{{\texttt {L}}}\oplus \underline{{{\dot{y}}}})_{\delta a(e){\setminus } e}\) has both \({\texttt {0}}\) and \({\texttt {1}}\) entries, then set \(\hat{\tau }_e={\texttt {s}}\).

3. Apply the local Eq. (23) recursively to define \(\dot{\tau }_e,\hat{\tau }_e\) wherever possible.

4. Lastly, if any \(\dot{\tau }_e\) or \(\hat{\tau }_e\) remains undefined, then set it to \(\star \).

An example with \(\star \) messages is given in Fig. 2.

Fig. 2
figure 2

Example of how \(\star \) messages can arise in the mapping from \(\underline{{y}}\) to \(\underline{{\tau }}\) (Sect. 2.2). The figure shows a subgraph of \(\mathscr {G}\) with variables indicated by circle nodes, clauses by square nodes, and edges by lines. All edges in the figure are assumed to have label \({\texttt {L}}_e={\texttt {0}}\). All variables shown are frozen to \({\texttt {0}}\) or \({\texttt {1}}\), and all clauses shown are separating. To avoid clutter we did not label the edges with the warnings \(y_e\); instead, each variable v is labeled with its frozen configuration spin \(x_v\), according to the \(\underline{{x}}\leftrightarrow \underline{{y}}\) bijection (21). The clauses force the variables along the cycle in the clockwise direction, resulting in \(\star \) values in the final \(\underline{{\tau }}\) in the counterclockwise direction of the cycle. (Note also that \(\underline{{y}}\) can be recovered from \(\underline{{\tau }}\) by changing \(\star \) to \({\texttt {f}}\); cf. Lemma 2.7)

Lemma 2.7

The mapping described above defines a bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {frozen configurations } \underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\\ \text {without free cycles}\end{array} \right\} \longleftrightarrow \left\{ \begin{array}{c} \text {message configurations}\\ \underline{{\tau }}\in {\mathscr {M}}^E\end{array} \right\} \,. \end{aligned}$$

Proof

Let \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) without free cycles, and let \(\underline{{y}}\in M^E\) be the warning configuration which corresponds to \(\underline{{x}}\) via (21). We first check that the mapping \(\underline{{y}}\mapsto \underline{{\tau }}\), as described above, gives a message configuration which is valid, i.e., satisfies conditions (i) and (ii) of Definition 2.6. In the first stage, the mapping procedure sets \(\dot{\tau }_e={\dot{y}}_e\) whenever \({\dot{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\), and \(\hat{\tau }_e={\hat{y}}_e\) whenever \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\). One can argue by induction that the rest of the procedure does not create any additional \({\texttt {0}}\) or \({\texttt {1}}\) messages, so that in the final configuration the \(\{{\texttt {0}},{\texttt {1}}\}\) values of \(\underline{{\tau }}\) will match those of \(\underline{{y}}\). The second and third stages of the procedure are clearly consistent with the local Eq. (23). Note in particular that the third stage does not produce any \(\dot{\tau }_e=\texttt {z}\) message, because it would contradict the assumption that \(\underline{{y}}\) is a valid warning configuration; it also does not produce any \(\star \) message. All \(\star \) messages are created in the fourth stage, and this is clearly consistent with the mapping of \(\star \) messages under \(\dot{T}\) and \(\hat{T}\). This concludes the proof that \(\underline{{\tau }}\) satisfies condition (i) of Definition 2.6. To check condition (ii), suppose \(\tau _{{\textsc {e}}}=\star \), and let \({\textsc {f}}\) denote the reversal of \({\textsc {e}}\). From the above construction, it must be that \(y_{{\textsc {e}}}={\texttt {f}}\) and \(\tau _{{\textsc {e}}'}=\star \) for some \({\textsc {e}}'\) that points to the tail vertex \(t({\textsc {e}})\) but does not equal \({\textsc {f}}\). 
Consequently \({\textsc {e}}\) must belong to a directed cycle \({\textsc {e}}_1 {\textsc {e}}_2\ldots {\textsc {e}}_{2k} {\textsc {e}}_1\) with all the \(\tau _{{\textsc {e}}_i}\) equal to \(\star \). Whenever \({\textsc {e}}\) points from a separating clause a to a free variable v, we must have \(\tau _{{\textsc {e}}}={\texttt {s}}\). As a result, if all the variables along the cycle are free, then none of the clauses can be separating, contradicting the assumption that \(\underline{{x}}\) has no free cycles. Therefore some variable v on the cycle must take value \(x_v\in \{{\texttt {0}},{\texttt {1}}\}\), and by relabelling we may assume \(v=t({\textsc {e}}_1)\). Let \({\textsc {f}}_i\) denote the reversal of \({\textsc {e}}_i\): since \(x_v\ne {\texttt {f}}\) but \(y_{{\textsc {e}}_1}={\texttt {f}}\), it must be that \(y_{{\textsc {f}}_1}=x_v\). This means that the clause \(a=h({\textsc {e}}_1)=t({\textsc {f}}_1)\) is forcing to v, so in particular \(y_{{\textsc {f}}_2}\in \{{\texttt {0}},{\texttt {1}}\}\). Continuing in this way we see that \(y_{{\textsc {f}}_i}\in \{{\texttt {0}},{\texttt {1}}\}\) for all i, and it follows that \(\underline{{\tau }}\) satisfies condition (ii), and so is a valid message configuration.

The mapping from \(\underline{{y}}\) to \(\underline{{\tau }}\) is clearly injective. To see that it is surjective, let \(\underline{{\tau }}\) be any valid message configuration. Projecting \(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}}\}\mapsto {\texttt {f}}\) and \(\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}}\}\mapsto {\texttt {f}}\) yields a valid warning configuration \(\underline{{y}}\), which in turn maps to a valid frozen configuration \(\underline{{x}}\). It remains then to check that \(\underline{{x}}\) has no free cycles. Indeed, along a free cycle, all the warnings (in either direction) must be \({\texttt {f}}\). This means none of the messages can be in \(\{{\texttt {0}},{\texttt {1}}\}\), and as a result none of the messages can be \(\star \), by condition (ii) of Definition 2.6. This means all the messages must be in \(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) or \(\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\). Suppose in one direction of the cycle we have the directed edges \({\textsc {e}}_1{\textsc {e}}_2\ldots {\textsc {e}}_{2k}{\textsc {e}}_1\). By definition of \(\dot{T}\) and \(\hat{T}\), \(\tau _{{\textsc {e}}_i}\) is a proper subtree of \(\tau _{{\textsc {e}}_{i+1}}\) for all i, with indices modulo 2k. Going around the cycle we find that \(\tau _{{\textsc {e}}_1}\) is a proper subtree of \(\tau _{{\textsc {e}}_{2k+1}}=\tau _{{\textsc {e}}_1}\), which gives the contradiction.\(\square \)

2.3 Bethe formula

We now describe the dynamic programming (belief propagation, or bp) calculation which will ultimately take a message configuration \(\underline{{\tau }}\) and evaluate a product of local functions to compute the size of its associated cluster. The first step is to define the dynamic programming variables; these will formalize the measures \({\texttt {m}}\) which were introduced previously in (7). Recall that for \(l\geqslant 1\) and \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^l\), we write \(I^\textsc {nae}(\underline{{x}})\) for the indicator that the entries of \(\underline{{x}}\) are not identically \({\texttt {0}}\) or identically \({\texttt {1}}\).

Definition 2.8

Recall that message configuration spins belong to the space \({\mathscr {M}}=\dot{\mathscr {M}}\times \hat{\mathscr {M}}\) (Definition 2.6). Let \({\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) denote the space of probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\). Define the mappings \(\dot{{\texttt {m}}}:\dot{\mathscr {M}}\rightarrow {\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) and \(\hat{{\texttt {m}}}:\hat{\mathscr {M}}\rightarrow {\mathscr {P}}(\{{\texttt {0}},{\texttt {1}}\})\) as follows. For \(\dot{\tau }\in \{{\texttt {0}},{\texttt {1}}\}\) let \(\dot{{\texttt {m}}}(\dot{\tau })\) be the unit measure supported on \(\dot{\tau }\). Likewise, for \(\hat{\tau }\in \{{\texttt {0}},{\texttt {1}}\}\) let \(\hat{{\texttt {m}}}(\hat{\tau })\) be the unit measure supported on \(\hat{\tau }\). For \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) or \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) we let \(\dot{{\texttt {m}}}(\dot{\tau })\) and \(\hat{{\texttt {m}}}(\hat{\tau })\) be recursively defined: if \(\dot{\tau }=\dot{T}(\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1})\) where no \(\hat{\tau }_j=\star \), define

$$\begin{aligned} \dot{z}(\dot{\tau }) \equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^{d-1} [\hat{{\texttt {m}}}(\hat{\tau }_i)]({\varvec{x}})\,, \quad [\dot{{\texttt {m}}}(\dot{\tau })]({\varvec{x}}) \equiv \frac{1}{\dot{z}(\dot{\tau })} \prod _{i=1}^{d-1} [\hat{{\texttt {m}}}(\hat{\tau }_i)]({\varvec{x}})\,. \end{aligned}$$
(24)

Note that \(\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1}\) can be recovered from \(\dot{\tau }\) modulo permutation of the indices, so these quantities are well-defined. We see inductively that for \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), the normalizing factor \(\dot{z}(\dot{\tau })\) is positive, and \(\dot{{\texttt {m}}}(\dot{\tau })\) is a nondegenerate probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\). Similarly, if \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) equals \(\hat{T}(\dot{\tau }_1,\ldots ,\dot{\tau }_{k-1})\) where none of the \(\dot{\tau }_i\) are \(\star \), then set

$$\begin{aligned} \hat{z}(\hat{\tau })&\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \sum _{\underline{{\dot{{\varvec{x}}}}}\in \{{\texttt {0}},{\texttt {1}}\}^{k-1}} I^\textsc {nae}({\varvec{x}},\underline{{\dot{{\varvec{x}}}}}) \prod _{i=1}^{k-1} [\dot{{\texttt {m}}}(\dot{\tau }_i)](\dot{{\varvec{x}}}_i) =2 -\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\texttt {0}}) -\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\texttt {1}})\,,\nonumber \\ [\hat{{\texttt {m}}}(\hat{\tau })]({\varvec{x}})&\equiv \frac{1}{\hat{z}(\hat{\tau })} \sum _{\underline{{\dot{{\varvec{x}}}}}\in \{{\texttt {0}},{\texttt {1}}\}^{k-1}} I^\textsc {nae}({\varvec{x}},\underline{{\dot{{\varvec{x}}}}}) \prod _{i=1}^{k-1} [\dot{{\texttt {m}}}(\dot{\tau }_i)](\dot{{\varvec{x}}}_i) = \frac{1}{\hat{z}(\hat{\tau })} \bigg (1-\prod _{i=1}^{k-1}[\dot{{\texttt {m}}}(\dot{\tau }_i)]({\varvec{x}})\bigg )\,. \end{aligned}$$
(25)

Again, we see inductively that for \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), the normalizing factor \(\hat{z}(\hat{\tau })\) is positive, and \(\hat{{\texttt {m}}}(\hat{\tau })\) is a nondegenerate probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\). Finally, we will see below that for our purposes we can take \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) to be arbitrary nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\); we therefore define them both to equal the uniform measure on \(\{{\texttt {0}},{\texttt {1}}\}\).
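As a concrete sketch, the recursions (24) and (25) can be implemented directly, storing a measure on \(\{{\texttt {0}},{\texttt {1}}\}\) as the pair \((m({\texttt {0}}),m({\texttt {1}}))\); the function names `dot_m` and `hat_m` are our own. The final check verifies the closed form of \(\hat{z}\) in (25) by direct enumeration against the \(I^\textsc {nae}\) indicator.

```python
from fractions import Fraction
from itertools import product

def dot_m(hat_ms):
    """Recursion (24): combine d-1 incoming clause-to-variable measures."""
    p0 = p1 = Fraction(1)
    for m in hat_ms:
        p0 *= m[0]
        p1 *= m[1]
    z = p0 + p1                       # the normalizer z-dot
    return (p0 / z, p1 / z)

def hat_m(dot_ms):
    """Recursion (25): combine k-1 incoming variable-to-clause measures."""
    p0 = p1 = Fraction(1)
    for m in dot_ms:
        p0 *= m[0]
        p1 *= m[1]
    z = 2 - p0 - p1                   # the normalizer z-hat
    return ((1 - p0) / z, (1 - p1) / z)

# Check the closed form of z-hat against direct enumeration of I^nae,
# for k - 1 = 2 arbitrary nondegenerate input measures:
ms = [(Fraction(1, 3), Fraction(2, 3)), (Fraction(1, 2), Fraction(1, 2))]
direct = sum(
    ms[0][x1] * ms[1][x2]
    for x, x1, x2 in product((0, 1), repeat=3)
    if not x == x1 == x2              # the indicator I^nae(x, x1, x2)
)
assert direct == 2 - ms[0][0] * ms[1][0] - ms[0][1] * ms[1][1]
```

With two point masses at \({\texttt {1}}\) as inputs, `hat_m` returns the point mass at \({\texttt {0}}\), matching \(\hat{T}({\texttt {1}},{\texttt {1}})={\texttt {0}}\) from Fig. 1.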

Given a valid message configuration \(\underline{{\tau }}\) on \(\mathscr {G}\), define \(\underline{{{\texttt {m}}}} = ({\texttt {m}}_e)_{e\in E}\) where \({\texttt {m}}_e\equiv (\dot{{\texttt {m}}}_e,\hat{{\texttt {m}}}_e)\) with \(\dot{{\texttt {m}}}_e\equiv \dot{{\texttt {m}}}(\dot{\tau }_e)\) and \(\hat{{\texttt {m}}}_e\equiv \hat{{\texttt {m}}}(\hat{\tau }_e)\). It follows from Definition 2.8 that \(\underline{{{\texttt {m}}}}\) satisfies the following local consistency equations, which are inherited from the Eq. (23) satisfied by \(\underline{{\tau }}\), in combination with the above definitions (24) and (25). If \(\dot{\tau }_e\ne \star \), then \(\dot{{\texttt {m}}}_e\) is given by the equation

$$\begin{aligned} \dot{{\texttt {m}}}_e({\varvec{x}}) = \frac{1}{\dot{z}(\dot{\tau }_e)} \prod _{e'\in \delta v(e){\setminus } e} \hat{{\texttt {m}}}_{e'}({\varvec{x}}) \end{aligned}$$
(26)

for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). Likewise, if \(\hat{\tau }_e\ne \star \), then \(\hat{{\texttt {m}}}_e\) is given by the equation

$$\begin{aligned} \hat{{\texttt {m}}}_e({\varvec{x}}) = \frac{1}{\hat{z}(\hat{\tau }_e)} \sum _{ \underline{{\dot{{\varvec{x}}}}}_{\delta a(e)} \in \{{\texttt {0}},{\texttt {1}}\}^d } \mathbf {1}\{\dot{{\varvec{x}}}_e={\varvec{x}}\} I^\textsc {nae}( (\underline{{\dot{{\varvec{x}}}}}\oplus \underline{{\texttt {L}}})_{\delta a(e)} ) \prod _{e'\in \delta a(e){\setminus } e} \dot{{\texttt {m}}}_{e'}(\dot{{\varvec{x}}}_{e'}) \end{aligned}$$
(27)

for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). The Eqs. (26) and (27) are known as the bp equations. We now proceed to the calculation of the cluster size (22). To this end, we define the local functions

$$\begin{aligned} \bar{\varphi }(\dot{\tau },\hat{\tau })&\equiv \bigg \{ \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \dot{{\texttt {m}}}[\dot{\tau }]({\varvec{x}}) \cdot \hat{{\texttt {m}}}[\hat{\tau }]({\varvec{x}}) \bigg \}^{-1}\,,\nonumber \\ \hat{\varphi }^\text {lit}(\dot{\tau }_1,\ldots ,\dot{\tau }_k)&\equiv \sum _{\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^k} I^\textsc {nae}(\underline{{{\varvec{x}}}}) \prod _{i=1}^k \dot{{\texttt {m}}}[\dot{\tau }_i]({\varvec{x}}_i) =1-\sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^k \dot{{\texttt {m}}}[\dot{\tau }_i]({\varvec{x}}) = \frac{\hat{z}(\hat{T}( (\dot{\tau }_j)_{j\ne i} )) }{\bar{\varphi }(\dot{\tau }_i,\hat{T}( (\dot{\tau }_j)_{j\ne i} ))} \,,\nonumber \\ \dot{\varphi }(\hat{\tau }_1,\ldots ,\hat{\tau }_d)&\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \prod _{i=1}^d\hat{{\texttt {m}}}[\hat{\tau }_i]({\varvec{x}}) = \frac{\dot{z}(\dot{T}((\hat{\tau }_j)_{j\ne i}))}{\bar{\varphi }(\hat{\tau }_i,\dot{T}((\hat{\tau }_j)_{j\ne i}))}\,, \end{aligned}$$
(28)

where the last identity in the last two lines holds for any choice of i. The bp calculation is summarized by the following:

Lemma 2.9

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles, and let \(\underline{{\tau }}\) be its corresponding message configuration from Lemma 2.7. Let \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) be a free piece of \(\underline{{x}}\), and let \(\varvec{t}\) be the free tree inside it. Then the number of nae-sat extensions of \(\underline{{x}}|_{\varvec{T}}\) on \(\varvec{T}\) is given by

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) =\prod _{v\in V({\varvec{t}})} \bigg \{ \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e\in \delta v}\bar{\varphi }(\tau _e) \bigg \} \prod _{a\in F({\varvec{t}})} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }} \oplus \underline{{\texttt {L}}})_{\delta a}) \end{aligned}$$
(29)

where \(V(\varvec{t})\) and \(F(\varvec{t})\) denote respectively the variables and clauses in \(\varvec{t}\). (An example calculation is worked out in Fig. 3.)

Fig. 3
figure 3

Example of correspondence (Lemma 2.7) between frozen and message configurations. Variables are indicated by circle nodes, clauses by square nodes, and edges by lines. The graph formed by the wavy lines is a free piece \(\varvec{T}\), with the free tree \(\varvec{t}\) in green and \(\varvec{T}{\setminus }\varvec{t}\) in blue (Definition 2.5). Each variable v is labelled with its frozen configuration spin value \(x_v\). The four separating clauses are indicated by filled black squares. The message configuration is only partially shown, with the remaining values given by the obvious symmetries. The clause-to-variable messages \(\hat{\tau }\) are shown in black, and the variable-to-clause messages \(\dot{\tau }\) are shown in purple. Each message is a tree, with root vertex shown as a filled node. The heavy black and purple lines indicate edges inside the messages that are labeled \({\texttt {1}}\). For instance, the message coming up out of the bottom variable is a tree consisting of a single edge, labelled \({\texttt {1}}\) (indicated in the figure by a heavy purple line), rooted at its incident clause. We then calculate on each edge the values \(m=( {\dot{{\texttt {m}}}[\dot{\tau }]({\texttt {1}})}, {\hat{{\texttt {m}}}[\hat{\tau }]({\texttt {1}})})\), and use this to determine the factors \(\dot{\varphi }\), \(\hat{\varphi }^\text {lit}\), \(\bar{\varphi }\) from (28). In this example, \(\varvec{t}\) has two free variables each with \(\dot{\varphi }=1/4\), four separating clauses each with \(\hat{\varphi }^\text {lit}=1\), and one non-separating clause with \(\hat{\varphi }^\text {lit}=3/4\). There are six edges incident to \(\varvec{t}\), each with \(\bar{\varphi }=2\). Multiplying all these factors together (Lemma 2.9) gives \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})=3\). 
Indeed, in this small example it is easy to see that there are exactly three nae-sat assignments extending the frozen configuration \(\underline{{x}}\) on \(\varvec{T}\), since the two free variables cannot both take value \({\texttt {1}}\), but the remaining three possibilities give valid nae-sat assignments (color figure online)
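The arithmetic of this small example is easy to check numerically. The sketch below uses exact rational arithmetic, with the factor values as read off the Fig. 3 caption; it multiplies the factors of Lemma 2.9 and compares against the brute-force count of extensions.

```python
from fractions import Fraction
from itertools import product

# phi-hat for the non-separating clause, via (28): its two free neighbours
# send uniform dot-measures and its frozen neighbour sends a point mass
phi_ns = 1 - Fraction(1, 2) * Fraction(1, 2) * 1 - Fraction(1, 2) * Fraction(1, 2) * 0
assert phi_ns == Fraction(3, 4)

# Lemma 2.9 product: two free variables (phi-dot = 1/4), four separating
# clauses (phi-hat = 1), one non-separating clause (3/4), six edges (bar-phi = 2)
size = Fraction(1, 4) ** 2 * 1 ** 4 * phi_ns * 2 ** 6

# brute force: the two free variables may take any joint value except (1, 1)
count = sum(1 for xs in product((0, 1), repeat=2) if xs != (1, 1))
assert size == count == 3
```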

Proof

As we have mentioned before, this calculation is well known (see e.g. [33, Ch. 14]) but we will review it here, beginning with a minor technical point. As noted in Definition 2.5, \(\varvec{t}\) is a tree but \(\varvec{T}\) has a cycle wherever a variable \(v\in \varvec{T}{\setminus }\varvec{t}\) is joined by more than one edge to \(\varvec{t}\). However, since \(\underline{{x}}|_{\varvec{T}{\setminus }\varvec{t}}\) is \(\{{\texttt {0}},{\texttt {1}}\}\)-valued, these cycles play no role in the question of extending \(\underline{{x}}|_{\varvec{T}}\) to a valid nae-sat assignment on \(\varvec{T}\)—one can simply duplicate variables in \(\varvec{T}{\setminus }\varvec{t}\) so that each one joins to \(\varvec{t}\) by exactly one edge. We may therefore assume for the rest of the proof that all the free pieces \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) are acyclic.

For any \(\varvec{T}\in {\mathfrak {T}}(\underline{{x}})\) and any edge \(e\in \varvec{T}\), delete from \(\varvec{T}\) the edges \(\delta a(e){\setminus } e\), and let \(\dot{\varvec{T}}_e\) denote the component containing e in what remains, rooted at a(e). Likewise, delete from \(\varvec{T}\) the edges \(\delta v(e){{\setminus }} e\), and let \(\hat{\varvec{T}}_e\) denote the component containing e in what remains, rooted at v(e). For each variable \(w\in \dot{\varvec{T}}_e{{\setminus }}\varvec{t}\), let \(\acute{x}_w\in \{{\texttt {0}},{\texttt {1}}\}\) be the boolean sum of \(x_w\) together with all the edge literals \({\texttt {L}}\) on the path joining w to a(e) in \(\dot{\varvec{T}}_e\). Note then that \(\dot{\tau }_e\) encodes the isomorphism class of \(\dot{\varvec{T}}_e\), labelled with boundary data \(\acute{x}_w\) (for all the variables \(w\in \dot{\varvec{T}}_e{\setminus }\varvec{t}\)). A similar relation holds between \(\hat{\tau }_e\) and \(\hat{\varvec{T}}_e\). For each \(e\in \varvec{T}\), let \(\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})\) count the number of valid nae-sat assignments that extend \(\underline{{x}}|_{\dot{\varvec{T}}_e}\) on \(\dot{\varvec{T}}_e\) and take value \({\varvec{x}}\) on v(e). Let \(\hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})\) count the number of valid nae-sat assignments that extend \(\underline{{x}}|_{\hat{\varvec{T}}_e}\) on \(\hat{\varvec{T}}_e\) and take value \({\varvec{x}}\) on v(e). Denote

$$\begin{aligned}\dot{\textsf {s}}_e(\underline{{x}})\equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})\,,\quad \hat{\textsf {s}}_e(\underline{{x}}) \equiv \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})\,. \end{aligned}$$

There are two boundary cases: if edge e joins a free variable in \(\varvec{t}\) to a separating clause in \(\varvec{T}{\setminus }\varvec{t}\), then we have \(\hat{\textsf {s}}_e({\texttt {0}};\underline{{x}}) = \hat{\textsf {s}}_e({\texttt {1}};\underline{{x}})=1\). If edge e instead joins a non-separating clause in \(\varvec{t}\) to a frozen variable in \(\varvec{T}{\setminus }\varvec{t}\), then we have \(\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})=\mathbf {1}\{{\varvec{x}}=x_{v(e)}\}\). By induction starting from these boundary cases we find that for all \(e\in \varvec{T}\),

$$\begin{aligned} \dot{{\texttt {m}}}_e({\varvec{x}}) =\frac{\dot{\textsf {s}}_e({\varvec{x}};\underline{{x}})}{\dot{\textsf {s}}_e(\underline{{x}})}\,,\quad \hat{{\texttt {m}}}_e({\varvec{x}}) =\frac{\hat{\textsf {s}}_e({\varvec{x}};\underline{{x}})}{\hat{\textsf {s}}_e(\underline{{x}})}\,. \end{aligned}$$

It follows that for any variable \(v\in \varvec{t}\), any clause \(a\in \varvec{t}\), and any edge \(e\in \varvec{T}\), we have the identities

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) = \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e'\in \delta v} \hat{\textsf {s}}_{e'}(\underline{{x}}) = \hat{\varphi }^\text {lit}((\underline{\dot{\tau }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e'\in \delta a} \dot{\textsf {s}}_{e'}(\underline{{x}}) = \frac{\dot{\textsf {s}}_e(\underline{{x}}) \hat{\textsf {s}}_e(\underline{{x}})}{\bar{\varphi }(\tau _e)}\,.\end{aligned}$$

Combining the identities and rearranging gives (writing \(E(\varvec{t})\) for the edges of \(\varvec{t}\))

$$\begin{aligned} \prod _{v\in V({\varvec{t}})} \bigg \{ \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{e\in \delta v}\bar{\varphi }(\tau _e) \bigg \} \prod _{a\in F({\varvec{t}})} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }} \oplus \underline{{\texttt {L}}})_{\delta a}) = \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}})^{|V(\varvec{t})|+|F(\varvec{t})|-|E(\varvec{t})|}\, \frac{\displaystyle \prod _{v\in V(\varvec{t})} \prod _{e\in \delta v{\setminus } E(\varvec{t})} \big [ \dot{\textsf {s}}_e(\underline{{x}})/\mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) \big ] }{\displaystyle \prod _{a\in F(\varvec{t})} \prod _{e\in \delta a{\setminus } E(\varvec{t})} \dot{\textsf {s}}_e(\underline{{x}}) }\,. \end{aligned}$$

For \(a\in F(\varvec{t})\) and \(e\in \delta a{\setminus }\varvec{t}\), the variable v(e) is frozen and so we have \(\dot{\textsf {s}}_e(\underline{{x}})=1\). For any \(v\in V(\varvec{t})\) and \(e\in \delta v{\setminus }\varvec{t}\), we have \(\dot{\textsf {s}}_e(\underline{{x}})=\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\). The tree \(\varvec{t}\) has Euler characteristic one, i.e., \(|V(\varvec{t})|+|F(\varvec{t})|-|E(\varvec{t})|=1\). The right-hand side of the above equation then simplifies to \(\mathrm {\textsf {{size}}}(\underline{{x}};\varvec{T})\), thereby proving the claim.\(\square \)

Corollary 2.10

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles, and let \(\underline{{\tau }}\) be its corresponding message configuration from Lemma 2.7. Then the number of valid nae-sat extensions of \(\underline{{x}}\) is given by the product formula

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}})=\prod _{v\in V} \dot{\varphi }(\underline{\hat{\tau }}_{\delta v}) \prod _{a \in F} \hat{\varphi }^\text {lit}((\underline{\dot{\tau }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E} \bar{\varphi }(\tau _e)\,. \end{aligned}$$

This identity holds as long as \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) are fixed nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\).

Proof

Let \(V'\) denote the set of free variables, and let \(E'\) denote the set of all edges incident to \(V'\). Let \(F'\) be the set of non-separating clauses. From (22) and Lemma 2.9 we have

$$\begin{aligned} \mathrm{{\textsf {{size}}}}(\underline{{x}}) =\prod _{{\varvec{T}}\in {\mathfrak {T}}(\underline{{x}})} \mathrm{{\textsf {{size}}}}(\underline{{x}};{\varvec{T}}) =\prod _{v\in V'} {\dot{\varphi }}(\underline{\hat{\tau }}_{\delta v}) \prod _{a\in F'} {\hat{\varphi }^\text {lit}}((\underline{\dot{\tau }} \oplus {\underline{{\texttt {L}}}})_{\delta a}) \prod _{e\in E'}\bar{\varphi }(\tau _e)\,. \end{aligned}$$
(30)

For any edge \(e\in E{\setminus } E'\), the incident variable v(e) must lie in \(V{\setminus } V'\), meaning \(x_{v(e)}\in \{{\texttt {0}},{\texttt {1}}\}\). We now partition \(E{\setminus } E'\) into the disjoint union of \(E_{\texttt {r}}\) and \(E_{\texttt {b}}\), as follows. Let \(E_{\texttt {r}}\) be the set of edges \(e\in E{\setminus } E'\) such that \(\hat{{\texttt {m}}}_e\) is fully supported on \(x_{v(e)}\). Let \(E_{\texttt {b}}\) be the set of edges \(e\in E{\setminus } E'\) such that \(\hat{{\texttt {m}}}_e\) is a nondegenerate measure on \(\{{\texttt {0}},{\texttt {1}}\}\); note that \(\dot{{\texttt {m}}}_e\) must then be fully supported on \(x_{v(e)}\). Consider a clause \(a\in F{\setminus } F'\). If a is non-forcing, then \(\delta a\cap E_{\texttt {r}}=\varnothing \) and \(\hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a})=1\). Otherwise, a is forcing in the direction of some edge \(e\in \delta a\), in which case \(\delta a \cap E_{\texttt {r}}= \{e\}\) and \(\hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a}) = \dot{{\texttt {m}}}_e(x_{v(e)}) = 1/\bar{\varphi }(\tau _e)\). We conclude for all \(a\in F{\setminus } F'\) that

$$\begin{aligned} \hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in \delta a \cap E_{\texttt {r}}} \bar{\varphi }(\tau _e) = 1\,. \end{aligned}$$
(31)

For \(v\in V{\setminus } V'\), for all \(e\in \delta v\cap E_{\texttt {b}}\) we have \(\dot{{\texttt {m}}}_e(x_v)=1\) and so \(\bar{\varphi }(\tau _e)=1/\hat{{\texttt {m}}}_e(x_v)\). Thus, for all \(v\in V{\setminus } V'\),

$$\begin{aligned} \dot{\varphi }(\underline{{\hat{\tau }}}_{\delta v}) \prod _{e\in \delta v\cap E_{\texttt {b}}} \bar{\varphi }(\tau _e) =1\,. \end{aligned}$$
(32)

The identities (31) and (32) remain valid even for vertices incident to \(\star \) messages, as long as \(\dot{{\texttt {m}}}(\star )\) and \(\hat{{\texttt {m}}}(\star )\) are fixed nondegenerate probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\). Combining the identities with (30) proves the claim.\(\square \)
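The cancellations (31) and (32) can be checked numerically on small toy configurations. In the sketch below the measure values are illustrative choices of ours, and the helper functions follow the definitions in (28): a clause with \(k=3\) forcing one edge, and a frozen variable with \(d=3\) having one forcing (red) edge and two blue edges.

```python
from fractions import Fraction

def dot_phi(hat_ms):
    """phi-dot from (28): sum over x of the product of incoming hat-measures."""
    p0 = p1 = Fraction(1)
    for m in hat_ms:
        p0 *= m[0]
        p1 *= m[1]
    return p0 + p1

def hat_phi_lit(dot_ms):
    """phi-hat from (28): NAE weight 1 - prod m_i(0) - prod m_i(1)."""
    p0 = p1 = Fraction(1)
    for m in dot_ms:
        p0 *= m[0]
        p1 *= m[1]
    return 1 - p0 - p1

def bar_phi(dot_m, hat_m):
    """bar-phi from (28): reciprocal of the overlap of the two measures."""
    return 1 / (dot_m[0] * hat_m[0] + dot_m[1] * hat_m[1])

delta0 = (Fraction(1), Fraction(0))
delta1 = (Fraction(0), Fraction(1))

# (31): a clause forcing edge e -- the other two variables are frozen to 1
# (after literals), so the clause sends hat_m_e = delta_0 along e
dot_m_e = (Fraction(2, 3), Fraction(1, 3))      # arbitrary nondegenerate
assert hat_phi_lit([dot_m_e, delta1, delta1]) * bar_phi(dot_m_e, delta0) == 1

# (32): a frozen variable with x_v = 0 -- one red edge with hat_m = delta_0,
# two blue edges with nondegenerate hat_m and dot_m = delta_0
blue1 = (Fraction(1, 3), Fraction(2, 3))
blue2 = (Fraction(3, 5), Fraction(2, 5))
lhs = dot_phi([delta0, blue1, blue2]) * bar_phi(delta0, blue1) * bar_phi(delta0, blue2)
assert lhs == 1
```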

2.4 Colorings

We conclude this section by defining the coloring model, building on an encoding introduced by [18]. It is a simplification of the message configuration model (Definition 2.6) that takes advantage of some of the cancellations ((31) and (32)) seen above. In short, following the notation of Corollary 2.10, for edges in \(E{\setminus } E'\) it is not necessary to keep all the information of \(\tau _e\); instead, it suffices to keep track only of whether e belongs to \(E_{\texttt {r}}\) or \(E_{\texttt {b}}\), along with the value of \(x_{v(e)}\in \{{\texttt {0}},{\texttt {1}}\}\). The colorings encode precisely this information. The resulting bijection between colorings and message configurations is the last step of (18).

Recall that messages take values \(\tau \equiv (\dot{\tau },\hat{\tau })\in {\mathscr {M}}\equiv \dot{\mathscr {M}}\times \hat{\mathscr {M}}\) (Definition 2.6), and let \(\{{\texttt {f}}\}\subseteq {\mathscr {M}}\) denote the subset of values \(\tau \in {\mathscr {M}}\) where we have both \(\dot{\tau }\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\) and \(\hat{\tau }\in \hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\). Denote \(\Omega \equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}, {\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup \{{\texttt {f}}\}\). Define a projection \(\textsf {S}: {\mathscr {M}} \rightarrow \Omega \) by

$$\begin{aligned} \textsf {S}(\tau )={\left\{ \begin{array}{ll} {\texttt {r}}_{\texttt {0}}&{} \text {if }\hat{\tau }={\texttt {0}},\\ {\texttt {r}}_{\texttt {1}}&{} \text {if }\hat{\tau }={\texttt {1}},\\ {\texttt {b}}_{\texttt {0}}&{} \text {if }\hat{\tau }\ne {\texttt {0}}\text { and }\dot{\tau }={\texttt {0}},\\ {\texttt {b}}_{\texttt {1}}&{} \text {if }\hat{\tau }\ne {\texttt {1}}\text { and }\dot{\tau }={\texttt {1}},\\ \tau &{} \text {otherwise (meaning that }\tau \in \{{\texttt {f}}\}). \end{array}\right. } \end{aligned}$$
(33)

(Note that \(\textsf {S}(\tau )\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) includes the case \(\dot{\tau }=\star \), and \(\textsf {S}(\tau )\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) includes the case \(\hat{\tau }=\star \).) We define a partial inverse to \(\textsf {S}\) as follows. If \(\sigma \in \{{\texttt {f}}\}\) then define \(\tau \equiv \tau (\sigma )\equiv \sigma \equiv (\dot{\sigma },\hat{\sigma })\). If \(\sigma ={\texttt {r}}_{{\varvec{x}}}\) for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\) then define \(\hat{\tau }\equiv \hat{\tau }(\sigma )\equiv {\varvec{x}}\) and leave \(\dot{\tau }(\sigma )\) undefined. If \(\sigma ={\texttt {b}}_{{\varvec{x}}}\) for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\) then define \(\dot{\tau }\equiv \dot{\tau }(\sigma )\equiv {\varvec{x}}\) and leave \(\hat{\tau }(\sigma )\) undefined. For \(\sigma \in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) we denote \((\dot{\sigma },\hat{\sigma })\equiv (\sigma ,\sigma )\). The coloring model is the image of the message configuration model under the projection \(\textsf {S}\), formally given by the following:
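The case order of (33) translates directly into code. The sketch below uses our own string encoding (`"r0"` for \({\texttt {r}}_{\texttt {0}}\), and so on), with free messages kept as pairs of tree encodings:

```python
def S(dot, hat):
    """Projection (33). The r-cases take priority, so testing hat first
    makes the conditions hat != 0 (resp. hat != 1) automatic below."""
    if hat == "0":
        return "r0"
    if hat == "1":
        return "r1"
    if dot == "0":
        return "b0"
    if dot == "1":
        return "b1"
    return (dot, hat)      # a free message: both components are proper trees
```

In particular `S("*", "0")` gives `"r0"` and `S("0", "*")` gives `"b0"`, matching the parenthetical remark above about \(\star \) messages.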

Definition 2.11

(colorings) For \(\underline{{\sigma }}\in \Omega ^d\), abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _d\}\), and define

$$\begin{aligned}{\dot{I}}(\underline{{\sigma }}) \equiv {\left\{ \begin{array}{ll} 1 &{} \text {if }{\texttt {r}}_{\texttt {0}}\in \{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {b}}_{\texttt {0}}\},\\ 1 &{} \text {if }{\texttt {r}}_{\texttt {1}}\in \{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {1}}\},\\ 1 &{} \text {if }\{\sigma _i\}\subseteq \{{\texttt {f}}\}\text { and } \dot{\sigma }_i=\dot{T}((\hat{\sigma }_j)_{j\ne i})\text { for all }i,\\ 0 &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$

For \(\underline{{\sigma }}\in \Omega ^k\), abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _k\}\), and define

$$\begin{aligned}\hat{I}^\text {lit}(\underline{{\sigma }}) ={\left\{ \begin{array}{ll} 1 &{} \text {if }\exists i : \sigma _i={\texttt {r}}_{\texttt {0}}\text { and } \{\sigma _j\}_{j\ne i} =\{{\texttt {b}}_{\texttt {1}}\},\\ 1 &{} \text {if }\exists i : \sigma _i={\texttt {r}}_{\texttt {1}}\text { and } \{\sigma _j\}_{j\ne i} =\{{\texttt {b}}_{\texttt {0}}\},\\ 1 &{} \text {if }\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \subseteq \{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup \{\sigma \in \{{\texttt {f}}\}: \hat{\sigma }={\texttt {s}}\},\\ 1 &{} \text {if }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}}\}\cup \{{\texttt {f}}\}, |\{\sigma _i\}\cap \{{\texttt {f}}\}|\geqslant 2,\text { and } \hat{\sigma }_i=\hat{T}((\dot{\tau }(\sigma _j))_{j\ne i})\\ {} &{}\text { for all }i\text { where } \sigma _i\ne {\texttt {b}}_{\texttt {0}};\\ 1 &{} \text {if }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}, |\{\sigma _i\}\cap \{{\texttt {f}}\}|\geqslant 2,\text { and } \hat{\sigma }_i=\hat{T}((\dot{\tau }(\sigma _j))_{j\ne i})\\ {} &{}\text { for all }i\text { where } \sigma _i\ne {\texttt {b}}_{\texttt {1}};\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

(In the definition of \(\hat{I}^\text {lit}(\underline{{\sigma }})\) we used that if \(\{\sigma _i\}\subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}\), then \(\dot{\tau }(\sigma _i)\) is defined for all i.) On an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), a configuration \(\underline{{\sigma }}\in \Omega ^E\) is a valid coloring if \({\dot{I}}(\underline{{\sigma }}_{\delta v})=1\) for all \(v\in V\), and \(\hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})=1\) for all \(a\in F\).

Lemma 2.12

On any given nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), we have a bijection

$$\begin{aligned} \left\{ \begin{array}{c} \text {message configurations}\\ \underline{{\tau }}\in {\mathscr {M}}^E\end{array}\right\} \longleftrightarrow \left\{ \begin{array}{c} \text {colorings}\\ \underline{{\sigma }}\in \Omega ^E\end{array}\right\} \,.\end{aligned}$$

Proof

Given a valid message configuration \(\underline{{\tau }}\), a valid coloring \(\underline{{\sigma }}\) is obtained by coordinatewise application of the projection map \(\textsf {S}\) from (33). In the other direction, given a valid coloring \(\underline{{\sigma }}\), let \(x_v={\texttt {0}}\) if \(\underline{{\sigma }}_{\delta v}\) has any \({\texttt {r}}_{\texttt {0}}\) entries, \(x_v={\texttt {1}}\) if \(\underline{{\sigma }}_{\delta v}\) has any \({\texttt {r}}_{\texttt {1}}\) entries, and \(x_v={\texttt {f}}\) otherwise. The resulting \(\underline{{x}}\in \{{\texttt {0}},{\texttt {1}},{\texttt {f}}\}^V\) is a valid frozen configuration, and the argument of Lemma 2.7 implies that it has no free cycles. It then maps by Lemma 2.7 to a valid message configuration \(\underline{{\tau }}\), which completes the correspondence.

\(\square \)
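The inverse direction of this proof rests on a purely local rule for reading off \(x_v\) from the colorings of the edges around \(v\). A minimal sketch (hypothetical string encoding of spins, ours rather than the paper's):

```python
# Sketch of the rule from the proof of Lemma 2.12 recovering the frozen
# value x_v from the edge colorings around v.  Spins are strings; any
# entry other than "r0"/"r1" is treated as a b- or f-spin.  In a valid
# coloring the indicator I-dot guarantees that r0 and r1 never both
# appear around the same variable, so the branches below are exclusive.

def variable_value(sigma_delta_v):
    if "r0" in sigma_delta_v:
        return "0"      # v is forced to 0
    if "r1" in sigma_delta_v:
        return "1"      # v is forced to 1
    return "f"          # v is free
```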

Recall the definitions (28) of \(\bar{\varphi }\), \(\hat{\varphi }^\text {lit}\), and \(\dot{\varphi }\). For \(\underline{{\sigma }}\in \Omega ^d\), let

$$\begin{aligned}\dot{\Phi }(\underline{{\sigma }}) ={\left\{ \begin{array}{ll} \dot{\varphi }(\underline{{\hat{\sigma }}}) &{}\text {if }{\dot{I}}(\underline{{\sigma }})=1\text { and } \{\sigma _i\}\subseteq \{{\texttt {f}}\};\\ 1 &{} \text {if }{\dot{I}} (\underline{{\sigma }})=1\text { and }\{\sigma _i\} \subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\};\\ 0 &{} \text {otherwise (meaning that }{\dot{I}}(\underline{{\sigma }})=0). \end{array}\right. }\end{aligned}$$

(Note if \(\{\sigma _i\}\subseteq \{{\texttt {f}}\}\) then \(\underline{{\hat{\sigma }}}=\underline{{\hat{\tau }}}\) and \(\dot{\varphi }(\underline{{\hat{\sigma }}})\) is well-defined.) For \(\underline{{\sigma }}\in \Omega ^k\), let

$$\begin{aligned} \hat{\Phi }^\text {lit}(\underline{{\sigma }})={\left\{ \begin{array}{ll} \hat{\varphi }^\text {lit}((\dot{\tau }(\sigma _i))_i) &{} \text {if } \hat{I}^\text {lit}(\underline{{\sigma }})=1\text { and }\{\sigma _i\} \subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\};\\ 1 &{} \text {if }\hat{I}^\text {lit}(\underline{{\sigma }})=1\text { and }\{\sigma _i\} \cap \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\ne \varnothing ;\\ 0 &{}\text {otherwise (meaning that } \hat{I}^\text {lit}(\underline{{\sigma }})=0).\end{array}\right. } \end{aligned}$$
(34)

(Note if \(\{\sigma _i\}\subseteq \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\cup \{{\texttt {f}}\}\) then \(\dot{\tau }(\sigma _i)\) is well-defined for all i.) Finally, let

$$\begin{aligned}\bar{\Phi }(\sigma ) ={\left\{ \begin{array}{ll} \bar{\varphi }(\sigma ) &{}\text {if }\sigma \in \{{\texttt {f}}\},\\ 1 &{} \text {if }\sigma \in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}, {\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}.\end{array}\right. } \end{aligned}$$

The following is a straightforward consequence of Lemma 2.9:

Lemma 2.13

Suppose on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\) that \(\underline{{x}}\) is a frozen configuration with no free cycles. Let \(\underline{{\sigma }}\) be the coloring that corresponds to \(\underline{{x}}\) by Lemmas 2.7 and 2.12. Then the number of valid nae-sat extensions of \(\underline{{x}}\) is given by \(\mathrm {\textsf {{size}}}(\underline{{x}})=\varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})\) where we define

$$\begin{aligned} \varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})\equiv \prod _{v\in V} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F} \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E} \bar{\Phi }(\sigma _e)\,. \end{aligned}$$
(35)

Proof

This is a rewriting of (30).\(\square \)

Definition 2.14

(T-colorings) If \(\sigma \in \{{\texttt {f}}\}\), then \(\dot{\sigma }\) is a tree rooted at a clause a incident to a single edge e(a), while \(\hat{\sigma }\) is a tree rooted at a variable v incident to a single edge e(v). Glue \(\dot{\sigma }\) and \(\hat{\sigma }\) together by identifying e(a) with e(v), and let \(|\sigma |\) count the number of free variables in the resulting tree. (Note that \(|\sigma |\) must be finite because we only consider colorings of finite nae-sat instances \(\mathscr {G}\).) Thus \(|\sigma |=|\dot{\sigma }|+|\hat{\sigma }|-1\) where \(|\dot{\sigma }|\) is the number of free variables in the tree \(\dot{\sigma }\), and \(|\hat{\sigma }|\) is the number of free variables in the tree \(\hat{\sigma }\). If \(\sigma \in \Omega {\setminus }\{{\texttt {f}}\} =\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\) then define \(|\sigma |\equiv 0\). If \(\underline{{\sigma }}\) is a valid coloring on \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), then \(|\sigma _e|\) must be finite on every edge \(e\in E\), by Definition 2.11. For \(0\leqslant T\leqslant \infty \) define \(\Omega _T\equiv \{\sigma \in \Omega :|\sigma |\leqslant T\}\); we then call \(\underline{{\sigma }}\) a T-coloring if \(\sigma _e\in \Omega _T\) for all \(e\in E\). Define \(\varvec{Z}_{\lambda ,T}\) to be the partition function of \(\lambda \)-tilted T-colorings,

$$\begin{aligned} \varvec{Z}_{\lambda ,T}\equiv \varvec{Z}_{\lambda ,T}(\mathscr {G}) \equiv \sum _{\underline{{\sigma }}\in (\Omega _T)^E} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,. \end{aligned}$$
(36)

Denote \(\varvec{Z}_\lambda \equiv \varvec{Z}_{\lambda ,\infty }\) and note that as \(T\uparrow \infty \) we have \(\varvec{Z}_{\lambda ,T}\uparrow \varvec{Z}_{\lambda ,\infty }\equiv \varvec{Z}_\lambda \).

Proposition 2.15

On an nae-sat instance \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), recall \(\mathrm {\textsf {{CL}}}(\mathscr {G})\) denotes the set of solution clusters (connected components of \(\mathrm {\textsf {{SOL}}}(\mathscr {G})\)), and define

$$\begin{aligned} \bar{\varvec{Z}}_\lambda \equiv \bar{\varvec{Z}}_\lambda (\mathscr {G}) \equiv \sum _{\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})}|\varvec{\gamma }|^\lambda \,. \end{aligned}$$
(37)

Then \(\bar{\varvec{Z}}_\lambda \) is lower bounded by \(\varvec{Z}_\lambda \), where \(\varvec{Z}_\lambda \) is the increasing limit of \(\varvec{Z}_{\lambda ,T}\) as defined by (36).

Proof

On \(\mathscr {G}\), the colorings \(\underline{{\sigma }}\) are in bijection (Lemma 2.12) with the message configurations \(\underline{{\tau }}\), which in turn are in bijection (Lemma 2.7) with the frozen configurations \(\underline{{x}}\) that do not have free cycles. Each such frozen configuration defines a distinct cluster \(\varvec{\gamma }\in \mathrm {\textsf {{CL}}}(\mathscr {G})\), of size \(|\varvec{\gamma }|=\mathrm {\textsf {{size}}}(\underline{{x}})=\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})\). The claimed inequality directly follows.\(\square \)

To summarize what we have obtained so far, note that the quantity \(\bar{\varvec{Z}}_\lambda \) of (37) is a formal definition of the “\(\lambda \)-tilted cluster partition function” introduced in (5). In a sequence (18) of combinatorial mappings, we have produced in (36) a mathematically well-defined quantity \(\varvec{Z}_{\lambda ,T}\) which lower bounds \(\bar{\varvec{Z}}_\lambda \) (Proposition 2.15), and will be much more tractable thanks to the product formula for cluster sizes (Lemma 2.13). The lower bound of Theorem 1 is based on the second moment method applied to \(\varvec{Z}_{\lambda ,T}\). In preparation for the moment calculation, we conclude the current section by discussing some simplifications obtained by averaging over the literals of the nae-sat instance.

2.5 Averaging over edge literals

Our eventual purpose is to calculate \(\mathbb {E}\varvec{Z}_{\lambda ,T}\) and \(\mathbb {E}[(\varvec{Z}_{\lambda ,T})^2]\), where \(\mathbb {E}\) is expectation over the nae-sat instance \(\mathscr {G}\). Recall that \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) where \(\mathcal {G}=(V,F,E)\) is the graph without the edge literals \(\underline{{\texttt {L}}}\). Then \(\mathbb {E}\varvec{Z}_{\lambda ,T}=\mathbb {E}(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G}))\) where

$$\begin{aligned} \mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G}) =\sum _{\underline{{\sigma }}\in (\Omega _T)^E} \mathbb {E}(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,|\,\mathcal {G})\,. \end{aligned}$$

For any \(l\geqslant 1\) and any function \(g:\{{\texttt {0}},{\texttt {1}}\}^l\rightarrow \mathbb {R}\), let \(\mathbb {E}^{\text {lit}} g\) denote the average value of \(g(\underline{{\texttt {L}}})\) over all \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^l\). For any \(\underline{{\sigma }}\in \Omega ^E\), we have \(\mathbb {E}(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \,|\,\mathcal {G})= \varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda \) where (compare (35))

$$\begin{aligned} \varvec{w}_\mathcal {G}(\underline{{\sigma }}) \equiv \prod _{v\in V} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F}\hat{\Phi }(\underline{{\sigma }}_{\delta a}) \prod _{e\in E} \bar{\Phi }(\sigma _e)\,,\quad \hat{\Phi }(\underline{{\sigma }}) \equiv \Big ( \mathbb {E}^{\text {lit}} [\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})^\lambda ] \Big )^{1/\lambda } \end{aligned}$$
(38)

— that is to say, even after averaging over \(\underline{{\texttt {L}}}\), the contribution of each \(\underline{{\sigma }}\in \Omega ^E\) is still given by a product formula. This means that \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) is the partition function of a “factor model”:

Definition 2.16

(factor model) On a bipartite graph \(\mathcal {G}=(V,F,E)\), the factor model specified by \(g\equiv ({\dot{g}},{\hat{g}},{\bar{g}})\) is the probability measure \(\nu _\mathcal {G}\) on configurations \(\underline{{\xi }}\in {\mathscr {X}}^E\) defined by

$$\begin{aligned}\nu _\mathcal {G}(\underline{{\xi }}) = \frac{1}{Z} \prod _{v\in V} {\dot{g}}(\underline{{\xi }}_{\delta v}) \prod _{a\in F} {\hat{g}}(\underline{{\xi }}_{\delta a}) \prod _{e\in E} {\bar{g}}(\xi _e), \end{aligned}$$

with Z the normalizing constant.
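For concreteness, here is a brute-force evaluation of \(\nu _\mathcal {G}\) on a toy instance; the graph, alphabet, and weight functions are all invented for illustration and carry no meaning in the nae-sat model.

```python
from itertools import product

# Brute-force factor model (Definition 2.16) on a toy bipartite graph:
# 2 variables, 1 clause, 2 edges, alphabet {0,1}.  Edge 0 joins v0-a0,
# edge 1 joins v1-a0.  All weight functions are invented for illustration.

X = [0, 1]
var_edges = {0: [0], 1: [1]}     # edges incident to each variable (delta v)
fac_edges = {0: [0, 1]}          # edges incident to the single clause (delta a)

def g_dot(xs): return 1.0                              # variable factor
def g_hat(xs): return 1.0 if xs[0] != xs[1] else 0.5   # clause factor
def g_bar(x):  return 2.0 if x == 1 else 1.0           # edge factor

weights = {}
for xi in product(X, repeat=2):              # one spin per edge
    w = 1.0
    for es in var_edges.values():
        w *= g_dot([xi[e] for e in es])
    for es in fac_edges.values():
        w *= g_hat([xi[e] for e in es])
    for e in range(2):
        w *= g_bar(xi[e])
    weights[xi] = w

Z = sum(weights.values())                    # normalizing constant
nu = {xi: w / Z for xi, w in weights.items()}
```

The same enumeration applied with the specification \((\dot{\Phi },\hat{\Phi },\bar{\Phi })^\lambda \) in place of the toy factors would compute \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) on small instances, though only exponentially slowly.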

A further observation is that for \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) and \(\underline{{\sigma }}\in \Omega ^E\), as we go over all possibilities of \(\underline{{\texttt {L}}}\) while keeping \(\mathcal {G}\) fixed, the weight \(\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})\) (the size of the cluster encoded by \(\underline{{\sigma }}\) on \(\mathscr {G}\), as given by (35)) does not take more than one positive value. In other words, we can extract the cluster size without referring to the edge literals. The precise statement is as follows:

Lemma 2.17

The function \(\hat{\Phi }^\text {lit}\) of (34) can be factorized as \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})\hat{F}(\underline{{\sigma }})\) for

$$\begin{aligned}\hat{F}(\underline{{\sigma }})\equiv {\left\{ \begin{array}{ll} 1 &{}\text {if }\underline{{\sigma }}\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^k,\\ \displaystyle \frac{\hat{z}(\hat{\sigma }_j)}{\bar{\varphi }(\sigma _j)} &{}\text {if } \underline{{\sigma }}\in \Omega ^k\text { has some }\sigma _j\in \{{\texttt {f}}\}\text { (any such }j\text {; see the proof)}. \end{array}\right. } \end{aligned}$$

As a consequence, the function of (38) satisfies \(\hat{\Phi }(\underline{{\sigma }})^\lambda ={\hat{v}}(\underline{{\sigma }})\hat{F}(\underline{{\sigma }})^\lambda \) where \({\hat{v}}(\underline{{\sigma }})\equiv \mathbb {E}^{\text {lit}} [\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})]\).

Proof

For \(\underline{{\sigma }}\in \Omega ^k\) abbreviate \(\{\sigma _i\}\equiv \{\sigma _1,\ldots ,\sigma _k\}\). If \(\{\sigma _i\}\subseteq \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), then the definition (34) implies \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})\) for all \(\underline{{\texttt {L}}}\), so the factorization holds with \(\hat{F}(\underline{{\sigma }})\equiv 1\). If \(\{\sigma _i\}\) nontrivially intersects both \(\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) and \(\{{\texttt {f}}\}\), then \(\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=0\) for all \(\underline{{\texttt {L}}}\), so we can set \(\hat{F}(\underline{{\sigma }})\) arbitrarily. It remains to consider the case where \(\{\sigma _i\}\) nontrivially intersects \(\{{\texttt {f}}\}\) but does not intersect \(\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\). Recalling the discussion around (33), this means that \(\dot{\tau }_i\equiv \dot{\tau }(\sigma _i)\in \dot{\mathscr {M}}{\setminus }\{\star \}\) is well-defined for all i—if \(\sigma _i\in \{{\texttt {f}}\}\) then \(\dot{\tau }_i=\dot{\sigma }_i\in \dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}\), and if \(\sigma _i={\texttt {b}}_{{\varvec{x}}}\) then \(\dot{\tau }_i={\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). Following (23), given any \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\), let us define \(\hat{\tau }_{\underline{{\texttt {L}}},i}\equiv {\texttt {L}}_i\oplus \hat{T}((\dot{\tau }_j\oplus {\texttt {L}}_j)_{j\ne i} )\). If \(\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=1\), then it follows from (25), (28), and (34) that

$$\begin{aligned} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})&=\hat{\varphi }^\text {lit}( \underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}}) =\sum _{\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^k} I^\textsc {nae}(\underline{{{\varvec{x}}}}\oplus \underline{{\texttt {L}}}) \prod _{j=1}^k [\dot{{\texttt {m}}}(\dot{\tau }_j)]({\varvec{x}}_j)\nonumber \\&=\hat{z}(\hat{\tau }_{\underline{{\texttt {L}}},i}) \sum _{{\varvec{x}}_i} [\dot{{\texttt {m}}}(\dot{\tau }_i)]({\varvec{x}}_i) \cdot [\hat{{\texttt {m}}}(\hat{\tau }_{\underline{{\texttt {L}}},i})]({\varvec{x}}_i) =\frac{\hat{z}(\hat{\tau }_{\underline{{\texttt {L}}},i})}{\bar{\varphi }(\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})}\,. \end{aligned}$$
(39)

We will have \(\hat{I}^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=1\) if and only if it holds for all \(1\leqslant i\leqslant k\) that \(\hat{\tau }_{\underline{{\texttt {L}}},i}\) is compatible with \(\sigma _i\), in the sense that \(\textsf {S}(\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})=\sigma _i\). In particular, if \(\sigma _i\in \{{\texttt {f}}\}\) (and we assumed \(\underline{{\sigma }}\) has at least one such entry), we must have \((\dot{\tau }_i,\hat{\tau }_{\underline{{\texttt {L}}},i})=(\dot{\sigma }_i,\hat{\sigma }_i)\). It follows that for any \(\underline{{\sigma }}\in \Omega ^k\) having at least one entry in \(\{{\texttt {f}}\}\), we can define \(\hat{F}(\underline{{\sigma }})\equiv \hat{z}(\hat{\sigma }_i)/\bar{\varphi }(\sigma _i)\) for any i where \(\sigma _i\in \{{\texttt {f}}\}\). This completes the proof.\(\square \)

Corollary 2.18

On a bipartite graph \(\mathcal {G}=(V,F,E)\), suppose \(\underline{{\sigma }}\in \Omega ^E\) satisfies \({\dot{I}}(\underline{{\sigma }}_{\delta v})=1\) for all \(v\in V\). Then, for \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), it follows from (35) that

$$\begin{aligned} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }}) =\bigg \{\prod _{a\in F} \hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})\bigg \} \varvec{W}_\mathcal {G}(\underline{{\sigma }})\,,\quad \varvec{W}_\mathcal {G}(\underline{{\sigma }})\equiv \prod _{v\in V}\dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F}\hat{F}(\underline{{\sigma }}_{\delta a}) \prod _{e\in E}\bar{\Phi }(\sigma _e)\,. \end{aligned}$$

Combining with (38) gives, with \({\hat{v}}\) as defined by Lemma 2.17,

$$\begin{aligned} \varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda =\varvec{p}_\mathcal {G}(\underline{{\sigma }}) \varvec{W}_\mathcal {G}(\underline{{\sigma }})^\lambda \,,\quad \varvec{p}_\mathcal {G}(\underline{{\sigma }}) \equiv \mathbb {E}\bigg [ \prod _{a\in F} \hat{I}^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \,\bigg |\,\mathcal {G}\bigg ] =\prod _{a\in F} {\hat{v}}(\underline{{\sigma }}_{\delta a})\,. \end{aligned}$$

Proof

Immediate consequence of Lemma 2.17.\(\square \)

In the notation of Definition 2.16, the conditional first moment \(\mathbb {E}(\varvec{Z}_{\lambda ,T}\,|\,\mathcal {G})\) is the partition function of the factor model with specification \((\dot{\Phi },\hat{\Phi },\bar{\Phi })^\lambda \) restricted to the alphabet \(\Omega _T\). Similarly, the conditional second moment \(\mathbb {E}[(\varvec{Z}_{\lambda ,T})^2\,|\,\mathcal {G}]\) is the partition function of the factor model on the alphabet \((\Omega _T)^2\) with specification \((\dot{\Phi }_2,\hat{\Phi }_2,\bar{\Phi }_2)^\lambda \), where \(\dot{\Phi }_2\equiv \dot{\Phi }\otimes \dot{\Phi }\), \(\bar{\Phi }_2\equiv \bar{\Phi }\otimes \bar{\Phi }\), and for any \(\underline{{\sigma }}\equiv (\underline{{\sigma }}^1,\underline{{\sigma }}^2)\in \Omega ^{2k}\) we have

$$\begin{aligned} \hat{\Phi }_2(\underline{{\sigma }})\equiv \bigg (\mathbb {E}^{\text {lit}} \Big [ \hat{\Phi }^\text {lit}(\underline{{\sigma }}^1\oplus \underline{{\texttt {L}}})^\lambda \hat{\Phi }^\text {lit}(\underline{{\sigma }}^2\oplus \underline{{\texttt {L}}})^\lambda \Big ]\bigg )^{1/\lambda } ={\hat{v}}_2(\underline{{\sigma }})^{1/\lambda } (\hat{F}\otimes \hat{F})(\underline{{\sigma }})\,, \end{aligned}$$

for \({\hat{v}}_2(\underline{{\sigma }})\equiv \mathbb {E}^{\text {lit}} [\hat{I}^\text {lit}(\underline{{\sigma }}^1\oplus \underline{{\texttt {L}}})\hat{I}^\text {lit}(\underline{{\sigma }}^2\oplus \underline{{\texttt {L}}})]\) (by Corollary 2.18). We emphasize that \(\hat{\Phi },\hat{\Phi }_2\) both depend on \(\lambda \), although we suppress it from the notation. Moreover, \(\hat{\Phi }_2\ne \hat{\Phi }\otimes \hat{\Phi }\) since \(\underline{{\sigma }}^1\) and \(\underline{{\sigma }}^2\) are coupled through their interaction with the same literals \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\). Lastly, we have written \(\underline{{\sigma }}\) in the first moment and \(\underline{{\sigma }}\equiv (\underline{{\sigma }}^1,\underline{{\sigma }}^2)\) in the second moment—this is a deliberate abuse of notation, which allows us to treat the two cases in a unified manner. To distinguish the cases we shall refer to the “first-moment” or “single-copy” model, versus the “second-moment” or “pair” model. We turn next to the analysis of these models.

3 Proof outline

Having formally set up our combinatorial model of nae-sat solution clusters (Sect. 2), we now give a more detailed outline for the (first and second) moment calculation that proves the lower bound of Theorem 1. (As we mentioned before, the upper bound of Theorem 1 is proved by an interpolation argument which builds on prior results in spin glass theory [12, 26, 43]. It does not involve the combinatorial model or the moment method, and is deferred to “Appendix E”.)

3.1 Empirical measures and moments

We use standard multi-index notations in what follows—in particular, for any ordered sequence \(z=(z_1,\ldots ,z_l)\) of nonnegative integers summing to n, we denote

$$\begin{aligned} \left( {\begin{array}{c}n\\ z\end{array}}\right) \equiv n!\bigg / \prod _{i=1}^l z_i!\,. \end{aligned}$$

If \(\pi \) is any nonnegative measure on a discrete space, write \(\mathcal {H}(\pi ) = -\langle \pi ,\ln \pi \rangle \) for its Shannon entropy. It follows from Stirling’s formula that for any fixed \(\pi \), in the limit \(n\rightarrow \infty \) we have

$$\begin{aligned} \left( {\begin{array}{c}n\\ n\pi \end{array}}\right) \asymp \frac{\exp \{n{\mathcal {H}}(\pi )\}}{n^{(|{{\,\mathrm{supp}\,}}\pi |-1)/2}}\,. \end{aligned}$$
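This asymptotic is easy to check numerically. The following sketch (the choice of \(\pi \) is arbitrary and illustrative) confirms that the gap between \(\ln \left( {\begin{array}{c}n\\ n\pi \end{array}}\right) \) and \(n\mathcal {H}(\pi )-\tfrac{1}{2}(|{{\,\mathrm{supp}\,}}\pi |-1)\ln n\) stabilizes to an \(O(1)\) constant as \(n\) grows.

```python
from math import lgamma, log

# Numerical check of the multinomial asymptotic: log binom(n, n*pi)
# should equal n*H(pi) - ((|supp pi| - 1)/2) * log n up to an O(1)
# constant.  The distribution pi below is arbitrary.

def log_multinomial(counts):
    n = sum(counts)
    return lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)

def shannon(pi):
    return -sum(p * log(p) for p in pi if p > 0)

pi = [0.5, 0.3, 0.2]
gaps = {}
for n in [10**3, 10**5]:
    counts = [int(n * p) for p in pi]          # exact integers for these n
    gaps[n] = log_multinomial(counts) - (n * shannon(pi)
                                         - (len(pi) - 1) / 2 * log(n))
# gaps[10**3] and gaps[10**5] agree up to O(1/n) corrections
```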

On a bipartite graph \(\mathcal {G}\), we will summarize colorings \(\underline{{\sigma }}\) according to some “local statistics,” as follows:

Definition 3.1

(empirical measures) Given a bipartite graph \(\mathcal {G}=(V,F,E)\) and \(\underline{{\sigma }}\in \Omega ^E\), define

$$\begin{aligned} \begin{aligned}\dot{H}(\underline{{\dot{\sigma }}})&=|\{v\in V:\underline{{\sigma }}_{\delta v} =\underline{{\dot{\sigma }}}\}|/|V|&\quad \text {for } \underline{{\dot{\sigma }}}\in \Omega ^d,\\ \hat{H}(\underline{{\hat{\sigma }}})&= |\{a\in F: \underline{{\sigma }}_{\delta a} =\underline{{\hat{\sigma }}}\}|/|F|&\quad \text {for } \underline{{\hat{\sigma }}}\in \Omega ^k,\\ \bar{H}(\sigma )&=|\{e\in E: \sigma _e=\sigma \}|/|E|&\quad \text {for } \sigma \in \Omega . \end{aligned} \end{aligned}$$

The triple \(H\equiv H(\mathcal {G},\underline{{\sigma }})\equiv (\dot{H},\hat{H},\bar{H})\) is the empirical measure of \(\underline{{\sigma }}\) on \(\mathcal {G}\).

Recall from (38) that \(\varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda \) is the contribution to \(\mathbb {E}(\varvec{Z}_\lambda \,|\,\mathcal {G})\) from \(\underline{{\sigma }}\in \Omega ^E\). We saw in Corollary 2.18 that \(\varvec{w}_\mathcal {G}(\underline{{\sigma }})^\lambda =\varvec{p}_\mathcal {G}(\underline{{\sigma }})\varvec{W}_\mathcal {G}(\underline{{\sigma }})^\lambda \) where \(\varvec{p}_\mathcal {G}(\underline{{\sigma }})\) is the probability (conditional on \(\mathcal {G}\)) that \(\underline{{\sigma }}\) is a valid coloring on \((\mathcal {G},\underline{{\texttt {L}}})\); and \(\varvec{W}_\mathcal {G}(\underline{{\sigma }})\) is the size of the cluster encoded by \(\underline{{\sigma }}\), provided \(\underline{{\sigma }}\) is valid. Now all these quantities can be expressed solely in terms of \(H=H(\mathcal {G},\underline{{\sigma }})\): we have \(\varvec{p}_\mathcal {G}(\underline{{\sigma }})=\exp (n\varvec{v}(H))\) and \(\varvec{W}_\mathcal {G}(\underline{{\sigma }})=\exp (n\varvec{s}(H))\) where

$$\begin{aligned} \varvec{v}(H)&\equiv (d/k) \langle \ln {\hat{v}},\hat{H}\rangle =(d/k) \sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}(\underline{{\sigma }}) \ln {\hat{v}}(\underline{{\sigma }}) \,,\\ \varvec{s}(H)&\equiv \langle \ln \dot{\Phi },\dot{H}\rangle +(d/k)\langle \ln \hat{F}, \hat{H}\rangle +d\langle \ln \bar{\Phi },\bar{H}\rangle \,. \end{aligned}$$

Given \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), let \(\varvec{Z}_{\lambda ,T}(H)\) be the contribution to \(\varvec{Z}_{\lambda ,T}\) from colorings \(\underline{{\sigma }}\in (\Omega _T)^E\) such that \(H(\mathcal {G},\underline{{\sigma }})=H\). In what follows we will often suppress the dependence on \(\lambda \) and T, and write simply \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\).

Definition 3.2

(simplex) For d, k, T fixed, the simplex of empirical measures is the space \(\varvec{\Delta }\equiv \varvec{\Delta }(T)\) of triples \(H\equiv (\dot{H},\hat{H},\bar{H})\) satisfying the following conditions: \(\dot{H}\) is a probability measure supported within the set of \(\underline{{\sigma }}\in (\Omega _T)^d\) such that \({\dot{I}}(\underline{{\sigma }})=1\); \(\hat{H}\) is a probability measure supported within the set of \(\underline{{\sigma }}\in (\Omega _T)^k\) such that \({\hat{v}}(\underline{{\sigma }})\) is positive; and both \(\dot{H}\) and \(\hat{H}\) must have marginal \(\bar{H}\), that is,

$$\begin{aligned} \frac{1}{d}\sum _{\underline{{\sigma }}\in \Omega ^d} \dot{H}(\underline{{\sigma }})\sum _{i=1}^d \mathbf {1}\{\sigma _i=\sigma \} =\bar{H}(\sigma ) =\frac{1}{k}\sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}(\underline{{\sigma }})\sum _{j=1}^k \mathbf {1}\{\sigma _j=\sigma \} \end{aligned}$$
(40)

for all \(\sigma \in \Omega \). It follows that \(\bar{H}\) is a probability measure supported on \(\Omega _T\).
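The marginal condition (40) is a linear constraint that is straightforward to verify in examples. A toy check (all numbers invented, with a two-symbol stand-in for \(\Omega _T\) and \(d=k=2\)):

```python
from itertools import product

# Toy verification of the marginal condition (40).  The alphabet, d, k,
# and all probabilities are invented for illustration.

Omega = ["b0", "b1"]
d, k = 2, 2

bar_H = {"b0": 0.6, "b1": 0.4}
# Product-form choices of H-dot and H-hat automatically have marginal bar_H:
dot_H = {s: bar_H[s[0]] * bar_H[s[1]] for s in product(Omega, repeat=d)}
hat_H = {s: bar_H[s[0]] * bar_H[s[1]] for s in product(Omega, repeat=k)}

def marginal(H, arity):
    """Either side of (40): the average one-coordinate marginal of H."""
    out = dict.fromkeys(Omega, 0.0)
    for tup, p in H.items():
        for i in range(arity):
            out[tup[i]] += p / arity
    return out
```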

It follows from Corollary 2.18 that if \(\mathbb {E}\) is expectation over a (dk)-regular nae-sat instance on n variables, then \(\mathbb {E}\varvec{Z}(H)\) is positive if and only if \(H\in \varvec{\Delta }\) and \((n\dot{H},m\hat{H})\) is integer-valued. For such H, it follows from the definition of the random regular nae-sat graph that

$$\begin{aligned} \mathbb {E}\varvec{Z}(H) =\bigg [\bigg \{\left( {\begin{array}{c}n\\ n\dot{H}\end{array}}\right) \left( {\begin{array}{c}m\\ m\hat{H}\end{array}}\right) \bigg / \left( {\begin{array}{c}nd\\ nd\bar{H}\end{array}}\right) \bigg \} \exp \{n\varvec{v}(H)\}\bigg ] \cdot \exp \{ n\lambda \varvec{s}(H) \}\,. \end{aligned}$$
(41)

In (41), the first factor (in square brackets) is the expected number \(\mathbb {E}\varvec{Z}_{\lambda =0,T}\) of valid colorings with empirical profile H. The remaining factor \(\exp \{ n\lambda \varvec{s}(H) \}\) is explained by the fact that any such coloring encodes a cluster of size \(\exp \{n\varvec{s}(H)\}\). By Stirling’s formula, in the limit \(n\rightarrow \infty \) (with T fixed),

$$\begin{aligned} \mathbb {E}\varvec{Z}_{\lambda =0,T} \asymp \frac{\exp \{n [\mathcal {H}(\dot{H}) + (d/k)\mathcal {H}(\hat{H}) -d\mathcal {H}(\bar{H}) +\varvec{v}(H)]\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n\varvec{\Sigma }(H)\}}{n^{\wp (H)/2}} \end{aligned}$$

where \(\wp (H)\equiv |{{\,\mathrm{supp}\,}}\dot{H}|+|{{\,\mathrm{supp}\,}}\hat{H}|-|{{\,\mathrm{supp}\,}}\bar{H}|-1\), and the exponential rate \(\varvec{\Sigma }(H)\) is a formal analogue of the “cluster complexity” function \(\Sigma (s)\) appearing in (4). In analogy with (6) we let

$$\begin{aligned} \varvec{F}\equiv \varvec{F}_{\lambda ,T} \equiv \varvec{\Sigma }(H)+\lambda \varvec{s}(H)\,. \end{aligned}$$
(42)

Then altogether the first moment can be estimated as

$$\begin{aligned} \mathbb {E}\varvec{Z}(H) \asymp \bigg (\frac{\exp \{n\varvec{\Sigma }(H)\}}{n^{\wp (H)/2}}\bigg ) \exp \{n\lambda \varvec{s}(H)\} =\frac{\exp \{n\varvec{F}(H)\}}{n^{\wp (H)/2}}\,. \end{aligned}$$
(43)

Note that \(\asymp \) hides a dependence on T, since we keep T fixed throughout our moment analysis.

3.2 Outline of first moment

For any subset of empirical measures \({\mathbf {H}}\subseteq \varvec{\Delta }\), we write \(\underline{{\sigma }}\in {\mathbf {H}}\) to indicate that \(H(\mathcal {G},\underline{{\sigma }})\in {\mathbf {H}}\), and write \(\varvec{Z}({\mathbf {H}})\equiv \varvec{Z}_{\lambda ,T}({\mathbf {H}})\) for the contribution to \(\varvec{Z}_{\lambda ,T}\) from colorings \(\underline{{\sigma }}\in {\mathbf {H}}\). It then follows from (43) that

$$\begin{aligned}\mathbb {E}\varvec{Z}({\mathbf {H}}) =\sum _{H\in {\mathbf {H}}}\mathbb {E}\varvec{Z}(H) =n^{O(1)} \exp \Big \{n\max \{\varvec{F}(H):H\in {\mathbf {H}}\} \Big \}\,, \end{aligned}$$

for \(\varvec{F}\) as in (42). Thus, calculating the first moment \(\mathbb {E}\varvec{Z}\) essentially reduces to the problem of maximizing \(\varvec{F}\) over \(\varvec{\Delta }\). The physics theory suggests that \(\varvec{F}\) is uniquely maximized at a point \(H_\star \in \varvec{\Delta }\) which is given explicitly in terms of a replica symmetric fixed point for the \(\lambda \)-tilted T-coloring model. (Recall from §1.5 that in the original nae-sat model, the replica symmetric fixed point was described by the measure \({\texttt {m}}=\text {unif}(\{{\texttt {0}},{\texttt {1}}\})\). In the coloring model, with spins \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega _T\), the replica symmetric fixed point will be characterized by a measure \({\dot{q}}\) on the space \(\Omega _T\) of possible values for \(\dot{\sigma }\).)

There are several obstacles to the rigorous moment computation. From a physics perspective, the replica symmetric fixed point of the \(\lambda \)-tilted coloring model at \(T=\infty \) is equivalent to the fixed point described by Proposition 1.2, which was used to define the 1rsb prediction (12). Mathematically, however, we work with T finite so that \(\varvec{\Delta }\) has finite dimension and \(\wp (H)\) is defined. Therefore we need to explicitly construct a replica symmetric fixed point at finite T, and use it to define \( H_\star =H_{\lambda ,T}\in \varvec{\Delta }\) (Definition 5.6 below). We must then take \(T\rightarrow \infty \) and show that the limit matches the fixed point of Proposition 1.2, so that \(\varvec{F}(H_\star )= \varvec{F}_{\lambda ,T}(H_{\lambda ,T})\) converges as \(T\rightarrow \infty \) to the 1rsb prediction (12). The construction of the fixed point at finite T is stated in Proposition 5.5 below, and proved in “Appendix A”. The correspondence with (12) in the \(T=\infty \) limit is stated in Proposition 3.13 below, and proved in “Appendix B”.

A more difficult problem is to show that \(\varvec{F}\) is in fact maximized at \(H_\star \). The function \(\varvec{F}\) is generally not convex, and must be optimized over a space \(\varvec{\Delta }\) whose dimension grows with d, k, and T. Moreover, an analogous but even more difficult optimization must be solved to compute the second moment \(\mathbb {E}(\varvec{Z}^2)\). The main part of this analysis is carried out in Sects. 4 and 5. In the remainder of this section we make some preparatory calculations and explain how the pieces will be fit together to prove the main result Theorem 1. Recall from Remark 1.1 that we can restrict consideration to \(\alpha \) satisfying (3). In this regime, we make a priori estimates to show that any optimal H satisfies some basic restrictions. Abbreviate \(\{{\texttt {r}}\}\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\), recall \(\{{\texttt {f}}\}\equiv \Omega {\setminus }\{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), and let

$$\begin{aligned} \mathbf {N}_\circ&\equiv \bigg \{H\in \varvec{\Delta }: \max \{ \bar{H}({\texttt {f}}), \bar{H}({\texttt {r}}) \} \leqslant \frac{7}{2^k} \bigg \}\,\,,\nonumber \\ \mathbf {N}&=\bigg \{ H\in \mathbf {N}_\circ : \Vert {H-H_\star } \Vert \leqslant \frac{1}{n^{1/3}} \bigg \} \subseteq \mathbf {N}_\circ \,, \end{aligned}$$
(44)

where \(\Vert {\cdot } \Vert \) denotes the \(\ell ^1\) norm throughout this paper. We next show that empirical measures \(H\notin \mathbf {N}_\circ \) give a negligible contribution to the first moment:

Lemma 3.3

For \(k\geqslant k_0\), \(\alpha \equiv d/k\) satisfying (3), and \(0\leqslant \lambda \leqslant 1\), \(\mathbb {E}\varvec{Z}(\varvec{\Delta }{\setminus }\mathbf {N}_\circ )\) is exponentially small in n.

Proof

Let \(Z^{\texttt {f}}\) count the nae-sat solutions \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) which map—via coarsening and the bijection (18)—to warning configurations \(\underline{{y}}\) with more than \(7/2^k\) fraction of edges e such that \({\dot{y}}_e={\hat{y}}_e={\texttt {f}}\). Similarly, let \( Z^{\texttt {r}}\) count nae-sat solutions \(\underline{{{\varvec{x}}}}\) mapping to warning configurations \(\underline{{y}}\) with more than \(7/2^k\) fraction of edges e such that \({\hat{y}}_e\in \{{\texttt {0}},{\texttt {1}}\}\). It follows from Proposition 2.15 that for any \(0\leqslant \lambda \leqslant 1\) we have \(\varvec{Z}(\varvec{\Delta }{\setminus }\mathbf {N}_\circ )\leqslant Z^{\texttt {f}}+ Z^{\texttt {r}}\). For \(\alpha \) satisfying (3), \(\mathbb {E}Z^{\texttt {f}}\) is exponentially small in n by [25, Propn. 2.2]. As for \(Z^{\texttt {r}}\), let us say that an edge \(e\in E\) is blocked under \(\underline{{{\varvec{x}}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) if \({\texttt {L}}_e \oplus x_{v(e)}= {\texttt {1}}\oplus {\texttt {L}}_{e'} \oplus x_{v(e')}\) for all \(e'\in \delta a(e) {\setminus } e\). Note that if \(\underline{{{\varvec{x}}}}\) maps to \(\underline{{y}}\), the only possibility for \(y_e\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) is that e was blocked under \(\underline{{{\varvec{x}}}}\). (The converse need not hold.) If we condition on \(\underline{{{\varvec{x}}}}\) being a valid nae-sat solution, then each clause contains a blocked edge independently with chance \(\theta = 2k/(2^k-2)\); note also that a clause can contain at most one blocked edge. It follows that

$$\begin{aligned} \mathbb {E}Z^{\texttt {r}}\leqslant (\mathbb {E}Z) \mathbb {P}\bigg ({\mathrm {Bin}}( m, \theta ) \geqslant 7 nd/2^k\bigg )\,. \end{aligned}$$

This is exponentially small in n by a Chernoff bound together with the trivial bound \(\mathbb {E}Z \leqslant 2^n\).\(\square \)
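The final step can be checked numerically. The sketch below (with the clause density taken, purely for illustration, at the threshold scale \(\alpha \approx 2^{k-1}\ln 2\), which is our reading of the regime (3)) computes the per-variable binomial large-deviation rate and confirms that it exceeds \(\ln 2\), so that the Chernoff factor overwhelms the trivial bound \(\mathbb {E}Z\leqslant 2^n\):

```python
import math

def kl(a: float, p: float) -> float:
    """Relative entropy KL(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def chernoff_rate_per_variable(k: int) -> float:
    # Clause density alpha = d/k near the threshold scale (illustrative assumption).
    alpha = 2 ** (k - 1) * math.log(2)        # so the number of clauses is m = n * alpha
    theta = 2 * k / (2 ** k - 2)              # chance a clause contains a blocked edge
    target = 7 * k / 2 ** k                   # 7nd/2^k blocked edges out of m = nd/k clauses
    # P(Bin(m, theta) >= m * target) <= exp(-m * KL(target || theta)),
    # so the exponential decay rate per variable is alpha * KL(target || theta).
    return alpha * kl(target, theta)

for k in range(10, 31):
    # The rate must beat ln 2, the growth rate of the trivial bound E Z <= 2^n.
    assert chernoff_rate_per_variable(k) > math.log(2)
```

Already for \(k=10\) the rate is an order of magnitude larger than \(\ln 2\), and it grows linearly in k.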

We assume throughout what follows that \(k\geqslant k_0\), \(\alpha \) satisfies (3), and \(0\leqslant \lambda \leqslant 1\). In this regime, Lemma 3.3 tells us that \(\max \{\varvec{F}(H) : H\notin \mathbf {N}_\circ \}\) is negative. On the other hand, we shall assume that the global maximum of \(\varvec{F}\) is nonnegative, since otherwise \(\mathbb {E}\varvec{Z}\) is exponentially small in n and there is nothing to prove. From this we have that any maximizer H of \(\varvec{F}\) must lie in \(\mathbf {N}_\circ \). In Sects. 4 and 5 we develop a more refined analysis to solve the optimization problem for \(\varvec{F}\) restricted to \(\mathbf {N}_\circ \):

Proposition 3.4

(proved in Sect. 5) Assuming the global maximum of \(\varvec{F}\) is nonnegative, the unique maximizer of \(\varvec{F}\) is an explicitly characterized point \(H_\star \) in the interior of \(\mathbf {N}_\circ \). Moreover, there is a positive constant \(\epsilon =\epsilon (k,\lambda ,T)\) so that for all \(\Vert {H-H_\star } \Vert \leqslant \epsilon \) we have \(\varvec{F}(H) \leqslant \varvec{F}(H_\star )-\epsilon \Vert {H-H_\star } \Vert ^2\).

A consequence of the above is that we can compute the first moment of \(\varvec{Z}\) up to constant factors. In the following, let \(\dot{\wp }\equiv \dot{\wp }(T)\) count the number of d-tuples \(\underline{{\sigma }}\in (\Omega _T)^d\) for which \({\dot{I}}(\underline{{\sigma }})>0\). Let \(\hat{\wp }\equiv \hat{\wp }(T)\) count the number of k-tuples \(\underline{{\sigma }}\in (\Omega _T)^k\) for which \({\hat{v}}(\underline{{\sigma }})>0\). Let \(\bar{\wp }\equiv |\Omega _T|\), and denote \(\wp \equiv \dot{\wp }+\hat{\wp }-\bar{\wp }-1\). Recall from (44) the definition of \(\mathbf {N}\).

Corollary 3.5

The coloring partition function \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\) has first moment \(\mathbb {E}\varvec{Z}\asymp \exp \{ n \varvec{F}(H_\star ) \}\). Moreover the expectation is dominated by \(\mathbf {N}\) in the sense that \(\mathbb {E}\varvec{Z}(\mathbf {N})=(1-o(1)) \mathbb {E}\varvec{Z}\).

Proof

Define the \(\bar{\wp }\times \dot{\wp }\) matrix \({\dot{M}}\) with entries

$$\begin{aligned}{\dot{M}}(\sigma ',\underline{{\sigma }}) \equiv \sum _{i=1}^d \mathbf {1}\{\sigma _i=\sigma '\}\,,\quad \sigma '\in \Omega _T\text { and } \underline{{\sigma }}\in (\Omega _T)^d\cap ({{\,\mathrm{supp}\,}}{\dot{I}})\,. \end{aligned}$$

Similarly define the \(\bar{\wp }\times \hat{\wp }\) matrix \({\hat{M}}\) with entries

$$\begin{aligned}{\hat{M}}(\sigma ',\underline{{\sigma }}) =\sum _{i=1}^k \mathbf {1}\{\sigma _i=\sigma '\}\,,\quad \sigma '\in \Omega _T\text { and } \underline{{\sigma }}\in (\Omega _T)^k\cap ({{\,\mathrm{supp}\,}}{\hat{v}})\,. \end{aligned}$$

Lastly define the \(\bar{\wp }\times (\dot{\wp }+\hat{\wp })\) matrix \(M \equiv \begin{pmatrix}{\dot{M}}&-{\hat{M}}\end{pmatrix}\). It follows from the discussion after Definition 3.2 that \(\mathbb {E}\varvec{Z}(H)\) is positive if and only if (i) \(\dot{H}\) and \(\hat{H}\) are nonnegative; (ii) \(\langle {\mathbf {1}},\dot{H}\rangle =1\); (iii) \((k\dot{H},d\hat{H})\) lies in the kernel of M; and (iv) \((n\dot{H},m\hat{H})\) is integer-valued. Conditions (i)–(iii) are equivalent to \(H\in \varvec{\Delta }\). One can verify that the matrix M has full rank, from which it follows that the space of vectors \((\dot{H},\hat{H})\) satisfying the conditions (ii) and (iii) has dimension \(\wp \). In Lemma 4.4 we will show that M satisfies a stronger condition, which implies that the space of \((\dot{H},\hat{H})\) satisfying (ii)–(iv) is an affine transformation of \((n^{-1}{\mathbb {Z}})^\wp \), where the coefficients of the transformation are uniformly bounded. Then, by substituting the result of Proposition 3.4 into (43), we conclude

$$\begin{aligned} \mathbb {E}\varvec{Z}\asymp \sum _{z\in (n^{-1}{\mathbb {Z}})^\wp } \frac{\exp \{ n[\varvec{F}(H_\star )- \Theta (\Vert {z} \Vert ^2) ]\}}{n^{\wp /2}} \asymp \exp \{n\varvec{F}(H_\star )\}\,. \end{aligned}$$

Empirical measures \(H\notin \mathbf {N}\) correspond to vectors \(z\in (n^{-1}{\mathbb {Z}})^\wp \) with norm \(\Vert {z} \Vert \geqslant n^{-1/3}\). These give a negligible contribution to the above sum, which proves the second claim \(\mathbb {E}\varvec{Z}(\mathbf {N})=(1-o(1))\mathbb {E}\varvec{Z}\).\(\square \)
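The Laplace-type lattice sum in the proof can be illustrated in the one-dimensional case \(\wp =1\) (a toy check, with an arbitrary curvature constant c standing in for the \(\Theta (\cdot )\) coefficient): the sum over \(z\in n^{-1}{\mathbb {Z}}\) of \(\exp \{-ncz^2\}\), normalized by \(n^{1/2}\), converges to the Gaussian integral \((\pi /c)^{1/2}\), so the n-dependence cancels exactly as claimed.

```python
import math

def lattice_gaussian_sum(n: int, c: float = 2.0) -> float:
    """n^{-1/2} * sum over z in n^{-1}Z of exp(-n*c*z^2), truncated where negligible."""
    radius = int(10 * math.sqrt(n / c)) + 1
    total = sum(math.exp(-c * j * j / n) for j in range(-radius, radius + 1))
    return total / math.sqrt(n)

# As n grows this is a Riemann sum for the Gaussian integral (pi/c)^{1/2},
# so the n^{wp/2} normalization leaves a quantity of constant order.
for n in (100, 1000, 10000):
    assert abs(lattice_gaussian_sum(n) - math.sqrt(math.pi / 2.0)) < 1e-6
```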

3.3 Outline of second moment

By a calculation similar to the one above, computing the second moment \(\mathbb {E}(\varvec{Z}^2)\) reduces to the problem of maximizing a function \(\varvec{F}_2\equiv \varvec{F}_{2,\lambda ,T}\) over a space \(\varvec{\Delta }_2\) of pair empirical measures. In fact we will calculate the second moment not of \(\varvec{Z}\) itself, but rather of a more restricted random variable \(\varvec{S}\leqslant \varvec{Z}\), defined below. This leads to a more tractable analysis, as we now explain.

Concretely, \(\varvec{\Delta }_2\) is the space of triples \(H=(\dot{H},\hat{H},\bar{H})\) satisfying the following conditions: \(\dot{H}\) is a probability measure on \((\Omega _T)^{2d}\cap ({{\,\mathrm{supp}\,}}{\dot{I}})^2\), \(\hat{H}\) is a probability measure on \((\Omega _T)^{2k}\cap ({{\,\mathrm{supp}\,}}{\hat{v}}_2)\), and \(\bar{H}\) is a probability measure on \((\Omega _T)^2\) which can be obtained from both \(\dot{H}\) and \(\hat{H}\) as the edge marginal (40). Repeating the derivation of (43) in the second moment setting, we have the following. For any \(H\in \varvec{\Delta }_2\), the expected number of valid coloring pairs \((\underline{{\sigma }},\underline{{\sigma }}')\in (\Omega _T)^{2E}\) of type H is

$$\begin{aligned} \mathbb {E}(\varvec{Z}_{\lambda =0,T})^2(H) \asymp \frac{\exp \{n[\mathcal {H}(\dot{H}) + (d/k)\mathcal {H}(\hat{H})-d\mathcal {H}(\bar{H}) +\varvec{v}_2(H)]\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n\varvec{\Sigma }_2(H)\}}{n^{\wp (H)/2}} \end{aligned}$$

Given a pair empirical measure \(H\in \varvec{\Delta }_2\), take the marginal on the first element of the pair to define the first-copy marginal \(H^1\in \varvec{\Delta }\); define likewise the second-copy marginal \(H^2\in \varvec{\Delta }\). The contribution to \(\varvec{Z}^2\) from any valid pair of type H is given by \(\exp \{n\lambda \varvec{s}_2(H)\}\) where \(\varvec{s}_2(H)\equiv \varvec{s}(H^1)+\varvec{s}(H^2)\). Thus \(\varvec{F}_2\) is given explicitly by

$$\begin{aligned} \mathbb {E}\varvec{Z}^2(H) \asymp \frac{\exp \{n\varvec{F}_2(H)\}}{n^{\wp (H)/2}} \equiv \frac{\exp \{n [\varvec{\Sigma }_2(H)+\lambda \varvec{s}_2(H)] \}}{n^{\wp (H)/2}} \end{aligned}$$
(45)

(cf. (42) and (43)). In view of Corollary 3.5, it would suffice for our purposes to calculate the second moment of \(\varvec{Z}(\mathbf {N})\) rather than \(\varvec{Z}\), which amounts to maximizing \(\varvec{F}_2\) on the restricted set

$$\begin{aligned}\mathbf {N}_2 \equiv \Big \{ H\in \varvec{\Delta }_2 : H^1\in \mathbf {N}\text { and } H^2\in \mathbf {N}\Big \}\,.\end{aligned}$$

Following [18] we can simplify the analysis by a further restriction, as follows:

Definition 3.6

(separability) If \(\underline{{\sigma }},\underline{{\sigma }}'\) are valid colorings on \(\mathscr {G}\), define their separation \(\mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}')\) to be the fraction of variables where their corresponding frozen configurations (from Lemmas 2.7 and 2.12) differ. Write \(\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\) if the frozen configuration of \(\underline{{\sigma }}'\) has more free variables than that of \(\underline{{\sigma }}\). We say that a coloring \(\underline{{\sigma }}\) is separable if \(\underline{{\sigma }}\in \mathbf {N}\) (recall this means \(H(\mathcal {G},\underline{{\sigma }})\in \mathbf {N}\)) and

$$\begin{aligned}|\{\underline{{\sigma }}'\in \mathbf {N}: \underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\text { and } \mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \notin I_\text {se} \}| \leqslant \exp \{ (\ln n)^4 \}, \end{aligned}$$

where \(I_\text {se}\equiv [(1-k^4/2^{k/2})/2, (1+k^4/2^{k/2})/2]\) and it is implicit that both \(\underline{{\sigma }}\) and \(\underline{{\sigma }}'\) are valid on \(\mathscr {G}\). Let \(\varvec{S}\equiv \varvec{S}_{\lambda ,T}\) be the contribution to \(\varvec{Z}(\mathbf {N})\) from separable colorings.

Proposition 3.7

(proved in “Appendix D”) The first moment is dominated by separable colorings in the sense that \(\mathbb {E}\varvec{S}= (1-o(1)) \mathbb {E}\varvec{Z}(\mathbf {N})\).

We will apply the second moment method to lower bound \(\varvec{S}\); the result will follow since \(\varvec{S}\leqslant \varvec{Z}(\mathbf {N})\leqslant \varvec{Z}\). For \(H\in \varvec{\Delta }_2\), all pairs \((\underline{{\sigma }},\underline{{\sigma }}')\) of type H have the same separation \(\mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}')\), which we therefore denote \(\mathrm {\textsf {{sep}}}(H)\). Partition \(\mathbf {N}_2\) into \(\mathbf {N}_\text {se}\equiv \{H\in \mathbf {N}_2:\mathrm {\textsf {{sep}}}(H) \in I_\text {se}\}\) (the near-uncorrelated regime) and \(\mathbf {N}_\text {ns}\equiv \mathbf {N}_2{\setminus }\mathbf {N}_\text {se}\) (the correlated regime). Denote the corresponding contributions to \(\varvec{S}^2\) by \(\varvec{S}^2(\mathbf {N}_\text {se})\) and \(\varvec{S}^2(\mathbf {N}_\text {ns})\).

Corollary 3.8

For separable colorings, the second moment contribution from the correlated regime \(\mathbf {N}_\text {ns}\) is bounded by \(\mathbb {E}[\varvec{S}^2(\mathbf {N}_\text {ns})] \leqslant \exp \{n \lambda \varvec{s}(H_\star ) + o(n) \} \,\mathbb {E}\varvec{S}\).

Proof

By the symmetry between the roles of \(\underline{{\sigma }}\) and \(\underline{{\sigma }}'\), and the definition of separability, we have

$$\begin{aligned}\varvec{S}^2(\mathbf {N}_\text {ns}) \leqslant 2\sum _{\begin{array}{c} (\underline{{\sigma }},\underline{{\sigma }}')\in \mathbf {N}_\text {ns},\\ \underline{{\sigma }}\text { separable} \end{array}} \mathbf {1}\{\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\} \varvec{w}^\text {lit}_{\mathscr {G},T}(\underline{{\sigma }})^\lambda \varvec{w}^\text {lit}_{\mathscr {G},T}(\underline{{\sigma }}')^\lambda \\ \leqslant \exp \{ n\varvec{s}(H_\star )\lambda +o(n) \} \varvec{S}\,.\end{aligned}$$

Taking expectations gives the claim.\(\square \)

We then conclude the second moment calculation by computing \(\mathbb {E}(\varvec{Z}^2(\mathbf {N}_\text {se}))\), which is an upper bound on \(\mathbb {E}(\varvec{S}^2(\mathbf {N}_\text {se}))\). Therefore we must maximize the function \(\varvec{F}_2\) on \(\mathbf {N}_\text {se}\). As in the first moment, the physics theory suggests that the unique maximizer of \(\varvec{F}_2\) on \(\mathbf {N}_\text {se}\) is given by a specific pair empirical measure \(H_\bullet \) which is defined in terms of \(H_\star \). In the nae-sat model there is a small complication in this definition: we will have \(\dot{H}_\bullet =\dot{H}_\star \otimes \dot{H}_\star \) and \(\bar{H}_\bullet =\bar{H}_\star \otimes \bar{H}_\star \), but \(\hat{H}_\bullet \ne \hat{H}_\star \otimes \hat{H}_\star \) because the first and second copies interact via the edge literals. It therefore requires a small calculation to argue that \(\varvec{F}_2(H_\bullet )=2\varvec{F}(H_\star )\). We address this by giving a simple sufficient condition for \(H\in \varvec{\Delta }_2\) to satisfy \(\varvec{F}_2(H)=\varvec{F}(H^1)+\varvec{F}(H^2)\) where \(H^j\in \varvec{\Delta }\) are its single-copy marginals.

Lemma 3.9

Consider \(H\in \varvec{\Delta }_2\) with single-copy marginals \(H^1,H^2\in \varvec{\Delta }\). Suppose there are functions \(g^1,g^2\) which are invariant to literals in the sense that \(g^j(\underline{{\sigma }}^j)=g^j(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})\) for all \(\underline{{\sigma }}^j\in (\Omega _T)^k\), \(\underline{{\texttt {L}}}\in \{{\texttt {0}},{\texttt {1}}\}^k\), and

$$\begin{aligned}\hat{H}^j(\underline{{\sigma }}^j)={\hat{v}}(\underline{{\sigma }}^j)g^j(\underline{{\sigma }}^j)\,,\quad \hat{H}(\underline{{\sigma }}^1,\underline{{\sigma }}^2) ={\hat{v}}_2(\underline{{\sigma }}^1,\underline{{\sigma }}^2) \prod _{j=1,2}g^j(\underline{{\sigma }}^j)\,. \end{aligned}$$

If in addition \(\dot{H}=\dot{H}^1\otimes \dot{H}^2\) and \(\bar{H}=\bar{H}^1\otimes \bar{H}^2\), then \(\varvec{F}_2(H)=\varvec{F}(H^1)+\varvec{F}(H^2)\).

Proof

Let \(K^j(\underline{{\sigma }}^j,\underline{{\texttt {L}}})\equiv \hat{I}^\text {lit}(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})g^j(\underline{{\sigma }}^j)/2^k\). This defines a probability measure on \((\Omega _T)^k\times \{{\texttt {0}},{\texttt {1}}\}^k\) where the marginal on \((\Omega _T)^k\) is \(\hat{H}^j\), and the marginal on \(\{{\texttt {0}},{\texttt {1}}\}^k\) is uniform by the assumption on \(g^j\). It follows that \(K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}}) = \hat{I}^\text {lit}(\underline{{\sigma }}^j\oplus \underline{{\texttt {L}}})g^j(\underline{{\sigma }}^j)\) and \(\hat{H}^j(\underline{{\sigma }}^j)=\mathbb {E}^{\text {lit}} [K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}})]\). Let \((X^1,X^2,L)\) be a random variable with law

$$\begin{aligned} \mathbb {P}((X^1,X^2,L)=(\underline{{\sigma }}^1,\underline{{\sigma }}^2,\underline{{\texttt {L}}})) =\frac{1}{2^k}\prod _{j=1,2} K^j(\underline{{\sigma }}^j\,|\,\underline{{\texttt {L}}})\,. \end{aligned}$$

The marginal law of L is uniform on \(\{{\texttt {0}},{\texttt {1}}\}^k\), and the \(X^j\) are conditionally independent given L. The marginal law of \((X^1,X^2)\) is \(\hat{H}\), and the marginal law of \(X^j\) is \(\hat{H}^j\). The law of L conditional on \(X^j\) is uniform over \(2^k{\hat{v}}(X^j)\) possibilities, whereas the law of L conditional on \((X^1,X^2)\) is uniform over \(2^k{\hat{v}}_2(X^1,X^2)\) possibilities. It follows that

$$\begin{aligned} \mathcal {H}(\hat{H})&=\mathcal {H}(X^1,X^2) =\mathcal {H}(L)-\mathcal {H}(L|X^1,X^2) +\sum _{j=1,2}\mathcal {H}(X^j|L) \\&=-\langle \hat{H},\ln {\hat{v}}_2\rangle +\sum _{j=1,2}\mathcal {H}(X^j|L)\\&= -\langle \hat{H},\ln {\hat{v}}_2\rangle +\sum _{j=1,2} [\mathcal {H}(\hat{H}^j)+ \langle \hat{H}^j,\ln {\hat{v}}\rangle ]\,. \end{aligned}$$

Rearranging gives \(\varvec{\Sigma }_2(H)=\varvec{\Sigma }(H^1)+\varvec{\Sigma }(H^2)\), and the result follows.\(\square \)
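The entropy decomposition in this proof is the chain rule combined with conditional independence of \((X^1,X^2)\) given L. A self-contained numerical check on a toy joint law (with arbitrary seeded conditional distributions standing in for the \(K^j\); purely illustrative):

```python
import math
import random

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

random.seed(0)
L_SIZE, X_SIZE = 4, 3      # toy stand-ins for {0,1}^k and (Omega_T)^k

def random_dist(size):
    w = [random.random() for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

# Conditional laws K^j(. | L): X^1 and X^2 are independent given L, with L uniform.
K1 = [random_dist(X_SIZE) for _ in range(L_SIZE)]
K2 = [random_dist(X_SIZE) for _ in range(L_SIZE)]
joint = {(x1, x2, l): K1[l][x1] * K2[l][x2] / L_SIZE
         for x1 in range(X_SIZE) for x2 in range(X_SIZE) for l in range(L_SIZE)}

# Marginal of (X^1, X^2) -- the analogue of hat H.
marg12 = {}
for (x1, x2, l), p in joint.items():
    marg12[(x1, x2)] = marg12.get((x1, x2), 0.0) + p

h_12 = entropy(marg12.values())
h_L = math.log(L_SIZE)
h_L_given_12 = entropy(joint.values()) - h_12
h_X_given_L = sum(entropy(K[l]) / L_SIZE for K in (K1, K2) for l in range(L_SIZE))

# H(X^1,X^2) = H(L) - H(L | X^1,X^2) + H(X^1 | L) + H(X^2 | L)
assert abs(h_12 - (h_L - h_L_given_12 + h_X_given_L)) < 1e-9
```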

It will be clear from the explicit definition that the measure \(H_\star \) of Proposition 3.4 satisfies \(\hat{H}_\star (\underline{{\sigma }})={\hat{v}}(\underline{{\sigma }})g_\star (\underline{{\sigma }})\) where \(g_\star \) is invariant to literals. Let \(H_\bullet =(\dot{H}_\bullet ,\hat{H}_\bullet ,\bar{H}_\bullet )\) where

$$\begin{aligned}\hat{H}_\bullet (\underline{{\sigma }}^1,\underline{{\sigma }}^2) \equiv {\hat{v}}_2(\underline{{\sigma }}^1,\underline{{\sigma }}^2) \prod _{j=1,2} g_\star (\underline{{\sigma }}^j)\,, \end{aligned}$$

\(\dot{H}_\bullet =\dot{H}_\star \otimes \dot{H}_\star \), and \(\bar{H}_\bullet =\bar{H}_\star \otimes \bar{H}_\star \). The following is the second moment analogue of Proposition 3.4.

Proposition 3.10

(proved in Sect. 5) The unique maximizer of \(\varvec{F}_2\) in \(\mathbf {N}_\text {se}\) is \(H_\bullet \). Moreover, there is a positive constant \(\epsilon =\epsilon (k,\lambda ,T)\) so that for \(\Vert {H-H_\bullet } \Vert \leqslant \epsilon \) we have \(\varvec{F}_2(H) \leqslant \varvec{F}_2(H_\bullet )-\epsilon \Vert {H-H_\bullet } \Vert ^2\).

Corollary 3.11

For the coloring model, the second moment contribution from the near-uncorrelated regime \(\mathbf {N}_\text {se}\) is given by the estimate \(\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\asymp \exp \{2n\varvec{F}(H_\star )\}\).

Proof

Recall from Corollary 3.5 the definition of \((\dot{\wp },\hat{\wp },\bar{\wp })\) for the single-copy model, and define \((\dot{\wp }_2,\hat{\wp }_2,\bar{\wp }_2)\) analogously for the pair model. Let \(\wp _2 \equiv \dot{\wp }_2+\hat{\wp }_2-\bar{\wp }_2-1\). For any \(H^1,H^2\in \mathbf {N}\), let \(\mathbf {N}_{\text {se},H^1,H^2}\) denote the set of \(H\in \mathbf {N}_\text {se}\) with single-copy marginals \(H^1,H^2\). This is a space of dimension \(\wp _2-2\wp \), and it follows from Proposition 3.10 and Lemma 4.4 that

$$\begin{aligned} \mathbb {E}[\varvec{Z}^2(\mathbf {N}_{\text {se},H^1,H^2})] \asymp \frac{\exp \{ n[ \varvec{F}_2(H_\bullet )-\Theta ( \Vert (H^1,H^2)-(H_\star ,H_\star )\Vert ^2)]\}}{n^\wp }\,. \end{aligned}$$

Summing over \(H^1,H^2\in \mathbf {N}\) then gives \(\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\asymp \exp \{ n\varvec{F}_2(H_\bullet ) \}\), which in turn equals \(\exp \{2n\varvec{F}(H_\star )\}\) by applying Lemma 3.9.\(\square \)

3.4 Conclusion of main result

We now explain how the main theorem follows. We continue to assume, as we have done throughout the section, that \(k\geqslant k_0\), \(\alpha \) satisfies (3), and \(0\leqslant \lambda \leqslant 1\). The measure \(H_\star \) of Proposition 3.4 depends on \(\lambda \) and T, and we now make this explicit by writing \(H_\star \equiv H_{\lambda ,T}\).

Corollary 3.12

For any \(0\leqslant \lambda \leqslant 1\) and T finite such that \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, the separable contribution to \(\varvec{Z}_{\lambda ,T}\) is well-concentrated about its mean:

$$\begin{aligned}\lim _{\epsilon \downarrow 0} \liminf _{n\rightarrow \infty } \mathbb {P}\bigg (\epsilon (\mathbb {E}\varvec{S}) \leqslant \varvec{S}\leqslant \frac{\mathbb {E}\varvec{S}}{\epsilon }\bigg )=1\,. \end{aligned}$$

Proof

The upper bound follows trivially from Markov’s inequality, so the task is to show the lower bound. In the first moment, we have \(\mathbb {E}\varvec{S}\asymp \exp \{n\varvec{F}(H_\star )\}\) by Corollary 3.5 and Proposition 3.7. In the second moment, since \(\varvec{S}\leqslant \varvec{Z}\), we have \(\mathbb {E}(\varvec{S}^2) \leqslant \mathbb {E}[\varvec{S}^2(\mathbf {N}_\text {ns})] +\mathbb {E}[\varvec{Z}^2(\mathbf {N}_\text {se})]\). Combining with Corollaries 3.8 and 3.11 gives

$$\begin{aligned} \frac{\mathbb {E}(\varvec{S}^2)}{(\mathbb {E}\varvec{S})^2} \leqslant \frac{\exp \{n \lambda \varvec{s}(H_\star ) + o(n) \}}{\mathbb {E}\varvec{S}} + O(1) \asymp \frac{\exp \{o(n)\}}{\exp \{n \varvec{\Sigma }(H_\star )\}} +O(1)\,, \end{aligned}$$

which immediately implies \(\mathbb {P}(\varvec{S}\geqslant \delta (\mathbb {E}\varvec{S}))\geqslant \delta \) for some positive constant \(\delta \). This can be strengthened to the asserted concentration result by an easy adaptation of the method described in [25, Sec. 6].\(\square \)
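The step from a bounded second-moment ratio to a positive-probability lower bound is the Paley–Zygmund inequality, \(\mathbb {P}(\varvec{S}\geqslant \delta \,\mathbb {E}\varvec{S}) \geqslant (1-\delta )^2 (\mathbb {E}\varvec{S})^2/\mathbb {E}(\varvec{S}^2)\) for nonnegative \(\varvec{S}\) and \(0\leqslant \delta \leqslant 1\). A quick check on seeded random nonnegative distributions (illustrative only):

```python
import random

random.seed(1)

def paley_zygmund_holds(values, probs, delta):
    """Check P(S >= delta * E S) >= (1-delta)^2 (E S)^2 / E(S^2) for a discrete law."""
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if v >= delta * mean)
    return tail >= (1 - delta) ** 2 * mean ** 2 / second - 1e-12

for _ in range(100):
    values = [random.random() for _ in range(6)]         # nonnegative S
    weights = [random.random() for _ in range(6)]
    probs = [w / sum(weights) for w in weights]
    for delta in (0.1, 0.5, 0.9):
        assert paley_zygmund_holds(values, probs, delta)
```

In the proof above the ratio \(\mathbb {E}(\varvec{S}^2)/(\mathbb {E}\varvec{S})^2\) is O(1) once \(\varvec{\Sigma }(H_\star )>0\), which is exactly what makes the bound nontrivial.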

Proposition 3.13

(proved in “Appendix B”) For \(0\leqslant \lambda \leqslant 1\) and \(H_{\lambda ,T}=H_\star \in \varvec{\Delta }\) as given by Proposition 3.4, the triple \((\varvec{s}(H_{\lambda ,T}),\varvec{\Sigma }(H_{\lambda ,T}),\varvec{F}(H_{\lambda ,T}))\) converges as \(T\rightarrow \infty \) to \((s_\lambda ,\Sigma (s_\lambda ),{\mathfrak {F}}(\lambda ))\) from Definition 1.3.

Proof of Theorem 1

In “Appendix E” we prove the upper bound, \({\textsf {f}}(\alpha )\leqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) for all \(0\leqslant \alpha <\alpha _\text {sat}\). For any \(\lambda ,T\) such that \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, Corollary 3.12 gives

$$\begin{aligned} \liminf _{n\rightarrow \infty }(\varvec{Z}_{\lambda ,T}(\mathbf {N}))^{1/n} \geqslant \lim _{n\rightarrow \infty }(\varvec{S}_{\lambda ,T})^{1/n} =\exp \{\varvec{F}(H_{\lambda ,T})\} =\exp \{\varvec{\Sigma }(H_{\lambda ,T}) +\lambda \varvec{s}(H_{\lambda ,T})\}\,. \end{aligned}$$

On the other hand, \(\varvec{Z}_{\lambda ,T}(\mathbf {N})\) consists entirely of clusters of size \(\exp \{n\varvec{s}(H_{\lambda ,T}) + o(n)\}\). Therefore, if \(\varvec{\Sigma }(H_{\lambda ,T})\) is positive, it must be that \({\textsf {f}}(\alpha )\geqslant \varvec{s}(H_{\lambda ,T})\). The lower bound \({\textsf {f}}(\alpha )\geqslant {\textsf {f}}^\textsc {1rsb}(\alpha )\) then follows by appealing to Proposition 3.13, so the theorem is proved. \(\square \)

The next two sections are devoted to the optimization of \(\varvec{F}\) in \(\mathbf {N}_\circ \), and of \(\varvec{F}_2\) in \(\mathbf {N}_\text {se}\). In Sect. 4 we show that the optimization of \(\varvec{F}\) and \(\varvec{F}_2\) over small regions can be reduced to an optimization problem on trees. In Sect. 5 we solve the tree optimization problem by connecting it to the analysis of the bp recursion for the coloring model. This allows us to prove Propositions 3.4 and 3.10, thereby completing the proof of the main result Theorem 1.

4 Reduction to tree optimization by local updates

In this section we prove the key reduction that ultimately allows us to compute the (first and second) moments of \(\varvec{Z}\equiv \varvec{Z}_{\lambda ,T}\) (Propositions 3.4 and 3.10). As we have already seen, the calculation reduces to the optimization of functions \(\varvec{F}\) and \(\varvec{F}_2\) from (42) and (45). These functions are generally not convex over the entirety of their domains \(\varvec{\Delta }\) and \(\varvec{\Delta }_2\), but we expect them to be convex in neighborhoods around their maximizers \(H_\star \) and \(H_\bullet \) (as given in Definition 5.6 below). With this in mind, we rely on other means (a priori estimates and separability) to restrict the domains—from \(\varvec{\Delta }\) to \(\mathbf {N}_\circ \) in the first moment (Lemma 3.3), and from \(\varvec{\Delta }_2\) to \(\mathbf {N}_\text {se}\) in the second moment (Corollary 3.8). Within these restricted regions, we will show that \(\varvec{F}\) and \(\varvec{F}_2\) can be optimized by a local update procedure that reduces the (nonconvex) graph optimization to a (convex) tree optimization.

Fig. 4

One step of the local update procedure. In this figure, open circles indicate variable factors \(\dot{\Phi }\), solid squares indicate clause factors \(\hat{\Phi }^\text {lit}\), and each variable-clause edge is bisected by a small dot indicating the edge factor \(\bar{\Phi }\). The initial state (top panel) is a triple \((\mathscr {G},Y,\underline{{\sigma }})\) where \(\mathscr {G}\) is an nae-sat instance, Y is a subset of variables (blue circles), and \(\underline{{\sigma }}\) is a coloring on \(\mathscr {G}\) (not shown). Let \(\mathscr {N}\equiv \mathscr {N}(Y)\) be the subgraph of \(\mathscr {G}\) induced by the variables in Y, together with the clauses neighboring Y and their incident half-edges (shown in blue, above the dashed line). The local update procedure resamples the edge literals on \(\mathscr {N}\), as well as the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\), to produce a modified instance \(\mathscr {G}'\). The coloring is updated accordingly so that we arrive at the new state \((\mathscr {G}',Y,\underline{{\eta }})\) (bottom panel). In both \(\mathscr {G}\) and \(\mathscr {G}'\), the edges cut by the dashed lines will be referred to as cut edges. The half-edges just above the dashed lines will be referred to as the leaf edges of \(\mathscr {N}\) (color figure online)

4.1 Local update

We begin with an overview. Throughout this section, we assume \(1\leqslant T<\infty \). Suppose \(\underline{{\sigma }}\) is a T-coloring on \(\mathscr {G}\). Sample from \(\mathscr {G}\) a subset of variables Y, and let \(\mathscr {N}\equiv \mathscr {N}(Y)\) be the subgraph of \(\mathscr {G}\) induced by Y, together with the clauses neighboring Y and their incident half-edges. The half-edges at the boundary of \(\mathscr {N}\) will be referred to as the leaf edges of \(\mathscr {N}\).

Form a modified instance \(\mathscr {G}'\) (see Fig. 4) by resampling the edge literals on \(\mathscr {N}\) as well as the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\). In both \(\mathscr {G}\) and \(\mathscr {G}'\), we use the term cut edges for the edges \(e=(av)\) where a is a clause in \(\mathscr {N}\) and v is a variable in the complement of \(\mathscr {N}\); these are the edges cut by the dashed lines in Fig. 4. According to our terminology, the leaf edges of \(\mathscr {N}\) are the half-edges that lie just above the dashed lines, so each leaf edge of \(\mathscr {N}\) is half of a cut edge. The coloring is updated accordingly to produce \(\underline{{\eta }}\), a T-coloring on \(\mathscr {G}'\) which agrees as much as possible with \(\underline{{\sigma }}\) on \(\mathscr {G}{\setminus }\mathscr {N}\): in particular, \(\underline{{\sigma }}\) and \(\underline{{\eta }}\) will agree in the variable-to-clause colors on the cut edges. We will define the procedure so that it gives a Markov chain \(\pi \) on triples \((\mathscr {G},Y,\underline{{\sigma }})\) with reversing measure given by \(\mu (\mathscr {G},Y,\underline{{\sigma }})=\mathbb {P}(\mathscr {G})\mathbb {P}(Y\,|\,\mathscr {G})\varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \). (Note that \(\mu \) is not normalized to be a probability measure.)

Reversibility implies that for any subset A of the state space, if B is the set of states reachable in one step from A, then

$$\begin{aligned} \mu (A)&= \sum _\mathrm{{\textsc {a}}\in A} \sum _\mathrm{{\textsc {b}}\in B}\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}}) =\sum _\mathrm{{\textsc {a}}\in A}\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}})\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}}) =\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}}) \sum _\mathrm{{\textsc {a}}\in A} \pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})\nonumber \\&=\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}}) \pi (\mathrm {\textsc {b}},A) \leqslant \bigg \{\sum _\mathrm{{\textsc {b}}\in B} \mu (\mathrm {\textsc {b}})\bigg \} \bigg \{ \max _\mathrm{{\textsc {b}}\in B}\pi (\mathrm {\textsc {b}},A) \bigg \} = \mu (B)\max _\mathrm{{\textsc {b}}\in B}\pi (\mathrm {\textsc {b}},A) \,. \end{aligned}$$
(46)
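The inequality (46) can be sanity-checked on any small reversible chain. Below, a toy chain built from symmetric conductances (an arbitrary example, not the actual update chain), with \(\mu \) unnormalized as in the text:

```python
# Toy reversible chain: c[x][y] = c[y][x] are symmetric conductances,
# mu(x) = sum_y c[x][y], pi(x, y) = c[x][y] / mu(x), so mu is a reversing measure.
c = [
    [0.0, 2.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 3.0, 0.5, 0.0],
    [1.0, 3.0, 0.0, 1.0, 4.0],
    [0.0, 0.5, 1.0, 0.0, 2.0],
    [0.0, 0.0, 4.0, 2.0, 0.0],
]
mu = [sum(row) for row in c]

def pi(x, y):
    return c[x][y] / mu[x]

A = [0, 1]
# B = states reachable in one step from A.
B = sorted({y for a in A for y in range(5) if pi(a, y) > 0})
pi_b_to_A = {b: sum(pi(b, a) for a in A) for b in B}

lhs = sum(mu[a] for a in A)                              # mu(A)
rhs = sum(mu[b] for b in B) * max(pi_b_to_A.values())    # mu(B) * max_b pi(b, A)
assert lhs <= rhs + 1e-12                                # the bound (46)
```

Reversibility enters through the intermediate identity \(\mu (A)=\sum _{\textsc {b}\in B}\mu (\textsc {b})\pi (\textsc {b},A)\), which holds exactly for this chain.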

We will design the sampling procedure to ensure that (i) the vertices in Y are far from one another and from any short cycles, and (ii) the empirical measure \(H^\text {sm}\) of \(\underline{{\sigma }}\) on \(\mathscr {N}\) is close to \(H^\text {sy}\), a certain symmetrization of the overall empirical measure \(H(\mathcal {G},\underline{{\sigma }})\). Then \(\mathbb {E}\varvec{Z}(H)\approx \mu (A)\) where A is the set of states \((\mathscr {G},Y,\underline{{\sigma }})\) with \(H^\text {sm}\approx H^\text {sy}\). The update produces a state \((\mathscr {G}',Y,\underline{{\eta }})\in B\) with possibly different \(H^\text {sm}\), but with the same empirical measure \(\dot{h}^\text {tr}(H^\text {sm})\) of variable-to-clause colors \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}\). Bounding \(\pi (\mathrm {\textsc {b}},A)\) reduces to calculating the weight of configurations on \(\mathscr {N}\) with empirical measure \(H^\text {sm}\approx H^\text {sy}\), relative to the weight of all configurations on \(\mathscr {N}\) with empirical measure \(\dot{h}^\text {tr}(H^\text {sm})\approx \dot{h}^\text {tr}(H^\text {sy})\) on the leaf edges of \(\mathscr {N}\). Because \(\mathscr {N}\) is a disjoint union of trees, this reduces to a convex optimization problem which lends itself much more readily to analysis. The purpose of the current section is to formalize this graphs-to-trees reduction. We begin with the precise definitions of \(H^\text {sy}\), \(H^\text {sm}\) and \(\dot{h}^\text {tr}(H^\text {sm})\). Recall our notation \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega \) from the discussion following (33). As \(\sigma \) goes over all of \(\Omega _T\), write \(\dot{\Omega }_T\) for the possible values of \(\dot{\sigma }\), and \(\hat{\Omega }_T\) for the possible values of \(\hat{\sigma }\). Let \(\dot{\Omega }\equiv \dot{\Omega }_\infty \) and \(\hat{\Omega }\equiv \hat{\Omega }_\infty \), so

$$\begin{aligned} \dot{\Omega }&\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup (\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \})\,,\\ \hat{\Omega }&\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\} \cup (\hat{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \}) \end{aligned}$$

Definition 4.1

(sample empirical measures) Given an nae-sat instance \(\mathscr {G}\equiv (\mathcal {G},\underline{{\texttt {L}}})\), a T-coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\), and a nonempty subset of variables \(Y\subseteq V\), we record the local statistics of “\(\underline{{\sigma }}\) around Y” as follows. Let \(\dot{H}^\text {sm}\) be the empirical measure of variable-incident colorings in Y: for \(\underline{{\eta }}\in (\Omega _T)^d\),

$$\begin{aligned} \dot{H}^\text {sm}(\underline{{\eta }}) \equiv \frac{1}{|Y|}\sum _{v\in Y} \mathbf {1}\{\underline{{\sigma }}_{\delta v}=\underline{{\eta }}\}\,. \end{aligned}$$

Let \(\bar{H}^\text {sm}\) be the empirical measure of colors on the edges incident to Y: for \(\eta \in \Omega _T\),

$$\begin{aligned} \bar{H}^\text {sm}(\eta ) \equiv \frac{1}{|Y|d} \sum _{v\in Y}\sum _{e\in \delta v} \mathbf {1}\{\sigma _e=\eta \}\,. \end{aligned}$$

For \(\underline{{\eta }}\in (\Omega _T)^k\) and \(1\leqslant j\leqslant k\) define the rotation \(\underline{{\eta }}^{(j)}\equiv (\eta _j,\ldots ,\eta _k,\eta _1,\ldots ,\eta _{j-1})\). For any \(v\in Y\) and \(e\in \delta v\), let j(e) be the index of e in \(\delta a(e)\). For \(\underline{{\eta }}\in (\Omega _T)^k\) let

$$\begin{aligned} \hat{H}^\text {sm}(\underline{{\eta }})\equiv \frac{1}{|Y|d}\sum _{v\in Y}\sum _{e\in \delta v} \mathbf {1}\{(\underline{{\sigma }}_{\delta a(e)})^{(j(e))}=\underline{{\eta }}\}\,. \end{aligned}$$

Then \(H^\text {sm}\equiv (\dot{H}^\text {sm},\hat{H}^\text {sm},\bar{H}^\text {sm})\) is the sample empirical measure for the state \((\mathscr {G},Y,\underline{{\sigma }})\); we shall write this hereafter as \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\). Note that \(H^\text {sm}\) lies in the space \(\varvec{\Delta }^\text {sm}\) which is defined similarly to \(\varvec{\Delta }\) but with condition (40) replaced by

$$\begin{aligned} \frac{1}{d}\sum _{\underline{{\sigma }}\in \Omega ^d} \dot{H}^\text {sm}(\underline{{\sigma }})\sum _{i=1}^d \mathbf {1}\{\sigma _i=\tau \} =\bar{H}^\text {sm}(\tau ) =\sum _{\underline{{\sigma }}\in \Omega ^k} \hat{H}^\text {sm}(\underline{{\sigma }})\mathbf {1}\{\sigma _1=\tau \}\,. \end{aligned}$$
(47)

We emphasize that (40) and (47) differ on the right-hand side. However, if we have \(H=(\dot{H},\hat{H},\bar{H})\in \varvec{\Delta }\) such that \(\hat{H}\) is invariant under rotation of the indices \(1\leqslant j\leqslant k\), then \(H\in \varvec{\Delta }^\text {sm}\) as well. With this in mind, for \(H\in \varvec{\Delta }\) we define \(H^\text {sy}\equiv (\dot{H},\hat{H}^\text {sy},\bar{H})\in \varvec{\Delta }^\text {sm}\) where \(\hat{H}^\text {sy}\) is the average over all k rotations of \(\hat{H}\). Later we will sample Y such that \(H^\text {sm}\) falls very close to \(H^\text {sy}\) with high probability. Lastly, for any \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) we let \(\dot{h}^\text {tr}(H^\text {sm})\) be the measure on \(\dot{\Omega }_T\) given by

$$\begin{aligned}{}[\dot{h}^\text {tr}(H^\text {sm})](\dot{\eta }) \equiv \frac{1}{k-1} \sum _{\underline{{\sigma }}\in (\Omega _T)^k} \sum _{j=2}^k \mathbf {1}\{\dot{\sigma }_j=\dot{\eta }\}\hat{H}^\text {sm}(\underline{{\sigma }})\,. \end{aligned}$$

Thus \(\dot{h}^\text {tr}(H^\text {sm})\) represents the empirical measure of spins \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}\), i.e., the edges cut by the dashed lines in Fig. 4.
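The two operations just introduced, the rotation-average \(\hat{H}\mapsto \hat{H}^\text {sy}\) and the leaf-edge measure \(\dot{h}^\text {tr}\), can be made concrete in a few lines. The following Python sketch uses a toy alphabet of plain labels in place of \(\Omega _T\) and ignores the \((\dot{\sigma },\hat{\sigma })\) decomposition (so the leaf-edge measure is read off from the tuple entries directly); the function names are ours, not the paper's.

```python
from collections import Counter
from fractions import Fraction

def symmetrize(H_hat):
    """The map H_hat -> H_hat^sy: average a measure on k-tuples
    over all k cyclic rotations of the indices."""
    H_sy = Counter()
    for tup, mass in H_hat.items():
        k = len(tup)
        for j in range(k):
            H_sy[tup[j:] + tup[:j]] += Fraction(mass, k)
    return H_sy

def h_tr(H_hat):
    """Toy version of h^tr: empirical measure of the entries in
    positions 2..k, each tuple weighted by H_hat and 1/(k-1)."""
    h = Counter()
    for tup, mass in H_hat.items():
        k = len(tup)
        for entry in tup[1:]:
            h[entry] += Fraction(mass, k - 1)
    return h

# After symmetrizing, the leaf-edge measure agrees with the overall
# frequency of each label, as rotation invariance suggests.
H = {("a", "b", "b"): 1}
leaf = h_tr(symmetrize(H))
```

Here \(k=3\) and the single tuple has one `a` and two `b`'s, so `leaf` assigns mass \(1/3\) to `a` and \(2/3\) to `b`.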

For \(H\in \varvec{\Delta }\), recall from (43) that \(\varvec{F}(H)=\varvec{\Sigma }(H)+\lambda \varvec{s}(H)\) where \(\varvec{\Sigma }(H)\) is the cluster complexity and \(\varvec{s}(H)\) is (the exponential rate of) the cluster size. The tree analogue of \(\varvec{\Sigma }(H)\) is \(\varvec{\Sigma }^\text {tr}(H^\text {sm})\) where

$$\begin{aligned} \varvec{\Sigma }^\text {tr}(H) \equiv \mathcal {H}(\dot{H}) + d\mathcal {H}(\hat{H}) -d\mathcal {H}(\bar{H}) +\varvec{v}(H)\end{aligned}$$
(48)

— the only difference being that \(\varvec{\Sigma }\) has coefficient \(\alpha =d/k\) on the clause entropy term \(\mathcal {H}(\hat{H})\), while \(\varvec{\Sigma }^\text {tr}\) has coefficient d. As we see below, this occurs because the ratio of variables to clauses to edges is 1 : d : d for the disjoint union of trees \(\mathscr {N}\), versus \(1:\alpha :d\) for the full graph \(\mathscr {G}\). We will also see that \(\varvec{\Sigma }^\text {tr}\) is always concave, though \(\varvec{\Sigma }\) need not be. Likewise, the tree analogue of \(\varvec{s}(H)\) is \(\varvec{s}^\text {tr}(H^\text {sm})\) where

$$\begin{aligned} \varvec{s}^\text {tr}(H) \equiv \langle \ln \dot{\Phi },\dot{H}\rangle +d\langle \ln \hat{F}, \hat{H}\rangle +d\langle \ln \bar{\Phi },\bar{H}\rangle \,.\end{aligned}$$

The tree analogue of \(\varvec{F}(H)\) is \(\varvec{\Lambda }(H^\text {sm})\) where

$$\begin{aligned} \varvec{\Lambda }(H) \equiv \varvec{\Sigma }^\text {tr}(H)+ \lambda \varvec{s}^\text {tr}(H)\,.\end{aligned}$$
(49)

Recall Definition 4.1: given \((\mathscr {G},Y,\underline{{\sigma }})\) with sample empirical measure \(H^\text {sm}\), the empirical measure of spins \(\dot{\sigma }\) on the leaf edges of \(\mathscr {N}(Y)\) is given by \(\dot{h}^\text {tr}=\dot{h}^\text {tr}(H^\text {sm})\). Then, for any probability measure \({\dot{h}}\) on \(\dot{\Omega }_T\), we let

$$\begin{aligned} \varvec{\Lambda }^\text {op}({\dot{h}}) \equiv \sup \{\varvec{\Lambda }(H) : H\in \varvec{\Delta }^\text {sm}\text { with } \dot{h}^\text {tr}(H) ={\dot{h}} \}\,, \end{aligned}$$

where we emphasize that the supremum is taken over \(\varvec{\Delta }^\text {sm}\) rather than \(\varvec{\Delta }\). For \(H\in \varvec{\Delta }^\text {sm}\) we define

$$\begin{aligned} \varvec{\Xi }(H) \equiv \varvec{\Xi }_{\lambda ,T}(H) \equiv \varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))- \varvec{\Lambda }(H)\,.\end{aligned}$$
(50)

The interpretation of \(\varvec{\Xi }\), formalized below, is that for any \(H\in \varvec{\Delta }\), if A is the set of states with \(H^\text {sm}\approx H^\text {sy}\) and B is the set of states reachable in one step of the chain from A, then \(\max \{\pi (\mathrm {\textsc {b}},A):\mathrm {\textsc {b}}\in B\}\) is approximately \(\exp \{-|Y|\,\varvec{\Xi }(H^\text {sy})\}\), where we note that \(\varvec{\Xi }(H^\text {sy})\geqslant 0\) since \(\varvec{\Xi }\) is nonnegative on all of \(\varvec{\Delta }^\text {sm}\), and \(H^\text {sy}\in \varvec{\Delta }\cap \varvec{\Delta }^\text {sm}\). Formally, we have the following bound:

Theorem 4.2

For \(\epsilon \) small enough (depending only on \(d,k,T\)), it holds for all \(H\in \varvec{\Delta }\) that

$$\begin{aligned}\varvec{F}(H)\leqslant \max \Big \{\varvec{F}(H'): \Vert {H'-H} \Vert \leqslant \epsilon (dk)^{2T}\Big \}- \epsilon \,\varvec{\Xi }(H^\text {sy})\,. \end{aligned}$$

The analogous statement holds in the second moment with \(\varvec{F}_2=\varvec{F}_{2,\lambda ,T}\) and \(\varvec{\Xi }_2=\varvec{\Xi }_{2,\lambda ,T}\).

For the sake of exposition, we will give the proof of Theorem 4.2 for \(\varvec{F}\) only; the assertion for \(\varvec{F}_2\) follows from the same argument with essentially no modifications. The first task is to define the Markov chain that was informally discussed above. There are a few issues to be addressed: how to sample Y ensuring certain desirable properties; how to resample the matching between \(\mathscr {N}\) and \(\mathscr {G}{\setminus }\mathscr {N}\); and how to produce a valid coloring \(\underline{{\eta }}\) on \(\mathscr {G}'\) without changing the spins \(\dot{\sigma }\) on the cut edges. We address the last issue next.

4.2 Tree updates

Recall that in the bipartite factor graph \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), each edge joins a variable to a clause and is defined to have length one-half. For the discussion that follows, it is useful to bisect each edge \(e\in E\) with an artificial vertex indicating the edge factor \(\bar{\Phi }\); these are shown as small dots in Fig. 4. Thus an edge e joining \(a\in F\) to \(v\in V\) becomes two quarter-length edges, (ae) and (ev), where e now refers to the artificial vertex. Given a coloring \(\underline{{\sigma }}\) on the original graph, we obtain a coloring on the new graph by simply duplicating the color on each edge, setting \(\sigma _{ae}=\sigma _e=\sigma _{ev}\). We then define \(\mathscr {N}(v)\) as the (5/8)-neighborhood of variable v, and define \(\mathscr {N}=\mathscr {N}(Y)\) as the union of \(\mathscr {N}(v)\) for all \(v\in Y\): in the top panel of Fig. 4, \(\mathscr {N}\) is the subgraph shown in blue, above the dashed line.

Directly below the same dashed line, the small solid orange dots correspond to the boundary edges, hereafter denoted \(\mathcal {W}\), of the cavity graph \(\mathscr {G}_\partial \equiv \mathscr {G}{\setminus }\mathscr {N}\). For \(e\in \mathcal {W}\), let \(\mathscr {T}\) be its neighborhood in \(\mathscr {G}{\setminus }\mathscr {N}\) of some radius \(\ell >T\) where \(2\ell \) is a positive integer. Assuming e is not close to a short cycle, \(\mathscr {T}\) is what we will call a directed tree rooted at e. In this case we also call \(\mathscr {T}\) a variable-to-clause tree since the root edge has no incident clause; a clause-to-variable tree is similarly defined. We always visualize a directed tree \(\mathscr {T}\) as in Fig. 5, with the root edge e at the top, so that paths leaving the root travel downwards. On an edge \(e=(av)\), the upward color is \(\dot{\sigma }_{av}\) if a lies above v, and \(\hat{\sigma }_{av}\) if v lies above a. We let \(\delta \mathscr {T}\) denote the boundary edges of \(\mathscr {T}\), not including the root edge.

Fig. 5

Two types of directed tree \(\mathscr {T}\). In each case the root edge e is shown at the top, and the boundary \(\delta \mathscr {T}\) is highlighted in purple. For \(e\in \mathcal {W}\), the unit-radius neighborhood of e in \(\mathscr {G}{\setminus }\mathscr {N}\) typically looks like the left panel (color figure online)

Suppose \(\underline{{\sigma }}\) is a valid T-coloring of a directed tree \(\mathscr {T}\) with root spin \(\sigma _e=\sigma \), and consider a new root spin \(\eta \in \Omega _T\). If \(\sigma \) and \(\eta \) agree on the upward color of the root edge, then there is a unique valid coloring

$$\begin{aligned}\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T}) \in (\Omega _T)^{E(\mathscr {T})}\end{aligned}$$

which has root spin \(\eta \), and agrees with \(\underline{{\sigma }}\) in all the upward colors. Indeed, the only possibility for \(\sigma \ne \eta \) is that both \(\sigma ,\eta \in \{{\texttt {f}}\}\). Then, recalling (23), the coloring \(\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\) is uniquely defined by recursively applying the mappings \(\dot{T}\) and \(\hat{T}\), starting from the root and continuing downwards. Since we assumed that \(\underline{{\sigma }}\) was a valid T-coloring and \(\eta \in \Omega _T\), it is easy to verify that the resulting \(\underline{{\eta }}\) is also a valid T-coloring, so the \(\mathrm {\textsf {update}}\) procedure respects the restriction to \(\Omega _T\). From now on we assume all edge colors belong to \(\Omega _T\).
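The root-to-leaves recursion behind \(\mathrm {\textsf {update}}\) can be sketched generically. The following Python toy is schematic only: the tree is given as a parent-to-children map on edges, `up` holds the (unchanged) upward colors, and `local_map` is a stand-in for the recursions \(\dot{T},\hat{T}\) of (23), which we do not reproduce; all names here are hypothetical.

```python
def update(children, up, root_new, local_map):
    """Schematic of eta = update(sigma, eta; T): fix a new value on the
    root edge, then recompute the downward value on every edge by a
    root-to-leaves recursion, keeping the upward values from 'up'."""
    down = {}
    def recurse(e, val):
        down[e] = val
        for c in children.get(e, ()):
            # the new downward value on a child edge is determined by the
            # parent's new value and the (unchanged) upward values of the
            # child's siblings
            sibs = tuple(up[s] for s in children[e] if s != c)
            recurse(c, local_map(val, sibs))
    recurse("root", root_new)
    return down
```

For instance, with `children = {"root": ["a", "b"], "a": ["c"]}`, `up = {"a": 1, "b": 2, "c": 3}`, and the toy rule `local_map = lambda v, sibs: v + sum(sibs)`, starting from the new root value 10 propagates a unique value to every edge below, just as the coloring \(\underline{{\eta }}\) is uniquely determined by its root spin.
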

Lemma 4.3

Suppose \(\underline{{\sigma }}\) is a valid T-coloring of the directed tree \(\mathscr {T}\) with root color \(\sigma \), and \(\eta \in \Omega _T\) agrees with \(\sigma \) on the upward color of the root edge. If \(\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\) agrees with \(\underline{{\sigma }}\) on the boundary \(\delta \mathscr {T}\), then \(\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }})=\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\eta }})\).

Proof

It follows from the construction that on any edge e in the tree, \(\sigma _e\) and \(\eta _e\) agree on the upward color; moreover, if \(\sigma _e\ne \eta _e\) then we must have \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\). For each vertex \(x\in \mathscr {T}\), let e(x) denote the parent edge of x, that is, the unique edge of \(\mathscr {T}\) which lies above x. We then have

$$\begin{aligned}\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }}) = \prod _{e\in \delta \mathscr {T}} \bar{\Phi }(\sigma _e) \prod _{v\in V(\mathscr {T})}\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \bar{\Phi }(\sigma _{e(v)})\bigg \} \prod _{a\in F(\mathscr {T})} \bigg \{ \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \bar{\Phi }(\sigma _{e(a)}) \bigg \}\,. \end{aligned}$$

For a clause a in \(\mathscr {T}\) with \(e(a)=e\), if both \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\) then it follows directly from (39) that

$$\begin{aligned} \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\Phi }(\sigma _e) = \hat{\varphi }^\text {lit}((\underline{{\dot{\tau }}}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\varphi }(\sigma _e) = \hat{z}(\hat{\sigma }_e) = \hat{z}(\hat{\eta }_e) = \hat{\Phi }^\text {lit}((\underline{{\eta }}\oplus \underline{{\texttt {L}}})_{\delta a})\bar{\Phi }(\eta _e)\,. \end{aligned}$$

For a variable v in \(\mathscr {T}\) with \(e(v)=e\), if \(\sigma _e,\eta _e\in \{{\texttt {f}}\}\), then a similar calculation as (39) gives

$$\begin{aligned} \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \bar{\Phi }(\sigma _e) =\dot{\varphi }(\underline{{\hat{\sigma }}}_{\delta v}) \bar{\varphi }(\sigma _e) =\dot{z}(\dot{\sigma }_e) = \dot{z}(\dot{\eta }_e) =\dot{\Phi }(\underline{{\eta }}_{\delta v}) \bar{\Phi }(\eta _e)\,, \end{aligned}$$

where we used the fact that necessarily we have \(\underline{{\sigma }}_{\delta v}\in \{{\texttt {f}}\}^d\). To conclude, we recall that \(\underline{{\sigma }}\) and \(\underline{{\eta }}\) agree on \(\delta \mathscr {T}\) by assumption, so we have \(\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\sigma }})=\varvec{w}^\text {lit}_{\mathscr {T}}(\underline{{\eta }})\) as claimed. \(\square \)

We also use the directed tree as a device to prove the following lemma, which was used in the proofs of Corollaries 3.5 and 3.11.

Lemma 4.4

Let \({\dot{M}},{\hat{M}}\) be as defined in Corollary 3.5, and let \({\dot{M}}_2,{\hat{M}}_2\) be their analogues in the pair model. For any \(\sigma ,\eta \in \Omega \) there exists an integer-valued vector \((\dot{H},\hat{H})\) so that

$$\begin{aligned} \langle {\mathbf {1}},\dot{H}\rangle =0=\langle {\mathbf {1}},\hat{H}\rangle \quad \text {and}\quad {\dot{M}}\dot{H}-{\hat{M}}\hat{H}={\mathbf {1}}_\sigma -{\mathbf {1}}_\eta \,, \end{aligned}$$

where \({\mathbf {1}}\) denotes the all-ones vector, and \({\mathbf {1}}_\sigma \) denotes the vector which is one in the \(\sigma \) coordinate and zero elsewhere. The analogous statement holds for \(({\dot{M}}_2,{\hat{M}}_2)\).

Proof

We define a graph on \(\Omega _T\) by putting an edge between \(\sigma \) and \(\eta \) if there exist valid colorings \(\underline{{\sigma }},\underline{{\eta }}\) on some directed tree \(\mathscr {T}\) which take values \(\sigma ,\eta \) on the root edge, but agree on the boundary edges \(\delta \mathscr {T}\). If \(\sigma ,\eta \) are connected in this way, then taking

$$\begin{aligned} \dot{H}(\underline{{\rho }})&=\sum _{v\in V(\mathscr {T})} \mathbf {1}\{\underline{{\sigma }}_{\delta v}=\underline{{\rho }}\} -\sum _{v\in V(\mathscr {T})}\mathbf {1}\{\underline{{\eta }}_{\delta v}=\underline{{\rho }}\}\,, \quad \underline{{\rho }}\in (\Omega _T)^d\\ \hat{H}(\underline{{\rho }})&=\sum _{a\in F(\mathscr {T})} \mathbf {1}\{\underline{{\sigma }}_{\delta a}=\underline{{\rho }}\} -\sum _{a\in F(\mathscr {T})} \mathbf {1}\{\underline{{\eta }}_{\delta a}=\underline{{\rho }}\}\,, \quad \underline{{\rho }}\in (\Omega _T)^k. \end{aligned}$$

gives \({\dot{M}}\dot{H}-{\hat{M}}\hat{H}={\mathbf {1}}_\sigma -{\mathbf {1}}_{\eta }\) as required. It therefore suffices to show that the graph we have defined on \(\Omega _T\) is connected (hence complete). First, if \(\dot{\sigma }=\dot{\eta }\), it is clear that \(\sigma \) and \(\eta \) can be connected by colorings \(\underline{{\sigma }},\underline{{\eta }}\) of some variable-to-clause tree \(\mathscr {T}\), with \(\underline{{\eta }}=\mathrm {\textsf {update}}(\underline{{\sigma }},\eta ;\mathscr {T})\). Similarly, if \(\hat{\sigma }=\hat{\eta }\), then \(\sigma \) and \(\eta \) can be connected by a clause-to-variable tree. This implies that \(\{{\texttt {f}}\}\) is connected. Next, if \(\sigma ={\texttt {r}}_{{\varvec{x}}}\) and \(\eta ={\texttt {b}}_{{\varvec{x}}}\), then they can be connected by a variable-to-clause tree rooted at edge e, containing a single variable factor \(v=v(e)\), with \(\underline{{\sigma }}_{\delta v{\setminus } e}\) identically equal to \({\texttt {r}}_{{\varvec{x}}}\). If \(\sigma ={\texttt {b}}_{{\varvec{x}}}\) and \(\eta =(\dot{\tau },{\texttt {s}})\) for any \(\dot{\tau }\in \dot{\Omega }{\setminus }\{{\texttt {r}},{\texttt {b}}\}\), then they can be connected by a clause-to-variable tree rooted at edge e, containing a single clause factor \(a=a(e)\), with any \(\underline{{\sigma }}_{\delta a{\setminus } e}\) such that \((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a{\setminus } e}\) contains both \({\texttt {b}}_{\texttt {0}}\) and \({\texttt {b}}_{\texttt {1}}\) entries. It follows that \(\Omega _T\) is indeed connected, which proves the assertion concerning \(({\dot{M}},{\hat{M}})\). The proof for \(({\dot{M}}_2,{\hat{M}}_2)\) is very similar and we omit the details.\(\square \)

4.3 Markov chain

We now define a Markov chain on tuples \((\mathscr {G},Y,\underline{{\sigma }})\) where \(\mathscr {G}\) is an nae-sat instance, \(\underline{{\sigma }}\) is a valid T-coloring on \(\mathscr {G}\), and \(Y\subseteq V\) is a subset of variables such that

$$\begin{aligned} \text {the subgraphs }B_{2T}(v),\text { for }v\in Y,\text { are mutually disjoint trees,} \end{aligned}$$
(51)

where \(B_{2T}(v)\) is the 2T-neighborhood of \(v\) in \(\mathscr {G}\). Recall that \(\mathscr {N}=\mathscr {N}(Y)\) is the \(\tfrac{5}{8}\)-neighborhood of Y; we write \(\mathscr {N}= (\mathcal {N},\underline{{\texttt {L}}}_\mathcal {N})\) where \(\mathcal {N}\) is the graph without edge literals. Write \(\delta \mathcal {N}\) for the boundary of \(\mathcal {N}\), consisting of clause-incident edges that are not incident to Y (just above the dashed line in Fig. 4). Write \(\underline{{\sigma }}_\mathcal {N}\) for a T-coloring on \(\mathcal {N}\) (including \(\delta \mathcal {N}\)), and let

$$\begin{aligned} \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N}) \equiv \varvec{w}^\text {lit}_\mathscr {N}(\underline{{\sigma }}_\mathcal {N}) \equiv \prod _{v\in Y}\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{e\in \delta v} \Big \{ \hat{\Phi }^\text {lit}( (\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a(e)}) \bar{\Phi }(\sigma _e) \Big \}\bigg \}\,. \end{aligned}$$
(52)

On the other hand we have \(\mathscr {G}{\setminus }\mathscr {N}\equiv \mathscr {G}_\partial \equiv (\mathcal {G}_\partial ,\underline{{\texttt {L}}}_\partial )\) where \(\mathcal {G}_\partial \equiv (V_\partial ,F_\partial ,E_\partial )\), and \(\mathcal {W}\) denotes the boundary of \(\mathscr {G}_\partial \) (just below the dashed line in Fig. 4). Write \(\underline{{\sigma }}_\partial \) for a coloring on \(\mathcal {G}_\partial \) (including \(\mathcal {W}\)), and let

$$\begin{aligned} \varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial ) \equiv \prod _{v\in V_\partial } \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in F_\partial } \hat{\Phi }^\text {lit}((\underline{{\sigma }}\oplus \underline{{\texttt {L}}})_{\delta a}) \prod _{e\in E_\partial } \bar{\Phi }(\sigma _e)\,. \end{aligned}$$

By matching \(\delta \mathcal {N}\) to \(\mathcal {W}\) (along the dashed line in Fig. 4), the graphs \(\mathscr {G}_\partial \) and \(\mathscr {N}\) combine to form the original instance \(\mathscr {G}\). If \(\underline{{\sigma }}\) is a valid coloring on \(\mathscr {G}\), then \(\underline{{\sigma }}_{\delta \mathcal {N}}\) and \(\underline{{\sigma }}_\mathcal {W}\) must agree, and we have

$$\begin{aligned} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }}) =\varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial ) \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}| \underline{{\texttt {L}}}_\mathcal {N})\,. \end{aligned}$$
(53)

Let \(\dot{h}^\text {tr}(\underline{{\sigma }}_{\delta \mathcal {N}})=\dot{h}^\text {tr}\) be the empirical measure of the spins \((\dot{\sigma }_e)_{e\in \delta \mathcal {N}}\). Given initial state \((\mathscr {G},Y,\underline{{\sigma }})\), we take one step of the Markov chain as follows:

  1.

    Detach \(\mathscr {N}\) from \(\mathscr {G}\). On \(\mathscr {N}\), sample a new assignment \((\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\) from the probability measure

    $$\begin{aligned} p( (\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\,|\, (\underline{{\texttt {L}}}_\mathcal {N},\underline{{\sigma }}_\mathcal {N})) =\frac{\mathbf {1}\{ \dot{h}^\text {tr}(\underline{{\eta }}_{\delta \mathcal {N}})=\dot{h}^\text {tr}\} \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }{z(|Y|,\dot{h}^\text {tr})}\end{aligned}$$
    (54)

    where the denominator is the normalizing constant obtained by summing over all possible \((\underline{{\texttt {L}}}'_\mathcal {N},\underline{{\eta }}_\mathcal {N})\).

  2.

    Form the new graph \(\mathscr {G}'\) by sampling a uniformly random matching of \(\delta \mathcal {N}\) with \(\mathcal {W}\), subject to the constraint that \(e\in \mathcal {W}\) must be matched to \(e'\in \delta \mathcal {N}\) with \(\dot{\sigma }_e=\dot{\eta }_{e'}\). The number of such matchings depends only on |Y| and \(\dot{h}^\text {tr}\), so we denote it as \({\mathcal {M}}(|Y|,\dot{h}^\text {tr})\). For each matched pair \((e,e')\) where \(\hat{\sigma }_e\ne \hat{\eta }_{e'}\), let \(\mathscr {T}=\mathscr {T}(e)\) be the radius-2T neighborhood of e in the graph \(\mathscr {G}_\partial \). Let

    $$\begin{aligned} \underline{{\eta }}_{\mathscr {T}}\equiv \mathrm {\textsf {update}}(\underline{{\sigma }}_{\mathscr {T}},\eta _e;\mathscr {T}) \end{aligned}$$

    and note that, since \(\underline{{\sigma }}\) is a valid T-coloring, \(\underline{{\eta }}_{\mathscr {T}}\) and \(\underline{{\sigma }}_{\mathscr {T}}\) must agree at the boundary of \(\mathscr {T}\). Finally, on the rest of \(\mathcal {G}_\partial \) outside the radius-2T neighborhood of \(\mathcal {W}\), we simply take \(\underline{{\eta }}\) and \(\underline{{\sigma }}\) to be the same.

The state of the Markov chain after one step is \((\mathscr {G}',Y,\underline{{\eta }})\) where \(\underline{{\eta }}\) is a valid T-coloring on \(\mathscr {G}'\).

Lemma 4.5

Suppose we have a sampling mechanism for a random subset of variables Y in \(\mathscr {G}\) such that, whenever \((\mathscr {G},Y,\underline{{\sigma }})\) and \((\mathscr {G}',Y,\underline{{\eta }})\) appear in the same orbit of the Markov chain, we have

$$\begin{aligned} \mathbb {P}(Y\,|\,\mathscr {G})=\mathbb {P}(Y\,|\,\mathscr {G}')\,. \end{aligned}$$
(55)

A reversing measure for the Markov chain is then given by \(\mu (\mathscr {G},Y,\underline{{\sigma }}) = \mathbb {P}(\mathscr {G}) \mathbb {P}(Y\,|\,\mathscr {G}) \varvec{w}^\text {lit}_{\mathscr {G}}(\underline{{\sigma }})^\lambda \).

Proof

Given \(\mathrm {\textsc {a}}=(\mathscr {G},Y,\underline{{\sigma }})\), let \(\mathrm {\textsc {b}}=(\mathscr {G}',Y,\underline{{\eta }})\) be any state reachable from \(\mathrm {\textsc {a}}\) in a single step of the chain. By the factorization (53), together with assumption (55) and the fact that \(\mathbb {P}(\mathscr {G})=\mathbb {P}(\mathscr {G}')\),

$$\begin{aligned} \frac{\mu (\mathrm {\textsc {a}})}{\mu (\mathrm {\textsc {b}})} =\frac{\mathbb {P}(\mathscr {G})\mathbb {P}(Y\,|\,\mathscr {G}) \varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\mathbb {P}(\mathscr {G}')\mathbb {P}(Y\,|\,\mathscr {G}') \varvec{w}^\text {lit}_\partial (\underline{{\eta }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda } =\frac{\varvec{w}^\text {lit}_\partial (\underline{{\sigma }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\varvec{w}^\text {lit}_\partial (\underline{{\eta }}_\partial )^\lambda \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda } =\frac{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda }{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }\,, \end{aligned}$$

where the last identity is by Lemma 4.3. On the other hand, with \(\pi \) denoting the transition probabilities for the Markov chain, (54) implies

$$\begin{aligned}\frac{\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}})}{\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})} =\frac{ \varvec{w}^\text {lit}_\mathcal {N}(\underline{{\eta }}_\mathcal {N}|\underline{{\texttt {L}}}'_\mathcal {N})^\lambda }{{\mathcal {M}}(|Y|,\dot{h}^\text {tr}) z(|Y|,\dot{h}^\text {tr})} \frac{{\mathcal {M}}(|Y|,\dot{h}^\text {tr}) z(|Y|,\dot{h}^\text {tr})}{\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda } =\frac{\mu (\mathrm {\textsc {b}})}{\mu (\mathrm {\textsc {a}})}\,.\end{aligned}$$

Rearranging proves reversibility, \(\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}}) =\mu (\mathrm {\textsc {b}})\pi (\mathrm {\textsc {b}},\mathrm {\textsc {a}})\). (We remark that since the Markov chain breaks up into many disjoint orbits, the reversing measure \(\mu \) is not unique.)\(\square \)
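The mechanism is easy to check numerically: within a single orbit the chain resamples its state with probability proportional to \(\varvec{w}^\lambda \), independently of the current state, and any such independence sampler is reversible for the measure proportional to \(\varvec{w}^\lambda \). A toy verification in Python, with made-up state weights:

```python
from itertools import product

def detailed_balance_gap(weights, lam):
    """For a chain whose next state is drawn with probability proportional
    to weight^lam (independently of the current state), return
    max |mu(a) pi(a,b) - mu(b) pi(b,a)| with mu proportional to weight^lam."""
    w = {s: v ** lam for s, v in weights.items()}
    Z = sum(w.values())
    mu = {s: ws / Z for s, ws in w.items()}
    pi = {(a, b): w[b] / Z for a, b in product(w, repeat=2)}
    return max(abs(mu[a] * pi[a, b] - mu[b] * pi[b, a])
               for a, b in product(w, repeat=2))

gap = detailed_balance_gap({"x": 1.0, "y": 2.0, "z": 0.5}, lam=0.7)
```

The gap vanishes identically, since \(\mu (\mathrm {\textsc {a}})\pi (\mathrm {\textsc {a}},\mathrm {\textsc {b}})\) reduces to the symmetric quantity \(w(\mathrm {\textsc {a}})^\lambda w(\mathrm {\textsc {b}})^\lambda /Z^2\), exactly as in the display above.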

4.4 From graph to tree optimizations

If Y satisfies condition (51) and we define \(\mathscr {N}=\mathscr {N}(Y)=(\mathcal {N},\underline{{\texttt {L}}})\) as before, then \(\mathcal {N}\) consists of \(|Y|\equiv s\) disjoint copies of the tree \(\mathcal {D}\) shown in Fig. 6. Recall from Definition 4.1 the definition of \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\), and note \(H^\text {sm}\) depends only on \(\underline{{\sigma }}_\mathcal {N}\). For any \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) we let \(\varvec{Z}(H^\text {sm};\mathscr {N})\) be the partition function of all colorings on \(\mathscr {N}\) with empirical measure \(H^\text {sm}\)—the only randomness comes from the literals \(\underline{{\texttt {L}}}_\mathcal {N}\). The expected number of valid colorings is

$$\begin{aligned} \mathbb {E}\varvec{Z}_{\lambda =0,T}(H^\text {sm};\mathscr {N}) =\exp \{sd\langle \hat{H}^\text {sm},{\hat{v}}\rangle \} \left( {\begin{array}{c}s\\ s\dot{H}^\text {sm}\end{array}}\right) \left( {\begin{array}{c}ds\\ ds\hat{H}^\text {sm}\end{array}}\right) \bigg / \left( {\begin{array}{c}ds\\ ds\bar{H}^\text {sm}\end{array}}\right) \, \end{aligned}$$

which by Stirling’s formula is \(s^{O(1)} \exp \{s\,\varvec{\Sigma }^\text {tr}(H^\text {sm})\}\) (see (48)). Any valid coloring \(\underline{{\sigma }}_\mathcal {N}\) with empirical measure \(H^\text {sm}\) contributes weight \(\varvec{w}^\text {lit}_\mathcal {N}(\underline{{\sigma }}_\mathcal {N}|\underline{{\texttt {L}}}_\mathcal {N})^\lambda =\exp \{s\lambda \varvec{s}^\text {tr}(H^\text {sm})\}\), so altogether

$$\begin{aligned} \mathbb {E}\varvec{Z}(H^\text {sm};\mathscr {N}) = s^{O(1)}\exp \{s[\varvec{\Sigma }^\text {tr}(H^\text {sm})+\lambda \varvec{s}^\text {tr}(H^\text {sm})]\} =s^{O(1)}\exp \{s\,\varvec{\Lambda }(H^\text {sm})\}\,, \end{aligned}$$
(56)

with \(\varvec{\Lambda }\) as in (49). (This calculation clarifies why we refer to \(\varvec{\Sigma }^\text {tr},\varvec{\Lambda }\) as the “tree analogues” of \(\varvec{\Sigma },\varvec{F}\).)
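The Stirling step can be sanity-checked numerically: \(\tfrac{1}{s}\ln \left( {\begin{array}{c}s\\ s\dot{H}\end{array}}\right) \rightarrow \mathcal {H}(\dot{H})\) as \(s\rightarrow \infty \), with a correction of size \(O((\ln s)/s)\), which is the source of the \(s^{O(1)}\) prefactor. A short Python sketch (the measure `p` and the value of `s` are arbitrary):

```python
from math import comb, log

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i ln p_i (natural log)."""
    return -sum(q * log(q) for q in p if q > 0)

def log_multinomial(s, counts):
    """ln of the multinomial coefficient s! / prod_i counts_i!,
    computed as a product of binomial coefficients."""
    out, rem = 0.0, s
    for c in counts:
        out += log(comb(rem, c))
        rem -= c
    return out

# (1/s) ln multinom(s; s p) approaches H(p) from below; the deficit,
# of order (ln s)/s, is absorbed into the s^{O(1)} prefactor.
p = (0.5, 0.25, 0.25)
s = 400
counts = tuple(int(s * q) for q in p)
rate = log_multinomial(s, counts) / s
```

With these values the exponential rate `rate` already agrees with \(\mathcal {H}(p)\approx 1.04\) to within a few percent.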

Fig. 6

If \(s\equiv |Y|\) and \(\mathscr {N}=\mathscr {N}(Y)=(\mathcal {N},\underline{{\texttt {L}}})\), the graph \(\mathcal {N}\) consists of s disjoint copies of the tree \(\mathcal {D}\) shown here. The boundary \(\delta \mathcal {D}\) is highlighted in purple (color figure online)

Fix \(H^\text {sm}\in \varvec{\Delta }^\text {sm}\) and let \(A(H^\text {sm})\) be the set of all \((\mathscr {G},Y,\underline{{\sigma }})\) with \(H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})=H^\text {sm}\). Let \(B(H^\text {sm})\) be the set of states reachable in one step from \(A(H^\text {sm})\). Then, for all \(\mathrm {\textsc {b}}\in B(H^\text {sm})\),

$$\begin{aligned} \pi (\mathrm {\textsc {b}},A(H^\text {sm})) =\frac{\mathbb {E}\varvec{Z}(H^\text {sm};\mathscr {N})}{\sum _{H'\in \varvec{\Delta }^\text {sm}} \mathbf {1}\{\dot{h}^\text {tr}(H')=\dot{h}^\text {tr}(H^\text {sm})\} \mathbb {E}\varvec{Z}(H';\mathscr {N})} =s^{O(1)}\exp \{-s\,\varvec{\Xi }(H^\text {sm})\}\,. \end{aligned}$$
(57)

This is the key calculation for Theorem 4.2. To complete the proof, what remains is to produce a sampling mechanism \(\mathbb {P}(Y\,|\,\mathscr {G})\) which satisfies our earlier conditions (51) and (55), together with some concentration bound to ensure that in most cases Y is large and \(H^\text {sm}\approx H^\text {sy}\). We formalize this as follows:

Definition 4.6

(sampling mechanism) Let \(\mathbb {P}(Y\,|\,\mathscr {G})\) be the probability of sampling the subset of variables Y from the nae-sat instance \(\mathscr {G}\). We call this a good sampling mechanism if the following holds: first, whenever \((\mathscr {G},Y,\underline{{\sigma }})\) and \((\mathscr {G}',Y,\underline{{\eta }})\) appear in the same orbit of the Markov chain, we must have \(\mathbb {P}(Y\,|\,\mathscr {G})=\mathbb {P}(Y\,|\,\mathscr {G}')\) (used in Lemma 4.5 to show reversibility). Next we require that for every \(\mathscr {G}\), and for every Y with \(\mathbb {P}(Y\,|\,\mathscr {G})\) positive, the neighborhoods \(B_{2T}(v)\) for \(v\in Y\) are mutually disjoint trees (condition (51), required for defining the Markov chain). Lastly we require that for all but an exceptional set \({\mathscr {B}}\) of “bad” nae-sat instances, with \(\mathbb {P}(\mathscr {G}\in {\mathscr {B}}) \leqslant \exp \{-n(\ln n)^{1/2}\}\), we have

$$\begin{aligned} \sum _{Y:n\epsilon \leqslant |Y|\leqslant 4n\epsilon } \mathbb {P}(Y\,|\,\mathscr {G}) {\mathbf {1}}\bigg \{ \Big \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }}) -H^\text {sy}(\mathcal {G},\underline{{\sigma }})\Big \Vert \leqslant \frac{1}{(\ln \ln n)^{1/2}}\bigg \} \geqslant \frac{1}{2} \end{aligned}$$
(58)

for all colorings \(\underline{{\sigma }}\) on \(\mathscr {G}\).

Proposition 4.7

Assume the existence of a good sampling mechanism in the sense of Definition 4.6. Then, for \(\epsilon \) small enough (depending only on \(d,k,T\)), it holds for all \(H\in \varvec{\Delta }\) that

$$\begin{aligned}\frac{\mathbb {E}\varvec{Z}(H)}{\mathbb {E}\varvec{Z}(\mathbf {N}_{H,\epsilon })} \leqslant \frac{\exp \{o_n(1)\}}{\exp \{n\epsilon \,\varvec{\Xi }(H^\text {sy})\}} \end{aligned}$$

where \(\mathbf {N}_{H,\epsilon }\equiv \{H'\in \varvec{\Delta }:\Vert H-H' \Vert \leqslant \epsilon (dk)^{2T}\}\) and \(H^\text {sy}\) is the symmetrization of H from Definition 4.1.

Proof

Abbreviate \(\delta \equiv 1/(\ln \ln n)^{1/2}\). Given \(H\in \varvec{\Delta }\) and its symmetrization \(H^\text {sy}\in \varvec{\Delta }\cap \varvec{\Delta }^\text {sm}\), let

$$\begin{aligned}A=\bigg \{ (\mathscr {G},Y,\underline{{\sigma }}): |Y|\geqslant n\epsilon \text { and } \Big \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})-H^\text {sy}\Big \Vert \leqslant 2\delta \bigg \}\,.\end{aligned}$$

With \(\mu \) the reversing measure from Lemma 4.5, we have

$$\begin{aligned} \mu (A) \geqslant \sum _{\mathscr {G}\notin {\mathscr {B}}}\mathbb {P}(\mathscr {G}) \sum _{\underline{{\sigma }}} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda \sum _{Y:|Y|\geqslant n\epsilon } \mathbb {P}(Y\,|\,\mathscr {G}) {\mathbf {1}}\bigg \{\quad \begin{array}{c} \Vert H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }}) -H^\text {sy}(\mathcal {G},\underline{{\sigma }}) \Vert \leqslant \delta \\ \text {and } \Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \delta \end{array}\quad \bigg \}\,, \end{aligned}$$

using \(\Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \Vert H^\text {sy}(\mathcal {G},\underline{{\sigma }})-H^\text {sy}\Vert \) together with the triangle inequality. Applying (58) gives

$$\begin{aligned}\mu (A) \geqslant \frac{1}{2} \sum _{\mathscr {G}\notin {\mathscr {B}}}\mathbb {P}(\mathscr {G}) \sum _{\underline{{\sigma }}} \varvec{w}^\text {lit}_\mathscr {G}(\underline{{\sigma }})^\lambda {\mathbf {1}}\bigg \{ \Vert H(\mathcal {G},\underline{{\sigma }})-H \Vert \leqslant \delta \bigg \} \geqslant \frac{\mathbb {E}[\varvec{Z}(H);\mathscr {G}\notin {\mathscr {B}}]}{2} \geqslant \frac{\mathbb {E}\varvec{Z}(H)}{4}\,, \end{aligned}$$

where the last step follows from the bound on \(\mathbb {P}(\mathscr {G}\in {\mathscr {B}})\). If B is the set of states reachable from A in one step of the Markov chain, then crudely \(\mu (B)\leqslant \mathbb {E}\varvec{Z}(\mathbf {N}_{H,\epsilon })\), and

$$\begin{aligned}\max _{{\textsc {b}}\in B} \pi ({\textsc {b}},A) \leqslant \frac{\exp \{o_n(1)\}}{\exp \{n\epsilon \,\varvec{\Xi }(H^\text {sy})\}} \end{aligned}$$

by our earlier calculation (57). The result follows by substituting into (46).\(\square \)

4.5 Sampling mechanism

To complete the proof of Theorem 4.2, it remains for us to define a sampling mechanism satisfying the conditions of Definition 4.6. To this end, given a \((d,k)\)-regular graph \(\mathcal {G}\), let \(V_t\subseteq V\) be the subset of variables \(v\in V\) such that the t-neighborhood \(B_t(v)\) around v is a tree. Recall the following form of the Chernoff bound: if X is a binomial random variable with mean \(\mu \), then for all \(t\geqslant 1\) we have \(\mathbb {P}(X\geqslant t\mu )\leqslant \exp \{ - t\mu \ln (t/e) \}\).
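This form of the bound can be sanity-checked numerically against the exact binomial tail. The sketch below does so in Python; the parameters n, p, t are arbitrary illustrative choices, not taken from the proof.

```python
# Numerical sanity check of the stated Chernoff form: for X ~ Bin(n, p)
# with mean mu = n*p and any t >= 1,  P(X >= t*mu) <= exp(-t*mu*ln(t/e)).
# The parameters n, p, t below are arbitrary illustrative choices.
import math

def binom_upper_tail(n, p, a):
    """Exact P(X >= a) for X ~ Bin(n, p)."""
    a = math.ceil(a)
    return math.fsum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(a, n + 1)
    )

n, p, t = 1000, 0.01, 5.0
mu = n * p  # mean of the binomial, here 10
exact = binom_upper_tail(n, p, t * mu)
bound = math.exp(-t * mu * math.log(t / math.e))
assert 0 < exact <= bound  # the Chernoff bound dominates the exact tail
```

For these parameters the bound is roughly \(6\times 10^{-14}\), already far smaller than the error probabilities tolerated elsewhere in the argument.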

Lemma 4.8

Suppose \(\mathcal {G}\) is sampled from the \((d,k)\)-regular configuration model on n vertices. For any fixed t we have \(\mathbb {P}( |V {\setminus } V_t| \geqslant n/(\ln \ln n) )\leqslant \exp \{ -n (\ln n)^{1/2} \}\) for n large enough (depending on \(d,k,t\)).

Proof

Let \(\gamma \) count the total number of cycles in \(\mathcal {G}\) of length at most 2t. If \(v\notin V_t\) then v must certainly lie within distance t of one of these cycles, so crudely we have

$$\begin{aligned} |V{\setminus } V_t| \leqslant 2t (dk)^t\gamma \,. \end{aligned}$$
(59)

Consider breadth-first search exploration in \(\mathcal {G}\) started from an arbitrary variable, say \(v=1\). At each step of the exploration we reveal one edge, so the exploration takes nd steps total. Conditioned on everything revealed in the first \(s\) steps, the chance that the edge revealed at step \(s+1\) will form a new cycle of length \(\leqslant 2t\) is upper bounded by \((dk)^{2t}/(nd-s)\). It follows that the total number of cycles revealed up to time \(nd(1-\delta )\) is stochastically dominated by a binomial random variable

$$\begin{aligned} \gamma '\sim {\mathrm {Bin}}\bigg ( nd(1-\delta ), \frac{(dk)^{2t}}{nd\delta }\bigg )\,. \end{aligned}$$

The final \(nd\delta \) exploration steps form at most \(nd\delta \) cycles, so \(\gamma \leqslant \gamma ' + nd\delta \). Applying the Chernoff bound (as stated above) with \(\delta =1/(\ln \ln n)^2\), we obtain

$$\begin{aligned} \mathbb {P}(\gamma \geqslant 2nd\delta ) \leqslant \mathbb {P}(\gamma ' \geqslant nd\delta ) \leqslant \exp \bigg \{ -nd\delta \ln \bigg (\frac{nd\delta ^2}{e (dk)^{2t}}\bigg ) \bigg \} \leqslant \exp \{-n(\ln n)^{1/2}\} \end{aligned}$$

for large enough n. Recalling (59) gives the claimed bound.\(\square \)
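The phenomenon behind Lemma 4.8, that short cycles in a sparse random regular graph touch only a vanishing fraction of vertices, can be illustrated in code. The sketch below is a toy version only: a plain d-regular configuration model stands in for the \((d,k)\)-regular factor graphs of the lemma, and the names `config_model` and `ball_is_tree` are ad hoc.

```python
# Toy illustration of the lemma's quantity |V \ V_t| (not the exact model:
# a plain d-regular configuration model stands in for the (d,k)-regular
# factor graph). All names and parameters here are ad hoc.
import random
from collections import deque

def config_model(n, d, rng):
    """d-regular configuration model: pair half-edges uniformly at random."""
    half = [v for v in range(n) for _ in range(d)]
    rng.shuffle(half)
    adj = {v: [] for v in range(n)}
    for i in range(0, n * d, 2):
        u, w = half[i], half[i + 1]
        adj[u].append(w)
        adj[w].append(u)  # self-loops and multi-edges are allowed
    return adj

def ball_is_tree(adj, v, t):
    """True iff breadth-first exploration of B_t(v) never closes a cycle."""
    seen = {v}
    queue = deque([(v, None, 0)])
    while queue:
        u, parent, depth = queue.popleft()
        if depth == t:
            continue
        skipped_parent = False
        for w in adj[u]:
            if w == parent and not skipped_parent:
                skipped_parent = True  # one edge back to the parent is fine
                continue
            if w in seen:
                return False  # a revisited vertex closes a cycle
            seen.add(w)
            queue.append((w, u, depth + 1))
    return True

rng = random.Random(0)
n, d, t = 2000, 3, 2
adj = config_model(n, d, rng)
bad = sum(1 for v in range(n) if not ball_is_tree(adj, v, t))
assert bad < n // 2  # locally tree-like: short cycles touch few vertices
```

The expected number of cycles of length at most 2t stays bounded as n grows, so the fraction of non-tree-like vertices tends to zero, in line with the much stronger quantitative statement of the lemma.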

Given an instance \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\), let \(V_t\) be as defined above and take \(V'\equiv V_{4T}\). We then take i.i.d. random variables \(I_v\sim \mathrm {Ber}(\epsilon ')\) indexed by \(v\in V'\) (for \(\epsilon '\) a constant to be determined) and let

$$\begin{aligned} Y_v\equiv \mathbf {1}\{ I_v=1, \text { and } I_u=0 \text { for all } u\in B_{4T}(v){\setminus }\{v\}\}\,. \end{aligned}$$
(60)

We then define \(\mathbb {P}(Y\,|\,\mathscr {G})\) to be the law of the set \(Y=\{v\in V' : Y_v=1\}\). Note that the random variables \(Y_v\), for \(v\in V'\), all have the same expected value, so we can define \(\epsilon \equiv (\mathbb {E}Y_v)/2\).
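A minimal sketch of this thinning, assuming the graph is given as an adjacency dictionary; the cycle graph and all parameters below are illustrative stand-ins, not the model's factor graph.

```python
# Sketch of the thinning (60): each v keeps Y_v = 1 iff its own coin I_v
# is 1 and every other coin in its radius-R ball is 0. Surviving vertices
# are then pairwise at distance greater than R. Toy graph: a cycle.
import random
from collections import deque

def ball(adj, v, radius):
    """Vertices within the given graph distance of v (including v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def sample_Y(adj, V_prime, radius, eps_prime, rng):
    I = {v: rng.random() < eps_prime for v in V_prime}
    return {
        v for v in V_prime
        if I[v] and not any(I.get(u, False) for u in ball(adj, v, radius) - {v})
    }

n, R = 60, 4  # R plays the role of 4T; eps_prime below is arbitrary
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
Y = sample_Y(adj, set(range(n)), R, 0.2, random.Random(1))
for v in Y:  # selected vertices exclude one another from their balls
    assert ball(adj, v, R) & Y == {v}
```

The final assertion records the property the construction is designed for: two selected vertices are never within distance R of each other, so their radius-\(\lfloor R/2\rfloor \) balls are disjoint, which is what condition (51) asks of the neighborhoods \(B_{2T}(v)\).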

Lemma 4.9

Define \({\mathscr {B}}\) to be the set of all \(\mathscr {G}=(\mathcal {G},\underline{{\texttt {L}}})\) with \(|V{\setminus } V_{4T}|\geqslant n/(\ln \ln n)\). For the sampling mechanism described above, condition (58) holds for any \(\mathscr {G}\notin {\mathscr {B}}\) and any coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\).

Proof

Fix an instance \(\mathscr {G}\notin {\mathscr {B}}\) and a coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\). Recalling Definition 4.1, for each \(v\in V\) denote

$$\begin{aligned} X_v\equiv ({\dot{X}}_v,{\hat{X}}_v,{\bar{X}}_v) \equiv H^\text {sm}(\mathcal {G},\{v\},\underline{{\sigma }})\,. \end{aligned}$$

Assume without loss that \(V'\equiv V_{4T} = [n']\equiv \{v_1,\ldots ,v_{n'}\}\), and for \(0\leqslant \ell \leqslant n'\) let \(\mathscr {F}_\ell \) denote the sigma-field generated by \(Y_1,\ldots ,Y_\ell \). Consider

$$\begin{aligned}S \equiv \sum _{v\leqslant n'}A_v Y_v\end{aligned}$$

where we can take different choices of \(A_v\) to prove various different bounds:

  • taking \(A_v=1\) gives \(S=|Y|\) and \(\mathbb {E}S = 2n'\epsilon \);

  • taking \(A_v={\dot{X}}_v(\underline{{\eta }})\) gives \(S=|Y|\dot{H}^\text {sm}(\underline{{\eta }})\) and \(|\mathbb {E}S - 2n'\epsilon \dot{H}(\underline{{\eta }})|\leqslant n-n'\);

  • taking \(A_v={\hat{X}}_v(\underline{{\eta }})\) gives \(S=|Y|\hat{H}^\text {sm}(\underline{{\eta }})\) and \(|\mathbb {E}S - 2n'\epsilon \hat{H}(\underline{{\eta }})| \leqslant n-n'\);

  • taking \(A_v={\bar{X}}_v(\eta )\) gives \(S=|Y|\bar{H}^\text {sm}(\eta )\) and \(|\mathbb {E}S-2n'\epsilon \bar{H}(\eta )|\leqslant n-n'\),

where we recall that \(n-n'=|V{\setminus } V_{4T}|\leqslant n/(\ln \ln n)\). Consider the Doob martingale

$$\begin{aligned} M_\ell \equiv \mathbb {E}(S\,|\,\mathscr {F}_\ell ) \equiv \sum _{v\leqslant n'} A_v \, \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell )\,. \end{aligned}$$

For \(\ell \leqslant n'\), if v lies at distance greater than 8T from any variable in \([\ell ]\equiv \{v_1,\ldots ,v_\ell \}\), then

$$\begin{aligned} \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell ) = \mathbb {E}Y_v = 2\epsilon \,. \end{aligned}$$

Thus, the only possibility for \(\mathbb {E}(Y_v\,|\,\mathscr {F}_{\ell +1})\ne \mathbb {E}(Y_v\,|\,\mathscr {F}_\ell )\) is that v lies within distance 8T of the vertex \(v_{\ell +1}\). The number of such v is at most \((dk)^{8T}\), so we conclude

$$\begin{aligned} |M_{\ell +1}-M_\ell | \leqslant (dk)^{8T} \Vert A \Vert _\infty \leqslant (dk)^{8T}\,. \end{aligned}$$

It follows by the Azuma–Hoeffding martingale inequality that

$$\begin{aligned} \mathbb {P}(| S - \mathbb {E}S | \geqslant x) \leqslant \exp \bigg \{ -\frac{x^2}{2 n' (dk)^{16T}} \bigg \}\,. \end{aligned}$$

The result follows by a union bound over the choices of A listed above, combined with our estimates on \(\mathbb {E}S\) for each choice of A.\(\square \)
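The Azuma–Hoeffding inequality used above can be checked numerically in the simplest bounded-increment case, a fair \(\pm 1\) random walk (increments bounded by 1), where the two-sided tail with the conventional factor 2 is an exact binomial sum. Parameters n, x below are arbitrary illustrative choices.

```python
# Check of the Azuma-Hoeffding shape: for a fair +/-1 random walk S_n,
# P(|S_n| >= x) <= 2*exp(-x^2/(2n)), compared against the exact tail.
import math

def walk_upper_tail(n, x):
    """Exact P(S_n >= x): S_n = 2B - n with B ~ Bin(n, 1/2)."""
    a = math.ceil((n + x) / 2)
    return math.fsum(math.comb(n, i) for i in range(a, n + 1)) / 2**n

n, x = 100, 30
exact_two_sided = 2 * walk_upper_tail(n, x)  # symmetry of the fair walk
azuma = 2 * math.exp(-x**2 / (2 * n))
assert exact_two_sided <= azuma
```

Here the exact tail is about 0.0035 against an Azuma bound of about 0.022: the inequality is loose but of the right exponential order, which is all the proof above requires.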

Proof of Theorem 4.2

It follows from Lemmas 4.8 and 4.9 that the sampling mechanism described by (60) satisfies the conditions of Definition 4.6. The result then follows by taking \(n\rightarrow \infty \) in Proposition 4.7. \(\square \)

5 Solution of tree optimization

From Theorem 4.2 we see that if \(H\in \varvec{\Delta }\) is a local maximizer for the first moment exponent \(\varvec{F}=\varvec{F}_{\lambda ,T}\), then its symmetrization \(H^\text {sy}\) must be a zero of the function \(\varvec{\Xi }=\varvec{\Xi }_{\lambda ,T}\) defined by (50). The analogous statement holds in the second moment with \(\varvec{F}_2\) and \(\varvec{\Xi }_2\). The functions \(\varvec{\Xi },\varvec{\Xi }_2\) correspond to tree optimization problems, which we solve in this section by relating them to the bp recursions for the coloring model.

Proposition 5.1

For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), let \(H_\star \in \varvec{\Delta }\) and \(H_\bullet \in \varvec{\Delta }_2\) be as in Definition 5.6 below.

  (a)

    On \(\{H\in \mathbf {N}_\circ :H = H^\text {sy}\}\), \(\varvec{\Xi }\) is uniquely minimized at \(H=H_\star \), with \(\varvec{\Xi }(H_\star )=0\).

  (b)

    On \(\{H\in \mathbf {N}_\text {se}:H = H^\text {sy}\}\), \(\varvec{\Xi }_2\) is uniquely minimized at \(H=H_\bullet \), with \(\varvec{\Xi }_2(H_\bullet )=0\).

Moreover there is a positive constant \(\epsilon =\epsilon (d,k,T)\) such that

  (1)

    \(\varvec{\Xi }(H) \geqslant \epsilon \Vert {H-H_\star } \Vert ^2\) for all \(H\in \varvec{\Delta }\) with \(H=H^\text {sy}\) and \(\Vert {H-H_\star } \Vert \leqslant \epsilon \), and

  (2)

    \(\varvec{\Xi }_2(H) \geqslant \epsilon \Vert {H-H_\bullet } \Vert ^2\) for all \(H\in \varvec{\Delta }_2\) with \(H=H^\text {sy}\) and \(\Vert H-H_\bullet \Vert \leqslant \epsilon \).

5.1 Tree optimization problem

Recall from the previous section that in the local update procedure, we sample a subset of variables Y and consider its neighborhood \(\mathscr {N}=(\mathcal {N},\underline{{\texttt {L}}}_\mathcal {N})\). Writing \(s=|Y|\), the graph \(\mathcal {N}\) is the disjoint union of \(\mathcal {D}_1,\ldots ,\mathcal {D}_s\) where each \(\mathcal {D}_i\) is a copy of the tree \(\mathcal {D}\) of Fig. 6. Let \(\varvec{\Pi }\) be the space of probability measures on colorings of \(\mathcal {D}\). Any coloring \(\underline{{\sigma }}_\mathcal {N}\) can be summarized by \(\nu \in \varvec{\Pi }\) where \(\nu (\underline{{\sigma }}_\mathcal {D})\) is the fraction of copies \(\mathcal {D}_i\) with \(\underline{{\sigma }}_{\mathcal {D}_i}=\underline{{\sigma }}_\mathcal {D}\). The sample empirical measure \(H^\text {sm}=H^\text {sm}(\mathcal {G},Y,\underline{{\sigma }})\) can be obtained as a linear projection of \(\nu \), and we hereafter denote this relation by \(H^\text {sm}=H^\text {tr}(\nu )\). Recalling (52), we have \(\mathbb {E}^{\text {lit}} [(\varvec{w}^\text {lit}_\mathscr {N}(\underline{{\sigma }}_\mathcal {N}))^\lambda ] = \varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_{\mathcal {D}_1})^\lambda \cdots \varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_{\mathcal {D}_s})^\lambda \) where

$$\begin{aligned}\varvec{w}_{\mathcal {D}}(\underline{{\sigma }}_\mathcal {D}) = \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{e\in \delta v}\bigg \{ \bar{\Phi }(\sigma _e) \hat{\Phi }(\underline{{\sigma }}_{\delta a(e)}) \bigg \}\,.\end{aligned}$$

Lemma 5.2

The function \(\varvec{\Lambda }\) of (49) is concave on \(\varvec{\Delta }^\text {sm}\), and can be expressed as

$$\begin{aligned} \varvec{\Lambda }(H)=\sup \Big \{ \mathcal {H}(\nu )+\lambda \langle \ln \varvec{w}_\mathcal {D},\nu \rangle :\nu \in \varvec{\Pi }\text { with } H^\text {tr}(\nu )=H \Big \}\,. \end{aligned}$$
(61)

Proof

The function \(\varvec{\Lambda }(H)\) is the sum of \(\varvec{\Sigma }^\text {tr}(H)\) and the linear function \(\lambda \varvec{s}^\text {tr}(H)\), so it suffices to show that \(\varvec{\Sigma }^\text {tr}\) is concave on \(\varvec{\Delta }^\text {sm}\). For \(H=(\dot{H},\hat{H},\bar{H})\in \varvec{\Delta }^\text {sm}\), if \(X\in \Omega ^k\) is a random variable with law \(\hat{H}\), then the first coordinate \(X_1\) has marginal law \(\bar{H}\) by (47). It follows that for any \(H\in \varvec{\Delta }^\text {sm}\) we can express

$$\begin{aligned} \varvec{\Sigma }^\text {tr}(H) =\mathcal {H}(\dot{H})+d\mathcal {H}(X)-d\mathcal {H}(X_1)+\varvec{v}(H) =\mathcal {H}(\dot{H})+d\mathcal {H}(X\,|\,X_1)+\varvec{v}(H)\,. \end{aligned}$$

The entropy function is concave and \(\varvec{v}\) is linear, so this proves that \(\varvec{\Sigma }^\text {tr}\) (hence \(\varvec{\Lambda }\)) is indeed concave on \(\varvec{\Delta }^\text {sm}\). In fact this can be argued alternatively, as follows. Recalling (56), note that for \(H\in \varvec{\Delta }^\text {sm}\) we have

$$\begin{aligned}s^{O(1)}\exp \{s\varvec{\Lambda }(H)\} =\mathbb {E}\varvec{Z}(H;\mathscr {N}) =\sum _{\nu \in \varvec{\Pi }} \mathbf {1}\{H^\text {tr}(\nu )=H\} \left( {\begin{array}{c}s\\ s\nu \end{array}}\right) (\varvec{w}_\mathcal {D})^{\lambda \nu }\,. \end{aligned}$$

Expanding with Stirling’s formula gives the representation (61), which also implies concavity of \(\varvec{\Lambda }\).\(\square \)

Thus, for \(H\in \varvec{\Delta }^\text {sm}\), we have \(\varvec{\Xi }(H) = \varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))- \varvec{\Lambda }(H)\) where \(\varvec{\Lambda }\) is given by (61), and

$$\begin{aligned} \varvec{\Lambda }^\text {op}({\dot{h}}) = \sup \Big \{\mathcal {H}(\nu ) + \lambda \langle \ln \varvec{w}_{\mathcal {D}},\nu \rangle : \nu \in \varvec{\Pi }\text { with } \dot{h}^\text {tr}(H^\text {tr}(\nu ))={\dot{h}} \Big \}\,. \end{aligned}$$
(62)

Both (61) and (62) fall in the general category of entropy maximization problems subject to linear constraints. In “Appendix C” we review basic calculations for problems of this type. The discussion there, in particular Remark C.7, implies that for any \({\dot{h}}\), there is a unique measure \(\nu =\nu ^\text {op}({\dot{h}})\) achieving the maximum in (62). Moreover, there exists a probability measure \({\dot{q}}\) on \(\dot{\Omega }_T\)—serving the role of Lagrange multipliers for the constrained maximization—such that \(\nu ^\text {op}({\dot{h}})\) can be expressed as

$$\begin{aligned} \nu (\underline{{\sigma }}_\mathcal {D}) =\nu _{\dot{q}}(\underline{{\sigma }}_\mathcal {D}) \equiv \frac{\varvec{w}_\mathcal {D}(\underline{{\sigma }}_\mathcal {D})^\lambda }{Z} \prod _{e\in \delta \mathcal {D}} {\dot{q}}(\dot{\sigma }_e)\,, \end{aligned}$$
(63)

where Z is the normalizing constant. The analogous statement holds for the second moment.
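The structure behind (62)-(63), entropy maximization under linear constraints with a Gibbs-form optimizer indexed by Lagrange multipliers, can be illustrated on a three-point toy problem. All weights, constraints, and parameters below are arbitrary and unrelated to the tree model.

```python
# Minimal instance of the entropy-maximization structure behind (62)-(63):
# maximize H(nu) + lam*<ln w, nu> subject to <a, nu> = h. The optimizer has
# the Gibbs form nu(x) proportional to w(x)**lam * exp(theta*a(x)), with
# the multiplier theta found here by bisection (mean_a is increasing).
import math

w = [1.0, 2.0, 0.5]   # arbitrary positive weights
a = [0.0, 1.0, 2.0]   # arbitrary constraint functional
lam, h = 0.7, 1.2     # target <a, nu> = h, with min(a) < h < max(a)

def gibbs(theta):
    unnorm = [wi**lam * math.exp(theta * ai) for wi, ai in zip(w, a)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def mean_a(theta):
    return sum(ai * ni for ai, ni in zip(a, gibbs(theta)))

lo, hi = -50.0, 50.0  # mean_a is increasing in theta (exponential family)
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_a(mid) < h:
        lo = mid
    else:
        hi = mid
nu = gibbs((lo + hi) / 2)

assert abs(sum(nu) - 1) < 1e-9       # nu is a probability measure
assert abs(mean_a((lo + hi) / 2) - h) < 1e-6  # constraint is met
```

The bisection exploits the standard exponential-family fact that the constrained mean is monotone in the multiplier; in the tree problem the same role is played by the measure \({\dot{q}}\) on \(\dot{\Omega }_T\).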

5.2 BP recursions

We now state the bp recursions for the \(\lambda \)-tilted T-coloring model. In the standard formulation (e.g. [33, Ch. 14]), this is a pair of relations for probability measures \(\dot{\varvec{q}}\), \(\hat{\varvec{q}}\) on \(\Omega _T\):

$$\begin{aligned} \dot{\varvec{q}}(\sigma )&=[\dot{\varvec{B}}_{\lambda ,T}(\hat{\varvec{q}})](\sigma ) \cong \mathbf {1}\{\sigma \in \Omega _T\} \bar{\Phi }(\sigma )^\lambda \sum _{\underline{{\sigma }}\in (\Omega _T)^d} \mathbf {1}\{\sigma _1=\sigma \} \dot{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=2}^d\hat{\varvec{q}}(\sigma _i)\\ \hat{\varvec{q}}(\sigma )&=[\hat{\varvec{B}}_{\lambda ,T}(\dot{\varvec{q}})](\sigma ) \cong \mathbf {1}\{\sigma \in \Omega _T\} \bar{\Phi }(\sigma )^\lambda \sum _{\underline{{\sigma }}\in (\Omega _T)^k} \mathbf {1}\{\sigma _1=\sigma \} \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=2}^k\dot{\varvec{q}}(\sigma _i) \end{aligned}$$

where \(\cong \) denotes equality up to normalization, so that the mapping always outputs a probability measure. Recall from Definition 4.1 that for \(\sigma \equiv (\dot{\sigma },\hat{\sigma })\in \Omega _T\) we have \(\dot{\sigma }\in \dot{\Omega }_T\) and \(\hat{\sigma }\in \hat{\Omega }_T\). For our purposes we can assume a one-sided dependence, meaning there are probability measures \({\dot{q}}\) on \(\dot{\Omega }_T\) and \(\hat{q}\) on \(\hat{\Omega }_T\) such that \(\dot{\varvec{q}}(\sigma ) \cong {\dot{q}}(\dot{\sigma })\mathbf {1}\{\sigma \in \Omega _T\}\) and \(\hat{\varvec{q}}(\sigma )\cong \hat{q}(\hat{\sigma })\mathbf {1}\{\sigma \in \Omega _T\}\). One can then check (e.g. [33, Ch. 19]) that the bp recursions preserve the one-sided property, so that \(\dot{\varvec{B}}_{\lambda ,T}\) and \(\hat{\varvec{B}}_{\lambda ,T}\) restrict to mappings

$$\begin{aligned} \dot{{\texttt {BP}}}\equiv \dot{{\texttt {BP}}}_{\lambda ,T} : {\mathscr {P}}(\hat{\Omega }_T) \rightarrow {\mathscr {P}}(\dot{\Omega }_T)\,,\quad \hat{{\texttt {BP}}}\equiv \hat{{\texttt {BP}}}_{\lambda ,T} : {\mathscr {P}}(\dot{\Omega }_T) \rightarrow {\mathscr {P}}(\hat{\Omega }_T)\,. \end{aligned}$$
(64)

We also denote \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T} \equiv \dot{{\texttt {BP}}}\circ \hat{{\texttt {BP}}}\). Given any \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\), write \(\hat{q}\equiv \hat{{\texttt {BP}}}{\dot{q}}\), and let \(H\equiv H_{\dot{q}}\) be defined by

$$\begin{aligned} \dot{H}_{\dot{q}}(\underline{{\sigma }}) =\frac{\dot{\Phi }(\underline{{\sigma }})^\lambda }{\dot{\mathfrak {z}}} \prod _{i=1}^d \hat{q}(\hat{\sigma }_i), \quad \hat{H}_{\dot{q}}(\underline{{\sigma }}) =\frac{\hat{\Phi }(\underline{{\sigma }})^\lambda }{\hat{\mathfrak {z}}} \prod _{i=1}^k {\dot{q}}(\dot{\sigma }_i), \quad \bar{H}_{\dot{q}}(\sigma ) =\frac{\bar{\Phi }(\sigma )^{-\lambda }}{\bar{\mathfrak {z}}} {\dot{q}}(\dot{\sigma }) \hat{q}(\hat{\sigma }) \end{aligned}$$
(65)

where \(\dot{\mathfrak {z}}\), \(\hat{\mathfrak {z}}\), and \(\bar{\mathfrak {z}}\) are normalizing constants, all dependent on \({\dot{q}}\). Clearly, \(H_{\dot{q}}=(H_{\dot{q}})^\text {sy}\). If \({\dot{q}}\) is a fixed point of \({\texttt {BP}}\), then \(H_{\dot{q}}\in \varvec{\Delta }\). An entirely similar discussion applies to the pair (second moment) model, where the bp recursion reduces to a pair of mappings between \({\mathscr {P}}((\dot{\Omega }_T)^2)\) and \({\mathscr {P}}((\hat{\Omega }_T)^2)\). If \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) is a fixed point of \({\texttt {BP}}\), then (65) defines an element \(H_{\dot{q}}\in \varvec{\Delta }_2\).
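The normalize-and-iterate structure of these recursions can be sketched on a toy model. Below, the alphabet is \(\{0,1\}\), \(d=k=3\), \(\lambda =1\), \(\bar{\Phi }\equiv 1\), and both \(\dot{\Phi }\) and \(\hat{\Phi }\) are replaced by a bare not-all-equal indicator; this illustrates only the fixed-point iteration, not the actual T-coloring weights.

```python
# Toy BP iteration in the shape of the displayed recursions: alphabet
# {0,1}, d = k = 3, lambda = 1, PhiBar == 1, and PhiDot = PhiHat = the
# not-all-equal indicator (an illustrative stand-in for the true weights).
import itertools

d = k = 3

def nae(sigmas):
    """Not-all-equal indicator on a tuple of spins."""
    return 0.0 if len(set(sigmas)) == 1 else 1.0

def bp_step(q_in, degree):
    """One recursion: out(s) ~ sum over tuples with first coordinate s."""
    out = []
    for s in (0, 1):
        total = 0.0
        for rest in itertools.product((0, 1), repeat=degree - 1):
            prod = 1.0
            for r in rest:
                prod *= q_in[r]
            total += nae((s,) + rest) * prod
        out.append(total)
    z = sum(out)
    return [x / z for x in out]

qdot = [0.7, 0.3]  # asymmetric starting point
for _ in range(50):
    qhat = bp_step(qdot, k)  # clause-side update (hat-BP)
    qdot = bp_step(qhat, d)  # variable-side update (dot-BP)

assert abs(sum(qdot) - 1.0) < 1e-9
assert abs(qdot[0] - 0.5) < 1e-6  # converges to the symmetric fixed point
```

For this toy map one can compute the derivative of the composed update at the symmetric point to be \(4/9<1\), so the iteration contracts locally, in miniature imitation of the contraction estimates of Sect. 5.3.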

Lemma 5.3

For \(1\leqslant T<\infty \), if \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) is any fixed point of \({\texttt {BP}}_{\lambda ,T}\) which has full support on \(\dot{\Omega }_T\), then \(\varvec{\Xi }(H_{\dot{q}})=0\). The analogous statement holds for the second moment.

Proof

Consider the optimization problem (62) for \(\varvec{\Lambda }^\text {op}({\dot{h}})\) with \({\dot{h}}=\dot{h}^\text {tr}(H_{\dot{q}})\). As noted above, \(\nu ^\text {op}({\dot{h}})\) can be written (63) as \(\nu _{\tilde{q}}\) for some measure \(\tilde{q}\in {\mathscr {P}}(\dot{\Omega }_T)\), which may not be unique if the constraint \(\dot{h}^\text {tr}(H^\text {tr}(\nu ))={\dot{h}}\) is rank-deficient. However, if \({\dot{q}}\) has full support on \(\dot{\Omega }_T\), then \({\dot{h}}=\dot{h}^\text {tr}(H_{\dot{q}})\) does also. In this case it is straightforward to check that the constraints are indeed of full rank, so \(\tilde{q}\) is unique. Because \({\dot{q}}\) is a fixed point of \({\texttt {BP}}\), the measure \(\nu _{\dot{q}}\) satisfies \(H^\text {tr}(\nu _{\dot{q}})=H_{\dot{q}}\), so it also satisfies the weaker constraint \(\dot{h}^\text {tr}(H^\text {tr}(\nu _{\dot{q}}))={\dot{h}}\). It follows by the above uniqueness argument that \({\dot{q}}=\tilde{q}\). Therefore, \(\nu _{\dot{q}}\) solves the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H_{\dot{q}}))\), as well as the optimization problem (61) for \(\varvec{\Lambda }(H_{\dot{q}})\), so we conclude \(\varvec{\Xi }(H_{\dot{q}})=0\) as claimed.\(\square \)

Lemma 5.4

For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), let \(\varvec{\Xi }=\varvec{\Xi }_{\lambda ,T}\), \(\varvec{\Xi }_2=\varvec{\Xi }_{2,\lambda ,T}\), and \({\texttt {BP}}={\texttt {BP}}_{\lambda ,T}\).

  (a)

    If \(H\in \mathbf {N}_\circ \) with \(H=H^\text {sy}\) and \(\varvec{\Xi }(H)=0\), then \(H=H_{\dot{q}}\) where \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) is a fixed point of \({\texttt {BP}}\).

  (b)

    If \(H\in \mathbf {N}_\text {se}\) with \(H=H^\text {sy}\) and \(\varvec{\Xi }_2(H)=0\), then \(H=H_{\dot{q}}\) where \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) is a fixed point of \({\texttt {BP}}\).

Proof

Let \(\mu =\nu ^\text {op}(H)\) denote the solution of the optimization problem (61) for \(\varvec{\Lambda }(H)\), and let \(\nu =\nu ^\text {op}(\dot{h}^\text {tr}(H))\) denote the solution of the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))\). Since (62) has a unique optimizer, we have \(\varvec{\Xi }(H)=0\) if and only if \(\mu =\nu \). This means \(H^\text {tr}(\nu )=H\), but also \(\nu =\nu _{\dot{q}}\) from (63), which gives

$$\begin{aligned} \hat{H}(\underline{{\sigma }}) \cong \hat{\Phi }(\underline{{\sigma }})^\lambda (({\texttt {BP}}{\dot{q}})(\dot{\sigma }_1)) \prod _{i=2}^k {\dot{q}}(\dot{\sigma }_i)\,. \end{aligned}$$
(66)

We now claim that in order for \(\hat{H}=\hat{H}^\text {sy}\), we must have \({\texttt {BP}}{\dot{q}}={\dot{q}}\). Note that if \(\hat{\Phi }\) were fully supported on \((\Omega _T)^k\), and both \({\dot{q}}\) and \({\texttt {BP}}{\dot{q}}\) were fully supported on \(\dot{\Omega }_T\), the claim would be obvious. Since \(\hat{\Phi }\) is certainly not fully supported, and we also do not know a priori whether \({\dot{q}}\) and \({\texttt {BP}}{\dot{q}}\) are fully supported, the claim requires some argument, which differs slightly between the first- and second-moment cases:

  (a)

    In the first moment, Lemma 3.3 implies that \({\dot{q}}(\dot{\sigma })\) is positive for at least one \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\). Assume without loss that \({\dot{q}}({\texttt {b}}_{\texttt {0}})\) is positive; it follows that \(({\texttt {BP}}{\dot{q}})(\dot{\sigma })\) is positive for both \(\dot{\sigma }={\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\). For any \(\dot{\sigma }\in \dot{\Omega }\), there exists \(\hat{\sigma }\) such that

    $$\begin{aligned} \hat{\Phi }( (\dot{\sigma },\hat{\sigma }), {\texttt {b}}_{\texttt {0}},\ldots ,{\texttt {b}}_{\texttt {0}})>0\,. \end{aligned}$$
    (67)

    The symmetry of \(\hat{H}\) then gives the relation

    $$\begin{aligned} \frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}})({\texttt {b}}_{\texttt {0}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}})}, \end{aligned}$$

    so it follows that \({\texttt {BP}}{\dot{q}}={\dot{q}}\) in the first moment.

  (b)

    In the second moment, since we restrict to \(H\in \mathbf {N}_\text {se}\), \({\dot{q}}(\dot{\sigma })\) is positive for at least one \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\). Assume without loss that \({\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})\) is positive. For any \(\dot{\sigma }\notin \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\), there exists \(\hat{\sigma }\) such that the second-moment analogue of (67) holds. The preceding argument gives

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} \quad \text {for all } \dot{\sigma }\notin \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\,. \end{aligned}$$

    Since \(({\texttt {BP}}{\dot{q}})(\dot{\sigma })\) is positive for all \(\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\), it follows that the same holds for \({\dot{q}}\), so

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} = \frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} \quad \text {for all } \dot{\sigma }\notin \{ {\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}, {\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}}\}\,.\end{aligned}$$

    Combining these, we have for \(\dot{\sigma }\in \{{\texttt {r}}_{{\texttt {0}}}{\texttt {r}}_{{\texttt {1}}},{\texttt {r}}_{{\texttt {1}}}{\texttt {r}}_{{\texttt {0}}}\}\) that

    $$\begin{aligned}\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} =\frac{({\texttt {BP}}{\dot{q}})(\dot{\sigma })}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})} \frac{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})}{({\texttt {BP}}{\dot{q}}) ({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})} =\frac{{\dot{q}}(\dot{\sigma })}{{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})}, \end{aligned}$$

    and this proves \({\texttt {BP}}{\dot{q}}={\dot{q}}\) in the second moment.

Altogether, the above proves in both the first- and second-moment settings that \({\dot{q}}\) is a bp fixed point.\(\square \)

5.3 BP contraction and conclusion

The next step is to (explicitly) define a subset \(\varvec{\Gamma }\) of measures \({\dot{q}}\) on which we have a contraction estimate of the form \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \) for a constant \(c<1\). A useful feature of nae-sat is that its bp recursions are self-averaging: if \({\dot{q}}\) is a measure on \(\dot{\Omega }_T\), let

$$\begin{aligned}{\dot{q}}^\text {av}(\dot{\sigma }) \equiv \frac{{\dot{q}}(\dot{\sigma })+{\dot{q}}(\dot{\sigma }\oplus {\texttt {1}})}{2}\,. \end{aligned}$$

Then \(\hat{{\texttt {BP}}}{\dot{q}}=\hat{{\texttt {BP}}}{\dot{q}}^\text {av}\), and consequently \({\texttt {BP}}{\dot{q}}={\texttt {BP}}{\dot{q}}^\text {av}\). The analogous statement holds in the second moment. It then suffices to prove contraction on the measures \({\dot{q}}={\dot{q}}^\text {av}\), since for general \({\dot{q}}\) it implies

$$\begin{aligned}\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert =\Vert {{\texttt {BP}}{\dot{q}}^\text {av}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}^\text {av}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \,. \end{aligned}$$

Abbreviate \(\{{\texttt {r}}\}\equiv \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}\) and \(\{{\texttt {b}}\}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\). In a mild abuse of notation we now write \(\{{\texttt {f}}\}\) for \((\dot{\Omega }\cup \hat{\Omega }){\setminus }\{{\texttt {r}},{\texttt {b}}\}\); so for instance \({\dot{q}}({\texttt {f}}) ={\dot{q}}(\dot{\Omega }{\setminus }\{{\texttt {r}},{\texttt {b}}\}) ={\dot{q}}(\dot{\mathscr {M}}{\setminus }\{{\texttt {0}},{\texttt {1}},\star \})\). For the first moment analysis, let \(\varvec{\Gamma }\) be the set of measures \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\) satisfying \({\dot{q}}={\dot{q}}^\text {av}\), such that

$$\begin{aligned} \frac{{\dot{q}}({\texttt {r}}) + 2^k{\dot{q}}({\texttt {f}}) }{C} \leqslant {\dot{q}}({\texttt {b}}) \leqslant \frac{{\dot{q}}({\texttt {r}})}{1-C/2^k} \end{aligned}$$
(68)

for C a large constant (to be determined). For the second moment analysis, let \(\varvec{\Gamma }(c,\kappa )\) be the set of measures \({\dot{q}}\in {\mathscr {P}}((\dot{\Omega }_T)^2)\) satisfying \({\dot{q}}={\dot{q}}^\text {av}\), such that

$$\begin{aligned}&|{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -{\dot{q}}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})| \leqslant (k^9/2^{ck})\, {\dot{q}}({\texttt {b}}{\texttt {b}}), \text { and } {\dot{q}}({\texttt {f}}{\texttt {f}}) +{\dot{q}}(\{{\texttt {f}}{\texttt {r}},{\texttt {r}}{\texttt {f}}\})/2^k +{\dot{q}}({\texttt {r}}{\texttt {r}})/4^k \leqslant (C/2^k)\, {\dot{q}}({\texttt {b}}{\texttt {b}});&(1\varvec{\Gamma })\\&{\dot{q}}(\{{\texttt {r}}{\texttt {f}},{\texttt {f}}{\texttt {r}}\}) \leqslant (C/2^{k\kappa })\,{\dot{q}}({\texttt {b}}{\texttt {b}}) \text { and } {\dot{q}}({\texttt {r}}{\texttt {r}}) \leqslant C 2^{k(1-\kappa )}\, {\dot{q}}({\texttt {b}}{\texttt {b}});&(2\varvec{\Gamma })\\&{\dot{q}}({\texttt {r}}_{\varvec{x}}\dot{\sigma }) \geqslant [1-C/2^k]\, {\dot{q}}({\texttt {b}}_{\varvec{x}}\dot{\sigma }) \text { and } {\dot{q}}(\dot{\sigma }{\texttt {r}}_{\varvec{x}}) \geqslant [1-C/2^k]\, {\dot{q}}(\dot{\sigma }{\texttt {b}}_{\varvec{x}}) \text { for all } {\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\},\ \dot{\sigma }\in \dot{\Omega }\,.&(3\varvec{\Gamma })\end{aligned}$$

(To clarify the notation: since \({\texttt {b}}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\), in (\(1\varvec{\Gamma }\)) the expression \({\dot{q}}({\texttt {b}}{\texttt {b}})\) refers to \({\dot{q}}(\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2)\). Similarly, \({\dot{q}}({\texttt {f}}{\texttt {r}})\) refers to \({\dot{q}}(\{{\texttt {f}}\}\times \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\})\) where in this context \(\{{\texttt {f}}\}=(\dot{\Omega }\cup \hat{\Omega }){\setminus }\{{\texttt {r}},{\texttt {b}}\}\).)

Proposition 5.5

(proved in “Appendix A”) Assume \(0\leqslant \lambda \leqslant 1\). In the first moment, we have:

  (a)

    For any \(1\leqslant T\leqslant \infty \), the map \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T}\) has a unique fixed point \({\dot{q}}_\star \equiv {\dot{q}}_{\lambda ,T}\in \varvec{\Gamma }\). For any \({\dot{q}}\in \varvec{\Gamma }\), we have \({\texttt {BP}}{\dot{q}}\in \varvec{\Gamma }\) also, with \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^2/2^k) \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \).

  (b)

    In the limit \(T\rightarrow \infty \), \(\Vert {{\dot{q}}_{\lambda ,T} - {\dot{q}}_{\lambda ,\infty }} \Vert \rightarrow 0\).

In the second moment, for any \(1\leqslant T\leqslant \infty \), we have the following:

  (A)

    The map \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,T}\) has a unique fixed point in \(\varvec{\Gamma }(1,1)\), given by \({\dot{q}}_\star \otimes {\dot{q}}_\star \) with \({\dot{q}}_\star \) as in part a. Moreover, for \(c\in (0,1]\) and k sufficiently large, there is no other fixed point of \({\texttt {BP}}\) in \(\varvec{\Gamma }(c,1)\): if \({\dot{q}}\in \varvec{\Gamma }(c,1)\) then \({\texttt {BP}}{\dot{q}}\in \varvec{\Gamma }(1,1)\), with \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^4/2^k) \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \).

  (B)

    If for some \(c\in (0,1]\) we have \({\dot{q}}\in \varvec{\Gamma }(c,0)\) and \({\dot{q}}= {\texttt {BP}}{\dot{q}}\), then \({\dot{q}}\in \varvec{\Gamma }(c,1)\).

Definition 5.6

(optimal empirical measures) For \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T<\infty \), take the fixed point \({\dot{q}}_\star ={\dot{q}}_{\lambda ,T}\) as given by Proposition 5.5a, and use (65) to define \(H_\star =H_{{\dot{q}}_\star }\in \varvec{\Delta }\) and \(H_\bullet =H_{{\dot{q}}_\star \otimes {\dot{q}}_\star }\in \varvec{\Delta }_2\). Note that this agrees with our earlier definition of \(H_\bullet \), in the discussion below Lemma 3.9.

Lemma 5.7

Let \({\dot{q}}\) be any fixed point of \({\texttt {BP}}\) that arises from Lemma 5.4.

  (a)

    If \(H=H_{\dot{q}}\in \mathbf {N}_\circ \), then \({\dot{q}}={\dot{q}}_\star \) and so \(H=H_\star \).

  (b)

    If \(H=H_{\dot{q}}\in \mathbf {N}_\text {se}\), then \({\dot{q}}={\dot{q}}_\star \otimes {\dot{q}}_\star \) and so \(H=H_\bullet \).

Proof

Since \({\dot{q}}={\texttt {BP}}{\dot{q}}\), we must have \({\dot{q}}={\dot{q}}^\text {av}\). Below we argue separately for the first and second moment. In each case we repeatedly take advantage of the fact that \(H=H_{\dot{q}}\) is symmetric.

  a.

    For the first moment, by Proposition 5.5a it suffices to show that \({\dot{q}}\) must lie in the set \(\varvec{\Gamma }\) defined by (68). It follows directly from the relation \({\dot{q}}={\texttt {BP}}{\dot{q}}\) that \({\dot{q}}({\texttt {r}})\geqslant {\dot{q}}({\texttt {b}})\). By definition of \(\mathbf {N}_\circ \) we must have \(\bar{H}({\texttt {r}})\leqslant 7/2^k\) and \(\bar{H}({\texttt {f}})\leqslant 7/2^k\), so the vast majority of clauses must have all incident colors in \(\{{\texttt {b}}\}=\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\):

    $$\begin{aligned}1-\frac{14k}{2^k} \leqslant \hat{H}({\texttt {b}}^k) = \frac{1}{\hat{\mathfrak {z}}}\sum _{\underline{{\sigma }}\in {\texttt {b}}^k} \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=1}^k {\dot{q}}(\dot{\sigma }_i) \leqslant \frac{{\dot{q}}({\texttt {b}})^k}{\hat{\mathfrak {z}}}\,. \end{aligned}$$

    Next, if \(\underline{{\sigma }}\in {\texttt {r}}{\texttt {b}}^{k-1}\), we have \(\hat{\Phi }(\underline{{\sigma }})^\lambda =\mathbb {E}^{\text {lit}} [\hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})^\lambda ] \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})= 2/2^k\), so

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}({\texttt {r}})= \hat{H}({\texttt {r}}{\texttt {b}}^{k-1}) \geqslant \frac{{\dot{q}}({\texttt {r}}){\dot{q}}({\texttt {b}})^{k-1}}{2^{k-1} \hat{\mathfrak {z}}} \geqslant \frac{{\dot{q}}({\texttt {r}})}{{\dot{q}}({\texttt {b}})} \frac{2}{2^k} \bigg (1-\frac{14k}{2^k} \bigg )\,,\end{aligned}$$

    which gives \({\dot{q}}({\texttt {r}})/{\dot{q}}({\texttt {b}})\leqslant 4\) for large k. Similarly, if \(\underline{{\sigma }}\in {\texttt {f}}{\texttt {b}}^{k-1}\) with \(\hat{\sigma }_1={\texttt {s}}\) (indicating a separating clause), then \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}}) = 1-4/2^k\), so

    $$\begin{aligned}\frac{7}{2^k} \geqslant \bar{H}({\texttt {f}}) \geqslant \hat{H}({\texttt {f}}{\texttt {b}}^{k-1}) \geqslant \bigg (1-\frac{4}{2^k}\bigg ) \frac{{\dot{q}}({\texttt {f}}){\dot{q}}({\texttt {b}})^{k-1}}{\hat{\mathfrak {z}}} \geqslant \frac{{\dot{q}}({\texttt {f}})}{{\dot{q}}({\texttt {b}})} \bigg (1-\frac{4}{2^k}\bigg ) \bigg (1-\frac{14k}{2^k}\bigg )\,, \end{aligned}$$

    which gives \({\dot{q}}({\texttt {f}})/{\dot{q}}({\texttt {b}}) \leqslant 8/2^k\) for large k. Combining these estimates proves \({\dot{q}}\in \varvec{\Gamma }\).
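The arithmetic behind these two ratio bounds is elementary and can be sanity-checked numerically. The following sketch is purely illustrative (it checks only the displayed inequalities, not the proof itself); the function names are ours:

```python
# Sanity check of the two ratio bounds above (illustrative only).
# From 7/2^k >= (q(r)/q(b)) * (2/2^k) * (1 - 14k/2^k), rearranging gives
#   q(r)/q(b) <= 3.5 / (1 - 14k/2^k),
# which is at most 4 once k is moderately large.  Likewise, from
#   7/2^k >= (q(f)/q(b)) * (1 - 4/2^k) * (1 - 14k/2^k),
# rearranging gives q(f)/q(b) <= 8/2^k for large k.

def ratio_r_bound(k: int) -> float:
    """Upper bound on q(r)/q(b) implied by the first display."""
    return 3.5 / (1 - 14 * k / 2**k)

def ratio_f_bound(k: int) -> float:
    """Upper bound on q(f)/q(b) implied by the second display."""
    return (7 / 2**k) / ((1 - 4 / 2**k) * (1 - 14 * k / 2**k))

for k in range(12, 40):
    assert ratio_r_bound(k) <= 4
    assert ratio_f_bound(k) <= 8 / 2**k
```

Here "large k" already means k of moderate size: the checks pass from k = 12 onward, and both error factors only improve as k grows.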

  b.

    For the second moment, by Proposition 5.5A it suffices to verify \({\dot{q}}\in \varvec{\Gamma }(1,1)\), as defined by (\(1\varvec{\Gamma }\))–(\(3\varvec{\Gamma }\)). Condition (\(3\varvec{\Gamma }\)) is immediate from the relation \({\dot{q}}={\texttt {BP}}{\dot{q}}\). Moreover, by Proposition 5.5B it suffices to show \({\dot{q}}\in \varvec{\Gamma }(1,0)\), in which case condition (\(2\varvec{\Gamma }\)) follows from (\(1\varvec{\Gamma }\)). It remains to verify (\(1\varvec{\Gamma }\)). To this end, we denote \(\mathbb {B}\equiv \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}^2\), and partition this into \(\mathbb {B}_\texttt {=}\equiv \{{\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}\}\) and \(\mathbb {B}_{\ne }\equiv \{{\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}},{\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {0}}\}\). By definition, for any \(H\in \mathbf {N}_\text {se}\) the single-copy marginals \(H^1,H^2\) lie in \(\mathbf {N}\subseteq \mathbf {N}_\circ \), so the total density of \(\{{\texttt {r}},{\texttt {f}}\}\) edges in either copy is very small. As a result the vast majority of clauses have all incident colors in \(\mathbb {B}\):

    $$\begin{aligned}1-\frac{28k}{2^k} \leqslant \hat{H}(\mathbb {B}^k) \leqslant \frac{{\dot{q}}(\mathbb {B})^k}{\hat{\mathfrak {z}}}\,.\end{aligned}$$

    For \(H\in \mathbf {N}_\text {se}\), we have . The fraction of edges in not-all-\(\mathbb {B}\) clauses is \(O(k/2^k)\), and for \(\underline{{\sigma }}\in \mathbb {B}^k\) we have \(1\geqslant \hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}}) = 1 - O(k/2^k)\), so

    Rearranging gives , which proves the first part of (\(1\varvec{\Gamma }\)) (with \(c=1\)). It remains to show the second part of (\(1\varvec{\Gamma }\)). If we denote \(\mathbb {R}_\texttt {=}\equiv \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}}\}\) and consider \(\underline{{\sigma }}\in \mathbb {R}_\texttt {=}(\mathbb {B}_\texttt {=})^{k-1}\), then (similarly as above) we have \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant \mathbb {E}^{\text {lit}} \hat{\Phi }^\text {lit}(\underline{{\sigma }}\oplus \underline{{\texttt {L}}})=2/2^k\), so

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}(\mathbb {R}_\texttt {=}) =\hat{H}(\mathbb {R}_\texttt {=}(\mathbb {B}_\texttt {=})^{k-1}) \geqslant \frac{2}{2^k} \frac{{\dot{q}}(\mathbb {R}_\texttt {=}){\dot{q}}(\mathbb {B}_\texttt {=})^{k-1}}{\hat{\mathfrak {z}}} \geqslant \frac{2}{4^k}\frac{{\dot{q}}(\mathbb {R}_\texttt {=})}{{\dot{q}}(\mathbb {B})} \bigg ( 1-\frac{O(k^5)}{2^{k/2}}\bigg )\,,\end{aligned}$$

    where the last inequality is by the preceding estimates on \({\dot{q}}(\mathbb {B})\) and \({\dot{q}}(\mathbb {B}_\texttt {=})\). The same calculation bounds \({\dot{q}}(\mathbb {R}_{\ne })\) for \(\mathbb {R}_{\ne }\equiv \{{\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}\}\). Next consider \(\sigma =((\dot{\sigma },{\texttt {s}}),{\texttt {r}})\in \{{\texttt {f}}{\texttt {r}}\}\): if \(\underline{{\sigma }}\in \Omega ^k\) with \(\sigma _1=\sigma \) and the other \(k-1\) entries in \(\mathbb {B}\), then \(\hat{\Phi }(\underline{{\sigma }})^\lambda \geqslant 4/2^k\) as long as the other entries are not all \(\mathbb {B}_\texttt {=}\) or all \(\mathbb {B}_{\ne }\). Therefore

    and the same calculation bounds \({\dot{q}}({\texttt {r}}{\texttt {f}})\). Finally, for \(\sigma =((\dot{\sigma }^1,{\texttt {s}}),(\dot{\sigma }^2,{\texttt {s}}))\in \{{\texttt {f}}{\texttt {f}}\}\), we can consider \(\underline{{\sigma }}\in \Omega ^k\) with \(\sigma _1=\sigma \) and the other \(k-1\) entries in \(\mathbb {B}\); therefore

    $$\begin{aligned}\frac{7}{2^k}\geqslant \bar{H}({\texttt {f}}{\texttt {f}}) \geqslant \frac{{\dot{q}}({\texttt {f}}{\texttt {f}}){\dot{q}}(\mathbb {B})^{k-1}}{\hat{\mathfrak {z}}} \bigg (1-\frac{4}{2^k}\bigg ) \geqslant \frac{{\dot{q}}({\texttt {f}}{\texttt {f}})}{{\dot{q}}(\mathbb {B})}\bigg (1-\frac{O(k)}{2^k}\bigg )\,.\end{aligned}$$

    Combining these estimates verifies the second part of (\(1\varvec{\Gamma }\)).

Altogether we have shown that if \(H=H_{\dot{q}}\) lies in \(\mathbf {N}_\circ \), then \({\dot{q}}\in \varvec{\Gamma }\) and so \({\dot{q}}={\dot{q}}_\star \); and if \(H=H_{\dot{q}}\) lies in \(\mathbf {N}_\text {se}\) then \({\dot{q}}\in \varvec{\Gamma }(1,1)\) and so \({\dot{q}}={\dot{q}}_\star \otimes {\dot{q}}_\star \). This concludes the proof. \(\square \)

Proof of Proposition 5.1

We will prove the claim in the first moment; the result for the second moment follows by the same argument. It follows from Lemmas 5.3, 5.4, and 5.7 that the unique minimizer of \(\varvec{\Xi }\) on the set \(\{H\in \mathbf {N}_\circ :H=H^\text {sy}\}\) is \(H_\star \) (as given by Definition 5.6), with \(\varvec{\Xi }(H_\star )=0\). It remains to establish that, for \(H\in \varvec{\Delta }\) with \(H=H^\text {sy}\) and \(\Vert {H-H_\star } \Vert \leqslant \epsilon \), we have \(\varvec{\Xi }(H)\geqslant \epsilon \Vert {H-H_\star } \Vert ^2\). As in the proof of Lemma 5.4, let \(\mu =\nu ^\text {op}(H)\) be the solution of the optimization problem (61) for \(\varvec{\Lambda }(H)\), and let \(\nu =\nu ^\text {op}(\dot{h}^\text {tr}(H))\) be the solution of the optimization problem (62) for \(\varvec{\Lambda }^\text {op}(\dot{h}^\text {tr}(H))\). We have from (63) that for some \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega }_T)\),

$$\begin{aligned} \nu (\underline{{\sigma }}_\mathcal {D}) =\nu _{\dot{q}}(\underline{{\sigma }}_\mathcal {D}) = \frac{\varvec{w}_\mathcal {D}(\underline{{\sigma }})^\lambda }{Z} \prod _{e\in \delta \mathcal {D}}{\dot{q}}(\dot{\sigma }_e)\,. \end{aligned}$$

For \(e\in \delta \mathcal {D}\), abbreviate \(g_e(\underline{{\sigma }}_\mathcal {D}) \equiv \ln {\dot{q}}(\dot{\sigma }_e)\). Then, for any probability measure \(\varpi \) on colorings \(\underline{{\sigma }}_\mathcal {D}\), the quantity \(\Lambda (\varpi ) \equiv \mathcal {H}(\varpi )+\lambda \langle \ln \varvec{w}_\mathcal {D},\varpi \rangle \) can be expressed as

$$\begin{aligned} \Lambda (\varpi )= & {} \mathcal {H}(\varpi ) +\bigg \langle \ln \nu +\ln Z -\sum _{e\in \delta \mathcal {D}}g_e,\varpi \bigg \rangle \\= & {} -\mathcal {D}_{\textsc {kl}}(\varpi |\nu )+\ln Z -|\delta \mathcal {D}|\langle \ln {\dot{q}}, \dot{h}^\text {tr}(H^\text {tr}(\varpi ))\rangle \,. \end{aligned}$$

We have \(\dot{h}^\text {tr}(H^\text {tr}(\varpi ))=\dot{h}^\text {tr}(H)\) for both \(\varpi =\nu \) and \(\varpi =\mu \), so \(\varvec{\Xi }(H) =\Lambda (\mu )-\Lambda (\nu )=\mathcal {D}_{\textsc {kl}}(\mu |\nu )\). (For further discussion, see Proposition C.6.) It is well known that \(\mathcal {D}_{\textsc {kl}}(\mu |\nu )\gtrsim \Vert {\mu -\nu } \Vert ^2\), so to conclude it remains for us to show that \(\Vert {\mu -\nu } \Vert \gtrsim \Vert {H-H_\star } \Vert \). To this end, let \(\nu _\star \equiv \nu _{{\dot{q}}_\star }\), and note that \(H=H^\text {tr}(\mu )\) while \(H_\star =H^\text {tr}(\nu _\star )\). Recall from the discussion preceding Lemma 5.2 that \(\varpi \mapsto H^\text {tr}(\varpi )\) is a linear projection, so

$$\begin{aligned}\Vert {H-H_\star } \Vert \lesssim \Vert {\mu -\nu _\star } \Vert \leqslant \Vert {\mu -\nu } \Vert +\Vert {\nu -\nu _\star } \Vert \lesssim \Vert {\mu -\nu } \Vert +\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \,, \end{aligned}$$

where the last bound holds since \(\nu =\nu _{\dot{q}}\) and \(\nu _\star =\nu _{{\dot{q}}_\star }\). Recall from Proposition 5.5a (or Proposition 5.5A for the second moment) that we have the contraction estimate \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant c\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \) for \(c\in (0,1)\), so

$$\begin{aligned}(1-c)\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert -\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert \leqslant \Vert {{\dot{q}}-{\texttt {BP}}{\dot{q}}} \Vert \,. \end{aligned}$$
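Both norm inequalities invoked in this step are elementary, and a quick numerical sketch (illustrative only; the linear map below is a stand-in for \({\texttt {BP}}\), not the actual operator) confirms them: Pinsker's inequality \(\mathcal {D}_{\textsc {kl}}(\mu |\nu )\geqslant \tfrac{1}{2}\Vert \mu -\nu \Vert _1^2\), and the triangle-inequality consequence of contraction displayed above.

```python
import math
import random

random.seed(0)

def kl(mu, nu):
    """KL divergence D(mu | nu) in nats."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)

def l1(u, v):
    """L1 distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def rand_dist(n):
    w = [random.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Pinsker's inequality: KL(mu | nu) >= (1/2) ||mu - nu||_1^2.
for _ in range(1000):
    mu, nu = rand_dist(5), rand_dist(5)
    assert kl(mu, nu) >= 0.5 * l1(mu, nu) ** 2 - 1e-12

# Contraction step: if ||Fq - q*|| <= c ||q - q*|| with c < 1, then by the
# triangle inequality (1 - c) ||q - q*|| <= ||q - Fq||.  Here F is a toy
# linear map F(q) = q* + c (q - q*) contracting toward q* at rate c.
c = 0.3
qstar = rand_dist(5)
for _ in range(1000):
    q = rand_dist(5)
    Fq = [qs + c * (qi - qs) for qi, qs in zip(q, qstar)]
    assert (1 - c) * l1(q, qstar) <= l1(q, Fq) + 1e-12
```

For the toy linear map the contraction step holds with equality, since \(q - Fq = (1-c)(q - q_\star)\) exactly; the proof only needs the inequality.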

Let \(K\equiv ({\dot{K}},{\hat{K}},{\bar{K}})\equiv H^\text {tr}(\nu )\), and note that \({\hat{K}}\) need not be symmetric: if we let \({\hat{K}}'(\underline{{\sigma }})\equiv {\hat{K}}(\sigma _2,\ldots ,\sigma _k,\sigma _1)\) for \(\underline{{\sigma }}\in (\Omega _T)^k\), then \({\hat{K}}\) and \({\hat{K}}'\) need not agree. On the other hand \(H=H^\text {tr}(\mu )=H^\text {sy}\), so

$$\begin{aligned}\Vert {{\hat{K}}-{\hat{K}}'} \Vert \leqslant \Vert {\hat{H}-{\hat{K}}} \Vert +\Vert {\hat{H}-{\hat{K}}'} \Vert =2\Vert {\hat{H}-{\hat{K}}} \Vert \leqslant 2\Vert {H-K} \Vert \lesssim \Vert {\mu -\nu } \Vert \,. \end{aligned}$$

For any k-tuple \(\underline{{{\dot{h}}}}\equiv ({\dot{h}}_1,\ldots ,{\dot{h}}_k)\) of probability measures on \(\dot{\Omega }_T\), consider

$$\begin{aligned}\hat{H}^\text {op}(\underline{{{\dot{h}}}}) \equiv {{\,\mathrm{arg\,max}\,}}_{\hat{\nu }} \bigg \{\mathcal {H}(\hat{\nu }) + \lambda \langle \ln \hat{\Phi },\hat{\nu }\rangle : \hat{\nu }(\dot{\sigma }_1=\cdot )={\dot{h}}_i \text { for all }i \bigg \} \end{aligned}$$

where \(\hat{\nu }\) denotes any probability measure on \({{\,\mathrm{supp}\,}}\hat{\Phi }\subseteq (\Omega _T)^k\). The unique optimizer \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})\) can be described in terms of another k-tuple of probability measures on \(\dot{\Omega }_T\), denoted \(\underline{{{\dot{q}}}}\equiv ({\dot{q}}_1,\ldots ,{\dot{q}}_k)\), which serve as Lagrange multipliers: \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})=\hat{H}(\underline{{{\dot{q}}}})\) where

$$\begin{aligned}{}[\hat{H}(\underline{{{\dot{q}}}})](\underline{{\sigma }})\cong \hat{\Phi }(\underline{{\sigma }})^\lambda \prod _{i=1}^k{\dot{q}}_i(\dot{\sigma }_i)\,. \end{aligned}$$

In particular, \(\hat{H}^\text {op}(\underline{{{\dot{h}}}}_\star )=\hat{H}(\underline{{{\dot{q}}}}_\star )\) for \(\underline{{{\dot{h}}}}_\star \equiv (\dot{h}^\text {tr}(H_\star ),\ldots ,\dot{h}^\text {tr}(H_\star ))\) and \(\underline{{{\dot{q}}}}_\star \equiv ({\dot{q}}_\star ,\ldots ,{\dot{q}}_\star )\). For \(\underline{{{\dot{h}}}}\) near \(\underline{{{\dot{h}}}}_\star \), there is a unique \(\underline{{{\dot{q}}}}\) satisfying \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})=\hat{H}(\underline{{{\dot{q}}}})\), and we can determine this \(\underline{{{\dot{q}}}}\) as a smooth function of \(\underline{{{\dot{h}}}}\). Thus

$$\begin{aligned}\Vert {{\hat{K}}-{\hat{K}}'} \Vert =\Vert {{\hat{H}}({\texttt {BP}}{\dot{q}},{\dot{q}},\ldots ,{\dot{q}}) -{\hat{H}}({\dot{q}},{\texttt {BP}}{\dot{q}},{\dot{q}},\ldots ,{\dot{q}}) } \Vert \gtrsim \Vert {{\dot{q}}-{\texttt {BP}}{\dot{q}}} \Vert \,. \end{aligned}$$

Combining the above inequalities gives \(\Vert {H-H_\star } \Vert \lesssim \Vert {\mu -\nu } \Vert \) as desired.\(\square \)
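The Lagrange-multiplier description of \(\hat{H}^\text {op}(\underline{{{\dot{h}}}})\) can be made concrete on a toy alphabet: maximizing entropy plus \(\lambda \langle \ln \hat{\Phi },\hat{\nu }\rangle \) subject to marginal constraints yields an optimizer of product form \(\hat{\Phi }^\lambda \prod _i {\dot{q}}_i\), and the multipliers can be computed by iterative proportional (Sinkhorn-type) scaling. The sketch below uses k = 2 with made-up weights and marginals, purely for illustration:

```python
# Toy illustration (k = 2): the entropy-optimal measure with fixed
# single-coordinate marginals h1, h2 has the product form
#     nu(s1, s2)  proportional to  Phi(s1, s2)^lam * q1(s1) * q2(s2),
# and the multipliers q1, q2 are found by alternating scaling.
# All numbers below are hypothetical.
lam = 0.8
Phi = [[1.0, 0.5], [0.5, 1.0]]
W = [[Phi[i][j] ** lam for j in range(2)] for i in range(2)]
h1, h2 = [0.6, 0.4], [0.3, 0.7]

q1, q2 = [1.0, 1.0], [1.0, 1.0]
for _ in range(500):
    # rescale q1 so that the first marginal matches h1, then q2 for h2
    for i in range(2):
        q1[i] *= h1[i] / sum(W[i][j] * q1[i] * q2[j] for j in range(2))
    for j in range(2):
        q2[j] *= h2[j] / sum(W[i][j] * q1[i] * q2[j] for i in range(2))

# assemble and normalize the product-form optimizer
nu = [[W[i][j] * q1[i] * q2[j] for j in range(2)] for i in range(2)]
Z = sum(sum(row) for row in nu)
nu = [[x / Z for x in row] for row in nu]

# both marginal constraints are met by the product-form measure
assert all(abs(sum(nu[i]) - h1[i]) < 1e-8 for i in range(2))
assert all(abs(sum(nu[i][j] for i in range(2)) - h2[j]) < 1e-8 for j in range(2))
```

Since the weight matrix is strictly positive, the alternating scaling converges, and near \(\underline{{{\dot{h}}}}_\star \) the resulting multipliers depend smoothly on the prescribed marginals, which is the fact used above.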

Proof of Propositions 3.4 and 3.10

Note that for \(\bar{H}\) fixed, \(\varvec{F}(H)=\varvec{F}(\dot{H},\hat{H},\bar{H})\) is a strictly concave function of \(\dot{H},\hat{H}\). It follows that for all \(H\in \varvec{\Delta }\) we have \(\varvec{F}(H)\leqslant \varvec{F}(L_H)-\epsilon \Vert {H-L_H} \Vert ^2\) for

$$\begin{aligned} L_H\equiv {{\,\mathrm{arg\,max}\,}}_L\Big \{\varvec{F}(L): L\in \varvec{\Delta }\text { with } {\bar{L}}=\bar{H}\Big \}\,. \end{aligned}$$

Clearly \(L_H=(L_H)^\text {sy}\), so it follows from Theorem 4.2 and Proposition 5.1 that

$$\begin{aligned} \varvec{F}(L_H)\leqslant \varvec{F}(H_\star ) - \epsilon \Vert {L_H-H_\star } \Vert ^2. \end{aligned}$$

Combining the inequalities, and noting that \(\Vert {H-H_\star } \Vert ^2\leqslant 2\Vert {H-L_H} \Vert ^2+2\Vert {L_H-H_\star } \Vert ^2\) (which is why \(\epsilon \) must be adjusted), gives \(\varvec{F}(H)\leqslant \varvec{F}(H_\star )-\epsilon \Vert {H-H_\star } \Vert ^2\). This concludes the proof of Proposition 3.4, and Proposition 3.10 follows by exactly the same argument. \(\square \)
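The "adjusting \(\epsilon \)" step rests on the elementary inequality \((a+b)^2\leqslant 2a^2+2b^2\), applied with \(a=\Vert H-L_H\Vert \) and \(b=\Vert L_H-H_\star \Vert \). A minimal numeric check (illustrative only):

```python
# Check of the inequality behind "adjusting epsilon as needed":
# ||H - H*|| <= ||H - L|| + ||L - H*||, and (a + b)^2 <= 2a^2 + 2b^2,
# so a quadratic gain in each of the two distances yields a quadratic
# gain in ||H - H*|| at the cost of halving epsilon.
import random

random.seed(1)
for _ in range(1000):
    a, b = random.random(), random.random()
    assert (a + b) ** 2 <= 2 * a ** 2 + 2 * b ** 2 + 1e-12
```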