1 Introduction

1.1 Disordered systems and the Bethe lattice

In 2001, in a ground-breaking contribution, Mézard and Parisi proposed an analytic but non-rigorous technique that they called the cavity method for the study of spin glasses on the ‘Bethe lattice’, known in combinatorics as the random d-regular graph [48]. Mézard and Parisi argued that the Bethe lattice constitutes an attractive halfway point between classical ‘mean-field’ models such as the Sherrington-Kirkpatrick model, with complete interaction between all sites, and spatial models such as the Edwards-Anderson model. Indeed, the Bethe lattice induces a non-trivial metric on the sites, each of which interacts with only a bounded number of others. At the same time, Mézard and Parisi showed that the model is amenable to analytic methods, even though matters are significantly more complicated than in the fully connected case. They went on to argue that the spin glass on the Bethe lattice exhibits many of the properties expected of real glassy systems, such as replica symmetry breaking and the proliferation of pure states.

From the original contribution [48] sprang a truly enormous body of work that has had a transformative impact on an astounding variety of subjects, ranging from physics to combinatorics to machine learning. Many of the applications may appear unexpected, even surprising. Almost all of them hinge on the cavity method. Prominent success stories include the development of ‘low-density parity check codes’, a rare example of a statistical physics idea leading directly to an eminently useful, and widely used, algorithm [56]. A further example is a new algorithm for the compressed sensing problem, a fundamental signal processing task [58]. Other important cavity method-based contributions pertain to classical problems in mathematics, such as phase transitions in random graphs and other random structures [41, 46, 47]. The cavity method has also been used to put forward predictions in machine learning, including the capacity of the Hopfield model or on restricted Boltzmann machines [45].

Due to these numerous ramifications, vindicating the cavity method rigorously has become an important research task at the junction of mathematical physics, combinatorics and computer science. There has been a lot of progress recently, e.g., [9, 19, 33, 51]; we shall review the literature in greater detail in Sect. 2.5. However, much of this work is concerned with special cases, mostly the ‘replica symmetric’ scenario where there is just a single pure state.

The aim of the present paper is to move past such assumptions and special cases. We confirm several of the key hypotheses of Mézard and Parisi, particularly the decomposition into pure states and the validity of the Belief Propagation recurrence, the mainstay of the cavity calculations. Further, we obtain a general variational formula for the free energy that is perfectly in line with the Mézard-Parisi ansatz. Additionally, we show that the free energy can be computed from the Belief Propagation representation of the pure states of the model. We obtain these results not merely for a specific model, but for a broad family of models on the Bethe lattice. The prime example is, of course, the diluted spin glass model. But in addition, since the proof techniques that we develop are generic, the results apply to models that are of eminent interest in other areas, particularly combinatorics, such as the Potts antiferromagnet or the hard-core model. Crucially, the results apply universally to all parameter values (such as degree, inverse temperature) of the respective models.

We should point out, however, that the present results fall short of fully corroborating the Mézard-Parisi ansatz. Most importantly, while we prove that general random factor graph models possess pure state decompositions represented by (approximate) Belief Propagation fixed points, an important prediction of the Mézard-Parisi ansatz pertains to the relative geometry of these fixed points. Roughly speaking, Mézard and Parisi predict three different scenarios, depending on the parameter values (such as temperature). In the replica symmetric case, there is just a single pure state (or possibly a small bounded number due to inherent symmetries). In the 1-step replica symmetry breaking scenario an unbounded number of pure states occur, each represented by a Belief Propagation fixed point. But these fixed points are predicted to exhibit a strong symmetry property that ensures, e.g., that the empirical distributions of the Belief Propagation messages in the different pure states are (nearly) identical. Finally, in the full replica symmetry breaking scenario we also expect an unbounded number of pure states and assorted Belief Propagation fixed points, which are arranged hierarchically in the fashion of an ultrametric tree. While in the present paper we prove the existence of a pure state decomposition and of associated Belief Propagation fixed points, our methods do not suffice to establish these more precise predictions as to the geometry and relative weights of the pure states. A full verification of these predictions remains an important open problem as it would, among other things, lead to a significantly simplified variational formula for the free energy. For a more detailed discussion of the Mézard-Parisi ansatz, replica symmetry breaking, and ultrametricity we refer to [46, 53, 54].

Technically the paper builds upon and continues two intertwined threads of prior work. First, we bring to bear a variant of the ‘regularity method’ from combinatorics that we developed recently [10, 15, 24] in order to establish the pure state decomposition and to vindicate the Belief Propagation equations. Second, we seize upon Panchenko’s work on asymptotic Gibbs measures and the interpolation method, particularly in order to derive the variational formula for the free energy [52, 53]. Both of these methods were previously applied with great success to random graphs of Erdős-Rényi type. This line of work crucially exploited the relative geometric flexibility of the Erdős-Rényi model, whose Poisson degree distribution facilitates coupling arguments. By contrast, the geometry of the Bethe lattice is rigid. While this entails that the specification of the model, the cavity equations and their solution are quite ‘clean’, the rigidity poses substantial technical challenges that the present paper resolves.

Before presenting the main results of the paper, which cover a broad family of problems that we call random factor graph models, in Sect. 2, we illustrate the results and the concepts around which they revolve with the spin glass model from the original contribution of Mézard and Parisi. We also work out an additional application to the hard-core model and the independence number of the random regular graph. Several further applications, including the Potts model and the Max q-Cut problem, are worked out in Sect. 7.

1.2 The diluted spin glass

For integers \(d\ge 3\), \(n>0\) such that dn is even, let \({\mathbb {G}}={\mathbb {G}}(n,d)\) be the uniformly random d-regular graph on the vertex set \(V_n=\{v_1,\ldots ,v_n\}\). With each edge \(e\in E({\mathbb {G}})\) comes a standard Gaussian \(J_e\). The random variables \((J_e)_{e\in E({\mathbb {G}})}\) are mutually independent. For a given inverse temperature \(\beta >0\), the diluted spin glass on \({\mathbb {G}}\) is the probability distribution on \(\{\pm 1\}^{V_n}\) defined by

$$\begin{aligned} \mu _{{\mathbb {G}}}(\sigma )&= \frac{1}{Z({\mathbb {G}})}\prod _{vw\in E({\mathbb {G}})}\frac{1+\tanh (\beta J_{vw})\sigma _v\sigma _w}{2}, \end{aligned}$$
(1.1)

where the partition function \(Z({\mathbb {G}})\) ensures normalization. Without the couplings \(J_e\), this would just be the ferromagnetic Ising model on \({\mathbb {G}}\). But since the \(J_e\) are independent Gaussians, some will be positive and others negative. In effect, some edges induce ferromagnetic and others antiferromagnetic interactions, causing frustration. Thus, \(\mu _{{\mathbb {G}}}\) is a spin glass model, the well-known diluted spin glass on the Bethe lattice.
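For concreteness, the distribution (1.1) can be evaluated by brute force on a toy instance. The sketch below uses the complete graph \(K_4\) (which is 3-regular) with a fixed random seed; both are illustrative choices for this example and not part of the asymptotic model.

```python
import itertools
import math
import random

random.seed(0)

# Toy 3-regular graph: the complete graph K4 (an illustrative stand-in for G(n, d)).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
beta = 1.0
J = {e: random.gauss(0.0, 1.0) for e in edges}   # independent standard Gaussian couplings

def weight(sigma):
    """Unnormalized Boltzmann weight of a spin assignment sigma, as in (1.1)."""
    w = 1.0
    for (u, v) in edges:
        w *= (1.0 + math.tanh(beta * J[(u, v)]) * sigma[u] * sigma[v]) / 2.0
    return w

configs = list(itertools.product([-1, +1], repeat=4))
Z = sum(weight(s) for s in configs)        # partition function Z(G)
mu = {s: weight(s) / Z for s in configs}   # Boltzmann distribution mu_G
```

Since \(|\tanh |<1\), every edge factor lies strictly between 0 and 1, so each of the \(2^n\) configurations carries positive probability.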

There are two fundamental problems associated with this and numerous similar models: first, to characterize the structure of the Boltzmann distribution \(\mu _{{\mathbb {G}}}\). Does it exhibit long-range correlations? Does it decompose into one or several ‘pure states’, and if so, how can we characterize them? Second, to calculate the quantity \(\lim _{n\rightarrow \infty } \frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]\), which we call the free energy density. Its fundamental importance is due to the fact that other important observables derive from it. Moreover, the singularities of the function \(\beta \mapsto \lim _{n\rightarrow \infty } \frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]\) constitute the phase transitions of the model.

1.2.1 Bethe states and the Boltzmann distribution.

With respect to the first problem, Mézard and Parisi hypothesized that the Boltzmann distribution always decomposes into one or a moderate (albeit not necessarily bounded) number of pure states. Further, they hypothesized that these pure states are characterized by fixed points of a recurrence called Belief Propagation. Our first theorem confirms this hypothesis.

To be precise, writing \(\partial v\) for the set of neighbors of a vertex v, let \(\mathscr {M}({\mathbb {G}})\) be the set of all families \((\nu _{u\rightarrow v})_{v\in V_n,u\in \partial v}\) such that \(\nu _{u\rightarrow v}\in [0,1]\). We call \(\nu _{u\rightarrow v}\) the message from u to v. The messages need not be symmetric, i.e., possibly \(\nu _{u\rightarrow v}\ne \nu _{v\rightarrow u}\). Furthermore, Belief Propagation is the operator \(\mathrm {BP}:\mathscr {M}({\mathbb {G}})\rightarrow \mathscr {M}({\mathbb {G}})\), \(\nu \mapsto {\hat{\nu }}\), where

$$\begin{aligned} {\hat{\nu }}_{v\rightarrow u}&=\frac{\prod _{w\in \partial v{\setminus } u}\left( 1+2\tanh (\beta J_{vw})(\nu _{w\rightarrow v}-1/2)\right) }{\sum _{\sigma \in \{\pm 1\}}\prod _{w\in \partial v{\setminus } u}\left( 1+2\sigma \tanh (\beta J_{vw})(\nu _{w\rightarrow v}-1/2)\right) }. \end{aligned}$$
(1.2)

The motivation behind this operator, and the origin of the name ‘cavity method’, is this. Suppose we fix a vertex v in a d-regular graph along with a neighbor u. Now suppose we remove the vertex v, thereby creating a ‘cavity’. Then the ‘ideal’ message \(\mu _{{\mathbb {G}},u\rightarrow v}\) that we would like to compute is just the marginal probability \(\mu _{{\mathbb {G}}-v,u}(1)\) that u takes spin 1 in the subgraph obtained by removing v. If the Boltzmann distribution \(\mu _{{\mathbb {G}}}\) is free from long-range correlations, then these ideal messages should plausibly be a fixed point of the BP operator. Indeed, if we remove v, then very likely its former neighbors will be mutually far apart in the resulting graph. In effect, the joint distribution of their spins should factorize. If so, then a straightforward calculation verifies that the ideal messages are a fixed point of BP. In fact this reasoning goes back to Bethe’s classical work [15].
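The update (1.2) is easy to iterate numerically. The following sketch runs BP on a toy \(K_4\) instance; to keep the iteration provably contracting for this illustration we truncate the Gaussian couplings to \([-2,2]\) and take a small \(\beta \), assumptions made purely for the example and not part of the model.

```python
import math
import random

random.seed(1)

# Toy instance: K4 with couplings truncated to [-2, 2] and small beta, so the
# iteration contracts -- assumptions made for this illustration only.
nbrs = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
beta = 0.2
J = {}
for v in nbrs:
    for u in nbrs[v]:
        if v < u:
            J[(v, u)] = J[(u, v)] = max(-2.0, min(2.0, random.gauss(0.0, 1.0)))

def bp_update(nu):
    """One application of the BP operator (1.2); nu[(u, v)] is the message from u to v."""
    hat = {}
    for v in nbrs:
        for u in nbrs[v]:
            p = {s: math.prod(1 + 2 * s * math.tanh(beta * J[(v, w)]) * (nu[(w, v)] - 0.5)
                              for w in nbrs[v] if w != u)
                 for s in (+1, -1)}
            hat[(v, u)] = p[+1] / (p[+1] + p[-1])  # normalize over sigma = +1, -1
    return hat

nu = {(u, v): random.random() for v in nbrs for u in nbrs[v]}
for _ in range(300):
    nu = bp_update(nu)

# Fixed-point residual: how far the messages are from satisfying (1.2) exactly.
residual = max(abs(bp_update(nu)[e] - nu[e]) for e in nu)
```

On such a small, weakly coupled instance the iteration settles on a fixed point to machine precision; on loopy graphs with strong couplings BP need not converge at all.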

However, generally spin glass models do exhibit long-range correlations, a phenomenon called replica symmetry breaking (see, e.g., [17, 25] for proofs that replica symmetry breaking occurs in certain models). Yet the fundamental hypothesis of Mézard and Parisi holds that the phase space \(\{\pm 1\}^{V_n}\) always decomposes into Bethe states \(S_1,\ldots ,S_\ell \) in such a way that the conditional distributions \(\mu _{{\mathbb {G}}}[\,\cdot \,|S_h]\) are free from long-range correlations. Formally, this means that if we pick a pair of vertices \((v_i,v_j)\) uniformly at random, then typically the conditional joint distribution \(\mu _{{\mathbb {G}},v_i,v_j}[\,\cdot \,|S_h]\) of the spins of \(v_i\) and \(v_j\) is close to the product distribution \(\mu _{{\mathbb {G}},v_i}(\,\cdot \,|S_h)\otimes \mu _{{\mathbb {G}},v_j}(\,\cdot \,|S_h)\), i.e.,

$$\begin{aligned} \frac{1}{n^2}\sum _{1\le i<j\le n}\left\| {\mu _{{\mathbb {G}},v_i,v_j}(\,\cdot \,|S_h)-\mu _{{\mathbb {G}},v_i}(\,\cdot \,|S_h)\otimes \mu _{{\mathbb {G}},v_j}(\,\cdot \,|S_h)}\right\| _{\mathrm {TV}}&=o(1). \end{aligned}$$
(1.3)

In effect, within each Bethe state the ‘ideal’ messages are predicted to be an approximate fixed point of the BP operator. To be precise, for adjacent vertices uv we write \(\mu _{{\mathbb {G}},v\rightarrow u}[S_h]=\mu _{{\mathbb {G}}-u,v}(1|S_h)\) for the conditional probability given \(S_h\) that v takes spin 1 in the subgraph of \({\mathbb {G}}\) with u removed. Then we expect that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\sum _{u\in \partial v_i}\left\| {\mu _{{\mathbb {G}},v_i\rightarrow u}[S_h] -{\hat{\mu }}_{{\mathbb {G}},v_i\rightarrow u}[S_h]}\right\| _{\mathrm {TV}}&=o(1),\quad \text{ where }\nonumber \\ ({\hat{\mu }}_{{\mathbb {G}},v\rightarrow u}[S_h])_{v\in V_n,u\in \partial v}&=\mathrm {BP}\left( (\mu _{{\mathbb {G}},v\rightarrow u}[S_h])_{v\in V_n,u\in \partial v}\right) . \end{aligned}$$
(1.4)

Further, the cavity method predicts that the Boltzmann marginals can be obtained from the messages by a formula quite similar to (1.2):

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left| {\mu _{{\mathbb {G}},v_i}(1|S_h)- \frac{\prod _{w\in \partial v_i}\left( 1+2\tanh (\beta J_{v_iw})(\mu _{{\mathbb {G}},w\rightarrow v_i}[S_h]-1/2)\right) }{\sum _{\sigma \in \{\pm 1\}}\prod _{w\in \partial v_i}\left( 1+2\sigma \tanh (\beta J_{v_iw})(\mu _{{\mathbb {G}},w\rightarrow v_i}[S_h]-1/2)\right) }}\right|&=o(1). \end{aligned}$$
(1.5)

The following theorem establishes these conjectures rigorously. We say that \({\mathbb {G}}\) enjoys a property with high probability (‘w.h.p.’) if the probability that the property holds tends to one as \(n\rightarrow \infty \).

Theorem 1.1

For any \(d\ge 3\), \(\beta >0\) the following is true. Let \(L=L(n)\) be any integer sequence that tends to infinity. Then there exists a decomposition \(S_0=S_0({\mathbb {G}}),S_1=S_1({\mathbb {G}}),\ldots ,S_\ell =S_\ell ({\mathbb {G}})\), \(\ell =\ell ({\mathbb {G}})\le L\), of the phase space \(\{\pm 1\}^{V_n}\) into non-empty sets such that \(\mu _{{\mathbb {G}}}(S_0)=o(1)\) and such that with high probability (1.3)–(1.5) are satisfied for \(h=1,\ldots ,\ell \).

Crucially, and in contrast to much prior work in this area, Theorem 1.1 applies indiscriminately to all \(d,\beta \). While it is expected that in the ‘high-temperature’ regime (small \(\beta \)) there is just a single pure state, it is widely conjectured that for large d and \(\beta \) the number of pure states is unbounded. Thus, we do not expect that it will be possible to replace the unbounded L in Theorem 1.1 by a constant. Yet Theorem 1.1 shows that the number of states can be upper bounded by an arbitrarily slowly growing function L(n).

1.2.2 The free energy.

The Bethe states and their associated messages contain all the information needed to compute the free energy. To be precise, once more following the ideas of Mézard and Parisi, we can set up a recurrence for computing the difference \(\mathbb {E}[\log Z({\mathbb {G}}(n+1,d))]-\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\), which in turn enables us to write a formula for \(\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\) by telescoping. To set up such a recurrence it is necessary to crack the rigid geometry of the random regular graph open a little bit. To this end, we resort to the idea of creating a few ‘cavities’. Specifically, we delete a few random vertices and edges from \({\mathbb {G}}(n,d)\). Formally, let \(\omega >0\) and let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). Moreover, let \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\) be sequences of uniformly random vertices and edges of \({\mathbb {G}}\), chosen independently. With \(S_1,\ldots ,S_\ell \) the decomposition from Theorem 1.1, we introduce weights

$$\begin{aligned} \varvec{z}_{{\mathbb {G}},h}&=\mu _{{\mathbb {G}}}(S_h)\cdot \prod _{i=1}^{\varvec{X}}\left( { \sum _{\sigma \in \{\pm 1\}}\prod _{v\in \partial \varvec{u}_i}\left( 1+2\sigma \tanh (\beta J_{v\varvec{u}_i}) (\mu _{{\mathbb {G}},v\rightarrow \varvec{u}_i}[1|S_h]-1/2)\right) }\right) ^{-1}\\&\quad \cdot \prod _{i=1}^{\varvec{Y}}\left( {1+4\tanh (\beta J_{\varvec{v}_i\varvec{w}_i}) (\mu _{{\mathbb {G}},\varvec{v}_i\rightarrow \varvec{w}_i}[1|S_h]-1/2)(\mu _{{\mathbb {G}},\varvec{w}_i\rightarrow \varvec{v}_i}[1|S_h]-1/2)}\right) ^{-1} \end{aligned}$$

and \(\varvec{z}_{{\mathbb {G}}}=\sum _{h=1}^\ell \varvec{z}_{{\mathbb {G}},h}\). Further, let \(\mathscr {C}({\mathbb {G}})\) be the set of all vertices of degree less than d in the graph \({\mathbb {G}}_{n,\omega }\) obtained from \({\mathbb {G}}\) by removing \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\). Then with high probability each \(c\in \mathscr {C}({\mathbb {G}})\) has degree precisely \(d-1\), and we write \(c'\) for the erstwhile \(d\)th neighbor of c. Further, with \(\varvec{c}_1,\varvec{c}_2,\ldots \) a sequence of uniformly and independently chosen elements of \(\mathscr {C}({\mathbb {G}})\) and \((\varvec{J}_i)_{i\ge 1}\) a sequence of independent standard Gaussians, we let

$$\begin{aligned} \mathscr {B}({\mathbb {G}})&=\mathbb {E}\left[ \log \sum _{h=1}^\ell \frac{\varvec{z}_{{\mathbb {G}},h}}{\varvec{z}_{{\mathbb {G}}}} \sum _{\sigma \in \{\pm 1\}}\prod _{i=1}^d\left( 1+2\sigma \tanh (\beta \varvec{J}_{i})(\mu _{{\mathbb {G}},\varvec{c}_i\rightarrow \varvec{c}_i'}[S_h]-1/2)\right) \Bigg |{\mathbb {G}}\right] \\&\quad -\frac{d}{2}\mathbb {E}\left[ \log \left( 1+4\tanh (\beta \varvec{J}_1)\sum _{h=1}^\ell \frac{\varvec{z}_{{\mathbb {G}},h}}{\varvec{z}_{{\mathbb {G}}}} (\mu _{{\mathbb {G}},\varvec{c}_1\rightarrow \varvec{c}_1'}[1|S_h]-1/2) (\mu _{{\mathbb {G}},\varvec{c}_2\rightarrow \varvec{c}_2'}[1|S_h]-1/2)\right) \Bigg |{\mathbb {G}}\right] -\frac{d}{2}\log 2. \end{aligned}$$

The expression \(\mathscr {B}({\mathbb {G}})\) mirrors our recurrence for the difference \(\mathbb {E}[\log Z({\mathbb {G}}(n+1,d))]-\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\). Having created a moderate number of cavities, we insert a new \((n+1)\)st vertex, connected to d randomly chosen ‘cavities’. The first summand above represents the ensuing change in the free energy. But this operation adds d more edges, whereas a random regular graph with \(n+1\) vertices only has \(d/2\) more edges than one with n vertices. Therefore, a correction term is needed. Hence the second summand.

Crucially, the functional \(\mathscr {B}({\mathbb {G}})\) depends only on the pure state decomposition from Theorem 1.1 and the associated messages. The following theorem shows that this information suffices to compute the free energy.

Theorem 1.2

For all \(d\ge 3,\beta >0\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\,\mathbb {E}[\mathscr {B}({\mathbb {G}})]. \end{aligned}$$

Entirely in line with the ideas developed in [48], Theorem 1.2 establishes a direct conceptual link between Belief Propagation, the pure state decomposition from Theorem 1.1, and the free energy for all \(d,\beta \). Of course, in order to evaluate \(\mathscr {B}({\mathbb {G}})\) it is necessary to actually determine the pure state decomposition along with the corresponding Belief Propagation messages. The shape of this decomposition, and the practical difficulty of computing it, will depend significantly on the parameters \(d,\beta \). Alternatively, as we see next, it is possible to derive a variational formula for the free energy.

1.2.3 A variational formula.

The variational formula comes in terms of an optimization problem on a space that resembles the graphon space from the theory of graph limits [44]. To be precise, let \(\nu :[0,1]^2\rightarrow [0,1]\), \((s,x)\mapsto \nu _{s,x}\) and \(\nu ':[0,1]^2\rightarrow [0,1]\), \((s,x)\mapsto \nu '_{s,x}\) be measurable maps. We define the cut distance between \(\nu ,\nu '\) by

$$\begin{aligned} \mathscr {D}_{\Box }(\nu ,\nu ')&=\inf _{\varphi ,\varphi '}\,\sup _{\begin{array}{c} S,X\subset [0,1] \end{array}} \left| {\int _S\int _X\nu _{s,x}-\nu '_{\varphi (s),\varphi '(x)}\,{\mathrm d}x\,{\mathrm d}s}\right| , \end{aligned}$$

where \(\varphi ,\varphi ':[0,1]\rightarrow [0,1]\) are measurable maps that preserve the Lebesgue measure and \(S,X\subset [0,1]\) are measurable. Obtain the space \(\mathfrak {K}\) by identifying any \(\nu ,\nu '\) with \(\mathscr {D}_{\Box }(\nu ,\nu ')=0\). Then \(\mathfrak {K}\) endowed with the cut distance is a compact metric space. In addition, write \(\mathfrak {D}\) for the space of probability measures on \(\mathfrak {K}\).
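For step kernels the cut distance can be approximated by brute force. The sketch below restricts the measure-preserving maps \(\varphi ,\varphi '\) to permutations of equal-measure blocks, which in general only yields an upper bound on \(\mathscr {D}_{\Box }\); this restriction, and the tiny \(3\times 3\) examples, are assumptions made purely for illustration.

```python
import itertools

def cut_norm_diff(A, B, perm_s, perm_x):
    """Max over block sets S, X of |sum_{S x X}(A_ij - B_{perm(i) perm(j)})| / n^2."""
    n = len(A)
    D = [[A[i][j] - B[perm_s[i]][perm_x[j]] for j in range(n)] for i in range(n)]
    best = 0.0
    for S in itertools.product([0, 1], repeat=n):
        for X in itertools.product([0, 1], repeat=n):
            v = sum(D[i][j] for i in range(n) if S[i] for j in range(n) if X[j])
            best = max(best, abs(v))
    return best / n ** 2

def cut_distance(A, B):
    """D_box for n x n step kernels, with maps restricted to block permutations."""
    n = len(A)
    return min(cut_norm_diff(A, B, ps, px)
               for ps in itertools.permutations(range(n))
               for px in itertools.permutations(range(n)))

A = [[0.1, 0.9, 0.5], [0.9, 0.1, 0.5], [0.5, 0.5, 0.5]]
B = [[0.5, 0.5, 0.5], [0.5, 0.1, 0.9], [0.5, 0.9, 0.1]]   # A with rows and columns relabeled
```

Here B arises from A by relabeling the row and column blocks, so the (restricted) cut distance between them vanishes, while the distance from A to the zero kernel does not.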

The formula for the free energy comes as a variational problem on a subspace \(\mathfrak {D}^\star \) of \(\mathfrak {D}\). Let \(N,M\ge 0\) be integers. For \(\mu \in \mathfrak {K}\) we define a randomly perturbed \(\mu ^{*(N,M)}\in \mathfrak {K}\) as follows. Let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of uniform random variables on [0, 1] and let \((\varvec{J}_{i,j})_{i,j\ge 1}\) be a family of standard Gaussians, all mutually independent. Then for \(s\in [0,1]\) we define

$$\begin{aligned} \varvec{z}_{s}&=\prod _{i=1}^N\left( { \sum _{\sigma \in \{\pm 1\}}\prod _{j=1}^d\left( 1+2\sigma \tanh (\beta \varvec{J}_{i,j})(\mu _{s,\varvec{x}_{i,j}}-1/2)\right) }\right) \\&\quad \cdot \prod _{i=1}^M\left( {1+4\tanh (\beta \varvec{J}_{i+N,1}) (\mu _{s,\varvec{x}_{i+N,1}}-1/2)(\mu _{s,\varvec{x}_{i+N,2}}-1/2)}\right) . \end{aligned}$$

Further, let

$$\begin{aligned} \varvec{t}&=\varvec{t}(s)=\inf \left\{ {\theta \in [0,1]:\int _0^\theta \varvec{z}_{u}{\mathrm d}u\ge s\int _0^1\varvec{z}_{u}{\mathrm d}u}\right\} ,&\qquad \text{ and }\quad \mu ^{*(N,M)}_{s,x}=\mu _{\varvec{t},x}\in \mathfrak {K}. \end{aligned}$$

Now, suppose that \(\pi \in \mathfrak {D}\) is a distribution, and write \(\varvec{\mu }^\pi \in \mathfrak {K}\) for a sample from \(\pi \). Then we let \(\mathfrak {D}^\star \) be the set of all \(\pi \in \mathfrak {D}\) such that the perturbed \(\varvec{\mu }^{\pi *(N,M)}\) has distribution \(\pi \) again for all \(N,M\ge 0\).

The definition of \(\mathfrak {D}^\star \), which is an adaptation of the one stated by Panchenko [52] in the case of models of Erdős-Rényi type, mirrors a natural combinatorial invariance property of the graph \({\mathbb {G}}_{n,\omega }\) with the random cavities. Indeed, because the numbers \(\varvec{X},\varvec{Y}\) of deleted vertices and edges are Poisson with a large mean \(\omega \), for any fixed N, M the random graph \({\mathbb {G}}_{n,\omega }\) with \(\varvec{X}\) deleted vertices and \(\varvec{Y}\) deleted edges is close in total variation to the one with merely \(\varvec{X}-N\) deleted vertices and \(\varvec{Y}-M\) deleted edges. Furthermore, because adding or removing a small number of edges only affects the Boltzmann weights by a bounded factor, we should expect that the Bethe states of these two factor graphs remain the same. But, of course, the relative probability masses of the Bethe states will be different. Accordingly, the weights \(\varvec{z}_s\) mirror the changes in the weights of the Bethe states upon re-insertion of N vertices, each with d incident edges, and another M edges into \({\mathbb {G}}_{n,\omega }\). Once we take \(\omega \) and n to infinity, the closeness of the two random factor graphs in total variation translates into the statement that the distribution of the messages emitted by the cavities of \({\mathbb {G}}_{n,\omega }\) belongs to \(\mathfrak {D}^\star \).
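The perturbation \(\mu ^{*(N,M)}\) can be sketched in code for a discretized kernel. One point worth emphasizing is that the same \(\varvec{x}_{i,j},\varvec{J}_{i,j}\) enter \(\varvec{z}_s\) for every s; only the s-coordinate is then remapped through the quantile function \(\varvec{t}\). The grid sizes and parameters below are illustrative choices.

```python
import bisect
import math
import random

random.seed(2)
K, ncols = 400, 2      # discretization of the s- and x-coordinates (illustrative)
d, beta = 3, 1.0       # toy parameters (illustrative)

# A step kernel mu[s][x]: K s-blocks, ncols x-blocks, values in [0, 1].
mu = [[random.random() for _ in range(ncols)] for _ in range(K)]

def perturb(mu, N, M):
    """One sample of mu^{*(N,M)}: reweight the s-coordinate by z_s, remap by t(s)."""
    # Shared randomness: the same x_{i,j}, J_{i,j} enter z_s for every s.
    xs = [[random.randrange(ncols) for _ in range(d)] for _ in range(N + M)]
    Js = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N + M)]
    z = []
    for row in mu:
        w = 1.0
        for i in range(N):          # vertex-type factors of z_s
            w *= sum(math.prod(1 + 2 * s * math.tanh(beta * Js[i][j]) * (row[xs[i][j]] - 0.5)
                               for j in range(d))
                     for s in (+1, -1))
        for i in range(N, N + M):   # edge-type factors of z_s
            w *= 1 + 4 * math.tanh(beta * Js[i][0]) * (row[xs[i][0]] - 0.5) * (row[xs[i][1]] - 0.5)
        z.append(w)
    cum, tot = [], 0.0
    for w in z:
        tot += w
        cum.append(tot)
    # t(s) = inf{theta : int_0^theta z_u du >= s * int_0^1 z_u du}; remap each s-block.
    return [mu[bisect.bisect_left(cum, ((i + 0.5) / K) * tot)] for i in range(K)]

new = perturb(mu, N=2, M=3)
```

The remapped kernel consists of the original s-blocks, repeated in proportion to their weights \(\varvec{z}_s\); membership of a distribution in \(\mathfrak {D}^\star \) asserts that this resampling leaves the law of the kernel invariant.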

Finally, define a functional \(\mathscr {B}:\mathfrak {K}\rightarrow \mathbb {R}\) by letting

$$\begin{aligned} \mathscr {B}(\mu )&=\mathbb {E}\Big [ \log \left( {\sum _{\sigma \in \{\pm 1\}}\int _0^1\prod _{i=1}^d \left( 1+2\sigma \tanh (\beta \varvec{J}_{i,1})(\mu _{s,\varvec{x}_{i,1}}-1/2)\right) {\mathrm d}s}\right) \\&\quad -\frac{d}{2}\log \left( {1+4\tanh (\beta \varvec{J}_{1,1})\int _0^1(\mu _{s,\varvec{x}_{1,1}}-1/2)(\mu _{s,\varvec{x}_{1,2}}-1/2) {\mathrm d}s}\right) \Big ]\\&\quad -\frac{d}{2}\log 2. \end{aligned}$$

We are ready to state the variational formula for the free energy.

Theorem 1.3

For all \(d\ge 3\) and \(\beta >0\) we have \(\displaystyle \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z_\beta ({\mathbb {G}})]=\min _{\pi \in \mathfrak {D}^\star }\mathbb {E}[\mathscr {B}(\varvec{\mu }^\pi )].\)

Theorem 1.2 provides the combinatorial interpretation of the optimal \(\pi \) for Theorem 1.3: it is the kernel representing the messages \((\mu _{{\mathbb {G}},\varvec{c}\rightarrow \varvec{c}'}[\,\cdot \,|S_h])_{c\in \mathscr {C}({\mathbb {G}}),h=1,\ldots ,\ell }\) sent out by the cavities on the individual Bethe states.

1.3 The hard-core model

As a second application we discuss the hard-core model on the random regular graph \({\mathbb {G}}={\mathbb {G}}(n,d)\). This is a probability distribution on the collection of independent sets of \({\mathbb {G}}\), parametrized by the fugacity \(\lambda >0\). Formally, encoding subsets of the vertex set by their indicator vectors, we define

$$\begin{aligned} \mu _{{\mathbb {G}}}(\sigma )&=\frac{\lambda ^{\sum _{i=1}^n\sigma _i}}{Z({\mathbb {G}})} \prod _{1\le i<j\le n}\left( 1-\varvec{1}\{v_i\in \partial v_j\}\sigma _i\sigma _j\right)&(\sigma \in \{0,1\}^n), \end{aligned}$$

with \(Z({\mathbb {G}})\) the partition function that turns \(\mu _{{\mathbb {G}}}\) into a probability measure. Thus, \(\mu _{{\mathbb {G}}}(\sigma )=0\) unless the 1-entries of \(\sigma \) form an independent set in \({\mathbb {G}}\), in which case the weight of \(\sigma \) is proportional to \(\lambda \) taken to the power of the size of the independent set.
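Concretely, \(Z({\mathbb {G}})\) is the independence polynomial of the graph evaluated at \(\lambda \). The sketch below computes it by brute force on the 5-cycle, a fixed toy graph used only for illustration: its independent sets are the empty set, 5 singletons and 5 non-adjacent pairs, so \(Z=1+5\lambda +5\lambda ^2\).

```python
import itertools

# Toy instance: the 5-cycle C5 (an illustrative fixed graph, not the random regular graph).
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

def hardcore_Z(lam):
    """Partition function: sum of lam^|S| over all independent sets S of the toy graph."""
    Z = 0.0
    for sigma in itertools.product([0, 1], repeat=n):
        if all(not (sigma[u] and sigma[v]) for (u, v) in edges):
            Z += lam ** sum(sigma)
    return Z
```

At \(\lambda =1\) this simply counts the independent sets (11 for the 5-cycle).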

The hard-core model, of great prominence in statistical physics, is of eminent importance in combinatorics as well because it is closely related to the problem of finding the size of the largest independent set of the random regular graph. For d large, this problem was solved by Ding, Sly, and Sun [34] using an intricate version of the second-moment method guided by insights from the 1-step replica symmetry breaking (1RSB) version of the cavity method. But according to the physics predictions [13], the 1RSB method runs into an inherent obstacle for small d as the model exhibits a continuous phase transition to a more complicated ‘full replica symmetry breaking’ (full RSB) phase. In Corollary 1.5 below we will derive a formula for the largest independent set size that holds for all d and that accommodates the full RSB scenario.

But let us first deal with the free energy of the hard-core model, in and of itself a well-known problem. To derive a variational formula for the free energy, obtain \(\mathfrak {K}_\lambda \) from the space of all measurable functions \([0,1]^2\rightarrow [0,\lambda /(1+\lambda )]\) by identifying any \(\nu ,\nu '\) with \(\mathscr {D}_{\Box }(\nu ,\nu ')=0\). Then \(\mathfrak {K}_\lambda \), endowed with the cut distance, is a compact metric space. In addition, we let \(\mathfrak {D}_\lambda \) be the space of probability measures on \(\mathfrak {K}_\lambda \). Similarly to the spin glass problem, the formula for the free energy comes as a variational problem on a subspace \(\mathfrak {D}^\star _\lambda \) of \(\mathfrak {D}_\lambda \). This subspace is defined as follows. Let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of independent random variables, uniformly distributed on [0, 1], and let \(N,M\ge 0\) be integers. Then for \(\mu \in \mathfrak {K}_\lambda \) we define a random \(\mu ^{*(N,M)}\in \mathfrak {K}_\lambda \) as follows. For \(s\in [0,1]\) let

$$\begin{aligned} \varvec{z}_s&=\prod _{i=1}^N\left( {1+\lambda \prod _{j=1}^d\left( 1-\mu _{s,\varvec{x}_{i,j}}\right) }\right) \prod _{i=1}^M\left( {1-\mu _{s,\varvec{x}_{i+N,1}}\mu _{s,\varvec{x}_{i+N,2}}}\right) \quad \text{ and }\\ \varvec{t}&=\varvec{t}(s)=\inf \left\{ {\theta \in [0,1]:\int _0^\theta \varvec{z}_{u}\,{\mathrm d}u\ge s\int _0^1\varvec{z}_{u}\,{\mathrm d}u}\right\} \end{aligned}$$

and set

$$\begin{aligned} \mu ^{*(N,M)}_{s,x}=\mu _{ \varvec{t},x}\in \mathfrak {K}_\lambda . \end{aligned}$$

Further, suppose that \(\pi \in \mathfrak {D}_\lambda \) is a distribution, and write \(\varvec{\mu }^\pi \in \mathfrak {K}_\lambda \) for an element chosen from \(\pi \). Then we let \(\mathfrak {D}^\star _\lambda \) be the set of all \(\pi \in \mathfrak {D}_\lambda \) such that \(\varvec{\mu }^\pi \) and \(\varvec{\mu }^{\pi *(N,M)}\) are identically distributed for all \(N,M\ge 0\). Finally, let \(\mathscr {B}:\mathfrak {K}_\lambda \rightarrow \mathbb {R}\) be the function defined by

$$\begin{aligned} \mathscr {B}(\mu )&=\mathbb {E}\left[{\log \left( {1+\lambda \int _0^1 \prod _{j=1}^d\left( 1-\mu _{s,\varvec{x}_{1,j}}\right) {\mathrm d}s}\right) -\frac{d}{2}\log \left( {1-\int _0^1\mu _{s,\varvec{x}_{1,1}}\mu _{s,\varvec{x}_{1,2}}{\mathrm d}s}\right) }\right]. \end{aligned}$$

The variational formula for the free energy reads as follows.

Theorem 1.4

For all \(d\ge 3\) and \(\lambda >0\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]=\Phi _{d,\lambda },\qquad \text{ with }\qquad \Phi _{d,\lambda }=\min _{\pi \in \mathfrak {D}^\star _\lambda }\mathbb {E}[\mathscr {B}(\varvec{\mu }^\pi )]. \end{aligned}$$

In the limit \(\lambda \rightarrow \infty \) the distribution \(\mu _{{\mathbb {G}},\lambda }\) concentrates on the maximum independent sets of the random graph. As an application of Theorem 1.4 we therefore obtain the following result on the size of the largest independent set, i.e., the independence number \(\alpha ({\mathbb {G}})\) of the random graph.

Corollary 1.5

For all \(d\ge 3\) we have \(\displaystyle \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\alpha ({\mathbb {G}})]=\lim _{\lambda \rightarrow \infty }\lambda \cdot (\Phi _{d,\lambda +1}-\Phi _{d,\lambda }).\)

The formula in Corollary 1.5 may not be easy to evaluate; in particular, it may be difficult to obtain a numerical estimate for a given value of d. Nonetheless, since the proofs show that the optimal \(\pi \) in Theorem 1.4 is closely related to the Belief Propagation fixed points on \({\mathbb {G}}\), it should be possible to extract combinatorial information about the independent set problem on random graphs. In any case, Theorem 1.4 and Corollary 1.5 put a lid on the complexity of the problem.
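On any fixed finite graph the analogue of the formula in Corollary 1.5 can be checked directly: with \(f(\lambda )=\frac{1}{n}\log Z(\lambda )\) one expects \(\lambda (f(\lambda +1)-f(\lambda ))\rightarrow \alpha /n\) as \(\lambda \rightarrow \infty \). The sketch below verifies this on the 5-cycle (\(\alpha =2\), \(n=5\)), an illustrative toy case rather than the random regular graph.

```python
import itertools
import math

# Finite-graph analogue of Corollary 1.5 on the 5-cycle, where alpha = 2 and n = 5.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

def f(lam):
    """Hard-core free energy density (1/n) log Z(lam) on the toy graph."""
    Z = sum(lam ** sum(sigma)
            for sigma in itertools.product([0, 1], repeat=n)
            if all(not (sigma[u] and sigma[v]) for (u, v) in edges))
    return math.log(Z) / n

lam = 1e4
estimate = lam * (f(lam + 1) - f(lam))   # should approach alpha / n = 2/5
```

Here \(Z(\lambda )=1+5\lambda +5\lambda ^2\), so \(f(\lambda +1)-f(\lambda )\approx \frac{2}{5\lambda }\) and the estimate is within \(O(1/\lambda )\) of 2/5.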

1.4 Organization

In Sect. 2 we present the main results of the paper, which cover a broad family of random factor graph models. At the end of Sect. 2 we are in a position to discuss related work in detail. Sections 3–6 deliver the proofs of these general results. Finally, in Sect. 7 we show how Theorems 1.1–1.4 and Corollary 1.5 follow from the general results in Sect. 2. In addition, we work through several more applications that have each received considerable attention in their own right, such as the Potts antiferromagnet.

2 Random Factor Graphs

In this section we present the main results of the paper, which cover a broad class of models called random factor graphs. The class encompasses many well-studied examples of problems on random regular graphs or hypergraphs, including the spin glass model from the previous section. Some other cases, such as the hard-core model or extremal cuts, can be dealt with by taking limits; we will come to that in Sect. 7.

2.1 Definitions

To define random factor graph models, we consider a finite set \(\Omega \ne \varnothing \) whose elements we call spins. Moreover, for an integer \(k\ge 2\) we let \((\Psi ,P)\) be a probability space of weight functions \(\psi :\Omega ^k\rightarrow (0,1)\). We always denote by \(\varvec{\psi }\) an element of \(\Psi \) chosen from the distribution P. The space \(\Psi \) may be finite or infinite. In the latter case we assume that

$$\begin{aligned} \mathbb {E}[\exp (1/ \min _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma ))]<\infty . \end{aligned}$$
(2.1)

Furthermore, we always assume that the distribution P is invariant under permutations of the coordinates. That is, for any \(\psi \in \Psi \) and for any permutation \(\kappa \) of \(\{1,\ldots ,k\}\) the function \(\psi ^\kappa :\sigma \mapsto \psi (\sigma _{\kappa _1},\ldots ,\sigma _{\kappa _k})\) belongs to \(\Psi \) as well and \(\varvec{\psi }^\kappa \) has the same distribution as \(\varvec{\psi }\). Additionally, let p be a probability distribution on \(\Omega \) with \(p(\omega )>0\) for all \(\omega \in \Omega \). Further, let \(d\ge 3\), \(n>0\) be integers and set \(m=\lfloor dn/k\rfloor \). Let \(V_n=\{v_1,\ldots ,v_n\}\) be a set of variable nodes and let \(F_m=\{a_1,\ldots ,a_m\}\) be a set of constraint nodes.

Definition 2.1

Suppose that k divides dn. The random factor graph \({\varvec{G}}={\varvec{G}}(n,d,p,P)\) consists of

  • a weight function \(\psi _{a_i}\in \Psi \) drawn from the distribution P independently for each \(i=1,\ldots ,m\) and

  • an independent uniformly random bijection \(\partial _{{\varvec{G}}}:F_m\times \{1,\ldots ,k\}\rightarrow V_n\times \{1,\ldots ,d\}\).

The definition resembles the pairing model of random regular graphs [40]. Accordingly, we use standard graph-theoretic terminology. For instance, we call \(x_i\in V_n\) and \(a_j\in F_m\) adjacent if there exist \(s\in [d]\) and \(t\in [k]\) such that \(\partial _{{\varvec{G}}}(a_j,t)=(x_i,s)\). We also use the symbol \(\partial _{{\varvec{G}}}(a_j,t)\) for the variable node \(x_i\) such that \(\partial _{{\varvec{G}}}(a_j,t)=(x_i,s)\). Further, we write \(\partial _{{\varvec{G}}}x_i\) for the set of all \(a_j\in F_m\) that \(x_i\) is adjacent to, and similarly for \(a_j\). We omit the index and just write \(\partial x_i,\partial a_j\) etc. where the reference to the random graph is apparent. In particular, \({\varvec{G}}\) induces a bipartite graph on the variable and constraint nodes, and thereby the shortest path metric on \(V_n\cup F_m\). Hence, by extension of the above notation, we write \(\partial ^\ell _{{\varvec{G}}}u\) for the set of all nodes at distance precisely \(\ell \) from u and \(\nabla ^\ell _{{\varvec{G}}}u\) for the set of all variable nodes at distance at most \(\ell \) from u.
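Definition 2.1 amounts to the configuration (pairing) model: clone each variable node d times and each constraint node k times, then match the clones by a uniformly random bijection. A minimal sketch in Python (the function name and data layout are ours, not from the paper):

```python
import random

def sample_pairing(n, d, k, rng=random):
    """Sample the random bijection of Definition 2.1: constraint clones
    F_m x {1..k} are matched to variable clones V_n x {1..d} uniformly."""
    assert (d * n) % k == 0, "k must divide d*n"
    m = (d * n) // k
    # All clones (i, s) of the variable nodes v_1, ..., v_n.
    clones = [(i, s) for i in range(n) for s in range(d)]
    rng.shuffle(clones)
    # partner[(j, t)] = variable clone matched to clone t of constraint a_j.
    partner = {(j, t): clones[j * k + t] for j in range(m) for t in range(k)}
    return partner

pairing = sample_pairing(6, 3, 2)
# Every variable clone is used exactly once, so each variable has degree d,
# while multi-edges (a_j hitting the same x_i twice) may occur.
```

Conditioning on the event \(\mathscr {S}\) below discards the outcomes with multi-edges.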

We let \(\mathscr {S}\) be the event that \({\varvec{G}}\) is simple, i.e., that there are no multiple edges between any pair of variable and constraint nodes. Moreover, we denote by \({\mathbb {G}}\) the conditional distribution of \({\varvec{G}}\) given \(\mathscr {S}\). Let us make a note of the following well known fact.

Fact 2.2

([40]). We have \(\mathbb {P}\left[{{\varvec{G}}\in \mathscr {S}}\right]\sim \exp \left[{-(d-1)(k-1)/2-\varvec{1}\{k=2\}(d-1)^2/4}\right]\).

The random factor graph induces a probability distribution on \(\Omega ^{V_n}\). To define it, we introduce the shorthand \(\psi _{a_i}(\sigma )=\psi _{a_i}(\sigma (\partial (a_i,1)),\ldots ,\sigma (\partial (a_i,k)))\) for \(i\in [m]\) and \(\sigma \in \Omega ^{V_n}\). Thus, \(\psi _{a_i}(\sigma )\) is the weight that constraint node \(a_i\) gives to \(\sigma \). Further, we introduce the total weight

$$\begin{aligned} \psi _{{\varvec{G}}}(\sigma )&=\prod _{i=1}^m\psi _{a_i}(\sigma )&(\sigma \in \Omega ^{V_n}) \end{aligned}$$

by multiplying up all the weight functions of the constraint nodes. The total weights \(\psi _{{\varvec{G}}}(\sigma )\) give rise to the partition function and the Boltzmann distribution:

$$\begin{aligned} Z({\varvec{G}})&=\sum _{\tau \in \Omega ^{V_n}}\psi _{{\varvec{G}}}(\tau )\prod _{i=1}^np(\tau (x_i)),&\mu _{{\varvec{G}}}(\sigma )&=\frac{\psi _{{\varvec{G}}}(\sigma )}{Z({\varvec{G}})}\prod _{i=1}^np(\sigma (x_i))&(\sigma \in \Omega ^{V_n}). \end{aligned}$$
(2.2)

Since all the weight functions \(\psi \in \Psi \) are strictly positive, the Boltzmann distribution is a well-defined probability measure on the phase space \(\Omega ^{V_n}\).
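For very small instances the quantities in (2.2) can be evaluated by brute force, which is useful as a sanity check against any approximation scheme. A sketch (the function name and the instance encoding are ours):

```python
import itertools, math

def boltzmann(n, spins, prior, constraints):
    """Brute-force partition function Z and Boltzmann distribution mu of (2.2).
    `constraints` lists pairs (neighbors, psi): the ordered tuple of variable
    indices attached to a constraint node, and its weight function psi > 0.
    Runtime |spins|^n, so this is a sanity check for tiny n only."""
    weights = {}
    for sigma in itertools.product(spins, repeat=n):
        w = 1.0
        for neighbors, psi in constraints:
            w *= psi(tuple(sigma[v] for v in neighbors))
        for s in sigma:
            w *= prior[s]
        weights[sigma] = w
    Z = sum(weights.values())
    return Z, {sigma: w / Z for sigma, w in weights.items()}

# A toy 2-spin example: pairwise weights on a path of three variables.
psi = lambda t: 0.5 * (1 + math.tanh(0.5) * t[0] * t[1])
Z, mu = boltzmann(3, (-1, 1), {-1: 0.5, 1: 0.5},
                  [((0, 1), psi), ((1, 2), psi)])
```

Since the weights are strictly positive, mu is a well-defined probability measure, in line with the remark above.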

We set out to investigate the structure of the Boltzmann distribution \(\mu _{{\varvec{G}}}(\,\cdot \,)\) and to compute the partition function \(Z({\varvec{G}})\) or, more specifically, its logarithm, which we call the free energy. In Sect. 2.2 we will prove the main result of the paper, which shows that the Boltzmann distribution decomposes into a convex combination of relatively simple distributions called Bethe states. But before we come to that, let us look at an example.

Example 2.3

(the k-spin model). Let \(\Omega =\{\pm 1\}\), let \(k\ge 2\) be an integer and let \(\beta >0\) be a real parameter. The k-spin model is a generalization of the spin glass model from the previous section, which corresponds to the special case \(k=2\). The weight functions of the k-spin model read

$$\begin{aligned} \psi _{\beta ,J}(\sigma _1,\ldots ,\sigma _k)&=\frac{1}{2}\left( {1+\tanh (\beta J)\prod _{i=1}^k\sigma _i}\right) \qquad \qquad \qquad (J\in \mathbb {R}). \end{aligned}$$

Thus, \(\Psi =\{\psi _{\beta ,J}:J\in \mathbb {R}\}\), and the distribution P on \(\Psi \) is defined by choosing J from the standard Gaussian distribution. This distribution clearly satisfies (2.1). Geometrically, this model lives on a generalized Bethe lattice where all variable nodes, representing the sites, have degree d, while all constraint nodes, representing the interactions, have degree k.
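In code, the weight function of the k-spin model and the random choice of J might look as follows (a sketch; the helper name is ours):

```python
import math, random

def psi_kspin(beta, J):
    """Weight function psi_{beta,J} of the k-spin model (Example 2.3)."""
    def psi(sigma):  # sigma: a tuple of k spins from {-1, +1}
        prod = 1
        for s in sigma:
            prod *= s
        return 0.5 * (1.0 + math.tanh(beta * J) * prod)
    return psi

# The distribution P: draw J from a standard Gaussian.
psi = psi_kspin(beta=0.5, J=random.gauss(0.0, 1.0))
# Since |tanh| < 1, all values lie strictly in (0, 1), as Sect. 2.1 requires.
```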

The hard-core model from Sect. 1.3 cannot be expressed as a factor graph model directly because of the requirement that all weight functions be strictly positive. But it is possible to arrive at the hard-core model by taking suitable limits; see Sect. 7 for details.

2.2 Bethe states

The Belief Propagation message-passing scheme provides the mainstay of the physicists’ non-rigorous cavity method. Our first main result vindicates its use by showing that the Boltzmann distribution of any random factor graph model can be described in terms of Belief Propagation fixed points.

To introduce Belief Propagation let \(\mathscr {M}({\varvec{G}})\) be the message space, consisting of all families

$$\begin{aligned} \nu =(\nu _{v\rightarrow a},\nu _{a\rightarrow v})_{v\in V_n, a\in F_m:v\in \partial _{{\varvec{G}}}a} \end{aligned}$$

of probability measures \(\nu _{v\rightarrow a},\nu _{a\rightarrow v}\) on \(\Omega \). For adjacent av we interpret \(\nu _{v\rightarrow a}\) as a ‘message’ from v to a, and \(\nu _{a\rightarrow v}\) as a message in the reverse direction. We equip \(\mathscr {M}({\varvec{G}})\) with the metric

$$\begin{aligned} \mathscr {D}_1(\nu ,\nu ')&=\frac{1}{n}\sum _{v,a:v\in \partial _{{\varvec{G}}}a}\left( {\left\| {\nu _{v\rightarrow a}-\nu '_{v\rightarrow a}}\right\| _{\mathrm {TV}}+\left\| {\nu _{a\rightarrow v}-\nu '_{a\rightarrow v}}\right\| _{\mathrm {TV}}}\right) . \end{aligned}$$

Belief Propagation is the operator \(\mathrm {BP}:\mathscr {M}({\varvec{G}})\rightarrow \mathscr {M}({\varvec{G}})\) that maps \(\nu \) to \({\hat{\nu }}\) defined by

$$\begin{aligned} {\hat{\nu }}_{v\rightarrow a}(\sigma )&=\frac{p(\sigma )\prod _{b\in \partial v{\setminus } a}\nu _{b\rightarrow v}(\sigma )}{\sum _{\tau \in \Omega }p(\tau )\prod _{b\in \partial v{\setminus } a}\nu _{b\rightarrow v}(\tau )},\nonumber \\ {\hat{\nu }}_{a\rightarrow v}(\sigma )&=\frac{\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _v=\sigma \}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\nu _{w\rightarrow a}(\tau _w)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\nu _{w\rightarrow a}(\tau _w)}. \end{aligned}$$
(2.3)

Further, a point \(\nu \in \mathscr {M}({\varvec{G}})\) is an \(\varepsilon \)-Belief Propagation fixed point if \(\mathscr {D}_1(\nu ,\mathrm {BP}(\nu ))<\varepsilon .\)

For a thorough discussion and motivation of Belief Propagation we refer to [46]. The punch line is that on acyclic factor graphs a Belief Propagation fixed point computation provably yields the marginals of the Boltzmann distribution as well as the free energy. Since the random graph \({\varvec{G}}\) contains only very few short cycles, one may therefore expect that Belief Propagation yields meaningful information about random factor graphs as well, provided that the Boltzmann distribution is free of long-range correlations.
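The update rule (2.3) transcribes directly into code. A sketch with our own data layout (messages as dicts over spins, keyed by directed edges; none of these names come from the paper):

```python
import itertools

def bp_update(spins, prior, var_nbrs, fac_nbrs, psi, nu):
    """One parallel application of the BP operator (2.3). Messages nu are
    keyed ('v', v, a) for v->a and ('a', a, v) for a->v; each message is a
    dict over spins. psi[a] takes a spin tuple ordered like fac_nbrs[a]."""
    new = {}
    # Variable-to-constraint messages: prior times incoming factor messages.
    for v, facs in var_nbrs.items():
        for a in facs:
            msg = {s: prior[s] for s in spins}
            for b in facs:
                if b == a:
                    continue
                for s in spins:
                    msg[s] *= nu[('a', b, v)][s]
            z = sum(msg.values())
            new[('v', v, a)] = {s: msg[s] / z for s in spins}
    # Constraint-to-variable messages: marginalize psi over the other variables.
    for a, vs in fac_nbrs.items():
        for v in vs:
            idx = vs.index(v)
            msg = {s: 0.0 for s in spins}
            for tau in itertools.product(spins, repeat=len(vs)):
                weight = psi[a](tau)
                for u, t in zip(vs, tau):
                    if u != v:
                        weight *= nu[('v', u, a)][t]
                msg[tau[idx]] += weight
            z = sum(msg.values())
            new[('a', a, v)] = {s: msg[s] / z for s in spins}
    return new
```

On an acyclic factor graph, iterating this map converges after at most diameter-many rounds, and (2.9) then recovers the exact Boltzmann marginals.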

Alas, in general long-range correlations do occur. Nevertheless, we will prove that the Boltzmann distribution still decomposes into a convex combination of relatively few ‘Bethe states’, characterized by Belief Propagation fixed points. To be precise, suppose that \(\varnothing \ne S\subset \Omega ^{V_n}\) is an event. Let v be a variable node and let \(a\in \partial _{{\varvec{G}}} v\). Then we define \(\mu _{{\varvec{G}},v\rightarrow a}(\,\cdot \,|S)\) as the conditional marginal of v given S under the Boltzmann distribution of the factor graph \({\varvec{G}}-a\) obtained from \({\varvec{G}}\) by removing the constraint node a. In formulas, with \(\left\langle {{\,\cdot \,},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle \) denoting the expectation with respect to \(\varvec{\sigma }\) drawn from \(\mu _{{\varvec{G}}}(\,\cdot \,|S)\), we have

$$\begin{aligned} \mu _{{\varvec{G}},v\rightarrow a}(\sigma \mid S)&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/\psi _a(\varvec{\sigma })},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle }{\left\langle {{1/\psi _a(\varvec{\sigma })},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle }&(\sigma \in \Omega ). \end{aligned}$$
(2.4)

Similarly, we let \(\mu _{{\varvec{G}},a\rightarrow v}(\,\cdot \,|S)\) be the conditional marginal of v under the Boltzmann distribution of the factor graph obtained from \({\varvec{G}}\) by removing all constraint nodes \(b\in \partial _{{\varvec{G}}} v{\setminus } a\) and disregarding the prior of v:

$$\begin{aligned} \mu _{{\varvec{G}},a\rightarrow v}(\sigma \mid S)&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/(p(\sigma )\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle }{\left\langle {{1/(p(\varvec{\sigma }_v)\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle }&(\sigma \in \Omega ). \end{aligned}$$
(2.5)

We refer to \(\mu _{{\varvec{G}},v\rightarrow a}(\,\cdot \,|S),\mu _{{\varvec{G}},a\rightarrow v}(\,\cdot \,|S)\) as the standard messages given S.

Definition 2.4

Let \(\varepsilon >0\). An event \(S\subset \Omega ^n\) is an \(\varepsilon \)-Bethe state of \({\varvec{G}}\) if the following two conditions hold.

BS1::

the standard messages given S are an \(\varepsilon \)-Belief Propagation fixed point.

BS2::

if \(\ell ,\ell '\le 1/\varepsilon \) and if \(\varvec{I}\subset V_n\), \(\varvec{J}\subset F_m\) are independent uniformly random sets of sizes \(|\varvec{I}|=\ell \), \(|\varvec{J}|=\ell '\), then for every \(\sigma \in \Omega ^{V_n}\) we have

$$\begin{aligned}&\mathbb {E}\bigg |\left\langle {{\varvec{1}\{\forall v\in \varvec{I}\cup \partial \varvec{J}\cup \partial ^2\varvec{I}: \varvec{\sigma }_v=\sigma _v\}},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle \nonumber \\&\quad -\prod _{v\in \varvec{I}}\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{w\rightarrow a}(\tau _w|S)}\cdot \nonumber \\&\quad \prod _{a\in \partial \varvec{J}}\frac{\psi _a(\sigma )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\tau _w|S)} \bigg |<\varepsilon . \end{aligned}$$
(2.6)

Thus, on a Bethe state the standard messages form an approximate Belief Propagation fixed point. Furthermore, locally around a few randomly chosen variable and constraint nodes the Boltzmann distribution is characterized by the standard messages. In particular, setting \(\ell =0\) and \(\ell '=1\) in BS2, we see that the conditional joint distribution \(\mu _{{\varvec{G}},\partial a}(\,\cdot \,|S)\) of the variables around a typical random constraint node a reads

$$\begin{aligned} \mu _{{\varvec{G}},\partial a}(\sigma \mid S)&=\frac{\psi _a(\sigma )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\tau _w|S)}+O(\varepsilon )&(\sigma \in \Omega ^{\partial a}). \end{aligned}$$
(2.7)

Additionally, setting \(\ell =1\) and \(\ell '=0\), we find that the local distribution around a typical variable node v, i.e., the distribution \(\mu _{{\varvec{G}},v\cup \partial ^2v}(\,\cdot \,|S)\) induced on the second neighborhood of v, reads

$$\begin{aligned} \mu _{{\varvec{G}},v\cup \partial ^2v}(\sigma \mid S)&=\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a}\mu _{w\rightarrow a}(\tau _w|S)}\nonumber \\&\quad +O(\varepsilon ) \quad (\sigma \in \Omega ^{v\cup \partial ^2v}). \end{aligned}$$
(2.8)

Thus, for most variable nodes v the conditional Boltzmann marginal \(\mu _{{\varvec{G}},v}(\,\cdot \,|S)\) satisfies

$$\begin{aligned} \mu _{{\varvec{G}},v}(\sigma \mid S)&=\frac{p(\sigma )\prod _{a\in \partial v}\mu _{a\rightarrow v}(\sigma |S)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\mu _{a\rightarrow v}(\chi |S)}+O(\varepsilon )&(\sigma \in \Omega ). \end{aligned}$$
(2.9)

Apart from the conditioning on S, the formulas (2.7)–(2.9) coincide with the ones known in the acyclic case [46].

In addition, (2.6) implies that if we pick a few variable and/or constraint nodes randomly, then the joint distribution of their neighborhoods approximately factorizes. Applied to \(\ell =2\), \(\ell '=0\), this means that once we condition on S, the joint distribution of two randomly chosen variable nodes is close to a product distribution:

$$\begin{aligned} \frac{1}{n^2}\sum _{1\le i<j\le n}\mathbb {E}\left\| {\mu _{{\varvec{G}},v_i,v_j}(\,\cdot \,|S)-\mu _{{\varvec{G}},v_i}(\,\cdot \,|S)\otimes \mu _{{\varvec{G}},v_j}(\,\cdot \,|S)}\right\| _{\mathrm {TV}}&=O(\varepsilon ); \end{aligned}$$
(2.10)

in statistical physics jargon, the conditional distribution \(\mu _{{\varvec{G}}}(\,\cdot \,|S)\) is replica symmetric.
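Given any distribution mu over spin assignments (e.g., from a brute-force computation on a tiny instance), the replica symmetry condition (2.10) can be checked directly. A sketch with our own helper names:

```python
def tv(p, q):
    """Total variation distance between two dicts over the same keys."""
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))

def avg_pair_correlation(mu, n, spins):
    """Average over pairs i<j of ||mu_{i,j} - mu_i (x) mu_j||_TV, as in (2.10).
    `mu` maps length-n spin tuples to probabilities."""
    # Single-variable marginals.
    marg = [{s: 0.0 for s in spins} for _ in range(n)]
    for sigma, w in mu.items():
        for i in range(n):
            marg[i][sigma[i]] += w
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            joint = {(a, b): 0.0 for a in spins for b in spins}
            for sigma, w in mu.items():
                joint[(sigma[i], sigma[j])] += w
            prod = {(a, b): marg[i][a] * marg[j][b] for a in spins for b in spins}
            total += tv(joint, prod)
            pairs += 1
    return total / pairs

# A product measure scores 0; long-range order drives the score up.
```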

Confirming the picture sketched by the cavity method and vindicating the use of Belief Propagation for the study of the Boltzmann distribution, the following theorem shows that w.h.p. the Boltzmann distribution of a random factor graph decomposes into a relatively small number of Bethe states.

Theorem 2.5

For any function \(L=L(n)\rightarrow \infty \) there exists \(\varepsilon =\varepsilon (n)\rightarrow 0\) such that the following is true. There exists a decomposition \(S_0=S_0({\varvec{G}}),S_1=S_1({\varvec{G}}),\ldots ,S_\ell =S_\ell ({\varvec{G}})\), \(\ell =\ell ({\varvec{G}})\le L\), of \(\Omega ^n\) into non-empty sets such that with high probability \(\mu _{{\varvec{G}}}(S_0)\le \varepsilon \) and \(S_1,\ldots ,S_\ell \subset \Omega ^n\) are \(\varepsilon \)-Bethe states. The same statement holds with \({\varvec{G}}\) replaced by \({\mathbb {G}}\).

An important feature of Theorem 2.5 is that the upper bound L on the size of the Bethe state decomposition can be an arbitrarily slowly growing function of n. Thus, the Gibbs measure can generally be decomposed into relatively few Bethe states, within which long-range correlations are negligible and where short-range correlations are characterized by Belief Propagation.

2.3 The free energy

Apart from the structure of the Boltzmann distribution, a second key challenge is the computation of the free energy. More specifically, arguably the single most important quantity associated with a random factor graph model is the free energy density

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\varvec{G}})}\right]. \end{aligned}$$
(2.11)

Of course, computing (2.11) generally poses a formidable challenge. In fact, even the existence of the limit remains an unresolved problem in several interesting cases.

The next theorem provides a formula for (and en passant establishes the existence of) the limit (2.11) in terms of the Bethe state decomposition from Theorem 2.5 for a broad class of models. We merely require a certain ‘convexity condition’. This condition can be stated neatly in terms of a space that resembles the graphon space from combinatorics [44]. Specifically, let \(\mathscr {K}\) be the space of all measurable maps \([0,1]^2\rightarrow \mathscr {P}(\Omega )\) modulo equality (Lebesgue-)almost everywhere. We call these maps strong kernels. For \((s,x)\in [0,1]^2\) and \(\mu \in \mathscr {K}\) we let \(\mu _{s,x}\in \mathscr {P}(\Omega )\) denote the function value of \(\mu \) at \((s,x)\). Further, for \(\mu ,\mu '\in \mathscr {K}\) we define the cut distance

$$\begin{aligned} \mathscr {D}_{\Box }(\mu ,\mu ')&=\inf _{\varphi ,\varphi '}\,\sup _{\begin{array}{c} S,X\subset [0,1]\\ \omega \in \Omega \end{array}} \left| {\int _S\int _X\mu _{s,x}(\omega )-\mu '_{\varphi (s),\varphi '(x)}(\omega ){\mathrm d}x\,{\mathrm d}s}\right| , \end{aligned}$$
(2.12)

where the infimum is over all measurable \(\varphi ,\varphi ':[0,1]\rightarrow [0,1]\) that preserve the Lebesgue measure and where the supremum runs over all measurable \(S,X\subset [0,1]\). Strictly speaking, \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) is a pre-metric (as possibly \(\mathscr {D}_{\Box }(\mu ,\nu )=0\) even though \(\mu \ne \nu \)). We therefore let \(\mathfrak {K}\) be the metric space where any two \(\mu ,\nu \) with \(\mathscr {D}_{\Box }(\mu ,\nu )=0\) are identified. Then \(\mathfrak {K}\) is a compact Polish space [39]. Additionally, we write \(\mathfrak {D}\) for the space of all probability distributions on \(\mathfrak {K}\).

Crucially, the convexity assumption that we require comes solely in terms of the distribution P on the set \(\Psi \) of weight functions. Namely, let \(\varvec{x}=(\varvec{x}_i)_{i\ge 1}\) be a sequence of independent uniformly random points in [0, 1], chosen independently of \(\varvec{\psi }\in \Psi \). Writing \(\mathbb {E}\left[{\,\cdot \,}\right]\) for the expectation on \(\varvec{x},\varvec{\psi }\), we make the following assumption.

[Assumption POS: a convexity condition on the distribution P, phrased in terms of the expectation over \(\varvec{x},\varvec{\psi }\); rendered as a display in the original.]

We will see in Sect. 7 that POS is easily verified for several interesting models, including the spin glass model from Sect. 1.

To obtain the formula for the free energy, we will represent the Bethe state decomposition of the random factor graph by a point in \(\mathfrak {K}\). Specifically, let \(\varvec{X},\varvec{Y}\) be random variables with distribution \(\mathrm{Po}(\omega )\) for an integer \(\omega >0\), mutually independent and independent of \({\varvec{G}}\). Then with \(S_1,\ldots ,S_\ell \) the decomposition promised by Theorem 2.5 we introduce for \(i=1,\ldots ,\ell \),

$$\begin{aligned} {\check{\varvec{z}}}_{{\varvec{G}},i}=\mu _{{\varvec{G}}}(S_i)&\cdot \prod _{j=1}^{\varvec{X}}\left( {\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v_j}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _{v_j}=\chi \}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v_j}\mu _{w\rightarrow a}(\tau _w|S_i)}\right) ^{-1}\nonumber \\&\cdot \prod _{j=1}^{\varvec{Y}}\left( {\sum _{\tau \in \Omega ^{\partial a_j}}\psi _{a_j}(\tau )\prod _{w\in \partial a_j}\mu _{w\rightarrow a_j}(\tau _w|S_i)}\right) ^{-1}, \end{aligned}$$
(2.13)

and we let \({\check{\varvec{z}}}_{{\varvec{G}}}=\sum _{i=1}^\ell {\check{\varvec{z}}}_{{\varvec{G}},i}\). It will emerge that combinatorially \({\check{\varvec{z}}}_{{\varvec{G}},i}/{\check{\varvec{z}}}_{{\varvec{G}}}\) represents the probability mass of the Bethe state \(S_i\) in the factor graph \({\varvec{G}}'\) where we remove the first \(\varvec{Y}\) constraint nodes \(a_1,\ldots ,a_{\varvec{Y}}\) as well as the first \(\varvec{X}\) variable nodes \(v_1,\ldots ,v_{\varvec{X}}\) along with their adjacent constraint nodes. While this removal operation has no discernible impact on the free energy (so long as \(\omega =o(n)\)), it enables us to set up a recurrence for computing this quantity.

The recurrence comes in terms of the messages sent out by those variable nodes that are left with degree \(d-1\) after the removal operation. We thus set up a kernel that captures these messages. Specifically, let \(v_{h_1},\ldots ,v_{h_t}\) be the variable nodes of degree \(d-1\) in the factor graph \({\varvec{G}}'\) and let \(b_{1},\ldots ,b_{t}\) be their \({\varvec{G}}\)-neighbors that got deleted. Then we define the kernel \({\check{\mu }}_{{\varvec{G}},X,Y}:[0,1]^2\rightarrow \mathscr {P}(\Omega )\) by letting

$$\begin{aligned} {\check{\mu }}_{{\varvec{G}},X,Y}:&(s,x)\mapsto \sum _{i=1}^t\sum _{j=1}^\ell \varvec{1}\left\{ {(i-1)/t\le x<i/t,\,\sum _{h<j}{\check{\varvec{z}}}_{{\varvec{G}},h}<s\,{\check{\varvec{z}}}_{{\varvec{G}}}\le \sum _{h\le j}{\check{\varvec{z}}}_{{\varvec{G}},h}}\right\} \mu _{{\varvec{G}},v_{h_i}\rightarrow b_i}(\,\cdot \,|S_j). \end{aligned}$$
(2.14)

Recalling that \({\varvec{G}},\varvec{X},\varvec{Y}\) are random, we write \({\check{\pi }}_{n,\omega }\in \mathfrak {D}\) for the distribution of \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\). Analogously, we write \({\check{\pi }}_{n,\omega ,\mathscr {S}}\) for the distribution of \({\check{\mu }}_{{\mathbb {G}},\varvec{X},\varvec{Y}}\) defined for the simple random factor graph.
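The kernel (2.14) is a step function: the s-axis of the unit square is cut into bands proportional to the Bethe state weights \({\check{\varvec{z}}}_{{\varvec{G}},j}\), and the x-axis into t equal strips, one per degree-\((d-1)\) variable node. A sketch with our own encoding (one weight per state, one message per state and node):

```python
def step_kernel(weights, messages):
    """Build the map (s, x) -> P(Omega) of (2.14). `weights[j]` is the weight
    of Bethe state j; `messages[j][i]` is the message distribution of the
    i-th degree-(d-1) variable node within state j (a dict over spins)."""
    total = sum(weights)
    t = len(messages[0])
    def mu(s, x):
        # Locate the state j from the s-coordinate...
        acc, j = 0.0, 0
        while j < len(weights) - 1 and acc + weights[j] < s * total:
            acc += weights[j]
            j += 1
        # ...and the variable node i from the x-coordinate.
        i = min(int(x * t), t - 1)
        return messages[j][i]
    return mu
```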

Finally, we introduce a functional on the space \(\mathfrak {D}\) that encodes the recurrence for computing the free energy from the Bethe state decomposition. Namely, let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of random variables that are uniform on [0, 1], let \((\varvec{h}_i)_{i\ge 1}\) be a family of random variables that are uniform on \(\{1,\ldots ,k\}\), let \((\varvec{\psi }_i)_{i\ge 1}\) be a sequence of samples from P, and let \(\varvec{\mu }^{\pi }\in \mathfrak {K}\) be a sample from \(\pi \in \mathfrak {D}\), all mutually independent; then

$$\begin{aligned} \mathscr {B}(\pi )&=\mathbb {E}\Big [ \log \int _0^1\sum _{\sigma \in \Omega }p(\sigma )\prod _{i=1}^{d} \sum _{\begin{array}{c} \tau \in \Omega ^k:\\ \tau _{\varvec{h}_i}=\sigma \end{array}}\varvec{\psi }_{i}(\tau )\prod _{j\ne \varvec{h}_i}\varvec{\mu }^\pi _{s,\varvec{x}_{i,j}}(\tau _j){\mathrm d}s\nonumber \\&\quad -d(1-k^{-1})\log \int _0^1\sum _{\tau \in \Omega ^k}\varvec{\psi }_1(\tau )\prod _{j=1}^k\varvec{\mu }^\pi _{s,\varvec{x}_{1,j}}(\tau _j){\mathrm d}s \Big ]. \end{aligned}$$
(2.15)

We obtain the following expression for the free energy.

Theorem 2.6

Assume that condition POS is satisfied. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\varvec{G}})}\right]&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n \rightarrow \infty } \mathscr {B}(\check{\pi }_{n,\omega }), \\ \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\mathbb {G}})}\right]&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n \rightarrow \infty } \mathscr {B}(\check{\pi }_{n,\omega ,\mathscr {S}}). \end{aligned}$$

In particular, the limit on the left hand side exists, and it can be computed from the Bethe state decomposition.
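To make (2.15) concrete, consider the k-spin model with \(\pi \) a point mass on the constant kernel \(\mu _{s,x}\equiv \) uniform on \(\{\pm 1\}\): the integrals over s and the \(\varvec{x}_{i,j}\) then collapse, and the outer expectation can be estimated by sampling the \(\varvec{\psi }_i\) and \(\varvec{h}_i\). A sketch under this choice of \(\pi \) (our own construction, not from the paper); in fact, for this particular kernel the randomness cancels and a hand computation gives \(\mathscr {B}(\pi )=-(d/k)\log 2\) exactly:

```python
import itertools, math, random

def bethe_uniform_kspin(d, k, beta, samples=200, rng=random):
    """Monte Carlo estimate of the Bethe functional (2.15) for the k-spin
    model, with pi a point mass on the constant uniform kernel."""
    spins = (-1, 1)
    def psi(J, tau):
        prod = 1
        for t in tau:
            prod *= t
        return 0.5 * (1.0 + math.tanh(beta * J) * prod)
    est = 0.0
    for _ in range(samples):
        Js = [rng.gauss(0.0, 1.0) for _ in range(d)]
        hs = [rng.randrange(k) for _ in range(d)]
        # Variable term: sum over sigma of p(sigma) times a product of d factors,
        # each factor summing psi over tau with tau_{h_i} = sigma.
        var_term = 0.0
        for sigma in spins:
            prod = 0.5  # p(sigma) for the uniform prior
            for i in range(d):
                inner = 0.0
                for tau in itertools.product(spins, repeat=k):
                    if tau[hs[i]] != sigma:
                        continue
                    inner += psi(Js[i], tau) * 0.5 ** (k - 1)
                prod *= inner
            var_term += prod
        # Constraint term with psi_1, weighted by d(1 - 1/k).
        con_term = sum(psi(Js[0], tau) * 0.5 ** k
                       for tau in itertools.product(spins, repeat=k))
        est += math.log(var_term) - d * (1.0 - 1.0 / k) * math.log(con_term)
    return est / samples
```

Non-constant kernels make the s-integral in (2.15), and hence the choice of \(\pi \), genuinely matter.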

2.4 A variational formula

We proceed to state a variational formula for the free energy of the random factor graph models akin to the one from Theorem 1.3 for the spin glass model. Namely, we express the limit (2.11) variationally as the infimum of \(\mathscr {B}(\pi )\) over \(\pi \) chosen from a certain subspace \(\mathfrak {D}^\star \subset \mathfrak {D}\). The definition of \(\mathfrak {D}^\star \) is an adaptation to the Bethe lattice of the invariance property that Panchenko [52] put forward in the case of the Erdős-Rényi model.

To define the subspace \(\mathfrak {D}^\star \) let \(\mu \in \mathscr {K}\), let \(s\in [0,1]\) and let \(N,M\ge 0\) be integers. We introduce the random variable

$$\begin{aligned} \varvec{z}(s)=&\prod _{i=1}^N\left[{\sum _{\sigma \in \Omega }p(\sigma )\prod _{j=1}^d\sum _{\tau \in \Omega ^k}\varvec{1}\{\tau _{\varvec{h}_{d(i-1)+j}}=\sigma \} \varvec{\psi }_{d(i-1)+j}(\tau )\prod _{h\ne \varvec{h}_{d(i-1)+j}}\mu _{s,\varvec{x}_{k(d(i-1)+j)+h}}(\tau _h)}\right]\nonumber \\&\cdot \prod _{i=1}^M\left[{\sum _{\tau \in \Omega ^k}\varvec{\psi }_{dN+i}(\tau )\prod _{j=1}^k\mu _{s,\varvec{x}_{k(dN+i)+j}}(\tau _j)}\right]. \end{aligned}$$
(2.16)

Further, let

$$\begin{aligned} \varvec{t}&=\varvec{t}(s)=\inf \left\{ {\theta \in [0,1]:\int _0^\theta \varvec{z}(u){\mathrm d}u\ge s\int _0^1 \varvec{z}(u){\mathrm d}u}\right\}&\text{ and }\qquad \mu ^{*(N,M)}_{s,x}=\mu _{\varvec{t},x}. \end{aligned}$$
(2.17)

Thus, for each \(\mu \in \mathfrak {K}\) we obtain a random \(\mu ^{*(N,M)}\in \mathfrak {K}\). Further, given \(\pi \in \mathfrak {D}\) we can apply this operation to a randomly chosen kernel \(\varvec{\mu }^\pi \in \mathfrak {K}\), thus obtaining a random kernel \(\varvec{\mu }^{\pi *(N,M)}\). We denote the distribution of \(\varvec{\mu }^{\pi *(N,M)}\) by \(\pi ^{*(N,M)}\). Now, let \(\mathfrak {D}^\star \) be the set of all distributions \(\pi \in \mathfrak {D}\) such that \(\pi ^{*(N,M)}=\pi \) for all \(N,M\ge 0\). Then we obtain the following self-contained formula for the free energy.

Theorem 2.7

Assume that POS holds. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\varvec{G}})}\right]= \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\mathbb {G}})}\right]=\min _{\pi \in \mathfrak {D}^\star }\mathscr {B}(\pi ). \end{aligned}$$

Admittedly, the variational formula may not be easy to evaluate. But Theorem 2.7 places a lid on the complexity of the problem, and Theorem 2.6 provides an explicit combinatorial interpretation of the minimizer in terms of Belief Propagation fixed points and Bethe states.
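As a concrete illustration of the reweighting step, the map \(s\mapsto \varvec{t}(s)\) of (2.17) is an inverse-CDF (quantile) transform for the measure on [0, 1] with density proportional to \(\varvec{z}\). A discretized sketch (grid resolution and names are ours):

```python
def quantile(z, s, grid=10**5):
    """Approximate t(s) = inf{theta: int_0^theta z >= s * int_0^1 z} on a
    uniform grid, for a positive function z on [0, 1], as in (2.17)."""
    vals = [z((u + 0.5) / grid) for u in range(grid)]
    total = sum(vals)
    target = s * total
    acc = 0.0
    for u, v in enumerate(vals):
        acc += v
        if acc >= target:
            return (u + 1) / grid
    return 1.0
```

When z is constant, e.g., when the kernel \(\mu \) does not depend on s, we get \(\varvec{t}(s)=s\) and hence \(\mu ^{*(N,M)}=\mu \); so point masses on such kernels satisfy the invariance defining \(\mathfrak {D}^\star \).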

2.5 Discussion and related work

Over the past two decades an enormous amount of research, based on both rigorous and non-rigorous techniques, has been devoted to random factor graph models. Much of this work has been sparked by the cavity method advanced in the original contribution of Mézard and Parisi [48]. A survey of this literature up until about 2008 can be found in [46]. More recently models of Bayesian inference problems such as the stochastic block model have received a great deal of attention as well; this literature is surveyed in [1, 50, 58].

Rigorous work on random factor graphs and the cavity method can broadly be split into two categories. The first comprises contributions that investigate physics predictions for specific models. Many of these contributions, particularly the earlier ones, rely on ‘classical’ techniques such as the second moment method, albeit frequently with physics-inspired twists. Examples include work on the k-SAT threshold [3, 6, 20, 21, 32], which culminated in the proof of the k-SAT threshold conjecture for large k [33], on the Potts model and the random graph coloring problem [4, 11, 16, 26], and on the hard-core model [27, 34]. Some recent work is based on the powerful but technically demanding idea of ‘spatial coupling’, which has led to important results in, e.g., coding theory [37] and random constraint satisfaction problems [2]. A second line of work focuses on the mathematical vindication of the cavity method in general, with applications to specific models of interest. Examples include work on the role of spatial mixing [7, 28, 29, 30], the use of the interpolation method [14, 43, 55], phase transitions in inference problems [12, 19], and contributions based on the asymptotic analysis of the Boltzmann distribution such as the influential work of Panchenko [52] as well as [10, 15]. The present paper belongs to this second category.

In the following we discuss the main results and methods of the paper and how they compare to prior mathematical research. Subsequently we compare the present work with the physics intuition and discuss directions for future research.

2.5.1 Mathematical work.

We regard Theorem 2.5 as the main result of the paper. The theorem confirms in great generality one of the key assumptions behind the cavity method and explains the success of Belief Propagation as a device for analyzing random regular factor graph models. Indeed, the existence of a Bethe state decomposition has been conjectured explicitly, e.g., by Mézard and Montanari [46, Chapter 19]; see also Dembo and Montanari [28].

In a prior paper [23] we constructed a Bethe state decomposition for random factor graph models of Erdős-Rényi type, where the constraint nodes independently choose k-tuples of adjacent variable nodes. While we will be able to use some of the general tools developed in that work, the main argument breaks down in the case of the Bethe lattice due to its rigid geometry. Indeed, the construction of the Bethe state decomposition hinges on coupling arguments involving, e.g., a coupling of a factor graph with n variable and m constraint nodes and another one with parameters \(n'\) and \(m'\) such that \(n=n'+O(1)\), \(m=m'+O(1)\). Due to the Poisson degree distribution and the Stein-Chen property, such arguments are fairly straightforward in the Erdős-Rényi case. One might say that the Erdős-Rényi graph resembles a gentle climbing wall with footholds supplied by the irregularity of the Poisson degree distribution. By contrast, the Bethe lattice with its regular degree makes for a smooth cliff. As a consequence, the Bethe lattice requires new ideas, leading to a rather subtle but ultimately elegant argument. The upshot is that this proof, which we present in Sect. 4, can be expected to generalize to other random graph models with given degrees. Apart from the appeal of such lattice-like models from a physics perspective, these models play a vital role, e.g., in coding theory, where a suitably chosen degree sequence is apt to greatly boost performance [56].

Similarly, the variational formula for the free energy provided by Theorem 2.7 is a generalization and adaptation of the formula established by Panchenko [52] for models of Erdős-Rényi type with spins \(\Omega =\{\pm 1\}\). Panchenko’s proof relies on two ingredients: an interpolation argument and a coupling argument. So does ours. But while the interpolation argument, an adaptation of the technique of Franz and Leone [35], goes through without too much trouble, the coupling argument does not. Once more the rigidity of the Bethe lattice poses substantial challenges that require subtle new arguments. A further, albeit relatively minor, extension is that the present work applies to relatively general models with two or more spins, subject only to the condition POS. A further similarity between Panchenko’s work and ours is the embedding of discrete Boltzmann distributions into a compact metric space, which enables us to pick convergent subsequences. While Panchenko resorts to the Aldous-Hoover representation, here we use the cut metric and the associated kernel space, which makes it convenient to link the combinatorial representation of the measures in terms of messages directly with the free energy formula. That the Aldous-Hoover representation is closely related to graph limits is, of course, a well known fact [31].

Furthermore, Bayati, Gamarnik and Tetali [14] applied the interpolation method to factor graph models, including ones with regular degrees, to establish the existence of the limit \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\varvec{G}})]\) in certain cases via a super-additivity argument. In the process they also used arguments based on ‘cavities’, i.e., the removal of a small but linear number of vertices from the graph; a similar trick was used in [22] as well. But here, particularly in the construction of the Bethe state decomposition, we need to tread much more carefully. In particular, while removal of a small linear number of vertices does not shift the free energy too much, here we can only afford the creation of a very small number of cavities in order to avoid a distortion of the Boltzmann distribution, an extremely volatile object.

Theorem 2.6, which expresses the free energy density in terms of the Bethe state decomposition, is a synthesis of Theorems 2.5 and 2.7. The proof shows that the free energy can be expressed in terms of a particular distribution on kernels \([0,1]^2\rightarrow \mathscr {P}(\Omega )\), namely the one that encodes the Bethe state decomposition of the random factor graph or, more specifically, the associated Belief Propagation messages. No corresponding result was previously known even in the conceptually simpler Erdős-Rényi case.

Apart from the interpolation method and coupling arguments, the proofs of Theorems 2.5–2.7 rely on some of the techniques that we developed in [10, 15, 19, 24], particularly the cut metric and its ramifications. The cut metric, which we apply to kernel representations of probability distributions, was originally developed in the context of the regularity method [36] and the theory of graph limits in combinatorics [44]. Here we use the cut metric and certain assorted results, such as the ‘pinning lemma’ (Lemma 3.15 below) from [19], as tools, e.g., in the construction of the Bethe state decomposition.

While the present paper is concerned with diluted models where each node has a (fixed) bounded number of neighbors, there is also a substantial literature on fully connected models. The prime example, of course, is the Sherrington-Kirkpatrick model. The monographs of Panchenko [54] and Talagrand [57] provide an overview of this literature. In particular, the TAP equations, the (simplified) fixed point equations that correspond to the Belief Propagation equations in the fully connected case, have been established in several cases [8].

In Sect. 7 we work out several applications of the general results to specific models, such as the spin glass model from Sect. 1. Pointers to related work on the specific problems can be found there.

2.5.2 The physics perspective.

The seminal work of Mézard and Parisi [48] marks the starting point of a substantial body of physics work. Highlights include the Survey Propagation algorithm and precise predictions on phase transitions, including satisfiability thresholds in combinatorial problems [41, 47, 49].

The results provided by Theorems 2.5–2.7 are perfectly in line with the physics predictions. But we should comment on a subtle point that is apt to cause confusion. Namely, it has been pointed out that within the replica symmetric phase of certain models the support of the Boltzmann distribution may decompose into an exponentially large number of tiny ‘clusters’ [41, 47], a phenomenon called ‘dynamic replica symmetry breaking’. Indeed, it has been conjectured that each of these tiny clusters induces a Bethe state [46]; for the special case of the random graph coloring problem, this can be verified rigorously [11]. At first glance this proliferation of Bethe states may appear to contradict Theorem 2.5, where the number of Bethe states is upper-bounded by an arbitrarily slowly growing function L(n). Yet the Bethe state decomposition is not unique, and despite the abundance of tiny clusters, \(\mu _{{\varvec{G}}}\) itself is replica symmetric (i.e., condition (2.10) holds for \(S= \Omega ^n\)) throughout the dynamic RSB phase. In effect, Theorem 2.5 would render just a single Bethe state that comprises all of the tiny clusters. By contrast, beyond the dynamic RSB phase, within the so-called condensed phase, Theorem 2.5 would yield a non-trivial decomposition. The existence of a condensed phase has been established rigorously in several examples [17, 19].

The variational formula for the free energy furnished by Theorem 2.7 is in line with the physics work, which does, however, provide additional clues as to the structure of the minimizer of the functional \(\mathscr {B}(\,\cdot \,)\). Specifically, three different scenarios are expected to occur, depending on the model and the choice of its parameters. First, the replica symmetric scenario with a single (or a bounded number of) Bethe states. Second, the so-called ‘one-step replica symmetry breaking’ scenario, where there is an unbounded number of ‘independent’ Bethe states. Third, the ‘full replica symmetry breaking’ scenario, where the Bethe states form a hierarchical structure; see [46] for a detailed discussion. Clearly, in order to better evaluate the variational formula it would be very valuable to establish this additional structural information rigorously; in the Erdős-Rényi case first attempts have been undertaken in [53].

2.6 Organization

In Sect. 3 we introduce the necessary pieces of notation and state some basic results that we will need. Then in Sect. 3.3 we revisit the cut metric. While much of what we need on this subject already appears in earlier papers, there are a few general preparations that we need to make and that we carry out in that section. Subsequently Sect. 4 deals with the proof of Theorem 2.5. In Sects. 5 and 6 we then prove Theorem 2.7 about the variational formula for the free energy. Sect. 6 also contains the proof of Theorem 2.6. Finally, in Sect. 7 we work through a few applications, including the spin glass and hard-core models from Sect. 1.

3 Preliminaries

3.1 Basics

For an integer \(\ell \ge 1\) we use the shorthand \([\ell ]=\{1,\ldots ,\ell \}\). Furthermore, the symbols \(O(\,\cdot \,),\Omega (\,\cdot \,),\ldots \) refer to the limit \(n\rightarrow \infty \) by default. To indicate asymptotics with respect to another variable K tending to infinity, we write \(O_K(\,\cdot \,),\Omega _K(\,\cdot \,)\), etc. Further, where set operations involve singletons, we usually omit braces. For instance, if \(x\in X\), then we just write \(X{\setminus } x\) rather than \(X{\setminus }\left\{ {x}\right\} \).

For a finite set \(\mathscr {X}\) we let \(\mathscr {P}(\mathscr {X})\) be the set of all probability distributions on \(\mathscr {X}\), endowed with the total variation distance. More generally, if \((\mathscr {X},\mathfrak {A})\) is a measurable space, then \(\mathscr {P}(\mathscr {X})=\mathscr {P}(\mathscr {X},\mathfrak {A})\) denotes the set of all probability measures on this space. Further, for probability measures \(\pi ,\pi '\in \mathscr {P}(\mathscr {X})\) we let \(\Gamma (\pi ,\pi ')\) be the set of all couplings of \(\pi ,\pi '\). Thus, \(\gamma \in \Gamma (\pi ,\pi ')\) is a probability distribution on \(\mathscr {X}\times \mathscr {X}\) with marginals \(\pi ,\pi '\).

Suppose that \(\mathscr {X}\) is a finite set, that \(n\ge 1\) is an integer and that \(\mu \in \mathscr {P}(\mathscr {X}^n)\). Then we denote by \(\varvec{\sigma }^\mu ,\varvec{\sigma }^{1,\mu },\varvec{\sigma }^{2,\mu },\ldots \) a sequence of independent samples from \(\mu \). We omit the superscript \(\mu \) where it is evident from the context. Further, if \(f:(\mathscr {X}^n)^\ell \rightarrow \mathbb {R}\) is a function, then we write \(\left\langle {{f(\varvec{\sigma }^1,\ldots ,\varvec{\sigma }^\ell )},{\mu }}\right\rangle \) for the expectation of f with respect to independent samples from \(\mu \); thus,

$$\begin{aligned} \left\langle {{f(\varvec{\sigma }^1,\ldots ,\varvec{\sigma }^\ell )},{\mu }}\right\rangle =\sum _{\sigma ^1,\ldots ,\sigma ^\ell \in \mathscr {X}^n} f(\sigma ^1,\ldots ,\sigma ^\ell )\prod _{i=1}^\ell \mu (\sigma ^i). \end{aligned}$$
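For a toy finite state space the bracket can be evaluated by exhaustive enumeration. The following sketch illustrates the definition; the helper name `bracket` and the dict representation of \(\mu \) are our own choices, not notation from the text.

```python
import itertools

# A brute-force sketch of the bracket <f(sigma^1,...,sigma^ell), mu> above;
# mu is represented as a dict mapping configurations (tuples) to probabilities.
def bracket(f, mu, ell):
    """Exact expectation of f over ell independent samples from mu."""
    total = 0.0
    for samples in itertools.product(mu.keys(), repeat=ell):
        weight = 1.0
        for sigma in samples:
            weight *= mu[sigma]
        total += f(*samples) * weight
    return total

# Example: the uniform distribution on {0,1}^2 and the overlap of two samples.
mu = {sigma: 0.25 for sigma in itertools.product([0, 1], repeat=2)}
overlap = lambda s, t: sum(si == ti for si, ti in zip(s, t))
print(bracket(overlap, mu, 2))  # two coordinates, each agreeing with prob. 1/2
```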

Suppose that \(\Omega ,V\ne \varnothing \) are finite sets. For a distribution \(\mu \in \mathscr {P}(\Omega ^V)\) and a set \(I\subset V\) we denote by \(\mu _I\) the joint distribution of the coordinates in I. That is,

$$\begin{aligned} \mu _I(\sigma )&=\sum _{\tau \in \Omega ^V}\varvec{1}\{\forall i\in I:\tau _i=\sigma _i\}\mu (\tau )&(I\subset V,\sigma \in \Omega ^I). \end{aligned}$$

For \(\sigma \in \Omega ^I\) we use the shorthand \(\mu (\sigma )=\mu _I(\sigma )\). Moreover, if \(I=\{i_1,\ldots ,i_l\}\) we usually write \(\mu _{i_1,\ldots ,i_l}\) instead of \(\mu _{\{i_1,\ldots ,i_l\}}\). Additionally, if \(I\subset V\) and \(\tau \in \Omega ^V\), then we let \(\tau _I=(\tau _i)_{i\in I}\) be the restriction of \(\tau \) to I.
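The marginalization \(\mu \mapsto \mu _I\) is a direct sum over configurations; a small sketch (helper names are ours) for a toy measure:

```python
from collections import defaultdict

# Sketch of the marginal mu_I from the sum above: collapse each configuration
# tau onto its restriction to the coordinates in I and accumulate the mass.
def marginal(mu, I):
    """Joint distribution of the coordinates in I under mu (dict: tuple -> prob)."""
    mu_I = defaultdict(float)
    for tau, prob in mu.items():
        mu_I[tuple(tau[i] for i in I)] += prob
    return dict(mu_I)

# Example: a perfectly correlated pair of spins on {0,1}^2.
mu = {(0, 0): 0.5, (1, 1): 0.5}
print(marginal(mu, [0]))     # {(0,): 0.5, (1,): 0.5}
print(marginal(mu, [0, 1]))  # recovers mu itself
```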

We keep the notation from Sect. 2; in particular, \(\Omega \) continues to denote a finite set of spins, p is a probability distribution on \(\Omega \), \(\Psi \) is a measurable space of functions \(\Omega ^k\rightarrow (0,1)\), and P is a probability distribution on \(\Psi \). In addition throughout the paper we denote by

$$\begin{aligned} \varvec{x}_i,{\hat{\varvec{x}}}_i,\varvec{s}_i,\varvec{x}_{i,j},\varvec{x}_{i,j}',\varvec{x}_{i,j}'',{\hat{\varvec{x}}}_{i,j}\qquad (i,j\ge 1) \end{aligned}$$

uniformly distributed random variables with values in [0, 1]. Additionally,

$$\begin{aligned} \varvec{\psi },\varvec{\psi }_i,\varvec{\psi }_{i,j},\varvec{\psi }_{i,j}',\varvec{\psi }_{i,j}'',{\hat{\varvec{\psi }}}_{i,j}\qquad (i,j\ge 1) \end{aligned}$$

denote elements of \(\Psi \) drawn from the distribution P. Further,

$$\begin{aligned} \varvec{h}_i,\varvec{h}_{i,j},\varvec{h}_i',{\hat{\varvec{h}}}_{i,j}\qquad (i,j\ge 1) \end{aligned}$$

are uniformly distributed random variables with values in [k]. All of the above random variables are mutually independent as well as independent of any other sources of randomness. These random variables yield random functions that will play an important role: for \(i\ge 1\) we let

$$\begin{aligned} \varvec{\varphi }_i&:\Omega ^{dk}\rightarrow \mathbb {R},&\sigma \mapsto \sum _{\chi \in \Omega }p(\chi )\prod _{j=1}^d\varvec{1}\{\sigma _{k(j-1)+\varvec{h}_{i,j}}=\chi \}\varvec{\psi }_{i,j}(\sigma ),\\ {\hat{\varvec{\varphi }}}_i&:\Omega ^{dk}\rightarrow \mathbb {R},&\sigma \mapsto \sum _{\chi \in \Omega } p(\chi )\prod _{j=1}^d\varvec{1}\{\sigma _{k(j-1)+{\hat{\varvec{h}}}_{i,j}}=\chi \}{\hat{\varvec{\psi }}}_{i,j}(\sigma ). \end{aligned}$$

3.2 Factor graphs

In Sect. 2 we already introduced the random factor graph model \({\varvec{G}}(n,d,p,P)\). To facilitate the proofs we need the following abstract definition.

Definition 3.1

Suppose that \((\mathscr {X},\mathfrak {A})\) is a measurable space. An \(\mathscr {X}\)-factor graph \(G=(V,F,(\partial a)_{a\in F},(\psi _a)_{a\in F},\mathfrak {p})\) consists of

  • a finite set V of variable nodes,

  • a finite set F of constraint nodes,

  • a set \(\partial a\subset V\) for each \(a\in F\),

  • a function \(\psi _a:\mathscr {X}^{\partial a}\rightarrow [0,\infty )\) for each \(a\in F\) and

  • a probability measure \(\mathfrak {p}\) on \(\mathscr {X}^V\), called the prior.

A factor graph induces a bipartite graph on \(V\cup F\), where \(v\in V\) is adjacent to \(a\in F\) iff \(v\in \partial a\). Accordingly, for a variable node v we let \(\partial v\subset F\) be the set of adjacent constraint nodes (i.e., \(a\in \partial v\) iff \(v\in \partial a\)). The bipartite graph defines a metric on \(V\cup F\), the shortest path metric. For a variable or constraint node u we let \(\partial ^\ell u=\partial ^\ell _Gu\) be the set of all nodes at distance precisely \(\ell \) from u. Moreover, \(\nabla ^\ell u=V\cap (u\cup \bigcup _{i\le \ell }\partial ^iu)\) denotes the set of all variable nodes at distance no more than \(\ell \) from u.

Further, for an assignment \(\sigma \in \mathscr {X}^V\) and \(a\in F\) we use the notation \(\psi _a(\sigma )=\psi _a(\sigma _{\partial a})\) and we define

$$\begin{aligned} \psi _G(\sigma )=\prod _{a\in F}\psi _a(\sigma _{\partial a})\qquad \text{ and }\qquad Z(G)=\int _{\mathscr {X}^V}\psi _G(\sigma ){\mathrm d}\mathfrak {p}(\sigma ). \end{aligned}$$

Provided that \(Z(G)>0\), we introduce a probability measure \(\mu _G\) on \(\mathscr {X}^V\), the Boltzmann distribution, by letting

$$\begin{aligned} {\mathrm d}\mu _G(\sigma )=\frac{\psi _G(\sigma )}{Z(G)}{\mathrm d}\mathfrak {p}(\sigma ). \end{aligned}$$
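For a finite \(\mathscr {X}=\Omega \) with a product prior, \(Z(G)\) and \(\mu _G\) reduce to finite sums and can be computed exactly on toy instances. The following sketch does this for a single Ising-type pair interaction; the helper names and the example weight function are ours, not the paper's.

```python
import itertools, math

# Exact Z(G) and Boltzmann distribution mu_G for a finite factor graph with
# Omega = {0, 1} and a product prior, by enumeration of Omega^n.
def boltzmann(n, factors, prior):
    """factors: list of (neighborhood tuple, psi); prior: dict omega -> p(omega).
    Returns (Z, mu) with mu a dict on Omega^n."""
    weights = {}
    for sigma in itertools.product(prior.keys(), repeat=n):
        w = 1.0
        for s in sigma:
            w *= prior[s]
        for neighborhood, psi in factors:
            w *= psi(tuple(sigma[v] for v in neighborhood))
        weights[sigma] = w
    Z = sum(weights.values())
    return Z, {sigma: w / Z for sigma, w in weights.items()}

# One edge {0, 1} with an Ising-type weight at inverse temperature beta.
beta = 0.5
psi = lambda s: math.exp(beta if s[0] == s[1] else -beta)
Z, mu = boltzmann(2, [((0, 1), psi)], {0: 0.5, 1: 0.5})
print(round(mu[(0, 0)] + mu[(1, 1)], 6))  # aligned mass e^beta/(e^beta + e^-beta)
```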

Mostly the factor graphs that we deal with will have a finite space \(\mathscr {X}=\Omega \) and the prior \(\mathfrak {p}\) will be the product measure \(\mathfrak {p}=\bigotimes _{v\in V}p_v\). In this case we introduce the standard messages given an event \(S\subset \Omega ^V\) as in Sect. 2: for a constraint node a and \(v\in \partial a\) we let

$$\begin{aligned} \mu _{G,v\rightarrow a}(\sigma \mid S)&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/\psi _a(\varvec{\sigma })},{\mu _{G}(\,\cdot \,|S)}}\right\rangle }{\left\langle {{1/\psi _a(\varvec{\sigma })},{\mu _{G}(\,\cdot \,|S)}}\right\rangle }&(\sigma \in \Omega ), \end{aligned}$$
(3.1)
$$\begin{aligned} \mu _{G,a\rightarrow v}(\sigma \mid S)&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/(p_v(\sigma )\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{G}(\,\cdot \,|S)}}\right\rangle }{\left\langle {{1/(p_v(\varvec{\sigma }_v)\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{G}(\,\cdot \,|S)}}\right\rangle }&(\sigma \in \Omega ). \end{aligned}$$
(3.2)

In the case that \(S=\Omega ^V\) is the entire phase space, we omit the conditioning from the notation and just write \(\mu _{G,v\rightarrow a}\) and \(\mu _{G,a\rightarrow v}\), respectively.
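The message (3.1) can be evaluated by brute force on small instances: one reweights the Boltzmann distribution by \(1/\psi _a\) and reads off the marginal of v. The sketch below (our own helper names, exact enumeration) does this for the single-edge toy graph; dividing out the only constraint leaves the uniform prior, so the message is uniform.

```python
import itertools, math

# Brute-force evaluation of the variable-to-constraint message (3.1) with
# S the entire phase space, for a factor graph given by its Boltzmann law.
def message_v_to_a(mu, psi, neighborhood, v, omega_values):
    """mu: Boltzmann distribution (dict sigma -> prob); psi: the factor of a;
    neighborhood: the variables of a; v: a variable in the neighborhood."""
    num = {w: 0.0 for w in omega_values}
    den = 0.0
    for sigma, prob in mu.items():
        reweighted = prob / psi(tuple(sigma[u] for u in neighborhood))
        num[sigma[v]] += reweighted
        den += reweighted
    return {w: num[w] / den for w in omega_values}

beta = 0.5
psi = lambda s: math.exp(beta if s[0] == s[1] else -beta)
# Boltzmann distribution of the single-edge graph, by enumeration.
weights = {s: 0.25 * psi(s) for s in itertools.product([0, 1], repeat=2)}
Z = sum(weights.values())
mu = {s: p / Z for s, p in weights.items()}
print(message_v_to_a(mu, psi, (0, 1), 0, [0, 1]))  # {0: 0.5, 1: 0.5}
```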

3.3 The cut metric revisited

The cut metric, defined in (2.12), plays a key role in the proofs of the main results. In this section we summarize a few basic facts about the cut metric. Although some have been proved in prior work, we will need to provide a few extensions and adaptations for our purposes. In addition to the continuous version from (2.12), we also need a discrete version of the cut metric, which we present in Sect. 3.3.2.

3.3.1 The continuous cut metric.

Recall that \(\mathscr {K}\) denotes the space of all measurable maps \([0,1]^2\rightarrow \mathscr {P}(\Omega )\), up to equality almost everywhere; we call such maps strong kernels. The cut distance (2.12) induces a pre-metric on this space [39]. Moreover, on the space \(\mathfrak {K}\) obtained by identifying points at cut distance 0 the cut distance yields a metric. The elements of \(\mathfrak {K}\) are called weak kernels. We drop the attribute and just speak of kernels where there is no danger of confusion.

Proposition 3.2

([24]). Endowed with the cut distance, \(\mathfrak {K}\) is a compact Polish space.

We continue to write \(\mathfrak {D}\) for the space of all probability measures on \(\mathfrak {K}\). This space is endowed with the weak topology. Since \(\mathfrak {K}\) is a compact Polish space, so is \(\mathfrak {D}\). Hence, there is a natural metric on \(\mathfrak {D}\) that induces the weak topology, the \(L_1\)-Wasserstein metric. We take license to denote this metric by \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) as well. Thus, recalling that \(\Gamma (\pi ,\pi ')\) is the set of all couplings of \(\pi ,\pi '\in \mathfrak {D}\), we have

$$\begin{aligned} \mathscr {D}_{\Box }(\pi ,\pi ')&=\inf \left\{ {\int _{\mathfrak {K}\times \mathfrak {K}}\mathscr {D}_{\Box }(\mu ,\mu ')\,{\mathrm d}\gamma (\mu ,\mu '): \gamma \in \Gamma (\pi ,\pi ')}\right\} . \end{aligned}$$

For \(\pi \in \mathfrak {D}\) we let \(\varvec{\mu }^\pi \in \mathfrak {K}\) denote a sample. We just write \(\varvec{\mu }\) where \(\pi \) is apparent.

By comparison to other metrics on the space of measurable functions \([0,1]^2\rightarrow \mathscr {P}(\Omega )\) the cut metric is extremely weak; this is highlighted by the compactness of the space \(\mathfrak {K}\) provided by Proposition 3.2. Yet the cut metric is sufficiently strong to ensure that certain functions that will be of vital interest to us are continuous. Indeed, suppose that \(m,n>0\) are integers and that \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\) is a function. Then for \(\mu \in \mathscr {K}\) we define the random variable

$$\begin{aligned} \left\langle {{f},{\mu }}\right\rangle&=\sum _{\sigma \in \Omega ^{m\times n}}f(\sigma ) \int _0^1\cdots \int _0^1\prod _{i=1}^m\prod _{j=1}^n\mu _{s_i,\varvec{x}_{j}}(\sigma _{i,j}){\mathrm d}s_1\cdots {\mathrm d}s_m. \end{aligned}$$

Because we average out the \(s_i\) and the \(\varvec{x}_j\) are uniform, the random variables \(\left\langle {{f},{\mu }}\right\rangle \) and \(\left\langle {{f},{\nu }}\right\rangle \) are identically distributed if \(\mathscr {D}_{\Box }(\mu ,\nu )=0\). Thus, we may safely write \(\left\langle {{f},{\mu }}\right\rangle \) for \(\mu \in \mathfrak {K}\).

Lemma 3.3

For any \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\), \(\ell \ge 1\) the map \(\mu \in \mathfrak {K}\mapsto \mathbb {E}\left[{\left\langle {{f},{\mu }}\right\rangle ^\ell }\right]\) is continuous with respect to the cut metric.

The proof of Lemma 3.3 can be found in the appendix. For a probability distribution \(\pi \in \mathfrak {D}\) we let \(\left\langle {{f},{\pi }}\right\rangle \) be the random variable \(\left\langle {{f},{\varvec{\mu }^{\pi }}}\right\rangle \), with \(\varvec{\mu }^{\pi }\) chosen independently of the \(\varvec{x}_i\). Since \(\mathfrak {D}\) carries the weak topology, Lemma 3.3 implies

Corollary 3.4

For any \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\), \(\ell \ge 1\) the map \(\pi \in \mathfrak {D}\mapsto \mathbb {E}\left[{\left\langle {{f},{\pi }}\right\rangle ^\ell }\right]\) is continuous.

We recall the functional \(\mathscr {B}(\,\cdot \,)\) from (2.15).

Corollary 3.5

The map \(\pi \in \mathfrak {D}\mapsto \mathscr {B}(\pi )\) is continuous.

Proof

Thanks to the tail bound (2.1), we can approximate the logarithms in (2.15) by polynomials. Therefore, the assertion follows from Corollary 3.4. \(\quad \square \)

The set \(\mathfrak {D}^\star \) of \(\pi \in \mathfrak {D}\) that are invariant under the \(*(N,M)\)-operation is a closed subset of \(\mathfrak {D}\). To see this, and to interpret the \(*(N,M)\)-operation nicely in terms of operations that are continuous under the cut metric, we introduce the following general transformation. Suppose that \(f:\Omega ^N\rightarrow (0,\infty )\) is a function and that \(\mu \in \mathscr {K}\). Then we define a random \(f*\mu \in \mathscr {K}\) as follows. Letting

$$\begin{aligned} \varvec{z}_s&=\varvec{z}_s^{f,\mu }=\sum _{\sigma \in \Omega ^N}f(\sigma )\prod _{i=1}^N\mu _{s,{\hat{\varvec{x}_i}}}(\sigma _i),&\varvec{z}&=\varvec{z}^{f,\mu }=\int _0^1\varvec{z}_s^{f,\mu }{\mathrm d}s, \end{aligned}$$

we introduce

$$\begin{aligned} \varvec{t}&=\varvec{t}_s^{f,\mu }=\inf \left\{ {\theta \in [0,1]:\int _0^\theta \varvec{z}_u{\mathrm d}u\ge s\varvec{z}}\right\} . \end{aligned}$$

Now we define \((f*\mu )_{s,x}=\mu _{\varvec{t}_s^{f,\mu },x}\). We emphasize that \(f*\mu \in \mathscr {K}\) is random, dependent on \({\hat{\varvec{x}}}_1,\ldots ,{\hat{\varvec{x}}}_N\). The kernel is characterized by the identity

$$\begin{aligned} \int _0^1\int _0^1 g_{s,x}\cdot f*\mu _{s,x}(\omega ){\mathrm d}s{\mathrm d}x&= \frac{1}{\varvec{z}}\sum _{\sigma \in \Omega ^N}f(\sigma )\int _0^1\int _0^1g_{s,x}\cdot \mu _{s,x}(\omega )\varvec{z}_s{\mathrm d}s{\mathrm d}x \\&\quad \text{ for } \text{ all } g:[0,1]^2\rightarrow [0,1],\,\omega \in \Omega . \end{aligned}$$

Further, since the \({\hat{\varvec{x}_i}}\) are uniform, we have \(\mathscr {D}_{\Box }(f*\mu ,f*\nu )=0\) if \(\mathscr {D}_{\Box }(\mu ,\nu )=0\). Hence, the \(*\)-operation extends to weak kernels. Furthermore, for a distribution \(\pi \) we let \(f*\pi \) be the distribution of \(f*\varvec{\mu }^\pi \).
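For intuition, the \(*\)-operation can be discretized: for a kernel that is constant in x and piecewise constant in s with rows of equal length, the characterizing identity amounts to size-biasing the rows by their partition-function weights \(\varvec{z}_s\). The sketch below is our own simplification (the actual construction reorders s via the inverse CDF, which agrees with this description up to a measure-preserving rearrangement).

```python
import itertools

# A discrete sketch of the *-operation: the kernel is constant in x and
# piecewise constant in s, row r carrying a distribution mu_r on Omega.
# Each row's s-interval is rescaled in proportion to its weight z_r.
def star(f, rows, N, omega_values):
    z = []
    for mu_r in rows:
        z_r = 0.0
        for sigma in itertools.product(omega_values, repeat=N):
            w = f(sigma)
            for s in sigma:
                w *= mu_r[s]
            z_r += w
        z.append(z_r)
    total = sum(z)
    # f * mu keeps the rows but resizes their s-intervals to length z_r/total.
    return [(mu_r, z_r / total) for mu_r, z_r in zip(rows, z)]

# Two rows: one concentrated on spin 1, one uniform; f rewards all-ones.
rows = [{0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.5}]
f = lambda sigma: 1.0 if all(s == 1 for s in sigma) else 0.1
for mu_r, length in star(f, rows, 2, [0, 1]):
    print(mu_r, round(length, 4))
```

The first row gains weight because it always produces the rewarded all-ones configuration.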

Lemma 3.6

For any function \(f:\Omega ^k\rightarrow (0,\infty )\) the map \(\mathfrak {K}\rightarrow \mathfrak {D}\), \(\mu \mapsto f*\mu \) is continuous.

The proof of Lemma 3.6 can be found in the appendix.

The \(*(N,M)\)-operation is an application of the above \(*\)-operation to a particular random function f. To define this random function, we need one more piece of notation. Namely, for functions \(f:\Omega ^{M\times N}\rightarrow \mathbb {R}\), \(g:\Omega ^{M\times L}\rightarrow \mathbb {R}\) we define

$$\begin{aligned} f\oplus g&:\Omega ^{M\times (N+L)}\rightarrow \mathbb {R},&\sigma&\mapsto f\left( {(\sigma _{i,j})_{i\in [M],j\in [N]}}\right) \cdot g\left( {(\sigma _{i,j+N})_{i\in [M],j\in [L]}}\right) . \end{aligned}$$

In words, we stick the first N ‘columns’ of \(\sigma \) into f and the last L columns into g and multiply the results. Recalling the random functions \({\hat{\varvec{\psi }_i}}\), \({\hat{\varvec{\varphi }_i}}\) from Sect. 3.1, we obtain the following.

Lemma 3.7

For any \(\mu \in \mathfrak {K}\) the random \(\mu ^{*(N,M)}\in \mathfrak {K}\) is distributed as \(\left( {\bigoplus _{i=1}^N{\hat{\varvec{\varphi }}}_i\oplus \bigoplus _{i=1}^M{\hat{\varvec{\psi }_i}}}\right) *\mu .\)

Proof

This is immediate from the construction of \(\mu ^{*(N,M)}\). \(\quad \square \)

Corollary 3.8

For any \(N,M\) the map \(\mu \in \mathfrak {K}\mapsto \mu ^{*(N,M)}\) is continuous with respect to the cut metric.

Proof

Since \(\mathfrak {D}\) carries the weak topology, which is induced by the Wasserstein metric, the assertion follows from Lemmas 3.6, 3.7 and (2.1). \(\quad \square \)

As a further immediate consequence of Lemma 3.7 we obtain

Corollary 3.9

A distribution \(\pi \in \mathfrak {D}\) belongs to \(\mathfrak {D}^\star \) if and only if \(\pi =\left( {\bigoplus _{i=1}^N{\hat{\varvec{\varphi }}}_i\oplus \bigoplus _{i=1}^M{\hat{\varvec{\psi }}}_i}\right) *\pi \) for all \(N,M\).

In particular, Corollaries 3.8 and 3.9 imply that the map \(\pi \mapsto \pi ^{*(N,M)}\) is continuous for all \(N,M\ge 0\). Consequently, \(\mathfrak {D}^\star \) is a closed subset of \(\mathfrak {D}\).

3.3.2 The discrete version.

Apart from the ‘continuous’ installment of the cut metric, defined on kernels, we also need a discrete variant, defined on probability measures on discrete sets. To be precise, with \(\Omega \ne \varnothing \) our finite set of spins and V another finite set of size \(n\ge 1\), we define a metric \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) on \(\mathscr {P}(\Omega ^V)\) as follows. Recalling that \(\Gamma (\mu ,\nu )\) is the set of all couplings of probability measures \(\mu ,\nu \) on \(\Omega ^V\), we let

$$\begin{aligned} \Delta _{\Box }(\mu ,\nu )&=\frac{1}{n}\min _{\gamma \in \Gamma (\mu ,\nu )}\max _{\begin{array}{c} I\subset V\\ B\subset \Omega ^V\times \Omega ^V\\ \omega \in \Omega \end{array}}\left| {\sum _{i\in I}\sum _{(\sigma ,\tau )\in B}\gamma (\sigma ,\tau ) (\varvec{1}\{\sigma _i=\omega \}-\varvec{1}\{\tau _i=\omega \})}\right| \nonumber \\&\quad \text{ for } \mu ,\nu \in \mathscr {P}(\Omega ^V). \end{aligned}$$
(3.3)

Fact 3.10

([23]). \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) is a metric on \(\mathscr {P}(\Omega ^V)\).

We refer to \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) as the discrete cut metric.

Suppose that V is a finite set. A measure \(\mu \in \mathscr {P}(\Omega ^V)\) can be represented by a point \({\dot{\mu }}\in \mathscr {K}\). Indeed, assume without loss that \(V=[n]\) and that \(\Omega =[q]\). Then the set \(\Omega ^V\) can be ordered lexicographically as \(\sigma ^{(1)},\ldots ,\sigma ^{(q^n)}\). We define \({\dot{\mu }}\in \mathscr {K}\) by letting

$$\begin{aligned} {\dot{\mu }}_{s,x}&=\sum _{i=1}^n\sum _{j=1}^{q^n}\varvec{1}\{(i-1)/n\le x<i/n\} \varvec{1}\left\{ {\sum _{h<j}\mu (\sigma ^{(h)})\le s<\sum _{h\le j}\mu (\sigma ^{(h)})}\right\} \delta _{\sigma ^{(j)}_i}. \end{aligned}$$

Comparing (3.3) with the definition (2.12) of the continuous cut metric, we see that

$$\begin{aligned} \mathscr {D}_{\Box }({\dot{\mu }},{\dot{\nu }})&\le \Delta _{\Box }(\mu ,\nu )&(\mu ,\nu \in \mathscr {P}(\Omega ^V)). \end{aligned}$$
(3.4)
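The step kernel \({\dot{\mu }}\) can be evaluated pointwise on toy examples: x selects a coordinate block and s selects a configuration via the cumulative distribution. A small sketch (the helper name `dot_mu` is ours):

```python
import itertools

# Pointwise evaluation of the step kernel dot(mu), for Omega = [q] identified
# with {0, ..., q-1} and configurations ordered lexicographically.
def dot_mu(mu_list, n, q, s, x):
    """mu_list: probabilities of sigma^(1), ..., sigma^(q^n) in lexicographic
    order. Returns the spin on which the Dirac measure dot(mu)_{s,x} sits."""
    configs = list(itertools.product(range(q), repeat=n))
    i = min(int(x * n), n - 1)  # the coordinate block containing x
    cum = 0.0
    for sigma, p in zip(configs, mu_list):
        if cum <= s < cum + p:
            return sigma[i]
        cum += p
    return configs[-1][i]       # s = 1 edge case

# Perfectly correlated pair on {0,1}^2: mu(00) = mu(11) = 1/2,
# lexicographic order 00, 01, 10, 11.
mu_list = [0.5, 0.0, 0.0, 0.5]
print(dot_mu(mu_list, 2, 2, 0.25, 0.1))  # s < 1/2 selects sigma = 00, spin 0
print(dot_mu(mu_list, 2, 2, 0.75, 0.9))  # s >= 1/2 selects sigma = 11, spin 1
```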

The discrete cut metric encodes a great deal of information about the discrete measures. A particularly important case occurs when a measure \(\mu \in \mathscr {P}(\Omega ^V)\) is close to a product measure. To be precise, we say that \(\mu \) is \(\varepsilon \)-extremal if \(\Delta _{\Box }(\mu ,\bigotimes _{v\in V}\mu _v)<\varepsilon \). In words, \(\mu \) is close to the product measure with the same marginals. In addition, \(\mu \in \mathscr {P}(\Omega ^V)\) is \((\varepsilon ,\ell )\)-symmetric if

$$\begin{aligned} \frac{1}{|V|^\ell }\sum _{v_1,\ldots ,v_\ell \in V}\left\| {\mu _{v_1,\ldots ,v_\ell }- \mu _{v_1}\otimes \cdots \otimes \mu _{v_\ell }}\right\| _{\mathrm {TV}}&<\varepsilon . \end{aligned}$$
(3.5)

Informally, if we choose \(\ell \) coordinates randomly, then their joint distribution typically ‘nearly’ factorizes. The following statement shows that these concepts are essentially equivalent, up to a moderate loss in the parameters.

Proposition 3.11

([23]). For any \(\Omega \) of size \(1<\left| {\Omega }\right| <\infty \), any \(0<\varepsilon <1/2\) and any \(\ell \ge 2\) there exists \(n_0>0\) such that for all \(n>n_0\) and all \(\mu \in \mathscr {P}(\Omega ^V)\) the following two statements hold.

  1. (i)

    If \(\mu \) is \((\varepsilon /9)^3\)-symmetric, then \(\mu \) is \(\varepsilon \)-extremal.

  2. (ii)

    If \(\mu \) is \(\varepsilon ^3/(128|\Omega |)^{4\ell }\)-extremal, then \(\mu \) is \((\varepsilon ,\ell )\)-symmetric.
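Both notions can be checked by brute force for tiny n. The sketch below (our own helper) computes the \(\ell =2\) defect from (3.5); note that the average there runs over all ordered pairs including \(v_1=v_2\), so even a product measure has a defect of order \(1/n\).

```python
import itertools

# Brute-force (epsilon, 2)-symmetry defect from (3.5), exact for tiny n.
def symmetry_defect(mu, n, omega_values):
    def marg(I):
        out = {}
        for tau, p in mu.items():
            key = tuple(tau[i] for i in I)
            out[key] = out.get(key, 0.0) + p
        return out
    total = 0.0
    for v1, v2 in itertools.product(range(n), repeat=2):
        joint, m1, m2 = marg([v1, v2]), marg([v1]), marg([v2])
        tv = 0.5 * sum(abs(joint.get((a, b), 0.0) - m1[(a,)] * m2[(b,)])
                       for a in omega_values for b in omega_values)
        total += tv
    return total / n ** 2

product_mu = {s: 0.25 for s in itertools.product([0, 1], repeat=2)}
paired_mu = {(0, 0): 0.5, (1, 1): 0.5}
print(symmetry_defect(product_mu, 2, [0, 1]))  # 0.25, entirely from v1 = v2
print(symmetry_defect(paired_mu, 2, [0, 1]))   # 0.5, every pair is correlated
```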

It is an elementary observation that probability measures that are close in the discrete cut metric cannot have very different marginals. Formally, we have the following.

Lemma 3.12

For any two probability measures \(\mu ,\nu \) on \(\Omega ^V\) with \(|V|=n\) we have \(\sum _{v\in V}\Vert {\mu _v-\nu _v}\Vert _{\mathrm {TV}}\le 2|\Omega |\,n\,\Delta _{\Box }(\mu ,\nu ).\)

Proof

There exists \(\omega \in \Omega \) such that

$$\begin{aligned} \sum _{v\in V}(\mu _v(\omega )-\nu _v(\omega ))\vee 0\ge \frac{1}{2|\Omega |}\sum _{v\in V}\Vert {\mu _v-\nu _v}\Vert _{\mathrm {TV}}. \end{aligned}$$
(3.6)

Let \(I=\left\{ {i\in V:\mu _i(\omega )\ge \nu _i(\omega )}\right\} \) and \(B=\Omega ^V\times \Omega ^V\). Then for any coupling \(\gamma \in \Gamma (\mu ,\nu )\),

$$\begin{aligned} \sum _{i\in I}\sum _{(\sigma ,\tau )\in B}\gamma (\sigma ,\tau )\left( {\varvec{1}\{\sigma _i=\omega \}-\varvec{1}\{\tau _i=\omega \}}\right)&=\sum _{i\in V}(\mu _i(\omega )-\nu _i(\omega ))\vee 0. \end{aligned}$$
(3.7)

Combining (3.6) and (3.7) completes the proof. \(\quad \square \)

The converse bound, that close marginals imply closeness in the cut metric, holds for extremal measures.

Lemma 3.13

For any two \(\varepsilon \)-extremal \(\mu ,\nu \in \mathscr {P}(\Omega ^V)\) with \(|V|=n\) we have \(\Delta _{\Box }(\mu ,\nu )\le 2\varepsilon +\frac{1}{n}\sum _{v\in V}\Vert {\mu _v-\nu _v}\Vert _{\mathrm {TV}}.\)

Proof

Assume without loss that \(V=[n]\) and let \({\bar{\mu }}=\bigotimes _{i=1}^n\mu _i\) and \({\bar{\nu }}=\bigotimes _{i=1}^n\nu _i\). Since \(\mu ,\nu \) are \(\varepsilon \)-extremal, we have

$$\begin{aligned} \Delta _{\Box }(\mu ,{\bar{\mu }})&<\varepsilon ,&\Delta _{\Box }(\nu ,{\bar{\nu }})&<\varepsilon . \end{aligned}$$
(3.8)

Let \(\gamma _i\in \mathscr {P}(\Omega \times \Omega )\) be an optimal coupling of \(\mu _i,\nu _i\), i.e., \(\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}=\sum _{\sigma \ne \tau }\gamma _i(\sigma ,\tau )\). Then \(\gamma =\bigotimes _{i=1}^n\gamma _i\) is a coupling of \({\bar{\mu }},{\bar{\nu }}\). Further, for any \(I\subset [n],B\subset \Omega ^n\times \Omega ^n,\omega \in \Omega \) we have

$$\begin{aligned}&\left| {\sum _{i\in I}\sum _{(\sigma ,\tau )\in B}\gamma (\sigma ,\tau )\left( {\varvec{1}\{\sigma _i=\omega \}-\varvec{1}\{\tau _i=\omega \}}\right) }\right| \\&\quad \le \sum _{i\in I}\sum _{(\sigma ,\tau )\in B}\gamma (\sigma ,\tau )\varvec{1}\{\sigma _i\ne \tau _i\}\le \sum _{i=1}^n\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}. \end{aligned}$$

Hence, by the normalization in (3.3), \(\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le \frac{1}{n}\sum _{i=1}^n\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}\), and thus the assertion follows from (3.8) and the triangle inequality. \(\quad \square \)

We also make a note of the following enhanced triangle inequality.

Lemma 3.14

([23]). Suppose that \(\mu ^{(1)},\nu ^{(1)},\ldots ,\mu ^{(\ell )},\nu ^{(\ell )}\) are probability measures on \(\Omega ^V\) and that \(u_1,\ldots ,u_\ell \ge 0\) are numbers such that \(\sum _{i=1}^\ell u_i=1\). Then

$$\begin{aligned} \Delta _{\Box }\left( {\sum _{i=1}^\ell u_i\mu ^{(i)},\sum _{i=1}^\ell u_i\nu ^{(i)}}\right)&\le \sum _{i=1}^\ell u_i\Delta _{\Box }(\mu ^{(i)},\nu ^{(i)}). \end{aligned}$$

Finally, we come to an important fact, intimately related to the Szemerédi regularity lemma from combinatorics. Namely, any probability distribution \(\mu \in \mathscr {P}(\Omega ^V)\) is close in the cut metric to a mixture of a ‘small’ number of product measures. To state this result precisely, suppose that \(I\subset V\) and that \(\sigma \in \Omega ^I\). Let

$$\begin{aligned} S^{I,\sigma }=\left\{ {\tau \in \Omega ^V:\tau _I=\sigma }\right\} \end{aligned}$$

be the sub-cube of \(\Omega ^V\) where the entries of the coordinates in I coincide with \(\sigma \). Further, assuming that \(\mu \in \mathscr {P}(\Omega ^V)\) and \(\mu (S^{I,\sigma })>0\), we let

$$\begin{aligned} \mu ^{I,\sigma }=\mu [\,\cdot \,|S^{I,\sigma }] \end{aligned}$$
(3.9)

be the corresponding conditional distribution of \(\mu \). (If \(\mu (S^{I,\sigma })=0\), then we agree that \(\mu ^{I,\sigma }\) is the uniform distribution on \(S^{I,\sigma }\).) The following key lemma shows that \(\mu ^{I,\sigma }\) is likely \(\varepsilon \)-symmetric for suitably random \(I,\sigma \).

Lemma 3.15

([19, Lemma 3.5]). For any set \(\Omega \) of size \(1<|\Omega |<\infty \) and any \(\varepsilon >0\) there exist \(n_0>0\) and a random variable \(0<\varvec{\theta }\le 2\varepsilon ^{-4}\log |\Omega |\) such that for all \(n>n_0\) and all \(\mu \in \mathscr {P}(\Omega ^V)\) the following holds. Let \(\varvec{I}\subset V\) be a uniformly random subset of size \(\varvec{\theta }\) and choose \(\varvec{\sigma }\in \Omega ^{\varvec{I}}\) from \(\mu _{\varvec{I}}\). Then \(\mathbb {P}\left[{\mu ^{\varvec{I},\varvec{\sigma }} \text{ is } \varepsilon \text{-symmetric }}\right]>1-\varepsilon .\)

We can apply Lemma 3.15 multiple times to obtain a decomposition of the set \(\Omega ^V\) into sub-cubes \(S_1,\ldots ,S_\ell \) such that \(\mu [\,\cdot \,|S_i]\) is \(\varepsilon \)-symmetric. To obtain these sub-cubes we just choose the set \(\varvec{I}\) randomly as in Lemma 3.15 and let \(\sigma \) range over all \(|\Omega |^{\varvec{\theta }}\) possible assignments of \(\varvec{I}\). We then obtain the following version of the regularity lemma.

Corollary 3.16

([10]). For any finite set \(\Omega \ne \varnothing \) and any \(\varepsilon >0\) there exist \(L,n_0\) such that for all \(n>n_0\) the following is true. For any \(\mu \in \mathscr {P}(\Omega ^V)\) there exists a partition of \(\Omega ^V\) into pairwise disjoint sets \(S_0,\ldots ,S_\ell \), \(\ell \le L\), such that \(\mu (S_0)<\varepsilon \) and such that \(\mu (\,\cdot \,|S_i)\) is \(\varepsilon \)-symmetric for each \(1\le i\le \ell \).

3.3.3 Contiguity.

Suppose that \(\Omega \ne \varnothing \) is a finite set and let \(c\ge 1\). A probability distribution \(\nu \) on \(\Omega ^n\) is c-contiguous with respect to another probability distribution \(\mu \) if

$$\begin{aligned} \nu (\sigma )&\le c\mu (\sigma )&\text{ for } \text{ all } \sigma \in \Omega ^n. \end{aligned}$$

Moreover, \(\mu ,\nu \) are mutually c-contiguous if each is c-contiguous with respect to the other.
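The optimal contiguity constant is simply the largest ratio of point masses in either direction; a one-line check on a toy pair of measures (our own example):

```python
# Mutual c-contiguity for two toy measures on {0,1}: the smallest admissible c
# is the maximum of the likelihood ratios nu/mu and mu/nu over all points.
mu = {(0,): 0.5, (1,): 0.5}
nu = {(0,): 0.25, (1,): 0.75}
c = max(max(nu[s] / mu[s] for s in mu), max(mu[s] / nu[s] for s in mu))
print(c)  # 2.0: mu and nu are mutually 2-contiguous, and 2 is optimal
```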

Lemma 3.17

For any \(c\ge 1,\delta >0\) there exists \(\varepsilon >0\) such that for all large enough n the following is true. Assume that \(\mu \in \mathscr {P}(\Omega ^n)\) is \(\varepsilon \)-extremal and that \(\nu \) is c-contiguous with respect to \(\mu \). Then \(\nu \) is \(\delta \)-extremal and \(\Delta _{\Box }(\mu ,\nu )<\delta \).

Proof

Choose \(0<\varepsilon \ll \eta \ll \zeta \ll \delta \) and assume that \(n>n_0(\varepsilon )\) is sufficiently large and that \(\mu \) is \(\varepsilon \)-extremal. Applying Corollary 3.16 to the measure \(\nu \), we obtain a partition \(S_0,S_1,\ldots ,S_\ell \) of the cube \(\Omega ^n\) into pairwise disjoint sets such that \(\nu (S_0)<\eta \) and such that \(\nu (\,\cdot \,|S_i)\) is \(\eta \)-symmetric for every \(i=1,\ldots ,\ell \). Moreover, \(\ell \) is bounded by a number \(L(\eta ,\Omega )>0\) that depends on \(\eta \) and \(|\Omega |\) only.

Suppose that for every \(1\le i\le \ell \) with \(\nu (S_i)\ge \eta /\ell \) we have

$$\begin{aligned} \sum _{j=1}^n\Vert {\nu _j(\,\cdot \,|S_i)-\mu _j}\Vert _{\mathrm {TV}}\le \zeta n. \end{aligned}$$
(3.10)

Then Lemma 3.13 yields \(\Delta _{\Box }(\nu (\,\cdot \,|S_i),\mu )\le 2\eta +\zeta \). Hence, Lemma 3.14 shows that

$$\begin{aligned} \Delta _{\Box }(\nu ,\mu )\le 4\eta +\zeta <\delta . \end{aligned}$$
(3.11)

Further, (3.10) implies that \(\sum _{j=1}^n\Vert {\nu _j-\mu _j}\Vert _{\mathrm {TV}}\le (\zeta +2\eta )n\). Hence, letting \({\bar{\mu }}=\bigotimes _{i=1}^n\mu _i\), \({\bar{\nu }}=\bigotimes _{i=1}^n\nu _i\) and applying Lemma 3.13 a second time, we obtain \(\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le \zeta +2\eta \). Thus, invoking the \(\varepsilon \)-extremality of \(\mu \) and (3.11), we conclude that

$$\begin{aligned} \Delta _{\Box }(\nu ,{\bar{\nu }})\le \Delta _{\Box }(\nu ,\mu )+\Delta _{\Box }(\mu ,{\bar{\mu }})+\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le (4\eta +\zeta )+\varepsilon +(\zeta +2\eta )<\delta . \end{aligned}$$
(3.12)

In summary, if (3.10) is satisfied, then (3.11) and (3.12) yield \(\Delta _{\Box }(\nu ,\mu )<\delta \) and \(\Delta _{\Box }(\nu ,{\bar{\nu }})<\delta \), as claimed.

Thus, we are left to establish (3.10). Assume for contradiction that there is \(1\le i\le \ell \) with \(\nu (S_i)\ge \eta /\ell \) and \(\sum _{j=1}^n\Vert {\nu _j(\,\cdot \,|S_i)-\mu _j}\Vert _{\mathrm {TV}}>\zeta n\). Then there exist \(J\subset [n]\) and \(\omega \in \Omega \) such that \(\sum _{j\in J}\left( \nu _j(\omega |S_i)-\mu _j(\omega )\right) >\zeta n/\left( {2|\Omega |}\right) \). In other words, the random variable \(X(\sigma )=\sum _{j\in J}\varvec{1}\{\sigma _j=\omega \}\) satisfies

$$\begin{aligned} \left\langle {{X},{\nu (\,\cdot \,|S_i)}}\right\rangle -\left\langle {{X},{\mu }}\right\rangle >\zeta n/\left( {2|\Omega |}\right) . \end{aligned}$$
(3.13)

Due to the \(\eta \)-symmetry of \(\nu (\,\cdot \,|S_i)\) and the \(\varepsilon \)-symmetry of \(\mu \), the second moments work out as

$$\begin{aligned} \left\langle {{X(X-1)},{\nu (\,\cdot \,|S_i)}}\right\rangle&=\sum _{j,j'\in J:j\ne j'}\left\langle {{\varvec{1}\{\varvec{\sigma }_j=\varvec{\sigma }_{j'}=\omega \}},{\nu (\,\cdot \,|S_i)}}\right\rangle \le \eta n+\left\langle {{X},{\nu (\,\cdot \,|S_i)}}\right\rangle ^2,\nonumber \\ \left\langle {{X(X-1)},{\mu }}\right\rangle&\le \varepsilon n+\left\langle {{X},{\mu }}\right\rangle ^2. \end{aligned}$$
(3.14)

Combining (3.13) and (3.14) with Chebyshev’s inequality and keeping in mind that \(\varepsilon \ll \eta \ll \zeta \), we conclude that the event \(B=\left\{ {X(\varvec{\sigma })\ge \left\langle {{X},{\mu }}\right\rangle +\zeta n/(4|\Omega |)}\right\} \) satisfies

$$\begin{aligned} \nu (B|S_i)&\ge 3/4,&\mu (B)&\le \varepsilon ^{1/4}. \end{aligned}$$
(3.15)

However, since \(\nu \) is c-contiguous with respect to \(\mu \), (3.15) yields

$$\begin{aligned} c\varepsilon ^{1/4}\ge c\mu (B)&\ge \nu (B)\ge \nu (B|S_i)\nu (S_i)\ge 3\eta /(4\ell )\ge 3\eta /(4L(\eta ,\Omega )), \end{aligned}$$

which contradicts the choice of the parameters \(\varepsilon ,\eta \). \(\quad \square \)

Corollary 3.18

For any \(\delta >0\) there exists \(\varepsilon >0\) such that the following is true. Suppose that \(\mu \) is \(\varepsilon \)-extremal and that \(S\subset \Omega ^n\) is an event such that \(\mu (S)\ge \delta \). Then \(\mu (\,\cdot \,|S)\) is \(\delta \)-extremal and \(\Delta _{\Box }(\mu (\,\cdot \,|S),\mu )<\delta \).

Proof

Since \(\mu (\sigma |S)\le \mu (\sigma )/\mu (S)\le \mu (\sigma )/\delta \) for every \(\sigma \), the conditional distribution \(\mu (\,\cdot \,|S)\) is \(1/\delta \)-contiguous with respect to \(\mu \). Thus, the assertion follows from Lemma 3.17 immediately. \(\quad \square \)

4 Bethe State Decompositions

In this section we prove Theorem 2.5. Before getting into the details, why might we expect the statement of the theorem to be correct? In order to construct the desired decomposition of the phase space \(\Omega ^{V_n}\) we could attempt to apply Lemma 3.15. The lemma shows that if we randomly pick a set \(\varvec{I}\subset V_n\) of moderate size, say, \(|\varvec{I}|=O(\log \log n)\), we will likely obtain a decomposition \((S^{\varvec{I},\sigma })_{\sigma \in \Omega ^{\varvec{I}}}\) such that (2.10) is satisfied for most of its parts. Formally, Lemma 3.15 guarantees that with high probability,

$$\begin{aligned} \frac{1}{n^2} \sum _{1\le i<j\le n}\left\langle {{\left\| {\mu _{{\varvec{G}},v_i,v_j}(\,\cdot \,|S^{\varvec{I},\varvec{\sigma }})-\mu _{{\varvec{G}},v_i}(\,\cdot \,|S^{\varvec{I},\varvec{\sigma }})\otimes \mu _{{\varvec{G}},v_j}(\,\cdot \,|S^{\varvec{I},\varvec{\sigma }})}\right\| _{\mathrm {TV}}},{\mu _{{\varvec{G}}}}}\right\rangle&=o(1). \end{aligned}$$
(4.1)

Proposition 3.11 extends (4.1) from pairwise independence to independence of bounded numbers of randomly chosen variable nodes. Thus, the pinning operation eliminates long-range correlations.

But why should short-range correlations be described by Belief Propagation? Recalling the standard messages from (2.4)–(2.5), Belief Propagation (2.3) asserts that with high probability,

$$\begin{aligned} \mu _{{\varvec{G}},v\rightarrow a}(\sigma \mid S^{\varvec{I},\varvec{\sigma }})&\propto \prod _{b\in \partial v{\setminus } a}\sum _{\tau \in \Omega ^{\partial b}}\varvec{1}\left\{ {\tau _v=\sigma }\right\} \psi _b(\tau )\prod _{y\in \partial b{\setminus } v}\mu _{{\varvec{G}},y\rightarrow b}(\tau _y\mid S^{\varvec{I},\varvec{\sigma }})+o(1). \end{aligned}$$
(4.2)

This last formula expresses the notion that once we remove v and its adjacent constraints \(\partial v\), the spins assigned to the variables y at distance two from v in \({\varvec{G}}\) are (essentially) stochastically independent. This is very much in line with the absence of long-range correlations: in \({\varvec{G}}-(v\cup \partial v)\), the variables y are likely far apart from one another, as \({\varvec{G}}\) contains only few short cycles. (For a more detailed discussion of the intuition behind Belief Propagation we refer to [46, Chapter 14].)
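For readers who prefer code to formulas, the Belief Propagation updates discussed above can be sketched in a few dozen lines. The following Python is our own illustrative implementation (the function names and the toy factor graph are invented, and the notation does not track the paper's); it checks the BP marginals against brute-force Boltzmann marginals on a tree-shaped factor graph, where BP is exact.

```python
import itertools

def bp_marginals(n, omega, p, factors, iters=50):
    """Naive Belief Propagation on a factor graph.  factors is a list of
    (vars, psi) pairs: vars is a tuple of variable indices and psi maps
    a spin tuple on vars to a positive weight."""
    m_vf = {(v, f): {s: 1.0 for s in omega}      # variable -> factor messages
            for f, (vs, _) in enumerate(factors) for v in vs}
    m_fv = {(f, v): {s: 1.0 for s in omega}      # factor -> variable messages
            for f, (vs, _) in enumerate(factors) for v in vs}
    nbrs = [[f for f, (vs, _) in enumerate(factors) if v in vs]
            for v in range(n)]
    for _ in range(iters):
        for f, (vs, psi) in enumerate(factors):  # factor -> variable update
            for v in vs:
                new = {s: 0.0 for s in omega}
                for tau in itertools.product(omega, repeat=len(vs)):
                    w = psi(tau)
                    for j, u in enumerate(vs):
                        if u != v:
                            w *= m_vf[(u, f)][tau[j]]
                    new[tau[vs.index(v)]] += w
                z = sum(new.values())
                m_fv[(f, v)] = {s: x / z for s, x in new.items()}
        for v in range(n):                       # variable -> factor update
            for f in nbrs[v]:
                new = {s: p[s] for s in omega}
                for g in nbrs[v]:
                    if g != f:
                        for s in omega:
                            new[s] *= m_fv[(g, v)][s]
                z = sum(new.values())
                m_vf[(v, f)] = {s: x / z for s, x in new.items()}
    marg = []
    for v in range(n):                           # BP estimate of the marginals
        mv = {s: p[s] for s in omega}
        for f in nbrs[v]:
            for s in omega:
                mv[s] *= m_fv[(f, v)][s]
        z = sum(mv.values())
        marg.append({s: x / z for s, x in mv.items()})
    return marg

def exact_marginals(n, omega, p, factors):
    """Brute-force marginals of the Boltzmann distribution."""
    Z = 0.0
    marg = [{s: 0.0 for s in omega} for _ in range(n)]
    for sigma in itertools.product(omega, repeat=n):
        w = 1.0
        for vs, psi in factors:
            w *= psi(tuple(sigma[u] for u in vs))
        for u in range(n):
            w *= p[sigma[u]]
        Z += w
        for u in range(n):
            marg[u][sigma[u]] += w
    return [{s: x / Z for s, x in m.items()} for m in marg]

# on a tree (here: a path of three variables) BP is exact
omega, p = (0, 1), {0: 0.7, 1: 0.3}
psi = lambda t: 1.5 if t[0] == t[1] else 0.5   # ferromagnetic pair weight
factors = [((0, 1), psi), ((1, 2), psi)]
bp = bp_marginals(3, omega, p, factors)
ex = exact_marginals(3, omega, p, factors)
assert all(abs(bp[v][s] - ex[v][s]) < 1e-9 for v in range(3) for s in omega)
```

On graphs with cycles BP is only a heuristic; the point of the present section is precisely to justify BP-type equations on the (locally tree-like but rigid) random regular factor graph.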

Yet (4.2) does not follow from (4.1) directly. Indeed, (4.1) merely states that the spins of two randomly chosen variables of \({\varvec{G}}\) are likely approximately independent. But being second neighbors of v, the y variables in (4.2) are anything but a uniformly random family of variable nodes. For random factor graph models of Erdős-Rényi type, this difficulty is easily overcome [23] because the random factor graph obtained by removing \(v\cup \partial v\) has essentially the same distribution as the original model (of order \(n-1\)). In effect, the original factor graph is distributed nearly the same as the factor graph of order \(n-1\) plus a new variable plus adjacent constraints connected to a uniformly random family of variable nodes, and (4.1) applies to these random attachments. Of course, this trick does not work on the Bethe lattice, where the variables y stand out because they have degree less than d. To cope with this difficulty, we will consider an auxiliary model in which a random number of variable nodes along with their neighborhoods are removed. Moreover, we will apply Lemma 3.15 not merely to the entire original factor graph \({\varvec{G}}\) but also to the variable nodes that have degree \(d-1\) after the removal operation, which will enable us to establish the Belief Propagation equations on the reduced factor graph. Finally, we will use the decorrelation property (4.1) together with the tools from Sect. 3.3 to stitch the factor graph back up, i.e., to get back to the original model with no variables and constraints removed. Let us now carry out this strategy in detail.

4.1 The construction

We aim to show that the Boltzmann distribution \(\mu _{{\varvec{G}}}\) is well approximated by a collection of no more than L Belief Propagation fixed points for an arbitrarily slowly growing \(L=L(n)\). As (4.2) shows, for a given variable node v the corresponding fixed point equations involve the messages sent by the constraint \(a\in \partial v\), which in turn are determined by the messages sent out by the variables w at distance precisely two from v. Thus, to express a single application of the Belief Propagation operator we require information about the variable nodes at distance two from v. Therefore, in addition to the Boltzmann distribution \(\mu _{{\varvec{G}}}\) we will consider an enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) that captures the joint distribution of the second neighborhoods.

To be precise, let \(G=(V,F,(\partial a)_{a\in F},(\psi _a)_{a\in F},p^{\otimes n})\) be a factor graph. Then its Boltzmann distribution \(\mu _G\) ‘lives’ on the space \(\Omega _G=\Omega ^V\). In addition, recalling that \(\nabla ^2_Gv\) consists of all variable nodes at distance at most two from v, consider the space

$$\begin{aligned} {\hat{\Omega }}_G=\prod _{v\in V}\Omega ^{\nabla ^2_Gv} \end{aligned}$$

of second neighborhood assignments, whose elements we denote as \(\tau =(\tau (v,w))_{v\in V,w\in \nabla ^2_Gv}\). The factor graph G induces an embedding

$$\begin{aligned} \Omega _G&\rightarrow {\hat{\Omega }_G},&\sigma&\mapsto {\hat{\sigma }}=({\hat{\sigma }}(v,w))_{v\in V,w\in \nabla ^2_Gv}, \qquad \text{ where } {\hat{\sigma }}(v,w)=\sigma (w). \end{aligned}$$

Thus, \(\mu _G\) induces a probability distribution \({\hat{\mu }_G}\) on \({\hat{\Omega }_G}\). For a variable v we denote by \({\hat{\mu }}_{G,v}\in \mathscr {P}(\Omega ^{\nabla ^2_Gv})\) the marginal distribution of \({\hat{\mu }_G}\) on the v-factor of \({\hat{\Omega }_G}\).

The enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) will play a vital role in the construction of the Bethe state decomposition. Indeed, by comparison to the Erdős-Rényi case, the rigid geometry of the random regular graph causes significant difficulties. More precisely, while the Belief Propagation messages are defined in terms of removing one or a few constraints, such operations clearly destroy regularity. Hence, we need to create a bit of wiggle room. To this end, we remove some variable nodes along with their adjacent constraint nodes, thereby leaving a few variable nodes with degree \(d-1\) rather than d. We refer to these variables as ‘cavities’. Clearly, this operation loses some information and would therefore by itself not suffice to prove Theorem 2.5. However, what saves the day is that the enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) contains the extra information needed to stitch the graph back up without losing track of the Bethe decomposition.

Unsurprisingly, the construction is subtle and involves several steps. It requires a number of carefully chosen parameters. Specifically, given a slowly diverging monotonically increasing positive integer sequence \(L=L(n)\rightarrow \infty \) as in Theorem 2.5, we choose a sequence \(0<\xi =\xi (L)=o(1)\) that tends to zero monotonically sufficiently slowly, a further sequence \(\omega =\omega (\xi )\rightarrow \infty \) that tends to infinity monotonically sufficiently slowly, as well as sequences \(0<\vartheta =\vartheta (\omega )=o(1)\), \(0<\zeta =\zeta (\vartheta )=o(1)\), \(0<\beta =\beta (\vartheta )=o(1)\), \(0<\alpha =\alpha (\beta )=o(1)\), \(0<\eta =\eta (\alpha )=o(1)\) and \(0<\varepsilon =\varepsilon (\eta )=o(1)\) that tend monotonically to zero slowly enough. In summary, the pecking order reads

$$\begin{aligned} 1\ll 1/\varepsilon&\ll 1/\eta \ll 1/\alpha \ll 1/\beta \ll 1/\zeta \ll 1/\vartheta \ll \omega \ll 1/\xi \ll L\ll \log \log n, \end{aligned}$$
(4.3)

and we always assume tacitly that \(n>n_0\) is sufficiently large.

We are ready to begin the construction. Let \({\varvec{G}}_*\) be the random factor graph obtained from \({\varvec{G}}\) as follows. Let \(\varvec{\theta }_*\) be a copy of the random variable \(\varvec{\theta }_\xi \) promised by Lemma 3.15; \(\varvec{\theta }_*\) is independent of \({\varvec{G}}\). Further, let \(\varvec{U}_*\) be a random set of \(\varvec{\theta }_*\) variable nodes of \({\varvec{G}}\) and draw \(\varvec{\sigma }_*\) from \(\mu _{{\varvec{G}}}\) independently of \(\varvec{\theta }_*\) and \(\varvec{U}_*\). Now, obtain \({\varvec{G}}_*\) from \({\varvec{G}}\) by changing the prior distribution to

$$\begin{aligned} p_{{\varvec{G}}_*}(\sigma )=\prod _{u\in \varvec{U}_*\cup \partial ^2\varvec{U}_*}\varvec{1}\{\sigma _u=\varvec{\sigma }_{*\,u}\}\prod _{v\not \in \varvec{U}_*\cup \partial ^2\varvec{U}_*}p(\sigma _v). \end{aligned}$$
(4.4)

Additionally, let \(\varvec{\omega }\) be a random variable with distribution \(\mathrm{Po}(\omega )\wedge 2\omega \), independent of everything else, and let \(\varvec{W}=\{v_{n-\varvec{\omega }+1},\ldots ,v_n\}\). Finally, obtain \({\varvec{G}}_*'\) from \({\varvec{G}}_*\) by removing the variable nodes in \(\varvec{W}\) along with their adjacent constraint nodes.

Thus, in \({\varvec{G}}_*'\) we pin the spins of the variable nodes in \(\varvec{U}_*\) and their neighbors to the values observed under \(\varvec{\sigma }_*\), which is drawn from \(\mu _{{\varvec{G}}}\). Additionally, we create cavities by removing the last \(\varvec{\omega }\) variable nodes along with their adjacent constraints. The following lemma shows that the removal of the variable nodes in \(\varvec{W}\) does not shift the marginals of the enhanced Boltzmann distribution much.

Lemma 4.1

With probability at least \(1-\omega ^{-10}\) over the choice of \(\varvec{\theta }_*\), \(\varvec{\sigma }_*\) and \({\varvec{G}}\) the following statements are true.

  1. (i)

    both \(\mu _{{\varvec{G}}_*}\) and \({\hat{\mu }}_{{\varvec{G}}_*}\) are \(\xi ^{1/4}\)-extremal.

  2. (ii)

    we have \(\sum _{v\in V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})}\Vert {{\hat{\mu }}_{{\varvec{G}}_*,v}-{\hat{\mu }}_{{\varvec{G}}_*',v}}\Vert _{\mathrm {TV}}<\vartheta n.\)

Proof

By construction, \({\hat{\mu }}_{{\varvec{G}}_*}\) is identical to the measure obtained through the pinning procedure of Lemma 3.15 applied to the \(\varvec{U}_*\)-components of the space \({\hat{\Omega }}_{{\varvec{G}}}\). Hence, Lemma 3.15 and Proposition 3.11 imply that \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal with probability at least \(1-\xi ^{1/4}\). Since \(\mu _{{\varvec{G}}_*}\) is a projection of \({\hat{\mu }}_{{\varvec{G}}_*}\), we obtain (i).

Further, let \(\mathscr {V}=V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})\). If \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal, then by the definition of the cut metric the distribution \({\hat{\mu }}_{{\varvec{G}}_*,\mathscr {V}}\) induced on the neighborhoods of \(\mathscr {V}\) is \(2\xi ^{1/4}\)-extremal, because \(|\mathscr {V}|\ge n/2\). Additionally, there is \(C=C(\omega )\) such that \({\hat{\mu }}_{{\varvec{G}}_*,\mathscr {V}}\) is C-contiguous with respect to \({\hat{\mu }}_{{\varvec{G}}_*',\mathscr {V}}\) with probability at least \(1-\omega ^{-11}\). This follows from (2.1), because \({\varvec{G}}_*'\) is obtained from \({\varvec{G}}_*\) by removing no more than \(2d\omega \) constraint nodes. Therefore, (ii) follows from (i) and Lemma 3.17, provided that \(\xi ,\omega ,\vartheta \) are chosen appropriately in accordance with (4.3). \(\quad \square \)

The following proposition, which establishes the Belief Propagation equations on \({\varvec{G}}_*'\), constitutes the main technical step of the proof.

Proposition 4.2

With probability at least \(1-\alpha ^{9}\), \({\varvec{G}}'_*\) enjoys the following properties.

  1. (i)

    the standard messages \((\mu _{{\varvec{G}}'_*,v\rightarrow a},\mu _{{\varvec{G}}'_*,a\rightarrow v})_{v\in V({\varvec{G}}_*'),a\in \partial v}\) form an \(\alpha ^{9d}\)-Belief Propagation fixed point.

  2. (ii)

    we have

    $$\begin{aligned}&\sum _{v\in V({\varvec{G}}'_*)} \sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| \mu _{{\varvec{G}}_*'}(\sigma )\right. \\&\qquad \left. -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*',w\rightarrow a}(\sigma _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}:\tau _v=\chi }\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*',w\rightarrow a}(\tau _w)} \right|&\quad <\alpha ^{9d}n. \end{aligned}$$

Before we prove Proposition 4.2 in Sect. 4.2, let us indicate how the theorem follows. As a final preparation we need the following basic fact.

Lemma 4.3

For any factor graph G, for any variable node v, any \(S\subset \partial v\) and any \(\sigma \in \Omega \) we have

$$\begin{aligned} \mu _{G-S,v}(\sigma )= \frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/\prod _{a\in S}\psi _a(\varvec{\sigma })},{\mu _{G,v\cup \partial ^2v}}}\right\rangle }{\left\langle {{1/\prod _{a\in S}\psi _a(\varvec{\sigma })},{\mu _{G,v\cup \partial ^2v}}}\right\rangle }. \end{aligned}$$

Proof

The partition function works out to be

$$\begin{aligned} Z(G-S)&=\sum _{\sigma \in \Omega ^{V(G)}}\prod _{a\in F(G){\setminus } S}\psi _a(\sigma _{\partial a}) \prod _{v\in V(G)}p_{G,v}(\sigma _v)\\&=\sum _{\sigma \in \Omega ^{V(G)}}\frac{\prod _{a\in F(G)}\psi _a(\sigma _{\partial a})\prod _{v\in V(G)}p_{G,v}(\sigma _v)}{\prod _{a\in S}\psi _a(\sigma _{\partial a})}=Z(G)\left\langle {{1/\prod _{a\in S}\psi _a},{\mu _G}}\right\rangle . \end{aligned}$$

Hence, for any \(\tau \in \Omega ^{V(G)}\),

$$\begin{aligned} \mu _{G-S}(\tau )&=\frac{1}{Z(G-S)}\prod _{a\in F(G){\setminus } S}\psi _a(\tau _{\partial a})\prod _{v\in V(G)}p_{G,v}(\tau _v)\\&=\frac{Z(G)}{Z(G-S)}\frac{1}{Z(G)}\frac{\prod _{a\in F(G)}\psi _a(\tau _{\partial a})\prod _{v\in V(G)}p_{G,v}(\tau _v)}{\prod _{a\in S}\psi _a(\tau _{\partial a})}\\&=\frac{\mu _G(\tau )}{\left\langle {{1/\prod _{a\in S}\psi _a},{\mu _G}}\right\rangle \prod _{a\in S}\psi _a(\tau _{\partial a})}, \end{aligned}$$

and the average in the denominator involves variables in \(v\cup \partial ^2v\) only. \(\quad \square \)
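Since Lemma 4.3 is an exact algebraic identity, it can be verified numerically on a toy instance. The following Python sketch is purely illustrative (the three-variable factor graph, the prior and the weight functions are invented for the test): it compares the marginal of v after deleting a constraint with the reweighted average under the original Boltzmann distribution.

```python
import itertools

# a toy factor graph on three variables with two pairwise constraints;
# we remove the second constraint, so S = {psi12} and v = 1 lies in its boundary
omega = (0, 1)
p = {0: 0.6, 1: 0.4}
psi01 = lambda t: 2.0 if t[0] == t[1] else 1.0
psi12 = lambda t: 1.0 if t[0] == t[1] else 3.0
factors = [((0, 1), psi01), ((1, 2), psi12)]

def boltzmann(n, factors):
    """Brute-force Boltzmann distribution of a small factor graph."""
    weights = {}
    for sigma in itertools.product(omega, repeat=n):
        w = 1.0
        for vs, psi in factors:
            w *= psi(tuple(sigma[u] for u in vs))
        for u in range(n):
            w *= p[sigma[u]]
        weights[sigma] = w
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

mu = boltzmann(3, factors)             # mu_G
mu_minus = boltzmann(3, factors[:1])   # mu_{G-S}
v, (vs_S, psi_S) = 1, factors[1]
for s in omega:
    # left-hand side: marginal of v under mu_{G-S}
    lhs = sum(q for sig, q in mu_minus.items() if sig[v] == s)
    # right-hand side: reweighted averages under mu_G, as in Lemma 4.3
    num = sum(q / psi_S(tuple(sig[u] for u in vs_S))
              for sig, q in mu.items() if sig[v] == s)
    den = sum(q / psi_S(tuple(sig[u] for u in vs_S))
              for sig, q in mu.items())
    assert abs(lhs - num / den) < 1e-12
```

The check relies on \(\psi _S\) being strictly positive so that the reweighting \(1/\psi _S\) is well defined, mirroring the assumption implicit in the lemma.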

Proof of Theorem 2.5

For any assignment \(\chi \in \mathscr {X}=\prod _{v\in \varvec{U}_*}\Omega ^{\nabla ^2v}\) of the variables in \(\varvec{U}_*\) and their neighborhoods let

$$\begin{aligned} S(\chi )=\{\sigma \in {\hat{\Omega }}_{{\varvec{G}}}:\forall v\in \varvec{U}_*,\,w\in \nabla ^2v:\sigma (v,w)=\chi (v,w)\}. \end{aligned}$$

Then \((S(\chi ))_{\chi \in \mathscr {X}}\) is a decomposition of \({\hat{\Omega }}_{{\varvec{G}}}\) into no more than \(q^{(1+d(k-1))\varvec{\theta }_*}\) sub-cubes, corresponding to the neighborhood assignments of the \(\varvec{\theta }_*\) variable nodes in \(\varvec{U}_*\). As Lemma 3.15 shows, by choosing the functions from (4.3) appropriately we can guarantee that \(q^{(1+d(k-1))\varvec{\theta }_*}\le L\). We are going to show that the decomposition \((S(\chi ))_{\chi }\) meets the requirements of the theorem w.h.p.

For \(\chi \in \mathscr {X}\) let \({\varvec{G}}_*[\chi ]\) be the random factor graph \({\varvec{G}}_*\) given that \(\varvec{\sigma }_{*}(w)=\chi (v,w)\) for all \(v\in \varvec{U}_*\) and all \(w\in \nabla ^2v\). Also let \({\varvec{G}}_*'[\chi ]\) be the factor graph obtained from \({\varvec{G}}_*[\chi ]\) by removing the variables in \(\varvec{W}\) along with their adjacent constraint nodes. Further, let \({{\mathscr {E}}}_\chi \) be the event that the following four conditions are satisfied.

E1::

Both \(\mu _{{\varvec{G}}_*[\chi ]}\) and \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\) are \(\xi ^{1/8}\)-extremal.

E2::

We have

$$\begin{aligned} \sum _{v\in V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})}\left\| {{\hat{\mu }}_{{\varvec{G}}_*[\chi ],v} -{\hat{\mu }}_{{\varvec{G}}_*'[\chi ],v}}\right\| _{\mathrm {TV}}<\vartheta n. \end{aligned}$$
(4.5)
E3::

On \({\varvec{G}}_*'[\chi ]\) the standard messages form an \(\alpha ^{9d}\)-BP fixed point and

$$\begin{aligned}&\sum _{v\not \in \varvec{W}\cup \partial ^2\varvec{W}} \sum _{\sigma \in \Omega ^{\nabla ^2v}}\nonumber \\&\quad \left| {\mu _{{\varvec{G}}_*'[\chi ]}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*'[\chi ],w\rightarrow a}(\sigma _w)}{\sum _{s\in \Omega }p(s)\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}:\tau _v=s}\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*'[\chi ],w\rightarrow a}(\tau _w)} }\right| \nonumber \\&\qquad <\alpha ^{9d}n. \end{aligned}$$
(4.6)
E4::

There are no more than \(\alpha ^{10d}n\) constraint nodes a in \({\varvec{G}}\) such that \(\min _{\sigma \in \Omega ^k}\psi _a(\sigma )\le \alpha ^{1/4}\), nor are there more than \(\varepsilon ^{20}n\) constraint nodes a such that \(\min _{\sigma \in \Omega ^k}\psi _a(\sigma )\le \varepsilon \).

Then (2.1), Lemma 4.1 and Proposition 4.2 yield \(\mathbb {E}\left[{\sum _{\chi }\mu _{{\varvec{G}}}(\chi )(1-\varvec{1}{{\mathscr {E}}}_\chi )\mid {\mathbf {E4}}}\right]\le \alpha ^8\). Thus, Markov’s inequality shows

$$\begin{aligned} \mathbb {P}\left[{\sum _{\chi }\mu _{{\varvec{G}}}(\chi )(1-\varvec{1}{{\mathscr {E}}}_\chi )\ge \alpha ^4}\right]\le \varepsilon ^4. \end{aligned}$$
(4.7)

Hence, we are left to argue that \(S(\chi )\) is an \(\varepsilon \)-Bethe state of \({\varvec{G}}\) if the event \({{\mathscr {E}}}_\chi \) occurs. As a first step, we are going to show that the standard messages of \({\varvec{G}}_*'[\chi ]\), \({\varvec{G}}_*[\chi ]\) are close: given \({{\mathscr {E}}}_\chi \), we claim

$$\begin{aligned}&\sum _{\sigma \in \Omega }\sum _{v\not \in \varvec{W}\cup \partial ^2\varvec{W}}\sum _{a\in \partial v} \left| {\mu _{{\varvec{G}}_*[\chi ],v\rightarrow a}(\sigma )-\mu _{{\varvec{G}}_*'[\chi ],v\rightarrow a}(\sigma )}\right| \nonumber \\&\qquad + \left| {\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma )-\mu _{{\varvec{G}}_*'[\chi ],a\rightarrow v}(\sigma )}\right| \nonumber \\&\quad <\alpha ^{8d}n. \end{aligned}$$
(4.8)

To see this, recall that \(\mu _{{\varvec{G}}_*[\chi ],v\rightarrow a}\) is the marginal of v in the factor graph \({\varvec{G}}_*[\chi ]-a\) and that \(\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}\) is the marginal of v in the factor graph obtained from \({\varvec{G}}_*[\chi ]\) by removing all \(b\in \partial v{\setminus } a\) and disregarding the prior. Hence, Lemma 4.3 shows that

$$\begin{aligned} \mu _{{\varvec{G}}_*[\chi ],v\rightarrow a}(\sigma )&= \frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/\psi _a(\varvec{\sigma })},{\mu _{{\varvec{G}}_*[\chi ],\nabla ^2 v}}}\right\rangle }{\left\langle {{1/\psi _a(\varvec{\sigma })},{\mu _{{\varvec{G}}_*[\chi ],\nabla ^2 v}}}\right\rangle },\end{aligned}$$
(4.9)
$$\begin{aligned} \mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma )&= \frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/(p(\sigma )\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{{\varvec{G}}_*[\chi ],\nabla ^2 v}}}\right\rangle }{\left\langle {{1/(p(\varvec{\sigma }_v)\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{\mu _{{\varvec{G}}_*[\chi ],\nabla ^2 v}}}\right\rangle }, \end{aligned}$$
(4.10)

and analogously for \({\varvec{G}}_*'[\chi ]\). Provided that \(\vartheta \ll \alpha ^d\), we obtain (4.8) from (4.5), (4.9), (4.10) and E2, E4. Further, the estimate (4.8) and the fact that the standard messages of \({\varvec{G}}_*'[\chi ]\) are an \(\alpha ^{9d}\)-BP fixed point imply that the standard messages of \({\varvec{G}}_*[\chi ]\) are an \(\eta \)-BP fixed point. Thus, we have established BS1.

In order to prove BS2, we estimate the derivatives of a term like in (4.6) as follows:

$$\begin{aligned}&\left| {\frac{\partial }{\partial \nu _w(\sigma )} \frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\nu _{w}(\sigma (w))}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a}\nu _{w}(\tau (w))}}\right| \\&\quad \le \frac{1}{\min _{a\in \partial v,\tau \in \Omega ^{\partial a}}\psi _a(\tau )^{2d}}. \end{aligned}$$

Hence, (4.5), (4.6), (4.8), E4 and the bound \(|\varvec{W}\cup \partial ^2\varvec{W}|=O(\log n)\) yield

$$\begin{aligned}&\sum _{v\in V_n} \sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| {\mu _{{\varvec{G}}_*[\chi ]}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\sigma (w))}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau (w))} }\right| \nonumber \\&\quad <\alpha ^{2d}n. \end{aligned}$$
(4.11)

Additionally, we claim that

$$\begin{aligned} \sum _{b\in F_m} \sum _{\sigma \in \Omega ^{\partial b}}\left| {\mu _{{\varvec{G}}_*[\chi ]}(\sigma ) -\frac{\psi _b(\sigma )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\sigma (w))}{\sum _{\tau \in \Omega ^{\partial b}}\psi _b(\tau )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\tau (w))} }\right|&<\varepsilon ^9 n. \end{aligned}$$
(4.12)

To see this, suppose that b satisfies \(\min _{\sigma \in \Omega ^k}\psi _b(\sigma )\ge \varepsilon \), that \(b\in \partial v\) for a variable node v such that

$$\begin{aligned}&\sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| {\mu _{{\varvec{G}}_*[\chi ]}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\sigma _w)}{\sum _{\kappa \in \Omega }p(\kappa )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\varvec{1}\{\tau _v=\kappa \} \prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)} }\right| \nonumber \\&<\alpha ^{d} \end{aligned}$$
(4.13)

and that

$$\begin{aligned}&\sum _{\sigma \in \Omega }\left| {\mu _{{\varvec{G}}_*[\chi ],v\rightarrow b}(\sigma )-\frac{p(\sigma )\prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma )}{\sum _{\kappa \in \Omega }p(\kappa )\prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\kappa )}}\right| \quad <\eta ^{1/4},\end{aligned}$$
(4.14)
$$\begin{aligned}&\sum _{a\in \partial v}\sum _{\sigma \in \Omega }\left| {\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma )- \frac{\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _v=\sigma \}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)}}\right| \quad <\eta ^{1/4}. \end{aligned}$$
(4.15)

All but \(\varepsilon ^{10}n\) constraint nodes b enjoy these properties, due to E4, (4.11) and because the standard messages of \({\varvec{G}}_*[\chi ]\) form an \(\eta \)-BP fixed point. For any such b and any \(\sigma \in \Omega ^{\partial b}\) we obtain

$$\begin{aligned}&\mu _{{\varvec{G}}_*[\chi ],\partial b}(\sigma )=\sum _{\tau \in \Omega ^{\nabla ^2v}}\varvec{1}\{\tau _{\partial b}=\sigma \}\mu _{{\varvec{G}}_*[\chi ],\nabla ^2v}(\tau )\\&\quad {{\mathop {=}\limits ^{{{(4.13)}}}}}\sum _{\tau \in \Omega ^{\nabla ^2v}} \frac{\varvec{1}\{\tau _{\partial b}=\sigma \}p(\sigma _v)\prod _{a\in \partial v}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)}{\sum _{\kappa \in \Omega }p(\kappa )\prod _{a\in \partial v}\sum _{\tau '\in \Omega ^{\partial a}}\psi _a(\tau ')\varvec{1}\{\tau '_v=\kappa \} \prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w')}+O(\alpha ^d)\\&\quad {{\mathop {=}\limits ^{{{(4.15)}}}}}\frac{p(\sigma _v)\psi _b(\sigma )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\sigma _w) \prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma _v)\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)}{\sum _{\kappa \in \Omega ^{\partial b}}p(\kappa _v)\psi _b(\kappa ) \prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\kappa _w) \prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\kappa _v)\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)}\\&\qquad \qquad \qquad +O(\alpha ^d)\\&\quad =\frac{p(\sigma _v)\psi _b(\sigma )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\sigma _w) \prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\sigma _v)}{\sum _{\kappa \in \Omega ^{\partial b}}p(\kappa _v)\psi _b(\kappa ) \prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\kappa _w) \prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}(\kappa _v)}+O(\alpha ^d)\\&\quad {{\mathop {=}\limits ^{{{(4.14)}}}}} \frac{\psi _b(\sigma )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\sigma _w)}{\sum _{\kappa \in \Omega ^{\partial b}}\psi _b(\kappa )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\kappa _w)}+O(\alpha ^d), \end{aligned}$$

whence (4.12) follows by summing over b.

Finally, BS2 follows from (4.11), (4.12) and the \(\xi ^{1/8}\)-extremality of \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\). Indeed, let \(\varvec{I},\varvec{J}\) be random sets of at most \(1/\varepsilon \) variable/constraint nodes. For each \(b\in \varvec{J}\) pick a variable node \(v_b\in \partial b\). Because \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\) is \(\xi ^{1/8}\)-extremal, Proposition 3.11 yields

$$\begin{aligned} \mathbb {E}\left\| {{\hat{\mu }}_{{\varvec{G}}_*[\chi ],\varvec{I}\cup \{v_b:b\in \varvec{J}\}}-\bigotimes _{v\in \varvec{I}\cup \{v_b:b\in \varvec{J}\}}{\hat{\mu }}_{{\varvec{G}}_*[\chi ],v}}\right\| _{\mathrm {TV}}&<\alpha ^4. \end{aligned}$$
(4.16)

Furthermore, (4.11) and (4.12) imply that with probability at least \(1-\varepsilon ^2\) over the choice of \(\varvec{I},\varvec{J}\) we have

$$\begin{aligned} \forall b\in \varvec{J}&: \sum _{\sigma \in \Omega ^{\partial b}}\left| {\mu _{{\varvec{G}}_*[\chi ]}(\sigma ) -\frac{\psi _b(\sigma )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\sigma (w))}{\sum _{\tau \in \Omega ^{\partial b}}\psi _b(\tau )\prod _{w\in \partial b}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow b}(\tau (w))} }\right|<\varepsilon ^3,\\ \forall v\in \varvec{I}&: \sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| {\mu _{{\varvec{G}}_*[\chi ]}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\sigma (w))}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau (w))} }\right| \\&\quad <\varepsilon ^3. \end{aligned}$$

If these estimates hold, then for any configuration \(\sigma \in \Omega ^{\varvec{I}\cup \partial \varvec{J}\cup \partial ^2\varvec{I}}\) we obtain

$$\begin{aligned}&\bigg |\bigotimes _{v\in \varvec{I}\cup \{v_b:b\in \varvec{J}\}}{\hat{\mu }}_{{\varvec{G}}_*[\chi ],v}(\sigma )\nonumber \\&\qquad -\prod _{v\in \varvec{I}}\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\sigma _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)} \cdot \nonumber \\&\qquad \cdot \prod _{a\in \varvec{J}}\frac{\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\sigma _w)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*[\chi ],w\rightarrow a}(\tau _w)} \bigg |<\frac{\varepsilon }{2}. \end{aligned}$$
(4.17)

Thus, BS2 follows from (4.16) and (4.17).

Finally, to obtain the Bethe state decomposition of the simple factor graph \({\mathbb {G}}\), we merely recall that \(\mathbb {P}\left[{{\varvec{G}}\in \mathscr {S}}\right]=\Omega (1)\) by Fact 2.2. Hence, the claim about \({\mathbb {G}}\) follows immediately from the statement for \({\varvec{G}}\) and Bayes’ rule. \(\quad \square \)

4.2 Proof of Proposition 4.2

By construction, the random factor graph \({\varvec{G}}_*'\) comprises a pairing of variable clones \((v_i,h)\in V_n\times [d]\) and constraint clones \((a_j,h)\in F_m\times [d]\). But since we obtained \({\varvec{G}}_*'\) from \({\varvec{G}}_*\) by removing some variable nodes \(\varvec{W}\) along with their adjacent constraint nodes, not all of the variable clones \((v_i,h)\) with \(i\le n-\varvec{\omega }\) are paired. We call variables with at least one unpaired clone cavities. Let \(\mathscr {C}\) be the set of all cavities.

The basic idea behind the proof is as follows. We will add a new variable node \(v^+\) along with new adjacent constraint nodes \(b_1,\ldots ,b_d\) to \({\varvec{G}}_*'\). Apart from \(v^+\), these new constraint nodes are adjacent to some of the cavities. The fresh randomness afforded by this construction will facilitate the study of the standard messages from \(v^+\) to the \(b_i\) as well as the reverse messages. Then we will argue that \(v^+\) is essentially indistinguishable from a randomly chosen variable node of \({\varvec{G}}_*'\), thereby extending the analysis to almost all the messages of \({\varvec{G}}_*'\).

Formally, since \(\varvec{\omega }\) is a Poisson variable with mean \(\omega \) truncated at \(2\omega \), w.h.p. we have \(\omega /2\le |\mathscr {C}|\le 2d(k-1)\omega \). Given that \(|\mathscr {C}|\ge d(k-1)\), we obtain \({\varvec{G}}_*^-\) from \({\varvec{G}}_*'\) by re-inserting one variable node \(v_+=v_{n-\varvec{\omega }+1}\) along with d new constraint nodes \(b_1,\ldots ,b_d\). For each of these constraint nodes a random clone \((b_i,\varvec{h}_i)\), \(\varvec{h}_i\in [k]\), is paired with a random clone of \(v_+\). In addition, the remaining \(k-1\) clones of each \(b_i\) are paired with randomly chosen cavities. The weight functions \(\psi _{b_i}\) are chosen independently from P. The following lemma shows that the distributions of \({\varvec{G}}_*'\) and \({\varvec{G}}_*^-\) are reasonably close.
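In code, the attachment step can be sketched as follows. This is a minimal illustration under our own representation of clones as `(node, slot)` pairs; the helper name `attach_new_variable` is ours, not the paper's, and weight functions are omitted.

```python
import random

def attach_new_variable(cavity_clones, d, k, rng=random):
    """Pair a fresh variable v_+ with d new constraints b_1,...,b_d.

    Each new constraint reserves one clone for v_+ and takes its
    remaining k-1 clones uniformly without replacement from the
    unmatched cavity clones, mimicking the construction of G_*^-.
    """
    assert len(cavity_clones) >= d * (k - 1)
    pool = list(cavity_clones)
    rng.shuffle(pool)
    pairing = []
    for i in range(d):
        # one clone of b_i is reserved for the new variable v_+ ...
        neighbors = [("v+", i)]
        # ... and the other k-1 clones go to random cavities
        for _ in range(k - 1):
            neighbors.append(pool.pop())
        pairing.append((f"b{i+1}", neighbors))
    return pairing, pool  # pool = cavity clones left unmatched
```

On a tiny instance one can check that each new constraint ends up with \(k\) neighbors and that exactly \(d(k-1)\) cavity clones are consumed.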

Lemma 4.4

For any event \({{\mathscr {E}}}\) we have \(\mathbb {P}\left[{{\varvec{G}}_*'\in {{\mathscr {E}}}}\right]\le \alpha ^{-1}\mathbb {P}\left[{{\varvec{G}}_*^-\in {{\mathscr {E}}}}\right]+O(\alpha ^{100}).\)

Proof

We need to get a grip on the conditional distribution of the second neighborhood of \(v_+\) in \({\varvec{G}}\) given \({\varvec{G}}_*'\). This is non-trivial because of the revised prior of \({\varvec{G}}_*\) introduced by the pinning operation (4.4); for the assignment \(\varvec{\sigma }_*\) is correlated with the neighborhood of \(v_+\) in \({\varvec{G}}\). To begin, let \(\mathscr {A}\) be the event that no constraint node of \({\varvec{G}}\) is connected by two edges to the variable nodes \(v_{n-\varvec{\omega }+1},\ldots ,v_n\) and that all cavities have degree precisely \(d-1\). Then (4.3) guarantees that

$$\begin{aligned} \mathbb {P}\left[{{\varvec{G}}\in \mathscr {A}}\right]=1-O(\omega ^2/n)=1-O(n^{-1/2}). \end{aligned}$$
(4.18)

Further, given \(\mathscr {A}\) the total number of cavities of \({\varvec{G}}_*'\) is equal to \(d(k-1)\varvec{\omega }\), and thus

$$\begin{aligned} \mathbb {P}\left[{\left| {\mathscr {C}}\right| \ge \omega /2\mid \mathscr {A}}\right]=1-O(\omega ^{-1}). \end{aligned}$$
(4.19)

Let \(\mathscr {A}'\) be the event that \(\mathscr {A}\) occurs, that \(\left| {\mathscr {C}}\right| \ge \omega /2\) and that the weight functions of all constraints adjacent to \(v_+\) take a minimum value of at least \(\alpha ^{1/(2dk)}\). We condition on the event \(\mathscr {A}'\), which occurs with probability \(1-O(\alpha ^{100})\) due to (2.1), (4.18) and (4.19).

Let \(\mathscr {N}_1,\mathscr {N}_2\) be two possible outcomes of the depth-two neighborhoods of \(v_+\) in \({\varvec{G}}\) given \({\varvec{G}}_*'\). Thus, \(\mathscr {N}_1,\mathscr {N}_2\) specify the weight functions of the d constraints adjacent to \(v_+\), the pairing of the clones of \(v_+\) to those of these constraint nodes, and the pairing of these d constraint nodes and the cavities \(\mathscr {C}\). In addition, let \({\varvec{G}}'\) be the random factor graph obtained from \({\varvec{G}}_*'\) by restoring the prior to \(p^{\otimes n}\). Then we can set up a coupling \((\varvec{\Gamma }_1,\varvec{\Gamma }_2)\) of \({\varvec{G}}\) given \({\varvec{G}}',\mathscr {N}_1,\mathscr {A}'\) and of \({\varvec{G}}\) given \({\varvec{G}}',\mathscr {N}_2,\mathscr {A}'\) such that under this coupling the two random factor graphs differ in no more than 2dk edges: the coupling simply switches the pairings occurring in \(\mathscr {N}_1\) but not in \(\mathscr {N}_2\), and vice versa. In effect, on \(\mathscr {A}'\) the Boltzmann distributions \(\mu _{\varvec{\Gamma }_1}, \mu _{\varvec{\Gamma }_2}\) are mutually \(\alpha ^{-1}\)-contiguous. Consequently, since the priors are amended according to samples from these respective Boltzmann distributions, we conclude that for any two outcomes \(\mathscr {N}_1,\mathscr {N}_2\) of the second neighborhood of \(v_+\) and for any possible outcome g of \({\varvec{G}}_*'\),

$$\begin{aligned} \mathbb {P}\left[{{\varvec{G}}_*'=g\mid \mathscr {N}_2,{\varvec{G}}',\mathscr {A}'}\right]\le \alpha ^{-1}\,\mathbb {P}\left[{{\varvec{G}}_*'=g\mid \mathscr {N}_1,{\varvec{G}}',\mathscr {A}'}\right]. \end{aligned}$$
(4.20)

Combining (4.18)–(4.20), we conclude that for any possible g and for any \(2\omega /3\le \omega _0\le 3\omega /2\),

$$\begin{aligned} \mathbb {P}\left[{{\varvec{G}}_*'=g\mid \varvec{\omega }=\omega _0}\right]\le \alpha ^{-1}\,\mathbb {P}\left[{{\varvec{G}}_*^-=g\mid \varvec{\omega }=\omega _0+1}\right]+O(\alpha ^{100}). \end{aligned}$$
(4.21)

Finally, the assertion follows from (4.21) because \(d_{\mathrm {TV}}(\varvec{\omega },\varvec{\omega }+1)=O(\omega ^{-1/2})=O(\alpha ^{200})\) and \(\mathbb {P}\left[{2\omega /3\le \varvec{\omega }\le 3\omega /2}\right]=1-\exp (-\Omega (\omega ))=1-O(\alpha ^{200})\). \(\quad \square \)

Lemma 4.4 shows that studying the messages received by and emanating from \(v_+\) is about as good as studying the messages of a random variable node of \({\varvec{G}}_*'\). The randomness involved in the attachment process will help, but is not yet quite sufficient to actually verify the Belief Propagation equations. Namely, we also need to make sure that the Boltzmann distribution of the cavities is extremal in order to argue that typically the joint distribution of the variables where the new constraints \(b_1,\ldots ,b_d\) are anchored factorizes. Unfortunately, we do not know a priori that extremality holds. Indeed, while going from \({\varvec{G}}\) to \({\varvec{G}}_*'\) renders the Boltzmann distribution \(\xi ^{1/4}\)-extremal (by Lemma 4.1), the cavities are too few in number to conclude that their joint distribution is extremal.

Hence, we will apply a second round of pinning. But this time we will pin the cavities directly. To be precise, recalling the random variable \(\varvec{\theta }_{+}=\varvec{\theta }_{\zeta }\) from Lemma 3.15, let \(\mathscr {C}_+\subset \mathscr {C}\) be a random subset of size \(\varvec{\theta }_{+}\wedge |\mathscr {C}|\). Further, draw a sample \(\varvec{\sigma }_{+}\) from \(\mu _{{\varvec{G}}_*'}\). The choice of \(\mathscr {C}_+,\varvec{\sigma }_{+}\) is independent of the choice of the constraints \(b_1,\ldots ,b_d\), and \(\varvec{\sigma }_+\) is independent of \(\mathscr {C}_+\). Now, obtain \({\varvec{G}}_*''\) from \({\varvec{G}}_*'\) by changing the prior to

$$\begin{aligned} p_{{\varvec{G}}_*''}(\sigma )&\propto p_{{\varvec{G}}_*'}(\sigma )\prod _{y\in \mathscr {C}_+}\varvec{1}\{\sigma (y)=\varvec{\sigma }_{+}(y)\}. \end{aligned}$$
(4.22)

Thus, we pin the cavities \(y\in \mathscr {C}_+\) to the spins observed under \(\varvec{\sigma }_{+}\), which are independent of \(b_1,\ldots ,b_d\).
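In probabilistic terms, the pinning operation (4.22) is simply conditioning: multiply the measure by indicators on the pinned coordinates and renormalize. A minimal sketch, assuming the distribution is given explicitly as a dictionary over configurations (the function name `pin` and the toy example are ours):

```python
from itertools import product

def pin(mu, pinned):
    """Condition a joint distribution on pinned coordinates.

    mu: dict mapping configurations (tuples over Omega) to probabilities.
    pinned: dict {coordinate: required spin}, playing the role of the
    pinned cavities y in C_+ with spins sigma_+(y), as in (4.22).
    """
    cond = {s: p for s, p in mu.items()
            if all(s[y] == spin for y, spin in pinned.items())}
    z = sum(cond.values())  # normalizing constant; must be positive
    assert z > 0, "pinned configuration has zero probability"
    return {s: p / z for s, p in cond.items()}

# toy example: the uniform distribution on two binary spins
omega = (0, 1)
mu = {s: 1 / 4 for s in product(omega, repeat=2)}
nu = pin(mu, {0: 1})  # pin coordinate 0 to spin 1
```

Of course, in the proof the pinned spins are drawn from the Boltzmann distribution itself, which is what makes the conditioned measure well defined with high probability.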

Lemma 4.5

The joint distribution \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) of the cavities is \(\zeta \)-symmetric with probability at least \(1-\zeta \).

Proof

Since \(|\mathscr {C}|\ge \omega /2\) with probability \(1-\exp (-\Omega (\omega ))\), the assertion follows immediately from Lemma 3.15 and the construction of \({\varvec{G}}_*''\). \(\quad \square \)

Additionally, obtain \({\varvec{G}}^+_*\) from \({\varvec{G}}^-_*\) by changing the prior as per (4.22) as well, i.e.,

$$\begin{aligned} p_{{\varvec{G}}^+_*}(\sigma )\propto p_{{\varvec{G}}_*^-}(\sigma )\prod _{y\in \mathscr {C}_+}\varvec{1}\{\sigma (y)=\varvec{\sigma }_{+}(y)\}. \end{aligned}$$
(4.23)

We are ready to verify the Belief Propagation equations for \(v_+\) on \({\varvec{G}}_*^+\).

Lemma 4.6

With probability \(1-O(\alpha ^{90})\) the random factor graph \({\varvec{G}}_*^+\) has the following properties:

$$\begin{aligned}&\left| {\mu _{{\varvec{G}}_*^+,b_i\rightarrow v_+}(\sigma )-\frac{\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v_+}=\sigma \}\psi _{b_i}(\tau )\prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_i}(\tau _w)}{\sum _{\tau \in \Omega ^{\partial b_i}}\psi _{b_i}(\tau )\prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_i}(\tau _w)}}\right| \nonumber \\&\quad \le \alpha ^{70d}\quad \qquad \forall i\in [d],\sigma \in \Omega , \end{aligned}$$
(4.24)
$$\begin{aligned}&\left| {\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_i}(\sigma )- \frac{p(\sigma )\prod _{j\ne i}\sum _{\tau \in \Omega ^{\partial b_j}}\varvec{1}\{\tau _{v_+}=\sigma \}\psi _{b_j}(\tau )\prod _{w\in \partial b_j{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_j}(\tau _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{j\ne i}\sum _{\tau \in \Omega ^{\partial b_j}}\psi _{b_j}(\tau )\varvec{1}\{\tau _{v_+}=\chi \}\prod _{w\in \partial b_j{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_j}(\tau _w)}}\right| \nonumber \\&\quad \le \alpha ^{70d}\quad \qquad \forall i\in [d],\sigma \in \Omega , \end{aligned}$$
(4.25)
$$\begin{aligned}&\left| {\mu _{{\varvec{G}}_*^+}(\sigma ) -\frac{p(\sigma )\prod _{i=1}^d\psi _{b_i}(\sigma )\prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_i}(\sigma _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{i=1}^d\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v_+}=\chi \}\psi _{b_i}(\tau ) \prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*^+,w\rightarrow b_i}(\tau _w)} }\right| \nonumber \\&\quad <\alpha ^{70d}\quad \qquad \forall \sigma \in \Omega ^{\nabla ^2v_+}. \end{aligned}$$
(4.26)

Proof

Lemma 4.5 shows that \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \(\zeta \)-symmetric with probability at least \(1-\zeta \). Suppose it is. Then Proposition 3.11 shows that \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \((\beta ,d(k-1))\)-symmetric. We may also assume that \(|\mathscr {C}|\ge \omega /2\), an event that occurs with probability at least \(1-\exp (-\Omega (\omega ))\) by the construction of \({\varvec{G}}_*'\). Additionally, due to (2.1) we may assume that

$$\begin{aligned} \min _{\sigma \in \Omega ^k}\psi _{b_i}(\sigma )\ge \alpha \qquad \text{ for } \text{ all } i\in [d]\text{. } \end{aligned}$$
(4.27)

According to (3.2), the standard message \(\mu _{{\varvec{G}}_*^+,b_1\rightarrow v_+}\) is defined as the marginal of \(v_+\) in the factor graph obtained from \({\varvec{G}}_*^+\) by removing \(b_2,\ldots ,b_d\) and replacing the prior of \(v_+\) by the uniform distribution. By construction, this factor graph is obtained from \({\varvec{G}}_*''\) by adding the variable node \(v_+\) and the constraint node \(b_1\) and replacing the prior of \(v_+\) by the uniform distribution. Therefore,

$$\begin{aligned}&\mu _{{\varvec{G}}_*^+,b_1\rightarrow v_+}(\sigma )\nonumber \\&\quad =\frac{\sum _{\tau \in \Omega ^{\partial b_1}}\varvec{1}\{\tau _{v_+}=\sigma \}\psi _{b_1}(\tau ) \left\langle {{\varvec{1}\{\forall w\in \partial b_1{\setminus } v_+:\varvec{\sigma }_w=\tau _w\}},{\mu _{{\varvec{G}}_*'',\mathscr {C}}}}\right\rangle }{\sum _{\tau \in \Omega ^{\partial b_1}}\psi _{b_1}(\tau ) \left\langle {{\varvec{1}\{\forall w\in \partial b_1{\setminus } v_+:\varvec{\sigma }_w=\tau _w\}},{\mu _{{\varvec{G}}_*'',\mathscr {C}}}}\right\rangle }\qquad (\sigma \in \Omega ). \end{aligned}$$
(4.28)

Further, the neighbors \(\partial b_1{\setminus } v^+\) are chosen uniformly from \(\mathscr {C}\) (without replacement). Because \(|\mathscr {C}|\ge \omega /2\) and \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \((\beta ,d(k-1))\)-symmetric, we conclude that

$$\begin{aligned} \mathbb {P}\left[{\left\| {\mu _{{\varvec{G}}_*'',\partial b_1{\setminus } v^+}-\bigotimes _{w\in \partial b_1{\setminus } v^+}\mu _{{\varvec{G}}_*'',w}}\right\| _{\mathrm {TV}}\le \beta ^{1/3}}\right]&\ge 1-\beta ^{1/3}. \end{aligned}$$
(4.29)

Combining (4.27), (4.28) and (4.29), we obtain the estimate

$$\begin{aligned} \mathbb {E}\left| {\mu _{{\varvec{G}}_*^+,b_1\rightarrow v_+}(\sigma )- \frac{\sum _{\tau \in \Omega ^{\partial b_1}}\varvec{1}\{\tau _{v_+}=\sigma \}\psi _{b_1}(\tau )\prod _{w\in \partial b_1{\setminus } v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}{\sum _{\tau \in \Omega ^{\partial b_1}}\psi _{b_1}(\tau )\prod _{w\in \partial b_1{\setminus } v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)} }\right|&\le \beta ^{1/4}. \end{aligned}$$
(4.30)

Moreover, the factor graph \({\varvec{G}}^+_*-b_1\) is obtained from \({\varvec{G}}_*''\) by adding \(v_+\) and \(b_2,\ldots ,b_d\). Hence, (4.27) implies that \(\mu _{{\varvec{G}}^+_*-b_1,\mathscr {C}}\) is \((2/\alpha )^{dk}\)-contiguous with respect to \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\). Since \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \(\zeta \)-symmetric, Proposition 3.11 and Lemma 3.17 yield \(\Delta _{\Box }(\mu _{{\varvec{G}}_*^+-b_1,\mathscr {C}},\mu _{{\varvec{G}}_*'',\mathscr {C}})<\beta .\) Because the neighborhood \(\partial b_1\) is random, Lemma 3.12 therefore yields

$$\begin{aligned} \mathbb {E}\sum _{w\in \partial b_1{\setminus } v_+}\left\| {\mu _{{\varvec{G}}_*'',w}-\mu _{{\varvec{G}}_*^+,w\rightarrow b_1}}\right\| _{\mathrm {TV}}&\le O(\beta ). \end{aligned}$$
(4.31)

Combining (4.27), (4.30) and (4.31), we obtain (4.24).

The proofs of (4.25) and (4.26) are similar. Indeed, \(\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_1}\) is the marginal of \(v_+\) in \({\varvec{G}}_*^+-b_1\), which is obtained from \({\varvec{G}}_*''\) by adding \(b_2,\ldots ,b_d\). Hence,

$$\begin{aligned}&\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_1}(\sigma )\\&\quad =\frac{\sum _{\tau \in \Omega ^{\{v_+\}\cup \partial ^2v_+}}p(\sigma )\varvec{1}\{\tau _{v_+}=\sigma \} \left\langle {{\varvec{1}\{\forall w\in \partial ^2v_+:\varvec{\sigma }_w=\tau _w\}},{\mu _{{\varvec{G}}_*'',\mathscr {C}}}}\right\rangle \prod _{i=2}^d\psi _{b_i}(\tau )}{\sum _{\tau \in \Omega ^{\{v_+\}\cup \partial ^2v_+}}p(\tau _{v_+})\left\langle {{\varvec{1}\{\forall w\in \partial ^2v_+:\varvec{\sigma }_w=\tau _w\}},{\mu _{{\varvec{G}}_*'',\mathscr {C}}}}\right\rangle \prod _{i=2}^d\psi _{b_i}(\tau )}. \end{aligned}$$

Invoking the \((\beta ,d(k-1))\)-symmetry of \(\mu _{{\varvec{G}}_*''}\) and (4.27), we obtain

$$\begin{aligned}&\mathbb {E}\left| {\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_1}(\sigma )-\frac{\sum _{\tau \in \Omega ^{\{v_+\}\cup \partial ^2v_+}}p(\sigma )\varvec{1}\{\tau _{v_+}=\sigma \} \prod _{i=2}^d\psi _{b_i}(\tau )\prod _{w\in \partial ^2v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}{\sum _{\tau \in \Omega ^{\{v_+\}\cup \partial ^2v_+}}p(\tau _{v_+}) \prod _{i=2}^d\psi _{b_i}(\tau )\prod _{w\in \partial ^2v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}}\right| \\&\quad \le \beta ^{1/4}. \end{aligned}$$

Moreover, reordering the sums and products, we simplify the last expression and find

$$\begin{aligned}&\mathbb {E}\left| {\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_1}(\sigma )-\frac{p(\sigma ) \prod _{i=2}^d\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v_+}=\sigma \}\psi _{b_i}(\tau )\prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{i=2}^d\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v_+}=\chi \}\psi _{b_i}(\tau )\prod _{w\in \partial b_i{\setminus } v_+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}}\right| \nonumber \\&\le \beta ^{1/4}. \end{aligned}$$
(4.32)

Further, (4.27) ensures that for each \(i\in [d]\) the distribution \(\mu _{{\varvec{G}}_*^+-b_i,\mathscr {C}}\) is \((2/\alpha )^{dk}\)-contiguous with respect to \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\). Consequently, since the neighbors of \(b_i\) are chosen randomly from the set \(\mathscr {C}\) of cavities, Proposition 3.11 and Lemma 3.17 yield

$$\begin{aligned} \sum _{i=2}^d\mathbb {E}\left[{\sum _{w\in \partial b_i{\setminus } v_+}\left\| {\mu _{{\varvec{G}}_*'',w}-\mu _{{\varvec{G}}_*^+,w\rightarrow b_i}}\right\| _{\mathrm {TV}}}\right]&\le O(\beta ). \end{aligned}$$
(4.33)

Combining (4.32) and (4.33), we obtain (4.25).

Moving on to (4.26), we consider \(\sigma \in \Omega ^{\{v^+\}\cup \partial ^2v^+}\). Since \({\varvec{G}}_*^+\) is obtained from \({\varvec{G}}_*''\) by adding \(v_+\) along with \(b_1,\ldots ,b_d\), we have the exact formula

$$\begin{aligned}&\mu _{{\varvec{G}}_*^+}(\sigma )\\&\quad = \frac{p(\sigma )\left\langle {{\varvec{1}\{\forall w\in \partial ^2v^+:\varvec{\sigma }_w=\sigma _w\}},{\mu _{{\varvec{G}}_*''}}}\right\rangle \prod _{i=1}^d\psi _{b_i}(\sigma )}{\sum _{\chi \in \Omega }p(\chi ) \left\langle {{\prod _{i=1}^d\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v^+}=\chi \}\psi _{b_i}(\tau ) \prod _{w\in \partial b_i{\setminus } v^+}\varvec{1}\{\varvec{\sigma }_w=\tau _w\}},{\mu _{{\varvec{G}}_*''}}}\right\rangle } \qquad (\sigma \in \Omega ^{\nabla ^2v_+}). \end{aligned}$$

Since \(\partial ^2v^+\) is a random set of cavities, the \((\beta ,d(k-1))\)-symmetry of \(\mu _{{\varvec{G}}_*''}\) and (4.27) ensure that

$$\begin{aligned}&\mathbb {E}\left| {\mu _{{\varvec{G}}_*^+}(\sigma )- \frac{p(\sigma )\prod _{w\in \partial ^2v^+}\mu _{{\varvec{G}}_*'',w}(\sigma _w)\prod _{i=1}^d\psi _{b_i}(\sigma )}{\sum _{\chi \in \Omega }p(\chi )\prod _{i=1}^d\sum _{\tau \in \Omega ^{\partial b_i}}\varvec{1}\{\tau _{v^+}=\chi \}\psi _{b_i}(\tau ) \prod _{w\in \partial b_i{\setminus } v^+}\mu _{{\varvec{G}}_*'',w}(\tau _w)}}\right| \nonumber \\&\quad \le \beta ^{1/4}. \end{aligned}$$
(4.34)

Finally, to complete the proof we combine (4.33) and (4.34). \(\quad \square \)

We set up the random factor graph \({\varvec{G}}_*^+\) so as to facilitate the verification of the BP equations. But in a sense the model is a bit ‘out of whack’ because the prior is pinned according to a configuration \(\varvec{\sigma }_+\) drawn from \(\mu _{{\varvec{G}}_*'}\) rather than \(\mu _{{\varvec{G}}_*^-}\); see (4.23). Thus, with \(\varvec{\theta }_{+}=\varvec{\theta }_{\zeta }\) and \(\mathscr {C}_+\subset \mathscr {C}\) as before, draw \(\varvec{\sigma }_{++}\) from \(\mu _{{\varvec{G}}_*^-}\) and let \({\varvec{G}}_*^{++}\) be the random factor graph obtained from \({\varvec{G}}_*^-\) by changing the prior to

$$\begin{aligned} p_{{\varvec{G}}_*^{++}}(\sigma )\propto p_{{\varvec{G}}_*^-}(\sigma )\prod _{w\in \mathscr {C}_+}\varvec{1}\{\sigma (w)=\varvec{\sigma }_{++}(w)\}. \end{aligned}$$

Hence, \(\varvec{\sigma }_{++}\) takes \(v_+,b_1,\ldots ,b_d\) into account.

Corollary 4.7

With probability \(1-O(\alpha ^{80})\) the bounds (4.24)–(4.26) hold with \({\varvec{G}}_*^+\) replaced by \({\varvec{G}}_*^{++}\).

Proof

The only difference between \({\varvec{G}}_*^{++}\) and \({\varvec{G}}_*^+\) lies in the choice of the configuration to which the variable nodes in \(\mathscr {C}_+\) get pinned. But since \({\varvec{G}}_*^+\) is obtained from \({\varvec{G}}_*''\) by the mere addition of d constraint nodes \(b_1,\ldots ,b_d\), (2.1) shows that with probability \(1-O(\alpha ^{80})\) the distribution of \(\varvec{\sigma }_{++}\) is \(\alpha ^{-1}\)-contiguous with respect to that of \(\varvec{\sigma }_{+}\). Thus, the assertion follows from Lemma 4.6. \(\quad \square \)

We are finally ready to go back to the random factor graph \({\varvec{G}}_*''\). Indeed, basically the only difference between \({\varvec{G}}_*^{++}\) and \({\varvec{G}}_*''\) is that the former has one more variable node, along with d adjacent constraint nodes. But since the number of variable nodes of \({\varvec{G}}_*''\) is random, this difference should hardly be noticeable. Moreover, the distribution of \({\varvec{G}}_*''\) is invariant under permutations of its variable nodes. Thus, whatever we can prove for the last variable node \(v_+\) of \({\varvec{G}}_*^{++}\) carries over to a random variable node of \({\varvec{G}}_*''\). The following corollary makes this precise.

Corollary 4.8

With probability \(1-O(\alpha ^{70})\) we have

$$\begin{aligned}&\sum _{v\in V_n}\sum _{b\in \partial v}\sum _{\sigma \in \Omega }\nonumber \\&\quad \left| {\mu _{{\varvec{G}}_*'',b\rightarrow v}(\sigma )-\frac{\sum _{\tau \in \Omega ^{\partial b}}\varvec{1}\{\tau _v=\sigma \}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow b}(\tau _w)}{\sum _{\tau \in \Omega ^{\partial b}}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow b}(\tau _w)}}\right| \le n\alpha ^{60d}, \end{aligned}$$
(4.35)
$$\begin{aligned}&\sum _{v\in V_n}\sum _{b\in \partial v}\sum _{\sigma \in \Omega }\nonumber \\&\quad \left| {\mu _{{\varvec{G}}_*'',v\rightarrow b}(\sigma )- \frac{p(\sigma )\prod _{a\in \partial v{\setminus } b}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _{v}=\sigma \}\psi _{a}(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow a}(\tau _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v{\setminus } b}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _{v}=\chi \}\psi _{a}(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow a}(\tau _w)}}\right| \nonumber \\&\qquad \le n\alpha ^{60d} \end{aligned}$$
(4.36)
$$\begin{aligned}&\sum _{v\in V_n}\sum _{\sigma \in \Omega ^{\{v\}\cup \partial ^2v}}\nonumber \\&\quad \left| {\mu _{{\varvec{G}}_*''}(\sigma ) -\frac{p(\sigma )\prod _{a\in \partial v}\psi _{a}(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow a}(\sigma _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _{v}=\chi \}\psi _{a}(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{{\varvec{G}}_*'',w\rightarrow a}(\tau _w)} }\right| <n\alpha ^{60d}. \end{aligned}$$
(4.37)

Proof

Consider the event \({{\mathscr {E}}}\) that in \({\varvec{G}}_*''\), for the variable node \(v_{n-\varvec{\omega }}\) with the largest index the estimate

$$\begin{aligned}&\sum _{b\in \partial v_{n-\varvec{\omega }}}\sum _{\sigma \in \Omega } \left| {\mu _{{\varvec{G}}_*'',b\rightarrow v_{n-\varvec{\omega }}}(\sigma )-\frac{\sum _{\tau \in \Omega ^{\partial b}}\varvec{1}\{\tau (v_{n-\varvec{\omega }})=\sigma \}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v_{n-\varvec{\omega }}}\mu _{{\varvec{G}}_*'',w\rightarrow b}(\tau (w))}{\sum _{\tau \in \Omega ^{\partial b}}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v_{n-\varvec{\omega }}}\mu _{{\varvec{G}}_*'',w\rightarrow b}(\tau _w)}}\right| \nonumber \\&\quad \le \alpha ^{69} \end{aligned}$$
(4.38)

holds. Since \({\varvec{G}}_*^{++}\) is obtained from \({\varvec{G}}_*^+\) by the same process that produces \({\varvec{G}}_*''\) from \({\varvec{G}}_*'\), Lemma 4.4 and Corollary 4.7 show that \(\mathbb {P}\left[{{\varvec{G}}_*''\in {{\mathscr {E}}}}\right]=1-O(\alpha ^{70})\). But since the distribution of \({\varvec{G}}_*''\) is invariant under permutations of the \(n-O(\omega )\) variable nodes of degree d, we can replace \(v_{n-\varvec{\omega }}\) in (4.38) by a random variable node of degree d. Thus, we obtain (4.35). The two bounds (4.36) and (4.37) follow analogously. \(\quad \square \)

To complete the proof of Proposition 4.2, we finally need to get from \({\varvec{G}}_*''\) back to \({\varvec{G}}_*'\). Thus, we need to undo the additional pinning of the cavities that was required to verify the BP equations (4.35)–(4.37). The elegant insight that makes this possible is that (4.35)–(4.37) really just describe a property of the joint distribution of the second neighborhoods of the variable nodes \(v_i\), \(i\ne n-\varvec{\omega }\). Indeed, by Lemma 4.3 the standard messages, defined via the removal of a few constraints adjacent to a single variable node v, can be expressed easily in terms of the joint distribution of the second neighborhood of v. Furthermore, Lemma 4.1 implies that the enhanced measure \({\hat{\mu }}_{{\varvec{G}}_*'}\) describing the second neighborhood distributions is \(\xi ^{1/4}\)-extremal, with \(\xi \) near the top of the pecking order (4.3). In effect, \({\hat{\mu }}_{{\varvec{G}}_*'}\) is impervious to the additional pinning required to go from \({\varvec{G}}_*'\) to \({\varvec{G}}_*''\). Let us formalize this argument to finish the proof of Proposition 4.2.

Proof of Proposition 4.2

By Lemma 4.1 the measure \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal with probability \(1-\omega ^{-1}\). Consequently, since \({\varvec{G}}_*'\) is obtained by deleting \(O(\omega )\) constraints, Lemma 3.17 and (2.1) ensure that \({\hat{\mu }}_{{\varvec{G}}_*'}\) is \(\vartheta \)-extremal with probability \(1-\alpha ^{10}\). Furthermore, \({\hat{\mu }}_{{\varvec{G}}_*''}\) is nothing but the conditional distribution \({\hat{\mu }}_{{\varvec{G}}_*'}\) given the event \(S_+\) that the spins of the \(\varvec{\theta }_\zeta \) cavities \(\mathscr {C}_+\) coincide with the ones of the reference configuration \(\varvec{\sigma }_+\). Since \(\varvec{\sigma }_+\) is drawn from \({\hat{\mu }}_{{\varvec{G}}_*'}\), with probability at least \(1-\zeta \) we have

$$\begin{aligned} {\hat{\mu }}_{{\varvec{G}}_*'}(S_+)\ge \zeta q^{-\varvec{\theta }_\zeta }. \end{aligned}$$

If so, and if \({\hat{\mu }}_{{\varvec{G}}_*'}\) is \(\vartheta \)-extremal, then (4.3) and Corollary 3.18 imply that \(\Delta _{\Box }({\hat{\mu }}_{{\varvec{G}}_*''},{\hat{\mu }}_{{\varvec{G}}_*'})\le \beta \). In summary,

$$\begin{aligned} \mathbb {P}\left[{\Delta _{\Box }({\hat{\mu }}_{{\varvec{G}}_*''},{\hat{\mu }}_{{\varvec{G}}_*'})\le \beta }\right]&\ge 1-2\zeta . \end{aligned}$$

In addition (2.1) ensures that with probability at least \(1-\alpha ^{10}\),

$$\begin{aligned} \left| {\left\{ {a\in F({\varvec{G}}_*'):\min _{\sigma \in \Omega ^k}\psi _a(\sigma )\le \alpha }\right\} }\right|&\le n\alpha ^{1000d}. \end{aligned}$$
(4.39)

Further, by Corollary 4.8 the bounds (4.35)–(4.37) hold with probability \(1-O(\alpha ^{70})\).

Thus, we are left to prove statements (i) and (ii) under the assumption that \(\Delta _{\Box }({\hat{\mu }}_{{\varvec{G}}_*''},{\hat{\mu }}_{{\varvec{G}}_*'})\le \beta \) and that (4.35)–(4.37) and (4.39) hold. Applying Lemma 3.12, we obtain

$$\begin{aligned} \sum _{v\in V}\left\| {{\hat{\mu }}_{{\varvec{G}}_*'',v}-{\hat{\mu }}_{{\varvec{G}}_*',v}}\right\| _{\mathrm {TV}}&\le O(\beta ). \end{aligned}$$
(4.40)

Further, Lemma 4.3 shows that the messages \(\mu _{{\varvec{G}}_*'',v\rightarrow a}\), \(\mu _{{\varvec{G}}_*'',a\rightarrow v}\) and \(\mu _{{\varvec{G}}_*',v\rightarrow a}\), \(\mu _{{\varvec{G}}_*',a\rightarrow v}\) can be expressed in terms of the marginal distributions \({\hat{\mu }}_{{\varvec{G}}_*'',v}\) and \({\hat{\mu }}_{{\varvec{G}}_*',v}\) of the depth-two neighborhood. Indeed, according to (3.1)–(3.2), for any \(a\in \partial v\) and \(\sigma \in \Omega \),

$$\begin{aligned} \mu _{{\varvec{G}}_*'',v\rightarrow a}(\sigma )&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/\psi _a(\varvec{\sigma })},{{\hat{\mu }}_{{\varvec{G}}_*'',v}}}\right\rangle }{\left\langle {{1/\psi _a(\varvec{\sigma })},{{\hat{\mu }}_{{\varvec{G}}_*'',v}}}\right\rangle },\\ \mu _{{\varvec{G}}_*'',a\rightarrow v}(\sigma )&=\frac{\left\langle {{\varvec{1}\{\varvec{\sigma }_v=\sigma \}/(p(\sigma )\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{{\hat{\mu }}_{{\varvec{G}}_*'',v}}}\right\rangle }{\left\langle {{1/(p(\varvec{\sigma }_v)\prod _{b\in \partial v{\setminus } a}\psi _b(\varvec{\sigma }))},{{\hat{\mu }}_{{\varvec{G}}_*'',v}}}\right\rangle }, \end{aligned}$$

and analogously for \({\varvec{G}}_*'\). Hence, the total variation bound (4.40) and (4.39) imply that

$$\begin{aligned} \sum _{v\in V}\sum _{a\in \partial v}\left\| {\mu _{{\varvec{G}}_*'',v\rightarrow a}-\mu _{{\varvec{G}}_*',v\rightarrow a}}\right\| _{\mathrm {TV}}+\left\| {\mu _{{\varvec{G}}_*'',a\rightarrow v}-\mu _{{\varvec{G}}_*',a\rightarrow v}}\right\| _{\mathrm {TV}}&=O(n\alpha ^{900d}). \end{aligned}$$
(4.41)

Combining (4.37) and (4.39)–(4.41), we obtain assertion (ii). Further, (4.35), (4.39) and (4.41) readily yield

$$\begin{aligned}&\sum _{v\in V}\sum _{b\in \partial v}\sum _{\sigma \in \Omega } \left| {\mu _{{\varvec{G}}_*',b\rightarrow v}(\sigma )-\frac{\sum _{\tau \in \Omega ^{\partial b}}\varvec{1}\{\tau (v)=\sigma \}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*',w\rightarrow b}(\tau (w))}{\sum _{\tau \in \Omega ^{\partial b}}\psi _{b}(\tau )\prod _{w\in \partial b{\setminus } v}\mu _{{\varvec{G}}_*',w\rightarrow b}(\tau (w))}}\right| \nonumber \\&\le O(n\alpha ^{60d}). \end{aligned}$$
(4.42)

Moreover, combining (4.36), (4.39), (4.40) and (4.42), we obtain

$$\begin{aligned} \sum _{v\in V}\sum _{b\in \partial v}\sum _{\sigma \in \Omega } \left| {\mu _{{\varvec{G}}_*',v\rightarrow b}(\sigma )-\frac{p(\sigma )\prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*',a\rightarrow v}(\sigma )}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v{\setminus } b}\mu _{{\varvec{G}}_*',a\rightarrow v}(\chi )}}\right|&\le O(n\alpha ^{50d}). \end{aligned}$$
(4.43)

Finally, (4.42) and (4.43) show that the standard messages form an \(O(\alpha ^{40d})\)-BP fixed point.

\(\square \)
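To make the approximate fixed point property concrete, here is a minimal Python sketch of one synchronous round of the message updates appearing in (4.42) and (4.43), run on a toy factor graph. The data layout (`factors` as a dict, messages keyed by directed pairs) is ours and not part of the paper; on a tree, as in the toy example, the updates reach an exact fixed point after finitely many rounds.

```python
import math
from itertools import product

def bp_step(G, msgs_va):
    """One synchronous round of Belief Propagation, cf. (4.42)-(4.43).

    G = (omega, p, factors): spin set, prior over omega, and a dict
    mapping a factor name to (tuple of variable names, weight psi).
    msgs_va maps (v, a) to a distribution over omega; returns the
    updated variable-to-factor and factor-to-variable messages.
    """
    omega, p, factors = G
    # factor-to-variable messages, as in (4.42)
    msgs_av = {}
    for a, (nbs, psi) in factors.items():
        for i, v in enumerate(nbs):
            m = {s: sum(psi(tau) * math.prod(msgs_va[(u, a)][tau[j]]
                                             for j, u in enumerate(nbs) if j != i)
                        for tau in product(omega, repeat=len(nbs)) if tau[i] == s)
                 for s in omega}
            z = sum(m.values())
            msgs_av[(a, v)] = {s: m[s] / z for s in omega}
    # variable-to-factor messages, as in (4.43)
    factors_of = {}
    for a, (nbs, _) in factors.items():
        for v in nbs:
            factors_of.setdefault(v, []).append(a)
    new_va = {}
    for v, alist in factors_of.items():
        for a in alist:
            m = {s: p[s] * math.prod(msgs_av[(b, v)][s] for b in alist if b != a)
                 for s in omega}
            z = sum(m.values())
            new_va[(v, a)] = {s: m[s] / z for s in omega}
    return new_va, msgs_av

# toy example: one soft "equality" constraint on two binary spins
omega = (0, 1)
p = {0: 0.5, 1: 0.5}
factors = {'a': (('v', 'w'), lambda tau: 2.0 if tau[0] == tau[1] else 1.0)}
init = {(v, 'a'): {0: 0.5, 1: 0.5} for v in ('v', 'w')}
va, av = bp_step((omega, p, factors), init)
```

By symmetry all messages in this example remain uniform, so a single round already produces a fixed point.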

5 The Free Energy: Upper Bound

5.1 Outline

In this section we derive the following upper bound on the free energy.

Proposition 5.1

Assume that POS is satisfied. Then

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\varvec{G}})}\right]&\le \inf _{\pi \in \mathfrak {D}^\star }\mathscr {B}(\pi ),&\limsup _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\mathbb {G}})}\right]&\le \inf _{\pi \in \mathfrak {D}^\star }\mathscr {B}(\pi ). \end{aligned}$$

The proof of Proposition 5.1 consists of two parts. First, we will prove that any \(\mu \in \mathscr {K}\) yields an upper bound on \(\mathbb {E}\left[{\log Z({\varvec{G}})}\right]\). Specifically, recalling the notation from Sect. 3.1, let

$$\begin{aligned} \mathscr {B}'(\mu )&=\mathbb {E}\log \left\langle {{\bigoplus _{i=1}^n\varvec{\varphi }_i},{\mu }}\right\rangle ,&\mathscr {B}''(\mu )&=\mathbb {E}\log \left\langle {{\bigoplus _{1\le i\le (k-1)dn/k}\varvec{\psi }_{1,i}},{\mu }}\right\rangle . \end{aligned}$$

Then we have the following generic upper bound, which may be of interest in its own right.

Proposition 5.2

Assume that POS is satisfied. Then \(\mathbb {E}\left[{\log Z({\varvec{G}})}\right]\le o(n)+\mathscr {B}'(\mu )-\mathscr {B}''(\mu )\) for any \(\mu \in \mathscr {K}\).

The proof of Proposition 5.2, based on the interpolation method [38], is relatively standard, although the fact that we deal with regular graphs requires a bit of care. The details are carried out in Sect. 5.2. This is the only place where condition POS is required.

The second step toward the proof of Proposition 5.1 is to show that for \(\mu \) drawn from \(\pi \in \mathfrak {D}^\star \) the upper bound from Proposition 5.2 boils down to the expression \(\mathscr {B}(\pi )\).

Proposition 5.3

For any \(\pi \in \mathfrak {D}^\star \) we have \(\mathscr {B}(\pi )=\mathbb {E}[\mathscr {B}'(\varvec{\mu }^\pi )-\mathscr {B}''(\varvec{\mu }^\pi )].\)

We prove Proposition 5.3 in Sect. 5.3.

Proof of Proposition 5.1

The first assertion is immediate from Propositions 5.2 and 5.3. To obtain the second assertion, we apply Azuma’s inequality and (2.1) to see that \(n^{-0.51}\left| {\log Z({\varvec{G}})-\mathbb {E}\log Z({\varvec{G}})}\right| \) converges to zero in probability. Hence, Fact 2.2 and Bayes’ rule show that \(\mathbb {E}\log Z({\mathbb {G}})=\mathbb {E}\log Z({\varvec{G}})+o(n)\), and thus the second assertion follows from the first. \(\quad \square \)

5.2 Proof of Proposition 5.2

We construct a family of random factor graph models parametrized by \(t\in [0,1]\). The free energy of the model at \(t=1\) will be easy to compute, and we will see that it is (nearly) equal to \(\mathscr {B}'(\mu )-\mathscr {B}''(\mu )\). The model with \(t=0\) essentially coincides with \({\varvec{G}}\). Furthermore, we will show that the derivative of the free energy is non-negative for all t, thus obtaining the desired upper bound on \(\mathbb {E}\log Z({\varvec{G}})\).

To construct this interpolating family, fix \(\mu \in \mathscr {K}\) and a small \(\varepsilon >0\). For \(t\in [0,1]\) let

$$\begin{aligned} \varvec{m}_t&=\mathrm{Po}((1-t)\exp (-\varepsilon )dn/k),\\ \varvec{m}_t'&=\mathrm{Po}(t\exp (-\varepsilon )dn),\\ \varvec{m}_t''&=\mathrm{Po}((1-t)(k-1)\exp (-\varepsilon )dn/k), \end{aligned}$$

all three mutually independent and independent of everything else. Given \(k\varvec{m}_t+\varvec{m}_t'\le dn\), we define the random factor graph \({\varvec{G}}_t\) as follows.

INT1::

the set of variable nodes is \(\mathscr {V}=\left\{ {s}\right\} \cup V_n\), and the set of spins is \(\mathscr {X}=\Omega \cup [0,1]\).

INT2::

the set of constraint nodes is

$$\begin{aligned} \mathscr {F}_{t}=\left\{ {a_1,\ldots ,a_{\varvec{m}_t},a_1',\ldots ,a_{\varvec{m}_t'}',a_1'',\ldots ,a_{\varvec{m}_t''}''}\right\} . \end{aligned}$$
INT3::

each constraint node \(a_i\) independently chooses a weight function \(\psi _{a_i}\) from P, and the \(a_i\) are joined to the variable nodes \(v_1,\ldots ,v_n\) by a random pairing of \(V_n\times [d]\) and \(\{a_1,\ldots ,a_{\varvec{m}_t}\}\times [k]\).

INT4::

each of the constraint nodes \(a_i'\), \(i\in [\varvec{m}_t']\), is adjacent to the variable node s and one further variable node from \(v_1,\ldots ,v_n\); the links between the \(a_i'\) and the \(v_j\) are constructed by choosing a random pairing between the \(\varvec{h}_i'\)-clone of each \(a_i'\) and the clones in \(V_n\times [d]\) that are not paired to a constraint node \(a_h\). The weight function associated with \(a_i'\) reads

$$\begin{aligned} \psi _{a_i'}(s,\sigma )&=\sum _{\tau \in \Omega ^k}\varvec{1}\{\tau _{\varvec{h}_i'}=\sigma \}\varvec{\psi }_i'(\tau )\prod _{h\ne \varvec{h}_i'}\mu _{s,\varvec{x}_{i,h}'}(\tau _h). \end{aligned}$$
INT5::

the constraint nodes \(a_i''\), \(i\in [\varvec{m}_t'']\), are unary, adjacent to s only. Their weight functions read

$$\begin{aligned} \psi _{a_i''}(s)&=\sum _{\tau \in \Omega ^k}\varvec{\psi }_i''(\tau )\prod _{h=1}^k\mu _{s,\varvec{x}_{i,h}''}(\tau _h). \end{aligned}$$
INT6::

the prior \(\mathfrak {p}\) is a product measure

$$\begin{aligned} {\mathrm d}\mathfrak {p}(\sigma )=\varvec{1}\{\sigma _s\in [0,1],\,\forall 1\le i\le n:\sigma _{v_i}\in \Omega \} \prod _{i=1}^np\left( {\sigma _{x_i}}\right) {\mathrm d}\sigma _s; \end{aligned}$$

thus, for each \(v_i\in V_n\) a spin from \(\Omega \) is chosen independently from p, and \(\sigma _s\) is uniform on [0, 1].

Thus, the total weight, partition function and Boltzmann distribution of \({\varvec{G}}_t\) read

$$\begin{aligned} \psi _{{\varvec{G}}_t}(\sigma )&=\prod _{i=1}^{\varvec{m}_t}\psi _{a_i}(\sigma )\prod _{i=1}^{\varvec{m}_t'}\psi _{a_i'}(\sigma )\prod _{i=1}^{\varvec{m}_t''}\psi _{a_i''}(\sigma ),&(\sigma _s\in [0,1],\ \sigma _{x_i}\in \Omega ),\nonumber \\ Z({\varvec{G}}_t)&=\sum _{\sigma _{x_1},\ldots ,\sigma _{x_n}\in \Omega }\int _0^1\psi _{{\varvec{G}}_t}(\sigma ){\mathrm d}\sigma _s \prod _{i=1}^n p(\sigma _{x_i}),&{\mathrm d}\mu _{{\varvec{G}}_t}(\sigma )&=\frac{\psi _{{\varvec{G}}_t}(\sigma )}{Z({\varvec{G}}_t)}{\mathrm d}\mathfrak {p}(\sigma ). \end{aligned}$$
(5.1)

The following lemma establishes the monotonicity of the free energy in t; its proof is the only place where we use condition POS.

Lemma 5.4

Suppose that POS is satisfied. Then uniformly for all \(t\in (0,1)\) we have

$$\begin{aligned} \frac{1}{n}\frac{\partial }{\partial t}\mathbb {E}\left[{\log Z({\varvec{G}}_t)}\right]\ge o(1). \end{aligned}$$

Proof

We recall the derivative of the Poisson density: for any \(\lambda >0\), \(\ell \ge 1\),

$$\begin{aligned} \frac{\partial }{\partial \lambda }\mathbb {P}\left[{\mathrm{Po}(\lambda )=\ell }\right]&= \frac{\partial }{\partial \lambda }\frac{\lambda ^\ell }{\ell !}\exp (-\lambda )=\frac{\lambda ^{\ell -1}}{(\ell -1)!}\exp (-\lambda )\nonumber \\&\quad - \frac{\lambda ^\ell }{\ell !}\exp (-\lambda )=\mathbb {P}\left[{\mathrm{Po}(\lambda )=\ell -1}\right]-\mathbb {P}\left[{\mathrm{Po}(\lambda )=\ell }\right]. \end{aligned}$$
(5.2)
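As a quick numerical sanity check of (5.2) (illustration only, not part of the proof; the helper name `po_pmf` is ours), a central finite difference in \(\lambda \) matches the right-hand side:

```python
import math

def po_pmf(lam, ell):
    # Poisson point mass P[Po(lam) = ell]
    return lam**ell / math.factorial(ell) * math.exp(-lam)

lam, ell, h = 3.7, 5, 1e-6
# central finite difference approximates the lambda-derivative of the density
fd = (po_pmf(lam + h, ell) - po_pmf(lam - h, ell)) / (2 * h)
rhs = po_pmf(lam, ell - 1) - po_pmf(lam, ell)
assert abs(fd - rhs) < 1e-8
```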

The variable t affects the distribution of \({\varvec{G}}_t\) by way of the variables \(\varvec{m}_t,\varvec{m}_t',\varvec{m}_t''\). Specifically, let

$$\begin{aligned} \lambda _t&=(1-t)\exp (-\varepsilon )dn/k,&\lambda _t'&=t\exp (-\varepsilon )dn,&\lambda _t''&=(1-t)(k-1)\exp (-\varepsilon )dn/k. \end{aligned}$$

Recall that \(\varvec{m}_t,\varvec{m}_t'\) are conditional Poisson variables \(\mathrm{Po}(\lambda _t)\) and \(\mathrm{Po}(\lambda _t')\), respectively, given that \(k\varvec{m}_t+\varvec{m}_t'\le dn\). Since \(\varepsilon >0\) is independent of n, (5.2) shows that for any two integers \(m_t,m_t'\ge 1\),

$$\begin{aligned} \frac{1}{n}\frac{\partial }{\partial t} \mathbb {P}\left[{\varvec{m}_t=m_t,\ \varvec{m}_t'=m_t'}\right]&=\exp (-\Omega (n))+ \frac{1}{n}\frac{\partial }{\partial t}\mathbb {P}\left[{\mathrm{Po}(\lambda _t)=m_t}\right]\mathbb {P}\left[{\mathrm{Po}(\lambda _t')=m_t'}\right]\nonumber \\&=\exp (-\Omega (n))\nonumber \\&\qquad +\exp (-\varepsilon )\frac{d}{k}\left( {\mathbb {P}\left[{\mathrm{Po}(\lambda _t)=m_t}\right]-\mathbb {P}\left[{\mathrm{Po}(\lambda _t)=m_t-1}\right]}\right) \mathbb {P}\left[{\mathrm{Po}(\lambda _t')=m_t'}\right]\nonumber \\&\qquad +\exp (-\varepsilon )d\left( {\mathbb {P}\left[{\mathrm{Po}(\lambda _t')=m_t'-1}\right]-\mathbb {P}\left[{\mathrm{Po}(\lambda _t')=m_t'}\right]}\right) \mathbb {P}\left[{\mathrm{Po}(\lambda _t)=m_t}\right]. \end{aligned}$$
(5.3)

Further, given the event \(k\varvec{m}_t+\varvec{m}_t'\le dn-k\) let \({\varvec{G}}_t'\) be the random factor graph obtained from \({\varvec{G}}_t\) by adding one more constraint node \(a_{\varvec{m}_t+1}\) as per INT3. Similarly, given \(k\varvec{m}_t+\varvec{m}_t'\le dn-1\) obtain \({\varvec{G}}_t''\) from \({\varvec{G}}_t\) by adding \(a_{\varvec{m}_t'+1}'\) according to INT4. Additionally, obtain \({\varvec{G}}_t'''\) from \({\varvec{G}}_t\) by adding a unary constraint node \(a_{\varvec{m}_t''+1}''\) as described in INT5. Since \(\varvec{m}_t''\) is independent of \(\varvec{m}_t,\varvec{m}_t'\), (5.2) and (5.3) yield

$$\begin{aligned}&\frac{1}{n}\frac{\partial }{\partial t}\mathbb {E}[\log Z({\varvec{G}}_t)]=\exp \left( {-\Omega (n)}\right) \nonumber \\&\qquad +\sum _{\begin{array}{c} m_t,m_t',m_t''\ge 1\\ km_t+m_t''\le dn-k \end{array}} \mathbb {E}\left[{\log Z({\varvec{G}}_t)\,\Bigg |\, \begin{pmatrix}\varvec{m}_t\\ \varvec{m}_t'\\ \varvec{m}_t''\end{pmatrix} =\begin{pmatrix}m_t\\ m_t'\\ m_t''\end{pmatrix}}\right] \cdot \frac{1}{n}\frac{\partial }{\partial t}\mathbb {P}\left[{\begin{pmatrix}\varvec{m}_t\\ \varvec{m}_t'\\ \varvec{m}_t''\end{pmatrix} =\begin{pmatrix}m_t\\ m_t'\\ m_t''\end{pmatrix}}\right]\nonumber \\&\quad =\exp \left( {-\Omega (n)}\right) \nonumber \\&\qquad -\exp \left( {-\varepsilon }\right) \frac{d}{k}\bigg [\mathbb {E}\log \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)} -k\mathbb {E}\log \frac{Z({\varvec{G}}_t'')}{ Z({\varvec{G}}_t)}+(k-1)\mathbb {E}\log \frac{Z({\varvec{G}}_t''')}{Z({\varvec{G}}_t)}\bigg ]. \end{aligned}$$
(5.4)

Hence, it suffices to prove that for all \(0<t<1\),

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)}}\right]-k\mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t'')}{Z({\varvec{G}}_t)}}\right]+(k-1)\mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t''')}{Z({\varvec{G}}_t)}}\right]&\le 0. \end{aligned}$$
(5.5)

By the definition of the Boltzmann distribution (5.1),

$$\begin{aligned} \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)}&=\left\langle {{\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle ,&\frac{Z({\varvec{G}}_t'')}{Z({\varvec{G}}_t)}&=\left\langle {{\psi _{a_{\varvec{m}_t'+1}'}},{\mu _{{\varvec{G}}_t}}}\right\rangle ,&\frac{Z({\varvec{G}}_t''')}{Z({\varvec{G}}_t)}&=\left\langle {{\psi _{a_{\varvec{m}_t''+1}''}},{\mu _{{\varvec{G}}_t}}}\right\rangle . \end{aligned}$$

Hence, since all weight functions take values in (0, 2) and therefore \(\left\langle {{\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle \in (0,2)\), the logarithm admits the expansion

$$\begin{aligned} \log \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)}&=\log \left\langle {{\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle =-\sum _{\ell \ge 1}\frac{1}{\ell }\left\langle {{1-\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle ^\ell . \end{aligned}$$
(5.6)
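The expansion used here is the elementary series \(\log x=-\sum _{\ell \ge 1}(1-x)^\ell /\ell \), which converges for \(0<x<2\), exactly the range in which the weight averages live. A small numerical illustration (ours, not from the text):

```python
import math

def log_via_series(x, terms=400):
    # log x = -sum_{l >= 1} (1 - x)^l / l, valid for 0 < x < 2
    return -sum((1 - x)**l / l for l in range(1, terms + 1))

for x in (0.3, 0.9, 1.5):
    # agreement with math.log on the interval of interest
    assert abs(log_via_series(x) - math.log(x)) < 1e-9
```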

Further, in terms of the kernel representation \({\dot{\mu }}_{{\varvec{G}}_t}\) of the Boltzmann distribution we obtain

$$\begin{aligned} \mathbb {E}\left[{\left\langle {{1-\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle ^\ell }\right]&= \mathbb {E}\left[{\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1\prod _{i=1}^k{\dot{\mu }}_{{\varvec{G}}_t,z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right]. \end{aligned}$$
(5.7)

Combining (5.6) and (5.7) yields

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)}}\right]&=-\mathbb {E}\left[{\sum _{\ell \ge 1}\frac{1}{\ell }\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1\prod _{i=1}^k{\dot{\mu }}_{{\varvec{G}}_t,z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right]. \end{aligned}$$
(5.8)

Due to (2.1) and Fubini’s theorem, we can exchange the sum and the expectation in (5.8); indeed, (2.1) yields

$$\begin{aligned} \sum _{\ell \ge 1}\mathbb {E}\left| {\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1\prod _{i=1}^k{\dot{\mu }}_{{\varvec{G}}_t,z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right|&\le \sum _{\ell \ge 1}\mathbb {E}\left[{\max _{\sigma \in \Omega ^k}|1-\varvec{\psi }(\sigma )|^\ell }\right]<\infty . \end{aligned}$$

Thus, (5.8) becomes

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t')}{Z({\varvec{G}}_t)}}\right]&=-\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left[{\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1\prod _{i=1}^k{\dot{\mu }}_{{\varvec{G}}_t,z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right]. \end{aligned}$$
(5.9)

Following similar steps, we obtain expansions for the other two terms from (5.5) as well:

$$\begin{aligned}&\mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t'')}{Z({\varvec{G}}_t)}}\right]\nonumber \\&\quad =-\frac{1}{k}\sum _{h=1}^k\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left[{\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1{\dot{\mu }}_{{\varvec{G}}_t,z,\varvec{x}_h}(\sigma _h) \prod _{i\ne h}\mu _{z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right], \end{aligned}$$
(5.10)
$$\begin{aligned}&\mathbb {E}\left[{\log \frac{Z({\varvec{G}}_t''')}{Z({\varvec{G}}_t)}}\right]=-\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left[{\left( {1-\sum _{\sigma \in \Omega ^k}\varvec{\psi }(\sigma )\int _0^1\prod _{i=1}^k\mu _{z,\varvec{x}_i}(\sigma _i){\mathrm d}z}\right) ^\ell }\right]. \end{aligned}$$
(5.11)

Finally, the desired inequality (5.5) follows from POS together with (5.9), (5.10) and (5.11). \(\quad \square \)

Proof of Proposition 5.2

Integrating t from 0 to 1 and applying Lemma 5.4, we obtain for any \(\varepsilon >0\),

$$\begin{aligned} \mathbb {E}[\log Z({\varvec{G}}_0)]\le \mathbb {E}[\log Z({\varvec{G}}_1)]+o(n). \end{aligned}$$
(5.12)

Letting

$$\begin{aligned} Y=\log \int _0^1\prod _{i=1}^{\varvec{m}_0''}\sum _{\sigma \in \Omega ^k}\varvec{\psi }_{i}''(\sigma ) \prod _{h=1}^k\mu _{z,\varvec{x}_{i,h}''}(\sigma _h){\mathrm d}z, \end{aligned}$$

we claim that for a certain number \(c=c(P)>0\),

$$\begin{aligned} \mathbb {E}\log Z({\varvec{G}})+\mathbb {E}[Y]\le \mathbb {E}\log Z({\varvec{G}}_1) +\varepsilon c n. \end{aligned}$$
(5.13)

Indeed, at \(t=0\) the variable node s is adjacent to the constraint nodes \(a_i''\), \(i\in [\varvec{m}_0'']\), only. Hence, \({\varvec{G}}_0\) decomposes into connected components, one of which comprises s and the \(a_i''\). Let \({\varvec{G}}_0''\) be this component, and let \({\varvec{G}}_0'\) be the remainder of \({\varvec{G}}_0\). Then by construction we have \(\mathbb {E}\log Z({\varvec{G}}_0'')=\mathbb {E}[Y]\). Thus, (5.12) yields

$$\begin{aligned} \mathbb {E}[\log Z({\varvec{G}}_0')]+\mathbb {E}[Y]&=\mathbb {E}[\log Z({\varvec{G}}_0)]\le \mathbb {E}[\log Z({\varvec{G}}_1)]+o(n). \end{aligned}$$
(5.14)

Furthermore, \({\varvec{G}}_0'\) consists of the variable nodes \(v_1,\ldots ,v_n\) and the constraint nodes \(a_1,\ldots ,a_{\varvec{m}_0}\), where \(\varvec{m}_0\) is a Poisson variable \(\mathrm{Po}(\exp (-\varepsilon )dn/k)\) conditioned on taking a value of at most dn/k. Thus, we can construct a random factor graph with the same distribution as \({\varvec{G}}\) from \({\varvec{G}}_0'\) by simply adding \(dn/k-\varvec{m}_0\) further random k-ary constraint nodes as per INT3. Since all weight functions \(\psi \in \Psi \) take values in (0, 2), we obtain \(c=c(P)>0\) such that

$$\begin{aligned} \mathbb {E}\log Z({\varvec{G}})\le \mathbb {E}\log Z({\varvec{G}}_0')+\varepsilon cn. \end{aligned}$$
(5.15)

Combining (5.14) and (5.15), we obtain (5.13).

We further claim that there is a constant \(c'=c'(P)>0\) such that

$$\begin{aligned} \frac{1}{n}\mathbb {E}[Y]\le \varepsilon c'+o(1)+\mathscr {B}''(\mu ). \end{aligned}$$
(5.16)

Indeed, \(\mathscr {B}''(\mu )=\mathbb {E}[Y\mid \varvec{m}_0''=(k-1)dn/k]\). In other words, we can think of \(\mathscr {B}''(\mu )\) as the free energy of \({\varvec{G}}_0''\) given that \(\varvec{m}_0''=(k-1)dn/k\). Thus, obtain \({\varvec{G}}_0'''\) from \({\varvec{G}}_0''\) by adding \((k-1)dn/k-\varvec{m}_0''\) more constraint nodes according to INT5, or by removing some random constraint nodes if \(\varvec{m}_0''>(k-1)dn/k\). Then \(\mathscr {B}''(\mu )=\mathbb {E}\log Z({\varvec{G}}_0''')\). Since \(\varvec{m}_0''\) is a Poisson variable with mean \(\exp (-\varepsilon )(k-1)dn/k\), with probability \(1-\exp (-\Omega (n))\) we do not need to add or remove more than \(2\varepsilon (k-1)dn\) constraint nodes. The tail bound (2.1) together with the Chernoff bound therefore implies that (5.16) holds for a certain \(c'=c'(P)\).

By similar arguments, for a certain \(c''=c''(P)\) we have

$$\begin{aligned} \frac{1}{n}\mathbb {E}[\log Z({\varvec{G}}_1)]\le \mathscr {B}'(\mu )+ \varepsilon c''+o(1). \end{aligned}$$
(5.17)

Indeed, \(\mathscr {B}'(\mu )\) is nothing but the conditional expectation of \(\log Z({\varvec{G}}_1)\) given that \(\varvec{m}_1'=dn\). Hence, if we pad \({\varvec{G}}_1\) by adding the missing \(dn-\varvec{m}_1'\) constraint nodes \(a_i'\) according to INT4, then the total number of constraints added does not exceed \(2\varepsilon dn\) with probability \(1-\exp (-\Omega (n))\). Hence, (5.17) follows from (2.1) and the Chernoff bound.

Finally, combining (5.12)–(5.17), we conclude that

$$\begin{aligned} \frac{1}{n}\mathbb {E}\log Z({\varvec{G}})\le \mathscr {B}'(\mu )-\mathscr {B}''(\mu )+\varepsilon c'''+o(1) \end{aligned}$$

for a certain \(c'''=c'''(P)>0\). Since this is true for any fixed \(\varepsilon >0\), the assertion follows. \(\quad \square \)

5.3 Proof of Proposition 5.3

Following Panchenko [52], who worked with factor graphs of Erdős-Rényi type, we are going to use the invariance property of \(\pi \in \mathfrak {D}^\star \) under the \(*(N,M)\)-operation to simplify \(\mathscr {B}',\mathscr {B}''\) separately.

Lemma 5.5

Suppose that \(\pi \in \mathfrak {D}^\star \). Then

$$\begin{aligned} \mathbb {E}[\mathscr {B}''(\varvec{\mu }^\pi )]&= \frac{d(k-1)n}{k} \mathbb {E}\left[{\log \left\langle {{\varvec{\psi }_1},{\pi }}\right\rangle }\right]. \end{aligned}$$
(5.18)

Proof

Let \(\phi =\mathbb {E}\left[{\log \left\langle {{\varvec{\psi }_1},{\pi }}\right\rangle }\right]\) for brevity. We claim that for any integer \(m\ge 0\),

$$\begin{aligned} \mathbb {E}\left[{\log \frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right]&=\phi . \end{aligned}$$
(5.19)

Then (5.18) follows by summing (5.19) over \(0\le m<d(k-1)n/k\).
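For concreteness, the telescoping behind this step can be written out; with the convention \(\left\langle {{\bigoplus _{i=1}^{0}\varvec{\psi }_i},{\pi }}\right\rangle =1\) for the empty product, this is merely (5.19) summed, not an additional assumption:

```latex
\[
\mathbb{E}\log\Big\langle \bigoplus_{i=1}^{d(k-1)n/k}\varvec{\psi}_i , \pi \Big\rangle
  \;=\; \sum_{m=0}^{d(k-1)n/k-1}
        \mathbb{E}\Big[\log\frac{\big\langle \bigoplus_{i=1}^{m+1}\varvec{\psi}_i , \pi \big\rangle}
                                {\big\langle \bigoplus_{i=1}^{m}\varvec{\psi}_i , \pi \big\rangle}\Big]
  \;=\; \frac{d(k-1)n}{k}\,\phi .
\]
```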

Thus, we are left to prove (5.19). Since \(\pi \in \mathfrak {D}^\star \), Corollaries 3.4 and 3.9 imply that for any integer \(\ell \ge 1\),

$$\begin{aligned} \mathbb {E}\left[{\left( {\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]&=\mathbb {E}\left[{\left\langle {{\frac{\bigoplus _{i=1}^m\varvec{\psi }_i}{\left\langle {{\bigoplus _{j=1}^m\varvec{\psi }_j},{\pi }}\right\rangle }\oplus \varvec{\psi }_{m+1}},{\pi }}\right\rangle ^\ell }\right]\\&=\mathbb {E}\left[{\left\langle {{\varvec{\psi }_{m+1}},{\bigoplus _{i=1}^m\varvec{\psi }_i*\pi }}\right\rangle ^\ell }\right]=\mathbb {E}\left[{\left\langle {{\varvec{\psi }_{m+1}},{\pi }}\right\rangle ^\ell }\right]\\&=\mathbb {E}\left[{\left\langle {{\varvec{\psi }_{1}},{\pi }}\right\rangle ^\ell }\right]. \end{aligned}$$

Consequently, for all \(\ell \ge 1\) we have

$$\begin{aligned} \mathbb {E}\left[{\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]=\mathbb {E}\left[{\left( {1-\left\langle {{\varvec{\psi }_{1}},{\pi }}\right\rangle }\right) ^\ell }\right]. \end{aligned}$$
(5.20)

Further, because the continuous function \(z\in [-1,1]\mapsto |z|\) is a uniform limit of polynomials, (5.20) yields

$$\begin{aligned} \mathbb {E}\left| {\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right| =\mathbb {E}\left| {\left( {1-\left\langle {{\varvec{\psi }_{1}},{\pi }}\right\rangle }\right) ^\ell }\right| . \end{aligned}$$

Therefore, invoking (2.1), we obtain

$$\begin{aligned} \sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left| {\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right|&=\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left| {\left( {1-\left\langle {{\varvec{\psi }_{1}},{\pi }}\right\rangle }\right) ^\ell }\right| \\&\le \sum _{\ell \ge 1}\mathbb {E}\left[{\max _{\sigma \in \Omega ^k}|1-\varvec{\psi }_1(\sigma )|^\ell }\right]<\infty . \end{aligned}$$

Hence, by (5.20) and Fubini’s theorem,

$$\begin{aligned} \mathbb {E}\left[{\log \frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right]&= -\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left[{\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]\\&=-\sum _{\ell \ge 1}\frac{1}{\ell }\mathbb {E}\left[{\left( {1-\left\langle {{\varvec{\psi }_{1}},{\pi }}\right\rangle }\right) ^\ell }\right]=\phi , \end{aligned}$$

which is (5.19). \(\quad \square \)

Lemma 5.6

Suppose that \(\pi \in \mathfrak {D}^\star \). Then \(\mathbb {E}[\mathscr {B}'(\varvec{\mu }^\pi )]=\mathbb {E}\log \left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle .\)

Proof

We use an argument similar to the proof of Lemma 5.5. This time we set \(\phi =\mathbb {E}\log \left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle \). It suffices to show that for every integer \(m\ge 0\),

$$\begin{aligned} \mathbb {E}\left[{\log \frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\varphi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\varphi }_i},{\pi }}\right\rangle }}\right]&=\phi . \end{aligned}$$
(5.21)

As in the proof of Lemma 5.5, we use that \(\pi \in \mathfrak {D}^\star \) and apply Corollaries 3.4 and 3.9 to obtain for any \(\ell \ge 1\),

$$\begin{aligned} \mathbb {E}\left[{\left( {\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\varphi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\varphi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]&= \mathbb {E}\left[{\left( {\left\langle {{\frac{\bigoplus _{i=1}^{m}\varvec{\varphi }_i}{\left\langle {{\bigoplus _{j=1}^{m}\varvec{\varphi }_j},{\pi }}\right\rangle }\oplus \varvec{\varphi }_{m+1}},{\pi }}\right\rangle }\right) ^\ell }\right]\nonumber \\&= \mathbb {E}\left[{\left\langle {{\varvec{\varphi }_{m+1}},{\bigoplus _{i=1}^{m}\varvec{\varphi }_i*\pi }}\right\rangle ^\ell }\right]= \mathbb {E}\left[{\left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle ^\ell }\right]. \end{aligned}$$
(5.22)

Hence, for any \(\ell \ge 1\),

$$\begin{aligned} \mathbb {E}\left[{\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\varphi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\varphi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]&=\mathbb {E}\left[{\left( {1-\left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle }\right) ^\ell }\right]. \end{aligned}$$
(5.23)

Further, approximating the absolute value by polynomials, we obtain from (5.22) that

$$\begin{aligned} \mathbb {E}\left| {\left( {1-\frac{\left\langle {{\bigoplus _{i=1}^{m+1}\varvec{\varphi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{m}\varvec{\varphi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right|&=\mathbb {E}\left| {\left( {1-\left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle }\right) ^\ell }\right| . \end{aligned}$$

Thus, (5.21) follows from (5.23) and Fubini's theorem. \(\quad \square \)

Finally, Proposition 5.3 is immediate from Lemmas 5.5 and 5.6.

6 The Free Energy: Lower Bound

6.1 Outline

In this section we prove the following lower bound on the free energy that matches the upper bound from Proposition 5.1. The lower bound does not require the assumption POS.

Proposition 6.1

We have

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\varvec{G}})}\right]&\ge \inf _{\pi \in \mathfrak {D}^\star }\mathscr {B}(\pi ),&\liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\mathbb {G}})}\right]\ge \inf _{\pi \in \mathfrak {D}^\star }\mathscr {B}(\pi ). \end{aligned}$$

Theorem 2.7 follows immediately from Propositions 5.1 and 6.1.

The proof of Proposition 6.1 is based on a kind of coupling argument that is colloquially referred to as the ‘Aizenman–Sims–Starr’ scheme. This technique has been applied with great success to random factor graphs of Erdős-Rényi type, where the degree distribution is approximately Poisson [15, 22, 52]. The basic idea is to couple a random factor graph on n variable nodes with one on \(n+1\) variable nodes and to calculate the difference of their free energies very precisely. This coupling is very easy to set up in the Erdős-Rényi case due to the Stein-Chen property of the Poisson distribution.

However, in the case of random regular graphs matters are more intricate. Due to the rigid local structure there is no obvious way of coupling random regular factor graphs with n and \(n+1\) variable nodes. As in Sect. 4, we therefore resort to the idea of creating a bit of wiggle room by carving out a few cavities, in such a way that the free energy does not change significantly. But the details of the construction are delicate.

Let \(n,\omega \) be integers and let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). The protagonist of the proof is the random factor graph \({\varvec{G}}_{n,\omega }\) defined as follows. Let

$$\begin{aligned} N_{n,\omega }={k\vee }(n-\varvec{X})\qquad \text{ and } \text{ let }\qquad \Delta _{n,\omega }\,{{\mathop {=}\limits ^{{d}}}}\,\mathrm{Be}(d N_{n,\omega }/k-\lfloor d N_{n,\omega }/k\rfloor ) \end{aligned}$$

be independent of \(\varvec{Y}\). Further, set

$$\begin{aligned} M_{n,\omega }={d\vee }\left( {\lfloor d N_{n,\omega }/k\rfloor \wedge \left( {\lfloor d N_{n,\omega }/k\rfloor +\Delta _{n,\omega }-d\varvec{X}-\varvec{Y}}\right) }\right) . \end{aligned}$$

Then \({\varvec{G}}_{n,\omega }\) has \(N_{n,\omega }\) variable nodes \(v_i\), \(i\in [N_{n,\omega }]\), and \(M_{n,\omega }\) constraint nodes \(a_i\), \(i\in [M_{n,\omega }]\). The weight functions \(\psi _{a_i}\) are chosen independently from P. Furthermore, the variable and constraint nodes are linked through a random (one-to-one) pairing

$$\begin{aligned} F_{M_{n,\omega }}\times [k]\rightarrow V_{N_{n,\omega }}\times [d]. \end{aligned}$$

Since \(kM_{n,\omega }\le dN_{n,\omega }\) by construction, such a pairing exists, but some variable clones may go unpaired. We are going to harness these unpaired ‘cavities’ to set up a coupling of \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\).
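Purely as an illustration of this bookkeeping (a sketch under our own naming, not the authors' code), the following samples \(N_{n,\omega }\) and \(M_{n,\omega }\) as defined above and confirms that \(kM_{n,\omega }\le dN_{n,\omega }\), i.e. that unpaired clones are always available:

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for the small means used here
    L, p, n = math.exp(-lam), 1.0, 0
    while True:
        p *= rng.random()
        if p <= L:
            return n
        n += 1

def cavity_count(n, omega, d, k, rng):
    # N_{n,omega} = max(k, n - X), Delta ~ Be(dN/k - floor(dN/k)),
    # M_{n,omega} = max(d, min(floor(dN/k), floor(dN/k) + Delta - d*X - Y))
    X = sample_poisson(omega, rng)
    Y = sample_poisson(omega, rng)
    N = max(k, n - X)
    frac = d * N / k - d * N // k
    Delta = 1 if rng.random() < frac else 0
    M = max(d, min(d * N // k, d * N // k + Delta - d * X - Y))
    return d * N - k * M  # number of unpaired variable clones ("cavities")

rng = random.Random(42)
counts = [cavity_count(1000, 20, 3, 2, rng) for _ in range(100)]
# kM <= dN by construction, so a (partial) pairing always exists
assert all(c >= 0 for c in counts)
```

In the typical regime the clamps are inactive and roughly \(k(d\varvec{X}+\varvec{Y})\) clones go unpaired; these are the cavities the coupling exploits.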

To this end, consider a further random factor graph \({\hat{{\varvec{G}}}}_{n,\omega }\) with \(N_{n,\omega }\) variable nodes and \({\hat{M}}_{n,\omega }=M_{n+1,\omega }-d\) constraint nodes. The weight functions are chosen independently from P, and the connections between the constraint and variable nodes are induced by a random pairing

$$\begin{aligned} F_{{\hat{M}}_{n,\omega }}\times [k]\rightarrow V_{N_{n,\omega }}\times [d]. \end{aligned}$$

Rather than coupling \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\) directly, we will couple \({\varvec{G}}_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\) as well as \({\varvec{G}}_{n+1,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\).

This construction leads to an approximate formula for the free energy of \({\varvec{G}}_{n,\omega }\) that comes in terms of the kernel representation of the Boltzmann distribution of \({\hat{{\varvec{G}}}}_{n,\omega }\). To be precise, let \({\hat{\mathscr {C}}}\) be the set of variables \(v_i\in V_{N_{n,\omega }}\) with at least one unpaired clone in \({\hat{{\varvec{G}}}}_{n,\omega }\). Consider the random kernel

$$\begin{aligned} {\hat{\rho }}_{n,\omega }={\dot{\mu }}_{{\hat{{\varvec{G}}}}_{n,\omega },{\hat{\mathscr {C}}}}\in \mathfrak {K}\end{aligned}$$

representing the joint distribution of the cavities \({\hat{\mathscr {C}}}\). Further, let \({\hat{\pi }}_{n,\omega }\in \mathfrak {D}\) be the distribution of \({\hat{\rho }}_{n,\omega }\). To deal with the conditioning on the event \(\mathscr {S}\), we also introduce versions \({\mathbb {G}}_{n,\omega }\), \({\hat{{\mathbb {G}}}}_{n,\omega }\) of the above random factor graphs conditional on \(\mathscr {S}\). Let \({\tilde{\rho }}_{n,\omega }={\dot{\mu }}_{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}\in \mathfrak {K}\) be the kernel representation of the corresponding Boltzmann distribution, and let \({\tilde{\pi }}_{n,\omega }\in \mathfrak {D}\) be the law of \({\tilde{\rho }}_{n,\omega }\). In Sect. 6.2 we will derive the following formula.

Proposition 6.2

For any \(\varepsilon >0\) there exists \(\omega >0\) such that

$$\begin{aligned} \liminf _{n\rightarrow \infty }\, \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_{n+1,\omega })}{Z({\varvec{G}}_{n,\omega })}}\right]- \mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\hat{\rho }}_{n,\omega }}}\right\rangle -\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\hat{\rho }}_{n,\omega }}}\right\rangle }\right]&\ge -\varepsilon ,\\ \liminf _{n\rightarrow \infty }\, \mathbb {E}\left[{\log \frac{Z({\mathbb {G}}_{n+1,\omega })}{Z({\mathbb {G}}_{n,\omega })}}\right]- \mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\tilde{\rho }}_{n,\omega }}}\right\rangle -\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\tilde{\rho }}_{n,\omega }}}\right\rangle }\right]&\ge -\varepsilon . \end{aligned}$$

There are still two gaps to fill toward the proof of Proposition 6.1. First, the estimate of the free energy provided by Proposition 6.2 does not quite match the functional \(\mathscr {B}({\hat{\pi }}_{n,\omega })\). Second, the distribution \({\hat{\pi }}_{n,\omega }\in \mathfrak {D}\) does not generally belong to the subspace \(\mathfrak {D}^\star \). The following proposition deals with the second issue, which holds the key to resolving the first. Recall that the topology of \(\mathfrak {D}\) is induced by the Wasserstein metric \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\). We introduce a relaxed version of \(\mathfrak {D}^\star \) by letting

$$\begin{aligned} \mathfrak {D}^\star _{\varepsilon ,N,M}=\left\{ {\pi \in \mathfrak {D}:\mathscr {D}_{\Box }(\pi ,\pi ^{*(u,w)})\le \varepsilon \text{ for } \text{ all } u\le N \text{ and } w\le M}\right\} . \end{aligned}$$

Since (2.1) and Lemma 3.6 show that the map \(\pi \mapsto \pi ^{*(u,w)}\) is continuous, \(\mathfrak {D}^\star _{\varepsilon ,N,M}\) is a closed subspace of the compact Polish space \(\mathfrak {D}\).

Proposition 6.3

For any \(\varepsilon ,L>0\) there is \(\omega _0>0\) such that for every \(\omega >\omega _0\) and all large enough n we have

$$\begin{aligned} {\hat{\pi }}_{n,\omega },{\tilde{\pi }}_{n,\omega }\in \mathfrak {D}^\star _{\varepsilon ,L,L}. \end{aligned}$$

The proof of Proposition 6.3 can be found in Sect. 6.3. Finally, in Sect. 6.4 we derive Proposition 6.1 from Propositions 6.2 and 6.3.

6.2 Proof of Proposition 6.2

We assume throughout that \(\omega >\omega _0\) for a big enough \(\omega _0=\omega _0(d,P)\) and that n is sufficiently large.

Obtain the random factor graph \({\varvec{G}}'_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) new random constraint nodes \(a_{i}\), \({\hat{M}}_{n,\omega }<i\le M_{n,\omega }\), whose weight functions are drawn from P independently and that are linked with the variable nodes via a random pairing with the cavities \({\hat{\mathscr {C}}}\) of \({\hat{{\varvec{G}}}}_{n,\omega }\).

Further, if \(k{\hat{M}}_{n,\omega }\le dN_{n,\omega }-d(k-1)\), then obtain \({\varvec{G}}''_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding one new variable node \({\hat{v}}=v_{N_{n,\omega }+1}\) along with d random constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_d\) adjacent to \({\hat{v}}\) whose weight functions are drawn independently from P. To be precise, the clones of \({\hat{v}}\) are paired each with a uniformly random clone \({\hat{\varvec{h}_i}}\) of \({\hat{a}}_i\) for \(i=1,\ldots ,d\), and the remaining \(d(k-1)\) clones of the \({\hat{a}}_i\) are paired with randomly chosen cavities of \({\hat{{\varvec{G}}}}_{n,\omega }\). If \(k{\hat{M}}_{n,\omega }> dN_{n,\omega }-d(k-1)\), then obtain \({\varvec{G}}''_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by just adding a new isolated variable node \({\hat{v}}\).

Obtain \({\mathbb {G}}_{n,\omega }',{\mathbb {G}}_{n,\omega }''\) analogously from \({\hat{{\mathbb {G}}}}_{n,\omega }\) while conditioning on the event that the outcome is simple. If it is impossible to add the required number of constraint nodes in such a way that the resulting factor graph is simple, then do not add any.

Lemma 6.4

For \(\omega >0\) we have

$$\begin{aligned} \left| {\mathbb {E}\log Z({\varvec{G}}_{n,\omega })-\mathbb {E}\log Z({\varvec{G}}'_{n,\omega })}\right|&=o(1),&\left| {\mathbb {E}\log Z({\varvec{G}}_{n+1,\omega })-\mathbb {E}\log Z({\varvec{G}}''_{n,\omega })}\right|&= o_\omega (1),\nonumber \\ \left| {\mathbb {E}\log Z({\mathbb {G}}_{n,\omega })-\mathbb {E}\log Z({\mathbb {G}}'_{n,\omega })}\right|&=o_\omega (1),&\left| {\mathbb {E}\log Z({\mathbb {G}}_{n+1,\omega })-\mathbb {E}\log Z({\mathbb {G}}''_{n,\omega })}\right|&= o_\omega (1). \end{aligned}$$
(6.1)

Proof

Since \(d\ge 3\) and \(k\ge 2\),

$$\begin{aligned} {\hat{M}}_{n,\omega }-\lfloor dN_{n,\omega }/k\rfloor&\le dN_{n,\omega }/k+1+d/k-d-\lfloor dN_{n,\omega }/k\rfloor \le 2-d(1-1/k)\le 1/2. \end{aligned}$$

Because the left-hand side is an integer, we conclude that \({\hat{M}}_{n,\omega }\le \lfloor dN_{n,\omega }/k\rfloor \). Similarly,

$$\begin{aligned} {\hat{M}}_{n,\omega }-\left( {\lfloor d N_{n,\omega }/k\rfloor +\Delta _{n,\omega }-d\varvec{X}-\varvec{Y}}\right)&\le 2-d(1-1/k)\le 1/2. \end{aligned}$$

Thus, \(M_{n,\omega }\ge {\hat{M}}_{n,\omega }\). Hence, \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}'_{n,\omega }\) are identically distributed.
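Both displays rest on the elementary bound \(2-d(1-1/k)\le 1/2\) for \(d\ge 3\) and \(k\ge 2\). A quick numeric check of this bound (ours, over a finite grid, using that the expression is decreasing in both d and k):

```python
# 2 - d*(1 - 1/k) is decreasing in d and in k, so a finite grid suffices;
# the maximum 1/2 is attained at the corner d = 3, k = 2.
worst = max(2 - d * (1 - 1 / k) for d in range(3, 60) for k in range(2, 60))
assert worst == 0.5
```

Since the quantity bounded by 1/2 is an integer, it is at most 0, which is exactly how both conclusions are drawn.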

Moving on to the second claim, we consider the event \(\mathscr {A}\) that the last variable node is adjacent to precisely d distinct constraint nodes. Then

$$\begin{aligned} \mathbb {P}\left[{{\varvec{G}}_{n+1,\omega }\in \mathscr {A}}\right]=1-O(\omega /n), \end{aligned}$$
(6.2)

while \({\varvec{G}}_{n,\omega }''\in \mathscr {A}\) with certainty. Given \(\mathscr {A}\) and given that \(\varvec{X}+\varvec{Y}\le \sqrt{n}\), say, the subgraph \({\tilde{{\varvec{G}}}}_{n+1,\omega }\) obtained from \({\varvec{G}}_{n+1,\omega }\) by deleting the last variable node \(v_{N_{n+1,\omega }}\) along with its adjacent constraint nodes is distributed precisely as \({\hat{{\varvec{G}}}}_{n,\omega }\), and therefore \({\varvec{G}}''_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\) can be coupled identically. Hence,

$$\begin{aligned} \mathbb {E}\left[{\log Z({\varvec{G}}''_{n,\omega })\mid \varvec{X}+\varvec{Y}\le \sqrt{n}}\right] =\mathbb {E}\left[{\log Z({\varvec{G}}_{n+1,\omega })\mid \mathscr {A},\,\varvec{X}+\varvec{Y}\le \sqrt{n}}\right]. \end{aligned}$$
(6.3)

If, on the other hand, \(\varvec{X}+\varvec{Y}\le \sqrt{n}\) but \(\mathscr {A}\) does not occur, then we can couple \({\varvec{G}}_{n+1,\omega }\) and \({\varvec{G}}''_{n,\omega }\) such that both disagree on at most 2d constraint nodes. Indeed, suppose that \(v_{N_{n+1,\omega }}\) has \({\tilde{d}}<d\) adjacent constraints in \({\varvec{G}}_{n+1,\omega }\). Then the subgraph obtained by removing \(v_{N_{n+1,\omega }}\), its \({\tilde{d}}\) neighbors and another \(d-{\tilde{d}}\) random constraint nodes is distributed precisely as \({\hat{{\varvec{G}}}}_{n,\omega }\). Hence, we can obtain both \({\varvec{G}}_{n+1,\omega }\) and \({\varvec{G}}_{n,\omega }''\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding d (possibly distinct) constraint nodes. Thus, (2.1) ensures that

$$\begin{aligned} \mathbb {E}\left[{\log Z({\varvec{G}}''_{n,\omega })\mid \varvec{X}+\varvec{Y}\le \sqrt{n}}\right]=\mathbb {E}\left[{\log Z({\varvec{G}}_{n+1,\omega })\mid \overline{\mathscr {A}},\,\varvec{X}+\varvec{Y}\le \sqrt{n}}\right]+O(1). \end{aligned}$$
(6.4)

Furthermore, (2.1) ensures that

$$\begin{aligned} \mathbb {E}\left[{\log Z({\varvec{G}}''_{n,\omega })\mid \varvec{X},\varvec{Y}}\right],\mathbb {E}\left[{\log Z({\varvec{G}}_{n+1,\omega })\mid \varvec{X},\varvec{Y}}\right]=O(n). \end{aligned}$$
(6.5)

Since \(\mathbb {P}\left[{\varvec{X}+\varvec{Y}>\sqrt{n}}\right]=o(n^{-2})\), (6.2)–(6.5) yield the second assertion.

Matters get slightly more complicated once we condition on \(\mathscr {S}\). Since \(\varvec{X}+\varvec{Y}\le \log n\) with probability \(1-O(n^{-k})\), due to (2.1) the event \(\varvec{X}+\varvec{Y}>\log n\) contributes no more than an additive o(1) to the difference of the free energies. Hence, we may condition on \(\varvec{X}+\varvec{Y}\le \log n\). Let \(\varvec{d}'\) be the vector comprising the variable degrees in \({\hat{{\mathbb {G}}}}_{n,\omega }\). Let \(\mathscr {D}\) be the set of all such sequences with entries either d or \(d-1\). A standard moment calculation shows that given any possible \(\varvec{d}'\), the event \({\hat{{\varvec{G}}}}_{n,\omega }\in \mathscr {S}\) has probability \((1+o(1))\exp \left[{-(d-1)(k-1)/2-\varvec{1}\{k=2\}(d-1)^2/4}\right]\) (cf. Fact 2.2). Therefore, with \({\tilde{O}}(\,\cdot \,)\) hiding poly-logarithmic terms,

$$\begin{aligned} \mathbb {P}\left[{\varvec{d}'\in \mathscr {D}\mid \varvec{X}+\varvec{Y}\le \log n}\right]&=1-{\tilde{O}}(1/n). \end{aligned}$$
(6.6)
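As a sanity check of the simplicity probability quoted from Fact 2.2, the following Monte Carlo sketch estimates it for \(k=2\), \(d=3\) in a toy configuration model; the reading of 'simple' used here (no constraint hitting the same variable twice, no two constraints joining the same pair of variables) and all names are our assumptions.

```python
import math, random

def is_simple_pairing(n, d, rng):
    """One configuration-model sample for k = 2: pair the d*n variable clones
    uniformly; the sample is 'simple' if no constraint joins a variable to
    itself and no two constraints join the same pair of variables."""
    clones = [v for v in range(n) for _ in range(d)]
    rng.shuffle(clones)
    seen = set()
    for i in range(0, len(clones), 2):
        u, w = clones[i], clones[i + 1]
        if u == w:
            return False
        e = (min(u, w), max(u, w))
        if e in seen:
            return False
        seen.add(e)
    return True

rng = random.Random(1)
n, d, trials = 400, 3, 2000
est = sum(is_simple_pairing(n, d, rng) for _ in range(trials)) / trials
# exp[-(d-1)(k-1)/2 - (d-1)^2/4] with k = 2, d = 3 equals e^{-2}
target = math.exp(-(d - 1) / 2 - (d - 1) ** 2 / 4)
assert abs(est - target) < 0.05
```

The estimate hovers around \(e^{-2}\approx 0.135\), matching the \(k=2\), \(d=3\) case of the displayed formula.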

Similarly, let \(\varvec{d}\) comprise the variable degrees of the factor graph \({\mathbb {G}}_{n,\omega }^-\) obtained from \({\mathbb {G}}_{n,\omega }\) by deleting the last d constraint nodes. Then

$$\begin{aligned} \mathbb {P}\left[{\varvec{d}\in \mathscr {D}\mid \varvec{X}+\varvec{Y}\le \log n}\right]&=1-{\tilde{O}}(1/n). \end{aligned}$$
(6.7)

Additionally, let \({{\mathscr {E}}}\) be the set of all factor graphs that have a constraint node that is adjacent to variable nodes of degree less than d only. Then

$$\begin{aligned} \mathbb {P}\left[{{\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}}\mid \varvec{X}+\varvec{Y}\le \log n}\right],\mathbb {P}\left[{{\mathbb {G}}_{n,\omega }^-\in {{\mathscr {E}}}\mid \varvec{X}+\varvec{Y}\le \log n}\right]&=\tilde{O}(n^{1-k}). \end{aligned}$$
(6.8)

Further, on the event \(\mathscr {D}{\setminus }{{\mathscr {E}}}\) we can couple \({\mathbb {G}}_{n,\omega }'\) and \({\mathbb {G}}_{n,\omega }\) identically, because there is no way of adding the missing constraint nodes to \({\hat{{\mathbb {G}}}}_{n,\omega }\) without obtaining a simple factor graph. Hence,

$$\begin{aligned}&\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega })\mid \varvec{d}\in \mathscr {D},\,{\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right] \nonumber \\&\quad =\mathbb {E}\left[{\log Z({\mathbb {G}}'_{n,\omega })\mid {{\hat{\varvec{d}}}}\in \mathscr {D},\,{\hat{{\mathbb {G}}}}_{n,\omega }\not \in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right]. \end{aligned}$$
(6.9)

But (6.9) does not yet suffice to prove (6.1) because outside the event \(\mathscr {D}{\setminus }{{\mathscr {E}}}\) the free energies of the two factor graphs may differ by \(\Omega (n)\). Hence, we also need to consider the event \(\mathscr {D}'\) that \(\varvec{d}\) has a single entry equal to \(d-2\); this suffices because

$$\begin{aligned} \mathbb {P}\left[{\varvec{d}\not \in \mathscr {D}\cup \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right],\mathbb {P}\left[{{{\hat{\varvec{d}}}}\not \in \mathscr {D}\cup \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]&={\tilde{O}}(n^{-2}) \end{aligned}$$
(6.10)

and thus the contribution of the complement of \(\mathscr {D}\cup \mathscr {D}'\) to the free energy difference is o(1) due to (2.1). Considering the event \(\mathscr {D}'\) is indeed necessary because \(\mathbb {P}[{{\hat{\varvec{d}}}}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n]>\mathbb {P}[\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n]\). Indeed, while \({\hat{{\mathbb {G}}}}_{n,\omega }\) is just a uniformly random simple factor graph with \(N_{n,\omega }\) variable and \({\hat{M}}_{n,\omega }\) constraint nodes, \({\mathbb {G}}_{n,\omega }^-\) has a tilted distribution, with each possible simple graph being weighted according to the number of its extensions into a simple graph with \(M_{n,\omega }\) constraints. In effect, since variable nodes of degree less than \(d-1\) leave us with fewer extensions, the event \(\mathscr {D}'\) is less likely in \({\mathbb {G}}_{n,\omega }^-\). Yet because \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) is bounded, on the event \(\mathscr {D}'\) we can couple \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) such that both differ only in a bounded number of constraint nodes. As a consequence,

$$\begin{aligned}&\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega })\mid \varvec{d}\in \mathscr {D}',\,{\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n}\right]\nonumber \\&\quad =\mathbb {E}\left[{\log Z({\mathbb {G}}'_{n,\omega })\mid {{\hat{\varvec{d}}}}\in \mathscr {D}',\,{\hat{{\mathbb {G}}}}_{n,\omega }\not \in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n}\right]+O(1). \end{aligned}$$
(6.11)

Additionally, we claim that also \({\mathbb {G}}_{n,\omega }\) given \(\varvec{d}\in \mathscr {D}\) and \({\mathbb {G}}'_{n,\omega }\) given \({{\hat{\varvec{d}}}}\in \mathscr {D}'\) can be coupled such that with probability \(1-{\tilde{O}}(1/n)\) both differ only in \({\tilde{O}}(1)\) constraint nodes and that, in effect,

$$\begin{aligned}&\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega })\mid \varvec{d}\in \mathscr {D},\,{\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n}\right]\nonumber \\&\quad =\mathbb {E}\left[{\log Z({\mathbb {G}}'_{n,\omega })\mid {{\hat{\varvec{d}}}}\in \mathscr {D}',\,{\hat{{\mathbb {G}}}}_{n,\omega }\not \in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n}\right]+{\tilde{O}}(1). \end{aligned}$$
(6.12)

To see this, let \(u_1,\ldots ,u_\ell \) be the variable nodes of degree less than d in \({\mathbb {G}}_{n,\omega }^-\); suppose, indeed, that all of them have degree \(d-1\). Similarly, let \(u_1',\ldots ,u_{\ell -1}'\) be the cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\), all of degree \(d-1\) except for \(u_{\ell -1}'\), which has degree \(d-2\). Pick a further variable node \(u_\ell '\) of degree d randomly. Then with probability \(1-{\tilde{O}}(n^{-1})\) the second neighborhoods \(\partial ^2\{u_1,\ldots ,u_\ell \}\), \(\partial ^2\{u_1',\ldots ,u_\ell '\}\) both have size \(\ell (k-1)(d-1)\). Consequently, the subgraphs of \({\mathbb {G}}_{n,\omega }^-\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\) obtained by removing \(u_1,\ldots ,u_\ell \) and \(u_1',\ldots ,u_\ell '\) along with their neighbors, respectively, can be coupled such that both coincide with probability \(1-\tilde{O}(1/n)\). Thus, \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) can be coupled such that the expected number of constraint nodes on which the two factor graphs differ is \({\tilde{O}}(1)\), whence we obtain (6.12).

To deal with the event \({{\mathscr {E}}}\), we may assume that \(k=2\) due to (6.8). Furthermore, because of (6.8) and as

$$\begin{aligned} \mathbb {P}[\varvec{d}'\not \in \mathscr {D}\mid {\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n]&={\tilde{O}}(1/n),\nonumber \\ \mathbb {P}[\varvec{d}\not \in \mathscr {D}\mid {\mathbb {G}}_{n,\omega }^-\in {{\mathscr {E}}},\,\varvec{X}+\varvec{Y}\le \log n]&=\tilde{O}(1/n), \end{aligned}$$
(6.13)

we may assume that \(\varvec{d},\varvec{d}'\in \mathscr {D}\). Since

$$\begin{aligned} \mathbb {P}\left[{{\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}}\mid \varvec{d}'\in \mathscr {D},\varvec{X}+\varvec{Y}\le \log n}\right]&\ge \mathbb {P}\left[{{\mathbb {G}}_{n,\omega }^-\in {{\mathscr {E}}}\mid \varvec{d}\in \mathscr {D},\varvec{X}+\varvec{Y}\le \log n}\right] \end{aligned}$$
(6.14)

because the event \({{\mathscr {E}}}\) precludes certain extensions into a simple factor graph with \(M_{n,\omega }\) constraints, we just need to consider the case that \({\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}}\) and \({\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}}\) given that \(\varvec{d},\varvec{d}'\in \mathscr {D}\). Let \(u_1,\ldots ,u_\ell \) and \(u_1',\ldots ,u_{\ell }'\) be the cavities of \({\mathbb {G}}_{n,\omega }^-\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\), respectively. Pick one further constraint node b of \({\hat{{\mathbb {G}}}}_{n,\omega }\). Then with probability \(1-{\tilde{O}}(1/n)\) the set \(\partial ^2\{u_1,\ldots ,u_\ell \}\) has size \(\ell (d-1)\), and all variable nodes in this set have pairwise distance at least four. The same is true of the set \(\partial ^2\{u_1',\ldots ,u_\ell '\}\cup \partial b\) with probability \(1-{\tilde{O}}(1/n)\). If these two events occur, then \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) can be coupled such that they only differ on the \({\tilde{O}}(1)\) constraint nodes that are adjacent to \(u_1,\ldots ,u_\ell \), \(u_1',\ldots ,u_\ell '\) and b. Hence, we obtain a coupling such that \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) differ on no more than \({\tilde{O}}(1)\) constraint nodes in expectation, and thus

$$\begin{aligned}&\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega }')\mid \varvec{d}'\in \mathscr {D},\,{\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right] \nonumber \\&\quad =\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega })\mid \varvec{d}\in \mathscr {D},\,{\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right]+{\tilde{O}}(1). \end{aligned}$$
(6.15)

Moreover, because given \({{\mathscr {E}}}\) there is, with probability \(1-\tilde{O}(1/n)\), precisely one constraint node whose adjacent variable nodes all have degree \(d-1\), we obtain

$$\begin{aligned}&\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega }')\mid \varvec{d}'\in \mathscr {D},\,{\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right] \nonumber \\&\quad =\mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega })\mid \varvec{d}\in \mathscr {D},\,{\mathbb {G}}_{n,\omega }^-\in {{\mathscr {E}}},\varvec{X}+\varvec{Y}\le \log n}\right]+{\tilde{O}}(1). \end{aligned}$$
(6.16)

Combining (6.9)–(6.16), we obtain the left bound stated in (6.1).

We proceed similarly to derive the right bound in (6.1). Indeed, in this case we do not need to consider the event \({{\mathscr {E}}}\) separately, because all additional constraint nodes are connected with a variable node that does not belong to \({\hat{{\mathbb {G}}}}_{n,\omega }\) or \({\mathbb {G}}_{n,\omega }^-\), respectively. Hence, on the event \(\mathscr {D}\) we can couple \({\mathbb {G}}_{n,\omega }''\) and \({\mathbb {G}}_{n+1,\omega }\) identically, and thus

$$\begin{aligned} \mathbb {E}\left[{\log Z({\mathbb {G}}_{n+1,\omega })\mid \varvec{d}\in \mathscr {D},\varvec{X}+\varvec{Y}\le \log n}\right]&= \mathbb {E}\left[{\log Z({\mathbb {G}}_{n,\omega }'')\mid \varvec{d}'\in \mathscr {D},\varvec{X}+\varvec{Y}\le \log n}\right]. \end{aligned}$$
(6.17)

In effect, due to (6.10) we just need to construct a coupling in the event that \(\varvec{d}'\in \mathscr {D}'\) and \(\varvec{d}\in \mathscr {D}\). To this end, we proceed as above by coupling \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }^-\) given the second neighborhoods of the cavities such that both only differ in an expected \({\tilde{O}}(1/n)\) constraint nodes. Since \(\mathbb {P}\left[{\varvec{d}'\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right],\mathbb {P}\left[{\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]={\tilde{O}}(1/n)\) and \(\mathbb {P}\left[{\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]\ge \mathbb {P}\left[{\varvec{d}'\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]\), the second part of (6.1) follows from (6.17). \(\quad \square \)

We are now ready to compare the free energies of \({\varvec{G}}''_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\), of \({\varvec{G}}'_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\), and of the corresponding simple graphs. We will carry the proofs out for the case of the simple random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega }\); the other case is obtained by simply skipping any deliberations pertinent to the event \(\mathscr {S}\).

Lemma 6.5

We have

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_{n,\omega }'')}{Z({\hat{{\varvec{G}}}}_{n,\omega })}}\right]&=\mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\hat{\rho }}_{n,\omega }}}\right\rangle }\right]+o_\omega (1),\\ \mathbb {E}\left[{\log \frac{Z({\mathbb {G}}_{n,\omega }'')}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}}\right]&=\mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\tilde{\rho }}_{n,\omega }}}\right\rangle }\right]+o_\omega (1). \end{aligned}$$

Proof

Let \(\mathscr {A}\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega }\) has at least \(\omega /2\) cavities, that all variable nodes have degree either d or \(d-1\) and that no two variable nodes of degree \(d-1\) are adjacent to the same constraint node. Then \(\mathbb {P}\left[{\mathscr {A}}\right]=1-\exp (-\Omega _\omega (\omega ))\). Hence, (2.1) ensures that

$$\begin{aligned} \mathbb {E}[\log (Z({\mathbb {G}}_{n,\omega }'')/Z({\hat{{\mathbb {G}}}}_{n,\omega }))]&=\mathbb {E}[\log (Z({\mathbb {G}}_{n,\omega }'')/Z({\hat{{\mathbb {G}}}}_{n,\omega }))|\mathscr {A}]+o_\omega (1). \end{aligned}$$
(6.18)

Moreover, on \(\mathscr {A}\) the random factor graph \({\mathbb {G}}_{n,\omega }''\) is obtained from \({\hat{{\mathbb {G}}}}_{n,\omega }\) by adding one variable node \({\hat{v}}\) along with d constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_d\), whose weight functions are drawn from P independently. Further, on the event \(\mathscr {A}\) all neighbors of the \({\hat{a}}_i\) except \({\hat{v}}\) belong to the set \({\hat{\mathscr {C}}}\) of cavities. Therefore, we have the exact formula

$$\begin{aligned} \frac{Z({\mathbb {G}}_{n,\omega }'')}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}&= \left\langle {{\sum _{\chi \in \Omega }p(\chi )\prod _{i=1}^d\sum _{\tau \in \Omega ^{\partial {\hat{a}}_i}}\psi _{{\hat{a}}_i}(\tau ) \varvec{1}\{\tau _{{\hat{v}}}=\chi ,\,\forall y\in \partial {\hat{a}}_i{\setminus }{\hat{v}}:\varvec{\sigma }_y=\tau _y\}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle . \end{aligned}$$
(6.19)
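An identity of this shape can be verified by brute force on a toy instance; the spin prior p, the weight functions and the graph below are our own choices, assuming the partition function has the form \(Z(G)=\sum _\sigma \prod _x p(\sigma _x)\prod _a\psi _a(\sigma _{\partial a})\) suggested by the factor \(p(\chi )\).

```python
import itertools

Omega = (0, 1)
p = {0: 0.6, 1: 0.4}                      # toy prior on spins (our choice)
psi_soft = lambda t: 1.5 if t[0] == t[1] else 0.5   # toy weight function

def Z(n, constraints):
    """Partition function of a toy factor graph: n binary variables with
    prior p; constraints = [(variable_indices, weight_fn)]."""
    total = 0.0
    for s in itertools.product(Omega, repeat=n):
        w = 1.0
        for x in s:
            w *= p[x]
        for vs, psi in constraints:
            w *= psi(tuple(s[v] for v in vs))
        total += w
    return total

base = [((0, 1), psi_soft), ((2, 3), psi_soft)]     # the "hat" graph
Zhat = Z(4, base)
# attach vhat (index 4) via d = 2 new arity-2 constraints to cavities 0 and 2
new = [((4, 0), psi_soft), ((4, 2), psi_soft)]
Zpp = Z(5, base + new)

# right-hand side: Gibbs average over the hat graph of
# sum_chi p(chi) * prod_i psi_i(chi, sigma_cavity)
rhs = 0.0
for s in itertools.product(Omega, repeat=4):
    w = 1.0
    for x in s:
        w *= p[x]
    for vs, psi in base:
        w *= psi(tuple(s[v] for v in vs))
    inner = sum(p[chi] * psi_soft((chi, s[0])) * psi_soft((chi, s[2]))
                for chi in Omega)
    rhs += (w / Zhat) * inner

assert abs(Zpp / Zhat - rhs) < 1e-12
```

Summing over the spin of the new variable first turns \(Z({\mathbb {G}}_{n,\omega }'')/Z({\hat{{\mathbb {G}}}}_{n,\omega })\) into exactly this Gibbs average.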

To proceed, let \((\varvec{v}_{i,j})_{i,j}\) be a sequence of uniformly and independently chosen cavities \(\varvec{v}_{i,j}\in {\hat{\mathscr {C}}}\). We claim that on the event \(\mathscr {A}\),

$$\begin{aligned}&\mathbb {E}\left[{\log \frac{Z({\mathbb {G}}_{n,\omega }'')}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}\,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]\nonumber \\&\quad =o_\omega (1)\nonumber \\&\quad +\mathbb {E}\left[{\log \left\langle {{\sum _{\chi \in \Omega }p(\chi )\prod _{i=1}^d\sum _{\tau \in \Omega ^{k}}\varvec{\psi }_{i}(\tau ) \varvec{1}\{\tau _{\varvec{h}_i}=\chi ,\,\forall h\in [k]{\setminus } \varvec{h}_i:\varvec{\sigma }_{\varvec{v}_{i,h}}=\tau _h\}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle \,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]. \end{aligned}$$
(6.20)

Indeed, the only difference between (6.19) and (6.20) is that in the former the neighbours \(\partial {\hat{a}}_i{\setminus }{\hat{v}}\) are chosen from \({\hat{\mathscr {C}}}\) without replacement, whereas the \(\varvec{v}_{i,j}\) are chosen independently, i.e., with replacement. But since we choose a mere dk cavities \((\varvec{v}_{i,j})_{i\in [d], j\in [k]}\) out of a total of at least \(\omega /2\), the probability of hitting the same cavity twice is \(o_\omega (1)\), and thus (6.20) follows from (2.1). Further, unravelling the definitions of \({\tilde{\rho }}_{n,\omega }\) and \(\varvec{\varphi }_1\), we see that

$$\begin{aligned}&\mathbb {E}\left[{\log \left\langle {{\sum _{\chi \in \Omega }p(\chi )\prod _{i=1}^d\sum _{\tau \in \Omega ^{k}}\varvec{\psi }_{i}(\tau ) \varvec{1}\{\tau _{\varvec{h}_i}=\chi ,\,\forall h\in [k]{\setminus } \varvec{h}_i:\varvec{\sigma }_{\varvec{v}_{i,h}}=\tau _h\}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle \,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]\nonumber \\&\quad =\mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\tilde{\rho }}_{n,\omega }}}\right\rangle \mid {\hat{{\mathbb {G}}}}_{n,\omega }}\right]. \end{aligned}$$
(6.21)
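The with/without-replacement step above is a birthday-problem estimate: the chance that dk independent draws from at least \(\omega /2\) cavities collide is at most \((dk)^2/\omega =o_\omega (1)\). A short numeric check (ours):

```python
def collision_prob(m, C):
    """Exact probability that m independent uniform draws from C items
    are NOT all distinct (the with/without-replacement error)."""
    q = 1.0
    for i in range(m):
        q *= (C - i) / C
    return 1.0 - q

d, k = 3, 4
m = d * k
for omega in (10**2, 10**3, 10**4):
    C = omega // 2                       # at least omega/2 cavities on A
    pc = collision_prob(m, C)
    assert pc <= m * (m - 1) / (2 * C)   # elementary union bound
    assert pc < 3 * m * m / omega        # the o_omega(1) rate
```

As \(\omega \) grows the collision probability vanishes at rate \(O((dk)^2/\omega )\), which is what licenses replacing sampling without replacement by independent draws.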

Finally, the assertion follows from (6.18)–(6.21) by taking the expectation on \({\hat{{\mathbb {G}}}}_{n,\omega }\). \(\square \)

Lemma 6.6

We have

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\varvec{G}}_{n,\omega }')}{Z({\hat{{\varvec{G}}}}_{n,\omega })}}\right]&= \mathbb {E}\left\langle {{\log \bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\hat{\rho }}_{n,\omega }}}\right\rangle +o_\omega (1),\\ \mathbb {E}\left[{\log \frac{Z({\mathbb {G}}_{n,\omega }')}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}}\right]&= \mathbb {E}\left\langle {{\log \bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\tilde{\rho }}_{n,\omega }}}\right\rangle +o_\omega (1). \end{aligned}$$

Proof

The proof is similar in spirit to the previous one. Once more we consider the event \(\mathscr {A}\) that \({\hat{{\mathbb {G}}}}_{n,\omega }\) has at least \(\omega /2\) cavities, that all variable nodes have degree either d or \(d-1\) and that no two variable nodes of degree \(d-1\) are adjacent to a common constraint node. Then \(\mathbb {P}\left[{\mathscr {A}}\right]=1-\exp (-\Omega _\omega (\omega ))\) and

$$\begin{aligned} \mathbb {E}[\log (Z({\mathbb {G}}_{n,\omega }')/Z({\hat{{\mathbb {G}}}}_{n,\omega }))]&=\mathbb {E}[\log (Z({\mathbb {G}}_{n,\omega }')/Z({\hat{{\mathbb {G}}}}_{n,\omega }))|\mathscr {A}]+o_\omega (1). \end{aligned}$$
(6.22)

Moreover, we have the pointwise exact formula

$$\begin{aligned} \frac{Z({\mathbb {G}}_{n,\omega }')}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}&=\left\langle {{\prod _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }} \sum _{\tau \in \Omega ^{\partial a_i}}\psi _{a_i}(\tau )\varvec{1}\{\varvec{\sigma }_{\partial a_i}=\tau \}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle . \end{aligned}$$
(6.23)
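Analogously to (6.19), an identity of this shape is easy to confirm by brute force on a toy factor graph; the prior and weight functions below are our own choices.

```python
import itertools

Omega = (0, 1)
p = {0: 0.5, 1: 0.5}
psi = lambda t: 2.0 if t[0] != t[1] else 1.0    # toy antiferromagnetic weight

def boltzmann(n, constraints):
    """Return (Z, dict sigma -> unnormalized weight) for a toy factor graph
    with spin prior p; constraints = [(variable_indices, weight_fn)]."""
    w = {}
    for s in itertools.product(Omega, repeat=n):
        x = 1.0
        for v in s:
            x *= p[v]
        for vs, f in constraints:
            x *= f(tuple(s[v] for v in vs))
        w[s] = x
    return sum(w.values()), w

base = [((0, 1), psi), ((1, 2), psi)]
added = [((0, 2), psi), ((2, 3), psi)]          # the constraints a_i, i > hatM
Zhat, w = boltzmann(4, base)
Zprime, _ = boltzmann(4, base + added)

# Z'/Zhat equals the Gibbs average, under the hat graph, of the product of
# the added weight functions evaluated on the sample.
avg = sum(wt / Zhat * psi((s[0], s[2])) * psi((s[2], s[3]))
          for s, wt in w.items())
assert abs(Zprime / Zhat - avg) < 1e-12
```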

With \((\varvec{v}_{i,j})_{i,j}\) a sequence of independently chosen cavities \(\varvec{v}_{i,j}\in {\hat{\mathscr {C}}}\), we claim that on \(\mathscr {A}\),

$$\begin{aligned} \mathbb {E}\left[{\log \frac{Z({\mathbb {G}}_{n,\omega }')}{Z(\hat{\mathbb {G}}_{n,\omega })}\,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]&=o_\omega (1)\nonumber \\&\quad + \mathbb {E}\left[{\log \left\langle {{\prod _{i=1}^{M_{n,\omega }-{\hat{M}}_{n,\omega }} \sum _{\tau \in \Omega ^{k}}\varvec{\psi }_{i}(\tau )\prod _{j=1}^k\varvec{1}\{\varvec{\sigma }_{\varvec{v}_{i,j}}=\tau _j\}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle \,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]. \end{aligned}$$
(6.24)

Indeed, the only difference is that in (6.24) the \(\varvec{v}_{i,j}\) are chosen independently, whereas in (6.23) the neighbors of the \(a_i\) are chosen without replacement. But since \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) is bounded while there are at least \(\omega /2\) cavities, the two terms coincide up to \(o_\omega (1)\). Finally, the construction of \({\tilde{\rho }}_{n,\omega }\) ensures that

$$\begin{aligned}&\mathbb {E}\left[{\log \left\langle {{\prod _{i=1}^{M_{n,\omega }-{\hat{M}}_{n,\omega }} \sum _{\tau \in \Omega ^{k}}\varvec{\psi }_{i}(\tau )\prod _{j=1}^k\varvec{1}\{\varvec{\sigma }_{\varvec{v}_{i,j}}=\tau _j\}},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle \,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right]\nonumber \\&\quad =\mathbb {E}\left[{\log \left\langle {{\bigoplus _{i=1}^{M_{n,\omega }-{\hat{M}}_{n,\omega }}\varvec{\psi }_{1,i}},{{\tilde{\rho }}_{n,\omega }}}\right\rangle \,\bigg |\,{\hat{{\mathbb {G}}}}_{n,\omega }}\right], \end{aligned}$$
(6.25)

and thus the assertion follows from (6.22)–(6.25) by taking the expectation. \(\quad \square \)

Finally, Proposition 6.2 is an immediate consequence of Lemmas 6.4, 6.5 and 6.6.

6.3 Proof of Proposition 6.3

Once more we will carry the proof out for the simple random factor graph, which is the (slightly) more intricate case; the unconditional case follows by skipping any considerations pertaining to the conditioning. The basic idea behind the proof of Proposition 6.3 is quite simple. With probability \(1-o_\omega (1)\) the random graph \({\hat{{\mathbb {G}}}}_{n,\omega }\) consists of \(N_{n,\omega }\) variable and \({\hat{M}}_{n,\omega }\) constraint nodes and we have \(N_{n,\omega }=n-\varvec{X}\) and

$$\begin{aligned} {\hat{M}}_{n,\omega }=M_{n+1,\omega }-d=\lfloor d N_{n,\omega }/k\rfloor \wedge \left( {\lfloor d N_{n,\omega }/k\rfloor +\Delta _{n,\omega }-d\varvec{X}-\varvec{Y}}\right) -d \end{aligned}$$

with independent \(\mathrm{Po}(\omega )\) variables \(\varvec{X},\varvec{Y}\). Fix two integers \(\ell ,\ell '\ge 0\). Given that \(\varvec{X}\ge \ell \) and \(k{\hat{M}}_{n,\omega }\le dN_{n,\omega }-k\ell '-d(k-1)\ell \), let \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) be the random factor graph obtained from \({\hat{{\mathbb {G}}}}_{n,\omega }\) by adding

  • \(\ell \) more variable nodes \({\hat{v}}_1=v_{N_{n,\omega }+1},\ldots ,{\hat{v}}_\ell =v_{N_{n,\omega }+\ell }\) along with \(d\ell \) new constraint nodes \({\hat{a}}_{i,j}\), \(i\in [\ell ]\), \(j\in [d]\), each with a weight function chosen from P independently; connect a random clone \({\hat{\varvec{h}}}_{i,j}\) of each \({\hat{a}}_{i,j}\) with a random clone of \({\hat{v}}_i\) and pair the other \(k-1\) clones of \({\hat{a}}_{i,j}\) with random cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) left pending by the previous additions.

  • \(\ell '\) more constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_{\ell '}\), each endowed with a weight function chosen from P independently and each connected with k random cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) left vacant by the previous operations.

The resulting random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is not necessarily simple. Yet the key insight behind Proposition 6.3 is that for any \(\ell ,\ell '\) the distribution of \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is close to that of the original graph \({\hat{{\mathbb {G}}}}_{n,\omega }\), provided that \(\omega \) is big enough. Moreover, the perturbation of the Boltzmann distribution that ensues upon going from \({\hat{{\mathbb {G}}}}_{n,\omega }\) to \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is close to the perturbation induced by the \(*(\ell ,\ell ')\)-operation. We introduce similar notation \({\hat{{\varvec{G}}}}_{n,\omega }[\ell ,\ell ']\) for the random graph without the conditioning on \(\mathscr {S}\).

To formalize this idea, we first compare the distributions of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\). For integers x, y we denote by \({\hat{{\mathbb {G}}}}_{n,\omega ,x,y}\) the conditional \({\hat{{\mathbb {G}}}}_{n,\omega }\) given that \(\varvec{X}=x\) and \(\varvec{Y}=y\).

Lemma 6.7

For any \(\ell ,\ell '\ge 0\) we have \(d_{\mathrm {TV}}\left( {{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]}\right) =o_\omega (1)\) and analogously \(d_{\mathrm {TV}}\left( {{\hat{{\varvec{G}}}}_{n,\omega },{\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]}\right) =o_\omega (1)\).

Proof

The event \(\mathscr {A}=\{\omega /2\le \varvec{X}\le 2\omega ,\, \omega /2\le \varvec{Y}\le 2\omega \}\) has probability \(1-o_\omega (1)\). Further, because \(\varvec{X},\varvec{Y}\) are independent Poisson variables with a large mean \(\omega \) while \(\ell ,\ell '\) are fixed, the total variation distance of the pairs \((\varvec{X},\varvec{Y})\) and \((\varvec{X}-\ell ,\varvec{Y}-\ell ')\) is of order \(O_\omega (\omega ^{-1/2})\). Hence, given \(\mathscr {A}\) the total variation distance of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\) is \(o_\omega (1)\); in symbols,

$$\begin{aligned} d_{\mathrm {TV}}\left( {{\hat{{\mathbb {G}}}}_{n,\omega }\mid \mathscr {A},{\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\mid \mathscr {A}}\right) =o_\omega (1). \end{aligned}$$
(6.26)
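The total variation estimate for the shifted Poisson pair can be checked numerically; the script below (ours) computes \(d_{\mathrm {TV}}(\mathrm{Po}(\omega ),\ell +\mathrm{Po}(\omega ))\) exactly up to truncation and exhibits the \(O_\omega (\ell \,\omega ^{-1/2})\) decay.

```python
import math

def pois_pmf(lam, j):
    # numerically stable Poisson pmf via log-gamma
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def tv_shift(lam, ell):
    """Total variation distance between Po(lam) and ell + Po(lam),
    truncated far out in the upper tail."""
    cutoff = int(lam + 40 * math.sqrt(lam) + ell + 40)
    return 0.5 * sum(abs(pois_pmf(lam, j) -
                         (pois_pmf(lam, j - ell) if j >= ell else 0.0))
                     for j in range(cutoff))

ell = 3
d100 = tv_shift(100.0, ell)
d10000 = tv_shift(10000.0, ell)
# multiplying omega by 100 should shrink the distance by about a factor 10
assert d10000 < d100 / 5
assert d100 < 2 * ell / math.sqrt(100.0)
```

The same computation applied to the pair \((\varvec{X},\varvec{Y})\) versus \((\varvec{X}-\ell ,\varvec{Y}-\ell ')\) gives the \(O_\omega (\omega ^{-1/2})\) bound used above.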

Further, let \(\mathscr {A}'\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega }\) enjoys the following additional properties.

  1. (i’)

    The last \(\ell \) variable nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) satisfy \(\left| {\partial ^2\{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}{\setminus }{\hat{\mathscr {C}}}}\right| =\ell d(k-1).\) Hence, there are \(\ell d(k-1)\) distinct second neighbors, none of which is a cavity.

  2. (ii’)

    The last \(\ell '\) constraint nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) satisfy \(\left| {\partial \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}{\setminus }{\hat{\mathscr {C}}}}\right| =k\ell '\). Hence, there are \(k\ell ' \) distinct second neighbors, none of them a cavity.

  3. (iii’)

    We have \(\partial \{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cap \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}=\varnothing \).

  4. (iv’)

    Let

    $$\begin{aligned} \mathscr {U}={\hat{\mathscr {C}}}\cup \partial ^2\{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cup \partial \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}. \end{aligned}$$

    Then for any constraint node \(a\not \in \partial \{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cup \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}\) we have \(|\partial a\cap \mathscr {U}|\le 1\). Thus, only the constraint nodes adjacent to the last \(\ell \) variable nodes or the \(a_i\) with \(i>{\hat{M}}_{n,\omega }-\ell '\) may be adjacent to more than one variable node in \(\mathscr {U}\).

  5. (v’)

    All variable nodes \(u\in \mathscr {U}\) have degree d or \(d-1\).

Additionally, let \(\mathscr {A}''\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\) has the following properties.

  1. (i’’)

    All variable nodes have degree either d or \(d-1\).

  2. (ii’’)

    No two variable nodes of degree \(d-1\) are adjacent to the same constraint node.

Then

$$\begin{aligned} \mathbb {P}\left[{{\hat{{\mathbb {G}}}}_{n,\omega }\in \mathscr {A}'\mid \mathscr {A}}\right]&=1-o_\omega (1),&\mathbb {P}\left[{{\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\in \mathscr {A}''\mid \mathscr {A}}\right]&=1-o_\omega (1). \end{aligned}$$
(6.27)

Furthermore, given \(\mathscr {A}''\cap \mathscr {A}\), the random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}[\ell ,\ell ']\) obtained by attaching \(\ell \) new variable nodes and \(\ell '\) new constraint nodes is distributed precisely as \({\hat{{\mathbb {G}}}}_{n,\omega }\) given \(\mathscr {A}'\cap \mathscr {A}\). Indeed, the construction of the enhanced factor graph \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}[\ell ,\ell ']\) expressly ensures that (i’)–(iii’) are satisfied, and (iv’)–(v’) follow from (i”)–(ii”). Hence, (6.27) yields

$$\begin{aligned} d_{\mathrm {TV}}\left( {{\hat{{\mathbb {G}}}}_{n,\omega }\mid \mathscr {A},{\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}[\ell ,\ell ']\mid \mathscr {A}}\right) =o_\omega (1). \end{aligned}$$
(6.28)

Finally, since \(\mathbb {P}\left[{\mathscr {A}}\right]=1-o_\omega (1)\), the assertion follows from (6.26) and (6.28). \(\quad \square \)

Let \({\hat{\mathscr {C}}}[\ell ,\ell ']\) be the set of cavities of \({\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) and let \({\hat{\rho }}_{n,\omega }[\ell ,\ell ']\in \mathfrak {K}\) be the kernel representing \(\mu _{{\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right],{\hat{\mathscr {C}}}[\ell ,\ell ']}\). Let \({\hat{\pi }}_{n,\omega }\left[{\ell ,\ell '}\right]\in \mathfrak {D}\) be the distribution of \({\hat{\rho }}_{n,\omega }[\ell ,\ell ']\). Define \({\tilde{\mathscr {C}}}[\ell ,\ell ']\), \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) and \({\tilde{\pi }}_{n,\omega }[\ell ,\ell ']\) analogously for \({\hat{{\mathbb {G}}}}_{n,\omega }\).

Lemma 6.8

For any \(\ell ,\ell '\ge 0\) we have

$$\begin{aligned} \mathscr {D}_{\Box }\left( {{\hat{\pi }}_{n,\omega }^{*(\ell ,\ell ')},{\hat{\pi }}_{n,\omega }\left[{\ell ,\ell '}\right]}\right)&=o_\omega (1),&\mathscr {D}_{\Box }\left( {{\tilde{\pi }}_{n,\omega }^{*(\ell ,\ell ')},{\tilde{\pi }}_{n,\omega }\left[{\ell ,\ell '}\right]}\right) =o_\omega (1). \end{aligned}$$

Proof

The event \(\mathscr {A}_1=\left\{ {\omega /2\le \varvec{X},\varvec{Y}\le 2\omega }\right\} \) occurs with probability \(1-o_\omega (1)\). So does the event \(\mathscr {A}_2\) that all variable nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) have degree either d or \(d-1\), and thus the same is true of \(\mathscr {A}=\mathscr {A}_1\cap \mathscr {A}_2\). Moreover, the construction of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\) is such that on the event \(\mathscr {A}\) we have the exact formula

$$\begin{aligned} \frac{Z({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell '])}{Z({\hat{{\mathbb {G}}}}_{n,\omega })}&=\left\langle {{\prod _{i=1}^\ell \varphi _i(\varvec{\sigma })\prod _{i=1}^{\ell '}\psi _{{\hat{a}}_i}(\varvec{\sigma })},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}}}\right\rangle ,\qquad \text{ where }\\ \varphi _i(\sigma )&=\sum _{\chi \in \Omega }p(\chi )\prod _{j=1}^d\sum _{\tau \in \Omega ^{\partial {\hat{a}}_{i,j}}} \psi _{{\hat{a}}_{i,j}}(\tau )\varvec{1}\{\tau _{{\hat{v}}_i}=\chi ,\,\forall w\in \partial {\hat{a}}_{i,j}{\setminus }{\hat{v}}_i:\sigma _w=\tau _w \}. \end{aligned}$$

Consequently, the joint distribution of the cavities \({\tilde{\mathscr {C}}}[\ell ,\ell ']\) of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\) reads

$$\begin{aligned}&\mu _{{\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell '],{\tilde{\mathscr {C}}}[\ell ,\ell ']}(\sigma )\nonumber \\&\quad =\frac{ \left\langle {{\varvec{1}\{\forall u\in {\tilde{\mathscr {C}}}[\ell ,\ell ']:\varvec{\sigma }_u=\sigma _u\}\prod _{i=1}^\ell \varphi _i(\varvec{\sigma })\prod _{i=1}^{\ell '}\psi _{{\hat{a}}_i}(\varvec{\sigma })},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega }}}}\right\rangle }{\left\langle {{\prod _{i=1}^\ell \varphi _i(\varvec{\sigma })\prod _{i=1}^{\ell '}\psi _{{\hat{a}}_i}(\varvec{\sigma })},{\mu _{{\hat{{\mathbb {G}}}}_{n,\omega },{\tilde{\mathscr {C}}}}}}\right\rangle }\qquad (\sigma \in \Omega ^{{\tilde{\mathscr {C}}}[\ell ,\ell ']}). \end{aligned}$$
(6.29)

Thus, with probability \(1-o_\omega (1)\), namely on the event \(\mathscr {A}\), \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) is just the kernel representing the right hand side of (6.29). We claim that in this case \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) and \({\tilde{\rho }}_{n,\omega }^{*(\ell ,\ell ')}\) can be coupled to coincide with probability \(1-o_\omega (1)\). Indeed, the weight functions associated with the \({\hat{a}}_i\) and the \({\hat{a}}_{i,j}\) are chosen from P independently, and they are connected to the cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) by a random pairing. By comparison, we construct \({\tilde{\rho }}_{n,\omega }^{*(\ell ,\ell ')}\) by adjoining \(\varvec{\varphi }_1,\ldots ,\varvec{\varphi }_\ell \) and \(\varvec{\psi }_1,\ldots ,\varvec{\psi }_{\ell '}\) that evaluate the kernel \({\tilde{\rho }}_{n,\omega }\) at independent uniformly random points of the unit interval. Combinatorially, this is equivalent to attaching the new variable and constraint nodes to random cavities chosen with replacement, rather than without replacement as in the construction of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\). But since the number of cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) is \(\Omega _\omega (\omega )\), the two constructions have total variation distance \(o_\omega (1)\). \(\quad \square \)
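The with- versus without-replacement comparison at the end of the proof can be quantified exactly: for an ordered sample of k cavities out of N, the total variation distance between the two sampling schemes equals the collision probability \(1-(N)_k/N^k\). A minimal numerical sketch (Python, purely illustrative; the function name is ours, not notation of the paper):

```python
from math import prod

def tv_with_vs_without(N, k):
    """Exact total variation distance between an ordered sample of k
    cavities drawn uniformly with replacement and one drawn without
    replacement from N cavities: on collision-free tuples the
    without-replacement law dominates, so the distance equals the
    collision probability 1 - (N)_k / N^k."""
    falling = prod(N - i for i in range(k))
    return 1.0 - falling / N ** k

# With N = Omega_omega(omega) cavities and k fixed, the distance is o_omega(1):
for N in (10, 100, 1000, 10000):
    print(N, tv_with_vs_without(N, k=3))
```

Since the number of attached nodes stays bounded while the number of cavities grows with \(\omega \), this collision probability vanishes, which is all the coupling requires.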

Proof of Proposition 6.3

The proposition is immediate from Lemmas 6.7 and 6.8. \(\square \)

6.4 Proof of Proposition 6.1

We begin with the following lemma, whose proof is similar to the proof of Lemma 5.5.

Lemma 6.9

We have

$$\begin{aligned} \mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\hat{\pi }}_{n,\omega }}}\right\rangle -\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\hat{\pi }}_{n,\omega }}}\right\rangle }\right]&=\mathscr {B}({\hat{\pi }}_{n,\omega })+o_\omega (1),\\ \mathbb {E}\left[{\log \left\langle {{\varvec{\varphi }_1},{{\tilde{\pi }}_{n,\omega }}}\right\rangle -\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{{\tilde{\pi }}_{n,\omega }}}\right\rangle }\right]&=\mathscr {B}({\tilde{\pi }}_{n,\omega })+o_\omega (1). \end{aligned}$$

Proof

Let \(\pi ={\hat{\pi }}_{n,\omega }\) or \(\pi ={\tilde{\pi }}_{n,\omega }\). Since \(\mathbb {E}[M_{n,\omega }-{\hat{M}}_{n,\omega }]=d(k-1)/k+o_\omega (1)\), due to (2.1) it suffices to show that

$$\begin{aligned} \mathbb {E}\left[{\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]&=o_{\omega }(1)+\mathbb {E}[M_{n,\omega }-{\hat{M}}_{n,\omega }]\cdot \mathbb {E}\left[{\log \left\langle {{\varvec{\psi }_{1,1}},{\pi }}\right\rangle }\right]. \end{aligned}$$
(6.30)

Thus, we need to cope with the correlations between \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\) or \({\hat{{\mathbb {G}}}}_{n,\omega }\), respectively. In other words, we need to assess the correlations between \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) and \({\hat{N}}_{n,\omega }\), \({\hat{M}}_{n,\omega }\). With probability \(1-\exp (-\Omega _\omega (\omega ))\) we have

$$\begin{aligned} M_{n,\omega }-{\hat{M}}_{n,\omega }=W,\quad \text{ where }\quad W=\lfloor d N_{n,\omega }/k\rfloor -\lfloor d(1+N_{n,\omega })/k\rfloor +\Delta _{n,\omega }-\Delta _{n+1,\omega }+d, \end{aligned}$$
(6.31)

with independent Bernoulli variables \(\Delta _{n,\omega },\Delta _{n+1,\omega }\). Thus, \(W\le d+1\) and (2.1) ensures that

$$\begin{aligned}&\mathbb {E}\left[{\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]= \mathbb {E}\left[{\log \left\langle {{\bigoplus _{i=1}^W\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]+o_\omega (1)\nonumber \\&\quad =\mathbb {E}\left[{\varvec{1}\left\{ {\left| {{\hat{N}}_{n,\omega }-\mathbb {E}[{\hat{N}}_{n,\omega }]}\right| ,\left| {{\hat{M}}_{n,\omega }-\mathbb {E}[{\hat{M}}_{n,\omega }]}\right| \le \sqrt{\omega \log \omega }}\right\} \log \left\langle {{\bigoplus _{i=1}^W\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]+o_\omega (1). \end{aligned}$$
(6.32)

Furthermore, since \(\varvec{X},\varvec{Y}\) are independent Poisson variables with mean \(\omega \) while W is bounded, for any \({\hat{n}},{\hat{m}}\) such that \(|{\hat{n}}-\mathbb {E}[{\hat{N}}_{n,\omega }]|,|{\hat{m}}-\mathbb {E}[{\hat{M}}_{n,\omega }]|\le \sqrt{\omega \log \omega }\) we obtain from (6.31) that

$$\begin{aligned} \mathbb {P}\left[{{\hat{M}}_{n,\omega }={\hat{m}}\mid {\hat{N}}_{n,\omega }={\hat{n}}}\right]&=(1+o_\omega (1))\mathbb {P}\left[{{\hat{M}}_{n,\omega }={\hat{m}}\mid N_{n,\omega }={\hat{n}},\, W=h}\right]\nonumber \\&\qquad \text{ for } \text{ any } 0\le h\le d+1. \end{aligned}$$

Hence, introducing an independent copy \(W'\) of W, we obtain from (2.1) and (6.32) that

$$\begin{aligned} \mathbb {E}\left[{\log \left\langle {{\bigoplus _{{\hat{M}}_{n,\omega }<i\le M_{n,\omega }}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]&=\mathbb {E}\left[{\log \left\langle {{\bigoplus _{i=1}^{W'}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }\right]+o_\omega (1). \end{aligned}$$
(6.33)

Additionally, we claim that for any \(0\le w\le d+1\),

$$\begin{aligned} \mathbb {E}\left[{\log \frac{\left\langle {{\bigoplus _{i=1}^{w+1}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{w}\varvec{\psi }_{1,i}},{\pi }}\right\rangle }}\right]&=\mathbb {E}\left[{\log \left\langle {{\varvec{\psi }_{1,1}},{\pi }}\right\rangle }\right]+o_\omega (1). \end{aligned}$$
(6.34)

Indeed, as in the proof of Lemma 5.5 we obtain

$$\begin{aligned} \mathbb {E}\left[{\left( {\frac{\left\langle {{\bigoplus _{i=1}^{w+1}\varvec{\psi }_i},{\pi }}\right\rangle }{\left\langle {{\bigoplus _{i=1}^{w}\varvec{\psi }_i},{\pi }}\right\rangle }}\right) ^\ell }\right]&=\mathbb {E}\left[{\left\langle {{\varvec{\psi }_{w+1}},{\bigoplus _{i=1}^w\varvec{\psi }_i*\pi }}\right\rangle ^\ell }\right]. \end{aligned}$$
(6.35)

Further, (2.1), Corollary 3.4 and Proposition 6.3 yield

$$\begin{aligned} \mathbb {E}\left[{\left\langle {{\varvec{\psi }_{w+1}},{\bigoplus _{i=1}^w\varvec{\psi }_i*\pi }}\right\rangle ^\ell }\right]&=\mathbb {E}\left[{\left\langle {{\varvec{\psi }_{w+1}},{\pi }}\right\rangle ^\ell }\right] +o_\omega (1). \end{aligned}$$
(6.36)

As the logarithm can be approximated arbitrarily well by polynomials due to (2.1), (6.34) follows from (6.35)–(6.36). Finally, (6.30) follows from (6.33) and (6.34). \(\quad \square \)

Proof of Proposition 6.1

Proposition 6.3 and Lemma 6.9 show that for any \(\ell \ge 1\) there exists \(\omega _{\ell }>\omega _{\ell -1}\) such that for all sufficiently large n we have \({\hat{\pi }}_{n,\omega _\ell }\in \mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\) and

$$\begin{aligned} \frac{1}{n} \mathbb {E}\log Z ({\varvec{G}})&\ge \mathscr {B}({\hat{\pi }}_{n,\omega _\ell })-1/\ell . \end{aligned}$$
(6.37)

Since \(\mathfrak {D}\) is compact, the sequence \(({\hat{\pi }}_{n,\omega _\ell })_n\) has a convergent subsequence, whose limit \({\hat{\pi }}^{(\ell )}\) lies in the closed set \(\mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\). Furthermore, because Lemma 3.3 shows that \(\mathscr {B}(\,\cdot \,)\) is continuous, (6.37) yields

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{1}{n} \mathbb {E}\log Z ({\varvec{G}})&\ge \mathscr {B}({\hat{\pi }}^{(\ell )})-1/\ell . \end{aligned}$$
(6.38)

Additionally, \(({\hat{\pi }}^{(\ell )})_\ell \) has a subsequence that converges to \({\tilde{\pi }}\in \mathfrak {D}^\star =\bigcap _\ell \mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\). Thus, the first assertion follows from (6.38) and the continuity of \(\mathscr {B}(\,\cdot \,)\) established by Corollary 3.5.

The second assertion concerning the simple random factor graph \({\mathbb {G}}\) is immediate from the first. Indeed, due to (2.1) a standard application of Azuma’s inequality shows that \(n^{-0.51}|\log Z({\varvec{G}})-\mathbb {E}\log Z({\varvec{G}})|\rightarrow 0\) in probability. Hence, Fact 2.2 and Bayes’ rule imply that \(\mathbb {E}\log Z({\mathbb {G}})-\mathbb {E}\log Z({\varvec{G}})=o(n)\). \(\quad \square \)
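The Azuma-based concentration step can be made quantitative. Assuming, as the boundedness condition (2.1) guarantees, that each revealed coordinate shifts \(\log Z({\varvec{G}})\) by at most a constant c, the Azuma-Hoeffding inequality bounds the deviation probability, and any normalization \(n^{-\alpha }\) with \(\alpha >1/2\) (such as the exponent 0.51 used above) then forces convergence to zero in probability. A hedged numeric sketch, not part of the proof:

```python
import math

def azuma_tail(n, c, t):
    """Azuma-Hoeffding bound for a martingale with n increments, each
    bounded by c in absolute value: P(|M_n - M_0| >= t) <= 2 exp(-t^2 / (2 n c^2))."""
    return 2.0 * math.exp(-t ** 2 / (2 * n * c ** 2))

# For t = n**0.6 the bound is 2*exp(-n**0.2 / 2), which decays visibly;
# for t = n**0.51 it is 2*exp(-n**0.02 / 2), which still tends to zero,
# only far more slowly.
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    print(n, azuma_tail(n, c=1.0, t=n ** 0.6))
```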

6.5 Proof of Theorem 2.6

We begin by showing that the free energy of \({\varvec{G}}\) can be expressed in terms of the functional \(\mathscr {B}\) applied to \({\hat{\pi }}_{n,\omega }\) or \({\tilde{\pi }}_{n,\omega }\), respectively. Once more we will carry out the details for \({\mathbb {G}}\); the unconditioned random factor graph \({\varvec{G}}\) is easier to deal with, and the proofs are obtained from the \({\mathbb {G}}\) case simply by dropping any considerations regarding multiple edges.

Lemma 6.10

If POS is satisfied, then

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\varvec{G}})=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\hat{\pi }}_{n,\omega }),\\&\quad \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega }). \end{aligned}$$

Proof

Proposition 6.2 and Lemma 6.9 show that for any \(\varepsilon >0\) there exists \(\omega _0\) such that for all \(\omega >\omega _0\) there exists \(n_0\) such that for all \(n>n_0\) we have \(n^{-1}\mathbb {E}\log Z({\mathbb {G}})\ge \mathscr {B}({\tilde{\pi }}_{n,\omega })-\varepsilon .\) Hence, for any \(\varepsilon >0\) there is \(\omega _0>0\) such that for all \(\omega >\omega _0\) we have

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})&\ge \liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega })-\varepsilon . \end{aligned}$$
(6.39)

Hence, since Propositions 5.1 and 6.1 show that \((\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}}))_n\) converges, (6.39) yields

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})=\liminf _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})&\ge \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega }). \end{aligned}$$
(6.40)

We are left to prove the converse inequality. The space \(\mathfrak {D}\) is compact and separable. Therefore, for any \(\omega \) the sequence \(({\tilde{\pi }}_{n,\omega })_n\) has a subsequence that converges to \({\tilde{\pi }}^{(\omega )}\in \mathfrak {D}\) such that \(\liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega })=\mathscr {B}({\tilde{\pi }}^{(\omega )})\). Further, \(({\tilde{\pi }}^{(\omega )})_\omega \) has a subsequence that converges to \({\tilde{\pi }}^*\) such that

$$\begin{aligned} \liminf _{\omega \rightarrow \infty }\mathscr {B}({\tilde{\pi }}^{(\omega )})&=\mathscr {B}({\tilde{\pi }}^{*}). \end{aligned}$$
(6.41)

Proposition 6.3 shows that \({\tilde{\pi }}^{*}\in \mathfrak {D}^\star \). Hence, Proposition 5.1 implies that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})&\le \mathscr {B}({\tilde{\pi }}^{*})= \liminf _{\omega \rightarrow \infty }\mathscr {B}({\tilde{\pi }}^{(\omega )})= \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega }). \end{aligned}$$
(6.42)

Thus, the assertion follows from (6.40)–(6.42). \(\quad \square \)

To proceed we need a small twist on Lemma 6.10. Namely, instead of using \({\hat{{\mathbb {G}}}}_{n,\omega }\) as our reference point, we are going to work with \({\mathbb {G}}_{n,\omega }\). Thus, let \(\mathscr {C}\) be the set of cavities of \({\mathbb {G}}_{n,\omega }\) and let \(\rho _{n,\omega ,\mathscr {S}}\in \mathfrak {K}\) be the kernel representing \(\mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}}\). Further, let \(\pi _{n,\omega ,\mathscr {S}}\in \mathfrak {D}\) be the distribution of \(\rho _{n,\omega ,\mathscr {S}}\). Define \(\rho _{n,\omega }\), \(\pi _{n,\omega }\) analogously with respect to \({\varvec{G}}_{n,\omega }\).

Corollary 6.11

If POS is satisfied, then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\varvec{G}})&=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}(\pi _{n,\omega }),\\ \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})&=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}(\pi _{n,\omega ,\mathscr {S}}). \end{aligned}$$

Proof

Since \(\varvec{X},\varvec{Y}\) are Poisson variables with a large mean \(\omega \), \({\hat{M}}_{n,\omega }\) and \(M_{n,\omega }\) can be coupled so that both coincide with probability \(1-o_\omega (1)\). This coupling naturally extends to a coupling of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }\) under which \({\mathbb {G}}_{n,\omega }={\hat{{\mathbb {G}}}}_{n,\omega }\) with probability \(1-o_\omega (1)\). Consequently, recalling that \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) stands for the Wasserstein metric on \(\mathfrak {D}\), we have \(\mathscr {D}_{\Box }(\pi _{n,\omega ,\mathscr {S}},{\tilde{\pi }}_{n,\omega })=o_\omega (1)\). Thus, the assertion follows from Corollary 3.5. \(\quad \square \)
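The coupling argument rests on the standard fact that two random variables can be coupled so as to coincide with probability exactly \(1-d_{\mathrm {TV}}\) (a maximal coupling). A toy numeric sketch (Python; taking two Poisson laws with nearby large means as a stand-in for \({\hat{M}}_{n,\omega }\) and \(M_{n,\omega }\) is our simplifying assumption, not a claim from the paper):

```python
import math

def poisson_pmf(k, lam):
    # numerically stable Poisson pmf, evaluated in log-space
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def coincidence_probability(p, q):
    """A maximal coupling of the pmfs p and q puts mass min(p_k, q_k) on
    the diagonal, so its coordinates coincide with probability
    sum_k min(p_k, q_k) = 1 - d_TV(p, q)."""
    keys = set(p) | set(q)
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)

# As omega grows, the coincidence probability tends to one:
for omega in (10, 100, 1000):
    support = range(4 * omega)
    p = {k: poisson_pmf(k, omega) for k in support}
    q = {k: poisson_pmf(k, omega + 1) for k in support}
    print(omega, coincidence_probability(p, q))
```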

We recall the construction of the kernel \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\in \mathfrak {K}\) from (2.14). Let \({\check{\pi }}_{n,\omega }\in \mathfrak {D}\) be the distribution of \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\), and define \({\check{\mu }}_{{\mathbb {G}},\varvec{X},\varvec{Y}}\in \mathfrak {K}\), \({\check{\pi }}_{n,\omega ,\mathscr {S}}\in \mathfrak {D}\) analogously with respect to \({\mathbb {G}}\). Due to the inevitable divisibility condition required to construct a regular factor graph, these kernels are defined whenever k|dn. The following proposition summarizes the main step toward the proof of Theorem 2.6.

Proposition 6.12

For any \(\alpha >0\) there is \(\omega _0>0\) such that for every \(\omega >\omega _0\) there exists \(n_0>0\) such that for all \(n>n_0\) with k|dn we have

$$\begin{aligned} \mathscr {D}_{\Box }({\check{\pi }}_{n,\omega },\pi _{n,\omega })&<\alpha ,&\mathscr {D}_{\Box }({\check{\pi }}_{n,\omega ,\mathscr {S}},\pi _{n,\omega ,\mathscr {S}})&<\alpha . \end{aligned}$$

To prove Proposition 6.12 we let \(\mathscr {V}=\{v_{i}:i>N_{n,\omega }\}\) and \(\mathscr {A}=\{a_i:i>M_{n,\omega }\}\cup \bigcup _{v\in \mathscr {V}}\partial v\) be the sets of variable and constraint nodes, respectively, that are present in \({\mathbb {G}}\) but not in \({\mathbb {G}}_{n,\omega }\). Similarly as in Sect. 6.3, conditioning on the event that \(kM_{n,\omega }\le dN_{n,\omega }-d(k-1)\varvec{X}-k\varvec{Y}\), we define an enhanced random factor graph \({\mathbb {G}}^{\#}\) by

  • adding the variable nodes \(\mathscr {V}\) to \({\mathbb {G}}_{n,\omega }\) along with \(d\varvec{X}\) new constraint nodes \(a_{v,j}^{\#}\), \(v\in \mathscr {V}\), \(j\in [d]\). Each \(a_{v,j}^{\#}\) is adjacent to v and \(k-1\) random cavities of \({\mathbb {G}}_{n,\omega }\),

  • adding \(\varvec{Y}\) more constraint nodes \(a_1^{\#},\ldots ,a_{\varvec{Y}}^{\#}\), each connected with k random cavities of \({\mathbb {G}}_{n,\omega }\).

Of course, the cavities in the above construction are drawn without replacement and all weight functions are chosen from P independently. We do not require that the outcome \({\mathbb {G}}^{\#}\) be simple. Let

$$\begin{aligned} \mathscr {A}^{\#}=\{a_{v,j}^{\#}:v\in \mathscr {V},j\in [d]\}\cup \{a_i^{\#}:i\le \varvec{Y}\} \end{aligned}$$

comprise the new constraint nodes.
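The two-step enhancement can be phrased as a short procedure. The sketch below (Python; the data representation is entirely hypothetical and only records the wiring) attaches, for each of the \(\varvec{X}\) new variable nodes, d private constraint nodes wired to \(k-1\) cavities each, and then \(\varvec{Y}\) further constraint nodes wired to k cavities each, with all cavity slots drawn without replacement:

```python
import random

def attach_new_nodes(cavities, X, Y, d, k, rng):
    """Sketch of the construction of G^#: returns the list of new
    constraint nodes, each as (label, attached new variable or None,
    list of cavities used), together with the cavities left untouched."""
    free = list(cavities)
    rng.shuffle(free)  # popping from a shuffled list = sampling without replacement
    new_constraints = []
    for v in range(X):
        for j in range(d):
            # a_{v,j}^# is adjacent to the new variable v and k-1 cavities
            new_constraints.append((("a#", v, j), v, [free.pop() for _ in range(k - 1)]))
    for i in range(Y):
        # a_i^# is connected to k cavities
        new_constraints.append((("a#", i), None, [free.pop() for _ in range(k)]))
    return new_constraints, free

rng = random.Random(0)
cons, left = attach_new_nodes(cavities=range(1000), X=5, Y=7, d=3, k=4, rng=rng)
print(len(cons), 1000 - len(left))  # prints "22 73": 5*3+7 constraints, 15*3+7*4 cavities used
```

Because the cavities are popped from a shuffled list, no cavity is used twice, matching the without-replacement requirement of the construction.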

Lemma 6.13

We have \(d_{\mathrm {TV}}({\mathbb {G}},{\mathbb {G}}^{\#})=o_\omega (1)\).

Proof

Similarly as in the proof of Lemma 6.7, we consider the event \({{\mathscr {E}}}=\{\omega /2\le \varvec{X}\le 2\omega ,\, \omega /2\le \varvec{Y}\le 2\omega \}\), which has probability \(1-o_\omega (1)\). Further, let \({{\mathscr {E}}}'\) be the event that \({\mathbb {G}}\) enjoys the following additional properties.

  1. (i’)

    We have \(|\partial ^2\mathscr {V}|=|\mathscr {V}|d(k-1).\)

  2. (ii’)

    \(|\partial \{a_{ M_{n,\omega }+1},\ldots ,a_{m}\}|=k(m-M_{n,\omega })\) and \(\partial \mathscr {V}\cap \{a_{M_{n,\omega }+1},\ldots ,a_{ m}\}=\varnothing \).

  3. (iii’)

    If \(a\not \in \mathscr {A}\), then a is connected to the set \(\partial \mathscr {A}\) by at most one edge.

Additionally, let \({{\mathscr {E}}}''\) be the event that \({\mathbb {G}}_{n,\omega }\) has the following properties.

  1. (i”)

    All variable nodes have degree either d or \(d-1\).

  2. (ii”)

    No two cavities are adjacent to the same constraint node.

We have

$$\begin{aligned} \mathbb {P}\left[{{{\mathscr {E}}}}\right]&=1-o_\omega (1),&\mathbb {P}\left[{{\mathbb {G}}\in {{\mathscr {E}}}'}\right]&=1-o(1),&\mathbb {P}\left[{{\mathbb {G}}_{n,\omega }\in {{\mathscr {E}}}''}\right]&=1-o(1). \end{aligned}$$
(6.43)

Moreover, given \({{\mathscr {E}}}\cap {{\mathscr {E}}}'\) the random factor graph \({\mathbb {G}}\) is distributed precisely as \({\mathbb {G}}^{\#}\) given \({{\mathscr {E}}}\cap {{\mathscr {E}}}''\). Thus, the assertion follows from (6.43). \(\quad \square \)

Due to Lemma 6.13 we can apply Theorem 2.5 to \({\mathbb {G}}^{\#}\). Let \(S_1^{\#},\ldots ,S_\ell ^{\#}\) denote the resulting Bethe state decomposition of \({\mathbb {G}}^{\#}\). Let \(T_i^{\#}=S_i^{\#}\cap \Omega ^{V_n{\setminus }\mathscr {V}}\) for \(i\in [\ell ]\). Further, we introduce

$$\begin{aligned} \varvec{z}_{{\mathbb {G}}^\#,i}&= \left\langle {{\varvec{1}\{\varvec{\sigma }\in S_{i}^\#\}\bigg / \sum _{\tau \in \Omega ^{\mathscr {V}}} \prod _{v\in \mathscr {V}}p(\tau _{v})\prod _{a\in \mathscr {A}^\#}\psi _a(\varvec{\sigma }_{V_n{\setminus }\mathscr {V}},\tau )},{\mu _{{\mathbb {G}}^\#}}}\right\rangle ,\nonumber \\ \mu _{{\mathbb {G}}^\#,i}(\sigma )&=\frac{\mu _{{\mathbb {G}}^\#}(\sigma )\varvec{1}\{\sigma \in T_i^\#\}}{\varvec{z}_{{\mathbb {G}}^\#,i} \sum _{\tau \in \Omega ^{\mathscr {V}}} \prod _{v\in \mathscr {V}}p(\tau _{v})\prod _{a\in \mathscr {A}^\#}\psi _a(\sigma ,\tau )}&(\sigma \in \Omega ^{V_n{\setminus }\mathscr {V}}). \end{aligned}$$
(6.44)

Thus, \(\mu _{{\mathbb {G}}^\#,i}\in \mathscr {P}(\Omega ^{V_n{\setminus }\mathscr {V}})\).

Lemma 6.14

With probability \(1-o_\omega (1)\) the sets \(T_1^\#,\ldots ,T_\ell ^\#\) are pairwise disjoint and we have

$$\begin{aligned} \mu _{{\mathbb {G}}^\#,i}(\tau )&=\mu _{{\mathbb {G}}_{n,\omega }}(\tau |T_i^\#)\qquad \text{ for } \text{ all } \tau \in T_i^\# \text{ and }&\mu _{{\mathbb {G}}_{n,\omega }}(T_i^\#)&=\varvec{z}_{{\mathbb {G}}^\#,i}\big /\sum _{j=1}^\ell \varvec{z}_{{\mathbb {G}}^\#,j}. \end{aligned}$$
(6.45)

Proof

We recall from Sect. 4.1 that the decomposition \(S_1^\#,\ldots ,S_\ell ^\#\) is constructed by pinning the values of a random set \(\varvec{U}_*\) of variables to specific spins. Since the size of this set is bounded, with high probability we have \((\mathscr {C}\cup \mathscr {V})\cap \varvec{U}_*=\varnothing \). We will prove that in this case, \(\mu _{{\mathbb {G}}^\#,i}(\sigma )=\mu _{{\mathbb {G}}_{n,\omega }}(\sigma |T_i^\#)\) for all \(i,\sigma \).

If \((\mathscr {C}\cup \mathscr {V})\cap \varvec{U}_*=\varnothing \), then \(T_1^\#,\ldots ,T_\ell ^\#\) are pairwise disjoint. Thus, fix \(i\in [\ell ]\) and \(\sigma \in T_i^\#\). Then by the construction of \({\mathbb {G}}^\#\),

$$\begin{aligned} \mu _{{\mathbb {G}}_{n,\omega }}(\sigma )&=\frac{Z({\mathbb {G}}^\#)}{Z({\mathbb {G}}_{n,\omega })}\cdot \frac{\mu _{{\mathbb {G}}^\#}(\sigma )}{\sum _{\tau \in \Omega ^{\mathscr {V}}}\prod _{v\in \mathscr {V}}p(\tau _v)\prod _{a\in \mathscr {A}^\#}\psi _a(\sigma ,\tau )}, \end{aligned}$$
(6.46)
$$\begin{aligned} \frac{Z({\mathbb {G}}_{n,\omega })}{Z({\mathbb {G}}^\#)}&=\left\langle {{1\bigg /\sum _{\tau \in \Omega ^{\mathscr {V}}}\prod _{v\in \mathscr {V}}p(\tau _v)\prod _{a\in \mathscr {A}^\#}\psi _a(\varvec{\sigma },\tau )},{\mu _{{\mathbb {G}}^\#}}}\right\rangle =\sum _{j=1}^\ell \varvec{z}_{{\mathbb {G}}^\#,j}. \end{aligned}$$
(6.47)

Combining (6.46) and (6.47), we obtain the second identity in (6.45). Further, combining the second part of (6.45) with (6.46) and (6.47), we find

$$\begin{aligned} \mu _{{\mathbb {G}}_{n,\omega }}(\sigma \mid T_i^\#)&= \frac{\mu _{{\mathbb {G}}_{n,\omega }}(\sigma )}{\mu _{{\mathbb {G}}_{n,\omega }}(T_i^\#)} =\frac{\sum _{j=1}^\ell \varvec{z}_{{\mathbb {G}}^\#,j}}{\varvec{z}_{{\mathbb {G}}^\#,i}} \cdot \frac{Z({\mathbb {G}}^\#)}{Z({\mathbb {G}}_{n,\omega })}\cdot \varvec{z}_{{\mathbb {G}}^\#,i}\cdot \mu _{{\mathbb {G}}^{\#},i}(\sigma )=\mu _{{\mathbb {G}}^{\#},i}(\sigma ), \end{aligned}$$

thereby establishing the first part of (6.45). \(\quad \square \)

W.h.p. each cavity \(v\in \mathscr {C}\) of \({\mathbb {G}}_{n,\omega }\) has degree \(d-1\). In this case, we denote by \(b_v\) the unique neighbour of v in \({\mathbb {G}}^\#\) that is not present in \({\mathbb {G}}_{n,\omega }\). Further, for \(i\in [\ell ]\) let \(\nu _{{\mathbb {G}}^\#,i}\in \mathscr {P}(\Omega ^{\mathscr {C}})\) be the product measure

$$\begin{aligned} \nu _{{\mathbb {G}}^\#,i}=\bigotimes _{v\in \mathscr {C}}\mu _{{\mathbb {G}}^\#,v\rightarrow b_v}(\,\cdot \,|S_i^\#). \end{aligned}$$

In close analogy to the weights introduced in (2.13), we also define

$$\begin{aligned} {\check{\varvec{z}}}_{i}^\#&=\mu _{{\mathbb {G}}^\#}(S_i^\#)\cdot \prod _{v\in \mathscr {V}}\left( {\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _{v}=\chi \}\psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\tau _w|S_i^\#)}\right) ^{-1} \nonumber \\&\quad \cdot \prod _{j=1}^{\varvec{Y}}\left( {\sum _{\tau \in \Omega ^{\partial a_j^\#}}\psi _{a_j^\#}(\tau )\prod _{w\in \partial a_j^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_j^\#}(\tau _w|S_i^\#)}\right) ^{-1}. \end{aligned}$$
(6.48)

Lemma 6.15

With probability \(1-o_\omega (1)\) we have \(\sum _{h=1}^\ell \left| {\varvec{z}_{{\mathbb {G}}^\#,h}-{\check{\varvec{z}}}_{h}^\#}\right| =o(1)\) and

$$\begin{aligned} \sum _{h=1}^\ell \mu _{{\mathbb {G}}_{n,\omega }}(T_h^\#)\left\| {\mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}}(\,\cdot \,|T_h^\#)-\nu _{{\mathbb {G}}^\#,h}}\right\| _{\mathrm {TV}}&=o(1). \end{aligned}$$

Proof

Fix \(h\in [\ell ]\) and suppose that \(S_h^\#\) is an o(1)-Bethe state, which occurs with probability \(1-o_\omega (1)\) due to Theorem 2.5 and Lemma 6.13. Then by BS2 w.h.p. we have for any \(\sigma \in \Omega ^{\mathscr {V}\cup \mathscr {C}}\),

$$\begin{aligned} \mu _{{\mathbb {G}}^\#}(\sigma |S_h^\#)&\sim \prod _{v\in \mathscr {V}}\frac{p(\sigma _{v})\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\sigma _w|S_h^\#)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _v=\chi \}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\tau _w|S_h^\#)}\nonumber \\&\quad \cdot \prod _{i=1}^{\varvec{Y}}\frac{\psi _{a_i^\#}(\sigma )\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\sigma _w|S_h^\#)}{\sum _{\tau \in \Omega ^{\partial a_i^\#}}\psi _{a_i^\#}(\tau )\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\tau _w|S_h^\#)}. \end{aligned}$$
(6.49)

Further, w.h.p. each cavity of \({\mathbb {G}}_{n,\omega }\) has degree \(d-1\), so that the neighbours \(b_v\) introduced above are well defined. Then by (6.49) w.h.p. we have

$$\begin{aligned} \varvec{z}_{{\mathbb {G}}^\#,h}&=\mu _{{\mathbb {G}}^\#}(S_h^\#) \left\langle {{1\bigg / \sum _{\tau \in \Omega ^{\mathscr {V}}} \prod _{v\in \mathscr {V}}p(\tau _v)\prod _{a\in \mathscr {A}^\#}\psi _a(\varvec{\sigma }_{V_n{\setminus }\mathscr {V}},\tau )},{\mu _{{\mathbb {G}}^\#}(\,\cdot \,|S_h^\#)}}\right\rangle \nonumber \\&\quad \sim \mu _{{\mathbb {G}}^\#}(S_h^\#)\sum _{\sigma \in \Omega ^{\mathscr {V}\cup \mathscr {C}}} \prod _{v\in \mathscr {V}}\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^{\#},w\rightarrow a}(\sigma _w|S_h^\#)}{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma ) \cdot \sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\chi =\tau _v\} \psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^{\#},w\rightarrow a}(\tau _w|S_h^\#)} \nonumber \\&\qquad \cdot \prod _{i=1}^{\varvec{Y}}\frac{\psi _{a_i^{\#}}(\sigma )\prod _{w\in \partial a_i^{\#}}\mu _{{\mathbb {G}}^{\#},w\rightarrow a_i^{\#}}(\sigma _w|S_h^\#)}{\psi _{a_i^{\#}}(\sigma )\cdot \sum _{\tau \in \Omega ^{\partial a_i^{\#}}}\psi _{a_i^{\#}}(\tau )\prod _{w\in \partial a_i^{\#}}\mu _{{\mathbb {G}}^{\#},w\rightarrow a_i^{\#}}(\tau _w|S_h^\#)} \nonumber \\&\quad =\mu _{{\mathbb {G}}^\#}(S_h^\#)\prod _{v\in \mathscr {V}}\frac{\sum _{\sigma \in \Omega ^{\partial ^2v}}\prod _{a\in \partial v}\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^{\#},w\rightarrow a}(\sigma _w|S_h^\#)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\chi =\tau _v\} \psi _a(\tau )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^{\#},w\rightarrow a}(\tau _w|S_h^\#)}\nonumber \\&\qquad \cdot \prod _{i=1}^{\varvec{Y}}\frac{\sum _{\sigma \in \Omega ^{\partial a_i^{\#}}}\prod _{w\in \partial a_i^{\#}}\mu _{{\mathbb {G}}^{\#},w\rightarrow a_i^{\#}}(\sigma _w|S_h^\#)}{\sum _{\tau \in \Omega ^{\partial a_i^{\#}}}\psi _{a_i^{\#}}(\tau )\prod _{w\in \partial a_i^{\#}}\mu _{{\mathbb {G}}^{\#},w\rightarrow a_i^{\#}}(\tau _w|S_h^\#)}\nonumber \\&\quad ={\check{\varvec{z}}}_{h}^{\#}. \end{aligned}$$
(6.50)

Summing on h completes the proof of the first assertion.

With respect to the second assertion, for \(\sigma \in \Omega ^{\mathscr {C}}\) we have w.h.p.

$$\begin{aligned} \mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}}(\sigma |T_h^{\#})&= \mu _{{\mathbb {G}}^\#,h}(\sigma )&\text{[by Lemma 6.14]}\\&=\mu _{{\mathbb {G}}^{\#}}(S_h^{\#})\cdot \frac{\mu _{{\mathbb {G}}^\#}(\sigma |S_h^{\#})}{\varvec{z}_{{\mathbb {G}}^\#,h} \sum _{\tau \in \Omega ^{\mathscr {V}}} \prod _{v\in \mathscr {V}}p(\tau _{v})\prod _{a\in \mathscr {A}^\#}\psi _a(\sigma ,\tau )}&\text{[by (6.44)]}\\&\sim \frac{\mu _{{\mathbb {G}}^{\#}}(S_h^{\#})}{\varvec{z}_{{\mathbb {G}}^\#,h}} \left( {\sum _{\tau \in \Omega ^{\mathscr {V}}} \prod _{v\in \mathscr {V}}p(\tau _{v})\prod _{a\in \mathscr {A}^\#}\psi _a(\sigma ,\tau )}\right) ^{-1}\\&\quad \cdot \prod _{i=1}^{\varvec{Y}}\frac{\psi _{a_i^\#}(\sigma )\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\sigma _w|S_h^\#)}{\sum _{\tau \in \Omega ^{\partial a_i^\#}}\psi _{a_i^\#}(\tau )\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\tau _w|S_h^\#)}\\&\quad \cdot \prod _{v\in \mathscr {V}}\frac{p(\sigma _{v})\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\sigma _w|S_h^\#)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _v=\chi \}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\tau _w|S_h^\#)}&\text{[by (6.49)]}\\&=\frac{\mu _{{\mathbb {G}}^{\#}}(S_h^{\#})}{\varvec{z}_{{\mathbb {G}}^\#,h}} \prod _{i=1}^{\varvec{Y}}\frac{\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\sigma _w|S_h^\#)}{\sum _{\tau \in \Omega ^{\partial a_i^\#}}\psi _{a_i^\#}(\tau )\prod _{w\in \partial a_i^\#}\mu _{{\mathbb {G}}^\#,w\rightarrow a_i^\#}(\tau _w|S_h^\#)}\\&\quad \cdot \prod _{v\in \mathscr {V}}\frac{\prod _{a\in \partial v}\prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\sigma _w|S_h^\#)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\varvec{1}\{\tau _v=\chi \}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{{\mathbb {G}}^\#,w\rightarrow a}(\tau _w|S_h^\#)}\\&=\nu _{{\mathbb {G}}^\#,h}(\sigma ),&\text{[by (6.50)]} \end{aligned}$$

as claimed. \(\quad \square \)

Proof of Proposition 6.12

Let \(\nu _{{\mathbb {G}}^{\#}}=\sum _{i=1}^\ell {\check{\varvec{z}}}^{\#}_i\nu _{{\mathbb {G}}^{\#},i}\big /\sum _{j=1}^\ell {\check{\varvec{z}}}^{\#}_j\) and let \(\pi ^\#_{n,\omega ,\mathscr {S}}\) be the distribution of the kernel representation \({\dot{\nu }}_{{\mathbb {G}}^{\#}}\in \mathfrak {K}\). Then up to a renumbering of the variable and constraint nodes, \({\check{\mu }}_{{\mathbb {G}}^{\#},\varvec{X},\varvec{Y}}\in \mathfrak {K}\) is distributed as the representation of \(\nu _{{\mathbb {G}}^{\#}}\). Specifically, in (2.14) we renumbered the nodes such that \(\mathscr {V}\) comprises the first \(\varvec{X}\) variable nodes and such that the \(a_i^{\#}\), \(i\in [\varvec{Y}]\), are the first \(\varvec{Y}\) constraint nodes. Due to Lemma 6.13 and because \({\mathbb {G}}\) and \({\varvec{G}}\) are invariant under node permutations, we conclude that \(\mathscr {D}_{\Box }(\pi ^\#_{n,\omega ,\mathscr {S}},{\check{\pi }}_{n,\omega ,\mathscr {S}})=o_\omega (1)\). Furthermore, combining Lemmas 3.14, 6.14 and 6.15, we see that \(\mathbb {E}[\Delta _{\Box }(\mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}},\nu _{{\mathbb {G}}^{\#}})]=o_\omega (1)\). Hence, invoking (3.4), we conclude that \(\mathbb {E}[\mathscr {D}_{\Box }(\rho _{n,\omega ,\mathscr {S}},{\dot{\nu }}_{{\mathbb {G}}^{\#}})]=o_\omega (1)\). Thus, the triangle inequality yields \(\mathscr {D}_{\Box }(\pi _{n,\omega ,\mathscr {S}},{\check{\pi }}_{n,\omega ,\mathscr {S}})=o_\omega (1)\). The same argument applies to \(\pi _{n,\omega }\) and \({\check{\pi }}_{n,\omega }\). \(\quad \square \)

As a final preparation toward the proof of Theorem 2.6, we need the following simple lemma.

Lemma 6.16

For any fixed integer \(\ell \) we have

$$\begin{aligned} \mathscr {D}_{\Box }(\pi _{n,\omega },\pi _{n+\ell ,\omega })&=o_\omega (1),&\mathscr {D}_{\Box }(\pi _{n,\omega ,\mathscr {S}},\pi _{n+\ell ,\omega ,\mathscr {S}})&=o_\omega (1). \end{aligned}$$

Proof

The random factor graph \({\mathbb {G}}_{n,\omega }\) or \({\varvec{G}}_{n,\omega }\), respectively, has \(n-\varvec{X}\) variable nodes with probability \(1-o_\omega (1)\). Similarly, the number of variable nodes of \({\mathbb {G}}_{n+\ell ,\omega }\) or \({\varvec{G}}_{n+\ell ,\omega }\) is \(n+\ell -\varvec{X}\) with probability \(1-o_\omega (1)\). Since \(\varvec{X}\) is a Poisson variable with mean \(\omega \), we have \(d_{\mathrm {TV}}(n+\ell -\varvec{X},n-\varvec{X})=o_\omega (1)\). Hence, we can couple \({\mathbb {G}}_{n+\ell ,\omega }\) and \({\mathbb {G}}_{n,\omega }\) as well as \({\varvec{G}}_{n+\ell ,\omega }\) and \({\varvec{G}}_{n,\omega }\) in such a way that both coincide w.h.p. This coupling extends to the distributions \(\rho _{n,\omega ,\mathscr {S}},\rho _{n+\ell ,\omega ,\mathscr {S}}\) and \(\rho _{n,\omega },\rho _{n+\ell ,\omega }\). \(\quad \square \)

Proof of Theorem 2.6

Corollary 6.11 yields the free energy formula in terms of the distributions \(\pi _{n,\omega }\) and \(\pi _{n,\omega ,\mathscr {S}}\), respectively. Furthermore, Proposition 6.12 together with Corollary 3.5 implies that

$$\begin{aligned} \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty , k|dn}\mathscr {B}(\pi _{n,\omega })&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty , k|dn}\mathscr {B}({\check{\pi }}_{n,\omega }), \end{aligned}$$
(6.51)
$$\begin{aligned} \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty , k|dn}\mathscr {B}(\pi _{n,\omega ,\mathscr {S}})&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty , k|dn}\mathscr {B}({\check{\pi }}_{n,\omega ,\mathscr {S}}), \end{aligned}$$
(6.52)

with the limit on n confined to integers such that k|dn each time. But Lemma 6.16 combined with Corollary 3.5 implies that this divisibility condition does not alter the limits on the left-hand side of these equations, i.e.,

$$\begin{aligned} \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty , k|dn}\mathscr {B}(\pi _{n,\omega })&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}(\pi _{n,\omega }), \end{aligned}$$
(6.53)
$$\begin{aligned} \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty ,k|dn}\mathscr {B}(\pi _{n,\omega ,\mathscr {S}})&= \liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}(\pi _{n,\omega ,\mathscr {S}}). \end{aligned}$$
(6.54)

Thus, combining (6.51)–(6.54) and invoking Corollary 6.11, we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\varvec{G}})&=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\check{\pi }}_{n,\omega }),\\ \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}})&=\liminf _{\omega \rightarrow \infty }\,\liminf _{n\rightarrow \infty }\mathscr {B}({\check{\pi }}_{n,\omega ,\mathscr {S}}), \end{aligned}$$

where, of course, the limit is confined to n such that k|dn because \({\mathbb {G}}\), \({\varvec{G}}\) and \({\check{\pi }}_{n,\omega }\), \({\check{\pi }}_{n,\omega ,\mathscr {S}}\) are defined only for such n; this is the assertion. \(\quad \square \)

7 Applications

In Sect. 7.1 we prove that the spin glass model from Sect. 1.2 satisfies the condition POS; the results stated in Sect. 1.2 are then immediate from those in Sect. 2. Further, in Sects. 7.2 and 7.3 we apply the results from Sect. 2 to two further models, the Potts antiferromagnet and the random regular k-SAT model. Finally, in Sect. 7.4 we show how the theorems from Sect. 2 can be brought to bear on the hard-core model, thereby proving the results stated in Sect. 1.3.

7.1 The spin glass

To derive the results on the spin glass model stated in Sect. 1 from the general theorems in Sect. 2, we just need to verify the condition POS for the spin glass model. In Example 2.3 we introduced the relevant weight function even in the more general case of the k-spin model; the case \(k=2\) corresponds to the spin glass on the Bethe lattice.

Lemma 7.1

The k-spin model satisfies POS for all \(d\ge 3\), \(\beta >0\) and all even \(k\ge 2\).

Proof

The lemma is already implicit in [35, 55], but let us carry out the simple proof for completeness. Let \(\varvec{J}\) be a standard Gaussian. Upon substituting the weight functions from Example 2.3 into POS and multiplying by \(2^\ell \), POS reads

$$\begin{aligned}&\mathbb {E}\left[{\left( {1-\tanh (\beta \varvec{J})\int _0^1\prod _{i=1}^k\left( {2\mu _{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^\ell }\right]\nonumber \\&\quad +(k-1)\mathbb {E}\left[{\left( {1-\tanh (\beta \varvec{J})\int _0^1\prod _{i=1}^k\left( {2\mu '_{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^\ell }\right]\nonumber \\&\quad -k\mathbb {E}\left[{\left( {1-\tanh (\beta \varvec{J})\int _0^1(2\mu _{s,\varvec{x}_1}-1)\prod _{i=2}^k(2\mu '_{s,\varvec{x}_i}-1){\mathrm d}s}\right) ^\ell }\right]\ge 0 \end{aligned}$$
(7.1)

for all measurable \(\mu ,\mu ':[0,1]^2\rightarrow [0,1]\). Expanding the first expectation yields

$$\begin{aligned} \mathbb {E}&\left[{\left( {1-\tanh (\beta \varvec{J})\int _0^1\prod _{i=1}^k\left( {2\mu _{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^\ell }\right]\\&=\sum _{j=0}^\ell {\left( {\begin{array}{c}\ell \\ j\end{array}}\right) }(-1)^j\mathbb {E}\left[{\tanh (\beta \varvec{J})^j\left( {\int _0^1\prod _{i=1}^k\left( {2\mu _{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^j}\right] \end{aligned}$$

Since \(\varvec{J}\) is independent of the \(\varvec{x}_i\) and \(\tanh \) is an odd function of the symmetric random variable \(\varvec{J}\), the last expectation vanishes if j is odd, while \(\tanh (\beta \varvec{J})^j\ge 0\) if j is even. Thus, in order to establish (7.1) it suffices to show that for any even \(j\ge 2\),

$$\begin{aligned}&\mathbb {E}\left[ \left( {\int _0^1\prod _{i=1}^k\left( {2\mu _{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^j +(k-1)\left( {\int _0^1\prod _{i=1}^k\left( {2\mu '_{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^j\right. \nonumber \\&\quad \left. -k\left( {\int _0^1(2\mu _{s,\varvec{x}_1}-1)\prod _{i=2}^k(2\mu '_{s,\varvec{x}_i}-1){\mathrm d}s}\right) ^j\right] \ge 0. \end{aligned}$$
(7.2)

Let \(\varvec{s}_1,\ldots ,\varvec{s}_j\in [0,1]\) be uniformly distributed and mutually independent as well as independent of the \(\varvec{x}_i\). Then Fubini’s theorem yields

$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\prod _{i=1}^k\left( {2\mu _{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^j}\right]&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^j(2\mu _{\varvec{s}_h,\varvec{x}_1}-1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_j}\right]^k}\right], \end{aligned}$$
(7.3)
$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\prod _{i=1}^k\left( {2\mu '_{s,\varvec{x}_i}-1}\right) {\mathrm d}s}\right) ^j}\right]&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^j(2\mu '_{\varvec{s}_h,\varvec{x}_1}-1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_j}\right]^k}\right], \end{aligned}$$
(7.4)
$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1(2\mu _{s,\varvec{x}_1}-1)\prod _{i=2}^k(2\mu '_{s,\varvec{x}_i}-1){\mathrm d}s}\right) ^j}\right]&=\mathbb {E}\left[ \mathbb {E}\left[{\prod _{h=1}^j(2\mu _{\varvec{s}_h,\varvec{x}_1}-1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_j}\right]\right. \nonumber \\&\quad \left. \mathbb {E}\left[{\prod _{h=1}^j(2\mu '_{\varvec{s}_h,\varvec{x}_1}-1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_j}\right]^{k-1}\right] . \end{aligned}$$
(7.5)

Since for even k the inequality \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) holds for all \(X,Y\in \mathbb {R}\) (by the inequality of arithmetic and geometric means applied to \(X^k,Y^k,\ldots ,Y^k\)), (7.3)–(7.5) yield (7.2). \(\quad \square \)
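The elementary inequality \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) invoked at the end of the proof is easy to spot-check numerically. The following sketch is purely illustrative and not part of the paper's argument; the function name `gap` is ours. It samples arbitrary reals for even k, and nonnegative reals for odd k (the regime needed later for the k-SAT model).

```python
import random

def gap(x, y, k):
    # X^k + (k-1)*Y^k - k*X*Y^(k-1), the quantity shown to be nonnegative
    return x ** k + (k - 1) * y ** k - k * x * y ** (k - 1)

random.seed(0)
for _ in range(10000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    for k in (2, 4, 6):                        # even k: holds for all real X, Y
        assert gap(x, y, k) >= -1e-9
    assert gap(abs(x), abs(y), 3) >= -1e-9     # odd k: holds for X, Y >= 0
```

For \(k=2\) the expression is just \((X-Y)^2\), which is the prototype of the sum-of-squares structure used throughout this section.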

Due to Lemma 7.1, Theorem 1.1 follows from Theorem 2.5, Theorem 1.2 follows from Theorem 2.6 and Theorem 1.3 follows from Theorem 2.7.

Remark 7.2

Indeed, together with Lemma 7.1 the results from Sect. 2 yield the Bethe state decomposition and the corresponding formulas for the free energy for the k-spin model for any even \(k\ge 2\).

7.2 The Potts model

For an integer \(q\ge 2\) let \(\Omega =\{1,\ldots ,q\}\) be a set of q distinct colors. Also let \(\beta >0\) be a real parameter, the inverse temperature. The Potts antiferromagnet on \({\mathbb {G}}\) is the distribution on \(\Omega ^{V_n}\) defined by

$$\begin{aligned} \mu _{{\mathbb {G}},\beta }(\sigma )&=\frac{1}{Z_\beta ({\mathbb {G}})}\exp \left[{-\beta \sum _{1\le i<j\le n}\varvec{1}\{v_i\in \partial v_j,\, \sigma (v_i)=\sigma (v_j)\}}\right],&(\sigma \in \Omega ^{V_n}), \end{aligned}$$

where the partition function \(Z_\beta ({\mathbb {G}})\) provides normalization; we omit the reference to \(\beta \) where possible. Thus, for a given \(\sigma \in \Omega ^{V_n}\) each monochromatic edge of \({\mathbb {G}}\) incurs an \(\exp (-\beta )\) penalty factor.
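The definition of \(\mu _{{\mathbb {G}},\beta }\) can be checked by brute force on any small graph. The following sketch is ours and purely illustrative (it enumerates all \(q^n\) colorings, which is only feasible for tiny n): each monochromatic edge contributes a factor \(\exp (-\beta )\), and the proper colorings are the modes of the measure.

```python
import itertools, math

def potts_measure(n, edges, q, beta):
    # exhaustive Potts antiferromagnet on a small graph:
    # weight exp(-beta * #monochromatic edges), normalized by Z
    weights = {}
    for sigma in itertools.product(range(q), repeat=n):
        mono = sum(1 for v, w in edges if sigma[v] == sigma[w])
        weights[sigma] = math.exp(-beta * mono)
    Z = sum(weights.values())
    return {s: wt / Z for s, wt in weights.items()}, Z

# triangle with q = 3 colors
mu, Z = potts_measure(3, [(0, 1), (1, 2), (0, 2)], q=3, beta=1.0)
assert abs(sum(mu.values()) - 1.0) < 1e-12   # proper normalization
assert max(mu.values()) == mu[(0, 1, 2)]     # proper colorings are the modes
```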

The Potts antiferromagnet and the associated optimization problem, the Max q-Cut problem, are of fundamental importance in combinatorics. Krzakala and Zdeborová [42] brought the cavity method to bear on this model. In the following we show how the main results of the present paper apply to this model to underpin the predictions from [42] rigorously. In particular, we specialize the Belief Propagation equations to the Potts model, work out the variational formula for the free energy and apply this formula to the Max q-Cut problem on the random regular graph.

The Potts model on \({\mathbb {G}}(n,d)\) can be cast as a random factor graph model with a single weight function

$$\begin{aligned} \psi _\beta&:\Omega ^2\rightarrow (0,1),&(\sigma ,\tau )\mapsto \exp (-\beta \varvec{1}\{\sigma =\tau \}). \end{aligned}$$

Thus, \(k=2\), \(\Psi =\{\psi _\beta \}\), \(P(\psi _\beta )=1\) and the prior distribution p is uniform on \(\Omega \). Since the constraints are binary, the random regular factor graph \({\varvec{G}}\) can be identified with the usual random d-regular graph \({\mathbb {G}}\), with the edges representing the factor nodes.

Lemma 7.3

The Potts model satisfies condition POS for all \(\beta >0\).

Proof

We plug the definition of \(\psi _\beta \) into POS and notice that the \(1-\mathrm {e}^{-\beta }\) factors cancel. Hence, the desired inequality reads

$$\begin{aligned}&\mathbb {E}\left[ \left( { \sum _{\sigma =1}^q\int _0^1\mu _{s,\varvec{x}_1}(\sigma )\mu _{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell +\left( {\sum _{\sigma =1}^q\int _0^1\mu '_{s,\varvec{x}_1}(\sigma )\mu '_{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell \right. \nonumber \\&\quad \left. - 2\left( {\sum _{\sigma =1}^q\int _0^1\mu _{s,\varvec{x}_1}(\sigma )\mu '_{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell \right] \ge 0 \quad \quad (\mu ,\mu '\in \mathfrak {K}). \end{aligned}$$
(7.6)

Applying Fubini’s theorem to take the expectation on \(\varvec{x}_1,\varvec{x}_2\) inside, we find

$$\begin{aligned} \mathbb {E}\left[{\left( {\sum _{\sigma =1}^q\int _0^1\mu _{s,\varvec{x}_1}(\sigma )\mu _{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell }\right]&= \sum _{\sigma _1,\ldots ,\sigma _\ell =1}^q\mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\mu _{\varvec{s}_h,\varvec{x}_2}(\sigma _h)}\right]\nonumber \\&=\sum _{\sigma _1,\ldots ,\sigma _\ell =1}^q \mathbb {E}\left[ \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]\right. \nonumber \\&\quad \left. \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_2}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]\right] \nonumber \\&=\sum _{\sigma _1,\ldots ,\sigma _\ell =1}^q \mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^2}\right]. \end{aligned}$$
(7.7)

Similar manipulations yield

$$\begin{aligned} \mathbb {E}\left[{\left( {\sum _{\sigma =1}^q\int _0^1\mu '_{s,\varvec{x}_1}(\sigma )\mu '_{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell }\right]&=\sum _{\sigma _1,\ldots ,\sigma _\ell =1}^q \mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^2}\right], \end{aligned}$$
(7.8)
$$\begin{aligned} \mathbb {E}\left[{\left( {\sum _{\sigma =1}^q\int _0^1\mu _{s,\varvec{x}_1}(\sigma )\mu '_{s,\varvec{x}_2}(\sigma ){\mathrm d}s}\right) ^\ell }\right]&=\sum _{\sigma _1,\ldots ,\sigma _\ell =1}^q\mathbb {E}\left[ \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]\right. \nonumber \\&\quad \left. \mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(\sigma _h)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]\right] . \end{aligned}$$
(7.9)

Combining (7.7)–(7.9), we conclude that the l.h.s. of (7.6) is the expectation of a sum of squares, and thus non-negative. \(\quad \square \)
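The sum-of-squares structure behind (7.6) survives discretization, so the inequality can be verified exactly for kernels supported on finitely many values of s and x. The following check is our own illustration (the discretization, `random_kernel` and `overlap` are ours): it represents \(\mu ,\mu '\) as finite arrays of probability vectors and evaluates all three expectations of (7.6) exactly.

```python
import random
from itertools import product

random.seed(1)
q, S, X, ell = 3, 4, 5, 3   # colors, s-grid, x-grid, power ell

def random_kernel():
    # ker[s][x] is a probability distribution on q colors
    ker = []
    for _ in range(S):
        row = []
        for _ in range(X):
            w = [random.random() for _ in range(q)]
            t = sum(w)
            row.append([u / t for u in w])
        ker.append(row)
    return ker

def overlap(mu, nu):
    # E_{x1,x2}[ ( (1/S) sum_s sum_sigma mu[s][x1][sigma] nu[s][x2][sigma] )^ell ]
    tot = 0.0
    for x1, x2 in product(range(X), repeat=2):
        inner = sum(mu[s][x1][c] * nu[s][x2][c]
                    for s in range(S) for c in range(q)) / S
        tot += inner ** ell
    return tot / X ** 2

mu, nu = random_kernel(), random_kernel()
lhs = overlap(mu, mu) + overlap(nu, nu) - 2 * overlap(mu, nu)
assert lhs >= -1e-12   # the POS combination is nonnegative
```

Expanding the \(\ell \)-th power as in (7.7)–(7.9) shows that `lhs` is a sum of squares, so the assertion holds for every random draw, not just this seed.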

The message space \(\mathscr {S}({\mathbb {G}})\) of the Potts model boils down to the set of all families \((\mu _{v\rightarrow w})_{v\in V_n,w\in \partial v}\), with \(\mu _{v\rightarrow w}\in \mathscr {P}(\Omega )\). With this simplification the Belief Propagation operator \(\mathrm {BP}:\mathscr {S}({\mathbb {G}})\rightarrow \mathscr {S}({\mathbb {G}})\), \(\nu \mapsto {\hat{\nu }}\) of the Potts model reads

$$\begin{aligned} {\hat{\nu }}_{v\rightarrow u}(\sigma )&=\frac{\prod _{w\in \partial v{\setminus } u}\left( {1-(1-\mathrm {e}^{-\beta })\nu _{w\rightarrow v}(\sigma )}\right) }{\sum _{\tau \in \Omega }\prod _{w\in \partial v{\setminus } u}\left( {1-(1-\mathrm {e}^{-\beta })\nu _{w\rightarrow v}(\tau )}\right) }&(\sigma \in \Omega ). \end{aligned}$$
(7.10)
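As a concrete illustration of the update (7.10), it can be run on a small regular graph; by symmetry, the uniform messages form a BP fixed point for every \(\beta \). The sketch below is ours (not the paper's code) and implements one parallel sweep of the operator.

```python
import math

def bp_update(messages, adj, q, beta):
    """One parallel sweep of the Potts Belief Propagation update (7.10)."""
    c = 1.0 - math.exp(-beta)
    new = {}
    for (v, u) in messages:
        num = []
        for sigma in range(q):
            p = 1.0
            for w in adj[v]:
                if w != u:   # product over the neighbors of v other than u
                    p *= 1.0 - c * messages[(w, v)][sigma]
            num.append(p)
        t = sum(num)
        new[(v, u)] = [x / t for x in num]
    return new

# K4 is 3-regular; uniform messages are a BP fixed point by symmetry
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
q = 3
msgs = {(v, u): [1.0 / q] * q for v in adj for u in adj[v]}
msgs = bp_update(msgs, adj, q, beta=1.0)
assert all(abs(x - 1.0 / q) < 1e-12 for m in msgs.values() for x in m)
```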

With respect to Bethe states, we expect that the phase space \(\Omega ^n\) decomposes into \(S_1,\ldots ,S_\ell \) such that the conditional distributions \(\mu _{{\mathbb {G}}}[\,\cdot \,|S_i]\) are free of long-range correlations, that their standard messages form an approximate fixed point of BP and that the conditional marginals derive from the messages. In formulas, with high probability over the choice of the graph and with \(({\hat{\mu }}_{{\mathbb {G}},v\rightarrow u}[\,\cdot \,|S_h])_{u\in \partial v}=\mathrm {BP}(\mu _{{\mathbb {G}},v\rightarrow u}[\,\cdot \,|S_h])_{u\in \partial v}\), we aim to show that

$$\begin{aligned} \frac{1}{n^2}\sum _{1\le i<j\le n} \left\| {\mu _{{\mathbb {G}},v_i,v_j}[\,\cdot \,|S_h]-\mu _{{\mathbb {G}},v_i}[\,\cdot \,|S_h]\otimes \mu _{{\mathbb {G}},v_j}[\,\cdot \,|S_h]}\right\| _{\mathrm {TV}}&=o(1), \end{aligned}$$
(7.11)
$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\sum _{u\in \partial v_i}\left\| {\mu _{{\mathbb {G}},v_i\rightarrow u}[\,\cdot \,|S_h] -{\hat{\mu }}_{{\mathbb {G}},v_i\rightarrow u}[\,\cdot \,|S_h]}\right\| _{\mathrm {TV}}&=o(1), \end{aligned}$$
(7.12)
$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\sum _{\sigma \in \Omega }\left| {\mu _{{\mathbb {G}},v_i}[\sigma |S_h]- \frac{\prod _{w\in \partial v_i}\left( {1-(1-\mathrm {e}^{-\beta })\mu _{{\mathbb {G}},w\rightarrow v_i}(\sigma |S_h)}\right) }{\sum _{\tau \in \Omega }\prod _{w\in \partial v_i}\left( {1-(1-\mathrm {e}^{-\beta })\mu _{{\mathbb {G}},w\rightarrow v_i}(\tau |S_h)}\right) }}\right|&=o(1). \end{aligned}$$
(7.13)

The following theorem establishes these facts.

Theorem 7.4

For any sequence \(L=L(n)\rightarrow \infty \) and all \(d\ge 3\), \(\beta >0\) the following is true. With high probability \({\mathbb {G}}\) admits a decomposition \(S_0,S_1,\ldots ,S_\ell \), \(\ell \le L\), of the phase space \(\Omega ^n\) such that \(\mu _{{\mathbb {G}}}(S_0)=o(1)\) and such that (7.11)–(7.13) are satisfied for \(h=1,\ldots ,\ell \).

Proof

This is immediate from Theorem 2.5 applied to the factor graph representation of the Potts model. \(\quad \square \)

With respect to the free energy, let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). Let \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\) be uniformly random vertices and edges of \({\mathbb {G}}\), chosen independently. With \(S_1,\ldots ,S_\ell \) the decomposition from Theorem 7.4, we introduce the weights

$$\begin{aligned} \varvec{z}_{{\mathbb {G}},h}&=\mu _{{\mathbb {G}}}(S_h)\prod _{i=1}^{\varvec{X}}\left( { \sum _{\sigma \in \Omega }\prod _{v\in \partial \varvec{u}_i}\left( {1-(1-\mathrm {e}^{-\beta })\mu _{{\mathbb {G}},v\rightarrow \varvec{u}_i}(\sigma |S_h)}\right) }\right) ^{-1}\\&\quad \prod _{i=1}^{\varvec{Y}}\left( {1-(1-\mathrm {e}^{-\beta })\sum _{\sigma \in \Omega } \mu _{{\mathbb {G}},\varvec{v}_i\rightarrow \varvec{w}_i}(\sigma |S_h)\mu _{{\mathbb {G}},\varvec{w}_i\rightarrow \varvec{v}_i}(\sigma |S_h)}\right) ^{-1} \end{aligned}$$

and \(\varvec{z}_{{\mathbb {G}}}=\sum _{h=1}^\ell \varvec{z}_{{\mathbb {G}},h}\). Further, let \(\mathscr {C}({\mathbb {G}})\) be the set of all vertices of degree less than d in the graph obtained from \({\mathbb {G}}\) by removing \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\). Then with high probability each \(c\in \mathscr {C}({\mathbb {G}})\) has degree precisely \(d-1\), and we write \(c'\) for the missing dth neighbor of c. Then with \(\varvec{c}_1,\varvec{c}_2,\ldots \) a sequence of uniformly and independently chosen elements of \(\mathscr {C}({\mathbb {G}})\), we let

$$\begin{aligned} \mathscr {B}({\mathbb {G}})&=\mathbb {E}\left[ \log {\sum _{h=1}^\ell \frac{\varvec{z}_{{\mathbb {G}},h}}{\varvec{z}_{{\mathbb {G}}}} \sum _{\sigma \in \Omega }\prod _{i=1}^d\left( {1-(1-\mathrm {e}^{-\beta })\mu _{{\mathbb {G}},\varvec{c}_i\rightarrow \varvec{c}_i'}(\sigma |S_h)}\right) }\right. \\&\quad \left. -\frac{d}{2}\log { \sum _{h=1}^\ell \frac{\varvec{z}_{{\mathbb {G}},h}}{\varvec{z}_{{\mathbb {G}}}}\left( {1-(1-\mathrm {e}^{-\beta }) \sum _{\sigma \in \Omega }\mu _{{\mathbb {G}},\varvec{c}_1\rightarrow \varvec{c}_1'}(\sigma |S_h)\mu _{{\mathbb {G}},\varvec{c}_2\rightarrow \varvec{c}_2'}(\sigma |S_h)}\right) }\Bigg |{\mathbb {G}}\right] . \end{aligned}$$

Theorem 7.5

For all \(d\ge 3,\beta >0\) we have \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]=\liminf _{\omega \rightarrow \infty }\liminf _{n\rightarrow \infty }\,\mathbb {E}[\mathscr {B}({\mathbb {G}})].\)

Proof

This is an immediate consequence of Theorem 2.6 and Lemma 7.3. \(\quad \square \)

Additionally, Theorem 2.7 yields a variational formula for the free energy. Writing out the specifics of the Potts case, we see that \(\mathfrak {D}^\star _\beta \) consists of all \(\pi \in \mathfrak {D}\) that satisfy the following property. For a measurable \(\mu :[0,1]^2\rightarrow \mathscr {P}(\Omega )\) with \(\Omega =[q]\) and integers \(N,M\ge 0\) let

$$\begin{aligned}&\varvec{z}_{\mu ,s}^{N,M}=\prod _{i=1}^N\left( {\sum _{\sigma =1}^q\prod _{j=1}^d\left( {1-(1-\mathrm {e}^{-\beta })\mu _{s,\varvec{x}_{i,j}}(\sigma )}\right) }\right) \prod _{i=1}^M\left( {1-(1-\mathrm {e}^{-\beta })\sum _{\sigma =1}^q\mu _{s,\varvec{x}_{i+N,1}}(\sigma )\mu _{s,\varvec{x}_{i+N,2}}(\sigma )}\right) ,\\&\quad \text{ and } \text{ set } \varvec{t}=\varvec{t}(s)= \inf \left\{ {u\in [0,1]:\int _0^u\varvec{z}_{\mu ,r}^{N,M}{\mathrm d}r\ge s\int _0^1\varvec{z}_{\mu ,r}^{N,M}{\mathrm d}r}\right\} . \end{aligned}$$

Then we let \(\mu ^{*(N,M)}_{s,x}=\mu _{\varvec{t},x}\). Now \(\mathfrak {D}^\star _\beta \) is the set of all \(\pi \in \mathfrak {D}\) such that for a random \(\varvec{\mu }^{\pi }\in \mathfrak {K}\) drawn from \(\pi \), the perturbed \(\varvec{\mu }^{\pi *(N,M)}\in \mathfrak {K}\) again has distribution \(\pi \). Furthermore, in the Potts model the functional \(\mathscr {B}(\,\cdot \,)\) reads

$$\begin{aligned} \mathscr {B}_\beta (\pi )&=\mathbb {E}\left[ \log \left( {\sum _{\sigma =1}^q\int _0^1 \prod _{j=1}^d\left( {1-(1-\mathrm {e}^{-\beta })\mu _{s,\varvec{x}_{1,j}}(\sigma )}\right) {\mathrm d}s}\right) \right. \\&\quad \left. -\frac{d}{2}\log \left( {1-(1-\mathrm {e}^{-\beta })\sum _{\sigma =1}^q\int _0^1\mu _{s,\varvec{x}_{1,1}}(\sigma )\mu _{s,\varvec{x}_{1,2}}(\sigma ){\mathrm d}s}\right) \right] . \end{aligned}$$

Theorem 7.6

For all \(d\ge 3\), \(\beta >0\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]&=\Phi _{d,\beta }&\text{ with } \qquad \Phi _{d,\beta }&=\inf _{\pi \in \mathfrak {D}^\star _\beta }\mathscr {B}_\beta (\pi ). \end{aligned}$$

As a further application we obtain a variational formula for the Max q-Cut of the random regular graph, which is defined as

$$\begin{aligned} \mathrm {MC}_q({\mathbb {G}})&=\frac{dn}{2}-\frac{1}{2}\min _{\sigma :[n]\rightarrow [q]}\sum _{v,w=1}^n\varvec{1}\{w\in \partial v,\,\sigma (v)=\sigma (w)\}. \end{aligned}$$
(7.14)

Thus, \(\mathrm {MC}_q({\mathbb {G}})\) equals the total number of edges of \({\mathbb {G}}\) minus the ground state energy of the Potts model. In other words, \(\mathrm {MC}_q({\mathbb {G}})\) is the maximum, over the choice of \(\sigma :[n]\rightarrow [q]\), of the number of edges that link vertices of different colors. The Max q-Cut problem is well-studied in combinatorics and computer science. In particular, the problem is well known to be NP-hard on worst-case instances.
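On small instances the relation in (7.14) between Max q-Cut and the Potts ground state can be checked by exhaustive search. The sketch below is ours (function names are not from the paper) and verifies it on the 5-cycle, which is not bipartite but is 3-colorable.

```python
import itertools

def max_q_cut(n, edges, q):
    # brute-force maximum number of bichromatic edges over all q-colorings
    return max(sum(1 for v, w in edges if s[v] != s[w])
               for s in itertools.product(range(q), repeat=n))

c5 = [(i, (i + 1) % 5) for i in range(5)]
# MC_q equals #edges minus the Potts ground state energy, cf. (7.14)
min_mono = min(sum(1 for v, w in c5 if s[v] == s[w])
               for s in itertools.product(range(2), repeat=5))
assert max_q_cut(5, c5, 2) == len(c5) - min_mono == 4   # C5 is not bipartite
assert max_q_cut(5, c5, 3) == 5                          # but it is 3-colorable
```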

Corollary 7.7

For all \(d\ge 3\) we have \(\displaystyle \mathrm {MC}_q({\mathbb {G}})/n\ {{\mathop {\longrightarrow }\limits ^{{n\rightarrow \infty }}}}\ \frac{d}{2}+\lim _{\beta \rightarrow \infty }\Phi _{d,\beta +1}-\Phi _{d,\beta }\) in probability.

Proof

Since Azuma’s inequality shows that \(\mathrm {MC}_q({\mathbb {G}})\) is concentrated within \(O(\sqrt{n\log n})\) about its mean, it suffices to prove that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\mathrm {MC}_q({\mathbb {G}})]=\frac{d}{2}+\lim _{\beta \rightarrow \infty }\Phi _{d,\beta +1}-\Phi _{d,\beta }. \end{aligned}$$
(7.15)

Further, introducing \(\mathscr {H}_{{\mathbb {G}}}(\sigma )=\frac{1}{2}\sum _{v,w=1}^n\varvec{1}\{w\in \partial v,\,\sigma (v)=\sigma (w)\}\) and recalling (7.14), we can rewrite (7.15) as

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\min _{\sigma :[n]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}}(\sigma )}\right]=\lim _{\beta \rightarrow \infty }\Phi _{d,\beta }-\Phi _{d,\beta +1}. \end{aligned}$$
(7.16)

To prove (7.16) we write \(\mu _{G,\beta }\in \mathscr {P}([q]^{V(G)})\) for the Potts distribution induced by a d-regular graph \(G=(V(G),E(G))\). Moreover, let us denote the Potts Hamiltonian by \(\mathscr {H}_G\) and the partition function by \(Z_\beta (G)\). It is well known that for any \(\varepsilon >0\) there exists \(\beta _0(\varepsilon )>0\) such that for all \(\beta >\beta _0(\varepsilon )\) and all d-regular graphs G we have

$$\begin{aligned} \left\langle {{\mathscr {H}_G},{\mu _{G,\beta }}}\right\rangle -\varepsilon |V(G)|&\le \min _{\sigma :V(G)\rightarrow [q]}\mathscr {H}_G(\sigma )\le \left\langle {{\mathscr {H}_G},{\mu _{G,\beta }}}\right\rangle . \end{aligned}$$
(7.17)

Consequently, for all \(\beta >\beta _0(\varepsilon )\) we have

$$\begin{aligned} \int _\beta ^{\beta +1}\left\langle {{\mathscr {H}_G},{\mu _{G,b}}}\right\rangle {\mathrm d}b -\varepsilon |V(G)|&\le \min _{\sigma :V(G)\rightarrow [q]}\mathscr {H}_G(\sigma )\le \int _\beta ^{\beta +1}\left\langle {{\mathscr {H}_G},{\mu _{G,b}}}\right\rangle {\mathrm d}b. \end{aligned}$$
(7.18)

Since \(\left\langle {{\mathscr {H}_G},{\mu _{G,\beta }}}\right\rangle =-\frac{\partial }{\partial \beta }\log Z_\beta (G)\), (7.18) yields

$$\begin{aligned}&\log Z_\beta (G)-\log Z_{\beta +1}(G)-\varepsilon |V(G)|\nonumber \\&\quad \le \min _{\sigma :V(G)\rightarrow [q]}\mathscr {H}_G(\sigma )\le \log Z_\beta (G)-\log Z_{\beta +1}(G). \end{aligned}$$
(7.19)
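The thermodynamic identity \(\left\langle {{\mathscr {H}_G},{\mu _{G,\beta }}}\right\rangle =-\frac{\partial }{\partial \beta }\log Z_\beta (G)\) used to pass from (7.18) to (7.19) can be sanity-checked by exact enumeration on a tiny graph. The following sketch is ours and purely illustrative; it compares a central finite difference of \(\log Z_\beta \) with the Gibbs average of the Hamiltonian.

```python
import itertools, math

def log_Z_and_mean_H(n, edges, q, beta):
    # exact partition function and Gibbs average of the Hamiltonian on a small graph
    Z, acc = 0.0, 0.0
    for sigma in itertools.product(range(q), repeat=n):
        H = sum(1 for v, w in edges if sigma[v] == sigma[w])
        wt = math.exp(-beta * H)
        Z += wt
        acc += H * wt
    return math.log(Z), acc / Z

edges, q, beta, h = [(0, 1), (1, 2), (0, 2)], 3, 0.7, 1e-6
lo, _ = log_Z_and_mean_H(3, edges, q, beta - h)
hi, _ = log_Z_and_mean_H(3, edges, q, beta + h)
_, mean_H = log_Z_and_mean_H(3, edges, q, beta)
assert abs((hi - lo) / (2 * h) + mean_H) < 1e-5   # <H> = -d/dbeta log Z
```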

Applying (7.19) to the random regular graph \({\mathbb {G}}\) and taking expectations, we obtain

$$\begin{aligned}&\frac{1}{n}\mathbb {E}[\log Z_\beta ({\mathbb {G}})]-\frac{1}{n}\mathbb {E}[\log Z_{\beta +1}({\mathbb {G}})]-\varepsilon \le \frac{1}{n}\mathbb {E}\left[{\min _{\sigma :[n]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}}(\sigma )}\right]\nonumber \\&\quad \le \frac{1}{n}\mathbb {E}[\log Z_\beta ({\mathbb {G}})]-\frac{1}{n}\mathbb {E}[\log Z_{\beta +1}({\mathbb {G}})]. \end{aligned}$$
(7.20)

Hence, taking \(n\rightarrow \infty \), we obtain for all \(\beta >\beta _0(\varepsilon )\),

$$\begin{aligned}&\Phi _{d,\beta }-\Phi _{d,\beta +1}-\varepsilon \le \liminf _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\min _{\sigma :[n]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}}(\sigma )}\right]\nonumber \\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\min _{\sigma :[n]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}}(\sigma )}\right]\le \Phi _{d,\beta }-\Phi _{d,\beta +1}. \end{aligned}$$
(7.21)

Finally, there exists a subsequence \((n_l)\) along which \(\mathbb {E}\left[{\min _{\sigma :[n_l]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}(n_l,d)}(\sigma )}\right]/n_l\) converges to a number \(\xi \ge 0\). Taking the limit of (7.21) along this subsequence, we obtain \(\xi \le \Phi _{d,\beta }-\Phi _{d,\beta +1}\le \xi +\varepsilon \) for all \(\beta >\beta _0(\varepsilon )\). Consequently, the limit \(\lim _{\beta \rightarrow \infty }\Phi _{d,\beta }-\Phi _{d,\beta +1}\) exists. Therefore, taking \(\beta \rightarrow \infty \) in (7.21), we conclude that

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{-1}\mathbb {E}\left[{\min _{\sigma :[n]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}}(\sigma )}\right] \end{aligned}$$

exists as well and that (7.16) is satisfied. \(\quad \square \)

7.3 The regular k-SAT model

The k-SAT problem plays a major role in computer science, particularly in computational complexity theory. Its optimization version, known as the Max k-SAT problem, asks for the largest number of clauses of a propositional formula in conjunctive normal form with clauses of length k that can be satisfied simultaneously. Random instances of k-SAT and Max k-SAT have been studied extensively as instructive benchmarks [5].

We can express the Max k-SAT problem as a factor graph model with spins \(\Omega =\{-1,1\}\) corresponding to the Boolean values ‘true’ and ‘false’ as follows. With \(k\ge 2\) an integer and \(\beta >0\) a real parameter, we introduce the weight functions

$$\begin{aligned} \psi _{\beta ,\chi }:&\left\{ {\pm 1}\right\} ^k\rightarrow (0,1),&\sigma&\mapsto \frac{{1-\tanh \left( {\beta \prod _{i=1}^k\chi _i\sigma _i}\right) }}{2}&(\chi \in \left\{ {\pm 1}\right\} ^k). \end{aligned}$$

Let p be the uniform distribution on \(\Omega \) and let P be uniform on \(\Psi _\beta =\{\psi _{\beta ,\chi }:\chi \in \Omega ^k\}\). In terms of propositional formulas, the semantics is that \(\psi _{\beta ,\chi }\) encodes a k-clause whose ith literal is negated if \(\chi _i=1\) and positive if \(\chi _i=-1\). Thus, \(\prod _{i=1}^k\chi _i\sigma _i=1\) if the truth assignment \(\sigma \) fails to satisfy the clause, and \(\prod _{i=1}^k\chi _i\sigma _i=-1\) otherwise. In effect, \(\psi _{\beta ,\chi }(\sigma )=(1-\tanh \beta )/2\rightarrow 0\) as \(\beta \rightarrow \infty \) if \(\sigma \) fails to satisfy the clause, whereas \(\psi _{\beta ,\chi }(\sigma )=(1+\tanh \beta )/2\rightarrow 1\) if \(\sigma \) is satisfying. Hence, the random factor graph \({\varvec{G}}\) models a random k-SAT formula in which every variable appears precisely d times, the regular k-SAT model. We are going to derive variational formulas for its free energy and its ground state energy.
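The dichotomy between the two values \((1\mp \tanh \beta )/2\) of \(\psi _{\beta ,\chi }\), governed by the sign of \(\prod _i\chi _i\sigma _i\), is easy to verify numerically. The sketch below is our own illustration (the function `psi` is ours); it exhaustively checks all sign patterns for \(k=3\) at a moderately large \(\beta \).

```python
import math
from itertools import product

def psi(beta, chi, sigma):
    # psi_{beta,chi}(sigma) = (1 - tanh(beta * prod_i chi_i sigma_i)) / 2
    p = 1
    for c, s in zip(chi, sigma):
        p *= c * s
    return (1.0 - math.tanh(beta * p)) / 2.0

k, beta = 3, 4.0
for chi in product((-1, 1), repeat=k):
    for sigma in product((-1, 1), repeat=k):
        p = 1
        for c, s in zip(chi, sigma):
            p *= c * s
        if p == 1:                       # penalized parity: weight near 0
            assert psi(beta, chi, sigma) < 0.001
        else:                            # weight near 1
            assert psi(beta, chi, sigma) > 0.999
```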

Lemma 7.8

The regular k-SAT model satisfies POS for all \(d,k\ge 3\) and all \(\beta >0\).

Proof

Once more this is already implicit in [35, 55], but we carry out the argument here for completeness. Let us write \(\varvec{\chi }\) for a uniformly random element of \(\{\pm 1\}^k\). Substituting \(\psi _{\beta ,\chi }\) into POS and cancelling positive constants, we are left to verify the inequality

$$\begin{aligned}&\mathbb {E}\left[ \left( {\int _0^1\prod _{i=1}^k\mu _{s,\varvec{x}_i}(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell +(k-1)\left( {\int _0^1\prod _{i=1}^k\mu _{s,\varvec{x}_i}'(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell \right. \nonumber \\&\quad \left. -k\left( {\int _0^1\mu _{s,\varvec{x}_1}(\varvec{\chi }_1)\prod _{i=2}^k\mu '_{s,\varvec{x}_i}(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell \right] \ge 0 \quad \quad (\mu ,\mu '\in \mathfrak {K}). \end{aligned}$$
(7.22)

Fubini’s theorem yields

$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\prod _{i=1}^k\mu _{s,\varvec{x}_i}(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell }\right]&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\varvec{\chi }_1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^k}\right], \end{aligned}$$
(7.23)
$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\prod _{i=1}^k\mu _{s,\varvec{x}_i}'(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell }\right]&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(\varvec{\chi }_1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^k}\right], \end{aligned}$$
(7.24)
$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\mu _{s,\varvec{x}_1}(\varvec{\chi }_1)\prod _{i=2}^k\mu '_{s,\varvec{x}_i}(\varvec{\chi }_i){\mathrm d}s}\right) ^\ell }\right]&=\mathbb {E}\left[ \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(\varvec{\chi }_1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]\right. \nonumber \\&\quad \left. \mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(\varvec{\chi }_1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^{k-1}\right] . \end{aligned}$$
(7.25)

Since \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) for \(X,Y\ge 0\), (7.23)–(7.25) yield (7.22). \(\square \)

Due to Lemma 7.8 we can bring the results from Sect. 2 to bear on the random regular k-SAT model. Specifically, for a measurable \(\mu :[0,1]^2\rightarrow \mathscr {P}(\Omega )\) with \(\Omega =\{\pm 1\}\) and integers \(N,M\ge 0\) let \((\varvec{\chi }_{i,j})_{i,j\ge 1}\) be independent uniformly random elements of \(\Omega \) and let

$$\begin{aligned}&\varvec{z}_{\mu ,s}^{N,M}=\prod _{i=1}^N\left( {\sum _{\sigma \in \Omega }\prod _{j=1}^d\left( {1-\tanh (\beta )\sum _{\tau \in \Omega ^{k-1}}\varvec{\chi }_{i,k}\sigma \prod _{l=1}^{k-1}\varvec{\chi }_{i,l}\tau _l\mu _{s,\varvec{x}_{i,l}}(\tau _l)}\right) }\right) \\&\quad \prod _{i=1}^M\left( { 1-\tanh \left( {\beta }\right) \sum _{\tau \in \Omega ^k} \prod _{j=1}^k\varvec{\chi }_{i+N,j}\tau _j \mu _{s,\varvec{x}_{i+N,j}}(\tau _j)}\right) . \end{aligned}$$

Further, let

$$\begin{aligned} \varvec{t}&=\varvec{t}(s)= \inf \left\{ {u\in [0,1]:\int _0^u\varvec{z}_{\mu ,r}^{N,M}{\mathrm d}r\ge s\int _0^1\varvec{z}_{\mu ,r}^{N,M}{\mathrm d}r}\right\} \end{aligned}$$

and \(\mu ^{N,M}_{s,x}=\mu _{\varvec{t},x}\). Then \(\mathfrak {D}^\star _\beta \) consists of all \(\pi \in \mathfrak {D}\) such that for a random \(\varvec{\mu }^{\pi }\in \mathfrak {K}\) drawn from \(\pi \) the perturbed \(\varvec{\mu }^{\pi ,N,M}\) again has distribution \(\pi \). Furthermore, the functional \(\mathscr {B}(\,\cdot \,)\) reads

$$\begin{aligned} \mathscr {B}_\beta (\pi )&=\mathbb {E}\left[{\log \left( {\sum _{\sigma \in \Omega }\int _0^1\prod _{j=1}^d\left( {1-\tanh (\beta )\sum _{\tau \in \Omega ^{k-1}}\varvec{\chi }_{j,k}\sigma \prod _{l=1}^{k-1}\varvec{\chi }_{j,l}\tau _l\mu _{s,\varvec{x}_{j,l}}(\tau _l)}\right) {\mathrm d}s}\right) }\right]\\&\quad -\frac{d(k-1)}{k}\mathbb {E}\left[{\log \left( {1-\tanh (\beta )\sum _{\tau \in \Omega ^k}\int _0^1\prod _{j=1}^k\varvec{\chi }_{1,j}\tau _j\mu _{s,\varvec{x}_{1,j}}(\tau _j){\mathrm d}s}\right) }\right]-\frac{d}{k}\log 2. \end{aligned}$$

Let

$$\begin{aligned} \Phi _{d,\beta }&=\inf _{\pi \in \mathfrak {D}^\star _\beta }\mathscr {B}_\beta (\pi ). \end{aligned}$$

Theorem 7.9

For all \(d,k\ge 3\), \(\beta >0\) we have \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\varvec{G}})]=\Phi _{d,\beta }\).

Proof

This follows immediately from Theorem 2.7 and Lemma 7.8. \(\quad \square \)

As a further application we also obtain a variational formula for the Max k-SAT problem. Specifically, with the interpretation of \(\sigma \in \Omega ^n\) as a truth assignment, define \(\mathscr {H}_{{\varvec{G}}}(\sigma )\) as the number of propositional clauses of \({\varvec{G}}\) that \(\sigma \) fails to satisfy. Further, let \(\mathrm {OPT}({\varvec{G}})=dn/k-\min _{\sigma \in \Omega ^n}\mathscr {H}_{{\varvec{G}}}(\sigma )\) be the maximum number of clauses that can be satisfied simultaneously. Following the steps of the proof of Corollary 7.7 precisely, we obtain the following result.

Corollary 7.10

For all \(d,k\ge 3\) we have

$$\begin{aligned} \frac{1}{n}\mathrm {OPT}({\varvec{G}})\ {{\mathop {\longrightarrow }\limits ^{{n\rightarrow \infty }}}}\ \frac{d}{k}+\lim _{\beta \rightarrow \infty }\left( {\Phi _{d,\beta +1}-\Phi _{d,\beta }}\right) \qquad \text{ in } \text{ probability }. \end{aligned}$$
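To see heuristically why the zero-temperature limit retrieves the maximum, suppose for the sake of the argument that the partition function takes the Gibbs form \(Z_\beta ({\varvec{G}})=\sum _{\sigma \in \Omega ^n}\exp (-\beta \mathscr {H}_{{\varvec{G}}}(\sigma ))\). Then for any fixed graph,

$$\begin{aligned} \log Z_{\beta +1}({\varvec{G}})-\log Z_{\beta }({\varvec{G}})=\log \frac{\sum _{\sigma \in \Omega ^n}\exp (-(\beta +1)\mathscr {H}_{{\varvec{G}}}(\sigma ))}{\sum _{\sigma \in \Omega ^n}\exp (-\beta \mathscr {H}_{{\varvec{G}}}(\sigma ))}\ {{\mathop {\longrightarrow }\limits ^{{\beta \rightarrow \infty }}}}\ -\min _{\sigma \in \Omega ^n}\mathscr {H}_{{\varvec{G}}}(\sigma ), \end{aligned}$$

because the Boltzmann distribution concentrates on the minimisers of \(\mathscr {H}_{{\varvec{G}}}\). Hence the difference \(\Phi _{d,\beta +1}-\Phi _{d,\beta }\) captures \(-\min _\sigma \mathscr {H}_{{\varvec{G}}}(\sigma )/n\) in the limit \(\beta \rightarrow \infty \).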

7.4 The hard-core model

The proofs of Theorem 1.4 and Corollary 1.5 are not entirely straightforward because the hard-core model cannot be cast directly as a factor graph model as in Sect. 2. This is because of the ‘hard’ constraint that \(\varvec{\sigma }_v\varvec{\sigma }_w=0\) for any adjacent \(v,w\). We therefore prove Theorem 1.4 and Corollary 1.5 by way of a relaxed ‘soft-core model’ and taking two limits, first in the ‘softness’ and then in the fugacity. Specifically, we work with a random factor graph model with \(\Omega =\{0,1\}\) and the prior \(p(0)=1/(1+\lambda )\), \(p(1)=\lambda /(1+\lambda )\). In addition, to mimic the hard-core constraints we would like to introduce a binary weight function that forbids its two adjacent variable nodes from both taking the spin 1. But since such a weight function would take the value 0, which our framework does not permit, we instead introduce

$$\begin{aligned} \psi _\beta&:\Omega ^2\rightarrow (0,1],&(\sigma _1,\sigma _2)\mapsto 1-(1-\mathrm {e}^{-\beta })\sigma _1\sigma _2. \end{aligned}$$

Thus, \(\beta >0\) is a ‘softness parameter’, and upon taking \(\beta \rightarrow \infty \) we recover the hard-core constraint: \(\psi _\infty (\sigma _1,\sigma _2)=1-\sigma _1\sigma _2\). For any \(\lambda ,\beta \) and \(d\ge 3\) we obtain the random factor graph model \({\varvec{G}}_{\lambda ,\beta }\) with the single binary weight function \(\psi _\beta \).
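As a sanity check, in the limit \(\beta =\infty \) the partition function reduces to the independent set polynomial: assuming the convention that the prior \(p\) enters the Boltzmann weight, for any graph \(G\) we have

$$\begin{aligned} \sum _{\sigma \in \{0,1\}^{V(G)}}\prod _{v\in V(G)}p(\sigma _v)\prod _{vw\in E(G)}\psi _\infty (\sigma _v,\sigma _w)=(1+\lambda )^{-|V(G)|}\sum _{I}\lambda ^{|I|}, \end{aligned}$$

where the sum on the right ranges over the independent sets \(I\) of \(G\). Indeed, \(\psi _\infty \) annihilates every \(\sigma \) whose support is not an independent set, while \(\prod _{v}p(\sigma _v)=\lambda ^{|\sigma |}(1+\lambda )^{-|V(G)|}\) with \(|\sigma |\) the number of spins set to 1.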

Lemma 7.11

The model \({\varvec{G}}_{\lambda ,\beta }\) satisfies POS for all \(d\ge 3,\lambda >0,\beta \in (0,\infty ]\).

Proof

Substituting \(\psi _\beta \) into POS and noticing that \(1-\mathrm {e}^{-\beta }>0\), we see that it suffices to verify the inequality

$$\begin{aligned}&\mathbb {E}\left[ \left( {\int _0^1\mu _{s,\varvec{x}_1}(1)\mu _{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell +\left( {\int _0^1\mu '_{s,\varvec{x}_1}(1)\mu '_{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell \right. \nonumber \\&\quad \left. -2\left( {\int _0^1\mu _{s,\varvec{x}_1}(1)\mu '_{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell \right] \ge 0 \quad \quad (\mu ,\mu '\in \mathfrak {K}). \end{aligned}$$
(7.26)

By Fubini’s theorem, with \(\varvec{s}_1,\ldots ,\varvec{s}_\ell \) independent uniform samples from \([0,1]\),

$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\mu _{s,\varvec{x}_1}(1)\mu _{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell }\right]&= \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(1)\mu _{\varvec{s}_h,\varvec{x}_2}(1)}\right]\nonumber \\&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right] \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_2}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]}\right]\nonumber \\&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^2}\right], \end{aligned}$$
(7.27)

and analogously

$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\mu '_{s,\varvec{x}_1}(1)\mu '_{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell }\right]&=\mathbb {E}\left[{\mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]^2}\right], \end{aligned}$$
(7.28)
$$\begin{aligned} \mathbb {E}\left[{\left( {\int _0^1\mu _{s,\varvec{x}_1}(1)\mu '_{s,\varvec{x}_2}(1){\mathrm d}s}\right) ^\ell }\right]&=\mathbb {E}\left[{ \mathbb {E}\left[{\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right] \mathbb {E}\left[{\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(1)\bigg |\varvec{s}_1,\ldots ,\varvec{s}_\ell }\right]}\right]. \end{aligned}$$
(7.29)
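In terms of the conditional expectations \(X=\mathbb {E}[\prod _{h=1}^\ell \mu _{\varvec{s}_h,\varvec{x}_1}(1)\mid \varvec{s}_1,\ldots ,\varvec{s}_\ell ]\) and \(X'=\mathbb {E}[\prod _{h=1}^\ell \mu '_{\varvec{s}_h,\varvec{x}_1}(1)\mid \varvec{s}_1,\ldots ,\varvec{s}_\ell ]\), the right-hand sides of (7.27)–(7.29) read \(\mathbb {E}[X^2]\), \(\mathbb {E}[X'^2]\) and \(\mathbb {E}[XX']\), and therefore

$$\begin{aligned} \mathbb {E}\left[{X^2}\right]+\mathbb {E}\left[{X'^2}\right]-2\,\mathbb {E}\left[{XX'}\right]=\mathbb {E}\left[{(X-X')^2}\right]\ge 0. \end{aligned}$$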

Combining (7.27)–(7.29), we conclude that the l.h.s. of (7.26) is the expectation of a square. \(\quad \square \)

We proceed to prove Theorem 1.4. In light of Lemma 7.11, Theorem 2.7 readily yields a variational formula for \({\varvec{G}}_{\lambda ,\beta }\). The main issue that we have to confront is that the resulting variational problem for given \(\lambda ,\beta \) ranges over a space that depends on these parameters. In effect, it is not a priori clear that these variational problems bear any relationship to the one stated in Theorem 2.7. To deal with this issue, let \(\mathfrak {K}_\lambda \) be the set of all \(\mu \in \mathfrak {K}\) such that \(\mu _{s,x}(1)\le \lambda /(1+\lambda )\) for all \(s,x\in [0,1]\), and let \(\mathfrak {D}_{\lambda }\) be the set of all \(\pi \in \mathfrak {D}\) that are supported on \(\mathfrak {K}_\lambda \). Further, for \(\pi \in \mathfrak {D}_{\lambda }\) we let \(\pi ^{*_\beta (N,M)}\) be the distribution obtained by the adjoining operation with respect to the weight function \(\psi _\beta \). Finally, let

$$\begin{aligned} \mathfrak {D}^\star _{\lambda ,\beta }=\left\{ {\pi \in \mathfrak {D}_{\lambda }:\text{ for } \text{ all } N,M\ge 0 \text{ we } \text{ have } \pi ^{*_\beta (N,M)}=\pi }\right\} . \end{aligned}$$

Lemma 7.12

For any \(N,M\ge 0\) the map \(\pi \in \mathfrak {D}_\lambda \mapsto \pi ^{*_\infty (N,M)}\) is continuous.

As in the case of Lemma 3.7, the proof is based on arguments involving the cut metric. The details can be found in Appendix A.

Lemma 7.13

Let \(N,M\ge 0\) be integers. Uniformly for all \(\pi \in \mathfrak {D}_\lambda \) we have \(\pi ^{*_\beta (N,M)}\rightarrow \pi ^{*_\infty (N,M)}\) as \(\beta \rightarrow \infty \).

Proof

Let \(\varepsilon >0\). For any \(\mu \in \mathfrak {K}\) let \(\varvec{Z}^{N,M}_{\mu ,\beta }(s)\) be the weight from (2.16) with respect to the weight function \(\psi _\beta \). Then we see that, uniformly for all \(\mu \) and \(s\),

$$\begin{aligned} \varvec{Z}^{N,M}_{\mu ,\beta }(s)&\rightarrow \varvec{Z}^{N,M}_{\mu ,\infty }(s)\qquad \text{ as } \qquad \beta \rightarrow \infty . \end{aligned}$$
(7.30)

Furthermore, if \(\mu _{s,x}(1)\le \lambda /(1+\lambda )\) for all \(s,x\), then for all \(\beta \in (0,\infty ]\) we have

$$\begin{aligned} \varvec{Z}^{N,M}_{\mu ,\beta }(s)&\ge \left( {\frac{1}{1+\lambda }}\right) ^N\left( {1-\left( {\frac{\lambda }{1+\lambda }}\right) ^2}\right) ^M>0. \end{aligned}$$
(7.31)

Combining (7.30) and (7.31) and recalling the construction of \(\mu ^{*_\beta (N,M)}\), we can construct a measurable map \(\xi :[0,1]\rightarrow [0,1]\) that preserves the Lebesgue measure such that for all large enough \(\beta \) and all measurable \(S,X\subseteq [0,1]\),

$$\begin{aligned} \left| {\int _S\int _X\mu _{s,x}^{*_\beta (N,M)}-\mu _{\xi (s),x}^{*_\infty (N,M)}\,{\mathrm d}x\,{\mathrm d}s}\right|&<\varepsilon . \end{aligned}$$

Thus, \(\mathscr {D}_{\Box }(\mu ^{*_\beta (N,M)},\mu ^{*_\infty (N,M)})<\varepsilon \) for large \(\beta \). Since \(\mathfrak {D}_\lambda \) is endowed with the \(W_1\)-metric, the assertion follows. \(\quad \square \)

Lemma 7.14

The set \(\mathfrak {K}_\lambda \) is closed.

Proof

We can view \(\mathfrak {K}_\lambda \) as a scaled version of the space of weak kernels. Therefore, since \(\mathfrak {K}\) is complete, so is \(\mathfrak {K}_\lambda \). Hence, any Cauchy sequence in \(\mathfrak {K}_\lambda \) has a limit within this set, and thus \(\mathfrak {K}_\lambda \) is a closed subspace of \(\mathfrak {K}\). \(\quad \square \)

Corollary 7.15

The set \(\mathfrak {D}_\lambda \) is closed.

Proof

By Lemma 7.14 there exists an increasing sequence of continuous functions \(u_n:\mathfrak {K}\rightarrow [0,1]\) that converges pointwise to \(1-\varvec{1}\{\mathfrak {K}_\lambda \}\). Thus, \(\mathfrak {D}_\lambda =\bigcap _{n\ge 1}\left\{ {\pi \in \mathfrak {D}:\int u_n\,{\mathrm d}\pi =0}\right\} \) is closed in the weak topology. \(\quad \square \)

Corollary 7.16

We have \(\liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\varvec{G}}_{\lambda ,\beta })}\right]\ge \inf _{\pi \in \mathfrak {D}^\star _\lambda }\mathscr {B}(\pi ).\)

Proof

Since \(\mathfrak {D}^\star \) is compact, Proposition 6.1 shows that there exists \(\pi \in \mathfrak {D}^\star \) such that

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\varvec{G}}_{\lambda ,\beta })}\right]\ge \mathscr {B}(\pi ). \end{aligned}$$
(7.32)

The measure \(\pi \) for which the lower bound is attained is constructed via Proposition 6.2, whose proof shows that \(\pi =\pi _{\lambda ,\beta }\) is the limit of a sequence of distributions \((\pi _{\lambda ,\beta ,n})_{n\ge 1}\) that come from random factor graphs with the weight function \(\psi _\beta \). Specifically, we considered a random factor graph \({\varvec{G}}_{\lambda ,\beta ,n,\omega }\) with a random number of ‘cavities’ for a slowly growing \(\omega =\omega _n\rightarrow \infty \). With \(\mu _{n}\in \mathscr {P}(\Omega ^{\mathscr {C}})\) the joint Boltzmann distribution of the spins of the cavities \(\mathscr {C}\), the measure \(\pi _{\lambda ,\beta ,n}\) is defined as the distribution of the representation of \(\mu _{n}\) as an element of \(\mathscr {M}\). Thus, we just need to show that these representations converge to points in \(\mathfrak {K}_\lambda \).

The proof of this fact is based on Corollary 3.16. Specifically, let \(\varepsilon >0\). We obtain a decomposition \(S_1,\ldots ,S_\ell \) of \(\Omega ^{\mathscr {C}}\) into classes by pinning a random set \(\Theta _\varepsilon \) of cavities. The size \(|\Theta _\varepsilon |\) of this set depends on \(\varepsilon \) only and

$$\begin{aligned} \Delta _{\Box }(\mu _n,{\bar{\mu }}_n)&<\varepsilon ,\quad \text{ where }\qquad {\bar{\mu }}_n=\sum _{i=1}^\ell \mu _n(S_i)\bigotimes _{v\in \mathscr {C}}\mu _v(\,\cdot \,|S_i). \end{aligned}$$
(7.33)

Now, consider a cavity \(v\in \mathscr {C}{\setminus }\Theta _\varepsilon \), let \(1\le i\le \ell \) and consider a configuration \(\sigma \in S_i\) with \(\sigma _v=1\). Obtain \(\sigma '\) from \(\sigma \) by setting \(\sigma '_v=0\) and \(\sigma '_w=\sigma _w\) for all \(w\ne v\). Then \(\sigma '\in S_i\), and the construction of the Boltzmann distribution ensures that \(\mu _n(\sigma |S_i)\le \lambda \mu _n(\sigma '|S_i)\). Hence, \(\mu _v(1|S_i)\le \lambda /(1+\lambda )\). Since \(|\Theta _\varepsilon |\) is bounded in terms of \(\varepsilon \) only, whereas \(|\mathscr {C}|\ge \omega _n/2\rightarrow \infty \) with high probability, we deduce from (7.33) that the representation \({\check{\mu }}_n\in \mathfrak {K}\) satisfies \(\mathscr {D}_{\Box }({\check{\mu }}_n,\mathfrak {K}_\lambda )<\varepsilon \) with high probability. Since, furthermore, the Wasserstein metric induces the weak topology on \(\mathfrak {D}\), we conclude that \(\pi _{\lambda ,\beta ,n}\) converges to a point \(\pi \) in the closure of \(\mathfrak {D}_\lambda \); and since \(\mathfrak {D}_\lambda \) is closed by Corollary 7.15, we obtain \(\pi \in \mathfrak {D}_\lambda \). Hence, \(\pi \in \mathfrak {D}_\lambda \cap \mathfrak {D}^\star =\mathfrak {D}^\star _\lambda \). Thus, the assertion follows from (7.32).

\(\square \)

We are ready to establish the lower bound on the free energy.

Proposition 7.17

For all \(d\ge 3,\lambda >0\) we have \(\liminf _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}_{\lambda ,\infty })]\ge \Phi _{d,\lambda }.\)

Proof

For any \(\beta ,\lambda >0\) Corollary 7.16 supplies \(\pi _{\lambda ,\beta }\in \mathfrak {D}^\star _\lambda \) such that

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\varvec{G}}_{\lambda ,\beta })}\right]\ge \mathscr {B}_{d,\lambda ,\beta }(\pi _{\lambda ,\beta }). \end{aligned}$$
(7.34)

Now consider the sequence \((\pi _{\lambda ,\beta })_{\beta =1,2,\ldots }\). Since \(\mathfrak {D}_\lambda \) is compact, a subsequence \((\pi _{\lambda ,\beta _j})_j\) converges to \(\pi _{\lambda }\in \mathfrak {D}_\lambda \), i.e.,

$$\begin{aligned} \lim _{j\rightarrow \infty }\mathscr {D}_{\Box }(\pi _{\lambda ,\beta _j},\pi _{\lambda })=0. \end{aligned}$$
(7.35)

Further, since \(\pi _{\lambda ,\beta _j}^{*_{\beta _j}(N,M)}=\pi _{\lambda ,\beta _j}\) for all j and \(N,M\ge 0\), Lemma 7.13 implies that for all pairs \(N,M\ge 0\),

$$\begin{aligned} \lim _{j\rightarrow \infty }\mathscr {D}_{\Box }(\pi _{\lambda ,\beta _j},\pi _{\lambda ,\beta _j}^{*_{\infty }(N,M)})=0. \end{aligned}$$
(7.36)

Combining (7.35) and (7.36) with Lemma 7.12, we conclude that \(\pi _\lambda \in \mathfrak {D}^\star _\lambda \). Finally, since for every \(\beta >0\) we have \(\mathscr {B}_{d,\lambda ,\beta }(\,\cdot \,)\ge \mathscr {B}_{d,\lambda ,\infty }(\,\cdot \,)\) on \(\mathfrak {D}_\lambda \), the assertion follows from (7.34) and the continuity of the functional \(\mathscr {B}_{d,\lambda ,\infty }(\,\cdot \,)\). \(\quad \square \)

A separate argument is needed to derive the upper bound on the free energy. We prove the following proposition by checking that the interpolation argument from Sect. 5 goes through for the hard-core model.

Proposition 7.18

For all \(d\ge 3,\lambda >0\) we have \(\limsup _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}_{\lambda ,\infty })]\le \Phi _{d,\lambda }.\)

With \(\varvec{\varphi }_i\) and \(\varvec{\psi }_{1,i}\) defined with respect to the hard-core weight function \(\psi _\infty \), let

$$\begin{aligned} \mathscr {B}'(\mu )&=\mathbb {E}\log \left\langle {{\bigoplus _{i=1}^n\varvec{\varphi }_i},{\mu }}\right\rangle ,&\mathscr {B}''(\mu )&=\mathbb {E}\log \left\langle {{\bigoplus _{1\le i\le dn/2}\varvec{\psi }_{1,i}},{\mu }}\right\rangle . \end{aligned}$$

Lemma 7.19

For any \(\lambda >0\) and any \(\mu \in \mathfrak {K}_\lambda \) we have \(\mathbb {E}\left[{\log Z({\varvec{G}}_{\lambda ,\infty })}\right]\le \mathscr {B}'(\mu )-\mathscr {B}''(\mu )+o(n)\).

Proof

This follows along the lines of the proof of Proposition 5.2. In that proof we required the assumption that all weight functions are strictly positive, but only in one place. Namely, we required positivity in order to expand the logarithm into a power series in equations (5.9)–(5.11). Yet this approximation is still valid in the hard-core model. Indeed, the term \(\left\langle {{\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle \), whose logarithm we calculate in (5.9), is lower-bounded by \(1-\lambda /(1+\lambda )=1/(1+\lambda )\), because in the hard-core model the marginal probability that a single variable node has spin one is upper-bounded by \(\lambda /(1+\lambda )\). Similarly, the arguments of the logarithms in (5.10) and (5.11) are lower-bounded by \(1/(1+\lambda )\) because \(\mu \in \mathfrak {K}_\lambda \). Since all these arguments thus lie in the interval \([1/(1+\lambda ),1]\), the expansion \(\log x=-\sum _{j\ge 1}(1-x)^j/j\) converges uniformly, and the estimates from Sect. 5 carry over.

\(\square \)

Proof of Proposition 7.18

Based on Lemma 7.19, we follow the proof of Proposition 5.3 to complete the proof of Proposition 7.18. Specifically, we claim that for any \(\pi \in \mathfrak {D}^\star _\lambda \),

$$\begin{aligned} \mathbb {E}[\mathscr {B}''(\varvec{\mu }^\pi )]&=\frac{dn}{2}\mathbb {E}\left[{\log \left\langle {{\psi _\infty },{\pi }}\right\rangle }\right],&\text{ and }&\mathbb {E}[\mathscr {B}'(\varvec{\mu }^\pi )]&=n\,\mathbb {E}\log \left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle . \end{aligned}$$
(7.37)

This follows along the lines of Lemmas 5.5 and 5.6. In both cases we assumed that the weight functions are strictly positive in order to ensure that the arguments of the logarithms on the l.h.s. are bounded away from zero so that the logarithmic series applies. But the condition \(\pi \in \mathfrak {D}^\star _\lambda \) guarantees that

$$\begin{aligned} \left\langle {{\bigoplus _{i=1}^n\varvec{\varphi }_i},{\mu }}\right\rangle&\ge (1/(1+\lambda ))^n\quad \text{ and }\quad \left\langle {{\bigoplus _{1\le i\le dn/2}\varvec{\psi }_{1,i}},{\mu }}\right\rangle&\ge (1/(1+\lambda ))^{dn/2}. \end{aligned}$$

Thus, the same manipulations as before yield (7.37). Finally, the assertion follows from (7.37) and Lemma 7.19. \(\quad \square \)

Proof of Theorem 1.4

The theorem is an immediate consequence of Propositions 7.17 and 7.18. \(\quad \square \)

Proof of Corollary 1.5

For a graph \(G=(V(G),E(G))\) let \(\mu _{G,\lambda }\in \mathscr {P}(\{0,1\}^{V(G)})\) denote the hard-core model on G with fugacity \(\lambda \), and let \(Z_\lambda (G)\) be the corresponding partition function. Further, let \(\alpha _\lambda (G)=\sum _{v\in V(G)}\left\langle {{\varvec{\sigma }_v},{\mu _{G,\lambda }}}\right\rangle \) be the average size of an independent set drawn from \(\mu _{G,\lambda }\). Additionally, we write \(\alpha (G)\) for the maximum independent set size. It is well known that

$$\begin{aligned} \alpha _\lambda (G)=\lambda \frac{\partial }{\partial \lambda }\log Z_\lambda (G) \end{aligned}$$
(7.38)

and that

$$\begin{aligned} \frac{\alpha _\lambda (G)}{|V(G)|}&\quad {{\mathop {\longrightarrow }\limits ^{{\lambda \rightarrow \infty }}}}\quad \frac{\alpha (G)}{|V(G)|}&\text{ uniformly } \text{ for } \text{ all } G. \end{aligned}$$
(7.39)
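The identity (7.38) follows from the standard computation with \(Z_\lambda (G)=\sum _I\lambda ^{|I|}\), where \(I\) ranges over the independent sets of \(G\):

$$\begin{aligned} \lambda \frac{\partial }{\partial \lambda }\log Z_\lambda (G)=\frac{\sum _I|I|\lambda ^{|I|}}{Z_\lambda (G)}=\sum _{v\in V(G)}\mu _{G,\lambda }(\varvec{\sigma }_v=1)=\alpha _\lambda (G). \end{aligned}$$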

As an immediate consequence of (7.38) we obtain

$$\begin{aligned} \log Z_{\lambda +1}(G)-\log Z_{\lambda }(G)&=\int _\lambda ^{\lambda +1}\frac{\alpha _t(G)}{t}{\mathrm d}t{\left\{ \begin{array}{ll}\le \alpha _{\lambda +1}(G)/\lambda ,\\ \ge \alpha _\lambda (G)/(\lambda +1). \end{array}\right. } \end{aligned}$$
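Both bounds rely on the monotonicity of \(t\mapsto \alpha _t(G)\), which follows from a second derivative computation:

$$\begin{aligned} \frac{\partial }{\partial \log t}\,\alpha _t(G)=\frac{\partial ^2}{\partial (\log t)^2}\log Z_t(G)=\mathrm {Var}_{\mu _{G,t}}\left[{\sum _{v\in V(G)}\varvec{\sigma }_v}\right]\ge 0. \end{aligned}$$

Hence \(\alpha _\lambda (G)\le \alpha _t(G)\le \alpha _{\lambda +1}(G)\) for \(t\in [\lambda ,\lambda +1]\), while \(1/t\in [1/(\lambda +1),1/\lambda ]\).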

Hence, (7.39) shows that for any \(\varepsilon >0\) there exists \(\lambda _0>0\) such that for all \(\lambda \ge \lambda _0\) and all d-regular graphs G we have

$$\begin{aligned} (1-\varepsilon )\alpha (G)\le \frac{\lambda }{1+\lambda }\alpha _\lambda (G)&\le \lambda (\log Z_{\lambda +1}(G)-\log Z_{\lambda }(G))\le \alpha _{\lambda +1}(G)\le \alpha (G). \end{aligned}$$
(7.40)

Applying (7.40) to the random graph \({\mathbb {G}}\) and taking expectations, we obtain

$$\begin{aligned} (1-\varepsilon )\mathbb {E}\left[{\frac{\alpha ({\mathbb {G}})}{n}}\right]&\le \mathbb {E}\left[{\frac{\lambda }{n}(\log Z_{\lambda +1}({\mathbb {G}})-\log Z_{\lambda }({\mathbb {G}}))}\right]\le \mathbb {E}\left[{\frac{\alpha ({\mathbb {G}})}{n}}\right]. \end{aligned}$$
(7.41)

Theorem 1.4 guarantees that the sequence \(\left( {\mathbb {E}\left[{\frac{\lambda }{n}(\log Z_{\lambda +1}({\mathbb {G}})-\log Z_{\lambda }({\mathbb {G}}))}\right]}\right) _n\) converges, and thus (7.41) yields

$$\begin{aligned} (1-\varepsilon )\limsup _{n\rightarrow \infty }\mathbb {E}\left[{\frac{\alpha ({\mathbb {G}})}{n}}\right]&\le \lambda (\Phi _{d,\lambda +1}-\Phi _{d,\lambda })\le \liminf _{n\rightarrow \infty }\,\mathbb {E}\left[{\frac{\alpha ({\mathbb {G}})}{n}}\right]. \end{aligned}$$
(7.42)

Further, there exists a subsequence \((n_l)_{l\ge 1}\) along which \(\mathbb {E}[\alpha ({\mathbb {G}})/n]\) converges to \(\alpha _*\in [0,1]\), whence (7.41) yields

$$\begin{aligned} (1-\varepsilon )\alpha _*&\le \lambda (\Phi _{d,\lambda +1}-\Phi _{d,\lambda })\le \alpha _*. \end{aligned}$$
(7.43)

Since (7.43) holds for every \(\varepsilon >0\) for large enough \(\lambda \), we conclude that \(\lim _{\lambda \rightarrow \infty }\lambda (\Phi _{d,\lambda +1}-\Phi _{d,\lambda })\) exists. Hence, taking the limit \(\varepsilon \rightarrow 0\), and thus \(\lambda \rightarrow \infty \), in (7.42) completes the proof. \(\quad \square \)