Abstract
In an extremely influential paper, Mézard and Parisi put forward an analytic but non-rigorous approach called the cavity method for studying spin systems on the Bethe lattice, i.e., the random d-regular graph (Mézard and Parisi, Eur Phys J B 20:217–233, 2001). Their technique was based on certain hypotheses; most importantly, that the phase space decomposes into a number of Bethe states that are free from long-range correlations and whose marginals are given by a recurrence called Belief Propagation. In this paper we establish this decomposition rigorously for a very general family of spin systems. In addition, we show that the free energy can be computed from this decomposition. We also derive a variational formula for the free energy. The general results have interesting ramifications for several special cases.
1 Introduction
1.1 Disordered systems and the Bethe lattice
In 2001 in a ground-breaking contribution Mézard and Parisi proposed an analytic but non-rigorous technique that they called the cavity method for the study of spin glasses on the ‘Bethe lattice’,Footnote 1 known in combinatorics as the random d-regular graph [48]. Mézard and Parisi argued that the Bethe lattice constitutes an attractive halfway point between classical ‘mean-field’ models such as the Sherrington-Kirkpatrick model with complete interaction between all sites and spatial models such as the Edwards-Anderson model. Indeed, the Bethe lattice induces a non-trivial metric on the sites, each of which interacts with only a bounded number of others. But at the same time Mézard and Parisi showed that the model is amenable to analytic methods, even though matters are significantly more complicated than in the fully connected case. They went on to argue that the spin glass on the Bethe lattice exhibits many of the properties expected of real glassy systems, such as replica symmetry breaking and the proliferation of pure states.
From the original contribution [48] sprang a truly enormous body of work that has had a transformative impact on an astounding variety of subjects, ranging from physics to combinatorics to machine learning. Many of the applications may appear unexpected, even surprising. Almost all of them hinge on the cavity method. Prominent success stories include the development of ‘low-density parity check codes’, a rare example of a statistical physics idea leading directly to an eminently useful, and widely used, algorithm [56]. A further example is a new algorithm for the compressed sensing problem, a fundamental signal processing task [58]. Other important cavity method-based contributions pertain to classical problems in mathematics, such as phase transitions in random graphs and other random structures [41, 46, 47]. The cavity method has also been used to put forward predictions in machine learning, including on the capacity of the Hopfield model and of restricted Boltzmann machines [45].
Due to these numerous ramifications, vindicating the cavity method rigorously has become an important research task at the junction of mathematical physics, combinatorics and computer science. There has been a lot of progress recently, e.g., [9, 19, 33, 51]; we shall review the literature in greater detail in Sect. 2.5. However, much of this work is concerned with special cases, mostly the ‘replica symmetric’ scenario where there is just a single pure state.
The aim of the present paper is to move past such assumptions and special cases. We confirm several of the key hypotheses of Mézard and Parisi, particularly the decomposition into pure states and the validity of the Belief Propagation recurrence, the mainstay of the cavity calculations. Further, we obtain a general variational formula for the free energy that is perfectly in line with the Mézard-Parisi ansatz. Additionally, we show that the free energy can be computed from the Belief Propagation representation of the pure states of the model. We obtain these results not merely for a specific model, but for a broad family of models on the Bethe lattice. The prime example is, of course, the diluted spin glass model. But in addition, since the proof techniques that we develop are generic, the results apply to models that are of eminent interest in other areas, particularly combinatorics, such as the Potts antiferromagnet or the hard-core model. Crucially, the results apply universally to all parameter values (such as degree, inverse temperature) of the respective models.
We should point out, however, that the present results fall short of fully corroborating the Mézard-Parisi ansatz. Most importantly, while we prove that general random factor graph models possess pure state decompositions represented by (approximate) Belief Propagation fixed points, an important prediction of the Mézard-Parisi ansatz pertains to the relative geometry of these fixed points. Roughly speaking, Mézard and Parisi predict three different scenarios, depending on the parameter values (such as temperature). In the replica symmetric case, there is just a single pure state (or possibly a small bounded number due to inherent symmetries). In the 1-step replica symmetry breaking scenario an unbounded number of pure states occur, each represented by a Belief Propagation fixed point. These fixed points are predicted to exhibit a strong symmetry property that ensures, e.g., that the empirical distributions of the Belief Propagation messages in the different pure states are (nearly) identical. Finally, in the full replica symmetry breaking scenario we also expect an unbounded number of pure states and assorted Belief Propagation fixed points, which are arranged hierarchically in the fashion of an ultrametric tree. While in the present paper we prove the existence of a pure state decomposition and of associated Belief Propagation fixed points, our methods do not suffice to establish these more precise predictions as to the geometry and relative weights of the pure states. A full verification of these predictions remains an important open problem as it would, among other things, lead to a significantly simplified variational formula for the free energy. For a more detailed discussion of the Mézard-Parisi ansatz, replica symmetry breaking, and ultrametricity we refer to [46, 53, 54].
Technically the paper builds upon and continues two intertwined threads of prior work. First, we bring to bear a variant of the ‘regularity method’ from combinatorics that we developed recently [10, 15, 24] in order to establish the pure state decomposition and to vindicate the Belief Propagation equations. Second, we seize upon Panchenko’s work on asymptotic Gibbs measures and the interpolation method, particularly in order to derive the variational formula for the free energy [52, 53]. Both of these methods were previously applied with great success to random graphs of Erdős-Rényi type. This line of work crucially exploited the relative geometric flexibility of the Erdős-Rényi model, whose Poisson degree distribution facilitates coupling arguments. By contrast, the geometry of the Bethe lattice is rigid. While this entails that the specification of the model, the cavity equations and their solution are quite ‘clean’, the rigidity poses substantial technical challenges that the present paper resolves.
Before presenting the main results of the paper, which cover a broad family of problems that we call random factor graph models, in Sect. 2, we illustrate the results and the concepts around which they revolve with the spin glass model from the original contribution of Mézard and Parisi. We also work out an additional application to the hard-core model and the independence number of the random regular graph. Several further applications, including the Potts model and the Max q-Cut problem, are worked out in Sect. 7.
1.2 The diluted spin glass
For integers \(d\ge 3\), \(n>0\) such that dn is even, let \({\mathbb {G}}={\mathbb {G}}(n,d)\) be the uniformly random d-regular graph on the vertex set \(V_n=\{v_1,\ldots ,v_n\}\). With each edge \(e\in E({\mathbb {G}})\) comes a standard Gaussian \(J_e\). The random variables \((J_e)_{e\in E({\mathbb {G}})}\) are mutually independent. For a given inverse temperature \(\beta >0\), the diluted spin glass on \({\mathbb {G}}\) is the probability distribution on \(\{\pm 1\}^{V_n}\) defined by
$$\mu _{{\mathbb {G}}}(\sigma )=\frac{1}{Z({\mathbb {G}})}\exp \bigg (\beta \sum _{\{u,v\}\in E({\mathbb {G}})}J_{\{u,v\}}\sigma _u\sigma _v\bigg )\qquad (\sigma \in \{\pm 1\}^{V_n}), \tag{1.1}$$
where the partition function \(Z({\mathbb {G}})\) ensures normalization.Footnote 2 Without the couplings \(J_e\), this would just be the ferromagnetic Ising model on \({\mathbb {G}}\). But since the \(J_e\) are independent Gaussians, some will be positive and others negative. In effect, some edges induce ferromagnetic and others antiferromagnetic interactions, causing frustration. Thus, \(\mu _{{\mathbb {G}}}\) is a spin glass model, the well-known diluted spin glass on the Bethe lattice.
There are two fundamental problems associated with this and numerous similar models. First, to characterize the structure of the Boltzmann distribution \(\mu _{{\mathbb {G}}}\). Does it exhibit long-range correlations? Does it decompose into one or several ‘pure states’, and if so, how can we characterize them? Second, to calculate the quantity \(\lim _{n\rightarrow \infty } \frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]\), which we call the free energy density. Its fundamental importance is due to the fact that other important observables derive from it. Moreover, the singularities of the function \(\beta \mapsto \lim _{n\rightarrow \infty } \frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]\) constitute the phase transitions of the model.
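For very small instances the quantity \(\frac{1}{n}\log Z({\mathbb {G}})\) can be computed by exhaustive enumeration, which makes for a useful sanity check on the definitions above. The following sketch is ours, not part of the paper's argument; the helper name `log_partition` is hypothetical, and we use the complete graph \(K_4\) as the (unique) 3-regular graph on four vertices.

```python
import itertools
import math
import random

def log_partition(n, edges, J, beta):
    """Brute-force log Z for the diluted spin glass:
    Z = sum over sigma in {-1,+1}^n of exp(beta * sum_e J_e * sigma_u * sigma_v)."""
    Z = 0.0
    for sigma in itertools.product([-1, 1], repeat=n):
        energy = sum(J[e] * sigma[u] * sigma[v] for e, (u, v) in enumerate(edges))
        Z += math.exp(beta * energy)
    return math.log(Z)

# K4 with independent standard Gaussian couplings on its six edges.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
random.seed(0)
J = [random.gauss(0.0, 1.0) for _ in edges]
free_energy_density = log_partition(4, edges, J, beta=0.5) / 4
```

Since the couplings average to zero over all spin configurations, Jensen's inequality gives \(\frac{1}{n}\log Z\ge \log 2\) for every realization of the \(J_e\), which the sketch reproduces.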
1.2.1 Bethe states and the Boltzmann distribution.
With respect to the first problem, Mézard and Parisi hypothesized that the Boltzmann distribution always decomposes into one or a moderate (albeit not necessarily bounded) number of pure states. Further, they hypothesized that these pure states are characterized by fixed points of a recurrence called Belief Propagation. Our first theorem confirms this hypothesis.
To be precise, writing \(\partial v\) for the set of neighbors of a vertex v, let \(\mathscr {M}({\mathbb {G}})\) be the set of all families \((\nu _{u\rightarrow v})_{v\in V_n,u\in \partial v}\) such that \(\nu _{u\rightarrow v}\in [0,1]\). We call \(\nu _{u\rightarrow v}\) the message from u to v. The messages need not be symmetric, i.e., possibly \(\nu _{u\rightarrow v}\ne \nu _{v\rightarrow u}\). Furthermore, Belief Propagation is the operator \(\mathrm {BP}:\mathscr {M}({\mathbb {G}})\rightarrow \mathscr {M}({\mathbb {G}})\), \(\nu \mapsto {\hat{\nu }}\), where
$${\hat{\nu }}_{u\rightarrow v}=\frac{\prod _{w\in \partial u\setminus \{v\}}\left( \mathrm {e}^{\beta J_{uw}}\nu _{w\rightarrow u}+\mathrm {e}^{-\beta J_{uw}}(1-\nu _{w\rightarrow u})\right) }{\sum _{\sigma =\pm 1}\prod _{w\in \partial u\setminus \{v\}}\left( \mathrm {e}^{\beta J_{uw}\sigma }\nu _{w\rightarrow u}+\mathrm {e}^{-\beta J_{uw}\sigma }(1-\nu _{w\rightarrow u})\right) }. \tag{1.2}$$
The motivation behind this operator, and the origin of the name ‘cavity method’, is this. Suppose we fix a vertex v in a d-regular graph along with a neighbor u. Now suppose we remove the vertex u, thereby creating a ‘cavity’. Then the ‘ideal’ message \(\mu _{{\mathbb {G}},u\rightarrow v}\) that we would like to compute is just the marginal probability \(\mu _{{\mathbb {G}}-v,u}(1)\) that u takes spin 1 in the subgraph obtained by removing v. If the Boltzmann distribution \(\mu _{{\mathbb {G}}}\) is free from long-range correlations, then these ideal messages should plausibly be a fixed point of the BP operator. Indeed, if we remove v, then very likely its former neighbors will be mutually far apart in the resulting graph. In effect, the joint distribution of their spins should factorize. If so, then a straightforward calculation verifies that the ideal messages are a fixed point of BP. In fact this reasoning goes back to Bethe’s classical work [15].
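The reasoning of the previous paragraph can be illustrated computationally. The following is our own minimal sketch (function names `bp_step`, `bethe_free_energy` are hypothetical, not from the paper): messages are probabilities of spin +1, one `bp_step` applies the BP operator, and the Bethe free-energy functional, which is exact on acyclic graphs at the BP fixed point, ties the messages back to \(\log Z\).

```python
import itertools
import math

def bp_step(messages, adj, J, beta):
    """One application of the BP operator: the new message u -> v aggregates
    the incoming messages from u's other neighbors w != v."""
    new = {}
    for (u, v) in messages:
        num = {+1: 1.0, -1: 1.0}
        for w in adj[u]:
            if w == v:
                continue
            m, Juw = messages[(w, u)], J[frozenset((u, w))]
            for s in (+1, -1):
                num[s] *= m * math.exp(beta * Juw * s) + (1 - m) * math.exp(-beta * Juw * s)
        new[(u, v)] = num[+1] / (num[+1] + num[-1])
    return new

def bethe_free_energy(messages, adj, J, beta):
    """Bethe approximation to log Z; exact on trees at the BP fixed point."""
    logZ = 0.0
    for v in adj:  # vertex contributions
        Zv = 0.0
        for s in (+1, -1):
            t = 1.0
            for u in adj[v]:
                m, Juv = messages[(u, v)], J[frozenset((u, v))]
                t *= m * math.exp(beta * Juv * s) + (1 - m) * math.exp(-beta * Juv * s)
            Zv += t
        logZ += math.log(Zv)
    for e in J:  # edge corrections, one per edge {u, v}
        u, v = tuple(e)
        Ze = sum(math.exp(beta * J[e] * s * t)
                 * (messages[(u, v)] if s > 0 else 1 - messages[(u, v)])
                 * (messages[(v, u)] if t > 0 else 1 - messages[(v, u)])
                 for s in (+1, -1) for t in (+1, -1))
        logZ -= math.log(Ze)
    return logZ
```

On the path 0–1–2, for instance, iterating `bp_step` from uniform messages converges within a few rounds, and `bethe_free_energy` then reproduces the exact \(\log Z\) obtained by enumerating all eight spin assignments.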
However, generally spin glass models do exhibit long-range correlations, a phenomenon called replica symmetry breaking (see, e.g., [17, 25] for proofs that replica symmetry breaking occurs in certain models). Yet the fundamental hypothesis of Mézard and Parisi holds that the phase space \(\{\pm 1\}^{V_n}\) always decomposes into Bethe states \(S_1,\ldots ,S_\ell \) in such a way that the conditional distributions \(\mu _{{\mathbb {G}}}[\,\cdot \,|S_h]\) are free from long-range correlations. Formally, this means that if we pick a pair of vertices \((v_i,v_j)\) uniformly at random, then typically the conditional joint distribution \(\mu _{{\mathbb {G}},v_i,v_j}[\,\cdot \,|S_h]\) of the spins of \(v_i\) and \(v_j\) is close to the product distribution \(\mu _{{\mathbb {G}},v_i}(\,\cdot \,|S_h)\otimes \mu _{{\mathbb {G}},v_j}(\,\cdot \,|S_h)\), i.e.,
$$\sum _{i,j=1}^n\left\| \mu _{{\mathbb {G}},v_i,v_j}(\,\cdot \,|S_h)-\mu _{{\mathbb {G}},v_i}(\,\cdot \,|S_h)\otimes \mu _{{\mathbb {G}},v_j}(\,\cdot \,|S_h)\right\| _{\mathrm {TV}}=o(n^2). \tag{1.3}$$
In effect, within each Bethe state the ‘ideal’ messages are predicted to be an approximate fixed point of the BP operator. To be precise, for adjacent vertices u, v we write \(\mu _{{\mathbb {G}},v\rightarrow u}[S_h]=\mu _{{\mathbb {G}}-u,v}(1|S_h)\) for the conditional probability given \(S_h\) that v takes spin 1 in the subgraph of \({\mathbb {G}}\) with u removed. Then we expect that the family \(\mu _{{\mathbb {G}}}[S_h]=(\mu _{{\mathbb {G}},v\rightarrow u}[S_h])_{u\in V_n,v\in \partial u}\) satisfies
$$\sum _{u\in V_n}\sum _{v\in \partial u}\left| \mu _{{\mathbb {G}},v\rightarrow u}[S_h]-\mathrm {BP}(\mu _{{\mathbb {G}}}[S_h])_{v\rightarrow u}\right| =o(n). \tag{1.4}$$
Further, the cavity method predicts that the Boltzmann marginals can be obtained from the messages by a formula quite similar to (1.2):
$$\mu _{{\mathbb {G}},v}(1\,|\,S_h)=\frac{\prod _{u\in \partial v}\left( \mathrm {e}^{\beta J_{uv}}\mu _{{\mathbb {G}},u\rightarrow v}[S_h]+\mathrm {e}^{-\beta J_{uv}}(1-\mu _{{\mathbb {G}},u\rightarrow v}[S_h])\right) }{\sum _{\sigma =\pm 1}\prod _{u\in \partial v}\left( \mathrm {e}^{\beta J_{uv}\sigma }\mu _{{\mathbb {G}},u\rightarrow v}[S_h]+\mathrm {e}^{-\beta J_{uv}\sigma }(1-\mu _{{\mathbb {G}},u\rightarrow v}[S_h])\right) }+o(1)\qquad \text {for all but }o(n)\text { vertices }v. \tag{1.5}$$
The following theorem establishes these conjectures rigorously. We say that \({\mathbb {G}}\) enjoys a property with high probability (‘w.h.p.’) if the probability that the property holds tends to one as \(n\rightarrow \infty \).
Theorem 1.1
For any \(d\ge 3\), \(\beta >0\) the following is true. Let \(L=L(n)\) be any integer sequence that tends to infinity. Then there exists a decomposition \(S_0=S_0({\mathbb {G}}),S_1=S_1({\mathbb {G}}),\ldots ,S_\ell =S_\ell ({\mathbb {G}})\), \(\ell =\ell ({\mathbb {G}})\le L\), of the phase space \(\{\pm 1\}^n\) into non-empty sets such that \(\mu _{{\mathbb {G}}}(S_0)=o(1)\) and such that with high probability (1.3)–(1.5) are satisfied for \(h=1,\ldots ,\ell \).
Crucially, and in contrast to much prior work in this area, Theorem 1.1 applies indiscriminately to all \(d,\beta \). While it is expected that in the ‘high-temperature’ regime (small \(\beta \)) there is just a single pure state, it is widely conjectured that for large d and \(\beta \) the number of pure states is unbounded. Thus, we do not expect that it will be possible to replace the unbounded L in Theorem 1.1 by a constant. Yet Theorem 1.1 shows that the number of states can be upper bounded by an arbitrarily slowly growing function L(n).
1.2.2 The free energy.
The Bethe states and their associated messages contain all the information needed to compute the free energy. To be precise, once more following the ideas of Mézard and Parisi, we can set up a recurrence for computing the difference \(\mathbb {E}[\log Z({\mathbb {G}}(n+1,d))]-\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\), which in turn enables us to write a formula for \(\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\) by telescoping. To set up such a recurrence it is necessary to crack the rigid geometry of the random regular graph open a little bit. To this end, we resort to the idea of creating a few ‘cavities’. Specifically, we delete a few random vertices and edges from \({\mathbb {G}}(n,d)\). Formally, let \(\omega >0\) and let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). Moreover, let \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\) be sequences of uniformly random vertices and edges of \({\mathbb {G}}\), chosen independently. With \(S_1,\ldots ,S_\ell \) the decomposition from Theorem 1.1, we introduce weights
and \(\varvec{z}_{{\mathbb {G}}}=\sum _{h=1}^\ell \varvec{z}_{{\mathbb {G}},h}\). Further, let \(\mathscr {C}({\mathbb {G}})\) be the set of all vertices of degree less than d in the graph \({\mathbb {G}}_{n,\omega }\) obtained from \({\mathbb {G}}\) by removing \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\). Then with high probability each \(c\in \mathscr {C}({\mathbb {G}})\) has degree precisely \(d-1\), and we write \(c'\) for the erstwhile d’th neighbor of c. Further, with \(\varvec{c}_1,\varvec{c}_2,\ldots \) a sequence of uniformly and independently chosen elements of \(\mathscr {C}({\mathbb {G}})\) and \((\varvec{J}_i)_{i\ge 1}\) a sequence of independent standard Gaussians, we let
The expression \(\mathscr {B}({\mathbb {G}})\) mirrors our recurrence for the difference \(\mathbb {E}[\log Z({\mathbb {G}}(n+1,d))]-\mathbb {E}[\log Z({\mathbb {G}}(n,d))]\). Having created a moderate number of cavities, we insert a new \((n+1)\)st vertex, connected to d randomly chosen ‘cavities’. The first summand above represents the ensuing change in the free energy. But this operation adds d more edges, whereas a random regular graph with \(n+1\) vertices only has d / 2 more edges than one with n vertices. Therefore, a correction term is needed. Hence the second summand.
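The telescoping mentioned above is the following elementary identity (glossing over the parity constraint that dn be even, which is handled carefully in the actual proofs); here \(n_0\) is any fixed starting point:

```latex
\frac{1}{n}\,\mathbb{E}[\log Z(\mathbb{G}(n,d))]
  \;=\; \frac{1}{n}\sum_{N=n_0}^{n-1}
        \Big(\mathbb{E}[\log Z(\mathbb{G}(N+1,d))]
             -\mathbb{E}[\log Z(\mathbb{G}(N,d))]\Big)
        \;+\;\frac{1}{n}\,\mathbb{E}[\log Z(\mathbb{G}(n_0,d))].
```

The last term is \(O(1/n)\), and therefore if the increments converge to a limit, the free energy density converges to the same limit by Cesàro averaging.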
Crucially, the functional \(\mathscr {B}({\mathbb {G}})\) depends only on the pure state decomposition from Theorem 1.1 and the associated messages. The following theorem shows that this information suffices to compute the free energy.
Theorem 1.2
For all \(d\ge 3,\beta >0\) we have
Entirely in line with the ideas developed in [48], Theorem 1.2 establishes a direct conceptual link between Belief Propagation and the pure state decomposition from Theorem 1.1 on the one hand and the free energy on the other, for all \(d,\beta \). Of course, in order to evaluate \(\mathscr {B}({\mathbb {G}})\) it is necessary to actually determine the pure state decomposition along with the corresponding Belief Propagation messages. The shape of this decomposition, and the practical difficulty of computing it, will depend significantly on the parameters \(d,\beta \). Alternatively, as we see next, it is possible to derive a variational formula for the free energy.
1.2.3 A variational formula.
The variational formula comes in terms of an optimization problem on a space that resembles the graphon space from the theory of graph limits [44]. To be precise, let \(\nu :[0,1]^2\rightarrow [0,1]\), \((s,x)\mapsto \nu _{s,x}\) and \(\nu ':[0,1]^2\rightarrow [0,1]\), \((s,x)\mapsto \nu '_{s,x}\) be measurable maps. We define the cut distance between \(\nu ,\nu '\) by
$$\mathscr {D}_{\Box }(\nu ,\nu ')=\inf _{\varphi ,\varphi '}\,\sup _{S,X}\,\left| \int _S\int _X\nu _{\varphi (s),\varphi '(x)}-\nu '_{s,x}\,\mathrm dx\,\mathrm ds\right| ,$$
where \(\varphi ,\varphi ':[0,1]\rightarrow [0,1]\) are measurable maps that preserve the Lebesgue measure and \(S,X\subset [0,1]\) are measurable. Obtain the space \(\mathfrak {K}\) by identifying any \(\nu ,\nu '\) with \(\mathscr {D}_{\Box }(\nu ,\nu ')=0\). Then \(\mathfrak {K}\) endowed with the cut distance is a compact metric space. In addition, write \(\mathfrak {D}\) for the space of probability measures on \(\mathfrak {K}\).
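For kernels that are constant on the cells of a \(p\times p\) grid (step kernels), a toy approximation of the cut distance can be computed by brute force: restrict the measure-preserving maps to permutations of the blocks and the sets S, X to unions of blocks. This is our own illustrative sketch, not the construction used in the paper, and it is feasible only for tiny p.

```python
import itertools

def cut_distance_steps(nu, nup):
    """Approximate cut distance between two p x p step kernels: minimize over
    block permutations (phi, phi'), maximize over block-index sets S, X the
    normalized absolute block sum of the difference."""
    p = len(nu)
    blocks = range(p)
    subsets = [set(c) for r in range(p + 1) for c in itertools.combinations(blocks, r)]
    best = float("inf")
    for phi in itertools.permutations(blocks):
        for phip in itertools.permutations(blocks):
            worst = 0.0
            for S in subsets:
                for X in subsets:
                    diff = sum(nu[phi[s]][phip[x]] - nup[s][x] for s in S for x in X)
                    worst = max(worst, abs(diff) / p ** 2)
            best = min(best, worst)
    return best
```

For example, a kernel and a copy of it with relabeled blocks are at distance zero, reflecting the identification of kernels with \(\mathscr {D}_{\Box }(\nu ,\nu ')=0\) in the definition of \(\mathfrak {K}\).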
The formula for the free energy comes as a variational problem on a subspace \(\mathfrak {D}^\star \) of \(\mathfrak {D}\). Let \(N,M\ge 0\) be integers. For \(\mu \in \mathfrak {K}\) we define a randomly perturbed \(\mu ^{*(N,M)}\in \mathfrak {K}\) as follows. Let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of uniform random variables on [0, 1] and let \((\varvec{J}_{i,j})_{i,j\ge 1}\) be a family of standard Gaussians, all mutually independent. Then for \(s\in [0,1]\) we define
Further, let
Now, suppose that \(\pi \in \mathfrak {D}\) is a distribution, and write \(\varvec{\mu }^\pi \in \mathfrak {K}\) for a sample from \(\pi \). Then we let \(\mathfrak {D}^\star \) be the set of all \(\pi \in \mathfrak {D}\) such that the perturbed \(\varvec{\mu }^{\pi *(N,M)}\) has distribution \(\pi \) again for all \(N,M\ge 0\).
The definition of \(\mathfrak {D}^\star \), which is an adaptation of the one stated by Panchenko [52] in the case of models of Erdős-Rényi type, mirrors a natural combinatorial invariance property of the graph \({\mathbb {G}}_{n,\omega }\) with the random cavities. Indeed, because the numbers \(\varvec{X},\varvec{Y}\) of deleted vertices and edges are Poisson with a large mean \(\omega \), for any fixed N, M the random graph \({\mathbb {G}}_{n,\omega }\) with \(\varvec{X}\) deleted vertices and \(\varvec{Y}\) deleted edges is close in total variation to the one with merely \(\varvec{X}-N\) deleted vertices and \(\varvec{Y}-M\) deleted edges. Furthermore, because adding or removing a small number of edges only affects the Boltzmann weights by a bounded factor, we should expect that the Bethe states of these two factor graphs remain the same. But, of course, the relative probability masses of the Bethe states will be different. Accordingly, the weights \(\varvec{z}_s\) mirror the changes in the weights of the Bethe states upon re-insertion of N vertices, each with d incident edges, and another M edges into \({\mathbb {G}}_{n,\omega }\). Once we take \(\omega \) and n to infinity, the closeness of the two random factor graphs in total variation translates into the statement that the distribution of the messages emitted by the cavities of \({\mathbb {G}}_{n,\omega }\) belongs to \(\mathfrak {D}^\star \).
Finally, define a functional \(\mathscr {B}:\mathfrak {K}\rightarrow \mathbb {R}\) by letting
We are ready to state the variational formula for the free energy.
Theorem 1.3
For all \(d\ge 3\) and \(\beta >0\) we have \(\displaystyle \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z_\beta ({\mathbb {G}})]=\min _{\pi \in \mathfrak {D}^\star }\mathbb {E}[\mathscr {B}(\varvec{\mu }^\pi )].\)
Theorem 1.2 provides the combinatorial interpretation of the optimal \(\pi \) for Theorem 1.3: it is the kernel representing the messages \((\mu _{{\mathbb {G}},c\rightarrow c'}[\,\cdot \,|S_h])_{c\in \mathscr {C}({\mathbb {G}}),h=1,\ldots ,\ell }\) sent out by the cavities on the individual Bethe states.
1.3 The hard-core model
As a second application we discuss the hard-core model on the random regular graph \({\mathbb {G}}={\mathbb {G}}(n,d)\). This is a probability distribution on the collection of independent sets of \({\mathbb {G}}\) parametrized by \(\lambda >0\), the fugacity. Formally, encoding subsets of the vertex set by their indicator vectors, we define
$$\mu _{{\mathbb {G}}}(\sigma )=\frac{1}{Z({\mathbb {G}})}\,\lambda ^{\sum _{v\in V_n}\sigma _v}\prod _{\{u,w\}\in E({\mathbb {G}})}\varvec{1}\left\{ \sigma _u\sigma _w=0\right\} \qquad (\sigma \in \{0,1\}^{V_n}),$$
with \(Z({\mathbb {G}})\) the partition function that turns \(\mu _{{\mathbb {G}}}\) into a probability measure. Thus, \(\mu _{{\mathbb {G}}}(\sigma )=0\) unless the 1-entries of \(\sigma \) form an independent set in \({\mathbb {G}}\), in which case the weight of \(\sigma \) is proportional to \(\lambda \) taken to the power of the size of the independent set.
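The definition can be made concrete by brute force on a tiny graph. The sketch below is ours (the helper name `hardcore_measure` is hypothetical): it enumerates all indicator vectors, keeps the independent sets, and normalizes. On a triangle the independent sets are the empty set and the three singletons, so \(Z=1+3\lambda \).

```python
import itertools

def hardcore_measure(n, edges, lam):
    """Enumerate sigma in {0,1}^n, discard those whose 1-entries are not an
    independent set, and weight the rest by lam ** (number of 1-entries)."""
    weights = {}
    for sigma in itertools.product([0, 1], repeat=n):
        if any(sigma[u] and sigma[v] for (u, v) in edges):
            continue  # some edge has both endpoints occupied
        weights[sigma] = lam ** sum(sigma)
    Z = sum(weights.values())
    return Z, {sigma: w / Z for sigma, w in weights.items()}

# Triangle at fugacity 2: Z = 1 + 3 * 2 = 7.
Z, mu = hardcore_measure(3, [(0, 1), (1, 2), (0, 2)], lam=2.0)
```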
The hard-core model, of great prominence in statistical physics, is of eminent importance in combinatorics as well because it is closely related to the problem of finding the size of the largest independent set of the random regular graph. For d large, this problem was solved by Ding, Sly, and Sun [34] using an intricate version of the second-moment method guided by insights from the 1-step replica symmetry breaking (1RSB) version of the cavity method. But according to the physics predictions [13], the 1RSB method runs into an inherent obstacle for small d as the model exhibits a continuous phase transition to a more complicated ‘full replica symmetry breaking’ (full RSB) phase. In Corollary 1.5 below we will derive a formula for the largest independent set size that holds for all d and that accommodates the full RSB scenario.
But let us first deal with the free energy of the hard-core model, in and of itself a well-known problem. To derive a variational formula for the free energy, obtain \(\mathfrak {K}_\lambda \) from the space of all measurable functions \([0,1]^2\rightarrow [0,\lambda /(1+\lambda )]\) by identifying any \(\nu ,\nu '\) with \(\mathscr {D}_{\Box }(\nu ,\nu ')=0\). Then \(\mathfrak {K}_\lambda \) is compact. In addition, we let \(\mathfrak {D}_\lambda \) be the space of probability measures on \(\mathfrak {K}_\lambda \). Similarly to the spin glass problem, the formula for the free energy comes as a variational problem on a subspace \(\mathfrak {D}^\star _\lambda \) of \(\mathfrak {D}_\lambda \). This subspace is defined as follows. Let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of independent random variables, uniformly distributed on [0, 1], and let \(N,M\ge 0\) be integers. Then for \(\mu \in \mathfrak {K}_\lambda \) we define a random \(\mu ^{*(N,M)}\in \mathfrak {K}_\lambda \) as follows. For \(s\in [0,1]\) let
and set
Further, suppose that \(\pi \in \mathfrak {D}_\lambda \) is a distribution, and write \(\varvec{\mu }^\pi \in \mathfrak {K}_\lambda \) for an element chosen from \(\pi \). Then we let \(\mathfrak {D}^\star _\lambda \) be the set of all \(\pi \in \mathfrak {D}_\lambda \) such that \(\varvec{\mu }^\pi \) and \(\varvec{\mu }^{\pi *(N,M)}\) are identically distributed for all \(N,M\ge 0\). Finally, let \(\mathscr {B}:\mathfrak {K}_\lambda \rightarrow \mathbb {R}\) be the function defined by
The variational formula for the free energy reads as follows.
Theorem 1.4
For all \(d\ge 3\) and \(\lambda >0\) we have
$$\Phi _{d,\lambda }:=\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]=\min _{\pi \in \mathfrak {D}^\star _\lambda }\mathbb {E}[\mathscr {B}(\varvec{\mu }^\pi )].$$
In the limit \(\lambda \rightarrow \infty \) the distribution \(\mu _{{\mathbb {G}},\lambda }\) concentrates on the maximum independent sets of the random graph. As an application of Theorem 1.4 we therefore obtain the following result on the size of the largest independent set, i.e., the independence number \(\alpha ({\mathbb {G}})\) of the random graph.
Corollary 1.5
For all \(d\ge 3\) we have \(\displaystyle \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\alpha ({\mathbb {G}})]=\lim _{\lambda \rightarrow \infty }\lambda \cdot (\Phi _{d,\lambda +1}-\Phi _{d,\lambda }).\)
The formula in Corollary 1.5 may not be easy to evaluate; in particular, it may be difficult to obtain a numerical estimate for a given value of d. Nonetheless, since the proofs show that the optimal \(\pi \) in Theorem 1.4 is closely related to the Belief Propagation fixed points on \({\mathbb {G}}\), it should be possible to extract combinatorial information about the independent set problem on random graphs. In any case, Theorem 1.4 and Corollary 1.5 put a lid on the complexity of the problem.
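The mechanism behind Corollary 1.5 is the elementary fact that \(\log Z({\mathbb {G}})/\log \lambda \rightarrow \alpha ({\mathbb {G}})\) as \(\lambda \rightarrow \infty \) for any fixed graph. This can be illustrated on a toy instance; the sketch and the helper name `hardcore_log_Z` are ours, and the 5-cycle (with independence number 2) stands in for the random regular graph.

```python
import itertools
import math

def hardcore_log_Z(edges, n, lam):
    """Brute-force log Z for the hard-core model at fugacity lam."""
    Z = 0.0
    for sigma in itertools.product([0, 1], repeat=n):
        if any(sigma[u] and sigma[v] for (u, v) in edges):
            continue
        Z += lam ** sum(sigma)
    return math.log(Z)

# The 5-cycle has independence number alpha = 2; as lam grows,
# log Z / log lam decreases toward alpha.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
ratios = [hardcore_log_Z(cycle, 5, lam) / math.log(lam) for lam in (1e2, 1e4, 1e8)]
```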
1.4 Organization
In Sect. 2 we present the main results of the paper, which cover a broad family of random factor graph models. At the end of Sect. 2 we are in a position to discuss related work in detail. Sections 3–6 deliver the proofs of these general results. Finally, in Sect. 7 we show how Theorems 1.1–1.4 and Corollary 1.5 follow from the general results in Sect. 2. In addition, we work through several more applications that have each received considerable attention in their own right, such as the Potts antiferromagnet.
2 Random Factor Graphs
In this section we present the main results of the paper, which cover a broad class of models called random factor graphs. The class encompasses many well-studied examples of problems on random regular graphs or hypergraphs, including the spin glass model from the previous section. Some other cases, such as the hard-core model or extremal cuts, can be dealt with by taking limits; we will come to that in Sect. 7.
2.1 Definitions
To define random factor graph models, we consider a finite set \(\Omega \ne \varnothing \) whose elements we call spins. Moreover, for an integer \(k\ge 2\) we let \((\Psi ,P)\) be a probability space of weight functions \(\psi :\Omega ^k\rightarrow (0,1)\). We always denote by \(\varvec{\psi }\) an element of \(\Psi \) chosen from the distribution P. The space \(\Psi \) may be finite or infinite. In the latter case we assume that
Furthermore, we always assume that the distribution P is invariant under permutations of the coordinates. That is, for any \(\psi \in \Psi \) and for any permutation \(\kappa \) of \(\{1,\ldots ,k\}\) the function \(\psi ^\kappa :\sigma \mapsto \psi (\sigma _{\kappa _1},\ldots ,\sigma _{\kappa _k})\) belongs to \(\Psi \) as well and \(\varvec{\psi }^\kappa \) has the same distribution as \(\varvec{\psi }\). Additionally, let p be a probability distribution on \(\Omega \) with \(p(\omega )>0\) for all \(\omega \in \Omega \). Further, let \(d\ge 3 ,n>0\) be integers and set \(m=\lfloor dn/k\rfloor \). Let \(V_n=\{v_1,\ldots ,v_n\}\) be a set of variable nodes and let \(F_m=\{a_1,\ldots ,a_m\}\) be a set of constraint nodes.
Definition 2.1
Suppose that k divides dn. The random factor graph \({\varvec{G}}={\varvec{G}}(n,d,p,P)\) consists of
a weight function \(\psi _{a_i}\in \Psi \) drawn from the distribution P independently for each \(i=1,\ldots ,m\) and
an independent uniformly random bijection \(\partial _{{\varvec{G}}}:F_m\times \{1,\ldots ,k\}\rightarrow V_n\times \{1,\ldots ,d\}\).
The definition resembles the pairing model of random regular graphs [40]. Accordingly, we use standard graph-theoretic terminology. For instance, we call \(x_i\in V_n\) and \(a_j\in F_m\) adjacent if there exist \(s\in [d]\) and \(t\in [k]\) such that \(\partial _{{\varvec{G}}}(a_j,t)=(x_i,s)\). We also use the symbol \(\partial _{{\varvec{G}}}(a_j,t)\) for the variable node \(x_i\) such that \(\partial _{{\varvec{G}}}(a_j,t)=(x_i,s)\). Further, we write \(\partial _{{\varvec{G}}}x_i\) for the set of all \(a_j\in F_m\) that \(x_i\) is adjacent to, and similarly for \(a_j\). We omit the index and just write \(\partial x_i,\partial a_j\) etc. where the reference to the random graph is apparent. In particular, \({\varvec{G}}\) induces a bipartite graph on the variable and constraint nodes, and thereby the shortest path metric on \(V_n\cup F_m\). Hence, by extension of the above notation, we write \(\partial ^\ell _{{\varvec{G}}}u\) for the set of all nodes at distance precisely \(\ell \) from u and \(\nabla ^\ell _{{\varvec{G}}}u\) for the set of all variable nodes at distance at most \(\ell \) from u.
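The random bijection of Definition 2.1 is easy to sample: match the \(m\cdot k\) constraint-node ‘slots’ with a uniformly shuffled list of the \(n\cdot d\) variable-node ‘clones’. The sketch below is our own illustration of this pairing-model construction (the function name `random_factor_graph` is hypothetical).

```python
import random

def random_factor_graph(n, d, k, rng):
    """Sample the uniformly random bijection F_m x [k] -> V_n x [d] of
    Definition 2.1; return, for each constraint a_j, its k-tuple of variables."""
    if (d * n) % k:
        raise ValueError("pairing requires k | d*n")
    m = d * n // k
    # d 'clones' of each variable node, matched uniformly to the m*k constraint slots
    clones = [(i, s) for i in range(n) for s in range(d)]
    rng.shuffle(clones)
    return [[clones[j * k + t][0] for t in range(k)] for j in range(m)]

neighbors = random_factor_graph(n=6, d=3, k=2, rng=random.Random(1))
```

By construction every variable node occurs in exactly d slots and every constraint node in exactly k, although the resulting (multi-)graph need not be simple; conditioning on simplicity yields \({\mathbb {G}}\).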
We let \(\mathscr {S}\) be the event that \({\varvec{G}}\) is simple, i.e., that no variable node and constraint node are joined by more than one edge. Moreover, we denote by \({\mathbb {G}}\) the conditional distribution of \({\varvec{G}}\) given \(\mathscr {S}\). Let us make a note of the following well-known fact.
Fact 2.2
([40]). We have \(\mathbb {P}\left[ {\varvec{G}}\in \mathscr {S}\right] \sim \exp \left[ -(d-1)(k-1)/2-\varvec{1}\{k=2\}(d-1)^2/4\right] \).
The random factor graph induces a probability distribution on \(\Omega ^{V_n}\). To define it, we introduce the shorthand \(\psi _{a_i}(\sigma )=\psi _{a_i}(\sigma (\partial (a_i,1)),\ldots ,\sigma (\partial (a_i,k)))\) for \(i\in [m]\) and \(\sigma \in \Omega ^{V_n}\). Thus, \(\psi _{a_i}(\sigma )\) is the weight that constraint node \(a_i\) gives to \(\sigma \). Further, we introduce the total weight
$$\psi _{{\varvec{G}}}(\sigma )=\prod _{i=1}^m\psi _{a_i}(\sigma ),$$
by multiplying up all the weight functions of the constraint nodes. The total weights \(\psi _{{\varvec{G}}}(\sigma )\) give rise to the partition function and the Boltzmann distribution:
$$Z({\varvec{G}})=\sum _{\tau \in \Omega ^{V_n}}\psi _{{\varvec{G}}}(\tau ),\qquad \mu _{{\varvec{G}}}(\sigma )=\frac{\psi _{{\varvec{G}}}(\sigma )}{Z({\varvec{G}})}\qquad (\sigma \in \Omega ^{V_n}).$$
Since all the weight functions \(\psi \in \Psi \) are strictly positive, the Boltzmann distribution is a well-defined probability measure on the phase space \(\Omega ^{V_n}\).
We set out to investigate the structure of the Boltzmann distribution \(\mu _{{\varvec{G}}}(\,\cdot \,)\) and to compute the partition function \(Z({\varvec{G}})\) or, more specifically, its logarithm, which we call the free energy. In Sect. 2.2 we present the main result of the paper, which shows that the Boltzmann distribution decomposes into a convex combination of relatively simple distributions called Bethe states. But before we come to that, let us look at an example.
Example 2.3
(the k-spin model). Let \(\Omega =\{\pm 1\}\), let \(k\ge 2\) be an integer and let \(\beta >0\) be a real parameter. The k-spin model is a generalization of the spin glass model from the previous section, which corresponds to the special case \(k=2\). The weight functions of the k-spin model read
Thus, \(\Psi =\{\psi _{\beta ,J}:J\in \mathbb {R}\}\), and the distribution P on \(\Psi \) is defined by choosing J from the standard Gaussian distribution. This distribution clearly satisfies (2.1). Geometrically, this model lives on a generalized Bethe lattice where all variable nodes, representing the sites, have degree d, while all constraint nodes, representing the interactions, have degree k.
The hard-core model from Sect. 1.3 cannot be expressed as a factor graph model directly because of the requirement that all weight functions be strictly positive. But it is possible to arrive at the hard-core model by taking suitable limits; see Sect. 7 for details.
2.2 Bethe states
The Belief Propagation message-passing scheme provides the mainstay of the physicists’ non-rigorous cavity method. Our first main result vindicates its use by showing that the Boltzmann distribution of any random factor graph model can be described in terms of Belief Propagation fixed points.
To introduce Belief Propagation let \(\mathscr {M}({\varvec{G}})\) be the message space, consisting of all families
of probability measures \(\nu _{v\rightarrow a},\nu _{a\rightarrow v}\) on \(\Omega \). For adjacent a, v we interpret \(\nu _{v\rightarrow a}\) as a ‘message’ from v to a, and \(\nu _{a\rightarrow v}\) as a message in the reverse direction. We equip \(\mathscr {M}({\varvec{G}})\) with the metric
Belief Propagation is the operator \(\mathrm {BP}:\mathscr {M}({\varvec{G}})\rightarrow \mathscr {M}({\varvec{G}})\) that maps \(\nu \) to \({\hat{\nu }}\) defined by
Further, a point \(\nu \in \mathscr {M}({\varvec{G}})\) is an \(\varepsilon \)-Belief Propagation fixed point if \(\mathscr {D}_1(\nu ,\mathrm {BP}(\nu ))<\varepsilon .\)
For a thorough discussion and motivation of Belief Propagation we refer to [46]. The punch line is that on acyclic factor graphs a Belief Propagation fixed point computation provably yields the marginals of the Boltzmann distribution as well as the free energy. Since the random graph \({\varvec{G}}\) contains only very few short cycles, one may therefore expect that Belief Propagation renders meaningful information on random factor graphs as well, provided that the Boltzmann distribution is free of long-range correlations.
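The displayed update rules are omitted in this version; the following sketch implements the standard Belief Propagation updates for factor graphs (as in [46]) and checks, on a tiny acyclic instance with illustrative weights and a uniform prior, that the resulting marginals agree with brute-force Boltzmann marginals.

```python
import itertools
import math
import random

Omega = (-1, +1)
rng = random.Random(1)

# A tiny acyclic factor graph: variables 0,1,2; constraint 'a' on (0,1), 'b' on (1,2).
adj = {'a': (0, 1), 'b': (1, 2)}
psi = {a: {tau: rng.uniform(0.1, 0.9)
           for tau in itertools.product(Omega, repeat=len(vs))}
       for a, vs in adj.items()}
prior = {s: 1.0 / len(Omega) for s in Omega}
V = (0, 1, 2)

def normalize(d):
    z = sum(d.values())
    return {s: x / z for s, x in d.items()}

def run_bp(iters=30):
    # messages nu_{v->a} and nu_{a->v}, initialized uniformly
    nu_va = {(v, a): dict(prior) for a, vs in adj.items() for v in vs}
    nu_av = {(a, v): dict(prior) for a, vs in adj.items() for v in vs}
    for _ in range(iters):
        for a, vs in adj.items():            # constraint-to-variable update
            for i, v in enumerate(vs):
                nu_av[(a, v)] = normalize({
                    s: sum(psi[a][tau]
                           * math.prod(nu_va[(w, a)][tau[j]]
                                       for j, w in enumerate(vs) if j != i)
                           for tau in itertools.product(Omega, repeat=len(vs))
                           if tau[i] == s)
                    for s in Omega})
        for a, vs in adj.items():            # variable-to-constraint update
            for v in vs:
                nu_va[(v, a)] = normalize({
                    s: prior[s] * math.prod(nu_av[(b, v)][s]
                                            for b, ws in adj.items()
                                            if v in ws and b != a)
                    for s in Omega})
    return nu_av

def bp_marginal(v, nu_av):
    return normalize({s: prior[s] * math.prod(nu_av[(b, v)][s]
                                              for b, ws in adj.items() if v in ws)
                      for s in Omega})

def exact_marginal(v):
    m = {s: 0.0 for s in Omega}
    for sigma in itertools.product(Omega, repeat=len(V)):
        w = math.prod(prior[sigma[u]] for u in V) \
            * math.prod(psi[a][tuple(sigma[u] for u in vs)] for a, vs in adj.items())
        m[sigma[v]] += w
    return normalize(m)

nu_av = run_bp()
for v in V:
    b, e = bp_marginal(v, nu_av), exact_marginal(v)
    assert all(abs(b[s] - e[s]) < 1e-9 for s in Omega)  # exact on trees
```

On acyclic graphs the fixed point is reached after a number of rounds proportional to the diameter; on random factor graphs, this kind of agreement between messages and marginals is precisely what the Bethe state conditions below assert within each state.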
Alas, in general long-range correlations do occur. Nevertheless, we will prove that the Boltzmann distribution still decomposes into a convex combination of relatively few ‘Bethe states’, characterized by Belief Propagation fixed points. To be precise, suppose that \(\varnothing \ne S\subset \Omega ^{V_n}\) is an event. Let v be a variable node and let \(a\in \partial _{{\varvec{G}}} v\). Then we define \(\mu _{{\varvec{G}},v\rightarrow a}(\,\cdot \,|S)\) as the conditional marginal of v given S under the Boltzmann distribution of the factor graph \({\varvec{G}}-a\) obtained from \({\varvec{G}}\) by removing the constraint node a. In formulas, with \(\left\langle {{\,\cdot \,},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle \) denoting the expectation with respect to \(\varvec{\sigma }\) drawn from \(\mu _{{\varvec{G}}}(\,\cdot \,|S)\), we have
Similarly, we let \(\mu _{{\varvec{G}},a\rightarrow v}(\,\cdot \,|S)\) be the conditional marginal of v under the Boltzmann distribution of the factor graph obtained from \({\varvec{G}}\) by removing all constraint nodes \(b\in \partial _{{\varvec{G}}} v{\setminus } a\) and disregarding the prior of v:
We refer to \(\mu _{{\varvec{G}},v\rightarrow a}(\,\cdot \,|S),\mu _{{\varvec{G}},a\rightarrow v}(\,\cdot \,|S)\) as the standard messages given S.
Definition 2.4
Let \(\varepsilon >0\). An event \(S\subset \Omega ^n\) is an \(\varepsilon \)-Bethe state of \({\varvec{G}}\) if the following two conditions hold.
- BS1: the standard messages given S are an \(\varepsilon \)-Belief Propagation fixed point.
- BS2: if \(\ell ,\ell '\le 1/\varepsilon \) and if \(\varvec{I}\subset V_n\), \(\varvec{J}\subset F_m\) are independent uniformly random sets of sizes \(|\varvec{I}|=\ell \), \(|\varvec{J}|=\ell '\), then for every \(\sigma \in \Omega ^{V_n}\) we have
$$\begin{aligned}&\mathbb {E}\bigg |\left\langle {{\varvec{1}\{\forall v\in \varvec{I}\cup \partial \varvec{J}\cup \partial ^2\varvec{I}: \varvec{\sigma }_v=\sigma _v\}},{\mu _{{\varvec{G}}}(\,\cdot \,|S)}}\right\rangle \nonumber \\&\quad -\prod _{v\in \varvec{I}}\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a{\setminus } v}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau ) \prod _{w\in \partial a{\setminus } v}\mu _{w\rightarrow a}(\tau _w|S)}\cdot \nonumber \\&\quad \prod _{a\in \partial \varvec{J}}\frac{\psi _a(\sigma )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\sigma _w|S)}{\sum _{\tau \in \Omega ^{\partial a}}\psi _a(\tau )\prod _{w\in \partial a}\mu _{w\rightarrow a}(\tau _w|S)} \bigg |<\varepsilon . \end{aligned}$$(2.6)
Thus, on a Bethe state the standard messages form an approximate Belief Propagation fixed point. Furthermore, locally around a bunch of randomly chosen variable and constraint nodes the Boltzmann distribution is characterized by the standard messages. In particular, setting \(\ell =0\) and \(\ell '=1\) in BS2, we see that the conditional joint distribution \(\mu _{{\varvec{G}},\partial a}(\,\cdot \,|S)\) of the variables around a typical random constraint node a reads
Additionally, setting \(\ell =1\) and \(\ell '=0\), we find that the local distribution around a typical variable node v, i.e., the distribution \(\mu _{{\varvec{G}},v\cup \partial ^2v}(\,\cdot \,|S)\) induced on the second neighborhood of v, reads
Thus, for most variable nodes v the conditional Boltzmann marginal \(\mu _{{\varvec{G}},v}(\,\cdot \,|S)\) satisfies
Apart from the conditioning on S, the formulas (2.7)–(2.9) coincide with the ones known in the acyclic case [46].
In addition, (2.6) implies that if we pick a few variable and/or constraint nodes randomly, then the joint distribution of their neighborhoods approximately factorizes. Applied to \(\ell =2\), \(\ell '=0\), this means that once we condition on S, the joint distribution of two randomly chosen variable nodes is close to a product distribution:
in statistical physics jargon, the conditional distribution \(\mu _{{\varvec{G}}}(\,\cdot \,|S)\) is replica symmetric.
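As a minimal numerical illustration of the factorization in (2.10) with \(S=\Omega ^n\), the following sketch computes, for a short Ising chain (an admittedly tiny stand-in for a random factor graph, with illustrative parameters), the joint Boltzmann distribution of two far-apart variables and compares it with the product of their marginals.

```python
import itertools
import math
import random

rng = random.Random(3)
Omega = (-1, +1)
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # a chain of 5 spins
beta = 0.3
J = {e: rng.gauss(0.0, 1.0) for e in edges}

# Boltzmann distribution mu(sigma) proportional to exp(beta * sum_e J_e sigma_u sigma_v)
w = {sigma: math.exp(beta * sum(J[(u, v)] * sigma[u] * sigma[v] for (u, v) in edges))
     for sigma in itertools.product(Omega, repeat=n)}
z = sum(w.values())
mu = {sigma: x / z for sigma, x in w.items()}

def marginal(I):
    out = {}
    for sigma, p in mu.items():
        key = tuple(sigma[i] for i in I)
        out[key] = out.get(key, 0.0) + p
    return out

def tv_from_product(u, v):
    """Total variation distance between the joint of (u, v) and the product of marginals."""
    joint, mu_u, mu_v = marginal((u, v)), marginal((u,)), marginal((v,))
    return 0.5 * sum(abs(joint[(s, t)] - mu_u[(s,)] * mu_v[(t,)])
                     for s in Omega for t in Omega)

# Correlations decay along the chain: the endpoints are closer to
# independent than an adjacent pair.
assert tv_from_product(0, 4) <= tv_from_product(0, 1) + 1e-12
```

For the chain the endpoint correlation is the product of the edge correlations \(\tanh (\beta J_e)\), so it is always dominated by the correlation of a single edge; genuine replica symmetry in the sense of (2.10) concerns *typical* pairs of variables, which on a sparse random graph are far apart.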
Confirming the picture sketched by the cavity method and vindicating the use of Belief Propagation for the study of the Boltzmann distribution, the following theorem shows that w.h.p. the Boltzmann distribution of a random factor graph decomposes into a relatively small number of Bethe states.
Theorem 2.5
For any function \(L=L(n)\rightarrow \infty \) there exists \(\varepsilon =\varepsilon (n)\rightarrow 0\) such that the following is true. There exists a decomposition \(S_0=S_0({\varvec{G}}),S_1=S_1({\varvec{G}}),\ldots ,S_\ell =S_\ell ({\varvec{G}})\), \(\ell =\ell ({\varvec{G}})\le L\), of \(\Omega ^n\) into non-empty sets such that \(\mu _{{\varvec{G}}}(S_0)\le \varepsilon \) and such that with high probability \(S_1,\ldots ,S_\ell \subset \Omega ^n\) are \(\varepsilon \)-Bethe states. The same statement holds with \({\varvec{G}}\) replaced by \({\mathbb {G}}\).
An important feature of Theorem 2.5 is that the upper bound L on the size of the Bethe state decomposition can be an arbitrarily slowly growing function of n. Thus, the Gibbs measure can generally be decomposed into relatively few Bethe states, within which long-range correlations are negligible and where short-range correlations are characterized by Belief Propagation.
2.3 The free energy
Apart from the structure of the Boltzmann distribution, a second key challenge is the computation of the free energy. More specifically, arguably the single most important quantity associated with a random factor graph model is the free energy density
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}\left[{\log Z({\varvec{G}})}\right] . \end{aligned}$$(2.11)
Of course, it comes as no surprise that computing (2.11) generally poses a formidable challenge. In fact, even the existence of the limit remains an unresolved problem in several interesting cases.
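On very small instances \(\frac{1}{n}\log Z\) can of course be computed by exhaustive enumeration. The following sketch does so for a toy two-spin system on a 4-cycle with Gaussian couplings; the instance and the weight normalization (chosen to keep each factor in (0, 1)) are illustrative, not the paper's.

```python
import itertools
import math
import random

rng = random.Random(7)
Omega = (-1, +1)
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # every variable has degree d = 2
beta = 0.4
J = {e: rng.gauss(0.0, 1.0) for e in edges}

def total_weight(sigma):
    """psi_G(sigma): product over constraint nodes of weights in (0,1)."""
    w = 1.0
    for (u, v) in edges:
        x = beta * J[(u, v)] * sigma[u] * sigma[v]
        w *= math.exp(x) / (2.0 * math.cosh(beta * J[(u, v)]))  # lies in (0,1)
    return w

Z = sum(total_weight(sigma) for sigma in itertools.product(Omega, repeat=n))
f = math.log(Z) / n   # free energy density of this single small sample
assert Z > 0.0
```

Enumeration costs \(|\Omega |^n\) and is hopeless beyond toy sizes; the point of Theorems 2.6 and 2.7 is precisely to express the \(n\rightarrow \infty \) limit without such enumeration.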
The next theorem provides a formula for (and en passant establishes the existence of) the limit (2.11) in terms of the Bethe state decomposition from Theorem 2.5 for a broad class of models. We merely require a certain ‘convexity condition’. This condition can be stated neatly in terms of a space that resembles the graphon space from combinatorics [44]. Specifically, let \(\mathscr {K}\) be the space of all measurable maps \([0,1]^2\rightarrow \mathscr {P}(\Omega )\) modulo equality (Lebesgue-)almost everywhere. We call these maps strong kernels. For \((s,x)\in [0,1]^2\) and \(\mu \in \mathscr {K}\) we let \(\mu _{s,x}\in \mathscr {P}(\Omega )\) denote the function value of \(\mu \) at (s, x). Further, for \(\mu ,\mu '\in \mathscr {K}\) we define the cut distance
where the infimum is over all measurable \(\varphi ,\varphi ':[0,1]\rightarrow [0,1]\) that preserve the Lebesgue measure and where the supremum runs over all measurable \(S,X\subset [0,1]\). Strictly speaking, \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) is a pre-metric (as possibly \(\mathscr {D}_{\Box }(\mu ,\nu )=0\) even though \(\mu \ne \nu \)). We therefore let \(\mathfrak {K}\) be the metric space where any two \(\mu ,\nu \) with \(\mathscr {D}_{\Box }(\mu ,\nu )=0\) are identified. Then \(\mathfrak {K}\) is a compact Polish space [39]. Additionally, we write \(\mathfrak {D}\) for the space of all probability distributions on \(\mathfrak {K}\).
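Since the displayed formula (2.12) is not reproduced in this version, the sketch below brute-forces a discretized analogue of the cut distance for tiny step kernels: block permutations stand in for the measure-preserving maps \(\varphi ,\varphi '\) and unions of blocks for the sets S, X; for binary \(\Omega \) it suffices to track the mass of one spin. This is an illustrative discretization, not the paper's exact definition.

```python
import itertools

def subsets(items):
    for r in range(len(items) + 1):
        yield from itertools.combinations(items, r)

def cut_distance(mu, nu):
    """Brute-force cut distance between two n-by-n step kernels.

    mu[s][x] holds the mass of one fixed spin in block (s, x); block
    permutations stand in for measure-preserving maps phi, phi'.
    """
    n = len(mu)
    best = float('inf')
    for phi in itertools.permutations(range(n)):
        for phip in itertools.permutations(range(n)):
            worst = max(abs(sum(mu[s][x] - nu[phi[s]][phip[x]]
                                for s in S for x in X)) / n ** 2
                        for S in subsets(list(range(n)))
                        for X in subsets(list(range(n))))
            best = min(best, worst)
    return best

mu = [[0.2, 0.8],
      [0.8, 0.2]]
nu = [[0.8, 0.2],
      [0.2, 0.8]]               # mu with the first coordinate relabelled
assert cut_distance(mu, nu) < 1e-12          # relabelling costs nothing
assert cut_distance(mu, [[0.5, 0.5], [0.5, 0.5]]) > 0.0
```

The first assertion illustrates why \(\mathscr {D}_{\Box }\) is only a pre-metric on \(\mathscr {K}\): distinct kernels that differ by a rearrangement are at distance 0, which is exactly what the passage to \(\mathfrak {K}\) quotients out.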
Crucially, the convexity assumption that we require comes solely in terms of the distribution P on the set \(\Psi \) of weight functions. Namely, let \(\varvec{x}=(\varvec{x}_i)_{i\ge 1}\) be a sequence of independent uniformly random points in [0, 1], chosen independently of \(\varvec{\psi }\in \Psi \). Writing \(\mathbb {E}\left[{\,\cdot \,}\right]\) for the expectation on \(\varvec{x},\varvec{\psi }\), we make the following assumption.
We will see in Sect. 7 that POS is easily verified for several interesting models, including the spin glass model from Sect. 1.
To obtain the formula for the free energy, we will represent the Bethe state decomposition of the random factor graph by a point in \(\mathfrak {K}\). Specifically, let \(\varvec{X},\varvec{Y}\) be random variables with distribution \(\mathrm{Po}(\omega )\) for an integer \(\omega >0\), mutually independent and independent of \({\varvec{G}}\). Then with \(S_1,\ldots ,S_\ell \) the decomposition promised by Theorem 2.5 we introduce for \(i=1,\ldots ,\ell \),
and we let \({\check{\varvec{z}}}_{{\varvec{G}}}=\sum _{i=1}^\ell {\check{\varvec{z}}}_{{\varvec{G}},i}\). It will emerge that combinatorially \({\check{\varvec{z}}}_{{\varvec{G}},i}/{\check{\varvec{z}}}_{{\varvec{G}}}\) represents the probability mass of the Bethe state \(S_i\) in the factor graph \({\varvec{G}}'\) where we remove the first \(\varvec{Y}\) constraint nodes \(a_1,\ldots ,a_{\varvec{Y}}\) as well as the first \(\varvec{X}\) variable nodes \(v_1,\ldots ,v_{\varvec{X}}\) along with their adjacent constraint nodes. While this removal operation has no discernible impact on the free energy (so long as \(\omega =o(n)\)), it enables us to set up a recurrence for computing this quantity.
The recurrence comes in terms of the messages sent out by those variable nodes that are left with degree \(d-1\) after the removal operation. We thus set up a kernel that captures these messages. Specifically, let \(v_{h_1},\ldots ,v_{h_t}\) be the variable nodes of degree \(d-1\) in the factor graph \({\varvec{G}}'\) and let \(b_{1},\ldots ,b_{t}\) be their \({\varvec{G}}\)-neighbors that got deleted. Then we define the kernel \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}:[0,1]^2\rightarrow \mathscr {P}(\Omega )\) by letting
Recalling that \({\varvec{G}},\varvec{X},\varvec{Y}\) are random, we write \({\check{\pi }}_{n,\omega }\in \mathfrak {D}\) for the distribution of \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\). Analogously, we write \({\check{\pi }}_{n,\omega ,\mathscr {S}}\) for the distribution of \({\check{\mu }}_{{\mathbb {G}},\varvec{X},\varvec{Y}}\) defined for the simple random factor graph.
Finally, we introduce a functional on the space \(\mathfrak {D}\) that encodes the recurrence for computing the free energy from the Bethe state decomposition. Namely, let \((\varvec{x}_{i,j})_{i,j\ge 1}\) be a family of random variables that are uniform on [0, 1], let \((\varvec{h}_i)_{i\ge 1}\) be a family of random variables that are uniform on \(\{1,\ldots ,k\}\), let \((\varvec{\psi }_i)_{i\ge 1}\) be a sequence of samples from P, and let \(\varvec{\mu }^{\pi }\in \mathfrak {K}\) be a sample from \(\pi \in \mathfrak {D}\), all mutually independent; then
We obtain the following expression for the free energy.
Theorem 2.6
Assume that condition POS is satisfied. Then
In particular, the limit on the left hand side exists, and it can be computed from the Bethe state decomposition.
2.4 A variational formula
We proceed to state a variational formula for the free energy of the random factor graph models akin to the one from Theorem 1.3 for the spin glass model. Namely, we express the limit (2.11) variationally as the infimum of \(\mathscr {B}(\pi )\) over \(\pi \) chosen from a certain subspace \(\mathfrak {D}^\star \subset \mathfrak {D}\). The definition of \(\mathfrak {D}^\star \) is an adaptation to the Bethe lattice of the invariance property that Panchenko [52] put forward in the case of the Erdős-Rényi model.
To define the subspace \(\mathfrak {D}^\star \) let \(\mu \in \mathscr {K}\), let \(s\in [0,1]\) and let \(N,M\ge 0\) be integers. We introduce the random variable
Further, let
Thus, for each \(\mu \in \mathfrak {K}\) we obtain a random \(\mu ^{*(N,M)}\in \mathfrak {K}\). Further, given \(\pi \in \mathfrak {D}\) we can apply this operation to a randomly chosen kernel \(\varvec{\mu }^\pi \in \mathfrak {K}\), thus obtaining a random kernel \(\varvec{\mu }^{\pi *(N,M)}\). We denote the distribution of \(\varvec{\mu }^{\pi *(N,M)}\) by \(\pi ^{*(N,M)}\). Now, let \(\mathfrak {D}^\star \) be the set of all distributions \(\pi \in \mathfrak {D}\) such that \(\pi ^{*(N,M)}=\pi \) for all \(N,M\ge 0\). Then we obtain the following self-contained formula for the free energy.
Theorem 2.7
Assume that POS holds. Then
Admittedly, the variational formula may not be easy to evaluate. But Theorem 2.7 places a lid on the complexity of the problem, and Theorem 2.6 provides an explicit combinatorial interpretation of the minimizer in terms of Belief Propagation fixed points and Bethe states.
2.5 Discussion and related work
Over the past two decades an enormous amount of research, based on both rigorous and non-rigorous techniques, has been devoted to random factor graph models. Much of this work has been sparked by the cavity method advanced in the original contribution of Mézard and Parisi [48]. A survey of this literature up until about 2008 can be found in [46]. More recently models of Bayesian inference problems such as the stochastic block model have received a great deal of attention as well; this literature is surveyed in [1, 50, 58].
Rigorous work on random factor graphs and the cavity method can broadly be split into two categories. First, contributions that investigate physics predictions on specific models. Many of these contributions, particularly the earlier ones, rely on ‘classical’ techniques such as the second moment method, albeit frequently with physics-inspired twists. Examples include work on the k-SAT threshold [3, 6, 20, 21, 32], which culminated in the proof of the k-SAT threshold conjecture for large k [33], the Potts model and the random graph coloring problem [4, 11, 16, 26] or the hard-core model [27, 34]. Some recent work is based on the powerful but technically demanding idea of ‘spatial coupling’, which has led to important results in, e.g., coding theory [37] and random constraint satisfaction problems [2]. A second line of work focuses on the mathematical vindication of the cavity method in general, with applications to specific models of interest. Examples include work on the role of spatial mixing [7, 28, 29, 30], the use of the interpolation method [14, 43, 55], phase transitions in inference problems [12, 19], and contributions based on the asymptotic analysis of the Boltzmann distribution such as the influential work of Panchenko [52] as well as [10, 15]. The present paper belongs to this second category.
In the following we discuss the main results and methods of the paper and how they compare to prior mathematical research. Subsequently we compare the present work with the physics intuition and discuss directions for future research.
2.5.1 Mathematical work.
We regard Theorem 2.5 as the main result of the paper. The theorem confirms in great generality one of the key assumptions behind the cavity method and explains the success of Belief Propagation as a device for analyzing random regular factor graph models. Indeed, the existence of a Bethe state decomposition has been conjectured explicitly, e.g., by Mézard and Montanari [46, Chapter 19]; see also Dembo and Montanari [28].
In a prior paper [23] we constructed a Bethe state decomposition for random factor graph models of Erdős-Rényi type, where the constraint nodes independently choose k-tuples of adjacent variable nodes. While we will be able to use some of the general tools developed in that work, the main argument breaks down in the case of the Bethe lattice due to its rigid geometry. Indeed, the construction of the Bethe state decomposition hinges on coupling arguments involving, e.g., a coupling of a factor graph with n variable and m constraint nodes and another one with parameters \(n'\) and \(m'\) such that \(n=n'+O(1)\), \(m=m'+O(1)\). Due to the Poisson degree distribution and the Stein-Chen property, such arguments are fairly straightforward in the Erdős-Rényi case. One might say that the Erdős-Rényi graph resembles a gentle climbing wall with footholds supplied by the irregularity of the Poisson degree distribution. By contrast, the Bethe lattice with its regular degree makes for a smooth cliff. As a consequence, the Bethe lattice requires new ideas, leading to a rather subtle but ultimately elegant argument. The upshot is that this proof, which we present in Sect. 4, can be expected to generalize to other random graph models with given degrees. Apart from the appeal of such lattice-like models from a physics perspective, these models play a vital role, e.g., in coding theory, where a suitably chosen degree sequence is apt to greatly boost performance [56].
Similarly, the variational formula for the free energy provided by Theorem 2.7 is a generalization and adaptation of the formula established by Panchenko [52] for models of Erdős-Rényi type with spins \(\Omega =\{\pm 1\}\). Panchenko’s proof relies on two ingredients: an interpolation argument and a coupling argument. So does ours. But while the interpolation argument, an adaptation of the technique of Franz and Leone [35], goes through without too much trouble, the coupling argument does not. Once more the rigidity of the Bethe lattice poses substantial challenges that require subtle new arguments. A further, albeit relatively minor, extension is that the present work applies to relatively general models with two or more spins, subject only to the condition POS. A further similarity between Panchenko’s work and ours is the embedding of discrete Boltzmann distributions into a compact metric space, which enables us to pick convergent subsequences. While Panchenko resorts to the Aldous-Hoover representation, here we use the cut metric and the associated kernel space, which makes it convenient to link the combinatorial representation of the measures in terms of messages directly with the free energy formula. That the Aldous-Hoover representation is closely related to graph limits is, of course, a well known fact [31].
Furthermore, Bayati, Gamarnik and Tetali [14] applied the interpolation method to factor graph models, including ones with regular degrees, to establish the existence of the limit \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\varvec{G}})]\) in certain cases via a super-additivity argument. In the process they also used arguments based on ‘cavities’, i.e., the removal of a small but linear number of vertices from the graph; a similar trick was used in [22] as well. But here, particularly in the construction of the Bethe state decomposition, we need to tread much more carefully. In particular, while removal of a small linear number of vertices does not shift the free energy too much, here we can only afford the creation of a very small number of cavities in order to avoid a distortion of the Boltzmann distribution, an extremely volatile object.
Theorem 2.6, which expresses the free energy density in terms of the Bethe state decomposition, is a synthesis of Theorems 2.5 and 2.7. The proof shows that the free energy can be expressed in terms of a particular distribution on kernels \([0,1]^2\rightarrow \mathscr {P}(\Omega )\), namely the one that encodes the Bethe state decomposition of the random factor graph or, more specifically, the associated Belief Propagation messages. No corresponding result was previously known even in the conceptually simpler Erdős-Rényi case.
Apart from the interpolation method and coupling arguments, the proofs of Theorems 2.5–2.7 rely on some of the techniques that we developed in [10, 15, 19, 24], particularly the cut metric and its ramifications. The cut metric, which we apply to kernel representations of probability distributions, was originally developed in the context of the regularity method [36] and the theory of graph limits in combinatorics [44]. Here we use the cut metric and certain assorted results, such as the ‘pinning lemma’ (Lemma 3.15 below) from [19] as tools, e.g., in the construction of the Bethe state decomposition.
While the present paper is concerned with diluted models where each node has a (fixed) bounded number of neighbors, there is also a substantial literature on fully connected models. The prime example, of course, is the Sherrington-Kirkpatrick model. The monographs of Panchenko [54] and Talagrand [57] provide an overview of this literature. In particular, the TAP equations, the (simplified) fixed point equations that correspond to the Belief Propagation equations in the fully connected case, have been established in several cases [8].
In Sect. 7 we work out several applications of the general results to specific models, such as the spin glass model from Sect. 1. Pointers to related work on the specific problems can be found there.
2.5.2 The physics perspective.
The seminal work of Mézard and Parisi [48] marks the starting point of a substantial body of physics work. Highlights include the Survey Propagation algorithm and precise predictions on phase transitions, including satisfiability thresholds in combinatorial problems [41, 47, 49].
The results provided by Theorems 2.5–2.7 are perfectly in line with the physics predictions. But we should comment on a subtle point that is apt to cause confusion. Namely, it has been pointed out that within the replica symmetric phase of certain models the support of the Boltzmann distribution may decompose into an exponentially large number of tiny ‘clusters’ [41, 47], a phenomenon called ‘dynamic replica symmetry breaking’. Indeed, it has been conjectured that each of these tiny clusters induces a Bethe state [46]; for the special case of the random graph coloring problem, this can be verified rigorously [11]. At first glance this proliferation of Bethe states may appear to contradict Theorem 2.5, where the number of Bethe states is upper-bounded by an arbitrarily slowly growing function L(n). Yet the Bethe state decomposition is not unique, and despite the abundance of tiny clusters, \(\mu _{{\varvec{G}}}\) itself is replica symmetric (i.e., condition (2.10) holds for \(S= \Omega ^n\)) throughout the dynamic RSB phase. In effect, Theorem 2.5 would render just a single Bethe state that comprises all of the tiny clusters. By contrast, beyond the dynamic RSB phase, within the so-called condensed phase, Theorem 2.5 would yield a non-trivial decomposition. The existence of a condensed phase has been established rigorously in several examples [17, 19].
The variational formula for the free energy furnished by Theorem 2.7 is in line with the physics work, which does, however, provide additional clues as to the structure of the minimizer of the functional \(\mathscr {B}(\,\cdot \,)\). Specifically, three different scenarios are expected to occur, depending on the model and the choice of its parameters. First, the replica symmetric scenario with a single (or a bounded number of) Bethe states. Second, the so-called ‘one-step replica symmetry breaking’ scenario, where there are an unbounded number of ‘independent’ Bethe states. Third, the ‘full replica symmetry breaking’ scenario, where the Bethe states form a hierarchical structure; see [46] for a detailed discussion. Clearly, in order to better evaluate the variational formula it would be very valuable to establish this additional structural information rigorously; in the Erdős-Rényi case first attempts have been undertaken in [53].
2.6 Organization
In Sect. 3 we introduce the necessary pieces of notation and state some basic results that we will need. Then in Sect. 3.3 we revisit the cut metric. While much of what we need on this subject already appears in earlier papers, there are a few general preparations that we need to make and that we carry out in that section. Subsequently Sect. 4 deals with the proof of Theorem 2.5. In Sects. 5 and 6 we then prove Theorem 2.7 about the variational formula for the free energy. Sect. 6 also contains the proof of Theorem 2.6. Finally, in Sect. 7 we work through a few applications, including the spin glass and hard-core models from Sect. 1.
3 Preliminaries
3.1 Basics
For an integer \(\ell \ge 1\) we use the shorthand \([\ell ]=\{1,\ldots ,\ell \}\). Furthermore, the symbols \(O(\,\cdot \,),\Omega (\,\cdot \,),\ldots \) refer to the limit \(n\rightarrow \infty \) by default. To indicate asymptotics with respect to another variable K tending to infinity, we write \(O_K(\,\cdot \,),\Omega _K(\,\cdot \,)\), etc. Further, where set operations involve singletons, we usually omit braces. For instance, if \(x\in X\), then we just write \(X{\setminus } x\) rather than \(X{\setminus }\left\{ {x}\right\} \).
For a finite set \(\mathscr {X}\) we let \(\mathscr {P}(\mathscr {X})\) be the set of all probability distributions on \(\mathscr {X}\), endowed with the total variation distance. More generally, if \((\mathscr {X},\mathfrak {A})\) is a measurable space, then \(\mathscr {P}(\mathscr {X})=\mathscr {P}(\mathscr {X},\mathfrak {A})\) denotes the set of all probability measures on this space. Further, for probability measures \(\pi ,\pi '\in \mathscr {P}(\mathscr {X})\) we let \(\Gamma (\pi ,\pi ')\) be the set of all couplings of \(\pi ,\pi '\). Thus, \(\gamma \in \Gamma (\pi ,\pi ')\) is a probability distribution on \(\mathscr {X}\times \mathscr {X}\) with marginals \(\pi ,\pi '\).
Suppose that \(\mathscr {X}\) is a finite set, that \(n\ge 1\) is an integer and that \(\mu \in \mathscr {P}(\mathscr {X}^n)\). Then we denote by \(\varvec{\sigma }^\mu ,\varvec{\sigma }^{1,\mu },\varvec{\sigma }^{2,\mu },\ldots \) a sequence of independent samples from \(\mu \). We omit the superscript \(\mu \) where it is evident from the context. Further, if \(f:(\mathscr {X}^n)^\ell \rightarrow \mathbb {R}\) is a function, then we write \(\left\langle {{f(\varvec{\sigma }^1,\ldots ,\varvec{\sigma }^\ell )},{\mu }}\right\rangle \) for the expectation of f with respect to independent samples from \(\mu \); thus,
Suppose that \(\Omega ,V\ne \varnothing \) are finite sets. For a distribution \(\mu \in \mathscr {P}(\Omega ^V)\) and a set \(I\subset V\) we denote by \(\mu _I\) the joint distribution of the coordinates I. That is,
For \(\sigma \in \Omega ^I\) we use the shorthand \(\mu (\sigma )=\mu _I(\sigma )\). Moreover, if \(I=\{i_1,\ldots ,i_l\}\) we usually write \(\mu _{i_1,\ldots ,i_l}\) instead of \(\mu _{\{i_1,\ldots ,i_l\}}\). Additionally, if \(I\subset V\) and \(\tau \in \Omega ^V\), then we let \(\tau _I=(\tau _i)_{i\in I}\) be the restriction of \(\tau \) to I.
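In code, with a distribution on \(\Omega ^V\) stored as a dictionary from assignments to probabilities, the marginal \(\mu _I\) and the restriction \(\tau _I\) look as follows; the distribution itself is an arbitrary illustrative choice.

```python
import itertools

Omega = ('a', 'b')
V = 3   # coordinates 0, 1, 2

# An arbitrary distribution mu on Omega^V, stored as {assignment: probability}.
raw = {sigma: i + 1 for i, sigma in enumerate(itertools.product(Omega, repeat=V))}
z = sum(raw.values())
mu = {sigma: x / z for sigma, x in raw.items()}

def marginal(mu, I):
    """The joint distribution mu_I of the coordinates in I."""
    out = {}
    for sigma, p in mu.items():
        key = tuple(sigma[i] for i in I)
        out[key] = out.get(key, 0.0) + p
    return out

mu_01 = marginal(mu, (0, 1))
assert abs(sum(mu_01.values()) - 1.0) < 1e-12

# The restriction tau_I of an assignment tau to I:
tau = ('a', 'b', 'a')
assert tuple(tau[i] for i in (0, 2)) == ('a', 'a')
```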
We keep the notation from Sect. 2; in particular, \(\Omega \) continues to denote a finite set of spins, p is a probability distribution on \(\Omega \), \(\Psi \) is a measurable space of functions \(\Omega ^k\rightarrow (0,1)\), and P is a probability distribution on \(\Psi \). In addition, throughout the paper we denote by
uniformly distributed random variables with values in [0, 1]. Additionally,
denote elements of \(\Psi \) drawn from the distribution P. Further,
are uniformly distributed random variables with values in [k]. All of the above random variables are mutually independent as well as independent of any other sources of randomness. These random variables yield random functions that will play an important role: for \(i\ge 1\) we let
3.2 Factor graphs
In Sect. 2 we already introduced the random factor graph model \({\varvec{G}}(n,d,p,P)\). To facilitate the proofs we need the following abstract definition.
Definition 3.1
Suppose that \((\mathscr {X},\mathfrak {A})\) is a measurable space. An \(\mathscr {X}\)-factor graph \(G=(V,F,(\partial a)_{a\in F},(\psi _a)_{a\in F},\mathfrak {p})\) consists of
a finite set V of variable nodes,
a finite set F of constraint nodes,
a set \(\partial a\subset V\) for each \(a\in F\),
a function \(\psi _a:\mathscr {X}^{\partial a}\rightarrow [0,\infty )\) for each \(a\in F\) and
a probability measure \(\mathfrak {p}\) on \(\mathscr {X}^V\), called the prior.
A factor graph induces a bipartite graph on \(V\cup F\), where \(v\in V\) is adjacent to \(a\in F\) iff \(v\in \partial a\). Accordingly, for a variable node v we let \(\partial v\subset F\) be the set of adjacent constraint nodes (i.e., \(a\in \partial v\) iff \(v\in \partial a\)). The bipartite graph defines a metric on \(V\cup F\), the shortest path metric. For a variable or constraint node u we let \(\partial ^\ell u=\partial ^\ell _Gu\) be the set of all nodes at distance precisely \(\ell \) from u. Moreover, \(\nabla ^\ell u=V\cap (u\cup \bigcup _{i\le \ell }\partial ^iu)\) denotes the set of all variable nodes at distance no more than \(\ell \) from u.
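The sets \(\partial ^\ell u\) and \(\nabla ^\ell u\) can be computed by breadth-first search on the bipartite graph; the following sketch does this for a tiny factor graph with hypothetical node names 'v1', 'c1', etc.

```python
from collections import deque

# A small bipartite factor graph; constraint nodes 'c*' list their
# variable nodes 'v*' (all names are illustrative).
edges = {'c1': ('v1', 'v2'), 'c2': ('v2', 'v3')}

def neighbors(u):
    if u in edges:                                       # constraint node
        return list(edges[u])
    return [a for a, vs in edges.items() if u in vs]     # variable node

def boundary(u, ell):
    """partial^ell u: all nodes at distance exactly ell from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        for x in neighbors(w):
            if x not in dist:
                dist[x] = dist[w] + 1
                queue.append(x)
    return {w for w, d in dist.items() if d == ell}

def nabla(u, ell):
    """nabla^ell u: all *variable* nodes at distance at most ell from u."""
    return {w for i in range(ell + 1) for w in boundary(u, i)
            if w.startswith('v')}

assert boundary('v1', 1) == {'c1'}   # constraints at odd distance
assert boundary('v1', 2) == {'v2'}   # variables at even distance
assert nabla('v1', 2) == {'v1', 'v2'}
```

Since the graph is bipartite, variable nodes sit at even distance and constraint nodes at odd distance from any variable node, which the assertions illustrate.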
Further, for an assignment \(\sigma \in \Omega ^V\) and \(a\in F\) we use the notation \(\psi _a(\sigma )=\psi _a(\sigma _{\partial a})\) and we define
Providing that \(Z(G)>0\), we introduce a probability measure \(\mu _G\) on \(\mathscr {X}^V\), the Boltzmann distribution, by letting
Mostly the factor graphs that we deal with will have a finite space \(\mathscr {X}=\Omega \) and the prior \(\mathfrak {p}\) will be the product measure \(\mathfrak {p}=\bigotimes _{v\in V}p_v\). In this case we introduce the standard messages given an event \(S\subset \Omega ^V\) as in Sect. 2: for a constraint node a and \(v\in \partial a\) we let
In the case that \(S=\Omega ^V\) is the entire phase space, we omit the conditioning from the notation and just write \(\mu _{G,v\rightarrow a}\) and \(\mu _{G,a\rightarrow v}\), respectively.
3.3 The cut metric revisited
The cut metric, defined in (2.12), plays a key role in the proofs of the main results. In this section we summarize a few basic facts about the cut metric. Although some have been proved in prior work, we will need to provide a few extensions and adaptations for our purposes. In addition to the continuous version from (2.12), we also need a discrete version of the cut metric, which we present in Sect. 3.3.2.
3.3.1 The continuous cut metric.
Recall that \(\mathscr {K}\) denotes the space of all measurable maps \([0,1]^2\rightarrow \mathscr {P}(\Omega )\), up to equality almost everywhere; we call such maps strong kernels. The cut distance (2.12) induces a pre-metric on this space [39]. Moreover, on the space \(\mathfrak {K}\) obtained by identifying points at cut distance 0 the cut distance yields a metric. The elements of \(\mathfrak {K}\) are called weak kernels. We drop the attribute and just speak of kernels where there is no danger of confusion.
Proposition 3.2
([24]). Endowed with the cut distance, \(\mathfrak {K}\) is a compact Polish space.
We continue to write \(\mathfrak {D}\) for the space of all probability measures on \(\mathfrak {K}\). This space is endowed with the weak topology. Since \(\mathfrak {K}\) is a compact Polish space, so is \(\mathfrak {D}\). Hence, there is a natural metric on \(\mathfrak {D}\) that induces the weak topology, the \(L_1\)-Wasserstein metric. We take license to denote this metric by \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) as well. Thus, recalling that \(\Gamma (\pi ,\pi ')\) is the set of all couplings of \(\pi ,\pi '\in \mathfrak {D}\), we have
For \(\pi \in \mathfrak {D}\) we let \(\varvec{\mu }^\pi \in \mathfrak {K}\) denote a sample. We just write \(\varvec{\mu }\) where \(\pi \) is apparent.
By comparison to other metrics on the space of measurable functions \([0,1]^2\rightarrow \mathscr {P}(\Omega )\) the cut metric is extremely weak; this is highlighted by the compactness of the space \(\mathfrak {K}\) provided by Proposition 3.2. Yet the cut metric is sufficiently strong to ensure that certain functions that will be of vital interest to us are continuous. Indeed, suppose that \(m,n>0\) are integers and that \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\) is a function. Then for \(\mu \in \mathscr {K}\) we define the random variable
Because we average out the \(s_i\) and the \(\varvec{x}_i\) are uniform, the random variables \(\left\langle {{f},{\mu }}\right\rangle \) and \(\left\langle {{f},{\nu }}\right\rangle \) are identically distributed if \(\mathscr {D}_{\Box }(\mu ,\nu )=0\). Thus, we may safely write \(\left\langle {{f},{\mu }}\right\rangle \) for \(\mu \in \mathfrak {K}\).
Lemma 3.3
For any \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\), \(\ell \ge 1\) the map \(\mu \in \mathfrak {K}\mapsto \mathbb {E}\left[{\left\langle {{f},{\mu }}\right\rangle ^\ell }\right]\) is continuous with respect to the cut metric.
The proof of Lemma 3.3 can be found in the appendix. For a probability distribution \(\pi \in \mathfrak {D}\) we let \(\left\langle {{f},{\pi }}\right\rangle \) be the random variable \(\left\langle {{f},{\varvec{\mu }^{\pi }}}\right\rangle \), with \(\varvec{\mu }^{\pi }\) chosen independently of the \(\varvec{x}_i\). Since \(\mathfrak {D}\) carries the weak topology, Lemma 3.3 implies
Corollary 3.4
For any \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\), \(\ell \ge 1\) the map \(\pi \in \mathfrak {D}\mapsto \mathbb {E}\left[{\left\langle {{f},{\pi }}\right\rangle ^\ell }\right]\) is continuous.
We recall the functional \(\mathscr {B}(\,\cdot \,)\) from (2.15).
Corollary 3.5
The map \(\pi \in \mathfrak {D}\mapsto \mathscr {B}(\pi )\) is continuous.
Proof
Thanks to the tail bound (2.1), we can approximate the logarithms in (2.15) by polynomials. Therefore, the assertion follows from Corollary 3.4. \(\quad \square \)
The set \(\mathfrak {D}^\star \) of \(\pi \in \mathfrak {D}\) that are invariant under the \(*(N,M)\)-operation is a closed subset of \(\mathfrak {D}\). To see this, and to interpret the \(*(N,M)\)-operation nicely in terms of operations that are continuous under the cut metric, we introduce the following general transformation. Suppose that \(f:\Omega ^N\rightarrow (0,\infty )\) is a function and that \(\mu \in \mathscr {K}\). Then we define a random \(f*\mu \in \mathscr {K}\) as follows. Letting
we introduce
Now, \(f*\mu _{s,x}=\mu _{\varvec{t},x}\). We emphasize that \(f*\mu \in \mathscr {K}\) is random, dependent on \({\hat{\varvec{x}}}_1,\ldots ,{\hat{\varvec{x}}}_N\). The kernel is characterized by the identity
Further, since the \({\hat{\varvec{x}_i}}\) are uniform, we have \(\mathscr {D}_{\Box }(f*\mu ,f*\nu )=0\) if \(\mathscr {D}_{\Box }(\mu ,\nu )=0\). Hence, the \(*\)-operation extends to weak kernels. Furthermore, for a distribution \(\pi \) we let \(f*\pi \) be the distribution of \(f*\varvec{\mu }^\pi \).
Lemma 3.6
For any function \(f:\Omega ^k\rightarrow (0,\infty )\) the map \(\mathfrak {K}\rightarrow \mathfrak {D}\), \(\mu \mapsto f*\mu \) is continuous.
The proof of Lemma 3.6 can be found in the appendix.
The \(*(N,M)\)-operation is an application of the above \(*\)-operation to a particular random function f. To define this random function, we need one more piece of notation. Namely, for functions \(f:\Omega ^{M\times N}\rightarrow \mathbb {R}\), \(g:\Omega ^{M\times L}\rightarrow \mathbb {R}\) we define
In words, we stick the first N ‘columns’ of \(\sigma \) into f and the last L columns into g and multiply the results. Recalling the random functions \({\hat{\varvec{\psi }_i}}\), \({\hat{\varvec{\varphi }_i}}\) from Sect. 3.1, we obtain the following.
Lemma 3.7
For any \(\mu \in \mathfrak {K}\) the random \(\mu ^{*(N,M)}\in \mathfrak {K}\) is distributed as \(\left( {\bigoplus _{i=1}^N{\hat{\varvec{\varphi }}}_i\oplus \bigoplus _{i=1}^M{\hat{\varvec{\psi }_i}}}\right) *\mu .\)
Proof
This is immediate from the construction of \(\mu ^{*(N,M)}\). \(\quad \square \)
Corollary 3.8
For any N, M the map \(\mu \in \mathfrak {K}\mapsto \mu ^{*(N,M)}\) is continuous with respect to the cut metric.
Proof
Since \(\mathfrak {D}\) carries the weak topology, which is induced by the Wasserstein metric, the assertion follows from Lemmas 3.6–3.7 and (2.1). \(\quad \square \)
As a further immediate consequence of Lemma 3.7 we obtain
Corollary 3.9
A distribution \(\pi \in \mathfrak {D}\) belongs to \(\mathfrak {D}^\star \) if and only if \(\pi =\left( {\bigoplus _{i=1}^N{\hat{\varvec{\varphi }}}_i\oplus \bigoplus _{i=1}^M{\hat{\varvec{\psi }}}_i}\right) *\pi \) for all N, M.
In particular, Corollaries 3.8 and 3.9 imply that the map \(\pi \mapsto \pi ^{*(N,M)}\) is continuous for all \(N,M\ge 0\). Consequently, \(\mathfrak {D}^\star \) is a closed subset of \(\mathfrak {D}\).
3.3.2 The discrete version.
Apart from the ‘continuous’ installment of the cut metric, defined on kernels, we also need a discrete variant, defined on probability measures on discrete sets. To be precise, with \(\Omega \ne \varnothing \) our finite set of spins and V another finite set of size \(n\ge 1\), we define a metric \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) on \(\mathscr {P}(\Omega ^V)\) as follows. Recalling that \(\Gamma (\mu ,\nu )\) is the set of all couplings of probability measures \(\mu ,\nu \) on \(\Omega ^V\), we let
Fact 3.10
([23]) \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) is a metric on \(\mathscr {P}(\Omega ^V)\).
We refer to \(\Delta _{\Box }(\,\cdot \,,\,\cdot \,)\) as the discrete cut metric.
Suppose that V is a finite set. A measure \(\mu \in \mathscr {P}(\Omega ^V)\) can be represented by a point \({\dot{\mu }}\in \mathscr {K}\). Indeed, assume without loss that \(V=[n]\) and that \(\Omega =[q]\). Then the set \(\Omega ^V\) can be ordered lexicographically as \(\sigma ^{(1)},\ldots ,\sigma ^{(q^n)}\). We define \({\dot{\mu }}\in \mathscr {K}\) by letting
Comparing (3.3) with the definition (2.12) of the continuous cut metric, we see that
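The embedding can be sketched concretely. Since the display (3.3) is not reproduced above, the following uses what we take to be the natural step-function construction (an assumption): horizontal strips of height \(\mu (\sigma ^{(i)})\) in lexicographic order, vertical strips of width 1/n, and on each cell the point mass at the corresponding coordinate of \(\sigma ^{(i)}\).

```python
import bisect

def kernel_from_measure(mu, n, q):
    """Represent mu on [q]^n as a step kernel K(s, x) taking values in P([q]).

    Horizontal strips of height mu(sigma) (sigma in lexicographic order),
    vertical strips of width 1/n; on each cell the kernel is the point mass
    at the corresponding coordinate of sigma.  Illustrative form only; the
    paper's (3.3) fixes the exact convention.
    """
    sigmas = sorted(mu)                       # lexicographic order
    cum, acc = [], 0.0
    for s in sigmas:
        acc += mu[s]
        cum.append(acc)

    def K(s, x):
        i = min(bisect.bisect_right(cum, s), len(sigmas) - 1)  # horizontal strip
        j = min(int(x * n), n - 1)                             # which coordinate
        spin = sigmas[i][j]
        return {w: 1.0 if w == spin else 0.0 for w in range(q)}

    return K

# Example: n = 2 coordinates, q = 2 spins, mass split on two assignments.
mu = {(0, 0): 0.5, (1, 1): 0.5}
K = kernel_from_measure(mu, n=2, q=2)
```

Averaging K(s, x) over s recovers the marginal of the coordinate encoded by x, which is the property the comparison with (2.12) exploits.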
The discrete cut metric encodes a great deal of information about the discrete measures. A particularly important case occurs when a measure \(\mu \in \mathscr {P}(\Omega ^V)\) is close to a product measure. To be precise, we say that \(\mu \) is \(\varepsilon \)-extremal if \(\Delta _{\Box }(\mu ,\bigotimes _{v\in V}\mu _v)<\varepsilon \). In words, \(\mu \) is close to the product measure with the same marginals. In addition, \(\mu \in \mathscr {P}(\Omega ^V)\) is \((\varepsilon ,\ell )\)-symmetric if
Informally, if we choose \(\ell \) coordinates randomly, then their joint distribution typically ‘nearly’ factorizes. The following statement shows that these concepts are essentially equivalent, up to a moderate loss in the parameters.
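For \(\ell =2\) the symmetry condition amounts to the joint marginal of two randomly chosen coordinates being close, on average, to the product of the single-coordinate marginals. A brute-force check of this proxy (the exact displayed normalization may differ from the paper's):

```python
import itertools

def marginal(mu, coords):
    """Marginal of mu (a dict over tuples in Omega^V) on the given coordinates."""
    out = {}
    for sigma, w in mu.items():
        key = tuple(sigma[i] for i in coords)
        out[key] = out.get(key, 0.0) + w
    return out

def pairwise_symmetry_defect(mu, n, Omega):
    """Average over coordinate pairs of the TV distance between the joint
    marginal and the product of marginals -- a proxy for (eps,2)-symmetry."""
    total, pairs = 0.0, 0
    for i, j in itertools.combinations(range(n), 2):
        joint = marginal(mu, (i, j))
        mi, mj = marginal(mu, (i,)), marginal(mu, (j,))
        tv = 0.5 * sum(abs(joint.get((a, b), 0.0) - mi[(a,)] * mj[(b,)])
                       for a in Omega for b in Omega)
        total, pairs = total + tv, pairs + 1
    return total / pairs

Omega = (0, 1)
# A product measure has defect 0; a perfectly correlated one does not.
product = {s: 0.25 for s in itertools.product(Omega, repeat=2)}
correlated = {(0, 0): 0.5, (1, 1): 0.5}
```

The correlated measure has defect 1/2, illustrating why pinning is needed to break long-range correlations before a product approximation can hold.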
Proposition 3.11
([23]). For any \(\Omega \) of size \(1<\left| {\Omega }\right| <\infty \), any \(0<\varepsilon <1/2\) and any \(\ell \ge 2\) there exists \(n_0>0\) such that for all \(n>n_0\) and all \(\mu \in \mathscr {P}(\Omega ^V)\) the following two statements hold.
- (i)
If \(\mu \) is \((\varepsilon /9)^3\)-symmetric, then \(\mu \) is \(\varepsilon \)-extremal.
- (ii)
If \(\mu \) is \(\varepsilon ^3/(128|\Omega |)^{4\ell }\)-extremal, then \(\mu \) is \((\varepsilon ,\ell )\)-symmetric.
It is an elementary observation that probability measures that are close in the discrete cut metric cannot have very different marginals. Formally, we have the following.
Lemma 3.12
For any two probability measures \(\mu ,\nu \) on \(\Omega ^V\) we have \(\sum _{v\in V}\Vert {\mu _v-\nu _v}\Vert _{\mathrm {TV}}\le 2|\Omega |\Delta _{\Box }(\mu ,\nu ).\)
Proof
There exists \(\omega \in \Omega \) such that
Let \(I=\left\{ {i\in V:\mu _i(\omega )\ge \nu _i(\omega )}\right\} \) and \(B=\Omega ^V\times \Omega ^V\). Then for any coupling \(\gamma \in \Gamma (\mu ,\nu )\),
Combining (3.6) and (3.7) completes the proof. \(\quad \square \)
The converse bound, that close marginals imply closeness in the cut metric, holds for extremal measures.
Lemma 3.13
For any two \(\varepsilon \)-extremal \(\mu ,\nu \in \mathscr {P}(\Omega ^V)\) we have \(\Delta _{\Box }(\mu ,\nu )\le 2\varepsilon +\sum _{v\in V}\Vert {\mu _v-\nu _v}\Vert _{\mathrm {TV}}.\)
Proof
Assume without loss that \(V=[n]\) and let \({\bar{\mu }}=\bigotimes _{i=1}^n\mu _i\) and \({\bar{\nu }}=\bigotimes _{i=1}^n\nu _i\). Since \(\mu ,\nu \) are \(\varepsilon \)-extremal, we have
Let \(\gamma _i\in \mathscr {P}(\Omega \times \Omega )\) be an optimal coupling of \(\mu _i,\nu _i\), i.e., \(\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}=\sum _{\sigma \ne \tau }\gamma _i(\sigma ,\tau )\). Then \(\gamma =\bigotimes _{i=1}^n\gamma _i\) is a coupling of \({\bar{\mu }},{\bar{\nu }}\). Further, for any \(I\subset [n],B\subset \Omega ^n\times \Omega ^n,\omega \in \Omega \) we have
Hence, \(\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le \sum _{i=1}^n\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}\), and thus the assertion follows from (3.8) and the triangle inequality. \(\quad \square \)
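The optimal couplings \(\gamma _i\) used in this proof can be constructed explicitly on a finite set: place the shared mass \(\min (\mu _i,\nu _i)\) on the diagonal and match the leftover mass arbitrarily. A sketch (illustrative names, not the paper's notation):

```python
def optimal_coupling(mu, nu):
    """A coupling gamma of mu, nu on a finite set achieving
    sum_{s != t} gamma[(s, t)] = TV(mu, nu): put min(mu, nu) on the
    diagonal and match the excess mass greedily (any matching is optimal)."""
    gamma, excess_mu, excess_nu = {}, {}, {}
    for s in mu:
        common = min(mu[s], nu.get(s, 0.0))
        if common > 0:
            gamma[(s, s)] = common
        if mu[s] > common:
            excess_mu[s] = mu[s] - common
        if nu.get(s, 0.0) > common:
            excess_nu[s] = nu[s] - common
    for s, ms in excess_mu.items():
        for t in list(excess_nu):
            move = min(ms, excess_nu[t])
            if move > 0:
                gamma[(s, t)] = gamma.get((s, t), 0.0) + move
                ms -= move
                excess_nu[t] -= move
                if excess_nu[t] <= 1e-15:
                    del excess_nu[t]
            if ms <= 1e-15:
                break
    return gamma

mu = {0: 0.7, 1: 0.3}
nu = {0: 0.4, 1: 0.6}
gamma = optimal_coupling(mu, nu)
off_diag = sum(w for (s, t), w in gamma.items() if s != t)
# off_diag equals TV(mu, nu) = 0.3 for this pair
```

Taking the product \(\gamma =\bigotimes _i\gamma _i\) of such couplings is exactly the step that yields the bound \(\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le \sum _i\Vert {\mu _i-\nu _i}\Vert _{\mathrm {TV}}\).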
We also make a note of the following enhanced triangle inequality.
Lemma 3.14
([23]). Suppose that \(\mu ^{(1)},\nu ^{(1)},\ldots ,\mu ^{(\ell )},\nu ^{(\ell )}\) are probability measures on \(\Omega ^V\) and that \(u_1,\ldots ,u_\ell \ge 0\) are numbers such that \(\sum _{i=1}^\ell u_i=1\). Then
Finally, we come to an important fact, intimately related to the Szemerédi regularity lemma from combinatorics. Namely, any probability distribution \(\mu \in \mathscr {P}(\Omega ^V)\) is close in the cut metric to a mixture of a ‘small’ number of product measures. To state this result precisely, suppose that \(I\subset V\) and that \(\sigma \in \Omega ^I\). Let
be the sub-cube of \(\Omega ^V\) where the entries of the coordinates in I coincide with \(\sigma \). Further, assuming that \(\mu \in \mathscr {P}(\Omega ^V)\) and \(\mu (S^{I,\sigma })>0\), we let
be the corresponding conditional distribution of \(\mu \). (If \(\mu (S^{I,\sigma })=0\), then we agree that \(\mu ^{I,\sigma }\) is the uniform distribution on \(S^{I,\sigma }\).) The following key lemma shows that \(\mu ^{I,\sigma }\) is likely \(\varepsilon \)-symmetric for suitably random \(I,\sigma \).
Lemma 3.15
([19, Lemma 3.5]). For any set \(\Omega \) of size \(1<|\Omega |<\infty \) and any \(\varepsilon >0\) there exist \(n_0>0\) and a random variable \(0<\varvec{\theta }\le 2\varepsilon ^{-4}\log |\Omega |\) such that for all \(n>n_0\) and all \(\mu \in \mathscr {P}(\Omega ^V)\) the following holds. Let \(\varvec{I}\subset V\) be a uniformly random subset of size \(\varvec{\theta }\) and choose \(\varvec{\sigma }\in \Omega ^{\varvec{I}}\) from \(\mu _{\varvec{I}}\). Then \(\mathbb {P}\left[{\mu ^{\varvec{I},\varvec{\sigma }} \text{ is } \varepsilon \text{-symmetric }}\right]>1-\varepsilon .\)
We can apply Lemma 3.15 multiple times to obtain a decomposition of the set \(\Omega ^V\) into sub-cubes \(S_1,\ldots ,S_\ell \) such that \(\mu (\,\cdot \,|S_i)\) is \(\varepsilon \)-symmetric. To obtain these sub-cubes we just choose the set \(\varvec{I}\) randomly as in Lemma 3.15 and let \(\sigma \) range over all \(|\Omega |^{\varvec{\theta }}\) possible assignments of \(\varvec{I}\). We then obtain the following version of the regularity lemma.
Corollary 3.16
([10]). For any finite set \(\Omega \ne \varnothing \) and any \(\varepsilon >0\) there exist \(L,n_0\) such that for all \(n>n_0\) the following is true. For any \(\mu \in \mathscr {P}(\Omega ^V)\) there exists a partition of \(\Omega ^V\) into pairwise disjoint sets \(S_0,\ldots ,S_\ell \), \(\ell \le L\), such that \(\mu (S_0)<\varepsilon \) and such that \(\mu (\,\cdot \,|S_i)\) is \(\varepsilon \)-symmetric for each \(1\le i\le \ell \).
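The procedure behind Corollary 3.16 — pick a random pinning set and split the cube into the sub-cubes \(S^{\varvec{I},\sigma }\) — can be sketched as follows. The specific distribution of \(\varvec{\theta }\) from Lemma 3.15 is not reproduced (a fixed size is used instead), and the symmetry test on each part is omitted.

```python
import itertools, random

def pin_decomposition(mu, n, Omega, theta, rng):
    """Split Omega^[n] into sub-cubes S^{I,sigma} for a random I of size
    theta, returning each conditional measure mu^{I,sigma} with its mass.
    Sketch only: Lemma 3.15 prescribes a specific random theta."""
    I = sorted(rng.sample(range(n), theta))
    parts = {}
    for tau in itertools.product(Omega, repeat=theta):
        cube = {s: w for s, w in mu.items()
                if tuple(s[i] for i in I) == tau}
        mass = sum(cube.values())
        if mass > 0:
            parts[tau] = ({s: w / mass for s, w in cube.items()}, mass)
    return I, parts

rng = random.Random(0)
Omega = (0, 1)
mu = {s: 1 / 8 for s in itertools.product(Omega, repeat=3)}  # uniform on {0,1}^3
I, parts = pin_decomposition(mu, 3, Omega, theta=1, rng=rng)
```

The masses of the sub-cubes sum to 1 and each conditional is normalized, so \(\mu \) is recovered as the corresponding mixture, which is the form the corollary asserts up to the exceptional part \(S_0\).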
3.3.3 Contiguity.
Suppose that \(\Omega \ne \varnothing \) is a finite set and let \(c\ge 1\). A probability distribution \(\nu \) on \(\Omega ^n\) is c-contiguous with respect to another probability distribution \(\mu \) if
Moreover, \(\mu ,\nu \) are mutually c-contiguous if each is c-contiguous with respect to the other.
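Assuming the pointwise form of the definition, i.e. \(\nu (\sigma )\le c\,\mu (\sigma )\) for all \(\sigma \) (the display above is not reproduced here, so this form is an assumption), the smallest admissible c is a simple maximum of ratios:

```python
import math

def contiguity_constant(nu, mu):
    """Smallest c with nu(sigma) <= c * mu(sigma) for all sigma, or math.inf
    if nu charges a point that mu does not.  Assumes the pointwise form of
    c-contiguity; the paper's display fixes the exact definition."""
    c = 0.0
    for sigma, w in nu.items():
        if w == 0:
            continue
        m = mu.get(sigma, 0.0)
        if m == 0:
            return math.inf
        c = max(c, w / m)
    return c

mu = {0: 0.5, 1: 0.5}
nu = {0: 0.75, 1: 0.25}
# nu is 1.5-contiguous w.r.t. mu, and mu is 2-contiguous w.r.t. nu,
# so the pair is mutually 2-contiguous.
```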
Lemma 3.17
For any \(c\ge 1,\delta >0\) there exists \(\varepsilon >0\) such that for all large enough n the following is true. Assume that \(\mu \in \mathscr {P}(\Omega ^n)\) is \(\varepsilon \)-extremal and that \(\nu \) is c-contiguous with respect to \(\mu \). Then \(\nu \) is \(\delta \)-extremal and \(\Delta _{\Box }(\mu ,\nu )<\delta \).
Proof
Choose \(0<\varepsilon \ll \eta \ll \zeta \ll \delta \) and assume that \(n>n_0(\varepsilon )\) is sufficiently large and that \(\mu \) is \(\varepsilon \)-extremal. Applying Corollary 3.16 to the measure \(\nu \), we obtain a partition \(S_0,S_1,\ldots ,S_\ell \) of the cube \(\Omega ^n\) into pairwise disjoint sets such that \(\nu (S_0)<\eta \) and such that \(\nu (\,\cdot \,|S_i)\) is \(\eta \)-symmetric for every \(i=1,\ldots ,\ell \). Moreover, \(\ell \) is bounded by a number \(L(\eta ,\Omega )>0\) that depends on \(\eta \) and \(|\Omega |\) only.
Suppose that for every \(1\le i\le \ell \) with \(\nu (S_i)\ge \eta /\ell \) we have
Then Lemma 3.13 yields \(\Delta _{\Box }(\nu (\,\cdot \,|S_i),\mu )\le 2\eta +\zeta \). Hence, Lemma 3.14 shows that
Further, (3.10) implies that \(\sum _{j=1}^n\Vert {\nu _j-\mu _j}\Vert _{\mathrm {TV}}\le \zeta +2\eta \). Hence, letting \({\bar{\mu }}=\bigotimes _{i=1}^n\mu _i\), \({\bar{\nu }}=\bigotimes _{i=1}^n\nu _i\) and applying Lemma 3.13 a second time, we obtain \(\Delta _{\Box }({\bar{\mu }},{\bar{\nu }})\le \zeta +2\eta \). Thus, invoking the \(\varepsilon \)-extremality of \(\mu \) and (3.11), we conclude that
In summary, if (3.10) is satisfied, then (3.11) and (3.12) yield \(\Delta _{\Box }(\nu ,\mu )<\delta \) and \(\Delta _{\Box }(\nu ,{\bar{\nu }})<\delta \), as claimed.
Thus, we are left to establish (3.10). Assume for contradiction that there is \(1\le i\le \ell \) with \(\nu (S_i)\ge \eta /\ell \) and \(\sum _{j=1}^n\Vert {\nu _j(\,\cdot \,|S_i)-\mu _j}\Vert _{\mathrm {TV}}>\zeta n\). Then there exist \(J\subset [n]\) and \(\omega \in \Omega \) such that \(\sum _{j\in J}{\nu _j(\omega |S_i)-\mu _j(\omega )} >\zeta n/\left( {2|\Omega |}\right) \). In other words, the random variable \(X(\sigma )=\sum _{j\in J}\varvec{1}\{\sigma _j=\omega \}\) satisfies
Due to the \(\eta \)-symmetry of \(\nu (\,\cdot \,|S_i)\) and the \(\varepsilon \)-symmetry of \(\mu \), the second moments work out as
Combining (3.13) and (3.14) with Chebyshev’s inequality and keeping in mind that \(\varepsilon \ll \eta \ll \zeta \), we conclude that the event \(B=\left\{ {X(\varvec{\sigma })\ge \left\langle {{X},{\mu }}\right\rangle +\zeta /(4|\Omega |)}\right\} \) satisfies
However, if \(\nu \) is c-contiguous with respect to \(\mu \), then (3.15) yields
which contradicts the choice of the parameters \(\varepsilon ,\eta \). \(\quad \square \)
Corollary 3.18
For any \(\delta >0\) there exists \(\varepsilon >0\) such that the following is true. Suppose that \(\mu \) is \(\varepsilon \)-extremal and that \(S\subset \Omega ^n\) is an event such that \(\mu (S)\ge \delta \). Then \(\mu (\,\cdot \,|S)\) is \(\delta \)-extremal and \(\Delta _{\Box }(\mu (\,\cdot \,|S),\mu )<\delta \).
Proof
Since \(\mu (\sigma |S)\le \mu (\sigma )/\mu (S)\) for every \(\sigma \), the conditional distribution \(\mu (\,\cdot \,|S)\) is \(1/\varepsilon \)-contiguous with respect to \(\mu \). Thus, the assertion follows from Lemma 3.17 immediately. \(\quad \square \)
4 Bethe State Decompositions
In this section we prove Theorem 2.5. Before getting into the details, why might we expect the statement of the theorem to be correct? In order to construct the desired decomposition of the phase space \(\Omega ^{V_n}\) we could attempt to apply Lemma 3.15. The lemma shows that if we randomly pick a set \(\varvec{I}\subset V_n\) of moderate size, say \(|\varvec{I}|=O(\log \log n)\), we will likely obtain a decomposition \((S^{\varvec{I},\sigma })_{\sigma \in \Omega ^{\varvec{I}}}\) such that (2.10) is satisfied for most of its parts. Formally, Lemma 3.15 guarantees that with high probability,
Proposition 3.11 extends (4.1) from pairwise independence to independence of bounded numbers of randomly chosen variable nodes. Thus, the pinning operation eliminates long-range correlations.
But why should short-range correlations be described by Belief Propagation? Recalling the standard messages from (2.4)–(2.5), Belief Propagation (2.3) asserts that with high probability,
This last formula expresses the notion that once we remove x and its adjacent constraints \(\partial x\), the spins assigned to the variables y at distance two from x in \({\varvec{G}}\) are (essentially) stochastically independent. This is very much in line with the absence of long-range correlations: in \({\varvec{G}}-(x\cup \partial x)\), the variables y are likely far apart from one another as \({\varvec{G}}\) contains only a few short cycles. (For a more detailed discussion of the intuition behind Belief Propagation we refer to [46, Chapter 14].)
Yet (4.2) does not follow from (4.1) directly. Indeed, (4.1) merely states that the joint distribution of two randomly chosen variables of \({\varvec{G}}\) is likely approximately a product. But being second neighbors of x, the y variables in (4.2) are anything but a uniformly random family of variable nodes. For random factor graph models of Erdős-Rényi type, this difficulty is easily overcome [23] because the random factor graph obtained by removing \(x\cup \partial x\) has essentially the same distribution as the original model (of order \(n-1\)). In effect, the original factor graph is distributed nearly the same as the factor graph of order \(n-1\) plus a new variable plus adjacent constraints connected to a uniformly random family of variable nodes, and (4.1) applies to these random attachments. Of course, this trick does not work on the Bethe lattice, where the variables y stand out as they have degree less than d. To cope with this difficulty, we will consider an auxiliary model in which a random number of variable nodes along with their neighborhoods are removed. Moreover, we will apply Lemma 3.15 not merely to the entire original factor graph \({\varvec{G}}\) but also to the variable nodes that have degree \(d-1\) after the removal operation, which will enable us to establish the Belief Propagation equations on the reduced factor graph. Finally, we will use the decorrelation property (4.1) together with the tools from Sect. 3.3 to stitch the factor graph back up, i.e., to get back to the original model with no variables or constraints removed. Let us now carry out this strategy in detail.
4.1 The construction
We aim to show that the Boltzmann distribution \(\mu _{{\varvec{G}}}\) is well approximated by a collection of no more than L Belief Propagation fixed points for an arbitrarily slowly growing \(L=L(n)\). As (4.2) shows, for a given variable node v the corresponding fixed point equations involve the messages sent by the constraints \(a\in \partial v\), which in turn are determined by the messages sent out by the variables w at distance precisely two from v. Thus, to express a single application of the Belief Propagation operator we require information about the variable nodes at distance two from v. Therefore, in addition to the Boltzmann distribution \(\mu _{{\varvec{G}}}\) we will consider an enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) that captures the joint distribution of the second neighborhoods.
To be precise, let \(G=(V,F,(\partial a)_{a\in F},(\psi _a)_{a\in F},p^{\otimes n})\) be a factor graph. Then its Boltzmann distribution \(\mu _G\) ‘lives’ on the space \(\Omega _G=\Omega ^V\). In addition, recalling that \(\nabla ^2_Gv\) consists of all variable nodes at distance at most two from v, consider the space
of second neighborhood assignments, whose elements we denote as \(\tau =(\tau (v,w))_{v\in V,w\in \nabla ^2_Gv}\). The factor graph G induces an embedding
Thus, \(\mu _G\) induces a probability distribution \({\hat{\mu }_G}\) on \({\hat{\Omega }_G}\). For a variable v we denote by \({\hat{\mu }}_{G,v}\in \mathscr {P}(\Omega ^{\nabla ^2_Gv})\) the marginal distribution of \({\hat{\mu }_G}\) on the v-factor of \({\hat{\Omega }_G}\).
The enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) will play a vital role in the construction of the Bethe state decomposition. Indeed, by comparison to the Erdős-Rényi case, the rigid geometry of the random regular graph causes significant difficulties. More precisely, while the Belief Propagation messages are defined in terms of removing one or a few constraints, such operations clearly destroy regularity. Hence, we need to create a bit of wiggling room. To this end, we remove some variable nodes along with their adjacent constraint nodes, thereby leaving a few variable nodes with degree \(d-1\) rather than d. We refer to these variables as ‘cavities’. Clearly, this operation loses some information and would therefore by itself not suffice to prove Theorem 2.5. However, what saves the day is that the enhanced measure \({\hat{\mu }}_{{\varvec{G}}}\) contains the extra information needed to stitch the graph back up without losing track of the Bethe decomposition.
Unsurprisingly, the construction is subtle and involves several steps. It requires a number of carefully chosen parameters. Specifically, given a slowly diverging monotonically increasing positive integer sequence \(L=L(n)\rightarrow \infty \) as in Theorem 2.5, we choose a sequence \(0<\xi =\xi (L)=o(1)\) that tends to zero monotonically sufficiently slowly, a further sequence \(\omega =\omega (\xi )\rightarrow \infty \) that tends to infinity monotonically sufficiently slowly, as well as sequences \(0<\vartheta =\vartheta (\omega )=o(1)\), \(0<\zeta =\zeta (\vartheta )=o(1)\), \(0<\beta =\beta (\vartheta )=o(1)\), \(0<\alpha =\alpha (\beta )=o(1)\), \(0<\eta =\eta (\alpha )=o(1)\) and \(0<\varepsilon =\varepsilon (\eta )=o(1)\) that tend monotonically to zero slowly enough. In summary, the pecking order reads
and we always assume tacitly that \(n>n_0\) is sufficiently large.
We are ready to begin the construction. Let \({\varvec{G}}_*\) be the random factor graph obtained from \({\varvec{G}}\) as follows. Let \(\varvec{\theta }_*\) be a copy of the random variable \(\varvec{\theta }_\xi \) promised by Lemma 3.15; \(\varvec{\theta }_*\) is independent of \({\varvec{G}}\). Further, let \(\varvec{U}_*\) be a random set of \(\varvec{\theta }_*\) variable nodes of \({\varvec{G}}\) and draw \(\varvec{\sigma }_*\) from \(\mu _{{\varvec{G}}}\) independently of \(\varvec{\theta }_*\) and \(\varvec{U}_*\). Now, obtain \({\varvec{G}}_*\) from \({\varvec{G}}\) by changing the prior distribution to
Additionally, let \(\varvec{\omega }\) be a random variable with distribution \(\mathrm{Po}(\omega )\wedge 2\omega \), independent of everything else, and let \(\varvec{W}=\{v_{n-\varvec{\omega }+1},\ldots ,v_n\}\). Finally, obtain \({\varvec{G}}_*'\) from \({\varvec{G}}_*\) by removing the variable nodes in \(\varvec{W}\) along with their adjacent constraint nodes.
Thus, in \({\varvec{G}}_*'\) we pin the spins of the variable nodes in \(\varvec{U}_*\) and their neighbors to the values observed under \(\varvec{\sigma }_*\), which is drawn from \(\mu _{{\varvec{G}}}\). Additionally, we create cavities by removing the last \(\varvec{\omega }\) variable nodes along with their adjacent constraints. The following lemma shows that the removal of the variable nodes in \(\varvec{W}\) does not shift the marginals of the enhanced Boltzmann distribution much.
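In code, the two-step construction amounts to replacing the prior at the pinned variables by point masses and then deleting \(\varvec{W}\) together with its adjacent constraints. The sketch below uses hypothetical names and a plain dictionary representation of the factor graph; the random draws of \(\varvec{\theta }_*\), \(\varvec{\sigma }_*\) and \(\varvec{\omega }\sim \mathrm{Po}(\omega )\wedge 2\omega \) are taken as given inputs.

```python
def pin_and_create_cavities(variables, constraints, sigma_star, U_star, omega):
    """Sketch of the construction of G_*': pin U_star (point-mass priors at
    sigma_star) and remove the last omega variables together with their
    adjacent constraints.  `constraints` maps constraint names to tuples of
    adjacent variables; all names are illustrative."""
    # Step 1: pinning -- new prior is a point mass at sigma_star(v) for v in U_star.
    prior = {v: ({sigma_star[v]: 1.0} if v in U_star else None)  # None = original p_v
             for v in variables}
    # Step 2: cavities -- drop W = the last omega variables and their constraints.
    W = set(variables[-omega:])
    kept_constraints = {a: vs for a, vs in constraints.items()
                        if not W.intersection(vs)}
    kept_variables = [v for v in variables if v not in W]
    return kept_variables, kept_constraints, prior

variables = [f"v{i}" for i in range(6)]
constraints = {"a0": ("v0", "v1"), "a1": ("v1", "v5"), "a2": ("v2", "v3")}
sigma_star = {v: 0 for v in variables}
V2, F2, prior = pin_and_create_cavities(variables, constraints, sigma_star,
                                        U_star={"v0"}, omega=1)
# v5 is removed, so a1 disappears and v1 becomes a cavity: it has lost a constraint.
```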
Lemma 4.1
With probability at least \(1-\omega ^{-10}\) over the choice of \(\varvec{\theta }_*\), \(\varvec{\sigma }_*\) and \({\varvec{G}}\) the following statements are true.
- (i)
both \(\mu _{{\varvec{G}}_*}\) and \({\hat{\mu }}_{{\varvec{G}}_*}\) are \(\xi ^{1/4}\)-extremal.
- (ii)
we have \(\sum _{v\in V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})}\Vert {{\hat{\mu }}_{{\varvec{G}}_*,v}-{\hat{\mu }}_{{\varvec{G}}_*',v}}\Vert _{\mathrm {TV}}<\vartheta n.\)
Proof
By construction, \({\hat{\mu }}_{{\varvec{G}}_*}\) is identical to the measure obtained through the pinning procedure of Lemma 3.15 applied to the \(\varvec{U}_*\)-components of the space \({\hat{\Omega }}_{{\varvec{G}}}\). Hence, Lemma 3.15 and Proposition 3.11 imply that \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal with probability at least \(1-\xi ^{1/4}\). Since \(\mu _{{\varvec{G}}_*}\) is a projection of \({\hat{\mu }}_{{\varvec{G}}_*}\), we obtain (i).
Further, let \(\mathscr {V}=V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})\). If \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal, then by the definition of the cut metric the distribution \({\hat{\mu }}_{{\varvec{G}}_*,\mathscr {V}}\) induced on the neighborhoods of \(\mathscr {V}\) is \(2\xi ^{1/4}\)-extremal, because \(|\mathscr {V}|\ge n/2\). Additionally, there is \(C=C(\omega )\) such that \({\hat{\mu }}_{{\varvec{G}}_*,\mathscr {V}}\) is C-contiguous with respect to \({\hat{\mu }}_{{\varvec{G}}_*',\mathscr {V}}\) with probability at least \(1-\omega ^{-11}\). This follows from (2.1), because \({\varvec{G}}_*'\) is obtained from \({\varvec{G}}_*\) by removing no more than \(d\omega \) constraint nodes. Therefore, (ii) follows from (i) and Lemma 3.17, provided that \(\xi ,\omega ,\vartheta \) are chosen appropriately in accordance with (4.3). \(\quad \square \)
The following proposition, which establishes the Belief Propagation equations on \({\varvec{G}}_*'\), constitutes the main technical step of the proof.
Proposition 4.2
With probability at least \(1-\alpha ^{9}\), \({\varvec{G}}'_*\) enjoys the following properties.
- (i)
the standard messages \((\mu _{{\varvec{G}}'_*,v\rightarrow a},\mu _{{\varvec{G}}'_*,a\rightarrow v})_{v\in V({\varvec{G}}_*'),a\in \partial v}\) form an \(\alpha ^{9d}\)-Belief Propagation fixed point.
- (ii)
we have
$$\begin{aligned} \sum _{v\in V({\varvec{G}}'_*)} \sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| {\mu _{{\varvec{G}}_*'}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*',w\rightarrow a}(\sigma _w)}{\sum _{\chi \in \Omega }p(\chi )\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}:\tau _v=\chi }\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*',w\rightarrow a}(\tau _w)}}\right| <\alpha ^{9d}n. \end{aligned}$$
Before we prove Proposition 4.2 in Sect. 4.2, let us indicate how the theorem follows. As a final preparation we need the following basic fact.
Lemma 4.3
For any factor graph G, for any variable node v, any \(S\subset \partial v\) and any \(\sigma \in \Omega \) we have
Proof
The partition function works out to be
Hence, for any \(\tau \in \Omega ^{V(G)}\),
and the average in the denominator involves variables in \(v\cup \partial ^2v\) only. \(\quad \square \)
Proof of Theorem 2.5
For any assignment \(\chi \in \mathscr {X}=\prod _{v\in \varvec{U}_*}\Omega ^{\nabla ^2v}\) of the variables in \(\varvec{U}_*\) and their neighborhoods let
Then \((S(\chi ))_{\chi \in \mathscr {X}}\) is a decomposition of \({\hat{\Omega }}_{{\varvec{G}}}\) into no more than \(q^{(1+d(k-1))\varvec{\theta }_*}\) sub-cubes, corresponding to the neighborhood assignments of the \(\varvec{\theta }_*\) variable nodes in \(\varvec{U}_*\). As Lemma 3.15 shows, by choosing the functions from (4.3) appropriately we can guarantee that \(q^{(1+d(k-1))\varvec{\theta }_*}\le L\). We are going to show that the decomposition \((S(\chi ))_{\chi }\) meets the requirements of the theorem w.h.p.
For \(\chi \in \mathscr {X}\) let \({\varvec{G}}_*[\chi ]\) be the random factor graph \({\varvec{G}}_*\) given that \(\varvec{\sigma }_{*}(w)=\chi (v,w)\) for all \(v\in \varvec{U}_*\) and all \(w\in \nabla ^2v\). Also let \({\varvec{G}}_*'[\chi ]\) be the factor graph obtained from \({\varvec{G}}_*[\chi ]\) by removing the variables in \(\varvec{W}\) along with their adjacent constraint nodes. Further, let \({{\mathscr {E}}}_\chi \) be the event that the following four conditions are satisfied.
- E1::
Both \(\mu _{{\varvec{G}}_*[\chi ]}\) and \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\) are \(\xi ^{1/8}\)-extremal.
- E2::
We have
$$\begin{aligned} \sum _{v\in V_n{\setminus }(\varvec{W}\cup \partial ^2\varvec{W})}\left\| {{\hat{\mu }}_{{\varvec{G}}_*[\chi ],v} -{\hat{\mu }}_{{\varvec{G}}_*'[\chi ],v}}\right\| _{\mathrm {TV}}<\vartheta n. \end{aligned}$$
(4.5)
- E3::
On \({\varvec{G}}_*'[\chi ]\) the standard messages form an \(\alpha ^{9d}\)-BP fixed point and
$$\begin{aligned} \sum _{v\not \in \varvec{W}\cup \partial ^2\varvec{W}} \sum _{\sigma \in \Omega ^{\nabla ^2v}}\left| {\mu _{{\varvec{G}}_*'[\chi ]}(\sigma ) -\frac{p(\sigma _v)\prod _{a\in \partial v}\psi _a(\sigma )\prod _{w\in \partial a}\mu _{{\varvec{G}}_*'[\chi ],w\rightarrow a}(\sigma _w)}{\sum _{s\in \Omega }p(s)\prod _{a\in \partial v}\sum _{\tau \in \Omega ^{\partial a}:\tau _v=s}\psi _a(\tau ) \prod _{w\in \partial a}\mu _{{\varvec{G}}_*'[\chi ],w\rightarrow a}(\tau _w)}}\right| <\alpha ^{9d}n. \end{aligned}$$
(4.6)
- E4::
There are no more than \(\alpha ^{10d}n\) constraint nodes a in \({\varvec{G}}\) such that \(\min _{\sigma \in \Omega ^k}\psi _a(\sigma )\le \alpha ^{1/4}\), nor are there more than \(\varepsilon ^{20}n\) constraint nodes a such that \(\min _{\sigma \in \Omega ^k}\psi _a(\sigma )\le \varepsilon \).
Then (2.1), Lemma 4.1 and Proposition 4.2 yield \(\mathbb {E}\left[{\sum _{\chi }\mu _{{\varvec{G}}}(\chi )(1-\varvec{1}{{\mathscr {E}}}_\chi )\mid {\mathbf {E4}}}\right]\le \alpha ^8\). Thus, Markov’s inequality shows
Hence, we are left to argue that \(S(\chi )\) is an \(\varepsilon \)-Bethe state of \({\varvec{G}}\) if the event \({{\mathscr {E}}}_\chi \) occurs. As a first step, we are going to show that the standard messages of \({\varvec{G}}_*'[\chi ]\), \({\varvec{G}}_*[\chi ]\) are close: given \({{\mathscr {E}}}_\chi \), we claim
To see this, recall that \(\mu _{{\varvec{G}}_*[\chi ],v\rightarrow a}\) is the marginal of v in the factor graph \({\varvec{G}}_*[\chi ]-a\) and that \(\mu _{{\varvec{G}}_*[\chi ],a\rightarrow v}\) is the marginal of v in the factor graph obtained from \({\varvec{G}}_*[\chi ]\) by removing all \(b\in \partial v{\setminus } a\) and disregarding the prior. Hence, Lemma 4.3 shows that
and analogously for \({\varvec{G}}_*'[\chi ]\). Provided that \(\vartheta \ll \alpha ^d\), we obtain (4.8) from (4.5), (4.9), (4.10) and E2, E4. Further, the estimate (4.8) and the fact that the standard messages of \({\varvec{G}}_*'[\chi ]\) are an \(\alpha ^{9d}\)-BP fixed point imply that the standard messages of \({\varvec{G}}_*[\chi ]\) are an \(\eta \)-BP fixed point. Thus, we have established BS1.
In order to prove BS2, we estimate the derivatives of a term like in (4.6) as follows:
Hence, (4.5), (4.6), (4.8) and E4 and the bound \(|\varvec{W}\cup \partial ^2\varvec{W}|=O(\log n)\) yield
Additionally, we claim that
To see this, suppose that b satisfies \(\min _{\sigma \in \Omega ^k}\psi _b(\sigma )\ge \varepsilon \), that \(b\in \partial v\) for a variable node v such that
and that
All but \(\varepsilon ^{10}n\) constraint nodes b enjoy these properties, by E4 and (4.11) and because the standard messages of \({\varvec{G}}_*[\chi ]\) form an \(\eta \)-BP fixed point. For any such b and any \(\sigma \in \Omega ^{\partial b}\) we obtain
whence (4.12) follows by averaging on b.
Finally, BS2 follows from (4.11), (4.12) and the \(\xi ^{1/8}\)-extremality of \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\). Indeed, let \(\varvec{I},\varvec{J}\) be random sets of at most \(1/\varepsilon \) variable/constraint nodes. For each \(b\in \varvec{J}\) pick a variable node \(v_b\in \partial b\). Because \({\hat{\mu }}_{{\varvec{G}}_*[\chi ]}\) is \(\xi ^{1/8}\)-extremal, Proposition 3.11 yields
Furthermore, (4.11) and (4.12) imply that with probability at least \(1-\varepsilon ^2\) over the choice of \(\varvec{I},\varvec{J}\) we have
If these estimates hold, then for any configuration \(\sigma \in \Omega ^{\varvec{I}\cup \partial \varvec{J}\cup \partial ^2\varvec{I}}\) we obtain
Thus, BS2 follows from (4.16) and (4.17).
Finally, to obtain the Bethe state decomposition of the simple factor graph \({\mathbb {G}}\), we merely recall that \(\mathbb {P}\left[{{\varvec{G}}\in \mathscr {S}}\right]=\Omega (1)\) by Fact 2.2. Hence, the claim about \({\mathbb {G}}\) follows immediately from the statement for \({\varvec{G}}\) and Bayes’ rule. \(\quad \square \)
4.2 Proof of Proposition 4.2
By construction, the random factor graph \({\varvec{G}}_*'\) comprises a pairing of variable clones \((v_i,h)\in V_n\times [d]\) and constraint clones \((a_j,h)\in F_m\times [d]\). But since we obtained \({\varvec{G}}_*'\) from \({\varvec{G}}_*\) by removing some variable nodes \(\varvec{W}\) along with their adjacent constraint nodes, not all of the variable clones \((v_i,h)\) with \(i\le n-\varvec{\omega }\) are paired. We call variables with at least one unpaired clone cavities. Let \(\mathscr {C}\) be the set of all cavities.
The basic idea behind the proof is as follows. We will add a new variable node \(v_+\) along with new adjacent constraint nodes \(b_1,\ldots ,b_d\) to \({\varvec{G}}_*'\). Apart from \(v_+\), these new constraint nodes are adjacent to some of the cavities. The fresh randomness afforded by this construction will facilitate the study of the standard messages from \(v_+\) to the \(b_i\) as well as the reverse messages. Then we will argue that \(v_+\) is essentially indistinguishable from a randomly chosen variable node of \({\varvec{G}}_*'\), thereby extending the analysis to almost all the messages of \({\varvec{G}}_*'\).
Formally, since \(\varvec{\omega }\) is a Poisson variable with mean \(\omega \) truncated at \(2\omega \), w.h.p. we have \(\omega /2\le |\mathscr {C}|\le 2d(k-1)\omega \). Given that \(|\mathscr {C}|\ge d(k-1)\), obtain \({\varvec{G}}_*^-\) from \({\varvec{G}}_*'\) by re-inserting one variable node \(v_+=v_{n-\varvec{\omega }+1}\) along with d new constraint nodes \(b_1,\ldots ,b_d\). For each of these constraint nodes a random clone \((b_i,\varvec{h}_i)\), \(\varvec{h}_i\in [k]\), is paired with a random clone of \(v_+\). In addition, each \(b_i\) is paired randomly with \(k-1\) cavities. The weight functions \(\psi _{b_i}\) are chosen independently from P. The following lemma shows that the distributions of \({\varvec{G}}_*'\) and \({\varvec{G}}_*^-\) are reasonably close.
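The clone-pairing mechanics behind the cavity construction can be illustrated with a short simulation. The function below builds a uniform pairing of constraint clones with variable clones, deletes the last few variables together with their adjacent constraints, and reports the resulting cavities. This is only a schematic rendering (weight functions and the pinning are omitted), and all identifiers are ours.

```python
import random

def cavities_after_removal(n, m, d, k, n_removed, rng):
    """Pair the k*m constraint clones with k*m of the d*n variable
    clones, delete the last n_removed variables along with their
    adjacent constraints, and return the cavities: surviving variables
    with at least one unpaired clone."""
    var_clones = [(i, h) for i in range(n) for h in range(d)]
    rng.shuffle(var_clones)
    con_clones = [(j, h) for j in range(m) for h in range(k)]
    pairing = dict(zip(con_clones, var_clones))   # requires k*m <= d*n
    removed_vars = set(range(n - n_removed, n))
    removed_cons = {j for (j, _), (i, _) in pairing.items()
                    if i in removed_vars}
    still_paired = {vc for cc, vc in pairing.items()
                    if cc[0] not in removed_cons}
    return {i for i in range(n) if i not in removed_vars
            and any((i, h) not in still_paired for h in range(d))}

# toy instance with d*n = k*m, so every clone is paired initially
cavs = cavities_after_removal(n=12, m=9, d=3, k=4, n_removed=2,
                              rng=random.Random(0))
```

Each removed constraint frees at most \(k-1\) clones of surviving variables, so at most \(d\cdot n_{\mathrm {removed}}\cdot (k-1)\) cavities arise, in line with the w.h.p. bound \(|\mathscr {C}|\le 2d(k-1)\omega \) above.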
Lemma 4.4
For any event \({{\mathscr {E}}}\) we have \(\mathbb {P}\left[{{\varvec{G}}_*'\in {{\mathscr {E}}}}\right]\le \alpha ^{-1}\mathbb {P}\left[{{\varvec{G}}_*^-\in {{\mathscr {E}}}}\right]+O(\alpha ^{100}).\)
Proof
We need to get a grip on the conditional distribution of the second neighborhood of \(v_+\) in \({\varvec{G}}\) given \({\varvec{G}}_*'\). This is non-trivial because of the revised prior of \({\varvec{G}}_*\) introduced by the pinning operation (4.4); for the assignment \(\varvec{\sigma }_*\) is correlated with the neighborhood of \(v_+\) in \({\varvec{G}}\). To begin, let \(\mathscr {A}\) be the event that no constraint node of \({\varvec{G}}\) is connected by two edges with the variable nodes \(v_{n-\varvec{\omega }+1},\ldots ,v_n\) and that all cavities have degree precisely \(d-1\). Then (4.3) guarantees that
Further, given \(\mathscr {A}\) the total number of cavities of \({\varvec{G}}_*'\) is equal to \(d(k-1)\varvec{\omega }\), and thus
Let \(\mathscr {A}'\) be the event that \(\mathscr {A}\) occurs, that \(\left| {\mathscr {C}}\right| \ge \omega /2\) and that the weight functions of all constraints adjacent to \(v_+\) take a minimum value of at least \(\alpha ^{1/(2dk)}\). We condition on the event \(\mathscr {A}'\), which occurs with probability \(1-O(\alpha ^{100})\) due to (2.1), (4.18) and (4.19).
Let \(\mathscr {N}_1,\mathscr {N}_2\) be two possible outcomes of the depth-two neighborhoods of \(v_+\) in \({\varvec{G}}\) given \({\varvec{G}}_*'\). Thus, \(\mathscr {N}_1,\mathscr {N}_2\) specify the weight functions of the d constraints adjacent to \(v_+\), the pairing of the clones of \(v_+\) to those of these constraint nodes, and the pairing of these d constraint nodes and the cavities \(\mathscr {C}\). In addition, let \({\varvec{G}}'\) be the random factor graph obtained from \({\varvec{G}}_*'\) by restoring the prior to \(p^{\otimes n}\). Then we can set up a coupling \((\varvec{\Gamma }_1,\varvec{\Gamma }_2)\) of \({\varvec{G}}\) given \({\varvec{G}}',\mathscr {N}_1,\mathscr {A}'\) and of \({\varvec{G}}\) given \({\varvec{G}}',\mathscr {N}_2,\mathscr {A}'\) such that under this coupling the two random factor graphs differ in no more than 2dk edges: the coupling simply switches the pairings occurring in \(\mathscr {N}_1\) but not in \(\mathscr {N}_2\), and vice versa. In effect, on \(\mathscr {A}'\) the Boltzmann distributions \(\mu _{\varvec{\Gamma }_1}, \mu _{\varvec{\Gamma }_2}\) are mutually \(\alpha ^{-1}\)-contiguous. Consequently, since the priors are amended according to samples from these respective Boltzmann distributions, we conclude that for any two outcomes \(\mathscr {N}_1,\mathscr {N}_2\) of the second neighborhood of \(v_+\) and for any possible outcome g of \({\varvec{G}}_*'\),
Combining (4.18)–(4.20), we conclude that for any possible g and for any \(2\omega /3\le \omega _0\le 3\omega /2\),
Finally, the assertion follows from (4.21) because \(d_{\mathrm {TV}}(\varvec{\omega },\varvec{\omega }+1)=O(\omega ^{-1/2})=O(\alpha ^{200})\) and \(\mathbb {P}\left[{2\omega /3\le \varvec{\omega }\le 3\omega /2}\right]=1-\exp (-\Omega (\omega ))=1-O(\alpha ^{200})\). \(\quad \square \)
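The final step rests on the fact that shifting a Poisson variable by one changes its law by \(O(\omega ^{-1/2})\) in total variation. This can be checked numerically from the probability mass function alone; the sketch below is purely illustrative and not part of the proof.

```python
import math

def poisson_shift_tv(lam):
    """d_TV(Po(lam), Po(lam) + 1), computed from the pmf; the tail
    beyond lam + 20*sqrt(lam) is negligible and ignored."""
    cutoff = int(lam + 20 * math.sqrt(lam) + 20)
    pmf = [math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
           for j in range(cutoff)]
    shifted = [0.0] + pmf[:-1]              # pmf of Po(lam) + 1
    return 0.5 * sum(abs(p - q) for p, q in zip(pmf, shifted))
```

Because the Poisson pmf is unimodal, the sum of increments telescopes and the distance is roughly the mode probability, about \(1/\sqrt{2\pi \lambda }\); quadrupling \(\lambda \) therefore halves the distance.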
Lemma 4.4 shows that studying the messages received by and emanating from \(v_+\) is about as good as studying the messages of a random variable node of \({\varvec{G}}_*'\). The randomness involved in the attachment process will help, but is not yet quite sufficient to actually verify the Belief Propagation equations. Namely, we also need to make sure that the Boltzmann distribution of the cavities is extremal in order to argue that typically the joint distribution of the variables where the new constraints \(b_1,\ldots ,b_d\) are anchored factorizes. Unfortunately, we do not know a priori that extremality holds. Indeed, while going from \({\varvec{G}}\) to \({\varvec{G}}_*'\) renders the Boltzmann distribution \(\xi ^{1/4}\)-extremal (by Lemma 4.1), the cavities are too few in number to conclude that their joint distribution is extremal.
Hence, we will apply a second round of pinning. But this time we will pin the cavities directly. To be precise, recalling the random variable \(\varvec{\theta }_{+}=\varvec{\theta }_{\zeta }\) from Lemma 3.15, let \(\mathscr {C}_+\subset \mathscr {C}\) be a random subset of size \(\varvec{\theta }_{+}\wedge |\mathscr {C}|\). Further, draw a sample \(\varvec{\sigma }_{+}\) from \(\mu _{{\varvec{G}}_*'}\). The choice of \(\mathscr {C}_+,\varvec{\sigma }_{+}\) is independent of the choice of the constraints \(b_1,\ldots ,b_d\), and \(\varvec{\sigma }_+\) is independent of \(\mathscr {C}_+\). Now, obtain \({\varvec{G}}_*''\) from \({\varvec{G}}_*'\) by changing the prior to
Thus, we pin the cavities \(y\in \mathscr {C}_+\) to the spins observed under \(\varvec{\sigma }_{+}\), which are independent of \(b_1,\ldots ,b_d\).
Lemma 4.5
The joint distribution \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) of the cavities is \(\zeta \)-symmetric with probability at least \(1-\zeta \).
Proof
Since \(|\mathscr {C}|\ge \omega /2\) with probability \(1-\exp (-\Omega (\omega ))\), the assertion follows immediately from Lemma 3.15 and the construction of \({\varvec{G}}_*''\). \(\quad \square \)
Additionally, obtain \({\varvec{G}}^+_*\) from \({\varvec{G}}^-_*\) by changing the prior as per (4.22) as well, i.e.,
We are ready to verify the Belief Propagation equations for \(v_+\) on \({\varvec{G}}_*^+\).
Lemma 4.6
With probability \(1-O(\alpha ^{90})\) the random factor graph \({\varvec{G}}_*^+\) has the following properties:
Proof
Lemma 4.5 shows that \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \(\zeta \)-symmetric with probability at least \(1-\zeta \). Suppose it is. Then Proposition 3.11 shows that \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \((\beta ,d(k-1))\)-symmetric. We may also assume that \(|\mathscr {C}|\ge \omega /2\), an event that occurs with probability at least \(1-\exp (-\Omega (\omega ))\) by the construction of \({\varvec{G}}_*'\). Additionally, due to (2.1) we may assume that
According to (3.2), the standard message \(\mu _{{\varvec{G}}_*^+,b_1\rightarrow v_+}\) is defined as the marginal of \(v_+\) in the factor graph obtained from \({\varvec{G}}_*^+\) by removing \(b_2,\ldots ,b_d\) and replacing the prior of \(v_+\) by the uniform distribution. By construction, this factor graph is obtained from \({\varvec{G}}_*''\) by adding the variable node \(v_+\) and constraint node \(b_1\) and replacing the prior of \(v_+\) by the uniform distribution. Therefore,
Further, the neighbors \(\partial b_1{\setminus } v_+\) are chosen uniformly from \(\mathscr {C}\) (without replacement). Because \(|\mathscr {C}|\ge \omega /2\) and \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \((\beta ,d(k-1))\)-symmetric, we conclude that
Combining (4.27), (4.28) and (4.29), we obtain the estimate
Moreover, the factor graph \({\varvec{G}}^+_*-b_1\) is obtained from \({\varvec{G}}_*''\) by adding \(v_+\) and \(b_2,\ldots ,b_d\). Hence, (4.27) implies that \(\mu _{{\varvec{G}}^+_*-b_1,\mathscr {C}}\) is \((2/\alpha )^{dk}\)-contiguous with respect to \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\). Since \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) is \(\zeta \)-symmetric, Proposition 3.11 and Lemma 3.17 yield \(\Delta _{\Box }(\mu _{{\varvec{G}}_*^+-b_1,\mathscr {C}},\mu _{{\varvec{G}}_*'',\mathscr {C}})<\beta .\) Because the neighborhood \(\partial b_1\) is random, Lemma 3.12 therefore yields
Combining (4.27), (4.30) and (4.31), we obtain (4.24).
The proofs of (4.25) and (4.26) are similar. Indeed, \(\mu _{{\varvec{G}}_*^+,v_+\rightarrow b_1}\) is the marginal of \(v_+\) in \({\varvec{G}}_*^+-b_1\), which is obtained from \({\varvec{G}}_*''\) by adding \(v_+\) and \(b_2,\ldots ,b_d\). Hence,
Invoking the \((\beta ,d(k-1))\)-symmetry of \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) and (4.27), we obtain
Moreover, reordering the sums and products, we simplify the last expression and find
Further, (4.27) ensures that for each \(i\in [d]\) the distribution \(\mu _{{\varvec{G}}_*^+-b_i,\mathscr {C}}\) is \((2/\alpha )^{dk}\)-contiguous with respect to \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\). Consequently, since the neighbors of \(b_i\) are chosen randomly from the set \(\mathscr {C}\) of cavities, Proposition 3.11 and Lemma 3.17 yield
Combining (4.32) and (4.33), we obtain (4.25).
Moving on to (4.26), we consider \(\sigma \in \Omega ^{\{v_+\}\cup \partial ^2v_+}\). Since \({\varvec{G}}_*^+\) is obtained from \({\varvec{G}}_*''\) by adding \(v_+\) along with \(b_1,\ldots ,b_d\), we have the exact formula
Since \(\partial ^2v_+\) is a random set of cavities, the \((\beta ,d(k-1))\)-symmetry of \(\mu _{{\varvec{G}}_*'',\mathscr {C}}\) and (4.27) ensure that
Finally, to complete the proof we combine (4.33) and (4.34). \(\quad \square \)
We set up the random factor graph \({\varvec{G}}_*^+\) so as to facilitate the verification of the BP equations. But in a sense the model is a bit ‘out of whack’ because the prior is pinned according to a configuration \(\varvec{\sigma }_+\) drawn from \(\mu _{{\varvec{G}}_*'}\) rather than \(\mu _{{\varvec{G}}_*^-}\); see (4.23). Thus, with \(\varvec{\theta }_{+}=\varvec{\theta }_{\zeta }\) and \(\mathscr {C}_+\subset \mathscr {C}\) as before, draw \(\varvec{\sigma }_{++}\) from \(\mu _{{\varvec{G}}_*^-}\) and let \({\varvec{G}}_*^{++}\) be the random factor graph obtained from \({\varvec{G}}_*^-\) by changing the prior to
Hence, \(\varvec{\sigma }_{++}\) takes \(v_+,b_1,\ldots ,b_d\) into account.
Corollary 4.7
With probability \(1-O(\alpha ^{80})\) the bounds (4.24)–(4.26) hold with \({\varvec{G}}_*^+\) replaced by \({\varvec{G}}_*^{++}\).
Proof
The only difference between \({\varvec{G}}_*^{++}\) and \({\varvec{G}}_*^+\) lies in the choice of the configuration to which the variable nodes in \(\mathscr {C}_+\) get pinned. But since \({\varvec{G}}_*^+\) is obtained from \({\varvec{G}}_*''\) by the mere addition of the variable node \(v_+\) and the d constraint nodes \(b_1,\ldots ,b_d\), (2.1) shows that the distribution of \(\varvec{\sigma }_{++}\) is \(\alpha ^{-1}\)-contiguous with respect to the distribution of \(\varvec{\sigma }_{+}\) with probability \(1-O(\alpha ^{80}).\) Thus, the assertion follows from Lemma 4.6. \(\quad \square \)
We are finally ready to go back to the random factor graph \({\varvec{G}}_*''\). Indeed, basically the only difference between \({\varvec{G}}_*^{++}\) and \({\varvec{G}}_*''\) is that the former has one more variable node, along with d adjacent constraint nodes. But since the number of variable nodes of \({\varvec{G}}_*''\) is random, this difference should hardly be noticeable. Moreover, the distribution of \({\varvec{G}}_*''\) is invariant under permutations of its variable nodes. Thus, whatever we can prove for the last variable node \(v_+\) of \({\varvec{G}}_*^{++}\) carries over to a random variable node of \({\varvec{G}}_*''\). The following corollary makes this precise.
Corollary 4.8
With probability \(1-O(\alpha ^{70})\) we have
Proof
Consider the event \({{\mathscr {E}}}\) that in \({\varvec{G}}_*''\), for the variable node \(v_{n-\varvec{\omega }}\) with the largest index the estimate
holds. Since \({\varvec{G}}_*^{++}\) is obtained from \({\varvec{G}}_*''\) by the same process that produces \({\varvec{G}}_*^-\) from \({\varvec{G}}_*'\), Lemma 4.4 and Corollary 4.7 show that \(\mathbb {P}\left[{{\varvec{G}}_*''\in {{\mathscr {E}}}}\right]=1-O(\alpha ^{70})\). But since the distribution of \({\varvec{G}}_*''\) is invariant under permutations of the \(n-O(\omega )\) variable nodes of degree d, we can replace \(v_{n-\varvec{\omega }}\) in (4.38) by a random variable node of degree d. Thus, we obtain (4.35). The two bounds (4.36) and (4.37) follow analogously. \(\quad \square \)
To complete the proof of Proposition 4.2, we finally need to get from \({\varvec{G}}_*''\) back to \({\varvec{G}}_*'\). Thus, we need to undo the additional pinning of the cavities that was required to verify the BP equations (4.35)–(4.37). The elegant insight that makes this possible is that (4.35)–(4.37) really just describe a property of the joint distribution of the second neighborhoods of the variable nodes \(v_i\), \(i\ne n-\varvec{\omega }\). Indeed, by Lemma 4.3 the standard messages, defined via the removal of a few constraints adjacent to a single variable node v, can be expressed easily in terms of the joint distribution of the second neighborhood of v. Furthermore, Lemma 4.1 implies that the enhanced measure \({\hat{\mu }}_{{\varvec{G}}_*'}\) describing the second neighborhood distributions is \(\xi ^{1/4}\)-extremal, with \(\xi \) near the top of the pecking order (4.3). In effect, \({\hat{\mu }}_{{\varvec{G}}_*'}\) is impervious to the additional pinning required to go from \({\varvec{G}}_*'\) to \({\varvec{G}}_*''\). Let us formalize this argument to finish the proof of Proposition 4.2.
Proof of Proposition 4.2
By Lemma 4.1 the measure \({\hat{\mu }}_{{\varvec{G}}_*}\) is \(\xi ^{1/4}\)-extremal with probability \(1-\omega ^{-1}\). Consequently, since \({\varvec{G}}_*'\) is obtained by deleting \(O(\omega )\) constraints, Lemma 3.17 and (2.1) ensure that \({\hat{\mu }}_{{\varvec{G}}_*'}\) is \(\vartheta \)-extremal with probability \(1-\alpha ^{10}\). Furthermore, \({\hat{\mu }}_{{\varvec{G}}_*''}\) is nothing but the conditional distribution \({\hat{\mu }}_{{\varvec{G}}_*'}\) given the event \(S_+\) that the spins of the \(\varvec{\theta }_\zeta \) cavities \(\mathscr {C}_+\) coincide with the ones of the reference configuration \(\varvec{\sigma }_+\). Since \(\varvec{\sigma }_+\) is drawn from \({\hat{\mu }}_{{\varvec{G}}_*'}\), with probability at least \(1-\zeta \) we have
If so, and if \({\hat{\mu }}_{{\varvec{G}}_*'}\) is \(\vartheta \)-extremal, then (4.3) and Corollary 3.18 imply that \(\Delta _{\Box }({\hat{\mu }}_{{\varvec{G}}_*''},{\hat{\mu }}_{{\varvec{G}}_*'})\le \beta \). In summary,
In addition (2.1) ensures that with probability at least \(1-\alpha ^{10}\),
Further, by Corollary 4.8 the bounds (4.35)–(4.37) hold with probability \(1-O(\alpha ^{70})\).
Thus, we are left to prove statements (i) and (ii) under the assumption that \(\Delta _{\Box }({\hat{\mu }}_{{\varvec{G}}_*''},{\hat{\mu }}_{{\varvec{G}}_*'})\le \beta \) and that (4.35)–(4.37) and (4.39) hold. Applying Lemma 3.12, we obtain
Further, Lemma 4.3 shows that the messages \(\mu _{{\varvec{G}}_*'',v\rightarrow a}\), \(\mu _{{\varvec{G}}_*'',a\rightarrow v}\) and \(\mu _{{\varvec{G}}_*',v\rightarrow a}\), \(\mu _{{\varvec{G}}_*',a\rightarrow v}\) can be expressed in terms of the marginal distributions \({\hat{\mu }}_{{\varvec{G}}_*'',v}\) and \({\hat{\mu }}_{{\varvec{G}}_*',v}\) of the depth-two neighborhood. Indeed, according to (3.1)–(3.2), for any \(a\in \partial v\) and \(\sigma \in \Omega \),
and analogously for \({\varvec{G}}_*'\). Hence, the total variation bound (4.40) and (4.39) imply that
Combining (4.37) and (4.39)–(4.41), we obtain assertion (ii). Further, (4.35), (4.39) and (4.41) readily yield
Moreover, combining (4.36), (4.39), (4.40) and (4.42), we obtain
Finally, (4.42) and (4.43) show that the standard messages are an \(O(\alpha ^{40d})\)-BP fixed point.
\(\square \)
5 The Free Energy: Upper Bound
5.1 Outline
In this section we derive the following upper bound on the free energy.
Proposition 5.1
Assume that POS is satisfied. Then
The proof of Proposition 5.1 consists of two parts. First, we will prove that any \(\mu \in \mathscr {K}\) yields an upper bound on \(\mathbb {E}\left[{\log Z({\varvec{G}})}\right]\). Specifically, recalling the notation from Sect. 3.1, let
Then we have the following generic upper bound, which may be of interest in its own right.
Proposition 5.2
Assume that POS is satisfied. Then \(\mathbb {E}\left[{\log Z({\varvec{G}})}\right]\le o(n)+\mathscr {B}'(\mu )-\mathscr {B}''(\mu )\) for any \(\mu \in \mathscr {K}\).
The proof of Proposition 5.2, based on the interpolation method [38], is relatively standard, although the fact that we deal with regular graphs requires a bit of care. The details are carried out in Sect. 5.2. This is the only place where condition POS is required.
The second step toward the proof of Proposition 5.1 is to show that for \(\mu \) drawn from \(\pi \in \mathfrak {D}^\star \) the upper bound from Proposition 5.2 boils down to the expression \(\mathscr {B}(\pi )\).
Proposition 5.3
For any \(\pi \in \mathfrak {D}^\star \) we have \(\mathscr {B}(\pi )=\mathbb {E}[\mathscr {B}'(\varvec{\mu }^\pi )-\mathscr {B}''(\varvec{\mu }^\pi )].\)
We prove Proposition 5.3 in Sect. 5.3.
Proof of Proposition 5.1
The first assertion is immediate from Propositions 5.2 and 5.3. To obtain the second assertion, we apply Azuma’s inequality and (2.1) to see that \(n^{-0.51}\left| {\log Z({\varvec{G}})-\mathbb {E}\log Z({\varvec{G}})}\right| \) converges to zero in probability. Hence, Fact 2.2 and Bayes’ rule show that \(\mathbb {E}\log Z({\mathbb {G}})=\mathbb {E}\log Z({\varvec{G}})+o(n)\), and thus the second assertion follows from the first. \(\quad \square \)
5.2 Proof of Proposition 5.2
We construct a family of random factor graph models parametrized by \(t\in [0,1]\). The free energy of the model at \(t=1\) will be easy to compute, and we will see that it is (nearly) equal to \(\mathscr {B}'(\mu )-\mathscr {B}''(\mu )\). The model with \(t=0\) essentially coincides with \({\varvec{G}}\). Furthermore, we will show that the derivative of the free energy is non-negative for all t, thus obtaining the desired upper bound on \(\mathbb {E}\log Z({\varvec{G}})\).
To construct this interpolating family, fix \(\mu \in \mathscr {K}\) and a small \(\varepsilon >0\). For \(t\in [0,1]\) let
all three mutually independent and independent of everything else. Given \(k\varvec{m}_t+\varvec{m}_t'\le dn\), we define the random factor graph \({\varvec{G}}_t\) as follows.
- INT1::
the set of variable nodes is \(\mathscr {V}=\left\{ {s}\right\} \cup V_n\), and the set of spins is \(\mathscr {X}=\Omega \cup [0,1]\).
- INT2::
the set of constraint nodes is
$$\begin{aligned} \mathscr {F}_{t}=\left\{ {a_1,\ldots ,a_{\varvec{m}_t},a_1',\ldots ,a_{\varvec{m}_t'}',a_1'',\ldots ,a_{\varvec{m}_t''}''}\right\} . \end{aligned}$$
- INT3::
each constraint node \(a_i\) independently chooses a weight function \(\psi _{a_i}\) from P, and the \(a_i\) are joined to the variable nodes \(v_1,\ldots ,v_n\) by a random pairing of \(V_n\times [d]\) and \(\{a_1,\ldots ,a_{\varvec{m}_t}\}\times [k]\).
- INT4::
each of the constraint nodes \(a_i'\), \(i\in [\varvec{m}_t']\), is adjacent to the variable node s and one further variable node from \(v_1,\ldots ,v_n\); the links between the \(a_i'\) and the \(v_j\) are constructed by choosing a random pairing between the \(\varvec{h}_i'\)-clone of each \(a_i'\) and the clones in \(V_n\times [d]\) that are not paired to a constraint node \(a_h\). The weight function associated with \(a_i'\) reads
$$\begin{aligned} \psi _{a_i'}(s,\sigma )&=\sum _{\tau \in \Omega ^k}\varvec{1}\{\tau _{\varvec{h}_i'}=\sigma \}\varvec{\psi }_i'(\tau )\prod _{h\ne \varvec{h}_i'}\mu _{s,\varvec{x}_{i,h}'}(\tau _h). \end{aligned}$$
- INT5::
the constraint nodes \(a_i''\), \(i\in [\varvec{m}_t'']\), are unary, adjacent to s only. Their weight functions read
$$\begin{aligned} \psi _{a_i''}(s)&=\sum _{\tau \in \Omega ^k}\varvec{\psi }_i''(\tau )\prod _{h=1}^k\mu _{s,\varvec{x}_{i,h}''}(\tau _h). \end{aligned}$$
- INT6::
the prior \(\mathfrak {p}\) is a product measure
$$\begin{aligned} {\mathrm d}\mathfrak {p}(\sigma )=\varvec{1}\{\sigma _s\in [0,1],\,\forall 1\le i\le n:\sigma _{v_i}\in \Omega \} \prod _{i=1}^np\left( {\sigma _{v_i}}\right) {\mathrm d}\sigma _s; \end{aligned}$$
thus, for each \(v_i\in V_n\) a spin from \(\Omega \) is chosen independently from p, and \(\sigma _s\) is uniform on [0, 1].
Thus, the total weight, partition function and Boltzmann distribution of \({\varvec{G}}_t\) read
The following lemma establishes the monotonicity of the free energy in t; its proof is the only place where we use condition POS.
Lemma 5.4
Suppose that POS is satisfied. Then uniformly for all \(t\in (0,1)\) we have
Proof
We recall the derivative of the Poisson density: for any \(\lambda >0\), \(\ell \ge 1\),
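That is, \(\frac{\partial }{\partial \lambda }\mathbb {P}\left[ {\mathrm{Po}(\lambda )=\ell }\right] =\mathbb {P}\left[ {\mathrm{Po}(\lambda )=\ell -1}\right] -\mathbb {P}\left[ {\mathrm{Po}(\lambda )=\ell }\right] \), which follows by differentiating \(\mathrm {e}^{-\lambda }\lambda ^\ell /\ell !\). As an illustrative sanity check (our own code, not part of the proof), the identity can be verified against a finite difference:

```python
import math

def po_pmf(lam, l):
    """P[Po(lam) = l], computed via logs for numerical stability."""
    return math.exp(l * math.log(lam) - lam - math.lgamma(l + 1))

def po_pmf_deriv(lam, l):
    """d/dlam P[Po(lam) = l] = P[Po(lam) = l-1] - P[Po(lam) = l]."""
    return (po_pmf(lam, l - 1) if l >= 1 else 0.0) - po_pmf(lam, l)

# independent check by a central finite difference
h = 1e-6
fd = (po_pmf(3.0 + h, 5) - po_pmf(3.0 - h, 5)) / (2 * h)
```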
The variable t affects the distribution of \({\varvec{G}}_t\) by way of the variables \(\varvec{m}_t,\varvec{m}_t',\varvec{m}_t''\). Specifically, let
Recall that \(\varvec{m}_t,\varvec{m}_t'\) are conditional Poisson variables \(\mathrm{Po}(\lambda _t)\) and \(\mathrm{Po}(\lambda _t')\), respectively, given that \(k\varvec{m}_t+\varvec{m}_t'\le dn\). Since \(\varepsilon >0\) is independent of n, (5.2) shows that for any two integers \(m_t,m_t'\ge 1\),
Further, given the event \(k\varvec{m}_t+\varvec{m}_t'\le dn-k\) let \({\varvec{G}}_t'\) be the random factor graph obtained from \({\varvec{G}}_t\) by adding one more constraint node \(a_{\varvec{m}_t+1}\) as per INT3. Similarly, given \(k\varvec{m}_t+\varvec{m}_t'\le dn-1\) obtain \({\varvec{G}}_t''\) from \({\varvec{G}}_t\) by adding \(a_{\varvec{m}_t'+1}'\) according to INT4. Additionally, obtain \({\varvec{G}}_t'''\) from \({\varvec{G}}_t\) by adding a unary constraint node \(a_{\varvec{m}_t''+1}''\) as described in INT5. Since \(\varvec{m}_t''\) is independent of \(\varvec{m}_t,\varvec{m}_t'\), (5.2) and (5.3) yield
Hence, it suffices to prove that for all \(0<t<1\),
By the definition of the Boltzmann distribution (5.1),
Hence,
Further, in terms of the kernel representation \({\dot{\mu }}_{{\varvec{G}}_t}\) of the Boltzmann distribution we obtain
Combining (5.6) and (5.7) yields
Due to (2.1) and Fubini’s theorem, we can exchange the sum and the expectation in (5.8); indeed, (2.1) yields
Thus, (5.8) becomes
Following similar steps, we obtain expansions for the other two terms from (5.5) as well:
Finally, the assertion follows from POS and (5.5), (5.9), (5.10) and (5.11). \(\quad \square \)
Proof of Proposition 5.2
Integrating t from 0 to 1 and applying Lemma 5.4, we obtain for any \(\varepsilon >0\),
Letting
we claim that for a certain number \(c=c(P)>0\),
Indeed, at \(t=0\) the variable node s is adjacent to the constraint nodes \(a_i''\), \(i\in [\varvec{m}_0'']\), only. Hence, \({\varvec{G}}_0\) decomposes into connected components, one of which comprises s and the \(a_i''\). Let \({\varvec{G}}_0''\) be this component, and let \({\varvec{G}}_0'\) be the remainder of \({\varvec{G}}_0\). Then by construction we have \(\mathbb {E}\log Z({\varvec{G}}_0'')=\mathbb {E}[Y]\). Thus, (5.12) yields
Furthermore, \({\varvec{G}}_0'\) consists of the variable nodes \(v_1,\ldots ,v_n\) and the constraint nodes \(a_1,\ldots ,a_{\varvec{m}_0}\), where \(\varvec{m}_0\) is a Poisson variable \(\mathrm{Po}(\exp (-\varepsilon )dn/k)\) conditioned on taking a value of at most dn/k. Thus, we can construct a random factor graph with the same distribution as \({\varvec{G}}\) from \({\varvec{G}}_0'\) by simply adding \(dn/k-\varvec{m}_0\) further random k-ary constraint nodes as per INT3. Since all weight functions \(\psi \in \Psi \) take values in (0, 2), we obtain \(c=c(P)>0\) such that
Combining (5.14) and (5.15), we obtain (5.13).
We further claim that there is a constant \(c'=c'(P)>0\) such that
Indeed, \(\mathscr {B}''(\mu )=\mathbb {E}[Y\mid \varvec{m}_0''=(k-1)dn/k]\). In other words, we can think of \(\mathscr {B}''(\mu )\) as the free energy of \({\varvec{G}}_0''\) given that \(\varvec{m}_0''=(k-1)dn/k\). Thus, obtain \({\varvec{G}}_0'''\) from \({\varvec{G}}_0''\) by adding \((k-1)dn/k-\varvec{m}_0''\) more constraint nodes according to INT5, or by removing some random constraint nodes if \(\varvec{m}_0''>(k-1)dn/k\). Then \(\mathscr {B}''(\mu )=\mathbb {E}\log Z({\varvec{G}}_0''')\). Since \(\varvec{m}_0''\) is a Poisson variable with mean \(\exp (-\varepsilon )(k-1)dn/k\), with probability \(1-\exp (-\Omega (n))\) we do not need to add or remove more than \(2\varepsilon (k-1)dn\) constraint nodes. Together with the Chernoff bound, the tail bound (2.1) therefore implies that (5.16) is satisfied for a certain \(c'=c'(P)\).
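The Chernoff bound invoked here is the standard Poisson tail estimate \(\mathbb {P}\left[ {|\mathrm{Po}(\lambda )-\lambda |>\varepsilon \lambda }\right] \le 2\exp (-\varepsilon ^2\lambda /3)\) for \(0<\varepsilon \le 1\). A direct numerical check of this bound (an illustrative sketch of ours, not part of the proof):

```python
import math

def po_pmf(lam, j):
    """P[Po(lam) = j], computed via logs for numerical stability."""
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))

def poisson_two_sided_tail(lam, eps):
    """P[|Po(lam) - lam| > eps*lam], summing the pmf up to 4*lam;
    the remainder is negligible for the parameters used below."""
    return sum(po_pmf(lam, j) for j in range(int(4 * lam))
               if abs(j - lam) > eps * lam)
```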
By similar arguments, for a certain \(c''=c''(P)\) we have
Indeed, \(\mathscr {B}'(\mu )\) is nothing but the conditional expectation of \(\log Z({\varvec{G}}_1)\) given that \(\varvec{m}_1'=dn\). Hence, if we pad \({\varvec{G}}_1\) by adding the missing \(dn-\varvec{m}_1'\) constraint nodes \(a_i'\) according to INT4, then the total number of constraints added does not exceed \(2\varepsilon dn\) with probability \(1-\exp (-\Omega (n))\). Hence, (5.17) follows from (2.1) and the Chernoff bound.
Finally, combining (5.12)–(5.17), we conclude that
for a certain \(c'''=c'''(P)>0\). Since this is true for any fixed \(\varepsilon >0\), the assertion follows. \(\quad \square \)
5.3 Proof of Proposition 5.3
Following Panchenko [52], who worked with factor graphs of Erdős-Rényi type, we are going to use the invariance property of \(\pi \in \mathfrak {D}^\star \) under the \(*(N,M)\)-operation to simplify \(\mathscr {B}',\mathscr {B}''\) separately.
Lemma 5.5
Suppose that \(\pi \in \mathfrak {D}^\star \). Then
Proof
Let \(\phi =\mathbb {E}\left[{\log \left\langle {{\varvec{\psi }_1},{\pi }}\right\rangle }\right]\) for brevity. We claim that for any integer \(m\ge 0\),
Then (5.18) follows by summing (5.19) over \(0\le m<d(k-1)n/k\).
Thus, we are left to prove (5.19). Since \(\pi \in \mathfrak {D}^\star \), Corollaries 3.4 and 3.9 imply that for any integer \(\ell \ge 1\),
Consequently, for all \(\ell \ge 1\) we have
Further, because the continuous function \(z\in [-1,1]\mapsto |z|\) is a uniform limit of polynomials, (5.20) yields
Therefore, invoking (2.1), we obtain
Hence, by (5.20) and Fubini’s theorem,
which is (5.19). \(\quad \square \)
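The proof uses the Weierstrass-type fact that \(z\mapsto |z|\) is a uniform limit of polynomials on \([-1,1]\). This is easy to witness numerically, e.g. with Chebyshev interpolation (an illustrative choice of ours; any uniformly convergent polynomial scheme would do):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def abs_cheb_error(deg, grid=2001):
    """Sup-norm error on [-1, 1] of the degree-`deg` Chebyshev
    interpolant of z -> |z|, estimated on a fine grid."""
    coef = C.chebinterpolate(np.abs, deg)
    z = np.linspace(-1.0, 1.0, grid)
    return float(np.max(np.abs(C.chebval(z, coef) - np.abs(z))))
```

For \(|z|\) the uniform error decays on the order of \(1/\mathrm{deg}\), so raising the degree drives the error to zero, as the argument requires.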
Lemma 5.6
Suppose that \(\pi \in \mathfrak {D}^\star \). Then \(\mathbb {E}[\mathscr {B}'(\varvec{\mu }^\pi )]=\mathbb {E}\log \left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle .\)
Proof
We use a similar argument as in the proof of Lemma 5.5. This time we set \(\phi =\mathbb {E}\log \left\langle {{\varvec{\varphi }_1},{\pi }}\right\rangle \). It suffices to show that for every \(m\ge 0\),
As in the proof of Lemma 5.5, we use that \(\pi \in \mathfrak {D}^\star \) and apply Corollaries 3.4 and 3.9 to obtain for any \(\ell \ge 1\),
Hence, for any \(\ell \ge 1\),
Further, approximating the absolute value by polynomials, we obtain from (5.22) that
Thus, (5.21) follows from (5.23) and Fubini’s theorem. \(\quad \square \)
Finally, Proposition 5.3 is immediate from Lemmas 5.5 and 5.6.
6 The Free Energy: Lower Bound
6.1 Outline
In this section we prove the following lower bound on the free energy that matches the upper bound from Proposition 5.1. The lower bound does not require the assumption POS.
Proposition 6.1
We have
Theorem 2.7 follows immediately from Propositions 5.1 and 6.1.
The proof of Proposition 6.1 is based on a kind of coupling argument that is colloquially referred to as the ‘Aizenman–Sims–Starr’ scheme. This technique has been applied with great success to random factor graphs of Erdős-Rényi type, where the degree distribution is approximately Poisson [15, 22, 52]. The basic idea is to couple a random factor graph on n variable nodes with one on \(n+1\) variable nodes and to calculate the difference of their free energies very precisely. This coupling is very easy to set up in the Erdős-Rényi case due to the Stein-Chen property of the Poisson distribution.
However, in the case of random regular graphs matters are more intricate. Due to the rigid local structure there is no obvious way of coupling random regular factor graphs with n and \(n+1\) variable nodes. As in Sect. 4, we therefore resort to the idea of creating a bit of wiggle room by carving out a few cavities, in such a way that the free energy does not change significantly. But the details of the construction are delicate.
Let \(n,\omega \) be integers and let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). The protagonist of the proof is the random factor graph \({\varvec{G}}_{n,\omega }\) defined as follows. Let
be independent of \(\varvec{Y}\). Further, set
Then \({\varvec{G}}_{n,\omega }\) has \(N_{n,\omega }\) variable nodes \(v_i\), \(i\in [N_{n,\omega }]\), and \(M_{n,\omega }\) constraint nodes \(a_i\), \(i\in [M_{n,\omega }]\). The weight functions \(\psi _{a_i}\) are chosen independently from P. Furthermore, the variable and constraint nodes are linked through a random (one-to-one) pairing
Since \(kM_{n,\omega }\le dN_{n,\omega }\) by construction, such a pairing exists, but some variable clones may go unpaired. We are going to harness these unpaired ‘cavities’ to set up a coupling of \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\).
To this end, consider a further random factor graph \({\hat{{\varvec{G}}}}_{n,\omega }\) with \(N_{n,\omega }\) variable nodes and \({\hat{M}}_{n,\omega }=M_{n+1,\omega }-d\) constraint nodes. The weight functions are chosen independently from P, and the connections between the constraint and variable nodes are induced by a random pairing
Rather than coupling \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\) directly, we will couple \({\varvec{G}}_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\) as well as \({\varvec{G}}_{n+1,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\).
This construction leads to an approximate formula for the free energy of \({\varvec{G}}_{n,\omega }\) in terms of the kernel representation of the Boltzmann distribution of \({\hat{{\varvec{G}}}}_{n,\omega }\). To be precise, let \({\hat{\mathscr {C}}}\) be the set of variables \(v_i\in V_{N_{n,\omega }}\) with at least one unpaired clone in \({\hat{{\varvec{G}}}}_{n,\omega }\). Consider the random kernel
representing the joint distribution of the cavities \({\hat{\mathscr {C}}}\). Further, let \({\hat{\pi }}_{n,\omega }\in \mathfrak {D}\) be the distribution of \({\hat{\rho }}_{n,\omega }\). To deal with the conditioning on the event \(\mathscr {S}\), we also introduce versions \({\mathbb {G}}_{n,\omega }\), \({\hat{{\mathbb {G}}}}_{n,\omega }\) of the above random factor graphs conditional on \(\mathscr {S}\). Let \({\tilde{\rho }}_{n,\omega }={\dot{\mu }}_{{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{\mathscr {C}}}}\in \mathfrak {K}\) be the kernel representation of the corresponding Boltzmann distribution, and let \({\tilde{\pi }}_{n,\omega }\in \mathfrak {D}\) be the law of \({\tilde{\rho }}_{n,\omega }\). In Sect. 6.2 we will derive the following formula.
Proposition 6.2
For any \(\varepsilon >0\) there exists \(\omega >0\) such that
There are still two gaps to fill toward the proof of Proposition 6.1. First, the estimate of the free energy provided by Proposition 6.2 does not quite match the functional \(\mathscr {B}({\hat{\pi }}_{n,\omega })\). Second, the distribution \({\hat{\pi }}_{n,\omega }\in \mathfrak {D}\) does not generally belong to the subspace \(\mathfrak {D}^\star \). The following proposition deals with the second issue, which holds the key to resolving the first. Recall that the topology of \(\mathfrak {D}\) is induced by the Wasserstein metric \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\). We introduce a relaxed version of \(\mathfrak {D}^\star \) by letting
Since (2.1) and Lemma 3.6 show that the map \(\pi \mapsto \pi ^{*(u,w)}\) is continuous, \(\mathfrak {D}^\star _{\varepsilon ,N,M}\) is a closed subspace of the compact Polish space \(\mathfrak {D}\).
Proposition 6.3
For any \(\varepsilon ,L>0\) there is \(\omega _0>0\) such that for every \(\omega >\omega _0\) for large enough n we have
The proof of Proposition 6.3 can be found in Sect. 6.3. Finally, in Sect. 6.4 we derive Proposition 6.1 from Propositions 6.2 and 6.3.
6.2 Proof of Proposition 6.2
We assume throughout that \(\omega >\omega _0\) for a big enough \(\omega _0=\omega _0(d,P)\) and that n is sufficiently large.
Obtain the random factor graph \({\varvec{G}}'_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) new random constraint nodes \(a_{i}\), \({\hat{M}}_{n,\omega }<i\le M_{n,\omega }\), whose weight functions are drawn from P independently and that are linked with the variable nodes via a random pairing with the cavities \({\hat{\mathscr {C}}}\) of \({\hat{{\varvec{G}}}}_{n,\omega }\).
Further, if \(k{\hat{M}}_{n,\omega }\le dN_{n,\omega }-d(k-1)\), then obtain \({\varvec{G}}''_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding one new variable node \({\hat{v}}=v_{N_{n,\omega }+1}\) along with d random constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_d\) adjacent to \({\hat{v}}\) whose weight functions are drawn independently from P. To be precise, the clones of \({\hat{v}}\) are each paired with a uniformly random clone \({\hat{\varvec{h}}}_i\) of \({\hat{a}}_i\) for \(i=1,\ldots ,d\), and the remaining \(d(k-1)\) clones of the \({\hat{a}}_i\) are paired with randomly chosen cavities of \({\hat{{\varvec{G}}}}_{n,\omega }\). If \(k{\hat{M}}_{n,\omega }> dN_{n,\omega }-d(k-1)\), then obtain \({\varvec{G}}''_{n,\omega }\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by just adding a new isolated variable node \({\hat{v}}\).
Obtain \({\mathbb {G}}_{n,\omega }',{\mathbb {G}}_{n,\omega }''\) analogously from \({\hat{{\mathbb {G}}}}_{n,\omega }\) while conditioning on the event that the outcome is simple. If it is impossible to add the required number of constraint nodes in such a way that the resulting factor graph is simple, then do not add any.
Lemma 6.4
For \(\omega >0\) we have
Proof
Since \(d\ge 3\) and \(k\ge 2\),
Because the left-hand side is an integer, we conclude that \({\hat{M}}_{n,\omega }\le \lfloor dN_{n,\omega }/k\rfloor \). Similarly,
Thus, \(M_{n,\omega }\ge {\hat{M}}_{n,\omega }\). Hence, \({\varvec{G}}_{n,\omega }\) and \({\varvec{G}}'_{n,\omega }\) are identically distributed.
Moving on to the second claim, we consider the event \(\mathscr {A}\) that the last variable node is adjacent to precisely d distinct constraint nodes. Then
while \({\varvec{G}}_{n,\omega }''\in \mathscr {A}\) with certainty. Given \(\mathscr {A}\) and given that \(\varvec{X}+\varvec{Y}\le \sqrt{n}\), say, the subgraph \({\tilde{{\varvec{G}}}}_{n+1,\omega }\) obtained from \({\varvec{G}}_{n+1,\omega }\) by deleting the last variable node \(v_{N_{n+1,\omega }}\) along with its adjacent constraint nodes is distributed precisely as \({\hat{{\varvec{G}}}}_{n,\omega }\), and therefore \({\varvec{G}}''_{n,\omega }\) and \({\varvec{G}}_{n+1,\omega }\) can be coupled identically. Hence,
If, on the other hand, \(\varvec{X}+\varvec{Y}\le \sqrt{n}\) but \(\mathscr {A}\) does not occur, then we can couple \({\varvec{G}}_{n+1,\omega }\) and \({\varvec{G}}''_{n,\omega }\) such that both disagree on at most 2d constraint nodes. Indeed, suppose that \(v_{N_{n+1,\omega }}\) has \({\tilde{d}}<d\) adjacent constraints in \({\varvec{G}}_{n+1,\omega }\). Then the subgraph obtained by removing \(v_{N_{n+1,\omega }}\), its \({\tilde{d}}\) neighbors and another \(d-{\tilde{d}}\) random constraint nodes is distributed precisely as \({\hat{{\varvec{G}}}}_{n,\omega }\). Hence, we can obtain both \({\varvec{G}}_{n+1,\omega }\) and \({\varvec{G}}_{n,\omega }''\) from \({\hat{{\varvec{G}}}}_{n,\omega }\) by adding d (possibly distinct) constraint nodes. Thus, (2.1) ensures that
Furthermore, (2.1) ensures that
Since \(\mathbb {P}\left[{\varvec{X}+\varvec{Y}>\sqrt{n}}\right]=o(n^{-2})\), (6.2)–(6.5) yield the second assertion.
Matters get slightly more complicated once we condition on \(\mathscr {S}\). Since \(\varvec{X}+\varvec{Y}\le \log n\) with probability \(1-O(n^{-k})\), due to (2.1) the event \(\varvec{X}+\varvec{Y}>\log n\) contributes no more than an additive o(1) to the difference of the free energies. Hence, we may condition on \(\varvec{X}+\varvec{Y}\le \log n\). Let \(\varvec{d}'\) be the vector comprising the variable degrees in \({\hat{{\mathbb {G}}}}_{n,\omega }\). Let \(\mathscr {D}\) be the set of all such sequences with entries either d or \(d-1\). A standard moment calculation shows that given any possible \(\varvec{d}'\), the event \({\hat{{\varvec{G}}}}_{n,\omega }\in \mathscr {S}\) has probability \((1+o(1))\exp \left[{-(d-1)(k-1)/2-\varvec{1}\{k=2\}(d-1)^2/4}\right]\) (cf. Fact 2.2). Therefore, with \({\tilde{O}}(\,\cdot \,)\) hiding poly-logarithmic terms,
Similarly, let \(\varvec{d}\) comprise the variable degrees of the factor graph \({\mathbb {G}}_{n,\omega }^-\) obtained from \({\mathbb {G}}_{n,\omega }\) by deleting the last d constraint nodes. Then
Additionally, let \({{\mathscr {E}}}\) be the set of all factor graphs that have a constraint node that is adjacent to variable nodes of degree less than d only. Then
Further, on the event \(\mathscr {D}{\setminus }{{\mathscr {E}}}\) we can couple \({\mathbb {G}}_{n,\omega }'\) and \({\mathbb {G}}_{n,\omega }\) identically, because every way of adding the missing constraint nodes to \({\hat{{\mathbb {G}}}}_{n,\omega }\) yields a simple factor graph. Hence,
But (6.9) does not yet suffice to prove (6.1) because outside the event \(\mathscr {D}{\setminus }{{\mathscr {E}}}\) the free energies of the two factor graphs may differ by \(\Omega (n)\). Hence, we also need to consider the event \(\mathscr {D}'\) that \(\varvec{d}\) has a single \(d-2\) entry; this suffices because
and thus the contribution of the complement of \(\mathscr {D}\cup \mathscr {D}'\) to the free energy difference is o(1) due to (2.1). Considering the event \(\mathscr {D}'\) is indeed necessary because \(\mathbb {P}[{{\hat{\varvec{d}}}}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n]>\mathbb {P}[\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n]\). Indeed, while \({\hat{{\mathbb {G}}}}_{n,\omega }\) is just a uniformly random simple factor graph with \(N_{n,\omega }\) variable and \({\hat{M}}_{n,\omega }\) constraint nodes, \({\mathbb {G}}_{n,\omega }^-\) has a tilted distribution, with each possible simple graph being weighted according to the number of extensions into a simple graph with \(M_{n,\omega }\) constraints. In effect, since variable nodes of degree less than \(d-1\) leave us with fewer extensions, the event \(\mathscr {D}'\) is less likely in \({\mathbb {G}}_{n,\omega }^-\). Yet because \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) is bounded, on the event \(\mathscr {D}'\) we can couple \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) such that both differ only in a bounded number of constraint nodes. As a consequence,
Additionally, we claim that \({\mathbb {G}}_{n,\omega }\) given \(\varvec{d}\in \mathscr {D}\) and \({\mathbb {G}}'_{n,\omega }\) given \({{\hat{\varvec{d}}}}\in \mathscr {D}'\) can also be coupled such that with probability \(1-{\tilde{O}}(1/n)\) they differ in only \({\tilde{O}}(1)\) constraint nodes and that, in effect,
To see this, let \(u_1,\ldots ,u_\ell \) be the variable nodes of degree less than d in \({\mathbb {G}}_{n,\omega }^-\); suppose, indeed, that all of them have degree \(d-1\). Similarly, let \(u_1',\ldots ,u_{\ell -1}'\) be the cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\), all of degree \(d-1\) except for \(u_{\ell -1}'\), which has degree \(d-2\). Pick a further variable node \(u_\ell '\) of degree d randomly. Then with probability \(1-{\tilde{O}}(n^{-1})\) the second neighborhoods \(\partial ^2\{u_1,\ldots ,u_\ell \}\), \(\partial ^2\{u_1',\ldots ,u_\ell '\}\) both have size \(\ell (k-1)(d-1)\). Consequently, the subgraphs of \({\mathbb {G}}_{n,\omega }^-\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\) obtained by removing \(u_1,\ldots ,u_\ell \) and \(u_1',\ldots ,u_\ell '\) along with their neighbors, respectively, can be coupled such that both coincide with probability \(1-\tilde{O}(1/n)\). Thus, \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) can be coupled such that the expected number of constraint nodes on which the two factor graphs differ is \({\tilde{O}}(1)\), whence we obtain (6.12).
To deal with the event \({{\mathscr {E}}}\), we may assume that \(k=2\) due to (6.6). Furthermore, because of (6.8) and as
we may assume that \(\varvec{d},\varvec{d}'\in \mathscr {D}\). Since
because the event \({{\mathscr {E}}}\) precludes certain extensions into a simple factor graph with \(M_{n,\omega }\) constraints, we just need to consider the case that \({\hat{{\mathbb {G}}}}_{n,\omega }\in {{\mathscr {E}}}\) and \({\mathbb {G}}_{n,\omega }^-\not \in {{\mathscr {E}}}\) given that \(\varvec{d},\varvec{d}'\in \mathscr {D}\). Let \(u_1,\ldots ,u_\ell \) and \(u_1',\ldots ,u_{\ell }'\) be the cavities of \({\mathbb {G}}_{n,\omega }^-\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\), respectively. Pick one further constraint node b of \({\hat{{\mathbb {G}}}}_{n,\omega }\). Then with probability \(1-{\tilde{O}}(1/n)\) the set \(\partial ^2\{u_1,\ldots ,u_\ell \}\) has size \(\ell (d-1)\), and all variable nodes in this set have pairwise distance at least four. The same is true of the set \(\partial ^2\{u_1',\ldots ,u_\ell '\}\cup \partial b\) with probability \(1-{\tilde{O}}(1/n)\). If these two events occur, then \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) can be coupled such that they only differ on the \({\tilde{O}}(1)\) constraint nodes that are adjacent to \(u_1,\ldots ,u_\ell \) and \(u_1',\ldots ,u_\ell '\) and b. Hence, we obtain a coupling such that \({\mathbb {G}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }'\) only differ on \({\tilde{O}}(1/n)\) variable nodes in expectation, and thus
Moreover, because given \({{\mathscr {E}}}\) there is precisely one constraint involving variables of degree \(d-1\) only with probability \(1-\tilde{O}(1/n)\), we obtain
Combining (6.15)–(6.16), we obtain the left bound stated in (6.1).
We proceed similarly to derive the right bound in (6.1). Indeed, in this case we do not need to consider the event \({{\mathscr {E}}}\) separately, because all additional constraint nodes are connected with a variable node that does not belong to \({\hat{{\mathbb {G}}}}_{n,\omega }\) or \({\mathbb {G}}_{n,\omega }^-\), respectively. Hence, on the event \(\mathscr {D}\) we can couple \({\mathbb {G}}_{n,\omega }''\) and \({\mathbb {G}}_{n+1,\omega }\) identically, and thus
In effect, due to (6.10) we just need to construct a coupling in the event that \(\varvec{d}'\in \mathscr {D}'\) and \(\varvec{d}\in \mathscr {D}\). To this end, we proceed as above by coupling \({\hat{{\mathbb {G}}}}_{n,\omega }\), \({\mathbb {G}}_{n,\omega }^-\) given the second neighborhoods of the cavities such that both only differ in an expected \({\tilde{O}}(1/n)\) constraint nodes. Since \(\mathbb {P}\left[{\varvec{d}'\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right],\mathbb {P}\left[{\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]={\tilde{O}}(1/n)\) and \(\mathbb {P}\left[{\varvec{d}\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]\ge \mathbb {P}\left[{\varvec{d}'\in \mathscr {D}'\mid \varvec{X}+\varvec{Y}\le \log n}\right]\), the second part of (6.1) follows from (6.17). \(\quad \square \)
We are ready to compare the free energies \(\log Z({\varvec{G}}'')\), \(\log Z({\hat{{\varvec{G}}}})\) and \(\log Z({\varvec{G}}')\), \(\log Z({\hat{{\varvec{G}}}})\), as well as those of the corresponding simple graphs. We will carry the proofs out for the case of the simple random factor graph \({\hat{{\mathbb {G}}}}\); the other case is simply obtained by skipping any deliberations pertinent to the event \(\mathscr {S}\).
Lemma 6.5
We have
Proof
Let \(\mathscr {A}\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega }\) has at least \(\omega /2\) cavities, that all variable nodes have degree either d or \(d-1\) and that no two variable nodes of degree \(d-1\) are adjacent to the same constraint node. Then \(\mathbb {P}\left[{\mathscr {A}}\right]=1-\exp (-\Omega _\omega (\omega ))\). Hence, (2.1) ensures that
Moreover, on \(\mathscr {A}\) the random factor graph \({\mathbb {G}}_{n,\omega }''\) is obtained from \({\hat{{\mathbb {G}}}}_{n,\omega }\) by adding one variable node \({\hat{v}}\) along with d constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_d\), whose weight functions are drawn from P independently. Further, on the event \(\mathscr {A}\) all neighbors of the \({\hat{a}}_i\) except \({\hat{v}}\) belong to the set \({\hat{\mathscr {C}}}\) of cavities. Therefore, we have the exact formula
To proceed, let \((\varvec{v}_{i,j})_{i,j}\) be a sequence of uniformly and independently chosen cavities \(\varvec{v}_{i,j}\in {\hat{\mathscr {C}}}\). We claim that on the event \(\mathscr {A}\),
Indeed, the only difference between (6.19) and (6.20) is that in the former the neighbors \(\partial {\hat{a}}_i{\setminus }{\hat{v}}\) are chosen from \({\hat{\mathscr {C}}}\) without replacement, whereas the \(\varvec{v}_{i,j}\) are chosen independently, i.e., with replacement. But since we choose a mere dk cavities \((\varvec{v}_{i,j})_{i\in [d], j\in [k]}\) out of a total of at least \(\omega /2\), the probability of hitting the same cavity twice is \(o_\omega (1)\), and thus (6.20) follows from (2.1). Further, unravelling the definitions of \({\tilde{\rho }}_{n,\omega }\) and \(\varvec{\varphi }_1\), we see that
Finally, the assertion follows from (6.18)–(6.21) by taking the expectation on \({\hat{{\mathbb {G}}}}_{n,\omega }\). \(\square \)
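The with-replacement approximation in the proof above is a birthday-problem estimate: drawing a bounded number of cavities with replacement from a pool of at least \(\omega /2\) produces a collision with probability \(o_\omega (1)\). A small exact computation (illustrative; the naming is ours) makes this quantitative.

```python
def collision_prob(draws, pool):
    """Exact probability that `draws` uniform, independent choices from a
    pool of size `pool` are NOT all distinct (the birthday bound)."""
    p_distinct = 1.0
    for t in range(draws):
        p_distinct *= (pool - t) / pool
    return 1.0 - p_distinct

# With the number of draws fixed (here 6, standing in for d*k) and the pool
# size omega/2 growing, the collision probability vanishes, so sampling with
# and without replacement agree up to o_omega(1).
probs = [collision_prob(draws=6, pool=pool) for pool in (10, 100, 1000, 10000)]
```

The union bound gives `collision_prob(m, C)` at most \(\binom{m}{2}/C\), which is the form of the estimate used in the text.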
Lemma 6.6
We have
Proof
The proof is similar in spirit to the previous one. Once more we consider the event \(\mathscr {A}\) that \({\hat{{\mathbb {G}}}}_{n,\omega }\) has at least \(\omega /2\) cavities, that all variable nodes have degree either d or \(d-1\) and that no two variable nodes of degree \(d-1\) are adjacent to a common constraint node. Then \(\mathbb {P}\left[{\mathscr {A}}\right]=1-\exp (-\Omega _\omega (\omega ))\) and
Moreover, we have the pointwise exact formula
With \((\varvec{v}_{i,j})_{i,j}\) a sequence of independently chosen cavities \(\varvec{v}_{i,j}\in {\hat{\mathscr {C}}}\), we claim that on \(\mathscr {A}\),
Indeed, the only difference is that in (6.24) the \(\varvec{v}_{i,j}\) are chosen independently, whereas in (6.23) the neighbors of the \(a_i\) are chosen without replacement. But since \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) is bounded while there are at least \(\omega /2\) cavities, the two terms coincide up to \(o_\omega (1)\). Finally, the construction of \({\tilde{\rho }}_{n,\omega }\) ensures that
and thus the assertion follows from (6.22)–(6.25) by taking the expectation. \(\quad \square \)
Finally, Proposition 6.2 is an immediate consequence of Lemmas 6.4, 6.5 and 6.6.
6.3 Proof of Proposition 6.3
Once more we will carry the proof out for the simple random factor graph, which is the (slightly) more intricate case; the unconditional case follows by skipping any considerations pertaining to the conditioning. The basic idea behind the proof of Proposition 6.3 is quite simple. With probability \(1-o_\omega (1)\) the random graph \({\hat{{\mathbb {G}}}}_{n,\omega }\) consists of \(N_{n,\omega }\) variable and \({\hat{M}}_{n,\omega }\) constraint nodes and we have \(N_{n,\omega }=n-\varvec{X}\) and
with independent \(\mathrm{Po}(\omega )\) variables \(\varvec{X},\varvec{Y}\). Fix two integers \(\ell ,\ell '\ge 0\). Given that \(\varvec{X}\ge \ell \) and \(k{\hat{M}}_{n,\omega }\le dN_{n,\omega }-k\ell '-d(k-1)\ell \), let \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) be the random factor graph obtained from \({\hat{{\mathbb {G}}}}_{n,\omega }\) by adding
\(\ell \) more variable nodes \({\hat{v}}_1=v_{N_{n,\omega }+1},\ldots ,{\hat{v}}_\ell =v_{N_{n,\omega }+\ell }\) along with \(d\ell \) new constraint nodes \({\hat{a}}_{i,j}\), \(i\in [\ell ]\), \(j\in [d]\), each with a weight function chosen from P independently; connect a random clone \({\hat{\varvec{h}}}_{i,j}\) of each \({\hat{a}}_{i,j}\) with a random clone of \({\hat{v}}_i\) and pair the other \(k-1\) clones of \({\hat{a}}_{i,j}\) with random cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) left pending by the previous additions.
\(\ell '\) more constraint nodes \({\hat{a}}_1,\ldots ,{\hat{a}}_{\ell '}\), each endowed with a weight function chosen from P independently and each connected with k random cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) left vacant by the previous operations.
The resulting random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is not necessarily simple. Yet the key insight behind Proposition 6.3 is that for any \(\ell ,\ell '\) the distribution of \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is close to that of the original graph \({\hat{{\mathbb {G}}}}_{n,\omega }\), provided that \(\omega \) is big enough. Moreover, the perturbation of the Boltzmann distribution that ensues upon going from \({\hat{{\mathbb {G}}}}_{n,\omega }\) to \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) is close to the perturbation induced by the \(*(\ell ,\ell ')\)-operation. We introduce similar notation \({\hat{{\varvec{G}}}}_{n,\omega }[\ell ,\ell ']\) for the random graph without the conditioning on \(\mathscr {S}\).
To formalize this idea, we first compare the distributions of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\). For integers x, y we denote by \({\hat{{\mathbb {G}}}}_{n,\omega ,x,y}\) the random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega }\) conditioned on the event that \(\varvec{X}=x\) and \(\varvec{Y}=y\).
Lemma 6.7
For any \(\ell ,\ell '\ge 0\) we have \(d_{\mathrm {TV}}\left( {{\hat{{\mathbb {G}}}}_{n,\omega },{\hat{{\mathbb {G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]}\right) =o_\omega (1)\) and analogously \(d_{\mathrm {TV}}\left( {{\hat{{\varvec{G}}}}_{n,\omega },{\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]}\right) =o_\omega (1)\).
Proof
The event \(\mathscr {A}=\{\omega /2\le \varvec{X}\le 2\omega ,\, \omega /2\le \varvec{Y}\le 2\omega \}\) has probability \(1-o_\omega (1)\). Further, because \(\varvec{X},\varvec{Y}\) are independent Poisson variables with a large mean \(\omega \) while \(\ell ,\ell '\) are fixed, the total variation distance of the pairs \((\varvec{X},\varvec{Y})\) and \((\varvec{X}-\ell ,\varvec{Y}-\ell ')\) is of order \(O_\omega (\omega ^{-1/2})\). Hence, given \(\mathscr {A}\) the total variation distance of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\) is \(o_\omega (1)\); in symbols,
Further, let \(\mathscr {A}'\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega }\) enjoys the following additional properties.
- (i’)
The last \(\ell \) variable nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) satisfy \(\left| {\partial ^2\{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}{\setminus }{\hat{\mathscr {C}}}}\right| =\ell d(k-1).\) Hence, there are \(\ell d(k-1)\) distinct second neighbors, none of which is a cavity.
- (ii’)
The last \(\ell '\) constraint nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) satisfy \(\left| {\partial \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}{\setminus }{\hat{\mathscr {C}}}}\right| =k\ell '\). Hence, there are \(k\ell ' \) distinct second neighbors, none of them a cavity.
- (iii’)
We have \(\partial \{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cap \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}=\varnothing \).
- (iv’)
Let
$$\begin{aligned} \mathscr {U}={\hat{\mathscr {C}}}\cup \partial ^2\{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cup \partial \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}. \end{aligned}$$Then for any constraint node \(a\not \in \partial \{v_{N_{n,\omega }-\ell +1},\ldots ,v_{N_{n,\omega }}\}\cup \{a_{{\hat{M}}_{n,\omega }-\ell '+1},\ldots ,a_{{\hat{M}}_{n,\omega }}\}\) we have \(|\partial a\cap \mathscr {U}|\le 1\). Thus, only the constraint nodes adjacent to the last \(\ell \) variable nodes or the \(a_i\) with \(i>{\hat{M}}_{n,\omega }-\ell '\) may be adjacent to more than one variable node in \(\mathscr {U}\).
- (v’)
All variable nodes \(u\in \mathscr {U}\) have degree d or \(d-1\).
Additionally, let \(\mathscr {A}''\) be the event that \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}\) has the following properties.
- (i”)
All variable nodes have degree either d or \(d-1\).
- (ii”)
No two variable nodes of degree \(d-1\) are adjacent to the same constraint node.
Then
Furthermore, given \(\mathscr {A}''\cap \mathscr {A}\), the random factor graph \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}[\ell ,\ell ']\) obtained by attaching \(\ell \) new variable nodes and \(\ell '\) new constraint nodes is distributed precisely as \({\hat{{\mathbb {G}}}}_{n,\omega }\) given \(\mathscr {A}'\cap \mathscr {A}\). Indeed, the construction of the enhanced factor graph \({\hat{{\mathbb {G}}}}_{n,\omega ,\varvec{X}-\ell ,\varvec{Y}-\ell '}[\ell ,\ell ']\) expressly ensures that (i’)–(iii’) are satisfied, and (iv’)–(v’) follow from (i”)–(ii”). Hence, (6.27) yields
Finally, since \(\mathbb {P}\left[{\mathscr {A}}\right]=1-o_\omega (1)\), the assertion follows from (6.26) and (6.28). \(\quad \square \)
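The total variation estimate for the shifted Poisson variables used in the proof above can be checked numerically: the distance between \(\mathrm{Po}(\omega )\) and \(\ell +\mathrm{Po}(\omega )\) is of order \(\omega ^{-1/2}\). The following sketch (our own naming; the summation is truncated at a cutoff far in the tail) computes the distance by direct summation of the probability mass functions.

```python
import math

def pois_pmf(lam, j):
    """P[Po(lam) = j], computed on the log scale to avoid overflow."""
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))

def tv_poisson_shift(lam, ell):
    """Total variation distance between Po(lam) and ell + Po(lam)."""
    cutoff = int(lam + 40 * math.sqrt(lam) + ell + 40)  # truncates a tiny tail
    return 0.5 * sum(
        abs(pois_pmf(lam, j) - (pois_pmf(lam, j - ell) if j >= ell else 0.0))
        for j in range(cutoff)
    )

tv_small, tv_large = tv_poisson_shift(25.0, 1), tv_poisson_shift(400.0, 1)
```

For a unit shift the distance is roughly the modal probability \(\approx (2\pi \lambda )^{-1/2}\), which is consistent with the \(O_\omega (\omega ^{-1/2})\) bound invoked in the proof.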
Let \({\hat{\mathscr {C}}}[\ell ,\ell ']\) be the set of cavities of \({\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right]\) and let \({\hat{\rho }}_{n,\omega }[\ell ,\ell ']\in \mathfrak {K}\) be the kernel representing \(\mu _{{\hat{{\varvec{G}}}}_{n,\omega }\left[{\ell ,\ell '}\right],{\hat{\mathscr {C}}}[\ell ,\ell ']}\). Let \({\hat{\pi }}_{n,\omega }\left[{\ell ,\ell '}\right]\in \mathfrak {D}\) be the distribution of \({\hat{\rho }}_{n,\omega }[\ell ,\ell ']\). Define \({\tilde{\mathscr {C}}}[\ell ,\ell ']\), \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) analogously for \({\hat{{\mathbb {G}}}}_{n,\omega }\).
Lemma 6.8
For any \(\ell ,\ell '\ge 0\) we have
Proof
The event \(\mathscr {A}_1=\left\{ {\omega /2\le \varvec{X},\varvec{Y}\le 2\omega }\right\} \) occurs with probability \(1-o_\omega (1)\). So does the event \(\mathscr {A}_2\) that all variable nodes of \({\hat{{\mathbb {G}}}}_{n,\omega }\) have degree either d or \(d-1\), and thus the same is true of \(\mathscr {A}=\mathscr {A}_1\cap \mathscr {A}_2\). Moreover, the construction of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\) is such that on the event \(\mathscr {A}\) we have the exact formula
Consequently, the joint distribution of the cavities \({\tilde{\mathscr {C}}}[\ell ,\ell ']\) of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\) reads
Thus, with probability \(1-o_\omega (1)\), namely on the event \(\mathscr {A}\), \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) is just the kernel representing the right-hand side of (6.29). We claim that in this case \({\tilde{\rho }}_{n,\omega }[\ell ,\ell ']\) and \({\tilde{\rho }}_{n,\omega }^{*(\ell ,\ell ')}\) can be coupled to coincide with probability \(1-o_\omega (1)\). Indeed, the weight functions associated with the \({\hat{a}}_i\) and the \({\hat{a}}_{i,j}\) are chosen from P independently, and they are connected to the cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) by a random pairing. By comparison, we construct \({\tilde{\rho }}_{n,\omega }^{*(\ell ,\ell ')}\) by adjoining \(\varvec{\varphi }_1,\ldots ,\varvec{\varphi }_\ell \) and \(\varvec{\psi }_1,\ldots ,\varvec{\psi }_{\ell '}\) that evaluate the kernel \({\tilde{\rho }}_{n,\omega }\) at independent uniformly random points of the unit interval. Combinatorially, this is equivalent to attaching the new variable and constraint nodes to random cavities chosen with replacement, rather than without replacement as in the construction of \({\hat{{\mathbb {G}}}}_{n,\omega }[\ell ,\ell ']\). But since the number of cavities of \({\hat{{\mathbb {G}}}}_{n,\omega }\) is \(\Omega _\omega (\omega )\), the two constructions have total variation distance \(o_\omega (1)\). \(\quad \square \)
Proof of Proposition 6.3
The proposition is immediate from Lemmas 6.7 and 6.8. \(\square \)
6.4 Proof of Proposition 6.1
We begin with the following lemma, whose proof is similar to the proof of Lemma 5.5.
Lemma 6.9
We have
Proof
Let \(\pi ={\hat{\pi }}_{n,\omega }\) or \(\pi ={\tilde{\pi }}_{n,\omega }\). Since \(\mathbb {E}[M_{n,\omega }-{\hat{M}}_{n,\omega }]=d(k-1)/k+o_\omega (1)\), due to (2.1) it suffices to show that
Thus, we need to cope with the correlations between \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) and \({\hat{{\varvec{G}}}}_{n,\omega }\) or \({\hat{{\mathbb {G}}}}_{n,\omega }\), respectively. In other words, we need to assess the correlations between \(M_{n,\omega }-{\hat{M}}_{n,\omega }\) and \({\hat{N}}_{n,\omega }\), \({\hat{M}}_{n,\omega }\). With probability \(1-\exp (-\Omega _\omega (\omega ))\) we have
with independent Bernoulli variables \(\Delta _{n,\omega },\Delta _{n+1,\omega }\). Thus, \(W\le d+1\) and (2.1) ensures that
Furthermore, since \(\varvec{X},\varvec{Y}\) are independent Poisson variables with mean \(\omega \) while W is bounded, for any \({\hat{n}},{\hat{m}}\) such that \(|{\hat{n}}-\mathbb {E}[{\hat{N}}_{n,\omega }]|,|{\hat{m}}-\mathbb {E}[{\hat{M}}_{n,\omega }]|\le \sqrt{\omega \log \omega }\) we obtain from (6.31) that
Hence, introducing an independent copy \(W'\) of W, we obtain from (2.1) and (6.32) that
Additionally, we claim that for any \(0\le w\le d+1\),
Indeed, as in the proof of Lemma 5.5 we obtain
Further, (2.1), Corollary 3.4 and Proposition 6.3 yield
As the logarithm can be approximated arbitrarily well by polynomials due to (2.1), (6.34) follows from (6.35)–(6.36). Finally, (6.30) follows from (6.33) and (6.34). \(\quad \square \)
Proof of Proposition 6.1
Proposition 6.3 and Lemma 6.9 show that for any \(\ell \ge 1\) there exists \(\omega _{\ell }>\omega _{\ell -1}\) such that for all sufficiently large n we have \({\hat{\pi }}_{n,\omega _\ell }\in \mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\) and
Since \(\mathfrak {D}\) is compact, the sequence \(({\hat{\pi }}_{n,\omega _\ell })_n\) has a convergent subsequence, whose limit \({\hat{\pi }}^{(\ell )}\) lies in the closed set \(\mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\). Furthermore, because Lemma 3.3 shows that \(\mathscr {B}(\,\cdot \,)\) is continuous, (6.37) yields
Additionally, \(({\hat{\pi }}^{(\ell )})_\ell \) has a subsequence that converges to \({\tilde{\pi }}\in \mathfrak {D}^\star =\bigcap _\ell \mathfrak {D}^\star _{1/\ell ,\ell ,\ell }\). Thus, the first assertion follows from (6.38) and the continuity of \(\mathscr {B}(\,\cdot \,)\) established by Corollary 3.5.
The second assertion concerning the simple random factor graph \({\mathbb {G}}\) is immediate from the first. Indeed, due to (2.1) a standard application of Azuma’s inequality shows that \(n^{-0.51}|\log Z({\varvec{G}})-\mathbb {E}\log Z({\varvec{G}})|\rightarrow 0\) in probability. Hence, Fact 2.2 and Bayes’ rule imply that \(\mathbb {E}\log Z({\mathbb {G}})-\mathbb {E}\log Z({\varvec{G}})=o(n)\). \(\quad \square \)
6.5 Proof of Theorem 2.6
We begin by showing that the free energy of \({\varvec{G}}\) can be expressed in terms of the functional \(\mathscr {B}\) applied to \({\hat{\pi }}_{n,\omega }\) or \({\tilde{\pi }}_{n,\omega }\), respectively. Once more we will carry the details out for \({\mathbb {G}}\); the unconditioned random factor graph \({\varvec{G}}\) is easier to deal with, and the proofs are just obtained from the \({\mathbb {G}}\) case by dropping any considerations regarding multiple edges.
Lemma 6.10
If POS is satisfied, then
Proof
Proposition 6.2 and Lemma 6.9 show that for any \(\varepsilon >0\) there exists \(\omega _0\) such that for all \(\omega >\omega _0\) there exists \(n_0\) such that for all \(n>n_0\) we have \(n^{-1}\mathbb {E}\log Z({\mathbb {G}})\ge \mathscr {B}({\tilde{\pi }}_{n,\omega })-\varepsilon .\) Hence, for any \(\varepsilon >0\) there is \(\omega _0>0\) such that for all \(\omega >\omega _0\) we have
Indeed, since Propositions 5.1 and 6.1 show that \((\frac{1}{n}\mathbb {E}\log Z({\mathbb {G}}))_n\) converges, (6.39) yields
We are left to prove the converse inequality. The space \(\mathfrak {D}\) is compact and separable. Therefore, for any \(\omega \) the sequence \(({\tilde{\pi }}_{n,\omega })_n\) has a subsequence that converges to \({\tilde{\pi }}^{(\omega )}\in \mathfrak {D}\) such that \(\liminf _{n\rightarrow \infty }\mathscr {B}({\tilde{\pi }}_{n,\omega })=\mathscr {B}({\tilde{\pi }}^{(\omega )})\). Further, \(({\tilde{\pi }}^{(\omega )})_\omega \) has a subsequence that converges to \({\tilde{\pi }}^*\) such that
Proposition 6.3 shows that \({\tilde{\pi }}^{*}\in \mathfrak {D}^\star \). Hence, Proposition 5.1 implies that
Thus, the assertion follows from (6.40)–(6.42). \(\quad \square \)
To proceed we need a small twist on Lemma 6.10. Namely, instead of using \({\hat{{\mathbb {G}}}}_{n,\omega }\) as our reference point, we are going to work with \({\mathbb {G}}_{n,\omega }\). Thus, let \(\mathscr {C}\) be the set of cavities of \({\mathbb {G}}_{n,\omega }\) and let \(\rho _{n,\omega ,\mathscr {S}}\in \mathfrak {K}\) be the kernel representing \(\mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}}\). Further, let \(\pi _{n,\omega ,\mathscr {S}}\in \mathfrak {D}\) be the distribution of \(\rho _{n,\omega ,\mathscr {S}}\). Define \(\rho _{n,\omega }\), \(\pi _{n,\omega }\) analogously with respect to \({\varvec{G}}_{n,\omega }\).
Corollary 6.11
If POS is satisfied, then
Proof
Since \(\varvec{X},\varvec{Y}\) are Poisson variables with large mean \(\omega \), \({\hat{M}}_{n,\omega }\) and \(M_{n,\omega }\) can be coupled so that both coincide with probability \(1-o_\omega (1)\). This coupling naturally extends to a coupling of \({\hat{{\mathbb {G}}}}_{n,\omega }\) and \({\mathbb {G}}_{n,\omega }\) under which \({\mathbb {G}}_{n,\omega }={\hat{{\mathbb {G}}}}_{n,\omega }\) with probability \(1-o_\omega (1)\). Consequently, recalling that \(\mathscr {D}_{\Box }(\,\cdot \,,\,\cdot \,)\) stands for the Wasserstein metric on \(\mathfrak {D}\), we have \(\mathscr {D}_{\Box }(\pi _{n,\omega ,\mathscr {S}},{\tilde{\pi }}_{n,\omega })=o_\omega (1)\). Thus, the assertion follows from Corollary 3.5. \(\quad \square \)
We recall the construction of the kernel \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\in \mathfrak {K}\) from (2.14). Let \({\check{\pi }}_{n,\omega }\in \mathfrak {D}\) be the distribution of \({\check{\mu }}_{{\varvec{G}},\varvec{X},\varvec{Y}}\), and define \({\check{\mu }}_{{\mathbb {G}},\varvec{X},\varvec{Y}}\in \mathfrak {K}\), \({\check{\pi }}_{n,\omega ,\mathscr {S}}\in \mathfrak {D}\) analogously with respect to \({\mathbb {G}}\). Due to the inevitable divisibility condition required to construct a regular factor graph, these kernels are defined whenever k|dn. The following proposition summarizes the main step toward the proof of Theorem 2.6.
Proposition 6.12
For any \(\alpha >0\) there is \(\omega _0>0\) such that for every \(\omega >\omega _0\) there exists \(n_0>0\) such that for all \(n>n_0\) with k|dn we have
To prove Proposition 6.12 we let \(\mathscr {V}=\{v_{i}:i>N_{n,\omega }\}\) and \(\mathscr {A}=\{a_i:i>M_{n,\omega }\}\cup \bigcup _{v\in \mathscr {V}}\partial v\) be the sets of variable and constraint nodes, respectively, that are present in \({\mathbb {G}}\) but not in \({\mathbb {G}}_{n,\omega }\). Similarly as in Sect. 6.3, conditioning on the event that \(kM_{n,\omega }\le dN_{n,\omega }-d(k-1)\varvec{X}-k\varvec{Y}\), we define an enhanced random factor graph \({\mathbb {G}}^{\#}\) by
- adding the variable nodes \(\mathscr {V}\) to \({\mathbb {G}}_{n,\omega }\) along with \(d\varvec{X}\) new constraint nodes \(a_{v,j}^{\#}\), \(v\in \mathscr {V}\), \(j\in [d]\). Each \(a_{v,j}^{\#}\) is adjacent to v and \(k-1\) random cavities of \({\mathbb {G}}_{n,\omega }\),
- adding \(\varvec{Y}\) more constraint nodes \(a_1^{\#},\ldots ,a_{\varvec{Y}}^{\#}\), each connected with k random cavities of \({\mathbb {G}}_{n,\omega }\).
Of course, the cavities in the above construction are drawn without replacement and all weight functions are chosen from P independently. We do not require that the outcome \({\mathbb {G}}^{\#}\) be simple. Let
comprise the new constraint nodes.
Lemma 6.13
We have \(d_{\mathrm {TV}}({\mathbb {G}},{\mathbb {G}}^{\#})=o_\omega (1)\).
Proof
Similarly as in the proof of Lemma 6.7, we consider the event \({{\mathscr {E}}}=\{\omega /2\le \varvec{X}\le 2\omega ,\, \omega /2\le \varvec{Y}\le 2\omega \}\), which has probability \(1-o_\omega (1)\). Further, let \({{\mathscr {E}}}'\) be the event that \({\mathbb {G}}\) enjoys the following additional properties.
- (i’)
We have \(|\partial ^2\mathscr {V}|=|\mathscr {V}|d(k-1).\)
- (ii’)
\(|\partial \{a_{ M_{n,\omega }+1},\ldots ,a_{m}\}|=k(m-M_{n,\omega })\) and \(\partial \mathscr {V}\cap \{a_{M_{n,\omega }},\ldots ,a_{ m}\}=\varnothing \).
- (iii’)
If \(a\not \in \mathscr {A}\), then a is connected to the set \(\partial \mathscr {A}\) by at most one edge.
Additionally, let \({{\mathscr {E}}}''\) be the event that \({\mathbb {G}}_{n,\omega }\) has the following properties.
- (i”)
All variable nodes have degree either d or \(d-1\).
- (ii”)
No two cavities are adjacent to the same constraint node.
We have
Moreover, \({\mathbb {G}}\) given \({{\mathscr {E}}}'\) is distributed precisely as \({\mathbb {G}}^{\#}\) given \({{\mathscr {E}}}''\). Thus, the assertion follows from (6.43). \(\quad \square \)
Due to Lemma 6.13 we can apply Theorem 2.5 to \({\mathbb {G}}^{\#}\). Let \(S_1^{\#},\ldots ,S_\ell ^{\#}\) denote the resulting Bethe state decomposition of \({\mathbb {G}}^{\#}\). Let \(T_i^{\#}=S_i^{\#}\cap \Omega ^{V_n{\setminus }\mathscr {V}}\) for \(i\in [\ell ]\). Further, we introduce
Thus, \(\mu _{{\mathbb {G}}^\#,i}\in \mathscr {P}(\Omega ^{V_n{\setminus }\mathscr {V}})\).
Lemma 6.14
With probability \(1-o_\omega (1)\) the sets \(T_1^\#,\ldots ,T_\ell ^\#\) are pairwise disjoint and we have
Proof
We recall from Sect. 4.1 that the decomposition \(S_1^\#,\ldots ,S_\ell ^\#\) is constructed by pinning the values of a random set \(\varvec{U}_*\) of variables to specific spins. Since the size of this set is bounded, with high probability we have \((\mathscr {C}\cup \mathscr {V})\cap \varvec{U}_*=\varnothing \). We will prove that in this case, \(\mu _{{\mathbb {G}}^\#,i}(\sigma )=\mu _{{\mathbb {G}}_{n,\omega }}(\sigma |T_i^\#)\) for all \(i,\sigma \).
If \((\mathscr {C}\cup \mathscr {V})\cap \varvec{U}_*=\varnothing \), then \(T_1^\#,\ldots ,T_\ell ^\#\) are pairwise disjoint. Thus, fix \(i\in [\ell ]\) and \(\sigma \in T_i^\#\). Then by the construction of \({\mathbb {G}}^\#\),
Combining (6.46) and (6.47), we obtain the second identity in (6.45). Further, combining the second part of (6.45) with (6.46) and (6.47), we find
thereby establishing the first part of (6.45). \(\quad \square \)
W.h.p. each cavity \(v\in \mathscr {C}\) of \({\mathbb {G}}_{n,\omega }\) has degree \(d-1\). In this case, we denote by \(b_v\) the unique neighbour of v in \({\mathbb {G}}^\#\) that is not present in \({\mathbb {G}}_{n,\omega }\). Further, for \(i\in [\ell ]\) let \(\varvec{\nu }_{{\mathbb {G}},i}\in \mathscr {P}(\Omega ^{\mathscr {C}})\) be the product measure
In close analogy to the weights introduced in (2.13), we also define
Lemma 6.15
With probability \(1-o_\omega (1)\) we have \(\sum _{h=1}^\ell \left| {\varvec{z}_{{\mathbb {G}}^\#,h}-{\check{\varvec{z}}}_{h}^\#}\right| =o(1)\) and
Proof
Fix \(h\in [\ell ]\) and suppose that \(S_h^\#\) is an o(1)-Bethe state, which occurs with probability \(1-o_\omega (1)\) due to Theorem 2.5 and Lemma 6.13. Then by BS2 w.h.p. we have for any \(\sigma \in \Omega ^{\mathscr {V}\cup \mathscr {C}}\),
Further, w.h.p. each cavity of \({\mathbb {G}}_{n,\omega }\) has degree \(d-1\); in this case, denote by \(c_v\) the unique neighbor of v in \({\mathbb {G}}^{\#}\) that is absent in \({\mathbb {G}}_{n,\omega }\). Then by (6.49) w.h.p. we have
Summing on h completes the proof of the first assertion.
With respect to the second assertion, for \(\sigma \in \Omega ^{\mathscr {C}}\) we have w.h.p.
as claimed. \(\quad \square \)
Proof of Proposition 6.12
Let \(\nu _{{\mathbb {G}}^{\#}}={\sum _{i=1}^\ell {\check{\varvec{z}}}^{\#}_i\nu _{{\mathbb {G}}^{\#},i}}/{\sum _{i=1}^\ell {\check{\varvec{z}}}^{\#}_i}\) and let \(\pi ^\#_{n,\omega ,\mathscr {S}}\) be the distribution of the kernel representation \({\dot{\nu }}_{{\mathbb {G}}^{\#}}\in \mathfrak {K}\). Then up to a renumbering of the variable and constraint nodes, \({\check{\mu }}_{{\mathbb {G}}^{\#},\varvec{X},\varvec{Y}}\in \mathfrak {K}\) is distributed as the representation of \(\nu _{{\mathbb {G}}^{\#}}\). Specifically, in (2.14) we renumbered the nodes such that \(\mathscr {V}\) comprises the first \(\varvec{X}\) variable nodes and such that the \(a_i^{\#}\), \(i\in [\varvec{Y}]\), are the first \(\varvec{Y}\) constraint nodes. Due to Lemma 6.13 and because \({\mathbb {G}}\) and \({\varvec{G}}\) are invariant under node permutations, we conclude that \(\mathscr {D}_{\Box }(\pi ^\#_{n,\omega ,\mathscr {S}},{\check{\pi }}_{n,\omega ,\mathscr {S}})=o_\omega (1)\). Furthermore, combining Lemmas 3.14, 6.14 and 6.15, we see that \(\mathbb {E}[\Delta _{\Box }(\mu _{{\mathbb {G}}_{n,\omega },\mathscr {C}},\nu _{{\mathbb {G}}^{\#}})]=o_\omega (1)\). Hence, invoking (3.4), we conclude that \(\mathbb {E}[\mathscr {D}_{\Box }(\rho _{n,\omega ,\mathscr {S}},{\dot{\nu }}_{{\mathbb {G}}^{\#}})]=o_\omega (1)\). Thus, the triangle inequality yields \(\mathscr {D}_{\Box }(\pi _{n,\omega ,\mathscr {S}},{\check{\pi }}_{n,\omega ,\mathscr {S}})=o_\omega (1)\). The same argument applies to \(\pi _{n,\omega }\) and \({\check{\pi }}_{n,\omega }\). \(\quad \square \)
As a final preparation toward the proof of Theorem 2.6, we need the following simple lemma.
Lemma 6.16
For any fixed integer \(\ell \) we have
Proof
The random factor graph \({\mathbb {G}}_{n,\omega }\) or \({\varvec{G}}_{n,\omega }\), respectively, has \(n-\varvec{X}\) variable nodes with probability \(1-o_\omega (1)\). Similarly, the number of variable nodes of \({\mathbb {G}}_{n+\ell ,\omega }\) or \({\varvec{G}}_{n+\ell ,\omega }\) is \(n+\ell -\varvec{X}\) with probability \(1-o_\omega (1)\). Since \(\varvec{X}\) is a Poisson variable with mean \(\omega \), we have \(d_{\mathrm {TV}}(n+\ell -\varvec{X},n-\varvec{X})=o_\omega (1)\). Hence, we can couple \({\mathbb {G}}_{n+\ell ,\omega }\) and \({\mathbb {G}}_{n,\omega }\) as well as \({\varvec{G}}_{n+\ell ,\omega }\) and \({\varvec{G}}_{n,\omega }\) in such a way that both coincide w.h.p. This coupling extends to the distributions \(\rho _{n,\omega ,\mathscr {S}},\rho _{n+\ell ,\omega ,\mathscr {S}}\) and \(\rho _{n,\omega },\rho _{n+\ell ,\omega }\). \(\quad \square \)
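The total variation bound \(d_{\mathrm {TV}}(n+\ell -\varvec{X},n-\varvec{X})=o_\omega (1)\) in the proof above can be illustrated numerically. The following Python sketch (illustrative only; all helper names are ours, not the paper's) computes the total variation distance between \(\mathrm {Po}(\omega )\) and \(\ell +\mathrm {Po}(\omega )\), which decays as \(\omega \rightarrow \infty \), roughly like \(\ell /\sqrt{2\pi \omega }\).

```python
import math

def pois_pmf(lmbda, k):
    """Poisson(lmbda) probability mass at k >= 0, computed in log space."""
    return math.exp(-lmbda + k * math.log(lmbda) - math.lgamma(k + 1))

def tv_shifted_poisson(omega, ell):
    """Total variation distance between X and X + ell for X ~ Po(omega).
    The tail beyond the cutoff carries negligible mass."""
    cutoff = int(omega + 20 * math.sqrt(omega) + ell + 20)
    total = 0.0
    for k in range(cutoff):
        p = pois_pmf(omega, k)
        q = pois_pmf(omega, k - ell) if k >= ell else 0.0
        total += abs(p - q)
    return 0.5 * total

for omega in (10, 100, 1000, 10000):
    print(omega, tv_shifted_poisson(omega, ell=3))
```

The printed distances shrink as \(\omega \) grows, matching the coupling argument in the proof.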
Proof of Theorem 2.6
Corollary 6.11 yields the free energy formula in terms of the distributions \(\pi _{n,\omega }\) and \(\pi _{n,\omega ,\mathscr {S}}\), respectively. Furthermore, Proposition 6.12 together with Corollary 3.5 implies that
with the limit on n confined to integers such that k|dn in each case. Moreover, Lemma 6.16 together with Corollary 3.5 implies that this divisibility condition does not alter the limits on the left-hand side of these equations, i.e.,
Thus, combining (6.51)–(6.54) and invoking Corollary 6.11, we obtain
where, of course, the limit is confined to n such that k|dn because \({\mathbb {G}}\), \({\varvec{G}}\) and \({\check{\pi }}_{n,\omega }\), \({\check{\pi }}_{n,\omega ,\mathscr {S}}\) are defined only for such n; this is the assertion. \(\quad \square \)
7 Applications
In Sect. 7.1 we prove that the spin glass model from Sect. 1.2 satisfies the condition POS; the results stated in Sect. 1.2 are then immediate from those in Sect. 2. Further, in Sections 7.2 and 7.3 we apply the results from Sect. 2 to two further models, the Potts antiferromagnet and the random regular k-SAT model. Finally, in Sect. 7.4 we show how the theorems from Sect. 2 can be brought to bear on the hard-core model, thereby proving the results stated in Sect. 1.3.
7.1 The spin glass
To derive the results on the spin glass model stated in Sect. 1 from the general theorems in Sect. 2, we just need to verify the condition POS for the spin glass model. In Example 2.3 we introduced the relevant weight function even in the more general case of the k-spin model; the case \(k=2\) corresponds to the spin glass on the Bethe lattice.
Lemma 7.1
The k-spin model satisfies POS for all \(d\ge 3\), \(\beta >0\) and all even \(k\ge 2\).
Proof
The lemma is already implicit in [35, 55]; but let us carry out the simple proof for completeness. Let \(\varvec{J}\) be a standard Gaussian. Upon substituting the weight functions from Example 2.3 into POS and multiplying by \(2^\ell \), POS reads
for all measurable \(\mu ,\mu ':[0,1]^2\rightarrow [0,1]\). Expanding the first expectation yields
Since \(\varvec{J}\) is independent of the \(\varvec{x}_i\), the last expectation vanishes if j is odd, while \(\tanh (\beta \varvec{J})^j\ge 0\) if j is even. Thus, in order to establish (7.1) it suffices to show that for any even \(j\ge 2\),
Let \(\varvec{s}_1,\ldots ,\varvec{s}_j\in [0,1]\) be uniformly distributed and mutually independent as well as independent of the \(\varvec{x}_i\). Then Fubini’s theorem yields
Since for even k we have \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) for all \(X,Y\in \mathbb {R}\), (7.3)–(7.5) yield (7.2). \(\quad \square \)
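The elementary inequality \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) for even k, invoked in the last step, can be sanity-checked numerically. The following Python sketch (illustrative only, not part of the proof) samples random real X, Y; the inequality follows from the weighted AM-GM inequality applied to \(|X|,|Y|\).

```python
import random

def gap(x, y, k):
    """The quantity x^k + (k-1) y^k - k x y^(k-1), which is nonnegative
    for even k and all real x, y (weighted AM-GM on |x|, |y|)."""
    return x ** k + (k - 1) * y ** k - k * x * y ** (k - 1)

random.seed(0)
min_gap = min(gap(random.uniform(-5, 5), random.uniform(-5, 5), k)
              for k in (2, 4, 6) for _ in range(20000))
print(min_gap)  # nonnegative up to floating-point rounding
```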
Due to Lemma 7.1, Theorem 1.1 follows from Theorem 2.5, Theorem 1.2 follows from Theorem 2.6 and Theorem 1.3 follows from Theorem 2.7.
Remark 7.2
Indeed, together with Lemma 7.1 the results from Sect. 2 yield the Bethe state decomposition and the corresponding formulas for the free energy for the k-spin model for any even \(k\ge 2\).
7.2 The Potts model
For an integer \(q\ge 2\) let \(\Omega =\{1,\ldots ,q\}\) be a set of q distinct colors. Also let \(\beta >0\) be a real parameter, the inverse temperature. The Potts antiferromagnet on \({\mathbb {G}}\) is the distribution on \(\Omega ^{V_n}\) defined by
where the partition function \(Z_\beta ({\mathbb {G}})\) provides normalization; we omit the reference to \(\beta \) where possible. Thus, for a given \(\sigma \in \Omega ^{V_n}\) each monochromatic edge of \({\mathbb {G}}\) incurs an \(\exp (-\beta )\) penalty factor.
The Potts antiferromagnet and the associated optimization problem, the Max q-Cut problem, are of fundamental importance in combinatorics. Krzakala and Zdeborová [42] brought the cavity method to bear on this model. In the following we show how the main results of the present paper apply to this model to underpin the predictions from [42] rigorously. In particular, we specialize the Belief Propagation equations to the Potts model, work out the variational formula for the free energy and apply this formula to the Max q-Cut problem on the random regular graph.
The Potts model on \({\mathbb {G}}(n,d)\) can be cast as a random factor graph model with a single weight function
Thus, \(k=2\), \(\Psi =\{\psi _\beta \}\), \(P(\psi _\beta )=1\) and the prior distribution p is uniform on \(\Omega \). Since the constraints are binary, the random regular factor graph \({\varvec{G}}\) can be identified with the usual random d-regular graph \({\mathbb {G}}\), with the edges representing the factor nodes.
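On a tiny instance the Boltzmann distribution above can be evaluated by brute force. The following Python sketch (illustrative only; the helper names are ours) computes the Potts partition function on the complete graph \(K_4\), where each monochromatic edge contributes a factor \(\mathrm {e}^{-\beta }\).

```python
import itertools, math

def potts_weight(sigma, edges, beta):
    """exp(-beta * #monochromatic edges): each monochromatic edge
    incurs an exp(-beta) penalty factor."""
    mono = sum(1 for v, w in edges if sigma[v] == sigma[w])
    return math.exp(-beta * mono)

def partition_function(n, edges, q, beta):
    """Brute-force Z_beta: sum the weight over all q^n colourings."""
    return sum(potts_weight(sigma, edges, beta)
               for sigma in itertools.product(range(q), repeat=n))

# K4 is 3-regular; with q = 3 colours at least one edge is monochromatic.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(partition_function(4, K4, q=3, beta=1.0))
```

At \(\beta =0\) the sum simply counts all \(q^n\) configurations, which is a convenient consistency check.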
Lemma 7.3
The Potts model satisfies condition POS for all \(\beta >0\).
Proof
We plug the definition of \(\psi _\beta \) into POS and notice that the \(1-\mathrm {e}^{-\beta }\) factors cancel. Hence, the desired inequality reads
Applying Fubini’s theorem to take the expectation on \(\varvec{x}_1,\varvec{x}_2\) inside, we find
Similar manipulations yield
Combining (7.7)–(7.9), we conclude that the l.h.s. of (7.6) is the expectation of a sum of squares, and thus non-negative. \(\quad \square \)
The message space \(\mathscr {S}({\mathbb {G}})\) of the Potts model boils down to the set of all families \((\mu _{v\rightarrow w})_{v\in V_n,w\in \partial v}\), with \(\mu _{v\rightarrow w}\in \mathscr {P}(\Omega )\). With this simplification the Belief Propagation operator \(\mathrm {BP}:\mathscr {S}({\mathbb {G}})\rightarrow \mathscr {S}({\mathbb {G}})\), \(\nu \mapsto {\hat{\nu }}\) of the Potts model reads
With respect to Bethe states, we expect that the phase space \(\Omega ^n\) decomposes into \(S_1,\ldots ,S_\ell \) such that the conditional distributions \(\mu _{{\mathbb {G}}}[\,\cdot \,|S_i]\) are free of long-range correlations, that their standard messages form an approximate fixed point of BP and that the conditional marginals derive from the messages. In formulas, with high probability over the choice of the graph and with \(({\hat{\mu }}_{{\mathbb {G}},v\rightarrow u}[\,\cdot \,|S_h])_{u\in \partial v}=\mathrm {BP}(\mu _{{\mathbb {G}},v\rightarrow u}[\,\cdot \,|S_h])_{u\in \partial v}\), we aim to show that
The following theorem establishes these facts.
Theorem 7.4
For any sequence \(L=L(n)\rightarrow \infty \) and all \(d\ge 3\), \(\beta >0\) the following is true. With high probability \({\mathbb {G}}\) admits a decomposition \(S_0,S_1,\ldots ,S_\ell \), \(\ell \le L\), of the phase space \(\Omega ^n\) such that \(\mu _{{\mathbb {G}}}(S_0)=o(1)\) and such that (7.11)–(7.12) are satisfied for \(h=1,\ldots ,\ell \).
Proof
This is immediate from Theorem 2.5 applied to the factor graph representation of the Potts model. \(\quad \square \)
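For concreteness, a standard way to write the Potts BP update is \(\hat{\mu }_{v\rightarrow w}(\sigma )\propto \prod _{u\in \partial v{\setminus }\{w\}}\left( 1-(1-\mathrm {e}^{-\beta })\mu _{u\rightarrow v}(\sigma )\right) \). The following Python sketch (illustrative only; the paper's display may normalize differently) implements one such update; by symmetry the uniform message is always a fixed point.

```python
import math

def bp_update(incoming, beta, q):
    """One Potts BP step: combine the incoming messages mu_{u->v},
    u in dv \ {w}, into the outgoing message mu_{v->w}, using
    mu_hat(s) proportional to prod_u (1 - (1 - e^{-beta}) mu_{u->v}(s))."""
    c = 1.0 - math.exp(-beta)
    out = [1.0] * q
    for mu in incoming:
        for s in range(q):
            out[s] *= 1.0 - c * mu[s]
    z = sum(out)
    return [x / z for x in out]

# Uniform incoming messages reproduce the uniform outgoing message.
uniform = [1.0 / 3] * 3
print(bp_update([uniform, uniform], beta=1.0, q=3))
```

A biased incoming message is repelled, as one expects in the antiferromagnet: a neighbour likely to take a colour makes that colour less likely for the outgoing message.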
With respect to the free energy, let \(\varvec{X},\varvec{Y}\) be two independent Poisson variables with mean \(\omega \). Let \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\) be uniformly random vertices and edges of \({\mathbb {G}}\), chosen independently. With \(S_1,\ldots ,S_\ell \) the decomposition from Theorem 7.4, we introduce the weights
and \(\varvec{z}_{{\mathbb {G}}}=\sum _{h=1}^\ell \varvec{z}_{{\mathbb {G}},h}\). Further, let \(\mathscr {C}({\mathbb {G}})\) be the set of all vertices of degree less than d in the graph obtained from \({\mathbb {G}}\) by removing \(\varvec{u}_1,\ldots ,\varvec{u}_{\varvec{X}}\) and \(\varvec{v}_1\varvec{w}_1,\ldots ,\varvec{v}_{\varvec{Y}}\varvec{w}_{\varvec{Y}}\). Then with high probability each \(c\in \mathscr {C}({\mathbb {G}})\) has degree precisely \(d-1\), and we write \(c'\) for the missing d-th neighbor of c. Finally, with \(\varvec{c}_1,\varvec{c}_2,\ldots \) a sequence of uniformly and independently chosen elements of \(\mathscr {C}({\mathbb {G}})\), we let
Theorem 7.5
For all \(d\ge 3,\beta >0\) we have \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}})]=\liminf _{\omega \rightarrow \infty }\liminf _{n\rightarrow \infty }\,\mathbb {E}[\mathscr {B}({\mathbb {G}})].\)
Proof
This is an immediate consequence of Theorem 2.6 and Lemma 7.3. \(\quad \square \)
Additionally, Theorem 2.7 yields a variational formula for the free energy. Writing out the specifics of the Potts case, we see that \(\mathfrak {D}^\star \) consists of all \(\pi \in \mathfrak {D}\) that satisfy the following property. For a measurable \(\mu :[0,1]^2\rightarrow \mathscr {P}(\Omega )\) with \(\Omega =[q]\) and integers \(N,M\ge 0\) let
Then we let \(\mu ^{*(N,M)}_{s,x}=\mu _{\varvec{t},x}\). Now \(\mathfrak {D}^\star _\beta \) is the set of all \(\pi \in \mathfrak {D}\) such that for a random \(\varvec{\mu }^{\pi }\in \mathfrak {K}\) drawn from \(\pi \), the perturbed \(\varvec{\mu }^{\pi *(N,M)}\in \mathfrak {K}\) again has distribution \(\pi \). Furthermore, in the Potts model the functional \(\mathscr {B}(\,\cdot \,)\) reads
Theorem 7.6
For all \(d\ge 3\), \(\beta >0\) we have
As a further application we obtain a variational formula for the Max q-Cut of the random regular graph, which is defined as
Thus, \(\mathrm {MC}_q({\mathbb {G}})\) equals the total number of edges of \({\mathbb {G}}\) minus the ground state energy of the Potts model. In other words, \(\mathrm {MC}_q({\mathbb {G}})\) is the maximum, over the choice of \(\sigma :[n]\rightarrow [q]\), of the number of edges that link vertices of different colors. The Max q-Cut problem is well-studied in combinatorics and computer science. In particular, the problem is well known to be NP-hard on worst-case instances.
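The identity between Max q-Cut and the Potts ground state described above can be checked by brute force on a small graph. The following Python sketch (illustrative only; the helper names are ours) does so on \(K_4\).

```python
import itertools

def max_q_cut(n, edges, q):
    """Max q-Cut = |E| minus the minimum number of monochromatic edges,
    i.e. the total number of edges minus the Potts ground-state energy."""
    best_mono = min(sum(1 for v, w in edges if sigma[v] == sigma[w])
                    for sigma in itertools.product(range(q), repeat=n))
    return len(edges) - best_mono

# K4 needs 4 colours for a proper colouring, so with q = 3 one edge stays uncut.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(max_q_cut(4, K4, q=3))
```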
Corollary 7.7
For all \(d\ge 3\) we have \(\displaystyle \mathrm {MC}_q({\mathbb {G}})/n\ {{\mathop {\longrightarrow }\limits ^{{n\rightarrow \infty }}}}\ \frac{d}{2}+\lim _{\beta \rightarrow \infty }\Phi _{d,\beta +1}-\Phi _{d,\beta }\) in probability.
Proof
Since Azuma’s inequality shows that \(\mathrm {MC}_q({\mathbb {G}})\) is concentrated within \(O(\sqrt{n\log n})\) about its mean, it suffices to prove that
Further, introducing \(\mathscr {H}_{{\mathbb {G}}}(\sigma )=\frac{1}{2}\sum _{v,w=1}^n\varvec{1}\{w\in \partial v,\,\sigma (v)=\sigma (w)\}\) and recalling (7.14), we can rewrite (7.15) as
To prove (7.16) we write \(\mu _{G,\beta }\in \mathscr {P}([q]^{V(G)})\) for the Potts distribution induced by a d-regular graph \(G=(V(G),E(G))\). Moreover, let us denote the Potts Hamiltonian by \(\mathscr {H}_G\) and the partition function by \(Z_\beta (G)\). It is well known that for any \(\varepsilon >0\) there exists \(\beta _0(\varepsilon )>0\) such that for all \(\beta >\beta _0(\varepsilon )\) and all d-regular graphs G we have
Consequently, for all \(\beta >\beta _0(\varepsilon )\) we have
Since \(\left\langle {{\mathscr {H}_G},{\mu _{G,\beta }}}\right\rangle =-\frac{\partial }{\partial \beta }\log Z_\beta (G)\), (7.18) yields
Applying (7.19) to the random regular graph \({\mathbb {G}}\) and taking expectations, we obtain
Hence, taking \(n\rightarrow \infty \), we obtain for all \(\beta >\beta _0(\varepsilon )\),
Finally, there exists a subsequence \((n_l)\) along which \(\mathbb {E}\left[{\min _{\sigma :[n_l]\rightarrow [q]}\mathscr {H}_{{\mathbb {G}}(n_l,d)}(\sigma )}\right]/n_l\) converges to a number \(\xi \ge 0\). Taking the limit of (7.20) along this subsequence, we obtain \(\xi \le \Phi _{d,\beta }-\Phi _{d,\beta +1}\le \xi +\varepsilon \) for all \(\beta >\beta _0(\varepsilon )\). Consequently, the limit \(\lim _{\beta \rightarrow \infty }\Phi _{d,\beta }-\Phi _{d,\beta +1}\) exists. Therefore, taking \(\beta \rightarrow \infty \) in (7.21), we conclude that
exists as well and that (7.16) is satisfied. \(\quad \square \)
7.3 The regular k-SAT model
The k-SAT problem plays a major role in computer science, particularly in computational complexity theory. Its optimization version, the Max k-SAT problem, asks for the largest number of clauses of a propositional formula in conjunctive normal form with clauses of length k that can be satisfied simultaneously. Random instances of k-SAT and Max k-SAT have been studied extensively as instructive benchmarks [5].
We can express the Max k-SAT problem as a factor graph model with spins \(\Omega =\{-1,1\}\) corresponding to the Boolean values ‘true’ and ‘false’ as follows. With \(k\ge 2\) an integer and \(\beta >0\) a real parameter, we introduce the weight functions
Let p be the uniform distribution on \(\Omega \) and let P be uniform on \(\Psi _\beta =\{\psi _{\beta ,\chi }:\chi \in \Omega ^k\}\). In terms of propositional formulas, the semantics is that \(\psi _{\beta ,\chi }\) encodes a k-clause whose ith literal is negated if \(\chi _i=1\) and positive if \(\chi _i=-1\). Thus, \(\prod _{i=1}^k\chi _i\sigma _i=1\) if the truth assignment \(\sigma \) fails to satisfy the clause, and \(\prod _{i=1}^k\chi _i\sigma _i=-1\) otherwise. In effect, \(\psi _{\beta ,\chi }(\sigma )=(1-\tanh \beta )/2\rightarrow 0\) as \(\beta \rightarrow \infty \) if \(\sigma \) fails to satisfy the clause, whereas \(\psi _{\beta ,\chi }(\sigma )=(1+\tanh \beta )/2\rightarrow 1\) if \(\sigma \) is satisfying. Hence, the random factor graph \({\varvec{G}}\) models a random k-SAT formula in which every variable appears precisely d times, the regular k-SAT model. We are going to derive variational formulas for its free energy and its ground state energy.
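In compact form the clause weight can be written as \(\psi _{\beta ,\chi }(\sigma )=(1-\tanh (\beta )\prod _{i=1}^k\chi _i\sigma _i)/2\), which reproduces the two values \((1\mp \tanh \beta )/2\) above. The following Python sketch (illustrative only) evaluates it.

```python
import math

def psi(beta, chi, sigma):
    """Clause weight (1 - tanh(beta) * prod chi_i sigma_i) / 2:
    close to 0 for a violating assignment, close to 1 for a satisfying one."""
    prod = 1
    for c, s in zip(chi, sigma):
        prod *= c * s
    return (1 - math.tanh(beta) * prod) / 2

beta = 2.0
chi = (1, -1, 1)          # literal i is negated iff chi_i = +1
violating = (1, -1, 1)    # chi_i * sigma_i = +1 for every i
satisfying = (-1, -1, 1)
print(psi(beta, chi, violating))   # (1 - tanh 2)/2, close to 0
print(psi(beta, chi, satisfying))  # (1 + tanh 2)/2, close to 1
```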
Lemma 7.8
The regular k-SAT model satisfies POS for all \(d,k\ge 3\) and all \(\beta >0\).
Proof
Once more this is already implicit in [35, 55], but we carry out the argument here for completeness. Let us write \(\varvec{\chi }\) for a uniformly random element of \(\{\pm 1\}^k\). Substituting \(\psi _{\beta ,\chi }\) into POS and cancelling positive constants, we are left to verify the inequality
Fubini’s theorem yields
Since \(X^k+(k-1)Y^k-kXY^{k-1}\ge 0\) for \(X,Y\ge 0\), (7.23)–(7.25) yield (7.22). \(\square \)
Due to Lemma 7.8 we can bring the results from Sect. 2 to bear on the random regular k-SAT model. Specifically, for a measurable \(\mu :[0,1]^2\rightarrow \mathscr {P}(\Omega )\) with \(\Omega =\{\pm 1\}\) and integers \(N,M\ge 0\) let \((\varvec{\chi }_{i,j})_{i,j\ge 1}\) be independent uniformly random elements of \(\Omega \) and let
Further, let
and \(\mu ^{N,M}_{s,x}=\mu _{\varvec{t},x}\). Then \(\mathfrak {D}^\star _\beta \) consists of all \(\pi \in \mathfrak {D}\) such that \(\varvec{\mu }^{\pi ,N,M}\) has distribution \(\pi \). Furthermore, the functional \(\mathscr {B}(\,\cdot \,)\) reads
Let
Theorem 7.9
For all \(d,k\ge 3\), \(\beta >0\) we have \(\lim _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\varvec{G}})]=\Phi _{d,\beta }\).
Proof
This follows immediately from Theorem 2.7 and Lemma 7.8. \(\quad \square \)
As a further application we also obtain a variational formula for the Max k-SAT problem. Specifically, with the interpretation of \(\sigma \in \Omega ^n\) as a truth assignment, define \(\mathscr {H}_{{\varvec{G}}}(\sigma )\) as the number of propositional clauses of \({\varvec{G}}\) that \(\sigma \) fails to satisfy. Further, let \(\mathrm {OPT}({\varvec{G}})=dn/k-\min _{\sigma \in \Omega ^n}\mathscr {H}_{{\varvec{G}}}(\sigma )\) be the maximum number of clauses that can be satisfied simultaneously. Following the steps of the proof of Corollary 7.7 precisely, we obtain the following result.
Corollary 7.10
For all \(d,k\ge 3\) we have
7.4 The hard-core model
The proofs of Theorem 1.4 and Corollary 1.5 are not entirely straightforward because the hard-core model cannot be cast directly as a factor graph model as in Sect. 2. This is because of the ‘hard’ constraint that \(\varvec{\sigma }_v\varvec{\sigma }_w=0\) for any adjacent v, w. We therefore prove Theorem 1.4 and Corollary 1.5 by way of a relaxed ‘soft-core’ model, taking two limits, first in the ‘softness’ and then in the fugacity. Specifically, we obtain a random factor graph model with \(\Omega =\{0,1\}\) and the prior \(p(0)=1/(1+\lambda )\) and \(p(1)=\lambda /(1+\lambda )\). In addition, to mimic the hard-core constraints we would like to introduce a binary weight function that forbids its two adjacent variable nodes from both taking the spin 1. But since such a weight function would take the value 0, in conflict with the strict positivity required of weight functions, we instead introduce
Thus, \(\beta >0\) is a ‘softness parameter’, and upon taking \(\beta \rightarrow \infty \) we recover the hard-core constraint: \(\psi _\infty (\sigma _1,\sigma _2)=1-\sigma _1\sigma _2.\) For any \(\beta ,\lambda \) and \(d\ge 3\) we obtain the random factor graph model \({\varvec{G}}_{\lambda ,\beta }\) with the single binary weight function \(\psi _\beta \).
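The soft-core weight, which, consistent with the \(\beta \rightarrow \infty \) limit just stated and with the factor \(1-\mathrm {e}^{-\beta }\) appearing in the proof of Lemma 7.11 below, we take to be \(\psi _\beta (\sigma _1,\sigma _2)=1-(1-\mathrm {e}^{-\beta })\sigma _1\sigma _2\), can be tabulated directly. The following Python sketch (illustrative only, restating that weight as an assumption) checks the hard-core limit on \(\{0,1\}^2\).

```python
import math

def psi_beta(beta, s1, s2):
    """Soft-core pair weight 1 - (1 - exp(-beta)) * s1 * s2 with s in {0, 1};
    as beta -> infinity this tends to the hard-core constraint 1 - s1 * s2."""
    return 1.0 - (1.0 - math.exp(-beta)) * s1 * s2

# Only the forbidden pair (1, 1) is penalized; its weight is exp(-beta).
for s1 in (0, 1):
    for s2 in (0, 1):
        print(s1, s2, psi_beta(50.0, s1, s2))
```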
Lemma 7.11
The model \({\varvec{G}}_{\lambda ,\beta }\) satisfies POS for all \(d\ge 3,\lambda >0,\beta \in (0,\infty ]\).
Proof
Substituting \(\psi _\beta \) into POS and noticing that \(1-\mathrm {e}^{-\beta }>0\), we see that it suffices to verify the inequality
By Fubini’s theorem,
and analogously
Combining (7.27)–(7.29), we conclude that the l.h.s. of (7.26) is the expectation of a square. \(\quad \square \)
We proceed to prove Theorem 1.4. In light of Lemma 7.11, Theorem 2.7 readily yields a variational formula for \({\varvec{G}}_{\lambda ,\beta }\). The main issue that we have to confront is that the resulting variational problem for given \(\lambda ,\beta \) ranges over a space that depends on these parameters. In effect, it is not a priori clear that these variational problems bear any relationship to the one stated in Theorem 2.7. To deal with this issue, let \(\mathfrak {D}_{\lambda }\) be the set of all \(\pi \in \mathfrak {D}\) that are supported on \(\mu \in \mathfrak {K}\) such that \(\mu _{s,x}(1)\le \lambda /(1+\lambda )\) for all \(s,x\in [0,1]\). Further, for \(\pi \in \mathfrak {D}_{\lambda }\) we let \(\pi ^{*_\beta (N,M)}\) be the distribution obtained by the adjoining operation with respect to the weight function \(\psi _\beta \). Finally, let
Lemma 7.12
For any \(N,M\ge 0\) the map \(\pi \in \mathfrak {D}_\lambda \mapsto \pi ^{*_\infty (N,M)}\) is continuous.
As in the case of Lemma 3.7, the proof is based on arguments involving the cut metric. The details can be found in Appendix A.
Lemma 7.13
Let \(N,M\ge 0\) be integers. Uniformly for all \(\pi \in \mathfrak {D}_\lambda \) we have \(\pi ^{*_\beta (N,M)}\rightarrow \pi ^{*_\infty (N,M)}\) as \(\beta \rightarrow \infty \).
Proof
Let \(\varepsilon >0\). For any \(\mu \in \mathfrak {K}\) let \(\varvec{Z}^{N,M}_{\mu ,\beta }(s)\) be the weight from (2.16) with respect to the weight function \(\psi _\beta \). Then we see that, uniformly for all \(\mu \) and s,
Furthermore, if \(\mu _{s,x}(1)\le \lambda /(1+\lambda )\) for all s, x, then for all \(\beta \in (0,\infty ]\) we have
Combining (7.30) and (7.31) and recalling the construction of \(\mu ^{*_\beta (N,M)}\), we can construct a measurable map \(\xi :[0,1]\rightarrow [0,1]\) that preserves the Lebesgue measure such that for large enough \(\beta \) for all \(S,X\subset [0,1]\),
Thus, \(\mathscr {D}_{\Box }(\mu ^{*_\beta (N,M)},\mu ^{*_\infty (N,M)})<\varepsilon \) for large \(\beta \). Since \(\mathfrak {D}_\lambda \) is endowed with the \(W_1\)-metric, the assertion follows. \(\quad \square \)
Lemma 7.14
The set \(\mathfrak {K}_\lambda \) is closed.
Proof
We can view \(\mathfrak {K}_\lambda \) as a scaled version of the space of weak kernels. Therefore, since \(\mathfrak {K}\) is complete, so is \(\mathfrak {K}_\lambda \). Hence, any Cauchy sequence in \(\mathfrak {K}_\lambda \) has a limit within this set, and thus \(\mathfrak {K}_\lambda \) is a closed subspace of \(\mathfrak {K}\). \(\quad \square \)
Corollary 7.15
The set \(\mathfrak {D}_\lambda \) is closed.
Proof
By Lemma 7.14 there exists an increasing sequence of continuous functions \(u_n:\mathfrak {K}\rightarrow [0,1]\) that converges pointwise to \(1-\varvec{1}_{\mathfrak {K}_\lambda }\). Thus, \(\mathfrak {D}_\lambda =\bigcap _{n\ge 1}\left\{ {\pi \in \mathfrak {D}:\int u_n\,{\mathrm d}\pi =0}\right\} \) is closed in the weak topology. \(\quad \square \)
Corollary 7.16
We have \(\liminf _{n \rightarrow \infty } \frac{1}{n} \mathbb {E}\left[{\log Z ({\varvec{G}}_{\lambda ,\beta })}\right]\ge \inf _{\pi \in \mathfrak {D}^\star _\lambda }\mathscr {B}(\pi ).\)
Proof
Since \(\mathfrak {D}^\star \) is compact, Proposition 6.1 shows that there exists \(\pi \in \mathfrak {D}^\star \) such that
The construction of the \(\pi \) for which the lower bound is attained is based on Proposition 6.2, whose proof shows that the measure \(\pi _{\lambda ,\beta }\) for which the lower bound is attained is the limit of a sequence of distributions \((\pi _{\lambda ,\beta ,n})_{n\ge 1}\) that come from random factor graphs with the weight function \(\psi _\beta \). Specifically, we considered a random factor graph \({\varvec{G}}_{\lambda ,\beta ,n,\omega }\) with a random number of ‘cavities’ for a slowly growing \(\omega =\omega _n\rightarrow \infty \). With \(\mu _{n}\in \mathscr {P}(\Omega ^{\mathscr {C}})\) the joint Boltzmann distribution of the spins of the cavities \(\mathscr {C}\), the measure \(\pi _{\lambda ,\beta ,n}\) is defined as the distribution of the representation of \(\mu _{n}\) as an element of \(\mathscr {M}\). Thus, we just need to show that these representations converge to points in \(\mathfrak {K}_\lambda \).
The proof of this fact is based on Corollary 3.16. Specifically, let \(\varepsilon >0\). We obtain a decomposition \(S_1,\ldots ,S_\ell \) of \(\Omega ^{\mathscr {C}}\) into classes by pinning a random set \(\Theta _\varepsilon \) of cavities. The size \(|\Theta _\varepsilon |\) of this set depends on \(\varepsilon \) only and
Now, consider a cavity \(v\in \mathscr {C}{\setminus }\Theta _\varepsilon \), let \(1\le i\le \ell \) and consider a configuration \(\sigma \in S_i\) with \(\sigma _v=1\). Obtain \(\sigma '\) by setting \(\sigma '_v=0\) and \(\sigma '_w=\sigma _w\) for all \(w\ne v\). Then \(\sigma '\in S_i\) and the construction of the Boltzmann distribution ensures that \(\mu _n(\sigma |S_i)\le \lambda \mu _n(\sigma '|S_i)\). Hence, \(\mu _v(1|S_i)\le \lambda /(1+\lambda )\). Since \(|\Theta _\varepsilon |\) is bounded in terms of \(\varepsilon \) only, whereas \(|\mathscr {C}|\ge \omega _n/2\rightarrow \infty \) with high probability, we deduce from (7.33) that the representation \({\check{\mu }}_n\in \mathfrak {K}\) satisfies \(\mathscr {D}_{\Box }({\check{\mu }}_n,\mathfrak {K}_\lambda )<\varepsilon \) with high probability. Since, furthermore, the Wasserstein metric induces the weak topology on \(\mathfrak {D}\), we conclude that \(\pi _{\lambda ,\beta ,n}\) converges to a point \(\pi \) in the closure of \(\mathfrak {D}_\lambda \); since \(\mathfrak {D}_\lambda \) is closed by Corollary 7.15, we conclude that \(\pi \in \mathfrak {D}_\lambda \), and therefore \(\pi \in \mathfrak {D}_\lambda \cap \mathfrak {D}^\star =\mathfrak {D}^\star _\lambda \). Thus, the assertion follows from (7.32).
\(\square \)
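To make the marginal bound in the proof above explicit, here is a sketch of the one-line computation behind \(\mu _v(1|S_i)\le \lambda /(1+\lambda )\); the map \(\sigma \mapsto \sigma '\) is injective, because \(\sigma \) is recovered from \(\sigma '\) by resetting \(\sigma _v=1\):

```latex
\mu_v(1\mid S_i)
 =\sum_{\sigma\in S_i:\,\sigma_v=1}\mu_n(\sigma\mid S_i)
 \le\lambda\sum_{\sigma\in S_i:\,\sigma_v=1}\mu_n(\sigma'\mid S_i)
 \le\lambda\,\mu_v(0\mid S_i)
 =\lambda\bigl(1-\mu_v(1\mid S_i)\bigr),
```

and rearranging yields \(\mu _v(1|S_i)\le \lambda /(1+\lambda )\).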
We are ready to establish the lower bound on the free energy.
Proposition 7.17
For all \(d\ge 3,\lambda >0\) we have \(\liminf _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}_{\lambda ,\infty })]\ge \Phi _{d,\lambda }.\)
Proof
For any \(\beta ,\lambda >0\) Corollary 7.16 supplies \(\pi _{\lambda ,\beta }\in \mathfrak {D}^\star _\lambda \) such that
Now consider the sequence \((\pi _{\lambda ,\beta })_{\beta =1,2,\ldots }\). Since \(\mathfrak {D}_\lambda \) is compact, a subsequence \((\pi _{\lambda ,\beta _j})_j\) converges to \(\pi _{\lambda }\in \mathfrak {D}_\lambda \), i.e.,
Further, since \(\pi _{\lambda ,\beta _j}^{*_{\beta _j}(N,M)}=\pi _{\lambda ,\beta _j}\) for all j and \(N,M\ge 0\), Lemma 7.13 implies that for all pairs \(N,M\ge 0\),
Combining (7.35) and (7.36) with Lemma 7.12, we conclude that \(\pi _\lambda \in \mathfrak {D}^\star _\lambda \). Finally, since for every \(\beta >0\) we have \(\mathscr {B}_{d,\lambda ,\beta }(\,\cdot \,)\ge \mathscr {B}_{d,\lambda ,\infty }(\,\cdot \,)\) on \(\mathfrak {D}_\lambda \), the assertion follows from (7.34) and the continuity of the functional \(\mathscr {B}_{d,\lambda ,\infty }(\,\cdot \,)\). \(\quad \square \)
A separate argument is needed to derive the upper bound on the free energy. We will prove the following proposition by checking that the interpolation argument from Sect. 5 goes through for the hard-core model.
Proposition 7.18
For all \(d\ge 3,\lambda >0\) we have \(\limsup _{n\rightarrow \infty }\frac{1}{n}\mathbb {E}[\log Z({\mathbb {G}}_{\lambda ,\infty })]\le \Phi _{d,\lambda }.\)
With \(\varvec{\varphi }_i\) and \(\varvec{\psi }_{1,i}\) defined with respect to the hard-core weight function \(\psi _\infty \), let
Lemma 7.19
For any \(\lambda >0\) and any \(\mu \in \mathfrak {K}_\lambda \) we have \(\mathbb {E}\left[{\log Z({\varvec{G}}_{\lambda ,\infty })}\right]\le \mathscr {B}'(\mu )-\mathscr {B}''(\mu )+o(n)\).
Proof
This follows along the lines of the proof of Proposition 5.2. In that proof we required the assumption that all weight functions are strictly positive, but only in one place. Namely, we required positivity in order to expand the logarithm into a power series in equations (5.9)–(5.11). Yet this approximation is still valid in the hard-core model. Indeed, the term \(\left\langle {{\psi _{a_{\varvec{m}_t+1}}},{\mu _{{\varvec{G}}_t}}}\right\rangle \), whose logarithm we calculate in (5.9), is lower-bounded by \(1-\lambda /(1+\lambda )\), because in the hard-core model the marginal probability that a single variable node has spin one is upper-bounded by \(\lambda /(1+\lambda )\). Similarly, the arguments of the logarithms in (5.10) and (5.11) are lower-bounded by \(1-\lambda /(1+\lambda )\) because \(\mu \in \mathfrak {K}_\lambda \).
\(\square \)
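To spell out the lower bound invoked in the proof above, here is a sketch, under the assumption (as in the hard-core setup of this section) that a factor node \(a\) joins two variable nodes \(v,w\) with weight \(\psi _\infty (\sigma _v,\sigma _w)=1-\sigma _v\sigma _w\): for any \(\mu \) whose marginals put mass at most \(\lambda /(1+\lambda )\) on spin one,

```latex
\left\langle\psi_a,\mu\right\rangle
 =1-\mathbb{P}_\mu\left[\sigma_v=\sigma_w=1\right]
 \ge 1-\mathbb{P}_\mu\left[\sigma_v=1\right]
 \ge 1-\frac{\lambda}{1+\lambda}=\frac{1}{1+\lambda}>0,
```

so each logarithm is bounded below by \(-\log (1+\lambda )\) and the power series expansion remains valid.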
Proof of Proposition 7.18
Based on Lemma 7.19, we follow the proof of Proposition 5.3 to complete the proof of Proposition 7.18. Specifically, we claim that for any \(\pi \in \mathfrak {D}^\star _\lambda \),
This follows along the lines of Lemmas 5.5 and 5.6. In both cases we assumed that the weight functions are strictly positive in order to ensure that the arguments of the logarithms on the l.h.s. are bounded away from zero so that the logarithmic series applies. But the condition \(\pi \in \mathfrak {D}^\star _\lambda \) guarantees that
Thus, the same manipulations as before yield (7.37). Finally, the assertion follows from (7.37) and Lemma 7.19. \(\quad \square \)
Proof of Theorem 1.4
The theorem is an immediate consequence of Propositions 7.17 and 7.18. \(\quad \square \)
Proof of Corollary 1.5
For a graph \(G=(V(G),E(G))\) let \(\mu _{G,\lambda }\in \mathscr {P}(\{0,1\}^{V(G)})\) denote the hard-core model on G with fugacity \(\lambda \), and let \(Z_\lambda (G)\) be the corresponding partition function. Further, let \(\alpha _\lambda (G)=\sum _{v\in V(G)}\left\langle {{\varvec{\sigma }_v},{\mu _{G,\lambda }}}\right\rangle \) be the average size of an independent set drawn from \(\mu _{G,\lambda }\). Additionally, we write \(\alpha (G)\) for the maximum independent set size. It is well known that
and that
As an immediate consequence of (7.38) we obtain
Hence, (7.39) shows that for any \(\varepsilon >0\) there exists \(\lambda _0>0\) such that for all \(\lambda \ge \lambda _0\) and all d-regular graphs G we have
Applying (7.40) to the random graph \({\varvec{G}}_\lambda \) and taking expectations, we obtain
Theorem 1.4 guarantees that the sequence \(\left( {\mathbb {E}\left[{\lambda (\log Z_{\lambda +1}({\mathbb {G}})-\log Z_{\lambda }({\mathbb {G}}))}\right]}\right) _n\) converges, and thus (7.41) yields
Further, there exists a subsequence \((n_l)_{l\ge 1}\) along which \(\mathbb {E}[\alpha ({\mathbb {G}})/n]\) converges to \(\alpha _*\in [0,1]\), whence (7.41) yields
Since (7.43) holds for every \(\varepsilon >0\) for large enough \(\lambda \), we conclude that \(\lim _{\lambda \rightarrow \infty }\lambda (\Phi _{d,\lambda +1}-\Phi _{d,\lambda })\) exists. Hence, taking the limit \(\varepsilon \rightarrow 0\), and thus \(\lambda \rightarrow \infty \), in (7.42) completes the proof. \(\quad \square \)
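The proof above rests on the standard relation between \(\alpha _\lambda (G)\) and the partition function; for completeness, here is a sketch of that identity, with \(\sigma \) ranging over indicator vectors of independent sets of \(G\) and \(Z_\lambda (G)=\sum _\sigma \lambda ^{\sum _v\sigma _v}\):

```latex
\lambda\frac{\partial}{\partial\lambda}\log Z_\lambda(G)
 =\frac{1}{Z_\lambda(G)}\sum_{\sigma}\Bigl(\sum_{v\in V(G)}\sigma_v\Bigr)\lambda^{\sum_v\sigma_v}
 =\sum_{v\in V(G)}\mu_{G,\lambda}(\sigma_v=1)
 =\alpha_\lambda(G).
```

In particular \(\alpha _\lambda (G)\) is non-decreasing in \(\lambda \) and tends to \(\alpha (G)\) as \(\lambda \rightarrow \infty \), which underlies the limit taken at the end of the proof.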
Notes
Sometimes the d-regular infinite tree is referred to as the ‘Bethe lattice’. However, as Mézard and Parisi point out, the d-regular infinite tree does not provide a particularly useful framework for the study of spin interactions because almost all sites belong to the boundary of the tree. The random d-regular graph, which they and hence we call the Bethe lattice, provides a useful way out: while the local geometry around a given vertex is just a d-regular tree, at long distances this tree ‘wraps around’.
The expression (1.1) is equivalent to the possibly more familiar formula \(\mu _{{\mathbb {G}}}(\sigma )\propto \exp \left( {\beta \sum _{vw}\sigma _v\sigma _w }\right) \).
The arguments in the appendix are special cases of more general results on the cut metric from [18].
References
Abbe, E.: Community detection and stochastic block models: recent developments. arXiv:1703.10146 (2017)
Achlioptas, D., Hassani, S., Macris, N., Urbanke, R.: Bounds for random constraint satisfaction problems via spatial coupling. In: Proceedings of 27th SODA, pp. 469–479 (2016)
Achlioptas, D., Moore, C.: Random \(k\)-SAT: two moments suffice to cross a sharp threshold. SIAM J. Comput. 36, 740–762 (2006)
Achlioptas, D., Naor, A.: The two possible values of the chromatic number of a random graph. Ann. Math. 162, 1333–1349 (2005)
Achlioptas, D., Naor, A., Peres, Y.: Rigorous location of phase transitions in hard optimization problems. Nature 435, 759–764 (2005)
Achlioptas, D., Peres, Y.: The threshold for random \(k\)-SAT is \(2^k \ln 2 - O(k)\). J. AMS 17, 947–973 (2004)
Alberici, D., Contucci, P.: Solution of the monomer-dimer model on locally tree-like graphs. Rigorous results. Commun. Math. Phys. 331, 975–1003 (2014)
Auffinger, A., Jagannath, A.: Thouless–Anderson–Palmer equations for conditional Gibbs measures in the generic p-spin glass model. arXiv:1612.06359 (2016)
Banks, J., Moore, C., Neeman, J., Netrapalli, P.: Information-theoretic thresholds for community detection in sparse networks. In: Proceedings of 29th COLT, pp. 383–416 (2016)
Bapst, V., Coja-Oghlan, A.: Harnessing the Bethe free energy. Random Struct. Algorithms 49, 694–741 (2016)
Bapst, V., Coja-Oghlan, A., Hetterich, S., Rassmann, F., Vilenchik, D.: The condensation phase transition in random graph coloring. Commun. Math. Phys. 341, 543–606 (2016)
Barbier, J., Macris, N.: The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. arXiv:1705.02780 (2017)
Barbier, J., Krzakala, F., Zdeborová, L., Zhang, P.: The hard-core model on random graphs revisited. J. Phys. Conf. Ser. 473, 12–21 (2013)
Bayati, M., Gamarnik, D., Tetali, P.: Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Ann. Probab. 41, 4080–4115 (2013)
Bethe, H.: Statistical theory of superlattices. Proc. R. Soc. Lond. A 150, 552–558 (1935)
Coja-Oghlan, A., Efthymiou, C., Hetterich, S.: On the chromatic number of random regular graphs. J. Comb. Theory Ser. B 116, 367–439 (2016)
Coja-Oghlan, A., Efthymiou, C., Jaafari, N., Kang, M., Kapetanopoulos, T.: Charting the replica symmetric phase. Commun. Math. Phys. 359, 603–698 (2018)
Coja-Oghlan, A., Hahn-Klimroth, M.: The cut metric for probability distributions. arXiv:1905.13619 (2019)
Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborova, L.: Information-theoretic thresholds from the cavity method. Adv. Math. 333, 694–795 (2018)
Coja-Oghlan, A., Panagiotou, K.: Catching the \(k\)-NAESAT threshold. In: Proceedings of 44th STOC, pp. 899–908 (2012)
Coja-Oghlan, A., Panagiotou, K.: The asymptotic \(k\)-SAT threshold. Adv. Math. 288, 985–1068 (2016)
Coja-Oghlan, A., Perkins, W.: Belief Propagation on replica symmetric random factor graph models. Annales de l’institut Henri Poincare D 5, 211–249 (2018)
Coja-Oghlan, A., Perkins, W.: Bethe states of random factor graphs. Commun. Math. Phys. 366, 173–201 (2019)
Coja-Oghlan, A., Perkins, W., Skubch, K.: Limits of discrete distributions and Gibbs measures on random graphs. Eur. J. Comb. 66, 37–59 (2017)
Coja-Oghlan, A., Zdeborová, L.: The condensation transition in random hypergraph 2-coloring. In: Proceedings of 23rd SODA, pp. 241–250 (2012)
Contucci, P., Dommers, S., Giardina, C., Starr, S.: Antiferromagnetic Potts model on the Erdős-Rényi random graph. Commun. Math. Phys. 323, 517–554 (2013)
Dani, V., Moore, C.: Independent sets in random graphs from the weighted second moment method. In: Proceedings of 15th RANDOM, pp. 472–482 (2011)
Dembo, A., Montanari, A.: Gibbs measures and phase transitions on sparse random graphs. Braz. J. Probab. Stat. 24, 137–211 (2010)
Dembo, A., Montanari, A.: Ising models on locally tree-like graphs. Ann. Appl. Probab. 20, 565–592 (2010)
Dembo, A., Montanari, A., Sun, N.: Factor models on locally tree-like graphs. Ann. Probab. 41, 4162–4213 (2013)
Diaconis, P., Janson, S.: Graph limits and exchangeable random graphs. Rend. Mat. Appl. 28, 33–61 (2008)
Ding, J., Sly, A., Sun, N.: Satisfiability threshold for random regular NAE-SAT. Commun. Math. Phys. 341, 435–489 (2016)
Ding, J., Sly, A., Sun, N.: Proof of the satisfiability conjecture for large \(k\). In: Proceedings of 47th STOC, pp. 59–68 (2015)
Ding, J., Sly, A., Sun, N.: Maximum independent sets on random regular graphs. Acta Math. 217, 263–340 (2016)
Franz, S., Leone, M.: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111, 535–564 (2003)
Frieze, A., Kannan, R.: Quick approximation to matrices and applications. Combinatorica 19, 175–220 (1999)
Giurgiu, A., Macris, N., Urbanke, R.: Spatial coupling as a proof technique and three applications. IEEE Trans. Inf. Theory 62, 5281–5295 (2016)
Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233, 1–12 (2003)
Janson, S.: Graphons, cut norm and distance, couplings and rearrangements. N. Y. J. Math. 4 (2013)
Janson, S., Łuczak, T., Ruciński, A.: Random Graphs. Wiley, Hoboken (2000)
Krzakala, F., Montanari, A., Ricci-Tersenghi, F., Semerjian, G., Zdeborová, L.: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. Natl. Acad. Sci. 104, 10318–10323 (2007)
Krzakala, F., Zdeborová, L.: Potts glass on random graphs. Europhys. Lett. 81, 57005 (2008)
Lelarge, M., Oulamara, M.: Replica bounds by combinatorial interpolation for diluted spin systems. J. Stat. Phys. 173, 917–940 (2018)
Lovász, L.: Large networks and graph limits. American Mathematical Society, Providence, Rhode Island, USA (2012)
Mézard, M.: Mean-field message-passing equations in the Hopfield model and its generalizations. Phys. Rev. E 95, 022117 (2017)
Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)
Mézard, M., Parisi, G., Zecchina, R.: Analytic and algorithmic solution of random satisfiability problems. Science 297, 812–815 (2002)
Mézard, M., Parisi, G.: The Bethe lattice spin glass revisited. Eur. Phys. J. B 20, 217–233 (2001)
Mézard, M., Parisi, G.: The cavity method at zero temperature. J. Stat. Phys. 111, 1–34 (2003)
Moore, C.: The computer science and physics of community detection: landscapes, phase transitions, and hardness. arXiv:1702.00467 (2017)
Mossel, E., Neeman, J., Sly, A.: Reconstruction and estimation in the planted partition model. Probab. Theory Relat. Fields 162, 431–461 (2014)
Panchenko, D.: Spin glass models from the point of view of spin distributions. Ann. Probab. 41, 1315–1361 (2013)
Panchenko, D.: Structure of finite-RSB asymptotic Gibbs measures in the diluted spin glass models. J. Stat. Phys. 162, 1–42 (2016)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer, Berlin (2013)
Panchenko, D., Talagrand, M.: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130, 319–336 (2004)
Richardson, T., Urbanke, R.: Modern Coding Theory. Cambridge University Press, Cambridge (2008)
Talagrand, M.: Mean Field Models for Spin Glasses. Volumes I and II. Springer, Berlin (2011)
Zdeborová, L., Krzakala, F.: Statistical physics of inference: thresholds and algorithms. Adv. Phys. 65, 453–552 (2016)
Acknowledgements
The first author thanks Max Hahn-Klimroth for helpful discussions on the cut metric.
Communicated by H. Spohn
Appendices
Appendix A. Proof of Lemmas 3.6 and 7.12
The proof of Lemma 3.6 requires the regularity lemma for measures from [24] (see Footnote 3). Let \(\lambda \) denote the Lebesgue measure. For \(\mu \in \mathscr {K}\) and measurable \(S,X\subset [0,1]\) we write
with the convention that \(\mu _{S,X}\) is uniform if \(\lambda (S)\lambda (X)=0\). Further, let \(\varvec{X}=(X_1,\ldots ,X_K),\varvec{S}=(S_1,\ldots ,S_L)\) be partitions of [0, 1) into pairwise disjoint measurable sets. We write \(\#\varvec{X},\#\varvec{S}\) for the number K, L of classes, respectively. Then \(\mu \) is \(\varepsilon \)-regular with respect to \((\varvec{X},\varvec{S})\) if there exists \(R\subset [\#\varvec{X}]\times [\#\varvec{S}]\) such that the following conditions hold.
- REG1::
\(\lambda (X_i)>0\) and \(\lambda (S_j)>0\) for all \((i,j)\in R\).
- REG2::
\(\sum _{(i,j)\in R}\lambda (X_i)\lambda (S_j)>1-\varepsilon \).
- REG3::
for all \((i,j)\in R\) and almost all \(s,s'\in S_j\) we have \(\Vert {\int _{X_i}\mu _{s,x}-\mu _{s',x}\,{\mathrm d}x}\Vert _{\mathrm {TV}}<\varepsilon \lambda (X_i)\).
- REG4::
if \((i,j)\in R\), then for every \(U\subset X_i\) with \(\lambda (U)\ge \varepsilon \lambda (X_i)\) and every \(T\subset S_j\) with \(\lambda (T)\ge \varepsilon \lambda (S_j)\) we have
$$\begin{aligned} \left\| {\mu _{S_j,X_i}-\mu _{T,U}}\right\| _{\mathrm {TV}}<\varepsilon . \end{aligned}$$
A refinement of a partition \((\varvec{X},\varvec{S})\) is a partition \((\varvec{X}',\varvec{S}')\) such that for every pair \((i',j')\in [\#\varvec{X}']\times [\#\varvec{S}']\) there is a pair \((i,j)\in [\#\varvec{X}]\times [\#\varvec{S}]\) such that \((X_{i'}',S_{j'}')\subset (X_i,S_j)\).
Theorem A.1
([24]). For any \(\varepsilon >0\) there exists \(N=N(\varepsilon ,\Omega )\) such that for every \(\mu \in \mathscr {K}\) the following is true. Every partition \((\varvec{X}_0,\varvec{S}_0)\) with \(\#\varvec{X}_0+\#\varvec{S}_0\le 1/\varepsilon \) has a refinement \((\varvec{X},\varvec{S})\) such that \(\#\varvec{X}+\#\varvec{S}\le N\) with respect to which \(\mu \) is \(\varepsilon \)-regular.
Additionally, we need the strong cut metric, defined by
where S, X range over measurable subsets of the unit interval and \(\omega \in \Omega \). It is well known that \(D_{\Box }(\,\cdot \,,\,\cdot \,)\) induces a metric on \(\mathscr {K}\).
For \(\mu ,\nu \in \mathscr {K}\) we define \(\mu \oplus \nu :[0,1]^3\rightarrow \mathscr {P}(\Omega ^2)\) by \(\mu \oplus \nu _{s,x_1,x_2}=\mu _{s,x_1}\otimes \nu _{s,x_2}\). Since \([0,1]^2\) with the Lebesgue measure is isomorphic as a measure space to [0, 1] with the Lebesgue measure, we can view \(\mu \oplus \nu \) as a strong \(\mathscr {P}(\Omega ^2)\)-valued kernel. In particular, it makes sense to apply the strong cut metric to these kernels.
Proposition A.2
The map \((\mu ,\nu )\mapsto \mu \oplus \nu \) is continuous with respect to the strong cut metric.
Proof
Given \(\varepsilon >0\) pick a small enough \(\delta >0\) and assume that \(D_{\Box }(\mu ,\mu ')<\delta \). Due to the triangle inequality it suffices to prove that \(D_{\Box }(\mu \oplus \nu ,\mu '\oplus \nu )<\varepsilon \) for every \(\nu \). Thus, we need to show that for any \(X\subset [0,1]^2\), \(S\subset [0,1]\) and \(\sigma ,\tau \in \Omega \),
To this end, we may assume that \(\lambda (S)>\varepsilon ^2\) and that \(\int _S\nu _{s,x_2}(\tau ){\mathrm d}s>\varepsilon ^2\) for all \((x_1,x_2)\in X\). Further, with \(z=\int _0^1\nu _{s,x_2}(\tau ){\mathrm d}s\) consider the variable transformation
Let T be the inverse image of S under the transformation (A.2). Then we obtain for any \(X_1\subset [0,1]\),
But the assumption \(D_{\Box }(\mu ,\mu ')<\delta \) implies that the double integral on the r.h.s. of (A.3) is bounded by \(\varepsilon ^4\) in absolute value (provided \(\delta \) is small enough). Thus, (A.1) follows.
\(\square \)
Proof of Lemma 3.6
We may assume without loss that \(f(\tau )=\varvec{1}\{\tau =\sigma \}\) for some \(\sigma \in \Omega ^k\). Let \(\varepsilon >0\), pick \(\alpha =\alpha (\varepsilon )\), \(\xi =\xi (\alpha )>0\) small enough and assume that \(\mu ,\nu \in \mathscr {K}\) are such that \(D_{\Box }(\mu ,\nu )<\delta \) for a small enough \(\delta =\delta (\xi )>0\). Applying Theorem A.1 twice, we obtain \((\varvec{X},\varvec{S})\) with respect to which both \(\mu ,\nu \) are \(\xi \)-regular, and \(L=\#\varvec{X}+\#\varvec{S}\) is bounded in terms of \(\xi \) only. Let \(R'\) be the set of all pairs for which REG1–REG4 are satisfied for both \(\mu ,\nu \) and that satisfy \(\lambda (X_i)\lambda (S_j)>\xi ^8/L\). Assuming that \(\delta \) is sufficiently small, we obtain
Furthermore, consider the random variables
and define \(\mu ',\nu '\in \mathscr {K}\) as follows. To construct \(\mu '\), partition the interval [0, 1] into pairwise disjoint sets \(T_i\), \(i\in [\#\varvec{S}]\), of measure \(z_i/z\) and fill the strip \(T_i\times [0,1]\) with a suitably scaled copy of \((\mu _{s,x})_{s\in S_i,x\in [0,1]}\). Construct \(\nu '\) analogously from the \(z_i'\). Then \(\mathscr {D}_{\Box }(\mu ',f*\mu )=\mathscr {D}_{\Box }(\nu ',f*\nu )=0\). Furthermore, Proposition A.2 shows that with probability at least \(1-\alpha \) we have
provided that \(\xi ,\delta \) are chosen small enough. Since also \(z\ge \alpha \) because the function f is strictly positive, we conclude that with probability at least \(1-\alpha \) we have \(\mathscr {D}_{\Box }(\mu ',\nu ')<\alpha \). We thus obtain a coupling of the random variables \(f*\mu ,f*\nu \) under which the expected cut distance is bounded by \(\varepsilon \), as desired. \(\quad \square \)
Proof of Lemma 7.12
We proceed precisely as in the proof of Lemma 3.6, up until the point where the positivity of f is used. In the setup of Lemma 7.12, the function f may take the value 0 on kernels that take the value 1 with positive probability; however, we are assuming that the values of the kernels are bounded by \(\lambda /(1+\lambda )\), and therefore the function f only attains values that are bounded away from 0. \(\quad \square \)
Appendix B. Proof of Lemma 3.3
The proof of Lemma 3.3 requires the following operation. For functions \(f:\Omega ^{M\times N}\rightarrow \mathbb {R}\), \(g:\Omega ^{L\times N}\rightarrow \mathbb {R}\) we define
Thus, the first M rows of \(\sigma \) go into f, the last L rows go into g and we multiply the results.
We define a corresponding operation on kernels. Namely, for \(\mu ,\nu \in \mathscr {K}\) we define \(\mu \otimes \nu :[0,1]^3\rightarrow \mathscr {P}(\Omega ^2)\) by \(\mu \otimes \nu _{s,t,x}=\mu _{s,x}\otimes \nu _{t,x}\). Since \(([0,1]^2,\lambda \otimes \lambda )\) is isomorphic to \(([0,1],\lambda )\), we can view \(\mu \otimes \nu \) as a \(\mathscr {P}(\Omega ^2)\)-valued kernel, and the cut metric extends to these kernels. Since the cut metric is invariant under swapping the axes, Proposition A.2 readily yields the following.
Proposition B.1
The map \((\mu ,\nu )\mapsto \mu \otimes \nu \) is continuous with respect to the cut metric.
As a final preparation toward the proof of Lemma 3.3 we need the following fact.
Lemma B.2
For any \(f:\Omega \rightarrow \mathbb {R}\) the map \(\mu \in \mathfrak {K}\mapsto \mathbb {E}\left\langle {{f},{\mu }}\right\rangle \) is continuous.
Proof
We may assume without loss that \(f(\tau )=\varvec{1}\{\sigma =\tau \}\) for some \(\sigma \in \Omega \). Then
and it is immediate from the definition of the cut metric that the integral on the right hand side is a continuous function of \(\mu \). \(\quad \square \)
Proof of Lemma 3.3
Let \(f:\Omega ^{m\times n}\rightarrow \mathbb {R}\) and let \(\mu \in \mathfrak {K}\). Define \(\nu =(\mu ^{\oplus n})^{\otimes m}\). Then \(\nu \) is a \(\mathscr {P}(\Omega ^{mn})\)-valued kernel and the definition of \(\left\langle {{\,\cdot \,},{\,\cdot \,}}\right\rangle \) ensures that \(\mathbb {E}\left\langle {{f},{\mu }}\right\rangle =\mathbb {E}\left\langle {{f},{\nu }}\right\rangle \). This already shows that the map \(\mu \mapsto \mathbb {E}\left\langle {{f},{\mu }}\right\rangle \) is continuous, because the map \(\mu \mapsto \nu \) is continuous by Propositions A.2 and B.1 and the map \(\nu \mapsto \mathbb {E}\left\langle {{f},{\nu }}\right\rangle \) is continuous by Lemma B.2. Now fix an integer \(\ell \ge 2\) and let \(\eta =\nu ^{\otimes \ell }\). Then
and thus the continuity of the map \(\mu \mapsto \mathbb {E}\left[{\left\langle {{f},{\mu }}\right\rangle ^\ell }\right]\) follows from Proposition B.1 and Lemma B.2. \(\quad \square \)
Coja-Oghlan, A., Perkins, W. Spin Systems on Bethe Lattices. Commun. Math. Phys. 372, 441–523 (2019). https://doi.org/10.1007/s00220-019-03544-y