1 Introduction

Spin glass is a paradigm for basic research on physics of collective behaviors induced by disorder and/or frustration [2]. It is also a tool box for applied research on hard problems originated from fields outside the conventional domain of physics. The mean-field theory of spin glasses was originally developed through the replica trick and by assuming the property of replica-symmetry-breaking (RSB) [11, 36, 45]. It was later re-derived using the cavity method [26, 27] without introducing replicas. Another advantage of the cavity method is its applicability to single problem instances with fixed disorder parameters. The cavity method was successfully extended to spin glass models defined on finite-connectivity random graphs [25], confirming and extending the earlier theoretical results obtained by the replica method [29, 30, 50]. These theoretical advances paved the way for the fruitful applications of spin glass theory to interdisciplinary problems in computer science, information theory, and biological sciences [13, 23]; for example, constraint satisfaction and combinatorial optimization [18, 28], perceptual learning with discrete synapses [5], signal transmission and processing [14, 47], network structure inference [3, 24, 43, 51], compressive sensing [9, 17].

The most crucial simplification made in the mean-field RSB cavity method is the Bethe-Peierls approximation [4, 38]. Roughly speaking, this approximation ignores all the possible correlations among the set of vertices that are nearest-neighbors to a specified central vertex. There have been several theoretical attempts to include the effects of correlations to the cavity method [6, 31, 37, 41]. A simple and general scheme of partition function expansion was introduced in [52] along this research line, following the initial work of Chertkov and Chernyak [6] on Ising spin glasses. By this expansion scheme, the mean-field free energy expression at each level of replica-symmetry-breaking can be formally derived without imposing any physical assumptions, and the correction contributions to these mean-field free energies are expressed as a sum over looped subgraphs. The strong constraint that the subgraphs with dangling edges all have completely zero correction contribution to the free energies leads to a set of message-passing equations, such as the belief-propagation equation and the survey-propagation equation, that must be satisfied by the auxiliary probability distribution functions introduced in the expansion scheme.

For readers not familiar with spin glass theories, the partition function expansion scheme [52] can serve as a straightforward mathematical approach to the essentials of the RSB mean field theory.

Finite-dimensional spin glass models are still very challenging for theoretical studies. One of the major reasons is the abundance of short loops and the associated (possibly strong) local correlations. Short loops are also presented in numerous graphical models derived from real-world applications in optimization, inference, and constraint satisfaction problems. For such spin glass systems, the free energy correction contributions from loopy subgraphs may be comparable to the mean-field free energy values. As a consequence, the prediction power of the mean-field theory can be severely damaged. A powerful conceptual framework for finite-dimensional lattice models is the cluster variation method, which was invented by Kikuchi [15] and later further developed by many authors (reviewed in [34, 39]). The key insight behind the cluster variation method is that, in a finite-dimensional system, the correlation between two vertices decays exponentially with the distance (as long as the system is not approaching a continuous phase transition), and therefore short-range correlations dominate the statistical property of the system. Short-range correlations among clusters of neighboring vertices are considered in a variational way in the cluster variation method [1, 15, 34, 39]. As an extension of the cluster variation method, Yedidia and coworkers proposed a region graph method [53] to tackle local correlations in finite-dimensional graphical models with more computational flexibility. A set of regions is specified in the region graph representation of a spin glass model [53]. Each region contains a subset of vertices and some (or all) of the interactions within these vertices. A partial order can be defined among these regions, which is represented by a set of directed edges between pairs of regions. A variational free energy is defined on the region graph, the minimization of this free energy leads to a set of generalized belief-propagation equations [53]. Recently, the generalized belief-propagation was applied to the two-dimensional (2D) Edwards-Anderson model by Ricci-Tersenghi, Mulet and co-workers [8, 21, 40].

As demonstrated in a very recent paper [56], the partition function expansion scheme of [52] can also be applied to spin glasses in the region graph representation. In the present work we give a detailed discussion on how to construct a region graph RSB mean-field theory for spin glass models and also to obtain the correction expressions to such a theory. Two sets of message-passing equation, namely the region graph belief-propagation equation and region graph survey-propagation equation, are derived through the partition function expansion. We also prove that, for non-redundant region graphs, the region graph Bethe-Peierls free energy as derived through the partition function expansion is equivalent to the variational free energy functional used in [53]. The region graph belief-propagation (rgBP) equation is applied to the 2D square-lattice Ising model and the 2D square-lattice Edwards-Anderson (EA) spin glass model, and our numerical results are compared with the results obtained by the conventional belief-propagation (BP) procedure.

The prediction on the Ising model’s Curie temperature by the rgBP is better than the prediction of the conventional BP. The exactly known transition temperature can be approached if region graphs with sufficiently large maximal regions are used in the rgBP. When applying the rgBP to the square-lattice EA model at sufficiently low temperature, a most interesting numerical observation for us is that, the rgBP identifies many small collective domains of vertices in the square lattice (see Fig. 11). Vertices in each of these collective domains are strongly correlated (and probably change states collectively), while the remaining vertices outside of all these small localized domains serve as paramagnetic background. Such highly heterogeneous patterns are in some respect similar to the patterns of dynamical heterogeneity in dense liquid when approaching the glass transition (reviewed in [10, 12]). A key difference is the collective domains in the square lattice do not move in space, only their areas increase as temperature decreases. These collective domains bring in many new time scales to the system’s low-temperature dynamics, they should be essential to understand many of the fascinating low-temperature glass-like behaviors of this system.

For the square-lattice EA model, the existence of many localized collective domains above the paramagnetic background naturally points to a way of improving the power of the rgBP. The basic idea is to use larger maximal regions for the identified collective domains, while the sizes of the maximal regions in the remaining parts of the lattice keep to be as small as possible. This hybrid adaptive strategy will be explored in a future work. We hope it will be helpful in achieving deeper insights on the heterogeneous (but globally still paramagnetic) behaviors of this and other 2D spin glass systems.

The next section introduces the general graphical model and its factor graph and region graph representations, and briefly reviews the cluster variation method. Section 3 derived the region graph replica-symmetric (RS) mean-field theory as well as the region graph belief-propagation equation; the connection with the generalized belief-propagation of [53] is also discussed here. Section 4 derived the region graph first-replica-symmetry-breaking (1RSB) mean-field theory and the associated region graph survey-propagation equation. The RS mean-field theory (rgBP) is then applied to the 2D Ising model and the 2D Edwards-Anderson model in Sect. 5. We conclude this work and list some possible future projects in Sect. 6. The paper have four Appendices AD.

2 The Model and Graphical Representations

Consider a very general model system of N particles (i=1,2,…,N) interacting with each other and with the environment. Each particle i is fixed in space and therefore has no translational degrees of freedom, it is fully characterized by an internal state variable x i . For example, in Ising models, the state variable is binary, x i =±1; in Potts models, the internal state can choose among Q different values, x i ∈{1,2,…,Q}; in Heisenberg models, x i is a three-dimensional continuous vector of unit length. In this paper we assume, for notational simplicity, that the state variables x i are discrete and can take only a finite number of different values. The microscopic configuration (state) \(\underline{x}\) of the whole system is defined by the states of all its particles, \(\underline{x} \equiv \{x_{1}, x_{2}, \ldots, x_{N}\}\). The configuration energy \(H(\underline{x})\) is supposed to have the following additive form

$$ H(\underline{x}) = \sum _{i=1}^{N} E_i(x_i) + \sum _{a=1}^{M} E_a( \underline{x}_{\partial a}) . $$
(1)

On the right side of (1), the first term is contributed by external forces (fields), with each energy E i depending only on a single particle i (if i is free of external forces, then E i (x i )≡0). The second term is contributed by M internal interactions (a=1,2,…,M), each having an energy E a . Let us denote by ∂a the set of particles that are involved in interaction a and by \(\underline{x}_{\partial a} \equiv \{ x_{i} | i\in \partial a\}\) a joint state of the particles in this set. The internal interaction energy E a is a function only of \(\underline{x}_{\partial a}\). For example, in a two-body Ising model an internal energy has the form E a =−J a x i x j with J a being a coupling constant, then ∂a={i,j} and \(\underline{x}_{\partial a}=\{x_{i}, x_{j}\}\).

After equilibrium is reached in an environment of temperature T, the probability of the system being in a configuration \(\underline{x}\) obeys the Boltzmann distribution

$$ P_B(\underline{x}) = \frac{1}{Z(\beta)} e^{-\beta H(\underline{x})} , $$
(2)

where β≡1/(k B T) is the inverse temperature, k B being the Boltzmann constant. The partition function Z(β) is expressed as

$$ Z(\beta) \equiv \sum_{\underline{x}} e^{-\beta H(\underline{x})} = \sum _{\underline{x}} \prod _{i=1}^{N} \psi_i(x_i) \prod _{a=1}^{M} \psi_a( \underline{x}_{\partial a}) , $$
(3)

where ψ i and ψ a are, respectively, the Boltzmann factor for the external field on particle i and the internal interaction a,

$$ \psi_i(x_i) \equiv e^{-\beta E_i(x_i)}, \qquad \psi_a(\underline{x}_{\partial a}) \equiv e^{-\beta E_a(\underline{x}_{\partial a})} . $$
(4)

From Z(β) we can define the equilibrium free energy F(β) as

$$ F(\beta) \equiv -\frac{1}{\beta} \ln Z(\beta) = \sum _{\underline{x}} P_B(\underline{x}) H(\underline{x}) + k_B T \sum _{\underline{x}} P_B(\underline{x}) \ln P_B(\underline{x}) . $$
(5)

The expression (5) for the equilibrium free energy can be extended to a general non-equilibrium situation. Given an arbitrary probability distribution \(\rho(\underline{x})\) on the configurations of the model (1), the Shannon entropy functional is defined as [7]

$$ S[\rho] \equiv - k_B \sum _{\underline{x}} \rho(\underline{x}) \ln \rho(\underline{x}) , $$
(6)

and the free energy functional F[ρ] is

$$ F[\rho ] \equiv \sum_{\underline{x}} \rho( \underline{x}) H(\underline{x}) - T S[\rho] . $$
(7)

It is easy to check that the absolute minimum of F[ρ] is equal to the equilibrium free energy F(β) and this minimum is achieved only when \(\rho(\underline{x})= P_{B}(\underline{x})\).

2.1 The Factor Graph Representation

The model defined by (1) can be represented conveniently by a factor graph G of variable nodes (shown as circles, denoting the N particles i,j,k,…), function nodes (shown as squares, denoting the M internal interactions a,b,c,…), and edges between pairs of variable and function nodes (i,a).Footnote 1 The factor graph G is a bipartite graph: all the edges are between a variable node and a function node, and an edge (i,a) exists between a variable node i and a function node a if and only if variable i is involved in the internal interaction represented by a. A comprehensive review on factor graphs can be found in [19].

As a simple illustration, we show in Fig. 1 part of the factor graph for the two-dimensional Edwards-Anderson spin glass model [11] on a square lattice. The energy function of the EA model is defined as

$$ H(\underline{\sigma}) = - \sum _{(i,j)} J_{i j} \sigma_i \sigma_j - \sum _{i} h_i^0 \sigma_i , $$
(8)

where the internal energies are due to ferromagnetic (J ij >0) or anti-ferromagnetic (J ij <0) spin couplings along the edges (i,j) of the square lattice, and \(h_{i}^{0}\) is the external magnetic field on particle i. We use σ i ∈{−1,+1} instead of x i to denote the binary spin state of particle i. The spin configuration of the whole system is \(\underline{\sigma}\equiv \{\sigma_{1}, \sigma_{2}, \ldots, \sigma_{N}\}\).

Fig. 1
figure 1

The factor graph for the two-dimensional Edwards-Anderson model. Each variable node i (circle) represents a lattice site i, it has a spin state σ i and an energy \(E_{i}(\sigma_{i}) = - h_{i}^{0} \sigma_{i}\) caused by an external field \(h_{i}^{0}\); each function node a (square) between two variable nodes i and j represents a spin coupling with energy E a (σ i ,σ j )=−J ij σ i σ j

2.2 A Brief Summary of the Cluster Variation Method

The cluster variation method (CVM), as originally proposed by Kikuchi [15], is an approximate method to calculate the Shannon entropy (6). It decomposes S[ρ] into contributions from different clusters of variable nodes (see [1] for an easy-to-access introduction to CVM). A cluster in the CVM is defined as a non-empty subset of the N variable nodes. The total number of possible clusters is 2N−1. For two clusters C 1 and C 2, we say that C 1C 2 (and C 2C 1) if and only if C 1 is a subset of C 2 (in cases that C 1C 2 and also C 1C 2, then C 1=C 2). If C 1 is a strict subset of C 2, we denote as C 1<C 2 (and C 2>C 1). This partial ordering corresponds to the following function ζ(C 1,C 2)

$$ \zeta(C_1, C_2) = \left\{ \begin{array}{l@{\quad }l} 1 , & \mbox{for}\ C_1 \leq C_2 ; \\ 0 , & \mbox{otherwise} . \end{array} \right. $$
(9)

This function can be regarded as a matrix, and it is invertible:

$$ \sum _{C_2} \zeta(C_1, C_2) \mu(C_2, C_3) = \sum _{C_2} \mu(C_1, C_2) \zeta(C_2, C_3) = \delta(C_1, C_3) , $$
(10)

where δ(C 1,C 2) is the Kronecker delta such that δ(C 1,C 2)=1 if C 1=C 2 and =0 otherwise. The function μ(C 1,C 2) is the Möbius inversion function [42] defined as

$$ \mu(C_1, C_2) = \left\{ \begin{array}{l@{\quad }l} (-1)^{|C_2|-|C_1|} , & \mbox{for}\ C_1 \leq C_2 ; \\ 0 , & \mbox{otherwise} . \end{array}\right. $$
(11)

In the above equation, |C| denotes the number of elements in cluster C.

The configuration of a cluster C is defined as the collection of states of all its elements and is denoted as \(\underline{x}_{C} \equiv \{x_{i} : i\in C\}\). Given a probability distribution \(\rho(\underline{x})\) for all the N variable nodes, the marginal distribution for the configuration \(\underline{x}_{C}\) is

$$ \rho_C(\underline{x}_C) \equiv \sum _{\underline{x} \backslash \underline{x}_C} \rho(\underline{x}) . $$
(12)

Following (6), we can define for each cluster C its Shannon entropy functional as

$$ S_C [\rho_C] \equiv - k_B \sum _{\underline{x}_C} \rho_C (\underline{x}_C) \ln \rho_C (\underline{x}_C) . $$
(13)

Let us define an entropy increment ΔS C for each cluster C as (see, for example, [1])

$$ \Delta S_C \equiv \sum _{C^\prime \leq C} (-1)^{|C|-|C^\prime|} S_{C^\prime} = \sum _{C^\prime } S_{C^\prime} \mu\bigl(C^\prime, C \bigr) . $$
(14)

Then from (14) and (10) we obtain that

$$ S_C = \sum _{C^\prime } \Delta S_{C^\prime} \zeta\bigl(C^\prime, C \bigr) = \sum _{C^\prime \leq C} \Delta S_{C^\prime} . $$
(15)

In other words, the entropy S C of each cluster C is the sum of the entropy increments of all its sub-clusters (including C itself).

If we take C as the set of all the variable nodes in G, the above equation (15) then states that, the entropy functional S[ρ] of the whole system is the sum of the entropy increments of all the 2N−1 different clusters, i.e., S[ρ]=∑ C ΔS C [ρ C ]. There are an exponential terms in this summation. The insight of Kikuchi [15] was that, the entropy increment ΔS C often decays quickly with the size |C| of the clusters, therefore a good approximation of S[ρ] can be obtained by summing over only a subset of small clusters. Let us choose a set of maximal clusters \(C_{1}^{(m)}, C_{2}^{(m)}, \ldots, C_{p}^{(m)}\). These p clusters are maximal in the sense that any of them is not a sub-cluster of any another cluster in this chosen set. These p maximal clusters and all their sub-clusters form a set, denoted as P. We then have the following approximate expression for the entropy functional:

$$ S [\rho ] \approx \sum _{C \in P} \Delta S_C [\rho_C ] . $$
(16)

This is the CVM approximation.

Inserting (14) into the above expression, we get

$$ S [\rho ] \approx \sum _{C\in P} a_C S_C [ \rho_C ] , $$
(17)

where the coefficient a C of a cluster C is expressed as

$$ a_C = \sum _{C^\prime \in P} \mu\bigl(C, C^\prime\bigr) = \sum _{\{C^\prime | C^\prime \in P, C^\prime \geq C\}} (-1)^{|C^\prime| - |C|} . $$
(18)

It is easy to check that each of the p maximal clusters (say C (m)) has the coefficient \(a_{C^{(m)}} = 1\). For other clusters of P, their coefficients can be calculated iteratively as

$$ a_C = 1 - \sum _{\{C^\prime | C^\prime \in P, C^\prime > C \}} a_{C^\prime} . $$
(19)

The applications of the CVM method were reviewed in [34, 39]. This method can be combined with Suzuki’s coherent anomaly method [46] to compute the critical exponents of a given finite-dimensional system.

2.3 The Region Graph Representation

In the CVM approximation (16), for each maximal cluster, all its sub-clusters are included into the cluster set P. This requirement was relaxed in the work of Yedidia and co-authors, who proposed a more flexible region graph representation [53]. A region graph R is formed by regionsFootnote 2 and a set of directed edges between the regions. A region α of a factor graph G includes a set of variable nodes and a set of function nodes, with the condition that, if a function node a belongs to a region α, then all the variable nodes in the set ∂a also belong to the region α. A configuration of a region α is defined by the states of all the variable nodes in this region, \(\underline{x}_{\alpha}\equiv \{x_{i} | i\in \alpha\}\).

If there is a directed edge pointing from a region μ to ν in the region graph R, then it must be the case that ν is a sub-region of μ (that is, all the variable nodes and function nodes of region ν also belong to region μ). We use μν to emphasize the directness of the edge; sometimes if the direction of an edge is unimportant for the discussion, the notion (μ,ν) is also used. It should be emphasized that, a region ν being a sub-region of another region μ does not necessarily indicate the existence of a directed edge from μ to ν. If there is a directed edge from region μ to ν, then μ is regarded as a parent of ν and ν a child of μ. If there is a directed path from region α to region ν, we say that α is an ancestor of ν and ν is a descendant of α. The ancestor-descendant relationship between two regions α and ν is denoted by ν<α and α>ν. (If ν is not a descendant of α, such a relationship does not exist even if ν is a sub-region of α.)

Each region γ is assigned a counting number c γ , which is determined recursively by

$$ c_\gamma = 1 - \sum _{\{\alpha| \alpha \in R, \alpha>\gamma\}} c_\alpha . $$
(20)

Notice the similarity between (20) and (19). In the region graph R, the subgraph R i induced by a variable node i is defined as the subgraph that include all the regions containing i and all the directed edges between these regions. Similarly, R a denotes the region graph induced by function node a, it includes all the regions containing a and all the directed edges between these regions. The region graph R and its associated counting numbers {c γ } are required to satisfy the following region graph conditions [53]:

  1. (1)

    For any variable node i, the induced subgraph R i is connected, and the sum of counting numbers within this subgraph is unity:

    $$ \sum _{\gamma \in R_i} c_\gamma = 1 . $$
    (21)
  2. (2)

    For any function node a, the induced subgraph R a is connected, and the sum of counting numbers within R a is unity:

    $$ \sum _{\gamma \in R_a} c_\gamma = 1 . $$
    (22)

Given a factor graph G, one may construct many different region graphs R, all of them satisfying the region graph conditions. A region graph R is considered as being non-redundant if it satisfies the following additional condition: the region subgraph R i induced by any variable node i contains no loops (it is a tree of regions). Region graphs which do not satisfy this tree condition are referred to as being redundant. As we will discuss in detail, non-redundant region graphs have some nice properties for partition function expansion.

A simple region graph R is shown in Fig. 2 for the two-dimensional EA model. Although this region graph contains many loops at the region level, it is non-redundant as can be easily checked.

Fig. 2
figure 2

A non-redundant region graph R for the factor graph shown in Fig. 1. Only a part of the full region graph is shown. There are three types of regions: each ‘square’ region (e.g., region α) contains n×n variable nodes and 2n(n−1) function nodes (in this example, n=2), its counting number is c=1; each ‘rod’ region contains n variable nodes and n−1 function nodes, its counting number is c=−1; and each ‘stripe’ region (e.g., region μ) contains n×2 variable nodes and n function nodes, with counting number c=1. Each stripe region is connected to two rod regions, and each rod region connects to one stripe region and one square region. The short red lines between two regions indicate the parent-child relationship (the directions of these edges are not shown, as they are obvious). In this region graph, variable node i appears in 5 different regions, and function node a appears in 3 different regions (Color figure online)

3 Partition Function Expansion on a Region Graph

Given a region graph R for a factor graph G, we now expand the equilibrium partition function (3) as a sum over the contributions of region subgraphs of R. The mathematical framework of [52] will be followed.

3.1 Introducing Auxiliary States of Variable Nodes and Removing Redundancy

First of all, because of the constraints (21) and (22), the partition function can be expressed as

$$ Z(\beta) = \sum _{\underline{x}} \prod _{\alpha \in R} \biggl[ \prod _{i\in \alpha} \psi_i(x_i) \prod _{a\in \alpha} \psi_a(\underline{x}_{\partial a}) \biggr]^{c_\alpha} . $$
(23)

The microscopic configuration of each region α is denoted as \(\underline{x}_{\alpha}\equiv \{x_{i}^{\alpha}| i \in \alpha\}\), where \(x_{i}^{\alpha}\) is the state of variable node i in region α. For a function node a in region α, the variable states at its neighborhood are denoted as \(\underline{x}_{\partial a}^{\alpha}\equiv \{ x_{i}^{\alpha}| i \in \partial a\}\). A Boltzmann factor \(\varPsi_{\alpha}(\underline{x}_{\alpha})\) is defined for region α as

$$ \varPsi_\alpha(\underline{x}_\alpha) = \prod _{i\in \alpha} \psi_i^{c_\alpha} \bigl(x_i^\alpha\bigr) \prod_{a\in \alpha} \psi_a^{c_\alpha}\bigl(\underline{x}_{\partial a}^\alpha \bigr) . $$
(24)

A variable node i may belong to two or more regions (say α,γ,…). If this is the case, we regard the states of node i in the different regions, \(x_{i}^{\alpha}, x_{i}^{\gamma}, \ldots\), as different variables. With the introduction of these new variables, the partition function expression (23) is re-written as

$$ Z(\beta) = \sum _{\underline{x}} \sum _{\{\underline{x}_\alpha\}} \prod _{\alpha \in R} \biggl( \varPsi_\alpha(\underline{x}_\alpha) \prod _{i\in \alpha} \delta\bigl(x_i^\alpha, x_i\bigr) \biggr) = \sum _{\{\underline{x}_\alpha\}} \prod _{\alpha \in R} \varPsi_\alpha( \underline{x}_\alpha) \prod_{i=1}^{N} \biggl[\, \sum _{x_i} \prod _{\gamma\in R_i} \delta\bigl(x_i^\gamma, x_i\bigr) \biggr]. $$
(25)

The Kronecker delta functions \(\delta(x_{i}^{\alpha}, x_{i})\) guarantee that, the partition function is contributed only by those region configurations \(\{\underline{x}_{\alpha}| \alpha \in R\}\) for which the states \(x_{i}^{\alpha}, x_{i}^{\gamma}, \ldots\) of each variable node i in different regions are equal to the same value.

Let us for the moment focus on the region subgraph R i induced by variable node i. Assume R i contains n i regions. The expression within the square brackets in (25) requires that the states of variable node i take the same value among the n i regions, which is equivalent to (n i −1) constraints. These (n i −1) constraints of each variable node i can be conveniently implemented in the following way: (1) Attach on each edge (μ,ν) of region graph R a variable node set, the cross-linker set μ#ν, which is initially empty. (2) For each variable node i∈{1,2,…,N}, choose (n i −1) different edges of the induced region graph R i in such a way that the n i regions of R i form a connected tree with these edges; then for each of these (n i −1) chosen edges (μ,ν), add i to its cross-linker set μ#ν. (3) After all the cross-linker sets are constructed, for each variable node j in any cross-linker set μ#ν, a constraint \(\delta(\sigma_{j}^{\mu}, \sigma_{j}^{\nu})\) is applied, forcing the states of j to be the same in the two connected regions μ and ν.

The region graph shown in Fig. 2 is non-redundant, and there is no need to remove redundancy. The left panel of Fig. 3 shows a redundant region graph R for the 2D Edwards-Anderson model. This region graph has three types of regions, and each variable node (such as node i) appears in 9 different regions. The region subgraph R i induced by variable node i contains 12 directed edges. One particular way of removing redundancy is indicated by the X and XX symbols on the directed edges of R. After removing redundancy, the resulting graph R # is shown in the right panel of Fig. 3. This graph R # is actually equivalent to the region graph shown in Fig. 2 (we will return to this point later).

Fig. 3
figure 3

Removing redundancy from a region graph. (Left panel) a redundant region graph R for the factor graph shown in Fig. 1. There are three types of regions in this region graph, as indicated by the three different colors. Each variable node (e.g., i,j,k,l) appears in 9 different regions. A X symbol on a directed edge μν (between a parent region μ and a child region ν) means that the single common variable node (say i) of μ and ν is not included into the cross-linker set μ#ν (that is, μ#ν=∅). A XX symbol on the edge μν means that the two common variable nodes (say i,j) of μ and ν are not included into the set μ#ν (again, μ#ν=∅). (Right panel) the resulting non-redundant graph R # after all the redundancy in R has been removed (Color figure online)

By this construction, it is easy to check that the states of each variable node i is associated with (n i −1) constraints. For some edges (μ,ν) of the region graph R, the cross-linker set might be empty, μ#ν=∅. We can remove all such edges from the region graph R, and the resulting graph is denoted as R # (see the right panel of Fig. 3 for an example). The partition function (25) is re-expressed as

$$ Z(\beta) = \sum _{\{\underline{x}_\alpha\}} \prod _{\alpha \in R} \varPsi_\alpha(\underline{x}_\alpha) \prod _{(\mu,\nu)\in R^\#} \biggl[ \prod _{i\in \mu\#\nu} \delta\bigl(x_i^\mu, x_i^\nu\bigr) \biggr] . $$
(26)

We should emphasize that the above-mentioned procedure of removing redundancy does not change the counting numbers, it only assigns a cross-linker set μ#ν (might be empty) on each directed edge (μ,ν) of the region graph R.

3.2 Introducing Auxiliary Cavity Probability Functions

For each edge (μ,ν) of the region graph R # we now introduce two auxiliary probability distribution functions, \(p_{\mu\rightarrow \nu}(\underline{x}_{\mu\#\nu}^{\nu})\) and \(p_{\nu \rightarrow \mu}(\underline{x}_{\mu\#\nu}^{\mu})\). They are non-negative and normalized but otherwise arbitrary. The microscopic configuration \(\underline{x}_{\mu\#\nu}^{\mu}\) denotes the states of variable nodes of set μ#ν in region μ, i.e., \(\underline{x}_{\mu\#\nu}^{\mu}\equiv \{ x_{i}^{\mu}| i \in \mu\#\nu\}\); similarly, \(\underline{x}_{\mu\#\nu}^{\nu}\equiv \{x_{i}^{\nu}| i\in \mu\#\nu\}\). We further rewrite (26) as

$$ Z(\beta) = \sum _{\{\underline{x}_\alpha\}} \prod _{\alpha} \biggl[ \varPsi_\alpha (\underline{x}_\alpha ) \prod_{\gamma \in \partial^{\#} \alpha} p_{\gamma\rightarrow \alpha}\bigl( \underline{x}_{\alpha\#\gamma}^\alpha\bigr) \biggr] \prod _{(\mu, \nu)\in R^{\#}} \biggl[ \frac{\prod_{i\in \mu\#\nu} \delta(x_i^\mu, x_i^\nu)}{ p_{\mu\rightarrow \nu}(\underline{x}_{\nu\#\mu}^\nu) p_{\nu\rightarrow \mu}(\underline{x}_{\mu\#\nu}^\mu)} \biggr] , $$
(27)

where # α denotes the set of nearest neighboring regions of region α in region graph R #.

It is helpful to define two partition function factors as follows:

(28)
(29)

Then (27) is written as the following simple form

$$ Z(\beta) = Z_0 \sum _{\{\underline{x}_\alpha\}} \prod _{\alpha} w_\alpha(\underline{x}_\alpha) \prod_{(\mu, \nu)\in R^\#} \bigl[1+ \Delta_{(\mu,\nu)} \bigl( \underline{x}_{\mu\#\nu}^\mu, \underline{x}_{\nu\#\mu}^\nu \bigr) \bigr] , $$
(30)

where the coefficient Z 0 is equal to

$$ Z_0 = \prod _{\alpha\in R} Z_\alpha \prod_{(\mu,\nu)\in R^\#} \frac{1}{Z_{(\mu, \nu)}} , $$
(31)

and Δ(μ,ν) is expressed as

$$ \Delta_{(\mu, \nu)}\bigl(\underline{x}_{\mu\#\nu}^\mu, \underline{x}_{\nu\#\mu}^\nu\bigr) = \frac{Z_{(\mu,\nu)} \prod_{i\in \mu\#\nu} \delta(x_i^\mu, x_i^\nu)}{p_{\mu\rightarrow \nu}(\underline{x}_{\nu\#\mu}^\nu) p_{\nu\rightarrow \mu}(\underline{x}_{\mu\#\nu}^\mu)} - 1 , $$
(32)

and \(w_{\alpha}(\underline{x}_{\alpha})\) is a probability distribution defined as

$$ w_\alpha (\underline{x}_\alpha) \equiv \frac{1}{Z_\alpha} \varPsi_\alpha(\underline{x}_\alpha) \prod _{\gamma \in \partial^\# \alpha} p_{\gamma \rightarrow \alpha} \bigl(\underline{x}_{\alpha\#\gamma}^\alpha \bigr) . $$
(33)

We regard Δ(μ,ν) as small quantities and expand the edge-product of (30), and finally obtain

$$ Z(\beta) = Z_0 \biggl[ 1+ \sum _{r\subseteq R^\#} L_r \biggr] , $$
(34)

where r denotes any subgraph of the region graph R #, which contains a subset of the edges of R # and the associated regions. The correction contribution L r of a region subgraph r is expressed as

$$ L_r = \sum _{\{\underline{x}_\gamma | \gamma \in r\}} \prod _{\gamma \in r} w_\gamma(\underline{x}_\gamma) \prod _{(\mu, \nu) \in r} \Delta_{(\mu,\nu)}\bigl(\underline{x}_{\mu\#\nu}^\mu, \underline{x}_{\nu\#\mu}^\nu\bigr) . $$
(35)

3.3 Region Graph Belief-Propagation Equation

Many of the subgraphs of the region graph R # contain dangling edges. A dangling edge (μ,ν) in a subgraph r is such an edge that if it is cut off from subgraph r, either region μ or region ν (or both) will be an isolated region of r. For a dangling edge (μ,ν) of subgraph r, suppose the region μ has only this edge attached to it. The correction contribution L r of subgraph r is calculated to be

(36)

where \(\hat{p}_{\mu\rightarrow \nu}(\underline{x}_{\nu\#\nu}^{\nu})\) is a probability distribution function determined by

(37)
(38)

In the above expressions, # μν denotes the set of nearest-neighboring regions of μ in the region graph R #, but with ν being removed from this set.

From (36) we know that, if the auxiliary probability distributions {p μν ,p νμ } are chosen as a fixed-point of the following equation,

$$ p_{\mu\rightarrow \nu}\bigl(\underline{x}_{\nu\#\mu}^\nu \bigr) = \frac{ \sum_{\underline{x}_\mu} \varPsi_\mu(\underline{x}_\mu) \prod_{\gamma\in \partial^\# \mu\backslash \nu} p_{\gamma\rightarrow \mu} (\underline{x}_{\mu\#\gamma}^\mu) \prod_{i\in \mu\#\nu} \delta(x_i^\mu, x_i^\nu) }{ \sum_{\underline{x}_\mu} \varPsi_\mu(\underline{x}_\mu) \prod_{\gamma \in \partial^\# \mu\backslash \nu} p_{\gamma\rightarrow \mu} (\underline{x}_{\mu\#\gamma}^\mu)} , $$
(39)

then all the subgraphs r of the region graph R # with at least one dangling edges will have zero correction contribution. Then the free energy of the system is expressed as

$$ F(\beta) = F_{0} -\frac{1}{\beta} \ln \biggl[1+ \sum _{r^{\mathit{loop}} \subseteq R^\#} L_{r^{\mathit{loop}}} \biggr] , $$
(40)

where r loop denotes a subgraph of R # that is free of dangling edges (in r loop, each region has at least two edges attached). Equation (39) is referred to as the region graph belief-propagation (rgBP) equation. It ensures that all the subgraphs of R # with one or more dangling edges have zero correction contribution to the partition function Z(β), therefore greatly reduces the number of terms in the free energy correction expression.

It is not difficult to check that, the expression of the rgBP equation (39) for the region graph R # shown in the right panel of Fig. 3 is equivalent to that of the rgBP equation for the non-redundant region graph shown in Fig. 2. In this sense, R # of Fig. 3 (right panel) is equivalent to R of Fig. 2.

3.4 Region Graph Bethe-Peierls Free Energy F 0

When all the loop correction contributions in (40) are neglected, the remaining term F 0 is an approximation to the true free energy F(β). We refer to F 0 as the region graph Bethe-Peierls free energy, its expression is

$$ F_{0} = \sum _{\alpha \in R} f_\alpha - \sum _{(\mu,\nu)\in R^\#} f_{(\mu, \nu)} , $$
(41)

with

(42)

The free energy F 0 can also be regarded as a functional of the probability functions {p μν ,p νμ }. A nice property of this functional is that, the partial derivative of F 0 with respective to any probability function p μν vanishes at a fixed point of the rgBP equation (39). This can easily be checked by showing that

(43)

where the probability function \(\hat{p}_{\nu\rightarrow \mu}\) has been defined through (38).

From the variational property of F 0, we know that each fixed point of the rgBP equation (39) locates a stationary point of the functional F 0 and vice versa, namely there is a one-to-one correspondence between the stationary points of the F 0 functional and the fixed points of the rgBP equation. A stationary point of F 0 could be a minimum, it could also be a maximum, or be a saddle point.

Using the set of probability functions {p μν ,p νμ } as order parameters, the functional F 0 gives an approximate description of the free energy landscape of model (1). If F 0 has only one stationary point (expected to be a minimum), then the rgBP equation has only one fixed point (solution). This simple situation can be referred to as the ‘replica-symmetric’ (RS) case, in connection to the physics approaches based on the replica method and the cavity method [23, 27]. In many non-trivial problems, however, the F 0 functional may turn out to be very rugged in shape and have many stationary points, then (39) has multiple solutions. This later complex situation can be referred to as the ‘replica-symmetry-broken’ (RSB) case. We will discuss this RSB case in the next Sect. 4. A schematic illustration of the qualitative change in shape of F 0 is shown in Fig. 4.

Fig. 4
figure 4

The region graph Bethe-Peierls free energy F 0 as a functional of probability functions may have only a single stationary point (left panel), or it may have multiple stationary points including minima, maxima, and saddles (right panel). This qualitative change in functional shape is induced by variation in environmental temperature or other control parameters

The free energy F 0 can be expressed as a sum over the contributions of all the regions. For any region α of the region graph R, we have the identity c α +∑ γ>α c γ =1. Then F 0 can be re-written as

(44)

Each region α contributes a term \(c_{\alpha}\tilde{F}_{\alpha}\) to F 0. The expression of \(\tilde{F}_{\alpha}\) is

$$ \tilde{F}_\alpha = f_\alpha - \sum _{(\alpha\rightarrow \gamma) \in R^\#} f_{(\alpha,\gamma)} + \sum _{\mu < \alpha} \biggl[ f_\mu - \sum _{(\mu\rightarrow \nu)\in R^\#} f_{(\mu,\nu)} \biggr] . $$
(45)

Notice that (45) has the same expression as (41), the only difference is that the summations are restricted to the subgraph containing region α and all its descendants. Therefore, \(\tilde{F}_{\alpha}\) can be regarded as the region graph Bethe-Peierls free energy of this subgraph α.

3.5 Factor Graph as the Simplest Region Graph

It has been demonstrated in [52] that the conventional belief-propagation (BP) equation can be derived from the framework of partition function expansion. Actually BP is just a limiting case of the more general rgBP equation.

Given a factor graph G, let us consider the following simplest region graph R: there are N ‘variable’ regions denoted by i=1,2,…,N, each of which contains a single variable node i, with counting number c i =1−k i (k i ≡|∂i| being the degree of node i in G); and there are M ‘function’ regions denoted by a=1,2,…,M, each of which contains a function node a and the set ∂a of all the nearest-neighboring variable nodes of node a, with counting number c a =1. Such a simple region graph is non-redundant, although it in general contains many loops.

Let us define two new probability functions b ia (x i ) and b ai (x i ) as

(46)

Using the rgBP equation (39), one can easily check that the newly defined probability functions satisfy the conventional BP equation:

(47)

The probability b ia (x i ) can be interpreted as the state distribution of variable node i in the absence its interaction with function node a; while the probability b ai (x i ) can be interpreted as the state distribution of i if it only interacts with function node a.

The free energy F 0 for this simplest region graph reduces to the conventional Bethe-Peierls free energy. Its expression is \(F_{0} = \sum_{a\in G} \tilde{F}_{a} - \sum_{i\in G}(k_{i} -1) \tilde{F}_{i}\). The free energies \(\tilde{F}_{i}\) (for a variable node i) and \(\tilde{F}_{a}\) (for a function node a) are expressed as

$$ \everymath{\displaystyle} \begin{array}{l} \tilde{F}_i = -\frac{1}{\beta} \ln \biggl[\,\sum _{x_i} \psi_i(x_i) \prod _{a\in \partial i} b_{a\rightarrow i}(x_i) \biggr] , \\[6pt] \tilde{F}_a = -\frac{1}{\beta}\ln \biggl[ \,\sum _{\underline{x}_{\partial a}} \psi_a(\underline{x}_{\partial a}) \prod_{i\in \partial a} \biggl[ \psi_i(x_i) \prod_{c\in \partial i\backslash a} b_{c\rightarrow i}(x_i) \biggr] \biggr] . \end{array} $$
(48)

Only the messages b ai (x i ) from the function regions (parents) to the variable regions (children) appear in the above two expressions. We now proceed to demonstrate that, such a property holds for a general non-redundant region graph.

3.6 Parent-to-Child Message-Passing in a Non-redundant Region Graph

In a non-redundant region graph R, the subgraph R i induced by any variable node i is a connected tree. Because of this property, R # is identical to R and the cross-linker set for each directed edge μν contains all the variable nodes in ν (μ#ν={i|iν}). It is easy to see that, if there is a directed path pointing from a region α to another region γ, then such a path must be unique in R; furthermore, the subgraph R a induced by any function node a is also a connected tree.

When we approximate the free energy F(β) by the region graph Bethe-Peierls free energy F 0, the corresponding approximate expression for the marginal configuration distribution of a region γ is

$$ p_\gamma (\underline{x}_\gamma) \propto \varPsi_\gamma (\underline{x}_\gamma) \prod _{\alpha \in \partial^\# \gamma} p_{\alpha \rightarrow \gamma}\bigl(\underline{x}_{\gamma\#\alpha}^\gamma \bigr) . $$
(49)

This marginal distribution has the following consistency property: if (μ,ν) is an edge of the region graph R, then

$$ \sum _{\underline{x}_\mu \backslash \underline{x}_{\mu\#\nu}} p_{\mu}(\underline{x}_\mu) \ = \ \sum _{\underline{x}_\nu \backslash \underline{x}_{\mu\#\nu}} p_{\nu}(\underline{x}_\nu) , $$
(50)

where \(\underline{x}_{\mu \# \nu} \equiv \{ x_{i} | i \in \mu\#\nu\}\) is the configuration of the variables in the cross-linker set μ#ν. Equation (50) ensures that the marginal configuration distribution of the cross-linker set μ#ν is the same whether it is inferred from \(p_{\mu}(\underline{x}_{\mu})\) or from \(p_{\nu}(\underline{x}_{\nu})\).

When R is non-redundant, we have # γ=∂γ in (49), where ∂γ is just the set of nearest-neighboring regions of region γ. Then, with the help of (39), the expression (49) is rewritten as

$$ p_\gamma (\underline{x}_\gamma ) \propto \prod _{\eta\in I_\gamma} \varPsi_\eta ( \underline{x}_\eta) \prod _{\{(\mu\rightarrow \nu) | \mu\in B_\gamma, \nu\in I_\gamma\} } p_{\mu\rightarrow \nu} (\underline{x}_\nu) . $$
(51)

In this expression, the region set I γ contains γ and all its descendants, i.e., I γ ≡{η|ηγ}; B γ is another region set with the property that any region in B γ does not belong to I γ but is parental to at least one region in I γ , see Fig. 5. The configuration \(\underline{x}_{\nu}\) for any region νI γ should be understood as \(\underline{x}_{\nu}\equiv \{x_{i}^{\gamma}| i\in \nu\}\).

Fig. 5
figure 5

For any region γ in a non-redundant region graph R, the set I γ is formed by γ and all its descendant regions, and the set B γ is formed by all the other regions that do not belong to I γ but are pointing to regions of I γ by one or more edges. Each white disk in this figure represents a region

For a non-redundant region graph R, we can prove (see Appendix A) that, the marginal probability distribution (51) is equivalent to

$$ p_{\gamma}(\underline{x}_\gamma) \propto \biggl[\, \prod _{a\in \gamma} \psi_a( \underline{x}_{\partial a}) \prod _{i\in \gamma} \psi_i(x_i) \biggr] \prod _{\{(\mu\rightarrow \nu) | \mu\in B_\gamma, \nu\in I_\gamma\}} m_{\mu\rightarrow \nu}(\underline{x}_\nu) , $$
(52)

where \(m_{\mu\rightarrow \nu}(\underline{x}_{\nu})\) is a probability distribution that satisfies the self-consistent equation (53). The message \(m_{\mu\rightarrow \nu}(\underline{x}_{\nu})\) is interpreted as the (cavity) probability that the variable set {iν} takes configuration \(\underline{x}_{\nu}\) in region μ when all the interactions that also appear in region ν are not considered (namely, setting ψ i (x i )=1 for any iν and \(\psi_{a}(\underline{x}_{\partial a})=1\) for any aν, while all the other interactions of region μ are not modified). The marginal probability expression (52) has the nice property that only the parent-to-child messages m μν are needed, but not the messages from children to parents.

The self-consistent equation for \(m_{\mu\rightarrow \nu}(\underline{x}_{\nu})\), derived from (39) after a lengthy process (see Appendix A), is intuitively simple:

$$ m_{\mu\rightarrow \nu}(\underline{x}_\nu) \propto \sum _{\underline{x}_\mu \backslash \underline{x}_\nu} \biggl[ \prod _{a\in \mu\backslash \nu} \psi_a(\underline{x}_{\partial a}) \prod _{i\in \mu\backslash \nu} \psi_i(x_i) \biggr] \prod _{\{(\alpha\rightarrow \eta) | \alpha\in B_\mu, \eta \in I_\mu\backslash I_\nu\}} m_{\alpha\rightarrow \eta}(\underline{x}_\eta) . $$
(53)

For a non-redundant region graph R, the marginal probability distributions \(p_{\gamma}(\underline{x}_{\gamma})\) should satisfy the following edge-consistency condition, namely

$$ \sum _{\underline{x}_\mu \backslash \underline{x}_\nu} p_\mu ( \underline{x}_{\mu}) = p_\nu ( \underline{x}_{\nu}) , \quad \forall (\mu\rightarrow \nu) \in R . $$
(54)

To check this consistency, plug (52) into the two sides of the above equation, we get the follow equation that must be satisfied by m μν :

$$ m_{\mu\rightarrow \nu} (\underline{x}_\nu) = C \frac{\prod_{\underline{x}_\mu \backslash \underline{x}_\nu} \prod_{a\in \mu\backslash \nu} \psi_a(\underline{x}_{\partial a}) \prod_{i\in \mu\backslash \nu} \psi_i(x_i) \prod_{\{(\alpha\rightarrow \eta ) | \alpha \in B_\mu, \eta \in I_\mu \backslash I_\nu\}} m_{\alpha \rightarrow \eta}(\underline{x}_\eta)}{ \prod_{\{\eta \rightarrow \lambda | \eta \in (B_\nu \cap I_\mu)\backslash \mu, \lambda \in I_\nu \backslash \nu\}} m_{\eta\rightarrow \lambda}(\underline{x}_\lambda)} , $$
(55)

where the normalization constant C is determined by \(\sum_{\underline{x}_{\nu}} m_{\mu\rightarrow \nu}(\underline{x}_{\nu}) = 1\). For a non-redundant graph R, the set (B ν I μ )∖μ is empty, and hence the denominator of the expression on the right side of (55) is equal to 1. Since (55) is equivalent to (53), the edge-consistency condition (54) holds.

If the region graph R is redundant, the marginal probability distribution of each region γ can no longer be written in the form of (52), and the edge-consistency condition (54) is no longer valid (but the weaker condition (50) is still valid). In [53], however, Yedidia and co-authors defined that a region γ has a marginal probability distribution (52), and then they required that this marginal probability distribution should satisfy the edge-consistency condition (54), which leads to the self-consistent equation (55) for \(m_{\mu\rightarrow \nu}(\underline{x}_{\nu})\). Such a treatment might be able to get good results for some problems, but in our opinion it lacks a solid theoretical foundation.

For a non-redundant graph R, the free energy \(\tilde{F}_{\alpha}\) (45) can also be expressed by parent-to-child messages (see Appendix B for details). The final expression is

$$ \tilde{F}_\alpha = -\frac{1}{\beta} \ln \biggl[ \sum _{\underline{x}_\alpha} \prod _{a\in \alpha} \psi_a(\underline{x}_{\partial a}) \prod _{i\in \alpha} \psi_i(x_i) \prod _{\{(\mu\rightarrow \nu) | \mu\in B_\alpha, \nu \in I_\alpha\}} m_{\mu\rightarrow \nu}(\underline{x}_{\nu}) \biggr] . $$
(56)

The free energy expression (56) also appeared in [53] as a basic assumption (both for redundant and non-redundant region graphs). A theoretical challenge for redundant region graphs, worth to be explored in future work, is to derive from the framework of partition function expansion an expression for \(\tilde{F}_{\alpha}\) using only parent-to-child messages.

4 Region Graph Expansion for the Grand Partition Function

If the region graph Bethe-Peierls free energy functional (41) has only a single stationary point, a global minimum, then the rgBP equation (39) has only a single solution (fixed point). Reaching this unique fixed point is computationally not hard. One may perform message-passing iteration along the edges of the region graph R, or one may work on the functional F 0 and find its unique minimum by gradient descend methods. The situation is a little bit more complicated, for example, if the F 0 functional has a single minimum and one or more saddle points. In this situation, we expect that the rgBP equation still has only one stable fixed point, which can be reached by direct iteration (with some damping to accelerate convergence) or by minimizing F 0.

The real non-trivial situation occurs when the F 0 functional has multiple minimal points (see the right panel of Fig. 4), and consequently the rgBP equation (39) has multiple stable fixed points. We take F 0 as an approximate free energy landscape for the system under study. Then each minimal point of F 0 can be regarded as a metastable state s of the system, it describes approximately the system’s collective behavior within certain timescale τ s . The timescale τ s is determined by the shape of the free energy landscape surrounding metastable state s, especially the relative height of the closest free energy saddle point. For some mean-field models (such as spin glasses on random graphs [25]) τ s is expected to approach infinity in the thermodynamic limit N→∞ (see, e.g., [22, 32]), then a metastable state s is a true thermodynamic Gibbs state (i.e., ergodicity is broken at N→∞). For finite-dimensional systems, the timescale τ s of a metastable state s may be finite even in the thermodynamic limit.

In this section, we regard each fixed point of the rgBP equation (39) as a macrostate, no matter whether it is a minimum of F 0 or a maximum or saddle point. Viewing each macrostate as an energy level, it is natural to define a grand partition function Ξ as [25, 29, 52, 55]

$$ \varXi(y; \beta) = \sum_{s} \exp \bigl[ -y F_{0}^{(s)}(\beta) \bigr] , $$
(57)

where \(F_{0}^{(s)}(\beta)\) is the value of F 0(β) at the rgBP fixed point s; the parameter y is the inverse temperature at the level of macrostates.Footnote 3 The grand partition function Ξ(y;β) is associated with a grand free energy G(y;β) through

$$ G(y; \beta) \equiv -\frac{1}{y} \ln \varXi(y; \beta) . $$
(58)

At a given inverse temperature y, the statistical weight of a macrostate s is

$$ \mathcal{P}_B(s) = \frac{\exp [-y F_0^{(s)}(\beta) ]}{ \sum_{s} \exp [-y F_0^{(s)}(\beta) ]} . $$
(59)

From \(\mathcal{P}_{B}(s)\) the mean value of free energy is expressed as \(\langle F_{0}(\beta) \rangle_{y} \equiv \sum_{s} \mathcal{P}_{B}(s) F_{0}^{(s)}(\beta)\). And the entropy density Σ(y;β) of macrostates is expressed as

$$ \varSigma(y; \beta) \equiv - \frac{1}{N} \sum _{s} \mathcal{P}_B(s) \ln \mathcal{P}_B(s) . $$
(60)

Σ(y;β) is also called the complexity in the spin glass literature [25]. It is easy to verify that \(G(y;\beta) = \langle F_{0}(\beta) \rangle_{y} - \frac{N}{y} \varSigma(y; \beta)\).

We are interested in obtaining an approximate expression for the grand free energy G(y;β). For this purpose we follow again the framework of partition function expansion [52]. First notice that

$$ \varXi(y; \beta) = \prod _{(\mu,\nu)\in R^\#} \int \mathrm{D} p_{\mu\rightarrow \nu} \int \mathrm{D} p_{\nu\rightarrow \mu} \delta ( p_{\mu\rightarrow \nu}-B_{\mu\rightarrow \nu} ) \delta ( p_{\nu\rightarrow \mu}-B_{\nu\rightarrow \mu} ) e^{-y F_{0}} , $$
(61)

where ∫Dp μν means integrating over all possible probability measures \(p_{\mu\rightarrow \nu}(\underline{x}_{\nu\#\mu}^{\nu})\) on the edge (μ,ν) of the region graph R #. Because of the Dirac function δ(p μν B μν ({p γμ |γ # μν})), only the fixed points of (39) contribute to the grand partition function.

We then introduce on each edge (μ,ν) of the region graph R # two probability measures P μν (p μν ) and P νμ (p νμ ). Using the expression (41) for the free energy F 0, the grand partition function is re-written as

(62)

This expression is very similar in form to (27), therefore the method of partition function expansion can be directly applied to (62). As a result we obtain that

$$ G(y; \beta) = G_{0} - \frac{1}{y} \ln \biggl[1+ \sum _{r^{\mathit{loop}} \subseteq R^\#} L_{r^{\mathit{loop}}}^{(1)} \biggr] . $$
(63)

The first term on the right of (63) is expressed as

$$ G_{0}(y;\beta) = \sum _{\gamma\in R} g_\gamma - \sum _{(\mu,\nu)\in R^\#} g_{(\mu,\nu)} , $$
(64)

where

(65a)
(65b)

The grand free energy G 0 as expressed by (64) can also be regarded as a functional of the probability functionals {P μν (p μν ),P νμ (p νμ )}.

The correction contribution \(L_{r}^{(1)}\) of a subgraph to the grand free energy has a similar expression as (35) [52]. To ensure that any subgraph with at least one dangling edge has vanishing correction contribution to the grand free energy, each probability function P μν (p μν ) needs to satisfy the following equation

(66)

where

$$ f_{\mu\rightarrow \nu} \equiv - \frac{1}{\beta} \ln \biggl[ \sum _{\underline{x}_\mu} \varPsi_\mu (\underline{x}_\mu) \prod_{\gamma \in \partial^\# \mu \backslash \nu} p_{\gamma \rightarrow \mu} \bigl( \underline{x}_{\mu\#\gamma}^\mu\bigr) \biggr] . $$
(67)

Equation (66) is called the region graph survey-propagation equation, in correspondence to the 1RSB mean-field theory of spin glasses [25, 28]. If we neglect the loop correction contributions to G(y;β), then the grand free energy functional G 0 gives an approximate description of the system’s free energy landscape at the level of macrostates. It can again be verified that, the first variation of G 0 with respect to any of its arguments P μν (p μν ) is identically zero at a fixed point of (66).

If we neglected all the loop correction contributions to G(y;β) and approximate it with G 0, then an approximate expression for the mean free energy of macrostates is

(68)

And the complexity Σ(y;β) is expressed as

$$ \varSigma(y; \beta) \approx \frac{ y ( \langle F_0(\beta) \rangle_y - G_0)}{N} . $$
(69)

It is particularly interesting to determine the complexity value at y=β, and to determine the maximal value of y at which the complexity becomes negative [23].

It is not a simple task to solve numerically the region graph survey-propagation equation (66). A direct approach is to iterate (66) on the region graph, with each probability functional P μν (p μν ) represented by a sample set of probability functions \(p_{\mu\rightarrow \nu}(\underline{x}_{\nu\#\mu}^{\nu})\). A major complication is the reweighting factor \(e^{-y f_{\mu\rightarrow \nu}}\) in (66). In the special situation of y=β, the reweighting can be replaced by introducing new auxiliary probability functions [18, 22]. Several reweighting tricks were discussed in the thesis of Zdeborová [54], which may be helpful for solving the region-graph survey-propagation equation for general y values.

The region-graph survey-propagation equation (66) applies to a general redundant or non-redundant region graph R. Rizzo and co-authors [40] derived a set of generalized survey-propagation equations using the replica cluster variation method, which are different from the region-graph survey-propagation equation (66). The connection between these two approaches needs to be further studied.

5 Numerical Results on the Two-Dimensional Ising and Edwards-Anderson Models

We now apply the region graph belief-propagation equation to the 2D Ising and Edwards-Anderson models, mainly for the purpose of testing its performance. Both models are described by the energy function (8). For the Ising model, all edge coupling constants J ij are equal to a positive value J; for the EA model, J ij are independent and identically distributed random variables, taking value J and −J with equal probability. All the external fields \(h_{i}^{0}\) in (8) are set to be zero for simplicity. In the numerical calculations, J and the Boltzmann constant k B are both set to be unity, so the energy unit is J and the temperature unit is J/k B .

We consider L×L square lattices with L coupling interactions in each dimension. If we assume periodic boundary condition on both dimensions, the total number of vertices (variable nodes) in the lattice is N=L 2; if open boundary condition is used on both directions, the total number of vertices is N=(L+1)2.

The region graph for the square lattice is constructed to be non-redundant. It has three types of regions, the ‘square’ regions, the ‘stripe’ regions, and the ‘rod’ regions. Each square region contains n×n vertices, each stripe region contains n×2 vertices, and each rod region contains n×1 vertices. All the coupling interactions between the vertices of a given region are also included into this region. Each square region is connected to four rod regions, each stripe region is connected to two rod regions, and each rod region is connected to a square region and a stripe region. The region graph R at n=2 is shown in Fig. 2, where a square region and a stripe region have the same shape. For the general case of n≥2, if we regard each square region as a ‘giant vertex’ and each stripe region plus its two connected rod regions as a ‘giant bond’, then the region graph is again a square lattice of giant vertices and giant bonds. In this sense, our region graph representation is a coarse-graining of the original lattice that keeps its topology unchanged.

The partition function of the 2D Edwards-Anderson can be calculated exactly by polynomial algorithms (see, for example, Ref. [48] and references therein). Although the heuristic rgBP message-passing approach can only obtain an approximate value for the free energy, it is very efficient in estimating all the N single-variable marginal probabilities simultaneously.

5.1 Region Graph Belief-Propagation Equations at n=2

A subgraph of the region graph R at n=2 is plotted in Fig. 6. For notational simplicity we denote the square regions and the stripe regions by Greek symbols (such as α and μ) and denote a rod region just by the index of its function node (such as a and c). Consider a square region α and a rod region a in Fig. 6. The two probability distributions between these regions can be parameterized as

(70a)
(70b)

Similarly the two probability distributions between a stripe region μ and a rod region a are expressed as

(71a)
(71b)

The self-consistent equations for these set of parameters are derived from the rgBP equation (39) and listed in Appendix C.

Fig. 6
figure 6

A local part of the region graph R shown in Fig. 2. The counting numbers for a square region α, a stripe region μ, and a rod region a are, respectively, c α =1, c μ =1, c a =−1

A trivial solution of the rgBP equation is the paramagnetic one with the fields in the expressions (70a)–(71b) all being identically zero,

(72)

The stability of the paramagnetic solution is analyzed through a set of linearized rgBP iterative equations listed in Appendix D. All the fields such as \(h_{\alpha\rightarrow a}^{(i)}\) and \(h_{\mu\rightarrow a}^{(i)}\) are randomly initialized, and their values then evolve according to the linearized rgBP equations. If all the fields finally decay to zero, the paramagnetic solution is stable, otherwise it is unstable and the original rgBP equation has other stable fixed points.

The number (2n−1) of parameters needed to completely characterize a probability distribution of the rgBP equation grows quickly with the number n of vertices on a boundary line of the square region. The paramagnetic solution and its stability analysis have been worked out up to n=10 for the Ising model. For the EA model we have only considered the simplest case of n=2.

5.2 The Ferromagnetic Ising Model

At sufficiently high temperatures T the paramagnetic solution (72) is the only solution for the rgBP equation at n=2. This trivial solution becomes unstable at the critical temperature \(T_{c}^{(n=2)}=2.65635\) (periodic boundary conditions). This value is higher than the exact transition temperature T c =2.26919 [16], but a little bit lower than the value of \(T_{c}^{(n=1)}=2.88539\) as obtained through the conventional belief-propagation equation (i.e., n=1) [4, 38].

The paramagnet-ferromagnet transition temperature as predicted by the rgBP equations decreases if larger square regions are used. Figure 7 demonstrates that the predicted critical temperature \(T_{c}^{(n)}\) can be fitted by the following curve

$$ T_c^{(n)} = T_c^\infty + \frac{a}{n^b} , $$
(73)

with \(T_{c}^{\infty}= 2.2376 \pm 0.0037\), a=0.6875±0.0017, and b=0.7140±0.0108. The fitted value \(T_{c}^{\infty}\) is slightly lower than the exact critical point T c =2.2692. The data can also be fitted well by \(T_{c}^{(n)} = T_{c} + a^{\prime}/ n^{b^{\prime}}\), with a′=0.688±0.007 and b′=0.8182±0.0078.

Fig. 7
figure 7

The critical temperature T at which the paramagnetic fixed-point of the rgBP equation becomes unstable (periodic boundary conditions). The integer n is the number of vertices on a boundary line of a square region. The exactly known phase-transition temperature \(T_{c}= 2/\ln(1+\sqrt{2})\simeq 2.2692\) is indicated by the horizontal dashed line. The red solid curve is a fitting function T=2.2376+0.6875n −0.7140 to the data with n≥2

The free energy density, mean energy density, entropy density, and magnetization as a function of temperature T are shown in Fig. 8 and compared with the exactly known results of Onsager [35]. As n increases, the results are closer to the exact value. The free energy as obtained by the rgBP equations is an upper bound to the true free energy value of the system, but the difference is small at n≥2. Both the energy density and the entropy density have a kink at the rgBP critical point \(T_{c}^{(n)}\), but this kink becomes more and more weaker as n increases (in the limit of n→∞ the exact Onsager solution should be reached). Using the theoretical framework of coherent anomaly method [46], the results obtained at different values of n can be used to predict the critical exponents of the 2D Ising model. The results of such an exercise (to be carried out) will be reported elsewhere.

Fig. 8
figure 8

Results obtained on the ferromagnetic Ising model (periodic boundary conditions) using the conventional BP and the rgBP at various values of n. The results of Onsager’s exact solution are also shown for comparison. (Upper left) free energy density; (upper right) mean energy density; (lower left) entropy density; (lower right) magnetization

Under the periodic boundary condition, the instability temperature of the paramagnetic solution shown in Fig. 7 is independent of lattice side length L. But this is not the case for the open boundary condition. Under the open boundary condition, the shorter the lattice side length L, the more stable the paramagnetic solution is (see Fig. 9). However, this difference in threshold temperature between the periodic and the open boundary conditions becomes very small for L>100. The rgBP equation at n=2 have two stable ferromagnetic fixed points even for a very small 5×5 square lattice (open boundary conditions) if the temperature is lower than 1.92 (see Fig. 9). For larger open square lattices, the ferromagnetic solutions are stable at higher temperatures. Of course for finite lattices there is no real phase transition. The two low-temperature ferromagnetic solutions are interpreted as describing the two metastable states of the square lattice. They are stable because the correlation length in the system exceeds the length scale of the maximal square region of the region graph. If n exceeds the correlation length of the system, the paramagnetic fixed point of the rgBP equation will again be stable.

Fig. 9
figure 9

Threshold instability temperature of the paramagnetic solution for the Ising model on an L×L square lattice with open boundary condition. Circular points are obtained by the BP approximation, square points by the rgBP at n=2. At each value of L, the threshold temperature as predicted by rgBP at n=2 is lower than that predicted by the conventional BP

5.3 The Edwards-Anderson Model

A set of single instances of the 2D EA model with ±J coupling constants are randomly generated. For each instance the threshold stability temperature \(T_{c}^{(n=2)}\) of the paramagnetic solution of the rgBP equation at n=2 is determined numerically, and this value is compared with the threshold value as obtained by the conventional BP approximation. As shown in Fig. 10, the performance of rgBP (n=2) is better than that of BP, but \(T_{c}^{(n=2)}\) is still positive, and it increases with the side length L of the periodic square lattice. Figure 10 also suggests that the finite-size corrections to the critical temperature \(T_{c}^{(n=2)}\) decrease as 1/ln(L), similar to the observation of [8].

Fig. 10
figure 10

The threshold instability temperature of the paramagnetic solution of the conventional BP and the rgBP equation at n=2 for L×L periodic square lattices. Each data point is obtained by averaging over 10 single instances of the Edwards-Anderson model. The dashed lines are two fitting curves of the form T=ab/log2(L), the two fitting parameters a, b are shown in the figure

The EA model on a square lattice has no real spin glass phase at finite temperatures (see, for example, [33, 44]). At any positive temperature T the system has only a paramagnetic phase, and the magnetization on each vertex is equal to zero in the long time limit. Apparently, the results of Fig. 10 with the paramagnetic solution being unstable at positive temperatures are in contradiction with the absence of finite-temperature spin glass phase.

This apparent discrepancy can actually be removed. The instability of the paramagnetic rgBP solution does not mean the system is in a spin glass phase. It just signifies the emergence of some collective domains in the square lattice, the length scales of these collective domains exceed the length scale (=n) of the region graph’s maximal square region. The detailed arguments go as follows.

An elementary square (including four coupling interactions) of the square lattice is called a plaquette. There are two types of plaquettes, frustrated or non-frustrated. A plaquette is said to be frustrated (non-frustrated) if the product of the four edge coupling constants on its boundary is negative (positive) [49]. It is an obvious fact that the four edge coupling energies of a frustrated plaquette can not be simultaneously minimized. For the EA model studied in this paper, on average one-half of the plaquettes in each problem instance are frustrated, and these frustrated plaquettes are randomly distributed on the 2D lattice (see the upper left panel of Fig. 11 for a concrete example).

Fig. 11
figure 11

Emergence of domains with collective dynamics. (Upper left) Distribution of frustrated plaquettes in a single instance of the 2D Edwards-Anderson model with periodic boundary conditions. Each frustrated plaquette is shown as black, while each non-frustrated plaquette is shown as white. For this instance, the paramagnetic solution of the rgBP equation at n=2 becomes unstable at \(T_{c}^{(n=2)}=1.6618\) (β=0.60177). (Upper right) At temperature T=1.6502 (β=0.606), which is just slightly below \(T_{c}^{(n=2)}\), one small collective domain begins to emerge, as demonstrated by non-zero magnetizations for some plaquettes in a connected cluster of plaquettes. The color of each plaquette encodes the mean value of absolute magnetizations as averaged over its four vertices. (Lower right) At a further decreased temperature T=1.5385 (β=0.65), another collective domain becomes quite evident. As the temperature further decreases, more collective domains form, and the formed collective domains enlarge in size and their boundaries start to be in contact. As an example, the mean absolute magnetization of each plaquette is shown at T=1.3333 (β=0.75) at the lower left panel (Color figure online)

The abundance of frustrated plaquettes destroys the long-range ferromagnetic correlations in the system. However, the local density of frustrated plaquettes fluctuates considerably at different parts of the lattice. There are some small patches of the lattice that are mainly formed by non-frustrated plaquettes. For example, in the 64×64 periodic square lattice of Fig. 11 (upper left panel), there is a 5×5 patch centered at position (62,28) in which only 4 of its 25 plaquettes are frustrated. The spin coupling interactions in such small patches are essentially ferromagnetic in nature (under a gauge transformation of the spin variables and the coupling constants [49]). Some of these ferromagnetic small patches may considerably exceed the maximal n×n square region in size, and at low enough temperatures, the ferromagnetic correlation length will exceed n, making the rgBP fixed point to be locally ferromagnetic (after the gauge transform) in these patches but paramagnetic in the remaining parts of the square lattice. Given a patch with a specified contour shape and a specified density of unfrustrated plaquettes, the probability to discover such a patch is higher in a square lattice with longer side length L. Therefore, the observation that \(T_{c}^{(n=2)}\) increases with lattice size L (Fig. 10) is consistent with Fig. 9.

The marginal probability distribution of the two spins σ i and σ j of a rod region a (see Fig. 6) is calculated as

(74)

Then we get the magnetization of vertex i as

$$ m_i = \frac{\tanh [\beta (h_{\alpha\rightarrow a}^{(i)} + h_{\mu\rightarrow a}^{(i)}) ] + \tanh [\beta (h_{\alpha\rightarrow a}^{(j)} + h_{\mu\rightarrow a}^{(j)}) ] \tanh [\beta (J_{\alpha\rightarrow a}^{(i j)} +J_{\mu\rightarrow a}^{(i j)} - J_{i j} ) ]}{ 1+ \tanh [\beta (h_{\alpha\rightarrow a}^{(i)} + h_{\mu\rightarrow a}^{(i)}) ] \tanh [\beta (h_{\alpha\rightarrow a}^{(j)} + h_{\mu\rightarrow a}^{(j)}) ] \tanh [\beta (J_{\alpha\rightarrow a}^{(i j)} +J_{\mu\rightarrow a}^{(i j)} - J_{i j} ) ] } . $$
(75)

The magnetizations of all the vertices in the square lattice can be obtained in a similar way. We define the mean absolute magnetization of a plaquette as the average of the absolute magnetizations of its four vertices.

Figure 11 shows the pattern of mean absolute plaquette magnetizations at a given temperature T, for a single instances of the EA model on a 64×64 periodic square lattice. For β<0.60177 the rgBP equation has only the paramagnetic solution, and all the plaquette mean magnetizations are zero. At β=0.606, the mean absolute magnetizations of a small domain of the square lattice are nonzero (Fig. 11, upper right panel). As temperature further decreases, this collective domain enlarges in area, and also other collective domains start to form (Fig. 11, lower right panel, β=0.65). As temperature further decreases, different collective domains start to get into contact and they compete for the boundary vertices (Fig. 11, lower left panel, β=0.75).Footnote 4

The heterogeneous patterns and its evolution shown in Fig. 11 in some respect are similar to the phenomenon of dynamical heterogeneity in structural glasses [10, 12]. This link deserves to be explored more deeply. The existence of many collective domains and the frustration effects between these domains very probably are responsible for the glassy-like low-temperature dynamics of the 2D Edwards-Anderson model.

Figure 11 also suggests a way to improve the rgBP prediction power. We infer that different local regions of the square lattice have different correlation lengths. To consider more precisely the correlations in the collective domains, a conceptually easy way is to construct region graphs with larger regions for these collective domains, while small regions are used for the remaining parts to lower computational complexity. This adaptive strategy needs to be implemented in future work.

We end this section by emphasizing that, the BP and the rgBP iterative process are still able to converge when the paramagnetic fixed point becomes unstable. For the 2D square-lattice Edwards-Anderson model (periodic boundary conditions), we found that BP converges as long as T>1.52, in agreement with earlier simulation results [8, 20]. The rgBP at n=2 converges at even lower temperatures.

6 Conclusion and Outlook

In this paper we gave a detailed description of the region graph partition function expansion approach (first introduced in [56]), and obtained approximate expresses for the free energy and generalized free energies of a general graphical system with an abundant number of short loops. A series of message-passing equations (such as region graph belief-propagation and region graph survey-propagation) were derived in the expansion process. We have applied the rgBP equation to the square lattice Ising model and Edwards-Anderson model and found that it outperforms the conventional belief-propagation equation. We also demonstrated that the fixed points of the rgBP equation reveal the heterogeneous pattern of collective domains in the square lattice. An adaptive strategy of improving the rgBP performance was also suggested.

As discussed in the previous sections, there are many issues remain to be explored. The adaptive rgBP scheme for the 2D Edwards-Anderson model needs to be implemented. Another very interesting issue is the effect of adding redundancy to the region-graph. A redundant region-graph have been used for the 2D Ising model and Edwards-Anderson model in various papers within the framework of cluster variation method [8, 15, 21, 34, 53]. The performance of rgBP needs to be tested on such a redundant region-graph. We have performed some preliminary computations and found that the results of rgBP depend on the particular way of removing redundancy. For example, the region graph R # shown in the right panel of Fig. 3 actually is equivalent to the non-redundant region graph R of Fig. 2. Some other ways of removing redundancy are able to make the paramagnetic rgBP fixed point to be stable at even lower temperatures.

For a general redundant region graph R, the rgBP equation (39) is not equivalent to the generalized belief-propagation equation (55) of [53]. We are working on the issue of deriving the generalized belief-propagation equation from the approach of partition function expansion.

The three-dimensional Edwards-Anderson model is believed to have a true spin glass phase. This system will be studied using rgBP and the region-graph survey-propagation equation in a future work.

Particles in a dense liquid have translational degrees of freedom. It remains to be seen whether a similar partition function expansion scheme can be worked out for systems with mobile particles. If approximate free energy landscapes can also be built for such systems with the help of message-passing equations, it should be very helpful for understanding structural glasses and supercooled liquids.

A different theoretical approach (the replica cluster variation method) of studying finite-dimensional spin glasses has been explored in [8, 21, 40]. The message-passing process of this replica cluster variation method appears to be much more complicated than the simple rgBP process.