1 Introduction

1.1 Background

Spin systems are a general framework for modeling nearest-neighbor interactions on graphs, and are widely studied in both statistical physics and applied probability. A spin system consists of a large collection of nodes, each of which may be in one of a fixed number of states called spins. A neighborhood structure is specified by edges between the nodes. Interactions between neighboring nodes are determined by edge potentials, which assign an energy value to each edge based on the spin values of its endpoints. In addition, there are vertex potentials which assign an energy value to each node based on the value of its spin. For any configuration \(\sigma \) of spins on the nodes, the energy \(H(\sigma )\) is just the sum of its edge and vertex energies. Based on the Gibbs formalism from statistical physics, the probability of finding the system in configuration \(\sigma \) is then proportional to the weight \(w(\sigma )=\exp (-H(\sigma ))\).

In this paper, we concentrate on two-state spin systems, where each vertex can be in one of two states, referred to as “\(+\)” and “\(-\)”. Such a system can be defined by specifying a \((+,+)\) edge activity \(\beta \), a \((-,-)\) edge activity \(\gamma \), and a vertex activity \(\lambda \), where \(\beta \), \(\gamma \) and \(\lambda \) are non-negative parameters. For a graph \(G=(V,E)\), a configuration \(\sigma : V\rightarrow \left\{ +~,{-}~\right\} \) is an assignment of \(+\) and \(-\) spins to the vertices of \(G\). The weight \(w(\sigma )\) of the configuration \(\sigma \) is given by

$$\begin{aligned} w(\sigma ) = \lambda ^{m(\sigma )}\beta ^{n_+(\sigma )}\gamma ^{n_-(\sigma )}, \end{aligned}$$
(1)

where \(m(\sigma )\) denotes the number of vertices assigned spin \(-\), and \(n_+(\sigma )\) (respectively, \(n_-(\sigma )\)) denotes the number of edges for which both endpoints are assigned spin \(+\) (respectively, \(-\)). The partition function of the model is defined as

$$\begin{aligned} Z = \sum _{\sigma \in \left\{ +,-\right\} ^V}w(\sigma ). \end{aligned}$$
(2)

The partition function, in addition to being a natural weighted generalization of the notion of counting, is a fundamental quantity in statistical physics. For example, it is the normalizing factor in the Gibbs distribution: the probability of occurrence of configuration \(\sigma \) is given by \(\mu (\sigma ) = w(\sigma )/Z\). In addition, many other properties of the model can be deduced by studying the partition function [4].

As a simple concrete example of a two-state spin system, consider the setting \(\beta =1\) and \(\gamma = 0\) (so that configurations with adjacent “\(-\)” spins are assigned weight zero, and thus prohibited), known as the hard-core model. The associated Gibbs distribution is a weighted measure on independent sets in the graph \(G\), in which any independent set \(U\) has weight \(\lambda ^{|U|}\). Another important class of examples, known as the Ising model Footnote 1, is obtained by setting \(\beta = \gamma > 0\). There is a significant qualitative difference between the Ising model with \(\beta = \gamma > 1\) (the ferromagnetic case) and with \(\beta = \gamma < 1\) (the anti-ferromagnetic case). The latter is an example of a “repulsive” model, which means that the edge potentials assign higher weights to edges with different spins at their endpoints, while the ferromagnetic case is “attractive” (higher weights are assigned to edges with the same spin at their endpoints). The parameter \(\lambda \) can be identified with an “external field”, i.e., a bias associated with each spin. The case \(\lambda =1\) corresponds to zero field, while \(\lambda <1\) and \(\lambda >1\) correspond to positive and negative fields respectively. More generally, we will refer to any two-state system satisfying \(\beta \gamma > 1\) as ferromagnetic, and any satisfying \(\beta \gamma < 1\) as anti-ferromagnetic. Also, a model satisfying \(\beta \gamma > 0\) is said to have soft constraints (in the sense that no combination of spin values at adjacent vertices is prohibited). In a sense to be made precise later (see Appendix 1), Ising models capture arbitrary two-state spin systems with soft constraints; in particular, the two descriptions are equivalent on regular graphs. For this reason we will henceforth focus mainly on Ising models.

The theory of spin systems derives in large part from considering the limiting behavior of the Gibbs distribution as the size of the underlying graph goes to infinity. Based on the above formalism for finite graphs, one may define a Gibbs measure \(\mu \) on an infinite graph \(\fancyscript{G}\) by requiring that the marginal distribution on any finite subgraph \(\fancyscript{H}\), conditional on the configuration on \(\fancyscript{G}\backslash \fancyscript{H}\), is given by Eq. (1). (Here the spins in \(\fancyscript{G}\backslash \fancyscript{H}\) act as a fixed boundary condition in (1).) It is a well known result in the statistical physics literature (see, for example, [4]) that at least one such measure \(\mu \) can always be defined. However, for certain values of the parameters of the spin system there may be multiple solutions for \(\mu \), in which case the Gibbs measure is said to be non-unique.

We will now look at the phenomenon of non-uniqueness more closely in the special case when the infinite graph \(\fancyscript{G}\) is a \(d\)-ary tree.Footnote 2 As noted above, the anti-ferromagnetic Ising model essentially captures all two-state anti-ferromagnetic spin systems with soft constraints on regular graphs, and hence it is sufficient to consider the Ising case. Consider an anti-ferromagnetic Ising model on the \(d\)-ary tree with edge activity \(\beta (=\gamma )\) and vertex activity \(\lambda \). It turns out that if \(\beta \ge \frac{d-1}{d+1}\) then the Gibbs measure is unique for all values of \(\lambda \). In particular, in the zero-field case \(\lambda = 1\), the Gibbs measure is unique if and only if \(\beta \ge \frac{d-1}{d+1}\). However, when \(\beta < \frac{d-1}{d+1}\), the Gibbs measure is no longer unique for all values of the vertex activity \(\lambda \). In this case, there exists a critical activity \(\lambda _c\left( \beta , d\right) \ge 1\) such that the Gibbs measure is unique if and only if \(\left| \log \lambda \right| \ge \log \lambda _c(\beta ,d)\). We sketch the curves of \(\log \lambda _c(\beta , d)\) in Fig. 1, for \(d = 5\) and \(d = 13\). The area below the curves is the non-uniqueness region. We note here that the non-uniqueness region is monotonically increasing with degree, so the curve for \(d = 5\) lies strictly below that for \(d = 13\). Also, note that the curves intersect the \(\beta \)-axis at \(\beta = \frac{d-1}{d+1}\).

Fig. 1
figure 1

\(\log \lambda _c(\beta , d)\) for \(d=5\) and \(d = 13\). The curves intersect the \(\beta \)-axis at \(\beta = \frac{2}{3}\) and \(\beta = \frac{6}{7}\) respectively.

The phenomenon of non-uniqueness of the Gibbs measure can also be described in terms of the more algorithmic notion of decay of correlations. We stick to our example of the infinite \(d\)-ary tree. Fix a vertex \(v\) in the tree, and let \(S_l\) be the set of vertices in the tree at distance at least \(l\) from \(v\). Let \(q_v(l, \sigma )\) be the probability of having spin \(+\) at \(v\) conditional on the configuration on \(S_l\) being \(\sigma \). It turns out that uniqueness of the Gibbs measure is equivalent to the condition that the inequality

$$\begin{aligned} |q_v(l,\sigma ) - q_v(l, \tau )| \le \exp (-\Omega (l)), \end{aligned}$$
(3)

holds for any two configurations \(\sigma \) and \(\tau \) on \(S_l\).Footnote 3 The above condition is referred to in the literature as weak spatial mixing.

It has been believed for a long time (and proved in various manifestations) that there is an intimate relationship between weak spatial mixing and the running time of algorithms for approximating the associated partition function: roughly speaking, in the uniqueness region (where there is decay of correlations), the system should be amenable to local algorithms and thus be computationally tractable. A spectacular result in this direction was Weitz’s deterministic fully polynomial time approximation scheme (FPTAS)Footnote 4 for the partition function of the hard-core model, which works on all graphs of degree at most \(d+1\) for all activities \(\lambda \) less than the critical activity \(\lambda _c(d)\) for the uniqueness of the Gibbs measure on the infinite \(d\)-ary tree [15]. This is even more interesting in light of a later breakthrough result of Sly [12] (see also [3]), who showed that the existence of a fully polynomial randomized approximation scheme (FPRAS) for the partition function of the hard-core model on graphs of degree at most \(d+1\) for activities larger than \(\lambda _c(d)\) would imply that \(\mathrm {NP}= \mathrm {RP}\).Footnote 5 Thus the range of validity of Weitz’s algorithm is optimal.

Weitz [15] gave a general two-step framework for designing deterministic algorithms for approximating partition functions of two-state spin systems. To describe this framework, we begin with the standard observation that in order to get an FPTAS for the partition function, it is sufficient to give an FPTAS for the probability of having spin \(+\) at any given vertex \(v\). The first component of the framework is a combinatorial reduction, which shows that the problem of approximating this probability for a general two-state spin system on a graph \(G\) of maximum degree \(d+1\) can be reduced to the problem of approximating the same probability on a related finite subtree of the infinite \((d+1)\)-regular tree rooted at \(v\), in which the spins of some of the vertices are fixed to certain values (this is the so-called self-avoiding walk tree of the graph \(G\)). We emphasize that this is a model-independent reduction, and depends only upon the fact that the number of spin values is two. The associated self-avoiding walk tree, however, may be exponential in the size of the original graph \(G\), and thus one needs to show that it is sufficient to truncate the tree at a depth logarithmic in the size of \(G\) in order to obtain a good approximation. However, since some of the fixed vertices in the tree might be very close to the root \(v\), it is not possible to argue using weak spatial mixing that a logarithmic depth of recursion suffices for approximating the partition function (because the parameter \(l\) in Eq. (3) must be taken to be the minimum distance of a fixed vertex from the root).

Accordingly, the second component of Weitz’s framework is to establish that, for the spin system in question, weak spatial mixing on the infinite \(d\)-ary tree is in fact equivalent to strong spatial mixing, which roughly states that the exponential decay of point-to-set correlations (3) guaranteed by weak spatial mixing holds also when the spins at an arbitrary set of vertices are fixed to arbitrarily chosen values (see Sect. 2 for a precise definition). Weitz [15] established this fact for the hard core model, using a step-by-step comparison of ratios of occupation probabilities on the standard \(d\)-ary tree and on the modified tree with fixed vertices. It was claimed in [15] that such a result holds also for the anti-ferromagnetic Ising model, but to the best of our knowledge no proof of this fact (except in the special zero-field case where \(\lambda =1\); see [11, 16]) has so far been published.

1.2 Contributions

In this paper, we give a proof of the fact that for the anti-ferromagnetic Ising model with any field, weak spatial mixing implies strong spatial mixing on the \(d\)-ary tree. Formally, we have the following theorem.

Theorem 1

For the anti-ferromagnetic Ising model with arbitrary field on the \(d\)-ary tree with \(d\ge 2\), weak spatial mixing implies strong spatial mixing.

Remark

We point out a byproduct of our approach that may be of independent interest. The exact value of the critical field \(\lambda _c(\beta ,d)\) as a function of \((\beta ,d)\) is apparently widely accepted folklore, but the only derivation we could find for it in the literature [4, p. 255] does not provide a formal proof, appealing instead to numerical evidence. We emphasize that our proof of strong spatial mixing does not assume knowledge of \(\lambda _c(\beta ,d)\), and in fact derives its value as a byproduct (see the Remark following Theorem 3 for more details). Thus our approach gives a proof of the location of the uniqueness threshold \(\lambda _c(\beta ,d)\) of two-state anti-ferromagnetic spin systems.

Notice that it is easy to see that in the statement of Theorem 1, “infinite \(d\)-ary tree” can be replaced by “infinite \((d+1)\)-regular tree”, since the \((d+1)\)-regular tree and the \(d\)-ary tree differ only in the degree of the root. Given Weitz’s general reduction described above, we obtain as an almost immediate consequence an FPTAS for the partition function of the anti-ferromagnetic Ising model on graphs of maximum degree at most \(d+1\) throughout the uniqueness region of the Gibbs measure on the \(d\)-ary tree.

Corollary 1

Let \(d \ge 2\). Consider an anti-ferromagnetic Ising model with parameters \(\beta \) and \(\lambda \). For \(\beta \) and \(\lambda \) in the interior of the uniqueness region of the \(d\)-ary tree, every graph of degree at most \(d+1\) exhibits strong spatial mixing. Moreover, for such \(\beta \) and \(\lambda \), there is a deterministic fully polynomial time approximation scheme for the partition function of the associated spin system on graphs of degree at most \(d+1\).

By the translation described in Appendix 1, we can extend this result to general two-state anti-ferromagnetic spin systems. The difference is that the critical activity may now differ for vertices of different degrees. Let \(\lambda _c(\beta ,d)\) be the critical activity for the anti-ferromagnetic Ising model described above (and defined formally in Sect. 2.2.1). Then, we have the following corollary.

Corollary 2

Let \(d \ge 2\). Consider an anti-ferromagnetic two-state spin system with parameters \(\beta \), \(\gamma \) and \(\lambda \). Let \(\beta '\) be the edge potential for the equivalent anti-ferromag-netic Ising model. Let \(\fancyscript{G}\) be the class of graphs with maximum degree \(d + 1\) in which every vertex \(v\) satisfies the condition

$$\begin{aligned} |\log \lambda _v| \triangleq \left| \log \lambda + \frac{\mathrm{deg}{(v)}}{2}(\log {\gamma } - \log {\beta })\right| > \log \lambda _c(\beta ',d). \end{aligned}$$

Then there is a deterministic fully polynomial time approximation scheme for the partition function of the associated spin system on graphs in the class \(\fancyscript{G}\). In particular, the class \(\fancyscript{G}\) includes all \((d+1)\)-regular graphs when \(\beta , \gamma \) and \(\lambda \) are in the interior of the uniqueness region of the \(d\)-ary tree.

We briefly sketch the approach we use to prove our main technical result, namely that weak spatial mixing implies strong spatial mixing (Theorem 1). Inspired by recent work of Restrepo et al. [11], we design a “message” (i.e., an invertible function of the probability of a vertex having spin \(+\) ) such that “disagreements” in the message decay by a constant factor at each vertex of the tree. The challenge is to ensure that such a message can be designed for all points in the uniqueness region of the \(d\)-ary tree. For the special zero-field case of the anti-ferromagnetic Ising model (when \(\lambda = 1\)), such a message is well known [16]. However, this message does not work up to the threshold for general vertex potentials \(\lambda \). Restrepo et al. [11] recently derived a message which works up to the tree uniqueness threshold for the hard-core model. For the general anti-ferromagnetic Ising model, such a message turns out to be more complex than those known for the zero-field case and for the hard-core model. Our message is defined at the beginning of Sect. 3, and the requisite decay property is established in Sect. 4.

We conjecture that our proof of strong spatial mixing based on stepwise decay of messages may lead to further consequences. For example, as shown by Restrepo et al. [11], the message decay property can be used to extend Weitz’s algorithm by exploiting the structure of special classes of graphs to obtain approximation algorithms beyond the tree threshold for those graphs. In addition, our proof demonstrates the versatility of the message approach introduced by Restrepo et al.

Remark

After obtaining our message-decay proof, we received a sketch of Weitz’s original unpublished proof [14]. It is interesting to note that that proof is quite different from ours, and employs a delicate two-step analysis of the tree recursion described in Sect. 2. For reasons mentioned above, we believe that our message-decay proof, in addition to being the first published version of this result, is potentially more robust and flexible than Weitz’s approach; for example, it is not clear how to adapt Weitz’s analysis to obtain stronger results for special classes of graphs such as lattices, as is done in [11].

1.3 Related Work

Our work is mainly motivated by the deterministic counting algorithm of Weitz [15], which was the first to show an interesting connection between the running time of an algorithm not related to Markov chain Monte Carlo and the phase transition phenomenon for spin systems. On the complexity side, using a randomized gadget first proposed by Dyer, Frieze and Jerrum [2] and analysed further by Mossel, Weitz and Wormald [10], Sly [12] proved that if there is an FPRAS for the partition function of the hard-core model on graphs of degree at most \(d\) in the non-uniqueness region of the \(d\)-regular tree, then \(\mathrm {NP}= \mathrm {RP}\), thus showing that the range of validity of Weitz’s algorithm is optimal. Technically Sly’s result holds only sufficiently close to the boundary of the uniqueness region; this restriction was mostly removed in a later paper of Galanis et al. [3].

For two-state anti-ferromagnetic spin systems on unbounded degree graphs, Goldberg, Jerrum and Paterson [6] showed that approximating the partition function for the zero-field case (\(\lambda = 1\)) is NP-hard in the interior of the square \(0 \le \beta , \gamma \le 1\). For bounded degree graphs, subsequent to the results of this paper, Sly and Sun [13] have shown that there is no FPRAS for the partition function of the anti-ferromagnetic Ising model on \(d\)-regular graphs in the non-uniqueness regime of the \(d\)-regular tree, unless \(\mathrm {NP}= \mathrm {RP}\). The results of [13] taken together with the results of this paper therefore strengthen the connections between the theory of NP-hardness and the phase transition phenomenon first established by the results of Weitz [15] and Sly [12] cited above.

A related problem is to get exponential lower bounds on the mixing time of any local Markov chain (Glauber dynamics) that samples from the hard-core and anti-ferromagnetic Ising models. Mossel, Weitz and Wormald [10] and Gerschenfeld and Montanari [5] showed that beyond the uniqueness threshold for \(d\)-regular trees, Glauber dynamics for these models can take exponential time to mix on \(d\)-regular graphs. Gerschenfeld and Montanari also show that for these models on random regular graphs, this threshold for slow mixing is also the threshold beyond which the reconstruction problem on the graph is solvable.

On the algorithmic side, an analysis of Weitz’s algorithm for the zero-field case of the anti-ferromagnetic Ising model appears in [16]. There has been some subsequent progress on the hard-core model on special classes of graphs too: recently, Restrepo et al. [11] used a message-decay proof to get improved strong spatial mixing thresholds on the \(2\)D integer lattice for the hard core model. They achieved this by exploiting the special structure of self-avoiding walk trees obtained when Weitz’s reduction is applied to the lattice. The message-decay proof turns out to be crucial in tightening the analysis to obtain strong spatial mixing over a wider range of parameters for these special trees. Much more is known about algorithms for the ferromagnetic case: Jerrum and Sinclair [7] gave an FPRAS for the Ising model with arbitrary field on graphs of arbitrary degree, while Goldberg, Jerrum and Paterson [6] showed how to extend this to the whole of the ferromagnetic region \(\beta \gamma > 1\) with \(\lambda = 1\). The latter paper [6] also gave an FPRAS for the partition function on graphs of arbitrary degree for parts of the anti-ferromagnetic region \(\beta \gamma < 1\). However, the results of [6] when restricted to bounded degree graphs do not hold throughout the uniqueness region and hence are incomparable to ours.

More recently, the work of [6] has been improved upon by Li, Lu and Yin [9]. They consider two-state anti-ferromagnetic spin systems with zero field (\(\lambda = 1\)) on general graphs, and derive a condition under which an FPTAS exists for approximating the partition function. Their condition requires that \((\beta , \gamma )\) lies in the intersection of the uniqueness regions for all possible degrees \(d\). (Note that in the \((\beta , \gamma )\) parameterization, this is a non-trivial region of the \(\beta \gamma \)-plane.) However, for any fixed degree \(d\), this region is smaller than the uniqueness region for \(d\), which is the range of validity obtained in our present paper. We also point out an important qualitative difference between the parameterization we use (edge potential \(\beta \) and vertex field \(\lambda \)) and the parameterization via two edge potentials (\(\beta \) and \(\gamma \)) used by Li  et al.: while the \((\beta ,\gamma )\) parametrization is not monotonic, in the sense that uniqueness on a \(d\)-regular tree does not imply uniqueness on a \((d-1)\)-regular tree, the \((\beta , \lambda )\) parametrization is monotonic. In fact, as our results show, uniqueness on the \(d\)-regular tree implies strong spatial mixing, and hence uniqueness, on all \(d'\)-regular trees for \(d'\le d\) in the \((\beta , \lambda )\) parametrization. Subsequent to the results of this paper, Li, Lu and Yin [8] have presented a unified framework for establishing the results of this paper as well as their earlier paper [9]; their new approach also improves upon our Corollary 2 in the case of non-regular graphs.

2 Preliminaries

2.1 Notation and Terminology

We will mostly follow the notational conventions of [6]. Given a graph \(G = (V, E)\), a two-state spin configuration is defined as an assignment \(\sigma : V \rightarrow \left\{ +~, {-}~\right\} \) of spins to the vertices. Weights for different configurations are computed in terms of the \((+,+)\)-edge activity \(\beta \), the \((-,-)\) edge activity \(\gamma \) and a vertex activity \(\lambda \), and are given by

$$\begin{aligned} w(\sigma ) = \lambda ^{m(\sigma )}\beta ^{n_+(\sigma )}\gamma ^{n_-(\sigma )}, \end{aligned}$$
(4)

where given the configuration \(\sigma \), \(m(\sigma )\) denotes the number of vertices assigned spin \(-\), and \(n_+(\sigma )\) (respectively, \(n_-(\sigma )\)) denotes the number of edges for which both endpoints are assigned spin \(+\) (respectively, \(-\)). The partition function is defined as

$$\begin{aligned} Z = \sum _{\sigma \in \left\{ -1,1\right\} ^V}w(\sigma ). \end{aligned}$$

We remark that this representation can be easily translated to the usual description in terms of edge potentials and vertex field: for completeness we give the translation in Appendix 1.

Definition 1

(Occupation probability) Given a vertex \(v\) in the graph \(G\), the occupation probability \(p_v\) is the probability that \(v\) is assigned spin \(+\) in a random configuration \(\sigma \) sampled according to the weights defined in Eq. (4).

The standard notion of efficient approximation in complexity theory is that of a fully polynomial time approximation scheme, which we define below in relation to the computation of the partition function.

Definition 2

(Fully polynomial time approximation scheme) Consider a two state spin system with parameters \(\beta ,\gamma \) and \(\lambda \). A deterministic algorithm \(\fancyscript{A}\) for approximating the partition function \(Z\) of the model is called a fully polynomial approximation scheme (FPTAS) if, given an input graph \(G\) and accuracy parameter \(\epsilon > 0\), the algorithm runs in time polynomial in \(1/\epsilon \) and the size of the graph \(G\) and outputs a quantity \(\hat{Z}\) satisfying

$$\begin{aligned} (1-\epsilon )Z \le \hat{Z} \le (1+\epsilon )Z. \end{aligned}$$
(5)

A closely related notion is that of a fully polynomial randomized approximation scheme (FPRAS). In this case, the algorithm \(\fancyscript{A}\) is a randomized algorithm, and takes as input a graph \(G\), an accuracy parameter \(\epsilon > 0\) and a confidence parameter \(\delta > 0\), runs in time polynomial in \(1/\epsilon \), \(\log (1/\delta )\) and the size of the graph \(G\), and outputs a quantity \(\hat{Z}\) which satisfies the condition in Eq. (5) with probability at least \(1 - \delta \).

2.2 The Ising Model

The Ising model corresponds to the case \(\beta =\gamma \). The model is ferromagnetic when \(\beta > 1\) and anti-ferromagnetic when \(\beta < 1\) (the case \(\beta = 1\) is trivial). The zero-field case corresponds to \(\lambda = 1\), the positive field case to \(\lambda < 1\) and the negative field case to \(\lambda > 1\). As shown in Appendix 1, on \(d\)-regular graphs the Ising model is equivalent to general two-state spin systems. Thus, in the rest of this paper, we will concentrate mostly on the Ising case. On non-regular graphs the equivalence still holds; however, the vertex activity \(\lambda \) in the Ising model may then be different on different vertices. The adaptation of our results to this setting is described in Corollary 2.

2.2.1 Phase Transition

The anti-ferromagnetic Ising model exhibits a uniqueness phase transition on the \(d\)-ary tree for \(d \ge 2\). In particular, one can show the existence of a critical activity \(\lambda _c(\beta ,d)\) as expressed in the following classical result (see, e.g., Georgii [4, p. 254]).

Theorem 2

(Existence of a critical activity) Consider the anti-ferromagnetic Ising model on an infinite \(d\)-ary tree with edge activity \(\beta \) and vertex activity \(\lambda \). If \(\beta \ge \frac{d-1}{d+1}\) then the Gibbs measure is unique for all values of \(\lambda \). If \(\beta < \frac{d-1}{d+1}\), then there exists a critical activity \(\lambda _c(\beta ,d) \ge 1\) such that the Gibbs measure is unique if and only if \(\left| \log \lambda \right| \ge \log \lambda _c(\beta ,d)\).

A consequence of uniquenessFootnote 6 of the Gibbs measure is weak spatial mixing, which captures a weak notion of decay of point to set correlations. Let \(p_\rho (\sigma , S)\) be the probability of occupation of the root \(\rho \) of an infinite \(d\)-ary tree when the spins of a set \(S\) of nodes are fixed according to the configuration \(\sigma \). Let \(\delta (\rho ,S)\) denote the distance of \(\rho \) from the set \(S\).

Definition 3

(Weak spatial mixing) Given any two-state spin system, weak spatial mixing is said to hold if for any set \(S\) whose distance \(\delta (\rho , S)\) from the root \(\rho \) of the tree is finite, and any two configurations \(\sigma _1\) and \(\sigma _2\), we have

$$\begin{aligned} |p_\rho (\sigma _1, S) - p_\rho (\sigma _2, S)| \le \exp (-\Omega (\delta (\rho ,S))). \end{aligned}$$

Notice that weak spatial mixing does not guarantee exponential decay of correlations when the set \(S\) contains vertices which are very close to the root \(\rho \), even when \(\sigma _1\) and \(\sigma _2\) differ only on vertices which are very far away from \(\rho \). A related but, as the name suggests, stronger notion is that of strong spatial mixing, which captures the idea that fixing vertices near the root to the same spin should not affect the exponential decay of point-to-set correlations. We note that strong spatial mixing is not in general implied by weak spatial mixing for arbitrary spin systems; see Appendix 2 for a counterexample involving the ferromagnetic Ising model with appropriate parameters.

Definition 4

(Strong spatial mixing)  Given any two-state spin system, strong spatial mixing is said to hold if for any set \(S\) whose distance \(\delta (\rho , S)\) from the root \(\rho \) of the tree is finite, and any two configurations \(\sigma _1\) and \(\sigma _2\) which differ only on a set \(T\subseteq S\) of vertices, we have

$$\begin{aligned} |p_\rho (\sigma _1, S) - p_\rho (\sigma _2, S)| \le \exp (-\Omega (\delta (\rho ,T))). \end{aligned}$$

2.2.2 Phase Transition and Tree Recursions

It is well known (see, for example, [4]) that the uniqueness condition for two-state spin systems on \(d\)-ary trees can be written in terms of the number of fixed points of the recursion for occupation probabilities. Consider a subtree rooted at a vertex \(v\) in the \(d\)-ary tree, and let \(v_i, i = 1,2,...d\) be its children. Let \(p_v\) be the occupation probability at vertex \(v\) and define \(R_v = \frac{1 - p_v}{p_v}\). One can then write the following recurrence for \(R_v\):

$$\begin{aligned} R_v = \lambda \prod _{i=1}^d \left( \frac{\beta R_{v_i} + 1}{\beta + R_{v_i}}\right) . \end{aligned}$$

This can easily be converted to a recurrence for occupation probabilities. Define

$$\begin{aligned} h(x) \triangleq \frac{\beta + (1-\beta )x}{1-(1-\beta )x}. \end{aligned}$$

We can then write the recurrence as

$$\begin{aligned} p_v = F(p_{v_1}, p_{v_2},\ldots , p_{v_d}) \triangleq \frac{1}{1 + \lambda \prod _{i=1}^dh(p_{v_i})}. \end{aligned}$$
(6)

We will find it useful in what follows to consider the tree recurrence with the special boundary condition in which all vertices at some distance \(l\) from the root are fixed to the same spin. In this case, by symmetry, the tree recurrence outputs the same value \(p_v\) at all vertices \(v\) which are at the same distance from the root. Thus, the recurrence can be simplified to a one-parameter recurrence as follows:

$$\begin{aligned} p_v = f(p_{v_1}) \triangleq \frac{1}{1 + \lambda h(p_{v_1})^d}. \end{aligned}$$

Note that in the anti-ferromagnetic case, \(h\) is an increasing function, and hence \(F\) and \(f\) are decreasing in each of their arguments. We also note that since \(f\) is strictly decreasing in \(\left[ 0,1\right] \), it has a unique fixed point.

In terms of the recurrence function \(f\), the condition for uniqueness can be stated as follows.

Theorem 3

([4]) For given values of \(\beta \) and \(\lambda \), the infinite \(d\)-ary tree has a unique Gibbs measure if and only if the two-step recurrence function \(f\circ f\) has a unique fixed point. In particular, if the Gibbs measure is unique, and \((\beta , \lambda )\) are not on the boundary of the uniqueness region, then the unique fixed point \(x^\star \) of \(f\) satisfies

$$\begin{aligned} f'(x^\star )&> -1. \end{aligned}$$
(7)

Remark

In [4], it is claimed (implicitly) on the basis of numerical simulations that the condition (7) is also sufficient for uniqueness. To be precise, the expression for the critical activity \(\lambda _c(\beta , d)\) given in [4, p. 255] is exactly the same as that obtained by assuming that (7) is also a sufficient condition for uniqueness. While we believe this fact to be folklore, we have not been able to find a rigorous proof of it in the literature. With a slight abuse of terminology, we will henceforth refer to the set of \((\beta , \lambda )\) for which the fixed point \(x^\star \) satisfies \(f'(x^\star ) > -1\) as the “uniqueness region”. We will justify this terminology later (see the Remark following the proof of Theorem 1 in Sect. 4) by proving that condition (7) does indeed imply uniqueness. Thus we will obtain a rigorous proof of the expression for the critical activity appearing in [4].

2.3 Messages

Definition 5

(Message) A message is a continuously differentiable function \(\phi : [0,1] \rightarrow \mathbb {R}\) with positive derivative.

Note that a message is strictly increasing and hence invertible on its range. Moreover, the inverse function \(\phi ^{-1}\) is also a continuously differentiable function with positive derivative.

Given a recurrence function \(f: [0,1] \rightarrow [0,1]\), and a message \(\phi \), we denote by \(f^\phi \) the function \(\phi \circ f \circ \phi ^{-1}\). The function \(f^\phi \), which will play a crucial role in this paper, describes the evolution of the message \(\phi \) under the recurrence, in the sense that \(f^\phi (\phi (x)) = \phi (f(x))\). We will also need the following fact.

Notation In the following, we will find it convenient to use the following shorthand notations: we will denote the function \(f^\phi \) as \(g\), and the inverse \(\phi ^{-1}\) of \(\phi \) as \(\psi \).

Fact 4

For any message \(\phi \), the parameters \((\beta , \lambda )\) are in the uniqueness region if and only if \(g'(p^\star ) > -1\) at the unique fixed point \(p^\star \) of \(g = f^\phi \).

Proof

Notice that since \(\phi \) is strictly increasing, and \(f\) has a unique fixed point \(x^\star \), \(g = f^\phi \) also has a unique fixed point \(p^\star = \phi (x^\star )\). Now, we notice that \(g'(p^\star ) = f'(x^\star )\), because

$$\begin{aligned} g'(p^\star )&= \phi '(f(\psi (p^\star )))f'(\psi (p^\star ))\psi '(p^\star )\\&= \phi '(f(x^\star ))f'(x^\star )\frac{1}{\phi '(\psi (p^\star ))}\\&= \frac{\phi '(x^\star )}{\phi '(x^\star )}f'(x^\star )\\&= f'(x^\star ), \end{aligned}$$

where in the second line we used the facts that \(\psi (p^\star ) = x^\star \) and \(\psi '(y) = \frac{1}{\phi '(\psi (y))}\), and in the third line the fact that \(f(x^\star ) = x^\star \). Thus, \((\beta , \lambda )\) are in the uniqueness region (as defined in the Remark following Theorem 3) if and only if \(g'(p^\star ) = f'(x^\star ) > -1\).

2.4 Weitz’s Tree Reduction

As indicated in the introduction, Weitz [15] proved the following combinatorial reduction (see [15] for a formal definition of the self-avoiding walk tree used in this theorem).

Theorem 5

([15]) Let \(G\) be a graph of degree at most \(d\), and let \(v\) be a vertex in \(G\). For any two-state spin system on \(G\), the occupation probability \(p_v\) in \((G)\) is equal to the occupation probability of \(v\) in the self-avoiding walk (SAW) tree (which is also of degree at most \(d\)) rooted at \(v\). Further, if the SAW tree exhibits strong spatial mixing, then there is a deterministic fully polynomial time approximation scheme for the partition function of the spin system on graphs of degree at most \(d\).

3 Messages and Contraction on the \(d\)-ary Tree

In this section, we will prove the main technical ingredient of our result, which is expressed in the following theorem.

Theorem 6

Given \(d, \beta \) and \(\lambda \), there exists a message \(\phi \) and a constant \(c < 1\), such that the tree recurrence \(g = f^\phi \) for the quantity \(\phi (p_v)\) satisfies \(\Vert g' \Vert _{\infty } \le c < 1\), whenever \((\beta , \lambda )\) is in the uniqueness region for the \(d\)-ary tree.

The above theorem says that in the uniqueness region, the single-parameter recurrence \(g=f^\phi \) for the message \(\phi (p_v)\) contracts at every step. (Without the message, the function \(f\) itself is not contractive.) This stepwise contraction is easily seen by standard arguments to imply weak spatial mixing; for completeness, we give a proof in the Section titled “Contraction and Weak Spatial Mixing” in Appendix 3. To extend the argument to strong spatial mixing, as required for Theorem 1, we need to consider a multi-parameter (vectorized) version of the message recurrence \(g\), since under arbitrary boundary conditions the occupation probabilities need not be uniform. We will show in Sect. 4 that for our message \(\phi \) in Theorem 6, the analysis of the vectorized version can in fact be reduced to an application of Theorem 6.

Remark

For ease of notation, in the rest of the paper we will prove our results in terms of the uniqueness threshold of the \(d\)-ary tree, relating it to algorithms on graphs of degree at most \(d+1\). As already noted, the uniqueness thresholds on the \((d+1)\)-regular tree and the \(d\)-ary tree coincide, and hence our results apply equally to the infinite \((d+1)\)-regular tree.

We begin by setting up some notation for the proof of Theorem 6. In light of Fact 4, the main technical challenge is to come up with a message \(\phi \) such that the quantity \(\left| g'\right| \) is maximized at the unique fixed point of \(g = f^\phi \). Let us fix constants

$$\begin{aligned} A = d(1-\beta ^2) + (1-\beta )^2\text { and }D = \frac{\sqrt{A + 4\beta } - \sqrt{A}}{2\sqrt{A}}. \end{aligned}$$

Define

$$\begin{aligned} \phi (x) = \log \left( \frac{x + D}{1-x + D}\right) . \end{aligned}$$

Notice that \(D > 0\), so \(\phi \) is a continuously differentiable function with positive derivative on the interval \(\left[ 0,1\right] \). Using this message we are able to prove the following (recall that \(\psi \) denotes \(\phi ^{-1}\)).

Lemma 1

Consider the anti-ferromagnetic Ising model on a \(d\)-ary tree with edge activity \(\beta \) and vertex activity \(\lambda \). Then, using the shorthand notations \(\alpha \) and \(\eta \) for \(\psi (x)\) and \(f(\psi (x))\) respectively, we have

$$\begin{aligned} g''(x)&= (\eta - \alpha ){g'(x)\psi '(x)}\nonumber \\&\qquad \times {\frac{d\beta (1-\beta ^2)(2\beta + A\alpha \eta + A(1-\alpha )(1-\eta ))}{(\beta + \alpha (1-\alpha )(1-\beta )^2)(\beta + A\eta (1-\eta ))(\beta + A\alpha (1-\alpha ))}}. \end{aligned}$$
(8)

The proof of Lemma 1 is somewhat technical and is deferred to Sect. 5.

Before proceeding with the technical development, we pause to give some comments on the design of our message. Notice that the requirement that the derivative of the function \(g = f^\phi \) should have its maximum magnitude at the unique fixed point of \(g\) does not immediately lead to a solution for \(\phi \), and thus we must resort to some educated guesswork for the functional form of \(\phi \). Our choice is guided by the intuition that, by analogy with the zero field case, where it is well known that the simple message \(\phi (x) = \log \left( \frac{x}{1-x}\right) \) is sufficient, a log ratio of probabilities shifted by an additive constant \(D\) to account for the field should be appropriate. The choice of \(D\) is then determined by the above requirement. An important additional property of our message is that, perhaps surprisingly, it does not depend upon the vertex potential \(\lambda \), but only upon the edge potential \(\beta \) and the degree \(d\); this is reflected in the fact that the additive shift \(D\) is the same for both probabilities. This property will be important in extending our algorithm to the setting of general two-state anti-ferromagnetic spin systems in Corollary 2.

Lemma 2

Let \(g = f^\phi \), with the message \(\phi \) defined above. Then \(|g'(x)|\) is maximized at the unique positive fixed point of \(g\).

Proof

We use the notation established in Lemma 1, where we derived an expression for \(g''(x)\) in Eq. (8). It is easy to see that ignoring the factor \((\eta - \alpha )\), the rest of the right hand side of Eq. (8) is negative: this is because \(g\) is a decreasing function, while \(\psi \), being the inverse of the increasing function \(\phi \), is increasing. Also, we have \(0 < \beta < 1\) (in the anti-ferromagnetic case) and \(0 \le \alpha ,\eta \le 1\) (since they are probabilities), so that the fractions appearing on the right hand side are positive.

Let \(x^\star \) be the unique fixed point of the strictly decreasing function \(g\). From the above discussion, it follows that the sign of \(g''(x)\) is the opposite of the sign of \(\eta - \alpha = f(\psi (x)) - \psi (x)\). Notice that \(\eta - \alpha \) is strictly positive for \(x < x^\star \) and strictly negative for \(x > x^\star \). This implies that \(g'(x)\) is strictly decreasing for \(x < x^\star \) and strictly increasing for \(x > x^\star \). Since \(g\) is strictly decreasing this shows that the magnitude of \(g'\) is maximized at \(x^\star \).

Combining Lemma 2 with Fact 4, we immediately get Theorem 6. Lemma 2 further implies that the constant \(c\) in the Theorem is \(|g'(x^\star )|\), where \(x^\star \) is the unique fixed point of \(g\).

4 Strong Spatial Mixing on the \(d\)-ary Tree

In this final section we use the message defined in the previous section to prove our main result, Theorem 1. Along with Weitz’s reduction stated in Theorem 5, this will immediately imply Corollary 1, the FPTAS for the anti-ferromagnetic Ising model with arbitrary fields. To derive Corollary 2 for general two-state spin systems, we will need the translation described in Appendix 1. Both these latter proofs appear at the end of this section.

Recall that Theorem 1 asserts that weak spatial mixing implies strong spatial mixing. We already showed in Theorem 6 that in the uniqueness region, there is uniform contraction in the tree recurrence with uniform boundary conditions. However, in order to prove strong spatial mixing, we will need to handle non-uniform boundary conditions as well, in which case the one-parameter recurrence \(g = f^\phi \) is no longer sufficient. We therefore consider the multi-parameter vectorized version \(G\) of the function \(g\). For \(\varvec{x} \in \mathbb {R}^d\), \(G(\varvec{x})\) is defined as

$$\begin{aligned} G(x_1,x_2,\ldots ,x_d) = \phi \left( \frac{1}{1+\lambda \prod _{i=1}^d h\left( \psi (x_i)\right) }\right) , \end{aligned}$$
(9)

where, as before, \(\psi = \phi ^{-1}\). We claim that strong spatial mixing on the \(d\)-ary tree is implied whenever the function \(G\) satisfies the following condition; a proof of this implication can be found in the Section titled “Contraction and Strong Spatial Mixing” in Appendix 3.

Definition 6

(Contractive spatial mixing) Given the parameters \(\beta \) and \(\lambda \) for the anti-ferromagnetic Ising model on a \(d\)-regular tree, contractive spatial mixing holds if there exists a constant \(c < 1\) such that

$$\begin{aligned} \left| G(\varvec{x}) - G(\varvec{y})\right| \le c\Vert \varvec{x}-\varvec{y} \Vert _{\infty }, \end{aligned}$$

for the vectorized version \(G\) of \(g\) defined as above with respect to the message \(\phi \).

To establish this condition, we will rely on the following lemma.

Lemma 3

Let \(\eta = \psi (G(\varvec{x}))\). Let \(\bar{\eta }\) be the unique solution of \(\psi (g(\bar{\eta })) = \eta \). Then \(\Vert \nabla G(\varvec{x}) \Vert _{1} \le \Vert \nabla G(\bar{\eta },\bar{\eta },\ldots ,\bar{\eta }) \Vert _{1} = |g'(\eta )|\).

Proof

Set \(\alpha _i = \psi (x_i)\) for \(i=1,2,\ldots d\). We then have

$$\begin{aligned} \eta = \frac{1}{1 + \lambda \prod _{i=1}^d h(\alpha _i)} = \frac{1}{1 + \lambda h(\psi (\bar{\eta }))^d}. \end{aligned}$$
(10)

Recalling the definitions of the quantities \(A\) and \(D\) given just before Lemma 1, we can now write \(\Vert \nabla G(\varvec{x}) \Vert _{1}\) as

$$\begin{aligned} \Vert \nabla G(\varvec{x}) \Vert _{1} = \frac{d\eta (1-\eta )(1-\beta ^2)}{\beta +A\eta (1-\eta )}\left( 1 + (1-\beta ^2) \sum _{i=1}^d\frac{\alpha _i(1-\alpha _i)}{\beta + (1-\beta )^2\alpha _i(1-\alpha _i)}\right) . \end{aligned}$$
(11)

For notational convenience, we define the function \(J(x) \triangleq \frac{x(1-x)}{\beta + (1-\beta )^2x(1-x)}\). Note that maximizing the sum in (11) under the constraint (10) is the same as maximizing \(\sum _{i=1}^{d}J(\alpha _i)\) under the constraint that \(\prod _{i=1}^dh(\alpha _i) = \frac{1-\eta }{\lambda \eta }\). Since \(h\) is positive and invertible, it is therefore sufficient to show that the function \(K(x) \triangleq J(h^{-1}(e^x))\) is concave in order to show that all \(\alpha _i\)’s are equal at a maximum. We now show this by direct computation. After differentiating twice and simplifying, we have

$$\begin{aligned} K''(x) = -\frac{e^{-x}(1+e^{2x})\beta }{(1-\beta ^2)^2} < 0. \end{aligned}$$

This shows that \(K\) is concave. By the discussion above, it follows that the sum in Eq. (11) is maximized when all \(\alpha _i\)’s are equal. In conjunction with the condition that \(\eta = \frac{1}{1 + \prod _{i=1}^dh(\alpha _i)}\), this shows that

$$\begin{aligned} \Vert \nabla G(\varvec{x}) \Vert _{1} \le \Vert \nabla G(\bar{\eta },\bar{\eta },\ldots ,\bar{\eta }) \Vert _{1}. \end{aligned}$$

Note that for any \(x\), \(G(x,x,...,x)\! =\! g(x)\), and therefore \(\Vert \nabla G(\bar{\eta },\bar{\eta },\ldots ,\bar{\eta }) \Vert _{1} \!=\! |g'(\bar{\eta })|\).

Using Lemma 2 and the above lemma, we are now ready to prove our main technical result, Theorem 1, which says that weak spatial mixing implies strong spatial mixing for general anti-ferromagnetic Ising models.

Proof of Theorem 1

Consider a setting of parameters \(\beta \) and \(\lambda \) such that the \(d\)-ary tree has weak spatial mixing. Let \(x^\star \) be the unique fixed point of the function \(g\). We will use only the property that the fixed point satisfies the condition (7) of Theorem 3. By Theorem 6 we have \(\Vert g' \Vert _{\infty } = c < 1\). By Lemma 3, this implies that for all \(\varvec{x}\) in the domain of the function \(G\) defined in Eq. (9), \(\Vert \nabla G(\varvec{x}) \Vert _{1} \le c\). Using the mean value theorem followed by Hölder’s inequality, we then have

$$\begin{aligned} \left| G(\varvec{x}) - G(\varvec{y})\right| \le c\Vert \varvec{x}-\varvec{y} \Vert _{\infty }, \end{aligned}$$

for all vectors \(\varvec{x}\) and \(\varvec{y}\) in the domain of \(G\), and thus contractive spatial mixing holds. As observed above, this implies strong spatial mixing.

Remark

We can now justify our use of the term “uniqueness region” as described in the Remark following Theorem 3. Notice that in the proof of Theorem 1 above, we used only the fact that weak spatial mixing implies that \((\beta , \lambda )\) is in the “uniqueness region” as defined in the aforementioned Remark. Thus, we see that whenever \((\beta , \lambda )\) is in the uniqueness region, we have strong spatial mixing, and hence, in particular, uniqueness. As stated earlier, this provides a rigorous proof of the claim in [4] that the interior of the uniqueness region is equivalent to the condition (7).

Combining the above theorem with the general reduction of Weitz [15] stated in Theorem 5, we can now prove Corollary 1, which asserts the existence of an FPTAS for general anti-ferromagnetic Ising models on bounded-degree graphs up to the uniqueness threshold.

Proof of Corollary 1

As observed earlier, in order to obtain an FPTAS for the partition function of the associated spin system, it is sufficient to give an FPTAS for approximating the occupation probability \(p_\rho \) of a vertex \(\rho \), under an arbitrary fixing of spin values for an arbitrary subset of vertices. Given a vertex \(\rho \) in a graph \(G\) of maximum degree \((d+1)\), we start by constructing Weitz’s self-avoiding walk (SAW) tree rooted at \(\rho \). For non-leaf vertices (apart from \(\rho \)) in this tree which do not have \(d\) children, we can create dummy children (so as to make the arity of the vertex \(d\)) all of which independently have occupation probabilities of \(1/2\). It is easy to see that this does not change the output of the tree recurrence (Eq. (6)) at any vertex of the tree. As we saw in the proof of Theorem 1, we have strong spatial mixing on this SAW tree whenever \((\beta , \lambda )\) are in the uniqueness region of the \(d\)-ary tree. The corollary now follows using Weitz’s reduction (Theorem 5).

Finally, we will see how to use Lemmas 1 and 3 to prove Corollary 2, which extends the FPTAS to general two-state anti-ferromagnetic spin systems.

Proof of Corollary 2

Given a two-state spin system with parameters \(\beta , \gamma \) and \(\lambda \) on a graph \(G\) of degree at most \(d + 1\), we can use the translation given in Appendix 1 to come up with an equivalent Ising model with edge potential \(\beta ' = \sqrt{\beta \gamma }\) and vertex-dependent potentials \(\lambda _v = \lambda (\sqrt{\gamma /\beta })^{d_v}\). Now, as before, in order to estimate the occupation probability \(p_\rho \) for a given vertex \(\rho \), we construct Weitz’s self-avoiding walk (SAW) tree rooted at \(\rho \), and complete the degree of any non-leaf vertex (apart from \(\rho \)) in the tree which does not have \(d\) children by attaching dummy children which are fixed to have occupation probability \(\frac{1}{2}\). We now use the message \(\phi \) constructed above for \(d\)-ary trees for the parameter \(\beta '\). By the hypotheses of the corollary, the parameters \((\beta ', \lambda _u)\) at each vertex \(u\) of the SAW tree are in the uniqueness region of the \(d\)-ary tree. Since the message \(\phi \) does not depend upon \(\lambda _u\), Theorem 6 and Lemma 3 apply at each vertex \(u\) of the tree. Thus, as in the proof of Theorem 1, we get contractive spatial mixing and, hence, strong spatial mixing on the SAW tree. Employing Weitz’s reduction (Theorem 5), we have the first part of the corollary.

The claim that the class \(\fancyscript{G}\) in the corollary includes \((d+1)\)-regular graphs when \(\beta \), \(\gamma \) and \(\lambda \) are in the uniqueness region of \(d\)-ary tree follows by noticing that in this case the parameters \(\lambda ' = \lambda _v\) obtained by the translation are the same at each vertex \(v\), and that \(\beta '\) and \(\lambda '\) are in the uniqueness region of the \(d\)-ary tree by the hypotheses of the corollary. Thus, we can complete the proof for this case in the same manner as in the proof of Corollary 1.

5 Proof of Lemma 1

In this section, we prove Lemma 1 from Sect. 3. The proof involves a few somewhat lengthy derivative computations, which we isolate in the following lemma.

Lemma 4

With the notation used in Lemma 1 above, we have

$$\begin{aligned} \frac{\phi ''(x)}{\phi '(x)}&=\frac{A(2x - 1)}{\beta + Ax(1-x)};\end{aligned}$$
(12)
$$\begin{aligned} \frac{h'(x)}{h(x)}&= \frac{1-\beta ^2}{\beta + (1-\beta )^2x(1-x)};\end{aligned}$$
(13)
$$\begin{aligned} \frac{h''(x)}{h'(x)}&= \frac{2(1-\beta )}{1-(1-\beta )x};\end{aligned}$$
(14)
$$\begin{aligned} f'(x)&= -df(x)(1-f(x))\frac{h'(x)}{h(x)};\end{aligned}$$
(15)
$$\begin{aligned} \frac{f''(x)}{f'(x)}&= \frac{f'(x)(1-2f(x))}{f(x)(1-f(x))} + \frac{h''(x)}{h'(x)} - \frac{h'(x)}{h(x)}. \end{aligned}$$
(16)

Proof

(sketch) All of these identities are easily verified by direct computation. In proving Eq. (12), one needs to keep in mind the definition of the constant \(D\).

Proof of Lemma 1

To ease notation, we will suppress the dependence of the quantities \(\eta \) and \(\alpha \) on \(x\). Using the chain rule, we have

$$\begin{aligned} g'(x) = \frac{\phi '(\eta )}{\phi '(\alpha )} f'(\alpha ). \end{aligned}$$

Here, we used the fact that since \(\psi = \phi ^{-1}\), \(\psi '(x) = \frac{1}{\phi '\left( \psi (x)\right) }\). After taking the logarithm, and noticing that the right hand side is more easily expressed as a function of \(\alpha \) rather than of \(x\), one can write the second derivative of \(g\) as

$$\begin{aligned} \frac{1}{\psi '(x)}\frac{g''(x)}{g'(x)} = \frac{\phi ''(\eta )}{\phi '(\eta )}\frac{\text {d}\eta }{\text {d}\alpha } - \frac{\phi ''(\alpha )}{\phi '(\alpha )} + \frac{f''(\alpha )}{f'(\alpha )}. \end{aligned}$$
(17)

We now consider each of the terms involved above. Recalling that \(\eta = f(\alpha )\), and using Eqs. (15) and (16) to expand the first and last terms in Eq. (17) above, we get

$$\begin{aligned} \frac{1}{\psi '(x)}\frac{g''(x)}{g'(x)} = T_1 - T_2, \end{aligned}$$
(18)

where \(T_1\) and \(T_2\) are defined as

$$\begin{aligned} T_1&\triangleq \frac{h''(\alpha )}{h'(\alpha )} - \frac{h'(\alpha )}{h(\alpha )} - \frac{\phi ''(\alpha )}{\phi '(\alpha )}\text {, and}\\ T_2&\triangleq d\frac{h'(\alpha )}{h(\alpha )}\left[ \frac{\phi ''(\eta )}{\phi '(\eta )}\eta (1-\eta ) + 1-2\eta \right] . \end{aligned}$$

Notice that all terms containing \(\eta \) are isolated in \(T_2\). We now consider each of the terms separately. For \(T_1\), we have

$$\begin{aligned} \frac{h''(\alpha )}{h'(\alpha )} - \frac{h'(\alpha )}{h(\alpha )}&= \frac{2(1-\beta )}{1 - (1-\beta )\alpha } - \frac{1-\beta ^2}{\beta + (1-\beta )^2\alpha (1-\alpha )}\\&= \frac{(1-\beta )^2(2\alpha - 1)}{\beta + (1-\beta )^2\alpha (1-\alpha )}. \end{aligned}$$

Here, we used Eqs (14) and (13) in the first line. Now using Eq. (12), we have

$$\begin{aligned} T_1&= \frac{(2\alpha - 1)\left( (1-\beta )^2\left[ \beta + A\alpha (1-\alpha )\right] -A\left[ \beta + (1-\beta )^2\alpha (1-\alpha )\right] \right) }{\left( \beta + (1-\beta )^2\alpha (1-\alpha )\right) \left( \beta + A\alpha (1-\alpha )\right) }\\&=\frac{\beta (2\alpha -1)((1-\beta )^2 - A)}{\left( \beta + (1-\beta )^2\alpha (1-\alpha )\right) \left( \beta + A\alpha (1-\alpha )\right) }\\&= \frac{-d\beta (2\alpha - 1)}{\left( \beta + A\alpha (1-\alpha )\right) }\frac{h'(\alpha )}{h(\alpha )}. \end{aligned}$$

Here, we use \(A = d(1-\beta ^2) + (1-\beta )^2\), followed by Eq. (13) in the last line.

We now consider \(T_2\). Again using Eq. (12), we have

$$\begin{aligned} T_2&= d\frac{h'(\alpha )}{h(\alpha )}\left[ \frac{A(2\eta -1)\eta (1-\eta )}{\beta + A\eta (1-\eta )} - (2\eta - 1)\right] \\&= \frac{-d\beta (2\eta -1)}{\beta + A\eta (1-\eta )}\frac{h'(\alpha )}{h(\alpha )}. \end{aligned}$$

Notice that modulo the \(\frac{h'(\alpha )}{h(\alpha )}\) factor, \(T_1\) and \(T_2\) have the same functional form as functions of \(\alpha \) and \(\eta \) respectively. In fact, the message \(\phi \) is designed so as to make this possible. We can now substitute these values into Eq. (18) to get

$$\begin{aligned} g''(x)&= d\beta g'(x)\psi '(x)\frac{h'(\alpha )}{h(\alpha )} \left[ \frac{2\eta -1}{\beta + A\eta (1-\eta )} - \frac{2\alpha -1}{\beta +A\alpha (1-\alpha )}\right] \\&= d\beta g'(x)\psi '(x)\frac{h'(\alpha )}{h(\alpha )}\frac{(\eta -\alpha )(2\beta + A(\alpha \eta + (1-\alpha )(1-\eta )))}{\left( \beta + A\alpha (1-\alpha )\right) \left( \beta + A\eta (1-\eta )\right) }\\&=(\eta - \alpha )g'(x)\psi '(x) \frac{d\beta (1-\beta ^2)(2\beta + A\alpha \eta + A(1-\alpha )(1-\eta ))}{(\beta + \alpha (1-\alpha )(1-\beta )^2)(\beta + A\eta (1-\eta ))(\beta + A\alpha (1-\alpha ))}, \end{aligned}$$

where in the last step we used Eq. (13).