1 Introduction

In this paper we study the high temperature behavior of the Sherrington–Kirkpatrick model and more general Ising models, especially with regard to the mixing of the Glauber dynamics (i.e. Gibbs sampling) chain. More precisely, if \(\mu \) is the uniform measure over the hypercube \(\{\pm \, 1\}^n\), we consider a general Ising model of the form

$$\begin{aligned} \frac{{d\nu _0}}{d\mu }(x) =\frac{1}{Z} \exp \left( \frac{1}{2}\langle x, J x \rangle + \langle h, x \rangle \right) \end{aligned}$$
(1)

for an arbitrary symmetric quadratic interaction matrix J and external field \(h \in {\mathbb {R}}^n\), where Z, the partition function, is a normalization constant. Because the evaluation of the partition function Z is a difficult computational task, in practice samples from (1) are generally constructed by simulating a Markov chain such as the Glauber dynamics, where at each step (in discrete time) a site i is chosen uniformly at random from [n] and the spin \(X_i\) is resampled from its conditional law given \(X_{\sim i}\). (Here and throughout, we write \(x_{\sim i}\) for the collection \(\{x_j\}_{j\ne i}\), with similar notation for \(X_{\sim i}\).)
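For concreteness, the following sketch (Python/NumPy; the interaction matrix, external field, initial configuration and step count are illustrative placeholders) simulates this chain for a model of the form (1).

```python
import numpy as np

def glauber_step(x, J, h, rng):
    """One discrete-time Glauber update for the Ising model (1)."""
    n = len(x)
    i = rng.integers(n)                       # choose a site uniformly at random
    # conditional law of X_i given X_{~i}: P(X_i = +1) = (1 + tanh(m)) / 2,
    # with m = sum_{j != i} J_{ij} x_j + h_i (the diagonal only shifts a constant)
    m = J[i] @ x - J[i, i] * x[i] + h[i]
    x[i] = 1 if rng.random() < 0.5 * (1.0 + np.tanh(m)) else -1
    return x

rng = np.random.default_rng(0)
n = 10
J, h = np.zeros((n, n)), np.zeros(n)          # placeholder model parameters
x = rng.choice([-1, 1], size=n)               # arbitrary initial configuration
for _ in range(1000 * n):                     # illustrative number of steps
    x = glauber_step(x, J, h, rng)
```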

The behavior of Glauber dynamics in the Ising model is a classical and well-studied topic with rich connections to structural properties of the Gibbs measure \(\nu _0\) and concentration of measure. As far as sufficient conditions for fast mixing are concerned, one of the most general and well-known situations where rapid mixing is guaranteed is under Dobrushin’s uniqueness condition [7], which requires that \(\Vert J\Vert _{\infty \rightarrow \infty } < 1\), or equivalently that \(\sum _{j} |J_{ij}| < 1\) for every row i. Although this bound is tight in some situations (e.g. the mean-field/Curie–Weiss model), in other situations of interest it is far from tight.

One notable model where Dobrushin’s condition fails in interesting regimes is the celebrated Sherrington–Kirkpatrick (SK) model from spin glass theory [21]. In the SK model, J is given by a rescaled matrix from the Gaussian Orthogonal Ensemble, so that J is symmetric with off-diagonal entries \(J_{ij} \sim {\mathcal {N}}(0, \beta ^2/n)\), where \(\beta > 0\) is a parameter specifying the inverse temperature of the model. Here the expected \(\ell _1\) norm of a row of J is on the order of \(\beta \sqrt{n}\), so that Dobrushin’s uniqueness condition only holds under the restrictive condition \(\beta = O(1/\sqrt{n})\). Nevertheless, the Glauber dynamics are expected to mix rapidly for all sufficiently small constant \(\beta = O(1)\) (i.e. not shrinking with n). We indeed prove this below; see Theorem 11 and Sect. 5.
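The gap between the two conditions is easy to observe numerically. The following sketch (NumPy; the values of n and \(\beta \) are arbitrary) samples an SK interaction matrix and compares the maximal row \(\ell _1\) norm, which governs Dobrushin's condition, with the operator norm appearing in our results.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 1000, 0.1                           # illustrative choices
G = rng.normal(size=(n, n))
J = beta * (G + G.T) / np.sqrt(2 * n)         # symmetric, J_ij ~ N(0, beta^2 / n)
np.fill_diagonal(J, 0.0)

row_l1 = np.abs(J).sum(axis=1).max()          # Dobrushin needs this to be < 1
op = np.linalg.norm(J, 2)                     # our condition involves this norm
print(row_l1, op)                             # ~ beta * sqrt(n) versus ~ 2 * beta
```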

In the classical case of ferromagnetic Ising models on a lattice, it is known that there are close connections between rapid mixing of the Glauber dynamics and functional inequalities such as the log-Sobolev inequality. In a recent breakthrough result, Bauerschmidt and Bodineau [3] proved a form of the log-Sobolev inequality for the SK model at sufficiently high temperature (\(\beta < 0.25\)). More precisely, they proved that if J is a positive semidefinite matrix with operator norm \(\Vert J\Vert _{OP} < 1\), then for any probability measure \(\rho \) on \(\{\pm 1\}^n\),

$$\begin{aligned} {D(\rho || \nu _0)} \lesssim \frac{1}{1 - \Vert J\Vert _{OP}} \sum _{i = 1}^n {\mathbb {E}}_{{\nu _0}} \left| \partial _i \sqrt{\frac{d\rho }{{d\nu _0}}}(X)\right| ^2 \end{aligned}$$
(2)

for any model of the form (1), where \(D(\rho ||{ \nu _0}) = {\mathbb {E}}_{\rho } \log \frac{d\rho }{{d\nu _0}}\) is the relative entropy and \(\partial _i f(x) = f(x_{\sim i}, x_i = 1) - f(x_{\sim i}, x_i = -1)\) is the discrete gradient on the hypercube. By a standard argument (see e.g. [15, 23]), this implies the following Poincaré-type inequality

$$\begin{aligned} \mathrm {Var}(\varphi ) \lesssim \frac{1}{1 - \Vert J\Vert _{OP}} \sum _{i = 1}^n {\mathbb {E}}_{{\nu _0}} \left| \partial _i \varphi (X)\right| ^2 \end{aligned}$$
(3)

as well. Their proof is based upon an explicit decomposition of the measure \(\nu _0\) into a mixture of product measures.

However, in the case of the SK model the estimates (2) and (3) are not known to imply polynomial time bounds on the mixing time (or relaxation time) of the Glauber dynamics. The reason is a subtle discrepancy between different notions of discrete gradients. A simple example which illustrates this is the uniform measure \(\mu _{\mathrm {even}}\) on the set of vertices with even parity, \(\{(x_1,\dots ,x_n) \in \{\pm 1\}^n : \prod _i x_i = 1 \}\). It is not hard to check that the right hand side of (3) remains unchanged if the measure \(\mu _{\mathrm {even}}\) is replaced by the uniform measure \(\mu \); therefore, by the law of total variance, \(\mu _{\mathrm {even}}\) satisfies a Poincaré-type inequality of this form. On the other hand, the Glauber dynamics with respect to this measure is trapped at its initial vertex: flipping any single coordinate changes the parity and thus leaves the support of \(\mu _{\mathrm {even}}\).
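The obstruction is elementary: under \(\mu _{\mathrm {even}}\), conditioning on \(X_{\sim i}\) determines \(X_i\), so every Glauber update resamples a spin to its current value. A few lines of brute force (Python; small n) make this explicit.

```python
import numpy as np
from itertools import product

n = 4
even = [x for x in product([-1, 1], repeat=n) if np.prod(x) == 1]
for x in even:
    for i in range(n):
        # configurations in the support agreeing with x off coordinate i
        same = [y for y in even if y[:i] + y[i + 1:] == x[:i] + x[i + 1:]]
        assert same == [x]                    # the conditional law is a point mass
# hence E[X_i | X_{~i}] = X_i, the Dirichlet form (4) vanishes identically,
# and the Glauber dynamics never leaves its starting vertex
```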

For the Glauber dynamics, having spectral gap \(\gamma \) (or equivalently, relaxation time \(1/\gamma \)) is equivalent to the following Poincaré inequality (see e.g. [23]):

$$\begin{aligned} \mathrm {Var}(\varphi ) \le \frac{1}{\gamma } {\mathcal {E}}_{{\nu _0}}(\varphi ,\varphi ) := \frac{1}{\gamma } {\mathbb {E}}_{{\nu _0}} \sum _{i = 1}^n ({\mathbb {E}}_{{\nu _0}}[\varphi (X) | X_{\sim i}] - \varphi (X))^2 \end{aligned}$$
(4)

where the rhs \({\mathcal {E}}_{{\nu _0}}(\varphi ,\varphi )\) here is the Dirichlet form corresponding to the continuous time Glauber dynamics. For distributions with full support on the hypercube \(\{\pm 1\}^n\), the right hand side of (3) can be realized as the Dirichlet form for a certain continuous-time dynamics, see [17, Equation (3.6)], but these dynamics can have a rate that is super-polynomial (see discussion below). Similarly, the canonical log-Sobolev inequality for the Glauber semigroup in the sense of Gross [10], which implies (4) as well as rapid mixing, replaces the sum on the right hand side of (2) by the Dirichlet form \({\mathcal {E}}_{{\nu _0}}\left( \sqrt{\frac{d\rho }{d{\nu _0}}}, \sqrt{\frac{d\rho }{d{\nu _0}}}\right) \). For some models, the discrepancy between the rhs of (3) and that of (4) is at most a constant factor, and so the difference between the Dirichlet form of the Glauber dynamics and the \(\ell _2\)-norm of the usual discrete gradient can be disregarded.

Unfortunately, for the SK model it turns out the right hand side of (3) can be a factor of \(e^{\Theta (\beta \sqrt{n})}\) larger than the right hand side of (4) in some simple examples. (We give such an example in “Appendix A”.) On the other hand, it is not hard to show that in the reverse direction, the rhs of (4) is never bigger than that of (3) by more than a constant factor, so that (4) is a stronger estimate. This inequivalence of the two Dirichlet forms reflects the fact that the rate of the continuous-time dynamics corresponding to (3) can be as large as \(e^{\Theta (\beta \sqrt{n})}\).

The main result of this paper is a proof of the Poincaré inequality (4) with \(\gamma = 1 - \Vert J\Vert _{OP}\) (for J psd, as before), from which we obtain polynomial bounds on the mixing time of the Glauber dynamics. It is unclear how to obtain such a result from the product measure decomposition used to prove (2), so a key technical idea in our work is the construction of a new decomposition of the measure \(\nu _0\) as a mixture of rank-one Ising models (i.e. where J has rank one). This can be thought of as a natural analogue of the needle decomposition used in convex geometry [12]. A structural theorem of this form, though not used directly in our proof, is formulated in Sect. 4 below. The needle decomposition itself is generated by a natural stochastic process (a version of stochastic localization [8]), and the smooth nature of the decomposition allows us to explicitly analyze the evolution of the Dirichlet form along this process, which yields the result.

In the next section, we formulate and prove our basic Poincaré inequality, Theorem 1. The related “Appendix A” discusses the inequivalence between the Dirichlet forms \({\mathcal {E}}_\nu (\varphi ,\varphi )\) and \({\mathbb {E}}|\nabla \varphi (X)|^2\). Sect. 3 is devoted to the estimate on mixing time, Theorem 11. In Sect. 4 we outline a structural theorem in the spirit of the needle-decompositions mentioned above. Finally, Sect. 5 is devoted to examples.

2 Poincaré inequality

Recall that \(\mu \) denotes the uniform measure on \(\{\pm \, 1\}^n\), and that for a matrix J and a vector h, the Ising measure is defined as

$$\begin{aligned} d \nu _0(x) = \frac{1}{Z} e^{\frac{1}{2}\langle x, J x \rangle + \langle h, x \rangle } d \mu . \end{aligned}$$
(5)

We can assume without loss of generality that J is symmetric and positive definite, which we do henceforth; indeed, since \(x_i^2 = 1\) on the hypercube, replacing J by \(J + c\, \mathrm {Id}\) changes the density (5) only through the normalization constant Z. For any measure \(\nu \) on \(\{\pm \, 1\}^n\), we define the Dirichlet form

$$\begin{aligned} {\mathcal {E}}_{\nu }(\varphi ,\varphi ) = {\mathbb {E}}_{\nu } \sum _{i = 1}^n ({\mathbb {E}}_{\nu }[\varphi (X) \mid X_{\sim i}] - \varphi (X))^2, \end{aligned}$$
(6)

where the associated generator of the Glauber dynamics is

$$\begin{aligned} ({\mathcal {L}}_{\nu } \varphi )(x) = \sum _{i = 1}^n \big ({\mathbb {E}}_{\nu }[\varphi (X) \mid X_{\sim i} = x_{\sim i}] - \varphi (x)\big ). \end{aligned}$$
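For small n, all of these objects can be computed by exhaustive enumeration. The following sketch (NumPy; the random instance is a placeholder) builds the rate matrix of \({\mathcal {L}}_{\nu }\) for a measure of the form (5) and computes its spectral gap, which can be compared against the guarantee of Theorem 1 below.

```python
import numpy as np
from itertools import product

def glauber_generator(J, h):
    """The 2^n x 2^n rate matrix of L_nu for the Ising measure (5), by enumeration."""
    n = len(h)
    X = np.array(list(product([-1, 1], repeat=n)), dtype=float)
    w = np.exp(0.5 * np.einsum('ki,ij,kj->k', X, J, X) + X @ h)
    nu = w / w.sum()
    N = 2 ** n
    L = np.zeros((N, N))
    for i in range(n):
        Y = X.copy(); Y[:, i] *= -1           # flip coordinate i
        idx = np.array([np.flatnonzero((X == y).all(axis=1))[0] for y in Y])
        rate = nu[idx] / (nu + nu[idx])       # chance the resampled spin flips
        L[np.arange(N), idx] += rate
        L[np.arange(N), np.arange(N)] -= rate
    return L, nu

# the spectral gap of -L is the best constant in the Poincare inequality (4)
rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n))
J = 0.2 * (A + A.T) / (2 * np.sqrt(n))        # placeholder interaction matrix
J = J - np.linalg.eigvalsh(J)[0] * np.eye(n)  # psd shift; harmless on the cube
L, nu = glauber_generator(J, rng.normal(size=n))
S = np.diag(np.sqrt(nu)) @ L @ np.diag(1 / np.sqrt(nu))
gap = np.linalg.eigvalsh(-(S + S.T) / 2)[1]   # second-smallest eigenvalue of -L
print(gap, 1 - np.linalg.norm(J, 2))          # Theorem 1: gap >= 1 - ||J||_OP
```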

The main result of this section is a dimension-free Poincaré inequality for \(\nu _0\) under the Glauber dynamics, provided that \(\Vert J\Vert _{OP} < 1\).

Theorem 1

For \(\nu _0\) as in (5) with \(0 \preceq J \prec \mathrm {Id}\) and any test function \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\), we have the following inequality:

$$\begin{aligned} (1 - \Vert J\Vert _{OP})\mathrm {Var}_{\nu _0}(\varphi (X)) \le {\mathcal {E}}_{\nu _0}(\varphi ,\varphi ). \end{aligned}$$

Proof

The proof proceeds by a dynamical approach. It is clearly enough to consider \(\varphi \) with Lipschitz norm 1 and \(\varphi (\mathbf{1})=\varphi (( 1,\ldots ,1))=0\), which implies that \(\Vert \varphi \Vert _\infty \le n\). We will introduce a path of measures

$$\begin{aligned} d \nu _t(x) = e^{c_t + \frac{1}{2}\langle x, J_t x\rangle + \langle q_t + h, x \rangle } d \mu (x)=:F_t(x) d\nu _0(x), \end{aligned}$$
(7)

where \(J_t,q_t\) are processes, adapted to the filtration \({\mathcal {F}}_t\) generated by an n-dimensional Brownian motion \(W_t\), and \(c_t\) is a normalization constant. For \(\nu _t\) as in (7), introduce the barycenter

$$\begin{aligned} a_t := \int x\ d \nu _t(x) \end{aligned}$$
(8)

and the test-function adjusted barycenter

$$\begin{aligned} V_t := \int \varphi (x) (x-a_t)\ d \nu _t{(x)}. \end{aligned}$$
(9)

To define the process \(F_t(x)\), we will first need the following technical result. Denote by \({\mathcal {H}}\) the set of all linear subspaces of \({\mathbb {R}}^n\).

Lemma 2

For every \(\delta > 0\), there exists a function \(C: {\mathbb {R}}^n \times {\mathcal {H}} \rightarrow {\mathbb {M}}_{n \times n}\) which satisfies the following properties. For any linear subspace \(H \subset {\mathbb {R}}^n\),

1. For any v, the matrix C(v, H) is positive semidefinite and \(\mathrm {Im}(C(v,H)) \subseteq H\).

2. The map \(v \mapsto C(v,H)\) is Lipschitz continuous.

3. If \(\dim (H)=d>1\) then \(\text{ Tr }(C(v,H)^2)\ge d-1\).

4. For any v we have

   $$\begin{aligned} {|C(v,H) v| \le \delta } \end{aligned}$$
   (10)

If the continuity assumption is ignored, then one may simply take C(v, H) to be the orthogonal projection onto \(H \cap v^\perp \), and for the sake of intuition, the reader may think of C this way. Otherwise, the actual construction (and proof of the lemma) is postponed to Sect. 2.1.

Fix \(\delta \ll 1\) (possibly depending on n), and let C be the function provided by the above lemma. For a symmetric matrix J, introduce the subspace \(H_J\) spanned by the eigenvectors of J corresponding to positive eigenvalues (when J is positive semidefinite, this is \(\mathrm {Im}(J)\)).

We are finally ready to introduce the dynamics for \(F_t\), as the solution to the system of equations

$$\begin{aligned}&\frac{dJ_t}{dt}= -\ C(V_t,H_{J_t})^2, \qquad J_0=J \end{aligned}$$
(11)
$$\begin{aligned}&d F_t(x) = F_t(x) \langle C(V_t,H_{J_t}) (x-a_t), d W_t \rangle , \qquad F_0(x) = 1, \forall x \in \{\pm 1\}^n, \end{aligned}$$
(12)

see (8) and (9) for the definitions of \(a_t\), \(V_t\). Note that the system in (11)–(12) is a stochastic differential equation of dimension \(2^n+n(n+1)/2\). The existence and uniqueness of solutions to this system follow from the next lemma, whose proof is also postponed to Sect. 2.1 below. In the proof of existence, we also show that \(J_t\) remains positive semidefinite, which essentially follows from the fact that \(\mathrm {Im}(C(V_t,H_{J_t})) \subseteq H_{J_t}\).

Lemma 3

For any positive semidefinite matrix J, the system of stochastic differential equations (11)–(12) admits a unique strong solution. Furthermore, the matrix \(J_t\) is positive semidefinite for all times \(t \ge 0\), almost surely.

We continue with the proof of Theorem 1. Define now \(H_t = H_{J_t}\) and \(C_t=C(V_t,H_t)\). We start by verifying that the measure \(\nu _t\) with density \(d\nu _t(x)= F_t(x) d\nu _0(x) \) is indeed a probability measure on \(\{\pm 1\}^n\). The nonnegativity of \(F_t(x)\) is easily verified (it follows from (17) below) so it remains to check that the total mass of \(\nu _t\) is 1. Note that, due to (8) and (12), we have for the total mass

$$\begin{aligned} z_t := \sum _{x\in \{\pm 1\}^n} F_t(x) \nu _0(x) \end{aligned}$$
(13)

that

$$\begin{aligned} dz_t = \sum _{x\in \{\pm 1\}^n} F_t(x) \langle C_t (x - a_t), dW_t \rangle \nu _0(x) = \langle C_t (a_t - z_t a_t), dW_t \rangle \end{aligned}$$

and because \(z_0 = 1\), by uniqueness of the solution we have \(z_t = 1\) for all time, and hence \(\nu _t\) is a probability measure as claimed.

Define now

$$\begin{aligned} M_t = \int \varphi (x) F_t(x)\ d \nu _0. \end{aligned}$$

We have from (12) that

$$\begin{aligned} d M_t = \left\langle \int \varphi (x) C_t (x-a_t)\ d \nu _t, d W_t \right\rangle = \left\langle C_t V_t, d W_t \right\rangle , \end{aligned}$$
(14)

where we recall the definition (9) of \(V_t\). Note that

$$\begin{aligned} |d[M]_t/dt|=|C_t V_t|^2\le \delta ^2,\; \text {almost surely}, \end{aligned}$$
(15)

see (10).

Define the stopping time \( T = \min \{t: \mathrm {rank}(J_t) \le 1 \} \) and let \( Y_t := \mathrm {Var}_{\nu _t}[\varphi ]. \) Then

$$\begin{aligned} d Y_t = d \left( \int \varphi ^2\ d\nu _t - M_t^2 \right) = -\, d[M]_t + \text{ martingale }. \end{aligned}$$

Consequently, we get from (15) that

$$\begin{aligned} |{\mathbb {E}}Y_T - Y_0| =|{\mathbb {E}}Y_T- \mathrm {Var}_{\nu _0}[\varphi ]|\le \delta ^2 {\mathbb {E}}T. \end{aligned}$$
(16)

Next, Ito’s formula gives \( d \log F_t(x) = \langle C_t (x-a_t), d W_t \rangle - {(1/2)}|C_t (x-a_t)|^2 dt \) so by integrating, we have

$$\begin{aligned} \log F_t(x) = c_t + \langle q_t, x \rangle - \langle B_t x, x \rangle {/2} \end{aligned}$$
(17)

with \(c_t, q_t\) being some Ito processes and with

$$\begin{aligned} B_t = \int _0^t C_s^2\ ds \end{aligned}$$
(18)

(here we use that the matrix \(C_t\) is symmetric). Note that, with \(J_t\) as in (7), we obtain that

$$\begin{aligned} J_t = J - B_t, \end{aligned}$$
(19)

where J is the original interaction matrix.

We next claim that almost surely, \(T \le \mathrm {Tr}(J_0)=:T_0\). Indeed, by property 3 of Lemma 2, \(\frac{d}{dt} \mathrm {Tr}(J_t) = -\mathrm {Tr}(C_t^2) \le - \mathrm {dim}(H_t) + 1 \le -1\) for all \(t < T\), which means that if \(T>T_0 \) then \(\mathrm {Tr}(J_t)<0\) for \(t \in (T_0, T)\), contradicting the positive semidefiniteness of \(J_t\). We also deduce by monotonicity that

$$\begin{aligned} 0 \preceq J_T \preceq \Vert J\Vert _{OP} \mathrm {Id}. \end{aligned}$$

Thus, (7) implies that

$$\begin{aligned} d \nu _T(x) = e^{c_T + \frac{1}{2} \langle U, x \rangle ^2 + \langle q_T + h, x \rangle } d \mu (x) \end{aligned}$$
(20)

where \(|U|^2 \le \Vert J\Vert _{OP}\).

To deduce the final result we use two more facts, proved below: Lemma 8, which says the Poincaré inequality holds for the rank one model \(\nu _T\), and Lemma 9, which says the Dirichlet form is a supermartingale under the dynamics (11)–(12). Given these facts, it follows from (16) that

$$\begin{aligned} (1-\Vert J\Vert _{OP})\mathrm {Var}_{\nu _0}(\varphi ) \le (1 - \Vert J\Vert _{OP}){\mathbb {E}}\,\mathrm {Var}_{\nu _T}(\varphi ) + \delta ^2 T_0 \le {\mathbb {E}}\,{\mathcal {E}}_{\nu _T}(\varphi ,\varphi ) + \delta ^2 T_0 \le {\mathcal {E}}_{\nu _0}(\varphi ,\varphi ) + \delta ^2 T_0. \end{aligned}$$

Taking \(\delta \rightarrow 0\) proves the result. \(\square \)

2.1 Proof of the existence of the process

In this section we prove the technical Lemmas 2 and 3.

Proof of Lemma 2

Introduce a smooth function \(\phi : {\mathbb {R}}_{+} \rightarrow [0,1]\) satisfying

$$\begin{aligned} \phi (0)=1,\qquad \phi '(0) = 0, \qquad \sup _{z\in {\mathbb {R}}_+} z \phi (z)\le \delta . \end{aligned}$$
(21)

For example, the function

$$\begin{aligned} \phi (z)= e^{- z^2/2\delta ^2} \end{aligned}$$

will do. Given a vector \(v\in {\mathbb {R}}^n\) and a linear subspace H of \({\mathbb {R}}^n\), write \(v=v_1+v_2\) where \(v_1\in H\) and \(v_2\in H^\perp \), write \({\hat{v}}_1=v_1/|v_1|\), and set

$$\begin{aligned} C(v,H)=\mathrm {Proj}_{H\cap v^\perp }+ \phi (|v_1|){\hat{v}}_1\otimes {\hat{v}}_1. \end{aligned}$$
(22)

When \(v_1 = 0\), C(vH) is just \(\mathrm {Proj}_{H}\), the orthogonal projection onto subspace H. The function C(vH) is a smooth approximation to the function \(A(v,H)=\mathrm {Proj}_{H\cap v^\perp }\); the latter is not smooth owing to a discontinuity when \(|v_1|\) is small. Indeed, when \(v=v_2 \in H^{\perp }\), we note that A(vH) is the projection onto H, while if \(v=\epsilon {\hat{v}}_1+ v_2\) for \(\epsilon >0\), then the operator A(vH) is a projection onto a codimension 1 subspace of H. On the other hand, C(vH) smoothes this transition, at the cost that it is not a projection.
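A direct transcription of (22) (NumPy; the subspace H is passed as a matrix Q whose columns form an orthonormal basis, an assumed input format) allows one to sanity-check the properties claimed in Lemma 2 on random instances.

```python
import numpy as np

def C(v, Q, delta):
    """C(v, H) from (22); the columns of Q form an orthonormal basis of H."""
    v1 = Q @ (Q.T @ v)                        # component of v lying in H
    r = np.linalg.norm(v1)
    P_H = Q @ Q.T                             # orthogonal projection onto H
    if r == 0.0:                              # C(v, H) = Proj_H when v_1 = 0
        return P_H
    vhat = v1 / r
    proj = P_H - np.outer(vhat, vhat)         # projection onto H intersect v-perp
    phi = np.exp(-r ** 2 / (2 * delta ** 2))  # the smoothing function of (21)
    return proj + phi * np.outer(vhat, vhat)

rng = np.random.default_rng(3)
n, d, delta = 8, 5, 0.01
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))  # orthonormal basis of a random H
v = rng.normal(size=n)
M = C(v, Q, delta)
assert np.linalg.eigvalsh(M).min() > -1e-10   # property 1 (psd)
assert np.trace(M @ M) >= d - 1 - 1e-12       # property 3
assert np.linalg.norm(M @ v) <= delta         # property 4, cf. (10)
```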

From the definition of C(v, H) we have

$$\begin{aligned} |C(v,H) v|= \phi (|v_1|)|\langle {\hat{v}}_1,v\rangle | \le |v_1|\phi (|v_1|)\le \delta , \end{aligned}$$

which shows the last claim, (10), of the lemma.

We now justify the second claim of the lemma; the remaining claims follow directly from the definition. Rewrite \(C(v,H) = \mathrm {Proj}_H + (\phi (|v_1|) - 1) \hat{v}_1 \otimes \hat{v}_1\) and observe that the first term is constant and the second term is Lipschitz in v: this is clear away from zero, and in a neighborhood of zero it follows by rewriting the second term as \(\frac{\phi (|v_1|) - 1}{|v_1|^2} v_1 \otimes v_1\) and using that \(\phi (0) = 1\), \(\phi '(0) = 0\), and \(\phi \) is smooth. \(\square \)

Proof of Lemma 3

Informally, both existence and uniqueness follow from the fact that \(t \mapsto H_{J_t}\) is piecewise constant.

First we note that for a fixed subspace H, the equations

$$\begin{aligned}&\frac{dJ_t}{dt}= - C(V_t,H)^2 \end{aligned}$$
(23)
$$\begin{aligned}&d F_t(x) = F_t(x) \langle C(V_t,H) (x-a_t), d W_t \rangle \qquad \forall x \in \{\pm 1\}^n, \end{aligned}$$
(24)

have Lipschitz coefficients (recall that a product of bounded Lipschitz functions is Lipschitz). Therefore, for any initial condition, a strong solution exists and is unique [13]. Consider (23)–(24) with \(H := H_{J} = \mathrm {Im}(J)\) and initial conditions \(J_0 = J, F_0(x) = 1\) and define the stopping time \(\tau _1 = \inf \{t \ge 0 : \mathrm {dim}(H_{J_t}) \le \mathrm {dim}(H_{J_0}) - 1 \}\). We use this system of equations to define \(J_t, F_t\) on the interval \([0,\tau _1]\) and observe that this solution satisfies (11)–(12), provided we show that \(H_{J_t} = H\) for \(t < \tau _1\), which we do next. First, observe that for \(v \in \ker J\),

$$\begin{aligned} J_t v = \left( J + \int _0^t -C(V_s,H)^2 ds\right) v = 0 \end{aligned}$$

where the first equality is by (23), and the second equality uses that \(J v = 0\) and \(C(V_s,H) v = (v^T C(V_s,H))^T = 0\) using that \(C(V_s,H)\) is symmetric, \(\mathrm {Im}(C(V_s,H)) \subseteq H\), and v is in \(\ker J\) which is the orthogonal complement of H. Thus, \(\ker J \subseteq \ker J_t\) for all \(t \le \tau _1\). Next, by the definition of \(\tau _1\), we have for all \(t < \tau _1\) that \(\dim (H_{J_t}) = \dim (H_{J_0})\); by a dimension count, this implies that \(\ker J = \ker J_t\), that \(J_t\) has no negative eigenvalues, and finally that \(H_{J_t} = H_{J_0}\) since they are both equal to the orthogonal complement of \(\ker J\).

More generally, for all \(i \le \mathrm {rank}(J_0)\) we define the stopping times

$$\begin{aligned} \tau _i = \inf \{t \ge \tau _{i - 1} : \mathrm {dim}(H_{J_t}) \le \mathrm {dim}(H_{J_{0}}) - i \} \end{aligned}$$

and define \(J_t,F_t\) on \(t \in [\tau _{i - 1}, \tau _i]\) by the solution to (23)–(24) with initial condition \(J_{\tau _{i - 1}}, F_{\tau _{i - 1}}\) at time \(\tau _{i - 1}\) and \(H = H_{J_{\tau _{i - 1}}}.\) Finally, define the solution for \(t\ge \tau _{\mathrm {rank}(J_0)}\) similarly, with \(H = \{0\}\) (i.e. the solution is constant). This shows existence of the solution and positive semidefiniteness of \(J_t\), and essentially the same argument proves uniqueness as well. \(\square \)

2.2 Rank one inequality

In this section we prove the needed Poincaré inequality for rank one models (Lemma 8). We use the result of [24], which establishes a Poincaré inequality under a condition on the influence matrix referred to as the \(\ell _2\)-Dobrushin uniqueness regime (also studied in [11, 18]).

Definition 4

For two probability measures \({\mathbb {P}}\) and \({\mathbb {Q}}\) defined over the same measure space, their total variation (TV) distance is defined to be

$$\begin{aligned} \Vert {\mathbb {P}}- {\mathbb {Q}}\Vert _{TV} := \sup _A |{\mathbb {P}}(A) - {\mathbb {Q}}(A)| \end{aligned}$$

where A ranges over all measurable events.

Definition 5

Suppose that X is a random vector supported on a finite set \({\mathcal {X}}^n\) and distributed according to \(\nu \). Define the influence matrix A to be the matrix with \(\text {diag}(A) = 0\) and

$$\begin{aligned} A_{ij} := \max _{x_{\sim i}, x'_{\sim i}} \Vert {\mathbb {P}}_{\nu }[X_i { = \cdot } | X_{\sim i} = x_{\sim i}] - {\mathbb {P}}_{\nu }[X_i {= \cdot } | X_{\sim i} = x'_{\sim i}]\Vert _{TV} \end{aligned}$$

where \(x_{\sim i}\) and \(x'_{\sim i}\) in \({\mathcal {X}}^{n - 1}\) are allowed to differ only in coordinate j, and \({\mathbb {P}}_{\nu }[X_i { = \cdot } | X_{\sim i} = x_{\sim i}]\) denotes the conditional law of \(X_i\) under \(\nu \) given \(X_{\sim i} = x_{\sim i}\). We say that (the law of) X satisfies the \(\ell _2\)-Dobrushin uniqueness condition if \(\Vert A\Vert _{OP} < 1\).

Note that in contrast to the interaction matrix J, the influence matrix A has nonnegative entries. We specialize the following theorem to the setting of spins valued in \(\{\pm 1\}\), though it holds in more general settings.

Theorem 6

(Theorem 2.1 of [24]) Suppose that \(X \sim \nu \) is a random vector valued in the hypercube \(\{\pm 1\}^n\) and let A be the corresponding influence matrix (as in Definition 5). For any test function \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\),

$$\begin{aligned} (1 - \Vert A\Vert _{OP}) \mathrm {Var}(\varphi ) \le {\mathcal {E}}_{\nu }(\varphi ,\varphi ) \end{aligned}$$

where \({\mathcal {E}}_{\nu }(\varphi ,\varphi )\) is the Dirichlet form associated to the Glauber dynamics under \(\nu \).

To use this theorem, we need to upper bound the spectral norm of the influence matrix for rank one models, which we do in the following lemma.

Lemma 7

Suppose that

$$\begin{aligned} d\nu (x) = \exp \left( \frac{1}{2} \langle x, u \rangle ^2 + \langle h, x \rangle - c \right) d\mu (x) \end{aligned}$$
(25)

where \(\mu \) is the uniform measure on \(\{\pm 1\}^n\). The influence matrix A of \(\nu \) (from Definition 5) satisfies \(\Vert A\Vert _{OP} \le |u|^2\).

Proof

First observe that

$$\begin{aligned} {\mathbb {E}}[X_i \mid X_{\sim i}] = \tanh \Big ( u_i \sum _{j \ne i} u_j X_j + h_i \Big ). \end{aligned}$$

Therefore, from the definition of \(A_{ij}\) and since \(\tanh (\cdot )\) is 1-Lipschitz, we have

$$\begin{aligned} A_{ij} = \frac{1}{2} \max _{x_{\sim i}, x'_{\sim i}} |{\mathbb {E}}_{\nu }[X_i | X_{\sim i} = x_{\sim i}] - {\mathbb {E}}_{\nu }[X_i | X_{\sim i} = x'_{\sim i}]| \le |u_i u_j| \end{aligned}$$
(26)

where \(x_{\sim i}, x'_{\sim i}\) range over vectors in \(\{\pm 1\}^{n-1}\) differing only in coordinate j. Define v to be the element-wise absolute value of u, i.e. \(v_i = |u_i|\). Since A is a matrix with nonnegative entries, it follows from the Perron–Frobenius Theorem and (26) that \(\Vert A\Vert _{OP} \le \Vert v v^T \Vert _{OP} = |v|^2 = |u|^2\). \(\square \)
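For small n, the influence matrix of Definition 5 can be computed exactly by brute force, giving a numerical check of Lemma 7 (Python; the parameters u, h are random placeholders).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n = 6
u = rng.normal(size=n) / np.sqrt(2 * n)       # placeholder rank-one parameters
h = rng.normal(size=n)

def p_plus(i, x):
    """P(X_i = +1 | X_{~i} = x_{~i}) under the rank-one model (25)."""
    m = u[i] * (u @ x - u[i] * x[i]) + h[i]
    return 0.5 * (1.0 + np.tanh(m))

A = np.zeros((n, n))
for i, j in product(range(n), repeat=2):
    if i == j:
        continue
    for x in product([-1, 1], repeat=n):
        y = list(x); y[j] = -y[j]             # x, y differ only in coordinate j
        # the TV distance between two {+1,-1}-valued laws is |p - p'|
        d = abs(p_plus(i, np.array(x)) - p_plus(i, np.array(y)))
        A[i, j] = max(A[i, j], d)

assert np.linalg.norm(A, 2) <= u @ u + 1e-10  # the conclusion of Lemma 7
```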

Combining Lemma 7 and Theorem 6 yields the desired Poincaré inequality for rank one models.

Lemma 8

Suppose that \(\nu ,u\) are as in (25). Then for any test function \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\),

$$\begin{aligned} {(1 - |u|^2)} \mathrm {Var}(\varphi ) \le {\mathcal {E}}_{\nu }(\varphi ,\varphi ) \end{aligned}$$

where \({\mathcal {E}}_{\nu }\) is the Dirichlet form associated to the Glauber dynamics under \(\nu \).

2.3 The Dirichlet form is a supermartingale

Lemma 9

Let \(W_t\) be a Brownian motion adapted to a filtration \({\mathcal {F}}_t\). Let \(C_t\) be a matrix-valued process adapted to \({\mathcal {F}}_t\). Let \(\nu _0\) be an arbitrary measure on \(\{\pm 1\}^n\) and suppose that \(F_t\) is a solution to the SDE

$$\begin{aligned} d F_t(x) = F_t(x) \langle C_t (x-a_t), d W_t \rangle , \qquad F_0(x) = 1, \forall x \in \{\pm 1\}^n \end{aligned}$$

where \(d\nu _t(x) = F_t(x) d\nu _0(x)\) and \(a_t = \int x d\nu _t(x)\). Let \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\) be an arbitrary test function. Then the Dirichlet form \({\mathcal {E}}_{\nu _t}(\varphi ,\varphi )\) is a supermartingale.

Proof

We use that the Dirichlet form can be rewritten as

$$\begin{aligned} {\mathcal {E}}_{\nu _t}(\varphi ,\varphi ) = \sum _{x \sim y} \frac{ \nu _t(x) \nu _t(y) }{\nu _t(x) + \nu _t(y)} (\varphi (x) - \varphi (y))^2 \end{aligned}$$
(27)

where \(x \sim y\) denotes the adjacency relation on the hypercube, i.e. x and y differ in exactly one coordinate. To see this, consider arbitrary \(\nu \), let X and Y be two independent samples from \(\nu \), and observe

$$\begin{aligned} {\mathcal {E}}_{\nu }(\varphi ,\varphi )&= {\mathbb {E}}_{\nu } \sum _{i = 1}^n \mathrm {Var}(\varphi (X) \mid X_{\sim i}) \nonumber \\&= \frac{1}{2} {\mathbb {E}}_{\nu } \sum _{i = 1}^n {\mathbb {E}}_{\nu }[(\varphi (Y) - \varphi (X))^2 \mid Y_{\sim i} = X_{\sim i}, X_{\sim i}] \nonumber \\&= \frac{1}{2} {\mathbb {E}}_{\nu } \sum _{i = 1}^n \frac{(\varphi (Y) - \varphi (X))^2 \cdot {\mathbf {1}}[Y_{\sim i} = X_{\sim i}]}{{\mathbb {P}}_{\nu }(Y_{\sim i} = X_{\sim i} \mid X_{\sim i})} \nonumber \\&= \sum _{x \sim y} \nu (x) \nu (y) (\varphi (x) - \varphi (y))^2 \sum _{i = 1}^n \frac{{\mathbf {1}}[y_{\sim i} = x_{\sim i}]}{{\mathbb {P}}_{\nu }(Y_{\sim i} = x_{\sim i})}\nonumber \\&= \sum _{x \sim y} \frac{\nu (x) \nu (y)}{\nu (x) + \nu (y)} (\varphi (x) - \varphi (y))^2 \end{aligned}$$
(28)

where in the second equality we used the identity \(\mathrm {Var}(X) = \frac{1}{2} {\mathbb {E}}[(X - Y)^2]\) for Y an independent copy of X.

Given this, it suffices to show that \(\frac{ \nu _t(x) \nu _t(y) }{\nu _t(x) + \nu _t(y)}\) is a supermartingale for fixed \(x \sim y\). Let us calculate the Ito differential \(d\, \frac{\nu _t(x) \nu _t(y)}{\nu _t(x) + \nu _t(y)}\). We have by Ito’s Lemma,

$$\begin{aligned} d \log \nu _t(x) = \langle d W_t, {\tilde{x}} \rangle - \frac{1}{2} |{\tilde{x}}|^2 dt \end{aligned}$$

and

$$\begin{aligned} d \log \nu _t(y) = \langle d W_t, {\tilde{y}} \rangle - \frac{1}{2} |{\tilde{y}}|^2 dt. \end{aligned}$$

where \({\tilde{x}} = C_t(x-a_t)\), \({\tilde{y}} = C_t(y-a_t)\). Moreover,

$$\begin{aligned} d \log \left( \nu _t(x) + \nu _t(y) \right) = \langle d W_t, \alpha {\tilde{x}} + \beta {\tilde{y}} \rangle - \frac{1}{2} | \alpha {\tilde{x}} + \beta {\tilde{y}} |^2 dt \end{aligned}$$

where \(\alpha = \frac{\nu _t(x)}{\nu _t(x) + \nu _t(y)}\), \(\beta = \frac{\nu _t(y)}{\nu _t(x) + \nu _t(y)}\). Therefore,

$$\begin{aligned} d \log \left( \frac{\nu _t(x) \nu _t(y)}{\nu _t(x) + \nu _t(y)} \right) = \langle d W_t, \beta {\tilde{x}} + \alpha {\tilde{y}} \rangle + \frac{1}{2} \left( | \alpha {\tilde{x}} + \beta {\tilde{y}} |^2 - |{\tilde{x}}|^2 - |{\tilde{y}}|^2 \right) dt . \end{aligned}$$

So again by Ito’s Lemma, since \(d e^{S_t} = e^{S_t} d S_t + \frac{1}{2} e^{S_t} d[S]_t\), we have

$$\begin{aligned} d \left( \frac{\nu _t(x) \nu _t(y)}{\nu _t(x) + \nu _t(y)} \right) = \left( \frac{\nu _t(x) \nu _t(y)}{\nu _t(x) + \nu _t(y)} \right) \frac{1}{2} \left( | \alpha {\tilde{x}} + \beta {\tilde{y}} |^2 + | \beta {\tilde{x}} + \alpha {\tilde{y}} |^2 - |{\tilde{x}}|^2 - |{\tilde{y}}|^2 \right) dt + \text{ martingale }. \end{aligned}$$

By convexity of \(|\cdot |^2\) and since \(\alpha + \beta = 1\), we have \(| \alpha {\tilde{x}} + \beta {\tilde{y}} |^2 \le \alpha |{\tilde{x}}|^2 + \beta |{\tilde{y}}|^2\) and \(| \beta {\tilde{x}} + \alpha {\tilde{y}} |^2 \le \beta |{\tilde{x}}|^2 + \alpha |{\tilde{y}}|^2\), so the drift above is nonpositive and the above expression is a supermartingale. \(\square \)
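The identity (27), on which the proof rests, is also easy to confirm numerically on small instances (Python; an arbitrary positive measure and test function are used as placeholders).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
n = 5
X = np.array(list(product([-1, 1], repeat=n)), dtype=float)
w = rng.random(2 ** n); nu = w / w.sum()      # arbitrary fully supported measure
phi = rng.normal(size=2 ** n)                 # arbitrary test function

lhs = rhs = 0.0
for i in range(n):
    Y = X.copy(); Y[:, i] *= -1               # flip coordinate i
    idx = np.array([np.flatnonzero((X == y).all(axis=1))[0] for y in Y])
    # definition (6): conditional variances
    p = nu / (nu + nu[idx])                   # P(X_i = x_i | X_{~i} = x_{~i})
    cond_mean = p * phi + (1 - p) * phi[idx]
    lhs += np.sum(nu * (cond_mean - phi) ** 2)
    # edge form (27): each unordered pair x ~ y counted once
    keep = np.arange(2 ** n) < idx
    rhs += np.sum((nu * nu[idx] / (nu + nu[idx]) * (phi - phi[idx]) ** 2)[keep])

assert abs(lhs - rhs) < 1e-12
```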

3 Consequences for mixing time

By standard arguments which we now recall, the Poincaré inequality implies mixing time estimates for the Glauber dynamics. For a Markov semigroup \(P_t = e^{t\Lambda }\), reversible with respect to \(\nu \), a Poincaré inequality

$$\begin{aligned} \gamma \mathrm {Var}_{\nu }[\varphi ] \le {\mathcal {E}}_{\nu }(\varphi ,\varphi ) \end{aligned}$$

is equivalent to a spectral gap estimate:

$$\begin{aligned} \gamma \le \lambda _1 - \lambda _2 \end{aligned}$$

where \(\lambda _1,\lambda _2\) are the top two eigenvalues of the transition rate matrix \(\Lambda \) (see e.g. [23]). The quantity \(1/\gamma \) is known as the relaxation time of the Markov chain. As usual, we let \(P_t(\cdot ,\cdot )\) denote the transition kernel of the Markov chain. Linear algebraic arguments establish the following mixing time estimate:

Theorem 10

(Theorem 20.6 of [16]) Let \(P_t\) be a reversible Markov semigroup over the finite state space \(\Omega \), with stationary measure \(\pi \) and spectral gap \(\gamma \). Then,

$$\begin{aligned} \max _{x \in \Omega } \Vert P_t(x,\cdot ) - \pi \Vert _{TV} \le \epsilon \end{aligned}$$

as long as

$$\begin{aligned} t \ge \frac{1}{\gamma } \log \frac{1}{\epsilon \min _{x \in \Omega } \pi (x)}. \end{aligned}$$

Applied to our situation, we have \(\gamma = 1 - \Vert J\Vert _{OP}\) by Theorem 1 and

$$\begin{aligned} \min _{x \in \{\pm 1\}^n} \nu _0(x) \ge 2^{-n} e^{-2n \Vert J\Vert _{OP} - 2|h|_1} \end{aligned}$$

from the definition and Hölder’s inequality. As a result, we obtain the following mixing time estimate for Glauber dynamics:

Theorem 11

For \(P_t\) the continuous time Glauber dynamics on \(\nu _0\) defined in (5),

$$\begin{aligned} \max _{x \in \{\pm 1\}^n} \Vert P_t(x,\cdot ) - \nu _0\Vert _{TV} \le \epsilon \end{aligned}$$

as long as

$$\begin{aligned} t \ge \frac{1}{1 - \Vert J\Vert _{OP}} \left( (1 + 2 \Vert J\Vert _{OP})n + 2 |h|_1 + \log \frac{1}{\epsilon } \right) . \end{aligned}$$

One unit of time for the continuous dynamics corresponds to a Poissonian (with parameter n) number of steps of the discrete-time Glauber dynamics. Correspondingly, Theorem 1 implies an \(O\left( \frac{n^2 + |h|_1 n + n\log (1/\epsilon )}{1 - \Vert J\Vert _{OP}}\right) \) mixing time estimate for the discrete time Glauber dynamics, using Theorem 12.3 of [16] in place of Theorem 10 above.
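As a sanity check, the conclusion of Theorem 11 can be verified exactly on a small instance. The sketch below (NumPy) assumes the glauber_generator helper from the enumeration sketch in Sect. 2 is in scope, and computes \(P_t = e^{t {\mathcal {L}}}\) via the spectral decomposition of the symmetrized generator.

```python
import numpy as np

rng = np.random.default_rng(7)
n, eps = 5, 0.01
A = rng.normal(size=(n, n))
J = 0.2 * (A + A.T) / (2 * np.sqrt(n))        # placeholder instance
J = J - np.linalg.eigvalsh(J)[0] * np.eye(n)  # psd shift; harmless on the cube
h = 0.1 * rng.normal(size=n)
L, nu = glauber_generator(J, h)               # assumed helper from Sect. 2

J_op = np.linalg.norm(J, 2)
t = ((1 + 2 * J_op) * n + 2 * np.abs(h).sum() + np.log(1 / eps)) / (1 - J_op)

# P_t = e^{tL}, via the spectral decomposition of the symmetrized generator
D, Dinv = np.diag(np.sqrt(nu)), np.diag(1 / np.sqrt(nu))
S = D @ L @ Dinv; S = (S + S.T) / 2
lam, U = np.linalg.eigh(S)
P_t = Dinv @ U @ np.diag(np.exp(t * lam)) @ U.T @ D
worst_tv = 0.5 * np.abs(P_t - nu[None, :]).sum(axis=1).max()
assert worst_tv <= eps                        # the conclusion of Theorem 11
```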

4 A needle decomposition theorem

In this section we formulate a structural theorem, which follows as a byproduct of our proof. As mentioned above, this theorem may be thought of as an analogue of the technique due to Kannan et al. [12] used in convex geometry (this technique was later generalized to the context of Riemannian manifolds, see [14]). It roughly states that measures with arbitrary quadratic potentials can be decomposed into mixtures of measures whose potentials are quadratic of rank one, in a way that: (i) the integral of a given test function \(\varphi \) is preserved, and (ii) the operator norm of the quadratic potential does not increase.

For \(u,v \in {\mathbb {R}}^n\), consider the measure \(w_{u,v}\) on \(\{\pm 1\}^n\) defined as

$$\begin{aligned} \frac{d w_{u,v}}{d\mu }(x) =\frac{1}{Z_{u,v}} \exp \left( \frac{1}{2}\langle u, x \rangle ^2 + \langle v, x \rangle \right) \end{aligned}$$

where \(Z_{u,v} = \int _{\{\pm 1\}^n} e^{\frac{1}{2}\langle u, x \rangle ^2 + \langle v, x \rangle } d \mu \).

Following roughly the same lines as the proof of Theorem 1 gives rise to the following result.

Theorem 12

Let \(\nu _0\) be a probability measure on \(\{\pm 1\}^n\) of the form

$$\begin{aligned} \frac{d\nu _0}{d\mu }(x) =\frac{1}{Z} \exp \left( \frac{1}{2} \langle x, J x \rangle + \langle h, x \rangle \right) , \end{aligned}$$
(29)

where J is positive definite, and let \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\). There exists a probability measure m on \({\mathbb {R}}^n \times {\mathbb {R}}^n\) such that \(\nu _0\) admits the decomposition

$$\begin{aligned} \nu _0(A) = \int w_{u,v}(A) d m(u,v), ~~ \forall A \subset \{\pm 1\}^n, \end{aligned}$$

with the properties that:

1. m-almost surely, (u, v) are such that

   $$\begin{aligned} \int \varphi \, dw_{u,v} = \int \varphi \, d \nu _0, \end{aligned}$$

2. such that for m-almost every (u, v), we have \(|u|^2 \le \Vert J\Vert _{OP}\),

3. and such that for all \(x \sim y\) on the hypercube, we have the following inequality of conductances:

   $$\begin{aligned} \int \frac{w_{u,v}(x) w_{u,v}(y)}{w_{u,v}(x) + w_{u,v}(y)} dm(u,v) \le \frac{\nu _0(x) \nu _0(y)}{\nu _0(x) + \nu _0(y)}. \end{aligned}$$

Proof

(sketch) The decomposition follows by considering the evolution defined in (11) and (12) and defining the measure m according to the decomposition implied by equation (20) above. The last property is a consequence of the corresponding supermartingale property shown in the proof of Lemma 9. \(\square \)

The theorem can be used to reduce the concentration of a test function \(\varphi \) over an Ising model to concentration over the measures \(w_{u,v}\), as demonstrated by the following corollary.

Corollary 13

Let \(K>0\) and let \(\varphi : \{\pm 1\}^n \rightarrow {\mathbb {R}}\) be a function such that for all \(u,v \in {\mathbb {R}}^n\) with \(|u|^2 \le K\) one has \(\mathrm {Var}_{w_{u,v}}[\varphi ] \le 1\). Then for every \(\nu _0\) of the form (29) with \(\Vert J\Vert _{OP} \le K\), one has \(\mathrm {Var}_{\nu _0}[\varphi ] \le 1\).

Proof

Applying Theorem 12, the law of total variance implies that

$$\begin{aligned} \mathrm {Var}_{\nu _0}[\varphi ]&= \int _{{\mathbb {R}}^n \times {\mathbb {R}}^n} \left( \int \varphi d w_{u,v} - {\mathbb {E}}_{\nu _0}[\varphi ] \right) ^2 d m(u,v) + \int _{{\mathbb {R}}^n \times {\mathbb {R}}^n} \mathrm {Var}_{w_{u,v}}[\varphi ] d m(u,v) \\&= \int _{{\mathbb {R}}^n \times {\mathbb {R}}^n} \mathrm {Var}_{w_{u,v}}[\varphi ] d m(u,v) \le \sup _{|u|^2 \le K,\, v \in {\mathbb {R}}^n} \mathrm {Var}_{w_{u,v}}[\varphi ], \end{aligned}$$

where the second equality uses property 1 of Theorem 12 and the final inequality uses property 2 together with \(\Vert J\Vert _{OP} \le K\).

The result follows readily. \(\square \)

Similarly, and analogous to the needle decomposition for convex sets, it allows us to establish functional inequalities for Ising models by reducing to the case of the rank one measures \(w_{u,v}\); not just the Poincaré inequality, but also related inequalities such as the log-Sobolev inequality. For the (bounded rate) Glauber dynamics, there is no uniform log-Sobolev inequality over the class of models we consider, as there is no such inequality for biased product measures [6], which are a special case. In subsequent work, the modified log-Sobolev inequality has been established via this reduction [1].

5 Some examples

5.1 Sherrington–Kirkpatrick (SK) model

This is the Ising model with J symmetric and \(J_{ij} \sim {\mathcal {N}}(0, \beta ^2/n)\), i.e. up to rescaling J is drawn from the Gaussian Orthogonal Ensemble. Taking the diagonal of J to be 0, the spectrum of J is contained in \([-2\beta - \epsilon , 2\beta + \epsilon ]\) asymptotically almost surely for any \(\epsilon > 0\) (see e.g. [2]). Adding \((2\beta + \epsilon )\mathrm {Id}\) to J makes it positive semidefinite while changing \(\nu _0\) only through the normalization constant, and the shifted matrix has operator norm at most \(4\beta + 2\epsilon \). Therefore our result implies the Poincaré inequality and polynomial time mixing for all \(\beta < 1/4\).
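A quick numerical illustration of this reduction (NumPy; n and \(\beta \) are arbitrary): sample J, shift it to be positive semidefinite, and observe that the resulting operator norm concentrates near \(4\beta \).

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 1500, 0.2                           # any beta < 1/4 is illustrative
G = rng.normal(size=(n, n))
J = beta * (G + G.T) / np.sqrt(2 * n)         # symmetric, J_ij ~ N(0, beta^2 / n)
np.fill_diagonal(J, 0.0)
lam = np.linalg.eigvalsh(J)                   # spectrum ~ [-2*beta, 2*beta]
J_shift = J - lam[0] * np.eye(n)              # psd shift; only changes Z in (5)
print(lam[0], np.linalg.norm(J_shift, 2))     # shifted norm ~ 4*beta < 1
```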

Remark 14

The Poincaré inequality (Theorem 1) applied to linear functions gives

$$\begin{aligned} \mathrm {Var}(\langle w, X \rangle ) \le \frac{1}{1 - \Vert J\Vert _{OP}} {\mathbb {E}}_{\nu } \sum _{i = 1}^n ({\mathbb {E}}_{\nu }[\langle w, X \rangle | X_{\sim i}] - \langle w, X \rangle )^2 \le \frac{1}{1 - \Vert J\Vert _{OP}} |w|^2 \end{aligned}$$

using (28), and estimate (3) from [3] gives a similar bound with a different constant. Thus, both results imply that in the SK model for any fixed \(\beta < 1/4\), \(\Vert \Sigma \Vert _{OP} = O(1)\) with high probability, where \(\Sigma = {\mathbb {E}}_{\nu } XX^T\) is the covariance matrix. This partially verifies Conjecture 11.5.1 of [22] that \(\Vert \Sigma \Vert _{OP} = O(1)\) for any \(\beta < 1\).
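For small n, the covariance bound of Remark 14 can be checked by exact enumeration (NumPy; the instance is a placeholder and n is far below the asymptotic regime, so this is only illustrative).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(9)
n, beta = 10, 0.2
G = rng.normal(size=(n, n))
J = beta * (G + G.T) / np.sqrt(2 * n)         # SK instance, h = 0
np.fill_diagonal(J, 0.0)
X = np.array(list(product([-1, 1], repeat=n)), dtype=float)
w = np.exp(0.5 * np.einsum('ki,ij,kj->k', X, J, X))
nu = w / w.sum()
Sigma = (X * nu[:, None]).T @ X               # Sigma = E[X X^T]; the mean is 0
print(np.linalg.norm(Sigma, 2))               # expected O(1) for beta < 1/4
```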

5.2 Diluted SK model (d-regular)

A variety of spin glass models on sparse graphs have been studied in the literature; one well-known “diluted” version of the SK model has the interaction matrix J supported on a sparse Erdős–Rényi random graph—see [21]. Along similar lines, we can consider a dilution where J is supported on a random d-regular graph with \(d \ge 3\). If we take a Rademacher disorder, i.e. \(J_{ij} \sim \mathrm {Unif}\{\pm \, \beta \}\) for neighboring pairs i, j and otherwise \(J_{ij} = 0\), then it follows from a version of Friedman’s Theorem that \(\Vert J\Vert _{OP} \le \beta (2\sqrt{d - 1} + \epsilon )\) a.a.s. for any \(\epsilon > 0\)—see [4, 5, 19]. It follows from our results (after the same positive semidefinite shift as above) that we have the Poincaré inequality and fast mixing for all \(\beta < \frac{1}{4\sqrt{d - 1}}\), whereas the model is only in Dobrushin’s uniqueness regime for \(\beta = O\left( \frac{1}{d}\right) \)—note that up to constants the latter bound is tight for general Ising models on arbitrary d-regular graphs [9, 20].