1 Introduction

The Sherrington-Kirkpatrick Model

In 1975, Sherrington and Kirkpatrick [37] introduced a mean field model for a spin glass—a disordered magnetic alloy that exhibits unusual magnetic behavior. Given a configuration of N Ising spins,

$$\sigma= (\sigma_1,\ldots,\sigma_N) \in \varSigma_N = \{-1,+1\}^N, $$

the Hamiltonian of the model is given by

$$ H_N(\sigma) = \frac{1}{\sqrt{N}} \sum_{i,j =1}^N g_{ij}\sigma_i \sigma_j, $$
(1)

where \((g_{ij})\) are i.i.d. standard Gaussian random variables, collectively called the disorder of the model. The fact that the distribution of \(H_N(\sigma)\) is invariant under the permutations of the coordinates of σ is called the symmetry between sites, which is what one usually understands by a mean field model. The Hamiltonian (1) is a Gaussian process with the covariance

$$ \mathbb{E}H_N\bigl(\sigma^1\bigr)H_N\bigl( \sigma^2\bigr) = \frac{1}{N} \sum_{i,j =1}^N \sigma_i^1 \sigma_j^1 \sigma_i^2 \sigma_j^2 = N \Biggl( \frac{1}{N} \sum_{i=1}^N \sigma_i^1 \sigma_i^2 \Biggr)^2 = N R_{1,2}^2 $$
(2)

that depends on the spin configurations \(\sigma^1, \sigma^2\) only through their normalized scalar product

$$ R_{1,2} = \frac{1}{N} \sigma^1 \cdot \sigma^2 =\frac{1}{N} \sum_{i=1}^N \sigma_i^1 \sigma_i^2, $$
(3)

called the overlap of \(\sigma^1\) and \(\sigma^2\). Since the distribution of a Gaussian process is determined by its covariance, it is not surprising that the overlaps play a central role in the analysis of the model. One can also consider a generalization of the Sherrington-Kirkpatrick model, the so-called mixed p-spin model, which corresponds to the Hamiltonian

$$ H_N(\sigma) = \sum_{p\geq1} \beta_p H_{N,p}(\sigma) $$
(4)

given by a linear combination of pure p-spin Hamiltonians

$$ H_{N,p}(\sigma) = \frac{1}{N^{(p-1)/2}} \sum_{i_1,\ldots,i_p = 1}^N g_{i_1\ldots i_p} \sigma_{i_1}\cdots \sigma_{i_p}, $$
(5)

where the random variables \((g_{i_{1}\ldots i_{p}})\) are standard Gaussian, independent for all p≥1 and all \((i_1,\ldots,i_p)\). Similarly to (2), it is easy to check that the covariance is, again, a function of the overlap,

$$ \mathbb{E}H_N\bigl(\sigma^1\bigr) H_N\bigl( \sigma^2\bigr) = N\xi(R_{1,2}), \quad \mbox{where } \xi(x)= \sum_{p\geq1}\beta_p^2 x^p. $$
(6)
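To see where (6) comes from, here is the short computation behind it for a single pure p-spin term (the terms with different p are independent, so their cross-covariances vanish):

$$ \mathbb{E}H_{N,p}\bigl(\sigma^1\bigr) H_{N,p}\bigl(\sigma^2\bigr) = \frac{1}{N^{p-1}} \sum_{i_1,\ldots,i_p =1}^N \sigma_{i_1}^1 \sigma_{i_1}^2 \cdots\sigma_{i_p}^1 \sigma_{i_p}^2 = \frac{1}{N^{p-1}} \Biggl( \sum_{i=1}^N \sigma_i^1 \sigma_i^2 \Biggr)^p = N R_{1,2}^p, $$

and summing over p≥1 with the coefficients \(\beta_p^2\) gives (6).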

One usually assumes that the coefficients \((\beta_p)\) decrease fast enough to ensure that the process is well defined when the sum in (4) includes infinitely many terms. The model may also include the external field term \(h(\sigma_1+\cdots+\sigma_N)\) with the external field parameter h∈ℝ. For simplicity of notation, we will assume that h=0, but all the results hold in the presence of the external field with some minor modifications. One of the main problems in these models is to understand the behavior of the ground state energy \(\min_{\sigma\in\varSigma_{N}} H_{N}(\sigma)\) in the thermodynamic limit N→∞. In a standard way this problem can be reduced to the computation of the limit of the free energy

$$ F_N = \frac{1}{N}\mathbb{E}\log Z_N, \quad \mbox{where } Z_N = \sum_{\sigma\in\varSigma_N } \exp \bigl(- \beta H_N(\sigma) \bigr), $$
(7)

for each inverse temperature parameter β=1/T>0, and a formula for this limit was proposed by Sherrington and Kirkpatrick in [37] based on the so-called replica formalism. At the same time, they observed that their replica symmetric solution exhibits “unphysical behavior” at low temperature, which means that it can only be correct at high temperature. Several years later, Parisi proposed in [32, 33] another, replica symmetry breaking, solution within replica theory, now called the Parisi ansatz, which was consistent at any temperature T>0 and, moreover, was in excellent agreement with computer simulations. The Parisi formula for the free energy is given by the following variational principle.
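Before turning to the variational formula, it may help to see the object in (7) numerically. The following Python sketch (assuming only NumPy) evaluates \(F_N\) by exact enumeration of \(\varSigma_N\) for a small N; at small β the result should be close to the annealed value \(\log2+\beta^2/2\) that follows from (2), up to finite-size and Monte Carlo error. The values of N, β and the number of disorder samples are illustrative choices, not taken from the text.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def free_energy_sk(N, beta, samples=50):
    """Exact enumeration of the free energy (7) for the SK Hamiltonian (1) at small N,
    averaged over `samples` independent draws of the disorder (g_ij)."""
    spins = np.array(list(product([-1, 1], repeat=N)), dtype=float)  # all 2^N configurations
    total = 0.0
    for _ in range(samples):
        g = rng.standard_normal((N, N))
        # H_N(sigma) = N^{-1/2} * sum_{i,j} g_ij sigma_i sigma_j
        H = np.einsum('si,ij,sj->s', spins, g, spins) / np.sqrt(N)
        total += np.log(np.sum(np.exp(-beta * H)))
    return total / (samples * N)

# at small beta, F_N is close to the annealed value log 2 + beta^2 / 2
print(free_energy_sk(N=10, beta=0.3), np.log(2) + 0.3**2 / 2)
```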

The Parisi Formula

A basic parameter, called the functional order parameter, is a distribution function ζ on [0,1],

$$ \zeta \bigl( \{q_p \} \bigr) = \zeta_{p} - \zeta_{p-1} \quad \mbox{for } p=0,\ldots, r, $$
(8)

corresponding to the choice of r≥1,

$$ 0=\zeta_{-1}< \zeta_0 <\cdots< \zeta_{r-1} < \zeta_r = 1 $$
(9)

and

$$ 0=q_0<q_1 <\cdots<q_{r-1}< q_r =1. $$
(10)

Notice that ζ carries some weight on r−1 points inside the interval (0,1) and on the points 0 and 1. In general, one can remove the atoms \(q_0=0\) and \(q_r=1\) and allow \(\zeta_0=0\) and \(\zeta_{r-1}=1\), but these cases can be recovered by continuity, so it will be convenient to assume that the inequalities in (9) are strict. Next, we consider i.i.d. standard Gaussian random variables \((\eta_p)_{0\leq p\leq r}\) and define

$$ X_r = \log\operatorname{ch}\beta \biggl( \eta_0 \xi'(0)^{1/2} + \sum_{1\leq p\leq r} \eta_p \bigl(\xi'(q_{p}) - \xi'(q_{p-1}) \bigr)^{1/2} \biggr). $$
(11)

Recursively over \(0\leq l\leq r-1\), we define

$$ X_l=\frac{1}{\zeta_l}\log\mathbb{E}_l\exp \zeta_l X_{l+1}, $$
(12)

where \(\mathbb{E}_{l}\) denotes the expectation with respect to \(\eta_{l+1}\) only. Notice that \(X_0\) is a function of \(\eta_0\). Finally, we let \(\theta(x)=x\xi'(x)-\xi(x)\) and define the so-called Parisi functional by

$$ \mathcal{P}(\zeta) = \log2 + \mathbb{E}X_0 - \frac{\beta^2}{2} \sum_{0\leq p\leq r-1} \zeta_p \bigl( \theta(q_{p+1}) - \theta(q_p) \bigr). $$
(13)
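As a concrete illustration of (11)–(13), the following Python sketch (assuming NumPy) evaluates the Parisi functional for the Sherrington-Kirkpatrick case ξ(x)=x², where ξ′(0)=0 so the η₀ term in (11) drops out. The recursion (12) is approximated by brute-force nested Monte Carlo, so it is only practical for small r; the parameters β, ζ, q below are illustrative and not optimized.

```python
import numpy as np

rng = np.random.default_rng(0)

def xi_prime(x): return 2.0 * x               # SK model: xi(x) = x^2
def theta(x):    return x * xi_prime(x) - x**2

def parisi_functional(beta, zeta, q, n=500):
    """Monte Carlo sketch of the Parisi functional (13) for a discrete order parameter:
    zeta = (zeta_0, ..., zeta_{r-1}) from (9) and q = (q_0=0, ..., q_r=1) from (10)."""
    r = len(zeta)
    dxi = np.sqrt(np.diff(xi_prime(np.array(q))))     # (xi'(q_p) - xi'(q_{p-1}))^{1/2}

    def X(l, partial):
        # `partial` accumulates eta_1 * dxi[0] + ... + eta_l * dxi[l-1]
        if l == r:
            return np.log(np.cosh(beta * partial))                     # eq. (11)
        etas = rng.standard_normal(n)                                  # average over eta_{l+1}
        vals = np.array([X(l + 1, partial + e * dxi[l]) for e in etas])
        return np.log(np.mean(np.exp(zeta[l] * vals))) / zeta[l]       # eq. (12)

    correction = 0.5 * beta**2 * sum(
        zeta[p] * (theta(q[p + 1]) - theta(q[p])) for p in range(r))
    return np.log(2.0) + X(0, 0.0) - correction                        # eq. (13)

# one step of replica symmetry breaking (r = 1) with illustrative, non-optimized parameters
print(parisi_functional(beta=1.5, zeta=[0.4], q=[0.0, 1.0]))
```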

The Parisi solution predicted that the limit of the free energy is equal to

$$ \lim_{N\to\infty} F_N = \inf_{\zeta} \mathcal{P}(\zeta), $$
(14)

where the infimum is taken over all distribution functions ζ as above or, in other words, over all r≥1 and sequences (9) and (10). The replica method by which the formula (14) was discovered did not give a definite interpretation of the functional order parameter ζ or the functional (13), but a clearer picture emerged in the physics literature (a classical reference is [22]) during the subsequent interpretation of the Parisi ansatz in terms of some physical properties of the Gibbs measure of the model,

$$ G_N(\sigma) = \frac{\exp(- \beta H_N(\sigma))}{Z_N}, $$
(15)

where the normalizing factor Z N defined in (7) is called the partition function. To describe this picture, let us first explain a modern mathematical framework that is used to encode relevant information about the model in the thermodynamic limit.

Asymptotic Gibbs’ Measures

Notice that, due to the special covariance structure (6), the distribution of the Gaussian Hamiltonian (4) is invariant under orthogonal transformations of the set of spin configurations \(\varSigma_N\), which means that, given any orthogonal transformation U on \(\mathbb{R}^N\), we have the equality in distribution

$$\bigl( H_N\bigl(U(\sigma)\bigr) \bigr)_{\sigma\in\varSigma_N} \stackrel{d}{=} \bigl( H_N(\sigma) \bigr)_{\sigma\in\varSigma_N}. $$

As a result, we are just as interested in the measure \(G_N\circ U^{-1}\) on the set \(U(\varSigma_N)\) as in the original Gibbs measure \(G_N\). To encode the information about \(G_N\) up to orthogonal transformations, let us consider an i.i.d. sequence \((\sigma^l)_{l\geq1}\) of replicas sampled from \(G_N\) and consider the normalized Gram matrix of their overlaps

$$ R^N= \bigl(R^N_{l,l'} \bigr)_{l,l'\geq1} = \frac{1}{N} \bigl(\sigma^l \cdot\sigma^{l'} \bigr)_{l,l'\geq1}. $$
(16)

It is easy to see that, given \(R^N\), one can reconstruct the Gibbs measure \(G_N\) up to orthogonal transformations because, every time we observe an off-diagonal entry \(R^N_{l,l'}\) equal to 1, it means that the replicas \(\sigma^l\) and \(\sigma^{l'}\) are equal. This way we can group equal replicas and then use the law of large numbers to estimate their Gibbs weights from the frequencies of their appearance in the sample. Since a Gram matrix describes the relative positions of points in Euclidean space up to orthogonal transformations, the overlap matrix \(R^N\) can be used to encode the information about the Gibbs measure \(G_N\) up to orthogonal transformations. For this reason (and some other reasons that will be mentioned below), the Gibbs measure in the Sherrington-Kirkpatrick and mixed p-spin models is often identified with the distribution of the overlap matrix \(R^N\). Since the overlaps are bounded in absolute value by 1, this allows us to pass to the infinite-volume limit and consider the set of all possible limiting distributions of \(R^N\) over subsequences. An infinite array R with any such limiting distribution inherits two basic properties of \(R^N\). First, it is non-negative definite and, second, it satisfies a “replica symmetry” property

$$ (R_{\pi(l), \pi(l')} )_{l,l'\geq1} \stackrel{d}{=} (R_{l,l'} )_{l,l'\geq1}, $$
(17)

for any permutation π of finitely many indices, where the equality is in distribution. Such arrays are called Gram-de Finetti arrays and the Dovbysh-Sudakov representation [15] (see also [24]) guarantees the existence of a random measure G on the unit ball of a separable Hilbert space H such that

$$ (R_{l,l'} )_{l\not= l'}\stackrel{d}{=} \bigl(\sigma^{l} \cdot\sigma^{l'} \bigr)_{l\not= l'}, $$
(18)

where \((\sigma^l)\) is an i.i.d. sequence of replicas sampled from the measure G. We will call such a measure G an asymptotic Gibbs measure and think of it as a limit of the Gibbs measures \(G_N\) over some subsequence, where the convergence is defined by way of the overlap arrays, as above. The reason why the diagonal elements are not included in (18) is that in (16) they were equal to 1 by construction, while the asymptotic Gibbs measure is not necessarily concentrated on the unit sphere. This mathematical definition of an asymptotic Gibbs measure via the Dovbysh-Sudakov representation was first given by Arguin and Aizenman in [3]. We will now describe (reinterpret) various predictions of the Parisi ansatz in the language of these asymptotic Gibbs measures.

Order Parameter and Pure States

First of all, in the work of Parisi [34], the functional order parameter ζ in (8) was identified with the distribution of the overlap \(R_{1,2}\) under the average (asymptotic) Gibbs measure,

$$ \zeta(A) = \mathbb{E}G^{\otimes2} \bigl(\bigl(\sigma^1, \sigma^2\bigr) : R_{1,2} = \sigma^1\cdot \sigma^2 \in A \bigr), $$
(19)

and the infimum in the Parisi formula (14) is taken over all possible candidates for this distribution in the thermodynamic limit. The fact that the infimum in (14) is taken over discrete distributions ζ is not critical, since the definition of the Parisi functional can be extended to all distributions on [0,1] by continuity. Another important idea introduced in [34] was the decomposition of the Gibbs measure into pure states. This simply means that an asymptotic Gibbs measure G is concentrated on countably many points \((h_l)_{l\geq1}\) in the Hilbert space H, and these are precisely the pure states. It was also suggested in [34] that it is reasonable to assume that all the pure states have equal norm, so the asymptotic Gibbs measure G is concentrated on some non-random sphere, \(G(\|h\|=c)=1\). This implies, for example, that the largest value the overlap can take is \(R_{1,2}=c^2\), attained when the replicas \(\sigma^1=\sigma^2=h_l\) for some l≥1 and, since this can happen with positive probability, the distribution of the overlap has an atom at the largest point of its support, \(\zeta(\{c^2\})>0\). We will see below that, due to some stability properties of the Gibbs measure, the pure state picture is correct if the distribution of the overlap has an atom at the largest point \(c^2\) of its support; otherwise, the measure G is non-atomic, but is still concentrated on the sphere \(\|h\|=c\).

Ultrametricity

Perhaps, the most famous feature of the Parisi solution of the SK model in [32, 33], was the choice of an ultrametric parametrization of the replica matrix in the replica method, and in the work of Mézard, Parisi, Sourlas, Toulouse and Virasoro [20, 21], this was interpreted as the ultrametricity of the support of the asymptotic Gibbs measure G in H, which means that the distances between any three points in the support satisfy the strong triangle, or ultrametric, inequality

$$ \bigl\|{\sigma}^2 - {\sigma}^3\bigr\| \leq \max \bigl(\bigl\|{ \sigma}^1 - {\sigma}^2\bigr\|, \bigl\|{\sigma}^1 -{ \sigma}^3\bigr\| \bigr). $$
(20)

When the Gibbs measure is concentrated on the sphere \(\|h\|=c\), we can express the distance in terms of the overlap, \(\|\sigma^1-\sigma^2\|^2 = 2(c^2 - R_{1,2})\), and, therefore, the ultrametricity can also be expressed in terms of the overlaps,

$$ R_{2,3} \geq\min(R_{1,2}, R_{1,3}). $$
(21)

One can think about ultrametricity as clustering of the support of G, because the ultrametric inequality (20) implies that the relation defined by the condition

$$ \sigma^1 \sim_d \sigma^2 \quad \Longleftrightarrow \quad 2\bigl\|{\sigma}^1 - {\sigma }^2\bigr\| \leq d $$
(22)

is an equivalence relation on the support of G for any d≥0. As we increase d, smaller clusters will collapse into bigger clusters and the whole process can be visualized by a branching tree. For a given diameter d≥0, one can consider the equivalence clusters and study the joint distribution of their Gibbs weights. In the case when d=0, which corresponds to the weights of the pure states \((G(\{h_l\}))_{l\geq1}\), this distribution was characterized in [20] using the replica method, but the same computation works for any d≥0. More generally, one can consider several cluster sizes \(d_1<\cdots<d_r\) and for each pure state \(h_l\) consider the weights of the clusters it belongs to,

$$G\bigl(\|\sigma- h_l\| \leq d_1\bigr),\ldots,G\bigl(\|\sigma- h_l\| \leq d_r\bigr). $$

Again, using the replica method within the Parisi ansatz, one can study the joint distribution of all these weights for all pure states, but the computation gets very complicated and not particularly illuminating. On the other hand, the problem of understanding the distribution of the cluster weights is very important, since this gives, in some sense, a complete description of the asymptotic Gibbs measure G. Fortunately, a much more explicit and useful description of the asymptotic Gibbs measures arose from the study of some related toy models.

Derrida’s Random Energy Models

In the early eighties, Derrida proposed two simplified models of spin glasses: the random energy model (REM) in [10, 11], and the generalized random energy model (GREM) in [12, 13]. The Hamiltonian of the REM is given by a vector \((H_{N}(\sigma))_{\sigma\in\varSigma_{N}}\) of independent Gaussian random variables with variance N, which is a rather classical object. The GREM combines several random energy models in a hierarchical way with the ultrametric structure built into the model from the beginning. Even though these simplified models do not shed light on the Parisi ansatz in the SK model directly, the behavior of the Gibbs measures in these models was predicted to be, in some sense, identical to that of the SK model. For example, Derrida and Toulouse showed in [14] that the Gibbs weights in the REM have the same distribution in the thermodynamic limit as the Gibbs weights of the pure states in the SK model, described in [20], and de Dominicis and Hilhorst [9] demonstrated a similar connection between the distribution of the cluster weights in the GREM and the cluster weights in the SK model. Motivated by this connection with the SK model, in a seminal paper [36], Ruelle gave an alternative, much more explicit and illuminating, description of the Gibbs measure of the GREM in the infinite-volume limit in terms of a certain family of Poisson processes, as follows.

The Ruelle Probability Cascades

The points and weights of these measures will be indexed by \(\mathbb{N}^r\) for some fixed r≥1. It will be very convenient to think of \(\mathbb{N}^r\) as the set of leaves of a rooted tree (see Fig. 1) with the vertex set

$$ \mathcal{A} = \mathbb{N}^0 \cup\mathbb{N}\cup \mathbb{N}^2 \cup\cdots\cup \mathbb{N}^r, $$
(23)

where \(\mathbb{N}^0=\{\emptyset\}\), ∅ is the root of the tree and each vertex \(\alpha=(n_1,\ldots,n_p)\in\mathbb{N}^p\) for \(p\leq r-1\) has children

$$\alpha n : = (n_1,\ldots,n_p,n) \in\mathbb{N}^{p+1} $$

for all n∈ℕ. Therefore, each vertex α is connected to the root ∅ by the path

$$\emptyset\to n_1 \to(n_1,n_2) \to\cdots \to(n_1,\ldots,n_p) = \alpha. $$

We will denote all the vertices in this path by (the root is not included)

$$ p(\alpha) = \bigl\{ n_1, (n_1,n_2), \ldots,(n_1,\ldots,n_p) \bigr\}. $$
(24)

The identification of the index set \(\mathbb{N}^r\) with the leaves of this infinitary tree is very important, because, even though the points in the support of the random measure will be indexed by \(\alpha\in\mathbb{N}^r\), the construction itself will involve random variables indexed by vertices of the entire tree. For each vertex \(\alpha\in\mathcal{A}\), let us denote by |α| its distance from the root of the tree ∅, or, equivalently, the number of coordinates in α, i.e. \(\alpha\in\mathbb{N}^{|\alpha|}\). If we recall the parameters (9) then, for each \(\alpha\in\mathcal{A}\setminus\mathbb{N}^r\), let \(\varPi_\alpha\) be a Poisson process on (0,∞) with the mean measure

$$ \zeta_{|\alpha|} x^{-1-\zeta_{|\alpha|}} \,dx $$
(25)

and let us generate these processes independently for all such α. Let us recall that each Poisson process \(\varPi_\alpha\) can be generated by partitioning \((0,\infty)=\bigcup_{m\geq1} S_m\) into disjoint sets \(S_1=[1,\infty)\) and \(S_m=[1/m,1/(m-1))\) for m≥2 and then on each set \(S_m\) generating independently a Poisson number of points with the mean

$$\int_{S_m} \zeta_{|\alpha|} x^{-1-\zeta_{|\alpha|}}\,dx $$

from the probability distribution on \(S_m\) proportional to (25). Let us mention that, for technical reasons, it is important that the parameters \(\zeta_{|\alpha|}\) in this construction are strictly between 0 and 1, which is why we assumed that the inequalities in (9) are strict. One can arrange all the points in \(\varPi_\alpha\) in the decreasing order,

$$ u_{\alpha1} > u_{ \alpha2} >\cdots>u_{\alpha n} > \cdots, $$
(26)

and enumerate them using the children \((\alpha n)_{n\geq1}\) of the vertex α. In other words, parent vertices enumerate independent Poisson processes \(\varPi_\alpha\) and child vertices enumerate individual points \(u_{\alpha n}\). Given a vertex \(\alpha\in\mathcal{A}\setminus\{\emptyset\}\) and the path p(α) in (24), we define

$$ w_\alpha= \prod_{\beta\in p(\alpha)} u_{\beta}. $$
(27)

Finally, for the leaf vertices \(\alpha\in\mathbb{N}^r\) we define

$$ v_\alpha= \frac{w_\alpha}{\sum_{\beta\in\mathbb{N}^r} w_\beta}. $$
(28)

One can show that the denominator is finite with probability one, so this sequence is well defined. Now, let \(e_\alpha\) for \(\alpha\in\mathcal{A}\setminus\{\emptyset\}\) be some sequence of orthonormal vectors in H. Given this sequence, we consider a set of points \(h_\alpha\in H\) indexed by \(\alpha\in\mathbb{N}^r\),

$$ h_\alpha= \sum_{\beta\in p(\alpha)} e_\beta (q_{|\beta|} - q_{|\beta|-1} )^{1/2}, $$
(29)

where the parameters \((q_p)_{0\leq p\leq r}\) were introduced in (10). In other words, as we walk along the path p(α) to the leaf \(\alpha\in\mathbb{N}^r\), at each step β we add a vector in the new orthogonal direction \(e_\beta\) of length \(\sqrt{q_{|\beta|} - q_{|\beta|-1}}\). We define a random measure G on the Hilbert space H by

$$ G(h_\alpha) = v_\alpha \quad \mbox{for } \alpha\in \mathbb{N}^r. $$
(30)

The measure G is called the Ruelle probability cascades (RPC) associated to the parameters (9) and (10). From the definition (29), it is clear that the scalar product \(h_\alpha\cdot h_\beta\) between any two points in the support of G depends only on the number

$$ \alpha\wedge\beta := \bigl|p(\alpha) \cap p(\beta) \bigr| $$
(31)

of common vertices in the paths from the root ∅ to the leaves \(\alpha,\beta\in\mathbb{N}^r\). With this notation, (29) implies that \(h_\alpha\cdot h_\beta = q_{\alpha\wedge\beta}\). Now, if we take three leaves \(\alpha,\beta,\gamma\in\mathbb{N}^r\) then their paths satisfy

$$\beta\wedge\gamma\geq\min (\alpha\wedge\beta, \alpha \wedge \gamma ), $$

since the vertices shared by the path p(α) with both paths p(β) and p(γ) will also be shared by p(β) and p(γ) and, therefore,

$$ h_\beta\cdot h_\gamma\geq\min ( h_\alpha\cdot h_\beta, h_\alpha \cdot h_\gamma ), $$
(32)

so the support of G is ultrametric in H by construction. In the work of Ruelle [36], it was stated as an almost evident fact that the Gibbs measure in the Derrida GREM looks like the measure (30) in the infinite-volume limit, but a detailed proof of this was given later by Bovier and Kurkova in [6]. Because of the connection to the SK model mentioned above, the Ruelle probability cascades are precisely the measures that were expected to describe the Gibbs measures in the SK model in the sense that, asymptotically, the overlap array (16) can be approximated in distribution by an overlap array generated by some RPC. The points \((h_{\alpha})_{\alpha\in\mathbb{N}^{r}}\) are the pure states and the tree can be viewed as a branching tree that indexes the clusters around all the pure states. One can show that the distribution (19) of the overlap of two replicas sampled from the Ruelle probability cascades is equal to the distribution function in (8), which agrees with the Parisi interpretation of the functional order parameter ζ.
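To make the construction (25)–(30) concrete, here is a Python sketch (assuming NumPy) that generates a truncated sample of the Ruelle probability cascades: each vertex keeps only its n largest children, and the points of each Poisson process \(\varPi_\alpha\) are produced by the standard arrival-time representation (the k-th largest point of a Poisson process with mean measure (25) can be written as \((E_1+\cdots+E_k)^{-1/\zeta_{|\alpha|}}\) for i.i.d. Exp(1) variables \(E_i\)), which is equivalent to the partitioning procedure described above. The truncation level and the parameters ζ, q are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def poisson_points(zeta_l, n):
    """The n largest points (26) of a Poisson process with the mean measure (25),
    obtained from the arrival times of a standard Poisson process on (0, infinity)."""
    return np.cumsum(rng.exponential(size=n)) ** (-1.0 / zeta_l)

def rpc_sample(zeta, q, n=4):
    """Truncated sample of the Ruelle probability cascades (30): returns the leaf weights
    v_alpha of (28) and the Gram matrix h_alpha . h_beta = q_{alpha ^ beta} of (29), (31)."""
    r = len(zeta)
    u = {}                                               # points u_alpha, indexed by non-root vertices
    for level in range(r):
        for parent in product(range(n), repeat=level):
            for child, point in enumerate(poisson_points(zeta[level], n)):
                u[parent + (child,)] = point
    leaves = list(product(range(n), repeat=r))
    w = np.array([np.prod([u[a[:p]] for p in range(1, r + 1)]) for a in leaves])  # (27)
    v = w / w.sum()                                                               # (28)
    wedge = lambda a, b: next((p for p in range(r) if a[p] != b[p]), r)           # (31)
    gram = np.array([[q[wedge(a, b)] for b in leaves] for a in leaves])
    return v, gram

# r = 2 with illustrative parameters: 0 < zeta_0 < zeta_1 < 1 and 0 = q_0 < q_1 < q_2 = 1
v, gram = rpc_sample(zeta=[0.3, 0.7], q=[0.0, 0.5, 1.0])
```

Sampling replicas from this measure amounts to drawing leaf indices with probabilities v; the corresponding entries of gram are then their overlaps. This sketch is reused in the numerical check of the Ghirlanda-Guerra identities below.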

Fig. 1

The leaves \(\alpha\in\mathbb{N}^r\) index the pure states. The rightmost path is an example of p(α) in (24) for one leaf α. The figure corresponds to what is called “r-step replica symmetry breaking” in the Parisi ansatz

Such an explicit description of the expected asymptotic Gibbs measures was a very big step, because one could now study their properties using the entire arsenal of the theory of Poisson processes [19]. Some important properties of the Ruelle probability cascades were already described in the original paper of Ruelle [36], while other important properties, which express certain invariance features of these measures, were discovered later by Bolthausen and Sznitman in [5]. We will mention this again below when we talk about the unified stability property in the SK model. In the next few sections we will explain that all the predictions of the physicists about the structure of the Gibbs measure in the SK and mixed p-spin models are, essentially, correct. In general, they hold under a small perturbation of the Hamiltonian, which does not affect the free energy in the infinite-volume limit, but for a class of the so-called generic mixed p-spin models they hold precisely, without any perturbation. First, we will explain the connection between the Gibbs measure and the Parisi formula for the free energy.

2 Free Energy and Gibbs Measure

The Aizenman-Sims-Starr Scheme

Before we describe rigorous results about the structure of the Gibbs measure, let us explain how this structure implies the Parisi formula for the free energy (14). For simplicity of notation, we will focus on the Sherrington-Kirkpatrick model (1) instead of the general mixed p-spin model (4). We begin with the so-called Aizenman-Sims-Starr cavity computation, which was introduced in [2]. Let us recall the definition of the partition function Z N in (7) and for j≥0 let us denote

$$ A_j = \mathbb{E}\log Z_{j+1} - \mathbb{E}\log Z_{j}, $$
(33)

with the convention that Z 0=1. Then we can rewrite the free energy as follows,

$$ F_N = \frac{1}{N}\mathbb{E}\log Z_N = \frac{1}{N} \sum_{j=0}^{N-1} A_j. $$
(34)

Clearly, this representation implies that if the sequence A N converges then its limit is also the limit of the free energy F N . Unfortunately, it is usually difficult to prove that the limit of A N exists (we will mention one such result at the end, when we talk about generic mixed p-spin models) and, therefore, this representation is used only to obtain a lower bound on the free energy,

$$ \liminf_{N\to\infty} F_N \geq\liminf_{N\to\infty} A_N. $$
(35)

Let us compare the partition functions \(Z_N\) and \(Z_{N+1}\) and see what they have in common and what makes them different. If we denote \(\rho=(\sigma,{\varepsilon})\in\varSigma_{N+1}\) for \(\sigma\in\varSigma_N\) and \({\varepsilon}\in\{-1,+1\}\) then we can write

$$ H_{N+1}(\rho) = H_N'(\sigma) + { \varepsilon}z_N(\sigma), $$
(36)

where

$$ H_N'(\sigma) = \frac{1}{\sqrt{N+1}} \sum _{i,j =1}^N g_{ij}\sigma_i \sigma_j $$
(37)

and

$$ z_N(\sigma) = \frac{1}{\sqrt{N+1}} \sum_{i=1}^N (g_{i(N+1)} + g_{(N+1)i} )\sigma_i. $$
(38)

On the other hand, the part (37) of the Hamiltonian \(H_{N+1}(\rho)\) is, in some sense, also a part of the Hamiltonian \(H_N(\sigma)\) since, in distribution, the Gaussian process \(H_N(\sigma)\) can be decomposed into a sum of two independent Gaussian processes

$$ H_N(\sigma) \stackrel{d}{=} H_N'(\sigma) + y_N(\sigma), $$
(39)

where

$$ y_N(\sigma) = \frac{1}{\sqrt{N(N+1)}} \sum_{i,j =1}^N g_{ij}'\sigma_i \sigma_j $$
(40)

for some independent array \((g_{ij}')\) of standard Gaussian random variables. Using the above decompositions (36) and (39), we can write

$$ \mathbb{E}\log Z_{N+1} = \mathbb{E}\log\sum _{\sigma\in\varSigma_N} 2\operatorname {ch} \bigl(-\beta z_N(\sigma) \bigr) \exp \bigl(- \beta H_{N}'(\sigma) \bigr) $$
(41)

and

$$ \mathbb{E}\log Z_{N} = \mathbb{E}\log\sum _{\sigma\in\varSigma_N} \exp \bigl(-\beta y_N(\sigma) \bigr) \exp \bigl({-}\beta H_{N}'(\sigma) \bigr). $$
(42)

Finally, if we consider the Gibbs measure on Σ N corresponding to the Hamiltonian \(H_{N}'(\sigma)\) in (37),

$$ G_N'(\sigma) = \frac{\exp(- \beta H_N'(\sigma))}{Z_N'} \quad \mbox{where } Z_N' = \sum_{\sigma\in\varSigma_{N}} \exp \bigl( -\beta H_N'(\sigma ) \bigr), $$
(43)

then (41), (42) can be combined to give the Aizenman-Sims-Starr representation,

$$ A_N = \mathbb{E}\log\sum_{\sigma\in\varSigma_N} 2\operatorname{ch} \bigl({-}\beta z_N(\sigma) \bigr) G_N'(\sigma) - \mathbb{E}\log\sum_{\sigma\in\varSigma_N} \exp \bigl({-}\beta y_N(\sigma) \bigr) G_N'(\sigma). $$
(44)

Notice that the Gaussian processes (z N (σ)) and (y N (σ)) are independent of the randomness of the measure \(G_{N}'\) and have the covariance

$$ \mathbb{E}z_N\bigl(\sigma^1\bigr) z_N\bigl( \sigma^2\bigr) = 2 R_{1,2} + O\bigl(N^{-1}\bigr), \quad \mathbb{E}y_N\bigl(\sigma^1\bigr) y_N\bigl( \sigma^2\bigr) = R_{1,2}^2 + O \bigl(N^{-1}\bigr). $$
(45)
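For completeness, (45) follows directly from the definitions (38) and (40): since \(\mathbb{E}(g_{i(N+1)}+g_{(N+1)i})(g_{j(N+1)}+g_{(N+1)j}) = 2\delta_{ij}\) and the \(g_{ij}'\) are independent standard Gaussians,

$$ \mathbb{E}z_N\bigl(\sigma^1\bigr) z_N\bigl(\sigma^2\bigr) = \frac{2}{N+1} \sum_{i=1}^N \sigma_i^1\sigma_i^2 = \frac{2N}{N+1} R_{1,2}, \qquad \mathbb{E}y_N\bigl(\sigma^1\bigr) y_N\bigl(\sigma^2\bigr) = \frac{1}{N(N+1)} \Biggl(\sum_{i=1}^N \sigma_i^1\sigma_i^2 \Biggr)^2 = \frac{N}{N+1} R_{1,2}^2, $$

and both differ from \(2R_{1,2}\) and \(R_{1,2}^2\) by at most \(O(N^{-1})\) because \(|R_{1,2}|\leq1\).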

Suppose that we replace the Gibbs measure \(G_{N}'\) in (44) by the Ruelle probability cascades G in (30) and replace the Gaussian processes (z N (σ)) and (y N (σ)) by Gaussian processes (z(h α )) and (y(h α )) indexed by the points \((h_{\alpha})_{\alpha\in\mathbb{N}^{r}}\) in the support of G with the same covariance structure as (45),

$$ \mathbb{E}z(h_{\alpha}) z(h_{\beta}) = 2 h_{\alpha}\cdot h_{\beta }, \qquad \mathbb{E}y(h_{\alpha}) y(h_{\beta}) = (h_{\alpha}\cdot h_{\beta})^2. $$
(46)

Such processes are very easy to construct explicitly, if we recall the definition of the points \(h_\alpha\) in (29). Namely, let \((\eta_\alpha)_{\alpha\in\mathcal{A}\setminus\{\emptyset\}}\) be a sequence of i.i.d. standard Gaussian random variables and, for each p≥1, let us define a family of Gaussian random variables indexed by \((h_{\alpha})_{\alpha\in\mathbb{N}^{r}}\),

$$ g_p(h_\alpha) = \sum_{\beta\in p(\alpha)} \eta_\beta\bigl(q_{|\beta |}^p - q_{|\beta|-1}^p \bigr)^{1/2}. $$
(47)

Recalling the notation (31), it is obvious that the covariance of this process is

$$ \mathbb{E}g_p(h_\alpha)g_p(h_\beta) = q_{\alpha\wedge\beta}^p = (h_\alpha \cdot h_\beta)^p, $$
(48)

so we can take \(z(h_{\alpha}) = \sqrt{2} g_{1}(h_{\alpha})\) and \(y(h_\alpha)=g_2(h_\alpha)\). Then the functional (44) will be replaced by

$$ \mathcal{P}(\zeta) = \mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} 2\operatorname{ch} \bigl({-}\beta z(h_\alpha) \bigr) v_\alpha - \mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} \exp \bigl({-}\beta y(h_\alpha) \bigr) v_\alpha. $$
(49)

Writing \(\mathcal{P}(\zeta)\) here is not an abuse of notation, since it turns out that the right hand side coincides with the Parisi functional in (13) when \(\xi(x)=x^2\) and \(\theta(x)=x^2\), which is precisely the case of the Sherrington-Kirkpatrick model. The equality of these two different representations can be proved using the properties of the Poisson processes with the mean measures (25) that appear in the definition of the Ruelle probability cascades, and (49) gives a very natural interpretation of the Parisi functional in (13). It remains to explain that, if we assume the Parisi ansatz for the Gibbs measure, then the connection between (44) and (49) is more than just a formal resemblance and that together with (35) it implies that

$$ \liminf_{N\to\infty} F_N \geq\inf_{\zeta} \mathcal{P}(\zeta). $$
(50)

This is again a consequence of the fundamental fact that we mentioned above, namely, that all the relevant information about the Gibbs measure in the SK model is contained in the overlap matrix. In the present context, it is not difficult to show that, due to the covariance structure of the Gaussian processes (45) and (46), the quantities \(A_N\) in (44) and \(\mathcal{P}(\zeta)\) in (49) are, in fact, given by the same continuous functional of the distribution of the overlap arrays \(R^N\) and R generated by i.i.d. samples of replicas from \(G_{N}'\) and G correspondingly. As a result, if we consider a subsequence along which \(\liminf_{N\to\infty} A_N\) is achieved and, at the same time, the array \(R^N\) converges in distribution to some array \(R^\infty\), then the lower limit of \(A_N\) can be written as the same functional of the distribution of \(R^\infty\). Finally, if we believe that the predictions of the physicists are correct, we can approximate \(R^\infty\) in distribution by the overlap arrays R generated by the Ruelle probability cascades and, therefore, the lower limit is bounded from below by \(\inf_\zeta \mathcal{P}(\zeta)\), which proves (50). The main difficulty in this approach is to show that the Parisi ansatz for the Gibbs measure and the overlap array is, indeed, correct in the infinite-volume limit, which will be discussed below.

Guerra’s Replica Symmetry Breaking Bound

The fact that the Parisi formula also gives an upper bound on the free energy,

$$ F_N \leq\inf_{\zeta} \mathcal{P}(\zeta), $$
(51)

was proved in a breakthrough work of Guerra [17]. The original argument in [17] was given in the language of the recursive formula (12), but, as was observed in [2], it can also be written in the language of the Ruelle probability cascades. The essence of Guerra’s result is the following interpolation between the SK model and the Ruelle probability cascades. Let (z i (h α )) and (y i (h α )) for i≥1 be independent copies of the processes (z(h α )) and (y(h α )) in (46) and, for 0≤t≤1, let us consider the Hamiltonian

$$ H_{N,t}(\sigma,h_\alpha) = \sqrt{t}\, H_N(\sigma) + \sqrt{1-t}\sum_{i=1}^N z_{i}(h_\alpha) \sigma_i +\sqrt{t}\sum_{i=1}^N y_{i}(h_\alpha) $$
(52)

indexed by vectors (σ,h α ) such that σ belongs to the support Σ N of the Gibbs measure G N and h α belongs to the support of the measure G in (30). To this Hamiltonian one can associate the free energy

$$ \varphi(t)=\frac{1}{N}\mathbb{E}\log\sum_{\sigma,\alpha} v_{\alpha } \exp \bigl( {-}\beta H_{N, t}(\sigma,h_\alpha) \bigr) $$
(53)

and, by a straightforward computation using the Gaussian integration by parts, one can check that φ′(t)≤0 and, therefore, φ(1)≤φ(0). It is easy to see that

$$\varphi(0) = \frac{1}{N}\mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} v_{\alpha} \prod_{i\leq N} 2\operatorname{ch} \bigl( {-}\beta z_{i}(h_\alpha) \bigr) $$

and

$$\varphi(1) = F_N + \frac{1}{N}\mathbb{E}\log\sum _{\alpha\in\mathbb{N}^r} v_{\alpha} \prod_{i\leq N} \exp \bigl({-}\beta y_{i}(h_\alpha) \bigr). $$

It is, again, a consequence of the properties of the Poisson processes involved in the construction of the Ruelle probability cascades that, in fact, the independent copies for \(i\leq N\) can be decoupled here and

$$\varphi(0) = \mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} 2\operatorname {ch} \bigl({-}\beta z(h_\alpha ) \bigr) v_{\alpha} $$

and

$$\varphi(1) = F_N + \mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} \exp \bigl({-}\beta y(h_\alpha ) \bigr) v_{\alpha} . $$

Recalling the representation (49), the inequality φ(1)≤φ(0) can be written as \(F_N\leq\mathcal{P}(\zeta)\), which yields the upper bound (51). After Guerra's discovery of the above interpolation argument, Talagrand proved in his famous tour-de-force paper [42] that the Parisi formula, indeed, gives the free energy in the SK model in the infinite-volume limit. Talagrand's ingenious proof finds a way around the Parisi ansatz for the Gibbs measure, but it is rather involved. The Aizenman-Sims-Starr scheme above gives a more natural approach if we are able to confirm the Parisi ansatz for the asymptotic Gibbs measures. Moreover, the argument in [42] works only for mixed p-spin models with even p≥2, while the above approach can be modified to yield the Parisi formula in the case when odd p-spin interactions are present as well (see [29]). Nevertheless, to understand the impact of the results of Guerra [17] and Talagrand [42] (proved in 2003), one only needs to remember that a proof of the existence of the limit of the free energy by Guerra and Toninelli in [18] was quite an impressive result only a year earlier.

3 Stability of the Gibbs Measure

The Ghirlanda-Guerra Identities

Below we will explain an approach to proving the predictions of the physicists for the Gibbs measure based on the so called Ghirlanda-Guerra identities. These identities were first discovered by Ghirlanda and Guerra in [16] in the setting of the mixed p-spin models, where they were proved on average over the parameters (β p ) in (4). However, the general idea can be used in many other models if we utilize the mixed p-spin Hamiltonian in the role of a perturbation. We will try to emphasize wide applicability of this idea by giving some mild sufficient conditions that ensure the validity of these identities. For all p≥1, let us consider

$$ g_{p}(\sigma) = \frac{1}{N^{p/2}} \sum_{i_1,\ldots,i_p = 1}^N g_{i_1\ldots i_p}' \sigma_{i_1}\ldots \sigma_{i_p}, $$
(54)

where the random variables \((g_{i_{1}\ldots i_{p}}')\) are i.i.d. standard Gaussian and independent of everything else, and define

$$ g(\sigma) = \sum_{p\geq1} 2^{-p} x_pg_{p}(\sigma) $$
(55)

for some parameters (x p ) p≥1 that belong to the interval x p ∈[0,3] for all p≥1. This Gaussian process is of the same nature as the mixed p-spin Hamiltonian (4) except for a different normalization in (54), which implies that the covariance

$$ \mathbb{E}g\bigl(\sigma^1\bigr) g\bigl(\sigma^2\bigr) = \sum_{p\geq1} 4^{-p} x_p^2R_{1,2}^p. $$
(56)

In other words, g(σ) is of a smaller order than H N (σ) because of the additional factor N −1/2. Let us now consider a model with an arbitrary Hamiltonian H(σ) on Σ N , either random or non-random, and consider the perturbed Hamiltonian

$$ H^{\mathrm{pert}}(\sigma) = H(\sigma) + s g(\sigma), $$
(57)

for some parameter s≥0. What is the advantage of adding the perturbation term (55) to the original Hamiltonian of the model? The answer to this question lies in the fact that, under certain conditions, this perturbation term, in some sense, regularizes the Gibbs measure and forces it to satisfy useful properties without affecting our main goal—the computation of the free energy. Using (56) and the independence of g(σ) and H(σ), it is easy to see that

$$ \frac{1}{N}\mathbb{E}\log\sum_{\sigma\in\varSigma_N} \exp H(\sigma) \leq \frac{1}{N}\mathbb{E}\log\sum_{\sigma\in\varSigma_N} \exp H^{\mathrm{pert}}(\sigma) \leq \frac{1}{N}\mathbb{E}\log\sum_{\sigma\in\varSigma_N} \exp H(\sigma) + \frac{s^2}{2N}\sum_{p\geq1} 4^{-p} x_p^2. $$
(58)

Both inequalities follow from Jensen’s inequality applied either to the sum or the expectation with respect to g(σ) conditionally on H(σ). This implies that if we let s=s N in (57) depend on N in such a way that

$$ \lim_{N\to\infty} N^{-1} s_N^2 = 0, $$
(59)

then the limit of the free energy is unchanged by the perturbation term sg(σ). On the other hand, if s=s N is not too small then it turns out that the perturbation term has a non-trivial influence on the Gibbs measure of the model. Consider a function

$$ \varphi= \log\sum_{\sigma\in\varSigma_N} \exp \bigl(H(\sigma) + s g( \sigma) \bigr) $$
(60)

that will be viewed as a random function φ=φ((x p )) of the parameters (x p ), and suppose that

$$ \sup \bigl\{ \mathbb{E}|\varphi- \mathbb{E}\varphi| : 0\leq x_p\leq3,\ p \geq 1 \bigr\} \leq v_N(s) $$
(61)

for some function \(v_N(s)\) that describes how well \(\varphi((x_p))\) is concentrated around its expected value uniformly over all possible choices of the parameters \((x_p)\) from the interval [0,3]. The main condition on the model will be expressed in terms of this concentration function, namely, that there exists a sequence \(s=s_N\) such that

$$ \lim_{N\to\infty} s_N=\infty \quad \mbox{and} \quad \lim_{N\to\infty} s_N^{-2} v_N(s_N) = 0. $$
(62)

Of course, this condition will be useful only if the sequence s N also satisfies (59) and the perturbation term does not affect the limit of the free energy. In the case of the mixed p-spin model, H(σ)=−βH N (σ) for H N (σ) defined in (4), one can easily check using some standard Gaussian concentration inequalities that (59) and (62) hold with the choice of s N =N γ for any 1/4<γ<1/2. One can also check that this condition holds in other models, for example, random p-spin and K-sat models, or for any non-random Hamiltonian H(σ). Now, let

$$ G_N(\sigma) = \frac{\exp H^{\mathrm{pert}}(\sigma)}{Z_N}, \quad \mbox{where } Z_N = \sum_{\sigma\in\varSigma_{N}} \exp H^{\mathrm{pert}}(\sigma), $$
(63)

be the Gibbs measure corresponding to the perturbed Hamiltonian (57) and let 〈⋅〉 denote the average with respect to \(G_{N}^{\otimes\infty}\), or all replicas. For any n≥2, p≥1 and any function f of the overlaps \((R_{l,l'})_{l,l'\leq n}\) of n replicas, let us define

$$ \varDelta(f,n,p) = \biggl| \mathbb{E} \bigl\langle f R_{1,n+1}^p \bigr\rangle- \frac {1}{n}\mathbb{E} \langle f \rangle \mathbb{E} \bigl \langle R_{1,2}^p \bigr\rangle- \frac{1}{n}\sum _{l=2}^{n}\mathbb{E} \bigl\langle f R_{1,l}^p \bigr\rangle \biggr|. $$
(64)

If we now think of the parameters (x p ) p≥1 in (55) as a sequence of i.i.d. random variables with the uniform distribution on [1,2] and denote by \(\mathbb{E}_{x}\) the expectation with respect to such sequence then (62) is a sufficient condition to guarantee that

$$ \lim_{N\to\infty} \mathbb{E}_x \varDelta(f,n,p) = 0. $$
(65)

Once we have this statement on average, we can, of course, make a specific non-random choice of parameters \((x_{p}^{N})_{p\geq1}\), which may vary with N, such that

$$ \lim_{N\to\infty} \varDelta(f,n,p) = 0. $$
(66)

In the thermodynamic limit, this can be expressed as a property of the asymptotic Gibbs measures. Let us consider any subsequential limit of the distribution of the overlap matrix R N generated by the perturbed Gibbs measure G N in (63) and let G be the corresponding asymptotic Gibbs measure on a Hilbert space defined via the Dovbysh-Sudakov representation. If we still denote by 〈⋅〉 the average with respect to G ⊗∞ then (66) implies the Ghirlanda-Guerra identities,

$$ \mathbb{E} \bigl\langle f R_{1,n+1}^p \bigr\rangle = \frac{1}{n}\mathbb{E} \langle f \rangle\mathbb{E} \bigl\langle R_{1,2}^p \bigr\rangle + \frac{1}{n}\sum _{l=2}^{n}\mathbb{E} \bigl\langle f R_{1,l}^p \bigr\rangle, $$
(67)

for any n≥2, p≥1 and any function f of the overlaps \((R_{l,l'})_{l,l'\leq n} = (\sigma^l\cdot\sigma^{l'})_{l,l'\leq n}\) of n replicas sampled from G. Of course, since any bounded measurable function ψ can be approximated by polynomials (in the \(L^1\) sense),

$$ \mathbb{E} \bigl\langle f \psi(R_{1,n+1}) \bigr\rangle = \frac{1}{n}\mathbb{E} \langle f \rangle\mathbb{E} \bigl\langle \psi(R_{1,2}) \bigr\rangle + \frac{1}{n}\sum _{l=2}^{n}\mathbb{E} \bigl\langle f \psi (R_{1,l}) \bigr\rangle. $$
(68)

If ζ is the distribution of one overlap, as in (19), then (68) can be expressed by saying that, conditionally on \((R_{l,l'})_{l,l'\leq n}\), the distribution of \(R_{1,n+1}\) is given by the mixture

$$n^{-1} \zeta+ n^{-1} \sum_{l=2}^n \delta_{R_{1,l}}. $$
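The following Python sketch (reusing the rpc_sample function and the NumPy setup from the earlier sketch) checks the identity (67) numerically in the simplest case n=2 and \(f=R_{1,2}^p\), for which it reads \(\mathbb{E}\langle R_{1,2}^p R_{1,3}^p\rangle = \frac12(\mathbb{E}\langle R_{1,2}^p\rangle)^2 + \frac12 \mathbb{E}\langle R_{1,2}^{2p}\rangle\). The overlaps are generated by the truncated cascades, so the two sides agree only up to truncation and Monte Carlo error; all parameters are illustrative.

```python
def gg_check(zeta, q, p=1, trials=4000, n=4):
    """Monte Carlo check of the Ghirlanda-Guerra identity (67) with n = 2 and f = R_{12}^p,
    for overlaps drawn from the truncated Ruelle probability cascades rpc_sample above."""
    m_cross = m_12 = m_sq = 0.0
    for _ in range(trials):
        v, gram = rpc_sample(zeta, q, n)
        i, j, k = rng.choice(len(v), size=3, p=v)      # three independent replicas
        m_cross += gram[i, j] ** p * gram[i, k] ** p   # R_{12}^p * R_{13}^p
        m_12    += gram[i, j] ** p                     # R_{12}^p
        m_sq    += gram[i, j] ** (2 * p)               # R_{12}^{2p}
    m_cross, m_12, m_sq = m_cross / trials, m_12 / trials, m_sq / trials
    return m_cross, 0.5 * m_12**2 + 0.5 * m_sq         # the two sides should be close

print(gg_check(zeta=[0.3, 0.7], q=[0.0, 0.5, 1.0]))
```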

These identities already appear in the Parisi replica method where they arise as a consequence of “replica equivalence”, but Ghirlanda and Guerra gave the first mathematical proof using the self-averaging of the free energy, which is what the condition (62) basically means. The self-averaging of the free energy in the Sherrington-Kirkpatrick model was first proved by Pastur and Shcherbina in [35]. The identities (67) might look mysterious, but, in fact, they are just a manifestation of the general principle of the concentration of a Hamiltonian, in this case

$$ \lim_{N\to\infty} \mathbb{E}_x \mathbb{E} \bigl\langle \bigl|g_p(\sigma) -\mathbb{E} \bigl\langle g_p(\sigma) \bigr \rangle \bigr| \bigr\rangle=0, $$
(69)

which can be proved using the self-averaging of the free energy condition (62). The way (69) implies the Ghirlanda-Guerra identities is very simple, essentially, by testing this concentration on a test function. If we fix n≥2 and consider a bounded function f=f((R l,l) l,l′≤n ) of the overlaps of n replicas then

$$ \bigl|\mathbb{E} \bigl\langle f g_p\bigl(\sigma^1\bigr) \bigr \rangle - \mathbb{E} \langle f \rangle\mathbb{E} \bigl\langle g_p( \sigma) \bigr\rangle \bigr| \leq \|f\|_\infty \mathbb{E} \bigl\langle \bigl| g_p(\sigma) - \mathbb{E} \bigl\langle g_p(\sigma) \bigr \rangle \bigr| \bigr\rangle $$
(70)

and (69) implies that

$$ \lim_{N\to\infty} \mathbb{E}_x \bigl|\mathbb{E} \bigl\langle f g_p\bigl(\sigma^1\bigr) \bigr\rangle - \mathbb{E} \langle f \rangle\mathbb{E} \bigl\langle g_p(\sigma) \bigr\rangle \bigr| =0. $$
(71)

This is precisely the equation (65) after we use the Gaussian integration by parts.
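To spell out this last step, the Gaussian integration by parts for Gibbs averages, applied to the field \(g_p\), whose coefficient in the exponent of (60) is \(s2^{-p}x_p\) and whose covariance is \(\mathbb{E}g_p(\sigma^1)g_p(\sigma^2)=R_{1,2}^p\) with \(R_{1,1}=1\), gives

$$ \mathbb{E} \bigl\langle f g_p\bigl(\sigma^1\bigr) \bigr\rangle = s2^{-p}x_p\, \mathbb{E} \Biggl\langle f \Biggl(1 + \sum_{l=2}^{n} R_{1,l}^p - n R_{1,n+1}^p \Biggr) \Biggr\rangle, \qquad \mathbb{E} \bigl\langle g_p(\sigma) \bigr\rangle = s2^{-p}x_p \bigl(1 - \mathbb{E}\bigl\langle R_{1,2}^p \bigr\rangle \bigr), $$

so the left hand side of (70) equals \(s2^{-p}x_p\, n\, \varDelta(f,n,p)\); since \(x_p\geq1\) on the support of the uniform distribution used above, (71) indeed implies (65).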

The Aizenman-Contucci Stochastic Stability

Another famous property of the Gibbs measure, the so-called stochastic stability discovered by Aizenman and Contucci in [1], is also a consequence of (69). A proof can be found in [43] (see also [8]) and a rigorous justification of how to extend the stochastic stability to the setting of the asymptotic Gibbs measures can be found in [4]. To state this property, let us assume, for simplicity, that the asymptotic Gibbs measure G is atomic, G(h l )=v l , with the weights arranged in non-increasing order, v 1v 2≥⋯ . Given integer p≥1, let (g p (h l )) l≥1 be a Gaussian sequence conditionally on G indexed by the points (h l ) l≥1 with the covariance

$$ \mathbb{E}g_p(h_l) g_p(h_{l'}) = (h_l\cdot h_{l'})^p, $$
(72)

which is reminiscent of (48), only now we do not know a priori that the support {h l :l≥1} is ultrametric in H. Given t∈ℝ, consider a new measure

$$ G_t(h_l) = v_l^t= \frac{v_l \exp t g_p(h_l)}{\sum_{j\geq1} v_j \exp t g_p(h_j)} $$
(73)

defined by the random change of density proportional to \(\exp t g_p(h_l)\). Then the Aizenman-Contucci stochastic stability, basically, states that this new measure generates the same overlap array in distribution as the original measure G. One way to express this property is as follows. Let π:ℕ→ℕ be a permutation such that the weights \((v_{\pi(l)}^{t})\) are also arranged in non-increasing order. Then,

$$ \bigl( \bigl(v_{\pi(l)}^t \bigr)_{l\geq1}, (h_{\pi(l)}\cdot h_{\pi (l')} )_{l,l'\geq1} \bigr) \stackrel{d}{=} \bigl( (v_l )_{l\geq1}, (h_l\cdot h_{l'} )_{l,l'\geq1} \bigr) $$
(74)

for any p≥1 and t∈ℝ, which, clearly, implies that the overlap arrays generated by these measures will have the same distribution. It is not difficult to see that, in fact, the two statements are equivalent.
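As a sanity check in the simplest setting, the following Python sketch (reusing the NumPy setup from the earlier sketches) tests the reweighting (73) for an atomic measure whose weights come from a truncated one-level cascade (a Poisson-Dirichlet sequence with parameter ζ₀) and whose points are orthonormal, so that the \(g_p(h_l)\) are i.i.d. standard Gaussians. It compares the mean of the largest weight before and after the tilt; by the stochastic stability these should agree up to truncation and Monte Carlo error. The parameters are illustrative.

```python
def pd_weights(zeta0, n=400):
    """Truncated Poisson-Dirichlet weights: normalized points of the Poisson process
    with mean measure zeta0 * x^(-1-zeta0) dx, i.e. (25)-(28) with r = 1."""
    u = np.cumsum(rng.exponential(size=n)) ** (-1.0 / zeta0)
    return u / u.sum()

def largest_weight_tilt(zeta0, t, trials=3000):
    """Mean of the largest weight before and after the random tilt (73) by exp(t * g)
    with i.i.d. standard Gaussian g (orthonormal points h_l)."""
    before = after = 0.0
    for _ in range(trials):
        v = pd_weights(zeta0)                       # already in non-increasing order
        vt = v * np.exp(t * rng.standard_normal(len(v)))
        before += v.max()
        after += (vt / vt.sum()).max()
    return before / trials, after / trials

print(largest_weight_tilt(zeta0=0.5, t=1.0))
```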

Unified Stability Property

Even though the proof of the Parisi ansatz that will be described in the next section is based only on the Ghirlanda-Guerra identities, the reason we mention the Aizenman-Contucci stochastic stability (74) is because, in a number of ways, it played a very important role in the development of the area. In particular, one of the main ideas behind the proof of the Parisi ansatz was first discovered using a unified stability property, proved in [30], that combines the Ghirlanda-Guerra identities (67) and the Aizenman-Contucci stochastic stability (74). It can be stated with the notation in (74) as follows. It is well known (we will discuss this again below) that if the measure G satisfies the Ghirlanda-Guerra identities and if c 2 is the largest point of the support of the distribution of the overlap R 1,2 under \(\mathbb{E} G^{\otimes2}\) then, with probability one, G is concentrated on the sphere of radius c. Let

$$ b_p = \bigl(c^2\bigr)^p - \mathbb{E}\bigl \langle R_{1,2}^p\bigr\rangle. $$
(75)

Then, a random measure G satisfies the Ghirlanda-Guerra identities (67) and the Aizenman-Contucci stochastic stability (74) if and only if it is concentrated on the sphere of constant radius, say c, and for any p≥1 and t∈ℝ,

$$ \bigl( \bigl(v_{\pi(l)}^t \bigr)_{l\geq1}, \bigl(g_p(h_{\pi(l)}) - t b_p \bigr)_{l\geq1}, (h_{\pi(l)}\cdot h_{\pi(l')} )_{l,l'\geq1} \bigr) \stackrel{d}{=} \bigl( (v_l )_{l\geq1}, \bigl(g_p(h_l) \bigr)_{l\geq1}, (h_l\cdot h_{l'} )_{l,l'\geq1} \bigr). $$
(76)

Comparing with (74), we see that the Ghirlanda-Guerra identities are now replaced by the statement that, after the permutation π which rearranges the weights in (73) in the decreasing order, the distribution of the Gaussian process (g p (h l )) will only be affected by a constant shift b p t. Interestingly, the unified stability property (76) was known for some time in the setting of the Ruelle probability cascades, where it was proved by Bolthausen and Sznitman in [5] using properties of the Poisson processes in the construction of the Ruelle probability cascades. However, the Ghirlanda-Guerra identities for the RPC were originally proved by Talagrand [39] and Bovier and Kurkova [6] by analyzing the Gibbs measure in the Derrida REM and GREM, and it was only later noticed by Talagrand that they follow much more easily from the Bolthausen-Sznitman invariance. The main result of [30] stated in (76), basically, reverses Talagrand’s observation.

4 Structure of the Gibbs Measure

It was clear since they were discovered that the stability properties impose strong constraints on the structure of the Gibbs measure, but the question was whether they lead all the way to the Ruelle probability cascades. The first partial answer to this question was given in an influential work of Arguin and Aizenman [3], who proved that, under a technical assumption that the overlap takes only finitely many values in the thermodynamic limit, \(R_{1,2}\in\{q_0,\ldots,q_r\}\), the Aizenman-Contucci stochastic stability (74) implies the ultrametricity predicted by the Parisi ansatz. Soon after, it was shown in [25] under the same technical assumption that the Ghirlanda-Guerra identities also imply ultrametricity (an elementary proof can be found in [27]). Another approach was given by Talagrand in [43]. However, since at low temperature the overlap does not necessarily take finitely many values in the thermodynamic limit, all these results were not directly applicable to the SK and mixed p-spin models. Nevertheless, they strongly suggested that the stability properties can explain the Parisi ansatz and, indeed, it was recently proved in [28] that the Ghirlanda-Guerra identities imply ultrametricity (21) without any technical assumptions. Before we explain some of the main ideas in the proof, let us first describe several preliminary facts that follow from the Ghirlanda-Guerra identities.

Pure States

First of all, one can show very easily (see [25]) that the Ghirlanda-Guerra identities (67) yield the Parisi pure states picture described above. Namely, if \(c^2\) is the largest point in the support of the distribution ζ of the overlap \(R_{1,2}\) under \(\mathbb{E}G^{\otimes2}\) defined in (19) then, with probability one, \(G(\|h\|=c)=1\). Moreover, the measure G is purely atomic if \(\zeta(\{c^2\})>0\); otherwise, it has no atoms. Of course, it is clear that Parisi's pure states picture in [34] was meant to be understood in an approximate sense and, when \(\zeta(\{c^2\})=0\) and G has no atoms, we can create pure states using ultrametricity by considering equivalence clusters in (22) for small positive diameter d>0. In the case when \(\zeta(\{c^2\})>0\), the pure states picture holds not only for the asymptotic Gibbs measures in the infinite-volume limit, but also for the original Gibbs measures \(G_N\) for systems of finite size in some approximate sense, as was shown by Talagrand in [43].

Talagrand’s Positivity Principle

Another important consequence of the Ghirlanda-Guerra identities is the so-called Talagrand positivity principle, proved in [39], which states that the overlaps can take only non-negative values in the thermodynamic limit, so σ 1σ 2≥0 for any two points in the support of G. In the Parisi replica method, the overlap was always assumed to be non-negative due to the symmetry breaking, and we see that this, indeed, can be obtained using a small perturbation of the Hamiltonian which ensures the validity of the Ghirlanda-Guerra identities. One key application of the positivity principle is to show that Guerra’s interpolation argument leading to the upper bound (51) also works for mixed p-spin models that include the pure p-spin Hamiltonians for odd p≥3, which was observed by Talagrand in [40].

Characterizing Asymptotic Gibbs’ Measures

Finally, let us mention a fact that has been well known since the discovery of the Ghirlanda-Guerra identities, namely, that together with ultrametricity these identities determine the distribution of the entire overlap array uniquely in terms of the functional order parameter ζ and, moreover, one can approximate the overlap array in distribution by an overlap array generated by some Ruelle probability cascades. In other words, as soon as we have ultrametricity, all the predictions of the physicists are confirmed. The idea here is very straightforward and we will only illustrate it in the simplest case when the overlaps take finitely many values, R 1,2∈{q 0,…,q r }. The general case easily follows by approximation. In the discrete case, we only need to demonstrate that, using ultrametricity and the Ghirlanda-Guerra identities, we can compute in terms of ζ the probability of any particular configuration of finitely many overlaps,

$$ \mathbb{E} \bigl\langle I \bigl(R_{l,l'} = q_{l,l'} : l\neq l'\leq n+1 \bigr) \bigr\rangle, $$
(77)

for any n≥1 and any \(q_{l,l'}\in\{q_0,\ldots,q_r\}\). Let us find the largest elements among \(q_{l,l'}\) for \(l\neq l'\) and, without loss of generality, suppose that \(q_{1,n+1}\) is one of them. We only have to consider \((q_{l,l'})\) that are ultrametric, since, otherwise, (77) is equal to zero. In particular, since \(q_{1,n+1}\) is the largest, for \(2\leq l\leq n\),

$$q_{1,l} \geq\min (q_{1,n+1}, q_{l,n+1} ) = q_{l,n+1} $$

and

$$q_{l,n+1} \geq\min (q_{1,n+1}, q_{1,l} ) = q_{1,l}, $$

which implies that \(q_{1,l} = q_{l,n+1}\). Hence, if the overlap \(R_{1,n+1}=q_{1,n+1}\) then, for all \(2\leq l\leq n\), \(R_{1,l}=q_{1,l}\) automatically implies that \(R_{l,n+1}=q_{1,l}\) and (77) equals

$$ \mathbb{E} \bigl\langle I \bigl(R_{l,l'} = q_{l,l'} : l,l'\leq n \bigr) I (R_{1,n+1} = q_{1,n+1} ) \bigr \rangle. $$
(78)

In other words, if we know that the replicas \(\sigma^1\) and \(\sigma^{n+1}\) are the closest then, due to ultrametricity, all the conditions \(R_{l,n+1}=q_{l,n+1}=q_{1,l}\) become redundant and we can omit them. The quantity (78) is now of the same type as the left hand side of (68) and, therefore, the Ghirlanda-Guerra identities imply that it is equal to

$$ \biggl( \frac{1}{n} \zeta \bigl(\{q_{1,n+1}\} \bigr) + \frac{1}{n}\sum_{l=2}^{n} I (q_{1,l} = q_{1,n+1} ) \biggr) \mathbb{E} \bigl\langle I \bigl(R_{l,l'} = q_{l,l'} : l,l'\leq n \bigr) \bigr\rangle. $$
(79)

We can continue this computation recursively over n and, in the end, (77) will be expressed completely in terms of the distribution of one overlap, ζ. To conclude that the overlap array can actually be generated by the Ruelle probability cascades corresponding to the functional order parameter ζ, we only need to recall that both properties, the Ghirlanda-Guerra identities and ultrametricity, are satisfied by the RPC, so all the probabilities (77) will be given by the same computation.

Ultrametricity

It remains to explain the main result in [28] which shows that ultrametricity is also a consequence of the Ghirlanda-Guerra identities. The main idea of the proof is the following invariance property. Given n≥1, we consider n bounded measurable functions f 1,…,f n :ℝ→ℝ and let

$$ F\bigl(\sigma,\sigma^1,\ldots,\sigma^n\bigr) = f_1\bigl(\sigma\cdot\sigma^1\bigr)+\cdots +f_n\bigl(\sigma\cdot\sigma^n\bigr). $$
(80)

For 1≤ln, we define

$$ F_l\bigl(\sigma,\sigma^1,\ldots,\sigma^n \bigr) = F\bigl(\sigma,\sigma^1,\ldots ,\sigma^n\bigr) - f_l\bigl( \sigma\cdot\sigma^l\bigr)+ \mathbb{E} \bigl \langle f_l(R_{1,2}) \bigr\rangle, $$
(81)

where, as before, 〈⋅〉 denotes the average with respect to G ⊗∞. Then, for any bounded measurable function Φ of the overlaps (R l,l) l,l′≤n of n replicas,

$$ \mathbb{E} \langle\varPhi \rangle= \mathbb{E} \biggl\langle \varPhi \frac{\exp\sum_{l=1}^{n} F_l(\sigma^l,\sigma^1,\ldots,\sigma^n)}{ \langle\exp F(\sigma,\sigma^1,\ldots,\sigma^n)\rangle_{-}^n} \biggr\rangle, $$
(82)

where \(\langle\,\cdot\,\rangle_{-}\) in the denominator is the average in σ for fixed \(\sigma^1,\ldots,\sigma^n\) with respect to the measure G. One can think of the ratio on the right hand side as a change of density that does not affect the distribution of the overlaps of n replicas. Originally, this invariance property was discovered using the unified stability property (76), so the Aizenman-Contucci stochastic stability played an equally important role. However, the proof presented in [28] is much simpler and more straightforward, and is based only on the Ghirlanda-Guerra identities. In some sense, this is good news because the Aizenman-Contucci stability is a more subtle property to work with than the Ghirlanda-Guerra identities, especially in the thermodynamic limit. To prove (82), one can consider an interpolating function

$$ \varphi(t) = \mathbb{E} \biggl\langle \varPhi \frac{\exp\sum_{l=1}^{n} t F_l(\sigma^l,\sigma^1,\ldots,\sigma^n)}{\langle\exp t F(\sigma,\sigma^1,\ldots,\sigma^n)\rangle_{-}^n} \biggr\rangle $$
(83)

and, using an elementary calculation, check that the Ghirlanda-Guerra identities imply that all the derivatives vanish at zero, \(\varphi^{(k)}(0)=0\). Taylor's expansion and some basic estimates of the derivatives yield that this function is constant for all t≥0, proving that φ(0)=φ(1), which is precisely (82). A special feature of the invariance property (82) is that it contains some very useful information not only about the overlaps but also about the Gibbs weights of the neighborhoods of the replicas \(\sigma^1,\ldots,\sigma^n\). Let us give one simple example. Recall that the measure G is concentrated on the sphere ∥h∥=c and, for \(q=c^2-{\varepsilon}\), let \(f_1(x)=tI(x\geq q)\) and \(f_2=\cdots=f_n=0\). Then

$$F\bigl(\sigma,\sigma^1,\ldots,\sigma^n\bigr) = t I\bigl( \sigma\cdot\sigma^1 \geq q\bigr) $$

is a scaled indicator of a small neighborhood of \(\sigma^1\) on the sphere ∥h∥=c. If we denote by \(W_1=G(\sigma : \sigma\cdot\sigma^1\geq q)\) the Gibbs weight of this neighborhood then the average in the denominator in (82) is equal to

$$\bigl\langle\exp F\bigl(\sigma,\sigma^1,\ldots,\sigma^n \bigr)\bigr\rangle_{-} = W_1 e^t + 1-W_1. $$

Suppose now that the function Φ=I A is an indicator of the event

$$A = \bigl\{\bigl(\sigma^1,\ldots, \sigma^n\bigr) : \sigma^1 \cdot\sigma^l <q \mbox{ for } 2\leq l\leq n \bigr \} $$

that the replicas σ 2,…,σ n are outside of this neighborhood of σ 1. Then, it is easy to see that

$$\sum_{l=1}^{n} F_l\bigl( \sigma^l,\sigma^1,\ldots,\sigma^n\bigr) = t \mathbb{E}\bigl\langle I(R_{1,2} \geq q)\bigr\rangle = : t\gamma $$

and (82) becomes

$$ \mathbb{E} \langle I_A \rangle= \mathbb{E} \biggl\langle I_A \frac{e^{t\gamma}}{ (W_1 e^t + 1-W_1)^n} \biggr\rangle, $$
(84)

which may be viewed as a condition on the weight W 1 and the event A. This is just one artificial example, but the idea can be pushed much further and with some work one can obtain some very useful consequences about the structure of the measure G. One of these consequences is the following “duplication property”.

Suppose that with positive probability over the choice of the measure G we can sample n replicas \(\sigma^1,\ldots,\sigma^n\) from G that are approximately at certain fixed distances from each other. Of course, since all replicas live on the same sphere, this can be expressed in terms of the overlaps, \(R_{l,l'}\approx a_{l,l'}\), for some n×n matrix of constraints \(A=(a_{l,l'})\). Let

$$a_n^* = \max(a_{1,n},\ldots,a_{n-1,n}) $$

be the constraint corresponding to the closest point among \(\sigma^1,\ldots,\sigma^{n-1}\) to the last replica \(\sigma^n\) and suppose that the distance between them is strictly positive, \(a_{n}^{*}<c^{2}\). Then, using the invariance property (82), one can show that with positive probability over the choice of G one can sample n+1 replicas \(\sigma^1,\ldots,\sigma^{n+1}\) from G such that the distances between the first n replicas are as above, \(R_{l,l'}\approx a_{l,l'}\), and the new replica \(\sigma^{n+1}\) duplicates \(\sigma^n\) in the following sense (see Fig. 2). First of all, it is approximately at the same distances from the replicas \(\sigma^1,\ldots,\sigma^{n-1}\) as \(\sigma^n\),

$$R_{1,n+1} \approx a_{1,n},\ldots, R_{n-1,n+1} \approx a_{n-1,n}, $$

and, moreover, it is at least as far from \(\sigma^n\) as the closest of the replicas \(\sigma^1,\ldots,\sigma^{n-1}\), that is, \(R_{n,n+1} \lesssim a_{n}^{*}\). The motivation for this property becomes clear if we recall how the support of the Ruelle probability cascades was constructed in (29). In that case it is obvious that one can always duplicate any replica with probability one, and not just in the weak sense described here. On the other hand, even this weak duplication property implies that the support of the measure G is ultrametric with probability one.

Fig. 2

Duplication property. The grey area corresponds to all the points on the sphere ∥h∥=c which are approximately at the same distance from the first n−1 replicas σ 1,…,σ n−1 as the last replica σ n. Then the white point is a duplicate σ n+1 of σ n. It is in the grey area, so it is approximately at the same distances from the replicas σ 1,…,σ n−1 as σ n, and it is at least as far from σ n as the closest of the first n−1 replicas, in this case σ n−1

To see this, suppose that ultrametricity is violated and with positive probability three replicas can take values

$$ R_{1,2}\approx x, R_{1,3} \approx y \quad \mbox{and} \quad R_{2,3}\approx z $$
(85)

for some constraints \(x<y\leq z<c^2\). Let us duplicate each replica in the above sense m−1 times, so that at the end we will have n=3m replicas. Suppose that

$$\{1,\ldots,n\} = I_1\cup I_2\cup I_3 $$

is the partition such that \(j\in I_j\) for j≤3, \(|I_j|=m\) and each \(I_j\setminus\{j\}\) is precisely the index set of duplicates of \(\sigma^j\). Then, it should be almost obvious that

  1. (a)

    \(R_{l,l'}\lesssim z\) for all \(l\neq l'\leq n\),

  2. (b)

    \(R_{l,l'}\approx x\) if \(l\in I_1\), \(l'\in I_2\); \(R_{l,l'}\approx y\) if \(l\in I_1\), \(l'\in I_3\); and \(R_{l,l'}\approx z\) if \(l\in I_2\), \(l'\in I_3\).

Property (a) holds, because a new replica never gets too close to the old replicas and the overlap will never exceed z, and property (b) holds, because, every time we duplicate a point, the distances to all other points will be the same, so the overlaps between points in different groups \(I_1\), \(I_2\), \(I_3\) will always be the same as the original constraints in (85). This means that with positive probability we can find replicas \(\sigma^1,\ldots,\sigma^n\) on the sphere of radius c such that their overlaps satisfy properties (a) and (b). Let \(\bar{\sigma}^{j}\) be the barycenter of the set \(\{\sigma^l : l\in I_j\}\). The condition (a) implies that

$$\bigl\|\bar{\sigma}^j\bigr\|^2 = \frac{1}{m^2}\sum _{l\in I_j} \bigl\|\sigma^l\bigr\|^2 + \frac{1}{m^2}\sum_{l\neq l'\in I_j} R_{l,l'} \lesssim\frac{mc^2 + m(m-1) z}{m^2}, $$

and the condition (b) implies that \(\bar{\sigma}^{1}\cdot\bar{\sigma}^{2} \approx x\), \(\bar{\sigma}^{1}\cdot\bar{\sigma}^{3} \approx y\) and \(\bar{\sigma}^{2}\cdot\bar{\sigma}^{3} \approx z\). Hence,

$$\bigl\|\bar{\sigma}^2-\bar{\sigma}^3\bigr\|^2 = \bigl\| \bar{\sigma}^2\bigr\|^2 + \bigl\| \bar {\sigma}^3 \bigr\|^2 - 2\bar{\sigma}^2\cdot\bar{\sigma}^3 \lesssim\frac{2(c^2 -z)}{m} $$

and \(0< y-x \approx\bar{\sigma}^{1} \cdot\bar{\sigma}^{3} -\bar{\sigma}^{1}\cdot\bar{\sigma}^{2} \lesssim K m^{-1/2}\) for some constant K that does not depend on m. We arrive at a contradiction by letting m→∞. The conclusion is that if the measure G satisfies the Ghirlanda-Guerra identities then its support must be ultrametric with probability one. Therefore, as was explained above, under a small perturbation of the Hamiltonian that yields the Ghirlanda-Guerra identities, all possible limits of the Gibbs measure can be identified with the Ruelle probability cascades. Moreover, we will see below that for “generic” mixed p-spin models no perturbation is necessary and the limit of the Gibbs measure is unique.

5 Consequences of the Parisi Formula

Once we know the Parisi formula for the free energy in the mixed p-spin models, some results can be extended and strengthened.

Universality in the Disorder

First of all, one can prove the universality in the disorder and show that the Parisi formula (14) still holds if the Gaussian random variables \((g_{ij})\) in the Hamiltonian (1) are replaced by i.i.d. random variables \((x_{ij})\) from any other distribution, as long as

$$ \mathbb{E}x_{11} = 0, \qquad \mathbb{E}x_{11}^2 =1 \quad \mbox{and} \quad \mathbb {E}|x_{11}|^3 <\infty. $$
(86)

This was proved by Carmona and Hu in [7], who generalized an earlier result of Talagrand [38] in the case of the Bernoulli disorder. The proof is based on the following interpolation between the two Hamiltonians for 0≤t≤1,

$$ H_{N,t}(\sigma) = \frac{1}{\sqrt{N}} \sum_{i,j =1}^N (\sqrt {t}x_{ij} + \sqrt{1-t}\, g_{ij} )\sigma_i \sigma_j, $$
(87)

and the estimates of the derivative of the free energy along this interpolation using some approximate integration by parts formulas. The same result was also proved in [7] for the p-spin model.
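The following Python sketch (reusing the exact-enumeration setup from the earlier sketch) illustrates this universality numerically: it computes the same small-N free energy with Gaussian and with Rademacher (±1) disorder, both satisfying (86); the two values should already be close, up to finite-size and Monte Carlo error. The parameters are illustrative.

```python
def free_energy(N, beta, sampler, samples=50):
    """Exact enumeration of (7) with the disorder drawn by `sampler`,
    any centered unit-variance distribution satisfying (86)."""
    spins = np.array(list(product([-1, 1], repeat=N)), dtype=float)
    total = 0.0
    for _ in range(samples):
        x = sampler((N, N))
        H = np.einsum('si,ij,sj->s', spins, x, spins) / np.sqrt(N)
        total += np.log(np.sum(np.exp(-beta * H)))
    return total / (samples * N)

gaussian = free_energy(10, 0.5, rng.standard_normal)
rademacher = free_energy(10, 0.5, lambda size: rng.choice([-1.0, 1.0], size=size))
print(gaussian, rademacher)   # should be close by universality in the disorder
```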

Generic Mixed p-Spin Models

In another direction, we can say more about the thermodynamic limit in the case of the so-called generic mixed p-spin models whose Hamiltonian (4) contains sufficiently many pure p-spin terms (5), so that the following condition is satisfied:

  1. (G)

    the linear span of constants and the power functions \(x^p\) corresponding to \(\beta_p\neq0\) is dense in \((C[-1,1],\|\cdot\|_\infty)\).

The reason for this is that each pure p-spin term contains some information about the pth moment of the overlap and the condition (G) allows us to confirm the Parisi ansatz for the Gibbs measure in the thermodynamic limit without the help of the perturbation term in (55), so the result becomes more pure, in some sense. Moreover, in this case we can show that the asymptotic Gibbs measure is unique. This can be seen in two steps. Let us denote by \(\mathcal{P}(\beta)\) the infimum on the right-hand side of (14), viewed as a function of the parameters \(\beta=(\beta_p)_{p\geq1}\), and let \(\mathcal{M}\) be the set of all limits over subsequences of the distribution of the overlap \(R_{1,2}\) of two spin configurations sampled from the Gibbs measure \(G_N\) corresponding to the Hamiltonian (4). It was proved by Talagrand in [41] that, for each p≥1, the Parisi formula is differentiable with respect to \(\beta_p\) and

$$ \frac{\partial \mathcal{P}(\beta)}{\partial\beta_p} = \beta^2\beta_p \biggl(1 - \int x^p \,d\zeta'(x) \biggr) $$
(88)

for all \(\zeta'\in\mathcal{M}\). If \(\beta_p\neq0\), then this implies that all the limits \(\zeta'\in\mathcal{M}\) have the same pth moment and the condition (G) then implies that \(\mathcal{M}=\{\zeta_0\}\) for some unique distribution \(\zeta_0\) on [−1,1]. As a second step, one can prove the convergence of the entire overlap array \((R_{l,l'})_{l,l'\geq1}\) in distribution as follows. As a consequence of the differentiability of the Parisi formula, it was proved in [26] that, whenever \(\beta_p\neq0\), the Ghirlanda-Guerra identities

$$ \mathbb{E} \bigl\langle f R_{1,n+1}^p \bigr\rangle = \frac{1}{n}\mathbb{E} \langle f \rangle\mathbb{E} \bigl\langle R_{1,2}^p \bigr\rangle + \frac{1}{n}\sum _{l=2}^{n}\mathbb{E} \bigl\langle f R_{1,l}^p \bigr\rangle $$
(89)

for the pth moment of the overlap hold in the thermodynamic limit in a strong sense, for the Gibbs measure \(G_N\) corresponding to the original Hamiltonian (4) without the perturbation term (55). The condition (G) then, obviously, implies that the general Ghirlanda-Guerra identities (68) also hold. As we explained above, the Ghirlanda-Guerra identities imply ultrametricity and, as a result, the distribution of the entire overlap array can be uniquely determined by the distribution of one overlap. Since this distribution \(\zeta_0\) is unique, the distribution of the entire overlap array under \(\mathbb{E}G_{N}^{\otimes\infty}\) also has a unique limit, so the asymptotic Gibbs measure is unique. Notice also that, by Talagrand's positivity principle, the distribution \(\zeta_0\) is, actually, supported on [0,1]. Finally, in this case one can show using the Aizenman-Sims-Starr scheme (44) that the limit of the free energy is equal to \(\mathcal{P}(\zeta_0)\), where we understand that the definition of the Parisi functional is extended to all distributions on [0,1] by continuity. Thus, the infimum in the Parisi formula (14) is achieved on the asymptotic distribution of the overlap, \(\zeta_0\). The functional \(\mathcal{P}(\zeta)\) is conjectured to be convex in ζ (see [23] for a partial result) and, if true, this would imply that \(\zeta_0\) is the unique minimizer of \(\mathcal{P}(\zeta)\). Convexity of \(\mathcal{P}(\zeta)\) would also give a more direct approach to describing the high temperature region, which was done by Talagrand in [39] (see also [44]).