1 Introduction

In recent years, mixing times of dynamics of statistical mechanical models have been the focus of much probability research, drawing interest from researchers in mathematics, physics and computer science. The topic is both physically relevant and mathematically rich. Up to now, however, most of the attention has focused on particular models, including rigorous results for several mean-field models. A few examples are (a) the Curie–Weiss (mean-field Ising) model [5, 6, 12], (b) the mean-field Blume–Capel model [8, 11], (c) the Curie–Weiss–Potts (mean-field Potts) model [1, 4, 9]. A good survey of the topic of mixing times of statistical mechanical models can be found in the recent paper by Cuff et al. [4].

An important question driving the work in the field is the relationship between the mixing times of the dynamics and the equilibrium phase transition structure of the corresponding statistical mechanical models. For example, the Curie–Weiss model, which undergoes a continuous, second-order, phase transition, was one of the first models studied to investigate this relationship and it was found that the mixing times undergo a transition from rapid to slow mixing at precisely the same critical value as the equilibrium phase transition [13]. This property was also shown for the mean-field Blume–Capel model [11] in the parameter regime where the model undergoes a continuous, second-order phase transition.

On the other hand, models that exhibit a discontinuous, first-order phase transition do not appear to share this property. This was first verified for the mean-field Blume–Capel model in the discontinuous phase transition parameter regime [11] and recently for the Curie–Weiss–Potts model [4]. For these models, it was shown that the change in mixing times occurs not at the equilibrium phase transition value, but instead at a smaller parameter value at which metastable states first emerge.

The results for models that exhibit a continuous phase transition were obtained by a direct application of the standard path coupling method, which requires contraction of the mean path coupling distance between all neighboring configurations. See [2] and [13]. For models that exhibit a discontinuous phase transition, straightforward path coupling methods fail, and the results of [11] were obtained by applying a novel extension called aggregate path coupling in one dimension together with large deviations estimates.

In this paper, we extend the work we did in [11] and provide a single general framework for determining the parameter regime for rapid mixing of the Glauber dynamics for a large class of statistical mechanical models, including all those listed above. The aggregate path coupling method presented here extends the classical path coupling method in two directions. First, we consider macroscopic quantities in higher dimensions and find a monotone contraction path by considering a related variational problem in the continuous space. We also do not require the monotone path to be a nearest-neighbor path. In fact, in most situations we consider, a nearest-neighbor path will not work for proving contraction. Second, the aggregation of the mean path distance along a monotone path is shown to contract for some but not all pairs of configurations. Nevertheless, we use measure concentration and a large deviation principle to show that contraction for pairs of configurations in which at least one configuration is close enough to the equilibrium is sufficient for establishing rapid mixing.

Our main result is general enough to be applied to statistical mechanical models that undergo both types of phase transitions and to models whose macroscopic quantities are in higher dimensions. Moreover, despite the generality, the application of our results requires only straightforward conditions, which we illustrate in Sect. 10. This is a significant simplification for proving rapid mixing for statistical mechanical models, especially those that undergo first-order, discontinuous phase transitions. Lastly, our results also provide a link between measure concentration of the stationary distribution and rapid mixing of the corresponding dynamics for this class of statistical mechanical models. This idea has been previously studied in [15], where the main result showed that rapid mixing implies measure concentration defined in terms of Lipschitz functions. In our work, we prove a type of converse where measure concentration, in terms of a large deviation principle, implies rapid mixing.

The paper is organized as follows. In Sect. 2 the general construction of the class of the mean-field models considered in this paper is provided. Next, in Sect. 3 the large deviation principle for the Gibbs measures from [7] that will be used in the main result of this manuscript is reviewed, and the concept of equilibrium macrostates is discussed. In Sects. 4 and 5 the Glauber dynamics is introduced, and its transition probabilities are analyzed. Section 6 provides a greedy coupling construction, standard for the Glauber dynamics of a mean-field statistical mechanical model. Section 7 describes a single time step evolution of the mean coupling distance for two configurations whose spin proportion vectors are \({\varepsilon }\)-close. Section 7 is followed by Sect. 8 which describes a single time step evolution of the mean coupling distance in general by aggregating the mean coupling distances along a monotone path of points connecting two configurations. Also, in Sect. 8 general conditions for the main result Theorem 9.2 are stated and discussed. The main result is stated and proved in Sect. 9. The paper concludes with Sect. 10, where the region of rapid mixing \(\beta < \beta _s(q,r)\) for the generalized Curie–Weiss–Potts model (including the standard Curie–Weiss–Potts model) is proven as an immediate and simple application of the main result of the current paper.

2 Gibbs Ensembles

We begin by defining the general class of statistical mechanical spin models to which our results can be applied. As mentioned above, this class includes all of the models listed in the introduction, and we illustrate the application of our main result for one particular model, the Curie–Weiss–Potts model, in Sect. 10.

Let q be a fixed integer and define \(\Lambda = \{ e^1, e^2, \ldots , e^q \}\), where \(e^k\) are the q standard basis vectors of \({\mathbb {R}}^q\). A configuration of the model has the form \(\omega = (\omega _1, \omega _2, \ldots , \omega _n) \in \Lambda ^n\). We will consider a configuration on a graph with n vertices and let \(X_i(\omega ) = \omega _i\) be the spin at vertex i. The random variables \(X_i\)’s for \(i=1, 2, \ldots , n\) are independent and identically distributed with common distribution \(\rho \).

In terms of the microscopic quantities, the spins at each vertex, the relevant macroscopic quantity is the magnetization vector (aka empirical measure or proportion vector)

$$\begin{aligned} L_n(\omega ) = (L_{n,1}(\omega ), L_{n,2}(\omega ), \ldots , L_{n,q}(\omega )), \end{aligned}$$
(1)

where the kth component is defined by

$$\begin{aligned} L_{n, k}(\omega ) = \frac{1}{n} \sum _{i=1}^n \delta (\omega _i, e^k) \end{aligned}$$

which yields the proportion of spins in configuration \(\omega \) that take on the value \(e^k\). The magnetization vector \(L_n\) takes values in the set of probability vectors

$$\begin{aligned} \mathcal {P}_n = \left\{ \left( \frac{n_1}{n}, \frac{n_2}{n}, \ldots , \frac{n_q}{n} \right) : \hbox {each} \ n_k \in \{0, 1, \ldots , n\} \ \hbox {and} \ \sum _{k=1}^q n_k = n \right\} \end{aligned}$$
(2)

inside the continuous simplex

$$\begin{aligned} \mathcal {P} = \left\{ \nu \in {\mathbb {R}}^q : \nu = (\nu _1, \nu _2, \ldots , \nu _q), \hbox {each} \ \nu _k \ge 0, \ \sum _{k=1}^q \nu _k = 1 \right\} . \end{aligned}$$
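As a concrete illustration of the magnetization vector (1), the following minimal sketch computes \(L_n(\omega )\) for a small configuration. The integer encoding of spins (label k standing for the basis vector \(e^{k+1}\)) and the function name `magnetization` are assumptions made for this example only.

```python
from collections import Counter

def magnetization(config, q):
    """Return L_n(config) as a list of q proportions.

    `config` is a sequence of spin labels in {0, ..., q-1}; label k stands
    for the basis vector e^{k+1} (an encoding chosen for this sketch).
    """
    n = len(config)
    counts = Counter(config)
    return [counts.get(k, 0) / n for k in range(q)]

omega = [0, 2, 2, 1, 2, 0]      # n = 6 spins, q = 3
L = magnetization(omega, q=3)   # proportions of e^1, e^2, e^3
```

The resulting vector has nonnegative entries that are multiples of 1/n and sum to one, so it lies in \(\mathcal {P}_n \subset \mathcal {P}\).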

Remark 2.1

For \(q=2\), the empirical measure \(L_n\) yields the empirical mean \(S_n(\omega )/n\) where \(S_n(\omega ) = \sum _{i=1}^n \omega _i\). Therefore, the class of models considered in this paper includes those where the relevant macroscopic quantity is the empirical mean, like the Curie–Weiss (mean-field Ising) model.

Statistical mechanical models are defined in terms of the Hamiltonian function, which we denote by \(H_n(\omega )\). The Hamiltonian function encodes the interactions of the individual spins and the total energy of a configuration. To take advantage of the large deviation bounds stated in the next section, we assume that the Hamiltonian can be expressed in terms of the empirical measures \(L_n\) as stated in the following definition.

Definition 2.2

For \(z \in {\mathbb {R}}^q\), we define the interaction representation function, denoted by H(z), to be a differentiable function satisfying

$$\begin{aligned} H_n (\omega ) = n H(L_n(\omega )) \end{aligned}$$

Throughout the paper we suppose the interaction representation function H(z) is a finite concave \(\mathcal {C}^3({\mathbb {R}}^q)\) function that has the form

$$\begin{aligned} H(z)=H_1(z_1)+H_2(z_2)+\ldots +H_q(z_q) \end{aligned}$$

For the Curie–Weiss–Potts (CWP) model discussed in Sect. 10,

$$\begin{aligned} H(z)=-{1 \over 2} \big <z,z \big >=-{1 \over 2} z_1^2-{1 \over 2} z_2^2-\ldots -{1 \over 2} z_q^2. \end{aligned}$$

Definition 2.3

The Gibbs measure or Gibbs ensemble in statistical mechanics is defined as

$$\begin{aligned} P_{n, \beta } (B) = \frac{1}{Z_n(\beta )} \int _B \exp \left\{ -\beta H_n(\omega ) \right\} dP_n = \frac{1}{Z_n(\beta )} \int _B \exp \left\{ -\beta n \, H\left( L_n(\omega ) \right) \right\} dP_n \end{aligned}$$
(3)

where \(P_n\) is the product measure with identical marginals \(\rho \) and \(Z_n(\beta ) = \int _{\Lambda ^n} \exp \left\{ -\beta H_n(\omega ) \right\} dP_n\) is the partition function. The positive parameter \(\beta \) represents the inverse temperature of the external heat bath.
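For very small systems the Gibbs measure (3) can be tabulated by brute force. The sketch below does this for the Curie–Weiss–Potts interaction \(H(z) = -\frac{1}{2}\langle z, z\rangle \), assuming \(\rho \) is uniform, so that the constant factor \(q^{-n}\) of the product measure cancels in the normalization. The function names and the integer spin encoding are illustrative choices, not notation from the paper.

```python
import itertools
import math

def hamiltonian_cwp(config, q):
    """H_n(config) = n * H(L_n(config)) with H(z) = -<z, z>/2."""
    n = len(config)
    L = [config.count(k) / n for k in range(q)]
    return -0.5 * n * sum(x * x for x in L)

def gibbs_measure(n, q, beta):
    """Return {configuration: probability} under P_{n,beta}, uniform rho assumed."""
    configs = list(itertools.product(range(q), repeat=n))
    weights = [math.exp(-beta * hamiltonian_cwp(c, q)) for c in configs]
    total = sum(weights)  # proportional to Z_n(beta); q^{-n} cancels
    return {c: w / total for c, w in zip(configs, weights)}

mu = gibbs_measure(n=4, q=3, beta=1.0)
```

Because the weight depends on a configuration only through \(L_n\), configurations with the same spin proportions receive equal probability, and for \(\beta > 0\) fully aligned configurations are the most likely. The enumeration has \(q^n\) terms, so it is only feasible for toy sizes.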

Remark 2.4

To simplify the presentation, we take \(\Lambda = \{ e^1, e^2, \ldots , e^q \}\), where \(e^k\) are the q standard basis vectors of \({\mathbb {R}}^q\). But our analysis has a straightforward generalization to the case where \(\Lambda = \{ \theta ^1, \theta ^2, \ldots , \theta ^q \}\), where the vectors \(\theta ^k\) form any basis of \({\mathbb {R}}^q\). In this case, the product measure \(P_n\) would have identical one-dimensional marginals equal to

$$\begin{aligned} \bar{\rho } = \frac{1}{q} \sum _{i=1}^q \delta _{\theta ^i} \end{aligned}$$

An important tool we use to prove rapid mixing of the Glauber dynamics that converge to the Gibbs ensemble above is the large deviation principle of the empirical measure with respect to the Gibbs ensemble. This measure concentration is precisely what drives the rapid mixing. The large deviations background is presented next.

3 Large Deviations

By Sanov’s Theorem, the empirical measure \(L_n\) satisfies the large deviation principle (LDP) with respect to the product measure \(P_n\) with identical marginals \(\rho \) and the rate function is given by the relative entropy

$$\begin{aligned} R(\nu | \rho ) = \sum _{k=1}^q \nu _k \log \left( \frac{\nu _k}{\rho _k} \right) \end{aligned}$$

for \(\nu \in \mathcal {P}\). Theorem 2.4 of [7] yields the following result for the Gibbs measures (3).

Theorem 3.1

The empirical measure \(L_n\) satisfies the LDP with respect to the Gibbs measure \(P_{n, \beta }\) with rate function

$$\begin{aligned} I_\beta (z) = R(z | \rho ) + \beta H(z) - \inf _t \{ R(t | \rho ) + \beta H(t) \} \end{aligned}$$

In other words, for any closed subset F,

$$\begin{aligned} \limsup _{n {\rightarrow }\infty } \frac{1}{n} \log P_{n, \beta } ( L_n \in F ) \le - \inf _{z \in F} I_\beta (z) \end{aligned}$$
(4)

and for any open subset G,

$$\begin{aligned} \liminf _{n {\rightarrow }\infty } \frac{1}{n} \log P_{n, \beta } ( L_n \in G ) \ge - \inf _{z \in G} I_\beta (z). \end{aligned}$$

The LDP upper bound (4) stated in the previous theorem yields the following natural definition of equilibrium macrostates of the model.

$$\begin{aligned} \mathcal {E}_\beta := \left\{ \nu \in \mathcal {P} : \nu \ \hbox {minimizes} \ R(\nu | \rho ) + \beta H(\nu ) \right\} \end{aligned}$$
(5)

For our main result, we assume that there exists a positive interval B such that for all \(\beta \in B\), \(\mathcal {E}_\beta \) consists of a single state \(z_\beta \). We refer to this interval B as the single phase region.

Again from the LDP upper bound, when \(\beta \) lies in the single phase region, we get

$$\begin{aligned} P_{n, \beta } (L_n \in dx) \Longrightarrow \delta _{z_\beta } \,\, {\hbox {as}} \,\, n {\rightarrow }\infty . \end{aligned}$$
(6)

The above asymptotic behavior will play a key role in obtaining a rapid mixing time rate for the Glauber dynamics corresponding to the Gibbs measures (3).
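The definition (5) of the equilibrium macrostates can be explored numerically. The following sketch minimizes \(R(\nu | \rho ) + \beta H(\nu )\) over a grid on the simplex for \(q = 3\), assuming uniform \(\rho \) and the Curie–Weiss–Potts interaction \(H(z) = -\frac{1}{2}\langle z, z\rangle \). The grid resolution `m` and the function names are assumptions of this example, and a grid search is only a rough stand-in for the exact variational problem.

```python
import math

def relative_entropy(nu, rho):
    """R(nu | rho), with the convention 0 * log 0 = 0."""
    return sum(v * math.log(v / r) for v, r in zip(nu, rho) if v > 0)

def H_cwp(z):
    """Interaction representation function H(z) = -<z, z>/2."""
    return -0.5 * sum(x * x for x in z)

def grid_minimizer(beta, m=60):
    """Minimize R(nu|rho) + beta*H(nu) over the q = 3 simplex grid with step 1/m."""
    rho = [1 / 3, 1 / 3, 1 / 3]
    best, best_val = None, float("inf")
    for a in range(m + 1):
        for b in range(m + 1 - a):
            nu = [a / m, b / m, (m - a - b) / m]
            val = relative_entropy(nu, rho) + beta * H_cwp(nu)
            if val < best_val:
                best, best_val = nu, val
    return best, best_val

best, val = grid_minimizer(beta=1.0)
```

For \(\beta = 1\) the search returns the uniform vector, in line with a single equilibrium macrostate \(z_\beta \) at small \(\beta \) for this model.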

An important function in our work is the free energy functional defined below. It is defined in terms of the interaction representation function H and the logarithmic moment generating function of a single spin; specifically, for \(z \in {\mathbb {R}}^q\) and \(\rho \) equal to the uniform distribution, the logarithmic moment generating function of \(X_1\), the spin at vertex 1, is defined by

$$\begin{aligned} \Gamma (z) = \log \left( \frac{1}{q} \sum _{k=1}^q \exp \{z_k\}\right) . \end{aligned}$$
(7)

Definition 3.2

The free energy functional for the Gibbs ensemble \(P_{n, \beta }\) is defined as

$$\begin{aligned} G_{\beta } (z) = \beta (-H)^*(-\nabla H(z)) - \Gamma ( -\beta \nabla H(z)) \end{aligned}$$
(8)

where for a finite, differentiable, convex function F on \({\mathbb {R}}^q\), \(F^*\) denotes its Legendre-Fenchel transform defined by

$$\begin{aligned} F^*(z) = \sup _{x \in {\mathbb {R}}^q} \{ \langle x, z \rangle - F(x) \} \end{aligned}$$

The following lemma yields an alternative formulation of the set of equilibrium macrostates of the Gibbs ensemble in terms of the free energy functional. The proof is a straightforward generalization of Theorem A.1 in [3].

Lemma 3.3

Suppose H is finite, differentiable, and concave. Then

$$\begin{aligned} \inf _{z \in \mathcal {P}} \{ R(z | \rho ) + \beta H(z) \} = \inf _{z \in {\mathbb {R}}^q} \{ G_\beta (z) \} \end{aligned}$$

Moreover, \(z_0 \in \mathcal {P}\) is a minimizer of \(R(z | \rho ) + \beta H(z)\) if and only if \(z_0\) is a minimizer of \(G_\beta (z)\).

Therefore, the set of equilibrium macrostates can be expressed in terms of the free energy functional as

$$\begin{aligned} \mathcal {E}_\beta = \left\{ z \in \mathcal {P} : z \ \hbox {minimizes} \ G_\beta (z) \right\} \end{aligned}$$
(9)

As mentioned above, we consider only the single phase region of the Gibbs ensemble; i.e. values of \(\beta \) where \(G_\beta (z)\) has a unique global minimum. For example, for the Curie–Weiss–Potts model, the single phase region consists of the values of \(\beta \) such that \(0 < \beta < \beta _c := (2(q-1)/(q-2)) \log (q-1)\). At this critical value \(\beta _c\), the model undergoes a first-order, discontinuous phase transition in which the single phase changes to multiple phases discontinuously. This is discussed in detail in Sect. 10.

As we will show, the geometry of the free energy functional \(G_\beta \) not only determines the equilibrium behavior of the Gibbs ensembles but it also yields the condition for rapid mixing of the corresponding Glauber dynamics.
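To make Definition 3.2 concrete, consider the Curie–Weiss–Potts interaction \(H(z) = -\frac{1}{2}\langle z, z\rangle \). Then \(-\nabla H(z) = z\) and \((-H)^*(z) = \frac{1}{2}\langle z, z\rangle \), so (8) reduces to \(G_\beta (z) = \frac{\beta }{2}\langle z, z\rangle - \Gamma (\beta z)\). The sketch below evaluates this expression; the function names are illustrative, and the reduction is a worked special case rather than a general implementation of (8).

```python
import math

def log_mgf(z):
    """Gamma(z) = log((1/q) * sum_k exp(z_k)) for the uniform single-spin law."""
    m = max(z)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in z) / len(z))

def free_energy_cwp(z, beta):
    """G_beta(z) for H(z) = -<z, z>/2, i.e. (beta/2)<z, z> - Gamma(beta * z)."""
    return 0.5 * beta * sum(x * x for x in z) - log_mgf([beta * x for x in z])

uniform = [1 / 3, 1 / 3, 1 / 3]
value_at_uniform = free_energy_cwp(uniform, beta=1.0)
```

At the uniform vector with \(q = 3\) and \(\beta = 1\), this gives \(-1/6\), which equals \(R(z | \rho ) + \beta H(z)\) at the same point, as Lemma 3.3 requires when the uniform vector is the minimizer.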

4 Glauber Dynamics and Mixing Times

On the configuration space \(\Lambda ^n\), we define the Glauber dynamics for the class of spin models considered in this paper. These dynamics yield a reversible Markov chain \(X^t\) with stationary distribution being the Gibbs ensemble \(P_{n, \beta }\).

  (i) Select a vertex i uniformly,

  (ii) Update the spin at vertex i according to the distribution \(P_{n, \beta }\), conditioned on the event that the spins at all vertices not equal to i remain unchanged.

For a given configuration \(\sigma = (\sigma _1, \sigma _2, \ldots , \sigma _n)\), denote by \(\sigma _{i, e^k}\) the configuration that agrees with \(\sigma \) at all vertices \(j \ne i\) and the spin at the vertex i is \(e^k\); i.e.

$$\begin{aligned} \sigma _{i, e^k} = (\sigma _1, \sigma _2, \ldots , \sigma _{i-1}, e^k, \sigma _{i+1}, \ldots , \sigma _n) \end{aligned}$$

Then if the current configuration is \(\sigma \) and vertex i is selected, the probability the spin at i is updated to \(e^k\), denoted by \(P(\sigma {\rightarrow }\sigma _{i, e^k})\), is equal to

$$\begin{aligned} P(\sigma {\rightarrow }\sigma _{i, e^k}) = \frac{\exp \big \{-\beta n H(L_n(\sigma _{i, e^k}))\big \}}{ \sum _{\ell =1}^q \exp \big \{-\beta n H(L_n(\sigma _{i, e^\ell }))\big \}}. \end{aligned}$$
(10)
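Steps (i) and (ii) together with the update probabilities (10) can be sketched as follows for the Curie–Weiss–Potts interaction \(H(z) = -\frac{1}{2}\langle z, z\rangle \). The integer spin encoding, the function names, and the use of Python's `random` module are assumptions of this example.

```python
import math
import random

def H_cwp(z):
    """Interaction representation function H(z) = -<z, z>/2."""
    return -0.5 * sum(x * x for x in z)

def update_probs(config, i, q, beta, H=H_cwp):
    """Probabilities (10) of setting the spin at vertex i to each of the q values."""
    n = len(config)
    counts = [0] * q
    for s in config:
        counts[s] += 1
    counts[config[i]] -= 1                      # remove the current spin at i
    log_weights = []
    for k in range(q):
        counts[k] += 1                          # tentatively place spin k at i
        log_weights.append(-beta * n * H([c / n for c in counts]))
        counts[k] -= 1
    m = max(log_weights)                        # stabilize the exponentials
    w = [math.exp(x - m) for x in log_weights]
    total = sum(w)
    return [x / total for x in w]

def glauber_step(config, q, beta, rng=random):
    """One step: pick a vertex uniformly, then resample its spin from (10)."""
    i = rng.randrange(len(config))
    probs = update_probs(config, i, q, beta)
    config[i] = rng.choices(range(q), weights=probs)[0]
    return config

probs = update_probs([0, 0, 0, 1], i=3, q=3, beta=1.0)
```

In this example the other three spins all equal the first value, so that value is the most likely update, while the two remaining values are equally likely by symmetry.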

The mixing time is a measure of the convergence rate of a Markov chain to its stationary distribution and is defined in terms of the total variation distance. The total variation distance between two distributions \(\mu \) and \(\nu \) on the configuration space \(\Omega \) is defined by

$$\begin{aligned} \Vert \mu - \nu \Vert _{\mathrm{TV}} = \sup _{A \subset \Omega } | \mu (A) - \nu (A)| = \frac{1}{2} \sum _{x \in \Omega } | \mu (x) - \nu (x)| \end{aligned}$$

Given the convergence of the Markov chain, we define the maximal distance to stationarity to be

$$\begin{aligned} d(t) = \max _{x \in \Omega } \Vert P^t(x, \cdot ) - \pi \Vert _{\mathrm{TV}} \end{aligned}$$

where \(P^t(x, \cdot )\) is the transition probability of the Markov chain starting in configuration x and \(\pi \) is its stationary distribution. Then, given \(\epsilon >0\), the mixing time of the Markov chain is defined by

$$\begin{aligned} t_{mix}(\epsilon ) = \min \{ t : d(t) \le \epsilon \} \end{aligned}$$

See [13] for a detailed survey on the theory of mixing times.
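The definitions of \(d(t)\) and \(t_{mix}(\epsilon )\) translate directly into a brute-force computation for a chain given by an explicit transition matrix. The sketch below assumes a small finite state space, a row-stochastic matrix `P` stored as nested lists, and a known stationary vector `pi`; the default `eps=0.25` is a common convention and not a value taken from the text.

```python
def total_variation(mu, nu):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps, or None if not reached."""
    size = len(P)
    # rows[x] holds the distribution P^t(x, .); start from t = 0 (point masses)
    rows = [[1.0 if x == y else 0.0 for y in range(size)] for x in range(size)]
    for t in range(t_max + 1):
        if max(total_variation(r, pi) for r in rows) <= eps:
            return t
        rows = [[sum(r[z] * P[z][y] for z in range(size)) for y in range(size)]
                for r in rows]
    return None

P = [[0.5, 0.5], [0.5, 0.5]]   # two-state chain that mixes in one step
t_mix = mixing_time(P, pi=[0.5, 0.5])
```

For the Glauber dynamics on \(\Lambda ^n\) the state space has \(q^n\) elements, so this direct approach is only practical for very small n; the coupling arguments developed below avoid this enumeration.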

Rates of mixing times are generally categorized into two groups: rapid mixing, which means that the mixing time exhibits polynomial growth with respect to the system size n, and slow mixing, which means that the mixing time grows exponentially with the system size. Determining the parameter regime where a model undergoes rapid mixing is of major importance, as it is in this region that the application of the dynamics is physically feasible.

5 Glauber Dynamics Transition Probabilities

In this section, we show that the update probabilities of the Glauber dynamics introduced in the previous section can be expressed in terms of the derivative of the logarithmic moment generating function of the individual spins \(\Gamma \) defined in (7). The partial derivative of \(\Gamma \) in the direction of \(e^\ell \) has the form

$$\begin{aligned} \left[ \partial _{\ell } \Gamma \right] (z) = \frac{\exp \{z_\ell \}}{\sum _{k=1}^q \exp \{z_k\}} \end{aligned}$$

We introduce the following function that plays the key role in our analysis.

$$\begin{aligned} g_{\ell }^{H, \beta }(z) = \left[ \partial _{\ell } \Gamma \right] (-\beta \nabla H(z)) = \frac{\exp \left( -\beta \, [\partial _\ell H](z) \right) }{ \sum _{k=1}^q \exp \left( -\beta \, [\partial _k H](z) \right) }. \end{aligned}$$
(11)

Denote

$$\begin{aligned} g^{H, \beta }(z):=\Big (g_1^{H, \beta }(z),\ldots ,g_q^{H, \beta }(z) \Big ). \end{aligned}$$
(12)

Note that \(g^{H, \beta }(z)\) maps the simplex

$$\begin{aligned} \mathcal {P} = \left\{ \nu \in {\mathbb {R}}^q : \nu = (\nu _1, \nu _2, \ldots , \nu _q), \hbox {each} \ \nu _k \ge 0, \ \sum _{k=1}^q \nu _k = 1 \right\} \end{aligned}$$

into itself and it can be expressed in terms of the free energy functional \(G_\beta \) defined in (8) by

$$\begin{aligned} \nabla G_\beta (z) = \beta [ \nabla (-H)^*(-\nabla H(z)) - g^{H, \beta }(z) ] \end{aligned}$$

Lemma 5.1

Let \(P(\sigma {\rightarrow }\sigma _{i, e^k})\) be the Glauber dynamics update probabilities given in (10). Then, for any \(k \in \{1,2,\ldots , q \}\),

$$\begin{aligned}&P(\sigma {\rightarrow }\sigma _{i, e^k}) = \left[ \partial _k \Gamma \right] \Big (-\beta \nabla H(L_n(\sigma ))-{\beta \over 2n}\mathcal {Q}H(L_n(\sigma ))+{\beta \over n} \Big < \sigma _i, \mathcal {Q}H(L_n(\sigma ))\Big > \sigma _i \Big )\\&\quad \quad \quad \quad \quad \qquad \qquad +\,O\left( \frac{1}{n^2} \right) , \end{aligned}$$

where \(\mathcal {Q}\) is the following linear operator:

$$\begin{aligned} \mathcal {Q} F(z):=\left( \partial _1^2 F(z), ~\partial _2^2 F(z), ~\ldots , ~\partial _q^2 F(z)\right) , \end{aligned}$$

for any \(~F: \mathbb {R}^q \rightarrow \mathbb {R}~\) in \(\mathcal {C}^2\).

Proof

Suppose \(\sigma _i=e^m\). By Taylor’s theorem, for any \(k \not = m\), we have

$$\begin{aligned} H(L_n(\sigma _{i, e^k}))= & {} H(L_n(\sigma )) +H_m\big (L_{n, m} (\sigma ) -1/n \big )-H_m\big (L_{n, m} (\sigma ) \big ) \\&+ ~H_k\big (L_{n, k} (\sigma ) +1/n \big )-H_k\big (L_{n, k} (\sigma ) \big ) \\= & {} H(L_n(\sigma )) + \frac{1}{n} \left[ \partial _{k} H(L_n(\sigma )) - \partial _m H(L_n(\sigma )) \right] \\&+~{1 \over 2n^2} \left[ \partial ^2_{k} H(L_n(\sigma )) + \partial ^2_m H(L_n(\sigma )) \right] + O\left( \frac{1}{n^3} \right) . \end{aligned}$$

Now, if \(k=m\),

$$\begin{aligned} H(L_n(\sigma _{i, e^k}))= & {} H(L_n(\sigma ))\\= & {} H(L_n(\sigma )) + \frac{1}{n} \left[ \partial _{k} H(L_n(\sigma )) -\partial _m H(L_n(\sigma )) \right] \\&+\,{1 \over 2n^2} \left[ -\partial ^2_{k} H(L_n(\sigma )) +\partial ^2_m H(L_n(\sigma )) \right] . \end{aligned}$$

This implies that the transition probability (10) has the form

$$\begin{aligned} P(\sigma {\rightarrow }\sigma _{i, e^k})= & {} \left[ \partial _k \Gamma \right] \Big (-\beta \nabla H(L_n(\sigma ))-{\beta \over 2n}\mathcal {Q}H(L_n(\sigma ))+{\beta \over n}\partial ^2_{m}H(L_n(\sigma ))e^m \Big )\\&+ \,O\left( \frac{1}{n^2} \right) \end{aligned}$$

as \(\exp \big \{O\left( \frac{1}{n^2} \right) \big \}=1+O\left( \frac{1}{n^2} \right) \). \(\square \)

The above Lemma 5.1 can be restated as follows using Taylor expansion.

Corollary 5.2

Let \(P(\sigma {\rightarrow }\sigma _{i, e^k})\) be the Glauber dynamics update probabilities given in (10). Then, for any \(k \in \{1,2,\ldots , q \}\),

$$\begin{aligned} P(\sigma {\rightarrow }\sigma _{i, e^k}) = g_k^{H, \beta }(L_n(\sigma )) +{ \beta \over n} \varphi _{k, \sigma _i}^{H, \beta }(L_n(\sigma ))+O\left( \frac{1}{n^2} \right) , \end{aligned}$$

where

$$\begin{aligned} \varphi _{k, e^r}^{H, \beta }(z):=-{1 \over 2}\Big <\mathcal {Q}H(z), \left[ \nabla \partial _{k} \Gamma \right] (-\beta \nabla H(z)) \Big >+\Big < e^r, \mathcal {Q} H(z)\Big > \Big <e^r, \left[ \nabla \partial _{k} \Gamma \right] (-\beta \nabla H(z)) \Big > . \end{aligned}$$

As indicated in the title of the paper, we employ a coupling method for proving rapid mixing of Glauber dynamics of Gibbs ensembles. In the next section, we define the specific coupling used.
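The leading-order statement of Corollary 5.2 can be checked numerically. The sketch below compares the exact update probabilities (10) with \(g^{H, \beta }(L_n(\sigma ))\) for the Curie–Weiss–Potts interaction \(H(z) = -\frac{1}{2}\langle z, z\rangle \), for which \(-\beta \nabla H(z) = \beta z\). The particular spin counts and all function names are choices made for this example.

```python
import math

def softmax(v):
    m = max(v)
    w = [math.exp(x - m) for x in v]
    s = sum(w)
    return [x / s for x in w]

def g_cwp(z, beta):
    """g^{H,beta}(z) for H(z) = -<z, z>/2, where -beta * grad H(z) = beta * z."""
    return softmax([beta * x for x in z])

def exact_update_probs(counts, m, beta):
    """Probabilities (10) when the selected vertex currently has spin index m."""
    n = sum(counts)
    logits = []
    for k in range(len(counts)):
        c = list(counts)
        c[m] -= 1
        c[k] += 1
        # -beta * n * H(L_n) with H(z) = -<z, z>/2
        logits.append(0.5 * beta * n * sum((x / n) ** 2 for x in c))
    return softmax(logits)

def max_gap(n, beta=1.0):
    """Largest coordinate gap between (10) and g at proportions near (1/2, 1/3, 1/6)."""
    counts = [n // 2, n // 3, n - n // 2 - n // 3]
    z = [c / n for c in counts]
    exact = exact_update_probs(counts, 0, beta)
    return max(abs(a - b) for a, b in zip(exact, g_cwp(z, beta)))
```

Evaluating `max_gap` at increasing n shows the gap shrinking roughly in proportion to 1/n, consistent with the \(\frac{\beta }{n} \varphi \) correction term.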

6 Coupling of Glauber Dynamics

We begin by defining a metric on the configuration space \(\Lambda ^n\). For two configurations \(\sigma \) and \(\tau \) in \(\Lambda ^n\), define

$$\begin{aligned} d(\sigma , \tau ) = \sum _{j=1}^n 1\{ \sigma _j \ne \tau _j \} \end{aligned}$$
(13)

which yields the number of vertices at which the two configurations differ.

Let \(X^t\) and \(Y^t\) be two copies of the Glauber dynamics. Here, we use the standard greedy coupling of \(X^t\) and \(Y^t\). At each time step a vertex is selected at random, uniformly from the n vertices. Suppose \(X^t=\sigma \), \(~Y^t=\tau \), and the vertex selected is denoted by j. Next, we erase the spin at location j in both processes, and replace it with a new one according to the following update probabilities. For all \(\ell = 1, 2, \ldots , q\), define

$$\begin{aligned} p_\ell = P(\sigma {\rightarrow }\sigma _{j, e^\ell }) \quad \hbox {and} \quad q_\ell = P(\tau {\rightarrow }\tau _{j, e^\ell }) \end{aligned}$$

and let

$$\begin{aligned} P_\ell = \min \{ p_\ell , q_\ell \} \,\, \hbox {and} \,\, P = \sum _{\ell = 1}^q P_\ell . \end{aligned}$$

Now, let B be a Bernoulli random variable with probability of success P. If \(B = 1\), we update the two chains equally with the following probabilities

$$\begin{aligned} P\left( X_j^{t+1} = e^\ell , Y_j^{t+1} = e^\ell \, | \, B=1\right) = \frac{P_\ell }{P} \end{aligned}$$

for \(\ell = 1, 2, \ldots , q\). On the other hand, if \(B= 0\), we update the chains differently according to the following probabilities

$$\begin{aligned} P\left( X_j^{t+1} = e^\ell , Y_j^{t+1} = e^m \, | \, B=0\right) = \frac{p_\ell - P_\ell }{1-P} \cdot \frac{q_m - P_m}{1-P} \end{aligned}$$

for all pairs \(\ell \ne m\). Then the total probability that the two chains update the same is equal to P and the total probability that the chains update differently is equal to \(1-P\).
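The greedy coupling above can be written out as a joint distribution on pairs of new spins. The following sketch builds that joint law from the two update distributions, with the conditional probabilities given \(B = 0\) multiplied by \(P(B = 0) = 1 - P\); the function name and the dictionary representation are assumptions of this example.

```python
def greedy_coupling(p, q_probs):
    """Joint law of (new X spin, new Y spin) as a dict {(l, m): probability}."""
    k = len(p)
    overlap = [min(a, b) for a, b in zip(p, q_probs)]   # P_l = min(p_l, q_l)
    P = sum(overlap)
    # B = 1: both chains receive the same spin l with total probability P_l
    joint = {(l, l): overlap[l] for l in range(k)}
    if P < 1:
        for l in range(k):
            for m in range(k):
                if l != m:
                    # B = 0: residual masses, weighted by P(B = 0) = 1 - P
                    joint[(l, m)] = ((p[l] - overlap[l])
                                     * (q_probs[m] - overlap[m]) / (1 - P))
    return joint

p = [0.5, 0.3, 0.2]
q_probs = [0.2, 0.3, 0.5]
joint = greedy_coupling(p, q_probs)
```

Summing the joint law over the second coordinate recovers p, and over the first coordinate recovers the second distribution, so each chain individually follows the Glauber dynamics. The off-diagonal mass equals \(1 - P\), which is the total variation distance between the two update distributions.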

Observe that once \(X^t=Y^t\), the processes remain matched (coupled) for the rest of the time. In the coupling literature, the time

$$\begin{aligned} \min \{t \ge 0 ~:~ X^t=Y^t\} \end{aligned}$$

is referred to as the coupling time.

The mean coupling distance \(~\mathbb {E}[d(X^t, Y^t) ]~\) is tied to the total variation distance via the following inequality known as the coupling inequality:

$$\begin{aligned} \Vert P^t(x, \cdot ) - P^t(y, \cdot ) \Vert _{{\hbox {TV}}} \le P(X^t \ne Y^t) \le \mathbb {E}[d(X^t, Y^t) ] \end{aligned}$$
(14)

The above inequality implies that the order of the mean coupling time is an upper bound on the order of the mixing time. See [13] and [14] for details on coupling and coupling inequalities.

7 Mean Coupling Distance

Fix \(~\varepsilon >0\). Consider two configurations \(\sigma \) and \(\tau \) such that

$$\begin{aligned} d(\sigma ,\tau )=d, \end{aligned}$$

where \(d(\sigma , \tau ) \in \mathbb {N}\) is the metric defined in (13) and \({\varepsilon }\le \Vert L_n(\sigma )-L_n(\tau )\Vert _1 < 2\varepsilon \).

Let \(\mathcal {I}=\{i_1,\ldots ,i_d\}\) be the set of vertices at which the spin values of the two configurations \(\sigma \) and \(\tau \) disagree. Define \(\kappa (e^\ell )\) to be the probability that the coupled processes update differently when the chosen vertex \(j\not \in \mathcal {I}\) has spin \(e^\ell \). If the chosen vertex j is such that \(\sigma _j=\tau _j=e^\ell \), then expressing \(\kappa (e^\ell )\) by total variation distance and by Corollary 5.2 of Lemma 5.1,

$$\begin{aligned} \kappa (e^\ell ):= & {} {1 \over 2}\sum _{k=1}^q \Big |P(\sigma {\rightarrow }\sigma _{j, e^k}) -P(\tau {\rightarrow }\tau _{j, e^k}) \Big |\nonumber \\= & {} {1 \over 2}\sum _{k=1}^q \Big | \Big (g_k^{H, \beta }(L_n(\sigma )) +{ \beta \over n} \varphi _{k, e^\ell }^{H, \beta }(L_n(\sigma )) \Big ) - \Big (g_k^{H, \beta }(L_n(\tau )) +{ \beta \over n} \varphi _{k, e^\ell }^{H, \beta }(L_n(\tau )) \Big ) \Big |\nonumber \\&+\,O\left( \frac{1}{n^2} \right) \nonumber \\= & {} {1 \over 2}\sum _{k=1}^q \Big | g_k^{H, \beta }(L_n(\sigma )) - g_k^{H, \beta }(L_n(\tau )) \Big | +O\left( \frac{\varepsilon }{n}+ \frac{1}{n^2} \right) . \end{aligned}$$
(15)

Next, we observe that for any \(\mathcal {C}^2\) function \(f:~\mathcal {P} \rightarrow \mathbb {R}\), there exists \(C>0\) such that

$$\begin{aligned} \left| f(z') - f(z) - \Big <z' - z, \nabla f(z) \Big >~\right| < C\varepsilon ^2 \end{aligned}$$
(16)

for all \(z, z' \in \mathcal {P}\) satisfying \(\varepsilon \le \Vert z'-z\Vert _1 < 2\varepsilon \).

Therefore for n large enough, there exists \(C'>0\) such that

$$\begin{aligned} \left| \kappa (e^\ell ) ~-~ {1 \over 2}\sum _{k=1}^q \Big | \Big <L_n(\tau ) - L_n(\sigma ), \nabla g_k^{H, \beta }(L_n(\sigma )) \Big > \Big | \right| ~<C'\varepsilon ^2. \end{aligned}$$
(17)

The above result holds regardless of the value of \(\ell \in \{1,2,\dots ,q\}\).

Similarly, when the chosen vertex \(j \in \mathcal {I}\), the probability of not coupling at j satisfies (17).

We conclude that in terms of \(\kappa _{\sigma ,\tau } := {1 \over 2}\sum _{k=1}^q \Big | \Big <L_n(\tau ) - L_n(\sigma ), \nabla g_k^{H, \beta }(L_n(\sigma )) \Big > \Big |\), the mean distance between a coupling of the Glauber dynamics starting in \(\sigma \) and \(\tau \) with \(d(\sigma , \tau ) = d\) after one step has the form

$$\begin{aligned} \mathbb {E}_{\sigma ,\tau }[d(X,Y)]\le & {} d-{d \over n} (1-\kappa _{\sigma ,\tau })+{n-d \over n}\kappa _{\sigma ,\tau } +c \, \varepsilon ^2 \nonumber \\= & {} d \cdot \left[ 1-{1 \over n}\left( 1-{\kappa _{\sigma ,\tau }+c \, \varepsilon ^2 \over d/n} \right) \right] \end{aligned}$$
(18)

for a fixed \(c>0\) and all \({\varepsilon }\) small enough.

8 Aggregate Path Coupling

In the previous section, we derived the form of the mean distance between a coupling of the Glauber dynamics starting in two configurations whose distance is bounded. We next derive the form of the mean coupling distance of a coupling starting in two configurations that are connected by a path of configurations where the distance between successive configurations is bounded.

Definition 8.1

Let \(\sigma \) and \(\tau \) be configurations in \(\Lambda ^n\). We say that a path \(\pi \) connecting configurations \(\sigma \) and \(\tau \) denoted by

$$\begin{aligned} \pi : \ \sigma = x_0, x_1, \ldots , x_r = \tau , \end{aligned}$$

is a monotone path if

  (i) \(~\sum \limits _{i=1}^r d(x_{i-1},x_i) = d(\sigma ,\tau )\);

  (ii) for each \(k=1, 2, \ldots , q\), the kth coordinate of \(L_n(x_i)\), \(L_{n,k}(x_i)\), is monotonic as i increases from 0 to r.

Observe that here the points \(x_i\) on the path are not required to be nearest-neighbors.

A straightforward property of monotone paths is that

$$\begin{aligned} \sum _{i=1}^r \sum _{k=1}^q | L_{n,k}(x_i) - L_{n,k}(x_{i-1}) | = \Vert L_n(\sigma ) - L_n(\tau ) \Vert _1 \end{aligned}$$

Another straightforward observation is that for any given path

$$\begin{aligned} L_n(\sigma )=z_0, z_1,\ldots ,z_r=L_n(\tau ) \end{aligned}$$

in \(\mathcal {P}_n\), monotone in each coordinate, with \(~\Vert z_i-z_{i-1}\Vert _1>0\) for all \(i \in \{1,2,\ldots ,r\}\), there exists a monotone path

$$\begin{aligned} \pi : \ \sigma = x_0, x_1, \ldots , x_r = \tau \end{aligned}$$

such that \(L_n(x_i)=z_i\) for each i.

Let \(\pi : \sigma = x_0, x_1, \ldots , x_r = \tau \) be a monotone path connecting configurations \(\sigma \) and \(\tau \) such that \(\varepsilon \le \Vert L_n(x_i) - L_n(x_{i-1}) \Vert _1 < 2\varepsilon \) for all \(i = 1, \ldots , r\). Equation (18) implies the following bound on the mean distance between a coupling of the Glauber dynamics starting in configurations \(\sigma \) and \(\tau \):

$$\begin{aligned}&\mathbb {E}_{\sigma , \tau }[d(X,Y)] \nonumber \\&\quad \le \sum _{i=1}^r \mathbb {E}_{x_{i-1}, x_i}[d(X_{i-1}, X_i)] \nonumber \\&\quad \le \sum \limits _{i=1}^r \left\{ d(x_{i-1},x_i) \cdot \left[ 1 -{1 \over n} \left( 1-{{1 \over 2} \sum \limits _{k=1}^q \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big |+c \varepsilon ^2 \over d(x_{i-1},x_i)/n} \right) \right] \right\} \nonumber \\&\quad = d(\sigma ,\tau ) \left[ 1-{1 \over n}\left( 1-{\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big |+c \varepsilon ^2 \over 2d(\sigma ,\tau )/n }\right) \right] \nonumber \\&\quad \le d(\sigma ,\tau ) \left[ 1-{1 \over n}\left( 1-{\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big |+c \varepsilon ^2 \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 }\right) \right] , \end{aligned}$$
(19)

as \(~\sum \limits _{i=1}^r d(x_{i-1},x_i) = d(\sigma ,\tau )\).

From inequality (19), if there exist monotone paths between all pairs of configurations such that there is a uniform bound less than 1 on the ratio

$$\begin{aligned} {\sum _{k=1}^q \sum _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big | \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 } \end{aligned}$$

then the mean coupling distance contracts, which yields a bound on the mixing time via the coupling inequality (14).

Although the Gibbs measures are distributions of the empirical measure \(L_n\) defined on the discrete space \(\mathcal {P}_n\), proving contraction of the mean coupling distance is often facilitated by working in the continuous space, namely the simplex \(\mathcal {P}\). We begin our discussion of aggregate path coupling by defining distances along paths in \(\mathcal {P}\).

Recall the function \(g^{H,\beta }\) defined in (12) which is dependent on the Hamiltonian of the model through the interaction representation function H defined in Definition 2.2.

Definition 8.2

Define the aggregate g-variation between a pair of points x and z in \(\mathcal {P}\) along a continuous monotone (in each coordinate) path \(\rho \) to be

$$\begin{aligned} D_\rho ^g (x, z) := \sum \limits _{k=1}^q\int \limits _{\rho } \Big | \Big <\nabla g_k^{H, \beta }(y), dy \Big > \Big | \end{aligned}$$

Define the corresponding pseudo-distance between a pair of points x and z in \(\mathcal {P}\) as

$$\begin{aligned} d_g(x,z) :=\inf _{\rho }D_\rho ^g (x, z), \end{aligned}$$

where the infimum is taken over all continuous monotone paths in \(\mathcal {P}\) connecting x and z.

Notice that if the monotonicity restriction were removed, the above infimum would satisfy the triangle inequality. We will need the following condition.

Condition 8.3

Let \(z_\beta \) be the unique equilibrium macrostate. There exists \(\delta \in (0,1)\) such that

$$\begin{aligned} {d_g(z,z_\beta ) \over \Vert z-z_\beta \Vert _1} \le 1-\delta \end{aligned}$$

for all z in \(\mathcal {P}\).

Observe that if it is shown that \(~d_g(z,z_\beta ) < \Vert z-z_\beta \Vert _1\) for all z in \(\mathcal {P}\), then by continuity the above condition is equivalent to

$$\begin{aligned} \limsup _{z \rightarrow z_\beta } {d_g(z,z_\beta ) \over \Vert z-z_\beta \Vert _1} <1 \end{aligned}$$

Suppose Condition 8.3 is satisfied. Then let \(\mathrm{NG_\delta }\) denote a family of neo-geodesic smooth curves, monotone in each coordinate, such that for each \(z \not = z_\beta \) in \(\mathcal {P}\), there is exactly one curve \(\rho =\rho _z\) in the family \(\mathrm{NG_\delta }\) connecting \(z_\beta \) to z, and

$$\begin{aligned} {D_\rho ^g (z, z_\beta ) \over \Vert z-z_\beta \Vert _1} \le 1-\delta /2 \end{aligned}$$

Condition 8.4

For \(\varepsilon >0\) small enough, there exists a neo-geodesic family \(\mathrm{NG_\delta }\) such that for each z in \(\mathcal {P}\) satisfying \(\Vert z-z_\beta \Vert _1 \ge \varepsilon \) , the curve \(\rho =\rho _z\) in the family \(\mathrm{NG_\delta }\) that connects \(z_\beta \) to z satisfies

$$\begin{aligned} {\sum _{k=1}^q \sum _{i=1}^r \Big | \Big <z_i - z_{i-1}, \nabla g_k^{H, \beta }(z_{i-1}) \Big > \Big | \over \Vert z-z_\beta \Vert _1} \le 1-\delta /3 \end{aligned}$$

for a sequence of points \(z_0=z_\beta ,z_1,\ldots ,z_r=z\) interpolating \(\rho \) such that

$$\begin{aligned} \varepsilon \le \Vert z_i - z_{i-1}\Vert _1 < 2\varepsilon \quad \text { for } i=1,2,\ldots , r. \end{aligned}$$

It is important to observe that Condition 8.3 is often simpler to verify than Condition 8.4. Moreover, under certain simple additional prerequisites, Condition 8.3 implies Condition 8.4. For example, this is achieved if there is a uniform bound on the Cauchy curvature at every point of every curve in \(\mathrm{NG_\delta }\). We will demonstrate on the example of the Curie–Weiss–Potts model that the natural way to establish Condition 8.4 for the model is to first establish Condition 8.3.

In addition to Condition 8.4, which will be shown to imply contraction when one of the two configurations in the coupled processes is at the equilibrium, i.e. \(L_n(\sigma )=z_\beta \), we need a condition that will imply contraction between two configurations within a neighborhood of the equilibrium configuration. We state this assumption next.

Condition 8.5

Let \(z_\beta \) be the unique equilibrium macrostate. Then,

$$\begin{aligned} \limsup \limits _{z \rightarrow z_\beta } {\Vert g^{H, \beta }(z)-g^{H, \beta }(z_\beta )\Vert _1 \over \Vert z-z_\beta \Vert _1} <1. \end{aligned}$$

Since \(H(z) \in \mathcal {C}^3\), the above Condition 8.5 implies that for any \({\varepsilon }>0\) sufficiently small, there exists \(\gamma \in (0,1)\) such that

$$\begin{aligned} {\Vert g^{H, \beta }(z)-g^{H, \beta }(w)\Vert _1 \over \Vert z-w\Vert _1} <1-\gamma \end{aligned}$$

for all z and w in \(\mathcal {P}\) satisfying

$$\begin{aligned} \Vert z-z_\beta \Vert _1<{\varepsilon }\quad \text { and }\quad \Vert w-z_\beta \Vert _1<{\varepsilon }. \end{aligned}$$

9 Main Result

A sufficient condition for rapid mixing of the Glauber dynamics of Gibbs ensembles is contraction of the mean coupling distance \(\mathbb {E}_{\sigma , \tau }[d(X,Y)]\) between coupled processes starting in all pairs of configurations in \(\Lambda ^n\). The classical path coupling argument [2] is a method of obtaining this contraction by only proving contraction between couplings starting in neighboring configurations. However, for some classes of models (e.g. models that undergo a first-order, discontinuous phase transition) there are situations when Glauber dynamics exhibits rapid mixing, but coupled processes do not exhibit contraction between some neighboring configurations. Such models include the mean-field Blume–Capel (in the discontinuous phase transition region) and Curie–Weiss–Potts models. A major strength of the aggregate path coupling method, introduced in [11] for the mean-field Blume–Capel model and further expanded in this study, is that, in addition to its generality, it yields a proof of rapid mixing even in those cases when contraction of the mean distance between couplings starting in all pairs of neighboring configurations does not hold.

The strategy is to take advantage of the large deviations estimates discussed in Sect. 3. Recall from that section that we assume that the set of equilibrium macrostates \(\mathcal {E}_\beta \), which can be expressed in the form given in (9), consists of a single point \(z_\beta \). Define an equilibrium configuration \(\sigma _\beta \) to be a configuration such that

$$\begin{aligned} L_n(\sigma _\beta ) = z_\beta = ((z_\beta )_1, (z_\beta )_2, \ldots , (z_\beta )_q). \end{aligned}$$

First we observe that in order to use the coupling inequality (14) we need to show contraction of the mean coupling distance \(\mathbb {E}_{\sigma , \tau }[d(X,Y)]\) between a Markov chain initially distributed according to the stationary probability distribution \( P_{n, \beta }\) and a Markov chain starting at any given configuration. Using large deviations we know that with high probability the former process starts near the equilibrium and stays near the equilibrium for a long time.

Our main result Theorem 9.2 states that once we establish contraction of the mean coupling distance between two copies of a Markov chain where one of the coupled dynamics starts near an equilibrium configuration in Lemma 9.1, then this contraction, along with the large deviations estimates of the empirical measure \(L_n\), yields rapid mixing of the Glauber dynamics converging to the Gibbs measure.

Now, the classical path coupling relies on showing contraction along any monotone path connecting two configurations, in one time step. Here we observe that we only need to show contraction along one monotone path connecting two configurations in order to have the mean coupling distance \(\mathbb {E}_{\sigma , \tau }[d(X,Y)]\) contract in a single time step. However, finding even one monotone path with which we can show contraction in inequality (19) is not easy. The answer lies in finding a monotone path \(\rho \) in \(\mathcal {P}\) connecting the \(L_n\) values of the two configurations, \(\sigma \) and \(\tau \), such that

$$\begin{aligned} {\sum \limits _{k=1}^q\int \limits _{\rho } \Big | \Big <\nabla g_k^{H, \beta }(y), dy \Big > \Big | \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 } ~<1 \end{aligned}$$

Although \(\rho \) is a continuous path in continuous space \(\mathcal {P}\), it serves as Ariadne’s thread for finding a monotone path

$$\begin{aligned} \pi :~\sigma =x_0,~x_1,\ldots ,x_r=\tau \end{aligned}$$

such that \(L_n(x_0),~L_n(x_1),\ldots ,L_n(x_r)\) in \(\mathcal {P}_n\) are positioned along \(\rho \), and

$$\begin{aligned} \sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big | \end{aligned}$$

is a Riemann sum approximating \(~\sum \limits _{k=1}^q\int \limits _{\rho } \Big | \Big <\nabla g_k^{H, \beta }(y), dy \Big > \Big |\). Therefore we obtain

$$\begin{aligned} {\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big | \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 } ~<1, \end{aligned}$$

which in turn implies contraction in (19) for \({\varepsilon }\) small enough and n large enough. See Fig. 1.

Fig. 1 Case \(q=3\). Dashed curve is the continuous monotone path \(\rho \). Solid lines represent the path \(L_n(x_0),~L_n(x_1),\ldots ,L_n(x_r)\) in \(\mathcal {P}_n\)

Observe that in order for the above argument to work, we need to spread points \(L_n(x_i) \in \mathcal {P}_n\) along a continuous path \(\rho \) at intervals of fixed order \({\varepsilon }\). Thus \(\pi \) cannot be a nearest-neighbor path in the space of configurations, which is another significant deviation from classical path coupling.

Lemma 9.1

Assume Condition 8.4 and Condition 8.5. Let (X, Y) be a coupling of the Glauber dynamics as defined in Sect. 6, starting in configurations \(\sigma \) and \(\tau \) and let \(z_\beta \) be the single equilibrium macrostate of the corresponding Gibbs ensemble. Then there exists an \(\alpha >0\) and an \(\varepsilon '>0\) small enough such that for n large enough,

$$\begin{aligned} \mathbb {E}_{\sigma ,\tau } [ d(X,Y)] \le e^{-\alpha /n} d(\sigma , \tau ) \end{aligned}$$

whenever \(\Vert L_n(\sigma ) -z_\beta \Vert _1<\varepsilon ' \).

Proof

Let \({\varepsilon }\) and \(\delta \) be as in Condition 8.4, and let \({\varepsilon }'={\varepsilon }^2 \delta /M\) with a constant \(M\gg 0\).

Case I  Suppose \(L_n(\tau )=z\) and \(L_n(\sigma )=w\), where \(\Vert z-z_\beta \Vert _1 \ge {\varepsilon }\) and \(\Vert w-z_\beta \Vert _1< {\varepsilon }'\).

Then there is an equilibrium configuration \(\sigma _\beta \) with \(L_n(\sigma _\beta )=z_\beta \) such that there is a monotone path

$$\begin{aligned} \pi ':~\sigma _\beta =x'_0,~x'_1,\ldots ,x'_r=\tau \end{aligned}$$

connecting configurations \(\sigma _\beta \) and \(\tau \) on \(\Lambda ^n\) such that \({\varepsilon }\le \Vert L_n(x'_i) - L_n(x'_{i-1})\Vert _1<2{\varepsilon }\), and by Condition 8.4,

$$\begin{aligned} {\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x'_i) - L_n(x'_{i-1}), \nabla g_k^{H, \beta }(L_n( x'_{i-1})) \Big > \Big | \over \Vert L_n(\sigma _\beta )-L_n(\tau )\Vert _1 } \le 1-\delta /4 \end{aligned}$$

for n large enough. Note that the difference between the above inequality and Condition 8.4 is that here we take \(L_n(x'_i) \in \mathcal {P}_n\). Now, there exists a monotone path from \(\sigma \) to \(\tau \)

$$\begin{aligned} \pi :~\sigma =x_0,~x_1,\ldots ,x_r=\tau \end{aligned}$$

such that

$$\begin{aligned} \Vert L_n(x_i)-L_n(x'_i)\Vert _1 \le {\varepsilon }' \qquad \text { for all }i=0,1,\ldots ,r. \end{aligned}$$

The new monotone path \(\pi \) is constructed from \(\pi '\) by ensuring that either

$$\begin{aligned} 0 \le \Big <L_n(x_i)-L_n(x_{i-1}),e^k\Big > \le \Big <L_n(x'_i)-L_n(x'_{i-1}),e^k\Big > \end{aligned}$$

or

$$\begin{aligned} \Big <L_n(x'_i)-L_n(x'_{i-1}),e^k\Big > \le \Big <L_n(x_i)-L_n(x_{i-1}),e^k\Big > \le 0 \end{aligned}$$

for \(i=2,\ldots ,r\) and each coordinate \(k \in \{1,2,\ldots ,q\}\).

Then

$$\begin{aligned}&\left| {\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big | \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 }\right. \\&\left. -\,{\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x'_i) - L_n(x'_{i-1}), \nabla g_k^{H, \beta }(L_n( x'_{i-1})) \Big > \Big | \over \Vert L_n(\sigma _\beta )-L_n(\tau )\Vert _1 }\right| \le C'' r {\varepsilon }' / {\varepsilon }\end{aligned}$$

for a fixed constant \(C''>0\). Noticing that \(~r{\varepsilon }'/{\varepsilon }\le \delta /M~\) as \(~r \le 1/{\varepsilon }\), and taking M large enough, we obtain

$$\begin{aligned} {\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big | \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 } \le 1-\delta /4. \end{aligned}$$

Thus equation (19) will imply

$$\begin{aligned} \mathbb {E}_{\sigma , \tau }[d(X,Y)]\le & {} d(\sigma ,\tau ) \left[ 1-{1 \over n}\left( 1-{\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H, \beta }(L_n( x_{i-1})) \Big > \Big |+c \, \varepsilon ^2 \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 }\right) \right] \\\le & {} d(\sigma ,\tau ) \left[ 1 -{1 \over n}\big (1- (1-\delta /4)-\delta /20 \big ) \right] \\= & {} d(\sigma ,\tau ) \left[ 1 -{1 \over n}\delta /5 \right] \end{aligned}$$

as \(~{c \, \varepsilon ^2 \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1 } ~\le ~c \, {\varepsilon }~\le ~\delta /20~\) for \({\varepsilon }\) small enough.

Case II  Suppose \(L_n(\tau )=z\) and \(L_n(\sigma )=w\), where \(\Vert z-z_\beta \Vert _1 < {\varepsilon }\) and \(\Vert w-z_\beta \Vert _1< {\varepsilon }'\).

Similarly to (18), equation (15) implies for n large enough,

$$\begin{aligned} \mathbb {E}[d(X,Y)]\le & {} d(\sigma ,\tau ) \cdot \left[ 1-{1 \over n}\left( 1-{\Vert g^{H,\beta }\big (L_n(\sigma )\big )-g^{H,\beta }\big (L_n(\tau )\big )\Vert _1 \over \Vert L_n(\sigma )-L_n(\tau )\Vert _1} \right) \right] + O\left( {1 \over n^2}\right) \\\le & {} d(\sigma ,\tau ) \cdot \left[ 1-{\gamma \over n} \right] + O\left( {1 \over n^2}\right) \\\le & {} d(\sigma ,\tau ) \cdot \left[ 1-{\gamma \over 2n} \right] \end{aligned}$$

by Condition 8.5 (see also discussion following Condition 8.5). \(\square \)
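The constants appearing in the two cases can be checked with exact rational arithmetic. The following Python sketch is illustrative only: the numerical values of \(\delta \) and \(\gamma \) are hypothetical, and the choice \(\alpha = \min \{\delta /5, \gamma /2\}\) is one admissible constant obtained from the elementary bound \(1-x \le e^{-x}\), not necessarily the constant the authors had in mind.

```python
import math
from fractions import Fraction

def case_one_gain(delta):
    """Return 1 - (1 - delta/4) - delta/20, the Case I contraction gain."""
    return 1 - (1 - delta / 4) - delta / 20

def admissible_alpha(delta, gamma):
    """One admissible alpha: Case I yields delta/5, Case II yields gamma/2."""
    return min(delta / 5, gamma / 2)

# Hypothetical constants, chosen only for illustration.
delta, gamma = Fraction(1, 2), Fraction(1, 3)
alpha = admissible_alpha(delta, gamma)
n = 1000

# The Case I gain equals delta/5 exactly.
print(case_one_gain(delta) == delta / 5)
# 1 - alpha/n <= exp(-alpha/n), so both cases give the factor e^{-alpha/n}.
print(1 - float(alpha) / n <= math.exp(-float(alpha) / n))
```

Using `Fraction` keeps the identity \(1-(1-\delta /4)-\delta /20 = \delta /5\) free of rounding error.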

We now state and prove the main theorem of the paper that yields sufficient conditions for rapid mixing of the Glauber dynamics of the class of statistical mechanical models discussed.

Theorem 9.2

Suppose H(z) and \(\beta >0\) are such that Condition 8.4 and Condition 8.5 are satisfied. Then the mixing time of the Glauber dynamics satisfies

$$\begin{aligned} t_{\mathrm{mix}} = O(n \log n). \end{aligned}$$

Proof

Let \({\varepsilon }'>0\) and \(\alpha >0\) be as in Lemma 9.1. Let \((X^t, Y^t)\) be a coupling of the Glauber dynamics such that \(X^0 \overset{dist}{=} P_{n, \beta }\), the stationary distribution. Then, for sufficiently large n,

$$\begin{aligned}&\Vert P^t(Y^0, \cdot ) - P_{n, \beta } \Vert _{\mathrm{TV}} \\&\quad \,\, \,\, \le P \{ X^t \ne Y^t \} \\&\quad \,\, \,\, = P \{ d(X^t, Y^t ) \ge 1 \} \\&\quad \,\, \,\, \le \mathbb {E} [ d(X^t,Y^t)] \\&\quad \,\, \,\, = \mathbb {E} [ {\scriptstyle \mathbb {E}[d(X^t,Y^t)~|X^{t-1},Y^{t-1}]} ]\\&\quad \,\, \,\, \le \mathbb {E} [ {\scriptstyle \mathbb {E}[d(X^t,Y^t)~|X^{t-1},Y^{t-1}]}~|~\Vert L_n(X^{t-1}) - z_\beta \Vert _1 < \varepsilon ' ] \cdot P\{\Vert L_n(X^{t-1}) - z_\beta \Vert _1<\varepsilon ' \} \\&\quad \quad \quad +\,n P\{\Vert L_n(X^{t-1}) - z_\beta \Vert _1 \ge \varepsilon ' \} . \end{aligned}$$

By Lemma 9.1, we have

$$\begin{aligned}&\mathbb {E} \big [ {\scriptstyle \mathbb {E}[d(X^t,Y^t)~|X^{t-1},Y^{t-1}]}~|~\Vert L_n(X^{t-1}) - z_\beta \Vert _1 < \varepsilon ' \big ] \nonumber \\&\quad \le e^{-\alpha /n} \mathbb {E} \big [ d(X^{t-1},Y^{t-1})~|~\Vert L_n(X^{t-1}) - z_\beta \Vert _1 < \varepsilon ' \big ] \end{aligned}$$
(20)

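By iterating (20), and using that \(d(\cdot ,\cdot ) \le n\) and that \(X^s\) is distributed according to the stationary distribution \(P_{n, \beta }\) for every s, it follows that

$$\begin{aligned} \mathbb {E} [ d(X^t,Y^t)] \le e^{-\alpha t/n} \, \mathbb {E} [ d(X^0,Y^0)] + n \sum _{s=0}^{t-1} P\{\Vert L_n(X^s) - z_\beta \Vert _1 \ge \varepsilon ' \} \le n e^{-\alpha t/n} + n t P_{n, \beta }\{\Vert L_n(X^0) - z_\beta \Vert _1 \ge \varepsilon ' \} . \end{aligned}$$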

We recall the LDP limit (6) for \(\beta \) in the single phase region B,

$$\begin{aligned} P_{n, \beta }\{ L_n(X^0) \in dx \} \Longrightarrow \delta _{z_\beta } \quad \hbox {as } n {\rightarrow }\infty . \end{aligned}$$

Moreover, for any \(\gamma '>1\) and n sufficiently large, by the LDP upper bound (4), we have

$$\begin{aligned} \Vert P^t(Y^0, \cdot ) - P_{n, \beta } \Vert _{\mathrm{TV}}\le & {} n e^{-\alpha t/n}+n t P_{n, \beta }\{\Vert L_n(X^0) - z_\beta \Vert _1 \ge \varepsilon ' \} \\< & {} n e^{-\alpha t/n} +t n e^{-{n \over \gamma '}I_{\beta }(\varepsilon ')} . \end{aligned}$$

For \(t = {n \over \alpha } (\log n + \log (2/{\varepsilon }'))\), the above right-hand side converges to \({\varepsilon }'/2\) as \(n {\rightarrow }\infty \). \(\square \)
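The choice of t in the last step can be checked numerically. The sketch below, with hypothetical values of \(\alpha \) and \({\varepsilon }'\), evaluates \(t = {n \over \alpha }(\log n + \log (2/{\varepsilon }'))\) and the first term \(n e^{-\alpha t/n}\) of the bound; by construction this term equals \({\varepsilon }'/2\) for every n. The function names are illustrative.

```python
import math

def mixing_time(n, alpha, eps_prime):
    """t = (n / alpha) * (log n + log(2 / eps')), as in the proof of Theorem 9.2."""
    return (n / alpha) * (math.log(n) + math.log(2.0 / eps_prime))

def coupling_term(n, alpha, t):
    """First term n * exp(-alpha * t / n) of the total variation bound."""
    return n * math.exp(-alpha * t / n)

# Hypothetical constants; in the text alpha and eps' come from Lemma 9.1.
alpha, eps_prime = 0.05, 0.01
for n in (10**3, 10**4, 10**5):
    t = mixing_time(n, alpha, eps_prime)
    # The second column grows like n log n; the third stays at eps'/2.
    print(n, round(t), coupling_term(n, alpha, t))
```

The second term \(t n e^{-{n \over \gamma '}I_{\beta }(\varepsilon ')}\) is not evaluated here because it depends on the rate function, which is model specific.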

10 Aggregate Path Coupling Applied to the Generalized Potts Model

In this section, we illustrate the strength of our main result of Sect. 9, Theorem 9.2, by applying it to the generalized Curie–Weiss–Potts model (GCWP), studied recently in [10]. The classical Curie–Weiss–Potts (CWP) model, which is the mean-field version of the well-known Potts model of statistical mechanics [16], is a particular case of the GCWP model with \(r=2\). While the mixing times for the CWP model have been studied in [4], these are the first results for the mixing times of the GCWP model. Moreover, the application of our methods gives a significantly shorter derivation for the region of rapid mixing than the one used for the CWP model in [4], although the result in [4] is part of a complete analysis of the CWP model that includes cut-off.

Let q be a fixed integer and define \(\Lambda = \{ e^1, e^2, \ldots , e^q \}\), where \(e^k\) are the q standard basis vectors of \({\mathbb {R}}^q\). A configuration of the model has the form \(\omega = (\omega _1, \omega _2, \ldots , \omega _n) \in \Lambda ^n\). We will consider a configuration on a graph with n vertices and let \(X_i(\omega ) = \omega _i\) be the spin at vertex i. The random variables \(X_i\)’s for \(i=1, 2, \ldots , n\) are independent and identically distributed with common distribution \(\rho \).

For the generalized Curie–Weiss–Potts model, for \(r \ge 2\), the interaction representation function as in Definition 2.2, has the form

$$\begin{aligned} H^r(z) = - \frac{1}{r} \sum _{j=1}^q z_j^r \end{aligned}$$

and the generalized Curie–Weiss–Potts model is defined as the Gibbs measure

$$\begin{aligned} P_{n, \beta , r} (B) = \frac{1}{Z_n(\beta )} \int _B \exp \left\{ -\beta n \, H^r\left( L_n(\omega ) \right) \right\} dP_n \end{aligned}$$
(21)

where \(L_n(\omega )\) is the empirical measure defined in (1).

In [10], the authors proved that there exists a phase transition critical value \(\beta _c(q, r)\) such that in the parameter regime \((q, r) \in \{2\} \times [2,4]\), the GCWP model undergoes a continuous, second-order, phase transition and for (q, r) in the complementary regime, the GCWP model undergoes a discontinuous, first-order, phase transition. This is stated in the following theorem.

Theorem 10.1

(Generalized Ellis–Wang Theorem) Assume that \(q \ge 2\) and \(r \ge 2\). Then there exists a critical temperature \(\beta _c(q,r) > 0\) such that in the weak limit

$$\begin{aligned} \lim _{n {\rightarrow }\infty } P_{n, \beta , r} (L_n \in \cdot ) = \left\{ \begin{array}{l@{\quad }l} \delta _{1/q(1, \ldots , 1)} &{} \hbox {if} \ \beta < \beta _c(q,r) \\ \frac{1}{q} \sum _{i=1}^q \delta _{u(\beta , q, r) e^i + (1-u(\beta , q, r))/q(1, \ldots , 1)} &{} \hbox {if} \ \beta > \beta _c(q,r) \end{array} \right. \end{aligned}$$

where \(u(\beta , q, r)\) is the largest solution to the so-called mean-field equation

$$\begin{aligned} u = \frac{1 - \exp ( \Delta (u))}{1+(q-1) \exp (\Delta (u))} \end{aligned}$$

with \(\Delta (u) :=-{\beta \over q^{r-1}}\big [(1+(q-1)u)^{r-1}-(1-u)^{r-1} \big ]\). Moreover, for \((q, r) \in \{2\} \times [2,4]\), the function \(\beta \mapsto u(\beta , q, r)\) is continuous whereas, in the complementary case, the function is discontinuous at \(\beta _c(q,r)\).

For the GCWP model, the function \(g_{k}^{H, \beta }(z)\) defined in general in (11) has the form

$$\begin{aligned} g_{k}^{H, \beta }(z) = \left[ \partial _{k} \Gamma \right] (\beta \nabla H(z)) = \left[ \partial _{k} \Gamma \right] (\beta z) = {e^{\beta z_k^{r-1}} \over e^{\beta z_1^{r-1}}+ \ldots +e^{\beta z_q^{r-1}}}. \end{aligned}$$
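For numerical experiments, the right-most expression above can be evaluated as a softmax of the vector \((\beta z_1^{r-1}, \ldots , \beta z_q^{r-1})\). The following Python sketch is an illustration under that reading; the function name `g_r` and the subtraction of the largest exponent (a standard device against overflow) are choices made here and do not appear in the text.

```python
import math

def g_r(z, beta, r):
    """Return [g_1(z), ..., g_q(z)] with g_k proportional to exp(beta * z_k^(r-1))."""
    exponents = [beta * zk ** (r - 1) for zk in z]
    shift = max(exponents)                      # stabilizes exp for large beta
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# The uniform point (1/q, ..., 1/q) is mapped to itself for every beta and r.
q = 3
print(g_r([1.0 / q] * q, beta=1.5, r=3))
# A non-uniform point: larger coordinates receive larger weights.
print(g_r([0.5, 0.3, 0.2], beta=2.0, r=2.5))
```

The output always lies in the simplex \(\mathcal {P}\), since the weights are positive and normalized to sum to one.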

For the remainder of this section, we will replace the notation \(H, \beta \) and refer to \(g^{H, \beta }(z)=\big (g_1^{H, \beta }(z),\ldots ,g_q^{H, \beta }(z)\big )\) as simply \(g^r(z)=\big (g_1^r(z),\ldots ,g_q^r(z)\big )\). As we will prove next, the rapid mixing region for the GCWP model is defined by the following value.

$$\begin{aligned} \beta _s(q, r) := \sup \left\{ \beta \ge 0 : g_k^r(z) < z_k \ \hbox {for all } z \in \mathcal {P} \hbox { such that} \ z_k \in (1/q, 1] \right\} . \end{aligned}$$
(22)
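As a rough numerical illustration of definition (22), the sketch below estimates \(\beta _s(2, r)\) by bisection, restricted to \(q=2\) where the simplex is one-dimensional and, by symmetry, it suffices to test \(g_1^r(z) < z_1\) for \(z_1 \in (1/2, 1]\). The grid size, the upper bracket `hi` and all function names are assumptions of this sketch, and a grid check can only approximate the supremum. For \(r=2\) the condition reduces to \(\tanh (\beta m/2) < m\) with \(m = 2z_1-1\), so the estimate should be close to 2.

```python
import math

def g1_two_state(z1, beta, r):
    """First coordinate of g^r(z) for q = 2 and z = (z1, 1 - z1)."""
    a = beta * z1 ** (r - 1)
    b = beta * (1.0 - z1) ** (r - 1)
    return 1.0 / (1.0 + math.exp(b - a))

def contracts(beta, r, grid=2000):
    """Check g_1(z) < z_1 on a grid of z_1 in (1/2, 1] (q = 2 only)."""
    for i in range(1, grid + 1):
        z1 = 0.5 + 0.5 * i / grid
        if g1_two_state(z1, beta, r) >= z1:
            return False
    return True

def beta_s_two_state(r, hi=16.0, iters=50):
    """Bisection for the largest beta at which the grid check still passes."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if contracts(mid, r):
            lo = mid
        else:
            hi = mid
    return lo

print(beta_s_two_state(2.0))
```

Bisection is justified here because, for \(z_1 > 1/2\), the value \(g_1^r(z)\) is increasing in \(\beta \), so the set of admissible \(\beta \) is an interval starting at 0.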

Lemma 10.2

If \(\beta _c(q,r)\) is the critical value derived in [10] and defined in Theorem 10.1, then

$$\begin{aligned} \beta _s(q,r) \le \beta _c(q,r) \end{aligned}$$

Proof

We will prove this lemma by contradiction. Suppose \(\beta _c(q,r) < \beta _s(q,r)\). Then there exists \(\beta \) such that

$$\begin{aligned} \beta _c(q,r) < \beta < \beta _s(q,r). \end{aligned}$$

Then, by Theorem 10.1, since \(\beta _c(q,r) <\beta \), there exists \(u>0\) satisfying the following inequality

$$\begin{aligned} u < {1-e^{\Delta (u)} \over 1+(q-1)e^{\Delta (u)}}, \end{aligned}$$
(23)

where \(~\Delta (u):=-{\beta \over q^{r-1}}\big [(1+(q-1)u)^{r-1}-(1-u)^{r-1} \big ]\). The above inequality (23) can be rewritten as

$$\begin{aligned} e^{\Delta (u)}=\exp \left\{ \beta \left[ \left( {1-u \over q} \right) ^{r-1}-\left( {1+(q-1)u \over q} \right) ^{r-1}\right] \right\} ~<~ {1-u \over (q-1)u+1}. \end{aligned}$$
(24)

Next, we substitute \(\lambda =(1-u){q-1 \over q}\) into the above inequality (24), obtaining

$$\begin{aligned} \exp \left\{ \beta \left[ \left( {\lambda \over q-1} \right) ^{r-1}-\left( 1-\lambda \right) ^{r-1}\right] \right\} ~<~ {\lambda \over (1-\lambda )(q-1)}. \end{aligned}$$
(25)

Now, consider

$$\begin{aligned} z=\left( 1-\lambda , {\lambda \over q-1},\ldots , {\lambda \over q-1} \right) . \end{aligned}$$

Observe that \(~z_1=1-\lambda =1-(1-u){q-1 \over q}={1+u(q-1) \over q}>{1 \over q}~\) as \(u>0\). Consequently, inequality (25) can be rewritten in terms of the above selected z as follows

$$\begin{aligned} z_1=1-\lambda ~<~{e^{\beta (1-\lambda )^{r-1}} \over e^{\beta (1-\lambda )^{r-1}}+(q-1)e^{\beta \big ({\lambda \over q-1}\big )^{r-1}}}=g_1^r(z), \end{aligned}$$

thus contradicting \(\beta < \beta _s(q,r)\). Hence \(~\beta _s(q,r) \le \beta _c(q,r)\). \(\square \)
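The substitution used in the proof can be verified numerically. For \(z=\left( 1-\lambda , {\lambda \over q-1},\ldots , {\lambda \over q-1} \right) \) with \(\lambda =(1-u){q-1 \over q}\), a short computation (added here as a check, not taken from the text) gives \(g_1^r(z) - z_1 = {q-1 \over q}\left( {1-e^{\Delta (u)} \over 1+(q-1)e^{\Delta (u)}} - u \right) \), so inequality (23) holds exactly when \(g_1^r(z) > z_1\). The Python sketch below tests this identity for hypothetical parameter values; the function names are illustrative.

```python
import math

def mean_field_rhs(u, beta, q, r):
    """Right-hand side of the mean-field equation: (1 - e^D) / (1 + (q-1) e^D)."""
    delta = -(beta / q ** (r - 1)) * ((1 + (q - 1) * u) ** (r - 1) - (1 - u) ** (r - 1))
    e = math.exp(delta)
    return (1 - e) / (1 + (q - 1) * e)

def z_from_u(u, q):
    """Point z = (1 - lam, lam/(q-1), ..., lam/(q-1)) with lam = (1-u)(q-1)/q."""
    lam = (1 - u) * (q - 1) / q
    return [1 - lam] + [lam / (q - 1)] * (q - 1)

def g1(z, beta, r):
    """First coordinate of g^r(z)."""
    weights = [math.exp(beta * zk ** (r - 1)) for zk in z]
    return weights[0] / sum(weights)

# Hypothetical parameters for illustration.
q, r, beta, u = 3, 2, 3.0, 0.5
z = z_from_u(u, q)
lhs = g1(z, beta, r) - z[0]
rhs = (q - 1) / q * (mean_field_rhs(u, beta, q, r) - u)
print(lhs, rhs)   # the two numbers agree up to rounding
```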

Combining Theorem 10.1 and Lemma 10.2 yields that for parameter values (q, r) in the continuous, second-order phase transition region, \(\beta _s(q,r) = \beta _c(q,r)\), whereas in the discontinuous, first-order, phase transition region, \(\beta _s(q,r)\) is strictly less than \(\beta _c(q,r)\). This relationship between the equilibrium transition critical value and the mixing time transition critical value was also proved for the mean-field Blume–Capel model discussed in [11]. This appears to be a general distinguishing feature between models that exhibit the two distinct types of phase transition. We now prove rapid mixing for the generalized Curie–Weiss–Potts model for \(\beta < \beta _s(q,r)\) using the aggregate path coupling method derived in Sect. 9.

We state the lemmas that we prove below, and the main result for the Glauber dynamics of the generalized Curie–Weiss–Potts model, a Corollary to Theorem 9.2.

Lemma 10.3

Condition 8.3 and Condition 8.4 are satisfied for all \(\beta < \beta _s(q,r)\).

Lemma 10.4

Condition 8.5 is satisfied for all \(\beta < \beta _s(q,r)\).

Corollary 10.5

If \(\beta < \beta _s(q,r)\), then

$$\begin{aligned} t_{\mathrm{mix}} = O(n \log n). \end{aligned}$$

Proof

Condition 8.4 and Condition 8.5 required for Theorem 9.2 are satisfied by Lemma 10.3 and Lemma 10.4. \(\square \)

Proof of Lemma 10.4

Denote \(~z'=(z'_1,\ldots ,z'_q)=z-z_\beta \). Then by Taylor’s Theorem, we have

$$\begin{aligned} \limsup _{z \rightarrow z_\beta } {\Vert g^r(z)-g^r(z_\beta )\Vert _1 \over \Vert z-z_\beta \Vert _1}= & {} \limsup _{z \rightarrow z_\beta } {\sum \limits _{k=1}^q \left| {e^{\beta z_k^{r-1}} \over \sum \limits _{j=1}^q e^{\beta z_j^{r-1}}} -{1 \over q} \right| \over \sum \limits _{k=1}^q \left| z_k -{1 \over q} \right| } \nonumber \\= & {} \lim _{z' \rightarrow 0} \frac{ \sum \limits _{k=1}^q \left| {\beta (r-1) \left( {1 \over q} \right) ^{r-2} z'_k +O\big ((z'_1)^2+\ldots +(z'_q)^2\big ) \over q+O\big ((z'_1)^2+\ldots +(z'_q)^2\big ) } \right| }{ \sum \limits _{k=1}^q \left| z'_k \right| } \nonumber \\= & {} \frac{\beta (r-1)}{q^{r-1}}. \end{aligned}$$
(26)

Recall that \(\beta _s(q,r)\le \beta _c(q,r)\) was shown in Lemma 10.2, and \(\beta _c(q,r) \le {q^{r-1} \over r-1}\) was shown in the proof of Lemma 5.4 of [10]. Therefore, \(~\beta < {q^{r-1} \over r-1}\) and the last expression above is less than 1, and we conclude that

$$\begin{aligned} \limsup _{z \rightarrow z_\beta } {\Vert g^r(z)-g^r(z_\beta )\Vert _1 \over \Vert z-z_\beta \Vert _1} < 1. \end{aligned}$$

\(\square \)
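The limit computed in (26) can be observed numerically. The Python sketch below evaluates the ratio \(\Vert g^r(z)-g^r(z_\beta )\Vert _1 / \Vert z-z_\beta \Vert _1\) at points approaching \(z_\beta = (1/q, \ldots , 1/q)\) and compares it with \(\beta (r-1)/q^{r-1}\). The parameter values and the direction of approach are hypothetical, and the function names are illustrative.

```python
import math

def g_r(z, beta, r):
    """Softmax-type map: g_k(z) proportional to exp(beta * z_k^(r-1))."""
    exponents = [beta * zk ** (r - 1) for zk in z]
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def ratio(z, beta, r):
    """||g(z) - g(z_beta)||_1 / ||z - z_beta||_1, using g(z_beta) = z_beta."""
    q = len(z)
    g = g_r(z, beta, r)
    num = sum(abs(gk - 1.0 / q) for gk in g)
    den = sum(abs(zk - 1.0 / q) for zk in z)
    return num / den

# Hypothetical parameters for illustration.
q, r, beta = 3, 3, 2.0
limit = beta * (r - 1) / q ** (r - 1)
for h in (1e-2, 1e-3, 1e-4, 1e-5):
    # Perturbation of the uniform point whose coordinates sum to zero.
    z = [1.0 / q + h, 1.0 / q - 0.25 * h, 1.0 / q - 0.75 * h]
    print(h, ratio(z, beta, r), limit)
```

The printed ratios approach the limit as h decreases, in line with the first-order Taylor expansion used in the proof.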

Proof of Lemma 10.3

First, we prove that the family of straight lines connecting to the equilibrium point \(z_\beta =\left( 1/q,\ldots ,1/q\right) \) is a neo-geodesic family as it was defined following Condition 8.3. Specifically, for any \(z = (z_1, z_2, \ldots , z_q) \in \mathcal {P}\) define the line path \(\rho \) connecting z to \(z_\beta \) by

$$\begin{aligned} z(t) = {1 \over q} (1-t) + z \, t, \,\, 0 \le t \le 1 \end{aligned}$$
(27)

Then, along this straight-line path \(\rho \), the aggregate g-variation has the form

$$\begin{aligned} D_\rho ^g (z,z_\beta ) := \sum \limits _{k=1}^q\int \limits _\rho \Big | \Big <\nabla g_k^r (y), dy \Big > \Big | = \sum \limits _{k=1}^q \int _0^1 \left| \frac{d}{dt} [g_k^r (z(t))] \right| \, dt \end{aligned}$$

Next, for all \(k = 1, 2, \ldots , q\) and \(t \in [0,1]\), denote

$$\begin{aligned} z(t)_k = {1 \over q} (1-t) + z_k t \end{aligned}$$

Then

$$\begin{aligned} g_k^r(z(t)) = \frac{e^{\beta \big ((1/q)(1-t) + z_k t \big )^{r-1}}}{\sum _{j=1}^q e^{\beta \big ((1/q) (1-t) + z_j t \big )^{r-1}}} \end{aligned}$$
(28)

and

$$\begin{aligned} \frac{d}{dt} \big [g_k^r(z(t)) \big ] = \beta (r-1) g_k^r (z(t)) \left[ \left( \frac{1}{q} (1-t) + z_k t \right) ^{r-2} \left( z_k- \frac{1}{q} \right) - \langle z-z_\beta , g^r(z(t)) \rangle _\rho \right] \nonumber \\ \end{aligned}$$
(29)

where \(\langle z-z_\beta , g^r(z(t)) \rangle _\rho \) is the weighted inner product

$$\begin{aligned} \langle z-z_\beta , g^r(z(t)) \rangle _\rho := \sum _{j=1}^q g_j^r(z(t)) \left( z_j- \frac{1}{q} \right) \left( \frac{1}{q} (1-t) + z_j t \right) ^{r-2} \end{aligned}$$

Now, observe that for z(t) as in (27) with \(z \not = z_\beta \), the inner product \(\langle (z-z_\beta ), g^r(z(t)) \rangle _\rho \) is monotonically increasing in t since

$$\begin{aligned} \frac{d}{dt} \langle z-z_\beta , g^r(z(t)) \rangle _\rho \ge \beta (r-1) \, \hbox {Var}_{g^r}\left( \left( z_j-\frac{1}{q} \right) \left( \frac{1}{q} (1-t) + z_j t \right) ^{r-2} \right) > 0 \end{aligned}$$

where \(\hbox {Var}_{g^r}(\cdot )\) is the variance with respect to \(g^r\).

So \(\langle z-z_\beta , g^r(z(t)) \rangle _\rho \) begins at \(\langle z-z_\beta , g^r(z(0)) \rangle _\rho =\langle z-z_\beta , z_\beta \rangle =0\) and increases for all \(t \in (0,1)\).
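This monotonicity can be checked numerically. The Python sketch below evaluates the weighted inner product \(\sum _{j} g_j^r(z(t)) \left( z_j - {1 \over q}\right) \left( {1 \over q}(1-t) + z_j t\right) ^{r-2}\) on a grid of t values for one hypothetical choice of z, \(\beta \) and r; it is an illustration for a single point, not a substitute for the argument above.

```python
import math

def g_r(z, beta, r):
    """Softmax-type map: g_k(z) proportional to exp(beta * z_k^(r-1))."""
    exponents = [beta * zk ** (r - 1) for zk in z]
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def weighted_inner_product(z, t, beta, r):
    """sum_j g_j(z(t)) * (z_j - 1/q) * z(t)_j^(r-2) along z(t) = (1/q)(1-t) + z t."""
    q = len(z)
    point = [(1.0 / q) * (1.0 - t) + zk * t for zk in z]
    g = g_r(point, beta, r)
    return sum(gj * (zj - 1.0 / q) * pj ** (r - 2)
               for gj, zj, pj in zip(g, z, point))

# Hypothetical parameters for illustration.
z, beta, r = [0.6, 0.3, 0.1], 1.0, 3
values = [weighted_inner_product(z, i / 10, beta, r) for i in range(11)]
print(values[0], values[-1])   # starts at (numerically) zero and ends positive
```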

The above monotonicity yields the following claim about the behavior of \(g_k^r(z(t))\) along the straight-line path \(\rho \).

(a) If \(z_k \le 1/q\), then \(g_k^r(z(t))\) is monotonically decreasing in t.

(b) If \(z_k > 1/q\), then \(g_k^r(z(t))\) has at most one critical point \(t_k^*\) on (0, 1).

Claim (a) follows immediately from (29), since \(\langle z-z_\beta , g^r(z(t)) \rangle _\rho >0\) for \(t>0\). Claim (b) also follows from (29): on its right-hand side, \(z_k-1/q>0\) and \(\langle z-z_\beta , g^r(z(t)) \rangle _\rho \) is increasing. Thus there is at most one point \(t_k^*\) on (0, 1) such that \(~\frac{d}{dt} \big [g_k^r(z(t)) \big ] =0\).

Next, define

$$\begin{aligned} A_z = \{ k : z_k > 1/q \} \end{aligned}$$

Then the aggregate g-variation can be split into

$$\begin{aligned} D_\rho ^g (z,z_\beta ) = \sum _{k \in A_z} \int _0^1 \left| \frac{d}{dt} [g_k^r(z(t))] \right| \, dt + \sum _{k \notin A_z} \int _0^1 \left| \frac{d}{dt} [g_k^r(z(t))] \right| \, dt \end{aligned}$$

For \(k \notin A_z\), claim (a) implies

$$\begin{aligned} \int _0^1 \left| \frac{d}{dt} [g_k^r(z(t))] \right| \, dt = - \int _0^1 \frac{d}{dt} [g_k^r(z(t))] \, dt = g_k^r(z(0)) - g_k^r(z(1)) = \frac{1}{q} - g_k^r(z) \end{aligned}$$

For \(k \in A_z\), let \(t_k^*\) be the critical point from (b), with the convention \(t_k^* = 1\) if \(g_k^r(z(t))\) has no critical point on (0, 1). Then, we have

$$\begin{aligned} \int _0^1 \left| \frac{d}{dt} [g_k^r(z(t))] \right| \, dt= & {} \int _0^{t_k^*} \frac{d}{dt} [g_k^r(z(t))] \, dt - \int _{t_k^*}^1 \frac{d}{dt} [g_k^r(z(t))] \, dt\\= & {} 2 g_k^r(z(t_k^*)) - g_k^r(z) - \frac{1}{q} \end{aligned}$$

Combining the previous two displays, we get

$$\begin{aligned} D_\rho ^g (z,z_\beta )= & {} \sum _{k \in A_z} \left( 2 g_k^r(z(t_k^*)) - g_k^r(z) - \frac{1}{q} \right) + \sum _{k \notin A_z} \left( \frac{1}{q} - g_k^r(z) \right) \\= & {} 2 \sum _{k \in A_z} \left( g_k^r(z(t_k^*)) - \frac{1}{q} \right) \end{aligned}$$

Since \(\beta < \beta _s\) and \(k \in A_z\), we have

$$\begin{aligned} g_k^r(z(t_k^*)) < z(t_k^*)_k \le z(1)_k = z_k \end{aligned}$$

and we conclude that

$$\begin{aligned} D_\rho ^g (z,z_\beta ) < 2 \sum _{k \in A_z} \left( z_k - \frac{1}{q} \right) = \Vert z - z_\beta \Vert _1 \end{aligned}$$

Thus

$$\begin{aligned} {d_g(z,z_\beta ) \over \Vert z - z_\beta \Vert _1} \le {D_\rho ^g (z,z_\beta ) \over \Vert z - z_\beta \Vert _1} <1 \quad \text{ for } \text{ all } z \not =z_\beta \text { in } \mathcal {P}. \end{aligned}$$

Next, since we are dealing with the straight line segments \(\rho \),

$$\begin{aligned} \limsup _{z \rightarrow z_\beta }{D_\rho ^g (z,z_\beta ) \over \Vert z - z_\beta \Vert _1}=\limsup _{z \rightarrow z_\beta } {\Vert g^r(z)-g^r(z_\beta )\Vert _1 \over \Vert z-z_\beta \Vert _1} <1 \end{aligned}$$

by (26), the Mean Value Theorem, and \(H(z) \in \mathcal {C}^3\). This, in turn, guarantees the continuity required for Condition 8.3:

$$\begin{aligned} \limsup _{z \rightarrow z_\beta }{d_g(z,z_\beta ) \over \Vert z - z_\beta \Vert _1} \le \limsup _{z \rightarrow z_\beta }{D_\rho ^g (z,z_\beta ) \over \Vert z - z_\beta \Vert _1}<1 \end{aligned}$$

Thus Condition 8.3 is proved for the GCWP model. Moreover this proves that the family of straight line segments \(\rho \) is a neo-geodesic family (see definition following Condition 8.3). Indeed, there is \(\delta \in (0,1)\) such that

$$\begin{aligned} \left\{ \rho :~z(t)={1 \over q}(1-t)+zt, ~~z \in \mathcal {P} \right\} \quad \text { is an } \mathrm{NG_\delta } \text { family of smooth curves,} \end{aligned}$$

i.e. \(\forall z\not = z_\beta \) in \(\mathcal {P}\), and corresponding \(\rho :~z(t)={1 \over q}(1-t)+zt\),

$$\begin{aligned} {D_\rho ^g (z,z_\beta ) \over \Vert z - z_\beta \Vert _1} \le 1-\delta /2 \end{aligned}$$

Since the family of straight line segments \(\rho \) is a neo-geodesic family \(\mathrm{NG_\delta }\), the integrals

$$\begin{aligned} D_\rho ^g (x, z) := \sum \limits _{k=1}^q\int \limits _{\rho } \Big | \Big <\nabla g_k^r(y), dy \Big > \Big | \end{aligned}$$

can be uniformly approximated by the corresponding Riemann sums of small enough step size by the Mean Value Theorem as \(H(z) \in \mathcal {C}^3\) and therefore each \(g_k^r(z) \in \mathcal {C}^2\). That is, there exists a constant \(C>0\) that depends on the second partial derivatives of \(g^r(z)=\big (g_1^r(z),\ldots ,g_q^r(z)\big )\), such that for \(\varepsilon >0\) small enough, the curve \(\rho ={1 \over q}(1-t)+zt\) in the family \(\mathrm{NG_\delta }\) that connects \(z_\beta \) to z satisfies

$$\begin{aligned} \left| \sum _{k=1}^q \sum _{i=1}^r \Big | \Big <z_i - z_{i-1}, \nabla g_k^r(z_{i-1}) \Big > \Big |-D_\rho ^g (z,z_\beta ) \right| < C r {\varepsilon }^2 \qquad \forall z \in \mathcal {P} \text { s.t. } \Vert z-z_\beta \Vert _1 \ge \varepsilon \end{aligned}$$

for a sequence of points \(z_0=z_\beta ,z_1,\ldots ,z_r=z \in \mathcal {P}\) interpolating \(\rho \) such that

$$\begin{aligned} \varepsilon \le \Vert z_i - z_{i-1}\Vert _1 < 2\varepsilon \quad \text { for } i=1,2,\ldots , r. \end{aligned}$$

Hence

$$\begin{aligned} {\sum \limits _{k=1}^q \sum \limits _{i=1}^r \Big | \Big <z_i - z_{i-1}, \nabla g_k^r(z_{i-1}) \Big > \Big | \over \Vert z-z_\beta \Vert _1} \le 1-\delta /2+C{\varepsilon }\le 1-\delta /3 \end{aligned}$$

for \({\varepsilon }\le \delta /(6C)\). This concludes the proof of Condition 8.4. \(\square \)
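The two quantities compared in this proof can be computed numerically along a straight-line path. The Python sketch below approximates \(D_\rho ^g(z,z_\beta )\) by the total variation of \(t \mapsto g^r(z(t))\) on a fine grid, and evaluates the Riemann sum \(\sum _k \sum _i |\langle z_i - z_{i-1}, \nabla g_k^r(z_{i-1}) \rangle |\) with equally spaced interpolation points, using \(\partial g_k^r / \partial z_j = \beta (r-1) z_j^{r-2} g_k^r (\delta _{kj} - g_j^r)\). All parameter values are hypothetical; in particular \(\beta = 1\) with \(q=3\), \(r=2\) is assumed to lie below \(\beta _s(q,r)\), and the name `pieces` stands for the number of interpolation steps (denoted r in Condition 8.4).

```python
import math

def g_r(z, beta, r):
    """Softmax-type map: g_k(z) proportional to exp(beta * z_k^(r-1))."""
    exponents = [beta * zk ** (r - 1) for zk in z]
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def line_point(z, t):
    """Point z(t) = (1/q)(1 - t) + z t on the segment from z_beta to z."""
    q = len(z)
    return [(1.0 / q) * (1.0 - t) + zk * t for zk in z]

def aggregate_variation(z, beta, r, steps=2000):
    """Approximate D_rho^g(z, z_beta) by the total variation of t -> g^r(z(t))."""
    total, prev = 0.0, g_r(line_point(z, 0.0), beta, r)
    for i in range(1, steps + 1):
        cur = g_r(line_point(z, i / steps), beta, r)
        total += sum(abs(c - p) for c, p in zip(cur, prev))
        prev = cur
    return total

def riemann_sum(z, beta, r, pieces):
    """Sum over k and i of |<z_i - z_{i-1}, grad g_k(z_{i-1})>| with equal steps."""
    q = len(z)
    step = [(zk - 1.0 / q) / pieces for zk in z]
    total = 0.0
    for i in range(pieces):
        point = line_point(z, i / pieces)
        g = g_r(point, beta, r)
        # a_j = beta (r-1) z_j^(r-2) * step_j; then <step, grad g_k> = g_k (a_k - sum_j g_j a_j).
        a = [beta * (r - 1) * pj ** (r - 2) * sj for pj, sj in zip(point, step)]
        mean_a = sum(gj * aj for gj, aj in zip(g, a))
        total += sum(abs(gk * (ak - mean_a)) for gk, ak in zip(g, a))
    return total

# Hypothetical parameters: q = 3, r = 2 and a small beta.
z, beta, r = [0.6, 0.3, 0.1], 1.0, 2
l1 = sum(abs(zk - 1.0 / len(z)) for zk in z)
print(aggregate_variation(z, beta, r) / l1, riemann_sum(z, beta, r, pieces=50) / l1)
```

Both printed ratios are below 1 for this choice of parameters, and they differ only by the discretization error of the Riemann sum.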

Finally, exponentially slow mixing in the region \(\beta > \beta _s(q,r)\) can be shown using the standard bottleneck ratio approach, similarly to Sect. 7 in [11].