1 Introduction and Results

We study the dynamics of the Potts model on the complete graph (mean-field) known as the Curie-Weiss Potts model. For n≥1, β≥0, the Curie-Weiss Potts distribution is a probability measure on Σ n =Q V where Q={1,…,q} and V={1,…,n}, defined by

where σΣ n and Z β,n is the normalizing constant. When q=2 this is the classic Ising model while in this paper we will focus on the case q≥3 for an integer q (for an extension to non-integer q via the random cluster model, see e.g. [23]). We use the standard notation β c (q) for the (explicitly known) threshold value between the ordered and the disordered phases (see [16]).

Throughout the paper (σ t ) t≥0 will denote the discrete time Glauber dynamics for this model, namely, starting from σ 0, at each step we choose a vertex uV uniformly and set

$$\sigma_{t+1}(v) = \left\{ \everymath{\displaystyle }\begin{array}{l@{\quad }l} \sigma_t(v) & \mbox {if } v \neq u, \cr\noalign{\vspace{3pt}} k & \mbox {with probability } \mu_n (\sigma(u)=k | \sigma(w)=\sigma _t(w) \ \forall w \neq u )\mbox{ if } v=u. \end{array} \right. $$

We denote by P n the transition kernel for this Markov process and by \(\mathbb {P}_{\sigma_{0}}\) the underlying probability measure. We will measure the distance between the distribution of the chain at time t and its stationary distribution μ n via the total-variation norm. Accordingly,

$$d^{\sigma_0}_t(n) = \bigl \|\mathbb {P}_{\sigma_0} (\sigma_t \in \cdot )- \mu_n\bigr \|_{\mathrm {TV}} \quad \mbox {and}\quad d_t(n) = \max_{\sigma_0 \in \varSigma _n} d^{\sigma_0}_t(n) . $$

For ϵ∈(0,1), the ϵ-mixing time is the number of steps until the total-variation distance to stationarity is at most ϵ in the worst case, i.e.:

and by convention we set \(t_{{ \scriptsize \mbox {\textsc {mix}}}}(n) := t_{{ \scriptsize \mbox {\textsc {mix}}}(1/4)}(n)\). If for any fixed ϵ∈(0,1)

$$w_{\epsilon }(n) \triangleq t_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n) - t_{{ \scriptsize \mbox {\textsc {mix}}}(1-\epsilon )}(n) = o\bigl(t_{{\scriptsize \mbox {\textsc {mix}}}(1/4)}(n)\bigr) \quad \mbox {as } n \to\infty, $$

we say that the family of Markov chains exhibits the cutoff phenomenon, which describes a sharp drop in the total variation distance from close to 1 to close to 0 (in an interval of time of smaller order than \(t_{{ \scriptsize \mbox {\textsc {mix}}}}(n)\) denoted as the cutoff window). Observe that cutoff occurs if and only if \(t_{{ \scriptsize \mbox {\textsc {mix}}}(\delta)}(n) / t_{{\scriptsize \mbox {\textsc {mix}}}(\epsilon)}(n) \to1\) as n→∞ for any fixed δ,ϵ∈(0,1).

1.1 Results

We show that the dynamics for the Curie-Weiss Potts model undergoes a critical slowdown at an inverse-temperature β s (q)>0. This threshold is given by

(1.1)

Unlike mean-field Ising, for which β s (2)=β c (2)=1, the dynamical transition for q≥3 occurs at a strictly higher temperature than the static phase transition, i.e., β s (q)<β c (q).

Our first result addresses the regime β<β s (q), where rapid mixing occurs within O(nlogn) steps and the dynamics exhibits cutoff with a window of size O(n) (see Fig. 1).

Fig. 1
figure 1

Rapid mixing with cutoff in the subcritical regime of β<β s (q) for q=3. Blue curve marks the magnetization vector of the Glauber dynamics along time (Color figure online)

Theorem 1

Let q≥3 be an integer. If β<β s (q) then the Glauber dynamics for the q-state Curie-Weiss Potts model exhibits cutoff at mixing time

$$ t_{{ \scriptsize \mbox {\textsc {mix}}}}(n) = \alpha _1(\beta , q) n \log n $$
(1.2)

with cutoff window w ϵ (n)=O ϵ (n) where α 1(β,q)=[2(1−2β/q)]−1.

We proceed to analyze the order of the mixing time as β(n)→β s (q) as n→∞,

$$ \beta (n) = \beta _s(q) - \xi(n) $$
(1.3)

where ξ(n)→0 as n→∞. The asymptotics of the mixing time will, of course, depend on how fast ξ decays, but it turns out that cutoff is observed only iff the decay is slow enough. This is captured in the following theorem.

Theorem 2

Let q≥3 be an integer. With β(n) given as in Eq. (1.3) we have:

  1. (1)

    If lim n→∞ n 2/3 ξ(n)=∞ then the Glauber dynamics has cutoff with mixing time and cutoff window given by

    $$ \begin{array}{llll} t_{{ \scriptsize \mbox {\textsc {mix}}}}(n) &=& \alpha _1\bigl(\beta (n), q\bigr) n \log n + \alpha _2(q) n/\sqrt{\xi(n)} ,\cr\noalign{\vspace{3pt}} w_{\epsilon }(n) &=& O_{\epsilon }\big( n + \sqrt{n/\xi(n)^{5/2}} \,\big) , \end{array} $$
    (1.4)

    where α 2(q) is a positive constant and α 1 is the constant defined in Theorem 1.

  2. (2)

    If 0≤lim inf n→∞ n 2/3 ξ(n)≤lim sup n→∞ n 2/3 ξ(n)<∞ then the dynamics does not exhibit cutoff and has mixing time

    $$ t_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n) = \varTheta_{\epsilon } \big( n^{4/3} \big). $$
    (1.5)

Part (2) of Theorem 2 in particular applies at criticality β=β s (q) where the mixing time is of order n 4/3 with a scaling window of order n −2/3 (in contrast, the mixing time for the critical Ising model is of order n 3/2 with a window of \(\sqrt {n}\) ).

Finally, above β s (q) the mixing time is exponentially large in n, as depicted in Fig. 2.

Fig. 2
figure 2

Slow mixing without cutoff in the supercritical regime β>β s for q=3. On left β s <β<β c and on right β>β c . Curve color marks time from blue to red (Color figure online)

Theorem 3

Let q≥3 be an integer, and fix β>β s (q). For every 0<ϵ<1 there exist C 1,C 2>0 such that for all n,

$$t_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n) \geq C_1 \exp(C_2 n). $$

Combined these results give a complete analysis of the mixing time of Glauber dynamics for the Curie-Weiss Potts model.

The slowdown in the mixing of the dynamics occurring as soon as ββ s (q) (be it power-law at β s (q) or exponential mixing above this point) is due to the existence of states from which the Markov chain takes a long time to escape. However, in the range β∈[β s (q),β c (q)) the subset of initial configurations from which mixing is slow is exponentially small in probability. One can then ask instead about the mixing time from typical starting locations, known as essential mixing. Define the mixing time started from a subset of initial configurations \(\widetilde {\varSigma }_{n} \subseteq \varSigma _{n}\) via \(d^{\widetilde {\varSigma }_{n}}_{t}(n) = \max_{\sigma\in \widetilde {\varSigma }_{n}} d^{\sigma}_{t}(n)\) as well as

With these definitions we have the following result, showing that the subcritical mixing time behavior from Theorem 1 extends all the way to β<β c (q) once one excludes a subset of initial configurations with a total mass that is exponentially small in n.

Theorem 4

Let q≥3 be an integer and let β<β c (q). There exist constants C 1,C 2>0 and subsets \(\widetilde {\varSigma }_{n} \subseteq \varSigma _{n}\) such that the Glauber dynamics has cutoff with mixing time and cutoff window given by

$$t^{\widetilde {\varSigma }_n}_{{ \scriptsize \mbox {\textsc {mix}}}}(n) = \alpha _1(\beta , q) n \log n ; \qquad w^{\widetilde {\varSigma }_n}_{\epsilon }(n) = O_{\epsilon } (n), $$

where \(\mu_{n} (\varSigma _{n} \setminus \widetilde {\varSigma }_{n}) \leq C_{1} \mathrm{e}^{-C_{2} n}\) and α 1 is the constant in Theorem 1.

1.2 Related Work

Through several decades of work by mathematicians, physicists and computer scientists a general picture of how the mixing time varies with the temperature has been developed. It is believed that in a wide class of models and geometries the mixing time undergoes the following “critical slowdown”. For some critical inverse-temperature β d and a geometric parameter L(n), where n is the size of the system, we should have:

  • High temperature (0≤β<β d ): mixing time of order nlogn with cutoff.

  • Critical temperature (β=β d ): mixing time of order nL(n)z for some fixed z>0.

  • Low temperature (β>β d ): mixing time of order exp(τ β L(n)) for some fixed τ β >0.

For a more comprehensive description of critical slowdown see [19, 30, 33]. It should be noted that to demonstrate the above phenomenon in full, one needs to derive precise estimates on the mixing time up to the critical temperature, which can be quite challenging.

Perhaps the most studied model in this context is Ising. For the complete graph, a comprehensive treatment is given in [17, 18, 28], where critical slowdown (as described above) around the uniqueness threshold β c is established in full. In this setting, finer statements about the asymptotics of the mixing time can be made. For instance, in [17] the case where β approaches β c with the size of the system is analyzed (in Theorem 2 here we consider this case as well). The same picture, yet with the notable exclusion of a cutoff proof at high temperatures, is also known on the d-regular tree where [3] established the high and low temperature regimes and recently [19] proved polynomial mixing at criticality.

From a mathematical physics point of view, the most interesting underlying graph to consider is the lattice ℤd. For d=2 the full critical slowdown is now known: for a box with n vertices the mixing time is O(nlogn) throughout the high temperature regime [34, 35] whereas it is exp((τ β +o(1))n) throughout the low temperature regime [13, 14, 39] with τ β being the surface tension. The d=2 picture was very recently completed by two of the authors establishing cutoff in the high temperature regime [32] and polynomial mixing at the critical temperature [30]. For a more comprehensive survey of recent literature for Ising on the lattice see [30].

Turning back to the Potts model, understanding the kinetic picture here is of interest, not just as an extension of the results for Ising, but as an example of a model with a first order phase transition. Unlike in Ising, the free energy in the Potts model on various graphs and values of q undergoes a first order phase transition as the temperature is varied. This is certainly true for all q≥3 in the mean-field approximation, i.e. on the complete graph as treated here, but also known to be the case on ℤd for d≥2 and q>Q(d) for some Q(d)<∞ [23] (although most values of Q(d) are not known rigorously, it was shown that Q(2)=4 [2] and Q(d)<3 for all d large enough [6]).

A first order phase transition has direct implications on the dynamics of the system. For one, the coexistence of phases at criticality, implies slow mixing. This is because getting from one phase to another requires passing through a large free energy barrier, i.e. states which are exponentially unlikely. Indeed, in [10, 11] the mixing time for Potts on a box with n vertices in ℤd for any fixed d≥2 and sufficiently large q is shown to be exponential in the surface area of the box for any β larger or (notably) equal to the uniqueness threshold β c (d,q). This should be compared with the aforementioned polynomial mixing of Glauber dynamics for Ising at criticality. In fact, coexistence of the ordered and disordered phases also accounts for the slow mixing of the Swendsen-Wang dynamics at the critical temperature. This is shown in [10, 11] for ℤd under a similar range of d and q and in [22] for the complete graph. Other dynamics also exhibit slow mixing at criticality [4].

First order phase transitions are expected to lead to metastability type phenomena on the lattice in some instances. There has been extensive work on this topic (see [5, 12] and the references therein) yet the picture remains incomplete. It is expected that the transition to equilibrium will be carried through a nucleation process, which has an O(1) lifetime and therefore does not affect the order of the mixing time in contrast to the mean-field case. This is affirmed, for instance, in Ising where O(nlogn) mixing time is known for low enough temperatures under an (arbitrarily small) non-zero external field, despite the first order phase transition (in the field) around 0. For related works see e.g. [7, 8, 15, 36, 37] as well as [33] and the references there.

Similarly, the Potts model on the lattice should feature rapid mixing of O(nlogn) throughout the sub-critical regime β<β c (d,q) due to the vanishing surface-area-to-volume ratio. Thus, contrary to the critical slowdown picture predicted for Ising, whenever there is a first order phase transition it should be accompanied by a sharp transition from fast mixing at β<β c to an exponentially slow mixing at β c in lieu of a critical power law. However, fast mixing of the Potts model on ℤd for β<β c is not rigorously known except at very high temperatures (where it follows from standard arguments [33]). For sufficiently high temperatures, cutoff was very recently shown in [31].

On the complete graph however, metastability is apparent. In the absence of geometry, the order parameter sufficiently characterizes the state of the system and thus the dynamics and its stationary distribution are described effectively by the free energy of the system as a function of the order parameter. While coexistence of phases implies that at criticality the free energy is minimized at more than one value of the order parameter (corresponding to each phase), continuity entails that some of these global minima will turn into local minima just below or above criticality. These local minimizers correspond exactly to the metastable states and the value (or curve) of the thermodynamic parameter (e.g. temperature) beyond which these local minima cease to appear is called spinodal.

Consider the model at hand, namely Potts on the complete graph (for general background on the Potts model, see e.g. [9, 21, 23]). The order parameter here, analogous to the magnetization in the Ising model, is the vector of proportions of each color . It is well known that there exists β c (q), below which the Potts distribution μ n is supported almost entirely on configurations with roughly equal (about 1/q) proportions of each color and above which μ n is supported almost entirely on configurations where one of the q colors is dominant. In the former case we say that in equilibrium the system is in the disordered phase, while in the latter case we say that in equilibrium the system is in one of the q ordered phases, corresponding to the q-colors.

Up to relabeling of the vertices configurations are essentially described by the proportions vector s and as such, on a logarithmic scale, the Potts distribution can be read from the graph of the free energy as a function of s. As depicted in Fig. 4 (showing q=3, the situation is qualitatively the same for all q>2), when β<β c the free energy has a single global minimum at the center, corresponding to the disordered phase, while for β>β c there are q “on-axis” global minima, corresponding to the q ordered phases obtainable from one another through a permutation of the coordinates. At β c , coexistence of the ordered and disordered phases is evident in the presence of q+1 global minima of the free energy. For more details see, e.g., [16].

Below β c but sufficiently close to it, the free energy, globally minimized only at the center, has q local minima in place of the q global minima which corresponded to the ordered phases at criticality. Once β is too small, these local minima disappear. The threshold value for the appearance of these local minima is the spinodal inverse temperature β s (there is a similar behavior above β c marked by a second spinodal temperature β S , as illustrated by Fig. 4, but we do not address this regime in the paper).

Once the system starts from an initial configuration whose proportions vector is close to a local minimizer, the system will spend a time which is exponential in n near this minimizer before escaping to the global minimizer and reaching equilibrium. This is because away from a local minimizer, energy increases locally exponentially (in n), i.e. there is an energy barrier of an exponential order to cross. Thus, as n→∞ the system will spend an unbounded amount of time at a non-equilibrium state, which will be seemingly stable. In terms of the mixing time of the dynamics, as the definition involves the worst case initial configuration, metastable states will result in exponentially slow mixing.

Our result is a rigorous affirmation of this picture. Although the definition of β s in (1.1) seems different than the one given above for the spinodal inverse temperature, it can be shown, in fact, that this is indeed the threshold value of β for the emergence of local minima below β c . Theorem 3 then asserts that above β s mixing is exponentially slow while Theorem 1 shows that below β s mixing is still fast. The set of configurations whose exclusion in Theorem 4 leads to fast mixing all the way up to (but below) β c are precisely the ones from which the process will get stuck in a metastable state. Indeed as the free energy of such initial configurations is higher than that of configurations near the globally minimizing stable state, such configurations will have a probability which is exponentially small in the size of the system.

Furthermore, the transition from fast to slow mixing passes through polynomial mixing which occurs at β s and in its vicinity (Theorem 2). This in fact establishes that the aforementioned critical slowdown phenomenon occurs here as well, albeit at the spinodal rather than the uniqueness threshold. We predict that this should be the case for the dynamical behavior on other mean-field geometries such as an Erdős-Rényi random graph or a random regular graph.

For a (non-rigorous) treatment of metastability and its effect on the rate of convergence to equilibrium in other mean-field models with a first order phase transition, see for example [24, 25]. A rigorous analysis of such a system (the Blume-Capel model), below and above criticality, was recently carried out in [26]. It is illuminating to contrast the graph of the free energy as a function of the proportions vector in the Potts model to that of the free energy as a function of the magnetization in Curie-Weiss Ising, given in Fig. 5. The second order phase transition and lack of phase coexistence at the critical temperature, implies the absence of a local minima at any value below or above β c . As a consequence there is no spinodal temperature and mixing is fast throughout the whole range β<β c .

1.3 Proof Ideas

As discussed before, up to a permutation of the vertices, configurations can be described by their proportions vector. Formally, for a configuration σΣ n , we denote by S(σ) the q-dimensional vector (S 1(σ),…,S q(σ)), where

Note that \(S(\sigma) \in \mathcal {S}\) where . Now it is not difficult to see that S t =S(σ t ) is itself a Markov process with state space \(\mathcal {S}\) and stationary distribution π n =μ n S −1, the distribution of S(σ) under μ n . We shall refer to this process as the proportions chain. Figures 1 and 2 show a realization of the proportion chains superimposed on the free energy graph plotted upside down for better visibility. 3 different values of β, corresponding to 3 different regimes are exhibited. The color of the curve, representing time, shows the temporal evolution of the proportion chain. Notice how local minima (shown as local maxima) “trap” the chain for a long time.

As a projection of the chain, S t mixes at least as fast as σ t . Moreover, when starting in one of the q configurations where all sites have the same color, a symmetry argument reveals that the mixing time of S t is equal to that of σ t . One therefore must control the effect of starting from an initial state which is not monochrome. Using a coupling argument we show that the difference in the mixing times is of the same order as the cutoff window for S t and can thus be absorbed into our error estimates. It will then suffice to analyze the proportions chain, which is of a lesser complexity than the original process. In particular the state space of S t has a fixed q−1 dimension, independently of n, and its transition probabilities can be easily calculated.

Next, we show that when β<β c (q) most of the mass of the stationary distribution π n is concentrated on balanced states whose distance from the “equi-proportionality” vector (1/q,…,1/q) is \(O(1/\sqrt{n}\,)\). A simple coupling argument then shows that the Markov chain is mixed soon after arriving at such a state. Thus the main effort in the proof becomes finding sharp estimates on the time is takes for S t to reach a balanced state from a worst-case initial configuration, in different regimes of β.

It turns out that these hitting times are determined by the function D β (x), which is defined as (up to a multiplication by 1/n) the drift of one coordinate of S t when that coordinate has value x in the worst case, i.e. the minimum drift towards 1/q, where the minimum is taken over all possible values for the remaining coordinates. An explicit formula for D β (x) can be obtained (3.2). Its graph is plotted in Fig. 3 for x∈[1/q,1] and different values of β. For ββ c , this drift is strictly negative in (1/q,1] and thus each coordinate quickly (in O(nlogn) time) gets to within \(1/\sqrt{n}\) of 1/q. On the other hand, the function D β (x) is monotone increasing in β and therefore for sufficiently large β, it will no longer be negative throughout (1/q,1]—there will be an interval in (1/q,1] where it is positive. Such an interval will take an exponential amount of time to traverse and this will lead to an exponential mixing time. The smallest β for which this happens is, by definition, β s . This β s in turn coincides with the inverse temperature at which local minima begin to appear in the free energy as a function of s. In fact, to show exponentially slow mixing, we use standard conductance arguments, which in face of local minima in the free energy give exponential mixing quite automatically.

Fig. 3
figure 3

Minimal drift towards 1/q of a single coordinate of S t as a function of its value for the Curie-Weiss Potts model with q=3 and different values of β. Two lowest curves correspond to β<β s , green middle curve has β=β s (with points marking its two local extrema), second from top curve has β s <β<β c and top curve has β=β c (Color figure online)

Fig. 4
figure 4

The free energy as a function of the proportions vector s for the Potts model on the complete graph with q=3 and large n. The simplex is mapped into the XY plane via (s 1,s 2,s 3)↦(s 1,s 2,1−s 1s 2)

Fig. 5
figure 5

The free energy as a function of the magnetization m for the Ising model on the complete graph for large n. No phase coexistence at β c and the global maximizers for β>β c are seen to emerge continuously from m=0.5

The most delicate analysis is in the critical regime where β is near or equal to β s . In this case the x-axis is tangential to the graph of D β (x) at its peak (the green curve in Fig. 3) and the challenge is in finding the asymptotics of the passage time through the tangential point on the x-axis (left green dot in the figure). As the drift there is 0, locally around this point, a coordinate of S t behaves as a random walk and Doob’s decomposition of a suitably chosen function of the coordinate yields the right passage time estimates.

1.4 Organization

Section 2 sets notation and contains some useful facts on the Curie-Weiss Potts model, as well as tools (and a few non-standard variations on them) needed in the analysis of mixing time. In Sect. 3 we derive basic properties of the proportions chain that will be repeatedly used in the remainder of the proof. In Sect. 4 we analyze the case β<β s (q) and prove Theorem 1 while Sect. 5 treats the case β>β s (q) and establishes Theorem 3. The near critical regime is analyzed in Sect. 6, which includes the proof of Theorem 2. The final section, Sect. 7, gives the proof of Theorem 4.

2 Preliminaries

2.1 Notation

We let [a,b] denote the set {a,…,b} for a,b∈ℤ. We use the same notation for vector and scalar valued variables. For an m-dimensional vector s, we denote by s k its k-th component and for I=(i 1,…,i k )⊆[1,m], \(s^{I} = (s^{i_{1}}, \dots, s^{i_{k}})\). Matrix-valued variables will appear in bold. We let W m,k denote the (m,k) element of W and let W m denote is its m-th row.

We write e i for the unit vector in the i-th direction and \(\widehat {\mathsf {e}}= (1/q, 1/q, \dots, 1/q) \in \mathbb {R}^{q}\) for the equiproportionality vector. For s∈ℝq, we denote \(\widehat {s} = s - \widehat {e}\).

Most of our vectors will live on the simplex or even \(\mathcal {S}_{n} = \mathcal {S}\cap \frac {1}{n} \mathbb {Z}^{q}\). Occasionally we would like to further limit this set and for ρ>0 we define

Note that \(\mathcal {S}^{\rho+} \subseteq \mathcal {S}^{(q-1)\rho}\) and similar relations hold for \(\mathcal {S}_{n}\) and Σ n .

Vectors in \(\mathcal {S}\) will often be viewed also as distributions on [1,q]. A coupling of ν, \(\widetilde {\nu} \in \mathcal {S}\) is the joint distribution of two random variables X, \(\widetilde {X}\), defined on the same probability space and marginally distributed according to ν, \(\widetilde {\nu}\). If ℙ is the underlying probability measure then we always have \(\|\nu-\widetilde {\nu}\|_{\mathrm {TV}} \leq \mathbb {P}^{\star}(X \neq \widetilde {X})\). We shall call this coupling a best coupling if it satisfies \(\|\nu-\widetilde {\nu}\|_{\mathrm {TV}} = \mathbb {P}^{\star}(X \neq \widetilde {X})\). Such a coupling always exists.

In the course of the proofs, we introduce various couplings of two copies of the Glauber dynamics (σ t ) t . For the second copy we shall use the notation \(\widetilde {\sigma}_{t}\) and \(\widetilde {S}_{t} = S(\widetilde {\sigma}_{t})\). Couplings will be identified by their underlying probability measure, for which we will use the symbol ℙ with a superscript that changes from coupling to coupling, e.g. ℙBC. A subscript will indicate initial state or states, e.g. \(\mathbb {P}^{BC}_{\sigma_{0}, \widetilde {\sigma }_{0}}\). The expectation and variance, \(\mathbb {E}[\cdot]\) and \(\operatorname {\mathbb {V}\mathrm {ar}}(\cdot)\) resp., will be decorated in the same way as the underlying measure with respect to which they are defined. The σ-algebra \(\mathcal {F}_{t}\) will always include all the randomness up-to time t. For example, with a single chain (σ t ) t this is the σ-algebra generated by {σ s :st}, for a coupling of two chains (σ t ) t , \((\widetilde {\sigma}_{t})_{t}\), it is the one generated by , etc.

2.2 Large Deviations Results for the Curie-Weiss Potts Distribution

In this subsection we recall several results concerning the concentration of the proportions vector measures π n . See, e.g., [16, 20] for proofs of these results.

It is a consequence of Sanov’s Theorem together with an application of Varadhan’s Lemma that the sequence (π n ) n≥1 satisfies a large deviation principle (LDP) on \(\mathcal {S}\) with rate function

$$ I_{\beta , q}(s) = \sum_{k=1}^q s^k \log\bigl(q s^k\bigr) - \beta \| s \|_{2}^2 - C, $$
(2.1)

where C is chosen so that \(\min_{s \in \mathcal {S}} I_{\beta , q}(s) = 0\). The minimizing set ε β,q ={s:I β,q (s)=0} which is the support of the weak limit \(\mathfrak {p}_{\beta , q}\) of (π n ) n≥1 is then

$$ \varepsilon _{\beta , q} = \left\{ \everymath{\displaystyle }\begin{array}{l@{\quad }l} \{ \widehat {\mathsf {e}}\}& \mbox {if } \beta < \beta _c(q) \cr\noalign{\vspace{3pt}} \{ \widehat {\mathsf {e}}, \mathrm {T}^1 \check{s}_{\beta _c, q}, \mathrm {T}^2 \check{s}_{\beta _c, q}, \dots, \mathrm {T}^q \check{s}_{\beta _c, q} \}& \mbox {if } \beta = \beta _c(q) \cr\noalign{\vspace{3pt}} \{\mathrm {T}^1 \check{s}_{\beta , q}, \mathrm {T}^2 \check{s}_{\beta , q}, \dots, \mathrm {T}^q \check{s}_{\beta , q} \}& \mbox {if } \beta > \beta _c(q) \end{array} \right., $$
(2.2)

where

$$ \beta _c(q)=\frac{(q-1)\log(q-1)}{q-2} $$
(2.3)

and

$$ \check{s}_{\beta , q} = \biggl (\check{s}_{\beta , q}^1, \frac{1 - \check{s}_{\beta , q}^1}{q-1}, \dots, \frac{1 - \check {s}_{\beta , q}^1}{q-1} \biggr ). $$
(2.4)

The function \(\beta \mapsto\check{s}_{\beta , q}^{1}\) is continuous and increasing on [β c (q),∞) and interchanges the 1-st and k-th coordinates. Furthermore, the value of \(\check{s}_{\beta , q}\) for all β,q is known in implicit form and in particular for β=β c (q), we have

$$ \check{s}_{\beta _c(q), q} = \biggl (1-\frac {1}{q}, \frac {1}{q(q-1)}, \dots, \frac {1}{q(q-1)} \biggr ). $$
(2.5)

This is true for all q≥3. For q=2, (2.1), (2.2), (2.4), (2.5) still hold, but the critical inverse-temperature is now

$$ \beta _c(2) = 1. $$
(2.6)

It is here that a fundamental difference between q=2 and q>2 can be observed. If q=2 then \(\check{s}_{\beta _{c}(2), 2} = \widehat {\mathsf {e}}\), in which case \(\beta \mapsto \mathfrak {p}_{\beta , 2}\) is continuous for all β≥0. On the other hand, if q≥3 we have \(\check{s}_{\beta _{c}(q), q} \neq \widehat {\mathsf {e}}\) and \(\beta \mapsto \mathfrak {p}_{\beta , q}\) is discontinuous at β c (q). Thus, as it is recorded in the Physics literature, the system exhibits a first order phase transition if q≥3, but only a second order phase transition if q=2.

2.3 Hitting Time Estimates for General Supermartingales

We will require some standard hitting time estimates for supermartingales and related processes.

Lemma 2.1

For x 0∈ℝ, let (X t ) t≥0 be a discrete time process, adapted to \((\mathcal {F}_{t})_{t \geq0}\) which satisfies

  1. (1)

    \(\exists \delta \geq0{:}\ \mathbb {E}_{x_{0}} [X_{t+1} - X_{t} |\mathcal {F}_{t} ]\leq-\delta \) on {X t ≥0} for all t≥0.

  2. (2)

    R>0: |X t+1X t |≤R, ∀t≥0.

  3. (3)

    X 0=x 0,

where \(\mathbb {P}_{x_{0}}\) is the underlying probability measure. Let and . The following holds.

  1. (1)

    If δ>0 then for any t 1≥0:

    $$ \mathbb {P}_{x_0} \bigl (\tau_0^- > t_1 \bigr )\leq \exp \biggl \{-\frac{(\delta t_1 - x_0 )^2}{8 t_1 R^2} \biggr \}. $$
    (2.7)
  2. (2)

    If x 0≤0, then for any x 1>0 and t 2≥0,

    $$ \mathbb {P}_{x_0} \bigl (\tau_{x_1}^+ \leq t_2 \bigr )\leq2 \exp \biggl \{-\frac {(x_1-R)^2}{8t_2 R^2} \biggr \}. $$
    (2.8)
  3. (3)

    If x 0≤0, δ>0, then for any x 1>0 and t 3≥0:

    $$ \mathbb {P}_{x_0} \bigl (\tau_{x_1}^+ \leq t_3 \bigr )\leq t_3^2 \exp \biggl \{-\frac{(x_1-R) \delta ^2}{8 R^3} \biggr \}. $$
    (2.9)

Proof

Starting with (1), if x 0<0, there is nothing to prove. Otherwise, let Z t be a supermartingale independent of X t , which starts from 0, has drift −δ and steps which are bounded by R. Set \(Y_{t} = X_{t \land \tau_{0}^{-}} + Z_{(t -\tau _{0}^{-})^{+}}\) and write Y t =M t +A t as its Doob Decomposition, with M t a martingale, A t a predictable processes and A 0=x 0. Clearly A t x 0δt and |M t+1M t |≤2R, \(\mathbb {P}_{x_{0}}\)-a.s. The Hoeffding-Azuma inequality implies:

as desired.

Now set \(Y_{t} = X_{t \land \tau_{x_{1}}^{+}}\) and observe that (Y t ) t≥0 satisfies conditions 1–3, and consequently it is enough to prove (2.8), with \(\mathbb {P}_{x_{0}} (Y_{t_{2}} > x_{1} )\) as the LHS. Therefore, for all t, let \(W_{t+1} = \mathbb {E}_{x_{0}} [Y_{t+1} - Y_{t} |\mathcal {F}_{t} ]\) and Z t+1=Y t+1Y t W t+1. Then clearly, W t+1, Z t are \(\mathcal {F}_{t}\)-measurable, W t+1≤0 on {Y t ≥0} and \(\mathbb {E}_{x_{0}} [Z_{t+1} | \mathcal {F}_{t} ]= 0\). This is a Doob-type decomposition. Now, define M t for all t inductively as follows.

$$M_0 = 0 ; \qquad M_{t+1} = M_t + \operatorname {sign}(M_t) Z_{t+1} $$

and set N t =|M t |. We claim the following:

  1. (1)

    (M t ) t∈ℕ is an \((\mathcal {F}_{t} )_{t \geq0}\)-adapted martingale.

  2. (2)

    N 0=0; N t+1=|N t +Z t+1|, ∀t.

  3. (3)

    Y t N t +R.

The first two assertions follow straightforwardly from the construction. The third one, can be proven by induction, since it clearly holds for t=0 and assuming Y t−1N t−1+R, if Y t−1≥0, then

and if Y t−1<0, then

$$Y_t \leq R \leq N_t + R . $$

Finally, by the Hoeffding-Azuma inequality applied to M t we have

This shows (2.8).

For part (3), let \(\tau_{0}^{-}(s) = \inf\{t \geq s: X_{t} \leq0 \}\) and Y t (s), M t (s) and A t (s) defined as in the proof of part (1) only with \(\tau_{0}^{-}(s)\) replacing \(\tau_{0}^{-}\). Then,

where the last inequality follows from Hoeffding-Azuma inequality and since the summands are non zero only when 0≤s 1<s 2t 3 and (s 2s 1)Rx 1R. □

Lemma 2.2

Let (X t ) t∈ℕ be a process adapted to \((\mathcal {F}_{t})_{t \in \mathbb {N}}\) and satisfying the following conditions for some a>2δ≥0:

  1. (1)

    X t+1X t ∈{−1,0,1}.

  2. (3)

    \(\mathbb {E}[X_{t+1} - X_{t} | \mathcal {F}_{t}] \geq-\delta \).

  3. (3)

    \(\operatorname {\mathbb {V}\mathrm {ar}}(X_{t+1} | \mathcal {F}_{t}) \geq a\).

  4. (4)

    X 0≥0.

Let . Then

$$ \mathbb {P}\bigl(\tau_r^{+} \leq t\bigr) \geq C_1 \exp\bigl\{-C_2 (r/\sqrt{t} + \delta \sqrt {t}\,)^2 \bigr\} + O\bigl(t^{-1/2}\bigr) $$
(2.10)

where C 1,C 2 are positive constants which depends only on a.

Proof

It is easy to verify that conditions 1–3 imply

$$ \mathbb {P}(X_{t+1} \neq X_t | \mathcal {F}_t) \geq a ; \qquad \mathbb {P}(X_{t+1} - X_t = -1 | \mathcal {F}_t ,\, X_{t+1} \neq X_t) \leq \frac {1}{2} + \frac {\delta }{2a} . $$
(2.11)

Now, let T 0=0 and \(T_{k} = \inf\{t > T_{k-1} : \: X_{t} \neq X_{T_{k-1}} \}\), \(Y_{k} = X_{T_{k}}\) for k≥1. Also, let N k =T k T k−1 and Z k =Y k Y k−1. From (2.11), it is not hard to see that we can couple (N k ) k , (Z k ) k with two i.i.d. sequences \((\widetilde {N}_{k})_{k}\),\((\widetilde {Z}_{k})_{k}\) such that \(\widetilde {N}_{k} \geq N_{k}\), \(\widetilde {Z}_{k} \leq Z_{k}\) a.s. and

$$\widetilde {N}_k \sim\mbox{Geom}(a) ; \qquad \widetilde {Z}_k = \left\{ \begin{array}{l@{\quad} l} +1 & \mbox {w.p. } 1/2 - \delta /2a \cr\noalign{\vspace{3pt}} -1 & \mbox {w.p. } 1/2 + \delta /2a. \end{array} \right. $$

Consequently, with \(\widetilde {Y}_{k} = \sum_{m \leq k} \widetilde {Z}_{m}\) and \(\widetilde {T}_{k} = \sum_{m \leq k} \widetilde {N}_{m}\) we have:

where the last inequality follows by the local CLT for \(\widetilde {Y}_{k}\), which is a nearest neighbor random walk whose steps have mean −δ/a and variance 1−(δ/a)2 and also by Cramer’s theorem for \(\widetilde {T}_{k}\). All constants depend only on a. □

Finally, we will also make use of the following result from [29]:

Lemma 2.3

[29, Proposition 17.20]

Let (Z t ) t≥0 be a non-negative supermartingale adapted to \((\mathcal {G}_{t} )_{t \geq0}\) and N a stopping time. Suppose that:

  1. (1)

    Z 0=z 0.

  2. (2)

    |Z t+1Z t |≤B.

  3. (3)

    σ>0 such that \(\operatorname {\mathbb {V}\mathrm {ar}}(Z_{t+1} | \mathcal {G}_{t} )> \sigma^{2}\) on the event {N>t}.

If u>4B 2/(3σ 2), then:

$$\mathbb {P}_{z_0} (N > u )\leq\frac{4 z_0}{\sigma\sqrt{u}}. $$

2.4 Variance Lemma

The following is a straightforward extension of [28, Lemma 2.6] to vector valued Markov processes. We include a proof here for completeness.

Lemma 2.4

Let (Z t ) be a Markov chain taking values ind and with transition matrix P. Write \(\mathbb {P}_{z_{0}}\), \(\mathbb {E}_{z_{0}}\) for its probability measure and expectation respectively, when Z 0=z 0. Suppose that there is some 0<η<1 such that for all pairs of starting states \((z_{0}, \widetilde {z}_{0} )\),

(2.12)

Then \(v_{t} \triangleq \sup_{z_{0}} \operatorname {\mathbb {V}\mathrm {ar}}_{z_{0}} (Z_{t} )= \sup_{z_{0}} \mathbb {E}_{z_{0}} \| Z_{t} - \mathbb {E}_{z_{0}} Z_{t} \|_{2}^{2}\) satisfies:

$$ v_t \leq v_1 \min \bigl \{t, \bigl (1-\eta^2 \bigr )^{-1} \bigr \}. $$
(2.13)

Proof

Let Z t and \(Z_{t}^{*}\) be independent copies of the chain with the same starting state z 0. By the assumption (2.12), we obtain that

$$\bigl \| \mathbb {E}_{z_0}[Z_t \mid Z_1 = z_1] - \mathbb {E}_{z_0}\bigl[Z_t^{*} \mid Z_1^* = z_1^*\bigr] \bigr \|_{2} = \bigl \| \mathbb {E}_{z_1}[Z_{t-1}] - \mathbb {E}_{z_1^*} \bigl [Z_{t-1}^*\bigr]\bigl\|_2 \,\leq\eta^{t-1} \bigr\|z_1 - z_1^* \bigr \|_{2}. $$

Hence, we see that

Combined with the total variance formula, it follows that

$$v_t \leq\sup_{z_0}\bigl\{\mathbb {E}_{z_0}\bigl[\operatorname {\mathbb {V}\mathrm {ar}}_{z_0}(Z_t \mid Z_1)\bigr] + \operatorname {\mathbb {V}\mathrm {ar}}_{z_0}\bigl(\mathbb {E}_{z_0}[Z_t \mid Z_1]\bigr)\bigr\} \leq v_{t-1} + \eta^{2(t-1)} v_1, $$

which then gives that \(v_{t} = \sum_{i=1}^{t}(v_{i} - v_{i-1}) \leq\sum _{i=1}^{t} \eta^{2(t-1)}v_{1}\), implying the desired upper bound immediately. □

2.5 Bottleneck Ratio

Let P be an irreducible, aperiodic transition kernel for a Markov chain on S with stationary measure π. The bottleneck ratio of a set AS is:

$$\varPhi(A) = \frac{\sum_{x \in A, y \notin A} \pi(x) P(x,y)}{\sum _{x \in A} \pi(x)} \leq\frac{\pi(\partial_{P} A)}{\pi(A)}, $$

where P A={xA:P(x,y)>0 for some yA}. The bottleneck ratio of the chain is

$$ \varPhi_* = \min_{A : \pi(A) \leq \frac {1}{2}} \varPhi(A) . $$
(2.14)

The following result, due to [1, 27, 38] in several similar forms (see, e.g., [29, Theorem 7.3]) relates the bottleneck ratio with the mixing time of the chain.

Theorem 2.5

If Φ is the bottleneck ratio defined in (2.14) then \(t_{{ \scriptsize \mbox {\textsc {mix}}}(1/4)} \geq\frac{1}{4\varPhi_{*}} \).

3 Drift Analysis for the Proportions Chain

In this section we prove various results concerning the drift of the process S t . We analyze both the one coordinate process \(S^{1}_{t}\) and the distance-to-equiproportionality \(\| S_{t} - \widehat {\mathsf {e}}\|_{2}\). In the course of this analysis, we also define two couplings which will be of independent use later on and prove a uniform bound on the variance of S t .

3.1 The Drift of One Proportion Coordinate

From symmetry, it is enough to analyze the drift of \(S^{1}_{t}\). For β≥0, define \(g_{\beta }:\mathcal {S}\to \mathcal {S}\) as

$$g_{\beta }(s) = \bigl(g^1_{\beta }(s), \dots, g^q_{\beta }(s)\bigr) ; \qquad g_{\beta }^k(s) = \frac{\mathrm{e}^{2\beta s^k}}{\sum_{j=1}^q \mathrm{e}^{2\beta s^j}} . $$

We can express the drift of \(S_{t}^{1}\) as follows:

(3.1)

with

$$ d_{\beta }(s) \triangleq -s^1 + g_{\beta }^1(s). $$
(3.2)

The function \(d_{\beta }: \mathcal {S}\to \mathbb {R}\) thus describes (up to a constant factor of n −1 and an error term) the drift of the first coordinate given the current proportions vector. It turns out the rapid mixing hinges on whether d β (s) is strictly negative whenever s 1>1/q (and for any values for the remaining coordinates of s). Accordingly we define D β :[0,1]→ℝ as

(3.3)
(3.4)

and check when D β (x) is strictly negative for all x∈(1/q,1]. We will see in Proposition 3.1 below that this happens if and only if β<β s (q), where β s (q)>0 is defined in (1.1).

For β≥0, define s (β) and s (β) as:

with the inf or sup being 1, if the respective sets are empty. We may now state:

Proposition 3.1

For all q≥3 the following holds:

  1. (1)

    We have that D β (s) is increasing in β if s∈[1/q,1] and for all β≥0, that D β (1/q)=0 and that D β (1)<0.

  2. (2)

    That \(s^{\sharp}(\beta ) > \frac {1}{q}\) if β<q/2.

  3. (3)

    The following statements are equivalent if β<q/2:

    1. (a)

      s (β)=1.

    2. (b)

      D β (s) has no roots in (1/q,1].

    3. (c)

      D β (s (β))<0.

  4. (4)

    All the statements in part (3) hold if and only if β<β s (q).

Proof

We start by proving part (1). It is clear that

$$D_\beta (s) = -s + \frac{1}{1 + (q-1) \exp(2\beta \frac {1 - qs}{q-1})}, $$

and hence is strictly increasing in β if 1/q<s≤1. Furthermore, we have that

$$D_\beta (1/q) = -\frac{1}{q} + \frac{\mathrm{e}^{2\beta /q}}{ \mathrm{e}^{2\beta /q} + (q-1) \mathrm{e}^{2\beta /q}} = -\frac{1}{q} + \frac{1}{q} = 0, $$

as well as

$$D_\beta (1) = -1 + \frac{1}{1 + (q-1)\mathrm{e}^{-\beta }} < 0. $$

For part (2), taking the derivative of D β (s) with respect to s evaluated at 1/q, one obtains \(\frac {d}{ds} D_{\beta }(s) |_{s=1/q} = -1 + \frac{2\beta }{q}\) which is negative if β<q/2. Together with D β (1/q)=0, this completes the proof.

For part (3), we first show that there exists at most two points in [1/q,1] such that \(\frac{d}{d s}D_{\beta }(s) = 0\). To see this, we compute the first derivative and obtain that

$$\frac{d}{ds}D_\beta (s) = -1+ \frac{2q\beta \exp(2\beta \frac{1 - qs}{q-1})}{[1+ (q-1) \exp(2\beta \frac{1-qs}{q-1})]^2} = -1 + \frac {2q\beta }{q-1}h\big((q-1) \mathrm{e}^{2\beta \frac{1-qs}{q-1}}\big), $$

where \(h(x) = \frac{x}{(1+x)^{2}}\). Obviously, there are at most two zeros for \(-1 + \frac{2q\beta }{q-1}h(x)\) and since \((q-1) \exp(2\beta \frac{1-qs}{q-1})\) is a strictly monotone in s, we conclude that there are at most two points such that \(\frac{d}{ds}D_{\beta }(s)\) vanishes.

Notice also that \(\frac{d}{ds}D_{\beta }(1/q) <0\) provided that β<q/2 and hence D β (1/q+ξ)<0 for all ξξ 0, where ξ 0 is a sufficiently small positive number. We are now ready to derive the equivalence stated in the proposition. Observing that D β (s) is a smooth function and D β (1)<0, we deduce that (3a) ⇒ (3b) ⇒ (3c). It remains to prove that (3c) ⇒ (3a). Suppose now that (3c) holds and there exists s 0∈(1/q,1] such that D β (s 0)≥0. Recalling that D β (1/q+ξ)<0 and D β (1)<0, we deduce the following:

  • If s 0<s , we will then have at least two zeros in (1/q,s ) for \(\frac{d}{ds}D_{\beta }(s)\).

  • If s 0>s , we will then have at least one zero in (s ,1) for \(\frac{d}{ds}D_{\beta }(s)\).

We see that the first case contradicts with the fact that there can be at most two zeros for \(\frac{d}{ds}D_{\beta }(s)\) and the second case contradicts our definition of s . Altogether, we established that (3c) ⇒ (3a).

As for the last part, continuity and part (1) imply that (1.1) is equivalent to

Since D β (s) is increasing in β for all s∈[1/q,1], it follows that for all β<β s (q), we indeed have D β (s)<0 for all s∈(1/q,1]. On the other hand, by the continuity of the function D β (s) and our definition of β s , we know that there exists s M ∈(1/q,1] such that \(D_{\beta _{s}}(s_{M}) =0\). Now, using the result of part (1) we conclude that D β (s M )>0 for any β>β s , completing the proof. □

In the following proposition we discuss the relation between β s (q) and β c (q).

Proposition 3.2

For q≥3 we have that 0<β s (q)<β c (q)<q while β s (2)=β c (2)=1.

Proof

As recalled in Sect. 2.2,

$$\beta _c(2) = 1 \quad \mbox {and} \quad \beta _c(q)=\frac{(q-1)\log(q-1)}{q-2} , \quad q \geq3 . $$

In the q=2 case, it is easy to verify that \(\frac{d}{ds} D_{\beta }(s) < 0\) for all \(s \in[\frac{1}{2}, 1]\) if β<1 and \(\frac{d}{ds} D_{\beta }(\frac{1}{2}) > 0\) if β>1. Since in addition \(D_{\beta }(\frac{1}{2}) = 0\), D β (1)<0, we obtain β s (2)=1.

For q≥3 we have β c (q)<q/2 and therefore

$$D_{\beta _c}\biggl (\frac{q-1}{q}\biggr )= - \frac{q-1}{q} + \frac{1}{1 + (q-1) \exp \big(2\beta _c \frac{1 - q\frac{q-1}{q}}{q-1}\big)} =0 $$

and

Now if ϕ(q)=2(q−1)log(q−1)−q(q−2) then ϕ(2)=ϕ′(2)=0 and \(\phi''(s)=-2+\frac{2}{q-1}\) which is negative when q>2. It follows that ϕ(q) is negative when q>2 and hence for q≥3,

$$\frac{d}{ds} D_{\beta _c}(s) \bigg|_{s=\frac{q-1}{q}}<0. $$

So for small enough ϵ>0 and \(s\in(\frac{q-1}{q}-\epsilon ,\frac {q-1}{q})\) we have that \(D_{\beta _{c}}(s)>0\) and hence \(\sup_{s} D_{\beta _{c}}(s)>0\). By the smoothness of D β (s) this implies that there exists β<β c such that sup s D β (s)>0 which establishes that β s <β c . □

We will make repeated use of the following proposition throughout the paper.

Proposition 3.3

  1. (1)

    Assume β<q/2. For all 0<ρ 0<ρ small enough, there exists γ>0 and C,c>0 such that for all n with t=eγn we have

    (3.5)

    for all \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}^{+}}\).

  2. (2)

    Assume β<q/2. For all r 0>0, γ>0 there exists C,c>0 such that for all n and r>r 0 with t=γn, \(\rho_{0} = \frac{r_{0}}{\sqrt{n}}\) and \(\rho= \frac{r}{\sqrt{n}}\) we have

    (3.6)

    for all \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\).

  3. (3)

    Assume β<β s (q). For all ρ>0 there exists γ>0 and C,c>0, such that for all n, with t=γn we have

    $$ \mathbb {P}_{\sigma_0} \bigl (\sigma_t \notin \varSigma _n^{\rho+} \bigr )\leq C \mathrm{e}^{-c n} $$
    (3.7)

    for all σ 0Σ n .

Proof

Consider the process until the first time it is to the right of 1/q+ρ. If n is large enough and, in case (1), if ρ is small enough, Proposition 3.1 part (1) and Eq. (3.1) imply that this process satisfies the conditions of Lemma 2.1. Parts (1) and (2) of the proposition then follow from parts (3) and (2) of the lemma and summing over all coordinates.

As for part (2), consider this time . Since β<β s (q), it follows from Proposition 3.1 (parts (1) and (3)) that this process also satisfies the conditions of Lemma 2.1, with δ>Cn −1, for some positive constant C′. Now set γ=2/C′ and apply part (1) of the lemma to conclude that except with probability exponentially small in n, \(S^{1}_{t} \leq1/q+\rho/2\) for some tγn. Once this happens, by Lemma 2.1 part (3), as in the proof of (3.5), we have \(S^{1}_{\gamma n} \leq1/q+\rho\) again except with probability tending exponentially fast to zero with n. It remains to use union bound to complete the proof. □

3.2 Bounded Dynamics

The bounded dynamics is a process that evolves like σ t , only that S(σ t ) is forced to stay close to \(\widehat {\mathsf {e}}\) by rejecting transitions which violate this condition. Formally, fix ρ>0 and let (σ t ) t≥0 be a Markov chain on \(\varSigma _{n}^{\rho+}\), which evolves as follows. Start from some \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and at step t+1:

  • Draw \(\widetilde {\sigma}_{t+1}\) according to P n (σ t ,⋅), where P n is the original transition kernel.

  • If \(\widetilde {\sigma}_{t+1} \in \varSigma _{n}^{\rho+}\) set \(\sigma_{t+1} = \widetilde {\sigma}_{t+1}\) and otherwise set σ t+1=σ t .

We shall denote by ℙρ the underlying probability measure.

The unbounded and ρ-bounded dynamics admit a natural coupling, under which the two processes start from the same configuration and evolve together until time , where S t is the unbounded process. This leads to the following two immediate observations which will be useful later.

  1. (1)

    For any integer t and bounded function f:(Σ n )t+1↦ℝ:

    $$ \bigl |\mathbb {E}f(\sigma_{[0,t]}) - \mathbb {E}^{\rho} f(\sigma_{[0,t]})\bigr| \leq2 \|f\|_{\infty } \mathbb {P}(\tau\leq t). $$
    (3.8)
  2. (2)

    In particular for any set A⊆(Σ n )t+1:

    $$ \bigl |\mathbb {P}(\sigma_{[0,t]} \in A) - \mathbb {P}^{\rho} (\sigma_{[0,t]} \in A)\bigr| \leq2 \mathbb {P}(\tau\leq t). $$
    (3.9)

3.3 Synchronized Coupling

The synchronized coupling is a (Markov) coupling of two ρ-bounded dynamics in which the two chains “synchronize” their steps as much as possible. Formally, define (σ t ) t≥0, \((\widetilde {\sigma}_{t})_{t\geq0}\) on the same probability space such that starting from σ 0, \(\widetilde {\sigma}_{0}\), at time t+1:

  1. (1)

    Choose colors I t+1, \(\widetilde {I}_{t+1}\) according to an optimal coupling of S t , \(\widetilde {S}_{t}\).

  2. (2)

    Choose colors J t+1, \(\widetilde {J}_{t+1}\), according to an optimal coupling of \(g_{\beta } (S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )\), \(g_{\beta } (\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}})\).

  3. (3)

    Change a uniformly chosen vertex of color I t+1 in σ t to have color J t+1 in σ t+1, but only if \(\sigma_{t+1} \in \varSigma _{n}^{\rho+}\).

  4. (4)

    Change a uniformly chosen vertex of color \(\widetilde {I}_{t+1}\) in \(\widetilde {\sigma}_{t}\) to have color \(\widetilde {J}_{t+1}\) in \(\widetilde {\sigma}_{t+1}\), but only if \(\widetilde {\sigma}_{t+1} \in \varSigma _{n}^{\rho+}\).

We shall write \(\mathbb {P}^{SC,\rho}_{\sigma_{0}, \widetilde {\sigma}_{0}}\) for the underlying measure and omit ρ if it is large enough for the dynamics not to be bounded.

The following shows that this coupling contracts the ∥⋅∥1 distance of the proportions vector.

Lemma 3.4

There exists C(β,q)>0 such that for any ρ>0, uniformly in \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}^{\rho+}\) as n→∞

$$ \mathbb {E}^{SC, \rho}_{\sigma_0, \widetilde {\sigma}_0} \| S_t - \widetilde {S}_t \|_{1} \leq \biggl (p + \frac{C\rho}{n} \biggr )^t \| s_0 - \widetilde {s}_0 \|_{1}, $$
(3.10)

where

$$ p = p(n, \beta , q) = 1 - \frac{1-2\beta /q}{n}. $$
(3.11)

Proof

For \(s, \widetilde {s} \in \mathcal {S}_{n}\) by a Taylor expansion of g β around s, then another expansion for ∇g β around \(\widehat {\mathsf {e}}\) one has:

$$\bigl \| g_{\beta }(s) - g_{\beta }(\widetilde {s}) \bigr \|_{1} = \frac{2\beta }{q} \| s - \widetilde {s} \|_{1} \bigl (1 + O \bigl (\| s - \widehat {\mathsf {e}}\|_{1} + \| \widetilde {s} - \widehat {\mathsf {e}}\|_{1} \bigr )\bigr ), $$

where we use the easily verified:

(3.12)

Now under the bounded dynamics, \(I_{t+1} \neq \widetilde {I}_{t+1}\) implies that \(S_{t}^{I_{t+1}} > \widetilde {S}_{t}^{I_{t+1}}\) and \(\widetilde {S}_{t}^{\widetilde {I}_{t+1}} > S_{t}^{\widetilde {I}_{t+1}}\) while \(J_{t+1} \neq \widetilde {J}_{t+1}\) implies that \((S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )^{J_{t+1}} > (\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}} )^{J_{t+1}}\) and \((\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}} )^{\widetilde {J}_{t+1}} > (S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )^{\widetilde {J}_{t+1}}\). It follows by the definition of the coupling that

$$\| S_{t+1} - \widetilde {S}_{t+1} \|_{1} -\| S_t - \widetilde {S}_t \|_{1} = -\frac{2}{n} [ \mathbb{I}_{\{ I_{t+1} \neq \widetilde {I}_{t+1} \}} - \mathbb{I}_{\{ J_{t+1} \neq \widetilde {J}_{t+1} \}}]. $$

Recalling that for \(s \in \mathcal {S}_{n}\), \(\|s\|_{\mathrm {TV}} = \frac {1}{2} \| s \|_{1}\) and that under the best coupling of distributions s, \(\widetilde {s}\) the probability of disagreement is \(\|s-\widetilde {s}\|_{\mathrm {TV}}\), we have:

The result follows by iteration. □

3.4 Uniform Variance Bound

Lemma 3.5

Assume β<q/2. There exists ρ 0=ρ 0(β,q) such that if ρρ 0

$$ \operatorname {\mathbb {V}\mathrm {ar}}_{\sigma_0}^{\rho} (S_t )= O \bigl (n^{-1} \bigr ), $$
(3.13)

uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≥0, and there exists γ 0>0 such that

$$ \operatorname {\mathbb {V}\mathrm {ar}}_{\sigma_0} (S_t )= O \bigl (n^{-1} \bigr ), $$
(3.14)

uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\) and \(t \leq \mathrm{e}^{\gamma _{0} n}\).

Proof

Equation (3.13) will follow directly from Lemma 2.4 applied to S t under the ρ-bounded dynamics. Indeed, Lemma 3.4 gives a stronger version of Condition 2.12 with η=p+ρO(n −2). Now, if β<q/2 and ρ is small enough, we have \(\eta\leq1 - \frac{1 - \frac{2\beta }{q}}{2n}\) for large enough n. Then (3.13) follows from (2.13) since \(\operatorname {\mathbb {V}\mathrm {ar}}^{\rho}_{\sigma_{0}} S_{1} = O (n^{-2} )\).

For (3.14), find ρ′<ρ and use (3.5) and (3.8) to conclude that for all \(\sigma_{0} \in \varSigma _{n}^{\rho^{\prime}+}\) and t≤eγn for some γ=γ(ρ,ρ′):

$$\operatorname {\mathbb {V}\mathrm {ar}}_{\sigma_0} (S_t )= \operatorname {\mathbb {V}\mathrm {ar}}^{\rho}_{\sigma_0} (S_t )+ o\bigl(n^{-1}\bigr) = O \bigl (n^{-1} \bigr ). $$

 □

Corollary 3.6

For β<β c (q), we have \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}} (S) = O(n^{-1})\).

Proof

Fix ρ<ρ 0, where ρ 0 is given in Lemma 3.5 and notice that the bounded dynamics is reversible with respect to the Potts measure μ n restricted to \(\varSigma _{n}^{\rho+}\). Therefore the bounded dynamics has \(\mu^{\rho}_{n}(\cdot) = \mu_{n}(\cdot| \sigma\in \varSigma _{n}^{\rho+})\) as its stationary measure. From the large deviation analysis in Sect. 2.2 it is straightforward to conclude that if β<β c (q)

$$\mu_n\bigl(S \notin \mathcal {S}^{\rho+}\bigr) \leq\mathrm{e}^{-C n}, $$

for some C>0 and n large enough, depending on ρ and β. Therefore, we have \(\|\mu_{n} - \mu^{\rho}_{n}\|_{\mathrm {TV}} \leq\mathrm{e}^{-Cn}\). Since μ n , \(\mu^{\rho}_{n}\) live on a compact space, this gives \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}}(S) \leq \operatorname {\mathbb {V}\mathrm {ar}}_{\mu^{\rho}_{n}}(S) + \mathrm{e}^{-Cn}\). Since \(\mathbb {P}^{\rho}_{\sigma_{0}} (\sigma_{t} \in\cdot)\) converges to \(\mu_{n}^{\rho }\) as t→∞ for any fixed \(\sigma_{0} \in \varSigma ^{\rho+}_{n}\), Lemma 3.5 can be extended to σ 0 chosen from \(\mu _{n}^{\rho}\). This completes the proof. □

3.5 The Drift of the Distance to Equiproportionality

Here we show that \(\widehat {S}_{t}\) has drift towards 0. Write S t+1=S t +ξ t+1 where we have that for i, j=1,…,q,

$$\mathbb {P}\biggl(\xi_{t+1} = \frac {1}{n} (\mathsf {e}_j - \mathsf {e}_i )\biggr) = S_t^i g_{\beta }^j \biggl (S_t - \frac {1}{n} \mathsf {e}_i \biggr )= S_t^i g_{\beta }^j (S_t )+ O \bigl (n^{-1} \bigr ). $$

Then,

where \(h(s) \triangleq \sum_{k=1}^{q} g_{\beta }^{k}(s) s^{k}\). Notice that since \(g^{k}_{\beta}(s)=g^{k}_{\beta}(s- \widehat {\mathsf {e}})\) we have that \(h(s) = 1/q + h(\widehat {s})\) and its gradient and Hessian are:

$$\mathrm {D}_1 h(0) = \frac{1}{q} \quad\mbox{and}\quad \mathrm {D}_2 h(0) = \frac {4 \beta }{q} P, $$

where P is a projection matrix onto \(( \widehat {\mathsf {e}})^{\perp}\). Therefore, we may write

$$h(s) = \frac{1}{q} + \frac{2\beta }{q} \| \widehat {s} \|_{2}^2 + O \bigl (\| \widehat {s} \|_{2}^3 \bigr ). $$

This gives

(3.15)

where p is defined in (3.11).

3.6 Contraction for the Distance to Equiproportionality

Fix β<q/2 and ρ<ρ 0 where ρ 0 is given in Lemma 3.5. For what follows, assume that \(\sigma_{0} \in \varSigma ^{\rho+}_{n}\) and \(t \leq \mathrm{e}^{\gamma _{0} n}\) where γ 0 is also given. Then, taking expectation in Eq. (3.15), we get:

$$ \mathbb {E}_{\sigma_0} \| \widehat {S}_{t+1} \|_{2} = p^2 \mathbb {E}_{\sigma_0} \| \widehat {S}_t \|_{2}^2 + \bigl (\mathbb {E}_{\sigma_0} \| \widehat {S}_t \|_{2}^3 \bigr )O \bigl (n^{-1} \bigr )+ O \bigl (n^{-2} \bigr ). $$
(3.16)

Now by Taylor expansion of \(s \mapsto \| s \|_{2}^{3}\) around \(\mathbb {E}_{\sigma _{0}} \widehat {S}_{t}\) in view of (3.14),

Then

(3.17)

This will in turn imply:

Proposition 3.7

Fix β<q/2. There exist ρ 0=ρ 0(β,q)>0 and C=C(β,q)>0 such that if ρρ 0 there exists γ(ρ)>0 such that:

(3.18)

uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≤eγ(ρ)n, where p=p(n,β,q) is defined in (3.11).

Proof

Set \(\lambda _{t} \triangleq \mathbb {E}_{\sigma_{0}} \| \widehat {S}_{t} \|_{2}^{2}\). It follows from (3.17) and (3.5) that for any \(\rho< \overline {\rho} < \rho_{0}\), where ρ 0 is given in Lemma 3.5, there exists \(\gamma = \gamma (\rho, \overline {\rho }) > 0\) such that uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≤eγn:

$$\lambda _{t+1} \leq \lambda _t \bigl (p^2 + \overline {\rho} O \bigl (n^{-1} \bigr )\bigr )+ O \bigl (n^{-2} \bigr )= \lambda _t \bigl (p + \overline {\rho} O \bigl (n^{-1} \bigr )\bigr )^2 + O \bigl (n^{-2} \bigr ). $$

We now use the following fact, which can be easily verified. If (λ t ) t≥0 is a sequence satisfying:

$$\lambda _{t+1} = p \lambda _t + a r^t + b, $$

for some pr, p≠1, a and b, then:

$$ \lambda _{t} = \lambda _0 p^t + a \frac{p^t - r^t}{p-r} + \frac{b}{1-p} \bigl (1 - p^t \bigr ). $$
(3.19)

Apply this (with a=0) and use the monotonicity in λ t of the right hand side above (at least if n is large enough), to conclude:

Plugging this a priori bound back into (3.17) we see that:

$$\lambda _{t+1} = p^2 \lambda _t + \overline {\rho}^3 \bigl (p + \overline {\rho} O\bigl (n^{-1}\bigr) \bigr )^{3t} O \bigl (n^{-1} \bigr )+ O \bigl (n^{-2} \bigr ). $$

Using (3.19) again and choose \(\overline {\rho}\) small enough to obtain

as desired. □

4 Mixing in the Subcritical Regime

In this section we prove Theorem 1. Recall that \(\alpha _{1} = \alpha _{1}(\beta , q) = \frac{1}{2 (1 - 2\beta /q )}\) and set

$$ t^{\alpha _1}(n) = \alpha _1 n \log n ; \qquad t^{\alpha _1}_{\gamma }(n) = \alpha _1 n \log n + \gamma n. $$
(4.1)

4.1 Proof of Lower Bound in Theorem 1

Proof

The analysis in this subsection pertains to all β<β c (q). Fix 0<ρ 2<ρ 1<ρ 0, where ρ 0 is given in Proposition 3.7 and let σ 0Σ n be such that \(\rho_{2} < \| \widehat {s}_{0} \|_{2} < \rho_{1}\). Then if \(t < t^{\alpha _{1}}_{\gamma }(n)\) and ρ 1 is small enough, Proposition 3.7 implies

$$\mathbb {E}_{\sigma_0} \| \widehat {S}_t \|_{2}^2 \geq\frac{\rho_2^2}{2} \bigg( 1 - \frac {1-2\beta /q}{n}\bigg)^{2 t^{\alpha _1}_{\gamma}(n)} + O \bigl (n^{-1} \bigr )\geq\frac{1}{n} \mathrm{e}^{-(1 - 2\beta /q)\gamma}, $$

for sufficiently large −γ depending on ρ 2 and large enough n. Combined with the uniform variance bound given in Lemma 3.5, it follows that for large enough n

$$\mathbb {E}_{\sigma_0} \| \widehat {S}_{t} \|_{2} \geq\frac{\mathrm{e}^{-(1 - 2\beta /q) \gamma/2}}{\sqrt{n}}. $$

Applying Chebyshev’s inequality and using Lemma 3.5 again, we conclude that uniformly in all r>0, \(t \leq t^{\alpha _{1}}_{\gamma }(n)\) and \(\sigma_{0} \in \varSigma _{n}^{\rho_{1}+} \setminus \varSigma _{n}^{\rho_{2}+}\)

$$ \mathbb {P}_{\sigma_0} \biggl (\| \widehat {S}_{t} \|_{2} < \frac{r}{\sqrt{n}} \biggr )\leq\frac{\operatorname {\mathbb {V}\mathrm {ar}}_{\sigma_0} ( \widehat {S}_{t} )}{(\frac{\mathrm{e}^{-(1 - 2\beta /q) \gamma/2}}{\sqrt{n}} - \frac{r}{\sqrt{n}} )^2} = O\big(\bigl(\mathrm{e}^{-(1 - 2\beta /q) \gamma/2} - r\bigr )^{-2}\big). $$
(4.2)

In particular, this implies

$$ \lim_{\gamma \to-\infty} \limsup_{n \to\infty} \mathbb {P}_{\sigma_0} \biggl (\| \widehat {S}_{t^{\alpha _1}_{\gamma }(n)} \|_{2} < \frac{r}{\sqrt{n}} \biggr )= 0 . $$
(4.3)

On the other hand, \(\mathbb {E}_{\mu_{n}} S_{t} = \widehat {\mathsf {e}}\) and from Corollary 3.6 it follows that \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}} S_{t} = O (n^{-1} )\) for β<β c (q). Therefore another application of Chebyshev’s inequality yields that

$$ \mu_n \biggl (\| \widehat {s}_t \|_{2} < \frac{r}{\sqrt{n}} \biggr )\geq1 - \frac {O(1)}{r^2}, $$
(4.4)

for all t≥0. Altogether, we have that for any r>0,

$$\lim_{\gamma \to-\infty} \liminf_{n \to\infty} d_{t^{\alpha _1}_{\gamma }(n)}(n) \geq1 - \frac{O(1)}{r^2}\ $$

and it remains to send r→∞. □

In the remainder of the section we prove the upper bound on the mixing time when β<β s (q). The proof is based on upper bounding the coalescence time of two coupled dynamics, one starting from any configuration in Σ n and the other starting from the stationary distribution μ n . This coupling will be done in several stages with different couplings from one stage to the next. In what follows, (σ t ) t≥0 and \((\widetilde {\sigma})_{t \geq0}\) will denote the two coupled processes.

4.2 O(n −1/2) from Coalescence

We now show that with arbitrarily high probability, S t gets O(n −1/2)-close to \(\widehat {\mathsf {e}}\) in O(nlogn) steps, if initially its distance is at most ρ, where ρ is small enough. More precisely,

Lemma 4.1

Fix β<q/2. Then for all r>0:

$$\mathbb {P}_{\sigma_0} \bigl (S_{t^{\alpha _1}(n)} \notin \mathcal {S}^{\frac{r}{\sqrt {n}}} \bigr )= O\bigl(r^{-1}\bigr), $$

uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\) where ρ 0=ρ 0(β,q) is defined in Proposition 3.7 and \(t^{\alpha _{1}}(n)\) is defined in (4.1).

Proof

This follows immediately from Proposition 3.7 and a first moment argument:

 □

4.3 O(n −1) from Coalescence

To get the correct order of the mixing time it is not sufficient to simply use the drift to couple the chains as the drift is very weak when S t is close to \(\widehat {\mathsf {e}}\). As such, in this section we define a different coupling of the dynamics which will bring σ t and \(\widetilde {\sigma }_{t}\) to distance O(n −1) apart in linear time. This will be achieved one coordinate after the other. We begin by giving a general definition of, what we call, a semi-independent coupling and then use it to define the coupling of the dynamics.

Let ν, \(\widetilde {\nu}\) be two positive distributions on Ω m =[1,m] and fix a non-empty AΩ m , where m is some positive integer. We shall write ν| A for the conditional distribution given A, i.e. ν| A (x)=ν(x)/ν(A), for xA. The A-semi-independent coupling of ν, \(\widetilde {\nu}\) is a coupling of two random variables X and \(\widetilde {X}\) with underlying measure ℙ, constructed according to the following procedure:

  1. (1)

    Choose U∈[0,1] uniformly.

  2. (2)

    If \(U \leq\min \{\nu(A), \widetilde {\nu}(A) \}\), draw X and \(\widetilde {X}\) using a best coupling of \((\nu|_{A}, \widetilde {\nu}|_{A} )\).

  3. (3)

    Otherwise, independently:

    1. (a)

      Draw X according to ν| A if U<ν(A) and according to \(\nu|_{A^{c}}\) if Uν(A).

    2. (b)

      Draw \(\widetilde {X}\) according to \(\widetilde {\nu}|_{A}\) if \(U < \widetilde {\nu}(A)\) and according to \(\widetilde {\nu}|_{A^{c}}\) if \(U \geq \widetilde {\nu}(A)\).

Clearly a Ω m -semi-independent coupling is a best coupling and for A=Ø, we define Ø-semi-independent coupling to be the standard independent coupling. The following proposition states a few properties of this coupling, which will be useful for the sequel.

Proposition 4.2

The following holds for the A-semi-independent coupling of \((\nu ,\widetilde {\nu} )\):

  1. (1)

    X, \(\widetilde {X}\) are distributed according to ν, \(\widetilde {\nu}\) respectively.

  2. (2)

    \(\mathbb {P}^{\star }(\bigcup_{x \in A} \{ X = x \} \varDelta \{ \widetilde {X} = x \} )\leq \frac {3}{2} \sum_{x \in A} | \nu(x) - \widetilde {\nu}(x) | \).

  3. (3)

    \(\forall x \notin A,\ \mathbb {P}^{\star }(X = x,\,\widetilde {X} \neq x )\geq\nu(x)\widetilde {\nu}(A^{c} \setminus\{x\}) \) and \(\forall x \notin A,\ \mathbb {P}^{\star }(\widetilde {X} = x,\,X \neq x )\geq \widetilde {\nu}(x)\nu(A^{c} \setminus\{x\})\).

Proof

Part one of the lemma is immediate. Part (2) follows from a straightforward calculation:

As for part (3), we have:

and similarly for \(\mathbb {P}^{\star }(\widetilde {X}=x, X \neq x)\). □

We are now ready to define the coupling of σ t , \(\widetilde {\sigma }_{t}\) for this section. Fix \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}\) and y 1,…,y q−1>0. The coordinate-wise coupling with parameters y 1,…,y q−1 and starting configurations σ 0, \(\widetilde {\sigma}_{0}\) is defined as follows.

  1. (1)

    Set T (0)=0, k=1.

  2. (2)

    As long as kq−1:

    1. (a)

      As long as \(|S^{k}_{t} - \widetilde {S}^{k}_{t} |> \frac{y_{k}}{n}\):

      1. (i)

        Draw I t+1, \(\widetilde {I}_{t+1}\), using a {1,…,k−1}-semi-independent coupling of S t , \(\widetilde {S}_{t}\).

      2. (ii)

        Draw J t+1, \(\widetilde {J}_{t+1}\), using a {1,…,k−1}-semi-independent coupling of \(g_{\beta } (S_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )\), \(g_{\beta } (\widetilde {S}_{t} - \frac{1}{n} \mathsf {e}_{\widetilde {I}_{t+1}} )\).

      3. (iii)

        Change a uniformly chosen vertex of color I t+1 in σ t to have color J t+1 in σ t+1.

      4. (iv)

        Change a uniformly chosen vertex of color \(\widetilde {I}_{t+1}\) in \(\widetilde {\sigma}_{t}\) to have color \(\widetilde {J}_{t+1}\) in \(\widetilde {\sigma}_{t+1}\).

      5. (v)

        Set t=t+1.

    2. (b)

      When \(|S^{k}_{t} - \widetilde {S}^{k}_{t} |\leq\frac{y_{k}}{n}\) set T (k)=tT (k−1) and k=k+1.

  3. (3)

    Set \(T^{CC} = \sum_{k=1}^{q-1} T^{(k)}\).

We shall use \(\mathbb {P}^{CC}_{\sigma_{0}, \widetilde {\sigma}_{0}}\) to denote the probability measure for this coupling and \(\mathbb {P}^{CC(m)}_{\sigma_{0}, \widetilde {\sigma }_{0}}\) for the same coupling, only with k=m instead of k=1 in step (1), i.e. starting from the m-th stage. Notice that, in principle, the stopping condition at stage k, may never get satisfied, in which case we stay at that stage forever and T (k)=T CC=∞.

For u,r>0, define

where above \((s,\widetilde {s}) = (S(\sigma), S(\widetilde {\sigma}))\). Finally, set \(\mathcal {H}_{u,r} \triangleq \mathcal {H}_{u,r}^{q}\). The following lemma will be the main ingredient in an inductive proof for an upper bound on T CC:

Lemma 4.3

Fix β<q/2. Let k∈[1,q−1]. For all u k−1,r k−1,ϵ>0, there exist y k ,u k ,r k ,γ k >0, such that if \((\sigma_{0}, \widetilde {\sigma}_{0}) \in \mathcal {H}^{k}_{u_{k-1}, r_{k-1}}\) then

$$ \mathbb {P}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl (T^{(k)} < \gamma _k n ,\, (\sigma _{T^{(k)}}, \widetilde {\sigma}_{T^{(k)}}) \in \mathcal {H}^k_{u_k, r_k} \bigr )\geq1-\epsilon . $$
(4.5)

Proof

Recall the expression for the drift of one coordinate (3.1). Near \(\widehat {\mathsf {e}}\), this becomes by Taylor expansion for any i∈[1,q]:

$$\mathbb {E}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl [S^i_{t+1} - S^i_t \mid \mathcal {F}_t \bigr ]= \frac {1}{n} \biggl \{- \biggl (1 - \frac {2\beta }{q} \biggr )\widehat {S}^i_t + O \bigl (\| \widehat {S}_t \|_{2}^2 \bigr )\biggr \}+ O \bigl (n^{-2} \bigr ), $$

and if \(W_{t} = S_{t} - \widetilde {S}_{t}\) then

(4.6)

Now, for some r k >0 to be chosen later, let . Then, from (4.6) it follows that there exists y k >0 such that \(\overline {W}^{k}_{t} \triangleq |W^{k}_{t \land \tau^{(k)} \land T^{(k)}} |\) is a supermartingale. Clearly \(|\overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} |< \frac{2}{n}\). Also, from Proposition 4.2, if t<τ (k)T (k):

which implies that \(\mathbb {E}^{CC(k)}_{\sigma_{0}, \widetilde {\sigma}_{0}} [ (\overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} )^{2} |\mathcal {F}_{t} ]\geq \frac {1}{2} q^{-2}n^{-2} + O (n^{-5/2} )\). On the other hand, in view of (4.6), \(\mathbb {E}^{CC(k)}_{\sigma_{0}, \widetilde {\sigma}_{0}} [ \overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} |\mathcal {F}_{t} ]= O(n^{-3/2})\). Combining the two bounds, we infer that there exists C>0, which doesn’t depend on r k or y k , such that on {t<τ (k)T (k)}

$$ \operatorname {\mathbb {V}\mathrm {ar}}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl ( \overline {W}^k_{t+1} |\mathcal {F}_t \bigr )\geq C n^{-2}, $$
(4.7)

for n sufficiently large. We now apply Lemma 2.3 with \(Z_{t} = \overline {W}^{k}_{t}\), \(z_{0} = \frac{2 r_{k-1}}{\sqrt{n}}\) and N=τ (k)T (k). This gives for γ k >0:

whence we may choose γ k =γ k (r k−1,ϵ) independently of r k , y k but sufficiently large, such that

$$ \mathbb {P}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl (T^{(k)} \land \tau ^{(k)} > \gamma _k n \bigr )\leq\frac{\epsilon }{3} . $$
(4.8)

This gives an upper bound on T (k), since by Proposition 3.3 part (2) we may choose r k large enough such that:

$$ \mathbb {P}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl (\tau^{(k)} < \gamma _k n \bigr )\leq \frac{\epsilon }{3}. $$
(4.9)

It remains to ensure that we do not increase the distances in the first k−1 coordinates, by too much. Proposition 4.2 implies that for any t:

It follows that

Taking expectation of both sides and using the assumption on \(\| \overline {W}_{0}^{[1, k-1]} \|_{1}\), we have

$$\mathbb {E}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \bigl \| \overline {W}_{\gamma _k n}^{[1, k-1]} \bigr \|_{1} \leq \bigl \| \overline {W}_0^{[1, k-1]} \bigr \|_{1} \biggl (1 + \frac{C_1}{n} \biggr )^{\gamma _k n} \leq C_1 \frac{u_{k-1}}{n} \mathrm{e}^{C_2 \gamma _k}. $$

Hence by Markov’s inequality, there exists u k >0 such that

$$ \mathbb {P}^{CC(k)}_{\sigma_0, \widetilde {\sigma}_0} \biggl (\bigl \| \overline {W}_{\gamma _k n}^{[1, k-1]} \bigr \|_{1} > \frac{u_k-y_k}{n} \biggr )\leq\frac{\epsilon }{3} . $$

Combined with (4.8) and (4.9), the proof is complete. □

Corollary 4.4

Fix β<q/2. For any ϵ,r>0, there exist γ,u,r′>0 and y 1,…,y q−1>0 such that for \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma ^{\frac{r}{\sqrt{n}}}_{n}\).

$$ \mathbb {P}^{CC}_{\sigma_0, \widetilde {\sigma}_0} \bigl (T^{CC} < \gamma n ,\, (\sigma_{T^{CC}},\widetilde {\sigma}_{T^{CC}}) \in \mathcal {H}_{u, r^{\prime}} \bigr )\geq1-\epsilon . $$

Proof

Starting from r 0=u 0=r and applying Lemma 4.3 inductively, we obtain for some (y k ,u k ,r k ,γ k ) k∈[1,q−1]:

$$\mathbb {P}^{CC}_{\sigma_0, \widetilde {\sigma}_0} \bigl(T^{(k)} < \gamma _k n, (\sigma_{T^{(k)}},\widetilde {\sigma}_{T^{(k)}}) \in \mathcal {H}^k_{u_k, r_k} \ \forall k \in[1,q-1]\bigr) \geq1- \epsilon . $$

It remains to set \(\gamma = \sum_{k=1}^{q-1} \gamma _{k}\), r′=r q−1 and u=2u q−1. □

4.4 Coalescence of Proportions Vector Chains

The next lemma completes the coupling of the proportions chains.

Lemma 4.5

Fix β<q/2. For all r,u,ϵ>0 there exists γ>0 such that if \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}\) satisfy \((\sigma_{0}, \widetilde {\sigma}_{0}) \in \mathcal {H}_{u, r}\) and tγn, then

$$\mathbb {P}^{SC}_{\sigma_0, \widetilde {\sigma}_0} (S_{t} = \widetilde {S}_{t} )\geq1-\epsilon , $$

where under \(\mathbb {P}^{SC}_{\sigma_{0}, \widetilde {\sigma}_{0}}\), the processes (σ t ) t≥0, \((\widetilde {\sigma}_{t})_{t \geq0}\) evolve according to the synchronized coupling, as defined in Sect3.3.

Proof

If ρ is small enough, it follows from Lemma 3.4 that

$$\mathbb {E}^{\mathrm {SC}, {\rho}}_{\sigma_0, \widetilde {\sigma}_0} \| S_t - \widetilde {S}_t \|_{1} \leq \biggl (1 - \frac{1- 2\beta /q}{2n} \biggr )^t \frac{u}{n}. $$

Combined with Proposition 3.3 part (1), this implies that there exists γ=γ(u) such that \(\mathbb {E}^{SC}_{\sigma_{0}, \widetilde {\sigma}_{0}} \| S_{t} - \widetilde {S}_{t} \|_{1} \leq\frac {\epsilon }{n}\) for tγn. Then by Markov’s inequality:

$$ \mathbb {P}^{SC}_{\sigma_0, \widetilde {\sigma}_0} ( S_{t} \neq \widetilde {S}_{t} ) = \mathbb {P}^{SC}_{\sigma_0, \widetilde {\sigma}_0} \bigg( \| S_{t} - \widetilde {S}_{t} \|_{1} > \frac {1}{n} \bigg) \leq \epsilon . $$

 □

4.5 Basket-Wise Proportions Coalescence

The next coupling will allow us to turn a well-mixed proportions chains into a well-mixed configurations chain. Let \(\mathfrak {B}= (\mathcal {B}_{m})_{m=1}^{q}\) be a partition of [1,n]. We shall refer to \(\mathcal {B}_{m}\) as a basket and call \(\mathfrak {B}\) a λ-partition if \(|\mathcal {B}_{m}| > \lambda n\) for all m. Given σΣ n , let S(σ) denote a q×q matrix whose (m,k) entry is equal to the proportion in σ of color k in basket m, namely

S is an element of \(\mathbb {S}\triangleq \prod_{m=1}^{q} \mathcal {S}\) and we define \(\mathbb {S}^{\rho} = \prod_{m=1}^{q} \mathcal {S}^{\rho}\) and \(\mathbb {S}^{\rho+} = \prod _{m=1}^{q} \mathcal {S}^{\rho+}\). We also let \(\mathcal {B}_{[m_{0}, m_{1}]} = \bigcup_{m_{0} \leq m \leq m_{1}} \mathcal {B}_{m}\) and as before use S t as a shorthand for S(σ t ). The following is an analogue of Lemma 4.1 for the basket proportions matrix.

Lemma 4.6

Let \(\mathfrak {B}\) be a λ-partition for some λ>0. If either of the following holds:

  1. (1)

    \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}}\) and \(t^{\alpha _{1}}(n) \leq t \leq \mathrm{e}^{\gamma (\rho _{0}) n}\) where ρ 0, γ(ρ 0) are given in Proposition 3.7 and \(t^{\alpha _{1}}(n)\) is defined in (4.1).

  2. (2)

    \(\mathbf {S}_{0} \in \mathbb {S}^{\frac{r_{0}}{\sqrt{n}}}\) and tγ 0 n for some r 0,γ 0>0,

then

$$\mathbb {P}_{\sigma_0} \bigl (\mathbf {S}_t \notin \mathbb {S}^{\frac{r}{\sqrt{n}}} \bigr )= O\bigl(r^{-2}\bigr), $$

where the O(r −2) term is as r→∞, uniformly in n.

In order to prove Lemma 4.6, we use the following proposition to bound the second moment of the basket proportions matrix.

Proposition 4.7

If \(\mathfrak {B}\) is a λ-partition for λ>0 and m,k∈[1,q], then

$$ \mathbb {E}\bigl[\bigl(\mathbf {S}^{m,k}_{t+1} - S^k_{t+1}\bigr)^2 \mid \mathcal {F}_t \bigr] = \biggl(1- \frac{2}{n}\biggr)\bigl(\mathbf {S}^{m,k}_t - S^k_t\bigr)^2+ O \biggl(\frac {1}{n^2}\biggr). $$
(4.10)

Proof

Let \(\lambda _{0} = |\mathcal {B}_{m}|/n \geq \lambda > 0\) and set \(\mathbf {Q}^{m,k}_{t} = \mathbf {S}^{m,k}_{t} - S^{k}_{t}\). Then:

(4.11)

where (denoting by V t+1 the chosen vertex at step t+1):

Plugging these into (4.11), we obtain

$$\mathbb {E}\bigl[\bigl(\mathbf {S}^{m,k}_{t+1} - S^k_{t+1}\bigr)^2 \mid \mathcal {F}_t \bigr] = \biggl(1- \frac {2}{n}\biggr)\bigl(\mathbf {S}^{m,k}_t - S^k_t\bigr)^2+ O\biggl(\frac {1}{n^2}\biggr) $$

as required. □

Proof of Lemma 4.6

Taking expectation in (4.10) and applying (3.19) one gets:

$$\mathbb {E}_{\sigma_0} \bigl [\mathbf {S}^{m,k}_{t} - S_{t}^k \bigr ]^2 \leq \biggl (1-\frac{2}{n} \biggr )^t \mathbb {E}_{\sigma_0} \bigl [\mathbf {S}^{m,k}_0 - S_0^k \bigr ]^2 + O\bigl(n^{-1}\bigr). $$

In both cases (note that \(\alpha _{1}(\beta , q) > \frac {1}{2}\) for all β<q/2), it implies \(\mathbb {E}_{\sigma_{0}} [\mathbf {S}^{m,k}_{t} - S_{t}^{k} ]^{2} = O(n^{-1})\). Summing over all m and k and using Markov’s inequality we get

$$\mathbb {P}_{\sigma_0} \biggl (\,\sum_{m \leq q} \bigl \| \mathbf {S}^{m}_t - S_t \bigr \|_{2} > r/(2\sqrt {n}\,) \biggr )= O\bigl(r^{-2}\bigr). $$

Now, by Proposition 3.7 Case (1) and Proposition 3.3 Case (2) we also have that

$$\mathbb {P}_{\sigma_0} \bigl (\| S_t - \widehat {\mathsf {e}}\|_{2} > r/(2\sqrt{n}\,) \bigr )= O\bigl(r^{-2}\bigr). $$

Combining the two, we complete the proof. □

Suppose now that you have two initial configurations \(\sigma_{0}, \widetilde {\sigma }_{0}\), such that \(s_{0} = \widetilde {s}_{0}\). The following is a coupling under which eventually (with probability 1) also \(\mathbf {S}_{t} = \widetilde {\mathbf {S}}_{t}\). Equality is achieved one basket at a time, indexed below by m and once the proportions in a basket are equated they remain so. We shall call this coupling Basket-wise Coupling and denote by ℙBC the underlying probability measure.

  1. (1)

    Set t=0, m=1.

  2. (2)

    As long as mq:

    1. (a)

      As long as \(\mathbf {S}_{t}^{m} \neq \widetilde {\mathbf {S}}_{t}^{m}\):

      1. (i)

        Choose “old” color I t+1 according to distribution \(S_{t} = \widetilde {S}_{t}\).

      2. (ii)

        Choose “new” color J t+1 according to distribution \(g_{\beta } (S_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )= g_{\beta } (\widetilde {S}_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )\).

      3. (iii)

        Choose a vertex V t+1 uniformly among all vertices in [1,n] having color I t+1 under σ t .

      4. (iv)

        Choose \(\widetilde {V}_{t+1}\):

        1. (A)

          If \(V_{t+1} \in \mathcal {B}_{m_{0}}\) for m 0<m, choose \(\widetilde {V}_{t+1}\) uniformly among all vertices in \(\mathcal {B}_{m_{0}}\) having color I t+1 under \(\widetilde {\sigma}_{t}\).

        2. (B)

          Otherwise, if \(\mathbf {S}_{t}^{m, I_{t+1}} \neq \widetilde {\mathbf {S}}_{t}^{m, I_{t+1}}\) and \(\mathbf {S}_{t}^{m, J_{t+1}} \neq \widetilde {\mathbf {S}}_{t}^{m, J_{t+1}}\), choose \(\widetilde {V}_{t+1}\) uniformly among all vertices in \(\mathcal {B}_{[m, q]}\) having color I t+1 under \(\widetilde {\sigma}_{t}\).

        3. (C)

          Otherwise, let v 1,v 2,… be an enumeration of the vertices in \(\mathcal {B}_{[m, q]}\) having color I t+1 under σ t ordered first by the index of the basket they belong to and then by their index in V and let \(\widetilde {v}_{1}, \widetilde {v}_{2}, \dots\) be the same for \(\widetilde {\sigma}_{t}\). Then set \(\widetilde {V}_{t+1}=\widetilde {v}_{i}\) where i is such that V t+1=v i .

      5. (v)

        Set \(\sigma_{t+1}(V_{t+1}) = \widetilde {\sigma}_{t+1}(\widetilde {V}_{t+1}) = J_{t+1}\) and t=t+1.

    2. (b)

      Set m=m+1.

The following lemma gives an upper bound for the time of basket-wise proportions coalescence.

Lemma 4.8

Fix β<q/2. For any λ>0, r>0, ϵ>0, there exists γ=γ(λ,r,ϵ), such that for any λ-partition and any \(\sigma_{0}, \widetilde {\sigma_{0}}\) such that \(S_{0} = \widetilde {S}_{0}\) and \(\mathbf {S}_{0}, \widetilde {\mathbf {S}}_{0} \in \mathbb {S}^{\frac{r}{\sqrt{n}}}\),

$$ \mathbb {P}^{BC}_{\sigma_0, \widetilde {\sigma}_0} (\mathbf {S}_{\gamma n} = \widetilde {\mathbf {S}}_{\gamma n} )\geq1 - \epsilon . $$
(4.12)

Proof

From the definition of the coupling, once the proportions of basket m have coalesced they will remain equal forever. It suffices, therefore, to analyze the coalescence time of each basket separately. Note also that the coupling preserves the equality \(S_{t} = \widetilde {S}_{t}\) for all t≥0.

Define \(\mathbf {W}_{t} = \mathbf {S}_{t} - \widetilde {\mathbf {S}}_{t}\), \(W^{m}_{t} = \| \mathbf {W}^{m}_{t} \|_{1}\) and let τ (0)=0 and \(\tau^{(m)} = \min\{t \geq\tau^{(m-1)}: W^{m}_{t} = 0\}\) for m∈[1,q]. Also set

for ρ>0 sufficiently small and \(\tau^{(m)}_{*} = \tau^{(m)} \land \tau_{*}\). We claim that for all m, \((W^{m}_{t})_{t \geq0}\) is a supermartingale between τ (m−1) and τ (m) as long as τ is not reached. In order to see this, fix m, t and assume \(\{\tau^{(m-1)} \leq t < \tau^{(m)} \; ,\, \tau _{*} > t\}\). Then at step (2(a)iv), according to the coupling, there are 3 cases:

  1. (A)

    Clearly \(\mathbf {S}^{m}_{t+1} = \mathbf {S}^{m}_{t}\) and \(\widetilde {\mathbf {S}}^{m}_{t+1} = \widetilde {\mathbf {S}}^{m}_{t}\) and hence \(W^{m}_{t+1} = W^{m}_{t}\).

  2. (B)

    Notice that in this case, we have \(\mathbf {W}_{t}^{m, I_{t+1}} \mathbf {W}_{t+1}^{m, I_{t+1}} \geq0\) and \(\mathbf {W}_{t}^{m, J_{t+1}} \mathbf {W}_{t+1}^{m, J_{t+1}} \geq0\). Therefore,

  3. (C)

    If \(V_{t+1}, \widetilde {V}_{t+1} \in \mathcal {B}_{m}\) or \(V_{t+1}, \widetilde {V}_{t+1} \notin \mathcal {B}_{m}\), then \(W^{m}_{t+1} = W^{m}_{t}\), otherwise from the construction we must have:

    $$\bigl |\mathbf {W}^{m, I_{t+1}}_{t+1}\bigr| - \bigl |\mathbf {W}^{m, I_{t+1}}_{t}\bigr| = - \frac {1}{|\mathcal {B}_m|} , $$

    as well as

    $$\bigl |\mathbf {W}^{m, J_{t+1}}_{t+1}\bigr| - \bigl |\mathbf {W}^{m, J_{t+1}}_{t}\bigr| \leq \frac {1}{|\mathcal {B}_m|} . $$

    Summing these two, we obtain a non-positive drift for \(W^{m}_{t}\).

Observe that as long as τ is not reached, both \(\operatorname {\mathbb {V}\mathrm {ar}}^{BC}(W^{m}_{t+1} | \mathcal {F}_{t})\) under Case (B) and the probability that this case happens are bounded below uniformly in n and t. This gives a uniform lower bound on the variance \(\operatorname {\mathbb {V}\mathrm {ar}}^{BC}(W^{m}_{t+1} | \mathcal {F}_{t})\). Furthermore, if for some t, we have S t , \(\widetilde {\mathbf {S}}_{t} \in \mathbb {S}^{\frac{r^{\prime}}{\sqrt{n}}}\), then in view of Lemma 4.6 after γn time, we have S t+γn , \(\widetilde {\mathbf {S}}_{t+\gamma ^{\prime} n} \in \mathbb {S}^{\frac{r^{\prime\prime}}{\sqrt{n}}}\) with probability 1−O((r′′)−2). Therefore, using Lemma 2.3 we may find γ 1,…,γ q−1 such that inductively, conditioned on \(\tau ^{(m-1)}_{*} \leq \gamma _{m-1} n\) with probability at least 1−ϵ/(2q) we have \(\tau^{(m)}_{*} \leq \gamma _{m} n\). This in turn implies that \(\tau ^{(q-1)}_{*} \leq \gamma n\) with probability at least 1−ϵ/2, where γγ q−1.

It remains to bound τ below with high probability. Let \(B^{m,j} = \bigcup_{t = 1}^{\gamma n}\{ |\mathbf {S}_{t}^{m, j} - 1/q | \geq \rho\}\) and

Using Lemma 4.6 we obtain that

$$\mathbb {E}_{\sigma_0,\widetilde {\sigma}_0}^{BC} \bigl[Y^{m,j}\bigr] \leq \gamma n O\biggl(\frac {1}{n}\biggr) = O(\gamma ). $$

Then as B m,j implies that \(Y^{m,j} > \frac{n\lambda\rho}{2}\),

$$\mathbb {P}_{\sigma_0,\widetilde {\sigma}_0}^{BC} \bigl(B^{m,j}\bigr) \leq \mathbb {P}_{\sigma_0,\widetilde {\sigma }_0}^{BC}\biggl(Y^{m,j} \geq\frac{n \lambda \rho}{2}\biggr) \leq\frac {2\mathbb {E}_{\sigma_0,\widetilde {\sigma}_0}^{BC}[Y^{m,j}]}{n \lambda \rho} = O \bigl(n^{-1}\bigr) . $$

Summing over all m, j and arguing the same for \(\widetilde {\mathbf {S}}_{t}\) we obtain

$$\mathbb {P}_{\sigma_0,\widetilde {\sigma}_0}^{BC} (\tau_* < \gamma n) = O\bigl (n^{-1}\bigr). $$

Finally by a union bound we have \(\mathbb {P}_{\sigma_{0},\widetilde {\sigma}_{0}}^{BC} (\tau^{(q)} \leq\gamma n) \geq1-\frac {\epsilon }{2} + O(n^{-1})\) as desired. □

4.6 The Overall Coupling

We now describe precisely how the previous couplings are combined together to create the overall coupling. This coupling will be the main tool in proving the upper bound. Formally, let γ 1,γ 3,γ 4,γ 5 and y 1,…,y q−1 be positive numbers and σ 0Σ n . The overall coupling with parameters γ 1,…,γ 5, y 1,…,y q−1 and initial configuration σ 0 is a coupling of two chains (σ t ) t , \((\widetilde {\sigma}_{t})_{t}\) under measure \(\mathbb {P}^{OC}_{\sigma_{0}}\). The initial configuration for (σ t ) t is σ 0, while \(\widetilde {\sigma }_{0}\) is chosen according to μ n . Then, the two processes evolve as follows.

  1. (1)

    Run σ t and \(\widetilde {\sigma}_{t}\) independently until time t (1)(n)=γ 1 n.

    1. (1A)

      Partition the vertex set [1,n] into baskets \(\mathfrak {B}= (\mathcal {B}_{1}, \dots, \mathcal {B}_{q})\) such that for k∈[1,q].

  2. (2)

    Run σ t and \(\widetilde {\sigma}_{t}\) independently (again) until time \(t^{(2)}(n) = t^{(1)} + t^{\alpha _{1}}(n)\) time where \(t^{\alpha _{1}}(n)\) is defined in (4.1).

  3. (3)

    Run σ t and \(\widetilde {\sigma}_{t}\) according to the coordinate-wise coupling with parameters y 1,…,y q−1 until time t (3)(n)=t (2)(n)+γ 3 n (unless stopped before).

  4. (4)

    Run σ t and \(\widetilde {\sigma}_{t}\) according to the synchronized coupling until time t (4)(n)=t (3)(n)+γ 4 n time.

  5. (5)

    Run σ t and \(\widetilde {\sigma}_{t}\) according to the basket-wise coupling for t (5)(n)=t (4)(n)+γ 5 n time with the baskets above.

4.7 Proof of Upper Bound in Theorem 1

We will now use the overall coupling with appropriate parameters to establish the upper bound of the mixing time. Recall that β<β s (q). Fix ϵ>0, pick ρ>0 small enough and let σ 0 be any initial configuration. By Proposition 3.3 part (3), we can choose γ 1 large enough such that

$$ \mathbb {P}_{\sigma_0}^{OC} \bigl (S_{t^{(1)}(n)} \in \mathcal {S}^{\rho} \bigr )\geq 1-\epsilon . $$
(4.13)

Assuming that this event indeed occurred, \(\mathfrak {B}\) is a \(( \frac {1}{q} - \rho )\)-partition and provided that ρ is small enough, the conditions in Lemma 4.1 are satisfied. From the latter we conclude that for some r>0, with probability at least 1−ϵ, \(S_{t^{(2)}(n)} \in \mathcal {S}^{\frac{r}{\sqrt{n}}}\). On the other hand, as in (4.4) with probability at least 1−2ϵ we also have \(\widetilde {S}_{t^{(2)}(n)} \in \mathcal {S}^{\frac{r}{\sqrt{n}}}\) if r is large enough. Then Corollary 4.4 and Lemma 4.5 ensure that there exist y 1,…,y q−1 and γ 3, γ 4, such that \(S_{t^{(4)}(n)} = \widetilde {S}_{t^{(4)}(n)}\) with probability at least 1−3ϵ. From Lemma 4.6 we have that \(\mathbf {S}_{t^{(4)}(n)}, \widetilde {\mathbf {S}}_{t^{(4)}(n)} \in \mathbb {S}^{\frac{r^{\prime }}{\sqrt{n}}}\) with probability at least 1−4ϵ for some r′>0. Then, by Lemma 4.8 we may choose γ 5 such that \(\mathbf {S}_{t^{(5)}(n)} = \widetilde {\mathbf {S}}_{t^{(5)}(n)}\) with probability at least 1−5ϵ.

Now, by symmetry, for any tt (1)(n) the distribution of σ t , given \(\mathcal {F}_{t^{(1)}(n)}\), is invariant under permutations of the vertices in each basket of \(\mathcal {B}\) and the same is clearly true for μ n . Therefore we conclude that

Then from Jensen’s inequality we obtain

(4.14)

Now \(t^{(5)}(n)=t^{\alpha _{1}}_{\gamma }(n)\) (as defined in (4.1)) with γ=γ 1+γ 3+γ 4+γ 5 and since σ 0 is arbitrary and ϵ can be made arbitrarily small, by choosing γ large enough, this establishes the upper bound for the cutoff.

5 Mixing in the Supercritical Regime

5.1 Proof of Theorem 3

We first give the proof for the case β s <β<β c . Recall (Sect. 2.2) that \(\beta _{c}(q) = \frac{(q-1)}{q-2}\log(q-1)\) for q≥3 and β c (2)=1. We claim that for q≥3

$$ \beta _c(q) \big(1- 1/q\big) < \beta _c(q-1). $$
(5.1)

It can be checked for q=3 and for q≥4, it suffices to prove that \(f(x) := \frac{x-1}{x (x-2)} \log(x-1)\) is decreasing in x on [3,∞). We compute the derivative and obtain that

$$f'(x) = -\frac{1}{x(x-2)}\biggl(\frac{x^2 -2x+ 2}{x(x-2)} \log(x-1) -1\biggr ), $$

which is negative for x≥3.

Now fix δ>0 and notice that if s 1∈[1/q,1−δ], conditional on {S 1=s 1}, (S i/(1−s 1):2≤iq) is distributed as the proportions vector for the (q−1)-states Curie-Weiss Potts model on (1−s 1)n vertices with β′=β(1−s 1)≤(1−1/q)β≤(1−1/q)β c (q)<β c (q−1). Therefore, for all δ 1>0

$$ \mu_n \biggl(\exists2\leq i \leq q \mbox{ such that } \biggl|S^i - \frac{1-s^1}{q-1}\biggr| \geq \delta _1 \mid S^1 = s^1\biggr) \to0 $$
(5.2)

as n→∞ uniformly in s 1∈[1/q,1−δ]. Also uniformly in \(s \in \mathcal {S}_{n}\), recall that:

$$\mathbb {E}\bigl [S^1_{t+1} - S^1_t \mid S_t=s \bigr ]= \frac{1}{n} d_{\beta }(s) + O \bigl (n^{-2} \bigr ), $$

where \(d_{\beta }(s) = -s^{1} + g_{\beta }^{1}(s)\). It now follows from the uniform continuity of d β (s) and (5.2) that uniformly in s 1∈[1/q,1−δ],

$$\mathbb {E}_{\mu_n}\bigl[S_{t+1}^1 - S_t^1 \mid S_t^1 = s^1\bigr] = \frac {1}{n} \bigl(D_\beta \bigl(s^1\bigr)+ o(1)\bigr), $$

where \(D_{\beta }(s^{1}) = d_{\beta }(s^{1}, \frac{1-s^{1}}{q-1}, \dots, \frac {1-s^{1}}{q-1} )\).

Now if β>β s (q), from Proposition 3.1, there exists δ 2, such that D β (s 1) is uniformly positive in a δ 2-neighborhood of s (β). All together we infer that there exists ϵ>0 such that uniformly in s 1∈(s (β)−δ 2,s (β)+δ 2) for all n large enough:

$$\mathbb {E}_{\mu_n}\bigl[S_{t+1}^1 - S_t^1 \mid S_t^1 = s^1\bigr] \geq \frac{\epsilon }{n}, $$

and also

$$\mathbb {P}_{\mu_n}\biggl(S_{t+1}^1 = s_1^1 + \frac{j}{n} \Bigm| S_t^1 = s_1^1\biggr) - \mathbb {P}_{\mu_n}\biggl(S_{t+1}^1 = s_1^1 + \frac {1}{n} + \frac{j}{n} \Bigm| S_t^1 = s_1^1 + \frac {1}{n}\biggr) = o(1), $$

for j∈−1,0,1 where the last inequality follows from the concentration of the conditioned measure as well as the continuity of the probability to stay put. These two formulas together imply that

$$\mathbb {P}_{\mu_n}\biggl(S_{t+1}^1 = s_1^1 + \frac{1}{n} \Bigm| S_t^1 = s_1^1\biggr) \geq \lambda \mathbb {P}_{\mu_n}\biggl(S_{t+1}^1 = s_1^1 \Bigm| S_t^1 = s_1^1 + \frac{1}{n}\biggr), $$

for some fixed constant λ>1 for all s 1∈(s (β)−δ 2,s (β)+δ 2) when n is sufficiently large. Since (S t ) t≥0 is a reversible Markov chain, with μ n its stationary measure,

$$\mathbb {P}_{\mu_n}\biggl(S_{t+1}^1 = s_1^1 + \frac{1}{n}, S_t^1 = s_1^1\biggr) = \mathbb {P}_{\mu _n}\biggl(S_{t+1}^1 = s_1^1, S_t^1 = s_1^1 + \frac{1}{n}\biggr), $$

and therefore, for all s 1∈(s (β)−δ 2,s (β)+δ 2)

$$\mu_n\biggl(S^1 = s^1 + \frac{1}{n}\biggr) \geq \lambda \mu_n \bigl(S^1 = s^1\bigr), $$

and hence

$$ \mu_n\bigl(S^1 = s^*(\beta )+ \delta_2\bigr) \geq \lambda ^{2 \delta_2 n} \mu_n\bigl(S^1 = s^*(\beta ) - \delta_2\bigr). $$
(5.3)

Now select the set A={S 1s (β)−δ 2}. By (5.3), \(\frac{\mu_{n}(\partial_{P_{n}} A)}{\mu_{n}(A)} \leq \lambda ^{-\delta_{2} n}\), where

and P n is the transition kernel of the Glauber dynamics. Since β<β c (q) and s (β)−δ 2>1/q we also have μ n (A)=o(1) as n→∞. Therefore Cheeger’s inequality (Theorem 2.5) immediately implies an exponential lower bound on the mixing time.

The case ββ c (q) is simpler. As the large deviations analysis in Sect. 2.2 shows, we may find \(A = \{ \| S - \check{s}_{\beta , q} \|_{2} < \delta \}\), where \(\check{s}_{\beta , q}\) is defined in (2.4) and δ>0 is small enough such that \(\limsup_{n \to\infty} n^{-1} \log\mu_{n}(\partial_{P_{n}} A)<0\) and lim inf n→∞ n −1logμ n (A)=0. Since symmetry implies μ n (A)≤1/q (if δ is sufficiently small), exponential mixing time follows immediately from another application of Cheeger’s inequality (Theorem 2.5).

6 Mixing Near Criticality

We now assume β(n)=β s (q)−ξ(n), with ξ(n)→0 as n→∞. Once β(n) approaches β s with n, we no longer have a uniform negative upper bound on the drift to the right of 1/q for each coordinate. Instead, near s (β), the drift will be of order ξ(n), possibly even positive and hence it will take longer than linear time to get close to \(\widehat {\mathsf {e}}\) and this may have an effect on the order of the mixing time and cutoff window. Accordingly, in addition to the coalescence time analysis near \(\widehat {\mathsf {e}}\), one has to obtain sharp asymptotics for the passage time near s (β). This is achieved using several propositions which we state in Sect. 6.1. Their proofs will be deferred until the end of the section in favor of first showing how they are used along with the previous coalescence analysis to find the mixing time near criticality which gives the proof of Theorem 2.

Both the analysis and the results in Theorem 2 are qualitatively different, depending on whether ξ(n) decays faster or slower than some threshold rate. Accordingly, we shall distinguish between two regimes and write:

$$ \everymath{\displaystyle }\begin{array}{ll} \xi\in \mbox {[CR]} &\quad \mbox {if } \lim_{n \to\infty} n^{2/3} \xi(n) = \infty,\ \xi (n)=o(1) \cr\noalign{\vspace{3pt}} \xi\in \mbox {[NCR]} &\quad \mbox {if } 0 \leq\liminf_{n\to \infty} n^{2/3} \xi(n) \leq \limsup_{n\to\infty} n^{2/3} \xi(n) < \infty \end{array} $$
(6.1)

([CR] stands for Cutoff Regime and [NCR] stands for No-Cutoff Regime). For a>0, define also

$$ t^{\xi, a} _{\gamma }(n) = \left\{ \begin{array}{ll} \frac{\pi}{\sqrt{a}} \frac{n}{\sqrt{|\xi(n)|}} + \gamma \bigl (\frac{n^{1/2}}{|\xi(n)|^{5/4}} \lor n \bigr )& \quad \mbox {if } \xi\in \mbox {[CR]} \cr\noalign{\vspace{3pt}} \mathrm{e}^{\gamma } n^{4/3} & \quad \mbox {if } \xi\in \mbox {[NCR]}. \end{array} \right. $$
(6.2)

Both (6.1) and (6.2) will be used for sequences other than ξ as well. We shall also employ the following notation for hitting times. Given a real-valued process (X t ) t≥0 and a number x∈ℝ we shall write

for the right and left hitting time of X at x. Notice that this notation does not carry an indication for the process for which \(\tau _{x}^{+}\) is a hitting time and in case this is not clear from the context, it will be mentioned explicitly.

6.1 Drift Analysis Near s (β)

The following proposition states several properties of the function D β near s (β).

Proposition 6.1

For all q≥3 the following holds:

  1. (1)

    The point s (β s ) is the unique \(s\in(\frac{1}{q},1]\) such that \(D_{\beta _{s}}(s)=0\).

  2. (2)

    For k=0,1,…, the functions \(D^{*}_{k}(\beta ) \triangleq \frac{d^{k}}{ds^{k}} D_{\beta }(s^{*}(\beta ))\) are C in a neighborhood of β s . Furthermore:

    • \(\frac{d}{d \beta } D^{*}_{0}(\beta _{s}) > 0\).

    • \(D^{*}_{2}(\beta _{s}) < 0\).

  3. (3)

    For all ρ>0, there exists δ>0 such that:

    (6.3)

The next lemma gives sharp asymptotics for the passage time near 0 for a process with certain drift assumptions near 0 (given by (6.4) below). The one coordinate process will fall into this category if we analyze it near s (β).

Formally, let \(((Z_{t})_{t \geq0}^{n} \ ; \; n \geq0 )\) be a sequence of discrete time processes. For all n, suppose that \((Z_{t} )_{t \geq0} = (Z_{t} )_{t \geq 0}^{n}\) is adapted to \((\mathcal {F}_{t} )_{t \geq0}^{n}\), satisfies n|Z t+1Z t |∈{−1,0,1} with probability 1, and

$$ \mathbb {E}[Z_{t+1}-Z_t | \mathcal {F}_t] = \frac{1}{n} \bigl (\zeta (n) + a Z_t^2 + b Z_t^3 + O \bigl (\zeta (n) Z_t^2 + Z_t^4 \bigr )\bigr )$$
(6.4)

where a>0, b∈ℝ and ζ(n) is a sequence satisfying ζ(n)→0 as n→∞. We allow both ζ∈[CR] and ζ∈[NCR], but in the latter case, we assume in addition the existence of d>0 such that for all n

$$ \operatorname {\mathbb {V}\mathrm {ar}}\bigl[n(Z_{t+1} - Z_t)| \mathcal {F}_t\bigr] \geq d. $$
(6.5)

Write \(\mathbb {P}_{z_{0}}\) for the probability measure under which this process is defined and starts from z 0.

Lemma 6.2

Fix ρ>0 sufficiently small. Then for z 0=−ρ there exist functions L ,U :(−∞,∞)→[0,1] satisfying lim γ→−∞ L (γ)=lim γ→∞ U (γ)=0 such that for all γ,

(6.6)
(6.7)

where \(\tau_{\rho}^{+}\) is a hitting time for Z. Moreover, if ζ∈[NCR] we can chose U such that for all γ we have

$$ U^*(\gamma ) < 1. $$
(6.8)

Remark 6.3

The upper (lower) bound in the lemma still holds if (Z t ) t≥0 satisfies (6.4) with ≥ (≤) in place of the equality sign or if in place of z 0=−ρ we have z 0≥−ρ (z 0≤−ρ). Since (Z t ) t≥0 has \(0, \pm \frac {1}{n}\) steps this can be shown by a simple coupling argument.

The next proposition shows that the drift of one coordinate stays close to its upper bound D β (⋅) for sufficiently long time. More precisely, for σ 0Σ n , t>0, δ>0, y∈[0,1] let

$$ K_n(\sigma_0, t, y, \delta ) = \mathbb {P}_{\sigma_0} \Bigl(\,\max_{0\leq \theta \leq\min\{t,\tau_y^- \}} D_\beta \bigl(S_{\theta }^1\bigr) - n \mathbb {E}_{\sigma_0} \bigl [S_{\theta +1}^1- S_{\theta }^1\mid\mathcal{F}_{\theta } \bigr ]> \delta \Bigr), $$
(6.9)

where \(\tau_{y}^{-}\) is a hitting time for \(S^{1}_{t}\). Then,

Proposition 6.4

Suppose that βq/2 and set σ 0≡1. Then for any \(y > \frac {1}{q}\):

  1. (1)

    If t(n)=o(n 2) and δ(n)n 2 t(n)−1→∞ then lim n→∞ K n (σ 0,t(n),y,δ(n))=0.

  2. (2)

    If t(n)=γn 4/3 then for all δ>0 we have

    $$\lim_{\gamma \to0} \limsup_{n \to\infty} K_n\bigl(\sigma_0, t(n), y, \delta n^{-2/3}\bigr) = 0. $$

6.2 Proof of Theorem 2

6.2.1 Upper Bound on Mixing Time

Fix ρ>0 small enough and let σ 0Σ n be given. By Proposition 6.1 part (1), we can find δ>0 so that

(6.10)

where we use s in place of s (β). Then by Lemma 2.1 part (1), we have that,

$$\mathbb {P}_{\sigma_0} \bigl (\tau_{(s^* + \rho)}^- > (2 /\delta ) n \bigr )= o(1) $$

where this and all hitting times below are of \(S_{t}^{1}\). Define now \(Z_{t}=s^{*} - S^{1}_{t+\tau_{(s^{*}+\rho)}^{-}}\). Using (3.1), (3.3) Proposition 6.1 and applying Taylor’s expansion for D β (s) around s and then again for \(D_{0}^{*}(\beta )\) around β s , we infer that there exist a>0, α≠0, b∈ℝ such that

$$\mathbb {E}[Z_{t+1}-Z_t | \mathcal {F}_t] \geq \frac{1}{n} \bigl (\zeta (n) + a Z_t^2 + b Z_t^3 + O \bigl (\zeta (n) Z_t^2 + Z_t^4 \bigr )\bigr ), $$

where ζ(n)=αξ(n)+O(ξ(n)2+n −1) and also (6.5) holds (if needed), since the probability of choosing any new color at time t+1 is bounded above and below, uniformly in n and S t . Hence by Lemma 6.2 and Remark 6.3, for all γ

$$\mathbb {P}_{\sigma_0} \bigl (\tau_{(s^*-\rho)}^- - \tau_{(s^*+\rho)}^- > t^{\zeta , a}_{\gamma }(n) \bigr )\leq U^*(\gamma ) + o(1). $$

Now, using the relation between ζ(n) and ξ(n), it is not difficult to verify that \(t^{\zeta , a}_{\gamma }(n) \leq t^{\xi, a^{\prime}}_{\gamma ^{\prime}}(n)\) for all γ′, where a′=αa and γ=F(γ′) for some F such that γ→∞ if γ′→∞.

From Lemma 2.1 part (3), applied to the process \((S^{1}_{t+\tau_{(s^{*}-\rho)}^{-}} - (s^{*} - \rho ): \; t \geq0 )\), it follows that with 1−o(1) probability \(S^{1}_{t+\tau_{(s^{*}-\rho )}^{-}}\) stays to the left of s ρ/2 for all t<n 2. Then we may apply Lemma 2.1 part (1) to the process to conclude

$$\mathbb {P}_{\sigma_0} \bigl (\tau_{(q^{-1} + \rho/2)}^- - \tau_{(s^*-\rho)}^- > (2/\delta ) n \bigr )=o(1). $$

Finally another application of Lemma 2.1 part (3) gives \(S^{1}_{t+\tau_{(q^{-1}+\rho/2)}^{-}} \leq1/q + \rho\) for all t<n 2 with 1−o(1) probability. For the [CR] case, we use union bound (over all coordinates):

$$ \mathbb {P}_{\sigma_0} \bigl (S_{t_{\gamma ^{\prime}}^{\xi, a^{\prime}}(n)} \notin \mathcal {S}^{\rho+} \bigr )\leq q U^*(\gamma ) +o(1) . $$
(6.11)

For the [NCR] case, define \(\tau^{(1)} = \tau_{(q^{-1}+\rho/2)}^{-}\) and for k>1. Then, by inductive conditioning we obtain

Since also \(S^{k}_{t+\tau^{(k)}} \leq1/q + \rho\) for all k∈[1,q], t<n 2 with 1−o(1) probability, we arrive to

$$ \mathbb {P}_{\sigma_0} \bigl (S_{t_{\gamma ^{\prime}}^{\xi, a^{\prime}}(n)} \notin \mathcal {S}^{\rho+} \bigr )\leq1- \bigl(1-U^*(\gamma )\bigr)^q +o(1) . $$
(6.12)

We now re-employ the overall coupling in Sect. 4.6, but in view of (6.11) and (6.12) we change step (1) and instead of running the two chains for γ 1 n time, we run them for \(t^{(1)}(n) = t_{\gamma ^{\prime}}^{\xi,a^{\prime}}(n)\). As (6.11), (6.12) show, we can choose γ′ large enough such that \(\mathbb {P}^{OC}_{\sigma_{0}} (S_{t_{\gamma ^{\prime}}^{\xi, a^{\prime}}(n)} \notin \mathcal {S}^{\rho+}) \leq \epsilon \) for n sufficiently large. The remaining steps in the coupling are left unchanged and we choose the same parameter values, as in the proof of Theorem 1.

Using the analysis of the modified step (1) given by (6.11) and (6.12), together with the analysis in Sect. 4.7 of the remaining steps—which carries over (uniformly in β near β s (q)), since it only required β<β c (q), we recover (4.14), namely

$$\bigl \|\mathbb {P}_{\sigma_0} (\sigma_{t^{(5)}(n)} \in\cdot )- \mu_n\bigr \|_{\mathrm {TV}} \leq6\epsilon . $$

The time is now given by

$$t^{(5)}(n) = t^{\alpha _1}_{\gamma }(n) + t^{\xi, a^{\prime}}_{\gamma ^{\prime}}(n), $$

for some γ>0. Since σ 0 is arbitrary and ϵ can be made arbitrarily small, by having γ, γ′ large enough, this completes the proof for the upper bound in (1.4) and (1.5) with \(\alpha _{2} = \pi /\sqrt {\alpha a}\).

6.2.2 No Cutoff in NCR Case

Using the modified overall coupling as introduced above, we obtain from (4.14), (6.12) and (6.8) for any γ′ and sufficiently large γ

$$\bigl \|\mathbb {P}_{\sigma_0} (\sigma_{t^{(5)}(n)} \in\cdot )- \mu_n\bigr \|_{\mathrm {TV}} \leq1-\epsilon , $$

for all σ 0, large enough n and some ϵ>0. Then, since in the [NCR] case

$$t^{(5)}(n) = t^{\alpha _1}_{\gamma }(n) + t^{\xi, a^{\prime}}_{\gamma ^{\prime}}(n) \leq t^{\xi, a^{\prime}}_{\gamma ^{\prime} + C}(n) , $$

this shows that there is no cut-off.

6.2.3 Lower Bound on Mixing Time.

Fix ρ>0 small enough and start with σ 0≡1—the all ‘1’ configuration. Define:

$$\delta (n) = \left\{ \begin{array}{ll} n^{-1}\xi(n)^{-1/2} A(n) \quad& \mbox {if } \xi\in \mbox {[CR]}, \cr\noalign{\vspace{3pt}} \delta _1 n^{-2/3} & \mbox {if } \xi\in \mbox {[NCR]}, \end{array} \right. $$

where A(n) is a sequence tending to ∞ sufficiently slowly and δ 1>0. Set

and define the process Y t which is equal to \(S_{t}^{1}\) up to time N, but after this time evolves like a birth-and-death processes with ±1/n increments and drift \(-n^{-1} D_{\beta }(S^{1}_{t})\). Then (Z t s Y t :t≥0) satisfies

$$\mathbb {E}[Z_{t+1}-Z_t | \mathcal {F}_t] \leq \frac{1}{n} \bigl (\zeta (n) + a Z_t^2 + b Z_t^3 + O \bigl (\zeta (n) Z_t^2 + Z_t^4 \bigr )\bigr ), $$

with a, b, α as in the upper bound case, but with ζ(n)=αξ(n)+δ(n)+O(ξ(n)2+n −1) and condition (6.5) holds (if needed) as before. Then, using Lemma 6.2 and Remark 6.3, we have for n large enough \(\mathbb {P}_{\sigma_{0}} (\tau_{\rho}^{+} < t^{\zeta , a}_{\gamma }(n) )\leq2 L^{*}(\gamma )\), where \(\tau_{\rho}^{+}\) is a hitting time for Z and γ∈ℝ. As before, it is not difficult to verify that if A(n) is increasing slowly enough, \(t^{\zeta , a}_{\gamma }(n) \geq t^{\xi, a^{\prime }}_{\gamma ^{\prime}}(n)\), where γ=F(γ′) satisfies γ→−∞ if γ′→−∞.

Now define and τ′ as \(\tau_{\rho}^{+}\), only with \(S_{t}^{1}\) in place of Y t . Then

where K n is defined in (6.9). Then, if ρ is sufficiently small, we can use (4.2) for \(\widehat {S}_{t}\) starting from time T to obtain for all r>0 and γ′′:

Using Proposition 6.4 for the middle term, the last inequality gives (4.3) with \(t^{\alpha _{1}}_{\gamma ^{\prime\prime}}(n) + t^{\xi, a^{\prime}}_{\gamma ^{\prime }}(n)\) in place of \(t^{\alpha _{1}}_{\gamma }(n)\). The remaining of the proof is identical to the subcritical case and this shows the lower bound for both parts of Theorem 2 with \(\alpha _{2} = \pi /\sqrt{\alpha a}\).

6.3 Proofs for Sect. 6.1

Proof of Proposition 6.1

First observe that for all β, \(D_{\beta }(\frac{1}{q})=0\) and for all \(s>\frac{1}{q}\),

$$\frac{d}{d\beta } D_{\beta }(s) = \frac{d}{d\beta } \biggl (-s + \frac{1}{1+(q-1)\mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})}} \biggr )> 0. $$

Now since D β (s) is smooth as a function of s and β and \(d_{0}(s)=-s+\frac{1}{q}\) it follows that β s >0. We have that

$$ \everymath{\displaystyle }\begin{array}{ll} \frac{d}{ds} D_{\beta }(s) &= -1 + \frac{2\beta q \mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})}}{(1+(q-1)\mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})})^2}, \cr\noalign{\vspace{3pt}} \frac{d^2}{ds^2}D_{\beta }(s) &= \frac{4\beta ^2 q^2 \mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})}(1-(q-1)\mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})})}{(q-1)(1+(q-1)\mathrm{e}^{-\frac{2\beta q}{q-1}(s-\frac{1}{q})})^3}, \end{array} $$
(6.13)

and so

$$ \frac{d}{ds} D_\beta (s) \bigg|_{s=\frac{1}{q}}=-1+\frac{2\beta }{q}, \qquad \frac{d^2}{ds^2} D_\beta (s) \bigg|_{s=\frac{1}{q}}=\frac {4\beta ^2 (q-2)}{q(q-1)}>0, $$
(6.14)

which implies that d q/2(s)>0 when \(s\in(\frac{1}{q},\frac{1}{q}+\epsilon )\) for some small ϵ. This implies that β s <q/2. It follows that

$$\frac{d}{ds} D_{\beta _s}(s) \bigg|_{s=\frac{1}{q}} < 0, $$

and so D β (s)<0 when β∈[β s ,β s +ϵ] and \(s\in (\frac{1}{q},\frac{1}{q}+\epsilon)\) for some small ϵ. It follows by compactness then that for some \(\frac{1}{q} + \epsilon\leq s^{*}(\beta _{s})\leq 1\) that \(D_{\beta _{s}}(s^{*}(\beta _{s}))=0\). By the definition of β s and since D β (s) is smooth we have that

$$ \frac{d}{ds} D_{\beta _s}(s) \bigg|_{s=s^*(\beta _s)}= 0 ,\qquad \frac{d^2}{ds^2}D_{\beta _s}(s) \bigg|_{s=s^*(\beta _s)} \leq0. $$
(6.15)

The equation \(\frac{d}{ds} D_{\beta _{s}}(s)= 0\) is equivalent to

$$2\beta q \mathrm{e}^{-\frac{2\beta _s q}{q-1}(s-\frac{1}{q})}=\bigl(1+(q-1)\mathrm{e}^{-\frac {2\beta _s q}{q-1}(s-\frac{1}{q})}\bigr)^2, $$

which is a quadratic equation in \(\mathrm{e}^{-\frac{2\beta _{s} q}{q-1}(s-\frac{1}{q})}\) and hence has at most 2 solutions which we denote s 1,s 2 with s 1<s 2. Since \(\frac{d}{ds} D_{\beta _{s}}(0)<0\) then \(D_{\beta _{s}}(s_{1})<0\) and so s (β s )=s 2. In particular this implies that s (β s ) is the unique \(s\in(\frac{1}{q},1]\) such that \(D_{\beta _{s}}(s)=0\). Also it follows that \(\frac{d}{ds} D_{\beta _{s}}(s)>0\) for s∈(s 1,s (β s )) and that there exists s′∈(s 1,s (β s )) such that \(\frac {d^{2}}{ds^{2}} D_{\beta _{s}}(s')=0\). Since by Eq. (6.13) there is at most one s such that \(\frac {d^{2}}{ds^{2}}D_{\beta _{s}}(s) = 0\) it follows that

$$\frac{d^2}{ds^2}D_{\beta _s}(s) \bigg|_{s=s^*(\beta _s)} < 0. $$

Hence by the Inverse Function Theorem s (β) is a smooth function of β when β is in a small neighborhood of β s . Then we have that

since \(\frac{d}{ds}D_{\beta _{s}}(s)|_{s = s^{*}(\beta _{s})} = 0\) which completes the proof of the second part.

We now turn to prove the third part. As we have observed \(D_{\beta _{s}}(s)\) is a smooth function satisfying

$$D_{\beta _s} \bigl(s^*(\beta _s)\bigr) = 0,\qquad \frac{d}{ds}D_{\beta _s}(s)\bigg|_{s = s^*(\beta _s)} = 0, \quad\mbox{and}\quad \frac{d^2}{ds^2}D_{\beta _s}(s)\bigg|_{s = s^*(\beta _s)} < 0. $$

Therefore, we deduce that for any ρ>0, there exists δ 1>0 such that

(6.16)

Since \(\frac{d^{2}}{ds^{2}}D_{\beta _{s}}(s) |_{s=s^{*}(\beta )} < 0\) for all ρ>0, there exists δ 2>0 such that

$$\bigl |s^*(\beta ) - s^*(\beta _s)\bigr| < \rho/2\quad \mbox{for all } |\beta - \beta _s| < \delta_2. $$

Combined with (6.16), it follows that

$$ D_{\beta _s}(s) < -2\delta_1, \quad \mbox{for all } \bigl |s - s^*(\beta )\bigr| \geq \rho\quad\mbox{and}\quad |\beta - \beta _s| < \delta_2. $$
(6.17)

Now that D β (s) can be viewed as a continuous function of (β,s) and by compactness, there exists δ 3>0 such that \(|D_{\beta }(s) - D_{\beta _{s}}(s)| \leq\delta_{1}\) for all s∈[1/q,1] and |ββ s |≤δ 3. Combined with (6.17), it completes the proof by taking δ=δ 1δ 2δ 3. □

Proof of Lemma 6.2

We do not lose anything by assuming that

(6.18)

for some a>0, b, c with ρ 0=2ρ and that once Z t exits [−ρ 0,+ρ 0] it is stopped. Indeed, having f vanish outside of [−ρ 0,+ρ 0], does not change the asymptotics of the passing time. Clearly, this is the case for z>+ρ 0. For z<−ρ 0, it follows from

$$ \mathbb {P}_{-\rho} \bigl(\tau_{-\rho_0}^- < t^{\zeta , a}_{\gamma }(n)\bigr ) = o(1), $$
(6.19)

for all γ, which is a consequence of Lemma 2.1 part (3) since the drift of Z t is at least \(\frac {c}{n}\) on [−ρ 0,−ρ] for some positive c uniformly in n (if ρ is small enough).

As for replacing the error term in (6.4) by \(c Z_{t}^{4}\), as the proof below shows, the functions U , L in the lemma restricted to condition (6.18) can be chosen to be continuous in a in a small interval [a 0ϵ,a 0+ϵ] and the limits (6.6), (6.7) hold uniformly in a in this interval. This together with Remark 6.3 implies the existence of U , L under which (6.6), (6.7) hold in the general case (6.4) with \(t_{\gamma }^{\zeta , a+O(\zeta (n))}(n)\). Now, it is not difficult to see that the latter is bounded above and below by \(t_{\gamma \pm C}^{\zeta , a}(n)\) for some C>0 and hence (6.6), (6.7) hold with \(t_{\gamma }^{\zeta , a}(n)\). Similar considerations apply for (6.8).

Set

$$ \varPsi(z) = \int_0^z \frac{1}{f(x)} dx \quad \mbox {and} \quad Y_t = \varPsi(Z_t) - \varPsi(Z_0) - t. $$

The motivation behind the above definitions comes from a continuous time deterministic analog of (6.18) in the form of an ODE

$$ \dot{z}(t) = f\bigl(z(t)\bigr) $$
(6.20)

for which z(t)=Ψ −1(tt 0) is a solution (roughly speaking Y t measures how far behind or ahead “in schedule” Z t is, judging from its position).

Start with the [CR] case and set

$$t^{\zeta , a}(n) = t^{\zeta ,a}_0(n) = \frac{\pi}{\sqrt{a}} \frac{n}{\sqrt{\zeta (n)}} ; \qquad w^{\zeta }(n) = \frac{n^{1/2}}{\zeta ^{5/4}(n)} \lor n. $$

In the deterministic setting the time it takes for z(t) to pass from z(0)=−ρ to ρ is

$$\varPsi(\rho) - \varPsi(-\rho) = \int_{-\rho}^{\rho} \frac{n\, dx}{\zeta (n) + a x^2 + b x^3 + c x^4 } = t^{\zeta , a}(n) + O(n) $$

if ρ is small enough. This will be shown in Proposition 6.5 below. Thus, bounding the passage time \(\tau _{\rho}^{+}\) around t ζ,a(n) can be achieved by bounding \(|Y_{t^{\zeta , a}(n)}|\).

Accordingly, let Y t =M t +A t be the Doob-decomposition of Y t , with M t a zero-mean martingale and A t the predictable process. The next proposition will allow us to bound Y t . The proof of this proposition will be deferred to the end of this section.

Proposition 6.5

If ρ is small enough and ζ∈[CR] then

  1. (1)

    Ψ(ρ)−Ψ(−ρ)=t ζ,a(n)+O(n).

  2. (2)

    \(\mathbb {E}M^{2}_{t_{\gamma}^{\zeta , a}(n)} = O( w^{\zeta }(n)^{2})\).

  3. (3)

    \(A_{t_{\gamma }^{\zeta , a}(n)} = o(w^{\zeta }(n))\) with probability 1.

Now using the monotonicity of Ψ in [−ρ 0,+ρ 0] we have

where the last inequality is a second moment bound. This shows (6.6). For the lower bound, if −γ is large enough, we may write

where the last inequality follows from Doob’s inequality. This shows (6.7)

Next, we address the [NCR] case. We can no longer use the means analysis (6.20) throughout the entire passage interval [−ρ,ρ] as Z t is not concentrated around its mean near 0. Accordingly, we analyze the passage time in each of the following segments separately:

$$\bigl[-\rho, -r n^{-1/3}\bigr] ; \qquad\bigl[-r n^{-1/3}, +r n^{-1/3}\bigr] ; \qquad \bigl[+r n^{-1/3}, +\rho\bigr], $$

for some r>0 to be chosen later.

We start with the upper bound. For the sequel, let w=rn −1/3. The upper bound will follow if we show the following:

  1. (1)

    Segment [−ρ,−w]. For any γ,

    $$ \lim_{r \to\infty} \limsup_{n \to\infty} \mathbb {P}_{-\rho} \bigl (\tau^+_{-w} > t^{\zeta , a}_{\gamma }(n)\bigr) = 0. $$
    (6.21)
  2. (2)

    Segment [+w,+ρ]. For any γ,

    $$ \lim_{r \to\infty} \limsup_{n \to\infty} \mathbb {P}_{w} \bigl(\tau^+_{\rho} > t^{\zeta , a}_{\gamma }(n)\bigr) = 0. $$
    (6.22)
  3. (3)

    Segment [−w,+w]. For any r>0, there exists u:ℝ→[0,1) such that

    $$ \limsup_{n \to\infty} \mathbb {P}_{-w} \bigl (\tau^+_{w} > t_{\gamma }^{\zeta , a}(n) \bigr )\leq u(\gamma ) <1 , $$
    (6.23)

    for all γ. Furthermore, u(γ)→0 as γ→∞.

All are hitting times for Z. Indeed, by first choosing large enough r and then choosing large enough γ both (6.6) and (6.8) will follow by multiplication. We proceed to prove each of the above statements.

Segments [−ρ,−w], [+w,+ρ]. Here we can use the means analysis as in the [CR] case. As before, we do not change the asymptotics of the passage time through these intervals, if we assume that f(z) in (6.18) satisfies:

where a, b, c, ρ 0 are as before, w 0=w/2 and once Z t exits [−ρ 0,−w 0]∪[+w 0,+ρ 0] it is stopped. Indeed this follows from the same reasoning and in addition since for any γ

$$\lim_{r \to\infty} \mathbb {P}_{w} \bigl(\tau^-_{w_0} < t^{\zeta , a}_{\gamma }(n)\bigr) = 0, $$

uniformly in n (large enough) as it follows from Lemma 2.1 part (2) since the drift of Z t is non-negative on [w 0,w].

We use the same definitions for Y t , M t and A t as above. In place of Proposition 6.5 we have

Proposition 6.6

Assume ζ∈[NCR]. There exists k(r) satisfying k(r)→0 as r→∞ such that for any ρ small enough, r large enough, n large enough and all t:

  1. (1)

    Ψ(−w)−Ψ(−ρ),Ψ(ρ)−Ψ(w)≤k(r)n 4/3.

  2. (2)

    \(\mathbb {E}M_{t}^{2} \leq k(r) t n^{4/3}\).

  3. (3)

    |A t |≤k(r)t with probability 1.

The proof is again deferred. Now, as before

(6.24)

where the last inequality is Chebyshev. This goes to zero as r→∞ for any γ. This shows (6.21). Similarly,

and this shows (6.22).

Segment [−w,w]. Here we still assume (6.18), but instead of absorbing Z t at the boundaries, we shall now suppose that Z t evolves like a symmetric random walk with ±n −1 steps, once it exits [−ρ 0,+ρ 0].

We first show that u can be chosen to vanish at infinity. Consider the process U t =U 0−(Z t Z 0+δn −5/3 t), for δ>0 with U 0 to be chosen later and set N=inf{t:U t ≤0}. Then, by the definition of the [NCR] regime for n large enough U tN is a non-negative supermartingale satisfying the requirements of Lemma 2.3 and hence

where we choose U 0=w+δeγ n −1/3Z 0 and δ=eγ. The last expression can be made arbitrarily small by taking γ large enough, uniformly in n if it is sufficiently large.

To show that u can satisfy u(γ)<1 for all γ, we have to show that Z can cross from −w to w in \(t^{\zeta , a}_{\gamma }(n)\)-time for arbitrarily small γ. If ρ is small and n is large, then X t =n(Z t −(−w)) satisfies the conditions in Lemma 2.2 with δ=ζ(n) and a=d. Therefore

which is positive for all γ, once n is large enough. This proves (6.23) and concludes the proof of the upper bound.

To show the lower bound in the [NCR] case, set V t =Z t δn −5/3 t+w and choose δ,r>0 such that \(V_{t \land \tau^{+}_{w}}\) has non-positive drift whenever \(V_{t \land \tau^{+}_{w}} \geq0\). Then,

and part (2) of Lemma 2.1 shows that the last expression goes to 0 as γ→−∞ uniformly in n (large enough). This proves (6.7) and completes the [NCR] case.  □ It remains to prove Propositions 6.4–6.6.

Proof of Proposition 6.4

Let . Fix some 2≤i<jq and set

$$Y_t = S_t^i - S_t^j. $$

Let \(U_{t-1} = Y_{t}-Y_{t-1} - \mathbb {E}_{\sigma_{0}} [Y_{t}-Y_{t-1}\mid \mathcal {F}_{t-1}]\) and then since \(|Y_{t}-Y_{t-1}| \leq\frac{2}{n}\) we have that \(|U_{i} | \leq\frac{4}{n}\). Define the process Z t by Z 0=0 and

$$Z_t -Z_{t-1} := \operatorname {sign}(Z_{t-1})\operatorname {sign}(Y_{t-1}) U_{t-1} $$

where

With this definition Z t is clearly a martingale and since \(|Z_{t}-Z_{t-1}|\leq\frac{4}{n}\), then \(\mathbb {E}_{\sigma_{0}} Z_{t}^{2} \leq\frac{16t}{n^{2}}\) and so by Doob’s maximal inequality,

$$ \mathbb {E}_{\sigma_0} \Bigl[ \max_{0\leq t \leq t(n)} |Z_t| \Bigr]^2 \leq2 \mathbb {E}_{\sigma_0} Z_{t(n)}^2 \leq\frac{32t(n)}{n^2}. $$
(6.25)

Now when t<τ we have that \(S^{i}_{t},S_{t}^{j}<\frac{1}{q}\) and so

$$\bigl |\mathrm{e}^{2\beta (S_t^i-\frac{1}{q})}-\mathrm{e}^{2\beta (S_t^j-\frac{1}{q})}\bigr| \leq 2\beta \bigl | S_t^i - S_t^j\bigr|. $$

By Jensen’s inequality \(\sum_{k=1}^{q} \mathrm{e}^{2\beta (S_{t}^{i}-\frac{1}{q})} \geq q\) so

(6.26)

Now when \(|Y_{t-1}| \geq\frac{2}{n}\) we have that \(|Y_{t}| - |Y_{t-1}| = \operatorname {sign}(Y_{t-1})(Y_{t}-Y_{t-1})\) and we always have

$$|Z_t| - |Z_{t-1}| \geq \operatorname {sign}(Z_{t-1})(Z_t-Z_{t-1})= \operatorname {sign}(Y_{t-1}) U_{t-1} $$

with equality when \(\operatorname {sign}(Z_{t}) = \operatorname {sign}(Z_{t-1})\). Hence it follows that when \(|Y_{t-1}| \geq\frac{2}{n}\) and tτ ,

where the first inequality follows from Eq. (6.26). It follows by induction that \(|Z_{t}| \geq|Y_{t}| - \frac{3}{n}\) for all tτ . In particular we have by Eq. (6.25) that

$$ \mathbb {E}_{\sigma_0} \Bigl[\,\max_{2\leq i < j \leq q} \max_{0\leq t \leq \tau^*} \bigl |S_{t}^i - S_{t}^j \bigr| \Bigr]^2 = O\biggl(\frac{t(n)}{n^2}\biggr)=o(1). $$
(6.27)

By Markov’s inequality with probability tending to 1 we have that \(|S_{\tau^{*}}^{i} - S_{\tau^{*}}^{j} | = o(1)\) for every pair 2≤i,jq. Now by construction \(S_{\tau^{*}}^{1} \geq y-\frac{1}{n}\) so with high probability we have that \(S_{\tau^{*}}^{i} \leq\frac{1}{q}-\frac{y-\frac{1}{q}}{q-1} + o(1) < \frac{1}{q}\) which implies that with high probability \(\tau^{*} = \min\{t(n),\tau_{y}^{-}\}\).

Now, by Taylor series expansions,

where the first inequality is by Jensen, and we have used the fact that \(\sum_{i=2}^{q} S_{t}^{i} = 1 - S_{t}^{1}\). It therefore follows that with high probability for all \(0\leq t\leq\max\{t(n),\tau_{y}^{-}\} \) that

and hence that

$$\max_{0\leq t \leq\tau^*} D_\beta \bigl(S_{t-1}^1\bigr) - n \mathbb {E}_{\sigma_0} \bigl [ S_{t}^1- S_{t-1}^1\mid\mathcal{F}_{t-1}\bigr ] \leq O\Bigl ( \Bigl (\max_{2\leq i < j \leq q} \max_{0\leq t \leq\tau^*} \bigl | S_t^i - S_t^j \bigr |\Bigr )^2\Bigr ) $$

which combined with Eq. (6.27) and Markov’s inequality completes the result. □

Proof of Proposition 6.5

Starting with part (1),

To prove part (2), we use the law of total variance:

Hence by induction

$$\mathbb {E}M^2_{t_{\gamma}^{\zeta , a}(n)} = \operatorname {\mathbb {V}\mathrm {ar}}M_{t_{\gamma}^{\zeta , a}(n)} \leq\frac{ t_{\gamma}^{\zeta , a}(n)}{\zeta ^2(n)} = O \bigl( w^{\zeta }(n)^2\bigr) . $$

As for part (3),

where the last inequality follows from

if ρ is small enough. Then again by induction, we conclude that

$$ A_{t_{\gamma}^{\zeta , a}(n)} = O \biggl (\frac{t_{\gamma}^{\zeta , a}(n)}{n \zeta ^{3/2}(n)} \biggr )= o \bigl ( w^{\zeta }(n) \bigr ). $$

 □

Proof of Proposition 6.6

If r is large enough and ρ is small, we have f(z)≥n −1(a/2)z 2 for all w 0<|z|<ρ 0, where as before w 0=w/2 and ρ 0=2ρ. This immediately gives part (1) with k(r)=Cr −1.

The proof for parts (2), (3) are similar to the ones in Proposition 6.5. This time the bounds on the derivatives become

$$\max_{w_0/2 < |z| < 2\rho_0} \bigl | \varPsi^{\prime}(z) \bigr |^2 \leq C r^{-4} n^{10/3} ; \qquad \max_{w_0/2 < |z| < 2\rho_0} \bigl | \varPsi^{\prime\prime}(z) \bigr | \leq C r^{-3} n^2 . $$

Proceeding by induction as before, we obtain (3), (2) with k(r)=Cr −4 and k(r)=Cr −3 respectively. □

7 Essential Mixing

Proof of Theorem 4

As the reader can verify, most statements in Sects. 3 and 4 hold when β<β c (q) and even β<q/2 (the restrictions on β are indicated before each statement there). The only time β<β s (q)<β c (q) is required is in step (1) of the overall coupling, where the condition ensures that the drift of each single coordinate \(S^{i}_{t}\) is negative in all (1/q,1], which, in turn, implies that for any initial configuration, after t=O(n) time, \(\sigma_{t} \in \varSigma _{n}^{\rho }\), which is a necessary starting point for the couplings that follow.

Now, if ββ s (q), but still β<β c (q), we may replace this step, with the requirement that σ 0 is initially chosen from \(\widetilde {\varSigma }_{n} = \varSigma _{n}^{\rho}\). The analysis of the overall coupling will remain the same, with the coalescence time being even smaller (but just by a linear term, which can be absorbed in the cutoff-window term). Thus, the restricted mixing time \(t^{\widetilde {\varSigma _{n}}}_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n)\) will be upper bounded as before. In addition, the lower bound in Sect. 4.1 will also hold for \(t^{\widetilde {\varSigma _{n}}}_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n)\), since as initial configuration, we may take any \(\sigma_{0} \in\partial_{P_{n}} \varSigma _{n}(\rho)\) for ρ>0.

It remains to show that \(\varSigma _{n} \setminus \widetilde {\varSigma }_{n}\) has an exponentially decreasing probability under μ n . This follows immediately from the large deviations analysis in Sect. 2.2. If β<β c (q), the rate function I β,q is strictly positive away from \(\widehat {\mathsf {e}}\) and in particular there exist C 1>0, C 2>0, such that

$$\mu_n (\varSigma _n \setminus \widetilde {\varSigma }_n )= \pi_n (\mathcal {S}_n \setminus \mathcal {S}^{\rho_1}_n )\leq C_1 \mathrm{e}^{-C_2 n}. $$

This concludes the proof of the theorem. □