Abstract
We study Glauber dynamics for the mean-field (Curie-Weiss) Potts model with q≥3 states and show that it undergoes a critical slowdown at an inverse-temperature β s (q) strictly lower than the critical β c (q) for uniqueness of the thermodynamic limit. The dynamical critical β s (q) is the spinodal point marking the onset of metastability.
We prove that when β<β s (q) the mixing time is asymptotically C(β,q)nlogn and the dynamics exhibits the cutoff phenomena, a sharp transition in mixing, with a window of order n. At β=β s (q) the dynamics no longer exhibits cutoff and its mixing obeys a power-law of order n 4/3. For β>β s (q) the mixing time is exponentially large in n. Furthermore, as β↑β s with n, the mixing time interpolates smoothly from subcritical to critical behavior, with the latter reached at a scaling window of O(n −2/3) around β s . These results form the first complete analysis of mixing around the critical dynamical temperature—including the critical power law—for a model with a first order phase transition.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction and Results
We study the dynamics of the Potts model on the complete graph (mean-field) known as the Curie-Weiss Potts model. For n≥1, β≥0, the Curie-Weiss Potts distribution is a probability measure on Σ n =Q V where Q={1,…,q} and V={1,…,n}, defined by
where σ∈Σ n and Z β,n is the normalizing constant. When q=2 this is the classic Ising model while in this paper we will focus on the case q≥3 for an integer q (for an extension to non-integer q via the random cluster model, see e.g. [23]). We use the standard notation β c (q) for the (explicitly known) threshold value between the ordered and the disordered phases (see [16]).
Throughout the paper (σ t ) t≥0 will denote the discrete time Glauber dynamics for this model, namely, starting from σ 0, at each step we choose a vertex u∈V uniformly and set
We denote by P n the transition kernel for this Markov process and by \(\mathbb {P}_{\sigma_{0}}\) the underlying probability measure. We will measure the distance between the distribution of the chain at time t and its stationary distribution μ n via the total-variation norm. Accordingly,
For ϵ∈(0,1), the ϵ-mixing time is the number of steps until the total-variation distance to stationarity is at most ϵ in the worst case, i.e.:
and by convention we set \(t_{{ \scriptsize \mbox {\textsc {mix}}}}(n) := t_{{ \scriptsize \mbox {\textsc {mix}}}(1/4)}(n)\). If for any fixed ϵ∈(0,1)
we say that the family of Markov chains exhibits the cutoff phenomenon, which describes a sharp drop in the total variation distance from close to 1 to close to 0 (in an interval of time of smaller order than \(t_{{ \scriptsize \mbox {\textsc {mix}}}}(n)\) denoted as the cutoff window). Observe that cutoff occurs if and only if \(t_{{ \scriptsize \mbox {\textsc {mix}}}(\delta)}(n) / t_{{\scriptsize \mbox {\textsc {mix}}}(\epsilon)}(n) \to1\) as n→∞ for any fixed δ,ϵ∈(0,1).
1.1 Results
We show that the dynamics for the Curie-Weiss Potts model undergoes a critical slowdown at an inverse-temperature β s (q)>0. This threshold is given by
Unlike mean-field Ising, for which β s (2)=β c (2)=1, the dynamical transition for q≥3 occurs at a strictly higher temperature than the static phase transition, i.e., β s (q)<β c (q).
Our first result addresses the regime β<β s (q), where rapid mixing occurs within O(nlogn) steps and the dynamics exhibits cutoff with a window of size O(n) (see Fig. 1).
Theorem 1
Let q≥3 be an integer. If β<β s (q) then the Glauber dynamics for the q-state Curie-Weiss Potts model exhibits cutoff at mixing time
with cutoff window w ϵ (n)=O ϵ (n) where α 1(β,q)=[2(1−2β/q)]−1.
We proceed to analyze the order of the mixing time as β(n)→β s (q) as n→∞,
where ξ(n)→0 as n→∞. The asymptotics of the mixing time will, of course, depend on how fast ξ decays, but it turns out that cutoff is observed only iff the decay is slow enough. This is captured in the following theorem.
Theorem 2
Let q≥3 be an integer. With β(n) given as in Eq. (1.3) we have:
-
(1)
If lim n→∞ n 2/3 ξ(n)=∞ then the Glauber dynamics has cutoff with mixing time and cutoff window given by
$$ \begin{array}{llll} t_{{ \scriptsize \mbox {\textsc {mix}}}}(n) &=& \alpha _1\bigl(\beta (n), q\bigr) n \log n + \alpha _2(q) n/\sqrt{\xi(n)} ,\cr\noalign{\vspace{3pt}} w_{\epsilon }(n) &=& O_{\epsilon }\big( n + \sqrt{n/\xi(n)^{5/2}} \,\big) , \end{array} $$(1.4)where α 2(q) is a positive constant and α 1 is the constant defined in Theorem 1.
-
(2)
If 0≤lim inf n→∞ n 2/3 ξ(n)≤lim sup n→∞ n 2/3 ξ(n)<∞ then the dynamics does not exhibit cutoff and has mixing time
$$ t_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n) = \varTheta_{\epsilon } \big( n^{4/3} \big). $$(1.5)
Part (2) of Theorem 2 in particular applies at criticality β=β s (q) where the mixing time is of order n 4/3 with a scaling window of order n −2/3 (in contrast, the mixing time for the critical Ising model is of order n 3/2 with a window of \(\sqrt {n}\) ).
Finally, above β s (q) the mixing time is exponentially large in n, as depicted in Fig. 2.
Theorem 3
Let q≥3 be an integer, and fix β>β s (q). For every 0<ϵ<1 there exist C 1,C 2>0 such that for all n,
Combined these results give a complete analysis of the mixing time of Glauber dynamics for the Curie-Weiss Potts model.
The slowdown in the mixing of the dynamics occurring as soon as β≥β s (q) (be it power-law at β s (q) or exponential mixing above this point) is due to the existence of states from which the Markov chain takes a long time to escape. However, in the range β∈[β s (q),β c (q)) the subset of initial configurations from which mixing is slow is exponentially small in probability. One can then ask instead about the mixing time from typical starting locations, known as essential mixing. Define the mixing time started from a subset of initial configurations \(\widetilde {\varSigma }_{n} \subseteq \varSigma _{n}\) via \(d^{\widetilde {\varSigma }_{n}}_{t}(n) = \max_{\sigma\in \widetilde {\varSigma }_{n}} d^{\sigma}_{t}(n)\) as well as
With these definitions we have the following result, showing that the subcritical mixing time behavior from Theorem 1 extends all the way to β<β c (q) once one excludes a subset of initial configurations with a total mass that is exponentially small in n.
Theorem 4
Let q≥3 be an integer and let β<β c (q). There exist constants C 1,C 2>0 and subsets \(\widetilde {\varSigma }_{n} \subseteq \varSigma _{n}\) such that the Glauber dynamics has cutoff with mixing time and cutoff window given by
where \(\mu_{n} (\varSigma _{n} \setminus \widetilde {\varSigma }_{n}) \leq C_{1} \mathrm{e}^{-C_{2} n}\) and α 1 is the constant in Theorem 1.
1.2 Related Work
Through several decades of work by mathematicians, physicists and computer scientists a general picture of how the mixing time varies with the temperature has been developed. It is believed that in a wide class of models and geometries the mixing time undergoes the following “critical slowdown”. For some critical inverse-temperature β d and a geometric parameter L(n), where n is the size of the system, we should have:
-
High temperature (0≤β<β d ): mixing time of order nlogn with cutoff.
-
Critical temperature (β=β d ): mixing time of order nL(n)z for some fixed z>0.
-
Low temperature (β>β d ): mixing time of order exp(τ β L(n)) for some fixed τ β >0.
For a more comprehensive description of critical slowdown see [19, 30, 33]. It should be noted that to demonstrate the above phenomenon in full, one needs to derive precise estimates on the mixing time up to the critical temperature, which can be quite challenging.
Perhaps the most studied model in this context is Ising. For the complete graph, a comprehensive treatment is given in [17, 18, 28], where critical slowdown (as described above) around the uniqueness threshold β c is established in full. In this setting, finer statements about the asymptotics of the mixing time can be made. For instance, in [17] the case where β approaches β c with the size of the system is analyzed (in Theorem 2 here we consider this case as well). The same picture, yet with the notable exclusion of a cutoff proof at high temperatures, is also known on the d-regular tree where [3] established the high and low temperature regimes and recently [19] proved polynomial mixing at criticality.
From a mathematical physics point of view, the most interesting underlying graph to consider is the lattice ℤd. For d=2 the full critical slowdown is now known: for a box with n vertices the mixing time is O(nlogn) throughout the high temperature regime [34, 35] whereas it is exp((τ β +o(1))n) throughout the low temperature regime [13, 14, 39] with τ β being the surface tension. The d=2 picture was very recently completed by two of the authors establishing cutoff in the high temperature regime [32] and polynomial mixing at the critical temperature [30]. For a more comprehensive survey of recent literature for Ising on the lattice see [30].
Turning back to the Potts model, understanding the kinetic picture here is of interest, not just as an extension of the results for Ising, but as an example of a model with a first order phase transition. Unlike in Ising, the free energy in the Potts model on various graphs and values of q undergoes a first order phase transition as the temperature is varied. This is certainly true for all q≥3 in the mean-field approximation, i.e. on the complete graph as treated here, but also known to be the case on ℤd for d≥2 and q>Q(d) for some Q(d)<∞ [23] (although most values of Q(d) are not known rigorously, it was shown that Q(2)=4 [2] and Q(d)<3 for all d large enough [6]).
A first order phase transition has direct implications on the dynamics of the system. For one, the coexistence of phases at criticality, implies slow mixing. This is because getting from one phase to another requires passing through a large free energy barrier, i.e. states which are exponentially unlikely. Indeed, in [10, 11] the mixing time for Potts on a box with n vertices in ℤd for any fixed d≥2 and sufficiently large q is shown to be exponential in the surface area of the box for any β larger or (notably) equal to the uniqueness threshold β c (d,q). This should be compared with the aforementioned polynomial mixing of Glauber dynamics for Ising at criticality. In fact, coexistence of the ordered and disordered phases also accounts for the slow mixing of the Swendsen-Wang dynamics at the critical temperature. This is shown in [10, 11] for ℤd under a similar range of d and q and in [22] for the complete graph. Other dynamics also exhibit slow mixing at criticality [4].
First order phase transitions are expected to lead to metastability type phenomena on the lattice in some instances. There has been extensive work on this topic (see [5, 12] and the references therein) yet the picture remains incomplete. It is expected that the transition to equilibrium will be carried through a nucleation process, which has an O(1) lifetime and therefore does not affect the order of the mixing time in contrast to the mean-field case. This is affirmed, for instance, in Ising where O(nlogn) mixing time is known for low enough temperatures under an (arbitrarily small) non-zero external field, despite the first order phase transition (in the field) around 0. For related works see e.g. [7, 8, 15, 36, 37] as well as [33] and the references there.
Similarly, the Potts model on the lattice should feature rapid mixing of O(nlogn) throughout the sub-critical regime β<β c (d,q) due to the vanishing surface-area-to-volume ratio. Thus, contrary to the critical slowdown picture predicted for Ising, whenever there is a first order phase transition it should be accompanied by a sharp transition from fast mixing at β<β c to an exponentially slow mixing at β c in lieu of a critical power law. However, fast mixing of the Potts model on ℤd for β<β c is not rigorously known except at very high temperatures (where it follows from standard arguments [33]). For sufficiently high temperatures, cutoff was very recently shown in [31].
On the complete graph however, metastability is apparent. In the absence of geometry, the order parameter sufficiently characterizes the state of the system and thus the dynamics and its stationary distribution are described effectively by the free energy of the system as a function of the order parameter. While coexistence of phases implies that at criticality the free energy is minimized at more than one value of the order parameter (corresponding to each phase), continuity entails that some of these global minima will turn into local minima just below or above criticality. These local minimizers correspond exactly to the metastable states and the value (or curve) of the thermodynamic parameter (e.g. temperature) beyond which these local minima cease to appear is called spinodal.
Consider the model at hand, namely Potts on the complete graph (for general background on the Potts model, see e.g. [9, 21, 23]). The order parameter here, analogous to the magnetization in the Ising model, is the vector of proportions of each color . It is well known that there exists β c (q), below which the Potts distribution μ n is supported almost entirely on configurations with roughly equal (about 1/q) proportions of each color and above which μ n is supported almost entirely on configurations where one of the q colors is dominant. In the former case we say that in equilibrium the system is in the disordered phase, while in the latter case we say that in equilibrium the system is in one of the q ordered phases, corresponding to the q-colors.
Up to relabeling of the vertices configurations are essentially described by the proportions vector s and as such, on a logarithmic scale, the Potts distribution can be read from the graph of the free energy as a function of s. As depicted in Fig. 4 (showing q=3, the situation is qualitatively the same for all q>2), when β<β c the free energy has a single global minimum at the center, corresponding to the disordered phase, while for β>β c there are q “on-axis” global minima, corresponding to the q ordered phases obtainable from one another through a permutation of the coordinates. At β c , coexistence of the ordered and disordered phases is evident in the presence of q+1 global minima of the free energy. For more details see, e.g., [16].
Below β c but sufficiently close to it, the free energy, globally minimized only at the center, has q local minima in place of the q global minima which corresponded to the ordered phases at criticality. Once β is too small, these local minima disappear. The threshold value for the appearance of these local minima is the spinodal inverse temperature β s (there is a similar behavior above β c marked by a second spinodal temperature β S , as illustrated by Fig. 4, but we do not address this regime in the paper).
Once the system starts from an initial configuration whose proportions vector is close to a local minimizer, the system will spend a time which is exponential in n near this minimizer before escaping to the global minimizer and reaching equilibrium. This is because away from a local minimizer, energy increases locally exponentially (in n), i.e. there is an energy barrier of an exponential order to cross. Thus, as n→∞ the system will spend an unbounded amount of time at a non-equilibrium state, which will be seemingly stable. In terms of the mixing time of the dynamics, as the definition involves the worst case initial configuration, metastable states will result in exponentially slow mixing.
Our result is a rigorous affirmation of this picture. Although the definition of β s in (1.1) seems different than the one given above for the spinodal inverse temperature, it can be shown, in fact, that this is indeed the threshold value of β for the emergence of local minima below β c . Theorem 3 then asserts that above β s mixing is exponentially slow while Theorem 1 shows that below β s mixing is still fast. The set of configurations whose exclusion in Theorem 4 leads to fast mixing all the way up to (but below) β c are precisely the ones from which the process will get stuck in a metastable state. Indeed as the free energy of such initial configurations is higher than that of configurations near the globally minimizing stable state, such configurations will have a probability which is exponentially small in the size of the system.
Furthermore, the transition from fast to slow mixing passes through polynomial mixing which occurs at β s and in its vicinity (Theorem 2). This in fact establishes that the aforementioned critical slowdown phenomenon occurs here as well, albeit at the spinodal rather than the uniqueness threshold. We predict that this should be the case for the dynamical behavior on other mean-field geometries such as an Erdős-Rényi random graph or a random regular graph.
For a (non-rigorous) treatment of metastability and its effect on the rate of convergence to equilibrium in other mean-field models with a first order phase transition, see for example [24, 25]. A rigorous analysis of such a system (the Blume-Capel model), below and above criticality, was recently carried out in [26]. It is illuminating to contrast the graph of the free energy as a function of the proportions vector in the Potts model to that of the free energy as a function of the magnetization in Curie-Weiss Ising, given in Fig. 5. The second order phase transition and lack of phase coexistence at the critical temperature, implies the absence of a local minima at any value below or above β c . As a consequence there is no spinodal temperature and mixing is fast throughout the whole range β<β c .
1.3 Proof Ideas
As discussed before, up to a permutation of the vertices, configurations can be described by their proportions vector. Formally, for a configuration σ∈Σ n , we denote by S(σ) the q-dimensional vector (S 1(σ),…,S q(σ)), where
Note that \(S(\sigma) \in \mathcal {S}\) where . Now it is not difficult to see that S t =S(σ t ) is itself a Markov process with state space \(\mathcal {S}\) and stationary distribution π n =μ n ∘S −1, the distribution of S(σ) under μ n . We shall refer to this process as the proportions chain. Figures 1 and 2 show a realization of the proportion chains superimposed on the free energy graph plotted upside down for better visibility. 3 different values of β, corresponding to 3 different regimes are exhibited. The color of the curve, representing time, shows the temporal evolution of the proportion chain. Notice how local minima (shown as local maxima) “trap” the chain for a long time.
As a projection of the chain, S t mixes at least as fast as σ t . Moreover, when starting in one of the q configurations where all sites have the same color, a symmetry argument reveals that the mixing time of S t is equal to that of σ t . One therefore must control the effect of starting from an initial state which is not monochrome. Using a coupling argument we show that the difference in the mixing times is of the same order as the cutoff window for S t and can thus be absorbed into our error estimates. It will then suffice to analyze the proportions chain, which is of a lesser complexity than the original process. In particular the state space of S t has a fixed q−1 dimension, independently of n, and its transition probabilities can be easily calculated.
Next, we show that when β<β c (q) most of the mass of the stationary distribution π n is concentrated on balanced states whose distance from the “equi-proportionality” vector (1/q,…,1/q) is \(O(1/\sqrt{n}\,)\). A simple coupling argument then shows that the Markov chain is mixed soon after arriving at such a state. Thus the main effort in the proof becomes finding sharp estimates on the time is takes for S t to reach a balanced state from a worst-case initial configuration, in different regimes of β.
It turns out that these hitting times are determined by the function D β (x), which is defined as (up to a multiplication by 1/n) the drift of one coordinate of S t when that coordinate has value x in the worst case, i.e. the minimum drift towards 1/q, where the minimum is taken over all possible values for the remaining coordinates. An explicit formula for D β (x) can be obtained (3.2). Its graph is plotted in Fig. 3 for x∈[1/q,1] and different values of β. For β≪β c , this drift is strictly negative in (1/q,1] and thus each coordinate quickly (in O(nlogn) time) gets to within \(1/\sqrt{n}\) of 1/q. On the other hand, the function D β (x) is monotone increasing in β and therefore for sufficiently large β, it will no longer be negative throughout (1/q,1]—there will be an interval in (1/q,1] where it is positive. Such an interval will take an exponential amount of time to traverse and this will lead to an exponential mixing time. The smallest β for which this happens is, by definition, β s . This β s in turn coincides with the inverse temperature at which local minima begin to appear in the free energy as a function of s. In fact, to show exponentially slow mixing, we use standard conductance arguments, which in face of local minima in the free energy give exponential mixing quite automatically.
The most delicate analysis is in the critical regime where β is near or equal to β s . In this case the x-axis is tangential to the graph of D β (x) at its peak (the green curve in Fig. 3) and the challenge is in finding the asymptotics of the passage time through the tangential point on the x-axis (left green dot in the figure). As the drift there is 0, locally around this point, a coordinate of S t behaves as a random walk and Doob’s decomposition of a suitably chosen function of the coordinate yields the right passage time estimates.
1.4 Organization
Section 2 sets notation and contains some useful facts on the Curie-Weiss Potts model, as well as tools (and a few non-standard variations on them) needed in the analysis of mixing time. In Sect. 3 we derive basic properties of the proportions chain that will be repeatedly used in the remainder of the proof. In Sect. 4 we analyze the case β<β s (q) and prove Theorem 1 while Sect. 5 treats the case β>β s (q) and establishes Theorem 3. The near critical regime is analyzed in Sect. 6, which includes the proof of Theorem 2. The final section, Sect. 7, gives the proof of Theorem 4.
2 Preliminaries
2.1 Notation
We let [a,b] denote the set {a,…,b} for a,b∈ℤ. We use the same notation for vector and scalar valued variables. For an m-dimensional vector s, we denote by s k its k-th component and for I=(i 1,…,i k )⊆[1,m], \(s^{I} = (s^{i_{1}}, \dots, s^{i_{k}})\). Matrix-valued variables will appear in bold. We let W m,k denote the (m,k) element of W and let W m denote is its m-th row.
We write e i for the unit vector in the i-th direction and \(\widehat {\mathsf {e}}= (1/q, 1/q, \dots, 1/q) \in \mathbb {R}^{q}\) for the equiproportionality vector. For s∈ℝq, we denote \(\widehat {s} = s - \widehat {e}\).
Most of our vectors will live on the simplex or even \(\mathcal {S}_{n} = \mathcal {S}\cap \frac {1}{n} \mathbb {Z}^{q}\). Occasionally we would like to further limit this set and for ρ>0 we define
Note that \(\mathcal {S}^{\rho+} \subseteq \mathcal {S}^{(q-1)\rho}\) and similar relations hold for \(\mathcal {S}_{n}\) and Σ n .
Vectors in \(\mathcal {S}\) will often be viewed also as distributions on [1,q]. A coupling of ν, \(\widetilde {\nu} \in \mathcal {S}\) is the joint distribution of two random variables X, \(\widetilde {X}\), defined on the same probability space and marginally distributed according to ν, \(\widetilde {\nu}\). If ℙ⋆ is the underlying probability measure then we always have \(\|\nu-\widetilde {\nu}\|_{\mathrm {TV}} \leq \mathbb {P}^{\star}(X \neq \widetilde {X})\). We shall call this coupling a best coupling if it satisfies \(\|\nu-\widetilde {\nu}\|_{\mathrm {TV}} = \mathbb {P}^{\star}(X \neq \widetilde {X})\). Such a coupling always exists.
In the course of the proofs, we introduce various couplings of two copies of the Glauber dynamics (σ t ) t . For the second copy we shall use the notation \(\widetilde {\sigma}_{t}\) and \(\widetilde {S}_{t} = S(\widetilde {\sigma}_{t})\). Couplings will be identified by their underlying probability measure, for which we will use the symbol ℙ with a superscript that changes from coupling to coupling, e.g. ℙBC. A subscript will indicate initial state or states, e.g. \(\mathbb {P}^{BC}_{\sigma_{0}, \widetilde {\sigma }_{0}}\). The expectation and variance, \(\mathbb {E}[\cdot]\) and \(\operatorname {\mathbb {V}\mathrm {ar}}(\cdot)\) resp., will be decorated in the same way as the underlying measure with respect to which they are defined. The σ-algebra \(\mathcal {F}_{t}\) will always include all the randomness up-to time t. For example, with a single chain (σ t ) t this is the σ-algebra generated by {σ s :s≤t}, for a coupling of two chains (σ t ) t , \((\widetilde {\sigma}_{t})_{t}\), it is the one generated by , etc.
2.2 Large Deviations Results for the Curie-Weiss Potts Distribution
In this subsection we recall several results concerning the concentration of the proportions vector measures π n . See, e.g., [16, 20] for proofs of these results.
It is a consequence of Sanov’s Theorem together with an application of Varadhan’s Lemma that the sequence (π n ) n≥1 satisfies a large deviation principle (LDP) on \(\mathcal {S}\) with rate function
where C is chosen so that \(\min_{s \in \mathcal {S}} I_{\beta , q}(s) = 0\). The minimizing set ε β,q ={s:I β,q (s)=0} which is the support of the weak limit \(\mathfrak {p}_{\beta , q}\) of (π n ) n≥1 is then
where
and
The function \(\beta \mapsto\check{s}_{\beta , q}^{1}\) is continuous and increasing on [β c (q),∞) and interchanges the 1-st and k-th coordinates. Furthermore, the value of \(\check{s}_{\beta , q}\) for all β,q is known in implicit form and in particular for β=β c (q), we have
This is true for all q≥3. For q=2, (2.1), (2.2), (2.4), (2.5) still hold, but the critical inverse-temperature is now
It is here that a fundamental difference between q=2 and q>2 can be observed. If q=2 then \(\check{s}_{\beta _{c}(2), 2} = \widehat {\mathsf {e}}\), in which case \(\beta \mapsto \mathfrak {p}_{\beta , 2}\) is continuous for all β≥0. On the other hand, if q≥3 we have \(\check{s}_{\beta _{c}(q), q} \neq \widehat {\mathsf {e}}\) and \(\beta \mapsto \mathfrak {p}_{\beta , q}\) is discontinuous at β c (q). Thus, as it is recorded in the Physics literature, the system exhibits a first order phase transition if q≥3, but only a second order phase transition if q=2.
2.3 Hitting Time Estimates for General Supermartingales
We will require some standard hitting time estimates for supermartingales and related processes.
Lemma 2.1
For x 0∈ℝ, let (X t ) t≥0 be a discrete time process, adapted to \((\mathcal {F}_{t})_{t \geq0}\) which satisfies
-
(1)
\(\exists \delta \geq0{:}\ \mathbb {E}_{x_{0}} [X_{t+1} - X_{t} |\mathcal {F}_{t} ]\leq-\delta \) on {X t ≥0} for all t≥0.
-
(2)
∃R>0: |X t+1−X t |≤R, ∀t≥0.
-
(3)
X 0=x 0,
where \(\mathbb {P}_{x_{0}}\) is the underlying probability measure. Let and . The following holds.
-
(1)
If δ>0 then for any t 1≥0:
$$ \mathbb {P}_{x_0} \bigl (\tau_0^- > t_1 \bigr )\leq \exp \biggl \{-\frac{(\delta t_1 - x_0 )^2}{8 t_1 R^2} \biggr \}. $$(2.7) -
(2)
If x 0≤0, then for any x 1>0 and t 2≥0,
$$ \mathbb {P}_{x_0} \bigl (\tau_{x_1}^+ \leq t_2 \bigr )\leq2 \exp \biggl \{-\frac {(x_1-R)^2}{8t_2 R^2} \biggr \}. $$(2.8) -
(3)
If x 0≤0, δ>0, then for any x 1>0 and t 3≥0:
$$ \mathbb {P}_{x_0} \bigl (\tau_{x_1}^+ \leq t_3 \bigr )\leq t_3^2 \exp \biggl \{-\frac{(x_1-R) \delta ^2}{8 R^3} \biggr \}. $$(2.9)
Proof
Starting with (1), if x 0<0, there is nothing to prove. Otherwise, let Z t be a supermartingale independent of X t , which starts from 0, has drift −δ and steps which are bounded by R. Set \(Y_{t} = X_{t \land \tau_{0}^{-}} + Z_{(t -\tau _{0}^{-})^{+}}\) and write Y t =M t +A t as its Doob Decomposition, with M t a martingale, A t a predictable processes and A 0=x 0. Clearly A t ≤x 0−δt and |M t+1−M t |≤2R, \(\mathbb {P}_{x_{0}}\)-a.s. The Hoeffding-Azuma inequality implies:
as desired.
Now set \(Y_{t} = X_{t \land \tau_{x_{1}}^{+}}\) and observe that (Y t ) t≥0 satisfies conditions 1–3, and consequently it is enough to prove (2.8), with \(\mathbb {P}_{x_{0}} (Y_{t_{2}} > x_{1} )\) as the LHS. Therefore, for all t, let \(W_{t+1} = \mathbb {E}_{x_{0}} [Y_{t+1} - Y_{t} |\mathcal {F}_{t} ]\) and Z t+1=Y t+1−Y t −W t+1. Then clearly, W t+1, Z t are \(\mathcal {F}_{t}\)-measurable, W t+1≤0 on {Y t ≥0} and \(\mathbb {E}_{x_{0}} [Z_{t+1} | \mathcal {F}_{t} ]= 0\). This is a Doob-type decomposition. Now, define M t for all t inductively as follows.
and set N t =|M t |. We claim the following:
-
(1)
(M t ) t∈ℕ is an \((\mathcal {F}_{t} )_{t \geq0}\)-adapted martingale.
-
(2)
N 0=0; N t+1=|N t +Z t+1|, ∀t.
-
(3)
Y t ≤N t +R.
The first two assertions follow straightforwardly from the construction. The third one, can be proven by induction, since it clearly holds for t=0 and assuming Y t−1≤N t−1+R, if Y t−1≥0, then
and if Y t−1<0, then
Finally, by the Hoeffding-Azuma inequality applied to M t we have
This shows (2.8).
For part (3), let \(\tau_{0}^{-}(s) = \inf\{t \geq s: X_{t} \leq0 \}\) and Y t (s), M t (s) and A t (s) defined as in the proof of part (1) only with \(\tau_{0}^{-}(s)\) replacing \(\tau_{0}^{-}\). Then,
where the last inequality follows from Hoeffding-Azuma inequality and since the summands are non zero only when 0≤s 1<s 2≤t 3 and (s 2−s 1)R≥x 1−R. □
Lemma 2.2
Let (X t ) t∈ℕ be a process adapted to \((\mathcal {F}_{t})_{t \in \mathbb {N}}\) and satisfying the following conditions for some a>2δ≥0:
-
(1)
X t+1−X t ∈{−1,0,1}.
-
(3)
\(\mathbb {E}[X_{t+1} - X_{t} | \mathcal {F}_{t}] \geq-\delta \).
-
(3)
\(\operatorname {\mathbb {V}\mathrm {ar}}(X_{t+1} | \mathcal {F}_{t}) \geq a\).
-
(4)
X 0≥0.
Let . Then
where C 1,C 2 are positive constants which depends only on a.
Proof
It is easy to verify that conditions 1–3 imply
Now, let T 0=0 and \(T_{k} = \inf\{t > T_{k-1} : \: X_{t} \neq X_{T_{k-1}} \}\), \(Y_{k} = X_{T_{k}}\) for k≥1. Also, let N k =T k −T k−1 and Z k =Y k −Y k−1. From (2.11), it is not hard to see that we can couple (N k ) k , (Z k ) k with two i.i.d. sequences \((\widetilde {N}_{k})_{k}\),\((\widetilde {Z}_{k})_{k}\) such that \(\widetilde {N}_{k} \geq N_{k}\), \(\widetilde {Z}_{k} \leq Z_{k}\) a.s. and
Consequently, with \(\widetilde {Y}_{k} = \sum_{m \leq k} \widetilde {Z}_{m}\) and \(\widetilde {T}_{k} = \sum_{m \leq k} \widetilde {N}_{m}\) we have:
where the last inequality follows by the local CLT for \(\widetilde {Y}_{k}\), which is a nearest neighbor random walk whose steps have mean −δ/a and variance 1−(δ/a)2 and also by Cramer’s theorem for \(\widetilde {T}_{k}\). All constants depend only on a. □
Finally, we will also make use of the following result from [29]:
Lemma 2.3
[29, Proposition 17.20]
Let (Z t ) t≥0 be a non-negative supermartingale adapted to \((\mathcal {G}_{t} )_{t \geq0}\) and N a stopping time. Suppose that:
-
(1)
Z 0=z 0.
-
(2)
|Z t+1−Z t |≤B.
-
(3)
∃σ>0 such that \(\operatorname {\mathbb {V}\mathrm {ar}}(Z_{t+1} | \mathcal {G}_{t} )> \sigma^{2}\) on the event {N>t}.
If u>4B 2/(3σ 2), then:
2.4 Variance Lemma
The following is a straightforward extension of [28, Lemma 2.6] to vector valued Markov processes. We include a proof here for completeness.
Lemma 2.4
Let (Z t ) be a Markov chain taking values in ℝd and with transition matrix P. Write \(\mathbb {P}_{z_{0}}\), \(\mathbb {E}_{z_{0}}\) for its probability measure and expectation respectively, when Z 0=z 0. Suppose that there is some 0<η<1 such that for all pairs of starting states \((z_{0}, \widetilde {z}_{0} )\),
Then \(v_{t} \triangleq \sup_{z_{0}} \operatorname {\mathbb {V}\mathrm {ar}}_{z_{0}} (Z_{t} )= \sup_{z_{0}} \mathbb {E}_{z_{0}} \| Z_{t} - \mathbb {E}_{z_{0}} Z_{t} \|_{2}^{2}\) satisfies:
Proof
Let Z t and \(Z_{t}^{*}\) be independent copies of the chain with the same starting state z 0. By the assumption (2.12), we obtain that
Hence, we see that
Combined with the total variance formula, it follows that
which then gives that \(v_{t} = \sum_{i=1}^{t}(v_{i} - v_{i-1}) \leq\sum _{i=1}^{t} \eta^{2(t-1)}v_{1}\), implying the desired upper bound immediately. □
2.5 Bottleneck Ratio
Let P be an irreducible, aperiodic transition kernel for a Markov chain on S with stationary measure π. The bottleneck ratio of a set A⊆S is:
where ∂ P A={x∈A:P(x,y)>0 for some y∉A}. The bottleneck ratio of the chain is
The following result, due to [1, 27, 38] in several similar forms (see, e.g., [29, Theorem 7.3]) relates the bottleneck ratio with the mixing time of the chain.
Theorem 2.5
If Φ ∗ is the bottleneck ratio defined in (2.14) then \(t_{{ \scriptsize \mbox {\textsc {mix}}}(1/4)} \geq\frac{1}{4\varPhi_{*}} \).
3 Drift Analysis for the Proportions Chain
In this section we prove various results concerning the drift of the process S t . We analyze both the one coordinate process \(S^{1}_{t}\) and the distance-to-equiproportionality \(\| S_{t} - \widehat {\mathsf {e}}\|_{2}\). In the course of this analysis, we also define two couplings which will be of independent use later on and prove a uniform bound on the variance of S t .
3.1 The Drift of One Proportion Coordinate
From symmetry, it is enough to analyze the drift of \(S^{1}_{t}\). For β≥0, define \(g_{\beta }:\mathcal {S}\to \mathcal {S}\) as
We can express the drift of \(S_{t}^{1}\) as follows:
with
The function \(d_{\beta }: \mathcal {S}\to \mathbb {R}\) thus describes (up to a constant factor of n −1 and an error term) the drift of the first coordinate given the current proportions vector. It turns out the rapid mixing hinges on whether d β (s) is strictly negative whenever s 1>1/q (and for any values for the remaining coordinates of s). Accordingly we define D β :[0,1]→ℝ as
and check when D β (x) is strictly negative for all x∈(1/q,1]. We will see in Proposition 3.1 below that this happens if and only if β<β s (q), where β s (q)>0 is defined in (1.1).
For β≥0, define s ∗(β) and s ♯(β) as:
with the inf or sup being 1, if the respective sets are empty. We may now state:
Proposition 3.1
For all q≥3 the following holds:
-
(1)
We have that D β (s) is increasing in β if s∈[1/q,1] and for all β≥0, that D β (1/q)=0 and that D β (1)<0.
-
(2)
That \(s^{\sharp}(\beta ) > \frac {1}{q}\) if β<q/2.
-
(3)
The following statements are equivalent if β<q/2:
-
(a)
s ♯(β)=1.
-
(b)
D β (s) has no roots in (1/q,1].
-
(c)
D β (s ∗(β))<0.
-
(a)
-
(4)
All the statements in part (3) hold if and only if β<β s (q).
Proof
We start by proving part (1). It is clear that
and hence is strictly increasing in β if 1/q<s≤1. Furthermore, we have that
as well as
For part (2), taking the derivative of D β (s) with respect to s evaluated at 1/q, one obtains \(\frac {d}{ds} D_{\beta }(s) |_{s=1/q} = -1 + \frac{2\beta }{q}\) which is negative if β<q/2. Together with D β (1/q)=0, this completes the proof.
For part (3), we first show that there exists at most two points in [1/q,1] such that \(\frac{d}{d s}D_{\beta }(s) = 0\). To see this, we compute the first derivative and obtain that
where \(h(x) = \frac{x}{(1+x)^{2}}\). Obviously, there are at most two zeros for \(-1 + \frac{2q\beta }{q-1}h(x)\) and since \((q-1) \exp(2\beta \frac{1-qs}{q-1})\) is a strictly monotone in s, we conclude that there are at most two points such that \(\frac{d}{ds}D_{\beta }(s)\) vanishes.
Notice also that \(\frac{d}{ds}D_{\beta }(1/q) <0\) provided that β<q/2 and hence D β (1/q+ξ)<0 for all ξ≤ξ 0, where ξ 0 is a sufficiently small positive number. We are now ready to derive the equivalence stated in the proposition. Observing that D β (s) is a smooth function and D β (1)<0, we deduce that (3a) ⇒ (3b) ⇒ (3c). It remains to prove that (3c) ⇒ (3a). Suppose now that (3c) holds and there exists s 0∈(1/q,1] such that D β (s 0)≥0. Recalling that D β (1/q+ξ)<0 and D β (1)<0, we deduce the following:
-
If s 0<s ∗, we will then have at least two zeros in (1/q,s ∗) for \(\frac{d}{ds}D_{\beta }(s)\).
-
If s 0>s ∗, we will then have at least one zero in (s ∗,1) for \(\frac{d}{ds}D_{\beta }(s)\).
We see that the first case contradicts with the fact that there can be at most two zeros for \(\frac{d}{ds}D_{\beta }(s)\) and the second case contradicts our definition of s ∗. Altogether, we established that (3c) ⇒ (3a).
As for the last part, continuity and part (1) imply that (1.1) is equivalent to
Since D β (s) is increasing in β for all s∈[1/q,1], it follows that for all β<β s (q), we indeed have D β (s)<0 for all s∈(1/q,1]. On the other hand, by the continuity of the function D β (s) and our definition of β s , we know that there exists s M ∈(1/q,1] such that \(D_{\beta _{s}}(s_{M}) =0\). Now, using the result of part (1) we conclude that D β (s M )>0 for any β>β s , completing the proof. □
In the following proposition we discuss the relation between β s (q) and β c (q).
Proposition 3.2
For q≥3 we have that 0<β s (q)<β c (q)<q while β s (2)=β c (2)=1.
Proof
As recalled in Sect. 2.2,
In the q=2 case, it is easy to verify that \(\frac{d}{ds} D_{\beta }(s) < 0\) for all \(s \in[\frac{1}{2}, 1]\) if β<1 and \(\frac{d}{ds} D_{\beta }(\frac{1}{2}) > 0\) if β>1. Since in addition \(D_{\beta }(\frac{1}{2}) = 0\), D β (1)<0, we obtain β s (2)=1.
For q≥3 we have β c (q)<q/2 and therefore
and
Now if ϕ(q)=2(q−1)log(q−1)−q(q−2) then ϕ(2)=ϕ′(2)=0 and \(\phi''(s)=-2+\frac{2}{q-1}\) which is negative when q>2. It follows that ϕ(q) is negative when q>2 and hence for q≥3,
So for small enough ϵ>0 and \(s\in(\frac{q-1}{q}-\epsilon ,\frac {q-1}{q})\) we have that \(D_{\beta _{c}}(s)>0\) and hence \(\sup_{s} D_{\beta _{c}}(s)>0\). By the smoothness of D β (s) this implies that there exists β<β c such that sup s D β (s)>0 which establishes that β s <β c . □
We will make repeated use of the following proposition throughout the paper.
Proposition 3.3
-
(1)
Assume β<q/2. For all 0<ρ 0<ρ small enough, there exists γ>0 and C,c>0 such that for all n with t=eγn we have
(3.5)for all \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}^{+}}\).
-
(2)
Assume β<q/2. For all r 0>0, γ>0 there exists C,c>0 such that for all n and r>r 0 with t=γn, \(\rho_{0} = \frac{r_{0}}{\sqrt{n}}\) and \(\rho= \frac{r}{\sqrt{n}}\) we have
(3.6)for all \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\).
-
(3)
Assume β<β s (q). For all ρ>0 there exists γ>0 and C,c>0, such that for all n, with t=γn we have
$$ \mathbb {P}_{\sigma_0} \bigl (\sigma_t \notin \varSigma _n^{\rho+} \bigr )\leq C \mathrm{e}^{-c n} $$(3.7)for all σ 0∈Σ n .
Proof
Consider the process until the first time it is to the right of 1/q+ρ. If n is large enough and, in case (1), if ρ is small enough, Proposition 3.1 part (1) and Eq. (3.1) imply that this process satisfies the conditions of Lemma 2.1. Parts (1) and (2) of the proposition then follow from parts (3) and (2) of the lemma and summing over all coordinates.
As for part (2), consider this time . Since β<β s (q), it follows from Proposition 3.1 (parts (1) and (3)) that this process also satisfies the conditions of Lemma 2.1, with δ>C′n −1, for some positive constant C′. Now set γ=2/C′ and apply part (1) of the lemma to conclude that except with probability exponentially small in n, \(S^{1}_{t} \leq1/q+\rho/2\) for some t≤γn. Once this happens, by Lemma 2.1 part (3), as in the proof of (3.5), we have \(S^{1}_{\gamma n} \leq1/q+\rho\) again except with probability tending exponentially fast to zero with n. It remains to use union bound to complete the proof. □
3.2 Bounded Dynamics
The bounded dynamics is a process that evolves like σ t , only that S(σ t ) is forced to stay close to \(\widehat {\mathsf {e}}\) by rejecting transitions which violate this condition. Formally, fix ρ>0 and let (σ t ) t≥0 be a Markov chain on \(\varSigma _{n}^{\rho+}\), which evolves as follows. Start from some \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and at step t+1:
-
Draw \(\widetilde {\sigma}_{t+1}\) according to P n (σ t ,⋅), where P n is the original transition kernel.
-
If \(\widetilde {\sigma}_{t+1} \in \varSigma _{n}^{\rho+}\) set \(\sigma_{t+1} = \widetilde {\sigma}_{t+1}\) and otherwise set σ t+1=σ t .
We shall denote by ℙρ the underlying probability measure.
The unbounded and ρ-bounded dynamics admit a natural coupling, under which the two processes start from the same configuration and evolve together until time , where S t is the unbounded process. This leads to the following two immediate observations which will be useful later.
-
(1)
For any integer t and bounded function f:(Σ n )t+1↦ℝ:
$$ \bigl |\mathbb {E}f(\sigma_{[0,t]}) - \mathbb {E}^{\rho} f(\sigma_{[0,t]})\bigr| \leq2 \|f\|_{\infty } \mathbb {P}(\tau\leq t). $$(3.8) -
(2)
In particular for any set A⊆(Σ n )t+1:
$$ \bigl |\mathbb {P}(\sigma_{[0,t]} \in A) - \mathbb {P}^{\rho} (\sigma_{[0,t]} \in A)\bigr| \leq2 \mathbb {P}(\tau\leq t). $$(3.9)
3.3 Synchronized Coupling
The synchronized coupling is a (Markov) coupling of two ρ-bounded dynamics in which the two chains “synchronize” their steps as much as possible. Formally, define (σ t ) t≥0, \((\widetilde {\sigma}_{t})_{t\geq0}\) on the same probability space such that starting from σ 0, \(\widetilde {\sigma}_{0}\), at time t+1:
-
(1)
Choose colors I t+1, \(\widetilde {I}_{t+1}\) according to an optimal coupling of S t , \(\widetilde {S}_{t}\).
-
(2)
Choose colors J t+1, \(\widetilde {J}_{t+1}\), according to an optimal coupling of \(g_{\beta } (S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )\), \(g_{\beta } (\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}})\).
-
(3)
Change a uniformly chosen vertex of color I t+1 in σ t to have color J t+1 in σ t+1, but only if \(\sigma_{t+1} \in \varSigma _{n}^{\rho+}\).
-
(4)
Change a uniformly chosen vertex of color \(\widetilde {I}_{t+1}\) in \(\widetilde {\sigma}_{t}\) to have color \(\widetilde {J}_{t+1}\) in \(\widetilde {\sigma}_{t+1}\), but only if \(\widetilde {\sigma}_{t+1} \in \varSigma _{n}^{\rho+}\).
We shall write \(\mathbb {P}^{SC,\rho}_{\sigma_{0}, \widetilde {\sigma}_{0}}\) for the underlying measure and omit ρ if it is large enough for the dynamics not to be bounded.
The following shows that this coupling contracts the ∥⋅∥1 distance of the proportions vector.
Lemma 3.4
There exists C(β,q)>0 such that for any ρ>0, uniformly in \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}^{\rho+}\) as n→∞
where
Proof
For \(s, \widetilde {s} \in \mathcal {S}_{n}\) by a Taylor expansion of g β around s, then another expansion for ∇g β around \(\widehat {\mathsf {e}}\) one has:
where we use the easily verified:
Now under the bounded dynamics, \(I_{t+1} \neq \widetilde {I}_{t+1}\) implies that \(S_{t}^{I_{t+1}} > \widetilde {S}_{t}^{I_{t+1}}\) and \(\widetilde {S}_{t}^{\widetilde {I}_{t+1}} > S_{t}^{\widetilde {I}_{t+1}}\) while \(J_{t+1} \neq \widetilde {J}_{t+1}\) implies that \((S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )^{J_{t+1}} > (\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}} )^{J_{t+1}}\) and \((\widetilde {S}_{t} - n^{-1} \mathsf {e}_{\widetilde {I}_{t+1}} )^{\widetilde {J}_{t+1}} > (S_{t} - n^{-1} \mathsf {e}_{I_{t+1}} )^{\widetilde {J}_{t+1}}\). It follows by the definition of the coupling that
Recalling that for \(s \in \mathcal {S}_{n}\), \(\|s\|_{\mathrm {TV}} = \frac {1}{2} \| s \|_{1}\) and that under the best coupling of distributions s, \(\widetilde {s}\) the probability of disagreement is \(\|s-\widetilde {s}\|_{\mathrm {TV}}\), we have:
The result follows by iteration. □
3.4 Uniform Variance Bound
Lemma 3.5
Assume β<q/2. There exists ρ 0=ρ 0(β,q) such that if ρ≤ρ 0
uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≥0, and there exists γ 0>0 such that
uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\) and \(t \leq \mathrm{e}^{\gamma _{0} n}\).
Proof
Equation (3.13) will follow directly from Lemma 2.4 applied to S t under the ρ-bounded dynamics. Indeed, Lemma 3.4 gives a stronger version of Condition 2.12 with η=p+ρO(n −2). Now, if β<q/2 and ρ is small enough, we have \(\eta\leq1 - \frac{1 - \frac{2\beta }{q}}{2n}\) for large enough n. Then (3.13) follows from (2.13) since \(\operatorname {\mathbb {V}\mathrm {ar}}^{\rho}_{\sigma_{0}} S_{1} = O (n^{-2} )\).
For (3.14), find ρ′<ρ and use (3.5) and (3.8) to conclude that for all \(\sigma_{0} \in \varSigma _{n}^{\rho^{\prime}+}\) and t≤eγn for some γ=γ(ρ,ρ′):
□
Corollary 3.6
For β<β c (q), we have \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}} (S) = O(n^{-1})\).
Proof
Fix ρ<ρ 0, where ρ 0 is given in Lemma 3.5 and notice that the bounded dynamics is reversible with respect to the Potts measure μ n restricted to \(\varSigma _{n}^{\rho+}\). Therefore the bounded dynamics has \(\mu^{\rho}_{n}(\cdot) = \mu_{n}(\cdot| \sigma\in \varSigma _{n}^{\rho+})\) as its stationary measure. From the large deviation analysis in Sect. 2.2 it is straightforward to conclude that if β<β c (q)
for some C>0 and n large enough, depending on ρ and β. Therefore, we have \(\|\mu_{n} - \mu^{\rho}_{n}\|_{\mathrm {TV}} \leq\mathrm{e}^{-Cn}\). Since μ n , \(\mu^{\rho}_{n}\) live on a compact space, this gives \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}}(S) \leq \operatorname {\mathbb {V}\mathrm {ar}}_{\mu^{\rho}_{n}}(S) + \mathrm{e}^{-Cn}\). Since \(\mathbb {P}^{\rho}_{\sigma_{0}} (\sigma_{t} \in\cdot)\) converges to \(\mu_{n}^{\rho }\) as t→∞ for any fixed \(\sigma_{0} \in \varSigma ^{\rho+}_{n}\), Lemma 3.5 can be extended to σ 0 chosen from \(\mu _{n}^{\rho}\). This completes the proof. □
3.5 The Drift of the Distance to Equiproportionality
Here we show that \(\widehat {S}_{t}\) has drift towards 0. Write S t+1=S t +ξ t+1 where we have that for i, j=1,…,q,
Then,
where \(h(s) \triangleq \sum_{k=1}^{q} g_{\beta }^{k}(s) s^{k}\). Notice that since \(g^{k}_{\beta}(s)=g^{k}_{\beta}(s- \widehat {\mathsf {e}})\) we have that \(h(s) = 1/q + h(\widehat {s})\) and its gradient and Hessian are:
where P is a projection matrix onto \(( \widehat {\mathsf {e}})^{\perp}\). Therefore, we may write
This gives
where p is defined in (3.11).
3.6 Contraction for the Distance to Equiproportionality
Fix β<q/2 and ρ<ρ 0 where ρ 0 is given in Lemma 3.5. For what follows, assume that \(\sigma_{0} \in \varSigma ^{\rho+}_{n}\) and \(t \leq \mathrm{e}^{\gamma _{0} n}\) where γ 0 is also given. Then, taking expectation in Eq. (3.15), we get:
Now by Taylor expansion of \(s \mapsto \| s \|_{2}^{3}\) around \(\mathbb {E}_{\sigma _{0}} \widehat {S}_{t}\) in view of (3.14),
Then
This will in turn imply:
Proposition 3.7
Fix β<q/2. There exist ρ 0=ρ 0(β,q)>0 and C=C(β,q)>0 such that if ρ≤ρ 0 there exists γ(ρ)>0 such that:
uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≤eγ(ρ)n, where p=p(n,β,q) is defined in (3.11).
Proof
Set \(\lambda _{t} \triangleq \mathbb {E}_{\sigma_{0}} \| \widehat {S}_{t} \|_{2}^{2}\). It follows from (3.17) and (3.5) that for any \(\rho< \overline {\rho} < \rho_{0}\), where ρ 0 is given in Lemma 3.5, there exists \(\gamma = \gamma (\rho, \overline {\rho }) > 0\) such that uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho+}\) and t≤eγn:
We now use the following fact, which can be easily verified. If (λ t ) t≥0 is a sequence satisfying:
for some p≠r, p≠1, a and b, then:
Apply this (with a=0) and use the monotonicity in λ t of the right hand side above (at least if n is large enough), to conclude:
Plugging this a priori bound back into (3.17) we see that:
Using (3.19) again and choose \(\overline {\rho}\) small enough to obtain
as desired. □
4 Mixing in the Subcritical Regime
In this section we prove Theorem 1. Recall that \(\alpha _{1} = \alpha _{1}(\beta , q) = \frac{1}{2 (1 - 2\beta /q )}\) and set
4.1 Proof of Lower Bound in Theorem 1
Proof
The analysis in this subsection pertains to all β<β c (q). Fix 0<ρ 2<ρ 1<ρ 0, where ρ 0 is given in Proposition 3.7 and let σ 0∈Σ n be such that \(\rho_{2} < \| \widehat {s}_{0} \|_{2} < \rho_{1}\). Then if \(t < t^{\alpha _{1}}_{\gamma }(n)\) and ρ 1 is small enough, Proposition 3.7 implies
for sufficiently large −γ depending on ρ 2 and large enough n. Combined with the uniform variance bound given in Lemma 3.5, it follows that for large enough n
Applying Chebyshev’s inequality and using Lemma 3.5 again, we conclude that uniformly in all r>0, \(t \leq t^{\alpha _{1}}_{\gamma }(n)\) and \(\sigma_{0} \in \varSigma _{n}^{\rho_{1}+} \setminus \varSigma _{n}^{\rho_{2}+}\)
In particular, this implies
On the other hand, \(\mathbb {E}_{\mu_{n}} S_{t} = \widehat {\mathsf {e}}\) and from Corollary 3.6 it follows that \(\operatorname {\mathbb {V}\mathrm {ar}}_{\mu_{n}} S_{t} = O (n^{-1} )\) for β<β c (q). Therefore another application of Chebyshev’s inequality yields that
for all t≥0. Altogether, we have that for any r>0,
and it remains to send r→∞. □
In the remainder of the section we prove the upper bound on the mixing time when β<β s (q). The proof is based on upper bounding the coalescence time of two coupled dynamics, one starting from any configuration in Σ n and the other starting from the stationary distribution μ n . This coupling will be done in several stages with different couplings from one stage to the next. In what follows, (σ t ) t≥0 and \((\widetilde {\sigma})_{t \geq0}\) will denote the two coupled processes.
4.2 O(n −1/2) from Coalescence
We now show that with arbitrarily high probability, S t gets O(n −1/2)-close to \(\widehat {\mathsf {e}}\) in O(nlogn) steps, if initially its distance is at most ρ, where ρ is small enough. More precisely,
Lemma 4.1
Fix β<q/2. Then for all r>0:
uniformly in \(\sigma_{0} \in \varSigma _{n}^{\rho_{0}+}\) where ρ 0=ρ 0(β,q) is defined in Proposition 3.7 and \(t^{\alpha _{1}}(n)\) is defined in (4.1).
Proof
This follows immediately from Proposition 3.7 and a first moment argument:
□
4.3 O(n −1) from Coalescence
To get the correct order of the mixing time it is not sufficient to simply use the drift to couple the chains as the drift is very weak when S t is close to \(\widehat {\mathsf {e}}\). As such, in this section we define a different coupling of the dynamics which will bring σ t and \(\widetilde {\sigma }_{t}\) to distance O(n −1) apart in linear time. This will be achieved one coordinate after the other. We begin by giving a general definition of, what we call, a semi-independent coupling and then use it to define the coupling of the dynamics.
Let ν, \(\widetilde {\nu}\) be two positive distributions on Ω m =[1,m] and fix a non-empty A⊆Ω m , where m is some positive integer. We shall write ν| A for the conditional distribution given A, i.e. ν| A (x)=ν(x)/ν(A), for x∈A. The A-semi-independent coupling of ν, \(\widetilde {\nu}\) is a coupling of two random variables X and \(\widetilde {X}\) with underlying measure ℙ⋆, constructed according to the following procedure:
-
(1)
Choose U∈[0,1] uniformly.
-
(2)
If \(U \leq\min \{\nu(A), \widetilde {\nu}(A) \}\), draw X and \(\widetilde {X}\) using a best coupling of \((\nu|_{A}, \widetilde {\nu}|_{A} )\).
-
(3)
Otherwise, independently:
-
(a)
Draw X according to ν| A if U<ν(A) and according to \(\nu|_{A^{c}}\) if U≥ν(A).
-
(b)
Draw \(\widetilde {X}\) according to \(\widetilde {\nu}|_{A}\) if \(U < \widetilde {\nu}(A)\) and according to \(\widetilde {\nu}|_{A^{c}}\) if \(U \geq \widetilde {\nu}(A)\).
-
(a)
Clearly a Ω m -semi-independent coupling is a best coupling and for A=Ø, we define Ø-semi-independent coupling to be the standard independent coupling. The following proposition states a few properties of this coupling, which will be useful for the sequel.
Proposition 4.2
The following holds for the A-semi-independent coupling of \((\nu ,\widetilde {\nu} )\):
-
(1)
X, \(\widetilde {X}\) are distributed according to ν, \(\widetilde {\nu}\) respectively.
-
(2)
\(\mathbb {P}^{\star }(\bigcup_{x \in A} \{ X = x \} \varDelta \{ \widetilde {X} = x \} )\leq \frac {3}{2} \sum_{x \in A} | \nu(x) - \widetilde {\nu}(x) | \).
-
(3)
\(\forall x \notin A,\ \mathbb {P}^{\star }(X = x,\,\widetilde {X} \neq x )\geq\nu(x)\widetilde {\nu}(A^{c} \setminus\{x\}) \) and \(\forall x \notin A,\ \mathbb {P}^{\star }(\widetilde {X} = x,\,X \neq x )\geq \widetilde {\nu}(x)\nu(A^{c} \setminus\{x\})\).
Proof
Part one of the lemma is immediate. Part (2) follows from a straightforward calculation:
As for part (3), we have:
and similarly for \(\mathbb {P}^{\star }(\widetilde {X}=x, X \neq x)\). □
We are now ready to define the coupling of σ t , \(\widetilde {\sigma }_{t}\) for this section. Fix \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}\) and y 1,…,y q−1>0. The coordinate-wise coupling with parameters y 1,…,y q−1 and starting configurations σ 0, \(\widetilde {\sigma}_{0}\) is defined as follows.
-
(1)
Set T (0)=0, k=1.
-
(2)
As long as k≤q−1:
-
(a)
As long as \(|S^{k}_{t} - \widetilde {S}^{k}_{t} |> \frac{y_{k}}{n}\):
-
(i)
Draw I t+1, \(\widetilde {I}_{t+1}\), using a {1,…,k−1}-semi-independent coupling of S t , \(\widetilde {S}_{t}\).
-
(ii)
Draw J t+1, \(\widetilde {J}_{t+1}\), using a {1,…,k−1}-semi-independent coupling of \(g_{\beta } (S_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )\), \(g_{\beta } (\widetilde {S}_{t} - \frac{1}{n} \mathsf {e}_{\widetilde {I}_{t+1}} )\).
-
(iii)
Change a uniformly chosen vertex of color I t+1 in σ t to have color J t+1 in σ t+1.
-
(iv)
Change a uniformly chosen vertex of color \(\widetilde {I}_{t+1}\) in \(\widetilde {\sigma}_{t}\) to have color \(\widetilde {J}_{t+1}\) in \(\widetilde {\sigma}_{t+1}\).
-
(v)
Set t=t+1.
-
(i)
-
(b)
When \(|S^{k}_{t} - \widetilde {S}^{k}_{t} |\leq\frac{y_{k}}{n}\) set T (k)=t−T (k−1) and k=k+1.
-
(a)
-
(3)
Set \(T^{CC} = \sum_{k=1}^{q-1} T^{(k)}\).
We shall use \(\mathbb {P}^{CC}_{\sigma_{0}, \widetilde {\sigma}_{0}}\) to denote the probability measure for this coupling and \(\mathbb {P}^{CC(m)}_{\sigma_{0}, \widetilde {\sigma }_{0}}\) for the same coupling, only with k=m instead of k=1 in step (1), i.e. starting from the m-th stage. Notice that, in principle, the stopping condition at stage k, may never get satisfied, in which case we stay at that stage forever and T (k)=T CC=∞.
For u,r>0, define
where above \((s,\widetilde {s}) = (S(\sigma), S(\widetilde {\sigma}))\). Finally, set \(\mathcal {H}_{u,r} \triangleq \mathcal {H}_{u,r}^{q}\). The following lemma will be the main ingredient in an inductive proof for an upper bound on T CC:
Lemma 4.3
Fix β<q/2. Let k∈[1,q−1]. For all u k−1,r k−1,ϵ>0, there exist y k ,u k ,r k ,γ k >0, such that if \((\sigma_{0}, \widetilde {\sigma}_{0}) \in \mathcal {H}^{k}_{u_{k-1}, r_{k-1}}\) then
Proof
Recall the expression for the drift of one coordinate (3.1). Near \(\widehat {\mathsf {e}}\), this becomes by Taylor expansion for any i∈[1,q]:
and if \(W_{t} = S_{t} - \widetilde {S}_{t}\) then
Now, for some r k >0 to be chosen later, let . Then, from (4.6) it follows that there exists y k >0 such that \(\overline {W}^{k}_{t} \triangleq |W^{k}_{t \land \tau^{(k)} \land T^{(k)}} |\) is a supermartingale. Clearly \(|\overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} |< \frac{2}{n}\). Also, from Proposition 4.2, if t<τ (k)∧T (k):
which implies that \(\mathbb {E}^{CC(k)}_{\sigma_{0}, \widetilde {\sigma}_{0}} [ (\overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} )^{2} |\mathcal {F}_{t} ]\geq \frac {1}{2} q^{-2}n^{-2} + O (n^{-5/2} )\). On the other hand, in view of (4.6), \(\mathbb {E}^{CC(k)}_{\sigma_{0}, \widetilde {\sigma}_{0}} [ \overline {W}^{k}_{t+1} - \overline {W}^{k}_{t} |\mathcal {F}_{t} ]= O(n^{-3/2})\). Combining the two bounds, we infer that there exists C>0, which doesn’t depend on r k or y k , such that on {t<τ (k)∧T (k)}
for n sufficiently large. We now apply Lemma 2.3 with \(Z_{t} = \overline {W}^{k}_{t}\), \(z_{0} = \frac{2 r_{k-1}}{\sqrt{n}}\) and N=τ (k)∧T (k). This gives for γ k >0:
whence we may choose γ k =γ k (r k−1,ϵ) independently of r k , y k but sufficiently large, such that
This gives an upper bound on T (k), since by Proposition 3.3 part (2) we may choose r k large enough such that:
It remains to ensure that we do not increase the distances in the first k−1 coordinates, by too much. Proposition 4.2 implies that for any t:
It follows that
Taking expectation of both sides and using the assumption on \(\| \overline {W}_{0}^{[1, k-1]} \|_{1}\), we have
Hence by Markov’s inequality, there exists u k >0 such that
Combined with (4.8) and (4.9), the proof is complete. □
Corollary 4.4
Fix β<q/2. For any ϵ,r>0, there exist γ,u,r′>0 and y 1,…,y q−1>0 such that for \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma ^{\frac{r}{\sqrt{n}}}_{n}\).
Proof
Starting from r 0=u 0=r and applying Lemma 4.3 inductively, we obtain for some (y k ,u k ,r k ,γ k ) k∈[1,q−1]:
It remains to set \(\gamma = \sum_{k=1}^{q-1} \gamma _{k}\), r′=r q−1 and u=2u q−1. □
4.4 Coalescence of Proportions Vector Chains
The next lemma completes the coupling of the proportions chains.
Lemma 4.5
Fix β<q/2. For all r,u,ϵ>0 there exists γ>0 such that if \(\sigma_{0}, \widetilde {\sigma}_{0} \in \varSigma _{n}\) satisfy \((\sigma_{0}, \widetilde {\sigma}_{0}) \in \mathcal {H}_{u, r}\) and t≥γn, then
where under \(\mathbb {P}^{SC}_{\sigma_{0}, \widetilde {\sigma}_{0}}\), the processes (σ t ) t≥0, \((\widetilde {\sigma}_{t})_{t \geq0}\) evolve according to the synchronized coupling, as defined in Sect. 3.3.
Proof
If ρ is small enough, it follows from Lemma 3.4 that
Combined with Proposition 3.3 part (1), this implies that there exists γ=γ(u) such that \(\mathbb {E}^{SC}_{\sigma_{0}, \widetilde {\sigma}_{0}} \| S_{t} - \widetilde {S}_{t} \|_{1} \leq\frac {\epsilon }{n}\) for t≥γn. Then by Markov’s inequality:
□
4.5 Basket-Wise Proportions Coalescence
The next coupling will allow us to turn a well-mixed proportions chains into a well-mixed configurations chain. Let \(\mathfrak {B}= (\mathcal {B}_{m})_{m=1}^{q}\) be a partition of [1,n]. We shall refer to \(\mathcal {B}_{m}\) as a basket and call \(\mathfrak {B}\) a λ-partition if \(|\mathcal {B}_{m}| > \lambda n\) for all m. Given σ∈Σ n , let S(σ) denote a q×q matrix whose (m,k) entry is equal to the proportion in σ of color k in basket m, namely
S is an element of \(\mathbb {S}\triangleq \prod_{m=1}^{q} \mathcal {S}\) and we define \(\mathbb {S}^{\rho} = \prod_{m=1}^{q} \mathcal {S}^{\rho}\) and \(\mathbb {S}^{\rho+} = \prod _{m=1}^{q} \mathcal {S}^{\rho+}\). We also let \(\mathcal {B}_{[m_{0}, m_{1}]} = \bigcup_{m_{0} \leq m \leq m_{1}} \mathcal {B}_{m}\) and as before use S t as a shorthand for S(σ t ). The following is an analogue of Lemma 4.1 for the basket proportions matrix.
Lemma 4.6
Let \(\mathfrak {B}\) be a λ-partition for some λ>0. If either of the following holds:
-
(1)
\(\sigma_{0} \in \varSigma _{n}^{\rho_{0}}\) and \(t^{\alpha _{1}}(n) \leq t \leq \mathrm{e}^{\gamma (\rho _{0}) n}\) where ρ 0, γ(ρ 0) are given in Proposition 3.7 and \(t^{\alpha _{1}}(n)\) is defined in (4.1).
-
(2)
\(\mathbf {S}_{0} \in \mathbb {S}^{\frac{r_{0}}{\sqrt{n}}}\) and t≤γ 0 n for some r 0,γ 0>0,
then
where the O(r −2) term is as r→∞, uniformly in n.
In order to prove Lemma 4.6, we use the following proposition to bound the second moment of the basket proportions matrix.
Proposition 4.7
If \(\mathfrak {B}\) is a λ-partition for λ>0 and m,k∈[1,q], then
Proof
Let \(\lambda _{0} = |\mathcal {B}_{m}|/n \geq \lambda > 0\) and set \(\mathbf {Q}^{m,k}_{t} = \mathbf {S}^{m,k}_{t} - S^{k}_{t}\). Then:
where (denoting by V t+1 the chosen vertex at step t+1):
Plugging these into (4.11), we obtain
as required. □
Proof of Lemma 4.6
Taking expectation in (4.10) and applying (3.19) one gets:
In both cases (note that \(\alpha _{1}(\beta , q) > \frac {1}{2}\) for all β<q/2), it implies \(\mathbb {E}_{\sigma_{0}} [\mathbf {S}^{m,k}_{t} - S_{t}^{k} ]^{2} = O(n^{-1})\). Summing over all m and k and using Markov’s inequality we get
Now, by Proposition 3.7 Case (1) and Proposition 3.3 Case (2) we also have that
Combining the two, we complete the proof. □
Suppose now that you have two initial configurations \(\sigma_{0}, \widetilde {\sigma }_{0}\), such that \(s_{0} = \widetilde {s}_{0}\). The following is a coupling under which eventually (with probability 1) also \(\mathbf {S}_{t} = \widetilde {\mathbf {S}}_{t}\). Equality is achieved one basket at a time, indexed below by m and once the proportions in a basket are equated they remain so. We shall call this coupling Basket-wise Coupling and denote by ℙBC the underlying probability measure.
-
(1)
Set t=0, m=1.
-
(2)
As long as m≤q:
-
(a)
As long as \(\mathbf {S}_{t}^{m} \neq \widetilde {\mathbf {S}}_{t}^{m}\):
-
(i)
Choose “old” color I t+1 according to distribution \(S_{t} = \widetilde {S}_{t}\).
-
(ii)
Choose “new” color J t+1 according to distribution \(g_{\beta } (S_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )= g_{\beta } (\widetilde {S}_{t} - \frac{1}{n} \mathsf {e}_{I_{t+1}} )\).
-
(iii)
Choose a vertex V t+1 uniformly among all vertices in [1,n] having color I t+1 under σ t .
-
(iv)
Choose \(\widetilde {V}_{t+1}\):
-
(A)
If \(V_{t+1} \in \mathcal {B}_{m_{0}}\) for m 0<m, choose \(\widetilde {V}_{t+1}\) uniformly among all vertices in \(\mathcal {B}_{m_{0}}\) having color I t+1 under \(\widetilde {\sigma}_{t}\).
-
(B)
Otherwise, if \(\mathbf {S}_{t}^{m, I_{t+1}} \neq \widetilde {\mathbf {S}}_{t}^{m, I_{t+1}}\) and \(\mathbf {S}_{t}^{m, J_{t+1}} \neq \widetilde {\mathbf {S}}_{t}^{m, J_{t+1}}\), choose \(\widetilde {V}_{t+1}\) uniformly among all vertices in \(\mathcal {B}_{[m, q]}\) having color I t+1 under \(\widetilde {\sigma}_{t}\).
-
(C)
Otherwise, let v 1,v 2,… be an enumeration of the vertices in \(\mathcal {B}_{[m, q]}\) having color I t+1 under σ t ordered first by the index of the basket they belong to and then by their index in V and let \(\widetilde {v}_{1}, \widetilde {v}_{2}, \dots\) be the same for \(\widetilde {\sigma}_{t}\). Then set \(\widetilde {V}_{t+1}=\widetilde {v}_{i}\) where i is such that V t+1=v i .
-
(A)
-
(v)
Set \(\sigma_{t+1}(V_{t+1}) = \widetilde {\sigma}_{t+1}(\widetilde {V}_{t+1}) = J_{t+1}\) and t=t+1.
-
(i)
-
(b)
Set m=m+1.
-
(a)
The following lemma gives an upper bound for the time of basket-wise proportions coalescence.
Lemma 4.8
Fix β<q/2. For any λ>0, r>0, ϵ>0, there exists γ=γ(λ,r,ϵ), such that for any λ-partition and any \(\sigma_{0}, \widetilde {\sigma_{0}}\) such that \(S_{0} = \widetilde {S}_{0}\) and \(\mathbf {S}_{0}, \widetilde {\mathbf {S}}_{0} \in \mathbb {S}^{\frac{r}{\sqrt{n}}}\),
Proof
From the definition of the coupling, once the proportions of basket m have coalesced they will remain equal forever. It suffices, therefore, to analyze the coalescence time of each basket separately. Note also that the coupling preserves the equality \(S_{t} = \widetilde {S}_{t}\) for all t≥0.
Define \(\mathbf {W}_{t} = \mathbf {S}_{t} - \widetilde {\mathbf {S}}_{t}\), \(W^{m}_{t} = \| \mathbf {W}^{m}_{t} \|_{1}\) and let τ (0)=0 and \(\tau^{(m)} = \min\{t \geq\tau^{(m-1)}: W^{m}_{t} = 0\}\) for m∈[1,q]. Also set
for ρ>0 sufficiently small and \(\tau^{(m)}_{*} = \tau^{(m)} \land \tau_{*}\). We claim that for all m, \((W^{m}_{t})_{t \geq0}\) is a supermartingale between τ (m−1) and τ (m) as long as τ ∗ is not reached. In order to see this, fix m, t and assume \(\{\tau^{(m-1)} \leq t < \tau^{(m)} \; ,\, \tau _{*} > t\}\). Then at step (2(a)iv), according to the coupling, there are 3 cases:
-
(A)
Clearly \(\mathbf {S}^{m}_{t+1} = \mathbf {S}^{m}_{t}\) and \(\widetilde {\mathbf {S}}^{m}_{t+1} = \widetilde {\mathbf {S}}^{m}_{t}\) and hence \(W^{m}_{t+1} = W^{m}_{t}\).
-
(B)
Notice that in this case, we have \(\mathbf {W}_{t}^{m, I_{t+1}} \mathbf {W}_{t+1}^{m, I_{t+1}} \geq0\) and \(\mathbf {W}_{t}^{m, J_{t+1}} \mathbf {W}_{t+1}^{m, J_{t+1}} \geq0\). Therefore,
-
(C)
If \(V_{t+1}, \widetilde {V}_{t+1} \in \mathcal {B}_{m}\) or \(V_{t+1}, \widetilde {V}_{t+1} \notin \mathcal {B}_{m}\), then \(W^{m}_{t+1} = W^{m}_{t}\), otherwise from the construction we must have:
$$\bigl |\mathbf {W}^{m, I_{t+1}}_{t+1}\bigr| - \bigl |\mathbf {W}^{m, I_{t+1}}_{t}\bigr| = - \frac {1}{|\mathcal {B}_m|} , $$as well as
$$\bigl |\mathbf {W}^{m, J_{t+1}}_{t+1}\bigr| - \bigl |\mathbf {W}^{m, J_{t+1}}_{t}\bigr| \leq \frac {1}{|\mathcal {B}_m|} . $$Summing these two, we obtain a non-positive drift for \(W^{m}_{t}\).
Observe that as long as τ ∗ is not reached, both \(\operatorname {\mathbb {V}\mathrm {ar}}^{BC}(W^{m}_{t+1} | \mathcal {F}_{t})\) under Case (B) and the probability that this case happens are bounded below uniformly in n and t. This gives a uniform lower bound on the variance \(\operatorname {\mathbb {V}\mathrm {ar}}^{BC}(W^{m}_{t+1} | \mathcal {F}_{t})\). Furthermore, if for some t, we have S t , \(\widetilde {\mathbf {S}}_{t} \in \mathbb {S}^{\frac{r^{\prime}}{\sqrt{n}}}\), then in view of Lemma 4.6 after γ′n time, we have S t+γ′n , \(\widetilde {\mathbf {S}}_{t+\gamma ^{\prime} n} \in \mathbb {S}^{\frac{r^{\prime\prime}}{\sqrt{n}}}\) with probability 1−O((r′′)−2). Therefore, using Lemma 2.3 we may find γ 1,…,γ q−1 such that inductively, conditioned on \(\tau ^{(m-1)}_{*} \leq \gamma _{m-1} n\) with probability at least 1−ϵ/(2q) we have \(\tau^{(m)}_{*} \leq \gamma _{m} n\). This in turn implies that \(\tau ^{(q-1)}_{*} \leq \gamma n\) with probability at least 1−ϵ/2, where γ≜γ q−1.
It remains to bound τ ∗ below with high probability. Let \(B^{m,j} = \bigcup_{t = 1}^{\gamma n}\{ |\mathbf {S}_{t}^{m, j} - 1/q | \geq \rho\}\) and
Using Lemma 4.6 we obtain that
Then as B m,j implies that \(Y^{m,j} > \frac{n\lambda\rho}{2}\),
Summing over all m, j and arguing the same for \(\widetilde {\mathbf {S}}_{t}\) we obtain
Finally by a union bound we have \(\mathbb {P}_{\sigma_{0},\widetilde {\sigma}_{0}}^{BC} (\tau^{(q)} \leq\gamma n) \geq1-\frac {\epsilon }{2} + O(n^{-1})\) as desired. □
4.6 The Overall Coupling
We now describe precisely how the previous couplings are combined together to create the overall coupling. This coupling will be the main tool in proving the upper bound. Formally, let γ 1,γ 3,γ 4,γ 5 and y 1,…,y q−1 be positive numbers and σ 0∈Σ n . The overall coupling with parameters γ 1,…,γ 5, y 1,…,y q−1 and initial configuration σ 0 is a coupling of two chains (σ t ) t , \((\widetilde {\sigma}_{t})_{t}\) under measure \(\mathbb {P}^{OC}_{\sigma_{0}}\). The initial configuration for (σ t ) t is σ 0, while \(\widetilde {\sigma }_{0}\) is chosen according to μ n . Then, the two processes evolve as follows.
-
(1)
Run σ t and \(\widetilde {\sigma}_{t}\) independently until time t (1)(n)=γ 1 n.
-
(1A)
Partition the vertex set [1,n] into baskets \(\mathfrak {B}= (\mathcal {B}_{1}, \dots, \mathcal {B}_{q})\) such that for k∈[1,q].
-
(1A)
-
(2)
Run σ t and \(\widetilde {\sigma}_{t}\) independently (again) until time \(t^{(2)}(n) = t^{(1)} + t^{\alpha _{1}}(n)\) time where \(t^{\alpha _{1}}(n)\) is defined in (4.1).
-
(3)
Run σ t and \(\widetilde {\sigma}_{t}\) according to the coordinate-wise coupling with parameters y 1,…,y q−1 until time t (3)(n)=t (2)(n)+γ 3 n (unless stopped before).
-
(4)
Run σ t and \(\widetilde {\sigma}_{t}\) according to the synchronized coupling until time t (4)(n)=t (3)(n)+γ 4 n time.
-
(5)
Run σ t and \(\widetilde {\sigma}_{t}\) according to the basket-wise coupling for t (5)(n)=t (4)(n)+γ 5 n time with the baskets above.
4.7 Proof of Upper Bound in Theorem 1
We will now use the overall coupling with appropriate parameters to establish the upper bound of the mixing time. Recall that β<β s (q). Fix ϵ>0, pick ρ>0 small enough and let σ 0 be any initial configuration. By Proposition 3.3 part (3), we can choose γ 1 large enough such that
Assuming that this event indeed occurred, \(\mathfrak {B}\) is a \(( \frac {1}{q} - \rho )\)-partition and provided that ρ is small enough, the conditions in Lemma 4.1 are satisfied. From the latter we conclude that for some r>0, with probability at least 1−ϵ, \(S_{t^{(2)}(n)} \in \mathcal {S}^{\frac{r}{\sqrt{n}}}\). On the other hand, as in (4.4) with probability at least 1−2ϵ we also have \(\widetilde {S}_{t^{(2)}(n)} \in \mathcal {S}^{\frac{r}{\sqrt{n}}}\) if r is large enough. Then Corollary 4.4 and Lemma 4.5 ensure that there exist y 1,…,y q−1 and γ 3, γ 4, such that \(S_{t^{(4)}(n)} = \widetilde {S}_{t^{(4)}(n)}\) with probability at least 1−3ϵ. From Lemma 4.6 we have that \(\mathbf {S}_{t^{(4)}(n)}, \widetilde {\mathbf {S}}_{t^{(4)}(n)} \in \mathbb {S}^{\frac{r^{\prime }}{\sqrt{n}}}\) with probability at least 1−4ϵ for some r′>0. Then, by Lemma 4.8 we may choose γ 5 such that \(\mathbf {S}_{t^{(5)}(n)} = \widetilde {\mathbf {S}}_{t^{(5)}(n)}\) with probability at least 1−5ϵ.
Now, by symmetry, for any t≥t (1)(n) the distribution of σ t , given \(\mathcal {F}_{t^{(1)}(n)}\), is invariant under permutations of the vertices in each basket of \(\mathcal {B}\) and the same is clearly true for μ n . Therefore we conclude that
Then from Jensen’s inequality we obtain
Now \(t^{(5)}(n)=t^{\alpha _{1}}_{\gamma }(n)\) (as defined in (4.1)) with γ=γ 1+γ 3+γ 4+γ 5 and since σ 0 is arbitrary and ϵ can be made arbitrarily small, by choosing γ large enough, this establishes the upper bound for the cutoff.
5 Mixing in the Supercritical Regime
5.1 Proof of Theorem 3
We first give the proof for the case β s <β<β c . Recall (Sect. 2.2) that \(\beta _{c}(q) = \frac{(q-1)}{q-2}\log(q-1)\) for q≥3 and β c (2)=1. We claim that for q≥3
It can be checked for q=3 and for q≥4, it suffices to prove that \(f(x) := \frac{x-1}{x (x-2)} \log(x-1)\) is decreasing in x on [3,∞). We compute the derivative and obtain that
which is negative for x≥3.
Now fix δ>0 and notice that if s 1∈[1/q,1−δ], conditional on {S 1=s 1}, (S i/(1−s 1):2≤i≤q) is distributed as the proportions vector for the (q−1)-states Curie-Weiss Potts model on (1−s 1)n vertices with β′=β(1−s 1)≤(1−1/q)β≤(1−1/q)β c (q)<β c (q−1). Therefore, for all δ 1>0
as n→∞ uniformly in s 1∈[1/q,1−δ]. Also uniformly in \(s \in \mathcal {S}_{n}\), recall that:
where \(d_{\beta }(s) = -s^{1} + g_{\beta }^{1}(s)\). It now follows from the uniform continuity of d β (s) and (5.2) that uniformly in s 1∈[1/q,1−δ],
where \(D_{\beta }(s^{1}) = d_{\beta }(s^{1}, \frac{1-s^{1}}{q-1}, \dots, \frac {1-s^{1}}{q-1} )\).
Now if β>β s (q), from Proposition 3.1, there exists δ 2, such that D β (s 1) is uniformly positive in a δ 2-neighborhood of s ∗(β). All together we infer that there exists ϵ>0 such that uniformly in s 1∈(s ∗(β)−δ 2,s ∗(β)+δ 2) for all n large enough:
and also
for j∈−1,0,1 where the last inequality follows from the concentration of the conditioned measure as well as the continuity of the probability to stay put. These two formulas together imply that
for some fixed constant λ>1 for all s 1∈(s ∗(β)−δ 2,s ∗(β)+δ 2) when n is sufficiently large. Since (S t ) t≥0 is a reversible Markov chain, with μ n its stationary measure,
and therefore, for all s 1∈(s ∗(β)−δ 2,s ∗(β)+δ 2)
and hence
Now select the set A={S 1≥s ∗(β)−δ 2}. By (5.3), \(\frac{\mu_{n}(\partial_{P_{n}} A)}{\mu_{n}(A)} \leq \lambda ^{-\delta_{2} n}\), where
and P n is the transition kernel of the Glauber dynamics. Since β<β c (q) and s ∗(β)−δ 2>1/q we also have μ n (A)=o(1) as n→∞. Therefore Cheeger’s inequality (Theorem 2.5) immediately implies an exponential lower bound on the mixing time.
The case β≥β c (q) is simpler. As the large deviations analysis in Sect. 2.2 shows, we may find \(A = \{ \| S - \check{s}_{\beta , q} \|_{2} < \delta \}\), where \(\check{s}_{\beta , q}\) is defined in (2.4) and δ>0 is small enough such that \(\limsup_{n \to\infty} n^{-1} \log\mu_{n}(\partial_{P_{n}} A)<0\) and lim inf n→∞ n −1logμ n (A)=0. Since symmetry implies μ n (A)≤1/q (if δ is sufficiently small), exponential mixing time follows immediately from another application of Cheeger’s inequality (Theorem 2.5).
6 Mixing Near Criticality
We now assume β(n)=β s (q)−ξ(n), with ξ(n)→0 as n→∞. Once β(n) approaches β s with n, we no longer have a uniform negative upper bound on the drift to the right of 1/q for each coordinate. Instead, near s ∗(β), the drift will be of order ξ(n), possibly even positive and hence it will take longer than linear time to get close to \(\widehat {\mathsf {e}}\) and this may have an effect on the order of the mixing time and cutoff window. Accordingly, in addition to the coalescence time analysis near \(\widehat {\mathsf {e}}\), one has to obtain sharp asymptotics for the passage time near s ∗(β). This is achieved using several propositions which we state in Sect. 6.1. Their proofs will be deferred until the end of the section in favor of first showing how they are used along with the previous coalescence analysis to find the mixing time near criticality which gives the proof of Theorem 2.
Both the analysis and the results in Theorem 2 are qualitatively different, depending on whether ξ(n) decays faster or slower than some threshold rate. Accordingly, we shall distinguish between two regimes and write:
([CR] stands for Cutoff Regime and [NCR] stands for No-Cutoff Regime). For a>0, define also
Both (6.1) and (6.2) will be used for sequences other than ξ as well. We shall also employ the following notation for hitting times. Given a real-valued process (X t ) t≥0 and a number x∈ℝ we shall write
for the right and left hitting time of X at x. Notice that this notation does not carry an indication for the process for which \(\tau _{x}^{+}\) is a hitting time and in case this is not clear from the context, it will be mentioned explicitly.
6.1 Drift Analysis Near s ∗(β)
The following proposition states several properties of the function D β near s ∗(β).
Proposition 6.1
For all q≥3 the following holds:
-
(1)
The point s ∗(β s ) is the unique \(s\in(\frac{1}{q},1]\) such that \(D_{\beta _{s}}(s)=0\).
-
(2)
For k=0,1,… , the functions \(D^{*}_{k}(\beta ) \triangleq \frac{d^{k}}{ds^{k}} D_{\beta }(s^{*}(\beta ))\) are C ∞ in a neighborhood of β s . Furthermore:
-
\(\frac{d}{d \beta } D^{*}_{0}(\beta _{s}) > 0\).
-
\(D^{*}_{2}(\beta _{s}) < 0\).
-
-
(3)
For all ρ>0, there exists δ>0 such that:
(6.3)
The next lemma gives sharp asymptotics for the passage time near 0 for a process with certain drift assumptions near 0 (given by (6.4) below). The one coordinate process will fall into this category if we analyze it near s ∗(β).
Formally, let \(((Z_{t})_{t \geq0}^{n} \ ; \; n \geq0 )\) be a sequence of discrete time processes. For all n, suppose that \((Z_{t} )_{t \geq0} = (Z_{t} )_{t \geq 0}^{n}\) is adapted to \((\mathcal {F}_{t} )_{t \geq0}^{n}\), satisfies n|Z t+1−Z t |∈{−1,0,1} with probability 1, and
where a>0, b∈ℝ and ζ(n) is a sequence satisfying ζ(n)→0 as n→∞. We allow both ζ∈[CR] and ζ∈[NCR], but in the latter case, we assume in addition the existence of d>0 such that for all n
Write \(\mathbb {P}_{z_{0}}\) for the probability measure under which this process is defined and starts from z 0.
Lemma 6.2
Fix ρ>0 sufficiently small. Then for z 0=−ρ there exist functions L ∗,U ∗:(−∞,∞)→[0,1] satisfying lim γ→−∞ L ∗(γ)=lim γ→∞ U ∗(γ)=0 such that for all γ,
where \(\tau_{\rho}^{+}\) is a hitting time for Z. Moreover, if ζ∈[NCR] we can chose U ∗ such that for all γ we have
Remark 6.3
The upper (lower) bound in the lemma still holds if (Z t ) t≥0 satisfies (6.4) with ≥ (≤) in place of the equality sign or if in place of z 0=−ρ we have z 0≥−ρ (z 0≤−ρ). Since (Z t ) t≥0 has \(0, \pm \frac {1}{n}\) steps this can be shown by a simple coupling argument.
The next proposition shows that the drift of one coordinate stays close to its upper bound D β (⋅) for sufficiently long time. More precisely, for σ 0∈Σ n , t>0, δ>0, y∈[0,1] let
where \(\tau_{y}^{-}\) is a hitting time for \(S^{1}_{t}\). Then,
Proposition 6.4
Suppose that β≤q/2 and set σ 0≡1. Then for any \(y > \frac {1}{q}\):
-
(1)
If t(n)=o(n 2) and δ(n)n 2 t(n)−1→∞ then lim n→∞ K n (σ 0,t(n),y,δ(n))=0.
-
(2)
If t(n)=γn 4/3 then for all δ>0 we have
$$\lim_{\gamma \to0} \limsup_{n \to\infty} K_n\bigl(\sigma_0, t(n), y, \delta n^{-2/3}\bigr) = 0. $$
6.2 Proof of Theorem 2
6.2.1 Upper Bound on Mixing Time
Fix ρ>0 small enough and let σ 0∈Σ n be given. By Proposition 6.1 part (1), we can find δ>0 so that
where we use s ∗ in place of s ∗(β). Then by Lemma 2.1 part (1), we have that,
where this and all hitting times below are of \(S_{t}^{1}\). Define now \(Z_{t}=s^{*} - S^{1}_{t+\tau_{(s^{*}+\rho)}^{-}}\). Using (3.1), (3.3) Proposition 6.1 and applying Taylor’s expansion for D β (s) around s ∗ and then again for \(D_{0}^{*}(\beta )\) around β s , we infer that there exist a>0, α≠0, b∈ℝ such that
where ζ(n)=αξ(n)+O(ξ(n)2+n −1) and also (6.5) holds (if needed), since the probability of choosing any new color at time t+1 is bounded above and below, uniformly in n and S t . Hence by Lemma 6.2 and Remark 6.3, for all γ
Now, using the relation between ζ(n) and ξ(n), it is not difficult to verify that \(t^{\zeta , a}_{\gamma }(n) \leq t^{\xi, a^{\prime}}_{\gamma ^{\prime}}(n)\) for all γ′, where a′=αa and γ=F(γ′) for some F such that γ→∞ if γ′→∞.
From Lemma 2.1 part (3), applied to the process \((S^{1}_{t+\tau_{(s^{*}-\rho)}^{-}} - (s^{*} - \rho ): \; t \geq0 )\), it follows that with 1−o(1) probability \(S^{1}_{t+\tau_{(s^{*}-\rho )}^{-}}\) stays to the left of s ∗−ρ/2 for all t<n 2. Then we may apply Lemma 2.1 part (1) to the process to conclude
Finally another application of Lemma 2.1 part (3) gives \(S^{1}_{t+\tau_{(q^{-1}+\rho/2)}^{-}} \leq1/q + \rho\) for all t<n 2 with 1−o(1) probability. For the [CR] case, we use union bound (over all coordinates):
For the [NCR] case, define \(\tau^{(1)} = \tau_{(q^{-1}+\rho/2)}^{-}\) and for k>1. Then, by inductive conditioning we obtain
Since also \(S^{k}_{t+\tau^{(k)}} \leq1/q + \rho\) for all k∈[1,q], t<n 2 with 1−o(1) probability, we arrive to
We now re-employ the overall coupling in Sect. 4.6, but in view of (6.11) and (6.12) we change step (1) and instead of running the two chains for γ 1 n time, we run them for \(t^{(1)}(n) = t_{\gamma ^{\prime}}^{\xi,a^{\prime}}(n)\). As (6.11), (6.12) show, we can choose γ′ large enough such that \(\mathbb {P}^{OC}_{\sigma_{0}} (S_{t_{\gamma ^{\prime}}^{\xi, a^{\prime}}(n)} \notin \mathcal {S}^{\rho+}) \leq \epsilon \) for n sufficiently large. The remaining steps in the coupling are left unchanged and we choose the same parameter values, as in the proof of Theorem 1.
Using the analysis of the modified step (1) given by (6.11) and (6.12), together with the analysis in Sect. 4.7 of the remaining steps—which carries over (uniformly in β near β s (q)), since it only required β<β c (q), we recover (4.14), namely
The time is now given by
for some γ>0. Since σ 0 is arbitrary and ϵ can be made arbitrarily small, by having γ, γ′ large enough, this completes the proof for the upper bound in (1.4) and (1.5) with \(\alpha _{2} = \pi /\sqrt {\alpha a}\).
6.2.2 No Cutoff in NCR Case
Using the modified overall coupling as introduced above, we obtain from (4.14), (6.12) and (6.8) for any γ′ and sufficiently large γ
for all σ 0, large enough n and some ϵ>0. Then, since in the [NCR] case
this shows that there is no cut-off.
6.2.3 Lower Bound on Mixing Time.
Fix ρ>0 small enough and start with σ 0≡1—the all ‘1’ configuration. Define:
where A(n) is a sequence tending to ∞ sufficiently slowly and δ 1>0. Set
and define the process Y t which is equal to \(S_{t}^{1}\) up to time N, but after this time evolves like a birth-and-death processes with ±1/n increments and drift \(-n^{-1} D_{\beta }(S^{1}_{t})\). Then (Z t ≜s ∗−Y t :t≥0) satisfies
with a, b, α as in the upper bound case, but with ζ(n)=αξ(n)+δ(n)+O(ξ(n)2+n −1) and condition (6.5) holds (if needed) as before. Then, using Lemma 6.2 and Remark 6.3, we have for n large enough \(\mathbb {P}_{\sigma_{0}} (\tau_{\rho}^{+} < t^{\zeta , a}_{\gamma }(n) )\leq2 L^{*}(\gamma )\), where \(\tau_{\rho}^{+}\) is a hitting time for Z and γ∈ℝ. As before, it is not difficult to verify that if A(n) is increasing slowly enough, \(t^{\zeta , a}_{\gamma }(n) \geq t^{\xi, a^{\prime }}_{\gamma ^{\prime}}(n)\), where γ=F(γ′) satisfies γ→−∞ if γ′→−∞.
Now define and τ′ as \(\tau_{\rho}^{+}\), only with \(S_{t}^{1}\) in place of Y t . Then
where K n is defined in (6.9). Then, if ρ is sufficiently small, we can use (4.2) for \(\widehat {S}_{t}\) starting from time T to obtain for all r>0 and γ′′:
Using Proposition 6.4 for the middle term, the last inequality gives (4.3) with \(t^{\alpha _{1}}_{\gamma ^{\prime\prime}}(n) + t^{\xi, a^{\prime}}_{\gamma ^{\prime }}(n)\) in place of \(t^{\alpha _{1}}_{\gamma }(n)\). The remaining of the proof is identical to the subcritical case and this shows the lower bound for both parts of Theorem 2 with \(\alpha _{2} = \pi /\sqrt{\alpha a}\).
6.3 Proofs for Sect. 6.1
Proof of Proposition 6.1
First observe that for all β, \(D_{\beta }(\frac{1}{q})=0\) and for all \(s>\frac{1}{q}\),
Now since D β (s) is smooth as a function of s and β and \(d_{0}(s)=-s+\frac{1}{q}\) it follows that β s >0. We have that
and so
which implies that d q/2(s)>0 when \(s\in(\frac{1}{q},\frac{1}{q}+\epsilon )\) for some small ϵ. This implies that β s <q/2. It follows that
and so D β (s)<0 when β∈[β s ,β s +ϵ] and \(s\in (\frac{1}{q},\frac{1}{q}+\epsilon)\) for some small ϵ. It follows by compactness then that for some \(\frac{1}{q} + \epsilon\leq s^{*}(\beta _{s})\leq 1\) that \(D_{\beta _{s}}(s^{*}(\beta _{s}))=0\). By the definition of β s and since D β (s) is smooth we have that
The equation \(\frac{d}{ds} D_{\beta _{s}}(s)= 0\) is equivalent to
which is a quadratic equation in \(\mathrm{e}^{-\frac{2\beta _{s} q}{q-1}(s-\frac{1}{q})}\) and hence has at most 2 solutions which we denote s 1,s 2 with s 1<s 2. Since \(\frac{d}{ds} D_{\beta _{s}}(0)<0\) then \(D_{\beta _{s}}(s_{1})<0\) and so s ∗(β s )=s 2. In particular this implies that s ∗(β s ) is the unique \(s\in(\frac{1}{q},1]\) such that \(D_{\beta _{s}}(s)=0\). Also it follows that \(\frac{d}{ds} D_{\beta _{s}}(s)>0\) for s∈(s 1,s ∗(β s )) and that there exists s′∈(s 1,s ∗(β s )) such that \(\frac {d^{2}}{ds^{2}} D_{\beta _{s}}(s')=0\). Since by Eq. (6.13) there is at most one s such that \(\frac {d^{2}}{ds^{2}}D_{\beta _{s}}(s) = 0\) it follows that
Hence by the Inverse Function Theorem s ∗(β) is a smooth function of β when β is in a small neighborhood of β s . Then we have that
since \(\frac{d}{ds}D_{\beta _{s}}(s)|_{s = s^{*}(\beta _{s})} = 0\) which completes the proof of the second part.
We now turn to prove the third part. As we have observed \(D_{\beta _{s}}(s)\) is a smooth function satisfying
Therefore, we deduce that for any ρ>0, there exists δ 1>0 such that
Since \(\frac{d^{2}}{ds^{2}}D_{\beta _{s}}(s) |_{s=s^{*}(\beta )} < 0\) for all ρ>0, there exists δ 2>0 such that
Combined with (6.16), it follows that
Now that D β (s) can be viewed as a continuous function of (β,s) and by compactness, there exists δ 3>0 such that \(|D_{\beta }(s) - D_{\beta _{s}}(s)| \leq\delta_{1}\) for all s∈[1/q,1] and |β−β s |≤δ 3. Combined with (6.17), it completes the proof by taking δ=δ 1∧δ 2∧δ 3. □
Proof of Lemma 6.2
We do not lose anything by assuming that
for some a>0, b, c with ρ 0=2ρ and that once Z t exits [−ρ 0,+ρ 0] it is stopped. Indeed, having f vanish outside of [−ρ 0,+ρ 0], does not change the asymptotics of the passing time. Clearly, this is the case for z>+ρ 0. For z<−ρ 0, it follows from
for all γ, which is a consequence of Lemma 2.1 part (3) since the drift of Z t is at least \(\frac {c}{n}\) on [−ρ 0,−ρ] for some positive c uniformly in n (if ρ is small enough).
As for replacing the error term in (6.4) by \(c Z_{t}^{4}\), as the proof below shows, the functions U ∗, L ∗ in the lemma restricted to condition (6.18) can be chosen to be continuous in a in a small interval [a 0−ϵ,a 0+ϵ] and the limits (6.6), (6.7) hold uniformly in a in this interval. This together with Remark 6.3 implies the existence of U ∗, L ∗ under which (6.6), (6.7) hold in the general case (6.4) with \(t_{\gamma }^{\zeta , a+O(\zeta (n))}(n)\). Now, it is not difficult to see that the latter is bounded above and below by \(t_{\gamma \pm C}^{\zeta , a}(n)\) for some C>0 and hence (6.6), (6.7) hold with \(t_{\gamma }^{\zeta , a}(n)\). Similar considerations apply for (6.8).
Set
The motivation behind the above definitions comes from a continuous time deterministic analog of (6.18) in the form of an ODE
for which z(t)=Ψ −1(t−t 0) is a solution (roughly speaking Y t measures how far behind or ahead “in schedule” Z t is, judging from its position).
Start with the [CR] case and set
In the deterministic setting the time it takes for z(t) to pass from z(0)=−ρ to ρ is
if ρ is small enough. This will be shown in Proposition 6.5 below. Thus, bounding the passage time \(\tau _{\rho}^{+}\) around t ζ,a(n) can be achieved by bounding \(|Y_{t^{\zeta , a}(n)}|\).
Accordingly, let Y t =M t +A t be the Doob-decomposition of Y t , with M t a zero-mean martingale and A t the predictable process. The next proposition will allow us to bound Y t . The proof of this proposition will be deferred to the end of this section.
Proposition 6.5
If ρ is small enough and ζ∈[CR] then
-
(1)
Ψ(ρ)−Ψ(−ρ)=t ζ,a(n)+O(n).
-
(2)
\(\mathbb {E}M^{2}_{t_{\gamma}^{\zeta , a}(n)} = O( w^{\zeta }(n)^{2})\).
-
(3)
\(A_{t_{\gamma }^{\zeta , a}(n)} = o(w^{\zeta }(n))\) with probability 1.
Now using the monotonicity of Ψ in [−ρ 0,+ρ 0] we have
where the last inequality is a second moment bound. This shows (6.6). For the lower bound, if −γ is large enough, we may write
where the last inequality follows from Doob’s inequality. This shows (6.7)
Next, we address the [NCR] case. We can no longer use the means analysis (6.20) throughout the entire passage interval [−ρ,ρ] as Z t is not concentrated around its mean near 0. Accordingly, we analyze the passage time in each of the following segments separately:
for some r>0 to be chosen later.
We start with the upper bound. For the sequel, let w=rn −1/3. The upper bound will follow if we show the following:
-
(1)
Segment [−ρ,−w]. For any γ,
$$ \lim_{r \to\infty} \limsup_{n \to\infty} \mathbb {P}_{-\rho} \bigl (\tau^+_{-w} > t^{\zeta , a}_{\gamma }(n)\bigr) = 0. $$(6.21) -
(2)
Segment [+w,+ρ]. For any γ,
$$ \lim_{r \to\infty} \limsup_{n \to\infty} \mathbb {P}_{w} \bigl(\tau^+_{\rho} > t^{\zeta , a}_{\gamma }(n)\bigr) = 0. $$(6.22) -
(3)
Segment [−w,+w]. For any r>0, there exists u:ℝ→[0,1) such that
$$ \limsup_{n \to\infty} \mathbb {P}_{-w} \bigl (\tau^+_{w} > t_{\gamma }^{\zeta , a}(n) \bigr )\leq u(\gamma ) <1 , $$(6.23)for all γ. Furthermore, u(γ)→0 as γ→∞.
All are hitting times for Z. Indeed, by first choosing large enough r and then choosing large enough γ both (6.6) and (6.8) will follow by multiplication. We proceed to prove each of the above statements.
Segments [−ρ,−w], [+w,+ρ]. Here we can use the means analysis as in the [CR] case. As before, we do not change the asymptotics of the passage time through these intervals, if we assume that f(z) in (6.18) satisfies:
where a, b, c, ρ 0 are as before, w 0=w/2 and once Z t exits [−ρ 0,−w 0]∪[+w 0,+ρ 0] it is stopped. Indeed this follows from the same reasoning and in addition since for any γ
uniformly in n (large enough) as it follows from Lemma 2.1 part (2) since the drift of Z t is non-negative on [w 0,w].
We use the same definitions for Y t , M t and A t as above. In place of Proposition 6.5 we have
Proposition 6.6
Assume ζ∈[NCR]. There exists k(r) satisfying k(r)→0 as r→∞ such that for any ρ small enough, r large enough, n large enough and all t:
-
(1)
Ψ(−w)−Ψ(−ρ),Ψ(ρ)−Ψ(w)≤k(r)n 4/3.
-
(2)
\(\mathbb {E}M_{t}^{2} \leq k(r) t n^{4/3}\).
-
(3)
|A t |≤k(r)t with probability 1.
The proof is again deferred. Now, as before
where the last inequality is Chebyshev. This goes to zero as r→∞ for any γ. This shows (6.21). Similarly,
and this shows (6.22).
Segment [−w,w]. Here we still assume (6.18), but instead of absorbing Z t at the boundaries, we shall now suppose that Z t evolves like a symmetric random walk with ±n −1 steps, once it exits [−ρ 0,+ρ 0].
We first show that u can be chosen to vanish at infinity. Consider the process U t =U 0−(Z t −Z 0+δn −5/3 t), for δ>0 with U 0 to be chosen later and set N=inf{t:U t ≤0}. Then, by the definition of the [NCR] regime for n large enough U t∧N is a non-negative supermartingale satisfying the requirements of Lemma 2.3 and hence
where we choose U 0=w+δeγ n −1/3−Z 0 and δ=e−γ. The last expression can be made arbitrarily small by taking γ large enough, uniformly in n if it is sufficiently large.
To show that u can satisfy u(γ)<1 for all γ, we have to show that Z can cross from −w to w in \(t^{\zeta , a}_{\gamma }(n)\)-time for arbitrarily small γ. If ρ is small and n is large, then X t =n(Z t −(−w)) satisfies the conditions in Lemma 2.2 with δ=ζ(n)− and a=d. Therefore
which is positive for all γ, once n is large enough. This proves (6.23) and concludes the proof of the upper bound.
To show the lower bound in the [NCR] case, set V t =Z t −δn −5/3 t+w and choose δ,r>0 such that \(V_{t \land \tau^{+}_{w}}\) has non-positive drift whenever \(V_{t \land \tau^{+}_{w}} \geq0\). Then,
and part (2) of Lemma 2.1 shows that the last expression goes to 0 as γ→−∞ uniformly in n (large enough). This proves (6.7) and completes the [NCR] case. □ It remains to prove Propositions 6.4–6.6.
Proof of Proposition 6.4
Let . Fix some 2≤i<j≤q and set
Let \(U_{t-1} = Y_{t}-Y_{t-1} - \mathbb {E}_{\sigma_{0}} [Y_{t}-Y_{t-1}\mid \mathcal {F}_{t-1}]\) and then since \(|Y_{t}-Y_{t-1}| \leq\frac{2}{n}\) we have that \(|U_{i} | \leq\frac{4}{n}\). Define the process Z t by Z 0=0 and
where
With this definition Z t is clearly a martingale and since \(|Z_{t}-Z_{t-1}|\leq\frac{4}{n}\), then \(\mathbb {E}_{\sigma_{0}} Z_{t}^{2} \leq\frac{16t}{n^{2}}\) and so by Doob’s maximal inequality,
Now when t<τ ∗ we have that \(S^{i}_{t},S_{t}^{j}<\frac{1}{q}\) and so
By Jensen’s inequality \(\sum_{k=1}^{q} \mathrm{e}^{2\beta (S_{t}^{i}-\frac{1}{q})} \geq q\) so
Now when \(|Y_{t-1}| \geq\frac{2}{n}\) we have that \(|Y_{t}| - |Y_{t-1}| = \operatorname {sign}(Y_{t-1})(Y_{t}-Y_{t-1})\) and we always have
with equality when \(\operatorname {sign}(Z_{t}) = \operatorname {sign}(Z_{t-1})\). Hence it follows that when \(|Y_{t-1}| \geq\frac{2}{n}\) and t≤τ ∗,
where the first inequality follows from Eq. (6.26). It follows by induction that \(|Z_{t}| \geq|Y_{t}| - \frac{3}{n}\) for all t≤τ ∗. In particular we have by Eq. (6.25) that
By Markov’s inequality with probability tending to 1 we have that \(|S_{\tau^{*}}^{i} - S_{\tau^{*}}^{j} | = o(1)\) for every pair 2≤i,j≤q. Now by construction \(S_{\tau^{*}}^{1} \geq y-\frac{1}{n}\) so with high probability we have that \(S_{\tau^{*}}^{i} \leq\frac{1}{q}-\frac{y-\frac{1}{q}}{q-1} + o(1) < \frac{1}{q}\) which implies that with high probability \(\tau^{*} = \min\{t(n),\tau_{y}^{-}\}\).
Now, by Taylor series expansions,
where the first inequality is by Jensen, and we have used the fact that \(\sum_{i=2}^{q} S_{t}^{i} = 1 - S_{t}^{1}\). It therefore follows that with high probability for all \(0\leq t\leq\max\{t(n),\tau_{y}^{-}\} \) that
and hence that
which combined with Eq. (6.27) and Markov’s inequality completes the result. □
Proof of Proposition 6.5
Starting with part (1),
To prove part (2), we use the law of total variance:
Hence by induction
As for part (3),
where the last inequality follows from
if ρ is small enough. Then again by induction, we conclude that
□
Proof of Proposition 6.6
If r is large enough and ρ is small, we have f(z)≥n −1(a/2)z 2 for all w 0<|z|<ρ 0, where as before w 0=w/2 and ρ 0=2ρ. This immediately gives part (1) with k(r)=Cr −1.
The proof for parts (2), (3) are similar to the ones in Proposition 6.5. This time the bounds on the derivatives become
Proceeding by induction as before, we obtain (3), (2) with k(r)=Cr −4 and k(r)=Cr −3 respectively. □
7 Essential Mixing
Proof of Theorem 4
As the reader can verify, most statements in Sects. 3 and 4 hold when β<β c (q) and even β<q/2 (the restrictions on β are indicated before each statement there). The only time β<β s (q)<β c (q) is required is in step (1) of the overall coupling, where the condition ensures that the drift of each single coordinate \(S^{i}_{t}\) is negative in all (1/q,1], which, in turn, implies that for any initial configuration, after t=O(n) time, \(\sigma_{t} \in \varSigma _{n}^{\rho }\), which is a necessary starting point for the couplings that follow.
Now, if β≥β s (q), but still β<β c (q), we may replace this step, with the requirement that σ 0 is initially chosen from \(\widetilde {\varSigma }_{n} = \varSigma _{n}^{\rho}\). The analysis of the overall coupling will remain the same, with the coalescence time being even smaller (but just by a linear term, which can be absorbed in the cutoff-window term). Thus, the restricted mixing time \(t^{\widetilde {\varSigma _{n}}}_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n)\) will be upper bounded as before. In addition, the lower bound in Sect. 4.1 will also hold for \(t^{\widetilde {\varSigma _{n}}}_{{ \scriptsize \mbox {\textsc {mix}}}(\epsilon )}(n)\), since as initial configuration, we may take any \(\sigma_{0} \in\partial_{P_{n}} \varSigma _{n}(\rho)\) for ρ>0.
It remains to show that \(\varSigma _{n} \setminus \widetilde {\varSigma }_{n}\) has an exponentially decreasing probability under μ n . This follows immediately from the large deviations analysis in Sect. 2.2. If β<β c (q), the rate function I β,q is strictly positive away from \(\widehat {\mathsf {e}}\) and in particular there exist C 1>0, C 2>0, such that
This concludes the proof of the theorem. □
References
Alon, N., Milman, V.D.: λ 1, isoperimetric inequalities for graphs, and superconcentrators. J. Comb. Theory, Ser. A 38(1), 73–88 (1985)
Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. Academic Press [Harcourt Brace Jovanovich Publishers], London (1989). Reprint of the 1982 original, MR998375 (90b:82001)
Berger, N., Kenyon, C., Mossel, E., Peres, Y.: Glauber dynamics on trees and hyperbolic graphs. Probab. Theory Relat. Fields 131, 311–340 (2005)
Bhatnagar, N., Randall, D.: Torpid mixing of simulated tempering on the Potts model. In: Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 478–487 (2004)
Binder, K.: Theory of first-order phase transitions. Rep. Prog. Phys. 50(7), 783–859 (1987)
Biskup, M.: Reflection positivity and phase transitions in lattice spin models. In: Methods of Contemporary Mathematical Statistical Physics. Lecture Notes in Math., vol. 1970, pp. 1–86. Springer, Berlin (2009)
Biskup, M., Chayes, L.: Rigorous analysis of discontinuous phase transitions via mean-field bounds. Commun. Math. Phys. 238(1–2), 53–93 (2003)
Biskup, M., Chayes, L., Crawford, N.: Mean-field driven first-order phase transitions in systems with long-range interactions. J. Stat. Phys. 122(6), 1139–1193 (2006)
Bollobás, B., Grimmett, G., Janson, S.: The random-cluster model on the complete graph. Probab. Theory Relat. Fields 104, 283–317 (1996)
Borgs, C., Chayes, J.T., Tetali, P.: Tight bounds for mixing of the Swendsen-Wang algorithm at the Potts transition point. Probab. Theory Relat. Fields 152(3), 509–557 (2012)
Borgs, C., Chayes, J.T., Frieze, A., Kim, J.H., Tetali, P., Vigoda, E., Vu, V.H.: Torpid mixing of some Monte Carlo Markov chain algorithms in statistical physics. In: 40th Annual Symposium on Foundations of Computer Science (New York, 1999), pp. 218–229. IEEE Comput. Soc., Los Alamitos (1999)
Bovier, A.: Metastability: a potential theoretic approach. In: International Congress of Mathematicians, vol. III, pp. 499–518. Eur. Math. Soc., Zürich (2006)
Cesi, F., Guadagni, G., Martinelli, F., Schonmann, R.H.: On the two-dimensional stochastic Ising model in the phase coexistence region near the critical point. J. Stat. Phys. 85(1–2), 55–102 (1996)
Chayes, J.T., Chayes, L., Schonmann, R.H.: Exponential decay of connectivities in the two-dimensional Ising model. J. Stat. Phys. 49(3–4), 443–445 (1987)
Cirillo, E.N.M., Lebowitz, J.L.: Metastability in the two-dimensional Ising model with free boundary conditions. J. Stat. Phys. 90(1–2), 211–226 (1998)
Costeniuc, M., Ellis, R.S., Touchette, H.: Complete analysis of phase transitions and ensemble equivalence for the Curie-Weiss-Potts model. J. Math. Phys. 46(6), 063301, 25 pp. (2005)
Ding, J., Lubetzky, E., Peres, Y.: The mixing time evolution of Glauber dynamics for the mean-field Ising model. Commun. Math. Phys. 289(2), 725–764 (2009)
Ding, J., Lubetzky, E., Peres, Y: Censored Glauber dynamics for the mean field Ising model. J. Stat. Phys. 137(3), 407–458 (2009)
Ding, J., Lubetzky, E., Peres, Y.: Mixing time of critical Ising model on trees is polynomial in the height. Commun. Math. Phys. 295(1), 161–207 (2010)
Ellis, R.S., Wang, K.: Limit theorems for the empirical vector of the Curie-Weiss-Potts model. Stoch. Process. Appl. 35(1), 59–79 (1990)
Georgii, H.-O., Miracle-Sole, S., Ruiz, J., Zagrebnov, V.A.: Mean-field theory of the Potts gas. J. Phys. A 39(29), 9045–9053 (2006)
Gore, V.K., Jerrum, M.R.: The Swendsen-Wang process does not always mix rapidly. J. Stat. Phys. 97, 67–86 (1999)
Grimmett, G.: The Random-Cluster Model. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 333. Springer, Berlin (2006)
Griffiths, R.B., Weng, C.-Y., Langer, J.S.: Relaxation times for metastable states in the mean-field model of a ferromagnet. Phys. Rev. 149, 301–305 (1966)
Kirkpatrick, T.R., Wolynes, P.G.: Stable and metastable states in mean-field Potts and structural glasses, 36. Phys. Rev. B 16, 8552–8564 (1987)
Kovchegov, Y., Otto, P.T., Titus, M.: Mixing times for the mean-field Blume-Capel model via aggregate path coupling. J. Stat. Phys. 144(5), 1009–1027 (2011)
Lawler, G.F., Sokal, A.D.: Bounds on the L 2 spectrum for Markov chains and Markov processes: a generalization of Cheeger’s inequality. Trans. Am. Math. Soc. 309(2), 557–580 (1988)
Levin, E.A., Luczak, M., Peres, Y.: Glauber dynamics for the mean-field Ising model: cut-off, critical power law, and metastability. Probab. Theory Relat. Fields 146(1), 223–265 (2010)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2009). With a chapter by James G. Propp and David B. Wilson
Lubetzky, E., Sly, A.: Critical Ising on the square lattice mixes in polynomial time. Commun. Math. Phys. (to appear)
Lubetzky, E., Sly, A.: Cutoff for general spin systems with arbitrary boundary conditions. Preprint. Available at arXiv:1202.4246
Lubetzky, E., Sly, A.: Cutoff for the Ising model on the lattice. Invent. Math. (to appear)
Martinelli, F.: Lectures on Glauber dynamics for discrete spin models. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1997). Lecture Notes in Math., vol. 1717, pp. 93–191. Springer, Berlin (1999)
Martinelli, F., Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region. I. The attractive case. Commun. Math. Phys. 161(3), 447–486 (1994)
Martinelli, F., Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region. II. The general case. Commun. Math. Phys. 161(3), 487–514 (1994)
Rikvold, P.A., Tomita, H., Miyashita, S., Sides, S.W.: Metastable lifetimes in a kinetic Ising model: dependence on field and system size. Phys. Rev. E 49(6), 5080–5090 (1994)
Schonmann, R.H., Shlosman, S.B.: Wulff droplets and the metastable relaxation of kinetic Ising models. Commun. Math. Phys. 194(2), 389–462 (1998)
Sinclair, A., Jerrum, M.: Approximate counting, uniform generation and rapidly mixing Markov chains. Inf. Comput. 82(1), 93–133 (1989)
Thomas, L.E.: Bound on the mass gap for finite volume stochastic Ising models at low temperature. Commun. Math. Phys. 126(1), 1–11 (1989)
Acknowledgements
This work was initiated while P.C., O.L. and A.S. were interns at the Theory Group of Microsoft Research, and they thank the Theory Group for its hospitality. P.C would like to acknowledge NSF grant CCF-1116013. The research of O.L. was supported in part by NSF grant OISE-07-30136.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cuff, P., Ding, J., Louidor, O. et al. Glauber Dynamics for the Mean-Field Potts Model. J Stat Phys 149, 432–477 (2012). https://doi.org/10.1007/s10955-012-0599-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10955-012-0599-2