1 Introduction

In this paper we consider a discrete-time stochastic dynamics for a spin system at low temperature, in which high mobility of parallel updating and asymmetry of the interaction combine to produce efficient dynamical stability and fast convergence to equilibrium.

The control of the convergence to equilibrium of irreducible Markov chains (MC) is particularly interesting when the invariant measure is strongly polarized, for instance in MCs describing large scale ferromagnetic systems at low temperature. Indeed, in the region of parameters where the system exhibits coexistence of multiple phases, the control of the convergence to equilibrium of the MC describing the system becomes closely related to the problem of metastability, since tunneling between the different phases is necessary to reach equilibrium. This tunneling time is usually exponentially divergent in the size of the problem, so that the convergence to equilibrium in these cases is exponentially slow. See [7] for a beautiful review of this problem.

We briefly recall the well-known Ising model in two dimensions in order to explain the problem in more detail.

Let \(L\) be a positive integer, and \({\Lambda }:= \left( {\mathbb Z}/L{\mathbb Z}\right) ^2\) be the two dimensional discrete torus. Consider the standard Ising model on \({\Lambda }\) without external field with spin configurations \({\sigma }= ({\sigma }_x)_{x \in {\Lambda }} \in \mathcal {S}:= \{-1,1\}^{{\Lambda }}\) and with Hamiltonian

$$\begin{aligned} H({\sigma })=-\sum _{(x,y)}J{\sigma }_x{\sigma }_y \end{aligned}$$
(1)

where \(J>0\) and the sum runs over pairs of neighboring sites in \({\Lambda }\). Denote by \({\pi }_{G}\) its Gibbs measure

$$\begin{aligned} {\pi }_{G}({\sigma })=\frac{e^{-H({\sigma })}}{Z_{G}},\qquad Z_{G}=\sum _{{\sigma }\in \mathcal {S}}e^{-H({\sigma })}. \end{aligned}$$
(2)

A popular discrete-time MC, reversible w.r.t. this Gibbs measure, is given by the following algorithm: at each time \(t\) a point \(x \in {\Lambda }\) is chosen with uniform probability; all spins \({\sigma }_y\), \(y \ne x\) are left unchanged, while \({\sigma }_x\) is flipped with probability

$$\begin{aligned} \exp \left[ -(H({\sigma }^x) - H({\sigma }))^+ \right] , \end{aligned}$$

where \({\sigma }^x\) is the configuration obtained by \({\sigma }\) by flipping \({\sigma }_x\) and, for a real number \(a\), \(a^+ := \max (a,0)\). Denote by \({\mathbb P}_{{\sigma }}^t\) the probability distribution of the process at time \(t\) starting from \({\sigma }\) at time \(0\), and by \(\mathcal {P}\) the transition matrix \(({\mathbb P}_{{\sigma }}^1(\eta ))_{{\sigma },\eta \in \mathcal {S}}\).
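For concreteness, here is a minimal sketch of one step of this dynamics, in Python with NumPy; the function name `glauber_step` and all parameter values are ours, for illustration only.

```python
import numpy as np

def glauber_step(sigma, J, rng):
    """One step of the single-site Metropolis-Glauber dynamics on the torus.

    sigma: L x L array of +-1 spins; J: ferromagnetic coupling;
    rng: a numpy random Generator.
    """
    L = sigma.shape[0]
    i, j = rng.integers(L, size=2)  # site chosen with uniform probability
    # sum of the four nearest-neighbour spins, with periodic boundaries
    nn = (sigma[(i + 1) % L, j] + sigma[(i - 1) % L, j]
          + sigma[i, (j + 1) % L] + sigma[i, (j - 1) % L])
    dH = 2.0 * J * sigma[i, j] * nn            # H(sigma^x) - H(sigma)
    if rng.random() < np.exp(-max(dH, 0.0)):   # flip with prob exp[-(dH)^+]
        sigma[i, j] = -sigma[i, j]
    return sigma

rng = np.random.default_rng(0)
sigma = rng.choice([-1, 1], size=(16, 16))
for _ in range(10_000):
    sigma = glauber_step(sigma, J=1.0, rng=rng)
```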

Different quantities can be used to control the convergence to equilibrium of MCs; the most popular is the mixing time

$$\begin{aligned} T_{mix}:=\min \Big \{t>0;\; d(t)\le \frac{1}{e}\Big \} \end{aligned}$$
(3)

where \(d(t)\) is the maximal distance in total variation between the distribution at time \(t\) and the invariant measure

$$\begin{aligned} d(t)=\sup _{\sigma }\Vert {\mathbb P}_{{\sigma }}^t-{\pi }_G\Vert _{TV}. \end{aligned}$$

For the Glauber dynamics defined above, when the interaction constant \(J\) is so large that the Gibbs measure \({\pi }_{G}\) is nearly concentrated on the configurations \(\mathbf{+1}\) and \(\mathbf{-1}\), with all spins \(+1\) and all spins \(-1\) respectively, it is possible to prove that \(T_{mix}\) diverges exponentially in \(L\). This result is due to the presence of a rather tight “bottleneck” in the state space. Indeed, starting for instance from \(\mathbf{-1}\), in order to relax to equilibrium the dynamics has to reach a neighborhood of the opposite minimum \(\mathbf{+1}\), crossing the set of configurations with zero magnetization, which has a small Gibbs measure. In other words, the system is trapped for a very long time near the configuration \(\mathbf{-1}\), and only after many attempts to leave this trap is a \(+1\) droplet nucleated, growing until it reaches the bottleneck, i.e., the set of configurations with zero magnetization. This mechanism is typical of metastability and produces a large relaxation time.

If the relaxation time is exponentially large, the MC given by the Glauber dynamics is not an efficient way to sample from the Gibbs measure \({\pi }_G\) for large systems. A possible way to bypass this problem is the following: for each size \(L\) of the system, we construct a MC whose invariant measure \(\pi \) is close to \(\pi _G\), in the sense that \(\Vert {\pi }-{\pi }_{G}\Vert _{TV}\) converges to zero as \(L \rightarrow +\infty \), and whose mixing time grows polynomially in \(L\). We call this an asymptotically polynomial approximation scheme; this notion is weaker than, but closely related to, that of a fully polynomial randomized approximation scheme (FPRAS) introduced in theoretical computer science (see [5]).

In this paper we present two independent results; their combination provides an asymptotically polynomial approximation scheme for the 2d Ising model.

More precisely we introduce a modification of the above MC in which:

  • all spins are simultaneously updated;

  • the updating of the spin \({\sigma }_x\) only depends on the spin values, at the previous time, of its South and West nearest neighbors; this makes the dynamics non-reversible.

This dynamics is a probabilistic cellular automaton (PCA) for which the invariant measure \({\pi }_{PCA}\) can be found without a detailed balance condition, using instead the notion of weak balance condition discussed in [6], also known in the literature as dynamical reversibility (see for instance [1]). By using the ideas developed in [3] we can control the total variation distance between the Gibbs measure \({\pi }_{G}\) and \({\pi }_{PCA}\). This is the content of Theorem 2.3.

In the second theorem we study the convergence to equilibrium of this parallel dynamics. The key step is an estimate on the tunneling time between \(\mathbf{-1}\) and \(\mathbf{1}\). This estimate is obtained by using some of the basic ideas developed in the context of metastability. The main point concerns the separation of time scales. The general idea is the following: the energy landscape determines a sequence \(\mathcal {S} = \mathcal {S}_0 \supset \mathcal {S}_1 \supset \cdots \supset \mathcal {S}_n = \{\mathbf{-1}, \mathbf{1} \}\) of nested subsets of \(\mathcal {S}\) in such a way that for \(k \ge 1\) a time scale \(T_k\) is associated to each \(\mathcal {S}_k\) in the following sense: the dynamics needs a time of order \(T_k\) to leave \(\mathcal {S}_k\), but a much smaller time to return to \(\mathcal {S}_k\) after having left it; moreover, \(T_k\) is much smaller than \(T_{k+1}\). This allows one to define an effective renormalized dynamics on \(\mathcal {S}_k\) which evolves on the time scale \(T_k\), and which consists of the successive returns to \(\mathcal {S}_k\). See for instance [2, 9, 10] for more details on such a renormalization procedure. Iterating this strategy on larger and larger time scales \(t_0<t_1<\cdots <t_n\), one arrives at the situation in which \(\mathcal {S}_n\) consists of just the absolute minima of the energy. In this case the corresponding renormalized process is a very elementary two-state process with a tunneling time \({\tau }(\mathbf{-1},\mathbf{+1})\) given by an exponential random variable whose mean is the inverse of the transition probability from \(\mathbf{-1}\) to \(\mathbf{+1}\) of the renormalized chain on \(\mathcal {S}_n\).

We do not completely develop this analysis for our PCA dynamics, but we will use the main ideas of separation of time scales and the corresponding reduction of the state space in order to control the mean tunneling time and, with this, the mixing time of the PCA. Exploiting the complete asymmetry of the interaction (only SW), the simultaneous updating and the periodic boundary conditions, we observe that configurations with the same spin on a NW-SE diagonal are stable on the time scale of order 1, just moving in the NE direction. Playing on the difference of the time scales involved in the process, we can tune the parameters of the dynamics so as to describe the evolution between diagonal configurations in terms of a 1d nearly symmetric random walk, producing a tunneling time which is polynomial in the size of \({\Lambda }\). Cellular automata with completely oriented interaction have been studied extensively since the pioneering paper by Toom [11]. However, in this paper we are mainly interested in the study of the relations between PCA and statistical mechanics and, most of all, in the study of the rate of relaxation to equilibrium of an irreversible PCA. The latter is, to our knowledge, a largely unexplored subject.

In Sect. 2 we define the model in detail and state our main results. Section 3 is devoted to the analysis of the invariant measure of the PCA and its relations with the Ising model. Some fundamental facts on time scale separation for the PCA are presented in Sect. 4, while Sect. 5 contains the key estimate on the tunneling time.

2 The Model and the Results

2.1 The Model

On the same space of configurations \(\mathcal {S}:= \{-1,1\}^{{\Lambda }}\) discussed in the Introduction for the Ising model, we construct a Markov chain given in terms of a completely asymmetric interaction as follows. For \(x = (i,j) \in {\Lambda }= \left( {\mathbb Z}/L{\mathbb Z}\right) ^2\), we introduce the following notation for its nearest neighbors:

$$\begin{aligned} x^u := (i,j+1) \ \ \ x^r := (i+1,j) \ \ \ x^d := (i, j-1) \ \ \ x^l := (i-1,j) \end{aligned}$$
(4)

where sums and differences are taken mod \(L\). Given a spin configuration \({\sigma } = ({\sigma }_x)_{x \in {\Lambda }} \in \mathcal {S}\), for typographical reasons we write \({\sigma }_x^u\) for \({\sigma }_{x^u}\), and similarly for the other nearest neighbors of \(x\). Consider the discrete-time Markov chain on \(\mathcal {S}\), whose transition matrix is given by

$$\begin{aligned} P({{\sigma },{\tau }}) := \frac{e^{-H({\sigma },{\tau })}}{\sum _{{\sigma }' \in \mathcal {S}} e^{-H({\sigma },{\sigma }')}}, \end{aligned}$$
(5)

where \(H({\sigma },{\tau })\) is the following asymmetric Hamiltonian, defined on pairs of configurations:

$$\begin{aligned} H({\sigma },{\tau })&:= - \sum _{x \in {\Lambda }} \left[ J {\sigma }_x({\tau }_x^u + {\tau }_x^r) + q {\sigma }_x {\tau }_x \right] \nonumber \\&= - \sum _{x \in {\Lambda }} \left[ J {\tau }_x({\sigma }_x^d + {\sigma }_x^l) + q {\sigma }_x {\tau }_x \right] \end{aligned}$$
(6)

and \(J,q>0\) are given parameters. In what follows we set

$$\begin{aligned} Z_{{\sigma }} := \sum _{{\sigma }' \in \mathcal {S}} e^{-H({\sigma },{\sigma }')}. \end{aligned}$$
(7)

Some basic facts about this Markov chain, motivating the name probabilistic cellular automaton (PCA) for this dynamics, are grouped in the next proposition (see [6] for more details).

Proposition 2.1

  (1)

    \(P({{\sigma },\tau })\) is of the following product form:

    $$\begin{aligned} P({{\sigma },\tau }) = \prod _{x \in {\Lambda }} p_x(\tau _x|{\sigma }) \end{aligned}$$

    where

    $$\begin{aligned} p_x(\tau _x|{\sigma }) := \frac{\exp \left\{ \tau _x\left[ J({\sigma }_x^d + {\sigma }_x^l) + q {\sigma }_x\right] \right\} }{2 \cosh (J({\sigma }_x^d + {\sigma }_x^l) + q {\sigma }_x)}. \end{aligned}$$
  (2)

    \(H({\sigma },{\tau })\not =H({\tau },{\sigma })\) but the following weak symmetry condition holds

    $$\begin{aligned} \sum _{\tau \in \mathcal {S}} e^{-H({\sigma },\tau )} = \sum _{\tau \in \mathcal {S}} e^{-H(\tau ,{\sigma })} . \end{aligned}$$
  (3)

    The Markov chain is irreversible with a unique stationary distribution \(\pi _{PCA}\) given by

    $$\begin{aligned} \pi _{PCA}({\sigma }) := \frac{Z_{{\sigma }}}{Z_{PCA}}, \end{aligned}$$

    with \(Z_{PCA} := \sum _{{\sigma }} Z_{{\sigma }}\).

Proof

The statement in (1) amounts to a straightforward computation; in particular, it implies irreducibility of the chain, which therefore has a unique stationary distribution. The statement in (3) thus follows readily from (2), which is the only nontrivial point to show. Note that

$$\begin{aligned} \sum _{\tau \in \mathcal {S}} e^{-H({\sigma },\tau )}&= 2^{|{\Lambda }|} \prod _{x \in {\Lambda }} \cosh \big (J({\sigma }_x^d + {\sigma }_x^l) + q {\sigma }_x\big ) \nonumber \\ \sum _{\tau \in \mathcal {S}} e^{-H(\tau ,{\sigma })}&= 2^{|{\Lambda }|} \prod _{x \in {\Lambda }} \cosh \big (J({\sigma }_x^u + {\sigma }_x^r) + q {\sigma }_x\big ). \end{aligned}$$
(8)

Denote by \({\Lambda }^* := \{ \{x,y\}: \, x,y \in {\Lambda }, \, |x-y| = 1\}\) the set of bonds in \({\Lambda }\). Note that \(|{\Lambda }^*| = 2L^2\). For \({\sigma }\in \mathcal {S}\), we let

$$\begin{aligned} {\gamma }({\sigma }) := \{ \{x,y\} \in {\Lambda }^* : \, {\sigma }_x \ne {\sigma }_y \} \end{aligned}$$
(9)

be the Peierls contour associated to \({\sigma }\). The following identities are immediately checked:

$$\begin{aligned} \cosh \big (J({\sigma }_x^d + {\sigma }_x^l) + q {\sigma }_x\big ) = \left\{ \begin{array}{ll} \cosh (2J+q) &{}\quad \text{ if } \{x,x^d\} \not \in {\gamma }({\sigma }), \, \{x,x^l\} \not \in {\gamma }({\sigma }) \\ \cosh (2J-q) &{}\quad \text{ if } \{x,x^d\} \in {\gamma }({\sigma }), \, \{x,x^l\} \in {\gamma }({\sigma }) \\ \cosh (q) &{}\quad \text{ otherwise }. \end{array} \right. \end{aligned}$$

So, if we let

$$\begin{aligned} n_{dl} = n_{dl}({\sigma }) := \left| \{ x \in {\Lambda }: \, \{x,x^d\} \in {\gamma }({\sigma }), \, \{x,x^l\} \in {\gamma }({\sigma }) \} \right| , \end{aligned}$$

using (8) we obtain

$$\begin{aligned} \sum _{\tau \in \mathcal {S}} e^{-H({\sigma },\tau )} = 2^{L^2} [\cosh (2J-q)]^{n_{dl}}[\cosh (q)]^{|{\gamma }({\sigma })| - 2 n_{dl}}[\cosh (2J+q)]^{L^2 -|{\gamma }({\sigma })| + n_{dl}}. \end{aligned}$$
(10)

With the same argument, defining

$$\begin{aligned} n_{ur} =n_{ur}({\sigma }) := \left| \{ x \in {\Lambda }: \, \{x,x^u\} \in {\gamma }({\sigma }), \, \{x,x^r\} \in {\gamma }({\sigma }) \} \right| , \end{aligned}$$
(11)

we obtain

$$\begin{aligned} \sum _{\tau \in \mathcal {S}} e^{-H(\tau ,{\sigma })} = 2^{L^2} [\cosh (2J-q)]^{n_{ur}}[\cosh (q)]^{|{\gamma }({\sigma })| - 2 n_{ur}}[\cosh (2J+q)]^{L^2 -|{\gamma }({\sigma })| + n_{ur}}. \end{aligned}$$
(12)

The conclusion now follows from the observation that, for every \({\sigma }\in \mathcal {S}\), the identity \(n_{dl}({\sigma }) = n_{ur}({\sigma })\) holds. This can be shown, for instance, by induction on \(n^+({\sigma })\), where \(n^+({\sigma })\) denotes the number of spins equal to \(+1\) in \({\sigma }\). If \(n^+({\sigma }) = 0\) the statement is obvious. For \(n^+({\sigma }) = n >0\), let \(x \in {\Lambda }\) be such that \({\sigma }_x = +1\), and let \({\sigma }^x\) be the configuration obtained from \({\sigma }\) by flipping the spin at \(x\). By considering all possible spin configurations in the \(3 \times 3\) square centered at \(x\), one checks that \(n_{dl}({\sigma }^x) - n_{ur}({\sigma }^x) = n_{dl}({\sigma }) - n_{ur}({\sigma })\). Since \(n^+({\sigma }^x) = n^+({\sigma }) -1\), the proof is completed. \(\square \)
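The weak symmetry condition (2), the stationarity claim (3) and the identity \(n_{dl} = n_{ur}\) can all be checked by brute force on a very small torus. The following is a sketch of such a check, assuming NumPy; the helper names and the test values of \(J\) and \(q\) are ours.

```python
import itertools
import numpy as np

L, J, q = 3, 0.8, 0.2   # tiny torus; J and q are arbitrary test values
configs = [np.array(c).reshape(L, L)
           for c in itertools.product([-1, 1], repeat=L * L)]

def pair_H(s, t):
    """Asymmetric pair Hamiltonian H(sigma, tau) of (6), periodic boundaries."""
    up, right = np.roll(t, -1, axis=1), np.roll(t, -1, axis=0)
    return -np.sum(J * s * (up + right) + q * s * t)

W = np.array([[np.exp(-pair_H(s, t)) for t in configs] for s in configs])
Z = W.sum(axis=1)                      # Z_sigma of (7)

# (2) weak symmetry: sum_tau e^{-H(sigma,tau)} = sum_tau e^{-H(tau,sigma)}
assert np.allclose(Z, W.sum(axis=0))

# (3) stationarity of pi_PCA = Z_sigma / Z_PCA
P, pi = W / Z[:, None], Z / Z.sum()
assert np.allclose(pi @ P, pi)

# key step of the proof: n_dl(sigma) = n_ur(sigma) for every configuration
def n_dl(s):  # sites whose down and left bonds both lie on the contour
    return np.sum((s != np.roll(s, 1, axis=1)) & (s != np.roll(s, 1, axis=0)))

def n_ur(s):  # sites whose up and right bonds both lie on the contour
    return np.sum((s != np.roll(s, -1, axis=1)) & (s != np.roll(s, -1, axis=0)))

assert all(n_dl(s) == n_ur(s) for s in configs)
```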

2.2 The Results

We are interested in the limit \(L\rightarrow \infty \) and in the low temperature (\(J\) large) regime defined as follows.

Definition 2.2

The low temperature regime with parameters \(k\) and \(c\) corresponds to the following choice

$$\begin{aligned} J = J(L) = k \log L\qquad q = q(L) = c \frac{\log L}{L} \end{aligned}$$
(13)
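For orientation, in this regime \(e^{-2J} = L^{-2k}\) and \({\delta }:= e^{-2q} = 1 - 2c\frac{\log L}{L} + O\big ((\frac{\log L}{L})^2\big )\) (the quantity \({\delta }\) will be used in Sect. 3). Thus, as quantified in Sect. 4, an updating against two agreeing SW neighbors has probability of order \(L^{-4k}\) per site and per unit time, while the bias of a neutral updating is only of order \(\frac{\log L}{L}\). As an illustration, the choice \(k=9\), \(c=1\) satisfies the hypotheses \(c>\frac{1}{2}\) and \(k-4c>4\) of Theorem 2.4 below.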

We state here our two main results. The first concerns the relation between the two considered models, controlling the distance in total variation between the Gibbs measure of the symmetric standard Ising model and the stationary distribution of the asymmetric PCA. The numerical constants appearing in the statements of the theorems are not optimized.

Theorem 2.3

In the low temperature regime with parameters \(k\) and \(c\), there is a constant \(C>0\) such that

$$\begin{aligned} \Vert \pi _{PCA}-\pi _G\Vert _{TV}\le C \left( \frac{1}{L^{\frac{c}{2}-1}} + \frac{1}{L^{2k-2}} \right) \!. \end{aligned}$$
(14)

The second result is the control of the convergence to equilibrium of the PCA proving that the mixing time of the parallel dynamics is polynomial in \(L\).

Theorem 2.4

In the low temperature regime with parameters \(k\) and \(c\) such that \(c > \frac{1}{2}\) and \(k-4c >4\), we have

$$\begin{aligned} \lim _{L\rightarrow \infty } d_{PCA}(L^{8k})=0 \end{aligned}$$

where

$$\begin{aligned} d_{PCA}(t)=\sup _{\sigma }\Vert P^t({\sigma },\cdot )-{\pi }_{PCA}(\cdot )\Vert _{TV}. \end{aligned}$$

Theorems 2.3 and 2.4 imply that the Markov chain defined in (5) provides an asymptotically polynomial approximation scheme for the Ising model on the 2d torus.

Remark 2.5

There is another example, see [8], of rapid mixing of a Markov chain having as stationary measure the Gibbs measure of the low temperature Ising model, namely the Swendsen-Wang dynamics. As in our case, this dynamics is fast because it allows a large number of spins to be updated in a single step of the Markov chain. However, as far as we know, ours is the first case in the literature of a fast irreversible dynamics based on the idea of the PCA. In particular, it seems that the ingredient of irreversibility combined with parallelism is quite crucial in order to obtain fast mixing. Indeed, the dynamical stability of the NW-SE diagonals, mentioned in the introduction (see also Sect. 4), is based exactly on the combination of parallelism and complete asymmetry of the interaction. The interest of these results is also due to the fact that irreversible Markov chains are a good model for the study of the stationary measures of non-equilibrium statistical mechanical systems.

Remark 2.6

The results listed above are quite delicate. As mentioned in the introduction, the periodic boundary conditions play a crucial role in the proof of our results. Moreover, the choice of the parameters' scaling (namely our low temperature regime, see Definition 2.2) is also crucial in our arguments. In particular, our estimates on the mixing time (Theorem 2.4) are based on a separation of time scales which requires this choice. It would be desirable to have estimates on the mixing time for \(J\) large but independent of \(L\); such estimates are, for the time being, out of reach. It would be reasonable to conjecture that also in this case the mixing time grows polynomially in the size of the system.

3 The Relation Between Ising Gibbs Measure and PCA Stationary Measure at Low Temperature

We prove in this section Theorem 2.3.

We use the representation introduced in [3]. Note first of all that

$$\begin{aligned} Z_{\sigma }&=\sum _{\tau }e^{-\sum _{x} [J(\sigma _x^d+\sigma _x^l)+q{\sigma }_x]{\tau }_x}\nonumber \\&= e^{q|{\Lambda }|}\sum _{I\subset {\Lambda }}e^{\sum _{(x,y)}J{\sigma }_x{\sigma }_y-2\sum _{x\in I}J({\sigma }_x{\sigma }_x^u+{\sigma }_x{\sigma }_x^r)-2q|I|}\nonumber \\&= e^{q|{\Lambda }|}w^{G}({\sigma })\prod _{x\in {\Lambda }}(1+\delta \phi _x) \end{aligned}$$
(15)

where we have set \({\delta }=e^{-2q}\),

$$\begin{aligned} w^{G}({\sigma }) = e^{-H({\sigma })}, \end{aligned}$$

and

$$\begin{aligned} \phi _x=e^{-2J({\sigma }_x{\sigma }_x^u+{\sigma }_x{\sigma }_x^r)}. \end{aligned}$$

We define

$$\begin{aligned} f(\sigma )=\prod _{x\in {\Lambda }}(1+\delta \phi _x). \end{aligned}$$
(16)

It easily follows that

$$\begin{aligned} {\pi }_{PCA}({\sigma })={\pi }_{G}({\sigma })\,\frac{f({\sigma })}{{\pi }_{G}(f)} \end{aligned}$$
(17)

We have then

$$\begin{aligned} \Vert \pi _{PCA}-\pi _G\Vert _{TV}=\pi _G\left[ \left| \frac{f}{\pi _G(f)}-1\right| \right] \end{aligned}$$
(18)
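As a sanity check, identity (15), and hence (17), can be verified by brute force on a small torus. A sketch, under the same conventions as above (NumPy, arbitrary test parameters, hypothetical helper names):

```python
import itertools
import numpy as np

L, J, q = 3, 0.8, 0.2
delta = np.exp(-2 * q)

def Z_sigma(s):
    """Z_sigma of (7) via the product form: prod_x 2 cosh(J(s_x^d + s_x^l) + q s_x)."""
    a = J * (np.roll(s, 1, axis=1) + np.roll(s, 1, axis=0)) + q * s
    return np.prod(2.0 * np.cosh(a))

def rhs(s):
    """Right-hand side of (15): e^{q|Lambda|} w^G(sigma) prod_x (1 + delta*phi_x)."""
    nn = s * (np.roll(s, -1, axis=1) + np.roll(s, -1, axis=0))  # sigma_x (sigma_x^u + sigma_x^r)
    w_G = np.exp(J * np.sum(nn))            # e^{-H(sigma)}, each bond counted once
    phi = np.exp(-2.0 * J * nn)             # phi_x as defined above
    return np.exp(q * L * L) * w_G * np.prod(1.0 + delta * phi)

for bits in itertools.product([-1, 1], repeat=L * L):
    s = np.array(bits).reshape(L, L)
    assert np.isclose(Z_sigma(s), rhs(s))
```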

Write now the Gibbs measure in terms of Peierls contours (see (9)):

$$\begin{aligned} \pi _G({\sigma })=\frac{e^{-2Jl({\sigma })}}{Z_G} \end{aligned}$$

where \(l({\sigma }) := |{\gamma }({\sigma })|\) is the total length of the Peierls contours of the configuration \({\sigma }\).

Let \(\mathbf{1}\) be the configuration with \({\sigma }_x=1\) for all \(x\).

Normalizing \(f({\sigma })\) by the value \(f(\mathbf{1})\), which is a constant that does not affect the evaluation of (18), the expression of \(f({\sigma })\) can be written as (see also (11))

$$\begin{aligned} f({\sigma })=\left[ \frac{(1+{\delta }e^{4J})}{(1+{\delta }e^{-4J})}\right] ^{n_{ur}({\sigma })} \left[ \frac{(1+{\delta })}{(1+{\delta }e^{-4J})}\right] ^{l({\sigma })-2n_{ur}({\sigma })}. \end{aligned}$$
(19)

where we have simply observed that

$$\begin{aligned} {\sigma }_x{\sigma }_x^u+{\sigma }_x{\sigma }_x^r = \left\{ \begin{array}{ll} 2 &{}\quad \text{ if } (x,x^u) \in {\gamma }({\sigma }), (x,x^r) \in {\gamma }({\sigma }) \\ 0 &{} \quad \text{ if } (x,x^u) \not \in {\gamma }({\sigma }), (x,x^r) \not \in {\gamma }({\sigma }) \\ 1 &{}\quad \text{ otherwise. } \end{array} \right. \end{aligned}$$

Note that with this normalization \(f(\mathbf{1})=1\) obviously holds.

Let us first give an upper bound for \(\pi _G(f)\). We can write

$$\begin{aligned} \pi _G(f)&= \frac{1}{Z_G}\sum _{\sigma }\left[ e^{-4J}\frac{(1+{\delta }e^{4J})}{(1+{\delta }e^{-4J})}\right] ^{n_{ur}({\sigma })} \left[ e^{-2J}\frac{(1+{\delta })}{(1+{\delta }e^{-4J})}\right] ^{l({\sigma })-2n_{ur}({\sigma })} \\&\le \frac{1}{Z_G}\sum _{\sigma }\left[ {\delta }+e^{-4J}\right] ^{n_{ur}({\sigma })}\left[ 2e^{-2J}\right] ^{l({\sigma })-2n_{ur}({\sigma })}. \end{aligned}$$

To estimate this last sum, we again use Peierls contours. We say that a pair of adjacent bonds \((x,x^u), (x,x^r)\), both belonging to \({\gamma }({\sigma })\), forms a ur-elbow. Note that the only closed paths in \({\Lambda }^*\) consisting exclusively of ur-elbows are necessarily unions of complete diagonals (actually of an even number of diagonals, for the contour to correspond to a spin configuration). Any contour \({\gamma }= {\gamma }({\sigma })\) can be decomposed as \({\gamma }= {\gamma }_D \cup {\gamma }_{ND}\), where \({\gamma }_D\) only contains complete diagonals, while \({\gamma }_{ND}\) contains no complete diagonal. Observe that \(l({\sigma })-2n_{ur}({\sigma }) = 0 \ \iff \ {\gamma }_{ND}({\sigma }) = \emptyset \). Now, for any fixed \(m \ge 0\) we obtain an upper bound for the contribution of all configurations \({\sigma }\) such that \(l({\sigma })-2n_{ur}({\sigma }) = m\). We can write

$$\begin{aligned} A(m)&:= \sum _{{\sigma }: l({\sigma })-2n_{ur}({\sigma }) = m} \left[ {\delta }+e^{-4J}\right] ^{n_{ur}({\sigma })}\left[ 2e^{-2J}\right] ^{l({\sigma })-2n_{ur}({\sigma })}\\&= 2 \sum _{{\gamma }: |{\gamma }|-2n_{ur}({\gamma }) = m} \left[ {\delta }+e^{-4J}\right] ^{n_{ur}({\gamma })}\left[ 2e^{-2J}\right] ^{m}, \end{aligned}$$

where the factor \(2\) comes from the fact that there are exactly two configurations for each contour. Observe now that \(e^{-2J} = 1/L^{2k}\) while

$$\begin{aligned} {\delta }+e^{-4J} = e^{-2c \frac{\log L}{L}} + e^{-4k \log L} \le 1-c \frac{\log L}{L} + \frac{1}{L^{4k}} \le 1- \frac{c}{2} \frac{\log L}{L} < 1, \end{aligned}$$

for \(L\) sufficiently large. Thus, using the decomposition \({\gamma }= {\gamma }_D \cup {\gamma }_{ND}\),

$$\begin{aligned} A(m)&\le 2\sum _{{\gamma }: |{\gamma }|-2n_{ur}({\gamma }) = m} \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^{n_{ur}({\gamma }_D)} \left( \frac{2}{L^{2k}} \right) ^m \\&\le 2\left( \frac{2}{L^{2k}} \right) ^m N_m \sum _{{\gamma }: {\gamma }_{ND} = \emptyset } \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^{\frac{|{\gamma }|}{2}}, \end{aligned}$$

where

$$\begin{aligned} N_m := \left| \{ {\gamma }: {\gamma }_D = \emptyset , \, |{\gamma }|-2n_{ur}({\gamma }) = m\} \right| . \end{aligned}$$

A very rough upper bound for \(N_m\) can be obtained as follows. We first place the \(m\) bonds not belonging to a ur-elbow (we have at most \((2L^2)^m\) different choices); call \(\tilde{{\gamma }}_{ND}\) the resulting set of bonds. We then place an arbitrary number of ur-elbows, with the constraint that the endpoints of a connected sequence of ur-elbows must coincide with two of the \(2m\) endpoints of \(\tilde{{\gamma }}_{ND}\). Moreover, for any endpoint \(x\) of \(\tilde{{\gamma }}_{ND}\) there are at most two connected sequences of ur-elbows which connect \(x\) to exactly one endpoint of \(\tilde{{\gamma }}_{ND}\). Thus, sequences of ur-elbows can be placed in at most \(4^{2m}\) different ways. This yields

$$\begin{aligned} N_m \le \left( 32 L^2 \right) ^m. \end{aligned}$$

To complete the upper bound for \(A(m)\), we need to estimate

$$\begin{aligned} \sum _{{\gamma }: {\gamma }_{ND} = \emptyset } \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^{\frac{|{\gamma }|}{2}}. \end{aligned}$$

Since such diagonal contours are just unions of complete diagonals, and each complete diagonal has length \(2L\), for \(L\) sufficiently large we have

$$\begin{aligned} \sum _{{\gamma }: {\gamma }_{ND} = \emptyset } \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^{\frac{|{\gamma }|}{2}}&\le \sum _{l \ge 0} \left( {\begin{array}{c}L\\ l\end{array}}\right) \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^{lL} \\&= \left[ 1+ \left( 1- \frac{c}{2} \frac{\log L}{L} \right) ^L \right] ^L \le 1+\frac{2}{L^{\frac{c}{2}-1}}. \end{aligned}$$

Thus we have

$$\begin{aligned} A(m) \le 2 \left( 1+\frac{2}{L^{\frac{c}{2}-1}} \right) \left( \frac{64}{L^{2k-2}} \right) ^m . \end{aligned}$$

Summing up, using also the obvious fact that \(Z_G = \sum _{{\sigma }} e^{-2J l({\sigma })} > 2\), we can choose \(C>0\) such that for \(L\) large enough:

$$\begin{aligned} \pi _G(f)&\le \frac{1}{Z_G}\sum _{m \ge 0} A(m) \le \left( 1+\frac{2}{L^{\frac{c}{2}-1}} \right) \left[ \sum _{m \ge 0} \left( \frac{64}{L^{2k-2}} \right) ^m \right] \nonumber \\&\le 1+ \frac{C}{L^{\frac{c}{2}-1}} + \frac{C}{L^{2k-2}}. \end{aligned}$$
(20)

Comparing (19) with (20), and using the fact that \({\delta }\simeq 1\) (in particular \({\delta }\ge \frac{1}{2}\)) for large \(L\), one realizes that \(f({\sigma }) > \pi _G(f)\) for all configurations different from \(\pm \mathbf{1}\). This is evident for \(n_{ur}({\sigma })>0\); for \(n_{ur}({\sigma })=0\) we have that if \(l({\sigma })>0\), then \(l({\sigma })\ge L\), giving \(f\ge (1+1/4)^L\). By this observation

$$\begin{aligned} \pi _G\left[ \left| \frac{f}{\pi _G(f)}-1\right| \right] =\frac{2}{\pi _G(f)}\sum _{{\sigma }:f({\sigma })<\pi _G(f)} \frac{e^{-H({\sigma })}}{Z_G}[\pi _G(f)-f({\sigma })] \end{aligned}$$

and the sum actually contains only the two configurations \({\sigma }=\pm \mathbf 1\), for which \(f({\sigma })=1\).

Hence we have

$$\begin{aligned} \Vert \pi _{PCA}-\pi _G\Vert _{TV}&= \pi _G\left[ \left| \frac{f}{\pi _G(f)}-1\right| \right] \le \frac{2}{\pi _G(f)}[\pi _G(f)-1]=2\left( 1-\frac{1}{\pi _G(f)}\right) \nonumber \\&\le 2C \left( \frac{1}{L^{\frac{c}{2}-1}} + \frac{1}{L^{2k-2}} \right) \!. \end{aligned}$$
(21)

Inserting (20) into (21), and using (18), we complete the proof of the theorem.

4 PCA at Low Temperature

4.1 Realization Through Random Numbers

In what follows it will be useful to realize the Markov chain described above using uniformly distributed random numbers. Let \(\{U_x(n): \, x \in {\Lambda }, \, n \ge 1\}\) be a family of i.i.d. random variables, uniformly distributed in \((0,1)\), defined on some probability space \((\Omega ,\mathcal {A},P)\). Given the initial configuration \({\sigma }(0)\), define recursively \({\sigma }(n+1)\) as follows (see the sketch after this list): \({\sigma }_x(n+1) = 1\) if and only if one of the following conditions holds:

  (A)

    \({\sigma }_x^d(n) = {\sigma }_x^l(n) = 1\) and \(U_x(n+1) \le \frac{e^{2J + q {\sigma }_x(n)}}{2 \cosh (2J + q {\sigma }_x(n))}\);

  (B)

    \({\sigma }_x^d(n) = {\sigma }_x^l(n) = -1\) and \(U_x(n+1) \le \frac{e^{-2J + q {\sigma }_x(n)}}{2 \cosh (-2J + q {\sigma }_x(n))}\);

  (C)

    \({\sigma }_x^d(n) = -{\sigma }_x^l(n) \) and \(U_x(n+1) \le \frac{e^{q {\sigma }_x(n)}}{2 \cosh (q {\sigma }_x(n))}\),

while \({\sigma }_x(n+1) = -1\) otherwise.
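In NumPy, this construction can be sketched as follows; the function name `pca_step` is ours. Note that the three rules (A)-(C) reduce to the single comparison of \(U_x(n+1)\) with \(p_x(+1|{\sigma }(n))\) of Proposition 2.1.

```python
import numpy as np

def pca_step(sigma, U, J, q):
    """One parallel updating sigma(n) -> sigma(n+1), driven by the uniforms U.

    sigma: L x L array of +-1 spins; U: L x L array of i.i.d. Uniform(0,1)
    random numbers, one per site.  Implements rules (A), (B), (C) at every
    site simultaneously.
    """
    down = np.roll(sigma, 1, axis=1)         # sigma_x^d
    left = np.roll(sigma, 1, axis=0)         # sigma_x^l
    a = J * (down + left) + q * sigma        # local field in p_x( . | sigma)
    p_plus = np.exp(a) / (2.0 * np.cosh(a))  # probability of updating to +1
    return np.where(U <= p_plus, 1, -1)

rng = np.random.default_rng(1)
L, k, c = 64, 9.0, 1.0                       # low temperature regime, Definition 2.2
J, q = k * np.log(L), c * np.log(L) / L
sigma = -np.ones((L, L), dtype=int)
for n in range(100):
    sigma = pca_step(sigma, rng.random((L, L)), J, q)
```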

Remark 4.1

Note that with this construction of the process it is immediate to see that the Markov chain preserves the componentwise partial order on configurations. Coupling the processes \(({\sigma }(n))_{n\in {\mathbb N}}\) starting at \({\sigma }\) and \(({\sigma }'(n))_{n\in {\mathbb N}}\) starting at \({\sigma }'\) by using the same realization of the uniform variables \(\{U_x(n): \, x \in {\Lambda }, \, n \ge 1\}\), we have that if \({\sigma }\le {\sigma }'\), in the sense that \({\sigma }_x\le {\sigma }'_x\) for all \(x\in {\Lambda }\), then \({\sigma }(n)\le {\sigma }'(n)\) for each time \(n\ge 0\).
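This monotonicity can be checked empirically by driving two ordered initial conditions with the same uniforms (reusing `pca_step`, `L`, `J` and `q` from the sketch above):

```python
# Couple two ordered initial conditions through the same random numbers
# and check that the componentwise order is preserved at every step.
rng = np.random.default_rng(2)
lo = -np.ones((L, L), dtype=int)            # the minimal configuration -1
hi = rng.choice([-1, 1], size=(L, L))       # any configuration, hi >= lo
for n in range(1_000):
    U = rng.random((L, L))                  # shared randomness
    lo, hi = pca_step(lo, U, J, q), pca_step(hi, U, J, q)
    assert np.all(lo <= hi)
```

The order is preserved because the threshold \(p_x(+1|\cdot )\) is monotone in the configuration (as \(J,q>0\)), so the same uniform number cannot produce a \(+1\) in the lower copy and a \(-1\) in the upper one.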

4.2 Zero-Temperature Dynamics

In the low temperature regime considered in this paper, updatings of type (A) or, symmetrically, those for which \({\sigma }_x^d(n) = {\sigma }_x^l(n) = -1\) \( \mapsto \) \({\sigma }_x(n+1) = -1\), are typical, as they occur with probability \(\frac{e^{2J \pm q}}{2 \cosh (2J\pm q)} \simeq 1\); conversely, updatings of type (B), or those for which \({\sigma }_x^d(n) = {\sigma }_x^l(n) = 1\) \( \mapsto \) \({\sigma }_x(n+1) = -1\), are atypical, as they occur with probability \(\frac{e^{-2J \pm q}}{2 \cosh (2J \pm q)} \simeq 0\). Finally, updatings of type (C), or those for which \({\sigma }_x^d(n) = -{\sigma }_x^l(n) = 1\) \( \mapsto \) \({\sigma }_x(n+1) = -1\), are neutral, as they occur with probability \( \frac{e^{\pm q}}{2 \cosh (q)} \simeq \frac{1}{2}\).

In the next section it will be useful to rule out events of very small probability. For instance, given a (large) time \(N>0\), we can “force” the system to perform no atypical updating up to time \(N\). To this aim, we define

$$\begin{aligned} S := \min \left\{ n \ge 1: \, \exists \, x \text{ such } \text{ that } U_x(n) \not \in \left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1- \frac{e^{-2J+q}}{2\cosh (2J-q)}\right) \right\} , \end{aligned}$$
(22)

and condition on the event \(\{S>N\}\). Note that under \({\mathbb P}(\cdot | S>N)\) the random numbers \(\{U_x(n): \, x \in {\Lambda }, \, 1 \le n \le N\}\) are i.i.d., uniformly distributed on \(\left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1- \frac{e^{-2J+q}}{2\cosh (2J-q)}\right) \). Thus, \(({\sigma }(n))_{n=0}^N\) is a homogeneous Markov chain also under \({\mathbb P}(\cdot | S>N)\), in which only typical and neutral transitions are allowed. This conditioned dynamics is often called the zero-temperature dynamics, corresponding, for the inverse temperature parameter \(J\), to the limit \(J\rightarrow \infty \).

Note also that, if \(A\) is an event depending on \(({\sigma }(n))_{n=0}^N\), then

$$\begin{aligned} {\mathbb P}(A|S>N){\mathbb P}(S>N) \le {\mathbb P}(A) \le {\mathbb P}(A|S>N) + {\mathbb P}(S \le N), \end{aligned}$$
(23)

so that estimates for \({\mathbb P}(A)\) are obtained if estimates for \({\mathbb P}(A|S>N)\) and \({\mathbb P}(S>N)\) are available.

Similarly, to ensure that the system performs at most one atypical updating per time step up to time \(N\), we define the random time

$$\begin{aligned} T&:= \min \left\{ n \ge 1: \, \exists \, x \ne y \text{ such } \text{ that } U_x(n) ,U_y(n)\right. \nonumber \\&\quad \left. \not \in \left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1-\frac{e^{-2J+q}}{2\cosh (2J-q)}\right) \right\} \!. \end{aligned}$$
(24)

By definition \(T\ge S\). We now establish estimates for the random times \(S\) and \(T\), uniformly in the starting configuration. From now on, when we need to indicate the initial condition \({\sigma }(0) = {\sigma }\), we write \({\mathbb P}_{{\sigma }}\) rather than \({\mathbb P}\) for the underlying probability.

We will adopt the following notation. For a given function \(f:(0,+\infty ) \rightarrow (0,+\infty )\), we let \(O(f(r))\) denote any function for which there is a constant \(C>0\) satisfying \(\frac{f(r)}{C} \le O(f(r)) \le C f(r)\) for \(r \ge C\). Moreover, \(a_r \sim b_r\) will stand for \(\lim _{r \rightarrow +\infty } \frac{a_r}{b_r} = 1\).

Lemma 4.2

There exist constants \(C_i\) such that for each \(a>0\) and \(L\) sufficiently large we have

$$\begin{aligned} \sup _{{\sigma }}{\mathbb P}_{\sigma }(S > L^{4k-2+a}) \le C_1 e^{-O(L^a)}. \end{aligned}$$
(25)
$$\begin{aligned} \sup _{{\sigma }}{\mathbb P}_{\sigma }({S} \le L^{4k-2-a}) \le C_2 L^{-a} \end{aligned}$$
(26)
$$\begin{aligned} \sup _{{\sigma }}{\mathbb P}_{\sigma }(T \le L^{8k-4-a}) \le C_3 L^{-a} \end{aligned}$$
(27)
$$\begin{aligned} \sup _{{\sigma }}{\mathbb P}_{\sigma }(T =S) \le C_4 L^{-(4k-2)+2a}. \end{aligned}$$
(28)

Proof

To show (25), observe that \(\{S>n\}\) means that up to time \(n\) only typical updatings have been made. Since the probability that a given updating is typical is bounded above by \(1- \frac{e^{-2J-q}}{2 \cosh (2J+q)} = 1-O(L^{-4k})\),

$$\begin{aligned} {\mathbb P}_{\sigma }(S > L^{4k-2+a}) \le \left( 1-O(L^{-4k}) \right) ^{L^2 \cdot L^{4k-2+a}} \le C_1 e^{-O(L^a)}, \end{aligned}$$

for some \(C_1>0\), which establishes (25). To prove (26), observe that

$$\begin{aligned}&{\mathbb P}_{\sigma }({S} \le L^{4k-2-a}) \nonumber \\&\quad = P\left( \exists x \in {\Lambda }, \, n \le L^{4k-2-a}: \, U_x(n) \not \in \left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1- \frac{e^{-2J+q}}{2\cosh (2J-q)}\right) \right) \\&\quad \le L^2 L^{4k-2-a} \frac{e^{-2J+q}}{\cosh (2J-q)} = O(L^{-a}). \end{aligned}$$

The proof of (27) is similar, the difference being that at least two atypical updatings need to occur:

$$\begin{aligned}&{\mathbb P}_{\sigma }(T \le L^{8k-4-a}) \\&\quad = P\left( \exists x,y \!\in \! {\Lambda }, \, n \!\le \! L^{8k-4-a}\!: \, U_x(n), U_y(n) \!\not \in \! \left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1\!-\! \frac{e^{-2J+q}}{2\cosh (2J\!-\!q)}\right) \right) \\&\quad \le L^4 L^{8k-4-a} \left( \frac{e^{-2J+q}}{\cosh (2J\!-\!q)} \right) ^2 = O(L^{-a}). \end{aligned}$$

Finally, using (25) and (27),

$$\begin{aligned} {\mathbb P}_{\sigma }(T =S)&= {\mathbb P}_{\sigma }(T =S, \, S > L^{4k-2+a}) + {\mathbb P}_{\sigma }(T =S, \, T \le L^{4k-2+a}) \\&\le C_1 e^{-O(L^a)} + O(L^{-(4k-2)+2a}) = O(L^{-(4k-2)+2a}). \end{aligned}$$

\(\square \)

5 Mixing Time and Tunneling Time

In this section we prove Theorem 2.4 by giving estimates on the distribution of the hitting time

$$\begin{aligned} T_\mathbf{1} := \min \{n \ge 1: {\sigma }(n) = \mathbf{1}\}. \end{aligned}$$

Since the dynamics described in the previous construction preserves the componentwise partial order on configurations, as noted in Remark 4.1, we have

$$\begin{aligned} \sup _{{\sigma }\in \mathcal {S}}{\mathbb P}_{{\sigma }}(T_\mathbf{1} > N) \le {\mathbb P}_\mathbf{-1}(T_\mathbf{1} > N). \end{aligned}$$
(29)

Thus, an upper bound on \({\mathbb P}_\mathbf{-1}(T_\mathbf{1} > N)\) provides an upper bound for the mixing time. Indeed by using the coupling defined in Remark 4.1 we can define the coupling time

$$\begin{aligned} {\tau }_{couple}=\min \{n\ge 0:\; {\sigma }(n)={\sigma }'(n)\}. \end{aligned}$$

The total variation distance between the distribution at time \(n\) and the stationary one, \(d_{PCA}(n)\), is related to the coupling time by the following bound

$$\begin{aligned} d_{PCA}(n)\le \max _{{\sigma },{\sigma }'}{\mathbb P}_{{\sigma },{\sigma }'}({\tau }_{couple}>n) \end{aligned}$$

moreover, again due to the monotonicity of the dynamics mentioned above, we have

$$\begin{aligned} \max _{{\sigma },{\sigma }'}{\mathbb P}_{{\sigma },{\sigma }'}({\tau }_{couple}>n)\le {\mathbb P}_\mathbf{-1}(T_\mathbf{1} > n). \end{aligned}$$

Thus Theorem 2.4 follows immediately from the following:

Theorem 5.1

In the low temperature regime given in Definition 2.2, with \(c > \frac{1}{2}\) and \(k-4c > 4\),

$$\begin{aligned} \lim _{L \rightarrow +\infty } {\mathbb P}_\mathbf{-1}\left( T_\mathbf{1} > L^{8k} \right) = 0. \end{aligned}$$

The proof of this theorem is obtained in two steps, both driven by the following idea. We have three time scales, given by three well separated orders of magnitude of transition probabilities. On the first scale the dynamics recurs in a very small subset \({\mathcal {S}}_1\subset {\mathcal {S}}\) of the state space; this recurrence can be described in terms of a suitable one-dimensional random walk. On the second time scale the process jumps between different states in \({\mathcal {S}}_1\), and we can define a chain on this restricted state space \({\mathcal {S}}_1\) and estimate its transition probabilities. The third time scale is large enough with respect to the thermalization of the random walk and thus can be ignored.

In the first step we show that, due to the particular interaction considered, the configurations with the same spin on each diagonal are stable under the zero-temperature dynamics, and that when the first atypical move takes place, at time \(S\), with large probability we have \(S<T\), so that a single discrepancy appears in a diagonal. The crucial remark is that, starting with such a configuration, the time \(R\) needed to come back to diagonal configurations is typically much shorter than the waiting time for the next atypical move, so that starting from \(\mathbf{-1}\) the dynamics can be studied in terms of a much simpler evolution moving in the space of diagonal configurations.

We need some notation. We denote by \(\theta \) the horizontal shift on \({\Lambda }\):

$$\begin{aligned} \theta (i,j) = (i+1,j). \end{aligned}$$

By a common abuse of notation, we let \(\theta \) act on configurations by \(\theta {\sigma }_x := {\sigma }_{\theta (x)}\). For \(m=0,1,\ldots ,L-1\), let \(D_m\) denote the \(m\)-th NW-SE diagonal:

$$\begin{aligned} D_m := \{ (i,j) \in {\Lambda }: i+j = m\} \end{aligned}$$

(sums are, as always, taken mod \(L\)). Note that \(D_{m+1} = \theta D_m\). The diagonal configurations, i.e. those that are constant on the diagonals, are denoted by:

$$\begin{aligned} \mathcal {D} := \{ {\sigma }\in \mathcal {S} : \,x,y \in D_m \Rightarrow {\sigma }_x = {\sigma }_y \}. \end{aligned}$$

Assuming \({\sigma }(0) = {\sigma }\in \mathcal {D}\), it is immediately seen from the construction of the process given in Sect. 4.1 that if only typical updatings occur up to time \(N\), then \({\sigma }(n) = \theta ^n {\sigma }\) for \(n \le N\). Thus, the evolution is trivial up to the stopping time \(S\), and indeed

$$\begin{aligned} S= \min \{n : \, {\sigma }(n) \ne \theta {\sigma }(n-1)\}. \end{aligned}$$
(30)
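This stability of diagonal configurations can be seen in a small caricature of the zero-temperature dynamics; the following sketch (with hypothetical names) takes the limit in which typical updatings copy the common SW value with probability one and neutral updatings are fair coin flips, i.e. \(J \rightarrow \infty \), \(q \rightarrow 0\).

```python
import numpy as np

def zero_temperature_step(sigma, rng):
    """Caricature of the conditioned dynamics: if the down and left
    neighbours agree, copy their common value (typical updating); if they
    disagree, flip a fair coin (neutral updating, in the q -> 0 limit)."""
    down = np.roll(sigma, 1, axis=1)
    left = np.roll(sigma, 1, axis=0)
    coin = rng.choice([-1, 1], size=sigma.shape)
    return np.where(down == left, down, coin)

L = 8
i, j = np.indices((L, L))
sigma = np.where((i + j) % L < L // 2, 1, -1)   # a diagonal configuration
rng = np.random.default_rng(3)
eta = zero_temperature_step(sigma, rng)
# For a diagonal configuration no coin is ever used: each diagonal D_m
# inherits the value that D_{m-1} had, i.e. the pattern moves NE intact.
assert np.array_equal(eta, np.roll(sigma, 1, axis=1))
```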

Let \(T\) be the time defined in (24). On the event \(S < T\), which, as proven in Lemma 4.2, occurs with high probability, \({\sigma }(S)\) is diagonal up to a single discrepancy, i.e. there is a unique \(X \in {\Lambda }\) such that \({\sigma }_X(S)\) is opposite to all other spins in the diagonal containing \(X\), while \({\sigma }(S)\) is constant on all other diagonals. The next lemma shows that the site \(X\) at which the first discrepancy appears is nearly uniformly distributed in \({\Lambda }\).

Lemma 5.2

The conditional probability

$$\begin{aligned} {\mathbb P}(X = x| S < T) \end{aligned}$$

is constant on both elements of the following partition of \({\Lambda }\):

$$\begin{aligned} \{x : {\sigma }_x = {\sigma }_x^l\}, \ \ \{x : {\sigma }_x = -{\sigma }_x^l\} \end{aligned}$$

and

$$\begin{aligned} \frac{e^{-4q}}{L^2} \le {\mathbb P}(X = x| S < T) \le \frac{e^{4q}}{L^2} \end{aligned}$$
(31)

The next step in our argument consists in studying the process from the time the first discrepancy appears to the next hitting time of \(\mathcal {D}\), i.e. the time at which a diagonal configuration is obtained. As we shall see, the time needed to go back to \(\mathcal {D}\) is, with high probability, much shorter than the time needed for the next atypical updating to take place.

For a rigorous analysis, under the condition \(\{S < T\}\), we study the process \(\{{\sigma }(S + n): n \ge 0\}\). By the strong Markov property, this is equivalent to studying the process \(\{{\sigma }(n): n \ge 0\}\) with an initial condition \({\sigma }(0) = {\sigma }\) which is diagonal, with a single discrepancy at \(x \in D_m\). Starting with such a \({\sigma }\), besides typical and atypical updatings, neutral updatings arise. Indeed, the sites \(x^u\) and \(x^r\) can perform neutral updatings, having a left neighbor and a down neighbor of opposite signs. Suppose that no atypical updating occurs. Then at time \(1\) all diagonals are constant, except at most the diagonal \(D_{m+1}\). Here there are three possibilities:

  (i)

    both \({\sigma }^u_{x}\) and \({\sigma }_x^r\) update to \(-1\): the discrepancy disappears, and \({\sigma }(1)\) is diagonal;

  (ii)

    both \({\sigma }^u_{x}\) and \({\sigma }_x^r\) update to \(1\): the discrepancy has doubled, two neighboring sites in \(D_{m+1}\) are \(1\), while the rest of the diagonal is \(-1\).

  (iii)

    in the two remaining cases, the discrepancy has just shifted (up or right) to \(D_{m+1}\).

Under the condition of no atypical updatings, this argument can be repeated: the discrepancy is shifted from a diagonal \(D\) to \(\theta D\), and its length can increase or decrease by at most one unit. The configuration goes back to \(\mathcal {D}\) as soon as the discrepancies disappear or fill the whole diagonal. In order to keep the diagonal containing the discrepancy fixed, set

$$\begin{aligned} \eta (n) := \theta ^{-n} {\sigma }(n). \end{aligned}$$

If no atypical updating occurs, \(\eta \) remains constant except for the spins in \(D_m\): here the number of spins equal to \(1\) evolves as a random walk, which we show to be nearly symmetric. Standard estimates on random walks allow us to estimate the probability that the diagonal \(D_m\) gets filled by ones before returning to all \(-1\)'s.

To make this argument precise, define the following stopping time:

$$\begin{aligned} R := \min \{n>0 : \, {\sigma }(n) \in \mathcal {D} \}. \end{aligned}$$

Thus, \(R\) is the first time the configuration returns to \(\mathcal {D}\).

Lemma 5.3

Assume the initial configuration \({\sigma }\) is diagonal with a single discrepancy at \(x\), i.e., if \(x\in D_m\) then \({\sigma }_x=-{\sigma }_y\) for all \(y\not =x\) in \(D_m\); we denote by \({\mathcal D}_x\) the set of such configurations. Assume \(2k-4c-3>0\). Then, for all \(1<r<2k-1\)

$$\begin{aligned} {\mathbb P}_{{\sigma }}(R> L^r | {S} > L^{2k})&\le O(L^{-r+1}) \nonumber \\ {\mathbb P}_{{\sigma }}(R> L^r )&\le O(L^{-r+1}). \end{aligned}$$
(32)
$$\begin{aligned} {\mathbb P}_{{\sigma }}(\eta _x(R) = {\sigma }_x |{S} > L^{2k}) \sim \left\{ \begin{array}{ll} 4c \frac{\log L}{L} &{}\quad \text{ if } {\sigma }_x^u = {\sigma }_x \\ \frac{4c \log L}{L^{4c+1}} &{}\quad \text{ if } {\sigma }_x^u = -{\sigma }_x \end{array} \right. \end{aligned}$$
(33)

Moreover, let \(\eta ^{D_m}\) be the configuration obtained from \(\eta \) by flipping all spins in \(D_m\). Then

$$\begin{aligned} {\mathbb P}_{{\sigma }}\left( \eta (R) = \eta ^{D_m}\right) \sim \left\{ \begin{array}{ll} 4c \frac{\log L}{L} &{}\quad \text{ if } {\sigma }_x^u = {\sigma }_x \\ \frac{4c \log L}{L^{4c+1}} &{}\quad \text{ if } {\sigma }_x^u = -{\sigma }_x \end{array} \right. \end{aligned}$$
(34)

Before continuing our argument, we comment on the meaning of these inequalities. Since by (26) we know that the probability that an atypical updating occurs before time \(L^{2k}\) is small, inequality (32) implies, in particular, that the configuration goes back to \(\mathcal {D}\) in a time much shorter than \({S}\) (we are assuming \(k\) large). Inequality (33) states that the probability that the initial discrepancy at \(x\) propagates to the whole diagonal is much higher if \({\sigma }_x\), \(x \in D_m\), is equal to the spins in \(D_{m+1}\). Most importantly, Lemma 5.3 provides estimates on the transition from a starting diagonal configuration \({\sigma }\in \mathcal {D}\) to the next diagonal configuration hit after having left \(\mathcal {D}\). This suggests studying an effective process obtained by observing \(\eta (n)\) only at the times it enters \(\mathcal {D}\).

Define the stopping times

$$\begin{aligned} R_0&:= 0 \nonumber \\ S_1&:= \min \{m >0 : \, {\sigma }(m) \not \in \mathcal {D} \}=S \nonumber \\ R_n&:= \min \{m > S_n : \, {\sigma }(m) \in \mathcal {D} \}=S_n+R\circ \Theta _{S_n} \nonumber \\ S_{n+1}&:= \min \{m > R_n : {\sigma }(m) \not \in \mathcal {D}\}=R_n+S\circ \Theta _{R_n} \end{aligned}$$
(35)

where \(\Theta _t\) is the time shift operator acting on each trajectory of the Markov Chain \(\{{\sigma }(0),{\sigma }(1),\ldots \}\) as a shift

$$\begin{aligned} \Theta _t\{{\sigma }(0),{\sigma }(1),\ldots \}=\{{\sigma }(t),{\sigma }(t+1),\ldots \}. \end{aligned}$$

The following estimates follow from Lemmas 5.2 and 5.3.

Corollary 5.4

The following estimates hold for all \(n \ge 0\):

$$\begin{aligned} {\mathbb P}\left( S_{n+1} - R_n > L^{5k} \right) \le e^{-L^k}. \end{aligned}$$
(36)
$$\begin{aligned} {\mathbb P}\left( R_n - S_n > L^k \right) \le \sup _{{\sigma }\in \cup _x{\mathcal D}_x} {\mathbb P}_{\sigma }\left( R>L^k\right) +\sup _{\sigma }{\mathbb P}_{\sigma }\left( S=T\right) \le O\left( L^{-k+1}\right) . \end{aligned}$$
(37)

We now consider the Markov chain \((\eta (n))_{n \ge 0}\) at the times \(R_n\) where the chain visits \(\mathcal {D}\); more precisely we define

$$\begin{aligned} \xi (n) := \eta (R_n). \end{aligned}$$
(38)

By the strong Markov property, \((\xi (n))_{n \ge 0}\) is a Markov chain in \(\mathcal {D}\). Estimates on its transition probability are given in the following statement.

Corollary 5.5

For all \(\eta \in \mathcal {D}\) the following estimates hold.

  (a)

    If \(\eta _x = - \eta _y\) for \(x \in D_m\), \(y \in D_{m+1}\) (we say \(D_m\) is a favorable diagonal), then

    $$\begin{aligned} {\mathbb P}\left( \xi (n+1) = \eta ^{D_m} | \xi (n) = \eta \right) \ge O\left( \frac{\log L}{L^2} \right) . \end{aligned}$$
    (39)

    Moreover, the above conditional probability is constant in \(m\) on both elements of the partition of \(\{0,1,\ldots ,L-1\}\):

    $$\begin{aligned} \{m: x \in D_m, y \in D_{m-1} \Rightarrow \eta _x = \eta _y\}, \ \ \{m: x \in D_m, y \in D_{m-1} \Rightarrow \eta _x = - \eta _y\}. \end{aligned}$$
  (b)

    If \(\eta _x = \eta _y\) for \(x \in D_m\), \(y \in D_{m+1}\) (\(D_m\) is an unfavorable diagonal), then

    $$\begin{aligned} O\left( L^{-4c-2}\right) \le {\mathbb P}\left( \xi (n+1) = \eta ^{D_m} | \xi (n) = \eta \right) \le O\left( L^{-4c-1}\right) . \end{aligned}$$
    (40)
  (c)
    $$\begin{aligned} {\mathbb P}\left( \xi (n+1) \not \in \{ \eta , \eta ^{D_m}: \, m=0,\ldots ,L-1\}| \xi (n) = \eta \right) \le O\left( L^{-k+1}\right) \end{aligned}$$
    (41)

Proof

Estimates (39) and (40) follow from (34) and the fact (see (31)) that the discrepancy is nearly uniformly distributed in \({\Lambda }\) (Lemma 5.2). Estimate (41) follows from the observation that if \(\xi (n+1) \not \in \{ \eta , \eta ^{D_m}: \, m=0,\ldots ,L-1\}\), then necessarily either two atypical updatings have occurred simultaneously between times \(R_n\) and \(S_{n+1}\), or an atypical updating has occurred between times \(S_{n+1}\) and \(R_{n+1}\); the probability of this event has been estimated in (28) (used here with \(a = k-1\)) and (32). \(\square \)

The process \(\xi (n)\) defined in (38) starts at \(\xi (0) = \mathbf{-1}\), and it can clearly be identified with a process taking values in \(\{-1,1\}^L\). Thus we write \(\xi = (\xi _i)_{i=0}^{L-1}\), where \(\xi _i\) is the spin on the diagonal \(D_i\). By (40), after a waiting time of order at most \(L^{4c+2}\), a one is created at some \(i\). At this point there are two favorable diagonals, \(D_i\) and \(D_{i-1}\); all other diagonals are unfavorable. Thus, in one time step, two transitions are equally likely: \(\xi _i\) goes back to \(-1\) or \(\xi _{i-1}\) flips to \(1\). By (39), these transitions occur with probability \(p \ge O\left( \frac{\log L}{L^2} \right) \). The probability that \(\xi \) changes to some other configuration is, by (40) and (41), not larger than \( O\left( L^{-k+1}\right) + O\left( L^{-4c-1}\right) \). In case \(\xi \) is back to \(\mathbf{-1}\), the process starts afresh. Otherwise, there are two consecutive ones at \(i-1,i\). The above argument can be iterated: in the next step two diagonals are favorable, \(D_i\) and \(D_{i-2}\), so \(\xi _{i-2}\) and \(\xi _i\) flip with the same probability \(p\). Therefore, with overwhelming probability, the ones in \(\xi (n)\) are consecutive, and their number evolves, up to events of small probability, as a symmetric random walk with jump probability \(p\). This makes it simple, for this effective process, to give estimates on the hitting time of \(\mathbf{1}\) (see the sketch below).
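A Monte Carlo caricature of this effective process illustrates the polynomial hitting time. The simplification is ours: every favorable diagonal flips independently with probability \(p = \log L/L^2\) and every unfavorable one with probability \(L^{-4c-1}\), ignoring the \(O(L^{-k+1})\) transitions of (41).

```python
import numpy as np

def hitting_time_all_ones(L, c, rng, max_steps=10**7):
    """Renormalized chain on {-1,1}^L, one spin per NW-SE diagonal.

    Diagonal m is favorable when xi_m != xi_{m+1}; favorable diagonals
    flip with probability p (cf. (39)), unfavorable ones with a much
    smaller probability (cf. (40)).  Returns the first time all spins
    are +1, starting from all -1.
    """
    p = np.log(L) / L**2
    p_unf = L ** (-4.0 * c - 1.0)
    xi = -np.ones(L, dtype=int)
    for n in range(1, max_steps):
        favorable = xi != np.roll(xi, -1)        # compare xi_m with xi_{m+1}
        rate = np.where(favorable, p, p_unf)
        xi[rng.random(L) < rate] *= -1
        if np.all(xi == 1):
            return n
    return max_steps

rng = np.random.default_rng(4)
print(hitting_time_all_ones(L=8, c=0.75, rng=rng))
```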

Lemma 5.6

Let \(H^{(\xi )}_\mathbf{1}\) be the first time \(\xi (n)\) visits \(\mathbf{1}\). Then, assuming \(c > \frac{1}{2}\) and \(k-4c >4\),

$$\begin{aligned} {\mathbb P}\left( H^{(\xi )}_\mathbf{1} > L^{k+2} \right) \le O\left( L^{-1}\right) . \end{aligned}$$

We are now ready to complete the proof of Theorem 5.1. Indeed using also Corollary 5.4,

$$\begin{aligned} {\mathbb P}_\mathbf{-1}\left( T_\mathbf{1} > L^{8k} \right)&\le {\mathbb P}\left( H^{(\xi )}_\mathbf{1}>L^{k+2} \right) + {\mathbb P}\left( R_{L^{k+2}}>L^{8k}\right) \\&\le O\left( L^{-1}\right) + \sum _{n \le L^{k+2}} {\mathbb P}\left( R_{n} - R_{n-1}>L^{7k-2}\right) \\&= O\left( L^{-1}\right) , \end{aligned}$$

which is the desired result.

5.1 Proofs of the Lemmas

We are therefore left with the proofs of Lemmas 5.2, 5.3 and 5.6.

Proof of Lemma 5.2

For the proof of (31), recall that an atypical updating is made at \(x\) at time \(n\) if \(U_x(n) \in I_x({\sigma }(n-1))\), where

$$\begin{aligned} I_x({\sigma }) = \left\{ \begin{array}{ll} \left( 0, \frac{e^{-2J + q {\sigma }_x}}{2 \cosh (-2J + q {\sigma }_x)}\right) &{} \text{ if } {\sigma }_x^d = {\sigma }_x^l = -1 \\ \left( \frac{e^{2J + q {\sigma }_x}}{2 \cosh (2J + q {\sigma }_x)},1 \right) &{} \text{ if } {\sigma }_x^d = {\sigma }_x^l = 1 \end{array} \right. \end{aligned}$$

We have:

$$\begin{aligned} {\mathbb P}\left( X=x | T > S\right) =\frac{1}{{\mathbb P}(T>S)}\sum _n{\mathbb P}\left( X=x,\, S=n, \, T>n\right) \end{aligned}$$

and

$$\begin{aligned}&\{X=x,\, S=n, \, T>n \} \nonumber \\&\quad = \{S > n-1\} \cap \left\{ U_x(n) \in I_x({\sigma }(n-1)), \, U_y(n) \not \in I_y({\sigma }(n-1)) \text{ for } y \ne x \right\} . \end{aligned}$$

so that

$$\begin{aligned} {\mathbb P}\left( X=x,\, S=n, \, T>n\right)&= {\mathbb P}\left( S>n-1\right) |I_x({\sigma }(n-1))|\prod _{ y \ne x}\Big (1-|I_y({\sigma }(n-1))|\Big )\\ {}&= {\mathbb P}\left( S>n\right) \frac{|I_x({\sigma }(n-1))|}{1-|I_x({\sigma }(n-1))|}=: {\mathbb P}\left( S>n\right) f_x(n) \end{aligned}$$

We note that the function \(f_x(n)\), as a function of \(x\), is constant on the sets

$$\begin{aligned} M_+=\{x : {\sigma }_x = {\sigma }_x^l\}, \ \ M_-= \{x : {\sigma }_x = -{\sigma }_x^l\}, \end{aligned}$$

so on these sets \({\mathbb P}\left( X=x | T > S\right) \) is constant, say \({\mathbb P}\left( X=x | T > S\right) =P_{M_{\pm }}\). Moreover, since

$$\begin{aligned} \min _{\sigma }|I_x({\sigma })|\ge e^{-4q}\max _{\sigma }|I_x({\sigma })| \end{aligned}$$

we have uniformly in \(n\)

$$\begin{aligned} e^{-4q}<\frac{f_x(n)}{f_y(n)}<e^{4q} \end{aligned}$$

and so

$$\begin{aligned} e^{-4q}<\frac{P_{M_+}}{P_{M_-}}<e^{4q},\quad |M_+| P_{M_+}+|M_-|P_{M_-}=1 \end{aligned}$$

from which (31) easily follows. \(\square \)

Proof of Lemma 5.3

We prove (32) and (33). The second inequality in (32) follows from the first, (26) and the assumption \(r <2k-1\), since

$$\begin{aligned} {\mathbb P}_{{\sigma }}(R> L^r )\le {\mathbb P}_{{\sigma }}(R> L^r |S > L^{2k}) + {\mathbb P}_{{\sigma }}({S} \le L^{2k} ) \end{aligned}$$

Note that, under \({\mathbb P}_{{\sigma }}(\cdot |{S} > L^{2k})\), the random numbers \(\{U_x(n): \, x \in {\Lambda }, \, n \le L^{2k} \}\) are i.i.d. with uniform distribution on \(\left( \frac{e^{-2J+q}}{2\cosh (2J-q)}, 1- \frac{e^{-2J+q}}{2\cosh (2J-q)}\right) \). The following probabilities describe the two possible neutral updatings; atypical updatings are forbidden by the conditioning:

$$\begin{aligned} {\mathbb P}(\eta _x(1) = 1 |{S} > L^{2k})&= {\mathbb P}(\eta _{x^u}(1) = 1 |{S} > L^{2k})= \frac{e^{q {\sigma }_x^r}}{2 \cosh (q {\sigma }_x^r)} \nonumber \\&= \frac{1}{2} + \frac{c{\sigma }_x^r}{2} \frac{\log L}{L} + O\left( \left( \frac{\log L}{L} \right) ^2 \right) . \end{aligned}$$

Thus, denoting by \(N(n)\) the number of spins equal to \(1\) in the restriction to \(D_m\) of \(\eta (n)\), we have that

$$\begin{aligned} p_+ := {\mathbb P}(N(1) = 2 |{S} > L^{2k})&= \left( {\mathbb P}(\eta _x(1) = 1 |{S} > L^{2k})\right) ^2 \\&= \frac{1}{4} + \frac{c {\sigma }_x^r}{2} \frac{\log L}{L} + O\left( \left( \frac{\log L}{L} \right) ^2 \right) \nonumber \\ p_- := {\mathbb P}(N(1) = 0 |{S} > L^{2k})&= \left( 1- {\mathbb P}(\eta _x(1) = 1 |{S} > L^{2k})\right) ^2 \nonumber \\&= \frac{1}{4} - \frac{c {\sigma }_x^r}{2} \frac{\log L}{L} + O\left( \left( \frac{\log L}{L} \right) ^2 \right) . \end{aligned}$$

This argument can now be repeated, since either the discrepancy for \(\eta \) in \(D_m\) has disappeared, or two neutral updatings are possible. This implies that, for \(n \le L^{2k}\) and \(m>0\)

$$\begin{aligned} p_+&= {\mathbb P}(N(n) = m+1 |N(n-1) = m, \, {S} > L^{2k}) \nonumber \\ p_-&= {\mathbb P}(N(n) = m-1 |N(n-1) = m, \, {S} > L^{2k}) \end{aligned}$$

So, set \(\tilde{R} := \min \{ n: N(n) \in \{0,L\}\}\). Note that \(\tilde{R}\wedge L^{2k} = R\wedge L^{2k}\) on \(\{T > S > L^{2k} \}\). Moreover, up to time \(\tilde{R}\wedge L^{2k}\), \(N(n)\) evolves as a \((p_+, p_-)\) one-dimensional random walk. We recall that if \((\xi (n))_{n \ge 1}\) is a \((p_+, p_-)\) random walk with \(\xi (0) = 1\), and we denote by \(H_{0L}, H_0, H_L\) the hitting times of, respectively, \(\{0,L\}\), \(\{0\}\) and \(\{L\}\), then (see e.g. [4], XIV.2 and XIV.3, where the case \(p_+ + p_- = 1\) is treated, but the same proof applies to \(p_+ + p_- < 1\))

$$\begin{aligned} {\mathbb P}(H_L<H_0)=\frac{1-\frac{p_-}{p_+}}{1-\Big (\frac{p_-}{p_+}\Big )^L} \, \sim \, \left\{ \begin{array}{ll} \frac{4c \log L}{L} &{}\quad \text{ if } {\sigma }_x^u ={\sigma }_x \\ \frac{4c \log L}{L^{4c+1}} &{} \quad \text{ if } {\sigma }_x^u = - {\sigma }_x \end{array} \right. \end{aligned}$$
(42)
$$\begin{aligned} {\mathbb E}(H_{0L})=\frac{1}{p_+ - p_-}\left[ L\frac{1-\frac{p_-}{p_+}}{1-\Big (\frac{p_-}{p_+}\Big )^L}-1 \right] \, \sim \, \left\{ \begin{array}{ll} 4L &{} \quad \text{ if } {\sigma }_x^r =1 \\ \frac{ L}{c \log L} &{}\quad \text{ if } {\sigma }_x^r = - 1 \end{array} \right. \end{aligned}$$
(43)

In particular, by Markov inequality, for every \(r>1\)

$$\begin{aligned} {\mathbb P}(H_{0L} > L^r) \le O(L^{-r+1}). \end{aligned}$$
(44)

From (42) and (44), the desired estimates (32) and (33) follow. Finally, (34) follows from (26) and (33), using the assumption \(2k-4c-3>0\). \(\square \)
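The gambler's ruin formula (42) used above can be checked against simulation. In the following sketch the drift \(\varepsilon \) plays the role of \(\frac{c}{2}\frac{\log L}{L}\), and the walk is lazy (it stays put with probability \(1-p_+-p_-\)), which does not affect the hitting probabilities; the names and parameter values are ours.

```python
import numpy as np

def ruin_probability(p_plus, p_minus, n):
    """P(H_n < H_0) for a (p_+, p_-) walk started from 1, as in (42)."""
    r = p_minus / p_plus
    return (1.0 - r) / (1.0 - r**n)

def ruin_monte_carlo(p_plus, p_minus, n, runs, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(runs):
        x = 1
        while 0 < x < n:
            u = rng.random()
            if u < p_plus:
                x += 1
            elif u < p_plus + p_minus:
                x -= 1            # otherwise the lazy walk stays put
        hits += (x == n)
    return hits / runs

rng = np.random.default_rng(5)
n, eps = 20, 0.02
p_plus, p_minus = 0.25 + eps, 0.25 - eps
print(ruin_probability(p_plus, p_minus, n),
      ruin_monte_carlo(p_plus, p_minus, n, 20_000, rng))
```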

Proof of Lemma 5.6

Let

$$\begin{aligned} T^{\xi } := \min \{n: \xi (n) \not \in \{ \xi (n-1), \xi ^{D_m}(n-1): \, m=0,\ldots ,L-1\}\} . \end{aligned}$$

By (41),

$$\begin{aligned} {\mathbb P}(T^{\xi }\le L^{k-2}) \le L^{k-2} O\left( L^{-k+1}\right) = O(L^{-1}). \end{aligned}$$

Similarly to what we did in the previous lemmas, we condition the Markov chain \(\xi (n)\) on the event \(\{T^{\xi } > L^{k-2}\}\). Under this conditioning, we are left with a Markov chain for which, up to time \(L^{k-2}\), (39) and (40) hold, but transitions of the type in (41) are forbidden. Let

$$\begin{aligned} S^{(\xi )}_1 := \min \{n: \, \xi (n) \ne \mathbf{-1} \} \end{aligned}$$

be the first time the process leaves the initial configuration, and

$$\begin{aligned} \overline{S}^{(\xi )}_1 := \min \{n > S^{(\xi )}_1 : \xi (n) = \xi ^i(n-1) \text{ for } \text{ some } i \text{ such } \text{ that } \xi _i(n-1) = \xi _{i+1}(n-1) \}, \end{aligned}$$

where \(\xi ^i\) is the configuration obtained from \(\xi \) by flipping \(\xi _i\). By (40)

$$\begin{aligned} {\mathbb P}\left( S^{(\xi )}_1>L^{4c+3}\right) \le \left( 1- O\left( L^{-4c-2}\right) \right) ^{L^{4c+3}} \le e^{-O(L)}, \end{aligned}$$

and

$$\begin{aligned} {\mathbb P}\left( \overline{S}^{(\xi )}_1 - S^{(\xi )}_1 \le L^{2c} \right) \le L^{2c} L O\left( L^{-4c-1}\right) = O\left( L^{-2c}\right) . \end{aligned}$$

Conditioning on the event \(\{ T^{\xi }>L^{k-2}, \, S^{(\xi )}_1 \le L^{4c+3}, \overline{S}^{(\xi )}_1- S^{(\xi )}_1 > L^{2c} \}\) which, for \(k-4c\) large enough, has probability at least \(1-O\left( L^{-2c}\right) \ge 1-O(L^{-1})\), the number of spins equal to \(1\) in \(\xi (S^{(\xi )}_1 + n)\) evolves as a symmetric random walk, starting from \(1\), and moving with probability \(p \ge O\left( \frac{\log L}{L^2} \right) \) (see (39)). We now use identities analogous to (42) and (43) for the case \(p_+ = p_- = p\):

$$\begin{aligned} {\mathbb P}(H_L<H_0)= \frac{1}{L}, \end{aligned}$$
(45)

and

$$\begin{aligned} {\mathbb E}(H_{0L}) = \frac{L-1}{2p} \le O(L^3). \end{aligned}$$

It follows that

$$\begin{aligned} {\mathbb P}\left( \xi (H_{0L}) = \mathbf{1}, H_{0L}<\overline{S}^{(\xi )}_1 | T^{\xi }>L^{k-2}, \, S^{(\xi )}_1 \le L^{4c+3}, \overline{S}^{(\xi )}_1- S^{(\xi )}_1 > L^{2c} \right) \ge O(L^{-1}), \end{aligned}$$

and

$$\begin{aligned} {\mathbb P}(H_{0L} > C) \le \frac{O(L^3)}{C}. \end{aligned}$$

Thus, introducing the stopping times, for \(j \ge 1\) (note the analogy with (35) in the previous step of the renormalization)

$$\begin{aligned} R^{(\xi )}_0&:= 0 \\ S^{(\xi )}_j&:= \min \{n>R^{(\xi )}_{j-1}: \xi (n) \not \in \{\mathbf{-1}, \mathbf{1}\} \} \\ R^{(\xi )}_{j}&:= \min \{n>S^{(\xi )}_j : \xi (n) \in \{\mathbf{-1}, \mathbf{1}\} \} \end{aligned}$$

we have, by (45),

$$\begin{aligned} {\mathbb P}\left( \xi (R^{(\xi )}_j) = \mathbf{1} | \xi (R^{(\xi )}_{j-1}) = \mathbf{-1} \right) \ge O(L^{-1}), \end{aligned}$$

and

$$\begin{aligned} {\mathbb P}\left( R^{(\xi )}_j - R^{(\xi )}_{j-1} > L^k \right) \le {\mathbb P}\left( S^{(\xi )}_1 > L^{k-1}\right) + {\mathbb P}(H_{0L} > L^{k-1}) \le O(L^{-k+4}), \end{aligned}$$

where we have used again the fact that \(k-4c\) is sufficiently large. Finally, for \(k\) large enough,

$$\begin{aligned} {\mathbb P}\left( H^{(\xi )}_\mathbf{1}>L^{k+2} \right)&\le {\mathbb P}\left( \xi (R^{(\xi )}_j) \ne \mathbf{1} \text{ for } \text{ all } j \le L^2 \right) + \sum _{j \le L^2} {\mathbb P}\left( R^{(\xi )}_j - R^{(\xi )}_{j-1} > L^k \right) \\&\le \left( 1- O(L^{-1}) \right) ^{L^2} + L^2 O(L^{-k+4}) \le O(L^{-1}). \end{aligned}$$

\(\square \)