1 Introduction

This paper studies the out-of-equilibrium behavior of the Metropolis dynamics on the random energy model (REM). Our main goal is to answer one of the remaining important open questions in the field, namely whether this dynamics exhibits aging, and, if yes, whether its aging behavior admits the usual description in terms of stable Lévy processes.

Aging is one of the main features appearing in the long-time behavior of complex disordered systems (see e.g. [12] for a review). It was first observed experimentally in the anomalous relaxation patterns of the residual magnetization of spin glasses (e.g. [18, 29]). One of the most influential steps in the theoretical modeling of the aging phenomenon was the introduction of the so-called trap models by Bouchaud [11] and Bouchaud and Dean [13]. These models, while sufficiently simple to allow an analytical treatment, reproduce the characteristic power-law decay seen experimentally.

Since then, considerable effort has been made to put the predictions obtained from the trap models on a solid basis, that is, to derive these predictions from an underlying spin-glass dynamics. The first attempt in this direction was made in [68], where it was shown that, for a very particular Glauber-type dynamics, at time scales very close to equilibration, a well-chosen two-point correlation function converges to that given by Bouchaud’s trap model.

With the paper [9], where the same type of dynamics was studied in a more general framework and on a broader range of time scales, it emerged that aging manifests itself in the fact that scaling limits of certain additive functionals of Markov chains are stable Lévy processes, and that the convergence of the two-point correlation functions is just a manifestation of the classical arcsine law for stable subordinators.

The Glauber-type dynamics used in those papers, sometimes called random hopping time (RHT) dynamics, is, however, rather simple and often considered ‘non-realistic’, mainly because its transition rates do not take into account the energy of the target state. Its advantage is that it can be expressed as a time change of a simple random walk on the configuration space of the spin glass, which allows for a certain decoupling of the randomness of the dynamics from the randomness of the Hamiltonian of the spin glass, making its rigorous study more tractable.

For more realistic Glauber-type dynamics of spin glasses, like the so-called Bouchaud’s asymmetric dynamics or the Metropolis dynamics, such decoupling is not possible. As a consequence, these dynamics are far less understood.

Recently, some progress has been achieved in the context of the simplest mean-field spin glass model, the REM. First, in [31], Bouchaud’s asymmetric dynamics was considered in the regime where the asymmetry parameter tends to zero with the size of the system. Building on the techniques initiated in [32], this paper confirms the predictions of Bouchaud’s trap model in this regime. Second, the Metropolis dynamics was studied in [26], for a truncated version of the REM, using the techniques developed for the RHT dynamics in [24, 25], again confirming Bouchaud’s predictions.

The weak asymmetry assumption of [31] and the truncation of [26] both serve the same purpose: they overcome some specific difficulties caused by the asymmetry and recover certain features of the RHT dynamics. Our aim in this work is to get rid of these simplifications and treat the unmodified REM with the usual Metropolis dynamics.

Let us also mention that Bouchaud’s asymmetric dynamics (and implicitly the Metropolis one) is rather well understood in the context of trap models on \(\mathbb {Z}^d\), see [3, 14, 27], where it is possible to exploit the connections to the random conductance model with unbounded conductances [4]. Finally, the Metropolis dynamics on the complete graph was considered in [25].

Before stating our main result, let us briefly recall the general scheme for proving aging in terms of convergence to stable Lévy processes. The actual spin glass dynamics, \(X=(X_t)_{t\ge 0}\), which is reversible with respect to the Gibbs measure of the Hamiltonian, is compared to another Markov chain \(Y=(Y_t)_{t\ge 0}\) on the same space, which is an ‘accelerated’ version of X and whose stationary measure is uniform. The process Y is typically easier to understand, e.g. it is a simple random walk for the RHT dynamics, and the original process X can be written as its time change,

$$\begin{aligned} X(t)=Y(S^{-1}(t)), \end{aligned}$$
(1.1)

for the right continuous inverse \(S^{-1}\) of a certain additive functional S of the Markov chain Y, called the ‘clock process’. The aim is then to show convergence of the properly rescaled clock process S to an increasing stable Lévy process, that is to a stable subordinator.

We now state our main result. We consider the unmodified REM, as introduced in [19, 20]. The state space of this model is the N-dimensional hypercube \(\mathbb {H}_N=\{-1,1\}^N\), and its Hamiltonian is a collection \((E_x)_{x\in \mathbb {H}_N}\) of i.i.d. standard Gaussian random variables defined on some probability space \((\Omega , \mathcal {F},\mathbb {P})\). The non-normalized Gibbs measure \(\tau _x = e^{\beta \sqrt{N}E_x}\) at inverse temperature \(\beta >0\) gives the equilibrium distribution of the system.

The Metropolis dynamics on the REM is the continuous-time Markov chain \(X=(X_t)_{t\ge 0}\) on \(\mathbb {H}_N\) with transition rates

$$\begin{aligned} r_{xy}= e^{-\beta \sqrt{N} (E_x-E_y)_+}\mathbf {1}_{\{x\sim y\}} = \left( 1\wedge \frac{\tau _y}{\tau _x}\right) \mathbf {1}_{\{x\sim y\}}, \qquad x,y\in {\mathbb {H}}_N. \end{aligned}$$
(1.2)

Here, \(x\sim y\) means that x and y are neighbors on \({\mathbb {H}}_N\), that is they differ in exactly one coordinate.
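
Although no simulations are used in this paper, the chain is straightforward to simulate. The following minimal Python sketch (ours, for illustration only; all names are hypothetical, and the i.i.d. energies are sampled lazily) performs one jump of the chain (1.2):

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 10, 1.0
E = {}  # lazily sampled i.i.d. standard Gaussian energies E_x

def energy(x):
    key = tuple(x)
    if key not in E:
        E[key] = rng.standard_normal()
    return E[key]

def metropolis_jump(x):
    """One jump of the chain (1.2): neighbor y is reached at rate
    r_xy = exp(-beta*sqrt(N)*(E_x - E_y)_+) = 1 ^ (tau_y / tau_x)."""
    Ex = energy(x)
    rates = np.empty(N)
    for i in range(N):
        y = x.copy(); y[i] = -y[i]
        rates[i] = np.exp(-beta * np.sqrt(N) * max(Ex - energy(y), 0.0))
    total = rates.sum()
    hold = rng.exponential(1.0 / total)   # exponential holding time at x
    i = rng.choice(N, p=rates / total)    # coordinate to flip
    y = x.copy(); y[i] = -y[i]
    return y, hold

x = rng.choice([-1, 1], size=N)
x, t = metropolis_jump(x)
```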

As explained above, we will compare the Metropolis chain X with another ‘fast’ Markov chain \(Y = (Y_t)_{t\ge 0}\) with transition rates

$$\begin{aligned} q_{xy}= \frac{\tau _x\wedge \tau _y}{1\wedge \tau _x}\mathbf {1}_{\{x\sim y\}},\qquad x,y\in {\mathbb {H}}_N. \end{aligned}$$
(1.3)

It can be easily checked using the detailed balance conditions that Y is reversible and that its equilibrium distribution is

$$\begin{aligned} \nu _x=\frac{1\wedge \tau _x}{Z_N}, \qquad x\in {\mathbb {H}}_N, \end{aligned}$$

where \(Z_N= \sum _{x\in \mathbb {H}_N}(1\wedge \tau _x)\). Finally, since \(r_{xy} = (1\vee \tau _x)^{-1} q_{xy}\), X can be written as a time change of Y as in (1.1) with the clock process S being given by

$$\begin{aligned} S(t) = \int _0^t (1\vee \tau _{Y_s})\,{\mathrm d}s. \end{aligned}$$
(1.4)
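
The identity \(r_{xy} = (1\vee \tau _x)^{-1} q_{xy}\) behind this time change is a one-line computation: since \((1\wedge \tau _x)(1\vee \tau _x)=\tau _x\), we have for \(x\sim y\)

$$\begin{aligned} \frac{q_{xy}}{1\vee \tau _x} = \frac{\tau _x\wedge \tau _y}{(1\wedge \tau _x)(1\vee \tau _x)} = \frac{\tau _x\wedge \tau _y}{\tau _x} = 1\wedge \frac{\tau _y}{\tau _x} = r_{xy}. \end{aligned}$$

The same product \(\nu _x q_{xy} = (\tau _x\wedge \tau _y)/Z_N\), being symmetric in x and y, also verifies the detailed balance conditions claimed above.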

For the rest of the paper we deal almost exclusively with the process Y and the clock process S; with the exception of this introduction and Sect. 8, the actual Metropolis dynamics X does not appear. For a fixed environment \(\tau =(\tau _x)_{x\in \mathbb {H}_N}\), let \(P^{\tau }_{\nu }\) denote the law of the process Y started from its stationary distribution \(\nu \), and let \(D([0,T],\mathbb {R})\) be the space of \(\mathbb {R}\)-valued càdlàg functions on [0, T]. We denote by \(\beta _c = \sqrt{2 \log 2}\) the (static) critical temperature of the REM. Our main result is the following theorem.

Theorem 1.1

Let \(\alpha \in (0,1)\) and \(\beta >0\) be such that

$$\begin{aligned} \frac{1}{2} < \frac{\alpha ^2\beta ^2}{\beta _c^2} < 1, \end{aligned}$$
(1.5)

and define

$$\begin{aligned} g_N=e^{\alpha \beta ^2 N}\big (\alpha \beta \sqrt{2\pi N}\big )^{-\frac{1}{\alpha }}. \end{aligned}$$
(1.6)

Then there are random variables \(R_N\) which depend on the environment \((E_x)_{x\in \mathbb {H}_N}\) only, such that for every \(T>0\) the rescaled clock processes

$$\begin{aligned} S_N(t) = g_N^{-1}S(tR_N),\qquad t\in [0,T], \end{aligned}$$

converge in \(\mathbb {P}\)-probability as \(N\rightarrow \infty \), in \(P^{\tau }_{\nu }\)-distribution on the space \(D([0,T],\mathbb {R})\) equipped with the Skorokhod \(M_1\)-topology, to an \(\alpha \)-stable subordinator \(V_{\alpha }\). The random variables \(R_N\) satisfy

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\log R_N}{N} = \frac{\alpha ^2 \beta ^2}{2}, \quad \mathbb {P}\text {-a.s.} \end{aligned}$$
(1.7)

Let us make a few remarks on this result.

  1.

    The result of Theorem 1.1 confirms that the predictions of Bouchaud’s trap model hold for the Metropolis dynamics on the REM, at least at the level of scaling limits of clock processes. It also compares directly to the results obtained for the symmetric (RHT) dynamics in Theorem 3.1 of [9]. The scales \(g_N\) and \(R_N\) are (up to sub-exponential prefactors) the same as there. As in [9], the right inequality of the assumption (1.5) is completely natural: beyond it, Y ‘feels’ the finiteness of \(\mathbb {H}_N\) and aging is not expected to occur. The left inequality in (1.5) is technical; it ensures that the relevant deep traps are well separated (cf. Lemma 2.1), introducing certain simplifications in the proof. Based also on Theorem 2.1 of [15] (which is an improved version of Theorem 3.1 of [9]), we believe that this bound might be relaxed to \(\alpha ^2 \beta ^2 /\beta _c^2 > 0\) by further exploiting our method. Finally, as before, note that (1.5) is satisfied also for \( \beta <\beta _c\) for appropriate \(\alpha \), hence aging can occur above the critical temperature at sufficiently short observation scales.

  2.

    Our choice of the fast chain Y is rather unusual. In view of the previous papers [3, 31], it would be natural to take instead the chain \(\tilde{Y}\) with uniform invariant measure and with transition rates \(\tau _x\wedge \tau _y\), that is without the correction \(1\wedge \tau _x\) which appears in the denominator of (1.3). This choice has, however, some deficiencies. On the heuristic level, \(\tilde{Y}\) is not an acceleration of X, since it is much slower than X on sites with very small Gibbs measure \(\tau _x\ll 1\). These sites, which are irrelevant for the statics, then ‘act as traps’ for \(\tilde{Y}\), making them relevant for the dynamics, which is undesirable. On the technical level, the trapping at sites with small Gibbs measure has the consequence that the mixing time of \(\tilde{Y}\) is very large. Our choice of the fast chain Y runs as fast as X on the sites with small Gibbs measure and thus does not have this deficiency. Moreover, since \(\nu _x=Z_N^{-1}\) whenever \(E_x\ge 0\), the equilibrium distribution of the fast chain Y is still uniform on the relevant deep traps, so the clock process S retains its usual importance for aging. Remark also that in order to overcome the difficulties with the slow mixing of the chain \(\tilde{Y}\), [31] truncates the Hamiltonian of the REM at 0, which effectively sets \(\tau _x\ge 1\) for all \(x\in {\mathbb {H}}_N\). We prefer to retain the full REM and use the modified fast chain Y instead. Finally, [26] uses the discrete skeleton of X as the base chain, which has some interesting features but introduces similar undesirable effects.

  3.

    We view Theorem 1.1 as our primary aging statement. The reason for this is that it seems hard, without extending the paper considerably, to derive aging results in terms of the usual two-point functions as e.g. in [9, 26], or in terms of the age process as in [22, 31]. Such a derivation usually requires some knowledge of the fast chain Y that goes beyond the \(M_1\)-convergence of the clock processes. This knowledge is typically obtained automatically in the previous approaches. The strength (or the weakness) of our method is that we do not need such finer knowledge to show the clock process convergence. Without it, we can only consider non-standard (and to some extent artificial) two-point functions or age processes. We give two examples of such results in the concluding Sect. 8.

  4.

    A rather unusual feature of Theorem 1.1 is the fact that the scaling \(R_N\) is random: it depends on the random environment. This is again a consequence of our technique. Claim (1.7) in Theorem 1.1, however, shows that at least the exponential growth of \(R_N\) is deterministic. The random scale \(R_N\) is explicitly defined in (2.10). We will see that its definition depends on a somewhat free choice of an auxiliary parameter, but nevertheless the final result does not depend on this parameter. This property (and also other heuristic considerations) leads us to conjecture that \(R_N\) should actually satisfy a law of large numbers,

    $$\begin{aligned} \lim _{N\rightarrow \infty } h(N) e^{-\alpha ^2 \beta ^2N /2 } R_N =1, \qquad {\mathbb {P}}\text {-a.s.}, \end{aligned}$$

    for some deterministic function h(N) growing (or decaying) at most sub-exponentially. Observe also that the randomness of \(R_N\) should not have, at least heuristically, any influence on the aging behavior of the process X. \(R_N\) is the time scale of the auxiliary process Y. The only physically relevant scale is \(g_N\), giving the observation time-scale for the Metropolis chain X. At least partially, this is confirmed by the aging results of Sect. 8.

  5.

    The mode of convergence in Theorem 1.1 is not optimal; one would rather like to obtain the convergence in \(P^{\tau }_{\nu }\)-distribution for \(\mathbb {P}\)-almost every environment, which is usually called ‘quenched’ convergence. Actually, Theorem 1.1 can be strengthened slightly to a statement which lies between \({\mathbb {P}}\)-a.s. convergence and convergence in \({\mathbb {P}}\)-probability. Namely, the statement holds for a.e. realization of the sites with ‘small’ \(\tau _x\), but only in probability over the sites with ‘large’ \(\tau _x\), cf. Remark 6.4.

  6.

    Our proof of Theorem 1.1 strongly exploits the i.i.d. structure of the Hamiltonian of the REM. At present we do not know if it is possible to combine our techniques with those used for the RHT dynamics of the p-spin model in [5, 10].

We proceed by commenting on the proof of Theorem 1.1, concentrating mainly on its novelties. The general strategy to prove such a result has so far been to first reduce the problem to the clock process restricted to a set of deep traps which govern the behavior of the original clock process. The various methods then all more or less aim at dividing the contributions of successively visited deep traps into essentially i.i.d. blocks. For example in [9] or [3], this is achieved by controlling the hitting probabilities of deep traps, proving that they are hit essentially uniformly at exponentially distributed times, and controlling the time the chain spends at the deep traps by a sharp control of the Green function. Similarly precise estimates on hitting probabilities and/or Green functions are necessary in other approaches. Using this i.i.d. structure, one can then show convergence of the clock process by standard methods, e.g. by computing the Laplace transform.

The method used in this paper is slightly inspired by the general approach taken in [17, 22]. There, models of trapped random walks on \(\mathbb {Z}^d\) are considered where little information about the discrete skeleton as well as the waiting times of a continuous-time Markov chain is available, and minimal necessary conditions for convergence of the clock process are found. Taking up this idea, instead of analyzing in detail the behavior of the fast chain Y, we extract the minimal amount of information needed to show convergence of the clock process. In particular, we do not need any exact control of hitting probabilities and Green functions of deep traps, as most previous work did.

The first step in our proof is standard, namely that the main contribution to the clock process comes from a small set of vertices with large Gibbs measure \(\tau _x\), the so-called deep traps. More precisely, denoting the set of deep traps by \(\mathcal {D}_N\) (see Sect. 2 for details), we will show that the clock process S can be well approximated by the ‘clock process of the deep traps’

$$\begin{aligned} S_{\mathcal {D}}(t) = \int _0^{t} \big (1\vee \tau _{Y_s}\big )\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}}\,{\mathrm d}s. \end{aligned}$$
(1.8)

Then it remains to show that in fact \(g_N^{-1}S_{\mathcal {D}}(tR_N)\) converges to a stable subordinator.

To this end, we will in some sense invert the standard procedure described above. Instead of approximating the clock process by an i.i.d. block structure and then using the Laplace transform to show convergence, we will first compute a certain conditional Laplace transform using some special properties of the Metropolis dynamics. Then we analyze what is actually needed in order to show convergence of the unconditional Laplace transform.

In more detail, this will be done as follows. Under condition (1.5), the deep traps are almost surely well separated. This fact, together with the fact that the definition (1.3) contains the factor \(\tau _x\wedge \tau _y\), implies that the transition rates \(q_{xy}\) of the fast chain Y do not depend on the energies \(E_x\) of the deep traps. Therefore, one can condition on the location of all deep traps and on the energies \(E_x\) of the non-deep traps, which determines the law \(P^{\tau }_{\nu }\) of Y, and take the expectation over the energies of the deep traps. We call this a ‘quasi-annealed’ expectation, and denote it by \(\mathbb {E}_{\mathcal {D}}\) for the moment. Let \(\ell _ {t}(x)\) denote the local time of the fast chain Y (see Sect. 2 for details). As \(\mathbb {E}_{\mathcal {D}}\) is simply an expectation over i.i.d. random variables, the quasi-annealed Laplace transform of the rescaled clock process of the deep traps given Y can be computed. It essentially behaves like

$$\begin{aligned} \mathbb {E}_{\mathcal {D}}\Big [e^{-\lambda \frac{1}{g_N}S_{\mathcal {D}}(tR_N)}\,\Big |\, Y \Big ] \approx \exp \bigg \{-\mathcal {K}\lambda ^{\alpha } \varepsilon _N \sum _{x\in \mathcal {D}_N} \ell _{tR_N}(x)^{\alpha }\bigg \}. \end{aligned}$$
(1.9)

Here, \(\varepsilon _N\) is a deterministic sequence tending to 0 as \(N\rightarrow \infty \). The above approximation shows that the only object related to Y we have to control is the local-time functional \(\varepsilon _N \sum _{x\in \mathcal {D}_N}\ell _{tR_N}(x)^{\alpha }\).

We will show that this a priori non-additive functional of Y actually behaves in an additive way, namely that it converges to t as \(N\rightarrow \infty \), under \(P^{\tau }_{\nu }\) for \(\mathbb {P}\)-a.e. environment \(\tau \). For this convergence to hold it is sufficient to have some weak bounds on the mean hitting time of deep traps as well as some control on the mixing of the chain Y together with an appropriate choice of the scale \(R_N\) that depends on the environment.

Using standard methods we then strengthen the quasi-annealed convergence to quenched convergence (in the sense of Theorem 1.1).

To conclude the introduction, let us comment on how our method might be extended. The key argument in the computation of the quasi-annealed Laplace transform, namely the fact that the chain Y is independent of the depths of the deep traps, seems very specific to the Metropolis dynamics. However, by adapting the method appropriately and using network reduction techniques, we believe that one could also treat Bouchaud’s asymmetric dynamics, as well as the Metropolis dynamics in the regime where the left-hand inequality of (1.5) fails, i.e. when there are neighboring deep traps.

The rest of the paper is structured as follows. Detailed definitions and the notation used throughout the paper are introduced in Sect. 2. In Sect. 3 we analyze the mixing properties of the fast chain Y, which will be crucial at several points later. In Sect. 4 we give bounds on the mean hitting time of deep traps and on the normalizing scale \(R_N\). Using these bounds and the results on the mixing of Y, we show concentration of the local time functional \(\varepsilon _N\sum _{x \in \mathcal {D}_N}\ell _{tR_N}(x)^{\alpha }\) in Sect. 5. We prove convergence of the rescaled clock process of the deep traps in Sect. 6 with the above mentioned computation of the quasi-annealed Laplace transform, using the concentration of the local time functional. We treat the shallow traps in Sect. 7 by showing that their contribution to the clock process can be neglected. Finally, in Sect. 8, we prove Theorem 1.1 and present some additional aging results, as already mentioned. In the Appendix we give the proof of a technical result which is used to bound the expected hitting times in Sect. 4.

2 Definitions and notation

In this section we introduce some notation used throughout the paper and recall a few useful facts. We use \({\mathbb {H}}_N\) to denote the N-dimensional hypercube \(\{-1,1\}^N\) equipped with the usual distance

$$\begin{aligned} d(x,y)=\frac{1}{2}\sum _{i=1}^N |x_i-y_i|, \end{aligned}$$

and write \(\mathcal {E}_N\) for the set of nearest-neighbor edges \(\mathcal {E}_N = \{\{x,y\}:~d(x,y)=1\}\).

For given parameters \(\alpha \) and \(\beta \), let

$$\begin{aligned} \gamma =\frac{\alpha ^2 \beta ^2}{\beta _c^2}\in ( 1/2, 1), \end{aligned}$$
(2.1)

by condition (1.5) in Theorem 1.1.

Recall from the introduction that \((E_x:x\in \mathbb {H}_N,N\ge 1)\) is a family of i.i.d. standard Gaussian random variables defined on some probability space \((\Omega ,\mathcal {F},\mathbb {P})\). Note that we do not denote the dependence on N explicitly, but we assume that the space \((\Omega , \mathcal {F}, {\mathbb {P}})\) is the same for all N. For \(\beta >0\) the non-normalized Gibbs factor \(\tau _x\) is given by \(\tau _x=e^{\beta \sqrt{N}E_x}\).

Using the standard Gaussian tail approximation,

$$\begin{aligned} \mathbb {P}[E_x\ge t] = \frac{1}{t\sqrt{2\pi } }\ e^{-{t^2}/{2}}\big (1+o(1)\big ) \quad \text {as }t\rightarrow \infty , \end{aligned}$$
(2.2)

we obtain that \(g_N\), as defined in Theorem 1.1, satisfies

$$\begin{aligned} \mathbb {P}[\tau _x>ug_N] = u^{-\alpha } 2^{-\gamma N}\big (1+o(1)\big ). \end{aligned}$$

This heuristically important computation explains the appearance of stable laws in the distribution of sums of \(\tau _x\): If we observe \(2^{\gamma N}\) vertices, then finitely many of them have their rescaled Gibbs measures \(\tau _x/g_N\) of order unity, and, moreover, those rescaled Gibbs measures behave like random variables in the domain of attraction of an \(\alpha \)-stable law.
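For the reader’s convenience, we sketch the computation behind the last display. Writing \(\{\tau _x > ug_N\} = \{E_x \ge t_N(u)\}\) with, by the definition (1.6) of \(g_N\),

$$\begin{aligned} t_N(u) = \frac{\log (ug_N)}{\beta \sqrt{N}} = \alpha \beta \sqrt{N} + \frac{\log u - \frac{1}{\alpha }\log \big (\alpha \beta \sqrt{2\pi N}\big )}{\beta \sqrt{N}}, \end{aligned}$$

we get \(t_N(u)^2/2 = \frac{\alpha ^2\beta ^2 N}{2} + \alpha \log u - \log \big (\alpha \beta \sqrt{2\pi N}\big ) + o(1)\) and \(t_N(u)\sqrt{2\pi } = \alpha \beta \sqrt{2\pi N}\,(1+o(1))\). Inserting this into (2.2), the polynomial prefactors cancel, and since \(\beta _c^2 = 2\log 2\) gives \(e^{-\alpha ^2\beta ^2 N/2} = 2^{-\gamma N}\) by (2.1), we arrive at \(\mathbb {P}[\tau _x>ug_N] = u^{-\alpha }2^{-\gamma N}(1+o(1))\).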

Recall also that \(Y=(Y_t)_{t\ge 0}\) stands for the fast Markov chain whose transition rates \(q_{xy}\) are given in (1.3), and that \(\nu = (\nu _x)_{x\in {\mathbb {H}}_N}\) denotes the invariant distribution of this chain, \(\nu _x=\frac{1\wedge \tau _x}{Z_N}\). For a given environment \(\tau =(\tau _x)_{x\in \mathbb {H}_N}\), let \(P^{\tau }_x\) and \(P^{\tau }_{\nu }\) denote the laws of Y started from a vertex x or from \(\nu \) respectively, and \(E^{\tau }_x\), \(E^{\tau }_{\nu }\) the corresponding expectations.

Note that the normalization factor \(Z_N = \sum _{x\in {\mathbb {H}}_N} (1\wedge \tau _x)\) satisfies, for every constant \(\kappa \in (0,1/2)\),

$$\begin{aligned} \kappa 2^N \le Z_N \le 2^N \qquad {\mathbb {P}}\text {-a.s. for}\,N\, \text {large enough}. \end{aligned}$$
(2.3)

Indeed, obviously \(Z_N \le 2^N\), and \(Z_N \ge \sum _{x\in \mathbb {H}_N} \mathbf {1}_{\{E_x\ge 0\}}\). But the \(\mathbf {1}_{\{E_x\ge 0\}}\) are i.i.d. Bernoulli(1/2) random variables, so the statement follows immediately from the law of large numbers.

An important role in the study of properties of Y is played by the conductances defined by

$$\begin{aligned} c_{xy} = \nu _x q_{xy} = \frac{\tau _x\wedge \tau _y}{Z_N} \qquad \text {for }x\sim y. \end{aligned}$$
(2.4)

Let \(\theta _s\) be the left shift on the space of trajectories of Y, that is

$$\begin{aligned} (\theta _sY)_t=Y_{s+t}. \end{aligned}$$
(2.5)

Let \(H_x = \inf \{t>0:~Y_t=x\}\) be the hitting time of x by Y, \(J_1\) the time of the first jump of Y, and let \(H^+_x = H_x\circ \theta _{J_1} + J_1 = \inf \{t>J_1:~Y_t=x\}\) be the return time to x by Y. Similarly define \(H_A\) and \(H^+_A\) for a set \(A \subset \mathbb {H}_N\). The local time \(\ell _t(x)\) of Y is given by

$$\begin{aligned} \ell _t(x) = \int _0^t\mathbf {1}_{\{Y_s=x\}}{\mathrm d}s. \end{aligned}$$

Using this notation the clock process S introduced in (1.4) can be written as

$$\begin{aligned} S(t) = \int _0^t (1\vee \tau _{Y_s})\,{\mathrm d}s = \sum _{x\in \mathbb {H}_N} \ell _t(x) (1\vee \tau _x). \end{aligned}$$
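
For illustration (this is not used anywhere in the proofs), the following Python sketch (ours; all names are hypothetical) runs the fast chain Y with the rates (1.3) and accumulates the clock process S in exactly this weighted local-time form:

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta = 10, 1.0
tau = {}  # lazily sampled Gibbs factors tau_x = exp(beta*sqrt(N)*E_x)

def gibbs(x):
    key = tuple(x)
    if key not in tau:
        tau[key] = np.exp(beta * np.sqrt(N) * rng.standard_normal())
    return tau[key]

def clock(x, t_max):
    """Run Y with rates q_xy = (tau_x ^ tau_y)/(1 ^ tau_x) up to time t_max
    and return S(t_max) = sum_x l_t(x) * (1 v tau_x), as in (1.4)."""
    t, S = 0.0, 0.0
    while True:
        tx = gibbs(x)
        rates = np.empty(N)
        for i in range(N):
            y = x.copy(); y[i] = -y[i]
            rates[i] = min(tx, gibbs(y)) / min(1.0, tx)
        total = rates.sum()
        hold = min(rng.exponential(1.0 / total), t_max - t)
        S += hold * max(1.0, tx)   # local time at x, weighted by 1 v tau_x
        t += hold
        if t >= t_max:
            return S
        i = rng.choice(N, p=rates / total)
        x = x.copy(); x[i] = -x[i]

x0 = rng.choice([-1, 1], size=N)
print(clock(x0, t_max=10.0))
```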

To define the set of deep traps \(\mathcal {D}_N\) and the random scale \(R_N\) mentioned in the introduction we introduce a few additional parameters. For \(\alpha \in (0,1)\), \(\beta >0\) as in Theorem 1.1 and \(\gamma \) as in (2.1), we fix \(\gamma '\) and \(\alpha '\) such that

$$\begin{aligned} \frac{1}{2}<\gamma '<\gamma , \quad \text {and} \quad \alpha ' = \frac{\beta _c}{\beta } \sqrt{\gamma '}. \end{aligned}$$
(2.6)

An explicit choice of \(\gamma '\) will be made later in Sect. 5. We define the auxiliary scale

$$\begin{aligned} g'_N = e^{\alpha ' \beta ^2 N} \big (\alpha '\beta \sqrt{2\pi N}\big )^{-\frac{1}{\alpha ' }}, \end{aligned}$$

and set

$$\begin{aligned} \mathcal {D}_N=\{x\in \mathbb {H}_N:~\tau _x \ge g'_N\} \end{aligned}$$

to be the set of deep traps. By the Gaussian tail approximation (2.2) it follows that the density of \(\mathcal {D}_N\) satisfies

$$\begin{aligned} \mathbb {P}[x\in \mathcal {D}_N] = 2^{-\gamma ' N}(1+o(1)). \end{aligned}$$
(2.7)

We quote the following observation on the size and sparseness of \(\mathcal {D}_N\). The sparseness will play a key role in our computation of the quasi-annealed Laplace transform in Sect. 6.

Lemma 2.1

[9, Lemma 3.7] For every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} |\mathcal {D}_N|2^{(\gamma '-1)N} \in (1-\varepsilon ,1+\varepsilon ). \end{aligned}$$
(2.8)

Moreover, since \(\gamma '>1/2\), there exists \(\delta >0\) such that \(\mathbb {P}\)-a.s. for N large enough, the separation event

$$\begin{aligned} \mathscr {S}=\big \{\min \{d(x,y):~x,y\in \mathcal {D}_N\} \ge \delta N\big \} \end{aligned}$$
(2.9)

holds.

Finally, for the sake of concreteness, let us give the explicit form of the random scale \(R_N\),

$$\begin{aligned} R_N = 2^{(\gamma -\gamma ')N}\left( \sum _{x\in \mathcal {D}_N} \frac{E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]}{E^{\tau }_{\nu }[H_x]}\right) ^{-1}, \end{aligned}$$
(2.10)

where \({T_{\mathrm {mix}}}\) denotes the mixing time of Y, a randomized stopping time which we will construct in Sect. 3. The reason for this definition will become apparent when we prove the concentration of the local time functional mentioned in the introduction. Although the definition of \(R_N\) seems arbitrary due to the somewhat free choice of the parameter \(\gamma '\), Theorem 1.1 actually shows that asymptotically \(R_N\) is independent of \(\gamma '\).

For the rest of the paper, \(c,c',c''\) will always denote positive constants whose values may change from line to line. We will use the notation \(g=o(1)\) for a function g(N) that tends to 0 as \(N\rightarrow \infty \), and \(g=O(f)\) for a function g(N) that is asymptotically at most of order f(N), i.e. \(\limsup _{N\rightarrow \infty }|g(N)|/f(N)\le c\) for some \(c>0\).

3 Mixing properties of the fast chain

The fact that the chain Y mixes fast, namely on a scale polynomial in N, plays a crucial role in many of our arguments. In this section we analyze the mixing behavior of Y. We first give a lower bound on the spectral gap \(\lambda _Y\) of Y, which we then use to construct a strong stationary time \({T_{\mathrm {mix}}}\).

Proposition 3.1

There are constants \(\kappa >0\), \(K>0\), \(C_0>0\), such that \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \lambda _Y \ge \frac{\kappa }{4} N^{-K-1-\beta C_0}. \end{aligned}$$

We prove this proposition with the help of the Poincaré inequality derived in [21]. To state this inequality, let \(\Gamma \) be a complete set of self-avoiding nearest-neighbor paths on \(\mathbb {H}_N\), that is, for each \(x\ne y\in \mathbb {H}_N\) there is exactly one path \(\gamma _{xy} \in \Gamma \) connecting x and y. Let \(|\gamma |\) be the length of the path \(\gamma \). By [21, Proposition 1’, p. 38], using also the reversibility of Y and recalling the definition (2.4) of the conductances, it follows that

$$\begin{aligned} \frac{1}{\lambda _Y} \le \max _{e=\{u,v\}\in \mathcal {E}_N} \left\{ \frac{1}{c_{uv}}\sum _{ \begin{array}{c} \gamma _{xy}\in \Gamma :\gamma _{xy}\ni e \end{array}} |\gamma _{xy}|\nu _x\nu _y \right\} . \end{aligned}$$
(3.1)

To minimize the right-hand side of (3.1), special care should be taken of the edges whose conductance \(c_{uv} = (\tau _u \wedge \tau _v)/Z_N\) is very small, that is, which are incident to vertices with very small \(\tau _u\). Those ‘bad’ edges should be avoided if possible by the paths \(\gamma \in \Gamma \). They cannot be avoided completely, since \(\Gamma \) should be a complete set of paths. On the other hand, if such an edge is the first or the last edge of some path \(\gamma _{xy} \), its small conductance is canceled by the equally small \(\nu _x\) or \(\nu _y\). Therefore, to apply (3.1) efficiently, one should find a set of paths \(\Gamma \) such that all paths \(\gamma \in \Gamma \) avoid ‘bad’ vertices, except for the vertices at both ends of the paths.
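
Although no numerical computations enter the proofs, the inequality (3.1) is easy to probe on a small system. The following Python sketch (ours; illustration only) evaluates the right-hand side of (3.1) for the naive set of coordinate-flipping paths \(\tilde{\Gamma }\) introduced in the proof of Lemma 3.2 below, and compares the resulting bound with the exact spectral gap of Y:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, beta = 6, 1.0
V = list(itertools.product([-1, 1], repeat=N))
idx = {x: k for k, x in enumerate(V)}
tau = {x: np.exp(beta * np.sqrt(N) * rng.standard_normal()) for x in V}
Z = sum(min(1.0, tau[x]) for x in V)
nu = {x: min(1.0, tau[x]) / Z for x in V}

def flip(x, i):
    y = list(x); y[i] = -y[i]
    return tuple(y)

def path(x, y):
    """Canonical path: flip disagreeing coordinates in increasing order."""
    p, z = [x], x
    for i in range(N):
        if z[i] != y[i]:
            z = flip(z, i); p.append(z)
    return p

# right-hand side of (3.1): max over edges e of (1/c_e) * sum over paths
# through e of |path| * nu_x * nu_y, with 1/c_e = Z / (tau_u ^ tau_v)
load = {}
for x in V:
    for y in V:
        if x != y:
            p = path(x, y)
            for u, v in zip(p, p[1:]):
                e = frozenset((u, v))
                load[e] = load.get(e, 0.0) + (len(p) - 1) * nu[x] * nu[y]
rhs = max(w * Z / min(tau[u], tau[v])
          for e, w in load.items() for u, v in [tuple(e)])

# exact spectral gap of Y via the nu-symmetrized generator
L = np.zeros((2**N, 2**N))
for x in V:
    for i in range(N):
        y = flip(x, i)
        L[idx[x], idx[y]] = min(tau[x], tau[y]) / min(1.0, tau[x])
    L[idx[x], idx[x]] = -L[idx[x]].sum()
d = np.sqrt([nu[x] for x in V])
ev = np.linalg.eigvalsh(L * d[:, None] / d[None, :])  # similar to L, symmetric
gap = -np.sort(ev)[-2]
print(f"gap = {gap:.5f} >= 1/rhs = {1.0 / rhs:.6f}")
```

For such small N the naive paths already give a valid, if crude, bound; the point of Lemma 3.2 below is to re-route the paths around ‘bad’ vertices so that the factor \(1/c_{uv}\) stays polynomial in N.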

In the context of spin glass dynamics this method was used before in [23] to find the spectral gap of the Metropolis dynamics (1.2). Using the same approach, that is using the same set of paths \(\Gamma \) as in [23], we could find a lower bound on the spectral gap of the fast chain Y of leading order \(\exp \{-c\sqrt{ N \log N}\}\). This turns out to be too small for our purposes, cf. Remark 6.4.

In the next lemma we construct a set of paths \(\Gamma \) that avoids more ‘bad’ vertices, which allows us to improve the lower bound on the spectral gap to one polynomial in N. This is possible by using an embedding of \(\mathbb {H}_N\) into its sub-graph of ‘good’ vertices, i.e. vertices with not too small \(\tau _x\), which is inspired by similar embeddings in [28].

For a nearest-neighbor path \(\gamma = \{x_0,\ldots ,x_n\}\), we call the vertices \(x_1,\ldots ,x_{n-1}\) the interior vertices of \(\gamma \), and the edges \(\{x_i,x_{i+1}\}\), \(i=1,\ldots ,n-2\), the interior edges of \(\gamma \).

Lemma 3.2

There is an integer \(K>0\) and a constant \(C_0>0\), such that \(\mathbb {P}\)-a.s. for N large enough there exists a complete set of paths \(\Gamma \), such that the following three properties hold.

  (i)

    For every path \(\gamma \in \Gamma \), every interior edge \(e=\{u,v\}\) satisfies

    $$\begin{aligned} Z_N c_{uv}=\tau _u\wedge \tau _v \ge N^{-\beta C_0}. \end{aligned}$$
  (ii)

    \(|\gamma |\le 8 N\) for all \(\gamma \in \Gamma \).

  (iii)

    Every edge \(e\in \mathcal {E}_N\) is contained in at most \(N^K 2^{N-1}\) paths \(\gamma \in \Gamma \).

Proof

For \(C_0>0\), whose value will be fixed later, we say that \(x\in \mathbb {H}_N\) is good if \(\tau _x \ge N^{-\beta C_0}\), and it is bad otherwise. To construct the complete set of paths \(\Gamma \) satisfying the required properties, we will use the fact that the set of good vertices is very dense in \(\mathbb {H}_N\). In particular, we will show that

$$\begin{aligned} \mathbb {P}{\text {-}}\hbox {a.s. for }N\hbox { large enough, every }x\in \mathbb {H}_N \,\hbox {has at least } \frac{1}{2}C_0\sqrt{N}\hbox { good neighbors}, \end{aligned}$$
(3.2)

and

$$\begin{aligned}&\mathbb {P}{\text {-}}\hbox {a.s. for }N\hbox { large enough, for any pair of vertices }x,y\hbox { at distance 2 or 3}, \nonumber \\&\quad \hbox {there is a nearest-neighbor path of length at most 7 connecting }x\hbox { and }y,\nonumber \\&\quad \hbox { such that all interior vertices of this path are good}. \end{aligned}$$
(3.3)

To prove these two claims, note first that for any \(x\in \mathbb {H}_N\), the probability of being bad is

$$\begin{aligned} \mathbb {P}\left[ \tau _x < N^{-\beta C_0}\right] = \mathbb {P}[E_x < - C_0 N^{-\frac{1}{2}}\log N] = \frac{1}{2} - \int _0^{C_0 N^{-\frac{1}{2}}\log N} \frac{1}{\sqrt{2\pi }} e^{-\frac{s^2}{2}}{\mathrm d}s. \end{aligned}$$

For N large enough the integrand is larger than \(\frac{3}{10}\), and it follows that

$$\begin{aligned} \mathbb {P}[x\text { is bad}] \le \frac{1}{2}\left( 1-\frac{3}{5} C_0N^{-\frac{1}{2}}\log N\right) =: \frac{1}{2}(1-q_N). \end{aligned}$$

Hence, the number of bad neighbors of a vertex \(x\in \mathbb {H}_N\) is stochastically dominated by a Binomial\(\big (N,\frac{1}{2}(1-q_N)\big )\) random variable B. For \(\lambda >0\), the exponential Chebyshev inequality yields

$$\begin{aligned} \mathbb {P}\bigg [x&\text { has more than } N-\frac{1}{2} C_0\sqrt{N} \text { bad neighbors}\bigg ]\\&\le \mathbb {P}\left[ B\ge N-\frac{1}{2} C_0\sqrt{N}\right] = \mathbb {P}\left[ e^{\lambda B} \ge e^{\lambda (N-\frac{1}{2} C_0\sqrt{N})} \right] \\&\le e^{-\lambda (N-\frac{1}{2} C_0\sqrt{N})} \left( 1+ \frac{1}{2}(1-q_N)(e^{\lambda }-1)\right) ^N \\&= e^{-\lambda (N-\frac{1}{2} C_0\sqrt{N})} \left( \frac{e^{\lambda }}{2}\left( 1-q_N + e^{-\lambda }(1+q_N)\right) \right) ^N \\&\le 2^{-N} e^{\frac{\lambda }{2} C_0\sqrt{N}} \left( \exp \{-q_N + e^{-\lambda }(1+q_N)\}\right) ^N. \end{aligned}$$

Since \(q_N\rightarrow 0\) as \(N\rightarrow \infty \), the second term in the braces is bounded by \(2e^{-\lambda }\) for N large enough. Inserting \(q_N\) and choosing \(\lambda = \log N \), the above is bounded by

$$\begin{aligned} 2^{-N}&\exp \left\{ \frac{1}{2} C_0\sqrt{N}\log N - \frac{3}{5} C_0 \sqrt{N} \log N + 2\right\} \\&\le 2^{-N} \exp \left\{ -\frac{1}{10}C_0\sqrt{N} \log N\right\} , \end{aligned}$$

for N large enough. With a union bound over all \(x\in \mathbb {H}_N\) and using the Borel–Cantelli lemma, (3.2) follows.

To prove (3.3), we first introduce some notation. For a given vertex x and \(\{i_1,\ldots ,i_k\}\subset \{1,\ldots ,N\}\), denote by \(x^{i_1\ldots i_k}\) the vertex that differs from x exactly in coordinates \(i_1,\ldots ,i_k\). If two vertices x and y are at distance 2, then \(y=x^{kl}\) for some \(k,l\in \{1,\ldots , N\}\). Then for \(\{i,j\}\cap \{k,l\}=\emptyset \) we define the path \(\gamma _{xy}^{ij}\) of length 6 as \(\{x,x^i, x^{ij}, x^{ijk}, x^{ijkl}=y^{ij}, y^{j}, y\}\). Similarly, for xy with \(d(x,y)=3\), we have \(y=x^{klm}\), and for \(\{i,j\}\cap \{k,l,m\}=\emptyset \) we define the path \(\gamma _{xy}^{ij}\) of length 7 by \(\{x,x^i,x^{ij},x^{ijk},x^{ijkl},x^{ijklm}=y^{ij},y^j,y\}\). Observe that for fixed xy with \(d(x,y)=2\) or 3 and for different pairs ij the innermost 3 or 4 vertices of the paths \(\gamma _{xy}^{ij}\) are disjoint.

We now show that with high probability, for every xy at distance 2 or 3, we may find ij such that \(\gamma _{xy}^{ij}\) has only good interior vertices. Fix a pair \(x,y\in \mathbb {H}_N\) at distance 2 or 3, and let as above kl or klm be the coordinates in which x and y differ. Assume for the moment that both x and y have at least \(\frac{1}{2}C_0\sqrt{N}\) good neighbors. Then there are at least \(\frac{1}{4}C_0^2 N\) pairs ij such that the vertices \(x^i\) and \(y^j\) are good. Moreover, since it is a matter of dealing with a constant number of exceptions, we may tacitly assume that \(i\ne j\), and \(\{i,j\}\cap \{k,l\}=\emptyset \) or \(\{i,j\}\cap \{k,l,m\}=\emptyset \), respectively.

The remaining interior vertices \(\{x^{ij}, x^{ijk}, x^{ijkl}=y^{ij}\}\) or \(\{x^{ij},x^{ijk},x^{ijkl},x^{ijklm}=y^{ij}\}\) are all good with probability strictly larger than 1 / 2, so the probability that one or more of these vertices are bad is bounded by 15 / 16. Since these 3 or 4 innermost vertices are disjoint for different pairs ij, by independence, the probability that among all \(\frac{1}{4}C_0^2 N\) pairs \(\{i,j\}\) there is none for which all innermost 3 or 4 vertices of \(\gamma _{xy}^{ij}\) are good is bounded by \((15/16)^{\frac{1}{4}C_0^2 N}\). Hence, for one fixed pair \(x,y\in \mathbb {H}_N\) at distance 2 or 3, where both x and y have at least \(\frac{1}{2}C_0\sqrt{N}\) good neighbors, the probability that there is no path from x to y of length 6 or 7 with all interior vertices good is bounded by \((15/16)^{\frac{1}{4}C_0^2 N}\).

There are fewer than \(2^N (N^2+N^3)\) pairs of vertices at distance 2 or 3, and we know from the proof of (3.2) that with probability larger than \(1- e^{-c \sqrt{N}\log N}\) every \(x\in \mathbb {H}_N\) has at least \(\frac{1}{2}C_0\sqrt{N}\) good neighbors. It follows that the probability that the event in (3.3) does not happen is bounded by

$$\begin{aligned} e^{-c \sqrt{N}\log N} + 2^N (N^2+N^3) (15/16)^{\frac{1}{4}C_0^2 N}. \end{aligned}$$
(3.4)

Choosing \(C_0>\sqrt{\frac{4\log 2}{\log (16/15)}}\) and applying the Borel–Cantelli lemma implies (3.3).

We now use the density properties (3.2) and (3.3) of good vertices to define a (random) mapping from the hypercube to its sub-graph of good vertices. Let \(\varphi _N: \mathbb {H}_N \rightarrow \mathbb {H}_N\) be given by

$$\begin{aligned} \varphi _N(x) = {\left\{ \begin{array}{ll} x, &{}\text {if }x\text { is good;} \\ x^i, &{}\text {if }x\text { and }x^j, j<i,\text { are bad but }x^i\text { is good}; \\ x, &{}\text {if }x\text { is bad and has no good neighbor}. \end{array}\right. } \end{aligned}$$

By (3.2), \(\mathbb {P}\)-a.s. for N large enough the last option will not be used, and therefore \(\varphi _N\) maps all vertices to good vertices. In this case, for two neighboring vertices xy, their good images \(\varphi _N(x)\) and \(\varphi _N(y)\) can either coincide, or be at distance 1, 2, or 3.
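
In code the mapping is simply the following (a sketch with our own names; `is_good(x)` stands for the event \(\tau _x \ge N^{-\beta C_0}\)):

```python
def phi(x, is_good):
    """Map x to itself if good, otherwise to its first good neighbor;
    the third case (bad with no good neighbor) is a.s. absent for
    large N by (3.2)."""
    if is_good(x):
        return x
    for i in range(len(x)):
        y = list(x); y[i] = -y[i]
        y = tuple(y)
        if is_good(y):
            return y
    return x
```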

Let further

$$\begin{aligned} \mathcal {P}_N=\left\{ \{x_0,\ldots ,x_k\}:~k\ge 0,~d(x_i,x_{i-1}) =1~\forall ~i=1,\ldots ,k\right\} \end{aligned}$$

be the set of finite nearest-neighbor paths on \(\mathbb {H}_N\), including paths of length zero, which are just single vertices. We extend the mapping \(\varphi _N\) to the set of edges by associating to every edge \(e\in \mathcal {E}_N\) an element of \(\mathcal {P}_N\): For an edge \(e=\{x,y\} \in \mathcal {E}_N\), let \(\varphi _N(e)\) be

  • the ‘path’ \(\{\varphi _N(x)\}\), if \(\varphi _N(x)\) is good and \(\varphi _N(x)=\varphi _N(y)\);

  • the path \(\{\varphi _N(x),\varphi _N(y)\}\), if both \(\varphi _N(x)\) and \(\varphi _N(y)\) are good and at distance 1;

  • the path \(\gamma ^{ij}_{\varphi _N(x),\varphi _N(y)}\) with ‘minimal’ ij such that all vertices of this path are good, if both \(\varphi _N(x)\) and \(\varphi _N(y)\) are good with distance 2 or 3 and such path exists;

  • the path \(\{x,y\}\) in any other case.

From (3.2) and (3.3) it follows that \(\mathbb {P}\)-a.s. for N large enough the last option does not occur and \(\varphi _N\) maps all edges to paths that contain only good vertices.

Finally, we extend \(\varphi _N\) to be a map that sends paths to paths. For \(\gamma =\{x_0,\ldots ,x_n\}\in \mathcal {P}_N\) we define \(\varphi _N(\gamma )\) to be a concatenation of paths \(\varphi _N(\{x_{i-1},x_i\})\), \(i=1,\ldots , n\), with possible loops erased by an arbitrary fixed loop-erasure algorithm. Note that \(\varphi _N\) can make paths shorter or longer, but by construction, for any path \(\gamma \in \mathcal {P}_N\),

$$\begin{aligned} |\varphi _N(\gamma )|\le 7|\gamma |. \end{aligned}$$
(3.5)

We can now construct the random set of paths \(\Gamma \) that satisfies the properties of the lemma. We first define a certain canonical set of paths \(\tilde{\Gamma }\), and then use the mapping \(\varphi _N\) to construct \(\Gamma \) from \(\tilde{\Gamma }\).

For any pair of vertices \(x\ne y\in \mathbb {H}_N\), let \(\tilde{\gamma }_{xy}\) be the path from x to y obtained by successively flipping the disagreeing coordinates, starting from coordinate 1. These paths all have length at most N, and the set \(\tilde{\Gamma }=\{\tilde{\gamma }_{xy}:~x\ne y\in \mathbb {H}_N\}\) has the property that any edge e is used by at most \(2^{N-1}\) paths in \(\tilde{\Gamma }\). Indeed, if \(e=\{u,v\}\), then there is a unique i such that \(u_i\ne v_i\). By construction, \(e\in \tilde{\gamma }_{xy}\) only if

$$\begin{aligned} x&=(x_1,\ldots ,x_{i-1}, u_i, u_{i+1},\ldots ,u_N),\\ y&=(v_1,\ldots ,v_{i-1},v_i,y_{i+1},\ldots ,y_N). \end{aligned}$$

It follows that a total of \(N-1\) coordinates of x and y are unknown, and so the number of possible pairs xy for paths \(\tilde{\gamma }_{xy}\) through e is bounded by \(2^{N-1}\) (cf. [21, Example 2.2]).
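
This counting is easily verified numerically for small N (a self-contained sketch, ours; for \(N=6\) the maximal edge usage is exactly \(2^{N-1}=32\)):

```python
import itertools
from collections import Counter

N = 6
V = list(itertools.product([-1, 1], repeat=N))

def canonical_path(x, y):
    """Flip the disagreeing coordinates of x in increasing order until y."""
    p, z = [x], list(x)
    for i in range(N):
        if z[i] != y[i]:
            z[i] = -z[i]; p.append(tuple(z))
    return p

usage = Counter()
for x in V:
    for y in V:
        if x != y:
            p = canonical_path(x, y)
            for u, v in zip(p, p[1:]):
                usage[frozenset((u, v))] += 1

print(max(usage.values()), 2 ** (N - 1))  # both equal 32 for N = 6
```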

For any pair \(x\ne y\in \mathbb {H}_N\), let the path \(\gamma _{xy}\) in the set \(\Gamma \) be defined by

$$\begin{aligned} \gamma _{xy} = {\left\{ \begin{array}{ll} \varphi _N(\tilde{\gamma }_{xy}),&{}\text {if }x,y\text { are good},\\ \{x\}\circ \varphi _N(\tilde{\gamma }_{xy}),&{}\text {if }x\text { is bad and }y\text { is good},\\ \varphi _N(\tilde{\gamma }_{xy})\circ \{y\},&{}\text {if }x\text { is good and }y\text { is bad},\\ \{x\}\circ \varphi _N(\tilde{\gamma }_{xy})\circ \{y\},&{}\text {if }x,y\text { are bad},\\ \end{array}\right. } \end{aligned}$$

where ‘\(\circ \)’ denotes the path concatenation.

It remains to check that this set of paths \(\Gamma \) indeed satisfies the required properties. First, by construction, \(\Gamma \) is complete, that is every path \(\gamma _{xy}\in \Gamma \) connects x with y and is nearest-neighbor and self-avoiding. Further, by construction of \(\varphi _N\) and the properties (3.2) and (3.3), \(\mathbb {P}\)-a.s. for N large enough, all interior vertices of all \(\gamma \in \Gamma \) are good, i.e. (i) is satisfied. Moreover, by (3.5) and the construction of the paths \(\tilde{\gamma }\in \tilde{\Gamma }\), the paths \(\gamma \in \Gamma \) have length at most \(7N+2\), hence (ii) is satisfied for \(N\ge 2\). Finally, \(\varphi _N\) deforms the paths \(\tilde{\gamma }\in \tilde{\Gamma }\) only locally, so that the number of paths in \(\Gamma \) passing through an edge e is bounded by the number of paths in \(\tilde{\Gamma }\) passing through the ball of radius 4 around e. But this number is bounded by \(2^{N-1}\) times the number of edges in that ball, which is bounded by \(N^K\) for some integer \(K>0\). This proves (iii) and thus finishes the proof of the lemma. \(\square \)

We can now prove the spectral gap estimate.

Proof of Proposition 3.1

\(\mathbb {P}\)-a.s. for every N large enough we can find a complete set of paths \(\Gamma \) such that (i), (ii) and (iii) of Lemma 3.2 and (2.3) hold. By (ii), the expression in (3.1) over which the maximum is taken is bounded from above by

$$\begin{aligned} \frac{8N}{Z_N} \frac{1}{\tau _u \wedge \tau _v} \sum _{\gamma _{xy} \ni \{u,v\}} (\tau _x\wedge 1)(\tau _y \wedge 1). \end{aligned}$$
(3.6)

We distinguish three cases for the position of the edge \(\{u,v\}\) in a path \(\gamma _{xy}\).

  (1)

    If \(\{u,v\}\) is an interior edge of \(\gamma _{xy}\), then \(\tau _u \wedge \tau _v\) is larger than \(N^{-\beta C_0}\) by (i) of Lemma 3.2.

  (2)

    If \(\{u,v\}\) is at the end of the path \(\gamma _{xy}\), say at \(u=x\), and v is an interior vertex of \(\gamma _{xy}\), then \(\tau _x \wedge \tau _v\) is either larger than \(N^{-\beta C_0}\), or it is equal to \(\tau _x\) in which case it cancels with \(\tau _x\wedge 1\). Indeed, if \(\tau _x \wedge \tau _v\) were smaller than \(N^{-\beta C_0}\) and equal to \(\tau _v\), then v would be a bad interior vertex of \(\gamma \), which contradicts (i) of Lemma 3.2.

  (3)

    If \(\gamma _{xy}\) only consists of the single edge \(\{x,y\}\), then \(\tau _x \wedge \tau _y\) is either larger than 1, or the term \(\tau _x\wedge \tau _y\) cancels with the smaller one of \(\tau _x\wedge 1\) and \(\tau _y\wedge 1\).

It follows that for every edge \(\{u,v\}\) the expression (3.6) is bounded from above by

$$\begin{aligned} \frac{8N}{Z_N} N^{\beta C_0} \#\{\text {paths through }e\}. \end{aligned}$$

Since, by (iii) of Lemma 3.2, the number of paths is bounded by \(N^K2^{N-1}\), and, by (2.3), \({Z_N} \ge \kappa 2^{N}\), this completes the proof. \(\square \)

In a next step we construct the mixing time \({T_{\mathrm {mix}}}\) of the fast chain Y. To this end, define the mixing scale

$$\begin{aligned} m_N = \frac{8}{\kappa }N^{K+3+\beta C_0}. \end{aligned}$$
(3.7)

Then Proposition 3.1 reads \(\lambda _Y \ge 2N^2m_N^{-1}\).

We assume that our probability space \((\Omega ,\mathcal {F},\mathbb {P})\) is rich enough so that there exist infinitely many independent random variables distributed uniformly on [0, 1], independent of everything else. A randomized stopping time T is a positive random variable such that the event \(\{T\le t\}\) depends only on \(\{Y_s:s\le t\}\), the environment, and on the values of these additional random variables.

Proposition 3.3

\(\mathbb {P}\)-a.s. for N large enough, there exists a randomized stopping time \({T_{\mathrm {mix}}}\) with values in \(\{m_N,2m_N,3m_N,\ldots \}\) such that \({T_{\mathrm {mix}}}\) is a strong stationary time for Y, that is for any (possibly random) \(Y_0\in \mathbb {H}_N\),

  (i)

    \(P^{\tau }_{Y_0}[Y_{{T_{\mathrm {mix}}}}=y]= \nu _y\),

  (ii)

    for any \(k\ge 1\), \(P^{\tau }_{Y_0}[{T_{\mathrm {mix}}}\ge km_N] = e^{-(k-1)}\),

  (iii)

    \({T_{\mathrm {mix}}}\) and \(Y_{{T_{\mathrm {mix}}}}\) are independent.

Proof

This construction follows closely [31, Proposition 3.1], with only minor adaptations. Define the following distances from stationarity,

$$\begin{aligned} \begin{aligned} s(t)&= \min \left\{ s\ge 0:~\forall x,y\in \mathbb {H}_N,~P^{\tau }_x[Y_{t}=y] \ge (1-s)\nu (y)\right\} ,\\ \bar{d}(t)&=\max _{x,y\in \mathbb {H}_N} \big \Vert P^{\tau }_x[Y_{t}\in \cdot \,] - P^{\tau }_y[Y_{t}\in \cdot \,]\big \Vert _{TV}, \end{aligned} \end{aligned}$$

where \(\Vert \cdot \Vert _{TV}\) denotes the total variation distance. Define the time

$$\begin{aligned} \mathcal {T}=\inf \{t\ge 0:~\bar{d}(t)\le e^{-1}\}. \end{aligned}$$

From [2, Lemmas 4.5, 4.7 and 4.23] we know that

$$\begin{aligned} \begin{aligned} \bar{d}(t)&\le e^{-\left\lfloor t/\mathcal {T} \right\rfloor },\\ s(2t)&\le 1-(1-\bar{d}(t))^2,\\ \mathcal {T}&\le \frac{1}{\lambda _Y} \left( 1+\frac{1}{2}\log \frac{1}{\nu ^*}\right) , \end{aligned} \end{aligned}$$
(3.8)

where \(\nu ^*=\min _x \nu _x\). Since \(\mathbb {P}[\tau _x\le e^{-N^2/4}]\le ce^{-c'N^3}\), by the Borel–Cantelli lemma, \(\mathbb {P}\)-a.s. for N large enough, \(\log \frac{1}{\nu ^*} \le \frac{N^2}{4} + \log Z_N \le \frac{1}{2} N^2\). Therefore, by Proposition 3.1 and (3.8), \(\mathbb {P}\)-a.s. for N large enough, \(\mathcal {T}\le \frac{1}{4}m_N\), \(\bar{d}(\frac{1}{2}m_N)\le e^{-2}\), and \(s(m_N) \le 2 e^{-2} \le e^{-1}\), which means that for all \(Y_0,y\in \mathbb {H}_N\),

$$\begin{aligned} P^{\tau }_{Y_0}[Y_{m_N} = y]\ge (1-e^{-1})\nu _y. \end{aligned}$$

We can now define the strong stationary time \({T_{\mathrm {mix}}}\) with values in \(\{m_N,2m_N,\ldots \}\). Let \(U_1,U_2,\ldots \) be i.i.d. random variables, uniformly distributed on [0, 1] and independent of everything else. Conditionally on \(Y_0=x\), \(Y_{m_N}=y\), let \({T_{\mathrm {mix}}}=m_N\) if

$$\begin{aligned} U_1\le \frac{(1-e^{-1}) \nu _y}{P^{\tau }_x[Y_{m_N}=y]} \quad (\le 1). \end{aligned}$$

Otherwise, we define \({T_{\mathrm {mix}}}\) inductively: for every \(k\in \mathbb {N}\), conditionally on \({T_{\mathrm {mix}}}>km_N\), \(Y_{km_N}=z\) and \(Y_{(k+1)m_N}=y\), let \({T_{\mathrm {mix}}}=(k+1)m_N\) if

$$\begin{aligned} U_{k+1}\le \frac{(1-e^{-1}) \nu _y}{P^{\tau }_z[Y_{m_N}=y]} \quad (\le 1). \end{aligned}$$

By construction, we have for every \(x\in {\mathbb {H}}_N\),

$$\begin{aligned} P^{\tau }_{x}[{T_{\mathrm {mix}}}=m_N\mid Y_{m_N}=y] = \frac{(1-e^{-1}) \nu _y}{P^{\tau }_{x}[Y_{m_N}=y]}, \end{aligned}$$

and thus

$$\begin{aligned} P^{\tau }_{Y_0}[{T_{\mathrm {mix}}}=m_N,~ Y_{m_N}=y\mid Y_0=x] = (1-e^{-1}) \nu _y. \end{aligned}$$

Similarly, we have

$$\begin{aligned} P^{\tau }_{Y_0}[{T_{\mathrm {mix}}}=(k+1)m_N,~Y_{(k+1)m_N}=y\mid {T_{\mathrm {mix}}}>km_N,~Y_{km_N}=x] = (1-e^{-1}) \nu _y. \end{aligned}$$

By induction over k, we obtain that for any \(k\in \mathbb {N}\) and \(y\in \mathbb {H}_N\),

$$\begin{aligned} P^{\tau }_{Y_0}[{T_{\mathrm {mix}}}=km_N,~ Y_{km_N}=y] = e^{-(k-1)}(1-e^{-1}) \nu _y, \end{aligned}$$

which finishes the proof. \(\square \)
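
The construction is algorithmic and, for small N, can be carried out verbatim, since the one-block kernel \(P^{\tau }_x[Y_{m_N}=y]\) is computable by matrix exponentiation. The following self-contained Python sketch (ours, for illustration; `m` is a stand-in for \(m_N\), doubled until the separation condition \(s(m)\le e^{-1}\) holds) implements the acceptance-rejection scheme of the proof:

```python
import itertools
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
N, beta = 4, 1.0
V = list(itertools.product([-1, 1], repeat=N))
tau = np.exp(beta * np.sqrt(N) * rng.standard_normal(len(V)))
nu = np.minimum(1.0, tau); nu /= nu.sum()

# generator of the fast chain Y with rates (1.3)
L = np.zeros((len(V), len(V)))
for a, x in enumerate(V):
    for i in range(N):
        y = list(x); y[i] = -y[i]
        b = V.index(tuple(y))
        L[a, b] = min(tau[a], tau[b]) / min(1.0, tau[a])
    L[a, a] = -L[a].sum()

m = 1.0                          # stand-in for the mixing scale m_N
P = expm(m * L)                  # P[a, b] = P_a[Y_m = b]
while not (P >= (1 - np.exp(-1)) * nu).all():
    m *= 2.0; P = expm(m * L)    # enforce s(m) <= 1/e as in the proof

def sample_T_mix(a):
    """In each block of length m, accept the endpoint b with probability
    (1 - e^{-1}) nu_b / P[a, b], which is <= 1 by the condition above."""
    k = 0
    while True:
        k += 1
        p = np.clip(P[a], 0.0, None); p /= p.sum()
        b = rng.choice(len(V), p=p)
        if rng.uniform() <= (1 - np.exp(-1)) * nu[b] / P[a, b]:
            return k * m, b      # (T_mix, Y_{T_mix}); Y_{T_mix} ~ nu
        a = b

T, b = sample_T_mix(0)
```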

For future reference we collect here two useful statements that follow directly from the construction of \({T_{\mathrm {mix}}}\).

Lemma 3.4

For every \(t>0\) and \(x\in {\mathbb {H}}_N\) and every starting distribution \(\rho \),

$$\begin{aligned} \begin{aligned} P^\tau _\rho [Y_t=x| {T_{\mathrm {mix}}}<t]&=\nu _x,\\ \big |P^\tau _\rho [Y_t=x]-\nu _x\big |&\le P^\tau _\rho [{T_{\mathrm {mix}}}>t]=e^{-\left\lfloor t/m_N-1 \right\rfloor }. \end{aligned} \end{aligned}$$

4 Bounds on mean hitting time and random scale

In this section we prove bounds on the mean hitting time \(E^{\tau }_{\nu }[H_x]\) of deep traps \(x\in \mathcal {D}_N\). As a corollary of the proof we will obtain a useful bound on the Green function in deep traps. The bounds on the mean hitting times will further imply bounds on the random scale \(R_N\), which will imply the claim (1.7) of Theorem 1.1.

Proposition 4.1

There exists \(\delta \in (0,1/6)\), such that \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} 2^{N-N^{1-\delta }} \le E^{\tau }_{\nu }[H_x] \le 2^{N+N^{1-\delta }} \quad \text {for every }x\in \mathcal {D}_N. \end{aligned}$$

The proof of Proposition 4.1 is split into two parts.

Proof of the upper bound

For the upper bound we use [2, Lemma 3.17] which states that

$$\begin{aligned} E^{\tau }_{\nu }[H_x] \le \frac{1-\nu _x}{\lambda _Y \nu _x}. \end{aligned}$$

Since \(\tau _x\ge 1\) for deep traps \(x\in \mathcal {D}_N\), this is smaller than \(\frac{Z_N}{\lambda _Y}\), which by Proposition 3.1 and (2.3) is bounded by \(2^{N+N^{1-\delta }}\), \(\mathbb {P}\)-a.s. for N large enough. \(\square \)

For the lower bound we will use a version of Proposition 3.2 of [16], which allows us to bound the inverse of the mean hitting time \(E^{\tau }_{\nu }[H_x]\) in terms of the effective conductance from x to a suitable set B. Recall the definition of the conductances \(c_{xy}\) from (2.4), and let \(c_x=\sum _{y\sim x}c_{xy}\). Following the terminology of [30, Chapter 2], we define the effective conductance between a vertex x and a set B as

$$\begin{aligned} \mathcal {C}(x\rightarrow B) = P^{\tau }_x[H^+_x > H_B] c_x. \end{aligned}$$

By Proposition 8.5, which is a generalization of [16, Proposition 3.2] to arbitrary continuous-time finite-state-space Markov chains,

$$\begin{aligned} \frac{1}{E^{\tau }_{\nu }[H_x]} \le \mathcal {C}(x\rightarrow B) \nu (B)^{-2}. \end{aligned}$$
(4.1)

To apply this bound effectively, we should find a set B such that \(\mathcal {C}(x\rightarrow B)\) is small and \(\nu (B)\) is close to 1. In the next lemma we construct such sets B for every \(x\in {\mathbb {H}}_N\). For these sets we have some control on the conductances connecting B and \(B^c\). Using standard network reduction techniques we can then give a bound on the effective conductance \(\mathcal {C}(x\rightarrow B)\), which, when plugged into (4.1), will imply the lower bound on \(E^{\tau }_{\nu }[H_x]\).

Denote by \(B(x,r)=\{y\in \mathbb {H}_N:~d(x,y)\le r\}\) the ball of radius r around x, and by \(\partial B(x,r)=\{y\in \mathbb {H}_N:~d(x,y) = r\}\) the sphere of radius r.

Lemma 4.2

For every \(\delta \in (0,1/6)\), \(\mathbb {P}\)-a.s. for N large enough, there exist radii \((\rho _x)_{x\in \mathbb {H}_N}\) satisfying \(1\le \rho _x\le N^{3\delta }\), such that for all \(x\in \mathbb {H}_N\) and for all \(y\in \partial B(x,\rho _x)\), \(\tau _y \le 2^{\frac{1}{2}N^{1-\delta }}\).

Proof

Fix \(\delta \in (0,1/6)\). We say that a sphere \(\partial B(x,r)\) is good if \(\tau _y \le 2^{\frac{1}{2}N^{1-\delta }}\) for all \(y\in \partial B(x,r)\), otherwise we say that it is bad. Using the Gaussian tail approximation (2.2), we get that

$$\begin{aligned} \mathbb {P}\left[ \tau _y > 2^{\frac{1}{2}N^{1-\delta }}\right] \le c e^{-\frac{\log ^2 2}{8\beta ^2}N^{1-2\delta }}. \end{aligned}$$

The size of the sphere \(\partial B(x,r)\) is bounded by \(N^r\), hence the probability that the sphere \(\partial B(x,r)\) is bad is bounded by

$$\begin{aligned} N^r \mathbb {P}\left[ \tau _y > 2^{\frac{1}{2}N^{1-\delta }}\right] \le c \exp \left\{ r \log N - \frac{\log ^2 2}{8\beta ^2} N^{1-2\delta }\right\} . \end{aligned}$$

By independence of the \(\tau _x\), the probability that for one fixed x all the spheres \(\partial B(x,r)\), \(r=1,\ldots ,N^{3\delta }\), are bad is bounded by

$$\begin{aligned} \prod _{r=1}^{N^{3\delta }} N^r \mathbb {P}\left[ \tau _y > 2^{\frac{1}{2}N^{1-\delta }} \right]&\le \left( N^{N^{3\delta }} \mathbb {P}\left[ \tau _y > 2^{\frac{1}{2}N^{1-\delta }} \right] \right) ^{N^{3\delta }}\\&\le \exp \left\{ N^{3\delta }\log c + N^{6\delta }\log N - \frac{\log ^2 2}{8\beta ^2} N^{1+\delta }\right\} . \end{aligned}$$

Finally, by a union bound, the probability that among all \(2^N\) vertices in \(\mathbb {H}_N\) there is one for which all spheres \(\partial B(x,r)\), \(r=1,\ldots ,N^{3\delta }\), are bad is bounded by

$$\begin{aligned}&2^N \left( N^{N^{3\delta }}\mathbb {P}\left[ \tau _y>2^{\frac{1}{2}N^{1-\delta }}\right] \right) ^{N^{3\delta }} \\&\quad \le \exp \left\{ N^{3\delta }\log c + N^{6\delta }\log N+N\log 2 -\frac{\log ^2 2}{8\beta ^2} N^{1+\delta }\right\} . \end{aligned}$$

Since \(\delta <1/6\) this decays faster than exponentially, and so by the Borel–Cantelli lemma the event occurs \(\mathbb {P}\)-a.s. only for finitely many N, i.e. \(\mathbb {P}\)-a.s. for N large enough we can find for every \(x\in \mathbb {H}_N\) a radius \(\rho _x\le N^{3\delta }\) such that the sphere \(\partial B(x,\rho _x)\) is good. \(\square \)

For every \(x\in \mathcal {D}_N\) we define the set \(A_x\) by

$$\begin{aligned} A_x := {\left\{ \begin{array}{ll} B(x,\rho _x),\qquad &{}\text {if the radius }\rho _x\text { from Lemma 4.2 exists,}\\ \{x\},&{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(4.2)

We may now proceed with the proof of the lower bound in Proposition 4.1.

Proof of the lower bound of Proposition 4.1

We will apply (4.1) with \(B=A_x^c\). Using Lemma 4.2 and (2.3), \(\mathbb {P}\)-a.s. for N large enough, for all \(x\in \mathcal {D}_N\) all conductances \(c_{yz} = (\tau _y\wedge \tau _z)/Z_N\) connecting \(A_x\) and \(A_x^c\) are smaller than \(2^{\frac{1}{2}N^{1-\delta }}/(\kappa 2^N)\). By the parallel law (cf. [30, Chapter 2.3]), the effective conductance between the boundaries of \(A_x\) and \(A_x^c\) is equal to the sum of all the conductances of edges connecting \(A_x\) and \(A_x^c\), and so \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \mathcal {C}(\partial A_x\rightarrow \partial A_x^c) = \sum _{\begin{array}{c} y\in \partial A_x \\ z \in \partial A_x^c \end{array}} c_{yz} \le \kappa ^{-1}N^{\rho _x+1} 2^{\frac{1}{2}N^{1-\delta }} 2^{-N}. \end{aligned}$$

By Rayleigh’s monotonicity principle (cf. [30, Chapter 2.4]), comparing the effective conductances from x to \(A_x^c\) before and after setting all the conductances inside \(A_x\) to infinity, it follows that

$$\begin{aligned} \mathcal {C}(x\rightarrow A_x^c) \le \mathcal {C}(\partial A_x\rightarrow \partial A_x^c) \le \kappa ^{-1}N^{\rho _x+1} 2^{\frac{1}{2}N^{1-\delta }}2^{-N}. \end{aligned}$$

Since \(\delta <1/6\) and \(\rho _x\le N^{3\delta }\), we have \(N^{\rho _x+1} \le 2^{\frac{1}{2} N^{1-\delta }}\) for N large enough, and thus, \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \mathcal {C}(x\rightarrow A_x^c) \le c2^{-N+N^{1-\delta }}. \end{aligned}$$
(4.3)

Moreover, \(\mathbb {P}\)-a.s. for N large enough, as \(\nu _y = (1\wedge \tau _y)/Z_N \le 1/Z_N\), using (2.3) again,

$$\begin{aligned} \nu (A_x^c) = 1-\nu (A_x) \ge 1-\kappa ^{-1}2^{-N}N^{N^{3\delta }} \ge \frac{1}{2}. \end{aligned}$$
(4.4)

Plugging (4.3) and (4.4) into (4.1) and readjusting \(\delta \) to absorb the constants easily yields the required lower bound \(E_\nu ^\tau [H_x]\ge 2^{N-N^{1-\delta }}\). This completes the proof. \(\square \)

As a corollary we get a lower bound on \(E^{\tau }_x[\ell _{H_{A_x^c}}(x)]\) for the deep traps \(x\in \mathcal {D}_N\).

Corollary 4.3

There are constants \(\delta \in (0,1/6)\) and \(c>0\), such that \(\mathbb {P}\)-a.s. for N large enough, for all \(x\in \mathcal {D}_N\), under \(P^{\tau }_x\) the local time of Y at x before leaving \(A_x\), \(\ell _{H_{A_x^c}}(x)\), stochastically dominates an exponential random variable with mean \(c2^{-N^{1-\delta }}\). In particular, \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E^{\tau }_x\left[ \ell _{H_{A_x^c}}(x)\right] \ge c2^{-N^{1-\delta }}. \end{aligned}$$

Proof

Recall that \(J_1\) denotes the time of the first jump of Y. The local time at x before hitting \(A_x^c\) is an exponential random variable with mean equal to

$$\begin{aligned} E^{\tau }_x\left[ \#\{\text {visits to }x\text { before }H_{A_x^c}\}\right] \cdot E^{\tau }_x[J_1]. \end{aligned}$$

The expected number of visits to x before leaving \(A_x\) is \(P^{\tau }_x[H^+_x > H_{A_x^c}]^{-1}=c_x\mathcal {C}(x\rightarrow A_x^c)^{-1}\), and the mean duration of one visit to x is \(E^{\tau }_x[J_1]=(\sum _{y\sim x}q_{xy})^{-1}\). For the deep traps we have \(\tau _x>1\), so \(\nu _x=Z_N^{-1}\) and therefore \(\sum _{y\sim x}q_{xy}=\sum _{y\sim x} c_{xy}/\nu _x=Z_Nc_x\). It follows that the local time at x before hitting \(A_x^c\) is in fact an exponential random variable with mean \(Z_N^{-1}\mathcal {C}(x\rightarrow A_x^c)^{-1}\).
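Indeed, by (4.3) and the trivial estimate \(Z_N=\sum _{y}(1\wedge \tau _y)\le 2^N\) (cf. (2.3)), a one-line sketch of the final step:

$$\begin{aligned} Z_N^{-1}\,\mathcal {C}(x\rightarrow A_x^c)^{-1} \ge 2^{-N}\cdot c^{-1}2^{N-N^{1-\delta }} = c^{-1}2^{-N^{1-\delta }}, \end{aligned}$$

which is the claimed stochastic domination. \(\square \)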

As a further consequence we give bounds on the random scale \(R_N\) defined in (2.10). Note that the following lemma also proves the statement (1.7) about the asymptotic behavior of \(R_N\) in Theorem 1.1.

Lemma 4.4

For every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} 2^{(\gamma -\varepsilon )N} \le R_N \le 2^{(\gamma +\varepsilon )N}. \end{aligned}$$

Proof

By Proposition 3.3, \({T_{\mathrm {mix}}}/m_N\) is stochastically dominated by a geometric random variable with parameter \(e^{-1}\), and thus \(E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }] \le E^{\tau }_x[{T_{\mathrm {mix}}}^{\alpha }] \le c m_N^{\alpha }\le e^{\varepsilon N}\) by (3.7), for every \(\varepsilon >0\) and N large enough. Moreover, \(|\mathcal {D}_N| \le c' 2^{(1-\gamma ')N}\) by (2.8). Using the lower bound on \(E^{\tau }_{\nu }[H_x]\) from Proposition 4.1, we obtain that for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} R_N = 2^{(\gamma -\gamma ')N}\left( \sum _{x\in \mathcal {D}_N} \frac{E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]}{E^{\tau }_{\nu }[H_x]}\right) ^{-1} \ge 2^{(\gamma -\varepsilon )N}. \end{aligned}$$
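In more detail (a sketch in which all constants and sub-exponential factors are absorbed into \(2^{\varepsilon N}\)-terms):

$$\begin{aligned} \sum _{x\in \mathcal {D}_N} \frac{E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]}{E^{\tau }_{\nu }[H_x]} \le c'2^{(1-\gamma ')N}\, e^{\varepsilon N}\, 2^{-N+N^{1-\delta }} \le 2^{(-\gamma '+2\varepsilon )N}, \end{aligned}$$

so that \(R_N\ge 2^{(\gamma -\gamma ')N}2^{(\gamma '-2\varepsilon )N}=2^{(\gamma -2\varepsilon )N}\), which is the stated bound after renaming \(2\varepsilon \) as \(\varepsilon \).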

For the upper bound we need a lower bound on \(E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]\). Recall the sets \(A_x\) defined in (4.2), and note that

$$\begin{aligned} E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }] \ge E^{\tau }_x\left[ \mathbf {1}_{\{{T_{\mathrm {mix}}}\ge H_{A_x^c}\}}\ell _{H_{A_x^c}}(x)^{\alpha }\right] . \end{aligned}$$
(4.5)

By Corollary 4.3, \(\mathbb {P}\)-a.s. for N large enough, the local time at x before hitting \(A_x^c\) stochastically dominates an exponential random variable with mean \(c2^{-N^{1-\delta }}\), hence

$$\begin{aligned} P^{\tau }_x\left[ \ell _{H_{A_x^c}}(x) \le 2^{-2N^{1-\delta }}\right] \le 1-e^{-c2^{-N^{1-\delta }}} \le c2^{-N^{1-\delta }}. \end{aligned}$$

Moreover, for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P_x^{\tau }[{T_{\mathrm {mix}}}< H_{A_x^c}] \le P_x^{\tau }[Y_{{T_{\mathrm {mix}}}}\in A_x] = \nu (A_x) \le \kappa ^{-1}2^{-N}N^{N^{3\delta }} \le 2^{-\varepsilon N}. \end{aligned}$$

Using the last two observations in (4.5), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]&\ge P^{\tau }_x\left[ \{{T_{\mathrm {mix}}}\ge H_{A_x^c}\}\cap \{\ell _{H_{A_x^c}}(x) \ge 2^{-2N^{1-\delta }}\}\right] \left( 2^{-2N^{1-\delta }}\right) ^{\alpha } \\&\ge 2^{-2\alpha N^{1-\delta }}\left( P^{\tau }_x\left[ \ell _{H_{A_x^c}}(x) \ge 2^{-2N^{1-\delta }}\right] \right. \\&\quad \,-\left. P^{\tau }_x\left[ \{\ell _{H_{A_x^c}}(x) \ge 2^{-2N^{1-\delta }}\}\cap \{{T_{\mathrm {mix}}}< H_{A_x^c}\}\right] \right) \\&\ge 2^{-2\alpha N^{1-\delta }}\left( P^{\tau }_x\left[ \ell _{H_{A_x^c}}(x) \ge 2^{-2N^{1-\delta }}\right] - P^{\tau }_x\left[ {T_{\mathrm {mix}}}< H_{A_x^c}\right] \right) \\&\ge 2^{-2\alpha N^{1-\delta }}\left( (1-c'2^{-N^{1-\delta }}) - 2^{-\varepsilon N}\right) \\&\ge 2^{-\varepsilon N}. \end{aligned}$$
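Here the final inequality uses that \(N^{1-\delta }=o(N)\); in a one-line check, for N large enough,

$$\begin{aligned} 2^{-2\alpha N^{1-\delta }}\left( 1-c'2^{-N^{1-\delta }} - 2^{-\varepsilon N}\right) \ge \tfrac{1}{2}\, 2^{-2\alpha N^{1-\delta }} \ge 2^{-\varepsilon N}, \end{aligned}$$

since \(2\alpha N^{1-\delta }+1\le \varepsilon N\) eventually.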

Combining this with \(|\mathcal {D}_N| \ge c 2^{(1-\gamma ')N}\) by (2.8) and the upper bound on \(E^{\tau }_{\nu }[H_x]\) from Proposition 4.1, we obtain the required upper bound on \(R_N\). \(\square \)

5 Concentration of the local time functional

In this section we prove the concentration of the local time functional that appears in the computation of the quasi-annealed Laplace transform of the clock process on the deep traps, as explained in the introduction (cf. (1.9)). We denote this functional by

$$\begin{aligned} L_N(t) = 2^{(\gamma '-\gamma ) N} \sum _{x\in \mathcal {D}_N} \ell _{tR_N}(x)^{\alpha }. \end{aligned}$$

So far we had no restriction on the choice of \(\gamma '\) other than \(1/2<\gamma '<\gamma \), see (2.6). We now make an explicit choice as follows. Let \(\varepsilon _0=\frac{1}{2}\left( (1-\gamma )\wedge (\gamma -\frac{1}{2})\right) \), and define \(\gamma '= \gamma -\varepsilon _0\), such that in particular

$$\begin{aligned}&1-\gamma \ge 2\varepsilon _0, \end{aligned}$$
(5.1)
$$\begin{aligned}&\gamma -\gamma '=\varepsilon _0. \end{aligned}$$
(5.2)
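To fix ideas with a purely illustrative choice of parameters (any \(\gamma \in (1/2,1)\) behaves in the same way): if \(\gamma =0.8\), then

$$\begin{aligned} \varepsilon _0=\tfrac{1}{2}\big ( 0.2\wedge 0.3\big )=0.1, \qquad \gamma '=\gamma -\varepsilon _0=0.7, \end{aligned}$$

so that \(1-\gamma =0.2=2\varepsilon _0\) and \(\gamma -\gamma '=\varepsilon _0\), in accordance with (5.1) and (5.2).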

The main result of this section is the following proposition.

Proposition 5.1

For every fixed \(t\ge 0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P^{\tau }_{\nu }\left[ \left| L_N(t) - t\right| \ge 2^{-\frac{1}{5}\varepsilon _0N}\right] \le c2^{-\frac{1}{10}\varepsilon _0N}. \end{aligned}$$

Proof

We approximate \(L_N(t)\) by the sum of essentially independent random variables as follows. Let \(K=\left\lfloor 2^{\varepsilon _0N} \right\rfloor \). For a fixed \(t>0\), define

$$\begin{aligned} t_k=\frac{tR_N}{K} k, \qquad k=0,\ldots ,K. \end{aligned}$$

Recall the notation (2.5). For every \(x\in \mathcal {D}_N\) and \(k=1,\ldots ,K\), define \(H^k_x=t_{k-1}+H_x\circ \theta _{t_{k-1}}\) to be the time of the first visit to x after \(t_{k-1}\), and set

$$\begin{aligned} \ell _{t,x}^k = \left( \int _{H^k_x\wedge (t_{k}-2N^2m_N)}^{(H^k_x+N^2m_N) \wedge (t_{k}-N^2m_N)} \mathbf {1}_{\{Y_s=x\}}{\mathrm d}s\right) ^{\alpha }. \end{aligned}$$

The random variable \(\ell _{t,x}^k\) gives ‘roughly’ the \(\alpha \)-th power of the time that Y spends in x between \(t_{k-1}\) and \(t_k-N^2m_N\), with some suitable truncations. Let further

$$\begin{aligned} U_N^k(t) = 2^{(\gamma '-\gamma )N}\sum _{x\in \mathcal {D}_N} \ell _{t,x}^k. \end{aligned}$$

The next lemma, which we prove later, shows that the sum of the \(U_N^k(t)\)’s is a good approximation for \(L_N(t)\).

Lemma 5.2

For every \(t>0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P_{\nu }^{\tau }\left[ L_N(t)\ne \sum _{k=1}^K U_N^k(t)\right] \le c2^{-\frac{1}{2}\varepsilon _0N}. \end{aligned}$$

With Lemma 5.2, the proof of the proposition reduces to understanding the approximating sum \(\sum _{k=1}^K U_N^k(t)\). We will compute its expectation and variance under \(P_{\nu }^{\tau }\). In particular, we will show that there is \(c<\infty \) such that for every \(t>0\),

$$\begin{aligned} \left| E_{\nu }^{\tau }\left[ \sum _{k=1}^K U_N^k(t)\right] - t \right| \le c2^{-2\varepsilon _0N}, \qquad {\mathbb {P}}\text {-a.s. as}\, N\rightarrow \infty , \end{aligned}$$
(5.3)

and

$$\begin{aligned} {\text {Var}}_{\nu }^{\tau }\left( \sum _{k=1}^K U_N^k(t)\right) \le c2^{-\frac{1}{2} \varepsilon _0 N}, \qquad {\mathbb {P}}\text {-a.s. as }N\rightarrow \infty . \end{aligned}$$
(5.4)

The statement of the proposition then follows from Lemma 5.2, (5.3) and (5.4) by routine application of the Chebyshev inequality. Indeed, \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned}&P^{\tau }_{\nu }\left[ \left| L_N(t) - t\right| \ge 2^{-\frac{1}{5}\varepsilon _0N}\right] \\&\quad \le P_{\nu }^{\tau }\left[ L_N(t)\ne \sum _{k=1}^K U_N^k(t)\right] + P_{\nu }^{\tau }\left[ \left| \sum _{k=1}^K U_N^k(t)-E_{\nu }^{\tau } \left[ \sum _{k=1}^K U_N^k(t)\right] \right| \ge \frac{1}{2}\cdot 2^{-\frac{1}{5}\varepsilon _0N}\right] \\&\quad \le c2^{-\frac{1}{2}\varepsilon _0N} + c' 2^{-\frac{1}{10}\varepsilon _0N} \le c'' 2^{-\frac{1}{10}\varepsilon _0N}, \end{aligned}$$

which is the claim of the proposition.
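For the second probability above, note that by (5.3) the centering satisfies \(|E_{\nu }^{\tau }[\sum _k U_N^k(t)]-t|\le c2^{-2\varepsilon _0N}\le \frac{1}{2}\cdot 2^{-\frac{1}{5}\varepsilon _0N}\) for N large, and the Chebyshev inequality together with (5.4) gives

$$\begin{aligned} P_{\nu }^{\tau }\left[ \left| \sum _{k=1}^K U_N^k(t)-E_{\nu }^{\tau }\left[ \sum _{k=1}^K U_N^k(t)\right] \right| \ge \frac{1}{2}\cdot 2^{-\frac{1}{5}\varepsilon _0N}\right] \le \frac{c2^{-\frac{1}{2}\varepsilon _0N}}{\frac{1}{4}\cdot 2^{-\frac{2}{5}\varepsilon _0N}} = 4c\,2^{-\frac{1}{10}\varepsilon _0N}. \end{aligned}$$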

We proceed by computing the expectation (5.3). We will need two lemmas which we show later. The first lemma estimates the probability that a deep trap is visited by the process Y.

Lemma 5.3

For every \(t_N\) such that \(1\le t_N\le 2^N\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, for all \(x\in {\mathcal {D}}_N\),

$$\begin{aligned} P^{\tau }_{\nu }[H_x\le t_N ]= & {} \frac{t_N}{E^{\tau }_{\nu }[H_x]} + O\left( t_N^2 2^{2(\varepsilon -1)N}\right) +O\left( 2^{(\varepsilon -1)N}\right) \le c t_N 2^{(\varepsilon -1)N}. \end{aligned}$$

The second lemma then gives the expected contribution of a single \(\ell _{t,x}^k\) to \(\sum _{k=1}^K U_N^k(t)\).

Lemma 5.4

For every fixed \(t>0\), \(k=1,\ldots ,K\) and \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, for all \(x\in \mathcal {D}_N\),

$$\begin{aligned} E_{\nu }^{\tau }\left[ \ell _{t,x}^k\right] = \frac{tR_N}{KE_{\nu }^{\tau }[H_x]} E_{x}^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] + O\left( 2^{(2\gamma +3\varepsilon -2\varepsilon _0-2)N}\right) . \end{aligned}$$

With Lemma 5.4 it is easy to compute the expectation (5.3). Using that \(|\mathcal {D}_N| \le c2^{(1-\gamma ')N}\) by (2.8), and the definition (2.10) of \(R_N\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E^{\tau }_{\nu }\left[ \sum _{k=1}^K U_N^k(t)\right]&= 2^{(\gamma '-\gamma ) N}\sum _{x\in \mathcal {D}_N}\sum _{k=1}^K \Biggl (\frac{tR_N }{KE^{\tau }_{\nu }[H_x]}E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }]\\&\quad \,+ O\left( 2^{(2\gamma +3\varepsilon -2\varepsilon _0-2) N}\right) \Biggr )\\&= t + O\left( 2^{(\gamma '-\gamma )N}2^{(1-\gamma ')N} 2^{(2\gamma -2+3\varepsilon -\varepsilon _0) N}\right) \\&= t + O\left( 2^{(\gamma -1+3\varepsilon -\varepsilon _0) N}\right) . \end{aligned}$$

Choosing \(\varepsilon <\varepsilon _0/3\) and recalling (5.1) yields (5.3).
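Explicitly, by (5.1),

$$\begin{aligned} \gamma -1+3\varepsilon -\varepsilon _0 \le -2\varepsilon _0+3\varepsilon -\varepsilon _0 = 3(\varepsilon -\varepsilon _0) < -2\varepsilon _0 \qquad \text {for }\varepsilon <\varepsilon _0/3, \end{aligned}$$

so the error term in the last display is indeed \(O(2^{-2\varepsilon _0N})\), as claimed in (5.3).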

Next, we estimate the variance (5.4). Since \(\nu \) is the stationary measure for Y, the random variables \(U_N^k(t)\), \(k=1,\ldots ,K\), are identically distributed under \(P_{\nu }^{\tau }\). Hence

$$\begin{aligned} {\text {Var}}_{\nu }^{\tau }\left( \sum _{k=1}^K U_N^k(t)\right) = K{\text {Var}}_{\nu }^{\tau }\left( U_N^1(t)\right) + 2 \sum _{1\le k< j\le K}{\text {Cov}}_{\nu }^{\tau }\big (U_N^k(t),U_N^j(t)\big ). \end{aligned}$$
(5.5)

The covariances can be neglected easily. Indeed, since by definition \(U_N^k(t)\) depends on the trajectory of Y only between times \(t_{k-1}\) and \(t_{k}-N^2 m_N\), we can use the Markov property at time \(t_k-N^2m_N\) to write

$$\begin{aligned}&{\text {Cov}}_\nu ^\tau \left( U_N^k(t),U_N^j(t)\right) \nonumber \\&\quad = E_\nu ^\tau \left[ \left( U^k_N(t)-E_\nu ^\tau U^k_N(t)\right) E^\tau \left[ U^j_N(t)- E_\nu ^\tau U^j_N(t)\,\big |\, Y_{t_k-N^2 m_N}\right] \right] . \end{aligned}$$
(5.6)

By Lemma 3.4, \(\big |P^\tau [Y_{t_k}=y|Y_{t_k-N^2 m_N}]-\nu _y\big | \le e^{-c N^2}\). Using in addition that \(U_N^j\le e^{c'N}\) for some sufficiently large \(c'\), we see that the inner expectation satisfies

$$\begin{aligned} \left| E^\tau [U^j_N(t)- E_\nu ^\tau U^j_N(t)\mid Y_{t_k-N^2 m_N}]\right| \le e^{-c N^2 /2}. \end{aligned}$$

Inserting this inequality back into (5.6) and summing over \(k<j\) then implies that the second term in (5.5) is \(O(e^{-cN^2})\) and thus can be neglected when proving (5.4).
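In a short sketch: since there are at most \(K^2\le 2^{2\varepsilon _0N}\) pairs \(k<j\), and each covariance is bounded using \(|U^k_N(t)-E_\nu ^\tau U^k_N(t)|\le 2e^{c'N}\),

$$\begin{aligned} \sum _{1\le k<j\le K}\left| {\text {Cov}}_{\nu }^{\tau }\big (U_N^k(t),U_N^j(t)\big )\right| \le 2K^2 e^{c'N} e^{-cN^2/2} \le 2\, e^{2\varepsilon _0 N\log 2 + c'N - cN^2/2} = O\big (e^{-cN^2/4}\big ). \end{aligned}$$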

To control the variance of \(U_N^1(t)\) in (5.5), it is enough to bound its second moment, which is

$$\begin{aligned} E_{\nu }^{\tau }\left[ U_N^1(t)^2\right] = 2^{2(\gamma '-\gamma )N}\left( \sum _{x\in \mathcal {D}_N} E_{\nu }^{\tau }\left[ (\ell _{t,x}^1)^2\right] + \sum _{x\ne y \in \mathcal {D}_N} E_{\nu }^{\tau }[\ell _{t,x}^1 \ell _{t,y}^1]\right) . \end{aligned}$$

Since, by definition, \(\ell ^1_{t,x}\le (N^2 m_N)^\alpha \) and \(\ell ^1_{t,x}\ne 0\) implies \(H_x\le tR_N/K\),

$$\begin{aligned} E_{\nu }^{\tau }\left[ U_N^1(t)^2\right]&\le 2^{2(\gamma '-\gamma )N}N^{4\alpha }m_N^{2\alpha } \left( \sum _{x\in \mathcal {D}_N}P_{\nu }^{\tau }\left[ H_x\le \frac{tR_N}{K}\right] \right. \nonumber \\&\left. \qquad + \sum _{x\ne y \in \mathcal {D}_N} P_{\nu }^{\tau } \left[ H_x,H_y\le \frac{tR_N}{K}\right] \right) . \end{aligned}$$
(5.7)

By Lemmas 5.3 and 4.4, for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \),

$$\begin{aligned} P_{\nu }^{\tau }\left[ H_x\le \frac{tR_N}{K}\right] \le c2^{(\gamma -1+\varepsilon -\varepsilon _0)N}. \end{aligned}$$
(5.8)

Moreover, by (2.8), \(|\mathcal {D}_N| \le c2^{(1-\gamma ')N}\), and by (3.7), \(N^{4\alpha }m_N^{2\alpha }\le 2^{\varepsilon N}\), for every \(\varepsilon >0\) and N large enough. It follows that the contribution of the first sum in (5.7) to the variance, including the prefactor \(K\le 2^{\varepsilon _0N}\) from (5.5), can be bounded by

$$\begin{aligned} c2^{(2(\gamma '-\gamma ) +1-\gamma ' + \gamma -1+2\varepsilon )N} = c 2^{(\gamma '-\gamma +2\varepsilon )N}. \end{aligned}$$

By (5.2), \(\gamma '-\gamma +2\varepsilon \le -\varepsilon _0+2\varepsilon <-\frac{1}{2}\varepsilon _0\) for \(\varepsilon <\varepsilon _0/4\), and hence this contribution is smaller than \(c2^{-\frac{1}{2} \varepsilon _0 N}\) as required for (5.4).

For the second summation in (5.7) we write

$$\begin{aligned} P_{\nu }^{\tau }\left[ H_x,H_y\le \frac{tR_N}{K}\right] \le P_{\nu }^{\tau }\left[ H_x < H_y\le \frac{tR_N}{K}\right] + P_{\nu }^{\tau }\left[ H_y < H_x \le \frac{tR_N}{K}\right] . \end{aligned}$$

By the strong Markov property at \(H_x\), splitting according to whether \(H_y\) occurs before or after the mixing time (and using that \(Y_{{T_{\mathrm {mix}}}}\) is \(\nu \)-distributed), each of these two probabilities can be bounded in the same way; for the first one,

$$\begin{aligned} P_{\nu }^{\tau }\left[ H_x < H_y\le \frac{tR_N}{K}\right]&\le \int _0^{\frac{tR_N}{K}} P_{\nu }^{\tau }[H_x\in {\mathrm d}u] P_x^{\tau }\left[ H_y\le \frac{tR_N}{K}-u\right] \\&\le \int _0^{\frac{tR_N}{K}} P_{\nu }^{\tau }[H_x\in {\mathrm d}u] \left( P_x^{\tau }[H_y\le {T_{\mathrm {mix}}}] + P_{\nu }^{\tau }\left[ H_y\le \frac{tR_N}{K}\right] \right) \\&\le P_{\nu }^{\tau }\left[ H_x \le \frac{tR_N}{K}\right] \left( P_x^{\tau }[H_y\le {T_{\mathrm {mix}}}] + P_{\nu }^{\tau }\left[ H_y \le \frac{tR_N}{K}\right] \right) . \end{aligned}$$

Using (5.8) and (2.8) again, the second sum in (5.7) is bounded by

$$\begin{aligned} c 2^{(\gamma -1+\varepsilon -\varepsilon _0)N} \left( 2^{2(1-\gamma ')N} 2^{(\gamma -1+\varepsilon -\varepsilon _0)N} + \sum _{x\ne y \in \mathcal {D}_N} P^\tau _x[H_y\le {T_{\mathrm {mix}}}]\right) . \end{aligned}$$
(5.9)

The first term in the parentheses of (5.9) together with the prefactors K from (5.5) and \(2^{2(\gamma '-\gamma )N}N^{4\alpha }m_N^{2\alpha } \le 2^{(2(\gamma '-\gamma )+\varepsilon )N}\) from (5.7), contributes to the variance by at most

$$\begin{aligned} c2^{(\varepsilon _0+2(\gamma '-\gamma )+\varepsilon + 2(1-\gamma ') + 2(\gamma -1 +\varepsilon -\varepsilon _0))N} = c2^{(3\varepsilon - \varepsilon _0)N}\le c2^{-\frac{1}{2} \varepsilon _0 N} \end{aligned}$$

if \(\varepsilon \) is small enough, as required by (5.4).

For the second term in the parentheses of (5.9) we need the following lemma whose proof is again postponed.

Lemma 5.5

Let \(\mathcal {W}^x_t=\sum _{y\in \mathcal {D}_N,y\ne x} \mathbf {1}_{\{H_y\le t\}}\). Then for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, for every \(x\in \mathcal {D}_N\),

$$\begin{aligned} E^{\tau }_x[\mathcal {W}^x_{{T_{\mathrm {mix}}}}] \le 2^{\varepsilon N}. \end{aligned}$$

Using Lemma 5.5, and including all the prefactors as before, the contribution of the second term in (5.9) to the variance (5.5) is bounded by

$$\begin{aligned} c2^{(\varepsilon _0 + 2(\gamma '-\gamma )+\varepsilon + 1-\gamma ' + \gamma -1+\varepsilon -\varepsilon _0 + \varepsilon )N} = c \, 2^{ (\gamma '-\gamma + 3\varepsilon )N} \le 2^{-\frac{1}{2}\varepsilon _0N}, \end{aligned}$$

where for the last inequality we used (5.2) again and chose \(\varepsilon \) small enough. This completes the proof of (5.4) and thus of the proposition. \(\square \)

We proceed by proving the lemmas used in the above proof.

Proof of Lemma 5.3

By [1, Theorem 1] the hitting time \(H_x\) is approximately exponential in the sense that

$$\begin{aligned} \left| P^{\tau }_{\nu }[H_x>t]-e^{-\frac{t}{E^{\tau }_{\nu }[H_x]}}\right| \le \frac{1}{\lambda _Y E^{\tau }_{\nu }[H_x]}. \end{aligned}$$

Hence, using Propositions 3.1 and 4.1 to bound \(\lambda _Y\) and \(E^{\tau }_{\nu }[H_x]\) respectively, we have for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \begin{aligned} P^{\tau }_{\nu }[H_x\le t_N ]&= \left( 1-e^{-\frac{t_N}{E^{\tau }_{\nu }[H_x]}}\right) + O\left( 2^{(\varepsilon -1)N}\right) \\&= \frac{t_N}{E^{\tau }_{\nu }[H_x]} + O\left( t_N^2 2^{2(\varepsilon -1)N}\right) + O\left( 2^{(\varepsilon -1)N}\right) . \end{aligned} \end{aligned}$$

Finally, if \(1\le t_N\le 2^N\) this is bounded by \(ct_N2^{(\varepsilon -1)N}\), which proves the lemma. \(\square \)

Proof of Lemma 5.4

By the strong Markov property and the definition of \(\ell ^k_{t,x}\),

$$\begin{aligned} \begin{aligned} E_{\nu }^{\tau }\left[ \ell _{t,x}^k\right]&\ge P_{\nu }^{\tau }\left[ H_x\in [t_{k-1},t_k-2N^2m_N]\right] E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] ,\\ E_{\nu }^{\tau }\left[ \ell _{t,x}^k\right]&\le P_{\nu }^{\tau }\left[ H_x\in [t_{k-1},t_k-N^2m_N]\right] E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] . \end{aligned}\end{aligned}$$
(5.10)

We will now give approximations of the expressions appearing in (5.10).

Observe that, by the subadditivity of \(u\mapsto u^{\alpha }\) (recall \(\alpha <1\)), for every \(0\le s\le t\),

$$\begin{aligned} \ell _{t}(x)^{\alpha } \le \ell _{s}(x)^{\alpha } + (\ell _{t}(x)-\ell _s(x))^{\alpha }. \end{aligned}$$

Using this inequality with \(t=N^2m_N\) and \(s={T_{\mathrm {mix}}}\) (on the event \(\{{T_{\mathrm {mix}}}>N^2m_N\}\) the resulting bound holds trivially), and applying the strong Markov property at \({T_{\mathrm {mix}}}\), observing that \(Y_{T_{\mathrm {mix}}}\) is \(\nu \)-distributed, we obtain

$$\begin{aligned} E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] \le E_x^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] + E_{\nu }^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] . \end{aligned}$$

By Lemma 5.3, using also that by (3.7), \(\ell _{N^2m_N}(x)^{\alpha }\le N^{2\alpha }m_N^{\alpha } \le 2^{\varepsilon N}\) for every \(\varepsilon >0\) and N large enough,

$$\begin{aligned} E_{\nu }^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] \le P_{\nu }^{\tau }\left[ H_x\le N^2m_N\right] 2^{\varepsilon N} \le c2^{(3\varepsilon -1)N}. \end{aligned}$$

Hence we obtain the upper bound

$$\begin{aligned} E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] \le E_x^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] + c2^{(3\varepsilon -1)N}. \end{aligned}$$
(5.11)

For a matching lower bound, note that

$$\begin{aligned} E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] \ge E_x^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha } \mathbf {1}_{\{{T_{\mathrm {mix}}}\le N^2m_N\}}\right] . \end{aligned}$$

But from Proposition 3.3 it follows that

$$\begin{aligned} E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\mathbf {1}_{\{{T_{\mathrm {mix}}}>N^2m_N\}}] \le E^{\tau }_x[{T_{\mathrm {mix}}}^{\alpha }\mathbf {1}_{\{{T_{\mathrm {mix}}}>N^2m_N\}}] \le \sum _{k=N^2}^\infty (km_N)^{\alpha }e^{-k} \le c e^{-c'N^2}, \end{aligned}$$

so that

$$\begin{aligned} E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] \ge E_x^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] - ce^{-cN^2}. \end{aligned}$$
(5.12)

Combining (5.11) and (5.12), we obtain

$$\begin{aligned} E_x^{\tau }\left[ \ell _{N^2m_N}(x)^{\alpha }\right] = E_{x}^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] + O\left( 2^{(3\varepsilon -1)N}\right) . \end{aligned}$$
(5.13)

Note also that by (3.7), for every \(\varepsilon >0\) and N large enough,

$$\begin{aligned} E^{\tau }_x[\ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }] \le E^{\tau }_x[{T_{\mathrm {mix}}}^{\alpha }]\le cm_N^{\alpha } \le 2^{\varepsilon N}. \end{aligned}$$
(5.14)

To approximate the probabilities in (5.10), we apply Lemma 5.3 for \(t_N=t_{k-1}\) and \(t_N=t_k-iN^2m_N\), for a fixed \(t>0\) and \(i=1,2\). Using Lemma 4.4 to bound \(R_N\) and Proposition 4.1 to bound \(E^{\tau }_{\nu }[H_x]\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, for both \(i=1,2\),

$$\begin{aligned}&P_{\nu }^{\tau }\left[ H_x\in [t_{k-1},t_k-iN^2m_N]\right] \nonumber \\&\quad = \frac{tR_N}{KE_{\nu }^{\tau }[H_x]} + O\left( 2^{2(\gamma +\varepsilon -\varepsilon _0-1)N}\right) = O\left( 2^{(\gamma +\varepsilon -\varepsilon _0-1)N}\right) . \end{aligned}$$
(5.15)

Inserting both (5.15) and (5.13) in (5.10), and using (5.14), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E_{\nu }^{\tau }\left[ \ell _{t,x}^k\right] = \frac{tR_N}{KE_{\nu }^{\tau }[H_x]} E_{x}^{\tau }\left[ \ell _{{T_{\mathrm {mix}}}}(x)^{\alpha }\right] + O\left( 2^{(2\gamma +3\varepsilon -2\varepsilon _0-2)N}\right) . \end{aligned}$$

This proves the lemma. \(\square \)

Proof of Lemma 5.2

Note first that

$$\begin{aligned} \left\{ L_N(t)\ne \sum _{k=1}^K U_N^k(t)\right\} \subseteq \left\{ \exists x\in \mathcal {D}_N:~\ell _{tR_N}(x)^{\alpha }\ne \sum _{k=1}^K \ell _{t,x}^k\right\} . \end{aligned}$$

To control the probability of this event, we introduce some more notation. Set \(H_x^{(0)}=0\), \(H_x^{(1)}=H_x\), and for \(k\ge 2\) define the time of the ‘k-th visit after mixing’ inductively as

$$\begin{aligned} H_x^{(k)}=\inf \{t>{T_{\mathrm {mix}}}\circ \theta _{H_x^{(k-1)}}+H_x^{(k-1)}:~Y_t=x\}. \end{aligned}$$

Let \(\mathcal {N}_t^x=\max \{k\ge 0:~H_x^{(k)}\le t\}\) be the number of ‘visits after mixing’ to x before time t. Finally, let \(I_k=[t_k-2N^2m_N, t_k]\). Then

$$\begin{aligned} \begin{aligned} P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~\ell _{tR_N}(x)^{\alpha }\ne \sum _{k=1}^K \ell _{t,x}^k\right]&\le P^{\tau }_{\nu }\left[ Y_s\in \mathcal {D}_N \text { for some }s\in \bigcup _{k=1}^K I_k\right] \\&\quad +P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~\mathcal {N}_{tR_N}^x\ge 2\right] \\&\quad +P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~{T_{\mathrm {mix}}}\circ \theta _{H_x}\!>\! N^2m_N\right] . \end{aligned}\nonumber \\ \end{aligned}$$
(5.16)

We show that each of the three terms on the right-hand side is smaller than \(c2^{-\frac{1}{2}\varepsilon _0 N}\), which will prove the lemma.

For the first term in (5.16), using the stationarity of \(\nu \) and the Markov property

$$\begin{aligned} P^{\tau }_{\nu }\left[ Y_s\in \mathcal {D}_N \text { for some } s\in \bigcup _{k=1}^K I_k\right] \le K \sum _{x\in \mathcal {D}_N} P^{\tau }_{\nu }\left[ H_x \le 2 N^2 m_N\right] . \end{aligned}$$
(5.17)

By Lemma 5.3, for \(\varepsilon >0\) small enough, \(\mathbb {P}\)-a.s. for N large enough, for all \(x\in \mathcal {D}_N\),

$$\begin{aligned} P^{\tau }_{\nu }[H_x \le 2 N^2m_N] \le 2^{(\varepsilon -1)N}. \end{aligned}$$

Since \(|\mathcal {D}_N| \le c2^{(1-\gamma ')N}\) by (2.8), the right-hand side of (5.17) is bounded by \(c 2^{\varepsilon _0 N} 2^{(\varepsilon -\gamma ')N}\). Since \(\gamma '>1/2\) and by definition \(\varepsilon _0\le 1/4\), this is smaller than \(c 2^{-\frac{1}{2} \varepsilon _0 N}\) for \(\varepsilon \) small enough, as required.
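Explicitly: since \(\gamma '>1/2\) and \(\varepsilon _0\le 1/4\), the exponent satisfies

$$\begin{aligned} \varepsilon _0+\varepsilon -\gamma ' \le \tfrac{1}{4}+\varepsilon -\tfrac{1}{2} = \varepsilon -\tfrac{1}{4} \le -\tfrac{1}{2}\varepsilon _0 \qquad \text {for every }\varepsilon \le \tfrac{1}{4}-\tfrac{1}{2}\varepsilon _0, \end{aligned}$$

and the right-hand side of this constraint is at least 1/8, so such a choice of \(\varepsilon \) is always possible.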

For the second term in (5.16), by Lemma 5.3 and the strong Markov property at \({T_{\mathrm {mix}}}\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P^{\tau }_{\nu }[H_x^{(2)} \le tR_N] \le P^{\tau }_{\nu }[H_x\le tR_N]^2 \le c2^{2(\gamma -1+\varepsilon )N}. \end{aligned}$$

Together with (2.8) to bound \(|\mathcal {D}_N|\), and using (5.1) and (5.2), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~\mathcal {N}_{tR_N}^x\ge 2\right]&\le c2^{(1-\gamma ')N} 2^{2(\gamma -1+ \varepsilon )N}\\&= c2^{(\gamma -\gamma ')N + (\gamma -1)N + \varepsilon N}\\&\le c 2^{(-\varepsilon _0+\varepsilon ) N} \le c2^{-\frac{1}{2} \varepsilon _0N} \end{aligned}$$

as required.

Finally we give a bound on the third term in (5.16). By Proposition 3.3, \(P^{\tau }_x[{T_{\mathrm {mix}}}> N^2m_N] \le e^{-cN^2 }\). Thus, with (2.8) to bound \(|\mathcal {D}_N|\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~{T_{\mathrm {mix}}}\circ \theta _{H_x}> N^2m_N\right]&\le c 2^{(1-\gamma ')N} P^{\tau }_x[{T_{\mathrm {mix}}}> N^2m_N]\\&\le c'2^{-\frac{1}{2}\varepsilon _0N}. \end{aligned}$$

Together with the previous estimates, this implies that the right-hand side of (5.16) is bounded by \(c2^{-\frac{1}{2}\varepsilon _0N}\), and concludes the proof of the lemma. \(\square \)

Proof of Lemma 5.5

Let \(\mathcal {H}_0=0\) and define recursively for \(i\ge 1\)

$$\begin{aligned} \mathcal {H}_i = \inf \left\{ t\ge \mathcal {H}_{i-1}: Y_t\in \mathcal {D}_N{\setminus }\{Y_{\mathcal {H}_{i-1}}\}\right\} . \end{aligned}$$

By (2.9), \(\mathbb {P}\)-a.s. for N large enough, the vertices in \(\mathcal {D}_N\) are at least distance \(\delta N\) from each other. In particular the balls \(A_x = B(x,\rho _x)\), \(x\in \mathcal {D}_N\), defined in (4.2) are disjoint. Hence, when located at \(y\in \mathcal {D}_N\), the random walk Y must first leave \(A_y\) before it can visit \(\mathcal {D}_N{\setminus }\{y\}\). The strong Markov property and Corollary 4.3 then imply that \(\mathcal {H}_i\) stochastically dominates a Gamma random variable with shape parameter i and rate \(\mu :=c2^{N^{1-\delta }}\).

If \(\mathcal {W}^x_t\ge i\), then \(\mathcal {H}_i\le t\). Hence, for every \(t>0\),

$$\begin{aligned} E^\tau _x \left[ \mathcal {W}^x_t\right] =\sum _{i\ge 1} P^\tau _x[\mathcal {W}^x_t\ge i] \le \sum _{i\ge 1} P^\tau _x[\mathcal {H}_i\le t] \le \sum _{i\ge 1} \int _0^t \mu ^i u^{i-1} e^{-\mu u }\Gamma (i)^{-1} {\mathrm d}u = \mu t. \end{aligned}$$

Using the trivial estimate \(E^\tau _x[\mathcal {W}^x_{T_{\mathrm {mix}}}\mathbf {1}_{\{{T_{\mathrm {mix}}}\ge N^2 m_N\}}] \le |\mathcal {D}_N| P^\tau _x[{T_{\mathrm {mix}}}\ge N^2 m_N]\), it follows that

$$\begin{aligned}&E^\tau _x \left[ \mathcal {W}_{T_{\mathrm {mix}}}^x\right] \le E^\tau _x \left[ \mathcal {W}_{N^2 m_N}^x\right] + |\mathcal {D}_N| P^\tau _x[{T_{\mathrm {mix}}}\ge N^2 m_N]\nonumber \\&\quad \le \mu N^2 m_N + c 2^{(1-\gamma ')N}e^{-cN^2} \le 2^{\varepsilon N} \end{aligned}$$

by (2.8), (3.7) and Proposition 3.3. This completes the proof. \(\square \)

For later applications, we state two further consequences of the proof of Lemma 5.2.

Lemma 5.6

\(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} P^{\tau }_{\nu }\left[ \exists x\in \mathcal {D}_N:~\ell _{t R_N}(x)> N^2m_N\right] \le c2^{-\frac{1}{2}\varepsilon _0N}, \end{aligned}$$

and

$$\begin{aligned} P^{\tau }_{\nu }\left[ \left| \{x\in \mathcal {D}_N:~H_x\le tR_N\}\right| \ge 2^{\frac{3}{2}\varepsilon _0N}\right] \le c2^{-\frac{1}{4}\varepsilon _0 N}. \end{aligned}$$

Proof

The first claim follows directly from the bounds on the second and third term on the right hand side of (5.16) in the proof of Lemma 5.2, since the local time in a vertex that is only ‘visited once after mixing’ is bounded by \({T_{\mathrm {mix}}}\circ \theta _{H_x}\).

The second assertion can be seen in the following way. Using Lemma 5.3 to bound the probability that a single vertex \(x\in \mathcal {D}_N\) is visited before time \(tR_N\), and (2.8) to bound the size of \(\mathcal {D}_N\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E^{\tau }_{\nu }\left[ |\{x\in \mathcal {D}_N:~H_x\le tR_N\}|\right] \le c2^{(1-\gamma ')N}2^{(\gamma -1+\varepsilon )N} \le c2^{(\gamma -\gamma '+\varepsilon )N}. \end{aligned}$$

By (5.2) this is equal to \(c2^{(\varepsilon _0+\varepsilon )N}\), so choosing \(\varepsilon <\varepsilon _0/4\) this is smaller than \(c2^{\frac{5}{4}\varepsilon _0 N}\). Then by the Markov inequality the probability that there are more than \(2^{\frac{3}{2}\varepsilon _0 N}\) vertices visited is smaller than \(c2^{-\frac{1}{4}\varepsilon _0 N}\). \(\square \)

6 Clock process of the deep traps

This section contains the main steps leading to the proof of Theorem 1.1. Recall from (1.8) that the ‘clock process of deep traps’ \(S_\mathcal {D}\) is given by

$$\begin{aligned} S_{\mathcal {D}}(t) = \int _0^{t} (1\vee \tau _{Y_s})\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}}{\mathrm d}s = \int _0^{t} \tau _{Y_s}\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}}{\mathrm d}s. \end{aligned}$$

We now show that \(S_\mathcal {D}\) converges to a stable process.

Proposition 6.1

Under the assumptions of Theorem 1.1, the rescaled clock processes of the deep traps \(g_N^{-1}S_{\mathcal {D}}(tR_N)\) converge in \(\mathbb {P}\)-probability as \(N\rightarrow \infty \), in \(P^{\tau }_{\nu }\)-distribution on the space \(D([0,T],\mathbb {R})\) equipped with the Skorokhod \(M_1\)-topology, to an \(\alpha \)-stable subordinator \(V_{\alpha }\).

The proof of Proposition 6.1 consists of three steps. In the first step, we show convergence in distribution of one-dimensional marginals by showing that the Laplace transform of one-dimensional marginals converges. This step contains, to some extent, the principal insight of this paper and is split into two parts: We first show the quasi-annealed convergence mentioned in the introduction, which is then strengthened to convergence in probability with respect to the environment. The second and third steps of the proof of Proposition 6.1 are rather standard and deal with the joint convergence of increments and the tightness.

6.1 Quasi-annealed convergence

We establish here the connection between the Laplace transform of the clock process of deep traps and the local time functional \(L_N\) studied in Sect. 5. The key observation is that the depths of the deep traps are in some sense independent of the fast chain Y, and can be thus averaged out easily.

To formalize this, we introduce a two-step procedure to sample the environment \(\tau \). Let \(\xi =(\xi _x)_{x\in \mathbb {H}_N}\) be i.i.d. Bernoulli random variables such that, cf. (2.7),

$$\begin{aligned} \mathbb {P}[\xi _x=1]=1-\mathbb {P}[\xi _x=0]=\mathbb {P}[x\in \mathcal {D}_N]= 2^{-\gamma 'N}(1+o(1)). \end{aligned}$$

Further, let \(\overline{E}=(\overline{E}_x)_{x\in \mathbb {H}_N}\) be i.i.d. standard Gaussian random variables conditioned to be larger than \(\frac{1}{\beta \sqrt{N}}\log g'_N\), and \(\underline{E}=(\underline{E}_x)_{x\in \mathbb {H}_N}\) i.i.d. standard Gaussian random variables conditioned to be smaller than \(\frac{1}{\beta \sqrt{N}}\log g'_N\). The collections \(\xi \), \(\overline{E}\) and \(\underline{E}\) are mutually independent. The Hamiltonian of the REM can be obtained by setting

$$\begin{aligned} E_x=\overline{E}_x \mathbf {1}_{\{\xi _x=1\}} + \underline{E}_x \mathbf {1}_{\{\xi _x=0\}}. \end{aligned}$$
(6.1)

From now on, we always assume that \(E_x\) are given by (6.1). Observe that in this procedure the set \(\mathcal {D}_N\) coincides with the set \(\{x\in {\mathbb {H}}_N:\xi _x=1\}\).

We use \(\mathcal {G}=\sigma (\xi ,\underline{E})\) to denote the \(\sigma \)-algebra generated by the \(\xi \)’s and \(\underline{E}\)’s. In particular, the number and positions of deep traps and all the \(\tau _y\), \(y\notin \mathcal {D}_N\), are \(\mathcal {G}\)-measurable. The depths of deep traps are however independent of \(\mathcal {G}\).

In the next lemma we compute the quasi-annealed Laplace transform of \(S_{\mathcal {D}}\). The term ‘quasi-annealed’ refers to the fact that we average over the energies of the deep traps \(\overline{E}_x\) (and over the law of the process), but we keep quenched the positions of the deep traps \(\xi _x\) and the energies of remaining traps \(\underline{E}_x\).

Lemma 6.2

There is a constant \(\mathcal {K}\in (0,\infty )\) such that for every \(\lambda >0\) and \(t\ge 0\),

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \bigg |\,\mathcal {G}\right] \xrightarrow {N\rightarrow \infty } e^{-\mathcal {K}\lambda ^{\alpha } t}, \qquad {\mathbb {P}}\text {-a.s.} \end{aligned}$$

Proof

Recall the separation event \(\mathscr {S}\) defined in (2.9). This event depends only on \(\xi \) and is therefore \(\mathcal {G}\)-measurable, and by Lemma 2.1 it occurs \(\mathbb {P}\)-a.s. for N large enough. On \(\mathscr {S}\), no two deep traps \(x,y\in \mathcal {D}_N\) are neighbors. Since moreover \(\tau _x\ge 1\) for \(x\in \mathcal {D}_N\), all the transition rates

$$\begin{aligned} q_{xy} \mathbf {1}_{\mathscr {S}} = \frac{\tau _x\wedge \tau _y}{1\wedge \tau _x}\mathbf {1}_{\mathscr {S}} ,\qquad x,y\in {\mathbb {H}}_N, \end{aligned}$$

are \(\mathcal {G}\)-measurable. That is, on the event \(\mathscr {S}\), the law of the chain Y is in fact \(\mathcal {G}\)-measurable. Therefore, on \(\mathscr {S}\), the order of taking expectations over the depths of the deep traps and over the chain Y can be exchanged. Namely, denoting by \({\overline{\mathbb {E}}}\) the expectation over the random variables \(\overline{E}_x\), on \(\mathscr {S}\),

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \, \bigg |\, \mathcal {G}\right]= & {} E^{\tau }_{\nu }\left[ {\overline{\mathbb {E}}}\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \right] \nonumber \\= & {} E^{\tau }_{\nu }\left[ {\overline{\mathbb {E}}}\left[ \exp \left\{ -\frac{\lambda }{g_N} \int _0^{tR_N}\tau _{Y_s}\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}}{\mathrm d}s\right\} \right] \right] \nonumber \\= & {} E^{\tau }_{\nu }\left[ {\overline{\mathbb {E}}}\left[ \exp \left\{ -\frac{\lambda }{g_N} \sum _{x\in \mathcal {D}_N}\ell _{tR_N}(x)\tau _x\right\} \right] \right] .\nonumber \\ \end{aligned}$$
(6.2)

We next approximate the inner expectation on the right-hand side of (6.2). Since its argument is bounded by one, it will be sufficient to control it on an event of \(P^{\tau }_{\nu }\)-probability tending to 1 as \(N\rightarrow \infty \). Define the event

$$\begin{aligned} \mathcal {A}= & {} \left\{ \text {for all }x\in \mathcal {D}_N,~\ell _{tR_N}(x) \le N^2 m_N\right\} \cap \left\{ \left| L_N(t) - t\right| \le 2^{-\frac{1}{5}\varepsilon _0N}\right\} . \end{aligned}$$
(6.3)

By Proposition 5.1 and Lemma 5.6, \(\mathbb {P}\)-a.s. for N large enough, \(P^{\tau }_{\nu }[\mathcal {A}^c]\le e^{-cN}\).

When performing the inner expectation of (6.2), the local times \(\ell _{tR_N}(x)\) of Y as well as \(\mathcal {D}_N\) are fixed; the expectation is taken only over the energies of the deep traps. By independence of the \(\overline{E}_x\) it follows that

$$\begin{aligned} {\overline{\mathbb {E}}}\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right]= & {} \prod _{x\in \mathcal {D}_N} {\overline{\mathbb {E}}}\left[ \exp \left\{ - \frac{\lambda }{g_N}\ell _{tR_N}(x) e^{\beta \sqrt{N} \,\overline{E}_x}\right\} \right] \nonumber \\= & {} \exp \left\{ \sum _{x\in \mathcal {D}_N} \log {\overline{\mathbb {E}}}\left[ \exp \left\{ - \frac{\lambda }{g_N}\ell _{tR_N}(x) e^{\beta \sqrt{N} \,\overline{E}_x} \right\} \right] \right\} .\nonumber \\ \end{aligned}$$
(6.4)

For \(u\in [0,N^2 m_N]\), let

$$\begin{aligned} \vartheta (u) = 1- {\overline{\mathbb {E}}}\left[ \exp \left\{ -\frac{\lambda }{g_N} u e^{\beta \sqrt{N} \,\overline{E}_x}\right\} \right] . \end{aligned}$$

Since \((\overline{E}_x)\) has standard Gaussian distribution conditioned on being larger than \(\frac{1}{\beta \sqrt{N}}\log g'_N\), using that by (2.7),

$$\begin{aligned} \mathbb {P}\left[ E_x > \frac{1}{\beta \sqrt{N}}\log g'_N\right] = \mathbb {P}[x\in \mathcal {D}_N] = 2^{-\gamma ' N}(1+o(1)), \end{aligned}$$

it follows that

$$\begin{aligned} \vartheta (u) = \frac{2^{\gamma ' N}}{\sqrt{2\pi }}\, (1+o(1)) \int _{\frac{1}{\beta \sqrt{N}}\log g'_N}^{\infty } e^{-\frac{s^2}{2}} \left( 1-e^{-\frac{\lambda u}{g_N}e^{\beta \sqrt{N} s}} \right) {\mathrm d}s. \end{aligned}$$

We use the substitution \(s=\frac{1}{\beta \sqrt{N}} ( \beta z + \log g_N - \log \lambda -\log u)\). The lower limit of the integral then becomes

$$\begin{aligned} \frac{1}{\beta }(\log g'_N-\log g_N+\log \lambda +\log u) =: \omega (N). \end{aligned}$$

For \(u\le N^2 m_N\), \(\omega (N)\) is asymptotically dominated by the term \(\log g'_N-\log g_N \le -cN\), and thus \(\lim _{N\rightarrow \infty }\omega (N)=-\infty \). After the substitution,

$$\begin{aligned} \vartheta (u)=\frac{2^{\gamma ' N}}{\sqrt{2\pi }} (1+o(1)) \int _{\omega (N)}^{\infty }e^{-\frac{1}{2\beta ^2 N} (\beta z + \log g_N - \log \lambda - \log u)^2} \left( 1-e^{-e^{\beta z}} \right) \frac{1}{\sqrt{N}}\, {\mathrm d}z. \end{aligned}$$
(6.5)

For \(u\in [0,N^2 m_N]\), using the definition (1.6) of \(g_N\), the exponent of the first exponential satisfies

$$\begin{aligned} \begin{aligned} -\frac{1}{2\beta ^2 N}&(\beta z+\log g_N-\log \lambda -\log u)^2 \\&= -\frac{1}{2\beta ^2 N}\left( \beta z + \alpha \beta ^2 N -\frac{1}{\alpha } \log \left( \alpha \beta \sqrt{2\pi N}\right) - \log \lambda - \log u \right) ^2 \\&= -\frac{\alpha ^2\beta ^2}{2}N + \alpha \log \lambda + \alpha \log u +\log \left( \alpha \beta \sqrt{2\pi N}\right) - \alpha \beta z + {\text {err}}(z) + o(1). \end{aligned} \end{aligned}$$
(6.6)

Here, o(1) is an error independent of the variable z. Note that for the \(\log ^2 u\) part to be o(1) it is important that \(m_N\) defined in (3.7) is not too large, see also Remark 6.4. The second error term is

$$\begin{aligned} {\text {err}}(z)=-\frac{1}{2N} z^2 + \frac{1}{\beta N} z \left( \frac{1}{\alpha }\log (\alpha \beta \sqrt{2\pi N}) + \log \lambda + \log u\right) . \end{aligned}$$

Observe that \(\lim _{N\rightarrow \infty } {\text {err}}(z)=0\) for every \(z\in \mathbb {R}\), and that for every \(\varepsilon \) there is \(N_0\) large enough, so that for \(N\ge N_0\) and all \(z\in \mathbb {R}\)

$$\begin{aligned} {\text {err}}(z) \le \varepsilon |z|. \end{aligned}$$
(6.7)

Inserting the results of the computation (6.6) back into (6.5), using that \({\alpha ^2\beta ^2}/{2}=\gamma \log 2\), we obtain

$$\begin{aligned} \vartheta (u) = \alpha \beta 2^{(\gamma '-\gamma ) N} \lambda ^{\alpha }u^{\alpha } \int _{\omega (N)}^{\infty } e^{-\alpha \beta z + {\text {err}}(z)} \left( 1-e^{-e^{\beta z}} \right) {\mathrm d}z\, (1+o(1)). \end{aligned}$$
(6.8)

We now claim that

$$\begin{aligned} \int _{\omega (N)}^{\infty } e^{-\alpha \beta z + {\text {err}}(z)} \left( 1-e^{-e^{\beta z}}\right) {\mathrm d}z \xrightarrow {N\rightarrow \infty } \int _{\mathbb {R}} e^{-\alpha \beta z } \left( 1-e^{-e^{\beta z}}\right) {\mathrm d}z=:C. \end{aligned}$$
(6.9)

Indeed, the integrand converges point-wise on \(\mathbb {R}\) to \(e^{-\alpha \beta z}(1-e^{-e^{\beta z}})\) which is integrable if \(\alpha <1\). Moreover, by (6.7), the integrand is bounded by \(e^{-\alpha \beta z+\varepsilon |z|}(1-e^{-e^{\beta z}})\), which is integrable if we choose \(\varepsilon <\beta (1-\alpha )\wedge \alpha \beta \). The claim (6.9) follows by the dominated convergence theorem.

We now come back to (6.4). Since on \(\mathcal {A}\) we have \(\ell _{tR_N}(x)\le N^2 m_N\) for all \(x\in \mathcal {D}_N\), and since \(\gamma '<\gamma \), we see that \(\vartheta (\ell _{tR_N}(x)) = o(1)\) uniformly in \(x\in \mathcal {D}_N\) on \(\mathcal {A}\). With \(\log (1-u)= -u(1+O(u))\) as \(u\rightarrow 0\) this yields

$$\begin{aligned} \begin{aligned} {\overline{\mathbb {E}}}\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right]&= \exp \left\{ \sum _{x\in \mathcal {D}_N}\log \left( 1-\vartheta (\ell _{tR_N}(x))\right) \right\} \\&= \exp \left\{ -\sum _{x\in \mathcal {D}_N}\vartheta (\ell _{tR_N}(x))\left( 1+o(1)\right) \right\} . \end{aligned} \end{aligned}$$

The inner sum can be easily computed from (6.8). Recalling that on \(\mathcal {A}\) the local time functional satisfies \(|L_N(t)-t|\le 2^{-\frac{1}{5}\varepsilon _0N}\), and setting \(\mathcal {K}=\alpha \beta C\), we obtain on \(\mathcal {A}\),

$$\begin{aligned} \sum _{x\in \mathcal {D}_N} \vartheta (\ell _{tR_N}(x))= & {} \alpha \beta C \lambda ^{\alpha } 2^{(\gamma '-\gamma ) N}\sum _{x\in \mathcal {D}_N} \ell _{tR_N}(x)^{\alpha } \big (1+o(1)\big ) \nonumber \\= & {} \alpha \beta C \lambda ^{\alpha } L_N(t) \left( 1+o(1)\right) \nonumber \\= & {} \mathcal {K}\lambda ^{\alpha } t +o(1) \quad \text {as }N\rightarrow \infty . \end{aligned}$$
(6.10)

It follows that on \(\mathcal {A}\)

$$\begin{aligned} {\overline{\mathbb {E}}}\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] = e^{-\mathcal {K}t \lambda ^{\alpha }(1+o(1))} = e^{-\mathcal {K}t \lambda ^{\alpha }} + o(1)\quad \text {as }N\rightarrow \infty . \end{aligned}$$

Inserting this into (6.2), using that \(P^{\tau }_{\nu }[\mathcal {A}^c]=O(e^{-cN})\), we conclude that, on \(\mathscr {S}\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \),

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \,\bigg | \,\mathcal {G}\right]= & {} E^{\tau }_{\nu }\left[ {\overline{\mathbb {E}}}\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)} \right] \mathbf {1}_{\mathcal {A}}\right] + O(e^{-cN})\\= & {} e^{-\mathcal {K}t \lambda ^{\alpha }} + o(1). \end{aligned}$$

Since \(\mathscr {S}\) occurs \(\mathbb {P}\)-a.s. for N large enough, this completes the proof. \(\square \)

6.2 Quenched convergence

We strengthen the convergence in Lemma 6.2 in the following way.

Lemma 6.3

The one-dimensional marginals of the rescaled clock processes \(g_N^{-1}S_{\mathcal {D}}(tR_N)\) converge in \(\mathbb {P}\)-probability as \(N\rightarrow \infty \), in \(P^{\tau }_{\nu }\)-distribution to an \(\alpha \)-stable law, that is for every \(t>0\) and \(\lambda >0\),

$$\begin{aligned} E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \xrightarrow {N\rightarrow \infty } e^{-\mathcal {K}\lambda ^{\alpha } t} \qquad \text {in }\mathbb {P}\text {-probability.} \end{aligned}$$

Proof

It will be enough to show that \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] ^2 \bigg |\,\mathcal {G}\right] = e^{-2\mathcal {K}\lambda ^{\alpha } t} +o(1). \end{aligned}$$
(6.11)

Indeed, if (6.11) holds, then the conditional variance

$$\begin{aligned} {\text {Var}}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] \bigg |\,\mathcal {G} \right] \xrightarrow {N\rightarrow \infty } 0, \qquad {\mathbb {P}}\text {-a.s.}, \end{aligned}$$

and the claim follows by an application of the Chebyshev inequality and Lemma 6.2.

To show (6.11), we rewrite

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] ^2 \bigg |\,\mathcal {G}\right] = \mathbb {E}\left[ \hat{E}^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}\sum _{x\in \mathcal {D}_N} (\ell _{tR_N}^{(1)}(x)+\ell _{tR_N}^{(2)}(x))\tau _x}\right] \bigg | \,\mathcal {G}\right] , \end{aligned}$$

where \(\ell ^{(1)}\) and \(\ell ^{(2)}\) are the local times of two independent Markov chains \(Y^{(1)}\) and \(Y^{(2)}\), both having law \(P^{\tau }_{\nu }\), and \(\hat{E}^{\tau }_{\nu }\) is the expectation with respect to the joint law \(\hat{P}^{\tau }_{\nu }\) of these chains. Again \(\mathbb {P}\)-a.s. for N large enough the separation event \(\mathscr {S}\) holds, and on this event the law \(\hat{P}^{\tau }_{\nu }\) is \(\mathcal {G}\)-measurable. Therefore we can exchange the expectations similarly as before. As in Lemma 6.2, it will be enough to control the expression on an event of \(\hat{P}^{\tau }_{\nu }\)-probability tending to 1 as \(N\rightarrow \infty \). We thus set \(\hat{\mathcal {A}}=\mathcal {A}^{(1)}\cap \mathcal {A}^{(2)}\) where \(\mathcal {A}^{(i)}\) are defined for both chains \(Y^{(i)}\) as in (6.3). Applying Proposition 5.1 and Lemma 5.6 for both independent chains, we have that \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \), \(\hat{P}^{\tau }_{\nu }[\hat{\mathcal {A}}^c]=O(e^{-cN})\).

Let \(\mathcal {C}\) be the event that \(Y^{(1)}\) and \(Y^{(2)}\) visit disjoint sets of deep traps,

$$\begin{aligned} \mathcal {C}=\left\{ \{x\in \mathcal {D}_N:~\ell _{tR_N}^{(1)}(x)>0\} \cap \{x\in \mathcal {D}_N:~\ell _{tR_N}^{(2)}(x)>0\}=\emptyset \right\} . \end{aligned}$$

We claim that \(\hat{P}^{\tau }_{\nu }[\mathcal {C}^c]= O(e^{-cN})\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \). Indeed, by Lemma 5.6, with probability larger than \(1-c2^{-\frac{1}{4}\varepsilon _0 N}\), the chain \(Y^{(1)}\) visits at most \(2^{\frac{3}{2}\varepsilon _0N}\) different vertices in \(\mathcal {D}_N\). By Lemma 5.3, each of those vertices has probability smaller than \(c2^{(\gamma -1+\varepsilon )N}\) of being hit by \(Y^{(2)}\), for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough. Therefore by the choice (5.1) of \(\varepsilon _0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} \hat{P}^{\tau }_{\nu }[\mathcal {C}^c] \le c2^{-\frac{1}{4}\varepsilon _0 N} + 2^{\frac{3}{2}\varepsilon _0 N} c'2^{(\gamma -1+\varepsilon )N} \le c2^{-\frac{1}{4}\varepsilon _0 N} + c'2^{-\frac{1}{2}\varepsilon _0N +\varepsilon N}, \end{aligned}$$

which decays exponentially if \(\varepsilon <\varepsilon _0/2\).

Since on \(\mathcal {C}\) the \(\tau _x\) of the vertices \(x\in \mathcal {D}_N\) visited by \(Y^{(1)}\) and \(Y^{(2)}\) are independent, and since the integrand is bounded by 1, we have on the separation event \(\mathscr {S}\),

$$\begin{aligned} \mathbb {E}&\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] ^2 \bigg |\mathcal {G}\right] \\&=\hat{E}^{\tau }_{\nu }\left[ \overline{\mathbb {E}}\left[ e^{-\frac{\lambda }{g_N}\sum _{x\in \mathcal {D}_N} (\ell _{tR_N}^{(1)}(x)+\ell _{tR_N}^{(2)}(x))\tau _x}\right] \right] \\&=\hat{E}^{\tau }_{\nu }\left[ \overline{\mathbb {E}}\left[ e^{-\frac{\lambda }{g_N}\sum _{x\in \mathcal {D}_N} (\ell _{tR_N}^{(1)}(x)+\ell _{tR_N}^{(2)}(x))\tau _x}\right] \mathbf {1}_{\hat{\mathcal {A}}\cap \mathcal {C}} \right] + O(e^{-cN})\\&= \hat{E}^{\tau }_{\nu }\left[ \overline{\mathbb {E}}\left[ e^{-\frac{\lambda }{g_N}\sum _{x\in \mathcal {D}_N} \ell _{tR_N}^{(1)}(x)\tau _x}\right] \overline{\mathbb {E}}\left[ e^{-\frac{\lambda }{g_N}\sum _{x\in \mathcal {D}_N} \ell _{tR_N}^{(2)}(x)\tau _x}\right] \mathbf {1}_{\hat{\mathcal {A}}\cap \mathcal {C}} \right] + O(e^{-cN}). \end{aligned}$$

Using the same procedure as in the proof of Lemma 6.2, on the event \(\hat{\mathcal {A}}\), the two inner expectations both converge to

$$\begin{aligned} \exp \left\{ -\mathcal {K}\lambda ^{\alpha } 2^{(\gamma '-\gamma )N} \sum _{x\in \mathcal {D}_N} \ell _{tR_N}^{(i)}(x)^{\alpha }\right\} =\exp \left\{ -\mathcal {K}\lambda ^{\alpha }L_N^{(i)}(t)\right\} , \quad i=1,2. \end{aligned}$$

Moreover, on \(\hat{\mathcal {A}}\), the local time functionals \(L_N^{(i)}(t)\) concentrate on t simultaneously. It follows that on \(\mathscr {S}\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \),

$$\begin{aligned} \mathbb {E}\left[ E^{\tau }_{\nu }\left[ e^{-\frac{\lambda }{g_N}S_{\mathcal {D}}(tR_N)}\right] ^2 \bigg |\, \mathcal {G}\right] = e^{-2\mathcal {K}\lambda ^{\alpha } t} +o(1). \end{aligned}$$

Noting again that \(\mathscr {S}\) occurs \(\mathbb {P}\)-a.s. for N large enough, this shows (6.11), and hence the lemma. \(\square \)

Remark 6.4

  1. (a)

A careful inspection of the last proof shows that Lemma 6.3 can be slightly strengthened. Namely, the stated convergence holds a.s. with respect to \(\xi \) and \(\underline{E}\), and in probability only with respect to \(\overline{E}\). The same remark then applies to Theorem 1.1.

  2. (b)

    A closer analysis of the errors made in the computation of the quasi-annealed Laplace transform, in particular in (6.6), shows that the error in Lemma 6.2 and (6.11) is of order \(O(N^{-1}\log ^2N)\), where the logarithmic part comes from the \(\log ^2 u\) part in (6.6), u being bounded by \(N^2m_N\), and \(m_N\) being polynomial in N. Therefore the variance decay is not enough to apply the Borel–Cantelli lemma and obtain \(\mathbb {P}\)-a.s. convergence.

  3. (c)

    Note also that the previous proof, more precisely bounding the \(\log ^2 u\) part of (6.6), requires that \(\log (N^2 m_N)\ll N^{1/2}\). This is where our improved techniques to estimate the spectral gap in Proposition 3.1 are necessary. As we already remarked, the techniques of [23] show roughly that \(m_N \le e^{\sqrt{N \log N}}\) only, which is not sufficient.

6.3 Joint convergence of increments

In the next step, we extend the convergence to joint convergence of increments.

Lemma 6.5

The increments of the rescaled clock processes \(g_N^{-1}S_{\mathcal {D}}(tR_N)\) converge jointly in \(\mathbb {P}\)-probability in \(P^{\tau }_{\nu }\)-distribution to the increments of an \(\alpha \)-stable subordinator.

Proof

Fix \(k\ge 1\) and \(0=t_0<t_1<\cdots <t_k \). We will show that for every \(\lambda _1,\ldots ,\lambda _k \in (0,\infty )\) and \(\mathbb {P}\)-a.e. environment \(\tau \),

$$\begin{aligned} \lim _{N\rightarrow \infty }E^{\tau }_{\nu }\left[ e^{-\frac{1}{g_N} \sum _{i=1}^k \lambda _i(S_{\mathcal {D}}(t_iR_N)-S_{\mathcal {D}}(t_{i-1}R_N))} \right] = \lim _{N\rightarrow \infty } \prod _{i=1}^k E^{\tau }_{\nu }\left[ e^{-\frac{\lambda _i}{g_N} S_{\mathcal {D}}((t_i-t _{i-1})R_N)} \right] . \end{aligned}$$
(6.12)

Then the lemma follows from the convergence of the one-dimensional marginals (in \(\mathbb {P}\)-probability, in \(P^{\tau }_{\nu }\)-distribution) proved above.

Let \( I^i=[t_{i}R_N-N^2m_N,t_iR_N]\). For a set \(I\subset [0,\infty )\), let \(\mathcal {V}(I)\) be the event

$$\begin{aligned} \mathcal {V}(I) = \{Y_s\notin \mathcal {D}_N \text { for all }s\in I\}. \end{aligned}$$

On the event \(\mathcal {V}\left( \cup _{i=1}^k I^i\right) \), for every \(i\le k\),

$$\begin{aligned} S_{\mathcal {D}}(t_iR_N)-S_{\mathcal {D}}(t_{i-1}R_N) = S_{\mathcal {D}}(t_iR_N-N^2m_N) - S_{\mathcal {D}}(t_{i-1}R_N). \end{aligned}$$
(6.13)

Moreover, by Lemma 5.3, \(\mathbb {P}\)-a.s. for all \(x\in \mathcal {D}_N\), for \(\varepsilon >0\) small and N large enough,

$$\begin{aligned} P^{\tau }_{\nu }[H_x \le N^2m_N] \le 2^{(\varepsilon -1)N}. \end{aligned}$$

By (2.8), \(|\mathcal {D}_N|\le c2^{(1-\gamma ')N}\), hence the expected number of vertices \(x\in \mathcal {D}_N\) visited in a time-interval of length \(N^2m_N\) is smaller than \(c2^{(\varepsilon -\gamma ')N}\), \(\mathbb {P}\)-a.s. for N large enough. The same bound (up to the factor k) holds for the finite union of the intervals \(I^i\), and so, by the Markov inequality, we conclude that \(P^{\tau }_{\nu }\left[ \mathcal {V}\left( \cup _{i=1}^k I^i\right) \right] \rightarrow 1\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \).

The reason to shorten the time intervals as above is to give the Markov chain Y the time it needs to mix. Define the event

$$\begin{aligned} {\mathcal {M}}= \{{T_{\mathrm {mix}}}\circ {\theta }_{t_{i}R_{N}-N^2m_{N}}\le N^2m_{N}\,\forall i=1,\ldots ,k\}. \end{aligned}$$

It is easy to see using Proposition 3.3 that \(P^{\tau }_{\nu }[\mathcal {M}] \rightarrow 1\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \). On the event \(\mathcal {M}\) the Markov chain Y always mixes between \(t_iR_N-N^2m_N\) and \(t_i R_N\) and thus, by Lemma 3.4, for every \(i=1,\ldots ,k\) and \(y\in \mathbb {H}_N\),

$$\begin{aligned} P^{\tau }_\nu [Y_{t_iR_N} = y\mid \mathcal {M}]=\nu _y. \end{aligned}$$

Therefore, on \(\mathcal {M}\),

$$\begin{aligned}&\left( S_{\mathcal {D}}(t_iR_N-N^2m_N)-S_{\mathcal {D}}(t_{i-1}R_N)\right) _{i=1,\ldots ,k}\nonumber \\&\quad \mathop {=}\limits ^{d} \left( S_{\mathcal {D}}^{(i)}((t_i-t _{i-1})R_N-N^2m_N) \right) _{i=1,\ldots ,k}, \end{aligned}$$
(6.14)

where the \(S_{\mathcal {D}}^{(i)}\) are the clock processes of the deep traps of independent copies \(Y^{(i)}\) of Y, each started from the stationary distribution \(\nu \).

Combining observations (6.13) and (6.14), with the estimates on the probabilities of \(\mathcal {V}\left( \cup _{i=1}^k I^i\right) \) and \(\mathcal {M}\), since the integrand is bounded by 1, we obtain that \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \),

$$\begin{aligned} E^{\tau }_{\nu }&\left[ e^{-\frac{1}{g_N} \sum _{i=1}^k \lambda _i(S_{\mathcal {D}}(t_iR_N)-S_{\mathcal {D}}(t_{i-1}R_N))} \right] \\&= E^{\tau }_{\nu }\left[ e^{-\frac{1}{g_N} \sum _{i=1}^k \lambda _i(S_{\mathcal {D}}(t_iR_N-N^2m_N)-S_{\mathcal {D}}(t_{i-1}R_N))} \mathbf {1}_{\mathcal {V}\left( \cup _{i=1}^k I^i\right) \cap \mathcal {M}} \right] +o(1)\\&= E^{\tau }_{\nu }\left[ \prod _{i=1}^k E^{\tau }_{\nu }\left[ e^{-\frac{\lambda _i}{g_N} S_{\mathcal {D}}^{(i)}((t_i-t _{i-1})R_N-N^2m_N)}\right] \mathbf {1}_{\mathcal {V}\left( \cup _{i=1}^k I^i\right) \cap \mathcal {M}} \right] +o(1)\\&= \prod _{i=1}^k E^{\tau }_{\nu }\left[ e^{-\frac{\lambda _i}{g_N} S_{\mathcal {D}}^{(i)}((t_i-t _{i-1})R_N-N^2m_N)} \right] +o(1). \end{aligned}$$

Using analogous arguments it can be shown that for every \(i=1,\ldots ,k\), \(\mathbb {P}\)-a.s. as \(N\rightarrow \infty \),

$$\begin{aligned} E^{\tau }_{\nu }\left[ e^{-\frac{\lambda _i}{g_N} S_{\mathcal {D}}^{(i)}((t_i-t _{i-1})R_N-N^2m_N)} \right] = E^{\tau }_{\nu }\left[ e^{-\frac{\lambda _i}{g_N} S_{\mathcal {D}}^{(i)}((t_i-t _{i-1})R_N)}\right] +o(1). \end{aligned}$$

Combining the last two equations proves (6.12) and hence the lemma. \(\square \)

6.4 Tightness in the Skorokhod topology

The last step in the proof of Proposition 6.1 is to show tightness.

Lemma 6.6

The sequence of probability measures \(P^{\tau }_{\nu }\big [g_N^{-1}S_{\mathcal {D}}(tR_N)\in \,\cdot \,\big ]\) is \(\mathbb {P}\)-a.s. tight with respect to the Skorokhod \(M_1\)-topology on \(D([0,T],\mathbb {R})\).

Proof

The proof is standard but we include it for the sake of completeness. By [33, Theorem 12.12.3], the tightness in the Skorokhod \(M_1\)-topology on \(D([0,T],\mathbb {R})\) is characterized in the following way: For \(f\in D([0,T],\mathbb {R})\), \(\delta >0\), \(t\in [0,T]\), let

$$\begin{aligned} w_f(\delta )&= \sup \left\{ \inf _{\alpha \in [0,1]}|f(t)-(\alpha f(t_1) +(1-\alpha )f(t_2))|:~t_1\le t\le t_2\le T,~t_2-t_1 \le \delta \right\} ,\\ v_f(t,\delta )&= \sup \left\{ |f(t_1)-f(t_2)|:~t_1,t_2\in [0,T]\cap (t-\delta ,t+\delta )\right\} . \end{aligned}$$

The sequence of probability measures \(P_N=P^{\tau }_{\nu }\big [g_N^{-1}S_{\mathcal {D}}(tR_N)\in \,\cdot \,\big ]\) on \(D([0,T],\mathbb {R})\) is tight in the \(M_1\)-topology, if

  1. (i)

    For every \(\varepsilon >0\) there is c such that

    $$\begin{aligned} P_N[f:~\Vert f\Vert _{\infty }>c]\le \varepsilon , \quad N\ge 1. \end{aligned}$$
    (6.15)
  2. (ii)

    For every \(\varepsilon >0\) and \(\eta >0\), there exist \(\delta \in (0,T)\) and \(N_0\) such that

    $$\begin{aligned} P_N[f:~w_f(\delta )\ge \eta ]\le \varepsilon , \quad N\ge N_0, \end{aligned}$$
    (6.16)

    and

    $$\begin{aligned} P_N[f:~v_f(0,\delta )\ge \eta ]\le \varepsilon \quad \text { and }\quad P_N[f:~v_f(T,\delta )\ge \eta ]\le \varepsilon , \quad N\ge N_0. \end{aligned}$$
    (6.17)

Since the clock processes are increasing, (6.15) is equivalent to the convergence of the distribution of \(g_N^{-1}S_{\mathcal {D}}(TR_N)\), which follows from the convergence of the Laplace transform of the marginal at time T. (6.16) is immediate, since the oscillation function \(w_f(\delta )\) vanishes identically for the increasing processes \(g_N^{-1}S_{\mathcal {D}}(tR_N)\). To check (6.17), again by the monotonicity of \(g_N^{-1}S_{\mathcal {D}}(tR_N)\) it is enough to check that for \(\delta \) small enough and \(N\ge N_0\), \(P^{\tau }_{\nu }[g_N^{-1}S_{\mathcal {D}}(\delta R_N)\ge \eta ]\le \varepsilon \). By the convergence of the marginal at time \(\delta \), we may take \(\delta \) such that \(\mathbb {P}[V_{\alpha }(\delta )\ge \eta ]\le \frac{\varepsilon }{2}\) and \(N_0\) such that for \(N\ge N_0\),

$$\begin{aligned} \left| P^{\tau }_{\nu }\left[ \frac{1}{g_N}S_{\mathcal {D}}(\delta R_N)\ge \eta \right] -\mathbb {P}\left[ V_{\alpha }(\delta )\ge \eta \right] \right| \le \frac{\varepsilon }{2}. \end{aligned}$$

The reasoning for \(v_f(T,\delta )\) is similar. \(\square \)

7 Shallow traps

In this section we show that the convergence of the clock process of the deep traps shown in Sect. 6 is enough for convergence of the clock process itself.

Proposition 7.1

Under the assumptions of Theorem 1.1, the clock process of the deep traps approximates the clock process, namely, for every \(t\ge 0\),

$$\begin{aligned} \frac{1}{g_N}\left( S(tR_N)-S_{\mathcal {D}}(tR_N)\right) \xrightarrow {N\rightarrow \infty } 0 \qquad \mathbb {P}\text {-a.s. in }P^{\tau }_{\nu }\text {-probability}. \end{aligned}$$

Proof

We will split the set of shallow traps \(\mathcal {S}_N:={\mathbb {H}}_N{\setminus } \mathcal {D}_N\) into two parts and separately deal with the corresponding contributions to the clock process.

We start with ‘very shallow traps’. Let \(\delta >0\) be a small constant which will be fixed later and \(h_N= e^{\delta \alpha \beta ^2 N}\). Define the set of very shallow traps as

$$\begin{aligned} \overline{\mathcal {S}}_N = \{x\in \mathbb {H}_N:~\tau _x \le h_N\}. \end{aligned}$$

The contribution of this set to the clock process can easily be neglected as follows. Write

$$\begin{aligned} \begin{aligned} E^{\tau }_{\nu }\left[ \frac{1}{g_N}\int _0^{tR_N}(1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\in \overline{\mathcal {S}}_N\}}{\mathrm d}s\right]&=\frac{1}{g_N} \sum _{x\in \overline{\mathcal {S}}_N} (1\vee \tau _x) E_{\nu }^{\tau }\left[ \ell _{tR_N}(x)\right] \end{aligned} \end{aligned}$$

Note that \(E^{\tau }_{\nu }[\ell _{tR_N}(x)] = \nu _x tR_N = Z_N^{-1}(1\wedge \tau _x)tR_N\), and \((1\vee \tau _x)(1\wedge \tau _x)=\tau _x\le h_N\) on \(\overline{\mathcal {S}}_N\). With (2.3) for \(Z_N\), and Lemma 4.4 for \(R_N\), for every \(\epsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, the right-hand side of the last equation can be bounded from above by

$$\begin{aligned} g_N^{-1} 2^N h_N Z_N^{-1} tR_N \le c g_N^{-1} e^{\delta \alpha \beta ^2 N} 2^{(\gamma +\epsilon )N}. \end{aligned}$$

To see that this expression decays exponentially, it is enough to keep track of the exponential part of \(g_N\), which is \(e^{\alpha \beta ^2N}\). Then, up to sub-exponential factors, and using \(\gamma =\frac{\alpha ^2\beta ^2}{2\log 2}\), the above is bounded by

$$\begin{aligned} \exp \left\{ ((\delta -1)\alpha \beta ^2 + \frac{1}{2}\alpha ^2\beta ^2 + \epsilon \log 2) N\right\} . \end{aligned}$$

Since \(\alpha <1\), by choosing \(\epsilon \) and \(\delta \) small enough this can be made smaller than \(e^{-cN}\) for some \(c>0\). Applying the Markov inequality and the Borel–Cantelli lemma,

$$\begin{aligned} \frac{1}{g_N}\int _0^{tR_N}(1\vee \tau _{Y_s})\mathbf {1}_{\{Y_s\in \overline{\mathcal {S}}_N\}}{\mathrm d}s \xrightarrow {N\rightarrow \infty } 0 \qquad \mathbb {P}\text {-a.s. in }P_{\nu }^{\tau }\text {-probability.} \end{aligned}$$
(7.1)
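Let us record one concrete admissible choice of the constants in the previous step (any such choice works; this particular one is only an illustration): the exponent above is negative as soon as

$$\begin{aligned} \delta < 1-\frac{\alpha }{2} \qquad \text {and}\qquad \epsilon < \frac{\alpha \beta ^2 (1-\delta -\alpha /2)}{\log 2}, \end{aligned}$$

since then \((\delta -1)\alpha \beta ^2 + \frac{1}{2}\alpha ^2\beta ^2 = \alpha \beta ^2(\delta -1+\alpha /2)<0\) and the term \(\epsilon \log 2\) does not change the sign.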

To control the contribution of the remaining shallow traps \(\mathcal {S}_N{\setminus } \overline{\mathcal {S}}_N\), we first split this set into slices \(\mathcal {S}^i_N\) as follows. Set

$$\begin{aligned} I_N = \left\lceil \frac{1}{\log 2}(\log g'_N - \log h_N) \right\rceil . \end{aligned}$$

Note that, by the definitions of \(g'_N\) and \(h_N\), for \(\delta \) small and fixed as above, \(I_N = cN + O(1)\) for some \(c>0\). For \(i=1,\ldots ,I_N\), let

$$\begin{aligned} \mathcal {S}^i_N=\left\{ x\in \mathcal {S}_N{\setminus }\overline{\mathcal {S}}_N:~\tau _x \in [2^{-i}g'_N,2^{-i+1}g'_N)\right\} , \end{aligned}$$

so that \(\mathcal {S}_N{\setminus }\overline{\mathcal {S}}_N= \cup _{i=1}^{I_N} \mathcal {S}_N^i\).

We next control the sizes of the slices \(\mathcal {S}^i_N\). By the tail approximation (2.2), for all \(i=1,\ldots ,I_N\),

$$\begin{aligned} \mathbb {P}[y\in \mathcal {S}^i_N] &\le \mathbb {P}\left[ E_y > \frac{1}{\beta \sqrt{N}}(\log g'_N - i \log 2) \right] \\ &= f_{N,i}^{(1)} \exp \left\{ -\frac{1}{2} \alpha '^2\beta ^2N + \alpha 'i\log 2 - f_{N,i}^{(2)} - o(1)\right\} (1+o(1)). \end{aligned}$$
(7.2)

We separately control the two expressions \(f_{N,i}^{(1)}\) and \(f_{N,i}^{(2)}\). The first one equals

$$\begin{aligned} f_{N,i}^{(1)} = \frac{\alpha '\beta \sqrt{2\pi N}}{\frac{\sqrt{2\pi }}{\beta \sqrt{N}}(\log g'_N - i \log 2)}. \end{aligned}$$

To control this, note that by definition of \(I_N\), for all \(i=1,\ldots ,I_N\),

$$\begin{aligned} \log g'_N - i \log 2 \ge \log h_N - \log 2 = \delta \alpha \beta ^2N -\log 2. \end{aligned}$$

It follows that, for all \(i=1,\ldots ,I_N\), \(f_{N,i}^{(1)}\) is bounded by some constant \(c>0\), which can be chosen to be independent of i. The second expression to control in (7.2) is

$$\begin{aligned} f_{N,i}^{(2)} = \frac{i^2 \log ^2 2}{2\beta ^2 N} + \frac{i\log 2}{\alpha '\beta ^2 N}\log (\alpha '\beta \sqrt{2\pi N}). \end{aligned}$$

This is strictly positive, so it can be omitted in (7.2) in order to obtain an upper bound. Using the obtained control on \(f_{N,i}^{(1)}\) and \(f_{N,i}^{(2)}\) in (7.2), as well as the fact that \(\gamma '=\frac{\alpha '^2\beta ^2}{2\log 2}\), we conclude that for all \(i=1,\ldots ,I_N\),

$$\begin{aligned} \mathbb {P}[y\in \mathcal {S}^i_N] \le c 2^{-\gamma 'N} 2^{\alpha 'i}. \end{aligned}$$
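Indeed, rewriting the exponential in base 2 and using \(\gamma '=\frac{\alpha '^2\beta ^2}{2\log 2}\),

$$\begin{aligned} \exp \left\{ -\frac{1}{2}\alpha '^2\beta ^2 N + \alpha ' i\log 2\right\} = 2^{-\frac{\alpha '^2\beta ^2}{2\log 2}N}\, 2^{\alpha ' i} = 2^{-\gamma ' N}2^{\alpha ' i}, \end{aligned}$$

the bounded prefactor \(f_{N,i}^{(1)}(1+o(1))\) being absorbed into the constant c.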

In particular, the size \(|\mathcal {S}^i_N|\) of the i-th slice is dominated by a binomial random variable with parameters \(n=2^N\) and \(p=c2^{\alpha ' i}2^{-\gamma 'N}\). Then it follows by the Markov inequality that for every \(\epsilon >0\),

$$\begin{aligned} \mathbb {P}\left[ |\mathcal {S}^i_N|> 2^{\epsilon N} c2^{\alpha ' i}2^{(1-\gamma ')N}\right] \le 2^{-\epsilon N}. \end{aligned}$$

Since \(I_N = cN+O(1)\), a union bound and the Borel–Cantelli lemma imply that for every \(\epsilon >0\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} |\mathcal {S}_N^i|\le 2^{\epsilon N} c2^{\alpha ' i} 2^{(1-\gamma ')N},\quad \text { for all }\quad i=1,\ldots ,I_N. \end{aligned}$$
(7.3)
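As a side illustration of the binomial domination just used, the following toy simulation compares observed slice sizes with the tail bound behind (7.3). It is a sketch, not part of the proof, under the assumption, suggested by the tail computation (7.2), that \(\tau _x=e^{\beta \sqrt{N}E_x}\) with \(E_x\) i.i.d. standard Gaussians; all numerical values (and the name `alpha_p` for \(\alpha '\)) are hypothetical.

```python
import math
import numpy as np

# Numerical illustration (not part of the proof) of the binomial domination
# behind (7.3). Assumption, suggested by (7.2): tau_x = exp(beta*sqrt(N)*E_x)
# with E_x i.i.d. standard Gaussian, so |S^i_N| is Binomial(2^N, p_i) with
# p_i bounded by a Gaussian upper tail. All numerical values are hypothetical.
rng = np.random.default_rng(0)
N, beta, alpha_p = 20, 1.2, 0.6               # alpha_p plays the role of alpha'
log_gp = alpha_p * beta**2 * N                # exponential part of log g'_N
log_h = 0.05 * 0.8 * beta**2 * N              # exponential part of log h_N
I_N = math.ceil((log_gp - log_h) / math.log(2))

E = rng.standard_normal(2**N)                 # one sample of the energies
for i in range(1, I_N + 1, 4):                # a few slices, for brevity
    u_i = (log_gp - i * math.log(2)) / (beta * math.sqrt(N))
    u_prev = (log_gp - (i - 1) * math.log(2)) / (beta * math.sqrt(N))
    observed = np.count_nonzero((E >= u_i) & (E < u_prev))
    tail_mean = 2**N * 0.5 * math.erfc(u_i / math.sqrt(2))  # bound on E|S^i_N|
    print(f"i={i:2d}  |S^i_N|={observed:7d}  2^N P[E_x > u_i]={tail_mean:11.1f}")
```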

Coming back to the contribution of the intermediate traps \(\mathcal {S}_N{\setminus } \overline{\mathcal {S}}_N\) to the clock process, we use as before that \(E^{\tau }_{\nu }[\ell _{tR_N}(y)] = \nu _y tR_N = \frac{1\wedge \tau _y}{Z_N}tR_N\), and \((1\vee \tau _y)(1\wedge \tau _y)=\tau _y\le 2^{-i+1}g'_N\) on \(\mathcal {S}_N^i\). With (2.3) for \(Z_N\), Lemma 4.4 for \(R_N\), and (7.3) for the size of \(\mathcal {S}_N^i\), we obtain that for every \(\varepsilon >0\), \(\mathbb {P}\)-a.s. for N large enough, for all \(i=1,\ldots ,I_N\),

$$\begin{aligned} E^{\tau }_{\nu }\left[ \frac{1}{g_N}\int _0^{tR_N}(1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\in \mathcal {S}^i_N\}}{\mathrm d}s\right]&= \frac{1}{g_N}\sum _{y\in \mathcal {S}_N^i} (1\vee \tau _y) E^{\tau }_{\nu }[\ell _{tR_N}(y)]\\&\le g_N^{-1}|\mathcal {S}_N^i| 2^{-i+1}g'_N Z_N^{-1} t R_N\\&\le c \frac{g'_N}{g_N} 2^{(\alpha '-1)i} 2^{(\gamma -\gamma '+2\varepsilon )N}. \end{aligned}$$

Summing over \(i=1,\dots ,I_N\), \(\mathbb {P}\)-a.s. for N large enough,

$$\begin{aligned} E^{\tau }_{\nu }\left[ \frac{1}{g_N}\int _0^{tR_N}(1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\in \bigcup _{i=1}^{I_N}\mathcal {S}_N^i\}}{\mathrm d}s\right] \le c' \frac{g'_N}{g_N} 2^{(\gamma -\gamma '+2\varepsilon )N}. \end{aligned}$$
(7.4)

We claim that the right-hand side of (7.4) decays exponentially in N for \(\varepsilon >0\) small enough. To this end, as before, it is enough to keep track of the exponential parts of \(g_N\) and \(g'_N\), which contribute to the right-hand side of (7.4) by

$$\begin{aligned} e^{(\alpha ' - \alpha )\beta ^2 N} = 2^{(\sqrt{\gamma '}-\sqrt{\gamma })\frac{2\beta }{\beta _c} N}. \end{aligned}$$

Hence, to show the exponential decay on the right hand side of (7.4), it is sufficient to prove that we can choose \(\varepsilon >0\) small enough, such that

$$\begin{aligned} (\sqrt{\gamma '}-\sqrt{\gamma })\frac{2\beta }{\beta _c} + \gamma -\gamma '+2\varepsilon <0. \end{aligned}$$
(7.5)

Since \(\alpha '<\alpha \), we have \(\gamma '<\gamma \). By a first-order approximation of the concave function \(\sqrt{x}\) at \(\gamma \),

$$\begin{aligned} \frac{1}{2\sqrt{\gamma }} (\gamma -\gamma ') < \sqrt{\gamma } - \sqrt{\gamma '}. \end{aligned}$$

Since \(\alpha <1\), we have \(\frac{1}{2\sqrt{\gamma }}= \frac{\beta _c}{2\alpha \beta }>\frac{\beta _c}{2\beta }\), and since \(\gamma -\gamma '>0\), this implies

$$\begin{aligned} \frac{\beta _c}{2\beta } (\gamma -\gamma ') < \sqrt{\gamma } - \sqrt{\gamma '}, \end{aligned}$$

and (7.5) thus holds for \(\varepsilon >0\) small enough. The right-hand side of (7.4) then decays exponentially, and with the Markov inequality we conclude that

$$\begin{aligned} \frac{1}{g_N}\int _0^{tR_N}(1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\in \bigcup _{i=1}^{I_N}\mathcal {S}_N^i\}}{\mathrm d}s \xrightarrow {N\rightarrow \infty } 0 \quad \mathbb {P}\text {-a.s. in } P_{\nu }^{\tau }\text {-probability.} \end{aligned}$$

This together with (7.1) finishes the proof of the proposition. \(\square \)

8 Conclusions

We first complete the proof of Theorem 1.1 by observing that it follows directly from Propositions 6.1, 7.1 and Lemma 4.4.

In the remaining part of this section we discuss how our main result, Theorem 1.1, and the techniques leading to its proof can be extended to show more usual aging statements, in terms of two-point functions or of the age process. The purpose of this discussion is not to give the best possible results, but rather to show what is possible with our techniques, and to explain what is missing in order to obtain better results.

Let us start by pointing out the main obstacles that prevent us from working with the standard objects considered previously in the literature. As an example, consider first the usual two-point function

$$\begin{aligned} \Pi (t,\theta )=P^\tau _\nu [X_{t} = X_{\theta t}], \qquad t>0,\, \theta >1, \end{aligned}$$
(8.1)

where X is the Metropolis chain defined in (1.2). For this two-point function the desired statement would be, say,

$$\begin{aligned} \lim _{N\rightarrow \infty } \Pi (t g_N,\theta ) = \mathrm {Asl}_\alpha (\theta ), \qquad \text {in }{\mathbb {P}}\text {-probability, for every }t>0, \end{aligned}$$
(8.2)

where \(\mathrm {Asl}_\alpha (\theta )\) is the probability, given by the arc-sine law, that the range of an \(\alpha \)-stable Lévy process does not intersect the interval \([1,\theta ]\).
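Explicitly, in the normalization common in this literature, the classical arcsine law for stable subordinators gives

$$\begin{aligned} \mathrm {Asl}_\alpha (\theta ) = \frac{\sin (\alpha \pi )}{\pi }\int _0^{1/\theta } u^{\alpha -1}(1-u)^{-\alpha }\,{\mathrm d}u, \qquad \theta \ge 1. \end{aligned}$$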

Statements like (8.2) usually follow from a clock-process convergence analogous to Theorem 1.1, if one can show that

(8.3)

However, the strongest information we have in this direction is contained in Lemma 5.5:

That means that we cannot exclude the (unlikely) possibility that X is trapped in a cluster of deep traps, making the verification of (8.2) non-trivial.

An alternative way to show aging, put forward in [22, 31], is to consider the age process \(A_t = \tau _{X_t}\). This process does not depend on the auxiliary chain Y, which makes it more physically relevant than the clock process. In the language of age processes, aging manifests itself through the convergence

$$\begin{aligned} b_N^{-1}A_{g_N t } \xrightarrow {N\rightarrow \infty } \mathcal {Z}_t, \end{aligned}$$
(8.4)

in a sufficiently weak topology on the space of càdlàg processes, with a well chosen scaling factor \(b_N\), and with \(\mathcal {Z}_t\) being a particular process, related to the limit behavior of the clock process, whose definition we quote from [31]: Let \(\Upsilon \) be an \(\alpha \)-stable Lévy process, and let

$$\begin{aligned} \mathcal {V}_t = \int _0^t T_s\, {\mathrm d}\Upsilon _s, \end{aligned}$$

where \((T_t)_{t\ge 0}\) is a family of i.i.d. mean-one exponential random variables, independent of \(\Upsilon \). Let \(\mathcal {W}=\mathcal {V}^{-1}\) be the inverse of \(\mathcal {V}\). Finally, define

$$\begin{aligned} \mathcal {Z}_t = \Upsilon _{\mathcal {W}_t}-\Upsilon _{\mathcal {W}_t-}. \end{aligned}$$
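To make the definition of \(\mathcal {Z}\) concrete, the following minimal toy discretization (our illustration, not the construction of [31]) approximates \(\Upsilon \) by rescaled partial sums of i.i.d. Pareto jumps, builds \(\mathcal {V}\) by weighting each jump with an independent mean-one exponential, and reads off the jump of \(\Upsilon \) straddling \(\mathcal {W}_t\); all numerical parameters are illustrative.

```python
import numpy as np

# Toy discretization of the limit objects (illustration only): Upsilon is
# approximated by rescaled partial sums of i.i.d. Pareto(alpha) jumps,
# V becomes a sum of exponentially weighted jumps, W = V^{-1}, and Z_t is
# the jump of Upsilon straddling W_t. All parameters are illustrative.
rng = np.random.default_rng(1)
alpha, n = 0.5, 10**5

jumps = (rng.pareto(alpha, size=n) + 1.0) / n ** (1 / alpha)  # P[X > x] = x^{-alpha}
T = rng.exponential(1.0, size=n)       # i.i.d. mean-one exponential weights
V = np.cumsum(T * jumps)               # discretized V_t = int_0^t T_s dUpsilon_s

def Z(t):
    """Jump of Upsilon 'active' at level t, i.e. Upsilon_{W_t} - Upsilon_{W_t-}."""
    j = min(np.searchsorted(V, t), n - 1)   # index realizing W_t (horizon guard)
    return jumps[j]

print([float(Z(t)) for t in (0.1, 0.5, 1.0)])
```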

In order to prove a statement like (8.4) in our context, the first difficulty is fixing the right scaling factor \(b_N\). In [31], this factor consists of two ingredients: (a) the observation scale for the processes X and A (i.e. \(g_N\) here), and (b) the diagonal Green function of the suitably killed process Y, evaluated at the deep traps. As a consequence of the weak asymmetry assumption of [31], this Green function can be computed explicitly, and, more importantly, it is essentially deterministic (see (2.5) and Theorems 10.1, 11.1 in [31]).

In our case, we do not have access to such a Green function (in fact, one of the main points of this paper is to show that it is not needed for the clock-process convergence). As a consequence, we cannot even guess the right scale \(b_N\). Moreover, we doubt that a statement like (8.4) holds without modifications. The reason is that the Green function (when suitably defined) is random in our setting and takes different values at different deep traps. This additional randomness should be incorporated into (8.4), either by modifying the age process \(A_t\) or the limit process \(\mathcal {Z}_t\).

One possible idea would be to replace \(A_t\) with

$$\begin{aligned} \tilde{A}_t=\tilde{\tau }_{X_t}, \qquad \text {where }\, \tilde{\tau }_x = E^\tau _x[S({T_{\mathrm {mix}}})]. \end{aligned}$$
(8.5)

While \(\tau _x\) is directly related to the energy \(E_x\) of the state x, \(\tilde{\tau }_x\) is a more dynamical quantity: it gives the mean holding time in x and ‘its neighborhood’ before mixing, which could possibly be related to the ‘activation energy’ needed to leave x. We expect that

$$\begin{aligned} g_N^{-1} \tilde{A}_{g_N t} \xrightarrow {N\rightarrow \infty } \mathcal {Z}_t, \end{aligned}$$

but we cannot show this easily; the unavailability of (8.3) again prevents us from proving that \(\tilde{\tau }_x^{-1}S({T_{\mathrm {mix}}})\) under \(P^\tau _x\) has an asymptotically exponential distribution as \(N\rightarrow \infty \).

We now state our complementary aging results. To this end, set \(H_1 = \inf \{t \ge 0: {Y_t \in \mathcal {D}_N}\}\), and define inductively for \(i\ge 1\)

$$\begin{aligned} \begin{aligned} M_i&=H_i + {T_{\mathrm {mix}}}\circ \theta _{H_i},\\ N_i&=M_i + {T_{\mathrm {mix}}}\circ \theta _{M_i},\\ H_{i+1}&= \inf \{t\ge N_i:Y_t\in \mathcal {D}_N\}. \end{aligned} \end{aligned}$$

Let further

$$\begin{aligned} \mathcal {H}_i=S(H_i), \end{aligned}$$

where S is the clock process as usual. Finally, let

$$\begin{aligned} \Pi '(t,\theta )=P^\tau _\nu \left[ \{\mathcal {H}_i:i\in \mathbb {N}\}\cap [t,\theta t]=\emptyset \right] . \end{aligned}$$
(8.6)

Heuristically, \(\mathcal {H}_i\) corresponds to the times when the Metropolis chain X enters a ‘new’ deep trap, or possibly a cluster of deep traps, the chain having mixed twice since the previous such entrance. \(\Pi '\) then gives the probability that no such entrance occurs between t and \(\theta t\). \(\Pi '\) can also be viewed as an approximation of the probability that X remains in one (cluster of) deep trap(s) during the observation period, which is close in spirit to (8.1).

Observe, however, that the times \(\mathcal {H}_i\) are defined in terms of the mixing of Y and not of X, which makes the two-point function \(\Pi '\) to some extent artificial. Optimally, two-point functions should depend only on the physically relevant chain X, and not on the auxiliary chain Y. Note also that various modified two-point functions similar to \(\Pi '\) have been considered in the literature, see e.g. [5, 9]; all of them, however, are expressed in terms of X only.

The function \(\Pi '\) exhibits aging given by the arc-sine law:

Theorem 8.1

Let \(\alpha ,\beta \) and \(g_N\) be as in Theorem 1.1. Then for every \(t>0\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\Pi '(tg_N,\theta )=\mathrm {Asl}_\alpha (\theta ), \qquad \text {in }{\mathbb {P}}{\text{- }}\text {probability}. \end{aligned}$$

Remark 8.2

Observe that the random scale \(R_N\) does not enter the statement of Theorem 8.1. This partially confirms the heuristic arguments given in the introduction.
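Before turning to the proof, here is a Monte Carlo sanity check (our own illustration, with hypothetical parameters) of the limiting object: an \(\alpha \)-stable subordinator is approximated by a rescaled Pareto random walk, and the probability that its range misses \([1,\theta ]\) is compared with the arcsine integral recalled after (8.2).

```python
import math
import numpy as np

# Monte Carlo sanity check (illustration only) of the limit in Theorem 8.1:
# an alpha-stable subordinator is approximated by a rescaled Pareto(alpha)
# random walk run well past theta; its range misses [1, theta] iff it jumps
# over the whole interval. All parameters are hypothetical.
rng = np.random.default_rng(2)
alpha, theta, scale, trials = 0.5, 3.0, 500, 2000
n = 20 * scale               # time horizon 20 >> theta: crossing is practically certain

misses = 0
for _ in range(trials):
    steps = (rng.pareto(alpha, size=n) + 1.0) / scale ** (1 / alpha)
    S = np.cumsum(steps)
    if not np.any((S >= 1.0) & (S <= theta)):
        misses += 1

# arcsine integral Asl_alpha(theta) via a midpoint rule on (0, 1/theta)
m = 10**5
u = (np.arange(m) + 0.5) / (m * theta)
asl = math.sin(math.pi * alpha) / math.pi \
      * np.sum(u ** (alpha - 1) * (1 - u) ** (-alpha)) / (m * theta)
print(f"Monte Carlo: {misses / trials:.3f}   Asl_alpha(theta): {asl:.3f}")
```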

Proof of Theorem 8.1

The result is far from being optimal and its proof is rather standard, so we only briefly describe its main ingredients.

We decompose \(S(H_i)\) into three parts

$$\begin{aligned} \mathcal {H}_i = S(H_i)&= \int _{0}^{H_i} (1\vee \tau _{Y_s})\, {\mathrm d}s \\&= \int _{0}^{H_i} (1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\notin \mathcal {D}_N\}} {\mathrm d}s + \int _{I_i} \tau _{Y_s} \mathbf {1}_{\{Y_s\in \mathcal {D}_N\}} {\mathrm d}s +\int _{J_i} \tau _{Y_s} \mathbf {1}_{\{Y_s\in \mathcal {D}_N\}} {\mathrm d}s, \end{aligned}$$
(8.7)

where \(I_i=\cup _{j=1}^{i-1}[H_j,M_j]\) and \(J_i=\cup _{j=1}^{i-1}[M_j,N_j]\). The first part gives the contribution of the shallow traps, the second and the third part then give the time spent in the deep traps in the corresponding time slots; observe that no deep trap is visited by Y in \(\cup _i[N_i,H_{i+1})\).

Using Proposition 7.1, the first part can be neglected by showing

$$\begin{aligned} \sup _{i:S(H_i)\in [tg_N, \theta t g_N]} \frac{1}{g_N}\int _{0}^{H_i} (1\vee \tau _{Y_s}) \mathbf {1}_{\{Y_s\notin \mathcal {D}_N\}} {\mathrm d}s \xrightarrow {N\rightarrow \infty } 0, \qquad \mathbb {P}\text {-a.s. in }P^\tau _\nu \text {-probability}. \end{aligned}$$

The third part can be neglected as well, since it is non-zero only with very small probability. Indeed, using the same arguments as in (5.17) and below, we observe that, by the strong Markov property, the random variables \(\int _{M_j}^{N_j} \tau _{Y_s}\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}} {\mathrm d}s\), \(j\ge 1\), form an i.i.d. sequence. Since \(Y_{M_j}\) is \(\nu \)-distributed, we have for every \(j\ge 1\)

$$\begin{aligned} P^\tau _\nu \left[ \int _{M_j}^{N_j} \tau _{Y_s}\mathbf {1}_{\{Y_s\in \mathcal {D}_N\}} {\mathrm d}s \ne 0\right] = P^\tau _\nu [H_{\mathcal {D}_N}\le {T_{\mathrm {mix}}}]\le 2^{(\varepsilon -\gamma ')N}, \end{aligned}$$

by Lemma 5.3 and Proposition 4.1. Moreover, by the second part of Lemma 5.6, we need to consider only \(j\le 2^{\frac{3}{2} \varepsilon _0 N+\varepsilon N}\) to compute \(S(H_i)\) at the time scales we are interested in. It follows, by a union bound over these j, that the probability that the third term is non-zero is bounded by \(2^{(2\varepsilon -\gamma '+\frac{3}{2} \varepsilon _0)N}\), which is negligible since \(\gamma '>1/2\) and \(\varepsilon _0<1/4\).
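Concretely, taking for instance \(\varepsilon <1/16\),

$$\begin{aligned} 2\varepsilon -\gamma '+\tfrac{3}{2}\varepsilon _0 < 2\varepsilon -\tfrac{1}{2}+\tfrac{3}{8} = 2\varepsilon -\tfrac{1}{8} < 0. \end{aligned}$$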

The dominating contribution comes from the second part, which is again a sum of an i.i.d. sequence, namely of

$$\begin{aligned} U_j=\int _{H_j}^{M_j} \tau _{Y_s} \mathbf {1}_{\{Y_s\in \mathcal {D}_N\}} {\mathrm d}s, \qquad j\ge 1. \end{aligned}$$
(8.8)

We may now define the process

$$\begin{aligned} \tilde{S}(t)=\sum _{H_j\le t} U_j, \end{aligned}$$

and observe that, by the previous arguments, \(g_N^{-1}\tilde{S}(t R_N)\) is close in the \(M_1\)-distance to \(g_N^{-1} S(t R_N)\). It follows that \(g_N^{-1} \tilde{S} (t R_N)\) converges to a Lévy process in the \(M_1\)-topology, and thus, since the jumps of \(\tilde{S}\) are the i.i.d. random variables \(U_j\), also in the \(J_1\)-topology.

We may now conclude using the usual continuity arguments,

$$\begin{aligned} \Pi '(tg_N,\theta )=P^\tau _\nu \left[ \{g_N^{-1}\tilde{S}(s):s\ge 0\}\cap [t,\theta t]=\emptyset \right] +o(1)=\mathrm {Asl}_\alpha (\theta )+o(1), \end{aligned}$$

as \(N\rightarrow \infty \), as required. \(\square \)

Remark 8.3

It might be not obvious why we require Y to mix twice in the definition of \(H_i\). The reason for this is the fact that while \(Y_{M_i}\) is \(\nu \)-distributed, it is not independent of \(U_i\). We need thus an additional mixing to introduce an i.i.d. structure. Without this mixing, \(U_i\) would be a two-dependent sequence.

Finally, let us state an age-process type result. Using the notation (8.8) introduced in the proof of Theorem 8.1, let

$$\begin{aligned} \bar{A}_t=U_i\quad \text {whenever }\, t\in [\mathcal {H}_{i},\mathcal {H}_{i+1}), \end{aligned}$$

with \(\mathcal {H}_0:=0\) and \(U_0:=0\). Roughly, \(\bar{A}_t\) gives the total holding time in the (cluster of) deep trap(s) where X is located at time t. The next theorem is an aging statement for \(\bar{A}\).

Theorem 8.4

Let \(\alpha ,\beta , g_N\) be as in Theorem 1.1. Then for every \(T>0\), the rescaled age process

$$\begin{aligned} \bar{A}_N(t)=g_N^{-1} \bar{A}(g_N t), \qquad t\ge 0, \end{aligned}$$

converges as \(N\rightarrow \infty \), in \({\mathbb {P}}\)-probability, in \(P^\tau _\nu \)-distribution on the space \(D([0,T),\mathbb {R})\) equipped with the Skorokhod \(J_1\)-topology, to the process \(\bar{\mathcal {Z}}_t\) given by \(\bar{\mathcal {Z}}_t = \Upsilon _{\mathcal {W}_t}-\Upsilon _{\mathcal {W}_t-}\), where \(\Upsilon \) is an \(\alpha \)-stable Lévy process and \(\mathcal {W}= \Upsilon ^{-1}\) is its inverse.

Comparing this theorem with the ‘desired’ statement built on (8.5), the \(\tilde{\tau }_x\)’s of \(\tilde{A}\) are replaced by the \(U_i\)’s in \(\bar{A}\); that is, the expectation \(E_x^\tau \) is not taken. This is reflected in the limiting process: the definition of \(\bar{\mathcal {Z}}\) does not contain the additional exponential family \((T_t)\). The reason is the fact, already mentioned, that we cannot easily show that the holding times, rescaled by their means, are asymptotically exponential.

Proof of Theorem 8.4

In the proof of Theorem 8.1 we showed that \(\mathcal {H}_i\) is well approximated by \(\sum _{j=1}^{i-1} U_j\). As \((U_j)\) is an i.i.d. sequence whose rescaled partial sums converge to an \(\alpha \)-stable Lévy process, the claim of the theorem can be proved by standard arguments, similarly to [31]. Observe also that the stronger \(J_1\)-topology can be used here because \(\bar{A}\) is considerably tamer than A. \(\square \)