1 Introduction

Investigation of energy transport in crystals is one of the main problems of non-equilibrium statistical mechanics (see [8]). It is closely related to the derivation of autonomous equations which describe the flow of quantities conserved by the Hamiltonian (for example, the flow of energy and the corresponding heat equation). In the classical setting one studies the energy transport in a Hamiltonian system coupled with thermal baths of different temperatures. This coupling is weak in a geometric sense: the thermal baths interact with the Hamiltonian system only through its boundary. Unfortunately, at the time of writing this problem remains too difficult, precisely because of the weakness of the coupling. In this case even the existence of a stationary state in the system is not clear (see [15, 29], and [13, 34] for a similar problem in a deterministic setting). That is why one usually modifies the system in order to gain some additional ergodic properties. Two standard ways to achieve this are (i) to consider a weak perturbation of a hyperbolic system of independent particles [11, 31]; (ii) to perturb each particle of the Hamiltonian system by a stochastic dynamics of order one [1, 4, 6, 7, 9, 27].

In particular, in [11] the authors consider a finite region of a lattice of weakly interacting geodesic flows on manifolds of negative curvature, while in [27] the authors investigate a lattice of weakly interacting anharmonic oscillators perturbed by an energy preserving stochastic exchange of momentum between neighbouring nodes. In both papers the authors rescale time appropriately and, letting the strength of interaction in the Hamiltonian system tend to zero, show that the limiting dynamics of the local energy is governed by a certain autonomous (stochastic) equation, which turns out to be the same in both papers.

In all the works listed in (i) and (ii) above, the source of the additional ergodic properties (the hyperbolicity of the unperturbed system or the coupling of the Hamiltonian system with stochastic dynamics) is of order one. It is natural to ask what happens when its intensity goes to zero. Such a situation was studied in [2, 3] and [5]. In [2] the authors consider the FPU-chain with the nonlinearity replaced by an energy preserving stochastic exchange of momentum between neighbouring nodes. They investigate the energy transport in the limit when the rate of this exchange tends to zero. In [3] the authors study a pinned disordered harmonic chain, where each oscillator is weakly perturbed by an energy preserving noise and an anharmonic potential. They investigate the behaviour of an upper bound for the Green–Kubo conductivity in the limit when the perturbation vanishes. In [5] the authors consider an infinite chain of weakly coupled cells, where each cell is weakly perturbed by an energy preserving noise. They formally find the main term of the Green–Kubo conductivity and investigate its limiting behaviour when the strength of the noise tends to zero.

In the present paper we weakly couple each particle of a Hamiltonian system with its own Langevin-type stochastic thermal bath and study the energy transport as this coupling goes to zero (note that such a stochastic perturbation does not preserve the energy of the system). So, as in the classical setting described above, we study the situation when the coupling of the Hamiltonian system with the thermal baths is weak, but the weakness is understood in a different, non-geometric sense. This setting seems natural: one can think of a crystal placed in some medium and weakly interacting with it.

However, as in a number of the works above, we have to assume that the coupling between the particles of the Hamiltonian system is also sufficiently weak. Namely, we rescale time and let the strength of interaction in the Hamiltonian system go to zero in an appropriate scaling with the coupling between the Hamiltonian system and the thermal baths. We prove that in this limit the local energy of the system satisfies a certain autonomous (stochastic) equation, which turns out to be mixing, and show that the limiting behaviour of steady states of the system is governed by the unique stationary measure of this equation.

Since the systems of statistical physics have very high dimension, it is crucial to control the dependence of the obtained results on the size of the system. Our work satisfies this physical requirement: most of the results we obtain are uniform in the size of the system.

More specifically, we consider a \(d\)-dimensional lattice of \(N\) nonlinear Hamiltonian rotators. The neighbouring rotators have opposite spins and interact weakly via a potential (linear or nonlinear) of size \(\varepsilon ^a\), \(a\ge 1/2\). We couple each rotator with its own stochastic Langevin-type thermostat of arbitrary positive temperature by a coupling of size \(\varepsilon \). We introduce action-angle variables for the uncoupled Hamiltonian, corresponding to \(\varepsilon =0\), and note that the sum of the actions is conserved by the Hamiltonian dynamics for every \(\varepsilon > 0\). That is why the actions play for us the role of the local energy. In order to feel the interaction between the rotators and the influence of the thermal baths, we consider time intervals of order \(t\thicksim \varepsilon ^{-1}.\) We let \(\varepsilon \) go to zero and obtain that the limiting dynamics of the actions is given by an equation which describes their autonomous (stochastic) evolution. It has a completely non-Hamiltonian nature (i.e. it does not feel the Hamiltonian interaction of the rotators) and describes a non-Hamiltonian flow of actions. Since we consider a time interval of order \(t\thicksim \varepsilon ^{-1}\), in the case \(a=1/2\) we have \(t\thicksim (\text{ Hamiltonian } \text{ interaction })^{-2}\). In [11] and [27] the scalings of time and of the Hamiltonian interaction satisfy the same relation. Since the autonomous equations for the energy obtained there do feel the Hamiltonian interaction, in our setting one could expect an autonomous equation for the actions which feels it as well. However, this is not the case.

For readers interested in the limiting dynamics of the energy, we note that it can easily be expressed in terms of the limiting dynamics of the actions.

The system in question (i.e. the Hamiltonian system, coupled with the thermal baths) is mixing. We show that its stationary measure \(\widetilde{\mu }^\varepsilon \), written in action-angle variables, converges, as \(\varepsilon \rightarrow 0\), to the product of the unique stationary measure \(\pi \) of the obtained autonomous equation for actions and the normalized Lebesgue measure on the torus \(\mathbb {T}^N\).

We prove that the convergence as \(\varepsilon \rightarrow 0\) of the vector of actions to a solution of the autonomous equation is uniform in the number of rotators \(N\). The convergence of the stationary measures is also uniform in \(N\), in some natural cases.

We use Khasminskii–Freidlin–Wentzell-type averaging techniques in the form developed in [23–25]. For a general Hamiltonian these methods apply when the interaction potential is of the same order as the coupling with the thermal baths, i.e. \(a=1\). However, we find a large natural class of Hamiltonians for which the results remain the same even if \(1/2 \le a <1\), i.e. when the interaction potential is stronger. This class consists of Hamiltonians which describe lattices of rotators with alternated spins, where neighbouring rotators rotate in opposite directions. This has to do with the fact that such systems of rotators do not have resonances of the first order. To apply the methods above in the case \(1/2 \le a <1\) we kill the leading term of the interaction potential by a global canonical transformation which is \(\varepsilon ^a\)-close to the identity. The resulting autonomous equation for the actions has a non-Hamiltonian nature since the averaging eliminates the Hamiltonian terms.

Note that a similar (but different) problem was considered in [17] (see also Chapter 9.3 of [18]). There the authors study a system of oscillators, weakly interacting via couplings of size \(\varepsilon \). Each oscillator is weakly perturbed by its own stochastic Langevin-type thermostat, also of size \(\varepsilon \). The authors consider time intervals of order \(\varepsilon ^{-1}\) and, using the averaging method, show that in the limit \(\varepsilon \rightarrow 0\) the local energy satisfies an autonomous (stochastic) equation. Compared to our work, the authors do not investigate the limiting (as \(\varepsilon \rightarrow 0\)) behaviour of stationary measures, nor the dependence of the results on the number of particles in the system.

2 Set Up and Main Results

2.1 Set Up

We consider a lattice \({\mathcal {C}}\subset \mathbb {Z}^d\), \(d\in \mathbb {N}\), which consists of \(N\) nodes \(j\in {\mathcal {C}},\, j=(j_1,\ldots ,j_d).\) At each node we put an integrable nonlinear Hamiltonian rotator, coupled through a small potential with the rotators at neighbouring nodes. The rotators are described by complex variables \(u=(u_j)_{j \in {\mathcal {C}}} \in \mathbb {C}^{N}\). We introduce the symplectic structure by the \(2\)-form \(\frac{i}{2}\sum \limits _{j\in {\mathcal {C}}}\,d u_j \wedge d\overline{u}_j=\sum \limits _{j\in {\mathcal {C}}} d x_j\wedge d y_j\), where \(u_j=x_j+iy_j\). Then the system of rotators is given by the Hamiltonian equation

$$\begin{aligned} \dot{u}_j = i\nabla _{j} H^\varepsilon (u), \quad j\in {\mathcal {C}}, \end{aligned}$$
(2.1)

where the dot denotes the derivative in time \(t\) and \(\nabla _j H^\varepsilon = 2 \partial _{\overline{u}_j} H^\varepsilon \) is the gradient of the Hamiltonian \(H^\varepsilon \) with respect to the Euclidean scalar product \(\cdot \) in \(\mathbb {C}\simeq \mathbb {R}^{2}:\)

$$\begin{aligned} \text{ for } \quad z_1,z_2 \in \mathbb {C}\quad z_1\cdot z_2 := \mathrm{Re }z_1 \mathrm{Re }z_2+\mathrm{Im }z_1\mathrm{Im }z_2= \mathrm{Re }z_1 \overline{z}_2. \end{aligned}$$
(2.2)

The Hamiltonian has the form

$$\begin{aligned} H^\varepsilon =\frac{1}{2} \sum \limits _{j \in {\mathcal {C}}} F_j \left( |u_j|^2\right) + \frac{\varepsilon ^a}{4} \sum \limits _{j,k \in {\mathcal {C}}: |j-k|=1} G\left( |u_j-u_{k}|^2\right) , \end{aligned}$$
(2.3)

where \(|j|:=|j_1|+\cdots +|j_d|\), \(a\ge 1/2\), and \(F_j, G: [0,\infty ) \rightarrow \mathbb {R}\) are sufficiently smooth functions with polynomially bounded growth at infinity (precise assumptions are given below).

We weakly couple each rotator with its own stochastic thermostat of arbitrary temperature \({\mathcal {T}}_j\), satisfying

$$\begin{aligned} 0<{\mathcal {T}}_{j}\le C<\infty , \end{aligned}$$

where the constant \(C\) does not depend on \(j,N,\varepsilon \). More precisely, we consider the system

$$\begin{aligned} \dot{u}_j = i\nabla _{j} H^\varepsilon (u) + \varepsilon g_j(u) + \sqrt{\varepsilon {\mathcal {T}}_j} \dot{\beta }_j, \quad u_j(0) = u_{0j},\quad j\in {\mathcal {C}}, \end{aligned}$$
(2.4)

where \(\beta = (\beta _j)_{j \in {\mathcal {C}}}\in \mathbb {C}^{N}\) are standard complex independent Brownian motions; that is, their real and imaginary parts are standard real independent Wiener processes. The initial conditions \(u_0=(u_{0j})_{j\in {\mathcal {C}}}\) are random variables independent of \(\beta \), and they are the same for all \(\varepsilon \). The functions \(g_j\), which we call “dissipations”, have certain dissipative properties, for example \(g_j(u)=-u_j\) (see Remark 2.1 below). They couple only neighbouring rotators, i.e. \(g_j(u)=g_j\big ((u_k)_{k\in {\mathcal {C}}:|k-j|\le 1}\big )\).

The scaling of the thermostatic term in Eq. (2.4) is natural since, in view of the dissipative properties of \(g_j\), \(b=1/2\) is the only exponent for which solutions of the equation \(\dot{u}_j=\varepsilon g_j(u) + \varepsilon ^b \sqrt{ {\mathcal {T}}_j} \dot{\beta }_j, \; j\in {\mathcal {C}}\), stay of order 1 for all \(t\ge 0\) as \(\varepsilon \rightarrow 0\).
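To see this balance in the simplest case, take the linear dissipation \(g_j(u)=-u_j\) (the example from Remark 2.1 below); the following is a standard Ornstein–Uhlenbeck computation, given here only for illustration:

$$\begin{aligned} \dot{u}_j=-\varepsilon u_j + \varepsilon ^b \sqrt{ {\mathcal {T}}_j} \dot{\beta }_j \quad \Longrightarrow \quad \frac{d}{dt}\,{\mathbf {E}}\,|u_j|^2=-2\varepsilon \,{\mathbf {E}}\,|u_j|^2+2\varepsilon ^{2b}{\mathcal {T}}_j, \quad \text{ so }\quad {\mathbf {E}}\,|u_j(t)|^2\rightarrow \varepsilon ^{2b-1}{\mathcal {T}}_j \quad \text{ as } \quad t\rightarrow \infty ,
\end{aligned}$$

which stays of order 1 as \(\varepsilon \rightarrow 0\) exactly when \(b=1/2\).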

The case \(a=1/2\) is the most difficult, so from now on we consider only this case; the other cases are treated similarly. Writing the corresponding Eq. (2.4) in more detail, we obtain

$$\begin{aligned} \dot{u}_j&= if_j\left( |u_j|^2\right) u_j + i\sqrt{\varepsilon }\sum \limits _{k \in {\mathcal {C}}: |j-k|=1} G^{\prime }\left( |u_j-u_k|^2\right) (u_j-u_k) + \varepsilon g_j(u) + \sqrt{\varepsilon {\mathcal {T}}_j}\dot{\beta }_j, \end{aligned}$$
(2.5)
$$\begin{aligned} u_j(0)&=u_{0j}, \quad j \in {\mathcal {C}}, \end{aligned}$$
(2.6)

where \(f_j(x):=F^{\prime }_j(x)\) and the prime denotes a derivative in \(x\).
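To make the setting concrete, here is a minimal Euler–Maruyama sketch of system (2.5)–(2.6). Everything in it is an illustrative assumption, not taken from the paper: a one-dimensional periodic chain, \(p=2\), \(f(x)=1+x\) with alternated spins \(f_j=(-1)^{|j|}f\) (cf. Example 2.4 with \(k=1\)), \(G(x)=x\) (so \(G'\equiv 1\)), the linear diagonal dissipation \(g_j(u)=-u_j\), unit temperatures, and all numerical parameters.

```python
import numpy as np

# Euler-Maruyama sketch of system (2.5)-(2.6) on a periodic 1-d chain.
# Illustrative assumptions: p = 2, f(x) = 1 + x, alternated spins
# f_j = (-1)^j f, G(x) = x so G' = 1, g_j(u) = -u_j, temperatures T_j = 1.

rng = np.random.default_rng(0)
N, eps = 16, 0.05                        # number of rotators, coupling strength
dt, T_slow = 2e-4, 1.0                   # step must resolve the fast rotation
temps = np.ones(N)                       # temperatures T_j
signs = (-1.0) ** np.arange(N)           # alternated spins

u = rng.normal(size=N) + 1j * rng.normal(size=N)   # random initial conditions u_0

def drift(u):
    rot = 1j * signs * (1.0 + np.abs(u) ** 2) * u   # i f_j(|u_j|^2) u_j
    nb = np.roll(u, 1) + np.roll(u, -1)             # neighbouring rotators
    inter = 1j * np.sqrt(eps) * (2.0 * u - nb)      # sqrt(eps)-interaction, G' = 1
    return rot + inter - eps * u                    # + eps g_j(u)

for _ in range(int(T_slow / (eps * dt))):           # fast time t in [0, T_slow/eps]
    dB = rng.normal(scale=np.sqrt(dt), size=N) + 1j * rng.normal(scale=np.sqrt(dt), size=N)
    u = u + drift(u) * dt + np.sqrt(eps * temps) * dB

print("actions I_j = |u_j|^2/2 at tau =", T_slow, ":", np.round(np.abs(u) ** 2 / 2, 2))
```

Theorem 2.2 below states that, as \(\varepsilon \rightarrow 0\), the law of these actions on the time interval \(t\in [0,T/\varepsilon ]\) converges to the law of solutions of the averaged equation (2.8).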

Remark 2.1

Our principal example is the case of diagonal dissipation, when \(g_j(u)=-|u_j|^{p-2}u_j\) for all \(j\in {\mathcal {C}}\) and some \(p\in \mathbb {N},\,p\ge 2\); in particular, for \(p=2\) we get the linear diagonal dissipation \(g_j(u)=-u_j\). The diagonal dissipation does not provide any interaction between the rotators: in this case each rotator is simply coupled with a Langevin-type thermostat. The results become more interesting if we admit functions \(g_j\) of a more involved structure, which not only introduce dissipation but also provide some non-Hamiltonian interaction between the rotators. A reader who finds the presence of the non-Hamiltonian interaction unnatural may simply assume that the dissipation is diagonal.

We impose on the system assumptions HF, HG, Hg and HI. Their exact statements are given at the end of the section; here we briefly summarize them. We fix some \(p\in \mathbb {N},\,p\ge 2\), and assume that \(f_j(|u_j|^2)=(-1)^{|j|}f(|u_j|^2)\), where \(f(|u_j|^2)\) is separated from zero and grows at least polynomially with power \(p\) (HF). This means that the leading term of the Hamiltonian \(H^\varepsilon \) is a nonlinearity which rotates neighbouring rotators in opposite directions sufficiently fast. We call this the “alternated spins condition”. The function \(G'(|u_j|^2)\) is assumed to grow at most polynomially with power \(p-2\), i.e. the interaction term in (2.5) grows at most polynomially with power \(p-1\) (HG). The functions \(g_j(u)\) have certain dissipative properties and grow polynomially with power \(p-1\) (Hg). The functions \(f, G\) and \(g_j\) are assumed to be sufficiently smooth. In HI we assume that the initial conditions are “not very bad”; this assumption is not restrictive. For an example of functions \(f,G\) and \(g_j\) satisfying assumptions HF, HG and Hg, see Example 2.4. In the case \(a\ge 1\) the assumptions become weaker, see Remark 2.5; in particular, the rotators are then permitted to rotate in any direction.

2.2 Main Results

For a vector \(u=(u_k)_{k\in {\mathcal {C}}}\in \mathbb {C}^N\) we define the corresponding vectors of actions and angles

$$\begin{aligned} I=I(u)=(I_k(u_k))_{k\in {\mathcal {C}}},\; I_k=\frac{1}{2} |u_k|^2 \quad \text{ and }\quad \varphi =\varphi (u)=(\varphi _k(u_k))_{k\in {\mathcal {C}}}, \; \varphi _k=\arg u_k, \end{aligned}$$

where we put \(\varphi _k(0)=0\). Thus, \((I,\varphi )\in \mathbb {R}^N_{+0}\times \mathbb {T}^N\), where \(\mathbb {R}^N_{+0}=\{I=(I_k)_{k\in {\mathcal {C}}}\in \mathbb {R}^N:\, I_k\ge 0\; \forall k\in {\mathcal {C}}\}\), and \(u_k=\sqrt{2I_k}e^{i\varphi _k}\). The variables \((I,\varphi )\) form the action-angle coordinates for the uncoupled Hamiltonian (2.3)\(|_{\varepsilon =0}\).

A direct computation shows that the sum of the actions \(\sum \limits _{k\in {\mathcal {C}}} I_k\) is a first integral of the Hamiltonian \(H^\varepsilon \) for every \(\varepsilon >0\). That is why in our study the actions play the role of the local energy, and we examine their limiting behaviour as \(\varepsilon \rightarrow 0\) instead of the limiting behaviour of the energy. Moreover, the reader interested in the limiting dynamics of the energy can easily express it in terms of the limiting dynamics of the actions, since in view of (2.3) the energy of the \(j\)-th rotator tends to \(\frac{1}{2} F_j(2I_j)\) as \(\varepsilon \rightarrow 0\), see Corollary 4.7 for details.

Let us write a function \(h(u)\) in the action-angle coordinates, \(h(u)=h(I,\varphi )\), and denote its average over the angles by

$$\begin{aligned} \langle h \rangle (I):=\int \limits _{\mathbb {T}^{N}} h(I,\varphi ) \, d\varphi . \end{aligned}$$

Here and further on \(d\varphi \) denotes the normalized Lebesgue measure on the torus \(\mathbb {T}^{N}\). Let

$$\begin{aligned} {\mathcal {R}}_j(I):=\langle g_j(u) \cdot u_j \rangle , \end{aligned}$$
(2.7)

where we recall that the scalar product \(\cdot \) is given by (2.2). It is well known that under our assumptions a solution \(u^\varepsilon (t)\) of system (2.5)–(2.6) exists, is unique and is defined for all \(t\ge 0\) ([21]). Let \(I^\varepsilon (t)\) and \(\varphi ^\varepsilon (t)\) be the corresponding vectors of actions and angles, i.e. \(I^\varepsilon (t)=I(u^\varepsilon (t))\), \(\varphi ^\varepsilon (t)=\varphi (u^\varepsilon (t))\). We fix an arbitrary \(T\ge 0\) and examine the dynamics of the actions \(I^\varepsilon \) on the long time interval \([0,T/\varepsilon ]\) in the limit \(\varepsilon \rightarrow 0\). It is convenient to pass to the slow time \(\tau =\varepsilon t\); then the interval \(t\in [0,T/\varepsilon ]\) corresponds to \(\tau \in [0,T]\). We prove

Theorem 2.2

In the slow time the family of distributions of the actions \({\mathcal {D}}(I^{\varepsilon }(\cdot ))\) converges weakly, as \(\varepsilon \rightarrow 0\), on \(C([0,T], \mathbb {R}^N)\) to the distribution \({\mathcal {D}}(I^0(\cdot ))\) of the unique weak solution \(I^0(\tau )\) of the system

$$\begin{aligned}&d I_{j} = ({\mathcal {R}}_j(I) +{\mathcal {T}}_{j} )\,d\tau + \sqrt{2I_{j} {\mathcal {T}}_{j}}\,d\widetilde{\beta }_{j}, \quad j\in {\mathcal {C}},\end{aligned}$$
(2.8)
$$\begin{aligned}&{\mathcal {D}}(I(0))={\mathcal {D}}(I(u_0)), \end{aligned}$$
(2.9)

where \(\widetilde{\beta }_{j}\) are standard real independent Brownian motions. The convergence is uniform in \(N\).

The limiting measure satisfies certain estimates, for details see Theorem 4.6. In order to speak about uniformity in \(N\) of the convergence, we assume that the set \({\mathcal {C}}\) depends on the number of rotators \(N\) in such a way that \({\mathcal {C}}(N_1)\subset {\mathcal {C}}(N_2)\) if \(N_1<N_2\). The functions \(G, F_j\) and the temperatures \({\mathcal {T}}_j\) are assumed to be independent of \(N\), while the functions \(g_j\) are assumed to be independent of \(N\) for \(N\) sufficiently large (depending on \(j\)). The initial conditions \(u_0\) are assumed to agree in \(N\), see assumption HI(ii). Uniformity of convergence of measures is understood throughout the text in the sense of finite-dimensional projections. For example, for Theorem 2.2 it means that for any \(\Lambda \subset \mathbb {Z}^d\) which does not depend on \(N\) and satisfies \(\Lambda \subset {\mathcal {C}}(N)\) for all \(N\ge N_\Lambda ,\,N_\Lambda \in \mathbb {N}\), we have

$$\begin{aligned} {\mathcal {D}}\big ((I^\varepsilon _j(\cdot ))_{j\in \Lambda }\big )\rightharpoonup {\mathcal {D}}\big ((I^0_j(\cdot ))_{j\in \Lambda }\big )\quad \text{ as }\quad \varepsilon \rightarrow 0 \quad \text{ uniformly } \text{ in } N\ge N_\Lambda . \end{aligned}$$

Note that in the case of the diagonal dissipation \(g_j(u)=-u_j|u_j|^{p-2}\), Eq. (2.8) becomes diagonal:

$$\begin{aligned} d I_{j} = (-(2I_j)^{p/2} +{\mathcal {T}}_{j} )\,d\tau + \sqrt{2I_{j} {\mathcal {T}}_{j}}\,d\widetilde{\beta }_{j}, \quad j\in {\mathcal {C}}. \end{aligned}$$
(2.10)

For more examples see Sect. 4.4.
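The form of (2.10) can be read off directly from definition (2.7): for the diagonal dissipation the integrand does not depend on the angles, so the averaging acts trivially,

$$\begin{aligned} g_j(u)\cdot u_j=-|u_j|^{p-2}\,u_j\cdot u_j=-|u_j|^{p}=-(2I_j)^{p/2}, \quad \text{ hence }\quad {\mathcal {R}}_j(I)=-(2I_j)^{p/2}, \end{aligned}$$

and substituting this into (2.8) gives (2.10).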

Relation (2.8) is an autonomous equation for the actions which describes their transport in the limit \(\varepsilon \rightarrow 0\). Since it is obtained by the averaging method, we call it the averaged equation. Note that the averaged equation does not depend on the precise form of the potential \(G\). This means that the limiting dynamics does not feel the Hamiltonian interaction between the rotators, and it provides a flow of actions between the nodes only if the dissipation is not diagonal.

In Sect. 4.2 we investigate the limiting behaviour, as \(\varepsilon \rightarrow 0\), of the time-averaged joint distribution of the actions and angles \(I^\varepsilon , \varphi ^\varepsilon \). See Theorem 4.8.

Recall that a stochastic differential equation is mixing if it has a unique stationary measure and all solutions of this equation converge to this stationary measure in distribution. It is well known that Eq. (2.5) is mixing (see [21, 35, 36]). Denote its stationary measure by \(\widetilde{\mu }^{\varepsilon }\), and denote the projections to the spaces of actions and angles by \(\Pi _{ac}:\,\mathbb {C}^N\rightarrow \mathbb {R}^N\) and \(\Pi _{ang}:\,\mathbb {C}^N\rightarrow \mathbb {T}^N\) respectively. Let

$$\begin{aligned} {\mathcal {C}}^\infty :=\cup _{N\in \mathbb {N}}{\mathcal {C}}(N). \end{aligned}$$

We will call Eq. (2.8) for the case \(N=\infty \), i.e. with \({\mathcal {C}}\) replaced by \({\mathcal {C}}^\infty \), the “averaged equation for the infinite system of rotators”. Let \(\mathbb {R}^\infty \) (\(\mathbb {C}^\infty \)) be the space of real (complex) sequences provided with the Tikhonov topology.

Theorem 2.3

  1. (i)

    The averaged equation (2.8) is mixing.

  2. (ii)

    For the unique stationary measure \(\widetilde{\mu }^\varepsilon \) of (2.5), written in the action-angle coordinates, we have

    $$\begin{aligned} (\Pi _{ac}\times \Pi _{ang})_*\widetilde{\mu }^\varepsilon \rightharpoonup \pi \times \, d\varphi \quad \text{ as } \quad \varepsilon \rightarrow 0, \end{aligned}$$
    (2.11)

    where \(\pi \) is the unique stationary measure of the averaged equation (2.8). If the averaged equation for the infinite system of rotators has a unique stationary measure \(\pi ^\infty \) in the class of measures defined on the Borel \(\sigma \)-algebra \({\mathcal {B}}(\mathbb {R}^\infty )\) and satisfying \(\sup \limits _{j\in {\mathcal {C}}^\infty }\langle \pi ^\infty , I_j \rangle <\infty \), then the convergence (2.11) is uniform in \(N\).

  3. (iii)

    The vector of actions \(I^\varepsilon (\tau )\), written in the slow time, satisfies

    $$\begin{aligned} \lim \limits _{\tau \rightarrow \infty }\lim \limits _{\varepsilon \rightarrow 0} {\mathcal {D}}(I^\varepsilon (\tau ))=\lim \limits _{\varepsilon \rightarrow 0}\lim \limits _{\tau \rightarrow \infty } {\mathcal {D}}(I^\varepsilon (\tau ))=\pi . \end{aligned}$$
    (2.12)

We prove this theorem in Sect. 4.3. Each limiting point (as \(\varepsilon \rightarrow 0\)) of the family of measures \(\{\widetilde{\mu }^\varepsilon , 0<\varepsilon \le 1\}\) is an invariant measure of the system of uncoupled integrable rotators, corresponding to (2.1)\(|_{\varepsilon =0}\). The latter system has plenty of invariant measures; Theorem 2.3 ensures that only one of them is a limiting point, and identifies it.

Arguing as in the proof of Theorem 2.3, one can show that the averaged equation for the infinite system of rotators has a stationary measure belonging to the class of measures above, but we do not know whether it is unique. However, uniqueness can be proven if this equation is diagonal, and in this case the convergence (2.11) holds uniformly in \(N\). In particular, this happens when the dissipation is diagonal. For more examples see Sect. 4.4.
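As an illustration, consider the diagonal equation (2.10) with \(p=2\). The following is a standard one-dimensional Fokker–Planck computation, added here for the reader's convenience: the stationary measure factorizes over the nodes, and the density \(\pi _j\) of each factor satisfies

$$\begin{aligned} 0=-\partial _{I}\big ((-2I+{\mathcal {T}}_{j})\pi _j\big )+\partial ^2_{I}\big ({\mathcal {T}}_{j}I\,\pi _j\big ) \quad \Longrightarrow \quad \pi _j(dI)=\frac{2}{{\mathcal {T}}_{j}}\,e^{-2I/{\mathcal {T}}_{j}}\,dI, \end{aligned}$$

so under \(\pi \) the actions are independent and exponentially distributed with means \({\mathcal {T}}_{j}/2\).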

Let us briefly discuss some generalizations. Assume that the power \(p\) from assumptions HF, HG, Hg equals 2 (so that, in particular, the interaction potential has at most quadratic growth). Then, if the functions \(g_j(u)\) do not have dissipative properties (more precisely, if assumption Hg(ii) below is not satisfied), Theorems 2.2 and 4.8 hold true, but Theorem 2.3 fails.

Let us now suppose that some rotators are “defective”: there exists a region \({\mathcal {C}}_D\subset \mathbb {Z}^d\), independent of \(N\), such that for \(j\in {\mathcal {C}}_D\) the spins are not alternated. Then theorems similar to Theorems 2.2, 4.8 and 2.3 hold for the projections of the corresponding family of measures to the “non-defective” nodes \({\mathcal {C}}_{ND}:={\mathcal {C}}\setminus U({\mathcal {C}}_D)\), where \(U({\mathcal {C}}_D)\) denotes some neighbourhood of \({\mathcal {C}}_D\) in \(\mathbb {Z}^d\). Let us discuss the analogue of Theorem 2.2; for the other results the changes are similar. We show that any limiting (as \(\varepsilon \rightarrow 0\)) point \(Q^0\) of the family of measures \(\{{\mathcal {D}}\big ((I_j^\varepsilon (\cdot ))_{j\in {\mathcal {C}}_{ND}}\big ), \; 0<\varepsilon \le 1\}\) is a weak solution of the averaged equation (2.8)–(2.9) with \({\mathcal {C}}\) replaced by \(\hat{\mathcal {C}}_{ND}\), where \(\hat{\mathcal {C}}_{ND}={\mathcal {C}}_{ND}\setminus U(\partial {\mathcal {C}}_{ND})\) and \(U(\partial {\mathcal {C}}_{ND})\) is some neighbourhood of \(\partial {\mathcal {C}}_{ND}\). Thus, if for \(j\in \partial \hat{\mathcal {C}}_{ND}\) the function \({\mathcal {R}}_j(I)\) depends on \(I_k\) with \(k\in {\mathcal {C}}_{ND}\setminus \hat{\mathcal {C}}_{ND}\), then the averaged equation is not closed and we do not know whether its weak solution is unique. In this case we can say nothing about the uniqueness of the limiting point \(Q^0\) or about the uniformity of the convergence in \(N\). However, if the averaged equation is diagonal (for example, in the case of diagonal dissipation), then \({\mathcal {R}}_j(I)={\mathcal {R}}_j(I_j)\), so the averaged equation is closed and has a unique weak solution. In this case the projection \(\hat{Q}^0\) of the limiting point \(Q^0\) to \(\hat{\mathcal {C}}_{ND}\) is uniquely determined, and the convergence \({\mathcal {D}}\big ((I_j^\varepsilon (\cdot ))_{j\in \hat{\mathcal {C}}_{ND}}\big ) \rightharpoonup \hat{Q}^0\) as \(\varepsilon \rightarrow 0\) holds uniformly in \(N\). Thus, the defects have only a local influence on the limiting dynamics as \(\varepsilon \rightarrow 0\). For details see [14], Sect. 7.

2.3 Strategy

In this section we describe the main steps of proofs of Theorems 2.2 and 2.3.

First we need to obtain estimates for solutions of (2.5) which are uniform in \(\varepsilon \), \(N\) and time \(t\). For a general system of particles there is no reason why all the energy could not concentrate at a single position, forming a kind of delta-function as \(N\rightarrow \infty \). It is remarkable that in our system this does not happen, at least on time intervals of order \(1/\sqrt{\varepsilon }\), even without the alternated spins condition and in the absence of dissipation. One can prove this by working with the family of norms \(\Vert \cdot \Vert _{j,q}\) (see Agreement 6). But for a dissipative system with alternated spins the concentration of energy also does not happen as \(t\rightarrow \infty \). To see this, we first make one step of the perturbation theory. The alternated spins condition guarantees that the system does not have resonances of the first order. Then in Theorem 3.1 we find a global canonical change of variables in \(\mathbb {C}^N\), transforming \(u\rightarrow v,\,(I,\varphi )\rightarrow (J,\psi ),\) which is \(\sqrt{\varepsilon }\)-close to the identity uniformly in \(N\) and kills the term of order \(\sqrt{\varepsilon }\) in the Hamiltonian. We rewrite Eq. (2.5) in the new variables \(v\) and call the result the “\(v\)-equation” (see 3.5). Using the fact that in the new coordinates the interaction potential has the same size as the dissipation, and again working with the family of norms \(\Vert \cdot \Vert _{j,q}\), we obtain the desired estimates for solutions of the \(v\)-equation.

Then we pass to the limit \(\varepsilon \rightarrow 0\). In the action-angle coordinates \((J,\psi )\) the \(v\)-equation takes the form

$$\begin{aligned} d J&=X(J, \psi ,\varepsilon )\,d\tau + \sigma (J,\psi ,\varepsilon ) d \beta +\overline{\sigma }(J,\psi ,\varepsilon ) d\overline{\beta }, \end{aligned}$$
(2.13)
$$\begin{aligned} d\psi&=\varepsilon ^{-1}Y(J,\varepsilon )\,d\tau + \ldots , \end{aligned}$$
(2.14)

where the term \(\ldots \) and \(X,Y, \sigma \) are of order \(1\); for details see (4.2)–(4.3). So the angles rotate fast, while the actions change slowly. The averaging principle for systems of the type (2.13)–(2.14) was established in [16–18, 20] and, more recently, in [24, 25]. Our situation is similar to that in [24, 25], and we follow the scheme suggested there. Let \(v^\varepsilon (\tau )\) be a solution of the \(v\)-equation, written in the slow time, and let \(J^\varepsilon (\tau )=J(v^\varepsilon (\tau ))\) be the corresponding vector of actions. We prove Theorem 4.2, stating that the family of measures \({\mathcal {D}}(J^{\varepsilon }(\cdot ))\) converges weakly, as \(\varepsilon \rightarrow 0\), to the distribution of the unique weak solution of Eq. (2.13)\(|_{\varepsilon =0}\) averaged in the angles, which has the form (2.8). To prove that this convergence is uniform in \(N\), we use the uniformity of the estimates obtained above and the fact that the averaged equation for the infinite system of rotators has a unique weak solution. Since the change of variables is \(\sqrt{\varepsilon }\)-close to the identity, the limiting behaviours of the actions \(J^\varepsilon \) and \(I^\varepsilon \) as \(\varepsilon \rightarrow 0\) coincide, and we get Theorem 2.2. The averaged equation (2.8) does not feel the Hamiltonian interaction of the rotators since the averaging eliminates the Hamiltonian terms.

The averaged equation (2.8) is irregular: its dispersion matrix is not Lipschitz continuous. To study it we use the method of the effective equation, suggested in [23, 24] (in our case its application simplifies). The effective equation (see 4.35) is defined in the complex coordinates \(v=(v_k)_{k\in {\mathcal {C}}}\in \mathbb {C}^N\). If \(v(\tau )\) is its solution then the actions \(J(v(\tau ))\) form a weak solution of Eq. (2.8), and vice versa (see Proposition 4.9). The effective equation is well posed and mixing. This implies item (i) of Theorem 2.3. The proof of item (ii) is based on the averaging techniques developed in the proof of Theorem 2.2.
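For a rough numerical illustration of the averaged equation one does not need the effective equation: in the diagonal case (2.10) with \(p=2\), a simple Euler–Maruyama scheme with clipping at zero already behaves reasonably despite the non-Lipschitz dispersion \(\sqrt{2I_j{\mathcal {T}}_j}\). The sketch below is an assumption-laden illustration (chain length, step size and temperatures are ours), not part of the paper's argument.

```python
import numpy as np

# Euler-Maruyama sketch of the diagonal averaged equation (2.10) with p = 2:
#   dI_j = (-2 I_j + T_j) dtau + sqrt(2 I_j T_j) dbeta_j.
# The dispersion is not Lipschitz at I_j = 0, so we clip the actions at zero
# after each step -- a crude numerical substitute for the effective equation.

rng = np.random.default_rng(1)
N, dtau, T_slow = 16, 1e-4, 5.0
temps = np.ones(N)                       # temperatures T_j
I = np.full(N, 0.5)                      # initial actions

for _ in range(int(T_slow / dtau)):
    dW = rng.normal(scale=np.sqrt(dtau), size=N)
    I = I + (-2.0 * I + temps) * dtau + np.sqrt(2.0 * I * temps) * dW
    I = np.clip(I, 0.0, None)            # keep the actions nonnegative

# In the stationary regime each I_j is approximately exponential with mean
# T_j / 2 (cf. the explicit stationary density computed after Theorem 2.3).
print("empirical mean action:", round(float(I.mean()), 3), "(expected ~ 0.5)")
```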

Note that the convergence (2.11) is equivalent to

$$\begin{aligned} \widetilde{\mu }^\varepsilon \rightharpoonup m \quad \text{ as }\quad \varepsilon \rightarrow 0, \end{aligned}$$
(2.15)

where \(m\) is the unique stationary measure of the effective equation, see Remark 4.13. Item (iii) of Theorem 2.3 follows from the first two items and Theorem 2.2.

2.4 Agreements and Assumptions

Agreements

  1. (1)

    We refer to item 1 of Theorem 3.1 as Theorem 3.1(1), etc.

  2. (2)

    By \(C,C_1,C_2,\ldots \) we denote various positive constants and by \(C(b),C_1(b),\ldots \) positive constants which depend on the parameter \(b\). We do not indicate their dependence on the dimension \(d\), the power \(p\) and the time \(T\), which are fixed throughout the text, and we always indicate if they depend on the number of rotators \(N\), the times \(t,s,\tau ,\ldots \), the positions \(j,k,l,m,\ldots \in {\mathcal {C}}\) or the small parameter \(\varepsilon \). The constants \(C,C(b),\ldots \) may change from formula to formula.

  3. (3)

    Unless otherwise stated, assertions of the type “\(b\) is sufficiently close to \(c\)” and “\(b\) is sufficiently small/big” always presuppose estimates independent of \(N\), of the positions \(j,k,l,m,\ldots \in {\mathcal {C}}\) and of the times \(t,s,\tau ,\ldots \).

  4. (4)

    We use the notations \(b \wedge c:=\min (b,c)\) and \(b \vee c:=\max (b,c)\).

  5. (5)

    For vectors \(b=(b_k)\) and \(c=(c_k)\), \(b_k,c_k\in \mathbb {C}\), we denote

    $$\begin{aligned} \quad b\cdot c:=\sum \limits b_k\cdot c_k=\sum \limits \mathrm{Re }b_k \overline{c}_k. \end{aligned}$$
  6. (6)

    For \(1/2<\gamma <1\), \(j\in {\mathcal {C}}\) and \(q>0\) we introduce a family of scalar products and a family of norms on \(\mathbb {C}^N\) by

    $$\begin{aligned} ( u \cdot u^1)_j:&= \sum \limits _{k\in {\mathcal {C}}} \gamma ^{|k-j|}u_k\cdot u^1_k, \;\\ \Vert u\Vert _{j,q}^q:&= \sum \limits _{k\in {\mathcal {C}}}\gamma ^{|k-j|}|u_k|^q, \quad \text{ where } u=(u_k)_{k\in {\mathcal {C}}},\,u^1=(u^1_k)_{k\in {\mathcal {C}}}\in \mathbb {C}^N. \end{aligned}$$
  7. (7)

    For a metric space \(X\) by \({\mathcal {L}}_b(X)\) (\({\mathcal {L}}_{loc}(X)\)) we denote the space of bounded Lipschitz continuous (locally Lipschitz continuous) functions from \(X\) to \(\mathbb {R}\).

  8. (8)

    Convergence of measures is always understood in the weak sense.

  9. (9)

    We suppose \(\varepsilon \) to be sufficiently small where needed.

Assumptions

Here we formulate our assumptions. In Example 2.4 we give examples of functions \(F_j,G\) and \(g_j\) satisfying them.

Fix \(p\in \mathbb {N},\,p\ge 2\). Assume that there exists \(\varsigma >0\) such that the following holds.

HF (Alternated spins condition). For every \(j\in {\mathcal {C}}\) and some function \(f\) we have \(f_j=(-1)^{|j|} f\). The function \(f:(-\varsigma ,\infty )\mapsto \mathbb {R}_+\) is \(C^3\)-smooth and its derivative \(f'\) has only isolated zeros. Moreover, for any \(x \ge 0\) we have

$$\begin{aligned} f(x) \ge C(1+x^{p/2}) \quad \text{ and }\quad |f'(x)|x^{1/2}+|f''(x)|x + |f'''(x)|x^{3/2} \le Cf(x). \end{aligned}$$

HG. The function \(G:(-\varsigma ,\infty )\mapsto \mathbb {R}\) is \(C^4\)-smooth. Moreover, for any \(x\ge 0\) it satisfies

$$\begin{aligned} |G'(x)|x^{1/2}+ |G''(x)|x + |G'''(x)|x^{3/2}\le C(1+ x^{(p-1)/2}). \end{aligned}$$

Hg

  1. (i)

    The functions \(g_l:\mathbb {C}^N\mapsto \mathbb {C},\,l\in {\mathcal {C}},\) are \(C^2\)-smooth and depend on \(u=(u_k)_{k\in {\mathcal {C}}}\) only through \((u_k)_{k:|k-l|\le 1}\). For any \(u\in \mathbb {C}^N\) and \(l,m\in {\mathcal {C}}\) they satisfy

    $$\begin{aligned} |g_l(u)|,|\partial _{u_m} g_{l}(u)|,|\partial _{\overline{u}_m} g_{l}(u)|\le C\left( 1+\sum \limits _{k:|k-l|\le 1}|u_k|^{p-1}\right) , \end{aligned}$$

    while all the second derivatives are assumed to have at most a polynomial growth at infinity, which is uniform in \(l\in {\mathcal {C}}\).

  2. (ii)

    (Dissipative condition) There exists a constant \(C_g>0\), independent of \(N\), such that for any \(j\in {\mathcal {C}}\), any \(1/2<\gamma <1\) sufficiently close to one and any \((u_k)_{k\in {\mathcal {C}}}\in \mathbb {C}^N\) we have

    $$\begin{aligned} ( g(u)\cdot u )_j \le -C_g \Vert u\Vert ^p_{j,p} + C(\gamma ), \quad \text{ where }\quad g:=(g_{l})_{l\in {\mathcal {C}}}, \end{aligned}$$

and the scalar product \((\cdot )_j\) and the norm \(\Vert \cdot \Vert _{j,p}\) are defined in Agreement 6. Recall that they depend on \(\gamma \).

HI

  1. (i)

    For some constant \(\alpha _0>0\), independent of \(N\), and every \(j \in {\mathcal {C}}\) we have

    $$\begin{aligned} {\mathbf {E}}\,e^{\alpha _0 |u_{0j}|^2}\le C. \end{aligned}$$
  2. (ii)

    The initial conditions \(u_0=u_0^N\) agree in \(N\) in the sense that there exists a \(\mathbb {C}^\infty \)-valued random variable \(u_0^\infty =(u_{0j}^\infty )_{j\in {\mathcal {C}}^\infty }\) satisfying for any \(N\in \mathbb {N}\) the relation

    $$\begin{aligned} {\mathcal {D}}\left( (u_{0j}^N)_{j\in {\mathcal {C}}(N)}\right) ={\mathcal {D}}\left( (u^\infty _{0j})_{j\in {\mathcal {C}}(N)}\right) . \end{aligned}$$

In what follows, we suppose the assumptions above to hold.

Example 2.4

As an example of functions \(f\) and \(G\) satisfying conditions HF and HG, we propose \(f(x)=1+x^k\) for any \( \mathbb {N}\ni k \ge p/2\), and \(G(x)=\hat{G}(\sqrt{x+\varsigma })\) for any constant \(\varsigma >0\) and any \(C^4\)-smooth function \(\hat{G}:\mathbb {R}_{+}\mapsto \mathbb {R}\) satisfying

$$\begin{aligned} |\hat{G}'(x)|+|\hat{G}''(x)|+|\hat{G}'''(x)| \le C(1+x^{p-1})\quad \text{ for } \text{ all } x \ge \sqrt{\varsigma }. \end{aligned}$$
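For instance, the conditions of HF for \(f(x)=1+x^k\), \(k\ge p/2\), can be checked in one line:

$$\begin{aligned} f(x)=1+x^{k}\ge 1+x^{p/2} \quad \text{ and }\quad |f'(x)|x^{1/2}=kx^{k-1/2}\le k(1+x^{k})=kf(x) \quad \text{ for } \ x\ge 0, \end{aligned}$$

and the bounds on \(f''\) and \(f'''\) follow in the same way.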

The simplest example of functions \(g_l\) satisfying assumption Hg is the diagonal dissipation \(g_l(u)=-u_l|u_l|^{p-2}\). As an example of functions \(g_l\) providing a non-Hamiltonian interaction between the rotators, we propose \(g_l(u)=-u_l|u_l|^{p-2}+\widetilde{g}_l(u)\), where \(\widetilde{g}_l\) satisfies Hg(i) and \(|\widetilde{g}_l(u)| \le \widetilde{C}\sum \limits _{k:|k-l|\le 1} |u_k|^{p-1} + C\), where the constant \(\widetilde{C}\) satisfies \(\widetilde{C}<\frac{1}{8d(2d+1)^2}\).

For more examples see Sect. 4.4.

Remark 2.5

In the case \(a\ge 1\) assumptions HF and HG simplify.

HF’-HG’. The functions \(f_j,G:(-\varsigma ,\infty )\mapsto \mathbb {R}\) are \(C^1\)- and \(C^4\)-smooth respectively, \(f_j'\) have only isolated zeros, and \(|G'(x)|x^{1/2}\le C(1+x^{(p-1)/2})\) for any \(x\ge 0\).

3 Preliminaries

3.1 Norms

Since \(\sum \limits _{j\in {\mathcal {C}}} |u_j|^2\) is conserved by the Hamiltonian flow, it would be natural to work in the \(l_2\)-norm. However, the \(l_2\)-norm of a solution of (2.5) diverges as \(N\rightarrow \infty \). To overcome this difficulty and obtain estimates for the solution which are uniform in \(N\), we introduce a family of \(l_q\)-weighted norms with exponential decay: for each \(q>0\) and every \(j\in {\mathcal {C}}\), for \(v=(v_k)_{k\in {\mathcal {C}}}\in \mathbb {C}^N\) we set

$$\begin{aligned} \Vert v\Vert _{j,q}=\Big (\sum \limits _{k\in {\mathcal {C}}} \gamma ^{|k-j|} |v_k|^q\Big )^{1/q}, \quad \text{ where } \text{ the } \text{ constant } 1/2<\gamma <1 \text{ will } \text{ be } \text{ chosen } \text{ later. } \end{aligned}$$

Similar norms were considered, for example, in [10], Sect. 3.12. Define the family of \(l_2\)-weighted scalar products on \(\mathbb {C}^{N}\),

$$\begin{aligned} ( v^1 \cdot v^2 )_j= \sum \limits _{k\in {\mathcal {C}}} \gamma ^{|k-j|} v_k^1 \cdot v_k^2, \end{aligned}$$

corresponding to the norms \( \Vert v\Vert _j^2: = \Vert v\Vert _{j,2}^2= (v \cdot v )_j. \) It is easy to see that the Hölder inequality holds: for any \(m,n>0\) satisfying \(m^{-1}+n^{-1}=1\), we have

$$\begin{aligned} |( v^1\cdot v^2 )_j|\le \Vert v^1\Vert _{j,m}\Vert v^2\Vert _{j,n}. \end{aligned}$$
(3.1)

Moreover, since for any \(m\ge n\) we have \(|v_k|^n\le |v_k|^m +1 \), we get

$$\begin{aligned} \Vert v\Vert _{j,n}^{n}\le \Vert v\Vert _{j,m}^{m} + \sum \limits _{k\in {\mathcal {C}}}\gamma ^{|j-k|} \le \Vert v\Vert _{j,m}^{m} + C(\gamma ) \quad \text{ for } \quad m\ge n, \end{aligned}$$
(3.2)

where the constant \(C(\gamma )\) does not depend on \(N\) since the geometric series converges.
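The following small numerical illustration (our own, with an illustrative one-dimensional lattice and weight \(\gamma =0.9\)) shows the point of the weights \(\gamma ^{|k-j|}\): for data of order one per site, the weighted norm stays of order one as \(N\) grows, while the plain \(l_2\)-norm grows like \(\sqrt{N}\).

```python
import numpy as np

# Weighted norm ||v||_{j,q} with exponential weights gamma^{|k-j|} on a
# 1-d lattice. It stays bounded uniformly in N for order-one per-site data,
# since sum_k gamma^{|k-j|} converges; the plain l_2-norm does not.

def weighted_norm(v, j, q=2.0, gamma=0.9):
    k = np.arange(len(v))
    return np.sum(gamma ** np.abs(k - j) * np.abs(v) ** q) ** (1.0 / q)

rng = np.random.default_rng(2)
for N in (10, 100, 1000, 10000):
    v = rng.normal(size=N) + 1j * rng.normal(size=N)
    print(N, round(weighted_norm(v, j=N // 2), 2), round(float(np.linalg.norm(v)), 2))
```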

3.2 The Change of Variables

Consider the complex variables \(v=(v_j)_{j\in {\mathcal {C}}}\in \mathbb {C}^N\) and the corresponding vectors of actions and angles \((J,\psi )\in \mathbb {R}^N_{+0}\times \mathbb {T}^N.\) Define the vector \(B:=(\beta ,\overline{\beta })^T\in \mathbb {C}^{2N}\), where \(\beta \) is a complex \(N\)-dimensional Brownian motion as before and \(T\) denotes transposition. Recall that \(\langle \cdot \rangle \) denotes the averaging in the angles; see the Appendix for its properties. Let \(\nabla :=(\nabla _j)_{j\in {\mathcal {C}}}\) and \(g:=(g_j)_{j\in {\mathcal {C}}}\).

Theorem 3.1

There exists a \(C^2\)-smooth canonical change of variables of \(\mathbb {C}^N\), \(\sqrt{\varepsilon }\)-close to the identity, transforming \(u\rightarrow v,\,(I,\varphi )\rightarrow (J,\psi ),\) such that the Hamiltonian \(H^\varepsilon \) in the new coordinates takes the form

$$\begin{aligned} {\mathcal {H}}^\varepsilon (J,\psi ) =H^\varepsilon _0(J) + \varepsilon H_2(J,\psi ) + \varepsilon \sqrt{\varepsilon }H^\varepsilon _> (J,\psi ), \end{aligned}$$
(3.3)

where

$$\begin{aligned} H^\varepsilon _0(v)=\frac{1}{2}\sum \limits _{j\in {\mathcal {C}}} F_j \big (|v_j|^2\big )+ \frac{\sqrt{\varepsilon }}{4}\sum \limits _{|j-k|=1} \big \langle G(|v_j-v_k|^2)\big \rangle \end{aligned}$$
(3.4)

is \(C^4\)-smooth and the functions \(H_2(v)\) and \(H_>^\varepsilon (v)\) are \(C^2\)-smooth. System (2.5)–(2.6) written in \(v\)-variables has the form

$$\begin{aligned} \dot{v}&=i\nabla H^\varepsilon _0(v) + \varepsilon i \nabla H_2(v) + \varepsilon g(v) + \varepsilon \sqrt{\varepsilon }r^\varepsilon (v) + \sqrt{\varepsilon }W^\varepsilon (v)\dot{B}, \end{aligned}$$
(3.5)
$$\begin{aligned} v(0)&=v(u_{0})=:v_{0}, \end{aligned}$$
(3.6)

where \(r^\varepsilon =(r^\varepsilon _j)_{j\in {\mathcal {C}}}:\mathbb {C}^N\mapsto \mathbb {C}^N\) is a continuous vector-function and \(W^\varepsilon \) is the new dispersion matrix. The latter has size \(N\times 2N\) and consists of two blocks, \(W^\varepsilon =(W^{\varepsilon 1},W^{\varepsilon 2}),\) so that \(W^\varepsilon \dot{B}=W^{\varepsilon 1} \dot{\beta }+W^{\varepsilon 2} \dot{\overline{\beta }}.\) The blocks have the form \(W^{\varepsilon 1,2}=(W^{\varepsilon 1,2}_{kl})_{k,l\in {\mathcal {C}}}\), where \(W_{kl}^{\varepsilon 1}=\sqrt{{\mathcal {T}}_l}\partial _{u_l}v_k\), \(W_{kl}^{\varepsilon 2}=\sqrt{{\mathcal {T}}_l}\partial _{\overline{u}_l}v_k\). Moreover, for any \(j\in {\mathcal {C}}\) and \(1/2 <\gamma < 1\) we have

1. \(|( i\nabla H_ 2\cdot v)_j| \le (1-\gamma )C\Vert v\Vert _{j,p}^p + C(\gamma )\).

2. a. \(\nabla _{j} H_2\) depends only on \(v_n\) such that \(|n-j|\le 2\), and \(|\nabla _{j} H_2|\le C\sum \limits _{n:|n-j|\le 2}|v_n|^{p-1}+C.\)

     b. For any \(q\ge 1\) we have \(\Vert r^\varepsilon \Vert _{j,q}^{q}\le C(\gamma ,q)+C(q)\Vert v\Vert _{j,q(p-1)}^{q(p-1)}.\)

3. The functions \(d_{kl}^{1,2}\), defined as in (5.33), satisfy \(|d^{1}_{kl}-\delta _{kl}{\mathcal {T}}_k|, \,|d^{2}_{kl}|\le C\sqrt{\varepsilon }\) for all \(k,l\in {\mathcal {C}}\).

4. We have \(|u_j-v_j|\le C\sqrt{\varepsilon }\) and \(|I_j-J_j|\le C\sqrt{\varepsilon }\).

Further on we will usually skip the upper index \(\varepsilon .\) If \(\gamma =1\), then the norm \(\Vert u\Vert _j=\big (\sum \limits _{k\in {\mathcal {C}}} |u_k|^2\big )^{1/2}\) is a first integral of the Hamiltonian \(H^\varepsilon \). Consequently, the norm \(\Vert u\Vert _j\) with \(\gamma \) close to one is an approximate integral of the Hamiltonian flow. Item 1 of Theorem 3.1 means that the change of variables preserves this property to order \(\varepsilon \), modulo the constant \(C(\gamma )\). This is crucial for deriving uniform in \(N\) estimates for solutions of (3.5).

In Eq. (2.5) all functions, except the rotating nonlinearity \(if_j(|u_j|^2)u_j\), have at most a polynomial growth of power \(p-1\). Item 2 affirms, in particular, that this property is preserved by the transformation.

The proof of the theorem is technically rather involved. Since the potential \(G\) is not a differentiable function of the actions, we have to work in the \(v\)-coordinates, even though the transformation is constructed in the action-angle variables. This raises some difficulties, since the derivative of \(\psi _j\) with respect to \(v_j\) has a singularity at \(v_j=0\). Moreover, we have to work in the rather inconvenient norms \(\Vert \cdot \Vert _{j,q}\) and estimate not only Poisson brackets, but also the non-Hamiltonian terms of the \(v\)-equation. A sketch of the proof is given in Sect. 5.3; for the complete proof see [14], Section 6.

Let us briefly explain why the alternated spins condition HF guarantees that system (2.5) does not have resonances of the first order. Writing equation (2.5) in the action-angle coordinates, we find that the angles satisfy \(\dot{\varphi }_j\thicksim f_j(|u_j|^2)\), \(j\in {\mathcal {C}}\). It is not difficult to see that the interaction potential \(G(|u_j-u_k|^2)\) depends on the angles only through the difference \(\varphi _j-\varphi _k\), see (5.28). Due to assumption HF, the corresponding combination of rotation frequencies is separated from zero: indeed, for \(|j-k|=1\) we have \(f_j-f_k=2(-1)^{|j|}f\), where we recall that the function \(f\) is assumed to be strictly positive.

3.3 Estimates for Solution

System (3.5)–(3.6) has a unique solution since system (2.5)–(2.6) does. Let us denote it by \(v(t)=(v_k(t))_{k\in {\mathcal {C}}}\).

Lemma 3.2

For any \(1/2<\gamma <1\) sufficiently close to one there exists \(\alpha =\alpha (\gamma )>0\) such that for all \(j\in {\mathcal {C}}\), \(t\ge 0\) and \(\varepsilon \) sufficiently small we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} e^{\alpha \Vert v(s)\Vert _j^2} < C(\gamma ). \end{aligned}$$
(3.7)

Let us emphasize that estimate (3.7) holds uniformly in \(N,j,t\) and \(\varepsilon \) sufficiently small.

Corollary 3.3

There exists \(\alpha >0\) such that for any \(m>0\), \(t\ge 0\), \(j\in {\mathcal {C}}\) and \(\varepsilon \) sufficiently small we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} e^{\alpha |v_j(s)|^2} < C,\quad {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} |r_j(v(s))|^m < C(m), \end{aligned}$$

where \(r=(r_j)_{j\in {\mathcal {C}}}\) is the remainder in (3.5).

Proof of Corollary 3.3

Fix any \(\gamma \) and \(\alpha \) such that (3.7) holds true. By the definition of \(\Vert \cdot \Vert ^2_j\) we have \(|v_j|^2 \le \Vert v\Vert ^2_j\), so Lemma 3.2 implies the first inequality. Let us prove the second one. Without loss of generality we assume that \(m\ge 2\). Theorem 3.1(2b) implies

$$\begin{aligned} |r_j|^m\le \Vert r\Vert _{j,m}^m\le C(\gamma ,m) + C(m)\Vert v\Vert _{j,m(p-1)}^{m(p-1)} \le C(\gamma ,m) + C(m,\kappa ) e^{\kappa \Vert v\Vert _{j,m(p-1)}^2} \end{aligned}$$
(3.8)

for any \(\kappa >0\). Using that \(2/(m(p-1))\le 1\) and the Jensen inequality, we get

$$\begin{aligned} e^{\kappa \Vert v\Vert _{j,m(p-1)}^2}\le e^{\kappa \sum \limits _{k\in {\mathcal {C}}}\gamma ^{\frac{2|j-k|}{m(p-1)}}|v_k|^2}\le \sum \limits _{k\in {\mathcal {C}}}\gamma ^{\frac{2|j-k|}{m(p-1)}}(C(\gamma ))^{-1} e^{\kappa C(\gamma ) |v_k|^2}, \end{aligned}$$
(3.9)

where \(C(\gamma )=\sum \limits _{k\in {\mathcal {C}}}\gamma ^{\frac{2|j-k|}{m(p-1)}}\). Choosing \(\kappa \) in such a way that \(\kappa C(\gamma )\le \alpha \) and combining (3.8), (3.9) and the first estimate of the corollary, we get the desired inequality. \(\square \)

Proof of Lemma 3.2

Step 1. Take some \(1/2<\gamma <1\) and \(0<\alpha _1<1\). Further on we present only a formal computation, which can be justified by standard stopping-time arguments (see, e.g., [19]). Applying the Itô formula in complex coordinates (see Appendix) to \(e^{\alpha _1 \Vert v\Vert ^2_j}\) and noting that \(i\nabla _{j} H_0\cdot v_j =0\) since \(H_0\) depends on \(v\) only through \(J(v)\), we get

$$\begin{aligned} \frac{d}{ds}e^{\alpha _1 \Vert v(s)\Vert ^2_j}&= 2\alpha _1 \varepsilon e^{\alpha _1 \Vert v\Vert ^2_j} \Big ( ( i\nabla H_2\cdot v )_j + ( g\cdot v )_j + \sqrt{\varepsilon }( r\cdot v )_j + \sum \limits _{k\in {\mathcal {C}}} \gamma ^{|j-k|}d_{kk}^1 \nonumber \\&\qquad +\, \alpha _1 \sum \limits _{k,l\in {\mathcal {C}}} \gamma ^{|j-k|+|j-l|} \big (v_k \overline{v}_l d^1_{kl} + \mathrm{Re }(\overline{v}_k \overline{v}_l d^2_{kl}) \big ) \Big ) + 2\alpha _1\sqrt{\varepsilon }\dot{M}_s,\qquad \end{aligned}$$
(3.10)

where we recall that \(d^{1,2}_{kl}\) are calculated in (5.33), and the martingale

$$\begin{aligned} M_s:=\int \limits _{s_0}^s e^{\alpha _1 \Vert v\Vert ^2_j}( v \cdot WdB )_j \quad \text{ for } \text{ some } \quad s_0<s. \end{aligned}$$
(3.11)

First we estimate \( ( r \cdot v)_j\). Theorem 3.1(2b) implies

$$\begin{aligned} \Vert r\Vert _{j,p/(p-1)} \le \big ( C \Vert v\Vert _{j,p}^p+C(\gamma )\big )^{(p-1)/p}\le C_1 \Vert v\Vert _{j,p}^{p-1} + C_1(\gamma ). \end{aligned}$$

Then the Hölder inequality (3.1) with \(m=p/(p-1)\) and \(n=p\), jointly with (3.2), implies

$$\begin{aligned} |(r\cdot v )_j|\le \Vert r\Vert _{j,p/(p-1)}\Vert v\Vert _{j,p}\le C_1\Vert v\Vert _{j,p}^p + C_1(\gamma ) \Vert v\Vert _{j,p} \le C_2(\gamma )(\Vert v\Vert _{j,p}^p + 1). \end{aligned}$$
(3.12)

Secondly, we estimate the Itô term. By Theorem 3.1(3) we get

$$\begin{aligned} \left| \sum \limits _{k\in {\mathcal {C}}} \gamma ^{|j-k|}d_{kk}^1\right| \le C(\gamma ). \end{aligned}$$
(3.13)

Note that

$$\begin{aligned} \sum \limits _{k,l\in {\mathcal {C}}} \gamma ^{|j-k|+|j-l|} |v_k| |v_l| \le \sum \limits _{k,l\in {\mathcal {C}}} \gamma ^{|j-k|+|j-l|} (|v_k|^2 + |v_l|^2) \le C(\gamma )\Vert v\Vert ^2_j. \end{aligned}$$

Consequently, due to Theorem 3.1(3), we have

$$\begin{aligned} \left| \sum \limits _{k,l\in {\mathcal {C}}} \gamma ^{|j-k|+|j-l|}\big (v_k \overline{v}_l d^1_{kl} + \mathrm{Re }(\overline{v}_k \overline{v}_l d^2_{kl}) \big )\right|&\le \sum \limits _{k\in {\mathcal {C}}} \gamma ^{2|j-k|}{\mathcal {T}}_k|v_k|^2 + \sqrt{\varepsilon }C(\gamma ) \Vert v\Vert _j^2\nonumber \\&\le \big (C+ \sqrt{\varepsilon }C_1(\gamma )\big )\Vert v\Vert ^2_j \nonumber \\&\le \big (C+ \sqrt{\varepsilon }C_1(\gamma )\big )\Vert v\Vert ^p_{j,p}+C_2(\gamma ), \end{aligned}$$
(3.14)

where we have used (3.2). Now Theorem 3.1(1), assumption Hg(ii), (3.12), (3.13) and (3.14), applied to (3.10), imply that for \(\gamma \) sufficiently close to one we have

$$\begin{aligned} \frac{d}{ds}e^{\alpha _1 \Vert v\Vert ^2_j} \le 2\alpha _1\varepsilon e^{\alpha _1 \Vert v\Vert ^2_j} \Big ( -\big (C_{g}-(1-\gamma )C-\alpha _1 C-\sqrt{\varepsilon }C(\gamma ) \big )\Vert v\Vert ^p_{j,p} + C_1(\gamma ) \Big ) + 2\alpha _1\sqrt{\varepsilon }\dot{M}_s. \end{aligned}$$
(3.15)

We take \(1/2<\gamma <1\) sufficiently close to one, then choose \(\alpha _1(\gamma )>0\) and \(\varepsilon _0(\gamma )>0\), sufficiently small, in such a way that

$$\begin{aligned} \Delta :=C_{g}-(1-\gamma )C- \alpha _1 C-\sqrt{\varepsilon _0}\, C(\gamma ) > 0. \end{aligned}$$
(3.16)

For any constant \(C\) there exists a constant \(C_1\) such that for all \(x\ge 0\) we have

$$\begin{aligned} 2\alpha _1 e^{\alpha _1 x}(-\Delta x + C) \le - e^{\alpha _1 x} + C_1. \end{aligned}$$

Consequently, (3.15) jointly with (3.2) implies that for \(\varepsilon <\varepsilon _0\) we have

$$\begin{aligned} \frac{d}{ds}e^{\alpha _1 \Vert v(s)\Vert ^2_j} \le -\varepsilon e^{\alpha _1 \Vert v(s)\Vert ^2_j} + \varepsilon C(\gamma )+ 2\alpha _1\sqrt{\varepsilon }\dot{M}_s. \end{aligned}$$
(3.17)

Fixing \(s_0=0\) (see (3.11)), taking the expectation and applying the Gronwall–Bellman inequality to (3.17), we obtain

$$\begin{aligned} {\mathbf {E}}\,e^{\alpha _1 \Vert v(s)\Vert ^2_j} \le {\mathbf {E}}\,e^{\alpha _1 \Vert v_0\Vert ^2_j} e^{-\varepsilon s} + C(\gamma ). \end{aligned}$$

Due to assumption HI(i) and Theorem 3.1, we have \( {\mathbf {E}}\,e^{\alpha _1 |v_{0j}|^2} \le C\) for all \(j\in {\mathcal {C}}\). Then the Jensen inequality implies that \( {\mathbf {E}}\,e^{\alpha _1 \Vert v_0\Vert ^2_j} \le C(\gamma ),\) if \(\alpha _1\) is sufficiently small. Thus we obtain

$$\begin{aligned} {\mathbf {E}}\,e^{\alpha _1 \Vert v(s)\Vert _j^2}\le C(\gamma ) \quad \text{ for } \text{ all } s\ge 0 \text{ and } j\in {\mathcal {C}}. \end{aligned}$$
(3.18)

Step 2. We fix the parameters \(\gamma \) and \(\alpha _1\) as above. Accordingly, constants which depend only on them will be denoted simply by \(C,C_1,\ldots \).

Now we prove (3.7). Take any \(0<\alpha <\alpha _1/2\) and fix \(s_0=t\). Integrating inequality (3.17), with \(\alpha _1\) replaced by \(\alpha \), over the interval \( t \le s\le t+1/\varepsilon \) and using (3.18), we obtain

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} e^{\alpha \Vert v(s)\Vert ^2_j}&\le {\mathbf {E}}\,e^{\alpha \Vert v(t)\Vert ^2_j} + C+ 2\alpha \sqrt{\varepsilon }\,{\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} M_s \nonumber \\&\le C_1+ 2\alpha \sqrt{\varepsilon }\,{\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} M_s. \end{aligned}$$
(3.19)

Now we turn to the martingale part. The definition of \(M_s\) implies

$$\begin{aligned} \sup \limits _{s\in [t,t+1/\varepsilon ]} M_s \le \sum \limits _{k\in {\mathcal {C}}} \sup \limits _{s\in [t,t+1/\varepsilon ]} M_{ks}, \end{aligned}$$

where \( M_{ks} = \int \limits _{t}^s e^{\alpha \Vert v\Vert ^2_j}\gamma ^{|j-k|}v_k\cdot (WdB)_k\). The Doob–Kolmogorov inequality implies that

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} M_{ks} \le C {\mathbf {E}}\,\sqrt{[ M_k]_{t+1/\varepsilon }} \le C \sqrt{{\mathbf {E}}\,[ M_k]_{t+1/\varepsilon } }, \end{aligned}$$

where \([ M_k]_s\) denotes the quadratic variation of \(M_{ks}\). Similarly to (5.36), we obtain

$$\begin{aligned} [ M_k]_{t+1/\varepsilon } = \int \limits _{t}^{t+1/\varepsilon } e^{2\alpha \Vert v\Vert ^2_j} \gamma ^{2|j-k|} S^J_{kk}\, ds\le C(\kappa )\gamma ^{|j-k|}\int \limits _{t}^{t+1/\varepsilon } e^{2(\alpha +\kappa ) \Vert v\Vert ^2_j} \left( |d_{kk}^1|+|d_{kk}^2|\right) \, ds \end{aligned}$$

for any \(\kappa >0\), where \(S_{kk}^J\) is defined in (5.35). Take \(0<\kappa <\alpha _1/2-\alpha \). Then, using Theorem 3.1(3) and (3.18), we get

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{s\in [t,t+1/\varepsilon ]} M_s \!\le \! C\sum \limits _{k\in {\mathcal {C}}} \sqrt{{\mathbf {E}}\,[ M_k]_{t+1/\varepsilon } }&\le C(\kappa )\sum \limits _{k\in {\mathcal {C}}} \gamma ^{|j-k|/2}\left( \int \limits _{t}^{t+1/\varepsilon } {\mathbf {E}}\,e^{2(\alpha +\kappa ) \Vert v\Vert ^2_j} \, ds \right) ^{1/2}\\&\le \frac{C_1(\kappa )}{\sqrt{\varepsilon }}. \end{aligned}$$

Now (3.7) follows from (3.19). \(\square \)

4 The Limiting Dynamics

In this section we investigate the limiting (as \(\varepsilon \rightarrow 0\)) behaviour of system (2.5). We prove Theorems 4.6, 4.8 and 2.3, which are our main results.

4.1 Averaged Equation

Here we prove Theorem 4.6, which describes the limiting dynamics of actions on long time intervals of order \(\varepsilon ^{-1}\). In the slow time \(\tau =\varepsilon t\) system (3.5)–(3.6) has the form

$$\begin{aligned} d v_j=( \varepsilon ^{-1} i\nabla _{j} H_0 + i \nabla _{j} H_2 + g_j + \sqrt{\varepsilon }r_j )\,d\tau + ( W d B)_j, \quad v_j(0)=v_{0j},\quad j\in {\mathcal {C}}. \end{aligned}$$
(4.1)

Let us write Eq. (4.1) in the action-angle variables \(J=J(v),\psi =\psi (v)\). Due to (5.34) and the equalities \(i\nabla _j H_0\cdot v_j=0\) and \(i\nabla _j H_0\cdot \frac{iv_j}{|v_j|^2}=\partial _{J_j}H_0\), we have

$$\begin{aligned} dJ_j&= A_j^J\,d\tau + v_j\cdot ( W d B)_j, \end{aligned}$$
(4.2)
$$\begin{aligned} d \psi _j&= \left( \varepsilon ^{-1}\frac{\partial H_0}{\partial J_j} +\frac{A_j^\psi }{|v_j|^2} \right) \,d\tau + \frac{ iv_j}{|v_j|^2}\cdot ( W d B)_j, \quad j\in {\mathcal {C}}, \end{aligned}$$
(4.3)

where

$$\begin{aligned} A_j^J:=A_j\cdot v_j+ d_{jj}^1, \quad A_j^\psi := A_j\cdot (iv_j) -\mathrm{Im }(\overline{v}_j v_j^{-1} d^2_{jj}), \quad A_j:= i \nabla _{j} H_2 +g_j + \sqrt{\varepsilon }r_j, \end{aligned}$$
(4.4)

and \(d^{1,2}_{jj}\) are calculated in (5.33). In view of (3.4), Proposition 5.9 implies that

$$\begin{aligned} \text{for each } j\in {\mathcal {C}} \text{ the function } \partial _{J_j}H_0 \text{ is } C^1\text{-smooth with respect to } J=(J_k)_{k\in {\mathcal {C}}}. \end{aligned}$$
(4.5)

Theorem 3.1(2a), (3) jointly with Corollary 3.3 imply that for all \(j,k,l\in {\mathcal {C}}\) and every \(m>0\) we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{0\le \tau \le T} \big ( |A_j| + |A^J_j| + |A^\psi _j| + |S^J_{kl}|\big )^m \le C(m), \end{aligned}$$
(4.6)

where \(S^J_{kl}\) is the element of the diffusion matrix for Eq. (4.2) with respect to the real Brownian motion; it is calculated in (5.35).

Note that the quadratic variations of the martingales on the r.h.s. of (4.2) and (4.3) are calculated in (5.36).

Let \(v^{\varepsilon }(\tau )\) be a solution of (4.1). Then \(J^{\varepsilon }(\tau ):=J(v^\varepsilon (\tau ))\) and \(\psi ^{\varepsilon }(\tau ):=\psi (v^\varepsilon (\tau ))\) satisfy (4.2)–(4.3). Due to estimate (4.6) and the slow equation (4.2), using the Arzelà–Ascoli theorem, we get

Proposition 4.1

The family of measures \(\{{\mathcal {D}}(J^{\varepsilon }(\cdot )), \; 0<\varepsilon \le 1\}\) is tight on \(C([0,T],\, \mathbb {R}^N)\).

Let \(Q_0\) be a weak limiting point of \({\mathcal {D}}(J^{\varepsilon }(\cdot ))\):

$$\begin{aligned} {\mathcal {D}}(J^{\varepsilon _k}(\cdot )) \rightharpoonup Q_0 \quad \text{ as }\quad k\rightarrow \infty \quad \text{ on }\quad C([0,T],\, \mathbb {R}^N), \end{aligned}$$
(4.7)

where \(\varepsilon _k\rightarrow 0\) as \(k\rightarrow \infty \) is a suitable sequence. We are now going to show that the limiting point \(Q_0\) does not depend on the sequence \((\varepsilon _k)\) and is governed by the main order in \(\varepsilon \) of the averaging of Eq. (4.2). Let us begin by writing down this equation. Since by Theorem 3.1(3) we have \(d_{jj}^1={\mathcal {T}}_j+O(\sqrt{\varepsilon })\), the main order of the drift of Eq. (4.2) is \( i\nabla _{j} H_2 \cdot v_{j} + g_j(v)\cdot v_j +{\mathcal {T}}_{j}.\) Since for any real-valued \(C^1\)-smooth function \(h(v)\) we have \(i\nabla _j h\cdot v_j=-\partial _{\psi _j} h\), the periodicity of the function \(h\) with respect to \(\psi _j\) implies \(\langle i\nabla _{j} h \cdot v_{j} \rangle =0\); in particular, \(\langle i\nabla _{j} H_2 \cdot v_{j} \rangle =0\). Thus the main order of the averaged drift takes the form

$$\begin{aligned} \langle i\nabla _{j} H_2 \cdot v_{j}+ g_j(v)\cdot v_j +{\mathcal {T}}_{j}\rangle ={\mathcal {R}}_{j}(J) +{\mathcal {T}}_{j}, \end{aligned}$$
(4.8)

where \({\mathcal {R}}_j\) is defined in (2.7). Proposition 5.8 jointly with Theorem 3.1(3) implies that the main order of the diffusion matrix of (4.2) with respect to the real Brownian motion \((\mathrm{Re }\beta _k,\mathrm{Im }\beta _k)_k\) is \(\mathrm{diag }({\mathcal {T}}_{k}|v_{k}|^2)_{k\in {\mathcal {C}}}=\mathrm{diag }(2{\mathcal {T}}_{k}J_{k})_{k\in {\mathcal {C}}}.\) It does not depend on angles, so the averaging does not change it. Choose its square root as \(\mathrm{diag }(\sqrt{2{\mathcal {T}}_{k}J_{k}})_{k\in {\mathcal {C}}}\). Then in the main order the averaging of Eq. (4.2) takes the form

$$\begin{aligned} d J_{j} = ({\mathcal {R}}_{j}(J)+{\mathcal {T}}_{j}) \,d\tau + \sqrt{2J_{j}{\mathcal {T}}_{j}}\,d\widetilde{\beta }_{j}, \quad {j}\in {\mathcal {C}}, \end{aligned}$$
(4.9)

where \(\widetilde{\beta }_{j}\) are independent standard real Brownian motions. The averaged equation (4.9) has a weak singularity: its dispersion matrix is not Lipschitz continuous. However, its drift is regular: Proposition 5.9 implies that

$$\begin{aligned} \text{for each } j\in {\mathcal {C}} \text{ the function } {\mathcal {R}}_j \text{ is } C^1\text{-smooth with respect to } J=(J_k)_{k\in {\mathcal {C}}}. \end{aligned}$$
(4.10)

Theorem 4.2

The measure \(Q_0\) is a law of the process \(J^{0}(\cdot )\) which is the unique weak solution of the averaged equation (4.9) with the initial condition \({\mathcal {D}}(J(0))={\mathcal {D}}(I(u_0))\). Moreover,

$$\begin{aligned} {\mathcal {D}}(J^{\varepsilon }(\cdot )) \rightharpoonup {\mathcal {D}}(J^0(\cdot )) \quad \text{ as }\quad \varepsilon \rightarrow 0 \quad \text{ on }\quad C([0,T],\mathbb {R}^N). \end{aligned}$$
(4.11)

This convergence is uniform in \(N\). For all \(j\in {\mathcal {C}}\) we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{\tau \in [0,T]} e^{2\alpha J^0_j(\tau )} < C \quad \text{ and }\quad \int \limits _0^T {\mathbf {P}}\,(J_j^0(\tau )<\delta )\, d\tau \rightarrow 0 \text{ as } \delta \rightarrow 0, \end{aligned}$$
(4.12)

where the latter convergence is uniform in \(N\).

Proof

The proof of convergence (4.11) follows a scheme suggested in [24, 25], while the latter works use the averaging method developed in [18, 20]. The main difficulties of our situation compared to [20] are similar to those in [24, 25] and manifest themselves in the proof of Lemma 4.4 below. Equation (4.3) has a singularity when \(J_k=0\), and for \(J\) such that the rotating frequencies \(\partial _{J_j}H_0\) are rationally dependent, system (4.2)–(4.3) enters a resonant regime. To overcome these difficulties we note that singularities and resonances have Lebesgue measure zero and prove the following lemma, which affirms that the probability that the actions \(J^{\varepsilon }\) spend a long time in a set of small Lebesgue measure is small. A similar idea was used in [16, 17], where the stochastic averaging principle was established for different systems with weak resonances ([16, 17]) and singularities ([17]). See also [18], Chapters 9.2 and 9.3.

$$\begin{aligned}&\text{Let } \Lambda \subset \mathbb {Z}^d \text{ be independent from } N \text{ and satisfy } \Lambda \subset {\mathcal {C}}(N) \text{ for } N\ge N_\Lambda . \text{ Denote by } M \nonumber \\&\text{the number of nodes in } \Lambda . \text{ Further on we assume that } N\ge N_\Lambda . \end{aligned}$$
(4.13)

Lemma 4.3

Let \({\mathcal {J}}^{\varepsilon }:=(J^{\varepsilon }_k)_{k\in \Lambda }\) and let a set \(E^\varepsilon \subset \mathbb {R}^M_{+0}\) be such that its Lebesgue measure \(|E^\varepsilon |\rightarrow 0\) as \(\varepsilon \rightarrow 0\). Then

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,\big ( {\mathcal {J}}^{\varepsilon }(\tau ) \in E^\varepsilon \big ) \, d\tau \rightarrow 0 \quad \text{ as }\quad \varepsilon \rightarrow 0 \quad \text{ uniformly } \text{ in } \text{ N. } \end{aligned}$$
(4.14)

The proof of Lemma 4.3 is based on Krylov’s estimates (see [22]) and the concept of local time. It follows a scheme suggested in [32] (see also [26], Section 5.2.2).

Another difficulty, which is the principal difference between our case and those of all works mentioned above, is that we need to establish the uniformity in \(N\) of the convergence (4.11). For this purpose we use the uniformity of estimates and convergences of Corollary 3.3 and Lemmas 4.3, 4.4, and the fact that the averaged equation for the infinite system of rotators has a unique weak solution (in a suitable class).

Now let us formulate the following averaging lemma, which is the main tool in the proof of the theorem.

Lemma 4.4

Take a function \(P\in {\mathcal {L}}_{loc}(\mathbb {C}^N)\) which depends on \(v=(v_j)_{j\in {\mathcal {C}}}\in \mathbb {C}^N\) only through \((v_j)_{j\in \Lambda }\in \mathbb {C}^M\). Let it have at most polynomial growth at infinity. Then, writing \(P(v)\) in the action-angle coordinates \(P(v)=P(J,\psi )\), we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{\tau \in [0,T]} \left| \int \limits _0^\tau P(J^{\varepsilon }(s), \psi ^{\varepsilon }(s)) - \langle P \rangle (J^{\varepsilon }(s)) \, ds \right| \rightarrow 0 \text{ as } \varepsilon \rightarrow 0 \text{ uniformly } \text{ in } \ N. \end{aligned}$$

Similarly one can prove that

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{\tau \in [0,T]} \left| \int \limits _0^\tau P(J^{\varepsilon }(s), \psi ^{\varepsilon }(s)) - \langle P \rangle (J^{\varepsilon }(s)) \, ds \right| ^2 \rightarrow 0 \text{ as } \varepsilon \rightarrow 0 \text{ uniformly } \text{ in } \text{ N. } \end{aligned}$$
(4.15)

We establish Lemmas 4.3 and 4.4 in Sect. 5.

Now we will prove that \(Q_0\) is a law of a weak solution of (4.9). It suffices to show (see [19], Chapter 5.4) that for any \(j,k,l\in {\mathcal {C}}\) the processes

$$\begin{aligned} Z_{j}(\tau ):= J_{j}(\tau ) - \int \limits _0^{\tau } ({\mathcal {R}}_{j}(J(s))+{\mathcal {T}}_{j}) \, ds, \quad Z_{k}(\tau ) Z_{l}(\tau ) - 2\delta _{kl} {\mathcal {T}}_{k}\int \limits _0^{\tau } J_{k}(s) \, ds \end{aligned}$$
(4.16)

are square-integrable martingales with respect to the measure \(Q_0\) and the natural filtration of \(\sigma \)-algebras in \(C([0,T], \mathbb {R}^N)\). We establish this for the first process; for the second one the proof is similar, except that it uses (4.15), which is not needed for the first. Consider the process

$$\begin{aligned} K_{j}^{\varepsilon _k}(\tau ):=J_{j}^{\varepsilon _k}(\tau )- \int \limits _0^\tau ({\mathcal {R}}_{j}(J^{\varepsilon _k}(s)) +{\mathcal {T}}_{j})\, ds. \end{aligned}$$
(4.17)

Then, according to (4.2),

$$\begin{aligned} K_{j}^{\varepsilon _k}(\tau )=M_{j}^{\varepsilon _k}(\tau ) + \Theta _{j}^{\varepsilon _k}(\tau ), \end{aligned}$$

where \(M_{j}^{\varepsilon _k}\) is a martingale and by (4.8) we have

$$\begin{aligned} \Theta _{j}^{\varepsilon _k}(\tau )=\int \limits _0^\tau \big ( (i\nabla _{j} H_2 +g_{j})\cdot v_j^{\varepsilon _k} - \langle (i\nabla _{j} H_2 +g_j)\cdot v_{j}^{\varepsilon _k} \rangle + \sqrt{\varepsilon }r_{j}\cdot v_{j}^{\varepsilon _k} + (d_{jj}^1-{\mathcal {T}}_{j}) \big ) \,ds. \end{aligned}$$
(4.18)

Due to Corollary 3.3 and Theorem 3.1(3), we have

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{0\le \tau \le T }|r_{j} \cdot v^{\varepsilon _k}_{j}| \le C, \quad |d_{jj}^1-{\mathcal {T}}_{j}|\le C\sqrt{\varepsilon }. \end{aligned}$$
(4.19)

Then, applying Lemma 4.4, we get

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{0\le \tau \le T} |\Theta _{j}^{\varepsilon _k}(\tau ) |\rightarrow 0 \quad \text{ as }\quad \varepsilon _k\rightarrow 0. \end{aligned}$$
(4.20)

Consequently,

$$\begin{aligned} \lim \limits _{\varepsilon _k\rightarrow 0}{\mathcal {D}}\big (K_{j}^{\varepsilon _k}(\cdot )\big )\!=\!\lim \limits _{\varepsilon _k\rightarrow 0}{\mathcal {D}}\big ( M_{j}^{\varepsilon _k}(\cdot )\big ) \end{aligned}$$
(4.21)

in the sense that if one limit exists then the other exists as well and the two are equal.

Due to (4.7) and the Skorokhod Theorem, we can find random processes \(L^{\varepsilon _k}(\tau )\) and \(L(\tau )\), \(0\le \tau \le T\), such that \({\mathcal {D}}\big (L^{\varepsilon _k}(\cdot )\big )={\mathcal {D}}\big (J^{\varepsilon _k}(\cdot )\big )\), \({\mathcal {D}}\big (L(\cdot )\big )=Q_0\) and

$$\begin{aligned} L^{\varepsilon _k}\rightarrow L \quad \text{ in } \quad C([0,T], \mathbb {R}^N) \quad \text{ as } \quad \varepsilon _k\rightarrow 0 \quad \text{ a.s. } \end{aligned}$$

Then by (4.17) the left-hand side limit in (4.21) exists and equals

$$\begin{aligned} L_{j}(\tau ) - \int \limits _0^{\tau } ({\mathcal {R}}_j(L(s))+{\mathcal {T}}_{j}) \, ds. \end{aligned}$$
(4.22)

Due to (4.6), the family of martingales \(\{M_{j}^{\varepsilon _k},\,k\in \mathbb {N}\}\) is uniformly square integrable. Due to (4.21), they converge in distribution to the process (4.22). Then the latter is a square integrable martingale as well. Thus, each limiting point \(Q_0\) is a weak solution of the averaged equation (4.9).

Since the initial conditions \(u_0\) are independent from \(\varepsilon \), Theorem 3.1(4) implies that \({\mathcal {D}}(J(0))={\mathcal {D}}(I(u_0))\). In [38] Yamada and Watanabe established the uniqueness of a weak solution for an equation with a more general dispersion matrix than that of (4.9), but with a Lipschitz-continuous drift. Their proof can be easily generalized to our case by a stopping time argument. We will not do this here since in Proposition 4.5 we will consider the more difficult infinite-dimensional situation.

The uniqueness of a weak solution of (4.9) implies that all the limiting points (4.7) coincide and we obtain the convergence (4.11). The first estimate in (4.12) follows from Corollary 3.3 and the second one follows from Lemma 4.3.

Now we will prove the uniformity in \(N\) of the convergence (4.11). Recall that it is understood in the sense that for any \(\Lambda \subset \mathbb {Z}^d\) as in (4.13) we have

$$\begin{aligned} {\mathcal {D}}\big ((J^{\varepsilon }_{j}(\cdot ))_{j\in \Lambda }\big ) \rightharpoonup {\mathcal {D}}\big ((J^{0}_{j}(\cdot ))_{j\in \Lambda }\big ) \quad \text{ as } \varepsilon \rightarrow 0 \text{ on } C([0,T],\mathbb {R}^M) \text{ uniformly } \text{ in } N. \end{aligned}$$
(4.23)

It is well known that the weak convergence of probability measures on a separable metric space is equivalent to convergence in the dual-Lipschitz norm, see Theorem 11.3.3 in [12]. Analysing the proof of this theorem, we see that in order to establish the uniformity in \(N\) of the convergence (4.23) with respect to the dual-Lipschitz norm, it suffices to show that for any bounded continuous functional \(h: (J_j(\cdot ))_{j\in \Lambda }\in C([0,T],\mathbb {R}^M) \mapsto \mathbb {R}\), we have

$$\begin{aligned} {\mathbf {E}}\,h(J^{\varepsilon })\rightarrow {\mathbf {E}}\,h(J^{0})\quad \text{ as } \quad \varepsilon \rightarrow 0 \quad \text{ uniformly } \text{ in } N, \end{aligned}$$
(4.24)

where we have denoted \(h(J):=h\big ((J_j(\cdot ))_{j\in \Lambda }\big )\). In order to prove (4.24), first we pass to the limit \(N\rightarrow \infty \). Recall that \({\mathcal {C}}^\infty =\cup _{N\in \mathbb {N}}{\mathcal {C}}(N)\). Denote \(J^{\varepsilon ,N}=(J^{\varepsilon ,N}_j)_{j\in {\mathcal {C}}^\infty }\), where

$$\begin{aligned} J_j^{\varepsilon ,N}:= \left\{ \begin{array}{cl} J_j^{\varepsilon },\quad &{}\text{ if }\quad j\in {\mathcal {C}}={\mathcal {C}}(N), \\ 0, \quad &{}\text{ if }\quad j\in {\mathcal {C}}^\infty \setminus {\mathcal {C}}. \end{array} \right. \end{aligned}$$
(4.25)

Using the uniformity in \(N\) of estimate (4.6), we get that the family of measures

$$\begin{aligned} \left\{ {\mathcal {D}}(J^{\varepsilon ,N}(\cdot )),\quad 0<\varepsilon \le 1,\; N\in \mathbb {N}\right\} \end{aligned}$$

is tight on the space \(C([0,T],\mathbb {R}^\infty )\). Take any limiting point \(Q_0^\infty \) such that \({\mathcal {D}}(J^{\varepsilon _k,N_k}(\cdot ))\rightharpoonup Q_0^\infty \) as \(\varepsilon _k\rightarrow 0, \, N_k\rightarrow \infty \). Recall that the initial conditions \(u_0\) satisfy HI(ii). Denote the vector of actions corresponding to \(u_0^\infty \) by \(I_0^\infty =I(u_{0}^\infty )\in \mathbb {R}_{0+}^\infty \).

Proposition 4.5

The measure \(Q_0^\infty \) is a law of the process \(J^{0,\infty }(\tau )\) which is the unique weak solution of the averaged equation for the infinite system of rotators

$$\begin{aligned} d J_{j} = ({\mathcal {R}}_{j}(J)+{\mathcal {T}}_{j}) \,d\tau + \sqrt{2J_{j}{\mathcal {T}}_{j}}\,d\widetilde{\beta }_{j}, \quad j\in {\mathcal {C}}^\infty , \quad {\mathcal {D}}(J(0))={\mathcal {D}}(I^\infty _0). \end{aligned}$$
(4.26)

Moreover, \({\mathcal {D}}(J^{\varepsilon ,N}(\cdot ))\rightharpoonup {\mathcal {D}}(J^{0,\infty }(\cdot ))\) as \(\varepsilon \rightarrow 0,\, N\rightarrow \infty \) on \(C([0,T],\mathbb {R}^\infty )\).

Before proving this proposition we will establish (4.24). Proposition 4.5 implies

$$\begin{aligned} {\mathbf {E}}\,h(J^{\varepsilon })\rightarrow {\mathbf {E}}\,h(J^{0,\infty }) \quad \text{ as } \quad \varepsilon \rightarrow 0,\,N\rightarrow \infty . \end{aligned}$$
(4.27)

In view of the convergence (4.11), which is already proven for every \(N\), (4.27) implies that \({\mathbf {E}}\,h(J^{0})\rightarrow {\mathbf {E}}\,h(J^{0,\infty })\) as \(N\rightarrow \infty \). Consequently, for all \(\delta >0\) there exist \(N_1\in \mathbb {N}\) and \(\varepsilon _1>0\) such that for every \(N\ge N_1,\, 0\le \varepsilon < \varepsilon _1\), we have

$$\begin{aligned} |{\mathbf {E}}\,h(J^{\varepsilon })-{\mathbf {E}}\,h(J^{0,\infty })|< \delta /2. \end{aligned}$$

Then, for \(N\) and \(\varepsilon \) as above,

$$\begin{aligned} |{\mathbf {E}}\,h(J^{\varepsilon })-{\mathbf {E}}\,h(J^{0})|\le |{\mathbf {E}}\,h(J^{\varepsilon })-{\mathbf {E}}\,h(J^{0,\infty })| + |{\mathbf {E}}\,h(J^{0,\infty })-{\mathbf {E}}\,h(J^{0})|< \delta . \end{aligned}$$
(4.28)

Choose \(\varepsilon _2>0\) such that for every \(0<\varepsilon <\varepsilon _2\) and \(N<N_1\) we have

$$\begin{aligned} |{\mathbf {E}}\,h(J^{\varepsilon })-{\mathbf {E}}\,h(J^{0})|< \delta . \end{aligned}$$
(4.29)

Then, due to (4.28), (4.29) holds for all \(N\) and \(\varepsilon <\varepsilon _1\wedge \varepsilon _2\). Thus, we obtain (4.24). The proof of the theorem is completed. \(\square \)

Proof of Proposition 4.5

To prove that \(Q_0^\infty \) is a law of a weak solution of (4.26), it suffices to show that the processes (4.16) are square-integrable martingales with respect to the measure \(Q_0^\infty \) and the natural filtration of \(\sigma \)-algebras in \(C([0,T], \mathbb {R}^\infty )\) (see [39]). The proof of this literally coincides with the corresponding proof in the finite-dimensional case; one should just replace the limit \(\varepsilon _k\rightarrow 0\) by \(\varepsilon _k\rightarrow 0,\, N_k\rightarrow \infty \) and the space \(C([0,T], \mathbb {R}^N)\) by \(C([0,T], \mathbb {R}^\infty )\). The estimate of Corollary 3.3 combined with Fatou's lemma implies that the obtained weak solution belongs to the desired class of processes. To prove that the weak solution is unique, it suffices to show that the pathwise uniqueness of solutions holds (see [30, 39]). Let \(J(\tau )\) and \(\hat{J}(\tau )\) be two solutions of (4.26), defined on the same probability space, corresponding to the same Brownian motions and initial conditions, distributed as \(I_0^\infty \), and satisfying the first estimate from (4.12). Let \(w(\tau ):=J(\tau )-\hat{J}(\tau ).\) Following literally the proof of Theorem 1 in [38], for every \(j\in {\mathcal {C}}^\infty \) and any \(\tau \ge 0\) we get the estimate

$$\begin{aligned} {\mathbf {E}}\,|w_j(\tau )|\le {\mathbf {E}}\,\int \limits _{0}^\tau |{\mathcal {R}}_j(J(s))-{\mathcal {R}}_j(\hat{J}(s))|\, ds. \end{aligned}$$
(4.30)

Define for \(R>0\) and \(q>0\) a stopping time

$$\begin{aligned} \tau _R=\inf \{\tau \ge 0:\,\exists j\in {\mathcal {C}}^\infty \text{ satisfying } J_j(\tau )\vee \hat{J}_j(\tau )\ge R(|j|^q+1)\}. \end{aligned}$$

For any \(\tau \ge 0\) we have

$$\begin{aligned} {\mathbf {P}}\,(\tau _R\le \tau )&\le \! \sum \limits _{j\in {\mathcal {C}}^\infty }{\mathbf {P}}\,\left( \sup \limits _{0\le s\le \tau }J_j(s)\ge R(|j|^q+1)\!\right) \!+\! \sum \limits _{j\in {\mathcal {C}}^\infty }{\mathbf {P}}\,\left( \sup \limits _{0\le s\le \tau } \hat{J}_j(s)\ge R(|j|^q+1)\!\right) \nonumber \\&\le C\sum \limits _{j\in {\mathcal {C}}^\infty }e^{-2\alpha R (|j|^q+1)} \rightarrow 0 \quad \text{ as }\quad R\rightarrow \infty . \end{aligned}$$
(4.31)

For \(L\in \mathbb {N}\) denote \(|w|_L:=\sum \limits _{|j|\le L} e^{-|j|} |w_j|\). Using the Taylor expansion, it is possible to show that, in view of (4.10) and assumption Hg(i), the derivatives \(\partial _{J_k}{\mathcal {R}}_j(J)\) have at most polynomial growth of some degree \(m>0\), uniformly in \(j,k\in {\mathcal {C}}^\infty \). Since for any \(\tau <\tau _R\) and \(k\in {\mathcal {C}}^\infty \) satisfying \(|k|\le L+1\) we have \(J_k(\tau ),\hat{J}_k(\tau )\le R((L+1)^q+1)\), estimate (4.30) implies

$$\begin{aligned} {\mathbf {E}}\,|w(\tau \wedge \tau _R)|_L&\le C \sum \limits _{|j|\le L}e^{-|j|}{\mathbf {E}}\,\int \limits _0^{\tau \wedge \tau _R}\Big (1+\sum \limits _{k:|k-j|\le 1}(J_k+\hat{J}_k)^m\Big ) \sum \limits _{k:|k-j|\le 1} |w_j|\, ds \\&\le C(R) (L+1)^{mq}{\mathbf {E}}\,\int \limits _0^{\tau \wedge \tau _R} \Big ( |w|_L + e^{-L}\sum \limits _{|k|= L+1} |w_k|\Big ) \, ds \\&\le C_1(R) (L+1)^{mq} \int \limits _0^{\tau }\big ( {\mathbf {E}}\,|w(s\wedge \tau _R)|_L + e^{-L}L^{d-1} \big ) \, ds, \end{aligned}$$

where we used \({\mathbf {E}}\,\sum \limits _{|k|= L+1} |w_k|\le CL^{d-1}\). Applying the Gronwall–Bellman inequality, we obtain

$$\begin{aligned} {\mathbf {E}}\,|w(\tau \wedge \tau _R)|_L\le L^{d-1}e^{-L+C_1(R)(L+1)^{mq}\tau }. \end{aligned}$$

Choosing \(q< 1/m\), we obtain that \({\mathbf {E}}\,|w(\tau \wedge \tau _R)|_L\rightarrow 0\) as \(L\rightarrow \infty \) and, consequently, \({\mathbf {E}}\,|w_j(\tau \wedge \tau _R)|=0\) for all \(j\in {\mathcal {C}}^\infty \). Sending \(R\rightarrow \infty \), in view of (4.31) we get that \({\mathbf {E}}\,|w_j(\tau )|=0\) for any \(\tau \ge 0\) and \(j\in {\mathcal {C}}^\infty \). \(\square \)

Let us now investigate the dynamics in the original \((I,\varphi )\)-variables. Let \(u^{\varepsilon }(\tau )\) be a solution of (2.5)–(2.6), written in the slow time, and let \(I^\varepsilon (\tau )=I(u^\varepsilon (\tau ))\) be the corresponding vector of actions. By Theorems 3.1(4) and 4.2 we have

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0} {\mathcal {D}}(I^{\varepsilon }(\cdot ))=\lim \limits _{\varepsilon \rightarrow 0} {\mathcal {D}}(J^{\varepsilon }(\cdot )) = Q_0 \quad \text{ on }\quad C([0,T],\mathbb {R}^N). \end{aligned}$$

Since the estimate of Theorem 3.1(4) and the convergence (4.11) are uniform in \(N\), the convergence \({\mathcal {D}}(I^{\varepsilon }(\cdot ))\rightharpoonup Q_0\) is also uniform in \(N\). Thus, we get

Theorem 4.6

The assertion of Theorem 2.2 holds. Moreover, for any \(j\in {\mathcal {C}}\)

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{\tau \in [0,T]} e^{2\alpha I^0_j(\tau )}< C \text{ and } \int \limits _0^T {\mathbf {P}}\,(I_j^0(\tau )<\delta )\, d\tau \rightarrow 0 \text{ as } \delta \rightarrow 0, \end{aligned}$$
(4.32)

where the latter convergence is uniform in \(N\).

Let us define the local energy of the \(j\)-th rotator as

$$\begin{aligned} H^\varepsilon _j(u)=\frac{1}{2} F_j(|u_j|^2) + \frac{\sqrt{\varepsilon }}{4}\sum \limits _{k:|j-k|=1} G(|u_j-u_k|^2). \end{aligned}$$

Consider the vectors \( \hat{H}^\varepsilon (u):=(H^\varepsilon _j(u))_{j\in {\mathcal {C}}} \) and \( \hat{F}(I):=\frac{1}{2} (F_j(2I_j))_{j\in {\mathcal {C}}}. \)

Corollary 4.7

Let \(I^0(\tau )\) be a unique weak solution of system (2.8)–(2.9). Then

$$\begin{aligned} {\mathcal {D}}\big (\hat{H}^\varepsilon (u^\varepsilon (\cdot ))\big ) \rightharpoonup {\mathcal {D}}\big (\hat{F}(I^0(\cdot ))\big ) \quad \text{ as }\quad \varepsilon \rightarrow 0 \quad \text{ on }\quad C([0,T],\mathbb {R}^N), \end{aligned}$$

uniformly in \(N\).

Proof

The second estimate of Theorem 3.1(4) implies that the process \(u^\varepsilon \) satisfies the first estimate of Corollary 3.3. Since the potential \(G\) has at most a polynomial growth, we get

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0} {\mathcal {D}}\big (\hat{H}^\varepsilon (u^\varepsilon (\cdot ))\big ) =\lim \limits _{\varepsilon \rightarrow 0}{\mathcal {D}}\big (\hat{F}(I^\varepsilon (\cdot ))\big ) \quad \text{ on }\quad C([0,T],\mathbb {R}^N) \end{aligned}$$
(4.33)

in the sense that if one limit exists then the other exists as well and the two are equal. Moreover, if one convergence holds uniformly in \(N\) then the other one also holds uniformly in \(N\). It remains to note that, due to Theorem 4.6, we have \({\mathcal {D}}\big (\hat{F}(I^\varepsilon (\cdot ))\big )\rightharpoonup {\mathcal {D}}\big (\hat{F}(I^0(\cdot ))\big )\) as \(\varepsilon \rightarrow 0\) uniformly in \(N\). \(\square \)

4.2 Joint Distribution of Actions and Angles

Here we prove Theorem 4.8, which describes the limiting joint dynamics of actions and angles. Let, as usual, \(u^{\varepsilon }(\tau )\) be a solution of (2.5)–(2.6), written in the slow time, and let \(I^\varepsilon (\tau )=I(u^\varepsilon (\tau )),\varphi ^\varepsilon (\tau )=\varphi (u^\varepsilon (\tau ))\). Denote by \(\mu ^{\varepsilon }_\tau ={\mathcal {D}}(I^{\varepsilon }(\tau ), \varphi ^{\varepsilon }(\tau ))\) the law of \(u^\varepsilon (\tau )\) in action-angle coordinates. For any function \(h(\tau )\ge 0\) satisfying \(\int \limits _0^T h(\tau ) \, d\tau =1\), set \(\mu ^{\varepsilon }(h):=\int \limits _0^T h(\tau ) \mu ^{\varepsilon }_\tau \, d\tau \). Moreover, denote \(m^{0}(h):=\int \limits _0^T h(\tau ) {\mathcal {D}}(I^{0}(\tau )) \, d\tau \), where \(I^{0}(\tau )\) is a weak solution of (2.8)–(2.9).

Theorem 4.8

For any continuous function \(h\) as above, we have

$$\begin{aligned} \mu ^{\varepsilon }(h)\rightharpoonup m^{0}(h)\times d\varphi \quad \text{ as } \quad \varepsilon \rightarrow 0 \quad \text{ uniformly } \text{ in } N. \end{aligned}$$

Proof

Let us first consider the case \(h=(\tau _2-\tau _1)^{-1} \mathbb {I}_{[\tau _1,\tau _2]}\), where \(\mathbb {I}_{[\tau _1,\tau _2]}\) is the indicator function of the interval \([\tau _1,\tau _2]\). Take a set \(\Lambda \) as in (4.13) and a function \(P\in {\mathcal {L}}_b(\mathbb {R}^N\times \mathbb {T}^N)\) which depends on \((I,\varphi )=(I_j,\varphi _j)_{j\in {\mathcal {C}}}\in \mathbb {R}^N\times \mathbb {T}^N\) only through \((I_j,\varphi _j)_{j\in \Lambda }\). Let us first treat the case when the function \(P(u):=P(I,\varphi )(u)\) belongs to \({\mathcal {L}}_{loc}(\mathbb {C}^N)\) (this can fail since the vector-function \(\varphi (u)\) has a discontinuity when \(u_j=0\) for some \(j\in {\mathcal {C}}\), so the function \( P(u)\) may also be discontinuous there). Let \(v^{\varepsilon }(\tau )\) be a solution of (4.1) and let \(J^\varepsilon (\tau ),\psi ^\varepsilon (\tau )\) be the corresponding vectors of actions and angles. Due to Theorem 3.1(4), we have

$$\begin{aligned} \int \limits _{\tau _1}^{\tau _2} \langle \mu ^{\varepsilon }_\tau ,P\rangle \, d\tau = {\mathbf {E}}\,\int \limits _{\tau _1}^{\tau _2} P(u^{\varepsilon }(\tau )) \, d\tau \quad \text{ is } \text{ close } \text{ to } \quad {\mathbf {E}}\,\int \limits _{\tau _1}^{\tau _2}P( v^{\varepsilon }(\tau )) \, d\tau \quad \text{ uniformly } \text{ in }\, N. \end{aligned}$$

Due to Lemma 4.4, the integral \(\displaystyle {}{{\mathbf {E}}\,\int \limits _{\tau _1}^{\tau _2} P( v^{\varepsilon }(\tau )) \, d\tau }\) is close to \(\displaystyle {}{{\mathbf {E}}\,\int \limits _{\tau _1}^{\tau _2}\langle P \rangle (J^{\varepsilon }(\tau )) \, d\tau }\) uniformly in \(N\). Due to Theorem 4.2, the last integral is uniformly in \(N\) close to

$$\begin{aligned} {\mathbf {E}}\,\int \limits _{\tau _1}^{\tau _2}\langle P \rangle (J^{0}(\tau )) \, d\tau ={\mathbf {E}}\,\int \limits _{\mathbb {T}^N}\int \limits _{\tau _1}^{\tau _2}P(J^{0}(\tau ), \varphi ) \, d\tau d\varphi =(\tau _2-\tau _1)\langle m^{0}(h)\times d\varphi ,P\rangle . \end{aligned}$$

If the function \(P(u)\notin {\mathcal {L}}_{loc}(\mathbb {C}^N)\), we approximate it by functions \(P_\delta (u)\in {\mathcal {L}}_{loc}(\mathbb {C}^N)\),

$$\begin{aligned} P_\delta (u) = P(u) k_\delta ([I(u)]), \qquad [I(u)]:=\min \limits _{j\in \Lambda } I_j(u), \end{aligned}$$

where the function \(k_\delta \) is smooth, \(0\le k_\delta \le 1\), \(k_\delta (x)=0\) for \(x\le \delta \) and \(k_\delta (x)=1\) for \(x\ge 2\delta \). Then we let \(\delta \rightarrow 0\) as \(\varepsilon \rightarrow 0\) and use the estimate of Lemma 4.3 and (4.12).

In the case of a continuous function \(h\), we approximate it by piecewise constant functions. \(\square \)

4.3 Stationary Measures

In this section we prove Theorem 2.3 which describes the limiting behaviour of a stationary regime of (2.5).

4.3.1 The effective equation and proof of Theorem 2.3(i)

The averaged equation (4.9) is irregular: its dispersion matrix is not Lipschitz continuous, so we do not know whether (4.9) is mixing or not. We are going to lift it to the so-called effective equation, which is regular and mixing.

Let us define the operator \(\Psi _\theta : v=(v_j)_{j\in {\mathcal {C}}}\in \mathbb {C}^N\mapsto \mathbb {C}^N\) of rotation by the angle \(\theta =(\theta _j)_{j\in {\mathcal {C}}}\in \mathbb {T}^N\), i.e. \((\Psi _\theta v)_j=v_je^{i\theta _j}\). We rewrite the function \({\mathcal {R}}_j\) from (2.7) as

$$\begin{aligned} {\mathcal {R}}_j(J)=\langle g_j(v)\cdot v_j \rangle = \int \limits _{\mathbb {T}^{N}} g_j(\Psi _\theta v)\cdot (e^{i\theta _j} v_j)\, d\theta = {\mathcal {K}}_j(v)\cdot v_j, \end{aligned}$$
(4.34)

where \(\displaystyle {}{{\mathcal {K}}_j(v):=\int \limits _{\mathbb {T}^{N}} e^{-i\theta _j}g_j(\Psi _\theta v)\, d\theta }\) and \( d\theta \) is a normalized Lebesgue measure on the torus \( \mathbb {T}^{N}\). Consider the effective equation

$$\begin{aligned} d v_j = {\mathcal {K}}_j(v)\,d\tau + \sqrt{{\mathcal {T}}_j}d\beta _j, \quad j\in {\mathcal {C}}, \end{aligned}$$
(4.35)

where \(\beta _j\), as usual, are standard complex independent Brownian motions. It is well known that a stochastic equation of the form (4.35) has a unique solution which is defined globally (see [21]), and that it is mixing (see [21, 35, 36]). The following proposition explains the role of the effective equation.

Proposition 4.9

  1. (i)

    Let \(v(\tau )\), \(\tau \ge 0\) be a weak solution of the effective equation (4.35) and \(J(\tau )=J(v(\tau ))\) be the corresponding vector of actions. Then \(J(\tau )\), \(\tau \ge 0\) is a weak solution of the averaged equation (4.9).

  2. (ii)

    Let \(J^0(\tau )\), \(\tau \ge 0\) be a weak solution of the averaged equation (4.9). Then for any vector \(\theta =(\theta _j)_{j\in {\mathcal {C}}}\in \mathbb {T}^{N}\) there exists a weak solution \(v(\tau )\) of the effective equation (4.35) such that

    $$\begin{aligned} {\mathcal {D}}(J(v(\cdot )))={\mathcal {D}}(J^0(\cdot )) \text{ on } C([0,\infty ),\mathbb {R}^N) \quad \text{ and } \quad v_j(0)=\sqrt{2J^0_j(0)}e^{i\theta _j},\,j\in {\mathcal {C}}. \end{aligned}$$
    (4.36)

Proof

  1. (i)

    Due to (4.34) and (4.35), the actions \(J(\tau )\) satisfy

    $$\begin{aligned} d J_j=({\mathcal {R}}_j(J) + {\mathcal {T}}_j)\,d\tau + \sqrt{{\mathcal {T}}_j}v_j\cdot \,d\beta _j, \quad j\in {\mathcal {C}}. \end{aligned}$$
    (4.37)

The drift and the diffusion matrix of Eq. (4.37) coincide with those of the averaged equation (4.9). Consequently, \(J(\tau )\) is a solution of the (local) martingale problem associated with the averaged equation (see [19], Proposition 5.4.2). So, due to [19], Proposition 5.4.6, we get that \(J(\tau )\) is a weak solution of the averaged equation (4.9).

  2. (ii)

Let \(v(\tau )\) be a solution of the effective equation with the initial condition as in (4.36). Then, due to (i), the process \(J(\tau ):=J(v(\tau ))\) is a weak solution of the averaged equation and \(J(0)=J^0(0)\). Since the weak solution of the averaged equation is unique, we obtain that \({\mathcal {D}}(J(\cdot ))={\mathcal {D}}(J^0(\cdot ))\). Consequently, \(v(\tau )\) is the desired process.

\(\square \)

Let \(m\) be the unique stationary measure of the effective equation. Denote the projections to the spaces of actions and angles by \(\Pi _{ac}:v\in \mathbb {C}^N\mapsto \mathbb {R}^N_{+0}\ni I\) and \(\Pi _{ang}:v\in \mathbb {C}^N\mapsto \mathbb {T}^N\ni \psi \) correspondingly. Denote

$$\begin{aligned} \pi :=\Pi _{ac*}m. \end{aligned}$$
(4.38)

Corollary 4.10

The averaged equation (4.9) is mixing, and \(\pi \) is its unique stationary measure. More precisely, for any of its solutions \(J(\tau )\) we have \({\mathcal {D}}(J(\tau ))\rightharpoonup \pi \) as \(\tau \rightarrow \infty .\)

Corollary 4.10 implies Theorem 2.3(i).

Proof

First we claim that \(\pi \) is a stationary measure of the averaged equation. Indeed, take a stationary distributed solution \(\widetilde{v}(\tau )\) of the effective equation, \({\mathcal {D}}(\widetilde{v}(\tau ))\equiv m\). By Proposition 4.9(i), the process \(J(\widetilde{v}(\tau ))\) is a stationary weak solution of the averaged equation. It remains to note that (4.38) implies \({\mathcal {D}}\big (J(\widetilde{v}(\tau ))\big )\equiv \pi .\)

Now we claim that any solution \(J^0(\tau )\) of the averaged equation converges in distribution to \(\pi \) as \(\tau \rightarrow \infty \). For some \(\theta \in \mathbb {T}^N\) take \(v(\tau )\) from Proposition 4.9(ii). Due to the mixing property of the effective equation, \({\mathcal {D}}(v(\tau ))\rightharpoonup m\) as \(\tau \rightarrow \infty \) and, consequently, \({\mathcal {D}}(J^0(\tau ))={\mathcal {D}}(J(v(\tau )))\rightharpoonup \Pi _{ac*}m=\pi \) as \(\tau \rightarrow \infty \). \(\square \)

Proof of Theorem 2.3(ii). First we will show that

$$\begin{aligned} \Pi _{ac*}\widetilde{\mu }^\varepsilon \rightharpoonup \pi \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(4.39)

We will work in the \(v\)-variables. Note that Eq. (4.1) is mixing since it is obtained by a \(C^2\)-smooth time-independent change of variables from Eq. (2.5), which is mixing. Denote by \(\widetilde{\nu }^\varepsilon \) its unique stationary measure. Due to Theorem 3.1(4), to establish (4.39) it suffices to show that

$$\begin{aligned} \Pi _{ac*}\widetilde{\nu }^\varepsilon \rightharpoonup \pi \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(4.40)

Let \(\widetilde{v}^\varepsilon (\tau )\) be a stationary solution of Eq. (4.1), \({\mathcal {D}}(\widetilde{v}^\varepsilon (\tau ))\equiv \widetilde{\nu }^\varepsilon \), and \(\widetilde{J}^\varepsilon (\tau )=J(\widetilde{v}^\varepsilon (\tau ))\) be the corresponding vector of actions. Similarly to Proposition 4.1 we get that the set of laws \(\{{\mathcal {D}}(\widetilde{J}^\varepsilon (\cdot )),\; 0<\varepsilon \le 1\}\) is tight in \(C([0,T],\mathbb {R}^N)\). Let \(\widetilde{Q}_0\) be its limiting point as \(\varepsilon _k\rightarrow 0.\) Obviously, it is stationary in \(\tau \). The same arguments that were used in the proof of Theorem 4.2 imply

Proposition 4.11

The measure \(\widetilde{Q}_0\) is a law of the process \(\widetilde{J}^0(\tau )\), \(0\le \tau \le T\), which is a stationary weak solution of the averaged equation (4.9).

Since \(\pi \) is the unique stationary measure of the averaged equation, we have \({\mathcal {D}}(\widetilde{J}^0(\tau ))\equiv \pi \). Consequently, we get (4.40) which implies (4.39).

Let \(\widetilde{u}^\varepsilon (\tau )\) be a stationary solution of Eq. (2.5) and \(\widetilde{I}^\varepsilon (\tau ),\widetilde{\varphi }^\varepsilon (\tau )\) be the corresponding vectors of actions and angles. By the same reason as in Theorem 4.8, we have

$$\begin{aligned} \widetilde{\mu }^\varepsilon (h)\rightharpoonup \widetilde{m}^0(h)\times d\varphi \quad \text{ as }\quad \varepsilon \rightarrow 0, \end{aligned}$$
(4.41)

where \(\widetilde{\mu }^\varepsilon (h)\) and \( \widetilde{m}^0(h)\) are defined as \(\mu ^\varepsilon (h)\) and \(m^0(h)\), but with the processes \(I^\varepsilon (\tau ),\varphi ^\varepsilon (\tau )\) and \(I^0(\tau )\) replaced by the processes \(\widetilde{I}^\varepsilon (\tau ),\widetilde{\varphi }^\varepsilon (\tau )\) and \(\widetilde{J}^0(\tau )\) respectively. Since a stationary regime does not depend on time, we get (2.11):

$$\begin{aligned} {\mathcal {D}}(\widetilde{I}^\varepsilon (\tau ), \widetilde{\varphi }^\varepsilon (\tau )) \rightharpoonup \pi \times d\varphi \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(4.42)

Assume now that the averaged equation for the infinite system of rotators (4.26) has a unique stationary measure \(\pi ^\infty \) in the class of measures satisfying \(\sup \limits _{j\in {\mathcal {C}}^\infty }\langle \pi ^\infty , J_j\rangle <\infty \). Let us define \(\widetilde{J}^{\varepsilon ,N}\) as in (4.25), but with \(J^\varepsilon \) replaced by \(\widetilde{J}^\varepsilon \). The set of laws \(\{{\mathcal {D}}(\widetilde{J}^{\varepsilon ,N}(\cdot )),\; 0<\varepsilon \le 1,N\in \mathbb {N}\}\) is tight in \(C([0,T],\mathbb {R}^\infty )\). Let \(\widetilde{Q}^\infty _0\) be its limiting point as \(\varepsilon _k\rightarrow 0,N_k\rightarrow \infty \). Similarly to Proposition 4.5 we get

Proposition 4.12

The measure \(\widetilde{Q}_0^\infty \) is a law of the process \(\widetilde{J}^{0,\infty }(\tau )\), \(0\le \tau \le T\), which is a stationary weak solution of Eq. (4.26), satisfying the first estimate from (4.12) for all \(j\in {\mathcal {C}}^\infty \).

Thus, we obtain that the marginal distribution of the measure \(\widetilde{Q}_0^\infty \) at any fixed \(\tau \) is a stationary measure of Eq. (4.26) from the class of measures above. So it coincides with \(\pi ^\infty \), and we have \({\mathcal {D}}(\widetilde{J}^{\varepsilon ,N}(\tau ))\rightharpoonup \pi ^\infty \) as \(\varepsilon \rightarrow 0,\, N\rightarrow \infty \). Then, arguing as in Theorem 4.2, we get that the convergence (4.40) is uniform in \(N\). As in the proof of Theorem 4.8, this implies that the convergence (4.41) and, consequently, the convergence (4.42) are also uniform in \(N\).

Proof of Theorem 2.3(iii). Due to the mixing property of (2.5), we have \({\mathcal {D}}(I^{\varepsilon }(\tau ))\rightharpoonup \Pi _{ac*}\widetilde{\mu }^\varepsilon \) as \(\tau \rightarrow \infty \). Then item (ii) of the theorem implies that \(\Pi _{ac*}\widetilde{\mu }^\varepsilon \rightharpoonup \pi \) as \(\varepsilon \rightarrow 0\). On the other hand, Theorem 4.6 implies that \({\mathcal {D}}(I^\varepsilon (\tau ))\rightharpoonup {\mathcal {D}}(I^0(\tau ))\) as \(\varepsilon \rightarrow 0\) for any \(\tau \ge 0\), where \(I^0(\tau )\) is a weak solution of Eqs. (2.8)–(2.9). Then item (i) of the theorem implies that \({\mathcal {D}}(I^0(\tau ))\rightharpoonup \pi \) as \(\tau \rightarrow \infty \). The proof of the theorem is completed. \(\square \)

Remark 4.13

It is possible to show that the effective equation is rotation invariant: if \(v(\tau )\) is its weak solution, then for any \(\xi \in \mathbb {T}^{N}\) we have that \(\Psi _\xi v\) is also its weak solution. Since it has the unique stationary measure \(m\), we get that \(m\) is rotation invariant. Consequently, \(\Pi _{ang*}m=\,d\varphi .\) That is why the convergence (2.15) is equivalent to (2.11).

4.4 Examples

1. Consider a system with linear dissipation, i.e. \(p=2\) and \(g_j(u)= - u_j + \sum \limits _{k:|k-j|=1} b_{jk} u_k\), where \(b_{jk}\in \mathbb {C}\). If \(|b_{jk}|\) are sufficiently small uniformly in \(j\) and \(k\), then assumption Hg is satisfied (see Example 2.4). Since \(\left\langle u_k\cdot u_j\right\rangle =0\) for \(k\ne j\), we have \({\mathcal {R}}_j(I)=-2 I_j\). Then the averaged equation (2.8) turns out to be diagonal and takes the form

$$\begin{aligned} d I_j= (-2I_j + {\mathcal {T}}_j) d\tau + \sqrt{2{\mathcal {T}}_j I_j} \,d\widetilde{\beta }_j, \quad j\in {\mathcal {C}}. \end{aligned}$$
(4.43)

The unique stationary measure of (4.43) is

$$\begin{aligned} \pi (dI)=\prod _{j\in {\mathcal {C}}} \frac{2}{{\mathcal {T}}_j} \mathbb {I}_{\mathbb {R}_+}(I_j) e^{-2I_j/{\mathcal {T}}_j} dI_j. \end{aligned}$$

The averaged equation for the infinite system of rotators is diagonal and, consequently, has a unique stationary measure. Thus, the convergence (2.11) holds uniformly in \(N\).

2. Let \(d=1\) and \({\mathcal {C}}=\{1,2,\ldots ,N\}\). Put for simplicity \(p=4\) and choose

$$\begin{aligned} g_j(u)=\frac{1}{4}\Big ( |u_{j+1}|^{2} u_j-|u_{j-1}|^2 u_j -|u_j|^2 u_j \Big ), \end{aligned}$$

where \(1\le j\le N\), \(u_0=u_{N+1}:=0\). By a direct computation one can verify that \(g_j\) satisfies the condition Hg. We have \({\mathcal {R}}_j(I)=\left\langle g_j(u)\cdot u_j \right\rangle = I_{j+1} I_j - I_{j-1} I_j- I_j^2\) (see the computation at the end of this example), and the averaged equation (2.8) takes the form

$$\begin{aligned} d I_j = \Big ( \frac{1}{2} (2I_{j+1} I_j - 2I_{j-1} I_j) - I_j^2 + {\mathcal {T}}_j\Big )\,d\tau + \sqrt{2I_j{\mathcal {T}}_j}\,d\widetilde{\beta }_j. \end{aligned}$$

Its r.h.s. consists of two parts:

$$\begin{aligned} d I_j/d\tau =\widetilde{\nabla }\Theta (j) + \text{ Ter }(j), \end{aligned}$$

where \(\Theta (j):=2I_{j+1} I_j\), \(\widetilde{\nabla }\Theta (j):= \frac{1}{2}(\Theta (j) - \Theta (j-1))\) is the discrete gradient of \(\Theta \), and \(\text{ Ter }(j):= - I_j^2 + {\mathcal {T}}_j + \sqrt{2I_j{\mathcal {T}}_j}\,d\widetilde{\beta }_j/d\tau \). Analogously to the concept of the flow of energy (see [8], Section 5.2), we call the function \(\Theta (j)\) the flow of actions. The term \(\widetilde{\nabla }\Theta (j)\) describes the transport of actions through the \(j\)-th site, while the term \(\text{ Ter }(j)\) can be considered as the input of a (new) stochastic thermostat interacting with the \(j\)-th node. In the same way one can treat the case \(p=2q\), where \(q\in \mathbb {N},\,q>2\).

5 Auxiliary Propositions

In this section we prove Lemmas 4.3 and 4.4 and sketch the proof of Theorem 3.1.

5.1 Proof of Lemma 4.3

For brevity we skip the index \(\varepsilon \) everywhere except in the set \(E^\varepsilon \). Let us rewrite (4.2) for \(k\in \Lambda \) as an equation with real noise

$$\begin{aligned} d{\mathcal {J}}= A^{\mathcal {J}}\,d\tau + \sigma \,d\hat{\beta }, \quad \text{ where }\quad {\mathcal {J}}:=(J_k)_{k\in \Lambda },\;A^{\mathcal {J}}:=(A^J_k)_{k\in \Lambda }, \end{aligned}$$
(5.1)

\(\sigma \) is an \(M\times 2N\) matrix with real entries and \(\hat{\beta }=(\mathrm{Re }\beta _k,\mathrm{Im }\beta _k)_{k\in {\mathcal {C}}}\). Denote by \(a=(a_{kl})_{k,l\in \Lambda }\) the diffusion matrix for (5.1), divided by two, \(a:=\frac{1}{2}\sigma \sigma ^T.\) It is an \(M\times M\) matrix with real entries \(a_{kl}=S^J_{kl}/2\), \(k,l\in \Lambda \), where \(S^J_{kl}\) is calculated in (5.35). Then Theorem 3.1(3) implies that

$$\begin{aligned} \Big | a_{kl}-{\mathcal {T}}_k \delta _{kl}\frac{|v_k|^2}{2} \Big |\le C\sqrt{\varepsilon }|v_k||v_l|. \end{aligned}$$
(5.2)

Step 1. For \(R>0\) denote by \(\tau _R\) the stopping time

$$\begin{aligned} \tau _R=\inf \{\tau \ge 0: \Vert {\mathcal {J}}(\tau )\Vert _{\mathbb {R}^M}\vee \Vert A^{\mathcal {J}}(\tau )\Vert _{\mathbb {R}^M} \ge R\}, \end{aligned}$$

where \(\Vert \cdot \Vert _{\mathbb {R}^M}\) stands for the Euclidean norm in \(\mathbb {R}^M\), \({\mathcal {J}}(\tau )={\mathcal {J}}(v(\tau ))\), \(A^{\mathcal {J}}(\tau )=A^{\mathcal {J}}(v(\tau ))\), and \(v(\tau )\) is a solution of (4.1). A particular case of Theorem 2.2.2 in [22] provides that

$$\begin{aligned} {\mathbf {E}}\,\int \limits _0^{\tau _R \wedge T} e^{-\int \limits _0^\tau \Vert A^{\mathcal {J}}(s)\Vert _{\mathbb {R}^M}\, ds }\mathbb {I}_{E^\varepsilon }({\mathcal {J}}(\tau )) (\det a(\tau ))^{1/M} \, d\tau \le C(R,M) |E^\varepsilon |^{1/M}, \end{aligned}$$
(5.3)

where \(a(\tau )=a(v(\tau ))\). Denote the event \(\Omega _\nu (\tau )=\{\det a(\tau )< \nu \}. \) We have

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,({\mathcal {J}}(\tau )\in E^\varepsilon )\, d\tau&\!= {\mathbf {E}}\,\!\int \limits _0^T\!\! \mathbb {I}_{E^\varepsilon }({\mathcal {J}}(\tau )) \, d\tau \!\le \! {\mathbf {E}}\,\int \limits _0^{\tau _R \wedge T} \mathbb {I}_{E^\varepsilon }({\mathcal {J}}(\tau )) \mathbb {I}_{\overline{\Omega }_\nu }({\mathcal {J}}(\tau )) \Big (\frac{\det a(\tau )}{\nu }\Big )^{1/M} \, d\tau \nonumber \\&\quad + \int \limits _0^T {\mathbf {P}}\,(\Omega _\nu (\tau )) d\tau + T{\mathbf {P}}\,(\tau _R<T)=:{\mathcal {Y}}_1+{\mathcal {Y}}_2+{\mathcal {Y}}_3. \end{aligned}$$
(5.4)

Due to (5.3),

$$\begin{aligned} {\mathcal {Y}}_1\le \frac{e^{TR}}{\nu ^{1/M}} {\mathbf {E}}\,\int \limits _0^{\tau _R \wedge T} e^{-\int \limits _0^\tau \Vert A^{\mathcal {J}}(s)\Vert _{\mathbb {R}^M}\, ds } \mathbb {I}_{E^\varepsilon } ({\mathcal {J}}(\tau ))(\det a(\tau ))^{1/M} \, d\tau \le C(R,M)\Big (\frac{|E^\varepsilon |}{\nu }\Big )^{1/M}. \end{aligned}$$
(5.5)

Take \(\nu =\sqrt{|E^\varepsilon |}\). Choosing \(R\) sufficiently large and \(\varepsilon \) sufficiently small, we can make the terms \({\mathcal {Y}}_1\) and \( {\mathcal {Y}}_3\) arbitrarily small uniformly in \(N\). Indeed, for \({\mathcal {Y}}_1\) this follows from (5.5), while for \({\mathcal {Y}}_3\) it follows from Corollary 3.3 and estimate (4.6). So, to finish the proof of the lemma it remains to show that if \(\nu (\varepsilon )\rightarrow 0\) as \(\varepsilon \rightarrow 0\) then

$$\begin{aligned} {\mathcal {Y}}_2=\int \limits _0^T {\mathbf {P}}\,(\Omega _\nu (\tau )) \, d\tau \rightarrow 0 \quad \text{ when }\quad \varepsilon \rightarrow 0 \quad \text{ uniformly } \text{ in } N. \end{aligned}$$
(5.6)

Step 2. The rest of the proof is devoted to the last convergence. Note that by (5.2)

$$\begin{aligned} \det a = \prod \limits _{k\in \Lambda } ({\mathcal {T}}_kJ_k) + \sqrt{\varepsilon }\Delta _1, \end{aligned}$$

where \({\mathbf {E}}\,\sup \limits _{0\le \tau \le T} |\Delta _1| \le C \) by Corollary 3.3. The constant \(C\) does not depend on \(N\) because the dimension \(M\) is independent of \(N\). Then

$$\begin{aligned} {\mathbf {P}}\,(\Omega _\nu ) \le {\mathbf {P}}\,\big (\prod \limits _{k\in \Lambda } ({\mathcal {T}}_kJ_k) < \nu +\sqrt{\varepsilon }|\Delta _1| \big )\le \sum \limits _{k\in \Lambda }{\mathbf {P}}\,\big (J_k< {\mathcal {T}}_k^{-1}(\nu +\sqrt{\varepsilon }|\Delta _1|)^{1/M}\big ). \end{aligned}$$

Thus, to establish (5.6), it is sufficient to show that

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,\big ( \sqrt{J_j(\tau )} < \delta \big ) \, d\tau \rightarrow 0 \text{ as } \delta \rightarrow 0 \text{ uniformly in } N \text{ and in sufficiently small } \varepsilon . \end{aligned}$$
(5.7)

Step 3. To prove the last convergence we use the concept of the local time. Let \(h\in C^2(\mathbb {R})\) be such that its second derivative has at most polynomial growth at infinity. We consider the process \(h_\tau :=h(J_j(\tau ))\). Then, by the Itô formula,

$$\begin{aligned} d h_\tau = A^h \,d\tau + \sigma ^h d\hat{\beta }, \end{aligned}$$

where

$$\begin{aligned} A^h = h'(J_j) A^{\mathcal {J}}_j+h''(J_j) a_{jj} = h'(J_j)(A_j\cdot v_j +d_{jj}^1) + h''(J_j)a_{jj}, \end{aligned}$$

and the \(1\times 2N\)-matrix \(\sigma ^h(\tau )=(\sigma ^h_k(\tau ))\) is of no further interest.

Due to Theorem 3.1(3) and (5.2), for sufficiently small \(\varepsilon \) we have

$$\begin{aligned} d_{jj}^1 \ge \frac{7}{8} {\mathcal {T}}_j, \quad |a_{jj}|\le \frac{3J_j}{2} {\mathcal {T}}_j. \end{aligned}$$
(5.8)

Let \(\Theta _\tau (b,\omega )\) be the local time for the process \(h_\tau \). Then for any Borel set \({\mathcal {G}}\subset \mathbb {R}\) we have

$$\begin{aligned} \int \limits _0^T \mathbb {I}_{{\mathcal {G}}}(h_\tau ) \sum \limits _{k} |\sigma _k^h|^2 \, d\tau = 2\int \limits _{-\infty }^{\infty } \mathbb {I}_{{\mathcal {G}}}(b)\Theta _T(b,\omega ) \, db. \end{aligned}$$

On the other hand, denoting \((h_\tau -b)_+:=\max (h_\tau -b,0)\), we have

$$\begin{aligned} (h_T-b)_+ = (h_0-b)_+ + \int \limits _0^T \mathbb {I}_{(b,\infty )} (h_\tau )\sigma ^h d\hat{\beta }+ \int \limits _0^T \mathbb {I}_{(b,\infty )}(h_\tau ) A^h \, d\tau + \Theta _T(b,\omega ). \end{aligned}$$

Consequently,

$$\begin{aligned} {\mathbf {E}}\,\int \limits _0^T \mathbb {I}_{{\mathcal {G}}}(h_\tau ) \sum \limits _{k} |\sigma _k^h|^2 \, d\tau \!=\!2{\mathbf {E}}\,\int \limits _{-\infty }^{\infty } \mathbb {I}_{{\mathcal {G}}}(b) \left( (h_T-b)_+ \!-\! (h_0-b)_+ \!-\! \int \limits _0^T \mathbb {I}_{(b,\infty )}(h_\tau ) A^h \,d\tau \right) \, db. \end{aligned}$$

The left-hand side is nonnegative, so

$$\begin{aligned} {\mathbf {E}}\,\int \limits _{-\infty }^{\infty } \mathbb {I}_{{\mathcal {G}}}(b)\int \limits _0^T \mathbb {I}_{(b,\infty )}(h_\tau ) A^h \, d\tau db \le {\mathbf {E}}\,\int \limits _{-\infty }^{\infty } \mathbb {I}_{{\mathcal {G}}}(b) \big ((h_T-b)_+ - (h_0-b)_+ \big ) \, db. \end{aligned}$$
(5.9)

Let us apply relation (5.9) with \({\mathcal {G}}=(\xi _1,\xi _2)\), \(\xi _2>\xi _1>0\) and a function \(h(x)\in C^2(\mathbb {R})\) that coincides with \(\sqrt{x}\) for \(x\ge \xi _1\) and vanishes for \(x\le 0\). Due to Corollary 3.3, the right-hand side of (5.9) is bounded by \((\xi _2-\xi _1)C\). Then

$$\begin{aligned} {\mathbf {E}}\,\int \limits _{\xi _1}^{\xi _2} \int \limits _0^T \mathbb {I}_{(b,\infty )}(\sqrt{J_j}) \left( \frac{A_j\cdot v_j + d_{jj}^1}{2\sqrt{J_j}} - \frac{a_{jj}}{4\sqrt{J_j^3}}\right) \, d\tau db \le (\xi _2-\xi _1)C. \end{aligned}$$
(5.10)

In view of estimate (4.6) we have

$$\begin{aligned} \displaystyle {}{{\mathbf {E}}\,\int \limits _{\xi _1}^{\xi _2} \int \limits _0^T \frac{|A_j\cdot v_j| }{2\sqrt{J_j}} \, d\tau db \le (\xi _2-\xi _1)C }. \end{aligned}$$

Moving this term to the right-hand side of (5.10), applying (5.8) and sending \(\xi _1\) to \(0^+\), we get

$$\begin{aligned} {\mathbf {E}}\,\int \limits _{0}^{\xi _2} \int \limits _0^T \mathbb {I}_{(b,\infty )}(\sqrt{J_j}) J_j^{-1/2} \, d\tau db \le C\xi _2. \end{aligned}$$

Note that

$$\begin{aligned} {\mathbf {E}}\,\int \limits _{0}^{\xi _2} \int \limits _0^T \mathbb {I}_{(b,\infty )}(\sqrt{J_j}) J_j^{-1/2} \, d\tau db&\ge \frac{1}{\delta } {\mathbf {E}}\,\int \limits _{0}^{\xi _2} \int \limits _0^T \mathbb {I}_{(b,\delta )}(\sqrt{J_j}) \, d\tau db \\&= \frac{1}{\delta } \int \limits _{0}^{\xi _2} \int \limits _0^T {\mathbf {P}}\,(b<\sqrt{J_j}<\delta )\, d\tau db. \end{aligned}$$

Consequently,

$$\begin{aligned} \frac{1}{\xi _2} \int \limits _{0}^{\xi _2} \int \limits _0^T {\mathbf {P}}\,(b<\sqrt{J_j}<\delta )\, d\tau db \le C\delta . \end{aligned}$$

Letting \(\xi _2\rightarrow 0^+\) we obtain that

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,\big ( 0<\sqrt{J_j} < \delta \big ) \, d\tau \rightarrow 0 \text{ as } \delta \rightarrow 0 \text{ uniformly in } N \text{ and in sufficiently small } \varepsilon . \end{aligned}$$

Step 4. To establish (5.7) it remains to show that

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,\big ( |v_j(\tau )|=0\big ) \, d\tau = 0 \text{ for all } N\text{, } j\in {\mathcal {C}} \text{ and } \varepsilon \text{ sufficiently small. } \end{aligned}$$
(5.11)

Writing the \(j\)-th component of Eq. (4.1) in the real coordinates \(v^x_j:=\mathrm{Re }v_j\) and \( v^y_j:=\mathrm{Im }v_j\), we obtain the following two-dimensional system:

$$\begin{aligned} dv_j^x=\mathrm{Re }\widetilde{A}_j\,d\tau +\mathrm{Re }(Wd B)_j,\quad dv_j^y=\mathrm{Im }\widetilde{A}_j\,d\tau +\mathrm{Im }(Wd B)_j, \end{aligned}$$
(5.12)

where \(\widetilde{A}_j:= \varepsilon ^{-1} i\nabla _j H_0+ i\nabla _j H_2+ g_j+ \sqrt{\varepsilon }r_j.\) By a direct computation we get that the diffusion matrix for (5.12) with respect to the real Brownian motion \((\mathrm{Re }\beta _k, \mathrm{Im }\beta _k)_{k\in {\mathcal {C}}}\) is

$$\begin{aligned} a^j:=\begin{pmatrix} d_{jj}^1+\mathrm{Re }d_{jj}^2 &{} \mathrm{Im }d_{jj}^2 \\ \mathrm{Im }d_{jj}^2 &{} d_{jj}^1-\mathrm{Re }d_{jj}^2 \end{pmatrix}. \end{aligned}$$

Theorem 3.1(3) implies that for \(\varepsilon \) sufficiently small, \(\det a^j(\tau )\) is separated from zero uniformly in \(\tau \). For \(R>0\) define a stopping time

$$\begin{aligned} \widetilde{\tau }_R=\inf \{\tau \ge 0: |v_j(\tau )|\vee |\widetilde{A}_j(\tau )|\ge \varepsilon ^{-1}R\}. \end{aligned}$$

Then, similarly to (5.4) and (5.5), we have

$$\begin{aligned} {\mathbf {E}}\,\int \limits _0^T \mathbb {I}_{[0,\delta )}(|v_j(\tau )|) \, d\tau&\le C e^{\varepsilon ^{-1}TR}{\mathbf {E}}\,\int \limits _0^{\widetilde{\tau }_R \wedge T} e^{-\int \limits _0^\tau |\widetilde{A}_j(s)|\, ds} \mathbb {I}_{[0,\delta )}(|v_j(\tau )|) (\det a^j(\tau ))^{1/2} \, d\tau \nonumber \\&\quad + T{\mathbf {P}}\,(\widetilde{\tau }_R<T) \le C(R,\varepsilon ^{-1})\sqrt{\delta }+ T{\mathbf {P}}\,(\widetilde{\tau }_R<T). \end{aligned}$$
(5.13)

Letting first \(\delta \rightarrow 0\) and then \(R\rightarrow \infty \) while \(\varepsilon \) is fixed, we arrive at (5.11). \(\square \)

5.2 Proof of Lemma 4.4

For the purposes of the proof we first introduce some notation. For events \(\Gamma _1, \Gamma _2\) and a random variable \(\xi \) we denote

$$\begin{aligned} {\mathbf {E}}_{\Gamma _1}\,\xi :={\mathbf {E}}\,(\xi \mathbb {I}_{\overline{\Gamma }_1}) \text{ and } {\mathbf {P}}\,_{\Gamma _1}(\Gamma _2):= {\mathbf {P}}\,(\Gamma _2\cap \overline{\Gamma }_1). \end{aligned}$$

Let us emphasize that in these definitions we consider an expectation and a probability on the complement of \(\Gamma _1\). By \(\kappa (r),\kappa _1(r),\ldots \) we denote various functions of \(r\) such that \(\kappa (r)\rightarrow 0\) as \(r\rightarrow \infty \). By \(\kappa _\infty (r)\) we denote functions \(\kappa (r)\) such that \(\kappa (r)=o(r^{-m})\) for each \(m>0\). We write \(\kappa (r)=\kappa (r;b)\) to indicate that \(\kappa (r)\) depends on a parameter \(b\). Functions \(\kappa _\infty (r),\kappa (r),\kappa (r;b),\ldots \) never depend on \(N\) and may depend on \(\varepsilon \) only through \(r\), and we do not indicate their dependence on the dimension \(d\), power \(p\) and time \(T\). Moreover, they can change from formula to formula.

Step 1. For brevity we skip the index \(\varepsilon \). Denote by \(\widetilde{\Lambda }\) the neighbourhood of radius \(1\) of \(\Lambda \):

$$\begin{aligned} \widetilde{\Lambda }:=\{n\in {\mathcal {C}}\big | \, \text{ there } \text{ exists } k\in \Lambda \text{ satisfying } |n-k|\le 1\}. \end{aligned}$$
(5.14)

Fix \(R>0\). Set

$$\begin{aligned} \Omega _R = \left\{ \max \limits _{k\in \widetilde{\Lambda }}\sup \limits _{0\le \tau \le T} |J_k(\tau )|\vee |A^\psi _k(\tau )| \ge R\right\} . \end{aligned}$$
(5.15)

Due to Corollary 3.3 and estimate (4.6),

$$\begin{aligned} {\mathbf {P}}\,(\Omega _R)\le \kappa _\infty (R). \end{aligned}$$
(5.16)

The polynomial growth of the function \(P\) implies

$$\begin{aligned} {\mathbf {E}}\,_{\overline{\Omega }_R} \sup \limits _{\tau \in [0,T]} \left| \int \limits _0^\tau P(J(s),\psi (s))\,ds \right| \le \kappa _\infty (R), \end{aligned}$$

and the function \(\langle P \rangle (J(s))\) satisfies a similar relation. Thus it is sufficient to show that for any \(R\ge 0\)

$$\begin{aligned} {\mathcal {U}}:= {\mathbf {E}}_{\Omega _{R}}\,\sup \limits _{\tau \in [0,T]} \left| \int \limits _0^\tau P(J(s), \psi (s)) - \langle P \rangle (J(s)) \, ds \right| \rightarrow 0 \quad \text{ as }\quad \varepsilon \rightarrow 0\quad \text{ uniformly } \text{ in } N. \end{aligned}$$

For this purpose we consider a partition of the interval \([0,T]\) into subintervals of length \( \nu \) by the points

$$\begin{aligned} \tau _l=\tau _0 + l \nu , \quad 0\le l \le L,\quad L=[T/\nu ]-1, \end{aligned}$$

where the (deterministic) initial point \(\tau _0 \in [0,\nu )\) will be chosen later. Choose the diameter of the partition as

$$\begin{aligned} \nu =\varepsilon ^{7/8}. \end{aligned}$$

Denote

$$\begin{aligned} \displaystyle {}{\eta _l=\int \limits _{\tau _l}^{\tau _{l+1}} P(J(s), \psi (s)) - \langle P \rangle (J(s)) \, ds.} \end{aligned}$$

Then

$$\begin{aligned} \displaystyle {}{{\mathcal {U}}\le {\mathbf {E}}_{\Omega _{R}}\,\sum \limits _{l=0}^{L-1} |\eta _l| + \nu C(R).} \end{aligned}$$
Denote

$$\begin{aligned} Y(J)=(Y_{k}(J))_{k\in {\mathcal {C}}}:=\big (\partial _{J_k} H_0(J)\big )_{k\in {\mathcal {C}}}\in \mathbb {R}^N \quad \text{and}\quad Y(\tau ):=Y(J(\tau )). \end{aligned}$$
(5.17)

We have

$$\begin{aligned} |\eta _l|&\le \left| \int \limits _{\tau _l}^{\tau _{l+1}} P\big ( J(s), \psi (s) \big ) - P \big (J(\tau _l), \psi (\tau _l)+ \varepsilon ^{-1}Y(\tau _l)(s-\tau _l) \big ) \, ds \right| \nonumber \\&\quad + \left| \int \limits _{\tau _l}^{\tau _{l+1}} P \big (J(\tau _l), \psi (\tau _l)+ \varepsilon ^{-1}Y(\tau _l)(s-\tau _l) \big ) - \langle P \rangle \big (J(\tau _l)\big ) \, ds \right| \nonumber \\&\quad + \left| \int \limits _{\tau _l}^{\tau _{l+1}} \langle P \rangle \big (J(\tau _l)\big ) - \langle P \rangle \big (J(s)\big ) \, ds \right| =: {\mathcal {Y}}_l^1 + {\mathcal {Y}}_l^2 + {\mathcal {Y}}_l^3. \end{aligned}$$
(5.18)

Step 2. In the next proposition we will introduce “bad” events, outside of which actions are separated from zero, change slowly, and the rotation frequencies \(Y(J(\tau _l))\) are not resonant. We will choose the initial point \(\tau _0\) in such a way that probabilities of these events will be small, and it will be sufficient to estimate \({\mathcal {Y}}^1_l,{\mathcal {Y}}^2_l,{\mathcal {Y}}^3_l\) only outside these events. Recall that \(\widetilde{\Lambda }\) is defined in (5.14).

Proposition 5.1

There exist events \({\mathcal {F}}_l,\) \(0\le l\le L-1\), such that outside \({\mathcal {F}}_l\cup \Omega _R\)

$$\begin{aligned}&(i)\quad \forall k\in \Lambda \; \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} J_k(\tau )\ge \frac{1}{2} \varepsilon ^{1/24}, \quad (ii)\quad \forall k\in \widetilde{\Lambda }\; \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |J_k(\tau ) - J_k(\tau _l)| \le \nu ^{1/3}, \\&(iii)\quad \Big |\frac{1}{\varepsilon ^{-1}\nu }\int _0^{\varepsilon ^{-1}\nu } P\big (J(\tau _l),\psi (\tau _l) +Y(\tau _l)s\big )\, ds - \langle P\rangle \big (J(\tau _l)\big )\Big | \le \kappa (\varepsilon ^{-1};R), \end{aligned}$$

where the function \(\kappa \) is independent of \(l\), \(0\le l\le L-1\). There exists \(\tau _0\) such that

$$\begin{aligned} L^{-1} \sum \limits _{l=0}^{L-1} {\mathbf {P}}\,_{\Omega _R}({\mathcal {F}}_l) \le \kappa (\varepsilon ^{-1};R). \end{aligned}$$
(5.19)

Before proving this proposition we will finish the proof of the lemma. Outside \(\Omega _R\) we have \({\mathcal {Y}}_l^i\le \nu C(R)\le C_1(R)/L\). Fix \(\tau _0\) as in Proposition 5.1. Then from (5.19) we obtain

$$\begin{aligned} \sum \limits _{l=0}^{L-1}({\mathbf {E}}_{\Omega _{R}}\,\!-\!{{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,) {\mathcal {Y}}_l^i \!\le \! \frac{C(R)}{L} \sum \limits _{l=0}^{L-1} {\mathbf {P}}\,_{\Omega _R} ({\mathcal {F}}_l)\!\le \! C(R)\kappa (\varepsilon ^{-1};R) \!=\! \kappa _1(\varepsilon ^{-1}; R), \quad i\!=\!1,2,3. \end{aligned}$$

Thus, it is sufficient to show that for any \(R\ge 0\) we have

$$\begin{aligned} \displaystyle {}{\sum \limits _{l=0}^{L-1}{{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,({\mathcal {Y}}_l^1+{\mathcal {Y}}_l^2+{\mathcal {Y}}_l^3)}\rightarrow 0 \quad \text{ as }\quad \varepsilon \rightarrow 0\quad \text{ uniformly } \text{ in } N. \end{aligned}$$

Step 3. Now we will estimate each term \({\mathcal {Y}}_l^i\) outside the “bad” event \({\mathcal {F}}_l\cup \Omega _R\).

Terms \({\mathcal {Y}}_l^1.\) We will need the following

Proposition 5.2

For every \(k\in \Lambda \) and each \(0 \le l\le L-1\), we have

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R} \Big ( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |\psi _k(\tau )- \big ( \psi _k(\tau _l) + \varepsilon ^{-1} Y_k(\tau _l)(\tau -\tau _l)\big )| \ge \varepsilon ^{1/24} \Big ) \le \kappa _\infty (\varepsilon ^{-1}), \end{aligned}$$
(5.20)

where the function \(\kappa _\infty \) is independent of \(k\) and \(l\).

Proof

Let us denote the event in the left-hand side of (5.20) by \(\Gamma \). According to (4.3),

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma )&\le {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R} \left( \varepsilon ^{-1}\sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^{\tau } Y_k(s)- Y_k(\tau _l) \, ds \right| \ge \frac{1}{3}\varepsilon ^{1/24} \right) \\&\quad +\, {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}\left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^\tau \frac{A^\psi _k}{|v_k|^2} \, ds \right| \ge \frac{1}{3}\varepsilon ^{1/24} \right) \\&\quad +\, {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}\left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^\tau \frac{ iv_k}{|v_k|^2} \cdot (WdB)_k \right| \ge \frac{1}{3}\varepsilon ^{1/24} \right) \\&=: {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _1)+{\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _2)+{\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _3). \end{aligned}$$

\(\Gamma _1:\) Due to (4.5), \(Y_k(J) \in {\mathcal {L}}_{loc}(\mathbb {R}^N)\). Since it depends on \(J\) only through \(J_n\) with \(n\) satisfying \(|n-k|\le 1\), we get

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _1)\le {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}\left( \max \limits _{n:|n-k| \le 1} \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |J_n(\tau )- J_n(\tau _l)| \ge C(R) \varepsilon ^{1+1/24}\nu ^{-1} \right) . \end{aligned}$$

If \(\varepsilon \) is sufficiently small, we have \(C(R)\varepsilon ^{1+1/24}\nu ^{-1}>\nu ^{1/3}\) (recall that \(\nu =\varepsilon ^{7/8}\)). Then, due to Proposition 5.1(ii), we get

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _1) = 0 \quad \text{ for } \quad \varepsilon \ll 1. \end{aligned}$$
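Indeed, the exponent comparison behind the previous claim is elementary: since \(\nu =\varepsilon ^{7/8}\),

$$\begin{aligned} \varepsilon ^{1+1/24}\nu ^{-1}=\varepsilon ^{1+1/24-7/8}=\varepsilon ^{1/6},\qquad \nu ^{1/3}=\varepsilon ^{7/24}, \end{aligned}$$

and \(1/6=4/24<7/24\), so \(C(R)\varepsilon ^{1/6}>\varepsilon ^{7/24}\) once \(\varepsilon \) is small enough.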

\(\Gamma _2:\) Proposition 5.1(i) implies

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _2) \le {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R} \left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |A^\psi _k| \ge \frac{1}{3}\varepsilon ^{1/24+ 1/24}\nu ^{-1} \right) =0 \quad \text{ for }\quad \varepsilon \ll 1, \end{aligned}$$

since, in view of (5.15), outside \(\Omega _R\) we have \(|A^\psi _k|\le R\).

\(\Gamma _3:\) In view of (5.36), the Burkholder–Davis–Gundy inequality jointly with Theorem 3.1(3) and Proposition 5.1(i) implies that

$$\begin{aligned} {{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,\sup \limits _{ \tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^{\tau } \frac{ iv_k}{|v_k|^2} \cdot (WdB)_k \right| ^{2m}\le C(m) {{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,\left( \int \limits _{\tau _l}^{\tau _{l+1}} \frac{1}{|v_k|^2}\, ds \right) ^m \le C(m)\nu ^m\varepsilon ^{-m/24}, \end{aligned}$$

for any \(m>0\). From Chebyshev’s inequality it follows that

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _3) \le C(m) \nu ^m\varepsilon ^{-m(1/24+ 2/24)} \quad \text{ for } \text{ any } m>0. \end{aligned}$$
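Indeed, since \(\nu =\varepsilon ^{7/8}\), this bound decays faster than any fixed power of \(\varepsilon \):

$$\begin{aligned} \nu ^m\varepsilon ^{-m(1/24+2/24)}=\varepsilon ^{7m/8-m/8}=\varepsilon ^{3m/4}, \end{aligned}$$

where \(m>0\) may be chosen arbitrarily large.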

Thus, \({\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}(\Gamma _3) = \kappa _\infty (\varepsilon ^{-1}).\) \(\square \)

Estimates (i) and (ii) of Proposition 5.1 imply that outside \({\mathcal {F}}_l\cup \Omega _R\), for any \(k\in \Lambda \)

$$\begin{aligned} \sup \limits _{\tau _l\le \tau \le \tau _{l+1}}\big ||v_k(\tau )|-|v_k(\tau _l)|\big |\le \frac{\sqrt{2} |J_k(\tau )-J_k(\tau _l)|}{\sqrt{J_k(\tau )}+\sqrt{J_k(\tau _l)}}\le \nu ^{1/3}\varepsilon ^{-1/48}=\varepsilon ^{13/48}. \end{aligned}$$
(5.21)
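Here the last equality in (5.21) is again exponent arithmetic: \(\nu ^{1/3}\varepsilon ^{-1/48}=\varepsilon ^{14/48-1/48}=\varepsilon ^{13/48}\), since \(\nu =\varepsilon ^{7/8}\).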

Since \(P\in {\mathcal {L}}_{loc}(\mathbb {C}^N)\), Proposition 5.2 and (5.21) imply that

$$\begin{aligned} {\mathbf {P}}\,_{{\mathcal {F}}_l\cup \Omega _R}\big ({\mathcal {Y}}_l^1\ge \nu C(R)(\varepsilon ^{1/24}+\varepsilon ^{13/48})\big )\le \kappa _\infty (\varepsilon ^{-1}). \end{aligned}$$

Then we get

$$\begin{aligned} {{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,{\mathcal {Y}}_l^1\le \nu C(R)(\varepsilon ^{1/24}+\varepsilon ^{13/48}) +\nu C(R)\kappa _\infty (\varepsilon ^{-1})= \nu \kappa (\varepsilon ^{-1};R). \end{aligned}$$

Terms \({\mathcal {Y}}_l^2\). Put \(\hat{s}:=\varepsilon ^{-1}(s-\tau _l)\). Then Proposition 5.1(iii) implies that outside \({\mathcal {F}}_l\cup \Omega _R \)

$$\begin{aligned} {\mathcal {Y}}_l^2=\nu \left| \frac{1}{\varepsilon ^{-1}\nu }\int \limits _{0}^{\varepsilon ^{-1}\nu } P\big (J(\tau _l), \psi (\tau _l)+Y(\tau _l)\hat{s} \big ) \, d\hat{s} - \langle P \rangle \big ( J(\tau _l)\big )\right| \le \nu \kappa (\varepsilon ^{-1};R). \end{aligned}$$
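The identity above is nothing but the change of variables \(s=\tau _l+\varepsilon \hat{s}\) in the time average over \([\tau _l,\tau _{l+1}]\) (recall that \(\tau _{l+1}-\tau _l=\nu \)): for a generic integrand \(h\),

$$\begin{aligned} \frac{1}{\nu }\int \limits _{\tau _l}^{\tau _{l+1}}h(s)\,ds=\frac{1}{\varepsilon ^{-1}\nu }\int \limits _0^{\varepsilon ^{-1}\nu }h(\tau _l+\varepsilon \hat{s})\,d\hat{s}. \end{aligned}$$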

Terms \({\mathcal {Y}}_l^3\). Proposition 5.9(i) jointly with (5.21) implies that outside \({\mathcal {F}}_l\cup \Omega _R\) we have

$$\begin{aligned} {\mathcal {Y}}_l^3 \le \nu C(R) \varepsilon ^{13/48}. \end{aligned}$$

Step 4. Summing over \(l\), taking the expectation and noting that \(L\nu \le T\), we get

$$\begin{aligned} \sum \limits _{l=0}^{L-1}{{\mathbf {E}}}_{{\mathcal {F}}_l\cup \Omega _R}\,\left( {\mathcal {Y}}_l^1+{\mathcal {Y}}_l^2+{\mathcal {Y}}_l^3\right) \le L\left( \nu \kappa (\varepsilon ^{-1}; R)+ \nu C(R)\varepsilon ^{13/48}\right) \rightarrow 0 \text{ as } \varepsilon \rightarrow 0, \end{aligned}$$

uniformly in \(N\). The proof of the lemma is complete. \(\square \)

Proof of Proposition 5.1

We will construct the set \({\mathcal {F}}_l\) as a union of three parts. The first two are \({\mathcal {E}}_l:=\cup _{k\in \Lambda }{\mathcal {E}}_l^k\) and \(Q_l:=\cup _{k\in \widetilde{\Lambda }} Q_l^k\), where

$$\begin{aligned} {\mathcal {E}}^k_l:=\{ J_k (\tau _l) \le \varepsilon ^{1/24} \}, \quad Q^k_l:=\left\{ \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |J_k(\tau ) - J_k(\tau _l)| \ge \nu ^{1/3} \right\} . \end{aligned}$$
(5.22)

Outside \(Q_l\) we have (ii) and, if \(\varepsilon \) is small, outside \({\mathcal {E}}_l\cup Q_l\) we get (i): for every \(k\in \Lambda \)

$$\begin{aligned} \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} J_k(\tau )\ge \varepsilon ^{1/24} - \nu ^{1/3}\ge \frac{1}{2}\varepsilon ^{1/24}, \quad \text{ if } \varepsilon \ll 1. \end{aligned}$$
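Here the first inequality uses that outside \({\mathcal {E}}_l\) we have \(J_k(\tau _l)>\varepsilon ^{1/24}\), combined with item (ii); the second holds since \(\nu ^{1/3}=\varepsilon ^{7/24}=\varepsilon ^{1/4}\varepsilon ^{1/24}\le \frac{1}{2}\varepsilon ^{1/24}\) once \(\varepsilon ^{1/4}\le 1/2\).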

Now we will construct the event \(\Omega _l^{\varepsilon ,R}\), which will form the third part of \({\mathcal {F}}_l\). Let us introduce the following notation:

$$\begin{aligned} \text{ for } \text{ a } \text{ vector } Z=(Z_j)_{j\in {\mathcal {C}}}\in \mathbb {R}^N \text{ we } \text{ denote } Z^\Lambda :=(Z_j)_{j\in \Lambda }\in \mathbb {R}^M. \end{aligned}$$

For any fixed \(J\in \mathbb {R}^N_{+0}\) the function \(P(J,\psi )\) is Lipschitz-continuous in the angles \(\psi \in \mathbb {T}^N\). From [33] it follows that the Fourier series of a Lipschitz-continuous function of \(\psi \in \mathbb {T}^N\) converges uniformly in \(\psi \). Then, using a standard method (see, e.g., [28]), we obtain that for every \(\delta >0\) and \(R'>0\) there exists a Borel set \(E^{\delta ,R'}\subset \{ x=(x_k)_{k\in \Lambda }\in \mathbb {R}^M: \Vert x\Vert _{\mathbb {R}^M}\le R'\}\) with Lebesgue measure \(|E^{\delta ,R'}|\le \delta \), such that for any \(Z=(Z_k)_{k\in {\mathcal {C}}}\in \mathbb {R}^N\) satisfying \(Z^\Lambda \notin E^{\delta ,R'}\) and \( \Vert Z^\Lambda \Vert _{\mathbb {R}^M}\le R',\) we have

$$\begin{aligned} \left| \frac{1}{t}\int _0^t P(J, \psi + Z s) \, ds - \langle P\rangle (J) \right| \le \kappa (t;J,\delta , R'), \end{aligned}$$
(5.23)

for all \(\psi \in \mathbb {T}^N\). Moreover, since \(P\in {\mathcal {L}}_{loc}(\mathbb {C}^N)\), we can choose the function \(\kappa \) to be independent of \(J\) for \(J\in B_R^\Lambda \), where

$$\begin{aligned} B_R^\Lambda :=\left\{ J=(J_k)_{k\in {\mathcal {C}}}\in \mathbb {R}^N_{+0}:\,\max \limits _{k\in \Lambda } J_k\le R\right\} , \end{aligned}$$

i.e. \(\kappa =\kappa (t;R,\delta ,R').\) The rate of convergence in (5.23) depends on \(\delta \). Choose a function \(\delta =\delta (\varepsilon )\) such that \( \delta (\varepsilon )\rightarrow 0\) as \(\varepsilon \rightarrow 0 \) so slowly that

$$\begin{aligned} \left| \frac{1}{\varepsilon ^{-1}\nu }\int _0^{\varepsilon ^{-1}\nu } P(J, \psi + Z s) \, ds - \langle P\rangle (J) \right| \le \kappa (\varepsilon ^{-1};R, R') \end{aligned}$$
(5.24)

for all \( J\in B^\Lambda _R,\,\psi \in \mathbb {T}^N\) and \(Z\) as above.

Let us choose \(R'=R'(R)=\sup \limits _{\overline{\Omega }_R} \sup \limits _{0 \le \tau \le T}\Vert Y^\Lambda (\tau )\Vert _{\mathbb {R}^M}\). Let

$$\begin{aligned} \Omega _l^{\varepsilon , R}:=\{ Y^\Lambda (\tau _l)\in E^{\delta (\varepsilon ),R'(R)} \}. \end{aligned}$$

Then outside \(\Omega _l^{\varepsilon , R}\cup \Omega _R\) we get \(Y^\Lambda (\tau _l)\notin E^{\delta (\varepsilon ),R'(R)}\) and \(\Vert Y^\Lambda (\tau _l)\Vert _{\mathbb {R}^M}\le R'(R).\) Since outside \(\Omega _R\) we have \(J(\tau _l)\in B^\Lambda _R\), due to (5.24) we get (iii) outside \(\Omega _l^{\varepsilon ,R}\cup \Omega _R\).

Let \({\mathcal {F}}_l:={\mathcal {E}}_l\cup Q_l \cup \Omega _l^{\varepsilon ,R}\). Then outside \({\mathcal {F}}_l\cup \Omega _R\) items (i), (ii) and (iii) hold true.

Now we will estimate the probabilities of \({\mathcal {E}}_l,Q_l\) and \(\Omega _l^{\varepsilon ,R}\).

Proposition 5.3

(i) We have \({\mathbf {P}}\,(Q_l)\le \kappa _\infty (\nu ^{-1})\), where \(\kappa _\infty \) is independent of \(l\).

(ii) There exists an initial point \(\tau _0\in [0,\nu )\) such that

$$\begin{aligned} L^{-1} \sum \limits _{l=0}^{L-1} {\mathbf {P}}\,_{\Omega _R}({\mathcal {E}}_l\cup \Omega _l^{\varepsilon ,R}) = \kappa (\varepsilon ^{-1}; R). \end{aligned}$$

Proposition 5.3 implies (5.19):

$$\begin{aligned} L^{-1} \sum \limits _{l=0}^{L-1} {\mathbf {P}}\,_{\Omega _R}({\mathcal {F}}_l) \le \kappa \left( \varepsilon ^{-1};R\right) + \kappa _\infty \left( \nu ^{-1}\right) =\kappa _1\left( \varepsilon ^{-1};R\right) . \end{aligned}$$

\(\square \)

Proof of Proposition 5.3

(i) Let us take \(\rho >\sqrt{\nu }\). Then, due to (4.2), for any \(k\in \widetilde{\Lambda }\)

$$\begin{aligned} {\mathbf {P}}\,\left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} |J_k(\tau ) - J_k(\tau _l)| \ge \rho \right)&\le {\mathbf {P}}\,\left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^{\tau } A^J_k\, ds \right| \ge \rho /2\right) \\&\quad + {\mathbf {P}}\,\left( \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^{\tau } v_k \cdot (WdB)_k \right| \ge \rho /2\right) \\&=: {\mathbf {P}}\,(\Gamma _1) + {\mathbf {P}}\,(\Gamma _2). \end{aligned}$$

Due to estimate (4.6), we have

$$\begin{aligned} {\mathbf {P}}\,(\Gamma _1) \le {\mathbf {P}}\,\left( \nu \sup \limits _{\tau _l \le \tau \le \tau _{l+1}} | A_k^J| \ge \rho /2\right) \le \kappa _\infty \left( \nu ^{-1}\right) . \end{aligned}$$

In view of (5.36), the Burkholder–Davis–Gundy inequality jointly with (4.6) implies

$$\begin{aligned} {\mathbf {E}}\,\sup \limits _{\tau _l \le \tau \le \tau _{l+1}} \left| \int \limits _{\tau _l}^{\tau } v_k \cdot (WdB)_k \right| ^{2m} \le C(m) {\mathbf {E}}\,\left( \int \limits _{\tau _l}^{\tau _{l+1}} S_{kk}^J \, ds \right) ^m \le C_1(m)\nu ^m, \end{aligned}$$
(5.25)

for every \(m >0\). Consequently, \({\mathbf {P}}\,(\Gamma _2) \le C(m)\nu ^m \rho ^{-2m}\). Choosing \(\rho =\nu ^{1/3}\), we get \({\mathbf {P}}\,(\Gamma _2)\le C(m)\nu ^{m/3}\) for every \(m>0\), so that \({\mathbf {P}}\,(\Gamma _2)\le \kappa _\infty (\nu ^{-1})\). It remains to sum the probabilities over \(k\in \widetilde{\Lambda }\).

(ii) Denote \({\mathcal {A}}(\tau ):=({\mathcal {E}}\cup \Omega ^{\varepsilon ,R})(\tau )\), where the last set is defined similarly to \({\mathcal {E}}_l\cup \Omega _l^{\varepsilon ,R}\) but at time \(\tau \) instead of \(\tau _l\). Recall that \(Y^\Lambda (J)\) depends on \(J\) only through \(J^{\widetilde{\Lambda }}:=(J_k)_{k\in \widetilde{\Lambda }}\). Denote by \(\widetilde{M}\) the number of nodes in \(\widetilde{\Lambda }\) and let

$$\begin{aligned} E^{\varepsilon ,R}_J:=\big \{J^{\widetilde{\Lambda }} \in \mathbb {R}^{\widetilde{M}}_{+0} : Y^\Lambda (J) \in E^{\delta (\varepsilon ),R'(R)} \text{ and } J_k\le R\quad \forall k\in \widetilde{\Lambda }\big \}. \end{aligned}$$

In view of assumption HF, which states that the functions \(f'_j\) have only isolated zeros, it is not difficult to show that the convergence \(|E^{\delta (\varepsilon ),R'(R)}|\rightarrow 0\) as \(\varepsilon \rightarrow 0\) implies \(|E^{\varepsilon ,R}_J|\rightarrow 0\) as \(\varepsilon \rightarrow 0\). Note that \(\overline{\Omega }_R\cap \Omega ^{\varepsilon , R}(\tau )\subset \{J^{\widetilde{\Lambda }}(\tau )\in E^{\varepsilon ,R}_J\}\). Then Lemma 4.3 implies

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(\tau )\big ) \, d\tau \le \int \limits _0^T {\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {E}}(\tau )\big ) \, d\tau + \int \limits _0^T {\mathbf {P}}\,\big (J^{\widetilde{\Lambda }}(\tau )\in E^{\varepsilon ,R}_J\big ) \, d\tau \rightarrow 0 \quad \text{ as } \varepsilon \rightarrow 0, \end{aligned}$$

uniformly in \(N\). It remains to note that there exists a deterministic point \(\tau _0\in [0,\nu )\) such that

$$\begin{aligned} \int \limits _0^T {\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(\tau )\big ) \, d\tau&\ge \int \limits _0^\nu \sum \limits _{l=0}^{L-1}{\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(l\nu +s)\big )\,ds \\&\ge \nu \sum \limits _{l=0}^{L-1}{\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(l\nu +\tau _0)\big ) \ge T(L+1)^{-1} \sum \limits _{l=0}^{L-1} {\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(\tau _l)\big ). \end{aligned}$$
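The existence of \(\tau _0\) is a mean-value argument: the function \(s\mapsto \sum _{l}{\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(l\nu +s)\big )\) is measurable on \([0,\nu )\), so at some point \(\tau _0\) it does not exceed its average value,

$$\begin{aligned} \sum \limits _{l=0}^{L-1}{\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(l\nu +\tau _0)\big )\le \frac{1}{\nu }\int \limits _0^\nu \sum \limits _{l=0}^{L-1}{\mathbf {P}}\,_{\Omega _R}\big ({\mathcal {A}}(l\nu +s)\big )\,ds, \end{aligned}$$

which is the second inequality in the display; the last one uses the identification \(\tau _l=l\nu +\tau _0\) together with \(\nu \ge T(L+1)^{-1}\) (assuming, as in Step 4, that \(L\nu \le T<(L+1)\nu \)).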

\(\square \)

5.3 Proof of Theorem 3.1

The proof of the theorem is rather long and technical, so we only sketch it; for the complete proof see [14], Section 6. Let

$$\begin{aligned} \widetilde{F}(J):=\frac{1}{2}\sum \limits _{j\in {\mathcal {C}}} F_j (|v_j|^2)\quad \text{ and }\quad \widetilde{G}(J,\psi ):= \frac{1}{4}\sum \limits _{|j-k|=1} G(|v_j-v_{k}|^2). \end{aligned}$$

For functions \(h_1,h_2:\,(J,\psi )\in \mathbb {R}_{+0}^N\times \mathbb {T}^N\mapsto \mathbb {R}\), introduce their Poisson bracket as

$$\begin{aligned} \{h_1,h_2 \}=\sum \limits _{j\in {\mathcal {C}}}\frac{\partial h_1}{\partial \psi _j}\frac{\partial h_2}{\partial J_j}-\frac{\partial h_2}{\partial \psi _j}\frac{\partial h_1}{\partial J_j}. \end{aligned}$$

We seek the canonical transformation as the time-one map \(\Gamma =X^1_{\sqrt{\varepsilon }{\Phi }}\) of the Hamiltonian flow \(X^s_{\sqrt{\varepsilon }{\Phi }}\) generated by the Hamiltonian \(\sqrt{\varepsilon }{\Phi }\). Taylor expansion gives

$$\begin{aligned} {\mathcal {H}}^\varepsilon (J,\psi ) = H^\varepsilon \circ \Gamma (J,\psi ) = \widetilde{F}(J) + \sqrt{\varepsilon } \Big ( \widetilde{G}(J,\psi ) + \big \{ \widetilde{F}, \Phi \big \}(J,\psi ) \Big ) + O(\varepsilon ). \end{aligned}$$
(5.26)
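The expansion (5.26) is the standard Lie-series computation: with the sign convention \(\dot{\psi }=\partial _J(\sqrt{\varepsilon }\Phi )\), \(\dot{J}=-\partial _\psi (\sqrt{\varepsilon }\Phi )\) for the flow \(X^s_{\sqrt{\varepsilon }\Phi }\), every smooth function \(h\) satisfies \(\frac{d}{ds}\big (h\circ X^s_{\sqrt{\varepsilon }\Phi }\big )=\sqrt{\varepsilon }\,\{h,\Phi \}\circ X^s_{\sqrt{\varepsilon }\Phi }\), whence

$$\begin{aligned} h\circ \Gamma =h+\sqrt{\varepsilon }\,\{h,\Phi \}+\frac{\varepsilon }{2}\big \{\{h,\Phi \},\Phi \big \}+\cdots . \end{aligned}$$

Applying this to \(H^\varepsilon \), which (as the right-hand side of (5.26) indicates) has the form \(\widetilde{F}+\sqrt{\varepsilon }\,\widetilde{G}\) in these variables, gives the displayed expansion.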

We wish to choose the function \(\Phi \) in such a way that the following homological equation holds:

$$\begin{aligned} \widetilde{G}(J,\psi ) + \big \{ \widetilde{F}, \Phi \big \}(J,\psi ) = \langle \widetilde{G}\rangle (J). \end{aligned}$$
(5.27)

The potential \(G\) depends only on the difference of angles:

$$\begin{aligned} G(|v_{j}-v_{n}|^2)=G\left( |\sqrt{2J_{j}} e^{i{(\psi _{j}-\psi _{n})}} - \sqrt{2J_{n}}|^2\right) =:G(J_{j},J_n,\psi _{j}-\psi _{n}). \end{aligned}$$
(5.28)

Using the Fourier expansion, we see that the function

$$\begin{aligned} \Phi =\sum \limits _{|j-n|=1} \Phi _{jn}, \quad \text{ where }\quad \Phi _{jn}:=\frac{1}{4}\frac{\int \limits _0^{\psi _j-\psi _n}G^0(J_j,J_n,\theta )\,d\theta }{f_{j}-f_{n}}\quad \text{ and }\quad G^0:=G-\langle G\rangle , \end{aligned}$$
(5.29)

satisfies (5.27). Due to the alternating spins condition HF, the denominator in (5.29) is bounded away from zero.
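Let us sketch why (5.27) holds. Since \(\widetilde{F}\) is independent of \(\psi \), writing \(f_j\) for \(\partial \widetilde{F}/\partial J_j\) (which is what the denominator in (5.29) suggests), we have

$$\begin{aligned} \big \{\widetilde{F},\Phi _{jn}\big \}=-f_j\,\partial _{\psi _j}\Phi _{jn}-f_n\,\partial _{\psi _n}\Phi _{jn} =-\frac{f_j-f_n}{4}\cdot \frac{G^0(J_j,J_n,\psi _j-\psi _n)}{f_j-f_n} =-\frac{1}{4}\,G^0, \end{aligned}$$

and summing over all pairs \(|j-n|=1\) gives \(\{\widetilde{F},\Phi \}=-(\widetilde{G}-\langle \widetilde{G}\rangle )\), which is exactly (5.27).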

The main ingredient of the remaining proof is the following proposition, which establishes, in particular, the \(C^2\)-smoothness of the transformation \(u\mapsto v\).

Proposition 5.4

The function \(\Phi (v)\) is \(C^3\)-smooth. Let \(a,b,c\in \{v,\overline{v}\}\). Then for every \(k,l,m\in {\mathcal {C}}\), satisfying the relation \(| k-l | \le 1\) and \(l=m\), we have

$$\begin{aligned} \left| \frac{\partial \Phi }{\partial \psi _{k} }\right| , \left| \frac{\partial \Phi }{\partial a_{k} }\right| , \left| \frac{\partial ^2\Phi }{\partial a_{k}\partial b_{l}}\right| , \left| \frac{\partial ^3\Phi }{\partial a_{k}\partial b_{l}\partial c_{m}}\right| \le C. \end{aligned}$$

For all other \(k,l,m \in {\mathcal {C}}\) the second and third derivatives are equal to zero.

Taking the next order of the Taylor expansion in (5.26) and using (5.27), we get (3.3):

$$\begin{aligned} {\mathcal {H}}^\varepsilon (J,\psi )&=\widetilde{F}(J) + \sqrt{\varepsilon }\langle \widetilde{G} \rangle (J) + \frac{\varepsilon }{2} \big \{ \langle \widetilde{G} \rangle + \widetilde{G},\Phi \big \} \nonumber \\&\quad +\frac{\varepsilon \sqrt{\varepsilon }}{2} \left( \big \{\widetilde{G}, \Phi \big \}_2 + \int _0^1(1-s)^2 \big \{ H^\varepsilon ,\Phi \big \}_3 \circ X_{\sqrt{\varepsilon }\Phi }^s \,ds \right) \nonumber \\&=: H^\varepsilon _0(J) + \varepsilon H_2(J,\psi ) + \varepsilon \sqrt{\varepsilon }H^\varepsilon _> (J,\psi ), \end{aligned}$$
(5.30)

where \(\displaystyle { \{h,\Phi \}_k:=\{\ldots \{\{h,\Phi \},\Phi \},\ldots ,\Phi \}}\) denotes the iterated Poisson bracket with \(\Phi \) taken \(k\) times. Proposition 5.9 implies that \(H^\varepsilon _0(v)\) is \(C^4\)-smooth, while Proposition 5.4 ensures that \({\mathcal {H}}^\varepsilon (v)\) and \(H_2(v)\) are \(C^2\)-smooth. Then, due to (5.30), \(H^\varepsilon _>(v)\) is also \(C^2\)-smooth.
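For the reader's convenience, the middle term in (5.30) arises from combining the first- and second-order brackets via (5.27): expanding \(h\circ \Gamma \) to second order with \(h=H^\varepsilon =\widetilde{F}+\sqrt{\varepsilon }\,\widetilde{G}\),

$$\begin{aligned} \varepsilon \big \{\widetilde{G},\Phi \big \}+\frac{\varepsilon }{2}\big \{\widetilde{F},\Phi \big \}_2 =\varepsilon \big \{\widetilde{G},\Phi \big \}+\frac{\varepsilon }{2}\big \{\langle \widetilde{G}\rangle -\widetilde{G},\Phi \big \} =\frac{\varepsilon }{2}\big \{\langle \widetilde{G}\rangle +\widetilde{G},\Phi \big \}. \end{aligned}$$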

Let \(v^s:=X^s_{-\sqrt{\varepsilon }{\Phi }} (u)\) and \(J^s:=J(v^s)\) (in particular, \(v=X^1_{-\sqrt{\varepsilon }{\Phi }} (u)=v^1\) and \(J=J(v)=J^1\)). Then we have

$$\begin{aligned} v^s_j=u_j - \sqrt{\varepsilon }\int \limits _0^s i\nabla _{j}\Phi \big |_{v^\tau } \, d\tau , \quad J^s_j=I_j + \sqrt{\varepsilon }\int \limits _0^s \partial _{\psi _j}\Phi \big |_{v^\tau } \, d\tau ,\quad 0\le s \le 1. \end{aligned}$$
(5.31)
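These identities are obtained by integrating the flow equations: in the complex coordinates used below in (5.32), the flow \(X^s_{-\sqrt{\varepsilon }\Phi }\) reads \(\dot{v}^s=-\sqrt{\varepsilon }\,i\nabla \Phi (v^s)\), which gives the first formula; for the second,

$$\begin{aligned} \frac{d}{ds}J^s_j=v^s_j\cdot \dot{v}^s_j=\sqrt{\varepsilon }\,\big (-i\nabla _j\Phi \big )\cdot v^s_j=\sqrt{\varepsilon }\,\partial _{\psi _j}\Phi \big |_{v^s}, \end{aligned}$$

where the last step uses the identity \((i\nabla _k h)\cdot v_k=-\partial _{\psi _k}h\) recalled at the end of this section.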

Proposition 5.4 jointly with (5.31) immediately implies item 4 of the theorem.

Denote by \(\mathrm{Id }\) the \(N\times N\) identity matrix, by \(\displaystyle {}{\frac{\partial v^s}{\partial u}}\) the matrix of the same size with entries \(\displaystyle {}{\Big (\frac{\partial v^s}{\partial u}\Big )_{jk}=\frac{\partial v^s_j}{\partial u_k},\;j,k\in {\mathcal {C}}}\), and define the matrix \(\displaystyle {}{\frac{\partial v^s}{\partial \overline{u}}}\) in the same way. Using (5.31), we obtain the following corollary of Proposition 5.4.

Corollary 5.5

For every \(j\in {\mathcal {C}}\), \(q>0\) and \(0\le s \le 1\), we have

$$\begin{aligned} \Big \Vert \frac{\partial v^s}{\partial u}-\mathrm{Id }\Big \Vert _{j,q},\; \Big \Vert \frac{\partial v^s}{\partial \overline{u}}\Big \Vert _{j,q} \le C\sqrt{\varepsilon }, \end{aligned}$$

where \(\Vert \cdot \Vert _{j,q}\) denotes the operator norm induced by the vector norm \(\Vert \cdot \Vert _{j,q}\) on \(\mathbb {C}^N\).

Applying Itô's formula in complex coordinates to \(v\), we get

$$\begin{aligned} \dot{v} = i\nabla {\mathcal {H}}^\varepsilon (v) + \varepsilon \frac{\partial v}{\partial u} g(u) + \varepsilon \frac{\partial v}{\partial \overline{u}} \overline{g}(u) + \varepsilon \sum \limits _{k\in {\mathcal {C}}} {\mathcal {T}}_k \frac{\partial ^2 v}{\partial u_k \partial \overline{u}_k} + \sqrt{\varepsilon } W^\varepsilon \dot{B}, \end{aligned}$$
(5.32)

where \(B=(\beta ,\overline{\beta })^T\) and the dispersion matrix \(W^\varepsilon \) has size \(N \times 2N\) and consists of two \(N\times N\) blocks, \(W^\varepsilon =(W^{\varepsilon 1},W^{\varepsilon 2})\), where \(\displaystyle {}{ W^{\varepsilon 1}:= \frac{\partial v}{\partial u}\text{ diag }\,(\sqrt{{\mathcal {T}}_j})}\) and \(\displaystyle {}{W^{\varepsilon 2}:=\frac{\partial v}{\partial \overline{u}}\text{ diag }\,(\sqrt{{\mathcal {T}}_j})}\).

In view of Theorem 3.1(4), Corollary 5.5, relation (5.31) and Proposition 5.4, we have

$$\begin{aligned} g(u)-g(v),\; \frac{\partial v}{\partial u} - \mathrm{Id },\; \frac{\partial v}{\partial \overline{u}}, \;\sum \limits _{k\in {\mathcal {C}}} {\mathcal {T}}_k \frac{\partial ^2 v}{\partial u_k \partial \overline{u}_k} \thicksim \sqrt{\varepsilon }. \end{aligned}$$

Denote

$$\begin{aligned} r^\varepsilon := i\nabla H^\varepsilon _>+ \varepsilon ^{-1/2}\left( \big (g(u)-g(v)\big ) + \left( \frac{\partial v}{\partial u}- \mathrm{Id }\right) g(u) + \frac{\partial v}{\partial \overline{u}}\overline{g}(u) + \sum \limits _{k\in {\mathcal {C}}} {\mathcal {T}}_k \frac{\partial ^2 v}{\partial u_k \partial \overline{u}_k} \right) . \end{aligned}$$

Substituting this relation into (5.32), we arrive at (3.5).

Item 3 of the theorem follows from the definition of the matrices \(W^{\varepsilon 1},W^{\varepsilon 2}\), Corollary 5.5 and the following fact: if a matrix \(A=(A_{kl})_{k,l\in {\mathcal {C}}}\) satisfies \(\Vert A\Vert _{j,q}\le C_0\) for some \(q>0\) with the same constant \(C_0\) for all \(j\in {\mathcal {C}}\), then \(|A_{kl}|\le C_0\) for every \(k,l\in {\mathcal {C}}\). Item 2 follows from assumptions HG, Hg, Proposition 5.4 and Corollary 5.5 by a tedious computation. The proof of item 1 is based on the following simple proposition.

Proposition 5.6

Let the function \(h(\psi )=h\big ((\psi _k)_{k\in {\mathcal {C}}}\big )\) be \(C^1\)-smooth and depend on \(\psi \) only through the differences of neighbouring components: \(h((\psi _j)_{j\in {\mathcal {C}}})=h((\theta _{kn})_{k,n:|k-n|=1}),\) where \(\theta _{kn}=\psi _k-\psi _n\). Then

$$\begin{aligned} \left| \sum \limits _{k\in {\mathcal {C}}} \gamma ^{|j-k|}\partial _{\psi _k} h \right| \le 2(1-\gamma )\sum \limits _{|k-n|=1} \gamma ^{|j-k|} |\partial _{\theta _{kn}} h|. \end{aligned}$$
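A sketch of the proof, assuming \(\gamma \ge 1/2\) (so that \(\gamma ^{-1}\le 2\)): by the chain rule \(\partial _{\psi _k}h=\sum _{n:|n-k|=1}\big (\partial _{\theta _{kn}}h-\partial _{\theta _{nk}}h\big )\), and after relabelling the second sum,

$$\begin{aligned} \sum \limits _{k\in {\mathcal {C}}}\gamma ^{|j-k|}\partial _{\psi _k}h=\sum \limits _{|k-n|=1}\big (\gamma ^{|j-k|}-\gamma ^{|j-n|}\big )\,\partial _{\theta _{kn}}h. \end{aligned}$$

Since \(|k-n|=1\) implies \(\big ||j-k|-|j-n|\big |\le 1\), we have \(|\gamma ^{|j-k|}-\gamma ^{|j-n|}|\le (1-\gamma )\gamma ^{|j-k|-1}\le 2(1-\gamma )\gamma ^{|j-k|}\), and the claim follows.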

Since for any real \(C^1\)-smooth function \(h(v)\) we have \((i\nabla _k h)\cdot v_k= -\partial _{\psi _k} h\), the inequality of item 1 of the theorem is equivalent to \(|\sum \limits _{k\in {\mathcal {C}}}\gamma ^{|j-k|}\partial _{\psi _k} H_2|\le (1-\gamma )C \Vert v\Vert _{j,p}^p+C(\gamma )\). Noting that \(H_2\) satisfies the assumptions of Proposition 5.6, and using that \(\partial _{\theta _{kn}} H_2\) depends only on those \(v_m\) with \(|m-k|\wedge |m-n|\le 1\), we get the desired estimate by direct computation. \(\square \)