1 Introduction

1.1 Overview

In a variety of instances partial differential equations are a faithful approximation—in fact, a law of large numbers—for particle systems in suitable limits. This is notably the case for stochastic interacting particle systems, for which the mathematical theory has gone very far [24]. The closeness between the particle system and PDE is typically proven in the limit of systems with a large number \(N\) of particles or for infinite systems under a space rescaling involving a large parameter \(N\)—for example a spin or particle system on \({\mathbb {Z}} ^d\) and the lattice spacing scaled down to \(\frac{1}{N}\)—and up to a time horizon which may depend on \(N\). Of course the question of capturing the finite \(N\) corrections has been taken up too, and the related central limit theorems as well as large deviations principles have been established (see [24] and references therein). Sizable deviations from the law of large numbers, not just small fluctuations or rare events, can be observed beyond the time horizon for which the PDE behavior has been established and these phenomena can be very relevant.

The first examples that come to mind are the ones in which the PDE has multiple isolated stable stationary points: metastability phenomena happen on exponentially long time scales [29]. Deviations on substantially shorter time scales can also take place: this is for example the case of the noise-induced escape from stationary unstable solutions, which is particularly relevant in plenty of situations; for example, for the model in [30, Ch. 5] phase segregation originates from homogeneous initial data via this mechanism, on times proportional to the logarithm of the size of the system. The logarithmic factor is directly tied to the exponential instability of the stationary solution (see [30] for more literature on this phenomenon). Of course, this type of phenomenon also occurs in finite dimensional random dynamical systems, in the limit of small noise, but we restrict this quick discussion to infinite dimensional models and PDEs.

In the case on which we focus the deviations also happen on time scales substantially shorter than the exponential ones, but the mechanism of the phenomenon does not involve exponential instabilities. In the system we consider there are multiple stationary solutions, but they are not (or, at least, not all) isolated, and hence they are not stable in the standard sense. Deviations from the PDE behavior happen as a direct result of the cumulative effect of the fluctuations. More precisely, this phenomenon is due to the presence of a whole stable manifold of stationary solutions: the deterministic limit dynamics has no damping effect along the direction tangential to the manifold, so, for the finite size system, the weak noise does have a macroscopic effect on a suitable time scale that depends on how large the system is. We review the mathematical literature on this type of phenomena in Sect. 1.6, after stating our results.

Apart from the general interest in deviations from the PDE behavior, the model we consider—mean-field plane rotators—is a fundamental one in mathematical physics and, more generally, it is the basic model for synchronization phenomena. Our results provide a sharp description of the long time dynamics of this model for general initial data.

1.2 The model

Consider the set of ordinary stochastic differential equations

$$\begin{aligned} \mathrm{d}\varphi _t^{j,N} = \frac{1}{N} \sum _{i=1}^N J \left( \varphi _t^{j,N}-\varphi _t^{i,N}\right) \mathrm{d}t + \mathrm{d}W^j_t. \end{aligned}$$
(1.1)

with \(j=1, 2, \ldots , N\), where \(\{W^j\}_{j=1,2 ,\ldots }\) is an IID collection of standard Brownian motions and \(J(\cdot )= -K \sin (\cdot )\). With abuse of notation, when writing \(\varphi _t^{j,N}\) we will actually mean \(\varphi _t^{j,N}\text {mod}(2\pi )\), and for us (1.1), supplemented with an (arbitrary) initial condition, gives rise to a diffusion process on \({\mathbb {S}} ^N\), where \({\mathbb {S}} \) is the circle \({\mathbb {R}} /(2\pi {\mathbb {Z}} )\).
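For readers who wish to experiment with the model, a minimal Euler–Maruyama discretization of (1.1) can be written in a few lines of Python. The function and all parameter values below (\(N\), \(K\), the time step, the horizon) are our own illustrative choices, not quantities taken from the analysis; the drift is evaluated through the empirical first Fourier mode, which is equivalent to the pairwise sum in (1.1).

```python
import numpy as np

def simulate_rotators(N=200, K=2.0, t_final=50.0, dt=0.01, seed=0):
    """Euler-Maruyama sketch of (1.1): dphi_j = (1/N) sum_i J(phi_j - phi_i) dt + dW_j,
    with J(x) = -K sin(x); angles are kept modulo 2*pi."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=N)   # arbitrary initial condition
    for _ in range(int(t_final / dt)):
        z = np.exp(1j * phi).mean()               # (1/N) sum_i exp(i phi_i)
        # (1/N) sum_i -K sin(phi_j - phi_i) = -K Im( exp(i phi_j) * conj(z) )
        drift = -K * np.imag(np.exp(1j * phi) * np.conj(z))
        phi = (phi + drift * dt + np.sqrt(dt) * rng.standard_normal(N)) % (2.0 * np.pi)
    return phi
```

For \(K>1\) a run started from uniformly scattered angles typically synchronizes: the modulus of the empirical first Fourier mode at the end of the run is close to the value \(r(K)\) discussed in Sect. 1.3.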

The choice of the interaction potential \(J(\cdot )\) is such that the (unique) invariant probability of the system is

$$\begin{aligned} \pi _{N, K}(\mathrm{d}\varphi ) \propto \exp \left( \frac{K}{N} \sum _{i,j=1}^N \cos ( \varphi _i-\varphi _j)\right) \lambda _N(\mathrm{d}\varphi ), \end{aligned}$$
(1.2)

where \(\lambda _N\) is the uniform probability measure on \({\mathbb {S}} ^N\). Moreover, the evolution is reversible with respect to \(\pi _{N, K}\), which is the well known Gibbs measure associated to mean-field plane rotators (or classical \(XY\) model).

We are therefore considering the simplest Langevin dynamics of mean-field plane rotators and it is well known that such a model exhibits a phase transition, for \(K>K_c:=1\), that breaks the continuum symmetry of the model (for a detailed mathematical physics literature we refer to [6]). The continuum symmetry of the model is evident both in the dynamics (1.1) and in the equilibrium measure (1.2): if \(\{\varphi _t^{j,N}\}_{t\ge 0, j=1,\ldots , N}\) solves (1.1), so does \(\{\varphi _t^{j,N}+c\}_{t\ge 0, j=1,\ldots , N}\), \(c\) an arbitrary constant, and \(\pi _{N,K} \Theta _c^{-1}= \pi _{N,K}\), where \(\Theta _c\) is the rotation by an angle \(c\), that is \((\Theta _c \varphi )_j= \varphi _j+c\) for every \(j\).

1.3 The \(N \rightarrow \infty \) dynamics and the stationary states

The phase transition can also be understood from a dynamical standpoint. Given the mean-field set up, it turns out to be particularly convenient to consider the empirical measure

$$\begin{aligned} \mu _{N,t}(\mathrm{d}\theta ):= \frac{1}{N} \sum _{j=1}^N \delta _{\varphi _t^{j,N}}(\mathrm{d}\theta ), \end{aligned}$$
(1.3)

which is a probability on (the Borel subsets of) \({\mathbb {S}} \). It is well known, see [6] (for a detailed treatment and original references), that if \(\mu _{N,0}\) converges weakly for \(N \rightarrow \infty \), then so does \(\mu _{N,t}\) for every \(t>0\). Actually, the process \(t \mapsto \mu _{N, t}\) itself, seen as an element of \(C^0 ([0,T], {\mathcal M} _1)\), where \(T>0\) and \({\mathcal M} _1\) is the space of probability measures on \({\mathbb {S}} \) equipped with the weak topology, converges to a non-random limit, namely the process that concentrates on the unique solution of the non-local PDE (\(*\) denotes the convolution)

$$\begin{aligned} \partial _t p_t(\theta ) = \frac{1}{2} \partial _\theta ^2 p_t( \theta ) - \partial _\theta \left( (J*p_t)(\theta ) p_t(\theta )\right) \!, \end{aligned}$$
(1.4)

with initial condition prescribed by the limit of \(\{\mu _{N, 0}\}_{N=1,2, \ldots }\) (by [21] this solution is smooth for \(t>0\)). We insist on the fact that \(p_t(\cdot )\) is a probability density: \(\int _{\mathbb {S}} p_t(\theta ) \mathrm{d}\theta =1\). We will often commit the abuse of notation of writing \(p(\theta )\) when \(p\in {\mathcal M} _1\) and \(p\) has a density. In much the same way, if \(p(\cdot )\) is a probability density, \(p\), or \(p(\mathrm{d}\theta )\), denotes the corresponding probability measure.
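As an illustration (not used in the proofs), (1.4) can be integrated numerically by a simple explicit scheme on a periodic grid. The sketch below, with our own illustrative parameters, exploits the fact, made explicit in the next paragraph, that the convolution \(J*p_t\) only involves the first Fourier coefficient of \(p_t\).

```python
import numpy as np

def evolve_pde(p0, K=2.0, t_final=5.0, M=256, dt=1e-4):
    """Explicit finite-difference sketch for (1.4) on a periodic grid of M points.
    (J*p)(theta) = -K*(sin(theta)*<cos, p> - cos(theta)*<sin, p>)."""
    theta = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
    dx = theta[1] - theta[0]
    p = p0(theta)
    p /= p.sum() * dx                               # normalize to a probability density
    for _ in range(int(t_final / dt)):
        c1 = (np.cos(theta) * p).sum() * dx         # real part of the first Fourier coefficient
        s1 = (np.sin(theta) * p).sum() * dx         # imaginary part
        flux = (-K * (np.sin(theta) * c1 - np.cos(theta) * s1)) * p   # (J*p) p
        lap = (np.roll(p, -1) - 2.0 * p + np.roll(p, 1)) / dx**2
        dflux = (np.roll(flux, -1) - np.roll(flux, 1)) / (2.0 * dx)
        p = p + dt * (0.5 * lap - dflux)
    return theta, p
```

For instance, `evolve_pde(lambda th: (1 + 0.5*np.cos(th)) / (2*np.pi))` starts from a probability density off the set \(U\) and relaxes toward a profile of the form (1.5).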

It is worthwhile to point out that \( (J*p)(\theta )= -\mathfrak {R}(\hat{p}_1) K \sin (\theta ) + \mathfrak {I}(\hat{p}_1) K \cos (\theta )\) with \(\hat{p}_1:= \int _{\mathbb {S}} p(\theta )\exp (i \theta )\,\mathrm{d}\theta \). This is to say that the nonlinearity enters only through the first Fourier coefficient of the solution, a peculiarity that allows one to go rather far in the analysis of the model. Notably, starting from this observation one can easily see (once again details and references are given in [6]) that all the stationary solutions to (1.4), in the class of probability densities, can be written, up to a rotation, as

$$\begin{aligned} q(\theta ) := \frac{\exp \left( 2Kr \cos (\theta )\right) }{2\pi I_0(2Kr)}, \end{aligned}$$
(1.5)

where \(2\pi I_0(2Kr)\) is the normalization constant written in terms of the modified Bessel function of order zero (\(I_j(x)= (2\pi )^{-1}\int _{\mathbb {S}} (\cos \theta )^j \exp (x \cos (\theta )) \mathrm{d}\theta \), for \(j=0,1\)) and \(r\) is a non-negative solution of the fixed point equation \(r= \Psi (2Kr)\), with \(\Psi (x)=I_1(x)/I_0(x)\). Since \(\Psi (\cdot ):[0, \infty ) \rightarrow [0,1)\) is increasing, concave, \(\Psi (0)=0\) and \(\Psi '(0)=1/2\)—so that the slope of \(r \mapsto \Psi (2Kr)\) at the origin is \(K\)—we readily see that if (and only if) \(K>1\) there exists a non-trivial (i.e. non-constant) solution to (1.4). Let us not forget however that \(\Psi (0)=0\) implies that \(r=0\) is a solution and therefore the constant density \(\frac{1}{2\pi }\) is a solution no matter what the value of \(K\) is. From now on we set \(K>1\) and choose \(r=r(K)\), the unique positive solution of the fixed point equation, so that the probability density \(q(\cdot )\) in (1.5) is non-trivial and achieves its unique maximum at \(0\) and its minimum at \(\pi \). Note that the rotation invariance of the system immediately yields that there is a whole family of stationary solutions:

$$\begin{aligned} M = \{ q_\psi (\cdot ):\, q_\psi (\cdot ):= q(\cdot - \psi ) \text { and } \psi \in {\mathbb {S}} \}, \end{aligned}$$
(1.6)

and, when \(x \in {\mathbb {R}} \), \(q_x(\cdot )\) of course means \(q_{x\text {mod}(2\pi )}(\cdot )\). \(M\), which is more practically viewed as a manifold (in a suitable function space, see Sect. 2.2 below), is invariant and stable for the evolution. The proper notion of stability is given in the context of normally hyperbolic manifolds (see [32] and references therein), but the full power of such a concept is not needed for the remainder. Nevertheless let us stress that in [21] one can find a complete analysis of the global dynamic phase diagram, notably the fact that unless \(p_0(\cdot )\) belongs to the stable manifold \(U\) of the unstable solution \(\frac{1}{2\pi }\)—the solution corresponding to \(r=0\) in (1.5)—\(p_t(\cdot )\) converges (also in strong norms, controlling all the derivatives) to one of the points in \(M\), see Fig. 1. There is actually an explicit characterization of \(U\):

$$\begin{aligned} U = \left\{ p\in {\mathcal M} _1: \, \int _{\mathbb {S}} \exp (i \theta ) p(\mathrm{d}\theta )\, =0\right\} . \end{aligned}$$
(1.7)

As a matter of fact, it is easy to realize that if \(p_0(\cdot ) \in U\) then (1.4) reduces to the heat equation \(\partial _t p_t(\theta )= \frac{1}{2} \partial _\theta ^2 p_t (\theta )\) which of course relaxes to \(\frac{1}{2\pi }\).
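The fixed point equation \(r=\Psi (2Kr)\) is easy to solve numerically. The following sketch (our own code; `scipy` provides the modified Bessel functions) returns \(r(K)\) for \(K>1\) and the corresponding profile (1.5); the fixed-point iteration converges to the positive solution because \(r \mapsto \Psi (2Kr)\) is increasing and concave.

```python
import numpy as np
from scipy.special import iv     # modified Bessel functions I_0, I_1

def stationary_r(K, tol=1e-12):
    """Positive solution of r = Psi(2Kr), Psi(x) = I_1(x)/I_0(x), assuming K > 1."""
    r = 0.5                      # any positive starting point works
    for _ in range(10000):
        r_new = iv(1, 2.0 * K * r) / iv(0, 2.0 * K * r)
        if abs(r_new - r) < tol:
            break
        r = r_new
    return r_new

def q_profile(theta, K):
    """Non-trivial stationary density (1.5), centered at 0."""
    r = stationary_r(K)
    return np.exp(2.0 * K * r * np.cos(theta)) / (2.0 * np.pi * iv(0, 2.0 * K * r))
```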

Fig. 1

The limit evolution (1.4) instantaneously smoothens an arbitrary initial probability and, unless the Fourier decomposition of such an initial condition has zero coefficients corresponding to the first harmonics (the hyperplane \(U\)), it drives it to a point \(p_\infty \)—a synchronized profile—on the invariant manifold \(M\), where it stays for all times. This has been proven in [21]; here we are interested in what happens for the finite size—\(N\)—system and we show that the PDE approximation is faithful up to times much shorter than \(N\): on times proportional to \(N\) synchronization is kept and the center of synchronization \(\psi \) performs a Brownian motion on \({\mathbb {S}} \).

1.4 Random dynamics on \(M\): the main result

In spite of the stability of \(M\), \(q_\psi (\cdot )\) itself is not stable, simply because if we start nearby, say from \(q_{\psi '}\), the solution of (1.4) does not converge to \(q_\psi (\cdot )\). The important point here is that the linearized evolution operator around \(q(\cdot ) \in M\) (\(q\) is an arbitrary element of \(M\), not necessarily the one in (1.5): the phase \(\psi \) of \(q_\psi \) is written explicitly only when omitting it may be misleading)

$$\begin{aligned} L_q u (\theta ) := \frac{1}{2} u'' -[ u J*q+ q J*u]', \end{aligned}$$
(1.8)

with domain \(\{u\in C^2({\mathbb {S}} , {\mathbb {R}} ):\, \int _{\mathbb {S}} u =0\}\) is symmetric in \(H_{-1,1/q}\)—a weighted \(H_{-1}\) Hilbert space that we introduce in detail in Sect. 2.1—and it has compact resolvent. Moreover the spectrum of \(L_q\), which is of course discrete, lies in \((-\infty , 0]\) and the eigenvalue \(0\) has a one dimensional eigenspace, generated by \(q'\). So \(q'\) is the only neutral direction and it corresponds precisely to the tangent space of \(M\) at \(q(\cdot )\): all other directions, in function space, are contracted by the linear evolution, and the nonlinear part of the evolution does not substantially alter this fact [21, 25].
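The statement that \(q'\) spans the kernel of \(L_q\) can be checked numerically. Here is a small sanity check (our own code, with illustrative \(K\) and grid size) which discretizes (1.8) with spectral derivatives and applies it to \(q'\): the output should be at the level of quadrature error.

```python
import numpy as np
from scipy.special import iv

def check_kernel(K=2.0, M=512):
    """Evaluate L_q q' on a periodic grid, cf. (1.8); the result should be ~ 0."""
    r = 0.5
    for _ in range(5000):                            # r = I_1(2Kr)/I_0(2Kr)
        r = iv(1, 2 * K * r) / iv(0, 2 * K * r)
    theta = np.linspace(0, 2 * np.pi, M, endpoint=False)
    dx = 2 * np.pi / M
    q = np.exp(2 * K * r * np.cos(theta)) / (2 * np.pi * iv(0, 2 * K * r))

    freq = 1j * np.fft.fftfreq(M, d=1.0 / M)         # i*n, the spectral symbol of d/dtheta
    d = lambda f: np.real(np.fft.ifft(freq * np.fft.fft(f)))
    conv = lambda u: -K * (np.sin(theta) * np.sum(np.cos(theta) * u) * dx
                           - np.cos(theta) * np.sum(np.sin(theta) * u) * dx)   # (J*u)(theta)

    u = d(q)                                         # candidate zero mode q'
    Lu = 0.5 * d(d(u)) - d(u * conv(q) + q * conv(u))
    return np.max(np.abs(Lu))
```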

Let us now step back and recall that our main concern is with the behavior of (1.1), with \(N\) large but finite, and not (1.4). In a sense the finite size, i.e. finite \(N\), system is close to a suitable stochastic perturbation of (1.4): the type of stochastic PDE, with noise vanishing as \(N \rightarrow \infty \), needs to be carefully guessed [19], keeping in particular in mind that we are dealing with a system with one conservation law. We will tackle directly (1.1), but the heuristic picture that one obtains by thinking of an SPDE with vanishing noise is of help. In fact the considerations we have just made on \(L_q\) suggest that if one starts the SPDE on \(M\), the solution keeps very close to \(M\), since the deterministic part of the dynamics is contractive in the directions orthogonal to \(M\), but a (slow, since the noise is small) random motion on \(M\) arises because in the tangential direction the deterministic part of the dynamics is neutral. This is indeed what happens for the model we consider, for \(N\) large. The difficulty that arises in dealing with the interacting diffusion system (1.1) is that one has to work with (1.3), which is not a function. Of course one can mollify it, but the evolution is naturally written, and to a certain extent closed, in terms of the empirical measure, and we do not believe that any significant simplification would arise in proving our main statement for a mollified version. Working with the empirical measure calls for a clarification right away: as we explain in Sect. 2.1 and Appendix A, if \(\mu \) and \(\nu \in {\mathcal M} _1\), then \(\mu -\nu \) can be seen as an element of \(H_{-1}\) (or, as a matter of fact, also as an element of a weighted \(H_{-1}\) space).

Here is the main result that we prove (recall that \(K>1\)):

Theorem 1.1

Choose a positive constant \(\tau _f\) and a probability \(p_0\in {\mathcal M} _1 {\setminus } U\). If for every \(\varepsilon >0\)

$$\begin{aligned} \lim _{N \rightarrow \infty } {\mathbb {P}} \left( \left\| \mu _{N, 0} - p_0 \right\| _{-1}\le \varepsilon \right) = 1, \end{aligned}$$
(1.9)

then there exists a constant \(\psi _0\) that depends only on \(p_0(\cdot )\) and, for every \(N\), a continuous process \(\{W_{N,\tau }\}_{\tau \ge 0}\), adapted to the natural filtration of \(\{W^j_{N\cdot }\}_{j=1,2, \ldots ,N}\), such that \(W_{N, \cdot }\in C^0([0,\tau _f]; {\mathbb {R}} )\) converges weakly to a standard Brownian motion and for every \(\varepsilon >0\)

$$\begin{aligned} \lim _{N \rightarrow \infty } {\mathbb {P}} \left( \sup _{\tau \in [\varepsilon _N, \tau _f]} \left\| \mu _{N, \tau N} - q_{\psi _0 + D_K W_{N,\tau } } \right\| _{-1}\le \varepsilon \right) = 1, \end{aligned}$$
(1.10)

where \(\varepsilon _N := C /N\), \(C=C(K, p_0, \varepsilon )>0\), and

$$\begin{aligned} D_K := \frac{1}{\sqrt{1- \left( I_0(2Kr)\right) ^{-2}}}. \end{aligned}$$
(1.11)

The result says that, unless one starts on the stable manifold of the unstable solution (see Remark 2.5 for what one expects if \(p_0\in U\)), the empirical measure reaches very quickly a small neighborhood of the manifold \(M\): this happens on a time scale of order one, as a consequence of the properties of the deterministic evolution law (1.4) (Fig. 1), and, since we are looking at times of order \(N\), this happens almost instantaneously. Actually, in spite of the fact that the result just addresses the limit of the empirical measure, the motion along \(M\) is due to fluctuations: the noise pushes the empirical measure away from \(M\), the deterministic part of the dynamics projects the trajectory back to \(M\), and the net effect of the noise is a random shift—in fact, a rotation—along the manifold (this is taken up in more detail in the next section, where we give a complete heuristic version of the proof of Theorem 1.1).

Remark 1.2

Without much effort, one can upgrade this result to much longer times: if we set \(\tau _f(N)=N^a\) with an arbitrary \(a>1\), there exists an adapted process \(W^a_{N,\tau }\) converging to a standard Brownian motion such that

$$\begin{aligned} \lim _{N \rightarrow \infty } {\mathbb {P}} \left( \sup _{\tau \in [\varepsilon _N, \tau _f(N)]} \left\| \mu _{N, \tau N^a} - q_{\psi _0 + D_K N^{a-1}W^a_{N,\tau } } \right\| _{-1}\le \varepsilon \right) = 1. \end{aligned}$$
(1.12)

This is due to the fact that our estimates ultimately rely on moment estimates, cf. Sect. 3. These estimates are obtained for arbitrary moments and we choose the moment sufficiently large to get uniformity for times \(O(N)\), but working for times \(O(N^a)\) would just require choosing larger moments. We have preferred to focus on the case \(a=1\); this is the natural scale, that is the scale in which the center of the probability density converges to a Brownian motion and not to an accelerated Brownian motion (this is really due to the fact that we work on \({\mathbb {S}} \) and marks a difference with [4, 9] where one can rescale the space variable).

1.5 The synchronization phenomena viewpoint

The model (1.1) we consider is actually a particular case of the Kuramoto synchronization model (the full Kuramoto model includes quenched disorder in terms of random constant speeds for the rotators, see [1, 6] and references therein). The mathematical physics literature and the more bio-physically oriented literature use somewhat different notations reflecting a slightly different viewpoint. In the synchronization literature one introduces the synchronization degree \({\varvec{r}} _{N,t}\) and the synchronization center \({\varvec{\Psi }}_{N,t} \) via

$$\begin{aligned} {\varvec{r}}_{N,t} \exp (i {\varvec{\Psi }}_{N, t}) := \frac{1}{N} \sum _{j=1}^N \exp \left( i \varphi ^{j,N}_t\!\right) \, \left( = \int _{\mathbb {S}} \exp (i \theta ) \mu _{N, t} (\mathrm{d}\theta )\right) , \end{aligned}$$
(1.13)

which clearly correspond to the parameters \(r\) and \(\psi \) that appear in the definition of \(M\), but \({\varvec{r}}_{N,t}\) and \({\varvec{\Psi }}_{N,t} \) are defined for \(N\) finite and also far from \(M\). Note that if (1.9) holds, then both \({\varvec{r}}_{N,0}\) and \({\varvec{\Psi }}_{N,0} \) converge in probability as \(N \rightarrow \infty \) to limits \(r\) and \(\psi \), with \(r\exp (i\psi )= \int _{\mathbb {S}} \exp (i \theta ) p_0(\mathrm{d}\theta )\), and the assumption that \(p_0 \not \in U\) just means \(r\not = 0\). Here is a straightforward consequence of Theorem 1.1:

Corollary 1.3

Under the same hypotheses and definitions as in Theorem 1.1 we have that the stochastic process \( {\varvec{\Psi }}_{N, N\cdot } \in C^0([\varepsilon , \tau _f]; {\mathbb {S}} )\) converges weakly, for every \(\varepsilon \in (0, \tau _f]\), to \((\psi _0 + D_K W_{\cdot })\hbox {mod} (2\pi )\).
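The quantities in (1.13) are straightforward to evaluate on a particle configuration; a short sketch (the function name is ours):

```python
import numpy as np

def sync_degree_center(phi):
    """Evaluate (1.13): r_N * exp(i Psi_N) = (1/N) sum_j exp(i phi_j)."""
    z = np.exp(1j * np.asarray(phi)).mean()
    return np.abs(z), np.angle(z) % (2.0 * np.pi)
```

Tracking \({\varvec{\Psi }}_{N,t}\) along a long simulated run of (1.1) (times of order \(N\)) should produce, after time rescaling, a trajectory resembling \(\psi _0 + D_K W\), in agreement with Corollary 1.3.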

It is tempting to prove such a result by looking directly at the evolution of \({\varvec{\Psi }}_{N,t}\):

$$\begin{aligned} \mathrm{d}{\varvec{\Psi }}_{N,t}&= \left( - K + \frac{1}{2N {\varvec{r}}^2_{N,t}} \right) \frac{1}{N} \sum _{j=1}^N \sin \left( 2\left( \varphi ^{j,N}_t- {\varvec{\Psi }}_{N,t}\right) \right) \mathrm{d}t \nonumber \\&+ \frac{1}{{\varvec{r}}_{N,t} N} \sum _{j=1}^N \cos \left( \varphi ^{j,N}_t- \varvec{\Psi }_{N,t}\right) \mathrm{d}W_j (t). \end{aligned}$$
(1.14)

But this clearly requires a control of the evolution of the empirical measure, so it does not seem that (1.14) could provide an alternative route to many of the estimates that we develop, namely convergence to a neighborhood of \(M\) and persistence of the proximity to \(M\) (see Sects. 3 and 5). On the other hand, it seems plausible that one could use (1.14) to develop an alternative approach to the dynamics on \(M\), that is an alternative to Sect. 4. While this can be interesting in its own right, since the notion of synchronization center that we use in the proof and \({\varvec{\Psi }}_{N, t}\) are almost identical (where they are both defined, that is, close to \(M\)), we do not expect substantial simplifications.

1.6 A look at the literature and perspectives

Results related to our work have been obtained in the context of SPDE models with vanishing noise. In [8, 15] one dimensional stochastic reaction diffusion equations with bistable potential (also called stochastic Allen–Cahn or model A) are analyzed for initial data that are close to profiles that connect the two phases. It is shown that the location of the phase boundary performs a Brownian motion. These results have been improved in a number of ways, notably to include small asymmetries that result in a drift for the arising diffusion process [7] and to deal with macroscopically finite volumes [4] (which introduces a repulsive effect when the phase boundary approaches the boundary of the domain). Also the case of stochastic phase field equations has been considered [5].

For interacting particle systems results have been obtained for the zero temperature limit of \(d\)-dimensional Brownian particles interacting via local pair potentials in [16]: in this case the frozen clusters perform a Brownian motion and, in one dimension, also the merging of clusters is analyzed [17]. In this case the very small temperature plays the role of the small noise from which cluster diffusion originates. With respect to [16, 17], our results hold for any super-critical interaction, but of course our system is of mean field type. It is also interesting to observe that for the model in [16, 17] establishing the stability of the frozen clusters is the crucial issue, because the motion of the center of mass is a martingale, i.e. there is no drift. A substantial part of our work consists in controlling that the drift of the center of synchronization vanishes (and controlling the drift is a substantial part also of [4, 5, 7, 8, 15]). This is directly related to the content of Sect. 1.5.

As a matter of fact, even though our work deals directly with an interacting system, and not with an SPDE model, our approach is closer to the one in the SPDE literature. However, as we have already pointed out, a non negligible point is that we are forced to perform an analysis in distribution spaces, in fact Sobolev spaces with negative exponent, in contrast to the approach in the space of continuous functions in [4, 5, 7, 8, 15]. We point out that approaches to dynamical mean field type systems via Hilbert spaces of distributions have already been taken up in [14], but in our case the specific use of weighted Sobolev spaces is not only a technical tool: it is intimately related to the geometry of the contractive invariant manifold \(M\). In this sense and because of the iterative procedure we apply—originally introduced in [8]—our work is a natural development of [4, 8].

An important issue about our model that we have not stressed at all is that propagation of chaos holds (see e.g. [18]), in the sense that if the initial condition is given by a product measure, then this property is approximately preserved, at least for finite times. Recently much work has been done toward establishing quantitative estimates of chaos propagation (see for example the references in [11]). On the other hand, like for the model in [11], we know that, for our model, chaos propagation eventually breaks down: this is just because one can show by Large Deviations arguments that the empirical measure at equilibrium converges in law as \(N \rightarrow \infty \) to the random probability density \(q_{X}(\cdot )\), with \(X\) a uniform random variable on \({\mathbb {S}} \). But using Theorem 1.1 one can go much further and show that chaos propagation breaks down at times proportional to \(N\). From Theorem 1.1 one can actually also extract an accurate description of how the correlations build up due to the random motion on \(M\).

It is natural to ask whether the type of results we have proven extends to the case in which random natural frequencies are present, that is to the disordered version of the model we consider, which goes under the name of Kuramoto model. The question is natural because for the limit PDE [13, 26] there is a contractive manifold similar to \(M\) [20]. However the results in [27] suggest that a nontrivial dynamics on the contractive manifold is observed already on times proportional to \(\sqrt{N}\), and one expects a dynamics with a nontrivial random drift. The role of disorder in this type of model is not fully elucidated (see however [12] on the critical case) and the global long time dynamics represents a challenging issue.

The paper is organized as follows: we start off (Sect. 2) by introducing the precise mathematical set-up and a number of technical results. This will allow us to present quantitative heuristic arguments and sketches of proofs. In Sect. 3 we prove that if the system is close to \(M\), it stays so for a long time. We then move on to analyzing the dynamics on \(M\) (Sect. 4) and it is here that we show that the drift is negligible. Section 5 provides the estimates that guarantee that we do approach \(M\), and in Sect. 6 we collect all these estimates and complete the proof of our main result (Theorem 1.1).

2 More on the mathematical set-up and sketch of proofs

2.1 On the linearized evolution

We introduce the Hilbert space \(H_{-1,1/q}\) or, more generally, the space \(H_{-1,w}\) for a general weight \(w\in C^1({\mathbb {S}} ; (0, \infty ))\) by using the rigged Hilbert space structure [10] with pivot space \({\mathbb {L}} ^2_0:= \{ u\in {\mathbb {L}} ^2: \, \int _{\mathbb {S}} u =0\}\). In this way, given a Hilbert space \(V\subset {\mathbb {L}} ^2_0\), \(V\) dense in \({\mathbb {L}} ^2_0\), for which the canonical injection of \(V\) into \({\mathbb {L}} ^2_0\) is continuous, one automatically obtains a representation of \(V'\)—the dual space—in terms of a third Hilbert space into which \({\mathbb {L}} ^2_0\) is canonically and densely injected. If \(V\) is the closure of \(\{u \in C^1({\mathbb {S}} ; {\mathbb {R}} ):\, \int u =0\}\) under the squared norm \(\int _{\mathbb {S}} (u')^2 / w\), that is \(H_{1, 1/w}\), the third Hilbert space is precisely \(H_{-1,w}\). The duality between \(H_{1, 1/w}\) and \(H_{-1,w}\) is denoted in principle by \(\langle \, \cdot \, , \, \cdot \,\rangle _{H_{1, 1/w},H_{-1,w}}\), but less cumbersome notations will be introduced when the duality is needed (for example, below we drop the subscripts).

It is not difficult to see that for \(u, v \in H_{-1,w}\)

$$\begin{aligned} \left( u, v \right) _{-1,w} = \int _{\mathbb {S}} w\, {\mathcal U} {\mathcal V} , \end{aligned}$$
(2.1)

where \({\mathcal U} \), respectively \({\mathcal V} \), is the primitive of \(u\) (resp. \(v\)) such that \(\int _{\mathbb {S}} w {\mathcal U} =0\) (resp. \(\int _{\mathbb {S}} w {\mathcal V} =0\)), see [6, § 2.2]. More precisely, \(u\in H_{-1,w}\) if there exists \({\mathcal U} \in {\mathbb {L}} ^2({\mathbb {S}} ; {\mathbb {R}} )\) such that \(\int _{\mathbb {S}} {\mathcal U} w =0\) and \(\langle u, h\rangle =-\int _{\mathbb {S}} {\mathcal U} h'\) for every \(h \in H_{1,1/w}\). One sees directly also that by changing \(w\) one produces equivalent \(H_{-1, w}\) norms [22, §2.1] so, when the geometry of the Hilbert space is not crucial, one can simply replace the weight by \(1\), and in this case we simply write \(H_{-1}\). Occasionally we will also need \(H_{-2}\), which is introduced in an absolutely analogous way.
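For concreteness, the \(H_{-1,w}\) norm of a zero-mean function sampled on a uniform periodic grid can be evaluated along the lines of (2.1): take a primitive, recenter it so that its \(w\)-weighted mean vanishes, and integrate. A minimal sketch (our own code; it assumes \(\int u=0\) and a uniform grid):

```python
import numpy as np

def h_minus1_w_norm_sq(u, w, dx):
    """Squared H_{-1,w} norm of a zero-mean u on a uniform periodic grid, via (2.1)."""
    U = np.cumsum(u) * dx                  # a primitive of u on the grid
    U = U - (U * w).sum() / w.sum()        # recenter so that int U w = 0
    return (w * U * U).sum() * dx
```

Using an arbitrary primitive without the recentering step gives an upper bound, as pointed out in Remark 2.1 below.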

Remark 2.1

One observation that is of help in estimating weighted \(H_{-1}\) norms is that computing the norm of \(u\) requires access to \({\mathcal U} \): in practice if one identifies a primitive \(\widetilde{{\mathcal U} }\) of \(u\), then \(\Vert u\Vert _{-1, w}^2 \le \int _{\mathbb {S}} \widetilde{{\mathcal U} } ^2 w\). This is just because \(\widetilde{{\mathcal U} } = {\mathcal U} +c\) for some \(c \in {\mathbb {R}} \) and \(\int _{\mathbb {S}} \widetilde{{\mathcal U} } ^2 w= \int _{\mathbb {S}} {\mathcal U} ^2 w + c^2 \int _{\mathbb {S}} w\).

The reason for introducing weighted \(H_{-1}\) spaces is that, as one can readily verify, \(L_q\), given in (1.8), is symmetric in \(H_{-1,1/q}\). A deeper analysis (cf. [6]) shows that \(L_q\) is essentially self-adjoint, with compact resolvent. The spectrum of \(-L_q\) lies in \([0, \infty )\); the eigenvalue \(\lambda _0=0\) has a one dimensional eigenspace generated by \(q'\). We therefore denote the set of eigenvalues of \(-L_q\) as \(\{\lambda _0, \lambda _1, \ldots \}\), with \(\lambda _1>0\) and \(\lambda _{j+1} \ge \lambda _j\) for \(j=1,2, \ldots \). The set of eigenfunctions is denoted by \(\{e_j\}_{j=0,1, \ldots }\) and let us point out that it is straightforward to see that \(e_j \in C^\infty ({\mathbb {S}} ; {\mathbb {R}} )\). Moreover, if \(u \in C^2({\mathbb {S}} , {\mathbb {R}} )\) is even (respectively, odd), then \(L_q u\) is even (respectively, odd): the notion of parity is of course the one obtained by observing that \(u \in C^2({\mathbb {S}} ; {\mathbb {R}} )\) can be extended to a periodic function in \(C^2({\mathbb {R}} ; {\mathbb {R}} )\). This implies that one can choose \(\{e_j\}_{j=0,1, \ldots }\) with each \(e_j\) either even or odd, and we will do so.

Remark 2.2

By rotation symmetry the eigenvalues do not depend on the choice of \(q(\cdot )\in M\), but the eigenfunctions do depend on it, even if in a rather trivial way: the eigenfunctions of \(L_{q_\psi }\) and \(L_{q_{\psi '}}\) just differ by a rotation of \(\psi '-\psi \). We will often need to be precise about the choice of \(q(\cdot )\) and for this it is worthwhile to introduce the notations

$$\begin{aligned} L_\psi := L_{q_{\psi }} \ \text { and } \ -L_\psi e_{\psi , j} = \lambda _j e_{\psi , j}. \end{aligned}$$
(2.2)

The eigenfunctions are normalized in \(H_{-1, 1/q_{\psi }}\).

Remark 2.3

Some expressions involving weighted \(H_{-1}\) norms can be worked out explicitly. For example a recurrent expression in what follows is \((u, q')_{-1, 1/q}\), for \(u \in H_{-1}\) and \(q \in M\). If \({\mathcal U} \) is the primitive of \(u\) such that \(\int _{\mathbb {S}} {\mathcal U} /q =0\), then we have \((u, q')_{-1, 1/q}= \int _{\mathbb {S}} {\mathcal U} (q-c)/q= \int _{\mathbb {S}} {\mathcal U} \), where \(c\) is uniquely defined by \(\int _{\mathbb {S}} (q-c)/q=0\), but of course the explicit value of \(c\) is not used in the final expression. In practice however it may be more straightforward to use an arbitrary primitive \(\widetilde{{\mathcal U} }\) of \(u\) (i.e. \(\int _{\mathbb {S}} \widetilde{{\mathcal U} } /q\) is not necessarily zero) for which we have

$$\begin{aligned} (u, q')_{-1, 1/q} = \int _{\mathbb {S}} \widetilde{{\mathcal U} } \left( 1 - \frac{c}{q}\right) . \end{aligned}$$
(2.3)

Since now \(c\) appears, let us make it explicit:

$$\begin{aligned} c = \frac{2\pi }{\int _{\mathbb {S}} 1/q} = \frac{1}{2\pi I_0^2(2Kr)}. \end{aligned}$$
(2.4)

2.2 About the manifold \(M\)

As we have anticipated, we look at the set of stationary solutions \(M\), defined in (1.6), as a manifold. For this we introduce

$$\begin{aligned} \widetilde{H}_{-1} := \left\{ \mu :\, \mu - \frac{1}{2\pi } \in H_{-1}\right\} , \end{aligned}$$
(2.5)

which is a metric space equipped with the distance inherited from \(H_{-1}\), that is \(\hbox {dist}(\mu _1,\mu _2)= \Vert \mu _1-\mu _2 \Vert _{-1}\). We have \(M \subset \widetilde{H}_{-1}\) and \(M\) can be viewed as a smooth one dimensional manifold in \(\widetilde{H}_{-1}\). The tangent space at \(q \in M\) is \(q' {\mathbb {R}} \) and for every \(u \in H_{-1}\) we define the projection \(P^o_q\) on this tangent space as \(P^o_q u= (u, q')_{-1,1/q}q'/(q',q')_{-1,1/q}\). The following result is proven in [32, p. 501] (see also [22, Lemma 5.1]):

Lemma 2.4

There exists \(\sigma >0\) such that for all \(\mu \in N_\sigma \) with

$$\begin{aligned} N_\sigma := \cup _{q \in M}\left\{ \mu \in \widetilde{H}_{-1} : \, \Vert \mu -q \Vert _{-1} < \sigma \right\} \!, \end{aligned}$$
(2.6)

there is one and only one \(q=:v( \mu )\in M\) such that \((\mu -q,q')_{-1,1/q}=0\). Furthermore, the mapping \(\mu \mapsto v( \mu )\) is in \(C^\infty (\widetilde{H}_{-1},\widetilde{H}_{-1})\), and (with \(D\) the Fréchet derivative)

$$\begin{aligned} Dv( \mu ) = P^o_{v( \mu )}. \end{aligned}$$
(2.7)

Note that the empirical (probability) measure \(\mu _{N,t}\) that describes our system at time \(t\) is in \(\widetilde{H}_{-1}\) (see Appendix A) and Lemma 2.4 guarantees in particular that as soon as it is sufficiently close to \(M\) there is a well defined projection \(v(\mu _{N,t})\) on the manifold. Since the manifold is isomorphic to \({\mathbb {S}} \) it is practical to introduce, for \(\mu \in N_\sigma \), also \(\mathtt {p}(\mu )\in {\mathbb {S}} \), uniquely defined by \(v(\mu )=q_{\mathtt {p}(\mu )}\). It is immediate to see that the projection \(\mathtt {p}\) is in \(C^\infty ({\widetilde{H}}_{-1},{\mathbb {S}} )\).
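The projection of Lemma 2.4 can be approximated numerically by solving the orthogonality condition \((\mu -q_\psi , q_\psi ')_{-1,1/q_\psi }=0\) in the variable \(\psi \), using the explicit form (2.3) of the inner product. The sketch below is our own code and is meant as an illustration, not a robust solver: it assumes the density is sampled on a uniform grid, lies close to \(M\), and that the projection is within \(\pm 0.5\) of the synchronization center (1.13), which serves as the initial guess.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import iv

def project_phase(p, K=2.0):
    """Find psi with (p - q_psi, q_psi')_{-1,1/q_psi} = 0, cf. Lemma 2.4 and (2.3)."""
    M_grid = len(p)
    theta = np.linspace(0, 2 * np.pi, M_grid, endpoint=False)
    dx = 2 * np.pi / M_grid
    r = 0.5
    for _ in range(5000):                                  # r = I_1(2Kr)/I_0(2Kr)
        r = iv(1, 2 * K * r) / iv(0, 2 * K * r)
    q0 = np.exp(2 * K * r * np.cos(theta)) / (2 * np.pi * iv(0, 2 * K * r))
    c = 2 * np.pi / np.sum(dx / q0)                        # the constant (2.4), psi-independent

    def inner(psi):
        q_psi = np.exp(2 * K * r * np.cos(theta - psi)) / (2 * np.pi * iv(0, 2 * K * r))
        U = np.cumsum(p - q_psi) * dx                      # an arbitrary primitive of p - q_psi
        return np.sum(U * (1 - c / q_psi)) * dx            # the right-hand side of (2.3)

    guess = np.angle(np.sum(np.exp(1j * theta) * p) * dx)  # synchronization center, cf. (1.13)
    return brentq(inner, guess - 0.5, guess + 0.5)         # assumes a sign change in the bracket
```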

2.3 A quantitative heuristic analysis: the diffusion coefficient

The proof of Theorem 1.1 is naturally split into two parts: the approach to \(M\) and the motion on \(M\). The approach to \(M\) is based on the properties of the PDE (1.4): in [21] it is shown, using the gradient flow structure of (1.4), that if the initial condition is not on the stable manifold \(U\) (see (1.7)) of the unstable stationary solution \(\frac{1}{2\pi }\), then the solution converges, as time goes to infinity, to one of the probability densities \(q=q_\psi \in M\) (of course \(\psi \) is a function of the initial condition); so, given a neighborhood of \(q_\psi \), after a finite time (how large depends only on the initial condition) the solution enters the chosen neighborhood. Due to the regularizing properties of the PDE, such a neighborhood can even be taken in a topology that controls all the derivatives [21], but here there is no point in using a strong topology, since at the level of interacting diffusions we deal with a measure (that we inject into \(H_{-1}\)). And in fact we have to estimate the distance between the empirical measure and the solution to (1.4)—controlling thus the effect of the noise—but this type of estimate on finite time intervals is standard.

However here there is a subtle point: the result we are after is a matter of fluctuations and it will not come as a surprise that the empirical measure approaches \(M\) but does not reach it (of course: \(M\) just contains smooth functions, and \(\mu _{N, t}\) is not a function), but it will stay in an \(N^{-1/2}\)-neighborhood (measured in the \(H_{-1}\) norm). How long will it take to reach such a neighborhood? The approach to \(M\) is actually exponential and driven by the spectral gap (\(\lambda _1\)) of the linearized evolution operator (at least close to \(M\)). Therefore, in order to enter such an \(N^{-1/2}\)-neighborhood, a time proportional to \(\log N\) appears to be needed, as the quick observation that \(\exp ( -\lambda _1 t)=O(N^{-1/2})\) for \(t\ge \log N/(2 \lambda _1)\) suggests. The proofs concerning this stage of the evolution are in Sect. 5: here we just stress that

  1. 1.

    controlling the effect of the noise on the system on times \(O(\log N)\) is in any case substantially easier than controlling it on times of order \(N\), which is our final aim;

  2. 2.

    on times of order \(N\) it is no longer a matter of showing that the empirical measure stays close to the solution of the PDE: on such a time scale the noise takes over and the finite \(N\) system, which has a non-trivial (random) dynamics, substantially deviates from the behavior of the solution to the PDE, which just converges to one of the stationary profiles.

Let us therefore assume that the empirical measure is in an \(N^{-1/2}\)-neighborhood of a given \(q=q_\psi \). It is reasonable to assume that the dominating part of the dynamics close to \(q\) is captured by the operator \(L_q\) and we want to understand the action of the semigroup generated by \(L_q\) on the noise that stirs the system, over long times. Note that we cannot choose arbitrarily long times, in particular not times proportional to \(N\) right away, because in view of the result we are after, we expect that on such a time scale the empirical measure of the system is no longer in a neighborhood of \(q\), but close to \(q_{\psi '}\) for a \(\psi ' \ne \psi \). We will actually choose an intermediate time scale \(N^{1/10}\), as we will see in Sect. 2.4 and Remark 2.6, which guarantees that working with \(L_q\) makes sense, i.e. that the projection of the empirical measure on \(M\) is still sufficiently close to \(q\). The point is that the effect of the noise on intermediate times is very different in the tangential direction and in the directions orthogonal to \(M\), simply because in the orthogonal directions there is a damping, which is absent in the tangential direction. So on intermediate times the leading term in the evolution of the empirical measure turns out to be the projection of the evolution on the tangential direction, that is \(( q',\mu _{N,t}-q)_{-1,1/q}/ \Vert q'\Vert _{-1,1/q}\). One can now use Remark 2.3 to obtain

$$\begin{aligned} \left( q',\mu _{N,t}-q\right) _{-1,1/q} = - \int _{\mathbb {S}} {\mathcal K} (\theta ) \left( \mu _{N,t} (\mathrm{d}\theta ) - q(\theta ) \mathrm{d}\theta \right) \!, \end{aligned}$$
(2.8)

with \({\mathcal K} \) a primitive of \(1-c/q\) (\(c\) given in Remark 2.3). By applying Itô’s formula we see that the term in (2.8) can be written as the sum of a drift term and a martingale term. It is not difficult to see that to leading order the drift term is zero (a more attentive analysis shows that one also has to check that the next order correction does not give a contribution, but we come back to this below). The quadratic variation of the martingale term instead turns out to be equal to \(t/N\) times

$$\begin{aligned} \int _{{\mathbb {S}} }({\mathcal K} ' (\theta ))^2 q(\theta ) \mathrm{d}\theta = 1- \frac{(2\pi )^2}{\int _{\mathbb {S}} 1/q} = \Vert q'\Vert _{-1,1/q}^2. \end{aligned}$$
(2.9)

Since \(q_{\psi +\varepsilon }= q_\psi - \varepsilon q'_\psi + \cdots \) (note that \(q'_\psi \) is not normalized), (2.9) suggests that the diffusion coefficient \(D_K\) in Theorem 1.1 is \(\Vert q'\Vert ^{-1}_{-1,1/q}\), which coincides with (1.11).
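The identity (2.9), and hence the value (1.11) of \(D_K\), can also be checked numerically: the sketch below (our own code, illustrative \(K\)) computes \(\int _{\mathbb {S}}({\mathcal K} ')^2 q\) with \({\mathcal K} '=1-c/q\) and \(c\) as in (2.4), and compares the resulting value of \(D_K\) with the closed form (1.11).

```python
import numpy as np
from scipy.special import iv

def diffusion_coefficient(K=2.0, M=4000):
    """Two evaluations of D_K: via the quadratic variation (2.9) and via the closed form (1.11)."""
    r = 0.5
    for _ in range(5000):                               # r = I_1(2Kr)/I_0(2Kr)
        r = iv(1, 2 * K * r) / iv(0, 2 * K * r)
    theta = np.linspace(0, 2 * np.pi, M, endpoint=False)
    dx = 2 * np.pi / M
    q = np.exp(2 * K * r * np.cos(theta)) / (2 * np.pi * iv(0, 2 * K * r))
    c = 2 * np.pi / np.sum(dx / q)                      # the constant of (2.4)
    quad_var = np.sum((1 - c / q) ** 2 * q * dx)        # left-hand side of (2.9)
    closed_form = 1 - iv(0, 2 * K * r) ** (-2)          # right-hand side, cf. (1.11)
    return 1 / np.sqrt(quad_var), 1 / np.sqrt(closed_form)
```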

To make this procedure work one has to carefully put together the analysis on the intermediate time scale, by setting up an adequate iterative scheme. Several delicate issues arise and one of the challenging points is precisely to control that the drift can be neglected. In fact the first order expansion of the projection that we have used

$$\begin{aligned} \mathtt {p}\left( q_\psi +h\right) = \psi -\frac{( h, q')_{-1, 1/q}}{( q', q')_{-1,1/q}}+ O ( \Vert h \Vert _{-1}^2), \end{aligned}$$
(2.10)

is not accurate enough and one has to go to the next order, see Lemma 7.5. This is due to the fact that the random contribution, which in principle appears as first order, fluctuates and generates a cancellation, so in the end the term is of second order.

Remark 2.5

It is natural to expect that Theorem 1.1 holds true also when \(p_0\in U\) and this is just because the evolution is attracted to \(\frac{1}{2\pi }\) and then the noise will cause an escape from this unstable profile after a time \(\propto \log N\), since the exponential instability will make the fluctuations grow exponentially with a rate which is just given by the linearized dynamics (linearized around \(\frac{1}{2\pi }\) of course). Arguments in this spirit can be found for example in [30, Ch. 5], see [3] and references therein for the finite dimensional counterpart. However

  1. 1.

    this is not so straightforward because it requires a good control on the dynamics on and around the heteroclinic orbits linking \(\frac{1}{2\pi }\) to \(M\) [21, Section 5];

  2. 2.

    the statement would require more details about the initial condition: the simple convergence to a point on \(U\) is largely insufficient (the fluctuations of the initial conditions now matter!);

  3. 3.

    in general the initial phase \(\psi _0\) on \(M\) is certainly going to be random: if the initial condition is rotation invariant (at least in law), for instance if \(\{\varphi _0^{j,N}\}_{j=1, \ldots , N}\) are IID variables uniformly distributed on \({\mathbb {S}} \) or if \(\varphi _0^{j,N}= 2\pi j/N\), one expects \(\psi _0\) to be uniformly distributed on \({\mathbb {S}} \). Note however that uniform distribution of \(\psi _0\) is definitely not expected in the general case and asymmetries in the initial condition should affect the distribution of \(\psi _0\).

2.4 The iterative scheme

As we have explained in Sect. 2.3, the analysis close to \(M\) requires an iterative procedure, which we introduce here. We assume that at \(t=0\) the system is already close to \(M\), while in practice this will happen after some time: in Sect. 6 we explain how to put together the results on the early stage of the evolution and the analysis close to \(M\), that we start here. So, for \(\mu _0=\mu _{N,0}=\frac{1}{N}\sum _{j=1}^N \delta _{\varphi ^{j,N}_0}\) such that \(\text {dist}(\mu _0,M)\le \sigma \) (here and below \(\text {dist}(\cdot , \cdot )\) is the distance built with the norm of \(H_{-1}\)), by Lemma 2.4 we can define \(\psi _0 = \mathtt {p}(\mu _0)\). Applying the Itô formula to \(\nu _t=\mu _t-q_{\psi _0}\) (\(\mu _t=\mu _{N,t}\)), we see that

$$\begin{aligned} \nu _{t} = e^{-t L_{\psi _0}} \nu _0-\int _0^{t} e^{-(t -s)L_{\psi _0}}\partial _\theta [\nu _s J*\nu _s]\mathrm{d}s + Z_{t}, \end{aligned}$$
(2.11)

where

$$\begin{aligned} Z_t = \frac{1}{N} \sum _{j=1}^N \int _0^t \partial _{\theta '} {\mathcal G} ^{\psi _0}_{t-s} \left( \theta ,\varphi ^{j,N}_s\right) \mathrm{d}W^j_s, \end{aligned}$$
(2.12)

and \({\mathcal G} ^{\psi _0}_s(\theta ,\theta ')\) is the kernel of \(e^{-sL_{\psi _0}}\) in \({\mathbb {L}} ^2\). The evolution equation (2.11) and the noise term (2.12) have a meaning in \(H_{-1}\), as well as the recentered empirical measures \(\nu _t\), and it is in this sense that we will use them: we detail this in Appendix A, where one finds also an explicit expression and some basic facts about the kernel \({\mathcal G} ^{\psi _0}_s(\theta ,\theta ')\). We have started here an abuse of notation that will be persistent throughout the text: \(\partial _{\theta '}{\mathcal G} ^{\psi _0}_{t-s}(\theta ,\varphi ^{j,N}_s)\) stands for \(\partial _{\theta '}{\mathcal G} ^{\psi _0}_{t-s}(\theta ,\theta ')\vert _{\theta '=\varphi ^{j,N}_s}\).

Equations (2.11)–(2.12) are useful tools as long as we can properly define the phase associated to the empirical measure of the system and this phase stays close to \(\psi _0\): in view of the result we want to prove, this is expected to be true for a long time, but it is certainly expected to fail for times of the order of \(N\), since on this time scale the phase changes by an amount that does not vanish as \(N\) becomes large.

The idea is therefore to divide the evolution of the particle system up to a final time proportional to \(N\) into \(n=n_N\mathop {\longrightarrow }\limits ^{N \rightarrow \infty }\infty \) time intervals \([T_i, T_{i+1}]\), where \(T_i=iT\) and \(T=T(N)\) is chosen close to a fractional power of \(N\) (see Remark 2.6). Moreover \(i\) runs from \(1\) up to \(n=n_N\) so that \(n_N T(N)= T_{n_N}\) and \(\lim _N T_{n_N}/N\) is equal to a positive constant (the \(\tau _f\) of Theorem 1.1). If the empirical measure \(\mu _t\) stays close to the manifold \(M\), we can define the projections of \(\mu _{T_k}\) and successively update the recentering phase at all times \(T_k\). The point then will be essentially to show that the process given by these phases, on the time scale \(\propto N\), converges to a Brownian motion.

More formally, we construct the following iterative scheme: we choose

$$\begin{aligned} \sigma =\sigma _N := N^{2 \zeta } \sqrt{T/N}\mathop {\longrightarrow }\limits ^{N\rightarrow \infty }0, \end{aligned}$$
(2.13)

for a suitable \(\zeta >0\) (see Remark 2.6), we set \(\tau ^0_{\sigma _N}=0\) and for \(k=1,2, \ldots \) we define

$$\begin{aligned} \psi _{k-1} := \mathtt {p}(\mu _{T_{k-1}}), \end{aligned}$$
(2.14)

if \(\text {dist}(\mu _{T_{k-1}},M)\le \sigma _N\) and

$$\begin{aligned} \tau ^k_{\sigma _N}\!=\!\tau _{\sigma _N}^{k-1}\mathbf {1}_{\{\tau _{\sigma _N}^{k-1}<T_{k-1}\}}+\inf \{s\in [T_{k-1}, T_{k}]:\, \Vert \mu _s-q_{\psi _{k-1}}\Vert _{-1}>{\sigma _N}\}\mathbf {1}_{\{\tau _{\sigma _N}^{k-1}\!\geqslant \! T_{k-1}\}}.\nonumber \\ \end{aligned}$$
(2.15)

Then we set

$$\begin{aligned} \nu _t^{k}:= \mu _t - q_{\psi _{k-1}}, \end{aligned}$$
(2.16)

for \(t\in [T_{k-1},T_{k}]\) and \(t\le \tau _{\sigma _N}^{k}\), and otherwise \(\nu _t^{k}:=\nu _{ \tau _{\sigma _N}^{k}}^k\) for every \(t\ge \tau _{\sigma _N}^{k}\) (of course \(\tau _{\sigma _N}^{k}\) can be smaller than \(T_{k-1}\) and, in this case, the definition becomes redundant). Therefore the \(\nu \) process we have just defined solves for \(t\in [T_{k-1},T_{k}]\)

$$\begin{aligned} \nu ^{k}_{t}&= \mathbf {1}_{\{\tau ^k_{\sigma _N}<T_{k-1}\}}\nu ^k_{T_{k-1}} + \mathbf {1}_{\{\tau ^k_{\sigma _N}\geqslant T_{k-1}\}} \nonumber \\&\times \left( e^{-(t\wedge \tau ^k_{\sigma _N}-T_{k-1}) L_{\psi _{k-1}}} \nu ^{k}_{T_{k-1}}-\int _{T_{k-1}}^{t\wedge \tau ^k_{\sigma _N}} e^{-(t\wedge \tau ^k_{\sigma _N} -s)L_{\psi _{k-1}}}\partial _\theta [\nu ^{k}_s J*\nu ^{k}_s]\mathrm{d}s +Z^{k}_{t\wedge \tau ^k_{\sigma _N}}\right) \!,\nonumber \\ \end{aligned}$$
(2.17)

where

$$\begin{aligned} Z^k_{t} = \frac{1}{N} \sum _{j=1}^N \int _{T_{k-1}}^t \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-s}\left( \theta , \varphi ^{j,N}_s\right) \mathrm{d}W^j_s. \end{aligned}$$
(2.18)

Once again, we refer to Appendix A for the precise meaning of (2.17) and (2.18).

Remark 2.6

For the remainder of the paper we choose \(T(N)\sim N^{1/10}\) and \(\zeta \le 1/100\). The two exponents do not have any particular meaning: a look at the argument shows that the exponent for \(T(N)\) has in any case to be chosen smaller than \(1/2\), but then a number of technical estimates enter the game and we have settled for a value \(1/10\) without trying to get the optimal value that comes out of the method we use.
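To fix ideas, here is a rough, runnable illustration (our own code, with several simplifications) of the block structure of the iterative scheme: the time horizon of order \(N\) is cut into blocks of length \(T\sim N^{1/10}\), the phase is re-extracted at the beginning of each block, and the scheme is interrupted if the configuration wanders too far from \(M\), mimicking the stopping times (2.15). The synchronization center (1.13) stands in for the \(H_{-1}\) projection of Lemma 2.4, the distance to \(M\) is monitored through \({\varvec{r}}_{N,t}\) only, and all numerical values (including the factor 5 in the test) are illustrative: these are crude stand-ins for the objects used in the actual proof.

```python
import numpy as np
from scipy.special import iv

def block_scheme(N=400, K=2.0, zeta=0.01, dt=0.05, seed=1):
    """Sketch of the block decomposition of Sect. 2.4 on a simulated system (1.1)."""
    rng = np.random.default_rng(seed)
    r_K = 0.5
    for _ in range(5000):                                   # r = I_1(2Kr)/I_0(2Kr)
        r_K = iv(1, 2 * K * r_K) / iv(0, 2 * K * r_K)
    T = N ** 0.1                                            # block length, cf. Remark 2.6
    sigma_N = N ** (2 * zeta) * np.sqrt(T / N)              # proximity parameter (2.13)
    phi = rng.vonmises(0.0, 2 * K * r_K, size=N)            # start from a sample of q, i.e. on M
    phases = []
    for _ in range(int(N / T)):                             # roughly tau_f * N / T blocks
        z = np.exp(1j * phi).mean()
        if abs(abs(z) - r_K) > 5 * sigma_N:                 # crude stand-in for dist(mu, M) > sigma_N
            break                                           # analogue of the stopping times (2.15)
        phases.append(np.angle(z))                          # psi_{k-1}, cf. (2.14)
        for _ in range(int(T / dt)):                        # evolve the particles over one block
            w = np.exp(1j * phi).mean()
            drift = -K * np.imag(np.exp(1j * phi) * np.conj(w))
            phi = (phi + drift * dt + np.sqrt(dt) * rng.standard_normal(N)) % (2 * np.pi)
    return np.unwrap(np.array(phases))                      # slow, Brownian-looking path of phases
```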

3 A priori estimates: persistence of proximity to \(M\)

The aim of this section is to prove that, if we are (say, at time zero) sufficiently close to \(M\), we stay close to \(M\) for times \(O(N)\). The arguments in this section justify the choice of the proximity parameter \(\sigma _N\) that we have made in the iterative scheme. We first prove some estimates on the size of the noise term and then we will give the estimates on the empirical measure.

3.1 Noise estimates

We define the event

$$\begin{aligned} B^N&= \left\{ \sup _{1\leqslant k\leqslant n-1}\sup _{ t \in [T_k,T_{k+1}]} \left\| Z^k_t \right\| _{-1}\leqslant \sqrt{\frac{T}{N}}N^\zeta \right\} \nonumber \\&\bigcap \left\{ \sup _{1\leqslant k\leqslant n-1}\sup _{ t \in [T_k,T_{k+1}]} \left\| Z^{k,\perp }_t \right\| _{-1}\leqslant \frac{1}{\sqrt{N}}N^\zeta \right\} , \end{aligned}$$
(3.1)

where \(Z^{k,\perp }_t \) is defined precisely like \(Z^{k}_t \), see (2.18), except for the replacement of \({\mathcal G} _{t-s}^{\psi }(\cdot , \cdot )\) with \({\mathcal G} _{t-s}^{\psi } (\theta , \theta ')-e_{\psi _{k-1},0}(\theta ) f_{\psi _{k-1},0}(\theta ' )\) [see (3.3)].

Lemma 3.1

\(\lim _{N \rightarrow \infty } {\mathbb {P}} \left( B^N\right) =1\).

Proof

In order to perform the estimates we introduce and work with approximated versions of \(Z^{k}_t \) and \(Z^{k,\perp }_t \) (see Lemma 7.4). Define for \(T_{k-1}<t'<t\)

$$\begin{aligned} Z^k_{t,t'}=\frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{t'} \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-s}(\theta ,\varphi ^{j,N}_s)\mathrm{d}W^j_s. \end{aligned}$$
(3.2)

The kernel \({\mathcal G} ^{\psi _{k-1}}_\cdot \) in this case is (cf. Appendix A)

$$\begin{aligned} {\mathcal G} ^{\psi _{k-1}}_s(\theta ,\theta ') = \sum _{l=0}^\infty e^{-s\lambda _l}e_{\psi _{k-1},l}(\theta )f_{\psi _{k-1},l}(\theta '), \end{aligned}$$
(3.3)

where \(\lambda _l\) are the ordered eigenvalues of \(-L_{\psi _{k-1}}\), \(e_{\psi _{k-1},l}\) are the associated eigenfunctions of unit norm in \(H_{-1,1/{q_{\psi _{k-1}}}}\), cf. Remark 2.2, and \(f_{\psi _{k-1},l}\) are the eigenfunctions of \(L_{\psi _{k-1}}^*\), the adjoint in \({\mathbb {L}} ^2\) (see Appendix A).

Very much in the same way we define

$$\begin{aligned} Z_{t,t'}^{k,\perp }=\frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{t'}\partial _{\theta '} {\mathcal G} ^{\psi _{k-1},\perp }_{t-s}(\theta ,\varphi ^{j,N}_s)\mathrm{d}W^j_s, \end{aligned}$$
(3.4)

with

$$\begin{aligned} {\mathcal G} ^{\psi _{k-1},\perp }_s(\theta ,\theta ') = \sum _{l=1}^\infty e^{-s\lambda _l}e_{\psi _{k-1},l}(\theta )f_{\psi _{k-1},l}(\theta '). \end{aligned}$$
(3.5)

We decompose for \(T_{k-1}<s'<s<t\) and \(s'<t'<t\)

$$\begin{aligned} Z^k_{t,t'}-Z^k_{s,s'}&= \frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{s'}\left( \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\theta ,\varphi ^{j,N}_u)-\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{s-u}(\theta ,\varphi ^{j,N}_u)\right) \mathrm{d}W^j_u \nonumber \\&+\frac{1}{N} \sum _{j=1}^N\int _{s'}^{t'}\partial _{\theta '} {\mathcal G} ^{\psi _{k-1}}_{t-u}(\theta ,\varphi ^{j,N}_u)\mathrm{d}W^j_u, \end{aligned}$$
(3.6)

and an absolutely analogous formula holds for \(Z^{k,\perp }\): in fact the bounds for \(Z^{k}\) and \(Z^{k,\perp }\) are obtained with the same technique even if the results are slightly different due to the presence of the zero eigenvalue in \(Z^{k}\). Moreover we apply \( \Vert a+b\Vert ^2 \leqslant 2(\Vert a\Vert ^2 + \Vert b\Vert ^2)\), so that we can estimate the two terms in the right-hand side of (3.6) separately.

We start with the second term on the right-hand side of (3.6): by the orthogonality properties of the eigenfunctions we obtain

$$\begin{aligned}&\left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^2\nonumber \\&\quad = \frac{1}{N^2} \sum _{l=0}^\infty \sum _{j,j'=1}^N \int _{s'}^{t'}\int _{s'}^{t'} e^{-(2t-u-u')\lambda _l}f'_{\psi _{k-1},l}(\varphi ^{j,N}_u)f'_{\psi _{k-1},l}(\varphi ^{j',N}_{u'})\mathrm{d}W^j_u\mathrm{d}W^{j'}_{u'},\qquad \qquad \end{aligned}$$
(3.7)

and by taking the expectation

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \nonumber \\&\quad =\frac{1}{N^2}\sum _{l=0}^\infty \sum _{j=1}^N \int _{s'}^{t'}e^{-2(t-u)\lambda _l}{\mathbb {E}} \left[ (f_{\psi _{k-1},l}' (\varphi ^{j,N}_u))^2\right] \mathrm{d}u. \end{aligned}$$
(3.8)

By Corollary 8.6 there exists a constant \(C_1\) such that

$$\begin{aligned} {\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u} (\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \!\leqslant \! \frac{C_1}{N}\sum _{l=0}^\infty \int _{s'}^{t'}e^{-2(t-u)\lambda _l}\mathrm{d}u.\qquad \quad \end{aligned}$$
(3.9)

Proposition 8.4 and Remark 8.3 lead us to

$$\begin{aligned} \sum _{l=0}^\infty \int _{s'}^{t'}e^{-2(t-u)\lambda _l}\mathrm{d}u \!\leqslant \! \sum _{l=0}^\infty \int _{s'}^{t'}e^{-(t-u)\frac{l^2}{C}}\mathrm{d}u \!\leqslant \! C\sum _{l=0}^\infty \frac{1}{l^2}\left( 1-e^{-(t'-s')\frac{l^2}{C}}\right) ,\qquad \qquad \end{aligned}$$
(3.10)

where the addend with \(l=0\) (times \(C\)) has to be read as \(t'-s'\). The right-most term in (3.10) for \(t'-s'\ge 1\) can be bounded by \(C(t'-s')+ C\sum _{l=1}^\infty 1/l^2 \le 3C(t'-s')\). Instead for \(t'-s'<1\) we decompose the same term and then estimate as follows:

$$\begin{aligned}&C\sum _{l=0}^{\llcorner (t'-s')^{-1/2}\lrcorner } \frac{1}{l^2}\left( 1-e^{-(t'-s')\frac{l^2}{C}}\right) +C\sum _{l=\llcorner (t'-s')^{-1/2}\lrcorner +1}^\infty \frac{1}{l^2}\left( 1-e^{-(t'-s')\frac{l^2}{C}}\right) \nonumber \\&\quad \leqslant \sum _{l=0}^{\llcorner (t'-s')^{-1/2}\lrcorner } (t'-s')+\sum _{l=\llcorner (t'-s')^{-1/2}\lrcorner +1}^\infty \frac{C}{l^2} \le (3+2C) \sqrt{t'-s'}, \end{aligned}$$
(3.11)

where for the first term we have used \((1-\exp (-a)) \le a\), for \(a\ge 0\). Therefore we have proven that there exists \(C\) such that for every \(k\) and every \(s\), \(s'\), \(t\), \(t'\) such that \(T_{k-1} <s' <s<t\) and \(s' <t'<t\) we have

$$\begin{aligned} {\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '} {\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \leqslant \frac{C h_1(t'-s')}{N}. \end{aligned}$$
(3.12)

with

$$\begin{aligned} h_1(u) := u^{1/2}\mathbf {1}_{[0,1)}(u) +u\mathbf {1}_{[1,\infty )}(u). \end{aligned}$$
(3.13)

We can do better in the case of \({\mathcal G} ^{\psi _{k-1},\perp }\), for which a direct inspection of the argument we have just presented shows that the linearly growing term in the estimate can be avoided (since the term \(l=0\) is no longer there) and the net result is

$$\begin{aligned} {\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1},\perp }_{t-u}(\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \leqslant \frac{C h_2(t'-s')}{N}. \end{aligned}$$
(3.14)

with \(h_2\) defined as

$$\begin{aligned} h_2(u) = u^{1/2}\mathbf {1}_{[0,1)}(u) +\mathbf {1}_{[1,\infty )}(u). \end{aligned}$$
(3.15)

As for the first term on the right-hand side of (3.6), we have

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{s'}\left( \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)-\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{s-u}(\cdot ,\varphi ^{j,N}_u)\right) \mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \nonumber \\&\qquad = \frac{1}{N^2} \sum _{l=1}^\infty \sum _{j=1}^N \int _{T_{k-1}}^{s'}\left( e^{-(t-u)\lambda _l}-e^{-(s-u)\lambda _l}\right) ^2{\mathbb {E}} \left[ (f'_{\psi _{k-1},l}(\varphi ^{j,N}_u))^2\right] \mathrm{d}u,\qquad \qquad \end{aligned}$$
(3.16)

and, by proceeding like for (3.9), we see that the expression in (3.16) is bounded by

$$\begin{aligned} \frac{C_1}{N}\sum _{l=1}^\infty \frac{\left( 1- e^{-\lambda _l(t-s')}\right) ^2}{\lambda _l} \le \frac{C_1 C}{N}\sum _{l=1}^\infty \frac{\left( e^{-(t-s')\frac{l^2}{C}}-1\right) ^2}{l^2}. \end{aligned}$$
(3.17)

This last term is estimated once again by separating the two cases of \(t-s'\) small and large. The net result is that there exists \(C>0\) such that for every \(k\), every \(s\), \(s'\) and \(t\) such that \(T_{k-1}<s'< s<t\) we have

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{s'}\left( \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)-\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{s-u}(\cdot ,\varphi ^{j,N}_u)\right) \mathrm{d}W^j_u\right\| _{-1,1/q}^2\right] \nonumber \\&\quad \leqslant \frac{C h_2(t-s')}{N}. \end{aligned}$$
(3.18)

In order to complete the proof of Lemma 3.1 quadratic estimates do not suffice: we need to generalize (3.12), (3.14) and (3.18) to larger exponents. We actually need estimates on moments of order \(2m\), with \(m\) finite, but sufficiently large, so as to apply the standard Kolmogorov Lemma type estimates and get uniform bounds. We are going to use

$$\begin{aligned} \Vert a + b \Vert ^m \leqslant 2^{m-1} (\Vert a \Vert ^m + \Vert b \Vert ^m), \end{aligned}$$
(3.19)

but actually we will not track the \(m\) dependence of the constants. We aim at showing that the moments of order \(2m\) of the quantities we are interested in are bounded by the \(m{\mathrm{th}}\) power of the estimate found in the quadratic case, times an \(m\)-dependent constant.

For \(m=1,2, \ldots \), the \(m\)th-power of the expression in (3.7) gives

$$\begin{aligned}&\left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\theta ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^{2m}\nonumber \\&\quad = \frac{1}{N^{2m}} \sum _{l_1,\dots ,l_m=0}^\infty \sum _{j_1,j_1',\dots ,j_m,j'_m=1}^N F^{j_1}_{l_1}(t,s',t')F^{j'_1}_{l_1}(t,s',t')\cdots \nonumber \\&\qquad \times F^{j_m}_{l_m}(t,s',t')F^{j'_m}_{l_m}(t,s',t'), \end{aligned}$$
(3.20)

in which we have introduced the random variables

$$\begin{aligned} F^j_l(t,s',t') = \int _{s'}^{t'} e^{-\lambda _l(t-u)}f'_{\psi _{k-1},l}(\varphi ^{j,N}_u)\mathrm{d}W^j_u. \end{aligned}$$
(3.21)

We now take the expectation of both sides of (3.20): all the terms in the sum in which some Brownian motion does not appear an even number of times vanish. The number of non-zero terms in the expectation can thus be bounded by \((2m)!N^m\). Applying the Itô formula to each of these non-zero terms, we get at most \((2m)!/(2^mm!)\) terms (the number of ways of pairing \(2m\) elements into couples) of the type \(I_1\cdots I_m\), where

$$\begin{aligned} I_k =I_k(l_1,l_2) = \int _{s'}^{t'} e^{-(\lambda _{l_1}+\lambda _{l_2})(t-u)}{\mathbb {E}} \left[ f'_{\psi _{k-1},l_1}(\varphi ^{j,N}_u)f'_{\psi _{k-1},l_2}(\varphi ^{j,N}_u) \right] \mathrm{d}u.\qquad \quad \end{aligned}$$
(3.22)
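To illustrate the counting behind the bound \((2m)!N^m\) (an example we add for concreteness): for \(m=2\), the expectation of \(F^{j_1}_{l_1}F^{j'_1}_{l_1}F^{j_2}_{l_2}F^{j'_2}_{l_2}\) can be nonzero only if the four indices are all equal or match in two pairs, i.e. only in the configurations

$$\begin{aligned} \{j_1=j'_1,\ j_2=j'_2\},\qquad \{j_1=j_2,\ j'_1=j'_2\},\qquad \{j_1=j'_2,\ j'_1=j_2\}, \end{aligned}$$

which leaves at most \(3N^2\leqslant (2m)!\,N^m\) nonvanishing terms, in agreement with the bound above.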

We now observe that

$$\begin{aligned} \vert I_k(l_1,l_2)\vert \le \sqrt{I_k(l_1,l_1)I_k(l_2,l_2)}, \end{aligned}$$
(3.23)

and each index \(l_i\) in the first sum in the right-hand side of (3.20) either gives rise directly to a diagonal term \(I_k(l_i,l_i)\) (for this it is needed that the two corresponding stochastic integrals share the same Brownian motion), or, in the arising products, a term \(I_k(l_i,l_{i'})\) is associated with a term of the type \(I_k(l_i,l_{i''})\), to which (3.23) applies. Therefore the expression obtained after applying the Itô formula can be bounded by a sum of terms of the type \(\hat{I}_1\cdots \hat{I}_m\), with

$$\begin{aligned} \hat{I}_k = I_k(l,l)= \int _{s'}^{t'} e^{-2\lambda _l(t-u)}{\mathbb {E}} \left[ (f'_{\psi _{k-1},l}(\varphi ^{j,N}_u))^2\right] \mathrm{d}u. \end{aligned}$$
(3.24)

Therefore we are facing the same estimates that we have encountered in the quadratic case, see (3.8) and (3.9), except of course for the combinatorial contribution. In the end we obtain that there exists \(C=C_m\) such that

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{s'}^{t'}\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u} (\cdot ,\varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^{2m}\right] \leqslant C\frac{h^m_1(t'-s')}{N^m},\end{aligned}$$
(3.25)
$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _{j=1}^N\int _{s'}^{t'} \partial _{\theta '}{\mathcal G} ^{\psi _{k-1},\perp }_{t-u}(\cdot , \varphi ^{j,N}_u)\mathrm{d}W^j_u\right\| _{-1,1/q}^{2m}\right] \leqslant C\frac{h_2^m(t'-s')}{N^m}.\qquad \quad \end{aligned}$$
(3.26)

In a similar way

$$\begin{aligned}&\left\| \frac{1}{N} \sum _{j=1}^N\int _{T_{k-1}}^{s'}\left( \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)- \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{s-u}(\cdot ,\varphi ^{j,N}_u) \right) \mathrm{d}W^j_u\right\| _{-1,1/q}^{2m}\nonumber \\&\quad =\frac{1}{N^{2m}} \sum _{l_1,\dots ,l_m=0}^\infty \sum _{j_1,j_1',\dots ,j_m,j'_m=1}^N G^{j_1}_{l_1}(s,t,s')G^{j'_1}_{l_1} (s,t,s')\cdots \nonumber \\&\qquad \times G^{j_m}_{l_m}(s,t,s')G^{j'_m}_{l_m}(s,t,s') \end{aligned}$$
(3.27)

with

$$\begin{aligned} G^{j}_{l}(s,t,s')= \int _{T_{k-1}}^{s'} \left( e^{-\lambda _l(t-u)}-e^{-\lambda _l(s-u)}\right) f'_{\psi _{k-1},l} (\varphi ^{j,N}_u)\mathrm{d}W^j_u. \end{aligned}$$
(3.28)

We reduce the problem as above to the study of products of integral terms \(J_1\cdots J_m\) with

$$\begin{aligned} J_k&= \int _{T_{k-1}}^{s'}\left( e^{-\lambda _{l_1}(t-u)}-e^{-\lambda _{l_1} (s-u)}\right) \left( e^{-\lambda _{l_2}(t-u)}-e^{-\lambda _{l_2}(s-u)}\right) \nonumber \\&\times {\mathbb {E}} \left[ f'_{\psi _{k-1},l_1}(\varphi ^{j,N}_u)f'_{\psi _{k-1}, l_2}(\varphi ^{j,N}_u)\right] \mathrm{d}u, \end{aligned}$$
(3.29)

and then, like before, in terms of products of diagonal terms of the type

$$\begin{aligned} \hat{J}_k= \int _{T_{k-1}}^{s'}\left( e^{-\lambda _l(t-u)}-e^{-\lambda _l(s-u)} \right) ^2{\mathbb {E}} \left[ (f'_{\psi _{k-1},l}(\varphi ^{j,N}_u))^2\right] \mathrm{d}u. \end{aligned}$$
(3.30)

Again we are reduced to estimating terms that have already appeared in the quadratic case, see (3.16), so we obtain that there exists \(C=C_m\) such that

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \frac{1}{N} \sum _ {j=1}^N\int _{T_{k-1}}^{s'} \left( \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{t-u}(\cdot ,\varphi ^{j,N}_u)- \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{s-u}(\cdot ,\varphi ^{j,N}_u)\right) \mathrm{d}W^j_u\right\| _{-1,1/q}^{2m}\right] \nonumber \\&\qquad \leqslant C\frac{h_2^m(t-s)}{N^m}. \end{aligned}$$
(3.31)

We now let \(t' \nearrow t\) and \(s'\nearrow s\) and, by applying Fatou’s Lemma and Lemma 7.4, from (3.6), (3.25), (3.26) and (3.31) we get

$$\begin{aligned} {\mathbb {E}} \left[ \left\| Z^k_t-Z^k_s \right\| _{-1}^{2m} \right] \leqslant C\frac{h_1^m(t-s)}{N^m}, \end{aligned}$$
(3.32)

and

$$\begin{aligned} {\mathbb {E}} \left[ \left\| Z_t^{k,\perp }-Z_s^{k,\perp } \right\| _{-1}^{2m} \right] \leqslant C\frac{h_2^m(t-s)}{N^m}. \end{aligned}$$
(3.33)

The fact that we are allowed to drop the weight in the \(H_{-1}\) norm is of course due to the norm equivalence.

We are now in a position to apply the Garsia–Rodemich–Rumsey Lemma [33]:

Lemma 3.2

Let \(p\) and \(\Psi \) be continuous, strictly increasing functions on \([0,\infty )\) such that \(p(0)=\Psi (0)=0\) and \(\lim _{t\nearrow \infty } \Psi (t)=\infty \). Given \(T>0\) and \(\phi \) continuous on \([0,T]\) and taking its values in a Banach space \((E,\Vert .\Vert )\), if

$$\begin{aligned} \int _0^T\int _0^T \Psi \left( \frac{\Vert \phi (t)-\phi (s)\Vert }{p(|t-s|)}\right) \mathrm{d}s\mathrm{d}t \leqslant B < \infty , \end{aligned}$$
(3.34)

then for \(0\leqslant s\leqslant t\leqslant T\):

$$\begin{aligned} \Vert \phi (t)-\phi (s)\Vert \leqslant 8\int _0^{t-s} \Psi ^{-1}\left( \frac{4B}{u^2}\right) p(\mathrm{d}u). \end{aligned}$$
(3.35)
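Before applying the lemma, let us record (a short computation we spell out for convenience) what (3.35) yields for the power-law choices made below in (3.36), namely \(p(u)=u^{\frac{2+\zeta }{2m}}\) and \(\Psi (u)=u^{2m}\):

$$\begin{aligned} 8\int _0^{t-s} \Psi ^{-1}\left( \frac{4B}{u^2}\right) p(\mathrm{d}u) = 8\,(4B)^{\frac{1}{2m}}\,\frac{2+\zeta }{2m}\int _0^{t-s} u^{\frac{\zeta }{2m}-1}\,\mathrm{d}u = 8\,(4B)^{\frac{1}{2m}}\,\frac{2+\zeta }{\zeta }\,(t-s)^{\frac{\zeta }{2m}}, \end{aligned}$$

so that raising both sides to the power \(2m\) gives precisely a bound of the form \(C(m,\zeta )\,B\,(t-s)^{\zeta }\).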

We apply Lemma 3.2 with

$$\begin{aligned} \phi (t) =Z^k_{T_{k-1}+t}, \ \ p(u) = u^{\frac{2+\zeta }{2m}}\ \ \text { and } \ \ \Psi (u) = u^{2m}, \end{aligned}$$
(3.36)

and \(\zeta =1/100\) (Remark 2.6). With these choices we can find an explicit constant \(C=C(m, \zeta )\) such that

$$\begin{aligned} \Vert Z^k_t-Z^k_s \Vert _{-1}^{2m} \le C ( t-s ) ^\zeta B, \end{aligned}$$
(3.37)

for every \(s\) and \(t\) such that \(T_{k-1}\!\!\leqslant \! s\!<\!\! t\!\!\leqslant \!\! T_k\) and \(B\) is a positive random variable such that

$$\begin{aligned} {\mathbb {E}} [B] \le \frac{C}{N^m} \int _0^T \int _0^T \frac{h_1^m(\vert t-s \vert )}{\vert t-s\vert ^{2+\zeta }} \mathrm{d}s \mathrm{d}t, \end{aligned}$$
(3.38)

where \(C\) is the constant in (3.32). For \(m>4\) the function \(t \mapsto h_1^m(t) / t^{2+\zeta }\), defined for \(t>0\), is increasing (and it tends to zero for \(t \searrow 0\)). So \({\mathbb {E}} [B]\) is bounded by \(CN^{-m} h_1^m(T)/T^\zeta \) and therefore

$$\begin{aligned} {\mathbb {E}} \left[ \sup _{T_{k-1}\leqslant s< t\leqslant T_k}\frac{\Vert Z^k_t-Z^k_s \Vert _{-1}^{2m}}{|t-s|^\zeta }\right] \leqslant C \frac{T^{m-\zeta }}{N^m}, \end{aligned}$$
(3.39)

which leads to

$$\begin{aligned} {\mathbb {P}} \left[ \sup _{T_{k-1}\leqslant t\leqslant T_k} \Vert Z^k_t \Vert _{-1} \geqslant \sqrt{\frac{T}{N}}N^\zeta \right] \leqslant C\frac{1}{N^{m\zeta }}. \end{aligned}$$
(3.40)
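For the last step, note that (a detail we spell out) \(Z^k_{T_{k-1}}=0\) by the definition of \(Z^k\) (cf. the analogous \(\widetilde{Z}^m\) in (5.15)), so that

$$\begin{aligned} {\mathbb {E}} \left[ \sup _{T_{k-1}\leqslant t\leqslant T_k}\Vert Z^k_t\Vert _{-1}^{2m}\right] \leqslant T^\zeta \, {\mathbb {E}} \left[ \sup _{T_{k-1}\leqslant s< t\leqslant T_k}\frac{\Vert Z^k_t-Z^k_s\Vert _{-1}^{2m}}{|t-s|^\zeta }\right] \leqslant C\left( \frac{T}{N}\right) ^m, \end{aligned}$$

and Markov's inequality with threshold \(\big (\sqrt{T/N}\,N^\zeta \big )^{2m}=(T/N)^m N^{2m\zeta }\) gives a bound \(C N^{-2m\zeta }\), which is even slightly better than (3.40).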

Then (recall that \(n=n_N=\frac{N}{T}\)) we deduce

$$\begin{aligned} {\mathbb {P}} \left[ \sup _{1\leqslant k\leqslant n} \sup _{T_{k-1}\leqslant t\leqslant T_k} \Vert Z^k_t \Vert _{-1} \geqslant \sqrt{\frac{T}{N}}N^\zeta \right] \leqslant C\frac{1}{TN^{m\zeta -1}}, \end{aligned}$$
(3.41)

where the right hand side tends to 0 when \(m\) is chosen sufficiently large. A similar argument gives for \(Z_t^{k,\perp }\)

$$\begin{aligned} {\mathbb {P}} \left[ \sup _{1\leqslant k\leqslant n} \sup _{T_{k-1}\leqslant t\leqslant T_k} \Vert Z_t^{k,\perp } \Vert _{-1} \geqslant \frac{1}{\sqrt{N}}N^\zeta \right] \leqslant C\frac{T^{\zeta -1}}{N^{m\zeta -1}}. \end{aligned}$$
(3.42)

\(\square \)

We now give the main result of the section:

Proposition 3.3

If \(\Vert \nu ^1_0 \Vert _{-1} \leqslant \frac{N^{2\zeta }}{\sqrt{N}}\) and if the event \(B^N\) defined in (3.1) is realized (an event whose probability approaches \(1\) as \(N \rightarrow \infty \)), then we have

$$\begin{aligned} \sup _{1\leqslant k\leqslant n}\sup _{ t \in [T_{k-1},T_{k}]} \left\| \nu ^k_t \right\| _{-1}\leqslant \sqrt{\frac{T}{N}}N^{2\zeta }, \end{aligned}$$
(3.43)

and

$$\begin{aligned} \max _{1\leqslant k\leqslant n} \left\| \nu ^{k}_{T_{k-1}} \right\| _{-1}\leqslant \frac{N^{2\zeta }}{\sqrt{N}}. \end{aligned}$$
(3.44)

Proof

From (2.17) and Lemma 7.2 we get, for all \(k=1,\dots ,n\) and \(t\in [T_{k-1},T_k]\),

$$\begin{aligned} \Vert \nu ^k_t \Vert _{-1} \!\leqslant \! Ce^{-\lambda _1 (t-T_{k-1})} \Vert \nu ^k_{T_{k-1}}\Vert _{-1} + C \int _{T_{k-1}}^t \left( 1+\frac{1}{\sqrt{t-s}}\right) \Vert \nu ^k_s \Vert _{-1}^2 \mathrm{d}s + \Vert Z^k_t \Vert _{-1}.\nonumber \\ \end{aligned}$$
(3.45)

The constant \(C\) in front of the first term of the right-hand side above would be equal to \(1\) if we were using the \(\Vert .\Vert _{-1,1/q_{\psi _{k-1}}}\) norm. Let us assume that \(\Vert \nu ^k_{T_{k-1}} \Vert _{-1}\le N^{2\zeta }/\sqrt{N}\); since we are working in \(B^N\), we obtain

$$\begin{aligned} \Vert \nu ^k_t\Vert _{-1} \!\leqslant \! Ce^{-\lambda _1 (t-T_{k-1})} \frac{N^{2\zeta } }{\sqrt{N}}\!+\! C(T\!+\!\sqrt{T})\sup _{T_{k-1}\leqslant s\leqslant t}\Vert \nu ^k_s \Vert _{-1}^2 \!+\! \frac{\sqrt{T}}{\sqrt{N}}N^{\zeta }.\qquad \qquad \end{aligned}$$
(3.46)

Therefore we readily see that if we define

$$\begin{aligned} t^* = \sup \left\{ t\in [T_{k-1},T_k]:\ \Vert \nu ^k_s \Vert _{-1} \leqslant \frac{\sqrt{T}}{\sqrt{N}}N^{2\zeta }\ \text { for all } s\in [T_{k-1},t]\right\} , \end{aligned}$$
(3.47)

we have that for \(t\le t^*\)

$$\begin{aligned} \Vert \nu ^k_t\Vert _{-1} \le CN^{2\zeta - \frac{1}{2}}+2CT^2 N^{4\zeta -1}+ \sqrt{T} N^{\zeta -\frac{1}{2}}. \end{aligned}$$
(3.48)

Therefore, since \(\lim _N T^3 N^{-1+4\zeta }=0\) (see Remark 2.6), for \(N\) large enough we have \(t^*=T_k\), and (3.43) is reduced to proving \(\Vert \nu ^k_{T_{k-1}} \Vert _{-1}\le N^{2\zeta }/\sqrt{N}\) for \(k=1,2, \ldots , n\). This holds for \(k=1\): we are now going to show (3.44) by induction, that is, that the assumption propagates from \(k\) to \(k+1\).

To prove the bound on \(\nu ^{k+1}_{T_k}\), assuming the bound on \(\nu ^{k}_{T_{k-1}}\), we use the smoothness of the manifold \(M\). Since we are working in \(B^N\), \(\tau ^k_\sigma =T_k\) and we have

$$\begin{aligned} \nu ^{k+1}_{T_{k}}&= q_{\psi _{k-1}}+\nu ^k_{T_k}-q_{\psi _k} \nonumber \\&= P^\perp _{\psi _k}\left[ q_{\psi _{k-1}}+\nu ^k_{T_k}-q_{\psi _k}\right] \nonumber \\&= \left( P^\perp _{\psi _k}-P^\perp _{\psi _{k-1}}\right) \left[ q_{\psi _{k-1}}+\nu ^k_{T_k}-q_{\psi _k}\right] +P^\perp _{\psi _{k-1}}\left[ q_{\psi _{k-1}}-q_{\psi _k}\right] +P^\perp _{\psi _{k-1}}\nu ^k_{T_k},\nonumber \\ \end{aligned}$$
(3.49)

Since the mapping \(\psi \mapsto P^\perp _\psi \) is smooth on the compact \(M\), we have (cf. Sect. 2.2)

$$\begin{aligned} \left\| P^\perp _{\psi _k}-P^\perp _{\psi _{k-1}} \right\| _{{\mathcal L} (H_{-1},H_{-1})}\leqslant C\left| \psi _k-\psi _{k-1} \right| , \end{aligned}$$
(3.50)

and the identities

$$\begin{aligned} \psi _k-\psi _{k-1}= \mathtt {p}(\mu _{T_k})-\mathtt {p}(\mu _{T_{k-1}}), \end{aligned}$$
(3.51)

and

$$\begin{aligned} \mu _{T_k}-\mu _{T_{k-1}}= \nu ^k_{T_k}-\nu ^k_{T_{k-1}}, \end{aligned}$$
(3.52)

combined with the smoothness of \(\mathtt {p}\), lead to [using (3.43)]

$$\begin{aligned} \left\| P^\perp _{\psi _k}-P^\perp _{\psi _{k-1}} \right\| _{{\mathcal L} (H_{-1},H_{-1})} \leqslant C\frac{\sqrt{T}}{\sqrt{N}}N^{2\zeta }. \end{aligned}$$
(3.53)

On the other hand, the smoothness of \(q_\psi \) with respect to \(\psi \), (3.51) and (3.52) imply

$$\begin{aligned} \left\| q_{\psi _{k-1}}+\nu ^k_{T_k}-q_{\psi _k} \right\| _{-1} \leqslant C \left( \Vert \nu ^k_{T_{k-1}}\Vert _{-1} +\Vert \nu ^k_{T_k}\Vert _{-1}\right) , \end{aligned}$$
(3.54)

so the first term in the last line of (3.49) is of order \(\frac{T}{N}N^{4\zeta }\), which is much smaller than \( \frac{N^{2\zeta }}{\sqrt{N}}\) for \(N \rightarrow \infty \), since \(\lim _N T N^{2\zeta -\frac{1}{2}}=0\) (see Remark 2.6). Moreover, Lemma 2.4 implies

$$\begin{aligned} \left\| P^\perp _{\psi _{k-1}}\left[ q_{\psi _{k-1}}-q_{\psi _k}\right] \right\| _{-1} \!=\! \left\| P^\perp _{\psi _{k-1}} \left[ v(\mu _{T_{k-1}})\!-\!v(\mu _{T_k}) \right] \right\| _{-1} \!\leqslant \! C\left\| \mu _{T_{k-1}}\!-\!\mu _{T_k}\right\| _{-1},\nonumber \\ \end{aligned}$$
(3.55)

so the second term in the last line of (3.49) is also of order \(\frac{T}{N}N^{4\zeta }\). Finally, projecting (2.17) on \(\text {Range}\big (L_{q_{\psi _{k-1}}}\big )\) and by using again Lemma 7.2, we get

$$\begin{aligned} \left\| P^\perp _{{\psi _{k-1}}}\nu ^k_t\right\| _{-1}&\leqslant C e^{-\lambda _1 (t-T_{k-1})} \Vert \nu ^k_{T_{k-1}}\Vert _{-1} \nonumber \\&+ C \int _{T_{k-1}}^t \left( 1+\frac{1}{\sqrt{t-s}}\right) \Vert \nu ^k_s \Vert _{-1}^2 \mathrm{d}s + \Vert Z^{k,\perp }_t \Vert _{-1}, \end{aligned}$$
(3.56)

which, since \(\lim _N T^4 N^{5\zeta -1}=0\) (see Remark 2.6), leads for \(N\) large enough to

$$\begin{aligned} \left\| P^\perp _{{\psi _{k-1}}}\nu ^k_{T_k}\right\| _{-1} \leqslant \frac{N^{3\zeta /2}}{\sqrt{N}}. \end{aligned}$$
(3.57)

This takes care of the third term in the last line of (3.49) and by collecting the three estimates we obtain (3.44) and the proof is complete. \(\square \)

4 The effective dynamics on the tangent space

The following result states that each rotation increment of our discretization scheme is well approximated by the projection of the dynamical noise on the tangent space.

Proposition 4.1

We have the first order approximation in probability: for every \(\varepsilon >0\)

$$\begin{aligned} \lim _{N\rightarrow \infty }{\mathbb {P}} \left( \left| \sum _{k=1}^n (\psi _k-\psi _{k-1}) - \sum _{k=1}^n \frac{( Z^{k}_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q',q')_{-1,1/q}}\right| \le \varepsilon \right) = 1. \end{aligned}$$
(4.1)

Proof

Lemma 7.5 and Proposition 3.3 give (assuming that \(B^N\) is realized: we will do so throughout the proof)

$$\begin{aligned} \psi _k-\psi _{k-1}&= -\frac{( \nu ^k_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q', q')_{-1,1/q}}\nonumber \\&-\frac{1}{2\pi I_0^2(2Kr)}\frac{( \nu ^k_{T_k}, (\log q_{\psi _{k-1}})'')_{-1,1/q_{\psi _{k-1}}}}{( q',q')_{-1,1/q}}\frac{( \nu ^k_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q', q')_{-1,1/q}}\nonumber \\&+o\left( \frac{T}{N}\right) . \end{aligned}$$
(4.2)

Since \((\log q_{\psi _{k-1}})''\) is in \(R(L_{q_{\psi _{k-1}}})\), we have

$$\begin{aligned} ( \nu ^k_{T_k}, (\log q_{\psi _{k-1}})'')_{-1,1/q_{\psi _{k-1}}}= ( (\nu ^k_{T_k})^\perp , (\log q_{\psi _{k-1}})'')_{-1,1/q_{\psi _{k-1}}}, \end{aligned}$$
(4.3)

and thus using again Proposition 3.3 we get for the second term of the right-hand side

$$\begin{aligned}&\left\| \frac{1}{2\pi I_0^2(2Kr)}\frac{( \nu ^k_{T_k}, (\log q_{\psi _{k-1}})'')_{-1,1/q_{\psi _{k-1}}}}{( q',q')_{-1,1/q}}\frac{( \nu ^k_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q', q')_{-1,1/q}} \right\| _{-1} \nonumber \\&\qquad \leqslant C\frac{1}{\sqrt{N}}N^{2\zeta } \frac{\sqrt{T}}{\sqrt{N}}N^{2\zeta }, \end{aligned}$$
(4.4)

and hence it is \(o(T/N)\), since \(\lim _N N^{4\zeta }/\sqrt{T}=0\) (see Remark 2.6). So only the component on the tangent space of \(M\) at the point \(\psi _{k-1}\) is of order \(T/N\):

$$\begin{aligned} \psi _k-\psi _{k-1}= -\frac{( \nu ^k_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q', q')_{-1,1/q}}+o\left( \frac{T}{N}\right) . \end{aligned}$$
(4.5)

We now decompose this tangent term. Our goal is to show that the projection of the noise \(Z^{k}_{T_k}\) is the only term that gives a non-negligible contribution when \(N\) goes to infinity. However, a direct bound on the remainder (the nonlinear part of the evolution equation (2.17)) using the a priori bound \(\Vert \nu ^k_t\Vert _{-1}\!\leqslant \!\frac{\sqrt{T}}{\sqrt{N}}N^{2\zeta }\) is not sufficient. In fact

$$\begin{aligned} \left| \left( \int _{T_{k-1}}^{T_k} e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta [\nu ^k_sJ*\nu ^k_s]\mathrm{d}s, q'_{\psi _{k-1}}\right) _{-1,1/q_{\psi _{k-1}}}\right| \leqslant \frac{T^2}{N}N^{4\zeta }.\quad \end{aligned}$$
(4.6)
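Indeed (a quick tally we add for clarity), summing the right-hand side of (4.6) over the \(n=N/T\) blocks gives

$$\begin{aligned} n\, \frac{T^2}{N}N^{4\zeta } = \frac{N}{T}\,\frac{T^2}{N}N^{4\zeta } = T N^{4\zeta }, \end{aligned}$$

which does not tend to zero, so this crude bound cannot yield (4.1).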

In order to improve this estimate, the strategy is to re-inject (2.17) into the projection \(( I_k(T_k), q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}\), where

$$\begin{aligned} I_k(t) = \mathbf {1}_{\{\tau ^k_{\sigma _N}\geqslant T_{k-1}\}}\int _{T_{k-1}}^{t\wedge \tau ^k_{\sigma _N} } e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta [\nu ^k_sJ*\nu ^k_s]\mathrm{d}s, \end{aligned}$$
(4.7)

and this leads to a rather long expression

$$\begin{aligned} \sum _{k=1}^n (\psi _k-\psi _{k-1}) = \sum _{k=1}^n \frac{( Z^{k}_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q',q')_{-1,1/q}}+\sum _{k=1}^n\sum _{i=1}^9A_{k,i}+o(1), \end{aligned}$$
(4.8)

with

$$\begin{aligned} A_{k,1}&= \mathbf {1}_E \left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}J*\right. \right. \nonumber \\&\left. \left. \times \left( e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}\right) \right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,2}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ I_k(s)J*I_k(s)\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,3}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ Z^k_sJ*Z^k_s\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,4}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}J*I_k(s)\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,5}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ I_k(s)J*\left( e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}\right) \right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,6}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}J*Z^k_s\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,7}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ Z^k_sJ*\left( e^{-(s-T_{k-1})L_{q_{\psi _{k-1}}}}\nu ^k_{T_{k-1}}\right) \right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,8}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ I_k(s)J*Z^k_s\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star \nonumber \\ A_{k,9}&= \mathbf {1}_E\left( \int _\star e^{-(T_k-s)L_{q_{\psi _{k-1}}}}\partial _\theta \left[ Z^k_sJ*I_k(s)\right] \mathrm{d}s, q'_{\psi _{k-1}}\right) _\star , \end{aligned}$$
(4.9)

where we have used the shortcuts \(E=\{\tau ^k_{\sigma _N}\geqslant T_{k-1}\}\), \(\int _\star \) stands for \(\int _{T_{k-1}}^{T_k\wedge \tau ^k_{\sigma _N}}\) and \((\cdot , \cdot )_\star \) is \((\cdot , \cdot )_{-1,1/q_{\psi _{k-1}}}\).

The following bound (a direct consequence of Lemmas 7.2 and 7.3) is now going to be of help:

$$\begin{aligned}&\left\| \,\, \int _{T_{k-1}}^{T_k}e^{-(T_k-s)L_{q_{\psi _{k-1}}}} \partial _\theta [h_1(s)J*h_2(s)]\mathrm{d}s\right\| _{-1}\nonumber \\&\qquad \leqslant C\int _{T_{k-1}}^{T_k} \left( 1+\frac{1}{\sqrt{T_k-s}}\right) \Vert h_1(s)\Vert _{-1} \Vert h_2(s)\Vert _{-1}\mathrm{d}s. \end{aligned}$$
(4.10)

In fact it is not difficult to see that by using Proposition 3.3 and (4.10) we can efficiently bound all the \(A_{k,j}\)’s, except \(A_{k,3}\):

$$\begin{aligned} | A_{k,1} |&\leqslant \frac{1}{N}N^{5\zeta }, \quad | A_{k,2} |\leqslant \frac{T^5}{N^2}N^{9\zeta }, \quad | A_{k,4} | \leqslant \frac{T^2}{N^{3/2}}N^{7\zeta }, \quad | A_{k,5} | \leqslant \frac{T^2}{N^{3/2}}N^{7\zeta },\nonumber \\ | A_{k,6} |&\leqslant \frac{T^{1/2}}{N^{3/2}}N^{4\zeta }, \quad | A_{k,7} | \leqslant \frac{T^{1/2}}{N^{3/2}}N^{4\zeta }, \quad | A_{k,8} | \leqslant \frac{T^{7/2}}{N^{3/2}}N^{6\zeta } \quad \text {and} \quad | A_{k,9} | \leqslant \frac{T^{7/2}}{N^{3/2}}N^{6\zeta }. \end{aligned}$$
(4.11)

Since \(T^4N^{9\zeta -1}\rightarrow 0\) and \(N^{5\zeta }/T\rightarrow 0\) (see Remark 2.6), we get (recall that \(n=n_N=\frac{N}{T}\))

$$\begin{aligned} \sum _{k=1}^n (\psi _k-\psi _{k-1}) = \sum _{k=1}^n \frac{( Z^{k}_{T_k}, q'_{\psi _{k-1}}) _{-1,1/q_{\psi _{k-1}}} }{( q',q')_{-1,1/q}}+\sum _{k=1}^n A_{k,3}+o(1). \end{aligned}$$
(4.12)

For the \(A_{k,3}\) terms we need to use something more sophisticated: to deal with these terms we in fact rely on an averaging phenomenon. This method has been used in [4] for the same kind of problem. We write the Doob decomposition

$$\begin{aligned} \sum _{k=1}^{m} A_{k,3} = M_m +\sum _{k=1}^m \gamma _k, \end{aligned}$$
(4.13)

where

$$\begin{aligned} \gamma _k = {\mathbb {E}} \left[ A_{k,3}| {\mathcal F} _{T_{k-1}}\right] , \end{aligned}$$
(4.14)

and \(M_m\) is an \( {\mathcal F} _{T_m}\)-martingale with bracket

$$\begin{aligned} \langle M\rangle _m = \sum _{k=1}^m \left( {\mathbb {E}} \left[ A_{k,3}^2|{\mathcal F} _{T_{k-1}}\right] -\gamma _k^2\right) . \end{aligned}$$
(4.15)

We have

$$\begin{aligned} \gamma _k&= {\mathbb {E}} \left[ \frac{1}{N^2}\sum _{i,j=1}^N\int _{T_{k-1}}^{T_k\wedge \tau ^k_{\sigma _N}}\mathrm{d}W^i_s\int _{T_{k-1}}^{T_k\wedge \tau ^k_{\sigma _N}}\mathrm{d}W^j_{s'}\int _{\mathbb {S}} \mathrm{d}\theta \left( 1 -\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \right. \nonumber \\&\times \left. \int _{\mathbb {S}} \mathrm{d}\theta ''\partial {\mathcal G} ^{\psi _{k-1}}_{T_k-s}(\theta , \varphi ^{i,N}_s)J(\theta -\theta '')\partial {\mathcal G} ^{\psi _{k-1}}_{T_k-{s'}} (\theta '',\varphi ^{j,N}_{s'})\bigg | {\mathcal F} _{T_{k-1}} \right] \mathbf {1}_{\tau ^{k}_{\sigma _N}\geqslant T_{k-1}},\nonumber \\ \end{aligned}$$
(4.16)

where \(\partial {\mathcal G} ^{\psi }_{t}(\theta ,\theta '):=\partial _{\theta ' } {\mathcal G} ^{\psi }_{t}(\theta ,\theta ')\), and from this we obtain

$$\begin{aligned}&\gamma _k = {\mathbb {E}} \left[ \frac{1}{N}\int _{0}^{T\wedge \widetilde{\tau }_{\sigma _N}} \mathrm{d}s \int _{\mathbb {S}} \widetilde{\mu }_s(\mathrm{d}\theta ') \int _{\mathbb {S}} \mathrm{d}\theta \left( 1-\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \nonumber \right. \\&\qquad \left. \int _{\mathbb {S}} \mathrm{d}\theta ''\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T-s} (\theta ,\theta ')J(\theta -\theta '')\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T-s} (\theta '',\theta ')\right] \mathbf {1}_{\tau ^{k}_{\sigma _N}\geqslant T_{k-1}}, \end{aligned}$$
(4.17)

with

$$\begin{aligned} \widetilde{\mu }_s := \frac{1}{N} \sum _{j=1}^N \delta _{\widetilde{\varphi }^{j,N}_s}, \end{aligned}$$
(4.18)

where \(\{\widetilde{\varphi }^{j,N}_s\}_{s \ge 0}\) is a solution of (1.1) depending on \({\mathcal F} _{T_{k-1}}\) only through the initial condition

$$\begin{aligned} \widetilde{\varphi }^{j,N}_0 = \varphi ^{j,N}_{T_{k-1}}. \end{aligned}$$
(4.19)

The stopping time \(\widetilde{\tau }_{\sigma _N}\) is defined as follows:

$$\begin{aligned} \widetilde{\tau }_{\sigma _N} := \inf \{s>0, \Vert \widetilde{\mu }_s-q_{\psi _{T_{k-1}}}\Vert _{-1} >\sigma _N\}. \end{aligned}$$
(4.20)

We now write \( \widetilde{\mu }_s(\mathrm{d}\theta ')=q_{\psi _{k-1}}(\theta ')\mathrm{d}\theta ' + \widetilde{\nu }^k_s(\mathrm{d}\theta ')\) and split the right-hand side of (4.17) into the corresponding two terms.

The term coming from \(q_{\psi _{k-1}}(\theta ')\mathrm{d}\theta '\) is zero, as one can see by using the symmetry:

$$\begin{aligned} {\mathcal G} ^{\psi _{{k-1}}}_s(\psi _{k-1}+\theta ,\psi _{k-1}+\theta ') = {\mathcal G} ^{\psi _{{k-1}}}_s(\psi _{k-1}-\theta ,\psi _{k-1}-\theta ') \end{aligned}$$
(4.21)

which follows from the same statement with \(\psi _{{k-1}}=0\), which in turn is a consequence of the representation (Appendix A) and of the fact that if \(e_j\) is even (respectively, odd) then \(f_j\) is even (respectively, odd) too (see Sect. 2.1 and Appendix A).

For the term containing \(\widetilde{\nu }^k_s(\mathrm{d}\theta ')\) instead we get the bound

$$\begin{aligned}&\left| {\mathbb {E}} \left[ \frac{1}{N}\int _{T_{k-1}}^{T_k\wedge \widetilde{\tau }_{\sigma _N}} \mathrm{d}s \int _{\mathbb {S}} \mathrm{d}\widetilde{\nu }^k_s(\theta ') \int _{\mathbb {S}} \mathrm{d}\theta \int _{\mathbb {S}} \mathrm{d}\theta '' \left( 1-\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \right. \right. \nonumber \\&\qquad \times \left. \left. \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T_k-s}(\theta ,\theta ')J (\theta -\theta '')\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T_k-s}(\theta '', \theta ') \right] \right| \nonumber \\&\quad \leqslant {\mathbb {E}} \left[ \frac{1}{N}\int _{T_{k-1}}^{T_k\wedge \tau ^k_{\sigma _N}} \mathrm{d}s \Vert \widetilde{\nu }^k_s\Vert _{-1} \Vert H^k_s\Vert _{H_1} \right] \end{aligned}$$
(4.22)

where

$$\begin{aligned} H^k_s(\theta ')&= \int _{\mathbb {S}} \mathrm{d}\theta \int _{\mathbb {S}} \mathrm{d}\theta '' \left( 1-\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \nonumber \\&\quad \times \partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T_k-s}(\theta ,\theta ') J (\theta -\theta '')\partial _{\theta '}{\mathcal G} ^{\psi _{k-1}}_{T_k-s} (\theta '',\theta '). \end{aligned}$$
(4.23)

We now plug in the explicit representation for the kernels:

$$\begin{aligned} H^k_s(\theta ')&= \sum _{l_1,l_2=0}^\infty e^{-(\lambda _{l_1}+\lambda _{l_2})(T_k-s)} \int _0^{2\pi }\mathrm{d}\theta \int _0^{2\pi }\mathrm{d}\theta '' \left( 1-\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \nonumber \\&\quad \times e_{\psi _{k-1},l_1}(\theta )J(\theta -\theta '') e_{\psi _{k-1},l_2}(\theta '')f'_{\psi _{k-1},l_1}(\theta ') f'_{\psi _{k-1},l_2}(\theta '). \end{aligned}$$
(4.24)

We obtain

$$\begin{aligned}&\Vert H^k_s \Vert _1 \leqslant \sum _{l_1,l_2=0}^\infty e^{-(\lambda _{l_1}+\lambda _{l_2})(T_k-s)} \left| \int _0^{2\pi }\mathrm{d}\theta \int _0^{2\pi }\mathrm{d}\theta '' \left( 1-\frac{1}{2\pi I^2_0(2Kr)q_{\psi _{k-1}}(\theta )}\right) \nonumber \right. \\&\quad \times \left. e_{\psi _{k-1},l_1}(\theta )J(\theta -\theta '')e_{\psi _{k-1},l_2}(\theta '') \right| \left( \Vert f''_{\psi _{k-1},l_1} \Vert _2 \Vert f'_{\psi _{k-1},l_2}\Vert _\infty \!+\!\Vert f'_{\psi _{k-1},l_1}\Vert _\infty \Vert f''_{\psi _{k-1},l_2}\Vert _2\right) \!.\nonumber \\ \end{aligned}$$
(4.25)

We aim at proving the convergence of this sum. For the integral term, thanks to the rotation symmetry, we can limit the study to \(\psi _{k-1}=0\). Since \(J(\theta -\theta '')=-K\sin (\theta -\theta '')=- K\sin (\theta )\cos (\theta '')+K\cos (\theta )\sin (\theta '')\), we can split these double integrals into products of two simple ones. Corollary 8.5 implies that there exists \(l_0\) in \({\mathbb {N}} \) such that \(e_{0,l_0+2p}\) and \(e_{0,l_0+2p+1}\) can be written as

$$\begin{aligned} e_{0,l_0+2p}&= p q_0^{1/2}(c_{1,l_0+2p}v_{1,l_0+p}+c_{2,l_0+2p} v_{2,l_0+p})+O\left( \frac{1}{p}\right) ,\end{aligned}$$
(4.26)
$$\begin{aligned} e_{0,l_0+2p+1}&= p q_0^{1/2}(c_{1,l_0+2p+1}v_{1,l_0+p}+ c_{2,l_0+2p+1}v_{2,l_0+p})+O\left( \frac{1}{p}\right) ,\qquad \quad \end{aligned}$$
(4.27)

where

$$\begin{aligned} \sup _{l\geqslant l_0}\{ |c_{1,l}|,|c_{2,l}| \} < \infty \end{aligned}$$
(4.28)

and the functions \(v_{i,l}\) are defined in Proposition 8.4. The \(v_{i,l}\) are sums and products of sines and cosines, and there exists \(h\in {\mathbb {N}} \) such that the only non-zero Fourier coefficients of \(v_{i,l_0+p}\) have index between \(h+p-2\) and \(h+p+2\) and are bounded with respect to \(p\). We deduce that the simple integral terms containing \(e_{0,l_0+2p}\), which are of the form

$$\begin{aligned} C\int _0^{2\pi } q_0^{\pm 1/2}(\theta ) e_{0,l_0+2p}(\theta )g(\theta )\mathrm{d}\theta , \end{aligned}$$
(4.29)

where \(g\) is sine or cosine and \(C\) is a constant independent of \(p\), are, up to a correction of order \(1/p\), a bounded linear combination of the Fourier coefficients of \(q_0^{1/2}\) or \(q_0^{-1/2}\) with index between \(h+p-3\) and \(h+p+3\). The same argument applies for \(e_{0,l_0+2p+1}\). Since these Fourier coefficients decrease faster than exponentially (this can be seen by observing that \(\int _{\mathbb {S}} \exp (a \cos \theta )\cos (n\theta )\, \mathrm{d}\theta = 2\pi I_n(a)\) and that \(\exp (a \cos (\cdot ))\) is an entire function), these simple integral terms are of order \(1/p\). Using Remark 8.3 and Corollary 8.6 we deduce the following bound for \(\Vert H^k_s\Vert _1\):

$$\begin{aligned} \Vert H^k_s \Vert _1 \leqslant C+C\sum _{l_1,l_2=1}^\infty \frac{l_1+l_2}{l_1l_2}e^{-(T_k-s)\frac{l_1^2+l_2^2}{C}} +C\sum _{l=1}^\infty e^{-(T_k-s)\frac{l^2}{C}}, \end{aligned}$$
(4.30)

where the first term of the right hand side corresponds to the case \(l_1=0,l_2=0\) in (4.25), the second term corresponds to \(l_1>0,l_2>0\) and the third term to \(l_1=0,l_2>0\) or \(l_2=0,l_1>0\). Applying (4.22) and Proposition 3.3, we get:

$$\begin{aligned} |\gamma _k| &\leqslant C\frac{T^{1/2}}{N^{3/2}}N^{2\zeta }\int _{T_{k-1}}^{T_k} \mathrm{d}s\, \Vert H^k_s\Vert _1\nonumber \\ &\leqslant C\frac{T^{1/2}}{N^{3/2}}N^{2\zeta } \left( T + \sum _{h,l=1}^\infty \frac{h+l}{hl(h^2+l^2)} +\sum _{l=1}^\infty \frac{1}{l^2}\right) \leqslant C\frac{T^{3/2}}{N^{3/2}}N^{2\zeta },\qquad \end{aligned}$$
(4.31)
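For completeness, here is a short check (added here) that the double sum in (4.31) is finite: by the symmetry of the summand in \(h\) and \(l\), and since \(\sum _{h\geqslant 1}(h^2+l^2)^{-1}\leqslant C/l\),

$$\begin{aligned} \sum _{h,l=1}^\infty \frac{h+l}{hl(h^2+l^2)} = 2\sum _{h,l=1}^\infty \frac{1}{l(h^2+l^2)} \leqslant C\sum _{l=1}^\infty \frac{1}{l^2}<\infty , \end{aligned}$$

so the right-hand side of (4.31) is indeed finite.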

and thus for \(N\) large enough

$$\begin{aligned} \sum _{k=1}^n |\gamma _k| \leqslant \sqrt{\frac{T}{N}}N^{3\zeta }. \end{aligned}$$
(4.32)

On the other hand, applying Doob's inequality, (4.9) and Proposition 3.3, we obtain

$$\begin{aligned}&{\mathbb {P}} \left[ \sup _{1\leqslant m\leqslant n} |M_m|\geqslant \sqrt{\frac{T}{N}}N^{3\zeta }\right] \leqslant \frac{N}{T N^{6\zeta }}{\mathbb {E}} \left[ \langle M\rangle _n\right] \nonumber \\&\qquad \leqslant \frac{N^{1-6\zeta }}{T}\sum _{k=1}^n{\mathbb {E}} \left[ A_{k,3}^2\right] \leqslant \frac{T^3}{N^{1+2\zeta }}. \end{aligned}$$
(4.33)

Since \(T^3 N^{-1-2\zeta }\rightarrow 0\) (see Remark 2.6), the combination of (4.32) and (4.33) leads to

$$\begin{aligned} {\mathbb {P}} \left[ \left| \sum _{k=1}^n A_{k,3}\right| \geqslant \sqrt{\frac{T}{N}}N^{3\zeta }\right] \rightarrow 0, \end{aligned}$$
(4.34)

and the proof is complete. \(\square \)

5 Approach to \(M\)

The long time behavior of the solutions to (1.4) is rather well understood, so, in particular, we know that if \(p_0\) is not on the set attracted to the unstable solution \(\frac{1}{2\pi }\), then it converges to a probability density in \(M\) (cf. Proposition 5.2) and, in particular, it reaches a given (\(N\) independent) neighborhood of \(M\) in finite time: this is directly extracted from [6, 21]. This takes care of the first stage of the evolution, because the deterministic result directly extends to the empirical measure by standard arguments since the time horizon is finite. But we do need to get to distances of about \(N^{-1/2}\) and this requires a more attentive control of the dynamics. In fact, we exploit the approximate contracting properties of the dynamics when the empirical measure is close to \(q_\psi \). We talk about approximate contracting properties because the noise plays against getting to \(M\) and limits the contraction effect of the linearized operator. Nevertheless, the proof mimics the deterministic proof of nonlinear stability, to which the control of the noise is added. In principle the argument is straightforward: one exploits the spectral gap of the linearized evolution. In practice, one has to set up an iterative procedure similar to the one developed in Sect. 3, because the center of synchronization may change somewhat over long times. This procedure is however substantially easier than the one presented in Sect. 3, mostly because here the control required on the noise is for substantially shorter times (\(\log N\) versus \(N\)!), so we will not go through the arguments in full detail again.

Proposition 5.1

Choose \(p_0\in {\mathcal M} _1 {\setminus } U\) such that (1.9) is satisfied. Then there exist \(\psi _0\) (non random!), which depends on \(K\) and \(p_0(\cdot )\), a constant \(C\), which depends only on \(K\), and a random variable \( \Psi _N\) such that

$$\begin{aligned} \lim _{N \rightarrow \infty } {\mathbb {P}} \left( \left\| \mu _{N, \widetilde{\varepsilon }_N N} - q_{\Psi _N} \right\| _{-1} \le \frac{N^{2\zeta }}{\sqrt{N}} \right) = 1, \end{aligned}$$
(5.1)

where \(\widetilde{\varepsilon }_N := \lfloor C \log N \rfloor /N\), and \(\lim _N\Psi _N= \psi _0\) in probability. Moreover for \(\varepsilon \) and \(\varepsilon _N\) as in Theorem 1.1 we have

$$\begin{aligned} \lim _{N \rightarrow \infty } {\mathbb {P}} \left( \sup _{t \in [\varepsilon _N N,\widetilde{\varepsilon }_N N ]} \left\| \mu _{N, t} - q_{\psi _0} \right\| _{-1} \le \varepsilon \right) = 1. \end{aligned}$$
(5.2)

Proof

The proof is divided into two parts. First we prove, using the convergence of \(\mu _t=\mu _{N,t}\) to the deterministic solution \(p_t\), that for a given \(h>0\) (arbitrarily small), there exists \(t_0\) such that for \(\varepsilon \) small enough, \({\mathbb {P}} (\text {dist}(\mu _{N,t_0},M)\leqslant h)\rightarrow 1\) when \(N\rightarrow \infty \). Then we show that after a time of order \(\log N\), the empirical measure \(\mu _{t}\) moves to a distance \(N^{2\zeta -1/2}\) from \(M\), without a macroscopic change of the phase.

The first part of the proof relies on the following result:

Proposition 5.2

If \(p_0\in {\mathcal M} _1{\setminus } U\) then there exists \(\psi \in {\mathbb {S}} \) such that \(\lim _{t \rightarrow \infty } p_t = q_\psi \) in \(C^k({\mathbb {S}} ; {\mathbb {R}} )\) (for every \(k)\).

Proposition 5.2 is essentially taken from [21], in the sense that it follows by piecing together some results taken from [21]. We give below a proof that of course relies on [21]. We point out that the very same result can be proven also by adapting entropy production arguments, like in [2].

Proposition 5.2 guarantees that the deterministic solution \(p_t\) converges to an element \(q_{\psi _0}\) of \(M\). Therefore for \(t\ge t_0\), we have that \(p_t\) is no farther than \(h/2\) from \(q_{\psi _0}\) (this is a statement that can be made for example in \(C^k\), but here we just need it in \(H_{-1}\)). Actually, it is not difficult to see that one can choose \(t_0 = -\frac{2}{\lambda _1}\log h\), for \(h\) sufficiently small (\(\lambda _1\) is the spectral gap of \(L_{q_{\psi _0}}\)), but this is of little relevance here. Applying the Itô formula

$$\begin{aligned} \mu _t-p_t = e^{t\frac{\Delta }{2}}( \mu _0-p_0)-\int _0^t e^{(t-s)\frac{\Delta }{2}}[\mu _s J*\mu _s-p_sJ*p_s]\mathrm{d}s + z_t, \end{aligned}$$
(5.3)

where

$$\begin{aligned} z_t = \frac{1}{N}\sum _{j=1}^N \int _0^t \partial _{\theta '}{\mathcal H} _{t-s}(\theta ,\varphi ^{j,N}_s)\,\mathrm{d}W^j_s, \end{aligned}$$
(5.4)

\(e^{t\frac{\Delta }{2}}\) is the semi-group generated by the Laplacian and \({\mathcal H} _s\) is the kernel of \(e^{s\frac{\Delta }{2}}\) in \({\mathbb {L}} ^2\). Define \(W_N=\{ \omega :\ \Vert \mu _0-p_0\Vert _{-1} \leqslant \varepsilon \}\). Using the classical estimate \(\Vert e^{t \Delta /2}u\Vert _{-1}\leqslant \frac{C}{\sqrt{t}}\Vert u\Vert _{-2}\) and arguments similar to those of Sect. 3, we deduce that there exist events \(\widetilde{W}_N\subset W_N\) such that \({\mathbb {P}} (\widetilde{W}_N)\rightarrow 1\) and that for all outcomes in \(\widetilde{W}_N\) we have

$$\begin{aligned} \sup _{0\leqslant t\leqslant t_0}\Vert z_t\Vert _{-1}\leqslant \sqrt{\frac{t_0}{N}}N^\zeta . \end{aligned}$$
(5.5)

From now on, we restrict ourselves to \(\widetilde{W}_N\). From (5.3) we get, for all \(t\in [0,t_0]\),

$$\begin{aligned} \Vert \mu _t-p_t \Vert _{-1} \leqslant \varepsilon +C\int _0^t\frac{1}{\sqrt{t-s}}\Vert \mu _s-p_s \Vert _{-1} \mathrm{d}s + \sqrt{\frac{t_0}{N}}N^\zeta . \end{aligned}$$
(5.6)

The Gronwall–Henry inequality (see [32]) implies that there exists \(\gamma >0\) (independent of \(\varepsilon \) and \(N\)) such that

$$\begin{aligned} \sup _{t \le t_0} \Vert \mu _{t}-p_{t} \Vert _{-1} \leqslant \left( \varepsilon + \sqrt{\frac{t_0}{N}}N^\zeta \right) e^{\gamma t_0}. \end{aligned}$$
(5.7)

So for \(\varepsilon =h/4\) and \(N\) large enough, \(\Vert \mu _{t_0}-q_{\psi _{0}} \Vert _{-1} \leqslant h\) on the event \(\widetilde{W}_N\).

To show that we enter a neighborhood of size slightly larger than \(N^{-1/2}\) (it will be \(N^{2 \zeta -1/2}\)), we set up an iterative scheme. It is very similar to the one given in Sect. 2.4, but with times \(t_i\) bounded with respect to \(N\). These times are chosen so that after each iteration the distance between the empirical measure and \(M\) is at least divided by \(2\). We define \(h_0:=h\) and, for \(m \geqslant 1\),

$$\begin{aligned} t_m&:= t_{m-1}+\frac{1}{\lambda _1}|\log \alpha |,\end{aligned}$$
(5.8)
$$\begin{aligned} h_{m}&:= \frac{1}{2} h_{m-1}, \end{aligned}$$
(5.9)

until the index \(m_f\) defined by

$$\begin{aligned} m_f := \inf \left\{ m\geqslant 1,\, h_m\leqslant N^{2\zeta -1/2}\right\} . \end{aligned}$$
(5.10)
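For definiteness (a one-line computation we add), since \(h_m=2^{-m}h_0\), the definition (5.10) gives, for \(N\) large enough,

$$\begin{aligned} m_f = \left\lceil \Big(\frac{1}{2}-2\zeta \Big)\log _2 N+\log _2 h_0\right\rceil , \end{aligned}$$

so \(m_f\) is indeed of order \(\log N\).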

The constant \(\alpha \) above does not depend on \(N\) and will be chosen below; in particular \(m_f\) is of order \(\log N\). Then we define \(\widetilde{\tau }_0 := t_0\), and for \(1\leqslant m\leqslant m_f+1\)

$$\begin{aligned} \widetilde{\psi }_{m-1}&:= \mathtt {p}(\mu _{t_{m-1}}),\end{aligned}$$
(5.11)
$$\begin{aligned} \widetilde{\nu }^{m}_{t_{m-1}}&:= \mu _{t_{m-1}}-q_{\widetilde{\psi }_{m-1}} \end{aligned}$$
(5.12)

if \(\text {dist}(\mu _{t_{m-1}},M)\!\leqslant \! \sigma \) (see Lemma 2.4). We consider for \(1\!\leqslant \! m\!\leqslant \! m_f\) the stopping times

$$\begin{aligned} \widetilde{\tau }_m&:= \widetilde{\tau }_{m-1} \mathbf {1}_{\{\widetilde{\tau }_{m-1}<t_{m-1}\}}\nonumber \\&+\inf \{s\in [t_{m-1},t_m], \Vert \mu _s-q_{\widetilde{\psi }_{m-1}}\Vert _{-1}\geqslant \sigma \}\mathbf {1}_{\{\widetilde{\tau }_{m-1}\geqslant t_{m-1}\}}. \end{aligned}$$
(5.13)

and the process solution of

$$\begin{aligned} \widetilde{\nu }^{m}_{t} \!&= \! \mathbf {1}_{\{\widetilde{\tau }_m<t_{m-1}\}} \widetilde{\nu }^m_{\widetilde{\tau }_{m}} \!+\! \mathbf {1}_{\{\widetilde{\tau }_m\geqslant t_{m-1}\}} \nonumber \\&\!\times \! \left( e^{-(t\wedge \widetilde{\tau }_m-t_{m-1}) L_{\widetilde{\psi }_{m-1}}} \widetilde{\nu }^{m}_{t_{m-1}}\!-\!\int _{t_{m-1}}^{t\wedge \widetilde{\tau }_m} e^{-(t\wedge \widetilde{\tau }_m -s)L_{\widetilde{\psi }_{m-1}}} \partial _\theta [\widetilde{\nu }^{m}_s J*\widetilde{\nu }^{m}_s]\mathrm{d}s \!+\! \widetilde{Z}^{m}_{t\wedge \widetilde{\tau }_m}\right) \!,\nonumber \\ \end{aligned}$$
(5.14)

where

$$\begin{aligned} \widetilde{Z}^m_{t} = \frac{1}{N} \sum _{j=1}^N \int _{t_{m-1}}^t \partial _{\theta '}{\mathcal G} ^{\widetilde{\psi }_{m-1}}_{t-s}\left( \theta , \varphi ^{j,N}_s\right) \mathrm{d}W^j_s. \end{aligned}$$
(5.15)

With the same arguments as given in Lemma 3.1, we can prove (recall that \(m_f\) is of order \(\log N\)) that the probability of the event

$$\begin{aligned} \Omega _N := \left\{ \sup _{1\leqslant m\leqslant m_f} \sup _{t_{m-1}\leqslant t\leqslant t_m} \left\| \widetilde{Z}^m_t\right\| _{-1} \leqslant \sqrt{\frac{t_m-t_{m-1}}{N}}N^{\zeta }\right\} \end{aligned}$$
(5.16)

tends to 1 when \(N\rightarrow \infty \). From now on, we assume that \(\Omega _N\) is realized. We insist on the fact that the generic constants \(C\) appearing in the following do not depend on \(N\) and, if not mentioned, do not depend on \(\alpha \). From Lemma 7.2 and (5.14) we get that, for all \(1\leqslant m\leqslant m_f\),

$$\begin{aligned} \Vert \widetilde{\nu }^m_t\Vert _{-1} \leqslant C h_{m-1}+C(t+\sqrt{t})\sup _{s\in [t_{m-1},t]}\Vert \widetilde{\nu }^m_s\Vert _{-1}+\sqrt{\frac{t_m-t_{m-1}}{N}}N^{\zeta }.\qquad \end{aligned}$$
(5.17)

We now prove that, for \(1\leqslant m\leqslant m_f-1\), \(\Vert \widetilde{\nu }^{m}_{t_{m-1}}\Vert _{-1}\leqslant h_{m-1}\) implies \(\Vert \widetilde{\nu }^{m+1}_{t_m}\Vert _{-1}\leqslant h_{m}\), and that \(\Vert \widetilde{\nu }^{m_f}_{t_{m_f-1}}\Vert _{-1}\leqslant h_{m_f-1}\) implies \(\Vert \widetilde{\nu }^{m_f+1}_{t_{m_f}}\Vert _{-1}\leqslant N^{2\zeta -1/2}\). Define

$$\begin{aligned} s^*_m := \sup \{s\in [t_{m-1},t_m], \Vert \widetilde{\nu }^m_s\Vert _{-1}\leqslant h_{m-1}^{3/4}\}. \end{aligned}$$
(5.18)

Then for \(s<s^*_m\), if \(\Vert \widetilde{\nu }^{m}_{t_{m-1}}\Vert _{-1}\leqslant h_{m-1}\), we get, using (5.17),

$$\begin{aligned} \Vert \widetilde{\nu }^m_s\Vert _{-1} \leqslant Ch_{m-1}+C(s+\sqrt{s}) h_{m-1}^{3/2}+\sqrt{\frac{t_{m}-t_{m-1}}{N}}N^{\zeta }. \end{aligned}$$
(5.19)

Since \(N^{2\zeta -1/2}\leqslant h_{m-1}\), we deduce that \(s^*_m=t_m\) if \(h_0\) is small enough. Then using (5.14) we get

$$\begin{aligned} \Vert \widetilde{\nu }^m_{t_m}\Vert _{-1} \leqslant C\alpha h_{m-1} +C h_{m-1}^{3/2} + \sqrt{\frac{t_m-t_{m-1}}{N}}N^\zeta . \end{aligned}$$
(5.20)

Since \( h_{m-1}^{3/2} \leqslant \alpha h_{m-1}\) for \(h_0\) small enough, this leads to (recall that \(h_{m-1}=2h_m\))

$$\begin{aligned} \Vert \widetilde{\nu }^m_{t_m}\Vert _{-1} \leqslant 4C\alpha h_{m}+ \sqrt{\frac{t_m-t_{m-1}}{N}}N^\zeta . \end{aligned}$$
(5.21)

If \(m<m_f\), \(\sqrt{\frac{t_m-t_{m-1}}{N}}N^\zeta \!\!\!\!\leqslant \!\!\!\! C\alpha h_{m}\) and thus \(\Vert \widetilde{\nu }^m_{t_m}\Vert _{-1}\!\!\leqslant \!\! 5C\alpha h_{m}\). If \(m=m_f\), \(h_m\leqslant N^{2\zeta -1/2}\) and thus \(\Vert \widetilde{\nu }^m_{t_m}\Vert _{-1}\leqslant 5C\alpha N^{2\zeta -1/2}\). We now have a good control on \(\mu _{t_m}=q_{\widetilde{\psi }_{m-1}}+\widetilde{\nu }^m_{t_m}\), and project it with respect to \(\widetilde{\psi }_{m}\) (writing \(\mu _{t_m}=q_{\widetilde{\psi }_{m}}+\widetilde{\nu }^{m+1}_{t_m}\)) to get a bound for \(\Vert \widetilde{\nu }^{m+1}_{t_m}\Vert _{-1}\). We use the same decomposition as the proof of Proposition 3.3:

$$\begin{aligned} \widetilde{\nu }^{m+1}_{t_m}&= q_{\widetilde{\psi }_{m-1}}+\widetilde{\nu }^m_{t_m}- q_{\widetilde{\psi }_{m}} = P^\perp _{\widetilde{\psi }_{m}}[q_{\widetilde{\psi }_{m-1}}+ \widetilde{\nu }^m_{t_m}-q_{\widetilde{\psi }_{m}}] \nonumber \\&= \left( P^\perp _{\widetilde{\psi }_{m}}-P^\perp _{\widetilde{\psi }_{m-1}} \right) [q_{\widetilde{\psi }_{m-1}}+\widetilde{\nu }^m_{t_m}-q_{\widetilde{\psi }_{m}}] +P^\perp _{\widetilde{\psi }_{m-1}}[q_{\widetilde{\psi }_{m-1}}-q_{\widetilde{\psi }_{m}}]+ P^\perp _{\widetilde{\psi }_{m-1}}\widetilde{\nu }^m_{t_m}.\nonumber \\ \end{aligned}$$
(5.22)

Since the projection \(\mathtt {p}\) is smooth, we get the bound

$$\begin{aligned} |\widetilde{\psi }_m-\widetilde{\psi }_{m-1}| = |\mathtt {p}(\mu _{t_m})-\mathtt {p}(\mu _{t_{m-1}})| \leqslant C\Vert \mu _{t_m}-\mu _{t_{m-1}}\Vert _{-1} \leqslant C\Vert \widetilde{\nu }^m_{t_m}-\widetilde{\nu }^m_{t_{m-1}}\Vert _{-1}.\nonumber \\ \end{aligned}$$
(5.23)

But (5.21) implies in particular that

$$\begin{aligned} \Vert \widetilde{\nu }^m_{t_m}\Vert _{-1}\leqslant C(1+4\alpha )h_{m-1}, \end{aligned}$$
(5.24)

which implies, using also (5.23),

$$\begin{aligned} |\widetilde{\psi }_m-\widetilde{\psi }_{m-1}| \leqslant 2C(1+4\alpha )h_{m-1}. \end{aligned}$$
(5.25)

Using arguments similar to those in the proof of Proposition 3.3 (in particular the smoothness of the projection \(P^\perp _\psi \)), we see that the first two terms of the right-hand side in (5.22) are of order \(h_{m-1}^2\). More precisely, there exists a constant \(C'[\alpha ]\), depending on \(\alpha \) (increasing in \(\alpha \)), such that

$$\begin{aligned} \Vert \widetilde{\nu }^{m+1}_{t_m}\Vert _{-1}\leqslant C'[\alpha ]h_{m-1}^2+C\Vert \widetilde{\nu }^m_{t_m}\Vert _{-1}. \end{aligned}$$
(5.26)

So, since \(\Vert \widetilde{\nu }^m_{t_m}\Vert _{-1}\!\!\!\leqslant \!\!\! 5C\alpha h_{m}\) for \(m<m_f\) and \(\Vert \widetilde{\nu }^{m_f}_{t_{m_f}}\Vert _{-1}\!\!\!\leqslant \!\! 5C\alpha N^{2\zeta -1/2}\), if \(h_0\) and \(\alpha \) are small enough we get \(\Vert \widetilde{\nu }^{m+1}_{t_m}\Vert _{-1}\leqslant h_{m}\) for \(m<m_f\) and \(\Vert \widetilde{\nu }^{m_f+1}_{t_{m_f}}\Vert _{-1}\leqslant N^{2\zeta -1/2}\).

We have therefore shown that after a time of order \(\log N\), the empirical measure comes within distance \(N^{2\zeta -1/2}\) of \(q_{\widetilde{\psi }_{m_f}}\). This angle \(\widetilde{\psi }_{m_f}\) corresponds to the angle \(\Psi _N\) in Proposition 5.1. So it remains to prove that \(\widetilde{\psi }_{m_f}\) converges to \(\psi _0\) in probability as \(N\) goes to infinity. We decompose

$$\begin{aligned} |\widetilde{\psi }_{m_f}-\psi _0| \leqslant |\widetilde{\psi }_0-\psi _0|+ \sum _{m=1}^{m_f}|\widetilde{\psi }_{m}-\widetilde{\psi }_{m-1}|. \end{aligned}$$
(5.27)

We restrict our study to the event \(\Omega _N\cap \widetilde{W}_N\), whose probability tends to \(1\). Since \(\Vert \mu _{t_0}-q_{\psi _0}\Vert _{-1}\leqslant h\) and the projection \(\mathtt {p}\) is smooth, we get

$$\begin{aligned} |\widetilde{\psi }_0-\psi _0|\leqslant C h \end{aligned}$$
(5.28)

and (5.25) implies (recall \(h_0=h\) and \(h_{m-1}=2h_m\))

$$\begin{aligned} |\widetilde{\psi }_{m}-\widetilde{\psi }_{m-1}|\leqslant C 2^{1-m} h. \end{aligned}$$
(5.29)
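Summing the geometric series (a small bookkeeping step we add), (5.27)–(5.29) give

$$\begin{aligned} |\widetilde{\psi }_{m_f}-\psi _0| \leqslant C h+\sum _{m=1}^{m_f} C\,2^{1-m}h \leqslant C h+2C h = 3C h. \end{aligned}$$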

Consequently, for \(C\) large enough, \({\mathbb {P}} [|\widetilde{\psi }_{m_f}-\psi _0|>Ch]\rightarrow _{N\rightarrow \infty } 0\), which completes the proof of (5.1). The bound (5.2) is much rougher and it follows directly from the argument we have used for establishing (5.1). This completes the proof of Proposition 5.1. \(\square \)

Proof of Proposition 5.2

The crucial issues are the gradient flow structure of (1.4) and its dissipativity properties. The gradient structure of (1.4) [6] implies that the functional

$$\begin{aligned} {\mathcal F} (p):= \frac{1}{2} \int _{\mathbb {S}} p(\theta ) \log p(\theta ) \mathrm{d}\theta -\frac{K}{2} \int _ {\mathbb {S}} \int _{\mathbb {S}} p(\theta ) \cos ( \theta - \theta ') p(\theta ') \mathrm{d}\theta \mathrm{d}\theta ',\qquad \quad \end{aligned}$$
(5.30)

is non-increasing along the time evolution. The dissipativity properties proven in [21, Theorem 2.1] show that for every \(k \in {\mathbb {N}} \) we can find \(a>0\) and \(\widetilde{t}\) such that \(\Vert p_t \Vert _{C^k}< a\) for every \(t \ge \widetilde{t}\). Therefore for any \(k\) there exists \(\{t_n\}_{n=1,2, \ldots }\) such that \(t_{n+1}-t_n>1\) and \(\lim _n p_{t_n}\) exists in \(C^k\) and we call it \(p_\infty \). An immediate consequence is that \(\lim _n {\mathcal F} (p_{t_n}) ={\mathcal F} (p_{\infty })\). But we can go beyond by introducing the semigroup \(S_t\) associated to (1.4), by setting \(S_{t'}p_t=p_{t+t'}\). [21, Theorem 2.2] implies the continuity of this semigroup in \(C^k\), so that, since for \(t \in [0,1]\) we have \(t_n \le t_n + t < t_{n+1}\), we obtain \({\mathcal F} (S_t p_\infty )= {\mathcal F} (p_\infty )\). Therefore \(\partial _t{\mathcal F} (S_t p_\infty )=0\), but the condition \(\partial _t{\mathcal F} (p_t)=0\), for a solution of (1.4), directly implies that \(\partial ^2 _\theta p_t= 2 \partial _\theta (p_t J*p_t)\), which is the stationarity condition for (1.4). Therefore \(p_\infty \) is either \(q_\psi \), for some \(\psi \), or it coincides with \(\frac{1}{2\pi }\) [see (1.5)–(1.6)].

Let us point out that if \(p_{t_n}\) converges to \(\frac{1}{2\pi }\) then \(\{p_t\}_{t>0}\) itself converges to \(\frac{1}{2\pi }\). This is just because \({\mathcal F} \left( \frac{1}{2\pi }\right) > {\mathcal F} ( q_\psi )\), so that if \(\lim _n p_{t'_n}= q_\psi \) and \(\lim _n p_{t_n}=\frac{1}{2\pi }\) then it suffices to choose \(n\) such that \({\mathcal F} (p_{t'_n})< {\mathcal F} \left( \frac{1}{2\pi }\right) \) and \(m\) such that \(t_m>t'_n\) to get \({\mathcal F} (p_{t'_n})\ge {\mathcal F} (p_{t_m})\ge {\mathcal F} \left( \frac{1}{2\pi }\right) \), which is impossible.

So we have seen that either \(\lim _{t \rightarrow \infty } p_t = \frac{1}{2\pi }\) or all limit points are in \(M\). The stronger result we need is the convergence also when the limit point is not \(\frac{1}{2\pi }\). This result is provided by the nonlinear stability result [21, Theorem 4.6] which says that if \(p_0\) is in a neighborhood of \(M\) (the result is proven for a \({\mathbb {L}} ^2\) neighborhood, which is much more than what we need here), then there exists \(\psi \) such that \(\lim _{t \rightarrow \infty } p_t= q_\psi \) in \(C^k\).

To complete the proof we need to characterize the portion of \({\mathcal M} _1\) which is attracted by \(\frac{1}{2\pi }\), that is we need to identify the stable manifold of the unstable point with the set \(U\) in (1.7). But this is the content of [21, Proposition 4.4]. \(\square \)

6 Proof of Theorem 1.1

The proof of Theorem 1.1 relies on the results of the previous sections and on a convergence argument for the process in the tangent space that we give here.

Proof of Theorem 1.1

First of all Proposition 5.1 takes care of the evolution up to time \(N \widetilde{\varepsilon }_N= C\log N\) and provides an estimate on the closeness of the empirical measure to the manifold \(M\) that allows us to apply directly Proposition 3.3 and then Proposition 4.1. Note that the iterative scheme that we have set up in Sect. 2.4 has been presented without requiring \(\psi _0\) to be non-random or independent of \(N\). In fact we start the iterative scheme at time \(N \widetilde{\varepsilon }_N\) and from the random phase \(\Psi _N\) of Proposition 5.1 that converges in probability to the (non random) value \(\psi _0\). Of course there is here an abuse of notation in the use of \(\psi _0\), but notice that, by the rotation invariance of the system, we can consider without loss of generality that the empirical measure \(\mu _{N, C \log N}\) has precisely the phase \(\psi _0\). Moreover we make a time shift of \(N\widetilde{\varepsilon }_N\), so that the phase is \(\psi _0\) at time \(T_0=0\). The result in Theorem 1.1 is given for times starting from \(N\varepsilon _N\) and not \(N\widetilde{\varepsilon }_N\), but as stated in Proposition 5.1, the empirical measure stays close to \(q_{\psi _0}\) in the time interval \([N\varepsilon _N,N\widetilde{\varepsilon }_N]\). Therefore we have the finite sequence of times \(T_0, T_1, \ldots , T_n\), with the corresponding phases \(\psi _0, \psi _1, \ldots , \psi _n\), and we define \(\psi _t\) for every \(t\in [0, T_n]\) by linear interpolation. We assume \(T_n> \tau _f N\).

We then note that, in view of (3.43), the control on the phases at the times \(T_1, T_2, \ldots \) of our iteration scheme (see Proposition 4.1) suffices to control the distance, in the \(H_{-1}\) norm, between the empirical measure \(\mu _{N, t}\) and \(q_{\psi _{t}}\) not only for \(t= T_k\), but for every \(t\in [0, T_n]\). We are now ready to identify the process \(W_{N, \cdot }\) of Theorem 1.1:

$$\begin{aligned} W_{N, \tau } := \frac{\psi _{\tau N} -\psi _0}{D_K}, \end{aligned}$$
(6.1)

where we recall that \(\tau \in [0, T_n/N]\). We are therefore left with showing that \(W_{N, \cdot }\) converges to standard Brownian motion.

In proving the convergence to Brownian motion we apply Proposition 4.1 and replace the process \(\psi _\cdot \) with the cadlag process \(\psi _0+ M_{N , \cdot } \in D([0, T_n /N]; {\mathbb {R}} )\) defined by

$$\begin{aligned} M_{N, \tau } := \sum _{k \in {\mathbb {N}} : T_k \le N \tau }\Delta M _{N, k}, \end{aligned}$$
(6.2)

and

$$\begin{aligned} \Delta M _{N, k} := \frac{( Z^{k}_{T_k}, q'_{\psi _{k-1}})_{-1,1/q_{\psi _{k-1}}}}{( q',q')_{-1,1/q}}. \end{aligned}$$
(6.3)

It is straightforward to see that \(M_{N ,\cdot }\) is a martingale with respect to the filtration \(\widetilde{{\mathcal F} }_\tau := {\mathcal F} _{\lfloor \tau T\rfloor / T}\), where \({\mathcal F} _\cdot \) is the natural filtration of \(\{W^j_{N \cdot }\}_{j=1, \ldots , N}\): the martingale is actually in \(L^p\), for every \(p\), as the moment estimates in Sect. 3 show. We can now apply the Martingale Invariance Principle for continuous time martingales, in the form given by [23, Corollary 3.24, Ch. VIII], to \(M_{N , \cdot }\): the hypotheses to verify in the case of piecewise constant cadlag martingales boil down to the variance convergence condition that for every \(\tau \in [0, \tau _f]\)

$$\begin{aligned} \lim _{N \rightarrow \infty } \sum _{k \in {\mathbb {N}} : T_k \le \tau N} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^2 \Big \vert {\mathcal F} _{T_{k-1}} \right] = \tau D_K^2, \end{aligned}$$
(6.4)

in probability, and the Lindeberg condition that for every \(\varepsilon >0\) in probability we have

$$\begin{aligned} \lim _{N \rightarrow \infty } \sum _{k \in {\mathbb {N}} : T_k \le \tau N} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^2; \Delta M_{N, T_k}^2> \varepsilon \Big \vert {\mathcal F} _{T_{k-1}} \right] = 0. \end{aligned}$$
(6.5)

For what concerns (6.4) we have

$$\begin{aligned} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^2 \Big \vert {\mathcal F} _{T_{k-1}} \right] = \frac{1}{N\Vert q' \Vert _{-1,1/q}^2} \int _{T_{k-1}}^{T_k} \int _{\mathbb {S}} \left( f'_{\psi _{k-1},0}(\theta )\right) ^2 \mu _{N, s}(\mathrm{d}\theta ) \mathrm{d}s.\qquad \quad \end{aligned}$$
(6.6)

Now take the sum over \(k\) and use the uniform estimate (3.43) of Proposition 3.3 to replace the empirical measure with \(q_{\psi _{T_{k-1}}}(\theta )\mathrm{d}\theta \). Since a direct computation shows that \( \int _{\mathbb {S}} (f'_{\psi , 0}(\theta ))^2 q_\psi (\theta ) \mathrm{d}\theta =1\), (6.4) follows.
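Spelled out (a bookkeeping step we add): once \(\mu _{N,s}(\mathrm{d}\theta )\) is replaced by \(q_{\psi _{T_{k-1}}}(\theta )\mathrm{d}\theta \), each summand in (6.6) equals \(T/(N\Vert q' \Vert _{-1,1/q}^2)\) and there are about \(\tau N/T\) of them, so

$$\begin{aligned} \sum _{k \in {\mathbb {N}} : T_k \le \tau N} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^2 \Big \vert {\mathcal F} _{T_{k-1}} \right] \approx \frac{\tau N}{T}\,\frac{T}{N\Vert q' \Vert _{-1,1/q}^2} = \frac{\tau }{\Vert q' \Vert _{-1,1/q}^2}=\tau D_K^2, \end{aligned}$$

up to errors that vanish in probability by (3.43).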

For what concerns (6.5) we remark that, by the Markov inequality, it suffices to show that

$$\begin{aligned} \lim _{N \rightarrow \infty } \sum _{k \in {\mathbb {N}} :T_k \le \tau N} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^4 \Big \vert {\mathcal F} _{T_{k-1}} \right] = 0. \end{aligned}$$
(6.7)

Actually one can show that there exists a non random constant \(C\) such that almost surely

$$\begin{aligned} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^4 \Big \vert {\mathcal F} _{T_{k-1}} \right] \le C \left( \frac{T}{N} \right) ^2. \end{aligned}$$
(6.8)

This is an immediate consequence of (3.32), but of course, since we are projecting on \(q'\) and since we are just considering the fourth moment, a similar estimate can be easily obtained explicitly by proceeding as for (6.4) and by using the fact that \(\Vert f'_{\psi , 0}\Vert _\infty = \Vert f'_{0}\Vert _\infty < \infty \). Of course (6.7) follows from (6.8).
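Explicitly (our bookkeeping): summing (6.8) over the at most \(\tau N/T\) relevant values of \(k\) gives

$$\begin{aligned} \sum _{k \in {\mathbb {N}} :T_k \le \tau N} {\mathbb {E}} \left[ \left( \Delta M_{N, T_k}\right) ^4 \Big \vert {\mathcal F} _{T_{k-1}} \right] \leqslant \frac{\tau N}{T}\, C\left( \frac{T}{N}\right) ^2 = \frac{C\tau T}{N}, \end{aligned}$$

which vanishes as \(N\rightarrow \infty \), since \(\lim _N T N^{2\zeta -\frac{1}{2}}=0\) (Remark 2.6) implies in particular \(T/N\rightarrow 0\).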

Therefore \(M_{N, \cdot }\in D([0, \tau _f]; {\mathbb {R}} )\) converges in law to \(W_\cdot / \Vert q' \Vert _{-1,1/q}\), where \(W_\cdot \) is a standard Brownian motion. This is almost the result we want (recall that \(D_K=1/ \Vert q' \Vert _{-1,1/q}\)), since \(M_{N,\cdot }/D_K\) differs from \(W_{N, \cdot }\) just in the fact that they interpolate in a different way between the times \(T_k\) (where they coincide) and that in the case of \(W_{N, \cdot }\) the convergence is in \(C^0([0, \tau _f]; {\mathbb {R}} )\). But (6.8) guarantees that the sum of the fourth power of the jumps of \(M_{N, \cdot }\) adds up to \(O(T^2/N)=o(1)\) in probability, so the supremum of the jumps is \(o(1)\), and therefore the convergence for \(M_{N, \cdot }\in D([0, \tau _f]; {\mathbb {R}} )\) implies the convergence of \(W_{N, \cdot }\in C^0([0, \tau _f]; {\mathbb {R}} )\). The proof of Theorem 1.1 is therefore complete. \(\square \)