Abstract
We study the Dyson–Ornstein–Uhlenbeck diffusion process, an evolving gas of interacting particles. Its invariant law is the beta Hermite ensemble of random matrix theory, a non-product log-concave distribution. We explore the convergence to equilibrium of this process for various distances or divergences, including total variation, relative entropy, and transportation cost. When the number of particles is sent to infinity, we show that a cutoff phenomenon occurs: the distance to equilibrium vanishes abruptly at a critical time. A remarkable feature is that this critical time is independent of the parameter beta that controls the strength of the interaction; in particular, the result is identical in the non-interacting case, which is nothing but the Ornstein–Uhlenbeck process. We also provide a complete analysis of the non-interacting case that reveals some new phenomena. Our work relies, among other ingredients, on convexity and functional inequalities, exact solvability, exact Gaussian formulas, coupling arguments, stochastic calculus, variational formulas, and contraction properties. This work leads, beyond the specific process that we study, to questions on the high-dimensional analysis of heat kernels of curved diffusions.
1 Introduction and main results
Let us consider a Markov process \(X={(X_t)}_{t\ge 0}\) with state space S and invariant law \(\mu \) for which
$$\begin{aligned} \lim _{t\rightarrow \infty }\mathrm {dist}(\mathrm {Law}(X_t)\mid \mu )=0, \end{aligned}$$
where \(\mathrm {dist}(\cdot \mid \cdot )\) is a distance or divergence on the probability measures on S. Suppose now that \(X=X^n\) depends on a dimension, size, or complexity parameter n, and let us set \(S=S^n\), \(\mu =\mu ^n\), and \(X_0=x^n_0\in S^n\). For example \(X^n\) can be a random walk on the symmetric group of permutations of \(\{1,\ldots ,n\}\), Brownian motion on the group of \(n\times n\) unitary matrices, Brownian motion on the n-dimensional sphere, etc. In many of such examples, it has been proved that when n is large enough, the supremum over some set of initial conditions \(x^n_0\) of the quantity \(\mathrm {dist}(\mathrm {Law}(X_t^n)\mid \mu ^n)\) collapses abruptly to 0 when t passes a critical value \(c=c_n\) which may depend on n. This is often referred to as a cutoff phenomenon. More precisely, if \(\mathrm {dist}\) ranges from 0 to \(\max \), then, for some subset \(S^n_0 \subset S^n\) of initial conditions, some critical value \(c=c_n\) and for all \(\varepsilon \in (0,1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{x^n_0\in S^n_0}\mathrm {dist}(\mathrm {Law}(X^n_{t_n})\mid \mu ^n)= {\left\{ \begin{array}{ll} \max &{}\text {if }t_n=(1-\varepsilon )c_n,\\ 0&{}\text {if }t_n=(1+\varepsilon )c_n. \end{array}\right. } \end{aligned}$$
It is standard to introduce, for an arbitrarily small threshold \(\eta >0\), the quantity \(\inf \{t\ge 0:\sup _{x_0\in S_0^n}\mathrm {dist}(\mathrm {Law}(X^n_t)\mid \mu ^n)\le \eta \}\), known as the mixing time in the literature. Of course, such a definition fully makes sense as soon as \(t\mapsto {\sup _{x_0\in S_0^n}}\mathrm {dist}(\mathrm {Law}(X^n_t)\mid \mu ^n)\) is non-increasing.
When \(S^n\) is finite, it is customary to take \(S^n_0 = S^n\). When \(S^n\) is infinite, it may happen that the supremum over the whole set \(S^n\) of the distance to equilibrium remains equal to \(\max \) at all times, in which case one has to consider strict subspaces of initial conditions. For some processes, it is possible to restrict \(S^n_0\) to a single state in which case one obtains a very precise description of the convergence to equilibrium starting from this initial condition. Note that the constraint over the initial condition can be made compatible with a limiting dynamics, for instance a mean-field limit when the process describes an exchangeable interacting particle system.
The cutoff phenomenon was put forward by Aldous and Diaconis, originally for random walks on finite sets, see for instance [1, 26, 28, 52] and references therein. The analysis of the cutoff phenomenon is the subject of an important activity, still in search of a complete theory: let us mention that, for the total variation distance, Peres proposed the so-called product condition (the mixing time must be much larger than the inverse of the spectral gap) as a necessary and sufficient condition for a cutoff phenomenon to hold, but counter-examples were exhibited [52, Sec. 18.3] and the product condition is only necessary.
The study of the cutoff phenomenon for Markov diffusion processes goes back at least to the works of Saloff-Coste [62, 64] in relation notably with Nash–Sobolev type functional inequalities, heat kernel analysis, and Diaconis–Wilson probabilistic techniques. We also refer to the more recent work [55] for the case of diffusion processes on compact groups and symmetric spaces, in relation with group invariance and representation theory, a point of view inspired by the early works of Diaconis on Markov chains and of Saloff-Coste on diffusion processes. Even if most of the available results in the literature on the cutoff phenomenon are related to compact state spaces, there are some notable works devoted to non-compact spaces such as [8,9,10,11, 19, 47].
Our contribution is an exploration of the cutoff phenomenon for the Dyson–Ornstein–Uhlenbeck diffusion process, for which the state space is \({\mathbb {R}}^n\). This process is an interacting particle system. When the interaction is turned off, we recover the Ornstein–Uhlenbeck process, a special case that has been considered previously in the literature but for which we also provide new results.
1.1 Distances
As for \(\mathrm {dist}\) we use several standard distances or divergences between probability measures: total variation (denoted TV), Hellinger, relative entropy (denoted Kullback), relative variance (denoted \(\chi ^2\)), Wasserstein of order 2, and Fisher information, surveyed in Appendix A. We take the following convention for probability measures \(\mu \) and \(\nu \) on the same space:
see Appendix A for precise definitions. The maximal value \(\max \) taken by \(\mathrm {dist}\) is given by
1.2 The Dyson–Ornstein–Uhlenbeck (DOU) process and preview of main results
The DOU process is the solution \(X^n={(X^n_t)}_{t\ge 0}\) on \({\mathbb {R}}^n\) of the stochastic differential equation
$$\begin{aligned} \mathrm {d}X^{n,i}_t=\sqrt{\frac{2}{n}}\,\mathrm {d}B^i_t-V'(X^{n,i}_t)\,\mathrm {d}t+\frac{\beta }{n}\sum _{j\ne i}\frac{\mathrm {d}t}{X^{n,i}_t-X^{n,j}_t},\quad 1\le i\le n, \end{aligned}$$(1.3)
where \({(B_t)}_{t\ge 0}\) is a standard n-dimensional Brownian motion (BM), and where
-
\(V(x)=\frac{x^2}{2}\) is a “confinement potential” acting through the drift \(-V'(x)=-x\),
-
\(\beta \ge 0\) is a parameter tuning the interaction strength.
The notation \(X^{n,i}_t\) stands for the i-th coordinate of the vector \(X^n_t\). The process \(X^n\) can be thought of as an interacting particle system of n one-dimensional Brownian particles \(X^{n,1},\ldots ,X^{n,n}\), subject to confinement and singular pairwise repulsion when \(\beta >0\) (respectively first and second term in the drift). We take an inverse temperature of order n in (1.3) in order to obtain a mean-field limit without time-changing the process, see Sect. 2.5. The spectral gap is 1 for all \(n\ge 1\), see Sect. 2.6. We refer to Sect. 2.9 for other parametrizations or choices of inverse temperature.
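For intuition, the dynamics (1.3) is straightforward to simulate. The following Euler–Maruyama sketch (a rough illustration with hypothetical discretization parameters, not used anywhere in the proofs) runs five ordered particles with \(\beta =2\) and checks that the pairwise repulsion keeps them ordered:

```python
import numpy as np

def dou_step(x, beta, dt, rng):
    """One Euler-Maruyama step of (1.3) with V(x) = x^2/2, i.e. -V'(x) = -x."""
    n = len(x)
    diff = x[:, None] - x[None, :]          # pairwise differences x_i - x_j
    np.fill_diagonal(diff, np.inf)          # drop the j = i term: 1/inf = 0
    drift = -x + (beta / n) * (1.0 / diff).sum(axis=1)
    return x + drift * dt + np.sqrt(2.0 * dt / n) * rng.standard_normal(n)

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 5)               # ordered initial condition in D_n
for _ in range(5000):                       # time horizon T = 0.5 with dt = 1e-4
    x = dou_step(x, beta=2.0, dt=1e-4, rng=rng)
assert np.all(np.isfinite(x))
assert np.all(np.diff(x) > 0)               # particles never crossed numerically
```

With \(\beta \ge 1\) the singular drift makes naive discretizations delicate near collisions, which is consistent with the care taken below about the domain \(D_n\).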
In the special cases \(\beta \in \{0,1,2\}\), the cutoff phenomenon for the DOU process can be established by using Gaussian analysis and stochastic calculus, see Sects. 1.4 and 1.5. For \(\beta = 0\), the process reduces to the Ornstein–Uhlenbeck process (OU) and its behavior serves as a benchmark for the interaction case \(\beta \ne 0\), while when \(\beta \in \{1,2\}\), the approach involves a lift to unitary invariant ensembles of random matrix theory. For a general \(\beta \ge 1\), our main results regarding the cutoff phenomenon for the DOU process are given in Sects. 1.6 and 1.7. We are able, in particular, to prove the following: for all \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Wasserstein}\}\), \(a>0\), \(\varepsilon \in (0,1)\), we have
where \(P_n^\beta \) is the invariant law of the process, and where
This result is stated in a slightly more general form in Corollary 1.8. Our proof relies crucially on an exceptional exact solvability of the dynamics, notably the fact that we know explicitly the optimal long time behavior in entropy and coupling distance, as well as the eigenfunction associated to the spectral gap, which turns out to be linear and optimal. This comes from the special choice of V as well as the special properties of the Coulomb interaction. We stress that such an exact solvability is no longer available for a general strongly convex V, even for instance in the simple example \(V(x)=\frac{x^2}{2}+x^4\) or for general linear forces. Nevertheless, and as usual, two other special classical choices of V could be explored, related to Laguerre and Jacobi weights, see Sect. 2.8.
1.3 Analysis of the Dyson–Ornstein–Uhlenbeck process
The process \(X^n\) was essentially discovered by Dyson in [33], in the case \(\beta \in \{1,2,4\}\), because it describes the dynamics of the eigenvalues of \(n\times n\) symmetric/Hermitian/symplectic random matrices with independent Ornstein–Uhlenbeck entries, see Lemma 5.1 and Lemma 5.2 below for the cases \(\beta =1\) and \(\beta =2\) respectively.
-
Case \(\beta =0\) (interaction turned off). The particles become n independent one-dimensional Ornstein–Uhlenbeck processes, and the DOU process \(X^n\) becomes exactly the n-dimensional Ornstein–Uhlenbeck process \(Z^n\) solving (1.8). The process lives in \({\mathbb {R}}^n\). The particles collide but since they do not interact, this does not raise any issue.
-
Case \(0<\beta <1\). Then with positive probability the particles collide producing a blow up of the drift, see for instance [22, 25] for a discussion. Nevertheless, it is possible to define the process for all times, for instance by adding a local time term to the stochastic differential equation, see [25] and references therein. It is natural to expect that the cutoff universality works as for \(\beta \not \in (0,1)\), but for simplicity we do not consider this case here.
-
Case \(\beta \ge 1\). If we order the coordinates by defining the convex domain
$$\begin{aligned} D_n=\{x\in {\mathbb {R}}^n:x_1<\ldots <x_n\}, \end{aligned}$$and if \(x^n_0\in D_n\) then the Eq. (1.3) admits a unique strong solution that never exits \(D_n\), in other words the particles never collide and the order of the initial particles is preserved at all times, see [60]. Moreover if
$$\begin{aligned} {\overline{D}}_n=\{x\in {\mathbb {R}}^n:x_1\le \ldots \le x_n\} \end{aligned}$$then it is possible to start the process from the boundary \({\overline{D}}_n\setminus D_n\), in particular from \(x^n_0\) such that \(x^{n,1}_0=\ldots =x^{n,n}_0\), and despite the singularity of the drift, it can be shown that with probability one, \(X^n_t\in D_n\) for all \(t>0\). We refer to [2, Th. 4.3.2] for a proof in the Dyson Brownian Motion case that can be adapted mutatis mutandis. In the sequel, we will only consider the cases \(\beta =0\) with \(x_0^n\in {\mathbb {R}}^n\) and \(\beta \ge 1\) with \(x^n_0\in {\overline{D}}_n\).
The drift in (1.3) is the gradient of a function, and (1.3) rewrites
$$\begin{aligned} \mathrm {d}X^n_t=\sqrt{\frac{2}{n}}\,\mathrm {d}B_t-\frac{1}{n}\nabla E(X^n_t)\,\mathrm {d}t, \end{aligned}$$
where
$$\begin{aligned} E(x_1,\ldots ,x_n)=n\sum _{i=1}^nV(x_i)+\beta \sum _{i<j}\log \frac{1}{|x_i-x_j|} \end{aligned}$$
can be interpreted as the energy of the configuration of particles \(x_1,\ldots ,x_n\).
-
If \(\beta =0\), then the Markov process \(X^n\) is an Ornstein–Uhlenbeck process, irreducible with unique invariant law \(P_n^0={\mathcal {N}}(0,\frac{1}{n}I_n)\) which is reversible.
-
If \(\beta \ge 1\), then the Markov process \(X^n\) is not irreducible, but \(D_n\) is a recurrent class carrying a unique invariant law \(P_n^\beta \), which is reversible and given by
$$\begin{aligned} P_n^\beta =\frac{\mathrm {e}^{-E(x_1,\ldots ,x_n)}}{C_n^\beta }{\mathbf {1}}_{(x_1,\ldots ,x_n)\in {\overline{D}}_n} \mathrm {d}x_1\ldots \mathrm {d}x_n, \end{aligned}$$(1.6)where \(C_n^\beta \) is the normalizing factor given by
$$\begin{aligned} C_n^\beta =\int _{{\overline{D}}_n} \mathrm {e}^{-E(x_1,\ldots ,x_n)}\mathrm {d}x_1\ldots \mathrm {d}x_n. \end{aligned}$$(1.7)
In terms of geometry, it is crucial to observe that since \(-\log \) is convex on \((0,+\infty )\), the map
$$\begin{aligned} (x,y)\mapsto \log \frac{1}{y-x},\quad x<y, \end{aligned}$$
is convex. Thus, since V is convex on \({\mathbb {R}}\), it follows that E is convex on \(D_n\). For all \(\beta \ge 0\), the law \(P_n^\beta \) is log-concave with respect to the Lebesgue measure as well as with respect to \({\mathcal {N}}(0,\frac{1}{n}I_n)\).
1.4 Non-interacting case and Ornstein–Uhlenbeck benchmark
When we turn off the interaction by taking \(\beta =0\) in (1.3), the DOU process becomes an Ornstein–Uhlenbeck process (OU) \(Z^n={(Z^n_t)}_{t\ge 0}\) on \({\mathbb {R}}^n\) solving the stochastic differential equation
$$\begin{aligned} \mathrm {d}Z^n_t=\theta \,\mathrm {d}B^n_t-\rho Z^n_t\,\mathrm {d}t,\qquad \text {here with }\theta =\sqrt{2/n}\text { and }\rho =1, \end{aligned}$$(1.8)
where \(B^n\) is a standard n-dimensional BM. The invariant law of \(Z^n\) is the product Gaussian law \(P_n^0={\mathcal {N}}(0,\frac{1}{n}I_n)={\mathcal {N}}(0,\frac{1}{n})^{\otimes n}\). The explicit Gaussian nature of \(Z_t^n\sim {\mathcal {N}}(z_0^n\mathrm {e}^{-t},\frac{1-\mathrm {e}^{-2t}}{n}I_n)\), valid for all \(t\ge 0\), allows for a fine analysis of convergence to equilibrium, as in the following theorem.
Theorem 1.1
(Cutoff for OU: mean-field regime) Let \(Z^n={(Z^n_t)}_{t\ge 0}\) be the OU process (1.8) and let \(P_n^0\) be its invariant law. Suppose that
where \(|z|=\sqrt{z_1^2+\ldots +z_n^2}\) is the Euclidean norm. Then for all \(\varepsilon \in (0,1)\),
where
Theorem 1.1 is proved in Sect. 3. See Figs. 1 and 2 for a numerical experiment.
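Because \(Z_t^n\) and \(P_n^0\) are isotropic product Gaussians, the distance to equilibrium can be evaluated in closed form coordinatewise. The following sketch does this for the Hellinger distance (assumptions made here, not taken from the statement: the convention \(\mathrm {Hellinger}^2=2(1-\mathrm {affinity})\) with maximal value \(\sqrt{2}\), and the initial condition \(z_0^n=(1,\ldots ,1)\), for which the mean term forces the critical time \(\log n\)):

```python
import numpy as np

def hellinger_ou(n, t, z0_norm2):
    """Hellinger distance between Law(Z_t) = N(z0 e^{-t}, (1-e^{-2t})/n I_n)
    and P_n^0 = N(0, I_n/n), via the product Bhattacharyya affinity,
    computed in log scale to avoid underflow."""
    v1 = (1.0 - np.exp(-2.0 * t)) / n      # coordinate variance of Z_t
    v2 = 1.0 / n                           # equilibrium coordinate variance
    log_aff = (n / 2.0) * np.log(2.0 * np.sqrt(v1 * v2) / (v1 + v2)) \
              - z0_norm2 * np.exp(-2.0 * t) / (4.0 * (v1 + v2))
    return np.sqrt(2.0 * (1.0 - np.exp(log_aff)))

n, eps = 10**4, 0.3
c_n = np.log(n)                            # critical time when |z0|^2 = n
before = hellinger_ou(n, (1 - eps) * c_n, z0_norm2=n)
after = hellinger_ou(n, (1 + eps) * c_n, z0_norm2=n)
assert before > 1.4                        # close to the maximal value sqrt(2)
assert after < 0.1                         # essentially at equilibrium
```

Increasing n sharpens the collapse around \(\log n\), which is the abrupt transition pictured in the numerical experiment.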
Theorem 1.1 constitutes a very natural benchmark for the cutoff phenomenon for the DOU process. Theorem 1.1 is not a surprise, and actually the TV and Hellinger cases are already considered in [47], see also [7]. Let us mention that in [9], a cutoff phenomenon for TV, entropy and Wasserstein is proven for the OU process of fixed dimension d and vanishing noise. This is to be compared with our setting where the dimension is sent to infinity: the results (and their proofs) are essentially the same in these two situations; however, we will see below that if one considers more general initial conditions, there are some substantial differences according to whether the dimension is fixed or sent to infinity.
The restriction over the initial condition in Theorem 1.1 is spelled out in terms of the second moment of the empirical distribution, a natural choice suggested by the mean-field limit discussed in Sect. 2.5. It yields a mixing time of order \(\log (n)\), just like for Brownian motion on compact Lie groups, see [55, 64]. For the OU process and more generally for overdamped Langevin processes, the non-compactness of the space is replaced by the confinement or tightness due to the drift.
Actually, Theorem 1.1 is a particular instance of the following, much more general result that reveals that, except for the Wasserstein distance, a cutoff phenomenon always occurs.
Theorem 1.2
(General cutoff for OU). Let \(Z^n={(Z^n_t)}_{t\ge 0}\) be the OU process (1.8) and let \(P_n^0\) be its invariant law. Let \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Fisher}\}\). Then, for all \(\varepsilon \in (0,1)\),
where
Regarding the Wasserstein distance, the following dichotomy occurs:
-
if \(\lim _{n\rightarrow \infty }|z_0^n|=+\infty \), then for all \(\varepsilon \in (0,1)\), with \(c_n=\log |z_0^n|\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {Wasserstein}(\mathrm {Law}(Z_{t_n}),P_n^0) = {\left\{ \begin{array}{ll} +\infty &{} \text {if }t_n=(1-\varepsilon )c_n,\\ 0 &{} \text {if }t_n=(1+\varepsilon )c_n, \end{array}\right. } \end{aligned}$$ -
if \(\lim _{n\rightarrow \infty }|z_0^n|=\alpha \in [0,\infty )\) then there is no cutoff phenomenon, namely for any \(t>0\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {Wasserstein}^2(\mathrm {Law}(Z_{t}),P_n^0) = \alpha ^2\mathrm {e}^{-2t} +2\Bigl (1-\sqrt{1-\mathrm {e}^{-2t}} - \tfrac{1}{2}\mathrm {e}^{-2t}\Bigr ). \end{aligned}$$
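The second branch can be cross-checked against the standard closed-form Wasserstein distance between Gaussian laws, \(\mathrm {Wasserstein}^2({\mathcal {N}}(m,v_1I_n),{\mathcal {N}}(0,v_2I_n))=|m|^2+n(\sqrt{v_1}-\sqrt{v_2})^2\); for the OU marginals the identity below is in fact exact for every n, not only in the limit (a sketch):

```python
import math

def w2_squared_gaussians(mean_diff2, v1, v2, n):
    """W_2^2 between N(m, v1 I_n) and N(0, v2 I_n), isotropic Gaussian case:
    squared distance of means plus n (sqrt(v1) - sqrt(v2))^2."""
    return mean_diff2 + n * (math.sqrt(v1) - math.sqrt(v2)) ** 2

for n in (10, 1000):
    for t in (0.1, 1.0, 5.0):
        alpha2 = 3.0                       # |z0|^2, kept fixed as n grows
        lhs = w2_squared_gaussians(alpha2 * math.exp(-2 * t),
                                   (1 - math.exp(-2 * t)) / n, 1.0 / n, n)
        rhs = alpha2 * math.exp(-2 * t) + 2 * (1 - math.sqrt(1 - math.exp(-2 * t))
                                               - 0.5 * math.exp(-2 * t))
        assert abs(lhs - rhs) < 1e-12      # n-independent, hence the flat limit
```

The n-dependence cancels because both variances scale like 1/n, which is exactly why no cutoff occurs along this branch.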
Theorem 1.2 is proved in Sect. 3.
The observation that, for every distance or divergence except the Wasserstein distance, a cutoff phenomenon generically occurs seems to be new.
Let us make a few comments. First, in terms of convergence to equilibrium the relevant observable in Theorem 1.2 appears to be the Euclidean norm \(|z_0^n|\) of the initial condition. This quantity differs from the eigenfunction associated to the spectral gap of the generator, which is given by \(z_1+\ldots +z_n\) as we will recall later on. This is also related to the equality of (2.3) and (3.4). Second, cutoff occurs at a time that is independent of the initial condition provided that its Euclidean norm is small enough: this cutoff time appears as the time required to regularize the initial condition (a Dirac mass) into a sufficiently spread out absolutely continuous probability measure; in particular this cutoff phenomenon would not hold generically if we allowed for spread out (non-Dirac) initial conditions. Note that, for the OU process of fixed dimension and vanishing noise, we would not observe a cutoff phenomenon when starting from initial conditions with small enough Euclidean norm: this is a high-dimensional phenomenon. In this respect, the Wasserstein distance is peculiar since it is much less stringent on the local behavior of the measures at stake: for instance \(\lim _{n\rightarrow \infty }\mathrm {Wasserstein}(\delta _0,\delta _{1/n})=0\) while for all other distances or divergences considered here, the corresponding quantity would remain equal to \(\max \). This explains the absence of a generic cutoff phenomenon for Wasserstein. Third, the explicit expressions provided in our proof allow us to extract the cutoff profile in each case, but we prefer not to provide them in our statement and refer the interested reader to the end of Sect. 3.
1.5 Exactly solvable intermezzo
When \(\beta \ne 0\), the law of the DOU process is no longer Gaussian nor explicit. However several exactly solvable aspects are available. Let us recall that a Cox–Ingersoll–Ross process (CIR) of parameters \(a,b,\sigma \) is the solution \(R = (R_t)_{t\ge 0}\) on \({\mathbb {R}}_+\) of
$$\begin{aligned} \mathrm {d}R_t=(a-bR_t)\,\mathrm {d}t+\sigma \sqrt{R_t}\,\mathrm {d}W_t, \end{aligned}$$(1.9)
where W is a standard BM. Its invariant law is \(\mathrm {Gamma}(2a/\sigma ^2,2b/\sigma ^2)\) with density proportional to \(r\ge 0 \mapsto r^{2a/\sigma ^2-1}\mathrm {e}^{-2br/\sigma ^2}\), with mean a/b, and variance \(a\sigma ^2/(2b^2)\). It was proved by William Feller in [38] that the density of \(R_t\) at an arbitrary t can be expressed in terms of special functions.
If \({(Z_t)}_{t\ge 0}\) is a d-dimensional OU process of parameters \(\theta \ge 0\) and \(\rho \in {\mathbb {R}}\), weak solution of
$$\begin{aligned} \mathrm {d}Z_t=\theta \,\mathrm {d}W_t-\rho Z_t\,\mathrm {d}t, \end{aligned}$$
where W is a d-dimensional BM, then \(R={(R_t)}_{t\ge 0}\), \(R_t:=|Z_t|^2\), is a CIR process with parameters \(a=\theta ^2d\), \(b=2\rho \), \(\sigma = 2\theta \). When \(\rho =0\) then Z is a BM while \(R=|Z|^2\) is a squared Bessel process.
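The OU-to-CIR correspondence can be recovered by a one-line Itô computation (a sketch, assuming the parametrized OU equation reads \(\mathrm {d}Z_t=\theta \,\mathrm {d}W_t-\rho Z_t\,\mathrm {d}t\); \(\widetilde{W}\) is the scalar BM provided by Lévy's characterization):

```latex
\mathrm{d}|Z_t|^2
  = 2 Z_t \cdot \mathrm{d}Z_t + \theta^2 d \,\mathrm{d}t
  = \bigl(\theta^2 d - 2\rho |Z_t|^2\bigr)\,\mathrm{d}t
    + 2\theta\, |Z_t| \,\mathrm{d}\widetilde{W}_t,
\qquad
\mathrm{d}\widetilde{W}_t := \frac{Z_t}{|Z_t|}\cdot \mathrm{d}W_t,
```

so that \(R=|Z|^2\) is a CIR process with \(a=\theta ^2d\), \(b=2\rho \), \(\sigma =2\theta \), matching the parameters stated above.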
The following theorem gathers some exactly solvable aspects of the DOU process for general \(\beta \ge 1\), which are largely already in the statistical physics folklore, see [58]. It is based on our knowledge of eigenfunctions associated to the first spectral values of the dynamics, see (2.6), and their remarkable properties. As in (2.6), we set \(\pi (x):=x_1+\ldots +x_n\) when \(x\in {\mathbb {R}}^n\).
Theorem 1.3
(From DOU to OU and CIR). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3), with \(\beta =0\) or \(\beta \ge 1\), and let \(P_n^\beta \) be its invariant law. Then:
-
\({(\pi (X^n_t))}_{t\ge 0}\) is a one-dimensional OU process weak solution of (1.8) with \(\theta =\sqrt{2}\), \(\rho =1\). Its invariant law is \({\mathcal {N}}(0,1)\). It does not depend on \(\beta \), and \(\pi (X^n_t)\sim {\mathcal {N}}(\pi (x^n_0)\mathrm {e}^{-t},1-\mathrm {e}^{-2t})\), \(t\ge 0\). Furthermore \(\pi (X^n_t)^2\) is a CIR process of parameters \(a=2\), \(b=2\), \(\sigma = 2\sqrt{2}\).
-
\({(|X^n_t|^2)}_{t\ge 0}\) is a CIR process, weak solution of (1.9) with \(a=2+\beta (n-1)\), \(b=2\), \(\sigma = \sqrt{8/n}\). Its invariant law is \(\mathrm {Gamma}(\frac{1}{2}(n+\beta \frac{n(n-1)}{2}),\frac{n}{2})\) of mean \(1+\frac{\beta }{2}(n-1)\) and variance \(\beta +\frac{2-\beta }{n}\). Furthermore, if \(d=n+\beta \frac{n(n-1)}{2}\) is a positive integer, then \({(|X^n_t|^2)}_{t\ge 0}\) has the law of \({(|Z_t|^2)}_{t\ge 0}\) where \({(Z_t)}_{t\ge 0}\) is a d-dimensional OU process, weak solution of (1.8) with \(\theta =\sqrt{2/n}\), \(\rho =1\), and \(Z_0=z^n_0\) for an arbitrary \(z^n_0\in {\mathbb {R}}^d\) such that \(|z^n_0|=|x^n_0|\).
At this step it is worth noting that Theorem 1.3 gives in particular, denoting \(\beta _n:=1+\frac{\beta }{2}(n-1)\),
$$\begin{aligned} {\mathbb {E}}[\pi (X^n_t)]=\pi (x^n_0)\mathrm {e}^{-t}\underset{t\rightarrow \infty }{\longrightarrow }0 \quad \text {and}\quad {\mathbb {E}}[|X^n_t|^2]=\beta _n+(|x^n_0|^2-\beta _n)\mathrm {e}^{-2t}\underset{t\rightarrow \infty }{\longrightarrow }\beta _n. \end{aligned}$$
Following [25, Sec. 2.2], the limits can also be deduced from the Dumitriu–Edelman tridiagonal random matrix model [32] isospectral to \(\beta \)-Hermite. These formulas for the “transient” first two moments \({\mathbb {E}}[\pi (X_t^n)]\) and \({\mathbb {E}}[|X_t^n|^2]\) reveal an abrupt convergence to their equilibrium values:
-
If \(\lim _{n\rightarrow \infty }\frac{\pi (x^n_0)}{n}=\alpha \ne 0\) then for all \(\varepsilon \in (0,1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \vert {\mathbb {E}}[\pi (X^n_{t_n})]\vert = {\left\{ \begin{array}{ll} +\infty &{}\text {if }t_n=(1-\varepsilon )\log (n)\\ 0&{}\text {if }t_n=(1+\varepsilon )\log (n) \end{array}\right. }. \end{aligned}$$(1.12) -
If \(\lim _{n\rightarrow \infty }\frac{|x^n_0|^2}{n} = \alpha \ne \frac{\beta }{2}\) then for all \(\varepsilon \in (0,1)\), denoting \(\beta _n:=1+\frac{\beta }{2}(n-1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \left| {\mathbb {E}}[|X^n_{t_n}|^2]-\beta _n \right| = {\left\{ \begin{array}{ll} +\infty &{}\text {if }t_n=(1-\varepsilon )\frac{1}{2}\log (n)\\ 0&{}\text {if }t_n=(1+\varepsilon )\frac{1}{2}\log (n) \end{array}\right. }. \end{aligned}$$(1.13)
These critical times are universal with respect to \(\beta \). The first two transient moments are related to the eigenfunctions (2.6) associated to the first two non-zero eigenvalues of the dynamics. Higher order transient moments are related to eigenfunctions associated to higher order eigenvalues. Note that \({\mathbb {E}}[\pi (X^n_t)]\) and \({\mathbb {E}}[|X^n_t|^2]\) are the first two moments of the non-normalized mean empirical measure \({\mathbb {E}}[\sum _{i=1}^n\delta _{X^{n,i}_t}]\), and this lack of normalization is responsible for the critical times of order \(\log (n)\). In contrast, the first two moments of the normalized mean empirical measure \({\mathbb {E}}[\frac{1}{n}\sum _{i=1}^n\delta _{X^{n,i}_t}]\), given by \(\frac{1}{n}{\mathbb {E}}[\pi (X^n_t)]\) and \(\frac{1}{n}{\mathbb {E}}[|X^n_t|^2]\) respectively, do not exhibit a critical phenomenon. This is related to the exponential decay of the first two moments in the mean-field limit (2.12), as well as the lack of cutoff for Wasserstein already revealed for OU by Theorem 1.2. This is also reminiscent of the high-dimensional behavior of norms in the asymptotic geometric analysis of convex bodies. In another direction, this elementary observation on the moments also illustrates that the cutoff phenomenon for a given quantity is not stable under rather simple transformations of this quantity.
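A back-of-the-envelope check of (1.12) and (1.13) (a sketch; the formulas below are the OU mean \(\pi (x^n_0)\mathrm {e}^{-t}\) and the solution of the CIR mean ODE \(m'=a-2m\) with \(a=2+\beta (n-1)\), both read off from Theorem 1.3; \(\varepsilon =1/2\), \(\beta =2\), and the initial conditions are hypothetical choices):

```python
import math

def mean_pi(t, pi_x0):
    """E[pi(X_t^n)] = pi(x_0^n) e^{-t}: an OU mean, independent of beta."""
    return pi_x0 * math.exp(-t)

def mean_sq(t, x0_norm2, n, beta):
    """E[|X_t^n|^2] solves the CIR mean ODE m' = a - 2m, a = 2 + beta (n-1),
    hence relaxes to beta_n = 1 + beta (n-1)/2 at rate e^{-2t}."""
    beta_n = 1.0 + 0.5 * beta * (n - 1)
    return beta_n + (x0_norm2 - beta_n) * math.exp(-2.0 * t)

eps, beta = 0.5, 2.0
for n in (10**2, 10**4, 10**6):
    beta_n = 1.0 + 0.5 * beta * (n - 1)
    # (1.12): |E[pi]| explodes before log(n) and vanishes after, for pi(x0) = n
    assert abs(mean_pi((1 - eps) * math.log(n), n)) >= 5.0
    assert abs(mean_pi((1 + eps) * math.log(n), n)) <= 0.2
    # (1.13): same transition around log(n)/2 for |E[|X|^2] - beta_n|, |x0|^2 = 3n
    assert abs(mean_sq((1 - eps) * 0.5 * math.log(n), 3.0 * n, n, beta) - beta_n) >= 5.0
    assert abs(mean_sq((1 + eps) * 0.5 * math.log(n), 3.0 * n, n, beta) - beta_n) <= 0.3
```

Dividing both observables by n removes the transition entirely, which is the lack-of-normalization remark made above.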
From the first part of Theorem 1.3 and contraction properties available for some distances or divergences, see Lemma A.2, we obtain the following lower bound on the mixing time for the DOU, which is independent of \(\beta \):
Corollary 1.4
(Lower bound on the mixing time). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3) with \(\beta =0\) or \(\beta \ge 1\), and invariant law \(P_n^\beta \). Let \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Wasserstein}\}\). Set
and assume that \(\lim _{n\rightarrow \infty }c_n=\infty \). Then, for all \(\varepsilon \in (0,1)\), we have
Theorem 1.3 and Corollary 1.4 are proved in Sect. 4.
The derivation of an upper bound on the mixing time is much more delicate: once again recall that the case \(\beta = 0\) covered by Theorem 1.2 is specific as it relies on exact Gaussian computations which are no longer available for \(\beta \ge 1\). In the next subsection, we will obtain results for general values of \(\beta \ge 1\) via more elaborate arguments.
In the specific cases \(\beta \in \{1,2\}\), there are some exactly solvable aspects that one can exploit to derive, in particular, precise upper bounds on the mixing times. Indeed, for these values of \(\beta \), the DOU process is the process of eigenvalues of the matrix-valued OU process:
where B is a BM on the symmetric \(n\times n\) matrices if \(\beta = 1\) and on Hermitian \(n\times n\) matrices if \(\beta =2\), see (5.4) and (5.16) for more details. Based on this observation, we can deduce an upper bound on the mixing times by contraction (for most distances or divergences).
Theorem 1.5
(Upper bound on mixing time in matrix case). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3) with \(\beta \in \{0,1,2\}\), and invariant law \(P_n^\beta \), and \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Wasserstein}\}\). Set
and assume that \(\lim _{n\rightarrow \infty }c_n=\infty \) if \(\mathrm {dist} = \mathrm {Wasserstein}\). Then, for all \(\varepsilon \in (0,1)\), we have
Combining this upper bound with the lower bound already obtained above, we derive a cutoff phenomenon in this particular matrix case.
Corollary 1.6
(Cutoff for DOU in the matrix case). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3), with \(\beta \in \{0,1,2\}\), and invariant law \(P_n^\beta \). Let \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Wasserstein}\}\). Let \({(a_n)}_n\) be a real sequence satisfying \(\inf _n \sqrt{n} a_n > 0\), and assume further that \(\lim _{n\rightarrow \infty }\sqrt{n} a_n=\infty \) if \(\mathrm {dist}=\mathrm {Wasserstein}\). Then, for all \(\varepsilon \in (0,1)\), we have
where
Theorem 1.5 and Corollary 1.6 are proved in Sect. 5.
It is worth noting that \(d=n+\beta \frac{n(n-1)}{2}\) in Theorem 1.3 is indeed an integer in the “random matrix” cases \(\beta \in \{1,2\}\), and then corresponds exactly to the number of degrees of freedom of the Gaussian random matrix models GOE and GUE respectively. More precisely, if we let \(X^n_\infty \sim P_n^\beta \) then:
-
If \(\beta =1\) then \(P_n^\beta \) is the law of the eigenvalues of \(S\sim \mathrm {GOE}_n\), and \(|X^n_\infty |^2=\sum _{j,k=1}^nS_{jk}^2\) which is the sum of n squared Gaussians of variance \(v=1/n\) (diagonal) plus twice the sum of \(\frac{n^2-n}{2}\) squared Gaussians of variance \(\frac{v}{2}\) (off-diagonal) all being independent. The duplication has the effect of renormalizing the variance from \(\frac{v}{2}\) to v. All in all we have the sum of \(d=\frac{n^2+n}{2}\) independent squared Gaussians of same variance v. See Sect. 5.
-
If \(\beta =2\) then \(P_n^\beta \) is the law of the eigenvalues of \(H\sim \mathrm {GUE}_n\), and \(|X^n_\infty |^2=\sum _{j,k=1}^n|H_{jk}|^2\) is the sum of n squared Gaussians of variance \(v=1/n\) (diagonal) plus twice the sum of \(n^2-n\) squared Gaussians of variance \(\frac{v}{2}\) (off-diagonal) all being independent. All in all we have the sum of \(d=n^2\) independent squared Gaussians of same variance v. See Sect. 5.
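This bookkeeping can be checked numerically in the GOE case (a sketch; the trace identity \(\sum _i\lambda _i^2=\mathrm {Tr}(S^2)=\sum _{j,k}S_{jk}^2\) is what links \(|X^n_\infty |^2\) to the matrix entries, and the Monte Carlo tolerance is deliberately loose):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60

def sample_goe(rng, n):
    """GOE_n normalized as in the text: diagonal variance 1/n, off-diagonal 1/(2n)."""
    A = rng.standard_normal((n, n))
    return (A + A.T) / (2.0 * np.sqrt(n))

S = sample_goe(rng, n)
eig = np.linalg.eigvalsh(S)
# trace identity: |X_infty|^2 = sum of lambda_i^2 = Tr(S^2) = sum of squared entries
assert abs(np.sum(eig ** 2) - np.sum(S ** 2)) < 1e-8

# E|X_infty|^2 = beta_n = 1 + (n-1)/2 for beta = 1 (d = n(n+1)/2 degrees of freedom)
avg = np.mean([np.sum(sample_goe(rng, n) ** 2) for _ in range(400)])
assert abs(avg - (1.0 + 0.5 * (n - 1))) < 1.0
```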
Another manifestation of exact solvability lies at the level of functional inequalities. Indeed, and following [25], the optimal Poincaré constant of \(P_n^\beta \) is given by 1/n and does not depend on \(\beta \), and the extremal functions are translations/dilations of \(x\mapsto \pi (x)=x_1+\ldots +x_n\). This corresponds to a spectral gap of the dynamics equal to 1 and its associated eigenfunction. Moreover, the optimal logarithmic Sobolev constant of \(P_n^\beta \) (Lemma B.1) is given by 2/n and does not depend on \(\beta \), and the extremal functions are of the form \(x\mapsto \mathrm {e}^{c(x_1+\ldots +x_n)}\), \(c\in {\mathbb {R}}\). This knowledge of the optimal constants and extremal functions, and their independence with respect to \(\beta \), is truly remarkable. It plays a crucial role in the results presented in this article. More precisely, the optimal Poincaré inequality is used for the lower bound via the first eigenfunctions, while the optimal logarithmic Sobolev inequality is used for the upper bound via exponential decay of the entropy.
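To make explicit how the logarithmic Sobolev constant drives the upper bound, here is the standard entropy-decay sketch (assuming the convention \(\mathrm {Ent}_{P}(g^2)\le c\int |\nabla g|^2\,\mathrm {d}P\) with \(c=2/n\) for Lemma B.1, and that the carré du champ of the dynamics carries a factor \(1/n\)): writing \(f_t\) for the density of \(\mathrm {Law}(X^n_t)\) with respect to \(P_n^\beta \) and \(H(t):=\mathrm {Kullback}(\mathrm {Law}(X^n_t)\mid P_n^\beta )\),

```latex
\frac{\mathrm{d}}{\mathrm{d}t} H(t)
  = -\frac{4}{n}\int \bigl|\nabla \sqrt{f_t}\bigr|^2 \,\mathrm{d}P_n^\beta
  \le -\frac{4}{n}\cdot\frac{n}{2}\, H(t)
  = -2 H(t),
\qquad\text{hence}\qquad
H(t) \le \mathrm{e}^{-2t} H(0).
```

In particular the entropy decays at rate 2, twice the spectral gap, uniformly in \(\beta \).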
1.6 Cutoff in the general interacting case
Our main contribution consists in deriving an upper bound on the mixing times in the general case \(\beta \ge 1\): the proof relies on the logarithmic Sobolev inequality, some coupling arguments and a regularization procedure.
Theorem 1.7
(Upper bound on the mixing time: the general case). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3), with \(\beta =0\) or \(\beta \ge 1\) and invariant law \(P_n^\beta \). Take \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Wasserstein}\}\). Set
Then, for all \(\varepsilon \in (0,1)\), we have
Combining this upper bound with the general lower bound that we obtained in Corollary 1.4, we deduce the following cutoff phenomenon. Observe that it holds both for \(\beta =0\) and \(\beta \ge 1\), and that the expression of the mixing time does not depend on \(\beta \).
Corollary 1.8
(Cutoff for DOU in the general case). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3) with \(\beta =0\) or \(\beta \ge 1\) and invariant law \(P_n^\beta \). Take \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Wasserstein}\}\). Let \({(a_n)}_n\) be a real sequence satisfying \(\inf _na_n > 0\). Then, for all \(\varepsilon \in (0,1)\), we have
where
The proofs of Theorem 1.7 and Corollary 1.8 for the TV and Hellinger distances are presented in Sect. 6. The Wasserstein distance is treated in Sect. 7. Let us make a comment on the assumptions made on \(a_n\) in Corollaries 1.6 and 1.8. They are dictated by the upper bounds established in Theorems 1.5 and 1.7, which take the form of maxima of two terms: one that depends on the initial condition, and another one which is a power of a logarithm of n. The logarithmic term is an upper bound on the time required to regularize a pointwise initial condition, and its precise expression varies according to the method of proof we rely on: in the matrix case, it is the time required to regularize a larger object, the matrix-valued OU process; in the general case, it is related to the time it takes to make the entropy of a pointwise initial condition small. These bounds are not optimal for \(\beta =0\) (compare with Theorem 1.2), and probably not for \(\beta \ge 1\) either.
A natural, but probably quite difficult, goal would be to establish a cutoff phenomenon in the situation where the set of initial conditions is reduced to any given singleton, as in Theorem 1.2 for the case \(\beta = 0\). Recall that in that case, the asymptotic of the mixing time is dictated by the Euclidean norm of the initial condition. In the case \(\beta \ge 1\), this cannot be the right observable since the Euclidean norm does not measure the distance to equilibrium. Instead one should probably consider the Euclidean norm \(|x_0^n-\rho _n|\), where \(\rho _n\) is the vector of the quantiles of order 1/n of the semi-circle law that arises in the mean-field limit equilibrium (see Sect. 2.5). More precisely
Note that \(\rho _n = 0\) when \(\beta = 0\).
A first step in this direction is given by the following result:
Theorem 1.9
(DOU in the general case and pointwise initial condition). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3) with \(\beta =0\) or \(\beta \ge 1\), and invariant law \(P_n^\beta \). The following hold:
-
If \(\lim _{n\rightarrow \infty }|x^n_0-\rho _n|=+\infty \), then, denoting \(t_n = \log (|x_0^n-\rho _n|)\), for all \(\varepsilon \in (0,1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {Wasserstein}(\mathrm {Law}(X_{(1+\varepsilon )t_n}),P_n^\beta ) = 0. \end{aligned}$$ -
If \(\lim _{n\rightarrow \infty }|x^n_0-\rho _n|=\alpha \in [0,\infty )\), then, for all \(t>0\),
$$\begin{aligned} \varlimsup _{n\rightarrow \infty } \mathrm {Wasserstein}(\mathrm {Law}(X_{t}),P_n^\beta )^2 \le \alpha ^2\mathrm {e}^{-2t}. \end{aligned}$$
Theorem 1.9 is proved in Sect. 7.
1.7 Non-pointwise initial conditions
It is natural to ask about the cutoff phenomenon when the initial condition \(X^n_0\) is not pointwise. Even if we turn off the interaction by taking \(\beta =0\), the law of the process at time t is then no longer Gaussian in general, which breaks the method of proof used for Theorem 1.1 and Theorem 1.2. Nevertheless, Theorem 1.10 below provides a universal answer, valid both for \(\beta =0\) and \(\beta \ge 1\), at the price however of introducing several objects and notations. More precisely, for any probability measure \(\mu \) on \({\mathbb {R}}^n\), we introduce
Note that S takes its values in the whole \((-\infty ,+\infty ]\), and when \(S(\mu )<+\infty \) then \(-S(\mu )\) is the Boltzmann–Shannon entropy of the law \(\mu \). For all \(x\in {\mathbb {R}}^n\) with \(x_i \ne x_j\) for all \(i\ne j\), we have
where \(\displaystyle L_n:=\frac{1}{n}\sum _{i=1}^n\delta _{x_i}\) and where \(\displaystyle \Phi (x,y):=\frac{n}{n-1}\frac{V(x)+V(y)}{2}+\frac{\beta }{2}\log \frac{1}{|x-y|}\).
Let us define the map \(\Psi :{\mathbb {R}}^n\mapsto {\overline{D}}_n\) by
where \(\sigma \) is any permutation of \(\{1,\ldots ,n\}\) that reorders the particles non-decreasingly.
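The reordering map \(\Psi \) admits a one-line implementation; the following minimal Python sketch (ours, not from the paper) simply sorts the coordinates non-decreasingly, which amounts to applying the permutation \(\sigma \).

```python
# Minimal sketch of the reordering map Psi: it sends x in R^n to the
# closed Weyl chamber by sorting coordinates non-decreasingly, i.e.
# Psi(x) = (x_sigma(1), ..., x_sigma(n)) for a sorting permutation sigma.

def Psi(x):
    """Push a configuration to the closed Weyl chamber by sorting."""
    return sorted(x)

# Psi is well defined even when coordinates coincide, and it is the
# identity on configurations that are already ordered.
assert Psi([3.0, 1.0, 2.0]) == [1.0, 2.0, 3.0]
```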
Theorem 1.10
(Cutoff for DOU with product smooth initial conditions). Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3) with \(\beta =0\) or \(\beta \ge 1\), and invariant law \(P_n^\beta \). Let S, \(\Phi \), and \(\Psi \) be as in (1.15), (1.16), and (1.17). Let us assume that \(\mathrm {Law}(X_0^n)\) is the image law or push forward of a product law \(\mu _1\otimes \ldots \otimes \mu _n\) by \(\Psi \) where \(\mu _1,\ldots ,\mu _n\) are laws on \({\mathbb {R}}\). Then:
-
(1)
If \(\displaystyle \varliminf _{n\rightarrow \infty } \Bigl |\frac{1}{n}\sum _{i=1}^n \int x \mu _i(\mathrm {d}x)\Bigr | \ne 0\) then, for all \(\varepsilon \in (0,1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty }\mathrm {Kullback}(\mathrm {Law}(X_{(1-\varepsilon )\log (n)})\mid P_n^\beta )=+\infty . \end{aligned}$$ -
(2)
If \(\displaystyle \varlimsup _{n\rightarrow \infty }\frac{1}{n^2}\sum _{i=1}^n S(\mu _i)<\infty \) and \(\displaystyle \varlimsup _{n\rightarrow \infty } \frac{1}{n^2}\sum _{i\ne j}\iint \Phi \,\mathrm {d}\mu _i \otimes \mathrm {d}\mu _j <\infty \), then, for all \(\varepsilon \in (0,1)\),
$$\begin{aligned} \lim _{n\rightarrow \infty }\mathrm {Kullback}(\mathrm {Law}(X_{(1+\varepsilon )\log (n)})\mid P_n^\beta )=0. \end{aligned}$$
Theorem 1.10 is proved in Sect. 6.3.
It is likely that Theorem 1.10 can be extended to the case \(\mathrm {dist}\in \{\mathrm {Wasserstein}, \mathrm {Hellinger}, \mathrm {Fisher}\}\).
1.8 Structure of the paper
-
Section 2 provides additional comments and open problems.
-
Section 3 focuses on the OU process (\(\beta = 0\)) and gives the proofs of Theorems 1.1 and 1.2.
-
Section 4 concerns the exact solvability of the DOU process for all \(\beta \), and provides the proofs of Theorem 1.3 and Corollary 1.4.
-
Section 5 is about random matrices and gives the proofs of Theorem 1.5 and Corollary 1.6.
-
Section 6 deals with the DOU process for all \(\beta \) with the TV and Hellinger distances, and provides the proofs of Theorem 1.7 and Corollary 1.8.
-
Section 7 gives the Wasserstein counterpart of Sect. 6 and the proof of Theorem 1.9.
-
Appendix A provides a survey on distances and divergences, with new results.
-
Appendix B gathers useful dynamical consequences of convexity.
2 Additional comments and open problems
2.1 About the results and proofs
The proofs of our results rely, among other ingredients, on convexity and optimal functional inequalities, exact solvability, exact Gaussian formulas, coupling arguments, stochastic calculus, variational formulas, contraction properties and regularization.
The proofs of Theorems 1.1 and 1.2 are based on the explicit Gaussian nature of the OU process, which allows us to use Gaussian formulas for all the distances and divergences that we consider (the Gaussian formula for \(\mathrm {Fisher}\) seems to be new). Our analysis of the convergence to equilibrium of the OU process seems to go beyond what is already known, see for instance [47] and [8,9,10,11].
Theorem 1.3 is a one-dimensional analogue of [15, Th. 1.2]. The proof exploits the explicit knowledge of eigenfunctions of the dynamics (2.6), associated with the first two non-zero spectral values, and their remarkable properties. The first one is associated to the spectral gap and the optimal Poincaré inequality. It implies Corollary 1.4, which is the provider of all our lower bounds on the mixing time for the cutoff.
The proof of Theorem 1.5 is based on a contraction property and the upper bound for matrix OU processes. It is not available beyond the matrix cases. All the other upper bounds that we establish are related to an optimal exponential decay which comes from convexity and sometimes involves coupling, the simplest instance being Theorem 1.7 about the Wasserstein distance. The use of Wasserstein metrics for Dyson dynamics is quite natural, see for instance [13].
The proof of Theorem 1.7 for the \(\mathrm {TV}\) and \(\mathrm {Hellinger}\) distances relies on the knowledge of the optimal exponential decay of the entropy (with respect to equilibrium) related to the optimal logarithmic Sobolev inequality. Since pointwise initial conditions have infinite entropy, the proof proceeds in three steps: first, we regularize the initial condition to make its entropy finite; second, we use the optimal exponential decay of the entropy of the process starting from this regularized initial condition; third, we control the distance between the processes starting from the initial condition and its regularized version. This last part is inspired by a work of Lacoin [48] for the simple exclusion process on the segment, subsequently adapted to continuous state-spaces [18, 19], where one controls an area between two versions of the process.
The (optimal) exponential decay of the entropy (Lemma B.2) is equivalent to the (optimal) logarithmic Sobolev inequality (Lemma B.1). For the DOU process, the optimal logarithmic Sobolev inequality provided by Lemma B.1 also achieves the universal bound with respect to the spectral gap, just like for Gaussians. This sharpness between the best logarithmic Sobolev constant and the spectral gap also holds for instance for the random walk on the hypercube, a discrete process for which a cutoff phenomenon can be established with the optimal logarithmic Sobolev inequality, and which can be related to the OU process, see for instance [29, 30] and references therein. If we generalize the DOU process by adding an arbitrary convex function to V, then we will still have a logarithmic Sobolev inequality (see [25] for several proofs, including the one via the Bakry–Émery criterion); however, the optimal logarithmic Sobolev constant will no longer be explicit or sharp with respect to the spectral gap, and the spectral gap will no longer be explicit.
The proof of Theorem 1.10 relies crucially on the tensorization property of \(\mathrm {Kullback}\) and on the asymptotics on the normalizing constant \(C_n^\beta \) at equilibrium.
2.2 Analysis and geometry of the equilibrium
The full space \({\mathbb {R}}^n\) is, up to a finite union of hyperplanes, covered with n! disjoint isometric copies of the convex domain \(D_n\) obtained by permuting the coordinates (simplices or Weyl chambers). Following [25], for all \(\beta \ge 0\) let us define the law \(P_{*n}^\beta \) on \({\mathbb {R}}^n\) with density proportional to \(\mathrm {e}^{-E}\), just like for \(P_n^\beta \) in (1.6) but without the \({\mathbf {1}}_{(x_1,\ldots ,x_n)\in {\overline{D}}_n}\).
If \(\beta =0\) then \(P_{*n}^0=P_n^0={\mathcal {N}}(0,\frac{1}{n}I_n)\) according to our definition of \(P_n^0\).
If \(\beta > 0\) then \(P_{*n}^\beta \) has density \((C_{*n}^\beta )^{-1}\mathrm {e}^{-E}\) with \(C_{*n}^\beta =n!C_n^\beta \) where \(C_n^\beta \) is the normalization of \(P_n^\beta \). Moreover \(P_{*n}^\beta \) is a mixture of n! isometric copies of \(P_n^\beta \), while \(P_n^\beta \) is the image law or push forward of \(P_{*n}^\beta \) by the map \(\Psi _n:{\mathbb {R}}^n\rightarrow {\overline{D}}_n\) defined in (1.17). Furthermore for all bounded measurable \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\), denoting \(\Sigma _n\) the symmetric group of permutations of \(\{1,\ldots ,n\}\),
Regarding log-concavity, it is important to realize that if \(\beta =0\) then E is convex on \({\mathbb {R}}^n\), while if \(\beta >0\) then E is convex on \(D_n\) but is not convex on \({\mathbb {R}}^n\) and has n! isometric local minima.
-
The law \(P_{*n}^\beta \) is centered but is not log-concave when \(\beta >0\) since E is not convex on \({\mathbb {R}}^n\).
As \(\beta \rightarrow 0^+\) the law \(P_{*n}^\beta \) tends to \(P_{*n}^0=P_n^0={\mathcal {N}}(0,\frac{1}{n}I_n)\) which is log-concave.
-
The law \(P_n^\beta \) is not centered but is log-concave for all \(\beta \ge 0\).
Its density vanishes at the boundary of \(D_n\) if \(\beta >0\).
As \(\beta \rightarrow 0^+\) the law \(P_n^\beta \) tends to the law of the order statistics of n i.i.d. \({\mathcal {N}}(0,\frac{1}{n})\).
2.3 Spectral analysis of the generator: the non-interacting case
This subsection and the next deal with analytical aspects of our dynamics. We start with the OU process (\(\beta =0\)) for which everything is explicit; the next subsection deals with the DOU process (\(\beta \ge 1\)).
The infinitesimal generator of the OU process is given by
It is a self-adjoint operator on \(L^2({\mathbb {R}}^n, P_n^0)\) that leaves globally invariant the set of polynomials. Its spectrum is the set of all non-positive integers, that is, \(\lambda _0 = 0> \lambda _1 = - 1> \lambda _2 = -2 > \ldots \). The corresponding eigenspaces \(F_0,F_1,F_2,\ldots \) are finite dimensional: \(F_m\) is spanned by the multivariate Hermite polynomials of degree m, in other words tensor products of univariate Hermite polynomials. In particular, \(F_0\) is the vector space of constant functions while \(F_1\) is the n-dimensional vector space of all linear functions.
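The eigenfunction property can be checked numerically. The sketch below (an illustration of ours, not the paper's code) uses the unit-variance one-dimensional normalization \(Lf=f''-xf'\), for which the probabilists' Hermite polynomial \(\mathrm{He}_m\) is an eigenfunction with eigenvalue \(-m\); the generator with invariant law \({\mathcal {N}}(0,\frac{1}{n}I_n)\) has the same spectrum after rescaling the variable.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Check numerically that L f = f'' - x f' satisfies L He_m = -m He_m,
# working with coefficients in the probabilists' Hermite (HermiteE) basis.

def apply_generator(c):
    """Apply L f = f'' - x f' to the polynomial with HermiteE coefficients c."""
    f1 = He.hermeder(c, 1)                    # f'
    f2 = He.hermeder(c, 2)                    # f''
    return He.hermesub(f2, He.hermemulx(f1))  # f'' - x f'

xs = np.linspace(-2.0, 2.0, 9)
for m in range(6):
    c = np.zeros(m + 1); c[m] = 1.0           # coefficients of He_m
    assert np.allclose(He.hermeval(xs, apply_generator(c)),
                       -m*He.hermeval(xs, c))
```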
Let us point out that \({{\,\mathrm{G}\,}}\) can be restricted to the set of \(P_n^0\) square integrable symmetric functions: it leaves globally invariant the set of symmetric polynomials, its spectrum is unchanged but the associated eigenspaces \(E_m\) are the restrictions of the vector spaces \(F_m\) to the set of symmetric functions, in other words, \(E_m\) is spanned by the multivariate symmetrized Hermite polynomials of degree m. Note that \(E_1\) is the one-dimensional space generated by \(\pi (x) =x_1+\ldots +x_n\).
The Markov semigroup \({(\mathrm {e}^{t{{\,\mathrm{G}\,}}})}_{t\ge 0}\) generated by \({{\,\mathrm{G}\,}}\) admits \(P_n^0\) as a reversible invariant law since \({{\,\mathrm{G}\,}}\) is self-adjoint in \(L^2(P_n^0)\). Following [62], let us introduce the heat kernel \(p_t(x,y)\) which is the density of \(\mathrm {Law}(X^n_t\mid X^n_0=x)\) with respect to the invariant law \(P_n^0\). The long-time behavior reads \(\lim _{t\rightarrow \infty }p_t(x,\cdot )=1\) for all \(x\in {\mathbb {R}}^n\). Let \(\left\| \cdot \right\| _p\) be the norm of \(L^p=L^p(P_n^0)\). For all \(1\le p\le q\), \(t\ge 0\), \(x\in {\mathbb {R}}^n\), we have
In the particular case \(p=2\) we can write
where \(B_m\) is an orthonormal basis of \(F_m\subset L^2(P_n^0)\), hence
which leads to a lower bound on the \(\chi ^2\) (in other words \(L^2\)) cutoff, provided one can estimate \(\sum _{\psi \in B_1}|\psi (x)|^2\) which is the square of the norm of the projection of \(\delta _x\) on \(B_1\).
Following [62, Th. 6.2], an upper bound would follow from a Bakry–Émery curvature–dimension criterion \(\mathrm {CD}(\rho ,d)\) with a finite dimension d, in relation with Nash–Sobolev inequalities and dimensional pointwise estimates on the heat kernel \(p_t(x,\cdot )\) or ultracontractivity of the Markov semigroup, see for instance [63, Sec. 4.1]. The OU process satisfies \(\mathrm {CD}(\rho ,\infty )\) but not \(\mathrm {CD}(\rho ,d)\) for any finite d, and it is not ultracontractive. Actually the OU process is a critical case, see [3, Ex. 2.7.3].
2.4 Spectral analysis of the generator: the interacting case
We now assume that \(\beta \ge 1\). The infinitesimal generator of the DOU process is the operator
Despite the interaction term, the operator leaves globally invariant the set of symmetric polynomials. Following Lassalle in [4, 49], see also [25], the operator \({{\,\mathrm{G}\,}}\) is a self-adjoint operator on the space of \(P_{*n}^\beta \) square integrable symmetric functions of n variables, its spectrum does not depend on \(\beta \) and matches the spectrum of the OU process case \(\beta =0\). In particular the spectral gap is 1. The eigenspaces \(E_m\) are spanned by the generalized symmetrized Hermite polynomials of degree m. For instance, \(E_1\) is the one-dimensional space generated by \( \pi (x)=x_1+\ldots +x_n\) while \(E_2\) is the two-dimensional space spanned by
From the isometry between \(L^2({\overline{D}}_n,P_n^\beta )\) and \(L^2_{\mathrm {sym}}({\mathbb {R}}^n,P_{*n}^\beta )\), the above explicit spectral decomposition applies to the semigroup of the DOU on \({\overline{D}}_n\). Formally, the discussion presented at the end of the previous subsection still applies. However, in the present interacting case the integrability properties of the heat kernel are not known: in particular, we do not know whether \(p_t(x,\cdot )\) lies in \(L^p(P_n^\beta )\) for \(t>0\), \(x\in {\overline{D}}_n\) and \(p>1\). This leads to the question, of independent interest, of pointwise upper and lower Gaussian bounds for heat kernels similar to those for the OU process, with explicit dependence of the constants on the dimension. We refer for example to [36, 41, 65] for some results in this direction.
2.5 Mean-field limit
The measure \(P_n^\beta \) is log-concave since E is convex, and its density reads
See [25, Sec. 2.2] for a high-dimensional analysis. The Boltzmann–Gibbs measure \(P_n^\beta \) is known as the \(\beta \)-Hermite ensemble or H\(\beta \)E. When \(\beta =2\), it is better known as the Gaussian Unitary Ensemble (GUE). If \(X^n\sim P_n^\beta \) then the Wigner theorem states that the empirical measure of the coordinates of \(X^n\) converges in distribution to a semi-circle law, namely
and this can be deduced in this Coulomb gas context from a large deviation principle as in [12].
Let \({(X^n_t)}_{t\ge 0}\) be the process solving (1.3) with \(\beta = 0\) or \(\beta \ge 1\), and let
be the empirical measure of the particles at time t. Following notably [14, 20, 21, 31, 53, 60], if the sequence of initial conditions \({(\mu _0^n)}_{n\ge 1}\) converges weakly as \(n\rightarrow \infty \) to a probability measure \(\mu _0\), then the sequence of measure valued processes \({({(\mu _t^n)}_{t\ge 0})}_{n\ge 1}\) converges weakly to the unique probability measure valued deterministic process \({(\mu _t)}_{t\ge 0}\) satisfying the evolution equation
for all \(t\ge 0\) and \(f\in {\mathcal {C}}^3_b({\mathbb {R}},{\mathbb {R}})\). Equation (2.10) is a weak formulation of a McKean–Vlasov equation or free Fokker–Planck equation associated to a free OU process. Moreover, if \(\mu _0\) has all its moments finite, then for all \(t\ge 0\), we have the free Mehler formula
where \(\mathrm {dil}_\sigma \mu \) is the law of \(\sigma X\) when \(X\sim \mu \), where “\(\boxplus \)” stands for the free convolution of probability measures of Voiculescu free probability theory, and where \(\mu _\infty \) is the semi-circle law of variance \(\frac{\beta }{2}\). In particular, if \(\mu _0\) is a semi-circle law then \(\mu _t\) is a semi-circle law for all \(t\ge 0\).
Let us introduce the k-th moment \(m_k(t):=\displaystyle \int x^k\mu _t(\mathrm {d}x)\) of \(\mu _t\). The first and second moments satisfy the differential equations \(m_1'=-m_1\) and \(m_2'=-2m_2+\beta \) respectively, which give
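The stated ODEs integrate to \(m_1(t)=m_1(0)\mathrm {e}^{-t}\) and \(m_2(t)=\frac{\beta }{2}+(m_2(0)-\frac{\beta }{2})\mathrm {e}^{-2t}\); a quick numerical check of these closed forms, using arbitrary sample values of our own choosing:

```python
import math

# Verify the closed-form solutions of m1' = -m1 and m2' = -2 m2 + beta
# against a generic fourth-order Runge-Kutta integration.

def rk4(f, y0, t, h=1e-3):
    """Integrate y' = f(y) from 0 to t with step h."""
    y, s = y0, 0.0
    while s < t - 1e-12:
        h_ = min(h, t - s)
        k1 = f(y); k2 = f(y + 0.5*h_*k1); k3 = f(y + 0.5*h_*k2); k4 = f(y + h_*k3)
        y += (h_/6.0)*(k1 + 2*k2 + 2*k3 + k4)
        s += h_
    return y

beta, m1_0, m2_0, t = 2.0, 1.5, 0.7, 1.3
m1_num = rk4(lambda m: -m, m1_0, t)
m2_num = rk4(lambda m: -2*m + beta, m2_0, t)
assert abs(m1_num - m1_0*math.exp(-t)) < 1e-6
assert abs(m2_num - (beta/2 + (m2_0 - beta/2)*math.exp(-2*t))) < 1e-6
```

In particular \(m_2(t)\rightarrow \frac{\beta }{2}\), the second moment of the limiting semi-circle law.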
More generally, beyond the first two moments, the Cauchy–Stieltjes transform
of \(\mu _t\) is the solution of the following complex Burgers equation
The semi-circle law on \([-c,c]\) has density \(\frac{2\sqrt{c^2-x^2}}{\pi c^2}{\mathbf {1}}_{x\in [-c,c]}\), mean 0, second moment (or variance) \(\frac{c^2}{4}\), and Cauchy–Stieltjes transform \(s(z)=\frac{\sqrt{4z^2-4c^2}-2z}{c^2}\), \(z\in {\mathbb {C}}_+\).
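The closed-form transform can be compared against direct numerical integration of the density; the sketch below (ours, not from the paper) uses the principal branch of the complex square root and a single test point in the upper half plane.

```python
import numpy as np

# Compare the Cauchy-Stieltjes transform of the semi-circle law on [-c, c],
#   s(z) = int rho(x)/(x - z) dx,
# with the closed form (2*sqrt(z^2 - c^2) - 2z)/c^2 (principal branch).

c = 2.0
z = 0.7 + 0.8j                         # a point in the upper half plane
xs = np.linspace(-c, c, 200001)
dx = xs[1] - xs[0]
rho = 2.0*np.sqrt(np.maximum(c**2 - xs**2, 0.0))/(np.pi*c**2)
f = rho/(xs - z)
s_num = dx*(f.sum() - 0.5*(f[0] + f[-1]))   # trapezoid rule
s_formula = (2.0*np.sqrt(z**2 - c**2 + 0j) - 2.0*z)/c**2
assert abs(s_num - s_formula) < 1e-5
```

Note that \(\mathrm{Im}\,s(z)>0\) for \(z\in {\mathbb {C}}_+\), and \(s(z)\sim -1/z\) at infinity, as expected for a Stieltjes transform with this sign convention.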
The cutoff phenomenon is in a sense a diagonal (t, n) estimate, combining long-time behavior and high dimension. When \(|z_0^n|\) is of order n, cutoff occurs at a time of order \(\log (n)\): this informally corresponds to taking \(t\rightarrow \infty \) in \((\mu _t)_{t\ge 0}\).
When \(\mu _0\) is centered with the same second moment \(\frac{\beta }{2}\) as \(\mu _\infty \), there is a Boltzmann H-theorem interpretation of the limiting dynamics as \(n\rightarrow \infty \): the steady-state is the Wigner semi-circle law \(\mu _\infty \), the second moment is conserved by the dynamics, and the Voiculescu entropy is monotone along the dynamics, converges exponentially fast, and is maximized by the steady-state.
2.6 \(L^p\) cutoff
Following [26], we can deduce an \(L^p\) cutoff started from x from an \(L^1\) cutoff by showing that the heat kernel \(p_t(x,\cdot )\) is in \(L^p(P_n^\beta )\) for some \(t>0\). Thanks to the Mehler formula, it can be checked that this holds for the OU case, despite the lack of ultracontractivity. The heat kernel of the DOU process is less accessible.
In another exactly solvable direction, the \(L^p\) cutoff phenomenon has been studied for instance in [62, 64] for Brownian motion on compact simple Lie groups, and in [55, 64] for Brownian motion on symmetric spaces, in relation with representation theory, an idea which goes back to the original works of Diaconis on random walks on groups.
2.7 Cutoff window and profile
Once a cutoff phenomenon is established, one can ask for a finer description of the pattern of convergence to equilibrium. The cutoff window is the order of magnitude of the transition time from the value \(\max \) to the value 0: more precisely, if cutoff occurs at time \(c_n\) then we say that the cutoff window is \(w_n\) if
and for any \(b\in {\mathbb {R}}\)
Note that necessarily \(w_n = o(c_n)\) by definition of the cutoff phenomenon. Note also that \(w_n\) is unique in the following sense: \(w'_n\) is a cutoff window if and only if \(w_n/w'_n\) remains bounded from above and below as \(n\rightarrow \infty \).
We say that the cutoff profile is given by \(\varphi :{\mathbb {R}}\rightarrow [0,1]\) if
The analysis of the OU process carried out in Theorems 1.1 and 1.2 can be pushed further to establish the so-called cutoff profiles, we refer to the end of Sect. 3 for details.
Regarding the DOU process, such a detailed description of the convergence to equilibrium does not seem easily accessible. However it is straightforward to deduce from our proofs that the cutoff window is of order 1, in other words the inverse of the spectral gap, in the setting of Corollary 1.6. This is also the case in the setting of Corollary 1.8 for the Wasserstein distance.
We believe that this remains true in the setting of Corollary 1.8 for the TV and Hellinger distances: actually, a lower bound of the required order can be derived from the calculations in the proof of Corollary 1.4; on the other hand, our proof of the upper bound on the mixing time does not allow us to give a precise enough upper bound on the window.
2.8 Other potentials
It is natural to ask about the cutoff phenomenon for the process solving (1.3) when V is a more general \({\mathcal {C}}^2\) function. The invariant law \(P_n^\beta \) of this Markov diffusion reads
The case where \(V-\frac{\rho }{2}\left| \cdot \right| ^2\) is convex for some constant \(\rho \ge 0\) generalizes the DOU case and has exponential convergence to equilibrium, see [25]. Three exactly solvable cases are known:
-
\(\mathrm {e}^{-V(x)}=\mathrm {e}^{-\frac{x^2}{2}}\): the DOU process associated to the Gaussian law weight and the \(\beta \)-Hermite ensemble including HOE/HUE/HSE when \(\beta \in \{1,2,4\}\),
-
\(\mathrm {e}^{-V(x)}=x^{a-1}\mathrm {e}^{-x}{\mathbf {1}}_{x\in [0,\infty )}\): the Dyson–Laguerre process associated to the Gamma law weight and the \(\beta \)-Laguerre ensembles including LOE/LUE/LSE when \(\beta \in \{1,2,4\}\),
-
\(\mathrm {e}^{-V(x)}=x^{a-1}(1-x)^{b-1}{\mathbf {1}}_{x\in [0,1]}\): the Dyson–Jacobi process associated to the Beta law weight and the \(\beta \)-Jacobi ensembles including JOE/JUE/JSE when \(\beta \in \{1,2,4\}\),
up to a scaling. Following Lassalle [4, 49,50,51] and Bakry [5], in these three cases, the multivariate orthogonal polynomials of the invariant law \(P_n^\beta \) are the eigenfunctions of the dynamics of the process. We refer to [32, 35, 54] for more information on (H/L/J)\(\beta \)E random matrix models.
The contraction property or spectral projection used to pass from a matrix process to the Dyson process can be used to pass from BM on the unitary group to the Dyson circular process for which the invariant law is the Circular Unitary Ensemble (CUE). This provides an upper bound for the cutoff phenomenon. The cutoff for BM on the unitary group is known and holds at a critical time of order \(\log (n)\), see for instance [55, 62, 64].
More generally, we could ask about the cutoff phenomenon for a McKean–Vlasov type interacting particle system \({(X^n_t)}_{t\ge 0}\) in \(({\mathbb {R}}^d)^n\) solution of the stochastic differential equation of the form
for various types of confinement V and interaction W (convex, repulsive, attractive, repulsive-attractive, etc), and discuss the relation with the propagation of chaos. The case where V and W are both convex and constant in time is already very well studied from the point of view of long-time behavior and mean-field limit in relation with convexity, see for instance [20, 21, 53].
Regarding universality, it is worth noting that if \(V=\left| \cdot \right| ^2\) and if W is convex then the proof by factorization of the optimal Poincaré and logarithmic Sobolev inequalities and their extremal functions given in [25] remains valid, paving the way to the generalization of many of our results in this spirit. On the other hand, the convexity of the limiting energy functional in the mean-field limit is of Bochner type and suggests taking a power for W, in other words a Riesz-type interaction.
2.9 Alternative parametrization
If \({(X^n_t)}_{t\ge 0}\) is the process solution of the stochastic differential equation (1.3), then for all real parameters \(\alpha >0\) and \(\sigma >0\), the space scaled and time changed stochastic process \({(Y^n_t)}_{t\ge 0}={(\sigma X^n_{\alpha t})}_{t\ge 0}\) solves the stochastic differential equation
where \({(B_t)}_{t\ge 0}\) is a standard n-dimensional BM. The invariant law of \({(Y^n_t)}_{t\ge 0}\) is
where \(C_n^\beta \) is the normalizing constant. This law and its normalization \(C_n^\beta \) depend on the “shape parameter” \(\beta \), the “scale parameter” \(\sigma \), and does not depend on the “speed parameter” \(\alpha \). When \(\beta >0\), taking \(\sigma ^2=\beta ^{-1}\), the stochastic differential equation (2.17) boils down to
while the invariant law becomes
Equation (2.19) is the one considered in [37, Eq. (12.4)] and in [46, Eq. (1.1)]. The advantage of (2.19) is that \(\beta \) can now be truly interpreted as an inverse temperature and the right-hand side in the analogue of (2.8) does not depend on \(\beta \), while the drawback is that we cannot turn off the interaction by setting \(\beta =0\) and recover the OU process as in (1.3). It is worth mentioning that, for instance, Theorem 1.7 remains the same for the process solving (2.19); in particular, the cutoff threshold is at critical time \(\frac{c_n}{\alpha }\) and does not depend on \(\beta \).
2.10 Discrete models
There are several discrete space Markov processes admitting the OU process as a scaling limit, such as the random walk on the discrete hypercube, related to the Ehrenfest model, for which the cutoff has been studied in [29, 30], and the M/M/\(\infty \) queuing process, for which a discrete Mehler formula is available [24]. Certain discrete space Markov processes incorporate a singular repulsion mechanism, such as the exclusion process on the segment, for which the study of the cutoff in [48] shares similarities with our proof of Theorem 1.7. It is worth noting that there are discrete Coulomb gases, related to orthogonal polynomials for discrete measures, suggesting the study of discrete Dyson processes. More generally, it could be natural to study the cutoff phenomenon for Markov processes on infinite discrete state spaces, under a curvature condition, even if the subject is notoriously disappointing in terms of high-dimensional analysis. We refer to the recent work [61] for the finite state space case.
3 Cutoff phenomenon for the OU
In this section, we prove Theorems 1.1 and 1.2: actually we only prove the latter since it implies the former. We start by recalling a well-known fact.
Lemma 3.1
(Mehler formula). If \({(Y_t)}_{t\ge 0}\) is an OU process in \({\mathbb {R}}^d\), solution of the stochastic differential equation \(\mathrm {d}Y_t=\sigma \mathrm {d}B_t-\mu Y_t\mathrm {d}t\) with \(Y_0=y_0\in {\mathbb {R}}^d\), for parameters \(\sigma >0\) and \(\mu >0\), where B is a standard d-dimensional Brownian motion, then
Moreover its coordinates are independent one-dimensional OU processes with initial condition \(y_0^i\) and invariant law \({\mathcal {N}}(0,\frac{\sigma ^2}{2\mu })\), \(1\le i\le d\).
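Since the Mehler formula gives the law of \(Y_t\) as an explicit Gaussian, it allows exact sampling of \(Y_t\) with no time discretization. A minimal sketch, with helper names of our own choosing:

```python
import math
import numpy as np

# By the Mehler formula, the OU process dY_t = sigma dB_t - mu Y_t dt
# started at y0 satisfies
#   Y_t ~ N( y0 e^{-mu t}, sigma^2 (1 - e^{-2 mu t}) / (2 mu) * I_d ).

def ou_mehler_params(y0, sigma, mu, t):
    """Exact mean vector and common per-coordinate variance of Y_t."""
    mean = np.asarray(y0, dtype=float)*math.exp(-mu*t)
    var = sigma**2*(1.0 - math.exp(-2.0*mu*t))/(2.0*mu)
    return mean, var

def ou_mehler_sample(y0, sigma, mu, t, rng):
    """Draw one exact sample of Y_t."""
    mean, var = ou_mehler_params(y0, sigma, mu, t)
    return mean + math.sqrt(var)*rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
y = ou_mehler_sample([1.0, -2.0, 0.5], 1.0, 1.0, 0.7, rng)
assert y.shape == (3,)
```

As \(t\rightarrow \infty \) the parameters converge to those of the invariant law \({\mathcal {N}}(0,\frac{\sigma ^2}{2\mu }I_d)\), consistent with the last sentence of the lemma.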
Proof of Theorem 1.1 and Theorem 1.2
By using Lemma 3.1, for all \(n\ge 1\) and \(t\ge 0\),
Hellinger, Kullback, \(\chi ^2\), Fisher, and Wasserstein cutoffs. A direct computation from (3.1) or Lemma A.5, either via multivariate Gaussian formulas or via tensorization of the univariate ones, gives
which gives the desired lower and upper bounds as before by using the hypothesis on \(z^n_0\).
Total variation cutoff. By using the comparison between total variation and Hellinger distances (Lemma A.1) we deduce from (3.2) the cutoff in total variation distance at the same critical time. The upper bound for the total variation distance can alternatively be obtained by using the \(\mathrm {Kullback}\) estimate (3.3) and the Pinsker–Csiszár–Kullback inequality (Lemma A.1). Since both distributions are tensor products, we could use alternatively the tensorization property of the total variation distance (Lemma A.4) together with the one-dimensional version of the Gaussian formula for \(\mathrm {Kullback}\) (Lemma A.1) to obtain the result for the total variation. \(\square \)
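As a numerical illustration of the comparisons invoked above (a sketch of ours, not Lemma A.1 itself): for one-dimensional Gaussians \({\mathcal {N}}(m,s^2)\) and \({\mathcal {N}}(0,s^2)\), both \(\mathrm {TV}=\mathrm {erf}\bigl (\frac{|m|}{2\sqrt{2}s}\bigr )\) and \(\mathrm {Kullback}=\frac{m^2}{2s^2}\) are explicit, so the Pinsker–Csiszár–Kullback bound \(\mathrm {TV}\le \sqrt{\mathrm {Kullback}/2}\) can be verified directly.

```python
import math

# Explicit TV and KL between N(m, s^2) and N(0, s^2): the equal-variance
# densities cross at m/2, giving TV = erf(|m| / (2 sqrt(2) s)), while
# KL = m^2 / (2 s^2). Check Pinsker: TV <= sqrt(KL / 2).

def tv_gauss(m, s):
    return math.erf(abs(m)/(2.0*math.sqrt(2.0)*s))

def kl_gauss(m, s):
    return m*m/(2.0*s*s)

for m in [0.1, 0.5, 1.0, 3.0]:
    s = 1.0
    assert tv_gauss(m, s) <= math.sqrt(kl_gauss(m, s)/2.0) + 1e-12
```

The bound is nearly saturated for small m, which is why the TV and Kullback cutoffs occur at the same critical time.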
Remark 3.2
(Competition between bias and variance mixing). From the computations of the proof of Theorem 1.2, we can show that for \(\mathrm {dist} \in \{\mathrm {TV}, \mathrm {Hellinger}, \chi ^2\}\)
has a cutoff at time \(c_n^{A} = \log (\sqrt{n} |z^n_0|)\), while
admits a cutoff at time \(c_n^{B} = \frac{1}{4} \log (n)\). The triangle inequality for \(\mathrm {dist}\) yields
Therefore the critical time of Theorem 1.2 is dictated by either \(A_t\) or \(B_t\), according to whether \(c_n^A \gg c_n^B\) or \(c_n^A \ll c_n^B\). This can be seen as a competition between bias and variance mixing.
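The competition can be made concrete on two regimes of initial conditions (an illustrative sketch with arbitrary sample values of our own choosing):

```python
import math

# Compare the bias time c_n^A = log(sqrt(n) |z0|) with the variance time
# c_n^B = (1/4) log(n) for two regimes of the initial norm |z0|.

def c_A(n, norm_z0):
    return math.log(math.sqrt(n)*norm_z0)

def c_B(n):
    return 0.25*math.log(n)

n = 10**6
# |z0| of order 1: bias dominates, c_A ~ (1/2) log n >> c_B.
assert c_A(n, 1.0) > c_B(n)
# |z0| of order n^{-1/2}: c_A stays bounded, variance mixing dominates.
assert c_A(n, 10.0/math.sqrt(n)) < c_B(n)
```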
Remark 3.3
(Total variation discriminating event for small initial conditions). Let us introduce the random variable \(Z^n_\infty \sim P_n^0={\mathcal {N}}(0,\frac{1}{n}I_n)={\mathcal {N}}(0,\frac{1}{n})^{\otimes n}\), in accordance with (3.1). We have
We can check, using an explicit computation of Hellinger and Kullback between Gamma distributions and the comparison between total variation and Hellinger distances (Lemma A.1), that
admits a cutoff at time \(c_n^C = c_n^B = \frac{1}{4} \log (n)\). Moreover, one can exhibit a discriminating event for the TV distance. Namely, we can observe that
with \(\alpha _t\) the unique point where the two densities meet, which happens to be
From the explicit expressions (3.2), (3.3), (3.4), (3.5), (3.6), it is immediate to extract the cutoff profile associated to the convergence of \(\mathrm {Law}(Z_t^n)\) to \(P_n^0\) in Hellinger, Kullback, \(\chi ^2\), Fisher and Wasserstein. For Wasserstein we already know by Theorem 1.2 that a cutoff occurs if and only if \(|z_0^n|\underset{n\rightarrow \infty }{\rightarrow } \infty \). In this case, regarding the profile, we have
where for all \(b\in {\mathbb {R}}\),
For the other distances and divergences, let us assume that the following limit exists
This quantity can be related with
which were already introduced in Remark 3.2. Indeed
while \(a\in (0,\infty )\) is equivalent to \(c_n^A \asymp c_n^B\).
Then, for \(\mathrm {dist}\in \{\mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Fisher}\}\), we have, for all \(b\in {\mathbb {R}}\),
where \(t_{n,b}\) and \(\phi (b)\) are as in Table 1. The cutoff window is always of size 1.
Since the total variation distance is not expressed in a simple explicit manner, further computations are needed to extract the precise cutoff profile, which is given in the following lemma:
Lemma 3.4
(Cutoff profile in \(\mathrm {TV}\) for OU). Let \(Z^n=(Z_t^n)_{t\ge 0}\) be the OU process (1.8), started from \(z_0^n\in {\mathbb {R}}^n\), and let \(P_n^0\) be its invariant law. Assume as in (3.9) that \(a:=\lim _{n\rightarrow \infty }|z_0^n|^2\sqrt{n}\in [0,+\infty ]\), and let \(t_{n,b}\) be as in Table 1 for Hellinger. Then, for all \(b\in {\mathbb {R}}\), we have
where
where \(\displaystyle \mathrm {erf}(u):=\frac{1}{\sqrt{\pi }}\int _{|t|\le u}\mathrm {e}^{-t^2}\mathrm {d}t={\mathbb {P}}(|X|\le \sqrt{2}u)\) with \(X\sim {\mathcal {N}}(0,1)\) is the error function.
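The identity \(\mathrm {erf}(u)={\mathbb {P}}(|X|\le \sqrt{2}u)\) can be sanity-checked by midpoint integration of the standard normal density (a quick sketch of ours):

```python
import math

# Check erf(u) = P(|X| <= sqrt(2) u) for X ~ N(0, 1) by integrating the
# standard normal density over [-a, a] with the midpoint rule.

def prob_abs_normal_le(a, m=200000):
    """Midpoint approximation of P(|X| <= a), X ~ N(0, 1)."""
    h = 2.0*a/m
    s = 0.0
    for k in range(m):
        x = -a + (k + 0.5)*h
        s += math.exp(-0.5*x*x)
    return s*h/math.sqrt(2.0*math.pi)

for u in [0.2, 1.0, 2.5]:
    assert abs(math.erf(u) - prob_abs_normal_le(math.sqrt(2.0)*u)) < 1e-8
```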
Proof of Lemma 3.4
The idea is to exploit the fact that we consider Gaussian product measures (the covariance matrices are multiples of the identity), which allows a finer analysis than for instance in [27, Le. 3.1]. We begin with a rather general step. Let \(\mu \) and \(\nu \) be two probability measures on \({\mathbb {R}}^n\) with densities f and g with respect to the Lebesgue measure \(\mathrm {d}x\). We then have
and since
we obtain
In particular, when \(\mu ={\mathcal {N}}(m_1,\sigma _1^2I_n)\) and \(\nu ={\mathcal {N}}(m_2,\sigma _2^2I_n)\) then \(g(x)\le f(x)\) is equivalent to
for all \(x\in {\mathbb {R}}^n\), and therefore, with \(Z_1\sim \mu \) and \(Z_2\sim \nu \), we get
Let us assume from now on that \(m_2=0\) and \(\sigma _1\ne \sigma _2\). We can then gather the quadratic terms as
We observe at this step that the random variable \(\frac{|Z_1-{\tilde{m}}_1|^2}{\sigma _1^2}\) follows a noncentral chi-squared distribution, which depends only on n and on the noncentrality parameter
Similarly, the random variable \(\frac{|Z_2-{\tilde{m}}_1|^2}{\sigma _2^2}\) follows a noncentral chi-squared distribution, which depends only on n and on the noncentrality parameter
It follows that the law of \(\psi (Z_1)\) and the law of \(\psi (Z_2)\) depend on \(m_1\) only through \(|m_1|\). Hence
where \(X_1,\ldots ,X_n\) and \(Y_1,\ldots ,Y_n\) are two sequences of i.i.d. random variables whose means and variances depend only (and explicitly) on \(|m_1|\), \(\sigma _1\), \(\sigma _2\). Note in particular that these means and variances are given by \(\frac{1}{n}\) times those of \(\psi (Z_1)\) and \(\psi (Z_2)\). Now we specialize to the case where \(\mu =\mathrm {Law}(Z^n_t)={\mathcal {N}}(z_0^n\mathrm {e}^{-t},\frac{1-\mathrm {e}^{-2t}}{n}I_n)\) and \(\nu =\mathrm {Law}(Z^n_\infty )={\mathcal {N}}(0,\frac{1}{n}I_n)=P_n^0\), and we find
while
Let \(t=t_{n,b}\) be as in Table 1 for Hellinger. Using (3.15) and the central limit theorem for the i.i.d. random variables \(X_1,\ldots ,X_n\) and \(Y_1,\ldots ,Y_n\), we get, with \(Z\sim {\mathcal {N}}(0,1)\),
where
Expanding \(\gamma _{n,t_{n,b} }\) gives the cutoff profile. Let us detail the computations in the most involved case \(\lim _{n\rightarrow \infty }|z_0^n|^2\sqrt{n}=a\in (0,+\infty )\). For all \(b\in {\mathbb {R}}\), recall \(t_{n,b}=\frac{\log n}{4}+b\). One may check that
It follows that
The other cases are similar. \(\square \)
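The Gaussian total variation computations above lend themselves to a quick numerical sanity check. The sketch below is ours, not from the paper: it estimates \(\mathrm {TV}({\mathcal {N}}(m_1,\sigma _1^2I_n),{\mathcal {N}}(0,\sigma _2^2I_n))\) by Monte Carlo via the identity \(\mathrm {TV}(\mu ,\nu )=\mu (f>g)-\nu (f>g)\) for densities f and g, and illustrates that the result depends on \(m_1\) only through \(|m_1|\), as used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def tv_gaussian_mc(m1, s1, s2, n, samples=200_000):
    """Monte Carlo estimate of TV(N(m1, s1^2 I_n), N(0, s2^2 I_n)) using
    TV(mu, nu) = mu(f > g) - nu(f > g), where f, g are the two densities."""
    m1 = np.asarray(m1, dtype=float)

    def log_ratio(z):
        # log f(z) - log g(z) for f = N(m1, s1^2 I_n), g = N(0, s2^2 I_n)
        q1 = np.sum((z - m1) ** 2, axis=1) / s1 ** 2
        q2 = np.sum(z ** 2, axis=1) / s2 ** 2
        return -0.5 * q1 + 0.5 * q2 - n * (np.log(s1) - np.log(s2))

    z1 = m1 + s1 * rng.standard_normal((samples, n))   # samples from mu
    z2 = s2 * rng.standard_normal((samples, n))        # samples from nu
    return np.mean(log_ratio(z1) > 0) - np.mean(log_ratio(z2) > 0)
```

For \(n=1\), \(\sigma _1=\sigma _2=1\), \(m_1=1\), the exact value \(2\Phi (1/2)-1\approx 0.3829\) is recovered, and rotating \(m_1\) at fixed \(|m_1|\) leaves the estimate unchanged up to Monte Carlo error.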
4 General exactly solvable aspects
In this section, we prove Theorem 1.3 and Corollary 1.4.
The proof of Theorem 1.3 is based on the fact that the polynomial functions \(\pi (x)=x_1+\ldots +x_n\) and \(|x|^2=x_1^2+\ldots +x_n^2\) are, up to an additive constant for the second, eigenfunctions of the dynamics associated with the spectral values \(-1\) and \(-2\) respectively, and that their “carré du champ” is affine. In the matrix cases \(\beta \in \{1,2\}\), these functions correspond to the dynamics of the trace, the dynamics of the squared Hilbert–Schmidt trace norm, and the dynamics of the squared trace. It is remarkable that this phenomenon survives beyond these matrix cases, yet another manifestation of the Gaussian “ghosts” concept due to Edelman, see for instance [34].
Proof of Theorem 1.3
The process \(Y_t := \pi (X_t^n)\) solves
By symmetry, the double sum vanishes. Note that the process \(W_t := \frac{1}{\sqrt{n}} \sum _{i=1}^n B^i_t\) is a standard one-dimensional BM, so that \( \mathrm {d}Y_t = \sqrt{2} \mathrm {d}W_t - Y_t \mathrm {d}t. \) This proves the first part of the statement.
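The first part of the statement can be probed by simulation. The sketch below is ours and assumes one standard normalization of the DOU system, \(\mathrm {d}X^i=\sqrt{2/n}\,\mathrm {d}B^i-X^i\,\mathrm {d}t+\frac{\beta }{n}\sum _{j\ne i}\frac{\mathrm {d}t}{X^i-X^j}\), consistent with the OU and CIR parameters of Theorem 1.3 (the exact form of (1.3) should be checked against the paper). It runs an Euler scheme and checks \({\mathbb {E}}[\pi (X_t)]=\pi (x_0)\mathrm {e}^{-t}\); the pairwise drift is clipped symmetrically for numerical stability, which leaves \(\pi \) unaffected since the clipping is odd.

```python
import numpy as np

rng = np.random.default_rng(1)

def dou_paths(x0, beta, t, dt=1e-3, paths=2000, clip=1e3):
    """Euler scheme for the (assumed) DOU normalization
       dX^i = sqrt(2/n) dB^i - X^i dt + (beta/n) sum_{j != i} dt / (X^i - X^j).
    Returns the terminal configurations, one row per path."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    x = np.tile(x0, (paths, 1))
    idx = np.arange(n)
    for _ in range(int(round(t / dt))):
        d = x[:, :, None] - x[:, None, :]      # pairwise gaps X^i - X^j
        d[:, idx, idx] = np.inf                # suppress self-interaction
        with np.errstate(divide="ignore"):
            # odd clipping keeps the pairwise drifts exactly antisymmetric,
            # so the interaction cancels in pi(X) as in the proof
            inter = (beta / n) * np.clip(1.0 / d, -clip, clip).sum(axis=2)
        x = x + (-x + inter) * dt \
              + np.sqrt(2.0 * dt / n) * rng.standard_normal(x.shape)
    return x
```

With \(x_0=(0,1,2,3,4)\), \(\beta =2\), the empirical mean of \(\pi (X_1)\) matches \(10\,\mathrm {e}^{-1}\) up to statistical error, as predicted by the OU equation for \(Y=\pi (X)\).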
We turn to the second part. Recall that \(X_t \in D_n\) for all \(t>0\). By Itô’s formula
Set \(W_t := \sum _{i=1}^n \int _0^t \frac{X_s^{n,i}}{|X^n_s|} \mathrm {d}B^i_s\). The process \({(W_t)}_{t\ge 0}\) is a BM by the Lévy characterization since
Furthermore, a simple computation shows that
Consequently the process \(R_t := |X_t^n|^2\) solves
and is therefore a CIR process of parameters \(a=2+\beta (n-1)\), \(b=2\), and \(\sigma = \sqrt{8/n}\).
When \(d=\frac{\beta }{2}n^2 + (1-\frac{\beta }{2})n\) is a positive integer, the last property of the statement follows from the connection between OU and CIR recalled right before the statement of the theorem. \(\square \)
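Since the noise term of the CIR equation \(\mathrm {d}R=(a-bR)\,\mathrm {d}t+\sigma \sqrt{R}\,\mathrm {d}W\) is centered, the mean \(m(t):={\mathbb {E}}[R_t]\) solves \(m'=a-bm\), hence \(m(t)=\frac{a}{b}+(m(0)-\frac{a}{b})\mathrm {e}^{-bt}\). This relaxation can be checked with a minimal Euler sketch (ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def cir_mean(r0, a, b, sigma, t, dt=1e-3, paths=20_000):
    """Euler scheme for the CIR equation dR = (a - b R) dt + sigma sqrt(R) dW,
    reflected at 0 to keep the discretized paths nonnegative."""
    r = np.full(paths, float(r0))
    for _ in range(int(round(t / dt))):
        r += (a - b * r) * dt \
             + sigma * np.sqrt(np.maximum(r, 0.0) * dt) * rng.standard_normal(paths)
        r = np.abs(r)  # reflection scheme (never triggered here, r stays large)
    return r.mean()

# For R = |X^n|^2 with n = 5 and beta = 2, Theorem 1.3 gives
# a = 2 + beta*(n-1) = 10, b = 2, sigma = sqrt(8/n),
# so m(1) = a/b + (r0 - a/b) * exp(-b) = 5 + 25 * exp(-2).
```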
The last proof actually relies on the following general observation. Let X be an n-dimensional continuous semi-martingale solution of
where B is an n-dimensional standard BM, and where
are Lipschitz. The infinitesimal generator of the Markov semigroup is given by
for all \(f\in {\mathcal {C}}^2({\mathbb {R}}^n,{\mathbb {R}})\) and \(x\in {\mathbb {R}}^n\). Then, by Itô’s formula, the process \(M^f={(M^f_t)}_{t\ge 0}\) given by
is a local martingale, and moreover, for all \(t\ge 0\),
The functional quadratic form \(\Gamma \) is known as the “carré du champ” operator.
If f is an eigenfunction of \({{\,\mathrm{G}\,}}\) associated to the spectral value \(\lambda \) in the sense that \({{\,\mathrm{G}\,}}f=\lambda f\) (note by the way that \(\lambda \le 0\) since \({{\,\mathrm{G}\,}}\) generates a Markov process), then we get
Now if \(\Gamma (f) = c\) (as in the first part of the theorem), then by the Lévy characterization of Brownian motion, the continuous local martingale \(W:=\frac{1}{\sqrt{c}}M^f\) starting from the origin is a standard BM and we recover the result of the first part of the theorem. On the other hand, if \(\Gamma (f) = cf\) (as in the second part of the theorem), then by the Lévy characterization of BM the local martingale
is a standard BM and we recover the result of the second part.
At this point, we observe that the infinitesimal generator of the CIR process R is the Laguerre partial differential operator
This operator leaves invariant the set of polynomials of degree less than or equal to k, for every integer \(k\ge 0\), a property inherited from (2.5). We will use this property in the following proof.
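The eigenfunction and carré du champ identities can be verified symbolically in the simplest caricature: the one-dimensional OU generator \({{\,\mathrm{G}\,}}f=f''-xf'\) (the non-interacting, \(n=1\) toy model, not the DOU generator with its 1/n scaling; this illustration is ours). There, x is an eigenfunction with \(\lambda =-1\), \(x^2-1\) with \(\lambda =-2\), \(\Gamma (x)\) is constant, and \(\Gamma (x^2)\) is proportional to \(x^2\), mirroring the two cases used in the proof.

```python
import sympy as sp

x = sp.symbols('x')

def G(f):
    # one-dimensional OU generator: G f = f'' - x f'
    return sp.diff(f, x, 2) - x * sp.diff(f, x)

def Gamma(f, g):
    # carre du champ: Gamma(f, g) = (1/2) (G(fg) - f G g - g G f) = f' g'
    return sp.simplify(sp.Rational(1, 2) * (G(f * g) - f * G(g) - g * G(f)))

assert sp.simplify(G(x) + x) == 0                      # G x = -x
assert sp.simplify(G(x**2 - 1) + 2 * (x**2 - 1)) == 0  # G(x^2 - 1) = -2(x^2 - 1)
assert Gamma(x, x) == 1                                # constant carre du champ
assert sp.simplify(Gamma(x**2, x**2) - 4 * x**2) == 0  # Gamma(f) proportional to f
```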
4.1 Proof of Corollary 1.4
By Theorem 1.3, \(Z = \pi (X^n)\) is an OU process in \({\mathbb {R}}\) solution of the stochastic differential equation
where B is a standard one-dimensional BM. By Lemma 3.1, \(Z_t \sim {\mathcal {N}}(Z_0 \mathrm {e}^{-t}, 1-\mathrm {e}^{-2t})\) for all \(t\ge 0\) and the equilibrium distribution is \(P_n^\beta \circ \pi ^{-1} = {\mathcal {N}}(0, 1)\). Using the contraction property stated in Lemma A.2, the comparison between Hellinger and TV of Lemma A.1 and the explicit expressions for Gaussian distributions of Lemma A.5, we find
Setting \(c_n := \log (\vert \pi (X^n_0)\vert )\) and assuming that \(\lim _{n\rightarrow \infty }c_n=\infty \), we deduce that for all \(\varepsilon \in (0,1)\)
The comparison between \(\mathrm {Hellinger}\) and \(\mathrm {TV}\) of Lemma A.1 allows us to deduce that this remains true for the Hellinger distance.
We turn to Kullback. The contraction property stated in Lemma A.2 and the explicit expressions for Gaussian distributions of Lemma A.5 yield
This is enough to deduce that
The situation is similar for \(\chi ^2\): the contraction property stated in Lemma A.2 and the explicit expressions for Gaussian distributions of Lemma A.5 yield
so that
Regarding the Wasserstein distance, we have \(\left\| \pi \right\| _{\mathrm {Lip}}:=\sup _{x\ne y}\frac{|\pi (x)-\pi (y)|}{|x-y|}\le \sqrt{n}\) from the Cauchy–Schwarz inequality, and by Lemma A.2, for all probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^n\),
Using the explicit expressions for Gaussian distributions of Lemma A.5, we thus find
Setting \(c_n:= \log \Big (\frac{\vert \pi (x_0^n)\vert }{\sqrt{n}}\Big )\) and assuming \(c_n \rightarrow \infty \) as \(n\rightarrow \infty \), we thus deduce that for all \(\varepsilon \in (0,1)\)
5 The random matrix cases
In this section, we prove Theorem 1.5 and Corollary 1.6 that cover the matrix cases \(\beta \in \{1,2\}\). For these values of \(\beta \), the DOU process is the image by the spectral map of a matrix OU process, connected to the random matrix models \(\mathrm {GOE}\) and \(\mathrm {GUE}\). We could also consider the case \(\beta =4\), related to \(\mathrm {GSE}\). Beyond these three algebraic cases, it could be possible for an arbitrary \(\beta \ge 1\) to use the random tridiagonal matrix dynamics associated with \(\beta \) Dyson processes, see for instance [44].
The next two subsections are devoted to the proof of Theorem 1.5 in the \(\beta =2\) and \(\beta = 1\) cases respectively. The third subsection provides the proof of Corollary 1.6.
5.1 Hermitian case (\(\beta =2\))
Let \(\mathrm {Herm}_n\) be the set of \(n\times n\) complex Hermitian matrices, namely the set of \(h\in {\mathcal {M}}_{n,n}({\mathbb {C}})\) with \(h_{i,j}=\overline{h_{j,i}}\) for all \(1\le i,j\le n\). An element \(h\in \mathrm {Herm}_n\) is parametrized by the \(n^2\) real variables \((h_{i,i})_{1\le i\le n}\), \((\Re h_{i,j})_{1\le i<j\le n}\), \((\Im h_{i,j})_{1\le i<j\le n}\). We define, for \(h\in \mathrm {Herm}_n\) and \(1\le i,j\le n\),
Note that
We thus identify \(\mathrm {Herm}_n\) with \({\mathbb {R}}^n\times {\mathbb {R}}^{2\frac{n^2-n}{2}}={\mathbb {R}}^{n^2}\); this identification is isometric provided \(\mathrm {Herm}_n\) is endowed with the norm \(\sqrt{\mathrm {Tr}(h^2)}\) and \({\mathbb {R}}^{n^2}\) with the Euclidean norm.
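The \(\sqrt{2}\) scaling of the off-diagonal coordinates is precisely what makes this identification isometric, since \(\mathrm {Tr}(h^2)=\sum _i h_{i,i}^2+2\sum _{i<j}|h_{i,j}|^2\). A quick numerical check (the coordinate map below is our explicit rendering of the maps \(\pi _{i,j}\)):

```python
import numpy as np

rng = np.random.default_rng(3)

def herm_coords(h):
    """Real coordinates of a Hermitian matrix: diagonal entries, then
    sqrt(2) * real and sqrt(2) * imaginary parts of the strict upper
    triangle, so that the Euclidean norm equals sqrt(Tr(h^2))."""
    n = h.shape[0]
    iu = np.triu_indices(n, k=1)
    return np.concatenate([np.real(np.diag(h)),
                           np.sqrt(2) * np.real(h[iu]),
                           np.sqrt(2) * np.imag(h[iu])])

a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
h = (a + a.conj().T) / 2  # a random Hermitian matrix
assert np.isclose(np.linalg.norm(herm_coords(h)),
                  np.sqrt(np.trace(h @ h).real))
```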
The Gaussian Unitary Ensemble \(\mathrm {GUE}_n\) is the Gaussian law on \(\mathrm {Herm}_n\) with density
If H is a random \(n\times n\) Hermitian matrix then \(H\sim \mathrm {GUE}_n\) if and only if the \(n^2\) real random variables \(\pi _{i,j}(H)\), \(1\le i,j\le n\), are independent Gaussian random variables with
The law \(\mathrm {GUE}_n\) is the unique invariant law of the Hermitian matrix OU process \({(H_t)}_{t\ge 0}\) on \(\mathrm {Herm}_n\) solution of the stochastic differential equation
where \(B={(B_t)}_{t\ge 0}\) is a Brownian motion on \(\mathrm {Herm}_n\), in the sense that the stochastic processes \((\pi _{i,j}(B_t))_{t\ge 0}\), \(1\le i,j \le n\), are independent standard one-dimensional BM. The coordinate processes \({(\pi _{i,j}(H_t))}_{t\ge 0}\), \(1\le i,j\le n\), are independent real OU processes.
For any h in \(\mathrm {Herm}_n\), we denote by \(\Lambda (h)\) the vector of the eigenvalues of h ordered in non-decreasing order. Lemma 5.1 below is an observation which dates back to the seminal work of Dyson [33], hence the name DOU for \(X^n\). We refer to [37, Ch. 12] and [2, Sec. 4.3] for a mathematical approach using modern stochastic calculus.
Lemma 5.1
(From matrix OU to DOU) The image of \(\mathrm {GUE}_n\) by the map \(\Lambda \) is the Coulomb gas \(P_n^\beta \) given by (1.6) with \(\beta =2\). Moreover the stochastic process \(X^n={(X^n_t)}_{t\ge 0}={(\Lambda (H_t))}_{t\ge 0}\) is well-defined and solves the stochastic differential equation (1.3) with \(\beta =2\) and \(x_0^n=\Lambda (h_0)\).
Let \(\beta =2\). Let us assume from now on that the initial value \(h_0\in \mathrm {Herm}_n\) of \({(H_t)}_{t\ge 0}\) has eigenvalues \(x_0^n\) where \(x^n_0\) is as in Theorem 1.5. We start by proving the upper bound on the \(\chi ^2\) distance stated in Theorem 1.5: it will be an adaptation of the proof of the upper bound of Theorem 1.1 applied to the Hermitian matrix OU process \({(H_t)}_{t\ge 0}\) combined with the contraction property of the \(\chi ^2\) distance. Indeed, by Lemma 5.1 and the contraction property of Lemma A.2
We claim now that the right-hand side tends to 0 as \(n\rightarrow \infty \) when \(t=t_n\) is well chosen. Indeed, using the identification between \(\mathrm {Herm}_n\) and \({\mathbb {R}}^{n^2}\) mentioned earlier, we have \(\mathrm {GUE}_n={\mathcal {N}}(m_2,\Sigma _2)\) where \(m_2=0\) and where \(\Sigma _2\) is an \(n^2\times n^2\) diagonal matrix with
On the other hand, the Mehler formula (Lemma 3.1) gives \(\mathrm {Law}(H_t)={\mathcal {N}}(m_1,\Sigma _1)\) where \(m_1=\mathrm {e}^{-t}h_0\) and where \(\Sigma _1\) is an \(n^2\times n^2\) diagonal matrix with
Therefore, using Lemma A.5, the analogue of (3.3) reads
where
Taking now \(c_n := \log (\sqrt{n} |x_0^n|) \vee \log (\sqrt{n})\), for any \(\varepsilon \in (0,1)\), we get
In the right-hand side of (5.8), the factor \(n^2\) is the dimension of the space \({\mathbb {R}}^{n^2}\) with which \(\mathrm {Herm}_n\) is identified, while the factor n in the first term is due to the 1/n scaling in the stochastic differential equation of the process. This explains the difference with the analogue (3.3) in dimension n.
From the comparison between TV, Hellinger, Kullback and \(\chi ^2\) stated in Lemma A.1, we easily deduce that the previous convergence remains true upon replacing \(\chi ^2\) by \(\mathrm {TV}\), \(\mathrm {Hellinger}\) or \(\mathrm {Kullback}\).
It remains to cover the upper bound for the Wasserstein distance. This distance is more sensitive to contraction arguments: according to Lemma A.2, one needs to control the Lipschitz norm of the “contraction map” at stake. It turns out that the spectral map, restricted to the set \(\mathrm {Herm}_n\) of \(n\times n\) Hermitian matrices, is 1-Lipschitz: more precisely, the Hoffman–Wielandt inequality, see [43] and [45, Th. 6.3.5], asserts that for any two such matrices A and B, denoting \(\Lambda (A)=(\lambda _i(A))_{1\le i \le n}\) and \(\Lambda (B)=(\lambda _i(B))_{1\le i \le n}\) the ordered sequences of their eigenvalues, we have
Applying Lemma A.2, we thus deduce that
Following the Gaussian computations in the proof of Theorem 1.2, we obtain
Set \(c_n := \log (|x_0^n|)\). If \(c_n\rightarrow \infty \) as \(n\rightarrow \infty \) then for all \(\varepsilon \in (0,1)\) we find
This completes the proof of Theorem 1.5.
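The Hoffman–Wielandt inequality used for the Wasserstein bound is easy to test numerically. The sketch below (ours) checks \(|\Lambda (A)-\Lambda (B)|\le \sqrt{\mathrm {Tr}((A-B)^2)}\) on random Hermitian matrices, using the fact that `eigvalsh` returns eigenvalues in non-decreasing order, matching the convention for \(\Lambda \).

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_herm(n):
    """A random n x n Hermitian matrix."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

for _ in range(100):
    A, B = rand_herm(6), rand_herm(6)
    # eigvalsh returns the ordered spectra Lambda(A), Lambda(B)
    lhs = np.linalg.norm(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B))
    rhs = np.linalg.norm(A - B)  # Hilbert-Schmidt norm of the difference
    assert lhs <= rhs + 1e-10
```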
5.2 Symmetric case (\(\beta =1\))
The method is similar to the case \(\beta =2\). Let us focus only on the differences. Let \(\mathrm {Sym}_n\) be the set of \(n\times n\) real symmetric matrices, namely the set of \(s\in {\mathcal {M}}_{n,n}({\mathbb {R}})\) with \(s_{i,j}=s_{j,i}\) for all \(1\le i,j\le n\). An element \(s\in \mathrm {Sym}_n\) is parametrized by the \(n+\frac{n^2-n}{2}=\frac{n(n+1)}{2}\) real variables \((s_{i,j})_{1\le i\le j\le n}\). We define, for \(s\in \mathrm {Sym}_n\) and \(1\le i\le j\le n\),
Note that
We thus identify isometrically \(\mathrm {Sym}_n\), endowed with the norm \(\sqrt{\mathrm {Tr}(s^2)}\), with \({\mathbb {R}}^n\times {\mathbb {R}}^{\frac{n^2-n}{2}}={\mathbb {R}}^{\frac{n(n+1)}{2}}\) endowed with the Euclidean norm.
The Gaussian Orthogonal Ensemble \(\mathrm {GOE}_n\) is the Gaussian law on \(\mathrm {Sym}_n\) with density
If S is a random \(n\times n\) real symmetric matrix then \(S\sim \mathrm {GOE}_n\) if and only if the \(\frac{n(n+1)}{2}\) real random variables \(\pi _{i,j}(S)\), \(1\le i\le j\le n\), are independent Gaussian random variables with
The law \(\mathrm {GOE}_n\) is the unique invariant law of the real symmetric matrix OU process \({(S_t)}_{t\ge 0}\) on \(\mathrm {Sym}_n\) solution of the stochastic differential equation
where \(B={(B_t)}_{t\ge 0}\) is a Brownian motion on \(\mathrm {Sym}_n\), in the sense that the stochastic processes \((\pi _{i,j}(B_t))_{t\ge 0}\), \(1\le i\le j \le n\), are independent standard one-dimensional BM. The coordinate processes \({(\pi _{i,j}(S_t))}_{t\ge 0}\), \(1\le i\le j\le n\), are independent real OU processes.
For any s in \(\mathrm {Sym}_n\), we denote by \(\Lambda (s)\) the vector of the eigenvalues of s ordered in non-decreasing order. Lemma 5.2 below is the real symmetric analogue of Lemma 5.1.
Lemma 5.2
(From matrix OU to DOU) The image of \(\mathrm {GOE}_n\) by the map \(\Lambda \) is the Coulomb gas \(P_n^\beta \) given by (1.6) with \(\beta =1\). Moreover the stochastic process \(X^n={(X^n_t)}_{t\ge 0}={(\Lambda (S_t))}_{t\ge 0}\) is well-defined and solves the stochastic differential equation (1.3) with \(\beta =1\) and \(x_0^n=\Lambda (s_0)\).
As for the case \(\beta =2\), the idea now is that the DOU process is sandwiched between a real OU process and a matrix OU process.
By similar computations to the case \(\beta =2\), the analogue of (5.8) becomes
This allows us to deduce the upper bound for TV, Hellinger, Kullback and \(\chi ^2\). Regarding the Wasserstein distance, the analogue of (5.12) reads
If \(\lim _{n\rightarrow \infty }\log (|x_0^n|)=\infty \) then we deduce the asserted result, concluding the proof of Theorem 1.5.
5.3 Proof of Corollary 1.6
Let \(\beta \in \{1,2\}\). Recall the definitions of \(a_n\) and \(c_n\) from the statement. Take \(x_0^{n,i} = a_n\) for all i, and note that \(\pi (x_0^n) = n a_n\). Given our assumptions on \(a_n\), Corollary 1.4 yields for this particular choice of initial condition and for any \(\varepsilon \in (0,1)\)
On the other hand, in the proof of Theorem 1.5 we saw that
where \(b_n = n^2\) for \(\beta =2\) and \(b_n = \frac{n(n+1)}{2}\) for \(\beta =1\). Since \(|x_0^n| \le \sqrt{n} a_n\) for all \(x_0^n \in [-a_n,a_n]^n\), and given the comparison between TV, Hellinger, Kullback and \(\chi ^2\) stated in Lemma A.1, we obtain for \(\mathrm {dist} \in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2\}\) and for all \(\varepsilon \in (0,1)\)
thus concluding the proof of Corollary 1.6 regarding these distances.
Concerning Wasserstein, the proof of Theorem 1.5 shows that for any \(x_0^n \in [-a_n,a_n]^n\) we have
If \(\sqrt{n} a_n \rightarrow \infty \), then for \(c_n = \log (\sqrt{n}a_n)\) we deduce that for all \(\varepsilon \in (0,1)\)
6 Cutoff phenomenon for the DOU in TV and Hellinger
In this section, we prove Theorem 1.7 and Corollary 1.8 for the TV and Hellinger distances. We only consider the case \(\beta \ge 1\), although the arguments could be adapted mutatis mutandis to cover the case \(\beta = 0\): note that the result of Theorem 1.7 and Corollary 1.8 for \(\beta = 0\) can be deduced from Theorem 1.2. At the end of this section, we also provide the proof of Theorem 1.10.
6.1 Proof of Theorem 1.7 in TV and Hellinger
By the comparison between TV and Hellinger stated in Lemma A.1, it suffices to prove the result for the TV distance, so we concentrate on this distance until the end of this section. Our proof is based on the exponential decay of the relative entropy at an explicit rate given by the optimal logarithmic Sobolev constant. However, this requires the relative entropy of the initial condition to be finite. Consequently, we proceed in three steps. First, given an arbitrary initial condition \(x_0^n \in {\overline{D}}_n\), we build an absolutely continuous probability measure \(\mu _{x_0^n}\) on \(D_n\) that approximates \(\delta _{x_0^n}\) and whose relative entropy is not too large. Second, we derive a decay estimate starting from this regularized initial condition. Third, we control the total variation distance between the two processes starting respectively from \(\delta _{x_0^n}\) and \(\mu _{x_0^n}\).
6.1.1 Regularization
In order to have a finite relative entropy at time 0, we first regularize the initial condition by smearing out each particle in a ball of radius bounded below by \(n^{-(\kappa +1)}\), for some \(\kappa >0\). Let us first introduce the regularization at scale \(\eta \) of a Dirac distribution \(\delta _{z}\), \(z\in {\mathbb {R}}\), by
Given \(x \in {\overline{D}}_n\) and \(\kappa > 0\), we define a regularized version of \(\delta _{x}\) at scale \(n^{-\kappa }\), that we denote \(\mu _x\), by setting
where \(\eta := n^{-(\kappa +1 )}\). The parameters have been tuned in such a way that, independently of the choice of \(x\in {\overline{D}}_n\), the following properties hold. The supports of the Dirac masses \(\delta _{x_i+3i\eta }^{(\eta )}\), \(i\in \{1,\ldots ,n\}\), lie at distance at least \(\eta \) from each other. The volume of the support of \(\mu _x\) is equal to \(\eta ^n\), and therefore the relative entropy of \(\mu _x\) with respect to the Lebesgue measure is not too large. Finally, provided \(X_0^n = x\) and \(Y_0^n\) is distributed according to \(\mu _x\), almost surely \(\vert X_0^n - Y_0^n\vert _\infty \le (3n+1) \eta \).
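The three properties of the regularized measure can be checked on samples. In the sketch below (ours), we assume for concreteness that \(\delta _z^{(\eta )}\) is the uniform distribution on \([z,z+\eta ]\); the exact smearing kernel is a modelling choice here and does not affect the ordering, the separation of the supports, or the \(\ell ^\infty \) bound.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_mu_x(x, kappa):
    """One sample from the regularized measure mu_x: the i-th particle is
    shifted by 3*i*eta and smeared uniformly over an interval of length eta
    (the uniform kernel is our assumption on delta_z^{(eta)})."""
    n = len(x)
    eta = float(n) ** (-(kappa + 1))
    i = np.arange(1, n + 1)
    return x + 3 * i * eta + eta * rng.random(n)

n, kappa = 10, 2.0
eta = float(n) ** (-(kappa + 1))
x = np.sort(rng.standard_normal(n))   # a point of the closed Weyl chamber
y = sample_mu_x(x, kappa)
assert np.all(y >= x)                          # ordering X_0 <= Y_0 preserved
assert np.max(np.abs(y - x)) <= (3 * n + 1) * eta   # l^infinity bound
assert np.all(np.diff(y) > 0)                  # supports stay separated
```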
6.1.2 Convergence of the regularized process to equilibrium
Lemma 6.1
(Convergence of regularized process) Let \((Y_t^n)_{t\ge 0}\) be a DOU process solution of (1.3), \(\beta \ge 1\), and let \(P_n^\beta \) be its invariant law. Assume that \(\mathrm {Law}(Y^n_0)\) is the regularized measure \(\mu _{x_0^n}\) in (6.1) associated to some initial condition \(x_0^n \in {\overline{D}}_n\). Then there exists a constant \(C > 0\), only depending on \(\kappa \), such that for all \(t\ge 0\), all \(n\ge 2\) and all \(x_0^n \in {\overline{D}}_n\)
Proof of Lemma 6.1
By Lemma B.2 and since \(\mathrm {Law}(Y^n_0) = \mu _{x_0^n}\), for all \(t\ge 0\), there holds
Now we have
Recall the definition of S in (1.15). As \(P_n^\beta \) has density \(\frac{\mathrm {e}^{- E}}{C_n^\beta }\), we may re-write this as
Recall the partition function \(C_{*n}^\beta = n! C_n^\beta \) from Sect. 2.2. It is proved in [12], using explicit expressions involving Gamma functions via a Selberg integral, that for some constant \(C>0\)
Next, we claim that \(S(\mu _{x_0^n})\le n \log (n^{1+\kappa })\). Indeed since \(\mu _{x_0^n}\) is a product measure, the tensorization property of entropy recalled in Lemma A.4 gives
Moreover an immediate computation yields \(\mathrm {Kullback}( \delta _{0}^{(\eta )} \mid \mathrm {d}x)=\log (\eta ^{-1})\) so that, given the definition of \(\eta \), we get
We turn to the estimation of the term \({\mathbb {E}}_{\mu _{x_0^n} }[E].\) The confinement term can be easily bounded:
Let us now estimate the logarithmic energy of \(\mu _{x_0^n}\). Using the fact that the logarithmic function is increasing, together with the fact that the supports of \(\delta _{x_i+3i\eta }^{(\eta )}\) lie at distance at least \(\eta \) from each other, we notice that for any \(i > j\) there holds
It follows that the initial logarithmic energy cannot be much larger than \(n^2\log n\):
This implies that there exists a constant \(C>0\), only depending on \(\kappa \), such that for all \(n\ge 2\)
Inserting (6.4), (6.5) and (6.6) into (6.3) we obtain (for a different constant \(C>0\))
This bound, combined with (6.2), concludes the proof of Lemma 6.1. \(\square \)
6.1.3 Convergence to the regularized process in total variation distance
Let \((X_t^n)_{t\ge 0}\) and \((Y_t^n)_{t\ge 0}\) be two DOU processes with \(X_0^n=x_0^n\) and \(\mathrm {Law}(Y_0^n)=\mu _{x_0^n}\), where the measure \(\mu _{x_0^n}\) is defined in (6.1). Below we prove that, as soon as the parameter \(\kappa \) is large enough, the total variation distance between \(\mathrm {Law}(X_t^n)\) and \(\mathrm {Law}(Y_t^n)\) tends to 0, for any fixed \(t> 0\).
Note that at time 0, almost surely, there holds \(X_0^{n,i} \le Y_0^{n,i}\), for every \(i\in \{1,\ldots ,n\}\). We now introduce a coupling of the processes \((X_t^n)_{t\ge 0}\) and \((Y_t^n)_{t\ge 0}\) that preserves this ordering at all times. Consider two independent standard BM \(B^n\) and \(W^n\) in \({\mathbb {R}}^n\). Let \(X^n\) be the solution of (1.3) driven by \(B^n\), and let \(Y^n\) be the solution of
We denote by \({\mathbb {P}}\) the probability measure under which these two processes are coupled. Let us comment on the driving noise in the equation satisfied by \(Y^n\). When the i-th coordinates of \(X^n\) and \(Y^n\) are equal, we take the same driving Brownian motion, and the difference \(Y^{n,i}-X^{n,i}\) remains non-negative due to the convexity of \(-\log \) appearing in (1.5), see the monotonicity result stated in Lemma B.4. On the other hand, when these two coordinates differ, we take independent driving Brownian motions in order for their difference to have non-zero quadratic variation (this increases their merging probability). Under this coupling, the ordering of \(X^n\) and \(Y^n\) is thus preserved at all times, and if \(X_s^n = Y_s^n\) for some \(s\ge 0\), then this remains true at all times \(t\ge s\). Note however that if only \(X_s^{n,i} = Y_s^{n,i}\) for some coordinate i, then this equality need not persist at later times, unless all the coordinates match.
As in (A.7), the total variation distance between the laws of \(X_t^n\) and \(Y_t^n\) may be bounded by
for all \(t\ge 0\). We wish to establish that for any given \(t>0\),
To do so, we work with the area between the two processes \(X^n\) and \(Y^n\), defined by
As the two processes are ordered at any time, this is nothing but the geometric area between the two discrete interfaces \(i\mapsto X_t^{n,i}\) and \(i\mapsto Y_t^{n,i}\) associated to the configurations \(X_t^n\) and \(Y_t^n\). We deduce that the merging time of the two processes coincides with the hitting time of 0 by this area, which we denote by \(\tau =\inf \{t\ge 0:A_t^n=0\}\).
The process \(A^n\) has a very simple structure: it is a semimartingale that behaves like an OU process with a randomly varying quadratic variation. Let \(N_t\) be the number of coordinates that do not coincide at time t, that is
Then \(A^n\) satisfies
where M is a centered martingale with quadratic variation
Note that whenever \(t< \tau \) we have
This a priori lower bound on the quadratic variation of M, combined with the Dubins–Schwarz theorem, allows one to check that \(\tau < \infty \) almost surely. Note that in view of the coupling between \(X_t^n\) and \(Y_t^n\), we have \(X_t^n=Y_t^n\) for all \(t\ge \tau \).
Recall the following informal fact: with large probability, a Brownian motion starting from a hits b by a time of order \((a-b)^2\). For a continuous martingale, this becomes: with large probability, a continuous martingale starting from a accumulates a quadratic variation of order \((a-b)^2\) up to its first hitting time of b. Our next lemma states such a bound on the supermartingale \(A^n\).
Lemma 6.2
Let \(a>b\ge 0\) and set \(\tau _b=\inf \{t>0:A_t=b\}\), which is finite almost surely. Then, for all \(u\ge 1\),
Proof
Without loss of generality one can assume that \(A_0 = a\) almost surely.
By Itô’s formula, for all \(\lambda \ge 0\), the process
defines a submartingale (taking its values in [0, 1]). Doob’s optional stopping theorem yields
On the other hand, for \(\lambda = 2 (a-b)^{-1} u^{-1/2}\), there holds
Consequently one deduces that
\(\square \)
We are now ready to prove the following lemma:
Lemma 6.3
If \(\kappa >\frac{3}{2}\), then for every sequence of times \({(t_n)}_n\) with \(\varliminf _{n\rightarrow \infty }t_n>0\), we have
Proof of Lemma 6.3
Let \({(t_n)}_n\) be a sequence of times such that \(\varliminf _{n\rightarrow \infty }t_n>0\). In view of the definition of \(\mu _{{x_0^n}}\) and \(\eta \), the initial area satisfies almost surely
According to Lemma 6.2, with a probability that goes to 1, one has
On the other hand, by (6.7), we have the following control on the quadratic variation:
One deduces that, with a probability that goes to 1,
and this quantity goes to 0 as \(n\rightarrow \infty \), whenever \(\kappa >\frac{3}{2}\). Therefore for \(\kappa >\frac{3}{2}\), there holds
thus concluding the proof of Lemma 6.3. \(\square \)
Proof of Theorem 1.7 in TV and Hellinger
Let \(\kappa >\frac{3}{2}\) and fix some initial condition \(x_0^n \in {\overline{D}}_n\). By the triangle inequality for \(\mathrm {TV}\), there holds
Taking \(t=t_n(1+\varepsilon )\) with \(t_n=\log (\sqrt{n} |x_0^n|) \vee \log (n)\), one deduces from Lemma 6.1 and the Pinsker inequality stated in Lemma A.1 that the first term in the right-hand side of (6.8) vanishes as n tends to infinity. Meanwhile Lemma 6.3 guarantees that the second term tends to 0 as n tends to infinity. We also conclude using the comparison between TV and Hellinger (see Lemma A.1) that
\(\square \)
6.2 Proof of Corollary 1.8 in TV and Hellinger
Proof of Corollary 1.8 in TV and Hellinger
By Lemma A.1 and the triangle inequality for \(\mathrm {TV}\), we have
Take \(t=(1+\varepsilon ) c_n\) with \(c_n = \log (na_n)\). Lemmas 6.1 and 6.3, combined with the assumption made on \((a_n)\), show that the two terms on the right-hand side vanish as \(n\rightarrow \infty \). Using Lemma A.1, the same result holds for \(\mathrm {Hellinger}\).
On the other hand, take \(x_0^{n,i} = a_n\) for all i and note that \(\pi (x_0^n) = na_n\) goes to \(+\infty \) as \(n\rightarrow \infty \). By Corollary 1.4 we find
whenever \(\mathrm {dist} \in \{\mathrm {TV}, \mathrm {Hellinger}\}\). \(\square \)
6.3 Proof of Theorem 1.10
Proof of Theorem 1.10
Lower bound. The contraction property provided by Lemma A.2 gives
By Theorem 1.3, \(P_n\circ \pi ^{-1}={\mathcal {N}}(0,1)\) and \(Y=\pi (X^n)\) is an OU process, a weak solution of \(\mathrm {d}Y_t=\sqrt{2}\mathrm {d}B_t-Y_t\mathrm {d}t\) with \(Y_0=\pi (X^n_0)\). In particular for all \(t\ge 0\), \(\mathrm {Law}(Y_t)\) is a mixture of Gaussian laws in the sense that for any measurable test function g with polynomial growth,
Now we use (again) the variational formula used in the proof of Lemma A.2 to get
and taking for g the linear function defined by \(g(x)=\lambda x\) for all \(x\in {\mathbb {R}}\) and for some \(\lambda \ne 0\) yields
Finally, using the assumption on the first moment and taking \(\lambda \) small enough we get, for all \(\varepsilon \in (0,1)\),
Upper bound. From Lemma B.2 we have, for all \(t\ge 0\),
Arguing like in the proof of Lemma 6.1 and using the contraction property of \(\mathrm {Kullback}\) provided by Lemma A.2 for the map \(\Psi \) defined in (1.17), we can write the following decomposition
Combining (6.4) with the assumptions on the \(\mu _i\)’s yields for some constant \(C>0\)
and it follows finally that for all \(\varepsilon \in (0,1)\),
\(\square \)
7 Cutoff phenomenon for the DOU in Wasserstein
7.1 Proofs of Theorem 1.7 and Corollary 1.8 in Wasserstein
Let \({(X_t)}_{t\ge 0}\) be the DOU process. By Lemma B.2, for all \(t\ge 0\) and all initial conditions \(X_0 \in {\overline{D}}_n\),
Suppose now that \(\mathrm {Law}(X^n_0)=\delta _{x^n_0}\). Then the triangle inequality for the Wasserstein distance gives
By Theorem 1.3, the mean at equilibrium of \(|X_t^n|^2\) equals \(1 + \frac{\beta }{2}(n-1)\) and therefore
We thus get
Set \(c_n := \log (|x_0^n|) \vee \log (\sqrt{n})\). For any \(\varepsilon \in (0,1)\), we have
and this concludes the proof of Theorem 1.7 in the Wasserstein distance.
Regarding the proof of Corollary 1.8, if \(x_0^n \in [-a_n,a_n]^n\) then \(|x_0^n| \le \sqrt{n} a_n\). Therefore if \(\inf _n a_n > 0\), setting \(c_n = \log (\sqrt{n} a_n)\) we find, as required,
7.2 Proof of Theorem 1.9
This is an adaptation of the previous proof. We compute
where \(\rho _n \in D_n\) is the vector of the quantiles of order 1/n of the semi-circle law as in (1.14). The rigidity estimates established in [17, Th. 2.4] justify that
If \(|x^n_0-\rho _n|\) diverges with n, we deduce that for all \(\varepsilon \in (0,1)\), with \(t_n = \log (|x^n_0-\rho _n|)\),
On the other hand, if \(|x^n_0-\rho _n|\) converges to some limit \(\alpha \) then we easily get, for any \(t\ge 0\),
Remark 7.1
(High-dimensional phenomena) With \(X_n\sim P_n^\beta \), in the bias-variance decomposition
the second term of the right-hand side is a variance term that measures the concentration of the log-concave random vector \(X_n\) around its mean \({\mathbb {E}}X_n\), while the first term is a bias term that measures the distance of the mean \({\mathbb {E}}X_n\) to the mean-field limit \(\rho _n\). Note also that \({\mathbb {E}}(|X_n-{\mathbb {E}}X_n|^2)={\mathbb {E}}(|X_n|^2)-|{\mathbb {E}}X_n|^2=1+\frac{\beta }{2}(n-1)-|{\mathbb {E}}X_n|^2\), reducing the problem to the mean. We refer to [42] for a fine asymptotic analysis in the determinantal case \(\beta =2\).
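The identity \({\mathbb {E}}(|X_n|^2)=1+\frac{\beta }{2}(n-1)\) can be checked by direct sampling in the determinantal case \(\beta =2\). The sketch below is ours; it assumes the \(\mathrm {GUE}_n\) normalization of Sect. 5.1, which we read as diagonal entries of variance 1/n and off-diagonal complex entries of total variance 1/n (this reading should be checked against the omitted display, but it is the one for which \({\mathbb {E}}\,\mathrm {Tr}(H^2)=1+(n-1)\)).

```python
import numpy as np

rng = np.random.default_rng(6)

def gue_sample(n):
    """One GUE_n matrix under the assumed normalization: diagonal entries of
    variance 1/n, off-diagonal complex entries of total variance 1/n."""
    d = rng.standard_normal(n) / np.sqrt(n)
    a = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) \
        / np.sqrt(2 * n)
    h = np.triu(a, k=1)
    return h + h.conj().T + np.diag(d)

n, trials = 20, 2000
# X_n ~ P_n^2 is the ordered spectrum of a GUE_n matrix (Lemma 5.1)
m = np.mean([np.sum(np.linalg.eigvalsh(gue_sample(n)) ** 2)
             for _ in range(trials)])
# mean of |X_n|^2 at equilibrium: 1 + (beta/2)(n-1) with beta = 2
assert abs(m - (1 + (n - 1))) < 0.5
```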
Notes
Here “H” is the capital \(\eta \) used by Boltzmann for entropy, “W” is for Wasserstein, “I” is for Fisher information.
References
Aldous, D., Diaconis, P.: Shuffling cards and stopping times. Am. Math. Mon. 93, 333–348 (1986)
Anderson, G.W., Guionnet, A., Zeitouni, O.: An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (2010)
Ané, C., Blachère, S., Chafaï, D., Fougères, P., Gentil, I., Malrieu, F., Roberto, C., Scheffer, G.: Sur les inégalités de Sobolev logarithmiques, vol. 10. Société Mathématique de France, Paris (2000)
Baker, T.H., Forrester, P.J.: The Calogero-Sutherland model and polynomials with prescribed symmetry. Nuclear Phys. B 492(3), 682–716 (1997)
Bakry, D.: Remarques sur les semigroupes de Jacobi. Astérisque 236, 23–39 (1996). (Hommage à P. A. Meyer et J. Neveu)
Bakry, D., Gentil, I., Ledoux, M.: Analysis and geometry of Markov diffusion operators, vol. 348. Springer, Cham (2014)
Barrera, G.: Abrupt convergence for a family of Ornstein-Uhlenbeck processes. Braz. J. Probab. Stat. 32(1), 188–199 (2018)
Barrera, G., Högele, M.A., Pardo, J.C.: The cutoff phenomenon in total variation for nonlinear Langevin systems with small layered stable noise. preprint arXiv:2011.10806v1, (2020)
Barrera, G., Högele, M.A., Pardo, J.C.: Cutoff thermalization for Ornstein-Uhlenbeck systems with small Lévy noise in the Wasserstein distance. preprint arXiv:2009.10590v1 to appear in J. Stat. Phys. 2021 (2020)
Barrera, G., Jara, M.: Thermalisation for small random perturbations of dynamical systems. Ann. Appl. Probab. 30(3), 1164–1208 (2020)
Barrera, G., Pardo, J.C.: Cut-off phenomenon for Ornstein-Uhlenbeck processes driven by Lévy processes. Electron. J. Probab. 25, 33 (2020). (Paper No. 15)
Ben Arous, G., Guionnet, A.: Large deviations for Wigner’s law and Voiculescu’s non-commutative entropy. Probab. Theory Relat. Fields 108(4), 517–542 (1997)
Bertucci, C., Debbah, M., Lasry, J.-M., Lions, P.-L.: A spectral dominance approach to large random matrices. preprint arXiv:2105.08983v1, (2021)
Biane, P., Speicher, R.: Free diffusions, free entropy and free Fisher information. Ann. Inst. H. Poincaré Probab. Stat. 37(5), 581–606 (2001)
Bolley, F., Chafaï, D., Fontbona, J.: Dynamics of a planar Coulomb gas. Ann. Appl. Probab. 28(5), 3152–3183 (2018)
Bolley, F., Gentil, I., Guillin, A.: Convergence to equilibrium in Wasserstein distance for Fokker-Planck equations. J. Funct. Anal. 263(8), 2430–2457 (2012)
Bourgade, P., Erdős, L., Yau, H.-T.: Edge universality of beta ensembles. Comm. Math. Phys. 332(1), 261–353 (2014)
Caputo, P., Labbé, C., Lacoin, H.: Mixing time of the adjacent walk on the simplex. Ann. Probab. 48(5), 2449–2493 (2020)
Caputo, P., Labbé, C., Lacoin, H.: Spectral gap and cutoff phenomenon for the Gibbs sampler of \(\nabla \varphi \) interfaces with convex potential. Ann. Inst. H. Poincaré Probab. Stat. 58(2), 794–826 (2022)
Carrillo, J.A., McCann, R.J., Villani, C.: Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoamericana 19(3), 971–1018 (2003)
Carrillo, J.A., McCann, R.J., Villani, C.: Contractions in the 2-Wasserstein length space and thermalization of granular media. Arch. Ration. Mech. Anal. 179(2), 217–263 (2006)
Cépa, E., Lépingle, D.: Diffusing particles with electrostatic repulsion. Probab. Theory Relat. Fields 107(4), 429–449 (1997)
Chafaï, D.: Entropies, convexity, and functional inequalities: on \(\Phi \)-entropies and \(\Phi \)-Sobolev inequalities. J. Math. Kyoto Univ. 44(2), 325–363 (2004)
Chafaï, D.: Binomial-Poisson entropic inequalities and the M/M/\(\infty \) queue. ESAIM, Probab. Stat. 10, 317–339 (2006)
Chafaï, D., Lehec, J.: On Poincaré and logarithmic Sobolev inequalities for a class of singular Gibbs measures. In: Geometric aspects of functional analysis. Israel seminar (GAFA) 2017–2019. Volume 1, pages 219–246. Springer, Cham (2020)
Chen, G.-Y., Saloff-Coste, L.: The cutoff phenomenon for ergodic Markov processes. Electron. J. Probab. 13(3), 26–78 (2008)
Devroye, L., Mehrabian, A., Reddad, T.: The total variation distance between high-dimensional Gaussians. preprint arXiv:1810.08693v5, (2018)
Diaconis, P.: The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. U.S.A. 93(4), 1659–1664 (1996)
Diaconis, P., Saloff-Coste, L.: Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab. 6(3), 695–750 (1996)
Diaconis, P., Shahshahani, M.: Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal. 18, 208–218 (1987)
Donati-Martin, C., Groux, B., Maïda, M.: Convergence to equilibrium in the free Fokker-Planck equation with a double-well potential. Ann. Inst. Henri Poincaré, Probab. Stat. 54(4), 1805–1818 (2018)
Dumitriu, I., Edelman, A.: Matrix models for beta ensembles. J. Math. Phys. 43(11), 5830–5847 (2002)
Dyson, F.J.: A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191–1198 (1962)
Edelman, A.: The random matrix technique of ghosts and shadows. Markov Process. Relat. Fields 16(4), 783–792 (2010)
Edelman, A., Rao, N.R.: Random matrix theory. Acta Numerica 14, 233–297 (2005)
Engoulatov, A.: A universal bound on the gradient of logarithm of the heat kernel for manifolds with bounded Ricci curvature. J. Funct. Anal. 238(2), 518–529 (2006)
Erdős, L., Yau, H.-T.: A dynamical approach to random matrix theory, volume 28 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI (2017)
Feller, W.: Two singular diffusion problems. Ann. Math. 2(54), 173–182 (1951)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31(2), 231–240 (1984)
Grigor’yan, A.: Heat kernel and analysis on manifolds, volume 47. Providence, RI: American Mathematical Society (AMS); Somerville, MA: International Press (2009)
Gustavsson, J.: Gaussian fluctuations of eigenvalues in the GUE. Ann. Inst. Henri Poincaré, Probab. Stat. 41(2), 151–178 (2005)
Hoffman, A.J., Wielandt, H.W.: The variation of the spectrum of a normal matrix. Duke Math. J. 20, 37–39 (1953)
Holcomb, D., Paquette, E.: Tridiagonal models for Dyson Brownian motion. preprint arXiv:1707.02700, (2017)
Horn, R.A., Johnson, C.R.: Matrix analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
Huang, J., Landon, B.: Rigidity and a mesoscopic central limit theorem for Dyson Brownian motion for general \(\beta \) and potentials. Probab. Theory Relat. Fields 175(1–2), 209–253 (2019)
Lachaud, B.: Cut-off and hitting times of a sample of Ornstein-Uhlenbeck processes and its average. J. Appl. Probab. 42(4), 1069–1080 (2005)
Lacoin, H.: Mixing time and cutoff for the adjacent transposition shuffle and the simple exclusion. Ann. Probab. 44(2), 1426–1487 (2016)
Lassalle, M.: Polynômes de Hermite généralisés. C. R. Acad. Sci. Paris Sér. I Math. 313(9), 579–582 (1991)
Lassalle, M.: Polynômes de Jacobi généralisés. C. R. Acad. Sci. Paris Sér. I Math. 312(6), 425–428 (1991)
Lassalle, M.: Polynômes de Laguerre généralisés. C. R. Acad. Sci. Paris Sér. I Math. 312(10), 725–728 (1991)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov chains and mixing times. With a chapter on “Coupling from the past” by James G. Propp and David B. Wilson. 2nd edition. Providence, RI: American Mathematical Society (AMS), 2nd edition edition (2017)
Li, S., Li, X.-D., Xie, Y.-X.: On the law of large numbers for the empirical measure process of generalized Dyson Brownian motion. J. Stat. Phys. 181(4), 1277–1305 (2020)
Lippert, R.A.: A matrix model for the \(\beta \)-Jacobi ensemble. J. Math. Phys. 44(10), 4807–4816 (2003)
Méliot, P.-L.: The cut-off phenomenon for Brownian motions on compact symmetric spaces. Potential Anal. 40(4), 427–509 (2014)
Pardo, L.: Statistical inference based on divergence measures, volume 185 of Statistics: Textbooks and Monographs. Chapman & Hall/CRC, Boca Raton, FL (2006)
Pollard, D.: A user’s guide to measure theoretic probability, volume 8 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2002)
Potters, M., Bouchaud, J.-P.: A first course in random matrix theory: for physicists, engineers and data scientists. Cambridge University Press, Cambridge (2021)
Rachev, S.T.: Probability metrics and the stability of stochastic models. John Wiley & Sons Ltd., Chichester etc. (1991)
Rogers, L., Shi, Z.: Interacting Brownian particles and the Wigner law. Probab. theory relat. fields 95(4), 555–570 (1993)
Salez, J.: Cutoff for non-negatively curved Markov chains. preprint arXiv:2102.05597v1, (2021)
Saloff-Coste, L.: Precise estimates on the rate at which certain diffusions tend to equilibrium. Mathematische Zeitschrift 217(1), 641–677 (1994)
Saloff-Coste, L.: Aspects of Sobolev-type inequalities, vol. 289. Cambridge University Press, Cambridge (2002)
Saloff-Coste, L.: On the convergence to equilibrium of Brownian motion on compact simple Lie groups. J. Geom. Anal. 14(4), 715–733 (2004)
Souplet, P., Zhang, Q.S.: Sharp gradient estimate and Yau’s Liouville theorem for the heat equation on noncompact manifolds. Bull. Lond. Math. Soc. 38(6), 1045–1053 (2006)
Villani, C.: Optimal transport. Old and new, vol. 338. Springer, Berlin (2009)
Acknowledgements
JB is supported by a “Fondation CFM pour la Recherche” grant. DC is supported by project EFI ANR-17-CE40-0030. CL is supported by project SINGULAR ANR-16-CE40-0020-01.
Appendices
Appendix A. Distances and divergences
We use the following standard distances and divergences to quantify the trend to equilibrium of Markov processes and to formulate the cutoff phenomena.
The Wasserstein–Kantorovich–Monge transportation distance of order 2, with respect to the underlying Euclidean distance, is defined for all probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^n\) by
$$\begin{aligned} \mathrm {Wasserstein}(\mu ,\nu ) = \inf \sqrt{{\mathbb {E}}[|X-Y|^2]}, \end{aligned}$$
where \(|x|=\sqrt{x_1^2+\cdots +x_n^2}\) and where the infimum runs over all couples (X, Y) with \(X\sim \mu \) and \(Y\sim \nu \).
The total variation distance between probability measures \(\mu \) and \(\nu \) on the same space is
$$\begin{aligned} \Vert \mu -\nu \Vert _{\mathrm {TV}} = \sup _{A}|\mu (A)-\nu (A)|, \end{aligned}$$
where the supremum runs over Borel subsets. If \(\mu \) and \(\nu \) are absolutely continuous with respect to a reference measure \(\lambda \) with densities \(f_\mu \) and \(f_\nu \) then \(\Vert \mu -\nu \Vert _{\mathrm {TV}}=\frac{1}{2}\int |f_\mu -f_\nu |\mathrm {d}\lambda =\frac{1}{2}\Vert f_\mu -f_\nu \Vert _{L^1(\lambda )}\).
The Hellinger distance between probability measures \(\mu \) and \(\nu \) with densities \(f_\mu \) and \(f_\nu \) with respect to the same reference measure \(\lambda \) is
$$\begin{aligned} \mathrm {Hellinger}(\mu ,\nu ) = \Bigl (\frac{1}{2}\int \bigl (\sqrt{f_\mu }-\sqrt{f_\nu }\bigr )^2\,\mathrm {d}\lambda \Bigr )^{1/2}. \end{aligned}$$
This quantity does not depend on the choice of \(\lambda \). We have \(\mathrm {Hellinger}(\mu ,\nu )=\frac{1}{\sqrt{2}}\Vert \sqrt{f_\mu }-\sqrt{f_\nu }\Vert _{L^2(\lambda )}\). Note that an alternative normalization is sometimes considered in the literature, making the maximal value of the Hellinger distance equal \(\sqrt{2}\).
The Kullback–Leibler divergence or relative entropy is defined by
$$\begin{aligned} \mathrm {Kullback}(\nu \mid \mu ) = \int \log \Bigl (\frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr )\,\mathrm {d}\nu \end{aligned}$$
if \(\nu \) is absolutely continuous with respect to \(\mu \), and \(\mathrm {Kullback}(\nu \mid \mu )=+\infty \) otherwise.
The \(\chi ^2\) divergence or relative variance is given by
$$\begin{aligned} \chi ^2(\nu \mid \mu ) = \int \Bigl (\frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr )^2\,\mathrm {d}\mu -1. \end{aligned}$$
We set it to \(+\infty \) if \(\nu \) is not absolutely continuous with respect to \(\mu \). If \(\mu \) and \(\nu \) have densities \(f_\mu \) and \(f_\nu \) with respect to a reference measure \(\lambda \) then \(\chi ^2(\nu \mid \mu )=\int (f_\nu ^2/f_\mu )\mathrm {d}\lambda -1\).
The (logarithmic) Fisher information or divergence is defined by
$$\begin{aligned} \mathrm {Fisher}(\nu \mid \mu ) = \int \Bigl |\nabla \log \frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr |^2\,\mathrm {d}\nu \end{aligned}$$
if \(\nu \) is absolutely continuous with respect to \(\mu \), and \(\mathrm {Fisher}(\nu \mid \mu )=+\infty \) otherwise.
Each of these distances or divergences has its advantages and drawbacks. Roughly speaking, the most sensitive is Fisher, due to its Sobolev nature, then \(\chi ^2\), then Kullback, which can be seen as a sort of \(L^{1+}=L\log L\) norm, then TV and Hellinger, which are comparable, and finally Wasserstein; this rough hierarchy nevertheless misses some subtleties related to the scales involved and to the nature of the arguments.
Some of these distances or divergences can generically be compared as the following result shows.
Lemma A.1
(Inequalities). For any probability measures \(\mu \) and \(\nu \) on the same space,
$$\begin{aligned} \mathrm {Hellinger}^2(\mu ,\nu ) \le \Vert \mu -\nu \Vert _{\mathrm {TV}} \le \mathrm {Hellinger}(\mu ,\nu )\sqrt{2-\mathrm {Hellinger}^2(\mu ,\nu )} \quad \text {and}\quad \Vert \mu -\nu \Vert _{\mathrm {TV}} \le \sqrt{\tfrac{1}{2}\mathrm {Kullback}(\nu \mid \mu )} \le \sqrt{\tfrac{1}{2}\chi ^2(\nu \mid \mu )}. \end{aligned}$$
We refer to [57, p. 61-62] for a proof. The inequality between the total variation distance and the relative entropy is known as the Pinsker or Csiszár–Kullback inequality, while the inequalities between the total variation distance and the Hellinger distance are due to Kraft. There are many other metrics between probability measures, see for instance [39, 59] for a discussion.
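The comparison inequalities of Lemma A.1 (Pinsker and Kraft) can be checked numerically. Below is a minimal Python sketch for two one-dimensional Gaussians with illustrative parameters: it uses the standard closed forms for Kullback and Hellinger between Gaussians, and a numerical integral for the total variation.

```python
import math

# Two 1-D Gaussians nu = N(m1, s1^2) and mu = N(m2, s2^2) (illustrative values).
m1, s1 = 0.0, 1.0
m2, s2 = 0.7, 1.3

# Standard closed forms: Kullback(nu | mu) and squared Hellinger distance.
kl = math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
h2 = 1.0 - math.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) \
         * math.exp(-(m1 - m2)**2 / (4 * (s1**2 + s2**2)))
hellinger = math.sqrt(h2)

# Total variation = (1/2) * L1 distance between the densities (Riemann sum).
def gauss(x, m, s):
    return math.exp(-(x - m)**2 / (2 * s**2)) / (s * math.sqrt(2 * math.pi))

dx = 0.001
tv = 0.5 * dx * sum(abs(gauss(-10 + k * dx, m1, s1) - gauss(-10 + k * dx, m2, s2))
                    for k in range(20001))

# Kraft inequalities and the Pinsker (Csiszar-Kullback) inequality.
assert h2 <= tv <= hellinger * math.sqrt(2 - h2)
assert tv <= math.sqrt(kl / 2)
```

With these illustrative values one finds \(\Vert \mu -\nu \Vert _{\mathrm {TV}}\approx 0.26\), squeezed between \(\mathrm {Hellinger}^2\approx 0.06\) and the Pinsker bound \(\sqrt{\mathrm {Kullback}/2}\approx 0.32\).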
The total variation distance can also be seen as a special Wasserstein distance of order 1 with respect to the atomic distance, namely
$$\begin{aligned} \Vert \mu -\nu \Vert _{\mathrm {TV}} = \inf {\mathbb {P}}(X\ne Y), \end{aligned}$$
where the infimum runs over all couplings \(X\sim \mu \) and \(Y\sim \nu \). This explains in particular why \(\mathrm {TV}\) is more sensitive than \(\mathrm {Wasserstein}\) at short scales but less sensitive at large scales, a consequence of the sensitivity difference between the underlying atomic and Euclidean distances. The probabilistic representations of \(\mathrm {TV}\) and \(\mathrm {Wasserstein}\) make them compatible with techniques of coupling, which play an important role in the literature on convergence to equilibrium of Markov processes.
We gather now useful results on distances and divergences.
Lemma A.2
(Contraction properties). Let \(\mu \) and \(\nu \) be two probability measures on the same measurable space S. Let \(f:S\rightarrow T\) be a measurable function, where T is another measurable space.
-
If \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Kullback}, \chi ^2\}\) then
$$\begin{aligned} \mathrm {dist}(\nu \circ f^{-1}\mid \mu \circ f^{-1}) \le \mathrm {dist}(\nu \mid \mu ). \end{aligned}$$ -
If \(S={\mathbb {R}}^n\), \(T={\mathbb {R}}^k\) then, denoting \(\left\| f\right\| _{\mathrm {Lip}}=\sup _{x\ne y}\frac{|f(x)-f(y)|}{|x-y|}\),
$$\begin{aligned} \mathrm {Wasserstein}(\mu \circ f^{-1},\nu \circ f^{-1}) \le \left\| f\right\| _{\mathrm {Lip}}\mathrm {Wasserstein}(\mu ,\nu ). \end{aligned}$$
The notation \(f^{-1}\) stands for the reciprocal map \(f^{-1}(A)=\{x\in S:f(x)\in A\}\) and \(\mu \circ f^{-1}\) is the image measure or push-forward of \(\mu \) by the map f, defined by \((\mu \circ f^{-1})(A)=\mu (f^{-1}(A))\). In terms of random variables we have \(Y\sim \mu \circ f^{-1}\) if and only if \(Y=f(X)\) where \(X\sim \mu \).
The proofs of the contraction properties of Lemma A.2 are all based on variational formulas. Note that following [66, Ex. 22.20 p. 588], there is a variational formula for \(\mathrm {Fisher}\) that comes from its dual representation as an inverse Sobolev norm. We do not develop this idea in this work.
Proof
The proof of the contraction property for Wasserstein comes from the fact that every coupling of \(\mu \) and \(\nu \) produces a coupling for \(\mu \circ f^{-1}\) and \(\nu \circ f^{-1}\). Regarding TV, the contraction property is a consequence of the definition of this distance and of measurability. In the case of Kullback, the property can be proved using the following well known variational formula:
$$\begin{aligned} \mathrm {Kullback}(\nu \mid \mu ) = \sup _g\Bigl \{\int g\,\mathrm {d}\nu -\log \int \mathrm {e}^g\,\mathrm {d}\mu \Bigr \}, \end{aligned}$$
where the supremum runs over all \(g\in L^1(\nu )\), or by approximation when the supremum runs over all bounded measurable g. This variational formula can be derived for instance by applying Jensen’s inequality to \(- \log {\mathbb {E}}_{\nu }[\mathrm {e}^g \frac{\mathrm {d}\mu }{\mathrm {d}\nu }]\).
Equality is achieved for \(g=\log (\mathrm {d}\nu /\mathrm {d}\mu )\). Now, taking \(g=h\circ f\) gives
$$\begin{aligned} \int h\,\mathrm {d}(\nu \circ f^{-1})-\log \int \mathrm {e}^h\,\mathrm {d}(\mu \circ f^{-1}) = \int h\circ f\,\mathrm {d}\nu -\log \int \mathrm {e}^{h\circ f}\,\mathrm {d}\mu \le \mathrm {Kullback}(\nu \mid \mu ), \end{aligned}$$
and it remains to take the supremum over h to get
$$\begin{aligned} \mathrm {Kullback}(\nu \circ f^{-1}\mid \mu \circ f^{-1}) \le \mathrm {Kullback}(\nu \mid \mu ). \end{aligned}$$
The variational formula for \(\mathrm {Kullback}(\cdot \mid \mu )\) is a manifestation of its convexity: it expresses this functional as the envelope of its tangents, and its Fenchel–Legendre transform (convex dual) is the log-Laplace transform. Such a variational formula is equivalent to tensorization, and is available for all \(\Phi \)-entropies such that \((u,v)\mapsto \Phi ''(u)v^2\) is convex, see [24, Th. 4.4]. In particular, the analogous variational formula, as well as its consequence in terms of contraction, is also available for \(\chi ^2\), which corresponds to the \(\Phi \)-entropy with \(\Phi (u)=u^2-1\) (variance as a \(\Phi \)-entropy). \(\square \)
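The contraction (data-processing) property of Lemma A.2 is easy to observe on finite spaces. The sketch below, with illustrative probability vectors and a coarse-graining map, checks that Kullback can only decrease under a push-forward.

```python
import math

# Laws nu and mu on {0,...,5}, and the coarse-graining map f(i) = i // 2,
# which pushes both laws forward onto {0, 1, 2}.
nu = [0.05, 0.20, 0.25, 0.10, 0.30, 0.10]
mu = [0.10, 0.15, 0.20, 0.25, 0.20, 0.10]

def kl(p, q):  # Kullback(p | q) for probability vectors with q > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pushforward(p, f, size):
    out = [0.0] * size
    for i, pi in enumerate(p):
        out[f(i)] += pi
    return out

kl_full = kl(nu, mu)
kl_pushed = kl(pushforward(nu, lambda i: i // 2, 3),
               pushforward(mu, lambda i: i // 2, 3))

assert 0 <= kl_pushed <= kl_full  # contraction under push-forward
```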
Lemma A.3
(Scale invariance versus homogeneity) The total variation distance is scale invariant while the Wasserstein distance is homogeneous, just like a norm: for all probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^n\) and every scaling factor \(\sigma \in (0,\infty )\), denoting \(\mu _\sigma =\mathrm {Law}(\sigma X)\) where \(X\sim \mu \), we have
$$\begin{aligned} \Vert \mu _\sigma -\nu _\sigma \Vert _{\mathrm {TV}} = \Vert \mu -\nu \Vert _{\mathrm {TV}} \qquad \text {and}\qquad \mathrm {Wasserstein}(\mu _\sigma ,\nu _\sigma ) = \sigma \,\mathrm {Wasserstein}(\mu ,\nu ). \end{aligned}$$
Proof
For the Wasserstein distance, the result follows from
$$\begin{aligned} {\mathbb {E}}[|\sigma X-\sigma Y|^2] = \sigma ^2\,{\mathbb {E}}[|X-Y|^2], \end{aligned}$$
while for the \(\mathrm {TV}\) distance, it comes from the fact that \(A\mapsto A_\sigma := \{\sigma x: x\in A\}\) is a bijection. \(\square \)
We turn to the behavior of the distances/divergences under tensorization.
Lemma A.4
(Tensorization) For all probability measures \(\mu _1,\ldots ,\mu _n\) and \(\nu _1,\ldots ,\nu _n\) on \({\mathbb {R}}\), we have
$$\begin{aligned}&\mathrm {Wasserstein}^2(\mu _1\otimes \cdots \otimes \mu _n,\nu _1\otimes \cdots \otimes \nu _n) = \sum _{i=1}^n\mathrm {Wasserstein}^2(\mu _i,\nu _i),\\&\max _{1\le i\le n}\Vert \mu _i-\nu _i\Vert _{\mathrm {TV}} \le \Vert \mu _1\otimes \cdots \otimes \mu _n-\nu _1\otimes \cdots \otimes \nu _n\Vert _{\mathrm {TV}} \le \sum _{i=1}^n\Vert \mu _i-\nu _i\Vert _{\mathrm {TV}}. \end{aligned}$$
The equality for the Wasserstein distance comes by taking the product of optimal couplings. The first inequality for the total variation distance comes from its contraction property (Lemma A.2), while the second comes from \(|(a_1\ldots a_n)-(b_1\ldots b_n)|\le \sum _{i=1}^n|a_i-b_i|(a_1\ldots a_{i-1})(b_{i+1}\ldots b_n)\), \(a_1,\ldots ,a_n,b_1,\ldots ,b_n\in [0,+\infty )\), which comes itself from the triangle inequality on the telescoping sum \(\sum _{i=1}^n(c_i-c_{i-1})\) where \(c_i=(a_1\ldots a_i)(b_{i+1}\ldots b_n)\) via \(c_i-c_{i-1}=(a_i-b_i)(a_1\ldots a_{i-1})(b_{i+1}\ldots b_n)\).
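The two total variation bounds of Lemma A.4 can be checked exactly on a small product of Bernoulli laws (illustrative parameters):

```python
import itertools

# Marginals of mu_i and nu_i: Bernoulli laws on {0, 1}.
ps = [0.30, 0.50, 0.60]
qs = [0.40, 0.45, 0.70]

def tv(p, q):  # total variation between two probability vectors
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

tvs = [tv([1 - p, p], [1 - q, q]) for p, q in zip(ps, qs)]

def prod_mass(params, x):  # mass of the product measure at x in {0,1}^3
    m = 1.0
    for p, xi in zip(params, x):
        m *= p if xi else 1 - p
    return m

# Exact TV between the two product measures on {0,1}^3.
tv_prod = 0.5 * sum(abs(prod_mass(ps, x) - prod_mass(qs, x))
                    for x in itertools.product([0, 1], repeat=3))

assert max(tvs) <= tv_prod <= sum(tvs)
```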
Lemma A.5
(Explicit formulas for Gaussian distributions) For all \(n\ge 1\), \(m_1,m_2\in {\mathbb {R}}^n\), and all \(n\times n\) covariance matrices \(\Sigma _1,\Sigma _2\), denoting \(\Gamma _1={\mathcal {N}}(m_1,\Sigma _1)\) and \(\Gamma _2={\mathcal {N}}(m_2,\Sigma _2)\), we have
where the formula for \(\chi ^2(\Gamma _1\mid \Gamma _2)\) holds if \(2\Sigma _2>\Sigma _1\), and \(\chi ^2(\Gamma _1\mid \Gamma _2)=+\infty \) otherwise. Moreover the formulas for Fisher and Wasserstein rewrite, if \(\Sigma _1\) and \(\Sigma _2\) commute, \(\Sigma _1\Sigma _2=\Sigma _2\Sigma _1\), to
Regarding the total variation distance, there is no general simple formula for Gaussian laws, but we can use for instance the comparisons with \(\mathrm {Kullback}\) and \(\mathrm {Hellinger}\) (Lemma A.1), see [27] for a discussion.
Proof of Lemma A.5
We refer to [56, p. 47 and p. 51] for Kullback and Hellinger, and to [40] for Wasserstein, a far more subtle case. The formula for \(\chi ^2(\Gamma _1\mid \Gamma _2)\) follows easily from a direct computation. We have not found in the literature a formula for Fisher. Let us give it here for the sake of completeness. Using \({\mathbb {E}}[X_iX_j]=\Sigma _{ij}+m_im_j\) when \(X\sim {\mathcal {N}}(m,\Sigma )\) we get, for all \(n\times n\) symmetric matrices A and B
and thus for all n-dimensional vectors a and b,
Now, using the notation \(q_i(x)=\Sigma _i^{-1}(x-m_i)\cdot (x-m_i)\) and \(|\Sigma _i|=\det (\Sigma _i)\),
The formula when \(\Sigma _1\Sigma _2=\Sigma _2\Sigma _1\) follows immediately. \(\square \)
Appendix B. Convexity and its dynamical consequences
We gather useful dynamical consequences of convexity. We start with functional inequalities.
Lemma B.1
(Logarithmic Sobolev inequality) Let \(P_n^\beta \) be the invariant law of the DOU process solving (1.3). Then, for every law \(\nu \) on \({\mathbb {R}}^n\), we have
$$\begin{aligned} \mathrm {Kullback}(\nu \mid P_n^\beta ) \le \frac{1}{2n}\,\mathrm {Fisher}(\nu \mid P_n^\beta ). \end{aligned}$$
Moreover the constant \(\frac{1}{2n}\) is optimal.
Furthermore, equality with both sides finite is achieved if and only if \(\mathrm {d}\nu /\mathrm {d}P_n^\beta \) is proportional to \(\mathrm {e}^{\lambda (x_1+\cdots +x_n)}\) for some \(\lambda \in {\mathbb {R}}\).
Linearizing the log-Sobolev inequality above with \(\mathrm {d}\nu /\mathrm {d}P_n^\beta =1+\varepsilon f\) gives the Poincaré inequality
$$\begin{aligned} \mathrm {Var}_{P_n^\beta }(f) \le \frac{1}{n}\int |\nabla f|^2\,\mathrm {d}P_n^\beta . \end{aligned}$$
It can be extended by truncation and regularization from the case where f is smooth and compactly supported to the case where f belongs to the Sobolev space \(H^1(P^\beta _n)\). Equality is achieved when f is an eigenfunction associated with the eigenvalue \(-1\) of \({{\,\mathrm{G}\,}}\), namely \(f(x)=a(x_1+\cdots +x_n)+b\), \(a,b\in {\mathbb {R}}\); hence the other name, spectral gap inequality. It rewrites in terms of the \(\chi ^2\) divergence as
$$\begin{aligned} \chi ^2(\nu \mid P_n^\beta ) \le \frac{1}{n}\int \Bigl |\nabla \frac{\mathrm {d}\nu }{\mathrm {d}P_n^\beta }\Bigr |^2\,\mathrm {d}P_n^\beta . \end{aligned}$$
The right-hand side plays for the \(\chi ^2\) divergence the role played by Fisher for Kullback.
We refer to [25, 37] for a proof of Lemma B.1. This logarithmic Sobolev inequality is a consequence of the log-concavity of \(P_n^\beta \) with respect to \({\mathcal {N}}(0,\frac{1}{n}\mathrm {I}_n)\). A slightly delicate aspect lies in the presence of the restriction to \(D_n\), which can be circumvented by using a regularization procedure.
There are many other functional inequalities which are a consequence of this log-concavity, for instance the Talagrand transportation inequality that states that when \(\nu \) has finite second moment,
and the HWI inequality, which states that when \(\nu \) has finite second moment,
and we refer to [66] for these two functional inequalities, which we do not use here.
Lemma B.2
(Sub-exponential convergence to equilibrium) Let \({(X^n_t)}_{t\ge 0}\) be the DOU process solution of (1.3) with \(\beta =0\) or \(\beta \ge 1\), and let \(P_n^\beta \) be its invariant law. Then for all \(t\ge 0\), we have the sub-exponential convergences
Recall that when \(\beta >0\) the initial condition \(X^n_0\) is always taken in \(D_n\).
For each inequality, if the right-hand side is infinite then the inequality is trivially satisfied. This is in particular the case for \(\mathrm {Kullback}\) and \(\mathrm {Fisher}\) when \(\mathrm {Law}(X^n_0)\) is not absolutely continuous with respect to the Lebesgue measure, and for Wasserstein when \(\mathrm {Law}(X^n_0)\) has infinite second moment.
Elements of proof of Lemma B.2
The idea is that an exponential decay for \(\mathrm {Kullback}\), \(\chi ^2\), \(\mathrm {Fisher}\), and \(\mathrm {Wasserstein}\) can be established by differentiating in time, using a functional inequality, and applying the Grönwall lemma. More precisely, for \(\mathrm {Kullback}\) it is a log-Sobolev inequality, for \(\chi ^2\) a Poincaré inequality, for \(\mathrm {Wasserstein}\) a transportation type inequality, and for \(\mathrm {Fisher}\) a Bakry–Émery \(\Gamma _2\) inequality, see for instance [3, 6, 66]. This is a rather standard piece of probabilistic functional analysis, related to the log-concavity of \(P_n^\beta \). We recall the crucial steps for the reader's convenience. Let us set \(\mu _t=\mathrm {Law}(X^n_t)\) and \(\mu =P_n^\beta \). For \(t>0\) the density \(p_t=\mathrm {d}\mu _t/\mathrm {d}\mu \) exists and solves the evolution equation \(\partial _tp_t={{\,\mathrm{G}\,}}p_t\) where \({{\,\mathrm{G}\,}}\) is as in (2.5). We have the integration by parts
For \(\mathrm {Kullback}\), we find using these tools, for all \(t>0\), denoting \(\Phi (u):=u\log (u)\),
where the inequality comes from the logarithmic Sobolev inequality of Lemma B.1. It remains to use the Grönwall lemma to get the exponential decay of \(\mathrm {Kullback}\).
The derivation of the exponential decay of the Fisher divergence follows the same lines by differentiating again with respect to time. Indeed, after a sequence of differential computations and integration by parts, we find, see for instance [3, Ch. 5], [6], or [66],
where \(\Gamma _{\!2}(f):=\frac{1}{n^2}f''^2+\frac{1}{n}V''f'^2\) is the Bakry–Émery “Gamma-two” operator of the dynamics. Now using the convexity of V, we get, by the Grönwall lemma, for all \(t>0\),
This can be used to prove the log-Sobolev inequality, see [3, Ch. 5], [6], and [66]. This differential approach goes back at least to Boltzmann (statistical physics) and Stam (information theory), and was later extensively developed by Bakry, Ledoux, Villani, and their followers.
For the Wasserstein distance, we proceed by coupling. Indeed, since the diffusion coefficient is constant in space, we can simply use a parallel coupling. Namely, let \({(X'_t)}_{t\ge 0}\) be the process started from another, possibly random, initial condition \(X'_0\), and satisfying the same stochastic differential equation, driven by the same Brownian motion. We get
hence
Now since E is uniformly convex with \(\nabla ^2 E\ge n I_n\), we get, for all \(x,y\in {\mathbb {R}}^n\),
$$\begin{aligned} (\nabla E(x)-\nabla E(y))\cdot (x-y) \ge n\,|x-y|^2, \end{aligned}$$
which gives
$$\begin{aligned} \partial _t\,|X_t-X'_t|^2 \le -2\,|X_t-X'_t|^2, \end{aligned}$$
and by the Grönwall lemma,
$$\begin{aligned} |X_t-X'_t| \le \mathrm {e}^{-t}\,|X_0-X'_0|. \end{aligned}$$
It follows that
$$\begin{aligned} \mathrm {Wasserstein}(\mathrm {Law}(X_t),\mathrm {Law}(X'_t)) \le \sqrt{{\mathbb {E}}[|X_t-X'_t|^2]} \le \mathrm {e}^{-t}\sqrt{{\mathbb {E}}[|X_0-X'_0|^2]}. \end{aligned}$$
By taking the infimum over all couplings of \(X_0\) and \(X_0'\) we get
$$\begin{aligned} \mathrm {Wasserstein}(\mathrm {Law}(X_t),\mathrm {Law}(X'_t)) \le \mathrm {e}^{-t}\,\mathrm {Wasserstein}(\mathrm {Law}(X_0),\mathrm {Law}(X'_0)). \end{aligned}$$
Taking \(X_0'\sim P_n^\beta \) we get, by invariance, for all \(t\ge 0\),
$$\begin{aligned} \mathrm {Wasserstein}(\mathrm {Law}(X_t),P_n^\beta ) \le \mathrm {e}^{-t}\,\mathrm {Wasserstein}(\mathrm {Law}(X_0),P_n^\beta ). \end{aligned}$$
\(\square \)
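In the non-interacting case \(\beta =0\), the parallel coupling is completely explicit: with a shared Brownian motion, the difference of two Ornstein–Uhlenbeck paths solves a deterministic linear equation and contracts exponentially. A minimal Euler scheme sketch, with unit rate and illustrative parameters rather than the exact normalization of (1.3):

```python
import math, random

# Two OU processes dX = -X dt + dB driven by the SAME Brownian increments.
# Their difference solves d(X - X') = -(X - X') dt, hence |X_t - X'_t|
# equals e^{-t} |X_0 - X'_0| exactly; Euler reproduces this up to O(dt).
random.seed(0)
dt, steps = 1e-4, 20000          # time horizon t = 2
x, y = 1.5, -0.5                 # two initial conditions
for _ in range(steps):
    db = random.gauss(0.0, math.sqrt(dt))  # shared noise increment
    x += -x * dt + db
    y += -y * dt + db

gap = abs(x - y)
expected = 2.0 * math.exp(-2.0)  # |X_0 - X'_0| e^{-t} with t = 2
assert abs(gap - expected) < 1e-2
```

Note that the contraction of the gap is deterministic: the shared noise cancels in the difference, which is exactly why the parallel coupling controls the Wasserstein distance.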
Lemma B.3
(Monotonicity) Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3), with \(\beta =0\) or \(\beta \ge 1\) and invariant law \(P^\beta _n\). Then for all \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Fisher}, \mathrm {Wasserstein}\}\), the function \(t\ge 0\mapsto \mathrm {dist}(\mathrm {Law}(X^n_t)\mid P^\beta _n)\) is non-increasing.
Elements of proof of Lemma B.3
The monotonicity for \(\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2\) comes from the Markov nature of the process and the convexity of
This is known as the \(\Phi \)-entropy dissipation of Markov processes, see [6, 23, 66]. This can also be seen from (B.3). The monotonicity for \(\mathrm {TV}\) follows also from the contraction property of the total variation with respect to general Markov kernels, see [52, Ex. 4.2].
The monotonicity for \(\mathrm {Fisher}\) comes from the identity (B.4) and the convexity of V. By (B.3) this monotonicity is also equivalent to the convexity of \(\mathrm {Kullback}\) along the dynamics. The monotonicity for \(\mathrm {Wasserstein}\) can be obtained by computing the derivative along the dynamics starting from (B.6), but this is more subtle due to the variational nature of this distance and involves the convexity of V, see for instance [16, Bottom of p. 2442 and Lem. 3.2].
The monotonicities can also be extracted from the exponential decays of Lemma B.2 thanks to the Markov property and the profile \(\mathrm {e}^{-t}=1-t+o(t)\) of the prefactor in the right hand side. \(\square \)
The convexity of the interaction \(-\log \) as well as the constant diffusion coefficient in the evolution Eq. (1.3) allow us to use simple “maximum principle” type arguments to prove that the dynamics exhibits a monotone behavior and an exponential decay.
Lemma B.4
(Monotonicity and exponential decay) Let \({(X_t^n)}_{t\ge 0}\) and \({(Y_t^n)}_{t\ge 0}\) be a pair of DOU processes solving (1.3), \(\beta \ge 1\), driven by the same Brownian motion \((B_t)_{t\ge 0}\) on \({\mathbb {R}}^n\) and with respective initial conditions \(X_0^n\in {\overline{D}}_n\) and \(Y_0^n\in {\overline{D}}_n\). If for all \(i\in \{1,\ldots ,n\}\)
$$\begin{aligned} X_0^{n,i} \le Y_0^{n,i}, \end{aligned}$$
then the following properties hold true:
-
(Monotonicity property) for all \(t\ge 0\) and \(i \in \{1,\ldots ,n\}\),
$$\begin{aligned} X_t^{n,i}\le Y_t^{n,i}, \end{aligned}$$ -
(Decay estimate) for all \(t\ge 0\),
$$\begin{aligned} \max _{i\in \{1,\ldots ,n\}} (Y_t^{n,i}-X_t^{n,i}) \le \max _{i\in \{1,\ldots ,n\}} (Y_0^{n,i}-X_0^{n,i})\mathrm {e}^{-t}. \end{aligned}$$
Proof of Lemma B.4
The difference \(Y_t^n - X_t^n\) satisfies
Since there are almost surely no collisions between the coordinates of \(X^n\), resp. of \(Y^n\), the right-hand side is almost surely finite for all \(t > 0\) and every process \(Y_t^{n,i}-X_t^{n,i}\) is \({\mathcal {C}}^1\) on \((0,\infty )\). Note that at time 0 some derivatives may blow up as two coordinates of \(X^n\) or \(Y^n\) may coincide.
Let us define
$$\begin{aligned} M(t) := \max _{i\in \{1,\ldots ,n\}}(Y_t^{n,i}-X_t^{n,i}) \qquad \text {and}\qquad m(t) := \min _{i\in \{1,\ldots ,n\}}(Y_t^{n,i}-X_t^{n,i}). \end{aligned}$$
Elementary considerations imply that M and m are themselves \({\mathcal {C}}^1\) on \((0,\infty )\) and that at all times \(t>0\), there exist i, j such that
Of course, this would not be true if there were infinitely many processes. Now observe that if at time \(t>0\) we have \(Y_t^{n,i}-X_t^{n,i} = M(t)\), then
This implies that \(\partial _t M(t) \le - M(t)\). Similarly, we can deduce that \(\partial _t m(t) \ge - m(t)\). Integrating these differential equations, we get for all \(t\ge t_0 > 0\)
Since all processes are continuous on \([0,\infty )\), we can pass to the limit \(t_0\downarrow 0\) and get, for all \(t\ge 0\),
$$\begin{aligned} M(t) \le M(0)\,\mathrm {e}^{-t} \qquad \text {and}\qquad m(t) \ge m(0)\,\mathrm {e}^{-t}. \end{aligned}$$
\(\square \)
Remark B.5
(Beyond DOU dynamics) The monotonicity property of Lemma B.4 relies on the convexity of the interaction \(-\log \), and has nothing to do with the long-time behavior and the strength of V. In particular, this monotonicity property remains valid for the process solving (1.3) with an arbitrary V provided that it is \({\mathcal {C}}^1\) and there is no explosion, even in the situation where V is not strong enough to ensure that the process has an invariant law. If V is \({\mathcal {C}}^2\) with \(\rho :=\inf V''>-\infty \), then the decay estimate of Lemma B.4 survives in the following decay or growth form:
$$\begin{aligned} \max _{i\in \{1,\ldots ,n\}} (Y_t^{n,i}-X_t^{n,i}) \le \max _{i\in \{1,\ldots ,n\}} (Y_0^{n,i}-X_0^{n,i})\,\mathrm {e}^{-\rho t}. \end{aligned}$$
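The monotonicity and the gap decay of Lemma B.4 can be observed numerically. The sketch below assumes a DOU-type drift \(-x_i + c\sum _{j\ne i}1/(x_i-x_j)\) with an illustrative constant c rather than the exact normalization of (1.3); it couples two systems through the same Brownian increments, starting from initial conditions shifted by a constant, for which the gap decay is exact.

```python
import math, random

# Two coupled systems with DOU-type drift -x_i + c * sum_{j!=i} 1/(x_i - x_j);
# c is an illustrative constant, not the exact normalization of (1.3).
random.seed(1)
n, c, dt, steps = 3, 0.5, 1e-4, 10000   # time horizon t = 1

def drift(x):
    return [-x[i] + c * sum(1.0 / (x[i] - x[j]) for j in range(n) if j != i)
            for i in range(n)]

x = [-2.0, 0.0, 2.0]
y = [xi + 0.5 for xi in x]              # X_0^{n,i} <= Y_0^{n,i} coordinatewise
for _ in range(steps):
    db = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]  # shared noise
    dx, dy = drift(x), drift(y)
    x = [x[i] + dx[i] * dt + db[i] for i in range(n)]
    y = [y[i] + dy[i] * dt + db[i] for i in range(n)]

gaps = [y[i] - x[i] for i in range(n)]
assert min(gaps) > 0                                   # ordering is preserved
assert abs(max(gaps) - 0.5 * math.exp(-1.0)) < 1e-3    # gap ~ e^{-t} * gap_0
```

Because the two initial conditions differ by a constant shift, the interaction terms cancel in the difference, and the gap contracts exactly at rate \(\mathrm {e}^{-t}\), in line with the decay estimate.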
Boursier, J., Chafaï, D. & Labbé, C. Universal cutoff for Dyson Ornstein Uhlenbeck process. Probab. Theory Relat. Fields 185, 449–512 (2023). https://doi.org/10.1007/s00440-022-01158-5
Keywords
- Dyson process
- Ornstein–Uhlenbeck process
- Coulomb gas
- Random matrix theory
- High dimensional phenomenon
- Cutoff phenomenon
- High-dimensional probability
- Functional inequalities
- Spectral analysis
- Stochastic calculus
- Gaussian analysis
- Markov process
- Diffusion process
- Interacting particle system