We explore the numerical simulation of Coulomb gases and log-gases by means of Hybrid or Hamiltonian Monte Carlo algorithms (HMC) [19, 36]. Such algorithms basically use discretized kinetic (underdamped) Langevin dynamics to produce proposals for Metropolis–Hastings algorithms. This can be viewed as a way to add momentum to a Monte Carlo interacting particle system. The basic outcome of this exploratory work is that HMC algorithms have remarkably good numerical behavior for such gases despite the singularity of the interactions. Such algorithms scale well with the dimension of the system, see [4, 8]. They are therefore more efficient than the tamed overdamped version already explored in the literature, for instance in [55]. In this paper, we benchmark the capability of the algorithm to reproduce known results efficiently, and we make it ready to explore new conjectures.

Another advantage of this approach is that it could be adapted to take into account a sub-manifold constraint [51]. For instance, this could be used for simulating random matrices with prescribed trace or determinant, which is difficult to achieve by direct sampling of matrices.

For the sake of completeness, we should mention that there are remarkable alternative simulation algorithms which are not based on a diffusion process, such as the ones based on piecewise deterministic Markov processes (PDMP), see for instance [41] and [72].

1 Boltzmann–Gibbs Measures

We are interested in interacting particle systems subject to an external field and experiencing singular pair interactions. In order to encompass Coulomb gases as well as log-gases from random matrix theory, we introduce a vector subspace S of dimension d of \({\mathbb {R}}^n\), with \(n\ge 2\) and \(n\ge d\ge 1\). The particles belong to S, and \({\mathbb {R}}^n\) is understood as a physical ambient space. We equip S with the trace of the Lebesgue measure of \({\mathbb {R}}^n\), denoted by \(\mathrm {d}x\). The external field and the pair interaction are respectively denoted by \(V:S\rightarrow {\mathbb {R}}\) and \(W:S\rightarrow (-\infty ,+\infty ]\), and are \(\mathcal {C}^2\) functions, with \(W(x)<\infty \) for all \(x\ne 0\). For any \(N\ge 2\), we consider the probability measure \(P_N\) on \(S^N=S\times \cdots \times S\) defined by

$$\begin{aligned} P_N(\mathrm {d}x)=\frac{\mathrm {e}^{-\beta _NH_N(x_1,\ldots ,x_N)}}{Z_N}\mathrm {d}x_1\cdots \mathrm {d}x_N, \end{aligned}$$
(1.1)

where \(\beta _N>0\) is a parameter,

$$\begin{aligned} Z_N=\int _{S^N}\mathrm {e}^{-\beta _NH_N(x_1,\ldots ,x_N)}\mathrm {d}x_1\cdots \mathrm {d}x_N \end{aligned}$$

is the normalizing factor, and

$$\begin{aligned} H_N(x_1,\ldots ,x_N)=\frac{1}{N}\sum _{i=1}^NV(x_i) +\frac{1}{2N^2}\sum _{i\ne j}W(x_i-x_j) \end{aligned}$$

is usually called the energy or Hamiltonian of the system. We assume that \(\beta _N\), V, and W are chosen in such a way that \(Z_N<\infty \) for any N. The law \(P_N\) is invariant under permutations of the coordinates \(x_1,\ldots ,x_N\) (exchangeability), and \(H_N\) depends only on the empirical measure

$$\begin{aligned} \mu _N=\frac{1}{N}\sum _{i=1}^N\delta _{x_i}. \end{aligned}$$

Therefore \(P_N\) is also the law of a random empirical measure encoding a cloud of indistinguishable particles \(x_1,\ldots ,x_N\). We emphasize that the particles live on the space \(S^N=S\times \cdots \times S\) of dimension dN. The parameter n serves as the physical dimension of the ambient space, for the Coulomb gas setting described next.
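Concretely, the energy \(H_N\) can be assembled directly from V and W by summing over all pairs; a minimal NumPy sketch (the function names and the illustrative choices of V and W are ours, taking a quadratic confinement and a log-gas interaction as an example):

```python
import numpy as np

def energy(x, V, W):
    """H_N(x_1,...,x_N) = (1/N) sum_i V(x_i) + (1/(2N^2)) sum_{i != j} W(x_i - x_j).

    x has shape (N, d): N particles in S = R^d.  V and W act on arrays of
    d-dimensional vectors (shape (..., d)) and return one scalar per vector.
    """
    N = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]            # all pairwise differences x_i - x_j
    off = ~np.eye(N, dtype=bool)                     # mask out the diagonal i == j
    return V(x).sum() / N + W(diffs[off]).sum() / (2 * N**2)

# Illustrative choices (one possibility among those considered in the text):
V = lambda x: 0.5 * (x**2).sum(axis=-1)              # quadratic confinement
W = lambda x: -np.log(np.linalg.norm(x, axis=-1))    # log-gas interaction
```

Since \(H_N\) depends only on the empirical measure, the value is unchanged under any permutation of the rows of `x`.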

For any \(m\ge 1\) and \(x\in \mathbb {R}^m\), we denote by \({{\left| x \right| }}=\sqrt{x_1^2+\cdots +x_m^2}\) the Euclidean norm of x. This matches the absolute value when \(m=1\) and the modulus when \(m=2\), identifying \(\mathbb {R}^2\equiv \mathbb {C}\).

1.1 Coulomb Gases

The notion of Coulomb gas is based on elementary electrostatics. Here the vector subspace S is interpreted as a conductor. It corresponds to taking \(W=g\) where g is the Coulomb kernel or Green function in the physical space \(\mathbb {R}^n\). More precisely, recall that the Green function g in \({\mathbb {R}}^n\), \(n\ge 2\), is defined for all \(x\in {\mathbb {R}}^n\), \(x\ne 0\), by

$$\begin{aligned} g(x)= {\left\{ \begin{array}{ll} \log \frac{1}{{{\left| x \right| }}} &{} \text {if }n=2,\\ \frac{1}{{{\left| x \right| }}^{n-2}} &{} \text {if }n\ge 3. \end{array}\right. } \end{aligned}$$

This function is the fundamental solution of the Poisson equation, namely, denoting by \(\Delta \) the Laplace operator in \(\mathbb {R}^n\) and by \(\delta _0\) the Dirac mass at 0, we have, in the sense of distributions,

$$\begin{aligned} -\Delta g=c\delta _0, \quad \mathrm {with}\quad c ={\left\{ \begin{array}{ll} 2\pi &{} \text {if }n=2,\\ (n-2)|{\mathbb {S}}^{n-1}|=\frac{n(n-2)\pi ^{n/2}}{\Gamma (1+n/2)} &{} \text {if }n\ge 3. \end{array}\right. } \end{aligned}$$
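As a quick numerical sanity check (the helper names below are ours), the kernel g and the constant c can be evaluated directly; for \(n=3\), c reduces to the familiar \(4\pi \) of three-dimensional electrostatics:

```python
import math

def coulomb_kernel(x, n):
    """Green function g of the Laplacian in R^n (n >= 2), evaluated at x != 0."""
    r = math.sqrt(sum(c * c for c in x))
    return math.log(1 / r) if n == 2 else r ** (2 - n)

def poisson_constant(n):
    """Constant c in -Delta g = c delta_0: 2*pi for n = 2,
    (n - 2)|S^{n-1}| = n(n-2) pi^{n/2} / Gamma(1 + n/2) for n >= 3."""
    if n == 2:
        return 2 * math.pi
    return n * (n - 2) * math.pi ** (n / 2) / math.gamma(1 + n / 2)
```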

The physical interpretation in terms of electrostatics is as follows: \(H_N(x_1,\ldots ,x_N)\) is the electrostatic energy of a configuration of N electrons in \(\mathbb {R}^n\) lying on S at positions \(x_1,\ldots ,x_N\), in an external field given by the potential V. The Green function or Coulomb kernel g expresses the Coulomb repulsion, which is a singular two-body interaction. The probability measure \(P_N\) can be seen as a Boltzmann–Gibbs measure, \(\beta _N\) playing the role of an inverse temperature. The probability measure \(P_N\) is known as a Coulomb gas or as a one-component plasma, see for instance [68] and references therein.

1.2 Log-Gases

A log-gas corresponds to choosing \(d=n\) and a logarithmic interaction W, regardless of the dimension n, namely

$$\begin{aligned} W(x)=\log \frac{1}{{{\left| x \right| }}}=-\frac{1}{2}\log (x_1^2+\cdots +x_d^2),\quad x\in S. \end{aligned}$$

Coulomb gases and log-gases coincide when \(d=n=2\). In dimension \(d=n\ge 3\), log-gases are natural and classical objects of approximation theory and can be seen as limiting Riesz potentials, namely \(\lim _{\alpha \rightarrow 0}\frac{1}{\alpha }(|x|^{-\alpha }-1)\), see for instance [68,69,70].

1.3 Static Energy and Equilibrium Measures

Under natural assumptions on V and W, typically when \(\beta _N\gg N\) and V beats W at infinity, it is well known, see for instance [14, 67] and references therein, that, \(P_N\)-almost surely, the empirical measure

$$\begin{aligned} \mu _N=\frac{1}{N}\sum _{i=1}^N\delta _{x_i} \end{aligned}$$

tends as \(N\rightarrow \infty \) to a non random probability measure, the equilibrium measure

$$\begin{aligned} \mu _*=\arg \inf \mathcal {E}, \end{aligned}$$

the unique minimizer of the strictly convex lower semi-continuous “energy” \(\mathcal {E}\) defined by

$$\begin{aligned} \mu \mapsto \mathcal {E}(\mu )=\int \!V\mathrm {d}\mu +\frac{1}{2}\iint W(x-y)\mu (\mathrm {d}x)\mu (\mathrm {d}y). \end{aligned}$$

When \(W=g\) is the Coulomb kernel, the quantity \(\mathcal {E}(\mu )\) is the electrostatic energy of the distribution of charges \(\mu \), formed by the sum of the potential energy coming from the external electric field V and the Coulomb self-repulsion expressed by means of the Coulomb kernel g. Note that \(\mathcal {E}(\mu )=\infty \) if \(\mu \) has a Dirac mass, due to the singularity of g. An Euler–Lagrange variational analysis reveals that when \(S=\mathbb {R}^d\) and V is smooth, convex, and grows faster than g at infinity, then the equilibrium probability measure \(\mu _*\) is compactly supported and has density proportional to \(\Delta V\), see [14] and references therein. Table 1 gives examples of equilibrium measures in this Coulomb setting. We refer to [33, 44, 65, 67, 68] for old and new potential theory from this analytic point of view. Moreover, quite a few equilibrium measures are known for log-gases beyond Coulomb gases, see for instance [16].

Table 1 Examples of equilibrium measures for Coulomb gases, see [14, 65]

Actually it can be shown that, essentially, if \(\beta _N\gg N\) and V beats g at infinity, then under \({(P_N)}_N\) the sequence of random empirical measures \({(\mu _N)}_N\) satisfies a large deviation principle with speed \(\beta _N\) and good rate function \(\mathcal {E}\), see [3, 14, 30]. Concentration of measure inequalities are also available, see [12] and references therein.

1.4 Two Remarkable Gases from Random Matrix Theory

Let us give a couple of famous gases from random matrix theory that will serve as benchmarks for our algorithm. They correspond to \(n=2\) because the Lebesgue measure on matrices translates, via the Jacobian of the change of variables, into a Vandermonde determinant on the eigenvalues, giving rise to the two-dimensional Coulomb kernel inside the exponential via the identity

$$\begin{aligned} \prod _{i<j}|x_i-x_j|=\exp \left( \sum _{i<j}\log |x_i-x_j|\right) . \end{aligned}$$

Hence the name “log-gases”. A good reference on this subject is [28] and we refer to [21, 24, 28, 29, 39] for more examples of Coulomb gases related to random matrix models. Coulomb gases remain interesting in any dimension n beyond random matrices, see [67, 68].

Beta-Hermite model This model corresponds to

$$\begin{aligned} d=1, \ n=2, \ S={\mathbb {R}}, \ V(x)=\frac{x^2}{2\beta }, \ W(x)=-\log {{\left| x \right| }}, \ \beta _N=N^2\beta , \ \beta \in (0,\infty ). \end{aligned}$$

This means that the particles evolve on the line \(\mathbb {R}\) with Coulomb interactions given by the Coulomb kernel in \(\mathbb {R}^2\). For \(\beta =2\), it becomes the famous Gaussian Unitary Ensemble (GUE), which is the distribution of the eigenvalues of random \(N\times N\) Hermitian matrices distributed according to the Gaussian probability measure with density proportional to \(H\mapsto \mathrm {e}^{-N\mathrm {Tr}(H^2)}\). Beyond the case \(\beta =2\), the cases \(\beta =1\) and \(\beta =4\) correspond respectively to Gaussian random matrices with real and quaternionic entries. Following [21], for all \(\beta \in (0,\infty )\), the measure \(P_N\) is also the distribution of the eigenvalues of special random \(N\times N\) real symmetric tridiagonal matrices with independent but non identically distributed entries. Back to the case \(\beta =2\), the law \(P_N\) has density proportional to

$$\begin{aligned} (x_1,\ldots ,x_N)\in \mathbb {R}^N\mapsto \mathrm {e}^{-\frac{N}{2}\sum _{i=1}^N x_i^2}\prod _{i<j}(x_i-x_j)^2. \end{aligned}$$
(1.2)

In this case, the Coulomb gas \(P_N\) has a determinantal structure, making it integrable or exactly solvable for any \(N \ge 2\), see [28, 57]. This provides in particular a formula for the density of the mean empirical spectral distribution \(\mathbb {E}\mu _N\) under \(P_N\), namely

$$\begin{aligned} x\in \mathbb {R}\mapsto \frac{\mathrm {e}^{-\frac{N}{2}x^2}}{\sqrt{2\pi N}} \sum _{\ell =0}^{N-1}H_\ell ^2(\sqrt{N}x), \end{aligned}$$
(1.3)

where \({(H_\ell )}_{\ell \ge 0}\) are the Hermite polynomials which are the orthonormal polynomials for the standard Gaussian distribution \(\mathcal {N}(0,1)\). The equilibrium measure \(\mu _*\) in this case is the Wigner semicircle distribution with the following density with respect to the Lebesgue measure:

$$\begin{aligned} x\in \mathbb {R}\mapsto \frac{\sqrt{4-x^2}}{2\pi }\mathbf {1}_{x\in [-2,2]}. \end{aligned}$$
(1.4)

A plot of \(\mu _*\) and \(\mathbb {E}\mu _N\) is provided in Fig. 1, together with our simulations. We refer to [46] for a direct proof of convergence of (1.3)–(1.4) as \(N\rightarrow \infty \). Beyond the case \(\beta =2\), the equilibrium measure \(\mu _*\) is still a Wigner semicircle distribution, scaled by the factor \(\sqrt{\beta /2}\) and thus supported by the interval \([-\sqrt{2\beta },\sqrt{2\beta }]\), but to our knowledge we do not have a formula for the mean empirical spectral distribution \(\mathbb {E}\mu _N\), except when \(\beta \) is an even integer, see [21].
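For finite N, the density (1.3) can be evaluated stably with the three-term recurrence \(H_{\ell }(u)=\bigl (uH_{\ell -1}(u)-\sqrt{\ell -1}\,H_{\ell -2}(u)\bigr )/\sqrt{\ell }\) satisfied by the orthonormal Hermite polynomials; a sketch (function names are ours):

```python
import numpy as np

def gue_mean_density(x, N):
    """Density (1.3) of the mean empirical spectral distribution of the GUE:
    the sum of squared orthonormal Hermite polynomials (orthonormal for the
    standard Gaussian), damped by the Gaussian weight exp(-N x^2 / 2)."""
    u = np.sqrt(N) * np.asarray(x, dtype=float)
    h_prev, h = np.zeros_like(u), np.ones_like(u)    # H_{-1} = 0, H_0 = 1
    s = h**2
    for l in range(1, N):                            # H_l from H_{l-1} and H_{l-2}
        h_prev, h = h, (u * h - np.sqrt(l - 1) * h_prev) / np.sqrt(l)
        s += h**2
    return np.exp(-N * x**2 / 2) / np.sqrt(2 * np.pi * N) * s

def semicircle(x):
    """Limiting density (1.4), the Wigner semicircle distribution on [-2, 2]."""
    return np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi)
```

For \(N=1\) the sum reduces to \(H_0^2=1\) and the density is the standard Gaussian, as it should be.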

Beta-Ginibre model This model corresponds to

$$\begin{aligned} d=2, \ n=2, \ S={\mathbb {R}}^2, \ V(x)=\frac{|x|^2}{\beta }, \ W(x)=-\log {{\left| x \right| }}, \ \beta _N=N^2\beta , \ \beta \in (0,\infty ). \end{aligned}$$

In this case, the particles move in \(\mathbb {R}^2\) with a Coulomb repulsion of dimension 2; it is therefore a Coulomb gas. As for the GUE, the law \(P_N\) has density proportional to

$$\begin{aligned} (x_1,\ldots ,x_N)\in (\mathbb {R}^2)^N \mapsto \mathrm {e}^{-N\sum _{i=1}^N|x_i|^2}\prod _{i<j}|x_i-x_j|^\beta . \end{aligned}$$
(1.5)

When \(\beta =m\) for an even integer \(m\in \{2,4,\ldots \}\), the law of this gas matches the Laughlin wavefunction modeling the fractional quantum Hall effect (FQHE), see for instance [26].

For \(\beta =2\), this gas, known as the complex Ginibre Ensemble, matches the distribution of the eigenvalues of random \(N\times N\) complex matrices distributed according to the Gaussian probability measure with density proportional to \(M\mapsto \mathrm {e}^{-N\mathrm {Tr}(MM^*)}\) where \(M^*=\overline{M}^\top \). In this case \(P_N\) has a determinantal structure, see [28, 57]. This provides a formula for the density of the mean empirical spectral distribution \(\mathbb {E}\mu _N\) under \(P_N\), namely

$$\begin{aligned} x\in \mathbb {R}^2\mapsto \frac{\mathrm {e}^{-N|x|^2}}{\pi }\sum _{\ell =0}^{N-1}\frac{|\sqrt{N}x|^{2\ell }}{\ell !}, \end{aligned}$$
(1.6)

which is the analogue of (1.3) for the Gaussian Unitary Ensemble. Moreover, if \(Y_1,\ldots ,Y_N\) are independent and identically distributed Poisson random variables of mean \(|x|^2\) for some \(x\in \mathbb {R}^2\), then (1.6) reads

$$\begin{aligned} x\in \mathbb {R}^2\mapsto \frac{1}{\pi }\mathbb {P}\left( \frac{Y_1+\cdots +Y_N}{N}<1\right) . \end{aligned}$$

As \(N\rightarrow \infty \), by the law of large numbers, it converges to \(1/\pi \) if \(|x|<1\) and to 0 if \(|x|>1\), while by the central limit theorem it converges to \(1/(2\pi )\) if \(|x|=1\). It follows that \(\mathbb {E}\mu _N\) converges weakly as \(N\rightarrow \infty \) to the uniform distribution on the disk, with density

$$\begin{aligned} x\in \mathbb {R}^2 \mapsto \frac{\mathbf {1}_{|x|<1}}{\pi }, \end{aligned}$$
(1.7)

which is the equilibrium measure \(\mu _*\). When N is finite, the numerical evaluation of (1.6) is better done by means of the Gamma law. Namely, by induction and integration by parts, (1.6) can be rewritten as

$$\begin{aligned} x\in \mathbb {R}^2 \mapsto \frac{1}{\pi (N-1)!}\int _{N|x|^2}^\infty u^{N-1}\mathrm {e}^{-u}\mathrm {d}u =\frac{\Gamma (N,N|x|^2)}{\pi }, \end{aligned}$$

where \(\Gamma \) is the normalized incomplete Gamma function and where we used the identity

$$\begin{aligned} \mathrm {e}^{-r}\sum _{\ell =0}^{N-1}\frac{r^\ell }{\ell !} = \frac{1}{(N-1)!}\int _r^\infty u^{N-1}\mathrm {e}^{-u}\mathrm {d}u. \end{aligned}$$

Note that \(t\mapsto 1-\Gamma (N,t)\) is the cumulative distribution function of the Gamma distribution with shape parameter N and scale parameter 1. Figure 4 illustrates the difference between the limiting distribution (1.7) and the mean empirical spectral distribution (1.6) for a finite N. Beyond the case \(\beta =2\), we no longer have a formula for the density of \(\mathbb {E}\mu _N\), but a simple scaling argument reveals that the equilibrium measure \(\mu _*\) is in this case the uniform distribution on the centered disk of radius \(\sqrt{\beta /2}\).
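Numerically, the Gamma-law expression is indeed preferable to the truncated exponential series in (1.6), which suffers from cancellation for large \(N|x|^2\); assuming SciPy is available, it is a single call to the regularized upper incomplete Gamma function (the function name below is ours):

```python
import numpy as np
from scipy.special import gammaincc  # regularized upper incomplete Gamma Q(a, x)

def ginibre_mean_density(r, N):
    """Density (1.6) of the mean empirical spectral distribution of the complex
    Ginibre ensemble at radius |x| = r, evaluated as Gamma(N, N r^2) / pi."""
    return gammaincc(N, N * np.asarray(r, dtype=float) ** 2) / np.pi
```

As N grows, this profile tends to \(1/\pi \) for \(r<1\), to 0 for \(r>1\), and to \(1/(2\pi )\) at \(r=1\), in agreement with the law-of-large-numbers and central-limit arguments above.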

2 Simulating Log-Gases and Coulomb Gases

Regarding the simulation of log-gases or Coulomb gases such as (1.1), it is natural to use the random matrix models when they are available. There exist also methods specific to determinantal processes, which cover the log-gases of random matrix theory with \(\beta =2\), see [2, 18, 32, 37, 45, 59, 66]. Beyond these specially structured cases, a great variety of methods are available for simulating Boltzmann–Gibbs measures, such as the overdamped Langevin diffusion algorithm, the Metropolis–Hastings algorithm, the Metropolis adjusted Langevin algorithm (MALA), and kinetic versions called Hybrid or Hamiltonian Monte Carlo (HMC), which are based on a kinetic (or underdamped) Langevin diffusion, see for instance [10, 52]. Other possibilities exist, such as Nosé–Hoover dynamics [40] or piecewise deterministic Markov processes [9].

Two difficulties arise when sampling measures such as (1.1). First, the Hamiltonian \(H_N\) involves all pairs of particles, so the computation of forces and energies scales quadratically with the number of particles. A natural way to circumvent this numerical problem is to use clustering procedures such as the “fast multipole methods”, see for instance [35]. A second difficult feature of such a Hamiltonian is the singularity of the interaction W, which typically results in numerical instability. A standard stabilization procedure is to “tame” the dynamics [11, 38], which is the strategy adopted in [55]. However, this smoothing of the force induces a supplementary bias in the invariant measure, as shown in [11] for regular Hamiltonians. This requires using small time steps, hence long computations. In the present note, we explore for the first time the use of HMC for general Coulomb gases in the context of random matrices, in the spirit of [71], the difficulty being the singularity of the interaction. This method has the advantage of sampling the exact invariant measure (1.1), while allowing large time steps, which reduces the overall computational cost [27].

In Sect. 2.1, we review standard methods for sampling measures of the form \(\mathrm {e}^{-\beta _N H_N}\), before presenting in detail the HMC algorithm in Sect. 2.2.

2.1 Standard Sampling Methods

From now on, to simplify, we suppose that the support set S in (1.1) is \(\mathbb {R}^d\). We introduce the methods based on the overdamped Langevin dynamics. To sample (1.1) approximately, the idea is to exploit the fact that \(P_N\) in (1.1) is the reversible invariant probability measure of the Markov diffusion process \({(X_t)}_{t\ge 0}\) solution to the stochastic differential equation:

$$\begin{aligned} \mathrm {d}X_t=-\alpha _N\nabla H_N(X_t)\,\mathrm {d}t+\sqrt{2\frac{\alpha _N}{\beta _N}}\,\mathrm {d}B_t, \end{aligned}$$
(2.1)

or in other words

$$\begin{aligned} X_t=X_0-\alpha _N\int _0^t\nabla H_N(X_s)\,\mathrm {d}s+\sqrt{2\frac{\alpha _N}{\beta _N}}B_t, \end{aligned}$$

where \({(B_t)}_{t\ge 0}\) is a standard Brownian motion on \(S^N\) and \(\alpha _N>0\) is an arbitrary time scaling parameter (for instance \(\alpha _N=1\) or \(\alpha _N=\beta _N\)). The infinitesimal generator associated with (2.1) is

$$\begin{aligned} Lf=\frac{\alpha _N}{\beta _N}\Delta f -\alpha _N\nabla H_N\cdot \nabla f. \end{aligned}$$

The difficulty in solving (2.1) lies in the fact that the energy \(H_N\) involves a singular interaction W, which may lead the process to explode. Actually, under certain conditions on \(\beta _N\) and V, Eq. (2.1) is well posed, the process \({(X_t)}_{t\ge 0}\) is well defined, and

$$\begin{aligned} X_t \, \underset{t\rightarrow \infty }{\overset{\mathrm {Law}}{\longrightarrow }} \, P_N, \end{aligned}$$

for any non-degenerate initial condition \(X_0\). See for instance [1, 13, 25] for the Beta-Hermite case, known as the Dyson Ornstein–Uhlenbeck process, and [6] for the Beta-Ginibre case. We do not discuss these delicate aspects in this note. A convergence in Cesàro mean is provided by the ergodic theorem for additive functionals,

$$\begin{aligned} \frac{1}{t}\int _0^t\delta _{X_s}\, \mathrm {d}s\, \underset{t\rightarrow \infty }{\overset{\mathrm {weak}}{\longrightarrow }} \, P_N \end{aligned}$$

almost surely or, for any test function \(f\in L^1(P_N)\),

$$\begin{aligned} \frac{1}{t} \int _0^tf(X_s)\,\mathrm {d}s\, \underset{t\rightarrow \infty }{\longrightarrow }\, \int _{S^N} f \, \mathrm {d}P_N, \end{aligned}$$

almost surely. It is also possible to accelerate the convergence by adding a divergence free term in the dynamics (2.1), see for instance [22, 49] and references therein. This modification keeps the same invariant distribution but produces a non-reversible dynamics.

This method of simulation is referred to as an “unadjusted Langevin algorithm”, a terminology which will be clarified later on. In practice, one cannot simulate the continuous stochastic process \({(X_t)}_{t\ge 0}\) solution to (2.1), and resorts to a numerical integration with a finite time step \(\Delta t\). A typical choice is the Euler–Maruyama scheme [42, 58], which reads

$$\begin{aligned} x_{k+1} = x_k -\nabla H_N(x_k)\alpha _N\Delta t+ \sqrt{2\frac{\alpha _N}{\beta _N}\Delta t}G_k, \end{aligned}$$
(2.2)

where \((G_k)\) is a family of independent and identically distributed standard Gaussian variables, and \(x_k\) is an approximation of \(X_{k\Delta t}\). Note that \(\alpha _N\) and \(\Delta t\) play the same role here. However, because of the singularity of \(H_N\), this sampling scheme leads to significant biases in practice, and (2.2) may even lack an invariant measure [56, Sect. 6]. One way to stabilize the dynamics is to use a tamed version of (2.2), which typically takes the following form:

$$\begin{aligned} x_{k+1} = x_k - \frac{\nabla H_N(x_k)\alpha _N\Delta t}{1+|\nabla H_N(x_k)|\alpha _N\Delta t} + \sqrt{2\frac{\alpha _N}{\beta _N}\Delta t}G_k. \end{aligned}$$
(2.3)

This strategy is used in [55] but, as noted by the authors, the number of time steps needed to run a trajectory of fixed time T scales as \(\Delta t\sim N^{-2}\), which makes the study of large systems difficult.
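For concreteness, one step of the tamed scheme (2.3) differs from the Euler–Maruyama update (2.2) only by the rescaling of the drift; a sketch, with `grad_H` standing for \(\nabla H_N\) (assumed given, names are ours):

```python
import numpy as np

def tamed_euler_step(x, grad_H, alpha, beta, dt, rng):
    """One step of (2.3): the drift is rescaled by 1 / (1 + |grad_H(x)| alpha dt),
    so that a particle close to the singularity cannot make an arbitrarily
    large move; the displacement due to the drift is always of norm < 1."""
    drift = grad_H(x) * alpha * dt
    tamed = drift / (1 + np.linalg.norm(drift))
    return x - tamed + np.sqrt(2 * alpha * dt / beta) * rng.standard_normal(x.shape)
```

For small \(|\nabla H_N|\alpha \Delta t\) the step is close to the plain Euler–Maruyama update, while for large forces the drift saturates, which is the stabilization mechanism, at the price of the bias discussed above.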

Another strategy is to add a selection step at each iteration. This is the idea of the Metropolis Adjusted (overdamped) Langevin Algorithm (MALA) [63], which prevents irrelevant moves with a Metropolis step. One can also view the MALA algorithm as a Metropolis algorithm in which the proposal is produced by one step of a discretization of the Langevin dynamics (2.1). Let us make this precise; more details can be found e.g. in [61, 63].

Algorithm 2.1

(Metropolis Adjusted (overdamped) Langevin Algorithm—MALA) Let K be the Gaussian transition kernel associated to the Markov chain of the Euler discretization (2.2) of the dynamics (2.1). For each step k,

  • draw a proposal \({\tilde{x}}_{k+1}\) according to the kernel \(K(x_k,\cdot )\),

  • compute the probability

    $$\begin{aligned} p_k=1\wedge \frac{K({\tilde{x}}_{k+1},x_k)\mathrm {e}^{-\beta _N H_N({\tilde{x}}_{k+1})}}{K(x_{k},{\tilde{x}}_{k+1})\mathrm {e}^{-\beta _N H_N(x_{k})} }, \end{aligned}$$
    (2.4)
  • set

    $$\begin{aligned} x_{k+1} ={\left\{ \begin{array}{ll} {\tilde{x}}_{k+1} &{} \text {with probability }p_k;\\ x_k &{} \text {with probability }1-p_k. \end{array}\right. } \end{aligned}$$

Note that the “reversed” kernel \(K(\cdot ,x)\) is Gaussian only if \(H_N\) is a quadratic form. Note also that if the proposal kernel K is symmetric, in the sense that \(K(x,y)=K(y,x)\) for all x, y, then it disappears in (2.4); it turns out that this is the case for the Hybrid Monte Carlo algorithm described next (up to momentum reversal)!
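A generic sketch of one MALA iteration, written for an arbitrary target through its unnormalized log-density (helper names are ours; taking \(\log \pi =-\beta _NH_N\) and \(dt=\alpha _N\Delta t/\beta _N\) recovers the scheme of the text):

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, dt, rng):
    """One iteration of Algorithm 2.1: Gaussian Euler proposal followed by the
    Metropolis correction (2.4), computed in log scale for stability."""
    def log_K(y, x):
        # log density (up to a constant) of the proposal y given x:
        # Gaussian with mean x + dt * grad_log_pi(x) and variance 2 * dt
        return -np.sum((y - x - dt * grad_log_pi(x)) ** 2) / (4 * dt)

    prop = x + dt * grad_log_pi(x) + np.sqrt(2 * dt) * rng.standard_normal(x.shape)
    log_ratio = log_pi(prop) - log_pi(x) + log_K(x, prop) - log_K(prop, x)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True      # proposal accepted
    return x, False            # proposal rejected, stay at x
```

Note how the two `log_K` terms implement the kernel ratio of (2.4); for a symmetric proposal they would cancel.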

A natural issue with these algorithms is the choice of \(\Delta t\): if it is too large, an important fraction of the proposed moves will be rejected, hence poor convergence properties; conversely, if \(\Delta t\) is too small, many steps will be accepted but the physical elapsed time will be small, hence a large variance for a fixed number of iterations. This algorithm actually has a nice scaling of the optimal time step \(\Delta t\) with the dimension of the system. Indeed, it can be shown that it scales as \(\Delta t\sim N^{-\frac{1}{3}}\), at least for product measures (see [62] and references therein). Although this algorithm is already efficient, we propose to use a kinetic version with further advantages.

2.2 Hybrid Monte Carlo Algorithm

Hybrid Monte Carlo is built on Algorithm 2.1, but using a kinetic version of (2.1). For this, a momentum variable is introduced so as to improve the exploration of the space. Namely, set \(E=\mathbb {R}^{dN}\), and let \(U_N:E\rightarrow \mathbb {R}\) be smooth and such that \(\mathrm {e}^{-\beta _N U_N}\) is Lebesgue integrable. Let \({(X_t,Y_t)}_{t\ge 0}\) be the diffusion process on \(E\times E\) solution to the stochastic differential equation

$$\begin{aligned} \left\{ \begin{aligned} \mathrm {d}X_t&=\alpha _N\nabla U_N(Y_t)\,\mathrm {d}t,\\ \mathrm {d}Y_t&\displaystyle =-\alpha _N\nabla H_N(X_t)\,\mathrm {d}t-\gamma _N\alpha _N\nabla U_N(Y_t)\,\mathrm {d}t+\sqrt{2\frac{\gamma _N\alpha _N}{\beta _N}}\,\mathrm {d}B_t, \end{aligned} \right. \end{aligned}$$
(2.5)

where \({(B_t)}_{t\ge 0}\) is a standard Brownian motion on E, and \(\gamma _N>0\) is an arbitrary parameter which plays the role of a friction, and which may depend a priori on N and \((X_t)_{t\ge 0}\), even if we do not use this possibility here. In addition, \(H_N\) and \(\beta _N\) are as in (1.1), while \(U_N\) plays the role of a generalized kinetic energy [71]. This dynamics admits the following generator:

$$\begin{aligned} Lf= \underbrace{-\alpha _N\nabla H_N(x)\cdot \nabla _yf+\alpha _N\nabla U_N(y)\cdot \nabla _x f}_{L_1} + \underbrace{\frac{\gamma _N\alpha _N}{\beta _N}\Delta _yf-\gamma _N\alpha _N\nabla U_N(y)\cdot \nabla _yf}_{L_2} \end{aligned}$$
(2.6)

where \(L_1\) is known as the Hamiltonian part while \(L_2\) is called the fluctuation-dissipation part. The dynamics leaves invariant the product Boltzmann–Gibbs measure

$$\begin{aligned} R_N=P_N\otimes Q_N \quad \text {where}\quad Q_N(\mathrm {d}y)=\frac{\mathrm {e}^{-\beta _NU_N(y)}}{Z_N'}\mathrm {d}y, \end{aligned}$$

see for instance [71]. In other words

$$\begin{aligned} R_N(\mathrm {d}x,\mathrm {d}y)=\frac{\mathrm {e}^{-\beta _N\widetilde{H}_N(x,y)}}{Z_NZ_N'}\mathrm {d}x\, \mathrm {d}y \quad \text {with}\quad \widetilde{H}_N(x,y)=H_N(x)+U_N(y). \end{aligned}$$
(2.7)

As for the overdamped dynamics, the ergodic theorem for additive functionals gives

$$\begin{aligned} \frac{1}{t}\int _0^t\delta _{(X_s,Y_s)}\, \mathrm {d}s\, \underset{t\rightarrow \infty }{\overset{\mathrm {weak}}{\longrightarrow }}\, R_N \quad \text {almost surely.} \end{aligned}$$

Remark 2.2

(Terms: Hamiltonian, Langevin, overdamped, underdamped, kinetic) The dynamics (2.5) is called “Hamiltonian” when we turn off the noise by taking \(\gamma _N=0\). On the other hand, when \(\gamma _N\rightarrow \infty \) and \(\alpha _N\rightarrow 0\) with \(\alpha _N\gamma _N=1\), we recover (2.1) from (2.5) with \(Y_t\) and \(U_N\) instead of \(X_t\) and \(H_N\). Both (2.1) and (2.5) are known as Langevin dynamics. To be more precise, (2.1) is generally called overdamped while (2.5) is referred to as kinetic or underdamped.

When \(U_N(y)=\frac{1}{2}|y|^2\) then \(Y_t=\mathrm {d}X_t/\mathrm {d}t\), and in this case \(X_t\) and \(Y_t\) can be interpreted respectively as the position and the velocity of a system of N points in S at time t. In this case we say that \(U_N\) is the kinetic energy. For simplicity, we specialize in what follows to this “physical” or “kinetic” case and refer to [71] for more possibilities.

As before, to simulate \({(X_t,Y_t)}_{t\ge 0}\), one can discretize (2.5) and sample from a trajectory. This will provide a proposal for the HMC scheme as the Euler discretization (2.2) did for Algorithm 2.1. A good way of doing this is a splitting procedure. First, one integrates the Hamiltonian part i.e. the operator \(L_1\) in (2.6), which amounts to a standard Hamiltonian dynamics, before integrating the fluctuation-dissipation part i.e. the operator \(L_2\) in (2.6). For discretizing the Hamiltonian dynamics over a time step, a standard approach is the Verlet integrator [31, 50], which we describe now. For a time step \(\Delta t>0\), this scheme reads, starting from a state \((x_k,y_k)\) at time k:

$$\begin{aligned} \left\{ \begin{aligned} y_{k+\frac{1}{2}}&= y_k -\nabla H_N(x_k)\alpha _N\frac{\Delta t}{2}, \\ x_{k+1}&= x_k +y_{k+\frac{1}{2}}\alpha _N\Delta t, \\ {\tilde{y}}_{k+1}&= y_{k+\frac{1}{2}}-\nabla H_N(x_{k+1})\alpha _N\frac{\Delta t}{2}. \end{aligned} \right. \end{aligned}$$

This corresponds to updating the velocity over half a time step, then the positions over a time step, and again the velocity over half a time step. Given that this scheme only corresponds to the Hamiltonian part, it remains to integrate the fluctuation-dissipation part, corresponding to \(L_2\) in (2.6). For quadratic energies, it is a simple Ornstein–Uhlenbeck process whose variance can be computed explicitly. Therefore, we add to the previous scheme the following velocity update, which comes from the Mehler formula:

$$\begin{aligned} y_{k+1}= \eta {\tilde{y}}_{k+1} +\sqrt{\frac{1-\eta ^2}{\beta _N}}G_k, \quad \eta =\mathrm {e}^{-\gamma _N\alpha _N\Delta t}, \end{aligned}$$

where \(G_k\) is a standard Gaussian random variable. Like the numerical scheme (2.2), because of the singularity of the interactions, this integrator may not have an invariant measure [56], or its invariant measure may be a poor approximation of \(R_N\) depending on the time step [48]. Note that, here again, \(\alpha _N\) and \(\Delta t\) play the same role.
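Putting the two substeps together, one full time step of this splitting (Verlet for the Hamiltonian part \(L_1\), exact Ornstein–Uhlenbeck via the Mehler formula for \(L_2\)) can be sketched as follows, with kinetic energy \(U_N(y)=\frac{1}{2}|y|^2\) and `grad_H` standing for \(\nabla H_N\) (assumed given, names are ours):

```python
import numpy as np

def kinetic_langevin_step(x, y, grad_H, alpha, beta, gamma, dt, rng):
    """One step of the splitting: Verlet (half kick, drift, half kick) for the
    Hamiltonian part, then the exact Ornstein-Uhlenbeck (Mehler) velocity
    update for the fluctuation-dissipation part.  Sketch with U_N(y) = |y|^2/2."""
    y = y - grad_H(x) * alpha * dt / 2          # half kick
    x = x + y * alpha * dt                      # drift
    y = y - grad_H(x) * alpha * dt / 2          # half kick
    eta = np.exp(-gamma * alpha * dt)           # exact OU damping factor
    y = eta * y + np.sqrt((1 - eta**2) / beta) * rng.standard_normal(y.shape)
    return x, y
```

Taking \(\gamma _N=0\) gives \(\eta =1\), the noise vanishes, and the step reduces to a pure Verlet integration of the Hamiltonian dynamics, which nearly conserves the energy \(\widetilde{H}_N\).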

Hybrid or Hamiltonian Monte Carlo (HMC) methods, built on the latter integration, appeared in theoretical physics in lattice quantum chromodynamics with [19], see also [64], and are still actively studied in applied mathematics, see for instance [4, 8, 17, 23, 34, 50, 71] and references therein. The HMC algorithm can be thought of in a sense as a special Metropolis Adjusted (underdamped) Langevin Algorithm. Indeed, inspired by the MALA Algorithm 2.1, a way to avoid the stability problem of the discretization of the kinetic Langevin dynamics mentioned above is to add an acceptance-rejection step. A surprising advantage of this approach is that the Verlet integration scheme is time reversible up to momenta reversal [50, Sect. 2.1.3 and Eq. (2.11)], hence when computing the acceptance probability as in (2.4), the transition kernel does not appear. Note that the Verlet algorithm has been widely used for years by statistical physicists, and goes back to the historical works of Verlet [73] and Levesque and Verlet [53, 54]. Let us now describe the algorithm.

Fig. 1

Study of the Gaussian unitary ensemble with \(N=8\) (top) and \(N=50\) (bottom). The solid line is the plot of the limiting spectral distribution (1.4) while the dashed line is the plot of the mean empirical distribution (1.3). The bars form the histogram of simulations obtained using our HMC algorithm. This algorithm was run once with final time \(T=10^6\) and time step \(\Delta t=0.5\). The histogram was produced by looking at the last half of the trajectory and retaining the positions every 1000 time steps, producing n values, namely \(\approx ~8\times 10^3\) and \(\approx ~5\times 10^4\) respectively

Fig. 2

Evolution of the rejection rate in Algorithm 2.3 as \(\Delta t\) goes to zero, for the Gaussian unitary ensemble with \(N=50\), \(\beta =2\) and \(T=10^{5}\) (in log–log coordinates)

Fig. 3

Study of the quartic confinement with \(N=8\) (top) and \(N=50\) (bottom). The solid line is the plot of the limiting spectral distribution (1.4). The bars form the histogram of simulations obtained using our HMC algorithm. This algorithm was run once with final time \(T=10^6\) and time step \(\Delta t=0.5\). The histogram was produced by looking at the last half of the trajectory and retaining the positions every 1000 time steps, producing n values, namely \(\approx ~8\times 10^3\) and \(\approx ~5\times 10^4\) respectively. We do not have a formula for the mean empirical distribution for this model. This gas describes the law of the eigenvalues of a random symmetric tridiagonal matrix model but its entries are not independent, see [43, Prop. 2]

Fig. 4

Study of the complex Ginibre ensemble with \(N=8\) (top) and \(N=50\) (bottom). The solid line is the plot of the limiting spectral distribution (1.7) while the dashed line is the plot of the mean empirical distribution (1.6), both as functions of the radius |z| and scaled by \(2\pi \) (in order to obtain a radial density). The bars form the histogram of simulations obtained using our HMC algorithm. This algorithm was run 40 times with final time \(T=10^5\) and time step \(\Delta t=0.1\). The histogram was produced by looking at the last halves of the 40 trajectories and retaining the positions every 10000 time steps, producing n values, namely \(\approx ~16\times 10^3\) and \(\approx ~10^5\) respectively

Fig. 5

Evolution of the energy difference in Algorithm 2.3 as \(\Delta t\) goes to zero, for the complex Ginibre ensemble with \(N=50\), \(\beta =2\) and \(T=10^{3}\) (in log–log coordinates)

Fig. 6

Study of the fluctuation of the largest particle in modulus for the \(\beta \) complex Ginibre ensemble with \(N=50\), in the cases \(\beta \in \{1,2,4\}\). The solid line is the plot of the fit with a translation-scale Gumbel distribution. The Gumbel fluctuation is proved only in the case \(\beta =2\), see [15, 60]. These simulations support the conjecture that the Gumbel fluctuation is valid for any \(\beta >0\). The simulations match the edge of the support at \(\sqrt{\beta /2}\) quite well and suggest that the variance is not very sensitive to \(\beta \)

Fig. 7

Study of the 3D Coulomb gas (top) and 3D log-gas (bottom) with Euclidean confinement, \(\beta =2\) and \(N=50\). The solid line is the equilibrium measure and the bars form the histogram obtained with our HMC algorithm, with the same simulation parameters as for Fig. 4. In contrast with the GUE case and the Ginibre case, we do not have a formula for the mean empirical distribution at fixed N in either case, and for the log-gas (bottom) the equilibrium measure is not even known

Algorithm 2.3

[HMC] Start from a configuration \((x_0,y_0)\) and perform the following steps for each time \(k\ge 0\):

  (1)

    update the velocities with

    $$\begin{aligned} {\tilde{y}}_{k}=\eta y_{k}+\sqrt{\frac{1-\eta ^2}{\beta _N}} G_k, \quad \eta = \mathrm {e}^{-\gamma _N\alpha _N\Delta t}; \end{aligned}$$
  (2)

    run one step of the Verlet scheme:

    $$\begin{aligned} \left\{ \begin{aligned} {\tilde{y}}_{k+\frac{1}{2}}&={\tilde{y}}_k -\nabla H_N(x_k)\alpha _N\frac{\Delta t}{2};\\ {\tilde{x}}_{k+1}&=x_k +{\tilde{y}}_{k+\frac{1}{2}}\alpha _N\Delta t;\\ {\tilde{y}}_{k+1}&={\tilde{y}}_{k+\frac{1}{2}}-\nabla H_N({\tilde{x}}_{k+1})\alpha _N\frac{\Delta t}{2};\\ \end{aligned} \right. \end{aligned}$$
    (2.8)
  (3)

    compute the probability ratio

    $$\begin{aligned} p_k = 1\wedge \exp \left[ -\beta _N\Big (H_N({\tilde{x}}_{k+1}) + \frac{{\tilde{y}}_{k+1}^2}{2} - H_N(x_{k}) - \frac{{\tilde{y}}_{k}^2}{2} \Big )\right] ; \end{aligned}$$
  (4)

    set

    $$\begin{aligned} (x_{k+1},y_{k+1}) = {\left\{ \begin{array}{ll} ({\tilde{x}}_{k+1},{\tilde{y}}_{k+1}) &{} \text {with probability }p_k;\\ (x_{k},-{\tilde{y}}_{k}) &{} \text {with probability }1-p_k. \end{array}\right. } \end{aligned}$$
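For concreteness, one iteration of steps (1)–(4) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: `H` and `grad_H` stand for the model-dependent Hamiltonian \(H_N\) and its gradient, and the parameter names `alpha_N`, `gamma_N`, `beta_N` are our own rendering of the notation above.

```python
import numpy as np

def hmc_step(x, y, H, grad_H, dt, beta_N, alpha_N=1.0, gamma_N=1.0, rng=None):
    """One iteration of Algorithm 2.3; H and grad_H are model-dependent callables."""
    rng = np.random.default_rng() if rng is None else rng
    # (1) partial refreshment of the velocities (Ornstein-Uhlenbeck step)
    eta = np.exp(-gamma_N * alpha_N * dt)
    y_tilde = eta * y + np.sqrt((1.0 - eta**2) / beta_N) * rng.standard_normal(y.shape)
    # (2) one step of the Verlet scheme (2.8)
    y_half = y_tilde - 0.5 * alpha_N * dt * grad_H(x)
    x_prop = x + alpha_N * dt * y_half
    y_prop = y_half - 0.5 * alpha_N * dt * grad_H(x_prop)
    # (3) Metropolis ratio for the total energy H_N(x) + |y|^2/2
    dE = H(x_prop) + 0.5 * np.sum(y_prop**2) - H(x) - 0.5 * np.sum(y_tilde**2)
    p = min(1.0, np.exp(-beta_N * dE))
    # (4) accept the proposal, or keep x and flip the refreshed velocity
    if rng.uniform() < p:
        return x_prop, y_prop
    return x, -y_tilde
```

A cheap correctness check: with a quadratic confinement and no interaction, \(H(x)=|x|^2/2\) and \(\beta _N=1\), the chain should sample a standard Gaussian in each coordinate.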

As noted in the various references above, the Metropolis step acts as a corrector on the energy conservation of the Hamiltonian step. As such, it helps avoid irrelevant moves, while the velocity variable enhances the exploration capabilities of the dynamics. A more precise argument in favor of this algorithm is the scaling of the time step \(\Delta t\) with respect to the system size N. Indeed, as shown in [4] for product measures, the optimal scaling is \(\Delta t\sim N^{-\frac{1}{4}}\), which makes the algorithm appealing for large systems. Since evaluating the Hamiltonian and its gradient costs \(\mathcal {O}(N^2)\), the cost of the algorithm over a fixed physical time T, that is \(\lceil T/\Delta t\rceil \) iterations, is in \(\mathcal {O}(N^\frac{9}{4})\), which has to be compared to the \(\mathcal {O}(N^4)\) cost reached in [55]. Finally, the parameter \(\gamma _N\) can also be tuned in order to optimize the speed of convergence; we leave this point aside here and stick to \(\gamma _N=1\).

The control of the error or rate of convergence for the HMC algorithm is the subject of active research, see for instance [47] and [7, 23] for some results under structural assumptions.

From a practical point of view, the algorithm can be tested in the following way. First, when only the Hamiltonian part of the dynamics is integrated with the Verlet scheme (2.8), it can be checked that the energy variation over one time step scales as \(\Delta t^3\) as \(\Delta t\rightarrow 0\). Then, if the selection step is added, the rejection rate should also scale as \(\Delta t^3\). When the momentum resampling is added, this rejection rate scaling should not change. For completeness, we illustrate some of these facts in Sect. 3.
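The first of these checks can be reproduced on a toy Hamiltonian. The snippet below (our own sketch, not the paper's code) applies one Verlet step (2.8) to \(H(x)=x^2/2\) for several values of \(\Delta t\) and fits the slope of the energy error in log–log scale, which should be close to 3.

```python
import numpy as np

def verlet_energy_error(dt, x0=1.0, y0=1.0):
    """Energy change over a single Verlet step (2.8) for H(x) = x^2 / 2."""
    H = lambda x, y: 0.5 * x**2 + 0.5 * y**2
    y_half = y0 - 0.5 * dt * x0   # grad H(x) = x
    x1 = x0 + dt * y_half
    y1 = y_half - 0.5 * dt * x1
    return abs(H(x1, y1) - H(x0, y0))

dts = np.array([0.1, 0.05, 0.025, 0.0125])
errs = np.array([verlet_energy_error(dt) for dt in dts])
# linear fit of log-error against log-dt; the slope should be close to 3
slope = np.polyfit(np.log(dts), np.log(errs), 1)[0]
```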

3 Numerical Experiments on Remarkable Models

In this section, we test Algorithm 2.3 on the two cases described in Sect. 1.4. Since the mean empirical distributions are known for any \(N\ge 2\), we are able to compare our results accurately with the expected ones. We also consider models for which the empirical spectral distribution or the equilibrium distribution is not known. We recall that when \(S=\mathbb {R}^d\) with \(d\ge 1\), the following formulas hold in any dimension:

$$\begin{aligned} \nabla {{\left| x \right| }}^2=2x,\quad \nabla \log \frac{1}{{{\left| x \right| }}} =-\dfrac{x}{{{\left| x \right| }}^2},\quad \nabla \frac{1}{{{\left| x \right| }}}=-\dfrac{x}{{{\left| x \right| }}^3}. \end{aligned}$$
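These identities are easy to verify numerically. The sketch below is our own sanity check (the helper `num_grad` is not part of the paper); it compares each closed-form gradient with a central finite difference.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient (sanity check only)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([0.3, -1.2, 0.7])
r = np.linalg.norm(x)
# each pair: (function, closed-form gradient from the display above)
checks = [
    (lambda z: np.dot(z, z),                2.0 * x),    # |x|^2
    (lambda z: -np.log(np.linalg.norm(z)), -x / r**2),   # log(1/|x|)
    (lambda z: 1.0 / np.linalg.norm(z),    -x / r**3),   # 1/|x|
]
max_err = max(np.max(np.abs(num_grad(f, x) - g)) for f, g in checks)
```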

3.1 Case Study: 1D

We test the numerical method by looking at the mean empirical distribution in the case of the Gaussian Unitary Ensemble (1.2) with \(\beta =2\), \(N=8\), for which the exact expression of \(\mathbb {E}\mu _N\) under \(P_N\) is provided by (1.3). The results in Fig. 1 show a very good agreement between the exact result and the algorithm. For completeness, we study the rejection rate of the algorithm as \(\Delta t\) goes to zero, as mentioned at the end of Sect. 2.2. More precisely, we compute, over a trajectory, the rate of rejected moves in Step 4 of Algorithm 2.3. The logarithmic plot in Fig. 2 shows a linear fit with a slope of about 3.1, which confirms the expected scaling in \(\Delta t^3\).

We also study the quartic confinement potential \(V(x)=x^4/4\), as in [55]. In this case, the empirical spectral distribution is not known, but the equilibrium distribution has density with respect to the Lebesgue measure given by

$$\begin{aligned} x\in \mathbb {R}\mapsto (2a^2+x^2)\frac{\sqrt{4a^2-x^2}}{2\pi }\mathbf {1}_{x\in [-2a,2a]}, \quad a = 3^{-\frac{1}{4}}. \end{aligned}$$
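As a quick consistency check (ours, not part of the original simulations), one can verify numerically that this density integrates to one precisely for \(a = 3^{-1/4}\):

```python
import numpy as np

# Normalization check: the quartic equilibrium density integrates to 1
# exactly when a = 3**(-1/4).
a = 3.0 ** (-0.25)
x = np.linspace(-2.0 * a, 2.0 * a, 200001)
rho = (2.0 * a**2 + x**2) * np.sqrt(np.clip(4.0 * a**2 - x**2, 0.0, None)) / (2.0 * np.pi)
mass = np.sum(rho) * (x[1] - x[0])  # trapezoid rule (the density vanishes at the edges)
```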

The results of the numerical simulations, see Fig. 3, show a good agreement with the equilibrium measure when N is large. Note that a tridiagonal random matrix model is known but it does not have independent entries, see [43, Prop. 2.1].

3.2 Case Study: 2D

We next consider in Fig. 4 the mean empirical distribution in the case of the Complex Ginibre Ensemble (1.5) with \(\beta =2\), \(N=8\). In this case, we also know a theoretical formula for \(\mathbb {E}\mu _N\) under \(P_N\), given by (1.6). For completeness, we investigate the scaling of the relative energy difference in Step 3 of Algorithm 2.3 (by turning off the selection procedure of Step 4). The logarithmic plot in Fig. 5 shows a slope of about 2.9, which confirms the expected scaling in \(\Delta t^3\) that corresponds to the error of energy conservation, over one time step, of the Verlet integrator (2.8).

We explore next in Fig. 6 the Gumbel fluctuation at the edge, which is proved for \(\beta =2\) and conjectured for \(\beta \ne 2\), see [15, 20, 60] (note that in this case we have a formula for \(\mu _*\) but not for \(\mathbb {E}\mu _N\) under \(P_N\)). One could also explore the crystallization phenomenon, see [5] and references therein.

3.3 Case Study: 3D

In Fig. 7, we finally turn to the Coulomb gas which corresponds to \(S=\mathbb {R}^3\), \(d=n=3\), \(V={{\left| \cdot \right| }}^2/\beta \), \(W=1/{{\left| \cdot \right| }}\), and to the log-gas for which \(W=-\log {{\left| \cdot \right| }}\). In the first case the equilibrium measure \(\mu _*\) is uniform on the centered ball of \(\mathbb {R}^d\) of radius \((\beta (d-2)/2)^{1/d}\), see for instance [14, Cor. 1.3], while in the second case the equilibrium measure is not known yet, see however [16]. In both cases we do not have a formula for \(\mathbb {E}\mu _N\) under \(P_N\). One could study the fluctuation at the edge, which is conjectured to be Gumbel, just like for the complex Ginibre ensemble in 2D.
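In the Coulomb case, the predicted equilibrium measure is easy to sample directly, which gives a reference for comparisons such as Fig. 7. The sketch below is our own check, not the paper's code: it uses the standard recipe (random direction times radius \(R\,U^{1/d}\)) to draw uniform points in the ball of radius \(R=(\beta (d-2)/2)^{1/d}\), which equals 1 for \(d=3\), \(\beta =2\), and verifies the mean radius \(dR/(d+1)\).

```python
import numpy as np

d, beta = 3, 2.0
R = (beta * (d - 2) / 2.0) ** (1.0 / d)  # predicted edge radius (= 1 here)

# Uniform sampling in the centered ball: random direction times R * U**(1/d)
rng = np.random.default_rng(1)
n = 100_000
u = rng.standard_normal((n, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)
r = R * rng.uniform(size=n) ** (1.0 / d)
points = u * r[:, None]

mean_r = r.mean()  # for the uniform ball, E[r] = d * R / (d + 1)
```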