1 Introduction

We are interested in the long-time convergence to equilibrium of the solution f of the so-called linear Boltzmann (or BGK) equation

$$\begin{aligned} \partial _t f_t(x,y) \ = \ -y\cdot \nabla _x f_t(x,y) + \nabla U(x) \cdot \nabla _y f_t(x,y) + \lambda Qf_t(x,y) \end{aligned}$$
(1)

where \((x,y)\in {\mathbb {R}}^{2d}\), \(d\in {\mathbb {N}}_*\), \(\lambda >0\) is constant, \( U\in {\mathcal {C}}^2\left( {\mathbb {R}}^d,{\mathbb {R}}\right) \) and Q is either \(Q_1\) or \(Q_2\) with

$$\begin{aligned} Q_1 f(x,y)= & {} \gamma _{d,\sigma }(y) \int _{{\mathbb {R}}^d} f(x,v) \mathrm {d}v - f(x,y)\\ Q_2 f(x,y)= & {} \sum _{k=1}^d \left( \gamma _{1,\sigma }(y_k)\int _{{\mathbb {R}}} f(x,y_1,\dots ,y_{k-1},w,y_{k+1},\dots ,y_d) \mathrm {d}w - f(x,y)\right) \end{aligned}$$

with, for some \(\sigma >0\),

$$\begin{aligned} \gamma _{p,\sigma }(y) = \frac{e^{-\frac{1}{2\sigma ^2}|y|^2} }{\left( 2\pi \sigma ^2\right) ^{p/2} } \end{aligned}$$

the centered Gaussian density with variance \(\sigma ^2\) on \({\mathbb {R}}^p\). We assume that \(f_0\) is a probability density so that, mass and positivity being conserved through time, \(f_t\) is a probability density for all \(t\geqslant 0\). Denoting \(H(x,y)=U(x)+|y|^2/2\), we suppose that \(\exp (-H/\sigma ^2)\) is integrable and we denote by \(\mu \) the probability law with density proportional to it (we also denote this density by \(\mu \)). Then \(\mu \) is a fixed point of (1): the transport terms cancel since \(\nabla _x \mu = -\sigma ^{-2}\nabla U \,\mu \) and \(\nabla _y \mu = -\sigma ^{-2} y\, \mu \), while \(Q\mu = 0\) since \(\mu \) factorizes as \(\mu (x,y) \propto e^{-U(x)/\sigma ^2}\gamma _{d,\sigma }(y)\). Our goal is to give a quantitative estimate for the convergence of a solution of (1) toward \(\mu \). In fact we will rather work with the relative density \(h_t = f_t/\mu \), which solves

$$\begin{aligned} \partial _t h_t \ =\ Lh_t \end{aligned}$$
(2)

with

$$\begin{aligned} Lh(x,y)\ = \ -y\cdot \nabla _x h(x,y) + \nabla U(x) \cdot \nabla _y h(x,y) + \eta \left( Ph - h\right) \,,\end{aligned}$$

where \((P,\eta )\) is either \((P_1,\eta _1)\) or \((P_2,\eta _2)\) with \(\eta _1=\lambda \), \(\eta _2=\lambda d\) and

$$\begin{aligned} P_1 h(x,y)= & {} \int _{{\mathbb {R}}^d} h(x,v) \gamma _{d,\sigma }(v) \mathrm {d}v\\ P_2 h(x,y)= & {} \frac{1}{d}\sum _{k=1}^d\int _{{\mathbb {R}}} h(x,y_1,\dots ,y_{k-1},w,y_{k+1},\dots ,y_d) \gamma _{1,\sigma }(w) \mathrm {d}w \,. \end{aligned}$$

Remark that \(P_1\) and \(P_2\) are Markov operators.

Equation (1) is a classical model in statistical physics, modelling the motion of a particle subject to an external potential U and to random collisions with other particles with Gaussian velocities. We refer the interested reader to [12] and references therein for details. Moreover, it arises in Markov Chain Monte Carlo methods. More precisely, denote by \(L^*\) the adjoint of L in \(L^2(\mu )\). Integrating by parts, we see that

$$\begin{aligned} L^* \varphi (x,y) \ = \ y\cdot \nabla _x \varphi (x,y) - \nabla U(x) \cdot \nabla _y \varphi (x,y) + \eta \left( P\varphi - \varphi \right) \,. \end{aligned}$$
(3)

This is the generator of a Markov process \((X,Y)\) whose law solves (1). When \(Q=Q_1\), the dynamics of the process are as follows: the particle follows the Hamiltonian flow \(\dot{x} = y\), \(\dot{y} = -\nabla _x U(x)\) and, at the jump times of a Poisson process of intensity \(\lambda \), the velocity y is refreshed to a new Gaussian value. The motion is similar when \(Q=Q_2\), except that each coordinate of the velocity has its own exponential clock, and is refreshed to a new Gaussian value independently of the other components. This process is sometimes called the randomized Hamiltonian Monte Carlo process [6]. Since its law converges to \(\mu \), ergodic averages of the process can be used as estimators of expectations of observables with respect to \(\mu \). A non-asymptotic, quantitative long-time convergence estimate for (2) then classically provides bounds on the bias and variance of such estimators.
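To fix ideas, here is a minimal simulation sketch of this process in the case \(Q=Q_1\) (ours, purely illustrative: the quadratic potential, the leapfrog integration of the Hamiltonian flow between jumps, and all parameter values are assumptions, not taken from the references):

```python
import numpy as np

def rhmc_trajectory(x0, n_steps=200_000, dt=1e-2, lam=1.0, kappa=1.0, sigma=1.0, seed=0):
    """Simulate the randomized HMC process in the case Q = Q_1:
    Hamiltonian flow for U(x) = kappa*|x|^2/2 (leapfrog discretization),
    with a full Gaussian refreshment of the velocity at the jump times
    of a Poisson clock of rate lam."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    y = sigma * rng.standard_normal(d)       # initial Gaussian velocity
    grad_U = lambda z: kappa * z
    t = 0.0
    next_jump = rng.exponential(1.0 / lam)   # first refreshment time
    traj = np.empty((n_steps, d))
    for i in range(n_steps):
        # one leapfrog step for dx/dt = y, dy/dt = -grad U(x)
        y -= 0.5 * dt * grad_U(x)
        x += dt * y
        y -= 0.5 * dt * grad_U(x)
        t += dt
        if t >= next_jump:                   # refresh the whole velocity
            y = sigma * rng.standard_normal(d)
            next_jump += rng.exponential(1.0 / lam)
        traj[i] = x
    return traj

# Ergodic averages along the trajectory estimate expectations under mu;
# here E[|X|^2] = d*sigma^2/kappa = 2 for d = 2, sigma = kappa = 1.
samples = rhmc_trajectory(np.zeros(2))
print("empirical E[|X|^2]:", np.mean(np.sum(samples**2, axis=1)))
```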

The question of the long-time convergence of (1) [or equivalently (2)] has been studied in much more general settings in a number of works (see e.g. [12] and references therein). The exponential convergence in the \(L^2\) sense, i.e. the existence of constants \(C,\theta >0\) such that

$$\begin{aligned} \Vert h_t - 1 \Vert ^2_{L^2\left( \mu \right) }\leqslant & {} C e^{-\theta t} \Vert h_0 - 1 \Vert ^2_{L^2\left( \mu \right) }, \end{aligned}$$

has been established under different assumptions by several authors [1, 8, 12, 13]. This long-time convergence is said to be hypocoercive [8, 18], in the sense that C is necessarily greater than 1 or, in other words, \(h_t - 1\) converges exponentially fast to 0 but not at a constant rate (note that both the \(L^2\) norm and the relative entropy studied below are non-increasing with time).

When one studies a system of N particles with chain or mean-field interactions (so that \(d=Nd'\), where \(d'\) is the dimension of the ambient space), the \(L^2\)-norm is not well-adapted, since it scales badly with N. In these contexts, a more natural way to quantify the distance to equilibrium is the relative entropy \(\int h \ln h \mathrm {d}\mu \), as in [10, 15, 16]. Nevertheless, entropic hypocoercivity results (see e.g. [17,18,19]) are usually restricted to diffusion processes (i.e. differential operators). Indeed, since non-local operators such as \( \eta \left( P-I\right) \) do not satisfy the chain rule, it is harder to handle derivatives of non-quadratic quantities of h and \(\nabla h\). This is a general and important problem, related to the question of giving a good definition of a non-local Fisher Information.

For the linear Boltzmann equation, however, this has been achieved by Evans [9] in a recent paper in the case of the periodic torus (namely \(x\in {\mathbb {T}}^d\), \({\mathbb {T}}= {\mathbb {R}}/{\mathbb {Z}}\)) with no potential (\(U=0\)). The purpose of the present note is to show that the computations of Evans, together with recent results on generalized Ornstein-Uhlenbeck processes [2, 3, 17], in fact allow one to deal with the case where \(x\in {\mathbb {R}}^d\) and U is close to being quadratic.

Assumption 1

There exist \(K,\kappa >0\) such that \(\kappa I_d \leqslant \nabla ^2 U(x) \leqslant K I_d\) for all \(x\in {\mathbb {R}}^d\), in the sense of symmetric matrices.

In fact, we won’t deal with the entropy itself, but with the classical Fisher Information

$$\begin{aligned} {\mathcal {I}}\left( h\right)= & {} \int \frac{|\nabla h|^2}{h} \mathrm {d}\mu . \end{aligned}$$

Under Assumption 1, the Hamiltonian H is uniformly convex, so that, by classical arguments (see e.g. [4]), \(\mu \) satisfies a log-Sobolev inequality

$$\begin{aligned} \int h \ln h \mathrm {d}\mu\leqslant & {} c {\mathcal {I}}(h), \end{aligned}$$

where the constant c depends only on \(\kappa \) and \(\sigma \). For elliptic or hypoelliptic diffusions, such as the kinetic Langevin (or Fokker-Planck) equation, a short-time regularization occurs, so that the Fisher Information is finite for all positive times provided the initial entropy is finite (see for instance [17, Theorem 9]). However, this is not the case for Equation (2), and thus we will only consider smooth initial data with \({\mathcal {I}}(h_0)<\infty \). More precisely, for the sake of simplicity, we will assume that, for some \(\varepsilon \geqslant 0\),

$$\begin{aligned} h_0\in {\mathcal {A}}_\varepsilon \ :=\ \left\{ g\in {\mathcal {C}}^\infty \left( {\mathbb {R}}^{2d}\right) ,\ \int g \mathrm {d}\mu = 1, \ g\geqslant \varepsilon ,\ \partial ^\alpha g \text { bounded }\forall \alpha \text { multi-index}\right\} . \end{aligned}$$

Note that, for any \(\varepsilon \geqslant 0\), the set \({\mathcal {A}}_\varepsilon \) is left invariant by Equation (2), as proved in [7, Appendix] (here we do not need bounds on the derivatives that are uniform in time). The result can then be extended by a density argument to all positive \(h_0\) with \({\mathcal {I}}(h_0)<+\infty \).

Theorem 1

Under Assumption 1, let \((h_t)_{t\geqslant 0}\) be a solution of (2) with \(h_0\in {\mathcal {A}}_0\) such that \({\mathcal {I}}(h_0) < +\infty \). Let

$$\begin{aligned} \theta \ = \ \frac{2}{3}\left( \frac{2\lambda K}{4K + \lambda ^2} - \frac{(K-\kappa )^2}{\sqrt{K} } \right) \,,\qquad C\ = \ 3 \max \left( \frac{4K^2}{4K + \lambda ^2}, \frac{4K + \lambda ^2}{4K^2}\right) \,. \end{aligned}$$

Suppose that \(\theta \geqslant 0\). Then, for all \(t\geqslant 0\),

$$\begin{aligned} {\mathcal {I}}\left( h_t\right)\leqslant & {} C e^{-\theta t} {\mathcal {I}}\left( h_0\right) . \end{aligned}$$

Remarks

  • The result is the same for \(Q=Q_1\) or \(Q_2\), and C and \(\theta \) do not depend on \(\sigma \).

  • The log-Sobolev inequality satisfied by \(\mu \) and Theorem 1 imply that, for some \(C'>0\),

    $$\begin{aligned} \int h_t \ln h_t \mathrm {d}\mu\leqslant & {} C' e^{-\theta t} {\mathcal {I}}(h_0). \end{aligned}$$

    By Pinsker's inequality, considering the Markov process \((X,Y)\) with generator \(L^*\) given by (3) and initial law \(f_0\), we get that for all measurable sets \(A\subset {\mathbb {R}}^{2d}\),

    $$\begin{aligned}|{\mathbb {P}} \left( (X_t,Y_t)\in A\right) - \mu (A)| \ \leqslant \ \frac{1}{2} \Vert f_t -\mu \Vert _1 \ \leqslant \ \sqrt{\frac{1}{2} \int h_t \ln h_t \mathrm {d}\mu }\,. \end{aligned}$$

    This gives a bound on the bias of the Monte Carlo estimator of \(\mu (A)\) based on the process \((X,Y)\), with constants \(C'\) and \(\theta \) that depend (explicitly) only on \(\kappa ,K,\lambda ,\sigma \).

  • The assumption that the potential is convex is usual in studies of the long-time behaviour of Markov processes. The fact that its Hessian is also bounded above, and more precisely that the Hessian is not too far from a constant matrix, is a much more rigid assumption, which already appeared in similar works [2, 5]. Besides, there are examples of kinetic processes with a convex potential with an unbounded Hessian which do not converge exponentially fast to their equilibrium [11]. Essentially, we are able to deal with the Gaussian case thanks to some nice algebra, and have some room for a small perturbation. More precisely, in the Gaussian case, the Jacobian of the drift is a constant matrix, so that the question of the contraction of a suitably modified Fisher Information boils down to a linear algebra problem (see Proposition 5 below). A Lipschitz perturbation of this linear case can then be absorbed by the positive contraction of the linear case (see the end of the proof of Theorem 1).

  • The rate of convergence is of order \(\lambda \) when \(\lambda \) goes to zero and of order \(\lambda ^{-1}\) when \(\lambda \) goes to infinity (see the numerical sketch at the end of these remarks), which is similar to the kinetic Langevin case [14], and expected. Indeed, when \(\lambda \) is small, the typical time for the velocity to be refreshed (and thus, to mix) is \(\lambda ^{-1}\). On the other hand, when \(\lambda \) is large, in a time of order 1, there are many jumps and, by the law of large numbers, the effective velocity is close to zero, so that the position moves (and thus, mixes) slowly. If time is accelerated by a factor \(\lambda \), the position then converges to an overdamped Langevin process.

  • In this particular close-to-quadratic case, Theorem 1 answers the question raised in [9, Section 1.5].

  • Consider the case where \(Q=Q_2\) and \(U(x)=a|x|^2+ \frac{1}{d}\sum _{i,j=1}^d W(x_i-x_j)\) with an even potential W with bounded Hessian and \(a>0\). This corresponds to a mean-field interaction between \(N=d\) particles. Provided \(\Vert W''\Vert _\infty \) is sufficiently small with respect to a and \(\lambda \), Theorem 1 yields a speed of convergence to equilibrium which is independent of the number of particles. Then, the arguments from [10, 16] may be adapted (the parallel coupling with Wiener processes being replaced by a parallel coupling with Poisson processes) to obtain uniform in time propagation of chaos, and long-time convergence for the non-linear PDE obtained at the limit (note that the latter is not the Boltzmann equation, for which the interaction lies at the level of the collisions rather than of the Hamiltonian).
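To illustrate the previous remarks, and in particular the behaviour of the rate in \(\lambda \), the following sketch (ours, purely illustrative) evaluates the constants \(\theta \) and C of Theorem 1:

```python
import numpy as np

def theorem1_constants(lam, kappa, K):
    """Rate theta and prefactor C from Theorem 1 (meaningful when theta >= 0)."""
    theta = (2.0 / 3.0) * (2.0 * lam * K / (4.0 * K + lam**2)
                           - (K - kappa)**2 / np.sqrt(K))
    C = 3.0 * max(4.0 * K**2 / (4.0 * K + lam**2),
                  (4.0 * K + lam**2) / (4.0 * K**2))
    return theta, C

# In the Gaussian case kappa = K, theta = (4/3)*lam*K/(4*K + lam^2):
# of order lam as lam -> 0 and of order 1/lam as lam -> infinity.
for lam in (0.01, 0.1, 1.0, 10.0, 100.0):
    theta, C = theorem1_constants(lam, kappa=1.0, K=1.0)
    print(f"lambda = {lam:6.2f}:  theta = {theta:.4f},  C = {C:.2f}")
```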

2 Proof

We write \(h_t = e^{tL}h_0\) for the solution of (2) with initial condition \(h_0\). In the rest of the paper, we will always consider \(h \in {\mathcal {A}}_\varepsilon \) with \(\varepsilon >0\). Indeed, suppose that Theorem 1 has been proved for \(h \in {\mathcal {A}}_\varepsilon \) for arbitrary \(\varepsilon >0\), and consider \(h_0 \in {\mathcal {A}}_0\). Set \(h_0^{(\varepsilon )} = (1-\varepsilon ) h_0 + \varepsilon \). Then \(h_t^{(\varepsilon )} := e^{tL}h_0^{(\varepsilon )}= (1-\varepsilon ) h_t + \varepsilon \), so that, applying Theorem 1 to \(h_t^{(\varepsilon )}\) and letting \(\varepsilon \) go to 0, the monotone convergence theorem yields the result for \(h_t\). The restriction to the case \(\varepsilon >0\) ensures that all the forthcoming derivations under the integral sign are valid. In particular, \({\mathcal {I}}(h)<+\infty \) for all \(h\in {\mathcal {A}}_\varepsilon \) with \(\varepsilon >0\).

We start with a general computation. Denoting by \(A^T\) the transpose of a matrix (and seeing vectors as column matrices, so that the scalar product between two vectors u and v can be denoted by \(u^T v\)), for a symmetric matrix M, we write

$$\begin{aligned} {\mathcal {I}}_M\left( h\right)= & {} \int \frac{(\nabla h)^T M \nabla h}{h} \mathrm {d}\mu . \end{aligned}$$

Our aim is to construct M such that \(\partial _t \left( {\mathcal {I}}_M\left( e^{tL}h\right) \right) \leqslant - \theta {\mathcal {I}}_M\left( e^{tL} h\right) \) with \(\theta >0\). In the following, in a \(2d \times 2d\) matrix, a \(d\times d\) block equal to \(\alpha I_d\) for some \(\alpha \in {\mathbb {R}}\) will sometimes be denoted simply by \(\alpha \) (so that, for instance, \(\begin{pmatrix} 1 &{} b \\ b &{} a \end{pmatrix}\) stands for \(\begin{pmatrix} I_d &{} bI_d \\ bI_d &{} aI_d \end{pmatrix}\)), and \(N\geqslant M\) stands for the usual order on symmetric matrices N, M.

For an operator A, we write \(\left( \partial _t\right) _{|A}\) for the derivative at \(t=0\) along the semi-group \(e^{tA}\).

Lemma 2

Let P be a Markov operator which leaves \({\mathcal {A}}_\varepsilon \) invariant, \(h\in {\mathcal {A}}_\varepsilon \) and \(M=R^TR\) be a positive symmetric matrix. Then

$$\begin{aligned} \left( \partial _t\right) _{|P-I} {\mathcal {I}}_{M}\left( h\right)\leqslant & {} - {\mathcal {I}}_M\left( h\right) + {\mathcal {I}}_M\left( P h\right) \,. \end{aligned}$$

Proof

The computation is similar to [9, Lemma 3]. Indeed,

$$\begin{aligned} \left( \partial _t\right) _{|P-I} {\mathcal {I}}_M\left( h\right)= & {} \int \frac{2\left( \nabla h\right) ^T R^T R \nabla \left( Ph - h\right) }{h} - \frac{|R\nabla h|^2(Ph-h) }{h^2} \mathrm {d}\mu \\= & {} \int -\frac{|R\nabla h|^2}{h}\left( 1 +\frac{Ph}{h}\right) + 2\frac{(\nabla h)^T R^T R \nabla P h }{h} \mathrm {d}\mu \\= & {} - {\mathcal {I}}_M\left( h\right) + \int -\left| \frac{R\nabla h}{h} -\frac{R\nabla P h}{P h} \right| ^2 P h + \frac{|R\nabla P h|^2}{P h} \mathrm {d}\mu \\\leqslant & {} - {\mathcal {I}}_M\left( h\right) + {\mathcal {I}}_M\left( P h\right) \end{aligned}$$

where we used the positivity of \(Ph\) (which follows from that of h, P being a Markov operator). \(\square \)

For \(k\in \llbracket 1,d\rrbracket \), let \(E_k\) be the \(2d\times 2d\) diagonal matrix whose coefficients are all zero except the \((d+k,d+k)^{th}\), which is equal to 1, and

$$\begin{aligned} E\ =\ \sum _{k=1}^d E_k \ = \ \begin{pmatrix} 0 &{}\quad 0\\ 0 &{}\quad I_d \end{pmatrix}\,,\qquad E' \ = \ I_{2d} - E \ = \ \begin{pmatrix} I_d &{}\quad 0\\ 0 &{}\quad 0 \end{pmatrix}\,.\end{aligned}$$

In the particular case of (2), Lemma 2 yields the following.

Lemma 3

Let \(\lambda >0\), \(h\in {\mathcal {A}}_\varepsilon \) and \(M=R^TR\) be a positive symmetric matrix. Then,

$$\begin{aligned} \left( \partial _t\right) _{|\eta (P-I)} {\mathcal {I}}_{M}\left( h\right)\leqslant & {} -\lambda \left( {\mathcal {I}}_{E M + M E -E M E }(h)\right) \end{aligned}$$
(4)

for \((P,\eta )=(P_1,\eta _1)\). Moreover, this also holds for \((P,\eta )=(P_2,\eta _2)\) if the lower right \(d\times d\) block of M is a homothety.

Proof

We recall the following argument from [9, Lemma 1]. From \(\nabla _y P_1 =0\) and \(\nabla _x P_1 = P_1\nabla _x\), \({\mathcal {I}}_M\left( P_1 h\right) = \int \phi \left( P_1(\nabla h,h)\right) \mathrm {d}\mu \), where \(\phi (u,v) = (u^T E' M E' u)/v\). Applying Jensen's inequality to the jointly convex function \(\phi \) and the Markov operator \(P_1\), we get \(\phi \left( P_1(\nabla h, h)\right) \leqslant P_1 \phi (\nabla h, h)\). Integrated with respect to \(\mu \) (which is fixed by \(P_1\)), this reads

$$\begin{aligned} {\mathcal {I}}_M\left( P_1 h\right) \ \leqslant \ {\mathcal {I}}_{E'ME'}\left( h\right) . \end{aligned}$$

Applying Lemma 2 yields (4) (since \(\eta _1=\lambda \)).

Similarly, denoting

$$\begin{aligned} P_{2,k} f(x,y)= & {} \int f(x,y_1,\dots ,y_{k-1},w,y_{k+1},\dots ,y_d) \frac{e^{-\frac{1}{2\sigma ^2} w^2}}{ \sigma \sqrt{2\pi } } \mathrm {d}w, \end{aligned}$$

for \(k\in \llbracket 1,d\rrbracket \), we get with the previous argument

$$\begin{aligned} {\mathcal {I}}_M\left( P_{2,k} h\right) \leqslant \ {\mathcal {I}}_{(I_{2d}-E_k)M(I_{2d}-E_k)}\left( h\right) , \end{aligned}$$

so that

$$\begin{aligned} \left( \partial _t\right) _{|\eta _2 (P_{2}-I)} {\mathcal {I}}_{M}\left( h\right)= & {} \sum _{k=1}^d \left( \partial _t\right) _{|\lambda (P_{2,k}-I)} {\mathcal {I}}_{M}\left( h\right) \\\leqslant & {} -\lambda \left( {\mathcal {I}}_{ \sum _{k=1}^d \left( E_k M + M E_k- E_k M E_k \right) }(h)\right) . \end{aligned}$$

Now, suppose that the lower right \(d\times d\) block of M is a homothety, i.e. that

$$\begin{aligned}M\ = \ \begin{pmatrix} M_{1} &{} M_{2}\\ M_2^T &{} \alpha I_d \end{pmatrix} \end{aligned}$$

for some matrices \(M_i\) and some \(\alpha >0\). In that case,

$$\begin{aligned} E M E \ = \ \alpha \begin{pmatrix} 0 &{} 0 \\ 0 &{} 1 \end{pmatrix} \ = \ \alpha \sum _{k=1}^d E_k E_k \ =\ \sum _{k=1}^d E_k M E_k, \end{aligned}$$

which means that we have obtained the same bound (4) on \(\left( \partial _t\right) _{|\eta _i(P_{i}-I)} {\mathcal {I}}_{M}\left( h\right) \) for both \(i=1,2\). \(\square \)

On the other hand, the derivative of \({\mathcal {I}}_M\) along the transport semi-group \(e^{tA}\) where \(A=y\cdot \nabla _x - \nabla _x U(x) \cdot \nabla _y \) is a classical computation (see e.g. [17, Example 8]), which we recall for the sake of completeness:

Lemma 4

For \(h\in {\mathcal {A}}_\varepsilon \),

$$\begin{aligned} \left( \partial _t\right) _{|A } {\mathcal {I}}_M\left( h\right)= & {} \int \frac{(\nabla h)^T (MJ + J^T M) \nabla h }{h} \mathrm {d}\mu \end{aligned}$$
(5)

with \(J = \begin{pmatrix} 0 &{} -\nabla ^2 U \\ 1 &{} 0 \end{pmatrix}\).

Proof

Since A satisfies the chain rule,

$$\begin{aligned} \left( \partial _t\right) _{|A } {\mathcal {I}}_M\left( h\right)= & {} \int \left( \frac{2(\nabla h)^T M \nabla A h }{h} - \frac{Ah}{h^2} (\nabla h)^T M \nabla h \right) \mathrm {d}\mu \\= & {} \int \left( A \left( \frac{(\nabla h)^T M \nabla h }{h} \right) + \frac{2(\nabla h)^T M (\nabla A h-A\nabla h) }{h}\right) \mathrm {d}\mu \end{aligned}$$

where \(A\nabla h\) should be understood coordinate by coordinate. The conclusion follows from \(\int Ag \,\mathrm {d}\mu = 0\) for all g and from \(\nabla A h-A\nabla h = J\nabla h\). \(\square \)
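For completeness, the commutation relation \(\nabla A h - A \nabla h = J \nabla h\) can be checked coordinate-wise: since \(A = y\cdot \nabla _x - \nabla _x U(x)\cdot \nabla _y\), for all \(i \in \llbracket 1,d\rrbracket \),

$$\begin{aligned} \partial _{x_i} (Ah) - A\, \partial _{x_i} h \ = \ -\,\partial _{x_i}\nabla U(x) \cdot \nabla _y h\,, \qquad \partial _{y_i} (Ah) - A\, \partial _{y_i} h \ = \ \partial _{x_i} h \,, \end{aligned}$$

which are exactly the two block rows of \(J\nabla h\).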

Combining the two previous results, we get:

Proposition 5

Under Assumption 1, let \(h_t = e^{tL} h_0\) where \(h_0 \in {\mathcal {A}}_{\varepsilon }\). Suppose that there exist \(a,b,\theta \in {\mathbb {R}}\) with \(b^2 < a\) and such that, for \(\xi \in \{\kappa ,K\}\),

$$\begin{aligned} N(\xi ) \ := \ {\begin{pmatrix} 2b &{} \ &{} a-\xi + \lambda b \\ a-\xi + \lambda b &{} \ &{} -2b\xi + \lambda a \end{pmatrix}} \ \geqslant \ \theta \begin{pmatrix} 1 &{} b \\ b &{} a \end{pmatrix} \ =: \ \theta M. \end{aligned}$$
(6)

Then, for all \(t\geqslant 0\),

$$\begin{aligned} {\mathcal {I}}_M\left( h_t\right)\leqslant & {} e^{-\theta t} {\mathcal {I}}_M\left( h_0\right) \,. \end{aligned}$$

Proof

Let \(a,b,\theta \in {\mathbb {R}}\) and M be as in the proposition (note that the lower right block of M is the homothety \(aI_d\), so that Lemma 3 applies to both \((P_1,\eta _1)\) and \((P_2,\eta _2)\)). Since \(L=-A+\eta (P-I)\), Lemmas 3 and 4 yield

$$\begin{aligned} \partial _t {\mathcal {I}}_M\left( h_t\right)= & {} \left( \partial _t\right) _{|\eta (P-I)} {\mathcal {I}}_M\left( h_t\right) - \left( \partial _t\right) _{|A} {\mathcal {I}}_M\left( h_t\right) \ \leqslant \ - {\mathcal {I}}_{N'}\left( h_t\right) \end{aligned}$$

with, for all \(x\in {\mathbb {R}}^d\),

$$\begin{aligned} N'(x)= & {} M \begin{pmatrix} 0 &{} \ &{} -\nabla ^2 U(x)\\ 1 &{} \ &{} \lambda \end{pmatrix} + \begin{pmatrix} 0 &{} \ &{} 1\\ -\nabla ^2 U(x) &{}\ &{} \lambda \end{pmatrix} M - \lambda \begin{pmatrix} 0 &{}0\\ 0 &{} a \end{pmatrix} \\= & {} \begin{pmatrix} 2b &{} \ &{} a-\nabla ^2 U(x) + \lambda b \\ a-\nabla ^2 U(x) + \lambda b &{} \ &{} -2b\nabla ^2 U(x) + \lambda a \end{pmatrix}\,. \end{aligned}$$

The proof will be concluded (by Grönwall's lemma) if we prove that \(N'(x) \geqslant \theta M\) for all \(x\in {\mathbb {R}}^d\). Fix \(x\in {\mathbb {R}}^d\), and let \({\mathcal {O}}(x)\) be an orthogonal \(d\times d\) matrix such that \({\mathcal {O}}^T(x) \nabla ^2 U(x) {\mathcal {O}}(x)\) is diagonal. Let

$$\begin{aligned}{\mathcal {O}}' \ = \ \begin{pmatrix} {\mathcal {O}}(x) &{} 0 \\ 0 &{} {\mathcal {O}}(x) \end{pmatrix}\,.\end{aligned}$$

Notice that \(N'(x) \geqslant \theta M\) if and only if \(({\mathcal {O}}')^TN'(x){\mathcal {O}}' \geqslant \theta M\), where we used that \(({\mathcal {O}}')^T M {\mathcal {O}}' = M\) (the blocks of M being homotheties). Now, up to a simultaneous permutation of rows and columns, \(({\mathcal {O}}')^TN'(x){\mathcal {O}}'\) and M split into \(2\times 2\) diagonal blocks, so that \(({\mathcal {O}}')^TN'(x){\mathcal {O}}' \geqslant \theta M\) if and only if \(N(\xi _k) \geqslant \theta M\) (both now read as \(2\times 2\) matrices) for all eigenvalues \(\xi _k\) of \(\nabla ^2 U(x)\), \(k\in \llbracket 1,d\rrbracket \). Writing such an eigenvalue as \(\xi _k = p_k \kappa + (1-p_k)K\) for some \(p_k\in [0,1]\) (which is possible by Assumption 1), we get

$$\begin{aligned}N(\xi _k) \ = \ p_k N(\kappa ) + (1-p_k) N(K) \ \geqslant \ \theta M\end{aligned}$$

by assumption, where the equality holds because \(\xi \mapsto N(\xi )\) is affine. This concludes the proof. \(\square \)
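Before turning to the proof of Theorem 1, note that condition (6) is easy to check numerically for given parameters; the following sketch (ours, purely illustrative) does so with the values of a, b and \(\theta \) that will be chosen in the proof below, by computing the smallest eigenvalue of \(N(\xi ) - \theta M\) for \(\xi \in \{\kappa ,K\}\):

```python
import numpy as np

def check_condition_6(lam, kappa, K):
    """Check N(xi) >= theta*M for xi in {kappa, K}, with
    b = lam*K/(4K+lam^2), a = 4K^2/(4K+lam^2) and theta as in Theorem 1."""
    b = lam * K / (4.0 * K + lam**2)
    a = 4.0 * K**2 / (4.0 * K + lam**2)
    theta = (2.0 / 3.0) * (2.0 * b - (K - kappa)**2 / np.sqrt(K))
    M = np.array([[1.0, b], [b, a]])
    ok = True
    for xi in (kappa, K):
        N = np.array([[2.0 * b, a - xi + lam * b],
                      [a - xi + lam * b, -2.0 * b * xi + lam * a]])
        min_eig = np.linalg.eigvalsh(N - theta * M).min()
        print(f"xi = {xi}: min eigenvalue of N(xi) - theta*M = {min_eig:.4f}")
        ok = ok and min_eig >= -1e-12
    return ok

print(check_condition_6(lam=1.0, kappa=0.9, K=1.0))  # True whenever theta >= 0
```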

With Proposition 5 in hand, the proof of our main result boils down to elementary computations.

Proof of Theorem 1

Let us find a and b such that \(N(\xi )\), as given by (6), is positive definite for a given \(\xi \) (to be chosen later). For simplicity, we want to enforce the following conditions:

$$\begin{aligned} 4b^2 \leqslant a\,, \qquad a+\lambda b = \xi \,, \qquad \lambda a \geqslant 4 b\xi \,, \end{aligned}$$

which ensure that \(N(\xi )\) is diagonal with positive entries and that the corresponding M is positive definite. Such conditions are clearly satisfied for b small enough with \(a=\xi -\lambda b\). More precisely, the first condition is implied by the third if \(b\leqslant \xi /\lambda \), and the third is implied by the second if \(b\leqslant \lambda \xi /(4\xi + \lambda ^2)\) (notice that \(\lambda \xi /(4\xi + \lambda ^2) \leqslant \xi /\lambda \)). As a consequence, we choose

$$\begin{aligned} b \ = \ \frac{\lambda \xi }{4\xi + \lambda ^2}\,, \qquad a \ =\ \xi - \lambda b \ = \ \frac{4\xi ^2}{4\xi + \lambda ^2}\,. \end{aligned}$$
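With this choice, the three conditions above indeed hold: the second by construction, the third with equality since

$$\begin{aligned} \lambda a \ = \ \frac{4\lambda \xi ^2}{4\xi +\lambda ^2} \ = \ 4b\xi \,, \end{aligned}$$

and the first since \(4b^2 = \frac{4\lambda ^2\xi ^2}{(4\xi +\lambda ^2)^2} \leqslant \frac{4\xi ^2}{4\xi +\lambda ^2} = a\), using \(\lambda ^2 \leqslant 4\xi +\lambda ^2\).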

The condition \(4b^2 \leqslant a\) ensures that the corresponding matrix M satisfies

$$\begin{aligned} \frac{1}{2} \begin{pmatrix} 1 &{} 0 \\ 0 &{} a \end{pmatrix}\ \leqslant \ M \ \leqslant \ \frac{3}{2} \begin{pmatrix} 1 &{} 0 \\ 0 &{} a \end{pmatrix}\,. \end{aligned}$$

The choice of a and b also ensures that

$$\begin{aligned}N(\xi ) \ = \ \begin{pmatrix} 2b &{} \ &{} 0 \\ 0 &{} \ &{} -2b\xi + \lambda a \end{pmatrix} \ \geqslant \ \begin{pmatrix} 2b &{} \ &{} 0 \\ 0 &{} \ &{} \frac{\lambda a}{2} \end{pmatrix}\,. \end{aligned}$$

For \(\xi '\in \{\kappa ,K\}\) and all \(\gamma >0\), absorbing the off-diagonal terms by Young's inequality,

$$\begin{aligned}&N(\xi ') = N(\xi ) + \begin{pmatrix} 0 &{} \ &{} \xi - \xi ' \\ \xi -\xi ' &{}\ &{} 2b(\xi -\xi ') \end{pmatrix} \geqslant \begin{pmatrix} 2b - \gamma (\xi -\xi ')^2 &{} \ &{} 0 \\ 0 &{} \ &{} \frac{\lambda a}{2} + 2b(\xi -\xi ') - \frac{1}{\gamma }(\xi -\xi ')^2 \end{pmatrix}\,. \end{aligned}$$

In other words,

$$\begin{aligned}N(\xi ') \ \geqslant \ \theta _1 \begin{pmatrix} 1&{} \ &{} 0 \\ 0 &{} \ &{} a \end{pmatrix}\end{aligned}$$

with

$$\begin{aligned}\theta _1 \ = \ \min \left( 2b \left( 1 -\frac{\gamma }{2b}(\xi -\xi ')^2 \right) , \frac{\lambda }{2} \left( 1 -\frac{2}{\lambda a\gamma }(\xi -\xi ')^2 + \frac{\xi -\xi '}{\xi }\right) \right) \,.\end{aligned}$$

Using that \(2b\leqslant \lambda /2\), we choose \(\gamma ^2 = 4b/(\lambda a) = 1/ \xi \) to get that

$$\begin{aligned}\theta _1 \ \geqslant \ \theta _2 \ :=\ 2b \left( 1 -\frac{1}{2b\sqrt{\xi }}(\xi -\xi ')^2 \right) \end{aligned}$$

as soon as \(\theta _2 \geqslant 0\) and \(\xi \geqslant \xi '\). Indeed, with this choice of \(\gamma \), the first argument of the minimum defining \(\theta _1\) is exactly \(\theta _2\), while the second one is at least \(\frac{\lambda }{2}\times \frac{\theta _2}{2b} \geqslant \theta _2\) once the nonnegative term \((\xi -\xi ')/\xi \) is dropped.

Finally, we choose \(\xi = K\), so that \(\xi -\xi '\geqslant 0\) for \(\xi '\in \{\kappa ,K\}\). Assuming that \((K-\kappa )^2 \leqslant 2b\sqrt{K}\) (which is exactly the condition \(\theta \geqslant 0\) of Theorem 1), we get \(\theta _2 \geqslant \frac{3}{2}\theta \geqslant 0\) for both \(\xi '\in \{\kappa ,K\}\), where \(\theta \) is as in Theorem 1, and thus

$$\begin{aligned}N(\xi ') \ \geqslant \ \theta _2 \begin{pmatrix} 1&{} \ &{} 0 \\ 0 &{} \ &{} a \end{pmatrix} \ \geqslant \ \frac{2\theta _2}{3} M \ \geqslant \ \theta M\,, \end{aligned}$$

and we conclude by

$$\begin{aligned} {\mathcal {I}}(h_t) \leqslant \frac{2}{\min (1,a)} {\mathcal {I}}_{M}(h_t) \leqslant \frac{2}{\min (1,a)} e^{-\theta t} {\mathcal {I}}_{M}(h_0) \leqslant \frac{3\max (1,a)}{\min (1,a)} e^{-\theta t} {\mathcal {I}}(h_0)\,, \end{aligned}$$

where \(\frac{3\max (1,a)}{\min (1,a)} = 3\max \left( a,\frac{1}{a}\right) \) is exactly the constant C of Theorem 1, since \(a = \frac{4K^2}{4K+\lambda ^2}\).

\(\square \)