1 Introduction

We consider the convergence rate for the following underdamped Langevin dynamics \(({x}_t, {v}_t)\in {\mathbb {R}}^d\times {\mathbb {R}}^d\):

$$\begin{aligned} \left\{ \begin{aligned} \textrm{d}x_t&= {v}_t\,\textrm{d}t\\ \textrm{d}{v}_t&= -\nabla U({x}_t)\,\textrm{d}t - \gamma {v}_t\,\textrm{d}t + \sqrt{2\gamma }\,\textrm{d}{W}_t.\\ \end{aligned}\right. \end{aligned}$$
(1)

Here, U(x) is the potential energy, \(\gamma >0\) is the friction coefficient, and \({W}_t\) is a d-dimensional standard Brownian motion; the mass and temperature are set to 1 for simplicity. The law of the process (1), \(\rho (t,x,v)\), satisfies the kinetic Fokker–Planck equation

$$\begin{aligned} \partial _t \rho = - v \cdot \nabla _x \rho + \nabla _x U \cdot \nabla _v\rho + \gamma \left( \Delta _v \rho + \nabla _v \cdot (v \rho )\right) . \end{aligned}$$
(2)

It is well-known (see for example [45, Proposition 6.1]) that under mild assumptions, (2) admits a unique stationary density function given by

$$\begin{aligned} \,\textrm{d}\rho _{\infty }(x, v) = \,\textrm{d}\mu (x) \,\textrm{d}\kappa (v), \end{aligned}$$
(3)

where

$$\begin{aligned} \,\textrm{d}\mu (x) = \frac{1}{Z_U}e^{-U(x)}\,\textrm{d}x, \quad \,\textrm{d}\kappa (v) = \frac{1}{(2\pi )^{d/2}} e^{-\frac{|v|^2}{2}} \,\textrm{d}v, \quad Z_U = \int _{{\mathbb {R}}^d} e^{- U(x)}\,\textrm{d}x. \end{aligned}$$

When \(\gamma \rightarrow \infty \), the rescaled dynamics \(x^{(\gamma )}_t {:}{=} x_{\gamma t}\) converges to the Smoluchowski SDE, also known as the overdamped Langevin dynamics (see e.g., [45, Sec.  6.5]), which is given by

$$\begin{aligned} \textrm{d}x^{(\infty )}_t = -\nabla U(x^{(\infty )}_t)\,\textrm{d}t + \sqrt{2}\,\textrm{d}B_t. \end{aligned}$$

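As a purely illustrative aside (not used in the analysis below), the dynamics (1) can be simulated with a simple Euler–Maruyama discretization; in the following sketch the quadratic potential, the friction \(\gamma \), and the step size are placeholder choices.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of the underdamped Langevin SDE (1).
# Placeholder choices: U(x) = m|x|^2/2, unit mass and temperature as in the text.
rng = np.random.default_rng(0)
d, m, gamma, dt, n_steps = 2, 1.0, 1.0, 1e-3, 100_000
grad_U = lambda x: m * x  # gradient of the quadratic potential

x, v = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(n_steps):
    gx = grad_U(x)
    x = x + v * dt
    v = v - gx * dt - gamma * v * dt + np.sqrt(2 * gamma * dt) * rng.standard_normal(d)
# For large times, (x, v) is approximately distributed according to
# rho_infty = mu x kappa, i.e. x ~ N(0, Id/m) and v ~ N(0, Id) here.
```
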
An equivalent formulation of (2) is the following backward Kolmogorov equation:

$$\begin{aligned} \begin{aligned} \partial _t f&= {\mathcal {L}}f,\qquad {\mathcal {L}}= {\mathcal {L}}_{\text {ham}}+ \gamma {\mathcal {L}}_{\text {FD}}, \qquad f(0, x, v) = f_0(x,v). \end{aligned} \end{aligned}$$
(4)

Here, \({\mathcal {L}}_{\text {ham}}\) is the Hamiltonian transport operator and \({\mathcal {L}}_{\text {FD}}\) is the fluctuation-dissipation term

$$\begin{aligned} \left\{ \begin{aligned} {\mathcal {L}}_{\text {ham}}&= v \cdot \nabla _x - \nabla _x U \cdot \nabla _v \\ {\mathcal {L}}_{\text {FD}}&= \Delta _v - v \cdot \nabla _v. \end{aligned}\right. \end{aligned}$$
(5)

Indeed, (4) could be derived from (2) by considering \(\rho (t, x, v) = f(t, x, -v) \rho _{\infty }(x, v)\) [45]; since by \(L^2\)-duality, \(\left\Vert \rho - \rho _{\infty }\right\Vert _{L^2(\rho _{\infty }^{-1})} \equiv \left\Vert f - \int f\,\textrm{d}\rho _{\infty }\right\Vert _{L^2(\rho _{\infty })}\), the exponential convergence of the solution \(\rho (t,\cdot ,\cdot )\) of (2) to \(\rho _\infty \) is equivalent to the exponential decay of \(f(t, \cdot , \cdot )\) to zero, provided that \(\int f_0\,\textrm{d}\rho _{\infty }= 0\). Similarly, one could obtain the backward Kolmogorov equation for the overdamped Langevin dynamics, which is given by

$$\begin{aligned} \partial _t h = -\nabla _x U \cdot \nabla _x h + \Delta _x h, \qquad h(0, x) = h_0(x). \end{aligned}$$
(6)

If \(\mu \) satisfies a Poincaré inequality, one can show that the generator in the above equation (6) is self-adjoint and coercive with respect to \(L^2(\mu )\). As a consequence, if \(\int h_0\,\textrm{d}\mu = 0\), then h(t, x) decays to zero exponentially fast as \(t\rightarrow \infty \); see for example [6, Theorem 4.2.5].
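
For later reference (cf. Remark 1.2 (iv)), when \(\mu \) satisfies the Poincaré inequality (10) below with constant m, the decay rate for (6) can be read off from a one-line energy estimate (a standard computation, reproduced here for convenience):

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\Vert h(t,\cdot )\Vert _{L^2(\mu )}^2 = -2\int _{{\mathbb {R}}^d} |\nabla _x h|^2 \,\textrm{d}\mu \leqslant -2m\Vert h(t,\cdot )\Vert _{L^2(\mu )}^2, \qquad \text{ so that } \quad \Vert h(t,\cdot )\Vert _{L^2(\mu )}\leqslant e^{-mt}\Vert h_0\Vert _{L^2(\mu )}, \end{aligned}$$

where we used that \(\int h(t,\cdot )\,\textrm{d}\mu = \int h_0\,\textrm{d}\mu = 0\) is preserved by (6).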

Unlike the generator of (6), the generator \({\mathcal {L}}\) in (4) for the underdamped Langevin dynamics is not uniformly elliptic. As a result, proving the exponential convergence of \(\rho (t, \cdot , \cdot )\) to the equilibrium \(\rho _{\infty }\) is more challenging. Thanks to extensive work over the years, the exponential convergence of the underdamped Langevin dynamics is now better understood in various norms (see Section 1.3 below for a review).

Our goal in this work is to provide an explicit estimate of the decay rate in \(L^2\) for the semigroup in (4), based on a framework proposed in [1] which implicitly uses Hörmander’s bracket conditions [32]. In particular, under some mild assumptions on U, we obtain explicit estimates: there exist a universal constant \(C>1\) independent of \(U,\gamma ,d\) and an explicit \(\nu > 0\) such that for any \(f=f(t,x,v)\) satisfying (4) with \(\int f_0 \,\textrm{d}\rho _{\infty }= 0\), we have

$$\begin{aligned} \left\Vert f(t, \cdot , \cdot )\right\Vert _{L^2(\rho _{\infty })} \leqslant Ce^{-\nu t}\left\Vert f_0\right\Vert _{L^2(\rho _{\infty })} . \end{aligned}$$
(7)

In the rest of this section, we will first present our assumptions and main results. Next, we will briefly review existing approaches for studying the exponential convergence of (4) (or equivalently (2)) in Section 1.3, and compare our estimate of the decay rate \(\nu \) with some previous works aiming at explicit estimates [9, 16, 40, 47]. We would like to comment here that convergence results are also obtained in earlier works [17, 26], although their rates are only explicit in \(\gamma \).

1.1 Notations

Throughout the paper we assume I to be the time interval (0, T), and we use \(\,\textrm{d}\lambda (t)=\frac{1}{T}\chi _{(0,T)}(t)\,\textrm{d}t\) to denote the rescaled Lebesgue measure on I so that \(\,\textrm{d}\lambda (t)\) denotes a probability measure. For any probability measure \(\rho \), we use \(L^2(\rho )\) (and similarly \(H^1(\rho ),H^2(\rho )\)) to denote the standard Sobolev spaces, and \(H^{-1}(\rho )\) to denote the dual space of \(H^1(\rho )\). For the Gaussian probability measure \(\kappa \) in velocity space, we also use \(L^2_\kappa \), \(H^1_\kappa , \, H^{-1}_\kappa \) to denote the corresponding spaces. Moreover, we use \(H_0^1(\lambda \otimes \mu )\) to denote the \(H^1(\lambda \otimes \mu )\) functions that vanish at both time boundaries \(t=0\) and \(t=T\). By abuse of notation, we denote the canonical pairing \(\langle \cdot , \cdot \rangle _{H^{-1}(\rho ),H^{1}(\rho )}\) between \(f\in H^1(\rho )\) and \(g\in H^{-1}(\rho )\) by

$$\begin{aligned} \int fg\,\textrm{d}\rho {:}{=}\langle g, f \rangle _{H^{-1}(\rho ),H^{1}(\rho )}. \end{aligned}$$

For \(f\in H^{-1}(\rho )\), we use the notation \((f)_{\rho } {:}{=} \langle f,1\rangle _{H^{-1}(\rho ),H^1(\rho )}\). For an arbitrary Banach space V and time interval I equipped with Lebesgue measure \(\,\textrm{d}\lambda (t)\), we denote by \(L^p(\lambda \otimes \mu ;V)\) the Banach space of functions f(txv) with norm

$$\begin{aligned} \Vert f\Vert _{L^p(\lambda \otimes \mu ;V)}{:}{=}\left( \int _{I\times {\mathbb {R}}^d} \Vert f(t,x,\cdot )\Vert ^p_V \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \right) ^\frac{1}{p}. \end{aligned}$$

Inspired by [1], we define the Banach space

$$\begin{aligned} H_{hyp}^1(\lambda \otimes \mu ){:}{=}\left\{ f\in L^2(\lambda \otimes \mu ;H^1_\kappa )~:~\partial _t f-{\mathcal {L}}_{\text {ham}}f\in L^2(\lambda \otimes \mu ;H_\kappa ^{-1}) \right\} . \end{aligned}$$

We define a projection operator for \(\phi (t,x,v)\in L^2(\lambda \otimes \rho _{\infty })\) by

$$\begin{aligned} (\Pi _v \phi )(t,x) {:}{=} \int _{{\mathbb {R}}^d} \phi (t,x,v)\,\textrm{d}\kappa (v). \end{aligned}$$
(8)

Equivalently, \(\Pi _v\) is used to obtain the marginal component of \(\phi \) in \(L^2(\lambda \otimes \mu )\). By a slight abuse of notation, for \(\phi (x, v) \in L^2(\rho _{\infty })\), we also use the same notation \(\Pi _v\) to represent the similar projection, i.e., \((\Pi _v \phi )(x) {:}{=} \int _{{\mathbb {R}}^d} \phi (x, v) \,\textrm{d}\kappa (v)\). The adjoints of \(\nabla _x\) and \(\nabla _v\) in the Hilbert space \(L^2(\rho _{\infty })\) are respectively given by \(\nabla _x^* F= - \nabla _x \cdot F + \nabla _x U \cdot F\) and \(\nabla _v^* F = -\nabla _v \cdot F + v \cdot F\) for any vector field \(F(x,v): {\mathbb {R}}^{2d} \rightarrow {\mathbb {R}}^d\). Thus we can rewrite operators \({\mathcal {L}}_{\text {ham}}\) and \({\mathcal {L}}_{\text {FD}}\) as

$$\begin{aligned} {\mathcal {L}}_{\text {ham}}= \nabla _v^* \nabla _x - \nabla _x^* \nabla _v, \qquad {\mathcal {L}}_{\text {FD}}= -\nabla _v^* \nabla _v. \end{aligned}$$
(9)
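
For convenience, here is the elementary integration-by-parts computation behind the formula for \(\nabla _v^*\) (a standard calculation; the one for \(\nabla _x^*\) is identical with the weight \(e^{-U}\) in place of the Gaussian): for smooth, sufficiently decaying f and F,

$$\begin{aligned} \int \nabla _v f \cdot F \,\textrm{d}\rho _{\infty }= -\int f\, \nabla _v \cdot \Big ( F\, e^{-\frac{|v|^2}{2}}\Big ) \dfrac{\textrm{d}v}{(2\pi )^{d/2}}\,\textrm{d}\mu (x) = \int f \left( -\nabla _v \cdot F + v \cdot F\right) \textrm{d}\rho _{\infty }, \end{aligned}$$

so that \({\mathcal {L}}_{\text {FD}}= \Delta _v - v\cdot \nabla _v = -\nabla _v^*\nabla _v\), consistent with (9).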

For time-augmented state space \(I\times {\mathbb {R}}^d\) equipped with measure \(\lambda \otimes \mu \), we use the convention \(\partial _{x_0}{:}{=}\partial _t\), the short-hand notation \(\bar{\nabla }{:}{=}(\partial _t,\nabla _x)^\top \), and the notation \({\mathscr {L}}{:}{=}-\partial _{tt}+\nabla _x^*\nabla _x\) to denote the “Laplace” operator on \(L^2(\lambda \otimes \mu )\). We use C to denote a universal constant independent of all parameters that may change from line to line.

1.2 Assumptions and Main Results

Assumption 1

(Poincaré inequality for \(\mu \)) Assume that the probability measure \(\mu \propto e^{-U}\) satisfies a Poincaré inequality in space with some constant \(m>0\):

$$\begin{aligned} \int _{{\mathbb {R}}^d} \left( f-\int _{{\mathbb {R}}^d} f \,\textrm{d}\mu \right) ^2\textrm{d}\mu \leqslant \dfrac{1}{m}\int _{{\mathbb {R}}^d} |\nabla _x f|^2 \,\textrm{d}\mu , \qquad \forall f\in H^1(\mu ). \end{aligned}$$
(10)
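
A standard example to keep in mind (included for orientation, not needed in the proofs): for the quadratic potential \(U(x)=\frac{m}{2}|x|^2\), \(\mu \) is the Gaussian \({\mathcal {N}}(0,\frac{1}{m}\textrm{Id})\) and (10) holds with exactly this constant by the Gaussian Poincaré inequality; linear functions saturate it, since for \(f(x)=a\cdot x\),

$$\begin{aligned} \int _{{\mathbb {R}}^d} \left( f-\int _{{\mathbb {R}}^d} f \,\textrm{d}\mu \right) ^2\textrm{d}\mu = \dfrac{|a|^2}{m} = \dfrac{1}{m}\int _{{\mathbb {R}}^d} |\nabla _x f|^2 \,\textrm{d}\mu . \end{aligned}$$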

Assumption 2

The potential \(U\in C^2({\mathbb {R}}^d)\), and there exist constants \(M\geqslant 1\) and \(\delta \in (0,1)\) such that

$$\begin{aligned} |\nabla _x^2 U(x)|^2= & {} \sum _{i,j=1}^d |\partial _{x_ix_j} U(x)|^2\leqslant M^2(d + |\nabla _x U(x)|^2), \nonumber \\{} & {} \quad \text{ and } \Delta _x U(x) \leqslant Md + \frac{\delta }{2}|\nabla _x U(x)|^2 \qquad \forall \ x\in {\mathbb {R}}^d. \end{aligned}$$
(11)


Assumption 3

The embedding \(H^1(\mu )\hookrightarrow L^2(\mu )\) is compact.

Remark 1.1

  1. (i)

    Assumption 1 guarantees that the elliptic equation \(\nabla _x^*\nabla _x u = h\) has a unique solution \(u\in H^2(\mu )\) for any \(h\in L^2(\mu )\) satisfying \((h)_\mu =0\) (see for example [19, Proposition 5]). Hence, together with Assumption 3, we derive from Fredholm alternative that \(L^2(\mu )\) has an orthonormal basis \(\{1\}\cup \{w_\alpha \}_{\alpha >0}\) where \(w_\alpha \in H^2(\mu )\) are eigenfunctions of \(\nabla _x^*\nabla _x\) with eigenvalue \(\alpha ^2\) for a discrete set of \(\alpha >0\) (see [22, Chapter 6] for an argument with bounded domains):

    $$\begin{aligned} \nabla ^*_x\nabla _x w_\alpha =\alpha ^2 w_\alpha . \end{aligned}$$

    Further, by Assumption 1, any nonzero eigenvalue \(\alpha ^2\) of \(\nabla ^*_x \nabla _x\) satisfies \(\alpha \ge \sqrt{m}\); in fact, when m is the optimal constant in (10), the smallest such \(\alpha \) is precisely \(\sqrt{m}\). The spectrum of \(\nabla _x^*\nabla _x\) is unbounded from above.

  2. (ii)

    Assumption 3 is satisfied when

    $$\begin{aligned} \lim _{|x|\rightarrow \infty } \dfrac{U(x)}{|x|^\beta }=\infty \end{aligned}$$

    for some \(\beta >1\) (see [31] for a proof). We would like to comment here that we require Assumption 3 only for technical purposes, more precisely in the proof of Lemma 2.6, where we use the spectral decomposition of the elliptic operator \(\nabla _x^*\nabla _x\) to construct the desired test functions. We believe that the assumption is not necessary for our main results to hold. We leave this for future research.

  3. (iii)

    Similar versions of Assumption 2 are commonly used in the literature, see e.g., the books [45, 54] and the papers [18, 19], and are satisfied when U grows at most exponentially fast as \(|x|\rightarrow \infty \). Here we adopt the dimension scaling of [10, Assumption 1] (in particular, we take \(c_1=c_3=M\) in their setting), since in the case of a separable potential \(U(x) = \sum _{i=1}^d u(x_i)\), this amounts to the more natural one-dimensional estimate \(|u''|^2 \leqslant M(1+|u'|^2)\).

Theorem 1

Under Assumptions 1, 2, and 3, there exist a constant \(\nu > 0\) and universal constants C, c independent of all parameters such that, for every f(t, x, v) satisfying the backward Kolmogorov equation (4) with initial condition \(f_0 \in L^2(\mu ;H^1_\kappa )\) and

$$\begin{aligned} (f_0)_{\rho _{\infty }}=0, \end{aligned}$$
(12)

we have, for every \(t\in (0,\infty )\),

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })} \leqslant C\exp (-\nu t)\Vert f_0\Vert _{L^2(\rho _{\infty })}. \end{aligned}$$

Moreover, \(\nu \) can be made explicit as

$$\begin{aligned} \nu = \dfrac{m\gamma }{c(\sqrt{m}+R+\gamma )^2} \end{aligned}$$
(13)

with some constant \(R>0\) given by

  1. (i)

    If U is convex, then

    $$\begin{aligned} R=0. \end{aligned}$$
  2. (ii)

    If the Hessian of U is bounded from below

    $$\begin{aligned} \nabla _x^2 U(x) \geqslant -K\, \textrm{Id}, \qquad \forall \, x \in {\mathbb {R}}^d \end{aligned}$$
    (14)

    for some constant \(K \geqslant 0\), then

    $$\begin{aligned} R=\sqrt{K}. \end{aligned}$$

    Note that if \(K = 0\), we recover the estimate in case (i).

  3. (iii)

    In the most general case without further assumptions,

    $$\begin{aligned} R=M+M^\frac{3}{4}d^\frac{1}{4}. \end{aligned}$$

Remark 1.2

  1. (i)

    If we fix \(m=O(1)\), then, when \(\gamma \rightarrow 0\) (resp. \(\gamma \rightarrow \infty \)), our result gives a decay rate estimate of \(O(\gamma )\) (resp. \(O(\gamma ^{-1})\)). This is consistent with [17, 26, 47] and also with the isotropic Gaussian case \(U(x)=\frac{m}{2}|x|^2\) (see Appendix A).

  2. (ii)

    In the convex case, if we optimize with respect to \(\gamma \) by choosing \(\gamma =\sqrt{m}\), then

    $$\begin{aligned} \nu =\frac{\sqrt{m}}{4c}. \end{aligned}$$

    As is shown in Appendix A, the scaling in m is optimal in the regime \(m\rightarrow 0\), as it is the rate even for the isotropic quadratic potential. A small numerical sketch of this optimization in \(\gamma \) is included after this remark. We refer the reader to Appendix B for the corresponding results from the DMS method, with a slightly more explicit estimate compared to [47].

  3. (iii)

    In the case where condition (14) is satisfied, e.g. for the double-well potential \(U(x)=(|x|^2-1)^2\) with \(K=4\), our scaling in K is consistent with [36, Theorem 1] and [37, Sec. 5]. A similar assumption is also used in [44, Theorem 1] for functional inequalities.

  4. (iv)

    It is well known that for the overdamped Langevin dynamics (6), the decay rate in \(L^2(\mu )\) is simply m. By part (ii) of this remark, when \(m \ll 1\), the underdamped Langevin dynamics (1) can converge to its equilibrium \(\rho _{\infty }\) at a rate \(O(\sqrt{m})\) for convex potentials (after optimizing in \(\gamma \)), which is much faster than the overdamped Langevin dynamics.

  5. (v)

    Due to the relation (see e.g., [48])

    $$\begin{aligned}&\frac{1}{\sqrt{2}}\left\Vert \rho - \rho _{\infty }\right\Vert _{\text {TV}} \leqslant \sqrt{\textrm{KL}\left( \rho \,\Vert \,\rho _{\infty }\right) } \leqslant \sqrt{\chi ^2(\rho , \rho _{\infty })} \\&\quad \equiv \left\Vert \rho - \rho _{\infty }\right\Vert _{L^2(\rho _{\infty }^{-1})} \equiv \left\Vert f - \int f\,\textrm{d}\rho _{\infty }\right\Vert _{L^2(\rho _{\infty })}, \end{aligned}$$

    where \(f = \,\textrm{d}\rho / \,\textrm{d}\rho _{\infty }\), and the Talagrand inequality [44] \(W_2(\rho ,\rho _{\infty }) \leqslant \sqrt{\frac{2}{C_{LSI}}\textrm{KL}(\rho \Vert \rho _{\infty })}\) where \(C_{LSI}\) is the logarithmic Sobolev constant, Theorem 1 implies that \(\rho (t, \cdot , \cdot )\) converges to \(\rho _{\infty }\) with rate \(2\nu \) in both \(\chi ^2\)-divergence and relative entropy, and with rate \(\nu \) in total variation and (if \(\mu \) satisfies log-Sobolev inequality) 2-Wasserstein distance. On the other hand, our result does not imply

    $$\begin{aligned}d(\rho _t,\rho _{\infty }) \leqslant C\exp (-\nu t) d(\rho _0,\rho _{\infty }) \end{aligned}$$

    where \(d(\rho ,\pi ) = TV(\rho ,\pi ), \, W_2(\rho ,\pi )\) or \(\textrm{KL}(\rho \Vert \pi )\). It is interesting to study if one could establish the same convergence rate with Wasserstein distance (which is the same as asking if one could establish a coupling argument for our result) or relative entropy.

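As promised in part (ii) of the remark above, here is a small numerical sketch (our own illustration; the values of m, R and the unspecified universal constant c are placeholders) confirming that the rate formula (13) is maximized over \(\gamma \) at \(\gamma =\sqrt{m}+R\), which reduces to \(\gamma =\sqrt{m}\) and \(\nu =\frac{\sqrt{m}}{4c}\) in the convex case \(R=0\).

```python
import numpy as np

# Illustrative check of the rate formula (13): nu(gamma) = m*gamma / (c*(sqrt(m)+R+gamma)^2).
# m, R and the universal constant c are placeholder values, not taken from the paper.
m, R, c = 1e-2, 0.0, 1.0  # convex case (R = 0)

def nu(gamma):
    return m * gamma / (c * (np.sqrt(m) + R + gamma) ** 2)

gammas = np.linspace(1e-4, 1.0, 200_000)
g_star = gammas[np.argmax(nu(gammas))]

print(g_star, np.sqrt(m) + R)            # the maximizer is (numerically) gamma = sqrt(m) + R
print(nu(g_star), np.sqrt(m) / (4 * c))  # and the optimal rate matches sqrt(m)/(4c) when R = 0
```

Analytically, \(\frac{\textrm{d}}{\textrm{d}\gamma }\frac{\gamma }{(\sqrt{m}+R+\gamma )^2}\) vanishes exactly at \(\gamma =\sqrt{m}+R\), giving \(\nu =\frac{m}{4c(\sqrt{m}+R)}\).
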
Our decay estimate is based on the following Poincaré-type inequality in time-augmented space:

Theorem 2

Under Assumptions 1, 2, and 3, there exist a universal constant C independent of all parameters, and a constant \(R<\infty \) (the same constant as in Theorem 1) such that for every \(f\in H_{hyp}^1(\lambda \otimes \mu )\), we have

$$\begin{aligned}{} & {} \Vert f-(f)_{\lambda \otimes \rho _{\infty }}\Vert _{L^2(\lambda \otimes \rho _{\infty })} \leqslant C\left( \left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}\right. \right. \nonumber \\{} & {} \quad \left. + \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert (\mathcal {I}-\Pi _v) f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \nonumber \\{} & {} \left. \quad +\left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \right) . \end{aligned}$$
(15)

Let us briefly describe the strategy of the proof, which is strongly motivated by the work of Armstrong and Mourrat [1]. A naive energy estimate combined with the Gaussian Poincaré inequality yields

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t} \Vert f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2 =-2\gamma \Vert \nabla _v f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2\leqslant -2\gamma \Vert (\mathcal {I}-\Pi _v) f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$

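For completeness, the identity and inequality in the display above follow from (4), (9), and the Gaussian Poincaré inequality for \(\kappa \) (a standard computation, spelled out here): since \({\mathcal {L}}_{\text {ham}}= \nabla _v^* \nabla _x - \nabla _x^* \nabla _v\) is antisymmetric in \(L^2(\rho _{\infty })\) and \({\mathcal {L}}_{\text {FD}}= -\nabla _v^* \nabla _v\),

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t} \Vert f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2 = 2\int f \,{\mathcal {L}}f \,\textrm{d}\rho _{\infty }= 2\int f \,{\mathcal {L}}_{\text {ham}}f \,\textrm{d}\rho _{\infty }-2\gamma \int |\nabla _v f|^2 \,\textrm{d}\rho _{\infty }= -2\gamma \Vert \nabla _v f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2, \end{aligned}$$

while the Gaussian Poincaré inequality \(\Vert \phi -\Pi _v \phi \Vert _{L^2(\kappa )}\leqslant \Vert \nabla _v \phi \Vert _{L^2(\kappa )}\), applied in the v variable and integrated in x against \(\mu \), yields the final inequality.
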
While the above establishes the \(L^2\) energy decay, it does not directly yield an exponential decay rate. In particular, the dissipation is only present in the velocity variable. However, instead of looking at a single time slice, we should look at time intervals: after time propagation, the dissipation in v together with the transport terms in x leads to dissipation in x. Moreover, in the analysis we essentially treat the time variable t as another space variable alongside x. With the help of a Poincaré-type inequality in the time-augmented state space established in Theorem 2, we can prove exponential convergence still using the standard energy estimate, in line with the moral “hypocoercivity is simply coercivity with respect to the correct norm”, quoted from [1, Page 4].

To prove Theorem 2, as a reader familiar with [19] might anticipate, elliptic regularity in the x variable plays an important role in the estimates; in Lemma 2.4 we give a mild generalization of it to the time-augmented space \(L^2(\lambda \otimes \mu )\). However, when applying integration by parts in the proof of Theorem 2, we need test functions that vanish at both time boundaries \(t=0\) and \(t=T\), which is not necessarily satisfied by the derivatives of the solution to the elliptic equation (22). This is why we resort to Lemma 2.6 (also an extension of Bogovskii’s operator [11] to \((I\times {\mathbb {R}}^d,\lambda \otimes \mu )\)) for the solution of the divergence equation (25), which is a cornerstone of this proof. In particular, even for convex U, the constants in (15) blow up as \(T\rightarrow 0\), which can be traced back to the estimate of \(\psi _{2,\alpha }'\) in (35), and thus prevents us from working on single time slices.

1.3 A Literature Review and Comparison

The kinetic Fokker–Planck equation was first studied by Kolmogorov [34] and was the main motivation for Hörmander’s theory of hypoelliptic equations [32], which gave an almost complete classification of second-order hypoelliptic operators. The earliest results regarding its exponential convergence were established in [52] for potentials with bounded Hessian, and were later generalized in [41, 51, 55]. There is a substantial amount of work in the literature on the exponential convergence of the underdamped Langevin dynamics. Below, we categorize these works based on the norms and approaches used to characterize the convergence.

  1. (i)

    (Convergence in \(H^1(\rho _{\infty })\) norm). The exponential convergence of the kinetic Fokker–Planck equation in \(H^1(\rho _{\infty })\) was proved by Villani in [54, Theorem 35], which was inspired by the early works [27, 29]. See also [53] for a brief overview of the main ideas. The earlier work [43] proved similar results on the torus without a forcing term. Since the \(L^2(\rho _{\infty })\) norm is controlled by the \(H^1(\rho _{\infty })\) norm, this result automatically implies the convergence of (4) in \(L^2(\rho _{\infty })\). However, the decay rate therein is quite implicit; see [54, Sec. 7.2]. This approach is extended in [9] to possibly singular potentials, with convergence rates given in certain cases.

  2. (ii)

    (Convergence in a modified \(L^2(\rho _{\infty })\) norm). A more direct approach for convergence in \(L^2(\rho _{\infty })\) was developed by Dolbeault, Mouhot and Schmeiser in [18, 19]; see also earlier ideas in [28]. They identified a modified \(L^2(\rho _{\infty })\) norm, denoted by \(\textsf{E}\), such that \(\textsf{E}(\rho (t, x, v)) \rightarrow 0\) exponentially fast for \(\rho (t, \cdot , \cdot )\) evolving according to (2). This hypocoercivity method was revisited and adapted in [17, 26, 47] to deal with the backward Kolmogorov equation (4), i.e., to show that \(\textsf{E}(f(t, \cdot , \cdot ))\) decays to zero exponentially fast. In Appendix B.1, we will briefly revisit how to choose the Lyapunov function \(\textsf{E}\), based on [16, Sec. 2], because their setup is consistent with our \(L^2(\rho _{\infty })\) estimate in Section 1.1 above. We would like to remark that while [47] obtains a rate whose scalings in d and \(\gamma \) are known, it is difficult to determine the optimal \(\gamma \) for their convergence rate estimates. As a remark, the DMS method [18, 19] has been extended or adapted to study the convergence of the spherical velocity Langevin equation [25], non-equilibrium Langevin dynamics [33], Langevin dynamics with general kinetic energy [49], temperature-accelerated molecular dynamics [50], adaptive Langevin dynamics [38], dynamics with Boltzmann-type dissipation [2], dynamics with singular potentials [12], just to name a few. It might be interesting to study whether the variational framework of [1], on which our work is based, can be extended to these cases.

  3. (iii)

    (Convergence in Wasserstein distance). Baudoin discussed a general framework extending the Bakry–Émery methodology [5] to hypoelliptic and hypocoercive operators, based on which the exponential convergence of the kinetic Fokker–Planck equation (quantified by a Wasserstein distance associated with a special metric) was proved under certain assumptions on the potential U(x) [7, Theorem 2.6]; see also [8]. A different approach is the coupling method for the underdamped Langevin dynamics (1). In [16, Sec. 2], for strongly convex potentials U, Dalalyan and Riou-Durand considered the mixing of the marginal distribution in the x coordinate, by a synchronous coupling argument; an estimate of the convergence rate was also explicitly provided, quantified by the \(W_2\) distance [16, Theorem 1]. For more general potentials, Eberle, Guillin and Zimmer developed a hybrid coupling method, composed of synchronous and reflection couplings, to study the exponential convergence of probability distributions for the underdamped Langevin dynamics (1), quantified by a Kantorovich semi-metric [20]. Unfortunately, their rates are dimension-dependent in general.

  4. (iv)

    (Convergence in relative entropy) Villani [54] obtained exponential convergence of the kinetic Fokker–Planck equation for potentials with bounded Hessian, which was extended in [8]. A more quantitative convergence rate is obtained in [40]. All of these works essentially use Gamma calculus with a twisted metric so that derivatives in the x direction can be introduced. In [13], exponential convergence of the entropy is established for potentials that may not have bounded Hessians but satisfy a stronger weighted log-Sobolev inequality.

There are other approaches to studying the long-time behavior of the underdamped Langevin dynamics, e.g., Lyapunov function methods [4, 41, 51, 55] and spectral analysis [21, 35]. There are also works that extend the aforementioned approaches to dynamics with singular potentials [9, 12, 14, 15, 30, 39]. We will not go into details here.

While our work is not the first one that studies the exponential convergence of underdamped Langevin dynamics, our estimates are more quantitative, and in certain cases, sharper than any existing result. In particular, for a large class of convex potentials, we establish an \(O(\sqrt{m})\) convergence rate after optimizing in \(\gamma \), which is independent of dimension and only assumes a mild upper bound (Assumption 2) on the derivatives of the potential. To the best of our knowledge, this optimal \(O(\sqrt{m})\) convergence rate is new in the literature.

Table 1 summarizes the previous results [9, 16, 40] under the assumption \(m \textrm{Id} \leqslant \nabla _x^2 U \leqslant L \textrm{Id}\) (which guarantees Assumptions 1–3) in the most interesting regime \(m\ll 1 \ll L\), with the optimal choice of \(\gamma \). To elaborate on the comparison with the result of [40]: after a rescaling, they proved exponential convergence of (4) with friction parameter (in their notation) \(\gamma \sqrt{\xi }\) and convergence rate \(O(\frac{\lambda }{\sqrt{\xi }})\), under constraints that require (see [40, Proof of Lemma 8])

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\xi }{2L}-\left( \frac{1}{4L}+\frac{1}{2m}\right) \lambda>0 \\ {}&\gamma \left( \frac{4\xi }{L}+1\right) -\left( \frac{1}{2m}+\frac{2}{L}\right) \lambda >0 \\ {}&\frac{1}{2}- \frac{\xi }{2L}+\left( \frac{1}{4L}+\frac{1}{2m}\right) \lambda -\gamma \left( \frac{4\xi }{L}+1\right) +\left( \frac{1}{2m}+\frac{2}{L}\right) \lambda \leqslant 0. \end{aligned} \right. \end{aligned}$$

Combined, these yield \(\xi \geqslant O(L)\) and \(\lambda \leqslant O(m)\), which means their convergence rate cannot exceed \(O(\frac{m}{\sqrt{L}})\). Moreover, they require \(\gamma \geqslant O(1)\), that is, their friction parameter must be at least \(O(\sqrt{L})\).

Table 1 Summary of the convergence rate \(\nu \) depending on dmL under the assumption \(m{\text {Id}}\leqslant \nabla _x^2 U \leqslant L {\text {Id}}\) for the regime \(m\ll 1 \ll L\)

We also comment that in the case where \(\Vert \nabla _x^2 U\Vert \leqslant L\), but U is not necessarily convex, our convergence rate is \(\nu = O(\frac{m}{\sqrt{L}})\) after optimizing in \(\gamma \) by choosing \(\gamma \sim \sqrt{L}\), which matches the results of existing works [9, 40].

2 Proofs

In this section, we present the statements and proofs of auxiliary lemmas, followed by the proofs of the two main theorems. Lemmas 2.1, 2.2, and 2.3 are technical lemmas that prepare us for the elliptic regularity result in Lemma 2.4. The divergence lemma, which builds on the elliptic regularity result, is stated and proved in Lemma 2.6. The proof of Theorem 2 is then possible with the test functions obtained from Lemma 2.6. Finally, we present the proof of Theorem 1, which follows from Theorem 2 and an energy estimate.

We start with the Poincaré inequality on the tensorized space \((I\times {\mathbb {R}}^d, \lambda \otimes \mu )\), which allows elliptic regularity to hold in the time-augmented state space. The proof is standard and is thus omitted.

Lemma 2.1

(Poincaré Inequality) For \(f\in H^1(\lambda \otimes \mu )\),

$$\begin{aligned} \Vert f-(f)_{\lambda \otimes \mu }\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2}\right\} \left( \Vert \partial _t f\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert \nabla _x f\Vert _{L^2(\lambda \otimes \mu )}^2\right) .\nonumber \\ \end{aligned}$$
(16)

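Although the proof is omitted, let us record a sketch of the standard tensorization argument behind (16) (our summary, for the reader's convenience). By the law of total variance,

$$\begin{aligned} \Vert f-(f)_{\lambda \otimes \mu }\Vert _{L^2(\lambda \otimes \mu )}^2 = \int _{I} \Big \Vert f(t,\cdot )-\int _{{\mathbb {R}}^d} f(t,\cdot )\,\textrm{d}\mu \Big \Vert _{L^2(\mu )}^2 \,\textrm{d}\lambda (t) + \Big \Vert \int _{{\mathbb {R}}^d} f(\cdot ,x)\,\textrm{d}\mu (x)-(f)_{\lambda \otimes \mu }\Big \Vert _{L^2(\lambda )}^2, \end{aligned}$$

where the first term is bounded by \(\frac{1}{m}\Vert \nabla _x f\Vert _{L^2(\lambda \otimes \mu )}^2\) by Assumption 1, and the second by \(\frac{T^2}{\pi ^2}\Vert \partial _t f\Vert _{L^2(\lambda \otimes \mu )}^2\) by the Poincaré (Wirtinger) inequality on (0, T) combined with Jensen's inequality; taking the maximum of the two constants gives (16).
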
The next lemma is also technical; its goal is to show that, under Assumption 2, \(|\nabla ^2 U|\) defines a bounded multiplication operator \(H^1(\lambda \otimes \mu )\rightarrow L^2(\lambda \otimes \mu )\), which allows us to upgrade the regularity of the solution u of (22) to \(u\in H^2(\lambda \otimes \mu )\) in the proof of Lemma 2.4.

Lemma 2.2

[54, Lemma A.24] For any \(\phi \in H^1(\lambda \otimes \mu )\), we have

$$\begin{aligned} \Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant 16\Vert \nabla _x \phi \Vert _{L^2(\lambda \otimes \mu )}^2+4Md\Vert \phi \Vert _{L^2(\lambda \otimes \mu )}^2, \end{aligned}$$
(17)

where M is the constant in (11).

Proof

$$\begin{aligned} \Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )}^2&= \int _{I\times {\mathbb {R}}^d} \phi ^2 \nabla _x U\cdot \nabla _x U \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^d} \nabla _x \cdot (\phi ^2 \nabla _x U)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= 2\int _{I\times {\mathbb {R}}^d} \phi \nabla _x \phi \cdot \nabla _x U\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) +\int _{I\times {\mathbb {R}}^d} \phi ^2 \Delta _x U \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&{\mathop {\leqslant }\limits ^{(11)}} \dfrac{1}{4} \Vert \phi \nabla _x U \Vert _{L^2(\lambda \otimes \mu )}^2 + 4 \Vert \nabla _x \phi \Vert _{L^2(\lambda \otimes \mu )}^2\\ {}&\qquad + Md \Vert \phi \Vert _{L^2(\lambda \otimes \mu )}^2 +\frac{\delta }{2}\int _{I\times {\mathbb {R}}^d} \phi ^2 |\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

We thus finish the proof of (17) after rearranging and using \(\delta <1\). \(\quad \square \)

Next we present a technical lemma that prepares us for the (mixed space-time) \(H^2\) estimates of u, the solution of the elliptic equation (22). This is a generalization of a similar \(L^2\)–\(H^2\) regularity estimate in [19, Proposition 5], where only the spatial variable is considered, but our estimates are algebraically simpler thanks to Bochner’s formula. Let us remark that we adopt the same scaling of parameters as [10, Lemma 3.6], especially in the most general case (iii).

Lemma 2.3

For any \(u\in H^2(\lambda \otimes \mu )\) such that \(\bar{\nabla } u\in H_0^1(\lambda \otimes \mu )^{d+1}\),

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2=\sum _{i,j=0}^d\Vert \partial _{x_i}\partial _{x_j} u\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant C\left( \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2+R^2\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) , \end{aligned}$$
(18)

Similarly,

$$\begin{aligned} \Vert \nabla _x^2 u\Vert _{L^2(\lambda \otimes \mu )}^2\leqslant C\left( \Vert \nabla _x^*\nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2+R^2\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) . \end{aligned}$$
(19)

Here C is a universal constant whose precise value can be traced in the proof under the different assumptions of Theorem 1, and R is as defined in Theorem 1.

Proof

We only prove (18) since the proof of (19) follows from a similar argument. The starting point of the proof is Bochner’s formula

$$\begin{aligned} \sum _{i,j=0}^d |\partial _{x_i, x_j} u|^2=\bar{\nabla }u\cdot \bar{\nabla }{\mathscr {L}}u-(\nabla _xu)^{\top }\nabla _x^2U\nabla _xu -{\mathscr {L}}\dfrac{|\bar{\nabla } u|^2}{2}. \end{aligned}$$

Integrating over \(\lambda \otimes \mu \) and noticing that the last term above has integral zero, we get

$$\begin{aligned} \sum _{i,j=0}^{d}\Vert \partial _{x_i, x_j} u\Vert _{L^2(\lambda \otimes \mu )}^2 =\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2-\int _{I\times {\mathbb {R}}^d} (\nabla _xu)^{\top } \nabla _x^2U \nabla _xu \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x).\nonumber \\ \end{aligned}$$
(20)

Since (14) gives \(-(\nabla _xu)^{\top }\nabla _x^2U\nabla _xu\leqslant K|\nabla _x u|^2\), this already verifies the conclusion in cases (i) (setting \(K=0\)) and (ii) with \(C=1\).

Now we deal with the more general case, without assuming (14). Using (17) with \(\phi =\partial _{x_i} u,\ i=1,\cdots ,d\),

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&\quad = \sum _{i=1}^d \int _{I\times {\mathbb {R}}^d} (\partial _{x_i} u)^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{(17)}} 16 \Vert D_x^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 +4Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {=}\limits ^{(20)}} 16\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +4Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad -16\int _{I\times {\mathbb {R}}^d} (\nabla _xu)^{\top } \nabla _x^2U \nabla _xu \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{(11)}} 16\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2+4 Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad +16M\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{d\geqslant 1}} 16 \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +20Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad +128M^2 \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) +\dfrac{1}{2}\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

Rearranging the terms, we arrive at

$$\begin{aligned}{} & {} \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \leqslant 32\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 \nonumber \\{} & {} \quad + (40Md+256M^2)\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$
(21)

Therefore, by (21) and the triangle inequality,

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2&{\mathop {\leqslant }\limits ^{(11),(20)}} \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\leqslant \; \; \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 \\&\quad +M\sqrt{d}\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\Vert |\nabla _x u||\nabla _x U|\Vert _{L^2(\lambda \otimes \mu )}\\&{\mathop {\leqslant }\limits ^{(21)}} \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\sqrt{d}\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\\&\quad +M \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\left( 6\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}+(16M+\sqrt{40Md})\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\right) \\&\leqslant 4\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +(19M^2+M\sqrt{40Md})\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$

\(\square \)

One of the key lemmas of our proof is the following result on elliptic regularity on the space \((I\times {\mathbb {R}}^d, \lambda \otimes \mu )\) (the solution to such elliptic equation will play an important role in the proof of Lemma 2.6):

Lemma 2.4

Consider the following elliptic equation:

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}u=h&\text{ in }&\ I\times {\mathbb {R}}^d,\\ {}&\partial _t u(t=0, \cdot )=\partial _t u(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d. \end{aligned}\right. \end{aligned}$$
(22)

Assume \(h\in H^{-1}(\lambda \otimes \mu )\), and \((h)_{\lambda \otimes \mu }=0\). Define the function space

$$\begin{aligned} V=\left\{ u\in H^1(\lambda \otimes \mu )~:~ (u)_{\lambda \otimes \mu }=0 \right\} . \end{aligned}$$

Then

  1. (i)

    There exists a unique \(u\in V\) which is a weak solution to (22). More precisely, for any \(v\in H^1(\lambda \otimes \mu )\), we have

    $$\begin{aligned} \int _{I\times {\mathbb {R}}^d} (\partial _t u\partial _t v +\nabla _x u \cdot \nabla _x v)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) = \int _{I\times {\mathbb {R}}^d} hv \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

    Moreover, when \(h\in L^2(\lambda \otimes \mu )\), we have the estimate

    $$\begin{aligned} \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2}\right\} \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
    (23)
  2. (ii)

    If \(h\in L^2(\lambda \otimes \mu )\), then the solution u to (22) satisfies \(u\in H^2(\lambda \otimes \mu )\).

Remark 2.5

One could in fact estimate \(\Vert u\Vert _{H^1(\lambda \otimes \mu )}\) using only \(\Vert h\Vert _{H^{-1}(\lambda \otimes \mu )}\), but with a slightly worsened constant \(\max \{\frac{1}{m},\frac{T^2}{\pi ^2},1\}\) on the rhs. Since in our applications we only use \(\Vert h\Vert _{L^2(\lambda \otimes \mu )}\), we opt for the current version of (23) for simplicity.

Proof

(i) V is a Hilbert space and contains non-zero elements (any function that is constant in t and belongs to \(H^1(\mu )\) with mean zero in x is included in V). Moreover, V is a subspace of \(H^1(\lambda \otimes \mu )\), and for the rest of the paper we equip it with the \(H^1(\lambda \otimes \mu )\) norm. We also define the following bilinear form:

$$\begin{aligned} B(u,v){:}{=}\int _{I\times {\mathbb {R}}^d} (\partial _t u\partial _t v +\nabla _x u \cdot \nabla _x v)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

One can easily verify that \(B(\cdot ,\cdot )\) is an inner product on V. Indeed, if \(B(u,u)=0\) then \(\partial _t u=\nabla _x u=0\), forcing u to be a constant, which has to be 0 since \((u)_{\lambda \otimes \mu }=0\). If u is a weak solution of (22), then \(B(u,v)=\int _{I\times {\mathbb {R}}^d} hv\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \) for any \(v\in H^1(\lambda \otimes \mu )\), and taking \(v=1\) shows that necessarily \((h)_{\lambda \otimes \mu }=0\).

Since \((u)_{\lambda \otimes \mu }=0\), by the Poincaré inequality (Lemma 2.1) we can show that B is coercive with respect to the \(H^1(\lambda \otimes \mu )\) norm, in the sense that

$$\begin{aligned} B[u,u]&= \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2+ \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 \\ {}&\geqslant \dfrac{1}{C}(\Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert u\Vert _{L^2(\lambda \otimes \mu )}^2) \\ {}&= \dfrac{1}{C}\Vert u\Vert _{H^1(\lambda \otimes \mu )}^2. \end{aligned}$$

We can also show that B is bounded: since B is an inner product with \(B[u,u]\leqslant \Vert u\Vert _{H^1(\lambda \otimes \mu )}^2\), the Cauchy–Schwarz inequality gives \(|B(u,v)|\leqslant \Vert u\Vert _{H^1(\lambda \otimes \mu )}\Vert v\Vert _{H^1(\lambda \otimes \mu )}\). Define a linear functional on V: \(H(v){:}{=}\int _{I\times {\mathbb {R}}^d} hv \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\). One can verify the boundedness of H:

$$\begin{aligned} |H(v)|\leqslant \Vert h\Vert _{H^{-1}(\lambda \otimes \mu )}\Vert v\Vert _{H^1(\lambda \otimes \mu )}. \end{aligned}$$

Thus by Lax–Milgram’s Theorem, the equation (22) has a unique weak solution \(u\in V\). Moreover,

$$\begin{aligned}&( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 )^2= B[u,u]^2 \\ {}&\quad = \left( \int _{I\times {\mathbb {R}}^d} hu\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\right) ^2 \leqslant \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2\Vert u\Vert _{L^2(\lambda \otimes \mu )}^2 \\ {}&\quad {\mathop {\leqslant }\limits ^{(16)}} \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2} \right\} \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2 \left( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) , \end{aligned}$$

and the desired estimate follows.

(ii) For each \(i=1,2,\cdots ,d\), consider the elliptic equation

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}w_i=\partial _{x_i}h-\nabla _x u\cdot \nabla _x \partial _{x_i} U&\text{ in }&\ I\times {\mathbb {R}}^d,\\ {}&\partial _t w_i(t=0, \cdot )=\partial _t w_i(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d. \end{aligned} \right. \end{aligned}$$
(24)

The motivation of considering (24) is that, if we formally differentiate (22) with respect to \(\partial _{x_i}\), then \(\partial _{x_i} u \) satisfies precisely the equation (24) for \(w_i\). Hence, our plan is to use part (i) to establish \(w_i\in H^1(\lambda \otimes \mu )\), then argue that \(w_i-\partial _{x_i} u\) must be constant.

We first verify the rhs of (24) has total integral zero. Indeed

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} (\partial _{x_i} h-\nabla _xu \cdot \nabla _x \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} (h\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} \left( {\mathscr {L}}u\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} \left( \partial _{t}u\partial _{tx_i}U+\nabla _x u\cdot \nabla _x\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) = 0. \end{aligned}$$

The next step is to show that the rhs is in \(H^{-1}(\lambda \otimes \mu )\). Pick a test function \(\phi \in H^1(\lambda \otimes \mu )\) with \(\Vert \phi \Vert _{H^1(\lambda \otimes \mu )}=1\); then, by Lemma 2.2,

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} (\partial _{x_i} h-\nabla _x u\cdot \nabla _x\partial _{x_i} U)\phi \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad \leqslant \int _{I\times {\mathbb {R}}^d} (-h\partial _{x_i} \phi +h\phi \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)+\int _{I\times {\mathbb {R}}^d}|\phi \nabla _x u||\nabla _x\partial _{x_i} U| \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad {\mathop {\leqslant }\limits ^{(11)}} \Vert h\Vert _{L^2(\lambda \otimes \mu )}(1+\Vert \phi \partial _{x_i} U\Vert _{L^2(\lambda \otimes \mu )})+M\int _{I\times {\mathbb {R}}^d}|\phi \nabla _x u|(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&\qquad \leqslant \Vert h\Vert _{L^2(\lambda \otimes \mu )}(1+\Vert \phi \partial _{x_i} U\Vert _{L^2(\lambda \otimes \mu )})+M\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}(\sqrt{d}+\Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )})\\&\qquad {\mathop {\leqslant }\limits ^{(17),(23)}} C(M,d)\Vert h\Vert _{L^2(\lambda \otimes \mu )}, \end{aligned}$$

where \(C(M,d)>0\) is a constant depending on M and d. Therefore, by (i) we know there exists a \(w_i\in V\) which is the weak solution of (24). Finally, comparing (22) and (24), we observe that \({\mathscr {L}}(w_i-\partial _{x_i}u)=0\) in the sense of distributions, which by (i) implies that \(w_i-\partial _{x_i} u\) must be a constant, necessarily equal to \(-(\partial _{x_i} u)_{\lambda \otimes \mu }\), since by construction \(w_i\in V\) and \((w_i)_{\lambda \otimes \mu }=0\). This also means \(\partial _{x_i} u \in H^1(\lambda \otimes \mu )\) since \(w_i\in H^1(\lambda \otimes \mu )\). We end the proof of \(u\in H^2(\lambda \otimes \mu )\) by writing \(\partial _{tt} u =\nabla _x^*\nabla _x u-h \in L^2(\lambda \otimes \mu ) \). \(\quad \square \)

We finally need a lemma for the solution of a divergence equation with Dirichlet boundary conditions. The resolution of the divergence equation is an important tool in mathematical fluid dynamics (see the book [23, Section III.3]). However, in order to obtain a more natural estimate on the constants, instead of resorting to the aforementioned Bogovskii’s operator, we take advantage of the structure of the space \(L^2(\mu )\) via an eigenspace decomposition, which is made possible thanks to Assumption 3. This will provide us with test functions that play a crucial role in the proof of Theorem 2.

Lemma 2.6

For any function \(f\in L^2(\lambda \otimes \mu )\) with \((f)_{\lambda \otimes \mu }=0\), there exist two functions \(\phi _0 \in H_0^1(\lambda \otimes \mu )\) and \(\Phi \in H^2(\lambda \otimes \mu )\) such that \(\nabla _x \Phi \in H_0^1(\lambda \otimes \mu )^d\) and

$$\begin{aligned} -\partial _t\phi _0+\nabla _x^*\nabla _x \Phi = f \end{aligned}$$
(25)

with estimates

$$\begin{aligned} \Vert \phi _0\Vert _{L^2(\lambda \otimes \mu )}+ \Vert \nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )} \leqslant C\left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}\nonumber \\ \end{aligned}$$
(26)

and

$$\begin{aligned}{} & {} \Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )} + \Vert \bar{\nabla }\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )} \leqslant C\left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2} \right. \nonumber \\{} & {} \left. \quad + \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}. \end{aligned}$$
(27)

Here C is a universal constant and R is the constant defined in Theorem 1.

Remark 2.7

We believe the correct scaling of the rhs should be \(O(\frac{1}{T})\) as \(T\rightarrow 0\), which we are unable to obtain due to the pessimistic estimates in the last two lines of (31) that change the scaling of the last two terms from O(1) to \(O(T^2)\); we will not pursue this further since in the proof of Theorem 1 we only take \(T=\frac{1}{\sqrt{m}}\). As we mentioned earlier after Theorem 2, the scaling of \(O(\frac{1}{T})\) as \(T\rightarrow 0\) should come from (35).

Before we proceed to the proof, let us give a brief heuristic argument on why we need to introduce the space of harmonic functions (i.e. the space \({\mathbb {H}}\) that appears at the beginning of the proof) and consider the orthogonal projection onto it. Indeed, a direct way to look for a solution of (25) is to solve (22) and set \(\phi _0=\partial _t u, \Phi = u\). However, these test functions do not necessarily satisfy the appropriate boundary conditions. In particular, if the solution of (22) satisfies \(\nabla _x u(t=0,\cdot ) = \nabla _x u(t=T,\cdot )=0\), then necessarily f has to be perpendicular to the space of harmonic functions. The harmonic part of f therefore requires special treatment and brings technical difficulties to the proof. However, thanks to Assumption 3, one can decompose the harmonic part of f using separation of variables, which enables us to obtain the solution of the divergence equation by constructing it for each component and adding the pieces up.

Proof

Let \({\mathbb {H}}\) be the subspace of \(L^2(\lambda \otimes \mu )\) that consists of “harmonic functions”, in other words, \(f\in {\mathbb {H}}\) if and only if \({\mathscr {L}}f=0\). We consider the decomposition \(f=f^{(1)}+f^{(2)}\) where \(f^{(1)}\in {\mathbb {H}}\) and \(f^{(2)}\perp {\mathbb {H}}\). Since \(1\in {\mathbb {H}}\) we know \((f^{(2)})_{\lambda \otimes \mu }=0\) and hence \((f^{(1)})_{\lambda \otimes \mu }=0\). Therefore by linearity it suffices to consider \(f^{(1)}\) and \(f^{(2)}\) separately. For \(f^{(2)}\), the equation

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}u=f^{(2)}&\text{ in }&\ I\times {\mathbb {R}}^d, \\ {}&\partial _t u(t=0, \cdot )=\partial _t u(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d \end{aligned} \right. \end{aligned}$$
(28)

has a unique solution in \(V\cap H^2(\lambda \otimes \mu )\) by Lemma 2.4. Moreover, for any \(v\in {\mathbb {H}} \cap H^2(\lambda \otimes \mu )\), integration by parts yields

$$\begin{aligned} 0= & {} \int _{I\times {\mathbb {R}}^d} f^{(2)}v \,\textrm{d}\lambda (t) \,\textrm{d}\mu (x) = B[u,v]\\= & {} \int _{I\times {\mathbb {R}}^d} u{\mathscr {L}}v \,\textrm{d}\lambda (t) \,\textrm{d}\mu (x) + \int _{{\mathbb {R}}^d} \left( u(T)\partial _t v(T) - u(0)\partial _t v(0)\right) \,\textrm{d}\mu (x) \end{aligned}$$

Therefore, since v is arbitrary, we have \(u(T)=u(0)=0\), which implies \(\nabla _x u\in H_0^1(\lambda \otimes \mu )^d\). Also, by the boundary conditions in (28), \(\partial _t u\in H_0^1(\lambda \otimes \mu )\). Thus, for the \(f^{(2)}\) part, it suffices to take \(\phi _0^{(2)}=\partial _t u,~\Phi ^{(2)}= u\), with the estimates

$$\begin{aligned} \Vert \bar{\nabla }u\Vert _{L^2(\lambda \otimes \mu )}^2 {\mathop {\leqslant }\limits ^{(23)}} C\max \left\{ \frac{1}{m},T^2 \right\} \Vert f^{(2)}\Vert ^2_{L^2(\lambda \otimes \mu )}, \end{aligned}$$
(29)

and

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 {\mathop {\leqslant }\limits ^{(18),(29)}} C\left( 1+\dfrac{R^2}{m}+R^2T^2\right) \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
(30)

We now consider the \(f^{(1)}\) part. Since \( \{1\}\cup \{w_\alpha \} \) forms an orthonormal basis in \(L^2(\mu )\) and \((f^{(1)})_{\lambda \otimes \mu }=0\), we have an orthogonal decomposition

$$\begin{aligned} f^{(1)}(t, x)=f_0(t)+\sum _{\alpha } f_\alpha (t) w_\alpha (x). \end{aligned}$$

Since \(f^{(1)}\) is harmonic,

$$\begin{aligned} 0={\mathscr {L}}f^{(1)}= -f_0''(t)+\sum _{\alpha } (-f''_\alpha (t)+\alpha ^2f_\alpha (t)) w_\alpha (x) \end{aligned}$$

and therefore \(f_0(t)\) is an affine function \(f_0(t)=c_0(t-\frac{T}{2})\) for some constant \(c_0\), as \(f_0(t)\) has integral zero. Moreover for \(\alpha >0\) there exist constants \(c_\pm ^\alpha \) such that

$$\begin{aligned} f_\alpha (t)=c_+^\alpha e^{-\alpha t}+c_-^\alpha e^{-\alpha (T-t)}. \end{aligned}$$

Therefore, by orthogonality in \(L^2(\lambda \otimes \mu )\), we can write for some constant \(C\in (1,\infty )\),

$$\begin{aligned} \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2&= \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2\Vert t-\frac{T}{2}\Vert _{L^2(\lambda )}^2+\sum _\alpha \Vert c_+^\alpha e^{-\alpha t} + c_-^\alpha e^{-\alpha (T-t)} \Vert _{L^2(\lambda )}^2 \nonumber \\&= \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12}+\sum _\alpha \left( \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \frac{1-e^{-2\alpha T}}{2\alpha T} + 2c_+^\alpha c_-^\alpha e^{-\alpha T} \right) \nonumber \\&\geqslant \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12}+\sum _\alpha \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \left( \frac{1-e^{-2\alpha T}}{2\alpha T} - e^{-\alpha T} \right) \nonumber \\&\geqslant \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12} + \frac{1}{C}\sum _\alpha \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \frac{(1-e^{-\alpha T})^3}{\alpha T}. \end{aligned}$$
(31)

The construction of test functions for \(f_0(t)\) is straightforward: we simply take \(\Phi ^{(0)}=0\) and \(\phi _0^{(0)}(t,x)=\frac{c_0}{2}(t^2-tT)\). We then construct \(\phi _{0,\alpha },\Phi _\alpha \) for each component \(e^{-\alpha t} w_\alpha (x)\) of the sum, and the functions \(\phi _{0,\alpha }(T-t,\cdot ),\Phi _{\alpha }(T-t,\cdot )\) then apply, up to signs that do not affect the norm estimates below, to the component \(e^{-\alpha (T-t)}w_\alpha (x)\), so that the eventual test functions \(\phi _0,\Phi \) can be obtained after taking linear combinations. The goal is to find \(\phi _{0,\alpha },\Phi _\alpha \) such that

$$\begin{aligned} -\partial _t\phi _{0,\alpha }+\nabla _x^*\nabla _x \Phi _\alpha = e^{-\alpha t}w_\alpha (x). \end{aligned}$$

Since \(w_\alpha \in H^2(\lambda \otimes \mu )\), in order to eliminate the x part of the equation, we can take the natural ansatz by separation of variables \(\phi _{0,\alpha }=\psi _{1,\alpha }(t)w_\alpha (x)\) and \(\Phi _\alpha =\psi _{2,\alpha }(t) w_\alpha (x)\), and the two functions \(\psi _{1,\alpha }(t),\psi _{2,\alpha }(t)\) should satisfy \(\psi _{1,\alpha }(0)=\psi _{1,\alpha }(T)=\psi _{2,\alpha }(0)=\psi _{2,\alpha }(T)=0\) as well as the equation

$$\begin{aligned} -\psi _{1,\alpha }'(t)+\alpha ^2\psi _{2,\alpha }(t)=e^{-\alpha t}. \end{aligned}$$
(32)

Integrating (32) over \(t\in (0,T)\) and using the boundary conditions on \(\psi _{1,\alpha }\), we obtain the necessary and sufficient condition

$$\begin{aligned} \int _0^T \psi _{2,\alpha }(t) \,\textrm{d}t = \frac{1-e^{-\alpha T}}{\alpha ^3}. \end{aligned}$$
(33)

Of course there exist infinitely many possible solutions, since for any \(\psi _{2,\alpha }\) that vanishes at both time boundaries and satisfies (33), the choice \(\psi _{1,\alpha } = \int _0^t (\alpha ^2\psi _{2,\alpha } (\tau ) -e^{-\alpha \tau })\,\textrm{d}\tau \) also vanishes at both time boundaries. Therefore we only need to choose a particular one satisfying the desired estimates. Let us introduce the short-hand notation \(\ell =e^{-\alpha T} \in (0,1)\). Our idea is to find \(\psi _{2,\alpha }\) of the form \(\psi _{2,\alpha }(t)= \frac{1}{\alpha ^2}g(e^{-\alpha t})\), which after the change of variable \(s{:}{=}e^{-\alpha t}\) turns the condition (33) into \(\int _{\ell }^1 \frac{g(s)}{s}\,\textrm{d}s = 1-\ell \), and the boundary conditions into \(g(1)=g(\ell )=0\). Hence, we may finish our construction by picking \(g(s)=sh(s)\) with

$$\begin{aligned} h(x)= \frac{6}{(1-\ell )^2}(x-\ell )(1-x). \end{aligned}$$

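As a quick sanity check (our computation, using the change of variable \(s=e^{-\alpha t}\) described above), this choice indeed satisfies the normalization (33):

$$\begin{aligned} \int _0^T \psi _{2,\alpha }(t)\,\textrm{d}t = \dfrac{1}{\alpha ^2}\int _0^T g(e^{-\alpha t})\,\textrm{d}t = \dfrac{1}{\alpha ^3}\int _\ell ^1 \dfrac{g(s)}{s}\,\textrm{d}s = \dfrac{1}{\alpha ^3}\int _\ell ^1 h(s)\,\textrm{d}s = \dfrac{6}{\alpha ^3(1-\ell )^2}\cdot \dfrac{(1-\ell )^3}{6} = \dfrac{1-e^{-\alpha T}}{\alpha ^3}, \end{aligned}$$

while the boundary conditions \(g(1)=g(\ell )=0\) are immediate from the formula for h.
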
From the expression we can directly derive (using \(\alpha \geqslant \sqrt{m}\))

$$\begin{aligned} 0 \leqslant g(s)\leqslant \frac{3}{2}s \ \text{ and } \ |g'(s)|\leqslant \dfrac{4}{1-\ell }= \frac{4}{1-e^{-\alpha T}}. \end{aligned}$$

One can explicitly compute

$$\begin{aligned}&\Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2=\dfrac{1}{\alpha ^4T}\int _0^T g(e^{-\alpha t})^2 \,\textrm{d}t = \dfrac{1}{\alpha ^5T}\int _\ell ^1 \dfrac{g(s)^2}{s}\,\textrm{d}s = \dfrac{3(1-e^{-2\alpha T})}{5\alpha ^5T}, \end{aligned}$$
(34)
$$\begin{aligned} \text{ and }&\Vert \psi _{2,\alpha }'\Vert _{L^2(\lambda )}^2=\dfrac{1}{\alpha ^2T}\int _0^T g'(e^{-\alpha t})^2 e^{-2\alpha t} \,\textrm{d}t = \dfrac{1}{\alpha ^3T}\int _\ell ^1 g'(s)^2 s \,\textrm{d}s \leqslant \dfrac{8}{\alpha ^3T(1-e^{-\alpha T})}. \end{aligned}$$
(35)

Moreover since \(\psi _{1,\alpha }'(t) = \alpha ^2\psi _{2,\alpha }(t) - e^{-\alpha t}\) from (32),

$$\begin{aligned} \Vert \psi _{1,\alpha }'\Vert _{L^2(\lambda )}^2\leqslant 2\alpha ^4 \Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2 +\dfrac{1-e^{-2\alpha T}}{\alpha T}\leqslant \dfrac{3(1-e^{-2\alpha T})}{\alpha T}. \end{aligned}$$
(36)

Finally since

$$\begin{aligned} \psi _{1,\alpha }(t)=\int _0^t (g(e^{-\alpha s})-e^{-\alpha s})\,\textrm{d}s =\dfrac{1}{\alpha }\int _{e^{-\alpha t}}^1 (\dfrac{g(\tau )}{\tau }-1)\,\textrm{d}\tau = \dfrac{1}{\alpha } r(e^{-\alpha t}) \end{aligned}$$

with

$$\begin{aligned}r(s) = \int _s^1 (h(\tau )-1)\,\textrm{d}\tau = \frac{(s-\ell )(1-s)(1+\ell -2s)}{(1-\ell )^2},\end{aligned}$$

we can estimate

$$\begin{aligned}{} & {} \alpha ^2\Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2= \frac{1}{\alpha T} \int ^1_\ell \frac{r(t)^2}{t}\,\textrm{d}t = \frac{(1-\ell )^3}{\alpha T}\nonumber \\{} & {} \quad \int _0^1 \frac{s^2(1-s)^2(1-2s)^2}{(1-\ell )s+\ell }\,\textrm{d}s \leqslant \frac{C(1-e^{-\alpha T})^3}{\alpha T}. \end{aligned}$$
(37)

To sum up, our construction of test functions can be written as

$$\begin{aligned} \phi _0&= \partial _t u+c_0\frac{t^2-tT}{2} + \sum _\alpha (c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t))w_\alpha (x), \\ \Phi&= u + \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))w_\alpha (x), \end{aligned}$$

where we recall that u is the solution of (28).

We now establish the estimates by direct calculations, which is possible since the variables are separated. Notice that for \(\alpha , \beta \),

$$\begin{aligned} \langle \nabla _x w_\alpha , \nabla _x w_\beta \rangle _{L^2(\mu )}= \langle w_\alpha , \nabla _x^*\nabla _x w_\beta \rangle _{L^2(\mu )}= \beta ^2 \langle w_\alpha , w_\beta \rangle _{L^2(\mu )}=\alpha ^2 \delta _{\alpha ,\beta }, \end{aligned}$$

hence cross terms in the expansion of \(\Vert \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))\nabla _x w_\alpha (x)\Vert _{L^2(\lambda \otimes \mu )}^2\) vanish. Therefore, we can estimate

$$\begin{aligned}&\Vert \phi _0\Vert ^2_{L^2(\lambda \otimes \mu )} + \Vert \nabla _x\Phi \Vert ^2_{L^2(\lambda \otimes \mu )}\nonumber \\ {}&\quad \leqslant 3\left( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{c_0^2}{4}\Vert t^2-tT\Vert _{L^2(\lambda )}^2 \right. \nonumber \\&\qquad + \sum _\alpha \Vert c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t)\Vert _{L^2(\lambda )}^2\Vert w_\alpha \Vert _{L^2(\mu )}^2 \nonumber \\&\qquad \left. + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 + \left\| \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))\nabla _x w_\alpha \right\| _{L^2(\lambda \otimes \mu )}^2 \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(23)}} 6\left( \max \{\frac{1}{m},T^2 \} \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{c_0^2T^4}{120} + \sum _\alpha ((c_+^\alpha )^2+(c_-^\alpha )^2) \Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2\right. \nonumber \\&\qquad \left. +\sum _{\alpha } \Vert c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t)\Vert _{L^2(\lambda )}^2 \Vert \nabla _x w_\alpha \Vert _{L^2(\mu )}^2 \right) \nonumber \\&\quad \leqslant C\left( \max \{\frac{1}{m},T^2\}\Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2T^4 \right. \nonumber \\&\qquad \left. + \sum _\alpha ((c_+^\alpha )^2+(c_-^\alpha )^2) (\Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2+\alpha ^2\Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2) \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(37),(34)}} C\left( \max \{\frac{1}{m},T^2\}\left\| f^{(2)}\right\| _{L^2(\lambda \otimes \mu )}^2+ c_0^2T^4 \right. \nonumber \\&\qquad \left. + \sum _\alpha \frac{1}{\alpha ^2}((c_+^\alpha )^2+(c_-^\alpha )^2) \frac{(1-e^{-\alpha T})^3+1-e^{-2\alpha T}}{\alpha T} \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(31)}} C\max \left\{ \frac{1}{m(1-e^{-\sqrt{m} T})^2},T^2\right\} \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
(38)

Here, in the last line where we used (31), the worst factor \((1-e^{-\sqrt{m}T})^{-2}\) comes only from the last term of the line above. This establishes (26). Using similar arguments, we can estimate

$$\begin{aligned} \Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )}^2&= \Big \Vert \nabla _x\partial _t u+ \sum _\alpha (c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t))\nabla _x w_\alpha (x)\Big \Vert ^2_{L^2(\lambda \otimes \mu )} \nonumber \\ {}&\leqslant 2\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \alpha ^2\Vert c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t)\Vert _{L^2(\lambda )}^2 \right) \nonumber \\ {}&\leqslant C\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2) \alpha ^2 \Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2 \right) \nonumber \\&{\mathop {\leqslant }\limits ^{(37)}} C\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2) \frac{(1-e^{-\alpha T})^3}{\alpha T} \right) , \end{aligned}$$
(39)

as well as

$$\begin{aligned} \Vert \partial _t \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu )}^2&= \Bigl \Vert \nabla _x \partial _t u + \sum _\alpha (c_+^\alpha \psi _{2,\alpha }'(t) - c_-^\alpha \psi _{2,\alpha }'(T-t))\nabla _x w_\alpha (x)\Bigr \Vert _{L^2(\lambda \otimes \mu )}^2 \nonumber \\&\leqslant 2 \left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \Vert c_+^\alpha \psi _{2,\alpha }'(t) \right. \nonumber \\&\quad \left. - c_-^\alpha \psi _{2,\alpha }'(T-t)\Vert _{L^2(\lambda )}^2\Vert \nabla _x w_\alpha \Vert _{L^2(\mu )}^2\right) \nonumber \\&\leqslant C \left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \alpha ^2((c_+^\alpha )^2+ (c_-^\alpha )^2)\Vert \psi _{2,\alpha }'\Vert _{L^2(\lambda )}^2\right) \nonumber \\&{\mathop {\leqslant }\limits ^{(35)}} C\left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1}{\alpha T(1-e^{-\alpha T})}\right) . \end{aligned}$$
(40)

We finally treat the terms from \(\nabla ^2_x \Phi \):

$$\begin{aligned} \Vert \nabla ^2_x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2&{\mathop {\leqslant }\limits ^{(19)}} C\left( \Vert \nabla _x^*\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2 +R^2 \Vert \nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\ {}&{\mathop {\leqslant }\limits ^{(25),(38)}} C\left( \Big \Vert f+\partial _{tt} u+c_0(t-\frac{T}{2}) + \sum _\alpha (c_+^\alpha \psi _{1,\alpha }'(t) \right. \nonumber \\&\quad - c_-^\alpha \psi _{1,\alpha }'(T-t))w_\alpha (x)\Big \Vert _{L^2(\lambda \otimes \mu )}^2\nonumber \\&\quad \left. +R^2\left( T^2+ \frac{1}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\&\leqslant C\left( \Vert \partial _{tt}u\Vert _{L^2(\lambda \otimes \mu )}^2+c_0^2T^2+ \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\Vert \psi _{1,\alpha }'\Vert _{L^2(\lambda )}^2 \right. \nonumber \\&\quad \left. +\left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\&{\mathop {\leqslant }\limits ^{(36)}} C\left( \Vert \partial _{tt}u\Vert _{L^2(\lambda \otimes \mu )}^2+c_0^2T^2+ \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1-e^{-2\alpha T}}{\alpha T} \right. \nonumber \\ {}&\quad \left. +\left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) . \end{aligned}$$
(41)

Adding together (39), (40) and (41), we arrive at

$$\begin{aligned}&\Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \bar{\nabla }\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2 \\&\quad \leqslant C\left( \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2 T^2 +\sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1}{\alpha T(1-e^{-\alpha T})} \right. \\&\qquad \left. + \left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \\&\quad {\mathop {\leqslant }\limits ^{(30),(31)}}C \left( 1+R^2T^2+ \frac{1}{(1-e^{-\sqrt{m}T})^4}+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^4}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$

\(\quad \square \)

We are now ready to prove the main results of the paper. The proof is essentially inspired by that of [1, Proof of Theorem 3]. In particular, to retrieve the \(L^2(\lambda \otimes \mu ;H^{-1}_\kappa )\) norm, we need to construct a test function in \(L^2(\lambda \otimes \mu ;H^1_\kappa )\), which is closely related to the test functions constructed in Lemma 2.6. The differences between the two proofs are: (1) we choose the test functions explicitly, namely \(\xi _0=1\) and \(\xi _i = v_i\), which are mutually orthogonal and whose moments up to fourth order are explicit (in particular, all first and third moments vanish); (2) instead of using \(\Vert \bar{\nabla } \Pi _v f\Vert _{H^{-1}(\lambda \otimes \mu )}\) as an intermediate step, we proceed as in (42) and control the \(L^2(\lambda \otimes \mu ;H^1_\kappa )\) norm of another explicitly constructed function, in order to minimize the use of the Cauchy–Schwarz inequality and to carefully track the dimension dependence of the constants.

Proof of Theorem 2

Without loss of generality, assume \((f)_{\lambda \otimes \rho _{\infty }}=0\), which implies \( (\Pi _v f)_{\lambda \otimes \mu } = 0\). Therefore, we can take \(\phi _0, \Phi \) as in Lemma 2.6 with \(\Pi _v f\) in place of f, so that \(-\partial _t \phi _0 + \nabla _x^* \nabla _x \Phi = \Pi _v f\). The trick in the following step is to introduce the variable v into the calculation. Notice, by Gaussianity, that

$$\begin{aligned} \int _{{\mathbb {R}}^d} v_i\ \,\textrm{d}\kappa (v)=0,\qquad \int _{{\mathbb {R}}^d} v_iv_j\ \,\textrm{d}\kappa (v)=\delta _{i,j}, \end{aligned}$$

where \(\delta _{i,j}\) is the Kronecker symbol, which equals 1 if \(i=j\) and 0 otherwise. Thus,

$$\begin{aligned} \begin{aligned} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2&=\int _{I\times {\mathbb {R}}^d} \Pi _v f(-\partial _t \phi _0 +\nabla _x^*\nabla _x \Phi )\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^{2d}}\Pi _v f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi \\&\quad -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&= \int _{I\times {\mathbb {R}}^{2d}} f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi \\&\quad -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad + \int _{I\times {\mathbb {R}}^{2d}} (\partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x \Phi +v\cdot \nabla _x^2\Phi \cdot v\\&\quad -\nabla _x\Phi \cdot \nabla _x U) (f-\Pi _v f) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v). \end{aligned} \end{aligned}$$
(42)

For the first integral on the right-hand side, we use integration by parts; here it is important that the test functions \((\phi _0,\nabla _x\Phi )\) satisfy Dirichlet boundary conditions in time:

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^{2d}} f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( \partial _t f\phi _0-\partial _t f(v\cdot \nabla _x\Phi )-\phi _0(v\cdot \nabla _x f)+f\phi _0(v\cdot \nabla _x U) \right. \\ {}&\qquad \left. +(v\cdot \nabla _x f)(v\cdot \nabla _x \Phi )-f(v\cdot \nabla _x\Phi )(v\cdot \nabla _x U)+f\nabla _x\Phi \cdot \nabla _x U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( \partial _t f\phi _0-\partial _t f(v\cdot \nabla _x\Phi )-\phi _0(v\cdot \nabla _x f)+\phi _0(\nabla _v f\cdot \nabla _x U) \right. \\&\qquad \left. +(v\cdot \nabla _x f)(v\cdot \nabla _x\Phi )-\nabla _v\cdot ((v\cdot \nabla _x\Phi )f \nabla _x U)+ f\nabla _x\Phi \cdot \nabla _x U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( (\partial _t f-v\cdot \nabla _x f+\nabla _x U\cdot \nabla _v f)(\phi _0-v\cdot \nabla _x\Phi )\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad \leqslant \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )}\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}. \end{aligned}$$

We further estimate the term \(\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}\) by explicit integration, noticing that \((\phi _0,\Phi )\) do not depend on v, so that the moments in v can be computed explicitly:

$$\begin{aligned} \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}^2&= \int _{I\times {\mathbb {R}}^d} \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{H^1_\kappa }^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^d} \left( \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2_\kappa }^2\right. \\&\quad \left. +\Vert \nabla _v(\phi _0-v\cdot \nabla _x\Phi )\Vert _{L^2_\kappa }^2\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&= \int _{I\times {\mathbb {R}}^d} \left( \int _{{\mathbb {R}}^d}(\phi _0-v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v)\right. \\&\quad \left. +\int _{{\mathbb {R}}^d}|\nabla _x\Phi |^2\,\textrm{d}\kappa (v)\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&= \int _{I\times {\mathbb {R}}^d} \left( \phi _0^2+2|\nabla _x\Phi |^2 \right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&{\mathop {\leqslant }\limits ^{(26)}} C\left( \frac{1}{m(1-e^{-\sqrt{m} T})^2}+T^2\right) \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
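In the last equality we used that, for each fixed \((t,x)\), the cross term has zero mean under \(\kappa \) and that \(\int (v\cdot a)^2\,\textrm{d}\kappa (v)=|a|^2\) for any \(a\in {\mathbb {R}}^d\); explicitly,

$$\begin{aligned} \int _{{\mathbb {R}}^d}(\phi _0-v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v) = \phi _0^2 - 2\phi _0\,\nabla _x\Phi \cdot \int _{{\mathbb {R}}^d} v\,\textrm{d}\kappa (v) + \int _{{\mathbb {R}}^d} (v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v) = \phi _0^2+|\nabla _x\Phi |^2. \end{aligned}$$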

For the second integral in (42), we again estimate by expanding explicitly in v, which is possible since the moments of v up to fourth order are explicit:

$$\begin{aligned}&\Vert \partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi +v\cdot \nabla ^2_x\Phi \cdot v-\nabla _x\Phi \cdot \nabla _x U\Vert _{L^2(\lambda \otimes \rho _{\infty })}^2 \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} (\partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi \\&\qquad +v\cdot \nabla ^2_x\Phi \cdot v-\nabla _x\Phi \cdot \nabla _x U)^2\,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}}\left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2-2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U) (v\cdot \nabla _x\phi _0)\right. \\&\qquad -2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U) (v\cdot \partial _t\nabla _x\Phi ) +(v\cdot \nabla _x \phi _0)^2+(v\cdot \partial _t \nabla _x\Phi )^2\\&\qquad +2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)v\cdot \nabla _x^2 \Phi \cdot v+2(v\cdot \partial _t \nabla _x\Phi )(v\cdot \nabla _x\phi _0) \\&\qquad +(v\cdot \nabla _x^2 \Phi \cdot v)^2-2(v\cdot \partial _{t} \nabla _x\Phi )(v\cdot \nabla _x^2\Phi \cdot v)\\&\qquad \left. -2(v\cdot \nabla _x \phi _0)(v\cdot \nabla _x^2 \Phi \cdot v) \right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2 \right. \\&\qquad +\sum _{i} v_i^2\left( (\partial _{x_i}\phi _0)^2+(\partial _t \partial _{x_i}\Phi )^2+2\partial _{x_i}\phi _0\partial _t \partial _{x_i}\Phi \right) \\&\qquad +2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)\sum _{i} v_i^2 \partial _{x_i x_i}\Phi \\&\qquad + \sum _{i} v_i^4 (\partial _{x_i x_i}\Phi )^2+2\sum _{i\ne j}v_i^2v_j^2(\partial _{x_ix_j} \Phi )^2\\&\qquad \left. +\sum _{i\ne j} v_i^2v_j^2 \partial _{x_ix_i}\Phi \partial _{x_jx_j}\Phi \right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^d} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2+ |\nabla _x\phi _0+\partial _t \nabla _x\Phi |^2\right. + 3\sum _{i} (\partial _{x_i x_i}\Phi )^2\\&\qquad +2\sum _{i\ne j}(\partial _{x_i x_j} \Phi )^2+2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)\Delta _x\Phi \left. +\sum _{i\ne j} \partial _{x_ix_i}\Phi \partial _{x_jx_j}\Phi \right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\quad \leqslant \int _{I\times {\mathbb {R}}^d} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U+\Delta _x\Phi )^2+2|\nabla _x \phi _0|^2+ 2 |\bar{\nabla } \nabla _x \Phi |^2\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\quad {\mathop {=}\limits ^{(28)}} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2+2\Vert \nabla _x \phi _0\Vert ^2_{L^2(\lambda \otimes \mu )}+2\Vert \bar{\nabla } \nabla _x \Phi \Vert ^2_{L^2(\lambda \otimes \mu )} \\&\quad {\mathop {\leqslant }\limits ^{(27)}} C\left( 1+R^2T^2+ \frac{1}{(1-e^{-\sqrt{m}T})^4}+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^4}\right) \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
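The only properties of \(\kappa \) used in the previous computation are the vanishing of its odd moments together with \(\int v_i^2\,\textrm{d}\kappa =1\), \(\int v_i^4\,\textrm{d}\kappa =3\) and, by independence of the coordinates, \(\int v_i^2v_j^2\,\textrm{d}\kappa =1\) for \(i\ne j\). A minimal symbolic sanity check of the one-dimensional moments (not part of the proof; it assumes SymPy is available):

```python
# One-dimensional Gaussian moments used above: E[v]=E[v^3]=0, E[v^2]=1, E[v^4]=3;
# independence of the coordinates then gives E[v_i^2 v_j^2] = 1 for i != j.
import sympy as sp

v = sp.symbols('v', real=True)
kappa = sp.exp(-v**2 / 2) / sp.sqrt(2 * sp.pi)  # standard Gaussian density
moments = [sp.integrate(v**k * kappa, (v, -sp.oo, sp.oo)) for k in (1, 2, 3, 4)]
assert moments == [0, 1, 0, 3]
```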

Combining the above estimates, we arrive at

$$\begin{aligned} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2&\leqslant \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )}\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )} \\&\quad + \Vert \partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi +v\cdot \nabla _x^2\Phi \cdot v\\&\quad -\nabla _x\Phi \cdot \nabla _x U\Vert _{L^2(\lambda \otimes \rho _{\infty })}\Vert f-\Pi _v f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \\&\leqslant C\left( \left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )} \right. \\&\quad +\left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \\&\quad \left. \Vert ({Id}-\Pi _v)f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}\right) . \end{aligned}$$

Finally,

$$\begin{aligned} \Vert f\Vert _{L^2(\lambda \otimes \rho _{\infty })}&\leqslant \Vert ({Id}-\Pi _v )f\Vert _{L^2(\lambda \otimes \rho _{\infty })} + \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )} \\&\leqslant C\left( \left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \right. \\&\quad \left. + \left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert ({Id}-\Pi _v) f\Vert _{L^2(\lambda \otimes \rho _{\infty })}\right) , \end{aligned}$$

as claimed.\(\quad \square \)

With Theorem 2 at hand, we are now able to prove the exponential relaxation to equilibrium claimed in Theorem 1, which essentially follows from a standard energy estimate.

Proof of Theorem 1

We first notice that the solution satisfies \(f\in H^1_{hyp}((0,T)\otimes \mu )\) for every \(T>0\). Indeed, as long as \(f_0\in L^2(\mu ;H^1_\kappa )\), we have \(f(t,\cdot ,\cdot ) \in L^2(\mu ;H^1_\kappa )\) for any \(t>0\) (see for example [54, Theorem 35]), and hence \(\partial _t f -{\mathcal {L}}_{\text {ham}}f = -\gamma \nabla _v^*\nabla _v f \in L^2(\lambda \otimes \mu ;H_\kappa ^{-1})\). We also note that (12) implies

$$\begin{aligned} \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f(t,x,v) \,\textrm{d}\rho _{\infty }(x,v)=0 \end{aligned}$$

for all \(t\in (0,T)\). This follows from

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f(t,x,v)\,\textrm{d}\rho _{\infty }(x,v)=0, \end{aligned}$$

using equation (4) and integration by parts.
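Explicitly, testing (4) against the constant function 1 gives

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f\,\textrm{d}\rho _{\infty }= \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} {\mathcal {L}}f\,\textrm{d}\rho _{\infty }= \langle 1, {\mathcal {L}}_{\text {ham}}f\rangle _{L^2(\rho _{\infty })}-\gamma \langle \nabla _v 1, \nabla _v f\rangle _{L^2(\rho _{\infty })} = 0, \end{aligned}$$

since \({\mathcal {L}}_{\text {ham}}\) is antisymmetric with respect to \(L^2(\rho _{\infty })\) and \(\nabla _v 1 = 0\).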

For every \(0<s<t\), we have the standard energy estimate (hereafter we use \(L^2((s,t)\otimes \rho _{\infty })\) to denote \(L^2(\lambda _{(s,t)}\otimes \rho _{\infty })\)):

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2-\Vert f(s,\cdot )\Vert _{L^2(\rho _{\infty })}^2 =-2\gamma \Vert \nabla _v f\Vert _{L^2((s,t)\otimes \rho _{\infty })}^2. \end{aligned}$$
(43)
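Indeed, since \(\langle f,{\mathcal {L}}_{\text {ham}}f\rangle _{L^2(\rho _{\infty })}=0\) by antisymmetry and \({\mathcal {L}}_{\text {FD}}=-\nabla _v^*\nabla _v\),

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2 = 2\langle f, {\mathcal {L}}f\rangle _{L^2(\rho _{\infty })} = 2\gamma \langle f, {\mathcal {L}}_{\text {FD}}f\rangle _{L^2(\rho _{\infty })} = -2\gamma \Vert \nabla _v f\Vert _{L^2(\rho _{\infty })}^2, \end{aligned}$$

and integrating in time over \((s,t)\) gives (43).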

In particular,

$$\begin{aligned} \text { the mapping } t\mapsto \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2 \text { is nonincreasing.} \end{aligned}$$
(44)

Since by equation (4),

$$\begin{aligned} -\gamma \nabla _v^*\nabla _v f=\partial _t f-{\mathcal {L}}_{\text {ham}}f, \end{aligned}$$

we have

$$\begin{aligned} \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((s,t)\otimes \mu ,H^{-1}_\kappa )}=\gamma \Vert \nabla _v^*\nabla _v f\Vert _{L^2((s,t)\otimes \mu ,H^{-1}_\kappa )}\leqslant \gamma \Vert \nabla _v f\Vert _{L^2((s,t)\otimes \rho _{\infty })}. \end{aligned}$$
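The last inequality follows by duality: for fixed \((t,x)\) and any \(g=g(v)\) with \(\Vert g\Vert _{H^1_\kappa }\leqslant 1\),

$$\begin{aligned} \langle \nabla _v^*\nabla _v f, g\rangle _{L^2_\kappa } = \langle \nabla _v f, \nabla _v g\rangle _{L^2_\kappa } \leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }\Vert \nabla _v g\Vert _{L^2_\kappa } \leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }, \end{aligned}$$

so that \(\Vert \nabla _v^*\nabla _v f\Vert _{H^{-1}_\kappa }\leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }\) pointwise in \((t,x)\), and integrating over \((s,t)\times {\mathbb {R}}^d\) gives the claim.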

Now fix T to be the length of the time intervals below. Denote \(b_1=C(\frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T)\) and \(b_2=C(1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2})\); then, by Theorem 2, (43) and (44), and the Gaussian Poincaré inequality

$$\begin{aligned} \Vert ({Id}-\Pi _v)f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \leqslant \Vert \nabla _v f\Vert _{L^2(\lambda \otimes \rho _{\infty })}, \end{aligned}$$

we have, for the time stamps \(t_k =kT\),

$$\begin{aligned}&\Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2 -\Vert f(t_{k-1},\cdot )\Vert _{L^2(\rho _{\infty })}^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\left( b_2\Vert \nabla _v f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}+b_1\Vert \partial _t f \right. \\&\qquad \left. -{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((t_{k-1},t_k)\otimes \mu ,H^{-1}_\kappa )}\right) ^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\left( b_2\Vert ({Id}-\Pi _v) f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}+b_1\Vert \partial _t f \right. \\&\qquad \left. -{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((t_{k-1},t_k)\otimes \mu ,H^{-1}_\kappa )}\right) ^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\Vert f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}^2 \\&\quad \leqslant -\dfrac{2\gamma T }{(b_1\gamma +b_2)^2}\Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$
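Rearranging, this reads

$$\begin{aligned} \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2 \leqslant \Vert f(t_{k-1},\cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$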

Now for any \(t>0\), we pick the integer k satisfying \(t_k\leqslant t < t_{k+1}\), so that \(\Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })} \leqslant \Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}\). Applying the above inequality iteratively and using the monotonicity (44), we obtain

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2&\leqslant \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) ^{-k} \Vert f_0\Vert _{L^2(\rho _{\infty })}^2 \\&\leqslant \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) ^{-\frac{t}{T}+1} \Vert f_0\Vert _{L^2(\rho _{\infty })}^2 \\&= \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \exp \left( -\frac{t}{T}\log \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \right) \Vert f_0\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$

The prefactor

$$\begin{aligned}1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2} \leqslant C\left( 1+ \frac{\gamma T}{\left( \frac{\gamma }{\sqrt{m}}+\gamma T+1\right) ^2}\right) \end{aligned}$$

is bounded above by a constant. Using that \(\log (1+x) \geqslant \frac{1}{C}x\) for bounded \(x\geqslant 0\) (with C a universal constant), and then picking \(T=\frac{1}{\sqrt{m}}\), we obtain exponential decay with rate

$$\begin{aligned} \nu \geqslant C\frac{\gamma }{(b_1\gamma +b_2)^2} \geqslant C \frac{\gamma m}{(\gamma +R+\sqrt{m})^2}, \end{aligned}$$

which is precisely (13). \(\quad \square \)
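For completeness, the boundedness of the prefactor can also be checked numerically; a minimal sketch (not part of the proof; it assumes NumPy is available, and the grid of values is arbitrary):

```python
# The quantity gamma*T/(gamma/sqrt(m) + gamma*T + 1)^2 depends only on a = gamma*T
# and b = gamma/sqrt(m); it is at most 1/4 because (a+b+1)^2 >= (a+1)^2 >= 4a.
import numpy as np

for a in np.logspace(-6, 6, 50):      # a = gamma * T
    for b in np.logspace(-6, 6, 50):  # b = gamma / sqrt(m)
        assert a / (a + b + 1) ** 2 <= 0.25 + 1e-12
```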