1 Introduction

We consider the convergence rate for the following underdamped Langevin dynamics \(({x}_t, {v}_t)\in {\mathbb {R}}^d\times {\mathbb {R}}^d\):

$$\begin{aligned} \left\{ \begin{aligned} \textrm{d}x_t&= {v}_t\,\textrm{d}t\\ \textrm{d}{v}_t&= -\nabla U({x}_t)\,\textrm{d}t - \gamma {v}_t\,\textrm{d}t + \sqrt{2\gamma }\,\textrm{d}{W}_t.\\ \end{aligned}\right. \end{aligned}$$
(1)

Here, U(x) is the potential energy, \(\gamma >0\) is the friction coefficient, and \({W}_t\) is a d-dimensional standard Brownian motion; the mass and temperature are set to 1 for simplicity. The law of the process (1), \(\rho (t,x,v)\), satisfies the kinetic Fokker–Planck equation

$$\begin{aligned} \partial _t \rho = - v \cdot \nabla _x \rho + \nabla _x U \cdot \nabla _v\rho + \gamma \left( \Delta _v \rho + \nabla _v \cdot (v \rho )\right) . \end{aligned}$$
(2)

It is well-known (see for example [45, Proposition 6.1]) that under mild assumptions, (2) admits a unique stationary density function given by

$$\begin{aligned} \,\textrm{d}\rho _{\infty }(x, v) = \,\textrm{d}\mu (x) \,\textrm{d}\kappa (v), \end{aligned}$$
(3)

where

$$\begin{aligned} \,\textrm{d}\mu (x) = \frac{1}{Z_U}e^{-U(x)}\,\textrm{d}x, \quad \,\textrm{d}\kappa (v) = \frac{1}{(2\pi )^{d/2}} e^{-\frac{|v|^2}{2}} \,\textrm{d}v, \quad Z_U = \int _{{\mathbb {R}}^d} e^{- U(x)}\,\textrm{d}x. \end{aligned}$$

When \(\gamma \rightarrow \infty \), the rescaled dynamics \(x^{(\gamma )}_t {:}{=} x_{\gamma t}\) converges to the Smoluchowski SDE, also known as the overdamped Langevin dynamics (see e.g., [45, Sec.  6.5]), which is given by

$$\begin{aligned} \textrm{d}x^{(\infty )}_t = -\nabla U(x^{(\infty )}_t)\,\textrm{d}t + \sqrt{2}\,\textrm{d}B_t. \end{aligned}$$

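As a purely illustrative aside (not used in the analysis below), the dynamics (1) can be simulated with a simple Euler–Maruyama discretization; in the following sketch the quadratic potential, the friction \(\gamma \), and the step size are placeholder choices.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of the underdamped Langevin SDE (1).
# Placeholder choices: U(x) = m|x|^2/2, unit mass and temperature as in the text.
rng = np.random.default_rng(0)
d, m, gamma, dt, n_steps = 2, 1.0, 1.0, 1e-3, 100_000
grad_U = lambda x: m * x  # gradient of the quadratic potential

x, v = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(n_steps):
    gx = grad_U(x)
    x = x + v * dt
    v = v - gx * dt - gamma * v * dt + np.sqrt(2 * gamma * dt) * rng.standard_normal(d)
# For large times, (x, v) is approximately distributed according to
# rho_infty = mu x kappa, i.e. x ~ N(0, Id/m) and v ~ N(0, Id) here.
```
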
An equivalent formulation of (2) is the following backward Kolmogorov equation:

$$\begin{aligned} \begin{aligned} \partial _t f&= {\mathcal {L}}f,\qquad {\mathcal {L}}= {\mathcal {L}}_{\text {ham}}+ \gamma {\mathcal {L}}_{\text {FD}}, \qquad f(0, x, v) = f_0(x,v). \end{aligned} \end{aligned}$$
(4)

Here, \({\mathcal {L}}_{\text {ham}}\) is the Hamiltonian transport operator and \({\mathcal {L}}_{\text {FD}}\) is the fluctuation-dissipation term

$$\begin{aligned} \left\{ \begin{aligned} {\mathcal {L}}_{\text {ham}}&= v \cdot \nabla _x - \nabla _x U \cdot \nabla _v \\ {\mathcal {L}}_{\text {FD}}&= \Delta _v - v \cdot \nabla _v. \end{aligned}\right. \end{aligned}$$
(5)

Indeed, (4) could be derived from (2) by considering \(\rho (t, x, v) = f(t, x, -v) \rho _{\infty }(x, v)\) [45]; since by \(L^2\)-duality, \(\left\Vert \rho - \rho _{\infty }\right\Vert _{L^2(\rho _{\infty }^{-1})} \equiv \left\Vert f - \int f\,\textrm{d}\rho _{\infty }\right\Vert _{L^2(\rho _{\infty })}\), the exponential convergence of the solution \(\rho (t,\cdot ,\cdot )\) of (2) to \(\rho _\infty \) is equivalent to the exponential decay of \(f(t, \cdot , \cdot )\) to zero, provided that \(\int f_0\,\textrm{d}\rho _{\infty }= 0\). Similarly, one could obtain the backward Kolmogorov equation for the overdamped Langevin dynamics, which is given by

$$\begin{aligned} \partial _t h = -\nabla _x U \cdot \nabla _x h + \Delta _x h, \qquad h(0, x) = h_0(x). \end{aligned}$$
(6)

If \(\mu \) satisfies a Poincaré inequality, one can show that the generator in the above equation (6) is self-adjoint and coercive with respect to \(L^2(\mu )\). As a consequence, if \(\int h_0\,\textrm{d}\mu = 0\), then h(t, x) decays to zero exponentially fast as \(t\rightarrow \infty \); see for example [6, Theorem 4.2.5].
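
For later reference (cf. Remark 1.2 (iv)), when \(\mu \) satisfies the Poincaré inequality (10) below with constant m, the decay rate for (6) can be read off from a one-line energy estimate (a standard computation, reproduced here for convenience):

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\Vert h(t,\cdot )\Vert _{L^2(\mu )}^2 = -2\int _{{\mathbb {R}}^d} |\nabla _x h|^2 \,\textrm{d}\mu \leqslant -2m\Vert h(t,\cdot )\Vert _{L^2(\mu )}^2, \qquad \text{ so that } \quad \Vert h(t,\cdot )\Vert _{L^2(\mu )}\leqslant e^{-mt}\Vert h_0\Vert _{L^2(\mu )}, \end{aligned}$$

where we used that \(\int h(t,\cdot )\,\textrm{d}\mu = \int h_0\,\textrm{d}\mu = 0\) is preserved by (6).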

Unlike the generator of (6), the generator \({\mathcal {L}}\) in (4) for the underdamped Langevin dynamics is not uniformly elliptic. As a result, proving the exponential convergence of \(\rho (t, \cdot , \cdot )\) to the equilibrium \(\rho _{\infty }\) is more challenging. Thanks to extensive work over the years, the exponential convergence of the underdamped Langevin dynamics is now better understood in various norms (see Section 1.3 below for a review).

Our goal in this work is to provide an explicit estimate of the decay rate in \(L^2\) for the semigroup in (4), based on a framework proposed in [1] which implicitly uses Hörmander’s bracket conditions [32]. In particular, under some mild assumptions on U, we obtain explicit estimates: there exist a universal constant \(C>1\) independent of \(U,\gamma ,d\) and an explicit \(\nu > 0\) such that for any \(f=f(t,x,v)\) satisfying (4) with \(\int f_0 \,\textrm{d}\rho _{\infty }= 0\), we have

$$\begin{aligned} \left\Vert f(t, \cdot , \cdot )\right\Vert _{L^2(\rho _{\infty })} \leqslant Ce^{-\nu t}\left\Vert f_0\right\Vert _{L^2(\rho _{\infty })} . \end{aligned}$$
(7)

In the rest of this section, we will first present our assumptions and main results. Next, we will briefly review existing approaches for studying the exponential convergence of (4) (or equivalently (2)) in Section 1.3, and compare our estimate of the decay rate \(\nu \) with some previous works aiming at explicit estimates [9, 16, 40, 47]. We would like to comment here that convergence results are also obtained in earlier works [17, 26], although their rates are only explicit in \(\gamma \).

1.1 Notations

Throughout the paper we assume I to be the time interval (0, T), and we use \(\,\textrm{d}\lambda (t)=\frac{1}{T}\chi _{(0,T)}(t)\,\textrm{d}t\) to denote the rescaled Lebesgue measure on I so that \(\,\textrm{d}\lambda (t)\) denotes a probability measure. For any probability measure \(\rho \), we use \(L^2(\rho )\) (and similarly \(H^1(\rho ),H^2(\rho )\)) to denote the standard Sobolev spaces, and \(H^{-1}(\rho )\) to denote the dual space of \(H^1(\rho )\). For the Gaussian probability measure \(\kappa \) in velocity space, we also use \(L^2_\kappa \), \(H^1_\kappa , \, H^{-1}_\kappa \) to denote the corresponding spaces. Moreover, we use \(H_0^1(\lambda \otimes \mu )\) to denote the \(H^1(\lambda \otimes \mu )\) functions that vanish at both time boundaries \(t=0\) and \(t=T\). By abuse of notation, we denote the canonical pairing \(\langle \cdot , \cdot \rangle _{H^{-1}(\rho ),H^{1}(\rho )}\) between \(f\in H^1(\rho )\) and \(g\in H^{-1}(\rho )\) by

$$\begin{aligned} \int fg\,\textrm{d}\rho {:}{=}\langle g, f \rangle _{H^{-1}(\rho ),H^{1}(\rho )}. \end{aligned}$$

For \(f\in H^{-1}(\rho )\), we use the notation \((f)_{\rho } {:}{=} \langle f,1\rangle _{H^{-1}(\rho ),H^1(\rho )}\). For an arbitrary Banach space V and time interval I equipped with Lebesgue measure \(\,\textrm{d}\lambda (t)\), we denote by \(L^p(\lambda \otimes \mu ;V)\) the Banach space of functions f(txv) with norm

$$\begin{aligned} \Vert f\Vert _{L^p(\lambda \otimes \mu ;V)}{:}{=}\left( \int _{I\times {\mathbb {R}}^d} \Vert f(t,x,\cdot )\Vert ^p_V \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \right) ^\frac{1}{p}. \end{aligned}$$

Inspired by [1], we define the Banach space

$$\begin{aligned} H_{hyp}^1(\lambda \otimes \mu ){:}{=}\left\{ f\in L^2(\lambda \otimes \mu ;H^1_\kappa )~:~\partial _t f-{\mathcal {L}}_{\text {ham}}f\in L^2(\lambda \otimes \mu ;H_\kappa ^{-1}) \right\} . \end{aligned}$$

We define a projection operator for \(\phi (t,x,v)\in L^2(\lambda \otimes \rho _{\infty })\) by

$$\begin{aligned} (\Pi _v \phi )(t,x) {:}{=} \int _{{\mathbb {R}}^d} \phi (t,x,v)\,\textrm{d}\kappa (v). \end{aligned}$$
(8)

Equivalently, \(\Pi _v\) is used to obtain the marginal component of \(\phi \) in \(L^2(\lambda \otimes \mu )\). By a slight abuse of notation, for \(\phi (x, v) \in L^2(\rho _{\infty })\), we also use the same notation \(\Pi _v\) to represent the similar projection, i.e., \((\Pi _v \phi )(x) {:}{=} \int _{{\mathbb {R}}^d} \phi (x, v) \,\textrm{d}\kappa (v)\). The adjoints of \(\nabla _x\) and \(\nabla _v\) in the Hilbert space \(L^2(\rho _{\infty })\) are respectively given by \(\nabla _x^* F= - \nabla _x \cdot F + \nabla _x U \cdot F\) and \(\nabla _v^* F = -\nabla _v \cdot F + v \cdot F\) for any vector field \(F(x,v): {\mathbb {R}}^{2d} \rightarrow {\mathbb {R}}^d\). Thus we can rewrite operators \({\mathcal {L}}_{\text {ham}}\) and \({\mathcal {L}}_{\text {FD}}\) as

$$\begin{aligned} {\mathcal {L}}_{\text {ham}}= \nabla _v^* \nabla _x - \nabla _x^* \nabla _v, \qquad {\mathcal {L}}_{\text {FD}}= -\nabla _v^* \nabla _v. \end{aligned}$$
(9)
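
For convenience, here is the elementary integration-by-parts computation behind the formula for \(\nabla _v^*\) (a standard calculation; the one for \(\nabla _x^*\) is identical with the weight \(e^{-U}\) in place of the Gaussian): for smooth, sufficiently decaying f and F,

$$\begin{aligned} \int \nabla _v f \cdot F \,\textrm{d}\rho _{\infty }= -\int f\, \nabla _v \cdot \Big ( F\, e^{-\frac{|v|^2}{2}}\Big ) \dfrac{\textrm{d}v}{(2\pi )^{d/2}}\,\textrm{d}\mu (x) = \int f \left( -\nabla _v \cdot F + v \cdot F\right) \textrm{d}\rho _{\infty }, \end{aligned}$$

so that \({\mathcal {L}}_{\text {FD}}= \Delta _v - v\cdot \nabla _v = -\nabla _v^*\nabla _v\), consistent with (9).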

For time-augmented state space \(I\times {\mathbb {R}}^d\) equipped with measure \(\lambda \otimes \mu \), we use the convention \(\partial _{x_0}{:}{=}\partial _t\), the short-hand notation \(\bar{\nabla }{:}{=}(\partial _t,\nabla _x)^\top \), and the notation \({\mathscr {L}}{:}{=}-\partial _{tt}+\nabla _x^*\nabla _x\) to denote the “Laplace” operator on \(L^2(\lambda \otimes \mu )\). We use C to denote a universal constant independent of all parameters that may change from line to line.

1.2 Assumptions and Main Results

Assumption 1

(Poincaré inequality for \(\mu \)) Assume that the probability measure \(\mu \propto e^{-U}\) satisfies a Poincaré inequality in space with some constant \(m>0\):

$$\begin{aligned} \int _{{\mathbb {R}}^d} \left( f-\int _{{\mathbb {R}}^d} f \,\textrm{d}\mu \right) ^2\textrm{d}\mu \leqslant \dfrac{1}{m}\int _{{\mathbb {R}}^d} |\nabla _x f|^2 \,\textrm{d}\mu , \qquad \forall f\in H^1(\mu ). \end{aligned}$$
(10)
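
A standard example to keep in mind (included for orientation, not needed in the proofs): for the quadratic potential \(U(x)=\frac{m}{2}|x|^2\), \(\mu \) is the Gaussian \({\mathcal {N}}(0,\frac{1}{m}\textrm{Id})\) and (10) holds with exactly this constant by the Gaussian Poincaré inequality; linear functions saturate it, since for \(f(x)=a\cdot x\),

$$\begin{aligned} \int _{{\mathbb {R}}^d} \left( f-\int _{{\mathbb {R}}^d} f \,\textrm{d}\mu \right) ^2\textrm{d}\mu = \dfrac{|a|^2}{m} = \dfrac{1}{m}\int _{{\mathbb {R}}^d} |\nabla _x f|^2 \,\textrm{d}\mu . \end{aligned}$$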

Assumption 2

The potential \(U\in C^2({\mathbb {R}}^d)\), and there exist constants \(M\geqslant 1\) and \(\delta \in (0,1)\) such that

$$\begin{aligned} |\nabla _x^2 U(x)|^2= & {} \sum _{i,j=1}^d |\partial _{x_ix_j} U(x)|^2\leqslant M^2(d + |\nabla _x U(x)|^2), \nonumber \\{} & {} \quad \text{ and } \Delta _x U(x) \leqslant Md + \frac{\delta }{2}|\nabla _x U(x)|^2 \qquad \forall \ x\in {\mathbb {R}}^d. \end{aligned}$$
(11)


Assumption 3

The embedding \(H^1(\mu )\hookrightarrow L^2(\mu )\) is compact.

Remark 1.1

  1. (i)

    Assumption 1 guarantees that the elliptic equation \(\nabla _x^*\nabla _x u = h\) has a unique solution \(u\in H^2(\mu )\) for any \(h\in L^2(\mu )\) satisfying \((h)_\mu =0\) (see for example [19, Proposition 5]). Hence, together with Assumption 3, we derive from Fredholm alternative that \(L^2(\mu )\) has an orthonormal basis \(\{1\}\cup \{w_\alpha \}_{\alpha >0}\) where \(w_\alpha \in H^2(\mu )\) are eigenfunctions of \(\nabla _x^*\nabla _x\) with eigenvalue \(\alpha ^2\) for a discrete set of \(\alpha >0\) (see [22, Chapter 6] for an argument with bounded domains):

    $$\begin{aligned} \nabla ^*_x\nabla _x w_\alpha =\alpha ^2 w_\alpha . \end{aligned}$$

    Further, by Assumption 1, any nonzero eigenvalue \(\alpha ^2\) of \(\nabla ^*_x \nabla _x\) satisfies \(\alpha \ge \sqrt{m}\); in fact, when m is the optimal constant in (10), the smallest such \(\alpha \) is precisely \(\sqrt{m}\). The spectrum of \(\nabla _x^*\nabla _x\) is unbounded from above.

  2. (ii)

    Assumption 3 is satisfied when

    $$\begin{aligned} \lim _{|x|\rightarrow \infty } \dfrac{U(x)}{|x|^\beta }=\infty \end{aligned}$$

    for some \(\beta >1\) (see [31] for a proof). We would like to comment here that we require Assumption 3 only for technical purposes, more precisely in the proof of Lemma 2.6, where we use the spectral decomposition of the elliptic operator \(\nabla _x^*\nabla _x\) to construct the desired test functions. We believe that the assumption is not necessary for our main results to hold. We leave this for future research.

  3. (iii)

    Similar versions of Assumption 2 are commonly used in the literature, see e.g., the books [45, 54] and the papers [18, 19], and are satisfied when U grows at most exponentially fast as \(|x|\rightarrow \infty \). Here we adopt the dimension scaling of [10, Assumption 1] (in particular, we take \(c_1=c_3=M\) in their setting), since in the case of a separable potential \(U(x) = \sum _{i=1}^d u(x_i)\), this amounts to the more natural one-dimensional estimate \(|u''|^2 \leqslant M(1+|u'|^2)\).

Theorem 1

Under Assumptions 1, 2, and 3, there exist a constant \(\nu > 0\) and universal constants C, c independent of all parameters such that, for every f(t, x, v) satisfying the backward Kolmogorov equation (4) with initial condition \(f_0 \in L^2(\mu ;H^1_\kappa )\) and

$$\begin{aligned} (f_0)_{\rho _{\infty }}=0, \end{aligned}$$
(12)

we have, for every \(t\in (0,\infty )\),

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })} \leqslant C\exp (-\nu t)\Vert f_0\Vert _{L^2(\rho _{\infty })}. \end{aligned}$$

Moreover, \(\nu \) can be made explicit as

$$\begin{aligned} \nu = \dfrac{m\gamma }{c(\sqrt{m}+R+\gamma )^2} \end{aligned}$$
(13)

with some constant \(R>0\) given by

  1. (i)

    If U is convex, then

    $$\begin{aligned} R=0. \end{aligned}$$
  2. (ii)

    If the Hessian of U is bounded from below

    $$\begin{aligned} \nabla _x^2 U(x) \geqslant -K\, \textrm{Id}, \qquad \forall \, x \in {\mathbb {R}}^d \end{aligned}$$
    (14)

    for some constant \(K \geqslant 0\), then

    $$\begin{aligned} R=\sqrt{K}. \end{aligned}$$

    Note that if \(K = 0\), we recover the estimate in case (i).

  3. (iii)

    In the most general case without further assumptions,

    $$\begin{aligned} R=M+M^\frac{3}{4}d^\frac{1}{4}. \end{aligned}$$

Remark 1.2

  1. (i)

    If we fix \(m=O(1)\), then, when \(\gamma \rightarrow 0\) (resp. \(\gamma \rightarrow \infty \)), our result gives a decay rate estimate of \(O(\gamma )\) (resp. \(O(\gamma ^{-1})\)). This is consistent with [17, 26, 47] and also with the isotropic Gaussian case \(U(x)=\frac{m}{2}|x|^2\) (see Appendix A).

  2. (ii)

    In the convex case, if we optimize with respect to \(\gamma \) by choosing \(\gamma =\sqrt{m}\), then

    $$\begin{aligned} \nu =\frac{\sqrt{m}}{4c}. \end{aligned}$$

    As is shown in Appendix A, the scaling in m is optimal in the regime \(m\rightarrow 0\), as it is the rate even for the isotropic quadratic potential. A small numerical sketch of this optimization in \(\gamma \) is included after this remark. We refer the reader to Appendix B for the corresponding results from the DMS method, with a slightly more explicit estimate compared to [47].

  3. (iii)

    In the case where condition (14) is satisfied, e.g. for the double-well potential \(U(x)=(|x|^2-1)^2\) with \(K=4\), our scaling in K is consistent with [36, Theorem 1] and [37, Sec. 5]. A similar assumption is also used in [44, Theorem 1] for functional inequalities.

  4. (iv)

    It is well known that for the overdamped Langevin dynamics (6), the decay rate in \(L^2(\mu )\) is simply m. By part (ii) of this remark, when \(m \ll 1\), the underdamped Langevin dynamics (1) can converge to its equilibrium \(\rho _{\infty }\) at a rate \(O(\sqrt{m})\) for convex potentials (after optimizing in \(\gamma \)), which is much faster than the overdamped Langevin dynamics.

  5. (v)

    Due to the relation (see e.g., [48])

    $$\begin{aligned}&\frac{1}{\sqrt{2}}\left\Vert \rho - \rho _{\infty }\right\Vert _{\text {TV}} \leqslant \sqrt{\textrm{KL}\left( \rho \,\Vert \,\rho _{\infty }\right) } \leqslant \sqrt{\chi ^2(\rho , \rho _{\infty })} \\&\quad \equiv \left\Vert \rho - \rho _{\infty }\right\Vert _{L^2(\rho _{\infty }^{-1})} \equiv \left\Vert f - \int f\,\textrm{d}\rho _{\infty }\right\Vert _{L^2(\rho _{\infty })}, \end{aligned}$$

    where \(f = \,\textrm{d}\rho / \,\textrm{d}\rho _{\infty }\), and the Talagrand inequality [44] \(W_2(\rho ,\rho _{\infty }) \leqslant \sqrt{\frac{2}{C_{LSI}}\textrm{KL}(\rho \Vert \rho _{\infty })}\) where \(C_{LSI}\) is the logarithmic Sobolev constant, Theorem 1 implies that \(\rho (t, \cdot , \cdot )\) converges to \(\rho _{\infty }\) with rate \(2\nu \) in both \(\chi ^2\)-divergence and relative entropy, and with rate \(\nu \) in total variation and (if \(\mu \) satisfies log-Sobolev inequality) 2-Wasserstein distance. On the other hand, our result does not imply

    $$\begin{aligned}d(\rho _t,\rho _{\infty }) \leqslant C\exp (-\nu t) d(\rho _0,\rho _{\infty }) \end{aligned}$$

    where \(d(\rho ,\pi ) = TV(\rho ,\pi ), \, W_2(\rho ,\pi )\) or \(\textrm{KL}(\rho \Vert \pi )\). It is interesting to study if one could establish the same convergence rate with Wasserstein distance (which is the same as asking if one could establish a coupling argument for our result) or relative entropy.

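As promised in part (ii) of the remark above, here is a small numerical sketch (our own illustration; the values of m, R and the unspecified universal constant c are placeholders) confirming that the rate formula (13) is maximized over \(\gamma \) at \(\gamma =\sqrt{m}+R\), which reduces to \(\gamma =\sqrt{m}\) and \(\nu =\frac{\sqrt{m}}{4c}\) in the convex case \(R=0\).

```python
import numpy as np

# Illustrative check of the rate formula (13): nu(gamma) = m*gamma / (c*(sqrt(m)+R+gamma)^2).
# m, R and the universal constant c are placeholder values, not taken from the paper.
m, R, c = 1e-2, 0.0, 1.0  # convex case (R = 0)

def nu(gamma):
    return m * gamma / (c * (np.sqrt(m) + R + gamma) ** 2)

gammas = np.linspace(1e-4, 1.0, 200_000)
g_star = gammas[np.argmax(nu(gammas))]

print(g_star, np.sqrt(m) + R)            # the maximizer is (numerically) gamma = sqrt(m) + R
print(nu(g_star), np.sqrt(m) / (4 * c))  # and the optimal rate matches sqrt(m)/(4c) when R = 0
```

Analytically, \(\frac{\textrm{d}}{\textrm{d}\gamma }\frac{\gamma }{(\sqrt{m}+R+\gamma )^2}\) vanishes exactly at \(\gamma =\sqrt{m}+R\), giving \(\nu =\frac{m}{4c(\sqrt{m}+R)}\).
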
Our decay estimate is based on the following Poincaré-type inequality in time-augmented space:

Theorem 2

Under Assumptions 1, 2, and 3, there exist a universal constant C independent of all parameters, and a constant \(R<\infty \) (the same constant as in Theorem 1) such that for every \(f\in H_{hyp}^1(\lambda \otimes \mu )\), we have

$$\begin{aligned}{} & {} \Vert f-(f)_{\lambda \otimes \rho _{\infty }}\Vert _{L^2(\lambda \otimes \rho _{\infty })} \leqslant C\left( \left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}\right. \right. \nonumber \\{} & {} \quad \left. + \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert (\mathcal {I}-\Pi _v) f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \nonumber \\{} & {} \left. \quad +\left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \right) . \end{aligned}$$
(15)

Let us briefly describe the strategy of the proof, which is strongly motivated by the work of Armstrong and Mourrat [1]. A naive energy estimate combined with the Gaussian Poincaré inequality yields

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t} \Vert f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2 =-2\gamma \Vert \nabla _v f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2\leqslant -2\gamma \Vert (\mathcal {I}-\Pi _v) f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$

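For completeness, the identity and inequality in the display above follow from (4), (9), and the Gaussian Poincaré inequality for \(\kappa \) (a standard computation, spelled out here): since \({\mathcal {L}}_{\text {ham}}= \nabla _v^* \nabla _x - \nabla _x^* \nabla _v\) is antisymmetric in \(L^2(\rho _{\infty })\) and \({\mathcal {L}}_{\text {FD}}= -\nabla _v^* \nabla _v\),

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t} \Vert f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2 = 2\int f \,{\mathcal {L}}f \,\textrm{d}\rho _{\infty }= 2\int f \,{\mathcal {L}}_{\text {ham}}f \,\textrm{d}\rho _{\infty }-2\gamma \int |\nabla _v f|^2 \,\textrm{d}\rho _{\infty }= -2\gamma \Vert \nabla _v f(t, \cdot )\Vert _{L^2(\rho _{\infty })}^2, \end{aligned}$$

while the Gaussian Poincaré inequality \(\Vert \phi -\Pi _v \phi \Vert _{L^2(\kappa )}\leqslant \Vert \nabla _v \phi \Vert _{L^2(\kappa )}\), applied in the v variable and integrated in x against \(\mu \), yields the final inequality.
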
While the above establishes the \(L^2\) energy decay, it does not directly yield an exponential decay rate. In particular, the dissipation is only present in the velocity variable. However, instead of looking at a single time slice, we should look at time intervals: after time propagation, the dissipation in v together with the transport terms in x leads to dissipation in x. Moreover, in the analysis we essentially treat the time variable t as another space variable alongside x. With the help of a Poincaré-type inequality in the time-augmented state space established in Theorem 2, we can prove exponential convergence still using the standard energy estimate, in line with the moral “hypocoercivity is simply coercivity with respect to the correct norm”, quoted from [1, Page 4].

To prove Theorem 2, as a reader familiar with [19] might anticipate, elliptic regularity in the x variable plays an important role in the estimates; in Lemma 2.4 we give a mild generalization of it to the time-augmented space \(L^2(\lambda \otimes \mu )\). However, when applying integration by parts in the proof of Theorem 2, we need test functions that vanish at both time boundaries \(t=0\) and \(t=T\), which is not necessarily satisfied by the derivatives of the solution to the elliptic equation (22). This is why we resort to Lemma 2.6 (also an extension of Bogovskii’s operator [11] to \((I\times {\mathbb {R}}^d,\lambda \otimes \mu )\)) for the solution of the divergence equation (25), which is a cornerstone of this proof. In particular, even for convex U, the constants in (15) blow up as \(T\rightarrow 0\), which can be traced back to the estimate of \(\psi _{2,\alpha }'\) in (35), and thus prevents us from working on single time slices.

1.3 A Literature Review and Comparison

The kinetic Fokker–Planck equation was first studied by Kolmogorov [34] and was the main motivation for Hörmander’s theory of hypoelliptic equations [32], which gave an almost complete classification of second-order hypoelliptic operators. The earliest results regarding its exponential convergence were established in [52] for potentials with bounded Hessian, and were later generalized in [41, 51, 55]. There is a substantial amount of work in the literature on the exponential convergence of the underdamped Langevin dynamics. Below, we categorize these works based on the norms and approaches used to characterize the convergence.

  1. (i)

    (Convergence in \(H^1(\rho _{\infty })\) norm). The exponential convergence of the kinetic Fokker–Planck equation in \(H^1(\rho _{\infty })\) was proved by Villani in [54, Theorem 35], which was inspired by the early works [27, 29]. See also [53] for a brief overview of the main ideas. The earlier work [43] proved similar results on the torus without a forcing term. Since the \(L^2(\rho _{\infty })\) norm is controlled by the \(H^1(\rho _{\infty })\) norm, this result automatically implies the convergence of (4) in \(L^2(\rho _{\infty })\). However, the decay rate therein is quite implicit; see [54, Sec. 7.2]. This approach is extended in [9] to possibly singular potentials, with convergence rates given in certain cases.

  2. (ii)

    (Convergence in a modified \(L^2(\rho _{\infty })\) norm). A more direct approach for convergence in \(L^2(\rho _{\infty })\) was developed by Dolbeault, Mouhot and Schmeiser in [18, 19]; see also earlier ideas in [28]. They identified a modified \(L^2(\rho _{\infty })\) norm, denoted by \(\textsf{E}\), such that \(\textsf{E}(\rho (t, x, v)) \rightarrow 0\) exponentially fast for \(\rho (t, \cdot , \cdot )\) evolving according to (2). This hypocoercivity method was revisited and adapted in [17, 26, 47] to deal with the backward Kolmogorov equation (4), i.e., to show that \(\textsf{E}(f(t, \cdot , \cdot ))\) decays to zero exponentially fast. In Appendix B.1, we will briefly revisit how to choose the Lyapunov function \(\textsf{E}\), based on [16, Sec. 2], because their setup is consistent with our \(L^2(\rho _{\infty })\) estimate in Section 1.1 above. We would like to remark that while [47] obtains a rate whose scalings in d and \(\gamma \) are known, it is difficult to determine the optimal \(\gamma \) for their convergence rate estimates. As a remark, the DMS method [18, 19] has been extended or adapted to study the convergence of the spherical velocity Langevin equation [25], non-equilibrium Langevin dynamics [33], Langevin dynamics with general kinetic energy [49], temperature-accelerated molecular dynamics [50], adaptive Langevin dynamics [38], dynamics with Boltzmann-type dissipation [2], dynamics with singular potentials [12], just to name a few. It might be interesting to study whether the variational framework of [1], on which our work is based, can be extended to these cases.

  3. (iii)

    (Convergence in Wasserstein distance). Baudoin discussed a general framework extending the Bakry–Émery methodology [5] to hypoelliptic and hypocoercive operators, based on which the exponential convergence of the kinetic Fokker–Planck equation (quantified by a Wasserstein distance associated with a special metric) was proved under certain assumptions on the potential U(x) [7, Theorem 2.6]; see also [8]. A different approach is the coupling method for the underdamped Langevin dynamics (1). In [16, Sec. 2], for strongly convex potentials U, Dalalyan and Riou-Durand considered the mixing of the marginal distribution in the x coordinate, by a synchronous coupling argument; an estimate of the convergence rate was also explicitly provided, quantified by the \(W_2\) distance [16, Theorem 1]. For more general potentials, Eberle, Guillin and Zimmer developed a hybrid coupling method, composed of synchronous and reflection couplings, to study the exponential convergence of probability distributions for the underdamped Langevin dynamics (1), quantified by a Kantorovich semi-metric [20]. Unfortunately, their rates are dimension-dependent in general.

  4. (iv)

    (Convergence in relative entropy) Villani [54] obtained exponential convergence of the kinetic Fokker–Planck equation for potentials with bounded Hessian, which was extended in [8]. A more quantitative convergence rate is obtained in [40]. All of these works essentially use Gamma calculus with a twisted metric so that derivatives in the x direction can be introduced. In [13], exponential convergence of the entropy is established for potentials that may not have bounded Hessians but satisfy a stronger weighted log-Sobolev inequality.

There are other approaches to studying the long-time behavior of the underdamped Langevin dynamics, e.g., Lyapunov function methods [4, 41, 51, 55] and spectral analysis [21, 35]. There are also works that extend the aforementioned approaches to dynamics with singular potentials [9, 12, 14, 15, 30, 39]. We will not go into details here.

While our work is not the first one that studies the exponential convergence of underdamped Langevin dynamics, our estimates are more quantitative, and in certain cases, sharper than any existing result. In particular, for a large class of convex potentials, we establish an \(O(\sqrt{m})\) convergence rate after optimizing in \(\gamma \), which is independent of dimension and only assumes a mild upper bound (Assumption 2) on the derivatives of the potential. To the best of our knowledge, this optimal \(O(\sqrt{m})\) convergence rate is new in the literature.

Table 1 summarizes the previous results [9, 16, 40] under the assumption \(m \textrm{Id} \leqslant \nabla _x^2 U \leqslant L \textrm{Id}\) (which guarantees Assumptions 1–3) in the most interesting regime \(m\ll 1 \ll L\), with the optimal choice of \(\gamma \). To elaborate on the comparison with the result of [40]: after a rescaling, they proved exponential convergence of (4) with friction parameter (in their notation) \(\gamma \sqrt{\xi }\) and convergence rate \(O(\frac{\lambda }{\sqrt{\xi }})\), under constraints that require (see [40, Proof of Lemma 8])

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\xi }{2L}-\left( \frac{1}{4L}+\frac{1}{2m}\right) \lambda>0 \\ {}&\gamma \left( \frac{4\xi }{L}+1\right) -\left( \frac{1}{2m}+\frac{2}{L}\right) \lambda >0 \\ {}&\frac{1}{2}- \frac{\xi }{2L}+\left( \frac{1}{4L}+\frac{1}{2m}\right) \lambda -\gamma \left( \frac{4\xi }{L}+1\right) +\left( \frac{1}{2m}+\frac{2}{L}\right) \lambda \leqslant 0. \end{aligned} \right. \end{aligned}$$

Combined, these yield \(\xi \geqslant O(L)\) and \(\lambda \leqslant O(m)\), which means their convergence rate cannot exceed \(O(\frac{m}{\sqrt{L}})\). Moreover, they require \(\gamma \geqslant O(1)\), that is, their friction parameter must be at least \(O(\sqrt{L})\).

Table 1 Summary of the convergence rate \(\nu \) depending on dmL under the assumption \(m{\text {Id}}\leqslant \nabla _x^2 U \leqslant L {\text {Id}}\) for the regime \(m\ll 1 \ll L\)

We also comment that in the case where \(\Vert \nabla _x^2 U\Vert \leqslant L\), but U is not necessarily convex, our convergence rate is \(\nu = O(\frac{m}{\sqrt{L}})\) after optimizing in \(\gamma \) by choosing \(\gamma \sim \sqrt{L}\), which matches the results of existing works [9, 40].

2 Proofs

In this section, we present the statements and proofs of auxiliary lemmas, followed by the proofs of the two main theorems. Lemmas 2.1, 2.2, and 2.3 are technical lemmas that prepare us for the elliptic regularity result in Lemma 2.4. The divergence lemma, which builds on the elliptic regularity result, is stated and proved in Lemma 2.6. The proof of Theorem 2 is then possible with the test functions obtained from Lemma 2.6. Finally, we present the proof of Theorem 1, which follows from Theorem 2 and an energy estimate.

We start with the Poincaré inequality on the tensorized space \((I\times {\mathbb {R}}^d, \lambda \otimes \mu )\), which allows elliptic regularity to hold in the time-augmented state space. The proof is standard and is thus omitted.

Lemma 2.1

(Poincaré Inequality) For \(f\in H^1(\lambda \otimes \mu )\),

$$\begin{aligned} \Vert f-(f)_{\lambda \otimes \mu }\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2}\right\} \left( \Vert \partial _t f\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert \nabla _x f\Vert _{L^2(\lambda \otimes \mu )}^2\right) .\nonumber \\ \end{aligned}$$
(16)

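Although the proof is omitted, let us record a sketch of the standard tensorization argument behind (16) (our summary, for the reader's convenience). By the law of total variance,

$$\begin{aligned} \Vert f-(f)_{\lambda \otimes \mu }\Vert _{L^2(\lambda \otimes \mu )}^2 = \int _{I} \Big \Vert f(t,\cdot )-\int _{{\mathbb {R}}^d} f(t,\cdot )\,\textrm{d}\mu \Big \Vert _{L^2(\mu )}^2 \,\textrm{d}\lambda (t) + \Big \Vert \int _{{\mathbb {R}}^d} f(\cdot ,x)\,\textrm{d}\mu (x)-(f)_{\lambda \otimes \mu }\Big \Vert _{L^2(\lambda )}^2, \end{aligned}$$

where the first term is bounded by \(\frac{1}{m}\Vert \nabla _x f\Vert _{L^2(\lambda \otimes \mu )}^2\) by Assumption 1, and the second by \(\frac{T^2}{\pi ^2}\Vert \partial _t f\Vert _{L^2(\lambda \otimes \mu )}^2\) by the Poincaré (Wirtinger) inequality on (0, T) combined with Jensen's inequality; taking the maximum of the two constants gives (16).
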
The next lemma is also technical; its goal is to show that, under Assumption 2, \(|\nabla ^2 U|\) defines a bounded multiplication operator \(H^1(\lambda \otimes \mu )\rightarrow L^2(\lambda \otimes \mu )\), which allows us to upgrade the regularity of the solution u of (22) to \(u\in H^2(\lambda \otimes \mu )\) in the proof of Lemma 2.4.

Lemma 2.2

[54, Lemma A.24] For any \(\phi \in H^1(\lambda \otimes \mu )\), we have

$$\begin{aligned} \Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant 16\Vert \nabla _x \phi \Vert _{L^2(\lambda \otimes \mu )}^2+4Md\Vert \phi \Vert _{L^2(\lambda \otimes \mu )}^2, \end{aligned}$$
(17)

where M is the constant in (11).

Proof

$$\begin{aligned} \Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )}^2&= \int _{I\times {\mathbb {R}}^d} \phi ^2 \nabla _x U\cdot \nabla _x U \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^d} \nabla _x \cdot (\phi ^2 \nabla _x U)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= 2\int _{I\times {\mathbb {R}}^d} \phi \nabla _x \phi \cdot \nabla _x U\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) +\int _{I\times {\mathbb {R}}^d} \phi ^2 \Delta _x U \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&{\mathop {\leqslant }\limits ^{(11)}} \dfrac{1}{4} \Vert \phi \nabla _x U \Vert _{L^2(\lambda \otimes \mu )}^2 + 4 \Vert \nabla _x \phi \Vert _{L^2(\lambda \otimes \mu )}^2\\ {}&\qquad + Md \Vert \phi \Vert _{L^2(\lambda \otimes \mu )}^2 +\frac{\delta }{2}\int _{I\times {\mathbb {R}}^d} \phi ^2 |\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

We thus finish the proof of (17) after rearranging and using \(\delta <1\). \(\quad \square \)

Next we present a technical lemma that prepares us for the (mixed space-time) \(H^2\) estimates of u, the solution of the elliptic equation (22). This is a generalization of a similar \(L^2\)–\(H^2\) regularity estimate in [19, Proposition 5], where only the spatial variable is considered, but our estimates are algebraically simpler thanks to Bochner’s formula. Let us remark that we adopt the same scaling of parameters as [10, Lemma 3.6], especially in the most general case (iii).

Lemma 2.3

For any \(u\in H^2(\lambda \otimes \mu )\) such that \(\bar{\nabla } u\in H_0^1(\lambda \otimes \mu )^{d+1}\),

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2=\sum _{i,j=0}^d\Vert \partial _{x_i}\partial _{x_j} u\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant C\left( \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2+R^2\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) , \end{aligned}$$
(18)

Similarly,

$$\begin{aligned} \Vert \nabla _x^2 u\Vert _{L^2(\lambda \otimes \mu )}^2\leqslant C\left( \Vert \nabla _x^*\nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2+R^2\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) . \end{aligned}$$
(19)

Here C is a universal constant whose precise value can be traced in the proof under the different assumptions of Theorem 1, and R is as defined in Theorem 1.

Proof

We only prove (18) since the proof of (19) follows from a similar argument. The starting point of the proof is Bochner’s formula

$$\begin{aligned} \sum _{i,j=0}^d |\partial _{x_i, x_j} u|^2=\bar{\nabla }u\cdot \bar{\nabla }{\mathscr {L}}u-(\nabla _xu)^{\top }\nabla _x^2U\nabla _xu -{\mathscr {L}}\dfrac{|\bar{\nabla } u|^2}{2}. \end{aligned}$$

Integrating over \(\lambda \otimes \mu \) and noticing that the last term above has integral zero, we get

$$\begin{aligned} \sum _{i,j=0}^{d}\Vert \partial _{x_i, x_j} u\Vert _{L^2(\lambda \otimes \mu )}^2 =\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2-\int _{I\times {\mathbb {R}}^d} (\nabla _xu)^{\top } \nabla _x^2U \nabla _xu \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x).\nonumber \\ \end{aligned}$$
(20)

Since (14) gives \(-(\nabla _xu)^{\top }\nabla _x^2U\nabla _xu\leqslant K|\nabla _x u|^2\), this already verifies the conclusion in cases (i) (setting \(K=0\)) and (ii) with \(C=1\).

Now we deal with the more general case, without assuming (14). Using (17) with \(\phi =\partial _{x_i} u,\ i=1,\cdots ,d\),

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&\quad = \sum _{i=1}^d \int _{I\times {\mathbb {R}}^d} (\partial _{x_i} u)^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{(17)}} 16 \Vert D_x^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 +4Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {=}\limits ^{(20)}} 16\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +4Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad -16\int _{I\times {\mathbb {R}}^d} (\nabla _xu)^{\top } \nabla _x^2U \nabla _xu \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{(11)}} 16\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2+4 Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad +16M\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\quad {\mathop {\leqslant }\limits ^{d\geqslant 1}} 16 \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +20Md \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad +128M^2 \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) +\dfrac{1}{2}\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

Rearranging the terms, we arrive at

$$\begin{aligned}{} & {} \int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2|\nabla _x U|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \leqslant 32\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 \nonumber \\{} & {} \quad + (40Md+256M^2)\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$
(21)

Therefore, by (21) and the triangle inequality,

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2&{\mathop {\leqslant }\limits ^{(11),(20)}} \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\int _{I\times {\mathbb {R}}^d} |\nabla _x u|^2(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\leqslant \; \; \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 \\&\quad +M\sqrt{d}\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\Vert |\nabla _x u||\nabla _x U|\Vert _{L^2(\lambda \otimes \mu )}\\&{\mathop {\leqslant }\limits ^{(21)}} \Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +M\sqrt{d}\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\\&\quad +M \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\left( 6\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}+(16M+\sqrt{40Md})\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}\right) \\&\leqslant 4\Vert {\mathscr {L}}u\Vert _{L^2(\lambda \otimes \mu )}^2 +(19M^2+M\sqrt{40Md})\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$

\(\square \)

One of the key lemmas of our proof is the following result on elliptic regularity on the space \((I\times {\mathbb {R}}^d, \lambda \otimes \mu )\) (the solution to such elliptic equation will play an important role in the proof of Lemma 2.6):

Lemma 2.4

Consider the following elliptic equation:

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}u=h&\text{ in }&\ I\times {\mathbb {R}}^d,\\ {}&\partial _t u(t=0, \cdot )=\partial _t u(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d. \end{aligned}\right. \end{aligned}$$
(22)

Assume \(h\in H^{-1}(\lambda \otimes \mu )\), and \((h)_{\lambda \otimes \mu }=0\). Define the function space

$$\begin{aligned} V=\left\{ u\in H^1(\lambda \otimes \mu )~:~ (u)_{\lambda \otimes \mu }=0 \right\} . \end{aligned}$$

Then

  1. (i)

    There exists a unique \(u\in V\) which is a weak solution to (22). More precisely, for any \(v\in H^1(\lambda \otimes \mu )\), we have

    $$\begin{aligned} \int _{I\times {\mathbb {R}}^d} (\partial _t u\partial _t v +\nabla _x u \cdot \nabla _x v)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) = \int _{I\times {\mathbb {R}}^d} hv \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

    Moreover, when \(h\in L^2(\lambda \otimes \mu )\), we have the estimate

    $$\begin{aligned} \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 \leqslant \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2}\right\} \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
    (23)
  2. (ii)

    If \(h\in L^2(\lambda \otimes \mu )\), then the solution u to (22) satisfies \(u\in H^2(\lambda \otimes \mu )\).

Remark 2.5

One could in fact estimate \(\Vert u\Vert _{H^1(\lambda \otimes \mu )}\) using only \(\Vert h\Vert _{H^{-1}(\lambda \otimes \mu )}\), but with a slightly worsened constant \(\max \{\frac{1}{m},\frac{T^2}{\pi ^2},1\}\) on the rhs. Since in our applications we only use \(\Vert h\Vert _{L^2(\lambda \otimes \mu )}\), we opt for the current version of (23) for simplicity.

Proof

(i) V is a Hilbert space and contains non-zero elements (any function that is constant in t and belongs to \(H^1(\mu )\) with mean zero in x is included in V). Moreover, V is a subspace of \(H^1(\lambda \otimes \mu )\), and for the rest of the paper we equip it with the \(H^1(\lambda \otimes \mu )\) norm. We also define the following bilinear form:

$$\begin{aligned} B(u,v){:}{=}\int _{I\times {\mathbb {R}}^d} (\partial _t u\partial _t v +\nabla _x u \cdot \nabla _x v)\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x). \end{aligned}$$

One can easily verify that \(B(\cdot ,\cdot )\) is an inner product on V. Indeed, if \(B(u,u)=0\) then \(\partial _t u=\nabla _x u=0\), forcing u to be a constant, which has to be 0 since \((u)_{\lambda \otimes \mu }=0\). If u is a weak solution of (22), then \(B(u,v)=\int _{I\times {\mathbb {R}}^d} hv\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \) for any \(v\in H^1(\lambda \otimes \mu )\), and taking \(v=1\) shows that necessarily \((h)_{\lambda \otimes \mu }=0\).

Since \((u)_{\lambda \otimes \mu }=0\), by the Poincaré inequality (Lemma 2.1) we can show that B is coercive with respect to the \(H^1(\lambda \otimes \mu )\) norm, in the sense that

$$\begin{aligned} B[u,u]&= \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2+ \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 \\ {}&\geqslant \dfrac{1}{C}(\Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2+\Vert u\Vert _{L^2(\lambda \otimes \mu )}^2) \\ {}&= \dfrac{1}{C}\Vert u\Vert _{H^1(\lambda \otimes \mu )}^2. \end{aligned}$$

We can also show that B is bounded: since B is an inner product with \(B[u,u]\leqslant \Vert u\Vert _{H^1(\lambda \otimes \mu )}^2\), the Cauchy–Schwarz inequality gives \(|B(u,v)|\leqslant \Vert u\Vert _{H^1(\lambda \otimes \mu )}\Vert v\Vert _{H^1(\lambda \otimes \mu )}\). Define a linear functional on V: \(H(v){:}{=}\int _{I\times {\mathbb {R}}^d} hv \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\). One can verify the boundedness of H:

$$\begin{aligned} |H(v)|\leqslant \Vert h\Vert _{H^{-1}(\lambda \otimes \mu )}\Vert v\Vert _{H^1(\lambda \otimes \mu )}. \end{aligned}$$

Thus by Lax–Milgram’s Theorem, the equation (22) has a unique weak solution \(u\in V\). Moreover,

$$\begin{aligned}&( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 )^2= B[u,u]^2 \\ {}&\quad = \left( \int _{I\times {\mathbb {R}}^d} hu\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\right) ^2 \leqslant \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2\Vert u\Vert _{L^2(\lambda \otimes \mu )}^2 \\ {}&\quad {\mathop {\leqslant }\limits ^{(16)}} \max \left\{ \frac{1}{m},\frac{T^2}{\pi ^2} \right\} \Vert h\Vert _{L^2(\lambda \otimes \mu )}^2 \left( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2\right) , \end{aligned}$$

and the desired estimate follows.

(ii) For each \(i=1,2,\cdots ,d\), consider the elliptic equation

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}w_i=\partial _{x_i}h-\nabla _x u\cdot \nabla _x \partial _{x_i} U&\text{ in }&\ I\times {\mathbb {R}}^d,\\ {}&\partial _t w_i(t=0, \cdot )=\partial _t w_i(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d. \end{aligned} \right. \end{aligned}$$
(24)

The motivation of considering (24) is that, if we formally differentiate (22) with respect to \(\partial _{x_i}\), then \(\partial _{x_i} u \) satisfies precisely the equation (24) for \(w_i\). Hence, our plan is to use part (i) to establish \(w_i\in H^1(\lambda \otimes \mu )\), then argue that \(w_i-\partial _{x_i} u\) must be constant.

We first verify the rhs of (24) has total integral zero. Indeed

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} (\partial _{x_i} h-\nabla _xu \cdot \nabla _x \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} (h\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} \left( {\mathscr {L}}u\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad = \int _{I\times {\mathbb {R}}^d} \left( \partial _{t}u\partial _{tx_i}U+\nabla _x u\cdot \nabla _x\partial _{x_i} U-\nabla _xu \cdot \nabla _x \partial _{x_i} U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) = 0. \end{aligned}$$

The next step is to show that the rhs is in \(H^{-1}(\lambda \otimes \mu )\). Pick a test function \(\phi \in H^1(\lambda \otimes \mu )\) with \(\Vert \phi \Vert _{H^1(\lambda \otimes \mu )}=1\); then, by Lemma 2.2,

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^d} (\partial _{x_i} h-\nabla _x u\cdot \nabla _x\partial _{x_i} U)\phi \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad \leqslant \int _{I\times {\mathbb {R}}^d} (-h\partial _{x_i} \phi +h\phi \partial _{x_i} U) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)+\int _{I\times {\mathbb {R}}^d}|\phi \nabla _x u||\nabla _x\partial _{x_i} U| \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\ {}&\qquad {\mathop {\leqslant }\limits ^{(11)}} \Vert h\Vert _{L^2(\lambda \otimes \mu )}(1+\Vert \phi \partial _{x_i} U\Vert _{L^2(\lambda \otimes \mu )})+M\int _{I\times {\mathbb {R}}^d}|\phi \nabla _x u|(\sqrt{d}+|\nabla _x U|) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&\qquad \leqslant \Vert h\Vert _{L^2(\lambda \otimes \mu )}(1+\Vert \phi \partial _{x_i} U\Vert _{L^2(\lambda \otimes \mu )})+M\Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}(\sqrt{d}+\Vert \phi \nabla _x U\Vert _{L^2(\lambda \otimes \mu )})\\&\qquad {\mathop {\leqslant }\limits ^{(17),(23)}} C(M,d)\Vert h\Vert _{L^2(\lambda \otimes \mu )}, \end{aligned}$$

where \(C(M,d)>0\) is a constant depending on M and d. Therefore, by (i) we know there exists a \(w_i\in V\) which is the weak solution of (24). Finally, comparing (22) and (24), we observe that \({\mathscr {L}}(w_i-\partial _{x_i}u)=0\) in the sense of distributions, which by (i) implies that \(w_i-\partial _{x_i} u\) must be a constant, necessarily equal to \(-(\partial _{x_i} u)_{\lambda \otimes \mu }\), since by construction \(w_i\in V\) and \((w_i)_{\lambda \otimes \mu }=0\). This also means \(\partial _{x_i} u \in H^1(\lambda \otimes \mu )\) since \(w_i\in H^1(\lambda \otimes \mu )\). We end the proof of \(u\in H^2(\lambda \otimes \mu )\) by writing \(\partial _{tt} u =\nabla _x^*\nabla _x u-h \in L^2(\lambda \otimes \mu ) \). \(\quad \square \)

We finally need a lemma for the solution of a divergence equation with Dirichlet boundary conditions. The resolution of the divergence equation is an important tool in mathematical fluid dynamics (see the book [23, Section III.3]). However, in order to obtain a more natural estimate on the constants, instead of resorting to the aforementioned Bogovskii’s operator, we take advantage of the structure of the space \(L^2(\mu )\) via an eigenspace decomposition, which is made possible thanks to Assumption 3. This will provide us with test functions that play a crucial role in the proof of Theorem 2.

Lemma 2.6

For any function \(f\in L^2(\lambda \otimes \mu )\) with \((f)_{\lambda \otimes \mu }=0\), there exist two functions \(\phi _0 \in H_0^1(\lambda \otimes \mu )\) and \(\Phi \in H^2(\lambda \otimes \mu )\) such that \(\nabla _x \Phi \in H_0^1(\lambda \otimes \mu )^d\) and

$$\begin{aligned} -\partial _t\phi _0+\nabla _x^*\nabla _x \Phi = f \end{aligned}$$
(25)

with estimates

$$\begin{aligned} \Vert \phi _0\Vert _{L^2(\lambda \otimes \mu )}+ \Vert \nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )} \leqslant C\left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}\nonumber \\ \end{aligned}$$
(26)

and

$$\begin{aligned}{} & {} \Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )} + \Vert \bar{\nabla }\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )} \leqslant C\left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2} \right. \nonumber \\{} & {} \left. \quad + \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}. \end{aligned}$$
(27)

Here C is a universal constant and R is the constant defined in Theorem 1.

Remark 2.7

We believe the correct scaling of the rhs should be \(O(\frac{1}{T})\) as \(T\rightarrow 0\), which we are unable to obtain due to the pessimistic estimates in the last two lines of (31) that change the scaling of the last two terms from O(1) to \(O(T^2)\); we will not pursue this further since in the proof of Theorem 1 we only take \(T=\frac{1}{\sqrt{m}}\). As we mentioned earlier after Theorem 2, the scaling of \(O(\frac{1}{T})\) as \(T\rightarrow 0\) should come from (35).

Before we proceed to the proof, let us give a brief heuristic argument on why we need to introduce the space of harmonic functions (i.e. the space \({\mathbb {H}}\) that appears at the beginning of the proof) and consider the orthogonal projection onto it. Indeed, a direct way to look for a solution of (25) is to solve (22) and set \(\phi _0=\partial _t u, \Phi = u\). However, these test functions do not necessarily satisfy the appropriate boundary conditions. In particular, if the solution of (22) satisfies \(\nabla _x u(t=0,\cdot ) = \nabla _x u(t=T,\cdot )=0\), then necessarily f has to be perpendicular to the space of harmonic functions. The harmonic part of f therefore requires special treatment and brings technical difficulties to the proof. However, thanks to Assumption 3, one can decompose the harmonic part of f using separation of variables, which enables us to obtain the solution of the divergence equation by constructing it for each component and adding the pieces up.

Proof

Let \({\mathbb {H}}\) be the subspace of \(L^2(\lambda \otimes \mu )\) that consists of “harmonic functions”, in other words, \(f\in {\mathbb {H}}\) if and only if \({\mathscr {L}}f=0\). We consider the decomposition \(f=f^{(1)}+f^{(2)}\) where \(f^{(1)}\in {\mathbb {H}}\) and \(f^{(2)}\perp {\mathbb {H}}\). Since \(1\in {\mathbb {H}}\) we know \((f^{(2)})_{\lambda \otimes \mu }=0\) and hence \((f^{(1)})_{\lambda \otimes \mu }=0\). Therefore by linearity it suffices to consider \(f^{(1)}\) and \(f^{(2)}\) separately. For \(f^{(2)}\), the equation

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {L}}u=f^{(2)}&\text{ in }&\ I\times {\mathbb {R}}^d, \\ {}&\partial _t u(t=0, \cdot )=\partial _t u(t=T,\cdot )=0&\text{ in }&\ {\mathbb {R}}^d \end{aligned} \right. \end{aligned}$$
(28)

has a unique solution in \(V\cap H^2(\lambda \otimes \mu )\) by Lemma 2.4. Moreover, for any \(v\in {\mathbb {H}} \cap H^2(\lambda \otimes \mu )\), integration by parts yields

$$\begin{aligned} 0= & {} \int _{I\times {\mathbb {R}}^d} f^{(2)}v \,\textrm{d}\lambda (t) \,\textrm{d}\mu (x) = B[u,v]\\= & {} \int _{I\times {\mathbb {R}}^d} u{\mathscr {L}}v \,\textrm{d}\lambda (t) \,\textrm{d}\mu (x) + \int _{{\mathbb {R}}^d} \left( u(T)\partial _t v(T) - u(0)\partial _t v(0)\right) \,\textrm{d}\mu (x) \end{aligned}$$

Therefore, since v is arbitrary, we have \(u(T)=u(0)=0\), which implies \(\nabla _x u\in H_0^1(\lambda \otimes \mu )^d\). Also, by the boundary conditions in (28), \(\partial _t u\in H_0^1(\lambda \otimes \mu )\). Thus, for the \(f^{(2)}\) part, it suffices to take \(\phi _0^{(2)}=\partial _t u,~\Phi ^{(2)}= u\), with the estimates

$$\begin{aligned} \Vert \bar{\nabla }u\Vert _{L^2(\lambda \otimes \mu )}^2 {\mathop {\leqslant }\limits ^{(23)}} C\max \left\{ \frac{1}{m},T^2 \right\} \Vert f^{(2)}\Vert ^2_{L^2(\lambda \otimes \mu )}, \end{aligned}$$
(29)

and

$$\begin{aligned} \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 {\mathop {\leqslant }\limits ^{(18),(29)}} C\left( 1+\dfrac{R^2}{m}+R^2T^2\right) \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
(30)

We now consider the \(f^{(1)}\) part. Since \( \{1\}\cup \{w_\alpha \} \) forms an orthonormal basis in \(L^2(\mu )\) and \((f^{(1)})_{\lambda \otimes \mu }=0\), we have an orthogonal decomposition

$$\begin{aligned} f^{(1)}(t, x)=f_0(t)+\sum _{\alpha } f_\alpha (t) w_\alpha (x). \end{aligned}$$

Since \(f^{(1)}\) is harmonic,

$$\begin{aligned} 0={\mathscr {L}}f^{(1)}= -f_0''(t)+\sum _{\alpha } (-f''_\alpha (t)+\alpha ^2f_\alpha (t)) w_\alpha (x) \end{aligned}$$

and therefore \(f_0(t)\) is an affine function \(f_0(t)=c_0(t-\frac{T}{2})\) for some constant \(c_0\), as \(f_0(t)\) has integral zero. Moreover for \(\alpha >0\) there exist constants \(c_\pm ^\alpha \) such that

$$\begin{aligned} f_\alpha (t)=c_+^\alpha e^{-\alpha t}+c_-^\alpha e^{-\alpha (T-t)}. \end{aligned}$$

Therefore, by orthogonality in \(L^2(\lambda \otimes \mu )\), we can write for some constant \(C\in (1,\infty )\),

$$\begin{aligned} \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2&= \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2\Vert t-\frac{T}{2}\Vert _{L^2(\lambda )}^2+\sum _\alpha \Vert c_+^\alpha e^{-\alpha t} + c_-^\alpha e^{-\alpha (T-t)} \Vert _{L^2(\lambda )}^2 \nonumber \\&= \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12}+\sum _\alpha \left( \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \frac{1-e^{-2\alpha T}}{2\alpha T} + 2c_+^\alpha c_-^\alpha e^{-\alpha T} \right) \nonumber \\&\geqslant \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12}+\sum _\alpha \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \left( \frac{1-e^{-2\alpha T}}{2\alpha T} - e^{-\alpha T} \right) \nonumber \\&\geqslant \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{T^2 c_0^2}{12} + \frac{1}{C}\sum _\alpha \left( (c_+^\alpha )^2+(c_-^\alpha )^2 \right) \frac{(1-e^{-\alpha T})^3}{\alpha T}. \end{aligned}$$
(31)

The construction of test functions for \(f_0(t)\) is straightforward: we simply take \(\Phi ^{(0)}=0\) and \(\phi _0^{(0)}(t,x)=\frac{c_0}{2}(t^2-tT)\). We then construct \(\phi _{0,\alpha },\Phi _\alpha \) for each component \(e^{-\alpha t} w_\alpha (x)\) of the sum, and the functions \(\phi _{0,\alpha }(T-t,\cdot ),\Phi _{\alpha }(T-t,\cdot )\) then apply, up to signs that do not affect the norm estimates below, to the component \(e^{-\alpha (T-t)}w_\alpha (x)\), so that the eventual test functions \(\phi _0,\Phi \) can be obtained after taking linear combinations. The goal is to find \(\phi _{0,\alpha },\Phi _\alpha \) such that

$$\begin{aligned} -\partial _t\phi _{0,\alpha }+\nabla _x^*\nabla _x \Phi _\alpha = e^{-\alpha t}w_\alpha (x). \end{aligned}$$

Since \(w_\alpha \in H^2(\lambda \otimes \mu )\), in order to eliminate the x part of the equation, we can take the natural ansatz by separation of variables \(\phi _{0,\alpha }=\psi _{1,\alpha }(t)w_\alpha (x)\) and \(\Phi _\alpha =\psi _{2,\alpha }(t) w_\alpha (x)\), and the two functions \(\psi _{1,\alpha }(t),\psi _{2,\alpha }(t)\) should satisfy \(\psi _{1,\alpha }(0)=\psi _{1,\alpha }(T)=\psi _{2,\alpha }(0)=\psi _{2,\alpha }(T)=0\) as well as the equation

$$\begin{aligned} -\psi _{1,\alpha }'(t)+\alpha ^2\psi _{2,\alpha }(t)=e^{-\alpha t}. \end{aligned}$$
(32)

Integrating (32) over \(t\in (0,T)\) and using the boundary conditions on \(\psi _{1,\alpha }\), we obtain the necessary and sufficient condition

$$\begin{aligned} \int _0^T \psi _{2,\alpha }(t) \,\textrm{d}t = \frac{1-e^{-\alpha T}}{\alpha ^3}. \end{aligned}$$
(33)

Of course there exist infinitely many possible solutions, since for any \(\psi _{2,\alpha }\) that vanishes at both time boundaries and satisfies (33), the choice \(\psi _{1,\alpha } = \int _0^t (\alpha ^2\psi _{2,\alpha } (\tau ) -e^{-\alpha \tau })\,\textrm{d}\tau \) also vanishes at both time boundaries. Therefore we only need to choose a particular one satisfying the desired estimates. Let us introduce the short-hand notation \(\ell =e^{-\alpha T} \in (0,1)\). Our idea is to find \(\psi _{2,\alpha }\) of the form \(\psi _{2,\alpha }(t)= \frac{1}{\alpha ^2}g(e^{-\alpha t})\), which after the change of variable \(s{:}{=}e^{-\alpha t}\) turns the condition (33) into \(\int _{\ell }^1 \frac{g(s)}{s}\,\textrm{d}s = 1-\ell \), and the boundary conditions into \(g(1)=g(\ell )=0\). Hence, we may finish our construction by picking \(g(s)=sh(s)\) with

$$\begin{aligned} h(x)= \frac{6}{(1-\ell )^2}(x-\ell )(1-x). \end{aligned}$$

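As a quick sanity check (our computation, using the change of variable \(s=e^{-\alpha t}\) described above), this choice indeed satisfies the normalization (33):

$$\begin{aligned} \int _0^T \psi _{2,\alpha }(t)\,\textrm{d}t = \dfrac{1}{\alpha ^2}\int _0^T g(e^{-\alpha t})\,\textrm{d}t = \dfrac{1}{\alpha ^3}\int _\ell ^1 \dfrac{g(s)}{s}\,\textrm{d}s = \dfrac{1}{\alpha ^3}\int _\ell ^1 h(s)\,\textrm{d}s = \dfrac{6}{\alpha ^3(1-\ell )^2}\cdot \dfrac{(1-\ell )^3}{6} = \dfrac{1-e^{-\alpha T}}{\alpha ^3}, \end{aligned}$$

while the boundary conditions \(g(1)=g(\ell )=0\) are immediate from the formula for h.
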
From the expression we can directly derive (using \(\alpha \geqslant \sqrt{m}\))

$$\begin{aligned} 0 \leqslant g(s)\leqslant \frac{3}{2}s \ \text{ and } \ |g'(s)|\leqslant \dfrac{4}{1-\ell }= \frac{4}{1-e^{-\alpha T}}. \end{aligned}$$

One can explicitly compute

$$\begin{aligned}&\Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2=\dfrac{1}{\alpha ^4T}\int _0^T g(e^{-\alpha t})^2 \,\textrm{d}t = \dfrac{1}{\alpha ^5T}\int _\ell ^1 \dfrac{g(s)^2}{s}\,\textrm{d}s = \dfrac{3(1-e^{-2\alpha T})}{5\alpha ^5T}, \end{aligned}$$
(34)
$$\begin{aligned} \text{ and }&\Vert \psi _{2,\alpha }'\Vert _{L^2(\lambda )}^2=\dfrac{1}{\alpha ^2T}\int _0^T g'(e^{-\alpha t})^2 e^{-2\alpha t} \,\textrm{d}t = \dfrac{1}{\alpha ^3T}\int _\ell ^1 g'(s)^2 s \,\textrm{d}s \leqslant \dfrac{8}{\alpha ^3T(1-e^{-\alpha T})}. \end{aligned}$$
(35)

Moreover since \(\psi _{1,\alpha }'(t) = \alpha ^2\psi _{2,\alpha }(t) - e^{-\alpha t}\) from (32),

$$\begin{aligned} \Vert \psi _{1,\alpha }'\Vert _{L^2(\lambda )}^2\leqslant 2\alpha ^4 \Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2 +\dfrac{1-e^{-2\alpha T}}{\alpha T}\leqslant \dfrac{3(1-e^{-2\alpha T})}{\alpha T}. \end{aligned}$$
(36)

Finally since

$$\begin{aligned} \psi _{1,\alpha }(t)=\int _0^t (g(e^{-\alpha s})-e^{-\alpha s})\,\textrm{d}s =\dfrac{1}{\alpha }\int _{e^{-\alpha t}}^1 (\dfrac{g(\tau )}{\tau }-1)\,\textrm{d}\tau = \dfrac{1}{\alpha } r(e^{-\alpha t}) \end{aligned}$$

with

$$\begin{aligned}r(s) = \int _s^1 (h(\tau )-1)\,\textrm{d}\tau = \frac{(s-\ell )(1-s)(1+\ell -2s)}{(1-\ell )^2},\end{aligned}$$

we can estimate

$$\begin{aligned}{} & {} \alpha ^2\Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2= \frac{1}{\alpha T} \int ^1_\ell \frac{r(t)^2}{t}\,\textrm{d}t = \frac{(1-\ell )^3}{\alpha T}\nonumber \\{} & {} \quad \int _0^1 \frac{s^2(1-s)^2(1-2s)^2}{(1-\ell )s+\ell }\,\textrm{d}s \leqslant \frac{C(1-e^{-\alpha T})^3}{\alpha T}. \end{aligned}$$
(37)

To sum up, our construction of test functions can be written as

$$\begin{aligned} \phi _0&= \partial _t u+c_0\frac{t^2-tT}{2} + \sum _\alpha (c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t))w_\alpha (x), \\ \Phi&= u + \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))w_\alpha (x), \end{aligned}$$

where we recall that u is the solution of (28).

We now establish the estimates by direct calculations, which is possible since the variables are separated. Notice that for \(\alpha , \beta \),

$$\begin{aligned} \langle \nabla _x w_\alpha , \nabla _x w_\beta \rangle _{L^2(\mu )}= \langle w_\alpha , \nabla _x^*\nabla _x w_\beta \rangle _{L^2(\mu )}= \beta ^2 \langle w_\alpha , w_\beta \rangle _{L^2(\mu )}=\alpha ^2 \delta _{\alpha ,\beta }, \end{aligned}$$

hence cross terms in the expansion of \(\Vert \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))\nabla _x w_\alpha (x)\Vert _{L^2(\lambda \otimes \mu )}^2\) vanish. Therefore, we can estimate

$$\begin{aligned}&\Vert \phi _0\Vert ^2_{L^2(\lambda \otimes \mu )} + \Vert \nabla _x\Phi \Vert ^2_{L^2(\lambda \otimes \mu )}\nonumber \\ {}&\quad \leqslant 3\left( \Vert \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{c_0^2}{4}\Vert t^2-tT\Vert _{L^2(\lambda )}^2 \right. \nonumber \\&\qquad + \sum _\alpha \Vert c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t)\Vert _{L^2(\lambda )}^2\Vert w_\alpha \Vert _{L^2(\mu )}^2 \nonumber \\&\qquad \left. + \Vert \nabla _x u\Vert _{L^2(\lambda \otimes \mu )}^2 + \left\| \sum _\alpha (c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t))\nabla _x w_\alpha \right\| _{L^2(\lambda \otimes \mu )}^2 \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(23)}} 6\left( \max \{\frac{1}{m},T^2 \} \Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + \frac{c_0^2T^4}{120} + \sum _\alpha ((c_+^\alpha )^2+(c_-^\alpha )^2) \Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2\right. \nonumber \\&\qquad \left. +\sum _{\alpha } \Vert c_+^\alpha \psi _{2,\alpha }(t) + c_-^\alpha \psi _{2,\alpha }(T-t)\Vert _{L^2(\lambda )}^2 \Vert \nabla _x w_\alpha \Vert _{L^2(\mu )}^2 \right) \nonumber \\&\quad \leqslant C\left( \max \{\frac{1}{m},T^2\}\Vert f^{(2)}\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2T^4 \right. \nonumber \\&\qquad \left. + \sum _\alpha ((c_+^\alpha )^2+(c_-^\alpha )^2) (\Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2+\alpha ^2\Vert \psi _{2,\alpha }\Vert _{L^2(\lambda )}^2) \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(37),(34)}} C\left( \max \{\frac{1}{m},T^2\}\left\| f^{(2)}\right\| _{L^2(\lambda \otimes \mu )}^2+ c_0^2T^4 \right. \nonumber \\&\qquad \left. + \sum _\alpha \frac{1}{\alpha ^2}((c_+^\alpha )^2+(c_-^\alpha )^2) \frac{(1-e^{-\alpha T})^3+1-e^{-2\alpha T}}{\alpha T} \right) \nonumber \\&\quad {\mathop {\leqslant }\limits ^{(31)}} C\max \left\{ \frac{1}{m(1-e^{-\sqrt{m} T})^2},T^2\right\} \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
(38)

Here, in the last line where we used (31), the worst factor \((1-e^{-\sqrt{m}T})^{-2}\) comes only from the last term of the line above. This establishes (26). Using similar arguments, we can estimate

$$\begin{aligned} \Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )}^2&= \Big \Vert \nabla _x\partial _t u+ \sum _\alpha (c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t))\nabla _x w_\alpha (x)\Big \Vert ^2_{L^2(\lambda \otimes \mu )} \nonumber \\ {}&\leqslant 2\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \alpha ^2\Vert c_+^\alpha \psi _{1,\alpha }(t) + c_-^\alpha \psi _{1,\alpha }(T-t)\Vert _{L^2(\lambda )}^2 \right) \nonumber \\ {}&\leqslant C\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2) \alpha ^2 \Vert \psi _{1,\alpha }\Vert _{L^2(\lambda )}^2 \right) \nonumber \\&{\mathop {\leqslant }\limits ^{(37)}} C\left( \Vert \nabla _x\partial _t u \Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2) \frac{(1-e^{-\alpha T})^3}{\alpha T} \right) , \end{aligned}$$
(39)

as well as

$$\begin{aligned} \Vert \partial _t \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu )}^2&= \Bigl \Vert \nabla _x \partial _t u + \sum _\alpha (c_+^\alpha \psi _{2,\alpha }'(t) - c_-^\alpha \psi _{2,\alpha }'(T-t))\nabla _x w_\alpha (x)\Bigr \Vert _{L^2(\lambda \otimes \mu )}^2 \nonumber \\&\leqslant 2 \left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \Vert c_+^\alpha \psi _{2,\alpha }'(t) \right. \nonumber \\&\quad \left. - c_-^\alpha \psi _{2,\alpha }'(T-t)\Vert _{L^2(\lambda )}^2\Vert \nabla _x w_\alpha \Vert _{L^2(\mu )}^2\right) \nonumber \\&\leqslant C \left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha \alpha ^2((c_+^\alpha )^2+ (c_-^\alpha )^2)\Vert \psi _{2,\alpha }'\Vert _{L^2(\lambda )}^2\right) \nonumber \\&{\mathop {\leqslant }\limits ^{(35)}} C\left( \Vert \nabla _x \partial _t u\Vert _{L^2(\lambda \otimes \mu )}^2 + \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1}{\alpha T(1-e^{-\alpha T})}\right) . \end{aligned}$$
(40)

We finally treat the terms from \(\nabla ^2_x \Phi \):

$$\begin{aligned} \Vert \nabla ^2_x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2&{\mathop {\leqslant }\limits ^{(19)}} C\left( \Vert \nabla _x^*\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2 +R^2 \Vert \nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\ {}&{\mathop {\leqslant }\limits ^{(25),(38)}} C\left( \Big \Vert f+\partial _{tt} u+c_0(t-\frac{T}{2}) + \sum _\alpha (c_+^\alpha \psi _{1,\alpha }'(t) \right. \nonumber \\&\quad - c_-^\alpha \psi _{1,\alpha }'(T-t))w_\alpha (x)\Big \Vert _{L^2(\lambda \otimes \mu )}^2\nonumber \\&\quad \left. +R^2\left( T^2+ \frac{1}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\&\leqslant C\left( \Vert \partial _{tt}u\Vert _{L^2(\lambda \otimes \mu )}^2+c_0^2T^2+ \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\Vert \psi _{1,\alpha }'\Vert _{L^2(\lambda )}^2 \right. \nonumber \\&\quad \left. +\left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \nonumber \\&{\mathop {\leqslant }\limits ^{(36)}} C\left( \Vert \partial _{tt}u\Vert _{L^2(\lambda \otimes \mu )}^2+c_0^2T^2+ \sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1-e^{-2\alpha T}}{\alpha T} \right. \nonumber \\ {}&\quad \left. +\left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) . \end{aligned}$$
(41)

Adding together (39), (40) and (41), we arrive at

$$\begin{aligned}&\Vert \nabla _x \phi _0\Vert _{L^2(\lambda \otimes \mu )}^2 + \Vert \bar{\nabla }\nabla _x \Phi \Vert _{L^2(\lambda \otimes \mu )}^2 \\&\quad \leqslant C\left( \Vert D^2 u\Vert _{L^2(\lambda \otimes \mu )}^2 + c_0^2 T^2 +\sum _\alpha ((c_+^\alpha )^2+ (c_-^\alpha )^2)\frac{1}{\alpha T(1-e^{-\alpha T})} \right. \\&\qquad \left. + \left( 1+R^2T^2+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^2}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2\right) \\&\quad {\mathop {\leqslant }\limits ^{(30),(31)}}C \left( 1+R^2T^2+ \frac{1}{(1-e^{-\sqrt{m}T})^4}+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^4}\right) \Vert f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$

\(\quad \square \)

We are now ready to prove the main results of the paper. The proof is essentially inspired by that of [1, Proof of Theorem 3]. In particular, to retrieve the \(L^2(\lambda \otimes \mu ;H^{-1}_\kappa )\) norm, we need to construct a test function in \(L^2(\lambda \otimes \mu ;H^1_\kappa )\), which is closely related to the test functions constructed in Lemma 2.6. The differences between the two proofs are: (1) we choose the test functions explicitly, namely \(\xi _0=1\) and \(\xi _i = v_i\), which are mutually orthogonal and whose moments up to fourth order are explicit (in particular, all first and third moments vanish); (2) instead of using \(\Vert \bar{\nabla } \Pi _v f\Vert _{H^{-1}(\lambda \otimes \mu )}\) as an intermediate step, we proceed as in (42) and control the \(L^2(\lambda \otimes \mu ;H^1_\kappa )\) norm of another explicitly constructed function, in order to minimize the use of the Cauchy–Schwarz inequality and to carefully track the dimension dependence of the constants.

Proof of Theorem 2

Without loss of generality, assume \((f)_{\lambda \otimes \rho _{\infty }}=0\), which implies \( (\Pi _v f)_{\lambda \otimes \mu } = 0\). Therefore, we can take \(\phi _0, \Phi \) as in Lemma 2.6 with \(\Pi _v f\) in place of f, so that \(-\partial _t \phi _0 + \nabla _x^* \nabla _x \Phi = \Pi _v f\). The trick in the following step is to introduce the variable v into the calculation. Notice, by Gaussianity, that

$$\begin{aligned} \int _{{\mathbb {R}}^d} v_i\ \,\textrm{d}\kappa (v)=0,\qquad \int _{{\mathbb {R}}^d} v_iv_j\ \,\textrm{d}\kappa (v)=\delta _{i,j}, \end{aligned}$$

where \(\delta _{i,j}\) is the Kronecker symbol, which equals 1 if \(i=j\) and 0 otherwise. Thus,

$$\begin{aligned} \begin{aligned} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2&=\int _{I\times {\mathbb {R}}^d} \Pi _v f(-\partial _t \phi _0 +\nabla _x^*\nabla _x \Phi )\,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^{2d}}\Pi _v f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi \\&\quad -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&= \int _{I\times {\mathbb {R}}^{2d}} f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi \\&\quad -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad + \int _{I\times {\mathbb {R}}^{2d}} (\partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x \Phi +v\cdot \nabla _x^2\Phi \cdot v\\&\quad -\nabla _x\Phi \cdot \nabla _x U) (f-\Pi _v f) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v). \end{aligned} \end{aligned}$$
(42)

For the first integral on the right-hand side, we use integration by parts; here it is important that the test functions \((\phi _0,\nabla _x\Phi )\) satisfy Dirichlet boundary conditions in time:

$$\begin{aligned}&\int _{I\times {\mathbb {R}}^{2d}} f (-\partial _t \phi _0+v\cdot \nabla _x\phi _0+v\cdot \partial _t \nabla _x \Phi -v\cdot \nabla _x^2\Phi \cdot v+\nabla _x\Phi \cdot \nabla _x U) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( \partial _t f\phi _0-\partial _t f(v\cdot \nabla _x\Phi )-\phi _0(v\cdot \nabla _x f)+f\phi _0(v\cdot \nabla _x U) \right. \\ {}&\qquad \left. +(v\cdot \nabla _x f)(v\cdot \nabla _x \Phi )-f(v\cdot \nabla _x\Phi )(v\cdot \nabla _x U)+f\nabla _x\Phi \cdot \nabla _x U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( \partial _t f\phi _0-\partial _t f(v\cdot \nabla _x\Phi )-\phi _0(v\cdot \nabla _x f)+\phi _0(\nabla _v f\cdot \nabla _x U) \right. \\&\qquad \left. +(v\cdot \nabla _x f)(v\cdot \nabla _x\Phi )-\nabla _v\cdot ((v\cdot \nabla _x\Phi )f \nabla _x U)+ f\nabla _x\Phi \cdot \nabla _x U\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( (\partial _t f-v\cdot \nabla _x f+\nabla _x U\cdot \nabla _v f)(\phi _0-v\cdot \nabla _x\Phi )\right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad \leqslant \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )}\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}. \end{aligned}$$

We further estimate the term \(\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}\) by explicit integration, noticing that \((\phi _0,\Phi )\) do not depend on v, so that the moments in v can be computed explicitly:

$$\begin{aligned} \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )}^2&= \int _{I\times {\mathbb {R}}^d} \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{H^1_\kappa }^2 \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x)\\&= \int _{I\times {\mathbb {R}}^d} \left( \Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2_\kappa }^2\right. \\&\quad \left. +\Vert \nabla _v(\phi _0-v\cdot \nabla _x\Phi )\Vert _{L^2_\kappa }^2\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&= \int _{I\times {\mathbb {R}}^d} \left( \int _{{\mathbb {R}}^d}(\phi _0-v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v)\right. \\&\quad \left. +\int _{{\mathbb {R}}^d}|\nabla _x\Phi |^2\,\textrm{d}\kappa (v)\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&= \int _{I\times {\mathbb {R}}^d} \left( \phi _0^2+2|\nabla _x\Phi |^2 \right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&{\mathop {\leqslant }\limits ^{(26)}} C\left( \frac{1}{m(1-e^{-\sqrt{m} T})^2}+T^2\right) \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
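In the last equality we used that, for each fixed \((t,x)\), the cross term has zero mean under \(\kappa \) and that \(\int (v\cdot a)^2\,\textrm{d}\kappa (v)=|a|^2\) for any \(a\in {\mathbb {R}}^d\); explicitly,

$$\begin{aligned} \int _{{\mathbb {R}}^d}(\phi _0-v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v) = \phi _0^2 - 2\phi _0\,\nabla _x\Phi \cdot \int _{{\mathbb {R}}^d} v\,\textrm{d}\kappa (v) + \int _{{\mathbb {R}}^d} (v\cdot \nabla _x\Phi )^2\,\textrm{d}\kappa (v) = \phi _0^2+|\nabla _x\Phi |^2. \end{aligned}$$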

For the second integral in (42), we again estimate by expanding explicitly in v, which is possible since the moments of v up to fourth order are explicit:

$$\begin{aligned}&\Vert \partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi +v\cdot \nabla ^2_x\Phi \cdot v-\nabla _x\Phi \cdot \nabla _x U\Vert _{L^2(\lambda \otimes \rho _{\infty })}^2 \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} (\partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi \\&\qquad +v\cdot \nabla ^2_x\Phi \cdot v-\nabla _x\Phi \cdot \nabla _x U)^2\,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}}\left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2-2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U) (v\cdot \nabla _x\phi _0)\right. \\&\qquad -2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U) (v\cdot \partial _t\nabla _x\Phi ) +(v\cdot \nabla _x \phi _0)^2+(v\cdot \partial _t \nabla _x\Phi )^2\\&\qquad +2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)v\cdot \nabla _x^2 \Phi \cdot v+2(v\cdot \partial _t \nabla _x\Phi )(v\cdot \nabla _x\phi _0) \\&\qquad +(v\cdot \nabla _x^2 \Phi \cdot v)^2-2(v\cdot \partial _{t} \nabla _x\Phi )(v\cdot \nabla _x^2\Phi \cdot v)\\&\qquad \left. -2(v\cdot \nabla _x \phi _0)(v\cdot \nabla _x^2 \Phi \cdot v) \right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v) \\&\quad = \int _{I\times {\mathbb {R}}^{2d}} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2 \right. \\&\qquad +\sum _{i} v_i^2\left( (\partial _{x_i}\phi _0)^2+(\partial _t \partial _{x_i}\Phi )^2+2\partial _{x_i}\phi _0\partial _t \partial _{x_i}\Phi \right) \\&\qquad +2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)\sum _{i} v_i^2 \partial _{x_i x_i}\Phi \\&\qquad + \sum _{i} v_i^4 (\partial _{x_i x_i}\Phi )^2+2\sum _{i\ne j}v_i^2v_j^2(\partial _{x_ix_j} \Phi )^2\\&\qquad \left. +\sum _{i\ne j} v_i^2v_j^2 \partial _{x_ix_i}\Phi \partial _{x_jx_j}\Phi \right) \,\textrm{d}\lambda (t)\,\textrm{d}\rho _{\infty }(x,v)\\&\quad = \int _{I\times {\mathbb {R}}^d} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)^2+ |\nabla _x\phi _0+\partial _t \nabla _x\Phi |^2\right. + 3\sum _{i} (\partial _{x_i x_i}\Phi )^2\\&\qquad +2\sum _{i\ne j}(\partial _{x_i x_j} \Phi )^2+2(\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U)\Delta _x\Phi \left. +\sum _{i\ne j} \partial _{x_ix_i}\Phi \partial _{x_jx_j}\Phi \right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\quad \leqslant \int _{I\times {\mathbb {R}}^d} \left( (\partial _t\phi _0-\nabla _x\Phi \cdot \nabla _x U+\Delta _x\Phi )^2+2|\nabla _x \phi _0|^2+ 2 |\bar{\nabla } \nabla _x \Phi |^2\right) \,\textrm{d}\lambda (t)\,\textrm{d}\mu (x) \\&\quad {\mathop {=}\limits ^{(28)}} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2+2\Vert \nabla _x \phi _0\Vert ^2_{L^2(\lambda \otimes \mu )}+2\Vert \bar{\nabla } \nabla _x \Phi \Vert ^2_{L^2(\lambda \otimes \mu )} \\&\quad {\mathop {\leqslant }\limits ^{(27)}} C\left( 1+R^2T^2+ \frac{1}{(1-e^{-\sqrt{m}T})^4}+ \frac{R^2}{m(1-e^{-\sqrt{m}T})^4}\right) \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2. \end{aligned}$$
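The only properties of \(\kappa \) used in the previous computation are the vanishing of its odd moments together with \(\int v_i^2\,\textrm{d}\kappa =1\), \(\int v_i^4\,\textrm{d}\kappa =3\) and, by independence of the coordinates, \(\int v_i^2v_j^2\,\textrm{d}\kappa =1\) for \(i\ne j\). A minimal symbolic sanity check of the one-dimensional moments (not part of the proof; it assumes SymPy is available):

```python
# One-dimensional Gaussian moments used above: E[v]=E[v^3]=0, E[v^2]=1, E[v^4]=3;
# independence of the coordinates then gives E[v_i^2 v_j^2] = 1 for i != j.
import sympy as sp

v = sp.symbols('v', real=True)
kappa = sp.exp(-v**2 / 2) / sp.sqrt(2 * sp.pi)  # standard Gaussian density
moments = [sp.integrate(v**k * kappa, (v, -sp.oo, sp.oo)) for k in (1, 2, 3, 4)]
assert moments == [0, 1, 0, 3]
```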

Combining the above estimates, we arrive at

$$\begin{aligned} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}^2&\leqslant \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )}\Vert \phi _0-v\cdot \nabla _x\Phi \Vert _{L^2(\lambda \otimes \mu ;H^1_\kappa )} \\&\quad + \Vert \partial _t \phi _0-v\cdot \nabla _x\phi _0-v\cdot \partial _t \nabla _x\Phi +v\cdot \nabla _x^2\Phi \cdot v\\&\quad -\nabla _x\Phi \cdot \nabla _x U\Vert _{L^2(\lambda \otimes \rho _{\infty })}\Vert f-\Pi _v f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \\&\leqslant C\left( \left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )} \right. \\&\quad +\left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \\&\quad \left. \Vert ({Id}-\Pi _v)f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )}\right) . \end{aligned}$$

Finally,

$$\begin{aligned} \Vert f\Vert _{L^2(\lambda \otimes \rho _{\infty })}&\leqslant \Vert ({Id}-\Pi _v )f\Vert _{L^2(\lambda \otimes \rho _{\infty })} + \Vert \Pi _v f\Vert _{L^2(\lambda \otimes \mu )} \\&\leqslant C\left( \left( \frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T\right) \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2(\lambda \otimes \mu ;H^{-1}_\kappa )} \right. \\&\quad \left. + \left( 1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2}\right) \Vert ({Id}-\Pi _v) f\Vert _{L^2(\lambda \otimes \rho _{\infty })}\right) , \end{aligned}$$

as claimed.\(\quad \square \)

With Theorem 2 at hand, we are now able to prove the exponential relaxation to equilibrium claimed in Theorem 1, which essentially follows from a standard energy estimate.

Proof of Theorem 1

We first notice that the solution satisfies \(f\in H^1_{hyp}((0,T)\otimes \mu )\) for every \(T>0\). Indeed, as long as \(f_0\in L^2(\mu ;H^1_\kappa )\), we have \(f(t,\cdot ,\cdot ) \in L^2(\mu ;H^1_\kappa )\) for any \(t>0\) (see for example [54, Theorem 35]), and hence \(\partial _t f -{\mathcal {L}}_{\text {ham}}f = -\gamma \nabla _v^*\nabla _v f \in L^2(\lambda \otimes \mu ;H_\kappa ^{-1})\). We also note that (12) implies

$$\begin{aligned} \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f(t,x,v) \,\textrm{d}\rho _{\infty }(x,v)=0 \end{aligned}$$

for all \(t\in (0,T)\). This follows from

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f(t,x,v)\,\textrm{d}\rho _{\infty }(x,v)=0, \end{aligned}$$

using equation (4) and integration by parts.
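Explicitly, testing (4) against the constant function 1 gives

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} f\,\textrm{d}\rho _{\infty }= \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} {\mathcal {L}}f\,\textrm{d}\rho _{\infty }= \langle 1, {\mathcal {L}}_{\text {ham}}f\rangle _{L^2(\rho _{\infty })}-\gamma \langle \nabla _v 1, \nabla _v f\rangle _{L^2(\rho _{\infty })} = 0, \end{aligned}$$

since \({\mathcal {L}}_{\text {ham}}\) is antisymmetric with respect to \(L^2(\rho _{\infty })\) and \(\nabla _v 1 = 0\).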

For every \(0<s<t\), we have the standard energy estimate (hereafter we use \(L^2((s,t)\otimes \rho _{\infty })\) to denote \(L^2(\lambda _{(s,t)}\otimes \rho _{\infty })\)):

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2-\Vert f(s,\cdot )\Vert _{L^2(\rho _{\infty })}^2 =-2\gamma \Vert \nabla _v f\Vert _{L^2((s,t)\otimes \rho _{\infty })}^2. \end{aligned}$$
(43)
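Indeed, since \(\langle f,{\mathcal {L}}_{\text {ham}}f\rangle _{L^2(\rho _{\infty })}=0\) by antisymmetry and \({\mathcal {L}}_{\text {FD}}=-\nabla _v^*\nabla _v\),

$$\begin{aligned} \dfrac{\,\textrm{d}}{\,\textrm{d}t}\Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2 = 2\langle f, {\mathcal {L}}f\rangle _{L^2(\rho _{\infty })} = 2\gamma \langle f, {\mathcal {L}}_{\text {FD}}f\rangle _{L^2(\rho _{\infty })} = -2\gamma \Vert \nabla _v f\Vert _{L^2(\rho _{\infty })}^2, \end{aligned}$$

and integrating in time over \((s,t)\) gives (43).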

In particular,

$$\begin{aligned} \text { the mapping } t\mapsto \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2 \text { is nonincreasing.} \end{aligned}$$
(44)

Since by equation (4),

$$\begin{aligned} -\gamma \nabla _v^*\nabla _v f=\partial _t f-{\mathcal {L}}_{\text {ham}}f, \end{aligned}$$

we have

$$\begin{aligned} \Vert \partial _t f-{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((s,t)\otimes \mu ,H^{-1}_\kappa )}=\gamma \Vert \nabla _v^*\nabla _v f\Vert _{L^2((s,t)\otimes \mu ,H^{-1}_\kappa )}\leqslant \gamma \Vert \nabla _v f\Vert _{L^2((s,t)\otimes \rho _{\infty })}. \end{aligned}$$
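The last inequality follows by duality: for fixed \((t,x)\) and any \(g=g(v)\) with \(\Vert g\Vert _{H^1_\kappa }\leqslant 1\),

$$\begin{aligned} \langle \nabla _v^*\nabla _v f, g\rangle _{L^2_\kappa } = \langle \nabla _v f, \nabla _v g\rangle _{L^2_\kappa } \leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }\Vert \nabla _v g\Vert _{L^2_\kappa } \leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }, \end{aligned}$$

so that \(\Vert \nabla _v^*\nabla _v f\Vert _{H^{-1}_\kappa }\leqslant \Vert \nabla _v f\Vert _{L^2_\kappa }\) pointwise in \((t,x)\), and integrating over \((s,t)\times {\mathbb {R}}^d\) gives the claim.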

Now fix T to be the length of the time intervals below. Denote \(b_1=C(\frac{1}{\sqrt{m}(1-e^{-\sqrt{m} T})}+T)\) and \(b_2=C(1+RT+ \frac{1}{(1-e^{-\sqrt{m}T})^2}+ \frac{R}{\sqrt{m}(1-e^{-\sqrt{m}T})^2})\); then, by Theorem 2, (43) and (44), and the Gaussian Poincaré inequality

$$\begin{aligned} \Vert ({Id}-\Pi _v)f\Vert _{L^2(\lambda \otimes \rho _{\infty })} \leqslant \Vert \nabla _v f\Vert _{L^2(\lambda \otimes \rho _{\infty })}, \end{aligned}$$

we have, for the time stamps \(t_k =kT\),

$$\begin{aligned}&\Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2 -\Vert f(t_{k-1},\cdot )\Vert _{L^2(\rho _{\infty })}^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\left( b_2\Vert \nabla _v f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}+b_1\Vert \partial _t f \right. \\&\qquad \left. -{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((t_{k-1},t_k)\otimes \mu ,H^{-1}_\kappa )}\right) ^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\left( b_2\Vert ({Id}-\Pi _v) f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}+b_1\Vert \partial _t f \right. \\&\qquad \left. -{\mathcal {L}}_{\text {ham}}f\Vert _{L^2((t_{k-1},t_k)\otimes \mu ,H^{-1}_\kappa )}\right) ^2 \\&\quad \leqslant -\dfrac{2\gamma }{(b_1\gamma +b_2)^2}\Vert f\Vert _{L^2((t_{k-1},t_k)\otimes \rho _{\infty })}^2 \\&\quad \leqslant -\dfrac{2\gamma T }{(b_1\gamma +b_2)^2}\Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$
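Rearranging, this reads

$$\begin{aligned} \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}^2 \leqslant \Vert f(t_{k-1},\cdot )\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$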

Now for any \(t>0\), we pick the integer k satisfying \(t_k\leqslant t < t_{k+1}\), so that \(\Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })} \leqslant \Vert f(t_k,\cdot )\Vert _{L^2(\rho _{\infty })}\). Applying the above inequality iteratively and using the monotonicity (44), we obtain

$$\begin{aligned} \Vert f(t,\cdot )\Vert _{L^2(\rho _{\infty })}^2&\leqslant \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) ^{-k} \Vert f_0\Vert _{L^2(\rho _{\infty })}^2 \\&\leqslant \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) ^{-\frac{t}{T}+1} \Vert f_0\Vert _{L^2(\rho _{\infty })}^2 \\&= \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \exp \left( -\frac{t}{T}\log \left( 1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2}\right) \right) \Vert f_0\Vert _{L^2(\rho _{\infty })}^2. \end{aligned}$$

The prefactor

$$\begin{aligned}1+\dfrac{2\gamma T}{(b_1\gamma +b_2)^2} \leqslant C\left( 1+ \frac{\gamma T}{\left( \frac{\gamma }{\sqrt{m}}+\gamma T+1\right) ^2}\right) \end{aligned}$$

is bounded above by a constant. Using that \(\log (1+x) \geqslant \frac{1}{C}x\) for bounded \(x\geqslant 0\) (with C a universal constant), and then picking \(T=\frac{1}{\sqrt{m}}\), we obtain exponential decay with rate

$$\begin{aligned} \nu \geqslant C\frac{\gamma }{(b_1\gamma +b_2)^2} \geqslant C \frac{\gamma m}{(\gamma +R+\sqrt{m})^2}, \end{aligned}$$

which is precisely (13). \(\quad \square \)
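For completeness, the boundedness of the prefactor can also be checked numerically; a minimal sketch (not part of the proof; it assumes NumPy is available, and the grid of values is arbitrary):

```python
# The quantity gamma*T/(gamma/sqrt(m) + gamma*T + 1)^2 depends only on a = gamma*T
# and b = gamma/sqrt(m); it is at most 1/4 because (a+b+1)^2 >= (a+1)^2 >= 4a.
import numpy as np

for a in np.logspace(-6, 6, 50):      # a = gamma * T
    for b in np.logspace(-6, 6, 50):  # b = gamma / sqrt(m)
        assert a / (a + b + 1) ** 2 <= 0.25 + 1e-12
```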