1 Introduction

This work concerns the problem of transporting a probability measure \(d\mu =e^{-V}d{\text {Vol}}\) on a Riemannian manifold \((M,g)\) onto a measure \(d\nu =e^{-(V+W)}d{\text {Vol}}\) which is an L-log-Lipschitz perturbation of \(\mu \), i.e., \(|\nabla W|\le L\). We will show that the Langevin transport map (also known as the heat flow transport map of Kim and Milman) between \(\mu \) and \(\nu \) is Lipschitz in various Euclidean and manifold settings. This construction of transport maps was introduced in [1]. Our main results are Theorem 1 in the Euclidean setting and Theorem 3 in the Riemannian setting (with sharp results in the special case of the sphere, Theorem 2). We shall also discuss applications to functional inequalities and optimal transport.

The question of proving Lipschitz bounds for transport maps goes back to the seminal work of Caffarelli [2], who proved that the quadratic optimal transport map (or Brenier map) from a standard Gaussian measure onto a uniformly log-concave measure on \({\mathbb {R}}^n\) is globally Lipschitz, with a dimension-free bound (see [3, 4] for alternative proofs based on entropic optimal transport and [5] for Sobolev bounds). The existence of such maps makes it possible to transfer various functional, isoperimetric, and concentration inequalities from the source measure to the target measure [2, 6, 7]. For example, Caffarelli’s result immediately recovers the classical results of Bakry and Émery on sharp functional inequalities for uniformly log-concave measures. Moreover, Milman [8] showed that such Lipschitz estimates imply bounds on higher eigenvalues of certain differential operators, for which no non-transport proofs are known at this time. What is crucial for many applications, such as the correlation inequalities [7] and quantitative central limit theorems [9], is to ensure that the Lipschitz estimates are dimension-free.

Several extensions of Caffarelli’s theorem on the Brenier map have been proven since then, such as [10], which showed that the optimal transport map from a Gaussian measure onto any log-compactly-supported perturbation is globally Lipschitz. Beyond these extensions, not much is known about the Lipschitz properties of the optimal transport map, and there are many remaining questions [11]. However, for most applications, it is not particularly important that the map is optimal, and any globally Lipschitz map will suffice. An emerging line of research has focused on the construction and analysis of Lipschitz transport maps beyond the setting of optimal transport. Our starting point is the paper [1], in which Kim and Milman introduced a new construction of transport maps and recovered, as well as extended, Caffarelli’s result. The construction is based on a time reversal of an overdamped Langevin (or drift-diffusion) SDE, and we shall call it the Langevin transport map in the sequel. As shown by Tanana [12], this construction does not coincide with the Brenier map in general (although they do coincide in dimension one), see also [13]. It is on this construction that the present work is based.

More recently, there have been several works investigating Lipschitz properties of the Langevin transport map in the Euclidean setting when the target measure satisfies certain convexity conditions [14,15,16,17]. Our focus here is on a different class of target measures, with a first-order condition on the target rather than a second-order condition (but still using strong convexity assumptions on the source measure). As a general motivation for relaxing regularity assumptions, we mention that reversing diffusion processes to construct transport maps has recently gained a lot of traction in the machine learning community [18, 19], where such constructions are used to generate samples from unknown distributions.

The study of the Langevin transport map led to new results that were previously unattainable for the optimal transport map. However, all results mentioned above are limited to measures on Euclidean spaces, while the question of Lipschitz transport maps is also interesting for non-Euclidean geometries. Indeed, for Riemannian manifolds there are some motivating questions of E. Milman [8] and of Beck and Jerison [20]. In light of this, an additional goal in this work is to establish sufficient conditions on weighted Riemannian manifolds to ensure the existence of Lipschitz transport maps. While the existence of a Lipschitz optimal transport map can be established in some manifold settings [21, Theorem 3.1], the bounds are dimension-dependent and the necessary assumptions are highly restrictive. In contrast, under the first-order condition described above, our results yield, for the first time, a construction of globally Lipschitz transport maps with explicit dimension-free bounds in a general manifold setting.

1.1 Main results

As explained above, our results concern Lipschitz estimates for the Langevin transport map between a source measure (on \({\mathbb {R}}^n\) or on a Riemannian manifold) and a target measure on the same space, whose density with respect to the original measure is log-Lipschitz.

1.1.1 Euclidean spaces

Our main result in the Euclidean setting, which shall be proved in Sect. 3, is

Theorem 1

Let \(\kappa >0\) and \(K\ge 0\), and suppose that \(V:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is a thrice-differentiable function such that

$$\begin{aligned} \nabla ^2 V(x) \succeq \kappa \textrm{I}\quad \forall ~x\in {\mathbb {R}}^n \quad \text {and} \quad |\nabla ^3V(x)(u,u)|\le K \quad \forall ~x\in {\mathbb {R}}^n,~u\in {\mathbb {S}}^{n-1}. \end{aligned}$$

Let \(d\mu =e^{-V}dx\) and \(d\nu =e^{-(V+W)}dx\) with

$$\begin{aligned} |\nabla W(x)|\le L \quad \forall ~ x\in {\mathbb {R}}^n. \end{aligned}$$

Then, the Langevin transport map between \(\mu \) and \(\nu \) is Lipschitz with constant \(\exp \left( 10\left[ \frac{L}{\sqrt{\kappa }}+\frac{L^2}{\kappa }+\frac{LK}{\kappa ^2}\right] \right) \).

Remark 1

The log-Lipschitz assumption makes Theorem 1 applicable to measures \(\nu \) whose relative density with respect to \(\mu \) is Lipschitz and strictly bounded from below. Indeed, if \(|\nabla e^{-W}|\le C\) and \(e^{-W}>c>0\), then \(|\nabla W|\le L:=\frac{C}{c}\).

The dependence on L in Theorem 1 is sharp, as can be seen from the standard Gaussian. Indeed, if \(\mu \) is the standard Gaussian measure on \({\mathbb {R}}\), then \(\kappa =1\) and \(K=0\). We now choose \(W(x)=-L|x|+\log Z\), where Z is a normalizing constant, and suppose that there is an M-Lipschitz transport map from \(\mu \) to \(\nu \). Then, as in [6, 11], the Gaussian isoperimetric inequality can be transported to \(\nu \), up to a factor of \(\frac{1}{M}\). More precisely, given a set \(A\subset {\mathbb {R}}\) satisfying \(\nu (A)=\frac{1}{2}\), we have that \(\nu ^+(A)\ge \frac{1}{\sqrt{2\pi }M}\) where \(\nu ^+(A):=\lim _{\varepsilon \downarrow 0}\frac{\nu (A^{\varepsilon })-\nu (A)}{\varepsilon }\) with \(A^{\varepsilon }\) standing for the \(\varepsilon \)-neighborhood of A. We now take \(A=[0,\infty )\) and note that \(\nu (A)=\frac{1}{2}\), by symmetry, and that \(\nu ^+(A)=\frac{d\nu }{dx}(0)\). We have

$$\begin{aligned} Z=\int e^{L|x|}d\mu \ge \int e^{-Lx}d\mu =e^{\frac{L^2}{2}} \end{aligned}$$

so

$$\begin{aligned} \frac{d\nu }{dx}(0) = e^{-W(0)}\frac{d\mu }{dx}(0) =\frac{1}{Z}\frac{1}{\sqrt{2\pi }} \le \frac{e^{-\frac{L^2}{2}}}{\sqrt{2\pi }}. \end{aligned}$$

It follows that \(M\ge e^{\frac{L^2}{2}}\).
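The computation above can also be verified numerically. The following is our own sanity check (not part of the argument), approximating \(Z\) by a midpoint-rule quadrature for an illustrative value of L:

```python
import math

# Sanity check of the sharpness computation: for mu the standard Gaussian
# and Z = int e^{L|x|} dmu, verify Z >= e^{L^2/2}, and hence
# dnu/dx(0) = (1/Z)(1/sqrt(2*pi)) <= e^{-L^2/2}/sqrt(2*pi).
L = 1.5                                    # illustrative choice
h, half_width = 1e-3, 12.0                 # step and truncation of the line
n = int(2 * half_width / h)
xs = [-half_width + (i + 0.5) * h for i in range(n)]   # midpoint rule
gauss = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
Z = sum(math.exp(L * abs(x)) * gauss(x) for x in xs) * h
assert Z >= math.exp(L * L / 2)
assert gauss(0) / Z <= math.exp(-L * L / 2) / math.sqrt(2 * math.pi)
```

(In fact \(Z=2e^{L^2/2}\Phi (L)\) in closed form, which exceeds \(e^{L^2/2}\) since \(\Phi (L)>\frac{1}{2}\).)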

1.1.2 Riemannian manifolds

We now discuss the manifold setting. Our first result applies to the uniform measure on the round sphere \({\mathbb {S}}^{n-1}\), with the radius scaled as \(\sqrt{n-2}\). In this case, we obtain a sharp estimate, similar to Theorem 1.

Theorem 2

Let \(n \ge 3\). Then, the Langevin transport map from the uniform measure on \({\mathbb {S}}^{n-1}(\sqrt{n-2})\) onto a measure with L-log-Lipschitz density is Lipschitz with constant \(\exp \left( 35(L+L^2)\right) \).

The scaling of the radius in \(\sqrt{n}\) is essentially sharp for dimension-free behavior, as is the behavior \(\exp (L + L^2)\) of the Lipschitz constant. Indeed, as n goes to infinity, the distribution of a single coordinate converges to a standard Gaussian measure, so in the limit we should recover the same type of estimates as in the Gaussian case, which is indeed the case for our estimate. When compared to Theorem 1, one may see that the convexity term \(\kappa \) is replaced by the Ricci curvature of the sphere, which is of order 1 when the diameter scales like \(\sqrt{n}\). The other difference is the absence of the regularity parameter K from Theorem 2; indeed, the source measure is uniform, so its density is constant.

In Sect. 5 we shall consider a more general setting of weighted Riemannian manifolds and our bounds will be given both in terms of the regularity of the density and the curvature of the manifold, as in Theorem 2. In particular, our results shall apply to manifolds that satisfy the Bakry-Émery curvature-dimension condition, see [22, Chapter 1.6], as well as some extra conditions. We state here an informal version of the theorem and defer the exact statement and definitions to Theorem 5 in Sect. 5.

Theorem 3

Let \((M,g,\mu )\) be a weighted Riemannian manifold with \(\mu = e^{-V}d\textrm{Vol},\) and let \(\nu = e^{-W}d\mu \), with W an L-Lipschitz function. Then, under appropriate assumptions on the curvature of M and on V, the Langevin transport map from \(\mu \) onto \(\nu \) is Lipschitz with constant \(e^{e^{cL^2}}\) for large L, where c can be made explicit, and depends on the curvature and on V, but not on the dimension.

The estimate of Theorem 3 is still dimension-free, although significantly worse than what we derived in Theorem 1 and Theorem 2. We believe that the estimate of Theorem 3 is suboptimal and that the double exponential can be omitted, as in the sphere.

As far as we know, these are the first results establishing Lipschitz transport maps with dimension-free behavior (under appropriate scaling) in the Riemannian setting. The existence of such maps paves the way to several applications of interest. In Sect. 7, we prove new functional inequalities, both in the Euclidean and Riemannian settings, and we shall also discuss open problems on optimal transport maps.

1.1.3 The inverse map

In all of the examples above, we have taken a well-conditioned weighted manifold that satisfies some desirable combination of convexity and curvature assumptions. In these cases, we showed that log-Lipschitz perturbations can be realized as push-forward by Lipschitz mappings. We also investigate the analogous question in the reverse direction. Namely, we show that there exists a Lipschitz transport map from the perturbation, which is often not as well-behaved, back to the source measure.

Theorem 4

In all of the above settings, the inverses of the Langevin transport maps are Lipschitz with dimension-free constants.

The Lipschitz constants of the inverse maps turn out to be comparable to those of the Langevin transport maps themselves. In Sect. 6 we give the exact dependence of the Lipschitz constants on all parameters. Theorem 4 has an additional interesting consequence: since Lipschitz maps can be composed, the theorem implies the existence of Lipschitz maps from one log-Lipschitz perturbation onto any other.

2 Preliminaries

Before explaining the construction of the Langevin transport map, we introduce the Langevin dynamics on \({\mathbb {R}}^n\). Let \(d\mu =e^{-V}dx\) be a probability measure on \({\mathbb {R}}^n\) with \(V: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) twice-differentiable. The Langevin dynamics, associated to \(\mu \), starting at \(x\in {\mathbb {R}}^n\) is the solution of the stochastic differential equation

$$\begin{aligned} dX_t^x=-\nabla V(X_t^x)dt+\sqrt{2}d\omega _t,\quad X_0^x=x, \end{aligned}$$
(1)

where \((\omega _t)\) is a standard Brownian motion in \({\mathbb {R}}^n\). We denote by \((P_t)\) the Langevin semigroup,

$$\begin{aligned} P_tf(x):= {\mathbb {E}}[f(X_t^x)], \end{aligned}$$

for any \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) for which the above is defined. It is well known that under appropriate regularity assumptions \((P_t)\) is ergodic. Consequently, for any \(x\in {\mathbb {R}}^n\), the law of \(X_t^x\) converges weakly to \(\mu \) as \(t\rightarrow \infty \).
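As a concrete illustration (our own toy sketch, not from the paper), one may discretize (1) by the Euler–Maruyama scheme. With the hypothetical choice \(V(x)=x^2/2\), the dynamics is the Ornstein–Uhlenbeck process, whose invariant measure is the standard Gaussian, and the law of \(X_t^x\) indeed approaches \(\mu \):

```python
import numpy as np

# Euler-Maruyama discretization of the Langevin SDE (1) with V(x) = x^2/2,
# i.e. an Ornstein-Uhlenbeck process whose invariant measure is N(0, 1).
rng = np.random.default_rng(0)
n_paths, dt, T = 20000, 0.01, 5.0
X = np.full(n_paths, 3.0)                      # all paths start at x = 3
for _ in range(int(T / dt)):
    X += -X * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)
# Ergodicity: Law(X_T) is close to the standard Gaussian for large T
# (up to Monte Carlo noise and the O(dt) bias of the scheme).
assert abs(X.mean()) < 0.1 and abs(X.var() - 1.0) < 0.1
```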

The definition of the Langevin dynamics can readily be extended to Riemannian manifolds (see [22, 23] for in-depth discussions). Let (Mg) be a complete n-dimensional Riemannian manifold, and let O(M) be its orthonormal frame bundle. Let \((B_t)\) be a standard Brownian motion in \({\mathbb {R}}^n\) and let \((\Phi _t)\) be the horizontal Brownian motion on O(M) defined by

$$\begin{aligned} d\Phi _t=\sum _{i=1}^nH^i(\Phi _t)\circ dB_t, \end{aligned}$$

where \(\{H^i\}_{i\in [n]}\) are the horizontal lifts of the standard basis \(\{e_i\}_{i\in [n]}\), and \(\circ \) stands for the Stratonovich integral. We assume that \((\Phi _t)\) does not explode in finite time. The process \((\omega _t)\) given by \(\omega _t=\pi \Phi _t\), where \(\pi :O(M)\rightarrow M\) is the canonical projection, is what we refer to as the Brownian motion on M.

Now, as in the Euclidean case, given a probability measure \(d\mu =e^{-V}d\textrm{Vol}\) on M, where \(d\textrm{Vol}\) is the volume measure on M and \(V:M\rightarrow {\mathbb {R}}\) is twice-differentiable, we define the Langevin dynamics starting at \(x\in M\) as the solution of the stochastic differential equation

$$\begin{aligned} dX_t^x=-\nabla _M V(X_t^x)dt+\sqrt{2}\circ d\omega _t,\quad X_0^x=x, \end{aligned}$$
(2)

where \(\nabla _M\) is the gradient in M. Note that the use of the Stratonovich integral instead of the Itô integral is immaterial in our setting because the coefficient of the Brownian motion is constant.

2.1 The Langevin transport map

We now briefly explain the construction of the Langevin transport map. The reader is referred to [14] for further details on the constructions introduced in this section. Let \(d\nu =e^{-W}d\mu \) be a probability measure on M and consider the Langevin dynamics, associated to \(\mu \), and starting at \(\nu \),

$$\begin{aligned} X_t:=X_t^x \ \ \text {with} \ \ x\sim \nu . \end{aligned}$$
(3)

The flow of probability measures \((\rho _t):=(\text {Law}(X_t))\) forms an interpolation between \(\rho _0=\nu \) and \(\rho _{\infty }=\mu \). In particular, it satisfies the transport equation

$$\begin{aligned} \partial _t\rho _t=-\nabla _M \cdot (\rho _t\nabla _M\log P_t e^{-W}), \end{aligned}$$

with \(\nabla _M\) standing for the gradient in M. We define a flow of diffeomorphisms \(\{S_t\}_{t \ge 0}\) as a solution to the following integral curve equation

$$\begin{aligned} \partial _t S_t=-\nabla _M\log P_t e^{-W}(S_t),\quad S_0(x)=x ~~\forall x\in M. \end{aligned}$$
(4)

It turns out that for any \(t \ge 0\), \(S_t\) transports \(\nu \) to \(\rho _t\). Setting \(T_t:=S_t^{-1}\), and letting

$$\begin{aligned} T:=\lim _{t\rightarrow \infty } T_t, \end{aligned}$$

we see that T transports \(\mu \) to \(\nu \). This is the Langevin transport map between \(\mu \) and \(\nu \). The form of the equation in (4) is particularly amenable to establishing Lipschitz properties of T from estimates of \(\nabla _M^2\log P_te^{-W}\), where \(\nabla _M^2\) is the Hessian in M, since differentiating (4) yields

$$\begin{aligned} \partial _t \nabla _MS_t=-\nabla _M^2\log P_t e^{-W}(S_t)\nabla _MS_t,\quad \nabla _MS_0=\textrm{I}. \end{aligned}$$

This is the content of the following proposition. The proof is an adaptation of [14, Lemma 3] and [15, Lemma 3.2], which only consider Euclidean spaces.

Proposition 1

If for all \(t\ge 0\),

$$\begin{aligned} \nabla _M^2\log P_te^{-W}\preceq \theta _t g, \end{aligned}$$

then T is \(\exp \left( \int _0^{\infty }\theta _tdt\right) \)-Lipschitz.

Proof

Consider a time-dependent family of smooth functions \(F_t: M \longrightarrow {\mathbb {R}}\) on \((M,g)\), such that for any \(t > 0\) we have

$$\begin{aligned} \nabla _M^2 F_t \ge -\theta _t g. \end{aligned}$$

As is classical (e.g., [25, Proposition 16.2 (iv’)]), this is equivalent to saying that for any \(t > 0\), \(x,y \in M\) and constant speed geodesic \(\alpha : [0, 1] \longrightarrow M\) connecting x to y, we have

$$\begin{aligned} \langle {\dot{\alpha }}(1), \nabla _M F_t(y) \rangle - \langle {\dot{\alpha }}(0), \nabla _M F_t(x)\rangle \ge -\theta _t d(x,y)^2. \end{aligned}$$

Therefore, if \(x_t\) and \(y_t\) are time-dependent gradient flows of \(F_t\) with different initial data, and \(\alpha ^t\) is a constant-speed geodesic connecting \(x_t\) to \(y_t\), we have (with d standing for the distance induced by the Riemannian metric),

$$\begin{aligned} \frac{d}{dt}d(x_{t},y_{t})^2&= 2\left( \langle {\dot{y}}_{t}, {\dot{\alpha }}^{t}(1) \rangle -\langle {\dot{x}}_{t}, {\dot{\alpha }}^{t}(0)\rangle \right) \\&= 2\left( \langle \nabla _{M} F_{t}(y_{t}), {\dot{\alpha }}^{t}(1) \rangle -\langle \nabla _{M} F_{t}(x_{t}),{\dot{\alpha }}^{t}(0)\rangle \right) \\&\ge -2\theta _{t} d(x_{t},y_{t})^2. \end{aligned}$$

The rest of the proof proceeds as in the proof of [15, Lemma 3.2], with the manifold setting making no difference. \(\square \)
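In the simplest Gaussian case, both the construction and Proposition 1 can be made completely explicit. The following sketch is our own toy example (not from the paper): take \(\mu ={\mathcal {N}}(0,1)\) and \(\nu ={\mathcal {N}}(m,1)\), so that \(e^{-W(x)}\propto e^{mx}\); for the Ornstein–Uhlenbeck semigroup \(P_te^{mx}\propto e^{mxe^{-t}}\), hence \(\nabla \log P_te^{-W}=me^{-t}\), and integrating (4) gives \(S_{\infty }(x)=x-m\), so that \(T(x)=x+m\) is the (1-Lipschitz) translation transporting \(\mu \) to \(\nu \):

```python
import math

# Toy case mu = N(0,1), nu = N(m,1): grad log P_t e^{-W}(x) = m e^{-t} is
# position-independent, so the flow equation (4) reads dS_t/dt = -m e^{-t}.
m, dt, T_end = 2.0, 1e-3, 20.0
x0, S, t = 0.7, 0.7, 0.0
while t < T_end:
    S += -m * math.exp(-t) * dt        # Euler step for (4)
    t += dt
# S_infinity(x) = x - m, hence T = S_infinity^{-1} : x -> x + m.
assert abs(S - (x0 - m)) < 1e-2
```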

As is clear from Proposition 1, our goal will be to estimate the Hessian along the Langevin dynamics. Such estimates have previously been studied in the literature, in particular by Elworthy and Li [26]. We also mention [27], which contains a pedagogical exposition of second derivative computations and estimates in the manifold setting. For the general manifold setting, our results will be based on the recent work of Cheng, Thalmaier, and Wang [28]. However, for the Euclidean and spherical settings, our sharp results require new Hessian estimates, which we shall provide. All of the above methods are based on Bismut’s integration by parts formula [29].

3 Euclidean spaces

This section is devoted to the proof of Theorem 1. For the rest of this section, we fix two probability measures \(\mu = e^{-V(x)}dx\) and \(\nu =e^{-W(x)}d\mu \) on \({\mathbb {R}}^n\). We require that \(\mu \), the source, satisfies the assumptions of Theorem 1,

$$\begin{aligned} \nabla ^2 V(x) \succeq \kappa \textrm{I}\quad \forall ~x\in {\mathbb {R}}^n \quad \text {and} \quad |\nabla ^3V(x)(u,u)|\le K \quad \forall ~x\in {\mathbb {R}}^n,~u\in {\mathbb {S}}^{n-1}, \end{aligned}$$

for some \(\kappa > 0\) and \(K \ge 0\). We also require that \(\nu \), the target, is an L-log-Lipschitz perturbation of \(\mu \),

$$\begin{aligned} |\nabla W(x)|\le L \quad \forall ~ x\in {\mathbb {R}}^n. \end{aligned}$$

Further, as in Sect. 2, we let \(X_t\) stand for the Langevin dynamics associated to \(\mu \), with \(X_0 \sim \nu \), and we let \(P_t\) be its semigroup. As suggested by Proposition 1, we shall require the following global bound on \(\nabla ^2\log P_t e^{-W(x)}\).

Proposition 2

For every \(t\ge 0\) and \(x \in {\mathbb {R}}^n\),

$$\begin{aligned} \nabla ^2\log P_t e^{-W(x)}\preceq Le^{-\kappa t}\left[ 5L+\frac{5}{\sqrt{t}}+\frac{Kt}{2}\right] \textrm{I}. \end{aligned}$$

The proof of Proposition 2 will be the main focus of this section. Before delving into the proof, let us show how Theorem 1 follows.

Proof of Theorem 1

Using

$$\begin{aligned} \int _0^{\infty }e^{-\kappa t} dt=\frac{1}{\kappa }, \quad \quad \int _0^{\infty } \frac{e^{-\kappa t}}{\sqrt{t}}dt=\sqrt{\frac{\pi }{\kappa }}, \quad \quad \int _0^{\infty } t e^{-\kappa t} dt=\frac{1}{\kappa ^2}, \end{aligned}$$

the combination of Proposition 1 and Proposition 2 yields that T, the Langevin transport map, is Lipschitz with constant \(\exp \left( \frac{5\,L^2}{\kappa }+\frac{5\sqrt{\pi }L}{\sqrt{\kappa }}+\frac{LK}{2\kappa ^2}\right) \le \exp \left( 10\left[ \frac{L}{\sqrt{\kappa }}+\frac{L^2}{\kappa }+\frac{LK}{\kappa ^2}\right] \right) \). \(\square \)
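The three integrals used above are standard; as a quick numerical confirmation (our own check, for an arbitrary illustrative value of \(\kappa \)), one may compare them against a quadrature approximation:

```python
import numpy as np

# Midpoint-rule check of the three integrals in the proof, for kappa = 2.
kappa, h = 2.0, 1e-5
t = np.arange(h / 2, 40.0 / kappa, h)              # integrands decay fast
I1 = np.sum(np.exp(-kappa * t)) * h                # = 1/kappa
I2 = np.sum(np.exp(-kappa * t) / np.sqrt(t)) * h   # = sqrt(pi/kappa)
I3 = np.sum(t * np.exp(-kappa * t)) * h            # = 1/kappa^2
assert abs(I1 - 1 / kappa) < 1e-4
assert abs(I2 - np.sqrt(np.pi / kappa)) < 5e-3     # singular at 0, cruder
assert abs(I3 - 1 / kappa**2) < 1e-4
```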

3.1 Stochastic representation of the Hessian

In order to establish Lipschitz properties of the Langevin transport map, our main task will be to derive global bounds on \(\nabla ^2\log P_te^{-W}\). Toward this goal, our main technical tool is a stochastic representation of \(\nabla ^2P_t\), known as Bismut’s formula. Before stating the formula, we first define the Jacobi flow of the Langevin dynamics as \((\nabla X_t^x)_{t\ge 0}\), where \(\nabla X_t^x:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) is the derivative of (the random function) \(X^x_t\) with respect to the initial condition x,

$$\begin{aligned} \nabla _u X_t^x:=\lim _{\varepsilon \downarrow 0}\frac{X_t^{x+\varepsilon u}-X_t^x}{\varepsilon }\in {\mathbb {R}}^n, \quad \forall u\in {\mathbb {R}}^n. \end{aligned}$$

The Jacobi flow \((\nabla X_t^x)_{t\ge 0}\) appears naturally when applying the chain rule in the differentiation of \( P_tf(x)\) with respect to x, formally,

$$\begin{aligned} \nabla P_tf(x)=\nabla {\mathbb {E}}[f(X_t^x)]={\mathbb {E}}[\nabla f(X_t^x)\nabla X_t^x]. \end{aligned}$$

The process \((\nabla X_t^x)\) satisfies the differential equation

$$\begin{aligned} \partial _t\nabla X_t^x=-\nabla ^2V(X_t^x)\nabla X_t^x, \quad \nabla X_0^x=\textrm{I}, \end{aligned}$$
(5)

which can be seen by differentiating (1) and noting that \((\omega _t)\) does not depend on x. Since we require Hessian estimates of \(P_tf\), we also consider the second variation of \((X_t^x)\),

$$\begin{aligned} \nabla _{u,v}^2X_t^x:=\lim _{\varepsilon \downarrow 0}\frac{\nabla _v X_t^{x+\varepsilon u}-\nabla _v X_t^x}{\varepsilon }\in {\mathbb {R}}^n, \quad \forall u,v\in {\mathbb {R}}^n \end{aligned}$$

(which is symmetric with respect to u and v). The equation for the process \((\nabla ^2X_t^x)\) can be obtained by differentiating (5),

$$\begin{aligned} \partial _t\nabla _{u,v}^2 X_t^x=-\nabla ^3V(X_t^x)(\nabla _u X_t^x,\nabla _v X_t^x)-\nabla ^2V(X_t^x)\nabla _{u,v}^2 X_t^x, \quad \nabla ^2 X_0^x=0, \end{aligned}$$
(6)

where, for \(x\in {\mathbb {R}}^n\) and \(u,v\in {\mathbb {R}}^n\),

$$\begin{aligned} [\nabla ^3V(x)(u,v)]_i=\sum _{j,k=1}^n\partial ^3_{ijk}V(x)u_jv_k=\nabla _u\nabla _v(\nabla V)_i(x)\quad \forall ~i\in [n]. \end{aligned}$$

With these definitions, and the following notation,

$$\begin{aligned} \nabla _vf(x):= \langle \nabla f(x), v\rangle \quad \text { and } \quad \nabla ^2_{v,u}f(x) = \langle v,\nabla ^2f(x)u\rangle \end{aligned}$$

(the definition is extended naturally for vector-valued functions), the stochastic formula now reads:

Lemma 1

([26, p. 8 in arXiv version]) For any \(x\in {\mathbb {R}}^n\), \(u,v\in {\mathbb {R}}^n\), and \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) twice-differentiable,

$$\begin{aligned} \nabla _{u,v}^2P_t f(x)&=\frac{1}{t\sqrt{2}}{\mathbb {E}}\left[ \nabla _v [f(X_t^x)] M_t^{x,u}\right] +\frac{1}{t}\int _0^t {\mathbb {E}}\left[ \langle \nabla P_{t-s}f(X_s^x),\nabla _{u,v}^2X_s^x\rangle \right] ds \end{aligned}$$

where

$$\begin{aligned} M_t^{x,u}:=\int _0^t\langle \nabla _u X_s^x, d\omega _s\rangle \quad \forall ~x\in {\mathbb {R}}^n,~u\in {\mathbb {R}}^n. \end{aligned}$$

In our setting, \(f=e^{-W}\) and we need to bound the various quantities appearing in Lemma 1. The first derivative of \(P_te^{-W}\) can be controlled by the log-Lipschitz assumption. The first variation \(\nabla X^x_t\) is controlled by the \(\kappa \)-log-concavity assumption, and the second variation \(\nabla ^2 X^x_t\) is controlled by the \(\kappa \)-log-concavity assumption and the bound on the third derivative of V.

3.2 Estimates

The purpose of this section is to provide the requisite estimates for the proof of Proposition 2. We start with the first variation.

Lemma 2

(First variation estimate) Suppose that \(\nabla ^2 V(x) \succeq \kappa \textrm{I}\) for every \(x\in {\mathbb {R}}^n\). Then, a.s., for any fixed \(v\in {\mathbb {S}}^{n-1}\),

$$\begin{aligned} |\nabla _vX_t^x|\le e^{-\kappa t}\quad \forall ~t\ge 0. \end{aligned}$$

Consequently, for any differentiable \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\),

$$\begin{aligned} |\nabla [f(X_t^x)]|\le e^{-\kappa t} |\nabla f(X_t^x)|. \end{aligned}$$

Proof

By the Cauchy-Schwarz inequality, for any \(v\in {\mathbb {R}}^n\),

$$\begin{aligned} |\nabla _v[f(X_t^x)]|=|\langle \nabla f(X_t^x),\nabla _vX_t^x\rangle |\le |\nabla f(X_t^x)||\nabla _vX_t^x|. \end{aligned}$$

Hence, taking the supremum over \(v\in {\mathbb {S}}^{n-1}\) it suffices to show that \(|\nabla _vX_t^x|\le e^{-\kappa t}\). To see the latter bound fix \(v\in {\mathbb {S}}^{n-1}\) and let \(\beta _v:{\mathbb {R}}_{\ge 0}\rightarrow {\mathbb {R}}_{\ge 0}\) be given by \(\beta _v(t):=|\nabla _vX_t^x|\). Then, by (5) and the convexity assumption on V,

$$\begin{aligned} \partial _t \beta _v(t)=\frac{\langle \nabla _vX_t^x, \partial _t\nabla _vX_t^x\rangle }{\beta _v(t)} =-\frac{\langle \nabla _vX_t^x,\nabla ^2V(X_t^x)\nabla _vX_t^x \rangle }{\beta _v(t)}\le -\kappa \frac{|\nabla _vX_t^x|^2}{\beta _v(t)}=-\kappa \beta _v(t). \end{aligned}$$

Since \(\beta _v(0)=1\), Grönwall’s inequality yields \(|\nabla _vX_t^x|\le e^{-\kappa t}\). \(\square \)
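The contraction expressed by Lemma 2 can be observed directly on a discretized example. The following sketch is our own illustration, with the hypothetical choice \(V(x)=x^2/2+0.1\sin x\) (so that \(V''\ge \kappa =0.9\)); two Langevin paths driven by the same Brownian increments serve as a finite-difference proxy for \(\nabla _vX_t^x\):

```python
import numpy as np

# Two Euler-Maruyama paths of (1) with common noise; their difference is a
# finite-difference proxy for nabla_v X_t. Here V''(x) = 1 - 0.1 sin(x) >= 0.9.
rng = np.random.default_rng(1)
gradV = lambda x: x + 0.1 * np.cos(x)
kappa, dt, T = 0.9, 1e-3, 3.0
X, Y, t = 1.0, 1.0 + 1e-3, 0.0              # nearby initial conditions
while t < T:
    dw = np.sqrt(2 * dt) * rng.standard_normal()
    X, Y = X - gradV(X) * dt + dw, Y - gradV(Y) * dt + dw
    t += dt
# Lemma 2 predicts |X_t - Y_t| <= e^{-kappa t} |X_0 - Y_0| (up to Euler error).
assert abs(X - Y) <= 1e-3 * np.exp(-kappa * T) * 1.05
```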

We now move to the second variation.

Lemma 3

(Second variation estimate) Suppose that

$$\begin{aligned} \nabla ^2 V(x) \succeq \kappa \textrm{I}\quad \forall ~x\in {\mathbb {R}}^n \quad \text {and} \quad |\nabla ^3V(x)(u,u)|\le K \quad \forall ~x\in {\mathbb {R}}^n,~u\in {\mathbb {S}}^{n-1}. \end{aligned}$$

Then, a.s., for any \(t\ge 0\) and \(v\in {\mathbb {S}}^{n-1}\),

$$\begin{aligned} |\nabla _{v,v}^2X_t^x|\le K t e^{-\kappa t}. \end{aligned}$$

Proof

Fix \(v\in {\mathbb {S}}^{n-1}\) and let \(\beta _v:{\mathbb {R}}_{\ge 0}\rightarrow {\mathbb {R}}_{\ge 0}\) be given by \(\beta _v(t):=|\nabla _{v,v}^2X_t^x|\). By (6) and Lemma 2,

$$\begin{aligned} \partial _t\beta _v(t)&=\frac{\langle \nabla _{v,v}^2 X_t^x, \partial _t\nabla _{v,v}^2 X_t^x\rangle }{\beta _v(t)}\\&=\frac{\langle \nabla _{v,v}^2 X_t^x,-\nabla ^3V(X_t^x)(\nabla _v X_t^x,\nabla _v X_t^x)-\nabla ^2V(X_t^x)\nabla _{v,v}^2 X_t^x\rangle }{\beta _v(t)}\\&\le K \frac{|\nabla _{v,v}^2 X_t^x||\nabla _v X_t^x|}{\beta _v(t)}-\kappa \frac{|\nabla _{v,v}^2 X_t^x|^2}{\beta _v(t)}=K|\nabla _v X_t^x| -\kappa \beta _v(t)\le Ke^{-\kappa t} -\kappa \beta _v(t). \end{aligned}$$

The solution to the differential equation

$$\begin{aligned} \partial _t\xi (t)=Ke^{-\kappa t} -\kappa \xi (t),\quad \xi (0)=0 \end{aligned}$$

is \(\xi (t)=Kte^{-\kappa t}\) so by Grönwall’s inequality

$$\begin{aligned} |\nabla _{v,v}^2X_t^x|=\beta _v(t)\le K t e^{-\kappa t}. \end{aligned}$$

\(\square \)
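Both variation estimates can be checked pathwise on the same toy potential as above (our own sketch, with the hypothetical choice \(V(x)=x^2/2+0.1\sin x\), for which \(\kappa =0.9\) and \(K=0.1\)), by integrating the variational equations (5) and (6) along one realization of the noise:

```python
import numpy as np

# Pathwise Euler integration of (1) together with the variational
# equations (5) and (6), for V(x) = x^2/2 + 0.1 sin(x):
# V''(x) = 1 - 0.1 sin(x) >= kappa = 0.9 and |V'''(x)| <= K = 0.1.
rng = np.random.default_rng(3)
gradV = lambda x: x + 0.1 * np.cos(x)
d2V = lambda x: 1.0 - 0.1 * np.sin(x)
d3V = lambda x: -0.1 * np.cos(x)
kappa, K, dt, T = 0.9, 0.1, 1e-3, 3.0
X, a, b, t = 0.5, 1.0, 0.0, 0.0     # a ~ nabla_v X_t, b ~ nabla^2_{v,v} X_t
while t < T:
    dw = np.sqrt(2 * dt) * rng.standard_normal()
    X, a, b = (X - gradV(X) * dt + dw,
               a - d2V(X) * a * dt,
               b - (d3V(X) * a * a + d2V(X) * b) * dt)
    t += dt
assert abs(a) <= np.exp(-kappa * T) * 1.05            # Lemma 2
assert abs(b) <= K * T * np.exp(-kappa * T) * 1.05    # Lemma 3
```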

We will also use a reverse Hölder inequality for log-Lipschitz functions under a \(\kappa \)-log-concavity assumption:

Lemma 4

Let \(d\mu =e^{-V}dx\) and suppose that \(\nabla ^2 V(x) \succeq \kappa \textrm{I}\) for every \(x\in {\mathbb {R}}^n\). Then, for every L-log-Lipschitz function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\),

$$\begin{aligned} P_t(f^2)\le \exp \left( L^2\frac{1-e^{-2\kappa t}}{\kappa }\right) (P_tf)^2. \end{aligned}$$

Proof

It is well-known (e.g., [30, Eq. (1.3)]) that if a measure \(\eta \) satisfies a log-Sobolev inequality with constant C then it satisfies a reverse Hölder inequality for L-log-Lipschitz functions,

$$\begin{aligned} \Vert f\Vert _{L^p(\eta )}\le \exp \left( \frac{C}{2}L^2(p-q)\right) \Vert f\Vert _{L^q(\eta )}\quad \forall ~0\le q<p. \end{aligned}$$

Since the measure \(\eta :=P_t^*\delta _x\) satisfies a log-Sobolev inequality with constant \(C=\frac{1-e^{-2\kappa t}}{2\kappa }\) [22, Eq. (5.5.5)], and since

$$\begin{aligned} P_t(f^2)(x)=\Vert f\Vert _{L^2(\eta )}^2\quad \text {and}\quad (P_tf)^2=\Vert f\Vert _{L^1(\eta )}^2, \end{aligned}$$

we get

$$\begin{aligned} P_t(f^2)\le \exp \left( \frac{1}{4}L^2\frac{1-e^{-2\kappa t}}{\kappa }\right) (P_tf)^2\le \exp \left( L^2\frac{1-e^{-2\kappa t}}{\kappa }\right) (P_tf)^2. \end{aligned}$$

\(\square \)
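In the Gaussian case \(\kappa =1\), the bound of Lemma 4 is in fact attained, as the following closed-form check shows (our own worked example, with the illustrative choice \(f(x)=e^{Lx}\), which is L-log-Lipschitz):

```python
import math

# For the OU semigroup (V(x) = x^2/2) and f(x) = exp(L x):
#   P_t f(x)   = exp(L x e^{-t} + L^2 s / 2),  s = 1 - e^{-2t},
#   P_t f^2(x) = exp(2 L x e^{-t} + 2 L^2 s),
# so P_t(f^2)/(P_t f)^2 = exp(L^2 s), saturating Lemma 4 with kappa = 1.
L, t, x = 0.8, 0.7, 0.3
s = 1.0 - math.exp(-2 * t)
Ptf = math.exp(L * x * math.exp(-t) + L * L * s / 2)
Ptf2 = math.exp(2 * L * x * math.exp(-t) + 2 * L * L * s)
bound = math.exp(L * L * s)          # Lemma 4 with kappa = 1
assert abs(Ptf2 / Ptf**2 - bound) < 1e-12
```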

Remark 2

The statement, and proof, of Lemma 4 can be extended verbatim to the manifold setting by requiring that the measure satisfies the so-called curvature-dimension condition CD\((\kappa , \infty )\). In particular, the statement holds for the uniform measure on the unit sphere \({\mathbb {S}}^{n-1}\), with \(\kappa \) replaced by \(n-2\).

We shall also make use of the following classical lemma about martingales:

Lemma 5

Let \(M_t\) be a continuous martingale, with \(M_0 = 0\), and whose quadratic variation is almost surely bounded, that is, \(\langle M\rangle _t \le \varphi (t)\) for all \(t \ge 0\). Then

$$\begin{aligned} {\mathbb {P}}[|M_t| \ge \delta ] \le 2\exp \left( -\frac{\delta ^2}{2\varphi (t)}\right) . \end{aligned}$$

Proof

From Novikov’s condition, for any \(\sigma \in {\mathbb {R}}\) we have

$$\begin{aligned} {\mathbb {E}}[\exp (\sigma M_t - \sigma ^2\langle M\rangle _t/2)] = 1, \end{aligned}$$

so

$$\begin{aligned} {\mathbb {E}}[\exp (\sigma M_t )] \le \exp (\sigma ^2 \varphi (t)/2). \end{aligned}$$

We then apply the Chernoff bound,

$$\begin{aligned} {\mathbb {P}}[M_t \ge \delta ] \le \inf _{\sigma \ge 0} \exp (-\sigma \delta ){\mathbb {E}}[\exp (\sigma M_t)] \end{aligned}$$

to conclude. \(\square \)
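The tail bound of Lemma 5 is easy to test by Monte Carlo in the simplest case (our own illustration, taking \(M_t\) to be a standard Brownian motion, so that \(\langle M\rangle _t=t\) and one may take \(\varphi (t)=t\)):

```python
import numpy as np

# Monte Carlo check of Lemma 5 with M_t = B_t, <M>_t = t, phi(t) = t.
rng = np.random.default_rng(2)
t, delta, n = 1.0, 1.5, 200000
Mt = np.sqrt(t) * rng.standard_normal(n)     # B_t ~ N(0, t)
empirical = np.mean(np.abs(Mt) >= delta)
assert empirical <= 2 * np.exp(-delta**2 / (2 * t))
```

(Here the Gaussian tail is roughly 0.13 while the sub-Gaussian bound is roughly 0.65, so the inequality holds with room to spare.)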

3.3 Proof of Proposition 2

By Lemma 1,

$$\begin{aligned} \nabla _{u,v}^2P_t f(x)&=\frac{1}{t\sqrt{2}}{\mathbb {E}}\left[ \nabla _v [f(X_t^x)] M_t^{x,u}\right] +\frac{1}{t}\int _0^t {\mathbb {E}}\left[ (\nabla P_{t-s}f(X_s^x))^{\top }\nabla _{u,v}^2X_s^x\right] ds, \end{aligned}$$

and we will apply this identity with \(f=e^{-W}\). We start by analyzing the first term.

Lemma 6

For any \(u,v\in {\mathbb {S}}^{n-1}\),

$$\begin{aligned} \frac{1}{t\sqrt{2}}{\mathbb {E}}\left[ \nabla _v [e^{-W(X_t^x)}] M_t^{x,u}\right] \le 5e^{-\kappa t}\left[ L^2+\frac{L}{\sqrt{t}}\right] P_te^{-W(x)}. \end{aligned}$$

Proof

For any \(\delta \ge 0\) and f differentiable,

$$\begin{aligned}&|{\mathbb {E}}\left[ \nabla _v [f(X_t^x)] M_t^{x,u}\right] |\\ {}&\quad \le {\mathbb {E}}\left[ |\nabla _v [f(X_t^x)]| |M_t^{x,u}|\right] \\&\quad ={\mathbb {E}}\left[ 1_{ |M_t^{x,u}|< \delta }|\nabla _v [f(X_t^x)]| |M_t^{x,u}|\right] +{\mathbb {E}}\left[ 1_{ |M_t^{x,u}|\ge \delta }|\nabla _v [f(X_t^x)]| |M_t^{x,u}|\right] \\&\quad \le \delta {\mathbb {E}}\left[ |\nabla _v [f(X_t^x)]| \right] +{\mathbb {E}}\left[ |\nabla _v [f(X_t^x)]|^2\right] ^{1/2}{\mathbb {E}}\left[ 1_{ |M_t^{x,u}|\ge \delta } |M_t^{x,u}|^2\right] ^{1/2}\\&\quad \le \delta {\mathbb {E}}\left[ |\nabla _v [f(X_t^x)]| \right] +{\mathbb {E}}\left[ |\nabla _v [f(X_t^x)]|^2\right] ^{1/2}{\mathbb {E}}\left[ |M_t^{x,u}|^4\right] ^{1/4}{\mathbb {P}}[ |M_t^{x,u}|\ge \delta ]^{1/4}. \end{aligned}$$

We will now analyze the terms above one-by-one. For \(v\in {\mathbb {S}}^{n-1}\), Lemma 2 yields

$$\begin{aligned} |\nabla _v [f(X_t^x)]|\le |\nabla [f(X_t^x)]| \le e^{-\kappa t} |\nabla f(X_t^x)|= e^{-\kappa t}|f(X_t^x)| |\nabla \log f(X_t^x)|, \end{aligned}$$

so taking \(f=e^{-W}\) yields

$$\begin{aligned} |\nabla _v [e^{-W(X_t^x)}]|\le Le^{-\kappa t}e^{-W(X_t^x)}. \end{aligned}$$
(7)

It follows that

$$\begin{aligned} {\mathbb {E}}[|\nabla _v [e^{-W(X_t^x)}]|]\le L e^{-\kappa t}P_te^{-W(x)}. \end{aligned}$$
(8)

For the second term, (7) gives

$$\begin{aligned} {\mathbb {E}}[|\nabla _v [e^{-W(X_t^x)}]|^2]^{1/2}\le L e^{-\kappa t}\left( P_t\big (e^{-2W}\big )(x)\right) ^{1/2}, \end{aligned}$$

so applying Lemma 4 yields

$$\begin{aligned} {\mathbb {E}}[|\nabla _v [e^{-W(X_t^x)}]|^2]^{1/2}\le L \exp \left( L^2\frac{1-e^{-2\kappa t}}{2\kappa }-\kappa t\right) P_te^{-W(x)}. \end{aligned}$$
(9)

Next we turn to analyze the martingale \((M_t)\). By Lemma 2, \(|\nabla _uX_t^x|\le e^{-\kappa t}\) a.s. for any \(u\in {\mathbb {S}}^{n-1}\) so the quadratic variation of \((M_t)\) satisfies

$$\begin{aligned} \langle M^{x,u}\rangle _t=\int _0^t |\nabla _uX_s^x|^2 ds\le \frac{1-e^{-2\kappa t}}{2\kappa }. \end{aligned}$$
(10)

On the other hand, the Burkholder-Davis-Gundy inequalities yield that, for any \(0<p<\infty \),

$$\begin{aligned} {\mathbb {E}}[|M_t^{x,u}|^p]\le C_p{\mathbb {E}}[\langle M^{x,u}\rangle _t^{p/2}], \end{aligned}$$
(11)

for a constant \(C_p\) depending only on p. Combining (10) and (11), with \(p=4\), and using \(C_4\le 360\) (see [31, proof of Proposition 3.26]), yields

$$\begin{aligned} {\mathbb {E}}[|M_t^{x,u}|^4]^{1/4}\le 5\sqrt{\frac{1-e^{-2\kappa t}}{2\kappa }}. \end{aligned}$$
(12)
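As a numerical sanity check of (12), here is a sketch in which plain Brownian motion plays the role of the martingale (this corresponds to the limit \(\kappa \rightarrow 0\), where the right-hand side of (12) becomes \(5\sqrt{t}\)); the sample size and seed are arbitrary choices:

```python
import math
import random

random.seed(0)

t = 1.0
n_samples = 200_000
# For standard Brownian motion, B_t ~ N(0, t) and E[B_t^4] = 3 t^2.
m4 = sum(random.gauss(0.0, math.sqrt(t)) ** 4 for _ in range(n_samples)) / n_samples

assert abs(m4 - 3.0 * t * t) < 0.15       # Monte Carlo estimate of E[B_t^4]
assert 360.0 ** 0.25 < 5.0                # C_4^{1/4} <= 360^{1/4} < 5
assert m4 ** 0.25 <= 5.0 * math.sqrt(t)   # fourth-moment bound in the spirit of (12)
```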

Moreover, applying Lemma 5, we have

$$\begin{aligned} {\mathbb {P}}[ |M_t^{x,u}|\ge \delta ] \le 2\exp \left( -\delta ^2\frac{\kappa }{1-e^{-2\kappa t}}\right) . \end{aligned}$$

It follows that

$$\begin{aligned} {\mathbb {P}}[ |M_t^{x,u}|\ge \delta ]^{1/4}\le 2^{1/4}\exp \left( -\frac{\delta ^2}{4}\frac{\kappa }{1-e^{-2\kappa t}}\right) . \end{aligned}$$
(13)

Combining (8), (9), (12), (13) yields

$$\begin{aligned}&\frac{1}{t\sqrt{2}}|{\mathbb {E}}\left[ \nabla _v [f(X_t^x)] M_t^{x,u}\right] |\\&\quad \le \frac{1}{t\sqrt{2}}\left[ \delta L e^{-\kappa t}+ L \exp \left( L^2\frac{1-e^{-2\kappa t}}{2\kappa }-\kappa t\right) 5\sqrt{\frac{1-e^{-2\kappa t}}{2\kappa }}2^{1/4}\exp \left( -\frac{\delta ^2}{4}\frac{\kappa }{1-e^{-2\kappa t}}\right) \right] P_te^{-W(x)}\\&\quad = \frac{Le^{-\kappa t}}{t\sqrt{2}}\left[ \delta + 5\cdot 2^{1/4}\sqrt{\frac{1-e^{-2\kappa t}}{2\kappa }}\exp \left( L^2\frac{1-e^{-2\kappa t}}{2\kappa }\right) \exp \left( -\frac{\delta ^2}{8}\frac{2\kappa }{1-e^{-2\kappa t}}\right) \right] P_te^{-W(x)}. \end{aligned}$$

We now choose \(\delta \) so that the exponential factor \(\exp \left( L^2\frac{1-e^{-2\kappa t}}{2\kappa }\right) \) is cancelled. In particular, we take

$$\begin{aligned} \delta :=2\sqrt{2}L\frac{1-e^{-2\kappa t}}{2\kappa } \end{aligned}$$

to get

$$\begin{aligned} \frac{1}{t\sqrt{2}}|{\mathbb {E}}\left[ \nabla _v [f(X_t^x)] M_t^{x,u}\right] |&\le \frac{Le^{-\kappa t}}{t\sqrt{2}}\left[ 2\sqrt{2}L\frac{1-e^{-2\kappa t}}{2\kappa } + 5\cdot 2^{1/4}\sqrt{\frac{1-e^{-2\kappa t}}{2\kappa }}\right] P_te^{-W(x)}\\&\le \frac{Le^{-\kappa t}}{t\sqrt{2}}\left[ 2\sqrt{2}L\frac{2\kappa t}{2\kappa } + 5\cdot 2^{1/4}\sqrt{\frac{2\kappa t}{2\kappa }}\right] P_te^{-W(x)}\\&=\left[ 2L^2e^{-\kappa t} + 5\cdot 2^{-1/2}\cdot 2^{1/4}\frac{Le^{-\kappa t}}{\sqrt{t}}\right] P_te^{-W(x)}\\&\le 5e^{-\kappa t}\left[ L^2+\frac{L}{\sqrt{t}}\right] P_te^{-W(x)}. \end{aligned}$$

\(\square \)
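The choice of \(\delta \) above is precisely the one that cancels the exponential factor. A small numerical check of this algebra (a sketch; the parameter values are arbitrary):

```python
import math

# With A := (1 - e^{-2 kappa t}) / (2 kappa), the choice delta = 2*sqrt(2)*L*A
# makes exp(L^2 A) * exp(-delta^2 / (8 A)) identically equal to 1.
for kappa in (0.5, 1.0, 3.0):
    for t in (0.1, 1.0, 10.0):
        for L in (0.5, 2.0):
            A = (1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa)
            delta = 2.0 * math.sqrt(2.0) * L * A
            # the exponents cancel: L^2 A - delta^2/(8A) = 0
            assert abs(L * L * A - delta ** 2 / (8.0 * A)) < 1e-12
```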

We now analyze the second term.

Lemma 7

$$\begin{aligned} \frac{1}{t}\int _0^t \left| {\mathbb {E}}\left[ \langle \nabla P_{t-s}e^{-W(X_s^x)},\nabla _{u,u}^2X_s^x\rangle \right] \right| ds\le LKt\frac{e^{-\kappa t}}{2}P_te^{-W(x)}. \end{aligned}$$

Proof

By [22, Eq. (5.5.4)], for any \(y\in {\mathbb {R}}^n\),

$$\begin{aligned} |\nabla P_{t-s}e^{-W(y)}|&\le e^{-\kappa (t-s)}P_{t-s}\left( |\nabla e^{-W(y)}|\right) =e^{-\kappa (t-s)}P_{t-s}\left( |\nabla W|e^{-W(y)}\right) \nonumber \\&\le L e^{-\kappa (t-s)} P_{t-s}e^{-W(y)} \end{aligned}$$
(14)

so by (14) and Lemma 3,

$$\begin{aligned}&\frac{1}{t}\int _0^t \left| {\mathbb {E}}\left[ \langle \nabla P_{t-s}e^{-W(X_s^x)},\nabla _{u,u}^2X_s^x\rangle \right] \right| ds\\&\quad \le \frac{1}{t}\int _0^t {\mathbb {E}}\left[ |\nabla P_{t-s}e^{-W(X_s^x)}||\nabla _{u,u}^2X_s^x|\right] ds\\&\quad \le \frac{1}{t} \int _0^t {\mathbb {E}}\left[ e^{-\kappa (t-s)}L (P_{t-s}e^{-W(X_s^x)}) Ks e^{-\kappa s}\right] ds = LK\frac{e^{-\kappa t}}{t}P_te^{-W(x)}\int _0^t s ds\\&\quad = LKt\frac{e^{-\kappa t}}{2}P_te^{-W(x)}. \end{aligned}$$

\(\square \)

We can now complete the proof of Proposition 2. By Lemma 6 and Lemma 7,

$$\begin{aligned} \nabla _{u,u}^2\log P_t e^{-W(x)}&=\frac{\nabla _{u,u}^2P_t e^{-W(x)}}{P_te^{-W(x)}}-|\nabla _u\log P_t e^{-W(x)}|^2\le \frac{\nabla _{u,u}^2P_t e^{-W(x)}}{P_te^{-W(x)}}\\&\le 5e^{-\kappa t}\left[ L^2+\frac{L}{\sqrt{t}}\right] +\frac{LK t}{2} e^{-\kappa t}\\&=Le^{-\kappa t}\left[ 5L+\frac{5}{\sqrt{t}}+\frac{Kt}{2}\right] . \end{aligned}$$

4 The round sphere

This section is devoted to the proof of Theorem 2. Throughout this section, we let \(\mu \) stand for the uniform probability measure on \({\mathbb {S}}^{n-1}\), the unit sphere in \({\mathbb {R}}^n\), and let \(\nu = e^{-W(x)}d\mu \) be an L-log-Lipschitz perturbation of \(\mu \). In other words, if \(\nabla _S\) is the spherical gradient, then for almost every \(x \in {\mathbb {S}}^{n-1}\),

$$\begin{aligned} |\nabla _SW(x)| \le L. \end{aligned}$$

As before, \(X_t^x\) is the Langevin process associated with \(\mu \), with semigroup \((P_t)\). Since \(\mu \) is uniform, \(X_t^x\) takes the particularly simple form \(X_t^x = \omega _t\), where \(\omega _t\) is a standard Brownian motion on \({\mathbb {S}}^{n-1}\) with \(\omega _0 = x\). We also write \(X_t\) when \(X_0 \sim \nu \).

As suggested by Proposition 1, the proof of Theorem 2 will require bounding \(\nabla _S^2\log P_t e^{-W(x)}\). As in the Euclidean setting, this bound will be obtained from a stochastic representation via Bismut’s formula. The main result of this section is the following.

Proposition 3

For every \(t\ge 0\) and \(x \in {\mathbb {S}}^{n-1}\),

$$\begin{aligned} \nabla _S^2\log P_t e^{-W(x)}\preceq 12\left( L + \frac{L^2}{\sqrt{n-2}}\right) e^{-(n-2)t}\left( \frac{1}{\sqrt{t}} + 1\right) g. \end{aligned}$$

Before delving into the proof of Proposition 3, let us show how Theorem 2 follows.

Proof of Theorem 2

Using

$$\begin{aligned} \int _0^{\infty }e^{-(n-2) t} dt=\frac{1}{n-2}, \quad \quad \int _0^{\infty } \frac{e^{-(n-2) t}}{\sqrt{t}}dt=\sqrt{\frac{\pi }{n-2}}, \end{aligned}$$

the combination of Proposition 1 and Proposition 3 yields that T, the Langevin transport map, is Lipschitz with constant \(\exp \left( 12\left( \frac{L}{\sqrt{n-2}} + \frac{L^2}{n-2}\right) \left( \frac{1}{\sqrt{n-2}} + \sqrt{\pi }\right) \right) \le \exp \left( 35\left( \frac{L}{\sqrt{n-2}} + \frac{L^2}{n-2}\right) \right) .\) The result now follows by re-scaling the sphere. \(\square \)

4.1 Stochastic representation of the Hessian

The proof of Proposition 3 will rely on a stochastic representation of the Hessian of the heat semigroup on the sphere, analogous to Lemma 1. Before introducing the particular form of this representation, we recall a few facts about the curvature of the sphere. Let g be the metric on \({\mathbb {S}}^{n-1}\) induced from \({\mathbb {R}}^n\). Given vector fields X, Y, Z, the Riemannian curvature tensor on the sphere is given by

$$\begin{aligned} {\text {Riem}}_S(X,Y)Z=g(Y,Z) X-g( X,Z) Y, \end{aligned}$$
(15)

and its trace, the Ricci curvature, is given by

$$\begin{aligned} {\text {Ric}}_S(Y,Z)=(n-2)g( Y,Z). \end{aligned}$$

For \(t \ge 0\) and \(x \in {\mathbb {S}}^{n-1}\), we use \(//_t: T_x{\mathbb {S}}^{n-1}\rightarrow T_{X_t^x}{\mathbb {S}}^{n-1}\) for the (random) parallel transport operator between the tangent spaces at x and at \(X_t^x\). The role of the Jacobi flow will be played by the operators \(Q_t: T_x{\mathbb {S}}^{n-1}\rightarrow T_{X_t^x}{\mathbb {S}}^{n-1}\) defined as,

$$\begin{aligned} Q_t:= e^{-(n-2)t}//_t. \end{aligned}$$
(16)

An elementary calculation using the above formula for \({\text {Ric}}_S\) shows that

$$\begin{aligned} \partial _tQ_t=-{\text {Ric}}_S^{\sharp } Q_t; \hspace{3mm} Q_0 = {\text {id}}, \end{aligned}$$

which is analogous to (5) for the Jacobi flow with a curvature term. Here, \({\text {Ric}}_S^{\sharp }\) is defined by the canonical isomorphism \(g({\text {Ric}}_S^{\sharp }(u), v)_x = {\text {Ric}}_S(u,v)\) for all \(x \in {\mathbb {S}}^{n-1}\) and \(u, v \in T_x{\mathbb {S}}^{n-1}\).

Lemma 8

([28, Lemma 2.6]) Let \(f:{\mathbb {S}}^{n-1}\rightarrow {\mathbb {R}}\) and \(t > 0\). For any continuously differentiable \(k:[0,t]\rightarrow {\mathbb {R}}\) such that \(k(0) = 1\) and \(k(t) = 0\), we have, for any \(x \in {\mathbb {S}}^{n-1}\) and \(v,u\in T_x{\mathbb {S}}^{n-1}\),

$$\begin{aligned} \nabla _S^2P_tf (v, u)&= - \sqrt{2}{\mathbb {E}}\left[ g(\nabla _Sf(X^x_t),Q_t(v)) \int _0^t g(Q_s({\dot{k}}(s)u), //_sdB_s)\right] \\&\quad +\sqrt{2}{\mathbb {E}}\left[ g\left( \nabla _Sf(X^x_t), Q_t\int _0^tQ_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)u)\right) \right] , \end{aligned}$$

where \(B_s\) is a standard Brownian motion on \(T_x{\mathbb {S}}^{n-1}\).

Lemma 8 allows for some flexibility with the choice of the function k. For our use, we fix the following specific choice

$$\begin{aligned} k(s):= 1-\frac{s}{t}\quad \forall s\in [0,t]. \end{aligned}$$
(17)

4.2 Proof of Proposition 3

As in the Euclidean setting, we need to bound the first and second terms of Lemma 8 separately. The first term is analogous to the first term of Lemma 1, where we need to bound the martingale \(\int _0^t g( Q_s({\dot{k}}(s)u), //_sdB_s)\). The second term of Lemma 1 is absent in the sphere setting since \(\mu \) is the uniform measure. However, due to the curvature of the sphere, we get the second term in Lemma 8. As we will show, this term also involves a martingale. More precisely, the term

$$\begin{aligned} Q_t\int _0^tQ_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)v) \end{aligned}$$

is a martingale, which is a feature of the constant curvature of the sphere, absent in the general manifold setting. In light of this, to bound \(\nabla _S^2P_te^{-W}\), we shall require the following martingale argument, which extends the bounds obtained in the Euclidean setting.

Lemma 9

Let \(f:{\mathbb {S}}^{n-1}\rightarrow {\mathbb {R}}\) be an L-log-Lipschitz function and let \((M_t)\) be a martingale on \(T_x{\mathbb {S}}^{n-1}\), for some \(x \in {\mathbb {S}}^{n-1}\). Assume that almost surely, \(M_t\) has bounded quadratic variation, that is

$$\begin{aligned} \langle M \rangle _t \le \varphi (t) \hspace{2mm} a.s., \end{aligned}$$

for some \(\varphi (t) > 0\). Then

$$\begin{aligned} {\mathbb {E}}[|g( \nabla _Sf(X^x_t),//_t M_t )|] \le \left( 6L + 2\frac{L^2}{\sqrt{n-2}}\right) \sqrt{\varphi (t)}{\mathbb {E}}[f(X^x_t)]. \end{aligned}$$

Proof

Since \(\mu \) is the uniform measure on \({\mathbb {S}}^{n-1}\), Lemma 4 and Remark 2 yield

$$\begin{aligned} {\mathbb {E}}[|\nabla _Sf(X^x_t)|^2] \le L^2{\mathbb {E}}[f(X^x_t)^2] \le L^2\exp \left( L^2\frac{1 - e^{-2(n-2) t}}{n-2}\right) {\mathbb {E}}[f(X^x_t)]^2, \end{aligned}$$

where the first inequality is the L-log-Lipschitz condition coupled with the chain rule. As in the Euclidean setting, we have, for any \(\delta > 0\),

$$\begin{aligned} {\mathbb {E}}[|\nabla _Sf(X^x_t)| |M_t|] \le \delta {\mathbb {E}}[|\nabla _Sf(X^x_t)|] + {\mathbb {E}}[|\nabla _Sf(X^x_t)|^2]^{1/2}{\mathbb {E}}[|M_t|^4]^{1/4}{\mathbb {P}}[|M_t| \ge \delta ]^{1/4}, \end{aligned}$$

as well as the bound \({\mathbb {E}}[|M_t|^4]^{1/4} \le 5\varphi (t)^{1/2}\) which follows from the Burkholder-Davis-Gundy inequality [31, Proposition 3.26] and the bound on the quadratic variation. Moreover, using Lemma 5, \({\mathbb {P}}[|M_t| \ge \delta ] \le 2\exp \left( -\frac{\delta ^2}{2\varphi (t)}\right) \). It follows that

$$\begin{aligned} {\mathbb {E}}[|\nabla _Sf(X^x_t)| |M_t|]&\le \delta {\mathbb {E}}[|\nabla _Sf(X^x_t)|]\\&\quad + 5\cdot 2^{1/4} L\exp \left( L^2\frac{1 - e^{-2(n-2) t}}{2(n-2)}\right) \varphi (t)^{1/2}\exp \left( -\frac{\delta ^2}{8\varphi (t)} \right) {\mathbb {E}}[f(X^x_t)]\\&\le \left( \delta L+5\cdot 2^{1/4} L\exp \left( L^2\frac{1 - e^{-2(n-2) t}}{2(n-2)}\right) \varphi (t)^{1/2}\right. \\&\quad \left. \exp \left( -\frac{\delta ^2}{8\varphi (t)}\right) \right) {\mathbb {E}}[f(X^x_t)], \end{aligned}$$

where the second inequality used \({\mathbb {E}}[|\nabla _Sf(X^x_t)|] \le L{\mathbb {E}}[f(X^x_t)]\) since f is L-log-Lipschitz. We choose \(\delta \) so that the exponential factor is cancelled. In particular, taking \(\delta = 2\sqrt{2}L\sqrt{\varphi (t)}\sqrt{\frac{1 - e^{-2(n-2) t}}{2(n-2)}}\), we get

$$\begin{aligned} {\mathbb {E}}[|\nabla _Sf(X^x_t)| |M_t|]&\le \sqrt{\varphi (t)}\left( 2\sqrt{2}L^2\sqrt{\frac{1 - e^{-2(n-2)t}}{2(n-2)}} + 5\cdot 2^{1/4}L \right) {\mathbb {E}}[f(X^x_t)] \\&\le \left( 6L + 2\frac{L^2}{\sqrt{n-2}}\right) \sqrt{\varphi (t)}{\mathbb {E}}[f(X^x_t)]. \end{aligned}$$

\(\square \)

We now bound the two terms appearing in Lemma 8 using Lemma 9. For brevity, let us use \(f:=e^{-W}\). We start by analyzing the first term.

Lemma 10

Fix \(x \in {\mathbb {S}}^{n-1}\). For all unit vectors \(v,u\in T_x{\mathbb {S}}^{n-1}\) and \(t>0\),

$$\begin{aligned}&\left| {\mathbb {E}}\left[ g(\nabla _Sf(X^x_t),Q_t(v)) \int _0^t g( Q_s({\dot{k}}(s)u), //_sdB_s)\right] \right| \\&\quad \le 6\frac{e^{-(n-2)t}}{\sqrt{t}}\left( L + \frac{L^2}{\sqrt{n-2}}\right) P_tf(x). \end{aligned}$$

Proof

Using (16) we see that the term \({\mathbb {E}}\left[ g(\nabla _Sf(X^x_t),Q_t(v)) \int _0^t g(Q_s({\dot{k}}(s)u), //_sdB_s)\right] \) can be written as \(e^{-(n-2)t}{\mathbb {E}}[g(\nabla _Sf(X^x_t),//_t M_t) ]\) where

$$\begin{aligned} M_t=\left( \int _0^t g(Q_s({\dot{k}}(s)u), //_sdB_s)\right) v \end{aligned}$$

is a martingale. The quadratic variation of \(M_t\) is equal to, using (17),

$$\begin{aligned} \int _0^te^{-2(n-2)s}{\dot{k}}(s)^2ds= \frac{1}{t^2}\int _0^te^{-2(n-2)s}ds=\frac{1}{t^2} \frac{1-e^{-2(n-2)t}}{2(n-2)}\le \frac{1}{t}=:\varphi (t) \end{aligned}$$

where the inequality follows from \(1-e^{-2(n-2)t}\le 2(n-2)t\). The proof is complete by Lemma 9. \(\square \)
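The quadratic-variation bound \(\varphi (t)=1/t\) used above amounts to the elementary inequality \(1-e^{-x}\le x\); a numerical spot-check (a sketch, over arbitrary values of n and t):

```python
import math

# (1/t^2) * (1 - e^{-2(n-2) t}) / (2 (n-2)) <= 1/t  for all t > 0 and n >= 3
for n in (3, 5, 50):
    a = n - 2
    for t in (0.01, 0.5, 1.0, 10.0):
        qv = (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a * t * t)
        assert qv <= 1.0 / t + 1e-15
```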

We now analyze the second term where we take \(v=u\).

Lemma 11

Fix \(x \in {\mathbb {S}}^{n-1}\). For every unit vector \(v\in T_x{\mathbb {S}}^{n-1}\) and \(t>0\),

$$\begin{aligned}&\left| {\mathbb {E}}\left[ g\left( \nabla _Sf(X^x_t), Q_t\int _0^tQ_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)v)\right) \right] \right| \\&\quad \le 6e^{-(n-2)t}\left( L + \frac{L^2}{\sqrt{n-2}}\right) P_tf(x). \end{aligned}$$

Proof

We use the expressions (15) and (16) to write

$$\begin{aligned}&Q_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)v)\\&\quad =\left( e^{-(n-2)s} \right) ^{-1}//_s^{-1}{\text {Riem}}_S(//_sdB_s,e^{-(n-2)s}//_s v)e^{-(n-2)s}k(s)//_s v\\&\quad =e^{-(n-2)s}k(s)//_s^{-1}{\text {Riem}}_S(//_sdB_s,//_s v) //_sv\\&\quad =e^{-(n-2)s}k(s)//_s^{-1}\left[ g\left( //_sv, //_sv\right) //_s dB_s-g\left( //_s dB_s,//_s v,\right) //_s v\right] \\&\quad =e^{-(n-2)s}k(s)//_s^{-1}\left[ //_s dB_s-g\left( //_s dB_s,//_s v\right) //_sv\right] \\&\quad =e^{-(n-2)s}k(s) \left[ dB_s-g\left( dB_s,v\right) v\right] , \end{aligned}$$

where in the last equation we used that \(//_s\) is a linear isometry. The term \(dB_s-g\left( dB_s, v\right) v\) is a Brownian motion in the hyperplane orthogonal to v inside \(T_x{\mathbb {S}}^{n-1}\), so it follows that \(\int _0^tQ_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)v)\) is a martingale in \(T_x{\mathbb {S}}^{n-1}\) whose quadratic variation is bounded from above by (using \(0\le k(s)\le 1\) for all \(s\in [0,t]\)),

$$\begin{aligned} (n-2)\int _0^t{e^{-2(n-2)s}k(s)^2ds}\le (n-2)\int _0^t{e^{-2(n-2)s}ds}\le \frac{n-2}{2(n-2)} =\frac{1}{2}. \end{aligned}$$

Therefore, we can view the term

$$\begin{aligned} {\mathbb {E}}\left[ g\left( \nabla _Sf(X^x_t), Q_t\int _0^tQ_s^{-1}{\text {Riem}}_S(//_sdB_s,Q_s(v))Q_s(k(s)v)\right) \right] \end{aligned}$$

as being of the form

$$\begin{aligned} e^{-(n-2)t}{\mathbb {E}}[g(\nabla _Sf(X^x_t),//_t M_t )], \end{aligned}$$

where \(M_t\) is a martingale on \(T_x{\mathbb {S}}^{n-1}\) with quadratic variation bounded by \(\frac{1}{2}\). The proof is complete by Lemma 9. \(\square \)
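The bound \(\tfrac{1}{2}\) on the quadratic variation of the curvature martingale can likewise be checked numerically (a sketch; midpoint rule, arbitrary values of n and t):

```python
import math

def curvature_qv(n, t, steps=100_000):
    # midpoint rule for (n-2) * integral over [0,t] of e^{-2(n-2)s} k(s)^2 ds,
    # with k(s) = 1 - s/t as in (17)
    a = n - 2
    h = t / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += math.exp(-2.0 * a * s) * (1.0 - s / t) ** 2 * h
    return a * total

for n in (3, 10):
    for t in (0.5, 2.0, 20.0):
        assert curvature_qv(n, t) <= 0.5 + 1e-6
```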

We can now complete the proof of Proposition 3. By Lemma 10 and Lemma 11,

$$\begin{aligned} \nabla _S^2\log P_t e^{-W(x)}(u,u)&=\frac{\nabla _S^2P_t e^{-W(x)}(u,u)}{P_te^{-W(x)}}-g(\nabla _S\log P_t e^{-W(x)},u)^2\\ {}&\le \frac{|\nabla _S^2P_t e^{-W(x)}(u,u)|}{P_te^{-W(x)}} \\&\le 12\left( L + \frac{L^2}{\sqrt{n-2}}\right) e^{-(n-2)t}\left( \frac{1}{\sqrt{t}} + 1\right) . \end{aligned}$$

5 General manifolds

Let \((M,g,\mu )\) with \(\mu = e^{-V}d{\text {Vol}}\) be a weighted Riemannian manifold, with tangent bundle TM. We start by introducing the relevant concepts from Riemannian geometry.

Notions from Riemannian geometry

  • If S is a symmetric tensor on TM, we denote by \(S^{\#}: TM \longrightarrow TM\) the operator such that \(g(S^{\#}(u), v)_x = S(u,v), ~~ \forall x \in M\), \(u, v \in T_xM\), where we suppress the dependence of S on x.

  • \({\text {Riem}}\) denotes the Riemannian curvature 4-tensor on (M, g). Its norm is defined as

    $$\begin{aligned} ||{\text {Riem}}||_\infty :=\sup \left\{ \sum _{i=1}^d g( {\text {Riem}}(x)(e_i, u)v, S^{\#}(e_i)), x \in M, |u|, |v| \le 1, |S|_{\text {op}} \le 1\right\} \end{aligned}$$

    where \(u, v \in T_xM\), \(\{e_i\}\) is an orthonormal basis of \(T_xM\), and S is a symmetric 2-tensor.

  • \({\text {Ric}}\) denotes the Ricci curvature tensor on (M, g).

  • \(d^*{\text {Riem}}\) is defined as

    $$\begin{aligned} g(d^*{\text {Riem}}(u,v), w) = g((\nabla _w {\text {Ric}}^{\#})(u), v) - g((\nabla _v {\text {Ric}}^{\#})(w), u). \end{aligned}$$
  • \({\text {Ric}}_V\) denotes the weighted Ricci curvature tensor of Bakry-Émery on \((M, g, \mu )\), that is,

    $$\begin{aligned} {\text {Ric}}_V:= {\text {Ric}}+ \nabla _M^2 V \end{aligned}$$

    where \(\nabla _M\) and \(\nabla _M^2\) are, respectively, the gradient and Hessian on the manifold M.

The manifold \((M,g,\mu )\) satisfies the curvature-dimension condition \(\textrm{CD}(\kappa ,\infty )\), for \(\kappa \in {\mathbb {R}}\), if

$$\begin{aligned} {\text {Ric}}_V \succeq \kappa g. \end{aligned}$$

With these definitions, we state our main result, which is the precise form of Theorem 3.

Theorem 5

Let \((M,g,\mu )\) be a weighted Riemannian manifold with \(\mu = e^{-V}d\textrm{Vol},\) and let \(\nu = e^{-W}d\mu \) be another measure on M, with W an L-Lipschitz function. Assume that \(||{\text {Riem}}||_\infty < \infty \), \({\text {Ric}}_V \succeq \kappa g\) with \(\kappa > 0\), and that

$$\begin{aligned} \beta := ||\nabla {\text {Ric}}_V^{\#} + d^*{\text {Riem}}+ {\text {Riem}}(\nabla V)||_\infty <\infty . \end{aligned}$$

Then, the Langevin transport map from \(\mu \) onto \(\nu \) is Lipschitz with constant \(\exp \left( L\left( \frac{e^{\frac{L^2}{2\kappa }}}{\sqrt{\kappa }} + e^{\frac{L^2}{2\kappa }}\frac{\Vert {\text {Riem}}\Vert _\infty }{\kappa ^{\frac{3}{2}}} + \frac{\beta }{\kappa ^2}\right) \right) \).

On the assumptions, the lower bound on the Ricci curvature tensor is the natural analogue of the uniform convexity assumption in the Euclidean setting, while a bound on \(\nabla {\text {Ric}}^{\#}_V\) is the natural counterpart to the third derivative bound on V in the Euclidean setting. The other terms we have to control are purely geometric and vanish if we consider a Euclidean space or the uniform measure on the sphere.

As in the previous section, we shall require a bound on the Hessian along the Langevin semigroup. The following result is the analog of Proposition 2 and Proposition 3 for general manifolds. The estimate of Proposition 4 is not strong enough to yield the sharp results of Theorem 1 and Theorem 2, which instead rely on Proposition 2 and Proposition 3. It is this estimate which leads to the double-exponential estimate on the Lipschitz constant of the transport maps.

Proposition 4

For every \(t\ge 0\) and \(x \in M\),

$$\begin{aligned} \nabla _M^2\log P_t e^{-W(x)}\preceq e^{-\kappa t}L\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) e^{L^2\left( \frac{1-e^{-2\kappa t}}{2\kappa }\right) } + \frac{\beta }{\kappa }\right) g. \end{aligned}$$

Proof

By Lemma 4 and Remark 2, if f is positive and L-log-Lipschitz then

$$\begin{aligned} P_t(|\nabla _M f|^2)\le \exp \left( L^2\frac{1-e^{-2\kappa t}}{\kappa }\right) (P_t|\nabla _M f|)^2\le L^2\exp \left( L^2\frac{1-e^{-2\kappa t}}{\kappa }\right) (P_t f)^2. \end{aligned}$$
(18)

Hence, by [28, Theorem 2.5] and (18),

$$\begin{aligned} \begin{aligned} \nabla _M^2 P_t e^{-W(x)}&\preceq e^{-\kappa t}\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) (P_t|\nabla _M e^{-W}|^2)^{1/2} + \frac{\beta }{\kappa } P_t|\nabla _M e^{-W}|\right) g\\&\preceq e^{-\kappa t}\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) Le^{L^2\left( \frac{1-e^{-2\kappa t}}{2\kappa }\right) }P_t e^{-W} + \frac{\beta L}{\kappa }P_te^{-W}\right) g\\ \end{aligned} \end{aligned}$$
(19)

so (19) yields

$$\begin{aligned} \nabla _M^2 \log P_t e^{-W(x)}&=\frac{\nabla _M^2 P_t e^{-W(x)}}{P_t e^{-W(x)}}-\left( \nabla _M \log P_t e^{-W(x)}\right) ^{\otimes 2}\preceq \frac{\nabla _M^2 P_t e^{-W(x)}}{P_t e^{-W(x)}}\\&\preceq e^{-\kappa t}L\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) e^{L^2\left( \frac{1-e^{-2\kappa t}}{2\kappa }\right) } + \frac{\beta }{\kappa }\right) g. \end{aligned}$$

\(\square \)

Proof of Theorem 5

Proposition 4 implies, for every \(x \in M\) and unit vector \(u\in T_xM\),

$$\begin{aligned}&\int \limits _0^\infty \left| \nabla _M^2 \log P_te^{-W(x)}(u,u)\right| dt\\ {}&\quad \le \int \limits _0^\infty e^{-\kappa t}L\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) e^{\frac{L^2}{2\kappa }} + \frac{\beta }{\kappa }\right) dt\\&\quad = L\left( \sqrt{\kappa }e^{\frac{L^2}{2\kappa }}\int \limits _0^\infty \frac{e^{-\kappa t}}{\sqrt{e^{2\kappa t}-1}}dt + \left( e^{\frac{L^2}{2\kappa }}\frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }} + \frac{\beta }{\kappa }\right) \int \limits _0^\infty e^{-\kappa t} dt\right) \\&\quad = L\left( \frac{e^{\frac{L^2}{2\kappa }}}{\sqrt{\kappa }} + e^{\frac{L^2}{2\kappa }}\frac{\Vert {\text {Riem}}\Vert _\infty }{\kappa ^{\frac{3}{2}}} + \frac{\beta }{\kappa ^2}\right) . \end{aligned}$$

The proof is complete by Proposition 1. \(\square \)
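The only non-elementary integral in the computation above is \(\int _0^\infty \frac{e^{-\kappa t}}{\sqrt{e^{2\kappa t}-1}}dt=\frac{1}{\kappa }\) (substitute \(x=e^{-\kappa t}\)). A numerical check (a sketch; midpoint rule after the substitution \(t=u^2\), which tames the \(1/\sqrt{t}\)-type singularity at 0):

```python
import math

def langevin_integral(kappa, steps=200_000, upper_u=8.0):
    # integral over [0, inf) of e^{-kappa t} / sqrt(e^{2 kappa t} - 1) dt,
    # computed via t = u^2 with the midpoint rule
    h = upper_u / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        t = u * u
        total += 2.0 * u * math.exp(-kappa * t) / math.sqrt(math.expm1(2.0 * kappa * t)) * h
    return total

for kappa in (0.5, 1.0, 2.0):
    assert abs(langevin_integral(kappa) - 1.0 / kappa) < 1e-4
```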

6 The inverse of the Langevin transport map

In this section, we focus on the inverse of the Langevin transport map. Recall, from (4), that \((S_t)_{t\ge 0}\) stands for the forward maps along the Langevin flow. Thus we have that \(S:=\lim \limits _{t\rightarrow \infty } S_t = T^{-1}\), where T is the Langevin transport map. We shall prove the following precise form of Theorem 4.

Theorem 6

  1. 1.

    Let \(\mu \) and \(\nu \) be as in Theorem 1. Then S is Lipschitz with constant \(\exp \left( \frac{21}{2}\frac{L^2}{\kappa }+\frac{5\sqrt{\pi }L}{\sqrt{\kappa }}+\frac{LK}{2\kappa ^2}\right) \).

  2. 2.

    Let \(\mu \) and \(\nu \) be as in Theorem 2. Then S is Lipschitz with constant \( \exp \left( 35\frac{L}{\sqrt{n-2}} + \frac{71}{2}\frac{L^2}{n-2}\right) \).

  3. 3.

    Let \(\mu \) and \(\nu \) be as in Theorem 5. Then S is Lipschitz with constant \( \exp \left( L\left( \frac{e^{\frac{L^2}{2\kappa }}}{\sqrt{\kappa }} + e^{\frac{L^2}{2\kappa }}\frac{\Vert {\text {Riem}}\Vert _\infty }{\kappa ^{\frac{3}{2}}} + \frac{\beta }{\kappa ^2}\right) +\frac{L^2}{2\kappa }\right) \).

The proof of Theorem 6 is analogous to the proofs of the Lipschitz properties of the Langevin transport map T from \(\mu \) to \(\nu \). With the same argument as in the proof of Proposition 1, without reversing the map, we get:

Proposition 5

If for all \(t\ge 0\)

$$\begin{aligned} \nabla _M^2\log P_te^{-W}\succeq -\ell _t g, \end{aligned}$$

then S is \(\exp \left( \int _0^{\infty }\ell _tdt\right) \)-Lipschitz.

The next result shows how to combine log-Lipschitz properties with Hessian estimates to get a lower bound on the Hessian as in Proposition 5.

Proposition 6

Let \((M,g,\mu )\) be a weighted Riemannian manifold with the associated Langevin semigroup \((P_t)\) and let \(|\cdot |:=\sqrt{g(\cdot ,\cdot )}\). Suppose that we have:

  1. 1.

    There exists \(\kappa >0\) such that for every test function f,

    $$\begin{aligned} |\nabla _M P_tf| \le e^{-\kappa t}P_t|\nabla _M f|. \end{aligned}$$
  2. 2.

    There exists \(\alpha :[0,\infty )\times [0,\infty ) \rightarrow {\mathbb {R}}\) such that for every L-log-Lipschitz nonnegative function f,

    $$\begin{aligned} \frac{\nabla _M^2P_tf}{P_tf}\succeq -\alpha (t,L)g. \end{aligned}$$

Then, for every L-log-Lipschitz function f,

$$\begin{aligned} \nabla _M^2\log P_tf \succeq -(\alpha (t,L)+L^2e^{-2\kappa t})g. \end{aligned}$$

Proof

Item 1 implies that

$$\begin{aligned} | \nabla _M \log P_t f|&= \frac{|\nabla _M P_tf|}{|P_tf|}\le e^{-\kappa t}\frac{P_t|\nabla _M f|}{|P_t f|}= e^{-\kappa t}\frac{P_t|f\nabla _M \log f|}{|P_t f|}\le Le^{-\kappa t} \end{aligned}$$

so, by item 2, for every unit tangent vector u,

$$\begin{aligned} \nabla _M^2 \log P_t f(u,u)=\frac{\nabla _M^2 P_t f(u,u)}{P_t f}-g(\nabla _M \log P_t f,u)^2\ge -(\alpha (t,L)+L^2e^{-2\kappa t}). \end{aligned}$$

\(\square \)

We can now complete the proof of Theorem 6.

  1. 1.

    The assumption \(\nabla ^2 V(x) \succeq \kappa \textrm{I}\) together with the classical Bakry-Émery gradient estimate [22, Eq. (5.5.4)] shows that item 1 of Proposition 6 holds with \(\kappa \). As for item 2, it follows from the proof of Proposition 2 that it holds with

    $$\begin{aligned} \alpha (t,L)=Le^{-\kappa t}\left[ 5L+\frac{5}{\sqrt{t}}+\frac{Kt}{2}\right] . \end{aligned}$$

    Hence, the condition in Proposition 5 is satisfied with

    $$\begin{aligned} \ell _t =Le^{-\kappa t}\left[ 5L+\frac{5}{\sqrt{t}}+\frac{Kt}{2}\right] +L^2e^{-2\kappa t}, \end{aligned}$$

    which completes the proof by integration from \(t=0\) to \(t=\infty \) (see the proof of Theorem 1).

  2. 2.

    The sphere \({\mathbb {S}}^{n-1}\) satisfies the curvature-dimension condition \(\textrm{CD}(n-2,\infty )\) so the Bakry-Émery gradient estimate [22, Eq. (5.5.4)] can be applied to show that item 1 of Proposition 6 holds with \(n-2\). As for item 2, it follows from the proof of Proposition 3 that it holds with

    $$\begin{aligned} \alpha (t,L)= 12\left( L + \frac{L^2}{\sqrt{n-2}}\right) e^{-(n-2)t}\left( \frac{1}{\sqrt{t}} + 1\right) . \end{aligned}$$

    Hence, the condition in Proposition 5 is satisfied with

    $$\begin{aligned} \ell _t =12\left( L + \frac{L^2}{\sqrt{n-2}}\right) e^{-(n-2)t}\left( \frac{1}{\sqrt{t}} + 1\right) +L^2e^{-2(n-2) t}, \end{aligned}$$

    which completes the proof by integration from \(t=0\) to \(t=\infty \) (see the proof of Theorem 2).

  3. 3.

    By assumption, M satisfies the curvature-dimension condition \(\textrm{CD}(\kappa ,\infty )\) so the Bakry-Émery gradient estimate [22, Eq. (5.5.4)] can be applied to show that item 1 of Proposition 6 holds with \(\kappa \). As for item 2, it follows from the proof of Proposition 4 that it holds with

    $$\begin{aligned} \alpha (t,L)= e^{-\kappa t}L\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) e^{L^2\left( \frac{1-e^{-2\kappa t}}{2\kappa }\right) } + \frac{\beta }{\kappa }\right) . \end{aligned}$$

    Hence, the condition in Proposition 5 is satisfied with

    $$\begin{aligned} \ell _t =e^{-\kappa t}L\left( \left( \frac{\sqrt{\kappa }}{\sqrt{e^{2\kappa t}-1}} + \frac{\Vert {\text {Riem}}\Vert _\infty }{\sqrt{\kappa }}\right) e^{L^2\left( \frac{1-e^{-2\kappa t}}{2\kappa }\right) } + \frac{\beta }{\kappa }\right) +L^2e^{-2\kappa t}, \end{aligned}$$

    which completes the proof by integration from \(t=0\) to \(t=\infty \) (see the proof of Theorem 5).

7 Applications

As is classical, Lipschitz transport maps can be used to prove functional inequalities, by transferring them from the source measure to the target measure. In this section, we discuss some results on functional inequalities for Lipschitz perturbations that appear to be new.

7.1 Dimension-free Gaussian isoperimetric inequalities

One of the strongest functional inequalities for the Gaussian measure is the Gaussian isoperimetric inequality, which states that among all sets with given Gaussian mass, the minimal Gaussian perimeter is achieved for half-spaces. The general form of the inequality is defined as follows:

Definition 7

Let \(\mu \) be a probability measure on a metric space. Given a Borel set A, we define its t-enlargement as \(A^t:=\{x; d(x,A) \le t\}\) and its boundary measure (or Minkowski content) as

$$\begin{aligned} \mu ^+(A):= \liminf _{t \rightarrow 0} t^{-1}\mu (A^t \backslash A). \end{aligned}$$

The probability measure \(\mu \) is said to satisfy a Gaussian isoperimetric inequality with constant \(\alpha > 0\) if for any Borel set we have

$$\begin{aligned} \mu ^+(A) \ge \alpha I(\mu (A)),\quad I = \varphi \circ \phi ^{-1} \end{aligned}$$

where \(\varphi (x) = (2\pi )^{-1/2}\exp (-x^2/2)\) and \(\phi (x) = \int _{-\infty }^x{ \varphi (t)dt}\).
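To make Definition 7 concrete: for a half-space \(A = (-\infty , a]\) the Gaussian boundary measure is exactly \(\varphi (a) = I(\phi (a))\), which is why half-spaces attain equality with \(\alpha = 1\). A small numerical check (a sketch; \(\phi ^{-1}\) is computed by bisection):

```python
import math

def Phi(x):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # standard Gaussian density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def I_profile(p, lo=-40.0, hi=40.0):
    # Gaussian isoperimetric profile I = phi o Phi^{-1}; inverse CDF by bisection
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return phi(0.5 * (lo + hi))

# half-spaces achieve mu^+(A) = I(mu(A)) for the standard Gaussian
for a in (-1.5, 0.0, 0.7, 2.0):
    assert abs(I_profile(Phi(a)) - phi(a)) < 1e-9
```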

The function I corresponds to the perimeter of a half-space with the right Gaussian mass, and the Gaussian measure satisfies this inequality with constant 1 in all dimensions. This inequality was generalized to uniformly log-concave measures in [32]. The functional form of the Gaussian isoperimetric inequality is Bobkov’s inequality [33]. It is immediate that Gaussian isoperimetric inequalities can be transferred by Lipschitz transport maps (see [6]), so we get the following as a corollary of Theorem 1:

Theorem 8

If \(\mu \) is a log-Lipschitz perturbation of a standard Gaussian measure on \({\mathbb {R}}^d\), then it satisfies a Gaussian isoperimetric inequality with a dimension-free constant.

The same result holds in the general context of Theorems 1, 2 and 5. For the sphere, we can transport the sharp isoperimetric inequality on the sphere, which contains a dimensional improvement compared with the Gaussian isoperimetric inequality.

To our knowledge, this is the first result on dimension-free isoperimetric inequalities for log-Lipschitz perturbations. A result of Miclo (see [34, Lemma 2.1]) states that Lipschitz perturbations of uniformly log-concave measures can be recast as bounded perturbations, allowing one to use the Holley-Stroock lemma to deduce functional inequalities. However, the constants obtained in that way are dimension-dependent. Some of the results of [35, Section 5] can be used to prove functional inequalities for Lipschitz perturbations of uniformly log-concave measures, but the constants depend on exponential moments of the unperturbed measure, and are hence dimensional.

Gaussian isoperimetric inequalities imply other functional inequalities, including in particular logarithmic Sobolev inequalities:

Definition 9

A probability measure \(\mu \) on a manifold is said to satisfy a logarithmic Sobolev inequality if for any locally Lipschitz function f such that \(|\nabla f| \in L^2(\mu )\), we have

$$\begin{aligned} \int {f^2\log f^2 d\mu } - \left( \int {f^2d\mu }\right) \log \left( \int {f^2d\mu }\right) \le 2C_{LSI}\int {|\nabla f|^2d\mu }. \end{aligned}$$

The standard Gaussian satisfies a logarithmic Sobolev inequality, with sharp constant \(C_{LSI} = 1\). This inequality, which is the sharpest possible analogue of Sobolev inequalities for the Gauss space, implies for example dimension-free concentration bounds, and has found many applications in statistics and statistical physics. We refer to [22] for an overview.
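The constant \(C_{LSI} = 1\) is attained by exponential functions \(f(x) = e^{\lambda x/2}\), for which both sides of the inequality equal \((\lambda ^2/2)e^{\lambda ^2/2}\); a one-dimensional numerical verification (a sketch; midpoint rule, and \(\lambda = 0.8\) is an arbitrary choice):

```python
import math

def gauss_expect(g, steps=200_000, R=10.0):
    # midpoint rule for the expectation of g under the standard Gaussian, truncated to [-R, R]
    h = 2.0 * R / steps
    total = 0.0
    for i in range(steps):
        x = -R + (i + 0.5) * h
        total += g(x) * math.exp(-x * x / 2.0) * h
    return total / math.sqrt(2.0 * math.pi)

lam = 0.8
Ef2 = gauss_expect(lambda x: math.exp(lam * x))            # int f^2 dgamma, f = e^{lam x/2}
ent = gauss_expect(lambda x: lam * x * math.exp(lam * x)) - Ef2 * math.log(Ef2)
rhs = 2.0 * gauss_expect(lambda x: (lam / 2.0) ** 2 * math.exp(lam * x))

assert abs(ent - rhs) < 1e-6                               # equality: C_LSI = 1 is sharp
assert abs(ent - (lam ** 2 / 2.0) * math.exp(lam ** 2 / 2.0)) < 1e-6
```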

For logarithmic Sobolev inequalities, Aida and Shigekawa [36] showed that such Lipschitz perturbations satisfy them with a dimension-free constant, which however has not been made explicit. Gaussian isoperimetric inequalities are strictly stronger than logarithmic Sobolev inequalities.

As a corollary, we recover functional inequalities for Gaussian mixtures with compactly supported mixing measures, which can be rewritten as Lipschitz perturbations of a Gaussian measure (see the discussion at the end of [34, Section 2]).

There are some other estimates for which it is difficult to provide direct proofs, but which can easily be transferred from one measure to another via Lipschitz transport maps, including general eigenvalue estimates for diffusion generators [8] and sharpened integrability bounds for non-Lipschitz functions [37, 38].

7.2 A growth estimate for optimal transport maps

Our argument does not apply to quadratic optimal transport maps. Nonetheless, it would be natural to expect similar Lipschitz estimates for them. As a hint towards such a result, we remark that the Gaussian concentration inequality implied by our Lipschitz estimate for non-optimal maps, combined with the sub-Gaussian case of [39, Theorem 1.1], implies controlled linear growth for optimal transport maps, in the following way:

Proposition 7

Let \(\mu \) be a centered L-log-Lipschitz perturbation of the standard Gaussian measure \(\gamma \) on \({\mathbb {R}}^d\). Then the quadratic optimal transport map \(\nabla \varphi \) sending \(\gamma \) onto \(\mu \) satisfies

$$\begin{aligned} |\nabla \varphi (x)| \le C_1 \exp (C_2 \max (L, L^2))\sqrt{d + |x|^2} \end{aligned}$$

where \(C_1\) and \(C_2\) are universal constants.

In view of our results, it is natural to expect that Lipschitz estimates hold for optimal transport maps. Even the Gaussian case is open at this time:

Conjecture 1

The Brenier map from the standard Gaussian measure onto a log-Lipschitz perturbation of it is globally Lipschitz, and its Lipschitz constant can be bounded independently of the dimension.

Note that such a statement would already significantly extend the Gaussian sub-case of [10, Theorem 1.1] when the perturbation is smooth. Similar questions can be raised for more general measures, as well as in the Riemannian setting, but the Gaussian case seems like a good starting point. Since the heat flow map and the Brenier transport map coincide in dimension one, this conjecture does hold on \({\mathbb {R}}\).

Let us conclude with some comments on what is missing to prove this conjecture. The optimal transport map is the unique weak solution to the Monge-Ampère equation

$$\begin{aligned} e^{-V} = e^{-V(\nabla \varphi ) - W(\nabla \varphi )}\det \nabla ^2 \varphi \end{aligned}$$

among convex functions. This is a fully nonlinear second-order PDE. One can start by investigating the linearized equation. If we replace W by a small perturbation \(\varepsilon W - c_\varepsilon \), with \(c_\varepsilon \) a constant to enforce unit mass for the target distribution, and look for a solution of the form \(\nabla \varphi = x + \varepsilon \nabla h\), the linearized equation as \(\varepsilon \) goes to zero is

$$\begin{aligned} \Delta h - \nabla V \cdot \nabla h = W - \int {We^{-V}dx}. \end{aligned}$$
(20)

The operator on the left-hand side is precisely the generator of the diffusion process (1). This linearization highlights the connection between quadratic optimal transport and the heat flow map investigated here: the linearization of both constructions gives rise to the same PDE.
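For the reader's convenience, here is the formal computation behind (20). Taking logarithms in the Monge-Ampère equation with \(\nabla \varphi = x + \varepsilon \nabla h\) and W replaced by \(\varepsilon W - c_\varepsilon \), and using the expansions \(\log \det (I + \varepsilon \nabla ^2 h) = \varepsilon \Delta h + O(\varepsilon ^2)\) and \(V(x + \varepsilon \nabla h) = V(x) + \varepsilon \nabla V\cdot \nabla h + O(\varepsilon ^2)\), we get

$$\begin{aligned} -V(x) = -V(x) - \varepsilon \nabla V\cdot \nabla h - \varepsilon W(x) + c_\varepsilon + \varepsilon \Delta h + O(\varepsilon ^2). \end{aligned}$$

Collecting the first-order terms in \(\varepsilon \) gives \(\Delta h - \nabla V\cdot \nabla h = W - c_\varepsilon /\varepsilon \), and integrating against \(e^{-V}dx\), where \(\int (\Delta h - \nabla V\cdot \nabla h)e^{-V}dx = 0\) by integration by parts, identifies \(c_\varepsilon /\varepsilon \) with \(\int We^{-V}dx\) at first order, which is exactly (20).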

To prove a regularity estimate for a nonlinear PDE, it is natural to start from a proof for the linearized equation. Regularity for solutions to Poisson equations such as (20) was investigated in [40,41,42] using the stochastic representation of solutions, Malliavin calculus and Bismut’s formula. These are precisely the kind of tools we used in this work.

Since optimal transport maps do not admit a similar stochastic representation, it would be natural to start by looking for a PDE proof of the regularity estimates of [40] on (20). To our knowledge, this has not been successfully investigated and is at this point a barrier to proving Conjecture 1. The problem is of course a variant of many well-studied elliptic regularity problems. The main difficulties are that we work in a non-compact setting, allowing for unbounded solutions, and seek explicit dimension-free estimates. This last point in particular has rarely been investigated with PDE methods.