1 Introduction and Main Results

In recent years, the study of Lipschitz transport maps has emerged as an important line of research, with applications in probability and functional analysis. Let us fix a measure \(\mu \) on \(\mathbb R^d\). It is often desirable to write \(\mu \) as a push-forward \(\mu =\varphi _*\eta \), for a well-behaved measure \(\eta \) and a Lipschitz map \(\varphi :\mathbb R^d \to \mathbb R^d\). The main advantage of this approach lies in the fact that one can use the regularity of \(\varphi \) to transfer known analytic properties from \(\eta \) to \(\mu \), compensating for the potential complexity of \(\mu \).

Perhaps the best-known result in this direction is due to Caffarelli [7]: if \(\gamma _d\) is the standard Gaussian measure on \(\mathbb R^d\) and \(\mu \) is more log-concave than \(\gamma _d\), then there exists a 1-Lipschitz map \(\varphi ^{\mathrm {opt}}\) such that \(\varphi ^{\mathrm {opt}}_*\gamma _d = \mu \). The map \(\varphi ^{\mathrm {opt}}\) is known as the optimal transport map [6]. Crucially, the Lipschitz constant of \(\varphi ^{\mathrm {opt}}\) does not depend on the dimension d and, consequently, \(\varphi ^{\mathrm {opt}}\) transfers functional inequalities from \(\gamma _d\) to \(\mu \) in a dimension-free fashion. For example, the optimal bounds on the Poincaré and log-Sobolev constants are recovered for the class of strongly log-concave measures [10]. The main goal of this work is to establish quantitative generalizations of this fact for measures that are not necessarily strongly log-concave. To this end, we shall use a different transport map, \(\varphi ^{\mathrm {flow}}\), which transports along the heat flow; it is due to Kim and Milman [15] and was previously used, in the context of functional inequalities, by Otto and Villani [24].

In general, there is no reason to expect that an arbitrary measure could be represented as a push-forward of \(\gamma _d\) by a Lipschitz map. Indeed, in line with the above discussion, such measures must satisfy certain functional inequalities with constants that are determined by the regularity of the mapping. Thus, we restrict our attention to classes of measures that contain, among others, log-concave measures with bounded support and Gaussian mixtures.

We now turn to discuss, in greater detail, the types of measures for which our results shall hold. First, we consider log-concave measures with support contained in a ball of radius D. It is a classical fact that these measures satisfy Poincaré [25] and log-Sobolev [11] inequalities with constants of order D. For this reason, Kolesnikov raised the question of whether, in this setting, the optimal transport map \(\varphi ^{\mathrm {opt}}\) is \(O(D)\)-Lipschitz [17, Problem 4.3]. Up to now, the best known estimate, in [17, Theorem 4.2], gave a Lipschitz constant that is of order \(\sqrt {d}D\). One of our main contributions is to close this gap, for the map \(\varphi ^{\mathrm {flow}}\). In fact, we prove a stronger result that captures a trade-off between the convexity of \(\mu \) and the size of its support.

In the sequel, for \(\kappa \in \mathbb R\) (possibly negative), we say that \(\mu \) is \(\kappa \)-log-concave if its support is convex and, for \(\mu \)-almost every x, its density satisfies,

$$\displaystyle \begin{aligned} -\nabla^2\log \left(\frac{d\mu}{dx}(x)\right) \succeq \kappa\mathrm{I}_d. \end{aligned}$$
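
For example, the standard Gaussian \(\gamma _d\) is 1-log-concave, since

$$\displaystyle \begin{aligned} -\nabla^2\log \left(\frac{d\gamma_d}{dx}(x)\right) = \nabla^2\left(\frac{\|x\|{}^2}{2} + \frac{d}{2}\log(2\pi)\right) = \mathrm{I}_d, \end{aligned}$$

while the uniform measure on a convex body is 0-log-concave, and negative values of \(\kappa \) allow for densities whose logarithms fail to be concave in a controlled way.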

Theorem 1

Let \(\mu \) be a \(\kappa \) -log-concave probability measure on \(\mathbb R^d\) , and set \(D := \mathrm {diam}(\mathrm {supp}(\mu ))\) . Then, for the map \(\varphi ^{\mathrm {flow}}:\mathbb R^d \to \mathbb R^d\) , which satisfies \(\varphi ^{\mathrm {flow}}_*\gamma _d = \mu \) , the following holds:

  1.

    If \(\kappa > 0\) then,

    $$\displaystyle \begin{aligned} \|\nabla\varphi^{\mathrm{flow}}(x)\|{}_{\mathrm{op}}\leq \frac{1}{\sqrt{\kappa}}, \end{aligned}$$

    for \(\mu \) -almost every x.

  2.

    If \(\kappa D^2 < 1\) then,

    $$\displaystyle \begin{aligned} \|\nabla\varphi^{\mathrm{flow}}(x)\|{}_{\mathrm{op}}\leq e^{\frac{1 - \kappa D^2}{2}} D, \end{aligned}$$

    for \(\mu \) -almost every x.

Item 1 of Theorem 1 follows from the result of Kim and Milman [15], and is analogous to Caffarelli’s result [7]. Item 2 improves and generalizes the bound in Item 1 in two ways:

  • When \(\kappa > 0\) and \(\kappa D^2 <1\), Item 2 offers a strict improvement over the Lipschitz constant in Caffarelli’s result: writing \(s = \kappa D^2 \in (0,1)\), the inequality \(e^{\frac {1 - s}{2}} D < \frac {1}{\sqrt {\kappa }}\) is equivalent to \(1 - s + \log (s) < 0\), which holds since \(\log (s) < s - 1\).

  • When \(\kappa \leq 0\), Theorem 1 provides a Lipschitz transport map for measures that are not strongly log-concave. In particular, the case \(\kappa = 0\) is precisely the setting of Kolesnikov’s question [17, Problem 4.3].
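
For instance, if \(\mu \) is the uniform probability measure on a convex body of diameter D, then \(\mu \) is 0-log-concave, and Item 2 gives the dimension-free estimate

$$\displaystyle \begin{aligned} \|\nabla\varphi^{\mathrm{flow}}(x)\|{}_{\mathrm{op}}\leq \sqrt{e}\,D, \end{aligned}$$

which is of the order \(O(D)\) asked for in Kolesnikov’s question.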

Theorem 1 may also be compared with [9, Theorem 1.1], which studies Lipschitz properties of the optimal transport map when the target measure is a semi-log-concave perturbation of \(\gamma _d\). We point out that the two results apply in different regimes: while our result applies to semi-log-concave measures with bounded support, the result of [9] requires that the support of the measure is the entire \(\mathbb R^d\).

The other type of measures we consider are Gaussian mixtures of the form \(\mu = \gamma _d \star \nu \), where \(\nu \) has bounded support. It was recently shown that these measures satisfy several dimension-free functional inequalities [3, 8, 30]. As we shall show, this phenomenon can be better understood and further strengthened by establishing the existence of a Lipschitz transport map.

Theorem 2

Let \(\nu \) be a probability measure on \(\mathbb R^d\) with \(\mathrm {diam}(\mathrm {supp}(\nu ))\leq R\) and consider \(\mu = \gamma _d\star \nu \). Then, for the map \(\varphi ^{\mathrm {flow}}:\mathbb R^d \to \mathbb R^d\), which satisfies \(\varphi ^{\mathrm {flow}}_*\gamma _d = \mu \),

$$\displaystyle \begin{aligned} \|\nabla\varphi^{\mathrm{flow}}(x)\|{}_{\mathrm{op}} \leq e^{\frac{R^2}{2}}, \end{aligned}$$

for almost every \(x \in \mathbb R^d\).
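
For example, if \(\nu = \frac {1}{2}(\delta _{-x_0} + \delta _{x_0})\) for some \(x_0 \in \mathbb R^d\), then \(\mu \) is a mixture of two standard Gaussians centered at \(\pm x_0\), \(R = 2\|x_0\|\), and Theorem 2 yields the dimension-free bound

$$\displaystyle \begin{aligned} \|\nabla\varphi^{\mathrm{flow}}(x)\|{}_{\mathrm{op}} \leq e^{2\|x_0\|{}^2}. \end{aligned}$$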

As mentioned above, the proofs of Theorems 1 and 2 follow from the analysis of Kim and Milman [15]. The main result of [15] is a generalization of Caffarelli’s result that establishes Lipschitz properties of \(\varphi ^{\mathrm {flow}}\), under an appropriate symmetry assumption. We shall extend the analysis to the classes of measures considered in Theorems 1 and 2. A similar, but in some sense orthogonal to this work, extension was recently performed by Klartag and Putterman [16, Section 3] where the authors considered transportation from \(\mu \) to \(\mu \star \gamma _d\). We also mention the concurrent work of Neeman in [22], which, using a similar method to one presented here, studied Lipschitz properties of bounded perturbations of the Gaussians, generalizing [9]. In the broader context, a similar map was recently used in [1].

Both of the results presented above deal with Lipschitz transport maps that push-forward the standard Gaussian. As discussed, and as we shall demonstrate, the existence of such maps is important for applications. However, one could also ask the reverse question: for which measures \(\mu \) do we have \(\gamma _d = \varphi _*\mu \), with \(\varphi \) Lipschitz?

To answer this question, we introduce the class of \(\beta \)-semi-log-convex measures: measures \(\mu \) on \(\mathbb R^d\) whose density satisfies,

$$\displaystyle \begin{aligned} -\nabla^2\log \left(\frac{d\mu}{dx}(x)\right) \preceq \beta\mathrm{I}_d, \end{aligned}$$

for some \(\beta > 0\). It follows from the definition that \(\mathrm {supp}(\mu )=\mathbb R^d\) (which is why \(\beta > 0\)). In some sense, this is a complementary notion to being \(\kappa \)-log-concave. Our next result makes this intuition precise.

Theorem 3

Let \(\beta > 0\) and let \(\mu \) be a \(\beta \)-semi-log-convex probability measure on \(\mathbb R^d\). Then, for the inverse map \((\varphi ^{\mathrm {flow}})^{-1}:\mathbb R^d \to \mathbb R^d\), which satisfies \((\varphi ^{\mathrm {flow}})^{-1}_*\mu = \gamma _d\),

$$\displaystyle \begin{aligned} \|\nabla(\varphi^{\mathrm{flow}})^{-1}(x)\|{}_{\mathrm{op}} \leq \sqrt{\beta}, \end{aligned}$$

for almost every \(x \in \mathbb R^d\).
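
Let us note that the bound of Theorem 3 cannot be improved. For \(\mu = \mathcal {N}(0, \frac {1}{\beta }\mathrm {I}_d)\), which is \(\beta \)-semi-log-convex with equality, the linear map \(x \mapsto \sqrt {\beta }x\) transports \(\mu \) to \(\gamma _d\) and is exactly \(\sqrt {\beta }\)-Lipschitz. Moreover, if \(\varphi \) is any L-Lipschitz map with \(\varphi _*\mu = \gamma _d\), the transfer argument in (1) below gives

$$\displaystyle \begin{aligned} 1 = C_{\mathrm{p}}(\gamma_d) \leq L^2\, C_{\mathrm{p}}(\mu) = \frac{L^2}{\beta}, \end{aligned}$$

so that \(L \geq \sqrt {\beta }\).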

Let us remark that the same question was previously addressed in [17, Theorem 2.2], which expanded upon Caffarelli’s original proof, and obtained the same Lipschitz bounds, for \((\varphi ^{\mathrm {opt}})^{-1}\). Thus, Theorem 3 gives a more complete picture by proving the analogous result for the map \((\varphi ^{\mathrm {flow}})^{-1}\).

Transport Along Heat Flows and the Brownian Transport Map

It is tempting to compare Theorems 1 and 2 to the recent construction in [20] of the Brownian transport map. The results apply in similar settings, and the asymptotic dependencies on all parameters are essentially the same. However, as we shall explain, the results are not strictly comparable.

The constructions are qualitatively different: the domain of the Brownian transport map is the infinite-dimensional Wiener space, in contrast to the finite-dimensional domain afforded by the above theorems. Since the Gaussian measure, even on the infinite-dimensional Wiener space, satisfies numerous functional inequalities with dimension-free constants, realizing a measure on \(\mathbb R^d\) as a push-forward of the Wiener measure suffices for many applications. However, some applications require a map between spaces of equal dimension, which explains the need for the present work. We expand on such applications below.

On the other hand, as demonstrated by Mikulincer and Shenfeld [20, Theorem 1.5], in several interesting cases, the Brownian transport map is provably ‘Lipschitz on average’. Bounding the averaged derivative of a transport map is an important property (related to the Kannan-Lovász-Simonovits conjecture [14] and to quantitative central limit theorems [20, Theorem 1.7]) that seems to be out of reach for current finite-dimensional constructions.

Having said the above, we do note that for log-concave measures, the Lipschitz constants of the Brownian transport map [20, Theorem 1.1] are usually better than the ones provided by Theorem 1. For Gaussian mixtures, the roles seem to reverse, at least when R is large, as Theorem 2 can be better than [20, Theorem 1.4].

1.1 Applications

As mentioned in the previous section, for some applications it is essential that the domain and image of the transport map coincide. Here we review such applications and state several new implications of Theorems 1 and 2. To keep the statements concise, we will not cover applications that could be obtained by previous results, as in [10, 20, 21].

1.1.1 Eigenvalues Comparisons

A measure \(\mu \) is said to satisfy a Poincaré inequality if, for some constant \(C_{\mathrm {p}}(\mu ) \geq 0\) and every test function g,

$$\displaystyle \begin{aligned} \mathrm{Var}_\mu (g) \leq C_{\mathrm{p}}(\mu)\int\limits_{\mathbb R^d} \|\nabla g\|{}^2d\mu. \end{aligned}$$

We implicitly assume that, when it exists, \(C_{\mathrm {p}}(\mu )\) denotes the optimal constant. According to the Gaussian Poincaré inequality [2], \(C_{\mathrm {p}}(\gamma _d) = 1\). If \(\mu = \varphi _*\gamma _d\) and \(\varphi \) is L-Lipschitz, this immediately implies \(C_{\mathrm {p}}(\mu ) \leq L^2\). Indeed,

$$\displaystyle \begin{aligned} {} \mathrm{Var}_{\mu}(g) &=\mathrm{Var}_{\gamma_d}(g\circ \varphi)\le \int\limits_{\mathbb R^d} \|\nabla(g\circ \varphi)\|{}^2d\gamma_d\\ &\le \int\limits_{\mathbb R^d} \|\nabla\varphi\|{}_{\mathrm{op}}^2\,(\|\nabla g\|\circ \varphi)^2d\gamma_d\le L^2\int\limits_{\mathbb R^d} \|\nabla g\|{}^2d\mu. \end{aligned} $$
(1)

Note that the same argument works even if \(\varphi \) is a map between spaces of different dimensions. However, for certain generalizations of the Poincaré inequality, as we now explain, it turns out to be beneficial for the domain of \(\varphi \) to coincide with the space on which \(\mu \) is defined. If \(\frac {d\mu }{dx} = e^{-V}\) and we define the weighted Laplacian \(\mathcal {L}_\mu = \Delta - \langle \nabla , \nabla V\rangle \), then \(C_{\mathrm {p}}(\mu )\) corresponds to the inverse of the first non-zero eigenvalue of \(\mathcal {L}_\mu \). In [21, Theorem 1.7], E. Milman showed that a similar argument to (1) works for higher order eigenvalues of \(\mathcal {L}_\mu \) and \(\mathcal {L}_{\gamma _d}\).
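
To make the spectral picture concrete, recall that the eigenfunctions of \(\mathcal {L}_{\gamma _d}\) are the Hermite polynomials \(H_\alpha \), indexed by multi-indices \(\alpha \in \mathbb N^d\), and satisfy

$$\displaystyle \begin{aligned} \mathcal{L}_{\gamma_d}H_\alpha = -|\alpha| H_\alpha, \end{aligned}$$

so that the eigenvalue \(k \in \mathbb N\) occurs with multiplicity \(\binom {k+d-1}{d-1}\).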

Since for \(\mathcal {L}_{\gamma _d}\) the multiplicities of the eigenvalues grow with the dimension d, the full power of Milman’s argument requires that \(\varphi \) is a map from \(\mathbb R^d\) to \(\mathbb R^d\). Thus, by considering the map \(\varphi ^{\mathrm {flow}}\) from Theorems 1 and 2 and applying Milman’s contraction principle, we immediately obtain:

Corollary 4

Let \(\mu \) be a probability measure on \(\mathbb R^d\) and let \(\lambda _i(\mathcal {L}_\mu )\) (resp. \(\lambda _i(\mathcal {L}_{\gamma _d})\)) stand for the \(i{\mathrm {th}}\) eigenvalue of \(\mathcal {L}_\mu \) (resp. \(\mathcal {L}_{\gamma _d}\)). Then,

  1.

    If \(\mu \) is \(\kappa \)-log-concave, \(D :=\mathrm {diam}(\mathrm {supp}(\mu ))\), and \(\kappa D^2 < 1\), then

    $$\displaystyle \begin{aligned} \frac{1}{e^{1 - \kappa D^2} D^2} \lambda_i(\mathcal{L}_{\gamma_d}) \leq \lambda_i(\mathcal{L}_{\mu}). \end{aligned}$$
  2.

    If \(\mu = \gamma _d\star \nu \) and \(\mathrm {diam}(\mathrm {supp}(\nu ))\leq R\), then

    $$\displaystyle \begin{aligned} \frac{1}{e^{R^2}} \lambda_i(\mathcal{L}_{\gamma_d}) \leq \lambda_i(\mathcal{L}_{\mu}). \end{aligned}$$
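
In particular, since \(C_{\mathrm {p}}\) is the inverse of the first non-zero eigenvalue, applying Corollary 4 to the first non-zero eigenvalue recovers the Poincaré estimates implied by (1) and Theorems 1 and 2:

$$\displaystyle \begin{aligned} C_{\mathrm{p}}(\mu) \leq e^{1 - \kappa D^2} D^2 \quad \text{and}\quad C_{\mathrm{p}}(\mu) \leq e^{R^2}, \end{aligned}$$

respectively.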

1.1.2 Dimensional Functional Inequalities

Another direction for improving and generalizing the Poincaré inequality goes through dimensional functional inequalities, as in [4].

Let us give a first example, in the form of the dimensional Gaussian log-Sobolev inequality [2], which is a strict improvement over the logarithmic Sobolev inequality. For \(g:\mathbb R^d \to \mathbb R_+\) we define its entropy relative to \(\mu \) as

$$\displaystyle \begin{aligned} \mathrm{Ent}_\mu(g):= \int\limits_{\mathbb R^d}\log(g)gd\mu - \log\left(\int\limits_{\mathbb R^d}gd\mu\right)\int\limits_{\mathbb R^d}gd\mu. \end{aligned}$$

For \(\gamma _d\), the following holds:

$$\displaystyle \begin{aligned} \mathrm{Ent}_{\gamma_d}(g) \leq \frac{d}{2} \log\left(1 + \frac{1}{d}\int\limits_{\mathbb R^d} \frac{\|\nabla g\|{}^2}{g}d\gamma_d\right). \end{aligned}$$
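
Since \(\log (1+u) \leq u\) for \(u > -1\), this dimensional form is indeed stronger than the standard Gaussian logarithmic Sobolev inequality:

$$\displaystyle \begin{aligned} \mathrm{Ent}_{\gamma_d}(g) \leq \frac{d}{2} \log\left(1 + \frac{1}{d}\int\limits_{\mathbb R^d} \frac{\|\nabla g\|{}^2}{g}d\gamma_d\right) \leq \frac{1}{2}\int\limits_{\mathbb R^d} \frac{\|\nabla g\|{}^2}{g}d\gamma_d. \end{aligned}$$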

With the same argument as in (1), and since the logarithm is monotone, we have the corollary:

Corollary 5

Let \(\mu \) be a probability measure on \(\mathbb R^d\) and let \(g:\mathbb R^d \to \mathbb R_+\) be a test function. Then,

  1.

    If \(\mu \) is \(\kappa \)-log-concave, \(D :=\mathrm {diam}(\mathrm {supp}(\mu ))\), and \(\kappa D^2 < 1\), then

    $$\displaystyle \begin{aligned} \mathrm{Ent}_\mu(g)\leq \frac{d}{2}\log\left(1 + \frac{e^{1 - \kappa D^2} D^2}{d}\int\limits_{\mathbb R^d} \frac{\|\nabla g\|{}^2}{g}d\mu\right). \end{aligned}$$
  2.

    If \(\mu = \gamma _d\star \nu \) and \(\mathrm {diam}(\mathrm {supp}(\nu ))\leq R\), then

    $$\displaystyle \begin{aligned} \mathrm{Ent}_\mu(g)\leq \frac{d}{2}\log\left(1 + \frac{e^{R^2}}{d}\int\limits_{\mathbb R^d} \frac{\|\nabla g\|{}^2}{g}d\mu\right). \end{aligned}$$

Another example is the dimensional weighted Poincaré inequality which appears in [5, Corollary 5.6], according to which,

$$\displaystyle \begin{aligned} {} \mathrm{Var}_{\gamma_d}(g) \leq \frac{d(d+3)}{d-1}\int\limits_{\mathbb R^d}\frac{\|\nabla g(x)\|{}^2}{1+\|x\|{}^2}d\gamma_d(x). \end{aligned} $$
(2)

For certain test functions, this is a strict improvement of the Gaussian Poincaré inequality. When the target measure is symmetric, we can adapt the argument in (1), and obtain:

Corollary 6

Let \(\mu \) be a symmetric probability measure on \(\mathbb R^d\). Then, for any test function \(g:\mathbb R^d \to \mathbb R\),

  1.

    If \(\mu \) is \(\kappa \)-log-concave, \(D :=\mathrm {diam}(\mathrm {supp}(\mu ))\), and \(\kappa D^2 < 1\), then

    $$\displaystyle \begin{aligned} \mathrm{Var}_{\mu}(g) \leq \frac{d(d+3)}{d-1}e^{1 -\kappa D^2}D^2\int\limits_{\mathbb R^d}\frac{\|\nabla g(x)\|{}^2}{1+\frac{e^{\kappa D^2-1}}{D^2}\|x\|{}^2}d\mu(x). \end{aligned}$$
  2.

    If \(\mu = \gamma _d\star \nu \) and \(\mathrm {diam}(\mathrm {supp}(\nu ))\leq R\), then

    $$\displaystyle \begin{aligned} \mathrm{Var}_{\mu}(g) \leq \frac{d(d+3)}{d-1}e^{R^2}\int\limits_{\mathbb R^d}\frac{\|\nabla g(x)\|{}^2}{1+e^{-R^2}\|x\|{}^2}d\mu(x). \end{aligned}$$

Proof

Suppose that \(\mu = \varphi _*\gamma _d\) where \(\varphi :\mathbb R^d \to \mathbb R^d\) is L-Lipschitz and satisfies \(\varphi (0) = 0\). Then, by (2),

$$\displaystyle \begin{aligned} \mathrm{Var}_{\mu}(g)&=\mathrm{Var}_{\gamma_d}(g\circ \varphi)\le \frac{d(d+3)}{d-1}\int\limits_{\mathbb R^d} \frac{\|\nabla(g\circ \varphi(x))\|{}^2}{1+\|x\|{}^2}d\gamma_d\\ & \le \frac{d(d+3)L^2}{d-1} \int\limits_{\mathbb R^d} \frac{(\|\nabla g\|\circ \varphi(x))^2}{1+\|x\|{}^2}d\gamma_d. \end{aligned} $$

To handle the integral on the right-hand side, we invoke the disintegration theorem [13, Theorems 1 and 2] to decompose \(\gamma _d\) along the fibers of \(\varphi \) in the following way: there exists a family of probability measures \(\{\gamma _x\}_{x\in \mathbb R^d}\) such that \(\mathrm {supp}(\gamma _x)\subset \varphi ^{-1}(\{x\})\) and

$$\displaystyle \begin{aligned} \int\limits_{\mathbb R^d} h(x)d\gamma_d(x) = \int\limits_{\mathbb R^d}\int\limits_{\varphi^{-1}(\{x\})}h(y)d\gamma_x(y)d\mu(x), \end{aligned}$$

for every test function h. Hence, taking \(h(x) = \frac {(\|\nabla g\| \circ \varphi (x))^2}{1+\|x\|{ }^2}\),

$$\displaystyle \begin{aligned} \int\limits_{\mathbb R^d}& \frac{(\|\nabla g\|\circ \varphi(x))^2}{1+\|x\|{}^2}d\gamma_d(x) = \int\limits_{\mathbb R^d} \int\limits_{\varphi^{-1}(\{x\})} \frac{(\|\nabla g\|\circ \varphi(y))^2}{1+\|y\|{}^2}d\gamma_x(y)d\mu(x)\\ &= \int\limits_{\mathbb R^d} \int\limits_{\varphi^{-1}(\{x\})} \frac{\|\nabla g(x)\|{}^2}{1+\|y\|{}^2}d\gamma_x(y)d\mu(x) \leq \int\limits_{\mathbb R^d} \int\limits_{\varphi^{-1}(\{x\})} \frac{\|\nabla g(x)\|{}^2}{1+L^{-2}\|x\|{}^2}d\gamma_x(y)d\mu(x)\\ &=\int\limits_{\mathbb R^d}\frac{\|\nabla g(x)\|{}^2}{1+L^{-2}\|x\|{}^2}\left(\int\limits_{\varphi^{-1}(\{x\})} d\gamma_x(y)\right)d\mu(x) = \int\limits_{\mathbb R^d}\frac{\|\nabla g(x)\|{}^2}{1+L^{-2}\|x\|{}^2}d\mu(x) \end{aligned} $$

where in the inequality we have used the estimate \(\|y\| \geq \frac {1}{L} \|x\|\) for any y such that \(\varphi (y) = x\). Indeed, by assumption, \(\varphi (0) = 0\) and \(\varphi \) is L-Lipschitz, which immediately yields \(\|\varphi (y)\| \leq L\|y\|.\)

Finally, when \(\mu \) is symmetric, our transport map, \(\varphi := \varphi ^{\mathrm {flow}}\), will turn out to be odd and, hence, satisfies \(\varphi ^{\mathrm {flow}}(0) = 0\) (see Remark 8). The result follows by combining the above calculations with Theorems 1 and 2. □

1.1.3 Majorization

For an absolutely continuous measure \(\mu \), define its distribution function by

$$\displaystyle \begin{aligned} F_{\mu}(\lambda) = \mathrm{Vol}\left(\left\{x: \frac{d\mu}{dx}(x)\geq \lambda\right\}\right). \end{aligned}$$

We say that \(\mu \) majorizes \(\eta \), denoted as \(\eta \prec \mu \), if for every \(t\in \mathbb R\),

$$\displaystyle \begin{aligned} \int\limits_t^\infty F_\eta(\lambda)d\lambda \leq \int\limits_t^\infty F_\mu(\lambda)d\lambda. \end{aligned}$$

In [19, Lemma 1.4], the following assertion is proven: If \(\mu = \varphi _*\eta \) for some \(\varphi :\mathbb R^d \to \mathbb R^d\), and \(|\det (\nabla \varphi (x))|\leq 1\) for every \(x \in \mathbb R^d\), then \(\eta \prec \mu \).

We use the singular value decomposition to deduce the identity \(|\det (\nabla \varphi (x))| = \prod \limits _{i=1}^d\sigma _i(\nabla \varphi (x))\), where \(\sigma _i(\nabla \varphi (x))\) stands for the \(i{\mathrm {th}}\) singular value of \(\nabla \varphi (x)\). Since \(\sigma _i(\nabla \varphi (x)) \leq \|\nabla \varphi (x)\|{ }_{\mathrm {op}}\) for every i, we have the implication,

$$\displaystyle \begin{aligned} \|\nabla\varphi(x)\|{}_{\mathrm{op}} \leq 1 \implies |\det(\nabla\varphi(x))| \leq 1. \end{aligned}$$

By using Theorems 1 and 2 we can find regimes of parameters where \(\varphi ^{\mathrm {flow}}\) is 1-Lipschitz as required by the computation above. For log-concave measures it is enough to have a sufficiently bounded support, while for Gaussian mixtures one needs to both re-scale the variance and bound the support of the mixing measure. With this in mind, we get the following corollary:

Corollary 7

Let \(\mu \) be a probability measure on \(\mathbb R^d\).

  1.

    If \(\mu \) is \(\kappa \)-log-concave, \(D :=\mathrm {diam}(\mathrm {supp}(\mu ))\), \(\kappa D^2 < 1\), and \(e^{\frac {1 - \kappa D^2}{2}} D \leq 1\), then,

    $$\displaystyle \begin{aligned} \gamma_d \prec \mu. \end{aligned}$$
  2.

    If \(\mu = \gamma _d^a\star \nu \), where \(\gamma _d^a\) stands for the Gaussian measure with covariance \(a\mathrm {I}_d\), and \(\sqrt {a}e^{\frac {\mathrm {diam}(\mathrm {supp}(\nu ))^2}{2a}} \leq 1\), then,

    $$\displaystyle \begin{aligned} \gamma_d \prec \mu. \end{aligned}$$

Proof

For the first part, the condition \(e^{\frac {1 - \kappa D^2}{2}} D\leq 1\), along with Theorem 1, ensures that the transport map \(\varphi ^{\mathrm {flow}}\) is 1-Lipschitz. The claim follows from [19, Lemma 1.4].

For the second part, let \(a > 0\) and \(X \sim \gamma ^a_d \star \nu \), where \(\mathrm {diam}(\mathrm {supp}(\nu )) = R\). Then, \(\frac {1}{\sqrt {a}}X \sim \gamma _d\star \tilde {\nu }\), and \(\mathrm {diam}(\mathrm {supp}(\tilde {\nu })) \leq \frac {R}{\sqrt {a}}\). Let \(\varphi ^{\mathrm {flow}}\) be the \(e^{\frac {R^2}{2a}}\)-Lipschitz map, from Theorem 2, that transports \(\gamma _d\) to \(\gamma _d\star \tilde {\nu }.\) The above argument shows that \(\sqrt {a}\varphi ^{\mathrm {flow}}\) transports \(\gamma _d\) to \(\gamma ^a_d \star \nu \) and the map is \(\sqrt {a}e^{\frac {R^2}{2a}}\)-Lipschitz. Thus, if \(\sqrt {a}e^{\frac {R^2}{2a}} \leq 1\), there exists a 1-Lipschitz transport map, which implies the result. □

The fact that a measure majorizes the standard Gaussian has some interesting consequences. We state here one example, which appears in the proof of [19, Corollary 2.14]. If \(\gamma _d \prec \mu \), then

$$\displaystyle \begin{aligned} h_q(\gamma_d) \leq h_q(\mu), \end{aligned}$$

where, for \(q > 0\),

$$\displaystyle \begin{aligned} h_q(\mu) :=\frac{\log\left(\int\limits_{\mathbb R^d}\left(\frac{d\mu}{dx}(x)\right)^qdx\right)}{1-q}, \end{aligned}$$

is the q-Rényi entropy. So, Corollary 7 allows us to bound the q-Rényi entropy from below for some measures.
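
For reference, a direct computation with the Gaussian density makes the resulting lower bound explicit:

$$\displaystyle \begin{aligned} h_q(\gamma_d) = \frac{\log\left((2\pi)^{\frac{d(1-q)}{2}}q^{-\frac{d}{2}}\right)}{1-q} = \frac{d}{2}\left(\log(2\pi) + \frac{\log (q)}{q-1}\right), \end{aligned}$$

which, as \(q \to 1\), recovers the Shannon entropy \(\frac {d}{2}\left (\log (2\pi ) + 1\right )\) of \(\gamma _d\).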

2 Proofs

2.1 Preliminaries

Before proving the main results, we briefly recall the construction of the transport map from [15, 24]. We take an informal approach and provide a rigorous statement at the end of the section.

Let \((Q_t)_{t\geq 0}\) stand for the Ornstein-Uhlenbeck semi-group, acting on functions \(g:\mathbb R^d \to \mathbb R\) by,

$$\displaystyle \begin{aligned} Q_tg(x) =\int\limits_{\mathbb R^d}g(e^{-t}x + \sqrt{1-e^{-2t}}y)d\gamma_d(y). \end{aligned}$$

For sufficiently integrable g, we have, for almost every \(x \in \mathbb R^d\),

$$\displaystyle \begin{aligned} Q_0g(x) = g(x) \text{ and } \lim\limits_{t \to \infty} Q_tg(x) = \mathbb E_{\gamma_d}[g]. \end{aligned}$$

Now, fix \(\mu \), a measure on \(\mathbb R^d\), with \(f(x):=\frac {d\mu }{d\gamma _d}(x)\), and consider the measure-valued path \(\mu _t := (Q_tf)\gamma _d\). We have \(\mu _0 = \mu \) and, for well-behaved measures, we also have that \(\mu _t\) converges weakly to \(\gamma _d\), as \(t \to \infty \) (see Lemma 10 below). Thus, there exists a time-dependent vector field \(V_t\), for which the continuity equation holds (see [29, Chapter 8] and [26, Section 4.1.2]):

$$\displaystyle \begin{aligned} \frac{d}{dt}\mu_t + \nabla \cdot (V_t\mu_t) = 0. \end{aligned}$$

In other words, by differentiating under the integral sign, for any test function g,

$$\displaystyle \begin{aligned} \int\limits_{\mathbb R^d}g\left(\frac{d}{dt}Q_tf\right)d\gamma_d = \int\limits_{\mathbb R^d}\langle \nabla g ,V_t\rangle (Q_tf)d\gamma_d. \end{aligned}$$

We now turn to computing \(V_t\). Observe that, by the definition of \(Q_t\),

$$\displaystyle \begin{aligned} \frac{d}{dt}Q_tf(x) = \Delta Q_tf(x) - \langle x, \nabla Q_tf(x)\rangle. \end{aligned}$$

Hence, integrating by parts with respect to the standard Gaussian shows,

$$\displaystyle \begin{aligned} \int\limits_{\mathbb R^d}g\left(\frac{d}{dt}Q_tf\right)d\gamma_d = -\int\limits_{\mathbb R^d}\langle \nabla g, \nabla Q_tf\rangle d\gamma_d, \end{aligned}$$

whence it follows that \(V_t = -\frac {\nabla Q_tf}{Q_tf} = -\nabla \log Q_tf.\) Now consider the maps \(\{S_t\}_{t \geq 0}\), obtained as the solution to the differential equation

$$\displaystyle \begin{aligned} {} \frac{d}{dt}S_t(x) =V_t(S_t(x)), \ \ \ S_0(x) = x. \end{aligned} $$
(3)

The map \(S_t\) turns out to be a diffeomorphism which transports \(\mu _0\) to \(\mu _t\), and we denote \(T_t := S_t^{-1}\), which transports \(\mu _t\) to \(\mu _0\). We define the transport maps T and S as the limits

$$\displaystyle \begin{aligned} T:=\lim\limits_{t \to \infty}T_t,\ \ S:=\lim\limits_{t \to \infty}S_t, \end{aligned}$$

in which case, we have \(T_*\gamma _d = \mu \) and \(S_*\mu = \gamma _d\). These are our transport maps:

$$\displaystyle \begin{aligned} \varphi^{\mathrm{flow}}:= T \quad \text{and}\quad (\varphi^{\mathrm{flow}})^{-1} := S. \end{aligned}$$
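
As a simple illustration, in the Gaussian case the construction can be carried out in closed form. Take \(\mu = \mathcal {N}(0,\sigma ^2\mathrm {I}_d)\) with \(\sigma > 0\) and write \(\beta = \frac {1}{\sigma ^2}\), so that \(f(x) \propto e^{-\frac {1}{2}(\beta -1)\|x\|{ }^2}\). The computation in the proof of Lemma 13 below (see (7)) shows that \(\log Q_tf\) is a centered quadratic, whence

$$\displaystyle \begin{aligned} V_t(x) = -\nabla \log Q_tf(x) = \frac{e^{-2t}(\beta - 1)}{(1-e^{-2t})(\beta-1)+1}x, \end{aligned}$$

and solving the linear equation (3) gives

$$\displaystyle \begin{aligned} S_t(x) = \sqrt{(1-e^{-2t})(\beta-1)+1}\,x \xrightarrow[t\to\infty]{} \sqrt{\beta}\,x. \end{aligned}$$

Hence \(T(x) = \sigma x\): the flow recovers the linear map transporting \(\gamma _d\) to \(\mathcal {N}(0,\sigma ^2\mathrm {I}_d)\), whose Lipschitz constant \(\sigma = \frac {1}{\sqrt {\beta }}\) matches Item 1 of Theorem 1 with equality.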

Remark 8

It is clear that if \(f(x) = f(-x)\), then \(V_t\) and, consequently, \(S_t\) (see the discussion following [15, Lemma 3.1]) are odd functions. Hence, if the target measure is symmetric, \(T(0) = 0\).

The above arguments are heuristic and require a rigorous justification (as in [15, Section 3]). For the sake of completeness, below, in Lemma 10, we provide sufficient conditions for the existence of the diffeomorphisms \(\{S_t\}_{t \geq 0}\), \(\{T_t\}_{t \geq 0}\) and for the existence of the transport maps S and T.

We shall require the following approximation lemma, adapted from [22, Lemma 2.1] (a generalization of [15, Lemma 3.2]), which we shall repeatedly use.

Lemma 9

Let \(\eta \) and \(\eta '\) be two probability measures on \(\mathbb R^d\), and let \(\{\eta _k\}_{k \geq 0}\), \(\{\eta ^{\prime }_k\}_{k \geq 0}\) be two sequences of probability measures which converge to \(\eta \) and \(\eta '\) in distribution. Suppose that for every k there exists an \(L_k\)-Lipschitz map \(\varphi _k\) with \((\varphi _k)_*\eta _k=\eta _k^{\prime }\). Then, if \(L:=\limsup \limits _{k \to \infty } L_k < \infty \), there exists an L-Lipschitz map \(\varphi \) with \(\varphi _*\eta =\eta '\). Moreover, by passing to a sub-sequence, we have that for \(\eta \)-almost every x,

$$\displaystyle \begin{aligned} \lim\limits_{k\to \infty} \varphi_k(x) = \varphi(x). \end{aligned}$$

Proof

Under the assumptions of the lemma, the existence of the limiting map \(\varphi \) is assured by the proof of [22, Lemma 2.1]. We are left with showing that \(\varphi \) is L-Lipschitz. Let \(r > 0\), and observe that, since \(\limsup L_k < \infty \), there exists a sub-sequence, still denoted \(\varphi _k\), such that, for every \(k \geq 0\), \(\varphi _k\) is \((L+r)\)-Lipschitz. It follows from [22, Lemma 2.1] that \(\varphi \) is \((L+r)\)-Lipschitz. Since r is arbitrary the proof is complete. □

We are now ready to state our main technical lemma.

Lemma 10

Assume that \(\mu \) has a smooth density.

  • Suppose that, for every \(t \geq 0\), there exists \(a_t < \infty \) such that,

    $$\displaystyle \begin{aligned} {} \sup_{s\in [0,t]}\|\nabla V_s\|{}_{\mathrm{op}} \leq a_t. \end{aligned} $$
    (4)

    Then, there exists a solution \(\{S_t\}_{t \geq 0}\) to (3), which is a diffeomorphism for every \(t \geq 0\).

  • As \(t \to \infty \), \(\mu _t\) converges weakly to \(\gamma _d\).

  • Suppose (4) holds, and that, for every \(t \geq 0\), \(T_t\) (resp. \(S_t\)) is \(L_t\)-Lipschitz. Then, if \(L:= \limsup \limits _{t\to \infty }L_t < \infty \), the map T (resp. S) is well-defined and L-Lipschitz.

Proof

Combining the assumption on the smoothness of \(\frac {d\mu }{dx}\) with (4) gives that, for every \(T < \infty \), V is a smooth, spatially Lipschitz function on \([0,T] \times \mathbb R^d\). Thus, by the Picard–Lindelöf theorem [23, Theorem 3.1], there exists a unique global smooth (see [12, Chapter 1, Theorem 3.3] and the subsequent discussion) solution \(S_t\) to (3). By inverting the flow, one may see that the maps \(S_t\) are invertible. Indeed, for fixed \(t > 0\), consider, for \(0\leq s \leq t\),

$$\displaystyle \begin{aligned} \frac{d}{ds}T_{t,s}(x)= -V_{t-s}(T_{t,s}(x)), \ \ \ T_{t,0}(x) = x. \end{aligned}$$

Then, \(S_t^{-1} := T_t := T_{t,t}\), which establishes the first item.

For the second item, note that the Ornstein-Uhlenbeck process is ergodic (see, for example, [15, Lemma 3.2]) and, hence,

$$\displaystyle \begin{aligned} \lim\limits_{t \to \infty} \|Q_tf - \mathbb E_{\gamma_d}[f]\|{}_{L_1(\gamma_d)} = \lim\limits_{t \to \infty} \|Q_tf - 1\|{}_{L_1(\gamma_d)} = 0. \end{aligned}$$

Thus, \(\mu _t\) converges to \(\gamma _d\) in total variation, implying weak convergence.

To see the third item, note that the first item establishes the existence of maps \(S_t\) which satisfy \((S_t)_*\mu = \mu _t\) [26, Section 4.1.2]. The second item shows that, as \(t \to \infty \), we may approximate \(\gamma _d\) by \(\mu _t\). These conditions allow us to invoke Lemma 9, which shows that there exists a sequence \(t_k \to \infty \), such that, for \(\mu \)-almost every x,

$$\displaystyle \begin{aligned} S(x):= \lim\limits_{k \to \infty}S_{t_k}(x), \end{aligned}$$

is well-defined and such that \(S_*\mu = \gamma _d\). Since \(S_t\) is invertible, for every \(t \geq 0\), the same argument, applied to \(T_t\), shows the existence of T.

Finally, let us address the Lipschitz constants of S and T. We shall prove the claim for S; the proof for T is identical. The previous argument shows that there exists a null set \(E \subset \mathrm {supp}(\mu )\), such that, for every \(z \in \mathrm {supp}(\mu ) \setminus E\), \(\lim \limits _{k \to \infty }S_{t_k}(z)\) exists. So, for any \(x,y \in \mathrm {supp}(\mu ) \setminus E\),

$$\displaystyle \begin{aligned} \|S(x) - S(y)\| = \lim\limits_{k \to \infty}\|S_{t_k}(x) - S_{t_k}(y)\| \leq \limsup_{k \to \infty} L_{t_k}\|x - y\| \leq L\|x-y\|. \end{aligned} $$

This shows \(\|S(x) - S(y)\| \leq L\|x-y\|\), \(\mu \)-almost everywhere, which finishes the proof. □

We shall also require the following lemma, which explains how to deduce global Lipschitz bounds from estimates on the derivatives of the vector fields \(V_t\).

Lemma 11

Let the above notation prevail and assume that \(\mu \) has a smooth density. For every \(t \geq 0\), let \(\theta ^{\mathrm {max}}_t,\theta ^{\mathrm {min}}_t\) be such that

$$\displaystyle \begin{aligned} \theta^{\mathrm{max}}_t \geq \lambda_{\mathrm{max}}\left(-\nabla V_t(x)\right) \geq \lambda_{\mathrm{min}}\left(-\nabla V_t(x)\right)\geq \theta^{\mathrm{min}}_t, \end{aligned}$$

for almost every \(x \in \mathbb R^d\). Then,

  1.

    The Lipschitz constant of S is at most \(\exp \left (-\int \limits _0^\infty \theta ^{\mathrm {min}}_tdt\right )\).

  2.

    The Lipschitz constant of T is at most \(\exp \left (\int \limits _0^\infty \theta ^{\mathrm {max}}_tdt\right )\).

Proof

We begin with the first item. For every \(t \geq 0\), we will show that

$$\displaystyle \begin{aligned} {} \|S_t(x)-S_t(y)\| \leq \exp\left(-\int\limits_0^t\theta^{\mathrm{min}}_sds\right)\|x-y\| \text{ for every } x,y \in \mathbb R^d. \end{aligned} $$
(5)

The desired result will be obtained by taking \(t \to \infty \) and invoking Item 3 of Lemma 10.

Towards (5), it will suffice to show that, for every unit vector \(w \in \mathbb R^d\),

$$\displaystyle \begin{aligned} \|\nabla S_t(x) w\|\leq \exp\left(-\int\limits_0^t\theta^{\mathrm{min}}_sds\right). \end{aligned}$$

Fix \(x,w \in \mathbb R^d\) with \(\|w\|=1\), and define the function \(\alpha _w(t) := \nabla S_t(x) w\). To understand the evolution of \(\|\alpha _w(t)\|\), recall that \(S_t\) satisfies the differential equation in (3). Thus,

$$\displaystyle \begin{aligned} \frac{d}{dt}\|\alpha_w(t)\| &= \frac{1}{\|\alpha_w(t)\|}\alpha_w(t)^{\mathsf{T}}\cdot \frac{d}{dt}\alpha_w(t)\\ & =\frac{1}{\|\alpha_w(t)\|}w^{\mathsf{T}}\nabla S_t(x)^{\mathsf{T}}\nabla V_t(S_t(x))\nabla S_t(x) w\\ &\leq -\theta_t^{\mathrm{min}}\frac{1}{\|\alpha_w(t)\|}w^{\mathsf{T}}\nabla S_t(x)^{\mathsf{T}}\nabla S_t(x) w = -\theta^{\mathrm{min}}_t \|\nabla S_t(x) w\| \\ & = -\theta_t^{\mathrm{min}} \|\alpha_w(t)\|. \end{aligned} $$

Since \(\|\alpha _w(0)\| =1\), from Gronwall’s inequality we deduce,

$$\displaystyle \begin{aligned} \|\nabla S_t(x) w\| = \|\alpha_w(t)\| \leq \exp\left(-\int\limits_0^t\theta_s^{\mathrm{min}}ds\right). \end{aligned}$$

Thus, (5) is established, as required.

The proof of the second part is similar, but this time we will need to show, for every unit vector \(w \in \mathbb R^d\),

$$\displaystyle \begin{aligned} \|\nabla S_t(x) w\|\geq \exp\left(-\int\limits_0^t\theta^{\mathrm{max}}_sds\right). \end{aligned}$$

Indeed, this would imply \(\nabla S_t(x) \nabla S_t(x)^{\mathsf {T}} \succeq \exp \left (-2\int \limits _0^t\theta ^{\mathrm {max}}_sds\right )\mathrm {I}_d\). Since \(S_t\) is a diffeomorphism, and \(T_t = S_t^{-1}\), by the inverse function theorem, the local expansiveness of \(S_t\) implies

$$\displaystyle \begin{aligned} \nabla T_t(x)\nabla T_t(x)^{\mathsf{T}} \preceq \exp\left(2\int\limits_0^t\theta^{\mathrm{max}}_sds\right)\mathrm{I}_d. \end{aligned}$$

So, for almost every \(x \in \mathbb R^d\), \(\|\nabla T_t(x)\|{ }_{\mathrm {op}} \leq \exp \left (\int \limits _0^t\theta ^{\mathrm {max}}_sds\right )\), and the claim is proven by, again, invoking Item 3 in Lemma 10.

Let \(\alpha _w(t)\) be as above. Then,

$$\displaystyle \begin{aligned} \frac{d}{dt}\|\alpha_w(t)\| &= \frac{1}{\|\alpha_w(t)\|}\alpha_w(t)^{\mathsf{T}}\cdot \frac{d}{dt}\alpha_w(t)\\ & =\frac{1}{\|\alpha_w(t)\|}w^{\mathsf{T}}\nabla S_t(x)^{\mathsf{T}}\nabla V_t(S_t(x))\nabla S_t(x) w\\ &\geq -\theta^{\mathrm{max}}_t\frac{1}{\|\alpha_w(t)\|}w^{\mathsf{T}}\nabla S_t(x)^{\mathsf{T}}\nabla S_t(x) w = -\theta^{\mathrm{max}}_t \|\nabla S_t(x) w\| \\ & = -\theta^{\mathrm{max}}_t \|\alpha_w(t)\|. \end{aligned} $$

As before, Gronwall’s inequality implies

$$\displaystyle \begin{aligned} \|\nabla S_t(x) w\| = \|\alpha_w(t)\| \geq \exp\left(-\int\limits_0^t\theta^{\mathrm{max}}_sds\right), \end{aligned}$$

which concludes the proof. □

2.2 Lipschitz Properties of Transportation Along Heat Flows

2.2.1 Transportation from the Gaussian

Our proofs of Theorems 1 and 2 go through bounding the derivative, \(\nabla V_t = -\nabla ^2\log Q_tf\), of the vector field constructed above, and then applying Lemma 11. Our main technical tools are uniform estimates on \(\nabla ^2\log Q_tf\), when the measures satisfy some combination of convexity and boundedness assumptions.

Lemma 12

Let \(\mu =f \gamma _d\) and let \(D:=\mathrm {diam}(\mathrm {supp}(\mu ))\). Then, for \(\mu \)-almost every x,

$$\displaystyle \begin{aligned} -\nabla V_t(x) \succeq -\frac{e^{-2t}}{1-e^{-2t}}\mathrm{I}_d. \end{aligned}$$

Furthermore,

  1.

    For every \(t\geq 0\),

    $$\displaystyle \begin{aligned} -\nabla V_t(x) \preceq e^{-2t}\left(\frac{D^2}{(1-e^{-2t})^2}-\frac{1}{1-e^{-2t}}\right)\mathrm{I}_d. \end{aligned}$$
  2.

    Let \(\kappa \in \mathbb R\) and suppose that \(\mu \) is \(\kappa \)-log-concave. Then,

    $$\displaystyle \begin{aligned} -\nabla V_t(x) \preceq e^{-2t}\frac{1-\kappa}{\kappa(1-e^{-2t})+e^{-2t}}\mathrm{I}_d, \end{aligned}$$

    where the inequality holds for any \(t \geq 0\) when \(\kappa \geq 0\), and for \(t\in \left [0, \frac {1}{2}\log \left (\frac {\kappa -1}{\kappa }\right )\right ]\) if \(\kappa <0\).

  3.

    If \(\mu :=\gamma _d\star \nu \), with \(\mathrm {diam}(\mathrm {supp}(\nu ))\leq R\), then, for \(t \geq 0\),

    $$\displaystyle \begin{aligned} -\nabla V_t(x) \preceq e^{-2t}R^2\mathrm{I}_d. \end{aligned}$$

Proof

Let \((P_t)_{t \in [0,1]}\) stand for the heat semi-group, related to \(Q_t\) by \(Q_tf(x) = P_{1-e^{-2t}}f(e^{-t}x)\). In particular,

$$\displaystyle \begin{aligned} -\nabla V_t(x) = \nabla^2 \log Q_tf(x) = e^{-2t}\nabla^2 \log P_{1-e^{-2t}}f(e^{-t}x). \end{aligned}$$

The desired result is now an immediate consequence of [20, Lemma 3.3 and Equation (3.3)], where [20] uses the notation \(v(t,x):= \nabla \log P_{1-t}f(x)\). □

By integrating Lemma 12 and plugging the result into Lemma 11 we can now prove Theorems 1 and 2. We begin with the proof of Theorem 2, which is easier.

Proof of Theorem 2

Recall that \(\varphi ^{\mathrm {flow}}\) is the transport map T, constructed in Sect. 2.1. Remark that the conditions of Lemma 10 are satisfied for the measures we consider: Lemma 12 ensures that (4) holds and \(\mu \) has a smooth density.

If \(\mu :=\gamma _d\star \nu \), and \(\nu \) is supported on a ball of radius R, then, by Lemma 12, we may take \(\theta ^{\mathrm {max}}_t = e^{-2t}R^2\) in Lemma 11. Compute

$$\displaystyle \begin{aligned} \int\limits_0^\infty \theta^{\mathrm{max}}_tdt = \frac{R^2}{2}. \end{aligned} $$

Thus, \(\varphi ^{\mathrm {flow}}\) is Lipschitz with constant \(e^{\frac {R^2}{2}}\). □

The proof of Theorem 1 is similar, but the calculations involved are more tedious, even if elementary.

Proof of Theorem 1

We begin by assuming that \(\mu \) has a smooth density, and handle the general case later with an approximation argument. Thus, as in the proof of Theorem 2, the conditions of Lemma 10 are satisfied, and we recall that \(\varphi ^{\mathrm {flow}}\) is the transport map T. The first item of the Theorem is covered by Kim and Milman [15, Theorem 1.1] (the authors actually prove it for \(\kappa = 1\); the general case follows by a re-scaling argument), so we may assume \(\kappa D^2 < 1\). Set \(t_0 = \frac {1}{2} \log \Big (\frac {D^2 (\kappa - 1) - 1}{\kappa D^2 - 1}\Big ).\) By optimizing over the first and second estimates in Lemma 12 we define,

$$\displaystyle \begin{aligned} \theta^{\mathrm{max}}_t = \begin{cases} \frac{e^{-2t}(1-\kappa)}{\kappa(1-e^{-2t})+e^{-2t}}& \text{if }t \in [0,t_0]\\ e^{-2t}\left(\frac{D^2}{(1-e^{-2t})^2}-\frac{1}{1-e^{-2t}}\right)& \text{if }t > t_0 \end{cases}. \end{aligned}$$

Remark that when \(\kappa < 0\), \(t_0 < \frac {1}{2}\log \left (\frac {\kappa -1}{\kappa }\right )\), so the second bound of Lemma 12 remains valid in this case.

We compute,

$$\displaystyle \begin{aligned} \int\limits_0^\infty \theta^{\mathrm{max}}_tdt &= \int\limits_0^{t_0} \theta^{\mathrm{max}}_tdt + \int\limits_{t_0}^{\infty} \theta^{\mathrm{max}}_tdt\\ &= \int\limits_0^{t_0}\frac{e^{-2t}(1-\kappa)}{\kappa(1-e^{-2t})+e^{-2t}}dt + \int\limits_{t_0}^{\infty}e^{-2t}\left(\frac{D^2}{(1-e^{-2t})^2}-\frac{1}{1-e^{-2t}}\right)dt\\ &= -\frac{1}{2}\log(\kappa(1-e^{-2t}) + e^{-2t})\Bigg\vert_{0}^{t_0} +\frac{1}{2}\left(-\frac{D^2}{1-e^{-2t}}-\log(1-e^{-2t})\right)\Bigg\vert_{t_0}^{\infty}\\ &= \frac{1}{2}\log\left(1 - D^2(\kappa-1)\right) + \frac{1 - \kappa D^2}{2} +\frac{1}{2}\log(D^2) \\ & - \frac{1}{2}\log(1-D^2(\kappa-1))\\ &= \frac{1 - \kappa D^2}{2} +\frac{1}{2}\log(D^2). \end{aligned} $$

By Lemma 11, the Lipschitz constant of \(\varphi ^{\mathrm {flow}}\) is at most

$$\displaystyle \begin{aligned} \exp\left(\int\limits_0^\infty \theta^{\mathrm{max}}_tdt\right) = e^{\frac{1 - \kappa D^2}{2}} D. \end{aligned}$$

If \(\mu \) does not have a smooth density, by Lemma 9, it will be enough to show that \(\mu \) can be approximated in distribution by a family \(\{\mu _\varepsilon \}_{\varepsilon > 0}\), where each \(\mu _\varepsilon \) is \(\kappa \)-log-concave with a smooth density and bounded support, and

$$\displaystyle \begin{aligned} \lim\limits_{\varepsilon \to 0}\mathrm{diam}(\mathrm{supp}(\mu_\varepsilon)) = D. \end{aligned}$$

For \(\varepsilon > 0\), let \(h_\varepsilon (x) = e^{-\frac {1}{1-\|\frac {x}{\varepsilon }\|{ }^2}}{\mathbf {1}}_{\{\|x\| \leq \varepsilon \}}\) and define the measure \(\xi _\varepsilon \) with density proportional to \(h_\varepsilon \). Then, \(\xi _\varepsilon \) is a log-concave measure with smooth density and \(\mathrm {diam}(\mathrm {supp}(\xi _\varepsilon )) = 2\varepsilon \). It is straightforward to verify that, for every \(\varepsilon > 0\), \(\mu _\varepsilon := \xi _\varepsilon \star \mu \) is \(\kappa \)-log-concave with smooth density. Further, as \(\varepsilon \to 0\), \(\mu _\varepsilon \) converges to \(\mu \), in distribution, and \(\lim \limits _{\varepsilon \to 0}\mathrm {diam}(\mathrm {supp}(\mu _\varepsilon )) = D.\) The claim is proven. □

2.2.2 Transportation to the Gaussian

To prove Theorem 3 we will need an analogue of Lemma 12 with bounds in the other direction. This is done in the following lemma which shows that the evolution of log-convex functions along the heat flow is dominated by the evolution of Gaussian functions. The proof of the lemma is similar to the proof that strongly log-concave measures are preserved under convolution, [27, Theorem 3.7(b)]. The only difference between the proofs is that the use of the Prékopa-Leindler inequality is replaced by the fact that a mixture of log-convex functions is log-convex.

Lemma 13 (Semi-Log-Convexity Under the Heat Flow)

Let \(d\mu = f\,d\gamma _d\) be a \(\beta \)-semi-log-convex probability measure on \(\mathbb R^d\). Then, for almost every x,

$$\displaystyle \begin{aligned} -\nabla V_t(x) \succeq \frac{e^{-2t}\left(1 - \beta\right)}{(1-e^{-2t})\left(\beta -1\right) +1}\mathrm{I}_d. \end{aligned}$$

Proof

We let \((P_t)_{t \geq 0}\) stand for the heat semi-group, defined by

$$\displaystyle \begin{aligned} P_tf(x) = \int\limits_{\mathbb R^d} f(x + \sqrt{t}y)d\gamma_d(y). \end{aligned}$$

Since \(-\nabla V_t(x) = \nabla ^2 \log Q_tf(x) = e^{-2t}\nabla ^2\log P_{1-e^{-2t}}f(e^{-t}x)\), it will be enough to prove,

$$\displaystyle \begin{aligned} {} \nabla^2\log P_tf(x) \succeq \frac{\left(1 - \beta\right)}{t\left(\beta -1\right) +1}\mathrm{I}_d. \end{aligned} $$
(6)

We first establish the claim in the special case when \(f(x) := \psi _\beta (x) \propto e^{-\frac {1}{2}\left (\beta - 1\right )\|x\|{ }^2}\), where the symbol \(\propto \) signifies equality up to a constant which does not depend on x; this corresponds to \(\mu = \mathcal {N}(0, \frac {1}{\beta }\mathrm {I}_d)\). This case is facilitated by the fact that \(P_t\) acts on f by convolving it with a Gaussian kernel. The result follows since a convolution of Gaussians is a Gaussian and since \(\nabla ^2\log \) applied to a Gaussian density yields the negative of the inverse covariance matrix. To elucidate what comes next, we provide the full calculation below.

For convenience denote \(\beta _t = \left (t\left (\beta -1\right ) + 1 \right )\), and compute,

$$\displaystyle \begin{aligned} P_t\psi_\beta(x) &\propto \int\limits_{\mathbb R^d} e^{- \frac{1}{2}\left(\beta-1\right)\|x + \sqrt{t}y\|{}^2}e^{-\frac{\|y\|{}^2}{2}}dy \\ &= \int\limits_{\mathbb R^d} \exp\left(-\frac{1}{2}\left(\left(\beta-1\right)\|x\|{}^2 + 2\left(\beta-1\right)\sqrt{t}\langle x, y\rangle \right.\right.\\ & \quad \left.\left.+ \left(t\left(\beta-1\right) + 1 \right)\|y\|{}^2\right)\right)dy\\ &= \int\limits_{\mathbb R^d} \exp\left(-\frac{\beta_t}{2}\left(\frac{\beta-1}{\beta_t}\|x\|{}^2 + 2\sqrt{t}\frac{\beta-1}{\beta_t}\langle x, y\rangle + \|y\|{}^2\right)\right)dy\\ &= \exp\left(-\frac{\beta_t}{2}\left(\frac{\beta-1}{\beta_t}\left(1 - t\frac{\beta-1}{\beta_t}\right)\right)\|x\|{}^2\right)\\ &\quad \times \int \exp\left(-\frac{\beta_t}{2}\left\|\sqrt{t}\frac{\beta-1}{\beta_t}x+y\right\|{}^2\right)dy. \end{aligned} $$

The integrand in the last line is proportional to the density of a Gaussian. Hence, the value of the integral does not depend on x, and

$$\displaystyle \begin{aligned} P_t\psi_\beta(x)&\propto \exp\left(-\frac{\beta_t}{2}\left(\frac{\beta-1}{\beta_t}\left(1 - t\frac{\beta-1}{\beta_t}\right)\right)\|x\|{}^2\right)\\ &= \exp\left(-\frac{1}{2}\left(\left(\beta-1\right)\left(1 - t\frac{\beta-1}{\beta_t}\right)\right)\|x\|{}^2\right)\\ &= \exp\left(-\frac{1}{2}\left(\frac{\beta-1}{t\left(\beta-1\right)+1}\|x\|{}^2\right)\right). \end{aligned} $$

So,

$$\displaystyle \begin{aligned} {} \nabla^2 \log P_t\psi_\beta(x) &= \frac{\left(1 - \beta\right)}{t\left(\beta -1\right) +1}\mathrm{I}_d, \end{aligned} $$
(7)

which gives equality in (6).

For the general case, the semi-log-convexity assumption means that we can write \(\frac {d\mu }{dx} = e^{V(x)- \beta \frac {\|x\|{ }^2}{2}}\), for a convex function V. Hence, \(f(x) \propto e^{V(x)- \frac {1}{2}(\beta -1)\|x\|{ }^2}\). With analogous calculations to the ones made above, we get,

$$\displaystyle \begin{aligned} P_tf(x) &\propto \int\limits_{\mathbb R^d} e^{V(x +\sqrt{t}y)- \frac{1}{2}\left(\beta-1\right)\|x + \sqrt{t}y\|{}^2}e^{-\frac{\|y\|{}^2}{2}}dy\\ &= \exp\left(-\frac{1}{2}\left(\frac{\beta-1}{t\left(\beta-1\right)+1}\|x\|{}^2\right)\right) \\ &\quad \times\int\limits_{\mathbb R^d} \exp\left(V(x +\sqrt{t}y)-\frac{\beta_t}{2}\left\|\sqrt{t}\frac{\beta-1}{\beta_t}x+y\right\|{}^2\right)dy\\ &\propto P_t\psi_\beta(x)\int\limits_{\mathbb R^d} \exp\left(V(x +\sqrt{t}y)-\frac{\beta_t}{2}\left\|\sqrt{t}\frac{\beta-1}{\beta_t}x+y\right\|{}^2\right)dy. \end{aligned} $$

Write \(H_t(x) := \int \limits _{\mathbb R^d} \exp \left (V(x +\sqrt {t}y)-\frac {\beta _t}{2}\left \|\sqrt {t}\frac {\beta -1}{\beta _t}x+y\right \|{ }^2\right )dy\) and observe by (7),

$$\displaystyle \begin{aligned} {} \nabla^2\log P_t f(x) &= \nabla^2 \log P_t\psi_\beta(x) + \nabla^2\log(H_t(x)) = \frac{\left(1 - \beta\right)}{t\left(\beta -1\right) +1}\mathrm{I}_d \\ &\quad + \nabla^2\log(H_t(x)). \end{aligned} $$
(8)

To finish the proof we will show that \(\nabla ^2\log (H_t(x)) \succeq 0\), or, equivalently, that \(H_t\) is log-convex. By applying a linear change of variables, we can re-write \(H_t\) as,

$$\displaystyle \begin{aligned} H_t(x) = \int\limits_{\mathbb R^d} \exp\left(V\left(\left(1-t\frac{\beta - 1}{\beta_t}\right)x + \sqrt{t}y\right)\right)e^{-\frac{\beta_t\|y\|{}^2}{2}}dy. \end{aligned}$$

As V is convex, for every \(t \geq 0\) and \(y\in \mathbb R^d\), the function \(x \mapsto V\left (\left (1-t\frac {\beta - 1}{\beta _t}\right )x + \sqrt {t}y\right )\) is convex. So, \(H_t(x)\) is a mixture of log-convex functions. Since a mixture of log-convex functions is also log-convex (see [18, Chapter 16.B]), the proof is complete. □

We now prove Theorem 3.

Proof of Theorem 3

Recall that \((\varphi ^{\mathrm {flow}})^{-1}\) is the transport map S, constructed in Sect. 2.1. Again, we begin by assuming that \(\mu \) has a smooth density, and one may verify that the conditions of Lemma 10 are satisfied, which makes S well-defined.

Let \(\theta _t^{\mathrm {min}}= e^{-2t}\frac {\left (1 - \beta \right )}{(1-e^{-2t})\left (\beta -1\right ) +1}\). Combining Lemma 13 with Lemma 11 shows that S is \(\exp \left (\int _0^\infty -\theta _t^{\mathrm {min}}dt\right )\)-Lipschitz. Compute,

$$\displaystyle \begin{aligned} \int_0^\infty-\theta_t^{\mathrm{min}}dt &= \int_0^\infty -e^{-2t}\frac{\left(1 - \beta\right)}{(1-e^{-2t})\left(\beta -1\right) +1}dt \\ &= \frac{1}{2} \log\left((1 - e^{-2 t})(\beta - 1) +1\right)\Big\vert_0^\infty =\frac{\log(\beta)}{2}. \end{aligned} $$

Hence, S is \(\exp \left (\frac {\log (\beta )}{2}\right ) = \sqrt {\beta }\)-Lipschitz.

To finish the proof, we shall construct a family \(\{\mu _\varepsilon \}_{\varepsilon > 0}\) of \(\beta _\varepsilon \)-semi-log-convex measures which converge to \(\mu \) in distribution as \(\varepsilon \to 0\), and such that

$$\displaystyle \begin{aligned} \lim\limits_{\varepsilon \to 0}\beta_\varepsilon = \beta. \end{aligned}$$

The claim then follows by invoking Lemma 9.

Let \(\gamma _{d,\varepsilon }\) stand for the d-dimensional Gaussian measure with covariance \(\varepsilon \mathrm {I}_d\), and set \(\mu _\varepsilon = \mu \star \gamma _{d,\varepsilon }\). It is clear that, as \(\varepsilon \to 0\), \(\mu _\varepsilon \) converges to \(\mu \) in distribution. Moreover, if we replace f by \(\frac {d\mu }{dx}\) in (8), which amounts to replacing \(\beta - 1\) by \(\beta \), we see that \(\mu _\varepsilon \) is \(\beta _\varepsilon \)-semi-log-convex, with

$$\displaystyle \begin{aligned} \beta_\varepsilon = \frac{\beta}{1 + \varepsilon\beta}. \end{aligned}$$

Since \(\lim \limits _{\varepsilon \to 0}\beta _\varepsilon = \beta \), the proof is complete. □