1 Introduction and Statements of Results

An important theme in recent work in asymptotic geometric analysis is that many classical implications between different types of geometric or functional inequalities can be reversed in the presence of convexity. A particularly striking recent example is the work of E. Milman [11–13], showing for example that, on a Riemannian manifold equipped with a probability measure satisfying a convexity assumption, the existence of a Cheeger inequality, a Poincaré inequality, and exponential concentration of Lipschitz functions are all equivalent. Important earlier examples of this theme are C. Borell’s 1974 proof of reverse Hölder inequalities for log-concave measures [4], and K. Ball’s 1991 proof of a reverse isoperimetric inequality for convex bodies [1].

In this note, we explore the extent to which different notions of distance between probability measures are comparable in the presence of a convexity assumption. Specifically, we consider log-concave probability measures; that is, Borel probability measures μ on \(\mathbb{R}^{n}\) such that for all nonempty compact sets \(A,B \subseteq \mathbb{R}^{n}\) and every λ ∈ (0, 1),

$$\displaystyle{\mu (\lambda A + (1-\lambda )B) \geq \mu (A)^{\lambda }\mu (B)^{1-\lambda }.}$$

We moreover consider only those log-concave probability measures μ on \(\mathbb{R}^{n}\) which are isotropic, meaning that if X ∼ μ then

$$\displaystyle{\mathbb{E}X = 0\qquad \mbox{ and}\qquad \mathbb{E}\mathit{XX}^{T} = I_{ n}.}$$
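
To fix ideas, the most basic example is the standard Gaussian measure \(\gamma _{n}\) on \(\mathbb{R}^{n}\), whose density is

$$\displaystyle{\frac{d\gamma _{n}} {\mathit{dx}} (x) = \frac{1} {(2\pi )^{n/2}}e^{-\left \Vert x\right \Vert ^{2}/2 }.}$$

Since the logarithm of this density is concave, the Prékopa–Leindler inequality shows that \(\gamma _{n}\) satisfies the defining inequality above, and \(\gamma _{n}\) is isotropic since it has mean zero and identity covariance. The uniform (normalized Lebesgue) measure on a convex body, brought into isotropic position by an affine change of variables, is another basic example.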

The following distances between probability measures μ and ν on \(\mathbb{R}^{n}\) appear below.

  1.

    The total variation distance is defined by

    $$\displaystyle{d_{\mathit{TV}}(\mu,\nu ):= 2\sup _{A\subseteq \mathbb{R}^{n}}\left \vert \mu (A) -\nu (A)\right \vert,}$$

    where the supremum is over Borel measurable sets.

  2.

    The bounded Lipschitz distance is defined by

    $$\displaystyle{d_{\mathit{BL}}(\mu,\nu ):=\sup _{\left \Vert g\right \Vert _{\mathit{BL}}\leq 1}\left \vert \int g\ d\mu -\int g\ d\nu \right \vert,}$$

    where the bounded-Lipschitz norm \(\left \Vert g\right \Vert _{\mathit{BL}}\) of \(g: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is defined by

    $$\displaystyle{\left \Vert g\right \Vert _{\mathit{BL}}:=\max \left \{\left \Vert g\right \Vert _{\infty },\ \sup _{x\neq y}\frac{\left \vert g(x) - g(y)\right \vert } {\left \Vert x - y\right \Vert } \right \}}$$

    and \(\left \Vert \cdot \right \Vert\) denotes the standard Euclidean norm on \(\mathbb{R}^{n}\). The bounded-Lipschitz distance is a metric for the weak topology on probability measures (see, e.g., [6, Theorem 11.3.3]).

  3.

    The \(L_{p}\) Wasserstein distance for p ≥ 1 is defined by

    $$\displaystyle{W_{p}(\mu,\nu ):=\inf _{\pi }\left [\int \left \Vert x - y\right \Vert ^{p}\ d\pi (x,y)\right ]^{\frac{1} {p} },}$$

    where the infimum is over couplings π of μ and ν; that is, probability measures π on \(\mathbb{R}^{2n}\) such that \(\pi (A \times \mathbb{R}^{n}) =\mu (A)\) and \(\pi (\mathbb{R}^{n} \times B) =\nu (B)\). The \(L_{p}\) Wasserstein distance is a metric for the topology of weak convergence plus convergence of moments of order p or less. (See [15, Sect. 6] for a proof of this fact, and a lengthy discussion of the many fine mathematicians after whom this distance could reasonably be named.)

  4.

    If μ is absolutely continuous with respect to ν, the relative entropy, or Kullback–Leibler divergence, is defined by

    $$\displaystyle{H(\mu \mid \nu ):=\int \left (\frac{d\mu } {d\nu }\right )\log \left (\frac{d\mu } {d\nu }\right )\ d\nu =\int \log \left (\frac{d\mu } {d\nu }\right )\ d\mu.}$$
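
As a simple illustration of the last two notions, let \(m \in \mathbb{R}^{n}\) and let \(\gamma _{n}(\cdot - m)\) denote the translate of the standard Gaussian measure \(\gamma _{n}\) by m. A routine computation, recorded here only for orientation, gives

$$\displaystyle{W_{p}{\bigl (\gamma _{n}(\cdot - m),\gamma _{n}\bigr )} = \left \Vert m\right \Vert \ \mbox{ for every }p \geq 1\qquad \mbox{ and}\qquad H{\bigl (\gamma _{n}(\cdot - m)\mid \gamma _{n}\bigr )} = \frac{\left \Vert m\right \Vert ^{2}} {2};}$$

the upper bound \(W_{p} \leq \left \Vert m\right \Vert\) comes from the coupling (X + m,X) with \(X \sim \gamma _{n}\), the matching lower bound from Jensen’s inequality applied to an arbitrary coupling, and the entropy formula from integrating \(\log {\bigl (d\gamma _{n}(\cdot - m)/d\gamma _{n}\bigr )}(x) =\langle x,m\rangle -\left \Vert m\right \Vert ^{2}/2\) with respect to \(\gamma _{n}(\cdot - m)\).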

It is a classical fact that for any probability measures μ and ν on \(\mathbb{R}^{n}\),

$$\displaystyle{ d_{\mathit{BL}}(\mu,\nu ) \leq d_{\mathit{TV}}(\mu,\nu ). }$$
(1)

This follows from a dual formulation of total variation distance: the Riesz representation theorem implies that

$$\displaystyle{ d_{\mathit{TV}}(\mu,\nu ) =\sup \left \{\left \vert \int g\ d\mu -\int g\ d\nu \right \vert: g \in C(\mathbb{R}^{n}),\ \left \Vert g\right \Vert _{ \infty }\leq 1\right \}. }$$
(2)

Indeed, any g with \(\left \Vert g\right \Vert _{\mathit{BL}} \leq 1\) is in particular continuous with \(\left \Vert g\right \Vert _{\infty }\leq 1\), so the supremum defining \(d_{\mathit{BL}}\) is taken over a subclass of the functions appearing in (2), and (1) follows. In the case that μ and ν are log-concave, there is the following complementary inequality.

Proposition 1.

Let μ and ν be log-concave isotropic probability measures on \(\mathbb{R}^{n}\) . Then

$$\displaystyle{d_{\mathit{TV}}(\mu,\nu ) \leq C\sqrt{\mathit{nd } _{\mathit{BL } } (\mu,\nu )}.}$$

In this result and below, C, c, etc. denote positive constants which are independent of n, μ, and ν, and whose values may change from one appearance to the next.

In the special case in which n = 1 and \(\nu =\gamma _{1}\), Brehm, Hinow, Vogt and Voigt proved a similar comparison between total variation distance and Kolmogorov distance \(d_{K}\).

Proposition 2 ([5, Theorem 3.3]).

Let μ be a log-concave probability measure on \(\mathbb{R}\) . Then

$$\displaystyle{d_{\mathit{TV}}(\mu,\gamma _{1}) \leq C\sqrt{\max \left \{1,\log (1/d_{K } (\mu,\gamma _{1 } )) \right \} d_{K } (\mu,\gamma _{1 } )}.}$$

Together with (1), Proposition 1 implies the following.

Corollary 3.

On the family of isotropic log-concave probability measures on \(\mathbb{R}^{n}\) , the topologies of weak convergence and of total variation coincide.

Corollary 3 will probably be unsurprising to experts, but we have not seen it stated in the literature.

Proposition 1 and Corollary 3 are false without the assumption of isotropicity. For example, a sequence of nondegenerate Gaussian measures \(\{\mu _{k}\}_{k\in \mathbb{N}}\) on \(\mathbb{R}^{n}\) may weakly approach a Gaussian measure μ supported on a lower-dimensional subspace, but \(d_{\mathit{TV}}(\mu _{k},\mu ) = 2\) for every k. It may be possible to extend Corollary 3 to a class of log-concave probability measures with, say, a nontrivial uniform lower bound on the smallest eigenvalue of the covariance matrix, but we will not pursue this here.
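
For a concrete instance of this phenomenon (not needed elsewhere), take n = 2 and let \(\mu _{k}\) be the centered Gaussian measure with covariance matrix \(\mathrm{diag}(1,1/k)\). Then

$$\displaystyle{\mu _{k} \rightarrow \gamma _{1} \otimes \delta _{0}\ \mbox{ weakly as }k \rightarrow \infty,\qquad \mbox{ while}\qquad d_{\mathit{TV}}(\mu _{k},\gamma _{1} \otimes \delta _{0}) = 2\ \mbox{ for every }k,}$$

since \(\gamma _{1} \otimes \delta _{0}\) is supported on the line \(\{x_{2} = 0\}\), which has \(\mu _{k}\)-measure zero.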

The Kantorovitch duality theorem (see [15, Theorem 5.10]) gives a dual formulation of the \(L_{1}\) Wasserstein distance similar to the formulation of total variation distance in (2):

$$\displaystyle{W_{1}(\mu,\nu ) =\sup _{g}\left \vert \int g\ d\mu -\int g\ d\nu \right \vert,}$$

where the supremum is over 1-Lipschitz functions \(g: \mathbb{R}^{n} \rightarrow \mathbb{R}\). An immediate consequence is that for any probability measures μ and ν,

$$\displaystyle{d_{\mathit{BL}}(\mu,\nu ) \leq W_{1}(\mu,\nu ).}$$

The following complementary inequality holds in the log-concave case.

Proposition 4.

Let μ and ν be log-concave isotropic probability measures on \(\mathbb{R}^{n}\) . Then

$$\displaystyle{ W_{1}(\mu,\nu ) \leq C\max \left \{\sqrt{n},\log \left ( \frac{\sqrt{n}} {d_{\mathit{BL}}(\mu,\nu )}\right )\right \}d_{\mathit{BL}}(\mu,\nu ). }$$
(3)

Keeping in mind the behavior of the function \(f(x) =\max \left \{1,\log \left (\frac{1} {x}\right )\right \}x\), which is increasing and tends to 0 as \(x \rightarrow 0^{+}\), may be helpful in visualizing the bounds in Proposition 4 and the results below.

In particular, when \(d_{\mathit{BL}}\) is moderate, we simply have \(W_{1} \leq C\sqrt{n}\,d_{\mathit{BL}}\). When \(d_{\mathit{BL}}\) is small, the right hand side of (3) is not quite linear in \(d_{\mathit{BL}}\), but is \(o{\bigl (n^{\varepsilon /2}d_{\mathit{BL}}^{1-\varepsilon }\bigr )}\) for each \(\varepsilon > 0\).
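
One way to verify the last claim is via the elementary estimate \(\log y \leq \frac{1} {\varepsilon }y^{\varepsilon }\), valid for all y > 0 and \(\varepsilon > 0\):

$$\displaystyle{\log \left ( \frac{\sqrt{n}} {d_{\mathit{BL}}}\right )d_{\mathit{BL}} \leq \frac{1} {\varepsilon }\left ( \frac{\sqrt{n}} {d_{\mathit{BL}}}\right )^{\varepsilon }d_{\mathit{BL}} = \frac{1} {\varepsilon }n^{\varepsilon /2}d_{\mathit{BL}}^{1-\varepsilon };}$$

applying this with \(\varepsilon /2\) in place of \(\varepsilon\), and noting that \(\sqrt{n}\,d_{\mathit{BL}} = n^{(1-\varepsilon )/2}d_{\mathit{BL}}^{\varepsilon } \cdot n^{\varepsilon /2}d_{\mathit{BL}}^{1-\varepsilon }\), gives the stated \(o{\bigl (n^{\varepsilon /2}d_{\mathit{BL}}^{1-\varepsilon }\bigr )}\) bound as \(d_{\mathit{BL}} \rightarrow 0\).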

From Hölder’s inequality, it is immediate that if p ≤ q, then \(W_{p}(\mu,\nu ) \leq W_{q}(\mu,\nu )\). In the log-concave case, we have the following.

Proposition 5.

Let μ and ν be isotropic log-concave probability measures on \(\mathbb{R}^{n}\) and let 1 ≤ p < q. Then

$$\displaystyle{W_{q}(\mu,\nu )^{q} \leq C\left (\max \left \{\sqrt{n},\log \left (\frac{\left (c\max \{q,\sqrt{n}\}\right )^{q}} {W_{p}(\mu,\nu )^{p}} \right )\right \}\right )^{q-p}W_{ p}(\mu,\nu )^{p}.}$$

Because the bounded-Lipschitz distance metrizes the weak topology, and convergence in \(L_{p}\) Wasserstein distance implies convergence of moments of order smaller than p, Propositions 4 and 5 imply the following.

Corollary 6.

Let μ, \(\{\mu _{k}\}_{k\in \mathbb{N}}\) be isotropic log-concave probability measures on \(\mathbb{R}^{n}\) such that \(\mu _{k} \rightarrow \mu \) weakly. Then all moments of the \(\mu _{k}\) converge to the corresponding moments of μ.

The following, known as the Csiszár–Kullback–Pinsker inequality, holds for any probability measures μ and ν:

$$\displaystyle{ d_{\mathit{TV}}(\mu,\nu ) \leq \sqrt{2H(\mu \mid \nu )}. }$$
(4)

(See [3] for a proof, generalizations, and original references.) Unlike the other notions of distance considered above, \(H(\cdot \mid \cdot )\) is not a metric, and \(H(\mu \mid \nu )\) can only be finite if μ is absolutely continuous with respect to ν. Nevertheless, it is frequently used to quantify convergence; (4) shows that convergence in relative entropy is stronger than convergence in total variation. Convergence in relative entropy is particularly useful in quantifying convergence to the Gaussian distribution, and it is in that setting that (4) can be essentially reversed under an assumption of log-concavity.

Proposition 7.

Let μ be an isotropic log-concave probability measure on \(\mathbb{R}^{n}\) , and let \(\gamma _{n}\) denote the standard Gaussian distribution on \(\mathbb{R}^{n}\) . Then

$$\displaystyle{H(\mu \mid \gamma _{n}) \leq C\max \left \{\log ^{2}\left ( \frac{n} {d_{\mathit{TV}}(\mu,\gamma _{n})}\right ),n\log (n + 1)\right \}d_{\mathit{TV}}(\mu,\gamma _{n}).}$$

The proof of Proposition 7 uses a rough bound on the isotropic constant \(L_{f} = \left \Vert f\right \Vert _{\infty }^{1/n}\) of the density f of μ. Better estimates are available, but they would only change the absolute constants in our bound. In the case that the isotropic constant is bounded independently of n (e.g., if μ is the uniform measure on an unconditional convex body, or if the hyperplane conjecture is proved), the bound above can be improved slightly to

$$\displaystyle{H(\mu \mid \gamma _{n}) \leq C\max \left \{\log ^{2}\left ( \frac{n} {d_{\mathit{TV}}(\mu,\gamma _{n})}\right ),n\right \}d_{\mathit{TV}}(\mu,\gamma _{n}).}$$

Corollary 8.

Let \(\{\mu _{k}\}_{k\in \mathbb{N}}\) be isotropic log-concave probability measures on \(\mathbb{R}^{n}\) . The following are equivalent:

  1.

    \(\mu _{k} \rightarrow \gamma _{n}\) weakly.

  2.

    \(\mu _{k} \rightarrow \gamma _{n}\) in total variation.

  3.

    \(H(\mu _{k}\mid \gamma _{n}) \rightarrow 0\).

It is worth noting that Proposition 7 implies that B. Klartag’s central limit theorem for convex bodies (proved in [8, 9] in total variation) also holds in the a priori stronger sense of entropy, with a polynomial rate of convergence.

2 Proofs of the Results

The proof of Proposition 1 uses the following deconvolution result of R. Eldan and B. Klartag.

Lemma 9 ([7, Proposition 10]).

Suppose that f is the density of an isotropic log-concave probability measure on \(\mathbb{R}^{n}\) , and for t > 0 define

$$\displaystyle{\varphi _{t}(x) = \frac{1} {(2\pi t^{2})^{n/2}}e^{-\left \Vert x\right \Vert ^{2}/2t^{2} }.}$$

Then

$$\displaystyle{\left \Vert f - f {\ast}\varphi _{t}\right \Vert _{1} \leq \mathit{cnt}.}$$

Proof (Proof of Proposition 1).

Let \(g \in C(\mathbb{R}^{n})\) with \(\left \Vert g\right \Vert _{\infty }\leq 1\). For t > 0, let \(g_{t} = g {\ast}\varphi _{t}\), where \(\varphi _{t}\) is as in Lemma 9. It follows from Young’s inequality that \(\left \Vert g_{t}\right \Vert _{\infty } \leq 1\) and that g t is 1∕t-Lipschitz. We have

$$\displaystyle\begin{array}{rcl} \left \vert \int g\ d\mu -\int g\ d\nu \right \vert & \leq & \left \vert \int (g - g_{t})\ d\mu \right \vert + \left \vert \int g_{t}\ d\mu -\int g_{t}\ d\nu \right \vert + \left \vert \int (g_{t} - g)\ d\nu \right \vert. {}\\ \end{array}$$

It is a classical fact due to C. Borell [4] that a log-concave probability measure which is not supported on a proper affine subspace of \(\mathbb{R}^{n}\) has a density. If f is the density of μ, then by Lemma 9,

$$\displaystyle{\left \vert \int (g - g_{t})\ d\mu \right \vert = \left \vert \int g(f - f {\ast}\varphi _{t})\right \vert \leq \left \Vert f - f {\ast}\varphi _{t}\right \Vert _{1} \leq \mathit{cnt},}$$

and

$$\displaystyle{\left \vert \int (g - g_{t})\ d\nu \right \vert \leq \mathit{cnt}}$$

similarly. Furthermore,

$$\displaystyle{\left \vert \int g_{t}\ d\mu -\int g_{t}\ d\nu \right \vert \leq d_{\mathit{BL}}(\mu,\nu )\left \Vert g_{t}\right \Vert _{\mathit{BL}} \leq d_{\mathit{BL}}(\mu,\nu )\max \{1,1/t\}.}$$

Combining the above estimates and taking the supremum over g yields

$$\displaystyle{d_{\mathit{TV}}(\mu,\nu ) \leq d_{\mathit{BL}}(\mu,\nu )\max \{1,1/t\} + \mathit{cnt}}$$

for every t > 0. The proposition follows by picking \(t = \sqrt{d_{\mathit{BL } } (\mu,\nu )/2n} \leq 1\).
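
Spelled out, the last step is the following routine computation: since \(d_{\mathit{BL}}(\mu,\nu ) \leq 2 \leq 2n\), this choice of t indeed satisfies t ≤ 1, so that \(\max \{1,1/t\} = 1/t\) and

$$\displaystyle{\frac{d_{\mathit{BL}}(\mu,\nu )} {t} = \sqrt{2\mathit{nd}_{\mathit{BL}}(\mu,\nu )}\qquad \mbox{ and}\qquad \mathit{cnt} = c\sqrt{\frac{\mathit{nd}_{\mathit{BL}}(\mu,\nu )} {2}},}$$

both of which are bounded by \(C\sqrt{\mathit{nd}_{\mathit{BL}}(\mu,\nu )}\).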

The remaining propositions all depend in part on the following deep concentration result due to G. Paouris.

Proposition 10 ([14]).

Let X be an isotropic log-concave random vector in \(\mathbb{R}^{n}\) . Then

$$\displaystyle{\mathbb{P}\left [\left \Vert X\right \Vert \geq R\right ] \leq e^{-\mathit{cR}}}$$

for every \(R \geq C\sqrt{n}\) , and

$$\displaystyle{\left (\mathbb{E}\left \Vert X\right \Vert ^{p}\right )^{1/p} \leq C\max \{\sqrt{n},p\}}$$

for every p ≥ 1.

The following simple optimization lemma will also be used in the remaining proofs.

Lemma 11.

Given A,B,M,k > 0,

$$\displaystyle{\inf _{t\geq M}\left (\mathit{At}^{k} + \mathit{Be}^{-t}\right ) \leq A\left (1 + \left (\max \left \{M,\log (B/A)\right \}\right )^{k}\right ).}$$

Proof.

Set t = max{M, log(BA)}.
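
In detail: the choice \(t =\max \left \{M,\log (B/A)\right \}\) satisfies t ≥ M, and since \(t \geq \log (B/A)\) we have \(\mathit{Be}^{-t} \leq A\), so that

$$\displaystyle{\mathit{At}^{k} + \mathit{Be}^{-t} \leq A\left (\max \left \{M,\log (B/A)\right \}\right )^{k} + A = A\left (1 + \left (\max \left \{M,\log (B/A)\right \}\right )^{k}\right ).}$$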

Proof (Proof of Proposition 4).

Let \(g: \mathbb{R}^{n} \rightarrow \mathbb{R}\) be 1-Lipschitz and without loss of generality assume that g(0) = 0, so that \(\left \vert g(x)\right \vert \leq \left \Vert x\right \Vert\). For R > 0 define

$$\displaystyle{g_{R}(x) = \left \{\begin{array}{@{}l@{\quad }l@{}} -R \quad &\text{if }g(x) < -R, \\ g(x)\quad &\text{if } - R \leq g(x) \leq R, \\ R \quad &\text{if }g(x) > R, \end{array} \right.}$$

and observe that \(\left \Vert g_{R}\right \Vert _{\mathit{BL}} \leq \max \{ 1,R\}\). Let X ∼ μ and Y ∼ ν. Then

$$\displaystyle\begin{array}{rcl} \left \vert \mathbb{E}g(X) - \mathbb{E}g(Y )\right \vert & \leq & \left \vert \mathbb{E}g_{R}(X) - \mathbb{E}g_{R}(Y )\right \vert + \mathbb{E}\left \vert g(X) - g_{R}(X)\right \vert + \mathbb{E}\left \vert g(Y ) - g_{R}(Y )\right \vert {}\\ & \leq & \max \{1,R\}d_{\mathit{BL}}(\mu,\nu ) + \mathbb{E}\left \Vert X\right \Vert \mathbf{1}_{\left \Vert X\right \Vert \geq R} + \mathbb{E}\left \Vert Y \right \Vert \mathbf{1}_{\left \Vert Y \right \Vert \geq R}. {}\\ \end{array}$$

By the Cauchy–Schwarz inequality and Proposition 10,

$$\displaystyle{\mathbb{E}\left \Vert X\right \Vert \mathbf{1}_{\left \Vert X\right \Vert \geq R} \leq \sqrt{n\mathbb{P}\left [\left \Vert X\right \Vert \geq R \right ]} \leq \sqrt{n}e^{-\mathit{cR}}}$$

for \(R \geq C\sqrt{n}\), and the last term is bounded similarly. Combining the above estimates and taking the supremum over g yields

$$\displaystyle{W_{1}(\mu,\nu ) \leq \max \{ 1,R\}d_{\mathit{BL}}(\mu,\nu ) + 2\sqrt{n}e^{-\mathit{cR}}}$$

for every \(R \geq C\sqrt{n}\). The proposition follows using Lemma 11.
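
To make the last step explicit (with the usual convention that the constants may change): since \(R \geq C\sqrt{n} \geq 1\), we may replace \(\max \{1,R\}\) by R, and substituting \(S = \mathit{cR}\) the estimate becomes \(W_{1}(\mu,\nu ) \leq \frac{1} {c}Sd_{\mathit{BL}}(\mu,\nu ) + 2\sqrt{n}e^{-S}\) for all \(S \geq \mathit{cC}\sqrt{n}\). Lemma 11, applied with \(A = d_{\mathit{BL}}(\mu,\nu )/c\), \(B = 2\sqrt{n}\), k = 1, and \(M = \mathit{cC}\sqrt{n}\), then yields, after adjusting the constants,

$$\displaystyle{W_{1}(\mu,\nu ) \leq Cd_{\mathit{BL}}(\mu,\nu )\left (1 +\max \left \{\sqrt{n},\log \left ( \frac{\sqrt{n}} {d_{\mathit{BL}}(\mu,\nu )}\right )\right \}\right ),}$$

which implies (3) since the maximum above is at least \(\sqrt{n} \geq 1\).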

Proof (Proof of Proposition 5).

Let (X, Y ) be a coupling of μ and ν on \(\mathbb{R}^{n} \times \mathbb{R}^{n}\). Then for each R > 0,

$$\displaystyle\begin{array}{rcl} \mathbb{E}\left \Vert X\,-\,Y \right \Vert ^{q}& \leq & \,R^{q\,-\,p}\mathbb{E}\left [\left \Vert X\,-\,Y \right \Vert ^{p}\mathbf{1}_{\left \Vert X\,-\,Y \right \Vert \leq R}\right ]\,+\,\sqrt{\mathbb{P}\left [\left \Vert X- Y \right \Vert \geq R \right ] \mathbb{E}\left \Vert X- Y \right \Vert ^{2q}}. {}\\ \end{array}$$

By Proposition 10,

$$\displaystyle{\mathbb{P}\left [\left \Vert X - Y \right \Vert \geq R\right ] \leq \mathbb{P}\left [\left \Vert X\right \Vert \geq R/2\right ] + \mathbb{P}\left [\left \Vert Y \right \Vert \geq R/2\right ] \leq e^{-\mathit{cR}}}$$

when \(R \geq C\sqrt{n}\), and

$$\displaystyle{\left (\mathbb{E}\left \Vert X - Y \right \Vert ^{2q}\right )^{1/2q} \leq \left (\mathbb{E}\left \Vert X\right \Vert ^{2q}\right )^{1/2q} + \left (\mathbb{E}\left \Vert Y \right \Vert ^{2q}\right )^{1/2q} \leq C\max \{q,\sqrt{n}\},}$$

so that

$$\displaystyle{\mathbb{E}\left \Vert X - Y \right \Vert ^{q} \leq R^{q-p}\mathbb{E}\left \Vert X - Y \right \Vert ^{p} + \left (C\max \{q,\sqrt{n}\}\right )^{q}e^{-\mathit{cR}}}$$

for every \(R \geq C\sqrt{n}\). Taking the infimum over couplings and then applying Lemma 11 completes the proof.

The proof of Proposition 7 uses the following variance bound, which follows from a more general concentration inequality due to Bobkov and Madiman.

Lemma 12 (See [2, Theorem 1.1]).

Suppose that μ is an isotropic log-concave probability measure on \(\mathbb{R}^{n}\) with density f, and let Y ∼μ. Then

$$\displaystyle{\text{Var}{\bigl (\log f(Y )\bigr )} \leq \mathit{Cn}.}$$

Proof (Proof of Proposition 7).

Let f be the density of μ, and let \(\varphi (x) = (2\pi )^{-n/2}e^{-\left \Vert x\right \Vert ^{2}/2 }\) be the density of \(\gamma _{n}\). Let \(Z \sim \gamma _{n}\), Y ∼ μ, \(X = \frac{f(Z)} {\varphi (Z)}\), and \(W = \frac{f(Y )} {\varphi (Y )}\). Then

$$\displaystyle{H(\mu \mid \gamma _{n}) = \mathbb{E}X\log X.}$$

In general, if μ and ν have densities \(f_{\mu }\) and \(f_{\nu }\), it is an easy exercise to show that \(d_{\mathit{TV}}(\mu,\nu ) =\int \left \vert f_{\mu } - f_{\nu }\right \vert\); from this, it follows that

$$\displaystyle{d_{\mathit{TV}}(\mu,\gamma _{n}) = \mathbb{E}\left \vert X - 1\right \vert = 2\mathbb{E}(X - 1)\mathbf{1}_{X\geq 1}.}$$

Let \(h(x) = x\log x\). Since h is convex and h(1) = 0, we have that \(h(x) \leq a(x - 1)\) for 1 ≤ x ≤ R as long as a is such that \(h(R) \leq a(R - 1)\). Let R ≥ 2, so that \(\frac{R} {R-1} \leq 2\). Then

$$\displaystyle{h(R) = R\log R \leq 2(R - 1)\log R = a(R - 1)}$$

for \(a = 2\log R\). Thus

$$\displaystyle\begin{array}{rcl} \mathbb{E}X\log X& \leq & \mathbb{E}(X\log X)\mathbf{1}_{X\geq 1} {}\\ & \leq & a\mathbb{E}(X - 1)\mathbf{1}_{X\geq 1} + \mathbb{E}(X\log X)\mathbf{1}_{X\geq R} {}\\ & =& (\log R)d_{\mathit{TV}}(\mu,\gamma _{n}) + \mathbb{E}(X\log X)\mathbf{1}_{X\geq R}. {}\\ \end{array}$$

The Cauchy–Schwarz inequality implies that

$$\displaystyle{\mathbb{E}(X\log X)\mathbf{1}_{X\geq R} = \mathbb{E}(\log W)\mathbf{1}_{W\geq R} \leq \sqrt{\mathbb{E}(\log W)^{2}}\sqrt{\mathbb{P}\left [W \geq R \right ]}.}$$

By the \(L_{2}\) triangle inequality, we have

$$\displaystyle\begin{array}{rcl} \sqrt{ \mathbb{E}(\log W)^{2}}& =& \sqrt{\mathbb{E}\left \vert \log f(Y ) -\log \varphi (Y ) \right \vert ^{2}} {}\\ & \leq & \sqrt{\mathbb{E}\left \vert \log f(Y ) \right \vert ^{2}} + \sqrt{\mathbb{E}\left \vert \log \varphi (Y ) \right \vert ^{2}}, {}\\ \end{array}$$

and by Proposition 10,

$$\displaystyle\begin{array}{rcl} \mathbb{E}\left \vert \log \varphi (Y )\right \vert ^{2}& =& \mathbb{E}\left (\frac{n} {2} \log 2\pi + \frac{\left \Vert Y \right \Vert ^{2}} {2} \right )^{2} \leq \mathit{Cn}^{2}. {}\\ \end{array}$$

By Lemma 12,

$$\displaystyle{\mathbb{E}\left \vert \log f(Y )\right \vert ^{2} \leq \left (\mathbb{E}\log f(Y )\right )^{2} + \mathit{Cn}.}$$

Recall that the entropy of μ is

$$\displaystyle{-\int f(y)\log f(y)\ \mathit{dy} = -\mathbb{E}\log f(Y ) \geq 0,}$$

and that \(\gamma _{n}\) is the maximum-entropy distribution with identity covariance, so that

$$\displaystyle{\left (\mathbb{E}\log f(Y )\right )^{2} \leq \left (\mathbb{E}\log \varphi (Z)\right )^{2} = \left (n\log \sqrt{2\pi e}\right )^{2}.}$$

Thus

$$\displaystyle{\sqrt{\mathbb{E}(\log W)^{2}} \leq \mathit{Cn}.}$$

By [10, Theorem 5.14(e)], \(\left \Vert f\right \Vert _{\infty }\leq 2^{8n}n^{n/2}\), and so

$$\displaystyle\begin{array}{rcl} \mathbb{P}\left [W \geq R\right ]& =& \mathbb{P}\left [\frac{f(Y )} {\varphi (Y )} \geq R\right ] \leq \mathbb{P}\left [e^{\left \Vert Y \right \Vert ^{2} /2} \geq (2^{17}\pi n)^{-n/2}R\right ] {}\\ & =& \mathbb{P}\left [\left \Vert Y \right \Vert \geq \sqrt{2\log \left ((2^{17 } \pi n)^{-n/2 } R \right )}\right ] {}\\ \end{array}$$

for each \(R \geq (2^{17}\pi n)^{n/2}\). Proposition 10 now implies that

$$\displaystyle{\mathbb{P}\left [W \geq R\right ] \leq e^{-c\sqrt{\log R-\frac{n} {2} \log (2^{17}\pi n)} } \leq e^{-c'\sqrt{\log R}}}$$

for \(\log R \geq \mathit{Cn}\log (n + 1)\).

Substituting \(S = c\sqrt{\log R}\), the estimates above combine to show that

$$\displaystyle{H(\mu \mid \gamma _{n}) \leq C\left (S^{2}d_{\mathit{ TV}}(\mu,\gamma _{n}) + ne^{-S}\right )}$$

for every \(S \geq c\sqrt{n\log (n + 1)}\). The result follows using Lemma 11.
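
Explicitly, spelling out this final appeal to Lemma 11: applying it with \(A = Cd_{\mathit{TV}}(\mu,\gamma _{n})\), \(B = \mathit{Cn}\), k = 2, and \(M = c\sqrt{n\log (n + 1)}\) gives

$$\displaystyle{H(\mu \mid \gamma _{n}) \leq Cd_{\mathit{TV}}(\mu,\gamma _{n})\left (1 + \left (\max \left \{c\sqrt{n\log (n + 1)},\log \left ( \frac{n} {d_{\mathit{TV}}(\mu,\gamma _{n})}\right )\right \}\right )^{2}\right ),}$$

which is the bound stated in Proposition 7 after adjusting the values of the constants.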