
1 Introduction

It is known that many high-dimensional probability distributions μ on the Euclidean space \(\mathbb{R}^{n}\) (and other metric spaces, including graphs) possess strong concentration properties. In functional language, this may informally be stated as the assertion that any sufficiently smooth function f on \(\mathbb{R}^{n}\), e.g., one with a bounded Lipschitz semi-norm, is almost constant on almost all of the space. There are several ways to quantify such a property. One natural approach, proposed by Alon et al. [2], associates with a given metric probability space (M, d, μ) its spread constant,

$$\displaystyle{s^{2}(\mu ) =\sup \,\mathrm{ Var}_{\mu }(\,f) =\sup \int (\,f - m)^{2}\,d\mu,}$$

where m = ∫ f dμ, and the sup is taken over all functions f on M with ‖f‖_Lip ≤ 1. More information is contained in the so-called subgaussian constant σ² = σ²(μ), which is defined as the infimum over all σ² such that

$$\displaystyle{ \int e^{tf}\,d\mu \leq e^{\sigma ^{2}t^{2}/2 },\quad \ \text{for all}\ \ t \in \mathbb{R}, }$$
(1)

in the class \(\mathcal{L}_{0}\) of all f on M with m = 0 and ‖f‖_Lip ≤ 1 (cf. [8]). Describing the diameter of \(\mathcal{L}_{0}\) in the Orlicz space \(L^{\psi _{2}}(\mu )\) for the Young function \(\psi _{2}(t) = e^{t^{2}} - 1\) (within universal factors), the quantity σ²(μ) appears as a parameter in a subgaussian concentration inequality for the class of all Borel subsets of M. As an equivalent approach, it may also be introduced via the transport-entropy inequality connecting the classical Kantorovich distance and the relative entropy from an arbitrary probability measure on M to the measure μ (cf. [7]).

While in general s² ≤ σ², the latter characteristic allows one to control subgaussian tails under the probability measure μ uniformly in the entire class of Lipschitz functions on M. More generally, when ‖f‖_Lip ≤ L, (1) yields

$$\displaystyle{ \mu \{\vert \,f - m\vert \geq t\} \leq 2e^{-t^{2}/(2\sigma ^{2}L^{2}) },\qquad t> 0. }$$
(2)

Classical and well-known examples include the standard Gaussian measure on \(M = \mathbb{R}^{n}\), in which case s² = σ² = 1, and the normalized Lebesgue measure on the unit sphere M = S^{n−1}, with \(s^{2} =\sigma ^{2} = \frac{1} {n-1}\). The latter example was a starting point in the study of the concentration of measure phenomenon, a fruitful direction initiated in the early 1970s by V.D. Milman.
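To make these definitions concrete, here is a minimal numerical sketch (ours, in Python, assuming NumPy is available) estimating both quantities for the 1-Lipschitz function f(x) = x₁ under the standard Gaussian measure on \(\mathbb{R}^{n}\): the empirical variance and the empirical value of sup_t (2∕t²) log ∫ e^{tf} dμ should both be close to 1, in agreement with s² = σ² = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 200_000

# Sample from the standard Gaussian measure on R^n.
x = rng.standard_normal((N, n))

# A 1-Lipschitz test function: the first coordinate.
f = x[:, 0]

# Spread-constant candidate: the variance of f (close to 1).
print("Var(f) ~", f.var())

# Subgaussian candidate: sup_t (2/t^2) log E exp(t(f - m)), over a grid of t.
fc = f - f.mean()
ts = np.linspace(0.1, 2.0, 20)
vals = [2 * np.log(np.mean(np.exp(t * fc))) / t**2 for t in ts]
print("sup_t (2/t^2) log E e^{tf} ~", max(vals))  # also close to 1
```

Of course, the true suprema in the definitions run over all 1-Lipschitz f; in the Gaussian case the linear functions are extremal.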

Other examples often arise after verifying that μ satisfies certain Sobolev-type inequalities, such as a Poincaré-type inequality

$$\displaystyle{\lambda _{1}\mathrm{Var}_{\mu }(u) \leq \int \vert \nabla u\vert ^{2}\,d\mu,}$$

and logarithmic Sobolev inequalities

$$\displaystyle{\rho \,\mathrm{Ent}_{\mu }(u^{2}) =\rho \bigg [\int u^{2}\log u^{2}\,d\mu -\int u^{2}\,d\mu \,\log \int u^{2}\,d\mu \bigg] \leq 2\int \vert \nabla u\vert ^{2}\,d\mu,}$$

where u may be any locally Lipschitz function on M, and the constants λ 1 > 0 and ρ > 0 do not depend on u. Here the modulus of the gradient may be understood in the generalized sense as the function

$$\displaystyle{\vert \nabla u(x)\vert =\limsup _{y\rightarrow x}\frac{\vert u(x) - u(y)\vert } {d(x,y)},\qquad x \in M}$$

(this is the so-called “continuous setting”), while in discrete spaces, e.g., graphs, we deal with other naturally defined gradients. In both cases, one has respectively the well-known upper bounds

$$\displaystyle{ s^{2}(\mu ) \leq \frac{1} {\lambda _{1}},\qquad \sigma ^{2}(\mu ) \leq \frac{1} {\rho }. }$$
(3)

For example, λ₁ = ρ = n − 1 on the unit sphere (the best possible values, [17]), which can be used to make the corresponding statements about the spread and subgaussian constants.

One of the purposes of this note is to give new examples involving the family of normalized restricted measures

$$\displaystyle{\mu _{A}(B) = \frac{\mu (A \cap B)} {\mu (A)},\qquad B \subset M\ (\mathrm{Borel}),}$$

where a Borel measurable set A ⊂ M with positive measure is fixed. As an example, returning to the standard Gaussian measure μ on \(\mathbb{R}^{n}\), it is known that σ²(μ_A) ≤ 1 for any convex body \(A \subset \mathbb{R}^{n}\). This remarkable property, discovered by Bakry and Ledoux [3] in the sharper form of a Gaussian-type isoperimetric inequality, has nowadays several proofs and generalizations, cf. [5, 6]. Of course, in general the set A may have a rather disordered structure; for instance, it may be disconnected. Then there is no hope that a Poincaré-type inequality holds for the measure μ_A. Nevertheless, it turns out that the concentration property of μ_A is inherited from μ, unless the measure of A is too small. In particular, we have the following observation about abstract metric probability spaces.

Theorem 1.1

For any measurable set A ⊂ M with μ(A) > 0, the subgaussian constant σ²(μ_A) of the normalized restricted measure satisfies

$$\displaystyle{ \sigma ^{2}(\mu _{ A})\, \leq \, c\,\log \Big( \frac{e} {\mu (A)}\Big)\,\sigma ^{2}(\mu ), }$$
(4)

where c is an absolute constant.

Although this assertion is technically simple, we will describe two approaches: one is direct and relies on estimates of ψ₂-norms under the restricted measures, while the other uses a general comparison result due to Barthe and Milman on concentration functions [4].

One may further generalize Theorem 1.1 by defining the subgaussian constant \(\sigma _{\mathcal{F}}^{2}(\mu )\) within a given fixed subclass \(\mathcal{F}\) of functions on M, by using the same bound (1) on the Laplace transform. This is motivated by the possibility of different levels of concentration for different classes; indeed, in the case \(M = \mathbb{R}^{n}\), the concentration property may be considerably strengthened for the class \(\mathcal{F}\) of all convex Lipschitz functions. In particular, one result of Talagrand [18, 19] provides a dimension-free bound \(\sigma _{\mathcal{F}}^{2}(\mu ) \leq C\) for an arbitrary product probability measure μ on the n-dimensional cube [−1, 1]^n. Hence, a more general version of Theorem 1.1 yields the bound

$$\displaystyle{\sigma _{\mathcal{F}}^{2}(\mu _{ A})\, \leq \, c\,\log \Big( \frac{e} {\mu (A)}\Big)}$$

with some absolute constant c, which holds for any Borel subset A of [−1, 1]n (cf. Sect. 6 below).

According to the very definition, the quantities σ²(μ) and σ²(μ_A) might seem to be responsible for deviations of Lipschitz functions f on M and A only. However, the inequality (4) may also be used to control deviations of non-Lipschitz f, on large parts of the space and under certain regularity hypotheses. Assume, for example, that ∫ |∇f| dμ ≤ 1 (which is a kind of normalization condition) and consider

$$\displaystyle{ A =\{ x \in M: \vert \nabla f(x)\vert \leq L\}. }$$
(5)

If L ≥ 2, then by Markov's inequality this set has measure \(\mu (A) \geq 1 - \frac{1} {L} \geq \frac{1} {2}\), and hence σ²(μ_A) ≤ c σ²(μ) with some absolute constant c. If we assume that f has Lipschitz semi-norm ≤ L on A, then, according to (2),

$$\displaystyle{ \mu _{A}\{x \in A: \vert \,f - m\vert \geq t\} \leq 2e^{-t^{2}/(c\sigma ^{2}(\mu )L^{2}) },\qquad t> 0, }$$
(6)

where m is the mean of f with respect to μ_A. It is in this sense that one may say that f is almost constant on the set A.

This also yields a corresponding deviation bound on the whole space,

$$\displaystyle{\mu \{x \in M: \vert \,f - m\vert \geq t\} \leq 2e^{-t^{2}/(c\sigma ^{2}(\mu )L^{2})} + \frac{1} {L}.}$$

Stronger integrability conditions imposed on |∇f| can considerably sharpen the conclusion. By a similar argument, Theorem 1.1 yields, for example, the following exponential bound, known in the presence of a logarithmic Sobolev inequality for the space (M, d, μ), with σ² replaced by 1∕ρ (cf. [7]).

Corollary 1.2

Let f be a locally Lipschitz function on M with Lipschitz semi-norms ≤ L on the sets  (5) . If \(\int e^{\vert \nabla f\vert ^{2} }\,d\mu \leq 2\) , then f is μ-integrable, and moreover,

$$\displaystyle{\mu \{x \in M: \vert \,f - m\vert \geq t\} \leq 2e^{-t/c\sigma (\mu )},\qquad t> 0,}$$

where m is the μ-mean of f and c is an absolute constant.

Equivalently (up to an absolute factor), we have a Sobolev-type inequality

$$\displaystyle{\|\,f - m\|_{\psi _{1}} \leq c\sigma (\mu )\,\|\nabla f\|_{\psi _{2}},}$$

connecting the ψ₁-norm of f − m with the ψ₂-norm of the modulus of the gradient of f. We prove a more general version of this corollary in Sect. 6 (cf. Theorem 6.1). As will be explained in the same section, similar assertions may also be made about convex f and product measures μ on M = [−1, 1]^n, thus extending Talagrand’s theorem to the class of non-Lipschitz functions.

In view of the right-hand bound in (3) combined with (4), the spread and subgaussian constants of restricted measures can be controlled in terms of the logarithmic Sobolev constant ρ via

$$\displaystyle{s^{2}(\mu _{ A})\, \leq \,\sigma ^{2}(\mu _{ A})\, \leq \, c\,\log \Big( \frac{e} {\mu (A)}\Big)\,\frac{1} {\rho }.}$$

However, it may happen that ρ = 0 and σ²(μ) = ∞, while λ₁ > 0 (e.g., for the product exponential distribution on \(\mathbb{R}^{n}\)). Then one may wonder whether one can estimate the spread constant of a restricted measure in terms of the spectral gap. In that case there is a bound similar to (4).

Theorem 1.3

Assume the metric probability space (M,d,μ) satisfies a Poincaré-type inequality with λ₁ > 0. Then, for any A ⊂ M with μ(A) > 0, with some absolute constant c,

$$\displaystyle{ s^{2}(\mu _{ A})\, \leq \, c\,\log ^{2}\Big( \frac{e} {\mu (A)}\Big)\,\frac{1} {\lambda _{1}}. }$$
(7)

It should be mentioned that the logarithmic terms in (4) and (7) cannot be removed; they are actually asymptotically optimal as functions of μ(A) when μ(A) becomes small, see Sect. 7.

Our contribution below is organized into sections as follows:

2. Bounds on ψ_α-Norms for Restricted Measures.

3. Proof of Theorem 1.1. Transport-Entropy Formulation.

4. Proof of Theorem 1.3. Spectral Gap.

5. Examples.

6. Deviations for Non-Lipschitz Functions.

7. Optimality.

8. Appendix.

2 Bounds on ψ_α-Norms for Restricted Measures

A measurable function f on the probability space (M, μ) is said to have a finite ψ_α-norm, α ≥ 1, if for some r > 0,

$$\displaystyle{\int e^{(\vert \,f\vert /r)^{\alpha }}\,d\mu \leq 2.}$$

The infimum over all such r represents the ψ_α-norm \(\|\,f\|_{\psi _{\alpha }}\) or \(\|\,f\|_{L^{\psi _{\alpha }}(\mu )}\), which is just the Orlicz norm associated with the Young function \(\psi _{\alpha }(t) = e^{\vert t\vert ^{\alpha }}- 1\).

We are mostly interested in the particular cases α = 1 and α = 2. In this section we recall well-known relations between the ψ 1 and ψ 2-norms and the usual L p-norms \(\|\,f\|_{p} =\|\, f\|_{L^{p}(\mu )} = (\int \vert \,f\vert ^{p}\,d\mu )^{1/p}\). For the readers’ convenience, we include the proof in the Appendix.

Lemma 2.1

We have

$$\displaystyle{ \sup _{p\geq 1} \frac{\|\,f\|_{p}} {\sqrt{p}} \leq \|\, f\|_{L^{\psi _{2}}(\mu )} \leq 4\,\sup _{p\geq 1} \frac{\|\,f\|_{p}} {\sqrt{p}}, }$$
(8)
$$\displaystyle{ \sup _{p\geq 1}\frac{\|\,f\|_{p}} {p} \leq \|\, f\|_{L^{\psi _{1}}(\mu )} \leq 6\,\sup _{p\geq 1}\frac{\|\,f\|_{p}} {p}. }$$
(9)
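As a quick sanity check of (8) (our addition), one may compare both sides for a standard normal variable Z, for which everything is computable: \(\|Z\|_{p} = (2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi })^{1/p}\), while \(\int e^{Z^{2}/r^{2}}\,d\mu = (1 - 2/r^{2})^{-1/2}\) gives the exact value \(\|Z\|_{\psi _{2}}^{2} = 8/3\). A short sketch in Python (assuming SciPy is available):

```python
import numpy as np
from scipy.special import gamma

# L^p-norms of a standard normal: E|Z|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi).
def norm_p(p):
    return (2 ** (p / 2) * gamma((p + 1) / 2) / np.sqrt(np.pi)) ** (1 / p)

ps = np.linspace(1, 60, 600)
sup_ratio = max(norm_p(p) / np.sqrt(p) for p in ps)

psi2 = np.sqrt(8 / 3)  # exact psi_2-norm, from E exp(Z^2/r^2) = (1 - 2/r^2)^(-1/2) = 2
print(sup_ratio, "<=", psi2, "<=", 4 * sup_ratio)  # the two-sided bound (8) holds
```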

Given a measurable subset A of M with μ(A) > 0, we consider the normalized restricted measure μ A on M, i.e.,

$$\displaystyle{\mu _{A}(B) = \frac{\mu (A \cap B)} {\mu (A)},\qquad B \subset M.}$$

Our basic tool leading to Theorem 1.1 will be the following assertion.

Proposition 2.2

For any measurable function f on M,

$$\displaystyle{ \|\,f\|_{L^{\psi _{2}}(\mu _{A})}\, \leq \, 4e\,\log ^{1/2}\Big( \frac{e} {\mu (A)}\Big)\,\|\,f\|_{L^{\psi _{2}}(\mu )}. }$$
(10)

Proof

Assume that \(\|\,f\|_{L^{\psi _{2}}(\mu )} = 1\) and fix p ≥ 1. By the left inequality in (8), for any q ≥ 1,

$$\displaystyle{q^{q/2} \geq \int \vert \,f\vert ^{q}\,d\mu \geq \mu (A)\int \vert \,f\vert ^{q}\,d\mu _{ A},}$$

so

$$\displaystyle{\frac{\|\,f\|_{L^{q}(\mu _{A})}} {\sqrt{q}} \leq \bigg ( \frac{1} {\mu (A)}\bigg)^{1/q}.}$$

But by the right inequality in (8),

$$\displaystyle{\|\,f\|_{\psi _{2}}\, \leq \, 4\,\sup _{q\geq 1} \frac{\|\,f\|_{q}} {\sqrt{q}}\, \leq \, 4\sqrt{p}\,\sup _{q\geq p} \frac{\|\,f\|_{q}} {\sqrt{q}}.}$$

Applying it on the space (M, μ A ), we then get

$$\displaystyle\begin{array}{rcl} \|\,f\|_{L^{\psi _{2}}(\mu _{A})}& \leq & 4\sqrt{p}\,\sup _{q\geq p}\frac{\|\,f\|_{L^{q}(\mu _{A})}} {\sqrt{q}} {}\\ & \leq & 4\sqrt{p}\,\sup _{q\geq p}\,\bigg( \frac{1} {\mu (A)}\bigg)^{1/q}\, =\, 4\sqrt{p}\,\bigg( \frac{1} {\mu (A)}\bigg)^{1/p}. {}\\ \end{array}$$

The obtained inequality,

$$\displaystyle{\|\,f\|_{L^{\psi _{2}}(\mu _{A})}\, \leq \, 4\sqrt{p}\,\bigg( \frac{1} {\mu (A)}\bigg)^{1/p},}$$

holds true for any p ≥ 1 and therefore may be optimized over p. Choosing \(p =\log \frac{e} {\mu (A)}\), so that \((1/\mu (A))^{1/p} \leq e\), we arrive at (10). □ 
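The choice of p in the proof is nearly optimal. As an illustration (our addition, not part of the argument), one may compare the value of the bound at \(p =\log \frac{e} {\mu (A)}\) with a brute-force minimization over p ≥ 1:

```python
import numpy as np

def bound(p, mu_A):
    # The pre-optimization bound 4 * sqrt(p) * (1/mu(A))^(1/p), p >= 1.
    return 4 * np.sqrt(p) * (1 / mu_A) ** (1 / p)

for mu_A in [0.5, 0.1, 0.01, 1e-6]:
    p_star = np.log(np.e / mu_A)         # the choice made in the proof
    grid = np.linspace(1, 200, 100_000)  # brute-force search over p
    best = bound(grid, mu_A).min()
    print(f"mu(A)={mu_A}: proof choice {bound(p_star, mu_A):.3f}, optimum {best:.3f}")
```

In each case the two values agree within a universal factor, in line with the estimate above.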

A possible weak point in the bound (10) is that the mean of f is not involved. For example, in applications, if f were defined only on A and had μ_A-mean zero, we might need to find an extension of f to the whole space M keeping the mean zero with respect to μ. In fact, this does not create any difficulty, since one may work with the symmetrization of f.

More precisely, we may apply Proposition 2.2 on the product space (M × M, μ ⊗ μ) to the product sets A × A and functions of the form f(x) − f(y). Then we get

$$\displaystyle{\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu _{A}\otimes \mu _{A})}\, \leq \, 4e\,\log ^{1/2}\bigg( \frac{e} {\mu (A)^{2}}\bigg)\,\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )}.}$$

Since \(\log \big( \frac{e} {\mu (A)^{2}} \big) \leq 2\log \big( \frac{e} {\mu (A)}\big),\) we arrive at:

Corollary 2.3

For any measurable function f on M,

$$\displaystyle{\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu _{A}\otimes \mu _{A})}\, \leq \, 4e\sqrt{2}\,\log ^{1/2}\Big( \frac{e} {\mu (A)}\Big)\,\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )}.}$$

Let us now derive an analog of Proposition 2.2 for the ψ 1-norm, using similar arguments. Assume that \(\|\,f\|_{L^{\psi _{1}}(\mu )} = 1\) and fix p ≥ 1. By the left inequality in (9), for any q ≥ 1,

$$\displaystyle{q^{q} \geq \int \vert \,f\vert ^{q}\,d\mu \geq \mu (A)\int \vert \,f\vert ^{q}\,d\mu _{ A},}$$

so

$$\displaystyle{\frac{\|\,f\|_{L^{q}(\mu _{A})}} {q} \leq \Big ( \frac{1} {\mu (A)}\Big)^{1/q}.}$$

But, by the right inequality in (9),

$$\displaystyle{\|\,f\|_{L^{\psi _{1}}}\, \leq \, 6\,\sup _{q\geq 1}\frac{\|\,f\|_{q}} {q} \, \leq \, 6p\,\sup _{q\geq p}\frac{\|\,f\|_{q}} {q}.}$$

Applying it on the space (M, μ A ), we get

$$\displaystyle\begin{array}{rcl} \|\,f\|_{L^{\psi _{1}}(\mu _{A})}& \leq & 6p\,\sup _{q\geq p}\frac{\|\,f\|_{L^{q}(\mu _{A})}} {q} {}\\ & \leq & 6p\,\sup _{q\geq p}\,\Big( \frac{1} {\mu (A)}\Big)^{1/q}\, =\, 6p\,\Big( \frac{1} {\mu (A)}\Big)^{1/p}. {}\\ \end{array}$$

The obtained inequality,

$$\displaystyle{\|\,f\|_{L^{\psi _{1}}(\mu _{A})}\, \leq \, 6p\,\Big( \frac{1} {\mu (A)}\Big)^{1/p},}$$

holds true for any p ≥ 1 and therefore may be optimized over p. Choosing \(p =\log \frac{e} {\mu (A)}\), we arrive at:

Proposition 2.4

For any measurable function f on M, we have

$$\displaystyle{\|\,f\|_{L^{\psi _{1}}(\mu _{A})}\, \leq \, 6e\,\log \Big( \frac{e} {\mu (A)}\Big)\,\|\,f\|_{L^{\psi _{1}}(\mu )}.}$$

Similarly to Corollary 2.3, one may write down this relation on the product probability space (M × M, μ ⊗ μ) with the functions of the form \(\tilde{f}(x,y) = f(x) - f(y)\) and the product sets \(\tilde{A} = A \times A\). Then we get

$$\displaystyle{ \|\,f(x) - f(y)\|_{L^{\psi _{1}}(\mu _{A}\otimes \mu _{A})}\, \leq \, 12\,e\,\log \Big( \frac{e} {\mu (A)}\Big)\,\|\,f(x) - f(y)\|_{L^{\psi _{1}}(\mu \otimes \mu )}. }$$
(11)

3 Proof of Theorem 1.1: Transport-Entropy Formulation

The finiteness of the subgaussian constant for a given metric probability space (M, d, μ) means that the ψ₂-norms of mean zero Lipschitz functions on M are uniformly bounded. Equivalently, for some (equivalently, for all) x₀ ∈ M, we have, with some λ > 0,

$$\displaystyle{\int e^{d(x,x_{0})^{2}/\lambda ^{2} }\,d\mu (x) <\infty.}$$

The definition (1) of σ²(μ) suggests considering another norm-like quantity

$$\displaystyle{\sigma _{f}^{2} =\sup _{ t\neq 0}\bigg[ \frac{1} {t^{2}/2}\log \int e^{tf}\,d\mu \bigg].}$$

Here is a well-known relation (with explicit numerical constants) which holds in the setting of an abstract probability space (M, μ). Once again, we include a proof in the Appendix for completeness.

Lemma 3.1

If f has mean zero and finite ψ 2 -norm, then

$$\displaystyle{ \frac{1} {\sqrt{6}}\,\|\,f\|_{\psi _{2}}^{2} \leq \sigma _{ f}^{2} \leq 4\,\|\,f\|_{\psi _{ 2}}^{2}.}$$

One can now relate the subgaussian constant of the restricted measure to that of the original measure. Let now (M, d, μ) be a metric probability space. First, Lemma 3.1 immediately yields an equivalent description in terms of ψ₂-norms, namely

$$\displaystyle{ \frac{1} {\sqrt{6}}\,\sup _{f}\|\,f\|_{\psi _{2}}^{2} \leq \sigma ^{2}(\mu ) \leq 4\,\sup _{ f}\|\,f\|_{\psi _{2}}^{2}, }$$
(12)

where the supremum is running over all \(f: M \rightarrow \mathbb{R}\) with μ-mean zero and ‖f‖_Lip ≤ 1. Here, one can get rid of the mean zero assumption by considering functions of the form f(x) − f(y) on the product space (M × M, μ ⊗ μ, d₁), where d₁ is the ℓ¹-type metric given by d₁((x₁, y₁), (x₂, y₂)) = d(x₁, x₂) + d(y₁, y₂). If f has mean zero, then, by Jensen’s inequality,

$$\displaystyle{\int \!\!\int e^{(\,f(x)-f(y))^{2}/r^{2} }\,d\mu (x)\,d\mu (y) \geq \int e^{f(x)^{2}/r^{2} }\,d\mu (x),}$$

which implies that

$$\displaystyle{\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )} \geq \|\, f\|_{L^{\psi _{2}}(\mu )}.}$$

On the other hand, by the triangle inequality,

$$\displaystyle{\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )} \leq 2\,\|\,f\|_{L^{\psi _{2}}(\mu )}.}$$

Hence, we arrive at another, more flexible relation, where the mean zero assumption may be removed.

Lemma 3.2

We have

$$\displaystyle{ \frac{1} {4\sqrt{6}}\,\sup _{f}\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )}^{2} \leq \sigma ^{2}(\mu ) \leq 4\,\sup _{ f}\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )}^{2},}$$

where the supremum is running over all functions f on M with ∥ f∥ Lip ≤ 1.

Proof of Theorem 1.1

We are now prepared to make the last steps in the proof of the inequality (4). We use the well-known theorem of Kirszbraun: any function \(f: A \rightarrow \mathbb{R}\) with Lipschitz semi-norm ‖f‖_Lip ≤ 1 on A admits a Lipschitz extension to the whole space [10, 14]. Namely, one may put

$$\displaystyle{\tilde{f}(x) =\inf _{a\in A}\big[f(a) + d(a,x)\big],\quad x \in M.}$$

Applying first Corollary 2.3 and then the left inequality of Lemma 3.2 to \(\tilde{f}\), we get

$$\displaystyle\begin{array}{rcl} \|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu _{A}\otimes \mu _{A})}^{2}& =& \big\|\tilde{f}(x) -\tilde{ f}(y)\big\|_{ L^{\psi _{2}}(\mu _{A}\otimes \mu _{A})}^{2} {}\\ & \leq & \big(4e\sqrt{2}\,\big)^{2}\,\log \Big( \frac{e} {\mu (A)}\Big)\,\big\|\tilde{f}(x) -\tilde{ f}(y)\big\|_{L^{\psi _{2}}(\mu \otimes \mu )}^{2} {}\\ & \leq & \big(4e\sqrt{2}\,\big)^{2}\,\log \Big( \frac{e} {\mu (A)}\Big) \cdot \big (4\sqrt{6}\,\big)^{2}\,\sigma ^{2}(\mu ). {}\\ \end{array}$$

Another application of Lemma 3.2 — in the space (A, d, μ A ) (now the right inequality) yields

$$\displaystyle{\sigma ^{2}(\mu _{ A})\, \leq \, 4 \cdot \big (4e\sqrt{2}\,\big)^{2}\,\log \Big( \frac{e} {\mu (A)}\Big) \cdot \big (4\sqrt{6}\,\big)^{2}\,\sigma ^{2}(\mu ).}$$

This is exactly (4) with constant \(c = 4 \cdot (4e\sqrt{2}\,)^{2}\,(4\sqrt{6}\,)^{2} = 3 \cdot 2^{12}e^{2} = 90,796.72\ldots\) □ 
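To illustrate the extension step used above, here is a small self-contained sketch (ours) on a finite metric space: the explicit formula \(\tilde{f}(x) =\min _{a\in A}\big[f(a) + d(a,x)\big]\) agrees with f on A and does not increase the Lipschitz constant.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.standard_normal((30, 2))  # a finite metric space: 30 points in the plane
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

A = np.arange(10)            # indices forming the subset A
f_A = np.sin(pts[A, 0])      # a 1-Lipschitz function defined on A only

# Explicit 1-Lipschitz extension: f~(x) = min over a in A of [f(a) + d(a, x)].
f_ext = (f_A[:, None] + d[A, :]).min(axis=0)

print(np.allclose(f_ext[A], f_A))   # True: the extension agrees with f on A
lip = max(abs(f_ext[i] - f_ext[j]) / d[i, j]
          for i in range(30) for j in range(30) if i != j)
print(lip <= 1 + 1e-12)             # True: the Lipschitz semi-norm is still <= 1
```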

Remark 3.3

Let us also record the following natural generalization of Theorem 1.1, which is obtained along the same arguments. Given a collection \(\mathcal{F}\) of (integrable) functions on the probability space (M, μ), define \(\sigma _{\mathcal{F}}^{2}(\mu )\) as the infimum over all σ 2 such that

$$\displaystyle{\int e^{t(\,f-m)}\,d\mu \leq e^{\sigma ^{2}t^{2}/2 },\quad \ \text{for all}\ \ t \in \mathbb{R},}$$

for any \(f \in \mathcal{F}\), where m = ∫ fd μ. Then with the same constant c as in Theorem 1.1, for any measurable A ⊂ M, μ(A) > 0, we have

$$\displaystyle{\sigma _{\mathcal{F}_{A}}^{2}(\mu _{ A})\, \leq \, c\,\log \Big( \frac{e} {\mu (A)}\Big)\,\sigma _{\mathcal{F}}^{2}(\mu ),}$$

where \(\mathcal{F}_{A}\) denotes the collection of restrictions of functions f from \(\mathcal{F}\) to the set A.

Let us now mention an interesting connection of the subgaussian constants with the Kantorovich distances

$$\displaystyle{W_{1}(\mu,\nu ) =\inf \int \!\!\!\int d(x,y)\,d\pi (x,y)}$$

and the relative entropies

$$\displaystyle{D(\nu \vert \vert \mu ) =\int \log \frac{d\nu } {d\mu }\,d\nu }$$

(also called Kullback-Leibler distances or informational divergences). Here, ν is a probability measure on M which is absolutely continuous with respect to μ (for short, ν ≪ μ), and the infimum in the definition of W₁ is running over all probability measures π on the product space M × M with marginal distributions μ and ν, i.e., such that

$$\displaystyle{\pi (B \times M) =\mu (B),\quad \pi (M \times B) =\nu (B)\qquad (\mathrm{Borel}\ B \subset M).}$$

As was shown in [7], if (M, d) is a Polish space (complete separable), the subgaussian constant σ 2 = σ 2(μ) may be described as an optimal value in the transport-entropy inequality

$$\displaystyle{ W_{1}(\mu,\nu ) \leq \sqrt{2\sigma ^{2 } D(\nu \vert \vert \mu )}. }$$
(13)

Hence, we obtain from the inequality (4) a similar relation for measures ν supported on given subsets of M.

Corollary 3.4

Given a Borel probability measure μ on a Polish space (M,d) and a closed set A in M such that μ(A) > 0, for any Borel probability measure ν supported on A,

$$\displaystyle{W_{1}^{2}(\mu _{ A},\nu ) \leq c\sigma ^{2}(\mu )\log \Big( \frac{e} {\mu (A)}\Big)\,D(\nu \vert \vert \mu _{A}),}$$

where c is an absolute constant.

This assertion is actually equivalent to Theorem 1.1. Note that, for ν supported on A, there is the identity D(ν || μ_A) = log μ(A) + D(ν || μ), since dν∕dμ_A = μ(A) dν∕dμ ν-almost everywhere. In particular, D(ν || μ_A) ≤ D(ν || μ), so the relative entropies decrease when turning to restricted measures.

For another (almost equivalent) description of the subgaussian constant, introduce the concentration function

$$\displaystyle{\mathcal{K}_{\mu }(r)\, =\,\sup \,\big [1 -\mu (A^{r})\big]\qquad (r> 0),}$$

where A^r = {x ∈ M: d(x, a) < r for some a ∈ A} denotes an open r-neighbourhood of A for the metric d, and the sup is running over all Borel sets A ⊂ M of measure \(\mu (A) \geq \frac{1} {2}\). As is well known, the transport-entropy inequality (13) gives rise to a concentration inequality on (M, d, μ) of subgaussian type (K. Marton’s argument), but this can also be seen by a direct application of (1). Indeed, for any function f on M with ‖f‖_Lip ≤ 1, it implies

$$\displaystyle{\int \!\!\!\int e^{t(\,f(x)-f(y))}\,d\mu (x)\,d\mu (y)\, \leq \, e^{\sigma ^{2}t^{2} },\qquad \ t \in \mathbb{R},}$$

and, by Chebyshev’s inequality, we have a deviation bound

$$\displaystyle{(\mu \otimes \mu )\{(x,y) \in M \times M: f(x) - f(y) \geq r\}\, \leq \, e^{-r^{2}/4\sigma ^{2} },\qquad r \geq 0.}$$

In particular, one may apply it to the distance functions f(x) = d(A, x) = inf a ∈ A d(a, x). Assuming that \(\mu (A) \geq \frac{1} {2}\), the measure on the left-hand side is greater than or equal to \(\frac{1} {2}\,(1 -\mu (A^{r}))\), so that we obtain a concentration inequality

$$\displaystyle{1 -\mu (A^{r}) \leq 2e^{-r^{2}/4\sigma ^{2} }.}$$

Therefore,

$$\displaystyle{\mathcal{K}_{\mu }(r)\, \leq \,\min \Big\{ \frac{1} {2},2e^{-r^{2}/4\sigma ^{2} }\Big\}\, \leq \, e^{-r^{2}/8\sigma ^{2} }.}$$

To argue in the opposite direction, suppose that the concentration function admits a bound of the form \(\mathcal{K}_{\mu }(r) \leq e^{-r^{2}/b^{2} }\) for all r > 0 with some constant b > 0. Given a function f on M with ∥ f ∥ Lip ≤ 1, let m be a median of f under μ. Then the set A = { f ≤ m} has measure \(\mu (A) \geq \frac{1} {2}\), and by the Lipschitz property, A r ⊂ { f < m + r} for all r > 0. Hence, by the concentration hypothesis,

$$\displaystyle{\mu \{\,f - m \geq r\} \leq \mathcal{K}_{\mu }(r) \leq e^{-r^{2}/b^{2} }.}$$

A similar deviation bound also holds for the function − f with its median − m, so that

$$\displaystyle{\mu \{\vert \,f - m\vert \geq r\} \leq 2e^{-r^{2}/b^{2} },\qquad r> 0.}$$

This is sufficient to properly estimate the ψ₂-norm of f − m on (M, μ). Namely, for any λ < 1∕b²,

$$\displaystyle\begin{array}{rcl} \int e^{\lambda \vert \,f-m\vert ^{2} }\,d\mu & =& 1 + 2\lambda \int _{0}^{\infty }re^{\lambda r^{2} }\,\mu \{\vert \,f - m\vert \geq r\}\,dr {}\\ & \leq & 1 + 2\lambda \int _{0}^{\infty }re^{\lambda r^{2} }\,e^{-r^{2}/b^{2} }\,dr\, =\, 1 + \frac{\lambda } { \frac{1} {b^{2}} -\lambda }\, =\, 2, {}\\ \end{array}$$

where in the last equality the value \(\lambda = \frac{1} {2b^{2}}\) is chosen. Thus, \(\int e^{\vert \,f-m\vert ^{2}/(2b^{2}) }\,d\mu \leq 2\), which means that \(\|\,f - m\|_{\psi _{2}} \leq \sqrt{2}\,b\). The latter gives \(\|\,f(x) - f(y)\|_{L^{\psi _{2}}(\mu \otimes \mu )} \leq 2\sqrt{2}\,b\). Taking the supremum over f, it remains to apply Lemma 3.2, and then we get σ 2(μ) ≤ 32 b 2.

Let us summarize.

Proposition 3.5

Let b = b(μ) be an optimal value such that the concentration function of the space (M,d,μ) satisfies a subgaussian bound \(\mathcal{K}_{\mu }(r) \leq e^{-r^{2}/b^{2} }(r> 0)\) . Then

$$\displaystyle{\frac{1} {8}\,b^{2}(\mu )\, \leq \,\sigma ^{2}(\mu ) \leq 32\,b^{2}(\mu ).}$$

Once this description of the subgaussian constant is recognized, one may give another proof of Theorem 1.1, by relating the concentration function \(\mathcal{K}_{\mu _{A}}\) to \(\mathcal{K}_{\mu }\). In this connection, let us state below as a lemma one general observation due to Barthe and Milman (cf. [4], Lemma 2.1, p. 585).

Lemma 3.6

Let a Borel probability measure ν on M be absolutely continuous with respect to μ and have density p. Suppose that, for some right-continuous non-increasing function R: (0,1∕4] → (0,∞) such that β(ɛ) = ɛ∕R(ɛ) is increasing, we have

$$\displaystyle{\nu \{x \in M: p(x)> R(\varepsilon )\} \leq \varepsilon \qquad \Big(0 <\varepsilon \leq \frac{1} {4}\Big).}$$

Then

$$\displaystyle{K_{\nu }(r) \leq 2\beta ^{-1}\big(K_{\mu }(r/2)\big),\quad for\ all\ \ r \geq 2K_{\mu }^{-1}(\beta (1/4)).}$$

Here β −1 denotes the inverse function, and K μ −1(ɛ) = inf{r > 0: K μ (r) < ɛ}.

Second Proof of Theorem 1.1

The normalized restricted measure ν = μ A has density \(p = \frac{1} {\mu (A)}\,1_{A}\) (thus taking only one non-zero value), and an optimal choice of R is the constant function \(R(\varepsilon ) = \frac{1} {\mu (A)}\). Hence, Lemma 3.6 yields the relation

$$\displaystyle{K_{\mu _{A}}(r) \leq \frac{2} {\mu (A)}\,K_{\mu }(r/2),\qquad \mathrm{for}\ \ r \geq 2K_{\mu }^{-1}(\mu (A)/4).}$$

In particular, if \(\mathcal{K}_{\mu }(r) \leq e^{-r^{2}/b^{2} }\), then

$$\displaystyle{K_{\mu _{A}}(r) \leq \frac{2} {\mu (A)}\,e^{-r^{2}/(4b^{2}) },\qquad \mathrm{for}\ \ r \geq 2b\sqrt{\log (4/\mu (A))}.}$$

Necessarily \(K_{\mu _{A}}(r) \leq \frac{1} {2}\), so the last relation may be extended to the whole positive half-axis. Moreover, at the expense of a factor in the exponent, one can remove the factor \(\frac{2} {\mu (A)}\); more precisely, we get \(K_{\mu _{A}}(r) \leq e^{-r^{2}/\tilde{b}^{2} }\) with \(\tilde{b}^{2} = \frac{4b^{2}} {\log 2} \,\log \frac{4} {\mu (A)}\), that is,

$$\displaystyle{b^{2}(\mu _{ A})\, \leq \, \frac{4} {\log 2}\,\log \frac{4} {\mu (A)}\,b^{2}(\mu ).}$$

It remains to apply the two-sided bound of Proposition 3.5. □ 

4 Proof of Theorem 1.3: Spectral Gap

Theorem 1.1 ensures, in particular, that for any function f on the metric probability space (M, d, μ) with Lipschitz semi-norm ‖f‖_Lip ≤ 1,

$$\displaystyle{\mathrm{Var}_{\mu _{A}}(\,f) \leq c\,\log \bigg( \frac{e} {\mu (A)}\bigg)\,\sigma ^{2}(\mu )}$$

up to some absolute constant c. In fact, in order to reach a similar concentration property of the restricted measures, it is enough to start with a Poincaré-type inequality on M,

$$\displaystyle{\lambda _{1}\mathrm{Var}_{\mu }(\,f) \leq \int \vert \nabla f\vert ^{2}\,d\mu.}$$

Under this hypothesis, a well-known theorem due to Gromov-Milman and Borovkov-Utev asserts that mean zero Lipschitz functions f have bounded ψ₁-norm. One may use a variant of this theorem proposed by Aida and Stroock [1], who showed that

$$\displaystyle{\int e^{\sqrt{\lambda _{1}} \,f}\,d\mu \leq K_{ 0} = 1.720102\ldots \qquad (\|\,f\|_{\mathrm{Lip}} \leq 1).}$$

Hence

$$\displaystyle{\int e^{\sqrt{\lambda _{1}} \,\vert \,f\vert }\,d\mu \leq 2K_{ 0}\quad \mathrm{and}\quad \int e^{\frac{1} {2} \sqrt{\lambda _{1}}\,\vert \,f\vert }\,d\mu \leq \sqrt{2K_{0}} <2,}$$

thus implying that \(\|\,f\|_{\psi _{1}} \leq \frac{2} {\sqrt{\lambda _{1}}}\). In addition,

$$\displaystyle{\iint e^{\sqrt{\lambda _{1}} \,(\,f(x)-f(y))}\,d\mu (x)d\mu (y) \leq K_{ 0}^{2}\quad \mathrm{and}\quad \iint e^{\sqrt{\lambda _{1}} \,\vert \,f(x)-f(y)\vert }\,d\mu (x)d\mu (y) \leq 2K_{ 0}^{2} <6.}$$

From this,

$$\displaystyle{\int e^{\frac{1} {3} \sqrt{\lambda _{1}}\,\vert \,f(x)-f(y)\vert }\,d\mu (x)d\mu (y) <6^{1/3} <2,}$$

which means that \(\|\,f(x) - f(y)\|_{\psi _{1}} \leq \frac{3} {\sqrt{\lambda _{1}}}\) with respect to the product measure μ ⊗ μ on the product space M × M. This inequality is translation invariant, so the mean zero assumption may be removed. Thus, we arrive at:

Lemma 4.1

Under the Poincaré-type inequality with spectral gap λ 1 > 0, for any mean zero function f on (M,d,μ) with ∥ f∥ Lip ≤ 1,

$$\displaystyle{\|\,f\|_{\psi _{1}} \leq \frac{2} {\sqrt{\lambda _{1}}}.}$$

Moreover, for any f with ∥ f∥ Lip ≤ 1,

$$\displaystyle{ \|\,f(x) - f(y)\|_{L^{\psi _{1}}(\mu \otimes \mu )} \leq \frac{3} {\sqrt{\lambda _{1}}}. }$$
(14)

This is a version of the concentration of measure phenomenon (with exponential integrability) in the presence of a Poincaré-type inequality. Our goal is therefore to extend this property to the normalized restricted measures μ_A. This can be achieved by virtue of the inequality (11), which, when combined with (14), yields the upper bound

$$\displaystyle{\|\,f(x) - f(y)\|_{L^{\psi _{1}}(\mu _{A}\otimes \mu _{A})}\, \leq \, 36\,e\,\log \bigg( \frac{e} {\mu (A)}\bigg)\, \frac{1} {\sqrt{\lambda _{1}}}.}$$

Moreover, if f has μ A -mean zero, the left norm dominates \(\|\,f\|_{L^{\psi _{1}}(\mu _{A})}\) (by Jensen’s inequality). We can summarize, taking into account once again Kirszbraun’s theorem, as we did in the proof of Theorem 1.1.

Proposition 4.2

Assume the metric probability space (M,d,μ) satisfies a Poincaré-type inequality with constant λ 1 > 0. Given a measurable set A ⊂ M with μ(A) > 0, for any function \(f: A \rightarrow \mathbb{R}\) with μ A -mean zero and such that ∥ f∥ Lip ≤ 1 on A,

$$\displaystyle{\|\,f\|_{L^{\psi _{1}}(\mu _{A})}\, \leq \, 36\,e\,\log \Big( \frac{e} {\mu (A)}\Big)\, \frac{1} {\sqrt{\lambda _{1}}}.}$$

Theorem 1.3 is now easily obtained with constant c = 2 (36e)2 by noting that L 2-norms are dominated by \(L^{\psi _{1}}\)-norms. More precisely, since \(e^{\vert t\vert }- 1 \geq \frac{1} {2}\,t^{2}\), one has \(\|\,f\|_{\psi _{1}}^{2} \geq \frac{1} {2}\,\|\,f\|_{2}^{2}\).

Remark 4.3

Related stability results are known for various classes of probability distributions on the Euclidean spaces \(M = \mathbb{R}^{n}\) (and even in a more general situation, where μ A is replaced by an absolutely continuous measure with respect to μ). See, in particular, the works by Milman [15, 16] on convex bodies and log-concave measures.

5 Examples

Theorems 1.1 and 1.3 cover many interesting examples. Here are a few obvious cases.

1. The standard Gaussian measure μ = γ on \(\mathbb{R}^{n}\) satisfies a logarithmic Sobolev inequality on \(M = \mathbb{R}^{n}\) with a dimension-free constant ρ = 1. Hence, from Theorem 1.1 we get:

Corollary 5.1

For any measurable set \(A \subset \mathbb{R}^{n}\) with γ(A) > 0, the subgaussian constant σ²(γ_A) of the normalized restricted measure γ_A satisfies

$$\displaystyle{\sigma ^{2}(\gamma _{ A})\, \leq \, c\,\log \Big( \frac{e} {\gamma (A)}\Big),}$$

where c is an absolute constant.

As was already mentioned, if A is convex, there is the sharper bound σ²(γ_A) ≤ 1. However, this may fail without the convexity assumption. Nevertheless, if γ(A) is bounded away from zero, we obtain a more universal principle.

Clearly, Corollary 5.1 extends to all product measures μ = νⁿ on \(\mathbb{R}^{n}\) such that ν satisfies a logarithmic Sobolev inequality on the real line, with constants c depending only on ρ. A characterization of the property ρ > 0 in terms of the distribution function of the measure ν and the density of its absolutely continuous component may be found in [7].

2. Consider the uniform distribution ν on the shell

    $$\displaystyle{A_{\varepsilon } =\big\{ x \in \mathbb{R}^{n}: 1-\varepsilon \leq \vert x\vert \leq 1\big\},\qquad 0 \leq \varepsilon \leq 1\ \ (n \geq 2).}$$

Corollary 5.2

The subgaussian constant of ν satisfies \(\sigma ^{2}(\nu ) \leq \frac{c} {n}\) with some absolute constant c.

In other words, for mean zero Lipschitz functions f on A_ε, the functions \(\sqrt{n}\,f\) are subgaussian up to a universal constant factor. This property is well known in the extreme cases: on the unit Euclidean ball A = B^n (ε = 1) and on the unit sphere A = S^{n−1} (ε = 0).

Let μ denote the normalized Lebesgue measure on B n . In the case \(\varepsilon \geq \frac{1} {n}\), the shell A ɛ represents the part of B n of measure

$$\displaystyle{\mu (A_{\varepsilon }) = 1 - (1-\varepsilon )^{n}\, \geq \, 1 -\Big (1 - \frac{1} {n}\Big)^{n} \geq 1 -\frac{1} {e}.}$$

Since the logarithmic Sobolev constant of the unit ball is of order n, so that \(\sigma ^{2}(\mu ) \leq \frac{c} {n}\), the assertion of Corollary 5.2 immediately follows from Theorem 1.1.

In case \(\varepsilon \leq \frac{1} {n}\), the assertion follows from a similar concentration property of the uniform distribution σ n−1 on the unit sphere. Indeed, with every Lipschitz function f on A ɛ one may associate its restriction to S n−1, which is also Lipschitz (with respect to the Euclidean distance). We have \(\vert \,f(r\theta ) - f(\theta )\vert \leq \vert r - 1\vert \leq \varepsilon \leq \frac{1} {n}\), for any r ∈ [1 −ɛ, 1] and θ ∈ S n−1. Hence,

$$\displaystyle\begin{array}{rcl} \vert \,f(r^{{\prime}}\theta ^{{\prime}}) - f(r\theta )\vert & \leq & \vert \,f(\theta ^{{\prime}}) - f(\theta )\vert + \vert \,f(r^{{\prime}}\theta ^{{\prime}}) - f(\theta ^{{\prime}})\vert + \vert \,f(r\theta ) - f(\theta )\vert {}\\ &\leq & \vert \,f(\theta ^{{\prime}}) - f(\theta )\vert + \frac{2} {n}, {}\\ \end{array}$$

whenever r, r  ∈ [1 −ɛ, 1] and θ, θ  ∈ S n−1, which implies

$$\displaystyle{\vert \,f(r^{{\prime}}\theta ^{{\prime}}) - f(r\theta )\vert ^{2}\, \leq \, 2\,\vert \,f(\theta ^{{\prime}}) - f(\theta )\vert ^{2} + \frac{8} {n^{2}}.}$$

But the map (r, θ) → θ pushes forward ν onto σ_{n−1}, so we obtain that, for any c > 0,

$$\displaystyle\begin{array}{rcl} & & \int \!\!\!\int \exp \{cn\,\vert \,f(r^{{\prime}}\theta ^{{\prime}}) - f(r\theta )\vert ^{2}\}\,d\nu (r^{{\prime}},\theta ^{{\prime}})\,d\nu (r,\theta ) {}\\ & & \quad \leq e^{8c/n}\int \!\!\!\int \exp \{2cn\,\vert \,f(\theta ^{{\prime}}) - f(\theta )\vert ^{2}\}\,d\sigma _{ n-1}(\theta ^{{\prime}})\,d\sigma _{ n-1}(\theta ). {}\\ \end{array}$$

Here, for a certain numerical constant c > 0, the right-hand side is bounded by a universal constant, by the concentration property of σ_{n−1}. At the expense of decreasing c, this constant can be replaced with 2, using Jensen’s inequality. The assertion then follows from Lemma 3.2.

3. The two-sided product exponential measure μ on \(\mathbb{R}^{n}\) with density \(2^{-n}e^{-(\vert x_{1}\vert +\ldots +\vert x_{n}\vert )}\) satisfies a Poincaré-type inequality on \(M = \mathbb{R}^{n}\) with a dimension-free constant λ₁ = 1∕4. Hence, from Proposition 4.2 we get:

Corollary 5.3

For any measurable set \(A \subset \mathbb{R}^{n}\) with μ(A) > 0, and for any function \(f: A \rightarrow \mathbb{R}\) with μ A -mean zero and ∥ f∥ Lip ≤ 1, we have

$$\displaystyle{\|\,f\|_{L^{\psi _{1}}(\mu _{A})}\, \leq \, c\,\log \Big( \frac{e} {\mu (A)}\Big),}$$

where c is an absolute constant. In particular,

$$\displaystyle{s^{2}(\mu _{ A})\, \leq \, c\,\log ^{2}\Big( \frac{e} {\mu (A)}\Big).}$$

Clearly, Corollary 5.3 extends to all product measures μ = νⁿ on \(\mathbb{R}^{n}\) such that ν satisfies a Poincaré-type inequality on the real line, with constants c depending only on λ₁. A characterization of the property λ₁ > 0 may also be given in terms of the distribution function of ν and the density of its absolutely continuous component (cf. [7]).

4a. Let us take the metric probability space ({0, 1}ⁿ, d_n, μ), where d_n is the Hamming distance, that is, d_n(x, y) = #{i: x_i ≠ y_i}, equipped with the uniform measure μ. For this particular space, Marton established the transport-entropy inequality (13) with the optimal constant \(\sigma ^{2} = \frac{n} {4}\), cf. [12]. Using the relation (13) as an equivalent definition of the subgaussian constant, we obtain from Theorem 1.1:

Corollary 5.4

For any non-empty set A ⊂{ 0,1} n , the subgaussian constant σ 2 A ) of the normalized restricted measure μ A satisfies, up to an absolute constant c,

$$\displaystyle{ \sigma ^{2}(\mu _{ A})\, \leq \, cn\,\log \Big( \frac{e} {\mu (A)}\Big). }$$
(15)
4b. Let us now assume that A is monotone, i.e., A satisfies the condition

$$\displaystyle{(x_{1},\ldots,x_{n}) \in A\quad \Rightarrow\quad (y_{1},\ldots,y_{n}) \in A,\ \mathrm{whenever}\ \ y_{i} \geq x_{i},\ i = 1,\ldots,n.}$$

Recall that the discrete cube can be equipped with a natural graph structure: there is an edge between x and y whenever they are at Hamming distance d_n(x, y) = 1. For monotone sets A, the graph metric d_A on the subgraph induced by A is equal to the restriction of d_n to A × A. Indeed, by monotonicity x∨y ∈ A for x, y ∈ A, and

$$\displaystyle{d_{n}(x,y) \leq d_{A}(x,y) \leq d_{A}(x,x\vee y)+d_{A}(y,x\vee y) = d_{n}(x,x\vee y)+d_{n}(y,x\vee y) = d_{n}(x,y),}$$

where x∨y = (x₁∨y₁, …, xₙ∨yₙ). Thus,

$$\displaystyle{s^{2}(\mu _{ A},d_{A}) \leq \sigma ^{2}(\mu _{ A},d_{A}) \leq cn\log \left ( \frac{e} {\mu (A)}\right ).}$$

This can be compared with what follows from a recent result of Ding and Mossel (see [9]). The authors proved that the conductance (Cheeger constant) of (A, μ_A) satisfies \(\phi (A) \geq \frac{\mu (A)} {16n}\). However, this type of isoperimetric result need not imply sharp concentration bounds. Indeed, by Cheeger’s inequality, the above bound only leads to λ₁ ≥ cμ(A)²∕n² and s²(μ_A, d_A) ≤ 1∕λ₁ ≤ Cn²∕μ(A)², which is even worse than the trivial estimate \(s^{2}(\mu _{A},d_{A}) \leq \frac{1} {2}\,\mathrm{diam}(A)^{2} \leq n^{2}/2\).

5. Let (M, d, μ) be a (separable) metric probability space with finite subgaussian constant σ²(μ). The previous example can be naturally generalized to the product space (Mⁿ, μⁿ), when it is equipped with the ℓ¹-type metric

$$\displaystyle{d_{n}(x,y) =\sum _{ i=1}^{n}d(x_{ i},y_{i}),\qquad x = (x_{1},\ldots,x_{n}),\ y = (y_{1},\ldots,y_{n}) \in M^{n}.}$$

This can be done with the help of the following elementary observation.

Proposition 5.5

The subgaussian constant of the space (Mⁿ, d_n, μⁿ) is related to the subgaussian constant of (M, d, μ) by the equality σ²(μⁿ) = nσ²(μ).

Indeed, one may argue by induction on n. Let f be a function on Mⁿ. The Lipschitz property ‖f‖_Lip ≤ 1 with respect to d_n is equivalent to the assertion that f is coordinatewise Lipschitz, that is, any function of the form x_i → f(x) has Lipschitz semi-norm ≤ 1 on M for all fixed coordinates x_j ∈ M (j ≠ i). Hence, in this case, for all \(t \in \mathbb{R}\),

$$\displaystyle{\int _{M}e^{tf(x)}\,d\mu (x_{ n}) \leq \exp \Big\{ t\int _{M}f(x)\,d\mu (x_{n}) + \frac{\sigma ^{2}t^{2}} {2} \Big\},}$$

where σ² = σ²(μ). Here the function (x₁, …, x_{n−1}) → ∫_M f(x) dμ(x_n) is also coordinatewise Lipschitz. Integrating the above inequality with respect to dμ^{n−1}(x₁, …, x_{n−1}) and applying the induction hypothesis, we thus get

$$\displaystyle{\int _{M^{n}}e^{tf(x)}\,d\mu ^{n}(x) \leq \exp \Big\{ t\int _{ M^{n}}f(x)\,d\mu ^{n}(x) + n\,\frac{\sigma ^{2}t^{2}} {2} \Big\}.}$$

But this means that σ 2(μ n) ≤ n σ 2(μ).

For the opposite bound, it is sufficient to test (1) for (Mⁿ, d_n, μⁿ) on the class of all coordinatewise Lipschitz functions of the form f(x) = u(x₁) + ⋯ + u(x_n) with μ-mean zero functions u on M such that ‖u‖_Lip ≤ 1.
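For instance (a worked example added for illustration), let γ be the standard Gaussian measure on \(\mathbb{R}\) and consider \((\mathbb{R}^{n},\|\cdot \|_{1},\gamma ^{n})\). The function f(x) = x₁ + ⋯ + x_n satisfies ‖f‖_Lip ≤ 1 for d_n, has mean zero, and

$$\displaystyle{\int e^{tf}\,d\gamma ^{n} = e^{nt^{2}/2},\qquad t \in \mathbb{R},}$$

so that (1) forces σ²(γⁿ) ≥ n = nσ²(γ); the factor n in Proposition 5.5 is thus exact.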

Corollary 5.6

For any Borel set A ⊂ M n such that μ n (A) > 0, the subgaussian constant of the normalized restricted measure μ A n with respect to the ℓ 1 -type metric d n satisfies

$$\displaystyle{\sigma ^{2}(\mu _{ A}^{n}) \leq cn\sigma ^{2}(\mu )\,\log \Big( \frac{e} {\mu ^{n}(A)}\Big),}$$

where c is an absolute constant.

For example, if μ is a probability measure on \(M = \mathbb{R}\) such that \(\int _{-\infty }^{\infty }e^{x^{2}/\lambda ^{2} }\,d\mu (x) \leq 2\) (λ > 0), then for the restricted product measures we have

$$\displaystyle{ \sigma ^{2}(\mu _{ A}^{n})\, \leq \, cn\lambda ^{2}\,\log \Big( \frac{e} {\mu ^{n}(A)}\Big) }$$
(16)

with respect to the ℓ¹-norm ‖x‖₁ = |x₁| + ⋯ + |x_n| on \(\mathbb{R}^{n}\).

Indeed, by the integral hypothesis on μ, for any f on \(\mathbb{R}\) with ∥ f ∥ Lip ≤ 1,

$$\displaystyle\begin{array}{rcl} \int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }e^{(\,f(x)-f(y))^{2}/2\lambda ^{2} }\,d\mu (x)d\mu (y)& \leq & \int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }e^{(x-y)^{2}/2\lambda ^{2} }\,d\mu (x)d\mu (y) {}\\ & \leq & \int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }e^{(x^{2}+y^{2})/\lambda ^{2} }\,d\mu (x)d\mu (y)\, \leq \, 4. {}\\ \end{array}$$

Hence, if f has μ-mean zero, by Jensen’s inequality,

$$\displaystyle{\int _{-\infty }^{\infty }e^{f(x)^{2}/4\lambda ^{2} }\,d\mu (x)\, \leq \,\int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }e^{(\,f(x)-f(y))^{2}/4\lambda ^{2} }\,d\mu (x)d\mu (y)\, \leq \, 2,}$$

meaning that \(\|\,f\|_{L^{\psi _{2}}(\mu )} \leq 2\lambda\). By Lemma 3.1, cf. (12), it follows that σ 2(μ) ≤ 16λ 2, so, (16) holds true by an application of Corollary 5.6.

6 Deviations for Non-Lipschitz Functions

Let us now turn to the interesting question of the relationship between the distribution of a locally Lipschitz function and the distribution of its modulus of gradient. We still keep the setting of a metric probability space (M, d, μ) and assume it has a finite subgaussian constant σ² = σ²(μ) (σ ≥ 0).

Let us say that a continuous function f on M is locally Lipschitz if |∇f(x)| is finite for all x ∈ M. Recall that we consider the sets

$$\displaystyle{ A =\{ x \in M: \vert \nabla f(x)\vert \leq L\},\qquad L> 0. }$$
(17)

First we state a more general version of Corollary 1.2.

Theorem 6.1

Assume that a locally Lipschitz function f on M has Lipschitz semi-norms ≤ L on the sets of the form  (17) . If \(\mu \{\vert \nabla f\vert \geq L_{0}\} \leq \frac{1} {2}\) , then for all t > 0,

$$\displaystyle{ (\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 2\,\inf _{L\geq L_{0}}\Big[\,e^{-t^{2}/c\sigma ^{2}L^{2} } +\mu \big\{ \vert \nabla f\vert> L\big\}\Big], }$$
(18)

where c is an absolute constant.

Proof

Although the argument was already outlined in Sect. 1, let us replace (6) with a slightly different bound. First note that the Lipschitz semi-norm of f with respect to the metric d on M is the same as its Lipschitz semi-norm with respect to the metric on the set A induced from M (which is true for any non-empty subset of M). Hence, we are in a position to apply Theorem 1.1, and then the definition (1) for the normalized restriction μ_A yields a subgaussian bound

$$\displaystyle{\int \!\!\!\int e^{t(\,f(x)-f(y))}\,d\mu _{ A}(x)d\mu _{A}(y) \leq e^{c\sigma ^{2}L^{2}t^{2}/2 },\quad \ \text{for all}\ \ t \in \mathbb{R},}$$

where A is defined in (17) with L ≥ L₀, and where c is a universal constant. From this, for any t > 0,

$$\displaystyle{(\mu _{A} \otimes \mu _{A})\,\{(x,y) \in A \times A: \vert \,f(x) - f(y)\vert \geq t\} \leq 2e^{-t^{2}/(2c\sigma ^{2}L^{2}) },}$$

and therefore

$$\displaystyle{(\mu \otimes \mu )\,\{(x,y) \in A \times A: \vert \,f(x) - f(y)\vert \geq t\} \leq 2e^{-t^{2}/(2c\sigma ^{2}L^{2}) }.}$$

The product measure of the complement of A × A does not exceed 2μ{ | ∇f(x) | > L}, and we obtain (18). □ 

If \(\int e^{\vert \nabla f\vert ^{2} }\,d\mu \leq 2\), we have, by Chebyshev’s inequality, \(\mu \{\vert \nabla f\vert \geq L\} \leq 2e^{-L^{2} }\), so one may take \(L_{0} = \sqrt{\log 4}\). Theorem 6.1 then gives that, for any L 2 ≥ log4,

$$\displaystyle{(\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 2\,e^{-t^{2}/c\sigma ^{2}L^{2} } + 4e^{-L^{2} }.}$$

For t ≥ 2σ one may choose here \(L^{2} = \frac{t} {\sigma }\), leading to

$$\displaystyle{(\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 6\,e^{-t/c\sigma }\,,}$$

for some absolute constant c > 1. In case 0 ≤ t ≤ 2σ, this inequality is fulfilled automatically, so it holds for all t ≥ 0. As a result, with some absolute constant C,

$$\displaystyle{\|\,f(x) - f(y)\|_{\psi _{1}} \leq C\sigma,}$$

which is an equivalent way to state the inequality of Corollary 1.2.

As we have already mentioned, inequalities like (18) can be derived by the same arguments from subgaussian constants defined for other classes of functions. For example, one may consider the subgaussian constant \(\sigma _{\mathcal{F}}^{2}(\mu )\) for the class \(\mathcal{F}\) of all convex Lipschitz functions f on the Euclidean space \(M = \mathbb{R}^{n}\) (which we equip with the Euclidean distance). Note that |∇f(x)| is finite everywhere on \(\mathbb{R}^{n}\) when f is convex. Keeping in mind Remark 3.3, what we need is the following analog of Kirszbraun’s theorem:

Lemma 6.2

Let f be a convex function on \(\mathbb{R}^{n}\) . For any L > 0, there exists a convex function g on \(\mathbb{R}^{n}\) such that f = g on the set A ={ x: |∇f(x)|≤ L} and |∇g|≤ L on  \(\mathbb{R}^{n}\) .

Accepting for a moment this lemma without proof, we get:

Theorem 6.3

Assume that a convex function f on \(\mathbb{R}^{n}\) satisfies \(\mu \{\vert \nabla f\vert \geq L_{0}\} \leq \frac{1} {2}\) . Then for all t > 0,

$$\displaystyle{(\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 2\,\inf _{L\geq L_{0}}\Big[\,e^{-t^{2}/c\sigma ^{2}L^{2} } +\mu \big\{ \vert \nabla f\vert> L\big\}\Big],}$$

where \(\sigma ^{2} =\sigma _{ \mathcal{F}}^{2}(\mu )\) and c is an absolute constant.

For illustration, let μ = μ₁ ⊗ ⋯ ⊗ μ_n be an arbitrary product probability measure on the cube [−1, 1]^n. If f is convex and Lipschitz on \(\mathbb{R}^{n}\), thus with |∇f| ≤ 1, then

$$\displaystyle{ (\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 2e^{-t^{2}/c }. }$$
(19)

This is one of the forms of Talagrand’s concentration phenomenon for the family of convex sets/functions (cf. [11, 13, 18, 19]). That is, the subgaussian constants \(\sigma _{\mathcal{F}}^{2}(\mu )\) are bounded for the class \(\mathcal{F}\) of convex Lipschitz f and product measures μ on the cube. Hence, using Theorem 6.3, Talagrand’s deviation inequality (19) admits a natural extension to the class of non-Lipschitz convex functions:

Corollary 6.4

Let μ be a product probability measure on the cube, and let f be a convex function on \(\mathbb{R}^{n}\) . If \(\mu \{\vert \nabla f\vert \geq L_{0}\} \leq \frac{1} {2}\) , then for all t > 0,

$$\displaystyle{(\mu \otimes \mu )\big\{\vert \,f(x) - f(y)\vert \geq t\big\}\, \leq \, 2\,\inf _{L\geq L_{0}}\Big[\,e^{-t^{2}/cL^{2} } +\mu \big\{ \vert \nabla f\vert> L\big\}\Big],}$$

where c is an absolute constant.

In particular, we have a statement similar to Corollary 1.2 for this family of functions, namely

$$\displaystyle{\|\,f - m\|_{L^{\psi _{1}}(\mu )} \leq c\,\|\nabla f\|_{L^{\psi _{2}}(\mu )},}$$

where m is the μ-mean of f.

Proof of Lemma 6.2

An affine function \(l_{a,v}(x) = a + \left <x,v\right>\) (\(v \in \mathbb{R}^{n}\), \(a \in \mathbb{R}\)) is called a tangent function to f if f ≥ l_{a,v} on \(\mathbb{R}^{n}\) and f(x) = l_{a,v}(x) for at least one point x. It is well known that

$$\displaystyle{f(x) =\sup \{ l_{a,v}(x): l_{a,v} \in \mathcal{L}\},}$$

where \(\mathcal{L}\) denotes the collection of all tangent functions l a, v . Put,

$$\displaystyle{g(x) =\sup \{ l_{a,v}(x): l_{a,v} \in \mathcal{L},\ \vert v\vert \leq L\}.}$$

By the construction, g ≤ f on \(\mathbb{R}^{n}\) and, moreover,

$$\displaystyle\begin{array}{rcl} \|g\|_{\mathrm{Lip}}& \leq & \sup \{\|l_{a,v}\|_{\mathrm{Lip}}: l_{a,v} \in \mathcal{L},\ \vert v\vert \leq L\} {}\\ & =& \sup \{\vert v\vert: l_{a,v} \in \mathcal{L},\ \vert v\vert \leq L\}\, \leq \, L. {}\\ \end{array}$$

It remains to show that g = f on the set A = { | ∇f | ≤ L}. Let x ∈ A and let l a, v be tangent to f and such that l a, v (x) = f(x). This implies that \(f(y) - f(x) \geq \left <y - x,v\right>\) for all \(y \in \mathbb{R}^{n}\) and hence

$$\displaystyle{\vert \nabla f(x)\vert \, =\,\limsup _{y\rightarrow x}\frac{\vert \,f(y) - f(x)\vert } {\vert y - x\vert } \,\geq \,\limsup _{y\rightarrow x}\frac{\left <y - x,v\right>} {\vert y - x\vert } \, =\, \vert v\vert.}$$

Thus, | v | ≤ L, so that g(x) ≥ l a, v (x) = f(x). □ 
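For a concrete one-dimensional instance of this construction (our example): if \(f(x) = x^{2}/2\), the tangent line at a point a has slope a, and keeping only the tangents with |a| ≤ L produces the Huber-type function

$$\displaystyle{g(x) = \left \{\begin{array}{ll} x^{2}/2, &\vert x\vert \leq L,\\ L\vert x\vert - L^{2}/2,\quad &\vert x\vert> L, \end{array} \right.}$$

which indeed agrees with f on A = {|f′| ≤ L} = [−L, L] and is L-Lipschitz on the whole line.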

7 Optimality

Here we show that the logarithmic dependence on μ(A) in Theorems 1.1 and 1.3 is optimal, up to the universal constant c. We provide several examples.

Example 1

Let us return to Example 4 of Sect. 5, the discrete hypercube M = {0, 1}ⁿ equipped with the Hamming distance d_n and the uniform measure μ. Let us test the inequality (15) of Corollary 5.4 on the set A ⊂ {0, 1}ⁿ consisting of the n + 1 points

$$\displaystyle{(0,0,0,\ldots,0),\ \ (1,0,0,\ldots,0),\ \ (1,1,0,\ldots,0),\ \ \ldots,\ \ (1,1,1,\ldots,1).}$$

We have μ(A) = (n + 1)∕2ⁿ ≥ 1∕2ⁿ. The function \(f: A \rightarrow \mathbb{R}\), defined by

$$\displaystyle{f(x) = \sharp \{i: x_{i} = 1\} -\frac{n} {2},}$$

has Lipschitz semi-norm ‖f‖_Lip ≤ 1 with respect to d_n and has μ_A-mean zero. Moreover, \(\int f^{2}\,d\mu _{A} = \frac{n(n+2)} {12}\). Expanding the inequality \(\int e^{tf}\,d\mu _{A} \leq e^{\sigma ^{2}(\mu _{ A})\,t^{2}/2 }\) near the origin yields ∫ f² dμ_A ≤ σ²(μ_A). Hence, recalling that \(\sigma ^{2}(\mu ) \leq \frac{n} {4}\), we get

$$\displaystyle\begin{array}{rcl} \sigma ^{2}(\mu _{ A})\ & \geq & \int f^{2}\,d\mu _{ A}\, \geq \, \frac{n^{2}} {12} {}\\ & \geq & \frac{n} {3} \,\sigma ^{2}(\mu )\, \geq \, \frac{1} {3\log 2}\,\sigma ^{2}(\mu )\,\log \Big( \frac{1} {\mu (A)}\Big). {}\\ \end{array}$$

This example shows the optimality of (15) in the regime μ(A) → 0.
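The second moment used above can be checked exactly (our addition): under μ_A, f takes the value k − n∕2 with k uniform on {0, 1, …, n}, so ∫ f² dμ_A is the variance of a uniform variable on n + 1 equally spaced points.

```python
from fractions import Fraction

def restricted_second_moment(n):
    # f = k - n/2 with k uniform on {0, ..., n}; exact second moment under mu_A.
    vals = [Fraction(k) - Fraction(n, 2) for k in range(n + 1)]
    return sum(v * v for v in vals) / (n + 1)

for n in [2, 5, 10, 50]:
    assert restricted_second_moment(n) == Fraction(n * (n + 2), 12)
print("int f^2 dmu_A = n(n+2)/12 confirmed")
```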

Example 2

Let γ_n be the standard Gaussian measure on \(\mathbb{R}^{n}\), n ≥ 2. We have σ²(γ_n) = 1. Consider the normalized measure \(\gamma _{A_{R}}\) on the set

$$\displaystyle{A_{R} = \left \{(x_{1},x_{2},\ldots,x_{n}) \in \mathbb{R}^{n}:\ x_{ 1}^{2} + x_{ 2}^{2} \geq R^{2}\right \},\qquad R \geq 0.}$$

Using the property that the function \(\frac{1} {2}\,(x_{1}^{2} + x_{ 2}^{2})\) has a standard exponential distribution under the measure γ n , we find that \(\gamma _{n}(A_{R}) = e^{-R^{2}/2 }\). Moreover,

$$\displaystyle\begin{array}{rcl} s^{2}(\gamma _{ A_{R}})\, \geq \,\mathrm{ Var}_{\gamma _{A_{R}}}(x_{1})& =& \int x_{1}^{2}\,d\gamma _{ A_{R}}(x)\, =\, \frac{1} {2}\int (x_{1}^{2} + x_{ 2}^{2})\,d\gamma _{ A_{R}}(x) {}\\ & =& \frac{1} {e^{-R^{2}/2}}\int _{R^{2}/2}^{\infty }re^{-r}\,dr\, =\, \frac{R^{2}} {2} + 1\, =\,\log \Big ( \frac{e} {\gamma _{n}(A_{R})}\Big). {}\\ \end{array}$$

Therefore,

$$\displaystyle{\sigma ^{2}(\gamma _{ A_{R}})\, \geq \, s^{2}(\gamma _{ A_{R}})\, \geq \,\log \Big ( \frac{e} {\gamma _{n}(A_{R})}\Big),}$$

showing that the inequality (4) of Theorem 1.1 is optimal, up to the universal constant, for any value of γ n (A) ∈ [0, 1].
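The identity \(\int x_{1}^{2}\,d\gamma _{A_{R}} = R^{2}/2 + 1\) behind this example is easy to confirm by a quick Monte Carlo experiment with rejection sampling (our addition, in Python):

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, N = 4, 1.5, 1_000_000

x = rng.standard_normal((N, n))
mask = x[:, 0]**2 + x[:, 1]**2 >= R**2   # keep only the samples falling in A_R
print("gamma_n(A_R):", mask.mean(), "vs", np.exp(-R**2 / 2))
print("Var(x_1):    ", x[mask, 0].var(), "vs", R**2 / 2 + 1)
```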

Example 3

A similar conclusion can be made about the uniform probability measure μ on the Euclidean ball \(B(0,\sqrt{n})\) of radius \(\sqrt{n}\), centred at the origin (asymptotically for growing dimension n). To see this, it is sufficient to consider the cylinders

$$\displaystyle{A_{\varepsilon } =\big\{ (x_{1},y) \in \mathbb{R} \times \mathbb{R}^{n-1}: \vert x_{ 1}\vert \leq \sqrt{n -\varepsilon ^{2}}\ \,\mathrm{and}\,\,\vert y\vert \leq \varepsilon \big\},\qquad 0 <\varepsilon \leq \sqrt{n},}$$

and the function f(x) = x₁. We leave the corresponding computations to the reader.

Example 4

Let μ be the two-sided exponential measure on \(\mathbb{R}\) with density \(\frac{1} {2}\,e^{-\vert x\vert }\). In this case σ²(μ) = ∞, but, as is easy to see, 2 ≤ s²(μ) ≤ 4 (recall that \(\lambda _{1}(\mu ) = \frac{1} {4}\)). We are going to test the optimality of the inequality (7) on the sets \(A_{R} =\{ x \in \mathbb{R}: \vert x\vert \geq R\}\) (R ≥ 0). Clearly, μ(A_R) = e^{−R}, and we find that

$$\displaystyle\begin{array}{rcl} s^{2}(\mu _{ A_{R}}) \geq \mathrm{ Var}_{\mu _{A_{R}}}(x)& =& \int _{-\infty }^{\infty }x^{2}\,d\mu _{ A_{R}}(x)\, =\, \frac{1} {e^{-R}}\int _{R}^{\infty }r^{2}e^{-r}\,dr {}\\ & =& R^{2} + 2R + 2\, \geq \, (R + 1)^{2}\, =\,\log ^{2}\Big( \frac{e} {\mu (A_{R})}\Big). {}\\ \end{array}$$

Therefore,

$$\displaystyle{s^{2}(\mu _{ A_{R}}) \geq \log ^{2}\Big( \frac{e} {\mu (A_{R})}\Big),}$$

showing that the inequality (7) is optimal, up to the universal constant, for any value of μ(A) ∈ (0, 1].
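Finally, the tail integral used in the last computation, \(\int _{R}^{\infty }r^{2}e^{-r}\,dr = (R^{2} + 2R + 2)\,e^{-R}\), can be verified numerically (our addition, assuming SciPy is available):

```python
import numpy as np
from scipy.integrate import quad

for R in [0.0, 1.0, 3.0, 7.5]:
    tail, _ = quad(lambda r: r**2 * np.exp(-r), R, np.inf)
    assert abs(tail - (R**2 + 2 * R + 2) * np.exp(-R)) < 1e-7
print("tail integral identity confirmed")
```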