Abstract
We prove extensions of classical concentration inequalities for random variables that have α-subexponential tail decay for any α ∈ (0, 2]. This includes Hanson–Wright-type and convex concentration inequalities in various situations. In particular, we show uniform Hanson–Wright inequalities and convex concentration results for simple random tensors in the spirit of recent work by Klochkov–Zhivotovskiy (Electron J Probab 25(22):30, 2020) and Vershynin (Bernoulli 26(4):3139–3162, 2020).
Keywords
- Concentration of measure phenomenon
- Orlicz norms
- Subexponential random variables
- Hanson–Wright inequality
- Convex concentration
2020 Mathematics Subject Classification:
1 Introduction
The aim of this note is to compile a number of smaller results that extend some classical as well as more recent concentration inequalities for bounded or sub-Gaussian random variables to random variables with heavier (but still exponential type) tails. In detail, we shall consider random variables X that satisfy

$$\displaystyle \mathbb {P}(\lvert X \rvert \ge t) \le 2\exp \big (-(t/C_{1,\alpha })^\alpha \big ) \qquad \qquad (1.1)$$

for any t ≥ 0, some α ∈ (0, 2], and a suitable constant C1,α > 0. Such random variables are sometimes called α-subexponential (for α = 2, they are sub-Gaussian) or sub-Weibull(α) (cf. [23, Definition 2.2]).
There are several equivalent reformulations of (1.1), e. g., in terms of Lp norms:

$$\displaystyle \lVert X \rVert _{L^p} \le C_{2,\alpha } p^{1/\alpha } \qquad \qquad (1.2)$$

for any p ≥ 1. Another characterization is that these random variables have finite Orlicz norms of order α, i. e.,

$$\displaystyle \lVert X \rVert _{\Psi _\alpha } := \inf \{ t > 0 : \mathbb {E}\exp \big ((\lvert X \rvert /t)^\alpha \big ) \le 2 \} \le C_{3,\alpha }. \qquad \qquad (1.3)$$
If α < 1, \(\lVert \cdot \rVert _{\Psi _\alpha }\) is actually a quasi-norm; however, many norm-like properties (such as a triangle-type inequality) can nevertheless be recovered up to α-dependent constants (see, e. g., [12, Appendix A]). In fact, C1,α, C2,α, and C3,α can be chosen such that they only differ by a constant α-dependent factor.
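Although not part of the mathematical development, the Orlicz norm with the defining condition \(\mathbb {E}\exp ((\lvert X \rvert /t)^\alpha ) \le 2\) can be approximated numerically for a given sample, which is handy for sanity checks. The following Python sketch (an illustration, not part of the original text) estimates it by bisection; for symmetrized standard exponential data and α = 1, the true value is 2, since \(\mathbb {E}\exp (\lvert X \rvert /t) = t/(t-1) \le 2\) exactly when t ≥ 2.

```python
import numpy as np

def orlicz_norm(samples, alpha, iters=200):
    """Empirical Orlicz (quasi-)norm: smallest t with E exp((|X|/t)^alpha) <= 2."""
    x = np.abs(np.asarray(samples, dtype=float))

    def psi(t):
        with np.errstate(over="ignore"):        # tiny t overflows to inf, which is fine here
            return np.mean(np.exp((x / t) ** alpha))

    lo, hi = 1e-9, 100.0 * max(x.max(), 1e-9)   # psi(hi) is close to 1 < 2, so hi is feasible
    for _ in range(iters):                      # bisection: psi is decreasing in t
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if psi(mid) <= 2.0 else (mid, hi)
    return hi

rng = np.random.default_rng(0)
sample = rng.choice([-1.0, 1.0], 50_000) * rng.exponential(size=50_000)
est = orlicz_norm(sample, alpha=1.0)            # true Psi_1 norm of this law is 2
```
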
Note that α-subexponential random variables have log-convex (if α ≤ 1) or log-concave (if α ≥ 1) tails, i. e., \(t \mapsto - \log \mathbb {P}(\lvert {X}\rvert \ge t)\) is convex or concave, respectively. For log-convex or log-concave measures, two-sided Lp norm estimates for polynomial chaos (and as a consequence, concentration bounds) have been established over the last 25 years. In the log-convex case, results of this type have been derived for linear forms in [17] and for forms of any order in [12, 21]. For log-concave measures, starting with linear forms again in [10], important contributions have been made in [3, 24, 25, 27].
In this note, we mainly present four different results for functions of α-subexponential random variables: a Hanson–Wright-type inequality in Sect. 2, a version of the convex concentration inequality in Sect. 3, a uniform Hanson–Wright inequality in Sect. 4, and finally a convex concentration inequality for simple random tensors in Sect. 5. These results are partly based on and generalize recent research, e. g., [20] and [42]. In fact, they partially build upon each other: for instance, in the proofs of Sect. 5, we apply results both from Sects. 2 and 3. A more detailed discussion is provided in each of the sections.
Finally, let us introduce some conventions that we will use in this chapter.
Notations.
If X1, …, Xn is a sequence of random variables, we denote by X = (X1, …, Xn) the corresponding random vector. Moreover, we shall need the following types of norms throughout the paper:
- The norms \(\lVert x \rVert _p := (\sum _{i=1}^n\lvert {x_i}\rvert ^p)^{1/p}\) for \(x \in \mathbb {R}^n\)
- The Lp norms \(\lVert X \rVert _{L^p} := (\mathbb {E}|X|{ }^p)^{1/p}\) for random variables X (cf. (1.2))
- The Orlicz (quasi-)norms \(\lVert X \rVert _{\Psi _\alpha }\) as introduced in (1.3)
- The Hilbert–Schmidt and operator norms \(\lVert A \rVert _{\mathrm {HS}} := (\sum _{i,j} a_{ij}^2)^{1/2}\), \(\lVert A \rVert _{\mathrm {op}} := \sup \{ \lVert Ax \rVert _2 : \lVert x \rVert _2 = 1\}\) for matrices A = (aij)
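For concreteness, the two matrix norms in the last item can be computed directly, e. g., in Python with numpy (a small sketch, not part of the original text); note that \(\lVert A \rVert _{\mathrm {op}} \le \lVert A \rVert _{\mathrm {HS}}\) always holds.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                      # symmetric matrix, as considered in Sect. 2
hs = np.linalg.norm(A, "fro")          # Hilbert–Schmidt (Frobenius) norm
op = np.linalg.norm(A, 2)              # operator norm = largest singular value
```
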
The constants appearing in this chapter (typically denoted C or c) may vary from line to line. Without subscript, they are assumed to be absolute, and if they depend on α (only), we shall write Cα or cα.
2 A Generalized Hanson–Wright Inequality
Arguably, the most famous concentration result for quadratic forms is the Hanson–Wright inequality, which first appeared in [16]. We may state it as follows: assuming X1, …, Xn are centered, independent random variables satisfying \(\lVert X_i \rVert _{\Psi _2} \le K\) for any i and A = (aij) is a symmetric matrix, we have for any t ≥ 0

$$\displaystyle \mathbb {P}\big (\lvert X^TAX - \mathbb {E}X^TAX \rvert \ge t\big ) \le 2\exp \Big (-c\min \Big (\frac {t^2}{K^4\lVert A \rVert _{\mathrm {HS}}^2}, \frac {t}{K^2\lVert A \rVert _{\mathrm {op}}}\Big )\Big ).$$
For a modern proof, see [33], and for various developments, cf. [2, 4, 18, 43].
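As a purely numerical illustration of the Hanson–Wright phenomenon (a sketch under the assumption of Rademacher entries, which are sub-Gaussian with K of order one), one can check that \(\mathbb {E}X^TAX = \mathrm {trace}(A)\) for centered unit-variance entries and that the tails of the quadratic form decay rapidly on the scale of its standard deviation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 20_000
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                                        # symmetric matrix
X = rng.choice([-1.0, 1.0], size=(reps, n))              # Rademacher vectors
q = np.einsum("ri,ij,rj->r", X, A, X)                    # quadratic forms X^T A X
dev = np.abs(q - np.trace(A))                            # E X^T A X = trace(A) here
# Var(X^T A X) = 2 * sum_{i != j} a_ij^2 for Rademacher entries
sd = np.sqrt(2 * ((A**2).sum() - (np.diag(A)**2).sum()))
tails = [float(np.mean(dev >= k * sd)) for k in (1, 2, 3)]
```
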
In this note, we provide an extension of the Hanson–Wright inequality to random variables with bounded Orlicz norms of any order α ∈ (0, 2]. This complements the results in [12], where the case of α ∈ (0, 1] was considered, while for α = 2, we get back the actual Hanson–Wright inequality.
Theorem 21
For any α ∈ (0, 2], let X1, …, Xn be independent, centered random variables such that \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for any i and A = (aij) be a symmetric matrix. Then, for any t ≥ 0,

$$\displaystyle \mathbb {P}\big (\lvert X^TAX - \mathbb {E}X^TAX \rvert \ge t\big ) \le 2\exp \Big (-\frac {1}{C_\alpha }\min \Big (\frac {t^2}{K^4\lVert A \rVert _{\mathrm {HS}}^2}, \Big (\frac {t}{K^2\lVert A \rVert _{\mathrm {op}}}\Big )^{\alpha /2}\Big )\Big ).$$
Theorem 21 generalizes and implies a number of inequalities for quadratic forms in α-subexponential random variables (in particular for α = 1) that are spread throughout the literature. For a detailed discussion, see [12, Remark 1.7]. Note that it is possible to sharpen the tail estimate given by Theorem 21, cf., e. g., [12, Corollary 1.4] for α ∈ (0, 1] or [3, Theorem 3.2] for α ∈ [1, 2] (in fact, the proof of Theorem 21 works by evaluating the family of norms used therein). The main benefit of Theorem 21 is that it uses norms that are easily calculable and in many situations already sufficient for applications.
Before we give the proof of Theorem 21, let us briefly mention that for the standard Hanson–Wright inequality, a number of selected applications can be found in [33]. Some of them were generalized to α-subexponential random variables with α ≤ 1 in [12], and it is no problem to extend these proofs to any order α ∈ (0, 2] using Theorem 21. Here, we just focus on a single example that yields a concentration result for the Euclidean norm of a linear transformation of a vector X having independent components with bounded Orlicz norms around the Hilbert–Schmidt norm of the transformation matrix. This is a variant and extension of [12, Proposition 2.1] and will be applied in Sect. 5.
Proposition 22
Let X1, …, Xn be independent, centered random variables such that \({\mathbb {E}} X_i^2 = 1\) and \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2] and let B ≠ 0 be an m × n matrix. For any t ≥ 0, we have
In particular, for any t ≥ 0, it holds
For the proofs, let us recall some elementary relations that we will use throughout the paper to adjust the constants in the tail bounds we derive.
Adjusting constants.
For any two constants C1 > C2 > 1, we have for all r ≥ 0 and C > 0
whenever the left-hand side is smaller than or equal to 1 (cf., e. g., [35, Eq. (3.1)]). Moreover, for any α ∈ (0, 2), any γ > 0, and all t ≥ 0, we may always estimate
using \(\exp (-s^2) \le \exp (1-s^\alpha )\) for any s > 0 and (2.3). More precisely, we may choose \(C' := C/\log ^{1/\alpha }(2)\). Note that strictly speaking, the range of t∕C ≤ 1 is not covered by (2.3); however, in this case (in particular, choosing C′ as suggested), both sides of (2.4) are at least 1 anyway so that the right-hand side still provides a valid upper bound for any probability.
Let us now turn to the proof of Theorem 21. In what follows, we actually show that for any p ≥ 2,
From here, Theorem 21 follows by standard means (cf. [34, Proof of Theorem 3.6]). Moreover, we may restrict ourselves to α ∈ (1, 2], since the case of α ∈ (0, 1] has been proven in [12].
Proof of Theorem 21
First we shall treat the off-diagonal part of the quadratic form. Let \(w^{(1)}_i, w^{(2)}_i\) be independent (of each other as well as of the Xi) symmetrized Weibull random variables with scale 1 and shape α, i. e., \(w^{(j)}_i\) are symmetric random variables with \(\mathbb {P}(\lvert {w^{(j)}_i}\rvert \ge t) = \exp (-t^\alpha )\). In particular, for α ∈ [1, 2], the \(w^{(j)}_i\) have logarithmically concave tails.
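Such symmetrized Weibull variables are easy to simulate: if E ~ Exp(1), then \(E^{1/\alpha }\) has tail \(\exp (-t^\alpha )\), so multiplying by an independent random sign yields the required distribution. A minimal sketch (illustrative only):

```python
import numpy as np

def sym_weibull(alpha, size, rng):
    """Symmetric W with P(|W| >= t) = exp(-t**alpha): set |W| = E**(1/alpha), E ~ Exp(1)."""
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * rng.exponential(size=size) ** (1.0 / alpha)

rng = np.random.default_rng(3)
w = sym_weibull(1.5, 100_000, rng)
emp = float(np.mean(np.abs(w) >= 1.0))    # equals exp(-1) in the limit, for any alpha
```
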
Using standard decoupling and symmetrization arguments (cf. [8, Theorem 3.1.1 & Lemma 1.2.6]) as well as [3, Theorem 3.2] in the second inequality, for any p ≥ 2, it holds
where the norms \(\lVert A \rVert _{\mathcal {J},p}^{\mathcal {N}}\) are defined as in [3]. Instead of repeating the general definitions, we will only focus on the case we need in our situation. Indeed, for the symmetric Weibull distribution with parameter α, we have (again, in the notation of [3]) N(t) = tα, and so for α ∈ (1, 2], it follows that \(\hat {N}(t) = \min (t^2, \lvert {t}\rvert ^\alpha )\). Hence, the norms can be written as follows:
Before continuing with the proof, we next introduce a lemma that will help to rewrite the norms in a more tractable form. □
Lemma 23
For any p ≥ 2, define
Then I1(p) = I2(p).
Proof
The inclusion I1(p) ⊇ I2(p) is an easy calculation, and the inclusion I1(p) ⊆ I2(p) follows by defining \(z_i = \lVert (x_{ij})_j \rVert \) and \(y_{ij} = x_{ij} / \lVert (x_{ij})_j \rVert \) (or 0, if the norm is zero). □
Proof of Theorem 21, continued
For brevity, for any matrix A = (aij), let us write \(\lVert A \rVert _m := \max _{i = 1,\ldots ,n} ( \sum _{j = 1}^n a_{ij}^2 )^{1/2}\). Note that clearly, \(\lVert A \rVert _m \le \lVert A \rVert _{\mathrm {op}}\), since the i-th row norm equals \(\lVert A^T e_i \rVert _2 \le \lVert A^T \rVert _{\mathrm {op}} = \lVert A \rVert _{\mathrm {op}}\).
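A quick numerical sanity check of the inequality \(\lVert A \rVert _m \le \lVert A \rVert _{\mathrm {op}}\) (an illustrative sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8))
A = (A + A.T) / 2                                  # symmetric, as in the proof
row_max = np.sqrt((A**2).sum(axis=1)).max()        # ||A||_m: largest Euclidean row norm
op = np.linalg.norm(A, 2)                          # operator norm
```
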
Now, fix some vector \(z \in {\mathbb {R}}^n\) such that \(\sum _{i = 1}^n \min (\lvert z_i \rvert ^\alpha , z_i^2) \le p\). The condition also implies
where in the second step we used α ∈ [1, 2] to estimate \(\lvert z_i \rvert ^\alpha \mathbbm {1}_{\{\lvert z_i \rvert > 1\}} \ge \lvert z_i \rvert \mathbbm {1}_{\{\lvert z_i \rvert > 1\}}\). So, given any z and y satisfying the conditions of I2(p), we can write
So, this yields
As for \(\lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal {N}}\), we can use the decomposition z = z1 + z2, where \((z_1)_i = z_i \mathbbm {1}_{\{\lvert z_i \rvert > 1\}}\) and z2 = z − z1, and obtain
(in the braces, the conditions \(\lVert \cdot \rVert _\beta \le p^{1/\beta }\) have been replaced by \(\lVert \cdot \rVert _\beta \le 1\)). Clearly, since \(\lVert x_1 \rVert _\alpha \le 1\) implies \(\lVert x_1 \rVert _2 \le 1\) (and the same for y1), all of the norms can be upper bounded by \(\lVert A \rVert _{\mathrm {op}}\), i. e., we have
where the last inequality follows from p ≥ 2 and 1∕2 ≤ 1∕α ≤ 1 ≤ (α + 2)∕(2α) ≤ 2∕α.
Combining the estimates (2.6), (2.7), and (2.8) yields
To treat the diagonal terms, we use Corollary 6.1 in [12], as \(X_i^2\) are independent and satisfy \(\lVert X_i^2 \rVert _{\Psi _{\alpha /2}} \le K^2\), so that it yields
Now it is clear that \(\max _{i = 1,\ldots , n} \lvert a_{ii} \rvert \le \lVert A \rVert _{\mathrm {op}}\) and \(\sum _{i = 1}^n a_{ii}^2 \le \lVert A \rVert _{\mathrm {HS}}^2\). In particular,
The claim (2.5) now follows from Minkowski’s inequality. □
Finally, we prove Proposition 22.
Proof of Proposition 22
It suffices to prove (2.1) for matrices satisfying \(\lVert B \rVert _{\mathrm {HS}} = 1\), as otherwise we set \(\widetilde {B} = B \lVert B \rVert _{\mathrm {HS}}^{-1}\) and use the equality
Now let us apply Theorem 21 to the matrix A := BTB. An easy calculation shows that \(\mathrm {trace}(A) = \mathrm {trace}(B^T B) = \lVert B \rVert _{\mathrm {HS}}^2 = 1\), so that we have for any t ≥ 0
Here, the first step follows from \(\lvert z - 1 \rvert \le \min ( \lvert z^2 -1 \rvert , \lvert z^2 - 1 \rvert ^{1/2})\) for z ≥ 0. In the second step, we have used the estimates \(\lVert A \rVert _{\mathrm {HS}}^2 \le \lVert B \rVert _{\mathrm {op}}^2 \lVert B \rVert _{\mathrm {HS}}^2 = \lVert B \rVert _{\mathrm {op}}^2\) and \(\lVert A \rVert _{\mathrm {op}} \le \lVert B \rVert _{\mathrm {op}}^2\), as well as the fact that K ≥ Cα > 0 since \(\mathbb {E}X_i^2 = 1\) (cf., e. g., [12, Lemma A.2]). The last step follows from (2.4) and (2.3). Setting \(t = K^2s\lVert B \rVert _{\mathrm {op}}\) for s ≥ 0 finishes the proof of (2.1). Finally, (2.2) follows by taking m = n and B = I. □
3 Convex Concentration for Random Variables with Bounded Orlicz Norms
Assume X1, …, Xn are independent random variables each taking values in some bounded interval [a, b]. Then, by convex concentration as established in [19, 29, 38], for every convex 1-Lipschitz function \(f : [a,b]^n \to \mathbb {R}\),
for any t ≥ 0 (see, e. g., [36, Corollary 3]).
While convex concentration for bounded random variables is by now standard, there is less literature for unbounded random variables. In [31], a martingale-type approach is used, leading to a result for functionals with stochastically bounded increments. The special case of suprema of unbounded empirical processes was treated in [1, 28, 40]. Another branch of research, begun in [29] and continued, e. g., in [5, 13,14,15, 36, 37], is based on functional inequalities (such as Poincaré or log-Sobolev inequalities) restricted to convex functions and weak transport-entropy inequalities. In [20, Lemma 1.8], a generalization of (3.1) for sub-Gaussian random variables (α = 2) was proven, which we may extend to any order α ∈ (0, 2].
Proposition 31
Let X1, …, Xn be independent random variables, α ∈ (0, 2] and \(f : \mathbb {R}^n \to \mathbb {R}\) convex and 1-Lipschitz. Then, for any t ≥ 0,
In particular,
Note that the main results of the following two sections can be regarded as applications of Proposition 31. If f is only separately convex (i. e., convex in every coordinate with the other coordinates being fixed), it is still possible to prove a corresponding result for the upper tails. Indeed, it is no problem to modify the proof below accordingly, replacing (3.1) by [7, Theorem 6.10]. Moreover, note that \(\lVert \max _i |X_i| \rVert _{\Psi _\alpha }\) cannot be replaced by \(\max _i \lVert |X_i| \rVert _{\Psi _\alpha }\) (a counterexample for α = 2 is provided in [20]). In general, the Orlicz norm of \(\max _i\lvert X_i \rvert \) will be of order \((\log n)^{1/\alpha }\) (cf. Lemma 56).
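The \((\log n)^{1/\alpha }\) growth of the Orlicz norm of \(\max _i \lvert X_i \rvert \) is already visible at the level of expectations; e. g., for α = 1 and i.i.d. standard exponential variables, \(\mathbb {E}\max _i \lvert X_i \rvert \) is the n-th harmonic number ≈ log n. A small simulation (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(8)
alpha = 1.0                      # exponential tails: P(|X_i| >= t) = exp(-t)
ratios = []
for n in (100, 10_000):
    # Monte Carlo estimate of E max_i |X_i| over 500 repetitions
    m = rng.exponential(size=(500, n)).max(axis=1).mean()
    ratios.append(float(m / np.log(n) ** (1 / alpha)))
```

The ratios stay bounded (close to 1) as n grows, in line with the \((\log n)^{1/\alpha }\) scaling.
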
Proof of Proposition 31
Following the lines of the proof of [20, Lemma 3.5], the key step is a suitable truncation that goes back to [1]. Indeed, write
with \(M := 8 \mathbb {E}\max _i |X_i|\) (in particular, \(M \le C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\), cf. [12, Lemma A.2]), and let Y = (Y1, …, Yn), Z = (Z1, …, Zn). By the Lipschitz property of f,
and hence, it suffices to bound the terms in the last line.
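For orientation, the truncation step can be sketched numerically as follows (the display (3.3) is not reproduced above; we assume the standard choice \(Y_i := X_i \mathbbm {1}_{\{\lvert X_i \rvert \le M\}}\) and Z := X − Y, which matches the properties used below):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((10_000, 20))     # rows: independent copies of (X_1, ..., X_n)
M = 8 * np.abs(X).max(axis=1).mean()      # M := 8 * E max_i |X_i| (empirical proxy)
Y = np.where(np.abs(X) <= M, X, 0.0)      # truncated part, bounded by M a.s.
Z = X - Y                                 # remainder, supported on {|X_i| > M}
```
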
Applying (3.1) to Y and using (2.4) and (2.3), we obtain
Furthermore, below we will show that
Hence, for any t ≥ 0,
and by [12, Lemma A.2],
Temporarily writing \(K := C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\), where Cα is large enough so that (3.5), (3.7), and (3.8) hold, (3.4) and (3.8) yield
if t ≥ K. Using subadditivity and invoking (3.5) and (3.7), we obtain
where the last step holds for t ≥ K + δ for some δ > 0. This bound extends trivially to any t ≥ 0 (if necessary, by a suitable change of constants). Finally, the constant in front of the exponential may be adjusted to 2 by (2.3), which finishes the proof.
It remains to show (3.6). To this end, recall the Hoffmann-Jørgensen inequality (cf. [30, Theorem 6.8]) in the following form: if W1, …, Wn are independent random variables, Sk := W1 + … + Wk, and t ≥ 0 is such that \(\mathbb {P}(\max _k |S_k| > t) \le 1/8\), then
In our case, we set \(W_i := Z_i^2\), t = 0, and note that by Chebyshev’s inequality,
and consequently, recalling that \(S_k = Z_1^2 + \ldots + Z_k^2\),
Thus, together with [12, Lemma A.2], we obtain
Now it is easy to see that \(\lVert \max _i Z_i^2 \rVert _{\Psi _{\alpha /2}} \le \lVert \max _i |X_i| \rVert _{\Psi _\alpha }^2\), so that altogether we arrive at
Furthermore, by [30, Theorem 6.21], if W1, …, Wn are independent random variables with zero mean and α ∈ (0, 1],
In our case, we consider \(W_i = Z_i^2 - \mathbb {E} Z_i^2\) and α∕2 (instead of α). Together with the previous arguments (in particular, (3.9)) and [12, Lemma A.3], this yields
Combining this with [12, Lemma A.3] and (3.9), we arrive at (3.6). □
4 Uniform Tail Bounds for First- and Second-Order Chaos
In this section, we discuss bounds for the tails of the supremum of certain chaos-type classes of functions. Even if we are particularly interested in quadratic forms, i. e., uniform Hanson–Wright inequalities, let us first consider linear forms.
Let X1, …, Xn be independent random variables, let α ∈ (0, 2], and let \(\{a_{i,t} : i = 1, \ldots , n, t \in \mathcal {T}\}\) be a compact set of real numbers, where \(\mathcal {T}\) is some index set. Consider \(g(X) := \sup _{t \in \mathcal {T}} \sum _{i=1}^n a_{i,t}X_i\). Clearly, g is convex and has Lipschitz constant \(D := \sup _{t \in \mathcal {T}} (\sum _{i=1}^n a_{i,t}^2)^{1/2}\). Therefore, applying Proposition 31, we immediately obtain that for any t ≥ 0,
For bounded random variables, corresponding tail bounds can be found, e. g., in [32, Eq. (14)], and choosing α = 2, we get back this result up to constants.
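For a finite index set \(\mathcal {T}\), the supremum g(X), its convexity, and the Lipschitz constant D are straightforward to realize numerically. The following sketch (illustrative only, with Gaussian data) also checks the Lipschitz bound on a random perturbation:

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 10, 50
a = rng.standard_normal((T, n))                 # rows a_t = (a_{1,t}, ..., a_{n,t})
X = rng.standard_normal(n)
g = float(np.max(a @ X))                        # g(X) = sup_t sum_i a_{i,t} X_i
D = float(np.max(np.linalg.norm(a, axis=1)))    # Lipschitz constant of g
Xp = X + 0.01 * rng.standard_normal(n)          # perturbed point
gp = float(np.max(a @ Xp))                      # a max of linear maps is convex and D-Lipschitz
```
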
Our main aim is to derive a second-order analogue of (4.1), i. e., a uniform Hanson–Wright inequality. A pioneering result in this direction (for Rademacher variables) can be found in [39]. Later results include [2] (which requires the so-called concentration property), [22], [9], and [11] (certain classes of weakly dependent random variables). In [20], a uniform Hanson–Wright inequality for sub-Gaussian random variables was proven. We may show a similar result for random variables with bounded Orlicz norms of any order α ∈ (0, 2].
Theorem 41
Let X1, …, Xn be independent, centered random variables and \(K := \lVert \max _i \lvert X_i \rvert \rVert _{\Psi _\alpha }\), where α ∈ (0, 2]. Let \(\mathcal {A}\) be a compact set of real symmetric n × n matrices, and let \(f(X) := \sup _{A \in \mathcal {A}} (X^TAX - \mathbb {E}X^TAX)\). Then, for any t ≥ 0,
For α = 2, this gives back [20, Theorem 1.1] (up to constants and a different range of t). Comparing Theorem 41 to Theorem 21, we note that instead of a sub-Gaussian term, we obtain an α-subexponential term (which can be trivially transformed into a sub-Gaussian term for \(t \le \mathbb {E}\sup _{A\in \mathcal {A}}\lVert AX \rVert _2\), but this does not cover the complete α-subexponential regime). Moreover, Theorem 41 only gives a bound for the upper tails. Therefore, if \(\mathcal {A}\) just consists of a single matrix, Theorem 21 is stronger. These differences are due to technical reasons.
To prove Theorem 41, we shall follow the basic steps of [20] and modify those where the truncation comes in. Let us first repeat some tools and results. In the sequel, for a random vector W = (W1, …, Wn), we shall denote
where \(g : \mathbb {R}^{n \times n} \to \mathbb {R}\) is some function. Moreover, if A is any matrix, we denote by Diag(A) its diagonal part (regarded as a matrix with zero entries on its off-diagonal). The following lemma combines [20, Lemmas 3.2 & 3.5].
Lemma 42
-
(1)
Assume the vector W has independent components that satisfy Wi ≤ K a.s. Then, for any t ≥ 1, we have
$$\displaystyle \begin{aligned} f(W) - \mathbb {E}f(W) \le C\big(K(\mathbb {E}\sup_{A \in \mathcal{A}}\lVert AW \rVert _2 &+ \mathbb {E}\sup_{A \in \mathcal{A}} \lVert \mathrm{Diag}(A)W \rVert _2)\sqrt{t} \\ &+ K^2\sup_{A\in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}t\big) \end{aligned} $$with probability at least 1 − e−t.
-
(2)
Assuming the vector W has independent (but not necessarily bounded) components with mean zero, we have
$$\displaystyle \begin{aligned} \mathbb {E}\sup_{A\in\mathcal{A}}\lVert \mathrm{Diag}(A)W \rVert _2 \le C\mathbb {E}\sup_{A\in\mathcal{A}}\lVert AW \rVert _2. \end{aligned}$$
From now on, let X be the random vector from Theorem 41, and recall the truncated random vector Y that we introduced in (3.3) (and the corresponding “remainder” Z). Then, Lemma 42 (1) for f(Y ) with \(g(A) = \mathbb {E}X^TAX\) yields
with probability at least 1 − e−t (actually, (4.3) even holds with α = 2, but in the sequel we will have to use the weaker version given above anyway). Here we recall that \(M \le C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\).
To prove Theorem 41, it remains to replace the terms involving the truncated random vector Y by the original vector X. First, by Proposition 31 and since \(\sup _{A\in \mathcal {A}} \lVert AX \rVert _2\) is \(\sup _{A\in \mathcal {A}}\lVert A \rVert _{\mathrm {op}}\)-Lipschitz, we obtain
Moreover, by (3.8),
Next we estimate the difference between the expectations of f(X) and f(Y ).
Lemma 43
We have
Proof
First note that
The same holds if we reverse the roles of X and Y . As a consequence,
and thus, taking expectations and applying Hölder’s inequality,
We may estimate \((\mathbb {E}\lVert Z \rVert _2^2)^{1/2}\) using (3.9). Moreover, by arguments similar to those in (3.8), we get from (4.4) that
Arguing similarly and using (4.5), the same bound also holds for \((\mathbb {E}\sup _{A\in \mathcal {A}} \lVert AY \rVert _2^2)^{1/2}\). Taking roots and plugging everything into (4.7) completes the proof. □
Finally, we prove the central result of this section.
Proof of Theorem 41
First, it immediately follows from Lemma 43 that
Moreover, by (4.5) and Lemma 42 (2),
Finally, it follows from (4.6), (4.4), and (4.5) that
with probability at least 1 − 4e−t for all t ≥ 1. Using (3.7), it follows that
with probability at least 1 − 6e−t for all t ≥ 1. Combining (4.8), (4.9), and (4.10) and plugging into (4.3) thus yield that with probability at least 1 − 6e−t for all t ≥ 1,
If \(u \ge \max (a,b)\), it follows that
By standard means (a suitable change of constants, using (2.3)), this bound may be extended to any u ≥ 0, and the constant may be adjusted to 2. □
5 Random Tensors
By a simple random tensor, we mean a random tensor of the form

$$\displaystyle X := X_1 \otimes X_2 \otimes \cdots \otimes X_d, \qquad \qquad (5.1)$$

where all Xk are independent random vectors in \(\mathbb {R}^n\) whose coordinates are independent, centered random variables with variance one. Concentration results for random tensors (typically for polynomial-type functions) have been shown in [6, 12, 26], for instance.
Recently, in [42], new and interesting concentration bounds for simple random tensors were shown. In comparison to previous work, these inequalities focus on small values of t, e. g., a regime where sub-Gaussian tail decay holds. Moreover, in contrast to previous papers, [42] provides constants with optimal dependence on d. One of these results is the following convex concentration inequality: assuming that n and d are positive integers, \(f : \mathbb {R}^{n^d} \to \mathbb {R}\) is convex and 1-Lipschitz, and the Xij are bounded a.s., then for any t ∈ [0, 2nd∕2],
where C > 0 only depends on the bound of the coordinates. Using Theorem 21 and Proposition 31, we may extend this result to unbounded random variables as follows:
Theorem 51
Let \(n, d \in \mathbb {N}\) and \(f : \mathbb {R}^{n^d} \to \mathbb {R}\) be convex and 1-Lipschitz. Consider a simple random tensor X := X1 ⊗⋯ ⊗ Xd as in (5.1). Fix α ∈ [1, 2], and assume that \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\). Then, for any \(t \in [0, c_\alpha n^{d/2} (\log n)^{1/\alpha }/K]\),
On the other hand, if α ∈ (0, 1), then, for any \(t \in [0, c_\alpha n^{d/2} (\log n)^{1/\alpha }d^{1/\alpha -1/2}/K]\),
The logarithmic factor stems from the Orlicz norm of \(\max _i \lvert X_i \rvert \) in Proposition 31. For a slightly sharper version that includes the explicit dependence on these norms (and also gives back (5.2) for bounded random variables and α = 2), see (5.12) in the proof of Theorem 51. We believe that Theorem 51 is non-optimal for α < 1, as we would expect a bound of the same type as for α ∈ [1, 2]. However, a key difference in the proofs is that in the case of α ≥ 1, we can make use of moment-generating functions. This is clearly not possible if α < 1, so that less subtle estimates must be invoked instead.
For the proof of Theorem 51, we first adapt some preliminary steps and compile a number of auxiliary lemmas whose proofs are deferred to the appendix. As a start, we need some additional characterizations of α-subexponential random variables via the behavior of the moment-generating functions:
Proposition 52
Let X be a random variable and α ∈ (0, 2]. Then, the properties (1.1), (1.2), and (1.3) are equivalent to
for all 0 ≤ λ ≤ 1∕C4,α. If α ∈ [1, 2] and \(\mathbb {E}X = 0\), then the above properties are moreover equivalent to
The parameters Ci,α, i = 1, …, 5, can be chosen such that they only differ by constant α-dependent factors. In particular, we can take \(C_{i,\alpha } = c_{i,\alpha }\lVert X \rVert _{\Psi _\alpha }\).
To continue, note that \(\lVert X \rVert _2 = \prod _{i=1}^d \lVert X_i \rVert _2\). A key step in the proofs of [42] is a maximal inequality that simultaneously controls the tails of \(\prod _{i=1}^k \lVert X_i \rVert _2\), k = 1, …, d, where the Xi have independent sub-Gaussian components, i. e., α = 2. Generalizing these results to any order α ∈ (0, 2] is not hard. The following preparatory lemma extends [42, Lemma 3.1]. Note that in its proof (given in the appendix again), we apply Proposition 22.
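The identity \(\lVert X \rVert _2 = \prod _{i=1}^d \lVert X_i \rVert _2\) is elementary for tensors of the form (5.1) and easy to verify numerically (an illustrative sketch):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(7)
d, n = 3, 4
Xs = [rng.standard_normal(n) for _ in range(d)]
X = reduce(np.multiply.outer, Xs)                 # X_1 ⊗ X_2 ⊗ X_3 as an n^d array
lhs = float(np.linalg.norm(X.ravel()))            # Euclidean norm of the tensor
rhs = float(np.prod([np.linalg.norm(x) for x in Xs]))
```
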
Lemma 53
Let \(X_1, \ldots , X_d \in \mathbb {R}^n\) be independent random vectors with independent, centered coordinates such that \({\mathbb {E}} X_{i,j}^2 = 1\) and \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2]. Then, for any t ∈ [0, 2nd∕2],
To control all k = 1, …, d simultaneously, we need a generalized version of the maximal inequality [42, Lemma 3.2] that we state next.
Lemma 54
Let \(X_1, \ldots , X_d \in \mathbb {R}^n\) be independent random vectors with independent, centered coordinates such that \({\mathbb {E}} X_{i,j}^2 = 1\) and \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2]. Then, for any u ∈ [0, 2],
The following martingale-type bound is directly taken from [42]:
Lemma 55 ([42], Lemma 4.1)
Let X1, …, Xd be independent random vectors. For each k = 1, …, d, let fk = fk(Xk, …, Xd) be an integrable real-valued function and \(\mathcal {E}_k\) be an event that is uniquely determined by the vectors Xk, …, Xd. Let \(\mathcal {E}_{d+1}\) be the entire probability space. Suppose that for every k = 1, …, d, we have
for every realization of Xk+1, …, Xd in \(\mathcal {E}_{k+1}\). Then, for \(\mathcal {E} := \mathcal {E}_2 \cap \cdots \cap \mathcal {E}_d\), we have
Finally, we need a bound for the Orlicz norm of maxi|Xi|.
Lemma 56
Let X1, …, Xn be independent, centered random variables such that \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for any i and some α > 0. Then,
Here, we may choose \(C_\alpha = \max \{2^{1/\alpha -1}, 2^{1-1/\alpha }\}\).
Note that for α ≥ 1, [8, Proposition 4.3.1] provides a similar result. However, we are also interested in the case of α < 1 in the present note. The condition \(\mathbb {E}X_i = 0\) in Lemma 56 can easily be removed only at the expense of a different absolute constant.
We are now ready to prove Theorem 51.
Proof of Theorem 51
We shall adapt the arguments from [42]. First let
and let \(\mathcal {E}_{d+1}\) be the full space. It then follows from Lemma 54 for u = 1 that
where \(\mathcal {E} := \mathcal {E}_2 \cap \cdots \cap \mathcal {E}_d\).
Now fix any realization x2, …, xd of the random vectors X2, …, Xd in \(\mathcal {E}_2\), and apply Proposition 31 to the function f1(x1) given by x1 ↦ f(x1, …, xd). Clearly, f1 is convex, and since
we see that it is \(2n^{(d-1)/2}\)-Lipschitz. Hence, it follows from (3.2) that
for any x2, …, xd in \(\mathcal {E}_2\), where \(\mathbb {E}_{X_1}\) denotes taking the expectation with respect to X1 (which, by independence, is the same as conditionally on X2, …, Xd).
To continue, fix any realization x3, …, xd of the random vectors X3, …, Xd that satisfy \(\mathcal {E}_3\) and apply Proposition 31 to the function f2(x2) given by \(x_2 \mapsto \mathbb {E}_{X_1}f(X_1, x_2, \ldots , x_d)\). Again, f2 is a convex function, and since
f2 is \(2n^{(d-1)/2}\)-Lipschitz. Applying (3.2), we thus obtain
for any x3, …, xd in \(\mathcal {E}_3\). Iterating this procedure, we arrive at
for any realization xk+1, …, xd of Xk+1, …, Xd in \(\mathcal {E}_{k+1}\).
We now combine (5.8) for k = 1, …, d. To this end, we write
and apply Proposition 52. Here we have to distinguish between the cases where α ∈ [1, 2] and α ∈ (0, 1). If α ≥ 1, we use (5.4) to arrive at a bound for the moment-generating function. Writing \(M_k := \lVert \max _j |X_{k,j}| \rVert _{\Psi _\alpha }\), we obtain
for all xk+1, …, xd in \(\mathcal {E}_{k+1}\), where the first line holds if \(\lvert \lambda \rvert \le 1/(c_\alpha n^{(d-1)/2}M_k)\) and the second one if \(\lvert \lambda \rvert \ge 1/(c_\alpha n^{(d-1)/2}M_k)\) and α > 1. For simplicity of presentation, temporarily assume that \(c_\alpha n^{(d-1)/2} = 1\) (alternatively, replace \(M_k\) by \(c_\alpha n^{(d-1)/2}M_k\) in the following arguments) and that \(M_1 \le \ldots \le M_d\). Using Lemma 55, we obtain
for \(\lvert \lambda \rvert \in [1/M_{k+1}, 1/M_k]\), where we formally set \(M_0 := 0\) and \(M_{d+1} := \infty \). In particular, setting \(M := (M_1^2 + \ldots + M_d^2)^{1/2}\), we have
for all \(\lvert \lambda \rvert \le 1/M_d = 1/(\max _k M_k)\). Furthermore, for α > 1, it is not hard to see that
if \(\lvert \lambda \rvert \in [1/M_{k+1}, 1/M_k]\) for some k = 0, 1, …, d − 1 or \(\lvert \lambda \rvert \in [1/M, 1/M_d]\) for k = d. Indeed, by monotonicity (divide by \(\lambda ^2\) and compare the coefficients), it suffices to check this for \(\lambda = 1/M_{k+1}\), or \(\lambda = 1/M\) if k = d. The cases of k = 0 and k = d follow by simple calculations. In the general case, set \(x^2 = (M_1^2 + \ldots + M_{k+1}^2)/M_{k+1}^2\) and \(y^{\alpha /(\alpha -1)} = (M_{k+2}^{\alpha /(\alpha -1)} + \ldots + M_d^{\alpha /(\alpha -1)})/M_{k+1}^{\alpha /(\alpha -1)}\). Clearly, \((x^2 + y^{\alpha /(\alpha -1)})^{(\alpha -1)/\alpha } \le (x^2 + y^2)^{1/2}\) since x ≥ 1 and α∕(α − 1) ≥ 2. Moreover, \(y^2 \le (M_{k+2}^2 + \ldots + M_d^2)/M_{k+1}^2\), which proves the inequality. Altogether, inserting the factor \(c_\alpha n^{(d-1)/2}\) again, we therefore obtain
where the first line holds if \(\lvert \lambda \rvert \le 1/(c_\alpha n^{(d-1)/2}M)\) and the second one if \(\lvert \lambda \rvert \ge 1/(c_\alpha n^{(d-1)/2}M)\) and α > 1.
On the other hand, if α < 1, we use (5.3). Together with Lemma 55 and the subadditivity of |⋅|α for α ∈ (0, 1), this yields
for \(\lambda \in [0, 1/(c_\alpha n^{(d-1)/2}\max _k M_k)]\).
To finish the proof, first consider α ∈ [1, 2]. Then, for any λ > 0, we have
where the last step follows by standard arguments (similar to those in the proof of Proposition 52 given in the appendix), using (5.9) and (5.5). Now, assume that \(t \le c_\alpha n^{d/2}M/(K^2d^{1/2})\). Then, the right-hand side of (5.11) is dominated by the first term (possibly after adjusting constants), so that we arrive at
The same arguments hold if f is replaced by − f. Adjusting constants by (2.3), we obtain that for any \(t \in [0, c_\alpha n^{d/2}M/(K^2d^{1/2})]\),
Now it remains to note that by Lemma 56, we have
If α ∈ (0, 1), similarly to (5.11), using (5.10), (5.5) and Proposition 52,
where \(M_\alpha := (M_1^\alpha + \ldots + M_d^\alpha )^{1/\alpha }\). The rest follows as above. □
References
R. Adamczak, A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab. 13(34), 1000–1034 (2008)
R. Adamczak, A note on the Hanson-Wright inequality for random vectors with dependencies. Electron. Commun. Probab. 20(72), 13 (2015)
R. Adamczak, R. Latała, Tail and moment estimates for chaoses generated by symmetric random variables with logarithmically concave tails. Ann. Inst. Henri Poincaré Probab. Stat. 48(4), 1103–1136 (2012)
R. Adamczak, R. Latała, R. Meller, Hanson-Wright inequality in Banach spaces. Ann. Inst. Henri Poincaré Probab. Stat. 56(4), 2356–2376 (2020)
R. Adamczak, M. Strzelecki, On the convex Poincaré inequality and weak transportation inequalities. Bernoulli 25(1), 341–374 (2019)
R. Adamczak, P. Wolff, Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probab. Theory Relat. Fields 162(3–4), 531–586 (2015)
S. Boucheron, G. Lugosi, P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, 2013). With a foreword by Michel Ledoux
V.H. de la Peña, E. Giné, Decoupling. Probability and Its Applications (New York) (Springer, New York, 1999)
L.H. Dicker, M.A. Erdogdu, Flexible results for quadratic forms with applications to variance components estimation. Ann. Stat. 45(1), 386–414 (2017)
E.D. Gluskin, S. Kwapień, Tail and moment estimates for sums of independent random variables with logarithmically concave tails. Studia Math. 114(3), 303–309 (1995)
F. Götze, H. Sambale, A. Sinulis, Concentration inequalities for bounded functionals via log-Sobolev-type inequalities. J. Theor. Probab. 34(3), 1623–1652 (2021)
F. Götze, H. Sambale, A. Sinulis, Concentration inequalities for polynomials in α-sub-exponential random variables. Electron. J. Probab. 26(48), 22 (2021)
N. Gozlan, C. Roberto, P.-M. Samson, From dimension free concentration to the Poincaré inequality. Calc. Var. Partial Differ. Equ. 52(3–4), 899–925 (2015)
N. Gozlan, C. Roberto, P.-M. Samson, Y. Shu, P. Tetali, Characterization of a class of weak transport-entropy inequalities on the line. Ann. Inst. Henri Poincaré Probab. Stat. 54(3), 1667–1693 (2018)
N. Gozlan, C. Roberto, P.-M. Samson, P. Tetali, Kantorovich duality for general transport costs and applications. J. Funct. Anal. 273(11), 3327–3405 (2017)
D.L. Hanson, F.T. Wright, A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Stat. 42, 1079–1083 (1971)
P. Hitczenko, S.J. Montgomery-Smith, K. Oleszkiewicz, Moment inequalities for sums of certain independent symmetric random variables. Studia Math. 123(1), 15–42 (1997)
D. Hsu, S.M. Kakade, T. Zhang, A tail inequality for quadratic forms of sub-Gaussian random vectors. Electron. Commun. Probab. 17(52), 6 (2012)
W.B. Johnson, G. Schechtman, Remarks on Talagrand’s deviation inequality for Rademacher functions, in Functional Analysis (Austin, TX, 1987/1989). Lecture Notes in Mathematics, vol. 1470 (Springer, Berlin, 1991), pp. 72–77
Y. Klochkov, N. Zhivotovskiy, Uniform Hanson–Wright type concentration inequalities for unbounded entries via the entropy method. Electron. J. Probab. 25(22), 30 (2020)
K. Kolesko, R. Latała, Moment estimates for chaoses generated by symmetric random variables with logarithmically convex tails. Stat. Probab. Lett. 107, 210–214 (2015)
F. Krahmer, S. Mendelson, H. Rauhut, Suprema of chaos processes and the restricted isometry property. Commun. Pure Appl. Math. 67(11), 1877–1904 (2014)
A.K. Kuchibhotla, A. Chakrabortty, Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression. arXiv preprint, 2018
R. Latała, Tail and moment estimates for sums of independent random vectors with logarithmically concave tails. Studia Math. 118(3), 301–304 (1996)
R. Latała, Tail and moment estimates for some types of chaos. Studia Math. 135(1), 39–53 (1999)
R. Latała, Estimates of moments and tails of Gaussian chaoses. Ann. Probab. 34(6), 2315–2331 (2006)
R. Latała, R. Łochowski, Moment and tail estimates for multidimensional chaos generated by positive random variables with logarithmically concave tails, in Stochastic Inequalities and Applications. Progress in Probability, vol. 56 (Birkhäuser, Basel, 2003), pp. 77–92
J. Lederer, S. van de Geer, New concentration inequalities for suprema of empirical processes. Bernoulli 20(4), 2020–2038 (2014)
M. Ledoux, On Talagrand’s deviation inequalities for product measures. ESAIM Probab. Stat. 1, 63–87 (1995/1997)
M. Ledoux, M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 23 (Springer, Berlin, 1991)
A. Marchina, Concentration inequalities for separately convex functions. Bernoulli 24(4A), 2906–2933 (2018)
P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. (6) 9(2), 245–303 (2000)
M. Rudelson, R. Vershynin, Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18(82), 9 (2013)
H. Sambale, A. Sinulis, Logarithmic Sobolev inequalities for finite spin systems and applications. Bernoulli 26(3), 1863–1890 (2020)
H. Sambale, A. Sinulis, Modified log-Sobolev inequalities and two-level concentration. ALEA Lat. Am. J. Probab. Math. Stat. 18, 855–885 (2021)
P.-M. Samson, Concentration of measure inequalities for Markov chains and Φ-mixing processes. Ann. Probab. 28(1), 416–461 (2000)
P.-M. Samson, Concentration inequalities for convex functions on product spaces, in Stochastic Inequalities and Applications. Progress in Probability, vol. 56 (Birkhäuser, Basel, 2003), pp. 33–52
M. Talagrand, An isoperimetric theorem on the cube and the Khintchine-Kahane inequalities. Proc. Am. Math. Soc. 104(3), 905–909 (1988)
M. Talagrand, New concentration inequalities in product spaces. Invent. Math. 126(3), 505–563 (1996)
S. van de Geer, J. Lederer, The Bernstein-Orlicz norm and deviation inequalities. Probab. Theory Relat. Fields 157(1–2), 225–250 (2013)
R. Vershynin, High-Dimensional Probability. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47 (Cambridge University Press, Cambridge, 2018)
R. Vershynin, Concentration inequalities for random tensors. Bernoulli 26(4), 3139–3162 (2020)
V.H. Vu, K. Wang, Random weighted projections, random quadratic forms and random eigenvectors. Random Struct. Algorithm 47(4), 792–821 (2015)
Acknowledgements
This work was supported by the German Research Foundation (DFG) via CRC 1283 “Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications.” The author would moreover like to thank Arthur Sinulis for carefully reading this paper and many fruitful discussions and suggestions.
Appendix A
Proof of Proposition 5.2
The equivalence of (1.1), (1.2), (1.3), and (5.3) is easily seen by directly adapting the arguments from the proof of [41, Proposition 2.5.2]. To see that these properties imply (5.4), first note that since in particular \(\lVert X \rVert _{\Psi _1} < \infty \), the bound for \(|\lambda | \le 1/C_{5,\alpha }^{\prime }\) directly follows from [41, Proposition 2.7.1 (e)]. To see the bound for large values of |λ|, we infer that by the weighted arithmetic–geometric mean inequality (with weights α − 1 and 1),
for any y, z ≥ 0. Setting \(y := |\lambda|^{\alpha/(\alpha-1)}\) and \(z := |x|^\alpha\), we may conclude that
for any \(\lambda , x \in \mathbb {R}\). Consequently, using (5.3) and assuming \(C_{4,\alpha} = 1\), for any \(|\lambda| \ge 1\),
This yields (5.4) for \(|\lambda | \ge 1/C_{5,\alpha }^{\prime \prime }\). The claim now follows by taking \(C_{5,\alpha } := \max (C_{5,\alpha }^{\prime }, C_{5,\alpha }^{\prime \prime })\).
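The weighted arithmetic–geometric mean step above is nothing but Young's inequality; for the reader's convenience, the standard computation may be written out as follows:

```latex
% Weighted AM--GM with weights \alpha-1 and 1 (total weight \alpha), y, z \ge 0:
\[
  y^{\frac{\alpha-1}{\alpha}}\, z^{\frac{1}{\alpha}}
    \;\le\; \frac{(\alpha-1)\,y + z}{\alpha} .
\]
% Substituting y := |\lambda|^{\alpha/(\alpha-1)} and z := |x|^\alpha gives
% y^{(\alpha-1)/\alpha} = |\lambda| and z^{1/\alpha} = |x|, hence Young's inequality:
\[
  |\lambda x|
    \;\le\; \frac{\alpha-1}{\alpha}\,|\lambda|^{\alpha/(\alpha-1)}
      + \frac{1}{\alpha}\,|x|^{\alpha} .
\]
```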
Finally, starting with (5.4) and assuming \(C_{5,\alpha} = 1\), let us check (1.1). To this end, note that for any λ > 0,
Now choose \(\lambda := t/2\) if \(t \le 2\), \(\lambda := ((\alpha - 1)t/\alpha)^{\alpha-1}\) if \(t \ge \alpha/(\alpha - 1)\), and \(\lambda := 1\) if \(t \in (2, \alpha/(\alpha - 1))\). This yields
Now use (2.3), (2.4), and the fact that \(\exp (-(t-1)) \le \exp (- t^\alpha /C_\alpha ^\alpha )\) for any t ∈ (2, α∕(α − 1)). It follows that
for any t ≥ 0. The same argument for − X completes the proof. □
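The choice of λ in the regime \(t \ge \alpha/(\alpha-1)\) can be checked by the usual Chernoff computation. Assuming (5.4) takes the form \(\mathbb{E}\, e^{\lambda X} \le \exp(|\lambda|^{\alpha/(\alpha-1)})\) for \(|\lambda| \ge 1\) (with \(C_{5,\alpha} = 1\); a sketch under this assumption, not the author's exact display):

```latex
\[
  \mathbb{P}(X \ge t)
    \;\le\; e^{-\lambda t}\, \mathbb{E}\, e^{\lambda X}
    \;\le\; \exp\!\bigl(\lambda^{\alpha/(\alpha-1)} - \lambda t\bigr).
\]
% Minimizing the exponent: \frac{\alpha}{\alpha-1}\,\lambda^{1/(\alpha-1)} = t
% yields \lambda = ((\alpha-1)t/\alpha)^{\alpha-1}, which is \ge 1 for
% t \ge \alpha/(\alpha-1), and plugging back in gives
\[
  \lambda^{\alpha/(\alpha-1)} - \lambda t
    \;=\; -\,\frac{(\alpha-1)^{\alpha-1}}{\alpha^{\alpha}}\; t^{\alpha}.
\]
```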
Proof of Lemma 5.3
By the arithmetic–geometric mean inequality and since \(\mathbb {E}\lVert X_i \rVert _2 \le \sqrt {n}\), for any s ≥ 0,
Moreover, by (2.2) and [12, Corollary A.5],
for any i = 1, …, d. On the other hand, if Y1, …, Yd are independent centered random variables with \(\lVert Y_i \rVert _{\Psi _\alpha } \le M\), we have
Here, the first estimate follows from [10] (α > 1) and [17] (α ≤ 1), while the last step follows from (2.4). As a consequence, (A.1) can be bounded by \(2 \exp (-s^\alpha d^{\alpha /2} / (K^{2\alpha }C_\alpha ))\).
For \(u \in [0, 2]\) and \(s = u \sqrt {n}/(2d)\), we have \( (\sqrt {n} + s)^d \le n^{d/2}(1+u)\). Plugging in, we arrive at
Now set \(u := t/n^{d/2}\). □
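The elementary estimate \((\sqrt{n} + s)^d \le n^{d/2}(1+u)\) for \(s = u\sqrt{n}/(2d)\) follows from \((1 + u/(2d))^d \le e^{u/2} \le 1 + u\) on \(u \in [0, 2]\). A short numerical sanity check (illustrative only, not part of the proof):

```python
import math

def lhs(n: float, d: int, u: float) -> float:
    """Left-hand side (sqrt(n) + s)^d with s = u*sqrt(n)/(2d)."""
    s = u * math.sqrt(n) / (2 * d)
    return (math.sqrt(n) + s) ** d

def rhs(n: float, d: int, u: float) -> float:
    """Right-hand side n^(d/2) * (1 + u)."""
    return n ** (d / 2) * (1 + u)

# Check the inequality on a grid of parameters with u restricted to [0, 2].
for n in (1, 4, 100):
    for d in (1, 2, 8, 16):
        for u in (0.0, 0.5, 1.0, 1.5, 2.0):
            assert lhs(n, d, u) <= rhs(n, d, u) * (1 + 1e-12)
```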
Proof of Lemma 5.4
Let us first recall the partition into “binary sets” that appears in the proof of [42, Lemma 3.2]. Here we assume that \(d = 2^L\) for some \(L \in \mathbb {N}\) (if not, increase d). Then, for any \(\ell \in \{0, 1, \ldots, L\}\), we consider the partition \(\mathcal {I}_\ell \) of {1, …, d} into \(2^\ell\) successive (integer) intervals of length \(d_\ell := d/2^\ell\) that we call “binary intervals.” It is not hard to see that for any k = 1, …, d, we can partition [1, k] into binary intervals of different lengths such that this partition contains at most one interval of each family \(\mathcal {I}_\ell \).
Now it suffices to prove that
(cf. Step 3 of the proof of [42, Lemma 3.2], where the reduction to this case is explained in detail). To this end, for any \(\ell \in \{0, 1, \ldots, L\}\), any \(I \in \mathcal {I}_\ell \), and \(d_\ell := |I| = d/2^\ell\), we apply Lemma 5.3 for \(d_\ell\) and \(t := 2^{-\ell /4} n^{d_\ell /2}u\). This yields
Altogether, we arrive at
We may now assume that \((n^{1/2}u/(K^2d^{1/2}))^\alpha /C_\alpha \ge 1\) (otherwise, the bound in Lemma 5.4 becomes trivial after adjusting \(C_\alpha\)). Using the elementary inequality \(ab \ge (a + b)/2\) for all \(a, b \ge 1\), we arrive at
Using this in (A.2), we obtain the upper bound
By (2.3), we can assume \(c_\alpha = 2\). □
To prove Lemma 5.6, we first present a number of auxiliary lemmas and statements. In particular, recall that if \(\alpha \in (0, \infty)\), then for any \(x, y \in (0, \infty)\),
\[
c_\alpha (x^\alpha + y^\alpha) \le (x + y)^\alpha \le \widetilde{c}_\alpha (x^\alpha + y^\alpha), \tag{A.3}
\]
where \(c_\alpha := 2^{\alpha -1} \wedge 1\) and \(\widetilde {c}_\alpha := 2^{\alpha -1} \vee 1\). Indeed, if \(\alpha \le 1\), using the concavity of the function \(x \mapsto x^\alpha\), it follows by standard arguments that \(2^{\alpha-1}(x^\alpha + y^\alpha) \le (x + y)^\alpha \le x^\alpha + y^\alpha\). Likewise, for \(\alpha \ge 1\), using the convexity of \(x \mapsto x^\alpha\), we obtain \(x^\alpha + y^\alpha \le (x + y)^\alpha \le 2^{\alpha-1}(x^\alpha + y^\alpha)\).
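The two-sided bound (A.3) with constants \(c_\alpha = 2^{\alpha-1} \wedge 1\) and \(\widetilde{c}_\alpha = 2^{\alpha-1} \vee 1\) is easy to verify numerically; the following sketch checks it on a small grid (illustrative only):

```python
import itertools

def a3_holds(alpha: float, x: float, y: float, tol: float = 1e-12) -> bool:
    """Check c_alpha*(x^a + y^a) <= (x+y)^a <= C_alpha*(x^a + y^a)."""
    c = min(2 ** (alpha - 1), 1.0)   # c_alpha = 2^(alpha-1) ∧ 1
    C = max(2 ** (alpha - 1), 1.0)   # tilde c_alpha = 2^(alpha-1) ∨ 1
    s = x ** alpha + y ** alpha
    mid = (x + y) ** alpha
    return c * s <= mid * (1 + tol) and mid <= C * s * (1 + tol)

# Grid check over both the concave (alpha <= 1) and convex (alpha >= 1) regimes.
for alpha in (0.3, 0.5, 1.0, 1.5, 2.0):
    for x, y in itertools.product((0.1, 1.0, 3.0, 10.0), repeat=2):
        assert a3_holds(alpha, x, y)
```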
Lemma A.1
Let \(X_1, \ldots, X_n\) be independent, centered random variables such that \(\lVert X_i \rVert _{\Psi _\alpha } \le 1\) for some \(\alpha > 0\). Then, if \(Y := \max_i |X_i|\) and \(c := (c_\alpha ^{-1}\log n)^{1/\alpha }\), we have
with cα as in (A.3).
Proof
We have
where we have used (A.3) in the next-to-last step. □
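Assuming the standard tail bound \(\mathbb{P}(|X_i| \ge t) \le 2e^{-t^\alpha}\) implied by \(\lVert X_i \rVert_{\Psi_\alpha} \le 1\) (up to absolute constants), the union-bound computation behind Lemma A.1 may be sketched as:

```latex
\[
  \mathbb{P}(Y \ge t + c)
    \;\le\; \sum_{i=1}^{n} \mathbb{P}\bigl(|X_i| \ge t + c\bigr)
    \;\le\; 2n\, e^{-(t+c)^{\alpha}}
    \;\le\; 2n\, e^{-c_\alpha t^{\alpha} - c_\alpha c^{\alpha}}
    \;=\; 2\, e^{-c_\alpha t^{\alpha}},
\]
% using (A.3) in the next-to-last step and c_\alpha c^\alpha = \log n
% (by the choice c = (c_\alpha^{-1}\log n)^{1/\alpha}) in the last.
```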
Lemma A.2
Let Y ≥ 0 be a random variable that satisfies
for some c ≥ 0 and any t ≥ 0. Then,
with \(\widetilde {c}_\alpha \) as in (A.3).
Proof
By (A.3) and monotonicity, we have \(Y^\alpha \le \widetilde {c}_\alpha ((Y-c)_+^\alpha + c^\alpha )\), where \(x_+ := \max (x,0)\). Thus,
where we have set \(t := s\widetilde {c}_\alpha ^{-1/\alpha }\). Obviously, \(I_1 \le \sqrt {2}\) if \(t \ge c(1/\log \sqrt {2})^{1/\alpha }\). As for \(I_2\), we have
if \(t \ge ((\sqrt {2}+1)/(\sqrt {2}-1))^{1/\alpha }\). Therefore, \(I_1 I_2 \le 2\) if \(t \ge \max \{((\sqrt {2}+1)/(\sqrt {2}-1))^{1/\alpha }, c(2/\log 2)^{1/\alpha }\}\), which finishes the proof. □
With these lemmas at hand, the proof of Lemma 5.6 is easily completed.
Proof of Lemma 5.6
The random variables \(\hat {X}_i := X_i/K\) obviously satisfy the assumptions of Lemma A.1. Hence, setting \(Y := \max _i |\hat {X}_i| = K^{-1} \max _i |X_i|\),
Therefore, we may apply Lemma A.2 to \(\hat {Y} := c_\alpha ^{1/\alpha } K^{-1} \max _i |X_i|\). This yields
i. e., the claim of Lemma 5.6, where we have set \(C := (\widetilde {c}_\alpha c_\alpha ^{-1})^{1/\alpha }\). □
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sambale, H. (2023). Some Notes on Concentration for α-Subexponential Random Variables. In: Adamczak, R., Gozlan, N., Lounici, K., Madiman, M. (eds) High Dimensional Probability IX. Progress in Probability, vol 80. Birkhäuser, Cham. https://doi.org/10.1007/978-3-031-26979-0_7
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-031-26978-3
Online ISBN: 978-3-031-26979-0