1 Introduction

The aim of this note is to compile a number of smaller results that extend some classical as well as more recent concentration inequalities for bounded or sub-Gaussian random variables to random variables with heavier (but still exponential type) tails. In detail, we shall consider random variables X that satisfy

$$\displaystyle \begin{aligned} \mathbb {P}(|X-\mathbb {E}X| \ge t) \le 2 \exp(-t^\alpha/C_{1,\alpha}^\alpha) \end{aligned} $$
(1.1)

for any t ≥ 0, some α ∈ (0, 2], and a suitable constant \(C_{1,\alpha} > 0\). Such random variables are sometimes called α-subexponential (for α = 2, they are sub-Gaussian) or sub-Weibull(α) (cf. [23, Definition 2.2]).

There are several equivalent reformulations of (1.1), e. g., in terms of Lp norms:

$$\displaystyle \begin{aligned} \lVert X \rVert_{L^p} \le C_{2,\alpha} p^{1/\alpha} \end{aligned} $$
(1.2)

for any p ≥ 1. Another characterization is that these random variables have finite Orlicz norms of order α, i. e.,

$$\displaystyle \begin{aligned} C_{3,\alpha} := \lVert X \rVert_{\Psi_\alpha} := \inf \{t > 0 : \mathbb {E} \exp ((|X|/t)^\alpha) \le 2 \} < \infty. \end{aligned} $$
(1.3)

If α < 1, \(\lVert \cdot \rVert _{\Psi _\alpha }\) is actually a quasi-norm; however, many norm-like properties (such as a triangle-type inequality) can nevertheless be recovered up to α-dependent constants (see, e. g., [12, Appendix A]). In fact, \(C_{1,\alpha}\), \(C_{2,\alpha}\), and \(C_{3,\alpha}\) can be chosen such that they only differ by a constant α-dependent factor.
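To sketch one of these implications, (1.2) follows from (1.1) by integrating the tail bound: assuming for simplicity that X is centered, for any p ≥ 1,

$$\displaystyle \begin{aligned} \mathbb {E}|X|^p = \int_0^\infty p t^{p-1}\, \mathbb {P}(|X| \ge t)\, dt \le 2p \int_0^\infty t^{p-1} e^{-t^\alpha/C_{1,\alpha}^\alpha}\, dt = \frac{2p}{\alpha}\, C_{1,\alpha}^p\, \Gamma(p/\alpha), \end{aligned}$$

and since \(\Gamma(p/\alpha)^{1/p} \le C_\alpha p^{1/\alpha}\) and \((2p/\alpha)^{1/p}\) is bounded by an α-dependent constant, this yields a bound of the form (1.2) with \(C_{2,\alpha}\) proportional to \(C_{1,\alpha}\).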

Note that α-subexponential random variables have log-convex (if α ≤ 1) or log-concave (if α ≥ 1) tails, i. e., \(t \mapsto - \log \mathbb {P}(\lvert {X}\rvert \ge t)\) is convex or concave, respectively. For log-convex or log-concave measures, two-sided Lp norm estimates for polynomial chaos (and as a consequence, concentration bounds) have been established over the last 25 years. In the log-convex case, results of this type have been derived for linear forms in [17] and for forms of any order in [12, 21]. For log-concave measures, starting with linear forms again in [10], important contributions have been made in [3, 24, 25, 27].

In this note, we mainly present four different results for functions of α-subexponential random variables: a Hanson–Wright-type inequality in Sect. 2, a version of the convex concentration inequality in Sect. 3, a uniform Hanson–Wright inequality in Sect. 4, and finally a convex concentration inequality for simple random tensors in Sect. 5. These results are partly based on and generalize recent research, e. g., [20] and [42]. In fact, they partially build upon each other: for instance, in the proofs of Sect. 5, we apply results both from Sects. 2 and 3. A more detailed discussion is provided in each of the sections.

Finally, let us introduce some conventions that we will use in this chapter.

Notations.

If X1, …, Xn is a sequence of random variables, we denote by X = (X1, …, Xn) the corresponding random vector. Moreover, we shall need the following types of norms throughout the paper:

  • The norms \(\lVert x \rVert _p := (\sum _{i=1}^n\lvert {x_i}\rvert ^p)^{1/p}\) for \(x \in \mathbb {R}^n\)

  • The Lp norms \(\lVert X \rVert _{L^p} := (\mathbb {E}|X|{ }^p)^{1/p}\) for random variables X (cf. (1.2))

  • The Orlicz (quasi-)norms \(\lVert X \rVert _{\Psi _\alpha }\) as introduced in (1.3)

  • The Hilbert–Schmidt and operator norms \(\lVert A \rVert _{\mathrm {HS}} := (\sum _{i,j} a_{ij}^2)^{1/2}\), \(\lVert A \rVert _{\mathrm {op}} := \sup \{ \lVert Ax \rVert _2 : \lVert x \rVert _2 = 1\}\) for matrices A = (aij)

The constants appearing in this chapter (typically denoted C or c) may vary from line to line. Without subscript, they are assumed to be absolute, and if they depend on α (only), we shall write Cα or cα.

2 A Generalized Hanson–Wright Inequality

Arguably, the most famous concentration result for quadratic forms is the Hanson–Wright inequality, which first appeared in [16]. We may state it as follows: assuming X1, …, Xn are centered, independent random variables satisfying \(\lVert X_i \rVert _{\Psi _2} \le K\) for any i and A = (aij) is a symmetric matrix, we have for any t ≥ 0

$$\displaystyle \begin{aligned} {\mathbb {P}}\big(\lvert{X^TAX - {\mathbb {E}} X^TAX}\rvert \ge t\big) \le 2 \exp\Big( - \frac{1}{C} \min\Big( \frac{t^2}{K^4 \lVert A \rVert _{\mathrm{HS}}^2}, \frac{t}{K^2 \lVert A \rVert _{\mathrm{op}}} \Big)\Big). \end{aligned}$$

For a modern proof, see [33], and for various developments, cf. [2, 4, 18, 43].
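As a quick illustration, choose \(A = n^{-1} I\) with I the n × n identity matrix, so that \(X^TAX = \frac 1n \sum_{i=1}^n X_i^2\), \(\lVert A \rVert _{\mathrm{HS}} = n^{-1/2}\), and \(\lVert A \rVert _{\mathrm{op}} = n^{-1}\). The Hanson–Wright inequality then reduces to a Bernstein-type bound for the average of the \(X_i^2\):

$$\displaystyle \begin{aligned} {\mathbb {P}}\Big(\Big\lvert \frac 1n \sum_{i=1}^n (X_i^2 - {\mathbb {E}} X_i^2) \Big\rvert \ge t\Big) \le 2 \exp\Big(-\frac{n}{C} \min\Big(\frac{t^2}{K^4}, \frac{t}{K^2}\Big)\Big). \end{aligned}$$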

In this note, we provide an extension of the Hanson–Wright inequality to random variables with bounded Orlicz norms of any order α ∈ (0, 2]. This complements the results in [12], where the case of α ∈ (0, 1] was considered, while for α = 2, we get back the actual Hanson–Wright inequality.

Theorem 21

For any α ∈ (0, 2], let X1, …, Xn be independent, centered random variables such that \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for any i and A = (aij) be a symmetric matrix. Then, for any t ≥ 0,

$$\displaystyle \begin{aligned} {\mathbb {P}}\big(\lvert{X^TAX - {\mathbb {E}} X^TAX}\rvert \ge t\big) \le 2 \exp\Big( - \frac{1}{C_\alpha} \min\Big( \frac{t^2}{K^4 \lVert A \rVert _{\mathrm{HS}}^2}, \Big( \frac{t}{K^2 \lVert A \rVert _{\mathrm{op}}}\Big)^{\frac \alpha 2} \Big)\Big). \end{aligned}$$

Theorem 21 generalizes and implies a number of inequalities for quadratic forms in α-subexponential random variables (in particular for α = 1) that are spread throughout the literature. For a detailed discussion, see [12, Remark 1.7]. Note that it is possible to sharpen the tail estimate given by Theorem 21, cf., e. g., [12, Corollary 1.4] for α ∈ (0, 1] or [3, Theorem 3.2] for α ∈ [1, 2] (in fact, the proof of Theorem 21 works by evaluating the family of norms used therein). The main benefit of Theorem 21 is that it uses norms that are easily calculable and in many situations already sufficient for applications.
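For instance, for the same choice \(A = n^{-1}I\) as in the example above, Theorem 21 yields

$$\displaystyle \begin{aligned} {\mathbb {P}}\Big(\Big\lvert \frac 1n \sum_{i=1}^n (X_i^2 - {\mathbb {E}} X_i^2) \Big\rvert \ge t\Big) \le 2 \exp\Big(-\frac{1}{C_\alpha} \min\Big(\frac{n t^2}{K^4}, \Big(\frac{n t}{K^2}\Big)^{\alpha/2}\Big)\Big), \end{aligned}$$

i. e., a sub-Gaussian bound for small t and a tail of order \(\exp(-(nt/K^2)^{\alpha/2}/C_\alpha)\) for large t, in accordance with the fact that the \(X_i^2\) are α∕2-subexponential.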

Before we give the proof of Theorem 21, let us briefly mention that for the standard Hanson–Wright inequality, a number of selected applications can be found in [33]. Some of them were generalized to α-subexponential random variables with α ≤ 1 in [12], and it is no problem to extend these proofs to any order α ∈ (0, 2] using Theorem 21. Here, we just focus on a single example that yields a concentration result for the Euclidean norm of a linear transformation of a vector X having independent components with bounded Orlicz norms around the Hilbert–Schmidt norm of the transformation matrix. This is a variant and extension of [12, Proposition 2.1] and will be applied in Sect. 5.

Proposition 22

Let X1, …, Xn be independent, centered random variables such that \({\mathbb {E}} X_i^2 = 1\) and \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2] and let B ≠ 0 be an m × n matrix. For any t ≥ 0, we have

$$\displaystyle \begin{aligned} {\mathbb {P}}(\lvert{\lVert BX \rVert _2 - \lVert B \rVert _{\mathrm{HS}}}\rvert \ge t K^2 \lVert B \rVert _{\mathrm{op}}) \le 2\exp(- t^\alpha/C_\alpha ). \end{aligned} $$
(2.1)

In particular, for any t ≥ 0, it holds

$$\displaystyle \begin{aligned} {\mathbb {P}}( \lvert{\lVert X \rVert _2 - \sqrt{n}}\rvert \ge tK^2) \le 2\exp(- t^\alpha/C_\alpha). \end{aligned} $$
(2.2)
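For instance, if B = P is the orthogonal projection onto a k-dimensional subspace of \(\mathbb {R}^n\), then \(\lVert P \rVert _{\mathrm {HS}} = \sqrt {k}\) and \(\lVert P \rVert _{\mathrm {op}} = 1\), and (2.1) states that the Euclidean norm of the projected vector concentrates around \(\sqrt {k}\):

$$\displaystyle \begin{aligned} {\mathbb {P}}(\lvert \lVert PX \rVert _2 - \sqrt{k} \rvert \ge t K^2) \le 2\exp(- t^\alpha/C_\alpha). \end{aligned}$$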

For the proofs, let us recall some elementary relations that we will use throughout the paper to adjust the constants in the tail bounds we derive.

Adjusting constants.

For any two constants C1 > C2 > 1, we have for all r ≥ 0 and C > 0

$$\displaystyle \begin{aligned} C_1 \exp(-r/C) \le C_2 \exp\Big(-\frac{\log(C_2)}{C\log(C_1)} r\Big) \end{aligned} $$
(2.3)

whenever the left-hand side is smaller than or equal to 1 (cf., e. g., [35, Eq. (3.1)]). Moreover, for any α ∈ (0, 2), any C > 0, and all t ≥ 0, we may always estimate

$$\displaystyle \begin{aligned} \exp(-(t/C)^2) \le 2 \exp(-(t/C')^\alpha), \end{aligned} $$
(2.4)

using \(\exp (-s^2) \le \exp (1-s^\alpha )\) for any s > 0 and (2.3). More precisely, we may choose \(C' := C/\log ^{1/\alpha }(2)\). Note that strictly speaking, the range t ≤ C is not covered by (2.3); however, in this case (in particular, choosing C′ as suggested), both sides of (2.4) are at least 1 anyway, so that the right-hand side still provides a valid upper bound for any probability.
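Spelled out, the chain of estimates behind (2.4) reads

$$\displaystyle \begin{aligned} \exp(-(t/C)^2) \le e \exp(-(t/C)^\alpha) \le 2 \exp(-\log(2)(t/C)^\alpha) = 2\exp(-(t/C')^\alpha), \end{aligned}$$

where the first step uses \(s^\alpha \le 1 + s^2\) for \(s = t/C\), and the second step is (2.3) with \(C_1 = e\) and \(C_2 = 2\) (on the range where the left-hand side is at most 1).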

Let us now turn to the proof of Theorem 21. In what follows, we actually show that for any p ≥ 2,

$$\displaystyle \begin{aligned} \lVert X^TAX - {\mathbb {E}} X^TAX \rVert _{L^p} \le C_\alpha K^2 \big(p^{1/2} \lVert A \rVert _{\mathrm{HS}} + p^{2/\alpha} \lVert A \rVert _{\mathrm{op}} \big). \end{aligned} $$
(2.5)

From here, Theorem 21 follows by standard means (cf. [34, Proof of Theorem 3.6]). Moreover, we may restrict ourselves to α ∈ (1, 2], since the case of α ∈ (0, 1] has been proven in [12].

Proof of Theorem 21

First we shall treat the off-diagonal part of the quadratic form. Let \(w^{(1)}_i, w^{(2)}_i\) be independent (of each other as well as of the Xi) symmetrized Weibull random variables with scale 1 and shape α, i. e., \(w^{(j)}_i\) are symmetric random variables with \(\mathbb {P}(\lvert {w^{(j)}_i}\rvert \ge t) = \exp (-t^\alpha )\). In particular, the \(w^{(j)}_i\) have logarithmically concave tails.

Using standard decoupling and symmetrization arguments (cf. [8, Theorem 3.1.1 & Lemma 1.2.6]) as well as [3, Theorem 3.2] in the second inequality, for any p ≥ 2, it holds

$$\displaystyle \begin{aligned} \lVert \sum_{i \neq j} a_{ij} X_i X_j \rVert _{L^p} &\le C_\alpha K^2 \lVert \sum_{i \neq j} a_{ij} w^{(1)}_i w^{(2)}_j \rVert _{L^p} \\ &\le C_\alpha K^2(\lVert A \rVert _{\{1,2\},p}^{\mathcal{N}} + \lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal{N}}), \end{aligned} $$
(2.6)

where the norms \(\lVert A \rVert _{\mathcal {J},p}^{\mathcal {N}}\) are defined as in [3]. Instead of repeating the general definitions, we will only focus on the case we need in our situation. Indeed, for the symmetric Weibull distribution with parameter α, we have (again, in the notation of [3]) N(t) = tα, and so for α ∈ (1, 2], it follows that \(\hat {N}(t) = \min (t^2, \lvert {t}\rvert ^\alpha )\). Hence, the norms can be written as follows:

$$\displaystyle \begin{aligned} \lVert A \rVert _{\{1,2\},p}^{\mathcal{N}} &= 2 \sup\big\lbrace \sum_{i,j} a_{ij} x_{ij} : \sum_{i = 1}^n \min\big(\sum_{j} x_{ij}^2, \big(\sum_{j} x_{ij}^2\big)^{\alpha/2}\big) \le p \big\rbrace,\\ \lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal{N}} &= \sup\big\lbrace \sum_{i,j} a_{ij} x_i y_j : \sum_{i = 1}^n \min(x_i^2, \lvert x_i \rvert ^{\alpha}) \\ &\le p, \sum_{j = 1}^n \min(y_j^2, \lvert y_j \rvert ^{\alpha}) \le p \big\rbrace. \end{aligned} $$

Before continuing with the proof, we next introduce a lemma that will help to rewrite the norms in a more tractable form. □

Lemma 23

For any p ≥ 2, define

$$\displaystyle \begin{aligned} I_1(p) &:= \big\lbrace x = (x_{ij}) \in {\mathbb {R}}^{n \times n} : \sum_{i = 1}^n \min\big( \big( \sum_{j = 1}^n x_{ij}^2\big)^{\alpha/2}, \sum_{j = 1}^n x_{ij}^2 \big) \le p \big\rbrace,\\ I_2(p) &:= \big\lbrace x_{ij} = z_i y_{ij} \in {\mathbb {R}}^{n \times n} : \sum_{i = 1}^n \min(\lvert z_i \rvert ^\alpha, z_i^2) \le p, \max_{i = 1,\ldots,n} \sum_{j = 1}^n y_{ij}^2 \le 1 \big\rbrace. \end{aligned} $$

Then I1(p) = I2(p).

Proof

The inclusion I1(p) ⊇ I2(p) is an easy calculation, and the inclusion I1(p) ⊆ I2(p) follows by defining \(z_i = \lVert (x_{ij})_j \rVert \) and \(y_{ij} = x_{ij} / \lVert (x_{ij})_j \rVert \) (or 0, if the norm is zero). □

Proof of Theorem 21, continued

For brevity, for any matrix A = (aij), let us write \(\lVert A \rVert _m := \max _{i = 1,\ldots ,n} ( \sum _{j = 1}^n a_{ij}^2 )^{1/2}\). Note that clearly, \(\lVert A \rVert _m \le \lVert A \rVert _{\mathrm {op}}\).

Now, fix some vector \(z \in {\mathbb {R}}^n\) such that \(\sum _{i = 1}^n \min (\lvert z_i \rvert ^\alpha , z_i^2) \le p\). The condition also implies

$$\displaystyle \begin{aligned} p \ge \sum_{i = 1}^n \lvert z_i \rvert ^\alpha \mathbbm{1}_{\{\lvert z_i \rvert > 1\}} + \sum_{i= 1}^n z_i^2 \mathbbm{1}_{\{\lvert z_i \rvert \le 1\}} \ge \max\Big( \sum_{i=1}^n z_i^2 \mathbbm{1}_{\{\lvert z_i \rvert \le 1\}}, \sum_{i = 1}^n \lvert z_i \rvert \mathbbm{1}_{\{\lvert z_i \rvert > 1\}} \Big), \end{aligned}$$

where in the second step we used α ∈ [1, 2] to estimate \(\lvert z_i \rvert ^\alpha \mathbbm{1}_{\{\lvert z_i \rvert > 1\}} \ge \lvert z_i \rvert \mathbbm{1}_{\{\lvert z_i \rvert > 1\}}\). So, given any z and y satisfying the conditions of I2(p), we can write

$$\displaystyle \begin{aligned} \lvert \sum_{i,j} a_{ij} z_i y_{ij} \rvert &\le \sum_{i = 1}^n \lvert z_i \rvert \big( \sum_{j = 1}^n a_{ij}^2 \big)^{1/2} \big( \sum_{j = 1}^n y_{ij}^2 \big)^{1/2} \le \sum_{i = 1}^n \lvert z_i \rvert \big( \sum_{j = 1}^n a_{ij}^2 \big)^{1/2} \\ &\le \sum_{i = 1}^n \lvert z_i \rvert \mathbbm{1}_{\{\lvert z_i \rvert \le 1\}} \big( \sum_{j = 1}^n a_{ij}^2 \big)^{1/2} + \sum_{i = 1}^n \lvert z_i \rvert \mathbbm{1}_{\{\lvert z_i \rvert > 1\}} \big( \sum_{j =1 }^n a_{ij}^2 \big)^{1/2} \\ &\le \lVert A \rVert _{\mathrm{HS}} \big(\sum_{i = 1}^n z_i^2 \mathbbm{1}_{\{\lvert z_i \rvert \le 1\}}\big)^{1/2} + \lVert A \rVert _m \sum_{i = 1}^n \lvert z_i \rvert \mathbbm{1}_{\{\lvert z_i \rvert > 1\}}. \end{aligned} $$

So, this yields

$$\displaystyle \begin{aligned} \lVert A \rVert _{\{1,2\},p}^{\mathcal{N}} \le 2p^{1/2} \lVert A \rVert _{\mathrm{HS}} + 2p \lVert A \rVert _m \le 2p^{1/2} \lVert A \rVert _{\mathrm{HS}} + 2p \lVert A \rVert _{\mathrm{op}}. \end{aligned} $$
(2.7)

As for \(\lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal {N}}\), we can use the decomposition z = z1 + z2, where \((z_1)_i = z_i \mathbbm{1}_{\{\lvert z_i \rvert > 1\}}\) and z2 = z − z1, and obtain

$$\displaystyle \begin{aligned} \lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal{N}} &\le \sup\big\lbrace \sum_{ij} a_{ij} (x_1)_i (y_1)_j : \lVert x_1 \rVert _\alpha \le p^{1/\alpha}, \lVert y_1 \rVert _\alpha \le p^{1/\alpha} \big\rbrace \\ &+ 2 \sup\big\lbrace \sum_{ij} a_{ij} (x_1)_i (y_2)_j : \lVert x_1 \rVert _\alpha \le p^{1/\alpha}, \lVert y_2 \rVert _2 \le p^{1/2} \big\rbrace \\ &+ \sup\big\lbrace\sum_{ij} a_{ij} (x_2)_i (y_2)_j : \lVert x_2 \rVert _2 \le p^{1/2}, \lVert y_2 \rVert _2 \le p^{1/2} \big\rbrace \\ &= p^{2/\alpha} \sup\lbrace \ldots \rbrace + 2p^{1/\alpha + 1/2} \sup\{\ldots\} + p \lVert A \rVert _{\mathrm{op}} \end{aligned} $$

(in the braces, the conditions \(\lVert \cdot \rVert _\beta \le p^{1/\beta }\) have been replaced by \(\lVert \cdot \rVert _\beta \le 1\)). Clearly, since \(\lVert x_1 \rVert _\alpha \le 1\) implies \(\lVert x_1 \rVert _2 \le 1\) (and the same for y1), all of the norms can be upper bounded by \(\lVert A \rVert _{\mathrm {op}}\), i. e., we have

$$\displaystyle \begin{aligned} \lVert A \rVert _{\{\{1\},\{2\}\}, p}^{\mathcal{N}} \le (p^{2/\alpha} + 2p^{1/\alpha + 1/2} + p) \lVert A \rVert _{\mathrm{op}} \le 4p^{2/\alpha} \lVert A \rVert _{\mathrm{op}}, \end{aligned} $$
(2.8)

where the last inequality follows from p ≥ 2 and 1∕2 ≤ 1∕α ≤ 1 ≤ (α + 2)∕(2α) ≤ 2∕α.

Combining the estimates (2.6), (2.7), and (2.8) yields

$$\displaystyle \begin{aligned} \lVert \sum_{i,j} a_{ij} X_i X_j \rVert _{L^p} \le C_\alpha K^2\big( 2p^{1/2} \lVert A \rVert _{\mathrm{HS}} + 6p^{2/\alpha} \lVert A \rVert _{\mathrm{op}} \big). \end{aligned} $$

To treat the diagonal terms, note that the \(X_i^2\) are independent and satisfy \(\lVert X_i^2 \rVert _{\Psi _{\alpha /2}} \le K^2\), so that Corollary 6.1 in [12] yields

$$\displaystyle \begin{aligned} {\mathbb {P}}\big( \lvert \sum_{i = 1}^n a_{ii} (X_i^2 - {\mathbb {E}} X_i^2) \rvert \ge t\big) \le 2 \exp&\Big( - \frac{1}{C_\alpha K^2} \min\Big( \frac{t^2}{\sum_{i =1}^n a_{ii}^2},\\ &\Big( \frac{t}{\max_{i = 1,\ldots,n} \lvert a_{ii} \rvert }\Big)^{\alpha/2} \Big)\Big). \end{aligned} $$

Now it is clear that \(\max _{i = 1,\ldots , n} \lvert a_{ii} \rvert \le \lVert A \rVert _{\mathrm {op}}\) and \(\sum _{i = 1}^n a_{ii}^2 \le \lVert A \rVert _{\mathrm {HS}}^2\). In particular,

$$\displaystyle \begin{aligned} \lVert \sum_{i = 1}^n a_{ii} (X_i^2 - {\mathbb {E}} X_i^2) \rVert _{L^p} \le C_\alpha K^2(p^{1/2} \lVert A \rVert _{\mathrm{HS}} + p^{2/\alpha} \lVert A \rVert _{\mathrm{op}}). \end{aligned}$$

The claim (2.5) now follows from Minkowski’s inequality. □

Finally, we prove Proposition 22.

Proof of Proposition 22

It suffices to prove (2.1) for matrices satisfying \(\lVert B \rVert _{\mathrm {HS}} = 1\), as otherwise we set \(\widetilde {B} = B \lVert B \rVert _{\mathrm {HS}}^{-1}\) and use the equality

$$\displaystyle \begin{aligned} \{ \lvert \lVert BX \rVert _2 - \lVert B \rVert _{\mathrm{HS}} \rvert \ge \lVert B \rVert _{\mathrm{op}} t \} = \{ \lvert \lVert \widetilde{B}X \rVert _2 - 1 \rvert \ge \lVert \widetilde{B} \rVert _{\mathrm{op}} t \}. \end{aligned}$$

Now let us apply Theorem 21 to the matrix A := BTB. An easy calculation shows that \(\mathrm {trace}(A) = \mathrm {trace}(B^T B) = \lVert B \rVert _{\mathrm {HS}}^2 = 1\), so that we have for any t ≥ 0

$$\displaystyle \begin{aligned} {\mathbb {P}}\big( \lvert \lVert BX \rVert _2 - 1 \rvert \ge t \big) &\le {\mathbb {P}}\big( \lvert \lVert BX \rVert _2^2 - 1 \rvert \ge \max(t, t^2) \big) \\ &\le 2\exp\Big( - \frac{1}{C_\alpha} \min\Big( \frac{\max(t,t^2)^2}{K^4\lVert B \rVert _{\mathrm{op}}^2}, \Big( \frac{\max(t,t^2)}{K^4 \lVert B \rVert _{\mathrm{op}}^2} \Big)^{\alpha/2} \Big) \Big) \\ &\le 2\exp\Big( - \frac{1}{C_\alpha} \min\Big( \frac{t^2}{K^4 \lVert B \rVert _{\mathrm{op}}^2}, \Big( \frac{t^2}{K^4 \lVert B \rVert _{\mathrm{op}}^2} \Big)^{\alpha/2} \Big) \Big) \\ &\le 2\exp\Big( - \frac{1}{C_\alpha} \Big( \frac{t}{K^2 \lVert B \rVert _{\mathrm{op}}} \Big)^{\alpha} \Big). \end{aligned} $$

Here, the first step follows from \(\lvert z - 1 \rvert \le \min ( \lvert z^2 -1 \rvert , \lvert z^2 - 1 \rvert ^{1/2})\) for z ≥ 0. In the second step, we have used the estimates \(\lVert A \rVert _{\mathrm {HS}}^2 \le \lVert B \rVert _{\mathrm {op}}^2 \lVert B \rVert _{\mathrm {HS}}^2 = \lVert B \rVert _{\mathrm {op}}^2\) and \(\lVert A \rVert _{\mathrm {op}} \le \lVert B \rVert _{\mathrm {op}}^2\) as well as the fact that \(K \ge c_\alpha > 0\) since \(\mathbb {E}X_i^2 = 1\) (cf., e. g., [12, Lemma A.2]), while the last step follows from (2.4) and (2.3). Setting \(t = K^2s\lVert B \rVert _{\mathrm {op}}\) for s ≥ 0 finishes the proof of (2.1). Finally, (2.2) follows by taking m = n and B = I. □

3 Convex Concentration for Random Variables with Bounded Orlicz Norms

Assume X1, …, Xn are independent random variables each taking values in some bounded interval [a, b]. Then, by convex concentration as established in [19, 29, 38], for every convex 1-Lipschitz function \(f : [a,b]^n \to \mathbb {R}\),

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp\Big(-\frac{t^2}{2(b-a)^2}\Big) \end{aligned} $$
(3.1)

for any t ≥ 0 (see, e. g., [36, Corollary 3]).
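A typical example is the Euclidean norm \(f(x) = \lVert x \rVert _2\), which is convex and 1-Lipschitz, so that (3.1) yields

$$\displaystyle \begin{aligned} \mathbb {P}(|\lVert X \rVert_2 - \mathbb {E}\lVert X \rVert_2| > t) \le 2 \exp\Big(-\frac{t^2}{2(b-a)^2}\Big) \end{aligned}$$

for independent random variables X1, …, Xn with values in [a, b] (compare (2.2) for the unbounded case).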

While convex concentration for bounded random variables is by now standard, there is less literature for unbounded random variables. In [31], a martingale-type approach is used, leading to a result for functionals with stochastically bounded increments. The special case of suprema of unbounded empirical processes was treated in [1, 28, 40]. Another branch of research, begun in [29] and continued, e. g., in [5, 13,14,15, 36, 37], is based on functional inequalities (such as Poincaré or log-Sobolev inequalities) restricted to convex functions and weak transport-entropy inequalities. In [20, Lemma 1.8], a generalization of (3.1) for sub-Gaussian random variables (α = 2) was proven, which we may extend to any order α ∈ (0, 2].

Proposition 31

Let X1, …, Xn be independent random variables, α ∈ (0, 2] and \(f : \mathbb {R}^n \to \mathbb {R}\) convex and 1-Lipschitz. Then, for any t ≥ 0,

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp\Big(-\frac{t^\alpha}{C_\alpha\lVert \max_i |X_i| \rVert_{\Psi_\alpha}^\alpha}\Big). \end{aligned}$$

In particular,

$$\displaystyle \begin{aligned} \lVert f(X) - \mathbb {E}f(X) \rVert_{\Psi_\alpha} \le C_\alpha \lVert \max_i |X_i| \rVert_{\Psi_\alpha}. \end{aligned} $$
(3.2)

Note that the main results of the following two sections can be regarded as applications of Proposition 31. If f is separately convex only (i. e., convex in every coordinate with the other coordinates being fixed), it is still possible to prove a corresponding result for the upper tails. Indeed, it is no problem to modify the proof below accordingly, replacing (3.1) by [7, Theorem 6.10]. Moreover, note that \(\lVert \max _i |X_i| \rVert _{\Psi _\alpha }\) cannot be replaced by \(\max _i \lVert |X_i| \rVert _{\Psi _\alpha }\) (a counterexample for α = 2 is provided in [20]). In general, the Orlicz norm of \(\max _i\lvert X_i \rvert \) will be of order \((\log n)^{1/\alpha }\) (cf. Lemma 56).

Proof of Proposition 31

Following the lines of the proof of [20, Lemma 3.5], the key step is a suitable truncation that goes back to [1]. Indeed, write

$$\displaystyle \begin{aligned} X_i = X_i1_{\{|X_i|\le M\}} + X_i1_{\{|X_i| > M\}} =: Y_i + Z_i \end{aligned} $$
(3.3)

with \(M := 8 \mathbb {E}\max _i |X_i|\) (in particular, \(M \le C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\), cf. [12, Lemma A.2]), and let Y = (Y1, …, Yn), Z = (Z1, …, Zn). By the Lipschitz property of f,

$$\displaystyle \begin{aligned} &\mathbb {P}(|f(X) - \mathbb {E}f(X)| > t)\\ \le \ &\mathbb {P}(|f(Y) - \mathbb {E}f(Y)| + |f(X) - f(Y)| + |\mathbb {E}f(Y) - \mathbb {E}f(X)| > t)\\ \le \ &\mathbb {P}(|f(Y) - \mathbb {E}f(Y)| + \lVert Z \rVert_2 + \mathbb {E}\lVert Z \rVert_2 > t), \end{aligned} $$
(3.4)

and hence, it suffices to bound the terms in the last line.

Applying (3.1) to Y  and using (2.4) and (2.3), we obtain

$$\displaystyle \begin{aligned} \mathbb {P}(|f(Y) - \mathbb {E}f(Y)| > t) \le 2 \exp\Big(- \frac{t^\alpha}{C_\alpha^\alpha\lVert \max_i |X_i| \rVert_{\Psi_\alpha}^\alpha}\Big). \end{aligned} $$
(3.5)

Furthermore, below we will show that

$$\displaystyle \begin{aligned} \lVert \lVert Z \rVert_2 \rVert_{\Psi_\alpha} \le C_\alpha \lVert \max_i |X_i| \rVert_{\Psi_\alpha}. \end{aligned} $$
(3.6)

Hence, for any t ≥ 0,

$$\displaystyle \begin{aligned} \mathbb {P}(\lVert Z \rVert_2 \ge t) \le 2 \exp \Big(- \frac{t^\alpha}{C_\alpha^\alpha\lVert \max_i |X_i| \rVert_{\Psi_\alpha}^\alpha}\Big), \end{aligned} $$
(3.7)

and by [12, Lemma A.2],

$$\displaystyle \begin{aligned} \mathbb {E}\lVert Z \rVert_2 \le C_\alpha \lVert \max_i |X_i| \rVert_{\Psi_\alpha}. \end{aligned} $$
(3.8)

Temporarily writing \(K := C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\), where Cα is large enough so that (3.5), (3.7), and (3.8) hold, (3.4) and (3.8) yield

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le \mathbb {P}(|f(Y) - \mathbb {E}f(Y)| + \lVert Z \rVert_2 > t - K) \end{aligned}$$

if t ≥ K. Using subadditivity and invoking (3.5) and (3.7), we obtain

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 4 \exp\Big(-\frac{(t-K)^\alpha}{(2K)^\alpha}\Big) \le 4 \exp\Big(-\frac{t^\alpha}{c_\alpha(2K)^\alpha}\Big), \end{aligned}$$

where the last step holds for t ≥ 2K, say (using t − K ≥ t/2 in this range). This bound extends trivially to any t ≥ 0 (if necessary, by a suitable change of constants). Finally, the constant in front of the exponential may be adjusted to 2 by (2.3), which finishes the proof.

It remains to show (3.6). To this end, recall the Hoffmann-Jørgensen inequality (cf. [30, Theorem 6.8]) in the following form: if W1, …, Wn are independent random variables, Sk := W1 + … + Wk, and t ≥ 0 is such that \(\mathbb {P}(\max _k |S_k| > t) \le 1/8\), then

$$\displaystyle \begin{aligned} \mathbb {E}\max_k|S_k| \le 3 {\mathbb {E}} \max_i|W_i| + 8t. \end{aligned}$$

In our case, we set \(W_i := Z_i^2\), t = 0, and note that by Chebyshev’s inequality,

$$\displaystyle \begin{aligned} \mathbb {P}(\max_iZ_i^2 > 0) = \mathbb {P}(\max_i|X_i| > M) \le \mathbb {E}\max_i|X_i|/M = 1/8, \end{aligned}$$

and consequently, recalling that \(S_k = Z_1^2 + \ldots + Z_k^2\),

$$\displaystyle \begin{aligned} \mathbb {P}(\max_k |S_k| > 0) \le \mathbb {P}(\max_i Z_i^2 > 0) \le 1/8. \end{aligned}$$

Thus, together with [12, Lemma A.2], we obtain

$$\displaystyle \begin{aligned} \mathbb {E}\lVert Z \rVert_2^2 \le 3 \mathbb {E} \max_i Z_i^2 \le C_\alpha \lVert \max_i Z_i^2 \rVert_{\Psi_{\alpha/2}}. \end{aligned}$$

Now it is easy to see that \(\lVert \max _i Z_i^2 \rVert _{\Psi _{\alpha /2}} \le \lVert \max _i |X_i| \rVert _{\Psi _\alpha }^2\), so that altogether we arrive at

$$\displaystyle \begin{aligned} \mathbb {E}\lVert Z \rVert_2^2 \le C_\alpha \lVert \max_i |X_i| \rVert_{\Psi_\alpha}^2. \end{aligned} $$
(3.9)

Furthermore, by [30, Theorem 6.21], if W1, …, Wn are independent random variables with zero mean and α ∈ (0, 1],

$$\displaystyle \begin{aligned} \lVert \sum_{i=1}^{n} W_i \rVert_{\Psi_\alpha} \le C_\alpha (\lVert \sum_{i=1}^n W_i \rVert_{L^1} + \lVert \max_i |W_i| \rVert_{\Psi_\alpha}). \end{aligned}$$

In our case, we consider \(W_i = Z_i^2 - \mathbb {E} Z_i^2\) and α∕2 (instead of α). Together with the previous arguments (in particular, (3.9)) and [12, Lemma A.3], this yields

$$\displaystyle \begin{aligned} \lVert \sum_{i=1}^n(Z_i^2 - \mathbb {E}Z_i^2) \rVert_{\Psi_{\alpha/2}} &\le C_\alpha (\mathbb {E}|\lVert Z\rVert_2^2 - \mathbb {E} \lVert Z\rVert_2^2| + \lVert \max_i |Z_i^2 - \mathbb {E} Z_i^2| \rVert_{\Psi_{\alpha/2}})\\ &\le C_\alpha (\mathbb {E}\lVert Z\rVert_2^2 + \lVert \max_i Z_i^2 \rVert_{\Psi_{\alpha/2}}) \le C_\alpha \lVert \max_i |X_i| \rVert_{\Psi_\alpha}^2. \end{aligned} $$

Combining this with [12, Lemma A.3] and (3.9), we arrive at (3.6). □

4 Uniform Tail Bounds for First- and Second-Order Chaos

In this section, we discuss bounds for the tails of the supremum of certain chaos-type classes of functions. Even if we are particularly interested in quadratic forms, i. e., uniform Hanson–Wright inequalities, let us first consider linear forms.

Let X1, …, Xn be independent random variables, let α ∈ (0, 2], and let \(\{a_{i,t} : i = 1, \ldots , n, t \in \mathcal {T}\}\) be a compact set of real numbers, where \(\mathcal {T}\) is some index set. Consider \(g(X) := \sup _{t \in \mathcal {T}} \sum _{i=1}^n a_{i,t}X_i\). Clearly, g is convex and has Lipschitz constant \(D := \sup _{t \in \mathcal {T}} (\sum _{i=1}^n a_{i,t}^2)^{1/2}\). Therefore, applying Proposition 31, we immediately obtain that for any t ≥ 0,

$$\displaystyle \begin{aligned} \mathbb {P}(\lvert g(X)-\mathbb {E}g(X) \rvert \ge t) \le 2 \exp\Big(-\frac{t^\alpha}{C_\alpha D^\alpha\lVert \max_i |X_i| \rVert_{\Psi_\alpha}^\alpha}\Big). \end{aligned} $$
(4.1)

For bounded random variables, corresponding tail bounds can be found, e. g., in [32, Eq. (14)], and choosing α = 2, we get back this result up to constants.
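For instance, choosing \(\mathcal {T} = \{1, \ldots, n\}\) and \(a_{i,t} = \mathbbm{1}_{\{i = t\}}\) gives \(g(X) = \max_i X_i\) and D = 1, so that (4.1) specializes to a concentration bound for the maximum:

$$\displaystyle \begin{aligned} \mathbb {P}(\lvert \max_i X_i - \mathbb {E}\max_i X_i \rvert \ge t) \le 2 \exp\Big(-\frac{t^\alpha}{C_\alpha\lVert \max_i |X_i| \rVert_{\Psi_\alpha}^\alpha}\Big). \end{aligned}$$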

Our main aim is to derive a second-order analogue of (4.1), i. e., a uniform Hanson–Wright inequality. A pioneering result in this direction (for Rademacher variables) can be found in [39]. Later results include [2] (which requires the so-called concentration property), [22], [9], and [11] (certain classes of weakly dependent random variables). In [20], a uniform Hanson–Wright inequality for sub-Gaussian random variables was proven. We may show a similar result for random variables with bounded Orlicz norms of any order α ∈ (0, 2].

Theorem 41

Let X1, …, Xn be independent, centered random variables and \(K := \lVert \max _i \lvert X_i \rvert \rVert _{\Psi _\alpha }\), where α ∈ (0, 2]. Let \(\mathcal {A}\) be a compact set of real symmetric n × n matrices, and let \(f(X) := \sup _{A \in \mathcal {A}} (X^TAX - \mathbb {E}X^TAX)\). Then, for any t ≥ 0,

$$\displaystyle \begin{aligned} \mathbb {P}(f(X) - \mathbb {E}f(X) \ge t) \le 2\exp\Big(-\frac{1}{C_\alpha K^\alpha}\min\Big(\frac{t^\alpha}{(\mathbb {E}\sup_{A\in\mathcal{A}}\lVert AX \rVert _2)^\alpha}, \frac{t^{\alpha/2}}{\sup_{A\in\mathcal{A}}\lVert A \rVert _{\mathrm{op}}^{\alpha/2}}\Big)\Big). \end{aligned} $$

For α = 2, this gives back [20, Theorem 1.1] (up to constants and a different range of t). Comparing Theorem 41 to Theorem 21, we note that instead of a sub-Gaussian term, we obtain an α-subexponential term (which can be trivially transformed into a sub-Gaussian term for \(t \le \mathbb {E}\sup _{A\in \mathcal {A}}\lVert AX \rVert _2\), but this does not cover the complete α-subexponential regime). Moreover, Theorem 41 only gives a bound for the upper tails. Therefore, if \(\mathcal {A}\) just consists of a single matrix, Theorem 21 is stronger. These differences have technical reasons.
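For illustration, assume in addition that \(\mathbb {E}X_i^2 = 1\) and that \(\mathcal {A} = \{A\}\) consists of a single matrix. Then \(\mathbb {E}\lVert AX \rVert _2 \le (\mathbb {E}\lVert AX \rVert _2^2)^{1/2} = \lVert A \rVert _{\mathrm {HS}}\), and Theorem 41 yields

$$\displaystyle \begin{aligned} \mathbb {P}(X^TAX - \mathbb {E}X^TAX \ge t) \le 2\exp\Big(-\frac{1}{C_\alpha}\min\Big(\Big(\frac{t}{K\lVert A \rVert _{\mathrm{HS}}}\Big)^{\alpha}, \Big(\frac{t}{K^2\lVert A \rVert _{\mathrm{op}}}\Big)^{\alpha/2}\Big)\Big), \end{aligned}$$

in which the first argument of the minimum carries the exponent α instead of 2 and \(K = \lVert \max _i \lvert X_i \rvert \rVert _{\Psi _\alpha }\) replaces the bound on the individual Orlicz norms.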

To prove Theorem 41, we shall follow the basic steps of [20] and modify those where the truncation comes in. Let us first repeat some tools and results. In the sequel, for a random vector W = (W1, …, Wn), we shall denote

$$\displaystyle \begin{aligned} f(W) := \sup_{A \in \mathcal{A}} (W^TAW - g(A)), \end{aligned} $$
(4.2)

where \(g : \mathbb {R}^{n \times n} \to \mathbb {R}\) is some function. Moreover, if A is any matrix, we denote by Diag(A) its diagonal part (regarded as a matrix with zero entries on its off-diagonal). The following lemma combines [20, Lemmas 3.2 & 3.5].

Lemma 42

  (1)

    Assume the vector W has independent components that satisfy Wi ≤ K a.s. Then, for any t ≥ 1, we have

    $$\displaystyle \begin{aligned} f(W) - \mathbb {E}f(W) \le C\big(K(\mathbb {E}\sup_{A \in \mathcal{A}}\lVert AW \rVert _2 &+ \mathbb {E}\sup_{A \in \mathcal{A}} \lVert \mathrm{Diag}(A)W \rVert _2)\sqrt{t} \\ &+ K^2\sup_{A\in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}t\big) \end{aligned} $$

    with probability at least 1 − et.

  (2)

    Assuming the vector W has independent (but not necessarily bounded) components with mean zero, we have

    $$\displaystyle \begin{aligned} \mathbb {E}\sup_{A\in\mathcal{A}}\lVert \mathrm{Diag}(A)W \rVert _2 \le C\mathbb {E}\sup_{A\in\mathcal{A}}\lVert AW \rVert _2. \end{aligned}$$

From now on, let X be the random vector from Theorem 41, and recall the truncated random vector Y  that we introduced in (3.3) (and the corresponding “remainder” Z). Then, Lemma 42 (1) for f(Y ) with \(g(A) = \mathbb {E}X^TAX\) yields

$$\displaystyle \begin{aligned} f(Y) - \mathbb {E}f(Y) \le C\big(M(\mathbb {E}\sup_{A \in \mathcal{A}}\lVert AY \rVert _2 &+ \mathbb {E}\sup_{A\in \mathcal{A}}\lVert \mathrm{Diag}(A)Y \rVert _2)t^{1/\alpha} \\ &+ M^2t^{2/\alpha}\sup_{A\in\mathcal{A}}\lVert A \rVert _{\mathrm{op}}\big) \end{aligned} $$
(4.3)

with probability at least 1 − e−t (actually, (4.3) even holds with α = 2, but in the sequel we will have to use the weaker version given above anyway). Here we recall that \(M \le C_\alpha \lVert \max _i |X_i| \rVert _{\Psi _\alpha }\).

To prove Theorem 41, it remains to replace the terms involving the truncated random vector Y  by the original vector X. First, by Proposition 31 and since \(\sup _{A\in \mathcal {A}} \lVert AX \rVert _2\) is \(\sup _{A\in \mathcal {A}}\lVert A \rVert _{\mathrm {op}}\)-Lipschitz, we obtain

$$\displaystyle \begin{aligned} \mathbb {P}(\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 > \mathbb {E}\sup_{A\in\mathcal{A}}\lVert AX \rVert _2 + C_\alpha\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha} \sup_{A\in\mathcal{A}} \lVert A \rVert _{\mathrm{op}}t^{1/\alpha}) \le 2e^{-t}. \end{aligned} $$
(4.4)

Moreover, by (3.8),

$$\displaystyle \begin{aligned} \lvert \mathbb {E}\sup_{A\in\mathcal{A}} \lVert AY \rVert _2 - \mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 \rvert \le C_\alpha\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha} \sup_{A\in\mathcal{A}} \lVert A \rVert _{\mathrm{op}}. \end{aligned} $$
(4.5)

Next we estimate the difference between the expectations of f(X) and f(Y ).

Lemma 43

We have

$$\displaystyle \begin{aligned} \lvert \mathbb {E}f(Y)-\mathbb {E}f(X) \rvert \le C_\alpha\big(\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha} \mathbb {E}\sup_{A\in\mathcal{A}}\lVert AX \rVert _2 + \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}^2 \sup_{A \in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}\big). \end{aligned}$$

Proof

First note that

$$\displaystyle \begin{aligned} f(X) &= \sup_{A\in\mathcal{A}} (Y^T AY - \mathbb {E}X^TAX + Z^TAX + Z^TAY)\\ &\le \sup_{A\in\mathcal{A}} (Y^T AY - \mathbb {E}X^TAX) + \sup_{A\in\mathcal{A}}\lvert Z^TAX \rvert + \sup_{A\in\mathcal{A}}\lvert Z^TAY \rvert \\ &\le f(Y) + \lVert Z \rVert _2\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 + \lVert Z \rVert _2\sup_{A\in\mathcal{A}}\lVert AY \rVert _2. \end{aligned} $$

The same holds if we reverse the roles of X and Y . As a consequence,

$$\displaystyle \begin{aligned} \lvert f(X)-f(Y) \rvert \le \lVert Z \rVert _2\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 + \lVert Z \rVert _2\sup_{A\in\mathcal{A}}\lVert AY \rVert _2, \end{aligned} $$
(4.6)

and thus, taking expectations and applying Hölder’s inequality,

$$\displaystyle \begin{aligned} \lvert \mathbb {E}f(X)-\mathbb {E}f(Y) \rvert \le (\mathbb {E}\lVert Z \rVert _2^2)^{1/2}((\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2^2)^{1/2} + (\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AY \rVert _2^2)^{1/2}). \end{aligned} $$
(4.7)

We may estimate \((\mathbb {E}\lVert Z \rVert _2^2)^{1/2}\) using (3.9). Moreover, by arguments similar to those leading to (3.8), we get from (4.4) that

$$\displaystyle \begin{aligned} \mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2^2 \le C_\alpha((\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2)^2 + \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}^2 \sup_{A \in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}^2). \end{aligned}$$

Arguing similarly and using (4.5), the same bound also holds for \(\mathbb {E}\sup _{A\in \mathcal {A}} \lVert AY \rVert _2^2\). Taking roots and plugging everything into (4.7) completes the proof. □

Finally, we prove the central result of this section.

Proof of Theorem 41

First, it immediately follows from Lemma 43 that

$$\displaystyle \begin{aligned} \mathbb {E}f(Y) \le \mathbb {E}f(X) + C_\alpha\big(\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha} \mathbb {E}\sup_{A\in\mathcal{A}}\lVert AX \rVert _2 + \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}^2 \sup_{A \in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}\big). \end{aligned} $$
(4.8)

Moreover, by (4.5) and Lemma 42 (2),

$$\displaystyle \begin{aligned} \mathbb {E}\sup_{A \in \mathcal{A}}\lVert AY \rVert _2 + \mathbb {E}\sup_{A\in \mathcal{A}}\lVert \mathrm{Diag}(A)Y \rVert _2 &\le C_\alpha(\mathbb {E}\sup_{A \in \mathcal{A}}\lVert AX \rVert _2 \\ &+ \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha} \sup_{A \in \mathcal{A}}\lVert A \rVert _{\mathrm{op}}). \end{aligned} $$
(4.9)

Finally, it follows from (4.6), (4.4), and (4.5) that

$$\displaystyle \begin{aligned} \lvert f(X)-f(Y) \rvert &\le \lVert Z \rVert _2\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 + \lVert Z \rVert _2\sup_{A\in\mathcal{A}}\lVert AY \rVert _2\\ &\le C_\alpha(\lVert Z \rVert _2\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2 + \lVert Z \rVert _2\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}\sup_{A\in\mathcal{A}}\lVert A \rVert _{\mathrm{op}}t^{1/\alpha}) \end{aligned} $$

with probability at least 1 − 4e−t for all t ≥ 1. Using (3.7), it follows that

$$\displaystyle \begin{aligned} \lvert f(X)-f(Y) \rvert &\le C_\alpha(\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2t^{1/\alpha} \\ &+ \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}^2\sup_{A\in\mathcal{A}}\lVert A \rVert _{\mathrm{op}}t^{2/\alpha}) \end{aligned} $$
(4.10)

with probability at least 1 − 6e−t for all t ≥ 1. Combining (4.8), (4.9), and (4.10) and plugging into (4.3) thus yields that with probability at least 1 − 6e−t for all t ≥ 1,

$$\displaystyle \begin{aligned} f(X) - \mathbb {E}f(X) &\le C_\alpha(\lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}\mathbb {E}\sup_{A\in\mathcal{A}} \lVert AX \rVert _2t^{1/\alpha} \\ &+ \lVert \max_i \lvert X_i \rvert \rVert _{\Psi_\alpha}^2\sup_{A\in\mathcal{A}}\lVert A \rVert _{\mathrm{op}}t^{2/\alpha})\\ &=: C_\alpha(at^{1/\alpha} + bt^{2/\alpha}). \end{aligned} $$

If \(u \ge \max (a,b)\), it follows that

$$\displaystyle \begin{aligned} \mathbb {P}(f(X) - \mathbb {E}f(X) \ge u) \le 6\exp\Big(-\frac{1}{C_\alpha}\min\Big(\Big(\frac{u}{a}\Big)^\alpha, \Big(\frac{u}{b}\Big)^{\alpha/2}\Big)\Big). \end{aligned}$$

By standard means (a suitable change of constants, using (2.3)), this bound may be extended to any u ≥ 0, and the constant may be adjusted to 2. □

5 Random Tensors

By a simple random tensor, we mean a random tensor of the form

$$\displaystyle \begin{aligned} X := X_1 \otimes \cdots \otimes X_d = (X_{1,i_1} \cdots X_{d,i_d})_{i_1, \ldots, i_d} \in \mathbb {R}^{n^d}, \end{aligned} $$
(5.1)

where all Xk are independent random vectors in \(\mathbb {R}^n\) whose coordinates are independent, centered random variables with variance one. Concentration results for random tensors (typically for polynomial-type functions) have been shown in [6, 12, 26], for instance.
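For instance, for d = 2, the simple random tensor \(X = X_1 \otimes X_2\) is just the n × n random matrix with entries \(X_{1,i}X_{2,j}\), and its Euclidean norm factorizes:

$$\displaystyle \begin{aligned} \lVert X_1 \otimes X_2 \rVert_2^2 = \sum_{i,j} X_{1,i}^2 X_{2,j}^2 = \lVert X_1 \rVert_2^2 \lVert X_2 \rVert_2^2. \end{aligned}$$

The same calculation shows \(\lVert X \rVert _2 = \prod _{k=1}^d \lVert X_k \rVert _2\) for general d, a fact we will use below.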

Recently, in [42], new and interesting concentration bounds for simple random tensors were shown. In comparison to previous work, these inequalities focus on small values of t, e. g., a regime where sub-Gaussian tail decay holds. Moreover, in contrast to previous papers, [42] provides constants with optimal dependence on d. One of these results is the following convex concentration inequality: assuming that n and d are positive integers, \(f : \mathbb {R}^{n^d} \to \mathbb {R}\) is convex and 1-Lipschitz, and the \(X_{i,j}\) are bounded a.s., then for any \(t \in [0, 2n^{d/2}]\),

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp \Big(- \frac{t^2}{Cdn^{d-1}}\Big), \end{aligned} $$
(5.2)

where C > 0 only depends on the bound of the coordinates. Using Theorem 21 and Proposition 31, we may extend this result to unbounded random variables as follows:

Theorem 51

Let \(n, d \in \mathbb {N}\) and \(f : \mathbb {R}^{n^d} \to \mathbb {R}\) be convex and 1-Lipschitz. Consider a simple random tensor X := X1 ⊗⋯ ⊗ Xd as in (5.1). Fix α ∈ [1, 2], and assume that \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\). Then, for any \(t \in [0, c_\alpha n^{d/2} (\log n)^{1/\alpha }/K]\),

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp\Big(-\frac{1}{C_\alpha}\Big( \frac{t}{d^{1/2} n^{(d-1)/2} (\log n)^{1/\alpha} K}\Big)^\alpha\Big). \end{aligned}$$

On the other hand, if α ∈ (0, 1), then, for any \(t \in [0, c_\alpha n^{d/2} (\log n)^{1/\alpha }d^{1/\alpha -1/2}/K]\),

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp\Big(-\frac{1}{C_\alpha}\Big( \frac{t}{d^{1/\alpha}n^{(d-1)/2} (\log n)^{1/\alpha} K}\Big)^\alpha\Big). \end{aligned}$$

The logarithmic factor stems from the Orlicz norm of maxi|Xi| in Proposition 31. For a slightly sharper version that includes the explicit dependence on these norms (and also gives back (5.2) for bounded random variables and α = 2), see (5.12) in the proof of Theorem 51. We believe that Theorem 51 is non-optimal for α < 1 as we would expect a bound of the same type as for α ∈ [1, 2]. However, a key difference in the proofs is that in the case of α ≥ 1 we can make use of moment-generating functions. This is clearly not possible if α < 1, so that less subtle estimates must be invoked instead.

For the proof of Theorem 51, we first adapt some preliminary steps and compile a number of auxiliary lemmas whose proofs are deferred to the appendix. As a start, we need some additional characterizations of α-subexponential random variables via the behavior of the moment-generating functions:

Proposition 52

Let X be a random variable and α ∈ (0, 2]. Then, the properties (1.1), (1.2), and (1.3) are equivalent to

$$\displaystyle \begin{aligned} \mathbb {E}\exp(\lambda^\alpha |X|{}^\alpha) \le \exp(C_{4,\alpha}^\alpha \lambda^\alpha) \end{aligned} $$
(5.3)

for all \(0 \le \lambda \le 1/C_{4,\alpha}\). If α ∈ [1, 2] and \(\mathbb {E}X = 0\), then the above properties are moreover equivalent to

$$\displaystyle \begin{aligned} \mathbb {E}\exp(\lambda X) \le \begin{cases} \exp(C_{5,\alpha}^2 \lambda^2) & \mathit{\text{if}} \ |\lambda| \le 1/C_{5,\alpha} \\ \exp(C_{5,\alpha}^{\alpha/(\alpha-1)} |\lambda|{}^{\alpha/(\alpha-1)}) & \mathit{\text{if}} \ |\lambda| \ge 1/C_{5,\alpha} \ \mathit{\text{and}} \ \alpha > 1. \end{cases} \end{aligned} $$
(5.4)

The parameters \(C_{i,\alpha}\), i = 1, …, 5, can be chosen such that they only differ by constant α-dependent factors. In particular, we can take \(C_{i,\alpha } = c_{i,\alpha }\lVert X \rVert _{\Psi _\alpha }\).
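For example, for α = 2 and centered X, the exponent α∕(α − 1) equals 2, so that both cases in (5.4) coincide, and Proposition 52 recovers the classical characterization of sub-Gaussian random variables via the moment-generating function:

$$\displaystyle \begin{aligned} \mathbb {E}\exp(\lambda X) \le \exp(C_{5,2}^2 \lambda^2) \qquad \text{for all } \lambda \in \mathbb {R}. \end{aligned}$$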

To continue, note that \(\lVert X \rVert _2 = \prod _{i=1}^d \lVert X_i \rVert _2\). A key step in the proofs of [42] is a maximal inequality that simultaneously controls the tails of \(\prod _{i=1}^k \lVert X_i \rVert _2\), k = 1, …, d, where the Xi have independent sub-Gaussian components, i. e., α = 2. Generalizing these results to any order α ∈ (0, 2] is not hard. The following preparatory lemma extends [42, Lemma 3.1]. Note that in the proof (given in the appendix again), we apply Proposition 22.

Lemma 53

Let \(X_1, \ldots , X_d \in \mathbb {R}^n\) be independent random vectors with independent, centered coordinates such that \({\mathbb {E}} X_{i,j}^2 = 1\) and \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2]. Then, for any \(t \in [0, 2n^{d/2}]\),

$$\displaystyle \begin{aligned} \mathbb {P} \Big(\prod_{i=1}^d \lVert X_i \rVert_2 > n^{d/2} + t\Big) \le 2 \exp\Big(-\frac{1}{C_\alpha} \Big(\frac{t}{K^2 d^{1/2}n^{(d-1)/2}}\Big)^\alpha\Big). \end{aligned}$$

To control all k = 1, …, d simultaneously, we need a generalized version of the maximal inequality [42, Lemma 3.2] that we state next.

Lemma 54

Let \(X_1, \ldots , X_d \in \mathbb {R}^n\) be independent random vectors with independent, centered coordinates such that \({\mathbb {E}} X_{i,j}^2 = 1\) and \(\lVert X_{i,j} \rVert _{\Psi _\alpha } \le K\) for some α ∈ (0, 2]. Then, for any u ∈ [0, 2],

$$\displaystyle \begin{aligned} \mathbb {P}\Big(\max_{1 \le k \le d}n^{-k/2}\prod_{i=1}^k \lVert X_i \rVert_2 > 1 + u \Big) \le 2 \exp\Big(-\frac{1}{C_\alpha}\Big(\frac{n^{1/2} u}{K^2d^{1/2}}\Big)^\alpha\Big). \end{aligned}$$

The following martingale-type bound is directly taken from [42]:

Lemma 55 ([42], Lemma 4.1)

Let X1, …, Xd be independent random vectors. For each k = 1, …, d, let fk = fk(Xk, …, Xd) be an integrable real-valued function and \(\mathcal {E}_k\) be an event that is uniquely determined by the vectors Xk, …, Xd. Let \(\mathcal {E}_{d+1}\) be the entire probability space. Suppose that for every k = 1, …, d, we have

$$\displaystyle \begin{aligned} \mathbb {E}_{X_k} \exp(f_k) \le \pi_k \end{aligned}$$

for every realization of Xk+1, …, Xd in \(\mathcal {E}_{k+1}\). Then, for \(\mathcal {E} := \mathcal {E}_2 \cap \cdots \cap \mathcal {E}_d\), we have

$$\displaystyle \begin{aligned} \mathbb {E} \exp(f_1 + \ldots + f_d)1_{\mathcal{E}} \le \pi_1 \cdots \pi_d. \end{aligned}$$

Finally, we need a bound for the Orlicz norm of maxi|Xi|.

Lemma 56

Let X1, …, Xn be independent, centered random variables such that \(\lVert X_i \rVert _{\Psi _\alpha } \le K\) for any i and some α > 0. Then,

$$\displaystyle \begin{aligned} \lVert \max_i |X_i| \rVert_{\Psi_\alpha} \le C_\alpha K \max\Big\{\Big(\frac{\sqrt{2}+1}{\sqrt{2}-1}\Big)^{1/\alpha},(\log n)^{1/\alpha}\Big(\frac{2}{\log 2}\Big)^{1/\alpha}\Big\}. \end{aligned}$$

Here, we may choose \(C_\alpha = \max \{2^{1/\alpha -1}, 2^{1-1/\alpha }\}\).

Note that for α ≥ 1, [8, Proposition 4.3.1] provides a similar result. However, we are also interested in the case of α < 1 in the present note. The condition \(\mathbb {E}X_i = 0\) in Lemma 56 can easily be removed at the expense of a different absolute constant.
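To see that the logarithmic growth in Lemma 56 is in general unavoidable, consider independent standard exponential random variables E1, …, En and \(X_i := E_i - 1\) (so that α = 1 and \(\lVert X_i \rVert _{\Psi _1} \le C\)). Here,

$$\displaystyle \begin{aligned} \mathbb {E}\max_i |X_i| \ge \mathbb {E}\max_i E_i - 1 = \sum_{k=1}^n \frac 1k - 1 \ge \log n - 1, \end{aligned}$$

and since the L1 norm is bounded by a constant multiple of the Orlicz norm (cf. [12, Lemma A.2]), \(\lVert \max _i |X_i| \rVert _{\Psi _1}\) is indeed of order at least \(\log n\).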

We are now ready to prove Theorem 51.

Proof of Theorem 51

We shall adapt the arguments from [42]. First let

$$\displaystyle \begin{aligned} \mathcal{E}_k := \Big\{ \prod_{i=k}^d \lVert X_i \rVert_2 \le 2 n^{(d-k+1)/2}\Big\},\qquad k = 1, \ldots, d, \end{aligned}$$

and let \(\mathcal {E}_{d+1}\) be the full space. It then follows from Lemma 54 for u = 1 that

$$\displaystyle \begin{aligned} \mathbb {P}(\mathcal{E}) \ge 1 - 2 \exp\Big(-\frac{1}{C_\alpha}\Big(\frac{n^{1/2}}{K^2d^{1/2}}\Big)^\alpha\Big), \end{aligned} $$
(5.5)

where \(\mathcal {E} := \mathcal {E}_2 \cap \cdots \cap \mathcal {E}_d\).

Now fix any realization x2, …, xd of the random vectors X2, …, Xd in \(\mathcal {E}_2\), and apply Proposition 31 to the function f1(x1) given by \(x_1 \mapsto f(x_1, x_2, \ldots , x_d)\). Clearly, f1 is convex, and since

$$\displaystyle \begin{aligned} |f(x \otimes x_2 \otimes \cdots \otimes x_d) - f(y \otimes x_2 \otimes \cdots \otimes x_d)| \le \lVert x - y \rVert_2 \prod_{i=2}^d \lVert x_i \rVert_2 \le \lVert x - y \rVert_2 2 n^{(d-1)/2}, \end{aligned}$$

we see that it is \(2n^{(d-1)/2}\)-Lipschitz. Hence, it follows from (3.2) that

$$\displaystyle \begin{aligned} \lVert f - \mathbb {E}_{X_1} f \rVert_{\Psi_\alpha(X_1)} \le c_\alpha n^{(d-1)/2} \lVert \max_j |X_{1,j}| \rVert_{\Psi_\alpha} \end{aligned} $$
(5.6)

for any x2, …, xd in \(\mathcal {E}_2\), where \(\mathbb {E}_{X_1}\) denotes taking the expectation with respect to X1 (which, by independence, is the same as conditionally on X2, …, Xd).

To continue, fix any realization x3, …, xd of the random vectors X3, …, Xd that satisfy \(\mathcal {E}_3\) and apply Proposition 31 to the function f2(x2) given by \(x_2 \mapsto \mathbb {E}_{X_1}f(X_1, x_2, \ldots , x_d)\). Again, f2 is a convex function, and since

$$\displaystyle \begin{aligned} &|\mathbb {E}_{X_1}f(X_1 \otimes x \otimes x_3 \otimes \ldots \otimes x_d) - \mathbb {E}_{X_1}f(X_1 \otimes y \otimes x_3 \otimes \ldots \otimes x_d)|\\ &\le \mathbb {E}_{X_1} \lVert X_1 \otimes (x-y) \otimes x_3 \otimes \ldots \otimes x_d \rVert_2 \le (\mathbb {E}\lVert X_1 \rVert_2^2)^{1/2} \lVert x - y \rVert_2 \prod_{i=3}^d \lVert x_i \rVert_2\\ &\le \sqrt{n} \lVert x - y \rVert_2 \cdot 2 n^{(d-2)/2} = \lVert x - y \rVert_2 \cdot 2 n^{(d-1)/2}, \end{aligned} $$

f2 is \(2n^{(d-1)/2}\)-Lipschitz. Applying (3.2), we thus obtain

$$\displaystyle \begin{aligned} \lVert \mathbb {E}_{X_1} f - \mathbb {E}_{X_1,X_2} f \rVert_{\Psi_\alpha(X_2)} \le c_\alpha n^{(d-1)/2} \lVert \max_j |X_{2,j}| \rVert_{\Psi_\alpha} \end{aligned} $$
(5.7)

for any x3, …, xd in \(\mathcal {E}_3\). Iterating this procedure, we arrive at

$$\displaystyle \begin{aligned} \lVert \mathbb {E}_{X_1, \ldots, X_{k-1}} f - \mathbb {E}_{X_1,\ldots, X_k} f \rVert_{\Psi_\alpha(X_k)} \le c_\alpha n^{(d-1)/2} \lVert \max_j |X_{k,j}| \rVert_{\Psi_\alpha} \end{aligned} $$
(5.8)

for any realization xk+1, …, xd of Xk+1, …, Xd in \(\mathcal {E}_{k+1}\).

We now combine (5.8) for k = 1, …, d. To this end, we write

$$\displaystyle \begin{aligned} \Delta_k := \Delta_k(X_k, \ldots, X_d) := \mathbb {E}_{X_1, \ldots, X_{k-1}}f - \mathbb {E}_{X_1, \ldots, X_k}f \end{aligned}$$

and apply Proposition 52. Here we have to distinguish between the cases where α ∈ [1, 2] and α ∈ (0, 1). If α ≥ 1, we use (5.4) to arrive at a bound for the moment-generating function. Writing \(M_k := \lVert \max _j |X_{k,j}| \rVert _{\Psi _\alpha }\), we obtain

$$\displaystyle \begin{aligned} \mathbb {E}\exp(\lambda \Delta_k) \le \begin{cases} \exp((c_\alpha n^{(d-1)/2} M_k)^2 \lambda^2)\\ \exp((c_\alpha n^{(d-1)/2}M_k)^{\alpha/(\alpha-1)} |\lambda|{}^{\alpha/(\alpha-1)}) \end{cases} \end{aligned}$$

for all xk+1, …, xd in \(\mathcal {E}_{k+1}\), where the first line holds if \(|\lambda| \le 1/(c_\alpha n^{(d-1)/2}M_k)\) and the second one if \(|\lambda| \ge 1/(c_\alpha n^{(d-1)/2}M_k)\) and α > 1. For simplicity of presentation, temporarily assume that \(c_\alpha n^{(d-1)/2} = 1\) (alternatively, replace Mk by \(c_\alpha n^{(d-1)/2}M_k\) in the following arguments) and that M1 ≤ … ≤ Md. Using Lemma 55, we obtain

$$\displaystyle \begin{aligned} &\qquad \mathbb {E} \exp(\lambda(f-\mathbb {E}f))1_{\mathcal{E}} = \mathbb {E}\exp(\lambda(\Delta_1 + \cdots + \Delta_d))1_{\mathcal{E}}\\ &\le \exp((M_1^2 + \ldots + M_k^2) \lambda^2 + (M_{k+1}^{\alpha/(\alpha-1)} + \ldots + M_d^{\alpha/(\alpha-1)}) |\lambda|{}^{\alpha/(\alpha-1)}) \end{aligned} $$

for \(|\lambda| \in [1/M_{k+1}, 1/M_k]\), where we formally set M0 := 0 and \(M_{d+1} := \infty\). In particular, setting \(M := (M_1^2 + \ldots + M_d^2)^{1/2}\), we have

$$\displaystyle \begin{aligned} \mathbb {E} \exp(\lambda(f-\mathbb {E}f))1_{\mathcal{E}} \le \exp(M^2 \lambda^2) \end{aligned}$$

for all \(|\lambda| \le 1/M_d = 1/(\max_k M_k)\). Furthermore, for α > 1, it is not hard to see that

$$\displaystyle \begin{aligned} (M_1^2 + \ldots + M_k^2) \lambda^2 + (M_{k+1}^{\alpha/(\alpha-1)} + \ldots + M_d^{\alpha/(\alpha-1)}) |\lambda|{}^{\alpha/(\alpha-1)} \le M^{\alpha/(\alpha-1)}|\lambda|{}^{\alpha/(\alpha-1)} \end{aligned}$$

if \(|\lambda| \in [1/M_{k+1}, 1/M_k]\) for some k = 0, 1, …, d − 1 or \(|\lambda| \in [1/M, 1/M_d]\) for k = d. Indeed, by monotonicity (divide by \(\lambda^2\) and compare the coefficients), it suffices to check this for \(\lambda = 1/M_{k+1}\) or \(\lambda = 1/M\) if k = d. The cases of k = 0 and k = d follow by simple calculations. In the general case, set \(x^2 = (M_1^2 + \ldots + M_{k+1}^2)/M_{k+1}^2\) and \(y^{\alpha /(\alpha -1)} = (M_{k+2}^{\alpha /(\alpha -1)} + \ldots + M_d^{\alpha /(\alpha -1)})/M_{k+1}^{\alpha /(\alpha -1)}\). Clearly, \((x^2 + y^{\alpha/(\alpha-1)})^{(\alpha-1)/\alpha} \le (x^2 + y^2)^{1/2}\) since x ≥ 1 and \(\alpha/(\alpha-1) \ge 2\). Moreover, \(y^2 \le (M_{k+2}^2 + \ldots + M_d^2)/M_{k+1}^2\), which proves the inequality. Altogether, inserting the factor \(c_\alpha n^{(d-1)/2}\) again, we therefore obtain

$$\displaystyle \begin{aligned} &\mathbb {E} \exp(\lambda(f-\mathbb {E}f))1_{\mathcal{E}} = \mathbb {E}\exp(\lambda(\Delta_1 + \cdots + \Delta_d))1_{\mathcal{E}}\notag\\ &\qquad \le \begin{cases} \exp((c_\alpha n^{(d-1)/2})^2 M^2 \lambda^2) \\ \exp((c_\alpha n^{(d-1)/2})^{\alpha/(\alpha-1)} M^{\alpha/(\alpha-1)} |\lambda|{}^{\alpha/(\alpha-1)}), \end{cases} \end{aligned} $$
(5.9)

where the first line holds if \(|\lambda| \le 1/(c_\alpha n^{(d-1)/2}M)\) and the second one if \(|\lambda| \ge 1/(c_\alpha n^{(d-1)/2}M)\) and α > 1.

On the other hand, if α < 1, we use (5.3). Together with Lemma 55 and the subadditivity of |⋅|α for α ∈ (0, 1), this yields

$$\displaystyle \begin{aligned} &\mathbb {E} \exp(\lambda^\alpha|f-\mathbb {E}f|{}^\alpha)1_{\mathcal{E}} \le \mathbb {E}\exp(\lambda^\alpha(|\Delta_1|{}^\alpha + \cdots + |\Delta_d|{}^\alpha))1_{\mathcal{E}}\\ &\le \exp((c_\alpha n^{(d-1)/2})^\alpha (M_1^\alpha + \cdots + M_d^\alpha) \lambda^\alpha) \end{aligned} $$
(5.10)

for \(\lambda \in [0, 1/(c_\alpha n^{(d-1)/2}\max_k M_k)]\).

To finish the proof, first consider α ∈ [1, 2]. Then, for any λ > 0, we have

$$\displaystyle \begin{aligned} \mathbb {P}(f - \mathbb {E}f > t) &\le \mathbb {P}(\{f - \mathbb {E}f > t \} \cap \mathcal{E}) + \mathbb {P}(\mathcal{E}^c)\\ &\le \mathbb {P}(\exp(\lambda(f-\mathbb {E}f))1_{\mathcal{E}} > \exp(\lambda t)) + \mathbb {P}(\mathcal{E}^c)\\ &\le \exp\Big(- \Big(\frac{t}{c_\alpha n^{(d-1)/2} M}\Big)^\alpha\Big) + 2 \exp\Big(-\frac{1}{C_\alpha}\Big(\frac{n^{1/2}}{K^2d^{1/2}}\Big)^\alpha\Big), \end{aligned} $$
(5.11)

where the last step follows by standard arguments (similarly as in the proof of Proposition 52 given in the appendix), using (5.9) and (5.5). Now, assume that \(t \le c_\alpha n^{d/2} M/(K^2d^{1/2})\). Then, the right-hand side of (5.11) is dominated by the first term (possibly after adjusting constants), so that we arrive at

$$\displaystyle \begin{aligned} \mathbb {P}(f - \mathbb {E}f > t) \le 3 \exp\Big(- \frac{1}{C_\alpha} \Big(\frac{t}{n^{(d-1)/2} M}\Big)^\alpha\Big). \end{aligned}$$

The same arguments hold if f is replaced by − f. Adjusting constants by (2.3), we obtain that for any \(t \in [0, c_\alpha n^{d/2} M/(K^2d^{1/2})]\),

$$\displaystyle \begin{aligned} \mathbb {P}(|f(X) - \mathbb {E}f(X)| > t) \le 2 \exp\Big(-\frac{1}{C_\alpha}\Big( \frac{t}{n^{(d-1)/2} M}\Big)^\alpha\Big). \end{aligned} $$
(5.12)

Now it remains to note that by Lemma 56, we have

$$\displaystyle \begin{aligned} \lVert \max_j |X_{i,j}| \rVert_{\Psi_\alpha} \le C_\alpha (\log n)^{1/\alpha} \max_j \lVert X_{i,j} \rVert_{\Psi_\alpha} \le C_\alpha (\log n)^{1/\alpha} K. \end{aligned}$$

If α ∈ (0, 1), similarly to (5.11), using (5.10), (5.5) and Proposition 52,

$$\displaystyle \begin{aligned} \mathbb {P}(|f - \mathbb {E}f| > t) &\le \mathbb {P}(\{|f - \mathbb {E}f| > t \} \cap \mathcal{E}) + \mathbb {P}(\mathcal{E}^c)\\ &\le 2\exp\Big(- \Big(\frac{t}{c_\alpha n^{(d-1)/2} M_\alpha}\Big)^\alpha\Big) + 2 \exp\Big(-\frac{1}{C_\alpha}\Big(\frac{n^{1/2}}{K^2d^{1/2}}\Big)^\alpha\Big), \end{aligned} $$

where \(M_\alpha := (M_1^\alpha + \ldots + M_d^\alpha )^{1/\alpha }\). The rest follows as above. □