1 Introduction and Main Results

Random matrices and their spectra have been under intensive study in Statistics since the work of Wishart [28] on sample covariance matrices, in Numerical Analysis since their introduction by von Neumann and Goldstine [25] in the 1940s, and in Physics since Wigner’s work [26, 27] in the 1950s. His Semicircle Law, a fundamental theorem in the spectral theory of large random matrices describing the limit of the empirical spectral measure for what are nowadays known as Wigner matrices, is among the most celebrated results of the theory.

In Banach Space Theory and Asymptotic Geometric Analysis, random matrices appeared already in the 70s (see e.g. [2, 3, 9]). In [2], the authors obtained asymptotic bounds for the expected value of the operator norm of a random matrix \(B = (b_{ij})_{i,j=1}^{m,n}\) with independent mean-zero entries satisfying \(\vert b_{ij}\vert \leq 1\), acting from \(\ell_{2}^{n}\) to \(\ell_{q}^{m}\), 2 ≤ q < ∞. To be more precise, they proved that

$$\displaystyle{\mathbb{E}\,\big\|B:\,\ell_{ 2}^{n} \rightarrow \ell_{ q}^{m}\big\| \leq C_{ q} \cdot \max \big (m^{1/q},\sqrt{n}\big),}$$

where \(C_{q}\) depends only on q. This was then successfully used to characterize (p, q)-absolutely summing operators on Hilbert spaces. Ever since, random matrices have been studied extensively, and Banach space methods have produced numerous deep and new results. In particular, many applications use the spectral properties of a Gaussian matrix, whose entries are independent identically distributed (i.i.d.) standard Gaussian random variables. Seginer proved in [22] that for an m × n random matrix with i.i.d. symmetric entries the expectation of its spectral norm (that is, the operator norm from \(\ell_{2}^{n}\) to \(\ell_{2}^{m}\)) is of the order of the expectation of the largest Euclidean norm of its rows and columns. He also obtained an optimal result in the case of random matrices with entries \(\varepsilon_{ij}a_{ij}\), where \(\varepsilon_{ij}\) are independent Rademacher random variables and \(a_{ij}\) are fixed numbers. We refer the interested reader to the surveys [6, 7] and references therein.

It is natural to ask similar questions about more general random matrices, in particular about Gaussian matrices whose entries are still independent centered Gaussian random variables, but with different variances. In this structured case, where we drop the assumption of identical distributions, very little is known. It is conjectured that the expected spectral norm of such a Gaussian matrix behaves as in Seginer’s result, that is, it is of the order of the expectation of the largest Euclidean norm of its rows and columns. A big step toward the solution was made by Latała in [15], who proved a bound involving fourth moments, which is of the right order \(\max (\sqrt{m},\sqrt{n})\) in the i.i.d. setting, but does not capture the right behavior in the case of, for instance, diagonal matrices. On the one hand, as mentioned in [15], in view of the classical Bai-Yin theorem the presence of fourth moments is not surprising; on the other hand, they are not needed if the conjecture is true.

Later, Riemer and Schütt [20] proved the conjecture up to a \(\log n\) factor. The two results are incomparable: depending on the choice of variances, one or the other gives a better bound. The Riemer-Schütt estimate was used recently in [21].

We would also like to mention that the non-commutative Khintchine inequality can be used to show that the expected spectral norm is bounded from above by the largest Euclidean norm of the rows and columns of the matrix times a factor \(\sqrt{\log n}\) (see e.g. (4.9) in [23]).

Another big step toward the solution was made a short while ago by Bandeira and Van Handel [1]. In particular, they proved that

$$\displaystyle\begin{array}{rcl} \mathbb{E}\,\big\|(a_{ij}g_{ij}):\ell_{ 2}^{n} \rightarrow \ell_{ 2}^{m}\big\| \leq C\Big(\left \vert \left \vert \left \vert A\right \vert \right \vert \right \vert + \sqrt{\log \min (n, m)} \cdot \max _{ ij}\vert a_{ij}\vert \Big),& &{}\end{array}$$
(1)

where \(\left \vert \left \vert \left \vert A\right \vert \right \vert \right \vert\) denotes the largest Euclidean norm of the rows and columns of \((a_{ij})\), C > 0 is a universal constant, and \(g_{ij}\) are independent standard Gaussian random variables (see [1, Theorem 3.1]). Under mild structural assumptions, the bound (1) is already optimal. Further progress was made by Van Handel [24], who verified the conjecture up to a \(\sqrt{\log \log n}\) factor. In fact, more was proved in [24]: he computed precisely the expectation of the largest Euclidean norm of the rows and columns using Gaussian concentration. And, while the moment method is at the heart of the proofs in [22] and [1], he proposed a very nice approach based on the comparison of Gaussian processes to improve the result of Latała. His approach can also be used in our setting. We comment on this in Sect. 4.

The purpose of this work is to provide bounds for operator norms of such structured Gaussian random matrices considered as operators from \(\ell_{p^{{\ast}}}^{n}\) to \(\ell_{q}^{m}\).

In what follows, by \(g_{i}\), \(g_{ij}\), i ≥ 1, j ≥ 1 we always denote independent standard Gaussian random variables. Let \(n,m \in \mathbb{N}\) and \(A = (a_{ij})_{i,j=1}^{m,n} \in \mathbb{R}^{m\times n}\). We write \(G = G_{A} = (a_{ij}g_{ij})_{i,j=1}^{m,n}\). For r ≥ 1, we denote by \(\gamma _{r} \approx \sqrt{r}\) the \(L_{r}\)-norm of a standard Gaussian random variable. The notation f ≈ h means that there are two absolute positive constants c and C (that is, independent of any parameters) such that cf ≤ h ≤ Cf, and \(f \approx_{p,q} h\) means that there are two positive constants c(p, q) and C(p, q), which depend only on the parameters p and q, such that c(p, q)f ≤ h ≤ C(p, q)f.
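To fix the notation numerically, here is a minimal Python sketch (ours, not part of the paper): it evaluates \(\gamma_r\) from the exact moment formula for a standard Gaussian, checks that it is of order \(\sqrt{r}\), and builds a structured matrix \(G_A=(a_{ij}g_{ij})\) from a variance profile A. The helper name gamma_r and the chosen dimensions are illustrative only.

```python
# A minimal sketch (ours, not from the paper): gamma_r is the L_r-norm of a
# standard Gaussian, of order sqrt(r), and G_A = (a_ij g_ij) is a structured
# Gaussian matrix built from a fixed variance profile A.
import math
import numpy as np

def gamma_r(r):
    # E|g|^r = 2^(r/2) * Gamma((r+1)/2) / sqrt(pi) for a standard Gaussian g
    return (2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)) ** (1 / r)

for r in (2, 4, 8, 16):
    print(r, gamma_r(r), math.sqrt(r))   # gamma_r grows like sqrt(r)

rng = np.random.default_rng(0)
m, n = 50, 80
A = rng.uniform(0.0, 1.0, size=(m, n))   # fixed variance profile (a_ij)
G = A * rng.standard_normal((m, n))      # the structured matrix G_A = (a_ij g_ij)
print(G.shape)
```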

Our main result is the following theorem.

Theorem 1.1

For every 1 < p* ≤ 2 ≤ q < ∞ one has

$$\displaystyle\begin{array}{rcl} \mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|& \leq & \Big(\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|^{q}\Big)^{1/q} {}\\ & \leq & C\,p^{5/q}\,(\log m)^{1/q}\,\bigg[\,\gamma _{ p}\,\max _{i\leq m}\|(a_{ij})_{j=1}^{n}\|_{ p} +\gamma _{q}\,\mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert \,\bigg] {}\\ & & +2^{1/q}\,\gamma _{ q}\,\max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{ q}, {}\\ \end{array}$$

where C is a positive absolute constant.

We conjecture the following bound.

Conjecture 1.2

For every 1 ≤ p* ≤ 2 ≤ q ≤ ∞ one has

$$\displaystyle\begin{array}{rcl} \mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\| \approx \max _{ i\leq m}\|(a_{ij})_{j=1}^{n}\|_{ p} +\max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{ q} + \mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert.& & {}\\ \end{array}$$

Here, as usual, p* is defined via the relation 1∕p + 1∕p* = 1. This conjecture extends the corresponding conjecture for the case p = q = 2 and m = n. In this case, Bandeira and Van Handel proved in [1] an estimate with \(\sqrt{ \log \min (m,n)}\max \vert a_{ij}\vert\) instead of \(\mathbb{E}\max \vert a_{ij}g_{ij}\vert\) (see Eq. (1)), while in [24] the corresponding bound is proved with an additional factor \(\sqrt{\log \log n}\) on the right-hand side.
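As a purely numerical illustration of the quantities in Conjecture 1.2 (ours, not from the paper), the sketch below compares the three terms on the right-hand side with the operator norm in a case where the norm is easy to compute, namely p = q = 2 (so p* = 2 and the norm is the spectral norm); the dimensions and the variance profile are arbitrary choices.

```python
# Monte Carlo sketch (ours) of Conjecture 1.2 in the case p = q = 2:
# lhs ~ E ||G : l_2 -> l_2|| (spectral norm), rhs = sum of the three terms.
import numpy as np

rng = np.random.default_rng(1)
m = n = 200
A = rng.uniform(0.0, 1.0, size=(m, n))
p = q = 2.0

samples = [A * rng.standard_normal((m, n)) for _ in range(20)]
lhs = np.mean([np.linalg.norm(G, 2) for G in samples])          # E ||G||, spectral norm
row_term = np.max(np.linalg.norm(A, ord=p, axis=1))             # max_i ||(a_ij)_j||_p
col_term = np.max(np.linalg.norm(A, ord=q, axis=0))             # max_j ||(a_ij)_i||_q
entry_term = np.mean([np.max(np.abs(G)) for G in samples])      # E max |a_ij g_ij|

print(lhs, row_term + col_term + entry_term)   # comparable up to absolute constants
```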

Remark 1.3

The lower bound in the conjecture is almost immediate and follows from standard estimates. Thus the upper bound is the only difficulty.

Remark 1.4

In the case p* = 1 and q ≥ 2, a direct computation along the lines of Lemma 3.2 below shows that

$$\displaystyle{\mathbb{E}\,\big\|G:\ell_{ 1}^{n} \rightarrow \ell_{ q}^{m}\big\|\lesssim \gamma _{ q}\max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{ q} + \mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert.}$$

Remark 1.5

Note that if 1 ≤ p* ≤ 2 ≤ q ≤ ∞, then in the case of matrices of tensor structure, that is, \((a_{ij})_{i,j=1}^{n} = x\otimes y = (x_{j}\cdot y_{i})_{i,j=1}^{n}\) with \(x,y \in \mathbb{R}^{n}\), Chevet’s theorem [3, 4] and a direct computation show that

$$\displaystyle{\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{n}\big\| \approx _{ p,q}\|y\|_{q}\|x\|_{\infty } +\| y\|_{\infty }\|x\|_{p}.}$$

If the matrix is diagonal, that is, \((a_{ij})_{i,j=1}^{n} =\mathop{ \mathrm{diag}}\nolimits (a_{11},\ldots,a_{nn})\), then we immediately obtain

$$\displaystyle{\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{n}\big\| = \mathbb{E}\,\|(a_{ ii}g_{ii})_{i=1}^{n}\|_{ \infty }\approx \max _{i\leq n}\sqrt{\ln (i + 3)}\, \cdot a_{ii}^{{\ast}}\approx \| (a_{ ii})_{i=1}^{n}\|_{ M_{g}},}$$

where \((a_{ii}^{{\ast}})_{i\leq n}\) is the decreasing rearrangement of \((\vert a_{ii}\vert )_{i\leq n}\) and \(M_{g}\) is the Orlicz function given by

$$\displaystyle{M_{g}(s) = \sqrt{\frac{2} {\pi }} \int _{0}^{s}e^{- \frac{1} {2t^{2}} }\,dt}$$

(see Lemma 2.2 below and [11, Lemma  5.2] for the Orlicz norm expression).

Slightly different estimates, but of the same flavour, can also be obtained in the case 1 ≤ q ≤ 2 ≤ p* ≤ ∞.

2 Notation and Preliminaries

By \(c, C, C_{1},\ldots\) we always denote positive absolute constants, whose values may change from line to line, and we write \(c_{p}, C_{p},\ldots\) if the constants depend on some parameter p.

Given p ∈ [1, ∞], p* denotes its conjugate, given by the relation 1∕p + 1∕p* = 1. For \(x = (x_{i})_{i\leq n} \in \mathbb{R}^{n}\), \(\|x\|_{p}\) denotes its \(\ell_{p}\)-norm, that is, \(\|x\|_{\infty } =\max _{i\leq n}\vert x_{i}\vert\) and, for p < ∞,

$$\displaystyle{ \|x\|_{p} =\Big (\sum _{i=1}^{n}\vert x_{ i}\vert ^{p}\Big)^{1/p}. }$$

The corresponding space \((\mathbb{R}^{n},\|\cdot \|_{p})\) is denoted by \(\ell_{p}^{n}\), its unit ball by \(B_{p}^{n}\).

If E is a normed space, then \(E^{{\ast}}\) denotes its dual space and \(B_{E}\) its closed unit ball. The modulus of convexity of E is defined for any ɛ ∈ (0, 2) by

$$\displaystyle{\delta _{E}(\varepsilon ):=\inf \Big\{ 1 -\Big\|\frac{x + y} {2} \Big\|_{E}\,:\,\| x\|_{E} = 1,\ \|y\|_{E} = 1,\ \|x - y\|_{E}>\varepsilon \Big\}.}$$

We say that E has modulus of convexity of power type 2 if there exists a positive constant c such that for all ɛ ∈ (0, 2), \(\delta _{E}(\varepsilon ) \geq c\varepsilon ^{2}\). It is well known (see e.g. [8] or [18, Proposition 2.4]) that this property is equivalent to the fact that

$$\displaystyle{\Big\|\frac{x + y} {2} \Big\|_{E}^{2} +\lambda ^{-2}\Big\|\frac{x - y} {2} \Big\|_{E}^{2} \leq \frac{\|x\|_{E}^{2} +\| y\|_{ E}^{2}} {2} }$$

holds for all x, y ∈ E, where λ > 0 is a constant depending only on c. In that case, we say that E has modulus of convexity of power type 2 with constant λ. We clearly have \(\delta _{E}(\varepsilon ) \geq \varepsilon ^{2}/(2\lambda ^{2})\).

Recall that a Banach space E is of Rademacher type r for some 1 ≤ r ≤ 2 if there is C > 0 such that for all \(n \in \mathbb{N}\) and for all \(x_{1},\ldots,x_{n} \in E\),

$$\displaystyle{\bigg(\mathbb{E}_{\varepsilon }\Big\|\sum _{i=1}^{n}\varepsilon _{ i}x_{i}\Big\|^{2}\bigg)^{1/2} \leq C\left (\sum _{ i=1}^{n}\|x_{ i}\|^{r}\right )^{1/r},}$$

where \((\varepsilon _{i})_{i=1}^{\infty }\) is a sequence of independent random variables defined on some probability space \((\Omega, \mathbb{P})\) such that \(\mathbb{P}(\varepsilon _{i} = 1) = \mathbb{P}(\varepsilon _{i} = -1) = \frac{1} {2}\) for every \(i \in \mathbb{N}\). The smallest such C is called the type-r constant of E and is denoted by \(T_{r}(E)\). This concept was introduced into Banach space theory by Hoffmann-Jørgensen [14] in the early 1970s, and the basic theory was developed by Maurey and Pisier [17].

We will need the following theorem.

Theorem 2.1

Let E be a Banach space with modulus of convexity of power type 2 with constant λ. Let \(X_{1},\ldots,X_{m} \in E^{{\ast}}\) be independent random vectors, let q ≥ 2, and define

$$\displaystyle{B:= C\lambda ^{4}T_{ 2}(E^{{\ast}})\sqrt{\frac{\log m} {m}}\Big(\mathbb{E}\max _{i\leq m}\|X_{i}\|_{E^{{\ast}}}^{q}\Big)^{1/2},}$$

and

$$\displaystyle{\sigma:=\sup _{y\in B_{E}}\left ( \frac{1} {m}\sum _{i=1}^{m}\mathbb{E}\vert \langle X_{ i},y\rangle \vert ^{q}\right )^{1/q}.}$$

Then

$$\displaystyle\begin{array}{rcl} \mathbb{E}\sup _{y\in B_{E}}\bigg\vert \frac{1} {m}\sum _{i=1}^{m}\vert \langle X_{ i},y\rangle \vert ^{q} - \mathbb{E}\vert \langle X_{ i},y\rangle \vert ^{q}\bigg\vert & \leq B^{2} + B \cdot \sigma ^{q/2}.& {}\\ \end{array}$$

Its proof follows the argument “proof of condition (H)” of [13] in combination with the improvement on covering numbers established in [12, Lemma 2]. Indeed, in [12] the argument is only carried out in the simpler case q = 2, but it extends verbatim to the case q ≥ 2.

We also recall known facts about Gaussian random variables. The next lemma is well-known (see e.g. Lemmas 2.3, 2.4 in [24]).

Lemma 2.2

Let \(a = (a_{i})_{i\leq n} \in \mathbb{R}^{n}\) and let \((a_{i}^{{\ast}})_{i\leq n}\) be the decreasing rearrangement of \((\vert a_{i}\vert )_{i\leq n}\). Then

$$\displaystyle{\mathbb{E}\,\max _{i\leq n}\vert a_{i}g_{i}\vert \approx \max _{i\leq n}\sqrt{\ln (i + 3)}\, \cdot a_{i}^{{\ast}}.}$$

Note that, in general, the maximum of i.i.d. random variables weighted by the coordinates of a vector a is equivalent to a certain Orlicz norm \(\|a\|_{M}\), where the function M depends only on the distribution of the random variables (see [10, Corollary 2] and Lemma 5.2 in [11]).
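The following Monte Carlo sketch (ours) illustrates Lemma 2.2 for a concrete weight vector; the sample size and the choice of weights are arbitrary.

```python
# Monte Carlo illustration (ours) of Lemma 2.2: E max_i |a_i g_i| is comparable
# to max_i sqrt(ln(i+3)) * a_i^*, where (a_i^*) is the decreasing rearrangement
# of (|a_i|).
import numpy as np

rng = np.random.default_rng(2)
n = 1000
a = rng.uniform(0.0, 1.0, size=n)

lhs = np.mean([np.max(np.abs(a * rng.standard_normal(n))) for _ in range(2000)])
a_star = np.sort(np.abs(a))[::-1]                                  # decreasing rearrangement
rhs = np.max(np.sqrt(np.log(np.arange(1, n + 1) + 3)) * a_star)

print(lhs, rhs)   # equal up to absolute constants
```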

The following theorem is the classical Gaussian concentration inequality (see e.g. [5] or inequality (2.35) and Proposition 2.18 in [16]).

Theorem 2.3

Let \(n \in \mathbb{N}\) and let \((Y,\left \Vert \cdot \right \Vert _{Y })\) be a Banach space. Let \(y_{1},\ldots,y_{n} \in Y\) and \(X =\sum _{i=1}^{n}g_{i}y_{i}\). Then, for every t > 0,

$$\displaystyle{ \mathbb{P}\Big(\big\vert \left \Vert X\right \Vert _{Y } - \mathbb{E}\left \Vert X\right \Vert _{Y }\big\vert \geq t\Big) \leq 2\exp \left (- \frac{t^{2}} {2\sigma _{Y }(X)^{2}}\right ), }$$
(2)

where \(\sigma _{Y }(X) =\sup _{\|\xi \|_{Y^{{\ast}}}=1}\left (\sum _{i=1}^{n}\left \vert \xi (y_{i})\right \vert ^{2}\right )^{1/2}\) .

Remark 2.4

Let p ≥ 2, let \(a = (a_{j})_{j\leq n} \in \mathbb{R}^{n}\), and let \(X = (a_{j}g_{j})_{j\leq n}\). Then we clearly have

$$\displaystyle{\sigma _{\ell_{p}^{n}}(X) =\max _{j\leq n}\vert a_{j}\vert.}$$

Thus, Theorem 2.3 implies for \(X = (a_{j}g_{j})_{j\leq n}\)

$$\displaystyle\begin{array}{rcl} \mathbb{P}\Big(\big\vert \|X\|_{p} - \mathbb{E}\|X\|_{p}\big\vert> t\Big) \leq 2\,\exp \bigg(- \frac{t^{2}} {2\max _{j\leq n}\vert a_{j}\vert ^{2}}\bigg).& &{}\end{array}$$
(3)

Note also that

$$\displaystyle\begin{array}{rcl} \mathbb{E}\|X\|_{p} \leq \bigg (\sum _{j=1}^{n}\vert a_{ j}\vert ^{p}\,\mathbb{E}\vert g_{ j}\vert ^{p}\bigg)^{1/p} =\gamma _{ p}\|a\|_{p}.& &{}\end{array}$$
(4)
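A quick numerical sanity check (ours) of inequality (4), with arbitrary choices of n, p and a:

```python
# Sanity check (ours) of (4): for X = (a_j g_j)_{j<=n} and p >= 2 one has
# E||X||_p <= gamma_p * ||a||_p, where gamma_p = (E|g|^p)^(1/p).
import math
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 4.0
a = rng.uniform(0.0, 1.0, size=n)

gamma_p = (2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)) ** (1 / p)
lhs = np.mean([np.linalg.norm(a * rng.standard_normal(n), ord=p) for _ in range(2000)])
rhs = gamma_p * np.linalg.norm(a, ord=p)

print(lhs, rhs)   # lhs <= rhs, by Jensen's inequality applied to t -> t^(1/p)
```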

3 Proof of the Main Result

We will apply Theorem 2.1 with \(E =\ell_{ p^{{\ast}}}^{n}\), 1 < p* ≤ 2, and \(X_{1},\ldots,X_{m}\) being the rows of the matrix \(G = (a_{ij}g_{ij})_{i,j=1}^{m,n}\). We start with two lemmas in which we estimate the quantity σ and the expectation appearing in that theorem.

Lemma 3.1

Let \(m,n \in \mathbb{N}\), 1 < p* ≤ 2 ≤ q, and for i ≤ m let \(X_{i} = (a_{ij}g_{ij})_{j=1}^{n}\). Then

$$\displaystyle\begin{array}{rcl} \sigma & =\sup _{y\in B_{p^{{\ast}}}^{n}}\bigg( \frac{1} {m}\sum _{i=1}^{m}\mathbb{E}\big\vert \langle X_{ i},y\rangle \big\vert ^{q}\bigg)^{1/q} = \frac{\gamma _{q}} {m^{1/q}}\,\max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{q}.& {}\\ \end{array}$$

Proof

For every i ≤ m, \(\langle X_{i},y\rangle =\sum _{ j=1}^{n}a_{ij}y_{j}g_{ij}\) is a centered Gaussian random variable with variance \(\|(a_{ij}y_{j})_{j=1}^{n}\|_{2}^{2}\). Hence,

$$\displaystyle\begin{array}{rcl} \sigma ^{q} =\sup _{ y\in B_{p^{{\ast}}}^{n}} \frac{1} {m}\sum _{i=1}^{m}\mathbb{E}\vert \langle X_{ i},y\rangle \vert ^{q} = \frac{\gamma _{q}^{q}} {m}\sup _{y\in B_{p^{{\ast}}}^{n}}\sum _{i=1}^{m}\bigg(\sum _{ j=1}^{n}\vert a_{ ij}y_{j}\vert ^{2}\bigg)^{q/2}.& & {}\\ \end{array}$$

Since p* ≤ 2 ≤ q, the function

$$\displaystyle{ \phi (z) =\sum _{ i=1}^{m}\bigg(\sum _{ j=1}^{n}\vert a_{ ij}\vert ^{2}\vert z_{ j}\vert ^{2/p^{{\ast}} }\bigg)^{q/2} }$$

is a convex function of z (because 2∕p* ≥ 1 and q∕2 ≥ 1) on the simplex \(S =\{ z \in \mathbb{R}^{n}\,\vert \,\sum _{j=1}^{n}z_{j} \leq 1,\,\forall j:\, z_{j} \geq 0\}\). Therefore, it attains its maximum at an extreme point of S; since φ(0) ≤ φ(e_k) for every k, the maximum is attained at one of the canonical unit basis vectors \(e_{1},\ldots,e_{n}\) of \(\mathbb{R}^{n}\). Substituting \(z_{j} = \vert y_{j}\vert ^{p^{{\ast}}}\), we thus obtain

$$\displaystyle{ \sup _{y\in B_{p^{{\ast}}}^{n}}\sum _{i=1}^{m}\bigg(\sum _{ j=1}^{n}\vert a_{ ij}y_{j}\vert ^{2}\bigg)^{q/2} =\sup _{ z\in S}\phi (z) =\sup _{k\leq n}\phi (e_{k}) =\max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{ q}^{q}, }$$

which completes the proof. □ 
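The convexity argument can also be checked numerically. In the sketch below (ours, with arbitrary dimensions and exponents), random points on the sphere of \(B_{p^{{\ast}}}^{n}\) never exceed the value of the function \(\psi(y)=\sum_{i}\big(\sum_{j}\vert a_{ij}y_{j}\vert^{2}\big)^{q/2}\) (our name for the quantity inside the supremum above) at the best canonical basis vector.

```python
# Numerical sketch (ours) of the extreme-point argument in Lemma 3.1: the
# supremum of psi over B_{p*}^n is attained at a canonical basis vector, so
# random points of the unit sphere of l_{p*}^n never beat the best coordinate.
import numpy as np

rng = np.random.default_rng(4)
m, n, p, q = 30, 20, 4.0, 3.0
p_star = p / (p - 1.0)
A = rng.uniform(0.0, 1.0, size=(m, n))

def psi(y):
    # sum_i ( sum_j |a_ij y_j|^2 )^(q/2)
    return np.sum(np.sum((A * y) ** 2, axis=1) ** (q / 2))

best_basis = max(psi(e) for e in np.eye(n))       # value at canonical basis vectors
best_random = 0.0
for _ in range(5000):
    y = rng.standard_normal(n)
    y /= np.linalg.norm(y, ord=p_star)            # normalize in the l_{p*} norm
    best_random = max(best_random, psi(y))

print(best_basis, best_random)   # best_basis >= best_random
```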

Now we estimate the expectation appearing in Theorem 2.1. The proof is based on Gaussian concentration (Theorem 2.3) and is similar to the proofs of Theorem 2.1 and Remark 2.2 in [24].

Lemma 3.2

Let \(m,n \in \mathbb{N}\), 1 < p* ≤ 2 ≤ q, and for i ≤ m let \(X_{i} = (a_{ij}g_{ij})_{j=1}^{n}\). Then

$$\displaystyle\begin{array}{rcl} \Big(\mathbb{E}\max _{i\leq m}\|X_{i}\|_{p}^{q}\Big)^{1/q}& \leq & \max _{ i\leq m}\mathbb{E}\|X_{i}\|_{p} + C\,\gamma _{q}\,\mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert {}\\ &\leq & \gamma _{p}\,\max _{i\leq m}\|(a_{ij})_{j=1}^{n}\|_{ p} + C\,\gamma _{q}\,\mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert, {}\\ \end{array}$$

where C is a positive absolute constant.

Proof

We have

$$\displaystyle\begin{array}{rcl} \Big(\mathbb{E}\max _{i\leq m}\|X_{i}\|_{p}^{q}\Big)^{1/q}& \leq & \big\|\max _{ i\leq m}\big\vert \|X_{i}\|_{p} - \mathbb{E}\|X_{i}\|_{p}\big\vert +\max _{i\leq m}\mathbb{E}\|X_{i}\|_{p}\big\|_{L_{q}} {}\\ & \leq & \Big(\mathbb{E}\max _{i\leq m}\big\vert \|X_{i}\|_{p} - \mathbb{E}\|X_{i}\|_{p}\big\vert ^{q}\Big)^{1/q} +\max _{ i\leq m}\mathbb{E}\|X_{i}\|_{p}. {}\\ \end{array}$$

For all i ≤ m and t > 0 by (3) we have

$$\displaystyle{ \mathbb{P}\Big(\big\vert \|X_{i}\|_{p} - \mathbb{E}\|X_{i}\|_{p}\big\vert> t\Big) \leq 2\,\exp \bigg(- \frac{t^{2}} {2\max _{j\leq n}\vert a_{ij}\vert ^{2}}\bigg). }$$
(5)

By permuting the rows of \((a_{ij})_{i,j=1}^{m,n}\), we can assume that

$$\displaystyle{ \max _{j\leq n}\vert a_{1j}\vert \geq \ldots \geq \max _{j\leq n}\vert a_{mj}\vert. }$$

For each i ≤ m, choose j(i) ≤ n such that \(\vert a_{ij(i)}\vert =\max _{j\leq n}\vert a_{ij}\vert\). Clearly,

$$\displaystyle{\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert \geq \max _{i\leq m}\vert a_{ij(i)}\vert \cdot \vert g_{ij(i)}\vert }$$

and hence, by the independence of the \(g_{ij}\)’s and Lemma 2.2,

$$\displaystyle\begin{array}{rcl} b:= \mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert \geq \mathbb{E}\max _{i\leq m}\vert a_{ij(i)}\vert \cdot \vert g_{i}\vert \geq c\max _{i\leq m}\sqrt{\log (i + 3)} \cdot \vert a_{ij(i)}\vert,& & {}\\ \end{array}$$

where the latter inequality follows since \(\vert a_{1j(1)}\vert \geq \ldots \geq \vert a_{mj(m)}\vert\). Thus, for i ≤ m,

$$\displaystyle\begin{array}{rcl} \max _{j\leq n}\vert a_{ij}\vert ^{2} = a_{ ij(i)}^{2} \leq \frac{b^{2}} {c\log (i + 3)}.& & {}\\ \end{array}$$

By (5) we observe that, for every t > 0,

$$\displaystyle\begin{array}{rcl} \mathbb{P}\Big(\max _{i\leq m}\big\vert \|X_{i}\|_{p} - \mathbb{E}\|X_{i}\|_{p}\big\vert> t\Big)& \leq & 2\,\sum _{i=1}^{m}\exp \bigg(-\frac{ct^{2}\log (i + 3)} {2b^{2}} \bigg) {}\\ & =& 2\,\sum _{i=1}^{m}\bigg( \frac{1} {i + 3}\bigg)^{ct^{2}/2b^{2} } \leq 2\,\int _{3}^{\infty }x^{-ct^{2}/2b^{2} }\,dx {}\\ & \leq & 6 \cdot 3^{-ct^{2}/2b^{2} }, {}\\ \end{array}$$

whenever \(ct^{2}/b^{2} \geq 4\). Integrating the tail inequality proves that

$$\displaystyle{\bigg(\mathbb{E}\max _{i\leq m}\Big\vert \|X_{i}\|_{p} - \mathbb{E}\|X_{i}\|_{p}\Big\vert ^{q}\bigg)^{1/q} \leq C_{ 1}\sqrt{q}\,b \leq C_{2}\,\gamma _{q}\,\,\mathbb{E}\max _{{ i\leq m \atop j\leq n} }\vert a_{ij}g_{ij}\vert.}$$

By the triangle inequality, we obtain the first desired inequality; the second one follows by (4). □

We are now ready to present the proof of the main theorem.

Proof of Theorem 1.1

First observe that

$$\displaystyle{\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\| \leq \Big (\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|^{q}\Big)^{1/q} =\bigg (\mathbb{E}\sup _{ y\in B_{p^{{\ast}}}^{n}}\sum _{i=1}^{m}\big\vert \langle X_{ i},y\rangle \big\vert ^{q}\bigg)^{1/q}.}$$

We have

$$\displaystyle\begin{array}{rcl} \mathbb{E}\sup _{y\in B_{p^{{\ast}}}^{n}}\sum _{i=1}^{m}\big\vert \langle X_{ i},y\rangle \big\vert ^{q}& \leq & \mathbb{E}\sup _{ y\in B_{p^{{\ast}}}^{n}}\left [\sum _{i=1}^{m}\big\vert \langle X_{ i},y\rangle \big\vert ^{q} - \mathbb{E}\big\vert \langle X_{ i},y\rangle \big\vert ^{q}\right ]+\sup _{ y\in B_{p^{{\ast}}}^{n}}\sum _{i=1}^{m}\mathbb{E}\big\vert \langle X_{ i},y\rangle \big\vert ^{q} {}\\ & =& m \cdot \mathbb{E}\sup _{y\in B_{p^{{\ast}}}^{n}}\left [ \frac{1} {m}\sum _{i=1}^{m}\big\vert \langle X_{ i},y\rangle \big\vert ^{q} - \mathbb{E}\big\vert \langle X_{ i},y\rangle \big\vert ^{q}\right ]+m \cdot \sigma ^{q}. {}\\ \end{array}$$

Hence, Theorem 2.1 applied with \(E =\ell_{ p^{{\ast}}}^{n}\) implies

$$\displaystyle\begin{array}{rcl} \mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|^{q} \leq m \cdot \big [B^{2} + B\sigma ^{q/2}\big] + m \cdot \sigma ^{q} \leq 2m\,\big(B^{2} +\sigma ^{q}\big),& & {}\\ \end{array}$$

where B and σ are defined in that theorem. Therefore,

$$\displaystyle\begin{array}{rcl} \Big(\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|^{q}\Big)^{1/q} \leq 2^{1/q}m^{1/q}\,\left (B^{2/q}+\sigma \right ).& & {}\\ \end{array}$$

Now, recall that \(T_{2}(\ell_{p}^{n}) \approx \sqrt{p}\) and that \(\ell_{p^{{\ast}}}^{n}\) has modulus of convexity of power type 2 with \(\lambda ^{-2} \approx 1/p\) (see, e.g., [19, Theorem 5.3]). Therefore,

$$\displaystyle\begin{array}{rcl} B^{2/q}& =& C^{2/q}\lambda ^{8/q}\,T_{ 2}^{2/q}(\ell_{ p}^{n})\left (\frac{\log m} {m}\right )^{1/q}\Big(\mathbb{E}\max _{ i\leq m}\|X_{i}\|_{p}^{q}\Big)^{1/q} {}\\ & =& C^{2/q}p^{5/q}(\log m)^{1/q}m^{-1/q}\Big(\mathbb{E}\max _{ i\leq m}\|X_{i}\|_{p}^{q}\Big)^{1/q}. {}\\ \end{array}$$

Applying Lemma 3.1, we obtain

$$\displaystyle\begin{array}{rcl} & & \Big(\mathbb{E}\,\big\|G:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|^{q}\Big)^{1/q} {}\\ & & \quad \leq (2C^{2})^{1/q} \cdot p^{5/q} \cdot (\log m)^{1/q}\Big(\mathbb{E}\max _{ i\leq m}\|X_{i}\|_{p}^{q}\Big)^{1/q} {}\\ & & \qquad + 2^{1/q}\gamma _{ q} \cdot \max _{j\leq n}\|(a_{ij})_{i=1}^{m}\|_{ q}. {}\\ \end{array}$$

The desired bound follows now from Lemma 3.2. □ 

Remark 3.3

This proof can be extended to the case of random matrices whose rows are centered independent vectors with multivariate Gaussian distributions. We leave the details to the interested reader.

4 Concluding Remarks

In this section, we briefly outline what can be obtained using the approach of [24]. We use a standard trick to pass to a symmetric matrix. Given the matrix \(G_{A}\), define S by

$$\displaystyle{S = \frac{1} {2}\left (\begin{array}{cc} 0 & G_{A}^{T} \\ G_{A} & 0 \end{array} \right ).}$$

Then, S is a random symmetric matrix and

$$\displaystyle{\sup _{w}\langle Sw,w\rangle =\sup _{u\in B_{p^{{\ast}}}^{n}}\sup _{v\in B_{q^{{\ast}}}^{m}}\langle G_{A}u,v\rangle =\big\| G_{A}:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|,}$$

where the supremum in w is taken over all vectors of the form \((u,v)^{T}\) with \(u \in B_{p^{{\ast}}}^{n}\) and \(v \in B_{q^{{\ast}}}^{m}\). Repeating verbatim the proof of Theorem 4.1 in [24] one gets

$$\displaystyle\begin{array}{rcl} \mathbb{E}\,\big\|G_{A}:\ell_{ p^{{\ast}}}^{n} \rightarrow \ell_{ q}^{m}\big\|\quad & \lesssim _{ p,q}& \mathbb{E}\max _{i\leq m}\bigg(\sum _{j=1}^{n}\vert g_{ j}\vert ^{p}\vert a_{ ij}\vert ^{p}\bigg)^{1/p} {}\\ & & \quad + \mathbb{E}\max _{j\leq n}\bigg(\sum _{i=1}^{m}\vert g_{ i}\vert ^{q}\vert a_{ ij}\vert ^{q}\bigg)^{1/q} + \mathbb{E}\max _{ i}Y _{i}, {}\\ \end{array}$$

where Y is a centered Gaussian random vector whose covariance matrix is positive definite with diagonal elements bounded by

$$\displaystyle{ \max \Bigg(\max _{i\leq m}\sqrt{\sum _{j } a_{ij }^{4}}\,,\,\max _{j\leq n}\sqrt{\sum _{i } a_{ij }^{4}}\,\Bigg). }$$

However, the bound obtained here and the one in Theorem 1.1 are incomparable; depending on the situation, one may be better than the other.
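For completeness, a short numerical sketch (ours) of the dilation identity used above, namely that \(\langle Sw,w\rangle = \langle G_{A}u,v\rangle\) for w = (u, v); the dimensions are arbitrary.

```python
# Sketch (ours) of the symmetrization trick: for w = (u, v), the quadratic form
# <Sw, w> of the dilation S = (1/2) [[0, G^T], [G, 0]] equals <G u, v>, so its
# supremum over the appropriate w is the operator norm of G from l_{p*}^n to l_q^m.
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 5
G = rng.standard_normal((m, n))
S = 0.5 * np.block([[np.zeros((n, n)), G.T], [G, np.zeros((m, m))]])

u = rng.standard_normal(n)
v = rng.standard_normal(m)
w = np.concatenate([u, v])

print(w @ S @ w, G @ u @ v)   # the two numbers coincide
```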