Abstract
In this paper, we derive an extension of the Marc̆enko–Pastur theorem to a large class of weak dependent sequences of real-valued random variables having only moment of order 2. Under a mild dependence condition that is easily verifiable in many situations, we derive that the limiting spectral distribution of the associated sample covariance matrix is characterized by an explicit equation for its Stieltjes transform, depending on the spectral density of the underlying process. Applications to linear processes, functions of linear processes, and ARCH models are given.
1 Introduction
A typical object of interest in many fields is the sample covariance matrix \(\mathbf{B}_n=n^{-1}\sum _{j=1}^n \mathbf{X}^T_j\mathbf{X}_j\), where \((\mathbf{X}_j)\), \(j=1,\ldots ,n\), is a sequence of \(N=N(n)\)-dimensional real-valued row random vectors. The interest in studying the spectral properties of such matrices has emerged from multivariate statistical inference, since many test statistics can be expressed in terms of functionals of their eigenvalues. The study of the empirical distribution function (e.d.f.) \(F^{\mathbf{B}_n}\) of the eigenvalues of \({\mathbf{B}_n}\) goes back to Wishart in the 1920s, and the spectral analysis of large-dimensional sample covariance matrices has been actively developed since the remarkable work of Marc̆enko and Pastur [10], stating that if \(\lim _{n\rightarrow \infty }N/n=c\in (0,\infty )\) and all the coordinates of all the vectors \(\mathbf{X}_j\) are i.i.d. (independent and identically distributed), centered, and in \(\mathbb{L }^2\), then, with probability one, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random distribution (the original Marc̆enko–Pastur theorem is stated for random variables having a moment of order four; for the proof under a moment of order two only, we refer to Yin [24]).
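For intuition, the Marc̆enko–Pastur convergence is easy to observe numerically. The following Python sketch (ours, for illustration only; all sizes are arbitrary choices) builds \(\mathbf{B}_n\) from i.i.d. standard normal coordinates and checks the first two moments of its e.d.f. against those of the Marc̆enko–Pastur law with ratio \(c\), namely \(1\) and \(1+c\) for unit-variance entries; the support of that law is \([(1-\sqrt{c})^2,(1+\sqrt{c})^2]\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 2000, 1000                    # aspect ratio c = N/n = 0.5
X = rng.standard_normal((n, N))      # rows X_j with i.i.d. centered unit-variance entries
B = X.T @ X / n                      # B_n = n^{-1} sum_j X_j^T X_j
lam = np.linalg.eigvalsh(B)          # eigenvalues of the sample covariance matrix
c = N / n

# Moments of the Marchenko-Pastur law with unit variance: 1 and 1 + c;
# its support edges are (1 - sqrt(c))^2 and (1 + sqrt(c))^2.
m1, m2 = lam.mean(), (lam ** 2).mean()
edges = ((1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2)
```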
Since Marc̆enko and Pastur’s pioneering paper, there has been a large amount of work aiming at relaxing the independence structure between the coordinates of the \(\mathbf{X}_j\)’s. Yin [24] and Silverstein [17] considered a linear transformation of independent random variables, which leads to the study of the empirical spectral distribution of random matrices of the form \(\mathbf{B}_n=n^{-1}\sum _{j=1}^n\Gamma _N^{1/2}\mathbf{Y}^T_j \mathbf{Y}_j\Gamma _N^{1/2}\), where \(\Gamma _N\) is an \(N \times N\) nonnegative definite Hermitian random matrix, independent of the \(\mathbf{Y}_j\)’s, which are i.i.d. and such that all their coordinates are i.i.d. In the latter paper, it is shown that if \(\lim _{n\rightarrow \infty } N/n=c\in (0,\infty )\) and \(F^{{\Gamma }_N}\) converges almost surely in distribution to a non-random probability distribution function (p.d.f.) \(H\) on \([0,\infty )\), then, almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a (non-random) p.d.f. \(F\) that is characterized in terms of its Stieltjes transform, which satisfies a certain equation. Some further investigations on the model mentioned above can be found in Silverstein and Bai [18] and Pan [13].
A natural question is then to wonder whether other possible correlation patterns of the coordinates can be considered, in such a way that, almost surely (or in probability), \(F^{\mathbf{B}_n}\) still converges in distribution to a non-random p.d.f. The recent work by Bai and Zhou [2] is in this direction. Assuming that the \(\mathbf{X}_j\)’s are i.i.d. and allowing a very general dependence structure among their coordinates, they derive the limiting spectral distribution (LSD) of \(\mathbf{B}_n\). Their result has various applications. In particular, in the case where the \(\mathbf{X}_j\)’s are independent copies of \(\mathbf{X}= (X_1, \ldots ,X_N)\) where \((X_k)_{k \in \mathbb{Z }}\) is a stationary linear process with centered i.i.d. innovations, applying their Theorem 1.1, they prove that, almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random p.d.f. \(F\), provided that \(\lim _{n \rightarrow \infty } N/n = c \in (0,\infty )\), the coefficients of the linear process are absolutely summable, and the innovations have a moment of order four (see their Theorem 2.5). For this linear model, let us mention that in a recent paper, Yao [23] shows that the Stieltjes transform of the limiting p.d.f. \(F\) satisfies an explicit equation that depends on \(c\) and on the spectral density of the underlying linear process. Still in the context of the linear model described above, but relaxing the equidistribution assumption on the innovations, and using a different approach than the one considered in the papers by Bai and Zhou [2] and by Yao [23], Pfaffel and Schlemm [15] also derive the LSD of \(\mathbf{B}_n\), still assuming moments of order four for the innovations plus a polynomial decay of the coefficients of the underlying linear process.
In this work, we extend such Marc̆enko–Pastur-type theorems along another direction. We shall assume that the \(\mathbf{X}_j\)’s are independent copies of \(\mathbf{X}= (X_1,\ldots ,X_N)\) where \((X_k)_{k \in \mathbb{Z }}\) is a stationary process of the form \(X_k=g(\ldots , \varepsilon _{k-1},\varepsilon _k )\), the \(\varepsilon _k\)’s are i.i.d. real-valued random variables, and \(g:\mathbb{R }^{\mathbb{Z }} \rightarrow \mathbb{R }\) is a measurable function such that \(X_k\) is a proper centered random variable. Assuming that \(X_0\) has a moment of order two only, and imposing a dependence condition expressed in terms of conditional expectations, we prove that if \(\lim _{n \rightarrow \infty } N/n = c \in (0,\infty )\), then almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random p.d.f. \(F\) whose Stieltjes transform satisfies an explicit equation that depends on \(c\) and on the spectral density of the underlying stationary process \((X_k)_{k \in \mathbb{Z }}\) (see our Theorem 2.1). The imposed dependence condition is directly related to the physical mechanisms of the underlying process and is easily verifiable in many situations. For instance, when \((X_k)_{k \in \mathbb{Z }}\) is a linear process with i.i.d. innovations, our dependence condition is satisfied, and then our Theorem 2.1 applies, as soon as the coefficients of the linear process are absolutely summable and the innovations have a moment of order two only, which improves Theorem 2.5 in Bai and Zhou [2] and Theorem 1.1 in Yao [23]. Other models, such as functions of linear processes and ARCH models, for which our Theorem 2.1 applies, are given in Sect. 3.
Let us now give an outline of the method used to prove our Theorem 2.1. Since the \(\mathbf{X}_j\)’s are independent, the result will follow if we can prove that the expectation of the Stieltjes transform of \(F^{\mathbf{B}_n}\), say \(S_{F^{\mathbf{B}_n}}(z)\), converges to the Stieltjes transform of \(F\), say \(S(z)\), for any complex number \(z\) with positive imaginary part. With this aim, we shall consider a sample covariance matrix \(\mathbf{G}_n=n^{-1} \sum _{j=1}^n \mathbf{Z}^T_j\mathbf{Z}_j\) where the \( \mathbf{Z}_j\)’s are independent copies of \(\mathbf{Z}= (Z_1, \ldots , Z_N)\), where \((Z_k)_{k \in \mathbb{Z }}\) is a sequence of Gaussian random variables having the same covariance structure as the underlying process \((X_k)_{k \in \mathbb{Z }}\). The \( \mathbf{Z}_j\)’s will be assumed to be independent of the \( \mathbf{X}_j\)’s. Using the Gaussian structure of \(\mathbf{G}_n\), the convergence of \(\mathbb{E }\big (S_{F^{\mathbf{G}_n}}(z) \big ) \) to \(S(z)\) will follow by Theorem 1.1 in Silverstein [17]. The main step of the proof is then to show that the difference between the expectations of the Stieltjes transform of \(F^{\mathbf{B}_n}\) and that of \(F^{\mathbf{G}_n}\) converges to zero. This will be achieved by first approximating \((X_k)_{k \in \mathbb{Z }}\) by an \(m\)-dependent sequence of random variables that are bounded. This leads to a new sample covariance matrix \({\bar{\mathbf{B}}_n}\). We then handle the difference between \(\mathbb{E }\big (S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big )\) and \(\mathbb{E }\big (S_{F^{\mathbf{G}_n}}(z)\big )\) with the help of the so-called Lindeberg method used in the multidimensional case. The Lindeberg method is known to be an efficient tool to derive limit theorems and, to our knowledge, it was used for the first time in the context of random matrices by Chatterjee [4]. With the help of this method, he proved the LSD of Wigner matrices associated with exchangeable random variables.
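The one-at-a-time replacement at the heart of the Lindeberg method can be illustrated in the simplest scalar setting. The sketch below (ours; a toy illustration, not the matrix version used in Sect. 4.3, and all names are hypothetical) telescopes the difference \(\mathbb{E }f(\sum _i X_i) - \mathbb{E }f(\sum _i Z_i)\) into \(n\) single-swap differences, each of which is small because \(X_i\) and \(Z_i\) share their first two moments.

```python
import numpy as np

def lindeberg_gap(f, X, Z):
    """Telescoping sum of single-swap differences: coordinate i of the sum
    is replaced by its Gaussian counterpart, one index at a time.  The total
    equals f(Z.sum()) - f(X.sum()) exactly; each summand can then be bounded
    by a Taylor expansion of f, since X_i and Z_i share two moments."""
    n = len(X)
    total = 0.0
    for i in range(n):
        before = Z[:i].sum() + X[i:].sum()         # swaps 0..i-1 already done
        after = Z[:i + 1].sum() + X[i + 1:].sum()  # swap i done as well
        total += f(after) - f(before)
    return total

rng = np.random.default_rng(2)
n = 1000
X = (rng.integers(0, 2, n) * 2 - 1) / np.sqrt(n)   # scaled Rademacher, variance 1/n
Z = rng.standard_normal(n) / np.sqrt(n)            # Gaussian with matching mean/variance
gap = lindeberg_gap(np.tanh, X, Z)
```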
The paper is organized as follows: in Sect. 2, we specify the model and state the LSD result for the sample covariance matrix associated with the underlying process. Applications to linear processes, functions of linear processes, and ARCH models are given in Sect. 3. Section 4 is devoted to the proof of the main result, whereas some technical tools are stated and proved in “Appendix”.
Here is some notation used throughout the paper. For any nonnegative integer \(q\), the notation \(\mathbf{0}_q\) means a zero row vector of size \(q\). For a matrix \(A\), we denote by \(A^T\) its transpose, by \(\mathrm{Tr} (A)\) its trace, by \(\Vert A\Vert \) its spectral norm, and by \(\Vert A \Vert _2\) its Hilbert–Schmidt norm (also called the Frobenius norm). We shall also use the notation \(\Vert X\Vert _r\) for the \(\mathbb{L }^r\)-norm (\(r \ge 1\)) of a real-valued random variable \(X\). For any square matrix \(A\) of order \(N\) with only real eigenvalues, the empirical spectral distribution of \(A\) is defined as
\[F^{A}(x)=\frac{1}{N}\sum _{k=1}^{N}\mathbf{1}_{\{\lambda _k\le x\}},\]
where \(\lambda _1,\ldots ,\lambda _N\) are the eigenvalues of \(A\). The Stieltjes transform of \(F^{A}\) is given by
\[S_{F^{A}}(z)=\int \frac{1}{x-z}\, \hbox {d}F^{A}(x)=\frac{1}{N}\mathrm{Tr}\big ( A -z\mathbf {I}\big )^{-1},\]
where \(z=u+iv\in \mathbb{C }^+\) (the set of complex numbers with positive imaginary part), and \(\mathbf {I}\) is the identity matrix.
Finally, the notation \([x]\) is used to denote the integer part of any real \(x\) and, for two reals \(a\) and \(b\), the notation \(a \wedge b\) means \(\min (a,b)\), whereas the notation \(a \vee b\) means \(\max (a,b)\).
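In code, the two notions just introduced read as follows. This Python sketch (ours, for illustration) evaluates \(F^{A}\) and \(S_{F^{A}}(z)=\frac{1}{N}\mathrm{Tr}(A-z\mathbf {I})^{-1}\) for a small symmetric matrix and checks the equivalent eigenvalue form \(\frac{1}{N}\sum _k (\lambda _k - z)^{-1}\).

```python
import numpy as np

def esd(A, x):
    """Empirical spectral distribution F^A(x) = (1/N) #{k : lambda_k <= x}."""
    return float(np.mean(np.linalg.eigvalsh(A) <= x))

def stieltjes(A, z):
    """Stieltjes transform S_{F^A}(z) = (1/N) Tr (A - zI)^{-1}, z in C+."""
    N = A.shape[0]
    return np.trace(np.linalg.inv(A - z * np.eye(N))) / N

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2.0                      # symmetric, hence real eigenvalues
z = 1.0 + 1.0j                           # a point in C+
lam = np.linalg.eigvalsh(A)
direct = np.mean(1.0 / (lam - z))        # eigenvalue form of the same transform
s = stieltjes(A, z)
```

Note that \(S_{F^{A}}(z)\) maps \(\mathbb{C }^+\) to \(\mathbb{C }^+\): each term \(1/(\lambda _k - z)\) has positive imaginary part.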
2 Main Result
We consider a stationary causal process \((X_k)_{ k\in \mathbb Z }\) defined as follows: let \((\varepsilon _k)_{k\in \mathbb Z }\) be a sequence of i.i.d. real-valued random variables and let \(g:\mathbb{R }^{\mathbb{Z }} \rightarrow \mathbb{R }\) be a measurable function such that, for any \(k \in \mathbb{Z }\),
\[X_k=g(\xi _k) \quad \text{with} \quad \xi _k := ( \ldots , \varepsilon _{k-1},\varepsilon _k ) \tag{2.1}\]
is a proper random variable, \(\mathbb{E }(g(\xi _k))=0\) and \(\Vert g(\xi _k ) \Vert _2 < \infty \).
The framework (2.1) is very general and it includes many widely used linear and nonlinear processes. We refer to the papers by Wu [21, 22] for many examples of stationary processes that are of form (2.1). Following Priestley [16] and Wu [21], \((X_k)_{ k\in \mathbb Z }\) can be viewed as a physical system with \(\xi _k\) (respectively \(X_k\)) being the input (respectively the output) and \(g\) being the transform or data-generating mechanism.
For \(n\) a positive integer, we consider \(n\) independent copies of the sequence \( (\varepsilon _k)_{ k\in \mathbb Z }\) that we denote by \(( \varepsilon ^{(i)}_k)_{ k\in \mathbb Z }\) for \(i = 1, \ldots , n\). Setting \(\xi ^{(i)}_k = \big ( \ldots , \varepsilon ^{(i)}_{k-1}, \varepsilon ^{(i)}_k \big )\) and \(X^{(i)}_k=g(\xi ^{(i)}_k )\), it follows that \(( X_{k}^{(1)})_{ k\in \mathbb Z },\ldots ,(X_{k}^{(n)})_{ k\in \mathbb Z }\) are \(n\) independent copies of \(( X_k)_{ k\in \mathbb Z }\). Let now \(N=N(n)\) be a sequence of positive integers, and define for any \(i \in \{1, \ldots , n \}\), \(\mathbf{{X}}_{i}=\big ( X_{1}^{(i)}, \ldots ,X_{N}^{(i)}\big )\). Let \(\mathcal{X }_n=(\mathbf{{X}}^T_{1} \vert \cdots \vert \mathbf{{X}}^T_{n})\) be the matrix whose columns are the \(\mathbf{{X}}^T_{i}\)’s, and define
\[\mathbf{B}_n = \frac{1}{n} \mathcal{X }_n \mathcal{X }_{n}^{T} = \frac{1}{n}\sum _{j=1}^n \mathbf{X}^T_j\mathbf{X}_j . \tag{2.2}\]
In what follows, \({\mathbf{B}_n}\) will be referred to as the sample covariance matrix associated with \( (X_k)_{ k\in \mathbb Z }\). To derive the limiting spectral distribution of \({\mathbf{B}_n}\), we need to impose some dependence structure on \((X_k)_{k \in \mathbb{Z }}\). With this aim, we introduce the projection operator: for any \(k\) and \(j\) belonging to \(\mathbb{Z }\), let
\[P_{j} ( X_k ) := \mathbb{E }( X_k \, \vert \, \xi _j ) - \mathbb{E }( X_k \, \vert \, \xi _{j-1} ) .\]
We state now our main result.
Theorem 2.1
Let \(( X_k)_{ k\in \mathbb Z }\) be defined by (2.1) and \(\mathbf{B}_n\) by (2.2). Assume that
\[\sum _{k \ge 0} \Vert P_0(X_{k}) \Vert _2 < \infty \tag{2.3}\]
and that \(c(n)=N/n \rightarrow c \in (0,\infty )\). Then, with probability one, \(F^{\mathbf{B}_n}\) tends to a non-random probability distribution \(F\), whose Stieltjes transform \(S=S(z)\) (\(z \in \mathbb C ^+\)) satisfies the equation
where \({\underline{S}}(z):=-(1-c)/z +c S(z)\) and \(f(\cdot )\) is the spectral density of \(( X_k)_{ k\in \mathbb Z }\).
Let us mention that, in the literature, condition (2.3) is referred to as the Hannan–Heyde condition and is known to be essentially optimal for the validity of the central limit theorem (with normalization \(\sqrt{n}\)) for the partial sums of an adapted regular stationary process in \(\mathbb{L }^2\). As we shall see in the next section, the quantity \(\Vert P_0(X_{k}) \Vert _2 \) can be computed in many situations, including nonlinear models. We would like to mention that condition (2.3) is weaker than the 2-strong stability condition introduced in [21, Definition 3], which involves a coupling coefficient.
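For a causal linear process \(X_k = \sum _{i \ge 0} a_i \varepsilon _{k-i}\) with i.i.d. centered innovations, one has \(P_0(X_k) = a_k \varepsilon _0\), so \(\Vert P_0(X_k)\Vert _2 = |a_k| \, \Vert \varepsilon _0 \Vert _2\) and condition (2.3) reduces to the summability of the coefficients. The sketch below (ours; the geometric coefficient profile is a hypothetical choice) evaluates the Hannan sum and checks the identity \(\Vert P_0(X_k)\Vert _2 = |a_k|\) by Monte Carlo for standard normal innovations.

```python
import numpy as np

# For X_k = sum_{i>=0} a_i eps_{k-i} (causal, i.i.d. innovations), the
# projection is P_0(X_k) = a_k eps_0, hence condition (2.3) reads
# sum_{k>=0} |a_k| * ||eps_0||_2 < infinity.
rho = 0.7                                   # hypothetical geometric profile
a = rho ** np.arange(200)                   # a_i = rho^i (truncated)
hannan_sum = np.abs(a).sum()                # here ||eps_0||_2 = 1

# Monte Carlo check of ||P_0(X_5)||_2 = |a_5| with standard normal eps_0.
rng = np.random.default_rng(3)
eps0 = rng.standard_normal(100_000)
p0_norm = np.sqrt(np.mean((a[5] * eps0) ** 2))
```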
Remark 2.2
Under condition (2.3), the series \(\sum _{k \ge 0} |\mathrm{Cov} ( X_0 , X_k ) |\) is finite (see for instance inequality (4.61)). Therefore, (2.3) implies that the spectral density \(f(\cdot )\) of \((X_k)_{k \in \mathbb{Z }}\) exists and is continuous and bounded on \([0,2\pi )\). It follows that Proposition 1 in Yao [23], concerning the support of the limiting spectral distribution \(F\), still applies if (2.3) holds. In particular, \(F\) is compactly supported. Notice also that condition (2.3) is essentially optimal for the covariances to be absolutely summable. Indeed, for a causal linear process with nonnegative coefficients, generated by a sequence of i.i.d. real-valued random variables that are centered and in \(\mathbb{L }^2\), both conditions are equivalent to the summability of the coefficients.
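Remark 2.2 can be checked concretely on an AR(1) process, for which both the autocovariances and the spectral density are explicit. The sketch below (ours, for illustration) compares the truncated Fourier series \(f(\lambda ) = \frac{1}{2\pi }\sum _{k \in \mathbb{Z }} \mathrm{Cov}(X_0,X_k)\, \mathrm{e}^{-\mathrm{i}k\lambda }\) with the closed form \(f(\lambda ) = \big (2\pi (1 - 2\rho \cos \lambda + \rho ^2)\big )^{-1}\) and confirms that \(f\) is positive and bounded, as guaranteed by the absolute summability of the covariances.

```python
import numpy as np

def spectral_density(gamma, lam):
    """f(lambda) = (1/2pi) (gamma(0) + 2 sum_{k>=1} gamma(k) cos(k lambda)),
    given the one-sided autocovariances gamma = (gamma(0), gamma(1), ...)."""
    k = np.arange(1, len(gamma))
    series = (gamma[1:] * np.cos(np.outer(lam, k))).sum(axis=1)
    return (gamma[0] + 2.0 * series) / (2.0 * np.pi)

# AR(1) with unit-variance innovations: gamma(k) = rho^k / (1 - rho^2),
# and f(lambda) = 1 / (2 pi (1 - 2 rho cos(lambda) + rho^2)).
rho = 0.5
gamma = rho ** np.arange(200) / (1.0 - rho ** 2)   # truncated covariance sequence
lam = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
f_series = spectral_density(gamma, lam)
f_closed = 1.0 / (2.0 * np.pi * (1.0 - 2.0 * rho * np.cos(lam) + rho ** 2))
```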
Remark 2.3
Let us mention that each of the following conditions is sufficient for the validity of (2.3):
\[\sum _{k\ge 1} \frac{1}{\sqrt{k}}\, \Vert \mathbb{E }(X_{k} \, \vert \, \xi _0 ) \Vert _2 < \infty \quad \text{or} \quad \sum _{k\ge 1} \frac{1}{\sqrt{k}}\, \Vert X_{k} - \mathbb{E }(X_{k} \, \vert \, \mathcal{F }_{1}^k ) \Vert _2 < \infty , \tag{2.5}\]
where \(\mathcal{F }_{1}^n = \sigma ( \varepsilon _k , 1 \le k \le n)\). A condition such as the second part of (2.5) is usually referred to as a near epoch dependence-type condition. The fact that the first part of (2.5) implies (2.3) follows from Corollary 2 in Peligrad and Utev [14]. Corollary 5 of the same paper asserts that the second part of (2.5) implies its first part.
Remark 2.4
Since many processes encountered in practice are causal, Theorem 2.1 is stated for the one-sided process \((X_k)_{k \in \mathbb{Z }}\) having the representation (2.1). With non-essential modifications in the proof, the same result holds when \((X_k)_{k \in \mathbb{Z }}\) is a two-sided process having the representation
where \(( \varepsilon _k)_{ k\in \mathbb Z }\) is a sequence of i.i.d. real-valued random variables. Assuming that \(X_0\) is centered and in \(\mathbb{L }^2\), condition (2.3) has then to be replaced by the following condition: \(\sum _{k\in \mathbb{Z }} \Vert P_0(X_{k}) \Vert _2 < \infty \).
Remark 2.5
One can wonder whether Theorem 2.1 extends to the case of functionals of another strictly stationary sequence, which can be strong mixing or absolutely regular, even if this framework and ours have different ranges of applicability. Actually, many models encountered in econometric theory have the representation (2.1), whereas, for instance, functionals of absolutely regular (\(\beta \)-mixing) sequences occur naturally as orbits of chaotic dynamical systems. In this situation, we do not think that Theorem 2.1 extends in its full generality without requiring an additional near epoch dependence-type condition. It is outside the scope of this paper to study such models, which will be the object of further investigations.
3 Applications
In this section, we give two different classes of models for which the condition (2.3) is satisfied and then for which our Theorem 2.1 applies. Other classes of models, including nonlinear time series such as iterative Lipschitz models or chains with infinite memory, which are of the form (2.1) and for which the quantities \(\Vert P_0(X_k) \Vert _2\) or \(\Vert \mathbb{E }(X_k | \xi _0) \Vert _2\) can be computed, may be found in [22].
3.1 Functions of Linear Processes
In this section, we shall focus on functions of real-valued linear processes. Define
\[X_{k}=h\Big ( \sum _{i\in \mathbb{Z }}a_{i}\varepsilon _{k-i}\Big ) -\mathbb{E }\, h\Big ( \sum _{i\in \mathbb{Z }}a_{i}\varepsilon _{k-i}\Big ) , \tag{3.1}\]
where \((a_{i})_{i\in {\mathbb{Z }}}\) is a sequence of real numbers in \(\ell ^{1}\) and \((\varepsilon _{i})_{i\in \mathbb Z }\) is a sequence of i.i.d. real-valued random variables in \(\mathbb{L }^{1}\). We shall give sufficient conditions, in terms of the regularity of the function \(h\), for condition (2.3) to be satisfied.
Denote by \(w_{h}(\cdot )\) the modulus of continuity of the function \(h\) on \(\mathbb{R }\), that is:
\[w_{h}(t)=\sup _{x \in \mathbb{R }, \, |s| \le t}|h(x+s)-h(x)| \, , \quad t \ge 0 .\]
Corollary 3.1
Assume that
\[\sum _{k \in \mathbb{Z }} \big \Vert w_{h} \big ( | a_k \varepsilon _0 | \big ) \big \Vert _2 < \infty \tag{3.2}\]
or
Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.1).
Example 1
Assume that \(h\) is \(\gamma \)-Hölder with \(\gamma \in (0,1]\), that is: there is a positive constant \(C\) such that \(w_{h}(t) \le C |t|^{\gamma }\) for any \(t \ge 0\). Assume in addition that
\[\sum _{k\in \mathbb{Z }}|a_{k}|^{\gamma }<\infty \quad \text{and} \quad \mathbb{E }\big ( |\varepsilon _0|^{2\gamma } \big ) < \infty \, ;\]
then the condition (3.2) is satisfied and the conclusion of Corollary 3.1 holds. In particular, when \(h\) is the identity, which corresponds to the fact that \(X_k\) is a causal linear process, the conclusion of Corollary 3.1 holds as soon as \(\sum _{k\ge 0}|a_{k}|<\infty \) and \(\varepsilon _0\) belongs to \(\mathbb{L }^2\). This improves Theorem 2.5 in Bai and Zhou [2] and Theorem 1 in Yao [23] that require \(\varepsilon _0\) to be in \(\mathbb{L }^4\).
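The improvement from \(\mathbb{L }^4\) to \(\mathbb{L }^2\) innovations can be probed numerically: Student-\(t\) innovations with \(3.5\) degrees of freedom have a finite variance but an infinite fourth moment. The sketch below (ours; coefficients and sizes are arbitrary choices) builds the sample covariance matrix of such a causal linear process and checks that \(\frac{1}{N}\mathrm{Tr}(\mathbf{B}_n)\) matches \(\mathrm{Var}(X_0) = \Vert \varepsilon _0\Vert _2^2 \sum _i a_i^2\).

```python
import numpy as np

def linear_process_rows(n, N, a, rng, df=3.5):
    """n independent copies of (X_1,...,X_N) with X_k = sum_i a_i eps_{k-i},
    using Student-t innovations: finite variance (df > 2) but infinite
    fourth moment (df < 4), so only the L^2 assumption of the text holds."""
    q = len(a)
    eps = rng.standard_t(df, size=(n, N + q - 1))
    rows = np.empty((n, N))
    rev = a[::-1]
    for k in range(N):
        rows[:, k] = eps[:, k:k + q] @ rev   # sliding-window convolution
    return rows

rng = np.random.default_rng(4)
rho, q = 0.6, 30
a = rho ** np.arange(q)                      # absolutely summable coefficients
n, N = 800, 400
X = linear_process_rows(n, N, a, rng)
lam = np.linalg.eigvalsh(X.T @ X / n)

var_eps = 3.5 / (3.5 - 2.0)                  # variance of t_{3.5}
var_x = var_eps * np.sum(a ** 2)             # Var(X_0); (1/N) Tr B_n estimates it
```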
Example 2
Assume \(\Vert \varepsilon _0 \Vert _{\infty } \le M\) where \(M\) is a finite positive constant, and that \(|a_k| \le C \rho ^k\) where \(\rho \in (0,1)\) and \(C\) is a finite positive constant. Then the condition (3.3) is satisfied and the conclusion of Corollary 3.1 holds as soon as \( \sum _{k \ge 1} k^{-1/2} w_h \big ( \rho ^k M C (1- \rho )^{-1}\big ) < \infty \). Using the usual comparison between series and integrals, it follows that the latter condition is equivalent to
\[\int _0^{1/2} \frac{w_{h}(t)}{t \sqrt{|\log t |}} \, \hbox {d}t < \infty . \tag{3.4}\]
For instance, if \(w_{h}(t) \le C | \log t |^{-\alpha }\) for \(t\) near zero, with \( \alpha >1/2\), then the above condition is satisfied.
Let us now consider the special case of functionals of Bernoulli shifts (also called Raikov or Riesz–Raikov sums). Let \((\varepsilon _k)_{k\in \mathbb{Z }}\) be a sequence of i.i.d. random variables such that \(\mathbb{P }(\varepsilon _0=1)=\mathbb{P }(\varepsilon _0=0)=1/2\) and let, for any \(k \in \mathbb{Z }\),
\[Y_k = \sum _{i \ge 0} 2^{-i-1} \varepsilon _{k-i} \quad \text{and} \quad X_k = h(Y_k) - \int _0^1 h(t) \, \hbox {d}t , \tag{3.5}\]
where \(h \in \mathbb{L }^2 ([0,1])\), \([0,1]\) being equipped with the Lebesgue measure. Recall that \(Y_n\), \(n \ge 0\), is an ergodic stationary Markov chain taking values in \([0,1]\), whose stationary initial distribution is the restriction of Lebesgue measure to \([0,1]\). As we have seen previously, if \(h\) has a modulus of continuity satisfying (3.4), then the conclusion of Theorem 2.1 holds for the sample covariance matrix associated with such a functional of Bernoulli shifts. Since for Bernoulli shifts, the computations can be done explicitly, we can even derive an alternative condition to (3.4), still in terms of regularity of \(h\), in such a way that (2.3) holds.
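The chain \((Y_k)\) can be simulated through the standard one-step recursion \(Y_k = (Y_{k-1} + \varepsilon _k)/2\), which is consistent with the Markov-chain description above (started from a uniform \(Y_0\), the chain is stationary with Lebesgue marginal). The sketch below (ours, for illustration) checks the uniform moments \(\mathbb{E }\,Y = 1/2\) and \(\mathbb{E }\,Y^2 = 1/3\) by the ergodic theorem, and forms the centered functional with \(h(y)=y^2\).

```python
import numpy as np

def bernoulli_shift(T, rng):
    """Y_k = (Y_{k-1} + eps_k)/2 with i.i.d. eps_k uniform on {0,1};
    equivalently Y_k = sum_{i>=0} 2^{-i-1} eps_{k-i}.  Started from a
    uniform Y_0, the chain is stationary with Lebesgue marginal on [0,1]."""
    eps = rng.integers(0, 2, size=T)
    y = rng.random()                 # stationary initial distribution
    out = np.empty(T)
    for t in range(T):
        y = (y + eps[t]) / 2.0
        out[t] = y
    return out

rng = np.random.default_rng(5)
Y = bernoulli_shift(200_000, rng)
X = Y ** 2 - 1.0 / 3.0               # centered functional: h(y) = y^2, E h(Y) = 1/3
```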
Corollary 3.2
Assume that
\[\int _0^1 \!\! \int _0^1 \frac{|h(x)-h(y)|^2}{|x-y|}\, \Big ( \log \log \frac{1}{|x-y|}\Big )^{t} \, \hbox {d}x \, \hbox {d}y< \infty \tag{3.6}\]
for some \(t>1\). Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.5).
As a concrete example of a map satisfying (3.6), we can consider the function
(see the computations on pages 23–24 of Merlevède et al. [11], showing that the above function satisfies (3.6)).
Proof of Corollary 3.1
To prove the corollary, it suffices to show that the condition (2.3) is satisfied as soon as (3.2) or (3.3) holds. Let \((\varepsilon _k^*)_{k \in \mathbb{Z }}\) be an independent copy of \((\varepsilon _k)_{k \in \mathbb{Z }}\). Denoting by \(\mathbb{E }_{\varepsilon }(\cdot )\) the conditional expectation with respect to \(\varepsilon =(\varepsilon _k)_{k \in \mathbb{Z }}\), we have that, for any \(k \ge 0\),
Next, by the subadditivity of \(w_{h}(\cdot )\), \(w_{h} ( | a_k (\varepsilon _0-\varepsilon _0^*) | ) \le w_{h} (| a_k \varepsilon _0 | )+ w_{h} ( | a_k \varepsilon _0^* | )\). Whence, \( \Vert P_0 ( X_{k} )\Vert _2 \le 2 \Vert w_{h} ( | a_k \varepsilon _0 | )\Vert _2 \). This proves that the condition (2.3) is satisfied under (3.2).
We prove now that if (3.3) holds then so does condition (2.3). According to Remark 2.3, it suffices to prove that the first part of (2.5) is satisfied. With the same notation as before, we have that, for any \(\ell \ge 0\),
Hence, for any nonnegative integer \(\ell \),
where we have used the subadditivity of \(w_{h}(\cdot )\) for the last inequality. This latter inequality entails that the first part of (2.5) holds as soon as (3.3) does. \(\square \)
Proof of Corollary 3.2
By Remark 2.3, it suffices to prove that the second part of (2.5) is satisfied as soon as (3.6) is. Actually, we shall prove that (3.6) implies that
\[\sum _{n \ge 2} \, (\log n)^{t} \, \Vert X_{n} - \mathbb{E } ( X_n \, \vert \, \mathcal{F }_{1}^n)\Vert ^2_2 < \infty , \tag{3.7}\]
which clearly entails the second part of (2.5) since \(t >1\). An upper bound for the quantity \(\Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert ^2_2\) has been obtained in [8, Chapter 19.3]. Setting \(A_{jn}=[j2^{-n}, (j+1)2^{-n})\) for \(j=0,1,\ldots ,2^{n}-1\), they obtained (see pages 372–373 of their monograph) that
Since
it follows that
This latter inequality, together with the fact that for any \(u \in (0,1)\), \(\sum _{n : 2^{-n} \ge u } ( \log n)^t \le C u^{-1} ( \log ( \log u^{-1}) )^t \) for some positive constant \(C\), proves that (3.7) holds under (3.6). \(\square \)
3.2 ARCH Models
Let \((\varepsilon _{k})_{k\in \mathbb Z }\) be an i.i.d. sequence of zero-mean real-valued random variables such that \(\Vert \varepsilon _{0}\Vert _{2 }= 1\). We consider the following ARCH(\(\infty \)) model described by Giraitis et al. [5]:
\[Y_k = \sigma _k \varepsilon _k \quad \text{where} \quad \sigma _k^2 = a + \sum _{j \ge 1} a_j Y_{k-j}^2 , \tag{3.8}\]
where \(a\ge 0\), \(a_{j}\ge 0\), and \(\sum _{j\ge 1}a_{j}<1\). Such models are encountered when the volatility \((\sigma _{k}^{2})_{k\in \mathbb{Z }}\) is unobserved. In that case, the process of interest is \((Y_{k}^{2})_{k\in \mathbb{Z }}\) and, in what follows, we consider the process \((X_{k})_{k\in \mathbb Z }\) defined, for any \(k \in \mathbb{Z }\), by:
\[X_k = Y^2_k - \mathbb{E }\big ( Y^2_k \big ) . \tag{3.9}\]
Notice that, under the above conditions, there exists a unique stationary solution of Eq. (3.8) satisfying (see [5]):
Corollary 3.3
Assume that \(\varepsilon _0\) belongs to \(\mathbb{L }^4\) and that
Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.9).
Proof of Corollary 3.3
By Remark 2.3, it suffices to prove that the first part of (2.5) is satisfied as soon as (3.11) is. With this aim, let us notice that, for any integer \(n \ge 1\),
where \(\kappa = \Vert \varepsilon _0\Vert _4^2 \sum _{j \ge 1} a_j\). So, under (3.11), there exists a positive constant \(C\) not depending on \(n\) such that \( \Vert \mathbb{E }( X_n | \xi _0 ) \Vert _2 \le C n^{-b} \). This upper bound implies that the first part of (2.5) is satisfied as soon as \(b>1/2\). \(\square \)
Remark 3.4
Notice that if we consider the sample covariance matrix associated with \((Y_k)_{k \in \mathbb{Z }}\) defined in (3.8), then its LSD follows directly by Theorem 2.1 since \(P_0 (Y_k) = 0\), for any positive integer \(k\).
4 Proof of Theorem 2.1
To prove the theorem, it suffices to show that for any \(z \in \mathbb{C }^+\),
\[S_{F^{\mathbf{B}_n}}(z) \rightarrow S(z) \quad \text{almost surely.} \tag{4.1}\]
Since the columns of \(\mathcal X _n\) are independent, by Step 1 of the proof of Theorem 1.1 in Bai and Zhou [2], to prove (4.1), it suffices to show that, for any \(z \in \mathbb{C }^+\),
\[\lim _{n \rightarrow \infty } \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) = S(z), \tag{4.2}\]
where \(S(z)\) satisfies the Eq. (2.4).
The proof of (4.2) being very technical, let us, for the reader’s convenience, describe the different steps leading to it. We shall consider a sample covariance matrix \(\mathbf{G}_n:=\frac{1}{n}\mathcal Z _n\mathcal Z _{n}^{T}\) (see (4.32)) such that the columns of \(\mathcal Z _n\) are independent and the random variables in each column of \(\mathcal Z _n\) form a sequence of Gaussian random variables whose covariance structure is the same as that of the sequence \((X_k)_{k\in \mathbb{Z }}\) (see Sect. 4.2). The aim will then be to prove that, for any \(z\in \mathbb{C }^+\),
\[\lim _{n \rightarrow \infty } \big \vert \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) - \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) \big \vert = 0 \tag{4.3}\]
and
\[\lim _{n \rightarrow \infty } \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) = S(z) . \tag{4.4}\]
The proof of (4.4) will be achieved in Sect. 4.4 with the help of Theorem 1.1 in Silverstein [17] combined with arguments developed in the proof of Theorem 1 in Yao [23]. The proof of (4.3) will be divided into several steps. First, to “break” the dependence structure, we introduce a parameter \(m\), and approximate \({\mathbf{B}_n}\) by a sample covariance matrix \({{\bar{\mathbf{B}}}_n}:= \frac{1}{n} {\bar{\mathcal{X }}}_{n} {\bar{\mathcal{X }}}_{n}^{T}\) (see (4.16)) such that the columns of \({\bar{\mathcal{X }}}_{n}\) are independent and the random variables in each column of \({\bar{\mathcal{X }}}_{n}\) form an \(m\)-dependent sequence of random variables bounded by \(2M \), with \(M\) a positive real (see Sect. 4.1). This approximation will be done in such a way that, for any \(z\in \mathbb C ^+\),
Next, the sample Gaussian covariance matrix \(\mathbf{G}_n\) is approximated by another sample Gaussian covariance matrix \({\widetilde{\mathbf{G}}}_{n}\) (see (4.34)), depending on the parameter \(m\) and constructed from \(\mathbf{G}_n\) by replacing some of the variables in each column of \(\mathcal Z _n\) by zeros (see Sect. 4.2). This approximation will be done in such a way that, for any \(z\in \mathbb C ^+\),
In view of (4.5) and (4.6), the convergence (4.3) will then follow if we can prove that, for any \(z\in \mathbb C ^+\),
This will be achieved in Sect. 4.3 with the help of the Lindeberg method. The rest of this section is devoted to the proofs of the convergences (4.3)–(4.7).
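The comparison between \(\mathbf{B}_n\) and its Gaussian counterpart \(\mathbf{G}_n\) can be observed numerically: rows drawn from a non-Gaussian AR(1) process and Gaussian rows with the same autocovariance already produce nearly identical Stieltjes transforms at moderate sizes. The sketch below (ours; all sizes, parameters, and function names are arbitrary choices) builds both matrices and compares \(S_{F^{\mathbf{B}_n}}(z)\) with \(S_{F^{\mathbf{G}_n}}(z)\) at one point \(z \in \mathbb{C }^+\).

```python
import numpy as np

def stieltjes_esd(B, z):
    """Stieltjes transform of the ESD of B: (1/N) Tr (B - zI)^{-1}."""
    N = B.shape[0]
    return np.trace(np.linalg.inv(B - z * np.eye(N))) / N

def ar1_rows(n, N, rho, rng, burn=200):
    """n independent rows of length N from X_k = rho X_{k-1} + eps_k with
    centered uniform (hence non-Gaussian) innovations of variance 1."""
    eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, N + burn))
    X = np.zeros((n, N + burn))
    for k in range(1, N + burn):
        X[:, k] = rho * X[:, k - 1] + eps[:, k]
    return X[:, burn:]                   # burn-in: approximately stationary

def gaussian_rows(n, N, rho, rng):
    """Gaussian rows with the same autocovariance rho^|k-l| / (1 - rho^2)."""
    idx = np.arange(N)
    cov = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)
    L = np.linalg.cholesky(cov)
    return rng.standard_normal((n, N)) @ L.T

rng = np.random.default_rng(7)
n, N, rho = 600, 300, 0.4
z = 1.0 + 1.0j
X = ar1_rows(n, N, rho, rng)
Z = gaussian_rows(n, N, rho, rng)
sB = stieltjes_esd(X.T @ X / n, z)
sG = stieltjes_esd(Z.T @ Z / n, z)
```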
4.1 Approximation by a Sample Covariance Matrix Associated with an \(m\)-Dependent Sequence
Let \(N\ge 2\), and let \(m\) be a positive integer, fixed for the moment and assumed to be less than \(\sqrt{N/2}\). Set
\[k_{N,m}= \Big [ \frac{N}{m^2+m}\Big ] \tag{4.8}\]
where we recall that \([ \, \cdot \, ]\) denotes the integer part. Let \(M\) be a fixed positive number that depends neither on \(N\) nor on \(n\), nor on \(m\). Let \(\varphi _{M}\) be the function defined by \(\varphi _{M}(x)=(x \wedge M)\vee (-M)\). Now for any \(k\in \mathbb Z \) and \( i \in \lbrace 1, \ldots , n \rbrace \) let
\[{\widetilde{X}}^{(i)}_{k,M,m} = \mathbb{E }\big ( \varphi _M ( X^{(i)}_k ) \, \vert \, \varepsilon ^{(i)}_{k-m} , \ldots , \varepsilon ^{(i)}_{k} \big ) \quad \text{and} \quad {\bar{X}}^{(i)}_{k,M,m} = {\widetilde{X}}^{(i)}_{k,M,m} - \mathbb{E }\big ( {\widetilde{X}}^{(i)}_{k,M,m} \big ) .\]
In what follows, to lighten the notation, we shall write \({\widetilde{X}}^{(i)}_{k,m}\) and \({\bar{X}}^{(i)}_{k,m}\) instead of, respectively, \(\widetilde{X}^{(i)}_{k,M,m}\) and \(\bar{X}^{(i)}_{k,M,m}\), when no confusion can arise. Notice that \(\big ( \bar{X}^{(1)}_{k,m}\big )_{k \in \mathbb{Z }}, \ldots , \big ( \bar{X}^{(n)}_{k,m}\big )_{k \in \mathbb{Z }}\) are \(n\) independent copies of the centered and stationary sequence \(\big ( {\bar{X}}_{k,m}\big )_{k \in \mathbb{Z }}\) defined by
\[{\bar{X}}_{k,m} = \mathbb{E }\big ( \varphi _M ( X_k ) \, \vert \, \varepsilon _{k-m} , \ldots , \varepsilon _{k} \big ) - \mathbb{E }\big ( \varphi _M ( X_k ) \big ) . \tag{4.10}\]
This implies in particular that: for any \(i \in \{1, \ldots , n \}\) and any \(k\in \mathbb Z \),
For any \(i \in \{1, \ldots , n\}\), note that \( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in \mathbb{Z }}\) forms an \({m}\)-dependent sequence, in the sense that \( {\bar{X}}^{(i)}_{k,m}\) and \({\bar{X}}^{(i)}_{k',m}\) are independent if \(\vert k-k' \vert > {m}\).
We now write the interval \([1, N]\cap \mathbb N \) as a union of disjoint sets as follows:
\[ [1, N]\cap \mathbb{N } = \bigcup _{\ell =1}^{k_{N,m}+1} \big ( I_{\ell } \cup J_{\ell } \big ) ,\]
where, for \( \ell \in \{1, \ldots , k_{N,m} \}\),
\[I_{\ell }=\big \{ (\ell -1)(m^2+m)+1 , \ldots , (\ell -1)(m^2+m)+ m^2 \big \} \quad \text{and} \quad J_{\ell }=\big \{ (\ell -1)(m^2+m)+ m^2+1 , \ldots , \ell (m^2+m) \big \} ,\]
and, for \(\ell = k_{N,m} +1 \),
\[I_{k_{N,m}+1}=\big \{ k_{N,m}(m^2+m)+1 , \ldots , N \big \}\]
and \(J_{k_{N,m} +1} = \emptyset \). Note that \(I_{k_{N,m} +1} =\emptyset \) if \(k_{N,m}(m^2 +m) = N\).
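The big-block/small-block bookkeeping above can be sketched in code as follows (ours, for illustration; the handling of the leftover indices differs slightly from (4.13)–(4.14), where the remainder is replaced by zeros inside the last vector, whereas here it is simply kept apart).

```python
def block_partition(N, m):
    """Split {1,...,N} into consecutive stretches of length m^2 + m:
    a big block I_l of length m^2 followed by a small gap J_l of length m,
    for l = 1,...,k with k = [N/(m^2+m)]; remaining indices are kept apart.
    For an m-dependent sequence, variables indexed by distinct big blocks
    are independent, since the blocks are separated by gaps of length m."""
    k = N // (m * m + m)
    I, J = [], []
    pos = 1
    for _ in range(k):
        I.append(list(range(pos, pos + m * m)))
        J.append(list(range(pos + m * m, pos + m * m + m)))
        pos += m * m + m
    leftover = list(range(pos, N + 1))      # possibly empty
    return I, J, leftover

I, J, leftover = block_partition(103, 3)    # m^2 + m = 12, so k_{N,m} = 8
```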
Let now \(\big (\mathbf{{u}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m}\} }\) be the random vectors defined as follows. For any \(\ell \) belonging to \(\{1, \ldots , k_{N,m}-1\}\),
\[\mathbf{{u}}^{(i)}_{\ell } = \Big ( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in I_{\ell }} \, , \, \mathbf{0}_{m} \Big ) . \tag{4.13}\]
Hence, the dimension of the random vectors defined above is equal to \(m^2+{m}\). Now, for \(\ell = k_{N,m}\), we set
\[\mathbf{{u}}^{(i)}_{k_{N,m}} = \Big ( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in I_{k_{N,m}}} \, , \, \mathbf{0}_{r} \Big ) \tag{4.14}\]
where \(r= m + N - k_{N,m} (m^2+m)\). This last vector is then of dimension \( N -( k_{N,m} -1 ) (m^2+m)\).
Notice that the random vectors \(\big ( \mathbf{{u}}^{(i)}_{\ell } \big )_{1 \le i \le n, 1 \le \ell \le k_{N,m}}\) are mutually independent.
For any \(i \in \{1, \ldots , n \}\), we now define row random vectors \({\bar{\mathbf{X}}^{(i)}}\) of dimension \(N\) by setting
\[{\bar{\mathbf{X}}}^{(i)} = \big ( \mathbf{{u}}^{(i)}_{1} , \ldots , \mathbf{{u}}^{(i)}_{k_{N,m}} \big ) ,\]
where the \(\mathbf{{u}}^{(i)}_{ \ell } \)’s are defined in (4.13) and (4.14). Let
\[{\bar{\mathcal{X }}}_{n}= \big ( {\bar{\mathbf{X}}}^{(1)T} \vert \cdots \vert {\bar{\mathbf{X}}}^{(n)T} \big ) \quad \text{and} \quad {{\bar{\mathbf{B}}}_n} = \frac{1}{n} {\bar{\mathcal{X }}}_{n} {\bar{\mathcal{X }}}_{n}^{T} . \tag{4.16}\]
In what follows, we shall prove the following proposition.
Proposition 4.1
For any \(z\in \mathbb C ^+\), the convergence (4.5) holds true with \({\mathbf{B}_n}\) and \({{\bar{\mathbf{B}}}_n}\) as defined in (2.2) and (4.16), respectively.
To prove the proposition above, we start by noticing that, by integration by parts, for any \(z=u+iv \in \mathbb C ^+\),
\[\big \vert S_{F^{\mathbf{B}_n}}(z) - S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big \vert \le \frac{1}{v^2} \int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \, \hbox {d}x .\]
Now, \(\int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \hbox {d}x\) is nothing but the Wasserstein distance of order \(1\) between the empirical spectral measure of \({\mathbf{B}_n}\) and that of \({{{\bar{\mathbf{B}}}_n}}\). More precisely, if \(\lambda _1, \ldots , \lambda _N\) denote the eigenvalues of \(\mathbf{B}_n\) in non-increasing order, and \({\bar{\lambda }}_1, \ldots , {\bar{\lambda }}_N\) those of \({{{\bar{\mathbf{B}}}_n}}\), also in non-increasing order, then, setting \(\eta _n=\frac{1}{N} \sum _{k=1}^N \delta _{\lambda _k}\) and \({\bar{\eta }}_n=\frac{1}{N} \sum _{k=1}^N \delta _{ {\bar{\lambda }}_k}\), we have that
\[\int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \, \hbox {d}x = W_1 ( \eta _n , {\bar{\eta }}_n ) := \inf \mathbb{E } | X - Y | ,\]
where the infimum runs over the set of couples of random variables \((X,Y)\) on \(\mathbb{R } \times \mathbb{R }\) such that \(X\sim \eta _n\) and \(Y \sim {\bar{\eta }}_n\). Arguing as in Remark 4.2.6 in [3], we have
\[W_1 ( \eta _n , {\bar{\eta }}_n ) = \frac{1}{N} \min _{\pi } \sum _{k=1}^{N} \vert {\lambda }_k-{\bar{\lambda }}_{\pi (k)}\vert ,\]
where \(\pi \) is a permutation belonging to the symmetric group \(\mathcal{S }_N\) of \(\{1, \ldots , N \}\). By standard arguments, involving the fact that if \(x,y,u,v\) are real numbers such that \(x\le y\) and \(u >v\), then \(|x-u| + |y-v| \ge |x-v| + |y-u|\), we get that \(\min _{ \pi \in \mathcal{S }_N}\sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{\pi (k)}\vert = \sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{k}\vert \). Therefore,
Notice that \({\lambda }_k=s_k^2\) and \({\bar{\lambda }}_{k} = {\bar{s}}^2_{k}\) where the \(s_k\)’s (respectively the \({\bar{s}}_{k}\)’s) are the singular values of the matrix \(n^{-1/2}\mathcal X _{n}\) (respectively of \(n^{-1/2} \bar{\mathcal{X }}_{n}\)). Hence, by Cauchy–Schwarz’s inequality,
Next, by Hoffman–Wielandt’s inequality (see, e.g., Corollary 7.3.8 in Horn and Johnson [7]),
Therefore,
Starting from (4.17), considering (4.18) and (4.19), and using Cauchy–Schwarz’s inequality, it follows that
By the definition of \({\mathbf{B}_n}\),
where we have used that for each \(i\), \(\big ( X^{(i)}_{k} \big )_{k \in \mathbb{Z }}\) is a copy of the stationary sequence \( ( X_k )_{k \in \mathbb{Z }}\). Now, setting
\[\mathcal{I }_{N,m} = \bigcup _{\ell =1}^{k_{N,m}} I_{\ell } ,\]
recalling the definition (4.16) of \({{\bar{\mathbf{B}}}_n}\), using the stationarity of the sequence \(({\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\), and the fact that \(\mathrm{card} (\mathcal{I }_{N,m} ) = m^2 k_{N,m}\le N \), we get
Next,
Therefore,
Now, by definition of \(\mathcal{X }_{n}\) and \( \bar{\mathcal{X }}_{n}\),
Using stationarity, the fact that \(\mathrm{card} (\mathcal{I }_{N,m} ) \le N \) and
we get that
Starting from (4.20), considering the upper bounds (4.21), (4.24), and (4.26), we derive that there exists a positive constant \(C\) not depending on \((m,M)\) and such that
Therefore, Proposition 4.1 will follow if we can prove that
\[\lim _{M \rightarrow \infty } \limsup _{m \rightarrow \infty } \Vert X_{0} - {\bar{X}}_{0,m} \Vert _2 = 0 . \tag{4.27}\]
Let us now introduce the sequence \((X_{k,m})_{k \in \mathbb{Z }}\) defined as follows: for any \(k \in \mathbb{Z }\),
\[X_{k,m} = \mathbb{E }\big ( X_k \, \vert \, \varepsilon _{k-m} , \ldots , \varepsilon _k \big ) .\]
With the above notation, we write that
\[\Vert X_{0} - {\bar{X}}_{0,m} \Vert _2 \le \Vert X_{0} - X_{0,m} \Vert _2 + \Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _2 .\]
Since \(X_0\) is centered, so is \({ X}_{0,m}\). Then \(\Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _2=\Vert X_{0,m} - \mathbb{E }( X_{0,m}) - {\bar{X}}_{0,m} \Vert _2\). Therefore, recalling the definition (4.10) of \({\bar{X}}_{0,m}\), it follows that
\[\Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _2 \le 2 \, \big \Vert \big ( |X_0| - M \big )_+ \big \Vert _2 .\]
Since \(X_0\) belongs to \(\mathbb{L }^2\), \(\lim _{M \rightarrow \infty }\Vert ( |X_0| - M )_+ \Vert _2 = 0\). Therefore, to prove (4.27) (and then Proposition 4.1), it suffices to prove that
Since \((X_{0,m})_{m \ge 0}\) is a martingale with respect to the increasing filtration \((\mathcal{G }_m)_{m \ge 0}\) defined by \(\mathcal{G }_m= \sigma ( \varepsilon _{-m}, \ldots , \varepsilon _0 )\) and is such that \(\sup _{m \ge 0} \Vert X_{0,m} \Vert _2 \le \Vert X_0 \Vert _2 < \infty \), (4.30) follows by the martingale convergence theorem in \(\mathbb{L }^2\) (see for instance Corollary 2.2 in Hall and Heyde [6]). This ends the proof of Proposition 4.1. \(\square \)
4.2 Construction of Approximating Sample Covariance Matrices Associated with Gaussian Random Variables
Let \(( Z_k)_{ k\in \mathbb Z }\) be a centered Gaussian process with real values, whose covariance function is given, for any \(k,\ell \in \mathbb Z \), by
For \(n\) a positive integer, we consider \(n\) independent copies of the Gaussian process \(( Z_k)_{ k\in \mathbb Z }\) that are in addition independent of \((X_k^{(i)} )_{ k \in \mathbb{Z } , i \in \{1, \ldots , n \}}\). We shall denote these copies by \(( Z^{(i)}_k)_{ k\in \mathbb Z }\) for \(i = 1, \ldots , n\). For any \(i \in \{1, \ldots , n \}\), define \(\mathbf{{Z}}_{i}=\big ( Z_{1}^{(i)}, \ldots ,Z_{N}^{(i)}\big )\). Let \(\mathcal Z _n=(\mathbf{{Z}}^T_{1} \vert \cdots \vert \mathbf{{Z}}^T_{n}) \) be the matrix whose columns are the \(\mathbf{{Z}}^T_{i}\)’s and consider its associated sample covariance matrix
For \(k_{N,m}\) given in (4.8), we define now the random vectors \(\big (\mathbf{v}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m} \} }\) as follows. They are defined as the random vectors \(\big (\mathbf{{u}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m} \}}\) defined in (4.13) and (4.14), but by replacing each \({\bar{X}}^{(i)}_{k,m}\) by \(Z^{(i)}_{k}\). For any \(i \in \{1, \ldots , n \}\), we then define the random vectors \(\mathbf{{\widetilde{Z}}}^{(i)}\) of dimension \(N\), as follows:
Let now
In what follows, we shall prove the following proposition.
Proposition 4.2
For any \(z\in \mathbb C ^+\), the convergence (4.6) holds true with \(\mathbf{{G}_{n}}\) and \({\widetilde{\mathbf{G}}}_{n}\) as defined in (4.32) and (4.34) respectively.
To prove the proposition above, we start by noticing that, for any \(z=u+iv \in \mathbb C ^+\),
Hence, by Theorem A.44 in Bai and Silverstein [1],
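Recall that Theorem A.44 in [1] is a rank inequality for empirical distribution functions which, combined with an integration by parts for Stieltjes transforms, gives, for any \(N \times n\) real matrices \(A\) and \(B\) and any \(z = u + iv \in \mathbb{C}^+\),

```latex
\big\Vert F^{AA^T} - F^{BB^T} \big\Vert_{\infty}
\le \frac{1}{N}\,\mathrm{rank}(A - B),
\qquad
\big| S_{F}(z) - S_{G}(z) \big|
= \Big| \int_{\mathbb{R}} \frac{(F - G)(x)}{(x - z)^2}\, dx \Big|
\le \frac{\pi}{v}\, \Vert F - G \Vert_{\infty} ,
```

the last bound following since \(\int_{\mathbb{R}} \big( (x-u)^2 + v^2 \big)^{-1} dx = \pi / v\).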
By definition of \(\mathcal{Z }_{n}\) and \(\widetilde{\mathcal{Z }}_{n}\), \(\mathrm{rank} \big ( \mathcal{Z }_{n} - \widetilde{\mathcal{Z }}_{n}\big ) \le \mathrm{card} ( \mathcal{R }_{N,m} )\), where \(\mathcal{R }_{N,m}\) is defined in (4.22). Therefore, using (4.25), we get that, for any \(z=u+iv \in \mathbb C ^+\),
which converges to zero by letting first \(n\) and then \(m\) tend to infinity. This ends the proof of Proposition 4.2. \(\square \)
4.3 Approximation of \(\mathbb{E }\big (S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big )\) by \(\mathbb{E }\big (S_{F^{{\widetilde{\mathbf{G}}}_{n}}}(z)\big )\)
In this section, we shall prove the following proposition.
Proposition 4.3
Under the assumptions of Theorem 2.1, for any \(z\in \mathbb C ^+\), the convergence (4.7) holds true with \({{{\bar{\mathbf{B}}}_n}}\) and \({\widetilde{\mathbf{G}}}_{n}\) as defined in (4.16) and (4.34), respectively.
With this aim, we shall use the Lindeberg method, which is based on telescoping sums. In order to develop it, we first give the following definition:
Definition 4.1
Let \(x\) be a vector of \(\mathbb{R }^{n N}\) with coordinates
Let \(z\in \mathbb C ^+\) and \(f:=f_z\) be the function defined from \(\mathbb R ^{nN}\) to \(\mathbb C \) by
and \(\mathbf {I}\) is the identity matrix.
The function \(f\), as defined above, admits partial derivatives of all orders. Indeed, let \(u\) be one of the coordinates of the vector \(x\) and \(A_u=A(x)\) the matrix-valued function of the scalar \(u\). Then, setting \(G_u = \big (A_u-z\mathbf{{I}}\big )^{-1}\) and differentiating both sides of the equality \(G_u(A_u-z\mathbf{{I}})= \mathbf{{I}}\), it follows that
(see the equality (17) in Chatterjee [4]). Higher-order derivatives may be computed by repeatedly applying the above formula. Upper bounds for some partial derivatives up to the fourth order are given in the “Appendix”.
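For the reader's convenience, the differentiation rule obtained in this way is the classical resolvent identity

```latex
\frac{\partial G_u}{\partial u} \;=\; -\, G_u \, \frac{\partial A_u}{\partial u} \, G_u ,
```

which follows since \(0 = \partial_u \mathbf{I} = (\partial_u G_u)(A_u - z\mathbf{I}) + G_u (\partial_u A_u)\).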
Now, using Definition 4.1 and the notations (4.15) and (4.33), we get that, for any \(z\in \mathbb C ^+\),
To continue the development of the Lindeberg method, we introduce additional notations. For any \(i \in \{1, \ldots , n \}\) and \(k_{N,m}\) given in (4.8), we define the random vectors \(\big (\mathbf{{U}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m}\} }\) of dimension \(n N\) as follows. For any \(\ell \in \{1, \ldots , k_{N,m}\}\),
where the \(\mathbf{{u}}^{(i)}_{\ell } \)’s are defined in (4.13) and (4.14), and
Note that the vectors \(\big ( \mathbf{{U}}^{(i)}_{\ell }\big )_{1 \le i \le n, 1 \le \ell \le k_{N,m}}\) are mutually independent. Moreover, with the notations (4.38) and (4.15), the following relations hold. For any \(i \in \{1, \ldots , n \}\),
where the \({\bar{\mathbf{X}}}^{(i)}\)’s are defined in (4.15).
Now, for any \(i \in \{1, \ldots , n \}\), we define the random vectors \(\big (\mathbf{V}^{(i)}_{\ell } \big )_{\ell \in \{1, \ldots , k_{N,m}\} }\) of dimension \(n N \), as follows: for any \(\ell \in \{1, \ldots , k_{N,m}\}\),
where \(r_{\ell }\) is defined in (4.39) and the \(\mathbf{v}^{(i)}_{\ell }\)’s are defined in Sect. 4.2. With the notations (4.41) and (4.33), the following relations hold: for any \(i \in \{1, \ldots , n \}\),
where the \( \mathbf{{\widetilde{Z}}}^{(i)}\)’s are defined in (4.33). We define now, for any \(i \in \{1, \ldots , n \}\),
and any \(s \in \{ 1, \ldots , k_{N,m} \}\),
In all the notations above, we use the convention that \(\sum _{k=r}^s =0\) if \(r>s\). Therefore, starting from (4.37), considering the relations (4.40) and (4.42), and using the notations (4.43) and (4.44), we successively get
Therefore, setting for any \(i \in \{1, \ldots , n \}\) and any \(s \in \{ 1, \ldots , k_{N,m} \}\),
and
we are led to
where
In order to continue the multidimensional Lindeberg method, it is useful to introduce the following notations.
Definition 4.2
Let \(d_1\) and \(d_2\) be two positive integers. Let \(A = (a_1, \ldots , a_{d_1})\) and \(B= (b_1, \ldots , b_{d_2})\) be two real-valued row vectors of respective dimensions \(d_1\) and \(d_2\). We define \(A \otimes B\) as the transpose of the Kronecker product of \(A\) by \(B\). Therefore
For any positive integer \(k\), the \(k\)th transpose Kronecker power \(A^{\otimes k}\) is then defined inductively by: \(A^{\otimes 1}=A^T\) and \(A^{\otimes k} = A \bigotimes \big ( A^{\otimes (k-1)} \big )^T\).
Notice that, here, \(A \otimes B\) is not exactly the usual Kronecker (or tensor) product of \(A\) by \(B\), which rather produces a row vector. However, this convention will prove convenient for later notation.
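For instance, with this convention, if \(A = (a_1, a_2)\) and \(B = (b_1, b_2)\), then

```latex
A \otimes B = \big( a_1 b_1,\; a_1 b_2,\; a_2 b_1,\; a_2 b_2 \big)^T \in \mathbb{R}^4,
\qquad
A^{\otimes 2} = A \otimes \big( A^{\otimes 1} \big)^T = A \otimes A
= \big( a_1^2,\; a_1 a_2,\; a_2 a_1,\; a_2^2 \big)^T .
```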
Definition 4.3
Let \(d\) be a positive integer. If \(\nabla \) denotes the differentiation operator given by \(\nabla = \big ( \frac{\partial }{\partial x_1}, \ldots , \frac{\partial }{\partial x_d} \big )\) acting on the differentiable functions \(h: \mathbb{R }^d \rightarrow \mathbb{R }\), we define, for any positive integer \(k\), \(\nabla ^{\otimes k}\) in the same way as in Definition 4.2. If \(h : \mathbb{R }^d \rightarrow \mathbb{R }\) is \(k\)-times differentiable, for any \(x \in \mathbb{R }^d\), let \( D^k h(x) = \nabla ^{\otimes k} h(x) \), and for any row vector \(Y\) of \(\mathbb{R }^d\), we define \(D^k h(x) \mathbf{.} Y^{\otimes k}\) as the usual scalar product in \(\mathbb{R }^{d^k}\) between \(D^k h(x)\) and \( Y^{\otimes k}\). We write \(D h\) for \(D^1 h\).
Let \(z =u+iv \in \mathbb{C }^+\). We start by analyzing the term \( \mathbb{E }\big ( \Delta ^{(i)}_{s} (f) \big ) \) in (4.47). By Taylor’s integral formula,
Let us analyze the right-hand term of (4.48). Recalling the definition (4.38) of the \(\mathbf{{U}}^{(i)}_{s}\)’s, for any \(t \in [0,1]\),
where \(I_{s}\) is defined in (4.12). Therefore, using (4.11), stationarity and (4.23), it follows that, for any \(t \in [0,1]\),
Notice that by (4.43) and (4.44),
where \(w^{(i)} (t)\) is the row vector of dimension \(N\) defined by
where the \(\mathbf{{u}}^{(i)}_{ \ell } \)’s are defined in (4.13) and (4.14), whereas the \(\mathbf{{v}}^{(i)}_{ \ell } \)’s are defined in Sect. 4.2. Therefore, by Lemma 5.1 of the “Appendix”, (4.11), and since \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as the stationary sequence \((Z_k)_{k \in \mathbb{Z }}\), we infer that there exists a positive constant \(C_1\) not depending on \((n,M,m)\) and such that, for any \(t \in [0,1]\),
Now, since \(Z_0\) is a Gaussian random variable, \( \Vert Z_0 \Vert ^6_6 = 15 \Vert Z_0 \Vert _2^6 \). Moreover, by (4.31), \( \Vert Z_0 \Vert _2 = \Vert X_0 \Vert _2 \). Therefore, there exists a positive constant \(C_2\) not depending on \((n,M,m)\) and such that, for any \(t \in [0,1]\),
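The Gaussian moment identities used here and below are instances of the standard even-moment formula: if \(Z \sim \mathcal{N}(0, \sigma^2)\), then

```latex
\mathbb{E}\big( Z^{2k} \big) = (2k-1)!!\, \sigma^{2k},
\qquad\text{so that}\qquad
\mathbb{E}\big( Z^{4} \big) = 3 \sigma^{4}
\quad\text{and}\quad
\mathbb{E}\big( Z^{6} \big) = 15 \sigma^{6} .
```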
On the other hand, since for any \( i \in \lbrace 1 , \ldots , n \rbrace \) and any \(s \in \lbrace 1 , \ldots , k_{N,m} \rbrace \), \(\mathbf{{U}}^{(i)}_{s}\) is a centered random vector independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s}\), it follows that
Hence starting from (4.48), using (4.51), (4.52) and the fact that \(m^2k_{N,m} \le N\), we derive that there exists a positive constant \(C_3\) not depending on \((n,M,m)\) and such that
We analyze now the “Gaussian part” in (4.47), namely \(\mathbb{E }\big ( \widetilde{\Delta }^{(i)}_{s} (f) \big )\). By Taylor’s integral formula,
Proceeding as to get (4.53), we then infer that there exists a positive constant \(C_4\) not depending on \((n,M,m)\) and such that
We analyze now the terms \(\mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s} \big )\) in (4.54). Recalling the definition (4.41) of the \(\mathbf{V}^{(i)}_{s}\)’s, we write
where \(I_{s}\) is defined in (4.12). To handle the terms in the right-hand side, we shall use the so-called Stein identity for Gaussian vectors (see, for instance, Lemma 1 in Liu [9]), as done by Neumann [12] in the context of dependent real random variables: for \(G=(G_1, \ldots , G_d)\) a centered Gaussian vector of \(\mathbb{R }^d\) and any function \(h : \mathbb{R }^d\rightarrow \mathbb{R }\) such that its partial derivatives exist almost everywhere and \(\mathbb{E }\big | \frac{\partial h }{\partial x_i} (G) \big | < \infty \) for any \(i=1, \ldots , d\), the following identity holds true:
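Written out for the centered Gaussian vector \(G\), this identity (labeled (4.55) above) reads: for any \(i \in \{1, \ldots , d\}\),

```latex
\mathbb{E}\big( G_i \, h(G) \big)
= \sum_{j=1}^{d} \mathbb{E}\big( G_i G_j \big)\,
\mathbb{E}\Big( \frac{\partial h}{\partial x_j}(G) \Big).
```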
Using (4.55) with \(G= \big ( \mathbf{{T}}^{(i)}_{s+1} , Z_j^{(i)} \big ) \in \mathbb{R }^{nN} \times \mathbb{R }\), \(h : \mathbb{R }^{n N} \times \mathbb{R } \rightarrow \mathbb{R }\) satisfying \(h(x,y) = \frac{\partial f}{\partial x^{(i)}_{j} } (x)\) for any \((x,y) \in \mathbb{R }^{n N} \times \mathbb{R }\), and noticing that \(G\) is independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s} - \mathbf{{T}}^{(i)}_{s+1} \), we infer that, for any \(j \in I_{s}\),
Therefore,
From (4.49) and (4.50) (with \(t=0\)) and Lemma 5.1 of the “Appendix”, we infer that there exists a positive constant \(C_5\) not depending on \((n,M,m)\) and such that, for any \(k \in I_{\ell }\) and any \(j \in I_{s}\),
Hence, using the fact that \(\mathrm{Cov} ( Z_k^{(i)}, Z_j^{(i)}) =\mathrm{Cov} ( Z_k, Z_j)\) together with (4.31), we then derive that
By stationarity,
where \(\mathcal{E }_{m, \ell }:=\{ 1-m^2 +(\ell -s) (m^2 +m), \ldots , m^2-1 + (\ell -s) (m^2 +m) \}\). Notice that since \(m \ge 1\), \(\mathcal{E }_{m, \ell } \cap \mathcal{E }_{m, \ell +2} = \emptyset \). Then, summing on \(\ell \), and using the fact that \(k_{N,m} (m^2 +m) \le N\), we get that, for any \(s \ge 1\),
So, overall, for any positive integer \(s\),
Therefore, starting from (4.57) and using that \(m^2 k_{N,m} \le N\), it follows that
Since \(\mathcal{F }_{-\infty } = \bigcap _{k \in \mathbb{Z }} \sigma ( \xi _k)\) is trivial, for any \(k \in \mathbb{Z }\), \(\mathbb{E }(X_k | \mathcal{F }_{-\infty })=\mathbb{E }(X_k)=0\) a.s. Therefore, the following decomposition is valid: \(X_k = \sum _{r=-\infty }^k P_r (X_k)\). Next, since \(\mathbb{E }\big ( P_i (X_0)P_j (X_k)\big ) =0\) if \(i \ne j\), we get, by stationarity, that for any integer \(k \ge 0\),
implying that for any nonnegative integer \(u\),
Hence, starting from (4.59) and considering (4.61) together with the condition (2.3), we derive that there exists a positive constant \(C_6\) not depending on \((n,M,m)\) such that
We analyze now the terms of second order in (4.54), namely \(\mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big )\). Recalling the definition (4.41) of the \(\mathbf{V}^{(i)}_{s}\)’s, we first write that
where \(I_{s}\) is defined in (4.12). Using now (4.55) with \(G= \big ( \mathbf{{T}}^{(i)}_{s+1} , Z_{j_1}^{(i)}, Z_{j_2}^{(i)} \big ) \in \mathbb{R }^{n N} \times \mathbb{R }\times \mathbb{R }\), \(h : \mathbb{R }^{nN} \times \mathbb{R }\times \mathbb{R } \rightarrow \mathbb{R }\) satisfying \(h(x,y,z) = y \frac{\partial ^2 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } (x)\) for any \((x,y,z) \in \mathbb{R }^{n N} \times \mathbb{R }\times \mathbb{R }\), and noticing that \(G\) is independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s} - \mathbf{{T}}^{(i)}_{s+1} \), we infer that, for any \(j_1, j_2\) belonging to \(I_{s}\),
Therefore, starting from (4.63) and using (4.64) combined with Definitions 4.2 and 4.3, it follows that
Next, with similar arguments, we infer that
By the definition (4.41) of the \(\mathbf{V}^{(i)}_{\ell }\)’s, we first write that
where for the last line, we have used that \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as \((Z_k)_{k \in \mathbb{Z }}\) together with (4.31). From (4.49) and (4.50) (with \(t=0\)), Lemma 5.1 of the “Appendix”, and the stationarity of the sequences \(({\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\) and \((Z_k^{(i)})_{k \in \mathbb{Z }}\), we infer that there exists a positive constant \(C_7\) not depending on \((n,M,m)\) such that
By (4.11) and (4.23), \(\Vert {\bar{X}}_{0,m} \Vert _4^4 \le (2M)^2 \Vert {\bar{X}}_{0,m} \Vert _2^2 \le 16 M^2 \Vert X_0 \Vert _2^2\). Moreover, \(Z_0\) being a Gaussian random variable, \(\Vert Z_0 \Vert _4^4=3\Vert Z_0 \Vert _2^4\). Hence, by (4.31), \( \Vert Z_0 \Vert _4^4=3\Vert X_0 \Vert _2^4\) and \(\Vert Z_0 \Vert _2^2=\Vert X_0 \Vert _2^2\). Therefore, there exists a positive constant \(C_8\) not depending on \((n,M,m)\) and such that
On the other hand, by using (4.58) and (4.61), we get that, for any positive integer \(s\),
Whence, starting from (4.66), using (4.67), and considering the upper bounds (4.68) and (4.69) together with the condition (2.3), we derive that there exists a positive constant \(C_9\) not depending on \((n,M,m)\) such that
So, overall, starting from (4.65), considering (4.70) and using the fact that \( m^2 k_{N,m} \le N\), we derive that
Then, starting from (4.47), and considering the upper bounds (4.53), (4.54), (4.62), and (4.71), we get that
where \(C_{10} = \max ( C_3,C_4,C_6,C_7)\). Since \(c(n) \rightarrow c \in (0,\infty )\), it follows that the second and third terms in the right-hand side of the above inequality tend to zero as \(n\) tends to infinity. On the other hand, by the condition (2.3), \(\lim _{m \rightarrow \infty }\sum _{k \ge m+1 } \Vert P_0 (X_{k}) \Vert _2=0 \). Therefore, Proposition 4.3 will follow if we can prove that, for any \(z\in \mathbb C ^+\),
Using the fact that \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as \((Z_k)_{k \in \mathbb{Z }}\) together with (4.31) and that \(( {\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\) is distributed as \(({\bar{X}}_{k,m})_{k \in \mathbb{Z }}\), we first write that
Hence, by using (4.56) and by stationarity, we get that there exists a positive constant \(C_{11}\) not depending on \((n,M,m)\) such that
To handle the right-hand side term, we first write that
where \(X_{0,m}\) and \(X_{k,m}\) are defined in (4.28). Notice now that \(\mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) = \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) =0\) if \(k>m\). Therefore,
Next, using stationarity, the fact that the random variables are centered, (4.11) and (4.29), we get that
As to get (4.29), notice that \( \Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _1 \le 2 \Vert \big ( |X_0| - M )_+ \Vert _1 \). Moreover, \(\big ( |x| - M )_+ \le 2|x|\mathbf{1}_{|x| \ge M}\) which in turn implies that \(M \big ( |x| - M )_+ \le 2|x|^2\mathbf{1}_{|x| \ge M} \). So, overall,
We handle now the second term in the right-hand side of (4.74). Let \(b(m)\) be an increasing sequence of positive integers such that \(b(m) \rightarrow \infty \), \(b(m) \le [m/2]\), and
Notice that since (4.30) holds true, it is always possible to find such a sequence. Now, using (4.60),
Recalling the definition (4.28) of the \(X_{j,m}\)’s, we notice that \(P_{0} (X_{j,m}) = 0 \) if \(j\ge m+1\). Now, for any \(j \in \{0, \ldots , m\}\),
Actually, the last two equalities follow from the tower lemma, whereas, for the second one, we have used the following well-known fact with \(\mathcal{G }_1=\sigma (\varepsilon _0, \ldots , \varepsilon _{j-m}) \), \(\mathcal{G }_2=\sigma (\varepsilon _k, k \le j-m-1) \) and \(Y = X_{j,m}\): if \(Y\) is an integrable random variable, and \(\mathcal{G }_1\) and \(\mathcal{G }_2\) are two \(\sigma \)-algebras such that \( \sigma (Y) \vee \mathcal{G }_1\) is independent of \(\mathcal{G }_2\), then
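The equality in question, labeled (4.78) and used repeatedly below, is

```latex
\mathbb{E}\big( Y \,\big|\, \mathcal{G}_1 \vee \mathcal{G}_2 \big)
= \mathbb{E}\big( Y \,\big|\, \mathcal{G}_1 \big)
\quad \text{a.s.}
```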
Similarly, for any \(j \in \{0, \ldots , m-1\}\),
Then using the equality (4.78) with \(\mathcal{G }_1=\sigma (\varepsilon _{-1}, \ldots , \varepsilon _{j-m}) \) and \(\mathcal{G }_2=\sigma (\varepsilon _0)\), we get that, for any \(j \in \{1,\ldots ,m-1\}\),
whereas \(\mathbb{E }( X_{m,m} | \xi _{-1}) =0\) a.s. So, finally, \(\Vert P_{0} (X_{m,m}) \Vert _2 = \Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 \), \(\Vert P_{0} (X_{j,m}) \Vert _2 =0\) if \(j \ge m+1\), and, for any \(j \in \{1,\ldots ,m-1\}\),
Therefore, starting from (4.77), we infer that
On the other hand,
Since the random variables are centered, \(\mathrm{Cov} \big (X_0 - X_{0,m}, X_{k,m} \big )=\mathbb{E }\big ( X_{k,m} (X_0 - X_{0,m})\big )\). Since \(X_{k,m}\) is \(\sigma ( \varepsilon _{k-m}, \ldots , \varepsilon _k)\)-measurable,
But, for any \(k \in \{ 0, \ldots , m \}\), by using the equality (4.78) with \(\mathcal{G }_1=\sigma (\varepsilon _{0}, \ldots , \varepsilon _{k-m}) \) and \(\mathcal{G }_2=\sigma (\varepsilon _{k}, \ldots , \varepsilon _1)\), it follows that
and
Whence,
To handle the second term in the right-hand side of (4.80), we start by writing that
Using the fact that the random variables are centered together with stationarity, we get that
On the other hand, noticing that \(\mathbb{E }( X_k - X_{k,m} | \varepsilon _k, \ldots , \varepsilon _{k-m} ) =0\), and using the fact that the random variables are centered, and stationarity, it follows that
Next, using (4.81), we get that, for any \(k \in \{0, \ldots , m\}\),
Therefore, starting from (4.85), taking into account (4.86) and the fact that
we get that
Starting from (4.83), gathering (4.84) and (4.87), and using the fact that \(b(m) \le [m/2]\), we then derive that
which combined with (4.80) and (4.82) implies that
So, overall, starting from (4.74), gathering the upper bounds (4.75), (4.79), and (4.88), and taking into account the condition (2.3), we get that there exists a positive constant \(C_{12}\) not depending on \((n,M,m)\) and such that
Therefore, starting from (4.73), considering the upper bound (4.89), using the fact that \( m^2 k_{N,m} \le N\) and that \(\lim _{n \rightarrow \infty }c(n) = c\), it follows that there exists a positive constant \(C_{13}\) not depending on \((M,m)\) and such that
Letting first \(M\) tend to infinity and using the fact that \(X_0\) belongs to \(\mathbb{L }^2\), the first term in the right-hand side goes to zero. Letting now \(m\) tend to infinity, the third term vanishes by the condition (2.3), whereas the last one goes to zero by taking into account (4.76). To show that the second term goes to zero as \(m\) tends to infinity, we notice that, by stationarity, \(\Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 \le \Vert \mathbb{E }( X_m | \xi _0) \Vert _2 = \Vert \mathbb{E }( X_0| \xi _{-m}) \Vert _2\). By the reverse martingale convergence theorem, setting \(\mathcal{F }_{-\infty } = \bigcap _{k \in \mathbb{Z }} \sigma ( \xi _k)\), \( \lim _{m \rightarrow \infty }\mathbb{E }( X_0| \xi _{-m}) =\mathbb{E }( X_0| \mathcal{F }_{-\infty })=0 \) a.s. (since \(\mathcal{F }_{-\infty }\) is trivial and \(\mathbb{E }(X_0)=0\)). So, since \(X_0\) belongs to \(\mathbb{L }^2\), \(\lim _{m \rightarrow \infty }\Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 =0\). This ends the proof of (4.72) and then of Proposition 4.3. \(\square \)
4.4 End of the Proof of Theorem 2.1
According to Propositions 4.1, 4.2, and 4.3, the convergence (4.3) follows. Therefore, to end the proof of Theorem 2.1, it remains to show that (4.4) holds true with \(\mathbf{G}_n\) defined in Sect. 4.2. This can be achieved by using Theorem 1.1 in Silverstein [17] combined with arguments developed in the proof of Theorem 1 in [23] (see also [19]). With this aim, we consider \((y_k)_{k \in \mathbb{Z }}\) a sequence of i.i.d. real-valued random variables with law \(\mathcal{N } (0,1)\), and \(n\) independent copies of \((y_k)_{k \in \mathbb{Z }}\) that we denote by \((y^{(1)}_k)_{k \in \mathbb{Z }}, \ldots , (y^{(n)}_k)_{k \in \mathbb{Z }}\). For any \(i \in \{1, \ldots , n \}\), define \(\mathbf{{y}}_{i}=\big ( y_{1}^{(i)}, \ldots ,y_{N}^{(i)}\big )\). Let \(\mathcal Y _n=(\mathbf{{y}}^T_{1} \vert \cdots \vert \mathbf{{y}}^T_{n}) \) be the matrix whose columns are the \(\mathbf{{y}}^T_{i}\)’s and consider its associated sample covariance matrix \( \mathbf{Y}_n=\frac{1}{n}\mathcal Y _n\mathcal Y _{n}^{T} \). Let \(\gamma (k) = \mathrm{Cov} (X_0, X_k )\) and note that, by (4.31), \(\gamma (k)\) is also equal to \(\mathrm{Cov} (Z_0, Z_k )=\mathrm{Cov} (Z^{(i)}_0, Z^{(i)}_k )\) for any \(i\in \{1, \ldots , n \}\). Set
Note that \((\Gamma _N)\) is bounded in spectral norm. Indeed, by the Gerschgorin theorem, the largest eigenvalue of \(\Gamma _N\) is not larger than \(\gamma (0) + 2 \sum _{k \ge 1} \vert \gamma (k) |\) which, according to Remark 2.2, is finite. Note also that the vector \(( \mathbf{{Z}}_{1}, \ldots , \mathbf{{Z}}_{n})\) has the same distribution as \(\big ( \mathbf{{y}}_{1} \Gamma _N^{1/2}, \ldots , \mathbf{{y}}_{n}\Gamma _N^{1/2} \big )\) where \(\Gamma _N^{1/2}\) is the symmetric nonnegative square root of \(\Gamma _N\) and the \(\mathbf{{Z}}_{i}\)’s are defined in Sect. 4.2. Therefore, for any \(z\in \mathbb C ^+\), \(\mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) =\mathbb{E }\big ( S_{F^{\mathbf{A}_n}}(z) \big )\) where \(\mathbf{A}_n = \Gamma _N^{1/2}\mathbf{Y}_n\Gamma _N^{1/2}\). The proof of (4.4) is then reduced to prove that, for any \(z\in \mathbb C ^+\),
where \(S\) is defined in (2.4). According to Theorem 1.1 in Silverstein [17], if one can show that
then (4.91) holds with \(S\) satisfying the equation (1.4) in Silverstein [17]. Due to the Toeplitz form of \(\Gamma _N\) and to the fact that \(\sum _{k \ge 0} \vert \gamma (k) | < \infty \) (see Remark 2.2), the convergence (4.92) can be proved by taking into account the arguments developed in the proof of Theorem 1 of [23]. Indeed, the fundamental eigenvalue distribution theorem of Szegö for Toeplitz forms allows us to assert that the empirical spectral distribution of \(\Gamma _N\) converges weakly to a nonrandom distribution \(H\) that is defined via the spectral density of \((X_k)_{k \in \mathbb{Z }}\) (see Relations (12) and (13) in [23]). To end the proof, it suffices to notice that the relation (1.4) in Silverstein [17] combined with the relation (13) in [23] leads to (2.4). \(\square \)
References
Bai, Z., Silverstein, J.W.: Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics, 2nd edn. Springer, New York (2010)
Bai, Z., Zhou, W.: Large sample covariance matrices without independence structures in columns. Stat. Sinica 18, 425–442 (2008)
Chafaï, D., Guédon, O., Lecué, G., Pajor, A.: Interactions between compressed sensing, random matrices, and high dimensional geometry. Panoramas et Synthèses 38, Société Mathématique de France (2012)
Chatterjee, S.: A generalization of the Lindeberg principle. Ann. Probab. 34, 2061–2076 (2006)
Giraitis, L., Kokoszka, P., Leipus, R.: Stationary ARCH models: dependence structure and central limit theorem. Econom. Theory 16, 3–22 (2000)
Hall, P., Heyde, C.C.: Martingale Limit Theory and its Application. Probability and Mathematical Statistics. Academic Press, New York, London (1980)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
Ibragimov, I.A., Linnik, Yu.V.: Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff Publishing, Groningen (1971). Translation from the Russian edited by J.F.C. Kingman
Liu, J.S.: Siegel’s formula via Stein’s identities. Stat. Probab. Lett. 21, 247–251 (1994)
Marc̆enko, V., Pastur, L.: Distribution of eigenvalues for some sets of random matrices. Math. Sbornik 72, 507–536 (1967)
Merlevède, F., Peligrad, M., Utev, S.: Recent advances in invariance principles for stationary sequences. Probab. Surv. 3, 1–36 (2006)
Neumann, M.: A central limit theorem for triangular arrays of weakly dependent random variables, with applications in statistics. ESAIM Probab. Stat. 17, 120–134 (2013)
Pan, G.: Strong convergence of the empirical distribution of eigenvalues of sample covariance matrices with a perturbation matrix. J. Multivar. Anal. 101, 1330–1338 (2010)
Peligrad, M., Utev, S.: Central limit theorem for stationary linear processes. Ann. Probab. 34, 1608–1622 (2006)
Pfaffel, O., Schlemm, E.: Eigenvalue distribution of large sample covariance matrices of linear processes. Probab. Math. Stat. 31, 313–329 (2011)
Priestley, M.B.: Nonlinear and Nonstationary Time Series Analysis. Academic Press, Waltham (1988)
Silverstein, J.W.: Strong convergence of the empirical distribution of eigenvalues of large-dimensional random matrices. J. Multivar. Anal. 55, 331–339 (1995)
Silverstein, J.W., Bai, Z.D.: On the empirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivar. Anal. 54, 175–192 (1995)
Wang, C., Jin, B., Miao, B.: On limiting spectral distribution of large sample covariance matrices by \({\rm VARMA}(p, q)\). J. Time Ser. Anal. 32, 539–546 (2011)
Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Clarendon Press, Oxford (1965)
Wu, W.B.: Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA 102, 14150–14154 (2005)
Wu, W.B.: Asymptotic theory for stationary processes. Stat. Interface 4, 207–226 (2011)
Yao, J.: A note on a Marc̆enko-Pastur type theorem for time series. Stat. Probab. Lett. 82, 22–28 (2012)
Yin, Y.Q.: Limiting spectral distribution for a class of random matrices. J. Multivar. Anal. 20, 50–68 (1986)
Acknowledgments
The authors would like to thank the referee for carefully reading the manuscript and for numerous suggestions which improved the presentation of this paper. The authors are also indebted to Djalil Chafaï for helpful discussions.
Appendix
In this section, we give some upper bounds for the partial derivatives of \(f\) defined in (4.35).
Lemma 5.1
Let \(x\) be a vector of \(\mathbb{R }^{n N}\) with coordinates
Let \(z=u+iv\in \mathbb C ^+\) and \(f:=f_z\) be the function defined in (4.35). Then, for any \(i \in \lbrace 1 ,\ldots , n \rbrace \) and any \(j,k,\ell ,m \in \lbrace 1 ,\ldots , N \rbrace \), the following inequalities hold true:
and
Proof
Recall that \(f(x)=\frac{1}{N}\mathrm{Tr}\big (A(x)-z\mathbf{{I}}\big )^{-1}\) where \( A(x) = \frac{1}{n}\sum _{k=1}^{n}( x^{(k)})^Tx^{(k)}\). To prove the lemma, we shall proceed as in Chatterjee [4] (see the proof of Theorem 1.3 therein), but with some modifications, since his computations are made in the case where \(A(x)\) is a Wigner matrix of order \(N\).
Let \(i \in \lbrace 1, \ldots , n \rbrace \) and consider for any \(j,k \in \lbrace 1 , \ldots , N \rbrace \), the notations \( \partial _{j}\) instead of \(\partial / \partial x^{(i)}_{j} \), \( \partial _{jk}^{2}\) instead of \(\partial ^{2}/\partial x^{(i)}_{j} \partial x^{(i)}_{k}\) and so on. We shall also write \(A\) instead of \(A(x)\), \(f\) instead of \(f(x)\), and define \(G= \big (A-z\mathbf{{I}}\big )^{-1}\).
Note that \(\partial _{j}A\) is the matrix with \(n^{-1}\big (x^{(i)}_{1}, \ldots ,x^{(i)}_{j-1},2x^{(i)}_{j},x^{(i)}_{j+1},\ldots , x^{(i)}_{N}\big ) \) as its \(j\)th row, the transpose of this vector as its \(j\)th column, and zeros elsewhere. Thus, the Hilbert–Schmidt norm of \(\partial _{j}A\) is bounded as follows:
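Summing the squares of its non-zero entries (the off-diagonal ones appear twice, once in the \(j\)th row and once in the \(j\)th column), one gets

```latex
\big\Vert \partial_{j} A \big\Vert_2^2
= \frac{2}{n^2} \sum_{k \ne j} \big( x^{(i)}_k \big)^2
+ \frac{4}{n^2} \big( x^{(i)}_j \big)^2
\;\le\; \frac{4}{n^2} \sum_{k=1}^{N} \big( x^{(i)}_k \big)^2 .
```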
Now, for any \( m,j \in \lbrace 1 , \ldots , N \rbrace \) such that \(m \ne j \), \(\partial _{mj}^{2}A\) has only two non-zero entries which are equal to \(1/n\), whereas if \(m=j\), it has only one non-zero entry which is equal to \(2/n\). Hence,
Finally, note that \(\partial _{\ell mj}^{3}A\equiv 0\) for any \(j , m , \ell \in \lbrace 1 , \ldots , N \rbrace \).
Now, by using (4.36), it follows that, for any \(j \in \lbrace 1 , \ldots , N \rbrace \),
In what follows, the notations \(\sum _{\lbrace j',m'\rbrace =\lbrace j,m\rbrace }\), \(\sum _{\lbrace j',m',\ell '\rbrace =\lbrace j,m, \ell \rbrace }\) and \(\sum _{\lbrace j',m',\ell ' , k'\rbrace =\lbrace j,m, \ell , k \rbrace }\) mean, respectively, the sum over all permutations of \(\lbrace j,m\rbrace \), of \(\lbrace j,m, \ell \rbrace \), and of \(\lbrace j,m, \ell , k\rbrace \). Therefore, the first sum consists of \(2\) terms, the second one of \(6\) terms, and the last one of \(24\) terms. Starting from (5.3) and applying (4.36) repeatedly, we then derive the following cumbersome formulas for the partial derivatives up to order four: for any \( j,m,\ell ,k \in \{1, \ldots , N \}\),
and
where
and
We start by giving an upper bound for \(\partial _{mj}^{2}f\). Since the eigenvalues of \(G^2\) are all bounded in modulus by \(v^{-2}\), so are its entries. Then, as \( \mathrm{Tr}(G(\partial _{mj}^{2}A)G) = \mathrm{Tr}((\partial _{mj}^{2}A)G^2) \), it follows that
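In detail, since \(\partial_{mj}^{2} A\) has at most two non-zero entries whose absolute values sum to \(2/n\), and each entry of \(G^2\) is bounded in modulus by \(v^{-2}\),

```latex
\big| \mathrm{Tr}\big( (\partial_{mj}^{2} A)\, G^{2} \big) \big|
= \Big| \sum_{k, \ell} (\partial_{mj}^{2} A)_{k \ell}\, (G^{2})_{\ell k} \Big|
\;\le\; \frac{2}{n v^{2}} .
```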
Next, to give an upper bound for \( | \mathrm{Tr}\big (G(\partial _{j}A) G (\partial _{m}A) G \big ) |\), it is useful to recall some properties of the Hilbert–Schmidt norm: Let \(B=(b_{ij})_{1\le i,j\le N}\) and \(C=(c_{ij})_{1\le i,j\le N}\) be two \(N\times N\) complex matrices in \(\mathcal L _{2}\), the set of Hilbert–Schmidt operators. Then
(a) \(|\mathrm{Tr}(BC)|\le \Vert B\Vert _{2} \Vert C\Vert _{2}\).
(b) If \(B\) admits a spectral decomposition with eigenvalues \(\lambda _1, \ldots , \lambda _N\), then \(\max \lbrace \Vert BC\Vert _{2}, \Vert CB \Vert _{2}\rbrace \le \max _{1 \le i \le N} | \lambda _i| \cdot \Vert C \Vert _{2}\).
(See, e.g., [20], pages 55–58, for a proof of these facts.)
Using the properties of the Hilbert–Schmidt norm recalled above, the fact that the eigenvalues of \(G\) are all bounded by \(v^{-1}\), and (5.1), we then derive that
Starting from (5.4) and considering (5.7) and (5.8), the first inequality of Lemma 5.1 follows.
Next, using again the above properties (a) and (b), the fact that the eigenvalues of \(G\) are all bounded by \(v^{-1}\), (5.1) and (5.2), we get that
and
The same last bound is obviously valid for \(| \mathrm{Tr}(G(\partial _{m}A)G(\partial ^2_{\ell j} A) G)| \). Hence, starting from (5.5) and considering (5.9) and (5.10), the second inequality of Lemma 5.1 follows.
It remains to prove the third inequality of Lemma 5.1. Using again the above properties (a) and (b), the fact that the eigenvalues of \(G\) are all bounded by \(v^{-1}\), (5.1) and (5.2), we infer that
and
Clearly, the bound (5.12) is also valid for the quantities \(| \mathrm{Tr}( G(\partial _{m}A)\!G(\partial ^2_{\ell j} A)\!G (\partial _{k}A)\!G)| \) and \(| \mathrm{Tr}( G(\partial _{m}A)G (\partial _{k}A)G(\partial ^2_{\ell j} A)G)| \). So, overall, starting from (5.6) and considering (5.11), (5.12), and (5.13), the third inequality of Lemma 5.1 follows. \(\square \)
Banna, M., Merlevède, F. Limiting Spectral Distribution of Large Sample Covariance Matrices Associated with a Class of Stationary Processes. J Theor Probab 28, 745–783 (2015). https://doi.org/10.1007/s10959-013-0508-x
Keywords
- Sample covariance matrices
- Weak dependence
- Lindeberg method
- Marc̆enko–Pastur distributions
- Limiting spectral distribution