Abstract
We consider a class of random matrices \(B_{N}=X_{N}T_{N}X_{N}^{t},\) where \( X_{N}\) is a matrix \((N\times n(N))\) whose rows are independent, the entries \(X_{ij}\) in each row satisfy an autoregressive relation AR(1), and \( T_{N}\) is a diagonal matrix independent of \(X_{N}\). Under some conditions, we show that if the empirical distribution function of eigenvalues of \(T_{N}\) converges almost surely to a proper probability distribution as \(N\longrightarrow \infty \) and \(\frac{n(N)}{N}\longrightarrow c>0\), then the empirical distribution function of eigenvalues of \(B_{N}\) converges almost surely to a non-random limit function given by Marcenko and Pastur. Numerical simulations illustrate the behavior of kernel density estimators and density estimators of Stieltjes transform around the true density and we give a numerical comparison on the base of \(L_{1}\) error varying different parameters.
Keywords
- Large dimensional random matrix
- Empirical distribution function of eigenvalues
- Covariance matrix
- Autoregressive processes
- Stieltjes transform
- Kernel density estimators
1 Introduction
Theoretical studies of covariance matrices have a long history; they arise in many domains and are closely linked to practical problems (see [1] and [9]). For example, in multivariate statistics, spectral asymptotic results are used to solve the detection problem in signal processing [9].
Consider the following random matrix:
where \(X_{N}=(\frac{1}{\sqrt{N}}X_{ij}),\) (\(i=1,\ldots ,N;\) \(j=1,\ldots ,n(N)\)) is an \((N\times n(N))\) matrix with independent rows, the entries \(X_{ij}\) of each row satisfy an autoregressive relation AR(1), and \(T_{N}\) is an \((n\times n)\) diagonal matrix with real entries, independent of \(X_{N}\) (\(X_{N}^{t}\) is the transpose of \(X_{N}\)). More precisely, for each \(i\ge 1\) we have
where \(\left( \varepsilon _{ij},i,j\ge 1\right) \) are i.i.d. random variables (rv's) with mean 0 and variance \(\sigma ^{2}>0,\) such that \(\varepsilon _{ij}\) admits a continuous density with respect to Lebesgue measure. The parameter \(\rho \) satisfies \( \left| \rho \right| <1\), ensuring a strictly stationary process. The diagonal matrix \(T_{N}=diag\left( \tau _{1},\ldots ,\tau _{n}\right) \) is independent of \(X_{N}\) and the rv's \(\tau _{i}\) are real.
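The model above can be sketched numerically as follows; the values of \(N\), \(n\), \(\rho \), and the uniform law chosen for the \(\tau _{i}\) are illustrative assumptions of the sketch, not prescriptions of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_rows(N, n, rho, sigma=1.0):
    """N independent rows, each a stationary AR(1) path: X_j = rho * X_{j-1} + eps_j."""
    X = np.empty((N, n))
    # start each row at its stationary marginal N(0, sigma^2 / (1 - rho^2))
    X[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho**2), size=N)
    eps = rng.normal(0.0, sigma, size=(N, n))
    for j in range(1, n):
        X[:, j] = rho * X[:, j - 1] + eps[:, j]
    return X

N, n, rho = 200, 300, 0.5                # n/N -> c = 1.5 (illustrative)
X = ar1_rows(N, n, rho) / np.sqrt(N)     # X_N = (X_ij / sqrt(N))
tau = rng.uniform(0.5, 1.5, size=n)      # diagonal of T_N, drawn independently of X_N
B = X @ np.diag(tau) @ X.T               # B_N = X_N T_N X_N^t (symmetric, N x N)
eigs = np.sort(np.linalg.eigvalsh(B))    # eigenvalues entering the e.d.f. F^{B_N}
```

Each row is started at its stationary marginal, so the lag-one correlation within a row is exactly \(\rho \).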
The empirical distribution function \(\left( e.d.f.\right) \) of the eigenvalues (\(\lambda _{i})\) of the symmetric matrix \(B_{N}\) is defined by
where \({1}_{A}\) denotes the indicator function of the set A.
A large number of papers have dealt with the problem of identifying the limit of the e.d.f. of eigenvalues of random matrices \( B_{N}\) as \(N\longrightarrow \infty \) and \(\frac{n(N)}{N}\longrightarrow c>0\). Marcenko and Pastur [7] originally studied this problem for more general forms of random matrices. They established, under some moment conditions, that if the e.d.f. \(F^{T_{N}}\) converges to a proper distribution function H, then \(F^{B_{N}}\) converges in probability to a proper distribution function. Their method involves the Stieltjes transform: they showed that the Stieltjes transform of the limiting distribution function satisfies a first-order partial differential equation, and then, via the method of characteristics, that this function solves an algebraic equation, thereby identifying the limit.
Afterward, several authors [4, 5, 8, 11, 12] extended this result, establishing the almost sure convergence of the e.d.f. of eigenvalues under mild conditions on the entries \(X_{ij}.\) Most of these papers employ the same transform as [7] and assume that the entries \(X_{ij}\) are independent random variables, with the exception of [3], where dependent entries are considered.
Our goal in this paper is to study, under some assumptions, the limit of the e.d.f. \(F^{B_{N}}\) of the random matrix \(B_{N}=X_{N}T_{N}X_{N}^{t}\), where the entries \(X_{ij}\) of the matrix \(X_{N}\) satisfy an autoregressive relation AR(1) for each i. We follow the approach of [8], where the authors apply the Marcenko–Pastur method to study the limit of the Stieltjes transform of the e.d.f. \(F^{B_{N}}\), and we then identify the limit law. Numerical simulations illustrate the behavior of kernel density estimators and density estimators based on the Stieltjes transform around the true density, and we report \(L_{1}\) errors for various parameter values.
The paper is organized as follows. Section 2 states the main result. Section 3 presents numerical simulations. The proof of the main result is postponed to Sect. 4.
2 Main Result
First, we introduce some random variables and random matrices. We truncate and centralize the entries \(X_{ij}\) of the random matrix \(X_{N}\) to obtain new corresponding random matrices as follows: for \(i=1,\ldots ,N;\) \(j=1,\ldots ,n(N)\), let
and
We point out that the problem described above has often been handled via the Stieltjes transform. Let \( \mathcal {M}\left( \mathbb {R} \right) \) be the set of distribution functions on \( \mathbb {R}.\) Recall that the Stieltjes transform of a distribution function \(F\in \mathcal {M} \left( \mathbb {R} \right) \) is defined by
where \({\mathfrak {I}\mathtt {m}}\) is the imaginary part. The inversion formula is given by
where a and b are continuity points of F. Also, the weak convergence of probability distribution functions is equivalent to the convergence of Stieltjes transforms (Theorem B.9, [1]). From the inversion formula, it follows that for any countable set \(S\subset \mathbb {C} ^{+}\) such that \( \mathbb {R} \subset \bar{S}\) the closure of S, and a sequence \((F_{N})\in \mathcal {M} \left( \mathbb {R} \right) \), \(F\in \mathcal {M}\left( \mathbb {R} \right) \), we have the following equivalence:
where \(F_{N}\rightarrow F\) denotes the vague convergence of distribution functions.
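The empirical Stieltjes transform and the inversion formula above can be sketched as follows: evaluating \(m\) at \(z=x+iy\) for a small fixed \(y>0\) yields a smoothed approximation of the density. The choice of \(y\) is an assumption of the sketch.

```python
import numpy as np

def stieltjes(eigs, z):
    """Empirical Stieltjes transform m(z) = (1/N) sum_i 1/(lambda_i - z), Im z > 0."""
    return np.mean(1.0 / (eigs - z))

def density_from_stieltjes(eigs, xs, y=0.05):
    """Inversion formula: f(x) is approximated by (1/pi) Im m(x + iy) for small y."""
    return np.array([stieltjes(eigs, x + 1j * y).imag / np.pi for x in xs])
```

For a point mass at \(\lambda \) this returns the Cauchy kernel of scale \(y\) centered at \(\lambda \), so the approximation integrates to 1 and concentrates on the spectrum as \(y\rightarrow 0\).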
Furthermore, we consider the following random matrices. For \(j,l=1,2,\ldots ,n(N)\), denote by \(\bar{q}_{j}\) the jth column of \(\bar{X}_{N}\) defined by (5), that is,
and by
where \(\tau _{j}\) are the elements of \(T_{N},\) and define
where \(m_{F^{\bar{B}_{N}}}\) and \(m_{F^{\bar{B}_{\left( j\right) }}}\) are the Stieltjes transforms of the matrices \(\bar{B}_{N}\) and \(\bar{B}_{\left( j\right) }\), respectively. Finally, set
where I is the identity matrix.
Now, we state the main result of this paper, giving the almost sure limit of the e.d.f. of the eigenvalues of the random matrix \(B_N\) (tr denotes the trace of a matrix).
Theorem 1
Assume
-
(a)
For \(N=1,2,\ldots ,\) let\(\ \ X_{N}=\left( \frac{1}{\sqrt{N }}X_{ij}\right) \) be a matrix \(\left( N\times n(N)\right) \) with independent rows and an AR(1) autoregressive relation (2) in each row. The entries \( X_{ij}, i, j \ge 1, \) have all their moments finite and \(\frac{n(N)}{N}\rightarrow c>0\) as \(N\rightarrow \infty \).
-
(b)
\(T_{N}= diag \left( \tau _{1},\ldots ,\tau _{n}\right) ,\) \(\tau _{i}\in \mathbb {R}\), and the e.d.f. of \(T_{N}\) converges almost surely to a distribution function H as \(N\rightarrow \infty .\)
-
(c)
The matrices \(X_{N}\) and \(T_{N}\) are independent.
-
(d)
For \(k=1,2\) and \(j=1,2,\ldots ,n(N),\) the matrices \( C_{\left( j\right) }^{k}\) defined in (10) satisfy \(E\left| V_{j}^{t}C_{\left( j\right) }^{k}V_{j}-trC_{\left( j\right) }^{k}\right| ^{6}\le KN^{3},\) where \(V_{j}\) is given by (7) and \(K>0\) is a constant.
Then, the e.d.f. \(F^{B_{N}}\) of the random matrix \( B_{N}=X_{N}T_{N}X_{N}^{t}\) converges vaguely almost surely to a distribution function F, as \(N\longrightarrow \infty \), whose Stieltjes transform \(m_{F}\)(z) satisfies the following functional relation:
Remark Assumption (a) is fulfilled in particular if the white noise \((\varepsilon _{ij})\) has all moments finite (e.g., Gaussian white noise). Assumptions (b) and (c) are standard and analogous to those in [8]. Assumption (d) requires a control of the sixth moment of the difference between a quadratic form of a matrix and its trace by the third power of the size N. In the case of i.i.d. entries \( X_{ij}, i, j \ge 1\), assumption (d) is fulfilled (cf. Lemma 3.1 in [8]).
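As a sanity check of the limiting relation, note that in the special i.i.d. case \(\rho =0\), \(T_{N}=I\), the limit F is the classical Marcenko–Pastur law; for \(c=1\) and \(\sigma ^{2}=1\) its density is \(f(x)=\sqrt{(4-x)/x}/(2\pi )\) on (0, 4]. The following small simulation (our illustration, not taken from the paper) compares the spectrum of \(B_{N}\) with this density.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 400
X = rng.normal(size=(N, N)) / np.sqrt(N)   # i.i.d. entries (rho = 0), c = n/N = 1
B = X @ X.T                                # B_N with T_N = I
eigs = np.linalg.eigvalsh(B)               # spectrum concentrates on (0, 4]

def mp_density_c1(x):
    """Marcenko-Pastur density for c = 1, sigma^2 = 1, supported on (0, 4]."""
    return np.sqrt(np.maximum(4.0 - x, 0.0) / x) / (2.0 * np.pi)
```

A histogram of `eigs` plotted against `mp_density_c1` visualizes the almost sure convergence of \(F^{B_{N}}\).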
3 Numerical Simulations
As a practical impact of the main result, we illustrated in [6] the behavior of the empirical density estimator of the e.d.f. of eigenvalues \((\lambda _{i}, i=1,\ldots ,N)\) of large random matrices \(B_{N}\), and identified the density function of the limit law by numerical simulations. First, we recall the formulas giving the density of the limit law and the empirical Stieltjes transform estimator. From [10], for all \(x\in \mathbb {R} -\left\{ 0\right\} \) and \(z=x+iy\), \(y>0\), the distribution function F (limit of the e.d.f. \( F^{B_{N}}\)) has a continuous derivative f defined by \( f(x)=\left( 1/\pi \right) {\mathfrak {I}\mathtt {m}}\,m_{0}(x), \) where \(m_{0}(x):=\lim _{z\rightarrow x} m_{F}(z)\) is the boundary value of the Stieltjes transform \(m_{F}\left( z\right) \) (Figs. 1, 2 and Table 1).
The Stieltjes Transform Estimator (STE) is defined by
where \(m_{F^{B_{N}}}\left( z\right) =\frac{1}{N}tr\left( B_{N}-zI\right) ^{-1}=\ \frac{1}{N}\sum \limits _{i=1}^{N}\left( \lambda _{i}-z\right) ^{-1}.\)
Now, we apply Gaussian Kernel Estimators (GKE) defined by
where \(h_{N}\) is the bandwidth, with \(h_{N}\rightarrow 0\) and \( Nh_{N}\rightarrow \infty \), and K is the Gaussian kernel: \( K(u)=\frac{1}{\sqrt{2\pi }}\exp (-\frac{1}{2}u^{2}). \)
We compare the performance of the Stieltjes Transform Estimator (STE) and the Gaussian Kernel Estimator (GKE) on the basis of \(L_{1}\)-errors (Fig. 3 and Table 2).
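Such an \(L_{1}\) comparison can be sketched in the i.i.d. benchmark case \(\rho =0\), \(T_{N}=I\), \(c=1\), where the true density is the Marcenko–Pastur density; the smoothing parameter y of the STE and the bandwidth \(h_{N}=N^{-1/3}\) of the GKE are illustrative assumptions, not the choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 400
X = rng.normal(size=(N, N)) / np.sqrt(N)
eigs = np.linalg.eigvalsh(X @ X.T)           # spectrum of B_N in the benchmark case

xs = np.linspace(0.05, 3.95, 400)
f_true = np.sqrt((4.0 - xs) / xs) / (2.0 * np.pi)   # Marcenko-Pastur density, c = 1

# STE: f(x) is approximated by (1/pi) Im m_{F^{B_N}}(x + iy), small y > 0 (assumed)
y = 0.05
f_ste = np.array([np.mean(1.0 / (eigs - (x + 1j * y))).imag / np.pi for x in xs])

# GKE with Gaussian kernel K and an assumed bandwidth h_N = N^(-1/3)
h = N ** (-1.0 / 3.0)
f_gke = np.mean(
    np.exp(-0.5 * ((xs[:, None] - eigs[None, :]) / h) ** 2), axis=1
) / (h * np.sqrt(2.0 * np.pi))

dx = xs[1] - xs[0]
l1_ste = np.sum(np.abs(f_ste - f_true)) * dx   # L1 errors on the grid
l1_gke = np.sum(np.abs(f_gke - f_true)) * dx
```

Repeating this over a grid of \((N, c, \rho )\) values reproduces the kind of \(L_{1}\) tables reported in [6].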
Conclusion
From further numerical simulations, we observe that the performance of the estimators depends strongly on the choice of the dimension ratio c, the AR parameter \(\rho \), and the sample size N. The variability of these parameters has a direct impact on the stabilization and convergence rate of the estimators. Particular parameter choices confirm good estimator performance and suggest optimal values for these parameters. We also observe an effect of the ratio c on the convergence rate of the density estimators. For \(c>1\), for both weak and strong dependence (\(\rho =0.2\) and \(\rho =0.7\)), the estimators perform well from \(N=100\) on. However, for small values of c (\(c<0.2\)), the parameter values influence the convergence rate: for weak dependence (\(\rho =0.2\)) the STE performs quite well already for the moderate value \(N=100\), whereas for strong dependence (\(\rho =0.7\)) the estimator is accurate enough only for large N (\(N>1000\)). The number of eigenvalues of \(T_{N}\) affects both the behavior and the performance of the estimators. Overall, both estimators perform well and give a good representation of the true density, with a small advantage for the GKE.
4 Proof of the Main Result
Recall the following well-known facts. For each i, the process \((X_{ij},\ j\in \mathbb {Z})\) satisfying relation (2) is a stationary AR(1) process; hence it satisfies the geometric strong mixing (G.S.M.) property with strong mixing coefficient \(\alpha _{k}=\alpha _{k}(\mathcal {F}_{0}^{m},\mathcal {F}_{m+k}^{\infty })=O\left( \rho ^{k}\right) \), where \(0<\rho <1\) and \(\mathcal {F}_{a}^{b}=\mathcal {F}_{a,i}^{b}=\sigma (X_{ij},\ a\le j \le b)\), whenever \(\varepsilon _{ij}\) has a strictly positive continuous density (see [2], p. 58).
The covariance between two real-valued rv's is bounded as follows: if \(\eta \in L^{p}\) and \(\xi \in L^{q}\) are \(\mathcal {F}_{0}^{m}\)- and \(\mathcal {F}_{m+k}^{\infty }\)-measurable, respectively, then we have
for all \(1\le p,q,r\le \infty \) with \(\frac{1}{p}+\frac{1}{q}+\frac{1}{r}=1,\) where \(\left\| \cdot \right\| _{p}=E^{\frac{1}{p}}\left| \cdot \right| ^{p}\).
On the other hand, there exists a distance D(., .) on the space \(\mathcal {M}\left( \mathbb {R} \right) ,\) such that for two sequences \((F_{N}),(G_{N})\in \mathcal {M}\left( \mathbb {R} \right) ,\) we have (see [8]).
where \(\left\| .\right\| \) denotes the sup-norm of bounded functions from \( \mathbb {R} \) to \( \mathbb {R}.\)
To lighten notation, the dependence of most variables on N will occasionally be dropped. Now, we replace T by a suitable matrix for further analysis: for \(\theta \ge 0\) define \(T_{\theta }=diag\left( \tau _{1}{1}_{\left( \left| \tau _{1}\right| \le \theta \right) },\ldots ,\tau _{n}{1}_{\left( \left| \tau _{n}\right| \le \theta \right) }\right) ,\) and let Q be any \(\left( N\times n\right) \) matrix. If \(\theta \) and \(-\theta \) are continuity points of H, then by Lemma 2.5 of [8] and assumption (b) of Theorem 1, as \(N\rightarrow \infty \) and \(\frac{n}{N}\rightarrow c>0,\) we have
It follows that if \(\theta =\theta _{N}\rightarrow \infty \), then
Choose \(\theta \) such that
and
To continue, we need the following result.
Lemma 1
Let \(X=\left( \frac{1}{\sqrt{N}}X_{ij}\right) \) be an \(\left( N\times n\right) \) matrix satisfying assumption (a) of Theorem 1, and let \(\hat{X} =\left( \frac{1}{\sqrt{N}}\hat{X}_{ij}\right) \), where \(\hat{X}_{ij}=X_{ij}{1}_{\left( \left| X_{ij}\right| <\sqrt{N}\right) }\). For \(\theta \ge 0 \) set \(T_{\theta }=diag\left( \tau _{1}{1}_{\left( \left| \tau _{1}\right| \le \theta \right) },\ldots ,\tau _{n}{1}_{\left( \left| \tau _{n}\right| \le \theta \right) }\right) ,\ \tau _{i}\in \mathbb {R}.\) We have
Proof
From Corollary A.42 of [1], we find
To show that this distance tends almost surely to 0, it suffices, by the Borel–Cantelli lemma, to show that \(\left[ \frac{\theta ^{2}}{N}tr\left( XX^{t}-\hat{X}\hat{X}^{t}\right) \right] \) tends to 0 and that \(\left[ \frac{4}{N}tr\hat{X}\hat{X}^{t}\right] \) is bounded almost surely. The result follows.
4.1 Proof of Theorem 1
Let \(X=\left( \frac{1}{\sqrt{N}} X_{ij}\right) \) be an \(\left( N\times n\right) \) matrix satisfying assumption (a) of Theorem 1. Using inequality (12) and the fact that \(\left( X_{ij}\right) \) satisfies the G.S.M. property, we obtain
where
With the same arguments, we may deduce a bound of the variance
Using (14) and (13), we may write
Furthermore, by Lemma 1, we get
For \(\hat{B}_{N}\) and \(\tilde{B}_{N}\) defined by relations (3) and (4), we have from Lemma 2.5 of [8],
Let \({\bar{\bar{X}}}_{ij}=\tilde{X}_{ij}-\bar{X}_{ij}\). Hence,
where \(\tilde{X}_{ij}\) and \(\bar{X}_{ij}\) are defined by the relations (4), (5), respectively.
Then, by the Cauchy–Schwarz inequality, we can show that the squared distance
is bounded by
Therefore, in order to show that almost surely
it suffices to verify that
Since \(E\left( {\bar{\bar{X}}}_{11}\right) =0\) and\(\ {\bar{\bar{X}}} _{ij}=\tilde{X}_{ij}{1}_{\left( \left| X_{ij}\right| \ge \ln N\right) }+E\tilde{X}_{ij}{1} _{\left( \left| X_{ij}\right| <\ln N\right) },\) we have
For \(p\ge 4,\)
By the dominated convergence theorem, we get
For \(p\ge 4\), by the definition of the rv's \(\bar{X}_{11}\), we have
From (15), (23), (24), \(E(\left| X_{11}\right| ^{4}{1}_{\left( \ln N\le \left| X_{11}\right| <\sqrt{N}\right) })\le NE\left| X_{11}\right| ^{2}\) and relation (17), we may write
Also (18) gives
where the latter bound is summable by (16).
Hence, we obtain \(\frac{1}{N}\theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2}\rightarrow 0\) a.s.
Now it remains to show that \(\frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}=O\left( 1\right) \) a.s. Using (17), (25) and (26), we find
Consequently, \( E\left[ \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] -\frac{n}{N }\left( \frac{n}{N}+1-\frac{2}{N}\right) E^{2}\left| \bar{X}_{11}\right| ^{2} \rightarrow 0, \) and,
\( E\left[ \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] \rightarrow \gamma ^{2}\left[ c\left( c+1\right) \right] . \)
Concerning the variance, by (18), (25) and (26), we may obtain
which is summable. Then, (22) is verified, from which (21) follows. This result together with (14) allows us to write \( D\left( F^{\tilde{X}T\tilde{X}^{t}}, F^{\bar{X}T\bar{X}^{t}}\right) \rightarrow 0\ \ a.s. \)
From (19) and (20), in order to prove \(D\left( F^{XTX^{t}},F\right) \rightarrow 0\) a.s., it suffices to verify that \( D\left( F^{\bar{X}T\bar{X}^{t}},F\right) \rightarrow 0\ \ a.s. \) To this aim, we shall show that for any \(z\in \mathbb {C} ^{+}\), \( m_{F^{\bar{X}T\bar{X}^{t}}}\left( z\right) \rightarrow m_{F}\left( z\right) \ \ a.s. \)
Let \(z\in \mathbb {C} ^{+}\) and \(\bar{B}_{N}=\bar{X}T\bar{X}^{t}\); the sequence \(\{F^{\bar{B} _{N}}\}\) satisfies the assumptions of Lemma 2.8 of [8]. So there exists \(m>0\) such that
Write \( \bar{B}_{N}-zI=\left( x-z\right) I+\bar{X}T\bar{X}^{t}-xI,\) and then
where
with \(\bar{q}_{j}\) denoting the jth column of \(\bar{X}\), and \(\bar{B} _{\left( j\right) },\) x, \(x_{\left( j\right) }\) defined by relations (8) and (9).
Lemma 3.1 of [8] and assumption (d) of Theorem 1 permit us to obtain
where
Lemma 2.6 of [8] gives \( \max _{j\le n}\max [ \mid \gamma _1\mid ,\mid \gamma _2\mid ]\rightarrow 0, \) where \(\gamma _1=m_{F^{\bar{B}_{\left( j\right) }}}\left( z\right) -m_{F^{\bar{B}_{N}}}\left( z\right) ,\) \( \gamma _2=m_{F^{\bar{B}_{N}}}\left( z\right) -\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\bar{q}_{j}. \)
So that for N large enough, we have, \( \max _{j\le n}\max [ \left| \mathfrak {I}\mathtt {m}\gamma _1\right| ,\left| \mathfrak {I}\mathtt {m}\gamma _2\right| ]<\frac{\delta }{2}. \)
Then, for \(j,l\le n,\)
and
Therefore,
Using Lemmas 2.6 and 2.7 of [8] together with (28) and (29), we obtain
Since
we may conclude from (27) that
Hence, the relation (11) is satisfied.
References
Bai, Z.D., Silverstein, J.W.: Spectral Analysis of Large Dimensional Random Matrices, 2nd edn. Springer (2010)
Bosq, D.: Linear processes in function spaces. Theory and Applications. Lecture Notes in Statistics, vol. 149. Springer (2000)
Boutet de Monvel, A., Khorunzhy, A.: Limit theorems for random matrices with correlated entries. Markov Process. Relat. Fields 4(2), 175–197 (1998)
Grenander, U., Silverstein, J.W.: Spectral analysis of networks with random topologies. SIAM J. Appl. Math. 32(2), 499–519 (1977)
Jonsson, D.: Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivar. Anal. 12, 1–38 (1982)
Khettab, Z., Mourid, T.: Eigenvalues empirical distribution of covariance matrices of AR processes. A simulation study. Annales de l’ISUP, 60, fasc 1–2, 3–22 (2016)
Marcenko, V.A., Pastur, L.A.: Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72(114), 507–536 (1967)
Silverstein, J.W., Bai, Z.D.: On the empirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivar. Anal. 54(2), 175–192 (1995)
Silverstein, J.W., Bai, Z.D., Couillet, R., Debbah, M.: Eigen-inference for energy estimation of multiple sources. IEEE Trans. Inf. Theor. 57(4), 2420–2439 (2011)
Silverstein, J.W., Choi, S.I.: Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivar. Anal. 54(2), 295–309 (1995)
Wachter, K.W.: The limiting empirical measure of multiple discriminant ratios. Ann. Stat. 8, 937–957 (1980)
Yin, Y.Q.: Limiting spectral distribution for a class of random matrices. J. Multivar. Anal. 20, 50–68 (1986)
Acknowledgements
The authors would like to thank the Editor and anonymous referees for insightful comments improving the presentation of this paper.
Khettab, Z., Mourid, T. (2018). Eigenvalues Distribution Limit of Covariance Matrices with AR Processes Entries. In: Rojas, I., Pomares, H., Valenzuela, O. (eds) Time Series Analysis and Forecasting. ITISE 2017. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96944-2_6