1 Introduction

Theoretical studies of covariance matrices have a long history; they appear in many real-world domains and are linked to practical problems (see [1] and [9]). For example, in multivariate statistics, spectral asymptotic results are used to solve the detection problem in signal processing [9].

Consider the following random matrix:

$$\begin{aligned} B_{N}=X_{N}T_{N}X_{N}^{t} \end{aligned}$$
(1)

where \(X_{N}=(\frac{1}{\sqrt{N}}X_{ij}),\) (\(i=1,\ldots ,N;\) \(j=1,\ldots ,n(N)\)) is an \((N\times n(N))\) matrix with independent rows, the entries \(X_{ij}\) of each row satisfying an autoregressive relation AR(1), and \(T_{N}\) is an \((n\times n)\) diagonal matrix with real entries, independent of \(X_{N}\) (\(X_{N}^{t}\) denotes the transpose of \(X_{N}\)). More precisely, for each \(i\ge 1\) we have

$$\begin{aligned} X_{ij+1}=\rho X_{ij}+\varepsilon _{ij+1},\ \ \ \ \ \ \ j\ge 1 \end{aligned}$$
(2)

where \(\left( \varepsilon _{ij},i,j\ge 1\right) \) are i.i.d. rv’s (random variables) with mean 0 and variance \(\sigma ^{2}>0,\) such that \(\varepsilon _{ij}\) admits a continuous density with respect to Lebesgue measure. The parameter \(\rho \) satisfies \( \left| \rho \right| <1,\) ensuring that the process is strictly stationary. The diagonal matrix \(T_{N}=diag\left( \tau _{1},\ldots ,\tau _{n}\right) \) is independent of \(X_{N}\) and the rv’s \(\tau _{i}\) are real.
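To fix ideas, here is a minimal simulation sketch of the model (1)–(2). The concrete choices (Gaussian white noise, \(\rho =0.5\), \(T_{N}=I\)) are illustrative assumptions, not requirements of the model:

```python
import numpy as np

def simulate_B(N, n, rho=0.5, sigma=1.0, tau=None, seed=None):
    """Simulate B_N = X_N T_N X_N^t of (1), with AR(1) rows as in (2)."""
    rng = np.random.default_rng(seed)
    eps = sigma * rng.standard_normal((N, n))      # i.i.d. noise; Gaussian is an assumption
    X = np.empty((N, n))
    X[:, 0] = eps[:, 0] / np.sqrt(1.0 - rho**2)    # stationary start: Var = sigma^2/(1-rho^2)
    for j in range(1, n):                          # AR(1) recursion (2) along each row
        X[:, j] = rho * X[:, j - 1] + eps[:, j]
    X /= np.sqrt(N)                                # entries scaled by 1/sqrt(N)
    tau = np.ones(n) if tau is None else np.asarray(tau, dtype=float)
    return (X * tau) @ X.T                         # equals X_N diag(tau) X_N^t

B = simulate_B(N=500, n=1000)                      # here c = n/N = 2
eigvals = np.linalg.eigvalsh(B)                    # spectrum of the symmetric matrix B_N
```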

The empirical distribution function \(\left( e.d.f.\right) \) of the eigenvalues (\(\lambda _{i})\) of the symmetric matrix \(B_{N}\) is defined by

$$ F^{B_{N}}\left( x\right) =\frac{1}{N}\sum \limits _{i=1}^{N}{1}_{\left( \lambda _{i}\le x\right) } , $$

where \({1}_{A}\) denotes the indicator function of the set A.
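Numerically, the e.d.f. is just the fraction of eigenvalues not exceeding a threshold; a minimal sketch, continuing the simulation above:

```python
import numpy as np

def edf(eigvals, x):
    """Empirical distribution function F^{B_N}(x) of the eigenvalues."""
    return np.mean(eigvals <= x)

# e.g., edf(eigvals, 1.0) gives the fraction of eigenvalues at most 1
```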

A large number of papers have dealt with the problem of identifying the limit of the e.d.f. of the eigenvalues of random matrices \( B_{N}\) as \(N\longrightarrow \infty \) and \(\frac{n(N)}{N}\longrightarrow c>0\). Marcenko and Pastur [7] originally studied this problem for more general forms of random matrices. They established, under some moment conditions, that if the e.d.f. \(F^{T_{N}}\) converges to a proper distribution function H, then \(F^{B_{N}}\) converges in probability to a proper distribution function. Their method involves the Stieltjes transform: they showed that the Stieltjes transform of the limiting distribution function satisfies a first-order partial differential equation and then, via the method of characteristics, that this function solves an algebraic equation, thereby identifying the limit.

Afterward, several authors [4, 5, 8, 11, 12] extended this result, obtaining the almost sure convergence of the e.d.f. of the eigenvalues under mild conditions on the entries \(X_{ij}.\) Most of these papers employ the same transform as [7], and the entries \(X_{ij}\) of the matrices are independent random variables, except in [3], where dependent entries are considered.

Our goal in this paper is to study, under some assumptions, the limit of the e.d.f. \(F^{B_{N}}\) of the random matrix \(B_{N}=X_{N}T_{N}X_{N}^{t}\), where the entries \(X_{ij}\) of the matrix \(X_{N}\) satisfy an autoregressive relation AR(1) for each i. We follow the approach of [8], where the authors apply the Marcenko–Pastur method to study the limit of the Stieltjes transform of the e.d.f. \(F^{B_{N}}\), and we then identify the limit law. We illustrate by numerical simulations the behavior of kernel density estimators and Stieltjes transform density estimators in identifying the true density, and we report \(L_{1}\) errors for various parameter values.

The paper is organized as follows. Section 2 provides the main result. Section 3 presents numerical simulations. The proof of the main result is postponed to Sect. 4.

2 Main Result

First, we introduce some random variables and random matrices. We truncate and center the entries \(X_{ij}\) of the random matrix \(X_{N}\) to obtain new corresponding random matrices as follows: for \(i=1,\ldots ,N;\) \(j=1,\ldots ,n(N)\), let

$$\begin{aligned} \hat{X}_{ij}=X_{ij}{1}_{\left( \left| X_{ij}\right| <\sqrt{N}\right) }, \hat{X}_{N}=\left( \frac{1}{\sqrt{N}}\hat{X} _{ij}\right) , \hat{B}_{N}=\hat{X}_{N}T_{N}\hat{X}_{N}^{t} , \end{aligned}$$
(3)
$$\begin{aligned} \tilde{X}_{ij}=\hat{X}_{ij}-E\left( \hat{X}_{ij}\right) , \tilde{X}_{N}=\left( \frac{1}{\sqrt{N}}\tilde{X}_{ij}\right) , \tilde{B}_{N}=\tilde{X}_{N}T_{N}\tilde{X}_{N}^{t} , \end{aligned}$$
(4)

and

$$\begin{aligned} {\left\{ \begin{array}{ll} \bar{X}_{ij}=\tilde{X}_{ij}{1}_{\left( \left| X_{ij}\right| \le \ln N\right) }-E\tilde{X}_{ij}{1} _{\left( \left| X_{ij}\right| \le \ln N\right) },\\ \bar{X}_{N}=\left( \frac{1}{\sqrt{N}}\bar{X}_{ij}\right) , \bar{B}_{N}= \bar{X}_{N}T_{N}\bar{X}_{N}^{t} . \end{array}\right. } \end{aligned}$$
(5)
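For readers reproducing these steps numerically, the following is a minimal sketch of the constructions (3)–(5); the expectations in (4) and (5) are replaced by sample means over the i.i.d. rows, an approximation made purely for illustration:

```python
import numpy as np

def truncate_center(X_raw):
    """Sketch of steps (3)-(5) applied to the raw (unscaled) entries X_ij."""
    N = X_raw.shape[0]
    X_hat = X_raw * (np.abs(X_raw) < np.sqrt(N))    # (3): truncation at sqrt(N)
    X_tilde = X_hat - X_hat.mean(axis=0)            # (4): centering (E replaced by row mean)
    T = X_tilde * (np.abs(X_raw) <= np.log(N))      # second truncation at ln N
    X_bar = T - T.mean(axis=0)                      # (5): re-centering
    return X_hat / np.sqrt(N), X_tilde / np.sqrt(N), X_bar / np.sqrt(N)
```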

We point out that the problem described above has often been handled by the method of the Stieltjes transform. Let \( \mathcal {M}\left( \mathbb {R} \right) \) be the set of distribution functions on \( \mathbb {R} .\) Recall that the Stieltjes transform of a distribution function \(F\in \mathcal {M} \left( \mathbb {R} \right) \) is defined by

$$ m_{F}\left( z\right) =\int \frac{1}{\lambda -z}dF\left( \lambda \right) ,\ \ z\in \mathbb {C} ^{+}\equiv \left\{ z\in \mathbb {C} :\mathfrak {Im}\,z>0\right\} , $$

where \(\mathfrak {Im}\) denotes the imaginary part. The inversion formula is given by

$$ F\left( [a,b]\right) =\frac{1}{\pi }\lim _{\epsilon \rightarrow 0^{+}}\int \nolimits _{a}^{b} \mathfrak {Im}\,m_{F}\left( x+i\epsilon \right) dx, $$

where a and b are continuity points of F. Moreover, the weak convergence of probability distribution functions is equivalent to the convergence of the corresponding Stieltjes transforms (Theorem B.9 of [1]). From the inversion formula, it follows that for any countable set \(S\subset \mathbb {C} ^{+}\) whose closure \(\bar{S}\) contains \( \mathbb {R} \), and for any sequence \((F_{N})\subset \mathcal {M} \left( \mathbb {R} \right) \) and \(F\in \mathcal {M}\left( \mathbb {R} \right) \), we have the following equivalence:

$$\begin{aligned} \lim _{N\rightarrow \infty }m_{F_{N}}\left( z\right) =m_{F}\left( z\right) , \forall z\in S\Longleftrightarrow F_{N}\rightarrow F\ \ as\ \ N\rightarrow \infty , \end{aligned}$$
(6)

where \(F_{N}\rightarrow F\) denotes the vague convergence of distribution functions.

Furthermore, we consider the following random matrices. For \(j,l=1,2,\ldots ,n(N)\), denote by \(\bar{q}_{j}\) the jth column of \(\bar{X}_{N}\) defined by (5), that is,

$$\begin{aligned} \bar{q}_{j}=\frac{1}{\sqrt{N}}\left( \bar{X}_{1j},\ldots ,\bar{X}_{Nj}\right) ^{t}:=\frac{1}{\sqrt{N}}V_{j}, \end{aligned}$$
(7)

and by

$$\begin{aligned} \bar{B}_{\left( j\right) }=\bar{B}_{\left( j\right) }^{N}:=\bar{B}_{N}-\tau _{j}\bar{q}_{j}\bar{q}_{j}^{t}, \end{aligned}$$
(8)

where \(\tau _{j}\) are the elements of \(T_{N},\) and define

$$\begin{aligned} x=x_{N}:=\frac{1}{N}\sum \limits _{j=1}^{n}\frac{\tau _{j}}{1+\tau _{j}m_{F^{ \bar{B}_{N}}}\left( z\right) }, x_{\left( j\right) }=x_{\left( j\right) }^{N}:=\frac{1}{N}\sum \limits _{l=1}^{n}\frac{\tau _{l}}{1+\tau _{l}m_{F^{\bar{B}_{\left( j\right) }}}\left( z\right) }, \end{aligned}$$
(9)

where \(m_{F^{\bar{B}_{N}}}\) and \(m_{F^{\bar{B}_{\left( j\right) }}}\) are the Stieltjes transforms of the e.d.f.’s of the matrices \(\bar{B}_{N}\) and \(\bar{B}_{\left( j\right) }\), respectively. Finally, set

$$\begin{aligned} C_{\left( j\right) }^{1}:=\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1},\quad C_{\left( j\right) }^{2}:=\left( x_{\left( j\right) }-z\right) ^{-1}\left( \bar{B} _{\left( j\right) }-zI\right) ^{-1} \end{aligned}$$
(10)

where I is the identity matrix.

Now, we state the main result of this paper, giving the almost sure limit of the e.d.f. of the eigenvalues of the random matrix \(B_N\) (tr denotes the trace of a matrix).

Theorem 1

Assume

  (a)

    For \(N=1,2,\ldots ,\) let \(X_{N}=\left( \frac{1}{\sqrt{N}}X_{ij}\right) \) be an \(\left( N\times n(N)\right) \) matrix with independent rows, satisfying the AR(1) autoregressive relation (2) in each row. The entries \( X_{ij}, i, j \ge 1, \) have all their moments finite, and \(\frac{n(N)}{N}\rightarrow c>0\) as \(N\rightarrow \infty \).

  (b)

    \(T_{N}= diag \left( \tau _{1},\ldots ,\tau _{n}\right) ,\) \(\tau _{i}\in \mathbb {R}\), and the e.d.f. \(F^{T_{N}}\) converges almost surely to a distribution function H as \(N\rightarrow \infty .\)

  (c)

    The matrices \(X_{N}\) and \(T_{N}\) are independent.

  (d)

    For \(k=1,2\) and \(j=1,2,\ldots ,n(N),\) the matrices \( C_{\left( j\right) }^{k}\) defined in (10) satisfy \(E\left| V_{j}^{t}C_{\left( j\right) }^{k}V_{j}-trC_{\left( j\right) }^{k}\right| ^{6}\le KN^{3},\) where \(V_{j}\) is given by (7) and \(K>0\) is a constant.

Then the e.d.f. \(F^{B_{N}}\) of the random matrix \( B_{N}=X_{N}T_{N}X_{N}^{t}\) converges vaguely, almost surely, to a distribution function F as \(N\longrightarrow \infty \), whose Stieltjes transform \(m_{F}(z)\) satisfies the following functional relation:

$$\begin{aligned} m_{F} (z) =-\left( z-c\int \frac{\tau dH\left( \tau \right) }{ 1+\tau m_{F}\left( z\right) }\right) ^{-1}; z\in \mathbb {C} ^{+}. \end{aligned}$$
(11)

Remark Assumption (a) is fulfilled, in particular, when the white noise \((\varepsilon _{ij})\) has finite moments of all orders (e.g., Gaussian white noise). Assumptions (b) and (c) are standard and analogous to those in [8]. Assumption (d) requires that the sixth moment of the difference between a quadratic form of a matrix and its trace be controlled by the third power of the size N. In the case of i.i.d. entries \( X_{ij}, i, j \ge 1\), assumption (d) is fulfilled (cf. Lemma 3.1 in [8]).
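Equation (11) can be solved numerically by fixed-point iteration in \(z\in \mathbb {C} ^{+}\). Below is a minimal sketch, assuming H is discrete with finitely many atoms; the atoms, weights, value of c, and iteration count are illustrative choices, not prescribed by the theorem:

```python
import numpy as np

def m_F(z, taus, weights, c, n_iter=1000):
    """Iterate m -> -1/(z - c * sum_k w_k tau_k / (1 + tau_k m)) to solve (11)."""
    taus = np.asarray(taus, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = -1.0 / z                                   # initial guess in C^+
    for _ in range(n_iter):                        # converges for Im z > 0 (slowly if small)
        integral = np.sum(weights * taus / (1.0 + taus * m))
        m = -1.0 / (z - c * integral)
    return m

# Density of the limit law at x, via f(x) = (1/pi) Im m_F(x + iy) with small y > 0
f_2 = m_F(2.0 + 1e-3j, taus=[1.0, 3.0, 10.0], weights=[1/3, 1/3, 1/3], c=0.2).imag / np.pi
```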

3 Numerical Simulations

As a practical application of the main result, we illustrated in [6] the behavior of the empirical density estimator of the e.d.f. of the eigenvalues \((\lambda _{i}, i=1,\ldots ,N)\) of large random matrices \(B_{N}\), identifying the density function of the limit law by numerical simulations. First, we recall the formulas giving the density of the limit law and the empirical Stieltjes transform estimator. From [10], for all \(x\in \mathbb {R} \setminus \left\{ 0\right\} \), the distribution function F (the limit of the e.d.f. \( F^{B_{N}}\)) has a continuous derivative f defined by \( f(x)=\left( 1/\pi \right) \mathfrak {Im}\,m_{0}(x), \) where \(m_{0}(x):=\lim _{z\rightarrow x}m_{F}(z)\), the limit being taken over \(z=x+iy\) with \(y\downarrow 0\) (Figs. 1, 2 and Table 1).

The Stieltjes Transform Estimator (STE) is defined by

$$f_{N}(x)=\left( 1/\pi \right) \mathfrak {Im}\,m_{F^{B_{N}}}\left( z\right) ,\quad z=x+iy,\ y>0,$$

where \(m_{F^{B_{N}}}\left( z\right) =\frac{1}{N}tr\left( B_{N}-zI\right) ^{-1}=\ \frac{1}{N}\sum \limits _{i=1}^{N}\left( \lambda _{i}-z\right) ^{-1}.\)
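In code, the STE follows directly from the eigenvalues; here is a minimal sketch, where the offset \(y>0\) acts as a smoothing parameter and its default value is an illustrative assumption:

```python
import numpy as np

def ste_density(x, eigvals, y=0.05):
    """STE: f_N(x) = (1/pi) Im m_{F^{B_N}}(x + iy)."""
    m = np.mean(1.0 / (eigvals - (x + 1j * y)))    # (1/N) tr (B_N - zI)^{-1}
    return m.imag / np.pi
```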

Fig. 1 Densities of the limit law and the STE with \(T_{N}\) the identity matrix

Fig. 2 Densities of the limit law and the STE with \(T_{N}\) having three eigenvalues: 1, 3, 10

Table 1 \(L_{1}\)-errors of the STE in Case 1 (\(T_{N}\) the identity matrix, \(c=1\)) and Case 2 (\(T_{N}\) a diagonal matrix with three eigenvalues 1, 3, 10, \(c=0.2\)), for weak and strong dependence and different sample sizes N

Now, we apply the Gaussian Kernel Estimator (GKE), defined by

$$\hat{f}_{N}(\lambda )=\frac{1}{Nh_{N}}\sum \limits _{i=1}^{N}K(\frac{ \lambda -\lambda _{i}}{h_{N}}); \lambda \in \mathbb {R} $$

where \(h_{N}\) is a bandwidth with \(h_{N}\rightarrow 0\) and \( Nh_{N}\rightarrow \infty \), and K is the Gaussian kernel \( K(u)=\frac{1}{\sqrt{2\pi }}\exp (-\frac{1}{2}u^{2}). \)
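A matching sketch of the GKE over the same eigenvalues; the bandwidth rule \(h_{N}=N^{-1/5}\) used below is an illustrative assumption, not the paper's choice:

```python
import numpy as np

def gke_density(lam, eigvals, h=None):
    """Gaussian kernel estimator of the spectral density at the point lam."""
    N = len(eigvals)
    h = N ** (-1 / 5) if h is None else h          # illustrative bandwidth rule
    u = (lam - eigvals) / h
    return np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2.0 * np.pi))
```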

We compare the performance of the Stieltjes Transform Estimator (STE) and the Gaussian Kernel Estimator (GKE) on the basis of \(L_{1}\)-errors (Fig. 3 and Table 2).
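For concreteness, the \(L_{1}\)-errors can be approximated on a grid. The sketch below reuses `simulate_B`, `m_F`, `ste_density`, and `gke_density` from the earlier snippets and mimics Case 2 (eigenvalues 1, 3, 10 in equal proportions, \(c=0.2\)); all numerical choices are illustrative:

```python
import numpy as np

N, n = 1000, 200                                     # c = n/N = 0.2
tau = np.repeat([1.0, 3.0, 10.0], n // 3 + 1)[:n]    # T_N spectrum: 1, 3, 10
eigvals = np.linalg.eigvalsh(simulate_B(N, n, rho=0.2, tau=tau))

grid = np.linspace(0.05, 20.0, 400)
dx = grid[1] - grid[0]
f_true = np.array([m_F(t + 1e-3j, [1.0, 3.0, 10.0], [1/3, 1/3, 1/3], c=n / N).imag / np.pi
                   for t in grid])
ste_vals = np.array([ste_density(t, eigvals) for t in grid])
gke_vals = np.array([gke_density(t, eigvals) for t in grid])
l1_ste = dx * np.sum(np.abs(ste_vals - f_true))      # L1-error of the STE
l1_gke = dx * np.sum(np.abs(gke_vals - f_true))      # L1-error of the GKE
```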

Fig. 3 Behavior of the STE and GKE with the matrix \(T_{N}\) having three eigenvalues: 1, 3, 10

Table 2 \(L_{1}\)-errors of the STE and GKE, with \(T_{N}\) having three eigenvalues and \(c=0.2\), for different sample sizes N

Conclusion

From further numerical simulations, we observe that the performance of the estimators strongly depends on the choice of the dimension ratio c, the AR parameter \(\rho \), and the sample size N. The variability of these parameters has a direct impact on the stabilization and convergence rate of the estimators. Particular choices of parameters confirm the good performance of the estimators and suggest optimal values for these parameters. We also observe an effect of the dimension ratio c on the convergence rate of the density estimators. For \(c>1\), for both weak and strong dependence (\(\rho =0.2\) and \(\rho =0.7\)), the estimators perform well from \(N=100\) onward. However, for small values of c (\(c<0.2\)), the parameter values influence the convergence rate: for weak dependence (\(\rho =0.2\)) the STE performs quite well already for the moderate value \(N=100\), whereas for strong dependence (\(\rho =0.7\)) the estimator is accurate enough only for large N (\(N>1000\)). The number of distinct eigenvalues of \(T_{N}\) affects the behavior of the estimators as well as their performance. Both estimators perform well and give a good representation of the true density, with a small advantage for the GKE.

4 Proof of the Main Result

We recall the following well-known facts. For each i, the process \((X_{ij},\ j\in \mathbb {Z} )\) satisfying relation (2) is a stationary AR(1) process; hence it satisfies the geometric strong mixing (G.S.M.) property with strong mixing coefficient \(\alpha _{k}=\alpha _{k}(\mathcal {F}_{0}^{m},\mathcal {F}_{m+k}^{\infty })=O\left( \left| \rho \right| ^{k}\right) \), where \(\mathcal {F}_{a}^{b}=\mathcal {F}_{a,i}^{b}=\sigma (X_{ij},\ a\le j \le b)\), whenever \(\varepsilon _{ij}\) has a strictly positive continuous density (see [2], p. 58).

The covariance between two real-valued rv’s is bounded as follows: if \(\eta \in L^{p}\) and \(\xi \in L^{q}\) are \(\mathcal {F}_{0}^{m}\)-measurable and \(\mathcal {F}_{m+k}^{\infty }\)-measurable, respectively, then we have

$$\begin{aligned} \left| E\left( \eta \xi \right) -E\left( \eta \right) E\left( \xi \right) \right| \le 12\left\| \eta \right\| _{p}\left\| \xi \right\| _{q}\alpha _{k}^{ \frac{1}{r}} \end{aligned}$$
(12)

for all \(1\le p,q,r\le \infty \) with \(\frac{1}{p}+\frac{1}{q}+ \frac{1}{r}=1,\) where the norm is \(\left\| \eta \right\| _{p}=E^{\frac{1}{p} }\left| \eta \right| ^{p}\).

On the other hand, there exists a distance D(., .) on the space \(\mathcal {M}\left( \mathbb {R} \right) \) such that for two sequences \((F_{N}),(G_{N})\subset \mathcal {M}\left( \mathbb {R} \right) ,\) we have (see [8])

$$\begin{aligned} \lim _{N\rightarrow \infty }\left\| F_{N}-G_{N}\right\| =0\ \ \Longrightarrow \ \ \ \lim _{N\rightarrow \infty }D\left( F_{N},G_{N}\right) =0, \end{aligned}$$
(13)

where \(\left\| .\right\| \) denotes the sup-norm of bounded functions from \( \mathbb {R} \) to \( \mathbb {R}.\)

To lighten the notation, the dependence of most variables on N will occasionally be dropped. Now, we replace T by a suitable matrix for further analysis: for \(\theta \ge 0\) define \(T_{\theta }=diag\left( \tau _{1}{1}_{\left( \left| \tau _{1}\right| \le \theta \right) },\ldots ,\tau _{n}{1 }_{\left( \left| \tau _{n}\right| \le \theta \right) }\right) ,\) and let Q be any \(\left( N\times n\right) \) matrix. If \(\theta \) and \( -\theta \) are continuity points of H, then by Lemma 2.5 of [8] and assumption (b) of Theorem 1, as \(N\rightarrow \infty \) and \(\frac{n}{N}\rightarrow c>0,\) we have

$$ \left\| F^{QTQ^{t}}-F^{QT_{\theta }Q^{t}}\right\| \le \frac{1}{N} rank\left( T-T_{\theta }\right) =\frac{1}{N}\sum \limits _{j=1}^{n}{1} _{\left( \left| \tau _{j}\right| >\theta \right) }\rightarrow cH\left\{ \left[ -\theta ,\theta \right] ^{c}\right\} \ \ a.s. $$

It follows that if \(\theta =\theta _{N}\rightarrow \infty \), then

$$\begin{aligned} \left\| F^{QTQ^{t}}-F^{QT_{\theta }Q^{t}}\right\| \rightarrow 0\ \ a.s. \end{aligned}$$
(14)

Choose \(\theta \) such that

$$\begin{aligned} \theta ^{4}\left[ E^{\frac{2}{3}}\left| X_{11}\right| ^{2}{1 }_{\left( \left| X_{11}\right| \ge \ln N\right) }+\frac{1}{N} \right] \rightarrow 0, \end{aligned}$$
(15)

and

$$\begin{aligned} \sum \limits _{N=1}^{\infty }\theta ^{8}\left[ \frac{1}{N^{7/6}}E^{1/6} \left| X_{11}\right| ^{4}{1}_{\left( \ln N\le \left| X_{11}\right|<\sqrt{N}\right) }+\frac{1}{N^{2}}\right] <\infty . \end{aligned}$$
(16)

To continue, we need the following result.

Lemma 1

Let \(X=\left( \frac{1}{\sqrt{N}}X_{ij}\right) \) be an \(\left( N\times n\right) \) matrix satisfying assumption (a) of Theorem 1, and let \(\hat{X} =\left( \frac{1}{\sqrt{N}}\hat{X}_{ij}\right) \), where \(\hat{X}_{ij}=X_{ij}{1}_{\left( \left| X_{ij}\right| <\sqrt{N}\right) }\). For \(\theta \ge 0, \) set \(T_{\theta }=diag\left( \tau _{1}{1}_{\left( \left| \tau _{1}\right| \le \theta \right) },\ldots ,\tau _{n}{1}_{\left( \left| \tau _{n}\right| \le \theta \right) }\right) ,\ \tau _{i}\in \mathbb {R}.\) We have

$$ D\left( F^{XT_{\theta }X^{t}},\ \ F^{\hat{X}T_{\theta }\hat{X} ^{t}}\right) \rightarrow 0 \ \ a.s. $$

Proof

From Corollary A.42 of [1], we find

$$ D^{2}\left( F^{XT_{\theta }X^{t}},F^{\hat{X}T_{\theta }\hat{X}^{t}}\right) \le \left[ \frac{2}{N}tr\left( XX^{t}-\hat{X}\hat{X}^{t}\right) +\frac{4}{N}tr\hat{X}\hat{X}^{t}\right] \left[ \frac{\theta ^{2}}{N}tr\left( XX^{t}-\hat{X}\hat{X}^{t}\right) \right] . $$

To show that this distance tends almost surely to 0, one verifies via the Borel–Cantelli lemma that \(\left[ \frac{\theta ^{2}}{N}tr\left( XX^{t}-\hat{X}\hat{X}^{t}\right) \right] \) tends to 0 and that \(\left[ \frac{4}{N}tr\hat{X}\hat{X}^{t}\right] \) is bounded almost surely. Hence the result.

4.1 Proof of the Theorem 1

Let \(X=\left( \frac{1}{\sqrt{N}} X_{ij}\right) \) be an \(\left( N\times n\right) \) matrix satisfying assumption (a) of Theorem 1. With the help of inequality (12) and the fact that \(\left( X_{ij}\right) \) satisfies the G.S.M. property, we obtain

$$\begin{aligned} M_{1}\le E\left( tr\left( XX^{t}\right) ^{2}\right) \le M_{2} \end{aligned}$$
(17)

where

$$ M_{1}=\frac{K}{N}\{nE\left| X_{11}\right| ^{4}+n\left( N+n-2\right) E^{2}\left| X_{11}\right| ^{2}-E^{\frac{2}{3}}\left| X_{11}\right| ^{6}\}, $$
$$ M_{2}=\frac{K}{N}\{nE\left| X_{11}\right| ^{4}+n\left( N+n-2\right) E^{2}\left| X_{11}\right| ^{2}+E^{\frac{2}{3}}\left| X_{11}\right| ^{6}+\left( N-1\right) E^{\frac{4}{3}}\left| X_{11}\right| ^{3}\}. $$

With the same arguments, we may deduce a bound on the variance:

$$\begin{aligned} var\left( tr\left( XX^{t}\right) ^{2}\right) \le \frac{K}{N^{4}} \{&N^{4}E\left| X_{11}\right| ^{4}E^{2}\left| X_{11}\right| ^{2} \\&+N^{3}[E^{\frac{4}{3}}\left| X_{11}\right| ^{6}+E^{\frac{1}{2} }\left| X_{11}\right| ^{4}E^{\frac{1}{3}}\left| X_{11}\right| ^{6}E^{\frac{1}{3}}\left| X_{11}\right| ^{12}+E\left| X_{11}\right| ^{2}E^{\frac{2}{3}}\left| X_{11}\right| ^{9}] \\&+N^{2}[E^{\frac{2}{3}}\left| X_{11}\right| ^{6}E^{\frac{1}{6} }\left| X_{11}\right| ^{24}+E^{\frac{1}{3}}\left| X_{11}\right| ^{15}E\left| X_{11}\right| ^{3}+E^{\frac{2}{3} }\left| X_{11}\right| ^{12}+E^{\frac{5}{12}}\left| X_{11}\right| ^{12}E^{\frac{1}{6}}\left| X_{11}\right| ^{18}]\}. \end{aligned}$$
(18)

Using (14) and (13), we may write

$$ D\left( F^{XTX^{t}}, F^{XT_{\theta }X^{t}}\right) \rightarrow 0 \ \ and \ \ D\left( F^{\hat{X}T_{\theta }\hat{X}^{t}}, F^{\hat{X}T\hat{X} ^{t}}\right) \rightarrow 0 \ \ a.s. $$

Furthermore, by Lemma 1, we get

$$\begin{aligned} D\left( F^{XTX^{t}}, F^{\hat{X}T\hat{X}^{t}}\right) \rightarrow 0\ \ a.s. \end{aligned}$$
(19)

For \(\hat{B}_{N}\) and \(\tilde{B}_{N}\) defined by relations (3) and (4), we have from Lemma 2.5 of [8],

$$\begin{aligned} \left\| F^{\hat{B}_{N}}-F^{\tilde{B}_{N}}\right\| \rightarrow 0. \end{aligned}$$
(20)

Let \({\bar{\bar{X}}}_{ij}=\tilde{X}_{ij}-\bar{X}_{ij}\). Hence,

$$ {\bar{\bar{X}}}_{ij}=\tilde{X}_{ij}{1}_{\left( \left| X_{ij}\right| > \ln N\right) }+E\left( \tilde{X}_{ij} {1}_{\left( \left| X_{ij}\right| \le \ln N\right) }\right) ,\ \ {\bar{\bar{X}}}=\left( \frac{1}{\sqrt{N}}{\bar{\bar{X}}} _{ij}\right) $$

where \(\tilde{X}_{ij}\) and \(\bar{X}_{ij}\) are defined by the relations (4), (5), respectively.

Then, from the Cauchy–Schwarz inequality, we can show that the squared distance

$$ D^{2}\left( F^{\tilde{X}T_{\theta }\tilde{X}^{t}}, F^{\bar{X} T_{\theta }\bar{X}^{t}}\right) $$

is bounded by

$$\begin{aligned}&\frac{1}{N}\left\{ \theta ^{2}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2} \right. +4\left[ \theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] ^{\frac{1}{2}} \\&\qquad \qquad \qquad \qquad \left. +4\left[ \left[ \theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}} ^{t}\right) ^{2}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] ^{\frac{1}{2} }\theta ^{2}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2} \right] ^{\frac{1}{2}}\right\} . \end{aligned}$$

Therefore, in order to show that almost surely

$$\begin{aligned} D\left( F^{\tilde{X}T_{\theta }\tilde{X}^{t}}, F^{\bar{X}T_{\theta } \bar{X}^{t}}\right) \rightarrow 0 , \end{aligned}$$
(21)

it suffices to verify that

$$\begin{aligned} \theta ^{4}\frac{1}{N}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t} \right) ^{2}\rightarrow 0 , \frac{1}{N}tr\left( \bar{X} \bar{X}^{t}\right) ^{2}=O\left( 1\right) \ \ a.s. \end{aligned}$$
(22)

Since \(E\left( {\bar{\bar{X}}}_{11}\right) =0\) and \({\bar{\bar{X}}} _{ij}=\tilde{X}_{ij}{1}_{\left( \left| X_{ij}\right| > \ln N\right) }+E\left( \tilde{X}_{ij}{1} _{\left( \left| X_{ij}\right| \le \ln N\right) }\right) ,\) we have

$$\begin{aligned} E\left| {\bar{\bar{X}}}_{11}\right| ^{2}\le KE\left| X_{11}\right| ^{2}{1}_{\left( \left| X_{11}\right| \ge \ln N\right) }\rightarrow 0. \end{aligned}$$
(23)

For \(p\ge 4,\)

$$\begin{aligned} E\left| {\bar{\bar{X}}}_{11}\right| ^{p}\le K\left( N^{\frac{ p-4}{2}}E\left| X_{11}\right| ^{4}{1}_{\left( \ln N\le \left| X_{11}\right| <\sqrt{N}\right) }+1\right) . \end{aligned}$$
(24)

By the dominated convergence theorem, we get

$$\begin{aligned} E\left| \bar{X}_{11}\right| ^{2}\rightarrow E\left| X_{11}\right| ^{2}=\gamma . \end{aligned}$$
(25)

For \(p\ge 4\), by the definition of the rv’s \(\bar{X}_{11}\), we have

$$\begin{aligned} E\left| \bar{X}_{11}\right| ^{p}\le K\left( \ln N\right) ^{p-2}. \end{aligned}$$
(26)

From (15), (23), (24), the bound \(E\left( \left| X_{11}\right| ^{4}{1}_{\left( \ln N\le \left| X_{11}\right| <\sqrt{N}\right) }\right) \le NE\left| X_{11}\right| ^{2}\), and relation (17), we may write

$$\begin{aligned} E\left[ \frac{1}{N}\theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2}\right] \le K\theta ^{4}\left[ E^{\frac{2}{3}}\left| X_{11}\right| ^{2}{1}_{\left( \left| X_{11}\right| \ge \ln N\right) }+\frac{1}{N}\right] \rightarrow 0. \end{aligned}$$

Also (18) gives

$$ var\left( \frac{1}{N}\theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}} ^{t}\right) ^{2}\right) \le K\theta ^{8}\left[ \frac{1}{N^{7/6}} E^{1/6}\left| X_{11}\right| ^{4}{1}_{\left( \ln N\le \left| X_{11}\right| <\sqrt{N}\right) }+\frac{1}{N^{2}}\right] , $$

where the latter bound is summable by (16).

Hence, we obtain \(\frac{1}{N}\theta ^{4}tr\left( {\bar{\bar{X}}}{\bar{\bar{X}}}^{t}\right) ^{2}\rightarrow 0\) a.s.

Now it remains to show that \(\frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}=O\left( 1\right) \) a.s. Using (17), (25) and (26), we find

$$\begin{aligned} -K\frac{\left( \ln N\right) ^{\frac{8}{3}}}{N^{2}} &\le E\left[ \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] -\frac{n}{N}\left( \frac{n}{N}+1-\frac{2}{N}\right) E^{2}\left| \bar{X}_{11}\right| ^{2} \\ &\le K\left\{ \frac{n}{N}\frac{\left( \ln N\right) ^{2}}{N}+\frac{\left( \ln N\right) ^{\frac{8}{3}}}{N^{2}}+\frac{\left( \ln N\right) ^{\frac{4}{3}}}{N}\right\} . \end{aligned}$$

Consequently, \( E\left[ \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] -\frac{n}{N }\left( \frac{n}{N}+1-\frac{2}{N}\right) E^{2}\left| \bar{X}_{11}\right| ^{2} \rightarrow 0, \) and hence \( E\left[ \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right] \rightarrow \gamma ^{2}\left[ c\left( c+1\right) \right] . \)

Concerning the variance, by (18), (25) and (26), we obtain

$$ var\left( \frac{1}{N}tr\left( \bar{X}\bar{X}^{t}\right) ^{2}\right) \le K \frac{\left( \ln N\right) ^{^{17/3}}}{N^{3}}, $$

which is summable. Then (22) is verified, from which (21) follows. This result, together with (14), allows us to write \( D\left( F^{\tilde{X}T\tilde{X}^{t}}, F^{\bar{X}T\bar{X}^{t}}\right) \rightarrow 0\ \ a.s. \)

From (19) and (20), in order to prove \(D\left( F^{XTX^{t}},F\right) \rightarrow 0\) a.s., it suffices to verify that \( D\left( F^{\bar{X}T\bar{X}^{t}},F\right) \rightarrow 0\ \ a.s. \) To this end, we shall show that for any \(z\in \mathbb {C} ^{+}\), \( m_{F^{\bar{X}T\bar{X}^{t}}}\left( z\right) \rightarrow m_{F}\left( z\right) \ \ a.s. \)

Let \(z\in \mathbb {C} ^{+}\) and \(\bar{B}_{N}=\bar{X}T\bar{X}^{t}.\) The sequence \(\{F^{\bar{B} _{N}}\}\) satisfies the assumptions of Lemma 2.8 of [8], so there exists \(m>0\) such that

$$ \inf _{N}F^{\bar{B}_{N}}\left[ -m,m\right]>0 ,\quad \delta =\inf _{N} \mathfrak {Im}\left( m_{F^{\bar{B}_{N}}}\left( z\right) \right) >0\ \ a.s. $$

Write \( \bar{B}_{N}-zI=\left( x-z\right) I+\bar{X}T\bar{X}^{t}-xI,\) and then

$$\begin{aligned} \left( x-z\right) ^{-1}-m_{F^{\bar{B}_{N}}}\left( z\right) =\frac{1}{N}\sum \limits _{j=1}^{n}\frac{\tau _{j}}{1+\tau _{j}m_{F^{\bar{B}_{N}}}\left( z\right) }d_{j}, \end{aligned}$$
(27)

where

$$\begin{aligned} d_{j} =d_{j}^{N}:=&\ \frac{1+\tau _{j}m_{F^{\bar{B}_{N}}}\left( z\right) }{ 1+\tau _{j}\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1} \bar{q}_{j}}\,\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\left( \left( x-z\right) ^{-1}I\right) \bar{q}_{j} \\&-\frac{1}{N}tr\left( \bar{B}_{N}-zI\right) ^{-1}\left( \left( x-z\right) ^{-1}I\right) , \end{aligned}$$

where \(\bar{q}_{j}\) denotes the jth column of \(\bar{X}\), and \(\bar{B} _{\left( j\right) },\) x, \(x_{\left( j\right) }\) are defined by relations (8) and (9).

Lemma 3.1 of [8] and assumption (d) of Theorem 1 permit us to obtain

$$\begin{aligned} \max _{j\le n}\max \left[ \beta _1, \beta _2,\beta _3 \right] \rightarrow 0\ \ a.s \end{aligned}$$
(28)

where

$$\begin{aligned} \beta _1 &= \left| \left\| \bar{q}_{j}\right\| ^{2}-1\right| , \\ \beta _2 &= \left| \bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\bar{q}_{j}-\frac{1}{N}tr\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\right| , \\ \beta _3 &= \left| \bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\left( \left( x_{\left( j\right) }-z\right) I\right) ^{-1}\bar{q}_{j} -\frac{1}{N}tr\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\left( \left( x_{\left( j\right) }-z\right) I\right) ^{-1}\right| . \end{aligned}$$

Lemma 2.6 of [8] gives \( \max _{j\le n}\max [ \left| \gamma _1\right| ,\left| \gamma _2\right| ]\rightarrow 0, \) where \(\gamma _1=m_{F^{\bar{B}_{\left( j\right) }}}\left( z\right) -m_{F^{\bar{B}_{N}}}\left( z\right) \) and \( \gamma _2=m_{F^{\bar{B}_{N}}}\left( z\right) -\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\bar{q}_{j}. \)

So, for N large enough, we have \( \max _{j\le n}\max [ \left| \mathfrak {Im}\,\gamma _1\right| ,\left| \mathfrak {Im}\,\gamma _2\right| ]<\frac{\delta }{2}. \)

Then, for \(j,l\le n,\)

$$ \left| \frac{1+\tau _{j}m_{F^{\bar{B}_{N}}}\left( z\right) }{1+\tau _{j}\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\bar{q} _{j}}-1\right| <\frac{2}{\delta }\left| \gamma _2\right| , $$

and

$$ \left| \frac{\tau _{l}}{1+\tau _{l}m_{F^{\bar{B}_{N}}}\left( z\right) }- \frac{\tau _{l}}{1+\tau _{l}m_{F^{\bar{B}_{\left( j\right) }}}\left( z\right) }\right| \le \frac{2}{\delta ^{2}}\mid \gamma _1\mid . $$

Therefore,

$$\begin{aligned} \max _{j\le n}\max \left[ \left| \frac{1+\tau _{j}m_{F^{\bar{B}_{N}}}\left( z\right) }{1+\tau _{j}\bar{q}_{j}^{t}\left( \bar{B}_{\left( j\right) }-zI\right) ^{-1}\bar{q}_{j}}-1\right| ,\left| x-x_{\left( j\right) }\right| \right] \rightarrow 0. \end{aligned}$$
(29)

Using Lemmas 2.6 and 2.7 of [8], together with (28) and (29), we obtain

$$ \max _{j\le n}\left| d_{j}\right| \rightarrow 0. $$

Since

$$ \left| \frac{\tau _{j}}{1+\tau _{j}m_{F^{\bar{B}_{N}}}\left( z\right) }\right| \le \frac{1}{\delta }, $$

we may conclude from (27) that

$$ \left( x-z\right) ^{-1}-m_{F^{\bar{B}_{N}}}\left( z\right) \rightarrow 0. $$

Hence, letting \(N\rightarrow \infty \), relation (11) is satisfied.

Finally, using (6), the proof of Theorem 1 is now complete.