1 Introduction

A typical object of interest in many fields is the sample covariance matrix \(\mathbf{B}_n=n^{-1}\sum _{j=1}^n \mathbf{X}^T_j\mathbf{X}_j\), where \((\mathbf{X}_j)\), \(j=1,\ldots ,n\), is a sequence of \(N=N(n)\)-dimensional real-valued row random vectors. The interest in studying the spectral properties of such matrices has emerged from multivariate statistical inference, since many test statistics can be expressed in terms of functionals of their eigenvalues. The study of the empirical distribution function (e.d.f.) \(F^{\mathbf{B}_n}\) of the eigenvalues of \({\mathbf{B}_n}\) goes back to Wishart in the 1920s, and the spectral analysis of large-dimensional sample covariance matrices has been actively developed since the remarkable work of Marčenko and Pastur [10], stating that if \(\lim _{n\rightarrow \infty }N/n=c\in (0,\infty )\), and all the coordinates of all the vectors \(\mathbf{X}_j\) are i.i.d. (independent identically distributed), centered and in \(\mathbb{L }^2\), then, with probability one, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random distribution (the original Marčenko–Pastur theorem is stated for random variables having a moment of order four; for the proof under a moment of order two only, we refer to Yin [24]).

Since Marčenko and Pastur’s pioneering paper, there has been a large amount of work aiming at relaxing the independence structure between the coordinates of the \(\mathbf{X}_j\)’s. Yin [24] and Silverstein [17] considered a linear transformation of independent random variables, which leads to the study of the empirical spectral distribution of random matrices of the form \(\mathbf{B}_n=n^{-1}\sum _{j=1}^n\Gamma _N^{1/2}\mathbf{Y}^T_j \mathbf{Y}_j\Gamma _N^{1/2}\), where \(\Gamma _N\) is an \(N \times N\) nonnegative definite Hermitian random matrix, independent of the \(\mathbf{Y}_j\)’s, which are i.i.d. and such that all their coordinates are i.i.d. In the latter paper, it is shown that if \(\lim _{n\rightarrow \infty } N/n=c\in (0,\infty )\) and \(F^{{\Gamma }_N}\) converges almost surely in distribution to a non-random probability distribution function (p.d.f.) \(H\) on \([0,\infty )\), then, almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a (non-random) p.d.f. \(F\) that is characterized in terms of its Stieltjes transform, which satisfies a certain equation. Some further investigations on the model mentioned above can be found in Silverstein and Bai [18] and Pan [13].

A natural question is then to wonder whether other possible correlation patterns of the coordinates can be considered, in such a way that, almost surely (or in probability), \(F^{\mathbf{B}_n}\) still converges in distribution to a non-random p.d.f. The recent work by Bai and Zhou [2] is in this direction. Assuming that the \(\mathbf{X}_j\)’s are i.i.d. and allowing a very general dependence structure among their coordinates, they derive the limiting spectral distribution (LSD) of \(\mathbf{B}_n\). Their result has various applications. In particular, in the case where the \(\mathbf{X}_j\)’s are independent copies of \(\mathbf{X}= (X_1, \ldots ,X_N)\) where \((X_k)_{k \in \mathbb{Z }}\) is a stationary linear process with centered i.i.d. innovations, applying their Theorem 1.1, they prove that, almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random p.d.f. \(F\), provided that \(\lim _{n \rightarrow \infty } N/n = c \in (0,\infty )\), the coefficients of the linear process are absolutely summable and the innovations have a moment of order four (see their Theorem 2.5). For this linear model, let us mention that in a recent paper, Yao [23] shows that the Stieltjes transform of the limiting p.d.f. \(F\) satisfies an explicit equation that depends on \(c\) and on the spectral density of the underlying linear process. Still in the context of the linear model described above, but relaxing the equidistribution assumption on the innovations, and using a different approach from the one considered in the papers by Bai and Zhou [2] and by Yao [23], Pfaffel and Schlemm [15] also derive the LSD of \(\mathbf{B}_n\), still assuming moments of order four for the innovations plus a polynomial decay of the coefficients of the underlying linear process.

In this work, we extend such Marčenko–Pastur-type theorems along another direction. We shall assume that the \(\mathbf{X}_j\)’s are independent copies of \(\mathbf{X}= (X_1,\ldots ,X_N)\) where \((X_k)_{k \in \mathbb{Z }}\) is a stationary process of the form \(X_k=g(\ldots , \varepsilon _{k-1},\varepsilon _k )\), the \(\varepsilon _k\)’s are i.i.d. real-valued random variables, and \(g:\mathbb{R }^{\mathbb{Z }} \rightarrow \mathbb{R }\) is a measurable function such that \(X_k\) is a proper centered random variable. Assuming that \(X_0\) has a moment of order two only, and imposing a dependence condition expressed in terms of conditional expectation, we prove that if \(\lim _{n \rightarrow \infty } N/n = c \in (0,\infty )\), then, almost surely, \(F^{\mathbf{B}_n}\) converges in distribution to a non-random p.d.f. \(F\) whose Stieltjes transform satisfies an explicit equation that depends on \(c\) and on the spectral density of the underlying stationary process \((X_k)_{k \in \mathbb{Z }}\) (see our Theorem 2.1). The imposed dependence condition is directly related to the physical mechanism of the underlying process and is easily verifiable in many situations. For instance, when \((X_k)_{k \in \mathbb{Z }}\) is a linear process with i.i.d. innovations, our dependence condition is satisfied, and then our Theorem 2.1 applies, as soon as the coefficients of the linear process are absolutely summable and the innovations have a moment of order two only, which improves Theorem 2.5 in Bai and Zhou [2] and Theorem 1.1 in Yao [23]. Other models, such as functions of linear processes and ARCH models, for which our Theorem 2.1 applies, are given in Sect. 3.

Let us now give an outline of the method used to prove our Theorem 2.1. Since the \(\mathbf{X}_j\)’s are independent, the result will follow if we can prove that the expectation of the Stieltjes transform of \(F^{\mathbf{B}_n}\), say \(S_{F^{\mathbf{B}_n}}(z)\), converges to the Stieltjes transform of \(F\), say \(S(z)\), for any complex number \(z\) with positive imaginary part. With this aim, we shall consider a sample covariance matrix \(\mathbf{G}_n=n^{-1} \sum _{j=1}^n \mathbf{Z}^T_j\mathbf{Z}_j\) where the \( \mathbf{Z}_j\)’s are independent copies of \(\mathbf{Z}= (Z_1, \ldots , Z_N)\), and \((Z_k)_{k \in \mathbb{Z }}\) is a sequence of Gaussian random variables having the same covariance structure as the underlying process \((X_k)_{k \in \mathbb{Z }}\). The \( \mathbf{Z}_j\)’s will be assumed to be independent of the \( \mathbf{X}_j\)’s. Using the Gaussian structure of \(\mathbf{G}_n\), the convergence of \(\mathbb{E }\big (S_{F^{\mathbf{G}_n}}(z) \big ) \) to \(S(z)\) will follow from Theorem 1.1 in Silverstein [17]. The main step of the proof is then to show that the difference between the expectations of the Stieltjes transforms of \(F^{\mathbf{B}_n}\) and of \(F^{\mathbf{G}_n}\) converges to zero. This will be achieved by first approximating \((X_k)_{k \in \mathbb{Z }}\) by a bounded \(m\)-dependent sequence of random variables. This leads to a new sample covariance matrix \({\bar{\mathbf{B}}_n}\). We then handle the difference between \(\mathbb{E }\big (S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big )\) and \(\mathbb{E }\big (S_{F^{\mathbf{G}_n}}(z)\big )\) with the help of the so-called Lindeberg method used in the multidimensional case. The Lindeberg method is known to be an efficient tool to derive limit theorems, and, to our knowledge, it was used for the first time in the context of random matrices by Chatterjee [4]. With the help of this method, he proved the LSD of Wigner matrices associated with exchangeable random variables.

The paper is organized as follows: in Sect. 2, we specify the model and state the LSD result for the sample covariance matrix associated with the underlying process. Applications to linear processes, functions of linear processes, and ARCH models are given in Sect. 3. Section 4 is devoted to the proof of the main result, whereas some technical tools are stated and proved in “Appendix”.

Here are some notations used throughout the paper. For any nonnegative integer \(q\), the notation \(\mathbf{0}_q\) denotes the zero row vector of size \(q\). For a matrix \(A\), we denote by \(A^T\) its transpose, by \(\mathrm{Tr} (A)\) its trace, by \(\Vert A\Vert \) its spectral norm, and by \(\Vert A \Vert _2\) its Hilbert–Schmidt norm (also called the Frobenius norm). We shall also use the notation \(\Vert X\Vert _r\) for the \(\mathbb{L }^r\)-norm (\(r \ge 1\)) of a real-valued random variable \(X\). For any square matrix \(A\) of order \(N\) with only real eigenvalues, the empirical spectral distribution of \(A\) is defined as

$$\begin{aligned} F^{A}(x)=\frac{1}{N} \sum _{k=1}^{N} 1\!\!1_{\lbrace \lambda _k \le x \rbrace }, \end{aligned}$$

where \(\lambda _1,\ldots ,\lambda _N\) are the eigenvalues of \(A\). The Stieltjes transform of \(F^{A}\) is given by

$$\begin{aligned} S_{F^{A}}(z)=\int \frac{1}{x-z} \hbox {d}F^{A}(x)= \frac{1}{N} \mathrm{Tr}(A- z\mathbf{I})^{-1}, \end{aligned}$$

where \(z=u+iv\in \mathbb{C }^+\) (the set of complex numbers with positive imaginary part), and \(\mathbf {I}\) is the identity matrix.
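
The two displays above translate directly into a few lines of code. As a purely illustrative aid (not part of the original argument), the following Python sketch computes \(F^{A}(x)\) and \(S_{F^{A}}(z)\) for a sample covariance matrix built from i.i.d. entries; the function names `esd` and `stieltjes` and the chosen sizes are our own.

```python
import numpy as np

def esd(A, x):
    """Empirical spectral distribution F^A(x) = (1/N) #{k : lambda_k <= x}."""
    eigs = np.linalg.eigvalsh(A)            # A is assumed real symmetric
    return np.mean(eigs <= x)

def stieltjes(A, z):
    """Stieltjes transform S_{F^A}(z) = (1/N) Tr (A - z I)^{-1}, Im(z) > 0."""
    N = A.shape[0]
    return np.trace(np.linalg.inv(A - z * np.eye(N))) / N

# Toy example: B_n = (1/n) X X^T with i.i.d. standard normal entries.
rng = np.random.default_rng(0)
N, n = 200, 400
X = rng.standard_normal((N, n))
B = X @ X.T / n
print(esd(B, 1.0), stieltjes(B, 1.0 + 0.1j))
```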

Finally, the notation \([x]\) is used to denote the integer part of any real \(x\) and, for two reals \(a\) and \(b\), the notation \(a \wedge b\) means \(\min (a,b)\), whereas the notation \(a \vee b\) means \(\max (a,b)\).

2 Main Result

We consider a stationary causal process \((X_k)_{ k\in \mathbb Z }\) defined as follows: let \((\varepsilon _k)_{k\in \mathbb Z }\) be a sequence of i.i.d. real-valued random variables and let \(g:\mathbb{R }^{\mathbb{Z }} \rightarrow \mathbb{R }\) be a measurable function such that, for any \(k \in \mathbb{Z }\),

$$\begin{aligned} X_k=g(\xi _k ) \ \text { with } \ \xi _k := (\ldots ,\varepsilon _{k-1}, \varepsilon _k ) \end{aligned}$$
(2.1)

is a proper random variable, \(\mathbb{E }(g(\xi _k))=0\) and \(\Vert g(\xi _k ) \Vert _2 < \infty \).

The framework (2.1) is very general and it includes many widely used linear and nonlinear processes. We refer to the papers by Wu [21, 22] for many examples of stationary processes that are of form (2.1). Following Priestley [16] and Wu [21], \((X_k)_{ k\in \mathbb Z }\) can be viewed as a physical system with \(\xi _k\) (respectively \(X_k\)) being the input (respectively the output) and \(g\) being the transform or data-generating mechanism.

For \(n\) a positive integer, we consider \(n\) independent copies of the sequence \( (\varepsilon _k)_{ k\in \mathbb Z }\) that we denote by \(( \varepsilon ^{(i)}_k)_{ k\in \mathbb Z }\) for \(i = 1, \ldots , n\). Setting \(\xi ^{(i)}_k = \big ( \ldots , \varepsilon ^{(i)}_{k-1}, \varepsilon ^{(i)}_k \big )\) and \(X^{(i)}_k=g(\xi ^{(i)}_k )\), it follows that \(( X_{k}^{(1)})_{ k\in \mathbb Z },\ldots ,(X_{k}^{(n)})_{ k\in \mathbb Z }\) are \(n\) independent copies of \(( X_k)_{ k\in \mathbb Z }\). Let now \(N=N(n)\) be a sequence of positive integers, and define for any \(i \in \{1, \ldots , n \}\), \(\mathbf{{X}}_{i}=\big ( X_{1}^{(i)}, \ldots ,X_{N}^{(i)}\big )\). Let

$$\begin{aligned} \mathcal X _n=(\mathbf{{X}}^T_{1} \vert \cdots \vert \mathbf{{X}}^T_{n}) \ \text { and } \ {\mathbf{B}_n}=\frac{1}{n}\mathcal X _n\mathcal X _{n}^{T}. \end{aligned}$$
(2.2)

In what follows, \({\mathbf{B}_n}\) will be referred to as the sample covariance matrix associated with \( (X_k)_{ k\in \mathbb Z }\). To derive the limiting spectral distribution of \({\mathbf{B}_n}\), we need to impose some dependence structure on \((X_k)_{k \in \mathbb{Z }}\). With this aim, we introduce the projection operator: for any \(k\) and \(j\) belonging to \(\mathbb{Z }\), let

$$\begin{aligned} P_j(X_k) = \mathbb{E }(X_k |\xi _j) - \mathbb{E }( X_k | \xi _{j-1}). \end{aligned}$$

We state now our main result.

Theorem 2.1

Let \(( X_k)_{ k\in \mathbb Z }\) be defined in (2.1) and \(\mathbf{B}_n\) by (2.2). Assume that

$$\begin{aligned} \sum _{k\ge 0} \Vert P_0(X_{k}) \Vert _2 < \infty , \end{aligned}$$
(2.3)

and that \(c(n)=N/n \rightarrow c \in (0,\infty )\). Then, with probability one, \(F^{\mathbf{B}_n}\) tends to a non-random probability distribution \(F\), whose Stieltjes transform \(S=S(z)\) (\(z \in \mathbb C ^+\)) satisfies the equation

$$\begin{aligned} z=-\frac{1}{{\underline{S}}} + \frac{c}{2\pi } \int \limits _0^{2\pi } \frac{1}{{\underline{S}} + \big ( 2 \pi f(\lambda )\big )^{-1}} \hbox {d}\lambda , \end{aligned}$$
(2.4)

where \({\underline{S}}(z):=-(1-c)/z +c S(z)\) and \(f(\cdot )\) is the spectral density of \(( X_k)_{ k\in \mathbb Z }\).

Let us mention that, in the literature, the condition (2.3) is referred to as the Hannan–Heyde condition and is known to be essentially optimal for the validity of the central limit theorem for the partial sums (normalized by \(\sqrt{n}\)) associated with an adapted regular stationary process in \(\mathbb{L }^2\). As we shall see in the next section, the quantity \(\Vert P_0(X_{k}) \Vert _2 \) can be computed in many situations, including nonlinear models. We would like to mention that the condition (2.3) is weaker than the 2-strong stability condition introduced in [21, Definition 3], which involves a coupling coefficient.
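
To make Eq. (2.4) more concrete, here is a minimal numerical sketch (our own illustration, not taken from the paper) that solves (2.4) for \({\underline{S}}(z)\) by naive fixed-point iteration and then recovers \(S(z)\) from the relation \(S(z)=\big ({\underline{S}}(z)+(1-c)/z\big )/c\). The AR(1)-type spectral density, the normalization \(f(\lambda )=\frac{1}{2\pi }\sum _{k} \mathrm{Cov}(X_0,X_k)\mathrm{e}^{-ik\lambda }\), the grid size, and the assumption that the iteration converges for the chosen \(z\) are all ours.

```python
import numpy as np

def solve_stieltjes(z, c, f, n_grid=2000, n_iter=500):
    """Solve (2.4) for underline{S}(z) by naive fixed-point iteration
    (illustrative only; convergence is assumed, not proved here)."""
    lam = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    fl = f(lam)                                  # spectral density sampled on [0, 2*pi)
    s_under = -1.0 / z                           # starting point
    for _ in range(n_iter):
        integral = np.mean(1.0 / (s_under + 1.0 / (2.0 * np.pi * fl)))
        s_under = -1.0 / (z - c * integral)      # rearranged form of (2.4)
    return (s_under + (1.0 - c) / z) / c         # recover S(z) from underline{S}(z)

# Example: AR(1)-type spectral density f(x) = |1 - a e^{ix}|^{-2} / (2 pi), a = 0.5.
a = 0.5
f = lambda x: 1.0 / (2.0 * np.pi * np.abs(1.0 - a * np.exp(1j * x)) ** 2)
print(solve_stieltjes(1.0 + 0.5j, c=0.5, f=f))
```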

Remark 2.2

Under the condition (2.3), the series \(\sum _{k \ge 0} |\mathrm{Cov} ( X_0 , X_k ) |\) is finite (see for instance the inequality (4.61)). Therefore (2.3) implies that the spectral density \(f(\cdot )\) of \((X_k)_{k \in \mathbb{Z }}\) exists, is continuous, and bounded on \([0,2\pi )\). It follows that Proposition 1 in Yao [23] concerning the support of the limiting spectral distribution \(F\) still applies if (2.3) holds. In particular, \(F\) is compactly supported. Notice also that condition (2.3) is essentially optimal for the covariances to be absolutely summable. Indeed, for a causal linear process with nonnegative coefficients and generated by a sequence of i.i.d. real-valued random variables centered and in \(\mathbb{L }^2\), both conditions are equivalent to the summability of the coefficients.

Remark 2.3

Let us mention that each of the following conditions is sufficient for the validity of (2.3):

$$\begin{aligned} \sum _{n \ge 1} \frac{1}{\sqrt{n}}\Vert \mathbb{E }( X_{n} | \xi _0 )\Vert _2 < \infty \ \text { or } \ \sum _{n \ge 1} \frac{1}{\sqrt{n}}\Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert _2 < \infty , \end{aligned}$$
(2.5)

where \(\mathcal{F }_{1}^n = \sigma ( \varepsilon _k , 1 \le k \le n)\). A condition such as the second part of (2.5) is usually referred to as a near epoch dependence-type condition. The fact that the first part of (2.5) implies (2.3) follows from Corollary 2 in Peligrad and Utev [14]. Corollary 5 of the same paper asserts that the second part of (2.5) implies its first part.

Remark 2.4

Since many processes encountered in practice are causal, Theorem 2.1 is stated for the one-sided process \((X_k)_{k \in \mathbb{Z }}\) having the representation (2.1). With non-essential modifications in the proof, the same result holds when \((X_k)_{k \in \mathbb{Z }}\) is a two-sided process having the representation

$$\begin{aligned} X_k=g ( \ldots , \varepsilon _{k-1}, \varepsilon _k, \varepsilon _{k+1},\ldots ), \end{aligned}$$
(2.6)

where \(( \varepsilon _k)_{ k\in \mathbb Z }\) is a sequence of i.i.d. real-valued random variables. Assuming that \(X_0\) is centered and in \(\mathbb{L }^2\), condition (2.3) has then to be replaced by the following condition: \(\sum _{k\in \mathbb{Z }} \Vert P_0(X_{k}) \Vert _2 < \infty \).

Remark 2.5

One can wonder whether Theorem 2.1 extends to the case of functionals of another strictly stationary sequence which can be strong mixing or absolutely regular, even if this framework and ours have different ranges of applicability. Actually, many models encountered in econometric theory have the representation (2.1), whereas, for instance, functionals of absolutely regular (\(\beta \)-mixing) sequences occur naturally as orbits of chaotic dynamical systems. In this situation, we do not think that Theorem 2.1 extends in its full generality without requiring an additional near epoch dependence-type condition. It is outside the scope of this paper to study such models, which will be the object of further investigations.

3 Applications

In this section, we give two different classes of models for which the condition (2.3) is satisfied and then for which our Theorem 2.1 applies. Other classes of models, including nonlinear time series such as iterative Lipschitz models or chains with infinite memory, which are of the form (2.1) and for which the quantities \(\Vert P_0(X_k) \Vert _2\) or \(\Vert \mathbb{E }(X_k | \xi _0) \Vert _2\) can be computed, may be found in [22].

3.1 Functions of Linear Processes

In this section, we shall focus on functions of real-valued linear processes. Define

$$\begin{aligned} X_{k}=h\Big (\sum _{i\ge 0}a_{i}\varepsilon _{k-i}\Big )-\mathbb{E }\Big (h\Big (\sum _{i\ge 0}a_{i}\varepsilon _{k-i}\Big )\Big ), \end{aligned}$$
(3.1)

where \((a_{i})_{i\in {\mathbb{Z }}}\) is a sequence of real numbers in \(\ell ^{1}\) and \((\varepsilon _{i})_{i\in \mathbb Z }\) is a sequence of i.i.d. real-valued random variables in \(\mathbb{L }^{1}\). We shall give sufficient conditions, in terms of the regularity of the function \(h\), for the condition (2.3) to be satisfied.

Denote by \(w_{h}(\cdot )\) the modulus of continuity of the function \(h\) on \(\mathbb{R }\), that is:

$$\begin{aligned} w_{h}(t)=\sup _{|x-y| \le t} |h(x)-h(y)|\,. \end{aligned}$$

Corollary 3.1

Assume that

$$\begin{aligned} \sum _{k \ge 0}\Vert w_{h} ( | a_k \varepsilon _0 | )\Vert _2 < \infty , \end{aligned}$$
(3.2)

or

$$\begin{aligned} \sum _{k \ge 1} \frac{\big \Vert w_h \big ( \sum _{\ell \ge 0} |a_{k + \ell }| | \varepsilon _{- \ell }|\big ) \big \Vert _2}{k^{1/2}}< \infty . \end{aligned}$$
(3.3)

Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.1).

Example 1

Assume that \(h\) is \(\gamma \)-Hölder with \(\gamma \in (0,1]\), that is: there is a positive constant \(C\) such that \(w_{h}(t) \le C |t|^{\gamma }\). Assume that

$$\begin{aligned} \sum _{k\ge 0}|a_{k}|^{\gamma }<\infty \ \text { and } \ \mathbb{E } (|\varepsilon _{0}|^{(2 \gamma ) \vee 1})<\infty , \end{aligned}$$

then the condition (3.2) is satisfied and the conclusion of Corollary 3.1 holds. In particular, when \(h\) is the identity, which corresponds to the fact that \(X_k\) is a causal linear process, the conclusion of Corollary 3.1 holds as soon as \(\sum _{k\ge 0}|a_{k}|<\infty \) and \(\varepsilon _0\) belongs to \(\mathbb{L }^2\). This improves Theorem 2.5 in Bai and Zhou [2] and Theorem 1 in Yao [23] that require \(\varepsilon _0\) to be in \(\mathbb{L }^4\).
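
For the causal linear case just discussed, the LSD claim is easy to probe numerically. The sketch below (an illustration under our own choices of an AR(1) coefficient, sample sizes, and a burn-in length) simulates \(n\) independent copies of the process, builds the matrix \(\mathbf{B}_n\) of (2.2), and prints a crude summary of its spectrum; the resulting eigenvalue histogram can be compared with the density obtained from the solver sketched after Theorem 2.1.

```python
import numpy as np

def simulate_linear_rows(n, N, a=0.5, burn=200, rng=None):
    """n independent copies of the AR(1) linear process X_k = sum_{i>=0} a^i eps_{k-i}
    (an instance of (3.1) with h the identity); each row is one copy of length N."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n, N + burn))
    X = np.zeros((n, N + burn))
    for k in range(1, N + burn):
        X[:, k] = a * X[:, k - 1] + eps[:, k]    # AR(1) recursion
    return X[:, burn:]                            # discard the burn-in period

rng = np.random.default_rng(1)
n, N = 1000, 500                                  # so that c = N/n = 0.5
rows = simulate_linear_rows(n, N, rng=rng)        # shape (n, N), one copy per row
B = rows.T @ rows / n                             # the matrix B_n of (2.2)
eigs = np.linalg.eigvalsh(B)
print(eigs.min(), np.median(eigs), eigs.max())    # crude summary of F^{B_n}
```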

Example 2

Assume that \(\Vert \varepsilon _0 \Vert _{\infty } \le M\), where \(M\) is a finite positive constant, and that \(|a_k| \le C \rho ^k\), where \(\rho \in (0,1)\) and \(C\) is a finite positive constant. Then the condition (3.3) is satisfied and the conclusion of Corollary 3.1 holds as soon as \( \sum _{k \ge 1} k^{-1/2} w_h \big ( \rho ^k M C (1- \rho )^{-1}\big ) < \infty \). Using the usual comparison between series and integrals, it follows that the latter condition is equivalent to

$$\begin{aligned} \int \limits _0^1 \frac{w_h (t)}{t \sqrt{| \log t |}}\hbox {d}t < \infty . \end{aligned}$$
(3.4)

For instance, if \(w_{h}(t) \le C | \log t |^{-\alpha }\) near zero with \( \alpha >1/2\), then the above condition is satisfied.

Let us now consider the special case of functionals of Bernoulli shifts (also called Raikov or Riesz–Raikov sums). Let \((\varepsilon _k)_{k\in \mathbb{Z }}\) be a sequence of i.i.d. random variables such that \(\mathbb{P }(\varepsilon _0=1)=\mathbb{P }(\varepsilon _0=0)=1/2\) and let, for any \(k \in \mathbb{Z }\),

$$\begin{aligned} Y_k= \sum _{i\ge 0}2^{-i-1}\varepsilon _{k-i} \, \text { and } X_k = h ( Y_k) -\int \limits _0^1 h(x) \hbox {d}x , \end{aligned}$$
(3.5)

where \(h \in \mathbb{L }^2 ([0,1])\), \([0,1]\) being equipped with the Lebesgue measure. Recall that \(Y_n\), \(n \ge 0\), is an ergodic stationary Markov chain taking values in \([0,1]\), whose stationary initial distribution is the restriction of Lebesgue measure to \([0,1]\). As we have seen previously, if \(h\) has a modulus of continuity satisfying (3.4), then the conclusion of Theorem 2.1 holds for the sample covariance matrix associated with such a functional of Bernoulli shifts. Since for Bernoulli shifts, the computations can be done explicitly, we can even derive an alternative condition to (3.4), still in terms of regularity of \(h\), in such a way that (2.3) holds.

Corollary 3.2

Assume that

$$\begin{aligned} \int \limits _0^1 \int \limits _0^1 ( h(x) - h(y))^2 \frac{1}{|x-y|} \big (\log \big (\log \frac{1}{|x-y|}\big )\big )^{t}\hbox {d}x \hbox {d}y < \infty , \end{aligned}$$
(3.6)

for some \(t>1\). Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.5).

As a concrete example of a map satisfying (3.6), we can consider the function

$$\begin{aligned} g(x) = \frac{1}{\sqrt{x}} \frac{1}{ (1 + \log ( 2/x) )^4} \sin \Big ( \frac{1}{x}\Big ) \ , \, 0 < x < 1 \end{aligned}$$

(see the computations on pages 23–24 of Merlevède et al. [11], showing that the above function satisfies (3.6)).
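
The chain in (3.5) satisfies the one-step recursion \(Y_k = (\varepsilon _k + Y_{k-1})/2\), which makes it straightforward to simulate. The following sketch is only an illustration: the choice \(h(x)=\sqrt{x}\) (which is \(1/2\)-Hölder, so that (3.4) holds), the burn-in length, and the Riemann approximation of \(\int _0^1 h\) are ours.

```python
import numpy as np

def bernoulli_shift(N, h, rng, burn=60):
    """One (approximately stationary) path of Y_k = (eps_k + Y_{k-1}) / 2 from (3.5),
    with eps_k i.i.d. Bernoulli(1/2), and the centered functional X_k = h(Y_k) - int_0^1 h."""
    eps = rng.integers(0, 2, size=N + burn)
    Y = np.empty(N + burn)
    Y[0] = rng.uniform()                       # start from the stationary (uniform) law
    for k in range(1, N + burn):
        Y[k] = 0.5 * (eps[k] + Y[k - 1])
    x_grid = np.linspace(0.0, 1.0, 10_001)
    mean_h = np.mean(h(x_grid))                # Riemann approximation of int_0^1 h(x) dx
    return h(Y[burn:]) - mean_h

rng = np.random.default_rng(2)
h = lambda x: np.sqrt(x)                       # an arbitrary h in L^2([0,1]) satisfying (3.4)
X = bernoulli_shift(500, h, rng)
print(X[:5])
```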

Proof of Corollary 3.1

To prove the corollary, it suffices to show that the condition (2.3) is satisfied as soon as (3.2) or (3.3) holds. Let \((\varepsilon _k^*)_{k \in \mathbb{Z }}\) be an independent copy of \((\varepsilon _k)_{k \in \mathbb{Z }}\). Denoting by \(\mathbb{E }_{\varepsilon }(\cdot )\) the conditional expectation with respect to \(\varepsilon =(\varepsilon _k)_{k \in \mathbb{Z }}\), we have that, for any \(k \ge 0\),

$$\begin{aligned} \Vert P_0 ( X_{k} )\Vert _2\!&= \! \Big \Vert \mathbb{E }_{\varepsilon }\Big (h\Big (\sum _{i=0 }^{k-1} a_i \varepsilon ^*_{k-i} \!+\! \sum _{i \ge k} a_i \varepsilon _{k-i}\Big ) - h\Big (\sum _{i=0 }^{k} a_i \varepsilon ^*_{k-i} + \sum _{i \ge k+1} a_i \varepsilon _{k-i}\Big ) \Big ) \Big \Vert _2 \\&\le \Vert w_{h} \big ( \big | a_k (\varepsilon _0-\varepsilon _0^*)\big | \big )\Vert _2. \end{aligned}$$

Next, by the subadditivity of \(w_{h}(\cdot )\), \(w_{h} ( | a_k (\varepsilon _0-\varepsilon _0^*) | ) \le w_{h} (| a_k \varepsilon _0 | )+ w_{h} ( | a_k \varepsilon _0^* | )\). Whence, \( \Vert P_0 ( X_{k} )\Vert _2 \le 2 \Vert w_{h} ( | a_k \varepsilon _0 | )\Vert _2 \). This proves that the condition (2.3) is satisfied under (3.2).

We prove now that if (3.3) holds then so does the condition (2.3). According to Remark 2.3, it suffices to prove that the first part of (2.5) is satisfied. With the same notations as before, we have that, for any \(\ell \ge 0\),

$$\begin{aligned} \mathbb{E }( X_{\ell } | \xi _0 )= \mathbb{E }_{\varepsilon }\Big (h\Big (\sum _{i=0 }^{\ell -1} a_i \varepsilon ^*_{\ell -i} + \sum _{i \ge \ell } a_i \varepsilon _{\ell -i}\Big ) - h\Big (\sum _{i\ge 0} a_i \varepsilon ^*_{\ell -i} \Big ) \Big ) . \end{aligned}$$

Hence, for any nonnegative integer \(\ell \),

$$\begin{aligned} \Vert \mathbb{E }( X_{\ell } | \xi _0 ) \Vert _2 \le \Big \Vert w_h \Big ( \sum _{i\ge \ell } |a_i (\varepsilon _{\ell -i}-\varepsilon ^*_{\ell -i} ) |\Big ) \Big \Vert _2 \le 2 \Big \Vert w_h \Big ( \sum _{i\ge \ell } |a_i | | \varepsilon _{\ell -i}|\Big ) \Big \Vert _2, \end{aligned}$$

where we have used the subadditivity of \(w_{h}(\cdot )\) for the last inequality. This latter inequality entails that the first part of (2.5) holds as soon as (3.3) does. \(\square \)

Proof of Corollary 3.2

By Remark 2.3, it suffices to prove that the second part of (2.5) is satisfied as soon as (3.6) is. Actually we shall prove that (3.6) implies that

$$\begin{aligned} \sum _{n \ge 1} ( \log n)^t \Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert ^2_2 < \infty , \end{aligned}$$
(3.7)

which clearly entails the second part of (2.5) since \(t >1\). An upper bound for the quantity \(\Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert ^2_2\) has been obtained in [8, Chapter 19.3]. Setting \(A_{j,n}=[j2^{-n}, (j+1)2^{-n})\) for \(j=0,1,\ldots ,2^{n}-1\), they obtained (see pages 372–373 of their monograph) that

$$\begin{aligned} \Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert ^2_2 \le 2^n \sum _{j=0}^{2^n -1} \int \limits _{A_{j,n}} \int \limits _{A_{j,n}} ( h(x) - h(y) )^2 \hbox {d}x \hbox {d}y . \end{aligned}$$

Since

$$\begin{aligned} \sum _{j=0}^{2^n -1} \int \limits _{A_{j,n}} \int \limits _{A_{j,n}} ( h(x) - h(y) )^2 \hbox {d}x \hbox {d}y \le \int \limits _0^1 \int \limits _0^1 ( h(x) - h(y) )^2 \mathbf{1 }_{|x-y| \le 2^{-n}}\hbox {d}x \hbox {d}y , \end{aligned}$$

it follows that

$$\begin{aligned}&\sum _{n \ge 1} ( \log n)^t \Vert X_{n} - \mathbb{E } ( X_n| \mathcal{F }_{1}^n)\Vert ^2_2 \\&\quad \le \int \limits _0^1 \int \limits _0^1 \sum _{n : 2^{-n} \ge |x-y|}2^n ( \log n)^t ( h(x) - h(y) )^2 \mathbf{1 }_{|x-y| \le 2^{-n}}\hbox {d}x \hbox {d}y . \end{aligned}$$

This latter inequality, together with the fact that for any \(u \in (0,1)\), \(\sum _{n : 2^{-n} \ge u } 2^n ( \log n)^t \le C u^{-1} ( \log ( \log u^{-1}) )^t \) for some positive constant \(C\), proves that (3.7) holds under (3.6). \(\square \)

3.2 ARCH Models

Let \((\varepsilon _{k})_{k\in \mathbb Z }\) be an i.i.d. sequence of zero mean real-valued random variables such that \(\Vert \varepsilon _{0}\Vert _{2 }= 1\). We consider the following ARCH(\(\infty \)) model described by Giraitis et al. [5]:

$$\begin{aligned} Y_{k}=\sigma _{k}\varepsilon _{k} \ \text { where } \ \sigma _{k}^{2}=a+\sum _{j\ge 1}a_{j}Y_{k-j}^{2}\,, \end{aligned}$$
(3.8)

where \(a\ge 0\), \(a_{j}\ge 0\), and \(\sum _{j\ge 1}a_{j}<1\). Such models are encountered when the volatility \((\sigma _{k}^{2})_{k\in \mathbb{Z }}\) is unobserved. In that case, the process of interest is \((Y_{k}^{2})_{k\in \mathbb{Z }}\) and, in what follows, we consider the process \((X_{k})_{k\in \mathbb Z }\) defined, for any \(k \in \mathbb{Z }\), by:

$$\begin{aligned} X_{k} = Y_k^2 - \mathbb{E }(Y_k^2)\quad \text { where } Y_k \text { is defined in}\,(3.8). \end{aligned}$$
(3.9)

Notice that, under the above conditions, there exists a unique stationary solution of Eq. (3.8) satisfying (see [5]):

$$\begin{aligned} \sigma _{k}^{2}=a+a\sum _{\ell =1}^{\infty }\sum _{j_{1},\ldots ,j_{\ell }=1}^{\infty }a_{j_{1}}\ldots a_{j_{\ell }}\varepsilon _{k-j_{1}}^{2}\ldots \varepsilon _{k-(j_{1}+\cdots +j_{\ell })}^{2}\,. \end{aligned}$$
(3.10)

Corollary 3.3

Assume that \(\varepsilon _0\) belongs to \(\mathbb{L }^4\) and that

$$\begin{aligned} \Vert \varepsilon _0\Vert _4^2 \sum _{j \ge 1} a_j <1 \text { and } \sum _{j \ge n} a_j = O (n^{-b}) \qquad \text {for some } \ b>1/2 . \end{aligned}$$
(3.11)

Then, provided that \(c(n)=N/n\rightarrow c \in (0,\infty )\), the conclusion of Theorem 2.1 holds for \(F^{\mathbf{B}_n}\) where \(\mathbf{B}_n\) is the sample covariance matrix of dimension \(N\) defined by (2.2) and associated with \((X_{k})_{k\in \mathbb{Z }}\) defined by (3.9).
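
As an illustration of Corollary 3.3 (not part of the original text), the sketch below simulates the ARCH(1) special case of (3.8), \(\sigma _k^2 = a + a_1 Y_{k-1}^2\), with standard Gaussian innovations, for which \(\Vert \varepsilon _0\Vert _4^2 \, a_1 = \sqrt{3}\, a_1 < 1\) when \(a_1 = 0.3\), so that (3.11) holds trivially; it then forms the centered process (3.9) and the matrix \(\mathbf{B}_n\) of (2.2). The parameter values and burn-in length are our own choices.

```python
import numpy as np

def simulate_arch_rows(n, N, a=1.0, a1=0.3, burn=300, rng=None):
    """n independent copies of the ARCH(1) special case of (3.8)-(3.9):
    sigma_k^2 = a + a1 * Y_{k-1}^2, Y_k = sigma_k * eps_k, X_k = Y_k^2 - E(Y_k^2)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n, N + burn))
    Y2 = np.full(n, a / (1.0 - a1))            # start Y^2 at its stationary mean
    out = np.empty((n, N + burn))
    for k in range(N + burn):
        sigma2 = a + a1 * Y2                   # volatility from the previous squared value
        Y2 = sigma2 * eps[:, k] ** 2           # Y_k^2 = sigma_k^2 * eps_k^2
        out[:, k] = Y2
    mean_Y2 = a / (1.0 - a1)                   # E(Y_k^2) = a / (1 - a1) since E(eps^2) = 1
    return out[:, burn:] - mean_Y2             # the centered process X_k of (3.9)

rng = np.random.default_rng(3)
n, N = 1000, 500                               # so that c = N/n = 0.5
rows = simulate_arch_rows(n, N, rng=rng)
B = rows.T @ rows / n                          # the matrix B_n of (2.2)
print(np.linalg.eigvalsh(B)[-3:])              # a few of the largest eigenvalues
```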

Proof of Corollary 3.3

By Remark 2.3, it suffices to prove that the first part of (2.5) is satisfied as soon as (3.11) is. With this aim, let us notice that, for any integer \(n \ge 1\),

$$\begin{aligned}&\Vert \mathbb{E }( X_n | \xi _0 ) \Vert _2 = \Vert \varepsilon _0\Vert _4^2 \Vert \mathbb{E }(\sigma ^2_n | \xi _0 ) - \mathbb{E }(\sigma _n^2) \Vert _2 \\&\quad \le 2 a \Vert \varepsilon _0\Vert _4^2 \Big \Vert \sum _{\ell =1}^{\infty }\sum _{j_1, \ldots , j_{\ell }=1}^{\infty } a_{j_1}\ldots a_{j_{\ell }} \varepsilon _{n-j_1}^2 \ldots \varepsilon _{n-(j_1+\cdots +j_{\ell })}^2 \mathbf{1}_{j_1+\cdots +j_{\ell } \ge n}\Big \Vert _{2} \\&\quad \le 2 a \Vert \varepsilon _0\Vert _4^2 \sum _{\ell =1}^{\infty }\sum _{j_1, \ldots , j_{\ell }=1}^{\infty } \sum _{k=1}^{\ell } a_{j_1}\ldots a_{j_{\ell }} \mathbf{1}_{j_k \ge [n/\ell ]}\Vert \varepsilon _{0}\Vert _{4}^{2 \ell } \\&\quad \le 2 a \Vert \varepsilon _{0}\Vert _{4}^{2}\sum _{\ell =1}^{\infty } \ell \kappa ^{\ell -1} \sum _{k =[n/\ell ] }^{\infty }a_k, \end{aligned}$$

where \(\kappa = \Vert \varepsilon _0\Vert _4^2 \sum _{j \ge 1} a_j\). So, under (3.11), there exists a positive constant \(C\) not depending on \(n\) such that \( \Vert \mathbb{E }( X_n | \xi _0 ) \Vert _2 \le C n^{-b} \). This upper bound implies that the first part of (2.5) is satisfied as soon as \(b>1/2\). \(\square \)

Remark 3.4

Notice that if we consider the sample covariance matrix associated with \((Y_k)_{k \in \mathbb{Z }}\) defined in (3.8), then its LSD follows directly from Theorem 2.1 since \(P_0 (Y_k) = 0\) for any positive integer \(k\).

4 Proof of Theorem 2.1

To prove the theorem, it suffices to show that for any \(z \in \mathbb{C }^+\),

$$\begin{aligned} S_{F^{\mathbf{B}_n}}(z) \rightarrow S(z) \text { almost surely.} \end{aligned}$$
(4.1)

Since the columns of \(\mathcal X _n\) are independent, by Step 1 of the proof of Theorem 1.1 in Bai and Zhou [2], to prove (4.1), it suffices to show that, for any \(z \in \mathbb{C }^+\),

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) =S(z) , \end{aligned}$$
(4.2)

where \(S(z)\) satisfies the Eq. (2.4).

The proof of (4.2) being very technical, let us, for the reader’s convenience, describe the different steps leading to it. We shall consider a sample covariance matrix \(\mathbf{G}_n:=\frac{1}{n}\mathcal Z _n\mathcal Z _{n}^{T}\) (see (4.32)) such that the columns of \(\mathcal Z _n\) are independent and the random variables in each column of \(\mathcal Z _n\) form a sequence of Gaussian random variables whose covariance structure is the same as that of the sequence \((X_k)_{k\in \mathbb{Z }}\) (see Sect. 4.2). The aim will be then to prove that, for any \(z\in \mathbb{C }^+\),

$$\begin{aligned} \lim _{n \rightarrow \infty } \big | \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) - \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) \big | = 0 , \end{aligned}$$
(4.3)

and

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) = S(z) . \end{aligned}$$
(4.4)

The proof of (4.4) will be achieved in Sect. 4.4 with the help of Theorem 1.1 in Silverstein [17] combined with arguments developed in the proof of Theorem 1 in Yao [23]. The proof of (4.3) will be divided into several steps. First, to “break” the dependence structure, we introduce a parameter \(m\), and approximate \({\mathbf{B}_n}\) by a sample covariance matrix \({{\bar{\mathbf{B}}}_n}:= \frac{1}{n} {\bar{\mathcal{X }}}_{n} {\bar{\mathcal{X }}}_{n}^{T}\) (see (4.16)) such that the columns of \({\bar{\mathcal{X }}}_{n}\) are independent and the random variables in each column of \({\bar{\mathcal{X }}}_{n}\) form an \(m\)-dependent sequence of random variables bounded by \(2M \), with \(M\) a positive real (see Sect. 4.1). This approximation will be done in such a way that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \lim _{m \rightarrow \infty } \limsup _{M \rightarrow \infty } \limsup _{n \rightarrow \infty }\Big | \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) \Big | = 0 \,. \end{aligned}$$
(4.5)

Next, the sample Gaussian covariance matrix \(\mathbf{G}_n\) is approximated by another sample Gaussian covariance matrix \({\widetilde{\mathbf{G}}}_{n}\) (see (4.34)), depending on the parameter \(m\) and constructed from \(\mathbf{G}_n\) by replacing some of the variables in each column of \(\mathcal Z _n\) by zeros (see Sect. 4.2). This approximation will be done in such a way that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \lim _{m \rightarrow \infty } \limsup _{n \rightarrow \infty }\Big | \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) \Big | = 0 . \end{aligned}$$
(4.6)

In view of (4.5) and (4.6), the convergence (4.3) will then follow if we can prove that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \lim _{m \rightarrow \infty } \limsup _{M \rightarrow \infty } \limsup _{n \rightarrow \infty }\Big | \mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) - \mathbb{E }\big ( S_{F^{{\mathbf{{\widetilde{G}}}}_{n}}}(z) \big ) \Big | = 0 . \end{aligned}$$
(4.7)

This will be achieved in Sect. 4.3 with the help of the Lindeberg method. The rest of this section is devoted to the proofs of the convergences (4.3)–(4.7).

4.1 Approximation by a Sample Covariance Matrix Associated with an \(m\)-Dependent Sequence

Let \(N\ge 2\) and let \(m\) be a positive integer, fixed for the moment and assumed to be less than \(\sqrt{N/2}\). Set

$$\begin{aligned} k_{N,m}=\left[ \frac{N}{m^2+{m}}\right] , \end{aligned}$$
(4.8)

where we recall that \([ \, \cdot \, ]\) denotes the integer part. Let \(M\) be a fixed positive number that depends neither on \(N\) nor on \(n\), nor on \(m\). Let \(\varphi _{M}\) be the function defined by \(\varphi _{M}(x)=(x \wedge M)\vee (-M)\). Now for any \(k\in \mathbb Z \) and \( i \in \lbrace 1, \ldots , n \rbrace \) let

$$\begin{aligned} \widetilde{X}^{(i)}_{k,M,m}\!=\! \mathbb{E }\Big ( \varphi _{M} (X^{(i)}_{k}) \vert \varepsilon _{k}^{(i)}, \ldots , \varepsilon _{k-m}^{(i)} \Big ) \ \text { and} \ \bar{X}^{(i)}_{k,M,m}\!=\! \widetilde{X}^{(i)}_{k,M,m} \!-\!\mathbb{E }\big (\widetilde{X}^{(i)}_{k,M,m}\big ).\qquad \end{aligned}$$
(4.9)

In what follows, to lighten the notation, we shall write \({\widetilde{X}}^{(i)}_{k,m}\) and \({\bar{X}}^{(i)}_{k,m}\) instead of, respectively, \(\widetilde{X}^{(i)}_{k,M,m}\) and \(\bar{X}^{(i)}_{k,M,m}\), when no confusion is possible. Notice that \(\big ( \bar{X}^{(1)}_{k,m}\big )_{k \in \mathbb{Z }}, \ldots , \big ( \bar{X}^{(n)}_{k,m}\big )_{k \in \mathbb{Z }}\) are \(n\) independent copies of the centered and stationary sequence \(\big ( {\bar{X}}_{k,m}\big )_{k \in \mathbb{Z }}\) defined by

$$\begin{aligned} \bar{X}_{k,m}\!=\! \widetilde{X}_{k,m} \!-\!\mathbb{E }\big (\widetilde{X}_{k,m}\big ) \ \text { where } \ \widetilde{X}_{k,m}\!=\! \mathbb{E }\Big ( \varphi _{M} (X_{k}) \vert \varepsilon _{k}, \ldots , \varepsilon _{k-m}\Big ) \ , \ k \in \mathbb{Z } .\qquad \end{aligned}$$
(4.10)

This implies in particular that: for any \(i \in \{1, \ldots , n \}\) and any \(k\in \mathbb Z \),

$$\begin{aligned} \Vert {\bar{X}}^{(i)}_{k,m}\Vert _{\infty } =\Vert {\bar{X}}_{k,m}\Vert _{\infty } \le 2M . \end{aligned}$$
(4.11)

For any \(i \in \{1, \ldots , n\}\), note that \( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in \mathbb{Z }}\) forms an \({m}\)-dependent sequence, in the sense that \( {\bar{X}}^{(i)}_{k,m}\) and \({\bar{X}}^{(i)}_{k',m}\) are independent if \(\vert k-k' \vert > {m}\).

We now write the interval \([1, N]\cap \mathbb N \) as a union of disjoint sets as follows:

$$\begin{aligned}{}[1, N]\cap \mathbb N = \bigcup _{\ell =1}^{k_{N,m}+1} \big ( I_{\ell } \cup J_{\ell } \big ) , \end{aligned}$$

where, for \( \ell \in \{1, \ldots , k_{N,m} \}\),

$$\begin{aligned} I_{\ell }&:= \big [(\ell -1)(m^2 +m)+1 , (\ell -1)(m^2 +m)+m^2 \big ]\cap \mathbb N , \nonumber \\ J_{\ell }&:= \Big [(\ell -1)(m^2 +m)+m^2+1 , \ell (m^2 +m)\Big ]\cap \mathbb N , \end{aligned}$$
(4.12)

and, for \(\ell = k_{N,m} +1 \),

$$\begin{aligned} I_{k_{N,m} +1} = \big [ k_{N,m}(m^2 +m)+1 , N \big ]\cap \mathbb N , \end{aligned}$$

and \(J_{k_{N,m} +1} = \emptyset \). Note that \(I_{k_{N,m} +1} =\emptyset \) if \(k_{N,m}(m^2 +m) = N\).

Let now \(\big (\mathbf{{u}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m}\} }\) be the random vectors defined as follows. For any \(\ell \) belonging to \(\{1, \ldots , k_{N,m}-1\}\),

$$\begin{aligned} \mathbf{{u}}^{(i)}_{\ell }= \Big ( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in I_{\ell }}, \mathbf{0}_{m} \Big ) . \end{aligned}$$
(4.13)

Hence, the dimension of the random vectors defined above is equal to \(m^2+{m}\). Now, for \(\ell = k_{N,m}\), we set

$$\begin{aligned} \mathbf{{u}}^{(i)}_{k_{N,m}}= \Big ( \big ( {\bar{X}}^{(i)}_{k,m}\big )_{k \in I_{k_{N,m}}}, \mathbf{0}_{r} \Big ), \end{aligned}$$
(4.14)

where \(r= m + N - k_{N,m} (m^2+m)\). This last vector is then of dimension \( N -( k_{N,m} -1 ) (m^2+m)\).

Notice that the random vectors \(\big ( \mathbf{{u}}^{(i)}_{\ell } \big )_{1 \le i \le n, 1 \le \ell \le k_{N,m}}\) are mutually independent.

For any \(i \in \{1, \ldots , n \}\), we define now row random vectors \({\bar{\mathbf{X}}^{(i)}}\) of dimension \(N\) by setting

$$\begin{aligned} {\bar{\mathbf{X}}^{(i)}} = \big (\mathbf{{u}}^{(i)}_{ \ell } ,\ell = 1, \ldots , k_{N,m} \big ) , \end{aligned}$$
(4.15)

where the \(\mathbf{{u}}^{(i)}_{ \ell } \)’s are defined in (4.13) and (4.14). Let

$$\begin{aligned} {\bar{\mathcal{X }}}_{n} = \big ( {\bar{\mathbf{X}}^{(1) T}}\vert \cdots \vert \bar{\mathbf{X}}^{(n) T}\big ) \text { and } \ {{\bar{\mathbf{B}}}_n}= \frac{1}{n} {\bar{\mathcal{X }}}_{n} {\bar{\mathcal{X }}}_{n}^{T}. \end{aligned}$$
(4.16)
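
The interleaving of kept blocks \(I_{\ell }\) of length \(m^2\) and zeroed gaps of length \(m\) described in (4.8) and (4.12)–(4.15) is perhaps easiest to see on a small example. The sketch below reproduces only this index bookkeeping (the truncation and conditioning in (4.9) are not reproduced, and 0-based Python indexing is used); the function names are ours.

```python
import numpy as np

def kept_indices(N, m):
    """0-based indices belonging to I_1, ..., I_{k_{N,m}}, i.e. the blocks of length m^2
    that are kept in (4.13)-(4.15); everything else (gaps and final remainder) is zeroed."""
    k_Nm = N // (m * m + m)                          # k_{N,m} of (4.8)
    idx = [np.arange(l * (m * m + m), l * (m * m + m) + m * m) for l in range(k_Nm)]
    return np.concatenate(idx) if idx else np.array([], dtype=int)

def gap_out(row, m):
    """Build a row with the structure of bar{X}^{(i)} in (4.15): keep the I_ell blocks,
    put zeros elsewhere."""
    out = np.zeros(row.size)
    keep = kept_indices(row.size, m)
    out[keep] = row[keep]
    return out

row = np.arange(1.0, 31.0)         # a toy "row" of length N = 30
print(gap_out(row, m=2))           # pattern: m^2 = 4 kept values, m = 2 zeros, repeated
```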

In what follows, we shall prove the following proposition.

Proposition 4.1

For any \(z\in \mathbb C ^+\), the convergence (4.5) holds true with \({\mathbf{B}_n}\) and \({{\bar{\mathbf{B}}}_n}\) as defined in (2.2) and (4.16), respectively.

To prove the proposition above, we start by noticing that, by integration by parts, for any \(z=u+iv \in \mathbb C ^+\),

$$\begin{aligned}&\Big | \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) \Big | \le \mathbb{E }\Big | \int \frac{1}{x-z} \hbox {d}F^{\mathbf{B}_n}(x) -\int \frac{1}{x-z} \hbox {d}F^{{{\bar{\mathbf{B}}}_n}}(x) \Big | \nonumber \\&\quad = \mathbb{E }\Big | \int \frac{F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x)}{(x-z)^2} \hbox {d}x \Big | \le \frac{1}{v^2} \, \mathbb{E }\int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \hbox {d}x . \end{aligned}$$
(4.17)

Now, \(\int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \hbox {d}x\) is nothing else but the Wasserstein distance of order \(1\) between the empirical measure of \({\mathbf{B}_n}\) and that of \({{{\bar{\mathbf{B}}}_n}}\). To be more precise, if \(\lambda _1, \ldots , \lambda _N\) denote the eigenvalues of \(\mathbf{B}_n\) in non-increasing order, and \({\bar{\lambda }}_1, \ldots , {\bar{\lambda }}_N\) those of \({{{\bar{\mathbf{B}}}_n}}\), also in non-increasing order, then, setting \(\eta _n=\frac{1}{N} \sum _{k=1}^N \delta _{\lambda _k}\) and \({\bar{\eta }}_n=\frac{1}{N} \sum _{k=1}^N \delta _{ {\bar{\lambda }}_k}\), we have that

$$\begin{aligned} \int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \hbox {d}x = W_1 ( \eta _n , {\bar{\eta }}_n) = \inf \mathbb{E }| X -Y| , \end{aligned}$$

where the infimum runs over the set of couples of random variables \((X,Y)\) on \(\mathbb{R } \times \mathbb{R }\) such that \(X\sim \eta _n\) and \(Y \sim {\bar{\eta }}_n\). Arguing as in Remark 4.2.6 in [3], we have

$$\begin{aligned} W_1 ( \eta _n , {\bar{\eta }}_n) = \frac{1}{N} \min _{ \pi \in \mathcal{S }_N}\sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{\pi (k)}\vert , \end{aligned}$$

where \(\pi \) is a permutation belonging to the symmetric group \(\mathcal{S }_N\) of \(\{1, \ldots , N \}\). By standard arguments, involving the fact that if \(x,y,u,v\) are real numbers such that \(x\le y\) and \(u >v\), then \(|x-u| + |y-v| \ge |x-v| + |y-u|\), we get that \(\min _{ \pi \in \mathcal{S }_N}\sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{\pi (k)}\vert = \sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{k}\vert \). Therefore,

$$\begin{aligned} W_1 ( \eta _n , {\bar{\eta }}_n) = \int \big | F^{\mathbf{B}_n}(x) -F^{{{\bar{\mathbf{B}}}_n}}(x) \big | \hbox {d}x = \frac{1}{N} \sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{k}\vert . \end{aligned}$$
(4.18)
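
Before continuing with the proof, note that (4.18) is a quantity one can compute directly: sort the eigenvalues of both matrices and average the absolute gaps. The following small sketch (a numerical transcription of (4.18), with matrices and a perturbation chosen by us purely for illustration) does exactly this.

```python
import numpy as np

def w1_between_spectra(A, B):
    """Wasserstein-1 distance (4.18) between the empirical spectral distributions of two
    symmetric matrices of the same size: mean absolute gap between sorted eigenvalues."""
    la = np.sort(np.linalg.eigvalsh(A))[::-1]     # eigenvalues in non-increasing order
    lb = np.sort(np.linalg.eigvalsh(B))[::-1]
    return np.mean(np.abs(la - lb))

rng = np.random.default_rng(4)
N, n = 100, 200
X = rng.standard_normal((N, n))
E = 0.01 * rng.standard_normal((N, n))            # a small perturbation of the data matrix
print(w1_between_spectra(X @ X.T / n, (X + E) @ (X + E).T / n))
```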

Notice that \({\lambda }_k=s_k^2\) and \({\bar{\lambda }}_{k} = {\bar{s}}^2_{k}\) where the \(s_k\)’s (respectively the \({\bar{s}}_{k}\)’s) are the singular values of the matrix \(n^{-1/2}\mathcal X _{n}\) (respectively of \(n^{-1/2} \bar{\mathcal{X }}_{n}\)). Hence, by Cauchy–Schwarz’s inequality,

$$\begin{aligned}&\sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{k}\vert \le \Big ( \sum _{k=1}^{N \wedge n} \big \vert {s}_k+{\bar{s}}_{k} \big \vert ^2 \Big )^{1/2} \Big ( \sum _{k=1}^{N \wedge n} \big \vert {s}_k-{\bar{s}}_{k} \big \vert ^2 \Big )^{1/2} \\&\quad \le 2^{1/2} \Big ( \sum _{k=1}^{N \wedge n} \big ( s^2_k+{\bar{s}}^2_{k} \big ) \Big )^{1/2} \Big ( \sum _{k=1}^{N \wedge n} \big \vert {s}_k-{\bar{s}}_{k} \big \vert ^2 \Big )^{1/2}\\&\quad \le 2^{1/2} \Big ( \mathrm{Tr} ( {\mathbf{B}_n} ) + \mathrm{Tr}( {{{\bar{\mathbf{B}}}_n}} ) \Big )^{1/2} \Big ( \sum _{k=1}^{N \wedge n} \big \vert {s}_k-{\bar{s}}_{k} \big \vert ^2 \Big )^{1/2} . \end{aligned}$$

Next, by Hoffman–Wielandt’s inequality (see, e.g., Corollary 7.3.8 in Horn and Johnson [7]),

$$\begin{aligned} \sum _{k=1}^{N \wedge n} \big \vert {s}_k-{\bar{s}}_{k} \big \vert ^2 \le n^{-1} \mathrm{Tr} \big ( \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big ) \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big )^T \big ). \end{aligned}$$

Therefore,

$$\begin{aligned}&\sum _{k=1}^{N \wedge n} \vert {\lambda }_k-{\bar{\lambda }}_{k}\vert \le 2^{1/2} n^{-1/2}\Big ( \mathrm{Tr} ( {\mathbf{B}_n} ) + \mathrm{Tr}( {{{\bar{\mathbf{B}}}_n}} ) \Big )^{1/2}\nonumber \\&\quad \times \Big ( \mathrm{Tr} \big ( \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big ) \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big )^T \big ) \Big )^{1/2}. \end{aligned}$$
(4.19)

Starting from (4.17), considering (4.18) and (4.19), and using Cauchy–Schwarz’s inequality, it follows that

$$\begin{aligned}&\Big | \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) \Big | \nonumber \\&\quad \le \frac{2^{1/2}}{v^2} \frac{1}{N n^{1/2}} \Vert \mathrm{Tr} ( {\mathbf{B}_n}) + \mathrm{Tr} ( {{{\bar{\mathbf{B}}}_n}} ) \Vert _1^{1/2} \Vert \mathrm{Tr} \big ( \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big ) \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big )^T \big ) \Vert _1^{1/2}.\qquad \quad \end{aligned}$$
(4.20)

By the definition of \({\mathbf{B}_n}\),

$$\begin{aligned} \frac{1}{N} \mathbb{E }\big ( \vert \mathrm{Tr} ( {\mathbf{B}_n} ) \vert \big ) = \frac{1}{ n N} \sum _{i=1}^n \sum _{k=1}^N \big \Vert X_k^{(i)} \big \Vert _2^2 = \Vert X_0 \Vert _2^2 , \end{aligned}$$
(4.21)

where we have used that for each \(i\), \(\big ( X^{(i)}_{k} \big )_{k \in \mathbb{Z }}\) is a copy of the stationary sequence \( ( X_k )_{k \in \mathbb{Z }}\). Now, setting

$$\begin{aligned} \mathcal{I }_{N,m} = \bigcup _{\ell =1}^{k_{N,m}} I_{\ell } \ \text { and } \ \mathcal{R }_{N,m} = \{1, \ldots , N \} \backslash \mathcal{I }_{N,m} , \end{aligned}$$
(4.22)

recalling the definition (4.16) of \({{\bar{\mathbf{B}}}_n}\), using the stationarity of the sequence \(({\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\), and the fact that \(\mathrm{card} (\mathcal{I }_{N,m} ) = m^2 k_{N,m}\le N \), we get

$$\begin{aligned} \frac{1}{N} \mathbb{E }\big ( \vert \mathrm{Tr} ({{\bar{\mathbf{B}}}_n} ) \vert \big ) = \frac{1}{ n N} \sum _{i=1}^n \sum _{k \in \mathcal{I }_{N,m} } \big \Vert {\bar{X}}^{(i)}_{k,m}\big \Vert _2^2 \le \Vert {\bar{X}}_{0,m} \Vert _2^2. \end{aligned}$$

Next,

$$\begin{aligned} \Vert {\bar{X}}_{0,m} \Vert _2 \le 2 \Vert {\widetilde{X}}_{0,m} \Vert _2 \le 2 \Vert \varphi _M(X_0) \Vert _2 \le 2 \Vert X_0 \Vert _2 . \end{aligned}$$
(4.23)

Therefore,

$$\begin{aligned} \frac{1}{N} \mathbb{E }\big ( \vert \mathrm{Tr} ({{\bar{\mathbf{B}}}_n} ) \vert \big )\le 4 \Vert X_0 \Vert _2^2 . \end{aligned}$$
(4.24)

Now, by definition of \(\mathcal{X }_{n}\) and \( \bar{\mathcal{X }}_{n}\),

$$\begin{aligned}&\frac{1}{N n} \mathbb{E }\big ( \vert \mathrm{Tr} \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big ) \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big )^T \vert \big ) \\&\quad = \frac{1}{ n N} \sum _{i=1}^n \sum _{k \in \mathcal{I }_{N,m} } \big \Vert X_k^{(i)} - {\bar{X}}^{(i)}_{k,m}\big \Vert _2^2 + \frac{1}{ n N} \sum _{i=1}^n \sum _{k \in \mathcal{R }_{N,m} } \big \Vert X_k^{(i)} \big \Vert _2^2 . \end{aligned}$$

Using stationarity, the fact that \(\mathrm{card} (\mathcal{I }_{N,m} ) \le N \) and

$$\begin{aligned} \mathrm{card} (\mathcal{R }_{N,m}) =N - m^2k_{N,m}\le \frac{N }{m+1} + m^2, \end{aligned}$$
(4.25)

we get that

$$\begin{aligned}&\frac{1}{N n} \mathbb{E }\big ( \vert \mathrm{Tr} \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big ) \big ( \mathcal{X }_{n} - \bar{\mathcal{X }}_{n} \big )^T \vert \big ) \nonumber \\&\quad \le \Vert X_0 - {\bar{X}}_{0,m} \Vert ^2_2 + (m^{-1} + m^2 N^{-1}) \Vert X_0 \Vert _2^2. \end{aligned}$$
(4.26)

Starting from (4.20), considering the upper bounds (4.21), (4.24), and (4.26), we derive that there exists a positive constant \(C\) not depending on \((m,M)\) such that

$$\begin{aligned} \limsup _{n \rightarrow \infty } \Big | \mathbb{E }\big ( S_{F^{\mathbf{B}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) \Big | \le \frac{C}{v^2} \big ( \Vert X_0 - {\bar{X}}_{0,m} \Vert _2 + m^{-1/2} \big ) . \end{aligned}$$

Therefore, Proposition 4.1 will follow if we can prove that

$$\begin{aligned} \lim _{m \rightarrow \infty }\limsup _{M \rightarrow \infty } \Vert X_0 - {\bar{X}}_{0,m} \Vert _2 = 0 . \end{aligned}$$
(4.27)

Let us introduce now the sequence \((X_{k,m})_{k \in \mathbb{Z }}\) defined as follows: for any \(k \in \mathbb{Z }\),

$$\begin{aligned} X_{k,m}= \mathbb{E }\big ( X_{k} \vert \varepsilon _{k}, \ldots , \varepsilon _{k-m}\big ) . \end{aligned}$$
(4.28)

With the above notation, we write that

$$\begin{aligned} \Vert X_0 - {\bar{X}}_{0,m} \Vert _2 \le \Vert X_0 - X_{0,m} \Vert _2 +\Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _2. \end{aligned}$$

Since \(X_0\) is centered, so is \({ X}_{0,m}\). Then \(\Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _2=\Vert X_{0,m} - \mathbb{E }( X_{0,m}) - {\bar{X}}_{0,m} \Vert _2\). Therefore, recalling the definition (4.10) of \({\bar{X}}_{0,m}\), it follows that

$$\begin{aligned}&\Vert X_{0,m}- {\bar{X}}_{0,m} \Vert _2 \le 2 \Vert X_{0,m} -\widetilde{X}_{0,m} \Vert _2 \nonumber \\&\quad \le 2 \Vert X_0 - \varphi _M(X_0) \Vert _2 \le 2 \Vert \big ( |X_0| - M )_+ \Vert _2 . \end{aligned}$$
(4.29)

Since \(X_0\) belongs to \(\mathbb{L }^2\), \(\lim _{M \rightarrow \infty }\Vert \big ( |X_0| - M )_+ \Vert _2 = 0\). Therefore, to prove (4.27) (and then Proposition 4.1), it suffices to prove that

$$\begin{aligned} \lim _{m \rightarrow \infty }\Vert X_0 - X_{0,m} \Vert _2 = 0 . \end{aligned}$$
(4.30)

Since \((X_{0,m})_{m \ge 0}\) is a martingale with respect to the increasing filtration \((\mathcal{G }_m)_{m \ge 0}\) defined by \(\mathcal{G }_m= \sigma ( \varepsilon _{-m}, \ldots , \varepsilon _0 )\) and is such that \(\sup _{m \ge 0} \Vert X_{0,m} \Vert _2 \le \Vert X_0 \Vert _2 < \infty \), (4.30) follows by the martingale convergence theorem in \(\mathbb{L }^2\) (see for instance Corollary 2.2 in Hall and Heyde [6]). This ends the proof of Proposition 4.1. \(\square \)

4.2 Construction of Approximating Sample Covariance Matrices Associated with Gaussian Random Variables

Let \(( Z_k)_{ k\in \mathbb Z }\) be a centered Gaussian process with real values, whose covariance function is given, for any \(k,\ell \in \mathbb Z \), by

$$\begin{aligned} \mathrm{Cov}(Z_k,Z_{\ell })=\mathrm{Cov}(X_k, X_{\ell }). \end{aligned}$$
(4.31)

For \(n\) a positive integer, we consider \(n\) independent copies of the Gaussian process \(( Z_k)_{ k\in \mathbb Z }\) that are in addition independent of \((X_k^{(i)} )_{ k \in \mathbb{Z } , i \in \{1, \ldots , n \}}\). We shall denote these copies by \(( Z^{(i)}_k)_{ k\in \mathbb Z }\) for \(i = 1, \ldots , n\). For any \(i \in \{1, \ldots , n \}\), define \(\mathbf{{Z}}_{i}=\big ( Z_{1}^{(i)}, \ldots ,Z_{N}^{(i)}\big )\). Let \(\mathcal Z _n=(\mathbf{{Z}}^T_{1} \vert \cdots \vert \mathbf{{Z}}^T_{n}) \) be the matrix whose columns are the \(\mathbf{{Z}}^T_{i}\)’s and consider its associated sample covariance matrix

$$\begin{aligned} \mathbf{G}_n=\frac{1}{n}\mathcal Z _n\mathcal Z _{n}^{T}. \end{aligned}$$
(4.32)

For \(k_{N,m}\) given in (4.8), we define now the random vectors \(\big (\mathbf{v}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m} \} }\) as follows. They are defined as the random vectors \(\big (\mathbf{{u}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m} \}}\) defined in (4.13) and (4.14), but by replacing each \({\bar{X}}^{(i)}_{k,m}\) by \(Z^{(i)}_{k}\). For any \(i \in \{1, \ldots , n \}\), we then define the random vectors \(\mathbf{{\widetilde{Z}}}^{(i)}\) of dimension \(N\), as follows:

$$\begin{aligned} \mathbf{{\widetilde{Z}}}^{(i)} = \big (\mathbf{v}^{(i)}_{ \ell } \, , \, \ell = 1, \ldots , k_{N,m} \big ) . \end{aligned}$$
(4.33)

Let now

$$\begin{aligned} {\widetilde{\mathcal{Z }}}_n = \big ({\widetilde{\mathbf{Z}}}^{(1)T} \vert \cdots \vert {\widetilde{\mathbf{Z}}}^{(n) T} \big ) \ \text { and } \ {\widetilde{\mathbf{G}}}_{n}= \frac{1}{n} {\widetilde{\mathcal{Z }}}_{n} {\widetilde{\mathcal{Z }}}_{n}^{T}. \end{aligned}$$
(4.34)

In what follows, we shall prove the following proposition.

Proposition 4.2

For any \(z\in \mathbb C ^+\), the convergence (4.6) holds true with \(\mathbf{{G}_{n}}\) and \({\widetilde{\mathbf{G}}}_{n}\) as defined in (4.32) and (4.34) respectively.

To prove the proposition above, we start by noticing that, for any \(z=u+iv \in \mathbb C ^+\),

$$\begin{aligned}&\Big | \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) \Big | \le \mathbb{E }\Big | \int \frac{1}{x-z} \hbox {d}F^{\mathbf{G}_n}(x) -\int \frac{1}{x-z} \hbox {d}F^{\widetilde{\mathbf{G}}_{n}}(x) \Big | \\&\quad \le \mathbb{E }\Big | \int \frac{F^{\mathbf{G}_n}(x) -F^{\mathbf{{\widetilde{G}}}_{n}}(x)}{(x-z)^2} \hbox {d}x \Big | \le \frac{ \pi \, \big \Vert F^{\mathbf{G}_n} -F^{\mathbf{{\widetilde{G}}}_{n}} \big \Vert _{\infty }}{v} . \end{aligned}$$

Hence, by Theorem A.44 in Bai and Silverstein [1],

$$\begin{aligned} \Big | \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) \Big | \le \frac{\pi }{v N} \mathrm{rank} \big ( \mathcal{Z }_{n} - \widetilde{\mathcal{Z }}_{n}\big ). \end{aligned}$$

By definition of \(\mathcal{Z }_{n}\) and \(\widetilde{\mathcal{Z }}_{n}\), \(\mathrm{rank} \big ( \mathcal{Z }_{n} - \widetilde{\mathcal{Z }}_{n}\big ) \le \mathrm{card} ( \mathcal{R }_{N,m} )\), where \(\mathcal{R }_{N,m}\) is defined in (4.22). Therefore, using (4.25), we get that, for any \(z=u+iv \in \mathbb C ^+\),

$$\begin{aligned} \Big | \mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) \Big | \le \frac{\pi }{v N} \Big ( \frac{ N }{m+1} + m^2 \Big ) , \end{aligned}$$

which converges to zero by letting first \(n\) and then \(m\) tend to infinity. This ends the proof of Proposition 4.2. \(\square \)

4.3 Approximation of \(\mathbb{E }\big (S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big )\) by \(\mathbb{E }\big (S_{F^{{\widetilde{\mathbf{G}}}_{n}}}(z)\big )\)

In this section, we shall prove the following proposition.

Proposition 4.3

Under the assumptions of Theorem 2.1, for any \(z\in \mathbb C ^+\), the convergence (4.7) holds true with \({{{\bar{\mathbf{B}}}_n}}\) and \({\widetilde{\mathbf{G}}}_{n}\) as defined in (4.16) and (4.34), respectively.

With this aim, we shall use the Lindeberg method, which is based on telescoping sums. In order to develop it, we first give the following definition:

Definition 4.1

Let \(x\) be a vector of \(\mathbb{R }^{n N}\) with coordinates

$$\begin{aligned} x =\big ( x^{(1)}, \ldots , x^{(n)} \big ) \text { where for any } i \in \{1, \ldots , n \}, \ x^{(i)} = \big ( x^{(i)}_k \, , \, k \in \{1, \ldots , N \} \big ). \end{aligned}$$

Let \(z\in \mathbb C ^+\) and \(f:=f_z\) be the function defined from \(\mathbb R ^{nN}\) to \(\mathbb C \) by

$$\begin{aligned} f(x)=\frac{1}{N}\mathrm{Tr}\big (A(x)-z\mathbf{{I}}\big )^{-1} \ \text { where } \ A(x) = \frac{1}{n}\sum _{k=1}^{n}( x^{(k)})^Tx^{(k)} , \end{aligned}$$
(4.35)

and \(\mathbf {I}\) is the identity matrix.

The function \(f\), as defined above, admits partial derivatives of all orders. Indeed, let \(u\) be one of the coordinates of the vector \(x\) and \(A_u=A(x)\) the matrix-valued function of the scalar \(u\). Then, setting \(G_u = \big (A_u-z\mathbf{{I}}\big )^{-1}\) and differentiating both sides of the equality \(G_u(A_u-z\mathbf{{I}})= \mathbf{{I}}\), it follows that

$$\begin{aligned} \dfrac{\hbox {d}G}{\hbox {d}u}=-G\dfrac{\hbox {d}A}{\hbox {d}u}G, \end{aligned}$$
(4.36)

(see equality (17) in Chatterjee [4]). Higher-order derivatives may be computed by repeatedly applying the above formula. Upper bounds for some partial derivatives up to the fourth order are given in “Appendix”.
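
The resolvent identity (4.36) can be checked numerically by a finite-difference experiment. The sketch below is such a check under our own choices (a random symmetric matrix, a symmetric perturbation direction standing in for \(\hbox {d}A/\hbox {d}u\), and a step size \(h\)); it is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(5)
N, z, h = 6, 1.0 + 1.0j, 1e-6

A0 = rng.standard_normal((N, N))
A0 = (A0 + A0.T) / 2                                 # a symmetric matrix playing the role of A
dA = np.zeros((N, N)); dA[2, 3] = dA[3, 2] = 1.0     # a symmetric direction for dA/du

resolvent = lambda A: np.linalg.inv(A - z * np.eye(N))
G = resolvent(A0)

numerical = (resolvent(A0 + h * dA) - resolvent(A0 - h * dA)) / (2 * h)
analytic = -G @ dA @ G                               # the right-hand side of (4.36)
print(np.max(np.abs(numerical - analytic)))          # small: only finite-difference error remains
```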

Now, using Definition 4.1 and the notations (4.15) and (4.33), we get that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) = \mathbb{E }f \big ({\bar{\mathbf{X}}}^{(1)},\ldots ,{\bar{\mathbf{X}}}^{(n)} \big ) - \mathbb{E }f \big (\mathbf{{\widetilde{Z}}}^{(1)} ,\ldots , \mathbf{{\widetilde{Z}}}^{(n)}\big ) .\qquad \end{aligned}$$
(4.37)

To continue the development of the Lindeberg method, we introduce additional notations. For any \(i \in \{1, \ldots , n \}\) and \(k_{N,m}\) given in (4.8), we define the random vectors \(\big (\mathbf{{U}}^{(i)}_{\ell } \big )_{ \ell \in \{1, \ldots , k_{N,m}\} }\) of dimension \(n N\) as follows. For any \(\ell \in \{1, \ldots , k_{N,m}\}\),

$$\begin{aligned} \mathbf{{U}}^{(i)}_{\ell }= \Big (\mathbf{0}_{(i-1)N} \, , \, \mathbf{0}_{(\ell -1) (m^2 +m)} \, , \, \mathbf{{u}}^{(i)}_{\ell } \, , \, \mathbf{0}_{r_{\ell }}, \mathbf{0}_{(n-i)N}\Big ) , \end{aligned}$$
(4.38)

where the \(\mathbf{{u}}^{(i)}_{\ell } \)’s are defined in (4.13) and (4.14), and

$$\begin{aligned} r_{\ell } = N - \ell (m^2 +m) \ \text { for } \ell \in \{1, \ldots , k_{N,m}-1\},\text { and } \ r_{k_{N,m}} = 0. \end{aligned}$$
(4.39)

Note that the vectors \(\big ( \mathbf{{U}}^{(i)}_{\ell }\big )_{1 \le i \le n, 1 \le \ell \le k_{N,m}}\) are mutually independent. Moreover, with the notations (4.38) and (4.15), the following relations hold. For any \(i \in \{1, \ldots , n \}\),

$$\begin{aligned}&\sum _{\ell =1}^{k_{N,m}} \mathbf{{U}}^{(i)}_{\ell } = \Big ( \mathbf{0}_{N(i-1)}\, , \, {\bar{\mathbf{X}}}^{(i)} \, , \, \mathbf{0}_{ (n-i)N} \Big ) \, \text { and } \nonumber \\&\quad \sum _{i=1}^n \sum _{\ell =1}^{k_{N,m}} \mathbf{{U}}^{(i)}_{\ell } = \Big ({\bar{\mathbf{X}}}^{(1)}, \ldots ,{\bar{\mathbf{X}}}^{(n)}\Big ), \end{aligned}$$
(4.40)

where the \({\bar{\mathbf{X}}}^{(i)}\)’s are defined in (4.15).

Now, for any \(i \in \{1, \ldots , n \}\), we define the random vectors \(\big (\mathbf{V}^{(i)}_{\ell } \big )_{\ell \in \{1, \ldots , k_{N,m}\} }\) of dimension \(n N \), as follows: for any \(\ell \in \{1, \ldots , k_{N,m}\}\),

$$\begin{aligned} \mathbf{V}^{(i)}_{\ell }= \Big (\mathbf{0}_{(i-1)N} \, , \, \mathbf{0}_{(\ell -1) (m^2 +m)} \, , \, \mathbf{v}^{(i)}_{\ell } \, , \, \mathbf{0}_{r_{\ell }}, \mathbf{0}_{(n-i)N}\Big ) , \end{aligned}$$
(4.41)

where \(r_{\ell }\) is defined in (4.39) and the \(\mathbf{v}^{(i)}_{\ell }\)’s are defined in Sect. 4.2. With the notations (4.41) and (4.33), the following relations hold: for any \(i \in \{1, \ldots , n \}\),

$$\begin{aligned}&\sum _{\ell =1}^{k_{N,m}} \mathbf{V}^{(i)}_{\ell } = \Big ( \mathbf{0}_{N(i-1)}\, , \, \mathbf{{\widetilde{Z}}}^{(i)} \, , \, \mathbf{0}_{N (n-i)} \Big ) \, \text { and } \nonumber \\&\quad \sum _{i=1}^n \sum _{\ell =1}^{k_{N,m}} \mathbf{V}^{(i)}_{\ell } = \Big (\mathbf{{\widetilde{Z}}}^{(1)} , \ldots , \mathbf{{\widetilde{Z}}}^{(n)} \Big ), \end{aligned}$$
(4.42)

where the \( \mathbf{{\widetilde{Z}}}^{(i)}\)’s are defined in (4.33). We define now, for any \(i \in \{1, \ldots , n \}\),

$$\begin{aligned} \mathbf{{S}}_i= \sum _{s=1}^i \sum _{\ell =1}^{k_{N,m}} \mathbf{{U}}^{(s)}_{\ell } \, \text { and } \, \mathbf{{T}}_i= \sum _{s=i}^n \sum _{\ell =1}^{k_{N,m}} \mathbf{V}^{(s)}_{\ell }, \end{aligned}$$
(4.43)

and, for any \(s \in \{ 1, \ldots , k_{N,m} \}\),

$$\begin{aligned} \mathbf{{S}}^{(i)}_s= \sum _{\ell =1}^{s} \mathbf{{U}}^{(i)}_{\ell } \, \text { and } \, \mathbf{{T}}^{(i)}_s= \sum _{\ell =s}^{k_{N,m}} \mathbf{V}^{(i)}_{\ell } . \end{aligned}$$
(4.44)

In all the notations above, we use the convention that \(\sum _{k=r}^s =0\) if \(r>s\). Therefore, starting from (4.37), considering the relations (4.40) and (4.42), and using the notations (4.43) and (4.44), we successively get

$$\begin{aligned}&\mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) = \sum _{i=1}^n \Big ( \mathbb{E }f \big ( \mathbf{{S}}_i + \mathbf{{T}}_{i+1} \big ) -\mathbb{E }f \big ( \mathbf{{S}}_{i-1} + \mathbf{{T}}_{i} \big ) \Big ) \\&\quad = \sum _{i=1}^n \sum _{s=1}^{k_{N,m}} \Big ( \mathbb{E }f \big ( \mathbf{{S}}_{i-1} \!+\! \mathbf{{S}}^{(i)}_s+ \mathbf{{T}}^{(i)}_{s+1} \!+\! \mathbf{{T}}_{i+1} \big ) -\mathbb{E }f \big ( \mathbf{{S}}_{i-1} \!+\! \mathbf{{S}}^{(i)}_{s-1}+ \mathbf{{T}}^{(i)}_{s} + \mathbf{{T}}_{i+1} \big ) \Big ) . \end{aligned}$$

Therefore, setting for any \(i \in \{1, \ldots , n \}\) and any \(s \in \{ 1, \ldots , k_{N,m} \}\),

$$\begin{aligned} \mathbf{W}^{(i)}_{s} = \mathbf{{S}}_{i-1} + \mathbf{{S}}^{(i)}_s+ \mathbf{{T}}^{(i)}_{s+1} + \mathbf{{T}}_{i+1}, \end{aligned}$$
(4.45)

and

$$\begin{aligned} {\widetilde{\mathbf{W}}}^{(i)}_{s} = \mathbf{{S}}_{i-1} + \mathbf{{S}}^{(i)}_{s-1}+ \mathbf{{T}}^{(i)}_{s+1} + \mathbf{{T}}_{i+1}, \end{aligned}$$
(4.46)

we are led to

$$\begin{aligned} \mathbb{E }\big ( S_{F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) -\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) =\sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \Big ( \mathbb{E }\big ( \Delta ^{(i)}_{s} (f) \big ) - \mathbb{E }\big ( \widetilde{\Delta }^{(i)}_{s} (f) \big )\Big ), \end{aligned}$$
(4.47)

where

$$\begin{aligned} \Delta ^{(i)}_{s} (f) = f \big (\mathbf{W}^{(i)}_{s} \big ) - f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \, \text { and } \, \widetilde{\Delta }^{(i)}_{s} (f) = f \big (\mathbf{W}^{(i)}_{s-1} \big ) -f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ). \end{aligned}$$
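The double telescoping leading to (4.47) is elementary but easy to misread. As a sanity check, the sketch below verifies the identity \(f \big ({\bar{\mathbf{X}}}^{(1)},\ldots ,{\bar{\mathbf{X}}}^{(n)} \big ) - f \big (\mathbf{{\widetilde{Z}}}^{(1)} ,\ldots , \mathbf{{\widetilde{Z}}}^{(n)}\big ) = \sum _{i,s} \big ( \Delta ^{(i)}_{s} (f) - \widetilde{\Delta }^{(i)}_{s} (f) \big )\), that is, (4.47) before taking expectations, on arbitrary vectors and an arbitrary smooth test function; the identity is purely algebraic, so the particular blocks play no role.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, dim = 3, 4, 10          # k plays the role of k_{N,m}; dim stands for nN

# Generic stand-ins for the blocks U^{(i)}_l and V^{(i)}_l of (4.38) and (4.41).
U = rng.standard_normal((n, k, dim))
V = rng.standard_normal((n, k, dim))
zero = np.zeros(dim)

# An arbitrary smooth test function standing in for f.
w = rng.standard_normal(dim)
f = lambda x: float(np.sin(w @ x) + 0.1 * (x @ x))

S  = lambda i: U[:i].sum(axis=(0, 1)) if i else zero                 # S_i, (4.43)
T  = lambda i: V[i - 1:].sum(axis=(0, 1)) if i <= n else zero        # T_i, (4.43)
Sb = lambda i, s: U[i - 1, :s].sum(axis=0) if s else zero            # S^{(i)}_s, (4.44)
Tb = lambda i, s: V[i - 1, s - 1:].sum(axis=0) if s <= k else zero   # T^{(i)}_s, (4.44)

W  = lambda i, s: S(i - 1) + Sb(i, s) + Tb(i, s + 1) + T(i + 1)      # W^{(i)}_s, (4.45)
Wt = lambda i, s: S(i - 1) + Sb(i, s - 1) + Tb(i, s + 1) + T(i + 1)  # tilde W^{(i)}_s, (4.46)

# Left-hand side of (4.37): f at the full "X" array minus f at the full "Z" array.
lhs = f(U.sum(axis=(0, 1))) - f(V.sum(axis=(0, 1)))
# Right-hand side of (4.47), without the expectations: sum of Delta - tilde Delta.
rhs = sum((f(W(i, s)) - f(Wt(i, s))) - (f(W(i, s - 1)) - f(Wt(i, s)))
          for i in range(1, n + 1) for s in range(1, k + 1))
assert np.isclose(lhs, rhs)
print("telescoping identity behind (4.47) verified")
```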

In order to continue the multidimensional Lindeberg method, it is useful to introduce the following notations.

Definition 4.2

Let \(d_1\) and \(d_2\) be two positive integers. Let \(A = (a_1, \ldots , a_{d_1})\) and \(B= (b_1, \ldots , b_{d_2})\) be two real-valued row vectors of respective dimensions \(d_1\) and \(d_2\). We define \(A \otimes B\) to be the transpose of the Kronecker product of \(A\) by \(B\). Therefore

$$\begin{aligned} A \otimes B = \left( \begin{array}{c} a_{1}B^T \\ \vdots \\ a_{d_1}B^T \end{array} \right) \in \mathbb{R }^{d_1 d_2} . \end{aligned}$$

For any positive integer \(k\), the \(k\)th transpose Kronecker power \(A^{\otimes k}\) is then defined inductively by: \(A^{\otimes 1}=A^T\) and \(A^{\otimes k} = A \otimes \big ( A^{\otimes (k-1)} \big )^T\).

Notice that, here, \(A \otimes B\) is not exactly the usual Kronecker (or tensor) product of \(A\) by \(B\), which would rather produce a row vector. However, the above convention is convenient for the notation used later on.
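As a quick illustration of Definition 4.2 (a sketch only, with arbitrary small vectors), \(A \otimes B\) has exactly the entries of the usual Kronecker product, merely arranged as a column vector, and the inductive power \(A^{\otimes k}\) collects all products \(a_{i_1} \cdots a_{i_k}\).

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])      # row vector of dimension d1 = 3
B = np.array([4.0, 5.0])           # row vector of dimension d2 = 2

# Definition 4.2: A ⊗ B is the column vector (a_1 B^T, ..., a_{d1} B^T) of R^{d1 d2}.
A_tensor_B = np.concatenate([a * B for a in A])
assert np.allclose(A_tensor_B, np.kron(A, B))   # same entries as the usual Kronecker product

# k-th transpose Kronecker power: A^{⊗1} = A^T, A^{⊗k} = A ⊗ (A^{⊗(k-1)})^T.
# (For 1-D numpy arrays the transpose is immaterial, so np.kron reproduces the definition.)
def kron_power(A, k):
    out = A.copy()
    for _ in range(k - 1):
        out = np.kron(A, out)
    return out

# For k = 3 the entries are all products a_i a_j a_l, which is what D^3 h(x) . Y^{⊗3}
# of Definition 4.3 contracts against.
assert kron_power(A, 3).shape == (A.size ** 3,)
```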

Definition 4.3

Let \(d\) be a positive integer. If \(\nabla \) denotes the differentiation operator given by \(\nabla = \big ( \frac{\partial }{\partial x_1}, \ldots , \frac{\partial }{\partial x_d} \big )\) acting on the differentiable functions \(h: \mathbb{R }^d \rightarrow \mathbb{R }\), we define, for any positive integer \(k\), \(\nabla ^{\otimes k}\) in the same way as in Definition 4.2. If \(h : \mathbb{R }^d \rightarrow \mathbb{R }\) is \(k\)-times differentiable, for any \(x \in \mathbb{R }^d\), let \( D^k h(x) = \nabla ^{\otimes k} h(x) \), and for any row vector \(Y\) of \(\mathbb{R }^d\), we define \(D^k h(x) \mathbf{.} Y^{\otimes k}\) as the usual scalar product in \(\mathbb{R }^{d^k}\) between \(D^k h(x)\) and \( Y^{\otimes k}\). We write \(D h\) for \(D^1 h\).

Let \(z =u+iv \in \mathbb{C }^+\). We start by analyzing the term \( \mathbb{E }\big ( \Delta ^{(i)}_{s} (f) \big ) \) in (4.47). By Taylor’s integral formula,

$$\begin{aligned}&\Big | \mathbb{E }\big ( \Delta ^{(i)}_{s} (f) \big ) - \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{{U}}^{(i)\, \otimes 1}_{s} \big ) - \frac{1}{2} \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) \Big | \nonumber \\&\quad \le \Big |\mathbb{E }\int \limits _{0}^{1} \frac{ (1-t)^2}{2} D^{3}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{{U}}^{(i) \, \otimes 3}_{s} \hbox {d}t \Big | . \end{aligned}$$
(4.48)

Let us analyze the right-hand side of (4.48). Recalling the definition (4.38) of the \(\mathbf{{U}}^{(i)}_{s}\)’s, for any \(t \in [0,1]\),

$$\begin{aligned}&\mathbb{E }\big | D^{3}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{{U}}^{(i) \, \otimes 3}_{s} \big | \\&\quad \le \sum _{k \in I_{s}} \sum _{\ell \in I_{s}} \sum _{j \in I_{s}} \mathbb{E }\Big ( \Big \vert \frac{\partial ^3 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{\ell } \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) {\bar{X}}_{k,{m}}^{(i)} {\bar{X}}_{\ell ,{m}}^{(i)} {\bar{X}}_{j,{m}}^{(i)}\ \Big \vert \Big ) \\&\quad \le \sum _{k \in I_{s}} \sum _{\ell \in I_{s}} \sum _{j \in I_{s}}\Big \Vert \frac{\partial ^3 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{\ell } \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \Big \Vert _2 \big \Vert {\bar{X}}_{k,{m}}^{(i)} {\bar{X}}_{\ell ,{m}}^{(i)} {\bar{X}}_{j,{m}}^{(i)}\big \Vert _2, \end{aligned}$$

where \(I_{s}\) is defined in (4.12). Therefore, using (4.11), stationarity and (4.23), it follows that, for any \(t \in [0,1]\),

$$\begin{aligned}&\mathbb{E }\big | D^{3}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{{U}}^{(i) \, \otimes 3}_{s} \big | \\&\quad \le 8 M^2 \sum _{k \in I_{s}} \sum _{\ell \in I_{s}} \sum _{j \in I_{s}}\Big \Vert \frac{\partial ^3 f}{\partial x^{(i)}_{k} \partial x^{(i)}_{\ell } \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \Big \Vert _2 \big \Vert X_{0} \big \Vert _2 . \end{aligned}$$

Notice that by (4.43) and (4.44),

$$\begin{aligned} {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} =\big ({\bar{\mathbf{X}}}^{(1)} , \ldots , {\bar{\mathbf{X}}}^{(i-1)} , w^{(i)} (t),\mathbf{{\widetilde{Z}}}^{(i+1)}, \ldots , \mathbf{{\widetilde{Z}}}^{(n)} \big ) , \end{aligned}$$
(4.49)

where \(w^{(i)} (t)\) is the row vector of dimension \(N\) defined by

$$\begin{aligned} w^{(i)} (t) = \mathbf{{S}}^{(i)}_{s-1}+ t\mathbf{{U}}^{(i)}_{s} + \mathbf{{T}}^{(i)}_{s+1} = \big (\mathbf{{u}}^{(i)}_{1} , \ldots ,\mathbf{{u}}^{(i)}_{s-1} , t \mathbf{{u}}^{(i)}_{s} , \mathbf{{v}}^{(i)}_{ s+1} , \ldots , \mathbf{{v}}^{(i)}_{k_{N,m}}\big ) ,\qquad \quad \end{aligned}$$
(4.50)

where the \(\mathbf{{u}}^{(i)}_{ \ell } \)’s are defined in (4.13) and (4.14), whereas the \(\mathbf{{v}}^{(i)}_{ \ell } \)’s are defined in Sect. 4.2. Therefore, by Lemma 5.1 of the “Appendix”, (4.11), and since \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as the stationary sequence \((Z_k)_{k \in \mathbb{Z }}\), we infer that there exists a positive constant \(C_1\) not depending on \((n,M,m)\) and such that, for any \(t \in [0,1]\),

$$\begin{aligned} \Big \Vert \frac{\partial ^3 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{\ell } \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \Big \Vert _2 \le C_1 \Big ( \frac{M + \Vert Z_0 \Vert _2}{v^3 N^{1/2}n^2} + \frac{N^{1/2} (M^3 + \Vert Z_0 \Vert ^3_6)}{v^4 n^3} \Big ) . \end{aligned}$$

Now, since \(Z_0\) is a Gaussian random variable, \( \Vert Z_0 \Vert ^6_6 = 15 \Vert Z_0 \Vert _2^6 \). Moreover, by (4.31), \( \Vert Z_0 \Vert _2 = \Vert X_0 \Vert _2 \). Therefore, there exists a positive constant \(C_2\) not depending on \((n,M,m)\) and such that, for any \(t \in [0,1]\),

$$\begin{aligned} \mathbb{E }\big | D^{3}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{{U}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{{U}}^{(i) \, \otimes 3}_{s} \big | \le \frac{C_2 m^6(1+M^3)}{v^3 ( 1 \wedge v)N^{1/2}n^2} . \end{aligned}$$
(4.51)
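The Gaussian moment identity used above, \(\Vert Z_0 \Vert ^6_6 = 15 \Vert Z_0 \Vert _2^6\), and the one used later for (4.68), \(\Vert Z_0 \Vert _4^4 = 3 \Vert Z_0 \Vert _2^4\), are instances of \(\mathbb{E }|G|^{2k} = (2k-1)!! \, \sigma ^{2k}\) for a centered Gaussian variable with variance \(\sigma ^2\). The short Monte Carlo check below (with an arbitrary value of \(\sigma \)) recovers the constants 3 and 15.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.7                                   # arbitrary standard deviation
G = sigma * rng.standard_normal(2_000_000)    # samples of a centered Gaussian

# E G^4 = 3 sigma^4 and E G^6 = 15 sigma^6 (double factorials 3!! = 3, 5!! = 15).
print(np.mean(G**4) / sigma**4)   # ~ 3
print(np.mean(G**6) / sigma**6)   # ~ 15
```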

On the other hand, since for any \( i \in \lbrace 1 , \ldots , n \rbrace \) and any \(s \in \lbrace 1 , \ldots , k_{N,m} \rbrace \), \(\mathbf{{U}}^{(i)}_{s}\) is a centered random vector independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s}\), it follows that

$$\begin{aligned}&\mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{{U}}^{(i)\, \otimes 1}_{s}\big ) =0 \ \text { and } \nonumber \\&\quad \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big )=\mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ). \end{aligned}$$
(4.52)

Hence starting from (4.48), using (4.51), (4.52) and the fact that \(m^2k_{N,m} \le N\), we derive that there exists a positive constant \(C_3\) not depending on \((n,M,m)\) and such that

$$\begin{aligned}&\sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \Big | \mathbb{E }\big ( \Delta ^{(i)}_{s} (f) \big ) - \frac{1}{2} \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) \Big |\nonumber \\&\quad \le C_3 \frac{(1+M^5) N^{1/2} m^4}{v^3 ( 1 \wedge v) n} . \end{aligned}$$
(4.53)

We now analyze the “Gaussian part” in (4.47), namely \(\mathbb{E }\big ( \widetilde{\Delta }^{(i)}_{s} (f) \big )\). By Taylor’s integral formula,

$$\begin{aligned}&\Big | \mathbb{E }\big ( \widetilde{\Delta }^{(i)}_{s} (f) \big ) - \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s} \big ) - \frac{1}{2} \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big | \\&\quad \le \Big |\mathbb{E }\int \limits _{0}^{1} \frac{ (1-t)^2}{2} D^{3}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} +t\mathbf{V}^{(i)}_{s} \big ) \mathbf{.} \mathbf{V}^{(i) \, \otimes 3}_{s}\hbox {d}t \Big |. \end{aligned}$$

Proceeding as in the derivation of (4.53), we then infer that there exists a positive constant \(C_4\) not depending on \((n,M,m)\) and such that

$$\begin{aligned}&\sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \Big | \mathbb{E }\big ( \widetilde{\Delta }^{(i)}_{s} (f) \big ) - \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s}\big ) - \frac{1}{2} \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big | \nonumber \\&\quad \le C_4 \frac{(1+M^3) N^{1/2} m^4}{v^3 ( 1 \wedge v) n} . \end{aligned}$$
(4.54)

We now analyze the terms \(\mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s} \big )\) in (4.54). Recalling the definition (4.41) of the \(\mathbf{V}^{(i)}_{s}\)’s, we write

$$\begin{aligned} \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s} \big ) = \sum _{j \in I_{s}} \mathbb{E }\left( \frac{\partial f }{\partial x^{(i)}_{j} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) Z_j^{(i)} \right) , \end{aligned}$$

where \(I_{s}\) is defined in (4.12). To handle the terms in the right-hand side, we shall use the so-called Stein’s identity for Gaussian vectors (see, for instance, Lemma 1 in Liu [9]), as done by Neumann [12] in the context of dependent real random variables: for \(G=(G_1, \ldots , G_d)\) a centered Gaussian vector of \(\mathbb{R }^d\) and any function \(h : \mathbb{R }^d\rightarrow \mathbb{R }\) such that its partial derivatives exist almost everywhere and \(\mathbb{E }\big | \frac{\partial h }{\partial x_i} (G) \big | < \infty \) for any \(i=1, \ldots , d\), the following identity holds true:

$$\begin{aligned} \mathbb{E }\big ( G_i \, h(G) \big ) = \sum _{\ell =1}^d\mathbb{E }\big (G_i G_{\ell } \big ) \mathbb{E }\Big ( \frac{\partial h }{\partial x_{\ell }} (G) \Big ) \ \text { for any } i \in \{1, \ldots , d \} . \end{aligned}$$
(4.55)
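Stein's identity (4.55) is the key tool of this step. As a sanity check, the Monte Carlo sketch below (with an arbitrary \(3\times 3\) covariance matrix and an arbitrary smooth test function, both purely illustrative) compares the two sides of (4.55).

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
M = rng.standard_normal((d, d))
Sigma = M @ M.T                                        # an arbitrary covariance matrix
G = rng.multivariate_normal(np.zeros(d), Sigma, size=1_000_000)

h = lambda x: np.tanh(x[:, 0] + 0.5 * x[:, 1] * x[:, 2])   # an arbitrary smooth test function

def grad_h(x, eps=1e-5):
    """Numerical gradient of h, one coordinate at a time."""
    g = np.empty_like(x)
    for j in range(d):
        e = np.zeros(d); e[j] = eps
        g[:, j] = (h(x + e) - h(x - e)) / (2 * eps)
    return g

gh = grad_h(G)
for i in range(d):
    lhs = np.mean(G[:, i] * h(G))                                   # E( G_i h(G) )
    rhs = sum(Sigma[i, l] * np.mean(gh[:, l]) for l in range(d))    # sum_l E(G_i G_l) E(d_l h(G))
    print(i, lhs, rhs)    # the two columns agree up to Monte Carlo error
```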

Using (4.55) with \(G= \big ( \mathbf{{T}}^{(i)}_{s+1} , Z_j^{(i)} \big ) \in \mathbb{R }^{nN} \times \mathbb{R }\), \(h : \mathbb{R }^{n N} \times \mathbb{R } \rightarrow \mathbb{R }\) satisfying \(h(x,y) = \frac{\partial f}{\partial x^{(i)}_{j} } (x)\) for any \((x,y) \in \mathbb{R }^{n N} \times \mathbb{R }\), and noticing that \(G\) is independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s} - \mathbf{{T}}^{(i)}_{s+1} \), we infer that, for any \(j \in I_{s}\),

$$\begin{aligned} \mathbb{E }\left( \frac{\partial f}{\partial x^{(i)}_{j} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) Z_j^{(i)} \right) =\sum _{\ell =s+1}^{k_{N,m}} \sum _{k \in I_{\ell }} \mathbb{E }\left( \frac{\partial ^2 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \mathrm{Cov} ( Z_k^{(i)}, Z_j^{(i)}) . \end{aligned}$$

Therefore,

$$\begin{aligned} \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s}\big ) =\sum _{\ell =s+1}^{k_{N,m}} \sum _{k \in I_{\ell }} \sum _{j \in I_{s}} \mathbb{E }\left( \frac{\partial ^2 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \mathrm{Cov} ( Z_k^{(i)}, Z_j^{(i)}). \end{aligned}$$

From (4.49) and (4.50) (with \(t=0\)) and Lemma 5.1 of the “Appendix”, we infer that there exists a positive constant \(C_5\) not depending on \((n,M,m)\) and such that, for any \(k \in I_{\ell }\) and any \(j \in I_{s}\),

$$\begin{aligned}&\mathbb{E }\left( \frac{\partial ^2 f}{\partial x^{(i)}_{k} \partial x^{(i)}_{j}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \le C_5 \Big ( \frac{1}{Nn v^2 }+ \frac{1}{n^2 v^3} \big ( \Vert X_0 \Vert ^2_2 + \Vert Z_0 \Vert ^2_2 ) \Big )\nonumber \\&\quad \le C_5 \frac{1 + 2 \Vert X_0 \Vert ^2_2 }{n v^2 ( 1 \wedge v) ( N \wedge n)} . \end{aligned}$$
(4.56)

Hence, using the fact that \(\mathrm{Cov} ( Z_k^{(i)}, Z_j^{(i)}) =\mathrm{Cov} ( Z_k, Z_j)\) together with (4.31), we then derive that

$$\begin{aligned}&\mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s}\big ) \le C_5 \frac{1 + 2 \Vert X_0 \Vert ^2_2 }{n v^2 ( 1 \wedge v) ( N \wedge n)}\nonumber \\&\quad \times \sum _{\ell =s+1}^{k_{N,m}} \sum _{k \in I_{\ell }} \sum _{j \in I_{s}} \big | \mathrm{Cov} (X_k, X_j) \big | . \end{aligned}$$
(4.57)

By stationarity,

$$\begin{aligned}&\sum _{k \in I_{\ell }} \sum _{j \in I_{s}} \big | \mathrm{Cov} (X_k, X_j) \big | = \sum _{j =1}^{m^2} \sum _{k =1}^{m^2} \big | \mathrm{Cov} (X_0, X_{k-j+(\ell -s) (m^2 +m)}) \big | \\&\quad \le m^2 \sum _{k \in \mathcal{E }_{m, \ell }} \big | \mathrm{Cov} (X_0, X_{k}) \big | , \end{aligned}$$

where \(\mathcal{E }_{m, \ell }:=\{ 1-m^2 +(\ell -s) (m^2 +m), \ldots , m^2-1 + (\ell -s) (m^2 +m) \}\). Notice that since \(m \ge 1\), \(\mathcal{E }_{m, \ell } \cap \mathcal{E }_{m, \ell +2} = \emptyset \). Then, summing on \(\ell \), and using the fact that \(k_{N,m} (m^2 +m) \le N\), we get that, for any \(s \ge 1\),

$$\begin{aligned} \sum _{\ell =s+1}^{k_{N,m}} \sum _{k \in \mathcal{E }_{m, \ell }} \big | \mathrm{Cov} (X_0, X_{k}) \big |\le 2 \sum _{k =m+1}^{m^2+N-1} \big | \mathrm{Cov} (X_0, X_{k}) \big | . \end{aligned}$$

So, overall, for any positive integer \(s\),

$$\begin{aligned} \sum _{\ell =s+1}^{k_{N,m}} \sum _{k \in I_{\ell }} \sum _{j \in I_{s}} \big | \mathrm{Cov} (X_k, X_j) \big | \le 2 m^2 \sum _{k =m+1}^{m^2+N-1} \big | \mathrm{Cov} (X_0, X_{k}) \big | . \end{aligned}$$
(4.58)

Therefore, starting from (4.57) and using that \(m^2 k_{N,m} \le N\), it follows that

$$\begin{aligned}&\sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \big | \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s}\big ) \big | \nonumber \\&\quad \le 2 C_5 \frac{ ( 1 + 2 \Vert X_0 \Vert ^2_2 ) ( 1 + c(n)) }{ v^2 ( 1 \wedge v) }\sum _{k \ge m+1}\big | \mathrm{Cov} (X_0, X_{k}) \big | . \end{aligned}$$
(4.59)

Since \(\mathcal{F }_{-\infty } = \bigcap _{k \in \mathbb{Z }} \sigma ( \xi _k)\) is trivial, for any \(k \in \mathbb{Z }\), \(\mathbb{E }(X_k | \mathcal{F }_{-\infty })=\mathbb{E }(X_k)=0\) a.s. Therefore, the following decomposition is valid: \(X_k = \sum _{r=-\infty }^k P_r (X_k)\). Next, since \(\mathbb{E }\big ( P_i (X_0)P_j (X_k)\big ) =0\) if \(i \ne j\), we get, by stationarity, that for any integer \(k \ge 0\),

$$\begin{aligned}&\big |\mathrm{Cov} (X_0, X_k) \big |=\Big | \sum _{r=-\infty }^{0} \mathbb{E }\Big ( P_r (X_0)P_r (X_k)\Big ) \Big |\nonumber \\&\quad \le \sum _{r=0}^{\infty } \Vert P_{0} (X_{r}) \Vert _2 \Vert P_0 (X_{k+r}) \Vert _2, \end{aligned}$$
(4.60)

implying that for any nonnegative integer \(u\),

$$\begin{aligned} \sum _{k \ge u} \big |\mathrm{Cov} (X_0, X_k) \big | \le \sum _{r\ge 0} \Vert P_0 (X_{r}) \Vert _2 \sum _{k \ge u} \Vert P_0 (X_{k}) \Vert _2 . \end{aligned}$$
(4.61)
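For a concrete instance of (4.60) and (4.61) (purely illustrative, and not required by the theorem), consider a causal linear process \(X_k = \sum _{j \ge 0} a_j \varepsilon _{k-j}\) with i.i.d. centered innovations of unit variance. Then \(P_0(X_k) = a_k \varepsilon _0\), so \(\Vert P_0 (X_{k}) \Vert _2 = |a_k|\), while \(\mathrm{Cov} (X_0, X_k) = \sum _{r \ge 0} a_r a_{r+k}\); the sketch below checks both bounds for geometrically decaying coefficients.

```python
import numpy as np

# Illustrative causal linear process: coefficients a_j = rho^j, innovations N(0, 1).
rho, R = 0.6, 200                      # R-term truncation of the infinite sums
a = rho ** np.arange(R)

def cov(k):
    """Cov(X_0, X_k) = sum_{r >= 0} a_r a_{r+k} (truncated at R terms)."""
    return float(np.sum(a[:R - k] * a[k:]))

proj_norm = np.abs(a)                  # ||P_0(X_k)||_2 = |a_k| for this process

# Check (4.60): |Cov(X_0, X_k)| <= sum_r ||P_0(X_r)||_2 ||P_0(X_{k+r})||_2.
for k in range(10):
    bound = float(np.sum(proj_norm[:R - k] * proj_norm[k:]))
    assert abs(cov(k)) <= bound + 1e-12

# Check (4.61) with u = 5:
# sum_{k >= u} |Cov(X_0, X_k)| <= (sum_r ||P_0(X_r)||_2) (sum_{k >= u} ||P_0(X_k)||_2).
u = 5
lhs = sum(abs(cov(k)) for k in range(u, R // 2))
rhs = proj_norm.sum() * proj_norm[u:].sum()
assert lhs <= rhs
print("bounds (4.60)-(4.61) verified for the illustrative coefficients")
```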

Hence, starting from (4.59) and considering (4.61) together with the condition (2.3), we derive that there exists a positive constant \(C_6\) not depending on \((n,M,m)\) such that

$$\begin{aligned} \sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \big | \mathbb{E }\big ( D f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big )\mathbf{.} \mathbf{V}^{(i)\, \otimes 1}_{s}\big ) \big | \le \frac{C_6 (1 + c(n))}{ v^2 ( 1 \wedge v) } \sum _{k \ge m+1 } \Vert P_0 (X_{k}) \Vert _2 . \end{aligned}$$
(4.62)

We now analyze the second-order terms in (4.54), namely \(\mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big )\). Recalling the definition (4.41) of the \(\mathbf{V}^{(i)}_{s}\)’s, we first write that

$$\begin{aligned} \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) = \sum _{j_1 \in I_{s}} \sum _{j_2 \in I_{s}} \mathbb{E }\left( \frac{\partial ^2 f}{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) Z_{j_1}^{(i)} Z_{j_2}^{(i)} \right) , \end{aligned}$$
(4.63)

where \(I_{s}\) is defined in (4.12). Using now (4.55) with \(G= \big ( \mathbf{{T}}^{(i)}_{s+1} , Z_{j_1}^{(i)}, Z_{j_2}^{(i)} \big ) \in \mathbb{R }^{n N} \times \mathbb{R }\times \mathbb{R }\), \(h : \mathbb{R }^{nN} \times \mathbb{R }\times \mathbb{R } \rightarrow \mathbb{R }\) satisfying \(h(x,y,z) = y \frac{\partial ^2 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } (x)\) for any \((x,y,z) \in \mathbb{R }^{n N} \times \mathbb{R }\times \mathbb{R }\), and noticing that \(G\) is independent of \({\widetilde{\mathbf{W}}}^{(i)}_{s} - \mathbf{{T}}^{(i)}_{s+1} \), we infer that, for any \(j_1, j_2\) belonging to \(I_{s}\),

$$\begin{aligned}&\mathbb{E }\left( \frac{\partial ^2 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) Z_{j_1}^{(i)} Z_{j_2}^{(i)} \right) =\mathbb{E }\left( \frac{\partial ^2 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \mathbb{E }\big ( Z_{j_1}^{(i)} Z_{j_2}^{(i)} \big )\nonumber \\&\quad + \sum _{k=s+1}^{k_{N,m}} \sum _{j_3 \in I_{k}} \mathbb{E }\left( \frac{\partial ^3 f }{\partial x^{(i)}_{j_3} \partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2} } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) Z_{j_1}^{(i)} \right) \mathbb{E }\big ( Z_{j_3}^{(i)} Z_{j_2}^{(i)} \big ) . \end{aligned}$$
(4.64)

Therefore, starting from (4.63) and using (4.64) combined with Definitions 4.2 and 4.3, it follows that

$$\begin{aligned}&\mathbb{E }\big ( D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) =\mathbb{E }\big ( D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \nonumber \\&\quad + \sum _{k=s+1}^{k_{N,m}} \mathbb{E }\Big ( D^3 f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{V}^{(i)}_{s} \otimes \mathbb{E }\big ( \mathbf{V}^{(i)}_{k} \otimes \mathbf{V}^{(i)}_{s} \big ) \Big ) . \end{aligned}$$
(4.65)

Next, with similar arguments, we infer that

$$\begin{aligned}&\sum _{k=s+1}^{k_{N,m}} \mathbb{E }\Big ( D^3 f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{V}^{(i)}_{s} \otimes \mathbb{E }\big ( \mathbf{V}^{(i)}_{k} \otimes \mathbf{V}^{(i)}_{s} \big ) \Big ) \nonumber \\&\quad =\sum _{k=s+1}^{k_{N,m}} \sum _{\ell =s+1}^{k_{N,m}} \mathbb{E }\big ( D^4 f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \big ) \mathbf{.} \mathbb{E }\big ( \mathbf{V}^{(i)}_{\ell } \otimes \mathbf{V}^{(i)}_{s} \big ) \otimes \mathbb{E }\big ( \mathbf{V}^{(i)}_{k} \otimes \mathbf{V}^{(i)}_{s} \big ) . \end{aligned}$$
(4.66)

By the definition (4.41) of the \(\mathbf{V}^{(i)}_{\ell }\)’s, we first write that

$$\begin{aligned}&\mathbb{E }\big ( D^4 f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \big ) \mathbf{.} \mathbb{E }\big ( \mathbf{V}^{(i)}_{\ell } \otimes \mathbf{V}^{(i)}_{s} \big ) \otimes \mathbb{E }\big ( \mathbf{V}^{(i)}_{k} \otimes \mathbf{V}^{(i)}_{s} \big ) \nonumber \\&\quad = \sum _{j_1 \in I_{\ell }} \sum _{j_2 \in I_{s}} \sum _{j_3 \in I_{k}} \sum _{j_4 \in I_{s}} \mathbb{E }\left( \frac{\partial ^4 f}{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2}\partial x^{(i)}_{j_3} \partial x^{(i)}_{j_4}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \nonumber \\&\quad \quad \times \,\mathrm{Cov} \big ( Z_{j_1}^{(i)}, Z_{j_2}^{(i)} \big ) \mathrm{Cov} \big ( Z_{j_3}^{(i)}, Z_{j_4}^{(i)} \big ) \nonumber \\&\quad = \sum _{j_1 \in I_{\ell }} \sum _{j_2 \in I_{s}} \sum _{j_3 \in I_{k}} \sum _{j_4 \in I_{s}} \mathbb{E }\left( \frac{\partial ^4 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2}\partial x^{(i)}_{j_3} \partial x^{(i)}_{j_4}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \nonumber \\&\quad \quad \times \, \mathrm{Cov} \big ( X_{j_1}, X_{j_2} \big ) \mathrm{Cov} \big ( X_{j_3}, X_{j_4} \big ) , \end{aligned}$$
(4.67)

where for the last line, we have used that \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as \((Z_k)_{k \in \mathbb{Z }}\) together with (4.31). From (4.49) and (4.50) (with \(t=0\)), Lemma 5.1 of the “Appendix”, and the stationarity of the sequences \(({\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\) and \((Z_k^{(i)})_{k \in \mathbb{Z }}\), we infer that there exists a positive constant \(C_7\) not depending on \((n,M,m)\) such that

$$\begin{aligned}&\mathbb{E }\left( \frac{\partial ^4 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2}\partial x^{(i)}_{j_3} \partial x^{(i)}_{j_4}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \\&\quad \le C_7 \Big ( \frac{1}{N n^2 v^3 }+ \frac{1}{N n^3 v^4 } \Big ( \sum _{k=1}^N \Vert {\bar{X}}^{(i)}_{k,m}\Vert _2^2 + \sum _{k=1}^N \Vert Z_k^{(i)} \Vert _2^2 \Big )\\&\quad + \frac{1}{N n^4 v^5 } \Big ( \Big \Vert \sum _{k=1}^N \big ( {\bar{X}}^{(i)}_{k,m}\big )^2 \Big \Vert _2^2 + \Big \Vert \sum _{k=1}^N \big ( Z_k^{(i)} \big )^2 \Big \Vert _2^2 \Big ) \Big ) \\&\quad \le \frac{C_7}{ n^2 N v^3 ( 1 \wedge v^2)} \left( 1+ \frac{N \big ( \Vert {\bar{X}}_{0,m} \Vert _2^2 + \Vert Z_0 \Vert _2^2 \big ) }{n} + \frac{N^2 \big ( \Vert {\bar{X}}_{0,m} \Vert _4^4 + \Vert Z_0 \Vert _4^4 \big ) }{n^2}\right) . \end{aligned}$$

By (4.11) and (4.23), \(\Vert {\bar{X}}_{0,m} \Vert _4^4 \le (2M)^2 \Vert {\bar{X}}_{0,m} \Vert _2^2 \le 16 M^2 \Vert X_0 \Vert _2^2\). Moreover, \(Z_0\) being a Gaussian random variable, \(\Vert Z_0 \Vert _4^4=3\Vert Z_0 \Vert _2^4\). Hence, by (4.31), \( \Vert Z_0 \Vert _4^4=3\Vert X_0 \Vert _2^4\) and \(\Vert Z_0 \Vert _2^2=\Vert X_0 \Vert _2^2\). Therefore, there exists a positive constant \(C_8\) not depending on \((n,M,m)\) and such that

$$\begin{aligned} \mathbb{E }\left( \frac{\partial ^4 f }{\partial x^{(i)}_{j_1} \partial x^{(i)}_{j_2}\partial x^{(i)}_{j_3} \partial x^{(i)}_{j_4}} \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \le \frac{ C_8 (1 + M^2)(1+c^2(n)) }{ n^2 N v^3 ( 1 \wedge v^2)} . \end{aligned}$$
(4.68)

On the other hand, by using (4.58) and (4.61), we get that, for any positive integer \(s\),

$$\begin{aligned}&\sum _{k=s+1}^{k_{N,m}} \sum _{\ell =s+1}^{k_{N,m}} \sum _{j_1 \in I_{\ell }} \sum _{j_2 \in I_{s}} \sum _{j_3 \in I_{k}} \sum _{j_4 \in I_{s}} \big | \mathrm{Cov} \big ( X_{j_1}, X_{j_2} \big ) \mathrm{Cov} \big ( X_{j_3}, X_{j_4} \big ) \big | \nonumber \\&\quad \le 4 m^4 \Big ( \sum _{r\ge 0} \Vert P_0 (X_{r}) \Vert _2 \Big )^2 \Big ( \sum _{k \ge m+1} \Vert P_0 (X_{k}) \Vert _2 \Big )^2 . \end{aligned}$$
(4.69)

Whence, starting from (4.66), using (4.67), and considering the upper bounds (4.68) and (4.69) together with the condition (2.3), we derive that there exists a positive constant \(C_9\) not depending on \((n,M,m)\) such that

$$\begin{aligned}&\sum _{k=s+1}^{k_{N,m}} \mathbb{E }\Big ( D^3 f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \mathbf{.} \mathbf{V}^{(i)}_{s} \otimes \mathbb{E }\big ( \mathbf{V}^{(i)}_{k} \otimes \mathbf{V}^{(i)}_{s} \big ) \Big ) \nonumber \\&\quad \le \frac{C_9 (1+ M^2)(1+c^2(n)) m^4}{ n^2 N v^3 ( 1 \wedge v^2)} . \end{aligned}$$
(4.70)

So, overall, starting from (4.65), considering (4.70) and using the fact that \( m^2 k_{N,m} \le N\), we derive that

$$\begin{aligned}&\Big | \sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \mathbb{E }\big ( D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big )\mathbf{.} {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) - \sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \mathbb{E }\big ( D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big | \nonumber \\&\quad \le \frac{C_9 (1+ M^2) (1+c^2(n))m^2}{ n v^3 ( 1 \wedge v^2)} . \end{aligned}$$
(4.71)

Then, starting from (4.47), and considering the upper bounds (4.53), (4.54), (4.62), and (4.71), we get that

$$\begin{aligned}&\!\!\!\!\Big |\mathbb{E }\big ( S_{\!\!F^{{{\bar{\mathbf{B}}}_n}}}(z) \big ) \!-\!\mathbb{E }\big ( S_{F^{\mathbf{{\widetilde{G}}}_{n}}}(z) \big ) \Big | \le \frac{1}{2} \sum _{i=1}^n \sum _{s=1}^{\!k_{N,m} } \Big | \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \Big ( \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) -\mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big ) \Big | \\&\quad + \frac{4 C_{10} (1+M^5) N^{1/2} m^4}{v^3 ( 1 \wedge v) n} +\frac{C_{10}(1+ M^2) (1+c^2(n))m^2}{ n v^3 ( 1 \wedge v^2)}\\&\quad +\frac{C_{10} ( 1 + c^2(n))}{ v^2 ( 1 \wedge v) } \sum _{k \ge m+1 } \Vert P_0 (X_{k}) \Vert _2 , \end{aligned}$$

where \(C_{10} = \max ( C_3,C_4,C_6,C_9)\). Since \(c(n) \rightarrow c \in (0,\infty )\), it follows that the second and third terms on the right-hand side of the above inequality tend to zero as \(n\) tends to infinity. On the other hand, by the condition (2.3), \(\lim _{m \rightarrow \infty }\sum _{k \ge m+1 } \Vert P_0 (X_{k}) \Vert _2=0 \). Therefore, Proposition 4.3 will follow if we can prove that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \lim _{m \rightarrow \infty } \limsup _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \Big | \mathbb{E }\big (D^{2}f \big ( {\!\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \Big ( \!\mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) \!-\!\mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big ) \Big | \!=\! 0 .\qquad \quad \end{aligned}$$
(4.72)

Using the fact that \((Z_k^{(i)})_{k \in \mathbb{Z }}\) is distributed as \((Z_k)_{k \in \mathbb{Z }}\) together with (4.31) and that \(( {\bar{X}}^{(i)}_{k,m})_{k \in \mathbb{Z }}\) is distributed as \(({\bar{X}}_{k,m})_{k \in \mathbb{Z }}\), we first write that

$$\begin{aligned}&\mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \Big ( \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) -\mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big ) \\&\quad = \sum _{k \in I_{s}} \sum _{\ell \in I_{s}} \mathbb{E }\left( \frac{\partial ^2 f }{\partial x^{(i)}_{k} \partial x^{(i)}_{\ell } } \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s} \big ) \right) \Big ( \mathrm{Cov} \big ( {\bar{X}}_{k,m}, {\bar{X}}_{\ell ,m}\big ) - \mathrm{Cov} \big ( X_{k}, X_{\ell } \big ) \Big ) . \end{aligned}$$

Hence, by using (4.56) and by stationarity, we get that there exists a positive constant \(C_{11}\) not depending on \((n,M,m)\) such that

$$\begin{aligned}&\Big | \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \Big ( \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) -\mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big ) \Big | \nonumber \\&\quad \le \frac{C_{11}}{n v^2 ( 1 \wedge v) ( N \wedge n)} \sum _{\ell =1}^{m^2} \sum _{k=0}^{m^2-\ell } \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | .\qquad \end{aligned}$$
(4.73)

To handle the right-hand side term, we first write that

$$\begin{aligned}&\sum _{\ell =1}^{m^2} \sum _{k=0}^{m^2- \ell } \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | \nonumber \\&\quad \le m^2 \sum _{k=0}^{m^2} \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big )- \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) \big |\nonumber \\&\quad + m^2 \sum _{k=0}^{m^2} \big | \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | , \end{aligned}$$
(4.74)

where \(X_{0,m}\) and \(X_{k,m}\) are defined in (4.28). Notice now that \(\mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) = \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) =0\) if \(k>m\). Therefore,

$$\begin{aligned}&\sum _{k=0}^{m^2} \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) \big | \\&\quad = \sum _{k=0}^{m } \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big )- \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) \big | . \end{aligned}$$

Next, using stationarity, the fact that the random variables are centered, (4.11) and (4.29), we get that

$$\begin{aligned}&\big |\mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) \big | \\&\quad = \big |\mathrm{Cov} \big ( {\bar{X}}_{0,m} -X_{0,m}, {\bar{X}}_{k,m}\big ) + \mathrm{Cov} \big ( X_{0,m}- {\bar{X}}_{0,m}, {\bar{X}}_{k,m}- X_{k,m} \big )\\&\quad +\mathrm{Cov}\big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}- X_{k,m} \big ) \big | \\&\quad \le 4M \Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _1 + 4 \Vert \big ( |X_0| - M )_+ \Vert ^2_2 . \end{aligned}$$

As in the derivation of (4.29), notice that \( \Vert X_{0,m} - {\bar{X}}_{0,m} \Vert _1 \le 2 \Vert \big ( |X_0| - M \big )_+ \Vert _1 \). Moreover, \(\big ( |x| - M \big )_+ \le 2|x|\mathbf{1}_{|x| \ge M}\), which in turn implies that \(M \big ( |x| - M \big )_+ \le 2|x|^2\mathbf{1}_{|x| \ge M} \). So, overall,

$$\begin{aligned} \sum _{k=0}^{m^2} \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) \big | \le 32 \, m \mathbb{E }\big ( X_0^2 \mathbf{1}_{|X_0| \ge M} \big ) . \end{aligned}$$
(4.75)

We handle now the second term in the right-hand side of (4.74). Let \(b(m)\) be an increasing sequence of positive integers such that \(b(m) \rightarrow \infty \), \(b(m) \le [m/2]\), and

$$\begin{aligned} \lim _{m \rightarrow \infty } b(m) \big \Vert X_0 -X_{0,[m/2]} \big \Vert ^2_2=0. \end{aligned}$$
(4.76)

Notice that since (4.30) holds true, it is always possible to find such a sequence. Now, using (4.60),

$$\begin{aligned}&\sum _{k=b(m)}^{m^2} \big | \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | \nonumber \\&\quad \le \sum _{k=b(m)}^{m^2}\sum _{r=0}^{\infty } \Vert P_{0} (X_{r,m}) \Vert _2 \Vert P_0 (X_{k+r,m}) \Vert _2 \nonumber \\&\quad +\sum _{k=b(m)}^{m^2} \sum _{r=0}^{\infty } \Vert P_{0} (X_{r}) \Vert _2 \Vert P_0 (X_{k+r}) \Vert _2 . \end{aligned}$$
(4.77)

Recalling the definition (4.28) of the \(X_{j,m}\)’s, we notice that \(P_{0} (X_{j,m}) = 0 \) if \(j\ge m+1\). Now, for any \(j \in \{0, \ldots , m\}\),

$$\begin{aligned}&\mathbb{E }( X_{j,m} | \xi _0) = \mathbb{E }( \mathbb{E }( X_j | \varepsilon _j, \ldots , \varepsilon _{j-m}) | \xi _0) =\mathbb{E }( \mathbb{E }( X_j | \varepsilon _j, \ldots , \varepsilon _{j-m}) | \varepsilon _0, \ldots , \varepsilon _{j-m}) \\&\quad = \mathbb{E }( X_j | \varepsilon _0, \ldots , \varepsilon _{j-m}) = \mathbb{E }( \mathbb{E }( X_j | \xi _0) | \varepsilon _0, \ldots , \varepsilon _{j-m}) \ \text { a.s.} \end{aligned}$$

Actually, the last two equalities follow from the tower property of conditional expectation, whereas the second one uses the following well-known fact, applied with \(\mathcal{G }_1=\sigma (\varepsilon _0, \ldots , \varepsilon _{j-m}) \), \(\mathcal{G }_2=\sigma (\varepsilon _k, k \le j-m-1) \) and \(Y = X_{j,m}\): if \(Y\) is an integrable random variable, and \(\mathcal{G }_1\) and \(\mathcal{G }_2\) are two \(\sigma \)-algebras such that \( \sigma (Y) \vee \mathcal{G }_1\) is independent of \(\mathcal{G }_2\), then

$$\begin{aligned} \mathbb{E }( Y | \mathcal{G }_1 \vee \mathcal{G }_2) =\mathbb{E }( Y | \mathcal{G }_1) \ \text { a.s.} \end{aligned}$$
(4.78)

Similarly, for any \(j \in \{0, \ldots , m-1\}\),

$$\begin{aligned} \mathbb{E }( X_{j,m} | \xi _{-1})= \mathbb{E }( X_j | \varepsilon _{-1}, \ldots , \varepsilon _{j-m}) = \mathbb{E }( \mathbb{E }( X_j | \xi _{-1}) | \varepsilon _{-1}, \ldots , \varepsilon _{j-m}) \ \text { a.s.} \end{aligned}$$

Then using the equality (4.78) with \(\mathcal{G }_1=\sigma (\varepsilon _{-1}, \ldots , \varepsilon _{j-m}) \) and \(\mathcal{G }_2=\sigma (\varepsilon _0)\), we get that, for any \(j \in \{1,\ldots ,m-1\}\),

$$\begin{aligned} \mathbb{E }( X_{j,m} | \xi _{-1}) = \mathbb{E }( \mathbb{E }( X_j | \xi _{-1}) | \varepsilon _0, \ldots , \varepsilon _{j-m}) \ \text { a.s.} \end{aligned}$$

whereas \(\mathbb{E }( X_{m,m} | \xi _{-1}) =0\) a.s. So, finally, \(\Vert P_{0} (X_{m,m}) \Vert _2 = \Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 \), \(\Vert P_{0} (X_{j,m}) \Vert _2 =0\) if \(j \ge m+1\), and, for any \(j \in \{1,\ldots ,m-1\}\),

$$\begin{aligned} \Vert P_{0} (X_{j,m}) \Vert _2&= \Vert \mathbb{E }( X_{j,m} | \xi _0) -\mathbb{E }( X_{j,m} | \xi _{-1}) \Vert _2 \\&= \Vert \mathbb{E }\big (\mathbb{E }( X_j | \xi _{0}) - \mathbb{E }( X_j | \xi _{-1}) | \varepsilon _0, \ldots , \varepsilon _{j-m} \big )\Vert _2 \le \Vert P_0(X_j) \Vert _2 . \end{aligned}$$

Therefore, starting from (4.77), we infer that

$$\begin{aligned}&\sum _{k=b(m)}^{m^2} \big | \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | \nonumber \\&\quad \le 2 \Vert X_0 \Vert _2 \Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 + 2 \sum _{r=0}^{\infty } \Vert P_{0} (X_{r}) \Vert _2 \sum _{k \ge b(m)}\Vert P_0 (X_{k}) \Vert _2 . \end{aligned}$$
(4.79)

On the other hand,

$$\begin{aligned}&\sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | \nonumber \\&\quad \le \sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big (X_0 - X_{0,m}, X_{k,m} \big ) \big | + \sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big ( X_0, X_k - X_{k,m} \big ) \big |. \end{aligned}$$
(4.80)

Since the random variables are centered, \(\mathrm{Cov} \big (X_0 - X_{0,m}, X_{k,m} \big )=\mathbb{E }\big ( X_{k,m} (X_0 - X_{0,m})\big )\). Since \(X_{k,m}\) is \(\sigma ( \varepsilon _{k-m}, \ldots , \varepsilon _k)\)-measurable,

$$\begin{aligned} \mathbb{E }\big ( X_{k,m} (X_0 - X_{0,m})\big ) = \mathbb{E }\Big ( X_{k,m} \big ( \mathbb{E }( X_0 | \varepsilon _{k}, \ldots , \varepsilon _{k-m} ) - \mathbb{E }( X_{0,m} | \varepsilon _{k}, \ldots , \varepsilon _{k-m} ) \big )\Big ). \end{aligned}$$

But, for any \(k \in \{ 0, \ldots , m \}\), by using the equality (4.78) with \(\mathcal{G }_1=\sigma (\varepsilon _{0}, \ldots , \varepsilon _{k-m}) \) and \(\mathcal{G }_2=\sigma (\varepsilon _{k}, \ldots , \varepsilon _1)\), it follows that

$$\begin{aligned} \mathbb{E }( X_{0,m} | \varepsilon _{k}, \ldots , \varepsilon _{k-m} \big ) = \mathbb{E }( X_0 | \varepsilon _{0}, \ldots , \varepsilon _{k-m} ) \ \text { a.s.} \end{aligned}$$
(4.81)

and

$$\begin{aligned} \mathbb{E }( X_{0} | \varepsilon _{k}, \ldots , \varepsilon _{k-m} \big ) = \mathbb{E }( X_0 | \varepsilon _{0}, \ldots , \varepsilon _{k-m} ) \ \text { a.s.} \end{aligned}$$

Whence,

$$\begin{aligned} \sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big (X_0 - X_{0,m}, X_{k,m} \big ) \big | =0 . \end{aligned}$$
(4.82)

To handle the second term in the right-hand side of (4.80), we start by writing that

$$\begin{aligned}&\mathrm{Cov} \big ( X_0, X_k - X_{k,m} \big ) = \mathrm{Cov} \big ( X_0 -X_{0,m}, X_k - X_{k,m} \big )\nonumber \\&\quad + \mathrm{Cov} \big ( X_{0,m}, X_k - X_{k,m} \big ) . \end{aligned}$$
(4.83)

Using the fact that the random variables are centered together with stationarity, we get that

$$\begin{aligned} \big | \mathrm{Cov} \big ( X_0 -X_{0,m}, X_k - X_{k,m} \big ) \big | \le \Vert X_0 -X_{0,m} \Vert _2^2 . \end{aligned}$$
(4.84)

On the other hand, noticing that \(\mathbb{E }( X_k - X_{k,m} | \varepsilon _k, \ldots , \varepsilon _{k-m} ) =0\), and using stationarity together with the fact that the random variables are centered, it follows that

$$\begin{aligned}&\big | \mathrm{Cov} \big ( X_{0,m}, X_k - X_{k,m} \big ) \big | = \big | \mathbb{E }\big ( \big ( X_{0,m} -\mathbb{E }( X_{0,m} | \varepsilon _k, \ldots , \varepsilon _{k-m} ) \big ) \big ( X_k - X_{k,m} \big ) \big )\big | \nonumber \\&\quad \le \Vert X_{0,m} -\mathbb{E }( X_{0,m} | \varepsilon _k, \ldots , \varepsilon _{k-m})\Vert _2 \Vert X_0 -X_{0,m} \Vert _2. \end{aligned}$$
(4.85)

Next, using (4.81), we get that, for any \(k \in \{0, \ldots , m\}\),

$$\begin{aligned} \Vert X_{0,m} -\mathbb{E }( X_{0,m} | \varepsilon _k, \ldots , \varepsilon _{k-m} ) \Vert _2&= \Vert X_{0,m} -\mathbb{E }( X_{0} | \varepsilon _0, \ldots , \varepsilon _{k-m} ) \Vert _2\nonumber \\&= \Vert \mathbb{E }\big ( X_{0} -\mathbb{E }( X_{0} | \varepsilon _0, \ldots , \varepsilon _{k-m} )| \varepsilon _0, \ldots , \varepsilon _{-m} \big )\Vert _2\nonumber \\&\quad \le \Vert X_0 -\mathbb{E }( X_{0} | \varepsilon _0, \ldots , \varepsilon _{k-m} ) \Vert _2.\nonumber \\ \end{aligned}$$
(4.86)

Therefore, starting from (4.85), taking into account (4.86) and the fact that

$$\begin{aligned} \max _{0 \le k \le [m/2]} \Vert X_0 -\mathbb{E }( X_{0} | \varepsilon _0, \ldots , \varepsilon _{k-m} ) \Vert _2 \le \Vert X_0 -\mathbb{E }( X_{0} | \varepsilon _0, \ldots , \varepsilon _{-[ m/2]} ) \Vert _2 , \end{aligned}$$

we get that

$$\begin{aligned} \max _{0 \le k \le [m/2]} \big | \mathrm{Cov} \big ( X_{0,m}, X_k - X_{k,m} \big ) \big | \le \Vert X_0 - X_{0,[ m/2]} \Vert _2^2. \end{aligned}$$
(4.87)

Starting from (4.83), gathering (4.84) and (4.87), and using the fact that \(b(m) \le [m/2]\), we then derive that

$$\begin{aligned} \sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big ( X_0, X_k - X_{k,m} \big ) \big | \le 2 \, b(m) \Vert X_0 - X_{0,[ m/2]} \Vert _2^2 , \end{aligned}$$

which combined with (4.80) and (4.82) implies that

$$\begin{aligned} \sum _{k=0}^{b(m)} \big | \mathrm{Cov} \big ( X_{0,m}, X_{k,m} \big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big | \le 2 \, b(m) \Vert X_0 - X_{0,[ m/2]} \Vert _2^2. \end{aligned}$$
(4.88)

So, overall, starting from (4.74), gathering the upper bounds (4.75), (4.79), and (4.88), and taking into account the condition (2.3), we get that there exists a positive constant \(C_{12}\) not depending on \((n,M,m)\) and such that

$$\begin{aligned}&\sum _{\ell =1}^{m^2} \sum _{k=0}^{m^2-\ell } \big | \mathrm{Cov} \big ( {\bar{X}}_{0,m}, {\bar{X}}_{k,m}\big ) - \mathrm{Cov} \big ( X_{0}, X_{k} \big ) \big |\nonumber \\&\quad \le C_{12} \Big ( m^3 \mathbb{E }\big ( X_0^2 \mathbf{1}_{|X_0| \ge M} \big ) +m^2 \Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 \nonumber \\&\quad +\, m^2\sum _{k \ge b(m)}\Vert P_0 (X_{k}) \Vert _2 + m^2 \, b(m) \Vert X_0 - X_{0,[ m/2]} \Vert _2^2 \Big ). \end{aligned}$$
(4.89)

Therefore, starting from (4.73), considering the upper bound (4.89), using the fact that \( m^2 k_{N,m} \le N\) and that \(\lim _{n \rightarrow \infty }c(n) = c\), it follows that there exists a positive constant \(C_{13}\) not depending on \((M,m)\) and such that

$$\begin{aligned}&\limsup _{n \rightarrow \infty } \sum _{i=1}^n \sum _{s=1}^{k_{N,m} } \Big | \mathbb{E }\big (D^{2}f \big ( {\widetilde{\mathbf{W}}}^{(i)}_{s}\big ) \big ) \mathbf{.} \Big ( \mathbb{E }\big ( {\mathbf{{U}}^{(i) \, \otimes 2}_{s}} \big ) -\mathbb{E }\big ( {\mathbf{V}^{(i)\, \otimes 2}_{s}} \big ) \Big ) \Big | \nonumber \\&\quad \le \frac{C_{13} }{ v^2 ( 1 \wedge v) } \Big ( m \mathbb{E }\big ( X_0^2 \mathbf{1}_{|X_0| \ge M} \big ) + \Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2\nonumber \\&\quad + \sum _{k \ge b(m)}\Vert P_0 (X_{k}) \Vert _2 + b(m) \Vert X_0 - X_{0,[ m/2]} \Vert _2^2 \Big ). \end{aligned}$$
(4.90)

Letting first \(M\) tend to infinity and using the fact that \(X_0\) belongs to \(\mathbb{L }^2\), the first term on the right-hand side goes to zero. Letting now \(m\) tend to infinity, the third term vanishes by the condition (2.3), whereas the last one goes to zero by (4.76). To show that the second term goes to zero as \(m\) tends to infinity, we notice that, by stationarity, \(\Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 \le \Vert \mathbb{E }( X_m | \xi _0) \Vert _2 = \Vert \mathbb{E }( X_0| \xi _{-m}) \Vert _2\). By the reverse martingale convergence theorem, setting \(\mathcal{F }_{-\infty } = \bigcap _{k \in \mathbb{Z }} \sigma ( \xi _k)\), \( \lim _{m \rightarrow \infty }\mathbb{E }( X_0| \xi _{-m}) =\mathbb{E }( X_0| \mathcal{F }_{-\infty })=0 \) a.s. (since \(\mathcal{F }_{-\infty }\) is trivial and \(\mathbb{E }(X_0)=0\)). So, since \(X_0\) belongs to \(\mathbb{L }^2\), \(\lim _{m \rightarrow \infty }\Vert \mathbb{E }( X_m | {\varepsilon _0}) \Vert _2 =0\). This ends the proof of (4.72) and hence of Proposition 4.3. \(\square \)

4.4 End of the Proof of Theorem 2.1

According to Propositions 4.1, 4.2, and 4.3, the convergence (4.3) follows. Therefore, to end the proof of Theorem 2.1, it remains to show that (4.4) holds true with \(\mathbf{G}_n\) defined in Sect. 4.2. This can be achieved by using Theorem 1.1 in Silverstein [17] combined with arguments developed in the proof of Theorem 1 in [23] (see also [19]). With this aim, we consider \((y_k)_{k \in \mathbb{Z }}\) a sequence of i.i.d. real-valued random variables with law \(\mathcal{N } (0,1)\), and \(n\) independent copies of \((y_k)_{k \in \mathbb{Z }}\) that we denote by \((y^{(1)}_k)_{k \in \mathbb{Z }}, \ldots , (y^{(n)}_k)_{k \in \mathbb{Z }}\). For any \(i \in \{1, \ldots , n \}\), define \(\mathbf{{y}}_{i}=\big ( y_{1}^{(i)}, \ldots ,y_{N}^{(i)}\big )\). Let \(\mathcal Y _n=(\mathbf{{y}}^T_{1} \vert \cdots \vert \mathbf{{y}}^T_{n}) \) be the matrix whose columns are the \(\mathbf{{y}}^T_{i}\)’s and consider its associated sample covariance matrix \( \mathbf{Y}_n=\frac{1}{n}\mathcal Y _n\mathcal Y _{n}^{T} \). Let \(\gamma (k) = \mathrm{Cov} (X_0, X_k )\) and note that, by (4.31), \(\gamma (k)\) is also equal to \(\mathrm{Cov} (Z_0, Z_k )=\mathrm{Cov} (Z^{(i)}_0, Z^{(i)}_k )\) for any \(i\in \{1, \ldots , n \}\). Set

$$\begin{aligned} \Gamma _N:= \big ( \gamma _{j,k} \big )= \left( \begin{array}{cccc} \gamma (0) &{} \gamma (1) &{} \cdots &{} \gamma (N-1) \\ \gamma (1) &{} \gamma (0) &{} \cdots &{} \gamma (N-2) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \gamma (N-1) &{} \gamma (N-2) &{}\cdots &{} \gamma (0) \end{array} \right) . \end{aligned}$$
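The construction of this subsection is easy to reproduce numerically. The sketch below (with an illustrative, geometrically decaying autocovariance \(\gamma (k)\), a purely hypothetical choice) builds \(\Gamma _N\), checks that its largest eigenvalue does not exceed \(\gamma (0) + 2 \sum _{k \ge 1} \vert \gamma (k) \vert \), the spectral-norm bound discussed in the next paragraph, and forms the matrix \(\mathbf{A}_n = \Gamma _N^{1/2}\mathbf{Y}_n\Gamma _N^{1/2}\) whose Stieltjes transform appears in (4.91).

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 200, 400                                   # dimension and sample size, c(n) = N/n = 0.5
gamma = 0.7 ** np.arange(N)                       # illustrative autocovariances gamma(k)

# Toeplitz matrix Gamma_N = ( gamma(|j - k|) )_{j,k}.
Gamma_N = gamma[np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])]
eigvals, eigvecs = np.linalg.eigh(Gamma_N)

# Spectral-norm bound gamma(0) + 2 sum_{k >= 1} |gamma(k)| (Gerschgorin-type argument).
bound = gamma[0] + 2.0 * np.sum(np.abs(gamma[1:]))
assert eigvals.max() <= bound + 1e-8

# Symmetric nonnegative square root of Gamma_N and the matrix A_n = Gamma_N^{1/2} Y_n Gamma_N^{1/2}.
root = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
Y = rng.standard_normal((n, N))                   # rows y_i with i.i.d. N(0, 1) entries
Y_n = Y.T @ Y / n                                 # sample covariance matrix of the y_i's
A_n = root @ Y_n @ root                           # y_i root has the law of Z_i, so A_n matches G_n in mean
print("largest eigenvalue of Gamma_N:", eigvals.max(), "<= bound:", bound)
```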

Note that \((\Gamma _N)\) is bounded in spectral norm. Indeed, by the Gerschgorin theorem, the largest eigenvalue of \(\Gamma _N\) is not larger than \(\gamma (0) + 2 \sum _{k \ge 1} \vert \gamma (k) \vert \) which, according to Remark 2.2, is finite. Note also that the vector \(( \mathbf{{Z}}_{1}, \ldots , \mathbf{{Z}}_{n})\) has the same distribution as \(\big ( \mathbf{{y}}_{1} \Gamma _N^{1/2}, \ldots , \mathbf{{y}}_{n}\Gamma _N^{1/2} \big )\) where \(\Gamma _N^{1/2}\) is the symmetric nonnegative square root of \(\Gamma _N\) and the \(\mathbf{{Z}}_{i}\)’s are defined in Sect. 4.2. Therefore, for any \(z\in \mathbb C ^+\), \(\mathbb{E }\big ( S_{F^{\mathbf{G}_n}}(z) \big ) =\mathbb{E }\big ( S_{F^{\mathbf{A}_n}}(z) \big )\) where \(\mathbf{A}_n = \Gamma _N^{1/2}\mathbf{Y}_n\Gamma _N^{1/2}\). The proof of (4.4) then reduces to proving that, for any \(z\in \mathbb C ^+\),

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb{E }\big ( S_{F^{\mathbf{A}_n}}(z) \big ) = S(z) , \end{aligned}$$
(4.91)

where \(S\) is defined in (2.4). According to Theorem 1.1 in Silverstein [17], if one can show that

$$\begin{aligned} F^{\Gamma _N} \text { converges to a probability distribution } H, \end{aligned}$$
(4.92)

then (4.91) holds with \(S\) satisfying the equation (1.4) in Silverstein [17]. Due to the Toeplitz form of \(\Gamma _N\) and to the fact that \(\sum _{k \ge 0} \vert \gamma (k) \vert < \infty \) (see Remark 2.2), the convergence (4.92) can be proved by taking into account the arguments developed in the proof of Theorem 1 of [23]. Indeed, the fundamental eigenvalue distribution theorem of Szegö for Toeplitz forms allows one to assert that the empirical spectral distribution of \(\Gamma _N\) converges weakly to a nonrandom distribution \(H\) that is defined via the spectral density of \((X_k)_{k \in \mathbb{Z }}\) (see Relations (12) and (13) in [23]). To end the proof, it suffices to notice that the relation (1.4) in Silverstein [17] combined with the relation (13) in [23] leads to (2.4). \(\square \)
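Szegö's theorem invoked above is also easy to visualise numerically: for a summable autocovariance, the e.d.f. of the eigenvalues of \(\Gamma _N\) is close, for large \(N\), to the distribution of \(g(U)\) with \(U\) uniform on \([0, 2\pi ]\), where \(g(\lambda ) = \sum _{k} \gamma (k) \mathrm{e}^{-ik\lambda }\) is \(2\pi \) times the spectral density. The sketch below (same illustrative autocovariance as before, a purely hypothetical choice) compares a few quantiles of the two distributions.

```python
import numpy as np

N = 400
gamma = 0.7 ** np.arange(N)                       # illustrative autocovariances gamma(k)
Gamma_N = gamma[np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])]
eigs = np.linalg.eigvalsh(Gamma_N)

# Symbol of the Toeplitz matrix: g(lambda) = gamma(0) + 2 sum_{k >= 1} gamma(k) cos(k lambda),
# i.e. 2*pi times the spectral density of the process (truncated at N - 1 terms).
lam = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
g = gamma[0] + 2.0 * np.cos(np.outer(lam, np.arange(1, N))) @ gamma[1:]

# By Szegö's theorem, the e.d.f. of the eigenvalues of Gamma_N is close to the law of g(U),
# U uniform on [0, 2*pi]; compare a few quantiles of the two distributions.
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(eigs, qs))
print(np.quantile(g, qs))
```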