1 Introduction

Given a graph on N vertices, say, \(\{1,\ldots , N\}\), let \(A_N\) denote the adjacency matrix of the graph, whose (i, j)th entry is 1 if there is an edge between vertices i and j and 0 otherwise. The eigenvalues and eigenvectors of \(A_N\) are important statistics which encode crucial information about the graph. The present article considers a generalization of the most studied random graph, namely the Erdős–Rényi random graph (ERRG). It is a graph on N vertices where each edge is present independently with probability \(\varepsilon _N\). The adjacency matrix of the ERRG is a symmetric matrix with zero diagonal entries, and the entries above the diagonal are independent and identically distributed Bernoulli random variables with parameter \(\varepsilon _N\). We consider an inhomogeneous extension of the ERRG where the presence of an edge between vertices i and j is given by a Bernoulli random variable with parameter \(p_{i,j}\), and these \(\{p_{i,j}:\, 1\le i< j\le N\}\) need not be the same. When the \(p_{i,j}\) are the same for all pairs of vertices i and j, the model will be referred to as the (homogeneous) ERRG.

The mathematical study of the inhomogeneous ERRG, where the connection probabilities \(p_{i,j}\) come from a discretization of a symmetric, non-negative function f on \([0,1]^2\), was initiated in [9]. That article considered edge probabilities given by

$$\begin{aligned} p_{i,j}= \frac{1}{N} f\left( \frac{i}{N},\, \frac{j}{N}\right) . \end{aligned}$$

In that case the average degree is bounded, and the phase transition for the size of the largest cluster was studied in the same article (see also [8, 32] for results on the inhomogeneous ERRG). The present article considers a similar set-up where the average degree is unbounded and studies the properties of the eigenvalues of the adjacency matrix. The connection probabilities are given by

$$\begin{aligned} p_{i,j}=\varepsilon _N f\left( \frac{i}{N}, \frac{j}{N}\right) \end{aligned}$$

with the assumption that

$$\begin{aligned} N\varepsilon _N\rightarrow \infty . \end{aligned}$$
(1.1)

Let \(\lambda _1(A_N)\ge \cdots \ge \lambda _N(A_N)\) be the eigenvalues of \(A_N\). It was shown in [13] (see also [35] for a graphon approach) that the empirical spectral distribution of the centered adjacency matrix converges, after scaling by \(\sqrt{N\varepsilon _N}\), to a compactly supported measure \(\mu _f\). When \(f\equiv 1\), the limiting law \(\mu _f\) turns out to be the semicircle law. Note that \(f\equiv 1\) corresponds to the (homogeneous) ERRG (see also [16, 31] for the homogeneous case). Quantitative estimates on the largest eigenvalue in the homogeneous case (when \(N\varepsilon _N\gg (\log N)^4\)) were obtained in [20, 34], and it follows from their work that the smallest and the second largest eigenvalues converge to the edges of the support of the semicircle law. These results were improved recently in [7], where the sparsity condition was relaxed to \(N\varepsilon _N\gg \log N\) (which is also the connectivity threshold), and it was shown there that the inhomogeneous ERRG exhibits a similar behaviour. The largest eigenvalue of the inhomogeneous ERRG when \(N\varepsilon _N\ll \log N\) was treated in [6]. Under the assumption that \(N^{\xi }\ll N\varepsilon _N\) for some \(\xi \in (2/3, 1]\), it was proved in [17, Theorem 2.7] that the second largest eigenvalue of the (homogeneous) ERRG, after appropriate centering and scaling, converges in distribution to the Tracy–Widom law. These results were recently improved in [27]. The properties of the largest eigenvector in the homogeneous case were studied in [1, 17, 22, 27, 31].

The scaling limit of the maximum eigenvalue of the inhomogeneous ERRG also turns out to be interesting. The fluctuations of the maximum eigenvalue in the homogeneous case were studied in [18]. It was proved that

$$\begin{aligned} (\varepsilon _N(1-\varepsilon _N))^{-1/2}\left( \lambda _1( A_N)- \mathrm{E}[\lambda _1(A_N)]\right) \Rightarrow N(0, 2). \end{aligned}$$

The above result was shown under the assumption that

$$\begin{aligned} (\log N)^{\xi } \ll N\varepsilon _N \end{aligned}$$
(1.2)

for some \(\xi >8\), which is a stronger assumption than (1.1).

It is well known that in the classical case of a (standard) Wigner matrix, the largest eigenvalue converges to the Tracy–Widom law after appropriate centering and scaling. We note that the edge and the bulk of the spectrum scale differently in the ERRG. As pointed out before, the bulk is of the order \((N\varepsilon _N)^{1/2}\), while the largest eigenvalue is of the order \(N\varepsilon _N\). Letting

$$\begin{aligned} W_N=A_N-\mathrm{E}(A_N), \end{aligned}$$
(1.3)

where \(\mathrm{E}(A_N)\) is the entrywise expectation of \(A_N\), it is easy to see that

$$\begin{aligned} A_N=\varepsilon _N\mathbf{1}\mathbf{1}^\prime +W_N, \end{aligned}$$

where \(\mathbf{1}\) is the \(N\times 1\) vector with each entry 1. Since the empirical spectral distribution of \((N\varepsilon _N)^{-1/2}W_N\) converges to the semicircle law, the largest eigenvalue of the latter matrix converges to 2 almost surely. As \(\mathrm{E}[A_N]\) is a rank-one matrix, it turns out that the largest eigenvalue of \(A_N\) scales like \(N\varepsilon _N\), which is different from the bulk scaling.

The above behaviour can be treated as a special case of the perturbation of a Wigner matrix. When \(W_N\) is a symmetric random matrix with independent and identically distributed entries with mean zero and finite variance \(\sigma ^2\) and the deformation is of the form

$$\begin{aligned} M_N= \frac{W_N}{\sqrt{N}}+ P_N, \end{aligned}$$

the largest eigenvalue is well-studied. Motivated by the study of the adjacency matrix of the homogeneous ERRG, [20] studied the above deformation with \(P_N= m N^{-1/2} \mathbf{1}\mathbf{1}^\prime \), \(m\ne 0\). They showed that

$$\begin{aligned} N^{-1/2}\left( \lambda _1(M_N)- Nm-\frac{\sigma ^2}{m}\right) \Rightarrow N(0, 2\sigma ^2). \end{aligned}$$

Since the bulk of \(N^{-1/2} M_N\) lies within \([-2\sigma , \, 2\sigma ]\), the largest eigenvalue is detached from the bulk. In general, when the largest eigenvalue of the perturbation has the same order as the maximum eigenvalue of \(W_N\), the problem is more challenging. One of the seminal results in this direction was obtained in [3]. They exhibited a phase transition in the behaviour of the largest eigenvalue for complex Wishart matrices, which is now referred to as the BBP (Baik–Ben Arous–Péché) phase transition. It is roughly as follows. Suppose \(P_N\) is a deterministic matrix of rank k with non-trivial eigenvalues \(\theta _1\ge \theta _2\ge \cdots \ge \theta _k>0\). If \(\theta _i\le \sigma \), then \(\lambda _i(M_N)\rightarrow 2\sigma \) almost surely, and if \(\theta _i> \sigma \) then

$$\begin{aligned} \lambda _i\rightarrow \theta _i+\frac{\sigma ^2}{\theta _i},\;\;\text {almost surely}. \end{aligned}$$

See [2, 19] for further extensions. It is clear that when \(\theta _i>\sigma \) the corresponding eigenvalue lies outside the bulk of the spectrum. The phase transition is also present at the level of fluctuations around \(2\sigma \) or \(\theta _i+\sigma ^2/\theta _i\). It is known that under some moment conditions on the entries of \(W_N\) (see [11, 24, 25]), when \(\theta _i\le \sigma \), the fluctuations are governed by the Tracy–Widom law, and when \(\theta _i>\sigma \), the limiting distribution is given by the eigenvalues of a random matrix of order k. This limiting random matrix depends on the eigenvectors of \(P_N\) and also on the entries of \(W_N\). The non-universal nature was pointed out in [11]. For example, when \(W_N\) is a matrix from the Gaussian orthogonal ensemble and \(P_N=\theta _1 \mathbf{1}\mathbf{1}^{\prime }\), the limit is Gaussian, and if the entries of \(W_N\) are not Gaussian, then the limit is a convolution of a Gaussian law and the distribution from which the entries of \(W_N\) are sampled. One can find further details in [3,4,5, 11, 12, 24, 25] and the survey [29]. The case where the rank k depends on N was considered in [10, 23, 26]. Various applications of these results on outliers can be found in the literature, for example, [14, 15, 30].

The adjacency matrix of the inhomogeneous ERRG does not fall directly into the purview of the above results, since \(W_N\), as in (1.3), is a symmetric matrix with independent entries above the diagonal, but the entries have a variance profile, which moreover depends on the size of the graph. The inhomogeneity does not allow the use of local laws suitable for the semicircle law in an obvious way. The present article aims at extending the results obtained in [18] for the case that f is a constant to the case that f is a non-negative, symmetric, bounded, Riemann integrable function on \([0,1]^2\) which induces an integral operator of finite rank k, under the assumption that (1.2) holds. The case \(k\ge 2\) turns out to be substantially more difficult than the case \(k=1\) for the following reason. If \(k=1\), that is,

$$\begin{aligned} \mathrm{E}(A_N)=u_Nu_N^\prime , \end{aligned}$$

for some \(N\times 1\) deterministic column vector \(u_N\), then with high probability it holds that

$$\begin{aligned} u_N^\prime \left( \lambda I-W_N\right) ^{-1}u_N=1, \end{aligned}$$

where \(\lambda \) is the largest eigenvalue of \(A_N\). The above equation facilitates the asymptotic study of \(\lambda \). However, when \(k\ge 2\), the above equation takes a complicated form. The observation which provides a way out of this is that \(\lambda \) is also an eigenvalue of a \(k\times k\) matrix with high probability; the same is recorded in Lemma 5.2 of Sect. 5. Besides, working with the eigenvalues of a \(k\times k\) matrix needs more linear algebraic work when \(k\ge 2\). For example, the proof of Lemma 5.8, which is one of the major steps in the proof of a main result, becomes a tautology when \(k=1\).

The following results are obtained in the current paper. If the largest eigenvalue of the integral operator has multiplicity 1, then the largest eigenvalue of the adjacency matrix has a Gaussian fluctuation. More generally, it is shown that the eigenvalues of the adjacency matrix corresponding to the isolated eigenvalues (to be defined later) of the induced integral operator jointly converge to a multivariate Gaussian law. Under the assumption that the function f is Lipschitz continuous, the leading order term in the expansion of the expected value of the isolated eigenvalues is obtained. Furthermore, under an additional assumption, the inner product of the corresponding eigenvector with the discretized eigenfunctions of the integral operator corresponding to the other eigenvalues is shown to have a Gaussian fluctuation. Some important examples of such f include the rank-one case and the stochastic block models. It remains an open question whether the \((k+1)\)th eigenvalue follows a Tracy–Widom type scaling.

The mathematical set-up and the main results of the paper are stated in Sect. 2. Theorem 2.3 shows that, of the k largest eigenvalues, the isolated ones, centred by their means and appropriately scaled, converge to a multivariate normal distribution. Theorem 2.4 studies the first and second order terms in the expectation of the top k isolated eigenvalues. Theorems 2.5 and 2.6 study the behaviour of the eigenvectors corresponding to the top k isolated eigenvalues. Section 3 contains the special case when f is of rank one and the example of stochastic block models. A few preparatory estimates are noted in Sect. 4, which are used later in the proofs of the main results, given in Sect. 5. The estimates in Sect. 4 are proved in the Appendix.

2 The Set-Up And The Results

Let \(f:[0,1]\times [0,1]\rightarrow [0,\infty )\) be a function which is symmetric, bounded, and Riemann integrable, that is,

$$\begin{aligned} f(x,y)=f(y,x), \quad 0\le x,y\le 1, \end{aligned}$$
(2.1)

and the set of discontinuities of f in \([0,1]\times [0,1]\) has Lebesgue measure zero.

The integral operator \(I_f\) with kernel f is defined from \(L^2[0,1]\) to itself by

$$\begin{aligned} \bigl (I_f(g)\bigr )(x)=\int _0^1f(x,y)g(y)\,dy, \quad 0\le x\le 1. \end{aligned}$$

Besides the above, we assume that \(I_f\) is a non-negative definite operator and the range of \(I_f\) has a finite dimension.

Under the above assumptions \(I_f\) turns out to be a compact self-adjoint operator, and from spectral theory one obtains \(\theta _1\ge \theta _2\ge \cdots \ge \theta _k>0\) as the non-zero eigenvalues of \(I_f\) (where k is the dimension of the range of \(I_f\)), with eigenfunctions \(r_i\) corresponding to \(\theta _i\). Therefore, \(\{r_1,\ldots ,r_k\}\) is an orthonormal set in \(L^2[0,1]\), and by our assumptions, each \(r_i\) is Riemann integrable (see Lemma 6.1 in the Appendix). Also, for any \(g\in L^2[0,1]\) one has

$$\begin{aligned} I_f(g) =\sum _{i=1}^k \theta _i \langle \, r_i, g\rangle _{L^2[0,1]} r_i. \end{aligned}$$

Note that this gives

$$\begin{aligned} \int _0^1 \left( \sum _{i=1}^k \theta _i r_i(x) r_i(y) g(y)\right) \, dy= \int _0^1 f(x,y) g(y)\, dy \; \;\text {for almost all } x\in [0,1]. \end{aligned}$$

Since g is an arbitrary function in \(L^2[0,1]\) this immediately gives

$$\begin{aligned} f(x,y)=\sum _{i=1}^k\theta _ir_i(x)r_i(y), \quad \text {for almost all }(x,y)\in [0,1]\times [0,1]. \end{aligned}$$

Since the functions on both sides of the above equation are Riemann integrable, the corresponding Riemann sums are approximately equal, and hence there is no loss of generality in assuming that the above equality holds for every x and y.

That is, we now assume that

$$\begin{aligned} f(x,y)=\sum _{i=1}^k\theta _ir_i(x)r_i(y)\ge 0, \quad \text {for all}\;(x,y)\in [0,1]\times [0,1], \end{aligned}$$
(2.2)

where \(\theta _1\ge \cdots \ge \theta _k>0\) and \(\{r_1,\ldots ,r_k\}\) is an orthonormal set in \(L^2[0,1]\). The assumptions on \(r_1,\ldots ,r_k\) are listed below for easy reference.

Assumption F1. The functions \(r_1,\ldots ,r_k\) from [0, 1] to \({{\mathbb {R}}}\) are bounded and Riemann integrable.

Assumption F2. For each \(i=1,\ldots ,k\), \(r_i\) is Lipschitz, that is,

$$\begin{aligned} |r_i(x)-r_i(y)|\le K_i|x-y|, \end{aligned}$$

for some fixed \(K_i<\infty \). This is clearly stronger than Assumption F1, and will be needed in a few results. A consequence of this assumption is that there exists K such that

$$\begin{aligned} \left| f(x,y)-f(x^\prime ,y^\prime )\right| \le K\left( |x-x^\prime |+|y-y^\prime |\right) , \quad 0\le x,x^\prime ,y,y^\prime \le 1. \end{aligned}$$
(2.3)

Let \((\varepsilon _N:N\ge 1)\) be a real sequence satisfying

$$\begin{aligned} 0<\varepsilon _N\le \left[ \sup _{0\le x,y\le 1}f(x,y)\right] ^{-1}, \quad N\ge 1. \end{aligned}$$

The following will be the bare minimum assumption for all the results.

Assumption E1. For some \(\xi >8\), fixed once and for all,

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N\varepsilon _N}(\log N)^\xi =0, \end{aligned}$$

that is, (1.2) holds. Furthermore,

$$\begin{aligned} \lim _{N\rightarrow \infty }\varepsilon _N=\varepsilon _\infty , \end{aligned}$$
(2.4)

for some \(\varepsilon _\infty \ge 0\). It is worth emphasizing that we do not assume that \(\varepsilon _N\) necessarily goes to zero, although that may be the case.

For one result, we shall have to make a stronger assumption on \((\varepsilon _N)\) which is the following.

Assumption E2. As \(N\rightarrow \infty \),

$$\begin{aligned} N^{-2/3}\ll \varepsilon _N\ll 1. \end{aligned}$$
(2.5)

For \(N\ge 1\), let \({{\mathbb {G}}}_N\) be an inhomogeneous Erdős–Rényi graph where an edge is placed between vertices i and j with probability \(\varepsilon _N f(i/N,j/N)\), for \(i\le j\), the choice being made independently for each pair in \(\{(i,j):1\le i\le j\le N\}\). Note that we allow self-loops. Let \(A_N\) be the adjacency matrix of \({{\mathbb {G}}}_N\). In other words, \(A_N\) is an \(N\times N\) symmetric matrix, where \(\{A_N(i,j):1\le i\le j\le N\}\) is a collection of independent random variables, and

$$\begin{aligned} A_N(i,j)\sim \text {Bernoulli}\left( \varepsilon _N f\left( \frac{i}{N},\frac{j}{N}\right) \right) , \quad 1\le i\le j\le N. \end{aligned}$$
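
For readers who wish to experiment with this model, the following is a minimal simulation sketch in Python/NumPy of how \(A_N\) can be sampled for a given kernel f and sparsity parameter \(\varepsilon _N\). The helper name sample_adjacency and the rank-two kernel used at the end are illustrative choices, not part of the model specification.

```python
import numpy as np

def sample_adjacency(N, eps, f, seed=None):
    """Sample A_N with A_N(i,j) ~ Bernoulli(eps * f(i/N, j/N)) independently for i <= j
    (self-loops allowed), and symmetrize."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, N + 1) / N
    P = eps * f(x[:, None], x[None, :])                      # matrix of edge probabilities
    assert P.max() <= 1.0, "eps must not exceed 1 / sup f"
    A = np.triu((rng.random((N, N)) <= P).astype(float))     # independent choices for i <= j
    return A + A.T - np.diag(np.diag(A))                     # reflect the upper triangle

# illustrative rank-two kernel f(x,y) = 1 + 0.3*r2(x)*r2(y) with r2(x) = sqrt(3)(2x-1);
# its integral operator has eigenvalues 1 and 0.3, and f >= 0.1 > 0 on [0,1]^2
f = lambda x, y: 1.0 + 0.3 * 3.0 * (2 * x - 1) * (2 * y - 1)
A = sample_adjacency(N=2000, eps=0.05, f=f, seed=0)
print(A.shape, int(A.sum()) // 2)                            # size and (roughly) the number of edges
```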

A few more notations are needed for stating the main results. For a moment, set \(\theta _0=\infty \) and \(\theta _{k+1}=-\infty \), and define the set of indices i for which \(\theta _i\) is isolated as follows:

$$\begin{aligned} {{\mathcal {I}}}=\{1\le i\le k: \theta _{i-1}>\theta _i>\theta _{i+1}\}. \end{aligned}$$

For an \(N\times N\) real symmetric matrix M, let \(\lambda _1(M)\ge \cdots \ge \lambda _N(M)\) denote its eigenvalues, as mentioned in Sect. 1. Finally, after the following definition, the main results will be stated.

Definition

A sequence of events \(E_N\) occurs with high probability, abbreviated as w.h.p., if

$$\begin{aligned} P(E_N^c)=O\left( e^{-(\log N)^\eta }\right) , \end{aligned}$$

for some \(\eta >1\). For random variables \(Y_N,Z_N\),

$$\begin{aligned} Y_N=O_{hp}(Z_N), \end{aligned}$$

means there exists a deterministic finite constant C such that

$$\begin{aligned} |Y_N|\le C|Z_N|\;\text {w.h.p.}, \end{aligned}$$

and

$$\begin{aligned} Y_N=o_{hp}(Z_N), \end{aligned}$$

means that for all \(\delta >0\),

$$\begin{aligned} |Y_N|\le \delta |Z_N|\;\text {w.h.p.} \end{aligned}$$

We shall say

$$\begin{aligned} Y_N=O_p(Z_N), \end{aligned}$$

to mean that

$$\begin{aligned} \lim _{x\rightarrow \infty }\sup _{N\ge 1}P(|Y_N|>x|Z_N|)=0, \end{aligned}$$

and

$$\begin{aligned} Y_N=o_p(Z_N), \end{aligned}$$

to mean that for all \(\delta >0\),

$$\begin{aligned} \lim _{N\rightarrow \infty }P(|Y_N|>\delta |Z_N|)=0. \end{aligned}$$

The reader may note that if \(Z_N\ne 0\) a.s., then “\(Y_N=O_{p}(Z_N)\)” and “\(Y_N=o_{p}(Z_N)\)” are equivalent to “\((Z_N^{-1}Y_N:N\ge 1)\) is stochastically tight” and “\(Z_N^{-1}Y_N{\mathop {\longrightarrow }\limits ^{P}}0\)”, respectively. Besides, “\(Y_N=O_{hp}(Z_N)\)” is a much stronger statement than “\(Y_N=O_p(Z_N)\)”, and similarly “\(Y_N=o_{hp}(Z_N)\)” is much stronger than “\(Y_N=o_{p}(Z_N)\)”.

In the rest of the paper, the subscript ‘N’ is dropped from notations like \(A_N\), \(W_N\), \(\varepsilon _N\), etc., as well as from notations that will be introduced later. The first result is about the first order behaviour of \(\lambda _i(A)\).

Theorem 2.1

Under Assumptions E1 and F1, for every \(1\le i\le k\),

$$\begin{aligned} \lambda _i(A)=N\varepsilon \theta _i\left( 1+o_{hp}(1)\right) . \end{aligned}$$

An immediate consequence of the above is that for all \(1\le i\le k\), \(\lambda _i(A)\) is non-zero w.h.p., and hence dividing by it is permissible, as is done in the next result. Define

$$\begin{aligned} e_i=\left[ \begin{matrix} N^{-1/2}r_i(1/N)\\ N^{-1/2}r_i(2/N)\\ \vdots \\ N^{-1/2}r_i(1)\\ \end{matrix} \right] , \quad 1\le i\le k. \end{aligned}$$
(2.6)

The second main result studies the asymptotic behaviour of \(\lambda _i(A)\), for \(i\in {\mathcal {I}}\), after appropriate centering and scaling.

Theorem 2.2

Under Assumptions E1 and F1, for every \(i\in {\mathcal {I}}\), as \(N\rightarrow \infty \),

$$\begin{aligned} \lambda _i(A)=\mathrm{E}\left( \lambda _i(A)\right) +\frac{N\theta _i\varepsilon }{\lambda _i(A)}e_i^\prime We_i+o_p(\sqrt{\varepsilon }), \end{aligned}$$

where W is as defined in (1.3).

The next result is a corollary of the previous two.

Theorem 2.3

Under Assumptions E1 and F1, if \({\mathcal {I}}\) is a non-empty set, then as \(N\rightarrow \infty \),

$$\begin{aligned} \left( \varepsilon ^{-1/2}\left( \lambda _i(A)-\mathrm{E}[\lambda _i(A)]\right) :i\in {\mathcal I}\right) \Rightarrow \left( G_i:i\in {{\mathcal {I}}}\right) , \end{aligned}$$
(2.7)

where the right hand side is a multivariate normal random vector in \({{\mathbb {R}}}^{|{{\mathcal {I}}}|}\), with mean zero and

$$\begin{aligned} \mathrm{Cov}(G_i,G_j)=2\int _0^1\int _0^1 r_i(x)r_i(y)r_j(x)r_j(y)f(x,y)\left[ 1-\varepsilon _\infty f(x,y)\right] \,dx\,dy, \end{aligned}$$
(2.8)

for all \(i,j\in {\mathcal {I}}\).

For \(i,j\in {\mathcal {I}}\), since W is a symmetric matrix whose upper triangular entries are independent with zero mean, it follows that as \(N\rightarrow \infty \),

$$\begin{aligned}&Cov\left( e_i^\prime We_i,e_j^\prime We_j\right) \nonumber \\&\sim 4\sum _{1\le k\le l\le N}\mathrm{Cov}\left( e_i(k)W(k,l)e_i(l),e_j(k)W(k,l)e_j(l)\right) \nonumber \\&=4N^{-2}\sum _{1\le k\le l\le N}r_i\left( \frac{k}{N}\right) r_i\left( \frac{l}{N}\right) r_j\left( \frac{k}{N}\right) r_j\left( \frac{l}{N}\right) \varepsilon f\left( \frac{k}{N},\frac{l}{N}\right) \left[ 1-\varepsilon f\left( \frac{k}{N},\frac{l}{N}\right) \right] \nonumber \\&\sim 4\varepsilon \int _0^1\,dx\,\int _x^1\,dy\,r_i(x)r_i(y)r_j(x)r_j(y)f(x,y)\left[ 1-\varepsilon _\infty f(x,y)\right] \nonumber \\&=2\varepsilon \int _0^1\int _0^1r_i(x)r_i(y)r_j(x)r_j(y)f(x,y)\left[ 1-\varepsilon _\infty f(x,y)\right] \,dx\,dy. \end{aligned}$$
(2.9)

With the help of the above, it may be checked that the Lindeberg central limit theorem for triangular arrays of independent summands implies that as \(N\rightarrow \infty \),

$$\begin{aligned} \left( \varepsilon ^{-1/2}e_i^\prime We_i:i\in \mathcal I\right) \Rightarrow (G_i:i\in {\mathcal {I}}), \end{aligned}$$
(2.10)

where the right hand side is a zero mean Gaussian vector with covariance given by (2.8). Therefore, Theorem 2.3 would follow from Theorems 2.1 and 2.2.
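
As a numerical sanity check on (2.8)–(2.10), the short Monte Carlo sketch below estimates the covariance matrix of \((\varepsilon ^{-1/2}e_i^\prime We_i)\) by simulation and compares it with a Riemann-sum evaluation of (2.8). The rank-two kernel, the parameters, and the substitution of \(\varepsilon _N\) for \(\varepsilon _\infty \) at finite N are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, reps = 500, 0.1, 500

x = np.arange(1, N + 1) / N
r1, r2 = np.ones(N), np.sqrt(3.0) * (2 * x - 1)             # orthonormal eigenfunctions of I_f
theta1, theta2 = 1.0, 0.3
F = theta1 * np.outer(r1, r1) + theta2 * np.outer(r2, r2)   # values f(k/N, l/N); here f >= 0.1
P = eps * F
e1, e2 = r1 / np.sqrt(N), r2 / np.sqrt(N)                   # e_i as in (2.6)

s = np.empty((2, reps))
for t in range(reps):
    A = np.triu((rng.random((N, N)) <= P).astype(float))
    A = A + A.T - np.diag(np.diag(A))
    W = A - P                                               # centred adjacency matrix, as in (1.3)
    s[0, t] = e1 @ W @ e1 / np.sqrt(eps)
    s[1, t] = e2 @ W @ e2 / np.sqrt(eps)

print("Monte Carlo covariance:\n", np.cov(s))
# Riemann-sum evaluation of (2.8), with eps in place of eps_infinity at finite N
C = F * (1 - eps * F) / N**2
for (ri, rj) in [(r1, r1), (r1, r2), (r2, r2)]:
    print(2 * np.einsum('i,j,i,j,ij->', ri, ri, rj, rj, C))
```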

Remark 2.1

If \(f>0\) a.e. on \([0,1]\times [0,1]\), then the Krein–Rutman theorem (see Lemma 6.2) implies that \(1\in {\mathcal {I}}\), and that \(r_1>0\) a.e. Thus, in this case, if \(\varepsilon _\infty =0\), then

$$\begin{aligned} \mathrm{Var}(G_1)=2\int _0^1\int _0^1r_1(x)^2r_1(y)^2f(x,y)\,dx\,dy>0. \end{aligned}$$

Remark 2.2

For a fixed \(\theta >1\), define

$$\begin{aligned} f(x,y)=\theta \mathbf{1}\left( x\vee y<\frac{1}{2}\right) +\mathbf{1}\left( x\wedge y>\frac{1}{2}\right) , \quad 0\le x,y\le 1. \end{aligned}$$

In this case, the integral operator associated with f has exactly two non-zero eigenvalues, which are \(\theta /2\) and 1/2, with corresponding normalized eigenfunctions \(r_1(x)=\sqrt{2}\mathbf{1}(x<1/2)\) and \(r_2(x)=\sqrt{2}\mathbf{1}(x\ge 1/2)\), respectively. Let \((\varepsilon _N)\) satisfy Assumption E1 and suppose that \(\varepsilon _\infty =0\). Theorem 2.3 implies that as \(N\rightarrow \infty \),

$$\begin{aligned} \left( \varepsilon ^{-1/2}\left( \lambda _1(A)-\mathrm{E}[\lambda _1(A)]\right) ,\varepsilon ^{-1/2}\left( \lambda _2(A)-\mathrm{E}[\lambda _2(A)]\right) \right) \Rightarrow \left( G_1,G_2\right) , \end{aligned}$$

where the right hand side has a bivariate normal distribution with mean zero. Furthermore, since \(r_1r_2\) is identically zero, it follows that \(G_1\) and \(G_2\) are uncorrelated and hence independent.
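
As a quick sanity check on this remark, one can discretize the kernel and compute the eigenvalues of \(N^{-1}\bigl (f(i/N,j/N)\bigr )_{i,j}\), which approximate those of \(I_f\). The sketch below (with an arbitrary choice of \(\theta \) and of the discretization size) recovers \(\theta /2\) and 1/2 numerically.

```python
import numpy as np

theta, N = 3.0, 2000
x = np.arange(1, N + 1) / N
X, Y = np.meshgrid(x, x, indexing="ij")

# kernel of Remark 2.2: f(x,y) = theta*1(max(x,y) < 1/2) + 1(min(x,y) > 1/2)
F = theta * (np.maximum(X, Y) < 0.5) + 1.0 * (np.minimum(X, Y) > 0.5)

# the eigenvalues of N^{-1} F approximate those of the integral operator I_f
vals = np.linalg.eigvalsh(F / N)
print(np.round(vals[-3:], 3))   # expect approximately [0, 1/2, theta/2] = [0, 0.5, 1.5]
```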

Remark 2.3

That the claim of Theorem 2.3 may not hold if \(i\notin \mathcal I\) is evident from the following example. As in Remark 2.2, suppose that Assumption E1 holds and that \(\varepsilon _\infty =0\). Let

$$\begin{aligned} f(x,y)=\mathbf{1}\left( x\vee y<\frac{1}{2}\right) +\mathbf{1}\left( x\wedge y>\frac{1}{2}\right) , \quad 0\le x,y\le 1. \end{aligned}$$

In this case, the integral operator associated with f has exactly one non-zero eigenvalue, namely 1/2, which has multiplicity 2, with eigenfunctions \(r_1\) and \(r_2\) as in Remark 2.2. In other words, \(I_f\) does not have any simple non-zero eigenvalue.

Theorem 2.3 itself (applied to each of the two diagonal blocks of A, which are adjacency matrices of independent homogeneous ERRGs) implies that there exists \(\beta _N\in {{\mathbb {R}}}\) such that

$$\begin{aligned} \varepsilon ^{-1/2}\left( \lambda _1(A)-\beta \right) \Rightarrow G_1\vee G_2, \end{aligned}$$

where \(G_1\) and \(G_2\) are independent zero mean normal random variables. Furthermore,

$$\begin{aligned} \mathrm{Var}(G_1)=2\int _0^1\int _0^1 r_1(x)^2r_1(y)^2f(x,y)\,dx\,dy=8\int _0^1\int _0^1\mathbf{1}\left( x\vee y\le \frac{1}{2}\right) \,dx\,dy=2. \end{aligned}$$

That is, \(G_1\) and \(G_2\) are i.i.d. N(0, 2). Hence, there does not exist a centering and a scaling by which \(\lambda _1(A)\) converges weakly to a non-degenerate normal distribution.

The next main result of the paper studies asymptotics of \(\mathrm{E}(\lambda _i(A))\) for \(i\in {\mathcal {I}}\).

Theorem 2.4

Under Assumptions E1 and F2, it holds for all \(i\in {\mathcal {I}}\),

$$\begin{aligned} \mathrm{E}\left[ \lambda _i(A)\right] =\lambda _i(B)+O\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) , \end{aligned}$$

where B is a \(k\times k\) symmetric deterministic matrix, depending on N, defined by

$$\begin{aligned} B(j,l)=\sqrt{\theta _j\theta _l}N\varepsilon e_j^\prime e_l+\theta _i^{-2}\sqrt{\theta _j\theta _l}(N\varepsilon )^{-1}\mathrm{E}\left( e_j^\prime W^2 e_l\right) , \quad 1\le j,l\le k, \end{aligned}$$

and \(e_j\) and W are as defined in (2.6) and (1.3), respectively.

The next result studies the asymptotic behaviour of the normalized eigenvector corresponding to \(\lambda _i(A)\), again for isolated indices \(i\in {\mathcal {I}}\). It is shown that this eigenvector is asymptotically aligned with \(e_i\), and hence it is asymptotically orthogonal to \(e_j\) for \(j\ne i\). Upper bounds on the rates of convergence are obtained.

Theorem 2.5

As in Theorem 2.4, let Assumptions E1 and F2 hold. Then, for a fixed \(i\in {\mathcal {I}}\),

$$\begin{aligned} \lim _{N\rightarrow \infty }P\left( \lambda _i(A)\quad \text {is an eigenvalue of multiplicity }1\right) =1. \end{aligned}$$
(2.11)

If v is the eigenvector, with \(L^2\)-norm 1, of A corresponding to \(\lambda _i(A)\), then

$$\begin{aligned} e_i^\prime v=1+O_p\left( (N\varepsilon )^{-1}\right) , \end{aligned}$$
(2.12)

that is, \(N\varepsilon (1-e_i^\prime v)\) is stochastically tight. When \(k\ge 2\), it holds that

$$\begin{aligned} e_j^\prime v=O_p\left( (N\varepsilon )^{-1}\right) , \quad j\in \{1,\ldots ,k\}\setminus \{i\}. \end{aligned}$$
(2.13)

The last main result of this paper studies finer fluctuations of (2.13) under an additional condition.

Theorem 2.6

Let \(k\ge 2\), \(i\in {\mathcal {I}}\) and Assumptions E2 and F2 hold. If v is as in Theorem 2.5, then, for all \(j\in \{1,\ldots ,k\}\setminus \{i\}\),

$$\begin{aligned} e_j^\prime v=\frac{1}{\theta _i-\theta _j}\left[ \theta _i\frac{1}{\lambda _i(A)}e_i^\prime We_j+(N\varepsilon )^{-2}\frac{1}{\theta _i}\mathrm{E}\left( e_i^\prime W^2 e_j\right) \right] +o_p\left( \frac{1}{N\sqrt{\varepsilon }}\right) . \end{aligned}$$

Remark 2.4

An immediate consequence of Theorem 2.6 is that under Assumption E2, there exists a deterministic sequence \((z_N:N\ge 1)\) given by

$$\begin{aligned} z=\frac{1}{(N\varepsilon )^2\theta _i(\theta _i-\theta _j)}\mathrm{E}\left( e_i^\prime W^2e_j\right) , \end{aligned}$$

such that as \(N\rightarrow \infty \),

$$\begin{aligned} Z_{ij}=N\sqrt{\varepsilon }\left( e_j^\prime v-z\right) \end{aligned}$$
(2.14)

converges weakly to a normal distribution with mean zero, for all \(i\in {\mathcal {I}}\) and \(j\in \{1,\ldots ,k\}\setminus \{i\}\). Furthermore, the convergence holds jointly for all i and j satisfying the above. This, along with (2.7), implies that the collection

$$\begin{aligned} \left( Z_{ij}:i\in {\mathcal {I}}, j\in \{1,\ldots ,k\}\setminus \{i\}\right) \cup \left( \varepsilon ^{-1/2}\left( \lambda _i(A)-\mathrm{E}\left[ \lambda _i(A)\right] \right) :i\in \mathcal I\right) \end{aligned}$$

converges weakly, as \(N\rightarrow \infty \), to

$$\begin{aligned} \left( G_{ij}:i\in {\mathcal {I}}, j\in \{1,\ldots ,k\}\setminus \{i\}\right) \cup \left( G_i:i\in \mathcal I\right) \end{aligned}$$

which is a zero mean Gaussian vector in \({{\mathbb {R}}}^{k|{{\mathcal {I}}}|}\). The covariance matrix of \((G_i)\) is as in (2.8), and \(\mathrm{Cov}(G_{ij},G_{i^\prime \,j^\prime })\) and \(\mathrm{Cov}(G_{ij},G_{i^\prime })\) are not hard to calculate by proceeding as in (2.9).

3 Examples and Special Cases

3.1 The Rank One Case

Let us consider the special case of \(k=1\), that is,

$$\begin{aligned} f(x,y)=\theta r(x)r(y), \end{aligned}$$

for some \(\theta >0\), and a bounded Riemann integrable \(r:[0,1]\rightarrow [0,\infty )\) satisfying

$$\begin{aligned} \int _0^1r(x)^2\,dx=1. \end{aligned}$$

In this case, Theorem 2.3 implies that

$$\begin{aligned} \varepsilon ^{-1/2}\left( \lambda _1(A)-\mathrm{E}\left( \lambda _1(A)\right) \right) \Rightarrow G_1, \end{aligned}$$

as \(N\rightarrow \infty \), where

$$\begin{aligned} G_1\sim N\left( 0,\sigma ^2\right) , \end{aligned}$$

with

$$\begin{aligned} \sigma ^2=2\theta \left( \int _0^1r(x)^3\,dx\right) ^2-2\theta ^2\varepsilon _\infty \left( \int _0^1r(x)^4\,dx\right) ^2. \end{aligned}$$

If r is Lipschitz and \(\varepsilon _\infty =0\), then the claim of Theorem 2.4 boils down to

$$\begin{aligned} \mathrm{E}\left[ \lambda _1(A)\right] =\theta N\varepsilon e_1^\prime e_1+(N\varepsilon \theta )^{-1}\mathrm{E}\left( e_1^\prime W^2 e_1\right) +O\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) , \end{aligned}$$
(3.1)

where

$$\begin{aligned} e_1=N^{-1/2}\left[ r(1/N)\,r(2/N)\,\ldots r(1)\right] ^\prime . \end{aligned}$$

Lipschitz continuity of r implies that

$$\begin{aligned} e_1^\prime e_1=1+O\left( N^{-1}\right) , \end{aligned}$$

and hence (3.1) becomes

$$\begin{aligned} \mathrm{E}\left[ \lambda _1(A)\right] =\theta N\varepsilon +(N\varepsilon \theta )^{-1}\mathrm{E}\left( e_1^\prime W^2 e_1\right) +O\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$
(3.2)

Clearly,

$$\begin{aligned}&\mathrm{E}\left( e_1^\prime W^2 e_1\right) \\&\quad =\frac{1}{N}\sum _{i=1}^Nr\left( \frac{i}{N}\right) ^2\mathrm{E}\left[ W^2(i,i)\right] \\&\quad =\frac{1}{N}\sum _{i=1}^Nr\left( \frac{i}{N}\right) ^2\sum _{1\le j\le N,\,j\ne i}\varepsilon f\left( \frac{i}{N},\frac{j}{N}\right) \left( 1-\varepsilon f\left( \frac{i}{N},\frac{j}{N}\right) \right) \\&\quad =\theta \varepsilon N^{-1}\sum _{1\le i\ne j\le N}r\left( \frac{i}{N}\right) ^3r\left( \frac{j}{N}\right) +O\left( N^{-1}\varepsilon ^2\right) \\&\quad =N\theta \varepsilon \int _0^1r(x)^3\,dx\int _0^1r(y)\,dy+O(\varepsilon ). \end{aligned}$$

In conjunction with (3.2) this yields

$$\begin{aligned} \mathrm{E}\left[ \lambda _1(A)\right] =\theta N\varepsilon +\int _0^1r(x)^3\,dx\int _0^1r(y)\,dy+O\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$
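
The rank-one expansion above, together with the Gaussian fluctuation from Theorem 2.3, can be probed by a short simulation. The sketch below (with an arbitrary Lipschitz choice of r, small N, and \(\varepsilon _N\) substituted for \(\varepsilon _\infty \)) compares the empirical mean and variance of \(\lambda _1(A)\) with \(\theta N\varepsilon +\int r^3\int r\) and \(\varepsilon \sigma ^2\); the comparison is only indicative, since the Monte Carlo error and the \(O(\sqrt{\varepsilon }+(N\varepsilon )^{-1})\) term are not negligible at these sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
N, eps, theta, reps = 600, 0.1, 1.0, 300

x = np.arange(1, N + 1) / N
r = np.sqrt(3.0 / 7.0) * (1 + x)          # Lipschitz, with int_0^1 r(x)^2 dx = 1
P = eps * theta * np.outer(r, r)          # edge probabilities eps * f(i/N, j/N)
assert P.max() <= 1

lam1 = np.empty(reps)
for t in range(reps):
    A = np.triu((rng.random((N, N)) <= P).astype(float))
    A = A + A.T - np.diag(np.diag(A))
    lam1[t] = np.linalg.eigvalsh(A)[-1]

m1, m3, m4 = r.mean(), (r**3).mean(), (r**4).mean()          # Riemann sums for int r, int r^3, int r^4
print("mean:", lam1.mean(), "vs", theta * N * eps + m3 * m1)
sigma2 = 2 * theta * m3**2 - 2 * theta**2 * eps * m4**2      # sigma^2 with eps in place of eps_infinity
print("var :", lam1.var(), "vs", eps * sigma2)
```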

3.2 Stochastic Block Model

Another important example is the stochastic block model, defined as follows. Suppose that

$$\begin{aligned} f(x,y)=\sum _{i,j=1}^kp(i,j)\mathbf{1}_{B_i}(x)\mathbf{1}_{B_j}(y), \quad 0\le x,y\le 1, \end{aligned}$$

where p is a \(k\times k\) symmetric positive definite matrix, and \(B_1,\ldots ,B_k\) are disjoint Borel subsets of [0, 1] whose boundaries are sets of measure zero, that is, their indicators are Riemann integrable. We show below how to compute the eigenvalues and eigenfunctions of \(I_f\), the integral operator associated with f.

Let \(\beta _i\) denote the Lebesgue measure of \(B_i\), which we assume without loss of generality to be strictly positive. Rewrite

$$\begin{aligned} f(x,y)=\sum _{i,j=1}^k{\tilde{p}}(i,j)s_i(x)s_j(y), \end{aligned}$$

where

$$\begin{aligned} {\tilde{p}}(i,j)=p(i,j)\sqrt{\beta _i\beta _j}, \quad 1\le i,j\le k, \end{aligned}$$

and

$$\begin{aligned} s_i=\beta _i^{-1/2}\mathbf{1}_{B_i}, \quad 1\le i\le k. \end{aligned}$$

Thus, \(\{s_1,\ldots ,s_k\}\) is an orthonormal set in \(L^2[0,1]\). Let

$$\begin{aligned} {\tilde{p}}=U^\prime DU, \end{aligned}$$

be a spectral decomposition of \({\tilde{p}}\), where U is a \(k\times k\) orthogonal matrix, and

$$\begin{aligned} D=\mathrm{Diag}(\theta _1,\ldots ,\theta _k), \end{aligned}$$

for some \(\theta _1\ge \cdots \ge \theta _k>0\).

Define functions \(r_1,\ldots ,r_k\) by

$$\begin{aligned} \left[ \begin{matrix} r_1(x)\\ \vdots \\ r_k(x) \end{matrix} \right] =U \left[ \begin{matrix} s_1(x)\\ \vdots \\ s_k(x) \end{matrix} \right] ,\,x\in [0,1]. \end{aligned}$$

It is easy to see that \(r_1,\ldots ,r_k\) are orthonormal in \(L^2[0,1]\), and for \(0\le x,y\le 1,\)

$$\begin{aligned} f(x,y)&=\left[ s_1(x)\,\ldots \,s_k(x)\right] {\tilde{p}}\left[ s_1(y)\,\ldots \,s_k(y)\right] ^\prime \\&=\left[ r_1(x)\,\ldots \,r_k(x)\right] U{\tilde{p}} U^\prime \left[ r_1(y)\,\ldots \,r_k(y)\right] ^\prime \\&=\sum _{i=1}^k\theta _ir_i(x)r_i(y). \end{aligned}$$

Thus, \(\theta _1,\ldots ,\theta _k\) are the eigenvalues of \(I_f\), and \(r_1,\ldots ,r_k\) are the corresponding eigenfunctions.
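
In practice, the recipe above reduces the spectral analysis of \(I_f\) for a stochastic block model to an eigendecomposition of the \(k\times k\) matrix \({\tilde{p}}\). The following sketch carries this out for an arbitrary illustrative choice of p and of the block measures \(\beta _i\) (the helper r below, which evaluates the eigenfunctions, is likewise only for illustration).

```python
import numpy as np

# illustrative 3-block model: p symmetric positive definite, beta_i > 0 with sum 1
p = np.array([[3.0, 1.0, 0.5],
              [1.0, 2.0, 0.4],
              [0.5, 0.4, 1.5]])
beta = np.array([0.5, 0.3, 0.2])                 # Lebesgue measures of B_1, B_2, B_3
blocks = [(0.0, 0.5), (0.5, 0.8), (0.8, 1.0)]    # B_i = [a_i, b_i)

p_tilde = p * np.sqrt(np.outer(beta, beta))      # p_tilde(i,j) = p(i,j) sqrt(beta_i beta_j)
theta, Q = np.linalg.eigh(p_tilde)               # columns of Q are orthonormal eigenvectors
order = np.argsort(theta)[::-1]
theta, Q = theta[order], Q[:, order]             # theta_1 >= ... >= theta_k > 0
print("eigenvalues of I_f:", theta)

# eigenfunctions: r_m(x) = sum_i U(m,i) s_i(x) with s_i = beta_i^{-1/2} 1_{B_i};
# in the decomposition p_tilde = U' D U of the text, U = Q', so U(m,i) = Q[i,m]
def r(m, x):
    x = np.asarray(x, dtype=float)
    s = np.array([(x >= a) & (x < b) for (a, b) in blocks]) / np.sqrt(beta)[:, None]
    return Q[:, m] @ s

xs = np.array([0.1, 0.3, 0.6, 0.9])
print("r_1 at a few points:", r(0, xs))
```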

4 Estimates

In this section, we record a few estimates that will subsequently be used in the proofs. Since their proofs are routine, they are postponed to the Appendix. Let W be as defined in (1.3). Throughout this section, Assumptions E1 and F1 will be in force.

Lemma 4.1

There exist constants \(C_1\), \(C_2 > 0\) such that

$$\begin{aligned} P \left( \Vert W \Vert \ge 2 \sqrt{MN\varepsilon } + C_1 (N \varepsilon )^{1/4} ( \log N)^{\xi /4}\right) \le e^{-C_2(\log N)^{\xi /4}}, \end{aligned}$$
(4.1)

where \(M=\sup _{0\le x,y\le 1}f(x,y)\). Consequently,

$$\begin{aligned} \Vert W\Vert =O_{hp}\left( \sqrt{N\varepsilon }\right) . \end{aligned}$$
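
The bound of Lemma 4.1 is easy to visualize in simulation: across a range of sparsity levels, \(\Vert W\Vert /\sqrt{N\varepsilon }\) remains bounded, and in fact stays below \(2\sqrt{M}\) plus a lower order term. A rough sketch, with an arbitrary rank-two kernel and arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1200
x = np.arange(1, N + 1) / N
F = 1.0 + 0.9 * np.outer(2 * x - 1, 2 * x - 1)   # rank-two kernel with M = sup f = 1.9

for eps in [0.02, 0.05, 0.1]:
    A = np.triu((rng.random((N, N)) <= eps * F).astype(float))
    A = A + A.T - np.diag(np.diag(A))
    W = A - eps * F                                           # centred adjacency matrix
    norm_W = np.abs(np.linalg.eigvalsh(W))[[0, -1]].max()     # spectral norm of W
    print(f"eps={eps}: ||W||/sqrt(N eps) = {norm_W / np.sqrt(N * eps):.3f}, 2*sqrt(M) = {2 * np.sqrt(1.9):.3f}")
```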

The notations \(e_1\) and \(e_2\) introduced in the next lemma and used in the subsequent lemmas should not be confused with \(e_j\) defined in (2.6). Continuing to suppress ‘N’ in the subscript, let

$$\begin{aligned} L&=[\log N], \end{aligned}$$

where [x] is the largest integer less than or equal to x.

Lemma 4.2

There exists \(0<C_1<\infty \) such that if \(e_1\) and \(e_2\) are \(N\times 1\) vectors with each entry in \([-1/\sqrt{N},1/\sqrt{N}]\), then

$$\begin{aligned} \left| \mathrm{E}\left( e_1^\prime W^ne_2\right) \right| \le (C_1N\varepsilon )^{n/2}, \quad 2\le n\le L. \end{aligned}$$

Lemma 4.3

There exists \(\eta _1>1\) such that for \(e_1,e_2\) as in Lemma 4.2, it holds that

$$\begin{aligned}&\max _{2\le n\le L}P\left( \left| e_1^\prime W^ne_2-\mathrm{E}\left( e_1^\prime W^ne_2\right) \right| >N^{(n-1)/2}\varepsilon ^{n/2}(\log N)^{n\xi /4}\right) \nonumber \\&\quad =O\left( e^{-(\log N)^{\eta _1}}\right) , \end{aligned}$$
(4.2)

where \(\xi \) is as in (1.2). In addition,

$$\begin{aligned} e_1^\prime We_2=o_{hp}\left( N\varepsilon \right) . \end{aligned}$$
(4.3)

Lemma 4.4

If \(e_1,e_2\) are as in Lemma 4.2, then

$$\begin{aligned} \mathrm{Var}\left( e_1^\prime We_2\right) =O(\varepsilon ), \end{aligned}$$
(4.4)

and

$$\begin{aligned} \mathrm{E}\left( e_1^\prime W^3e_2\right) =O(N\varepsilon ). \end{aligned}$$
(4.5)

5 Proof of the Main Results

This section is devoted to the proofs of the main results. The section is split into several subsections, each containing the proof of one main result, for ease of reading. Unless mentioned otherwise, Assumptions E1 and F1 are made.

5.1 Proof of Theorem 2.1

We start by showing that Theorem 2.1 is a corollary of Lemma 4.1. At this point, it should be clarified that throughout this section, \(e_j\) will always be as defined in (2.6).

Proof of Theorem 2.1

For a fixed \(i\in \{1,\ldots ,k\}\), it follows that

$$\begin{aligned} \left| \lambda _i(A)-\lambda _i\left( \mathrm{E}(A)\right) \right| \le \Vert W\Vert =O_{hp}\left( (N\varepsilon )^{1/2}\right) , \end{aligned}$$

by Lemma 4.1. In order to complete the proof, it suffices to show that

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}\lambda _i\left( \mathrm{E}(A)\right) =\theta _i, \end{aligned}$$

which however follows from the observation that (2.2) implies that

$$\begin{aligned} \mathrm{E}(A)=N\varepsilon \sum _{j=1}^k\theta _je_je_j^\prime . \end{aligned}$$
(5.1)

This completes the proof. \(\square \)

5.2 Proof of Theorem 2.2

Proceeding towards the proof of Theorem 2.2, let us fix \(i\in {\mathcal {I}}\), once and for all, denote

$$\begin{aligned} \mu =\lambda _i(A), \end{aligned}$$

and let V be a \(k\times k\) real symmetric matrix, depending on N (which is suppressed in the notation), defined by

$$\begin{aligned} V(j,l)= {\left\{ \begin{array}{ll} N\varepsilon \sqrt{\theta _j\theta _l}\,e_j^\prime \left( I-\frac{1}{\mu }W\right) ^{-1}e_l,&{}\text {if }\Vert W\Vert <\mu ,\\ 0,&{}\text {else}, \end{array}\right. } \end{aligned}$$

for all \(1\le j,l\le k\). It should be noted that if \(\Vert W\Vert <\mu \), then \(I-W/\mu \) is invertible.

The proof of Theorem 2.2, which is the main content of this paper, involves several steps, and hence we sketch its outline below for the reader’s convenience.

(1) The first major step of the proof is to show that w.h.p., \(\mu \) exactly equals \(\lambda _i(V)\). This is done in Lemma 5.2. When \(i=k=1\), the matrix V is a scalar, and in that case the equation boils down to \(\mu =V\), which is a consequence of the resolvent equation. For higher values of k, the Gershgorin circle theorem is employed to obtain the desired claim. This step is one of the novelties of the present proof.

(2) The next step in the proof is to write \(\mu \) as the solution of an equation of the form

$$\begin{aligned} \mu =\lambda _i\left( \sum _{n=0}^L\mu ^{-n}Y_n\right) +Error, \end{aligned}$$

for suitable matrices \(Y_0,Y_1,\ldots ,Y_L\). This is done in Lemma 5.4.

(3) The third step is to replace \(Y_n\) by \(\mathrm{E}(Y_n)\) in the equation obtained in the previous step, for \(n\ge 2\). This is done in Lemma 5.5.

(4) Arguably the most important step in the proof is to obtain an equation of the form

$$\begin{aligned} \mu ={\bar{\mu }}+\mu ^{-1}\zeta +Error, \end{aligned}$$

for some deterministic \({\bar{\mu }}\) depending on N and a random \(\zeta \). Once again, this is achieved from the previous step with the help of the Gershgorin circle theorem and other linear algebraic tools. This is done in Lemma 5.8.

(5) The final step of the proof is to show that \({\bar{\mu }}\) of the above step can be replaced by \(\mathrm{E}(\mu )\).

Now let us proceed towards executing the above steps for proving Theorem 2.2. As the zeroth step, we show that \(V/N\varepsilon \) converges to \(\mathrm{Diag}(\theta _1,\ldots ,\theta _k)\), that is, the \(k\times k\) diagonal matrix with diagonal entries \(\theta _1,\ldots ,\theta _k\), w.h.p.

Lemma 5.1

As \(N\rightarrow \infty \),

$$\begin{aligned} V(j,l)=N\varepsilon \theta _j\left( \mathbf{1}(j=l)+o_{hp}(1)\right) ,1\le j,l\le k. \end{aligned}$$

Proof

For fixed \(1\le j,l\le k\), writing

$$\begin{aligned} \left( I-\frac{1}{\mu }W\right) ^{-1}=I+O_{hp}\left( \mu ^{-1}\Vert W\Vert \right) , \end{aligned}$$

we get that

$$\begin{aligned} V(j,l)=N\varepsilon \sqrt{\theta _j\theta _l}\left( e_j^\prime e_l+\frac{1}{\mu }O_{hp}(\Vert W\Vert )\right) . \end{aligned}$$

Since

$$\begin{aligned} \lim _{N\rightarrow \infty }e_j^\prime e_l=\mathbf{1}(j=l), \end{aligned}$$
(5.2)

and

$$\begin{aligned} \Vert W\Vert =o_{hp}(\mu ) \end{aligned}$$

by Lemma 4.1 and Theorem 2.1, the proof follows. \(\square \)

The next step, which is one of the main steps in the proof of Theorem 2.2, shows that the ith eigenvalues of A and V are exactly equal w.h.p.

Lemma 5.2

With high probability,

$$\begin{aligned} \mu =\lambda _i(V). \end{aligned}$$

The proof of the above lemma is based on the following fact, which is a direct consequence of the Gershgorin circle theorem; see Theorem 1.6, p. 8 of [33].

Fact 5.1

Suppose that U is an \(n\times n\) real symmetric matrix. Define

$$\begin{aligned} R_l=\sum _{1\le j\le n,\,j\ne l}|U(j,l)|,\,1\le l\le n. \end{aligned}$$

If for some \(1\le m\le n\) it holds that

$$\begin{aligned} U(m,m)+R_m<U(l,l)-R_l\quad \text {for all }1\le l\le m-1, \end{aligned}$$
(5.3)

and

$$\begin{aligned} U(m,m)-R_m>U(l,l)+R_l,\quad \text {for all }m+1\le l\le n, \end{aligned}$$
(5.4)

then

$$\begin{aligned} \bigl \{\lambda _1(U),\ldots ,\lambda _n(U)\bigr \}\setminus \left( \bigcup _{1\le l\le n,\,l\ne m}\left[ U(l,l)-R_l,U(l,l)+R_l\right] \right) =\bigl \{\lambda _m(U)\bigr \}. \end{aligned}$$

Remark 5.1

The assumptions (5.3) and (5.4) of Fact 5.1 mean that the Gershgorin disk containing the mth largest eigenvalue is disjoint from any other Gershgorin disk.
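
To make Fact 5.1 and Remark 5.1 concrete, the toy sketch below (with an arbitrary \(3\times 3\) symmetric matrix) computes the Gershgorin radii and verifies that each disjoint disk contains exactly one eigenvalue, in the order asserted by Fact 5.1.

```python
import numpy as np

# a symmetric matrix whose diagonal entries are well separated relative to the off-diagonal sums
U = np.array([[10.0, 0.3, 0.2],
              [ 0.3, 6.0, 0.4],
              [ 0.2, 0.4, 2.0]])

R = np.abs(U).sum(axis=1) - np.abs(np.diag(U))       # Gershgorin radii R_l
eig = np.sort(np.linalg.eigvalsh(U))[::-1]           # lambda_1(U) >= lambda_2(U) >= lambda_3(U)

for m in range(U.shape[0]):
    lo, hi = U[m, m] - R[m], U[m, m] + R[m]
    inside = [lam for lam in eig if lo <= lam <= hi]
    print(f"disk around U({m},{m}): [{lo:.2f}, {hi:.2f}] contains {inside}")
# since the disks are pairwise disjoint, the disk around the m-th largest diagonal entry
# contains exactly the m-th largest eigenvalue, as in Fact 5.1
```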

Proof of Lemma 5.2

The first step is to show that

$$\begin{aligned} \mu \in \bigl \{\lambda _1(V),\ldots ,\lambda _k(V)\bigr \}\text { w.h.p.} \end{aligned}$$
(5.5)

To that end, fix \(N\ge 1\) and a sample point for which \(\Vert W\Vert <\mu \). The following calculations are done for that fixed sample point.

Let v be an eigenvector of A, with norm 1, corresponding to \(\lambda _i(A)\). That is,

$$\begin{aligned} \mu v=Av=Wv+N\varepsilon \sum _{l=1}^k \theta _l(e_l^\prime v)e_l, \end{aligned}$$
(5.6)

by (5.1). Since \(\mu >\Vert W\Vert \), \(\mu I-W\) is invertible, and hence

$$\begin{aligned} v=N\varepsilon \sum _{l=1}^k\theta _l(e_l^\prime v)\left( \mu I-W\right) ^{-1}e_l. \end{aligned}$$
(5.7)

Fixing \(j\in \{1,\ldots ,k\}\) and premultiplying the above by \(\sqrt{\theta _j}\mu e_j^\prime \) yields

$$\begin{aligned} \mu \sqrt{\theta _j}(e_j^\prime v)=N\varepsilon \sum _{l=1}^k\sqrt{\theta _j}\theta _l(e_l^\prime v)e_j^\prime \left( I-\frac{1}{\mu }W\right) ^{-1}e_l=\sum _{l=1}^kV(j,l)\sqrt{\theta _l}(e_l^\prime v). \end{aligned}$$

As the above holds for all \(1\le j\le k\), this means that if

$$\begin{aligned} u=\left[ \sqrt{\theta _1}(e_1^\prime v)\ldots \sqrt{\theta _k}(e_k^\prime v)\right] ^\prime , \end{aligned}$$
(5.8)

then

$$\begin{aligned} Vu=\mu u. \end{aligned}$$
(5.9)

Recalling that in the above calculations a sample point is fixed such that \(\Vert W\Vert <\mu \), what we have shown, in other words, is that a vector u satisfying the above exists w.h.p.

In order to complete the proof of (5.5), it suffices to show that u is a non-null vector w.h.p. To that end, premultiply (5.6) by \(v^\prime \) to obtain that

$$\begin{aligned} \mu =v^\prime Wv+N\varepsilon \Vert u\Vert ^2. \end{aligned}$$

Dividing both sides by \(N\varepsilon \) and using Lemma 4.1 along with Theorem 2.1 implies that

$$\begin{aligned} \Vert u\Vert ^2=\theta _i+o_{hp}(1). \end{aligned}$$

Thus, u is a non-null vector w.h.p. From this and (5.9), (5.5) follows.

Lemma 5.1 shows that for all \(l\in \{1,\ldots ,i-1\}\),

$$\begin{aligned}&\left[ V(i,i)+\sum _{1\le j\le k,\,j\ne i}|V(i,j)|\right] -\left[ V(l,l)-\sum _{1\le j\le k,\,j\ne l}|V(l,j)|\right] \\&\quad =N\varepsilon \left( \theta _i-\theta _l\right) (1+o_{hp}(1)), \end{aligned}$$

as \(N\rightarrow \infty \). Since \(i\in {\mathcal {I}}\), \(\theta _i-\theta _l<0\), and hence

$$\begin{aligned} V(i,i)+\sum _{1\le j\le k,\,j\ne i}|V(i,j)|<V(l,l)-\sum _{1\le j\le k,\,j\ne l}|V(l,j)|\;\text {w.h.p.} \end{aligned}$$

A similar calculation shows that for \(l\in \{i+1,\ldots ,k\}\),

$$\begin{aligned} V(i,i)-\sum _{1\le j\le k,\,j\ne i}|V(i,j)|>V(l,l)+\sum _{1\le j\le k,\,j\ne l}|V(l,j)|\;\text {w.h.p.} \end{aligned}$$

In view of (5.5) and Fact 5.1, the proof would follow once it can be shown that for all \(l\in \{1,\ldots ,k\}\setminus \{i\}\),

$$\begin{aligned} \left| \mu -V(l,l)\right| >\sum _{1\le j\le k,\,j\ne l}|V(l,j)|\text { w.h.p.} \end{aligned}$$

This follows, once again, by dividing both sides by \(N\varepsilon \) and using Theorem 2.1 and Lemma 5.1. This completes the proof. \(\square \)
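
The conclusion of Lemma 5.2 can also be probed numerically: simulate A, set \(\mu =\lambda _1(A)\), form the \(k\times k\) matrix V, and check that \(\mu \) reappears among the eigenvalues of V. The sketch below does this for an arbitrary rank-two kernel (so \(k=2\), \(i=1\)); it is only an illustration of the statement, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
N, eps = 1000, 0.05

x = np.arange(1, N + 1) / N
r1, r2 = np.ones(N), np.sqrt(3.0) * (2 * x - 1)
theta = np.array([1.0, 0.3])
E = np.vstack([r1, r2]).T / np.sqrt(N)           # columns are e_1, e_2 as in (2.6)
F = theta[0] * np.outer(r1, r1) + theta[1] * np.outer(r2, r2)
P = eps * F

A = np.triu((rng.random((N, N)) <= P).astype(float))
A = A + A.T - np.diag(np.diag(A))
W = A - P

mu = np.linalg.eigvalsh(A)[-1]                   # mu = lambda_1(A)
assert np.abs(np.linalg.eigvalsh(W)).max() < mu  # ||W|| < mu, so I - W/mu is invertible
res = np.linalg.solve(mu * np.eye(N) - W, E)     # columns are (mu I - W)^{-1} e_l
V = N * eps * np.sqrt(np.outer(theta, theta)) * mu * (E.T @ res)   # since (I - W/mu)^{-1} = mu (mu I - W)^{-1}
V = (V + V.T) / 2                                # symmetrize (it is symmetric up to rounding)
print("mu              =", mu)
print("eigenvalues of V:", np.linalg.eigvalsh(V))   # the largest should coincide with mu
```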

The next step is to write

$$\begin{aligned} \left( I-\frac{1}{\mu }W\right) ^{-1}=\sum _{n=0}^\infty \mu ^{-n}W^n, \end{aligned}$$
(5.10)

which is possible because \(\Vert W\Vert <\mu \). Denote

$$\begin{aligned} Z_{j,l,n}=e_j^\prime W^n e_l,\quad 1\le j,\;l\le k,\;n\ge 0, \end{aligned}$$

which should not be confused with \(Z_{ij}\) defined in (2.14), and for \(n\ge 0\), let \(Y_n\) be a \(k\times k\) matrix with

$$\begin{aligned} Y_n(j,l)=\sqrt{\theta _j\theta _l}N\varepsilon Z_{j,l,n},\quad 1\le j,\;l\le k. \end{aligned}$$

The following bounds will be used several times.

Lemma 5.3

It holds that

$$\begin{aligned} \mathrm{E}\left( \Vert Y_1\Vert \right) =O\left( N\varepsilon ^{3/2}\right) , \end{aligned}$$

and

$$\begin{aligned} \Vert Y_1\Vert =o_{hp}\left( (N\varepsilon )^2\right) . \end{aligned}$$

Proof

Lemma 4.4 implies that

$$\begin{aligned} \mathrm{Var}\left( Z_{j,l,1}\right) =O(\varepsilon ),\quad 1\le j,l\le k. \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm{E}\Vert Y_1\Vert =O\left( N\varepsilon \sum _{j,l=1}^k\mathrm{E}|Z_{j,l,1}|\right) =O\left( N\varepsilon \sum _{j,l=1}^k\sqrt{\mathrm{Var}(Z_{j,l,1})}\right) =O\left( N\varepsilon ^{3/2}\right) , \end{aligned}$$

the second equality using the fact that \(Z_{j,l,1}\) has mean 0, so that \(\mathrm{E}|Z_{j,l,1}|\le \sqrt{\mathrm{Var}(Z_{j,l,1})}\). This proves the first claim. The second claim follows from (4.3) of Lemma 4.3. \(\square \)

The next step is to truncate the infinite sum in (5.10) to level L, where \(L=[\log N]\) as defined before.

Lemma 5.4

It holds that

$$\begin{aligned} \mu =\lambda _i\left( \sum _{n=0}^L\mu ^{-n}Y_n\right) +o_{hp}\left( \sqrt{\varepsilon }\right) . \end{aligned}$$

Proof

From the definition of V, it is immediate that for \(1\le j,l\le k\),

$$\begin{aligned} V(j,l)=N\varepsilon \sqrt{\theta _j\theta _l}\sum _{n=0}^\infty \mu ^{-n}e_j^\prime W^ne_l\,\mathbf{1}(\Vert W\Vert <\mu ), \end{aligned}$$

and hence

$$\begin{aligned} V=\mathbf{1}(\Vert W\Vert <\mu )\sum _{n=0}^\infty \mu ^{-n}Y_n. \end{aligned}$$

For the sake of notational simplicity, let us suppress \(\mathbf{1}(\Vert W\Vert <\mu )\). Therefore, with the implicit understanding that the sum is set as zero if \(\Vert W\Vert \ge \mu \), for the proof it suffices to check that

$$\begin{aligned} \left\| \sum _{n=L+1}^\infty \mu ^{-n}Y_n\right\| =o_{hp}(\sqrt{\varepsilon }). \end{aligned}$$
(5.11)

To that end, Theorem 2.1 and Lemma 4.1 imply that

$$\begin{aligned} \left\| \sum _{n=L+1}^\infty \mu ^{-n}Y_n\right\| \le \sum _{n=L+1}^\infty |\mu |^{-n}\Vert Y_n\Vert =O_{hp}\left( (N\varepsilon )^{-(L-1)/2}\right) . \end{aligned}$$

In order to prove (5.11), it suffices to show that as \(N\rightarrow \infty \),

$$\begin{aligned} -\log \varepsilon =o\left( (L-1)\log (N\varepsilon )\right) . \end{aligned}$$
(5.12)

To that end, recall (1.2) to argue that

$$\begin{aligned} N^{-1}=o(\varepsilon ) \end{aligned}$$
(5.13)

and

$$\begin{aligned} \log \log N=O(\log (N\varepsilon )). \end{aligned}$$
(5.14)

By (5.13), it follows that

$$\begin{aligned} -\log \varepsilon&=O\left( \log N\right) \\&=o\left( \log N\log \log N\right) \\&=o\left( (L-1)\log (N\varepsilon )\right) , \end{aligned}$$

the last line using (5.14). Therefore, (5.12) follows, which ensures (5.11), which in turn completes the proof. \(\square \)

In the next step, \(Y_n\) is replaced by its expectation for \(n\ge 2\).

Lemma 5.5

It holds that

$$\begin{aligned} \mu =\lambda _i\left( Y_0+\mu ^{-1}Y_1+\sum _{n=2}^L\mu ^{-n}\mathrm{E}(Y_n)\right) +o_{hp}\left( \sqrt{\varepsilon }\right) . \end{aligned}$$

Proof

In view of Theorem 2.1 and Lemma 5.4, all that has to be checked is

$$\begin{aligned} \sum _{n=2}^L(N\varepsilon )^{-n}\Vert Y_n-\mathrm{E}(Y_n)\Vert =o_{hp}(\sqrt{\varepsilon }). \end{aligned}$$
(5.15)

For that, invoke Lemma 4.3 to claim that

$$\begin{aligned}&\max _{2\le n\le L,\,1\le j,l\le k}P\left( \left| Z_{j,l,n}-\mathrm{E}(Z_{j,l,n})\right| >N^{(n-1)/2}\varepsilon ^{n/2}(\log N)^{n\xi /4}\right) \nonumber \\&\qquad =O\left( e^{-(\log N)^{\eta _1}}\right) , \end{aligned}$$
(5.16)

where \(\xi \) is as in (1.2).

Our next claim is that there exists \(C_2>0\) such that for N large,

$$\begin{aligned} \bigcap _{2\le n\le L,1\le j,l\le k}\left[ \left| Z_{j,l,n}-\mathrm{E}(Z_{j,l,n})\right| \le N^{(n-1)/2}\varepsilon ^{n/2}(\log N)^{n\xi /4}\right] \end{aligned}$$
(5.17)
$$\begin{aligned} \subset \left[ \sum _{n=2}^L(N\varepsilon )^{-n}\Vert Y_n-\mathrm{E}(Y_n)\Vert \le C_2\sqrt{\varepsilon }\left( (N\varepsilon )^{-1}(\log N)^\xi \right) ^{1/2}\right] . \end{aligned}$$

To see this, suppose that the event on the left hand side holds. Then, for fixed \(1\le j,l\le k\), and large N,

$$\begin{aligned}&\sum _{n=2}^L(N\varepsilon )^{-n}\left\| Y_n(j,l)-\mathrm{E}\left[ Y_n(j,l)\right] \right\| \\&\quad \le \theta _1 N\varepsilon \sum _{n=2}^L(N\varepsilon )^{-n}\left| Z_{j,l,n}-\mathrm{E}\left( Z_{j,l,n}\right) \right| \\&\quad \le \theta _1\sum _{n=2}^\infty (N\varepsilon )^{-(n-1)}N^{(n-1)/2}\varepsilon ^{n/2}(\log N)^{n\xi /4}\\&\quad =\left[ 1-(N\varepsilon )^{-1/2}(\log N)^{\xi /4}\right] ^{-1}\theta _1\sqrt{\varepsilon }(N\varepsilon )^{-1/2}(\log N)^{\xi /2}. \end{aligned}$$

Thus, (5.17) holds for some \(C_2>0\).

Combining (5.16) and (5.17), it follows that

$$\begin{aligned}&P\left( \sum _{n=2}^L(N\varepsilon )^{-n}\Vert Y_n-\mathrm{E}(Y_n)\Vert >C_2\sqrt{\varepsilon }\left( (N\varepsilon )^{-1}(\log N)^\xi \right) ^{1/2}\right) \\&\quad =O\left( \log N e^{-(\log N)^{\eta _1}}\right) \\&\quad =o\left( e^{-(\log N)^{(1+\eta _1)/2}}\right) . \end{aligned}$$

This, with the help of (1.2), establishes (5.15) from which the proof follows. \(\square \)

The goal of the next two lemmas is replacing \(\mu \) by a deterministic quantity in

$$\begin{aligned} \sum _{n=2}^L\mu ^{-n}\mathrm{E}(Y_n). \end{aligned}$$

Lemma 5.6

For N large, the deterministic equation

$$\begin{aligned} x=\lambda _i\left( \sum _{n=0}^Lx^{-n}\mathrm{E}(Y_n)\right) ,\,x>0, \end{aligned}$$
(5.18)

has a solution \({\tilde{\mu }}\) such that

$$\begin{aligned} 0<\liminf _{N\rightarrow \infty }(N\varepsilon )^{-1}{\tilde{\mu }}\le \limsup _{N\rightarrow \infty }(N\varepsilon )^{-1}{\tilde{\mu }}<\infty . \end{aligned}$$
(5.19)

Proof

Define a function

$$\begin{aligned} h:(0,\infty )\rightarrow {{\mathbb {R}}}, \end{aligned}$$

by

$$\begin{aligned} h(x)=\lambda _i\left( \sum _{n=0}^Lx^{-n}\mathrm{E}(Y_n)\right) . \end{aligned}$$

Our first claim is that for any fixed \(x>0\),

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}h\left( xN\varepsilon \right) =\theta _i. \end{aligned}$$
(5.20)

To that end, observe that since \(\mathrm{E}(Y_1)=0\),

$$\begin{aligned} h\left( xN\varepsilon \right) =\lambda _i\left( \mathrm{E}(Y_0)+\sum _{n=2}^L(xN\varepsilon )^{-n}\mathrm{E}(Y_n)\right) . \end{aligned}$$

Recalling that

$$\begin{aligned} Y_0(j,l)=N\varepsilon \sqrt{\theta _j\theta _l}\,e_j^\prime e_l,\quad 1\le j,\;l\le k, \end{aligned}$$

it follows by (5.2) that

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}\mathrm{E}(Y_0)=\mathrm{Diag}(\theta _1,\ldots ,\theta _k). \end{aligned}$$
(5.21)

Lemma 4.2 implies that

$$\begin{aligned} \mathrm{E}(Z_{j,l,n})\le \left( O(N\varepsilon )\right) ^{n/2}, \end{aligned}$$

uniformly for \(2\le n\le L\), and hence there exists \(0<C_3<\infty \) with

$$\begin{aligned} \Vert \mathrm{E}(Y_n)\Vert \le (C_3N\varepsilon )^{n/2+1}, \quad 2 \le n\le L. \end{aligned}$$
(5.22)

Therefore,

$$\begin{aligned} \left\| \sum _{n=2}^L(xN\varepsilon )^{-n}\mathrm{E}(Y_n)\right\| \le \sum _{n=2} ^\infty (xN\varepsilon )^{-n}(C_3N\varepsilon )^{n/2+1}\rightarrow C_3^2x^{-2}, \end{aligned}$$

as \(N\rightarrow \infty \). With the help of (5.21), this implies that

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}\left( \sum _{n=0}^L(xN\varepsilon )^{-n}\mathrm{E}(Y_n)\right) =\mathrm{Diag}(\theta _1,\ldots ,\theta _k), \end{aligned}$$

and hence (5.20) follows. Consequently, for a fixed \(0<\delta <\theta _i\),

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}\left[ N\varepsilon (\theta _i+\delta )-h\left( (\theta _i+\delta )N\varepsilon \right) \right] =\delta , \end{aligned}$$

and thus, for large N,

$$\begin{aligned} N\varepsilon (\theta _i+\delta )>h\left( (\theta _i+\delta )N\varepsilon \right) . \end{aligned}$$

Similarly, again for large N,

$$\begin{aligned} N\varepsilon (\theta _i-\delta )<h\left( (\theta _i-\delta )N\varepsilon \right) . \end{aligned}$$

Hence, for N large, (5.18) has a solution \({\tilde{\mu }}\) in \([(N\varepsilon )(\theta _i-\delta ),(N\varepsilon )(\theta _i+\delta )]\), which trivially satisfies (5.19). Hence the proof. \(\square \)

Lemma 5.7

If \({\tilde{\mu }}\) is as in Lemma 5.6, then

$$\begin{aligned} \mu -{\tilde{\mu }}=O_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$

Proof

Lemmas 5.5 and 5.6 imply that

$$\begin{aligned}&|\mu -{\tilde{\mu }}|\\&\quad =\left| \lambda _i\left( Y_0+\mu ^{-1}Y_1+\sum _{n=2}^L\mu ^{-n}\mathrm{E}(Y_n)\right) -\lambda _i\left( \sum _{n=0}^L{\tilde{\mu }}^{-n}\mathrm{E}(Y_n)\right) \right| +o_{hp}(\sqrt{\varepsilon })\\&\quad \le \Vert \mu ^{-1}Y_1\Vert +|\mu -{\tilde{\mu }}|\sum _{n=2}^L\mu ^{-n}{\tilde{\mu }}^{-n}\Vert E(Y_n)\Vert \sum _{j=0}^{n-1}\mu ^j{\tilde{\mu }}^{n-1-j}+o_{hp}(\sqrt{\varepsilon })\\&\quad =|\mu -{\tilde{\mu }}|\sum _{n=2}^L\mu ^{-n}{\tilde{\mu }}^{-n}\Vert E(Y_n)\Vert \sum _{j=0}^{n-1}\mu ^j{\tilde{\mu }}^{n-1-j}+O_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$

Thus,

$$\begin{aligned} |\mu -{\tilde{\mu }}|\left[ 1-\sum _{n=2}^L\mu ^{-n}{\tilde{\mu }}^{-n}\Vert E(Y_n)\Vert \sum _{j=0}^{n-1}\mu ^j{\tilde{\mu }}^{n-1-j}\right] \le O_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$
(5.23)

Equations (5.19) and (5.22) imply that

$$\begin{aligned} \left| \sum _{n=2}^L\mu ^{-n}{\tilde{\mu }}^{-n}\Vert E(Y_n)\Vert \sum _{j=0}^{n-1}\mu ^j{\tilde{\mu }}^{n-1-j}\right|&=O_{hp}\left( \sum _{n=2}^\infty n(N\varepsilon )^{-(n+1)}(C_3N\varepsilon )^{n/2+1}\right) \nonumber \\&=O_{hp}\left( (N\varepsilon )^{-1}\right) \nonumber \\&=o_{hp}(1),\,N\rightarrow \infty . \end{aligned}$$
(5.24)

This completes the proof with the help of (5.23). \(\square \)

The next lemma is arguably the most important step in the proof of Theorem 2.2, the other major step being Lemma 5.2.

Lemma 5.8

There exists a deterministic \({\bar{\mu }}\), which depends on N, such that

$$\begin{aligned} \mu ={\bar{\mu }}+\mu ^{-1}Y_1(i,i)+o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$

Proof

Define a \(k\times k\) deterministic matrix

$$\begin{aligned} X=\sum _{n=0}^L{\tilde{\mu }}^{-n}\mathrm{E}(Y_n), \end{aligned}$$

which, as usual, depends on N. Lemma 5.7 and (5.24) imply that

$$\begin{aligned} \left\| X-\sum _{n=0}^L\mu ^{-n}\mathrm{E}(Y_n)\right\|&\le |\mu -{\tilde{\mu }}|\sum _{n=2}^L\mu ^{-n}{\tilde{\mu }}^{-n}\Vert \mathrm{E}(Y_n)\Vert \sum _{j=0}^{n-1}\mu ^j{\tilde{\mu }}^{n-1-j}\\&=o_{hp}\left( |\mu -{\tilde{\mu }}|\right) \\&=o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$

By Lemma 5.5 it follows that

$$\begin{aligned} \mu =\lambda _i\left( \mu ^{-1}Y_1+X\right) +o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$
(5.25)

Let

$$\begin{aligned} H&=X+\mu ^{-1}Y_1-\left( X(i,i)+\mu ^{-1}Y_1(i,i)\right) I, \\ M&=X-X(i,i)I, \end{aligned}$$

and

$$\begin{aligned} {\bar{\mu }}=\lambda _i(X). \end{aligned}$$

Clearly,

$$\begin{aligned} \lambda _i\left( \mu ^{-1}Y_1+X\right) =X(i,i)+\mu ^{-1}Y_1(i,i)+\lambda _i(H)={\bar{\mu }}-\lambda _i(M)+\mu ^{-1}Y_1(i,i)+\lambda _i(H). \end{aligned}$$

Thus, the proof would follow with the aid of (5.25) if it can be shown that

$$\begin{aligned} \lambda _i(H)-\lambda _i(M)=o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert \right) . \end{aligned}$$
(5.26)

If \(k=1\), then \(i=1\) and hence \(H=M=0\). Thus, the above is a tautology in that case. Therefore, assume without loss of generality that \(k\ge 2\).

Proceeding towards proving (5.26) when \(k\ge 2\), set

$$\begin{aligned} U_1=(N\varepsilon )^{-1}M, \end{aligned}$$
(5.27)

and

$$\begin{aligned} U_2=(N\varepsilon )^{-1}H. \end{aligned}$$
(5.28)

The main idea in the proof of (5.26) is to observe that the eigenvector of \(U_1\) corresponding to \(\lambda _i(U_1)\) is the same as that of M corresponding to \(\lambda _i(M)\), and likewise for \(U_2\) and H. Hence, the first step is to use this to get a bound on the difference between the eigenvectors in terms of \(\Vert U_1-U_2\Vert \).

An important observation that will be used later is that

$$\begin{aligned} \Vert U_1-U_2\Vert =O_{hp}\left( (N\varepsilon )^{-2}\Vert Y_1\Vert \right) . \end{aligned}$$
(5.29)

The second claim of Lemma 5.3 implies that the right hand side above is \(o_{hp}(1)\). The same implies that for \(m=1,2\) and \(1\le j,l\le k\),

$$\begin{aligned} U_m(j,l)=(\theta _j-\theta _i)\mathbf{1}(j=l)+o_{hp}(1), \quad N\rightarrow \infty . \end{aligned}$$
(5.30)

In other words, as \(N\rightarrow \infty \), \(U_1\) and \(U_2\) converge to \(\mathrm{Diag}(\theta _1-\theta _i,\ldots ,\theta _k-\theta _i)\) w.h.p. Therefore,

$$\begin{aligned} \lambda _i(U_m)=o_{hp}(1),\quad m=1,2. \end{aligned}$$
(5.31)

Let \({\tilde{U}}_m\), for \(m=1,2,\) be the \((k-1)\times (k-1)\) matrix (recall that \(k\ge 2\)) obtained by deleting the ith row and the ith column of \(U_m\), and let \({\tilde{u}}_m\) be the \((k-1)\times 1\) vector obtained from the ith column of \(U_m\) by deleting its ith entry. It is worth recording, for possible future use, that

$$\begin{aligned} \Vert {\tilde{u}}_m\Vert =o_{hp}(1),\quad m=1,2, \end{aligned}$$
(5.32)

which follows from (5.30), and that

$$\begin{aligned} \Vert {\tilde{u}}_1-\tilde{u}_2\Vert =O_{hp}\left( (N\varepsilon )^{-2}\Vert Y_1\Vert \right) , \end{aligned}$$
(5.33)

which follows from (5.29).

Equations (5.30) and (5.31) imply that \({\tilde{U}}_m-\lambda _i(U_m)I_{k-1}\) converges w.h.p. to

$$\begin{aligned} \mathrm{Diag}(\theta _1-\theta _i,\ldots ,\theta _{i-1}-\theta _i,\theta _{i+1}-\theta _i,\ldots ,\theta _k-\theta _i). \end{aligned}$$

Since \(i\in {\mathcal {I}}\), the above matrix is invertible. Fix \(\delta >0\) such that every matrix in the closed \(\delta \)-neighborhood \(B_\delta \), in the sense of operator norm, of the above matrix is invertible. Let

$$\begin{aligned} C_4=\sup _{E\in B_\delta }\Vert E^{-1}\Vert . \end{aligned}$$
(5.34)

Then, \(C_4<\infty \). Besides, there exists \(C_5<\infty \) satisfying

$$\begin{aligned} \left\| E_1^{-1}-E_2^{-1}\right\| \le C_5\Vert E_1-E_2\Vert , \quad E_1,E_2\in B_\delta . \end{aligned}$$
(5.35)

Fix \(N\ge 1\) and a sample point such that \(\tilde{U}_m-\lambda _i(U_m)I_{k-1}\) belongs to \(B_\delta \). Then, it is invertible. Define a \((k-1)\times 1\) vector

$$\begin{aligned} {\tilde{v}}_m=-\left[ {\tilde{U}}_m-\lambda _i(U_m)I_{k-1}\right] ^{-1}\tilde{u}_m, \quad m=1,2, \end{aligned}$$

and a \(k\times 1\) vector

$$\begin{aligned} v_m=\left[ {\tilde{v}}_m(1),\ldots ,{\tilde{v}}_m(i-1),\,1,\,\tilde{v}_m(i),\ldots ,{\tilde{v}}_m(k-1)\right] ^\prime , \quad m=1,2. \end{aligned}$$

It is immediate that

$$\begin{aligned} \Vert {\tilde{v}}_m\Vert \le C_4\Vert {\tilde{u}}_m\Vert , \quad m=1,2. \end{aligned}$$
(5.36)

Our next claim is that

$$\begin{aligned} U_mv_m=\lambda _i(U_m)v_m, \quad m=1,2. \end{aligned}$$
(5.37)

This claim is equivalent to

$$\begin{aligned} \left[ U_m-\lambda _i(U_m)I_k\right] v_m=0. \end{aligned}$$
(5.38)

Let \({\bar{U}}_m\) be the \((k-1)\times k\) matrix obtained by deleting the ith row of \(U_m-\lambda _i(U_m)I_k\). Since the latter matrix is singular, and \({\tilde{U}}_m-\lambda _i(U_m)I_{k-1}\) is invertible, it follows that the ith row of \(U_m-\lambda _i(U_m)I_k\) lies in the row space of \({\bar{U}}_m\); indeed, both matrices have rank \(k-1\), and the row space of \({\bar{U}}_m\) is contained in that of \(U_m-\lambda _i(U_m)I_k\). In other words, the row spaces of \(U_m-\lambda _i(U_m)I_k\) and \({\bar{U}}_m\) are the same, and hence so are their null spaces. Thus, (5.38) is equivalent to

$$\begin{aligned} {\bar{U}}_mv_m=0. \end{aligned}$$

To see the above, observe that the ith column of \({\bar{U}}_m\) is \({\tilde{u}}_m\), and hence we can partition

$$\begin{aligned} {\bar{U}}_m=\left[ {\bar{U}}_{m1}\,{\tilde{u}}_m\,{\bar{U}}_{m2}\right] , \quad \end{aligned}$$

where \({\bar{U}}_{m1}\) and \({\bar{U}}_{m2}\) are of order \((k-1)\times (i-1)\) and \((k-1)\times (k-i)\), respectively. Furthermore,

$$\begin{aligned} \left[ {\bar{U}}_{m1}\,{\bar{U}}_{m2}\right] =\tilde{U}_m-\lambda _i(U_m)I_{k-1}. \end{aligned}$$

Therefore,

$$\begin{aligned} {\bar{U}}_mv_m={\tilde{u}}_m+\left[ {\bar{U}}_{m1}\,{\bar{U}}_{m2}\right] \tilde{v}_m={\tilde{u}}_m+\left( {\tilde{U}}_m-\lambda _i(U_m)I_{k-1}\right) \tilde{v}_m=0. \end{aligned}$$

Hence, (5.38) follows, which proves (5.37).

Next, we note

$$\begin{aligned} \Vert v_1-v_2\Vert&=\Vert {\tilde{v}}_1-{\tilde{v}}_2\Vert \le \Vert ({\tilde{U}}_1-\lambda _i(U_1)I_{k-1})^{-1}\Vert \Vert {\tilde{u}}_1-{\tilde{u}}_2\Vert \\&\quad +\Vert ({\tilde{U}}_1-\lambda _i(U_1)I_{k-1})^{-1}-({\tilde{U}}_2-\lambda _i(U_2)I_{k-1})^{-1}\Vert \Vert {\tilde{u}}_2\Vert \\&\quad \le C_4\Vert {\tilde{u}}_1-{\tilde{u}}_2\Vert +C_5\Vert (\tilde{U}_1-\lambda _i(U_1)I_{k-1})-(\tilde{U}_2-\lambda _i(U_2)I_{k-1})\Vert \Vert {\tilde{u}}_2\Vert , \end{aligned}$$

\(C_4\) and \(C_5\) being as in (5.34) and (5.35), respectively. Recalling that the above calculation was done on an event of high probability, what we have proven, with the help of (5.29) and (5.33), is that

$$\begin{aligned} \Vert v_1-v_2\Vert =O_{hp}\left( (N\varepsilon )^{-2}\Vert Y_1\Vert \right) . \end{aligned}$$

Furthermore, (5.32) and (5.36) imply that

$$\begin{aligned} \Vert {\tilde{v}}_m\Vert =o_{hp}(1). \end{aligned}$$

Finally, noting that

$$\begin{aligned} U_m(i,i)=0, \quad m=1,2, \end{aligned}$$

and that

$$\begin{aligned} v_m(i)=1, \quad m=1,2, \end{aligned}$$

it follows, by reading off the ith coordinate of the eigenvalue equation (5.37) for \(m=1,2\) and subtracting, that

$$\begin{aligned} \left| \lambda _i(U_1)-\lambda _i(U_2)\right|&=\left| \sum _{1\le j\le k,\,j\ne i}U_1(i,j)v_1(j)-\sum _{1\le j\le k,\,j\ne i}U_2(i,j)v_2(j)\right| \\&\le \sum _{1\le j\le k,\,j\ne i}|U_1(i,j)||v_1(j)-v_2(j)| \\&\quad +\sum _{1\le j\le k,\,j\ne i}|U_1(i,j)-U_2(i,j)|\,|v_2(j)|\\&=O_{hp}\left( \Vert {\tilde{u}}_1\Vert \Vert v_1-v_2\Vert +\Vert U_1-U_2\Vert \Vert {\tilde{v}}_2\Vert \right) \\&=o_{hp}\left( (N\varepsilon )^{-2}\Vert Y_1\Vert \right) . \end{aligned}$$

Recalling (5.27) and (5.28), (5.26) follows, which completes the proof in conjunction with (5.25). \(\square \)

Now, we are in a position to prove Theorem 2.2.

Proof of Theorem 2.2

Recalling that

$$\begin{aligned} Y_1(i,i)=\theta _iN\varepsilon \,e_i^\prime We_i, \end{aligned}$$

it suffices to show that

$$\begin{aligned} \mu -\mathrm{E}(\mu )=\mu ^{-1}Y_1(i,i)+o_p(\sqrt{\varepsilon }). \end{aligned}$$
(5.39)

Lemma 5.8 implies that

$$\begin{aligned} \mu -{\bar{\mu }}=\mu ^{-1}Y_1(i,i)+o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) =O_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) , \end{aligned}$$
(5.40)

a consequence of which, combined with Lemma 5.3, is that

$$\begin{aligned} \lim _{N\rightarrow \infty }(N\varepsilon )^{-1}{\bar{\mu }}=\theta _i. \end{aligned}$$
(5.41)

Thus,

$$\begin{aligned} \left| \frac{1}{{\bar{\mu }}}Y_1(i,i)-\frac{1}{\mu }Y_1(i,i)\right|&=O_{hp}\left( (N\varepsilon )^{-2}|\mu -{\bar{\mu }}|\Vert Y_1\Vert \right) \nonumber \\&=o_{hp}\left( |\mu -{\bar{\mu }}|\right) \nonumber \\&=o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) \nonumber \\&=o_p(\sqrt{\varepsilon }), \end{aligned}$$
(5.42)

where Lemma 5.3 yields the second line, the third line follows from (5.40), and the last line uses the fact that

$$\begin{aligned} \Vert Y_1\Vert =O_p(N\varepsilon ^{3/2}), \end{aligned}$$
(5.43)

which is also a consequence of the same lemma. Using Lemma 5.8 once again, we get that

$$\begin{aligned} \mu ={\bar{\mu }}+\frac{1}{{\bar{\mu }}}Y_1(i,i)+o_{hp}\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) . \end{aligned}$$
(5.44)

Let

$$\begin{aligned} R=\mu -{\bar{\mu }}-\frac{1}{{\bar{\mu }}}Y_1(i,i). \end{aligned}$$

Clearly,

$$\begin{aligned} \mathrm{E}(R)=\mathrm{E}(\mu )-{\bar{\mu }}, \end{aligned}$$

and (5.44) implies that for every \(\delta >0\) there exists \(\eta >1\) with

$$\begin{aligned} \mathrm{E}|R| \le \delta (\sqrt{\varepsilon }+(N\varepsilon )^{-1}\mathrm{E}\Vert Y_1\Vert )+\mathrm{E}^{1/2}\left( \mu -{\bar{\mu }}-\frac{1}{{\bar{\mu }}}Y_1(i,i)\right) ^2O(e^{-(\log N)^\eta }). \end{aligned}$$
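
Indeed, if G denotes the high-probability event on which the \(o_{hp}\) term in (5.44) is at most \(\delta \left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\Vert Y_1\Vert \right) \), then the Cauchy–Schwarz inequality gives, schematically,

$$\begin{aligned} \mathrm{E}|R|\le \mathrm{E}\left( |R|\mathbf{1}_G\right) +\mathrm{E}^{1/2}\left( R^2\right) P(G^c)^{1/2}\le \delta \left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\mathrm{E}\Vert Y_1\Vert \right) +\mathrm{E}^{1/2}\left( R^2\right) P(G^c)^{1/2}, \end{aligned}$$

the tail bound on \(P(G^c)\) that produces the \(O(e^{-(\log N)^\eta })\) factor being supplied by the definition of \(o_{hp}\).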

Lemma 5.3 implies that

$$\begin{aligned} \mathrm{E}|R|\le o(\sqrt{\varepsilon })+\mathrm{E}^{1/2}\left( \mu -{\bar{\mu }}-\frac{1}{{\bar{\mu }}}Y_1(i,i)\right) ^2O(e^{-(\log N)^\eta }). \end{aligned}$$

Next, (5.41) and the fact that \(|\mu |\le N^2\) a.s. imply that

$$\begin{aligned} \mathrm{E}^{1/2}\left( \mu -{\bar{\mu }}-\frac{1}{{\bar{\mu }}}Y_1(i,i)\right) ^2 =O(N^2)=o(\varepsilon ^{1/2}N^3)=o(\varepsilon ^{1/2}e^{(\log N)^\eta }). \end{aligned}$$

Thus,

$$\begin{aligned} \mathrm{E}|R|=o(\sqrt{\varepsilon }), \end{aligned}$$

and hence

$$\begin{aligned} \mathrm{E}(\mu )={\bar{\mu }}+o(\sqrt{\varepsilon }). \end{aligned}$$

This, in view of (5.44), implies that

$$\begin{aligned} \mu =\mathrm{E}(\mu )+\frac{1}{{\bar{\mu }}}Y_1(i,i)+o_p\left( (N\varepsilon )^{-1}\Vert Y_1\Vert +\sqrt{\varepsilon }\right) =\mathrm{E}(\mu )+\frac{1}{{\bar{\mu }}}Y_1(i,i)+o_p\left( \sqrt{\varepsilon }\right) , \end{aligned}$$

the second equality following from (5.43). This establishes (5.39) with the help of (5.42), and hence completes the proof. \(\square \)

5.3 Proof of Theorem 2.3

Theorems 2.1 and 2.2 establish Theorem 2.3 with the help of (2.10).

5.4 Proof of Theorem 2.4

Now we shall proceed toward proving Theorem 2.4. For the rest of this section, that is, this subsection and the subsequent two, Assumption F2 holds. In other words, \(r_1,\ldots ,r_k\) are assumed to be Lipschitz continuous and hence so is f.

The following lemma essentially proves Theorem 2.4.

Lemma 5.9

Under Assumptions E1 and F2,

$$\begin{aligned} \mu =\lambda _i\left( Y_0+(N\varepsilon \theta _i)^{-2}\mathrm{E}(Y_2)\right) +O_p\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$

Proof

Lemma 5.5 implies that

$$\begin{aligned} \mu =\lambda _i\left( \sum _{n=0}^3\mu ^{-n}\mathrm{E}(Y_n)\right) +O_p\left( \mu ^{-1}\Vert Y_1\Vert +\sum _{n=4}^L\mu ^{-n}\Vert \mathrm{E}(Y_n)\Vert \right) +o_p(\sqrt{\varepsilon }). \end{aligned}$$

Equation (5.43) implies that

$$\begin{aligned} \mu =\lambda _i\left( \sum _{n=0}^3\mu ^{-n}\mathrm{E}(Y_n)\right) +O_p\left( \sqrt{\varepsilon }+\sum _{n=4}^L\mu ^{-n}\Vert \mathrm{E}(Y_n)\Vert \right) . \end{aligned}$$

From (5.22), it follows that

$$\begin{aligned} \sum _{n=4}^L\mu ^{-n}\Vert \mathrm{E}(Y_n)\Vert =O_p\left( (N\varepsilon )^{-1}\right) , \end{aligned}$$

and hence

$$\begin{aligned} \mu =\lambda _i\left( \sum _{n=0}^3\mu ^{-n}\mathrm{E}(Y_n)\right) +O_p\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$
(5.45)

Lemma 4.4, in particular (4.5) therein, implies that

$$\begin{aligned} \Vert \mathrm{E}(Y_3)\Vert =O\left( (N\varepsilon )^2\right) , \end{aligned}$$

and hence

$$\begin{aligned} \mu ^{-3}\Vert \mathrm{E}(Y_3)\Vert =O_p\left( (N\varepsilon )^{-1}\right) . \end{aligned}$$

This, in conjunction with (5.45), implies that

$$\begin{aligned} \mu =\lambda _i\left( Y_0+\mu ^{-2}\mathrm{E}(Y_2)\right) +O_p\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$
(5.46)

An immediate consequence of the above and (5.22) is that

$$\begin{aligned} \mu =\lambda _i(Y_0)+O_p(1). \end{aligned}$$
(5.47)

Applying Fact 5.1 as in the proof of Lemma 5.2, it can be shown that

$$\begin{aligned} \left| \lambda _i(Y_0)-Y_0(i,i)\right| \le \sum _{1\le j\le k,\,j\ne i}|Y_0(i,j)|. \end{aligned}$$
(5.48)

Since \(r_i\) and \(r_j\) are Lipschitz functions, it holds that

$$\begin{aligned} e_i^\prime e_j=\mathbf{1}(i=j)+O\left( N^{-1}\right) . \end{aligned}$$
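
Indeed, if \(e_j\) denotes the unit vector proportional to \(\left( r_j(l/N)\right) _{l=1}^N\) and \(r_1,\ldots ,r_k\) are orthonormal in \(L^2[0,1]\), then the Lipschitz property of the \(r_j\) gives the Riemann-sum estimate

$$\begin{aligned} \frac{1}{N}\sum _{l=1}^Nr_i\left( \frac{l}{N}\right) r_j\left( \frac{l}{N}\right) =\int _0^1r_i(x)r_j(x)\,dx+O\left( N^{-1}\right) =\mathbf{1}(i=j)+O\left( N^{-1}\right) , \end{aligned}$$

and passing to the normalized vectors only introduces a further multiplicative \(1+O(N^{-1})\) factor.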

Hence, it follows that

$$\begin{aligned} Y_0(i,i)=N\varepsilon \left( \theta _i+O(N^{-1})\right) =N\varepsilon \theta _i+O(\varepsilon ), \end{aligned}$$

and similarly,

$$\begin{aligned} Y_0(i,j)=O(\varepsilon ), \quad j\ne i. \end{aligned}$$

Combining these findings with (5.48) yields that

$$\begin{aligned} \lambda _i(Y_0)=N\varepsilon \theta _i+O(\varepsilon ). \end{aligned}$$
(5.49)

Equations (5.47) and (5.49) together imply that

$$\begin{aligned} \mu =N\varepsilon \theta _i+O_p(1). \end{aligned}$$
(5.50)

Therefore,

$$\begin{aligned}&\left\| \mu ^{-2}\mathrm{E}(Y_2)-(N\varepsilon \theta _i)^{-2}\mathrm{E}(Y_2)\right\| \\&\quad =O_p\left( (N\varepsilon )^{-3}\Vert \mathrm{E}(Y_2)\Vert \right) \\&\quad =O_p\left( (N\varepsilon )^{-1}\right) . \end{aligned}$$
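
The first equality above can be seen from (5.50): since \(|\mu -N\varepsilon \theta _i|=O_p(1)\) and \(|\mu |\) is of exact order \(N\varepsilon \) w.h.p.,

$$\begin{aligned} \left| \mu ^{-2}-(N\varepsilon \theta _i)^{-2}\right| =\frac{\left| \mu -N\varepsilon \theta _i\right| \left| \mu +N\varepsilon \theta _i\right| }{\mu ^2(N\varepsilon \theta _i)^2}=O_p\left( \frac{N\varepsilon }{(N\varepsilon )^4}\right) =O_p\left( (N\varepsilon )^{-3}\right) . \end{aligned}$$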

This in conjunction with (5.46) completes the proof. \(\square \)

Theorem 2.4 is a simple corollary of the above lemma, as shown below.

Proof of Theorem 2.4

A consequence of Theorem 2.2 is that

$$\begin{aligned} \mu -\mathrm{E}(\mu )=O_p(\sqrt{\varepsilon }). \end{aligned}$$

The claim of Lemma 5.9 is equivalent to

$$\begin{aligned} \lambda _i(B)-\mu =O_p\left( \sqrt{\varepsilon }+(N\varepsilon )^{-1}\right) . \end{aligned}$$

The proof follows by adding the two equations, and noting that B is a deterministic matrix. \(\square \)

5.5 Proof of Theorem 2.5

Next we proceed towards the proof of Theorem 2.5, for which the following lemma will be useful.

Lemma 5.10

Under Assumptions E1 and F2, as \(N\rightarrow \infty \),

$$\begin{aligned} e_j^\prime \left( I-\mu ^{-1}W\right) ^{-n}e_l=\mathbf{1}(j=l)+O_p\left( (N\varepsilon )^{-1}\right) , \quad 1\le j,l\le k,\;n=1,2. \end{aligned}$$

Proof

For a fixed \(n=1,2,\) expand

$$\begin{aligned} \left( I-\mu ^{-1}W\right) ^{-n}= I+n\mu ^{-1}W+O_p\left( \mu ^{-2}\Vert W\Vert ^2\right) . \end{aligned}$$
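
Entrywise, this expansion combines with the Lipschitz estimate \(e_j^\prime e_l=\mathbf{1}(j=l)+O(N^{-1})\) from the proof of Lemma 5.9, with (5.50), and with the bounds \(e_j^\prime We_l=O_p(1)\) and \(\Vert W\Vert ^2=O_p(N\varepsilon )\) underlying the expansions in this section, to give, schematically,

$$\begin{aligned} e_j^\prime \left( I-\mu ^{-1}W\right) ^{-n}e_l=e_j^\prime e_l+n\mu ^{-1}e_j^\prime We_l+O_p\left( \mu ^{-2}\Vert W\Vert ^2\right) =\mathbf{1}(j=l)+O_p\left( (N\varepsilon )^{-1}\right) . \end{aligned}$$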

The required bounds on the individual terms are obtained by proceeding along similar lines as in the proof of Lemma 5.9. \(\square \)

Now we are in a position to prove Theorem 2.5.

Proof of Theorem 2.5

Theorem 2.1 implies that (2.11) holds for any \(i\in {\mathcal {I}}\). Fix such an i, denote

$$\begin{aligned} \mu =\lambda _i(A), \end{aligned}$$

and let v be the eigenvector of A, having norm 1, corresponding to \(\mu \), which is uniquely defined with probability close to 1.

Assume \(k\ge 2\), and fix \(j\in \{1,\ldots ,k\}\setminus \{i\}\). Premultiplying (5.7) by \(e_j^\prime \) yields that

$$\begin{aligned} e_j^\prime v=N\varepsilon \sum _{l=1}^k\theta _l(e_l^\prime v)e_j^\prime \left( \mu I-W\right) ^{-1}e_l, \quad \text {w.h.p.} \end{aligned}$$
(5.51)

Therefore,

$$\begin{aligned}&e_j^\prime v\left( 1-\theta _j\frac{N\varepsilon }{\mu }e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_j\right) \nonumber \\&\quad =\frac{N\varepsilon }{\mu }\sum _{1\le l\le k,\,l\ne j}\theta _l(e_l^\prime v)e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_l, \quad \text {w.h.p.} \end{aligned}$$

Lemma 5.10 implies that as \(N\rightarrow \infty \),

$$\begin{aligned} 1-\theta _j\frac{N\varepsilon }{\mu }e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_j{\mathop {\longrightarrow }\limits ^{P}}1-\frac{\theta _j}{\theta _i}\ne 0. \end{aligned}$$
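
Indeed, Lemma 5.10 together with (5.50) gives

$$\begin{aligned} \theta _j\frac{N\varepsilon }{\mu }e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_j=\theta _j\frac{N\varepsilon }{\mu }\left( 1+O_p\left( (N\varepsilon )^{-1}\right) \right) {\mathop {\longrightarrow }\limits ^{P}}\frac{\theta _j}{\theta _i}, \end{aligned}$$

and \(\theta _j\ne \theta _i\) because \(i\in {\mathcal {I}}\).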

Therefore,

$$\begin{aligned} e_j^\prime v&=O_p\left( \frac{N\varepsilon }{\mu }\sum _{1\le l\le k,\,l\ne j}\theta _l(e_l^\prime v)e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_l\right) \\&=O_p\left( \sum _{1\le l\le k,\,l\ne j}\left| e_j^\prime \left( I-\mu ^{-1}W\right) ^{-1}e_l\right| \right) \\&=O_p\left( (N\varepsilon )^{-1}\right) , \end{aligned}$$

the last line being another consequence of Lemma 5.10. Thus, (2.13) holds.

Using (5.7) once again, and taking squared norms on both sides while recalling that v is a unit vector, we get that

$$\begin{aligned} 1=(N\varepsilon )^2\sum _{l,m=1}^k\theta _l\theta _m(e_l^\prime v)(e_m^\prime v)e_l^\prime \left( \mu I-W\right) ^{-2}e_m, \end{aligned}$$

that is,

$$\begin{aligned}&\theta _i^2(e_i^\prime v)^2e_i^\prime \left( I-\mu ^{-1}W\right) ^{-2}e_i\nonumber \\&\quad =(N\varepsilon )^{-2}\mu ^2-\sum _{(l,m)\in \{1,\ldots ,k\}^2\setminus \{(i,i)\}}\theta _l\theta _m(e_l^\prime v)(e_m^\prime v)e_l^\prime \left( I-\mu ^{-1} W\right) ^{-2}e_m. \end{aligned}$$
(5.52)

Using Lemma 5.10 once again, it follows that

$$\begin{aligned} e_i^\prime \left( I-\mu ^{-1}W\right) ^{-2}e_i=1+O_p\left( (N\varepsilon )^{-1}\right) . \end{aligned}$$

Thus, (2.12) would follow once it is shown that

$$\begin{aligned} (N\varepsilon )^{-2}\mu ^2=\theta _i^2+O_p\left( (N\varepsilon )^{-1}\right) , \end{aligned}$$
(5.53)

and that for all \((l,m)\in \{1,\ldots ,k\}^2\setminus \{(i,i)\}\),

$$\begin{aligned} (e_l^\prime v)(e_m^\prime v)e_l^\prime \left( I-\mu ^{-1} W\right) ^{-2}e_m=O_p\left( (N\varepsilon )^{-1}\right) . \end{aligned}$$
(5.54)

Equation (5.53) is a trivial consequence of (5.50): dividing (5.50) by \(N\varepsilon \) gives \((N\varepsilon )^{-1}\mu =\theta _i+O_p\left( (N\varepsilon )^{-1}\right) \), and squaring yields (5.53). For (5.54), assuming without loss of generality that \(l\ne i\), (2.13) implies that

$$\begin{aligned} \left| (e_l^\prime v)(e_m^\prime v)e_l^\prime \left( I-\mu ^{-1} W\right) ^{-2}e_m\right|&=\left| (e_m^\prime v)e_l^\prime \left( I-\mu ^{-1} W\right) ^{-2}e_m\right| O_p\left( (N\varepsilon )^{-1}\right) \\&\le \left| e_l^\prime \left( I-\mu ^{-1} W\right) ^{-2}e_m\right| O_p\left( (N\varepsilon )^{-1}\right) \\&=O_p\left( (N\varepsilon )^{-1}\right) , \end{aligned}$$

the last line following from Lemma 5.10. Thus, (5.54) follows, which in conjunction with (5.53) establishes (2.12). This completes the proof. \(\square \)

5.6 Proof of Theorem 2.6

Finally, Theorem 2.6 is proved below under Assumptions E2 and F2.

Proof of Theorem 2.6

Fix \(i\in {\mathcal {I}}\). Recall (5.8) and (5.9), and let u be as defined in the former. Let \({\tilde{u}}\) be the column vector obtained by deleting the ith entry of u, \({\tilde{V}}_i\) be the column vector obtained by deleting the ith entry of the ith column of V, and \({\tilde{V}}\) be the \((k-1)\times (k-1)\) matrix obtained by deleting the ith row and ith column of V. Then, (5.9) implies that

$$\begin{aligned} \mu {\tilde{u}}={\tilde{V}}{\tilde{u}}+u(i)\tilde{V}_i, \quad \text {w.h.p.} \end{aligned}$$
(5.55)

Lemma 5.1 implies that

$$\begin{aligned} \left\| I_{k}-\mu ^{-1}V-\mathrm{Diag}\left( 1-\frac{\theta _1}{\theta _i},\ldots ,1-\frac{\theta _k}{\theta _i}\right) \right\| =o_{hp}(1), \end{aligned}$$

and hence, since \(1-\theta _j/\theta _i\ne 0\) for every \(j\ne i\), the matrix \(I_{k-1}-\mu ^{-1}{\tilde{V}}\) is non-singular w.h.p. Thus, (5.55) implies that

$$\begin{aligned} {\tilde{u}}=u(i)\mu ^{-1}\left( I_{k-1}-\mu ^{-1}\tilde{V}\right) ^{-1}{\tilde{V}}_i, \quad \text {w.h.p.} \end{aligned}$$
(5.56)

The next step is to show that

$$\begin{aligned} \left\| \mu ^{-1}V-\mathrm{Diag}\left( \frac{\theta _1}{\theta _i},\ldots ,\frac{\theta _k}{\theta _i}\right) \right\| =o_{p}\left( \sqrt{\varepsilon }\right) . \end{aligned}$$
(5.57)

To see this, use the fact that f is Lipschitz to write, for fixed \(1\le j,l\le k\),

$$\begin{aligned} V(j,l)&=N\varepsilon \sqrt{\theta _j\theta _l}\left( e_j^\prime e_l+\mu ^{-1}e_j^\prime We_l+O_p\left( \mu ^{-2}\Vert W\Vert ^2\right) \right) \nonumber \\&=N\varepsilon \sqrt{\theta _j\theta _l}\left( e_j^\prime e_l+O_p\left( (N\varepsilon )^{-1}\right) \right) \nonumber \\&=N\varepsilon \theta _j\left( \mathbf{1}(j=l)+O_p\left( (N\varepsilon )^{-1}\right) \right) \nonumber \\&=N\varepsilon \theta _j\left( \mathbf{1}(j=l)+o_p\left( \sqrt{\varepsilon }\right) \right) , \end{aligned}$$
(5.58)

the last line following from the fact that

$$\begin{aligned} (N\varepsilon )^{-1}=o\left( \sqrt{\varepsilon }\right) , \end{aligned}$$
(5.59)

which is a consequence of (2.5). This along with (5.50) implies that

$$\begin{aligned} (N\varepsilon \theta _i)^{-1}\mu =1+o_p\left( \sqrt{\varepsilon }\right) . \end{aligned}$$
(5.60)

Combining this with (5.58) yields that

$$\begin{aligned} \mu ^{-1}V(j,l)=\theta _i^{-1}\theta _j\mathbf{1}(j=l)+o_p\left( \sqrt{\varepsilon }\right) . \end{aligned}$$

Thus, (5.57) follows, an immediate consequence of which is that

$$\begin{aligned} \left\| \left( I_{k-1}-\mu ^{-1}\tilde{V}\right) ^{-1}-{\tilde{D}}\right\| =o_p\left( \sqrt{\varepsilon }\right) , \end{aligned}$$
(5.61)

where

$$\begin{aligned} \tilde{D}=\left[ \mathrm{Diag}\left( 1-\frac{\theta _1}{\theta _i},\ldots ,1-\frac{\theta _{i-1}}{\theta _i},1-\frac{\theta _{i+1}}{\theta _i},\ldots ,1-\frac{\theta _k}{\theta _i}\right) \right] ^{-1}. \end{aligned}$$

Next, fix \(j\in \{1,\ldots ,k\}\setminus \{i\}\). By similar arguments as above, it follows that

$$\begin{aligned} V(i,j)&=N\varepsilon \sqrt{\theta _i\theta _j}\left( \sum _{n=0}^3\mu ^{-n}e_i^\prime W^ne_j+O_p\left( \mu ^{-4}\Vert W\Vert ^4\right) \right) \\&=N\varepsilon \sqrt{\theta _i\theta _j}\sum _{n=0}^3\mu ^{-n}e_i^\prime W^ne_j+O_p\left( (N\varepsilon )^{-1}\right) \\&=N\varepsilon \sqrt{\theta _i\theta _j}\sum _{n=1}^2\mu ^{-n}e_i^\prime W^ne_j+o_p\left( \sqrt{\varepsilon }\right) , \end{aligned}$$

using (5.59) once again, because

$$\begin{aligned} N\varepsilon e_i^\prime e_j=O(\varepsilon )=o\left( \sqrt{\varepsilon }\right) , \end{aligned}$$

and

$$\begin{aligned} N\varepsilon \mu ^{-3} e_i^\prime W^3e_j=O_p\left( (N\varepsilon )^{-2}\mathrm{E}(e_i^\prime W^3e_j)\right) =o_p\left( \sqrt{\varepsilon }\right) , \end{aligned}$$

by (4.5). Thus,

$$\begin{aligned} V(i,j)-N\varepsilon \sqrt{\theta _i\theta _j}\mu ^{-1}e_i^\prime We_j&=N\varepsilon \sqrt{\theta _i\theta _j}\mu ^{-2}e_i^\prime W^2e_j+o_p\left( \sqrt{\varepsilon }\right) \\&=N\varepsilon \sqrt{\theta _i\theta _j}\mu ^{-2}\mathrm{E}\left( e_i^\prime W^2e_j\right) +o_p\left( \sqrt{\varepsilon }\right) \\&=(N\varepsilon )^{-1}\theta _j^{1/2}\theta _i^{-3/2}\mathrm{E}\left( e_i^\prime W^2e_j\right) +o_p\left( \sqrt{\varepsilon }\right) , \end{aligned}$$

the second line following from Lemma 4.3, and the last line from (5.59), (5.60) and Lemma 4.2. In particular,

$$\begin{aligned} V(i,j)=O_p(1). \end{aligned}$$

The above in conjunction with (5.61) implies that

$$\begin{aligned}&\left[ \left( I_{k-1}-\mu ^{-1}{\tilde{V}}\right) ^{-1}{\tilde{V}}_i\right] (j)\\&\quad =\left( 1-\frac{\theta _j}{\theta _i}\right) ^{-1}\sqrt{\theta _i\theta _j}\left[ (N\varepsilon )^{-1}\theta _i^{-2}\mathrm{E}\left( e_i^\prime W^2e_j\right) +N\varepsilon \mu ^{-1}e_i^\prime We_j\right] +o_p(\sqrt{\varepsilon }). \end{aligned}$$

In light of (5.56), the above means that

$$\begin{aligned}&e_j^\prime v\\&\quad =(e_i^\prime v)\mu ^{-1}\left( 1-\frac{\theta _j}{\theta _i}\right) ^{-1}\left[ (N\varepsilon )^{-1}\theta _i^{-1}\mathrm{E}\left( e_i^\prime W^2e_j\right) +N\varepsilon \theta _i\mu ^{-1}e_i^\prime We_j+o_p(\sqrt{\varepsilon })\right] \\&\quad =\mu ^{-1}\left( 1-\frac{\theta _j}{\theta _i}\right) ^{-1}\left[ (N\varepsilon )^{-1}\theta _i^{-1}\mathrm{E}\left( e_i^\prime W^2e_j\right) +N\varepsilon \theta _i\mu ^{-1}e_i^\prime We_j+o_p(\sqrt{\varepsilon })\right] , \end{aligned}$$

the last line following from (2.12) and (5.59). Using (5.60) once again yields that

$$\begin{aligned} N\varepsilon (e_j^\prime v)=\frac{1}{\theta _i-\theta _j}\left[ (N\varepsilon )^{-1}\theta _i^{-1}\mathrm{E}\left( e_i^\prime W^2e_j\right) +N\varepsilon \theta _i\mu ^{-1}e_i^\prime We_j\right] +o_p(\sqrt{\varepsilon }). \end{aligned}$$

This completes the proof. \(\square \)