1 Introduction

The theory of random Schrödinger operators has been very successful in describing various phenomena in mathematical physics. A 1-d random Schrödinger operator, in its finite volume version, can be described as the following operator defined on \(\{0,1,\ldots ,N\}\):

$$\begin{aligned} H_N\varphi (k)=\varphi (k-1)+\varphi (k+1)+{\mathfrak {a}}(k)\varphi (k),\quad \varphi (0)=\varphi (N)=0, \end{aligned}$$
(1.1)

where \({\mathfrak {a}}(k),k=1,\ldots ,N\), are certain random potentials. It corresponds to the matrix representation

$$\begin{aligned} H_N=H_N^\infty +\Lambda _N^0, \end{aligned}$$
(1.2)

where

$$\begin{aligned} H_N^\infty =\begin{pmatrix} 0&{}\quad 1&{}\quad 0&{}\quad \cdots &{}\quad 0\\ 1&{}\quad 0&{}\quad 1&{}\quad \ddots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad \cdots &{}\quad 1&{}\quad 0&{}\quad 1\\ 0&{}\quad \cdots &{}\quad 0&{}\quad 1&{}\quad 0 \end{pmatrix},\end{aligned}$$
(1.3)

and

$$\begin{aligned} \Lambda _N^0={\text {diag}}\left( {\mathfrak {a}}(1),\ldots ,{\mathfrak {a}}(N)\right) . \end{aligned}$$
(1.4)

Under some mild assumptions on the potentials \({\mathfrak {a}}(\cdot )\), [19] proves that the bulk eigenvalues of \(H_N\) converge to a Poisson process after an appropriate rescaling. Further, assuming that \({\mathfrak {a}}\) has a doubly exponential type upper tail distribution, it is known [6] that \(\lambda _1(N)\), the top eigenvalue of \(H_N\), converges to a Gumbel distribution of max-order class.

To investigate the transition from the localized phase to the delocalized phase, a variant of \(H_N\) was introduced in [14] (for a modified model) and [15], where the potentials vanish as N grows, i.e.,

$$\begin{aligned} H_N^\alpha =H_N^\infty +\Lambda _N^\alpha ,\quad \Lambda _N^\alpha ={\text {diag}}(N^{-\alpha }{\mathfrak {a}}(1),\ldots , N^{-\alpha }{\mathfrak {a}}(N)). \end{aligned}$$
(1.5)
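
The model (1.2)-(1.5) is straightforward to explore numerically. Below is a minimal sketch, assuming numpy is available; the helper name build_H_alpha and the standard normal potential law are illustrative choices of ours, not prescribed by the paper.

```python
# Sketch of (1.2)-(1.5): tridiagonal H_N^infty plus a decaying random diagonal.
import numpy as np

def build_H_alpha(N, alpha, rng):
    off = np.ones(N - 1)
    H = np.diag(off, 1) + np.diag(off, -1)   # hopping part H_N^infty, as in (1.3)
    a = rng.standard_normal(N)               # one possible law for a(1), ..., a(N)
    return H + np.diag(N ** (-alpha) * a)    # add Lambda_N^alpha from (1.5)

rng = np.random.default_rng(0)
lam1 = np.linalg.eigvalsh(build_H_alpha(1000, 0.5, rng))[-1]
print(lam1)  # close to 2 for such a light-tailed potential
```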

The transition from the localized to the delocalized phase has been verified for the bulk statistics at the critical value \(\alpha =\frac{1}{2}\); see for example [20].

In this work, we investigate the asymptotics of the top eigenvalue of \(H_N^\alpha \) with decaying diagonal potentials (1.5). When the potentials \({\mathfrak {a}}\) are not too heavy-tailed, it is easy to check that the top eigenvalue of \(H_N^\alpha \) converges to 2 almost surely, while when \({\mathfrak {a}}\) has heavier tails, the top eigenvalue can be arbitrarily large with positive probability. We identify the large deviation profile of the top eigenvalue in the former light-tailed case, then identify the threshold of criticality, and finally derive the law of the top eigenvalue in the critical case. When \({\mathfrak {a}}(\cdot )\) has even heavier tails, we verify that the top eigenvalues, suitably rescaled, converge to a Poisson point process.

This work is partially inspired by recent progress on large deviations of Wigner matrices. When the matrix has sub-Gaussian entries with a bound on its Laplace transform, the large deviation profile of the largest eigenvalue was derived via spherical integrals in [11], where it was the collective behavior of the random entries, instead of the behavior of one outlier entry, that accounted for the large deviation profile. When the random variables have a tail heavier than Gaussian, the large deviations of spectral statistics were derived in [7] (see [3] for the deviation of the largest eigenvalue in this regime), where the deviation is accounted for by a few outlier entries taking very large values. For heavy-tailed Wigner matrices, the distribution of the top eigenvalue has also been well studied by many authors. When the matrix entries have a finite fourth moment, the top eigenvalue sticks to the edge of the semicircle law [4], and the rescaled top eigenvalue has a Tracy–Widom fluctuation at the edge [18]. When the matrix entries do not have a finite fourth moment, it is verified in [2, 23] that the largest eigenvalues form a Poisson process. Finally, when the density of the matrix entries has a tail decay \(x^{-4}dx\), the distribution of the top eigenvalue is explicitly determined in [8]. For the random Schrödinger operator \(H_N^\alpha \), we will show that the large deviation profile and the asymptotic distribution of the top eigenvalue are always governed by outlier potentials of \({\mathfrak {a}}\) taking unusually large values.

We also study a closely related model: Consider the matrix

$$\begin{aligned} G_N^\infty = \begin{pmatrix} 0&{}\quad \sqrt{(N-1)/N}&{}\quad 0&{}\quad &{}\quad \\ \sqrt{(N-1)/N} &{}\quad 0 &{}\quad \sqrt{(N-2)/N}&{}\quad &{}\quad \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad 0&{}\quad \sqrt{2/N}&{}\quad 0\\ &{}\quad &{}\quad \sqrt{2/N}&{}\quad 0&{}\quad \sqrt{1/N}\\ &{}\quad &{}\quad 0&{}\quad \sqrt{1/N}&{}\quad 0 \end{pmatrix}. \end{aligned}$$
(1.6)

This model is closely related to the matrix model of beta ensembles introduced by Dumitriu and Edelman in [10]. Indeed, \(G_N^\infty \) can be thought of as the zero-temperature limit of this beta-parameterized matrix family (3.2). We are also interested in random operators of the following form

$$\begin{aligned} G_N^\infty +\Lambda _N^\alpha , \quad \Lambda _N^\alpha ={\text {diag}}(N^{-\alpha }{\mathfrak {a}}(1),\ldots , N^{-\alpha }{\mathfrak {a}}(N)), \end{aligned}$$
(1.7)

but in order to allow a better analytic treatment, we will instead work with an orthogonally invariant version

$$\begin{aligned} G_N^\alpha \overset{\text {def}}{=}G_N^\infty +U_N\Lambda _N^\alpha U_N^T, \end{aligned}$$
(1.8)

where \(U_N\) is a Haar distributed orthogonal matrix of size N, independent of the random variables \({\mathfrak {a}}(\cdot )\), and \(U_N^T\) denotes the transpose of \(U_N\). We will verify that the large deviation profiles of the top eigenvalues of \(H_N^\alpha \) and \(G_N^\alpha \) are very similar, except that the precise expressions of the large deviation rate functions differ.
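
A companion sketch for (1.6)-(1.8), again hedged: sampling the Haar orthogonal matrix by QR with a sign correction is a standard recipe, and the Gaussian potential law is only a placeholder.

```python
# Sketch of (1.6)-(1.8): G_N^infty plus an orthogonally conjugated diagonal.
import numpy as np

def haar_orthogonal(N, rng):
    Z = rng.standard_normal((N, N))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))   # sign fix makes Q Haar distributed

def build_G_alpha(N, alpha, rng):
    off = np.sqrt(np.arange(N - 1, 0, -1) / N)  # sqrt((N-1)/N), ..., sqrt(1/N)
    G = np.diag(off, 1) + np.diag(off, -1)      # G_N^infty from (1.6)
    U = haar_orthogonal(N, rng)
    a = rng.standard_normal(N)                   # placeholder potential law
    return G + U @ np.diag(N ** (-alpha) * a) @ U.T  # (1.8)

rng = np.random.default_rng(1)
print(np.linalg.eigvalsh(build_G_alpha(800, 0.5, rng))[-1])
```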

1.1 Statement of Main Results

The main results of this paper are outlined as follows:

Theorem 1.1

Consider the random operator \(H_N^\alpha \) defined in (1.5), and denote by \(\lambda _1(N)\) the largest eigenvalue of \(H_N^\alpha \). Assume that the random potentials \({\mathfrak {a}}(1),\ldots ,{\mathfrak {a}}(N)\) are i.i.d. copies of a random variable \({\mathfrak {a}}\). Then

  (1)

    (Weibull distribution) Assume that for some \(\beta >0\), \(C>0\), and \(0<C_1<C_2<\infty \), we have for any \(t>1\),

    $$\begin{aligned} C_1\exp (-Ct^\beta )\le {\mathbb {P}}({\mathfrak {a}}>t)\le C_2\exp (-Ct^\beta ), \end{aligned}$$
    (1.9)
    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}<-t)\le C_2\exp (-Ct^\beta ), \end{aligned}$$
    (1.10)

    then for any \(\lambda >2\), we have the upper tail estimate

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\log ({\mathbb {P}}(\lambda _1(N)>\lambda ))}{N^{\alpha \beta }}=-C\left( \sqrt{\lambda ^2-4}\right) ^\beta , \end{aligned}$$
    (1.11)

    and for any \(\lambda <2\), we have the lower tail estimate: There exist constants \(C_3,C_4>0\), and when \(\beta <2\) some sufficiently small \(c>0\), such that

    $$\begin{aligned} {\mathbb {P}}(\lambda _1(N)<\lambda )\le {\left\{ \begin{array}{ll} C_3e^{-C_4N^{\alpha \beta +1}},\quad \beta \in [2,\infty ),\\ C_3e^{-C_4N^{\alpha \beta +c}},\quad \beta \in (0,2). \end{array}\right. } \end{aligned}$$
    (1.12)

    The constant \(C_4>0\) depends on \(\lambda \) and can be written as an explicit function of \(\lambda \). See Remark 2.2 for a discussion.

  (2)

    (Subcritical) Assume that for some \(\beta >\frac{1}{\alpha }\), \(C>0,C_1>0\) we have for all \(t>1\),

    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}>t)=Ct^{-\beta }, \end{aligned}$$
    (1.13)
    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}<-t)\le C_1t^{-\beta }, \end{aligned}$$
    (1.14)

    then for any \(\lambda >2\), we have the upper tail estimate:

    $$\begin{aligned} \lim _{N\rightarrow \infty } N^{\alpha \beta -1}{\mathbb {P}}(\lambda _1(N)>\lambda )=\left( \sqrt{\lambda ^2-4}\right) ^{-\beta }, \end{aligned}$$
    (1.15)

    and we have the following lower tail estimates: When \(\beta >2\), for any \(\lambda <2\) and any \(\epsilon >0\), we can find \(C_2=C_2(\epsilon ,\lambda )>0\) depending on \(\epsilon \) and \(\lambda \) such that

    $$\begin{aligned} {\mathbb {P}}(\lambda _1(N)<\lambda )\le (C_2N)^{-(\alpha (\beta -\epsilon )+1)} \end{aligned}$$
    (1.16)

    and when \(\beta \in (0,2]\), for any \(\lambda <2\) we can find sufficiently small \(c>0\) and some \(C_3=C_3(\beta ,c,\lambda )\) such that

    $$\begin{aligned} {\mathbb {P}}(\lambda _1(N)<\lambda )\le (C_3N)^{-c}. \end{aligned}$$
    (1.17)
  (3)

    (Critical regime) Assume there exists some \(C>0,C_1>0\) such that for any \(t>1\),

    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}>t)=Ct^{-\frac{1}{\alpha }}, \end{aligned}$$
    (1.18)
    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}<-t)\le C_1t^{-\frac{1}{\alpha }}, \end{aligned}$$
    (1.19)

    then

    $$\begin{aligned} \lambda _1(N)\overset{law}{\rightarrow }\sqrt{\xi ^2+4}, \quad N\rightarrow \infty , \end{aligned}$$
    (1.20)

    where \(\xi \) is a nonnegative-valued random variable that satisfies, for any \(\lambda >0\),

    $$\begin{aligned} {\mathbb {P}}(\xi >\lambda )=1-e^{-C\lambda ^{-\frac{1}{\alpha }}}. \end{aligned}$$
    (1.21)
  (4)

    (Randomness dominating) Assume that for some \(0<\beta <\frac{1}{\alpha }\), \(C>0,C_1>0\) we have for all \(t>1\),

    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}>t)=Ct^{-\beta }, \end{aligned}$$
    (1.22)
    $$\begin{aligned} {\mathbb {P}}({\mathfrak {a}}<-t)\le C_1t^{-\beta }, \end{aligned}$$
    (1.23)

    then

    $$\begin{aligned} N^{\alpha -\frac{1}{\beta }} \lambda _1(N)\overset{law}{\rightarrow }\xi , \quad N\rightarrow \infty , \end{aligned}$$
    (1.24)

    where \(\xi \) is a nonnegative-valued random variable satisfying for any \(\lambda >0\),

    $$\begin{aligned} {\mathbb {P}}(\xi >\lambda )=1-e^{-C\lambda ^{-\beta }}. \end{aligned}$$
    (1.25)
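
As a hedged Monte Carlo sanity check of case (3), one may take the illustrative choices \(\alpha =\frac{1}{2}\), \(C=1\) and the exact Pareto tail \({\mathbb {P}}({\mathfrak {a}}>t)=t^{-\frac{1}{\alpha }}\) for \(t>1\):

```python
# Case (3) check: lambda_1(N) should be close to sqrt(xi^2 + 4), where xi is
# the largest rescaled potential; alpha = 0.5 and C = 1 are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 2000, 0.5
a = rng.uniform(size=N) ** (-alpha)  # P(a > t) = t^{-1/alpha} for t > 1
H = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H += np.diag(N ** (-alpha) * a)
lam1 = np.linalg.eigvalsh(H)[-1]
xi = (N ** (-alpha) * a).max()
print(lam1, np.sqrt(xi ** 2 + 4))    # the two numbers should be close
```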

Remark 1.2

The upper tail estimates in cases (1) and (2) are sharp, yet the lower tail estimates derived in cases (1) and (2) are crude and may not be optimal. A related example is the lower tail of KPZ equation with narrow wedge initial condition [24], where the exact lower tail estimate was derived via computing the exponential moment of the Airy-2 process, which admits an interpretation via the stochastic Airy operator. The stochastic Airy operator is closely related to the theme of this paper, yet it is defined via Brownian motion, and thus, a lot of analytical techniques (like the Girsanov transform) can be used. In our paper, we consider different moment assumptions on the random variables, so that the analytical techniques in the context of stochastic Airy operator are not available to us for deriving a sharper lower tail estimate.

There have also been some recent investigations of the matrix model (1.5); a scaling limit of the top eigenvalue (when \(\alpha =\frac{3}{2}\)) was derived in [12].

We can also derive the limiting distributions of the second, third, fourth, etc. largest eigenvalues of \(H_N^\alpha \), denoted by \(\lambda _i(N),i\ge 2\). All the derivations depend on the distribution of the largest values of \({\mathfrak {a}}(1),\ldots ,{\mathfrak {a}}(N)\) and on the finite rank perturbation formula in Lemma 2.1, so that the distributions of the second, third, fourth, etc. largest eigenvalues can be written out explicitly.

In cases (1) and (2), we can check that the deviation probabilities of \(\lambda _i(N),i\ge 2\), are negligible compared to \(e^{-N^{\alpha \beta }}\) and \(N^{1-\alpha \beta }\), respectively, and are thus negligible compared to the deviations of \(\lambda _1(N)\). In cases (3) and (4), however, the deviations of \(\lambda _i(N),i\ge 2\), have the same magnitude as that of \(\lambda _1(N)\), and we have the following point process characterization:

Theorem 1.3

In case (3) of Theorem 1.1, for any \(\epsilon >0\), the largest eigenvalues of \(H_N^\alpha \) in \((2+\epsilon ,\infty )\) converge to a random point process which is the image under the map \(x\mapsto \sqrt{x^2+4}\) of the Poisson process on \([0,\infty )\) with intensity \( \frac{C}{\alpha x^{\frac{1}{\alpha }+1}},\quad x>0.\)

In case (4) of Theorem 1.1, the point process

$$\begin{aligned} N^{\alpha -\frac{1}{\beta }}\lambda _1(N)\ge N^{\alpha -\frac{1}{\beta }}\lambda _2(N)\ge \cdots , \end{aligned}$$

where \(\lambda _1(N)\ge \lambda _2(N)\ge \cdots \) denote the eigenvalues of \(H_N^\alpha \) in decreasing order, converges to the Poisson point process on \([0,\infty )\) with intensity \( \frac{C\beta }{ x^{\beta +1}},\quad x>0. \)
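
A similar hedged sketch illustrates the rescaling in case (4) of Theorem 1.1, with the illustrative choices \(\alpha =\frac{1}{2}\), \(\beta =1<\frac{1}{\alpha }\) and \(C=1\):

```python
# Case (4) check: the top eigenvalues of N^(alpha - 1/beta) H_N^alpha should
# track the top rescaled potentials N^(-1/beta) a(i); parameters are ours.
import numpy as np

rng = np.random.default_rng(3)
N, alpha, beta = 3000, 0.5, 1.0
a = rng.uniform(size=N) ** (-1.0 / beta)  # P(a > t) = t^{-beta} for t > 1
H = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H += np.diag(N ** (-alpha) * a)
top = N ** (alpha - 1.0 / beta) * np.linalg.eigvalsh(H)[-3:][::-1]
print(top)                                         # three largest, rescaled
print(np.sort(N ** (-1.0 / beta) * a)[-3:][::-1])  # should be close entrywise
```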

Remark 1.4

The Poisson statistics claimed in (4) have been verified for several other random matrix models with heavy-tailed distributions; see [23]. Meanwhile, in all these claims, the assumption that the tail estimate for \({\mathfrak {a}}\) holds for all \(t>1\) can easily be relaxed to holding for all t larger than a fixed constant.

In the following, we summarize the results on the top eigenvalues of the random operator (1.8). We first introduce a convenient notation: For any \(\lambda \ge 2\), denote by \(f(\lambda )\) the solution to

$$\begin{aligned} f(\lambda )+\frac{1}{f(\lambda )}=\lambda \end{aligned}$$

that satisfies \(f(\lambda )\ge 1\).
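
Explicitly, solving this quadratic equation gives \(f(\lambda )=\frac{1}{2}\left( \lambda +\sqrt{\lambda ^2-4}\right) \), so that \(f(2)=1\) and \(f(\lambda )\) increases with \(\lambda \) on \([2,\infty )\).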

Theorem 1.5

Consider the random operator \(G_N^\alpha \) defined in (1.8), where \(U_N\) is a Haar distributed orthogonal matrix independent of the random variables \({\mathfrak {a}}(\cdot )\). We denote by \(\lambda _1(N)\) the largest eigenvalue of \(G_N^\alpha \). Assume the random potentials \({\mathfrak {a}}(1),\ldots ,{\mathfrak {a}}(N)\) are i.i.d. copies of a real-valued random variable \({\mathfrak {a}}\). Then

  (1)

    (Weibull distribution) Assume that the random variable \({\mathfrak {a}}\) satisfies assumptions (1.9), (1.10). Then for any \(\lambda >2\), we have the upper tail estimate

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\log ({\mathbb {P}}(\lambda _1(N)>\lambda ))}{N^{\alpha \beta }}=-C\left( f(\lambda )\right) ^\beta , \end{aligned}$$
    (1.26)

    and for any \(\lambda <2\), we have the lower tail estimate as in (1.12).

  (2)

    (Subcritical) Assume that the random variables satisfy (1.13) and (1.14). Then for any \(\lambda >2\), we have the upper tail estimate:

    $$\begin{aligned} \lim _{N\rightarrow \infty } N^{\alpha \beta -1}{\mathbb {P}}(\lambda _1(N)>\lambda )=\left( f(\lambda )\right) ^{-\beta }, \end{aligned}$$
    (1.27)

    and for any \(\lambda <2\), we have the lower tail estimates in exactly the same form as in (1.16), (1.17) (though the constants \(C_2\), \(C_3\) there may be slightly different).

  (3)

    (Critical regime) Assume that the random variables satisfy (1.18), (1.19). Then

    $$\begin{aligned} \lambda _1(N)\overset{law}{\rightarrow }{\left\{ \begin{array}{ll} 2,&{}\xi \le 1,\\ \xi +\frac{1}{\xi },&{}\xi >1, \end{array}\right. } \end{aligned}$$
    (1.28)

    where \(\xi \) is a nonnegative-valued random variable that satisfies, for any \(\lambda >0\),

    $$\begin{aligned} {\mathbb {P}}(\xi >\lambda )=1-e^{-C\lambda ^{-\frac{1}{\alpha }}}. \end{aligned}$$
    (1.29)

    More generally, for any \(d\in {\mathbb {N}}_+\), the largest d eigenvalues of \(G_N^\alpha \) in \((2, \infty )\) converge to a random point process which is the image under the map

    $$\begin{aligned} x\mapsto {\left\{ \begin{array}{ll}2&{}x<1\\ x+\frac{1}{x}&{}x\ge 1\end{array}\right. } \end{aligned}$$

    of the largest d points in the Poisson process on \([0,\infty )\) with intensity \( \frac{C}{\alpha x^{\frac{1}{\alpha }+1}},\quad x>0.\)

  (4)

    (Randomness dominating) Assume that the random variables satisfy (1.22), (1.23). Then the top eigenvalue \(\lambda _1(N)\) satisfies (1.24). Meanwhile, the point process

    $$\begin{aligned} N^{\alpha -\frac{1}{\beta }}\lambda _1(N)\ge N^{\alpha -\frac{1}{\beta }}\lambda _2(N)\ge \cdots \end{aligned}$$

converges to the Poisson point process on \([0,\infty )\) with intensity \( \frac{C\beta }{ x^{\beta +1}},\quad x>0, \) where \(\lambda _1(N)\ge \lambda _2(N)\ge \cdots \) denote the eigenvalues of \(G_N^\alpha \) in decreasing order.

Remark 1.6

The matrix model \(G_N^\infty \) (1.6) is closely related to the Gaussian beta ensemble. The small deviations of the top eigenvalue of Gaussian beta ensembles were derived in [17]; note that the precise large deviation rate function was not derived there. In this paper, we derive the precise large deviation rate function, but for a modified model with orthogonal invariance. From a different perspective, there have been a number of recent works concerning deviations of the top eigenvalue of Gaussian beta ensembles in the high temperature limit, including [21, 22]. These works mainly consider the double scaling regime of the Gaussian beta ensemble in which \(\beta \rightarrow 0\) and \(N\beta \rightarrow \infty \), which is very different from the models we investigate in this paper, where there is only one scaling.

1.2 Plan of the Paper

In Sect. 2, we prove Theorems 1.1 and 1.3. Then in Sect. 3, we prove Theorem 1.5.

2 Proof of Theorem 1.1

To prove Theorem 1.1, we first need the following computation on the top eigenvalue of diagonally perturbed matrices.

Lemma 2.1

Consider a triangular array of real numbers \((\lambda _1^{(N)},\ldots ,\lambda _N^{(N)})_{N\ge 1}\) satisfying the following assumptions: There exists some (small) \(c\in (0,1)\) such that

  • \(M_N:=\max \{\lambda _1^{(N)},\ldots ,\lambda _N^{(N)}\}\in (0,\infty )\); there exists a unique \(i\in [N]\) such that \(M_N=\lambda _i^{(N)}\); and there exists some \(\epsilon _N>0\) such that for any other \(j\ne i\), \(\lambda _j^{(N)}\le M_N-\epsilon _N\). We require that \(\epsilon _N\) is bounded away from zero for N large, and that \(M_N\) is bounded above for N large.

  • For any \(i\in [1,N]\) such that \(|\lambda _i^{(N)}|>N^{-c}\) and any \(j\ne i\) with \(|i-j|<N^{2c}\), the value \(\lambda _j^{(N)}\) must satisfy \(|\lambda _j^{(N)}|\le N^{-c}.\)

  • There are at most \(N^{1.5c}\) indices \(i\in [1,N]\) such that \(|\lambda _i^{(N)}|>N^{-c}.\)

Then

  (1)

    If \(M_N\) is achieved in the middle of [1, N] in the sense that for any \(i\in [1,\lfloor N^{c}\rfloor ]\cup [N-\lfloor N^{c}\rfloor ,N]\), we have \(|\lambda _i^{(N)}|\le N^{-c}\), then

    $$\begin{aligned} \lambda _1(H_N^\infty +{\text {diag}}(\lambda _1^{(N)},\ldots ,\lambda _N^{(N)}))= \sqrt{M_N^2+4}(1+o(1)) \end{aligned}$$

    as N tends to infinity, where \(\lambda _1(\cdot )\) denotes the largest eigenvalue of a matrix.

  (2)

    In any case, the largest eigenvalue of \(H_N^\infty +{\text {diag}}(\lambda _1^{(N)},\ldots ,\lambda _N^{(N)})\) is upper bounded by \(\sqrt{M_N^2+4}(1+o(1))\) as N tends to infinity.

  (3)

    Let \(M_N^1:=M_N,\) \(M_N^2,M_N^3,\ldots \) denote the first largest, second largest, third largest, ... elements of the array \((\lambda _1^{(N)},\ldots ,\lambda _N^{(N)})\). Fix any \(d\in {\mathbb {N}}_+\), assume \(M_N^1,\ldots ,M_N^d\) are nonnegative, uniformly bounded above in N, and that \(M_N^i-M_N^{i+1}\ge \epsilon _N\) for \(i=1,\ldots ,d-1\) with \(\epsilon _N\) bounded away from zero. Assume moreover that for all \(i\in [1,N^c]\cup [N-N^c,N]\) we have \(|\lambda _i^{(N)}|\le N^{-c}\). Then the top d eigenvalues of \(H_N^\infty +{\text {diag}}(\lambda _1^{(N)},\ldots ,\lambda _N^{(N)})\) are given by \(\sqrt{(M_N^i)^2+4}(1+o(1))\) for \(i=1,\ldots ,d\).

Proof

Denote by \(\Lambda _N={\text {diag}}(\lambda _1^{(N)},\ldots ,\lambda _N^{(N)})\) and

$$\begin{aligned} \Lambda _N^*={\text {diag}}\left( \lambda _1^{(N)}1_{|\lambda _1^{(N)}|>N^{-c}}, \lambda _2^{(N)}1_{|\lambda _2^{(N)}|>N^{-c}},\ldots ,\lambda _N^{(N)}1_{|\lambda _N^{(N)}|>N^{-c}}\right) . \end{aligned}$$

Then, as \(N\rightarrow \infty \), by Weyl's perturbation inequality,

$$\begin{aligned} |\lambda _1(H_N^\infty +\Lambda _N) -\lambda _1(H_N^\infty +\Lambda _N^*)|\le N^{-c}, \end{aligned}$$

so it suffices to compute the top eigenvalue of the latter matrix. Recall that \(\lambda >2\) is an eigenvalue of \(H_N^\infty +\Lambda _N^*\) if and only if

$$\begin{aligned} \det (\lambda {\text {I}}_N-H_N^\infty -\Lambda _N^*)=0. \end{aligned}$$

Since we assume \(\lambda >2\), and all the eigenvalues of \(H_N^\infty \) lie in \((-2,2)\) by a standard calculation, we have

$$\begin{aligned} \det (\lambda {\text {I}}_N-H_N^\infty )\ne 0, \end{aligned}$$

so that we have

$$\begin{aligned} \det ({\text {I}}_N-(\lambda {\text {I}}_N-H_N^\infty )^{-1}\Lambda _N^*)=0. \end{aligned}$$

At this point, we quote the computations in [13] to compute \(R_{ij}:=((\lambda {\text {I}}_N-H_N^\infty )^{-1})_{ij}\). Denote \(\lambda ^*={\text {arccosh}}(\frac{1}{2}\lambda )\); then the main result of [13] states that

$$\begin{aligned} R_{ij}=\frac{\cosh ((N+1-|i-j|)\lambda ^*)-\cosh ((N+1-i-j)\lambda ^*)}{2\sinh (\lambda ^*)\sinh ((N+1)\lambda ^*)}. \end{aligned}$$
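
This closed form is easy to test against a direct inversion; the following hedged sketch uses arbitrary choices of N, \(\lambda \) and indices of ours:

```python
# Check the closed form for R = (lambda I - H_N^infty)^{-1} quoted from [13].
import numpy as np

N, lam = 50, 2.5
ls = np.arccosh(lam / 2.0)  # lambda^* in the text
H = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
R = np.linalg.inv(lam * np.eye(N) - H)
i, j = 10, 14               # 1-based indices, as in the formula
closed = (np.cosh((N + 1 - abs(i - j)) * ls) - np.cosh((N + 1 - i - j) * ls)) / (
    2 * np.sinh(ls) * np.sinh((N + 1) * ls))
print(R[i - 1, j - 1], closed)  # agree up to floating point error
```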

In the following, we first prove the claims in item (1). We claim the following asymptotics: For any \(\lambda _0>2\), we can find a constant \(C(\lambda _0,N)>0\) such that, for any \(i\in [N^{2c},N-N^{2c}]\), and for any \(\lambda >\lambda _0\),

$$\begin{aligned} \left| R_{ii}-\frac{1}{2\sinh (\hbox {arccosh}(\frac{1}{2}\lambda ))}\right| \le C(\lambda _0,N) \rightarrow 0,\quad N\rightarrow \infty .\end{aligned}$$
(2.1)

Meanwhile, for any such \(i\in [N^{2c},N-N^{2c}]\), we have

$$\begin{aligned} |R_{ij}|\le e^{-N^{2c}}\quad \hbox {for any}\quad |j-i|>N^{2c}. \end{aligned}$$
(2.2)

These two estimates can be verified from the fact that, for fixed \(\lambda ^{*}>0\), \(\vert \cosh (m\lambda ^{*})\vert \sim \frac{1}{2}e^{\vert m\vert \lambda ^{*}}\) and \(\vert \sinh (m\lambda ^{*})\vert \sim \frac{1}{2}e^{\vert m\vert \lambda ^{*}}\) as \(\vert m\vert \rightarrow \infty \), applied with \(m=N+1-i-j\). The same asymptotics hold if we take \(N+1\) or \(N+1-|i-j|\) in place of \(N+1-i-j\). Finally, the assumption \(i\in [N^{2c},N-N^{2c}]\) ensures that \(|N+1-|i-j||<N+1-N^{2c}\) and \(|N+1-i-j|<N+1-N^{2c}\), and these asymptotics can be quantified with explicit error rates for any \(\lambda >\lambda _0>2\). Combining them leads to (2.2). We can similarly verify (2.1): in this case we have \(\cosh ((N+1-i-j)\lambda ^*)/\sinh ((N+1)\lambda ^*)\rightarrow 0\) as \(N\rightarrow \infty \) (as we assume \(i\in [N^{2c},N-N^{2c}]\), this term is in fact bounded by \(e^{-N^c}\)), and moreover \(|\tanh ((N+1)\lambda ^*)- 1|\le e^{-N^c}\) for N large. Combining these asymptotics leads to (2.1).

Then for any \(j\in [1,N]\) such that \(|\lambda _j^{(N)}|\le N^{-c}\), the jth column of \({\text {I}}_N-R\Lambda _N^*\) has entry 1 in the jth coordinate and entry 0 in all other coordinates. We then use Gaussian elimination to simplify the matrix \({\text {I}}_N-R\Lambda _N^*\), in the following steps: (i) For any \(j_*\in [1,N]\) such that \(|\lambda _{j_*}^{(N)}|>N^{-c}\), we use Gaussian elimination so that any \((i,j_*)\)th entry of \({\text {I}}_N-R\Lambda _N^*\) with \(0<|i-j_*|<N^{2c}\) is subtracted to zero, without changing the diagonal entries. This is done by subtracting a suitable multiple of the ith column from the \(j_*\)th column to set the \((i,j_*)\) entry to zero; by our assumption on \(\lambda _j^{(N)}\) in the second bullet point of the statement of the lemma, this can be applied for all \(|i-j_*|<N^{2c}\). (ii) The same step can be applied to any \((i,j_*)\) entry such that \(|\lambda _i^{(N)}|\le N^{-c}\), without changing the diagonal entries or the determinant of \({\text {I}}_N-R\Lambda _N^*\). (iii) After these procedures, we are left with an \(N\times N\) matrix T with the same determinant and diagonal entries as \({\text {I}}_N-R\Lambda _N^*\), and with at most \(N^{1.5c}\times N^{1.5c}\) nonzero off-diagonal entries (i,j), each satisfying \(|T_{ij}|\le e^{-N^{2c}}\).

We can now easily compute \(\det ({\text {I}}_N-R\Lambda _N^*)\). By the aforementioned Gaussian elimination procedure, this boils down to computing the determinant of a diagonal matrix (with bounded entries) perturbed by a matrix with nonzero entries on only \(N^{1.5c}\) rows and columns, such that each entry of the perturbation is bounded by \(e^{-N^{2c}}\). The contribution to the determinant from the perturbation is at most \(\sum _{k=1}^N (N^{1.5c})^ke^{-kN^{2c}}=O(e^{-N^c})\): when we expand the determinant, at each step we have \(N^{1.5c}\) possible choices of indices from the perturbed entries, and each perturbed entry is bounded by \(e^{-N^{2c}}\).

Therefore, we only need to compute the determinant of the diagonal part, and we get

$$\begin{aligned} \det ({\text {I}}_N-R\Lambda _N^*)=\prod _{i=1}^N\left( 1-\lambda _i^{(N)}R_{ii}1_{|\lambda _i^{(N)}|\ge N^{-c}}\right) +O(e^{-N^{c}}). \end{aligned}$$
(2.3)

Now we can check that if \(\lambda >2\) is such that \(M_N\big (\frac{1}{2\sinh \left( {\text {arccosh}}\left( \frac{1}{2}\lambda \right) \right) }+C(\lambda _0,N)\big )<1-\omega \) for some fixed \(\omega >0\), then \(\lambda _i^{(N)}R_{ii}\le 1-\omega \) for all i, so that every factor in the product in (2.3) is at least \(\omega \). This means \(\det \left( {\text {I}}_N-R\Lambda _N^*\right) >0\) for such \(\lambda \). Noting that \(2\sinh ({\text {arccosh}}(\frac{1}{2}\lambda ))=\sqrt{\lambda ^2-4}\), the condition \(M_N/\sqrt{\lambda ^2-4}<1\) is equivalent to \(\lambda >\sqrt{M_N^2+4}\). Since \(\omega >0\) is arbitrary and \(C(\lambda _0,N)\) tends to 0, we conclude that for any \(\omega >0\), any \(\lambda >\sqrt{M_N^2+4}+\omega \) and any N sufficiently large, we must have \(\det ({\text {I}}_N-R\Lambda _N^*)>0\).

Fix any \(\omega _0>0\). If \(\lambda >2\) is such that \(M_N\left( \frac{1}{2\sinh \left( {\text {arccosh}} \left( \frac{1}{2}\lambda \right) \right) }-C(\lambda _0,N)\right) >1+\omega _0/4\), while \((M_N-\epsilon _N)\left( \frac{1}{2\sinh \left( {\text {arccosh}}\left( \frac{1}{2}\lambda \right) \right) }+C(\lambda _0,N)\right) <1-\omega _0/4\), then for any such \(\lambda \) we must have \(\det ({\text {I}}_N-R\Lambda _N^*)<0\) by the assumption in the first bullet point of this lemma. The existence of such \(\lambda \) is guaranteed by the facts that \(M_N\) is bounded, \(\epsilon _N\) is nonvanishing, and \(C(\lambda _0,N)\) tends to zero as N gets large. Combining both facts, we conclude that the largest \(\lambda \) solving \(\det (\lambda {\text {I}}_N-H_N^\infty -\Lambda _N^*)=0\) is \(\sqrt{M_N^2+4}(1+o(1))\), so that the largest eigenvalue of \(H_N^\infty +\Lambda _N^*\) must be \(\sqrt{M_N^2+4}(1+o(1))\). This concludes the proof of part (1).

For case (2), all the reductions in the previous step still apply; the only difference is that for \(i\in [1,N^c]\) or \(i\in [N-N^c,N]\) we necessarily have, as \(N\rightarrow \infty \),

$$\begin{aligned} |R_{ii}|\le \frac{1}{2\sinh ({\text {arccosh}}(\frac{1}{2}\lambda ))}+C(\lambda _0,N), \end{aligned}$$

where \(C(\lambda _0,N)>0\) tends to 0. Adapting the proof in case (1), one sees that we must have \(\lambda _1(H_N^\infty +\Lambda _N^*)\le \sqrt{M_N^2+4}(1+o(1))\).

For case (3), where we study the top d eigenvalues, we follow the same steps as in case (1). We again need to compute the determinant \(\det ({\text {I}}_N-R\Lambda _N^*)\) as in (2.3). Since by assumption \(M_N^{i-1}-M_N^i\ge \epsilon _N>0\) is bounded away from zero, we can check as in (2.3) that there is precisely one root of the equation \(\det ({\text {I}}_N-R\Lambda _N^*)=0\) at \(\sqrt{(M_N^i)^2+4}(1+o(1))\) for each \(i=1,\ldots ,d\), and that there are no other roots in the interval \((\sqrt{(M_N^d)^2+4}(1-o(1)),\infty )\). This completes the proof. \(\square \)
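
Although Lemma 2.1 is deterministic, its part (1) is easy to observe numerically; here is a hedged sketch with a single bulk outlier of size M = 1.5 (our choice):

```python
# Lemma 2.1 (1) illustration: one bulk diagonal outlier M pushes the top
# eigenvalue of H_N^infty + diag to roughly sqrt(M^2 + 4).
import numpy as np

N, M = 2000, 1.5
H = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H[N // 2, N // 2] = M  # outlier in the middle of [1, N]
print(np.linalg.eigvalsh(H)[-1], np.sqrt(M * M + 4))  # both close to 2.5
```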

Now we prove Theorem 1.1.

We first verify the upper tail claims in cases (1), (2) and (3).

Proof of Theorem 1.1, upper tail of case (1)

Assume that case (1) of Theorem 1.1 holds. Then for any \(\lambda >0\),

$$\begin{aligned} {\mathbb {P}}\left( \max \{N^{-\alpha } {\mathfrak {a}}(1),\ldots ,N^{-\alpha }{\mathfrak {a}}(N)\}\ge \lambda \right)&=1-{\mathbb {P}}({\mathfrak {a}}(1)\le N^{\alpha }\lambda )^N\nonumber \\&=1-(1-{\mathbb {P}}({\mathfrak {a}}(1)>N^\alpha \lambda ))^N. \end{aligned}$$
(2.4)

Noting that \({\mathfrak {a}}\) has a Weibull-type tail (1.9), we conclude that

$$\begin{aligned} \frac{{\mathbb {P}}\left( \max \{N^{-\alpha } {\mathfrak {a}}(1),\ldots ,N^{-\alpha }{\mathfrak {a}}(N)\}\ge \lambda \right) }{N\exp (-CN^{\alpha \beta }\lambda ^\beta )}\in [C_1,C_2](1+o(1)). \end{aligned}$$
(2.5)

Then we argue that the maximum in (2.4) is achieved at only one index i. Indeed, for any \(\lambda >0\),

$$\begin{aligned} {\mathbb {P}}(\text { there are two indices } i: N^{-\alpha } {\mathfrak {a}}(i)\ge \lambda )\le \frac{N^2-N}{2}\exp (-2CN^{\alpha \beta }\lambda ^{\beta }) \end{aligned}$$
(2.6)

which is negligible compared to \(\exp (-CN^{\alpha \beta }\lambda ^\beta )\).

We further check that the other assumptions of Lemma 2.1 hold with very high probability: For example, the probability that there are indices \(i\ne j\) with \(|i-j|<N^{2c}\) such that \(|N^{-\alpha } {\mathfrak {a}}(i)|>N^{-c}\) and \(|N^{-\alpha } {\mathfrak {a}}(j)|>N^{-c}\) is at most

$$\begin{aligned} N\cdot N^{2c}\cdot \exp (-CN^{2(\alpha -c)\beta }), \end{aligned}$$
(2.7)

which is negligible compared to \(\exp (-CN^{\alpha \beta })\) if \(c>0\) is chosen small enough. Meanwhile, the probability that there are more than \(N^{1.5c}\) sites i with \(|N^{-\alpha }{\mathfrak {a}}(i)|>N^{-c}\) is also \(o(\exp (-CN^{\alpha \beta }))\), as can be seen from a union bound. Also, the probability that there are two distinct indices i and j such that \(|N^{-\alpha }{\mathfrak {a}}(i)|>\lambda \), \(|N^{-\alpha }{\mathfrak {a}}(j)|>\lambda \) is negligible compared to the probability that there is one i with \(|N^{-\alpha }{\mathfrak {a}}(i)|>\lambda \). This justifies that in the following computation we may assume \(\epsilon _N\) is bounded away from zero (see the first bullet point in the assumptions of Lemma 2.1).

Now we can apply Lemma 2.1 to complete the proof as follows. For any \(\omega >0\), Lemma 2.1 implies that when N is sufficiently large, if \(\max _i N^{-\alpha }{\mathfrak {a}}(i)\ge \sqrt{(t+\omega )^2-4}\) and all the other conditions in Lemma 2.1 (1) are satisfied, then \(\lambda _1(N)>t\).

Our discussion in the previous paragraphs implies that, conditioned on the event \(\max _i N^{-\alpha }{\mathfrak {a}}(i)>\sqrt{(t+\omega )^2-4}\), all the other assumptions in the three bullet points of Lemma 2.1 are satisfied with probability \(1-o(1)\), uniformly for all \(t>t_0>0\). Therefore, we compute: For N sufficiently large,

$$\begin{aligned} {\mathbb {P}}(\lambda _1(N)>t)&\ge {\mathbb {P}}\left( \max _i N^{-\alpha }{\mathfrak {a}}(i)>\sqrt{(t+\omega )^2-4}\right) \nonumber \\&\quad \cdot {\mathbb {P}}\left( \text {Assumptions of Lemma 2.1 (1) satisfied}\,\Big |\,\max _i N^{-\alpha }{\mathfrak {a}}(i)>\sqrt{(t+\omega )^2-4}\right) \nonumber \\&\ge N\exp \left( -CN^{\alpha \beta }\left( \sqrt{(t+\omega )^2-4}\right) ^\beta \right) (1+o(1)). \end{aligned}$$
(2.8)

We also utilized the fact that

$$\begin{aligned} {\mathbb {P}}({\text {argmax}}_i N^{-\alpha }{\mathfrak {a}}(i)\in [N^c,N-N^c])=1-o(1). \end{aligned}$$

This completes the proof of the lower bound.

Now we prove the reverse inequality. Fix \(t>2\) and \(\omega >0\) sufficiently small. The idea is that the probability that the assumptions in the three bullet points of Lemma 2.1 fail is negligible compared to \({\mathbb {P}}(\max _i N^{-\alpha }{\mathfrak {a}}(i)>\sqrt{(t-\omega )^2-4})\), by our previous discussion. Then, on the event that these assumptions hold, Lemma 2.1 (2) implies that if none of the \(N^{-\alpha }{\mathfrak {a}}(i)\) exceeds \(\sqrt{(t-\omega )^2-4}\), then we must have \(\lambda _1(N)<t\). Written more formally, we have deduced that for any \(\omega >0\), when N is sufficiently large,

$$\begin{aligned} {\mathbb {P}}(\lambda _1(N)>t)&\le {\mathbb {P}}\left( \max _i N^{-\alpha }{\mathfrak {a}}(i)>\sqrt{(t-\omega )^2-4}\right) \nonumber \\&\quad +{\mathbb {P}}\left( \text {Assumptions in Lemma 2.1 not all satisfied with } \lambda _i^{(N)}=N^{-\alpha }{\mathfrak {a}}(i)\right) \nonumber \\&=N\exp \left( -CN^{\alpha \beta }\left( \sqrt{(t-\omega )^2-4}\right) ^\beta \right) (1+o(1)). \end{aligned}$$
(2.9)

This completes the proof of case (1), the upper tail.\(\square \)

Proof of Theorem 1.1, upper tail of case (2)

Now we consider case (2), where the potentials \({\mathfrak {a}}(i)\) have heavier tails. We see that

$$\begin{aligned} {\mathbb {P}}\left( \max \{N^{-\alpha } {\mathfrak {a}}(1),\ldots ,N^{-\alpha }{\mathfrak {a}}(N)\}\ge \lambda \right)&=1-(1-{\mathbb {P}}({\mathfrak {a}}(1)>N^\alpha \lambda ))^N\nonumber \\&=1-(1-CN^{-\alpha \beta }\lambda ^{-\beta })^N\nonumber \\&=CN^{1-\alpha \beta }\lambda ^{-\beta } (1+o(1)), \end{aligned}$$
(2.10)

recalling the assumption that \(\alpha \beta >1\). As in the previous case, one can check that, conditioned on the event in (2.10), the assumptions of Lemma 2.1 hold with probability \(1-o(1)\). Therefore, we have the analogues of estimates (2.8), (2.9) in this setting of heavier tails: For any \(\omega >0\), when N is sufficiently large,

$$\begin{aligned} {\mathbb {P}}(\lambda _1(N)>t)\ge CN^{1-\alpha \beta }\left( \sqrt{(t+\omega )^2-4}\right) ^{-\beta } (1+o(1)), \end{aligned}$$
(2.11)

and for any \(\omega >0\), when N is sufficiently large,

$$\begin{aligned} {\mathbb {P}}(\lambda _1(N)>t)&\le CN^{1-\alpha \beta }\left( \sqrt{(t-\omega )^2-4}\right) ^{-\beta } (1+o(1))\\&\quad +{\mathbb {P}}\left( \text {Assumptions in Lemma 2.1 not all satisfied with } \lambda _i^{(N)}=N^{-\alpha }{\mathfrak {a}}(i)\right) \\&\le CN^{1-\alpha \beta }\left( \sqrt{(t-\omega )^2-4}\right) ^{-\beta } (1+o(1)), \end{aligned}$$
(2.12)

justifying the upper tail in case (2). \(\square \)

Proof of Theorem 1.1, upper tail of case (3)

Now we consider case (3). Under the critical moment condition (1.18),

$$\begin{aligned} {\mathbb {P}}\left( \max \{N^{-\alpha } {\mathfrak {a}}(1),\ldots ,N^{-\alpha }{\mathfrak {a}}(N)\}\ge \lambda \right)&=1-(1-{\mathbb {P}}({\mathfrak {a}}(1)>N^\alpha \lambda ))^N\nonumber \\&=1-(1-CN^{-1}\lambda ^{-\frac{1}{\alpha }})^N\nonumber \\&\rightarrow 1-e^{-C\lambda ^{-\frac{1}{\alpha }}},\quad N\rightarrow \infty . \end{aligned}$$
(2.13)

Moreover, on the event in (2.13), it is easy to check that the assumptions in Lemma 2.1 hold with probability \(1-o(1)\). Therefore, we have the following two-sided inequalities: For any \(t>2\), any \(\omega >0\) sufficiently small, and N sufficiently large,

$$\begin{aligned} \left( 1-e^{-C(\sqrt{(t+\omega )^2-4})^{-\frac{1}{\alpha }}}\right) (1+o(1))\le {\mathbb {P}}(\lambda _1(N)>t)\le \left( 1-e^{-C(\sqrt{(t-\omega )^2-4})^{-\frac{1}{\alpha }}}\right) (1+o(1)). \end{aligned}$$
(2.14)

Setting \(\omega >0\) to be arbitrarily small, the upper tail claim in case (3) now follows. \(\square \)

Proof of Theorem 1.1, Poisson distribution in case (4)

The argument is similar to [23].

We first compute that for any \(\lambda >0\),

$$\begin{aligned} {\mathbb {P}}\left( \max \{N^{-\frac{1}{\beta }} {\mathfrak {a}}(1),\ldots ,N^{-\frac{1}{\beta }}{\mathfrak {a}}(N)\}\ge \lambda \right)&=1-(1-{\mathbb {P}}({\mathfrak {a}}(1)>N^{\frac{1}{\beta }}\lambda ))^N \nonumber \\&=1-(1-CN^{-1}\lambda ^{-\beta })^N\nonumber \\&\rightarrow 1-e^{-C\lambda ^{-\beta }},\quad N\rightarrow \infty . \end{aligned}$$
(2.15)

That is,

$$\begin{aligned} {\mathbb {P}}(\max _i N^{-\alpha }{\mathfrak {a}}(i)\ge N^{\frac{1}{\beta }-\alpha }\lambda )\rightarrow 1-e^{-C\lambda ^{-\beta }}, \end{aligned}$$

and one can easily check that the second largest value among \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\) is strictly smaller than the largest, with probability \(1-o(1)\). Assuming that \(i\in [N]\) is the site where the maximum is achieved in (2.15), we take the unit coordinate vector \(f_i=(0,\ldots ,0,1,0,\ldots ,0)\), i.e., \(f_i\) is 1 in the ith coordinate and zero everywhere else. Then one easily sees, assuming \(\max _i N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)=\lambda \), that

$$\begin{aligned} H_N^\alpha f_i=N^{\frac{1}{\beta }-\alpha }\lambda f_i(1+o(1)), \end{aligned}$$

so by perturbation theory of Hermitian operators, \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \) has an eigenvalue at \(\lambda (1+o(1))\). Meanwhile, writing \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha =N^{\alpha -\frac{1}{\beta }}H_N^\infty +N^{-\frac{1}{\beta }}\Lambda _N^0\) and noting that \(\alpha <\frac{1}{\beta }\), the first term vanishes in operator norm, while the largest eigenvalue of the diagonal second term is exactly \(\lambda \); hence the largest eigenvalue of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \) cannot exceed \(\lambda (1+o(1))\). This completes the proof of part (4). \(\square \)

Proof of Theorem 1.1, Lower tails in cases (1), (2) and (3).

It remains to verify that the probability that \(\lambda _1(N)\) is smaller than 2 is much smaller than the probabilities specified in each of cases (1), (2), (3), justifying our claims on the lower tail.

For this purpose, note that if \(\lambda _1(N)<2-\delta <2\) for some \(\delta >0\), then

$$\begin{aligned} d_2(\mu _{H_N^{\alpha }},\mu _{H_N^\infty })>C(\delta )>0 \end{aligned}$$

for some constant \(C(\delta )>0\), where \(d_2\) denotes the 2-Wasserstein distance on \({\mathbb {R}}\) defined for two probability measures \(\mu ,\nu \) via

$$\begin{aligned} d_2(\mu ,\nu )=\inf _{\pi \in {\mathcal {C}}(\mu ,\nu )}\left( \int _{{\mathbb {R}}^2} |x-y|^2d\pi (x,y)\right) ^{1/2}, \end{aligned}$$

(with \({\mathcal {C}}(\mu ,\nu )\) the space of probability measures on \({\mathbb {R}}^2\) with first marginal \(\mu \) and second marginal \(\nu \)), and \(\mu _{H_N^{\alpha }}\), \(\mu _{H_N^\infty }\) denote, respectively, the empirical eigenvalue measures of \(H_N^\alpha \) and \(H_N^\infty \).
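
Since both empirical measures here have exactly N atoms, the optimal coupling in one dimension pairs the sorted eigenvalues, which makes \(d_2\) trivial to evaluate; a hedged sketch (the helper name is ours):

```python
# d_2 between two N-atom empirical spectral measures via the monotone coupling.
import numpy as np

def d2_empirical(eigs1, eigs2):
    x, y = np.sort(eigs1), np.sort(eigs2)  # monotone (optimal) pairing in 1D
    return np.sqrt(np.mean((x - y) ** 2))
```

This monotone pairing is exactly the trivial coupling used to obtain (2.16) below.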

Using the trivial coupling, we have

$$\begin{aligned} d_2(\mu _{H_N^{\alpha }},\mu _{H_N^\infty })&\le \left( \frac{1}{N}\sum _{i=1}^N |\lambda _i^\alpha -\lambda _i^\infty |^2\right) ^{1/2} \nonumber \\&\le \frac{1}{\sqrt{N}}\left( {\text {Tr}}\left( (H_N^\infty -H_N^\alpha )^2\right) \right) ^{\frac{1}{2}}\nonumber \\&=\frac{1}{N^{\alpha +\frac{1}{2}}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\right) ^{\frac{1}{2}}, \end{aligned}$$
(2.16)

where the second inequality follows from the Hoffman–Wielandt inequality; see for example [1], Lemma 2.1.19, and also [9].

We begin with case (1) of Theorem 1.1, where \({\mathfrak {a}}(\cdot )\) is assumed to have a Weibull-type distribution. First assume (1a) that \(\beta =2\). Then we proceed with the following computation: By Markov's inequality, for sufficiently small \(d>0\),

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\ge N^{2\alpha +1}\right)&\le e^{-dN^{2\alpha +1}}{\mathbb {E}}\left[ \exp \left( d\sum _{i=1}^N |{\mathfrak {a}}(i)|^2\right) \right] \nonumber \\&=e^{-dN^{2\alpha +1}}{\mathbb {E}}\left[ \exp \left( d|{\mathfrak {a}}(1)|^2\right) \right] ^{N}\nonumber \\&\le C_2e^{-C_3 N^{2\alpha +1}} \end{aligned}$$
(2.17)

for some constants \(C_2,C_3>0\), where in the last step we used the tail decay of the law of \({\mathfrak {a}}\), which guarantees that \({\mathbb {E}}[\exp (d|{\mathfrak {a}}(1)|^2)]\) is finite for d small.

Next assume (1b) that \(\beta >2\). Then, from Jensen's inequality,

$$\begin{aligned} \sum _{i=1}^N |{\mathfrak {a}}(i)|^2 \le \left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^\beta \right) ^\frac{2}{\beta } N^{1-\frac{2}{\beta }} \end{aligned}$$
(2.18)

we similarly deduce that, for a different \(C'>0\),

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\ge CN^{2\alpha +1}\right)&\le {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^\beta \ge C'N^{\frac{\beta }{2}(2\alpha +\frac{2}{\beta })}\right) \nonumber \\&\le C_2 e^{-C_3N^{\alpha \beta +1}}, \end{aligned}$$
(2.19)

where in the last step we used Markov's inequality (see (2.17)) and the tail estimate in the law of \({\mathfrak {a}}\).

Finally assume (1c) that \(\beta <2\). In this case, the above reasoning is no longer effective. We end up with a much weaker estimate of the following form: For some \(c>0\) to be determined,

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\ge CN^{2\alpha +1}\right)&\le {\mathbb {P}}\left( \max _{i\in [1,N]}|{\mathfrak {a}}(i)|\ge CN^{\alpha +c}\right) \nonumber \\&\quad +{\mathbb {P}}\left( \sum _{i=1}^N|{\mathfrak {a}}(i)|^\beta \ge CN^{2\alpha +1-(2-\beta )(\alpha +c)}\right) \nonumber \\&\le C_2e^{-C_3N^{(\alpha +c)\beta }}+C_2e^{-C_3N^{\alpha \beta +1+(\beta -2) c}}, \end{aligned}$$
(2.20)

and a careful choice of (sufficiently small) \(c>0\) completes the proof.

Now we consider case (2) of Theorem 1.1, i.e., the random variables \({\mathfrak {a}}(\cdot )\) have heavy tails yet \(\alpha \beta >1\). Fix some sufficiently small \(\epsilon >0\); then \({\mathfrak {a}}\) has a finite \((\beta -\epsilon )\)th moment. Denote \(\beta '=\beta -\epsilon \). In cases (2a), (2b), where \(\beta '\ge 2\), we get, similarly to the previous proof, that

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\ge CN^{2\alpha +1}\right)&\le {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^{\beta '}\ge C'N^{\frac{\beta '}{2}(2\alpha +\frac{2}{\beta '})}\right) \nonumber \\&\le (C_2N)^{-(\alpha \beta '+1)}. \end{aligned}$$
(2.21)

In case (2c), where \(\beta '<2\), we deduce as previously that for sufficiently small \(c>0\),

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^N |{\mathfrak {a}}(i)|^2\ge CN^{2\alpha +1}\right)&\le {\mathbb {P}}\left( \max _{i\in [1,N]}|{\mathfrak {a}}(i)|\ge CN^{\alpha +c}\right) \nonumber \\&\quad +{\mathbb {P}}\left( \sum _{i=1}^N|{\mathfrak {a}}(i)|^{\beta '}\ge CN^{2\alpha +1-(2-{\beta '})(\alpha +c)}\right) \nonumber \\&\le C_2N^{1-\beta '(\alpha +c)}+(C_2N)^{-(\alpha \beta +1+(\beta -2) c)}. \end{aligned}$$
(2.22)

This completes the proof. \(\square \)

Remark 2.2

We can also obtain the explicit dependence on \(\lambda \) of the left tail rate function. Noticing that \(\mu _{H_N^\infty }\) converges to the arcsine law \(\mu _{as}\) on \([-2,2]\), we can deduce that if \(\lambda _1(N)\in (-2,2),\) then

$$\begin{aligned} d_2^2(\mu _{H_N^{\alpha }},\mu _{as})\ge \int _{\lambda _1(N)}^2 \frac{(x-\lambda _1(N))^2dx}{\pi \sqrt{4-x^2}}, \end{aligned}$$

and when \(\lambda _1(N)<-2\), we similarly have

$$\begin{aligned} d_2(\mu _{H_N^{\alpha }},\mu _{as})\ge -2-\lambda _1(N). \end{aligned}$$

Then we can plug this more precise estimate into (2.16), (2.17) and apply Markov's inequality. Overall we get a left tail estimate with explicit dependence on \(\lambda \).

The same procedure applies to the matrix model \(G_N^\alpha \), and one only needs to replace the arcsine law \(\mu _{as}\) by the semicircle law \(\mu _{sc}\), which is the limiting spectral measure of \(G_N^\infty \). \(\square \)

Finally, we prove Theorem 1.3. The ideas are similar to the proof of [23], Theorem 1.2. We begin with a lemma:

Lemma 2.3

For any \(0<c_1<d_1<c_2<d_2<\cdots<c_k<d_k<\infty \), let \(I_l=(c_l,d_l)\).

Then in case (3) of Theorem 1.1, the numbers of points \(N^{-\alpha }{\mathfrak {a}}(i)\), \(i\in [N]\), falling in \((I_l)_{l=1}^k\) converge to those of a Poisson point process on \([0,\infty )\) with intensity \(\frac{C}{\alpha x^{\frac{1}{\alpha }+1}}\).

In case (4) of Theorem 1.1, the numbers of points \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\), \(i\in [N]\), falling in \((I_l)_{l=1}^k\) converge to those of a Poisson point process on \([0,\infty )\) with intensity \(\frac{C\beta }{ x^{\beta +1}}\).

This lemma is close to Proposition 1 of [23]. Its verification is a generalization of the computations in (2.13) and (2.15) and can be found, for example, in [16], Theorem 2.3.1.
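
As a hedged Monte Carlo check of the case (4) statement, take \(\beta =1\) and \(C=1\): the number of points \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\) in \((1,\infty )\) should be approximately Poisson with mean \(\int _1^\infty C\beta x^{-\beta -1}dx=C\):

```python
# Counts of rescaled potentials in (1, infty) are asymptotically Poisson(C).
import numpy as np

rng = np.random.default_rng(4)
N, beta, reps = 5000, 1.0, 2000
counts = [
    int(np.sum(N ** (-1.0 / beta) * rng.uniform(size=N) ** (-1.0 / beta) > 1.0))
    for _ in range(reps)
]
print(np.mean(counts), np.var(counts))  # both near 1 for a Poisson(1) limit
```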

Proof of Theorem 1.3

In case (3) of Theorem 1.3, from an elementary computation, we see that for any \(\epsilon >0\) and \(\omega '>0\), with probability \(1-\omega '\) there are only a bounded (in N, depending on \(\omega '\)) number of \(i\in [N]\) such that \(N^{-\alpha }{\mathfrak {a}}(i)>\epsilon \); with probability \(1-\omega '\) the configuration \((\lambda _i^{(N)}=N^{-\alpha }{\mathfrak {a}}(i))_{i\in [N]}\) satisfies the assumptions of Lemma 2.1 (3); and any i such that \(N^{-\alpha }{\mathfrak {a}}(i)>\epsilon \) must satisfy \(i\in [N^c,N-N^c]\) for a small c. In this case, the computations in Lemma 2.1 (3) provide us with a one-to-one correspondence between the large-N limits of the values \(N^{-\alpha }{\mathfrak {a}}(i)>\epsilon \) and the large-N limits of the large eigenvalues of \(H_N^\alpha \), via the map \(x\mapsto \sqrt{x^2+4}\). This completes the proof.

In case (4) of Theorem 1.3, the idea of the proof is similar to [23], Theorem 1.2. We claim that the large-N limit of the configuration of \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\) precisely corresponds to the large-N limit of the largest eigenvalues of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \). This correspondence has been verified for the top eigenvalue in the proof of Theorem 1.1, and repeating that proof for the second, third, etc. largest points in the limit (considering approximate eigenfunctions supported on only one index and using perturbation to show existence of an eigenvalue) shows that each of these limit points corresponds to the \(N\rightarrow \infty \) limit of an eigenvalue of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \); we have thus identified k distinct values as the \(N\rightarrow \infty \) limits of eigenvalues of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \). To show that they are indeed the largest k limiting eigenvalues of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \), consider the k indices i at which \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\) achieves the k largest values over \(i\in [1,N]\), and consider \(H_N^{\alpha ,k}\), which is \(H_N^\alpha \) with these k rows and columns removed. Then the largest eigenvalue of \(N^{\alpha -\frac{1}{\beta }}H_N^{\alpha ,k}\) converges to the \((k+1)\)-st largest point in the limiting configuration of \(N^{-\frac{1}{\beta }}{\mathfrak {a}}(i)\). Meanwhile, by interlacing, this limit is at least as large as the \((k+1)\)-st largest eigenvalue of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \). Thus, we have set up the correspondence between the top k eigenvalues of \(N^{\alpha -\frac{1}{\beta }}H_N^\alpha \) and the largest k points of the Poisson point process, for each k. \(\square \)

3 Proof of Theorem 1.5

The proof of Theorem 1.5 is essentially the same as that of Theorem 1.1, except that we need to prove the following result on orthogonally perturbed matrices.

Lemma 3.1

Recall the deterministic matrix \(G_N^\infty \) defined in (1.6). Let

$$\begin{aligned} \Lambda _N={\text {diag}}(\lambda _1^{(N)},\ldots ,\lambda _N^{(N)}), \end{aligned}$$

and let \(U_N\) be a Haar distributed orthogonal matrix. Let \(M_N:=\max _{i\in [N]}\lambda _i^{(N)}>0\). Assume that for any \(\epsilon >0\) there are only finitely many \(i\in [1,N]\) (a number that does not grow with N) such that \(|\lambda _i^{(N)}|>\epsilon \). Assume that

$$\begin{aligned} \lim _{N\rightarrow \infty }M_N=M>0. \end{aligned}$$

Then as \(N\rightarrow \infty \), the largest eigenvalue of

$$\begin{aligned} G_N^\infty +U_N\Lambda _N U_N^T \end{aligned}$$

converges almost surely to

$$\begin{aligned} {\left\{ \begin{array}{ll} 2, \quad M<1,\\ M+\frac{1}{M},\quad M\ge 1. \end{array}\right. } \end{aligned}$$
(3.1)

Proof

Consider the matrix \(G_N^\beta \) defined as

$$\begin{aligned} G_N^\beta =\frac{1}{\sqrt{N\beta }} \begin{pmatrix} N(0,2) &{}\quad \chi _{(N-1)\beta } &{}\quad 0&{}\quad \cdots &{}\quad \cdots &{}\quad 0\\ \chi _{(N-1)\beta }&{}\quad N(0,2) &{}\quad \chi _{(N-2)\beta }&{}\quad 0 &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad \cdots &{}\quad \chi _{3\beta }&{}\quad N(0,2) &{}\quad \chi _{2\beta }&{}\quad 0\\ 0&{}\quad \cdots &{}\quad 0 &{}\quad \chi _{2\beta } &{}\quad N(0,2) &{}\quad \chi _\beta \\ 0 &{}\quad \cdots &{}\quad \cdots &{}\quad 0&{}\quad \chi _\beta &{}\quad N(0,2) \end{pmatrix}, \end{aligned}$$
(3.2)

where the \(N(0,2)\) entries are centered normal random variables with variance 2 and the \(\chi _\cdot \) entries are chi-distributed random variables with the indicated parameters, all independent. Thanks to the sub-Gaussian tails of these random variables, once we have proved that the top eigenvalue of \(G_N^\beta +U_N\Lambda _N U_N^T\) converges almost surely to some deterministic limit as \(N\rightarrow \infty \), the same conclusion holds for the top eigenvalue of \(G_N^\infty +U_N\Lambda _N U_N^T\).

The matrix \(G_N^\beta \) is the matrix representation of the Gaussian beta ensemble obtained by Dumitriu–Edelman [10]. In particular, the empirical measure of eigenvalues of \(G_N^\beta \) converges to the semicircle law on \([-2,2]\) as \(N\rightarrow \infty \), and the smallest and largest eigenvalues of \(G_N^\beta \) converge to \(-2\) and 2, respectively.
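
A hedged sampling sketch of (3.2), with the \(1/\sqrt{N\beta }\) normalization written above:

```python
# Sample the Dumitriu-Edelman tridiagonal model (3.2) and inspect its edges.
import numpy as np

def sample_G_beta(N, beta, rng):
    diag = rng.normal(0.0, np.sqrt(2.0), size=N)  # N(0, 2) entries
    dfs = np.arange(N - 1, 0, -1) * beta          # (N-1)beta, ..., beta
    off = np.sqrt(rng.chisquare(dfs))             # chi_{(N-1)beta}, ..., chi_beta
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return T / np.sqrt(N * beta)

rng = np.random.default_rng(5)
eigs = np.linalg.eigvalsh(sample_G_beta(1000, 2.0, rng))
print(eigs[0], eigs[-1])  # approximately -2 and 2
```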

Thanks to our assumption on the magnitudes of the \(\lambda _i\), we may assume without loss of generality that \(\Lambda _N\) has finite rank (that is, we set \(\lambda _i\) to be 0 if \(|\lambda _i|<c\); the top eigenvalue of the modified matrix differs by at most c from the top eigenvalue of the original matrix by Weyl's perturbation inequality, and we finally let \(c\rightarrow 0\)).

Since \(U_N\Lambda _N U_N^T\) is orthogonally invariant, we may use the main result of [5] to conclude that the top eigenvalue of \(G_N^\beta +U_N \Lambda _N U_N^T\) converges to (3.1) almost surely. Although the paper [5] requires that the spike matrix have eigenvalues independent of N, which is not assumed here, the result in [5] can still be applied, with the following modification: As the largest eigenvalue of the spiked matrix only depends on \(M_N^1\) but not on the rest, we perturb \(M_N^1\) to \(M+\epsilon \) (such that \(M+\epsilon >M_N^1\), and later let \(\epsilon \rightarrow 0\)) and apply [5]. The effect of this replacement on the top eigenvalue converges to 0 thanks to the Cauchy interlacing theorem. Then, by the previous argument, the top eigenvalue of \(G_N^\infty +U_N\Lambda _N U_N^T\) converges to the same limit. \(\square \)
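
The rank-one case of Lemma 3.1 can also be observed directly; a hedged sketch spiking \(G_N^\infty \) with \(Mvv^T\) for a uniformly random unit vector v, which matches the orthogonally invariant setting for a rank-one \(\Lambda _N\):

```python
# BBP-type transition (3.1): the top eigenvalue of G_N^infty + M v v^T is near
# 2 for M < 1 and near M + 1/M for M >= 1, when v is a random unit vector.
import numpy as np

rng = np.random.default_rng(6)
N = 1500
off = np.sqrt(np.arange(N - 1, 0, -1) / N)
G = np.diag(off, 1) + np.diag(off, -1)  # G_N^infty from (1.6)
for M in (0.5, 2.0):
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)              # Haar-random direction
    lam1 = np.linalg.eigvalsh(G + M * np.outer(v, v))[-1]
    pred = M + 1.0 / M if M >= 1.0 else 2.0  # the limit in (3.1)
    print(M, lam1, pred)
```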

Now we complete the proof of Theorem 1.5.

Proof

The proof of Theorem 1.5 is exactly the same as the proofs of Theorems 1.1 and 1.3; one simply needs to replace Lemma 2.1 by Lemma 3.1. The proof of the lower tails is exactly the same via the Hoffman–Wielandt inequality, so we concentrate on proving the upper tails.

For the lower tails, we follow exactly the same computation as in the proof of Theorem 1.1. For the upper tail in case (3), we only need to check that the assumptions of Lemma 3.1 are satisfied with probability \(1-o(1)\): Indeed, we may work under assumption (1.18), where \({\mathfrak {a}}\) has the heaviest tails, and deduce that

$$\begin{aligned} {\mathbb {P}}(\text {there exist } k \text { different } i\in [N]: N^{-\alpha }|{\mathfrak {a}}(i)|>\epsilon )\le C_\epsilon ^k\left( {\begin{array}{c}N\\ k\end{array}}\right) N^{-k}\le \frac{C_\epsilon ^k}{k!}\rightarrow 0,\quad k\rightarrow \infty , \end{aligned}$$
(3.3)

so that for any \(\epsilon >0\), we may assume there are a bounded (in N) number of indices i such that \(N^{-\alpha }|{\mathfrak {a}}(i)|>\epsilon \). Then by Lemma 3.1, the largest eigenvalue is governed by the largest value of the potentials \(N^{-\alpha }{\mathfrak {a}}(i),i\in [N].\) The distribution of this maximum was already derived in (2.4), (2.10) and (2.13). This completes the proof.

For the upper tail in case (2), we check that for any \(\epsilon >0\),

$$\begin{aligned} {\mathbb {P}}(\text {there exist two different } i\in [N]: N^{-\alpha }|{\mathfrak {a}}(i)|>\epsilon )&=O(N^2N^{-2\alpha \beta })\nonumber \\&=o(N^{1-\alpha \beta }) \end{aligned}$$
(3.4)

and hence negligible in the limit (1.27).

For the upper tail in case (1), we check that for any \(\epsilon >0\),

$$\begin{aligned} {\mathbb {P}}(\text {there exist two different } i\in [N]: N^{-\alpha }|{\mathfrak {a}}(i)|>\epsilon ) =O(N^2e^{-2CN^{\alpha \beta }}) \end{aligned}$$
(3.5)

and hence negligible in the limit (1.26). \(\square \)