1 Introduction

In the study of the Anderson model and random Schrödinger operators, the modulus of continuity of the Integrated Density of States (IDS) is well understood (see Kirsch and Metzger [35] for a comprehensive review). In dimensions greater than one, there are very few results on further smoothness of the IDS, even when the single site distribution is assumed to have more smoothness, except for the case of the Anderson model itself at high disorder (see for example Campanino and Klein [9], Bovier et al. [8], Klein and Speis [39], Simon and Taylor [51]).

In this paper we will show, in Theorems 3.4 and 4.4, that the IDS is almost as smooth as the single site distribution for a large class of continuous and discrete random operators. These are

$$\begin{aligned} H^\omega = H_0 + \sum _{n \in {\mathbb {Z}}^d} \omega _n u_n, \end{aligned}$$
(1.1)

on \(L^2({\mathbb {R}}^d)\) and

$$\begin{aligned} h^\omega = h_0 + \sum _{n \in \mathbb {G}} \omega _n P_n, \end{aligned}$$
(1.2)

on the separable Hilbert space \(\mathscr {H}\) and a countable set \(\mathbb {G}\). The operator \(h_0\) is a bounded self-adjoint operator and the \(\{P_n\}\) are finite rank projections. We specify the conditions on \(H_0, h_0, u_n, P_n\) and \(\omega _n\) in the following sections.

The IDS, denoted \({{\mathcal {N}}}(E)\), is the distribution function of a non-random measure obtained as the weak limit of a sequence of random atomic measures. The proof of the existence of such limits for various models of random operators has a long history. These results are well documented in the books of Carmona and Lacroix [10], Figotin and Pastur [46], Cycon et al. [18], Kirsch [33] and the reviews of Kirsch and Metzger [35], Veselić [57] and in a review for stochastic Jacobi matrices by Simon [49]. In terms of the projection valued spectral measures \(E_{H^\omega }, E_{h^\omega }\) associated with the self-adjoint operators \(H^\omega , h^\omega \), the function \({{\mathcal {N}}}(E)\) has an explicit expression, for the cases when \(h^\omega \), \(H^\omega \) are ergodic. For the model (1.1) it is given as

$$\begin{aligned} \frac{1}{\int u_0(x) dx} {\mathbb {E}}\Bigl [ tr\Bigl (u_0 E_{H^\omega }((-\infty , E])\Bigr )\Bigr ] \end{aligned}$$

and for the model (1.2) it turns out to be

$$\begin{aligned} \frac{1}{tr(P_0)} {\mathbb {E}}\bigg [ tr\bigg (P_0 E_{h^\omega } ((-\infty , E])\bigg ) \bigg ]. \end{aligned}$$

We note that by using the same symbol \({{\mathcal {N}}}\) for two different models we are abusing notation, but this abuse will not cause any confusion, as the contexts are clearly separated into different sections. The first of these expressions for the IDS is often called the Pastur–Shubin trace formula.

In the case of the model (1.1) in dimensions \(d \ge 2\), there are no results in the literature on the smoothness of \({{\mathcal {N}}}(E)\); our results are the first to show even continuity of the density of states (DOS), which is the derivative of \({{\mathcal {N}}}\) for almost every E. The results of Bovier et al. in [8] are quite strong for the Anderson model at large disorder, and it is not clear that their proof using supersymmetry extends to other discrete random operators.

In the one dimensional Anderson model, Simon and Taylor [51] showed that \({{\mathcal {N}}}(E)\) is \(C^\infty \) when the single site distribution (SSD) is compactly supported and is Hölder continuous. Subsequently, Campanino and Klein [9] proved that \({{\mathcal {N}}}(E)\) has the same degree of smoothness as the SSD. In the one dimensional strip, smoothness results were shown by Speis [53, 54], Klein and Speis [38, 39], Klein et al. [37], Glaffig [30]. For some non-stationary random potentials on the lattice, Krishna [41] proved smoothness for an averaged total spectral measure.

There are several results showing \({{\mathcal {N}}}(E)\) is analytic for the Anderson model on \(\ell ^2({\mathbb {Z}}^d)\). Constantinescu et al. [16] showed analyticity of \({{\mathcal {N}}}(E)\) when the SSD is analytic. The result of Carmona [10, Corollary VI.3.2] relaxed the condition on the SSD, requiring only fast exponential decay to get analyticity. In the case of the Anderson model over \(\ell ^2({\mathbb {Z}}^d)\) at large disorder, the results of Bovier et al. [8] give smoothness of \({{\mathcal {N}}}(E)\) when the Fourier transform h(t) of the SSD is \(C^\infty \) and the derivatives decay like \(1/t^\alpha \) for some \(\alpha > 1\) at infinity. They also give variants of these results; in particular, if the SSD is \(C^{n+d}\) then \({{\mathcal {N}}}(E)\) is \(C^n\) under mild conditions on its decay at \(\infty \). They also obtain some analyticity results. Acosta and Klein [1] showed that \({{\mathcal {N}}}(E)\) is analytic on the Bethe lattice for SSDs close to the Cauchy distribution. While all these results are valid in the entire spectrum, Kaminaga et al. [32] showed local analyticity of \({{\mathcal {N}}}(E)\) when the SSD has an analytic component in an interval, allowing for singular parts elsewhere, in particular for the uniform distribution. Analyticity results obtained by March and Sznitman [44] were similar to those of Campanino and Klein [9].

In all the above models, it is only when E varies in the pure point spectrum that regularity of \({{\mathcal {N}}}(E)\) beyond Lipschitz continuity has been shown. This condition that E has to be in the pure point spectrum may not have been explicitly stated, but it turns out to be a consequence of the assumptions on disorder or assumptions on the dimension in which the models were considered. For the Cauchy distribution in the Anderson model on \(\ell ^2({\mathbb {Z}}^d)\), Carmona and Lacroix [10] have a theorem showing analyticity in the entire spectrum. However, absence of pure point spectrum is only a conjecture in these models as of now. At the time of revision of this paper, two of us (Kirsch and Krishna [36]) could show that in the Anderson model on the Bethe lattice, analyticity of the density of states with the Cauchy distribution is valid at all disorders, as part of a more general result. This result in particular exhibits regularity of the density of states through the mobility edge in the Bethe lattice case.

In the case of random band matrices, with the random variables following a Gaussian distribution, Disertori and Lager [25], Disertori [22, 23], Disertori et al. [24] have smoothness results for an appropriately defined density of states. Recently Chulaevsky [11] proved infinite smoothness for non-local random interactions.

For one dimensional ergodic random operators, the IDS was shown to be log Hölder continuous by Craig and Simon [17]. Wegner, in the pioneering paper [58], proved Lipschitz continuity of the IDS for the Anderson model independent of disorder. Subsequently there have been numerous results giving the modulus of continuity of \({{\mathcal {N}}}(E)\) for independent random potentials, showing its Lipschitz continuity. Combes et al. in [14] showed that for random Schrödinger operators with independent random potentials, the modulus of continuity of \({{\mathcal {N}}}(E)\) is the same as that of the SSD. For non-i.i.d. potentials in higher dimensions there are some results on the modulus of continuity, for example that of Schlag [48], and that of Bourgain and Klein [7], who show log Hölder continuity of the distribution functions of outer measures for a large class of random and non-random Schrödinger operators. We refer to these papers for more recent results on the continuity of \({{\mathcal {N}}}(E)\) not given in the books cited earlier.

The idea of the proof of our Theorems is the following. Suppose we have a self-adjoint matrix \(A^\omega \) of size N with i.i.d. real valued random variables \(\{\omega _1, \dots , \omega _N\}\) on the diagonal, with each \(\omega _j\) following the distribution \(\rho (x)dx\). Then the average of a matrix element of the resolvent of \(A^\omega \) is given by

$$\begin{aligned} f(z) = \int (A^\omega - zI)^{-1}(i, i) ~ \prod _{k=1}^N \rho (\omega _k)d\omega _k, \end{aligned}$$

for any \(z \in {\mathbb {C}}^+\). Taking \(z = E+i\epsilon \) with \(\epsilon >0\), we see from the definitions that the function \((A^\omega - zI)^{-1}(i, i)\) can be written as a function of \(\vec {\omega } - E\vec {1}\) and \(\epsilon \), namely,

$$\begin{aligned}&F(\vec {\omega } - E \vec {1}, \epsilon ) = (A^\omega - zI)^{-1}(i, i), ~ \Phi (\vec {\omega }) = \prod _{i=1}^N\rho (\omega _i) \\&\vec {\omega } = (\omega _1, \omega _2, \dots , \omega _N), ~ \vec {1} = (1, 1, \dots , 1). \end{aligned}$$

Then it is clear that with \(*\) denoting convolution of functions on \({\mathbb {R}}^N\) and setting \(\tilde{F_\epsilon }(x) = F(-x, \epsilon )\),

$$\begin{aligned} {\mathbb {E}}\left( (A^\omega - zI)^{-1}(i, i) \right) = (\tilde{F_\epsilon } * \Phi )(E \vec {1}). \end{aligned}$$

Since convolutions are smoothing, we get the required smoothness as a function of E if one of the components \(\tilde{F_\epsilon }\) or \(\Phi \) is smooth on \({\mathbb {R}}^N\). Since we are assuming that each \(\rho \) has a degree of smoothness, which passes on to \(\Phi \), we get a smoothness result for operators with finitely many random variables having the above form.
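For intuition, this mechanism can be checked numerically in the simplest case \(N = 1\) (a sketch only, not part of the proof; the entry a, the bump density and all numerical parameters below are our own illustrative choices): averaging the resolvent of \(A^\omega = a + \omega \) over a smooth density \(\rho \) is a one-dimensional convolution, so a derivative in E can be moved onto \(\rho \) by integration by parts.

```python
import numpy as np

# N = 1 sketch: A^w = a + w, z = E + i*eps, so
# h(E) = E[ (A^w - z)^{-1} ] = \int rho(w) / (a + w - E - i*eps) dw,
# and integration by parts gives h'(E) = \int rho'(w) / (a + w - E - i*eps) dw.
a, eps = 0.3, 0.1
rho  = lambda w: 30 * w**2 * (1 - w)**2            # smooth density on (0, 1)
drho = lambda w: 60 * w * (1 - w) * (1 - 2 * w)    # its derivative

w = np.linspace(0.0, 1.0, 20001)
dw = w[1] - w[0]

def h(E):
    return np.sum(rho(w) / (a + w - E - 1j * eps)) * dw

def h_prime_ibp(E):
    # derivative in E transferred onto the density, as in the convolution argument
    return np.sum(drho(w) / (a + w - E - 1j * eps)) * dw

E, step = 0.7, 1e-4
fd = (h(E + step) - h(E - step)) / (2 * step)
print(abs(fd - h_prime_ibp(E)))   # small: the two derivatives agree
```

The agreement of the finite-difference derivative with the integrated-by-parts form is exactly the statement that smoothness of \(\rho \) becomes smoothness of the averaged resolvent in E.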

Let us remark here that it is in this step, which is crucial for further analysis, that we need a complete covering condition, even for finite dimensional compressions of our random operators, be they continuous or discrete.

If we were to replace \(A^\omega \) by an operator with infinitely many random variables \(\omega _i\), we would encounter the problem of concluding smoothing properties of “convolutions of” functions of infinitely many variables. This is an important difficulty that needs to be solved.

One of the interesting aspects of the operator \(H^\omega \) (or \(h^\omega \)) we are dealing with is that there is a sequence of operators (denoted by \(A^\omega _k\)), containing finitely many random variables \(\omega _i\), which converges to \(H^\omega \) (or \(h^\omega \)) in strong resolvent sense. Hence we can write the limit as a telescoping sum, namely,

$$\begin{aligned} (A^\omega - z)^{-1}(i,i) = (A_1^\omega - z)^{-1}(i,i) + \sum _{k=1}^\infty \Bigl [ (A_{k+1}^\omega - z)^{-1}(i,i) - (A_k^\omega - z)^{-1}(i,i) \Bigr ]. \end{aligned}$$

Since the operators appearing in the summands all contain finitely many \(\omega _i\) their averages over the random variables can be written as convolutions of functions of finitely many variables \(\omega _i\). Then, most of the work in the proof is to show that quantities of the form

$$\begin{aligned} \left| \int \bigl [ (A_{k+1}^\omega - z)^{-1}(i,i) - (A_k^\omega - z)^{-1}(i,i) \bigr ] \left( \sum _{j=1}^{N_{k+1}}\frac{\partial }{\partial \omega _j}\right) ^l \prod _{n=1}^{N_{k+1}} \rho (\omega _n)d\omega _n\right| \end{aligned}$$

with \(N_k\) growing at most as a fixed polynomial in k, are summable in k. This is the part where we use the fact that we are working in the localized regime, where it is possible to show that they are exponentially small in k.
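This telescoping scheme can be illustrated with a small numerical sketch (a one-dimensional toy operator of our own choosing, standing in for \(h^\omega \); the largest truncation plays the role of the infinite-volume operator): the partial sums reproduce the Green function exactly, and the increments decay rapidly with the truncation size.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 400                                  # proxy for the infinite volume
h = np.diag(rng.uniform(0.0, 1.0, M))    # random potential on the diagonal
idx = np.arange(M - 1)
h[idx, idx + 1] = h[idx + 1, idx] = 1.0  # nearest neighbour hopping

z = 0.5 + 0.5j                           # z in the upper half plane

def green00(n):
    # (0,0) entry of the resolvent of the finite-volume restriction h_{Λ_n}
    return np.linalg.inv(h[:n, :n] - z * np.eye(n))[0, 0]

sizes = [25, 50, 100, 200, M]
g = [green00(n) for n in sizes]

# the telescoping identity: first term plus increments gives the last term
telescoped = g[0] + sum(g[k + 1] - g[k] for k in range(len(g) - 1))
increments = [abs(g[k + 1] - g[k]) for k in range(len(g) - 1)]
print(abs(telescoped - g[-1]))   # an exact identity
print(increments)                # rapidly decaying in the volume
```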

For the discrete case the procedure is relatively straightforward and there are no major technical difficulties to overcome, but in the continuous case the infinite rank perturbations pose a problem, since the trace of the Borel–Stieltjes transform of the average spectral measures does not converge. We overcome this problem by renormalizing this transform appropriately. For our estimates to work, we have to use fractional moment bounds and also uniform bounds on the integrals of resolvents. Both of these are achieved because we have dissipative operators (up to a constant) whose resolvents can be written in terms of integrals of contraction semigroups.

As stated above, our proof is in the localized regime. The Anderson model was formulated by Anderson [5] who argued that there is no diffusion in these models for high disorder or at low energies. The corresponding spectral statement is that there is only pure point spectrum or only localization for these cases. In the one dimensional systems, where the results are disorder independent, localization was shown rigorously by Goldsheid et al. [31] for random Schrödinger operators and by Kunz and Souillard [43] for the Anderson model. For higher dimensional Anderson model the localization was proved simultaneously by Fröhlich et al. [26], Simon and Wolff [52], Delyon et al. [20] based on exponential decay shown by Fröhlich and Spencer [27] who introduced a tool called multiscale analysis in the discrete case. A simpler proof based on exponential decay of fractional moments was later given by Aizenman and Molchanov [4]. There are numerous improvements and extensions of localization results beyond these papers.

In the case of continuous models, Combes and Hislop [12, 14], Klopp [40], Germinet and Klein [28], Combes et al. [15], Bourgain and Kenig [6] and Germinet and De Bievre [29] provided proofs of localization for different types of models. The fractional moment method was first extended to the continuous case by Aizenman et al. [3] and later improved by Boutet de Monvel et al. [19].

We refer to Stollmann [55] for the numerous advances that followed on localization.

The rest of the article is divided into three parts. Section 2 has all the preliminary results, which will be used significantly for both the discrete and the continuous case. Section 3 will deal with the discrete case, where we use a method of proof which will be reused for the continuous case. The main result of Sect. 3 is Theorem 3.4, which in the case of the Anderson tight binding model proves the regularity of the density of states. Finally, in Sect. 4 we will deal with random Schrödinger operators; the main result there is Theorem 4.4.

2 Some Preliminary Results

In this section we present some general results that are at the heart of the proofs of our theorems. These are Theorems 2.1 and 2.2. The latter theorem, stated for functions, gives a bound of the form

$$\begin{aligned} \left| \int \left( \frac{1}{x-w} - \frac{1}{x -z}\right) f(x) dx\right| \le C_{f,s} |z-w|^s \end{aligned}$$

for a certain family of functions f. For operators, more work and more uniformity in f are needed.

The first theorem is quite general: it concerns random perturbations of self-adjoint operators and the smoothing, in E, of complex valued functions of such operators.

Theorem 2.1

Consider a self-adjoint operator A on a separable Hilbert space \(\mathscr {H}\) and let \(\{T_n\}_{n=1}^N, N < \infty \) be bounded positive operators such that \(\sum _{n=1}^N T_n = I\), where I denotes the identity operator on \(\mathscr {H}\). Suppose \(\{\omega _n, n=1, \dots , N\}\) are independent real valued random variables distributed according to \(\rho _n(x) dx\) and consider the random operators \(A^\omega = A + \sum _{n=1}^N \omega _n T_n\). If f is a complex valued function on the set of linear operators on \(\mathscr {H}\), such that \(f(A^\omega - E I)\) is a bounded measurable function of \((\omega _1, \dots , \omega _N, E)\), then \( h(E) = {\mathbb {E}}\big [ f(A^\omega - EI)\big ] \) satisfies \(h \in C^m({\mathbb {R}}) , ~~ \mathrm {if} ~~ \rho _n \in C^m({\mathbb {R}})\) and \(\rho _n^{(k)} \in L^1({\mathbb {R}}), ~n=1,2,\ldots ,N\) and \( 0\le k\le m\).

Proof

Using the conditions on \(\{T_n\}\) we see that \(A^\omega - EI = A + \sum _{n=1}^N (\omega _n - E) T_n\). Thus \( f(A^\omega - E I)\) is a bounded measurable function of the variables \((\omega _1 - E, \omega _2 - E, \dots , \omega _N -E)\), which form a point \(\vec {\omega } - E \vec {1}\) in \({\mathbb {R}}^N\), where \(\vec {1} = (1, \dots , 1)\); we write \(F(\vec {\omega }- E\vec {1}) = f(A^\omega - EI)\). Then the expectation can be written as

$$\begin{aligned} {\mathbb {E}}[f(A^\omega - EI)] = \int _{{\mathbb {R}}^N} F(\vec {\omega } - E \vec {1}) \Phi (\vec {\omega })d\vec {\omega } = \int _{{\mathbb {R}}^N} F(-( E \vec {1} -\vec {\omega })) \Phi (\vec {\omega })d\vec {\omega }, \end{aligned}$$

where we set \(\Phi (\vec {\omega }) = \prod _{n=1}^N \rho _n(\omega _n)\). Writing now \(g(\vec {x}) = F(-\vec {x})\) we see that

$$\begin{aligned} {\mathbb {E}}[f(A^\omega - EI)] = (g*\Phi )(E\vec {1}), \end{aligned}$$

where \(*\) denotes convolution in \({\mathbb {R}}^N\). The result now follows easily from the properties of convolution of functions on \({\mathbb {R}}^N\). \(\quad \square \)

For later use we note that if \(\nabla \) denotes the gradient operator on differentiable functions on \({\mathbb {R}}^N\) and \(\mathbf{D}\) is defined by \(\mathbf{D}\Phi = \nabla \Phi \cdot \vec {1} = \sum _{j=1}^N \frac{\partial }{\partial x_j}\Phi \), then an integration by parts yields

$$\begin{aligned} \frac{d^\ell }{dE^\ell }h(E)=\frac{d^\ell }{dE^\ell }( g*\Phi )(E \vec {1}) = (g * (\mathbf{D}^\ell \Phi ))(E\vec {1}). \end{aligned}$$
(2.1)
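Identity (2.1) can be sanity checked numerically in the smallest nontrivial case \(N = 2\) (an illustrative sketch; the matrix A, the choice \(f(M) = ((M - iI)^{-1})(1,1)\), the density, and all parameter values are ours, not from the Theorem): the finite-difference derivative of h(E) agrees with the convolution against \(\mathbf{D}\Phi \).

```python
import numpy as np

# N = 2: A^w = A + w1*T1 + w2*T2 with T1 = diag(1,0), T2 = diag(0,1), T1 + T2 = I,
# A has off-diagonal entry b, and f(M) = ((M - iI)^{-1})(1,1) is bounded.
n = 400
w = (np.arange(n) + 0.5) / n                     # midpoint grid on (0, 1)
W1, W2 = np.meshgrid(w, w, indexing="ij")
dA = (1.0 / n) ** 2

rho  = lambda t: 30 * t**2 * (1 - t)**2          # smooth density on (0, 1)
drho = lambda t: 60 * t * (1 - t) * (1 - 2 * t)  # its derivative
b = 0.5

def f_val(E):
    # (1,1) entry of (A^w - E*I - i*I)^{-1}, via the closed form for 2x2 inverses
    d1, d2 = W1 - E - 1j, W2 - E - 1j
    return d2 / (d1 * d2 - b**2)

def h(E):                                        # h(E) = (g * Phi)(E*1)
    return np.sum(f_val(E) * rho(W1) * rho(W2)) * dA

def h_prime(E):                                  # (g * (D Phi))(E*1)
    DPhi = drho(W1) * rho(W2) + rho(W1) * drho(W2)
    return np.sum(f_val(E) * DPhi) * dA

E, step = 0.4, 1e-4
fd = (h(E + step) - h(E - step)) / (2 * step)
print(abs(fd - h_prime(E)))   # small: the two sides of (2.1) with l = 1 agree
```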

Remark 2.1

This theorem clarifies why the complete covering condition is needed in our main results for the discrete and the continuous models. The covering property is needed even for obtaining smoothness for finitely many random perturbations of a self-adjoint operator, while such a property is not needed for modulus of continuity results. We are unsure at the moment whether this condition can be relaxed.

Let A, B be self-adjoint operators and let \(F_1, F_2\) be bounded non-negative operators on a separable Hilbert space \(\mathscr {H}\). For \(X \in \{A, B\}, ~ z \in {\mathbb {C}}^+\), set,

$$\begin{aligned} R(X, x, y, z) = (X+xF_1 + yF_2 -z)^{-1} \end{aligned}$$

and

$$\begin{aligned} R(X, x, z) = (X+xF -z)^{-1},~~ F = F_1 + F_2 \end{aligned}$$

for the following Theorem. For the rest of the paper, by a smooth indicator function on an interval (a, b) we mean a smooth function which is one on \([c, d] \subset (a, b)\) and vanishes on \({\mathbb {R}}{\setminus } (a, b)\), with \(c - a + b-d\) as small as one wishes.

Theorem 2.2

Let \(A, B, F_1, F_2, F, z\) and \(\mathscr {H}\) be as above. Suppose \(\rho _1, \rho _2\) are compactly supported functions on \({\mathbb {R}}^+\) such that their derivatives are \(\tau \)-Hölder continuous and their supports are contained in \((0, \mathbf{R})\). Let \(\chi _\mathbf{R}\) denote a smooth indicator function of the set \((0, 2\mathbf{R}+1)\) and let \(\phi _\mathbf{R}(x) = \chi _\mathbf{R}(x+ \frac{5}{2}\mathbf{R}+1)\). Then for any \(0< s < \tau \) and some constant \(\Xi \) (depending upon \(\rho _1, \rho _2, s, \tau \) but independent of \(z, A, F_1, F_2\)),

  1.
    $$\begin{aligned}&\displaystyle {\left\Vert \int F^\frac{1}{2}\bigg ( R(A, x_1, x_2, z) - R(B, x_1, x_2, z) \bigg )F^\frac{1}{2}\rho _1(x_1) \rho _2(x_2) dx_1 dx_2 \right\Vert } \nonumber \\&\quad \le \displaystyle \Xi {\int \left\Vert F^\frac{1}{2}\bigg (R(A, x_1, x_2, z) - R(B, x_1, x_2, z) \bigg )F^\frac{1}{2}\right\Vert ^s}, \nonumber \\&\qquad \times \phi _\mathbf{R}(x_1 )\phi _\mathbf{R}(x_2) dx_1 dx_2 . \end{aligned}$$
    (2.2)
  2.

    Specializing to the case when \(F_1 = F_2, x_1 = x_2 = x/2\) we have

    $$\begin{aligned}&\displaystyle {\left\Vert \int F^\frac{1}{2}\bigg ( R(A, x, z) - R(B, x, z) \bigg )F^\frac{1}{2}\rho _1(x) dx \right\Vert } \nonumber \\&\quad \le \displaystyle \Xi {\int \left\Vert F^\frac{1}{2}\bigg (R(A, x, z) - R(B, x, z) \bigg )F^\frac{1}{2}\right\Vert ^s \phi _\mathbf{R}\left( x \right) dx }. \end{aligned}$$
    (2.3)

Remark 2.3

The integrals appearing in (2.2) and (2.3) are viewed as operators in the sense of direct integrals (see [47, Theorem XIII.85]). This is the case because \(X+ x_1 F_1+x_2 F_2\) is decomposable on

$$\begin{aligned} L^2\big (\mathbb {R}^2,\prod _i\rho (x_i)dx_i, \mathcal {H}\big ). \end{aligned}$$

Hence all the integrals of this operator valued function, that appear in the proof, are well-defined in the sense of direct integral representation [42].

Proof

We define

$$\begin{aligned} A^{t}=A+t~F, ~~ B^{t}=B+ t~F,\qquad \forall -2\mathbf{R}-1<t<-2\mathbf{R}. \end{aligned}$$

Then, we have the equality,

$$\begin{aligned} A + x_1 F_1 + x_2 F_2=A^t +\left( \frac{x_1-x_2}{2}\right) (F_1 - F_2)+\left( \frac{x_1 + x_2}{2}-t\right) F . \end{aligned}$$
(2.4)

Using the resolvent equation, we have, with \(F_- = F_1 - F_2\),

$$\begin{aligned}&R(A, x_1, x_2, z) =\left( A^{t}+\left( \frac{x_1-x_2}{2}\right) F_- -z\right) ^{-1}\nonumber \\&\quad -\left( \frac{x_1+x_2}{2}-t\right) R(A, x_1, x_2, z)F \left( A^{t}+\left( \frac{ x_1 -x_2}{2}\right) F_- -z\right) ^{-1} \end{aligned}$$
(2.5)

which can be re-written (using the notation \({\tilde{A}}^{t}=A^{t}+\left( \frac{x_1 - x_2}{2}\right) F_- \)) as

$$\begin{aligned}&\sqrt{F}R(A, x_1, x_2, z)\sqrt{F}=\frac{1}{\frac{x_1+x_2}{2}-t}I\nonumber \\&\qquad -\frac{1}{\left( \frac{x_1+x_2}{2}-t\right) ^2}\left( \frac{1}{\frac{x_1+x_2}{2}-t} I+\sqrt{F} \left( {\tilde{A}}^{t}-z\right) ^{-1} \sqrt{F}\right) ^{-1}. \end{aligned}$$
(2.6)

(Here I is the identity operator on the range of \(\sqrt{F}\).) Similar relations hold for B, where \(B^t, {\tilde{B}}^t\) are defined by replacing A with B in Eqs. (2.4)–(2.6). We set

$$\begin{aligned} {\tilde{R}}^t_{A, z} = \sqrt{F}({\tilde{A}}^t -z)^{-1}\sqrt{F}, ~~ {\tilde{R}}^t_{B, z} = \sqrt{F}({\tilde{B}}^t -z)^{-1}\sqrt{F}. \end{aligned}$$

Then using Eq. (2.6) we get the relation,

$$\begin{aligned}&\int \sqrt{F}(R(A, x_1, x_2, z)-R(B, x_1, x_2, z))\sqrt{F} ~\rho _1(x_1)\rho _2(x_2) dx_1 dx_2 \nonumber \\&\quad =\int \left[ \left( \frac{1}{\frac{x_1+x_2}{2}-t} I+ {\tilde{R}}^{t}_{A,z}\right) ^{-1}-\left( \frac{1}{\frac{x_1+x_2}{2}-t} I+{\tilde{R}}^{t}_{B,z}\right) ^{-1} \right] \nonumber \\&\qquad \frac{1}{\left( \frac{x_1+x_2}{2}-t\right) ^2}\rho _1(x_1)\rho _2(x_2) dx_1 dx_2 \nonumber \\&\quad =2\int \left[ \left( \gamma I+ {\tilde{R}}^{t}_{A,z}\right) ^{-1}-\left( \gamma I+{\tilde{R}}^{t}_{B,z}\right) ^{-1} \right] \nonumber \\&\quad \qquad \rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma d\eta \end{aligned}$$
(2.7)

where \(\gamma =\left( \frac{x_1+x_2}{2}-t\right) ^{-1}\) and \(\eta =\frac{x_1-x_2}{2}\). For X self-adjoint, \({\tilde{R}}^{t}_{X,z}\) is an operator valued Herglotz function and its imaginary part is a positive operator for \(\mathfrak {I}(z) >0\). Hence the operators \( \left( \gamma I+ {\tilde{R}}^{t}_{X,z}\right) \) generate strongly continuous one parameter semi-groups, and we can apply Lemma A.3 to the \(\gamma \) integral, and then do the \(\eta \) integral to get

$$\begin{aligned}&\int \left[ \left( \gamma I+ {\tilde{R}}^{t}_{A,z}\right) ^{-1}-\left( \gamma I+{\tilde{R}}^{t}_{B,z}\right) ^{-1} \right] \nonumber \\&\qquad \rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma d\eta \nonumber \\&\quad =-\int \left[ \int _0^\infty \left( e^{i w\left( \gamma I+ {\tilde{R}}^{t}_{A,z}\right) }-e^{i w\left( \gamma I+ {\tilde{R}}^{t}_{B,z}\right) }\right) dw \right] \nonumber \\&\qquad \rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma d\eta \nonumber \\&\quad =-\int \int _0^\infty \left( e^{i w {\tilde{R}}^{t}_{A,z}}-e^{i w {\tilde{R}}^{t}_{B,z}}\right) ~ e^{i \gamma w }\rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma ~dw d\eta , \end{aligned}$$
(2.8)

which can be bounded as

$$\begin{aligned}&\bigg \Vert \int \int _0^\infty \left[ e^{i w {\tilde{R}}^{t}_{A,z}}-e^{i w {\tilde{R}}^{t}_{B,z}}\right] \nonumber \\&\qquad e^{i \gamma w }\rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma ~dw d\eta \bigg \Vert \nonumber \\&\qquad \le \int \left\Vert \left( e^{i w {\tilde{R}}^{t}_{A,z}}-e^{i w {\tilde{R}}^{t}_{B,z}}\right) \right\Vert \nonumber \\&\left| \int e^{ i\gamma w }\rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma \right| dw d\eta . \end{aligned}$$
(2.9)

The assumption we made on the supports of \(\rho _1, \rho _2\) implies that \(-\frac{\mathbf{R}}{2}<\eta <\frac{\mathbf{R}}{2}\), and the choice \({-2\mathbf{R}-1<t<-2\mathbf{R}}\) implies \(-\frac{5}{2}\mathbf{R}-1<t\pm \eta <-\frac{3\mathbf{R}}{2}\). This implies that

$$\begin{aligned} \bigg \{\gamma :\psi _{t,\eta }(\gamma )\ne 0, -\frac{5}{2}\mathbf{R}-1<t\pm \eta <-\frac{3\mathbf{R}}{2}\bigg \} \subset \bigg (\frac{2}{2+7\mathbf{R}},\frac{2}{3\mathbf{R}}\bigg ), \end{aligned}$$

where \(\psi _{t,\eta }(\gamma ) = \rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) \). Thus for fixed \(t, \eta \), the function \(\psi _{t,\eta }(\gamma )\) is of compact support and has a \(\tau \)-Hölder continuous derivative as a function of \(\gamma \), for the \(\tau \) stated in the Theorem. Also, the derivative of \(\psi _{t,\eta }\) is uniformly \(\tau \)-Hölder continuous and the constant in the corresponding bound is uniform in \(t,\eta \), which follows from the support properties of \(\psi _{t,\eta }\) and the bounds on \(t,\eta \). Therefore, if we denote the Fourier transform of \(\psi _{t,\eta }(-\gamma )\) by \(\widehat{\psi _{t,\eta }}\), then standard Fourier analysis gives the bound,

$$\begin{aligned}&\left| \int e^{i \gamma w }\rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma \right| \\&\qquad \le \frac{C}{|w|^{1+\tau }}\left( \left\Vert |w|^{1+\tau } \widehat{\psi _{t,\eta }}(w)\right\Vert _\infty \right) \le \frac{{\tilde{C}}}{|w|^{1+\tau }} ~~ for~|w|\gg 1 \end{aligned}$$

for some \({\tilde{C}}\) that depends on \(\rho _1, \rho _2\) but not on \(t, \eta \).

Again using the bounds on \(t,\eta \) and \(\gamma \), we see that for small |w| the \(\gamma \) integral is bounded uniformly in \(t,\eta \) by the \(L^\infty \) norms of \(\rho _1\) and \(\rho _2\), and hence \({\tilde{C}}\) is \((t, \eta )\)-independent for all w.

On the other hand, using Lemma A.2, we have

$$\begin{aligned} \left\Vert e^{ iw {\tilde{R}}^{t}_{A,z}}-e^{i w {\tilde{R}}^{t}_{B,z}} \right\Vert \le 2^{1-s}|w|^s\left\Vert {\tilde{R}}^{t}_{A,z}-{\tilde{R}}^{t}_{B,z}\right\Vert ^s \end{aligned}$$

for \(0<s<1\). By choosing \(s< \tau /2 \) and using the above bounds in (2.9) we have

$$\begin{aligned}&\bigg \Vert \int \int _0^\infty \left( e^{ i w {\tilde{R}}^{t}_{A,z}}-e^{i w {\tilde{R}}^{t}_{B,z}}\right) \nonumber \\&\qquad e^{ i\gamma w }\rho _1\left( t+\frac{1}{\gamma }+\eta \right) \rho _2\left( t+\frac{1}{\gamma }-\eta \right) d\gamma ~dw d\eta \bigg \Vert \end{aligned}$$
(2.10)
$$\begin{aligned}&\qquad \le {\hat{C}} \left( 1+\int _1^\infty \frac{1}{w^{1+\tau -s}}dw \right) \int \left\Vert {\tilde{R}}^{t}_{A,z}-{\tilde{R}}^{t}_{B,z}\right\Vert ^{s} d\eta . \end{aligned}$$
(2.11)
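The bound from Lemma A.2 used above interpolates between the contractivity bound \(\Vert e^{iw{\tilde{R}}^{t}_{A,z}}-e^{iw{\tilde{R}}^{t}_{B,z}}\Vert \le 2\) and the Lipschitz bound \(\le |w|\Vert {\tilde{R}}^{t}_{A,z}-{\tilde{R}}^{t}_{B,z}\Vert \). Its scalar analogue can be checked numerically (an illustration with our own random samples in the closed upper half plane, where \(w \mapsto e^{iwa}\) is contractive for \(w \ge 0\); the operator statement is what the proof actually uses):

```python
import numpy as np

rng = np.random.default_rng(1)
s = 0.4          # any exponent 0 < s < 1

violations = 0
for _ in range(1000):
    # sample points with non-negative imaginary part, as for Herglotz functions
    a = rng.normal() + 1j * abs(rng.normal())
    b = rng.normal() + 1j * abs(rng.normal())
    w = 5.0 * abs(rng.normal())
    lhs = abs(np.exp(1j * w * a) - np.exp(1j * w * b))
    # lhs <= min(2, w|a-b|) <= 2^(1-s) * (w|a-b|)^s, the scalar form of the lemma
    rhs = 2 ** (1 - s) * (w * abs(a - b)) ** s
    if lhs > rhs + 1e-12:
        violations += 1
print(violations)   # 0
```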

The integral we started with is independent of t, so we can integrate it with respect to the Lebesgue measure on an interval of length one. Therefore, combining the inequalities (2.7)–(2.10) and integrating t over an interval of length 1 yields

$$\begin{aligned}&\left\Vert \int \sqrt{F}(R(A, x_1, x_2,z) -R(B, x_1, x_2, z)) \sqrt{F} ~\rho _1(x_1)\rho _2(x_2) dx_1 dx_2 \right\Vert \\&\quad = \int _{-2R-1}^{-2R} \left\Vert \int \sqrt{F}(R(A, x_1, x_2,z) -R(B, x_1, x_2, z)) \sqrt{F} ~\rho _1(x_1)\rho _2(x_2) dx_1 dx_2 \right\Vert dt \\&\quad \le C \int ^{-2R}_{-2R-1} \int _{-\frac{R}{2}}^{\frac{R}{2}} \left\Vert {\tilde{R}}^{t}_{A,z}-{\tilde{R}}^{t}_{B,z}\right\Vert ^{s} d\eta dt \\&\quad \le C \int \int \bigg \Vert \sqrt{F}\left( A+{\hat{x}}_1 F_1+{\hat{x}}_2 F_2 -z\right) ^{-1}\sqrt{F}\\&\qquad - \sqrt{F}\left( B +{\hat{x}}_1 F_1 +{\hat{x}}_2 F_2-z\right) ^{-1}\sqrt{F}\bigg \Vert ^s \phi _\mathbf{R}({\hat{x}}_1)\phi _\mathbf{R}({\hat{x}}_2) ~ d{\hat{x}}_1 d{\hat{x}}_2. \end{aligned}$$

For the last inequality we used the definition of \({\tilde{R}}^{t}_{X,z}\) and changed variables \({\hat{x}}_1=t+\eta , ~ {\hat{x}}_2=t-\eta \), along with a slight increase in the range of integration so that the bumps \(\phi _\mathbf{R}\) have their supports in \((-\frac{5}{2}\mathbf{R}-1, -\frac{\mathbf{R}}{2})\). \(\quad \square \)

3 The Discrete Case

Let \(\mathbb {G}\) denote an undirected connected graph with a graph metric d. Let \(\{x_n\}_n\) denote an enumeration of \(\mathbb {G}\) satisfying \(d(\Lambda _N,x_{N+1})=1\) for any \(N \in {\mathbb {N}}\), where

$$\begin{aligned} \Lambda _N=\{x_n: n\le N\}, ~~ \Lambda _\infty = \mathbb {G}, \end{aligned}$$
(3.1)

and

$$\begin{aligned} \liminf _{N\rightarrow \infty } \frac{d(x_0,\mathbb {G}{\setminus }\Lambda _N)}{g(N)} = r_\mathbb {G}>0, \end{aligned}$$
(3.2)

for some increasing function g on \({\mathbb {R}}^+\). Typically, we will have \(g(N) = N^{1/d}\) for \(\mathbb {G}= {\mathbb {Z}}^d\) and \(g(N) = \log _{K}(N) \) for the Bethe lattice with connectivity \(K > 2\). Henceforth for indexing \(\mathbb {G}\) we will say \(n \in \mathbb {G}\) to mean \(x_n \in \mathbb {G}\).

Let \(\mathscr {H}\) be a complex separable Hilbert space equipped with a countable family \(\{P_n\}_{n\in \mathbb {G}}\) of finite rank orthogonal projections such that \(\sum _{n \in \mathbb {G}} P_n = Id\), with the maximum of the ranks of the \(P_n\) finite; thus

$$\begin{aligned} \displaystyle {\mathscr {H}= \bigoplus _{n\in \mathbb {G}} Ran(P_n).} \end{aligned}$$

Let \(h_0\) denote a bounded self-adjoint operator on \(\mathscr {H}\) and consider the random operator stated in Eq. (1.2),

$$\begin{aligned} h^\omega =h_0+\sum _{n \in \Lambda _\infty } \omega _n P_n, \end{aligned}$$
(3.3)

where the random variables \(\omega _n\) satisfy Hypothesis 3.1 below. Given a finite subset \(\Lambda \subset \mathbb {G}\), we will denote \(P_{\Lambda }=\sum _{n\in \Lambda } P_n\), \(\mathscr {H}_\Lambda = P_{\Lambda }\mathscr {H}\) and

$$\begin{aligned} h^\omega _\Lambda =P_{\Lambda } h^\omega P_{\Lambda } \end{aligned}$$
(3.4)

denotes the restriction of \(h^\omega \) to \(\mathscr {H}_\Lambda \).

We abuse notation by using the symbol P for two different objects, \(P_n\) denoting the projections onto sites \(x_n \in \mathbb {G}\) and \(P_\Lambda \) denoting the sum of the \(P_n\) as \(x_n\) varies in \(\Lambda \), but the meaning will be clear from the context.

We have the following assumptions on the quantities involved in the model.

Hypothesis 3.1

We assume that the random variables \(\omega _n\) are independent and distributed according to densities \(\rho _n\) which are compactly supported in (0, 1) and satisfy \(\rho _n \in C^m((0, 1))\) for some \(m \in {\mathbb {N}}\) and

$$\begin{aligned} {\mathcal {D}}= \sup _{n} \max _{\ell \le m} \Vert \rho _n^{(\ell )}\Vert _\infty < \infty . \end{aligned}$$
(3.5)

We note that as long as \(\rho _n\in C^m((a,b))\) for some \(-\infty<a<b<\infty \), a scaling and translation will move its support to (0, 1). So our support condition is no loss of generality.

Hypothesis 3.2

A compact interval \(J\subset \subset {\mathbb {R}}\) is said to be in the region of localization for \(h^\omega \) with exponent \(0< s < 1\) and rate of decay \(\xi _s>0\), if there exists \(C>0\) such that

$$\begin{aligned} \sup _{\mathfrak {R}(z) \in J, \mathfrak {I}(z) >0}\mathop {\mathbb {E}}_\omega \left[ \left\Vert P_n(h^\omega - z)^{-1}P_k\right\Vert ^s\right] \le C e^{-\xi _s d(n,k)} \end{aligned}$$
(3.6)

for any \(n,k \in \mathbb {G}\). For the operators \(h_{\Lambda _K}^\omega \) exponential localization is defined with \(\Lambda , h^\omega _{\Lambda _K},\xi _{s,\Lambda _K}\) replacing \(\mathbb {G},h^\omega ,\xi _s\) respectively in the above bound.

We assume that for our models the inequality (3.6) holds for some \(\xi _s >0\), and that \(\xi _{s, \Lambda _K} \ge \xi _{s}\) for all \(\Lambda _{K}\) with \(K \ge N\). We also assume that the constants \(C, \xi _s\) do not change if we replace the distribution \(\rho _n\) with one of its derivatives at finitely many sites n.

Remark 3.3

For large disorder models one can get explicit values for \(\xi _s\) from the papers of Aizenman and Molchanov [4] or Aizenman [2]. For example, for the Anderson model on \(\ell ^2({\mathbb {Z}}^d)\) with disorder parameter \(\lambda \gg 1\), typically \(\xi _s = -s\ln \frac{C_{s,\rho } 2d }{\lambda }\), for some constant \(C_{s,\rho } < \infty \) that depends on the single-site density \(\rho \) and is independent of \(\Lambda \). So \(\xi _{s,\Lambda } = \xi _s >0\) for large enough \(\lambda \). Similarly, for the Bethe lattice with connectivity \(K+1\), \(\xi _{s, \Lambda } = \xi _s = - s \ln \frac{C_{s,\rho } (K+1)}{\lambda }\). Going through Lemma 2.1 of their paper, and tracing through the constants, we see that our assumption about changing the distribution at finitely many sites is valid.
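To make the sign condition concrete (illustrative numbers only; the value used for \(C_{s,\rho }\) below is a hypothetical stand-in, since the actual constant depends on the single-site density):

```python
import math

# xi_s = -s * ln(C_{s,rho} * 2d / lambda) for the Anderson model on Z^d;
# C is a made-up stand-in for the unknown constant C_{s,rho}.
s, d, C = 0.5, 3, 4.0

def xi(lam):
    return -s * math.log(C * 2 * d / lam)

for lam in (10, 100, 1000):
    print(lam, round(xi(lam), 3))   # becomes positive once lambda > C * 2d = 24
```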

Henceforth let \(E_A(\cdot )\) denote the projection valued spectral measure of a self-adjoint operator A. Our main goal in this section is to show that

$$\begin{aligned} {{\mathcal {N}}}(E)=\mathop {\mathbb {E}}_\omega \left[ tr(P_0E_{h^\omega }(-\infty ,E))\right] \qquad \end{aligned}$$

is m times differentiable in the region of localization, if \(\rho \) has a bit more than m derivatives, which means that the density of states (DOS) is \(m-1\) times differentiable. Our theorem is the following, where we tacitly assume that the spectrum \(\sigma (h^\omega )\) is a constant set a.s., a fact proved by Pastur [45] for a large class of random self-adjoint operators. While it may not be widely known, it is also possible to show the constancy of the spectrum for operators that are not ergodic but involve independent randomness; see for example Kirsch et al. [34]. In such non-ergodic cases, when there is no limiting eigenvalue distribution, our results are still valid for the spectral measures considered.

Theorem 3.4

Consider the random self-adjoint operators \(h^\omega \) given in Eq. (3.3) on the Hilbert space \(\mathscr {H}\) and a graph \(\mathbb {G}\) satisfying the condition (3.2) with \(g(N) = N^\alpha \), for some \(\alpha >0\). We assume that \(\omega _n\) is distributed with density \(\rho _n\) satisfying the Hypothesis 3.1 and, with m as in the Hypothesis, \(\rho _n^{(m)}\) is \(\tau \)-Hölder continuous for some \( 0< \tau < 1\). Assume that J is an interval in the region of localization for which the Hypothesis 3.2 holds for some \(0< s < \tau \). Then the function

$$\begin{aligned} {{\mathcal {N}}}(E)=\mathop {\mathbb {E}}_\omega \left[ tr(P_0 E_{h^\omega }(-\infty ,E))\right] \in C^{m-1}(J) \end{aligned}$$
(3.7)

and \({{\mathcal {N}}}^{(m)}(E)\) exists a.e. \(E \in J\).

Remark 3.5

  1.

    We stated the Theorem in this generality so that it applies to multiple models, such as the Anderson models on \({\mathbb {Z}}^d\) and on other lattices or graphs with the property that the number of points at a distance N from any fixed point grows polynomially in N. The models for which this Theorem is valid also include higher rank Anderson models, long range hopping with some restrictions, and models with off-diagonal disorder, to name a few. In all of these models, by including sufficiently high diagonal disorder, through a coupling constant \(\lambda \) on the diagonal part, we will have exponential localization for the corresponding operators via the Aizenman–Molchanov method. So this Theorem gives the regularity of the DOS in all such models. For the Bethe lattice and other countable sets for which g(N) grows like \(\ln (N)\), our results hold but the order of smoothness m that can be obtained is restricted by the localization length through a condition such as \(\xi _s > m \ln K\). So in this work we do not consider such settings.

  2.

    This Theorem also gives smoothness of the DOS in the region of localization for the intermediate disorder cases considered, for example, by Aizenman [2], who exhibited exponential localization for such models in part of the spectrum.

  3.

    In the case \(h^\omega \) is not the Anderson model, all these results are new, and it is not clear that the method of proof using supersymmetry, as done for the Anderson model at high disorder, will even work for these models.

  4.

    We note that in the proof we will take at most \(m-1\) derivatives of resolvent kernels in the upper half-plane and show their boundedness, yet we require that \(\rho ^{(m)}\) is \(\tau \)-Hölder continuous. The extra \(1+\tau \) ‘derivatives’ are needed for applying Theorem 2.2 to obtain the inequality (3.19) from the equality (3.18).

Proof

Since the orthogonal projection \(P_0\) is finite rank, we can write \(P_0 = \sum _{i=1}^r |\phi _i\rangle \langle \phi _i|\) using a set \(\{\phi _i\}\) of finitely many orthonormal vectors. Then we have,

$$\begin{aligned} {{\mathcal {N}}}(E) = \sum _{i=1}^r \mathop {\mathbb {E}}_\omega \left( \langle \phi _i, E_{h^\omega }((-\infty , E)) \phi _i \rangle \right) . \end{aligned}$$

The densities of the measures \(\langle \phi _i, E_{h^\omega }(\cdot ) \phi _i \rangle \) are bounded by Lemma A.4 for each \(i=1, \dots , r\). Hence \({{\mathcal {N}}}\) is differentiable almost everywhere, and its derivative, given almost everywhere by the boundary values

$$\begin{aligned} \frac{1}{\pi } \mathop {\mathbb {E}}_\omega \bigg ( tr\left( P_0\mathfrak {I}(h^\omega - E - i0)^{-1}\right) \bigg ) \end{aligned}$$

is bounded. The Theorem follows from Lemma A.1 once we show

$$\begin{aligned} \sup _{\mathfrak {R}(z) \in J, \mathfrak {I}(z) >0} \left| \frac{d^\ell }{dz^\ell } \mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega -z)^{-1})\right] \right| < \infty , \end{aligned}$$
(3.8)

for all \(\ell \le m-1\): such a bound implies that \(m-1\) derivatives of \(\eta \) are continuous and its mth derivative exists almost everywhere, because the \(h^\omega \) are bounded operators. The projection \(P_0\) is finite rank, which implies that the bounded operator valued analytic functions \(P_0(h^\omega - z)^{-1}, P_0(h_\Lambda ^\omega - z)^{-1}\) are trace class for \(z \in {\mathbb {C}}^+\). Therefore the linearity of the trace and the dominated convergence theorem together imply that

$$\begin{aligned} \mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega -z)^{-1}-P_0(h^\omega _{\Lambda }-z)^{-1})\right] \xrightarrow {\Lambda \rightarrow \mathbb {G}}0, \end{aligned}$$
(3.9)

compact uniformly in \({\mathbb {C}}^+\). For the rest of the proof we set \(h_K^\omega = h_{\Lambda _K}^\omega \) for ease of writing.

The convergence given in Eq. (3.9) implies that the telescoping sum,

$$\begin{aligned} \mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _{M}-z)^{-1})\right]= & {} \sum _{K=N}^{M} \bigg (\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _{K+1}-z)^{-1})\right] -\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _K-z)^{-1})\right] \bigg )\\&+\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _N-z)^{-1})\right] \end{aligned}$$

also converges compact uniformly, in \({\mathbb {C}}^+\) to

$$\begin{aligned} \mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega -z)^{-1})\right] , \end{aligned}$$

which implies that their derivatives of all orders also converge compact uniformly in \({\mathbb {C}}^+\).
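The passage from compact uniform convergence to convergence of all derivatives is the standard Cauchy estimate; a sketch, for analytic functions \(f_K \rightarrow f\) uniformly on a closed disc \(\overline{D(z,r)} \subset {\mathbb {C}}^+\):

$$\begin{aligned} f_K^{(\ell )}(z) - f^{(\ell )}(z) = \frac{\ell !}{2\pi i}\oint _{|w-z|=r} \frac{f_K(w)-f(w)}{(w-z)^{\ell +1}}\, dw, \qquad \big | f_K^{(\ell )}(z) - f^{(\ell )}(z)\big | \le \frac{\ell !}{r^{\ell }}\sup _{|w-z|=r}|f_K(w)-f(w)|. \end{aligned}$$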

Therefore the inequality (3.8) follows if we prove the following uniform bound, for all \( 0 \le \ell \le m-1\) and N large,

$$\begin{aligned} \sum _{K=N}^{\infty }\displaystyle {\sup _{\mathfrak {R}(z) \in J} \left| \frac{d^\ell }{dz^\ell }\bigg (\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _{K+1}-z)^{-1})\right] -\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _K-z)^{-1})\right] \bigg )\right| } < \infty . \end{aligned}$$
(3.10)

To this end we only need to estimate

$$\begin{aligned} \bigg |\frac{d^\ell }{dz^\ell }\mathop {\mathbb {E}}_\omega \left[ tr(P_0(h^\omega _{K+1}-z)^{-1}P_0)-tr(P_0(h^\omega _K-z)^{-1}P_0)\right] \bigg | \end{aligned}$$
(3.11)

for \(\mathfrak {R}(z) \in J\), where we used cyclicity of the trace to get an extra \(P_0\) on the right, and we set \(G^\omega _{M}(z)=P_0(h^\omega _{M}-z)^{-1}P_0, ~~ M \in {\mathbb {N}}\) for further calculations.

The function

$$\begin{aligned} f_\epsilon (\vec {\omega }-E\vec {1}) = tr(G^\omega _{K}(E+i\epsilon )) \end{aligned}$$

is a complex valued bounded measurable function on \({\mathbb {R}}^{K+1}\) for each fixed \(\epsilon >0\). Therefore we compute the derivatives in E of its expectation

$$\begin{aligned} h_\epsilon (E) = \mathop {\mathbb {E}}_\omega \big (f_\epsilon (\vec {\omega }-E\vec {1})\big ) = {\mathbb {E}}\big ( tr(G^\omega _{K}(E+i\epsilon ))\big ) \end{aligned}$$

using Theorem 2.1. This calculation gives in the notation of that Theorem,

$$\begin{aligned} \frac{d^\ell }{dE^\ell } \mathop {\mathbb {E}}_\omega \big ( tr(G^\omega _K(E+i\epsilon ))\big ) = \int tr(G^\omega _K( E + i\epsilon )) \mathbf{D}^\ell \Phi _K(\vec {\omega })d\vec {\omega }, \end{aligned}$$
(3.12)

where we set \(\displaystyle \Phi _K(\vec {\omega }) = \prod _{n \in \Lambda _K} \rho _n(\omega _n), ~ d\vec {\omega } = \prod _{n \in \Lambda _K} d\omega _n\).

It is not hard to see that for each \(0 \le \ell \le m-1\),

$$\begin{aligned} \int tr(G^\omega _K(E + i\epsilon )) \mathbf{D}^\ell \Phi _K(\vec {\omega })d\vec {\omega } =\int tr(G^\omega _K( E + i\epsilon )) \mathbf{D}^\ell \Phi _{K+1}(\vec {\omega })d\vec {\omega }, \end{aligned}$$
(3.13)

since the integrand \(tr(G^\omega _K( E + i\epsilon )) \) is independent of \(\omega _n, n \in \Lambda _{K+1}{\setminus } \Lambda _K\) and \(\rho _n\) satisfies \(\int \rho _n^{(j)}(x)dx = \delta _{j0}\). We set

$$\begin{aligned} R(\omega , K, E, \epsilon ) = tr\left( G^\omega _{K+1}(E+i\epsilon ) - G^\omega _K(E+i\epsilon )\right) \end{aligned}$$
(3.14)

to simplify writing. We may write the argument \(\omega \) of \(R(\omega , K, E, \epsilon )\) below in terms of the vector notation \(\vec {\omega }\) for uniformity as it is a function of the variables \(\{\omega _n, n \in \Lambda _{K+1}\}\).
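The normalization \(\int \rho _n^{(j)}(x)dx = \delta _{j0}\) used above is just the fundamental theorem of calculus for the compactly supported density \(\rho _n\):

$$\begin{aligned} \int _{{\mathbb {R}}} \rho _n^{(j)}(x)\, dx = \rho _n^{(j-1)}(x)\Big |_{-\infty }^{\infty } = 0 \quad \text {for } 1 \le j \le m, \qquad \int _{{\mathbb {R}}} \rho _n(x)\, dx = 1. \end{aligned}$$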

Then combining the Eqs. (3.12)–(3.13) inside the absolute value of the expression in Eq. (3.11) to be estimated, we have to consider the quantity, for \(K \ge N\),

$$\begin{aligned}&T_{K,\ell }(E,\epsilon ) = \frac{d^\ell }{dE^\ell } \mathop {\mathbb {E}}_\omega \left[ tr\big (G^\omega _{K+1}(E+i\epsilon ) - G^\omega _{K}(E+i\epsilon )\big )\right] \nonumber \\&\quad = \int _{{\mathbb {R}}^{K+1}} R(\vec {\omega }, K, E, \epsilon ) (\mathbf{D}^\ell \Phi _{K+1})(\vec {\omega })d\vec {\omega }. \end{aligned}$$
(3.15)

To prove the theorem we need to show that

$$\begin{aligned} \sum _{K=N}^\infty \sup _{E \in J, \epsilon >0} |T_{K, \ell }(E, \epsilon )| < \infty . \end{aligned}$$
(3.16)

Multinomial expansion of \(\displaystyle {\mathbf{D}^\ell = \bigg ( \sum _{n \in \Lambda _{K+1}} \frac{\partial }{\partial \omega _n} }\bigg )^\ell \) gives the relation

$$\begin{aligned} T_{K,\ell }(E, \epsilon )=\displaystyle { \sum _{\begin{array}{c} k_0+\dots +k_{K} = \ell \\ k_n \ge 0 \end{array}} \big (\begin{array}{c} \ell \\ k_0, \dots , k_{K} \end{array}\big )\int _{{\mathbb {R}}^{K+1}} R(\vec {\omega }, K, E, \epsilon ) \bigg (\prod _{n=0}^{K}\frac{\partial ^{k_n}}{\partial \omega _{n}^{k_n}} \rho _n(\omega _{n})d\omega _{n}\bigg )}. \end{aligned}$$
(3.17)

We use Fubini to interchange the trace and an integral over \(\omega _0\) to get

$$\begin{aligned}&T_{K,\ell }(E, \epsilon ) \nonumber \\&\quad =\displaystyle {\sum _{\begin{array}{c} k_0+\dots +k_{K} = \ell \\ k_n \ge 0 \end{array}} \big (\begin{array}{c} \ell \\ k_0, \dots , k_{K} \end{array}\big )} \int _{{\mathbb {R}}^{K}} tr\bigg (\int \big (G_{K+1}^\omega (E+i\epsilon ) - G_{K}^\omega (E+i\epsilon )\big ){\rho _0^{(k_0)}(\omega _0) d\omega _0}\bigg ) \nonumber \\&\qquad \times \bigg (\prod _{n\in \Lambda _{K+1}, n \ne 0} \rho _n^{(k_n)}(\omega _{n})d\omega _{n}\bigg ) . \end{aligned}$$
(3.18)

We take the absolute value of T and estimate the \(\omega _0\) integrals using Theorem 2.2, displaying explicitly the dependence on \(\rho \) or its derivatives in the constant \(\Xi \) appearing in that theorem, to get, for \(0< s < 1/2\) (the choice of s will become clear in Lemma 3.1),

$$\begin{aligned} |T_{K,\ell }(E, \epsilon )|\le & {} \displaystyle {\sum _{\begin{array}{c} k_0+\dots +k_{K} = \ell \\ k_n \ge 0 \end{array}} \big (\begin{array}{c} \ell \\ k_0, \dots , k_{K} \end{array}\big )} \Xi (\rho _0^{(k_0)}) tr(P_0)\nonumber \\&\times \int _{{\mathbb {R}}^{K}} \bigg (\int \Vert \big (G_{K+1}^\omega (E+i\epsilon ) - G_{K}^\omega (E+i\epsilon )\big )\Vert ^s \phi _\mathbf{R}(\omega _0) d\omega _0\bigg ) \nonumber \\&\times \big (\prod _{n\in \Lambda _{K+1}, n \ne 0} |\rho _n^{(k_n)}(\omega _{n})|d\omega _{n}\big ) . \end{aligned}$$
(3.19)

We set

$$\begin{aligned} \tilde{\rho _n} = \frac{\rho _n^{(k_n)}}{\Vert \rho _n^{(k_n)}\Vert _1}, ~~ n \ne 0, ~~ \tilde{\rho _0} = \frac{\phi _\mathbf{R}}{\Vert \phi _\mathbf{R}\Vert _1} \end{aligned}$$
(3.20)

and set, using the inequality (3.5), \(C_0=\max \{{\mathcal {D}}, \Vert \phi _\mathbf{R}\Vert _1\}\), where \({\mathcal {D}}\) is such that \(\Vert \rho ^{(k_n)}_n\Vert _1\le {\mathcal {D}}~\forall ~n \ne 0\). We note here that at most \(\ell \) of \(\tilde{\rho }_n\) differ from \(\rho _n\) itself and that \(\Vert \rho _n\Vert _1 = 1\). We then get the bound

$$\begin{aligned} |T_{K,\ell }(E, \epsilon )|\le & {} C_0^\ell \sup _{j\le \ell } \Xi (\rho ^{(j)}) tr(P_0) \sum _{\begin{array}{c} k_0+\dots +k_{K} = \ell \\ k_n \ge 0 \end{array}} \big (\begin{array}{c} \ell \\ k_0, \dots , k_{K} \end{array}\big )\nonumber \\&\times \int _{{\mathbb {R}}^{K+1}} \Vert \big (G_{K+1}^\omega (E+i\epsilon ) - G_{K}^\omega (E+i\epsilon )\big )\Vert ^s \prod _{n \in \Lambda _{K+1}} \tilde{\rho }_{n}(\omega _{n})d\omega _{n}.\nonumber \\ \end{aligned}$$
(3.21)

We denote the probability measure

$$\begin{aligned} d\mathbb {P}_{K}(\vec {\omega }) = \prod _{n \in \Lambda _{K}} \tilde{\rho }_{n}(\omega _{n})d\omega _{n}, \end{aligned}$$

and expectation as \({\mathbb {E}}_{K}\). We also set,

$$\begin{aligned} C_{1,m}=\sup _{0\le \ell \le m}\{C_0^\ell \},~ C_{2, m} = \sup _{n \in \mathbb {G}, j\le m} \{\Xi (\rho _n^{(j)})\}. \end{aligned}$$

Then the inequality (3.21) becomes

$$\begin{aligned} |T_{K,\ell }(E, \epsilon )|&\le C_{1,m} C_{2,m}tr(P_0) \displaystyle { \sum _{k_{0}+\dots +k_{K} = \ell } \big (\begin{array}{c} \ell \\ k_{0}, \dots , k_{K} \end{array}\big )} \nonumber \\&\qquad \times {\mathbb {E}}_{K+1} \bigg [\Vert \big (G_{K+1}^\omega (E+i\epsilon ) - G_{K}^\omega (E+i\epsilon )\big )\Vert ^s \bigg ]. \end{aligned}$$
(3.22)

We use the estimate for the expectation \({\mathbb {E}}_{K+1}(\cdot )\) from Lemma 3.1 to get the following bound, for some constant \(C_6\) independent of K,

$$\begin{aligned} \sup _{ E \in J, \epsilon >0} |T_{K,\ell }(E, \epsilon )|&\le C_6C_5 \displaystyle { \sum _{\begin{array}{c} k_{0}+\dots +k_{K} = \ell \end{array}}} \big (\begin{array}{c} \ell \\ k_{0}, \dots , k_{K} \end{array}\big ) (1+2K) e^{-\xi _{2s} K^\alpha } \nonumber \\&\le C_6 C_5 (K+1)^\ell (1+2K) e^{-\xi _{2s} K^\alpha }. \end{aligned}$$
(3.23)

From this bound the summability stated in the inequality (3.16) follows since we assumed that \(\xi _{2s} >0\), completing the proof of the Theorem. \(\quad \square \)

The proof used an exponential bound on the resolvent differences, which is the content of the following lemma.

Lemma 3.1

Let J be the interval stated in Theorem 3.4. Then we have the bound

$$\begin{aligned}&\sup _{\mathfrak {R}(z) \in J, \mathfrak {I}(z) >0} {\mathbb {E}}_{K+1} \bigg [\big \Vert \big (G_{K+1}^\omega (z) - G_{K}^\omega (z)\big )\big \Vert ^s \bigg ] \\&\qquad \le C_5(m,Rank(P_0), {\mathcal {D}}, h_0, R,s) (2K+1) e^{-\xi _{2s} K^\alpha }. \end{aligned}$$

Proof

We start with the resolvent identity

$$\begin{aligned} G_{K}^\omega (z) - G_{K+1}^\omega (z)&=P_0 \bigg ((h^\omega _{K} - z)^{-1} - (h_{K+1}^\omega - z)^{-1}\bigg ) P_0 \nonumber \\&=P_0 (h^\omega _{K} - z)^{-1}[h^\omega _{K+1}-h^\omega _{K}] (h_{K+1}^\omega - z)^{-1} P_0 \nonumber \\&=P_0 (h^\omega _{K+1} - z)^{-1}P_{\Lambda _K}h_0P_{K+1}(h_{K+1}^\omega - z)^{-1} P_0. \end{aligned}$$
(3.24)

In the above equation, the terms corresponding to the random part \(\omega _{K+1}P_{K+1}\) and the part \(P_{K+1}h_0P_{\Lambda _K}\) (appearing in the difference \([h^\omega _{K+1}-h^\omega _{K}]\)) vanish, since they are multiplied on the left by \(P_0(h^\omega _{K} - z)^{-1}\), and \(P_0(h_{K}^\omega -z)^{-1}P_{K+1}\), being the operator \(P_0(P_{\Lambda _K} h^\omega P_{\Lambda _K} -z)^{-1}P_{K+1}\), is zero because \(P_0 P_{K+1} = 0\) for \(K > 1\). Note that this fact is independent of the form of \(h_0\). We estimate the last line in Eq. (3.24) by first expanding \(P_{\Lambda _K} = \sum _{n\in \Lambda _K} P_n\) and then estimating the norms of the operators (using \(\Vert B\sum _{i=1}^N A_i \Vert ^s \le \Vert B\Vert ^s\sum _{i=1}^N \Vert A_i\Vert ^s \) for any finite collection \(\{B, A_i, i=1, \dots , N\}\) of bounded operators and \(0< s < 1\)) to get

$$\begin{aligned} \Vert \big (G_{K+1}^\omega (z) - G_{K}^\omega (z)\big )\Vert ^s&\le \Vert h_0\Vert ^s\Vert P_{K+1}(h_{K+1}^\omega - z)^{-1} P_0\Vert ^s \Vert P_0 (h^\omega _{K+1} - z)^{-1}P_{\Lambda _K}\Vert ^s \nonumber \\&\le \Vert h_0\Vert ^s\Vert P_{K+1}(h_{K+1}^\omega - z)^{-1} P_0\Vert ^s \nonumber \\&\qquad \times \sum _{n\in \Lambda _K} \Vert P_0 (h^\omega _{K+1} - z)^{-1}P_n\Vert ^s. \end{aligned}$$
(3.25)
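The norm inequality invoked above reduces to the elementary bound \((a+b)^s \le a^s + b^s\) for \(a, b \ge 0\) and \(0< s < 1\); iterating it gives

$$\begin{aligned} \Big \Vert B\sum _{i=1}^N A_i \Big \Vert ^s \le \Big ( \Vert B\Vert \sum _{i=1}^N \Vert A_i\Vert \Big )^s = \Vert B\Vert ^s \Big (\sum _{i=1}^N \Vert A_i\Vert \Big )^s \le \Vert B\Vert ^s \sum _{i=1}^N \Vert A_i\Vert ^s . \end{aligned}$$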

We take the expectation of both sides of the above equation, then interchange the sum and the expectation on the right hand side and use the Cauchy–Schwarz inequality to get the bound

$$\begin{aligned}&{\mathbb {E}}_{K+1} \big (\Vert \big (G_{K+1}^\omega (z) - G_{K}^\omega (z)\big )\Vert ^s\big )\nonumber \\&\quad \le \Vert h_0\Vert ^s\sum _{n\in \Lambda _K}\big ({\mathbb {E}}_{K+1}\big (\Vert P_{K+1}(h_{K+1}^\omega - z)^{-1} P_0\Vert ^{2s}\big )\big )^{\frac{1}{2}}\nonumber \\&\qquad \big ({\mathbb {E}}_{K+1}\big (\Vert P_0(h_{K+1}^\omega - z)^{-1} P_n \Vert ^{2s}\big )\big )^{\frac{1}{2}}. \end{aligned}$$
(3.26)

We now estimate the above terms by getting an exponential decay bound for the term with operator kernels of the form \(P_{K+1}[\cdot ]P_0\), while the remaining factors are uniformly bounded, with the bound independent of K, by using the Hypothesis 3.2.

Applying the bound on the fractional moments given in Hypothesis 3.2, inequality (3.6), we get

$$\begin{aligned}&{\mathbb {E}}_{K+1} \big ( \Vert P_{n}(h_K^\omega - z)^{-1}P_0\Vert ^{2s}\big ) \le C , ~~ n \in \Lambda _{K}, \\&{\mathbb {E}}_{K+1} \big (\Vert P_0 (h^\omega _{K+1} - z)^{-1}P_{n}\Vert ^{2s}\big ) \le C , ~~ n \in \Lambda _K \\&{\mathbb {E}}_{K+1} \big (\Vert P_0 (h^\omega _{K+1} - z)^{-1}P_{K+1}\Vert ^{2s}\big ) \le C e^{-\xi _{2s} K^\alpha }, \\&{\mathbb {E}}_{K+1} \big (\Vert P_{K+1} (h^\omega _{K+1} - z)^{-1}P_{0}\Vert ^{2s}\big ) \le C e^{-\xi _{2s} K^\alpha }. \end{aligned}$$

Using these bounds in the inequality (3.26), and noting that the sum has 2K terms, so that \((1+2K)\) is the only K dependence apart from the exponential decay factor, we get the bound

$$\begin{aligned}&\le C_5(m,Rank(P_0), {\mathcal {D}}, h_0, R,s) (1 + 2K) e^{-\xi _{2s} K^\alpha }, \end{aligned}$$

which is the required estimate to complete the proof of the Lemma. \(\quad \square \)

4 The Continuous Case

In this section we show that the density of states of some random Schrödinger operators is almost as smooth as the single site distribution. On the Hilbert space \(L^2({\mathbb {R}}^d)\) we consider the operator

$$\begin{aligned} H_0=\sum _{i=1}^d \left( -i\frac{\partial }{\partial x_i}+A_i(x)\right) ^2, \end{aligned}$$

with the vector potential \(\vec {A}(x)=(A_1(x),\cdots ,A_d(x))\) assumed to have sufficient regularity so that \(H_0\) is essentially self-adjoint on \(C_0^\infty ({\mathbb {R}}^d)\).

The random operators considered here are given by

$$\begin{aligned} H^\omega =H_0+\lambda \sum _{n\in {\mathbb {Z}}^d} \omega _n u_n, \end{aligned}$$
(4.1)

where \(\{\omega _n\}_{n\in {\mathbb {Z}}^d}\) are independent real random variables satisfying Hypothesis 3.1, \(u_n\) are the operators of multiplication by the functions \(u(x-n)\), for \(n \in {\mathbb {Z}}^d\), and \(\lambda >0\) is a coupling constant.

We impose the following hypotheses on the operators considered above to ensure that \(H^\omega \) continues to be essentially self-adjoint on \(C_0^\infty ({\mathbb {R}}^d)\) for all \(\omega \). It is by now well known in the literature (see for example the book of Carmona and Lacroix [10]) that the spectral and other functions of these operators considered below have the measurability properties, as functions of \(\omega \), required for the computations we perform on them, and we will not comment further on measurability.

Hypothesis 4.1

  1.

    The random variables \(\{\omega _n\}_n\) satisfy the Hypothesis 3.1.

  2.

    The function \(0\le u\le 1\) is a non-negative smooth function on \({\mathbb {R}}^d\) such that for some \(0< \epsilon _2< \frac{1}{2}, 0< \epsilon _1 < 1\), it satisfies

    $$\begin{aligned}&u(x) = {\left\{ \begin{array}{ll} 0, ~~ x \notin (-\frac{1}{2}-\epsilon _1,\frac{1}{2}+\epsilon _1)^d \\ 1, ~~ x \in (-\frac{1}{2}+\epsilon _2,\frac{1}{2}-\epsilon _2)^d \end{array}\right. } \\&\displaystyle {\sum _{n\in {\mathbb {Z}}^d}u(x-n)=1}\qquad x\in {\mathbb {R}}^d. \end{aligned}$$
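As an illustration only (not part of the paper's hypotheses), here is a minimal numerical sketch of a one-dimensional bump u satisfying the covering identity \(\sum _n u(x-n) = 1\). A single transition width `DELTA` plays the role of both \(\epsilon _1\) and \(\epsilon _2\), and the cubic smoothstep profile is only \(C^1\) rather than \(C^\infty \); it is chosen purely to keep the check short.

```python
# Sketch of a 1-D bump u with sum_n u(x - n) = 1 (cf. Hypothesis 4.1, d = 1).
# The cubic "smoothstep" profile is only C^1, not C^infinity; it is used
# here purely to illustrate the covering identity numerically.

DELTA = 0.25  # transition half-width, standing in for both epsilon_1 and epsilon_2


def smoothstep(t):
    """Cubic ramp: 0 for t <= 0, 1 for t >= 1, with s(t) + s(1 - t) = 1."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def chi(t, delta=DELTA):
    """Cutoff: 1 for t <= -delta, 0 for t >= delta, and chi(t) + chi(-t) = 1."""
    return 1.0 - smoothstep((t + delta) / (2.0 * delta))


def u(x, delta=DELTA):
    """Bump equal to 1 on [-1/2+delta, 1/2-delta] and 0 outside (-1/2-delta, 1/2+delta)."""
    return chi(abs(x) - 0.5, delta)


# Covering identity: the translates u(x - n) sum to 1 at every sample point.
for x in [0.0, 0.3, 0.5, 0.71, 1.25, -2.4]:
    total = sum(u(x - n) for n in range(-5, 6))
    assert abs(total - 1.0) < 1e-12, (x, total)
```

The identity holds because the cutoff satisfies \(\chi (t) + \chi (-t) = 1\), so on each overlap the two neighbouring bumps sum exactly to 1.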

We need some notation before we state our results. Given a subset \(\Lambda \subset {\mathbb {Z}}^d\), we set

$$\begin{aligned} {[}\Lambda ]=\bigg \{x\in {\mathbb {R}}^d: \sum _{n\in \Lambda }u(x-n)=1\bigg \} \end{aligned}$$
(4.2)

and denote the restrictions of \(H_0, H^\omega \) to \([\Lambda ]\) respectively by \(H_{0,\Lambda }, H^\omega _\Lambda \). As an abuse of notation, whenever we talk about restricting the operator to \(\Lambda \), we will mean the restriction onto \([\Lambda ]\). We need this distinction because \(\sum _{n\in \Lambda }u(x-n)= 1\) only on \([\Lambda ]\) and we need the complete covering condition. While the boundary conditions are not that important, we will work with Dirichlet boundary conditions in this section. We will also write \(u_{n,\Lambda }\) for the restriction of \(u_n\) to \([\Lambda ]\) when the need arises. We denote by \(E_A(\cdot )\) the projection valued spectral measure of a self-adjoint operator A; the context will prevent confusion of this symbol with points of the spectrum, denoted by E. We denote the Integrated Density of States (IDS) by

$$\begin{aligned} {{\mathcal {N}}}_\Lambda (E)=\mathop {\mathbb {E}}_\omega \left[ tr(u_0 E_{H^\omega _\Lambda }(-\infty ,E])\right] \qquad for~E\in {\mathbb {R}}, \end{aligned}$$
(4.3)

and the subscript \(\Lambda \) on the IDS is dropped in the case of the operator \(H^\omega \).

We start with our hypothesis on localization, where we set \(P_n\) to be the orthogonal projection onto \(L^2(supp(u_n))\).

Hypothesis 4.2

A compact interval \(J\subset {\mathbb {R}}\) is said to be in the region of localization for \(H^\omega \) with rate of decay \(\xi _s\) and exponent \(0< s <1\), if there exist \(C,\xi _s>0\) such that

$$\begin{aligned} \sup _{\mathfrak {R}(z) \in J, \mathfrak {I}(z) >0}\mathop {\mathbb {E}}_\omega \left[ \left\Vert P_n(H^\omega - z)^{-1}P_k\right\Vert ^s\right] \le C e^{-\xi _s \left\Vert n - k\right\Vert } \end{aligned}$$
(4.4)

for any \(n,k \in {\mathbb {Z}}^d\). For the operators \(H_\Lambda ^\omega \) exponential localization is similarly defined with \(\Lambda , H^\omega _\Lambda , \xi _{s,\Lambda }\) replacing \({\mathbb {Z}}^d, H^\omega ,\xi _s\) respectively in the bound for the same J.

We assume that for all \(\Lambda \) large enough, \(\xi _{s,\Lambda } \ge \xi _s\) for J in the region of localization, and that the constants \(C, \xi _s\) do not change if we replace the density \(\rho _n\) with one of its derivatives at finitely many n.

Remark 4.3

We note that the above Hypothesis holds with \(\xi _s >0\) for models of the type we consider under a large disorder condition, introduced via a coupling constant. The condition \(\xi _s >0\) is sufficient for our Theorem and there is no need to specify how large it should be. Similarly the multiscale analysis, which is the starting point of the fractional moment bounds, uses a priori bounds that depend on the Wegner estimate, which in turn depends only on the constant \({\mathcal {D}}\). So replacing the density \(\rho _n\) with one of its derivatives at finitely many points n does not affect the constants \(C, \xi _s\).

Our main Theorem, given next, is the analogue of Theorem 3.4. We already know from Lemma A.5 that \(u_0E_{H^\omega }(-\infty ,E)\) is trace class for any \(E \in {\mathbb {R}}\), hence we will be working with

$$\begin{aligned} {{\mathcal {N}}}(E)=\mathop {\mathbb {E}}_\omega \left[ tr(u_0 E_{H^\omega }(-\infty ,E))\right] \qquad for~ E\in {\mathbb {R}}. \end{aligned}$$
(4.5)

The function \({{\mathcal {N}}}\) is well defined by Lemma A.5 and is known to be continuous (see [14, Theorem 1.1] for example) whenever \(\rho \) is continuous.

By the Pastur–Shubin trace formula for the IDS, the function \({{\mathcal {N}}}\) is at most a constant multiple of the IDS, since \(\int u_0(x) dx \) may not be equal to 1; but this discrepancy does not affect the smoothness properties, so we will refer to \({{\mathcal {N}}}\) as the IDS below.

Our main Theorem, given below, implies that the density of states (DOS) is \(m-1\) times differentiable in J when \(\rho \) satisfies the conditions of the Theorem.

Theorem 4.4

On the Hilbert space \(L^2({\mathbb {R}}^d)\) consider the self-adjoint operators \(H^\omega \) given by (4.1), satisfying the Hypothesis 4.1. Let J be an interval in the region of localization satisfying the Hypothesis 4.2 with \(\xi _s > 0\) for some \(0< s < 1/6\). Suppose the density \(\rho \in C_c^m((0,\infty ))\), and \(\rho ^{(m)}\) is \(\tau \)-Hölder continuous for some \( s < \tau /2\). Then \({{\mathcal {N}}}\in C^{(m-1)}(J)\) and \({{\mathcal {N}}}^{(m)}\) exists almost everywhere in J.

Remark 4.5

A Theorem of Aizenman et al. [3, Theorem 5.2] shows that there are operators \(H^\omega \) of the type we consider for which the Hypothesis 4.2 is valid for large coupling \(\lambda \), where it was required that \(0< s < 1/3\). We take \(0< s < 1/6\) as we need to control the 2s-th moments of averages of norms of resolvent kernels in our proof.

Proof

We consider the boxes \( \Lambda _L=\{-L,\cdots ,L\}^d\), and set \(H_L^\omega = H_{\Lambda _L}^\omega , ~~ {{\mathcal {N}}}_L = {{\mathcal {N}}}_{\Lambda _L}\).

The strong resolvent convergence of \(H_{\Lambda _L}^\omega \) to \(H^\omega \), which is easy to verify, implies that \({{\mathcal {N}}}_{\Lambda _L}\) converges to \({{\mathcal {N}}}\) pointwise, since \({{\mathcal {N}}}\) is known to be a continuous function for the operators we consider. Since \(tr(u_0E_{H^\omega _L}((-\infty , E]))\) is a bounded measurable complex valued function, \({{\mathcal {N}}}_L \in C^m(J)\), by Theorem 2.1. Therefore it is enough to show that \({{\mathcal {N}}}(\cdot )-{{\mathcal {N}}}_{\Lambda _N}(\cdot )\) (which is a difference of distribution functions of the \(\sigma \)-finite measures \(tr(u_0E_{H^\omega }(\cdot ))\) and \(tr(u_0 E_{H_N^\omega }(\cdot ))\), appropriately normalized) is in \(C^m(J)\) for some N. We will need the Borel–Stieltjes transforms of these measures for the rest of the proof, but these transforms are not defined directly, because \(u_0(H^\omega _N-z)^{-1}\) need not be trace class. Therefore we first approximate \(u_0\) using finite rank operators.

To this end let \(Q_k\) be a sequence of finite rank orthogonal projections in the range of \(u_0\) that converge strongly to the identity on this range. We then define,

$$\begin{aligned} {{\mathcal {N}}}_{L, Q_k}(E) = \mathop {\mathbb {E}}_\omega \left( tr(Q_ku_0E_{H^\omega _L}(-\infty , E]) \right) . \end{aligned}$$
(4.6)

Since the projections \(Q_k\) converge strongly to the identity on the range of \(u_0\), the operators \(Q_ku_0E_{H_{L}^\omega }((-\infty , E))\) also converge strongly to \(u_0E_{H_{L}^\omega }((-\infty , E))\), pointwise in E. This convergence implies that \({{\mathcal {N}}}_{L, Q_k}(E)\) converges pointwise to \({{\mathcal {N}}}_L(E)\) for any fixed L. Henceforth we drop the subscript on \(Q_k\) but remember that the rank of Q is finite.

Since Q is finite rank, the measures \(tr(Qu_0E_{H^\omega _L}(\cdot ))\) are finite measures. Therefore we can define the Borel–Stieltjes transform of the finite signed measure

$$\begin{aligned} \mathop {\mathbb {E}}_\omega \left[ tr(Qu_0E_{H^\omega _{L+1}}(\cdot ))- tr(Qu_0E_{H^\omega _L}(\cdot ))\right] , \end{aligned}$$

namely

$$\begin{aligned}&\mathop {\mathbb {E}}_\omega \left[ tr(Qu_0(H_{L+1}^\omega - z)^{-1}) - tr(Qu_0(H_L^\omega -z)^{-1})\right] \nonumber \\&\qquad = \int \frac{1}{x-z}~~ d\mathop {\mathbb {E}}_\omega \left[ tr(Qu_0E_{H^\omega _{L+1}}(x))- tr(Qu_0E_{H^\omega _L}(x))\right] , \end{aligned}$$
(4.7)

where the signed measure has finite total variation for each Q and each L. Then the derivatives of \({{\mathcal {N}}}_{L+1, Q}(E) - {{\mathcal {N}}}_{L, Q}(E)\) are given by

$$\begin{aligned}&\lim _{\epsilon \downarrow 0} \frac{1}{\pi } \mathop {\mathbb {E}}_\omega \left[ tr(Qu_0\mathfrak {I}(H_{L+1}^\omega -E- i \epsilon )^{-1}) - tr(Qu_0\mathfrak {I}(H_L^\omega - E- i\epsilon )^{-1})\right] \nonumber \\&\quad =\lim _{\epsilon \downarrow 0} \frac{1}{\pi } \mathop {\mathbb {E}}_\omega \left[ tr\left[ Qu_0\bigg (\mathfrak {I}(H_{L+1}^\omega -E- i \epsilon )^{-1} - \mathfrak {I}(H_L^\omega - E- i \epsilon )^{-1}\bigg ) \right] \right] . \end{aligned}$$
(4.8)

Then, using the idea of a telescoping sum, as done in the previous section, we need to prove that

$$\begin{aligned} \sum _{L=N}^{\infty }\displaystyle {\sup _{\mathfrak {R}(z) \in J} \left| \frac{d^\ell }{dz^\ell }\bigg (\mathop {\mathbb {E}}_\omega \left[ tr(Qu_0(H^\omega _{L+1}-z)^{-1})\right] -\mathop {\mathbb {E}}_\omega \left[ tr(Qu_0(H^\omega _L-z)^{-1})\right] \bigg )\right| } < \infty . \end{aligned}$$
(4.9)

We set (taking \(\kappa (L)\) as the cardinality of \(\Lambda _L{\setminus }\{0\}\)),

$$\begin{aligned}&G_L^\omega (z) = Qu_0 (H_L^\omega - z)^{-1} u_0, ~~~~ S(\vec {\omega },Q, L, z) = G_{L+1}^\omega (z) - G_{L}^\omega (z), \nonumber \\&\Phi _{L+1}(\vec {\omega }) = \prod _{n \in \Lambda _{L+1}} \rho (\omega _n), ~~~ \kappa (L) = |\Lambda _L|-1. \end{aligned}$$
(4.10)

Then, following the sequence of steps leading from Eqs. (3.11) to (3.16), we need only to consider

$$\begin{aligned}&T(L,\ell ,Q, z) = \frac{d^\ell }{dE^\ell } \mathop {\mathbb {E}}_\omega \big [ tr(G^\omega _{L+1}(E+i\epsilon ) - G^\omega _{L}(E+i\epsilon ))\big ] \nonumber \\&\quad = \int _{{\mathbb {R}}^{\kappa (L+1)+1}} tr(S(\vec {\omega },Q,L, E, \epsilon )) (\mathbf{D}^\ell \Phi _{L+1})(\vec {\omega })d\vec {\omega }, \end{aligned}$$
(4.11)

to estimate and show that

$$\begin{aligned} \sum _{L=N}^\infty \sup _{\begin{array}{c} \mathfrak {R}(z) \in J, \mathfrak {I}(z) >0,\\ \ell \le m,\\ Q \end{array}} |T(L,\ell ,Q, z)| < \infty , \end{aligned}$$
(4.12)

to prove the theorem. Following the steps that led from the equality (3.17) to Eq. (3.18), an identical calculation here gives

$$\begin{aligned} T(L,\ell , Q, z)= & {} \displaystyle {\sum _{\begin{array}{c} \sum _{n\in \Lambda _{L+1}} k_n = \ell \\ k_n \ge 0 \end{array}} \big (\begin{array}{c} \ell \\ k_0, \dots , k_{\kappa (L+1)} \end{array}\big )} \int _{{\mathbb {R}}^{\kappa (L+1)}} tr\bigg (\int \big (G_{L+1}^\omega (z) - G_{L}^\omega (z)\big )\rho _0^{(k_0)}(\omega _0) d\omega _0\bigg ) \nonumber \\&\cdot \bigg (\prod _{n\in \Lambda _{L+1}{\setminus }\{0\}} \rho _n^{(k_n)}(\omega _{n})d\omega _{n}\bigg ) . \end{aligned}$$
(4.13)

To proceed further, we need to get a uniform bound in the projection Q. We will show that the expression

$$\begin{aligned} {{\mathcal {G}}}(L,z,\omega ) = u_0 (H_{L+1}^\omega - z)^{-1} - u_0 (H^\omega _L -z)^{-1}, \end{aligned}$$
(4.14)

automatically comes with a trace class factor. This fact lets us drop the Q occurring in the expression

$$\begin{aligned} \big (G_{L+1}^\omega (z) - G_{L}^\omega (z)\big ) = Q {{\mathcal {G}}}(L, z, \omega ) u_0, \end{aligned}$$
(4.15)

when making estimates on the trace.

We need a collection of \(d+2\) smooth functions \(0 \le \Theta _j \le 1, j=0,\dots , d+1\), where d is the dimension we are working with. Setting

$$\begin{aligned} \alpha _j = 2^{j+2}, j\in \{0,1,2, \dots , 2d+2\}, \end{aligned}$$
(4.16)

we choose the functions \(\Theta _j\) from \(C^\infty ({\mathbb {R}}^d)\) satisfying

$$\begin{aligned} \Theta _j(x) = {\left\{ \begin{array}{ll} 1, ~~ |x| \le \alpha _{2j}, \\ 0, ~~ |x| > \alpha _{2j+1},\end{array}\right. } ~~ j=0, \dots , d+1 \end{aligned}$$
(4.17)

and note that, for every j, all the derivatives of \(\Theta _j\) are bounded, since they are continuous and supported in a compact set. These functions satisfy the property

$$\begin{aligned} \Theta _{j+1}\phi = \phi , ~~ if ~ supp(\phi ) \subset supp(\Theta _j), ~~ j=0, \dots , d, \end{aligned}$$
(4.18)

in particular

$$\begin{aligned} \Theta _{j+1}\Theta _j = \Theta _j, ~~ for ~~ all ~~ j=0, \dots , d. \end{aligned}$$
(4.19)
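Both (4.18) and (4.19) are simple support checks against the dyadic scales \(\alpha _j\); for instance, for (4.19):

```latex
% supp(Theta_j) lies inside the ball on which Theta_{j+1} is identically 1:
\operatorname{supp}(\Theta_j) \subset \{|x| \le \alpha_{2j+1}\}
 = \{|x| \le 2^{2j+3}\}
 \subset \{|x| \le 2^{2j+4}\} = \{|x| \le \alpha_{2(j+1)}\},
```

and \(\Theta _{j+1} \equiv 1\) on \(\{|x| \le \alpha _{2(j+1)}\}\), hence \(\Theta _{j+1}\Theta _j = \Theta _j\) pointwise.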

We then take a free resolvent operator \(R_{L,a}^0 = (H_{0,\Lambda _L} + a)^{-1}\), with \(a \gg 1\). Since \(H_0\) is bounded below, \(R_{L,a}^0\) is a bounded positive operator for any L. It is a fact that, for any smooth bump function \(\phi \),

$$\begin{aligned} {[}\phi , H_0]R^0_{L,a},R^0_{L,a} u_j \in {{\mathcal {I}}}_p, ~~ p > d. \end{aligned}$$
(4.20)

See Combes et al. [14, Lemma A.1] and Simon [50, Chapter 4] for further details. Using the definition of \({{\mathcal {G}}}\) given in Eq. (4.14), the relation (4.19) and the resolvent equation, we get

$$\begin{aligned}&{{\mathcal {G}}}(L,z, \omega ) \Theta _0 = u_0\bigg [ (H^\omega _{L+1} - z)^{-1} - (H_L^\omega - z)^{-1}\bigg ] \Theta _0 \nonumber \\&\quad =u_0\bigg [ (H^\omega _{L+1} - z)^{-1} - (H_L^\omega - z)^{-1}\bigg ]\Theta _1\Theta _0 \nonumber \\&\quad =u_0\bigg [ (H^\omega _{L+1} - z)^{-1}\Theta _1 - \Theta _1R_{L,a}^0 + \Theta _1R_{L,a}^0 - (H_L^\omega - z)^{-1}\Theta _1\bigg ]\Theta _0 \nonumber \\&\quad =u_0\bigg [ \big ((H^\omega _{L+1} - z)^{-1}\Theta _1 - \Theta _1R_{L,a}^0\big ) - \big ((H_L^\omega - z)^{-1}\Theta _1 - \Theta _1R_{L,a}^0\big )\bigg ]\Theta _0 \nonumber \\&\quad =u_0\bigg [ \big ((H^\omega _{L+1} - z)^{-1}\bigg (\Theta _1(H_{0,L}+a) - (H_{L+1}^\omega -z)\Theta _1\bigg )R_{L,a}^0\big )\nonumber \\&\qquad - \big ((H_L^\omega - z)^{-1}\bigg ((H_{0,L}+a)\Theta _1 - (H_{L}^\omega - z)\Theta _1\bigg )R_{L,a}^0\big )\bigg ] \Theta _0 \nonumber \\&\quad =u_0\bigg [ \big ((H^\omega _{L+1} - z)^{-1}\bigg (\Theta _1H_{0,L} - H_{0,L+1}\Theta _1 + (z+a - V_{L+1}^\omega )\Theta _1\bigg )R_{L,a}^0\big )\nonumber \\&\qquad - \big ((H_L^\omega - z)^{-1}\bigg (H_{0,L}\Theta _1 - H_{0,L}\Theta _1 + (z+a - V_L^\omega )\Theta _1\bigg )R_{L,a}^0\big )\bigg ] \Theta _0 \nonumber \\&\quad =u_0\big [ (H^\omega _{L+1} - z)^{-1}- (H_L^\omega - z)^{-1}\big ] \nonumber \\&\qquad \cdot \bigg [ [\Theta _1, H_0] + \bigg (z+a - \sum _{|n| \le \alpha _1} \omega _n u_n\bigg )\Theta _1 \bigg ]R_{L,a}^0\Theta _0 \nonumber \\&\quad ={{\mathcal {G}}}(L,z,\omega ) \bigg [ [\Theta _1, H_0] + \bigg (z+a - \sum _{|n| \le \alpha _1} \omega _n u_n\bigg )\Theta _1 \bigg ] R_{L,a}^0\Theta _0 \nonumber \\&\quad ={{\mathcal {G}}}(L,z,\omega ) \bigg (A_0(z,a,\alpha _1, H_0) + \sum _{|n| \le \alpha _1} \omega _n B_{0,n}(a,\alpha _1) \bigg ) \end{aligned}$$
(4.21)

where we used the definition

$$\begin{aligned}&A_0(z,a,\alpha _1, H_0) = \big ([\Theta _1, H_0] + (z+a)\Theta _1\big ) R_{L,a}^0\Theta _0 \nonumber \\&B_{0,n}(a,\alpha _1) = -u_n \Theta _1 R_{L,a}^0\Theta _0 \end{aligned}$$
(4.22)

and in passing from the sixth to the seventh equality above we used the fact that the support of \(\Theta _1\) is far from the boundary of \(\Lambda _L,\Lambda _{L+1}\), so that \(V_L^\omega , V_{L+1}^\omega \) agree on the support of \(\Theta _1\), and the commutators of \(\Theta _1\) with \(H_{0,L}, H_{0,L+1}\) coincide and agree with that of \(H_0\). In the above, \(A_0, B_{0,n}\) are operators independent of \(\omega \), each of which is in \({{\mathcal {I}}}_p\) by Eq. (4.20). Using the definitions and properties of \(\Theta _j\), we see that

$$\begin{aligned} \Theta _2 A_0(z,a,\alpha _1,H_0) = A_0(z,a,\alpha _1,H_0), ~~~ and ~~ \Theta _2 B_{0,n}(a,\alpha _1) = B_{0,n}(a,\alpha _1). \end{aligned}$$

Therefore we can repeat this argument by defining, for \(j=0, \dots, d\),

$$\begin{aligned}&A_j(z,a,\alpha _{2j+1}, H_0) = ([\Theta _{2j+1}, H_0] + (z+a)\Theta _{2j+1}) R_{L,a}^0\Theta _{2j} \nonumber \\&B_{j,n}(a,\alpha _{2j+1}) = -u_n \Theta _{2j+1} R_{L,a}^0\Theta _{2j}, ~ |n| \le \alpha _{2j+1}, \end{aligned}$$
(4.23)

by using the fact that

$$\begin{aligned}&\Theta _{2j} A_{j-1}(z,a,\alpha _{2j+1},H_0) = A_{j-1}(z,a,\alpha _{2j+1},H_0), ~~~ and \nonumber \\&\Theta _{2j} B_{j-1,n}(a,\alpha _{2j+1}) = B_{j-1,n}(a,\alpha _{2j+1}), \end{aligned}$$
(4.24)

for each \(j=1,2,\dots, d\). We can then re-write Eq. (4.21) as

$$\begin{aligned}&{{\mathcal {G}}}(L,z, \omega ) = {{\mathcal {G}}}(L,z, \omega ) \overset{\leftarrow }{\prod _{j=0}^{d}} \bigg (A_j(z,a,\alpha _{2j+1}, H_0) + \sum _{|n| \le \alpha _{2j+1}} \omega _n B_{j,n}(a,\alpha _{2j+1}) \bigg ), \end{aligned}$$
(4.25)

where the arrow on the product indicates an ordered product, the factor with lower index j standing to the right of the one with higher index j.

Now, counting the number of terms in the product, we see that each sum \(\sum _{|n| \le \alpha _{2j+1}} \) has at most \((2 \alpha _{2j+1})^d = 2^{d(2j+4)}\) terms. A simple computation shows that there are at most \(2^{d^2(d+4)}\) terms if we completely expand the product. In other words, the number of terms depends on d but not on L.
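The count can be made explicit; the following is a crude but transparent bound (weaker than the constant quoted above, and equally independent of \(L\)):

```latex
% Each lattice sum over |n| <= alpha_{2j+1} = 2^{2j+3} contributes at most
% (2 alpha_{2j+1})^d = 2^{d(2j+4)} terms, so expanding the product gives
2\alpha_{2j+1} = 2\cdot 2^{2j+3} = 2^{2j+4},
\qquad
\prod_{j=0}^{d}\bigl(1 + 2^{d(2j+4)}\bigr)
 \;\le\; 2^{\sum_{j=0}^{d}\bigl(d(2j+4)+1\bigr)}
 \;=\; 2^{d(d+1)(d+4)+(d+1)} ,
```

using \(\sum _{j=0}^{d}(2j+4) = (d+1)(d+4)\).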

We will now write the expression in Eq. (4.25) as

$$\begin{aligned}&{{\mathcal {G}}}(L,z, \omega ) = \sum _{|n| \le \alpha _{2d+2}} {\mathcal G}(L,z, \omega ) u_n \bigg ( \sum _{r_1, r_2 = 0}^{d+1} \omega _n^{r_1} \omega _0^{r_2} P_{n,0}(k, r, \omega ) \bigg ), \end{aligned}$$
(4.26)

where \(P_{n,0}(k,r), ~ r =(r_1, r_2)\), is a trace class operator valued function of \(\omega \), independent of \(\omega _0, \omega _n\), for each \(k, r\). Note that even though \(A_d\) and \(B_d\) are supported in \(supp(\Theta _d)\), \(\sum _{|n|\le \alpha _{2d+1}}u_n\) is not one on the support of \(\Theta _d\), so we have to take a larger sum in the above expression. We can see from the structure of the product that the trace norms satisfy a bound

$$\begin{aligned} \sup _{\mathfrak {R}(z) \in J, 0 < \mathfrak {I}(z) \le 1} \Vert P_{n,0}(k,r)\Vert _1 \le C_7(d, a, J), \end{aligned}$$

since an inspection of the product in Eq. (4.25) shows that, in any product, z and \(\{\omega _{{\tilde{n}}}, {\tilde{n}} \ne 0, n\}\) occur at most to the power \(d+1\). The uniform boundedness of the trace norm as a function of \(z, ~\omega _{{\tilde{n}}}\) is clear, since these variables range over compact sets. As for the finiteness of the trace norm itself, we note that any product has \(d+1\) factors from the set \(\{A_j, B_j, j=0, \dots , d\}\); hence, by the claim in Eq. (4.20), such a product is trace class.
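The trace class conclusion is the Hölder inequality for Schatten ideals: assuming, as (4.20) allows, that each factor lies in \(\mathcal {I}_p\) for some fixed \(p\) with \(d < p \le d+1\), a product of \(d+1\) such factors is trace class because

```latex
% Schatten-Hoelder: ||T_1 ... T_k||_1 <= prod_i ||T_i||_{p_i}
% whenever sum_i 1/p_i >= 1; here all p_i = p.
\|T_1 T_2 \cdots T_{d+1}\|_{1}
 \;\le\; \prod_{i=1}^{d+1} \|T_i\|_{p},
\qquad \frac{d+1}{p} \;\ge\; 1
\quad\Longleftrightarrow\quad p \le d+1 .
```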

Using Eqs. (4.10, 4.13, 4.14, 4.15) and Eq. (4.26) in Eq. (4.13), we get, using the fact that \(P_{n,0}(\cdot )\) are independent of \(\omega _0, \omega _n\),

$$\begin{aligned}&T(L,\ell , Q, z)\nonumber \\&\quad =\sum \limits _{\begin{array}{c} \sum _{n=1}^{\kappa (L+1)+1}k_{n}=\ell \\ k_{n}\ge 0 \end{array} } \big (\begin{array}{c} \ell \\ k_0, \dots , k_{\kappa (L+1)+1} \end{array}\big ) \int _{{\mathbb {R}}^{\kappa (L+1)-1}} \sum _{|n|\le \alpha _{2d+2}} \sum _{r_1, r_2 =0}^{d+1} tr\bigg (Q \bigg [ \int u_0\big ((H_{L+1}^\omega -z)^{-1}\nonumber \\&\qquad - (H_L^\omega -z)^{-1}\big ) u_n \omega _n^{r_2} \omega _0^{r_1} \rho _n^{(k_n)}(\omega _n)\rho _0^{(k_0)}(\omega _0) d\omega _n d\omega _0 \bigg ] P_{n,0}(k,r,\omega )\bigg )\nonumber \\&\qquad \times \prod _{m\in \Lambda _{L+1}{\setminus }\{0,n\}} \rho _m^{(k_m)}(\omega _{m})d\omega _{m}. \end{aligned}$$
(4.27)

We now estimate the absolute value of the trace in Eq. (4.27) using Theorem 2.2(1), taking the \(\phi _\mathbf{R}\) that appears there to bound the norm of the integral with respect to \(\omega _n, \omega _0\); this is applicable since \(2s <\tau \).

$$\begin{aligned}&|T(L, \ell , Q, z)| \nonumber \\&\quad \le \sum \limits _{\begin{array}{c} \sum _{n=1}^{\kappa (L+1)+1}k_{n}=\ell \\ k_{n}\ge 0 \end{array} } \big (\begin{array}{c} \ell \\ k_0, \dots , k_{\kappa (L+1)+1} \end{array}\big ) \int _{{\mathbb {R}}^{\kappa (L+1)-1}}\sum _{|n|\le \alpha _{2d+2}} \sum _{r_1, r_2 =0}^{d+1} \Vert Q\Vert \Vert P_{n,0}(k,r,\omega )\Vert _1 \nonumber \\&\qquad \bigg [ \int \big \Vert (u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1} - (H_L^\omega -z)^{-1}\big ) {(u_n+u_0)}^\frac{1}{2}\big \Vert ^s \phi _\mathbf{R}(\omega _0)\phi _\mathbf{R}(\omega _n) d\omega _n d\omega _0 \bigg ] \nonumber \\&\qquad \prod _{m\in \Lambda _{L+1}{\setminus }\{0,n\}} |\rho _m^{(k_m)}(\omega _{m})|d\omega _{m}. \end{aligned}$$
(4.28)

In the above inequality we also used the fact that \(u_0 (u_0 + u_n)^{-\frac{1}{2}}, u_n (u_0 + u_n)^{-\frac{1}{2}}\) are both bounded uniformly in n and replaced \(u_0, u_n\) by \((u_0 + u_n)^{\frac{1}{2}}\) on either side of the resolvents.

We would prefer to work with probability measures in the above equation, so we normalize \(|\rho _m^{(k_m)}(x)|dx\) by their \(L^1\) norms. We do the same for \(\phi _\mathbf{R}\). We then follow the steps involved in obtaining the inequality (3.21). We set \(\eta (\rho , m) = \sup _{n \in {\mathbb {Z}}^d, k_n \le m} \big (\Vert \rho _n^{(k_n)}\Vert _1 + \Vert \rho _n^{(k_n)}\Vert _\infty \big ) + \Vert \phi _\mathbf{R}\Vert _1\) to get,

$$\begin{aligned}&|T(L, \ell , Q, z)| \nonumber \\&\quad \le \sum \limits _{\begin{array}{c} \sum _{n=1}^{\kappa (L+1)+1}k_{n}=\ell \\ k_{n}\ge 0 \end{array} } \big (\begin{array}{c} \ell \\ k_0, \dots , k_{\kappa (L+1)+1} \end{array}\big ) \sum _{|n|\le \alpha _{2d+2}} C_9(a,d,J, \eta (\rho ,m)) \nonumber \\&\qquad \times \mathbb {E}_{L+1} \bigg [ \Vert (u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1} - (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\Vert ^s \bigg ], \end{aligned}$$
(4.29)

where \(\mathbb {E}_{L+1}\) is the expectation with respect to the probability density

$$\begin{aligned} \frac{\phi _\mathbf{R}(\omega _0)d\omega _0}{\left\Vert \phi _\mathbf{R}\right\Vert _1}\frac{\phi _\mathbf{R}(\omega _n)d\omega _n}{\left\Vert \phi _\mathbf{R}\right\Vert _1} \prod _{m\in \Lambda _{L+1}{\setminus }\{0,n\}} \frac{|\rho _m^{(k_m)}(\omega _{m})|}{\Vert \rho _m^{(k_m)}\Vert _1}d\omega _{m}. \end{aligned}$$
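The normalization used here is just the insertion of the \(L^1\) norms, each of which is absorbed into the constant \(C_9\); schematically (writing \(j\) for the site index to avoid clashing with the smoothness order \(m\)),

```latex
% pulling out the L^1 norm turns |rho| d omega into a probability measure:
|\rho_j^{(k_j)}(\omega_j)|\,d\omega_j
 \;=\; \|\rho_j^{(k_j)}\|_{1}\,
   \frac{|\rho_j^{(k_j)}(\omega_j)|}{\|\rho_j^{(k_j)}\|_{1}}\,d\omega_j,
\qquad
\prod_{j \in \Lambda_{L+1}} \|\rho_j^{(k_j)}\|_{1}
 \;\le\; \max\bigl(1, \eta(\rho,m)\bigr)^{\ell},
```

since \(\Vert \rho _j^{(0)}\Vert _1 = 1\) and at most \(\ell \) of the \(k_j\) are nonzero.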

We define a smooth radial function \(0 \le \Psi _L \le 1\) such that

$$\begin{aligned} \Psi _L(x) = {\left\{ \begin{array}{ll} 1, ~~ |x| \le L/2, \\ 0, ~~ |x| > L/2 + 4 \end{array}\right. }\!\!. \end{aligned}$$

Then \(\Psi _L \sqrt{u_0 + u_n} = \sqrt{u_0 + u_n}\) for \(|n| \le \alpha _{2d+2}\). Following steps similar to those used to obtain Eq. (4.21), and using the relation \((H_{0,L} +a) R_{L,a}^0 = Id\), we have

$$\begin{aligned}&(u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1} - (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\nonumber \\&\quad = (u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1}[\Psi _L, H_0] (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\nonumber \\&\quad = (u_0 + u_n)^\frac{1}{2}(H_{L+1}^\omega -z)^{-1}[\Psi _L, H_0]\big [R^0_{L,a}+(H^\omega _L-z)^{-1}-R^0_{L,a} \big ] (u_n+u_0)^\frac{1}{2}\nonumber \\&\quad = (u_0 + u_n)^\frac{1}{2}(H_{L+1}^\omega -z)^{-1}[\Psi _L, H_0] R_{L,a}^0 \big ( I + (z + a - V_L^\omega ) (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\nonumber \\&\quad = (u_0 + u_n)^\frac{1}{2}(H_{L+1}^\omega -z)^{-1} \bigg [ -\sum _{i=1}^d \frac{\partial ^2}{\partial x_i^2} \Psi _L + 2\sum _{i=1}^d\bigg (\frac{\partial }{\partial x_i} \Psi _L\bigg )\bigg (-i\frac{\partial }{\partial x_i} + A_i\bigg )\bigg ] \nonumber \\&\qquad \quad \times R_{L,a}^0 \big ( I + (z + a - V_L^\omega ) (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}. \end{aligned}$$
(4.30)
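The identity \(\Psi _L \sqrt{u_0 + u_n} = \sqrt{u_0 + u_n}\) invoked above is again a scale check; assuming, as in the hypotheses of this section, that each \(u_n\) is supported in a ball of fixed radius \(r_u\) around \(n\), one has, for all sufficiently large \(L\),

```latex
% the supports of u_0, u_n sit well inside the region where Psi_L = 1:
\operatorname{supp}(u_0 + u_n)
 \subset \{|x| \le \alpha_{2d+2} + r_u\}
 = \{|x| \le 2^{2d+4} + r_u\}
 \subset \{|x| \le L/2\},
\qquad |n| \le \alpha_{2d+2},\quad L \ge 2\,(2^{2d+4}+r_u).
```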

We take a smooth bounded radial function \(0 \le \Upsilon _L \le 1\) which is 1 in a neighbourhood of the annulus \(L/2 \le r \le L/2 + 4\) and zero outside a neighbourhood of radial width 10. Then, using the fact that

$$\begin{aligned}&\Upsilon _L \bigg (\sum _{i=1}^d \frac{\partial ^2}{\partial x_i^2} \Psi _L\bigg ) = \bigg (\sum _{i=1}^d \frac{\partial ^2}{\partial x_i^2} \Psi _L\bigg ) \nonumber \\&\Upsilon _L \bigg (\frac{\partial }{\partial x_i} \Psi _L\bigg ) = \bigg (\frac{\partial }{\partial x_i} \Psi _L\bigg ), ~ for ~~ all ~~ i=1,\dots , d \end{aligned}$$
(4.31)

and (4.30), we can now bound the expectation in the inequality (4.29) by

$$\begin{aligned}&\mathbb {E}_{L+1} \bigg [ \Vert (u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1} - (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\Vert ^s \bigg ] \nonumber \\&\quad \le \mathbb {E}_{L+1} \bigg [ \Vert (u_n+u_0)^\frac{1}{2}(H_{L+1}^\omega - z)^{-1} \Upsilon _L\Vert ^s\nonumber \\&\qquad \bigg \Vert \bigg [ \bigg (-\sum _{i=1}^d \frac{\partial ^2}{\partial x_i^2} \Psi _L\bigg ) + 2\sum _{i=1}^d\bigg (\frac{\partial }{\partial x_i} \Psi _L\bigg )\bigg (-i\frac{\partial }{\partial x_i} + A_i\bigg )\bigg ] R_{L,a}^0\bigg \Vert ^s \nonumber \\&\qquad \big (1 + |z|+a + \Vert V_L^\omega \Vert _\infty \big )^s \Vert \chi _{\Lambda _L} (H_L^\omega - z)^{-1} \sqrt{u_0 + u_n} \Vert ^s \bigg ]. \end{aligned}$$
(4.32)

Then, using the Cauchy–Schwarz inequality and Hypothesis 4.2, we get an exponential bound for the first factor and a uniform bound for the second factor, after noting that \(dist(supp(\Upsilon _L), \{n : |n| \le \alpha _{2d+1}\}) \ge L/4\) and \(|\Lambda _L| \le (2L)^d \); we thus get the estimate

$$\begin{aligned}&\sup _{z: \mathfrak {R}(z) \in J, \mathfrak {I}(z) \le 1} \mathop {\mathbb {E}}_\omega \bigg [ \Vert (u_0 + u_n)^\frac{1}{2}\big ((H_{L+1}^\omega -z)^{-1} - (H_L^\omega -z)^{-1}\big ) (u_n+u_0)^\frac{1}{2}\Vert ^s \bigg ] \nonumber \\&\qquad \le C_{10}(a, J, d) L^d e^{-\xi _{2s} L}. \end{aligned}$$
(4.33)
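The Cauchy–Schwarz step behind (4.33) is the standard splitting of a product of \(s\)-th powers; assuming Hypothesis 4.2 supplies fractional-moment decay at exponent \(2s\) for the smoothed resolvents (this is how it is used here), the sketch is

```latex
% split the two resolvent factors; the distance >= L/4 between the supports
% of Upsilon_L and u_0 + u_n feeds the fractional-moment bound:
\mathbb{E}\bigl[\|X\|^{s}\,\|Y\|^{s}\bigr]
 \;\le\; \bigl(\mathbb{E}\|X\|^{2s}\bigr)^{1/2}
         \bigl(\mathbb{E}\|Y\|^{2s}\bigr)^{1/2},
```

with \(X = (u_n+u_0)^{1/2}(H_{L+1}^\omega - z)^{-1}\Upsilon _L\) and \(Y = \chi _{\Lambda _L}(H_L^\omega - z)^{-1}\sqrt{u_0+u_n}\), each of whose \(2s\)-th moments decays exponentially in \(L\), up to volume factors.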

Using the inequality (4.33) in (4.29) we get the bound

$$\begin{aligned}&\sup _{\begin{array}{c} z: \mathfrak {R}(z) \in J, \mathfrak {I}(z) \le 1,\\ Q \\ \ell \le m \end{array}} |T(L, \ell , Q, z)| \nonumber \\&\quad \le C_{11}(a, d, J, \eta (\rho , m)) (L+1)^{d(m+1)} e^{-\xi _{2s} L}, \end{aligned}$$
(4.34)

as the combinatorial sum

$$\begin{aligned} \displaystyle {\sum \limits _{\begin{array}{c} \sum _{n=1}^{\kappa (L+1)+1}k_{n}=\ell \\ k_{n}\ge 0 \end{array} }\big (\begin{array}{c} \ell \\ k_0, \dots , k_{\kappa (L+1)} \end{array}\big )} \end{aligned}$$

is easily seen to add up to \((L+1)^{d\ell }\), which is still polynomial in L. This bound shows the summability in Eq. (4.9), completing the proof. \(\quad \square \)
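The evaluation of the combinatorial sum is the multinomial theorem at \(x_0=\cdots =x_N=1\); with \(N+1 = \kappa (L+1)+1\) variables (one \(k_n\) per site of \(\Lambda _{L+1}\)),

```latex
% multinomial theorem with all variables equal to 1:
\sum_{\substack{k_0+\cdots+k_N = \ell \\ k_n \ge 0}}
 \binom{\ell}{k_0,\dots,k_N}
 = (\underbrace{1+\cdots+1}_{N+1})^{\ell}
 = (N+1)^{\ell} = O\!\left(L^{d\ell}\right),
```

which, against the exponential factor \(e^{-\xi _{2s}L}\) in (4.34), makes the series over \(L\) converge.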