1 Introduction

In classical panel data analysis, it is often assumed that the slope coefficients of interest in panel data models are homogeneous across individual units. In practice, however, they can be individual-specific. Ignoring this form of heterogeneity may result in biased estimation and invalid inference. Thus, a formal test for slope homogeneity is necessary. When the number of individuals or panels, N, is fixed and the number of temporal observations, T, diverges, a simple method is the standard F-test, which assumes exogenous regressors and homoskedastic errors. To eliminate the effect of heteroskedasticity, Swamy [20] proposed a dispersion test based on generalized least squares estimators under a random coefficient model. Another class of tests is based on Hausman’s test [11], where the standard fixed effects estimator is compared with the mean group estimator; see, for example, Pesaran et al. [17] and Phillips and Sul [19]. However, these methods are not applicable to panel data models that contain only strictly exogenous regressors or to pure autoregressive models [18]. An early work [22] proposed the seemingly unrelated regression equation (SURE) approach to incorporate cross-sectional dependence. The above approaches assume that \(N<T\), and would lose efficiency or even fail when N is comparable to, or even larger than, T, as in many micro-econometric applications; the latter situation is referred to as large or high-dimensional panel data models.

In a high-dimensional setup, the dispersion test proposed by [17] allows \(N>T\). Pesaran and Yamagata [18] investigated the asymptotic distribution of the test statistic proposed by [20] when both N and T are large, and proposed a modified Swamy-type statistic based on different estimators of the regression error variances. Under the paradigm of fixed T but diverging N, Juhl and Lugovskyy [13] proposed a conditional Lagrange multiplier test based on the conditional Gaussian likelihood function, and [4] proposed Lagrange multiplier tests that generalize the test of [5] against random individual effects to all regression coefficients.

Most approaches mentioned above are based on the summation of a sequence of test statistics for individual units, and are referred to as Sum tests. Sum tests turn out to be powerful under dense alternatives, that is, when the number of individual units with heterogeneous slope coefficients is large. However, under sparse alternatives, where only a few individual units are heterogeneous, Sum tests can be inefficient. In the latter situation, a maximum-based strategy is more suitable, as widely discussed in the statistical literature; see, for example, [6] and [21]. Motivated by this, we first propose a Max test based on the maximum of the individual test statistics. We establish its asymptotic null distribution as \(N,T\rightarrow \infty \), and show that the Max test outperforms the Sum test of [18] in terms of power under sparse alternatives.

In practice, we seldom know whether the alternatives are dense or sparse. Thus, it is risky to apply a single Sum or Max test if we have no prior knowledge of the sparsity level. This motivates us to develop a test that is adaptive to different levels of sparsity. We propose a Combo test, which combines the Sum and Max tests by taking the minimum of the p-values of the two separate tests. The asymptotic independence of Sum- and Max-type test statistics has been widely studied in the literature; see, for example, [7, 12, 15], and [10]. Under mild conditions, we show that the Sum test statistic is asymptotically independent of the Max test statistic under the null hypothesis as \(N,T\rightarrow \infty \). Consequently, the Combo test statistic is asymptotically distributed as the minimum of two independent standard uniform random variables under the null hypothesis. Theoretical results and simulation studies show that the Combo test is robust to both dense and sparse alternatives.

The rest of this paper is organized as follows. In Sect. 2, we give a brief review of existing testing procedures for slope homogeneity. We introduce the Max and Combo tests and establish their theoretical properties in Sect. 3. In Sect. 4, numerical studies, including real-data examples, are conducted to evaluate the performance of the proposed methods. Some discussions are given in Sect. 5, and all technical details are deferred to the Appendix.

2 The Model and Existing Approaches

We consider the following panel data model with fixed effects and potentially heterogeneous slopes:

$$\begin{aligned} y_{it}=\alpha _i+{\varvec{x}}_{it}^\top \varvec{\beta }_i+u_{it}, ~i=1,\ldots ,N,~t=1,\ldots ,T, \end{aligned}$$
(2.1)

where \({\varvec{x}}_{it}\) is a p-dimensional vector of strictly exogenous regressors, \(\alpha _i\) and \(\varvec{\beta }_i\) are the scalar intercept and the p-dimensional slope vector, respectively, and \(u_{it}\) are random errors with mean 0 and variance \(\sigma _i^2\). Suppose that the \(\alpha _i\) lie in a compact set and the \(\varvec{\beta }_i\) are bounded in the sense that \(\Vert \varvec{\beta }_i\Vert < K\) for some constant \(K>0\), where \(\Vert \cdot \Vert \) is the Euclidean norm. In matrix form, the model is

$$\begin{aligned} {\varvec{Y}}_i=\alpha _i \varvec{1}_T+\textbf{X}_i \varvec{\beta }_i+{\varvec{u}}_i, \end{aligned}$$

where \({\varvec{Y}}_i=(y_{i1},\ldots ,y_{iT})^\top \), \(\varvec{1}_T\) is a T-dimensional vector with all elements being 1, \(\textbf{X}_i=({\varvec{x}}_{i1},\ldots ,{\varvec{x}}_{iT})^\top \), and \({\varvec{u}}_i=(u_{i1},\ldots ,u_{iT})^\top \). Of interest is to test the null hypothesis

$$\begin{aligned} H_0: \varvec{\beta }_i=\varvec{\beta }\ \hbox { for all}\ i=1,\ldots ,N, \end{aligned}$$
(2.2)

against the alternative hypothesis

$$\begin{aligned} H_1: \text {there exist some } 1\le i\ne j\le N \text { such that } \varvec{\beta }_{i}\not =\varvec{\beta }_{j}. \end{aligned}$$

A well-known test is the standard F-test, which is valid for fixed N and diverging T when the error variances are homoskedastic, i.e., \(\sigma _i^2=\sigma ^2\). To allow \(N>T\), [17] proposed a Hausman-type test [11] that compares the standard fixed effects estimator with the mean group estimator, that is,

$$\begin{aligned} \hat{\varvec{\beta }}_{\textrm{FE}}=\left( \sum _{i=1}^N \textbf{X}_i^\top \textbf{M}\textbf{X}_i\right) ^{-1}\sum _{i=1}^N \textbf{X}_i^\top \textbf{M}{\varvec{Y}}_i\ \text {and}\ \hat{\varvec{\beta }}_{\textrm{MG}}=\frac{1}{N}\sum _{i=1}^N \hat{\varvec{\beta }}_i, \end{aligned}$$

respectively, where \(\hat{\varvec{\beta }}_i=\left( \textbf{X}_i^\top \textbf{M}\textbf{X}_i\right) ^{-1} \textbf{X}_i^\top \textbf{M}{\varvec{Y}}_i\), \(\textbf{M}=\textbf{I}_T-\varvec{1}_T(\varvec{1}_T^\top \varvec{1}_T)^{-1}\varvec{1}_T^\top \), and \(\textbf{I}_T\) is a \(T\times T\) identity matrix. However, this test would lack power under a random coefficient model such that \(E(\hat{\varvec{\beta }}_\textrm{FE}-\hat{\varvec{\beta }}_{\textrm{MG}})=0\). Phillips and Sul [19] proposed a different Hausman-type test based on

$$\begin{aligned} \left( \hat{\varvec{\beta }}-\varvec{1}_N \otimes \hat{\varvec{\beta }}_{\textrm{FE}}\right) ^\top \hat{{\varvec{\Sigma }}}^{-1}\left( \hat{\varvec{\beta }}-\varvec{1}_N \otimes \hat{\varvec{\beta }}_\textrm{FE}\right) , \end{aligned}$$

where \(\hat{\varvec{\beta }}=(\hat{\varvec{\beta }}_1^\top ,\ldots ,\hat{\varvec{\beta }}_N^\top )^\top \) and \(\hat{{\varvec{\Sigma }}}\) is a consistent estimator of the variance matrix of \(\hat{\varvec{\beta }}-\varvec{1}_N \otimes \hat{\varvec{\beta }}_{\textrm{FE}}\) under \(H_0\). This test is likely to be more powerful than that of [17], but it is still limited to fixed N. Also for fixed N, Swamy [20] proposed a test based on

$$\begin{aligned} {\hat{S}}=\sum _{i=1}^N \left( \hat{\varvec{\beta }}_i-\hat{\varvec{\beta }}_\textrm{WFE}\right) ^\top \frac{\textbf{X}_i^\top \textbf{M}\textbf{X}_i}{\hat{\sigma }_i^2}\left( \hat{\varvec{\beta }}_i-\hat{\varvec{\beta }}_\textrm{WFE}\right) , \end{aligned}$$

where

$$\begin{aligned} \hat{\varvec{\beta }}_{\textrm{WFE}}=\left( \sum _{i=1}^N \frac{\textbf{X}_i^\top \textbf{M}\textbf{X}_i}{\hat{\sigma }_i^2}\right) ^{-1}\sum _{i=1}^N \frac{\textbf{X}_i^\top \textbf{M}{\varvec{Y}}_i}{\hat{\sigma }_i^2} \end{aligned}$$

and \(\hat{\sigma }_i^2=(T-p-1)^{-1}{\left( {\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_i\right) ^\top \textbf{M}\left( {\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_i\right) }\). Based on \({\hat{S}}\), Pesaran and Yamagata [18] showed that, as \(N,T \rightarrow \infty \),

$$\begin{aligned} \hat{\Delta }=\sqrt{N(T+1)}\left( \frac{{\hat{S}}/N-p}{\sqrt{2p(T-p-1)}}\right) \end{aligned}$$

converges in distribution to the standard normal distribution, provided \(N/T^2\rightarrow 0\). Moreover, they proposed an adjusted test statistic, that is,

$$\begin{aligned} {\tilde{\Delta }}_\textrm{adj}=\sqrt{N(T+1)}\left( \frac{\tilde{S}/N-p}{\sqrt{2p(T-p-1)}}\right) , \end{aligned}$$
(2.3)

to weaken the dimension restriction, where

$$\begin{aligned} \tilde{S}=&\sum _{i=1}^N\tilde{S}_i,\ \tilde{S}_i=\left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}\right) ^\top \frac{\textbf{X}_i^\top \textbf{M}\textbf{X}_i}{{\tilde{\sigma }}_i^2}\left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_\textrm{WFE}\right) , \end{aligned}$$
(2.4)
$$\begin{aligned} {\tilde{\varvec{\beta }}}_{\textrm{WFE}}=\left( \sum _{i=1}^N \frac{\textbf{X}_i^\top \textbf{M}\textbf{X}_i}{{\tilde{\sigma }}_i^2}\right) ^{-1}\sum _{i=1}^N \frac{\textbf{X}_i^\top \textbf{M}{\varvec{Y}}_i}{{\tilde{\sigma }}_i^2}, \end{aligned}$$

and \({\tilde{\sigma }}_i^2=(T-1)^{-1}{\left( {\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_\textrm{FE}\right) ^\top \textbf{M}\left( {\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_{\textrm{FE}}\right) }\). In other words, it modifies the \({\hat{S}}\) test by replacing the variance estimators \(\hat{\sigma }_i^2\) with \({\tilde{\sigma }}_i^2\). The authors established its asymptotic normality under \(H_0\) for non-normal errors, provided that \(N/T^4\rightarrow 0\). Notice that for normal errors, both \(\hat{\Delta }\) and \({\tilde{\Delta }}_{\textrm{adj}}\) are valid without any restrictions on the relative growth of N and T.
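To fix ideas, the construction of \({\tilde{\Delta }}_{\textrm{adj}}\) can be sketched in a few lines of NumPy. This is an illustrative implementation under the notation above, not the authors' code; the array layout (Y of shape (N, T), X of shape (N, T, p)) and the function name are our own conventions.

```python
import numpy as np

def delta_adj(Y, X):
    """Adjusted Swamy-type statistic (sketch of Pesaran-Yamagata's tilde-Delta).

    Y : (N, T) array of responses; X : (N, T, p) array of regressors.
    Returns (Delta_adj, S_i) where S_i collects the individual statistics
    tilde-S_i used later by the Max test.
    """
    N, T, p = X.shape
    M = np.eye(T) - np.ones((T, T)) / T              # within (demeaning) projection

    MX = np.einsum('ts,isp->itp', M, X)              # M X_i for each i
    MY = Y @ M.T                                     # M Y_i (M is symmetric)

    XtMX = np.einsum('itp,itq->ipq', MX, MX)         # X_i' M X_i
    XtMY = np.einsum('itp,it->ip', MX, MY)           # X_i' M Y_i
    beta_i = np.linalg.solve(XtMX, XtMY)             # individual OLS slopes

    beta_fe = np.linalg.solve(XtMX.sum(0), XtMY.sum(0))   # pooled FE estimator

    # sigma_tilde_i^2 based on fixed-effects residuals
    res_fe = MY - np.einsum('itp,p->it', MX, beta_fe)
    sig2 = (res_fe ** 2).sum(1) / (T - 1)

    # weighted fixed-effects estimator beta_tilde_WFE
    W = XtMX / sig2[:, None, None]
    beta_wfe = np.linalg.solve(W.sum(0), (XtMY / sig2[:, None]).sum(0))

    d = beta_i - beta_wfe
    S_i = np.einsum('ip,ipq,iq->i', d, W, d)         # individual tilde-S_i
    Delta = np.sqrt(N * (T + 1)) * (S_i.sum() / N - p) / np.sqrt(2 * p * (T - p - 1))
    return Delta, S_i
```

The same routine also yields the ingredients of the Max test, since \(T_{\textrm{Max}}\) is simply the maximum of the returned individual statistics.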

Under the asymptotic regime of diverging N but fixed T, [13] proposed a conditional Lagrange multiplier test based on

$$\begin{aligned} T_{\textrm{CLM}}=&\sum _{i=1}^N S_i^\top \left( \sum _{i=1}^N S_iS_i^\top \right) ^{-1}\sum _{i=1}^N S_i, \end{aligned}$$
(2.5)

where \(S_i=\hat{{\varvec{u}}}_i^\top \textbf{M}\textbf{X}_i\textbf{X}_i^\top \textbf{M}\hat{{\varvec{u}}}_i-\hat{\sigma }_i^2\textrm{tr}(\textbf{X}_i^\top \textbf{M}\textbf{X}_i)\) and \(\hat{{\varvec{u}}}_i=\textbf{M}({\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_{\textrm{FE}})\). By the fact that

$$\begin{aligned} {\hat{S}}=\sum _{i=1}^N \hat{{\varvec{u}}}_i^\top \textbf{M}\textbf{X}_i (\sigma ^2\textbf{X}_i^\top \textbf{M}\textbf{X}_i)^{-1}\textbf{X}_i^\top \textbf{M}\hat{{\varvec{u}}}_i+o_p(1), \end{aligned}$$

the main difference between \(T_{\textrm{CLM}}\) and \(\hat{\Delta }\) is that the statistics \(S_i\) omit the terms \((\sigma _i^2\textbf{X}_i^\top \textbf{M}\textbf{X}_i)^{-1}\) appearing in \({\hat{S}}\). In fact, both can be regarded as testing the independence of \({\varvec{u}}_i\) and \(\textbf{M}\textbf{X}_i\) through the moment conditions \(E({\varvec{u}}_i^\top \textbf{M}\textbf{X}_i{\varvec{W}}_i\textbf{X}_i^\top \textbf{M}{\varvec{u}}_i)=\sigma _i^2E(\textrm{tr}(\textbf{M}\textbf{X}_i{\varvec{W}}_i\textbf{X}_i^\top \textbf{M}))\) with properly defined \({\varvec{W}}_i\), namely \({\varvec{W}}_i=\textbf{I}_p\) for \(T_{\textrm{CLM}}\) and \({\varvec{W}}_i=(\sigma _i^2\textbf{X}_i^\top \textbf{M}\textbf{X}_i)^{-1}\) for \(\hat{\Delta }\). [4] proposed a Lagrange multiplier test allowing for heteroskedastic errors, based on

$$\begin{aligned} T_{\textrm{LM}}=\left( \sum _{i=1}^N\sum _{t=2}^T \tilde{u}_{it}\tilde{z}_{it}\right) ^\top \left( \sum _{i=1}^N\sum _{t=2}^T \tilde{u}_{it}^2\tilde{z}_{it}\tilde{z}_{it}^\top \right) ^{-1}\left( \sum _{i=1}^N\sum _{t=2}^T \tilde{u}_{it}\tilde{z}_{it}\right) , \end{aligned}$$
(2.6)

where \(\tilde{u}_{it}\) is the t-th component of \(\hat{{\varvec{u}}}_i\) and \(\tilde{z}_{it}={\varvec{x}}_{it}\sum _{s=1}^{t-1}\tilde{u}_{is}{\varvec{x}}_{is}\). They showed that \(T_{\textrm{LM}}\rightarrow \chi ^2_p\) in distribution as \(N\rightarrow \infty \) with T fixed.

3 Our Tests

3.1 Methodology

A large value of

$$\begin{aligned} \tilde{S}_i=\left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_\textrm{WFE}\right) ^\top \left( {\textbf{X}_i^\top \textbf{M}\textbf{X}_i}/{{\tilde{\sigma }}_i^2}\right) \left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}\right) \end{aligned}$$

(cf. (2.4)) indicates a heterogeneous individual slope, for \(i=1,\ldots ,N\). Most existing procedures for testing slope homogeneity are based on the summation of all \(\tilde{S}_i\) or some variants thereof. When a large proportion of individual units have heterogeneous slope coefficients (referred to as dense signals), Sum tests accumulate all departure information, yielding a powerful test against \(H_0\). In contrast, when the number of heterogeneous individual units is very small (i.e., sparse signals), the summation statistic accumulates redundant noise, which greatly decreases the testing power. Motivated by this, we propose the maximum-based statistic

$$\begin{aligned} T_{\textrm{Max}}=\max _{1\le i\le N}\tilde{S}_i, \end{aligned}$$
(3.1)

and we refer to the associated testing procedure as the Max test. It can be expected that the Max test would be more powerful against sparse alternatives.

Sometimes we have prior knowledge of the sparsity level and can choose between the Sum and Max tests accordingly. When such priors are unavailable, however, a method that adapts to the sparsity is desirable. We propose combining the Sum and Max tests as follows:

$$\begin{aligned} T_{\textrm{Combo}}=\min \{p_S,p_M\}, \end{aligned}$$
(3.2)

where \(p_M\) and \(p_S\) are the p-values of the Max and Sum tests, respectively. To be specific, \(p_{M}=1-F\left\{ T_\textrm{Max}-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma ({p}/{2}))\right\} \) and \(p_{S}=1-\Phi \left\{ {\tilde{\Delta }}_{\textrm{adj}}\right\} \), where \(F(y)=e^{-e^{-y/2}}\) is the type-I extreme value (Gumbel) distribution function, and \(\Phi (y)\) denotes the standard normal distribution function. Here we use \({\tilde{\Delta }}_{\textrm{adj}}\) of [18] as the Sum test statistic. We refer to this new test as the Combo test; it is expected to perform well regardless of whether the alternatives are sparse or dense.
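Given the individual statistics \(\tilde{S}_i\) and the Sum statistic \({\tilde{\Delta }}_{\textrm{adj}}\), the p-value combination above can be coded directly. The following is a minimal sketch; the function and argument names are our own.

```python
import math

def combo_pvalue(S_i, Delta_adj, p):
    """Combine the Max and Sum tests via the minimum p-value (sketch).

    S_i       : iterable of the individual statistics tilde-S_i
    Delta_adj : the Sum-test statistic of Pesaran and Yamagata
    p         : number of regressors
    Returns (T_Combo, p_M, p_S).
    """
    N = len(S_i)
    T_max = max(S_i)
    # centered/scaled Max statistic and its Gumbel-type p-value;
    # note 2*log(Gamma(p/2)) = 2*lgamma(p/2)
    t = T_max - 2 * math.log(N) - (p - 2) * math.log(math.log(N)) \
        + 2 * math.lgamma(p / 2)
    p_M = 1.0 - math.exp(-math.exp(-t / 2))
    # standard normal p-value for the Sum test
    p_S = 1.0 - 0.5 * (1.0 + math.erf(Delta_adj / math.sqrt(2)))
    return min(p_M, p_S), p_M, p_S
```

The Combo test then rejects \(H_0\) when the returned minimum p-value falls below \(1-\sqrt{1-\alpha }\).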

We summarize some theoretical properties of the Max and Combo tests here; further details are given in the following subsections. For the Max test, we show that, under some mild conditions,

$$\begin{aligned} \tilde{T}_{\textrm{Max}}\equiv T_{\textrm{Max}}-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma ({p}/{2})) \end{aligned}$$

converges in distribution to the type-I extreme value distribution under \(H_0\), as \(N,T\rightarrow \infty \). Hence, given a significance level \(\alpha \in (0,1)\), we reject \(H_0\) if \(\tilde{T}_{\textrm{Max}}\) is larger than the \((1-\alpha )\)-quantile of F(y), namely \(q_{\alpha }\equiv -2\log (\log (1-\alpha )^{-1})\). We also derive the limiting null distribution of the Combo test by demonstrating that the Max statistic is asymptotically independent of the Sum statistic under \(H_0\), as \(N,T\rightarrow \infty \). Consequently, an asymptotic level-\(\alpha \) test rejects \(H_0\) if \(T_\textrm{Combo}<1-\sqrt{1-\alpha }\).

3.2 Max Test

To establish theoretical properties of the Max test, we need the following conditions:

  1. (C1)

\(u_{it}\sim N(0,\sigma _i^2)\) and \(\sigma _{\max }^2=\max _{1\le i\le N}\sigma _i^2\) is bounded.

  2. (C2)

\(u_{it}\) and \(u_{js}\) are independent, for \(i\not =j\) and/or \(t\not =s\).

  3. (C3)

    For \(i=1,\ldots ,N\), \({\varvec{\Sigma }}_{iT}\equiv T^{-1}\textbf{X}_i^\top \textbf{M}\textbf{X}_i\) is positive definite and bounded, and converges to a non-stochastic positive definite and bounded matrix \({\varvec{\Sigma }}_i\), as \(T\rightarrow \infty \). \({\varvec{\Sigma }}_{A}\equiv (NT)^{-1}\left( \sum _{i=1}^N\textbf{X}_i^\top \textbf{M}\textbf{X}_i\right) \) is positive definite and converges to a non-stochastic positive definite matrix \({\varvec{\Sigma }}\), as \(N,T\rightarrow \infty \).

  4. (C4)

\(u_{it}\) is independent of \({\varvec{x}}_{js}\), for all \(i,j,t,s\).

Condition (C1) is crucial for obtaining the asymptotic distribution of the test statistic \(T_{\textrm{Max}}\) and the asymptotic independence between \(T_{\textrm{Max}}\) and \({\tilde{\Delta }}_{\textrm{adj}}\). An extension to non-normal errors deserves further study; see the discussion in Sect. 5. Condition (C2) assumes cross-sectional and serial independence of the errors, Condition (C3) is used for the consistency of the least squares estimators of \(\varvec{\beta }_i\), and Condition (C4) means that the \({\varvec{x}}_{it}\) are strictly exogenous. These conditions are standard in the literature; see, for example, [18].

Theorem 3.1

Suppose conditions (C1)–(C4) hold. Under \(H_0\), if \(\log (N)=o(T^{1/3})\), then

$$\begin{aligned} P\Big \{T_{\textrm{Max}}&-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma ({p}/{2}))\le x\Big \}\\&\rightarrow \exp (-\exp (-x/2)), \end{aligned}$$

as \(N,T\rightarrow \infty \).

According to the limiting null distribution, we can reject \(H_0\) if

$$\begin{aligned} \tilde{T}_{\textrm{Max}}\equiv T_\textrm{Max}-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma ({p}/{2}))\ge q_{\alpha }, \end{aligned}$$

where \(q_{\alpha }\) is the \((1-\alpha )\)-quantile of the type-I extreme value distribution with the cumulative distribution function \(\exp \left\{ -\exp \left( -{x}/{2}\right) \right\} \), namely, \(q_{\alpha }=-2\log (\log (1-\alpha )^{-1})\).
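The critical value is easy to check numerically; a minimal sketch for \(\alpha =0.05\):

```python
import math

alpha = 0.05
# (1 - alpha)-quantile of the type-I extreme value distribution F
q_alpha = -2 * math.log(math.log(1 / (1 - alpha)))
F = lambda y: math.exp(-math.exp(-y / 2))   # F(y) = exp(-exp(-y/2))
assert abs(F(q_alpha) - (1 - alpha)) < 1e-12
print(round(q_alpha, 2))   # 5.94
```

So at the 5% level, \(H_0\) is rejected once the centered Max statistic \(\tilde{T}_{\textrm{Max}}\) exceeds roughly 5.94.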

Now, we turn to the power analysis of the Max test. Define

$$\begin{aligned} \mathcal {A}(c)=\left\{ \varvec{\delta }: \max _{1\le i\le N} T\sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i} \ge c\log (N)\right\} , \end{aligned}$$

where

$$\begin{aligned} \varvec{\omega }_i={\varvec{\delta }}_{i}-\left( N^{-1}\sum _{i=1}^N{\sigma }_i^{-2}{\varvec{\Sigma }}_{iT}\right) ^{-1}\left( N^{-1} \sum _{i=1}^N{\sigma }_i^{-2}{\varvec{\Sigma }}_{iT}{\varvec{\delta }}_i\right) \ \text {and}\ {\varvec{\delta }}_i=\varvec{\beta }_i-\varvec{\beta }. \end{aligned}$$

Notice that [18] considered local alternatives for the Sum test of the form \(\sum _{i=1}^N\sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}=O(T^{-1}N^{1/2})\).

Theorem 3.2

Suppose conditions (C1)–(C4) hold. If \(\log (N)=o(T^{1/3})\), then for any \(\epsilon >0\),

$$\begin{aligned} \inf _{\varvec{\delta }\in \mathcal {A}(16+\epsilon )}P(\Psi _{\alpha }=1) \rightarrow 1, \end{aligned}$$

as \(N,T\rightarrow \infty \), where \(\Psi _\alpha =I(\tilde{T}_{\textrm{Max}}\ge q_{\alpha })\) denotes the Max test and \(P(\Psi _{\alpha }=1)\) is its power function.

Theorem 3.2 shows that the proposed Max test is consistent as long as some \(\sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}\) exceeds the order \(\log (N)/T\).

To make a comparison between the Max and Sum tests, define a class of sparse alternatives

$$\begin{aligned} \mathcal {S}(s_N,c_{T,N})=\left\{ {\varvec{\delta }}: \sum _{i=1}^N I({\varvec{\delta }}_i\not =0)\le s_N,\ (16+\epsilon )\log (N) \le \max _{1\le i\le N} T\sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i} \le c_{T,N}\right\} , \end{aligned}$$

with \(s_N=o(\sqrt{N}/c_{T,N})\). Since \(\mathcal {S}(s_N,c_{T,N})\subset \mathcal {A}(16+\epsilon )\), the Max test is consistent over \(\mathcal {S}(s_N,c_{T,N})\) according to Theorem 3.2. In contrast, Section 3.2 of [18] shows that, under \(\mathcal {S}(s_N,c_{T,N})\), the test based on \({\tilde{\Delta }}_{\textrm{adj}}\) has only trivial power. Hence, the Max test is more powerful than the Sum test in such situations.

3.3 Combo Test

To investigate the limiting null distribution and power property of the proposed Combo test, we first demonstrate the asymptotic independence between \({\tilde{\Delta }}_{\textrm{adj}}\) and \(T_{\textrm{Max}}\) under the null hypothesis.

Theorem 3.3

Suppose conditions (C1)–(C4) hold. Under \(H_0\), if \(\log (N)=o(T^{1/3})\), then \({\tilde{\Delta }}_{\textrm{adj}}\) and \(T_\textrm{Max}\) are asymptotically independent in the sense that

$$\begin{aligned} P\Big ({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}&-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma ({p}/{2}))\le y\Big )\\&\rightarrow \Phi (x)F(y), \end{aligned}$$

as \(N,T\rightarrow \infty \).

As a corollary, we derive the limiting null distribution of the Combo test.

Corollary 3.4

Assume the conditions in Theorem 3.3 hold. Then \(T_{\textrm{Combo}}\) converges in distribution to \(W=\min \{U,V\}\), as \(N,T\rightarrow \infty \), where U and V are independent standard uniform random variables; thus, W has density \(g(w)=2(1-w)I(0\le w \le 1)\).

By Corollary 3.4, given a significance level \(\alpha \), we can reject \(H_0\) if \(T_{\textrm{Combo}}<1-\sqrt{1-\alpha }\approx {\alpha }/{2}\) for a relatively small \(\alpha \).
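Both the cutoff and its level are easy to verify by simulation; the following is an illustrative check, not part of the testing procedure itself.

```python
import random

alpha = 0.05
thr = 1 - (1 - alpha) ** 0.5            # exact level-alpha cutoff, roughly alpha/2

# under H0, T_Combo behaves like min(U, V) with U, V iid standard uniform,
# and P(min(U, V) < thr) = 1 - (1 - thr)^2 = alpha exactly
random.seed(1)
B = 200_000
rate = sum(min(random.random(), random.random()) < thr for _ in range(B)) / B
print(round(thr, 4))                    # 0.0253
# the Monte Carlo rejection rate should be close to alpha = 0.05
```

This confirms that rejecting when \(T_{\textrm{Combo}}<1-\sqrt{1-\alpha }\) gives an exact asymptotic level \(\alpha \), even though the cutoff is only about \(\alpha /2\).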

The power function of the Combo test is \(\beta _C({\varvec{\delta }},\alpha )=P(T_{\textrm{Combo}}<1-\sqrt{1-\alpha })\). It can be verified that

$$\begin{aligned} \beta _C({\varvec{\delta }},\alpha ) =&P\left( p_M<1-\sqrt{1-\alpha }\right) +P\left( p_S<1-\sqrt{1-\alpha }\right) \nonumber \\&-P\left( p_M<1-\sqrt{1-\alpha },\ p_S<1-\sqrt{1-\alpha }\right) \nonumber \\ \ge&\max \left\{ P\left( p_S<1-\sqrt{1-\alpha }\right) ,P\left( p_M<1-\sqrt{1-\alpha }\right) \right\} \nonumber \\ \approx&\max \left\{ \beta _S({\varvec{\delta }},\alpha /2),\beta _M({\varvec{\delta }},\alpha /2)\right\} , \end{aligned}$$
(3.3)

where \(\beta _M({\varvec{\delta }},\alpha )\) and \(\beta _S({\varvec{\delta }},\alpha )\) are the power functions of \(T_{\textrm{Max}}\) and \({\tilde{\Delta }}_{\textrm{adj}}\), respectively, at the significance level \(\alpha \). As shown in [18], \(\beta _S({\varvec{\delta }},\alpha )=\Phi \left( -z_{\alpha }+\psi \right) \), where \(\psi =\lim _{N,T\rightarrow \infty }\frac{1}{\sqrt{2pN}}\sum _{i=1}^N T\sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}\) and \(z_{\alpha }\) is the upper \((1-\alpha )\)-quantile of the standard normal distribution. According to (3.3), we have \(\beta _C({\varvec{\delta }},\alpha )\ge \Phi \left( -z_{\alpha /2}+\psi \right) \).

To compare the power of the tests based on \({\tilde{\Delta }}_{\textrm{adj}}\), \(T_{\textrm{Max}}\), and \(T_{\textrm{Combo}}\), consider a simplified scenario where \({\varvec{\Sigma }}_{iT}=\textbf{I}_p\) and \(\sigma _i^2=1\). Moreover, for m randomly chosen units, the elements of \({\varvec{\delta }}_i=(\delta _{i1},\ldots ,\delta _{ip})^\top \) are sampled from \(U(-\gamma ,\gamma )\) for some \(\gamma >0\), and \({\varvec{\delta }}_i\) is set to 0 for the remaining units, where \(U(a,b)\) is the uniform distribution with support \([a,b]\).

  1. 1.

    Assume that \(m\rightarrow \infty \). By noticing that \(m^{-1}\sum _{i=1}^N{\sigma }_i^{-2}{\varvec{\Sigma }}_{iT}{\varvec{\delta }}_i\mathop {\rightarrow }\limits ^{p}{\varvec{0}}\) and \(m^{-1}\sum _{i=1}^N \sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}\mathop {\rightarrow }\limits ^{p}\frac{1}{3}p\gamma ^2\), we have

    $$\begin{aligned} \beta _S({\varvec{\delta }},\alpha )=\Phi \left( -z_{\alpha }+\frac{Tmp^{1/2}\gamma ^2}{3\sqrt{2N}}\right) . \end{aligned}$$

    In addition, we have \(\epsilon \gamma ^2<\max _{1\le i\le N} \sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}\le p \gamma ^2\), for any positive constant \(\epsilon <p\), with probability approaching one. We consider two special cases:

    1. (1)

Dense case: \(\gamma =O(T^{-\xi })\) and \(m=O(N^{1/2}T^{2\xi -1})\) with some \(\xi >1/2\). In this case, \(T\gamma ^2=o(1)\), and it can be verified that \(\beta _{M}({\varvec{\delta }},\alpha )\approx \alpha \); thus, the Max test has only trivial power. Then \(\beta _C({\varvec{\delta }},\alpha )\approx \beta _S({\varvec{\delta }},\alpha /2)\approx \beta _S({\varvec{\delta }},\alpha )\) if \(\alpha \) is small. Hence, the Combo test performs similarly to the test based on \({\tilde{\Delta }}_{\textrm{adj}}\).

    2. (2)

Sparse case: \(\gamma =c\sqrt{\log N /T}\) for a sufficiently large constant c and \(m=o((\log N)^{-1}N^{1/2})\). In this case, \(\frac{Tmp^{1/2}\gamma ^2}{3\sqrt{2N}}\rightarrow 0\) and \(\beta _{S}({\varvec{\delta }},\alpha )\approx \alpha \); in other words, the Sum test based on \({\tilde{\Delta }}_{\textrm{adj}}\) has only trivial power. According to Theorem 3.2, \(\beta _M({\varvec{\delta }},\alpha )\rightarrow 1\), and consequently \(\beta _C({\varvec{\delta }},\alpha )\rightarrow 1\) as well.

  2. 2.

Assume that m is fixed. Noticing that \(\sum _{i=1}^N \sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}=O_p(\gamma ^2)\) and \(\max _{1\le i\le N} \sigma _i^{-2}\varvec{\omega }_{i}^\top {\varvec{\Sigma }}_{iT} \varvec{\omega }_{i}=O_p(\gamma ^2)\), we can similarly show that, if \(\gamma =c\sqrt{\log N /T}\) for a sufficiently large constant c, then \(\beta _{S}({\varvec{\delta }},\alpha )\approx \alpha \), \(\beta _M({\varvec{\delta }}, \alpha )\rightarrow 1\), and \(\beta _C({\varvec{\delta }},\alpha )\rightarrow 1\).

4 Numerical Studies

4.1 Simulation

In this section, we investigate the finite-sample performance of the proposed Max and Combo tests, based on \(T_{\textrm{Max}}\) and \(T_\textrm{Combo}\), respectively. We compare them with several benchmark approaches, namely the tests based on \({\tilde{\Delta }}_{\textrm{adj}}\) [18], \(T_{\textrm{CLM}}\) [13], and \(T_{\textrm{LM}}\) [4]; see (2.3), (2.5), and (2.6), respectively. We consider the following three examples with independent, correlated, and structured noises, respectively. All simulation results are based on 1000 replications, and the nominal significance level is set to \(\alpha =5\%\) in all examples.

Example 4.1

We revisit the model in [18]

$$\begin{aligned} y_{it}&=\alpha _i+\sum _{l=1}^p x_{ilt}\beta _{il}+u_{it},\ i=1,\ldots ,N,\ t=1,\ldots ,T,\\ x_{ilt}&=\alpha _i(1-\rho _{il})+\rho _{il}x_{il,t-1}+(1-\rho _{il}^2)^{1/2}v_{ilt},\ t=-48,\ldots ,0,\ldots ,T, \end{aligned}$$

where \(v_{ilt}\overset{iid}{\sim }\ N(0,\sigma _{ilx}^2)\). We draw \(\rho _{il}\overset{iid}{\sim }\ U(0.05,0.95)\) and \(\sigma _{ilx}^2\overset{iid}{\sim }\ \chi ^2(1)\) once and keep them fixed throughout the simulation study. We generate \(\alpha _i\overset{iid}{\sim }\ N(1,1)\), and discard the first 49 observations to reduce the effect of initial values. Three scenarios for generating \(u_{it}=\sigma _i z_{it}\), with \(\sigma _i^2\overset{iid}{\sim }\frac{p}{2}\chi ^2(2)\), are considered as follows:

  1. (I)

    Normal distribution, \(z_{it}\overset{iid}{\sim }\ N(0,1)\);

  2. (II)

    t-distribution, \(z_{it}\overset{iid}{\sim }\ t(3)/\sqrt{3}\);

  3. (III)

    Mixture of normals, \(z_{it}\overset{iid}{\sim }\ \{0.9N(0,1)+0.1N(0,100)\}/\sqrt{10.9}\).

Under \(H_0\), \(\beta _{il}\equiv 1\) for all i and l. Under \(H_1\), we set \(\beta _{il}=\beta _{i1}\) for \(l\ne 1\); for \(\{\beta _{11},\ldots ,\beta _{N1}\}\), we first randomly choose \(l_1<\cdots <l_m\) from \(\{1,\ldots ,N\}\), then generate \(\beta _{l_i1}\sim U(1-{1.1}{m^{-0.65}},1+{1.1}{m^{-0.65}})\) for \(i=1,\ldots ,m\), and set the remaining \(\beta _{i1},\ i\notin \{l_1,\ldots ,l_m\}\), to 1.
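For concreteness, the data-generating process of Example 4.1 under scenario (I) can be sketched as follows. This is our own illustrative code (array layout and function name are assumptions), with \(m\ge 1\) heterogeneous units.

```python
import numpy as np

def gen_example41(N, T, p, m, rng):
    """Simulate Example 4.1, scenario (I): normal errors (sketch).

    Returns Y of shape (N, T) and X of shape (N, T, p); m >= 1 units
    receive heterogeneous slopes as under H1.
    """
    alpha = rng.normal(1.0, 1.0, N)                  # fixed effects
    rho = rng.uniform(0.05, 0.95, (N, p))            # AR coefficients
    s2x = rng.chisquare(1, (N, p))                   # regressor noise variances
    sig2 = 0.5 * p * rng.chisquare(2, N)             # error variances

    burn = 49                                        # t = -48, ..., 0 discarded
    X = np.zeros((N, burn + T, p))
    for t in range(1, burn + T):
        v = rng.normal(0.0, 1.0, (N, p)) * np.sqrt(s2x)
        X[:, t] = alpha[:, None] * (1 - rho) + rho * X[:, t - 1] \
                  + np.sqrt(1 - rho ** 2) * v
    X = X[:, burn:]

    beta = np.ones((N, p))                           # H0: all slopes equal 1
    idx = rng.choice(N, size=m, replace=False)       # heterogeneous units
    b = rng.uniform(1 - 1.1 * m ** -0.65, 1 + 1.1 * m ** -0.65, m)
    beta[idx] = b[:, None]                           # beta_il = beta_i1 for all l

    u = np.sqrt(sig2)[:, None] * rng.normal(0.0, 1.0, (N, T))
    Y = alpha[:, None] + np.einsum('itp,ip->it', X, beta) + u
    return Y, X
```

Scenarios (II) and (III) only change the draw of \(z_{it}\) (scaled t(3) or a normal mixture) in the last step.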

Example 4.2

To study the impact of correlated errors on the testing procedures, we generate \(u_{it}=\sigma _i z_{it}\) with \(\varvec{z}_t=(z_{1t},\ldots ,z_{Nt})^\top \sim N(\varvec{0}, {\varvec{\Sigma }}_z)\), where \({\varvec{\Sigma }}_z=(0.5^{|i-j|})_{1\le i,j \le N}\). The other settings are the same as in Example 4.1.

Example 4.3

We consider a high-dimensional panel data model with interactive fixed effects [2]. We generate \(u_{it}=\varvec{f}_t^\top \varvec{\lambda }_i+\sigma _iz_{it}\) [1], where \(\varvec{f}_t\) are 2-dimensional factor vectors with iid N(0, 1) entries, \(\varvec{\lambda }_i\overset{iid}{\sim }\ N(\varvec{0},0.25\textbf{I}_2)\) are factor-loading vectors, and \(\varvec{z}_t=(z_{1t},\ldots ,z_{Nt})^\top \sim N(\varvec{0}, {\varvec{\Sigma }}_z)\), with \({\varvec{\Sigma }}_z=(0.5^{|i-j|})_{1\le i,j \le N}\), is the noise. The other settings are the same as in Example 4.1, except that, under \(H_1\), \(\beta _{l_i1}\overset{iid}{\sim } U(1-{2}{m^{-0.6}},1+{2}{m^{-0.6}})\).
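The interactive-fixed-effects errors of Example 4.3 can be generated along these lines. This is an illustrative sketch; the Cholesky construction of the cross-sectionally correlated noise is our own implementation choice.

```python
import numpy as np

def factor_errors(N, T, sig2, rng):
    """u_it = f_t' lambda_i + sigma_i z_it with Sigma_z = (0.5^{|i-j|}); sketch."""
    F = rng.normal(0.0, 1.0, (T, 2))                 # 2-dim factors f_t
    Lam = rng.normal(0.0, 0.5, (N, 2))               # loadings ~ N(0, 0.25 I_2)
    Sz = 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    L = np.linalg.cholesky(Sz)                       # L L' = Sigma_z
    Z = L @ rng.normal(0.0, 1.0, (N, T))             # columns z_t ~ N(0, Sigma_z)
    return Lam @ F.T + np.sqrt(sig2)[:, None] * Z    # (N, T) error matrix
```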

Table 1 presents the empirical sizes of the various slope homogeneity tests under Example 4.1, with \(p\in \{2,3,4\}\), \(T\in \{50,100\}\), and \(N\in \{50,100,200\}\). The Max test is somewhat conservative when the sample size is small. This is not surprising, because the rate of convergence to the extreme value distribution is rather slow [16]. In most cases, the \({\tilde{\Delta }}_{\textrm{adj}}\), \(T_{\textrm{LM}}\), and \(T_{\textrm{Combo}}\) tests maintain the sizes at the nominal level, while the \(T_{\textrm{CLM}}\) test tends to be fairly conservative.

Table 1 Empirical Sizes of various slope homogeneity tests under Example 4.1

Figures 1–3 report the power of the tests with \(p=2\), 3, and 4, respectively, under Example 4.1, when \(T=100\) and \(N=200\). We observe that the \(T_{\textrm{LM}}\) and \(T_{\textrm{CLM}}\) tests do not perform well, exhibiting low power. As expected, the Max test outperforms the \({\tilde{\Delta }}_{\textrm{adj}}\)-based Sum test when m is relatively small, whereas the Sum test performs better as m becomes large. This is consistent with our theoretical results: the Sum test is favorable for detecting dense signals, while the Max test is preferred in sparse scenarios. The proposed Combo test performs similarly to the Max test for small m and similarly to the Sum test for large m; moreover, it outperforms both for moderate m. Our simulation results reveal that the Combo test is efficient in most cases and adapts to different levels of sparsity. In addition, the proposed Max and Combo tests, together with \({\tilde{\Delta }}_{\textrm{adj}}\), are robust to non-normal noise.

Fig. 1
figure 1

Power of various slope homogeneity tests with \(p=2\) under Example 4.1

Fig. 2
figure 2

Power of various slope homogeneity tests with \(p=3\) under Example 4.1

Fig. 3
figure 3

Power of various slope homogeneity tests with \(p=4\) under Example 4.1

Tables 2 and 3 present the empirical sizes of the various tests under Examples 4.2 and 4.3, respectively, for a wide range of (p, N, T) configurations. Figures 4 and 5 depict the power of each test against the sparsity level m. Conclusions similar to those under Example 4.1 can be drawn; in particular, the Combo test adapts to the sparsity level and has very good power. We also conduct a simulation with larger dimensions, \(N=400\) and \(T=200\), under Example 4.2; see Fig. 6, which shows that the proposed tests perform satisfactorily.

Table 2 Empirical Sizes of various slope homogeneity tests under Example 4.2
Fig. 4
figure 4

Power of various slope homogeneity tests under Example 4.2

Table 3 Empirical Sizes of various slope homogeneity tests under Example 4.3
Fig. 5
figure 5

Power of various slope homogeneity tests under Example 4.3

Fig. 6
figure 6

Power of various slope homogeneity tests under Example 4.2 with \(N=400\) and \(T=200\)

Table 4 Empirical sizes of the proposed tests with their bootstrap calibrations under Example 4.2

Having observed that the Max test can sometimes be conservative (see, for example, Tables 1-3), we provide a bootstrap calibration procedure to address this issue. Based on the residuals \(\hat{{\varvec{u}}}_{i}=\textbf{M}({\varvec{Y}}_i-\textbf{X}_i\hat{\varvec{\beta }}_i)=({\hat{u}}_{i1},\ldots ,{\hat{u}}_{iT})^\top \), for \(i=1,\ldots ,N\), we generate N bootstrap samples

$$\begin{aligned} Y_i^*=\textbf{X}_i \varvec{\gamma }_0+\varvec{\eta }_{i}, i=1,\ldots ,N, \end{aligned}$$

where \(\varvec{\gamma }_0=(1,\ldots ,1)^\top \in \mathbb {R}^{p}\) and \(\varvec{\eta }_i=(\eta _{i1},\ldots ,\eta _{iT})^\top \), such that the \(\varvec{\eta }_{\cdot t}\) are bootstrap samples from \(\{\hat{{\varvec{u}}}_{\cdot t}\}_{t=1}^T\), with \(\hat{{\varvec{u}}}_{\cdot t}=({\hat{u}}_{1t},\ldots ,{\hat{u}}_{Nt})^\top \) and \(\varvec{\eta }_{\cdot t}\) the tth column of \((\varvec{\eta }_1,\ldots ,\varvec{\eta }_N)^\top \). Hence, a bootstrap calibrated Max statistic can be computed from the bootstrap sample \((Y_i^*, \textbf{X}_i),\ i=1,\ldots ,N\). By repeating the sampling procedure \(B=500\) times, an empirical p-value, say \(p^*_{M}\), can be obtained. If \(p^{*}_{M}<\alpha \), we reject the null hypothesis. We refer to this bootstrap calibrated procedure as Max*. In a similar way, we can define a bootstrap calibrated Combo test, which rejects \(H_0\) if \(p^{*}_{M}<1-\sqrt{1-\alpha }\) or \(p_S<1-\sqrt{1-\alpha }\); we refer to it as the Combo* test. Table 4 reports the empirical sizes of the proposed testing procedures, together with their bootstrap calibrations, under Example 4.2. We observe that both calibrated procedures perform very well. Investigating the asymptotic validity of these calibrated tests is left for future research.
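The resampling scheme above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy dimensions, the synthetic residual matrix, and the placeholder statistic `boot_stat` are all assumptions. The key point is that whole cross-sections \(\hat{{\varvec{u}}}_{\cdot t}\) are drawn with replacement, so each resampled column is an intact column of the residual matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, p, B = 20, 50, 2, 200          # toy dimensions (illustrative)

# toy inputs: residual matrix U_hat (N x T) and regressors X (N x T x p);
# in practice U_hat would hold the within-transformed residuals \hat u_{it}
U_hat = rng.standard_normal((N, T))
X = rng.standard_normal((N, T, p))
gamma0 = np.ones(p)                   # common slope under H_0

def resample_errors(U_hat, rng):
    """Draw T cross-sections u_hat[:, t] with replacement, keeping each
    column intact so cross-sectional dependence at time t is preserved."""
    idx = rng.integers(0, U_hat.shape[1], size=U_hat.shape[1])
    return U_hat[:, idx]

def bootstrap_pvalue(stat_obs, boot_stat, B, rng):
    """Empirical p-value p*_M from B bootstrap replicates of a statistic.
    boot_stat is a placeholder for the Max statistic of the paper."""
    stats = np.empty(B)
    for b in range(B):
        eta = resample_errors(U_hat, rng)
        Y_star = np.einsum('itk,k->it', X, gamma0) + eta  # Y*_i = X_i gamma0 + eta_i
        stats[b] = boot_stat(Y_star, X)
    return (1 + np.sum(stats >= stat_obs)) / (B + 1)
```

The Max* test then rejects when this p-value falls below \(\alpha\); the Combo* test compares it to \(1-\sqrt{1-\alpha}\) as described above.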

4.2 Real data analysis

We study a real-data example of securities in stock markets to assess the performance of the proposed tests. To model the data, we use the Fama–French three-factor model [9], which adds size risk and value risk factors to the market risk factor in the capital asset pricing model. To be specific, assume

$$\begin{aligned} Y_{it}=r_{it}-r_{\textrm{f}t}=\alpha _i+\beta _{i1}(r_{\textrm{m}t}-r_{\textrm{f}t})+\beta _{i2}\textrm{SMB}_t+\beta _{i3} {\textrm{HML}}_t+u_{it}, \end{aligned}$$

for \(i\in \{1,\ldots ,N\}\) and \(t\in \{\tau ,\ldots ,\tau +T-1\}\), where \(r_{it}\) is the return of portfolio i at time t, \(r_{\textrm{f}t}\) is the risk-free rate at time t, \(r_{\textrm{m}t}\) is the market portfolio return at time t, \(\textrm{SMB}_t\) is the size premium (small minus big), and \(\textrm{HML}_t\) is the value premium (high minus low). We are interested in testing \(H_0: \varvec{\beta }_i=\varvec{\beta }\) for all \(i=1,\ldots ,N\) versus \(H_1:\varvec{\beta }_i\ne \varvec{\beta }_j\) for some \(1\le i\ne j\le N\), where \(\varvec{\beta }_i=(\beta _{i1},\beta _{i2},\beta _{i3})^\top \), for all \(i=1,\ldots ,N\).
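Estimating each \(\varvec{\beta }_i\) amounts to a per-portfolio OLS regression of excess returns on the three factors. The sketch below is illustrative only (synthetic data, assumed array layout), not the authors' implementation:

```python
import numpy as np

def fit_three_factor(excess_ret, factors):
    """OLS fit of r_it - r_ft = alpha_i + beta_i' f_t + u_it for one
    portfolio.  excess_ret: (T,) excess returns; factors: (T, 3) columns
    [r_m - r_f, SMB, HML].  Returns (alpha_i, beta_i)."""
    T = len(excess_ret)
    Z = np.column_stack([np.ones(T), factors])     # prepend intercept
    coef, *_ = np.linalg.lstsq(Z, excess_ret, rcond=None)
    return coef[0], coef[1:]
```

Applying this to each of the \(N\) portfolios gives the collection \(\hat{\varvec{\beta }}_1,\ldots ,\hat{\varvec{\beta }}_N\) whose homogeneity is being tested.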

Two data sets are investigated. The first consists of securities in China's stock markets. We consider \(N=1,340\) securities during the period from June 2005 to May 2019, measured in percentages per month; this yields \(T=144\) temporal observations. The rate of China's 10-year government bond is chosen as the risk-free rate \(r_{\textrm{f}t}\), the value-weighted return on the stocks of the Shanghai Stock Exchange and the Shenzhen Stock Exchange is used as a proxy for the market return \(r_{\textrm{m}t}\), the average return on three small portfolios minus that on three big portfolios is calculated as \(\textrm{SMB}_t\), and the average return on two value portfolios minus that on two growth portfolios is used as \(\textrm{HML}_t\). The second data set is from the S&P 500 index. We specify the time range from January 2005 to November 2018, with \(T=165\), and collect \(N=374\) securities during this period.

We first apply the five tests based on \(T_{\textrm{Max}}, {\tilde{\Delta }}_\textrm{adj}, T_{\textrm{CLM}}, T_{\textrm{LM}}\) and \(T_{\textrm{Combo}}\) to each data set with full samples. All tests reject the null hypothesis significantly, which shows that different stocks have different beta values. Next, we consider a restricted data size by randomly sampling \(T\in \{25,30\}\) and \(N\in \{30,50,80\}\) observations from each data set, repeating the process 1,000 times for each \((T, N)\) combination. Table 5 reports the rejection rates of the different tests for each data set. We observe that the Max test does not perform very well, indicating that the signal may be dense. This is further supported by the fact that the \({\tilde{\Delta }}_\textrm{adj}\)-based Sum test performs the best over all \((T, N)\) combinations. The proposed Combo test (i.e., \(T_{\textrm{Combo}}\)) performs very similarly to the Sum test, consistent with our theoretical and simulation findings.

Table 5 Rejection rates of different tests for beta testing, regarding the China and US stock markets

5 Discussion

In this paper, we propose two approaches for slope homogeneity testing in high-dimensional panel data models, namely the Max and Combo tests. The Max test is more powerful than traditional Sum tests under sparse alternatives, while the Combo test is robust to different levels of sparsity. We establish the limiting null distributions of both test statistics. Two limitations of the present work are: (1) the errors are assumed to be normal; and (2) the cross-sectional units are assumed to be independent. Our simulation studies show that the proposed tests may perform satisfactorily under non-normal and/or correlated errors, but the theoretical properties deserve further study. Recent developments in slope homogeneity testing with cross-sectional dependence and/or serially correlated errors, such as [1, 3] and [4], could be combined with our methods; we leave this for future research.

6 Appendix

6.1 Some useful lemmas

Lemma 6.1 restates a result in Table 3.4.4 in [8].

Lemma 6.1

Suppose \(z_i \overset{iid}{\sim }\ \Gamma (k,\theta )\),   for \(i\!=\!1,\ldots ,n\). Then \(a_n(\max _{1\le i\le n}z_i-b_n)\mathop {\rightarrow }\limits ^{d}\!\Lambda \), as \(n\rightarrow \infty \), where \(\Lambda \) is the Gumbel distribution with \(P(\Lambda <x)=e^{-e^{-x}}\), \(a_n=1/\theta \), and \(b_n=\theta (\log (n)+(k-1)\log (\log (n))-\log (\Gamma (k)))\).
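As a quick numerical sanity check of this normalization (an illustrative sketch, not part of the paper), consider the exponential special case \(k=1\), \(\theta =1\), where \(a_n=1\) and \(b_n=\log (n)\): the normalized maximum should fluctuate around the Gumbel mean, the Euler-Mascheroni constant \(\gamma \approx 0.5772\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 2000

# normalized maxima a_n * (max_i z_i - b_n) for Gamma(1, 1) = Exp(1),
# where a_n = 1 and b_n = log(n)
z = rng.exponential(scale=1.0, size=(reps, n))
m = z.max(axis=1) - np.log(n)

# the Gumbel limit has mean gamma (Euler-Mascheroni) ~ 0.5772
print(m.mean())
```

The Monte Carlo mean lands close to 0.58, as the Gumbel limit predicts; convergence in \(n\) is slow (logarithmic), which is typical for extreme-value approximations.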

Lemma 6.2

Suppose \(\varvec{\varepsilon }_i \overset{iid}{\sim }N(0,\textbf{I}_p)\),  for \(i\!=\!1,\ldots ,N\). Let \(\textbf{A}_i\in \mathbb {R}^{p\times p}\) be positive definite matrices, for all \(i=1,\ldots ,N\), and \(\max _{1\le i\le N} \lambda _{\max }(\textbf{A}_i) \le C\), for some positive constant C. Then, \(\max _{1\le i\le N} \varvec{\varepsilon }_i^\top \textbf{A}_i \varvec{\varepsilon }_i=O_p(\log (N))\), as \(N\rightarrow \infty \).

Proof

Consider the eigenvalue decomposition \(\textbf{A}_i={\varvec{\Omega }}_i^\top \textbf{D}_i{\varvec{\Omega }}_i\), where \(\textbf{D}_i=\textrm{diag}(\lambda _{i1},\ldots ,\lambda _{ip})\), \(\lambda _{ik}\) are the eigenvalues of \(\textbf{A}_i\), and \({\varvec{\Omega }}_i\) is an orthogonal matrix. Then, \(\varvec{\varepsilon }_i^\top \textbf{A}_i \varvec{\varepsilon }_i=\varvec{\varepsilon }_i^\top {\varvec{\Omega }}_i^\top \textbf{D}_i {\varvec{\Omega }}_i\varvec{\varepsilon }_i\). Since \(\varvec{\varepsilon }_i \sim N(0,\textbf{I}_p)\), \({\varvec{\Omega }}_i\varvec{\varepsilon }_i \sim N(0,\textbf{I}_p)\). Thus, \(\varvec{\varepsilon }_i^\top \textbf{A}_i \varvec{\varepsilon }_i\) equals \(\varvec{\varepsilon }_i^\top \textbf{D}_i \varvec{\varepsilon }_i=\sum _{k=1}^p \lambda _{ik}\varvec{\varepsilon }_{ik}^2\) in distribution. Then,

$$\begin{aligned} \max _{1\le i\le N} \varvec{\varepsilon }_i^\top \textbf{A}_i \varvec{\varepsilon }_i \overset{d}{=}\max _{1\le i\le N} \sum _{k=1}^p \lambda _{ik}\varvec{\varepsilon }_{ik}^2\le \Big (\max _{1\le i\le N} \lambda _{\max }(\textbf{A}_i)\Big ) \max _{1\le i\le N}\sum _{k=1}^p\varvec{\varepsilon }_{ik}^2 \le C\max _{1\le i\le N}\sum _{k=1}^p\varvec{\varepsilon }_{ik}^2. \end{aligned}$$

Obviously, \(\sum _{k=1}^p \varvec{\varepsilon }_{ik}^2 \sim \chi ^2_p=\Gamma (\frac{p}{2},2)\). Thus, by Lemma 6.1, we have

$$\begin{aligned} P\left( \max _{1\le i\le N}\sum _{k=1}^p \varvec{\varepsilon }_{ik}^2 \le 3\log (N)\right) \sim \exp (-\exp (-\log (N)/2))\rightarrow 1, \end{aligned}$$

as \(N\rightarrow \infty \). Then, \(\max _{1\le i\le N}\sum _{k=1}^p \varvec{\varepsilon }_{ik}^2=O_p(\log (N))\), and thus \(\max _{1\le i\le N} \varvec{\varepsilon }_i^\top \textbf{A}_i \varvec{\varepsilon }_i=O_p(\log (N))\). \(\square \)

Lemma 6.3 restates Lemma 6.1 in [14].

Lemma 6.3

Suppose \(X \sim \chi ^2_k\). Then, for any \(x>0\), \(P(X\ge k+\sqrt{2kx}+2x)\le \exp (-x)\) and \(P(k-X\ge \sqrt{2kx})\le \exp (-x)\).
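These Laurent-Massart-type tail bounds can be verified numerically; the Monte Carlo below is an illustrative sketch (with arbitrary choices \(k=10\), \(x=2\)), not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
k, x, n = 10, 2.0, 400_000

X = rng.chisquare(df=k, size=n)

# upper-tail bound: P(X >= k + sqrt(2 k x) + 2 x) <= exp(-x)
upper = np.mean(X >= k + np.sqrt(2 * k * x) + 2 * x)
# lower-tail bound: P(k - X >= sqrt(2 k x)) <= exp(-x)
lower = np.mean(k - X >= np.sqrt(2 * k * x))

print(upper, lower, np.exp(-x))
```

Both empirical tail frequencies fall well below the bound \(e^{-x}\), reflecting that the bounds are not tight but dimension-free.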

Lemma 6.4

Suppose Conditions (C1)–(C4) hold. Under \(H_0\), \(\max _{1\le i\le N}|{\tilde{\sigma }}_i^2-\sigma _i^2|=O_p(\sqrt{\log (N)/T})\).


Proof

According to (A.15) in [18], we have

$$\begin{aligned} {\tilde{\sigma }}_i^2=\frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}+\frac{1}{N(T-1)}\varvec{\xi }_A^\top {\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1}\varvec{\xi }_A+\frac{2}{\sqrt{N}(T-1)}\varvec{\xi }_A^\top {\varvec{\Sigma }}_A \varvec{\xi }_i, \end{aligned}$$

where \(\varvec{\xi }_i=T^{-1/2}\textbf{X}_i^\top \textbf{M}\varvec{\varepsilon }_i\) and \(\varvec{\xi }_A=N^{-1/2}\sum _{i=1}^N \varvec{\xi }_i\). Thus,

$$\begin{aligned} \max _{1\le i\le N}|{\tilde{\sigma }}_i^2-\sigma _i^2|&\le \max _{1\le i\le N} \left| \frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}-\sigma _i^2\right| +\max _{1\le i\le N} \frac{1}{N(T-1)}\varvec{\xi }_A^\top {\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1}\varvec{\xi }_A\\&+\max _{1\le i\le N}\frac{2}{\sqrt{N}(T-1)}|\varvec{\xi }_A^\top {\varvec{\Sigma }}_A \varvec{\xi }_i|. \end{aligned}$$

By Condition (C1), we have \(\sigma _i^{-2}\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i \sim \chi ^2_{T-1}\). Let \(\sigma _{\max }^2=\max _{1\le i\le N}\sigma _i^2\). By Lemma 6.3, we have

$$\begin{aligned}&P\left( \max _{1\le i\le N} \frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}-\sigma _i^2>3\sigma _{\max }^{2}\sqrt{\log (N)/T}\right) \\&\le N P\left( \frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}-\sigma _i^2>3\sigma _{\max }^2\sqrt{\log (N)/T}\right) \\&\le N P\left( \frac{\sigma _i^{-2}\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}-1>3\sigma _i^{-2}\sigma _{\max }^2\sqrt{\log (N)/T}\right) \\&\le N P\left( \chi ^2_{T-1}-(T-1)\ge \sqrt{2.5(T-1)\log (N)}+2.5\log (N)\right) \\&\le N\exp (-1.25 \log (N))=N^{-1/4}\rightarrow 0. \end{aligned}$$

Similarly, we have

$$\begin{aligned}&P\left( \max _{1\le i\le N} \sigma _i^2-\frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}>3\sigma _{\max }^{2}\sqrt{\log (N)/T}\right) \rightarrow 0. \end{aligned}$$

Thus,

$$\begin{aligned} P\left( \max _{1\le i\le N} \left| \sigma _i^2-\frac{\varvec{\varepsilon }_i^\top \textbf{M}\varvec{\varepsilon }_i}{T-1}\right| >3\sigma _{\max }^{2}\sqrt{\log (N)/T}\right) \rightarrow 0. \end{aligned}$$

By Condition (C1), we have

$$\begin{aligned} \varvec{\xi }_A^\top {\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1}\varvec{\xi }_A\overset{d}{=} {\varvec{z}}^\top {\varvec{\Sigma }}_{\sigma }^{1/2}{\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1} {\varvec{\Sigma }}_{\sigma }^{1/2}{\varvec{z}}, \end{aligned}$$

where \({\varvec{z}}\sim N(0,\textbf{I}_p)\) and \({\varvec{\Sigma }}_{\sigma }=N^{-1}\sum _{i=1}^N\sigma _i^2{\varvec{\Sigma }}_{iT}\). By Condition (C3), the eigenvalues of \( {\varvec{\Sigma }}_{\sigma }^{1/2}{\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1} {\varvec{\Sigma }}_{\sigma }^{1/2}\) are bounded. Thus,

$$\begin{aligned} \max _{1\le i\le N} \frac{1}{N(T-1)}\varvec{\xi }_A^\top {\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1}\varvec{\xi }_A\le \frac{C}{N(T-1)}\max _{1\le i\le N}{\varvec{z}}^\top {\varvec{z}}=\frac{C}{N(T-1)}{\varvec{z}}^\top {\varvec{z}}. \end{aligned}$$

Because \({\varvec{z}}^\top {\varvec{z}}\sim \chi ^2_p\), \(\max _{1\le i\le N} \frac{1}{N(T-1)}\varvec{\xi }_A^\top {\varvec{\Sigma }}_A^{-1}{\varvec{\Sigma }}_{iT}{\varvec{\Sigma }}_A^{-1}\varvec{\xi }_A=O_p(N^{-1}T^{-1})\).

Next, notice that

$$\begin{aligned} \max _{1\le i\le N} |\varvec{\xi }_A^\top {\varvec{\Sigma }}_A \varvec{\xi }_i| \le \max _{1\le i\le N} N^{-1/2} \varvec{\xi }_i^\top {\varvec{\Sigma }}_A\varvec{\xi }_i+ \max _{1\le i\le N} N^{-1/2}\Big |\sum _{j\not =i} \varvec{\xi }_j^\top {\varvec{\Sigma }}_A \varvec{\xi }_i\Big |. \end{aligned}$$

By condition (C1), we have

$$\begin{aligned} \varvec{\xi }_i^\top {\varvec{\Sigma }}_A\varvec{\xi }_i \overset{d}{=}\ {}&\sigma _i^2{\varvec{z}}_i^\top {\varvec{\Sigma }}_{iT}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i, \\N^{-1/2}\sum _{j\not =i} \varvec{\xi }_j^\top {\varvec{\Sigma }}_A \varvec{\xi }_{i} \overset{d}{=}\ {}&N^{-1/2}\sum _{j\not =i} \sigma _i\sigma _j{\varvec{z}}_j^\top {\varvec{\Sigma }}_{iT}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i\\ \overset{d}{=}\ {}&(1-1/N)^{1/2}\sigma _i \varvec{e}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i, \end{aligned}$$

where \(\varvec{e}_i\sim N(0,\textbf{I}_p)\) is independent of \({\varvec{z}}_i\) and \({\varvec{\Sigma }}_{Ai}=\frac{1}{(N-1)}\sum _{j\not =i}\sigma _j^2{\varvec{\Sigma }}_{jT}\). By condition (C3) and Lemma 6.2, \(\max _{1\le i\le N} N^{-1/2} \varvec{\xi }_i^\top {\varvec{\Sigma }}_A\varvec{\xi }_i=O_p(N^{-1/2}\log (N))\). Note that

$$\begin{aligned} \max _{1\le i\le N}&\left( N^{-1/2}\sum _{j\not =i} \varvec{\xi }_j^\top {\varvec{\Sigma }}_A \varvec{\xi }_i\right) ^2\\&\le \max _{1\le i\le N} \left( \sigma _i\varvec{e}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i\right) ^2\\&\le \max _{1\le i\le N} \left( \varvec{e}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}\varvec{e}_i\right) \max _{1\le i\le N} \sigma _i^2\left( \varvec{z}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i\right) . \end{aligned}$$

By Lemma 6.2, \(\max _{1\le i\le N} \left( \varvec{z}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}{\varvec{z}}_i\right) =O_p(\log (N))\). Note that

$$\begin{aligned} \max _{1\le i\le N} \left( \varvec{e}_i^\top {\varvec{\Sigma }}_{Ai}^{1/2}{\varvec{\Sigma }}_A{\varvec{\Sigma }}_{iT}^{1/2}\varvec{e}_i\right) \le C \max _{1\le i\le N} \varvec{e}_i^\top \varvec{e}_i\overset{d}{=}\ C\max _{1\le i\le N} \varvec{\zeta }_i, \end{aligned}$$

where \(\varvec{\zeta }_i\sim \chi ^2_p\). By Lemma 6.3,

$$\begin{aligned} P\left( \max _{1\le i\le N} \varvec{\zeta }_i> 3 \log (N)\right)&\le \sum _{i=1}^N P(\varvec{\zeta }_i> 3 \log (N))\\&=N P(\chi ^2_p>3 \log (N)) \\&\le N P(\chi ^2_p >p+2\sqrt{2.5p\log (N)}+2.5 \log (N))\\&\le N\exp (-1.25\log (N))=N^{-1/4} \rightarrow 0. \end{aligned}$$

Thus, \(\max _{1\le i\le N} \varvec{\zeta }_i=O_p(\log (N))\). Then, we have

$$\begin{aligned} \max _{1\le i\le N} \left( \left| N^{-1/2}\sum _{j\not =i} \varvec{\xi }_j^\top {\varvec{\Sigma }}_A \varvec{\xi }_i\right| \right) =O_p(\log (N)). \end{aligned}$$

Consequently, \(\max _{1\le i\le N}\frac{2}{\sqrt{N}(T-1)}|\varvec{\xi }_A^\top {\varvec{\Sigma }}_A \varvec{\xi }_i|=O_p\left( \frac{\log (N)}{\sqrt{N}T}\right) \).

Combining these facts together, we have

$$\begin{aligned} \max _{1\le i\le N}|{\tilde{\sigma }}_i^2-\sigma _i^2|=O_p\left( \sqrt{\frac{\log (N)}{T}}\right) +O_p\left( \frac{1}{NT}\right) +O_p\left( \frac{\log (N)}{\sqrt{N}T}\right) , \end{aligned}$$

which completes the proof. \(\square \)

6.2 Proof of Theorem 3.1

Under \(H_0\), we have

$$\begin{aligned} \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_\textrm{WFE}=T^{-1/2}{\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_{i}-T^{-1/2}N^{-1/2}\left( N^{-1}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2} {\varvec{\Sigma }}_{jT}\right) ^{-1}\left( N^{-1/2}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2}\varvec{\xi }_j\right) . \end{aligned}$$

Define \({\tilde{{\varvec{\Sigma }}}}_A=N^{-1}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2}{\varvec{\Sigma }}_{jT}\) and \({\tilde{\varvec{\xi }}}_A=N^{-1/2}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2}\varvec{\xi }_j\). Then,

$$\begin{aligned} \tilde{S}_i=&\left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}\right) ^\top \frac{\textbf{X}_i^\top \textbf{M}\textbf{X}_i}{{\tilde{\sigma }}_i^{2}}\left( \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}\right) \\ =&{\tilde{\sigma }}_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i-2N^{-1/2}{\tilde{\sigma }}_i^{-2}\varvec{\xi }_i^\top {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A+N^{-1}{\tilde{\varvec{\xi }}}_A^{\top } {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A. \end{aligned}$$

By Condition (C1), we have \(\sigma _i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i \sim \chi ^2_p\). By Lemma 6.1, we have

$$\begin{aligned} P_{H_0}\Big \{\max _{1\le i \le N} {\sigma }_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i&-2\log (N)-(p-2)\log (\log (N))+2\log \left( \Gamma (\frac{p}{2})\right) \le x\Big \}\\&\rightarrow \exp (-\exp (-x/2)). \end{aligned}$$

By Lemmas 6.2 and 6.4,

$$\begin{aligned} \Big |\max _{1\le i \le N} {\tilde{\sigma }}_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1} \varvec{\xi }_i -&\max _{1\le i \le N}{\sigma }_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i\Big |\\ \le&\max _{1\le i \le N} \left| {\tilde{\sigma }}_i^{2}-\sigma _i^{2}\right| {\tilde{\sigma }}_i^{-2} \max _{1\le i \le N} {\sigma }_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i\\ =&O_p\left( \sqrt{\frac{\log (N)}{T}}\right) O_p(\log (N))\\ =&O_p\left( \frac{\log ^{3/2}(N)}{\sqrt{T}}\right) =o_p(1). \end{aligned}$$

Next, we show that

$$\begin{aligned} \max _{1\le i\le N} N^{-1}{\tilde{\varvec{\xi }}}_A^{\top } {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A=O_p(N^{-1})=o_p(1). \end{aligned}$$

By Condition (C1), we have

$$\begin{aligned} {\tilde{\varvec{\xi }}}_A^{\top } {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A \overset{d}{=}\ {\varvec{z}}^\top {\tilde{{\varvec{\Sigma }}}}_A^{-1/2}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1/2} {\varvec{z}}, \end{aligned}$$

and by Condition (C3), the eigenvalues of \({\tilde{{\varvec{\Sigma }}}}_A^{-1/2}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1/2}\) are bounded. Thus,

$$\begin{aligned}&\max _{1\le i\le N} N^{-1}{\tilde{\varvec{\xi }}}_A^{\top } {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A\\&\quad \le N^{-1}C{\varvec{z}}^\top {\varvec{z}}=O_p(N^{-1}). \end{aligned}$$

By the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \max _{1\le i \le N}&N^{-1}\left( {\tilde{\sigma }}_i^{-2}\varvec{\xi }_i^\top {\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A\right) ^2\\&\le \max _{1\le i\le N} {\tilde{\sigma }}_i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1} \varvec{\xi }_i \times \max _{1\le i\le N} N^{-1}{\tilde{\varvec{\xi }}}_A^{\top } {\tilde{{\varvec{\Sigma }}}}_A^{-1} {\varvec{\Sigma }}_{iT}{\tilde{{\varvec{\Sigma }}}}_A^{-1}{\tilde{\varvec{\xi }}}_A\\&= O_p(\log (N))O_p(N^{-1})=o_p(1). \end{aligned}$$

Thus,

$$\begin{aligned}&P_{H_0}\left\{ \max _{1\le i \le N} \tilde{S}_i -2\log (N)-(p-2)\log (\log (N))+2\log \left( \Gamma \left( \frac{p}{2}\right) \right) \le x\right\} \\&\quad \rightarrow \exp (-\exp (-x/2)). \end{aligned}$$
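In practice, this Gumbel limit yields an explicit level-\(\alpha \) critical value for \(\max _{1\le i\le N}\tilde{S}_i\): the centering constants plus \(q_{\alpha }=-2\log (-\log (1-\alpha ))\). A small helper (an illustrative sketch of the formula, not the authors' code):

```python
import math

def max_test_critical(N, p, alpha=0.05):
    """Level-alpha critical value for the Max statistic implied by the
    Gumbel limit P(... <= x) -> exp(-exp(-x/2)):
    2 log N + (p-2) log log N - 2 log Gamma(p/2) + q_alpha."""
    q_alpha = -2.0 * math.log(-math.log(1.0 - alpha))
    return (2.0 * math.log(N) + (p - 2.0) * math.log(math.log(N))
            - 2.0 * math.lgamma(p / 2.0) + q_alpha)
```

For example, with \(N=200\), \(p=3\) and \(\alpha =0.05\) the threshold is about 18.4; it grows logarithmically in \(N\), which is why the Max test remains feasible when \(N\gg T\).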

6.3 Proof of Theorem 3.2

Under \(H_1\), we have

$$\begin{aligned} \hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_\textrm{WFE}=T^{-1/2}{\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_{i}-T^{-1/2}N^{-1/2}\left( N^{-1} \sum _{j=1}^N{\tilde{\sigma }}_j^{-2}{\varvec{\Sigma }}_{jT}\right) ^{-1}\left( N^{-1/2} \sum _{j=1}^N{\tilde{\sigma }}_j^{-2}\varvec{\xi }_j\right) +\hat{\varvec{\omega }}_i, \end{aligned}$$

where

$$\begin{aligned} \hat{\varvec{\omega }}_i={\varvec{\delta }}_{i}-\left( N^{-1}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2} {\varvec{\Sigma }}_{jT}\right) ^{-1}\left( N^{-1}\sum _{j=1}^N{\tilde{\sigma }}_j^{-2}{\varvec{\Sigma }}_{jT}{\varvec{\delta }}_j\right) . \end{aligned}$$

By Lemma 6.4 and Condition (C3), we have \(\max _{1\le i \le N}|\hat{{\varvec{\omega }}}_i-{\varvec{\omega }}_i|=O_p({\log (N)/T})\). By the triangle inequality,

$$\begin{aligned}&\max _{1\le i\le N} T{\tilde{\sigma }}_i^{-2}\hat{{\varvec{\omega }}}_i^\top {\varvec{\Sigma }}_{iT} \hat{{\varvec{\omega }}}_i \ge \frac{1}{2}\max _{1\le i\le N} T{\tilde{\sigma }}_i^{-2}{{\varvec{\omega }}}_i^\top {\varvec{\Sigma }}_{iT} {{\varvec{\omega }}}_i\\&-\max _{1\le i\le N} T{\tilde{\sigma }}_i^{-2}(\hat{{\varvec{\omega }}}_i-{\varvec{\omega }}_i)^\top {\varvec{\Sigma }}_{iT} (\hat{{\varvec{\omega }}}_i-{\varvec{\omega }}_i)\\ \ge&\frac{1}{2}\max _{1\le i\le N} T{\sigma }_i^{-2}{{\varvec{\omega }}}_i^\top {\varvec{\Sigma }}_{iT} {{\varvec{\omega }}}_i-\frac{1}{2}\max _{1\le i\le N} T{\sigma }_i^{-2}{{\varvec{\omega }}}_i^\top {\varvec{\Sigma }}_{iT} {{\varvec{\omega }}}_i \times \max _{1\le i\le N} \left| {\tilde{\sigma }}_i^{-2}-\sigma _i^{-2}\right| \\&-\max _{1\le i\le N} T{\tilde{\sigma }}_i^{-2}(\hat{{\varvec{\omega }}}_i-{\varvec{\omega }}_i)^\top {\varvec{\Sigma }}_{iT} (\hat{{\varvec{\omega }}}_i-{\varvec{\omega }}_i)\\ \ge&(8+\frac{1}{2}\epsilon )\log (N)-O_p(\log ^{3/2}(N)/\sqrt{T})-O_p(\log ^2(N)/T)\\ \ge&(8+\frac{1}{4}\epsilon )\log (N), \end{aligned}$$

with probability approaching one, as \(N \rightarrow \infty \). According to the proof of Theorem 3.1, we have

$$\begin{aligned} P\bigg \{&\max _{1\le i\le N} T\tilde{\sigma _i}^{-2}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}} -\hat{{\varvec{\omega }}}_i)^\top {\varvec{\Sigma }}_{iT}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}-\hat{{\varvec{\omega }}}_i)\\&-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma (\frac{p}{2}))\le x\bigg \}\rightarrow \exp (-\exp (-x/2)). \end{aligned}$$

Hence,

$$\begin{aligned} P\bigg \{&\max _{1\le i\le N} T\tilde{\sigma _i}^{-2}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}} -\hat{{\varvec{\omega }}}_i)^\top {\varvec{\Sigma }}_{iT}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}-\hat{{\varvec{\omega }}}_i)\\&\le 2\log (N)+(p-1)\log (\log (N))\bigg \}\rightarrow 1, \end{aligned}$$

by setting \(x=\log (\log (N))+2\log (\Gamma (\frac{p}{2}))\). By the triangle inequality, we have

$$\begin{aligned} \max _{1\le i\le N}T&\tilde{\sigma _i}^{-2}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}})^\top {\varvec{\Sigma }}_{iT} (\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}) \\&\ge \frac{1}{2}\max _{1\le i\le N} T{\tilde{\sigma }}_i^{-2}\hat{{\varvec{\omega }}}_i^\top {\varvec{\Sigma }}_{iT} \hat{{\varvec{\omega }}}_i-\max _{1\le i\le N}T\tilde{\sigma _i}^{-2}(\hat{\varvec{\beta }}_i\\&\quad -{\tilde{\varvec{\beta }}}_{\textrm{WFE}}-\hat{{\varvec{\omega }}}_i)^\top {\varvec{\Sigma }}_{iT}(\hat{\varvec{\beta }}_i-{\tilde{\varvec{\beta }}}_{\textrm{WFE}}-\hat{{\varvec{\omega }}}_i)\\&\ge (4+\frac{1}{8}\epsilon )\log (N)-2\log (N)-(p-1)\log (\log (N))\\&\ge \frac{1}{16}\epsilon \log (N)+2\log (N)+(p-2)\log (\log (N))-2\log \left( \Gamma \left( \frac{p}{2}\right) \right) +q_{\alpha }, \end{aligned}$$

with probability approaching one, as \(N\rightarrow \infty \). Hence, \(P(\Phi _{\alpha }=1)\rightarrow 1\).

6.4 Proof of Theorem 3.3

Lemma 6.5

Suppose \(Z_1,\ldots ,Z_N\) is an independent and identically distributed sample from \(\chi ^2_p\). Set \(S_N=Z_1+\cdots + Z_N\) and \(\upsilon _N=(2pN)^{1/2}\). For \(x \in \mathbb {R}\), set \(A_N=\{\frac{S_N-pN}{\upsilon _N}\le x\}\); for \(y \in \mathbb {R}\), denote \(l_N= 2\log (N)+(p-2)\log (\log (N))-2\log (\Gamma (\frac{p}{2}))+y\) and \(B_{i}=\{Z_i>l_N\}.\) Then, for each \(n\ge 1\),

$$\begin{aligned} \sum _{1\le i_1< \cdots < i_{n}\le N}\big | P(A_NB_{i_1}\cdots B_{i_{n}}) - P(A_N)\cdot P(B_{i_1}\cdots B_{i_{n}}) \big |\rightarrow 0, \end{aligned}$$

as \(N\rightarrow \infty \).

Proof

Write

$$\begin{aligned} S_N=\sum _{i=1}^NZ_i=\sum _{i=n+1}^N Z_i+\sum _{i=1}^n Z_i\doteq U_N+\Theta _N. \end{aligned}$$

We will show the last term on the right hand side is negligible. By the definition, we have \(\Theta _N\sim \chi ^2_{pn}\). By Lemma 6.3, for any \(n\ge 1\) and \(\epsilon >0\), there exists \(t=t_N>0\) with \(\lim _{N\rightarrow \infty }t_N=\infty \) and \(N_0\), depending on \(n, \epsilon \), such that

$$\begin{aligned} P(\Theta _N\ge \epsilon \upsilon _N)\le \frac{1}{N^{t}}, \end{aligned}$$

for \(N\ge N_0\). Define

$$\begin{aligned} A_N(x)=\Big \{\frac{1}{\upsilon _N}(S_N-pN)\le x\Big \},\ \ x \in \mathbb {R}, \end{aligned}$$

for \(N\ge 1\). From the fact \(S_N=U_N+\Theta _N\) we see that

$$\begin{aligned} P(A_N(x)B_1\cdots B_n)\le & {} P\Big (A_N(x)B_1\cdots B_n,\ \frac{|\Theta _N|}{\upsilon _N}< \epsilon \Big ) + \frac{1}{N^t}\\\le & {} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x+\epsilon ,\ B_1\cdots B_n\Big ) +\frac{1}{N^{t}}\\= & {} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x+\epsilon \Big )\cdot P\big ( B_1\cdots B_n\big ) +\frac{1}{N^{t}}, \end{aligned}$$

by the independence between \(U_N\) and \(\Theta _N\). Now,

$$\begin{aligned} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x+\epsilon \Big )\le & {} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x+\epsilon ,\ \frac{|\Theta _N|}{\upsilon _N}< \epsilon \Big ) + \frac{1}{N^{t}} \\\le & {} P\Big (\frac{1}{\upsilon _N}(U_N+\Theta _N-pN)\le x+2\epsilon \Big ) + \frac{1}{N^{t}}\\\le & {} P\big (A_N(x+2\epsilon )\big ) + \frac{1}{N^{t}}. \end{aligned}$$

Combining the two inequalities,

$$\begin{aligned} P(A_N(x)B_1\cdots B_n) \le P\big (A_N(x+2\epsilon )\big )\cdot P\big ( B_1\cdots B_n\big ) + \frac{2}{N^{t}}. \end{aligned}$$
(6.1)

Similarly,

$$\begin{aligned}{} & {} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x-\epsilon ,\ B_1\cdots B_n\Big )\\{} & {} \quad \le P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x-\epsilon , B_1\cdots B_n, \frac{|\Theta _N|}{\upsilon _N}< \epsilon \Big ) +\frac{1}{N^{t}}\\{} & {} \quad \le P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x,\ B_1\cdots B_n\Big ) +\frac{1}{N^{t}}. \end{aligned}$$

By the independence between \(U_N\) and \(\Theta _N\),

$$\begin{aligned} P(A_N(x)B_1\cdots B_n) \ge P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x-\epsilon \Big )\cdot P( B_1\cdots B_n)-\frac{1}{N^{t}}. \end{aligned}$$

Furthermore,

$$\begin{aligned} P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x-2\epsilon \Big )\le & {} P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x-2\epsilon ,\ \frac{|\Theta _N|}{\upsilon _N}< \epsilon \Big ) + \frac{1}{N^{t}} \\\le & {} P\Big (\frac{1}{\upsilon _N}(U_N-pN)\le x-\epsilon \Big ) +\frac{1}{N^{t}}, \end{aligned}$$

due to the fact \(S_N=U_N+\Theta _N\). Combining the two inequalities, we get

$$\begin{aligned} P(A_N(x)B_1\cdots B_n) \ge P(A_N(x-2\epsilon ))\cdot P(B_1\cdots B_n)-\frac{2}{N^{t}}. \end{aligned}$$

This, together with (6.1), yields

$$\begin{aligned}{} & {} \big |P(A_N(x)B_1\cdots B_n)-P(A_N(x))\cdot P(B_1\cdots B_n)\big |\\{} & {} \quad \le \Delta _{N, \epsilon }\cdot P(B_1\cdots B_n)+\frac{2}{N^{t}}, \end{aligned}$$

for \(N\ge N_0\), where

$$\begin{aligned} \Delta _{N, \epsilon }:= & {} |P(A_N(x))-P(A_N(x+2\epsilon ))| + |P(A_N(x))-P(A_N(x-2\epsilon ))|. \end{aligned}$$

Similarly, for any \(1\le i_1< i_2<\cdots <i_n\le N\), we have

$$\begin{aligned}{} & {} \big |P(A_N(x)B_{i_1}\cdots B_{i_n})-P(A_N(x))\cdot P(B_{i_1}\cdots B_{i_n})\big |\\{} & {} \quad \le \Delta _{N, \epsilon }\cdot P(B_{i_1}\cdots B_{i_n})+\frac{2}{N^{t}}, \end{aligned}$$

for \(N\ge N_0\). As a result,

$$\begin{aligned} \zeta (N,n):= & {} \sum _{1\le i_1< \cdots< i_{n}\le N}\big [P(A_N(x) B_{i_1}\cdots B_{i_{n}}) - P(A_N(x))\cdot P(B_{i_1}\cdots B_{i_{n}})\big ] \nonumber \\\le & {} \sum _{1\le i_1< \cdots < i_{n}\le N}\Big [\Delta _{N, \epsilon }\cdot P(B_{i_1}\cdots B_{i_{n}})+\frac{2}{N^{t}}\Big ] \nonumber \\\le & {} \Delta _{N, \epsilon }\cdot H(n, N)+ \left( {\begin{array}{c}N\\ n\end{array}}\right) \cdot \frac{2}{N^{t}}, \end{aligned}$$
(6.2)

where

$$\begin{aligned} H(n, N):=\sum _{1\le i_1< \cdots < i_{n}\le N}P(B_{i_1}\cdots B_{i_{n}}). \end{aligned}$$

First, by the central limit theorem,

$$\begin{aligned} \frac{S_N-pN}{\upsilon _N}\rightarrow N(0, 1)\ \text{ weakly }, \end{aligned}$$

as \(N\rightarrow \infty \), and hence

$$\begin{aligned} \Delta _{N, \epsilon } \rightarrow |\Phi (x+2\epsilon )-\Phi (x)|+|\Phi (x-2\epsilon )-\Phi (x)|, \end{aligned}$$

as \(N\rightarrow \infty \), where \(\Phi (x)=\frac{1}{\sqrt{2\pi }}\int _{-\infty }^xe^{-t^2/2}\,dt\). This implies that \(\lim _{\epsilon \downarrow 0}\lim _{N\rightarrow \infty }\Delta _{N, \epsilon }=0\). Second, by the independence of \(Z_i\), we have

$$\begin{aligned} H(n, N)&=\sum _{1\le i_1< \cdots < i_{n}\le N}P(B_{i_1}\cdots B_{i_{n}})\\&=\left( {\begin{array}{c}N\\ n\end{array}}\right) P(B_{1}\cdots B_{{n}})=\left( {\begin{array}{c}N\\ n\end{array}}\right) P(B_1)^n\\&=\left( {\begin{array}{c}N\\ n\end{array}}\right) \{P(\chi ^2_p>l_N)\}^n. \end{aligned}$$

As \(N\rightarrow \infty \),

$$\begin{aligned} \log P(\max _{1\le i \le N}Z_i\le l_N)=N\log P(\chi ^2_p\le l_N)=N \log (1-P(\chi ^2_p> l_N))\sim -NP(\chi ^2_p> l_N). \end{aligned}$$

By Lemma 6.1, we have \(P(\chi ^2_p>l_N)\sim \frac{1}{N}e^{-y/2}\). Thus,

$$\begin{aligned} \lim _{N\rightarrow \infty }H(n, N)=\frac{1}{n!}e^{-ny/2}, \end{aligned}$$
(6.3)

for each \(n\ge 1.\) By using \(\left( {\begin{array}{c}N\\ n\end{array}}\right) \le N^n\) and (6.2), for fixed \(n\ge 1\), sending \(N\rightarrow \infty \) first, then sending \(\epsilon \downarrow 0\), we get \(\lim _{N\rightarrow \infty }\zeta (N,n)= 0\), for each \(n\ge 1\). The proof is completed. \(\square \)

Lemma 6.6

Suppose \(Z_1,\ldots ,Z_N\) is an independent and identically distributed sample from \(\chi ^2_p\). Then \(\frac{\sum _{i=1}^N Z_i-pN}{\sqrt{2pN}}\) and \(\max _{1\le i\le N}Z_i-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma (\frac{p}{2}))\) are asymptotically independent, as \(N \rightarrow \infty \).

Proof

Define \(S_N=\sum _{i=1}^N Z_i\) and \(\upsilon _N=\sqrt{2pN}\). By the central limit theorem, we have

$$\begin{aligned} \frac{S_N-pN}{\upsilon _N}\rightarrow N(0, 1)\ \text{ weakly },\ \ \ \ \ \ \end{aligned}$$
(6.4)

as \(N\rightarrow \infty \). By Lemma 6.1, we have

$$\begin{aligned} \max _{1\le i\le N}Z_i&-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma (\frac{p}{2}))\nonumber \\&\rightarrow F(y)=\exp \Big \{-e^{-y/2}\Big \}\ \end{aligned}$$
(6.5)

in distribution, as \(N\rightarrow \infty .\) To show the asymptotic independence, it suffices to prove

$$\begin{aligned} P\Big (\frac{S_N-pN}{\upsilon _N}\le x,&\max _{1\le i \le N}Z_i-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma (\frac{p}{2}))\le y\Big )\\&\rightarrow \Phi (x)\cdot F(y), \end{aligned}$$

as \(N\rightarrow \infty \), for any \(x\in \mathbb {R}\) and \(y \in \mathbb {R}\), where \(\Phi (x)=(2\pi )^{-1/2}\int _{-\infty }^xe^{-t^2/2}\,dt.\) Set

$$\begin{aligned} L_N=\max _{1\le i \le N}Z_i\ \ \text{ and }\ \ l_N= 2\log (N)+(p-2)\log (\log (N))-2\log (\Gamma (\frac{p}{2}))+y. \end{aligned}$$

Because of (6.4) and (6.5), it is equivalent to show

$$\begin{aligned} \lim _{N\rightarrow \infty }P\Big (\frac{S_N-pN}{\upsilon _N}\le x,\ L_N>l_N\Big )= \Phi (x)\cdot [1-F(y)], \end{aligned}$$
(6.6)

for any \(x\in \mathbb {R}\) and \(y \in \mathbb {R}\). Define

$$\begin{aligned} A_N=\Big \{\frac{S_N-pN}{\upsilon _N}\le x\Big \}\ \ \ \text{ and }\ \ \ B_{i}=\big \{Z_i>l_N\big \}, \end{aligned}$$

for \(1\le i\le N\). Therefore,

$$\begin{aligned} P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x,\ L_N>l_N\Big )=P\Big (\bigcup _{i=1}^NA_NB_{i}\Big ). \end{aligned}$$
(6.7)

Here the notation \(A_NB_i\) stands for \(A_N\cap B_i\). By the inclusion–exclusion principle,

$$\begin{aligned} P\Big (\bigcup _{i=1}^N A_NB_{i}\Big ) \le{}& \sum _{1\le i_1 \le N}P(A_NB_{i_1})-\sum _{1\le i_1< i_2\le N}P(A_NB_{i_1}B_{i_2})+\cdots \\ &{}+\sum _{1\le i_1< \cdots < i_{2k+1}\le N}P(A_NB_{i_1}\cdots B_{i_{2k+1}}) \end{aligned}$$
(6.8)

and

$$\begin{aligned} P\Big (\bigcup _{i=1}^N A_NB_{i}\Big ) \ge{}& \sum _{1\le i_1 \le N}P(A_NB_{i_1})-\sum _{1\le i_1< i_2\le N}P(A_NB_{i_1}B_{i_2})+\cdots \\ &{}-\sum _{1\le i_1< \cdots < i_{2k}\le N}P(A_NB_{i_1}\cdots B_{i_{2k}}), \end{aligned}$$
(6.9)

for any integer \(k\ge 1\). Define

$$\begin{aligned} H(N, n)=\sum _{1\le i_1< \cdots < i_{n}\le N}P(B_{i_1}\cdots B_{i_{n}}), \end{aligned}$$

for \(n\ge 1\). From (6.3) we know

$$\begin{aligned} \lim _{n\rightarrow \infty }\limsup _{N\rightarrow \infty }H(N, n)=0. \end{aligned}$$
(6.10)
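Since the \(Z_i\) are independent, \(H(N,n)=\left( {\begin{array}{c}N\\ n\end{array}}\right) P(Z_1>l_N)^n\), so the limit (6.3) can be checked numerically by tracking \(N\cdot P(Z_1>l_N)\), whose limit is \(e^{-y/2}\). The sketch below is illustrative only and is not part of the proof; it assumes \(p=4\) and \(y=0\), and uses the exact finite (Poisson-sum) form of the \(\chi ^2_p\) tail for even \(p\). The convergence is logarithmic in \(N\), so it is slow.

```python
import math

# Illustrative check of the limit (6.3); assumes p = 4 and y = 0.
# H(N, n) = C(N, n) * P(Z_1 > l_N)^n by independence, so it suffices
# to track N * P(Z_1 > l_N), whose limit is e^{-y/2}.

def chi2_sf_even(t, p):
    """Exact survival function P(chi^2_p > t) for even p (Poisson sum)."""
    m = p // 2
    return math.exp(-t / 2.0) * sum((t / 2.0) ** j / math.factorial(j)
                                    for j in range(m))

p, y = 4, 0.0                       # values assumed for the illustration
limit = math.exp(-y / 2.0)          # limit of N * P(Z_1 > l_N)
errs = []
for N in (1e4, 1e8, 1e12):
    l_N = (2.0 * math.log(N) + (p - 2) * math.log(math.log(N))
           - 2.0 * math.log(math.gamma(p / 2.0)) + y)
    errs.append(abs(N * chi2_sf_even(l_N, p) - limit))
print(errs)   # the error decreases (slowly, at a logarithmic rate) in N
```

Consequently \(H(N,n)\approx (e^{-y/2})^n/n!\) for large \(N\), which tends to 0 as \(n\rightarrow \infty \), in line with (6.10).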

Set

$$\begin{aligned} \zeta (N,n)=\sum _{1\le i_1< \cdots < i_n\le N}\big [P(A_NB_{i_1}\cdots B_{i_n}) - P(A_N)\cdot P(B_{i_1}\cdots B_{i_n})\big ], \end{aligned}$$

for \(n\ge 1.\) By Lemma 6.5,

$$\begin{aligned} \lim _{N\rightarrow \infty }\zeta (N,n)=0, \end{aligned}$$
(6.11)

for each \(n\ge 1\). Combining (6.8) with the definitions of \(\zeta (N,n)\) and \(H(N, n)\), we obtain

$$\begin{aligned} P\Big (\bigcup _{i=1}^N A_NB_{i}\Big ) \le{}& P(A_N)\Big [\sum _{1\le i_1 \le N}P(B_{i_1})-\sum _{1\le i_1< i_2\le N}P(B_{i_1}B_{i_2})+\cdots \\ &{}-\sum _{1\le i_1< \cdots < i_{2k} \le N}P(B_{i_1}\cdots B_{i_{2k}})\Big ]+ \Big [\sum _{n=1}^{2k}\zeta (N,n)\Big ] + H(N, 2k+1) \\ \le{}& P(A_N)\cdot P\Big (\bigcup _{i=1}^N B_{i}\Big )+ \Big [\sum _{n=1}^{2k}\zeta (N,n)\Big ] + H(N, 2k+1), \end{aligned}$$
(6.12)

where the inclusion–exclusion formula is used again in the last inequality, that is,

$$\begin{aligned} P\Big (\bigcup _{i=1}^N B_{i}\Big ) \ge{}& \sum _{1\le i_1 \le N}P(B_{i_1})-\sum _{1\le i_1< i_2\le N}P(B_{i_1}B_{i_2})+\cdots \\ &{}-\sum _{1\le i_1< \cdots < i_{2k}\le N}P(B_{i_1}\cdots B_{i_{2k}}), \end{aligned}$$

for all \(k\ge 1\). By the definition of \(l_N\) and (6.5),

$$\begin{aligned} P\Big (\bigcup _{i=1}^N B_{i}\Big ) ={}& P\big (L_N>l_N\big )\\ ={}& P\big (\max _{1\le i \le N}Z_i-2\log (N)-(p-2)\log (\log (N))+2\log (\Gamma (\frac{p}{2}))> y\big )\\ \rightarrow{}& 1-F(y), \end{aligned}$$

as \(N\rightarrow \infty \). By (6.4), \(P(A_N)\rightarrow \Phi (x)\), as \(N\rightarrow \infty .\) From (6.7), (6.11) and (6.12), by fixing k first and sending \(N\rightarrow \infty \), we obtain that

$$\begin{aligned} \limsup _{N\rightarrow \infty }P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x,\ L_N>l_N\Big )\le \Phi (x)\cdot [1-F(y)] +\lim _{N\rightarrow \infty }H(N, 2k+1). \end{aligned}$$

Now, by letting \(k\rightarrow \infty \) and using (6.10), we have

$$\begin{aligned} \limsup _{N\rightarrow \infty }P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x,\ L_N>l_N\Big )\le \Phi (x)\cdot [1-F(y)]. \end{aligned}$$
(6.13)

By applying the same argument to (6.9), we see that the counterpart of (6.12) becomes

$$\begin{aligned} P\Big (\bigcup _{i=1}^N A_NB_{i}\Big ) \ge{}& P(A_N)\Big [\sum _{1\le i_1 \le N}P(B_{i_1})-\sum _{1\le i_1< i_2\le N}P(B_{i_1}B_{i_2})+\cdots \\ &{}+\sum _{1\le i_1< \cdots < i_{2k-1}\le N}P(B_{i_1}\cdots B_{i_{2k-1}})\Big ] + \Big [\sum _{n=1}^{2k-1}\zeta (N,n)\Big ] - H(N, 2k) \\ \ge{}& P(A_N)\cdot P\Big (\bigcup _{i=1}^N B_{i}\Big ) + \Big [\sum _{n=1}^{2k-1}\zeta (N,n)\Big ] - H(N, 2k), \end{aligned}$$

where in the last step we use the inclusion–exclusion principle, i.e.,

$$\begin{aligned} P\Big (\bigcup _{i=1}^N B_{i}\Big ) \le{}& \sum _{1\le i_1 \le N}P(B_{i_1})-\sum _{1\le i_1< i_2\le N}P(B_{i_1}B_{i_2})+\cdots \\ &{}+\sum _{1\le i_1< \cdots < i_{2k-1}\le N}P(B_{i_1}\cdots B_{i_{2k-1}}), \end{aligned}$$

for all \(k\ge 1\). Recalling (6.7) and repeating the earlier procedure, we see that

$$\begin{aligned} \liminf _{N\rightarrow \infty }P\Big (\frac{1}{\upsilon _N}(S_N-pN)\le x,\ L_N>l_N\Big )\ge \Phi (x)\cdot [1-F(y)], \end{aligned}$$

by sending \(N\rightarrow \infty \) and then sending \(k\rightarrow \infty .\) This and (6.13) yield (6.6). The proof is completed.

\(\square \)
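The proof above hinges on the truncated inclusion–exclusion (Bonferroni) inequalities (6.8) and (6.9) and their analogues for the \(B_i\) alone: partial sums with an odd number of terms over-estimate the probability of a union, while those with an even number under-estimate it. A toy numerical check, on independent events with \(N = 8\) and \(P(B_i)=0.3\) (values chosen arbitrarily, purely for illustration):

```python
import math

# Toy check of the truncated inclusion-exclusion (Bonferroni) bounds:
# for independent events B_1..B_N with P(B_i) = q,
#   P(union B_i) = 1 - (1-q)^N,
# and the partial sums T_r = sum_{n=1}^{r} (-1)^{n+1} C(N, n) q^n
# satisfy T_{2k} <= P(union) <= T_{2k+1} for every k.
N, q = 8, 0.3
exact = 1.0 - (1.0 - q) ** N
partial, T = [], 0.0
for n in range(1, N + 1):
    T += (-1) ** (n + 1) * math.comb(N, n) * q ** n
    partial.append(T)

upper = all(partial[r] >= exact - 1e-12 for r in range(0, N, 2))  # odd-length sums
lower = all(partial[r] <= exact + 1e-12 for r in range(1, N, 2))  # even-length sums
print(exact, upper, lower)
```

The full alternating sum (all \(N\) terms) recovers the exact union probability, while every truncation brackets it from the side predicted by the Bonferroni inequalities.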

Proof of Theorem 3.3

According to the proof of Theorem 3.2 in [18], we have \({\tilde{\Delta }}_{\textrm{adj}}=S_a+O_p(T^{-1/2})+O_p(N^{-1/2})\), where \(S_{a}=(2pN)^{-1/2}\sum _{i=1}^N (\sigma _i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i-p)\). According to the proof of Theorem 3.2, we have \(T_{\textrm{Max}}=M_a+O_p(\sqrt{\frac{\log N}{N}})+O_p(\frac{\log ^{3/2} N}{T^{1/2}})\), where \(M_a=\max _{1\le i \le N}\sigma _i^{-2}\varvec{\xi }_i^\top {\varvec{\Sigma }}_{iT}^{-1}\varvec{\xi }_i\). Given \(\epsilon \in (0,1)\), set \(\Omega _N=\{|{\tilde{\Delta }}_{\textrm{adj}}-S_a|<\epsilon ,\ |T_{\textrm{Max}}-M_a|<\epsilon \}\). We have \(\lim _{N,T\rightarrow \infty }P(\Omega _N)=1\). By Lemma 6.6,

$$\begin{aligned} P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N)&\le P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N, \Omega _N)+P(\Omega _N^c)\\&\le P(S_a\le x+\epsilon , M_a>l_N-\epsilon )+P(\Omega _N^c)\\&\rightarrow \Phi (x+\epsilon )(1-F(y-\epsilon )), \end{aligned}$$

as \(N,T\rightarrow \infty \). Similarly, by Lemma 6.6,

$$\begin{aligned} P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N)&\ge P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N, \Omega _N)\\&\ge P(S_a\le x-\epsilon , M_a>l_N+\epsilon )-P(\Omega _N^c)\\&\rightarrow \Phi (x-\epsilon )(1-F(y+\epsilon )), \end{aligned}$$

as \(N,T\rightarrow \infty \). So

$$\begin{aligned} \Phi (x-\epsilon )(1-F(y+\epsilon )) \le \liminf _{N,T\rightarrow \infty } P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N)\le \limsup _{N,T\rightarrow \infty } P({\tilde{\Delta }}_{\textrm{adj}}\le x, T_{\textrm{Max}}>l_N)\le \Phi (x+\epsilon )(1-F(y-\epsilon )). \end{aligned}$$

Sending \(\epsilon \rightarrow 0\), the conclusion follows. \(\square \)