Abstract
This paper considers testing for cross-sectional dependence in a panel factor model. Based on the model considered by Bai (Econometrica 71: 135–171, 2003), we investigate the use of a simple \(F\) test for testing for cross-sectional dependence when the factor may be known or unknown. The limiting distributions of these \(F\) test statistics are derived when the cross-sectional dimension and the time-series dimension are both large. The main contribution of this paper is to propose a wild bootstrap \(F\) test which is shown to be consistent and which performs well in Monte Carlo simulations especially when the factor is unknown.
1 Introduction
Cross-sectional dependence caused by common shocks can seriously impact inference as well as estimation. For example, Andrews (2005) demonstrates that common shocks can result in inconsistent estimates in cross-sectional regressions, with serious consequences for statistical inference.Footnote 1 To deal with the problems caused by common shocks, Bai (2003) considers a panel factor model and proposes a principal components (PC) method to consistently estimate the factor and its loadings. Assuming that the common factor is the only source of cross-sectional dependence, testing for zero factor loadings is also a test for no cross-sectional dependence in the panel factor model considered by Bai (2003).Footnote 2 This is done using a simple \(F\) statistic that tests the null hypothesis of zero factor loadings. It is well known that the limiting distribution of this \(F\) statistic can be approximated by a chi-squared distribution when the cross-sectional dimension \(n\) is fixed and the time-series dimension \(T\) is large. For the case of large \(n\) and fixed \(T\), one can use the results of Boos and Brownie (1995) and Akritas and Arnold (2000) to infer that the asymptotic distribution of an appropriately normalized \(F\) statistic is also normal. However, as far as we know, there is no result regarding the asymptotic distribution of this \(F\) statistic when both \(n\) and \(T\) are large.Footnote 3 The robustness of the \(F\) test with respect to serial correlation in the time-series dimension has been studied extensively in the literature, e.g., Krämer (1989) and Krämer and Michels (1997). The contribution of this paper is to suggest the use of a bootstrap \(F\) test for testing for cross-sectional dependence with large \(n\) and \(T\) when the common factor may be known or unknown. We also allow for heteroskedasticity across the cross-sectional and time-series dimensions.
For this purpose, we use the wild bootstrap method, which is well developed in the statistics and econometrics literature. Section 2 introduces the factor model. Section 3 derives the limiting distribution of the proposed \(F\) statistic when the unknown factor is replaced by its estimate. In Sect. 4, we propose a wild bootstrap \(F\) test and prove its consistency. Section 5 presents the Monte Carlo results, while Sect. 6 concludes. All the proofs are relegated to the Appendix.
For the asymptotic results in this paper, we use the joint limit, \( (n,T)\rightarrow \infty \). Specifically, we assume that \(\frac{T}{n} \rightarrow c\) as \((n,T)\rightarrow \infty \), where \(0<c<\infty .\) We use \( \overset{p}{\rightarrow }\) and \(\overset{d}{\rightarrow }\) to denote convergence in probability and in distribution, respectively. \( F_{t}\) is used to denote the common factor, while \(F_{\lambda }\) is used to denote the \(F\) statistic testing for zero factor loading. The bootstrap sample and the bootstrap test statistic will be denoted with the superscript star. For example, \(F_{\lambda }^{*}\) and \(P^{*} \) indicate the bootstrap \(F\) statistic and the bootstrap probability measure. Let \(\delta _{nt}=\min \left\{ \sqrt{n},\sqrt{T}\right\} .\) Lastly, let \(K\left( \cdot ,\cdot \right) \) denote the Kolmogorov metric, i.e., \( K\left( P,Q\right) =\sup _{x}\left|P\left( x\right) -Q\left( x\right) \right|\) for distribution functions \(P\) and \(Q\).
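For concreteness, the Kolmogorov metric above can be evaluated between two empirical distribution functions. The following is a small Python utility sketch of ours (the paper's own computations use GAUSS, and the function name is hypothetical):

```python
import numpy as np

def kolmogorov_distance(sample_p, sample_q):
    """Kolmogorov metric K(P, Q) = sup_x |P(x) - Q(x)| between the
    empirical distribution functions of two samples (illustrative
    utility; not part of the paper's derivations)."""
    grid = np.sort(np.concatenate([sample_p, sample_q]))
    # Empirical CDFs of each sample evaluated on the pooled grid
    P = np.searchsorted(np.sort(sample_p), grid, side="right") / len(sample_p)
    Q = np.searchsorted(np.sort(sample_q), grid, side="right") / len(sample_q)
    return np.max(np.abs(P - Q))
```

Theorem 4 below states convergence of the bootstrap distribution in exactly this metric, so such a utility is what one would use to inspect the bootstrap approximation numerically.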
2 The model
Consider a panel data factor modelFootnote 4
where \(y_{it}\) is a scalar, \(\lambda _{i}\) is the factor loading, \(F_{t}\) is the common factor, and \(u_{it}\) is the idiosyncratic error term, assumed independent across \(i\) and \(t\). To test the null hypothesis of no cross-sectional dependence, we set the null as
against the alternative that
To construct the \(F\) statistic, let \(RRSS=\sum _{i=1}^{n}\sum _{t=1}^{T}y_{it}^{2}\) denote the residual sum of squares from the restricted model, and let \(URSS=\sum _{i=1}^{n}\sum _{t=1}^{T}\widetilde{u}_{it}^{2}\) denote the residual sum of squares from the unrestricted model
when \(F_{t}\) is known, or \(URSS=\sum _{i=1}^{n}\sum _{t=1}^{T}\widehat{u} _{it}^{2}\), from the unrestricted model
when \(F_{t}\) is unknown. The standard \(F\) statistic is defined as
Rewriting Eq. (1) in matrix notation, we have
where \(y\) is a \(T\times n\) matrix of observed data, \(u\) is a \(T\times n\) matrix of idiosyncratic errors, \(\Lambda \) is \(n\times 1\), and \(F\) is \( T\times 1\).
It is important to note that \(F_{t}\) \((t=1,2,\ldots ,T)\) may or may not be observable. If \(F_{t}\) is observable, \(\lambda _{i}\) can be estimated using ordinary least squares (OLS). That is,
On the other hand, if \(F_{t}\) is not observable, one can estimate \(F_{t}\) by the method of principal components subject to the constraint \(F^{\prime }F/T=I_{r}\). As shown in Bai (2003), \(\widehat{F}=\left( \widehat{F}_{1},\ldots ,\widehat{F}_{T}\right) ^{\prime }\), the vector of estimated factors, is \(\sqrt{T}\) times the eigenvector corresponding to the largest eigenvalue of \(\frac{yy^{\prime }}{nT}\). Given \(\widehat{F}\), one can obtain \(\widehat{\Lambda }=\left( \widehat{\lambda }_{1},\ldots ,\widehat{\lambda }_{n}\right) ^{\prime }=y^{\prime }\widehat{F}/T\).
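The principal components estimator above can be sketched in a few lines of Python (the function and variable names are ours, and this is an illustration rather than the authors' GAUSS code): \(\widehat{F}\) is \(\sqrt{T}\) times the leading eigenvector of \(yy^{\prime }/(nT)\), and \(\widehat{\Lambda }=y^{\prime }\widehat{F}/T\).

```python
import numpy as np

def pc_estimate(y):
    """Principal components estimate of a single factor under the
    normalization F'F/T = 1 used in the text (illustrative sketch).

    y : (T, n) array of observed data.
    Returns (F_hat, lam_hat), shapes (T,) and (n,).
    """
    T, n = y.shape
    # F_hat is sqrt(T) times the eigenvector of y y' / (nT)
    # associated with the largest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(y @ y.T / (n * T))
    F_hat = np.sqrt(T) * eigvecs[:, -1]   # eigh sorts eigenvalues ascending
    lam_hat = y.T @ F_hat / T             # Lambda_hat = y'F_hat / T
    return F_hat, lam_hat
```

Note that \(\widehat{F}\) is identified only up to sign, which is irrelevant for the residual sums of squares used below.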
3 \(F\) test
In this section, we discuss the asymptotic distribution of the \(F\) statistic for three cases: (i) fixed \(n\) and large \(T\), (ii) large \(n\) and fixed \(T\), and (iii) large \(n\) and large \(T\). Based on these asymptotic results, we argue that the \(F\) distribution may not be always appropriate to use, and we suggest a bootstrap \(F\) test as a good alternative.
3.1 The asymptotics of the \(F\) statistic
When \(F_{t}\) is known, the \(F\) statistic to test the null hypothesis \(H_{0}:\lambda _{i}=0\) for all \(i\) is given by \(F_{\lambda }=\frac{\left( RRSS-URSS\right) /n}{URSS/\left( n\left( T-1\right) \right) }=\left( T-1\right) \frac{RRSS-URSS}{URSS}\), where \(RRSS=\sum _{i=1}^{n}\sum _{t=1}^{T}y_{it}^{2}\) and \(URSS=\sum _{i=1}^{n}\sum _{t=1}^{T}\left( y_{it}-\widetilde{\lambda }_{i}F_{t}\right) ^{2}.\)
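With a known factor, the statistic can be computed directly from the two residual sums of squares. The following Python sketch (our own helper, not the authors' code) builds \(RRSS\), \(URSS\), and \(F_{\lambda }\):

```python
import numpy as np

def f_stat_known_factor(y, F):
    """F statistic for H0: lambda_i = 0 for all i, known factor case
    (illustrative sketch of the formulas in the text).

    y : (T, n) data matrix;  F : (T,) known factor.
    """
    T, n = y.shape
    lam_tilde = y.T @ F / (F @ F)          # per-unit OLS slope of y_i on F_t
    resid = y - np.outer(F, lam_tilde)     # unrestricted residuals
    rrss = np.sum(y ** 2)                  # restricted model: all loadings zero
    urss = np.sum(resid ** 2)
    # F_lambda = [(RRSS - URSS)/n] / [URSS/(n(T-1))]
    return ((rrss - urss) / n) / (urss / (n * (T - 1)))
```

Under the null with i.i.d. errors this statistic fluctuates around 1, consistent with the centerings in Theorems 1 and 3 below.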
(i) When \(n\) is fixed and \(T\rightarrow \infty \), \(F_{\lambda }\) can be approximated by a chi-squared distribution:
$$\begin{aligned} nF_{\lambda }\overset{d}{\rightarrow }\chi _{n}^{2}. \end{aligned}$$
(ii) When \(n\rightarrow \infty \) and \(T\) is fixed, \(F_{\lambda }\) is asymptotically normal.Footnote 5
In this section, we provide the asymptotic properties of the \(F\) statistic with large \(n\) and large \(T\) and with known and unknown \(F_{t}.\) Our analysis is based on the following assumptions:
Assumption 1
The error term, \(u_{it},\) is assumed to be independent across both the cross-section and time-series dimensions.
Assumption 2
1. The common factor satisfies \(\frac{1}{T}\sum _{t=1}^{T}F_{t}^{2}\overset{p}{\rightarrow }\phi _{F}<\infty \).
2. The factor loading \(\lambda _{i}\) is either deterministic or stochastic such that \(\frac{1}{n}\sum _{i=1}^{n}\lambda _{i}^{2}\overset{p}{\rightarrow }\phi _{\lambda }<\infty \).
Assumption 3
\(\left\{ \lambda _{i}\right\} ,\) \(\left\{ F_{t}\right\} \), and \( \left\{ u_{it}\right\} \) are independent of each other and among themselves.
Assumption 4
1. For each \(t\), as \(n\rightarrow \infty \),
$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\lambda _{i}u_{it}\overset{d}{\rightarrow }N\left( 0,\Gamma _{t}\right) , \end{aligned}$$
where
$$\begin{aligned} \Gamma _{t}=\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sum _{j=1}^{n}E\left[ \lambda _{i}\lambda _{j}u_{it}u_{jt}\right] ; \end{aligned}$$
2. For each \(i\), as \(T\rightarrow \infty \),
$$\begin{aligned} \frac{1}{\sqrt{T}}\sum _{t=1}^{T}F_{t}u_{it}\overset{d}{\rightarrow }N\left( 0,\Phi _{i}\right) , \end{aligned}$$
where
$$\begin{aligned} \Phi _{i}=\lim _{T\rightarrow \infty }\frac{1}{T}\sum _{t=1}^{T}\sum _{s=1}^{T}E\left[ F_{t}F_{s}u_{is}u_{it}\right] ; \end{aligned}$$
3. Let \(\alpha _{t}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\lambda _{i}u_{it}\) and \(\beta _{i}=\frac{1}{\sqrt{T}}\sum _{t=1}^{T}F_{t}u_{it}\). As \(\left( n,T\right) \rightarrow \infty \),
$$\begin{aligned} \frac{\frac{1}{\sqrt{T}}\sum _{t=1}^{T}\left( \alpha _{t}^{2}-E\left( \alpha _{t}^{2}\right) \right) }{\sqrt{\frac{1}{T}Var\left( \sum _{t=1}^{T}\alpha _{t}^{2}\right) }}\overset{d}{\rightarrow }N\left( 0,1\right) \end{aligned}$$
and
$$\begin{aligned} \frac{\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left( \beta _{i}^{2}-E\left( \beta _{i}^{2}\right) \right) }{\sqrt{\frac{1}{n}Var\left( \sum _{i=1}^{n}\beta _{i}^{2}\right) }}\overset{d}{\rightarrow }N\left( 0,1\right) . \end{aligned}$$
Assumption 3 is standard in the panel data factor literature. Assumption 4 requires that \(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\lambda _{i}u_{it}\) and \(\frac{1}{\sqrt{T}}\sum _{t=1}^{T}F_{t}u_{it}\) satisfy a central limit theorem (CLT). For part (3) of Assumption 4, note that for each \(t\), \(\alpha _{t}^{2}\sim \Gamma _{t}\chi _{1}^{2}\) asymptotically, with \(E\left( \alpha _{t}^{2}\right) =\Gamma _{t}\) and \(Var\left( \alpha _{t}^{2}\right) =2\Gamma _{t}^{2}\), such that
and
This means that
In what follows, we distinguish the case where the factor \(F_{t}\) is observable from the case where it is not. If \(F_{t}\) is observable, then one can easily obtain \(\widetilde{\lambda }_{i}\) using OLS. If \(F_{t}\) is unknown, one can use principal components to estimate \(\lambda _{i}\) and \(F_{t}\) as in Bai (2003). We first study the benchmark case where \(u_{it}\) is i.i.d. in order to convey the essence of the results.
Assumption 5
\(u_{it}\overset{i.i.d.}{\sim }(0,\sigma ^{2})\) for all \(i\) and \(t\), with finite fourth-order cumulants.
Theorem 1
Suppose Assumptions 1–5 hold. If \(F_{t}\) is known and \(\frac{\sqrt{n}}{T}\rightarrow 0,\) then
as \(\left( n,T\right) \rightarrow \infty \).
Theorem 1 shows that the limiting distribution of the \(F\) statistic, \(F_{\lambda },\) is normal if \(F_{t}\) is known under the condition \(\frac{\sqrt{n}}{T}\rightarrow 0\). If \(F_{t}\) is not observable, however, one needs to estimate \(\lambda _{i}\) and \(F_{t}\). Next we investigate the limiting distributions of the \(F\) statistic when \(F_{t}\) is unknown as \(\left( n,T\right) \rightarrow \infty \).
Theorem 2
Suppose Assumptions 1–5 hold. Let \( 0<c<\infty .\) If \(F_{t}\) is unknown and \(\frac{T}{n}\rightarrow c,\) then
as \(\left( n,T\right) \rightarrow \infty .\)
Theorem 2 indicates that \(F_{\lambda }\) converges to \(\sqrt{c}+1\) instead of \(1\) when the factor is replaced by its principal components estimate. This shows that one needs to account for an asymptotic bias when the common factor is estimated. Note that \(\frac{T}{n}\rightarrow c\) with \(0<c<\infty \) implies that \(\frac{\sqrt{T}}{n}\rightarrow 0\) and \(\frac{\sqrt{n}}{T}\rightarrow 0.\) This means that when \(n\) is small relative to \(T\) (e.g., \(\frac{\sqrt{T}}{n}\rightarrow d\) with \(0<d<\infty \)), \(F_{\lambda }\) may suffer from a larger bias, and this is verified by our simulations in Sect. 5. Note that Theorem 2 can be written as
3.2 Time and cross-section heteroskedasticity
In this section, we extend the results in Theorems 1 and 2 by allowing for heteroskedasticity in \(u_{it}\) along the time and cross-section dimensions.
Assumption 6
\(u_{it}=\sigma _{it}e_{it},\) where \(e_{it}\overset{i.i.d.}{\sim } (0,1)\).
Theorem 3
Suppose Assumptions 1–4 and Assumption 6 hold. Assume \(\frac{T}{n}\rightarrow c\) with \(0<c<\infty \) as \(\left( n,T\right) \rightarrow \infty .\)
1. With \(u_{it}=\sigma _{i}e_{it}\), assume \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{2}<\infty \) and \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{4}<\infty .\) (a) If \(F_{t}\) is known,
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-1\right) \overset{d}{\rightarrow }N\left( 0,2 \frac{\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{4}}{ \left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{2}\right) ^{2}}\right); \end{aligned}$$(b) If \(F_{t}\) is unknown
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-\left( \sqrt{\frac{T}{n}}+1\right) \right) \overset{d}{\rightarrow }N\left( 0,4\left( c+1\right) \frac{\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{4}}{\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\sigma _{i}^{2}\right) ^{2}}\right) ; \end{aligned}$$
2. With \(u_{it}=\sigma _{t}e_{it}\): (a) If \(F_{t}\) is known,
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-1\right) \overset{d}{\rightarrow }N\left( 0,2\right) ; \end{aligned}$$(b) If \(F_{t}\) is unknown
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-\left( \sqrt{\frac{T}{n}}+1\right) \right) \overset{d}{\rightarrow }N\left( 0,4\left( c+1\right) \right) ; \end{aligned}$$
3. With \(u_{it}=\sigma _{it}e_{it}\), let \(\omega _{i}^{2}=\lim _{T\rightarrow \infty }\frac{1}{T}\sum _{t=1}^{T}\sigma _{it}^{2}.\) Assume \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{2}<\infty \) and \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{4}<\infty .\) (a) If \(F_{t}\) is known,
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-1\right) \overset{d}{\rightarrow }N\left( 0,2 \frac{\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{4}}{ \left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{2}\right) ^{2}}\right) ; \end{aligned}$$(b) If \(F_{t}\) is unknown
$$\begin{aligned} \sqrt{n}\left( F_{\lambda }-\left( \sqrt{\frac{T}{n}}+1\right) \right) \overset{d}{\rightarrow }N\left( 0,4\left( c+1\right) \frac{ \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{4}}{\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{2}\right) ^{2}}\right) . \end{aligned}$$
4 Bootstrap \(F\) test
Before turning to the validity of the bootstrap \(F\) test, we discuss the bootstrap data generating process (DGP). With independent but possibly heteroskedastic errors, one can rely on the wild bootstrap. First, this method is quite simple to implement. In addition, as shown in the simulations of Davidson and Flachaire (2008), wild bootstrap tests perform well in practice, especially under heteroskedasticity. In fact, a specific version of the wild bootstrap (using the Rademacher distribution) is shown to outperform another version of the wild bootstrap as well as the pairs bootstrap, even when the disturbances are homoskedastic.
We adopt the wild bootstrap using the Rademacher distribution in our simulations because it is robust to heteroskedasticity. Let
then the corresponding bootstrap DGP, for example under the null, is constructed as follows:
where \(y_{it}^{*}\) is the bootstrap data, and \(\lambda _{i}=0\) for all \( i \) under the null. \(\varepsilon _{it}^{*}\) follows the Rademacher distribution:
which was introduced by Liu (1988) and developed by Davidson and Flachaire (2008).Footnote 6 Note that \(E\left( \varepsilon _{it}^{*}\right) =0\) and \(E\left( \varepsilon _{it}^{*2}\right) =1\) under this scheme.Footnote 7
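The Rademacher draws are trivial to generate; a minimal Python sketch of ours (the paper's simulations use GAUSS):

```python
import numpy as np

def rademacher(shape, rng):
    """Draw Rademacher variates: +1 or -1, each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=shape)

# E(eps*) = 0 and E(eps*^2) = 1 hold in population;
# in fact eps*^2 == 1 for every single draw.
```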
Next, we describe in some detail how to implement the wild bootstrap test for the panel factor model.
Step 1: Estimate the common factor model. If \(F_{t}\) is known, we simply obtain the least squares residuals. If \(F_{t}\) is not observed, we use the principal components method. Note that both the unrestricted and the restricted residuals need to be computed in order to calculate the \(F\) statistic. Let this empirical statistic be \(F_{\lambda }\).
Step 2: After obtaining the residuals from Step 1, we re-generate the data using the restricted residuals and an external random variable \(\varepsilon _{it}^{*}\). Note that we simply use \(u_{it}\) as the restricted residuals, which are the same as \(y_{it}\) under the null \(H_{0}:\lambda _{i}=0\) for all \(i\). Now one can compute the bootstrap counterpart of our \(F\) statistic, which we denote by \(F_{\lambda }^{*}\).
Step 3: Repeat Step 2, say, \(B\) times. We thus obtain the bootstrap distribution of \(F_{\lambda }^{*}\) and calculate the proportion of the \(F_{\lambda }^{*}\) values that are greater than or equal to \(F_{\lambda }\). Denoting this proportion by \(\alpha ^{*}\), one rejects the null at the \(100\times \alpha \,\%\) significance level if \(\alpha ^{*}<\alpha \).
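Steps 1–3 can be sketched in Python as follows (our own illustration, not the authors' GAUSS code; `f_stat` stands for whichever \(F\) statistic applies, known-factor or PC-based):

```python
import numpy as np

def wild_bootstrap_f_test(y, f_stat, rng, B=500):
    """Wild bootstrap p-value for H0: lambda_i = 0 for all i
    (a sketch of Steps 1-3; `f_stat` maps a (T, n) panel to F_lambda).

    Under H0 the restricted residuals equal y itself, so the bootstrap
    data are y*_it = y_it * eps*_it with Rademacher eps*_it.
    """
    f_obs = f_stat(y)                        # Step 1: empirical statistic
    count = 0
    for _ in range(B):                       # Steps 2-3: B bootstrap draws
        eps = rng.choice([-1.0, 1.0], size=y.shape)
        if f_stat(y * eps) >= f_obs:
            count += 1
    alpha_star = count / B                   # bootstrap p-value
    return f_obs, alpha_star                 # reject H0 if alpha_star < alpha
```

When \(F_{t}\) is unknown, `f_stat` would re-estimate the factor by principal components inside each bootstrap draw, so the bootstrap distribution automatically reflects the estimation-induced bias of Theorem 2.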
Next we discuss the asymptotic validity of the proposed bootstrap \(F\) test. The validity of the bootstrap \(F\) statistic can be verified using the results in Mammen (1993a, b), the asymptotic normality of \(F_{\lambda }\) as in Theorem 3, the Berry–Esseen inequality, and Pólya's theorem. That is, the bootstrap statistic \(F_{\lambda }^{*}\) is consistent, as presented in the following theorem.
Theorem 4
Suppose Assumptions 1–4 and Assumption 6 hold, and let \(F_{t}\) be either known or unknown. Then
where \(\mathcal{L}\left( F_{\lambda }\right) =P\left( \sqrt{n}\left( F_{\lambda }-a\right) \le x\right) \) and \(\mathcal{L}^{*}\left( F_{\lambda }^{*}\right) =P^{*}\left( \sqrt{n}\left( F_{\lambda }^{*}-a\right) \le x\right) \), with \(P^{*}\) the bootstrap probability measure, \(a=1\) when \(F_{t}\) is known, and \(a=\sqrt{\frac{T}{n}}+1\) when \(F_{t}\) is unknown.
Theorem 4 provides the consistency of the bootstrap distribution of the \(F\) statistic and justifies the use of a residual-based bootstrap method for testing for no cross-sectional dependence.
According to Theorem 4, the distribution of the bootstrap \(F\) statistic converges uniformly to the asymptotic distribution of the \(F\) statistic. One concludes that the bootstrap \(F\) statistic can be used to test for cross-sectional dependence whether \(F_{t}\) is known or not. The following section presents various simulation results in support of this conclusion.
5 Monte Carlo results
In this section, we report results from a simulation experiment that documents the properties of the proposed wild bootstrap \(F\) statistic. We consider the following model:
where \(\lambda _{i}=0\) for all \(i\) under the null. For simplicity, we assume that \(\lambda _{i}\) is a scalar, and \(u_{it}\) is generated as \(IIDN(0,1)\). We study the finite-sample properties of the \(F\) statistic for \(H_{0}:\lambda _{i}=0\) for all \(i\), based on the various estimators discussed in Sect. 2. We denote the empirical \(F\) statistic and the bootstrap \(F\) statistic by EF and BF, respectively. The limiting distribution of EF is based on the chi-squared distribution, computed by Paulson's approximation, see, e.g., Johnson et al. (1995). This may be misspecified when the factor is unknown and/or under heteroskedasticity. The sample sizes \(n\) and \(T\) are varied over the range \(\left\{ 10,50,100,150\right\} .\)
For each experiment, we perform 5,000 replications and 500 bootstrap iterations. GAUSS 12.0 is used to perform the simulations. Random numbers for \(u_{it}\), \(F_{t}\), and \(x_{it}\) are generated by the GAUSS procedure RNDNS. We generate \(n(T+1000)\) random numbers and then split them into \(n\) series so that each series has the same mean and variance. The first 1,000 observations are discarded for each series.
Note that in this case we generate the bootstrap data from \(y_{it}^{*}=y_{it}\varepsilon _{it}^{*}\) under the null.
Let us first consider the benchmark case where both \(F_{t}\) and \(u_{it}\) are generated from \(IIDN(0,1)\). The upper panel of Table 1 shows the empirical size of EF and BF with a nominal size of 5 %. Given this setting, we find the following: (i) If \(F_{t}\) is known, both EF and BF stay quite close to their nominal size. (ii) In contrast, when \(F_{t}\) is unknown, EF is shifted so far to the right that its size becomes almost 100 %, which implies rejection in almost all cases. BF, however, mimics the distribution of the \(F\) statistic quite well, so that its size stays very close to 5 %. For example, with \((n,T)=(50,50)\) the size of EF is 99.98 % while that of BF is 5.06 % when \(F_{t}\) is unknown. Figures 1, 2, 3, 4 confirm the findings in Table 1.
Next, in order to examine the power of the \(F\) test under some alternative hypotheses, we distinguish between strong and weak cross-sectional dependence. Weak dependence is set at \(\lambda _{i}\sim IIDU\left( 0.01,0.2\right) \), while strong dependence is set at \(\lambda _{i}\sim IIDU\left( 0.2,0.5\right) \). All the results are reported in the lower panel of Table 1. Overall, the power of the \(F\) test seems satisfactory: (i) The power increases as \(\lambda _{i}\) increases, as expected. (ii) The power also increases as \(n\) or \(T\) increases. (iii) With weak dependence, both EF and BF have little or no power when \(F_{t}\) is unknown. In fact, even for the largest sample size in our experiments, \((n,T)=(100,100)\), the power of EF and BF is no more than 46.1 %.
We also check robustness of our benchmark results to heteroskedasticity and serial correlation in the error term. We first introduce heteroskedasticity into the error as follows:
where \(e_{it}\) is generated from i.i.d. \(N(0,1)\) and \(\sigma _{i}\) is set as either standard normal or simply 10. That is,
Notice that we do not correct for heteroskedasticity when computing the residuals. All the results are reported in the upper panel of Table 2. We find that BF stays robust despite the presence of heteroskedasticity. More specifically: (i) With heteroskedasticity, EF becomes over-sized even when \(F_{t}\) is known. In fact, the empirical size of EF varies from 13 to 21 %. This differs from our benchmark case, where the size of EF stays close to 5 % when \(F_{t}\) is known. (ii) When \(F_{t}\) is unknown, as expected, EF shows extreme over-rejection as in the benchmark case. However, BF behaves well whether or not \(F_{t}\) is known. In fact, the empirical size of BF stays robust, varying from 4 to 6 % across all experiments. Therefore, we conclude that the bootstrap \(F\) test in the common factor model can be used under heteroskedasticity.
For serial correlation, the error terms are set as follows:
where \(\rho =\left( 0.4,0.8\right) \) and \(\nu _{it}\sim N(0,1)\). Again, we do not correct for serial correlation. From the lower panel of Table 2, one can observe the following: (i) Overall, both EF and BF appear inappropriate to use because of considerable over-rejections. In fact, they become more over-sized as \(n\) increases. We also find that EF and BF become more over-sized as \(\rho \) increases; for example, Table 2 shows that both are severely over-sized if we increase \(\rho \) from 0.4 to 0.8. (ii) More specifically, when \(\rho =0.4\), the empirical size of EF and BF varies between \(5\) and \(17\,\%\) even when \(F_{t}\) is known. (iii) This is an expected result, in the sense that the wild bootstrap method used in this paper is not designed for serially correlated errors. Note that Gonçalves and Perron (2010) also find noticeable size distortions with serially correlated error terms. Hence, one needs to explore alternative bootstrap methods (such as the block bootstrap) rather than the wild bootstrap for this case.
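The serially correlated errors can be generated as follows, assuming the standard AR(1) form \(u_{it}=\rho u_{i,t-1}+\nu _{it}\) implied by the text (a Python sketch of ours, including a burn-in as in the Monte Carlo design):

```python
import numpy as np

def ar1_errors(n, T, rho, rng, burn=1000):
    """Generate serially correlated errors u_it = rho*u_{i,t-1} + nu_it,
    nu_it ~ N(0,1), discarding an initial burn-in (illustrative sketch
    of the DGP described in the text)."""
    nu = rng.standard_normal((T + burn, n))
    u = np.zeros((T + burn, n))
    for t in range(1, T + burn):
        u[t] = rho * u[t - 1] + nu[t]
    return u[burn:]            # discard the first `burn` observations
```

Feeding such errors into the wild bootstrap illustrates the over-rejection discussed above, since sign flips preserve heteroskedasticity but destroy serial dependence.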
6 Conclusion
High-dimensional data analysis with large \(n\) and large \(T\) has become an integral part of the macro panel data literature. This paper makes two main contributions. First, we derive the limiting distributions of an \(F\) test statistic for cross-sectional dependence when the factor is known or unknown. Second, we suggest using a wild bootstrap \(F\) test to test for no cross-sectional dependence. Extensive simulations show that the proposed wild bootstrap \(F\) test performs well in testing for no cross-sectional dependence, being robust to heteroskedasticity although sensitive to serial correlation, and is recommended in practice.
Notes
These common shocks could be macroeconomic, political, environmental, health, and/or sociological shocks in nature to mention a few, see Andrews (2005).
In a different context, Schott (2005) proposes a Lagrange multiplier type test for the independence of random variables when both the dimension and the sample size are large.
To keep things simple, the number of factors is assumed to be one. The information criteria approach of Bai and Ng (2002) can be used as an alternative method for testing for cross-sectional dependence by testing whether the number of factors is zero or larger than zero. This method is also useful when the number of factors is unknown.
In the statistics literature, Boos and Brownie (1995) and Akritas and Arnold (2000) consider the asymptotic distribution of the ANOVA \(F\) statistic for this case where \(n\) and \(T\) denote the number of treatments and replications per treatment, respectively. Under their settings, it is shown that
$$\begin{aligned} \sqrt{n}\left( F_{n}-1\right) \overset{d}{\rightarrow }N\left( 0,\frac{2T}{T-1}\right) \end{aligned}$$ where \(F_{n}\) is the \(F\) statistic under their setting, as \(n\rightarrow \infty \) with fixed \(T\). They also show that the above asymptotics hold in a two-way fixed effects model as well. Extending these results to the interaction effects model, Bathke (2004) shows that the limiting normal distribution can still be achieved with the \(F\) statistic centered at 1. Interestingly, in the econometrics literature, Orme and Yamagata (2006) consider an \(F\) test for individual effects in a panel data model and derive similar limiting distributions.
Alternatively, one may want to use the following bootstrap DGP suggested by Mammen (1993b) especially when the distribution of the error terms is sufficiently asymmetric.
$$\begin{aligned} \varepsilon _{it}^{*}=\left\{ \begin{array}{ll} -\frac{\sqrt{5}-1}{2} &{} \text{ with } \text{ probability } p=\frac{\sqrt{5}+1}{2\sqrt{5}}, \\ \frac{\sqrt{5}+1}{2} &{} \text{ with } \text{ probability } 1-p. \end{array} \right. \end{aligned}$$ However, in their simulations, Davidson and Flachaire (2008) show that the version we adopt here performs at least as well as this version, even when the disturbances are asymmetric.
The condition \(E\left( \varepsilon _{it}^{*3}\right) =1\) is often added in the bootstrap error literature.
References
Andrews DWK (2005) Cross-section regression with common shocks. Econometrica 73:1551–1585
Akritas M, Arnold S (2000) Asymptotics for analysis of variance when the number of levels is large. J Am Stat Assoc 95:212–226
Bai J (2003) Inferential theory for factor models of large dimensions. Econometrica 71:135–171
Bai J (2009) Panel data models with interactive fixed effects. Econometrica 77:1229–1279
Bai J, Kao C, Ng S (2009) Panel cointegration with global stochastic trends. J Econom 149:82–99
Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70: 191–221
Bathke A (2004) The ANOVA F test can still be used in some balanced designs with unequal variances and nonnormal data. J Stat Plan Inference 126:413–422
Boos DD, Brownie C (1995) ANOVA and rank tests when the number of treatments is large. Stat Probab Lett 23:183–191
Davidson R, Flachaire E (2008) Wild bootstrap, tamed at last. J Econom 146:162–169
Gonçalves S, Perron B (2010) Bootstrapping factor-augmented regression models. Université de Montréal, Working Paper
Johnson NL, Kotz S, Balakrishnan N (1995) Distributions in statistics: continuous univariate distributions. Wiley, New York
Krämer W (1989) On the robustness of the F-test to autocorrelation among disturbances. Econ Lett 30:37–40
Krämer W, Michels S (1997) Autocorrelation- and heteroskedasticity-consistent t-values with trending data. J Econom 76:141–147
Liu RY (1988) Bootstrap procedures under some non-IID models. Ann Stat 16:1696–1708
Mammen E (1993a) When does bootstrap work? Asymptotic results and simulations. Springer, New York
Mammen E (1993b) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21:255–285
Orme CD, Yamagata T (2006) The asymptotic distribution of the F-test statistic for individual effects. Econom J 9:404–422
Schott JR (2005) Testing for complete independence in high dimensions. Biometrika 92:951–956
This paper is dedicated to Walter Krämer for his important contributions to statistics and econometrics.
Appendix
A Proof of Theorem 1
Proof
Now we have
where \(R_{\lambda }=\frac{\left( RRSS-URSS\right) }{n}\) and \(\widehat{\sigma }^{2}=\frac{URSS}{\left( nT-n\right) }\), using a setup similar to that of Orme and Yamagata (2006). Rearranging the terms, we have
Expanding the equations, we have
Consider \(I\).
For \({ II}\),
Then
For \(III\),
For \(IV\) and \(V\), as already shown above,
and
After rearranging all the terms, one obtains
It is easy to see that
Now we obtain
by Assumption 4.
Finally,
as \(\left( n,T\right) \rightarrow \infty \) if \(\frac{\sqrt{n}}{T}\rightarrow 0.\) Note that \(\frac{T}{n}\rightarrow c\) with \(0<c<\infty \) implies that \( \frac{\sqrt{n}}{T}=\frac{n}{T}\frac{1}{\sqrt{n}}\rightarrow 0.\)
B Proof of Theorem 2
Proof
First we consider
Consider \(I\). One can easily verify that
as \(\left( n,T\right) \rightarrow \infty \). Note from Bai (2003, p. 166) that \(\widehat{\lambda }_{i}\widehat{F}_{t}-\lambda _{i}F_{t}\) can be expanded as
For \(II\).
Consider \(II_{a}.\)
Similarly,
Consider \(II_{c}.\)
Hence
Consider \(III\).
Now, we write
Note that \(IV=O_{p}\left( \frac{\sqrt{T}}{\delta _{nt}^{2}}\right) \) and \( V=O_{p}\left( \frac{\sqrt{T}}{\delta _{nt}^{2}}\right) \) as shown above. Now we assume that
Note that \(\frac{\sqrt{T}}{\delta _{nt}^{2}}\rightarrow 0\) when \(\frac{\sqrt{ T}}{n}\rightarrow 0.\) Also \(\frac{T}{n}\rightarrow c\) implies that \(\frac{ \sqrt{T}}{n}=\frac{T}{n}\frac{1}{\sqrt{T}}\rightarrow 0.\) Clearly,
Consider the first term.
Consider \(I+II\).
Consider \(III\).
Consider the second term.
Therefore
We know that
Hence
by Assumption 4 and because \(\frac{1}{\sqrt{n}}\sum _{k=1}^{n} \lambda _{k}u_{kt}\) and \(\frac{1}{\sqrt{T}}\sum _{s=1}^{T}F_{s}u_{is}\) are asymptotically independent. Therefore
as \(\left( n,T\right) \rightarrow \infty \) and \(\frac{T}{n}\rightarrow c.\) Finally
as required.\(\square \)
C Proof of Theorem 3
Proof
First we revisit Theorems 1 and 2 with
Now suppose \(F_{t}\) is known.
Recall that
We know from Assumption 4 that
where
Consider III.
It is easy to see that \(IV\) and \(V\) are \(O_{p}\left( \frac{\sqrt{n}}{T} \right) .\) Then
It follows that
or
Hence
Finally
if
and
Next we assume \(F_{t}\) is unknown. Recall
We know that \(II+\) \(III=O_{p}\left( \frac{1}{\delta _{nT}^{2}}\right) .\) Following similar steps as in the proof of Theorem 2 we obtain
Next we allow
and we examine
which lead to
if \(F_{t}\) is known and
if \(F_{t}\) is unknown.
Finally we set \(u_{it}=\sigma _{it}e_{it}.\) Notice that
Recall that
where
with
Recall
and
or
Hence
Finally
if \(F_{t}\) is known and
if \(F_{t}\) is unknown.\(\square \)
Baltagi, B.H., Kao, C. & Na, S. Testing for cross-sectional dependence in a panel factor model using the wild bootstrap \(F\) test. Stat Papers 54, 1067–1094 (2013). https://doi.org/10.1007/s00362-013-0499-9