
1 Introduction

Panel data contain more degrees of freedom and more sample variability than cross-sectional or time series data. They not only provide the possibility of obtaining more accurate statistical inference, but also the possibility of constructing and testing more realistic behavioral hypotheses; see, e.g., [29, 30]. However, panel data also raise many methodological challenges. This paper considers some statistical issues in using panel data for finance research. We consider (i) estimation of standard errors; (ii) multiple equations modeling; (iii) to pool or not to pool; (iv) aggregation and predictions; (v) cross-sectional dependence; and (vi) multi-dimensional statistics.

2 Estimation of Panel Standard Errors

Consider a single-equation model, often used in corporate finance or asset pricing, for \(N\) cross-sectional units observed over \(T\) time periods,

(2.1)

where

$$\begin{aligned} v_{it}=\alpha _i+\lambda _t+u_{it}, \end{aligned}$$
(2.2)

\(\alpha _i\) denotes the individual-specific effects that vary across \(i\) but stay constant over time, \(\lambda _t\) denotes the time-specific effects that are individual-invariant but time-varying, and \(u_{it}\) denotes the impact of those omitted variables that vary across \(i\) and over \(t\). The covariance transformation is often used to remove the impacts of \(\alpha _i\) and \(\lambda _t\); see, e.g., [29], Chap. 3. The covariance estimator of the slope coefficients is defined as

(2.3)

where

Statistical inference on the covariance estimator depends on the properties of \(u_{it}\). “Although the literature has used an assortment of methods to estimate standard errors in panel data sets, the chosen method is often incorrect and the literature provides little guidance to researchers as to which method should be used. In addition, some of the advice in the literature is simply wrong.” ([54], p. 436).
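To fix ideas, a minimal sketch of the two-way within (covariance) transformation and the resulting covariance estimator can be written in generic notation (a dependent variable \(y_{it}\) and a \(K\times 1\) regressor vector \(\mathbf {x}_{it}\) are assumed here purely for illustration; the notation need not match the original displays (2.1)–(2.3)):

$$\begin{aligned} \tilde{y}_{it}&=y_{it}-\bar{y}_{i\cdot }-\bar{y}_{\cdot t}+\bar{y},\qquad \tilde{\mathbf {x}}_{it}=\mathbf {x}_{it}-\bar{\mathbf {x}}_{i\cdot }-\bar{\mathbf {x}}_{\cdot t}+\bar{\mathbf {x}},\\ \hat{\boldsymbol{\beta }}_{cv}&=\left( \sum \limits ^N_{i=1}\sum \limits ^T_{t=1}\tilde{\mathbf {x}}_{it}\tilde{\mathbf {x}}_{it}'\right) ^{-1}\sum \limits ^N_{i=1}\sum \limits ^T_{t=1}\tilde{\mathbf {x}}_{it}\tilde{y}_{it}, \end{aligned}$$

where \(\bar{y}_{i\cdot }\), \(\bar{y}_{\cdot t}\) and \(\bar{y}\) denote the individual, period and overall means, and similarly for \(\mathbf {x}_{it}\).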

Vogelsang [64] showed that the covariance matrix estimate proposed in [22], based on the Newey-West [48] heteroscedasticity and autocorrelation consistent (HAC) covariance matrix estimator of cross-section averages,

(2.4)

is robust to heteroscedasticity, autocorrelation and spatial dependence, where

\(k\left( \displaystyle \frac{j}{m}\right) =1-\displaystyle \frac{j}{m}\) if \(\left| \displaystyle \frac{j}{m}\right| <1\) and \(k\left( \displaystyle \frac{j}{m}\right) =0\) if \(\left| \displaystyle \frac{j}{m}\right| >1\), with \(m\) an a priori chosen positive constant less than or equal to \(T\). The choice of \(m\) depends on how strong an investigator believes the serial correlation of the error \(u_{it}\) to be.

The Vogelsang [64] estimator of the covariance matrix of the covariance estimator, (2.4), is consistent when the errors are autocorrelated and heteroscedastic, provided the regressors are strictly exogenous. As noted by Nerlove [47], “all interesting economic behavior is inherently dynamic, dynamic panel models are the only relevant models; what might superficially appear to be a static model only conceals underlying dynamics, since any state variable presumed to influence present behavior is likely to depend in some way on past behavior.” When lagged dependent variables appear among the explanatory variables to capture the inertia in human behavior, strict exogeneity is violated. Not only is the covariance estimator biased if the time series dimension \(T\) is finite, no matter how large the cross-sectional dimension \(N\) is (e.g., [29], Chap. 3; [5, 24, 35]), but so is the Vogelsang [64] estimator of its covariance matrix. General formulae for the estimator and its covariance matrix when the errors are autocorrelated and heteroscedastic in dynamic panel models remain to be developed.
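As an illustration of how a robust covariance matrix of this type can be computed in practice, the following Python sketch forms period-by-period cross-sectional sums of the moment contributions and applies the kernel weighting described above with an a priori bandwidth \(m\). The variable names and the plain-numpy implementation are assumptions for illustration, not the original implementation of [22] or [64].

```python
import numpy as np

def driscoll_kraay_cov(X, u, time_index, m):
    """Sketch of a Driscoll-Kraay style (HAC on cross-section sums) covariance.

    X : (NT, K) regressors after the within (covariance) transformation
    u : (NT,)  residuals from the covariance estimator
    time_index : (NT,) integer time label for each observation
    m : a priori chosen bandwidth, m <= T
    """
    NT, K = X.shape
    periods = np.unique(time_index)
    T = len(periods)

    # h_t: cross-section sum of x_it * u_it for each period t
    h = np.zeros((T, K))
    for s, t in enumerate(periods):
        idx = time_index == t
        h[s] = X[idx].T @ u[idx]

    # Kernel-weighted long-run variance of the h_t series,
    # using k(j/m) = 1 - j/m for j/m < 1 and 0 otherwise, as in the text
    S = h.T @ h
    for j in range(1, m):
        w = 1.0 - j / m
        gamma = h[j:].T @ h[:-j]        # j-th autocovariance of h_t
        S += w * (gamma + gamma.T)

    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv        # sandwich covariance of the estimator

# Usage sketch: se = np.sqrt(np.diag(driscoll_kraay_cov(X_tilde, resid, t_id, m=4)))
```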

3 Multiple Equations Modeling

One of the prominent features of econometric analysis is the incorporation of economic theory into the analysis of numerical and institutional data. Economists, from León Walras onwards, perceive the economy as a coherent system. The interdependence of the sectors of an economy is represented by a set of functional relations, each representing an aspect of the behavior of a group of individuals, firms, or authorities. The variables entering these relations consist of a set of endogenous (or jointly dependent) variables, whose formation is conditional on a set of exogenous variables which the economic theory regards as given; see, e.g., [57]. Combining the joint dependence and the dynamic dependence, a Cowles Commission structural equation model could be specified as

(3.1)

where are \(G\times 1\) contemporaneous and lagged joint dependent variables, is a \(k\times 1\) vector of strictly exogenous variables, is a \(G\times 1\) vector of time-invariant individual-specific effects and are assumed to be independently, identically distributed over \(i\) and \(t\) with zero mean and nonsingular covariance matrix \(\varOmega _u\). We assume that are observed.

The distinct features of panel dynamic simultaneous equations models are the joint dependence of the endogenous variables and the presence of time-persistent effects in the \(i\)th individual’s time series observations. The joint dependence makes \(B\not =I_G\ \text{ and }\ | B|\not = 0\).

Premultiplying (3.1) by \(B^{-1}\) yields the reduced form specification

(3.2)

where .

Statistical inference can only be made in terms of observed data. The joint dependence of the observed variables raises the possibility that many observationally equivalent structures could generate the same observed phenomena; see, e.g., [26]. Moreover, the presence of time-invariant individual-specific effects creates correlations among all current and past realized endogenous variables even when the idiosyncratic error is independently, identically distributed across \(i\) and over \(t\) with nonsingular covariance matrix \(\varOmega _u\). Hsiao and Zhou [36] show that the standard Cowles Commission rank and order conditions (e.g., [28]) for the identification of (3.1) still hold provided the roots of \(| B-\lambda \varGamma | =0\) lie outside the unit circle.

When the process is stationary, both the likelihood approach and the generalized method of moments (GMM) approach can be used to make inferences on (3.1) or (3.2); see, e.g., [17, 36]. The advantages of the GMM approach are that there is no need to specify the probability density function of the random variables or to worry about how to treat the initial values, \(y_{i0}\). The disadvantages are that in many cases the GMM approach does not guarantee a global minimum and there could be a huge number of moment conditions to consider; for instance, the number of moment conditions for the Arellano-Bond [10] type GMM is of order \(T^2\). Moreover, Akashi and Kunitomo [2, 3] show that the GMM approach of estimating the structural form (3.1) is inconsistent if \(\displaystyle \frac{T}{N}\rightarrow c\not = 0\), \(c<\infty \), as both \(N\) and \(T\) become large. Even though the GMM approach can yield a consistent estimator for the reduced form model (3.2), following the approach of [2, 3, 5], Hsiao and Zhang [35] show that it is asymptotically biased of order \(\sqrt{\displaystyle \frac{T}{N}}\) when both \(N\) and \(T\) are large. The limited Monte Carlo studies conducted by Hsiao and Zhang [35] show that whether an estimator is asymptotically biased or not plays a pivotal role in statistical inference. The size distortion of the Arellano-Bond [10] type GMM test could be 100 % for a nominal size of 5 % if \(\displaystyle \frac{T}{N}\rightarrow c\not = 0\), \(c<\infty \).
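To see why the number of Arellano-Bond [10] type moment conditions grows quadratically in \(T\), consider a sketch based on the simplest univariate dynamic panel regression (generic notation assumed here for illustration, not the structural system (3.1)): first differencing removes the individual effect, and levels dated \(t-2\) and earlier are valid instruments for the differenced equation,

$$\begin{aligned} \varDelta y_{it}&=\gamma \varDelta y_{i,t-1}+\varDelta u_{it},\qquad t=2,\ldots ,T,\\ E\left( y_{i,t-s}\varDelta u_{it}\right)&=0,\qquad s=2,\ldots ,t, \end{aligned}$$

so period \(t\) contributes \(t-1\) orthogonality conditions and the total number of conditions is \(\sum \nolimits ^T_{t=2}(t-1)=T(T-1)/2\), which is of order \(T^2\).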

The advantages of the likelihood approach are that a likelihood function is a natural objective function to maximize and the number of moment conditions is fixed, independent of \(N\) and \(T\). The quasi maximum likelihood estimator (QMLE) is asymptotically unbiased independent of the way \(N\) or \(T\) or both tend to infinity; see, e.g., [35, 36]. The disadvantages are that specific assumptions about the initial values need to be made and specific assumptions about the underlying data generating process need to be imposed to get around the incidental parameters issue; see, e.g., [36, 38]. When the initial distributions are misspecified, the QMLE is consistent and asymptotically unbiased only if \(N\) is fixed and \(T\rightarrow \infty \). When \(\displaystyle \frac{N}{T}\rightarrow c\not = 0\), \(c<\infty \), as \(N,T\rightarrow \infty \), the QMLE is asymptotically biased of order \(\sqrt{\displaystyle \frac{N}{T}}\).

4 To Pool or Not to Pool

Panel data, by nature, focus on individual outcomes. Factors affecting individual outcomes are numerous. Yet a model is not a mirror image of reality, but a simplification of it. A good model aims to capture the essentials that affect the outcomes while allowing for the existence of unobserved heterogeneity. When a variable of interest, say \(y\), is modeled as a function of some important factors, say \(K+m\) variables, where the two groups of conditioning variables are of dimension \(K\) and \(m\), respectively,

(4.1)

One way to justify pooling is to test whether the coefficients are identical for all \(i\). However, the homogeneity assumption is often rejected by empirical investigation; see, e.g., [40]. When the coefficients are treated as fixed and different for each \(i\), the only advantage of pooling is to put the model (4.1) into Zellner’s [65] seemingly unrelated regression framework to improve the efficiency of the estimates of the individual behavioral equations.
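For illustration, a standard Chow-type poolability \(F\) test comparing the pooled regression with unit-by-unit regressions can be computed as in the Python sketch below. The variable names are hypothetical, and the sketch assumes a static, balanced specification with homoskedastic, serially uncorrelated errors, so it is only one simple way of carrying out such a test.

```python
import numpy as np
from scipy import stats

def poolability_f_test(y, X, id_index):
    """Chow-type F test of identical coefficients across individuals.

    y : (NT,) dependent variable,  X : (NT, K) regressors (including intercept)
    id_index : (NT,) integer individual label for each observation
    """
    NT, K = X.shape
    ids = np.unique(id_index)
    N = len(ids)

    def rss(yy, XX):
        b, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        e = yy - XX @ b
        return e @ e

    rss_pooled = rss(y, X)                                   # restricted: common coefficients
    rss_indiv = sum(rss(y[id_index == i], X[id_index == i]) for i in ids)

    df1 = (N - 1) * K                                        # number of restrictions
    df2 = NT - N * K                                         # residual degrees of freedom
    F = ((rss_pooled - rss_indiv) / df1) / (rss_indiv / df2)
    p_value = stats.f.sf(F, df1, df2)
    return F, p_value
```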

One way to accommodate heterogeneity across individuals in pooling is to use a mixed random and fixed coefficient framework proposed by Hsiao, Appelbe and Dineen [37],

(4.2)

where

One set of coefficients is assumed fixed and different across \(i\); the other set is assumed to be subject to stochastic constraints of the form

(4.3)

where \(A\) is an \(NK\times L\) matrix with known elements, is an \(L\times 1\) vector of constants, and is assumed to be randomly distributed with mean 0 and nonsingular covariance matrix. When , where is an \(N\times 1\) vector of 1’s, i.e., the individual coefficients are randomly distributed around a common mean. The justification for (4.3) is that, conditional on , individuals’ responses towards changes in are similar. The differences across \(i\) are due to a chance mechanism, i.e., they satisfy de Finetti’s [20] exchangeability criterion. Hsiao et al. [37] propose a Bayesian solution to obtain the best predictors of and .

The advantage of the Bayesian framework over the sampling framework for considering the issue of poolability is that all sampling tests essentially exploit the implications of a certain formulation in a specific framework; see, e.g., [15]. They are indirect in nature. The distribution of a test statistic is derived under a specific null, but the alternative is composite. The rejection of a null hypothesis does not automatically imply the acceptance of a specific alternative. It would appear more appropriate to treat pooling as a model selection issue. Hsiao and Sun [32] propose to classify the conditioning variables into the two groups (i.e., to choose the dimensions \(K\) and \(m\)) using some well-known model selection criterion such as Akaike’s information criterion [1] or Schwarz’s Bayesian information criterion [58]. If \(m=0\), simple pooling is fine. If \(m\not = 0\), then one can consider pooling conditional on the remaining variables. Their limited Monte Carlo studies appear to show that combining the Bayesian framework with some model selection criterion works well in answering the question of whether to pool or not to pool.

5 Aggregation and Predictions

One of the tools for reducing real-world detail is “suitable” aggregation. However, for aggregation not to distort the fundamental behavioral relations among economic agents, certain “homogeneity” conditions must hold between the micro units. Many economists have shown that if micro units are heterogeneous, aggregation can lead to relations among macro variables that are very different from the micro relations; see, e.g., [43, 44, 49, 59, 61, 63].

For instance, consider the simple dynamic equation,

(5.1)

where the error \(u_{it}\) is covariance stationary. Equation (5.1) implies a long-run relation between \(y_{it}\) and ,

(5.2)

where .

Let \(y_t=\sum \limits ^N_{i=1}y_{it}\) and , then a similar long-run relation between and ,

(5.3)

holds for a stationary \(v_t\) if and only if either of the following conditions holds [39]:

  1. (i) for all \(i\) and \(j\); or

  2. (ii) if , then must lie in the null space of \(D\) for all \(t\), where .

These conditions are fairly restrictive. If “heterogeneity” is indeed present in the micro units, shall we predict the aggregate outcome by summing the estimated micro relations, or shall we predict it from the estimated aggregate relation? Unfortunately, there is not much work on this specific issue. In choosing whether to predict aggregate variables using the aggregate equation \((H_a)\) or the disaggregate equations \((H_d)\), Grunfeld and Griliches [23] suggest using the criterion of:

(5.4)

where and are the estimates of the errors in predicting aggregate outcomes under \(H_d\) and \(H_a\), respectively. The Grunfeld and Griliches criterion is equivalent to using the simple average of the micro-unit predictions to generate the aggregate prediction if (5.4) holds. As discussed by Hsiao and Wan [34], if cross-sectional units are not independent, there are many other combination approaches that could yield better aggregate forecasts, such as the Bates and Granger [16] regression approach, the Bayesian averaging of Buckland et al. [18], the Hsiao and Wan [34] eigenvector approach, the Swanson and Zeng [60] information combination, etc. (for a survey of forecast combinations, see [62]). However, if a model is only a local approximation, then frequent structural breaks could occur from the model’s perspective even if there is no break in the underlying structure. In this situation, it is not clear that there exists an optimal combination of micro forecasts. Perhaps “robustness” is a more relevant criterion than “optimality”; see, e.g., [53].
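To make one of these combination schemes concrete, the following sketch computes Bates and Granger [16] style combination weights from an estimated forecast-error covariance matrix; the inputs and names are hypothetical, and the simple average is recovered as the special case of equal weights.

```python
import numpy as np

def bates_granger_weights(errors):
    """Combination weights minimizing the variance of the combined forecast error.

    errors : (n, q) matrix of historical forecast errors from q competing forecasts.
    Returns weights summing to one. A sketch: it assumes the error covariance is
    well estimated, which Sect. 7 warns can fail when q is large relative to n.
    """
    sigma = np.cov(errors, rowvar=False)            # q x q estimated error covariance
    sigma_inv = np.linalg.inv(sigma)
    ones = np.ones(sigma.shape[0])
    return sigma_inv @ ones / (ones @ sigma_inv @ ones)

# Usage sketch: the simple average corresponds to w = ones/q;
# combined_forecast = forecasts @ w for a (q,) vector of individual forecasts.
```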

6 Cross-Sectional Dependence

Most panel inference procedures assume that apart from the possible presence of individual invariant but period varying time-specific effects, the effects of omitted variables are independently distributed across cross-sectional units. Often economic theory predicts that agents take actions that lead to interdependence among themselves. For example, the prediction that risk averse agents will make insurance contracts allowing them to smooth idiosyncratic shocks implies dependence in consumption across individuals. Contagion of views could also lead to herding or imitating behavior; see, e.g., [4]. Cross-sectional units could also be affected by common omitted factors. The presence of cross-sectional dependence can substantially complicate statistical inference for a panel data model.

Ignoring cross-sectional dependence in panel data could lead to seriously misleading inference; see, e.g., [33, 56]. However, modeling cross-sectional dependence is a lot more complicated than modeling serial dependence. There is a natural order in how a variable evolves over time; the cross-sectional index, by contrast, is arbitrary, and there is no natural ordering. Three popular approaches for taking account of cross-sectional dependence are: the spatial approach (see, e.g., [8, 9, 41, 42]), the factor approach (e.g., [11, 12]), and the cross-sectional mean augmentation approach (e.g., [50, 53]). The spatial approach assumes that there exists a known \(N\times N\) spatial weight matrix \(W\), whose \((i,j)\)th element, \(w_{ij}\), gives the strength of the interaction between the \(i\)th and \(j\)th cross-sectional units. The conventional specification assumes that the diagonal elements satisfy \(w_{ii}=0\) and that \(\sum \nolimits ^{N}_{j=1}w_{ij}=1\) through row normalization. The only unknown term is the absolute strength, \(\rho \). However, to ensure that the interaction between the \(i\)th and \(j\)th units has a “decaying” effect as the “distance” between them increases, \(\rho \) is assumed to have absolute value less than 1. Apart from the fact that it is difficult to have prior information to specify \(w_{ij}\), this approach also raises the issue of the relation between the observed sample and the population. If \(N\) is not the population size, the restrictions \(\sum \nolimits ^{N}_{j=1}w_{ij}=1\ \text{ and }\ |\rho | <1\) imply that as \(N\) increases, each element \(w_{ij}\rightarrow 0\).
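As a small illustration of this specification, the sketch below builds a row-normalized weight matrix with zero diagonal from a hypothetical matrix of positive pairwise distances and forms the spatial lag term; the inverse-distance weighting is an assumption chosen purely for illustration.

```python
import numpy as np

def row_normalized_weights(dist):
    """Build W with zero diagonal and unit row sums from a positive distance matrix."""
    W = 1.0 / (dist + np.eye(len(dist)))       # inverse distance; avoid dividing by zero
    np.fill_diagonal(W, 0.0)                   # w_ii = 0
    return W / W.sum(axis=1, keepdims=True)    # row normalization: sum_j w_ij = 1

# Spatial lag of a cross-section y for a given strength rho (|rho| < 1):
# spatial_term = rho * row_normalized_weights(dist) @ y
```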

Another approach to model cross-sectional dependence is to assume that the variable or error follows a linear factor model,

(6.1)

where is an \(r\times 1\) vector of random factors with mean zero, is an \(r\times 1\) vector of nonrandom factor loading coefficients, and \(u_{it}\) represents the effects of idiosyncratic shocks which are independent of the factors and are independently distributed across \(i\) with diagonal covariance matrix \(D\).

An advantage of the factor model over the spatial approach is that there is no need to prespecify the strength of the correlation between units \(i\) and \(j\). The disadvantage is that when no restriction is imposed on the factor loading matrix, it implies strong cross-sectional dependence [19]. Unless \(B\) is known, there is no way to find a transformation to control the impact of cross-sectional dependence on statistical inference. Bai [11] has proposed methods to estimate a model with a factor-structure error term. However, most financial data sets contain a large number of cross-sectional units. When \(N\) is large, the estimation of the factor loading matrix, \(B\), is not computationally feasible.
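To illustrate the factor approach, the sketch below extracts \(r\) common factors from a balanced panel of residuals by principal components, in the spirit of the factor literature cited above; the function name, the assumed number of factors, and the normalization are illustrative choices, not the exact procedure of [11].

```python
import numpy as np

def pc_factors(E, r):
    """Principal-components sketch of extracting r common factors.

    E : (T, N) balanced panel of residuals (or data) with cross-sectional dependence
    r : assumed number of common factors
    Returns estimated factors F (T, r) and loadings B (N, r).
    """
    T, N = E.shape
    E = E - E.mean(axis=0)                       # demean each series
    # Factors: scaled leading eigenvectors of the T x T matrix E E' / (N T)
    eigval, eigvec = np.linalg.eigh(E @ E.T / (N * T))
    F = np.sqrt(T) * eigvec[:, -r:][:, ::-1]     # normalized so that F'F/T = I_r
    B = E.T @ F / T                              # loadings by least squares
    return F, B
```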

Instead of estimating and , Pesaran [50] suggests a simple approach to filter out the cross-sectional dependence by augmenting the model with the cross-sectional means of the observed data. For instance, Pesaran, Schuermann and Weiner [53] propose a global vector autoregressive (VAR) model for an \(m\times 1\) vector of random variables to accommodate dynamic cross-sectional dependence by considering

(6.2)

where \(\varPhi _i(L)=I+\varPhi _{1i}L+\cdots +\varPhi _{pi}L^{p_i}\), and \(L\) denotes the lag operator,

(6.3)
$$\begin{aligned} r_{ii}=0,\ \ \sum \limits ^N_{j=1}r_{ij}=1,\ \text{ and }\ \sum \limits ^N_{j=1}r^2_{ij}\rightarrow 0\ \text{ as }\ N\rightarrow \infty . \end{aligned}$$
(6.4)

The weight \(r_{ij}\) could be \(\displaystyle \frac{1}{N-1}\) for \(i\ne j\), or constructed from trade values or other measures of economic distance, and could be time-varying. The global average is inserted into individual \(i\)’s VAR model,

(6.5)

to take account of the cross-sectional dependence. When the global averages can be treated as weakly exogenous (predetermined), the estimation for each \(i\) can proceed using standard time series estimation techniques; see, e.g., [54]. Pesaran et al. [53] show that the weak exogeneity assumption holds for all countries except the U.S., because of the U.S.’s dominant position in the world economy. They also show that (6.2) yields better results than (6.5) when cross-sectional units are correlated.

The advantage of Pesaran’s [50] cross-sectional mean-augmented approach to taking account of cross-sectional dependence is its simplicity. However, there are restrictions on its application. The method works when for all \(t\), or if can be considered as a linear combination of \(\bar{y}_t\) and . It is hard to ensure this if \(r>1\). For instance, consider the case that , then . However, if , then while . If , the cross-sectional mean does not approximate (6.1). Additional conditions are needed to approximate .
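As an illustration of the cross-sectional mean augmentation idea, the following Python sketch runs, for each unit, a least squares regression of the dependent variable on its own regressors augmented with the period-specific cross-sectional means of the dependent variable and of the regressors. The variable names and the static, balanced-panel setup are assumptions for illustration; this is a simplified sketch in the spirit of [50], not the full common correlated effects estimator.

```python
import numpy as np

def mean_augmented_estimates(y, X, id_index, time_index):
    """Augment each unit's regression with cross-sectional means of y and X.

    y : (NT,), X : (NT, K), id_index / time_index : (NT,) integer labels.
    Returns a dict mapping each unit to its estimated slopes on its own regressors.
    """
    NT, K = X.shape
    periods = np.unique(time_index)
    ybar = {t: y[time_index == t].mean() for t in periods}            # per-period mean of y
    Xbar = {t: X[time_index == t].mean(axis=0) for t in periods}      # per-period means of X

    betas = {}
    for i in np.unique(id_index):
        sel = id_index == i
        Zi = np.column_stack([
            X[sel],                                          # own regressors
            np.array([ybar[t] for t in time_index[sel]]),    # cross-sectional mean of y
            np.vstack([Xbar[t] for t in time_index[sel]]),   # cross-sectional means of X
            np.ones(sel.sum()),                              # intercept
        ])
        coef, *_ = np.linalg.lstsq(Zi, y[sel], rcond=None)
        betas[i] = coef[:K]                                  # slopes on own regressors
    return betas
```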

7 Multi-dimensional Statistics

Panel data are multi-dimensional. Phillips and Moon [55] have shown that multi-dimensional asymptotics is a lot more complicated than one-dimensional asymptotics. Financial data typically have cross-sectional and time-series dimensions that increase at the same rate or at some arbitrary rates. Moreover, computing speed and storage capability have enabled researchers to collect, store and analyze data sets of very high dimension. Multi-dimensional panels will become more available. Classical asymptotic theorems derived under the assumption that the dimension of the data is fixed (e.g., [7]) appear to be inadequate to analyze issues arising from finite samples of very high dimensional data; see, e.g., [14]. For example, Bai and Saranadasa [13] proved that when testing the difference of means of two high dimensional populations, Dempster’s 1958 non-exact test [21] is more powerful than Hotelling’s 1931 \(T^2\)-test [27] even though the latter is well defined. Another example arises in regression analysis, where economists sometimes consider optimal ways to combine a set of explanatory variables to capture their essential variation as a dimension reduction method when the degrees of freedom are limited (e.g., [6]), or to combine a number of independent forecasts to generate a more accurate forecast; see, e.g., [62]. The former leads to principal component analysis, which chooses the combination weights as the eigenvectors corresponding to the largest eigenvalues of the covariance matrix of the set of variables in question. The latter leads to choosing the combination weights proportional to the eigenvector corresponding to the smallest eigenvalue of the prediction mean square error matrix of the set of independent forecasts [31]. However, the true covariance matrix is unknown. Economists have to use the finite sample estimated covariance matrix (or mean square error matrix) in lieu of the true one. Unfortunately, when the dimension of the matrix \((p)\) relative to the available sample size \((n)\) is large, \(\displaystyle \frac{p}{n}=c\not = 0\), the sample estimates can be very different from the true ones and their eigenvectors may point in a random direction [46]; for an example, see [31]. Many interesting and important issues providing insight into finite and large sample behavior in high dimensional data analysis remain to be worked out and could be very useful to economists and social scientists; see, e.g., [14].
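The point about sample eigenvectors can be illustrated with a small simulation: data are drawn from a population covariance with one known leading eigenvector, and the alignment between the true and estimated leading eigenvectors is averaged over replications for different \(p/n\) ratios. All parameter values below are hypothetical and chosen only to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def eigvec_alignment(p, n, signal=2.0, reps=200):
    """Average |cosine| between the true and estimated leading eigenvectors."""
    v = np.zeros(p); v[0] = 1.0                    # true leading eigenvector
    cov = np.eye(p) + signal * np.outer(v, v)      # one-spike population covariance
    chol = np.linalg.cholesky(cov)
    out = []
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T   # n draws from N(0, cov)
        S = np.cov(X, rowvar=False)                # sample covariance
        _, U = np.linalg.eigh(S)
        out.append(abs(U[:, -1] @ v))              # alignment with the true direction
    return np.mean(out)

# With n = 200, the alignment is expected to be close to one for p = 20 (p/n = 0.1)
# and noticeably lower for p = 400 (p/n = 2), illustrating that sample eigenvectors
# can become unreliable when p/n is not small.
print(eigvec_alignment(20, 200), eigvec_alignment(400, 200))
```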

8 Concluding Remarks

Panel data offer many advantages, but they also raise many methodological issues; see, e.g., [29, 30]. This paper attempts to provide a selective summary of what has been achieved and of the challenging issues confronting panel financial analysis. In choosing an appropriate statistical method to analyze the panel financial data at hand, it is helpful to keep several factors in mind. First, what advantages do panel data offer us in adapting economic theory for empirical investigation over data sets consisting of a single cross-section or time series? Second, what are the limitations of panel data and of the econometric methods that have been proposed for analyzing such data? Third, the usefulness of panel data in providing particular answers to certain issues depends critically on the compatibility between the assumptions underlying the statistical inference procedures and the data generating process. Fourth, when using panel data, how can we increase the efficiency of parameter estimates? “Analyzing economic data (or financial data) requires skills of synthesis, interpretation and empirical imagination. Command of statistical methods is only a part, and sometimes a very small part, of what is required to do first-class empirical research” [25]. Panel data are no panacea. Nevertheless, if “panel data are only a little window that opens upon a great world, they are nevertheless the best window in econometrics” [45].