1 Introduction

In the early twentieth century, Sir R. A. Fisher and others set in motion what is known today as the classical parametric approach to statistical estimation of a finite number of population parameters using sample data. Thus began the practice of statistical inference within the framework of estimation and hypothesis testing of univariate and multivariate probability distributions. The extensive study of conditional probability distributions followed, and hence estimation and testing in the conditional mean (regression) and conditional variance (volatility) models became the norm in econometrics and statistics. The estimation of parameters of regression and other models gave rise to the development of statistical properties of econometric estimators of such models, such as their bias, mean squared error (MSE), and distributions. Within this and in related contexts, Barry Arnold has made many fundamental and innovative contributions in different areas of statistics and econometrics, including estimation and testing, distribution theory and characterization of distributions, income distribution theory and Lorenz curves, among others. See, for example, Arnold (1983, 1987, 2012, 2015), Arnold et al. (1987), Arnold and Sarabia (2018), Coelho and Arnold (2014), Marques et al. (2011), and Villaseñor and Arnold (1984, 1989). All of these have made a significant impact on the profession and have been instrumental in advancing statistics and econometrics.

While large sample limiting distribution theory was well developed, deriving the needed analytical finite sample distributional results remained challenging. In general, large sample properties do not necessarily carry over to small samples, and if asymptotic results are used in small or moderately large samples, they may give misleading policy implications. This problem arose because most econometric estimators are nonlinear functions of multivariate random variables, and it is not easy to develop their exact distributional properties. Nagar (1959) developed the finite sample approximate bias and MSE of the two-stage least squares (2SLS) estimator of the parameters in a structural model. This was followed by extensive work of many other econometricians and statisticians on the exact bias and MSE, and some on the exact distribution, of the 2SLS estimator. This literature is summarized in Ullah (2004); see also Anderson and Sawa (1973), Phillips (1980, 1986), and Bao et al. (2017). However, the exact distribution of many other econometric and statistical estimators is not yet developed.

In view of this, in this paper we develop a unified procedure to analyze the exact distribution by observing that many econometric and statistical estimators can be written as ratios of quadratic forms. Their distributions can then be straightforwardly developed by using Imhof’s (1961) result on the distribution of an indefinite quadratic form. We show applications of this procedure by developing the distributions of several statistics used in applied work. These include the squared coefficient of variation for measuring income inequality, the squared Sharpe ratio commonly used in financial management, the Durbin–Watson test statistic for serial correlation routinely used in practice, Moran’s test statistic for spatial correlation, and the goodness of fit statistic in regression models. The exact results developed here will be helpful for practitioners to conduct appropriate inference for any given size of the sample data.

This paper is organized as follows. In Sect. 2, we present the exact distributional results. Then in Sect. 3, we provide a numerical analysis of the exact distribution of a goodness of fit measure. Finally, the conclusion is given in Sect. 4. Throughout, I = I n is the n × n identity matrix, ι = ι n is an n × 1 vector of ones, and M 0 = I − n −1 ιι′.

2 The Exact Distribution

Let us consider the ratio of quadratic forms as

$$\displaystyle \begin{aligned} q=\frac{\boldsymbol{y}^{\prime}\boldsymbol{N}_{1}\boldsymbol{y}}{\boldsymbol{y}^{\prime}\boldsymbol{N}_{2}\boldsymbol{y}},{} \end{aligned} $$
(1)

where y is an n × 1 normal random vector with E(y) = μ and Var(y) =  Σ, where Σ is positive definite, N 1 and N 2 are n × n nonstochastic symmetric matrices, and N 2 is positive semi-definite.Footnote 1 The cumulative distribution function (CDF) of this ratio is

$$\displaystyle \begin{aligned} F(q_{0}) & =\Pr(q\leq q_{0})=\Pr(\boldsymbol{y}^{\prime}\boldsymbol{N}\boldsymbol{y}\leq0), \end{aligned} $$

where N = N 1 − q 0 N 2. Note that y′Ny = y′Σ −1∕2 QQ′Σ 1∕2 N Σ 1∕2 QQ′Σ −1∕2 y ≡z′Λ z, where z = Q′Σ −1∕2 y ∼N(μ z, I), μ z = Q′Σ −1∕2 μ, Λ is a diagonal matrix of eigenvalues of Σ 1∕2 N Σ 1∕2, and Q is an orthogonal matrix of eigenvectors of Σ 1∕2 N Σ 1∕2 such that Q′Σ 1∕2 N Σ 1∕2 Q =  Λ. So the distribution of the ratio of quadratic forms translates to that of a linear combination of independent non-central chi-squared random variables. Without loss of generality, let λ j, j = 1, …, r ≤ n, denote the non-zero distinct elements of Λ, n j be the corresponding multiplicities, and \(\delta _{j}=\sum _{i\rightarrow j}\mu _{z_{i}}^{2}\), where ∑i→j denotes summing over i such that the ith element of Λ equals λ j. Then, \(\boldsymbol {z}^{\prime }\boldsymbol {\Lambda }\boldsymbol {z}=\sum _{j=1}^{r}\lambda _{j}\zeta _{j}^{2}\), where \(\zeta _{j}^{2}\sim \chi _{n_{j}}^{2}(\delta _{j})\), and they are independent of each other. For a linear combination (with weights λ j) of independent non-central chi-squared variables \(\zeta _{j}^{2}\) (with non-centrality parameters δ j and degrees of freedom n j), Imhof (1961) showed that

$$\displaystyle \begin{aligned} \Pr\left(\sum_{j=1}^{r}\lambda_{j}\zeta_{j}^{2}\leq q_{0}^{*}\right) & =\frac{1}{2}-\frac{1}{\pi}\int_{0}^{\infty}\frac{\sin\theta(v)}{v\rho(v)}\mathrm{d}v,{} \end{aligned} $$
(2)

where

$$\displaystyle \begin{aligned} \theta(v) & =-\frac{q_{0}^{*}v}{2}+\sum_{j=1}^{r}\left[\frac{n_{j}}{2}\tan^{-1}(\lambda_{j}v)+\frac{\lambda_{j}\delta_{j}v}{2(1+\lambda_{j}^{2}v^{2})}\right],\\ \rho(v)&=\prod_{j=1}^{r}(1+\lambda_{j}^{2}v^{2})^{n_{j}/4}\exp\left[\frac{\lambda_{j}^{2}v^{2}\delta_{j}}{2(1+\lambda_{j}^{2}v^{2})}\right]. \end{aligned} $$

Setting \(q_{0}^{*}=0\), we have \(F(q_{0})=\Pr (\boldsymbol {y}^{\prime }\boldsymbol {N}\boldsymbol {y}\leq 0)=\Pr (\boldsymbol {y}^{\prime }\boldsymbol {N}_{1}\boldsymbol {y}/\boldsymbol {y}^{\prime }\boldsymbol {N}_{2}\boldsymbol {y}\leq q_{0}).\)
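
As a computational illustration, the following is a minimal sketch of this procedure in Python (using NumPy and SciPy); the function names imhof_cdf and ratio_cdf are ours, and the eigenvalue tolerance and quadrature settings are arbitrary implementation choices rather than part of the derivation above.

```python
import numpy as np
from scipy.linalg import sqrtm, eigh
from scipy.integrate import quad


def imhof_cdf(x, lam, df, delta):
    """Imhof (1961): Pr( sum_j lam_j * chi2_{df_j}(delta_j) <= x ), as in (2)."""
    lam, df, delta = map(np.asarray, (lam, df, delta))

    def integrand(v):
        lv = lam * v
        theta = -0.5 * x * v + np.sum(0.5 * df * np.arctan(lv)
                                      + 0.5 * delta * lv / (1.0 + lv ** 2))
        rho = np.prod((1.0 + lv ** 2) ** (df / 4.0)) * np.exp(
            np.sum(0.5 * delta * lv ** 2 / (1.0 + lv ** 2)))
        return np.sin(theta) / (v * rho)

    integral, _ = quad(integrand, 0.0, np.inf, limit=200)
    return 0.5 - integral / np.pi


def ratio_cdf(q0, N1, N2, mu, Sigma):
    """Pr( y'N1 y / y'N2 y <= q0 ) for y ~ N(mu, Sigma), via Pr( y'(N1 - q0 N2) y <= 0 )."""
    S_half = np.real(sqrtm(Sigma))                    # symmetric square root of Sigma
    lam, Q = eigh(S_half @ (N1 - q0 * N2) @ S_half)   # eigenvalues and orthogonal Q
    mu_z = Q.T @ np.linalg.solve(S_half, mu)          # mean of z = Q' Sigma^{-1/2} y
    keep = np.abs(lam) > 1e-10                        # zero eigenvalues contribute nothing
    return imhof_cdf(0.0, lam[keep], np.ones(keep.sum()), mu_z[keep] ** 2)
```

In this sketch each non-zero eigenvalue is treated as a separate term with one degree of freedom; grouping equal eigenvalues as in the notation above gives the same value of (2).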

2.1 Goodness of Fit Statistic R 2

For the linear regression model y = Xβ + u, where y = (y 1, …, y n)′ is an n × 1 vector of observations on the dependent variable, X = (x 1, …, x n)′ is an n × k nonstochastic matrix of covariates (including a constant term) with coefficient vector β, and u = (u 1, …, u n)′ collects normally distributed error terms, a goodness of fit statistic is

$$\displaystyle \begin{aligned} R^{2}=\frac{\sum_{i=1}^{n}(y_{i}-\bar{y})(\hat{y}_{i}-\bar{y})}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}=\frac{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{P}\boldsymbol{y}}{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{y}}, \end{aligned} $$
(3)

where \(\bar {y}=n^{-1}\sum _{i=1}^{n}y_{i}\), \(\hat {y_{i}}=\boldsymbol {x}_{i}^{\prime }\hat {\boldsymbol {\beta }}\), and P = X(X′X)−1 X′. We can thus evaluate the distribution of R 2 with N 1 = M 0 P and N 2 = M 0 by applying (2).

Denote M = I − P and P 0 = n −1 ιι′. Then, for a given evaluation point a (i.e., N = N 1 − a N 2 with N 1 = M 0 P and N 2 = M 0), we can put N = M 0 P − a M 0 = M 0((1 − a)P − a M) = P + (a − 1)P 0 − a I. Note that P is idempotent with eigenvalues 1 (of multiplicity k) and 0 (of multiplicity n − k), and P 0 is also idempotent with eigenvalues 1 (of multiplicity 1) and 0 (of multiplicity n − 1). Since P 0 v = (P 0 P)v = P 0(Pv) for any conformable vector v, if v is an eigenvector of P associated with eigenvalue 0, then it must also be an eigenvector of P 0 corresponding to its eigenvalue 0. There are n − k linearly independent such vectors; denote them by v i, i = 1, …, n − k. Further, Nv i = [P + (a − 1)P 0 − a I]v i = [0 + (a − 1) ⋅ 0 − a ⋅ 1]v i = −a ⋅v i, implying that N has eigenvalue − a with corresponding eigenvectors v i. Similarly, if w is an eigenvector of P 0 associated with eigenvalue 1, then it is also an eigenvector of P corresponding to its eigenvalue 1, and Nw = [P + (a − 1)P 0 − a I]w = [1 + (a − 1) ⋅ 1 − a ⋅ 1]w = 0 ⋅w, implying that N has eigenvalue 0 with a corresponding eigenvector w. Further, v i and w are linearly independent. Since N, P, and P 0 are all symmetric matrices, their eigenvectors span \(\mathbb {R}^{n}\) (see page 179, Exercise 7.48 of Abadir and Magnus 2005). Thus, there must exist k − 1 linearly independent vectors \(\boldsymbol {z}_{j}\in \mathbb {R}^{n}\), j = 1, …, k − 1 (also linearly independent of v i and w), that are eigenvectors of N, P, and P 0. The eigenvectors z j correspond to eigenvalue 1 of P since z j and v i are linearly independent, and to eigenvalue 0 of P 0 since z j and w are linearly independent. As such, Nz j = [P + (a − 1)P 0 − a I]z j = [1 + (a − 1) ⋅ 0 − a ⋅ 1]z j = (1 − a) ⋅z j, implying that N has eigenvalue 1 − a with corresponding eigenvectors z j.
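
As a quick numerical sanity check of this eigenvalue structure, the following sketch verifies the multiplicities directly; the design matrix, the values of n, k, and a, and the seed are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, a = 20, 4, 0.3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])  # includes a constant
P = X @ np.linalg.solve(X.T @ X, X.T)
P0 = np.ones((n, n)) / n
M0 = np.eye(n) - P0
N = M0 @ P - a * M0                        # = P + (a - 1) P0 - a I, symmetric

eigvals = np.sort(np.linalg.eigvalsh(N))
# expected: -a with multiplicity n - k, 0 once, and 1 - a with multiplicity k - 1
print(np.round(eigvals, 8))
```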

Given that N = M 0((1 − a)P − a M) has two non-zero eigenvalues, 1 − a and − a, with the corresponding multiplicities k − 1 and n − k, respectively, it is convenient to rewrite

$$\displaystyle \begin{aligned} R^{2}=\frac{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{P}\boldsymbol{y}}{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{P}\boldsymbol{y}+\boldsymbol{y}^{\prime}\boldsymbol{M}\boldsymbol{y}}. \end{aligned} $$
(4)

If the error terms are independent and identically distributed (i.i.d.) with variance \(\sigma _{u}^{2}\), then \(\boldsymbol {y}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {y}/\sigma _{u}^{2}\sim \chi _{k-1}^{2}(\boldsymbol {\beta }^{\prime }\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }/\sigma _{u}^{2})\), \(\boldsymbol {y}^{\prime }\boldsymbol {M}\boldsymbol {y}/\sigma _{u}^{2}\sim \chi _{n-k}^{2}(0)\), and they are independent of each other. As such, R 2 follows a singly non-central beta distribution (see Koerts and Abrahamse 1969), and its distribution takes on the following form:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Pr(R^{2}\leq r_{0})& =&\displaystyle \sum_{j=0}^{\infty}\frac{1}{j!}\left(\frac{\boldsymbol{\beta}^{\prime}\boldsymbol{X}^{\prime}\boldsymbol{M}_{0}\boldsymbol{X}\boldsymbol{\beta}}{2\sigma_{u}^{2}}\right)^{j}\exp\left(-\frac{\boldsymbol{\beta}^{\prime}\boldsymbol{X}^{\prime}\boldsymbol{M}_{0}\boldsymbol{X}\boldsymbol{\beta}}{2\sigma_{u}^{2}}\right) \\ & &\displaystyle I\left(\left.r_{0}\right|\frac{k-1}{2}+j,\frac{n-k}{2}\right), \end{array} \end{aligned} $$
(5)

where \(I(x|a,b)=\frac{1}{B(a,b)}\int _{0}^{x}z^{a-1}(1-z)^{b-1}\mathrm {d}z\) is the regularized incomplete beta function (incomplete beta function ratio) with parameters a and b, and B(a, b) is the complete beta function. Alternatively, the distribution function can be calculated by (2) with λ 1 = 1 − a, λ 2 = −a (where a = r 0), n 1 = k − 1, n 2 = n − k, \(\delta _{1}=\boldsymbol {\beta }^{\prime }\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }/\sigma _{u}^{2}\), and δ 2 = 0.Footnote 2
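
A minimal sketch of (5) in Python is given below; it uses the regularized incomplete beta function from SciPy, truncates the Poisson series at jmax terms, and the helper name r2_cdf_exact is ours. Its output can be cross-checked against ratio_cdf from the sketch in Sect. 2 with N 1 = M 0 P, N 2 = M 0, μ = Xβ, and Σ = σ_u² I.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import betainc      # regularized incomplete beta I(x | a, b)


def r2_cdf_exact(r0, X, beta, sigma_u, jmax=200):
    """Pr(R^2 <= r0) under i.i.d. normal errors, via the non-central beta series (5)."""
    n, k = X.shape
    M0X = X - X.mean(axis=0)                          # M_0 X (column-demeaned X)
    ncp = beta @ M0X.T @ M0X @ beta / sigma_u ** 2    # beta' X' M_0 X beta / sigma_u^2
    j = np.arange(jmax)
    weights = poisson.pmf(j, ncp / 2.0)               # Poisson(ncp / 2) weights
    return float(np.sum(weights * betainc((k - 1) / 2.0 + j, (n - k) / 2.0, r0)))
```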

2.2 Squared Sharpe Ratio

In financial portfolio management, a routine task is to assess a portfolio’s performance. The most widely used metric may be the Sharpe ratio, introduced by Sharpe (1966). Recently, Barillas and Shanken (2017) discussed how to compare asset pricing models under the classic Sharpe metric and showed that the quadratic form in the investment alphas is equivalent to the improvement in the squared Sharpe ratio when investment in other assets is permitted in addition to the given model’s factors.

The squared Sharpe ratio of an asset is defined as s = μ 2∕σ 2, where μ is the mean of the asset’s excess return and σ 2 is its variance. Given a sample y = (y 1, …, y n)′ of excess returns, the sample squared Sharpe ratio is

$$\displaystyle \begin{aligned} \hat{s}=\left(\frac{\hat{\mu}}{\hat{\sigma}}\right)^{2}=\frac{\boldsymbol{y}^{\prime}\boldsymbol{\iota}\boldsymbol{\iota}^{\prime}\boldsymbol{y}/n^{2}}{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{y}/(n-1)}=\frac{\boldsymbol{y}^{\prime}\left(\frac{\boldsymbol{\iota}\boldsymbol{\iota}^{\prime}}{n^{2}}\right)\boldsymbol{y}}{\boldsymbol{y}^{\prime}\left(\frac{\boldsymbol{M}_{0}}{n-1}\right)\boldsymbol{y}}, \end{aligned} $$
(6)

and (2) can be used to evaluate its exact finite sample distribution with N 1 = ιι′∕n 2 and N 2 = M 0∕(n − 1).

When the excess return series is i.i.d. normal, the sample Sharpe ratio \(\hat {\xi }=\hat {\mu }/\hat {\sigma }\), when scaled by \(\sqrt {n}\), follows a non-central t-distribution with degrees of freedom n − 1 and non-centrality parameter \(\sqrt {n}\xi \).Footnote 3 As such, the sample squared Sharpe ratio (scaled by n) follows a singly non-central F-distribution, F 1,n−1(ns, 0).Footnote 4 So we have

$$\displaystyle \begin{aligned} \Pr(\hat{s}\leq s_{0})=\sum_{j=0}^{\infty}\left(\frac{(\frac{ns}{2})^{j}}{j!}\exp\left(-\frac{ns}{2}\right)\right)I\left(\left.\frac{ns_{0}}{n-1+ns_{0}}\right|\frac{1}{2}+j,\frac{n-1}{2}\right). \end{aligned} $$
(7)
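
For example, under these assumptions (7) can be evaluated directly with SciPy’s non-central F distribution; the function name sharpe2_cdf in this sketch is ours.

```python
from scipy.stats import ncf


def sharpe2_cdf(s0, n, s):
    """Pr(hat{s} <= s0) for n i.i.d. normal excess returns with population
    squared Sharpe ratio s, using n * hat{s} ~ F_{1, n-1}(n * s)."""
    return ncf.cdf(n * s0, dfn=1, dfd=n - 1, nc=n * s)
```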

2.3 Squared Coefficient of Variation

The coefficient of variation (CV) has long been used in the literature as one of the income inequality indexes across regions or over time. It is defined as the ratio of the standard deviation of the variable of interest (e.g., household income) to its mean value, namely, σ∕μ. A closely related measure is the squared CV, usually called the coefficient of variation squared (CV2), denoted by α = σ 2∕μ 2. When the mean value of the variable of interest is positive, CV and CV2 are monotonic transformations of each other. As neither the population mean nor the standard deviation is known, in practice we use their sample analogues to calculate CV and CV2.

Specifically, the sample CV2 is defined as

$$\displaystyle \begin{aligned} \hat{\alpha}=\frac{\hat{\sigma}^{2}}{\hat{\mu}^{2}}=\frac{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{y}/(n-1)}{\boldsymbol{y}^{\prime}\boldsymbol{\iota}\boldsymbol{\iota}^{\prime}\boldsymbol{y}/n^{2}}=\frac{\boldsymbol{y}^{\prime}\left(\frac{\boldsymbol{M}_{0}}{n-1}\right)\boldsymbol{y}}{\boldsymbol{y}^{\prime}\left(\frac{\boldsymbol{\iota}\boldsymbol{\iota}^{\prime}}{n^{2}}\right)\boldsymbol{y}}. \end{aligned} $$
(8)

Obviously, we can set N 1 = M 0∕(n − 1) and N 2 = ιι′∕n 2 in (2) to evaluate the exact distribution \(\Pr (\hat {\alpha }\leq \alpha_{0})\).

If we further assume that the data are i.i.d., then from the discussion in the previous subsection, the distribution of \(\hat {\alpha }\) (scaled by n −1) is F n−1,1(0, ns), where s = 1∕α. This is a special case of the doubly non-central F-distribution, and since \(\hat{\alpha}/n\) is the reciprocal of an F 1,n−1(ns, 0) random variable, we have

$$\displaystyle \begin{aligned} \Pr(\hat{\alpha} & \leq\alpha_{0})=1-\Pr(\hat{\alpha}\geq\alpha_{0}) \\ & =1-\Pr\left(\frac{1}{\hat{\alpha}}\leq\frac{1}{\alpha_{0}}\right) \\ & =1-\sum_{j=0}^{\infty}\left(\frac{(\frac{n}{2\alpha})^{j}}{j!}\exp\left(-\frac{n}{2\alpha}\right)\right)I\left(\left.\frac{\frac{n}{\alpha_{0}}}{n-1+\frac{n}{\alpha_{0}}}\right|\frac{1}{2}+j,\frac{n-1}{2}\right). \end{aligned} $$
(9)
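
Equivalently, (9) can be computed through the reciprocal relation with the non-central F distribution; a sketch follows, where the helper name cv2_cdf is ours.

```python
from scipy.stats import ncf


def cv2_cdf(a0, n, alpha):
    """Pr(hat{alpha} <= a0) for the sample squared CV of n i.i.d. normal observations,
    using hat{alpha} = 1 / hat{s} and n * hat{s} ~ F_{1, n-1}(n / alpha)."""
    return 1.0 - ncf.cdf(n / a0, dfn=1, dfd=n - 1, nc=n / alpha)
```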

2.4 The Durbin–Watson Statistic and Moran’s I

For the classical linear regression model, the Durbin–Watson statistic for testing H 0 : ρ = 0 against H 1 : ρ ≠ 0 in the first-order autoregressive error process u i = ρu i−1 + e i, where e i is an i.i.d. innovation term, is calculated as

$$\displaystyle \begin{aligned} d=\frac{\sum_{i=2}^{n}(\hat{u}_{i}-\hat{u}_{i-1})^{2}}{\sum_{i=1}^{n}\hat{u}_{i}^{2}}=\frac{\hat{\boldsymbol{u}}^{\prime}\boldsymbol{A}\hat{\boldsymbol{u}}}{\hat{\boldsymbol{u}}^{\prime}\hat{\boldsymbol{u}}}=\frac{\boldsymbol{y}^{\prime}\boldsymbol{M}\boldsymbol{A}\boldsymbol{M}\boldsymbol{y}}{\boldsymbol{y}^{\prime}\boldsymbol{M}\boldsymbol{y}}, \end{aligned} $$
(10)

where \(\hat {\boldsymbol {u}}=(\hat {u}_{1},\ldots ,\hat {u}_{n})^{\prime }\), with \(\hat {u_{i}}=y_{i}-\hat {y}_{i}\), is the residual vector, and A is a tri-diagonal matrix with − 1 on the super- and sub-diagonal positions, a 11 = a nn = 1, and a ii = 2, i = 2, …, n − 1. So setting N 1 = MAM and N 2 = M in (2), we can evaluate the exact distribution of the Durbin–Watson statistic. Srivastava (1987) derived the asymptotic distribution of the Durbin–Watson statistic under the null hypothesis \(\boldsymbol{u}\sim \text{N}(\boldsymbol{0},\sigma _{u}^{2}\boldsymbol {I})\) as \([(n-k)d-{\mathrm {tr}}(\boldsymbol {A}\boldsymbol {M})]/\sqrt {2\mathrm {tr}(\boldsymbol {A}\boldsymbol {M})^{2}}\rightarrow \mathrm {N}(0,1)\).
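
A sketch of the matrices needed to feed the Durbin–Watson statistic into (2) is given below; the helper name dw_matrices is ours. Under H 0 the answer does not depend on β, since M annihilates Xβ, so any mean vector in the column space of X (including zero) gives the same result when passed to the ratio_cdf sketch of Sect. 2.

```python
import numpy as np


def dw_matrices(X):
    """N1 = M A M and N2 = M for the Durbin-Watson ratio in (10)."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)      # residual maker
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # tri-diagonal difference matrix
    A[0, 0] = A[-1, -1] = 1.0                               # a_11 = a_nn = 1
    return M @ A @ M, M

# e.g., Pr(d <= d0) under H0 with u ~ N(0, sigma_u^2 I):
#   N1, N2 = dw_matrices(X)
#   ratio_cdf(d0, N1, N2, np.zeros(X.shape[0]), sigma_u**2 * np.eye(X.shape[0]))
```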

For spatial data, Moran’s I statistic is used to test for possible correlation across space. It is calculated as

$$\displaystyle \begin{aligned} I=\frac{n}{\boldsymbol{1}^{\prime}\boldsymbol{W}\boldsymbol{1}}\frac{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{W}\boldsymbol{M}_{0}\boldsymbol{y}}{\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{y}},{} \end{aligned} $$
(11)

where W is the so-called spatial weights matrix with zeros on the diagonal.Footnote 5 Again, its exact distribution can be straightforwardly evaluated by (2).
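
Analogously, a small sketch of the inputs for Moran’s I follows; the helper name moran_matrices is ours, and the weights matrix is symmetrized, which leaves the value of the quadratic form unchanged.

```python
import numpy as np


def moran_matrices(W):
    """N1 and N2 so that Moran's I in (11) equals y'N1 y / y'N2 y."""
    n = W.shape[0]
    M0 = np.eye(n) - np.ones((n, n)) / n
    Ws = 0.5 * (W + W.T)                     # y'M0 W M0 y = y'M0 Ws M0 y
    N1 = (n / W.sum()) * (M0 @ Ws @ M0)      # W.sum() = 1'W1
    return N1, M0
```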

3 Illustration

In this section, we illustrate the performance of the exact result via (2) in comparison with the asymptotic distributional results. We focus on the statistic R 2. As discussed in Xu (2014), reliable inference on R 2 has attracted considerable attention in the statistical and public health communities, yet the literature on statistical inference of R 2 has been scarce. Xu (2014) developed the asymptotic distribution of R 2 in linear regression models with possibly nonnormal errors and discussed the F-distribution approximation with degrees of freedom adjustment. In Xu’s (2014) setup, the data is demeaned such that \(\bar {y}=0\). Here, we relax this restriction. We begin with the general case when the error distribution may be nonnormal. In what follows, let γ 1 and γ 2 denote the skewness and excess kurtosis coefficients of the error distribution. Obviously, when the error is normal, γ 1 = γ 2 = 0.

Recall that we have written R 2 = y′M 0 Py∕(y′M 0 Py + y′My). Below we present the asymptotic distributions of R 2 and two monotonic transformations of it.

Theorem 1

For the linear regression model, y = Xβ + u, where X is nonstochastic and u consists of i.i.d. errors, R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\) have the following asymptotic distributions:

$$\displaystyle \begin{aligned} \sqrt{n}\left(R^{2}-\frac{\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}{\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}+\sigma_{u}^{2}}\right)\overset{d}{\rightarrow}\mathrm{N}\left(0,\frac{\sigma_{u}^{4}\left[4\sigma_{u}^{2}\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}+(2+\gamma_{2})(\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta})^{2}\right]}{(\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}+\sigma_{u}^{2})^{4}}\right), \end{aligned} $$
(12)
$$\displaystyle \begin{aligned} \sqrt{n}\left(\frac{R^{2}}{1-R^{2}}-\frac{\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}{\sigma_{u}^{2}}\right)\overset{d}{\rightarrow}\mathrm{N}\left(0,\frac{4\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}{\sigma_{u}^{2}}+(2+\gamma_{2})\frac{(\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta})^{2}}{\sigma_{u}^{4}}\right), \end{aligned} $$
(13)
$$\displaystyle \begin{aligned} \sqrt{n}\left[\log\left(\frac{R^{2}}{1-R^{2}}\right)-\log\left(\frac{\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}{\sigma_{u}^{2}}\right)\right]\overset{d}{\rightarrow}\mathrm{N}\left(0,\frac{4\sigma_{u}^{2}}{\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}+2+\gamma_{2}\right), \end{aligned} $$
(14)

where \(\boldsymbol{\Sigma}=\lim_{n\rightarrow\infty}n^{-1}\boldsymbol{X}^{\prime}\boldsymbol{M}_{0}\boldsymbol{X}\).

Proof

By substitution, y′M 0 Py = (Xβ + u)′M 0 P(Xβ + u) = β′X′M 0 Xβ +u′M 0 Pu+2u′M 0 Xβ. Using results on the moments of quadratic forms in nonnormal random vectors (see, for example, Bao and Ullah 2010), we have \(\mathrm {E}(\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {u})=\sigma _{u}^{2}\mathrm {tr}(\boldsymbol {M}_{0}\boldsymbol {P})=\sigma _{u}^{2}(k-n^{-1}\boldsymbol {1}^{\prime }\boldsymbol {P}\boldsymbol {1})\) and \(\mathrm {Var}(\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {u})={\sigma _{u}^{4}}[\gamma _{2}\mathrm {tr}(\boldsymbol {M}_{0}\boldsymbol {P}\odot \boldsymbol {M}_{0}\boldsymbol {P})+2\mathrm {tr}(\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {M}_{0}\boldsymbol {P})]\), where ⊙ denotes the Hadamard product operator. Since the idempotent matrix P has elements of order O(n −1) and M 0 is uniformly bounded in row and column sums, we can write \(\mathrm {Var}(\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {u})={2\sigma _{u}^{4}}\mathrm {tr}(\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {M}_{0}\boldsymbol {P})+o(1)={2\sigma _{u}^{4}[}\mathrm {tr}(\boldsymbol {P})-2n^{-1}\boldsymbol {1}^{\prime }\boldsymbol {P}\boldsymbol {P}\boldsymbol {1}+n^{-2}(\boldsymbol {1}^{\prime }\boldsymbol {P}\boldsymbol {1})^{2}]+o(1)=O(1)\). Thus we can claim n −1∕2 u′M 0 Pu = o P(1). Using the central limit theorem on linear and quadratic forms in random vectors (see Kelejian and Prucha 2001), we have \(n^{-1/2}\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }\overset {d}{\rightarrow }\mathrm {N}(0,\sigma _{u}^{2}\boldsymbol {\beta }^{\prime }\boldsymbol {\Sigma }\boldsymbol {\beta })\), where \(\boldsymbol {\Sigma }=\lim _{n\rightarrow \infty }n^{-1}\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\). So \(n^{-1/2}(\boldsymbol {y}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {y}-\boldsymbol {\beta }^{\prime }\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta })=2n^{-1/2}\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }+o_{P}(1)\overset {d}{\rightarrow }\mathrm {N}(0,4\sigma _{u}^{2}\boldsymbol {\beta }^{\prime }\boldsymbol {\Sigma }\boldsymbol {\beta })\). Similarly, \(n^{-1/2}(\boldsymbol {u}^{\prime }\boldsymbol {M}\boldsymbol {u}-n\sigma _{u}^{2})=n^{-1/2}(\boldsymbol {u}^{\prime }\boldsymbol {u}-n\sigma _{u}^{2})+o_{P}(1)\overset {d}{\rightarrow }\mathrm {N}(0,\sigma _{u}^{4}(2+\gamma _{2}))\). Further, \(\mathrm {Cov}(\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta },\boldsymbol {u}^{\prime }\boldsymbol {u})=\mathrm {E}(\boldsymbol {\beta }^{\prime }\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {u}\boldsymbol {u}^{\prime }\boldsymbol {u})=\gamma _{1}\sigma _{u}^{3}\boldsymbol {\beta }^{\prime }\boldsymbol {X}^{\prime }\boldsymbol {M}_{0}\boldsymbol {\iota }=0\). Following Kelejian and Prucha (2001) again, we can show that any linear combination of \(n^{-1/2}\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }\) and \(n^{-1/2}(\boldsymbol {u}^{\prime }\boldsymbol {u}-n\sigma _{u}^{2})\) (say, \(l_{1}n^{-1/2}\boldsymbol {u}^{\prime }\boldsymbol {M}_{0}\boldsymbol {X}\boldsymbol {\beta }+l_{2}n^{-1/2}(\boldsymbol {u}^{\prime }\boldsymbol {u}-n\sigma _{u}^{2})\), where l 1 and l 2 are non-zero constants) is asymptotically normal (\(\mathrm {N}(0,l_{1}^{2}\sigma _{u}^{2}\boldsymbol {\beta }^{\prime }\boldsymbol {\Sigma }\boldsymbol {\beta }+l_{2}^{2}\sigma _{u}^{4}(2+\gamma _{2}))\)). Therefore,

$$\displaystyle \begin{aligned} \sqrt{n}\left(\begin{array}{c} n^{-1}\boldsymbol{y}^{\prime}\boldsymbol{M}_{0}\boldsymbol{P}\boldsymbol{y}{-}n^{-1}\boldsymbol{\beta}^{\prime}\boldsymbol{X}^{\prime}\boldsymbol{M}_{0}\boldsymbol{X}\boldsymbol{\beta}\\ n^{-1}\boldsymbol{u}^{\prime}\boldsymbol{M}\boldsymbol{u}{-}\sigma_{u}^{2} \end{array}\right)\overset{d}{\rightarrow}\mathrm{N}\left(\left(\begin{array}{c} 0\\ 0 \end{array}\right),\left(\begin{array}{cc} 4\sigma_{u}^{2}\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta} & 0\\ 0 & \sigma_{u}^{4}(2{+}\gamma_{2}) \end{array}\right)\right). \end{aligned}$$

The asymptotic distributions of R 2 = y′M 0 Py∕(y′M 0 Py + u′Mu), R 2∕(1 − R 2) = y′M 0 Py∕u′Mu, and \(\log (R^{2}/(1-R^{2}))=\log (\boldsymbol {y}^{\prime }\boldsymbol {M}_{0}\boldsymbol {P}\boldsymbol {y}/\boldsymbol {u}^{\prime }\boldsymbol {M}\boldsymbol {u})\) then follow immediately from the delta method. □
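
For concreteness, here is the delta-method step behind the variance in (13), stated under the assumptions of the theorem; this worked detail is ours, and the other two statistics follow analogously. With g(a, b) = a∕b evaluated at (a, b) = (β′Σβ, σ_u²), the gradient is (1∕b, −a∕b 2), so the asymptotic variance is

$$\displaystyle \begin{aligned} \frac{1}{\sigma_{u}^{4}}\cdot4\sigma_{u}^{2}\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}+\frac{(\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta})^{2}}{\sigma_{u}^{8}}\cdot\sigma_{u}^{4}(2+\gamma_{2})=\frac{4\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta}}{\sigma_{u}^{2}}+(2+\gamma_{2})\frac{(\boldsymbol{\beta}^{\prime}\boldsymbol{\Sigma}\boldsymbol{\beta})^{2}}{\sigma_{u}^{4}}. \end{aligned} $$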

Note that R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\) are monotonic transformations of each other. Thus,

$$\displaystyle \begin{aligned} \Pr(R^{2}\leq r_{0})=\Pr\left(\frac{R^{2}}{1-R^{2}}\leq\frac{r_{0}}{1-r_{0}}\right)=\Pr\left[\log\left(\frac{R^{2}}{1-R^{2}}\right)\leq\log\left(\frac{r_{0}}{1-r_{0}}\right)\right].{} \end{aligned} $$
(15)

When the error is normally distributed, by using (2) and setting N 1 = M 0 P and N 2 = M 0, we can calculate the exact distribution of R 2 and, equivalently, that of R 2∕(1 − R 2) or \(\log (R^{2}/(1-R^{2}))\), in light of the above relationship. The asymptotic distribution, however, crucially depends on which statistic we are using. From (13)–(14), we see that when the signal to noise ratio, measured by \(\boldsymbol {\beta }^{\prime }\boldsymbol {\Sigma }\boldsymbol {\beta }/\sigma _{u}^{2}\), increases, the asymptotic distribution of R 2∕(1 − R 2) becomes more dispersed, whereas the asymptotic distribution of \(\log (R^{2}/(1-R^{2}))\) becomes more concentrated. Its effect on the asymptotic distribution of R 2 is ambiguous, and it depends on the strength of the signal to noise ratio.Footnote 6 An interesting case is the extreme case when β = 0. In this case, while R 2 and R 2∕(1 − R 2) have well-defined asymptotic distributions, \(\log (R^{2}/(1-R^{2}))\) does not have a properly defined asymptotic distribution.Footnote 7 The exact distribution is free of this kind of pitfall and can be calculated regardless of the strength of the signal to noise ratio.

Figures 1, 2, 3, and 4 plot the cumulative distribution functions of the three statistics by comparing the true (simulated), exact, and asymptotic distributions for sample sizes 10, 20, 50, and 100, based on 100,000 simulations.Footnote 8 The data generating process is y = β 0 + β 1 x 1 + β 2 x 2 + u, where x 1 and x 2 are generated as two independent i.i.d. N(0, 4) random variables, and the error term is simulated as i.i.d. \(\mathrm {N}(0,\sigma _{u}^{2})\). We fix β = (β 0, β 1, β 2)′ = (1, 0.3, 0.5)′ and set σ u = 0.1, 1, 5, 10, corresponding to scenarios from high to low signal to noise ratios. We observe that regardless of the sample size and the signal to noise ratio, the exact distribution matches the true distribution precisely. The asymptotic distributions fare reasonably well in small samples with n = 10 when σ u = 0.1, corresponding to the situation of a high signal to noise ratio. When σ u = 10, the asymptotic distributions can deviate substantially from the exact distribution in small samples. Xu (2014) recommended using \(\log (R^{2}/(1-R^{2}))\) by arguing that its asymptotic distribution is more stable. We see here clearly that this is not necessarily the case, as it depends on the signal to noise ratio.
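
A minimal sketch of this exact-versus-simulated comparison for one configuration is given below; it assumes the helper r2_cdf_exact from the sketch in Sect. 2.1, and the seed, evaluation grid, and chosen σ u are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000
beta, sigma_u = np.array([1.0, 0.3, 0.5]), 10.0
X = np.column_stack([np.ones(n), rng.normal(0.0, 2.0, size=(n, 2))])  # x1, x2 i.i.d. N(0, 4)

M0 = np.eye(n) - np.ones((n, n)) / n
P = X @ np.linalg.solve(X.T @ X, X.T)

# simulated ("true") distribution of R^2 versus the exact CDF
y = X @ beta + sigma_u * rng.standard_normal((reps, n))
r2 = np.einsum('ri,ij,rj->r', y, M0 @ P, y) / np.einsum('ri,ij,rj->r', y, M0, y)
for r0 in (0.1, 0.3, 0.5, 0.7):
    print(r0, (r2 <= r0).mean(), r2_cdf_exact(r0, X, beta, sigma_u))
```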

Fig. 1 CDF plots of R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\), n = 10

Fig. 2 CDF plots of R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\), n = 20

Fig. 3 CDF plots of R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\), n = 50

Fig. 4 CDF plots of R 2, R 2∕(1 − R 2), and \(\log (R^{2}/(1-R^{2}))\), n = 100

4 Concluding Remarks

In this paper, we have presented a unified development of the exact distributions of many econometric statistics. These results can be straightforwardly implemented by numerical integration. In the context of the exact distribution of a goodness of fit measure, we numerically demonstrate that the asymptotic distribution may provide a poor approximation in small samples. The exact distributional results developed in the paper can be readily extended to other econometric and statistical estimators used in practice that can be written as ratios of quadratic forms.