An artificial regression is a linear regression that is associated with some other econometric model, which is usually, but not always, nonlinear. It can be used for a variety of purposes, in particular computing covariance matrices and calculating test statistics. The best-known artificial regression is the Gauss–Newton regression (GNR), which is discussed in the next section. All artificial regressions share the key properties of the GNR.

The Gauss–Newton Regression

A univariate nonlinear regression model may be written as

$$ {y}_t={x}_t\left(\beta \right)+{u}_t,\ \ \ {u}_t\sim \mathrm{IID}\left(0,\ \ \ {\sigma}^2\right),t=1,\dots, n, $$
(1)

where yt is the tth observation on the dependent variable, and β is a k-vector of parameters to be estimated. Here the scalar function xt(β) is a nonlinear regression function which may depend on exogenous and/or predetermined variables. The model (1) may also be written using vector notation as

$$ \boldsymbol{y}=\boldsymbol{x}\left(\boldsymbol{\beta} \right)+\boldsymbol{u},\ \ \ \boldsymbol{u}\sim \mathrm{IID}\left(0,\ \ \ {\sigma}^2\mathbf{I}\right), $$
(2)

where y is an n-vector with typical element yt, x(β) is an n-vector with typical element xt(β), and I is an n × n identity matrix.

The Gauss–Newton regression that corresponds to (2) is

$$ \boldsymbol{y}-\boldsymbol{x}\left(\boldsymbol{\beta} \right)=\boldsymbol{X}\left(\boldsymbol{\beta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(3)

where b is a k-vector of regression coefficients, and the matrix X(β) is n × k, with ti-th element the derivative of xt(β) with respect to βi, the ith component of β. The regressand here is a vector of residuals, and the regressors are the columns of a matrix of derivatives. When regression (3) is evaluated at the least-squares estimates \( \widehat{\boldsymbol{\beta}} \), it becomes

$$ \widehat{\boldsymbol{u}}\equiv \boldsymbol{y}-\widehat{\boldsymbol{x}}=\widehat{\boldsymbol{X}}\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(4)

where \( \widehat{\boldsymbol{x}}\equiv \boldsymbol{x}\left(\widehat{\boldsymbol{\beta}}\right) \) and \( \widehat{\boldsymbol{X}}\equiv \boldsymbol{X}\left(\widehat{\boldsymbol{\beta}}\right) \). Since the regressand of this artificial regression must be orthogonal to all the regressors, running the GNR (4) is an easy way to check that the NLS estimates actually satisfy the first-order conditions.
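
To make this concrete, the following sketch (not part of the original exposition) runs the GNR (4) for a simple nonlinear model in Python, using numpy and scipy; the regression function, the simulated data, and names such as xfun and Xmat are purely illustrative. The estimated coefficient vector from the GNR should be numerically close to zero when the first-order conditions for NLS are satisfied.

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative data for the nonlinear model y_t = b1*(1 - exp(-b2*z_t)) + u_t
rng = np.random.default_rng(42)
n = 200
z = rng.uniform(0.1, 5.0, n)
beta_true = np.array([2.0, 0.8])
y = beta_true[0] * (1 - np.exp(-beta_true[1] * z)) + rng.normal(0, 0.1, n)

def xfun(beta):   # regression function x_t(beta)
    return beta[0] * (1 - np.exp(-beta[1] * z))

def Xmat(beta):   # n x k matrix of derivatives X(beta)
    return np.column_stack([1 - np.exp(-beta[1] * z),
                            beta[0] * z * np.exp(-beta[1] * z)])

# NLS estimates via scipy
nls = least_squares(lambda b: y - xfun(b), x0=np.array([1.0, 1.0]))
beta_hat = nls.x

# GNR (4): regress the NLS residuals on X(beta_hat)
u_hat = y - xfun(beta_hat)
X_hat = Xmat(beta_hat)
b_hat, *_ = np.linalg.lstsq(X_hat, u_hat, rcond=None)
print(b_hat)   # should be numerically close to zero if the first-order conditions hold
```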

The usual OLS covariance matrix for \( \widehat{\boldsymbol{b}} \) from regression (4) is

$$ {s}^2{\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1},\ \ \ \mathrm{where}\ \ \ {s}^2=\frac{1}{n-k}{\left(\boldsymbol{y}-\widehat{\boldsymbol{x}}\right)}^{\prime}\left(\boldsymbol{y}-\widehat{\boldsymbol{x}}\right). $$
(5)

This is also the usual estimator of the covariance matrix of the NLS estimator \( \widehat{\boldsymbol{\beta}} \) under the assumption that the errors are IID. If that assumption were relaxed to allow for heteroskedasticity of unknown form, then (5) would be replaced by a heteroskedasticity-consistent covariance matrix estimator (HCCME) of the form

$$ {\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1}{\widehat{\boldsymbol{X}}}^{\prime}\widehat{\Omega}\widehat{\boldsymbol{X}}{\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1}, $$
(6)

where \( \widehat{\boldsymbol{\Omega}} \) is an n × n diagonal matrix with squared residuals, probably rescaled, on the principal diagonal. The matrix (6) is precisely what a regression package would give if we ran the GNR (4) and requested an HCCME. Similar results hold if we relax the independence assumption and use a heteroskedasticity and autocorrelation consistent (HAC) estimator. In every case, a standard estimator of the covariance matrix of \( \widehat{\boldsymbol{b}} \) from the artificial regression (4) is also perfectly valid for the NLS estimates \( \widehat{\boldsymbol{\beta}} \).
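
As an illustrative sketch of how (5) and (6) can be computed from the GNR output, the function below assumes numpy arrays u_hat and X_hat holding the NLS residuals and the matrix \( \widehat{\boldsymbol{X}} \), and uses the unscaled (HC0-style) form of \( \widehat{\boldsymbol{\Omega}} \); the function name is illustrative.

```python
import numpy as np

def gnr_covariances(u_hat, X_hat):
    """OLS covariance matrix (5) and HC0-style HCCME (6) from the GNR at the NLS estimates."""
    n, k = X_hat.shape
    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
    s2 = (u_hat @ u_hat) / (n - k)
    cov_ols = s2 * XtX_inv                                        # expression (5)
    Omega_hat = np.diag(u_hat**2)                                 # squared residuals, unscaled
    cov_hccme = XtX_inv @ X_hat.T @ Omega_hat @ X_hat @ XtX_inv   # expression (6)
    return cov_ols, cov_hccme
```

Rescaling the squared residuals, for example by the factor n/(n − k), yields other common HCCME variants.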

If we evaluate the GNR (3) at a vector of restricted estimates \( \tilde{\boldsymbol{\beta}} \), we can use the resulting artificial regression to test the restrictions. For simplicity, assume that \( \tilde{\boldsymbol{\beta}}={\left[{{\tilde{\boldsymbol{\beta}}}_1}^{\prime}\;{0}^{\prime}\right]}^{\prime } \), where β1 is a k1-vector and β2, which is equal to 0 under the null hypothesis, is a k2-vector. In this case, the GNR becomes

$$ \tilde{\boldsymbol{u}}={\tilde{\boldsymbol{X}}}_1{\boldsymbol{b}}_1+{\tilde{\boldsymbol{X}}}_2{\boldsymbol{b}}_2+\ \ \ \mathrm{residuals}. $$
(7)

The ordinary F statistic for b2 = 0 is asymptotically valid as a test for β2 = 0, and it is asymptotically equal, under the null hypothesis, to the F statistic for β2 = 0 in the nonlinear regression (1). Of course, when \( {\tilde{\boldsymbol{X}}}_2 \) has just one column, the t statistic for the scalar b2 = 0 is also asymptotically valid. Yet another test statistic that is frequently used is n times the uncentred R2 from regression (7), which is asymptotically distributed as χ2(k2) under the null hypothesis.
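
The following sketch shows how these statistics could be computed by running regression (7) with a standard least squares routine; it assumes numpy arrays u_tilde, X1_tilde and X2_tilde containing the restricted residuals and the two blocks of regressors, and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def gnr_restriction_tests(u_tilde, X1_tilde, X2_tilde):
    """F and nR^2 tests for beta_2 = 0 based on the GNR (7) at restricted estimates."""
    n = u_tilde.shape[0]
    k1, k2 = X1_tilde.shape[1], X2_tilde.shape[1]
    X = np.column_stack([X1_tilde, X2_tilde])

    def ssr(regressors):
        coef, *_ = np.linalg.lstsq(regressors, u_tilde, rcond=None)
        resid = u_tilde - regressors @ coef
        return resid @ resid

    ssr_r = ssr(X1_tilde)   # restricted regression: b2 = 0
    ssr_u = ssr(X)          # unrestricted GNR (7)
    F = ((ssr_r - ssr_u) / k2) / (ssr_u / (n - k1 - k2))
    # n times the uncentred R^2 of (7): TSS is u_tilde'u_tilde
    nR2 = n * (1.0 - ssr_u / (u_tilde @ u_tilde))
    return F, stats.f.sf(F, k2, n - k1 - k2), nR2, stats.chi2.sf(nR2, k2)
```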

The GNR (3) can also be used as part of a quasi-Newton minimization procedure if it is evaluated at any vector, say β(j), where j denotes the jth step of an iterative procedure. In fact, this is where the name of the GNR came from. It is not hard to show that the vector

$$ {\boldsymbol{b}}_{(j)}\equiv {\left({\boldsymbol{X}}_{(j)}^{\prime }{\boldsymbol{X}}_{(j)}\right)}^{-1}{\boldsymbol{X}}_{(j)}^{\prime}\left(\boldsymbol{y}-{\boldsymbol{x}}_{(j)}\right), $$

where the notation should be obvious, is asymptotically equivalent to the vector that defines a Newton step starting at β(j). The vector b(j) is asymptotically equivalent to what we would get by postmultiplying minus the inverse of the Hessian of the sum-of-squared-residuals function by the gradient. Because of this, the GNR has the same one-step property as Newton’s method itself. If we evaluate (3) at any consistent estimator, say \( \ddot{\boldsymbol{\beta}} \), then the one-step estimator \( \ddot{\boldsymbol{\beta}}+\ddot{\boldsymbol{b}} \) is asymptotically equivalent to the NLS estimator \( \widehat{\boldsymbol{\beta}} \).
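
A minimal sketch of such a minimisation procedure appears below; it assumes functions xfun(beta) and Xmat(beta) that return x(β) and X(β), as in the earlier sketch, and a practical implementation would add step-length control and a more careful convergence criterion.

```python
import numpy as np

def gauss_newton(xfun, Xmat, y, beta0, tol=1e-10, max_iter=100):
    """Quasi-Newton minimisation of the sum of squared residuals using GNR steps b_(j)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        resid = y - xfun(beta)                               # y - x(beta_(j))
        X = Xmat(beta)                                       # X(beta_(j))
        step, *_ = np.linalg.lstsq(X, resid, rcond=None)     # b_(j) from the GNR
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```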

For more detailed treatments of the Gauss–Newton regression, see MacKinnon (1992) and Davidson and MacKinnon (2001, 2004).

Properties of Artificial Regressions

A very general class of artificial regressions can be written as

$$ \boldsymbol{r}\left(\boldsymbol{\theta} \right)=\boldsymbol{R}\left(\boldsymbol{\theta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(8)

where θ is a k-vector of parameters, r(θ) is a vector whose length is an integer multiple of the sample size n, and R(θ) is a matrix with k columns and as many rows as r(θ). In order to qualify as an artificial regression, the linear regression (8) must satisfy three key properties.

  1.

    The regressand \( \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right) \) is orthogonal to every column of the matrix of regressors \( \boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right), \) where \( \widehat{\boldsymbol{\theta}} \) denotes a vector of unrestricted estimates. That is,

$$ {\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=0. $$
(9)
  2.

    The asymptotic covariance matrix of \( {n}^{1/2}\left(\widehat{\boldsymbol{\theta}}-{\boldsymbol{\theta}}_0\right) \) is given either by

$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ {\left({n}^{-1}{\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\right)}^{-1},\ \ \ \mathrm{or}\ \ \ \mathrm{by} $$
(10)
$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {s}^2{\left({n}^{-1}{\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\right)}^{-1}, $$
(11)

where s2 is the OLS estimate of the error variance obtained by running regression (8) with \( \boldsymbol{\theta} =\widehat{\boldsymbol{\theta}} \). Of course, this is also true if \( \widehat{\boldsymbol{\theta}} \) is replaced by any other consistent estimator of θ.

  3.

    If \( \ddot{\boldsymbol{\theta}} \) denotes a consistent estimator, and \( \ddot{\boldsymbol{b}} \) denotes the vector of estimates obtained by running regression (8) evaluated at \( \ddot{\boldsymbol{\theta}} \), then

$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {n}^{1/2}\left(\ddot{\boldsymbol{\theta}}+\ddot{\boldsymbol{b}}-{\boldsymbol{\theta}}_0\right)=\underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {n}^{1/2}\left(\widehat{\boldsymbol{\theta}}-{\boldsymbol{\theta}}_0\right). $$
(12)

This is the one-step property, which holds because the vector \( \ddot{\boldsymbol{b}} \) is asymptotically equivalent to a single Newton step.

There exist many artificial regressions that take the form of (8) and satisfy conditions 1, 2, and 3. Some of these will be discussed in the next section. We have seen that the GNR satisfies these conditions and that its asymptotic covariance matrix is given by (11).

The most widespread use of artificial regressions is for specification testing. Of course, any artificial regression can be used to test restrictions on the model to which it corresponds. We simply evaluate the artificial regression for the unrestricted model at the restricted estimates, as in (7). However, in many cases, we can also use artificial regressions to test model specification without explicitly specifying an alternative. Consider the artificial regression

$$ \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{b}+\boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{c}+\ \ \ \mathrm{residuals}, $$
(13)

which is evaluated at unrestricted estimates \( \widehat{\boldsymbol{\theta}} \). Here Z(θ) is a matrix of test regressors with r columns, each of which is asymptotically uncorrelated with r(θ), and which has certain other properties that ensure that standard test statistics for c = 0 are asymptotically valid. In effect, regression (13) must have the same properties as if it corresponded to an unrestricted model. See Davidson and MacKinnon (2001, 2004) for details.

When the artificial regression (13) is a GNR, \( \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=\widehat{\boldsymbol{u}} \) and \( \boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)=\widehat{\boldsymbol{X}} \). Such a GNR can be used to implement a number of well-known specification tests, including the following ones.

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be a vector of squared fitted values, then the t statistic for the coefficient on the test regressor to be zero can be used to perform one version of the well-known RESET test (Ramsey, 1969).

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be an n × p matrix containing the residuals lagged once through p times, either the F statistic for c = 0 or n times the uncentred R2 can be used to perform a standard test for pth order serial correlation (Godfrey, 1978); a sketch of this test appears after this list.

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be the vector \( \widehat{\boldsymbol{w}}-\widehat{\boldsymbol{x}} \), where \( \widehat{\boldsymbol{w}} \) denotes the fitted values from a non-nested alternative model, then the t statistic on the test regressor can be used to perform a non-nested hypothesis test, namely, the P test proposed by Davidson and MacKinnon (1981).
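
As an illustration, here is a sketch of the serial correlation test described in the second bullet, in its nR2 form. It assumes numpy arrays u_hat and X_hat from the GNR evaluated at the unrestricted estimates, and it sets unavailable lagged residuals to zero, which is one common convention; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def gnr_serial_correlation_test(u_hat, X_hat, p):
    """nR^2 test for AR(p) errors: regress residuals on X_hat and p lags of the residuals."""
    n = u_hat.shape[0]
    # Build the n x p matrix of lagged residuals, padding the first j entries with zeros
    lags = np.column_stack([np.concatenate([np.zeros(j), u_hat[:n - j]])
                            for j in range(1, p + 1)])
    Z = np.column_stack([X_hat, lags])
    coef, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
    resid = u_hat - Z @ coef
    nR2 = n * (1.0 - (resid @ resid) / (u_hat @ u_hat))   # n times the uncentred R^2
    return nR2, stats.chi2.sf(nR2, p)
```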

Like all asymptotic tests, the three tests just described may not have good finite-sample properties. This is particularly true for the P test and other non-nested hypothesis tests. Finite-sample properties can often be greatly improved by bootstrapping, which is quite easy to do in these cases. For a recent survey of bootstrap methods in econometrics, see Davidson and MacKinnon (2006).

More Artificial Regressions

A great many artificial regressions have been proposed over the years, far more than there is space to discuss here. Some of them apply to very broad classes of econometric models, and others to quite narrow ones.

One of the most widely applicable and commonly used artificial regressions is the outer product of the gradient (OPG) regression. It applies to every model for which the log-likelihood function can be written as

$$ \ell \left(\boldsymbol{\theta} \right)=\sum_{t=1}^n\;{\ell}_t\left(\boldsymbol{\theta} \right), $$
(14)

where ℓt is the contribution to the log-likelihood made by the tth observation, and θ is a k-vector of parameters. The n × k matrix of contributions to the gradient, G(θ), has typical element

$$ {\boldsymbol{G}}_{ti}\left(\boldsymbol{\theta} \right)\equiv \frac{\partial {\ell}_t\left(\boldsymbol{\theta} \right)}{\partial {\theta}_i}. $$
(15)

Summing the elements of the ith column of this matrix yields the ith element of the gradient. The OPG regression is

$$ \iota =\boldsymbol{G}\left(\boldsymbol{\theta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(16)

where ι is an n-vector of ones.

It is easy to see that the OPG regression satisfies condition 1, since the inner product of ι and G(θ) is just the gradient, which must be zero when evaluated at the maximum likelihood estimates \( \widehat{\boldsymbol{\theta}} \). That it satisfies condition 2 follows from the fact that the plim of the matrix n−1G′(θ)G(θ) is the information matrix, which implies that the asymptotic covariance matrix is given by (10). The OPG regression also satisfies condition 3, and it is therefore a valid artificial regression.
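
For example, a Lagrange multiplier test based on the OPG regression can be computed as n minus the sum of squared residuals from regressing ι on G evaluated at the restricted estimates, which equals n times the uncentred R2. A minimal sketch, assuming a numpy array G_restricted of score contributions evaluated at the restricted maximum likelihood estimates and that k_restrictions restrictions are being tested; the names are illustrative.

```python
import numpy as np
from scipy import stats

def opg_lm_test(G_restricted, k_restrictions):
    """LM test via the OPG regression (16), evaluated at restricted MLEs."""
    n = G_restricted.shape[0]
    iota = np.ones(n)
    coef, *_ = np.linalg.lstsq(G_restricted, iota, rcond=None)
    resid = iota - G_restricted @ coef
    lm = n - resid @ resid   # = n * uncentred R^2, since the TSS is iota'iota = n
    return lm, stats.chi2.sf(lm, k_restrictions)
```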

Because it applies to such a broad class of models, the OPG regression is easy to use in a wide variety of contexts. This includes information matrix tests (Chesher 1983; Lancaster 1984) and conditional moment tests (Newey 1985), both of which may be thought of as special cases of regression (13). However, because \( {n}^{-1}{\boldsymbol{G}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{G}\left(\widehat{\boldsymbol{\theta}}\right) \) tends to be an inefficient estimator of the information matrix, tests based on the OPG regression often have poor finite-sample properties, iterative procedures based on it may converge slowly, and covariance matrix estimates may be poor. Davidson and MacKinnon (1992) contains some simulation results which show just how poor the finite-sample properties of tests based on the OPG regression can be. However, these properties can often be improved dramatically by bootstrapping.

Another artificial regression that applies to a fairly general class of models estimated by maximum likelihood is the double-length artificial regression (DLR), proposed by Davidson and MacKinnon (1984). The class of models to which it applies may be written as

$$ {f}_t\left({y}_t,\ \ \ \boldsymbol{\theta} \right)={\varepsilon}_t,\ \ \ t=1,\dots, n,\ \ \ {\varepsilon}_t\sim \mathrm{NID}\left(0,1\right), $$
(17)

where ft(⋅) is a smooth function that depends on the random variable yt, on a k-vector of parameters θ, and, implicitly, on exogenous and/or predetermined variables. This class of models is much more general than may be apparent at first. It includes both univariate and multivariate linear and nonlinear regression models, as well as models that involve transformations of the dependent variable. The main restrictions are that the dependent variable(s) must be continuous and that the distribution(s) of the error terms must be known.

As its name suggests, the DLR has 2n observations. It can be written as

$$ \left[\begin{array}{c}\boldsymbol{f}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\\ {}\iota \end{array}\right]=\left[\begin{array}{c}-\boldsymbol{F}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\\ {}\boldsymbol{K}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\end{array}\right]\boldsymbol{b}+\mathrm{residuals}. $$
(18)

Here f(y,  θ) is an n-vector with typical element ft(yt,  θ), ι is an n-vector of ones, F(y,  θ) is an n × k matrix with typical element ∂ft(yt,  θ)/∂θi, and K(y,  θ) is an n × k matrix with typical element ∂kt(yt,  θ)/∂θi, where

$$ {k}_t\left({y}_t,\ \ \ \boldsymbol{\theta} \right)\equiv \log \left|\frac{\partial {f}_t\left({y}_t,\boldsymbol{\theta} \right)}{\partial {y}_t}\right| $$

is a Jacobian term that appears in the log-likelihood function for the model (17). The estimator of the information matrix associated with the DLR (18) has the form

$$ \frac{1}{n}\left({\boldsymbol{F}}^{\prime}\left(\boldsymbol{\theta} \right)\boldsymbol{F}\left(\boldsymbol{\theta} \right)+{\boldsymbol{K}}^{\prime}\left(\boldsymbol{\theta} \right)\boldsymbol{K}\left(\boldsymbol{\theta} \right)\right). $$
(19)

In most cases, this is a much more efficient estimator than the one associated with the OPG regression. As a result, inferences based on the DLR are generally more reliable than inferences based on the OPG regression. See, for example, Davidson and MacKinnon (1992). The DLR is not the only artificial regression for which the number of ‘observations’ is a multiple of the actual number. For other examples, see Orme (1995).
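
To illustrate how f, F and K map into the stacked regression (18), here is a sketch for the simplest special case, the linear regression yt = Ztγ + σεt with εt ∼ N(0, 1), so that ft = (yt − Ztγ)/σ and kt = −log σ. The function name and arguments are illustrative, and the resulting regressand and regressor matrix can be passed to any OLS routine.

```python
import numpy as np

def dlr_linear_regression(y, Z, gamma, sigma):
    """Build the double-length regression (18) for y_t = Z_t gamma + sigma * eps_t."""
    n, k = Z.shape
    f = (y - Z @ gamma) / sigma                       # n-vector f(y, theta)
    # F(y, theta): derivatives of f_t with respect to (gamma, sigma)
    F = np.column_stack([-Z / sigma, -f / sigma])
    # K(y, theta): derivatives of k_t = -log(sigma) with respect to (gamma, sigma)
    K = np.column_stack([np.zeros((n, k)), -np.ones(n) / sigma])
    regressand = np.concatenate([f, np.ones(n)])      # stack f over iota
    regressors = np.vstack([-F, K])                   # stack -F over K
    return regressand, regressors
```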

Ideally, an information matrix estimator should depend on the data only through estimates of the parameters. A Lagrange multiplier, or score, test based on such an estimator is often called an efficient score test. Because (19) often does not satisfy this condition, using the DLR generally does not yield efficient score tests. In contrast, at least for models with no lagged dependent variables, the GNR does yield efficient score tests, as do several other artificial regressions.

A number of somewhat specialized artificial regressions can be obtained as modified versions of the Gauss–Newton regression. These include two different forms of GNR that are robust to heteroskedasticity of unknown form, a variant of the GNR for models estimated by instrumental variables, a variant of the GNR for models estimated by the generalized method of moments, a variant of the GNR for multivariate nonlinear regression models, and the binary response model regression (BRMR), which applies to models like the logit and probit model. See Davidson and MacKinnon (2001, 2004) for detailed discussions and references.
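
As one example, a common form of the BRMR for a probit model with P(yt = 1) = Φ(Xtβ) uses regressand (yt − Φ(Xtβ))/√Vt and regressors φ(Xtβ)Xt/√Vt, where Vt = Φ(Xtβ)(1 − Φ(Xtβ)). The sketch below constructs the regressand and regressors; the function name and arguments are my own illustration rather than code from the references.

```python
import numpy as np
from scipy.stats import norm

def brmr_probit(y, X, beta):
    """Binary response model regression (BRMR) for a probit model, evaluated at beta."""
    index = X @ beta
    F = norm.cdf(index)                 # fitted probabilities
    f = norm.pdf(index)                 # density evaluated at the index
    v = np.sqrt(F * (1.0 - F))          # square root of the variance F(1 - F)
    regressand = (y - F) / v
    regressors = (f / v)[:, None] * X
    return regressand, regressors
```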

Of course, any quantity that can be computed using an artificial regression can also be computed directly by using a matrix language. Why then use artificial regressions for computation? This is, to some extent, simply a matter of taste. One potential advantage is that most statistics packages perform least squares regressions efficiently and accurately. In my view, however, the chief advantage of artificial regressions is conceptual. Because econometricians are very familiar with linear regression models, using artificial regressions for computation reduces the chance of errors and makes the results easier to comprehend intuitively.

See Also