An artificial regression is a linear regression that is associated with some other econometric model, which is usually, but not always, nonlinear. It can be used for a variety of purposes, in particular computing covariance matrices and calculating test statistics. The best-known artificial regression is the Gauss–Newton regression (GNR), which is discussed in the next section. All artificial regressions share the key properties of the GNR.

The Gauss–Newton Regression

A univariate nonlinear regression model may be written as

$$ {y}_t={x}_t\left(\beta \right)+{u}_t,\ \ \ {u}_t\sim \mathrm{IID}\left(0,\ \ \ {\sigma}^2\right),t=1,\dots, n, $$
(1)

where yt is the tth observation on the dependent variable, and β is a k-vector of parameters to be estimated. Here the scalar function xt(β) is a nonlinear regression function which may depend on exogenous and/or predetermined variables. The model (1) may also be written using vector notation as

$$ \boldsymbol{y}=\boldsymbol{x}\left(\boldsymbol{\beta} \right)+\boldsymbol{u},\ \ \ \boldsymbol{u}\sim \mathrm{IID}\left(0,\ \ \ {\sigma}^2\mathbf{I}\right), $$
(2)

where y is an n-vector with typical element yt, x(β) is an n-vector with typical element xt(β), and I is an n × n identity matrix.

The Gauss–Newton regression that corresponds to (2) is

$$ \boldsymbol{y}-\boldsymbol{x}\left(\boldsymbol{\beta} \right)=\boldsymbol{X}\left(\boldsymbol{\beta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(3)

where b is a k-vector of regression coefficients, and the matrix X(β) is n × k, with ti-th element the derivative of xt(β) with respect to βi, the ith component of β. The regressand here is a vector of residuals, and the regressors are the columns of a matrix of derivatives. When regression (3) is evaluated at the least-squares estimates \( \widehat{\boldsymbol{\beta}} \), it becomes

$$ \widehat{\boldsymbol{u}}\equiv \boldsymbol{y}-\widehat{\boldsymbol{x}}=\widehat{\boldsymbol{X}}\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(4)

where \( \widehat{\boldsymbol{x}}\equiv \boldsymbol{x}\left(\widehat{\boldsymbol{\beta}}\right) \) and \( \widehat{\boldsymbol{X}}\equiv \boldsymbol{X}\left(\widehat{\boldsymbol{\beta}}\right) \). Since the regressand of this artificial regression must be orthogonal to all the regressors, running the GNR (4) is an easy way to check that the NLS estimates actually satisfy the first-order conditions.
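
To make this concrete, the following sketch (not part of the original exposition) runs the GNR (4) for a simple nonlinear model in Python, using numpy and scipy; the regression function, the simulated data, and names such as xfun and Xmat are purely illustrative. The estimated coefficient vector from the GNR should be numerically close to zero when the first-order conditions for NLS are satisfied.

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative data for the nonlinear model y_t = b1*(1 - exp(-b2*z_t)) + u_t
rng = np.random.default_rng(42)
n = 200
z = rng.uniform(0.1, 5.0, n)
beta_true = np.array([2.0, 0.8])
y = beta_true[0] * (1 - np.exp(-beta_true[1] * z)) + rng.normal(0, 0.1, n)

def xfun(beta):   # regression function x_t(beta)
    return beta[0] * (1 - np.exp(-beta[1] * z))

def Xmat(beta):   # n x k matrix of derivatives X(beta)
    return np.column_stack([1 - np.exp(-beta[1] * z),
                            beta[0] * z * np.exp(-beta[1] * z)])

# NLS estimates via scipy
nls = least_squares(lambda b: y - xfun(b), x0=np.array([1.0, 1.0]))
beta_hat = nls.x

# GNR (4): regress the NLS residuals on X(beta_hat)
u_hat = y - xfun(beta_hat)
X_hat = Xmat(beta_hat)
b_hat, *_ = np.linalg.lstsq(X_hat, u_hat, rcond=None)
print(b_hat)   # should be numerically close to zero if the first-order conditions hold
```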

The usual OLS covariance matrix for \( \widehat{\boldsymbol{b}} \) from regression (4) is

$$ {s}^2{\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1},\ \ \ \mathrm{where}\ \ \ {s}^2=\frac{1}{n-k}{\left(\boldsymbol{y}-\widehat{\boldsymbol{x}}\right)}^{\prime}\left(\boldsymbol{y}-\widehat{\boldsymbol{x}}\right). $$
(5)

This is also the usual estimator of the covariance matrix of the NLS estimator \( \widehat{\boldsymbol{\beta}} \) under the assumption that the errors are IID. If that assumption were relaxed to allow for heteroskedasticity of unknown form, then (5) would be replaced by a heteroskedasticity-consistent covariance matrix estimator (HCCME) of the form

$$ {\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1}{\widehat{\boldsymbol{X}}}^{\prime}\widehat{\Omega}\widehat{\boldsymbol{X}}{\left({\widehat{\boldsymbol{X}}}^{\prime}\widehat{\boldsymbol{X}}\right)}^{-1}, $$
(6)

where \( \widehat{\boldsymbol{\Omega}} \) is an n × n diagonal matrix with squared residuals, probably rescaled, on the principal diagonal. The matrix (6) is precisely what a regression package would give if we ran the GNR (4) and requested an HCCME. Similar results hold if we relax the independence assumption and use a heteroskedasticity and autocorrelation consistent (HAC) estimator. In every case, a standard estimator of the covariance matrix of \( \widehat{\boldsymbol{b}} \) from the artificial regression (4) is also perfectly valid for the NLS estimates \( \widehat{\boldsymbol{\beta}} \).
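
As an illustrative sketch of how (5) and (6) can be computed from the GNR output, the function below assumes numpy arrays u_hat and X_hat holding the NLS residuals and the matrix \( \widehat{\boldsymbol{X}} \), and uses the unscaled (HC0-style) form of \( \widehat{\boldsymbol{\Omega}} \); the function name is illustrative.

```python
import numpy as np

def gnr_covariances(u_hat, X_hat):
    """OLS covariance matrix (5) and HC0-style HCCME (6) from the GNR at the NLS estimates."""
    n, k = X_hat.shape
    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
    s2 = (u_hat @ u_hat) / (n - k)
    cov_ols = s2 * XtX_inv                                        # expression (5)
    Omega_hat = np.diag(u_hat**2)                                 # squared residuals, unscaled
    cov_hccme = XtX_inv @ X_hat.T @ Omega_hat @ X_hat @ XtX_inv   # expression (6)
    return cov_ols, cov_hccme
```

Rescaling the squared residuals, for example by the factor n/(n − k), yields other common HCCME variants.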

If we evaluate the GNR (3) at a vector of restricted estimates \( \tilde{\boldsymbol{\beta}} \), we can use the resulting artificial regression to test the restrictions. For simplicity, assume that \( \tilde{\boldsymbol{\beta}}={\left[{{\tilde{\boldsymbol{\beta}}}_1}^{\prime}\;{0}^{\prime}\right]}^{\prime } \), where β1 is a k1-vector and β2, which is equal to 0 under the null hypothesis, is a k2-vector. In this case, the GNR becomes

$$ \tilde{\boldsymbol{u}}={\tilde{\boldsymbol{X}}}_1{\boldsymbol{b}}_1+{\tilde{\boldsymbol{X}}}_2{\boldsymbol{b}}_2+\ \ \ \mathrm{residuals}. $$
(7)

The ordinary F statistic for b2 = 0 is asymptotically valid as a test for β2 = 0, and it is asymptotically equal, under the null hypothesis, to the F statistic for β2 = 0 in the nonlinear regression (1). Of course, when \( {\tilde{\boldsymbol{X}}}_2 \) has just one column, the t statistic for the scalar b2 = 0 is also asymptotically valid. Yet another test statistic that is frequently used is n times the uncentred R2 from regression (7), which is asymptotically distributed as χ2(k2) under the null hypothesis.
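
The following sketch shows how these statistics could be computed by running regression (7) with a standard least squares routine; it assumes numpy arrays u_tilde, X1_tilde and X2_tilde containing the restricted residuals and the two blocks of regressors, and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def gnr_restriction_tests(u_tilde, X1_tilde, X2_tilde):
    """F and nR^2 tests for beta_2 = 0 based on the GNR (7) at restricted estimates."""
    n = u_tilde.shape[0]
    k1, k2 = X1_tilde.shape[1], X2_tilde.shape[1]
    X = np.column_stack([X1_tilde, X2_tilde])

    def ssr(regressors):
        coef, *_ = np.linalg.lstsq(regressors, u_tilde, rcond=None)
        resid = u_tilde - regressors @ coef
        return resid @ resid

    ssr_r = ssr(X1_tilde)   # restricted regression: b2 = 0
    ssr_u = ssr(X)          # unrestricted GNR (7)
    F = ((ssr_r - ssr_u) / k2) / (ssr_u / (n - k1 - k2))
    # n times the uncentred R^2 of (7): TSS is u_tilde'u_tilde
    nR2 = n * (1.0 - ssr_u / (u_tilde @ u_tilde))
    return F, stats.f.sf(F, k2, n - k1 - k2), nR2, stats.chi2.sf(nR2, k2)
```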

The GNR (3) can also be used as part of a quasi-Newton minimization procedure if it is evaluated at any vector, say β(j), where j denotes the jth step of an iterative procedure. In fact, this is where the name of the GNR came from. It is not hard to show that the vector

$$ {\boldsymbol{b}}_{(j)}\equiv {\left({\boldsymbol{X}}_{(j)}^{\prime }{\boldsymbol{X}}_{(j)}\right)}^{-1}{\boldsymbol{X}}_{(j)}^{\prime}\left(\boldsymbol{y}-{\boldsymbol{x}}_{(j)}\right), $$

where the notation should be obvious, is asymptotically equivalent to the vector that defines a Newton step starting at β(j). The vector b(j) is asymptotically equivalent to what we would get by postmultiplying minus the inverse of the Hessian of the sum-of-squared-residuals function by the gradient. Because of this, the GNR has the same one-step property as Newton’s method itself. If we evaluate (3) at any consistent estimator, say \( \ddot{\boldsymbol{\beta}} \), then the one-step estimator \( \ddot{\boldsymbol{\beta}}+\ddot{\boldsymbol{b}} \) is asymptotically equivalent to the NLS estimator \( \widehat{\boldsymbol{\beta}} \).
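
A minimal sketch of such a minimisation procedure appears below; it assumes functions xfun(beta) and Xmat(beta) that return x(β) and X(β), as in the earlier sketch, and a practical implementation would add step-length control and a more careful convergence criterion.

```python
import numpy as np

def gauss_newton(xfun, Xmat, y, beta0, tol=1e-10, max_iter=100):
    """Quasi-Newton minimisation of the sum of squared residuals using GNR steps b_(j)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        resid = y - xfun(beta)                               # y - x(beta_(j))
        X = Xmat(beta)                                       # X(beta_(j))
        step, *_ = np.linalg.lstsq(X, resid, rcond=None)     # b_(j) from the GNR
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```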

For more detailed treatments of the Gauss–Newton regression, see MacKinnon (1992) and Davidson and MacKinnon (2001, 2004).

Properties of Artificial Regressions

A very general class of artificial regressions can be written as

$$ \boldsymbol{r}\left(\boldsymbol{\theta} \right)=\boldsymbol{R}\left(\boldsymbol{\theta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(8)

where θ is a k-vector of parameters, r(θ) is a vector whose length is an integer multiple of the sample size n, and R(θ) is a matrix with k columns and as many rows as r(θ). In order to qualify as an artificial regression, the linear regression (8) must satisfy three key properties.

  1.

    The regressand \( \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right) \) is orthogonal to every column of the matrix of regressors \( \boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right), \) where \( \widehat{\boldsymbol{\theta}} \) denotes a vector of unrestricted estimates. That is,

$$ {\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=0. $$
(9)
  2.

    The asymptotic covariance matrix of \( {n}^{1/2}\left(\widehat{\boldsymbol{\theta}}-{\boldsymbol{\theta}}_0\right) \) is given either by

$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ {\left({n}^{-1}{\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\right)}^{-1},\ \ \ \mathrm{or}\ \ \ \mathrm{by} $$
(10)
$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {s}^2{\left({n}^{-1}{\boldsymbol{R}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\right)}^{-1}, $$
(11)

where s2 is the OLS estimate of the error variance obtained by running regression (8) with \( \boldsymbol{\theta} =\widehat{\boldsymbol{\theta}} \). Of course, this is also true if \( \widehat{\boldsymbol{\theta}} \) is replaced by any other consistent estimator of θ.

  3.

    If \( \ddot{\boldsymbol{\theta}} \) denotes a consistent estimator, and \( \ddot{\boldsymbol{b}} \) denotes the vector of estimates obtained by running regression (8) evaluated at \( \ddot{\boldsymbol{\theta}} \), then

$$ \underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {n}^{1/2}\left(\ddot{\boldsymbol{\theta}}+\ddot{\boldsymbol{b}}-{\boldsymbol{\theta}}_0\right)=\underset{n\to \infty }{\mathrm{plim}}\ \ \ \ \ {n}^{1/2}\left(\widehat{\boldsymbol{\theta}}-{\boldsymbol{\theta}}_0\right). $$
(12)

This is the one-step property, which holds because the vector \( \ddot{\boldsymbol{b}} \) is asymptotically equivalent to a single Newton step.

There exist many artificial regressions that take the form of (8) and satisfy conditions 1, 2, and 3. Some of these will be discussed in the next section. We have seen that the GNR satisfies these conditions and that its asymptotic covariance matrix is given by (11).

The most widespread use of artificial regressions is for specification testing. Of course, any artificial regression can be used to test restrictions on the model to which it corresponds. We simply evaluate the artificial regression for the unrestricted model at the restricted estimates, as in (7). However, in many cases, we can also use artificial regressions to test model specification without explicitly specifying an alternative. Consider the artificial regression

$$ \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=\boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{b}+\boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{c}+\ \ \ \mathrm{residuals}, $$
(13)

which is evaluated at unrestricted estimates \( \widehat{\boldsymbol{\theta}} \). Here Z(θ) is a matrix of test regressors with r columns, each of which is asymptotically uncorrelated with r(θ), and which has certain other properties that ensure that standard test statistics for c = 0 are asymptotically valid. In effect, regression (13) must have the same properties as if it corresponded to an unrestricted model. See Davidson and MacKinnon (2001, 2004) for details.

When the artificial regression (13) is a GNR, \( \boldsymbol{r}\left(\widehat{\boldsymbol{\theta}}\right)=\widehat{\boldsymbol{u}} \) and \( \boldsymbol{R}\left(\widehat{\boldsymbol{\theta}}\right)=\widehat{\boldsymbol{X}} \). Such a GNR can be used to implement a number of well-known specification tests, including the following ones.

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be a vector of squared fitted values, then the t statistic for the coefficient on the test regressor to be zero can be used to perform one version of the well-known RESET test (Ramsey, 1969).

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be an n × p matrix containing the residuals lagged once through p times, either the F statistic for c = 0 or n times the uncentred R2 can be used to perform a standard test for pth order serial correlation (Godfrey, 1978); a sketch of this test appears after this list.

  • If we let \( \boldsymbol{Z}\left(\widehat{\boldsymbol{\theta}}\right) \) be the vector \( \widehat{\boldsymbol{w}}-\widehat{\boldsymbol{x}} \), where \( \widehat{\boldsymbol{w}} \) denotes the fitted values from a non-nested alternative model, then the t statistic on the test regressor can be used to perform a non-nested hypothesis test, namely, the P test proposed by Davidson and MacKinnon (1981).
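
As an illustration, here is a sketch of the serial correlation test described in the second bullet, in its nR2 form. It assumes numpy arrays u_hat and X_hat from the GNR evaluated at the unrestricted estimates, and it sets unavailable lagged residuals to zero, which is one common convention; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def gnr_serial_correlation_test(u_hat, X_hat, p):
    """nR^2 test for AR(p) errors: regress residuals on X_hat and p lags of the residuals."""
    n = u_hat.shape[0]
    # Build the n x p matrix of lagged residuals, padding the first j entries with zeros
    lags = np.column_stack([np.concatenate([np.zeros(j), u_hat[:n - j]])
                            for j in range(1, p + 1)])
    Z = np.column_stack([X_hat, lags])
    coef, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
    resid = u_hat - Z @ coef
    nR2 = n * (1.0 - (resid @ resid) / (u_hat @ u_hat))   # n times the uncentred R^2
    return nR2, stats.chi2.sf(nR2, p)
```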

Like all asymptotic tests, the three tests just described may not have good finite-sample properties. This is particularly true for the P test and other non-nested hypothesis tests. Finite-sample properties can often be greatly improved by bootstrapping, which is quite easy to do in these cases. For a recent survey of bootstrap methods in econometrics, see Davidson and MacKinnon (2006).

More Artificial Regressions

A great many artificial regressions have been proposed over the years, far more than there is space to discuss here. Some of them apply to very broad classes of econometric models, and others to quite narrow ones.

One of the most widely applicable and commonly used artificial regressions is the outer product of the gradient (OPG) regression. It applies to every model for which the log-likelihood function can be written as

$$ \ell \left(\boldsymbol{\theta} \right)=\sum_{t=1}^n\;{\ell}_t\left(\boldsymbol{\theta} \right), $$
(14)

where ℓt is the contribution to the log-likelihood made by the tth observation, and θ is a k-vector of parameters. The n × k matrix of contributions to the gradient, G(θ), has typical element

$$ {\boldsymbol{G}}_{ti}\left(\boldsymbol{\theta} \right)\equiv \frac{\partial {\ell}_t\left(\boldsymbol{\theta} \right)}{\partial {\theta}_i}. $$
(15)

Summing the elements of the ith column of this matrix yields the ith element of the gradient. The OPG regression is

$$ \iota =\boldsymbol{G}\left(\boldsymbol{\theta} \right)\boldsymbol{b}+\ \ \ \mathrm{residuals}, $$
(16)

where ι is an n-vector of ones.

It is easy to see that the OPG regression satisfies condition 1, since the inner product of ι and G(θ) is just the gradient, which must be zero when evaluated at the maximum likelihood estimates \( \widehat{\boldsymbol{\theta}} \). That it satisfies condition 2 follows from the fact that the plim of the matrix n−1G′(θ)G(θ) is the information matrix, which implies that the asymptotic covariance matrix is given by (10). The OPG regression also satisfies condition 3, and it is therefore a valid artificial regression.
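
For example, a Lagrange multiplier test based on the OPG regression can be computed as n minus the sum of squared residuals from regressing ι on G evaluated at the restricted estimates, which equals n times the uncentred R2. A minimal sketch, assuming a numpy array G_restricted of score contributions evaluated at the restricted maximum likelihood estimates and that k_restrictions restrictions are being tested; the names are illustrative.

```python
import numpy as np
from scipy import stats

def opg_lm_test(G_restricted, k_restrictions):
    """LM test via the OPG regression (16), evaluated at restricted MLEs."""
    n = G_restricted.shape[0]
    iota = np.ones(n)
    coef, *_ = np.linalg.lstsq(G_restricted, iota, rcond=None)
    resid = iota - G_restricted @ coef
    lm = n - resid @ resid   # = n * uncentred R^2, since the TSS is iota'iota = n
    return lm, stats.chi2.sf(lm, k_restrictions)
```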

Because it applies to such a broad class of models, the OPG regression is easy to use in a wide variety of contexts. This includes information matrix tests (Chesher 1983; Lancaster 1984) and conditional moment tests (Newey 1985), both of which may be thought of as special cases of regression (13). However, because \( {n}^{-1}{\boldsymbol{G}}^{\prime}\left(\widehat{\boldsymbol{\theta}}\right)\boldsymbol{G}\left(\widehat{\boldsymbol{\theta}}\right) \) tends to be an inefficient estimator of the information matrix, tests based on the OPG regression often have poor finite-sample properties, iterative procedures based on it may converge slowly, and covariance matrix estimates may be poor. Davidson and MacKinnon (1992) contains some simulation results which show just how poor the finite-sample properties of tests based on the OPG regression can be. However, these properties can often be improved dramatically by bootstrapping.

Another artificial regression that applies to a fairly general class of models estimated by maximum likelihood is the double-length artificial regression (DLR), proposed by Davidson and MacKinnon (1984). The class of models to which it applies may be written as

$$ {f}_t\left({y}_t,\ \ \ \boldsymbol{\theta} \right)={\varepsilon}_t,\ \ \ t=1,\dots, n,\ \ \ {\varepsilon}_t\sim \mathrm{NID}\left(0,1\right), $$
(17)

where ft(⋅) is a smooth function that depends on the random variable yt, on a k-vector of parameters θ, and, implicitly, on exogenous and/or predetermined variables. This class of models is much more general than may be apparent at first. It includes both univariate and multivariate linear and nonlinear regression models, as well as models that involve transformations of the dependent variable. The main restrictions are that the dependent variable(s) must be continuous and that the distribution(s) of the error terms must be known.

As its name suggests, the DLR has 2n observations. It can be written as

$$ \left[\begin{array}{c}\boldsymbol{f}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\\ {}\iota \end{array}\right]=\left[\begin{array}{c}-\boldsymbol{F}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\\ {}\boldsymbol{K}\left(\boldsymbol{y},\ \boldsymbol{\theta} \right)\end{array}\right]\boldsymbol{b}+\mathrm{residuals}. $$
(18)

Here f(y,  θ) is an n-vector with typical element ft(yt,  θ), ι is an n-vector of ones, F(y,  θ) is an n × k matrix with typical element ∂ft(yt,  θ)/∂θi, and K(y,  θ) is an n × k matrix with typical element ∂kt(yt,  θ)/∂θi, where

$$ {k}_t\left({y}_t,\ \ \ \boldsymbol{\theta} \right)\equiv \log \left|\frac{\partial {f}_t\left({y}_t,\boldsymbol{\theta} \right)}{\partial {y}_t}\right| $$

is a Jacobian term that appears in the log-likelihood function for the model (17). The estimator of the information matrix associated with the DLR (18) has the form

$$ \frac{1}{n}\left({\boldsymbol{F}}^{\prime}\left(\boldsymbol{\theta} \right)\boldsymbol{F}\left(\boldsymbol{\theta} \right)+{\boldsymbol{K}}^{\prime}\left(\boldsymbol{\theta} \right)\boldsymbol{K}\left(\boldsymbol{\theta} \right)\right). $$
(19)

In most cases, this is a much more efficient estimator than the one associated with the OPG regression. As a result, inferences based on the DLR are generally more reliable than inferences based on the OPG regression. See, for example, Davidson and MacKinnon (1992). The DLR is not the only artificial regression for which the number of ‘observations’ is a multiple of the actual number. For other examples, see Orme (1995).
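
To illustrate how f, F and K map into the stacked regression (18), here is a sketch for the simplest special case, the linear regression yt = Ztγ + σεt with εt ∼ N(0, 1), so that ft = (yt − Ztγ)/σ and kt = −log σ. The function name and arguments are illustrative, and the resulting regressand and regressor matrix can be passed to any OLS routine.

```python
import numpy as np

def dlr_linear_regression(y, Z, gamma, sigma):
    """Build the double-length regression (18) for y_t = Z_t gamma + sigma * eps_t."""
    n, k = Z.shape
    f = (y - Z @ gamma) / sigma                       # n-vector f(y, theta)
    # F(y, theta): derivatives of f_t with respect to (gamma, sigma)
    F = np.column_stack([-Z / sigma, -f / sigma])
    # K(y, theta): derivatives of k_t = -log(sigma) with respect to (gamma, sigma)
    K = np.column_stack([np.zeros((n, k)), -np.ones(n) / sigma])
    regressand = np.concatenate([f, np.ones(n)])      # stack f over iota
    regressors = np.vstack([-F, K])                   # stack -F over K
    return regressand, regressors
```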

Ideally, an information matrix estimator should depend on the data only through estimates of the parameters. A Lagrange multiplier, or score, test based on such an estimator is often called an efficient score test. Because (19) often does not satisfy this condition, using the DLR generally does not yield efficient score tests. In contrast, at least for models with no lagged dependent variables, the GNR does yield efficient score tests, as do several other artificial regressions.

A number of somewhat specialized artificial regressions can be obtained as modified versions of the Gauss–Newton regression. These include two different forms of GNR that are robust to heteroskedasticity of unknown form, a variant of the GNR for models estimated by instrumental variables, a variant of the GNR for models estimated by the generalized method of moments, a variant of the GNR for multivariate nonlinear regression models, and the binary response model regression (BRMR), which applies to models like the logit and probit model. See Davidson and MacKinnon (2001, 2004) for detailed discussions and references.
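
As one example, a common form of the BRMR for a probit model with P(yt = 1) = Φ(Xtβ) uses regressand (yt − Φ(Xtβ))/√Vt and regressors φ(Xtβ)Xt/√Vt, where Vt = Φ(Xtβ)(1 − Φ(Xtβ)). The sketch below constructs the regressand and regressors; the function name and arguments are my own illustration rather than code from the references.

```python
import numpy as np
from scipy.stats import norm

def brmr_probit(y, X, beta):
    """Binary response model regression (BRMR) for a probit model, evaluated at beta."""
    index = X @ beta
    F = norm.cdf(index)                 # fitted probabilities
    f = norm.pdf(index)                 # density evaluated at the index
    v = np.sqrt(F * (1.0 - F))          # square root of the variance F(1 - F)
    regressand = (y - F) / v
    regressors = (f / v)[:, None] * X
    return regressand, regressors
```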

Of course, any quantity that can be computed using an artificial regression can also be computed directly by using a matrix language. Why then use artificial regressions for computation? This is, to some extent, simply a matter of taste. One potential advantage is that most statistics packages perform least squares regressions efficiently and accurately. In my view, however, the chief advantage of artificial regressions is conceptual. Because econometricians are very familiar with linear regression models, using artificial regressions for computation reduces the chance of errors and makes the results easier to comprehend intuitively.

See Also