Introduction

The “control variable” approach has been used in various nonlinear models to address endogeneity problems.Footnote 1 The purpose of this paper is to examine (i) whether the control variable approach is subject to the bias caused by the many instruments problem, as pointed out by Bekker (1994) for linear models, and if so, (ii) whether the cross-fitting advocated by the modern machine learning type estimatorsFootnote 2 eliminates such bias. The many instruments problem is essentially a problem of bias due to many nuisance parameters, which can be understood through the incidental parameters problem in panel data. Using a pseudo-panel analysis,Footnote 3 we demonstrate that (i) the control variable approach is indeed subject to the many instruments problem; and (ii) cross-fitting does not remove the bias. The negative result arises primarily because the control variable approach essentially uses the fitted value of the endogenous variable, which creates a finite sample bias. The bias and its correction in the control variable approach can in principle be understood from the perspective of the large general literatureFootnote 4 on bias correction, but because our focus is on the situation with a large number of instruments, we use a pseudo-panel analysisFootnote 5 to answer these questions.

The bias of the control variable approach is relatively straightforward to understand. In linear simultaneous equations models, it is well known that 2SLS is equivalent to a control variable estimator; see, e.g., Hausman (1978). It is therefore natural to expect the control variable approach to be subject to the bias problem even in nonlinear models. Whether cross fitting removes the bias is not as immediately obvious, and this is the question we answer using the pseudo-panel analysis in the current paper. In the current section, we explain what cross fitting means in control variable estimation, and why it may be intuitively appealing.

Bias in the 2SLS estimator in finite samples has long been recognized. Nagar (1959) proposed the first estimator to remove this finite sample bias.Footnote 6 As has been recognized more recently, the bias can be especially important under many instruments, a situation that arises often with increased size of data sets, as Bekker (1994) and Hahn and Hausman (2002, 2003) demonstrate and Hansen et al. (2008) explore empirically. The bias problem in 2SLS is important enough that a number of subsequent papers proposed methods to remove the finite sample bias. The Nagar estimator removes bias by analytically adjusting the estimating equation, and it applies only to linear models. For a linear model

$$\begin{aligned} y&=X\theta +\varepsilon \\ X&=Z\pi +\eta \end{aligned}$$

with the instrument matrix Z, the usual 2SLS estimator solves

$$\begin{aligned} 0=\widehat{X}^{\prime }\left( y-X\widehat{b}\right) , \end{aligned}$$
(1)

while Nagar (1959) estimator solves

$$\begin{aligned} 0=\widetilde{X}^{\prime }\left( y-X\widetilde{b}\right), \end{aligned}$$
(2)

where \(\widehat{X}=PX\), \(\widetilde{X}=\left( P-\frac{k}{n-k}Q\right) X\), and \(P=Z\left( Z^{\prime }Z\right) ^{-1}Z^{\prime }\) and \(Q=I-P\) denote the usual projection matrices. Here, n and k denote the number of observations and the number of instruments. Note that Nagar’s bias correction \(\frac{k}{n-k}Q\) is roughly proportional to k/n, which can be understood to be the ratio between the “number of nuisance parameters” and the sample size, where the nuisance parameters here are the first stage OLS coefficients.
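As a quick illustration, the following simulation sketch (ours; the design values n, k, \(\theta \), and the error correlation are arbitrary choices) contrasts the estimator solving (1) with the estimator solving (2) under many instruments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 50          # sample size and number of instruments (assumed values)
theta = 1.0              # true second-stage coefficient
rho = 0.9                # correlation between eps and eta (source of endogeneity)

Z = rng.standard_normal((n, k))
pi = np.full(k, 0.1)
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # P = Z (Z'Z)^{-1} Z'

b_2sls, b_nagar = [], []
for _ in range(500):
    eps = rng.standard_normal(n)
    eta = rho * eps + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    x = Z @ pi + eta
    y = x * theta + eps

    Px = P @ x
    b_2sls.append((Px @ y) / (Px @ x))        # solves (1)

    Qx = x - Px                                # Qx = (I - P)x
    xt = Px - (k / (n - k)) * Qx               # (P - k/(n-k) Q)x
    b_nagar.append((xt @ y) / (xt @ x))        # solves (2)

print("2SLS mean bias :", np.mean(b_2sls) - theta)
print("Nagar mean bias:", np.mean(b_nagar) - theta)
```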

This approach can be motivated by observing that the moment underlying 2SLS is biased

$$\begin{aligned} E\left[ \left( PX\right) ^{\prime }\left( y-X\theta \right) \right] =E\left[ \left( Z\pi +P\eta \right) ^{\prime }\varepsilon \right] =k\sigma _{\varepsilon \eta }, \end{aligned}$$

where \(\sigma _{\varepsilon \eta }\) denotes the covariance between the ith elements of \(\varepsilon \) and \(\eta \), while the moment underlying Nagar’s estimator is unbiased

$$\begin{aligned} E\left[ \left( \left( P-\frac{k}{n-k}Q\right) X\right) ^{\prime }\left( y-X\theta \right) \right] =E\left[ \left( Z\pi +\left( P-\frac{k}{n-k}Q\right) \eta \right) ^{\prime }\varepsilon \right] =0. \end{aligned}$$
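Both calculations follow from the trace identities \(\text{trace}\left( P\right) =k\) and \(\text{trace}\left( Q\right) =n-k\) (a standard step, spelled out here for completeness): for i.i.d. errors with \(E\left[ \varepsilon _{i}\eta _{i}\right] =\sigma _{\varepsilon \eta }\) and exogenous Z,

$$\begin{aligned} E\left[ \eta ^{\prime }P\varepsilon \right] =\sigma _{\varepsilon \eta }\text{trace}\left( P\right) =k\sigma _{\varepsilon \eta },\qquad E\left[ \eta ^{\prime }\left( P-\frac{k}{n-k}Q\right) \varepsilon \right] =\left( k-\frac{k}{n-k}\left( n-k\right) \right) \sigma _{\varepsilon \eta }=0. \end{aligned}$$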

The lack of bias in Nagar’s (1959) moment can be understood from the perspective that the noise in the estimated instrument \(\widehat{X}\) used in the moment (1) for 2SLS is correlated with the error \(\varepsilon \) in the second stage, and that this correlation is eliminated by using the instrument \(\widetilde{X}\). As such, we can understand the cross-fit estimator based on sample splitting as sharing a similar spirit with Nagar’s (1959) estimator. Specifically, we can see that the cross fit estimator, which solves

$$\begin{aligned} 0=\check{X}^{\prime }\left( y-X\check{b}\right) \end{aligned}$$

is based on an unbiased moment, i.e., \(E\left[ \check{X}^{\prime }\left( y-X\theta \right) \right] =0\). Here,

$$\begin{aligned} \check{X}=\left[ \begin{array} [c]{c} \check{x}_{\left( 1\right) ,1}\\ \vdots \\ \check{x}_{\left( 1\right) ,m}\\ \check{x}_{\left( 2\right) ,1}\\ \vdots \\ \check{x}_{\left( 2\right) ,m} \end{array} \right] =\left[ \begin{array} [c]{c} z_{1}^{\prime }\widehat{\pi }_{\left( 2\right) }\\ \vdots \\ z_{m}^{\prime }\widehat{\pi }_{\left( 2\right) }\\ z_{m+1}^{\prime }\widehat{\pi }_{\left( 1\right) }\\ \vdots \\ z_{2m}^{\prime }\widehat{\pi }_{\left( 1\right) } \end{array} \right] , \end{aligned}$$

where \(n=2m\), the sample is split into two equal-sized subsamples, and \(\widehat{\pi }_{\left( 1\right) }\) and \(\widehat{\pi }_{\left( 2\right) }\) are first stage estimators based on the first and second subsamples, respectively.
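In code, constructing \(\check{X}\) amounts to swapping the first stage coefficients across the two halves; a minimal sketch (ours, with arbitrary design values) is:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 500, 40           # half-sample size and number of instruments (assumed)
n = 2 * m
theta, rho = 1.0, 0.9

Z = rng.standard_normal((n, k))
pi = np.full(k, 0.1)
eps = rng.standard_normal(n)
eta = rho * eps + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x = Z @ pi + eta
y = x * theta + eps

# First stage estimated separately on each half ...
pi1, *_ = np.linalg.lstsq(Z[:m], x[:m], rcond=None)
pi2, *_ = np.linalg.lstsq(Z[m:], x[m:], rcond=None)

# ... and the fitted values are swapped across halves, as in the display above
x_check = np.concatenate([Z[:m] @ pi2, Z[m:] @ pi1])

b_cf = (x_check @ y) / (x_check @ x)   # solves the cross-fit moment
print("cross-fit estimate:", b_cf)
```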

We ask whether such an interpretation leads to reasonable inference for nonlinear models. Our conclusion is that it does not. We show that it is in general impossible to remove the bias of the moment equation by manipulating the first stage estimator alone. We do so by considering nonlinear models of endogeneity with many instruments, and showing that the moment equation with the cross fit estimator retains a bias due to nonlinearity and, as a consequence, does not have the desired unbiasedness property.

Pseudo-Panel Model

Our model of interest is a nonlinear model with endogeneity, such as the probit model with endogenous regressors, where

$$\begin{aligned} y_{i}=& {} 1\left( x_{i}\bar{\delta }+\varepsilon _{i}\ge 0\right) ,\\ x_{i}=& {} z_{i}^{\prime }\pi +\eta _{i}, \end{aligned}$$

and \(\left( \varepsilon _{i},\eta _{i}\right) \) have a bivariate normal distribution. The model has a built-in nonlinearity, and therefore, the endogeneity is probably best handled by the control variable approach. In the particular case of probit models, Rivers and Vuong (1988) solved the problem by writing

$$\begin{aligned} y_{i}=& {} 1\left( x_{i}\delta +\rho \left( x_{i}-z_{i}^{\prime }\pi \right) +\zeta _{i}\ge 0\right) ,\\ x_{i}=& {} z_{i}^{\prime }\pi +\eta _{i}, \end{aligned}$$

which generates a consistent estimator as long as \(\left( \varepsilon _{i},\eta _{i}\right) \) have a bivariate normal distribution. (We assume that \(\zeta _{i}\) has a standard normal distribution, i.e., the parameters \(\delta \) and \(\rho \) reflect such normalization.)
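For concreteness, here is a minimal sketch of the Rivers–Vuong two-step procedure (our illustration; the design values are arbitrary, and the probit step is written as an explicit maximum likelihood problem rather than relying on a packaged routine):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5000
delta, rho_err = 1.0, 0.5             # true coefficient; Corr(eps, eta) (assumed)

z = rng.standard_normal((n, 3))
pi = np.array([1.0, 0.5, -0.5])
eta = rng.standard_normal(n)
eps = rho_err * eta + np.sqrt(1 - rho_err**2) * rng.standard_normal(n)
x = z @ pi + eta
y = (x * delta + eps >= 0).astype(float)

# Step 1: the control variable is the first stage OLS residual
pi_hat, *_ = np.linalg.lstsq(z, x, rcond=None)
v_hat = x - z @ pi_hat

# Step 2: probit of y on (x, v_hat) by maximum likelihood
def neg_loglik(b):
    idx = b[0] * x + b[1] * v_hat
    return -np.sum(y * norm.logcdf(idx) + (1 - y) * norm.logcdf(-idx))

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
print("(delta, rho) up to the variance normalization:", res.x)
```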

In order to examine the consequence of many instruments, we adopt the strategy of interpreting the nuisance parameters (due to many instruments) as incidental parameters similar to the fixed effects in panel data. Therefore, we consider a special case that has a panel representation:

$$\begin{aligned} y_{it}=& {} 1\left( x_{it}\delta +\rho \left( x_{it}-\alpha _{i}\right) +\zeta _{it}\ge 0\right) \nonumber \\ x_{it}=& {} \alpha _{i}+\eta _{it}. \end{aligned}$$
(3)

This is a model where the first stage is characterized by n dummy instruments, and \(\alpha _{i}\) denotes the first stage coefficient for the ith dummy instrument, i.e., \(\pi =\left( \alpha _{1},\alpha _{2},\ldots ,\alpha _{n}\right) \); the first stage OLS estimate of \(\alpha _{i}\) is then simply the within-group mean of \(x_{it}\).

The usual two-step estimator can be understood as the method of moments estimator

$$\begin{aligned} 0=& {} E\left[ x_{it}-\alpha _{i}\right] \\ 0=& {} E\left[ \begin{array} [c]{c} m\left( z_{it},\theta ,\alpha _{i}\right) x_{it}\\ m\left( z_{it},\theta ,\alpha _{i}\right) \left( x_{it}-\alpha _{i}\right) \end{array} \right] \end{aligned}$$

where

$$\begin{aligned} m\left( z_{it},\theta ,\alpha _{i}\right) =\frac{y_{it}-\varPhi \left( x_{it}\delta +\rho \left( x_{it}-\alpha _{i}\right) \right) }{\varPhi \left( x_{it}\delta +\rho \left( x_{it}-\alpha _{i}\right) \right) \left[ 1-\varPhi \left( x_{it}\delta +\rho \left( x_{it}-\alpha _{i}\right) \right) \right] } \end{aligned}$$

and \(\varPhi \) denotes the cumulative distribution function of a standard normal distribution.
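In code, the moment function m above is straightforward to evaluate; a minimal sketch (ours):

```python
import numpy as np
from scipy.stats import norm

def m(y_it, x_it, delta, rho, alpha_i):
    """Generalized residual m(z_it, theta, alpha_i) from the display above."""
    index = x_it * delta + rho * (x_it - alpha_i)
    p = norm.cdf(index)
    return (y_it - p) / (p * (1.0 - p))
```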

Bias of Panel Two-Step Estimator

In this section, we review the panel literature and discuss how the bias of the panel data estimator can be interpreted. The framework in this section provides a basis for understanding the problem of the cross-fit estimator presented in Sect. 5.

The two-step estimator in the previous section is a special case of the general nonlinear panel data estimator defined by

$$\begin{aligned} 0=& {} \sum _{t=1}^{M}\underline{v}\left( z_{it},\widehat{\theta },\widehat{\gamma }_{i}\right) \\ 0=& {} \sum _{i=1}^{n}\sum _{t=1}^{M}\underline{u}\left( z_{it},\widehat{\theta },\widehat{\gamma }_{i}\right). \end{aligned}$$

For reasons that will become clearer later, we use the symbol M to denote the time series dimension of the panel data. Hahn and Newey (2004) and Arellano and Hahn (2007, 2016) are among the few who analyzed the finite sample bias from the large n, large T asymptotic approximation point of view. For our purpose, it is useful to make the explicit assumption that the fixed effects \(\gamma \) are multi-dimensional, and that \(\underline{v}\) is of the same dimension as \(\gamma \). We let J denote \(\dim \left( \theta \right) \).

We provide a brief summary of the finite sample bias from the literature. It is convenient to analyze the general panel estimator in terms of the efficient score

$$\begin{aligned} 0=& {} \sum _{t=1}^{M}\underline{v}\left( z_{it},\widehat{\theta },\widehat{\gamma }_{i}\right) \\ 0=& {} \sum _{i=1}^{n}\sum _{t=1}^{M}\underline{U}\left( z_{it},\widehat{\theta },\widehat{\gamma }_{i}\right) \end{aligned}$$

where

$$\begin{aligned} \underline{U}\left( z_{it},\theta ,\gamma _{i}\right)=& {} \underline{u} \left( z_{it},\theta ,\gamma _{i}\right) -\underline{\varDelta }_{i} \underline{v}\left( z_{it},\theta ,\gamma _{i}\right) ,\\ \underline{\varDelta }_{i}=& {} E\left[ \underline{u}_{it}^{\gamma _{i}}\right] E\left[ \underline{v}_{it}^{\gamma _{i}}\right] ^{-1}. \end{aligned}$$

Here, \(E\left[ \underline{v}_{it}^{\gamma _{i}}\right] =E\left[ \partial \underline{v}\left( z_{it},\theta _{0},\gamma _{i0}\right) / \partial \gamma _{i}^{\prime }\right] \) and \(E\left[ \underline{u}_{it} ^{\gamma _{i}}\right] =E\left[ \partial \underline{u}\left( z_{it},\theta _{0},\gamma _{i0}\right) / \partial \gamma _{i}^{\prime }\right] \) are evaluated at the ‘truth’. The distribution of \(\sqrt{nM}\left( \widehat{\theta }-\theta \right) \) is asymptotically normal with variance equal to

$$\begin{aligned} \left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \underline{\mathcal {I}}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} E\left[ \underline{U}_{it}^{2}\right] \right) \left( \left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \underline{\mathcal {I}}_{i}\right) ^{-1}\right) ^{\prime } \end{aligned}$$
(4)

and mean equal to

$$\begin{aligned} \left( \lim _{n\rightarrow \infty }\frac{\sqrt{n}}{\sqrt{M}}\right) \underline{B} \end{aligned}$$
(5)

where

$$\begin{aligned} \underline{B}=\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1} ^{n}\underline{\mathcal {I}}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\underline{b}_{i}\right) \end{aligned}$$
(6)

for

$$\begin{aligned} \underline{\mathcal {I}}_{i}\equiv & {} -E\left[ \frac{\partial \underline{U}_{it}\left( \theta _{0},\gamma _{i0}\right) }{\partial \theta ^{\prime }}\right] , \end{aligned}$$
(7)
$$\begin{aligned} \underline{b}_{i}=& {} \left( \underline{b}_{i,1},\ldots ,\underline{b} _{i,J}\right) ^{\prime },\nonumber \\ \underline{b}_{i,j}=& {} -\text{trace}\left( \left( E\left[ \underline{v}_{it}^{\gamma _{i}}\right] \right) ^{-1}E\left[ \underline{v} _{it}\underline{U}_{it,j}^{\gamma _{i}}\right] \right) \nonumber \\&+\frac{1}{2}\text{trace}\left( E\left[ \underline{U} _{it,j}^{\gamma _{i}\gamma _{i}}\right] \left( E\left[ \underline{v} _{it}^{\gamma _{i}}\right] \right) ^{-1}E\left[ \underline{v}_{it} \underline{v}_{it}^{\prime }\right] \left( \left( E\left[ \underline{v} _{it}^{\gamma _{i}}\right] \right) ^{-1}\right) ^{\prime }\right) , \end{aligned}$$
(8)

and \(\underline{b}_{i,j}\) and \(\underline{U}_{it,j}\) denote the j-th components of \(\underline{b}_{i}\) and \(\underline{U}_{it}\). In other words, the 1/M bias is given by the formula \(\underline{B}/M\). Here, the 1/M bias denotes the approximate bias of \(\widehat{\theta }\) implied by the asymptotic bias (5) of \(\sqrt{nM}\left( \widehat{\theta }-\theta \right) \). Because the number of fixed effects is equal to n and the sample size is equal to nM, the ratio between the “number of nuisance parameters” and the sample size is 1/M, which is of the same order of magnitude as the bias of 2SLS discussed by Nagar (1959).

Applying this result to the two-step estimation case where \(M=T\) and the fixed effects are scalars,

$$\begin{aligned} 0=& {} \sum _{t=1}^{T}v\left( z_{it},\widehat{\alpha }_{i}\right) \\ 0=& {} \sum _{i=1}^{n}\sum _{t=1}^{T}u\left( z_{it},\widehat{\theta },\widehat{\alpha }_{i}\right) \end{aligned}$$

we have the asymptotic variance of \(\sqrt{nT}\left( \widehat{\theta } -\theta \right) \) equal to

$$\begin{aligned} \left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} E\left[ U_{it}^{2}\right] \right) \left( \left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \mathcal {I}_{i}\right) ^{-1}\right) ^{\prime } \end{aligned}$$

and the approximate bias equal to

$$\begin{aligned} \frac{1}{T}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} \sum _{i=1}^{n}b_{i}\right), \end{aligned}$$
(9)

where

$$\begin{aligned} U\left( z_{it},\theta ,\alpha _{i}\right)=& {} u\left( z_{it},\theta ,\alpha _{i}\right) -\varDelta _{i}v\left( z_{it},\alpha _{i}\right), \end{aligned}$$
(10)
$$\begin{aligned} \varDelta _{i}\equiv & {} \frac{E\left[ u_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }, \end{aligned}$$
(11)
$$\begin{aligned} \mathcal {I}_{i}\equiv & {} -E\left[ \frac{\partial U_{it}}{\partial \theta ^{\prime }}\right], \end{aligned}$$
(12)
$$\begin{aligned} b_{i}=& {} -\frac{E\left[ v_{it}U_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\frac{1}{2}\frac{E\left[ U_{it}^{\alpha _{i}\alpha _{i}}\right] E\left[ v_{it}^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}}. \end{aligned}$$
(13)

Further Analysis of the Bias Formula

In this section, we analyze the formula (13) in two important models, and show that the bias formula simplifies in linear models but not in the probit model. In Sect. 5, we will use this difference to illustrate why the bias in the probit model cannot be removed by cross fitting.

Because \(E\left[ v_{it}U_{it}^{\alpha _{i}}\right] =E\left[ v_{it} u_{it}^{\alpha _{i}}\right] -\varDelta _{i}E\left[ v_{it}v_{it}^{\alpha _{i} }\right] \), and \(E\left[ U_{it}^{\alpha _{i}\alpha _{i}}\right] =E\left[ u_{it}^{\alpha _{i}\alpha _{i}}\right] -\varDelta _{i}E\left[ v_{it}^{\alpha _{i}\alpha _{i}}\right] \), we can see that the bias formula simplifies if \(\varDelta _{i}=0\) or \(v_{it}^{\alpha _{i}}\) is constant. Under either condition, we can see

$$\begin{aligned} b_{i}=-\frac{E\left[ v_{it}u_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\frac{1}{2}\frac{E\left[ u_{it}^{\alpha _{i}\alpha _{i}}\right] E\left[ v_{it}^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}}. \end{aligned}$$

The condition \(\varDelta _{i}=0\) is satisfied if \(E\left[ u_{it}^{\alpha _{i} }\right] =0\), i.e., under Neyman orthogonality. The condition that \(v_{it} ^{\alpha _{i}}\) is constant is satisfied if \(v_{it}\) is an affine function of \(\alpha _{i}\).

In order to understand these conditions, consider the panel model with n dummy IV’s

$$\begin{aligned} y_{it}=& {} x_{it}\theta +\varepsilon _{it},\nonumber \\ x_{it}=& {} \alpha _{i}+\eta _{it}. \end{aligned}$$
(14)

If our 2SLS estimator solves

$$\begin{aligned} 0=& {} \sum _{t=1}^{T}\left( x_{it}-\widehat{\alpha }_{i}\right) ,\nonumber \\ 0=& {} \sum _{i=1}^{n}\sum _{t=1}^{T}\widehat{\alpha }_{i}\left( y_{it} -x_{it}\widehat{\theta }\right) , \end{aligned}$$
(15)

we see that \(v_{it}^{\alpha _{i}}=-1\). We also see that \(E\left[ u_{it}^{\alpha _{i}}\right] =E\left[ y_{it}-x_{it}\theta \right] =E\left[ \varepsilon _{it}\right] =0\), so the condition \(\varDelta _{i}=0\) is also satisfied. The 2SLS for the pseudo-panel model is special because \(u_{it}^{\alpha _{i}\alpha _{i}}=0\). This implies that the bias formula is very simple: \(b_{i}= -E\left[ v_{it}u_{it}^{\alpha _{i} }\right] / E\left[ v_{it}^{\alpha _{i}}\right] \). This plays an important role in understanding the properties of split sample cross fitting for 2SLS.
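To spell this out (our own short calculation), (15) has \(u\left( z_{it},\theta ,\alpha _{i}\right) =\alpha _{i}\left( y_{it}-x_{it}\theta \right) \) and \(v\left( z_{it},\alpha _{i}\right) =x_{it}-\alpha _{i}\), so

$$\begin{aligned} u_{it}^{\alpha _{i}}=y_{it}-x_{it}\theta =\varepsilon _{it},\qquad u_{it}^{\alpha _{i}\alpha _{i}}=0,\qquad b_{i}=-\frac{E\left[ \left( x_{it}-\alpha _{i}\right) \varepsilon _{it}\right] }{-1}=E\left[ \eta _{it}\varepsilon _{it}\right] =\sigma _{\varepsilon \eta }, \end{aligned}$$

and the implied 1/T bias is proportional to \(\sigma _{\varepsilon \eta }\), the same covariance that drives the 2SLS bias \(k\sigma _{\varepsilon \eta }\) discussed earlier.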

We should recognize that these conditions are not satisfied for the probit model with endogenous regressors. In fact, the special nature of 2SLS, i.e., \(u_{it}^{\alpha _{i}\alpha _{i}}=0\), can be argued to be an implication of the IV type interpretation of 2SLS. If the 2SLS is instead interpreted as a regression using the fitted value from the first stage as a regressor in the second stage, we see that the 2SLS solves

$$\begin{aligned} 0=& {} \sum _{t=1}^{T}\left( x_{it}-\widehat{\alpha }_{i}\right) ,\nonumber \\ 0=& {} \sum _{i=1}^{n}\sum _{t=1}^{T}\widehat{\alpha }_{i}\left( y_{it} -\widehat{\alpha }_{i}\widehat{\theta }\right) . \end{aligned}$$
(16)

Here, we can easily see that \(u_{it}^{\alpha _{i}\alpha _{i}}\ne 0\) in general. Since the control variable approach in general requires the first stage estimate to be used as a regressor, we should expect \(u_{it}^{\alpha _{i}\alpha _{i}}\ne 0\) to be the typical case.
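Concretely (again our own calculation), the second-stage moment in (16) is

$$\begin{aligned} u\left( z_{it},\theta ,\alpha _{i}\right) =\alpha _{i}\left( y_{it}-\alpha _{i}\theta \right) ,\qquad u_{it}^{\alpha _{i}}=y_{it}-2\alpha _{i}\theta ,\qquad u_{it}^{\alpha _{i}\alpha _{i}}=-2\theta , \end{aligned}$$

which is nonzero whenever \(\theta \ne 0\).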

Split Sample Cross Fit Estimator

In this section, we use the framework of Sect. 3 and analyze the bias of the cross fit estimator after sample splitting. We assume that the sample is split into two, and the estimate of \(\alpha _{i}\) from one subsample is used in the moment u for the other subsample. In other words, in order to understand the issue, we will assume now that the data consists of

$$\begin{aligned} z_{i}=\left( z_{i1},\ldots ,z_{iT}\right) =\left( q_{i1},\ldots ,q_{iM},r_{i1},\ldots ,r_{iM}\right) , \end{aligned}$$

i.e., we will assume that \(T=2M\), and write q and r for the first and second halves of the observations. We will write the split sample cross fit estimator as

$$\begin{aligned} 0=& {} \sum _{t=1}^{M}v\left( q_{it},\widehat{\alpha }_{1,i}\right) \\ 0= & {} \sum _{t=1}^{M}v\left( r_{it},\widehat{\alpha }_{2,i}\right) \\ 0= & {} \sum _{i=1}^{n}\sum _{t=1}^{M}\left( u\left( q_{it},\widehat{\theta },\widehat{\alpha }_{2,i}\right) +u\left( r_{it},\widehat{\theta },\widehat{\alpha }_{1,i}\right) \right) \end{aligned}$$

with the recognition that \(\widehat{\alpha }_{1,i}\) and \(\widehat{\alpha } _{2,i}\) are estimators of \(\alpha _{1,i}=\alpha _{2,i}=\alpha _{i}\). In order to see the resemblance to the panel model, we will write it as

$$\begin{aligned} 0= & {} \sum _{t=1}^{M}\underline{v}_{\left( S\right) }\left( q_{it} ,r_{it},\widehat{\theta },\widehat{\gamma }_{i}\right) =\sum _{t=1}^{M}\left[ \begin{array} [c]{c} v\left( q_{it},\widehat{\alpha }_{1,i}\right) \\ v\left( r_{it},\widehat{\alpha }_{2,i}\right) \end{array} \right], \\ 0= & {} \sum _{i=1}^{n}\sum _{t=1}^{M}\underline{u}_{\left( S\right) }\left( q_{it} ,r_{it},\widehat{\theta },\widehat{\gamma }_{i}\right) =\sum _{i=1}^{n}\sum _{t=1}^{M}\left( u\left( q_{it},\widehat{\theta },\widehat{\alpha }_{2,i}\right) +u\left( r_{it},\widehat{\theta },\widehat{\alpha }_{1,i}\right) \right). \end{aligned}$$

In other words, the split sample cross fit estimator can be analyzed by adopting a perspective that the fixed effects are multidimensional. The result for the multi-dimensional fixed effects is already available from Arellano and Hahn (2016), which we will utilize here.

It can be shownFootnote 7 that the efficient score is

$$\begin{aligned} \underline{U}_{\left( S\right) }\left( q_{it},r_{it},\theta ,\alpha _{1,i},\alpha _{2,i}\right) =\left( u\left( q_{it},\theta ,\alpha _{2,i}\right) -\varDelta _{i}v\left( q_{it},\alpha _{1,i}\right) \right) +\left( u\left( r_{it},\theta ,\alpha _{1,i}\right) -\varDelta _{i}v\left( r_{it},\alpha _{2,i}\right) \right), \end{aligned}$$

where the \(\varDelta _{i}\) is identical to the one in (11). Note that at \(\alpha _{1,i}=\alpha _{2,i}=\alpha _{i}\), we see that the counterparts of \(\underline{U}\) and \(\underline{\mathcal {I}}_{i}\) are

$$\begin{aligned} \underline{U}_{\left( S\right) }\left( q_{it},r_{it},\theta ,\alpha _{1,i},\alpha _{2,i}\right)= & {} U\left( q_{it},\theta ,\alpha _{i}\right) +U\left( r_{it},\theta ,\alpha _{i}\right) ,\\ \underline{\mathcal {I}}_{\left( S\right) ,i}\equiv & {} -E\left[ \frac{\partial \left( U\left( q_{it},\theta ,\alpha _{i}\right) +U\left( r_{it},\theta ,\alpha _{i}\right) \right) }{\partial \theta }\right] =2\mathcal {I}_{i}, \end{aligned}$$

where the U and \(\mathcal {I}_{i}\) on the RHS are identical to the ones in (10) and (12). We therefore see that the asymptotic distribution of \(\sqrt{nM}\left( \widehat{\theta }-\theta \right) \) is normal with variance equal to

$$\begin{aligned} \frac{1}{2}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1} ^{n}\mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \text{Var}\left( U_{it}\right) \right) \left( \left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\mathcal {I}_{i}\right) ^{-1}\right) ^{\prime }. \end{aligned}$$

It follows that the asymptotic distribution of \(\sqrt{nT}\left( \widehat{\theta }-\theta \right) =\sqrt{n\left( 2M\right) }\left( \widehat{\theta }-\theta \right) =\sqrt{2}\sqrt{nM}\left( \widehat{\theta }-\theta \right) \) is normal with variance equal to

$$\begin{aligned} \left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\mathcal {I} _{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \text{Var}\left( U_{it}\right) \right) \left( \left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\mathcal {I}_{i}\right) ^{-1}\right) ^{\prime } \end{aligned}$$

In other words, the asymptotic variance of \(\sqrt{nT}\left( \widehat{\theta }-\theta \right) \) does not change.

As for the bias, we see that the counterpart of \(\underline{b}_{i}\) is given by

$$\begin{aligned} \underline{b}_{\left( S\right) ,i}=2\varDelta _{i}\frac{E\left[ v_{it} v_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+E\left[ U_{it}^{\alpha \alpha }\right] \frac{E\left[ v_{it}^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}} \end{aligned}$$

so that, using \(U_{it}^{\alpha _{i}}=u_{it}^{\alpha _{i}}-\varDelta _{i}v_{it}^{\alpha _{i}}\),

$$\begin{aligned} \underline{b}_{\left( S\right) ,i}=2\left( b_{i}+\frac{E\left[ v_{it}u_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }\right) \end{aligned}$$

and the implied bias is

$$\begin{aligned}&\frac{1}{M}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1} ^{n}\underline{\mathcal {I}}_{\left( S\right) ,i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\underline{b}_{\left( S\right) ,i}\right) \nonumber \\&\quad =\frac{2}{T}\left( 2\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1} ^{n}\mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\left( 2\varDelta _{i}\frac{E\left[ v_{it}v_{it}^{\alpha _{i} }\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\frac{E\left[ U_{it} ^{\alpha \alpha }\right] E\left[ v_{it}^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}}\right) \right) \nonumber \\&\quad =\frac{1}{T}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1} ^{n}\mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\left( 2\varDelta _{i}\frac{E\left[ v_{it}v_{it}^{\alpha _{i} }\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\frac{E\left[ U_{it} ^{\alpha \alpha }\right] E\left[ v_{it}^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}}\right) \right) . \end{aligned}$$
(17)

We now compare the bias of the split sample cross fit estimator with that of the full sample estimator. We first rewrite the bias (9) of the full sample plug-in estimator as

$$\begin{aligned} \frac{1}{T}\left( \lim _{n\rightarrow \infty }\frac{1}{n} {\textstyle \sum \nolimits _{i=1}^{n}} \mathcal {I}_{i}\right) ^{-1}\left( \lim _{n\rightarrow \infty }\frac{1}{n} \sum _{i=1}^{n}\left( -\frac{E\left[ v_{it}u_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\varDelta _{i}\frac{E\left[ v_{it} v_{it}^{\alpha _{i}}\right] }{E\left[ v_{it}^{\alpha _{i}}\right] }+\frac{1}{2}\frac{E\left[ U_{it}^{\alpha _{i}\alpha _{i}}\right] E\left[ v_{it} ^{2}\right] }{\left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2} }\right) \right) \end{aligned}$$
(18)

using \(U_{it}^{\alpha _{i}}=u_{it}^{\alpha _{i}}-\varDelta _{i}v_{it}^{\alpha _{i}}\). Comparing (17) with (18), we can see that the split sample cross fit affects the bias in three ways:

1. It eliminates the bias \( -E\left[ v_{it}u_{it}^{\alpha _{i} }\right] / E\left[ v_{it}^{\alpha _{i}}\right] \) due to the correlation between \(v_{it}\) and \(u_{it}^{\alpha _{i}}\).

2. It magnifies the bias \(\varDelta _{i}E\left[ v_{it}v_{it} ^{\alpha _{i}}\right] / E\left[ v_{it}^{\alpha _{i}}\right] \) due to the correlation between \(v_{it}\) and \(v_{it}^{\alpha _{i}}\) by a factor of two.

3. It magnifies the bias \( \frac{1}{2}E\left[ U_{it}^{\alpha _{i}\alpha _{i}}\right] E\left[ v_{it}^{2}\right] / \left( E\left[ v_{it}^{\alpha _{i}}\right] \right) ^{2}\) due to the variance of \(v_{it}\) by a factor of two.

This is all intuitive. The finite sample bias is due to the noise of estimating \(\alpha _{i}\), which may be correlated with the second stage moment u. The split sample cross fit estimator severs this correlation, which explains the first effect. On the other hand, the split sample estimator effectively uses half the sample size for estimation of each \(\alpha _{i}\), which leads to the second and third effects.

We saw that in the pseudo-panel 2SLS (15) with the IV interpretation, \(\varDelta _{i}=0\) and \(u_{it}^{\alpha _{i}\alpha _{i}}=0\). This implies that the bias of the full sample estimator takes the simple form \( -E\left[ v_{it}u_{it}^{\alpha _{i}}\right] / E\left[ v_{it}^{\alpha _{i}}\right] \), and it is completely eliminated by the split sample cross fit.
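The following Monte Carlo sketch (our own illustration, not taken from the paper; the design values n, T, \(\theta \), and the error covariance are arbitrary) makes this point numerically for the linear pseudo-panel model (14): the full sample plug-in estimator is biased, while the split sample cross fit is approximately unbiased.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 200, 4            # many groups, short panel: nuisance/sample ratio 1/T
theta, s_ee = 1.0, 0.8   # true coefficient; Cov(eps, eta)

full, cross = [], []
for _ in range(2000):
    alpha = rng.standard_normal((n, 1))
    eps = rng.standard_normal((n, T))
    eta = s_ee * eps + np.sqrt(1 - s_ee**2) * rng.standard_normal((n, T))
    x = alpha + eta
    y = x * theta + eps

    # Full sample plug-in: alpha_hat is the group mean, as in (15)
    a = x.mean(axis=1, keepdims=True)
    full.append((a * y).sum() / (a * x).sum())

    # Split sample cross fit: swap the half-sample group means
    M = T // 2
    a1 = x[:, :M].mean(axis=1, keepdims=True)
    a2 = x[:, M:].mean(axis=1, keepdims=True)
    num = (a2 * y[:, :M]).sum() + (a1 * y[:, M:]).sum()
    den = (a2 * x[:, :M]).sum() + (a1 * x[:, M:]).sum()
    cross.append(num / den)

print("full sample bias:", np.mean(full) - theta)
print("cross fit   bias:", np.mean(cross) - theta)
```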

Note that \(E\left[ v_{it}v_{it}^{\alpha _{i}}\right] =0\) if \(v_{it} ^{\alpha _{i}}\) is constant as in (3). Even then, we should expect that (i) the bias is not removed by the cross fitting in general, although (ii) it is removed in the special case where \(E\left[ u_{it}^{\alpha _{i}\alpha _{i}}\right] =0\).

Getting back to our panel rendition of the probit model with endogenous regressor, we see that

$$\begin{aligned} U_{it}=\left[ \begin{array} [c]{c} x_{it}m\left( z_{it},\theta ,\alpha _{i}\right) \\ \left( x_{it}-\alpha _{i}\right) m\left( z_{it},\theta ,\alpha _{i}\right) \end{array} \right] +\varDelta _{i}\left( x_{it}-\alpha _{i}\right) \end{aligned}$$

where

$$\begin{aligned} \varDelta _{i}= & {} E\left[ \begin{array} [c]{c} \partial \left( x_{it}m\left( z_{it},\theta ,\alpha _{i}\right) \right) / \partial \alpha _{i}\\ \partial \left( \left( x_{it}-\alpha _{i}\right) m\left( z_{it},\theta ,\alpha _{i}\right) \right) / \partial \alpha _{i} \end{array} \right] \\= & {} E\left[ \begin{array} [c]{c} x_{it}\left( \partial m\left( z_{it},\theta ,\alpha _{i}\right) / \partial \alpha _{i}\right) \\ \left( x_{it}-\alpha _{i}\right) \left( \partial m\left( z_{it},\theta ,\alpha _{i}\right) / \partial \alpha _{i}\right) -m\left( z_{it},\theta ,\alpha _{i}\right) \end{array} \right] \\= & {} E\left[ \begin{array} [c]{c} x_{it}\phi \left( x_{it}\theta +\rho \eta _{it}\right) \\ \left( x_{it}-\alpha _{i}\right) \phi \left( x_{it}\theta +\rho \eta _{it}\right) \end{array} \right] \rho , \end{aligned}$$

where we use

$$\begin{aligned} E\left[ \left. \frac{\partial m\left( z_{it},\theta ,\alpha _{i}\right) }{\partial \alpha _{i}}\right| x_{it},\eta _{it}\right] =\phi \left( x_{it}\theta +\rho \eta _{it}\right) \rho. \end{aligned}$$

It can be seen that \(U_{it}^{\alpha _{i}\alpha _{i}}\ne 0\), so we cannot expect the cross fitting estimator to remove the bias.

Note that the probit model is just one example where the control variable is used as part of a nonlinear regression. We should therefore expect that (i) control variable based estimators have the many IV problem, and (ii) the problem is not solved by cross fitting. (In fact, even in the pseudo-panel 2SLS (15) with the regression interpretation, the bias due to \(u_{it}^{\alpha _{i}\alpha _{i}}\) would not be eliminated by the split sample cross fit.)
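The point can be checked numerically. The following Monte Carlo sketch (again our own illustration, not the paper's experiment; n, T, and the parameter values are arbitrary) estimates the pseudo-panel probit (3) by the two-step control variable method, constructing the control variable either from full sample group means or by swapping half-sample means:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n, T, M = 500, 4, 2                  # many groups, short panel, half length M
delta, rho = 1.0, 0.5                # true (normalized) parameters

def probit(y, X):
    """Probit MLE by direct likelihood maximization."""
    def nll(b):
        idx = X @ b
        return -np.sum(y * norm.logcdf(idx) + (1 - y) * norm.logcdf(-idx))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

full, cross = [], []
for _ in range(200):                 # few replications, to keep the sketch quick
    alpha = rng.standard_normal((n, 1))
    eta = rng.standard_normal((n, T))
    x = alpha + eta
    zeta = rng.standard_normal((n, T))
    y = (x * delta + rho * eta + zeta >= 0).astype(float)

    # Full sample control variable: residual against the group mean
    v = x - x.mean(axis=1, keepdims=True)
    full.append(probit(y.ravel(), np.column_stack([x.ravel(), v.ravel()]))[0])

    # Cross fit: residual against the other half's group mean
    a1 = x[:, :M].mean(axis=1, keepdims=True)
    a2 = x[:, M:].mean(axis=1, keepdims=True)
    v_cf = np.hstack([x[:, :M] - a2, x[:, M:] - a1])
    cross.append(probit(y.ravel(), np.column_stack([x.ravel(), v_cf.ravel()]))[0])

print("full sample bias in delta:", np.mean(full) - delta)
print("cross fit   bias in delta:", np.mean(cross) - delta)
```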

Modified Objective Function for the Second Step

In this section, we review the panel literature and discuss a method of bias removal in the context of control variable estimation, which suggests how the bias can be corrected in principle. One can conjecture with high confidence that the bias can be removed by traditional methods of bias correction such as the jackknife,Footnote 8 but it may be useful to find a simple alternative to these computationally intensive procedures. The panel literature has discussed various methods of bias correction in the recent past, so one can imagine that they would work, with some modifications, even in non-panel settings. It is not clear how to frame an asymptotic sequence of models such that the biases in non-panel models with a large number of nuisance parameters can be easily understood and corrected, and we leave this for future research.

Consider the moment (16) for the linear model, with the twist that the fitted value from the first stage is used as a regressor in the second stage. In particular, assume that \(x_{it}=\alpha _{i}+\eta _{it}\), which implies that

$$\begin{aligned} v\left( z_{it},\alpha _{i}\right) =x_{it}-\alpha _{i} \end{aligned}$$

and that the bias formula (13) takes the form

$$\begin{aligned} b_{i}=E\left[ \eta _{it}u_{it}^{\alpha _{i}}\right] +\frac{1}{2}E\left[ u_{it}^{\alpha _{i}\alpha _{i}}\right] E\left[ \eta _{it}^{2}\right] . \end{aligned}$$

If we further assume that \(\eta _{it}\) is i.i.d. over i and t, the formula further simplifies to

$$\begin{aligned} b_{i}=E\left[ \eta _{it}u_{it}^{\alpha _{i}}\right] +\frac{\sigma _{\eta }^{2} }{2}E\left[ u_{it}^{\alpha _{i}\alpha _{i}}\right] . \end{aligned}$$

In Sect. 5, we saw that the term \(E\left[ \eta _{it} u_{it}^{\alpha _{i}}\right] \) can be eliminated by sample split cross fit, but the second term actually gets magnified.

We consider changing the moment equation altogether, adopting the proposal of Arellano and Hahn (2007) to correct the bias of the moment equation. For this purpose, we assume that the moment u is obtained by maximizing some objective function

$$\begin{aligned} \sum _{i}\sum _{t}\psi \left( z_{it},\theta ,\widehat{\alpha }_{i}\right) \end{aligned}$$

with respect to \(\theta \), i.e., assume that

$$\begin{aligned} u\left( z_{it},\theta ,\alpha _{i}\right) =\frac{\partial \psi \left( z_{it},\theta ,\alpha _{i}\right) }{\partial \theta }. \end{aligned}$$

We then have

$$\begin{aligned} \frac{\partial u\left( z_{it},\theta ,\alpha _{i}\right) }{\partial \alpha _{i}}= & {} \frac{\partial }{\partial \theta }\left( \frac{\partial \psi \left( z_{it},\theta ,\alpha _{i}\right) }{\partial \alpha _{i}}\right) \\ \frac{\partial ^{2}u\left( z_{it},\theta ,\alpha _{i}\right) }{\partial \alpha _{i}^{2}}= & {} \frac{\partial }{\partial \theta }\left( \frac{\partial ^{2}\psi \left( z_{it},\theta ,\alpha _{i}\right) }{\partial \alpha _{i}^{2} }\right). \end{aligned}$$

This suggests that we can adopt the proposal in Arellano and Hahn (2007), and consider maximizing

$$\begin{aligned} \sum _{i}\left( \sum _{t}\psi \left( z_{it},\theta ,\widehat{\alpha }_{i}\right) -\frac{1}{T}\sum _{t}\left( v\left( z_{it},\widehat{\alpha }_{i}\right) \frac{\partial \psi \left( z_{it},\theta ,\widehat{\alpha }_{i}\right) }{\partial \alpha _{i}}+\frac{\widehat{\sigma }_{\eta }^{2}}{2}\frac{\partial ^{2}\psi \left( z_{it},\theta ,\widehat{\alpha }_{i}\right) }{\partial \alpha _{i}^{2}}\right) \right) , \end{aligned}$$

where \(\widehat{\sigma }_{\eta }^{2}=\frac{1}{nT}\sum _{i=1}^{n}\sum _{t=1}^{T}\left( x_{it}-\widehat{\alpha }_{i}\right) ^{2}\) with the corresponding moment equationFootnote 9

$$\begin{aligned} 0=\sum _{i}\left( \sum _{t}u\left( z_{it},\widehat{\theta },\widehat{\alpha }_{i}\right) -\frac{1}{T}\sum _{t}\left( v\left( z_{it},\widehat{\alpha } _{i}\right) u^{\alpha _{i}}\left( z_{it},\widehat{\theta },\widehat{\alpha }_{i}\right) +\frac{\widehat{\sigma }_{\eta }^{2}}{2}u^{\alpha _{i}\alpha _{i} }\left( z_{it},\widehat{\theta },\widehat{\alpha }_{i}\right) \right) \right) . \end{aligned}$$
(19)
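As a rough illustration of how the corrected objective above might be evaluated, here is a schematic implementation (ours; the callables psi and v and the toy usage at the end are hypothetical placeholders, and the derivatives in \(\alpha \) are taken by finite differences purely as a computational shortcut):

```python
import numpy as np

def corrected_objective(theta, z, alpha_hat, sigma2_eta, psi, v, h=1e-4):
    """Bias-corrected objective in the display above, evaluated numerically.

    psi(z_i, theta, a) and v(z_i, a) are user-supplied callables returning
    length-T arrays; central finite differences approximate the alpha
    derivatives of psi (a computational shortcut, not part of the proposal).
    """
    n, T = z.shape
    total = 0.0
    for i in range(n):
        a = alpha_hat[i]
        psi0 = psi(z[i], theta, a)
        psi_p = psi(z[i], theta, a + h)
        psi_m = psi(z[i], theta, a - h)
        dpsi = (psi_p - psi_m) / (2 * h)              # d psi / d alpha
        d2psi = (psi_p - 2 * psi0 + psi_m) / h**2     # d^2 psi / d alpha^2
        corr = (v(z[i], a) * dpsi + 0.5 * sigma2_eta * d2psi).sum() / T
        total += psi0.sum() - corr
    return total

# Toy usage with an illustrative quadratic psi (hypothetical, for shape only)
rng = np.random.default_rng(5)
z = rng.standard_normal((50, 8))
a_hat = z.mean(axis=1)
val = corrected_objective(0.5, z, a_hat, 1.0,
                          psi=lambda zi, th, a: -(zi - a * th) ** 2 / 2,
                          v=lambda zi, a: zi - a)
```

One would then maximize the returned value over \(\theta \), which is equivalent to solving the moment equation (19).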

Summary

Using a pseudo-panel model, we have demonstrated that the control variable approach is subject to the many instrument problem, since it uses the predicted value of the endogenous variable. It is essentially the same bias problem analyzed by Cattaneo et al. (2019), who advocated the use of the jackknife to remove the higher order bias. It would be interesting to develop a method of analytic bias correction in the non-panel setting, which we leave for future research.