Abstract
The control function approach is an econometric method used to correct for biases that arise as a consequence of selection and/or endogeneity. It is the leading approach for dealing with selection bias in the correlated random coefficients model. The basic idea of the method is to model the dependence of the variables not observed by the analyst on the observables in a way that allows us to construct a function K such that, conditional on the function, the endogeneity problem (relative to the object of interest) disappears.
Keywords
- Average treatment effect
- Control functions
- Endogeneity
- Identification
- Instrumental variables
- Roy model
- Selection bias
The control function approach is an econometric method used to correct for biases that arise as a consequence of selection and/or endogeneity. It is the leading approach for dealing with selection bias in the correlated random coefficients model (see Heckman and Robb 1985, 1986; Heckman and Vytlacil 1998; Wooldridge 1997, 2003; Heckman and Navarro 2004), but it can be applied in more general semiparametric settings (see Newey et al. 1999; Altonji and Matzkin 2005; Chesher 2003; Imbens and Newey 2006; Florens et al. 2007).
The basic idea behind the control function methodology is to model the dependence of the variables not observed by the analyst on the observables in a way that allows us to construct a function K such that, conditional on the function, the endogeneity problem (relative to the object of interest) disappears.
In this article I deal exclusively with the problem of identification. That is, I assume access to data on an arbitrarily large population. As a consequence, I do not discuss estimation, standard errors or inference. In the examples, I analyse how to recover parameters in a way that, I hope, shows directly how to perform estimation via sample analogues.
The Set-Up
The general set-up I consider is the following two-equation structural model; an outcome equation:
\( Y=g\left(X,D,\varepsilon \right) \quad (1) \)
and an equation describing the mechanism assigning values of D to individuals:
\( D=h\left(X,Z,\nu \right) \quad (2) \)
where X and Z are vectors of observed random variables, D is a (possibly vector valued) observed random variable, and ε and ν are general disturbance vectors not independent of each other but satisfying some form of independence of X and Z.
The problem of endogeneity arises because D is correlated with ε via the dependence between ε and ν. Because Eq. (2) represents an assignment mechanism in many economic models, it is generically called the ‘selection’ or ‘choice’ equation. This set-up has been applied to problems like earnings and schooling (Willis and Rosen 1979; Cunha et al. 2005), wages and sectoral choice (Heckman and Sedlacek 1985) and production functions and productivity (Olley and Pakes 1996), among others.
The goal of the analysis is to recover some functional of g(X, D, ε) of interest
\( a\left(X,D\right) \quad (3) \)
that cannot be recovered in a straightforward way because of the endogeneity/selection problem. As an example, when D is binary, interest sometimes centres on the effect of going from D = 0 to D = 1 for an individual chosen at random from the population, the so-called average treatment effect:
\( ATE(X)=E\left(g\left(X,1,\varepsilon \right)-g\left(X,0,\varepsilon \right)|X\right). \)
The key to the control function approach is to notice that (conditional on X, Z) the only source of dependence is given by the relation between ε and ν. If ν were known, we could condition on it and analyse Eq. (1) without having to worry about endogeneity. The main idea is therefore to recover some function of ν via its relationship with the model observables, so that we can condition on it and solve the endogeneity problem.
Definition
The control function approach proposes a function K (the control function) that allows us to recover a(X, D), where K satisfies
A-1. K is a function of X, Z, D.
A-2. ε satisfies some form of independence of D conditional on ρ (X, K), with ρ a knowable function.
A-3. K is identified.
Assumption A-2 is the key assumption of the approach. It states that, once we condition on K, the dependence between ε and D (that is, the endogeneity) is no longer a problem. To help fix ideas, consider the following example of a simple linear-in-parameters, additively separable version of the model of Eqs. (1) and (2).
Example 1
Linear regression with constant effects. Write the outcome Eq. (1) as
\( Y=X\beta +D\alpha +\varepsilon \)
and assume that our object of interest (3) is α. Assume that we can write Eq. (2) as
\( D=X\delta +Z\gamma +\nu \quad (4) \)
with ν, ε ⊥⊥ X, Z where ⊥⊥ denotes statistical independence. Such a model arises, for example, if Y is log earnings and D is years of schooling as in Heckman et al. (2003). If ability is unobservable, then, since high ability is associated with higher earnings but also with more schooling, ε and ν would be correlated.
If we let K = ν be the residual of the regression in (4), then we can recover α from the following regression, in which ψ denotes the coefficient on the control function,
\( Y=X\beta +D\alpha +K\psi +\eta \)
where it follows that E(η|X, K) = 0. It is easy to show that in this case the control function estimator and the two-stage least squares estimator are equivalent. (To my knowledge, although in a different context – a SUR model – Telser 1964, was the first to use the residuals from other equations as regressors in the equation of interest.)
The previous case is a simple example of a control function where K = D − E(D|X, Z). In this case, because of the constant effects assumption (that is, α is not random), standard instrumental variables methods and the control function approach coincide. In general, this is not the case.
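As a rough numerical illustration of Example 1, the following Python sketch simulates a linear constant-effects model and compares ordinary least squares, the control function regression and two-stage least squares. The data-generating values, the sample size and the helper `ols` are assumptions made only for this illustration; they are not part of the original entry.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
X = rng.normal(size=n)
Z = rng.normal(size=n)                    # excluded variable in the choice equation
nu = rng.normal(size=n)                   # choice-equation disturbance
eps = 0.8 * nu + rng.normal(size=n)       # outcome disturbance, correlated with nu
alpha, beta = 1.5, 0.5
D = 0.3 * X + 1.0 * Z + nu                # linear version of Eq. (2)
Y = beta * X + alpha * D + eps            # linear version of Eq. (1)


def ols(y, W):
    """Least-squares coefficients of y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]


ones = np.ones(n)
first_stage = np.column_stack([ones, X, Z])
D_hat = first_stage @ ols(D, first_stage)   # sample analogue of E(D | X, Z)
K = D - D_hat                               # control function: first-stage residual

# Coefficient on D under three strategies.
alpha_ols = ols(Y, np.column_stack([ones, X, D]))[2]        # ignores endogeneity
alpha_cf = ols(Y, np.column_stack([ones, X, D, K]))[2]      # adds K as a regressor
alpha_2sls = ols(Y, np.column_stack([ones, X, D_hat]))[2]   # replaces D by D_hat

print(f"OLS {alpha_ols:.3f} | control function {alpha_cf:.3f} | 2SLS {alpha_2sls:.3f}")
```

In this design the control function and two-stage least squares coefficients agree up to floating-point error, because the first-stage residual is orthogonal to the fitted values and to X, while the regression that ignores K is biased away from the assumed α.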
In the next section I describe in detail the control function methodology for the binary choice case (Roy 1951). This case is interesting both because it is the workhorse of the policy evaluation literature and because, by virtue of its nonlinearity, it highlights the implications of a nonlinear structure in a relatively simple context. I then briefly describe extensions to more general cases. For simplicity, I focus on the additively separable in unobservables case, but recent research provides generalizations to non-additive functions (see Blundell and Powell 2003; Imbens and Newey 2006, among others).
The Case of a Binary Endogenous Variable
In this section I describe how the control function approach solves the selection/ endogeneity problem when the endogenous variable is binary. This problem has a long tradition in economics going back (at least) to Roy (1951). In Roy’s original version of the model (see Roy model) an individual is deciding whether to become a fisherman (D = 0) or a hunter (D = 1).
Associated with each occupation is a payoff \( Y_j=g_j(X)+\varepsilon_j \), j = 0, 1. Since we can only observe individuals in one sector at a time, the observed outcome for an individual is given by Y1 if he becomes a hunter (D = 1) and by Y0 if he becomes a fisherman (D = 0). That is, the observed outcome (Y) can be written as:
\( Y=D{Y}_1+\left(1-D\right){Y}_0 \quad (5) \)
The model is closed by assuming that individuals choose the occupation with the highest payoff. That is,
\( D=1\left({Y}_1-{Y}_0>0\right)=1\left({g}_1(X)-{g}_0(X)+{\varepsilon}_1-{\varepsilon}_0>0\right) \quad (6) \)
where 1(a) is an indicator function that takes value 1 if a is true and 0 if it is false. Endogeneity arises because the error term in choice Eq. (6) contains the same random variables as the outcome Eq. (5). A generalized version of the model replaces the simple income maximization rule in (6) with a more general decision rule
\( D=1\left(h\left(X,Z\right)-\nu >0\right) \quad (7) \)
The model described by Eqs. (5) and (7) is general enough to be used in many different cases. Many questions of interest in economics fit this framework if, instead of thinking of two sectors, fishing and hunting, we think of two generic potential states, the treated state (D = 1) and the untreated state (D = 0) with their associated potential outcomes. The decision rule in (7) is general enough to capture not only income maximization but also utility maximization and even a deciding actor different from the agent directly affected by the outcomes (parents deciding for their children, for example). The simple income maximization rule in (6) shows why, in general, if ε1 ≠ ε0 then ε1 − ε0 is likely to be correlated with D.
The correlated random coefficients model is a special case of the model described by (5) and (7) when ε1 − ε0 is not independent of D and gj(X) = αj + Xβ for j = 0, 1. (For simplicity I assume β1 = β0 = β. The case where β1 ≠ β0 follows directly.) To see why, simply rewrite (5) as
\( Y={\alpha}_0+X\beta +\left({\alpha}_1-{\alpha}_0+{\varepsilon}_1-{\varepsilon}_0\right)D+{\varepsilon}_0 \quad (8) \)
so that now the coefficient on D is (a) random and (b) correlated with D. In this case we have that the gains from treatment (α1 − α0 + ε1 − ε0) are heterogeneous (that is, they are not constant even after controlling for X) and they are correlated with D. I come back to this special linear in parameters case in Example 2.
Though other parameters of interest can be defined, I consider the case in which we are interested in the two particular functionals that receive the most attention in the evaluation literature – the average treatment effect and the average effect of treatment on the treated. I impose that ε1, ε0, ν are absolutely continuous with finite means, and that ε1, ε0, ν ⊥⊥ X, Z. (One could weaken the assumption to be ε1, ε0 ⊥⊥ X|Z and ν ⊥⊥ X, Z.)
Under these assumptions the average treatment effect is given by
\( ATE(X)=E\left({Y}_1-{Y}_0|X\right)={g}_1(X)-{g}_0(X)={\alpha}_1-{\alpha}_0 \)
where the last equality follows if Eq. (8) applies. ATE(X) is of interest to answer questions like the average effect of a policy that is mandatory, for example. When receipt of treatment is not mandatory or randomly assigned, the average effect of treatment among those individuals who are selected into treatment is commonly the functional of interest (see Heckman 1997; Heckman and Smith 1998). This effect is measured by the average effect of treatment on the treated:
\( TT(X)=E\left({Y}_1-{Y}_0|X,D=1\right)={g}_1(X)-{g}_0(X)+E\left({\varepsilon}_1-{\varepsilon}_0|X,D=1\right)={\alpha}_1-{\alpha}_0+E\left({\varepsilon}_1-{\varepsilon}_0|X,D=1\right) \)
where the last equality follows for the linear in parameters case of Eq. (8).
Now, suppose we ignored the endogeneity problem and attempted to recover either of these objects from the data on outcomes at hand. In particular, if we used the (observed) conditional means of the outcome
\( E\left(Y|X,D=1\right)={g}_1(X)+E\left({\varepsilon}_1|X,D=1\right),\kern1em E\left(Y|X,D=0\right)={g}_0(X)+E\left({\varepsilon}_0|X,D=0\right) \)
we would not recover either ATE(X) or TT(X), because the conditional means of the disturbances are in general not zero. Notice too that, since the endogenous variable D is binary, we cannot directly recover ν and use it as a control as we did in the linear case of Example 1 above. Instead, we can recover a function of ν that satisfies the definition of a control function.
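Let Fν() denote the cumulative distribution function of ν. To form the control function in this case, first take Eq. (7) and write the choice probability
\( \Pr \left(D=1|X=x,Z=z\right)=\Pr \left(\nu <h\left(x,z\right)|X=x,Z=z\right) \)
which under our assumptions implies
\( P\left(x,z\right)={F}_{\nu}\left(h\left(x,z\right)\right). \)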
Following the analysis in Matzkin (1992), we can recover both h(x, z) and Fν() nonparametrically up to normalization.
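Next, take the conditional (on X, Z) expectation of the outcome for the treated group
\( E\left(Y|X=x,Z=z,D=1\right)={g}_1(x)+E\left({\varepsilon}_1|\nu <h\left(x,z\right)\right). \)
We can write the last term as
\( E\left({\varepsilon}_1|\nu <h\left(x,z\right)\right)=\frac{1}{F_{\nu}\left(h\left(x,z\right)\right)}{\int}_{-\infty}^{\infty }{\int}_{-\infty}^{h\left(x,z\right)}{\varepsilon}_1{f}_{\varepsilon_1,\nu}\left({\varepsilon}_1,\nu \right)d\nu\, d{\varepsilon}_1, \)
where \( {f}_{\varepsilon_1,\nu} \) denotes the joint density of (ε1, ν). That is, we can write it as a function of the known h(x, z) or, equivalently, as a function of the probability of selection P(x, z),
\( E\left(Y|X=x,Z=z,D=1\right)={g}_1(x)+{K}_1\left(P\left(x,z\right)\right) \)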
where K1(P(X, Z)) satisfies our definition of a control function. So, provided that we can vary K1(P(X, Z)) independently of g1(X), we can recover g1(X) up to a constant. We can identify the constant in a limit set such that P → 1 since \( \lim_{P\to 1}{K}_1(P)=0 \). Provided that we have enough support in the probability of treatment, that is, provided that some people choose treatment with probability arbitrarily close to 1, we can recover the constant. (See Example 2.) Using the same argument we can form
\( E\left(Y|X=x,Z=z,D=0\right)={g}_0(x)+{K}_0\left(P\left(x,z\right)\right) \)
and identify g0(X) (up to a constant) and the control function K0(P(X, Z)). As before, we can recover the constant in g0(X) by noting that \( \lim_{P\to 0}{K}_0(P)=0 \).
Intuitively, we need to be able to vary the K1(P(X, Z)) function relative to the g1(X) function so that we can identify them from the observed variation in Y1. One possibility is to impose that g1 and K1 are measurably separated functions. (That is, if g1(X) = K1(P(X, Z)) almost surely, then g1(X) is a constant almost surely; see Florens et al. 1990.) The simplest way to satisfy this restriction is by exclusion. That is, if K1(P(X, Z)) is a nontrivial function of Z conditional on X and Z shows enough variation, we can vary the K1 function by varying Z while keeping g1(X) constant. Another related possibility is to assume that g1 and K1 live in different function spaces. For example, g1 could be a linear function and K1 the nonlinear Mills ratio term that results from assuming that (ε0, ε1, ν) are jointly normal, as in the original Heckman (1979) selection correction model.
Once we have recovered g0(X), g1(X), K0(P(X, Z)) and K1(P(X, Z)), we can form our parameters of interest. Given g0(X) and g1(X), ATE(X) = g1(X) − g0(X) immediately follows. To recover TT(X), first notice that, by the law of iterated expectations,
\( 0=E\left({\varepsilon}_0|X=x,Z=z\right)=E\left({\varepsilon}_0|X=x,Z=z,D=1\right)P\left(x,z\right)+E\left({\varepsilon}_0|X=x,Z=z,D=0\right)\left(1-P\left(x,z\right)\right) \)
where P(x, z) is known from our analysis above and E(ε0| X = x, Z = z, D = 0) = K0(P(x, z)). Rewriting the expression above we get \( E\left({\varepsilon}_0|X=x,Z=z,D=1\right)=-\frac{K_0\left(P\left(x,z\right)\right)\left(1-P\left(x,z\right)\right)}{P\left(x,z\right)} \). With this expectation in hand we can recover \( TT\left(X,Z\right)={g}_1(X)-{g}_0(X)+{K}_1\left(P\left(X,Z\right)\right)+\frac{K_0\left(P\left(X,Z\right)\right)\left(1-P\left(X,Z\right)\right)}{P\left(X,Z\right)} \). By integrating against the appropriate distribution, we can recover \( TT(X)=\int TT\left(X,z\right)d{F}_{Z|X,D=1}(z) \).
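The step from K0 to the mean of ε0 among the treated can be checked numerically. The Python sketch below simulates a single (x, z) cell of a hypothetical Roy-type model, treats g1 and g0 as already recovered, and uses the iterated-expectations restriction 0 = E(ε0|D = 1)P + K0(1 − P) to back out E(ε0|D = 1) and TT(x, z). All numerical values and variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# One (x, z) cell of a hypothetical Roy-type model; every number is an assumption.
g1, g0, h = 2.0, 1.0, 0.4                  # g1(x), g0(x) and h(x, z)
nu = rng.normal(size=n)
eps1 = -0.6 * nu + rng.normal(size=n)
eps0 = 0.5 * nu + rng.normal(size=n)
D = h - nu > 0                             # decision rule D = 1(h - nu > 0)
Y = np.where(D, g1 + eps1, g0 + eps0)      # observed outcome

# Objects available from observables within the cell.
P = D.mean()                               # P(x, z)
K1 = Y[D].mean() - g1                      # E(eps1 | D = 1), g1 treated as known
K0 = Y[~D].mean() - g0                     # E(eps0 | D = 0), g0 treated as known

# Iterated expectations: 0 = E(eps0 | D = 1) * P + K0 * (1 - P).
e0_treated_implied = -K0 * (1.0 - P) / P
tt_from_controls = g1 - g0 + K1 - e0_treated_implied

# Infeasible benchmarks that use the simulated unobservables directly.
e0_treated_direct = eps0[D].mean()
tt_direct = (g1 + eps1[D] - g0 - eps0[D]).mean()

print(f"E(eps0 | D=1): implied {e0_treated_implied:.3f}, direct {e0_treated_direct:.3f}")
print(f"TT(x, z):      implied {tt_from_controls:.3f}, direct {tt_direct:.3f}")
```

The implied and direct quantities differ only by sampling error in the mean of ε0, which is the sample counterpart of the zero-mean condition used in the derivation.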
The following example shows how the control function methodology can be applied to recover average effects of treatment in a linear in parameters model with correlated random coefficients. This model arises when there are unobservable gains that vary over individuals and these gains are correlated with the choice of treatment (that is, when there is essential heterogeneity; see Heckman et al. 2006; Basu et al. 2006). The Roy model of Eqs. (5) and (6), in which the unobservable individual gains (ε1 − ε0) are correlated with the choice of sector, is an example of this case.
Example 2
Correlated random coefficients with binary treatment. Assume we can write the outcome equations in linear in parameters form,
\( {Y}_j={\alpha}_j+X{\beta}_j+{\varepsilon}_j,\kern1em j=0,1. \)
Let D be an indicator of whether an individual receives treatment (D = 1) or not (D = 0). We also write a linear in parameters decision rule:
\( D=1\left(X\delta +Z\gamma -\nu >0\right). \)
From the analysis in Manski (1988) we can recover δ, γ and Fν (up to scale). With P(x, z) = Pr(D = 1|X = x, Z = z) in hand, we then form
\( {Y}_j={\alpha}_j+X{\beta}_j+{K}_j\left(P\left(X,Z\right)\right)+{\eta}_j \)
where E(ηj| X = x, Kj(P(X, Z)) = kj) = 0. To emphasize the problem of identification of the constant αj we can rewrite the outcome as
\( {Y}_j={\tau}_j+X{\beta}_j+{\tilde{K}}_j\left(P\left(X,Z\right)\right)+{\eta}_j \)
where \( {K}_j\left(P\left(X,Z\right)\right)={\kappa}_j+{\tilde{K}}_j\left(P\left(X,Z\right)\right) \) and τj = αj + κj.
The elements of the outcome equations can be recovered by various methods. One could, for example, use Robinson (1988) and use residualized nonparametric regressions to recover βj, τj and Kj(P(X, Z)). Alternatively, one could approximate Kj(P(X, Z)) with a polynomial in P(X, Z). In this case we would have
\( {Y}_j={\tau}_j+X{\beta}_j+{\sum}_{i=1}^n{\pi}_{ji}P{\left(X,Z\right)}^i+{\eta}_j \)
where \( {\tilde{K}}_j\left(P\left(X,Z\right)\right)={\sum}_{i=1}^n{\pi}_{ji}P{\left(X,Z\right)}^i \). When j = 0, \( \lim_{P\to 0}{K}_0(P)=0 \) and it follows that \( {\tilde{K}}_0(P)={K}_0(P) \) and τ0 = α0. For the treated case (j = 1) we have that \( \lim_{P\to 1}{K}_1\left(P\left(X,Z\right)\right)=0 \). Since \( {\tilde{K}}_1(1)={\sum}_{i=1}^n{\pi}_{1i} \), it follows that \( {\kappa}_1=-{\sum}_{i=1}^n{\pi}_{1i} \) and, because τ1 = α1 + κ1, \( {\alpha}_1={\tau}_1+{\sum}_{i=1}^n{\pi}_{1i} \).
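To make the mechanics of Example 2 concrete, here is a minimal Python sketch. It departs from the polynomial approximation discussed above in two labelled ways: it assumes (ε0, ε1, ν) are jointly normal, so that the control functions take the closed Mills ratio form of the Heckman (1979) case, and it takes the index Xδ + Zγ as already recovered rather than estimating it. Parameter values, sample size and the helper `ols` are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical parameter values, chosen only for illustration.
alpha1, alpha0, beta = 2.0, 1.0, 0.5
delta, gamma = 0.5, 1.0
X = rng.normal(size=n)
Z = rng.normal(size=n)
nu = rng.normal(size=n)
eps1 = -0.6 * nu + rng.normal(scale=0.5, size=n)
eps0 = 0.5 * nu + rng.normal(scale=0.5, size=n)

h = delta * X + gamma * Z                   # index of the decision rule
D = h - nu > 0
Y = np.where(D, alpha1 + beta * X + eps1, alpha0 + beta * X + eps0)

# First step taken as given: with nu ~ N(0, 1), P(x, z) = Phi(h(x, z)).
Phi = 0.5 * (1.0 + np.vectorize(math.erf)(h / math.sqrt(2.0)))
phi = np.exp(-0.5 * h**2) / math.sqrt(2.0 * math.pi)
lam1 = -phi / Phi                           # E(nu | nu < h), vanishes as P -> 1
lam0 = phi / (1.0 - Phi)                    # E(nu | nu >= h), vanishes as P -> 0


def ols(y, W):
    """Least-squares coefficients of y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]


# Outcome regressions with the control function added, by treatment state.
b1 = ols(Y[D], np.column_stack([np.ones(D.sum()), X[D], lam1[D]]))
b0 = ols(Y[~D], np.column_stack([np.ones((~D).sum()), X[~D], lam0[~D]]))

ate_cf = b1[0] - b0[0]                      # alpha1 - alpha0
naive = Y[D].mean() - Y[~D].mean()          # ignores selection

# TT: ATE + K1(P) + K0(P)(1 - P)/P, averaged over the treated.
K1, K0 = b1[2] * lam1, b0[2] * lam0
tt_cf = ate_cf + np.mean((K1 + K0 * (1.0 - Phi) / Phi)[D])
tt_direct = np.mean((alpha1 + eps1 - alpha0 - eps0)[D])   # infeasible benchmark

print(f"ATE: control function {ate_cf:.3f}, naive contrast {naive:.3f}")
print(f"TT:  control function {tt_cf:.3f}, direct {tt_direct:.3f}")
```

Because both Mills ratio terms vanish in their respective limits, the intercepts of the two regressions estimate α1 and α0 directly in this parametric case; with a polynomial in P the constant would instead be recovered through the limit argument described in the text.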
Extensions for a Continuous Endogenous Variable
In this section I briefly review the use of the control function approach for the case in which the endogenous variable D is continuous and we assume that X, Z ⊥⊥ ε, ν. Following Blundell and Powell (2003) I assume that the object of interest is the average structural function
\( \int g\left(X,D,\varepsilon \right)d{F}_{\varepsilon}\left(\varepsilon \right) \)
which, in the additively separable case g(X, D, ε) = μ(X, D) + ε is simply the regression function μ(X, D).
If we assume that the choice equation
\( D=h\left(X,Z,\nu \right) \)
is strictly monotonic in ν (which would follow automatically if it were additively separable in ν), we can recover h() and Fν from the analysis of Matzkin (2003) up to normalization. A convenient normalization is to assume that ν ∼ Uniform (0,1) in which case we can directly recover ν from the quantiles of Fν, but other normalizations are possible. From the independence assumption it follows that E(ε|X, D, Z) = E(ε|ν), so we can write the outcome equation as
\( E\left(Y|X,D,Z\right)=\mu \left(X,D\right)+E\left(\varepsilon |\nu \right) \)
which allows us to recover μ(X, D) directly (up to normalization). In the additively separable case we analyse, we can relax the full independence assumption and instead assume directly that the weaker mean independence assumption E(ε|X, D, Z) = E(ε|ν) holds.
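A small simulation may help to see how conditioning on ν recovers μ in the continuous case. The Python sketch below assumes an additively separable choice equation, so that ν can be recovered as a first-stage residual rather than through the uniform-quantile normalization, and it assumes that both μ(D) and E(ε|ν) are quadratic so that a low-order polynomial regression is exact. All functional forms and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical design; every functional form and number is an assumption.
Z = rng.uniform(0.0, 2.0, size=n)
nu = rng.normal(size=n)
D = 0.5 + 1.0 * Z + nu                              # additively separable choice equation
eps = 0.5 * nu + (nu**2 - 1.0) + rng.normal(scale=0.5, size=n)   # E(eps | nu) is nonlinear
Y = 1.0 + 2.0 * D - 0.5 * D**2 + eps                # mu(D) = 1 + 2 D - 0.5 D^2


def ols(y, W):
    """Least-squares coefficients of y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]


ones = np.ones(n)

# Step 1: recover nu (up to sampling error) as the residual of D on Z.
first_stage = np.column_stack([ones, Z])
V = D - first_stage @ ols(D, first_stage)

# Step 2: regress Y on a basis for mu(D) and a basis for E(eps | nu).
# The squared term is centred so that the control function has mean zero,
# which is the normalization that pins down the constant in mu.
V2 = V**2 - np.mean(V**2)
cf = ols(Y, np.column_stack([ones, D, D**2, V, V2]))

# Benchmark that ignores endogeneity.
naive = ols(Y, np.column_stack([ones, D, D**2]))

print("control function, coefficients of mu(D):", np.round(cf[:3], 3))
print("naive regression, coefficients of mu(D):", np.round(naive, 3))
```

The first three control function coefficients approximate the assumed μ(D), while the naive regression attributes part of E(ε|ν) to D. In a less parametric application the two polynomial bases would be replaced by flexible series or kernel methods.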
Bibliography
Altonji, J.G., and R.L. Matzkin. 2005. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica 73: 1053–1102.
Basu, A., J.J. Heckman, S. Navarro, and S. Urzua. 2006. Use of instrumental variables in the presence of heterogeneity and self-selection: An application in breast cancer patients. Unpublished manuscript, Department of Medicine, University of Chicago.
Blundell, R., and J. Powell. 2003. Endogeneity in nonparametric and semiparametric regression models. In Advances in economics and econometrics: Theory and applications, eighth world congress, ed. L.P. Hansen, M. Dewatripont, and S.J. Turnovsky, Vol. 2. Cambridge: Cambridge University Press.
Chesher, A. 2003. Identification in nonseparable models. Econometrica 71: 1405–1441.
Cunha, F., J.J. Heckman, and S. Navarro. 2005. Separating uncertainty from heterogeneity in life cycle earnings. Oxford Economic Papers 57: 191–261.
Florens, J.-P., M. Mouchart, and J.M. Rolin. 1990. Elements of Bayesian statistics. New York: M. Dekker.
Florens, J.-P., J.J. Heckman, C. Meghir, and E.J. Vytlacil. 2007. Identification of treatment effects using control functions in models with continuous, endogenous treatment and heterogeneous effects. Unpublished manuscript, Columbia University.
Heckman, J.J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–162.
Heckman, J.J. 1997. Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources 32: 441–462. Addendum published in 33(1) (1998).
Heckman, J.J., and S. Navarro. 2004. Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86: 30–57.
Heckman, J.J., and R. Robb. 1985. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30: 239–267.
Heckman, J.J., and R. Robb. 1986. Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In Drawing inferences from self-selected samples, ed. H. Wainer. New York: Springer. Repr. Mahwah: Lawrence Erlbaum Associates, 2000.
Heckman, J.J., and G.L. Sedlacek. 1985. Heterogeneity, aggregation, and market wage functions: An empirical model of self-selection in the labor market. Journal of Political Economy 93: 1077–1125.
Heckman, J.J., and J.A. Smith. 1998. Evaluating the welfare state. In Econometrics and economic theory in the twentieth century: The Ragnar Frisch centennial symposium, ed. S. Strom. New York: Cambridge University Press.
Heckman, J.J., and E.J. Vytlacil. 1998. Instrumental variables methods for the correlated random coefficient model: Estimating the average rate of return to schooling when the return is correlated with schooling. Journal of Human Resources 33: 974–987.
Heckman, J.J., L.J. Lochner, and P.E. Todd. 2003. Fifty years of Mincer earnings regressions. Technical Report No. 9732. Cambridge, MA: NBER.
Heckman, J.J., S. Urzua, and E.J. Vytlacil. 2006. Understanding instrumental variables in models with essential heterogeneity. Review of Economics and Statistics 88: 389–432.
Imbens, G.W., and W.K. Newey. 2006. Identification and estimation of triangular simultaneous equations models without additivity. Unpublished manuscript, Department of Economics, MIT.
Manski, C.F. 1988. Identification of binary response models. Journal of the American Statistical Association 83: 729–738.
Matzkin, R.L. 1992. Nonparametric and distribution-free estimation of the binary threshold crossing and the binary choice models. Econometrica 60: 239–270.
Matzkin, R.L. 2003. Nonparametric estimation of nonadditive random functions. Econometrica 71: 1339–1375.
Newey, W.K., J.L. Powell, and F. Vella. 1999. Nonparametric estimation of triangular simultaneous equations models. Econometrica 67: 565–603.
Olley, G.S., and A. Pakes. 1996. The dynamics of productivity in the telecommunications equipment industry. Econometrica 64: 1263–1297.
Robinson, P.M. 1988. Root-n-consistent semiparametric regression. Econometrica 56: 931–954.
Roy, A.D. 1951. Some thoughts on the distribution of earnings. Oxford Economic Papers 3: 135–146.
Telser, L.G. 1964. Iterative estimation of a set of linear regression equations. Journal of the American Statistical Association 59: 845–862.
Willis, R.J., and S. Rosen. 1979. Education and self-selection. Journal of Political Economy 87(5, Part 2): S7–S36.
Wooldridge, J.M. 1997. On two stage least squares estimation of the average treatment effect in a random coefficient model. Economics Letters 56: 129–133.
Wooldridge, J.M. 2003. Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters 79: 185–191.
© 2018 Macmillan Publishers Ltd.
Navarro, S. (2018). Control Functions. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95189-5_2262
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-349-95188-8
Online ISBN: 978-1-349-95189-5