Abstract
We provide an introduction to the use of machine learning methods in econometrics and how these methods can be employed to assist in causal inference. We begin with an extended presentation of the lasso (least absolute shrinkage and selection operator) of Tibshirani [50]. We then discuss the ‘Post-Double-Selection’ (PDS) estimator of Belloni et al. [13, 19] and show how it uses the lasso to address the omitted confounders problem. The PDS methodology is particularly powerful when the researcher has a high-dimensional set of potential control variables and must strike a balance: enough controls to eliminate omitted variable bias, but not so many as to induce overfitting. The last part of the paper discusses recent developments in the field that go beyond the PDS approach.
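As a rough illustration of the PDS idea described above, the following Python sketch runs a lasso of the outcome on the candidate controls, a second lasso of the treatment on the same controls, and then OLS of the outcome on the treatment plus the union of the selected controls. Everything named here is an assumption made for illustration and is not taken from the chapter: the simulated data, the helper names `lasso_cd` and `post_double_selection`, the bare-bones coordinate-descent solver, and the fixed penalty levels. In particular, the theory-driven ‘rigorous’ penalty and penalty loadings are not implemented.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Minimal coordinate-descent lasso (illustrative helper, not a library API).

    Minimises (1/2n) * ||y - X b||^2 + lam * ||b||_1 for centred X and y.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]              # add back column j's contribution
            rho = X[:, j] @ r / n
            # soft-thresholding update for coefficient j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def post_double_selection(y, d, X, lam_y, lam_d):
    """Sketch of post-double-selection with fixed (assumed) penalty levels."""
    Xc = X - X.mean(axis=0)
    # Step 1: lasso of the outcome on the controls.
    sel_y = np.flatnonzero(lasso_cd(Xc, y - y.mean(), lam_y))
    # Step 2: lasso of the treatment on the controls.
    sel_d = np.flatnonzero(lasso_cd(Xc, d - d.mean(), lam_d))
    # Step 3: OLS of y on d and the union of the selected controls.
    keep = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], keep


# Simulated example (hypothetical design): column 0 confounds d and y.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)
y = 1.0 * d + 2.0 * X[:, 0] + rng.standard_normal(n)

alpha_hat, keep = post_double_selection(y, d, X, lam_y=0.2, lam_d=0.2)
print("estimated treatment effect:", alpha_hat)
print("selected controls:", keep)
```

In this simulated design the true treatment coefficient is 1, so `alpha_hat` should land near that value and `keep` should contain the confounder in column 0. The sketch returns only a point estimate; standard errors would still have to be computed from the final OLS step.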
Invited paper for the International Conference of the Thailand Econometric Society, ‘Behavioral Predictive Modeling in Econometrics’, Chiang Mai University, Thailand, 8–10 January 2020. Our exposition of the ‘rigorous lasso’ here draws in part on our paper Ahrens et al. [1]. All errors are our own.
Notes
- 1.
- 2.
That said, traditional methods such as k-means cluster analysis and ridge regression are often now associated with ML.
- 3.
Bickel et al. [21] use instead the weaker restricted eigenvalue condition (REC). The RSEC implies the REC and has the advantage of being sufficient for both the lasso and the post-lasso.
- 4.
See, for example, Hastie et al. ([35], Ch. 2).
- 5.
In this special case, the requirement of the slack is loosened and \(c=1\).
- 6.
- 7.
- 8.
The formula in (16) for the penalty loading is familiar from the standard Eicker-Huber-White heteroskedasticity-robust covariance estimator.
- 9.
These authors are careful to note that the problem readily arises when researchers make decisions contingent on their data analysis; no conscious attempt to deceive is needed. Deliberate falsification, sometimes called ‘p-hacking’, is a special case and likely much rarer.
- 10.
- 11.
To treat it as a causal variable and obtain a valid standard error, we would have to estimate an additional lasso regression with y81 as the dependent variable etc.
- 12.
Of course, this discussion raises a more fundamental question: given a specification like that above, how does one construct such a function? The answer is a little technical, unfortunately. Interested readers can find a detailed discussion in van der Vaart [52].
- 13.
- 14.
The estimators that possess this property need not be semiparametrically efficient, but because they can be, we restrict our focus to those that are.
- 15.
Estimators based on the efficient influence function are also double-robust (Robins et al. [48]).
- 16.
- 17.
A number of conditions are intimately related to the size of the function class, as measured by its bracketing and covering numbers; if satisfied, they are sufficient for the class to be Donsker. See van der Vaart and Wellner [51] for a full statement.
- 18.
Note that this observation can be connected to the discussion in the paragraph above: if one of the models is estimated parametrically based on a relationship that is known to be true, the other model need only be consistent, since the product of the two rates would be \(o_{p}\left( n^{-1/2}\right) \).
- 19.
van der Laan et al. [42] provide asymptotic justifications for weighted combinations of estimators, particularly those which use cross-validation to calculate the weights.
References
Ahrens, A., Hansen, C.B., Schaffer, M.E.: lassopack: model selection and prediction with regularized regression in Stata. The Stata J. 20, 176–235 (2020)
Altonji, J.G., Segal, L.M.: Small-sample bias in GMM estimation of covariance structures. J. Bus. Econ. Stat. 14, 353–366 (1996)
Angrist, J.D., Krueger, A.B.: Split-sample instrumental variables estimates of the return to schooling. J. Bus. Econ. Stat. 13, 225–235 (1995)
Angrist, J.D., Pischke, J.-S.: The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. J. Econ. Perspect. 24, 3–30 (2010)
Angrist, J., Azoulay, P., Ellison, G., Hill, R., Lu, S.F.: Economic research evolves: fields and styles. Am. Econ. Rev. 107, 293–297 (2017)
Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Statist. Surv. 4, 40–79 (2010)
Athey, S., Imbens, G.W.: The state of applied econometrics: causality and policy evaluation. J. Econ. Perspect. 31, 3–32 (2017)
Athey, S., Imbens, G.W.: Machine learning methods that economists should know about. Ann. Rev. Econ. 11, 685–725 (2019)
Athey, S., Imbens, G.W., Wager, S.: Approximate residual balancing: debiased inference of average treatment effects in high dimensions. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 80, 597–623 (2018)
Backhouse, R., Cherrier, B.: The age of the applied economist: the transformation of economics since the 1970s. Hist. Polit. Econ. 47 (2017)
Bansak, K., Ferwerda, J., Hainmueller, J., Dillon, A., Hangartner, D., Lawrence, D., Weinstein, J.: Improving refugee integration through data-driven algorithmic assignment. Science 359, 325–329 (2018)
Begun, J.M., Hall, W.J., Huang, W.-M., Wellner, J.A.: Information and asymptotic efficiency in parametric-nonparametric models. Ann. Stat. 11, 432–452 (1983)
Belloni, A., Chen, D., Chernozhukov, V., Hansen, C.: Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429 (2012)
Belloni, A., Chernozhukov, V.: High dimensional sparse econometric models: an introduction. In: Alquier, P., Gautier, E., Stoltz, G. (eds.) Inverse Problems and High-Dimensional Estimation SE - 3. Lecture Notes in Statistics, pp. 121–156. Springer, Heidelberg (2011)
Belloni, A., Chernozhukov, V.: Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547 (2013)
Belloni, A., Chernozhukov, V., Fernandez-Val, I., Hansen, C.: Program evaluation and causal inference with high-dimensional data. Econometrica 85, 233–298 (2017)
Belloni, A., Chernozhukov, V., Hansen, C.: Inference for High-Dimensional Sparse Econometric Models (2011). http://arxiv.org/abs/1201.0220
Belloni, A., Chernozhukov, V., Hansen, C.: High-dimensional methods and inference on structural and treatment effects. J. Econ. Perspect. 28, 29–50 (2014a)
Belloni, A., Chernozhukov, V., Hansen, C.: Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81, 608–650 (2014b)
Belloni, A., Chernozhukov, V., Hansen, C., Kozbur, D.: Inference in high dimensional panel models with an application to gun control. J. Bus. Econ. Stat. 34, 590–605 (2016)
Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Econom. J. 21, C1–C68 (2018)
Chernozhukov, V., Hansen, C., Spindler, M.: Post-selection and post-regularization inference in linear models with many controls and instruments. Am. Econ. Rev. 105, 486–490 (2015)
Chetverikov, D., Liao, Z., Chernozhukov, V.: On cross-validated lasso in high dimensions. Ann. Stat. (Forthcoming)
D’Amour, A., Ding, P., Feller, A., Lei, L., Sekhon, J.: A Gaussian Process Framework for Overlap and Causal Effect Estimation with High-Dimensional Covariates, arXiv:1711.02582v3 [math.ST] (2019)
Deaton, A., Cartwright, N.: Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. 210, 2–21 (2018)
Farrell, M.H., Liang, T., Misra, S.: Deep Neural Networks for Estimation and Inference (2019)
Feigenbaum, J.J.: Automated census record linking: a machine learning approach (2016). Working Paper
Fisher, R.A.: Statistical Methods for Research Workers, 5th edn. Oliver and Boyd Ltd., Edinburgh (1925)
Fisher, R.A.: The Design of Experiments, 8th edn. Hafner Publishing Company, New York (1935)
Gelman, A., Loken, E.: The garden of forking paths: why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time (2013). http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Gentzkow, M., Shapiro, J.M., Taddy, M.: Measuring group differences in high-dimensional choices: method and application to congressional speech. Econometrica 87, 1307–1340 (2019)
Gentzkow, M., Kelly, B., Taddy, M.: Text as data. J. Econ. Lit. 57, 535–574 (2019)
Hahn, J.: On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315 (1998)
Hastie, T., Tibshirani, R., Wainwright, M.J.: Statistical Learning with Sparsity: The Lasso and Generalizations, Monographs on Statistics & Applied Probability. CRC Press, Taylor & Francis, Boca Raton (2015)
Hamermesh, D.S.: Six decades of top economics publishing: who and how? J. Econ. Lit. 51, 162–172 (2013)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663 (1952)
Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice, 2nd edn. (2018)
Jing, B.-Y., Shao, Q.-M., Wang, Q.: Self-normalized Cramér-type large deviations for independent random variables. Ann. Probab. 31, 2167–2215 (2003)
Kennedy, E.H.: Semiparametric Theory and Empirical Processes in Causal Inference, arXiv:1510.04740v3 [math.ST] (2016)
Kiel, K., McClain, K.: House prices during siting decision stages: the case of an incinerator from rumor through operation. J. Environ. Econ. Manag. 28, 241–255 (1995)
van der Laan, M.J., Dudoit, S., van der Vaart, A.W.: The cross-validated adaptive epsilon-net estimator. Stat. Decisions 24, 373–395 (2006)
Luo, Y., Spindler, M.: High-Dimensional L2 Boosting: Rate of Convergence, arXiv:1602.08927v2 [stat.ML] (2019)
Mullainathan, S., Spiess, J.: Machine learning: an applied econometric approach. J. Econ. Perspect. 31, 87–106 (2017)
Neyman, J.: On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. Translated by D. M. Dabrowska and T. P. Speed. Stat. Sci. 5, 465–472 (1990)
Ning, Y., Peng, S., Imai, K.: Robust estimation of causal effects via a high-dimensional covariate balancing propensity score. Biometrika (2020)
Powell, J.: Estimation of Semiparametric Models. Elsevier Science B.V., Amsterdam (1994)
Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 89, 846 (1994)
Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58, 267–288 (1996)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Heidelberg (1996)
van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)
Wager, S., Walther, G.: Adaptive concentration of regression trees, with application to random forests, arXiv:1503.06388 [math.ST] (2019)
Wooldridge, J.M.: Violating ignorability of treatment by controlling for too many factors. Econom. Theory 21, 1026–1028 (2005)
Wooldridge, J.M.: Introductory Econometrics: A Modern Approach, 4th edn. Cengage, Boston (2009)
Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data, 2nd edn. MIT Press, Cambridge (2010)
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ahrens, A., Aitken, C., Schaffer, M.E. (2021). Using Machine Learning Methods to Support Causal Inference in Econometrics. In: Sriboonchitta, S., Kreinovich, V., Yamaka, W. (eds) Behavioral Predictive Modeling in Economics. Studies in Computational Intelligence, vol 897. Springer, Cham. https://doi.org/10.1007/978-3-030-49728-6_2
DOI: https://doi.org/10.1007/978-3-030-49728-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49727-9
Online ISBN: 978-3-030-49728-6
eBook Packages: Intelligent Technologies and Robotics (R0)