Using Machine Learning Methods to Support Causal Inference in Econometrics

Ahrens, Achim; Aitken, Christopher; Schaffer, Mark E.

doi:10.1007/978-3-030-49728-6_2


Part of the book series: Studies in Computational Intelligence (SCI, volume 897)


Abstract

We provide an introduction to the use of machine learning methods in econometrics and show how these methods can be employed to assist in causal inference. We begin with an extended presentation of the lasso (least absolute shrinkage and selection operator) of Tibshirani [50]. We then discuss the ‘Post-Double-Selection’ (PDS) estimator of Belloni et al. [13, 19] and show how it uses the lasso to address the omitted confounders problem. The PDS methodology is particularly powerful when the researcher has a high-dimensional set of potential control variables and needs to use enough controls to eliminate omitted variable bias, but not so many that the model overfits. The last part of the paper discusses recent developments in the field that go beyond the PDS approach.
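The three PDS steps described in the abstract can be sketched in a few lines. First, run a lasso of the outcome on the controls. Second, run a lasso of the treatment on the controls. Third, estimate the treatment effect by OLS using the union of the selected controls. What follows is a minimal Python/NumPy illustration, not the authors' code. It rests on several assumptions: homoskedastic errors, standardized controls, and a plug-in penalty $\lambda = 2c\hat\sigma\sqrt{n}\,\Phi^{-1}(1-\gamma/(2p))$ with $c = 1.1$ and $\gamma = 0.1/\log(n)$, where $\hat\sigma$ is re-estimated a few times from the lasso residuals. The helper names (`lasso_cd`, `plugin_lambda`, `lasso_select`, `post_double_selection`) and the simulated data are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def lasso_cd(X, y, lam, max_sweeps=1000, tol=1e-8):
    """Lasso by cyclic coordinate descent (no intercept; centre y and X first).

    Minimises (1/n) * ||y - X b||^2 + (lam / n) * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)            # residual y - X @ b (b starts at 0)
    a = (X ** 2).sum(axis=0) / n            # (1/n) * x_j'x_j for each column
    thresh = lam / (2 * n)                  # soft-threshold implied by the objective
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            z = X[:, j] @ r / n + a[j] * b[j]   # (1/n) * x_j' (partial residual)
            new = np.sign(z) * max(abs(z) - thresh, 0.0) / a[j]
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def plugin_lambda(n, p, sigma, c=1.1, gamma=None):
    """Homoskedastic plug-in penalty 2*c*sigma*sqrt(n)*Phi^{-1}(1 - gamma/(2p))."""
    if gamma is None:
        gamma = 0.1 / math.log(n)
    return 2 * c * sigma * math.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * p))


def lasso_select(X, y, n_iter=5):
    """Indices of controls kept by the lasso, re-estimating sigma from residuals."""
    n, p = X.shape
    sigma = y.std()
    for _ in range(n_iter):
        b = lasso_cd(X, y, plugin_lambda(n, p, sigma))
        sigma = np.sqrt(np.mean((y - X @ b) ** 2))
    return np.flatnonzero(b)


def post_double_selection(y, d, X):
    """OLS of y on d plus the union of controls selected in two lassos."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # unit (1/n) variance, so loading = sigma
    keep = np.union1d(lasso_select(Xs, y - y.mean()),   # step 1: lasso of y on X
                      lasso_select(Xs, d - d.mean()))   # step 2: lasso of d on X
    Z = np.column_stack([np.ones(len(y)), d, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)         # step 3: OLS with the union
    return coef[1], keep


# Simulated example (hypothetical data): three confounders drive both d and y.
rng = np.random.default_rng(0)
n, p = 500, 100
X = rng.standard_normal((n, p))
conf = X[:, :3].sum(axis=1)
d = 0.8 * conf + rng.standard_normal(n)
y = 0.5 * d + conf + rng.standard_normal(n)      # true treatment effect: 0.5

naive = np.polyfit(d, y, 1)[0]                   # omits the confounders entirely
theta_hat, kept = post_double_selection(y, d, X)
print(f"naive OLS: {naive:.3f}  PDS: {theta_hat:.3f}  controls kept: {kept}")
```

In this design the omitted confounders pull the naive slope well above 0.5, whereas the PDS estimate should land close to it. A heteroskedasticity-robust standard error from the final OLS step would complete the estimator. It is left out to keep the sketch short.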

Invited paper for the International Conference of the Thailand Econometric Society, ‘Behavioral Predictive Modeling in Econometrics’, Chiang Mai University, Thailand, 8–10 January 2020. Our exposition of the ‘rigorous lasso’ here draws in part on our paper Ahrens et al. [1]. All errors are our own.
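The ‘rigorous’ lasso chooses its penalty from theory rather than by cross-validation. The ingredients mentioned in notes 7 and 8 can be sketched roughly in Python/NumPy as follows:

- heteroskedasticity-robust penalty loadings $\hat\psi_j = \sqrt{n^{-1}\sum_i x_{ij}^2\hat\varepsilon_i^2}$;
- the asymptotic penalty level $\lambda = 2c\sqrt{n}\,\Phi^{-1}(1-\gamma/(2p))$;
- an X-dependent alternative that simulates the distribution of the maximal normalized score with Gaussian multipliers.

The sketch assumes an objective of the form $n^{-1}\sum_i (y_i - x_i'\beta)^2 + (\lambda/n)\sum_j \hat\psi_j|\beta_j|$. It takes the residuals $\hat\varepsilon_i$ as given; in practice they come from an initial fit and are iterated. The function names and the default constants ($c = 1.1$, $\gamma = 0.1$) are illustrative assumptions. See Ahrens et al. [1] for an implementation in Stata.

```python
import math
from statistics import NormalDist

import numpy as np


def robust_loadings(X, resid):
    """psi_j = sqrt((1/n) * sum_i x_ij^2 * e_i^2): the square root of the j-th
    diagonal element of the 'meat' of the Eicker-Huber-White sandwich."""
    return np.sqrt(np.mean(X ** 2 * resid[:, None] ** 2, axis=0))


def asymptotic_lambda(n, p, c=1.1, gamma=0.1):
    """Penalty level 2*c*sqrt(n)*Phi^{-1}(1 - gamma/(2p)) (union-bound normal quantile)."""
    return 2 * c * math.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * p))


def xdep_lambda(X, resid, c=1.1, gamma=0.1, n_sim=2000, seed=0):
    """X-dependent penalty: 2*c*sqrt(n) times the (1 - gamma) quantile of
    max_j |n^{-1/2} * sum_i x_ij * e_i * g_i / psi_j|, with g_i ~ N(0, 1)."""
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    psi = robust_loadings(X, resid)
    G = rng.standard_normal((n_sim, n))              # one row of multipliers per draw
    scores = G @ (X * resid[:, None]) / math.sqrt(n) / psi
    max_abs = np.abs(scores).max(axis=1)             # max over the p controls
    return 2 * c * math.sqrt(n) * np.quantile(max_abs, 1 - gamma)


# Hypothetical example: independent controls and heteroskedastic residuals.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
resid = rng.standard_normal(n) * (1 + np.abs(X[:, 0]))

lam_asym = asymptotic_lambda(n, p)
lam_sim = xdep_lambda(X, resid)
print(f"asymptotic: {lam_asym:.2f}  X-dependent: {lam_sim:.2f}")
```

The union-bound quantile is an upper bound on the quantile of the maximum. With nearly independent controls, the simulated penalty should therefore come out at or slightly below the asymptotic one, up to simulation noise. With strongly correlated controls the gap can be much larger, and that is where the X-dependent approach can pay off.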


Notes

1. Previous research on this topic relied instead on anecdotal evidence, or restricted its attention to an unrepresentative subset of the literature because manually classifying the full corpus of work is not feasible (Hamermesh [36]; Backhouse and Cherrier [10]).
2. That said, traditional methods such as k-means cluster analysis and ridge regression are now often associated with ML.
3. Bickel et al. [21] instead use the weaker restricted eigenvalue condition (REC). The RSEC implies the REC and has the advantage of being sufficient for both the lasso and the post-lasso.
4. See, for example, Hastie et al. ([35], Ch. 2).
5. In this special case, the slack requirement is loosened and $c=1$.
6. These conditions relate to the use of the moderate deviation theory of self-normalized sums [39], which allows the theory to be extended to non-Gaussian errors. See Belloni et al. [13].
7. The alternative is to simulate the distribution of the score vector. This is known as the ‘exact’ or X-dependent approach. See Belloni and Chernozhukov [14] for details, and Ahrens et al. [1] for a summary discussion and an implementation in Stata.
8. The formula for the penalty loading in (16) is familiar from the standard Eicker-Huber-White heteroskedasticity-robust covariance estimator.
9. These authors are careful to note that the problem readily arises whenever researchers make decisions contingent on their data analysis; no conscious attempt to deceive is needed. Deliberate falsification, sometimes called ‘p-hacking’, is a special case and likely much rarer.
10. Wooldridge [56], pp. 450-3, 474; Wooldridge [57], pp. 153-4.
11. To treat it as a causal variable and obtain a valid standard error, we would have to estimate an additional lasso regression with y81 as the dependent variable, and so on.
12. Of course, this discussion raises a more fundamental question: given a specification like that above, how does one construct such a function? The answer is, unfortunately, somewhat technical. Interested readers can find a detailed discussion in van der Vaart [52].
13. $\partial _{\eta }$ is shorthand for $\partial /\partial \eta '$. This version of the condition is more stringent than required; a more general definition, based on Gateaux derivatives, can be found in Belloni et al. [16] and Chernozhukov et al. [22].
14. Estimators that possess this property need not be semiparametrically efficient, but because they can be, we restrict our focus to those that are.
15. Estimators based on the efficient influence function are also doubly robust (Robins et al. [48]).
16. Sample splitting was originally introduced by Angrist and Krueger [3] and Altonji and Segal [2] in the context of bias reduction for IV and GMM estimators.
17. A number of conditions, intimately related to the size of the function class as measured by its bracketing and covering numbers, are sufficient for the class to be Donsker. See van der Vaart and Wellner [51] for a full statement.
18. Note that this observation connects to the discussion in the paragraph above: if one of the models is estimated parametrically from a relationship that is known to be true, the other model need only be consistent, since the product of the two rates would then be $o_{p}\left( n^{-1/2}\right) $.
19. van der Laan et al. [42] provide asymptotic justifications for weighted combinations of estimators, particularly those that use cross-validation to calculate the weights.
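Notes 13 and 18 can be illustrated with standard cases. The models below are assumptions chosen for illustration, not the chapter's own specification.

For note 13, assume the partially linear model of Chernozhukov et al. [22]: $Y = D\theta_0 + g_0(X) + \zeta$ and $D = m_0(X) + V$, with $E[\zeta \mid D, X] = 0$ and $E[V \mid X] = 0$. Take the ‘partialling-out’ score with nuisance parameter $\eta = (\ell, m)$, where $\ell_0(X) = E[Y \mid X] = \theta_0 m_0(X) + g_0(X)$. At the true values, $Y - \ell_0(X) - \theta_0(D - m_0(X)) = \zeta$. Differentiating the score with respect to the values $\ell(X)$ and $m(X)$ then shows that both derivatives vanish at the true nuisance values:

$$\psi(W;\theta,\eta) = \{Y - \ell(X) - \theta(D - m(X))\}\{D - m(X)\}$$

$$\partial_{\ell}\, E[\psi(W;\theta_0,\eta)]\big|_{\eta=\eta_0} = -E[D - m_0(X)] = -E[V] = 0$$

$$\partial_{m}\, E[\psi(W;\theta_0,\eta)]\big|_{\eta=\eta_0} = \theta_0 E[D - m_0(X)] - E[Y - \ell_0(X) - \theta_0(D - m_0(X))] = \theta_0 E[V] - E[\zeta] = 0$$

For note 18, suppose, as for doubly robust estimators, that the remainder term is bounded by the product of the estimation errors of the two models. Write these models generically as $\hat\eta_1$ and $\hat\eta_2$. If the first is a correctly specified parametric model estimated at the $\sqrt{n}$ rate and the second is merely consistent, then

$$\|\hat\eta_1 - \eta_{1,0}\|\,\|\hat\eta_2 - \eta_{2,0}\| = O_p(n^{-1/2})\cdot o_p(1) = o_p(n^{-1/2}).$$

If instead both models are estimated flexibly, it suffices that each error is $o_p(n^{-1/4})$, since $o_p(n^{-1/4})\cdot o_p(n^{-1/4}) = o_p(n^{-1/2})$.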

References

[1] Ahrens, A., Hansen, C.B., Schaffer, M.E.: lassopack: model selection and prediction with regularized regression in Stata. The Stata J. 20, 176–235 (2020)
[2] Altonji, J.G., Segal, L.M.: Small-sample bias in GMM estimation of covariance structures. J. Bus. Econ. Stat. 14, 353–366 (1996)
[3] Angrist, J.D., Krueger, A.B.: Split-sample instrumental variables estimates of the return to schooling. J. Bus. Econ. Stat. 13, 225–235 (1995)
[4] Angrist, J.D., Pischke, J.-S.: The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. J. Econ. Perspect. 24, 3–30 (2010)
[5] Angrist, J., Azoulay, P., Ellison, G., Hill, R., Lu, S.F.: Economic research evolves: fields and styles. Am. Econ. Rev. 107, 293–297 (2017)
[6] Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010)
[7] Athey, S., Imbens, G.W.: The state of applied econometrics: causality and policy evaluation. J. Econ. Perspect. 31, 3–32 (2017)
[8] Athey, S., Imbens, G.W.: Machine learning methods that economists should know about. Ann. Rev. Econ. 11, 685–725 (2019)
[9] Athey, S., Imbens, G.W., Wager, S.: Approximate residual balancing: debiased inference of average treatment effects in high dimensions. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 80, 597–623 (2018)
[10] Backhouse, R., Cherrier, B.: The age of the applied economist: the transformation of economics since the 1970s. Hist. Polit. Econ. 47 (2017)
[11] Bansak, K., Ferwerda, J., Hainmueller, J., Dillon, A., Hangartner, D., Lawrence, D., Weinstein, J.: Improving refugee integration through data-driven algorithmic assignment. Science 359, 325–329 (2018)
[12] Begun, J.M., Hall, W.J., Huang, W.-M., Wellner, J.A.: Information and asymptotic efficiency in parametric-nonparametric models. Ann. Stat. 11, 432–452 (1983)
[13] Belloni, A., Chen, D., Chernozhukov, V., Hansen, C.: Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429 (2012)
[14] Belloni, A., Chernozhukov, V.: High dimensional sparse econometric models: an introduction. In: Alquier, P., Gautier, E., Stoltz, G. (eds.) Inverse Problems and High-Dimensional Estimation. Lecture Notes in Statistics, pp. 121–156. Springer, Heidelberg (2011)
[15] Belloni, A., Chernozhukov, V.: Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547 (2013)
[16] Belloni, A., Chernozhukov, V., Fernandez-Val, I., Hansen, C.: Program evaluation and causal inference with high-dimensional data. Econometrica 85, 233–298 (2017)
[17] Belloni, A., Chernozhukov, V., Hansen, C.: Inference for High-Dimensional Sparse Econometric Models (2011). http://arxiv.org/abs/1201.0220
[18] Belloni, A., Chernozhukov, V., Hansen, C.: High-dimensional methods and inference on structural and treatment effects. J. Econ. Perspect. 28, 29–50 (2014a)
[19] Belloni, A., Chernozhukov, V., Hansen, C.: Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81, 608–650 (2014b)
[20] Belloni, A., Chernozhukov, V., Hansen, C., Kozbur, D.: Inference in high dimensional panel models with an application to gun control. J. Bus. Econ. Stat. 34, 590–605 (2016)
[21] Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
[22] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Econom. J. 21, C1–C68 (2018)
[23] Chernozhukov, V., Hansen, C., Spindler, M.: Post-selection and post-regularization inference in linear models with many controls and instruments. Am. Econ. Rev. 105, 486–490 (2015)
[24] Chetverikov, D., Liao, Z., Chernozhukov, V.: On cross-validated lasso in high dimensions. Ann. Stat. (forthcoming)
[25] D’Amour, A., Ding, P., Feller, A., Lei, L., Sekhon, J.: A Gaussian Process Framework for Overlap and Causal Effect Estimation with High-Dimensional Covariates, arXiv:1711.02582v3 [math.ST] (2019)
[26] Deaton, A., Cartwright, N.: Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. 210, 2–21 (2018)
[27] Farrell, M.H., Liang, T., Misra, S.: Deep Neural Networks for Estimation and Inference (2019)
[28] Feigenbaum, J.J.: Automated census record linking: a machine learning approach (2016). Working Paper
[29] Fisher, R.A.: Statistical Methods for Research Workers, 5th edn. Oliver and Boyd Ltd., Edinburgh (1925)
[30] Fisher, R.A.: The Design of Experiments, 8th edn. Hafner Publishing Company, New York (1935)
[31] Gelman, A., Loken, E.: The garden of forking paths: why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time (2013). http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
[32] Gentzkow, M., Shapiro, J.M., Taddy, M.: Measuring group differences in high-dimensional choices: method and application to congressional speech. Econometrica 87, 1307–1340 (2019)
[33] Gentzkow, M., Kelly, B., Taddy, M.: Text as data. J. Econ. Lit. 57, 535–574 (2019)
[34] Hahn, J.: On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315 (1998)
[35] Hastie, T., Tibshirani, R., Wainwright, M.J.: Statistical Learning with Sparsity: The Lasso and Generalizations, Monographs on Statistics & Applied Probability. CRC Press, Taylor & Francis, Boca Raton (2015)
[36] Hamermesh, D.S.: Six decades of top economics publishing: who and how? J. Econ. Lit. 51, 162–172 (2013)
[37] Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663 (1952)
[38] Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice, 2nd edn. (2018)
[39] Jing, B.-Y., Shao, Q.-M., Wang, Q.: Self-normalized Cramér-type large deviations for independent random variables. Ann. Probab. 31, 2167–2215 (2003)
[40] Kennedy, E.H.: Semiparametric Theory and Empirical Processes in Causal Inference, arXiv:1510.04740v3 [math.ST] (2016)
[41] Kiel, K., McClain, K.: House prices during siting decision stages: the case of an incinerator from rumor through operation. J. Environ. Econ. Manag. 28, 241–255 (1995)
[42] van der Laan, M.J., Dudoit, S., van der Vaart, A.W.: The cross-validated adaptive epsilon-net estimator. Stat. Decisions 24, 373–395 (2006)
[43] Luo, Y., Spindler, M.: High-Dimensional L2 Boosting: Rate of Convergence, arXiv:1602.08927v2 [stat.ML] (2019)
[44] Mullainathan, S., Spiess, J.: Machine learning: an applied econometric approach. J. Econ. Perspect. 31, 87–106 (2017)
[45] Neyman, J.: On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. Translated by D. M. Dabrowska and T. P. Speed. Stat. Sci. 5, 465–472 (1990)
[46] Ning, Y., Peng, S., Imai, K.: Robust estimation of causal effects via a high-dimensional covariate balancing propensity score. Biometrika (2020)
[47] Powell, J.: Estimation of Semiparametric Models. Elsevier Science B.V., Amsterdam (1994)
[48] Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 89, 846 (1994)
[49] Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011)
[50] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58, 267–288 (1996)
[51] van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Heidelberg (1996)
[52] van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press (1998)
[53] Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)
[54] Wager, S., Walther, G.: Adaptive concentration of regression trees, with application to random forests, arXiv:1503.06388 [math.ST] (2019)
[55] Wooldridge, J.M.: Violating ignorability of treatment by controlling for too many factors. Econom. Theory 21, 1026–1028 (2005)
[56] Wooldridge, J.M.: Introductory Econometrics: A Modern Approach, 4th edn. Cengage, Boston (2009)
[57] Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data, 2nd edn. MIT Press, Cambridge (2010)


Author information

Authors and Affiliations

ETH Zürich, Zürich, Switzerland
Achim Ahrens
Heriot-Watt University, Edinburgh, UK
Christopher Aitken
Heriot-Watt University, Edinburgh, UK
Mark E. Schaffer


Corresponding author

Correspondence to Mark E. Schaffer.

Editor information

Editors and Affiliations

Faculty of Economics, Chiang Mai University, Chiang Mai, Thailand
Songsak Sriboonchitta
Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Vladik Kreinovich
Faculty of Economics, Center of Excellence in Econometrics, Chiang Mai University, Chiang Mai, Thailand
Woraphon Yamaka


About this chapter

Cite this chapter

Ahrens, A., Aitken, C., Schaffer, M.E. (2021). Using Machine Learning Methods to Support Causal Inference in Econometrics. In: Sriboonchitta, S., Kreinovich, V., Yamaka, W. (eds) Behavioral Predictive Modeling in Economics. Studies in Computational Intelligence, vol 897. Springer, Cham. https://doi.org/10.1007/978-3-030-49728-6_2


DOI: https://doi.org/10.1007/978-3-030-49728-6_2
Published: 06 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49727-9
Online ISBN: 978-3-030-49728-6
eBook Packages: Intelligent Technologies and Robotics (R0)
