Using Machine Learning Methods to Support Causal Inference in Econometrics

Behavioral Predictive Modeling in Economics

Part of the book series: Studies in Computational Intelligence (SCI, volume 897)

Abstract

We provide an introduction to the use of machine learning methods in econometrics and show how these methods can be employed to assist in causal inference. We begin with an extended presentation of the lasso (least absolute shrinkage and selection operator) of Tibshirani [50]. We then discuss the ‘Post-Double-Selection’ (PDS) estimator of Belloni et al. [13, 19] and show how it uses the lasso to address the omitted confounders problem. The PDS methodology is particularly powerful when the researcher has a high-dimensional set of potential control variables and must use enough controls to eliminate omitted variable bias, but not so many as to induce overfitting. The last part of the paper discusses recent developments in the field that go beyond the PDS approach.

Invited paper for the International Conference of the Thailand Econometric Society, ‘Behavioral Predictive Modeling in Econometrics’, Chiang Mai University, Thailand, 8–10 January 2020. Our exposition of the ‘rigorous lasso’ here draws in part on our paper Ahrens et al. [1]. All errors are our own.
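
The post-double-selection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes simulated data, a plain coordinate-descent lasso, and a fixed, hand-picked penalty (`lam`) in place of the theory-driven ‘rigorous’ penalty discussed in the chapter. All function and variable names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; values within [-t, t] become exactly zero.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.
    # Assumes y and the columns of X have already been centered.
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]       # add back feature j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            r -= X[:, j] * b[j]
    return b

def post_double_selection(y, d, X, lam):
    # Step 1: lasso of the outcome y on the controls X.
    # Step 2: lasso of the treatment d on the controls X.
    # Step 3: OLS of y on d plus the union of the selected controls.
    Xc = X - X.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Xc, y - y.mean(), lam))
    sel_d = np.flatnonzero(lasso_cd(Xc, d - d.mean(), lam))
    selected = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, X[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], selected          # coefficient on d, chosen controls

# Simulated example (an assumption for illustration): control 0 is a
# confounder that drives both the treatment and the outcome.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
d = X[:, 0] + rng.standard_normal(n)
y = 1.0 * d + 2.0 * X[:, 0] + rng.standard_normal(n)   # true effect = 1

alpha_naive = np.polyfit(d, y, 1)[0]                   # omits the confounder
alpha_pds, selected = post_double_selection(y, d, X, lam=0.3)
print("naive:", alpha_naive, "PDS:", alpha_pds, "selected:", selected)
```

In this setup the naive regression of `y` on `d` alone absorbs the effect of the omitted confounder, while the PDS estimate should be close to the true effect of 1 because the confounder is picked up by at least one of the two lasso steps. The valid standard errors that the chapter is concerned with are not computed here.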

Notes

  1. Previous research on that topic had relied instead on anecdotal evidence, or had restricted its attention to an unrepresentative subset of the literature, because manually classifying the full corpus of work is not feasible (Hamermesh [36] and Backhouse and Cherrier [10]).

  2. That said, traditional methods such as k-means cluster analysis and ridge regression are often now associated with ML.

  3. Bickel et al. [21] use instead the weaker restricted eigenvalue condition (REC). The RSEC implies the REC and has the advantage of being sufficient for both the lasso and the post-lasso.

  4. See, for example, Hastie et al. ([35], Ch. 2).

  5. In this special case, the requirement of the slack is loosened and \(c=1\).

  6. These conditions relate to the use of the moderate deviation theory of self-normalized sums [39] that allows the extension of the theory to cover non-Gaussianity. See Belloni et al. [13].

  7. The alternative is to simulate the distribution of the score vector. This is known as the ‘exact’ or X-dependent approach. See Belloni and Chernozhukov [14] for details and Ahrens et al. [1] for a summary discussion and an implementation in Stata.

  8. The formula in (16) for the penalty loading is familiar from the standard Eicker-Huber-White heteroskedasticity-robust covariance estimator.

  9. These authors are careful to note that the problem readily arises when researchers make decisions contingent on their data analysis; no conscious attempt to deceive is needed. Deliberate falsification, sometimes called ‘p-hacking’, is a special case and likely much rarer.

  10. Wooldridge [56], pp. 450–3, 474 and Wooldridge [57], pp. 153–4.

  11. To treat it as a causal variable and obtain a valid standard error, we would have to estimate an additional lasso regression with y81 as the dependent variable, etc.

  12. Of course, this discussion raises a more fundamental question: given a specification like that above, how does one construct such a function? The answer is a little technical, unfortunately. Interested readers can find a detailed discussion in van der Vaart [52].

  13. \(\partial_{\eta}\) is shorthand for \(\partial/\partial\eta'\). This version of the condition is actually more stringent than required. A more general definition of it, based on Gateaux derivatives, can be found in Belloni et al. [16] and Chernozhukov et al. [22].

  14. The estimators that possess this property need not be semiparametrically efficient, but because they can be, we restrict our focus to those that are.

  15. Estimators based on the efficient influence function are also double-robust (Robins et al. [48]).

  16. Sample splitting was originally introduced by Angrist and Krueger [3] and Altonji and Segal [2] in the context of bias reduction of IV and GMM estimators.

  17. There are a number of conditions that are intimately related to the size of the function class, as measured by its bracketing and covering numbers, which if satisfied are sufficient for it to be Donsker. See van der Vaart and Wellner [51] for a full statement.

  18. Note that this observation can be connected to the discussion in the paragraph above: if one of the models is estimated parametrically based on a relationship that is known to be true, the other model need only be consistent, since the product of the two rates would be \(o_{p}(n^{-1/2})\).

  19. van der Laan et al. [42] provide asymptotic justifications for weighted combinations of estimators, particularly those which use cross-validation to calculate the weights.
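
The rate argument in note 18 can be written out in one line. As a sketch, let \(\hat{\eta}_1\) and \(\hat{\eta}_2\) denote the two estimated models (notation introduced here for illustration, not taken from the chapter), and assume the correctly specified parametric model converges at the usual \(n^{-1/2}\) rate while the other is merely consistent:

```latex
\[
\underbrace{\lVert \hat{\eta}_1 - \eta_1 \rVert}_{O_p(n^{-1/2})}
\times
\underbrace{\lVert \hat{\eta}_2 - \eta_2 \rVert}_{o_p(1)}
\;=\; o_p\!\left(n^{-1/2}\right).
\]
```

Under these assumptions the product of the two estimation errors vanishes faster than \(n^{-1/2}\), which is the requirement the note refers to.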

References

  1. Ahrens, A., Hansen, C.B., Schaffer, M.E.: lassopack: model selection and prediction with regularized regression in Stata. The Stata J. 20, 176–235 (2020)

  2. Altonji, J.G., Segal, L.M.: Small-sample bias in GMM estimation of covariance structures. J. Bus. Econ. Stat. 14, 353–366 (1996)

  3. Angrist, J.D., Krueger, A.B.: Split-sample instrumental variables estimates of the return to schooling. J. Bus. Econ. Stat. 13, 225–235 (1995)

  4. Angrist, J.D., Pischke, J.-S.: The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. J. Econ. Perspect. 24, 3–30 (2010)

  5. Angrist, J., Azoulay, P., Ellison, G., Hill, R., Lu, S.F.: Economic research evolves: fields and styles. Am. Econ. Rev. 107, 293–297 (2017)

  6. Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Statist. Surv. 4, 40–79 (2010)

  7. Athey, S., Imbens, G.W.: The state of applied econometrics: causality and policy evaluation. J. Econ. Perspect. 31, 3–32 (2017)

  8. Athey, S., Imbens, G.W.: Machine learning methods that economists should know about. Ann. Rev. Econ. 11, 685–725 (2019)

  9. Athey, S., Imbens, G.W., Wager, S.: Approximate residual balancing: debiased inference of average treatment effects in high dimensions. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 80, 597–623 (2018)

  10. Backhouse, R., Cherrier, B.: The age of the applied economist: the transformation of economics since the 1970s. Hist. Polit. Econ. 47 (2017)

  11. Bansak, K., Ferwerda, J., Hainmueller, J., Dillon, A., Hangartner, D., Lawrence, D., Weinstein, J.: Improving refugee integration through data-driven algorithmic assignment. Science 359, 325–329 (2018)

  12. Begun, J.M., Hall, W.J., Huang, W.-M., Wellner, J.A.: Information and asymptotic efficiency in parametric-nonparametric models. Ann. Stat. 11, 432–452 (1983)

  13. Belloni, A., Chen, D., Chernozhukov, V., Hansen, C.: Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429 (2012)

  14. Belloni, A., Chernozhukov, V.: High dimensional sparse econometric models: an introduction. In: Alquier, P., Gautier, E., Stoltz, G. (eds.) Inverse Problems and High-Dimensional Estimation. Lecture Notes in Statistics, pp. 121–156. Springer, Heidelberg (2011)

  15. Belloni, A., Chernozhukov, V.: Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547 (2013)

  16. Belloni, A., Chernozhukov, V., Fernandez-Val, I., Hansen, C.: Program evaluation and causal inference with high-dimensional data. Econometrica 85, 233–298 (2017)

  17. Belloni, A., Chernozhukov, V., Hansen, C.: Inference for High-Dimensional Sparse Econometric Models (2011). http://arxiv.org/abs/1201.0220

  18. Belloni, A., Chernozhukov, V., Hansen, C.: High-dimensional methods and inference on structural and treatment effects. J. Econ. Perspect. 28, 29–50 (2014a)

  19. Belloni, A., Chernozhukov, V., Hansen, C.: Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81, 608–650 (2014b)

  20. Belloni, A., Chernozhukov, V., Hansen, C., Kozbur, D.: Inference in high dimensional panel models with an application to gun control. J. Bus. Econ. Stat. 34, 590–605 (2016)

  21. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)

  22. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Econom. J. 21, C1–C68 (2018)

  23. Chernozhukov, V., Hansen, C., Spindler, M.: Post-selection and post-regularization inference in linear models with many controls and instruments. Am. Econ. Rev. 105, 486–490 (2015)

  24. Chetverikov, D., Liao, Z., Chernozhukov, V.: On cross-validated lasso in high dimensions. Ann. Stat. (Forthcoming)

  25. D’Amour, A., Ding, P., Feller, A., Lei, L., Sekhon, J.: A Gaussian Process Framework for Overlap and Causal Effect Estimation with High-Dimensional Covariates, arXiv:1711.02582v3 [math.ST] (2019)

  26. Deaton, A., Cartwright, N.: Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. 210, 2–21 (2018)

  27. Farrell, M.H., Liang, T., Misra, S.: Deep Neural Networks for Estimation and Inference (2019)

  28. Feigenbaum, J.J.: Automated census record linking: a machine learning approach (2016). Working Paper

  29. Fisher, R.A.: Statistical Methods for Research Workers, 5th edn. Oliver and Boyd Ltd., Edinburgh (1925)

  30. Fisher, R.A.: The Design of Experiments, 8th edn. Hafner Publishing Company, New York (1935)

  31. Gelman, A., Loken, E.: The garden of forking paths: why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time (2013). http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

  32. Gentzkow, M., Shapiro, J.M., Taddy, M.: Measuring group differences in high-dimensional choices: method and application to congressional speech. Econometrica 87, 1307–1340 (2019)

  33. Gentzkow, M., Kelly, B., Taddy, M.: Text as data. J. Econ. Lit. 57, 535–574 (2019)

  34. Hahn, J.: On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315 (1998)

  35. Hastie, T., Tibshirani, R., Wainwright, M.J.: Statistical Learning with Sparsity: The Lasso and Generalizations, Monographs on Statistics & Applied Probability. CRC Press, Taylor & Francis, Boca Raton (2015)

  36. Hamermesh, D.S.: Six decades of top economics publishing: who and how? J. Econ. Lit. 51, 162–172 (2013)

  37. Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663 (1952)

  38. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice, 2nd edn. (2018)

  39. Jing, B.-Y., Shao, Q.-M., Wang, Q.: Self-normalized Cramér-type large deviations for independent random variables. Ann. Probab. 31, 2167–2215 (2003)

  40. Kennedy, E.H.: Semiparametric Theory and Empirical Processes in Causal Inference, arXiv:1510.04740v3 [math.ST] (2016)

  41. Kiel, K., McClain, K.: House prices during siting decision stages: the case of an incinerator from rumor through operation. J. Environ. Econ. Manag. 28, 241–255 (1995)

  42. van der Laan, M.J., Dudoit, S., van der Vaart, A.W.: The cross-validated adaptive epsilon-net estimator. Stat. Decisions 24, 373–395 (2006)

  43. Luo, Y., Spindler, M.: High-Dimensional L2 Boosting: Rate of Convergence, arXiv:1602.08927v2 [stat.ML] (2019)

  44. Mullainathan, S., Spiess, J.: Machine learning: an applied econometric approach. J. Econ. Perspect. 31, 87–106 (2017)

  45. Neyman, J.: On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. Translated by D. M. Dabrowska and T. P. Speed. Stat. Sci. 5, 465–472 (1990)

  46. Ning, Y., Peng, S., Imai, K.: Robust estimation of causal effects via a high-dimensional covariate balancing propensity score. Biometrika (2020)

  47. Powell, J.: Estimation of Semiparametric Models. Elsevier Science B.V., Amsterdam (1994)

  48. Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 89, 846 (1994)

  49. Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011)

  50. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58, 267–288 (1996)

  51. van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Heidelberg (1996)

  52. van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)

  53. Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)

  54. Wager, S., Walther, G.: Adaptive concentration of regression trees, with application to random forests, arXiv:1503.06388 [math.ST] (2019)

  55. Wooldridge, J.M.: Violating ignorability of treatment by controlling for too many factors. Econom. Theory 21, 1026–1028 (2005)

  56. Wooldridge, J.M.: Introductory Econometrics: A Modern Approach, 4th edn. Cengage, Boston (2009)

  57. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data, 2nd edn. MIT Press, Cambridge (2010)

Author information

Corresponding author

Correspondence to Mark E. Schaffer.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ahrens, A., Aitken, C., Schaffer, M.E. (2021). Using Machine Learning Methods to Support Causal Inference in Econometrics. In: Sriboonchitta, S., Kreinovich, V., Yamaka, W. (eds) Behavioral Predictive Modeling in Economics. Studies in Computational Intelligence, vol 897. Springer, Cham. https://doi.org/10.1007/978-3-030-49728-6_2
