Abstract
The most reliable method for identifying the causal effect of a treatment on an outcome is to conduct an experiment in which the treatment is randomly assigned to a portion of the sample. The causal effect of the treatment is then estimated as the difference in average outcomes between units exposed to the treatment and those that were not. The apparent simplicity of randomized controlled trials (RCTs) belies the true complexity of designing, conducting, and analyzing them. This chapter provides an introduction to experimental analysis in higher education research that unpacks the multiple complexities inherent in RCTs. It presents both the basic logic and mathematics of experimental analysis before explaining more complex design elements such as blocking, clustering, and power analysis. The chapter also discusses issues that can undermine experiments, such as attrition, treatment fidelity, and contamination, and suggests methods of mitigating their negative effects. Concrete examples from experimental analyses in the higher education literature are provided throughout the chapter.
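As an illustrative sketch (not drawn from the chapter itself), the difference-in-means logic described above can be shown with simulated data: under random assignment, comparing mean outcomes across treatment arms recovers the average treatment effect. All names and values here are hypothetical.

```python
# Minimal sketch: under random assignment, the average treatment effect
# (ATE) is estimated by the difference in mean outcomes between the
# treatment and control groups. Data below are simulated.
import random

random.seed(42)

TRUE_EFFECT = 5.0  # hypothetical effect of treatment on the outcome
n = 10_000

outcomes = []
for _ in range(n):
    treated = random.random() < 0.5      # coin-flip assignment to treatment
    baseline = random.gauss(50, 10)      # outcome the unit would have anyway
    y = baseline + (TRUE_EFFECT if treated else 0.0)
    outcomes.append((treated, y))

treat_ys = [y for t, y in outcomes if t]
control_ys = [y for t, y in outcomes if not t]

ate_hat = sum(treat_ys) / len(treat_ys) - sum(control_ys) / len(control_ys)
print(f"estimated ATE: {ate_hat:.2f}")  # should be close to TRUE_EFFECT
```

Because assignment is random, the baseline outcomes are balanced in expectation across the two arms, so the simple difference in means is an unbiased estimate of the effect.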
Notes
1. Somewhat confusingly, balance in an experimental context can also refer to the proportion of units assigned to treatment and control. A balanced experiment is one in which 50% of units are assigned treatment and 50% are assigned control. I will use the term in this chapter to mean that individual characteristics are, on average, equivalent across treatment arms, what is more specifically referred to as covariate balance.
2. Some sources use the term stratification instead of blocking, but I think it is wise to use separate terms in an effort to distinguish between random sampling and random assignment to treatment.
Acknowledgements
I would like to thank Amanda Addison and Robin Yeh for providing research assistance in tracking down many of the references across fields cited in this chapter. The chapter also benefited from many helpful suggestions from an anonymous reviewer and the guidance of the Associate Editor, Nick Hillman.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this entry
Evans, B.J. (2021). Understanding the Complexities of Experimental Analysis in the Context of Higher Education. In: Perna, L.W. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 36. Springer, Cham. https://doi.org/10.1007/978-3-030-43030-6_12-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43030-6
Online ISBN: 978-3-030-43030-6
eBook Packages: Springer Reference Education, Reference Module Humanities and Social Sciences, Reference Module Education