To date, there have been four responses (Dumas & Edelsbrunner, 2023; Grosz, 2023; Mayer, 2023; Zitzmann et al., 2023) to the Brady et al. (2023) observational study showing that the field of educational psychology has experienced (1) a continued decline in the proportion of empirical articles that include intervention and experimental studies, (2) an increase in the proportion that include observational studies, and (3) an increase in the proportion of the latter that include recommendations for practice. Why is there such interest in these findings?

The earth’s temperature has risen about 2°F since 1880, and the rate of warming since 1981 has been more than twice that of previous decades (Lindsey & Dahlman, 2023). Is this a cause for concern? Like the data reported by Brady et al. (2023), it is just an observation. Whether we should be concerned with such trends depends on their possible consequences. The earth’s rising temperature may have disastrous consequences, whereas the replacement of experiments with observational studies likely will not kill anyone (at least not directly).

In the book Fossil Men, Tim White, a paleoanthropologist, rants about his field’s degeneration. Preaching from the pulpit at a conference, he says the field is producing too many ill-prepared and poorly trained PhDs who do not want to go out into the field and find fossils. Instead, they want others to share imprints of their fossils so they can “analyze” the data from the comfort of their offices (Pattison, 2020). Our interpretation of the Brady et al. (2023) findings is similar. Educational psychology is continuing its movement toward the dark side of the quality continuum. There are fewer intervention studies and more armchair research—“analyzing” large datasets from the comfort of offices. More alarming is the increasing tendency of authors to infer causality from correlation and to offer recommendations that rest on shaky ground.

What most refer to as causal modeling (e.g., structural equation modeling, or SEM, a direct generalization of traditional factor analysis) is in fact not causal at all. SEM gives us only a model that conveys causal assumptions. Leaning heavily on such models also requires leaning heavily on largely untestable, heroic assumptions. A typical recent example is a study by Clarke et al. (2023) that was promoted in Science Daily as follows: “Study of 600 UK teenagers suggests that having stronger self-awareness and sense of purpose may raise GCSE maths scores ‘by a couple of grades.’” The authors analyzed student questionnaire data collected at a single time point using SEM. This “causal” conclusion is bogus. Raising self-awareness and sense of purpose is not likely to raise math performance; it is just as likely that increased math performance raises self-awareness. This “correlation is not causation” error should have been caught in any introductory statistics or research methods course. It is sadly reminiscent of the self-esteem movement of the 1970s. The thought back then was that because self-esteem was positively correlated with student achievement, we could raise achievement simply by raising self-esteem. Unfortunately, efforts to raise students’ self-esteem not only failed to increase student achievement; in some cases, achievement decreased (Baumeister et al., 2003). The lessons learned from 50 years ago have apparently been forgotten.
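
To make the ambiguity concrete, the following minimal simulation sketch (not the Clarke et al. analysis; the variable names, sample size, and effect size are hypothetical) generates one dataset in which sense of purpose drives math scores and another in which the causation runs the other way. The single-time-point correlations that an SEM would be fit to are indistinguishable.

```python
# A minimal sketch (hypothetical variables, not the Clarke et al. data)
# showing that a cross-sectional correlation is equally consistent with
# either causal direction.
import numpy as np

rng = np.random.default_rng(0)
n = 600  # roughly the sample size quoted in the Science Daily summary

# World A: sense of purpose -> math score
purpose_a = rng.normal(size=n)
math_a = 0.5 * purpose_a + rng.normal(scale=np.sqrt(0.75), size=n)

# World B: math score -> sense of purpose
math_b = rng.normal(size=n)
purpose_b = 0.5 * math_b + rng.normal(scale=np.sqrt(0.75), size=n)

# The observable correlations are indistinguishable (both roughly .5).
print(np.corrcoef(purpose_a, math_a)[0, 1])
print(np.corrcoef(purpose_b, math_b)[0, 1])
```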

In educational psychology, we have moved away from a very important component of research—rigor. Instead, we have been seduced by what is simple and easy. The fruits of this latter type of research are rarely useful in providing causal evidence. More rigorous methods, such as interventions, pay bigger dividends. Improving education takes rigor and sustained effort over a long period of time. Even then, the effects might be neither impressive nor replicable. But we keep trying. We roll up our sleeves and apply “elbow grease” by using the most rigorous methods we have available.

In finance, the Markowitz model describes the positive relationship between risk and reward: higher returns typically come only from taking on higher risk. In educational research, a similar relationship exists between the difficulty of gathering the data and the credibility of the resulting causal inferences.

The causal effect of some treatment relative to a control is the difference between the outcome observed with the treatment and the outcome that would have occurred without it. This latter aspect is what is usually referred to as Hume’s counterfactual (Holland, 1986). Observational studies typically assume that outcomes observed under the control condition represent what would have occurred in the treatment group had its members not received the treatment. This assumption is often heroic (e.g., that students living with their grandmothers in over-crowded, over-heated, poorly taught, inner-city schools were the same in all important ways as students in new suburban schools with A/C, highly qualified teachers, and two college-graduate parents at home). The counterfactual assumption is more credible in a randomized experiment because, a priori, there is nothing different between the treatment and control groups (at least on average) beyond what chance generates, and those chance differences shrink as sample sizes increase. But randomized controlled experiments are more difficult to carry out.
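
The following toy simulation (entirely hypothetical; the true treatment effect is fixed at zero) illustrates the point: under coin-flip assignment, the naive treatment-control difference shrinks toward the true effect as the sample grows, whereas under self-selection on an unmeasured advantage it remains biased no matter how large the sample.

```python
# A toy sketch (hypothetical variables, true treatment effect set to zero)
# of why randomization supports the counterfactual assumption.
import numpy as np

rng = np.random.default_rng(1)

def naive_effect(n, randomized):
    advantage = rng.normal(size=n)            # unmeasured baseline advantage
    outcome = advantage + rng.normal(size=n)  # true treatment effect is zero
    if randomized:
        treated = rng.random(n) < 0.5                         # coin-flip assignment
    else:
        treated = advantage + rng.normal(size=n) > 0          # self-selection on advantage
    return outcome[treated].mean() - outcome[~treated].mean()

for n in (100, 10_000):
    print(n,
          round(naive_effect(n, randomized=True), 3),   # shrinks toward 0 as n grows
          round(naive_effect(n, randomized=False), 3))  # stays biased regardless of n
```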

Easy-to-do studies, such as complex analyses of huge databases involving no interventions or manipulations, on the other hand, provide little evidence to support causal inference. Such observational studies require heroic assumptions. Whether researchers should offer recommendations for practice ought to depend on the credibility of the evidence supporting those recommendations. In educational psychology, credible evidence is disappearing.

The limits of any analytic procedure are most easily explained in terms of missing data. All statistical methods can be thought of in terms of missing data. For example, regression is y = Bx, in which we observe the ys and the xs, and the Bs are missing. The analytic task is to estimate the Bs. This is covered in the first statistics course. Of course, the greater the missingness, the greater the uncertainty. What about factor analysis/SEM? The basic model is the same as regression, y = Bx, except that now both the Bs and the xs are missing (they are called the factor loadings and factor scores, respectively). All we have are the ys. Obviously, this is a much tougher task, and it cannot be accomplished at all without relying on heroic assumptions.
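
The following small numerical sketch (the dimensions are hypothetical) shows the indeterminacy: rotating the loadings and counter-rotating the factor scores reproduces the observed ys exactly, so the ys alone cannot distinguish between the two accounts without additional assumptions.

```python
# A minimal sketch (hypothetical dimensions) of the indeterminacy:
# two different loading/score pairs imply exactly the same observed data.
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 200, 6, 2            # observations, observed variables, factors

x = rng.normal(size=(k, n))    # "true" factor scores (unobserved)
B = rng.normal(size=(p, k))    # "true" factor loadings (unobserved)
y = B @ x                      # all we ever observe are the ys

theta = 0.7                    # any rotation angle works
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

B_alt = B @ R                  # rotated loadings
x_alt = R.T @ x                # counter-rotated factor scores

# The two accounts are empirically indistinguishable.
print(np.allclose(y, B_alt @ x_alt))  # True
```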

This is why, whenever possible, we ought to (vastly) prefer (and believe in) studies that are closer to what we can observe. Unfortunately, educational psychology continues to reject randomized experiments in favor of observational studies (Cook, 2001). SEM applied to observational data can only disconfirm causal hypotheses, not confirm them. Such models are casual, NOT causal. They are connected only through a vowel movement that is stinking up our field.