Introduction

In Life on the Mississippi, Mark Twain (1883) noted how the Mississippi River, by cutting across meanders, has periodically shortened itself in the past.

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
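
Twain's arithmetic is easy to check, which is part of the joke: the computation is sound, and only the linear extrapolation is absurd. A short worked version, using only the figures given in the passage, might read:

```latex
% Figures taken from the quoted passage: 242 miles lost in 176 years.
\begin{align*}
  \text{rate} &= \frac{242 \text{ miles}}{176 \text{ years}}
               = 1.375 \text{ miles per year}, \\
  \text{length added over } 10^{6} \text{ years}
              &= 1.375 \times 10^{6}
               = 1{,}375{,}000 \text{ miles}.
\end{align*}
```

The rate is indeed "a trifle over one mile and a third," and the million-year figure is indeed "upwards of one million three hundred thousand miles."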

The question that Robinson, Levin, Schraw, Patall, and Hunt (hereafter RLSPH) wrestled with in their paper (2013) is how to prevent scientists from drawing such “wholesale returns of conjecture out of such a trifling investment of fact.” Apparently, they fear that if we continue to allow extrapolations from data, others, less wise than we, might use these suppositions as support for policies that are not fully justified.

So their solution is to ban any such extrapolations.

Debate About Extrapolation in Victorian England

A similar idea was being debated more than 150 years ago in Britain, and it is instructive to revisit that debate. We shall begin with one of the most far-reaching examples of a conclusion that productively extended beyond the data, Darwin’s (1859) statement near the end of the Origin of Species,

I have now recapitulated the chief facts and considerations which have thoroughly convinced me that species have been modified, during a long course of descent, by the preservation or natural selection of many successive slight favorable variations.

This quote is taken from a book, not a journal, and so is not strictly within the bounds of RLSPH, but Alfred Russel Wallace, who provided the motivation for Darwin to publish The Origin, said essentially the same thing, a year earlier, in concluding the joint Darwin–Wallace (1858) paper in The Journal of the Proceedings of the Linnean Society,

We believe we have now shown that there is a tendency in nature to the continued progression of certain classes of varieties further and further from the original type—a progression to which there appears no reason to assign any definite limits….This progression, by minute steps, in various directions, but always checked and balanced by the necessary conditions, subject to which alone existence can be preserved, may, it is believed, be followed out so as to agree with all the phenomena presented by organized beings, their extinction and succession in past ages, and all the extraordinary modifications of form, instinct, and habits which they exhibit.

It may be productive to compare this conclusion, which clearly violates the recommendations of RLSPH (but changed science forever), to those from the papers that immediately preceded and followed the Darwin–Wallace paper in that issue of the Journal. These papers adhere to RLSPH’s recommendations but have had notably less impact. The paper that was published immediately before theirs is by Thomas Huxley, Fellow of the Royal Society and one of the most eminent scientists of the 19th century, a man who might well have been buried with Darwin at Westminster Abbey had he not given specific instructions to prevent such an effort. The paper that follows Darwin–Wallace is by Robert Knox, Fellow of the Royal Society of Edinburgh, but perhaps best known for his involvement in the Burke and Hare body-snatching case. Huxley (1858) concludes:

In the second edition of Professor Owen’s lectures on the Invertebrata (1855), I find no mention of Valenciennes’ discovery of four additional apertures; but the author states that “on each side at the roots of the anterior bronchiae, there is a small mammillary eminence with a transverse slit, which conducts from bronchial cavity to one of the pericardium containing two clusters of venous glands. There are also two similar, but smaller, slits, contiguous with one another, near the root of the posterior bronchia on each side, that lead to and may admit sea-water into the compartments containing the posterior cluster of venous follicles.” In this work the ovary is not only described, but figured, on the right side of the gizzard. The figure, however, rightly places the greater part of the ovary below that organ.

In a similar manner, Knox (1858) concludes his paper:

The conclusion I arrive at is this,—that the actual number of cervical vertebrae in the Mysticetus, as in most other mammals, is seven…

These latter two quotations clearly are in line with the recommendations of RLSPH; they also are consistent with the “science as an accumulation of facts” school of thought: They don’t draw conclusions that go beyond the actual results. In this they differ from the approach of Darwin and Wallace and seem to reflect a different approach to science. Darwin famously rejoiced, “I had at last got a theory by which to work.” The statement captured his view that the collection of facts could be useful only if it were guided by a design focused on iteratively supporting or refuting a theory. As Darwin put it in a letter to Henry Fawcett in 1861, “How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!”

Darwin’s comment captures a basic truth about how and why we do science. We conduct experiments and we collect and analyze data to create evidence to refute or support theories (or at least hypotheses), and those theories are important because they have real-world applications and policy implications. Scientists must tell us what those implications are because only they know the details of how the methodology was shaped to support those conclusions. It makes no sense for the reader to have to guess. More importantly, the reader must understand the intended application, because that specific context will be the basis for crafting the next study to support or refute the credibility of the intended extrapolation. Ironically, when Darwin made the statement quoted in the previous paragraph, he was in fact referring to an earlier effort to prevent scientists from going beyond the results. The more complete quote reads:

About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!

The problem is not new; clearly the answer must not be to limit science (or specific journals) to the collection of facts. In the balance of this essay, we will continue to explore this issue with the accumulated wisdom of 150 years of science and scientific discoveries, and in the end, we will conclude that the only sensible answer is to expect editors to do their job, with the expectation that sometimes they will fail to do it well.

A More General View

The failed nineteenth-century policy that was disinterred by RLSPH has costs and benefits. The chief benefit is that the literature will be rid of some unjustified and incorrect claims. Balanced against this is the loss of powerful insights that might have changed the way we understand the data.

The first issue we ought to confront is whether it has to be all or nothing.

Let us consider the more general solution in which there is a cost function (see Footnote 1) that penalizes us for each kind of error and thus guides our solution to one that minimizes the total cost. RLSPH’s solution essentially has an infinite cost for allowing an erroneous conclusion to be published, which overbalances any cost of missing what might have been found.

This seems to us to be too extreme.

How ought we weigh the two aspects of false positives and false negatives? Historically, R. A. Fisher (e.g., 1925) chose a standard for rejection (.05) that was very generous, for he felt that an accepted false result soon would be corrected whereas a real finding, if dismissed too early in its discovery path, might take a long time to be refound.

Although Fisher didn’t specify his cost function explicitly, his general intention is clear. Obviously, we would want the cost function to reflect the potential harm that may result if a researcher’s claim subsequently proved to be wrong. For example, if we were doing drug research, we would not accept even a one-in-twenty chance that a treatment for nausea among pregnant mothers would result in profound birth defects. Such a chance might, however, be acceptable as the basis for a decision to launch a moderately sized trial of an apparently successful new approach to teaching algebra.
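
The weighing described above can be made concrete with a minimal sketch. The symbols below are illustrative assumptions, not notation used by Fisher or by RLSPH: p is the probability that an extrapolated claim is false, c_FP is the cost of publishing a claim that proves false, and c_FN is the cost of suppressing a claim that proves true.

```latex
% Illustrative symbols (assumptions, not from the source):
%   p      = probability that the extrapolated claim is false
%   c_{FP} = cost of publishing a claim that turns out to be false
%   c_{FN} = cost of suppressing a claim that turns out to be true
\begin{align*}
  \mathbb{E}[\text{cost} \mid \text{publish}]  &= p \, c_{FP}, \\
  \mathbb{E}[\text{cost} \mid \text{suppress}] &= (1 - p) \, c_{FN}, \\
  \text{publish if}\quad p \, c_{FP} < (1 - p) \, c_{FN}
  \;&\Longleftrightarrow\; p < \frac{c_{FN}}{c_{FP} + c_{FN}} .
\end{align*}
```

Under these assumptions, letting c_FP grow without bound drives the threshold toward zero, which corresponds to the infinite-cost policy attributed to RLSPH. A very large c_FP, as in the drug example, pushes the tolerable chance of error far below one in twenty, whereas a modest c_FP, as in the algebra trial, leaves room for a more generous standard.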

This approach requires careful judgment and discretion on the part of editors who must balance the evidence against the claim. Simply allowing modest claims and rejecting the more grandiose will not do.

For example, suppose some overly enthusiastic authors finished off a paper with the claim that “based on our data we postulate that we have uncovered the secret of life.” Wouldn’t a cautious editor be forced to elide such an extravagant claim? Or at least ask the authors to tone it down? Yet suppose this claim turned out to be true? Clearly, stifling such a claim can carry substantial costs as well (see Footnote 2).

Quis custodiet ipsos custodes?

Some Alternatives

Editorial control is not the only choice available to deal with this problem. Newton opted to write such things in Latin, thus limiting access to only those worthy enough to be able to read them. He did this, for example, in discussing the odd sexual practices among the Babylonians. Instituting such a practice today might have the additional benefit of increasing the popularity of courses in the study of ancient languages.

Another alternative is to borrow from the practice common among medical journals in which all articles have a structured abstract that lays out the key ideas contained in the paper with clearly labeled sections. We might insist that all extrapolations beyond the data be in a section entitled something like “Extrapolations beyond the data” to make it clear that what follows lies in greater uncertainty than what might be in a section entitled “Conclusions.”

Perhaps we might require that each such section include Christopher Hitchens’ pithy observation:

That which can be asserted without evidence, can be dismissed without evidence.

Coda

Last, it seems worthwhile to consider the consequences of those unsupported extensions that seem to evoke such terror in the souls of RLSPH. What is the harm that might be done? It certainly is possible that some unsupported conjectures in a paper in Educational Psychology Review might lead to the implementation of an educational policy that is unsuccessful, but it is unlikely to yield deformed babies or cause the atmosphere of the Earth to catch fire. Indeed, the more awful the consequences of such a poor decision, the more quickly it will be discovered and the policy reversed. This faith that we have that in the end bad ideas will die out and be replaced by good ones may sound Pollyannaish, but we are not alone in support of this notion. Most notably, John Stuart Mill (1859), in his famous essay On Liberty, said,

It is a piece of idle sentimentality that truth, merely as truth, has any inherent power denied to error of prevailing against the dungeon and the stake. Men are not more zealous for truth than they often are for error, and a sufficient application of legal or even social penalties will generally succeed in stopping the propagation of either.

The real advantage which truth has consists in this, that when an opinion is true, it may be extinguished once, twice or many times, but in the course of ages there will generally be found persons to rediscover it, until some one of its reappearances fall on a time when from favorable circumstances it escapes persecution until it has made such headway as to withstand all subsequent attempts to suppress it.

We can hope that if we trust editors to make careful decisions and thoughtfully balance the benefits and risks of extrapolation, we might not need to wait for “the course of ages” for important educational innovations to be discovered. We can also be sure that mistakes will be made. Whatever course we take, we should not fool ourselves into believing that our actions will prevent policy makers from carefully selecting, either from results or claims, those nuggets that support the policies they find politically attractive.

Throughout this process, we must also be guided by the awareness that one of the principal goals of science is the collection of evidence, not data. If we are interested in a student’s mathematical ability, we can collect that student’s scores on a math test that could be viewed as evidence. We might also measure the student’s shoe size, data certainly, but not evidence. Evidence is data related to a claim. Thus, without a claim there cannot be evidence. RLSPH’s proposal to ban claims means that they would be banning evidence and thence science.