1 Types of Historical Experiments

In this paper I wish to advance some novel arguments for using historical experiments in science education. I will begin by clarifying what I mean by “historical experiments”, and what I consider their general purposes to be. After that I will articulate the kind of historical experiments that I particularly advocate, namely those that recover past experimental knowledge that has been neglected by modern science, and extend the knowledge that has been recovered. The possibility of such experiments will be illustrated with examples relating to basic electrochemistry and the boiling point of water. I will then conclude with a consideration of how such historical experiments can help science education.

By “historical experiments” I mean experiments that arise from the study of past science, not from current science and its pedagogical preliminaries. There are various types of work that fall under the general rubric of historical experimentation, including replications of past experiments. In recent years historical experimentation has become a high-profile trend among historians of science and science educators, although the bulk of historical work remains text-based and mainstream science education remains focused on the present.

I will now provide a very brief survey of some key works in historical experimentation, as I will be referring to them in the discussions to follow. By common consent, the most prominent school of historical experimentation has been the Oldenburg group led by Falk Riess, which has paid equal attention to the concerns of history and education, with a particular focus on the training of science teachers (Riess 2000).Footnote 1 A striking concrete application of the Oldenburg method can be seen in Peter Heering’s (2000) teaching of electrostatics to secondary school students. A notable earlier attempt to use historical experiments in science teaching was made by Samuel Devons and Lillian Hartmann (1970), and Lillian Hartmann Hoddeson (1971).Footnote 2 More recently, the International Pendulum Project coordinated by Michael Matthews took experiments with the pendulum, already in widespread use in physics teaching at all levels, and provided historical content to them (Matthews et al. 2004, pp. 269–272). Douglas Allchin and his colleagues incorporated historical experiments into an innovative interdisciplinary science course for non-majors (Allchin et al. 1999), and Allchin has since then launched a larger initiative to train and support science teachers in the use of history of science, including experimentation.Footnote 3 Elizabeth Cavicchi (2003, 2006, 2008, 2009) has explored the nature of science learning through intensive individual work with students in a variety of historical experiments; Cavicchi’s work has a great affinity with Elspeth Crawford’s (1993) practice of using the history of science, including experiments, to develop independent thinking in students. The various and coordinated efforts by scholars in the European Group on History of Physics in Education have included experimental work (see Bevilacqua and Giannetto 1998, p. 1022). There is indeed a venerable tradition of incorporating history into science teaching, starting with James Bryant Conant’s initiative embodied in the Harvard Case Histories in Experimental Science (1957) and the Harvard Project Physics course created in the 1960s by Gerald Holton, James Rutherford and Fletcher Watson (see Holton 2003), on which the newer and specifically experimental efforts can draw. (For a general overview of the use of HPS in science teaching, see Duschl 1993.)

Within the field of history of science, notable examples of historical experimentation include the following. The work on the early history of electricity focused around the figure of André-Marie Ampère, at CNRS in Paris under the leadership of Christine Blondel, includes a significant experimental dimension.Footnote 4 An informative interaction between the Paris and Oldenburg groups (and various other participants) took place through the debate originating from Peter Heering’s work (1992) on Charles-Augustin Coulomb’s torsion-balance experiment on electrostatic forces (Blondel and Dörries 1994), which has recently been picked up again by Alberto Martinez (2006), and Paolo Palmieri and colleagues.Footnote 5 H. Otto Sibum (1995) made a reconstruction of James Joule’s paddle-wheel apparatus for measuring the mechanical equivalent of heat, and drew much-debated conclusions about the feasibility of the experiment as described by Joule, the skills necessary for its success, and the nature of Joule’s accuracy-claims. A rather neglected case of experimental replication in the area of thermal physics is Sanborn Brown’s models of Count Rumford’s various experimental apparatus (some are shown in Brown 1979). Other examples in thermal physics include Don Metz and Art Stinner (2006) on Rumford’s heat-radiation experiments, James Evans and Brian Popp (1985) on Marc-Auguste Pictet’s experiment on the “reflection of cold” (which Rumford had later repeated and extended), and Christian Sichau (2000) on the Joule–Thomson effect. There have been various replications of Galileo’s experiments, from Thomas Settle’s (1961) ground-breaking early study of the inclined-plane experiment to Paolo Palmieri’s (2008) various recent studies. Michael Faraday’s experiments in electromagnetism have also been a popular subject for replications (e.g., Gooding 1985; Höttecke 2000; Cavicchi 2006).

Nearly all of the above examples are from physics, but recently there has been some pioneering work in the area of chemistry, too. Lawrence Principe’s and William Newman’s major initiatives in the replication of alchemical experiments have attracted much attention (e.g., Principe 2000; Newman 2006).Footnote 6 A different kind of experimental study comes in through the realm of archeology; a notable example is Thilo Rehren, Marcos Martinón-Torres and their colleagues’ study of excavated alchemical instruments to infer the nature of experiments for which they must have been used (Martinón-Torres and Rehren 2008). Other chemical examples include Melvyn Usselman, Alan Rocke and their students’ reconstruction of Justus Liebig’s combustion-analysis apparatus (Usselman et al. 2005), and the work of Ryan Tweney and his students on Faraday’s discovery and investigation of gold colloid (Tweney 2006, and references therein). Experimental work in the history of the social and human sciences may be a rarity, but there has been a set of replication studies in psychology (see Tweney 2008 for an overview). These experimental studies have not generally penetrated into the education of historians of science, but a notable exception has been Jed Buchwald’s lab-based history of science courses at MIT and then Caltech.Footnote 7 More recently, Cavicchi has offered an experimentally based course on Galileo at MIT.Footnote 8

In order to have a clear overview of this wide variety of efforts in historical experimentation, it will be useful to have a rough-and-ready typology of historical experiments. I see at least three different types. There are two types that may qualify as “replication” (or, more or less synonymously, reproduction, repetition, re-creation, or re-enactment), which are helpfully distinguished from each other by Dietmar Höttecke (2000, pp. 344–345, 353–354). In describing his study of an electric motor developed by Faraday in 1821, Höttecke stresses that “this replication is oriented as close to the original as possible” and also based on an understanding of the historical contexts of the experiment. In contrast, he points out that Coleman’s earlier replication was “displaying the same phenomenon in a physical sense only[,] which is not replicated in historical detail”, resorting to the use of “harmless and [easily] available materials.” Picking up on Höttecke’s distinction, I will refer to historical replications as opposed to physical replications of past experiments.

Now, when practitioners of historical replication say they try to get “as close to the original as possible”, that is usually with a clear awareness of some inherent limits to faithfulness. It is not always possible to match exactly the past instruments and operations described in historical papers. Sometimes it is simply impractical to recreate what the past scientists had; often it is impossible to know what exactly they had (if, for instance, they simply say that “water” was employed). Even where the very instruments and substances employed in the past experiments have been preserved, there is hardly any guarantee that they have survived intact. In addition, it is necessary to set aside the notorious philosophical problem of induction (e.g., worries about, say, whether gravity still follows the same law) and doubts about the semantic stability of apparently unproblematic terms in past documents. Descriptions surviving from the past also need to be supplemented by filling in the blanks where various aspects of instruments and operations are not explicitly specified; an instructive example of this is Jürgen Teichmann’s (1999) careful reconstruction of Galileo’s “jumping-hill” experiment from a rather cryptic page of manuscript. Some of these aspects may be inherently tacit and inarticulable; others will have been left out as well-understood by the readers in the original context; some other aspects may have originally received descriptions that are now lost. Despite all these limitations, the hope in historical replication is that our imperfect reproductions of original experiments will still produce some invaluable insights about the work of past scientists.

In contrast, in physical replication the main objective is to reproduce the physical phenomena that were created and observed in past experiments. This is done for various possible ultimate purposes, as I will discuss further shortly. In physical replication one uses any convenient instruments and procedures that will help one create the phenomena of interest, and faithfulness to the details of the original experiment is of secondary interest. The philosophical challenge in physical replication is not in the verification of the exactness of repetition, but in the characterization of the phenomena to be replicated. If we attempt a physical replication of, say, Boyle’s pneumatic experiments relating to the pressure and volume of air, what is the object to be replicated here? Is it Boyle’s law stating the inverse proportionality of pressure and volume (which Boyle himself did not articulate)? Or the set of numerical data that Boyle recorded? Or the general qualitative “habit” or “custom” of nature according to which volume goes down with increasing pressure (cf. Allchin 2007, p. 19)? The identification of phenomena for replication depends on our own interests and conceptual background; this implies that physical replication is an inevitably present-centered activity in a way that historical replication may at least try to avoid. It is important to note that physical replication is not simply a subset of historical replication, or an inferior version of it. The two types of replication have different aims, and there are some situations in which physical replication is not achieved in our best version of historical replication, though it may be achieved by departing in an informed way from the faithful historical details. An interesting case here is how Heering obtained results conforming to the inverse square law of electrostatic repulsion by employing a Faraday cage around his version of Coulomb’s apparatus (see Pestre 1994, p. 24).

Apart from either type of replication, there is the kind of work that I will characterize as extension; as the term implies, this most often arises as a follow-up to replication. Having performed any experiment (historical or otherwise), it is difficult to resist the natural curiosity (“But what happens if I do this?”) spurring the experimenter on to the next experiment, which may be a variation of the original one, or an altogether different experiment designed to pursue a further question that arises from observations made in the original experiment. Historical experiments are not immune to this drive of curiosity, and it would be unnatural to restrain our desire to learn, even if we are “mere historians” and not card-carrying research scientists. Reports of extension are rare in the literature on historical experiments, but there are some interesting instances. It is certainly difficult to resist the impulse to attempt quantification or higher precision if the original accounts are only qualitative or imprecise; for example, Riess (2000, p. 401) reports that the Oldenburg replication of the Magnus effect (the deviation in the trajectory of a rotating projectile) engaged in quantitative measurements, which Magnus himself had not undertaken, and compared these measurements with theoretical predictions. When Crawford (1993, p. 206) encouraged her students to “do a Faraday”, there was a natural progression from physical replication to extension. In a similar spirit but more generally, Allchin et al. (1999, p. 622) regard history as a source of questions. Some historical experimenters have embraced extension more whole-heartedly; foremost among them is Cavicchi (2009, p. 249), who remarks on one of her teaching experiences: “Using historical accounts only as a starting point and motivation, students’ improvisational experiments explored personal interests and provided grounds for synthesizing new understandings”. Michael E. Gorman and J. Kirby Robinson (1998) taught a fascinating engineering course in which they asked students to plunge themselves back into history and come up with actual designs that would have worked at the time as improvements upon Alexander Graham Bell’s and Elisha Gray’s telephones. Extension may not always serve the purpose of historical understanding, but it is a category of historical experimentation to the extent that it is inspired by the past and would not occur naturally to current scientists without knowledge of the history.

2 Purposes of Historical Experiments

Having distinguished different types of historical experiments, I now want to address the question of why we make historical experiments. What are their ultimate purposes, if we set aside the sheer interest and curiosity of observing natural phenomena and making things that work? In other words, what can anyone gain from making historical experiments, rather than experiments that today’s scientists would naturally make in the course of their research and teaching? I think there are three main purposes served by historical experiments. The first two have been clearly recognized and explored in the existing literature, so I will only give a brief review of them; it is the third purpose that I want to explain and advocate in the remaining sections of this paper.

First, historical experiments can indeed advance our understanding of the past of science. Successful historical replications can bring us to dimensions of past scientific work that are not available in extant descriptions of it. Dominique Pestre (1994, pp. 23–24), with particular reference to the works of Sibum on Joule and Heering on Coulomb, observes that historical replications do help the historian learn the tacit knowledge that was presupposed and used in the past experiments, and also get a better sense of what he calls the “temporal dimension” of scientific work (“the fact that an experiment is a process”). Numerous practitioners of historical replication have given testaments that reinforce Pestre’s view (e.g., Sibum 1995; Höttecke 2000; Heering 2007). So, through historical replications of past experiments, our understanding of history can become deeper, more immediate, and more complete. In this context, what I had noted earlier as inherent limitations of historical replication also become opportunities for better historical understanding: since historical replication inevitably requires the filling-in of gaps, it presents opportunities for creative and active attempts to complete the available picture of the past in non-arbitrary ways (e.g., Höttecke 2000, p. 346).

Historical experiments can also help us assess the intentions behind the texts that past scientists have left to us. Principe’s (2000, pp. 68–70) early success in the replication of a certain alchemical process producing a “tree” gave him enhanced confidence in his interpretations of the highly symbolic language of the texts involved. The successful replication also provided Principe with an argument that underlying the highly metaphorical descriptions of alchemy was “a solid body of repeated and repeatable observations of laboratory results”, against the view that these texts were merely symbolic and literary. In the rather opposite direction, if replication fails despite serious efforts, that gives us cause to re-examine the intentions and even the honesty of the past scientists. Commenting on what is now perhaps the most famous case of this kind, John Heilbron remarks (1994, p. 151): “Peter Heering’s failure to reproduce Coulomb’s experiment has been a great success.” Heering’s work led to a serious debate, and some sophisticated conclusions have emerged about Coulomb’s intentions and the contemporary reception of his work. As Heilbron puts it (p. 159): “Coulomb’s paper offered not a measurement, but a means of demonstration” or illustration of the presumed inverse-square law; “with this understanding… we may free him from the charge of fraud implied by later failures to confirm his numbers.” It should be noted that historical replication is not the only type of historical experimentation that can aid historical understanding. Physical replication or even extension can also help us gain a better sense of what was possible and plausible.

Second, historical experiments can be used to refine our philosophy of science, or, to improve our conceptions of the nature of science (NOS).Footnote 9 As Christian Sichau (2000, p. 396) puts it, experimental replication “shows how physics is (or has been) made.” There has been less attention paid to this aspect than to the purpose of historiography, perhaps because not many philosophers of science have yet recognized historical experiments as a useful source of insight, but there is certainly potential here that has not been fully exploited. It would be a mistake, however, to assume that the philosophical lesson arising from historical experiments is always along similar lines. An interesting reminder here is the work of Usselman et al. (2005), who take their relatively easy success in replicating Liebig’s analytical experiments as an indication that the tacit dimension of experimentation has been exaggerated by certain sociologists, historians and philosophers (especially Harry Collins). This is in clear contrast to the conclusion drawn by many other scholars from their work in replications, which is that tacit knowledge is both pervasive and important.

One thing is nearly universal, however. Getting involved in historical experiments will almost invariably teach students (and teachers) that things are more complicated than they had been led to believe. More specifically, historical experiments will help one get beyond the oversimplifications prevalent in what Berry van Berkel et al. (2000) have called “normal science education” to designate what Thomas Kuhn (1970, 1963) had identified as the typical kind of training that science students undergo in order to become equipped for the practice of “normal science”. Now, for this mode of philosophical enlightenment, what we do need not be historical, or experimental; in other words, the true nature of scientific work can be experienced through any real scientific activity, which could be modern experimental research, or the textual study of past experimental science, or even the practices of theoretical science. But I suppose that doing historical experiments is quite a sure way of getting away from today’s textbook presentations of science, as Allchin stresses (2007, p. 29); the replication approach is effective in promoting his “lawless” view of science, for instance taking us away from the facile indoctrination into Boyle’s law, toward a more insightful engagement with Boyle’s J-tube instrument. For such a purpose it is probably best to attempt historical replications rather than physical replications, but again there would be some cases in which physical replications or even extensions can be informative. The textbook simplifications about NOS can be, and have been, challenged in various directions, of which I would like to cite the two most interesting examples. Heering (2000, p. 366) reports that his students found it refreshing (and sometimes disturbing) to realize that the most successful theories they came up with in order to explain their experimental results were bound to be rejected as their work advanced; here we have more than a hint of the anti-realist philosophers’ pessimistic induction from the history of science (Laudan 1981). Cavicchi (2008) argues that engaging in historical experiments reveals the interconnectedness of natural phenomena that is masked by the disciplinary divisions created and enforced for the convenience of teaching; here, interestingly, unity goes against simplicity.

Now I come to the third purpose of historical experiments: to improve scientific knowledge itself—that is, to gain more, better, or different knowledge of nature than current science delivers. This purpose is not usually noted by the promoters of historical experiments, and its illustration and advocacy will be my main object in the rest of this paper. What I say here is a continuation of my discussion of the function of history and philosophy of science (HPS) as “complementary science” (Chang 1999, 2004, Chap. 6), elaborating on its experimental dimension.

Replications of past experiments can serve the function of the recovery of scientific knowledge, if the past results being replicated are not previously known to us. Many primary sources from the past of science are full of observational reports that sound very wrong from the modern point of view. This is reminiscent of a well-known and controversial thesis in the philosophy of science, which says that the progress of science results in some loss of knowledge as well as obvious gains. For Kuhn (1970, Chap. 9), this was an inevitable consequence of revolutionary change. Making replications of forgotten experiments brings this point to the fore in a striking way. The dominant tendency in historical experiments, among historians and science educators alike, has been to replicate the classic experiments that form the basis of modern science. This has been the trend even for many scholars who are very critical of the common ideology of science and the standard practices of science education. My own inclination, on the contrary, is to examine strange reports that seem to go against current scientific wisdom, perhaps in line with the spirit of Principe’s and Newman’s efforts to see if alchemical claims can be validated. I also admire Evans and Popp’s replication of the apparent radiation and reflection of cold which, they report, “most physicists” find “surprising and even puzzling” (1985, p. 738).Footnote 10 At the end of his report on teaching electrostatics historically, Heering states (2000, p. 369): “historical experiments… should not be limited to those that can be found in physics textbooks. Maybe the forgotten experiments such as the fair experiments are actually the more instructive ones.” With this paper I hope to strengthen Heering’s “maybe” considerably.

In this line of work what matters is physical replication, not historical replication. In fact, if a past scientist reports a strange phenomenon, and we can replicate it without even using precisely the same materials and procedures that were originally employed, that renders the phenomenon more robust and worthy of attention. So, for example, I think it was entirely legitimate, and actually helpful, for Evans and Popp (1985) to use bent cardboard pieces covered with aluminum foil to simulate eighteenth-century concave metallic mirrors. This is as it happens among scientists when a new result is reported; if Newton’s optical results had been only replicable using English glass rather than Venetian glass, then they would have been suspected of being artifacts of specific instrumentation.Footnote 11 Therefore, unlikely as it may sound, a full historical replication is inferior to an unabashed physical replication, for the purpose of recovering lost knowledge. If certain anomalous-sounding results can be successfully replicated, then we will have learned something new (though old) about nature, not just about the history of science.

The purpose of improving scientific knowledge can also be served by the mode of historical experimentation that I described above as extension. Now, sometimes when we make extensions from other historical experiments, we may only be retracing steps taken by past scientists; in such cases we do not gain any strictly new knowledge of nature, though we may thereby get a better handle on the past scientists’ thinking process, or gain a better philosophical understanding of the nature of the discovery process. But sometimes we may make extensions that do not go in directions that the subsequent history of science actually took. If these experiments yield any interesting or noteworthy results, then we will have made genuine original contributions to scientific knowledge, not merely recovered some lost knowledge.

I expect that my talk of improving scientific knowledge through historical experiments will sound highly implausible to many readers, even if they can accept the logical possibility of it. Therefore, before I continue with the discussion of the pedagogical implications of my claim (Sect. 6), in the next three sections I will give some concrete illustrations of the kind of work I have in mind.

3 Boiling Point: Prelude on Learning from Past Science

With the help of a few striking examples I hope to make the case that modern science has forgotten some of the simple phenomena that originally stimulated its own development or abandoned the task of explaining them, and that HPS can help remedy this situation without denying or denigrating the achievements of modern science. The main body of examples that I wish to use in illustrating my claims comes from my own work concerning basic electrochemistry. But I will start with a brief discussion of the case of the anomalous variations in the boiling point of water, which will also explain how I got into the business of historical experiments to begin with.

While studying the early history of thermometry for my book Inventing Temperature (Chang 2004), I came across many reports of unruly variations in the boiling point of water. I do not mean the well-known effects of pressure variations, or impurities. Many scientists around 1800 observed that the boiling temperature of pure (distilled) water under standard pressure depended greatly on the material of the vessel employed, on the exact manner of heating, and on the amount of dissolved air present in the water. I reported these observations in my book but, respecting the usual custom among historians, I refrained from saying whether I thought these observations were correct. After the manuscript went off to press, however, curiosity got the better of me and I had to see for myself. (See Chang 2007 and Chang 2008 for detailed reports, discussions and further references; video clips of experiments are available in Chang 2007, online.)

I began with the claim from the 1810s by Joseph-Louis Gay-Lussac (1818), also reported approvingly by Jean-Baptiste Biot (1816, vol. 1, pp. 41–43), that the temperature of boiling water was 101.232°C in a glass container while it was exactly 100°C in a metallic container. This ought to be easy to check (and refute), I thought. To my surprise I saw that Gay-Lussac was essentially correct: the material of the vessel does make a clear difference, with temperatures easily reaching 102°C in a ceramic mug, and vigorous boiling happening below 99°C in a Teflon-coated pan. Here I was obviously not attempting to make an exact historical replication of Gay-Lussac’s experiments, as I did not try to match the exact compositions of metal and glass that he employed, or the precise dimensions of his vessels, or the exact manner of heating (which are not fully stated in the publications in any case). Instead, my immediate objective was the physical replication of the glass–metal difference. Then I went on to make extensions, using different types of metals and glass, ceramic vessels, and some materials not available at Gay-Lussac’s time (such as Teflon coating). I also could not help observing aspects of the phenomena not noted by Gay-Lussac, such as the shape and frequency of vapor bubbles, and the variations of temperature as boiling goes on. (In modern terms, the most crucial parameter is the density of small surface irregularities that work as sites of bubble-formation, or, nucleation.)

All this was fascinating, but history had even bigger surprises in store. The Genevan polymath Jean-André De Luc had noted in his book of 1772 that in ordinary boiling the bubbles only originated from the layer of water immediately in contact with the heated surface. That layer must be much hotter than the main body of the water, in which we insert the thermometer. In order to find out the temperature of what he called “true ebullition”, De Luc (1772, vol. 2, pp. 362–364) tried to bring the whole body of water to the same temperature by heating it slowly while minimizing the loss of heat at the surface. So he took a round flask with a long thin neck, and heated it by immersion in a bath of hot oil. In this kind of setup the character of boiling changes as it goes on, becoming more and more erratic with temperature going well above the normal boiling point (in my experiments it was easy to reach 104°C). De Luc worked out that bubble-formation was facilitated by the presence of dissolved air in the water. The process of boiling has the effect of sweeping out dissolved air, so boiling becomes more difficult as it goes on. For De Luc (1772, vol. 2, pp. 387–397), boiling facilitated by dissolved air was not real boiling; he wanted to study boiling in truly pure water. To remove the last bit of air that still remains even after prolonged boiling, De Luc used a kinetic method. Anyone who has ever made the mistake of shaking an unopened bottle of sparkling drink knows that mechanical agitation tends to dislodge dissolved gases. So, shaking is what De Luc did. He reports: “This operation lasted 4 weeks, during which I hardly ever put down my flask, except to sleep, to do business in town, and to do things that required both hands. I ate, I read, I wrote, I saw my friends, I took my walks, all the while shaking my water.” De Luc’s precious degassed water could stand the temperature of 97.5°C even in a vacuum, and under normal atmospheric pressure it reached 112.2°C before boiling off explosively.

Lacking De Luc’s dedication, I was not willing to commit to 4 weeks of shaking to see if his result could be reproduced. Fortunately, as my objective was the physical replication of the “superheating” of degassed water rather than the historical replication of De Luc’s particular experiment, it was sufficient to find some workable degassing method. In the end I devised an alternative that seems almost as good as De Luc’s and takes only about 30 minutes in all. With that, I was able to replicate De Luc’s results, including superheating up to around 110°C and an explosion at the breaking of that superheating. My degassing procedure begins with the recognition that heating water to 100°C already removes much of the dissolved air, since the solubility of air in water decreases sharply with temperature. The water is then boiled for a while in a loosely covered pot, to sweep out as much of the remaining air as possible through the bubbling process while preventing the entry of fresh air. Next, the boiled water is poured carefully into a long-necked flask and placed on a hotplate. Boiling in this partially degassed water is very bumpy, and the temperature goes well beyond 100°C, almost certainly resulting in further degassing (I say “almost certainly” because I have not been able to find data for the solubility of atmospheric gases in water beyond 100°C). After a while the flask is removed from the hotplate and allowed to cool slightly.

The boiling of the degassed water was done by inserting the flask into a graphite bath for gentler heating, with the graphite temperature at about 250°C; this was more convenient and safer than De Luc’s oil-bath arrangement. At high degrees of superheating the insertion of an ordinary mercury thermometer into the water excites violent boiling, as the roughness at the tip of the thermometer serves as an effective site for nucleation (bubble-formation). Therefore the temperature of the water in the graphite bath can only be monitored intermittently.Footnote 12 For the most part the water is absolutely still, although its temperature is very high. Inserting the thermometer prompts very active boiling, bringing the temperature down; even so, temperatures of 107–109°C are easily recorded. At high degrees of superheating, the water will explode on contact with the thermometer, or sometimes spontaneously.

The immediate lesson from the case of boiling was that we can learn fresh things about nature from past science. It seemed to me unbelievable and wonderful that a 230-year-old text could teach me something basic about physics that I had never heard of in my years of studying physics at today’s elite universities. Now, at least one part of what I was learning from history, namely the possibility of superheating, does receive treatment in some modern textbooks of physics and chemistry, but only in passing, without sufficient depth or detail (e.g., Oxtoby et al. 1999, p. 153; Atkins 1987, p. 154; Atkins and De Paula 2010, p. 653; Silbey and Alberty 2001, p. 190; Levine 2002, p. 220; Rowlinson 1969, p. 20). Interestingly, the explanations of superheating offered in these texts are quite diverse. Silbey and Alberty attribute it to the collapse of nascent vapor bubbles due to surface tension; according to Atkins (and also Atkins and De Paula), superheating may occur “because the vapor pressure inside a cavity is artificially low”, which can happen for instance when the water is not stirred. Oxtoby et al. claim that superheating can only occur when water is heated rapidly (which is quite the opposite of De Luc’s and my observations).

Better modern knowledge about boiling actually resides in the field of engineering, not science, as my colleagues in physical chemistry advised me.Footnote 13 Today’s engineers who work on heat transfer, as well as some physical chemists, do know a great deal about the intricacies of boiling (e.g., Incropera and DeWitt 1996; Hewitt et al. 1997; Kandlikar 1999). As these phenomena have no place (literally) in the physicists’ phase diagram, today’s engineers have an entirely different way of thinking about boiling, represented by their “boiling curve”, which plots the rate of heat transfer against “surface superheat” (e.g., Incropera and DeWitt 1996, p. 502, Fig. 10.4). But even these engineering specialists on boiling do not seem to know everything, especially about the action of dissolved gases. At any rate, what exactly the specialists know and don’t know is not quite the issue here. Why should something like the boiling of water, at least in its basic phenomenology, be consigned to the realm of specialists? Most of us boil water on a daily basis. It is not right that we go around repeating that pure water under standard pressure always boils at 100°C, scolding children and marking down students if they don’t repeat that piece of untruth back to us. If the boiling of water is only for specialists or advanced students, what is left for the beginners? In fact we do teach about boiling at very low levels of science education; why is it that we have to do it in a patently incorrect way?Footnote 14

4 Recovery of Knowledge from Early Electrochemistry

In my presentation at the Nordic Symposium, the material in Sects. 4 and 5 was accompanied by photos and video clips of various experiments. These will be available online through http://www.hps.cam.ac.uk/staff.html.

I now come to the central case that I want to discuss in this paper, which concerns an equally basic item of science. The gist of this study is that some very simple electrochemical phenomena well-known in the early nineteenth century are quite neglected in modern chemistry, and that we can learn some interesting science by considering these phenomena again. Alessandro Volta’s battery itself was the subject of a major controversy; the instrument and its effects were very easily replicated almost immediately all across Europe, but there was no consensus on the explanation of its mechanism. Helge Kragh (2000) gives an insightful overview of the long and complex debate that raged throughout the nineteenth century between those who believed (following Volta) that the electrical action was caused by the contact between two different metals, and those who believed that the electricity was produced by chemical reactions.Footnote 16 Kragh concludes that the dispute was never really resolved; rather, it lost its urgency and fizzled out coming into the twentieth century. The vexing questions driving the nineteenth-century debates and many of the various experiments invoked by the opposing camps of scientists are now mostly forgotten, and they certainly do not feature in standard textbooks of chemistry. Even among professional historians of science, the details of nineteenth-century electrochemical debates are no longer common knowledge. The most thorough treatments of this history are still to be found in the older secondary literature, such as the classic treatises by J. R. Partington (1964) and Wilhelm Ostwald ([1895] 1980). A happy exception to the current dearth of interest is the set of papers published in the Nuova Voltiana volumes (Bevilacqua and Fregonese 2000–03), especially those by Kragh (2000) and Nahum Kipnis (2001). 
There is also Sungook Hong’s (1994) detailed account of one curious phase of this history, in which Kelvin revived Volta’s contact theory in the 1860s.

I started my experimental work in electrochemistry with the replication of a very simple and most intriguing electrochemical experiment by the physician-turned-chemist William Hyde Wollaston in London (Wollaston 1801, p. 427). He began with the already well-known observation that certain metals dissolved in acids, releasing bubbles of hydrogen. This phenomenon can be observed very easily and safely by dipping a piece of zinc in dilute hydrochloric acid or sulphuric acid; the zinc dissolves slowly, producing a fine stream of hydrogen bubbles.Footnote 17 Add to the same pot of acid a piece of silver, and no visible reaction happens there since these acids do not attack silver. But just make the zinc and the silver touch, and hydrogen bubbles immediately start issuing from the silver as well as the zinc. Wollaston noted the same phenomena with any combination of an acid and two metals, only one of which (on its own) is dissolved by the acid. I attempted replicating this experiment, and succeeded immediately. Again, it was a physical replication I was making, but in this case it was very nearly a historical replication as well, since Wollaston’s setup was so straightforward and it was not difficult to follow his directions (even though he did not give the exact specification of his wires, the exact strengths of his acids, etc.). Among the different possible materials specified by Wollaston, I found it most convenient and economical to work with hydrochloric acid (HCl, at about 5% concentration), and zinc and copper wires (diameter 1 mm).

There are two surprising things about this experiment. First, why doesn’t everyone know about it? How is it even possible to avoid trying out this experiment just by accident? Curiously, I have not yet met any chemists or chemistry students who had done this experiment before I showed it to them. It seems to be an excellent case of a neglected piece of past scientific knowledge, which is also extremely easy to recover. The other surprise is the difficulty of understanding what is going on in this experiment. Wollaston’s own view was that “in the solution of a metal, electricity is evolved during the action of the acid upon it”; the other metal (silver or copper) “serves merely as a conductor of electricity, and thereby occasions the formation of hydrogen gas” at its surface (Wollaston 1801, pp. 428–429). Wollaston was using the dominant conceptions of his day, which took electricity to be a fluid; moreover, he seems to have subscribed to the one-fluid theory of electricity. His theoretical account of the experiment, though very terse, was probably as good as any other story available at the time. For us today, however, it is not necessarily satisfactory to rest with Wollaston’s account. By the recovery of Wollaston’s experiment I have placed it in the twenty-first century, so it is no longer merely a piece of history. There is no compelling reason to stifle our scientific curiosity and refrain from asking for explanations of the phenomenon revealed in the experiment that make sense to us. (I take it that Heering (2000, p. 366) was thinking along the same lines when he encouraged his secondary school students to devise their own explanations of what they observed in their historical experiments in electrostatics.)

Modern textbook accounts (e.g., Stoker 2005, p. 563) say that in an acid–metal reaction the hydrogen ions in the acid take electrons from the metal, turning themselves into hydrogen gas; this transfer of electrons ionizes the metal, which then dissolves in the aqueous acid. But if that is what happens, how does the reaction generate any excess electrons that travel over to the copper side to make hydrogen gas there? In my opinion, this is an incomplete account of what acids and metals do to each other. Yes, according to the common Brønsted–Lowry theoryFootnote 18 it is the hydrogen ion that defines acidity, and H+ concentration is indeed what pH meters measure. But it seems to me that a crucial role is also played by the anion (the negative ion), which is specific to each acid. That would help to explain the fact that hydrochloric acid (HCl) is powerless to attack copper but nitric acid (HNO3) dissolves it readily, while both acids should provide an abundance of H+ ions. Also note that the nitric-acid reaction produces not hydrogen gas but nitrogen oxide, which promptly reacts with oxygen in the air to create the red fumes of nitrogen dioxide. I learn from T. M. Lowry’s own textbook (1936, p. 91) that there is no simple story about what happens in this reaction. And I am not entirely alone in having these unorthodox thoughts about the role of anions (see Whitby 1933; Evans 1944; Levine 2002, p. 413).

Things get even more interesting if we note that the topology of Wollaston’s experiment is actually the same as that of Volta’s cell: namely, two different metals with an electrolyte between them. Now, if our purpose were historical replication, it would be crucial to note that Volta saw the configuration of his cell as a bi-metallic pair with an electrolyte off to one side, the latter merely providing a non-metallic path for carrying off the electricity generated by the metals. This is clearly incommensurable with the chemical conception of the same cell, according to which the electricity is generated in the reaction between the electrolyte and one of the metals, the other metal serving merely as a conductor (Kuhn 2000, pp. 22–24). However, for the purpose of physical replication it is equally important to note that these mutually incommensurable conceptions both correspond to one and the same physical apparatus, when the circuit is closed. Connecting up multiple cells, we literally have a battery of them, which is the origin of that term. Volta himself had such an arrangement, which he called “the crown of cups”, though it is less famous than his so-called “pile”, which had pairs of metallic disks separated by layers of electrolyte-soaked paper (Volta 1800). Volta’s pile and crown are both very easy to reproduce, using any pairs of common metals and any electrolyte. In one of my replications, 6 cups of dilute HCl connected in series by joined-up pairs of copper and zinc wires produced a potential of 4.3 V; the voltage nicely decreased in steady steps as the cups were removed one by one from the circuit.
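The step-by-step drop in voltage can be sketched with a few lines of arithmetic. This is a minimal illustration, assuming the six cups behave as identical cells whose voltages simply add in series. That idealization and the function names are assumptions made here for illustration, not part of the reported experiment.

```python
# Idealized series model of the "crown of cups".
# ASSUMPTION: all cups are identical cells and their voltages add linearly.

MEASURED_TOTAL_V = 4.3  # potential reported for 6 cups in series
N_CUPS = 6


def per_cell_voltage(total_v: float, n_cells: int) -> float:
    """Average voltage contributed by each cell, assuming identical cells."""
    return total_v / n_cells


def series_voltage(per_cell_v: float, n_cells: int) -> float:
    """Ideal series voltage: the individual cell voltages simply add."""
    return per_cell_v * n_cells


if __name__ == "__main__":
    v_cell = per_cell_voltage(MEASURED_TOTAL_V, N_CUPS)
    # Remove the cups one by one and print the expected total at each step.
    for n in range(N_CUPS, 0, -1):
        print(f"{n} cups: {series_voltage(v_cell, n):.2f} V")
```

Under this assumption each cup contributes roughly 0.72 V, so removing a cup should lower the total by about that amount each time. Real cells will deviate from this, since wire immersion depth and acid strength vary from cup to cup.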

Now, if Wollaston’s setup is Volta’s cell, then we should be able to understand it simply by referring to the modern explanation of Volta’s cell. So, what is the standard modern explanation of Volta’s cell? Surprisingly, there isn’t one readily available. Standard theoretical treatments of electrochemical cells in physical chemistry textbooks (e.g., Atkins and De Paula 2010, Chap. 6) occur under the rubric of thermodynamic equilibrium, and they focus on computing the steady-state half-cell voltages from the Nernst equation. For anyone wanting a rather mechanical or causal story about how free electrons start getting produced and get moved about, the modern textbook theory is a difficult thing to apply. Most of the lower-level or practically oriented textbooks do attempt to give a more intuitive explanation of electrochemical cells, but what we get almost everywhere we turn is an explanation of the Daniell cell (named after John Frederic Daniell, who taught at King’s College London in the mid-nineteenth century), in which the electrolyte consists of two different solutions connected by a salt bridge or a porous barrier (e.g., Housecroft and Constable 2010, p. 638; Gilbert et al. 2009, pp. 894–895; R. Chang 2010, p. 841; Ramsden 1994, p. 281; Snyder 2003, p. 263). In this setup each metal is dipped in its own solution, and the electrical activity is conveniently explained in terms of the imbalance of the redox potentials on the two sides. But Volta’s original cell, which has only one electrolyte, containing no ions of either metal to begin with, cannot be explained in this way. Consequently Volta’s cell has disappeared from basic electrochemical thinkingFootnote 19; so has Volta’s original theory, which attributed the electrical action to the contact between two different metals, not to chemical reactions. 
Volta’s notion of contact action survives in the form of the physicist’s contact potential (linked to the work function of each metal), but this is not part of the standard chemical discourse today; in my admittedly limited survey, I have only seen one chemistry textbook in which the contact potential is mentioned (Levine 2002, p. 413), and even in that case it is not actually employed in giving an explanation of electrochemical cells.
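To make the contrast concrete, here is a minimal sketch of the textbook Nernst calculation for a Daniell cell. It assumes that activities can be approximated by concentrations and uses the commonly quoted standard reduction potentials for zinc and copper. The function name and the error handling are illustrative choices, not anything taken from the cited textbooks.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol
E0_CU = 0.34      # standard reduction potential Cu2+/Cu in V (common textbook value)
E0_ZN = -0.76     # standard reduction potential Zn2+/Zn in V (common textbook value)


def daniell_emf(conc_zn: float, conc_cu: float, temp_k: float = 298.15) -> float:
    """EMF of Zn | Zn2+ || Cu2+ | Cu from the Nernst equation.

    ASSUMPTION: activities are approximated by molar concentrations.
    """
    if conc_zn <= 0 or conc_cu <= 0:
        # The reaction quotient is undefined without ions of both metals.
        raise ValueError("Nernst equation needs nonzero ion concentrations on both sides")
    n = 2  # electrons transferred per zinc atom oxidized
    q = conc_zn / conc_cu  # reaction quotient for Zn + Cu2+ -> Zn2+ + Cu
    return (E0_CU - E0_ZN) - (R * temp_k) / (n * F) * math.log(q)


if __name__ == "__main__":
    print(f"{daniell_emf(1.0, 1.0):.3f} V")  # both solutions at 1 mol/L
    print(f"{daniell_emf(0.1, 1.0):.3f} V")  # dilute zinc side
```

The calculation only goes through when ions of both metals are already present in solution. With a single electrolyte that initially contains neither, as in Volta’s cell, the reaction quotient is not defined at the outset, which is one way of seeing why this style of explanation does not carry over directly.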

5 Electrochemical Extension: Volta’s Cell in the Twenty-first Century

Once I had identified Volta’s original battery as a missing piece in modern electrochemistry, I was propelled into a different kind of work (departing from history per se), which I have characterized above as extension. Replicating the old experiments was quite easy. But I wanted to understand the phenomena revealed in these experiments in modern theoretical terms, or, more to the point, in any terms that could help me make sense of them. I had some ideas in that regard and devised further experiments to test them, and in the course of those experiments I also found new phenomena, which then led to further theorizing. Thus the replication of Wollaston’s and Volta’s experiments provided a fertile ground for extension. I have been engaged in this new phase of work over the period of 2 years now (as I can only do it in my “spare time”), and I have regularly amused my hosts in the Chemistry Department at University College London with my new experiments and unorthodox ideas.Footnote 20 Here I will present some brief highlights from this ongoing work. There are three main points worth discussing.

The first is the removal of a red herring; this is a point already made above, but it is worth coming back to in a different way. The bubbling up of hydrogen on the zinc side of an acid-based Voltaic cell is mostly an irrelevant side-show so far as the production of electric current is concerned. Into the original Wollaston setup (copper and zinc dipped into acid and connected with each other) one can easily insert an ammeter between the two wires in order to measure the amount of current flowing through the circuit formed by the two metals and the acid. In a typical trial the current registered was 13 mA.Footnote 21 Then I increased the amount of zinc immersed in the acid by about 20 times; the zinc-side reaction was quite exciting, but there was no increase of current. On the other hand, increasing the amount of copper to a similar extent produced a marked increase of current, to 97 mA. Even with just the tip of the zinc wire immersed in the acid, a very good amount of current (82 mA) was sustained as long as there was a lot of copper. So I think the particular effectiveness of acids as electrolytes in the Voltaic cell is due to the provision of H+ ions in the vicinity of the copper (or more generally, the less reactive metal), to receive electrons there and facilitate a flow of current. This is also consistent with the result of an early nineteenth-century experiment by William Sturgeon, who made a Voltaic cell using zinc–mercury amalgam instead of plain zinc, which produced a very good amount of electricity with no production of hydrogen on the zinc side. This experiment was replicated successfully by my student Alexandra Sinclair (2009, pp. 25–36); in one of her typical trials, we observed 43 mA of current with no bubbling at all from the zinc-amalgam wire. What is even more decisive, in my mind, is the fact that an acid is not necessary for making a Voltaic cell. 
In fact, Volta’s original pile did not use acids, but salt water, that is, mostly NaCl solution (Volta 1800, pp. 404, 406).Footnote 22 This is also easily replicated. The active species here has to be Cl−, not Na+.

Having removed the red herring, let me come to the dispute between Volta and others about the cause of the electrical activity in the battery. My sense is that both sides were partially right. In a typical battery both things are going on, all mixed up. The action of both causes can be exhibited nicely in a version of the Wollaston experiment using zinc and gold wires. What happens here is qualitatively the same as what happens in the experiment with zinc and copper, but in this case there is much more bubbling happening at the gold wire than at the zinc, although gold by itself does not generate any bubbles at all in HCl. Assuming that bubbling from the gold wire is caused by the electrons generated by the chemical reaction at the zinc wire, the great bubbling activity on the gold side leaves most observers with no doubt that there is something actively pulling the electrons over to the gold from the zinc, rather than electrons merely “spilling over” from the zinc to the gold. This makes a convincing demonstration of both the Voltaic contact potential and the chemical generation of electric current. I have been attempting to devise experiments that exhibit each of these two actions without the presence of the other, and I have been partially successful.

On the one hand, there are at least some candidate cells that seem to generate a sizable voltage (though minimal current) without obvious chemical reactions. Historically, De Luc and others made various “dry piles” that used dry layers (paper, etc.) instead of electrolytes, which were subjects of great experimental and theoretical debate (Hackmann 2001; Ostwald [1895] 1980, vol. 1, pp. 346–353; Partington 1964, pp. 16–17). There is a long-surviving example of a dry pile in the Clarendon Laboratory at Oxford which, as of 1984, had been ringing a bell nearly continuously for 144 years! A. J. Croft (1984), who reports on this remarkable instrument, says that “what the piles are made of is not known with certainty”, but that “a considerable number” of dry piles inspired by this instrument were made for military purposes during the Second World War by the Oxford physicist A. Elliott.Footnote 23 After much debate back in the nineteenth century an agreement was reached that the operation of the dry pile relied on the presence of moisture in the air; however, there was never a conclusive agreement on whether the role of moisture was to make the dry layers conducting or to generate electricity by facilitating chemical reactions. In my own experiments, two metals dipped in very pure deionized water produce good voltages. And in fact, putting my dry thumb between a zinc disc and a copper disc produces a clearly measurable voltage (up to 0.6 V). These cases made me wonder if there could indeed be cells in which electricity is generated by bi-metallic action as Volta had thought, and the electrolyte conducts electricity by means other than chemical reactions.Footnote 24 On the other hand, one can also easily make a battery without involving two different metals at all. 
One simple setup is two zinc wires dipped into HCl of different concentrations (10% and 1% work quite well), with the two solutions connected by a piece of twine or a twisted strip of paper towel, which functions as a makeshift salt bridge. In this experiment a potential of 0.1 V can easily be generated. Historically, Humphry Davy (1807, p. 33) confounded Volta by making a cell using just one metal, or even no metal at all but a piece of charcoal and two different liquids. Volta himself, fascinated by the thought that his “pile” was a realistic model of the torpedo (electric fish), made a battery using pieces of bone instead of metal (Pancaldi 2005, p. 205). In the cells not involving contact between two different metals, it seems clear (in modern terms) that the net flow of electrons is caused by the different rates of chemical electron-generation on the two sides, which creates an imbalance of pressure.

The phenomena in all these cases are actually quite complex, and their theoretical interpretation is not always straightforward. For example, there is one puzzling thing I noticed in the original Wollaston setup, which is that the voltage of the Wollaston–Volta cell actually decreases with increasing concentration of the acid, beyond a certain threshold. The maximum voltage that I have been able to achieve with a cell made up of a zinc wire and a copper wire in HCl (0.99 V) was obtained when the pH of the solution was 2.5, which is only about as acidic as common vinegar. With the stronger solutions that I used in experiments reported earlier (up to 10% concentration, with pH near 0 or even negative), the typical voltage was in the range of 0.62–0.75 V. It is difficult to see how this negative correlation between acid concentration and voltage can be explained, and this difficulty is a useful reminder that we do not have a simple story about how the voltage is produced. Higher acid concentration should correspond to a higher level of chemical activity, and it does generate higher levels of current; why does it generate lower voltage?
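The pH figures quoted here can be checked with a rough calculation. The sketch below assumes that the percentages are by mass, that HCl is fully dissociated, that activity equals concentration (a poor approximation in strong acid), and that 10% HCl has a density of about 1.05 g/mL. All of these assumptions, and the function names, are supplied here for illustration and are not specified in the experiments described.

```python
import math

M_HCL = 36.46  # molar mass of HCl, g/mol


def hcl_molarity(mass_fraction: float, density_g_per_ml: float) -> float:
    """Molar concentration of HCl from mass fraction and solution density.

    ASSUMPTION: the quoted percentage is by mass (w/w).
    """
    grams_hcl_per_litre = mass_fraction * density_g_per_ml * 1000.0
    return grams_hcl_per_litre / M_HCL


def ideal_ph(h_conc_molar: float) -> float:
    """pH under the idealization that activity equals molar concentration."""
    return -math.log10(h_conc_molar)


def h_conc_from_ph(ph: float) -> float:
    """Hydrogen ion concentration (mol/L) corresponding to a given ideal pH."""
    return 10.0 ** (-ph)


if __name__ == "__main__":
    strong = hcl_molarity(0.10, 1.05)  # density value is an assumed approximation
    print(f"10% HCl: {strong:.2f} mol/L, ideal pH {ideal_ph(strong):.2f}")
    print(f"pH 2.5: {h_conc_from_ph(2.5):.4f} mol/L")
```

On these assumptions 10% HCl comes out at nearly 3 mol/L with a negative ideal pH, while a solution at pH 2.5 holds only about 0.003 mol/L of hydrogen ions. The maximum voltage was therefore obtained in acid roughly a thousand times more dilute than the strongest solutions used.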

The third main point I want to discuss concerns a puzzle that I have only begun to investigate seriously. This arises from the original experiments of Volta, in which he used salt water as the electrolyte. (He also made cells using plain water, but that is another issue!) Let’s suppose for the moment that my basic conception of what happens here is correct: the anions attack the zinc, removing Zn2+ ions into the solution and creating an excess of free electrons; the electrons then get pulled into the copper by the Voltaic contact force, and get released into the NaCl solution from the copper. But then what happens to these electrons? They seem to disappear without a trace. There is no production of hydrogen bubbles, and all the chemists tell me that it is not imaginable that the electrons would combine with sodium ions (Na+) in the NaCl solution (and sure enough, there is no sodium metal produced at the copper wire).

This led me to an extension which does not seem to have occurred to Volta and his contemporaries (but I may be wrong about this), though they certainly had the material resources for it: what happens if we pump lots of electrons into the NaCl solution, by connecting a battery into the circuit with the negative terminal connected to the copper? (I used a saturated NaCl solution, to get maximum effect.) The result of this experiment is quite striking: on connecting the battery (I used two 1.5 V batteries in series), there is immediately a fizz of very active bubbling from the copper wire; meanwhile the copper wire also begins to darken and then becomes coated in a thick layer of black gunk. On the zinc side a white precipitate starts to come off. The theoretical analysis of what happens in this experiment is a rather long and ongoing story, but my provisional conclusion is the following. The gas coming out of the copper wire is hydrogen, the black stuff collecting on that side is zinc, and the white precipitate coming from the zinc side is zinc hydroxide. All of that would mean that there are either a significant number of H+ ions in the NaCl solution, or more likely, the battery is able to decompose water directly on the cathode (copper) side. In the decomposition of water, the H+ ions combine with the electrons supplied by the battery to make hydrogen gas; meanwhile the OH− ions must combine with zinc from the anode, while the Cl− ions remain dissolved.

Although the conventional wisdom among chemists seems to be that the production of hydrogen gas in electrolysis must proceed from pre-existing hydrogen ions (H+ or H3O+), some respectable chemists have indeed postulated a direct decomposition of H2O in electrolysis. Linus Pauling and Peter Pauling (1975, pp. 356–358) considered that this would be the dominant cathode-side reaction in neutral salt solutions (2e− + 2H2O → H2 + 2OH−), while in acidic solutions it would be simply the combination of protons and electrons. Among more recent authors, Raymond Chang (2010, p. 868) is one who follows Pauling and Pauling on this point. Both texts are discussing the electrolysis of dilute salt solutions, but I think in concentrated solutions, too, the scarcity of hydrogen ions would have the same consequence. To return, then, to where this train of inquiry began: it seems plausible that in Volta’s NaCl cell the electrons coming through the copper electrode do produce hydrogen gas, but perhaps at a low enough rate that it becomes dissolved in the water rather than emerging as bubbles. My investigations are continuing.

More generally, the chemistry of a Volta-type battery is not simple, and offers intriguing opportunities for creative investigations. Most chemistry textbooks at university level and below are frustratingly silent on the matter, but there is a good starting point in the following admission in Carl H. Snyder’s delightful textbook (2003, p. 258): “Although the carbon-zinc battery is one of our simpler consumer products, the chemistry that goes on inside that wet, black paste is much too complex to be described in detail here.” Some textbooks do give a simple treatment of the commercial “dry cell”, but admit that what they give is an oversimplification (e.g., R. Chang 2010, p. 857; Pauling and Pauling 1975, pp. 374–375).

6 Can Complementary Experiments Help Science Education?

It seems clear that science does leave some valuable things behind as it progresses. Some basic phenomena become forgotten, and some basic questions cease to command attention. But these losses are not detrimental to specialist science. How water boils is no longer fundamental to thermodynamics, and how Volta’s cell works is no longer so relevant to cutting-edge electrochemistry. As science develops, it may be the case that nothing of fundamental importance rests any longer on its historical starting points. Metaphorically: the upper layers of a tower can be supported by structures other than what they first rested on during the construction process. We know the Eiffel tower stands very well with a great empty space at the bottom—likewise for science. All the same, that does not mean that the no-longer-urgent questions are now unimportant in an absolute sense. Someone should still be investigating them. The discipline of history and philosophy of science (HPS) can serve as a refuge for these and other neglected and excluded scientific questions. In that way, HPS becomes an enterprise that complements specialist science, neither hostile nor subservient to it. HPS in this complementary mode is not about science; rather, it is science, only not as we know it. So I have called it “complementary science” (Chang 1999, 2004, Chap. 6).

In this paper I have discussed how historical experiments can serve the purposes of complementary science. In the last three sections I have given examples of the recovery and extension of scientific knowledge, by means of the physical replication of experimental results that are currently neglected, and by new experiments arising from the replications that address fresh questions.Footnote 25 These two types of historical experiments fall under the category of what I will call complementary experiments, as they belong to the experimental side of the enterprise of complementary science.Footnote 26 I hope I have done enough so far to make a plausible case that these complementary experiments can improve our knowledge of nature. I now want to advance the view that complementary experiments can also aid science education, which is a separate claim that requires its own justification.

If the nature of science is such that only some of the important and interesting questions receive attention, then what does that imply about how we should teach science? Much of science teaching has been governed by the aim of inducting students as effectively as possible into the basic framework of modern science. Teachers of science right up to graduate school often behave like overprotective parents, guiding students carefully on a strict and narrow path towards current specialist knowledge. There is a long, long way to go toward sufficient competency in specialist science, and distractions, even by perfectly scientific questions, are not welcome. This is how science teaching has lost sight of Volta’s cell in electrochemistry, superheating in thermodynamics, and many other things. All of that is understandable for “normal science education” (Van Berkel et al. 2000). But by teaching science in such a rigid way we impoverish the content of science, misrepresent the nature of science, and de-motivate the majority of students. And by making the learning of science safe we also make it devoid of original thinking and independent inquiry. I believe that it will be beneficial, for the students themselves and for society at large, to incorporate complementary science into science education at all levels.

More specifically, what I want to argue here is that complementary experiments have a particular value for science education, in comparison to other dimensions of complementary science, and in comparison to other types of historical experiments. Perhaps the most common type of historical experiments in the context of science education has been the physical replication of important, well-known and unquestioned results. Reproducing these “classic experiments” can be an aid to teaching basic scientific facts and ideas, making them more vivid and memorable (Devons and Hartmann 1970)—but that seems to be a general benefit of laboratory demonstrations or experiments, not anything peculiar to historical experiments. In contrast, the physicalFootnote 27 replication of results that are unknown or considered implausible in current science will have different and much more interesting effects on education. Likewise for extensions into directions divergent from modern science. There are four different ways in which complementary experiments can improve science education.

Complementary experiments will introduce students to phenomena well-known to past scientists and some current experts but generally neglected in today’s science teaching. This plainly adds to students’ knowledge at a fairly factual level. For example, I think all students ought to learn that the boiling temperature of water depends significantly on the shape and material of the container and the amount of dissolved gases. They should also learn that they can make a battery from any two metals and salt water. And so on. For the sake of focusing on phenomena that are more easily explained, today’s science teaching tends to keep from students (and therefore from general society) some important knowledge, often about their immediate physical surroundings. This also enhances the widespread misconception that science is irrelevant to everyday life. But will it not confuse students if we show them strange experiments which we do not know how to explain clearly? Yes, it will, at least to some degree. But what good are clarity and certainty about stories that are false, or at best very limited? Cavicchi (2003, 2009) and Crawford (1993) even argue quite plausibly that confusion has a positive pedagogical role to play.

Doing experiments that fall outside standard pedagogical frameworks will also give students an improved sense of the nature of scientific practice. This should be taken as part of the general argument that HPS helps NOS teaching (e.g., Matthews 1998; Allchin 2007; Kipnis 2009; Niaz 2009), which also seems to be borne out by empirical studies (Teixeira et al. 2009). I believe HPS is effective in NOS teaching because it counters the tendency of “normal science education” to misrepresent scientific practice, as I discussed in Sect. 2 above. Therefore, complementary HPS is the best option. The bits of history neglected by modern science, which it has not bothered to re-write, easily offer us an undisturbed view of scientific practice without the need to peel away the layers of later re-interpretation and tidying-up. In the realm of historical experimentation, this function can be served well by historical replications. However, I think that well-considered physical replications of anomalous results would work just as well, and perhaps in a more striking manner without the distraction of having to worry about all the historical details and subtleties. Best of all, if students can make extensions in previously uncharted directions (or at least in directions that are unknown to themselves and their teachers), they can have a genuine experience of inquiry and take a live lesson in NOS from that experience. Now, I am not proposing that the orthodox pedagogical frameworks should be dismantled altogether. On the contrary, for NOS education it would actually be best if we could preserve the orthodox frameworks and also show students some things that do not fit into them well. This would result in the very valuable lesson that the actual process of scientific inquiry is not fully reflected in the simplifications and distortions made in normal science education for the sake of effective communication and efficient learning. 
And that lesson can be taught without denying or undoing the positive benefits of those pedagogical shortcuts. Part of what is at stake here is pointing out the necessary gap between scientific research and science education.

A related benefit is the cultivation of original and independent thinking, and a critical and inquiring attitude in students. Cavicchi (2008, pp. 719–720) regards her brand of historical experimentation as evocative in applying Eleanor Duckworth’s methodology of critical exploration to science teaching, which involves “engaging students’ curiosity with something complex, encouraging them to interact with it directly and reflectively, and providing new materials and questions that add further options to do and notice.” Don Metz and Art Stinner (2006, p. 3) note that even inquiry-based learning and discovery learning can easily become stultifying processes when the exercises are designed and conducted with the assumption that there are known facts and laws of nature, and students are asked to arrive “on their own” at those pre-determined destinations. I agree with Metz and Stinner’s suggestion that historical experiments provide a way to avoid this problem, and I think complementary experiments will serve this purpose best, because they focus on unknown, unexpected or unorthodox results. Complementary experiments disturb scientific complacency. Simply understanding science as it stands is not good enough for a science education that is intended to foster critical and original thinking. Indeed, in the mode of work that Metz and Stinner (2006, pp. 6–7) encourage, “the experiment often reveals a discrepant or unusual event which calls for hypothesis-generation”, and “students may also… perform independent investigations arising from their proposals” for tackling the problem-situation. These aspects are very close to what I characterize as recovery and extension in complementary experiments. For stirring students’ critical faculties into action, there is nothing like witnessing and producing phenomena that do not fit into standard theories (Footnote 28).

Many front-line teachers and students may object: don’t teachers need to answer to various national curricula and the demands of conventional examinations? Won’t this kind of activity take precious time and attention away from meeting the required learning objectives? Certainly—but I think if students are sufficiently motivated they can learn what they need to know for the exams, and still have time left over to study other interesting things as well. Crawford (1993, p. 207) relates her own very instructive experience in this regard. Her teaching through historical experiments got students curious and excited, but she was falling behind in covering the syllabus; in a panic she “slammed into top gear and taught fast”, but to her pleasant surprise the students coped well with this phase, too. She marvels: “It appeared that pupils with a taste for ‘thinking’, avidly desired knowledge to think with.”

This is not the place to enter fully into the profound debate about whether and to what extent science education should foster critical inquiry, but I think a brief reminder of a few familiar points will not be out of place. First, the training of research scientists is not the only aim of science education. Far from it: only a tiny fraction of students who learn science will ever become research scientists. So the aim of science education must be considered within the larger framework of general or liberal education, and critical thinking is vital in that context. Secondly, even for the training of normal scientists, there is a strong argument for making a place for critical inquiry. There are the old arguments by Karl Popper, John Watkins and Paul Feyerabend (see papers in Lakatos and Musgrave 1970) against the Kuhnian line on the necessity of dogma in scientific training. An even more poignant and interesting contrast to Kuhn is Joseph J. Schwab, who published The Teaching of Science as Enquiry in 1962, the same year as the first edition of Kuhn’s Structure of Scientific Revolutions. Harvey Siegel (1990, pp. 99–102) among others has noted a close parallel between Kuhn’s distinction of normal/extraordinary science and Schwab’s distinction of stable/fluid inquiry. But Schwab’s view is that more and more research is devoted to fluid inquiry as science develops, so it becomes more and more necessary to train scientists for fluid inquiry, that is, to equip them for critical thinking. Finally, it may be objected that students themselves want definite answers, rather than uncertainty and open-ended questions. That is certainly true in some cases, as Crawford (1993, pp. 205–206), Heering (2000, p. 366) and Cavicchi (2009, pp. 259, 263) all report about some of their students. But the aim of education is not simply to give students what they immediately want (they may also want junk food and unprotected sex). 
It is also our duty as educators to introduce students to the uncertainty and discomfort of real scientific research, to let them realize that “thinking begins with not knowing” (Crawford 1993, p. 206). Enlightenment by textbook answers is not the only type of positive learning experience.

The final educational benefit of complementary experiments that I want to stress is that they will stimulate students’ sense of wonder about nature and their excitement about science, through a direct engagement with natural phenomena without well-entrenched expectations of pre-determined outcomes. Among practitioners of historical experiments Crawford and Cavicchi have been the most eloquent about this benefit, and I believe that they are inclined toward complementary experiments. Crawford (1993, p. 207) reports that her use of history created “a crackling atmosphere of liveliness and deep satisfaction” and fostered “lively excited curiosity”. In Cavicchi’s words (2006, p. 91): “In science classrooms where experimentation evolves through students’ curiosity and observing, their learning becomes self-generating, joyful and resilient.” The degree of open-endedness possible in complementary experimentation can only be matched in cutting-edge scientific research, which the majority of students learning science at school or even university will never have the pleasure of experiencing. Having things that the teachers themselves cannot predict or explain can be a real motivator for students, as long as the ignorance is admitted with honesty and confidence, and used as a driver of inquiry. If science educators are worried about getting more students interested in science, they should welcome this additional source of motivation. For myself, I can honestly say that the summer of 2004, much of which I spent boiling water, was one of the most exciting periods of scientific learning that I have ever experienced. And with the ongoing work on electrochemistry, I have been completely captivated by chemistry for the first time in my life. I am optimistic that many students will be drawn in by the sheer curiosity of phenomena and the independence of inquiry that complementary experiments can bring. 
Contrast that prospect with the disappointing tedium of run-of-the-mill laboratory exercises in school science (titration and circuits come to mind), in which students are expected to produce “correct” results in experiments designed by others to find out things they weren’t even curious about to begin with, using pre-fabricated apparatus whose workings they do not really understand.

In conclusion: a thoughtful engagement with past science helps us realize that modern specialist science only deals with a restricted range of things in a restricted range of ways. The brilliant successes of today’s science may make it seem that we have got the basic truth about nature, with only some details left to be worked out. But just scratch the surface, and we begin to see so much more, even in very simple and mundane phenomena. Opportunities for the recovery and extension of scientific knowledge should not be neglected in science education. Many complementary experiments are not theoretically or instrumentally demanding to start with, hence within easy reach of non-specialists with few resources. We can use them to give students a genuine experience of open-ended scientific inquiry, allowing them to discover something about nature for themselves and in the process also learn what it means to do original research. This, I believe, is an important part of the ultimate aim of science education.