1 Introduction

In 1958 Norwood Russell Hanson remarked that “there is more to seeing than meets the eyeball” (Hanson 1958, 7). Hanson was right to point out that what we know influences what we observe. This is nicely illustrated in Fig. 1. A trained high-energy physicist observes the decay of a Λ hyperon into a proton and a pi meson near the top of the bottom section of the cloud chamber; an untrained observer sees only an inverted vee with one short track and one long track. Hanson’s view was later transformed by Thomas Kuhn and Paul Feyerabend into the “theory-ladenness of observation” and its more radical cousin, “incommensurability.” Each of these problems has both a philosophical and a pragmatic component.

Fig. 1 The decay of a Λ hyperon into a proton and a pion. From Alford and Leighton (1953)

2 Theory-Ladenness: The Philosophical Component

I will deal first with the philosophical component of theory-ladenness. This is the view that observation cannot function in an unbiased way in the testing of theories because observational judgments are affected by the theoretical beliefs of the observer. Some philosophers of science (see, for example, Franklin et al. 1989) attempted to avoid this problem by distinguishing the theory of the experimental apparatus from the theory of the phenomenon under investigation. They argued that if the two theories are distinct then the problem can be avoided. Sometimes, however, that distinction cannot be made. An exemplar is the use of a mercury thermometer to test whether objects expand as their temperature increases: the proper operation of the thermometer depends on the hypothesis under test. One may argue, however, that a mercury thermometer may still be used in such an experiment if it can be calibrated against an independent thermometer, such as a constant-volume gas thermometer, whose operation does not depend on the theory under test.Footnote 1 In addition, there is certainly no guarantee that an experiment described in the language of a theory must give results that agree with the predictions of that theory.

It is clear that Kuhn did not intend the above view of theory-ladenness, because in his theory of scientific revolutions the motor for such revolutions is provided by “anomalies,” experiments that disagree with the predictions of the paradigm or theory under test. Consider the experiments of Lummer and Pringsheim and of Rubens and Kurlbaum on the spectrum of blackbody radiation, which provided evidence against Wien’s Law and gave the impetus for the introduction of quantization by Max Planck. These experiments were described in the language of classical physics, but their results disagreed with its predictions. Other examples abound.

Kuhn, Feyerabend, and others have further argued that there can be no comparison between competing paradigms, or worldviews, based solely on experimental evidence. As Barry Barnes stated, “[t]here is no appropriate scale available with which to weigh the merits of alternative paradigms: they are incommensurable” (Barnes 1982, 65). Briefly stated, the argument is as follows. There can be no neutral observation language; all observation terms are theory-laden, and thus we cannot compare experimental results, because in different paradigms the terms describing those results have different meanings, even when the words used are the same. An example is the term “mass,” which in Newtonian mechanics is a constant, whereas in Einstein’s relativistic mechanics it depends on the velocity of the object.Footnote 2

I disagree. I will demonstrate that a procedurally defined experiment, loosely called the elastic scattering of equal-mass objects (protons, if you will), can distinguish between Newtonian and Einsteinian mechanics. The experimental procedure is as follows. Consider a class of objects, let us say billiard balls. The objects are examined pairwise by placing a compressed spring between them. The spring is allowed to expand freely and the velocities of the two objects are measured. Because we restrict ourselves to a single frame of reference in the laboratory, the measurement is theory neutral between Newtonian mechanics and special relativity. We then select two balls whose velocities are equal. A Newtonian would interpret this procedure as providing two objects with equal and constant mass. An Einsteinian would interpret it as providing two objects whose rest mass M₀ is the same, but whose mass varies with velocity, M = M₀/√(1 − v²/c²). The interpretations differ, but the procedure itself is theory neutral: adherents of both views agree that the two objects have equal “mass.” One of the objects is then placed at rest in the laboratory and the other given a velocity V (again theory neutral) and directed at the object at rest. The particles then scatter from one another. Care is taken to make the collision elastic, with no energy lost. The final velocities of the two particles are measured, as is the angle between them. In such an experiment the Newtonian prediction for the angle between the two outgoing particles is 90°, whereas in relativistic mechanics the angle is less than 90° (Fig. 2) (for details see Franklin 1984). Although adherents of the two competing views will describe the experiment differently, they will agree on the respective predictions and on the measurement of the angles in the laboratory system. Thus, the two paradigms can be compared. They are commensurable.Footnote 3
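To see how sharp the disagreement is, one can compute the relativistic opening angle as a function of the beam velocity. The following is a minimal sketch in Python, using the standard two-body kinematic relation for elastic scattering of equal rest masses, tan θ₁ tan θ₂ = 2/(γ + 1); the symmetric case θ₁ = θ₂ is taken for simplicity, and the velocities shown are illustrative:

```python
import math

def opening_angle_deg(beta):
    """Opening angle between the two outgoing equal-mass particles
    after a symmetric elastic collision, for incident speed beta = v/c.

    Uses the standard relativistic relation tan(t1)*tan(t2) = 2/(gamma+1);
    in the symmetric case t1 = t2, so the opening angle is
    2*atan(sqrt(2/(gamma+1))).
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    theta = math.atan(math.sqrt(2.0 / (gamma + 1.0)))
    return 2.0 * math.degrees(theta)

for beta in (0.01, 0.5, 0.9, 0.99):
    print(f"v = {beta:4.2f} c:  Newtonian 90.00 deg,  "
          f"relativistic {opening_angle_deg(beta):5.2f} deg")
```

As γ → 1 the relativistic angle approaches 90°, so the two mechanics disagree measurably only at velocities comparable to c, which is why the experiment must be done with fast particles such as protons.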

Fig. 2 A diagram of the “equal mass” scattering experiment

A real-life example of this is the experiment that demonstrated that parity, or left–right (mirror) symmetry, is violated in the weak interactions. In the early 1950s the physics community was faced with what was known as the “τ–θ” puzzle. There were apparently two elementary particles, the τ and the θ. On one set of criteria, namely mass and lifetime, they seemed to be the same particle; on another set of criteria, that of spin and parity, they appeared to be different particles. Lee and Yang (1956) pointed out that the puzzle would be solved if parity were not conserved in the weak interactions.Footnote 4 They suggested several experimental tests of their hypothesis, one of which was the examination of the beta decay of aligned nuclei, nuclei whose spins point in the same direction (Fig. 3). If, for example, more electrons are emitted opposite to the nuclear spin direction than in the same direction, then this would demonstrate that parity, or mirror symmetry, is violated. In a mirror the spin of the nucleus is reversed, whereas the direction of the electron momentum remains the same. (Notice that in Fig. 3 the particle in real space is spinning counterclockwise when viewed from above and its spin direction is up; in mirror space the particle is spinning clockwise and the spin direction is down.) Thus, the real and mirror experiments would differ: in the mirror experiment more electrons are emitted in the same direction as the nuclear spin, whereas in real space more electrons are emitted opposite to the nuclear spin direction. This asymmetry was, in fact, observed in an experiment done by Wu and her collaborators (1957). The result, shown in Fig. 4, clearly exhibits the asymmetry. That experiment, along with two others, decided the issue (for details see Franklin 1986, Chapter 1). Although this may not be as general as a paradigm shift, the violation of a very general, discrete symmetry principle should be, and was, regarded as a major change in theory.Footnote 5 In this case, because there are only two classes of theory, those that conserve parity and those that do not, one can even avoid the Duhem–Quine problem.Footnote 6 (See, for example, Franklin and Smokler 1981).

Fig. 3 The decay of an oriented nucleus in real space and in mirror space

Fig. 4 Relative counting rates for β particles from the decay of oriented ⁶⁰Co nuclei for different nuclear orientations (magnetic field directions). The asymmetry is clearly visible. From Wu et al. (1957)
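The logic of the measurement can be made concrete with a small simulation. The sketch below assumes the textbook form of the decay angular distribution, W(θ) ∝ 1 + a cos θ, where θ is the angle between the electron momentum and the nuclear spin and a is an effective coefficient (the product of the decay asymmetry parameter, the nuclear polarization, and v/c); the value a = −0.4 is illustrative, not the Wu et al. number. A mirror reflection flips the spin, which amounts to a → −a, so any nonzero measured asymmetry distinguishes the real experiment from its mirror image:

```python
import random

def sample_cos_theta(a, rng):
    """Draw cos(theta) from W ∝ 1 + a*cos(theta) by rejection sampling."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        if rng.uniform(0.0, 1.0 + abs(a)) < 1.0 + a * x:
            return x

def updown_asymmetry(a, n, seed=1):
    """(N_up - N_down)/N for n decays; analytically this equals a/2."""
    rng = random.Random(seed)
    up = sum(1 for _ in range(n) if sample_cos_theta(a, rng) > 0)
    return (up - (n - up)) / n

# a = -0.4 is an illustrative value: fewer electrons along the spin.
print(updown_asymmetry(a=-0.4, n=100_000))  # ~ -0.2
print(updown_asymmetry(a=0.0,  n=100_000))  # ~ 0: the parity-conserving case
```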

3 Theory-Ladenness: The Practical Problem

The practical problems are, perhaps, more difficult to solve. Virtually all experiments, except for those we can regard as exploratory,Footnote 7 are designed and conducted under the auspices of some theory and described in its language. One might worry, for example, that adherence to a particular theory may result in an experimental design that precludes observation of phenomena not predicted by that theory. An example of this, discussed by Galison (1987, Chapter 4), was experiment E1A at Fermilab (Benvenuti 1974, 674), one of the experiments that first discovered the existence of weak neutral currents. When the experiment was initially conceived, it was a rule of thumb in particle physics that weak neutral currents did not exist. The initial design therefore included a muon trigger, which would be satisfied only by charged-current interactions. In a charged-current event a neutrino is incident and a charged muon is emitted; in a neutral-current event there is a neutrino in both the initial and final states, and no muon is emitted. Thus, requiring a muon in the event trigger would preclude the observation of neutral currents.
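The design point can be put schematically: a trigger that demands a muon accepts every charged-current candidate and rejects every neutral-current candidate, no matter what else is in the event. A minimal sketch follows; the event representation and the energy threshold are hypothetical, not E1A’s actual trigger logic:

```python
from dataclasses import dataclass

@dataclass
class Event:
    has_muon: bool          # a charged muon in the final state
    hadronic_energy: float  # GeV deposited by the hadronic shower

def muon_trigger(evt: Event) -> bool:
    """Original design: record only events containing a muon."""
    return evt.has_muon

def revised_trigger(evt: Event) -> bool:
    """Revised design: also record muonless events with enough hadronic
    energy to be a neutral-current candidate (threshold is illustrative)."""
    return evt.has_muon or evt.hadronic_energy > 4.0

neutral_current = Event(has_muon=False, hadronic_energy=10.0)
print(muon_trigger(neutral_current))     # False: NC events are never recorded
print(revised_trigger(neutral_current))  # True: the revised trigger keeps them
```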

After discussion with theorists, who pointed out that the then recently proposed Weinberg-Salam unified theory of electroweak interactions predicted neutral currents, the trigger was changed so that neutral currents could be observed. In its original form, the experiment could not have detected those currents; fortunately, the design was changed before the experiment was performed. It is interesting to note that neutral-current events had, in fact, been observed in earlier experiments, but had been attributed to a neutron background (see Pickering 1984, 98–100 for details). The lack of a theoretical prediction had led to a misinterpretation of the observations, or to the failure to observe an effect, another problem of the theory-ladenness of observation. Other possible practical problems of theory-ladenness, discussed below, are experimenter bias, the desire to get results in agreement with theory,Footnote 8 and the possibility of theoretical bias in the acquisition of data.

4 An Illustration: The Double Scattering of Electrons

An episode in which several of these practical problems are illustrated occurred in the experiments that investigated the double scattering of electrons at large angles from heavy nuclei in the 1920s and 1930s. In analogy with X-ray scattering, it was believed that the first scatter would polarize the electrons and the second scatter would detect that polarization through a forward–backward (0°–180°) asymmetry. None of the early experiments, those performed in the 1920s, found such an asymmetry or evidence for electron polarization. One experiment, performed by Cox et al. (1928), did, however, observe an unexpected left–right (90°–270°) asymmetry in the second scattering. Around 1930 Neville Mott, on the basis of Paul Dirac’s electron theory, proposed a quantitative theory of double scattering and predicted a forward–backward (0°–180°) asymmetry of approximately 10% (Mott 1929, 1931, 1932). The failure to observe that asymmetry cast doubt on Dirac’s theory, which had, at the time, very strong support because of its prediction of the positron and its subsequent confirmation by Carl Anderson. Mott admitted that his theory did not say anything about a (90°–270°) asymmetry. Subsequent experiments during the 1930s, all unsuccessful, searched for the (0°–180°) asymmetry to try to resolve the anomaly for Dirac’s theory.Footnote 9 No experiments attempted to replicate the (90°–270°) asymmetry found by Cox and his collaborators.Footnote 10 It was not thought to be as theoretically important as the failure to observe the (0°–180°) asymmetry. Theory strongly guided the design of the later experiments, so that the only searches performed were for Mott’s predicted (0°–180°) asymmetry.
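Operationally, the two asymmetries are just normalized differences of counting rates at opposite azimuthal positions of the second scatterer. A small sketch with invented counts, chosen to mimic the historical situation of a null forward–backward result alongside a Cox-style left–right effect:

```python
def asymmetry(n_a, n_b):
    """Normalized counting-rate asymmetry between two detector positions."""
    return (n_a - n_b) / (n_a + n_b)

# Hypothetical second-scattering counts at the four azimuthal positions;
# these numbers are invented for illustration, not historical data.
counts = {0: 10_500, 90: 9_200, 180: 10_480, 270: 10_800}

# ~0.001: null, as in the 1930s searches for Mott's predicted effect
print("forward-backward (0-180):", asymmetry(counts[0], counts[180]))
# ~-0.08: a Cox-style left-right effect
print("left-right (90-270):    ", asymmetry(counts[90], counts[270]))
```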

There were, in addition, several theoretical attempts to resolve the discrepancy. All were unsuccessful. It was not until the early 1940s that an experimental problem was found that had precluded the observation of the effect predicted by Mott. When that problem was corrected, the predicted asymmetry was observed. Ironically, it was the work of Cox and others that solved the problem (for details see Franklin 1986, Chapter 2). By that time even they did not recall their earlier results on the (90°–270°) asymmetry. It seems clear that the lack of a theoretical context was responsible for the failure even to attempt replication of the Cox results.

Interestingly, the effect observed by Cox et al. (1928), who did not recognize its importance, demonstrates, at least in retrospect, the nonconservation of parity. One can see that Cox and his collaborators came tantalizingly close to recognizing the significance of their work. “It should be remarked of several of the suggested explanations [of their result] that their acceptance would offer greater difficulties in accounting for the discrepancies among the different results than would the acceptance of the hypothesis that we have here a true polarization due to the double scattering of asymmetrical electrons” (Cox et al. 1928, 548, emphasis added). Electrons from beta decay are, in fact, longitudinally polarized so that the first scatter transforms that longitudinal polarization into a transverse polarization, which results in the (90°–270°) asymmetry found in the second scatter. The longitudinal polarization implied by the (90°–270°) asymmetry is itself evidence for parity nonconservation. Although parity conservation in quantum mechanics had been suggested in 1927 by Eugene Wigner, its importance was not widely appreciated. The lack of a theoretical context, unlike that which was available in the 1950s, when the Wu experiment was done, accounts, in all probability, for the failure to recognize the significance of the (90°–270°) asymmetry found by Cox et al.Footnote 11

It is unlikely that, had other experimenters attempted to replicate the Cox experiment, they would have observed the (90°–270°) asymmetry. Cox and his collaborators used electrons from beta decay, which are longitudinally polarized, in their early experiments. The later experiments all used electrons from thermionic sources, which are unpolarized, and which would have precluded observing the same effect. This was not a result of any adherence to theory, but rather of the desire for more intense and better-controlled electron beams, in other words a “better” experimental apparatus.

This episode also illustrates another practical problem that one might associate with theory-ladenness: the issue of an experimenter desiring to get results in agreement with accepted theory and practicing what one might legitimately call bad science. This is discussed in the next section. In this episode the only experimental results obtained during the 1930s that agreed with Mott’s theory were those of Rupp (1929, 1930a, b, 1931, 1932a, b, c, 1934; Rupp and Szilard 1931). In 1935 Rupp published a retraction of several papers on electron polarization, which included a note from a psychiatrist stating that Rupp had suffered from a mental illness and could not distinguish fantasy from reality (Rupp 1935). The results were fraudulent (for details see French 1999; Darrigol 1984; Franklin 1986, Chapter 2). At the time Rupp’s results merely added to an already very confused experimental and theoretical situation. They did not persuade the physics community that Mott’s theory was correct.

This episode illustrates two of the practical problems of theory-ladenness: the failure to recognize the significance of experimental results because of a lack of theoretical context and the failure to attempt the replication of an experiment because it seemed less important theoretically than a similar experiment.

In some very interesting recent work, Karaca (2013) has distinguished between strong and weak senses of the theory-ladenness of experiment.Footnote 12 In the weak sense, an experiment and its apparatus are described and discussed in terms of background theories that apply to a large segment of phenomena, in some cases all of them, such as quantum field theory. In describing the strong sense of theory-ladenness he remarks, “I shall characterize theory-driven [strong theory ladenness] experimentation as a specific type of experimentation that is performed under the continuous guidance of some theoretical account typically with the aim of ascertaining the conclusions of the same account.” He illustrates these different senses with accounts of experiments on elastic proton–proton scattering and on deep-inelastic electron–proton scattering, exemplifying strong and weak theory-ladenness, respectively. In the former, the design and construction of the experiment, as well as the acquisition, analysis, and interpretation of the data, were performed under the guidance of Scattering Matrix Theory, in particular Regge Pole theory. Nevertheless, even with such guidance, the experimental results might have disagreed with the theoretical predictions. In theory-driven experiments attention is restricted to phenomena the theory deems important, and other interesting phenomena may be missed, as was the case in the experiments on the double scattering of electrons described above. In the experiments on deep-inelastic scattering, although theory was involved in many ways, Karaca shows that the experiments were theory laden only in the weak sense.

4.1 Another Problem: Possible Experimenter Bias in the Selection of Data or the Acquisition of Data.Footnote 13

It is also a fact of empirical science that experimenters never use all of their data in producing a result. Data may be excluded for many legitimate reasons. Certainly no one would think of using data obtained when the experimental apparatus was not working properly. Even when the apparatus is working properly, problems may arise when only selected portions of the data, i.e., “good” data, are used to obtain a result. Selection criteria, usually referred to as “cuts,” are applied either to the data themselves or to the analysis proceduresFootnote 14 and are designed to maximize the desired signal and to eliminate or minimize background that might mask or mimic the desired effect. One might worry that the experimental result is an artifact produced by the cuts, and not a valid result.Footnote 15 A further worry may arise if the effect of the cuts on the experimental result is known in advance. Is the experimenter biased, tuning the cuts to produce a desired outcome?Footnote 16

One technique designed to avoid such possible bias is “blind analysis,” in which the selection criteria are set without knowing the effect on the final result (for details see Franklin 2002, Chapter 6). The reasons for using blind analysis along with possible problems due to experimenter bias are clearly stated in the “Draft Guidelines for Blind Analysis in BABAR (Burchat et al. 2000).”Footnote 17

The major motivation for a blind analysis is to adopt a technique which removes or minimizes Experimenter’s Bias; the unconscious biasing of a measurement toward prior results or theoretical predictions…. (emphasis added)Footnote 18

There are a number of ways in which Experimenter’s Bias can infect a measurement which can be eliminated with a blind analysis. First, the point at which the decision is made to stop working and present one’s result can be influenced by the value of the result itself, and how it compares with prior results or predictions.Footnote 19 In a blind analysis the decision to stop and publish is made based on external checks, and not on the numerical value of the result. After all there is no information about the correctness of a measurement in the numerical value obtained; a blind analysis enforces this separation. Second, choices about the data to include, or the cuts to use, can be subtly biased, if the effect these choices have on the result is known. Often changes in an analysis, which change the data set, can affect the value of a result in a statistically reasonable way. A blind analysis ensures that such choices affecting the data sample do not bias the result. Third, the values and types of cuts to use can be biased by knowledge of the effect of these cuts on particular events in the data. In particular, for rare decay searches or measurements involving small samples a blind analysis removes the possibility that cuts are chosen to include or exclude particular events in the data. In this case a blind analysis ensures a statistically meaningful result (Burchat et al. 2000, 3).
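One widely used blinding technique is to add a hidden, randomly generated offset to the measured quantity while the analysis is being developed, removing it only after all selection criteria are frozen. The following is a minimal sketch of that idea, not BABAR’s actual implementation:

```python
import random

class BlindMeasurement:
    """Hide the true value of a measurement behind a fixed random offset.

    Analysts tune their cuts while seeing only the blinded value;
    unblind() is called once, after the selection criteria are frozen.
    """
    def __init__(self, seed):
        self._offset = random.Random(seed).uniform(-1.0, 1.0)  # hidden
        self._frozen = False

    def blinded(self, true_value):
        return true_value + self._offset

    def freeze_cuts(self):
        self._frozen = True

    def unblind(self, blinded_value):
        if not self._frozen:
            raise RuntimeError("cuts must be frozen before unblinding")
        return blinded_value - self._offset

blind = BlindMeasurement(seed=20240613)
visible = blind.blinded(true_value=0.342)  # what analysts see while tuning
blind.freeze_cuts()
print(blind.unblind(visible))              # 0.342, revealed only at the end
```

The key design feature is that unblind() refuses to run until the cuts are frozen, enforcing exactly the separation between selection choices and the numerical result that the guidelines describe.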

Another possible problem arose at about the same time. The increased intensity provided by new accelerators meant that the data collection systems could not deal with the total amount of data produced. Initially one could have relatively loose triggers that would not exclude very many events of interest.Footnote 20 At the Large Hadron Collider (LHC) the problem is extremely serious. At the LHC there are beam bunch crossings every 25 ns and each crossing produces approximately 20 proton–proton interactions. That means that interactions are produced at a rate of 800 MHz (this depends, of course, on the luminosity of the beams). The data acquisition systems can deal with an event rate of approximately one hundred events per second.Footnote 21 Thus, the recording rate of events must be reduced by a factor of more than a million. “The CMSFootnote 22 trigger is designed to perform a data reduction from 32 MHzFootnote 23 down to O (100) Hz via different sequential triggers. The first trigger level of CMS, Level-1, is hardware implemented and reduces the data rate, by using specific low level analysis in custom trigger processors. All further levels are software filters which are executed on (partial) event data in a processor farm.Footnote 24 This is the upper level of real-time data selection and is referred to as High-Level Trigger (HLT). Only data accepted by the HLT are recorded for offline physics analysis” (Adam et al. 2006, 608). The HLT software can be changed relatively easily and there are various monitoring systems in place to ensure that such changes do not substantially change the operation of the experiment.Footnote 25
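The required rejection factor follows directly from these numbers; a back-of-the-envelope check:

```python
crossing_rate = 1 / 25e-9          # one bunch crossing every 25 ns -> 40 MHz
interactions_per_crossing = 20     # depends on the beam luminosity
production_rate = crossing_rate * interactions_per_crossing  # 8e8/s = 800 MHz
recording_rate = 100               # events/s the data acquisition can record

print(f"production rate : {production_rate:.1e} interactions/s")
# ~8e6: the "factor of more than a million" quoted in the text
print(f"rejection factor: {production_rate / recording_rate:.1e}")
```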

Most of the data produced are never recorded.Footnote 26 Only those events deemed to be of physics interest are stored. This is, at least in principle, relatively easy to do for known physical processes, but one of the goals of the LHC and its experiments is to look for physics beyond what is known, or beyond the Standard Model. Here the assumption is made that the new physics will resemble known physics. As Karaca (2011) has pointed out, virtually all models of physics that go beyond the Standard Model predict the production of heavy particles, which will decay into particles with high transverse momentum (pT) or into jets. Table 1 shows a sample of Level-1 triggers used by the ATLAS experiment at the LHC. It lists the types of events searched for and their purpose.Footnote 27 The first-level triggers reduce the event rate by approximately a factor of 1,000. Further high-level triggers reduce it to a manageable rate of approximately 100 events/s. A small sample of the unfiltered events is saved, but the rest of the data are lost.
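Schematically, a Level-1 menu of the kind shown in Table 1 is a disjunction of threshold conditions on reconstructed objects. The sketch below is highly simplified; the object names and pT thresholds are illustrative placeholders, not the actual ATLAS settings:

```python
# Each menu item: (object type, minimum transverse momentum in GeV).
# The values below are illustrative placeholders, not ATLAS settings.
LEVEL1_MENU = [
    ("muon", 20.0),
    ("electron_or_photon", 25.0),
    ("jet", 100.0),
]

def level1_accept(event_objects):
    """Accept the event if any reconstructed object passes its threshold.

    event_objects: list of (object_type, pT in GeV) pairs.
    """
    return any(pt >= threshold
               for obj, pt in event_objects
               for menu_obj, threshold in LEVEL1_MENU
               if obj == menu_obj)

print(level1_accept([("jet", 150.0)]))  # True: a high-pT jet fires the menu
print(level1_accept([("jet", 30.0)]))   # False: a soft event is lost forever
```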

Table 1 Some ATLAS Trigger Menus (Karaca 2011)

Previous work has discussed the problem of selectivity, the application of selection criteria to already acquired data, so that the phenomenon under investigation can be isolated (Franklin 2002, Chapters 1–6). One important way to argue that a result is correct, and not an artifact produced by the selection criteria, is to vary the cuts and see whether the result is stable under reasonable variations. If it is, this robustness argues for the correctness of the result. As Karaca points out, this strategy is not available when the selection criteria are being applied at the data acquisition stage. He notes, however, that there is a form of robustness that is applied here. Karaca reports that in setting the Level-1 triggers the ATLAS group examines a diverse set of theories and models that go beyond the Standard Model. These include supersymmetry, extra-dimensional models, and models with heavy gauge bosons. These models are examined for common, robust properties, in this case the production of heavy particles. On the basis of that property, a hypothesis that can be empirically tested is constructed: heavy particles are produced which decay into particles or jets with high transverse momentum. Karaca notes that this hypothesis can be empirically checked using known particles, a procedure that demonstrates the ability of the experimental apparatus and the analysis procedures to detect such particles. (CMS has, in fact, replicated much of the history of particle physics in the twentieth century, observing particles from the π0 meson to the top quark. Such replication demonstrates that both the experimental apparatus and the analysis procedures are working properly.)
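For already-recorded data, the robustness check can be stated as a procedure: re-run the analysis while varying each cut around its nominal value and confirm that the result is stable. A schematic sketch, with an invented dataset and an invented quality cut:

```python
import random

rng = random.Random(0)
# Invented dataset: (measured quantity, quality score) pairs.
data = [(rng.gauss(1.0, 0.2), rng.uniform(0, 1)) for _ in range(10_000)]

def result(cut):
    """Analysis result using only events whose quality score passes the cut."""
    selected = [x for x, q in data if q > cut]
    return sum(selected) / len(selected)

for cut in (0.40, 0.45, 0.50, 0.55, 0.60):  # variations around the nominal 0.5
    print(f"cut = {cut:4.2f}: result = {result(cut):.4f}")
# If the results agree within uncertainties, the measurement is robust
# against the choice of cut; a systematic drift would suggest an artifact.
```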

Finally, various trigger menus are established relative to that phenomenological hypothesis (Table 1). One then checks whether experimental results remain stable across the various menus. (This can be done empirically by detecting known heavy particles, or by Monte Carlo simulation for proposed theoretical models involving new heavy particles.) This provides what one might call theoretical-empirical robustness. Nevertheless, there is an underlying assumption that any new physics will resemble known physics. If it does not, then the new physics might very well be missed. This danger is reduced by the fact that theorists are constantly producing models of new physics, and the history of physics has shown that plausible, and sometimes even implausible, models are investigated.

5 Conclusion

As the discussion above indicates, I believe that the philosophical problems associated with both theory-ladenness and incommensurability have been solved. The practical problems associated with theory-ladenness are real and still with us, but scientists are aware of them and, as we have seen, take steps to avoid them. Nevertheless, although these problems can often be minimized, they cannot be eliminated completely.