Introduction

In science, the prestige and status associated with the Nobel Prize are unmatched. An important symbol of scientific achievement and discovery, the Prize annually generates enormous interest in the scientific community and the general public. Accordingly, many studies have been devoted to the selection mechanisms and the institutional aspects of the history of the Prize (Zuckerman 1977; Crawford 1984; Friedman 2001), as well as to the analysis of particular winners (Barkan 1994; Björk 1991; Jenkins 2001; Krige 2001; Elzinga 2006). Also, thanks to the opening of the Nobel archives for prizes and nominees dating back more than 50 years, we now know more about the population of scientists nominated for the coveted prizes (Bernhard et al. 1982; MacLachlan 1991; Crawford 1992).

From a bibliometric perspective, one could ask if Nobelists are more cited than the average scientist and if we can find a particular pattern for winners that would distinguish them from the rest of the community. In the latter case, one could even try to “guess”, or predict, the next winner. Eugene Garfield (1981) has explored some of these questions in a series of papers attempting to elucidate the profile of prizewinners by describing a subset of scientists “of Nobel class” via their citation statistics (Garfield 1977, 1986; Garfield and Welljams-Dorof 1992). Not surprisingly, it was found that this set of scientists does differ in citation frequency from the “average” scientist: “in the highest percentile […] a significant percentage have won the Nobel Prize or go on to win the Prize in later years. Also, the author impact of Nobelists is sufficiently high to distinguish them from non-Nobelists” (Garfield and Welljams-Dorof 1992, p. 118). Garfield also claims that Nobel laureates can be distinguished from other scientists by having written “citation classics”. It should be noted, however, that while Garfield claims that bibliometrics have substantial predictive power, he admits that the subjectivity of the Nobel selection process by definition precludes any systematic forecasting from “objective” data (Garfield 1986). A similar paper by Ashton and Oppenheim (1978) presents an improvement on Garfield’s method by, among other things, generalizing the citation statistics to include non-first authors. They also claim not to be able to predict a prizewinner, but rather to identify a group of candidates likely to win the Prize. In general, the inclusion of multi-authored papers improves the rankings, but does not substantially alter them. It should also be noted that most of the predictions discussed above were done a posteriori. In the same vein, Kademani et al. (2005) have demonstrated the dominance of multi-authored papers by Nobel laureates, although they use data from only 8 winners. Other similar bibliometric studies have also restricted their scope to a few selected laureates (Kademani et al. 2005 and references therein). Finally, a recent article by Karazija and Momkausaite (2004) has brought to light certain characteristics of the Nobel Prize in physics, most notably the distribution of winners’ ages, their fields of study and the lag times between a discovery and its corresponding prize.

An important aspect that is missing from most bibliometric analyses of Nobel Prize winners is a sensitivity to the evolution over time of the dynamics and growth of science, which may affect the pattern of citations and the relative position of prizewinners in the structure of the scientific field. Science has grown exponentially over the twentieth century, and all disciplines have given rise to so many specialties that the fragmentation of science makes it more difficult now than ever to identify an obvious winner for a discipline as a whole. Whereas it was still relatively easy around 1910 to know who the most important scientists in a discipline were, such a judgment has been much more difficult since at least the 1970s. In order to analyze the effect of these changes on the distribution of citations to prospective winners of a Nobel Prize, we provide in this paper a detailed analysis of the changing pattern of citation and centrality in the co-citation network of all winners of the chemistry and physics prizes from 1901 to 2007, as well as for nominees (1901–1945). As we will see, the changing dynamic of science and its growth and increased specialization since the 1960s had the effect of diluting potential winners in a massive group of central scientists, from which it has become nearly impossible to pick a winner using bibliometric tools.

Methodology

The basis for this paper is a list of the 500 most cited and most central (footnote 1) chemists and physicists constructed annually from the citation data of the subset of physics and chemistry journals in the Web of Science for the period 1900–2006 (footnote 2). We thus exclude important multidisciplinary journals such as Nature and Science, but ensure a good representation of the disciplinary journals for each field. The two measures of a scientist’s importance are distinct though highly correlated: the more citations an author receives, the more chance he or she has of becoming “central” in the network of co-citations by having links with many other cited authors. In terms of co-citations, one can interpret centrality as a measure of an author’s position in the discipline’s network (Gingras 2007), a low centrality corresponding to authors on the periphery of the network who have few links with other cited authors. Using rankings instead of absolute numbers gives us a time-invariant measure of the most influential scientists in a given year. We then compare these data with the list of all Nobel Prize winners between 1901 and 2007, as well as with the nominees for the 1901–1945 period (Crawford 2002).
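As a rough illustration of the two measures, the following Python sketch counts citations and computes a degree in the co-citation network for one year of toy data, then turns both into rankings truncated at 500. The input format (one list of cited author names per citing paper), the function names, the author names, the use of an unweighted degree and the alphabetical tie-breaking are all assumptions made for illustration; the text does not describe the actual computation at this level of detail.

```python
from collections import Counter, defaultdict
from itertools import combinations


def citation_counts(reference_lists):
    """Count how many times each author is cited across all citing papers."""
    counts = Counter()
    for cited_authors in reference_lists:
        counts.update(cited_authors)
    return counts


def cocitation_degree(reference_lists):
    """Degree of each author in the co-citation network.

    Two authors are linked when at least one paper cites both of them.
    The degree is the number of distinct authors an author is linked to.
    """
    neighbours = defaultdict(set)
    for cited_authors in reference_lists:
        unique = set(cited_authors)
        for author in unique:
            neighbours[author]  # ensure authors with no links appear with degree 0
        for a, b in combinations(sorted(unique), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(linked) for author, linked in neighbours.items()}


def top_ranked(scores, cutoff=500):
    """Return {author: rank}, rank 1 being the highest score, truncated at cutoff.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    ordered = sorted(scores, key=lambda author: (-scores[author], author))
    return {author: rank for rank, author in enumerate(ordered[:cutoff], start=1)}


# Toy data: each inner list holds the authors cited by one paper (hypothetical names).
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author A", "Author D"],
    ["Author E"],
]

citation_rank = top_ranked(citation_counts(papers))
centrality_rank = top_ranked(cocitation_degree(papers))
```

In this toy example the two rankings coincide, which mirrors the high correlation between citations and centrality described above, although real data need not agree so closely.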

From a bibliometric point of view, one would expect that Nobel Prizes are awarded to the most cited authors, or are at least chosen among the most cited, since it is taken as nearly axiomatic that citations constitute a good indicator of the recognition received by scientists and thus of their global symbolic capital in the scientific field (Cole and Cole 1973; Bourdieu 1975). Our large data sample is composed of 330 winners in physics and chemistry (1901–2007) and 1,595 nominees (including “repeat” nominees) in the same two disciplines (1901–1945). We compare these sets with the 500 most cited scientists each year in both fields. This allows us to examine in greater detail the profile of prizewinners and observe the changing characteristics of the distribution of ranks over a large time period, before as well as after their being nominated or having received the Prize. In other words, for each prizewinner, we are able to obtain his or her ranking in terms of total citations and centrality in the year the prize was awarded, as well as in the years preceding and following the event. This yields a large time interval for each author, and we can average the results over all prizewinners (or nominees), setting the year “0” as the winning (or nomination) year for each one. When someone wins twice (a rare event) or is nominated many times (a more frequent event), we can use the same citation data many times, each time with respect to the year of the Prize or nomination in question. In all cases, care must be taken to make sure the data contain minimal namesakes, not only in terms of the laureates themselves, but all prominent scientists as well, in order to ensure that the rankings are accurate. Often, in order to display the data in a meaningful manner, the rankings (from 1 to 500) are inverted. We have applied a “cutoff” at 500, beyond which it becomes very computationally time-consuming to collect data. However, given the usual distribution of citations, neglecting scientists who rank below 500 does not affect the results. In order to check this point, we have performed a similar analysis using 100 as a cutoff and, as expected, we have obtained statistically similar results.
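The averaging procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the data layout (`yearly_ranks` as a mapping from year to the ranks of the top 500 authors), the function names, the exact inversion formula (rank 1 becomes 500, unranked authors are counted as 0) and the choice to skip years for which no ranking exists are ours, not a description of the actual implementation.

```python
from collections import defaultdict

CUTOFF = 500  # ranks beyond this value are not collected, as in the text


def inverted_rank(rank, cutoff=CUTOFF):
    """Map rank 1 -> cutoff, rank cutoff -> 1, and unranked (None) -> 0."""
    if rank is None or rank > cutoff:
        return 0
    return cutoff + 1 - rank


def average_profile(events, yearly_ranks, window=10, cutoff=CUTOFF):
    """Average inverted rank at each offset t from the event year (t = 0).

    events       -- list of (author, event_year); an author may appear several
                    times (repeat nominations or a second prize)
    yearly_ranks -- {year: {author: rank}} holding only the top-`cutoff` authors
    Years with no ranking data at all are skipped rather than counted as zero.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for author, event_year in events:
        for t in range(-window, window + 1):
            ranks = yearly_ranks.get(event_year + t)
            if ranks is None:
                continue
            sums[t] += inverted_rank(ranks.get(author), cutoff)
            counts[t] += 1
    return {t: sums[t] / counts[t] for t in sorted(counts)}


# Toy data with hypothetical authors "X" and "Y", both with year 0 = 1920.
yearly_ranks = {
    1919: {"X": 40, "Y": 3},
    1920: {"X": 10, "Y": 5},
    1921: {"X": 20},  # "Y" has dropped out of the top 500
}
events = [("X", 1920), ("Y", 1920)]
profile = average_profile(events, yearly_ranks, window=1)
```

Counting unranked authors as zero is what allows the average to be computed for every prizewinner, but it also means that movements outside the top 500 are invisible to the measure.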

Nominees, laureates and the development of disciplines: 1901–1945

Using the method discussed above, Fig. 1 displays the average rank in terms of citations and centrality of Nobel laureates and nominees in chemistry and physics for the 1901–1945 period. Data pertaining to the nominees serve as a sort of reference point that allows us to better understand the specificity of winners and how receiving the award affects them. For instance, Fig. 1 shows a steep increase in the rank of physics winners in the years leading up to the prize, despite the fact that several years have usually elapsed since their “discovery”. Though we at first thought that centrality measures would provide a better indicator than citations, we observe that in fact both distributions are nearly identical. Interestingly, the peak in the nominees’ ranks occurs slightly before the year of their nomination (4 years for physics, 1 for chemistry), while for the laureates, it occurs (on average and, in the case of chemistry, within the margin of error) the same year as their award. This is a surprising result, which remains valid (although much less pronounced, as is discussed below) in later periods. It is also somewhat counter-intuitive, given that the winners are selected many years after their discovery. Only four times were physics Nobel laureates awarded prizes within a year of their discovery, and the average lag time is around 12 years (Karazija and Momkausaite 2004). In other words, this distribution suggests that while the impact of their experiment or theory might have been most important several years earlier, the Prize is awarded when their accumulated symbolic capital is highest (Bourdieu 1975).

Fig. 1 Average (a) physics and (b) chemistry centrality and citation rankings for winners and nominees, 1901–1945. The vertical dashed line indicates the year “0” when someone is nominated or wins the award. Each data point therefore represents an average of all winners’ rankings (between 1901 and 1945) at a given time interval from the year of the prize. For the sake of presentation, the rank of the top 500 most cited or most central scientists has been inverted (i.e., the most cited/central scientist has rank of 500). The smoothness of the nominees’ curves likely reflects the availability of more data points for constructing the average

The shape of the curve corresponds closely to a normal distribution with an elongated time tail. This deviation from the Gaussian for t > 0 can be interpreted as the “Halo Effect”, which is not present in the case of nominees (whose names are not made public by the Nobel committee). This effect essentially reflects the cumulative advantage process or the “Matthew effect” identified by the sociologist Robert K. Merton (1973). Being recognized via a Nobel Prize gives status to the scientist in the eyes of his or her peers, who in turn accord more credit to him or her, which then translates into more citations. However, it is interesting to note that, on average, there is no evidence of this phenomenon generating an increased ranking in the years following the attribution of the prize, but simply a slower decline, as shown in the asymmetry of the distribution. In interpreting these results, we must also stress the fact that the selection process of Nobel Prizes is not conducted in a public realm and that no details about criteria are made public. Only the end result, that is, the winners, can be compared using bibliometrics. For the nominees, we have to wait 50 years before knowing their names, a policy that explains why our data points for them end in 1945.

Another important characteristic of these distributions is that the rate at which the ranking (in terms of citations or centrality) of scientists increases before the year of the Prize (i.e., the slopes of Fig. 1 at t < 0) is much greater for winners than for nominees. Thus, the former can be characterized as “rising stars” of the scientific community. It is also interesting to note that more than about 12 years before receiving their prize, the average ranking of winners is lower than that of the nominees, since winners cannot be nominated again once they have won. If two important discoveries (according to other scientists) are nominated for the Prize 10 years thereafter, then the scientist who does not win will presumably be nominated again (if the validity of his work stands), so his t ≪ 0 ranking will generally be higher. Take, for instance, the 1925 Nobel Prize in Physics awarded to James Franck and Gustav Hertz for the well-known “Franck–Hertz experiments” (footnote 3) performed over 10 years earlier. That year, among the 16 nominees (who never went on to win the Nobel Prize), we find figures such as Arnold Sommerfeld, Friedrich Paschen, Paul Langevin and Arthur Schuster, all of whom had been extremely well-known for at least 20 years and had already been nominated on several occasions. In essence, the fact that nominees are generally repeated year after year, while winners are selected once and usually at their “peak”, ensures that nominees have a relatively high standing (bibliometric ranking) over a longer period of time.

Most of the characteristics distinguishing nominees from winners described above are common to physics and chemistry, and this suggests that, at least for the first half of the twentieth century, these disciplines had a similar internal scientific dynamic for which citations and centrality offer a useful measure. As we mentioned at the beginning of this paper, the dynamics leading to the Nobel Prize in physiology and medicine seem more complex and less endogenous; citations and centrality measures are even less useful as predictive indices.

The Nobel Prize and the postwar growth in science: 1946–2007

In order to inquire into the possible change over time in the dynamic of the scientific fields of physics and chemistry, we have divided the analysis of the distribution of citation rankings (footnote 4) into three periods (1901–1945; 1946–1970; 1971–2007). In this way, we can compare the ranks of Nobel laureates in chemistry and physics over time (Fig. 2). The first striking result is the progressive flattening of the distribution over the three periods. The rapid ascension of the winners as we approach the year of the Prize is replaced after 1970 by a nearly uniform distribution. The flatness of that curve is partly due to the fact that, since the 1980s, a majority of winners are not among the 500 most cited, so that their change of position in the ranking is not captured: they are placed at position zero whether they are 800th or 600th in the total ranking. The second time period already suggests that the concentration of activities around a core of potential winners observed during the first half of the twentieth century has rapidly changed to a situation where the community is fragmented into many small specialties. Nobelists thus become less distinguishable from the majority of top-level scientists. Although the lag between the publication of results and their ensuing Nobel Prize has been steadily increasing over the past 50 years (Karazija and Momkausaite 2004), no clear peak in citation or centrality rankings can be observed before the Prize is awarded, again because of the large proportion of winners not ranked among the most cited.

Fig. 2 (a) Physics and (b) chemistry prizewinners’ citation rankings, averaged over all years for three different periods. The vertical dashed line represents the year “0”, when the Prize is won

The most significant conclusion to draw from these distributions is that the power of bibliometric measures to predict Nobel Prizes has decreased over time and has now become greatly limited. Moreover, it is unlikely that improved techniques or approaches could entirely remedy this situation since it stems, in the first place, from the explosion in the number of authors and the consequent fragmentation of disciplines into many relatively autonomous specialties: the Nobel committee can no longer easily isolate the most influential three physicists or chemists among a host of potential candidates. Whereas during the 1900–1945 period a fairly large proportion of winners could be found among the top twenty or fifty most cited authors, winners ranked outside the top 500 become dominant after the 1970s, making the game of prediction almost futile. The average rankings in themselves, however, do not provide complete information about the selection of prizewinners; as the peaks in Fig. 3a show, the size of the disciplines is a dominant but not sufficient explanation. We have also compared the average number of co-authors of prizewinners with that of the average scientist and found no significant difference between the two, thus confirming that Nobel laureates work in collaboration as most scientists do. A finer analysis of our data is necessary in order to understand some of the other mechanisms at play in defining the post-war Nobelist’s profile.

Fig. 3 Distribution of citation rankings of (a) physics and (b) chemistry prizewinners (by decade) compiled in the 3 years leading up to their being awarded the Nobel Prize. The six classes of rankings are chosen in order to compensate for the skewed distribution. Note the overall drop in highly ranked physicists and chemists being awarded the Prize in later decades

We therefore calculate the citation (and centrality) rankings during the 3 years preceding each prize, and analyze the distribution of the winners’ ranks during each decade between 1901 and 2000. As shown in Fig. 3, the odds of predicting the winner from the pool of candidates indeed become very low as time goes on. This is consistent with the rapid growth in the number of active scientists over the period, since the probability of choosing a winner among the population is roughly inversely proportional to the number of scientists.
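A tabulation of this kind could be sketched in Python as shown below. The class boundaries, the use of the mean rank over the 3 preceding years, the treatment of unranked years and all names are assumptions chosen for illustration: the text states only that six classes were used, not where their limits lie or how the 3 years were combined.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

# Hypothetical class boundaries (inclusive upper bounds); the source does not list them.
UPPER_BOUNDS = [20, 50, 100, 200, 500]
LABELS = ["1-20", "21-50", "51-100", "101-200", "201-500", ">500"]


def rank_class(rank):
    """Return the label of the class containing rank; None means unranked."""
    if rank is None:
        return LABELS[-1]
    return LABELS[bisect_left(UPPER_BOUNDS, rank)]


def pre_prize_rank(author, prize_year, yearly_ranks, years_before=3):
    """Mean rank over the years preceding the prize, ignoring unranked years.

    Returns None when the author is unranked in all of those years.
    """
    ranks = [
        yearly_ranks[year][author]
        for year in range(prize_year - years_before, prize_year)
        if author in yearly_ranks.get(year, {})
    ]
    return sum(ranks) / len(ranks) if ranks else None


def class_shares_by_decade(winners, yearly_ranks):
    """Share of winners falling in each rank class, per decade of the award."""
    tallies = defaultdict(Counter)
    for author, prize_year in winners:
        decade = prize_year // 10 * 10
        label = rank_class(pre_prize_rank(author, prize_year, yearly_ranks))
        tallies[decade][label] += 1
    return {
        decade: {label: counter[label] / sum(counter.values()) for label in LABELS}
        for decade, counter in sorted(tallies.items())
    }


# Toy data with hypothetical authors; ranks are only known inside the top 500.
yearly_ranks = {
    1921: {"P": 12},
    1922: {"P": 8, "Q": 300},
    1923: {"P": 4},
    1991: {"R": 450},
}
winners = [("P", 1924), ("Q", 1925), ("S", 1994)]
shares = class_shares_by_decade(winners, yearly_ranks)
```

Unequal class widths are one simple way to handle a skewed distribution of ranks, since most of the variation of interest sits near the top of the ranking.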

Let us examine more closely the case of physics in order to understand the laureates’ rankings. First, we note that the highest ranks (especially the top 20) are those which vary most over the entire century. In other words, we will generally have a similar number of Nobelists ranked between, say, 200 and 500, but whether or not the “top” physicists are Nobelists is virtually impossible to predict.

While the same overall trend of increasingly lower-ranked prizewinners can be observed in both disciplines, larger fluctuations over time in the proportion of high-ranking prizewinners seem to occur in physics than in chemistry. In physics, the important fluctuations seen in Fig. 3a can be partially explained by preferences for certain areas of the discipline. Table 1 breaks down the Nobel Prizes between 1940 and 2000 according to the field [as defined by the Physics and Astronomy Classification Scheme (footnote 5) numbers (Karazija and Momkausaite 2004)] and a distinction between experimental and theoretical work. Once again, we use the rankings calculated over the 3 years before the Prize is awarded. This table reveals not only trends in the specialties rewarded with the Nobel Prize, but also marked differences in the impact of authors in various subfields (or of experimentalists vs. theorists), compared with how often they are rewarded with the Nobel Prize. Within the entire discipline, there is a clear hierarchy, at least in terms of how (and how often) work is cited: high-energy physics, density functional theory and semiconductors, for instance, occupy a relatively central position in physics, while astrophysics, optics, and the thermal and mechanical properties of condensed matter find themselves at the periphery. Similarly, theory is more central than experiment and usually ranks much higher in centrality as well as in citations. Garfield has also alluded to the presence of Nobelists from “smaller specialties”: such recipients could have high rankings within their specialties, but not in physics as a whole (Garfield and Welljams-Dorof 1992).

Table 1 Sixty years of Nobel Prizes in physics by decade, broken down according to specialty and the (primarily) theoretical or experimental nature of the scientist’s work

The Nobel Prize, however, seems less dependent on this hierarchy of fields; there appears to be a will to distribute the prizes in a more “equitable” manner between specialties. The same phenomenon can be observed when we compare theoretical and experimental physics. The notion of “discovery” explains part of the lack of prizes (relative to their impact on the discipline) handed out to theorists: in almost all cases, the Nobelists’ theories must have already been rigorously confirmed by experiment before obtaining the Prize. On several occasions, the Prize has been jointly awarded to a theorist and an experimentalist, but the rank of the former is invariably higher than that of the latter. Two decades (the 1920s and the 1960s) in which Nobel Prizes in physics seem to include relatively high (bibliometric) rankings of laureates reveal moments in time when the discipline is relatively compact and a few specialties are predominant, thus making choices seem more evident (see Fig. 3a and Table 1). The rapid development of quantum mechanics in the 1920s is obvious not only to historians of science today, but was also obvious to physicists and the Nobel selection committee at the time. Also, periods corresponding to the establishment of new central theoretical paradigms generate more theorists than usual among the prizewinners. The emergence of QED (quantum electrodynamics) just after the Second World War, and of electroweak theories and high-energy physics (under the PACS classification of “elementary particles and fields”) in the 1960s, explains the relatively high ranking of Nobel laureates during that period, as well as the relatively high number of prizes awarded to theorists. The clear dominance of prizes given for experimental results since the 1980s seems to reflect the absence of major paradigm shifts in physics since the end of the 1960s. A similar type of analysis could probably explain the fluctuations in the chemistry prizewinners’ curve, although each discipline has a different culture and internal dynamics.

In order to understand both the hierarchy of specialties and the fragmentation of the discipline, it is instructive to look at some specific, yet representative, examples. Take, for instance, the prizes awarded in 2001 and 2002, where the recipients were ranked extremely low in bibliometric terms over the entire field. In 2001, the award went to Ketterle, Cornell and Wieman for their (primarily experimental) work on Bose–Einstein condensates only 6 years earlier. This is not to say that their experiments went unnoticed or were seen as insignificant by other physicists, only that the work firmly established itself first within a niche of condensed matter experimentalists working to create Bose–Einstein condensates. In 2002, we find a similar case of primarily experimental astrophysics work (footnote 6) being rewarded without ever having been, according to the physics community as a whole, central to the discipline. In both cases, the selection of laureates was not done with respect to the discipline as a whole, but rather with respect to the impact within a given specialty.

Conclusion

Our comprehensive bibliometric analysis of the profile of Nobel Prize winners in chemistry and physics from 1901 to 2007 showed that the changing dynamic of science, due to its rapid increase in size since at least the 1960s, had the effect of diluting potential winners in a massive group of central scientists, from which it has become nearly impossible to pick only three winners using bibliometric tools. We have shown that rankings based on citations and rankings obtained from Freeman’s (1978/1979) degree centrality in the co-citation network give the same results when used as indicators. While our results do not preclude the development of improved bibliometric tools, they help us understand some of the limits of bibliometrics for predicting Nobel Prize winners. In addition, the comparison of bibliometric data with the classification of winners by specialty and type of work (theoretical and experimental) allows us to gain insight into the hierarchies that exist within scientific disciplines. We have also found that the distribution of rankings peaks at around the time the Prize is awarded and that a Halo Effect is consistently observed in the years following the attribution of the Prize. Furthermore, by benchmarking the laureates against other nominees we have measured the “bibliometric” difference between the two and shown that winners are systematically higher ranking scientists than nominees at the time of the prize.

Changes in the size and organization of the two fields result in a rapid decline of the predictive power of bibliometric data over the century as the winners are distributed over a larger spectrum of rankings than at the beginning of the twentieth century. This can be explained not only by the growing size and fragmentation of the two disciplines, but also, at least in the case of physics, by an implicit hierarchy in the most legitimate topics within the discipline. Further research could show whether or not normalizing the rankings within given specialties would increase the odds of picking winners. Even then, the large number of such specialties, which are not always easy to define, and the existing limit of three winners for the discipline, mean that the predictive power of bibliometrics will inevitably stay very low. The task of the Nobel Committee is thus made harder by the variety of specialties and the accompanying embarrassment of riches which forces it to take into account the politics of the discipline when choosing which specialty will be crowned in a given year.