I have long worshiped the eponym as one of the last vestiges of humanism remaining in an increasingly numeralized and computerized society.

(Robertson 1972)

Introduction

In his thought-provoking essay on eponymy, Garfield (1983, p. 393) stressed that “Eponyms remind us that science and scholarship are the work of dedicated people.” Four decades earlier, Merton (1942, p. 121) had acknowledged the role of this “mnemonic and commemorative device” in the social structure of science. He further defined it as “the practice of affixing the name of the scientist to all or part of what he has found, as with the Copernican system, Hooke’s law, Planck’s constant, or Halley’s comet” (Merton 1957, p. 643). Since then, the topic of eponymy has been extensively discussed, and it has been gaining increasing attention in recent years (Fig. 1). Although a comprehensive review of the literature falls beyond the scope of this article, let us outline a few outstanding contributions before considering the following issue: How can eponyms be systematically extracted from full-text articles?

Fig. 1

The number of articles published each year about eponyms has grown steadily, with a sharper increase during the last decade. These 743 records were retrieved from the Web of Knowledge on April 1st, 2013 by searching for “eponym*” in the Title field. Note that this figure underestimates the literature on eponyms, as articles pre-dating 1948 or lacking this word in their titles were not retrieved (e.g., Merton 1942)

Merton (1942, 1957) highlighted the prominent role of eponymy in the reward system of science. From the perspective of the history of science, Beaver (1976) studied the rate of eponymic growth and discussed a puzzling observation: although the number of scientists grew exponentially during the twentieth century, the practice of eponymy remained constant over time. Garfield (1983) commented on several features of eponymy, such as its twofold definition, Footnote 1 the various purposes of eponyms, and their debated use—especially in medicine. From a psychological and historical perspective, Simonton (1984) discussed the relation between eponymy and ruler eminence in a study of European hereditary monarchs. Further scientometric studies investigated the development of eponymy (Thomas 1992) and its relation to research evaluation (Száva-Kováts 1994) through non-indexed eponymal citedness, that is, the use of an eponym without any formal bibliographical reference to the original work.

Several research articles have discussed the history and development of specific eponyms in virtually all fields of science. Some tackle a single eponym, such as ‘Shpol’skii fluorimetry’ in analytical chemistry (Braun and Klein 1992), ‘Southern blotting’ in molecular biology (Thomas 1992), the ‘Nash Equilibrium’ in mathematics (McCain 2011), and the ‘Henry V sign’ in medicine (Shanahan et al 2013). In scientometrics, Braun et al (2010) discussed two eponyms (i.e., Garfield’s law of concentration and Garfield’s constant) coined by and after E. Garfield in the festschrift dedicated to his 85th birthday. Whilst most studies dealt with eponyms based on person names, McCain (2012) studied the use of ‘evolutionarily stable strategies’ as a non-eponymous case of obliteration by incorporation. Footnote 2 On a different note, various eponyms are known for failing to acknowledge the actual discoverers (Stigler 1980). In some extreme cases, the scientific community has even called for eponym retraction. For instance, Wallace and Weisman (2000) argued the case against the use of ‘Reiter’s syndrome,’ which honours a war criminal; eventually, Panush et al (2007) recommended its retraction and suggested the use of ‘reactive arthritis’ instead.

In various fields, editorials and reviews have discussed a few prominent eponyms, for example in chemistry (Cintas 2004) and in forensic pathology (Nečas and Hejna 2012). There is even a study with a narrower scope, addressing eponyms in the field of education named after Spanish people (Fernández-Cano and Fernández-Guerrero 2003). There are also several dictionaries of eponyms of general interest (e.g., Ruffner 1977; Freeman 1997), as well as specialized ones (e.g., Zusne 1987; Trahair 1994).

A longer-standing debate concerns the merits and flaws of eponyms in medicine, where they are widely used (e.g., see Boring 1964; Robertson 1972; Kay 1973). The climax of this debate is illustrated by the double-page spread in the British Medical Journal that pitted a supporter (Whitworth 2007) against two opponents (Woywodt and Matteson 2007) of eponyms head-to-head. The matter does not seem to be settled yet, though.

In the early days of the Science Citation Index, Garfield (1965, p. 189) reflected on the feasibility of inferring implicit references present in documents, evoking the case of an “eponymic concept or term.” It turns out that the impact of a research contribution is underestimated when considering explicit references only (Garfield 1973). Száva-Kováts (1994) notably raised this point to dispute the views of Cole and Cole (1972), who refuted an anti-elitist theory that they dubbed the Ortega Hypothesis. Footnote 3 Száva-Kováts (1994, p. 60) concluded that “the data of citedness based on citation indexes are quite inadequate to indicate the real measure of actual citedness of scientists in scientific articles.”

Some researchers have attempted to extract and quantify every eponym used in various fields of science. Several manual methods have been devised, targeting various materials. For instance, Diodato (1984) looked at the titles of articles in psychology and mathematics. Braun and Pálos (1989, 1990) perused the subject indexes of chemistry textbooks. Footnote 4 Besides inspecting various dictionaries and source books in psychology (Roeckelein 1995), Roeckelein (1972, 1974, 1995) reported a series of line-by-line content analyses of introductory psychology textbooks. These daunting tasks were performed with the help of an army of student volunteers, who tediously marked the textbooks “without knowing why,” as acknowledged in the footnotes of (Roeckelein 1972, 1995).

The present article is concerned with this latter line of research, as we tackle the following question: How can we improve eponym extraction and quantification from full-text articles? We argue that modern computing capabilities can provide an affordable, fast, reliable, and replicable method for extracting and quantifying eponyms in scientific articles.

Method

We designed a semi-automatic text mining approach to extract eponyms from any collection of documents, such as the articles published in an academic journal. The proposed approach relies on the following two steps. First, each document is processed by a computer program using regular expressions to detect candidate eponyms in the text (e.g., Bradford’s bibliographical law). Second, these candidate eponyms are manually validated and labelled with the underlying person’s name (e.g., Bradford). Finally, a list of person names is produced, ranked by frequency of appearance. We detail these two steps in the following sections.

Step 1: automatic extraction of candidate eponyms with regular expressions

Building on a standard text mining technique, we rely on regular expressions to identify eponyms in texts. A regular expression defines a pattern of text that is to be matched in a given document. For instance, the pattern \([{\texttt{A-Z}}]\;[{\texttt{A-Za-z}}]+{\texttt{ian}}\) matches any character string that starts with a capital letter (\([{\texttt{A-Z}}]\)) followed by at least one letter (\([{\texttt{A-Za-z}}]+\)) and ends with the three letters \({\texttt{ian}}\). As a result, this pattern matches the word “Mertonian” in the title “What is Mertonian sociology of science?” of (Hargens 2004), for instance. The interested reader is referred to (Friedl 2006) for a thorough coverage of regular expressions.
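
For illustration, the following minimal Python sketch shows how such a pattern behaves on the example above; it is not the program of Listing 1, and the word-boundary anchors are an assumption added here for clarity.

  import re

  # Simplified adjectival pattern: a capital letter, one or more letters, and the suffix 'ian'.
  pattern = re.compile(r"\b[A-Z][A-Za-z]+ian\b")

  title = "What is Mertonian sociology of science?"
  print(pattern.findall(title))  # prints: ['Mertonian']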

In Step 1, each document is processed by the computer program provided in Listing 1 (see Appendix) in search of eponyms. This program parses textual contents with the regular expression that is illustrated in Fig. 2. Here we tackle two kinds of eponyms:

  • Adjectival eponyms are matched in Part 1 of the regular expression. Such eponyms include, for example, Hippocratic medicine, Aristotelian logic, Euclidean geometry, Boolean algebra, and Keynesian economics. This list, found in (Merton 1957, p. 643), is certainly not comprehensive, but we used it as a cue to match the suffixes -ean, -ian, and -tic in Part 1. Still, this list of suffixes can easily be tailored in Listing 1.

  • Nominal eponyms appear in various expressions combining the name(s) of person(s), bibliographic references, and the target of the eponym (e.g., a law, a distribution). Here are some examples of eponyms matched by our regular expression:

    • ‘Bradford’s Law’ is matched by Parts 2 and 4.

    • ‘[The] Hirsch h-index’ is matched by Parts 2 and 4.

    • ‘Vinkler’s (2010a, 2013) πv-index’ is matched by Parts 2–4.

Fig. 2

Syntax diagram of the regular expression used in Listing 1 to extract eponyms from text. The upper sub-expression (i.e., Part 1) matches adjectival eponyms (e.g., ‘Mertonian’), whilst the lower sub-expression (i.e., Parts 2–4) matches nominal eponyms, such as ‘Vinkler’s (2010a, 2013) πv-index.’ This diagram was produced by http://www.regexper.com

Eponyms are known to appear in both possessive and non-possessive forms (e.g., ‘Lotka’s law’ versus ‘Lotka law’). Footnote 5 The regular expression in Part 2 matches both.

Note that Parts 1–2 match capitalised eponyms only. Nonetheless, Garfield (1983, p. 384) noted that many eponyms listed in (Ruffner 1977) are no longer capitalised once absorbed into everyday language (e.g., diesel engine, saxophone). Here we rely on capital letters as a clue to eponymy, at the expense of such absorbed, non-capitalised eponyms. In addition, the proposed regular expression accommodates the author-date referencing style, as recommended by the American Psychological Association (APA 2010, Chap. 6), amongst others. It also handles less complex referencing styles, such as numeric referencing (e.g., ‘Vinkler’s πv-index [3, 6]’). Finally, the set of target words used in Part 4 of the regular expression (e.g., law, index, distribution, coefficient) should be reviewed and tailored to the domain under study.
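
To make the above concrete, here is a rough Python approximation of the regular expression of Fig. 2. It is only a sketch, the authoritative version remaining Listing 1, and the year pattern, suffix list, and target words shown are assumptions to be tailored as just discussed.

  import re

  # Rough approximation of the regular expression of Fig. 2 (a sketch, not Listing 1).
  # Part 1 matches adjectival eponyms (suffixes -ean, -ian, -tic); Parts 2-4 match a
  # capitalised name (possessive or not), an optional author-date reference, and a target word.
  EPONYM = re.compile(r"""
      \b[A-Z][A-Za-z]+(?:ean|ian|tic)\b                                  # Part 1
    | \b[A-Z][A-Za-z]+(?:'s)?                                            # Part 2
      (?:\s*\([12][0-9]{3}[a-z]?(?:,\s*[12][0-9]{3}[a-z]?)*\))?          # Part 3
      \s+\w*[-\s]?(?:[Ll]aw|[Ii]ndex|[Dd]istribution|[Cc]oefficient)\b   # Part 4
  """, re.VERBOSE)

  text = "We compare Bradford's Law with the Hirsch h-index and Lotka's (1926) law."
  print(EPONYM.findall(text))
  # prints the three matches: Bradford's Law, Hirsch h-index, Lotka's (1926) law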

The outcome of Step 1 is a list of candidate eponyms, each weighted by the number of processed documents in which it appears. Let us stress that an eponym found several times in a given document contributes only one point to its weight. There is thus no over-representation of an eponym merely because it was used many times in a few papers. The weight of an eponym in the result list reflects how widely it is used, and hence accepted, in the considered research community.
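
This per-document weighting can be sketched as follows in Python, assuming a list of plain-text documents and a compiled pattern such as the one sketched above (the function and variable names are illustrative):

  from collections import Counter

  def weigh_candidates(documents, pattern):
      """Count, for each candidate eponym, the number of documents it appears in."""
      weights = Counter()
      for text in documents:
          # The set retains a single occurrence per document, however often
          # the eponym is repeated within that document.
          weights.update(set(pattern.findall(text)))
      return weights

  # Hypothetical usage: weights = weigh_candidates(plain_text_articles, EPONYM)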

Step 2: manual validation and labelling of eponyms

The proposed regular expression is designed to match capitalised eponyms. Unfortunately, other non-eponymic expressions are matched too (e.g., ‘Average h index’). Such ‘false positive’ expressions have to be identified and discarded manually.

In Step 2, each eponymic expression extracted during Step 1 is manually assessed. If the expression does not correspond to an eponym, it is dropped. Otherwise, it is labelled with the target person’s name. The human assessor may use any available dictionary of eponyms (e.g., Ruffner 1977; Freeman 1997) in conjunction with online materials for this task. Finally, all eponymic expressions associated with the same person’s name are grouped and their weights are summed. For instance, ‘Hirschian’ (found in 5 papers) and ‘Hirsch’s h-index’ (found in 45 papers) are grouped under the ‘Hirsch’ label with weight 50.
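
A minimal Python sketch of this grouping step, assuming the outcome of the manual assessment is recorded as a mapping from each validated expression to a person label (the figures below merely reproduce the example above):

  from collections import Counter

  # Hypothetical outcome of the manual validation: validated expression -> person label.
  labels = {"Hirschian": "Hirsch", "Hirsch's h-index": "Hirsch"}

  # Weights obtained in Step 1 (number of papers featuring each candidate expression).
  weights = {"Hirschian": 5, "Hirsch's h-index": 45, "Average h index": 12}

  def group_by_person(weights, labels):
      """Sum the weights of all validated expressions sharing the same person label."""
      per_person = Counter()
      for expression, weight in weights.items():
          if expression in labels:  # expressions without a label were rejected in Step 2
              per_person[labels[expression]] += weight
      return per_person

  print(group_by_person(weights, labels))  # Counter({'Hirsch': 50})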

Data

As a case study, the method was applied to the journal Scientometrics. All available full-text articles were considered: the 41 issues numbered 82(1) to 95(2), published from 2010 to 2013, comprising 821 articles. These were downloaded in HTML format from SpringerLink. The formatting markup was then dropped by stripping out HTML tags. This data cleaning process resulted in one plain-text file for each of the 821 articles.
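
One way to perform such tag stripping is sketched below with Python’s standard html.parser module; this is an illustrative assumption, not necessarily the tooling used in the study.

  from html.parser import HTMLParser

  class TextExtractor(HTMLParser):
      """Collect the textual content of an HTML page, discarding all markup."""
      def __init__(self):
          super().__init__()
          self.chunks = []

      def handle_data(self, data):
          self.chunks.append(data)

      def text(self):
          return " ".join(self.chunks)

  def strip_tags(html):
      parser = TextExtractor()
      parser.feed(html)
      return parser.text()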

Results

The 821 full-text files were processed by the computer program shown in Listing 1. Step 1 resulted in 3,457 candidate eponyms (see Online Supplementary Material). In Step 2, only 493 of these candidate eponyms passed manual validation. In total, these validated eponyms targeted 226 distinct person names; Fig. 3 shows the most frequent names cited in the eponyms of the processed articles. A graphical view of this Hall of Fame is shown in Fig. 4, where name sizes are proportional to their frequencies as reported in Fig. 3. To the best of my knowledge, this is the first attempt at semi-automatic eponym extraction and quantification from full-text articles.

Fig. 3

The 58 most frequent person names cited in 821 Scientometrics articles published in 2010–2013. The distribution of decreasing name occurrences fits a power law (\(R^2 = 0.9577\))

Fig. 4

Hall of Fame of the eponymised persons extracted from 821 Scientometrics articles published in 2010–2013 (see Fig. 3). This word cloud was produced by http://www.wordle.net

The most frequently eponymised person in the corpus is Jorge E. Hirsch, professor of physics at the University of California, San Diego. He is acknowledged for inventing the h-index (Hirsch, 2005), which gauges the impact of an author’s research according to his/her number of publications and citation rate. The h-index soon attracted a great deal of attention in the scientometrics community, as dozens of articles have discussed it and proposed extensions to it (Schreiber et al 2012). In this study, 15% of the 821 articles featured an eponym such as ‘Hirsch index,’ ‘Hirsch’s h-index,’ or the adjective ‘Hirschian.’ Here we need to bear in mind the following two empirical observations about eponyms:

“First, names are not given to scientific discoveries by historians of science or even by individual scientists, but by the community of practicing scientists (most of whom have no special historical expertise). Second, names are rarely given, and never generally accepted unless the namer (or accepter of the name) is remote in time or place (or both) from the scientist being honored.” (Stigler 1980, p. 148)

Hirsch’s (2005) article was published on November 15, 2005 in PNAS, but a preprint Footnote 6 was already available online as of August 3, 2005. Hirsch does not appear to have coined the eponym ‘Hirsch index’ himself, since the only occurrence of his name appears in the article’s byline. Shortly after the preprint was posted on arXiv, Ball’s (2005) Nature article of August 18 publicized the h-index. After briefly introducing Hirsch and his invention, the second paragraph of that article starts with “His ‘h-index’ depends on,” without introducing any eponymic version, though. One of the first uses of an eponym in reference to the h-index (i.e., ‘Hirsch-type index’) seems to be due to Braun et al’s (2005) paper dated November 21, 2005. This eponym spread like wildfire in Scientometrics, starting with van Raan’s (2006) article received on December 1, 2005, and others published from 2006 onwards (e.g., Egghe and Rousseau 2006; Banks 2006; Braun et al 2006)—note the presence of the eponym in the title of these articles!

Still regarding the Hirsch index, let us go back to Stigler’s (1980, p. 148) aforementioned observations. The first one obviously applies, since leading scientometricians introduced and publicized the use of this eponym. The second observation, however, does not seem to fit here: only one week Footnote 7 separated the PNAS publication from the first occurrence of the eponym ‘Hirsch-type index.’ Does this mean that Hirsch’s (2005) h-index was so extraordinary that it challenged and defeated the venerable mechanics of eponymy? Or are we facing a case of self-suggested eponymy? Had Hirsch picked a letter other than h (which incidentally happens to be his initial), would we now be commenting on the ubiquity of eponyms named after him? Let us leave this question to historians of science.

A number of influential scholars are honoured by frequently used eponyms in Fig. 3. Some acknowledge the discoverers of famous laws and distributions that are of particular interest to scientometrics, such as Bradford, Gauss, Lotka, Pareto, Poisson, and Zipf (see e.g., Bar-Ilan 2008; De Bellis 2009). Many other eponyms praise the inventors of statistical procedures and tests, Footnote 8 such as Fisher, Gini, Kolmogorov–Smirnov, Kruskal–Wallis, Mann–Whitney, Pearson, Spearman, and Student (see e.g., Kotz et al 2005). Prominent scholars who published in Scientometrics were also eponymised: de Solla Price, Garfield, and Merton, all three having served the journal as early gatekeepers (Beck et al 1978; Braun 2004). We also note the emergence of eponyms coined after three ‘younger’ scientometricians sitting on the board of the journal: Egghe, Glänzel, and Schubert.

Finally, the perspicacious reader might have noted that St. Matthew ranks highly in the list (fifth). This is because he stars in the ‘Matthew effect’ eponym coined by Merton (1968a). It is well established, however, that “St. Matthew did not discover the Matthew Effect” (Stigler 1980, p. 148), thus illustrating Stigler’s purposely ironic Law of Eponymy: “No scientific discovery is named after its original discoverer.” Footnote 9

Discussion

The main issue to discuss here is how well this methodology performs in terms of two standard criteria used in the Information Retrieval (IR) community: efficiency and effectiveness (see Kelly 2009, pp. 116–119).

Efficiency relates to the performance of an IR system in minimizing execution time and storage space. The computer program involved in Step 1 processed the 821 articles in about 30 seconds on a regular 2011 laptop. In contrast, Step 2 took longer, as it required the manual validation of the 3,457 candidate eponyms. Note that there are affordable ways to reduce the duration of this task. For instance, crowdsourcing would allow this manual validation task to be distributed (with some redundancy to address quality concerns) to several people working for a small fee. This approach has proved effective in IR, where so-called ‘workers’ assess the relevance of documents with respect to a given textual query (see Lease and Yilmaz 2013).

Effectiveness relates to the performance of an IR system in producing quality results. Here, two measures are usually considered. On the one hand, precision measures the extent to which the retrieved eponyms are relevant. The method yields 100% precision, as candidate eponyms are manually validated. On the other hand, recall measures the extent to which all relevant eponyms (in the whole corpus of articles) are retrieved. Recall could not be measured in this case study because the set of all eponyms actually used is unknown; establishing it would require the manual labelling of all the eponyms in the 821 articles. The creation of such a ‘ground truth’ for benchmarking eponym extraction approaches is left to future work.
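
In the usual notation, writing \(E\) for the set of extracted and validated eponyms and \(R\) for the set of relevant eponyms actually used in the corpus, these two measures read \(\text{precision} = |E \cap R| / |E|\) and \(\text{recall} = |E \cap R| / |R|\). Since every retained eponym is manually validated, \(E \subseteq R\) and precision equals 1, whereas recall cannot be computed without knowing \(R\).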

All in all, the proposed method for eponym extraction and quantification yields precise results but does not guarantee completeness. It should be stressed here that the regular expression used in Listing 1 (especially Part 4, shown in Fig. 2) should be tailored to the scientific domain under study. For instance, it should be extended to match eponyms such as ‘Brownian motion’ and ‘Schrödinger equation’ in physics.
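
Under the assumptions of the Python sketch given earlier, such tailoring amounts to extending the alternation of target words in Part 4, for instance:

  # Hypothetical Part 4 target words extended for a physics corpus; note that matching
  # 'Schrödinger' itself would also require widening the name pattern beyond [A-Za-z]
  # to accented letters.
  PART4_TARGETS = r"(?:[Ll]aw|[Ii]ndex|[Dd]istribution|[Cc]oefficient|[Ee]quation|[Mm]otion|[Cc]onstant)"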

Another point worth discussing relates to non-indexed eponymal citedness, as commented on by Száva-Kováts (1994, pp. 68–69):

“this phenomenon is a very frequent and long-standing feature in the journal literature of physics, with a permanent and growing importance. It demonstrates that not ‘a handful’ of the most eminent scientists who are creating paradigms of science, but roughly 2,000 eponymous scientists have the chance to be mentioned in the text of recent articles on physics with their scientific achievements in eponymal form, that is, without any formal bibliographical reference. Hence, this mass of scientific people faces the possibility of losing indexed citations this way. It points out that the stock of non-indexed eponymal citations amounts to a third of the indexed ones at both ends of a 30-year period (1939–1969) representing two ages in the history of science, in two representative source journals of physics.”

It must be acknowledged here that the present method extracts eponyms from texts without considering whether they are also indexed in the reference sections. As it stands, it thus provides an optimistic approximation of non-indexed eponymal citedness.

Finally, the present method overlooks two subtle manifestations of acknowledgement in the same vein as eponyms. On the one hand, some units of measurement are of eponymic nature per se (e.g., 1 N for one newton, 2 J for two joules). On the other hand, there are papers featuring the name of a scientist as a keyword, such as the mathematicians Galton and Pearson in (Stigler 1989), or the (younger) psychologist Hartley in (Zhang and Liu 2011). It might be worth devising a further text mining approach to unveil such implicit citations of an author’s œuvre in keywords, but this must be left to future work.

Conclusion

In his take on the reward system of science, Merton (1957, p. 642) suggested that “heading the list of immensely varied forms of recognition long in use is eponymy.” Prior work reported results of painstaking manual extractions of eponyms from various materials (Diodato 1984; Braun and Pálos 1989, 1990; Roeckelein 1995). These authors operated on the titles of research articles, as well as on the subject indexes appearing in textbooks.

In the present study, a semi-automatic text mining approach was introduced to extract and quantify eponyms from full-text articles. The method relies on a computer program processing text with regular expressions (Listing 1), followed by manual validation of the candidate eponyms found. This approach was tested on a corpus of 821 articles published in Scientometrics from 2010 to 2013. The findings highlight the distribution of eponyms named after prominent scientists in the fields of mathematics and scientometrics. To the best of my knowledge, this is the first attempt at eponym quantification from full-text articles.

Such a text mining approach may be applied to unveil the most prominently eponymised scientists in any field of science. The results may also contribute to spotting new research trends (e.g., the h-bubble coined by Rousseau et al (2013)) and to updating existing dictionaries of eponyms (e.g., Ruffner 1977). The only requirement is that full-text articles be available. Footnote 10 The method may well serve as an umpteenth illustration of the potential of text mining for understanding the development of science (Nature 2012; Van Noorden 2012).