Introduction

Garfield (1955), in his classic paper “citation indexes for science: a new dimension in documentation through association of ideas”, may be said to have launched contemporary scientometrics. Since 1964 the Science Citation Index (SCI) has been a leading tool in indexing (Garfield 1964). The SCI is a valuable sociometric tool for historians and sociologists (Garfield 1970) and a unique and necessary tool for scientific work (Malin 1968). The SCI presents data on what has been published and on the citations associated with those publications. It has become one of the most widely and frequently used databases for searching literature and evaluating research performance. Basically, the SCI offers two main sources of information: “what has been published?” and “what are the relationships among these publications?” The citation index of an article is not a direct measure of its quality or importance; it is a measure of recognition that may suggest its visibility or impact on the scientific community (Furlan and Fehlings 2006). In earlier years, the SCI was applied to the analysis of the most cited publications in the life sciences (Garfield 1989) and the physical sciences (Brush 1990), and of the most cited Soviet papers (Garfield 1990).

For example, in the last decade the most cited articles have been identified in medical research fields including orthopaedic surgery (Lefaivre et al. 2011), ophthalmology (Ohba and Nakao 2010), critical care medicine (Rosenberg et al. 2010), urology (Hennessey et al. 2009), pediatric surgical research (Celayir et al. 2008), occupational medicine (Gehanno et al. 2007), periodontology (Nieri et al. 2007), traumatic spinal cord injury (Furlan and Fehlings 2006), and trauma (Ollerton and Sugrue 2005).

General conclusions about publishing behaviour include the observations that the top-cited papers were published in high impact factor journals and that North America was most active in producing top-cited papers. Some researchers, for instance Celayir et al. (2008), Rosenberg et al. (2010), and Ohba and Nakao (2010), investigated the research focus of the top-cited papers. Furthermore, it was found that Nobelists are consistently highly cited, while only a small percentage of most-cited authors win the prize. It would be expected that a large percentage of the latter are elected to national academies of science (Garfield and Welljamsdorof 1992).

This corresponds with the observation that equal credit is not given to all contributors of a publication. At the individual level, a non-alphabetical name order sends a clear signal to the market that the author who is listed first has actually contributed more (Engers et al. 1999). The first author is the person who contributed most to the work and writing of the article (Gaeta 1999). It has also been mentioned in guidelines on authorship of medical papers that the first author should have made major contributions in conception of the work represented by the article, design of the work, analysis and interpretation of data or other evidence presented in the article as well as drafting the article or revising it for critically important content (Huth 1986). The corresponding author is perceived as the author contributing significantly to the article independently of the author position (Mattsson et al. 2011). The corresponding author supervised the planning and execution of the study and the writing of the paper (Burman 1982). At the country or institutional level, the country or institution of the corresponding author might be a home base of a study, or origin of the paper.

In this study, all journal articles with at least 1,000 total citations from publication to 2010 were selected as top-cited research works and analyzed with regard to citation histories, total citations, citations in 2010, journals, and Web of Science categories. Top-cited publications with author address information were further analyzed with a new indicator, the Y-index, which was developed and used to evaluate the contributions of individual authors, institutions, and countries.

Methodology

The information on documents used in this study is based on the Science Citation Index Expanded (SCI-Expanded) database of the Thomson Reuters Web of Science. According to the Journal Citation Reports (JCR) of 2010, SCI-Expanded indexes 8,073 journals with citation references across 174 scientific disciplines in the Science Edition. The journal index of the Web of Science was last updated on 29 February 2012. All papers published in the last 10 years of the twentieth century and the first 10 years of the twenty-first century (1991 to 2010) were collected. The yearly citation frequencies of each top-cited paper were collected from its publication to 2010. The number of citations a paper received in the most recent year (2010) was recorded as C2010, and the total number of citations of an article from its publication to 2010 was recorded as TC2010 (Wang et al. 2011; Chuang et al. 2011). The articles with TC2010 ≥ 1,000 were selected as top-cited articles. The records were downloaded into spreadsheet software, and additional coding was manually performed to obtain the frequency distributions and percentages. Articles originating from England, Scotland, Northern Ireland, and Wales were reclassified as being from the United Kingdom (UK). USSR and Russia were reclassified as being from Russia. Czechoslovakia and Czech Republic were reclassified as being from Czech Republic. Yugoslavia and Croatia were reclassified as being from Croatia. Similarly, articles from Hong Kong published before 1997 were included in the China category. Collaboration type was determined by the addresses of the authors. The term “country independent article” was assigned if the researchers’ addresses were all from the same country. The term “internationally collaborative article” was assigned to those articles that were coauthored by researchers from multiple countries (Chiu and Ho 2005). The term “institution independent article” was assigned if the researchers’ addresses were all from the same institution. The term “inter-institutionally collaborative article” was assigned if the authors were from different institutions (Li and Ho 2008). The impact factor of a journal was determined for each document as reported in the JCR 2010.
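The reclassification and collaboration rules described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the spreadsheet coding actually used in the study: the function names, the lookup table, and the representation of an article as a list of (institution, country) pairs are assumptions, and the Hong Kong rule, which depends on publication year, is left out.

```python
# Hypothetical lookup built from the reclassification rules stated in the text.
COUNTRY_ALIASES = {
    "England": "UK",
    "Scotland": "UK",
    "Northern Ireland": "UK",
    "Wales": "UK",
    "USSR": "Russia",
    "Czechoslovakia": "Czech Republic",
    "Yugoslavia": "Croatia",
}


def normalize_country(name):
    """Map a raw address country to the category used in the analysis."""
    return COUNTRY_ALIASES.get(name, name)


def collaboration_types(addresses):
    """Classify one article from its (institution, country) address pairs.

    Returns a (country_level, institution_level) pair of labels.
    """
    if not addresses:
        raise ValueError("an article needs at least one author address")
    countries = {normalize_country(country) for _, country in addresses}
    institutions = {institution for institution, _ in addresses}
    country_level = (
        "country independent"
        if len(countries) == 1
        else "internationally collaborative"
    )
    institution_level = (
        "institution independent"
        if len(institutions) == 1
        else "inter-institutionally collaborative"
    )
    return country_level, institution_level


# Two UK institutions: one country after reclassification, two institutions.
example = [("University of Oxford", "England"),
           ("University of Edinburgh", "Scotland")]
print(collaboration_types(example))
# -> ('country independent', 'inter-institutionally collaborative')
```

Normalizing the country before counting distinct values is what keeps an England and Scotland collaboration from being counted as international.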

Y-index

It has been accepted that the most important author positions are the first and the last, the latter very often being the corresponding author (Zuckerman 1968; Costas and Bordons 2011). The first author contributed most to the work, including conducting the research and writing the manuscript (Herbertz and Müller-Hill 1995; Riesenberg and Lundberg 1990). It was also noticed that the corresponding author supervised the planning and execution of the study and the writing of the paper (Burman 1982). It has been found that an increased number of authors on a paper is more likely to precipitate various unethical authorship practices, including gift authorship (Slone 1996; Dotson and Slaughter 2011). Gift or honorary authorship is defined as the inclusion as an author of an individual who has not contributed adequately to the project (Bennett and Taylor 2003; Singh 2009). However, honorary authorship is still regarded as a minor digression, although honorary or gift authorship is unacceptable in the Lancet (2008). In this study, the Y-index is based on the two important positions, namely the number of first author publications (FP) and the number of corresponding author publications (RP). In general, only one parameter has been included in such indexes, for example the h-index (Hirsch 2005), g-index (Egghe 2006), A-index (Jin 2006), R-index (Jin et al. 2007), and AR-index (Jin et al. 2007). The construction of the Y-index with two parameters (j, θ) is an attempt to assess both the publication quantity and the character of contribution in a single index. The Y-index is defined as:

$$ j = \sqrt{\text{FP}^{2} + \text{RP}^{2}} \tag{1} $$
$$ \theta = \tan^{-1}\left( \frac{\text{RP}}{\text{FP}} \right) \tag{2} $$

j indicates publication quantity, counting only articles in the important author positions (first and corresponding author). It is calculated from the numbers of first-authored and corresponding-authored articles using Eq. (1). A larger j places one’s Y-index farther from the origin of the polar coordinates, which means that one published more articles as an “important author”. To fix where the Y-index lies in the polar coordinates, a second parameter, θ, is necessary. θ is a publication character constant that differentiates the nature of the leadership role. It describes the distribution between the numbers of first-authored and corresponding-authored articles and is calculated using Eq. (2). When the two numbers are the same, the Y-index is located on the 45 degree (0.7854 rad) line. θ > 0.7854 means that one published more corresponding author papers, and θ < 0.7854 means that one published more first author papers. When θ = 0 (RP = 0), j equals the number of first author papers. When FP = 0, the ratio RP/FP is unbounded and θ reaches its limiting value of π/2 (1.5708 rad); this case is written as θ = ∞ in this study, and j then equals the number of corresponding author papers.
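Equations (1) and (2) translate directly into code. The following is a minimal sketch under stated assumptions: the function names are illustrative, and `math.atan2` is used in place of a plain arctangent of RP/FP so that the FP = 0 case returns π/2 rather than dividing by zero (the case this study writes as θ = ∞).

```python
import math


def y_index(fp, rp):
    """Return (j, theta) for FP first author and RP corresponding author papers."""
    if fp < 0 or rp < 0 or (fp == 0 and rp == 0):
        raise ValueError("need non-negative counts and at least one paper")
    j = math.hypot(fp, rp)      # Eq. (1): sqrt(FP^2 + RP^2)
    theta = math.atan2(rp, fp)  # Eq. (2): arctan(RP / FP), defined for FP = 0
    return j, theta


def leadership_character(theta, tol=1e-9):
    """Interpret theta relative to the 45 degree (pi/4 = 0.7854 rad) line."""
    quarter = math.pi / 4
    if abs(theta - quarter) <= tol:
        return "equal first and corresponding author papers"
    if theta > quarter:
        return "more corresponding author papers"
    return "more first author papers"


# Eight papers, all as both first and corresponding author.
j, theta = y_index(8, 8)
print(round(j, 2), round(theta, 4))   # -> 11.31 0.7854
print(leadership_character(theta))    # -> equal first and corresponding author papers
```

For non-negative counts with FP > 0, `atan2(rp, fp)` gives the same value as the arctangent of RP/FP in Eq. (2), so the sketch changes only how the FP = 0 boundary is represented.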

In the SCI-Expanded database, the corresponding author is labeled as the reprint author. In our study this person is referred to as the corresponding author. In a single author article where authorship is not specified, the author is classified as the first author and the corresponding author. The Y-index was calculated and was applied to evaluate country, institution, and individual author publication characters. In total 3,022 documents with both first and corresponding authors were analyzed using the Y-index.
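The counting rules in this paragraph can be illustrated with a short sketch that tallies FP and RP per name. The record layout (a dict with an ordered `authors` list and an optional `corresponding` field) is an assumption made for illustration and is not the SCI-Expanded export format; records with several authors and no corresponding author are skipped, mirroring the restriction to documents with both kinds of information.

```python
from collections import Counter


def tally_fp_rp(records):
    """Count first author (FP) and corresponding author (RP) papers per name.

    Each record is a dict with an ordered "authors" list and an optional
    "corresponding" name; both field names are hypothetical.
    """
    fp, rp = Counter(), Counter()
    for record in records:
        authors = record.get("authors") or []
        if not authors:
            continue  # nothing to attribute
        corresponding = record.get("corresponding")
        if corresponding is None:
            if len(authors) != 1:
                continue  # no usable corresponding author information
            corresponding = authors[0]  # single author rule from the text
        fp[authors[0]] += 1
        rp[corresponding] += 1
    return fp, rp


# Hypothetical records with placeholder author names.
records = [
    {"authors": ["A", "B"], "corresponding": "B"},
    {"authors": ["A"]},        # single author: counts as FP and RP
    {"authors": ["C", "D"]},   # skipped: corresponding author unknown
]
fp, rp = tally_fp_rp(records)
print(fp["A"], rp["A"], rp["B"])  # -> 2 1 1
```

The same tally can be run on institutions or countries by replacing author names with the affiliation of the first and corresponding author.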

Results and discussion

In total 21,066,849 papers were published in SCI-Expanded from 1991 to 2010. There were 3,652 papers with TC2010 ≥ 1,000. These papers comprise 9 document types; articles (2,541) dominate, comprising 70 %, followed distantly by reviews (951; 26 %). English was the only language used for the top-cited papers. Finally, the journal articles (2,541) were extracted from the 3,652 documents for subsequent analyses.

Table 1 shows the distribution of document types. The total citations from publication to 2010 (TC2010) were further used to calculate the citations per publication (CPP), which was similar for most document types, although software reviews had the highest CPP (4,128).

Table 1 Document type distribution
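The two quantities summarized by document type, the percentage share and the citations per publication (CPP), are simple ratios. The sketch below uses hypothetical helper names; the share check uses the counts given in the text, while the TC2010 values passed to the CPP function are made up purely for illustration.

```python
def share_percent(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


def citations_per_publication(tc_values):
    """CPP: total citations divided by the number of publications."""
    if not tc_values:
        raise ValueError("need at least one publication")
    return sum(tc_values) / len(tc_values)


print(round(share_percent(2541, 3652)))  # articles -> 70
print(round(share_percent(951, 3652)))   # reviews  -> 26

# Illustrative (made-up) TC2010 values for three publications.
print(citations_per_publication([1000, 1500, 3500]))  # -> 2000.0
```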

Effect of time on citation analysis

Figure 1 shows the geographical distribution of these top-cited articles. North America, West Europe, and Japan were the main production areas. Seventy percent of the articles identified were published from 1991 to 1998 (Fig. 2). Figure 2 also shows that, although there were fewer top-cited articles in the years from 2007 to 2009, on average they had higher citations (CPP > 2,000) than those of earlier years. In 2008, there were only three top-cited articles. The article “a short history of SHELX”, published in Acta Crystallographica Section A by Sheldrick (2008), had TC2010 = 15,241. This article had the highest yearly citation in 2010, with 6,826 citations. “CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice” (Thompson et al. 1994) with TC2010 = 31,799, “density-functional thermochemistry. III. The role of exact exchange” (Becke 1993) with TC2010 = 29,822, and “gapped BLAST and PSI-BLAST: a new generation of protein database search programs” (Altschul et al. 1997) with TC2010 = 26,578 had the highest TC2010 in SCI-Expanded from 1991 to 2010. “Structure validation in chemical crystallography”, published in Acta Crystallographica Section D-Biological Crystallography (Spek 2009), and “Cancer Statistics 2009”, published in CA-A Cancer Journal for Clinicians (Jemal et al. 2009), were the latest top-cited articles, with TC2010 of 2,613 and 1,945, respectively.

Fig. 1 Distribution of top-cited articles in the world

Fig. 2 Number of top-cited articles and citations per article

Table 2 presents data on the 12 articles cited more than 10,000 times. Of these 12 articles, three were internationally collaborative publications and nine were independent publications. The USA published five independent articles, followed by Germany with two articles, and Canada and Japan with one each. The journals in which these articles were published were Nucleic Acids Research (IF = 7.836) in the field of biochemistry and molecular biology with three articles, followed by Acta Crystallographica Section A (IF = 54.333) in the field of crystallography with two articles, and Methods (IF = 4.527), Acta Crystallographica Section D-Biological Crystallography (IF = 3.326), Nature (IF = 36.104), Bioinformatics (IF = 4.877), Physical Review Letters (IF = 7.621), Journal of Chemical Physics (IF = 2.92), and Medical Care (IF = 3.183) with one article each. Three authors published single-author articles: Becke from Canada, Sheldrick from Germany, and Iijima from Japan. Thompson from France was the only author who published two first author articles.

Table 2 Twelve most frequently cited articles in the SCI-Expanded (TC2010 > 10,000)

Journal and Web of Science category

The top-cited articles were published in 365 journals across 122 subject categories. Of these 365 journals, 187 (51 %) contained only one top-cited article, 68 (19 %) contained two articles, and 31 (8.5 %) contained three articles. A total of 2,491 articles were published in journals with impact factor information in the JCR in 2010. Fifty-five percent of the top-cited articles were published in 10 journals with impact factors higher than 30. Half of the top-cited articles were published in five journals: Science (IF = 31.364) with 406 articles (16 %), followed by Nature (IF = 36.104) with 378 articles, New England Journal of Medicine (IF = 53.484) with 203 articles, Cell (IF = 32.401) with 197 articles, and Proceedings of the National Academy of Sciences of the United States of America (IF = 9.771) with 87 articles (Table 3).

Table 3 Characteristics of top 20 journals with the top-cited articles

As expected, top-cited articles were published in journals with high impact factors, similar to the phenomenon observed in a study related to anesthetics (Baltussen and Kindler 2004). It was also noticed that leading journals attract the most-cited publications, which in turn maintain the high impact factor of these journals (Schein et al. 2000). However, articles with TC2010 > 1,000 could also be found in journals with lower impact factors, such as European Transactions on Telecommunications with IF = 0.448, Applied Clay Science with IF = 2.303, and Journal of the Electrochemical Society with IF = 2.42 in the category of telecommunications; Optimization Methods and Software with IF = 0.794 in the categories of software engineering computer science, operations research and management science, and applied mathematics; AI Communications with IF = 0.837 in the category of artificial intelligence computer science; Theoretical Computer Science with IF = 0.838 in the category of theory and methods computer science; and Spatial Vision with IF = 0.883 in the category of biophysics.

Within the total of 122 Web of Science categories, 65 categories (53 %) accounted for 0–5 top-cited articles each, 14 categories (12 %) for 6–10 articles, 23 categories (19 %) for 11–30 articles, and 20 categories (16 %) for more than 30 articles. In particular, the four top categories, namely multidisciplinary sciences with 875 articles, biochemistry and molecular biology with 388 articles, general and internal medicine with 370 articles, and cell biology with 274 articles, together accounted for 75 % of the total top-cited articles. Thirteen categories had two top-cited articles, thirty-two categories had only one, and fifty-two categories had none.

Publication performances: countries, institutions, and authors

In recent years, indicators of the performance of first authors (Li and Ho 2008), institutions (Ho et al. 2010), and countries (Wang et al. 2010) have been examined to compare research performance. Among the 3,652 top-cited documents, 3,022 publications had both first author and corresponding author information. These comprised 1,999 articles, 889 reviews, 49 proceedings papers, 33 notes, 32 editorial materials, 12 letters, 7 software reviews, and one database review, and they were analyzed for the publications of countries, institutions, and authors.

The Y-index was used to analyze the publication performance of the 3,022 top-cited publications, including 2,256 (75 %) country independent publications from 29 countries and 766 (25 %) internationally collaborative publications from 62 countries. Table 4 shows the 29 countries that had independent publications, ranked according to the number of total top-cited papers, together with the numbers of country independent papers, internationally collaborative papers, first author papers, and corresponding author papers. The Y-index constants, θ and j, and their ranks are also presented. The United States (USA) was the most productive country in all five indicators. It had the strongest publication intensity with j = 2,715, while Poland, Taiwan, Singapore, and Czech Republic had the lowest j (j = 1.141). The USA published more corresponding author papers than first author papers, with θ = 0.7911. Czech Republic, Russia, Finland, Spain, Denmark, Italy, Switzerland, Australia, The Netherlands, France, Japan, and Germany had θ < 0.7854, which means that these countries published more first author papers. In addition, Ireland (ranked 21st in total top-cited publications), Mexico (25th), Chile (26th), Argentina (29th), Iceland (31st), South Africa (31st), Greece (35th), and Portugal (36th) published no first author papers. The G7 countries (the US, UK, Canada, Germany, France, Italy, and Japan) had high productivity in top-cited publications, accounting for 2,814 publications (93 % of the 3,022 top-cited publications). Domination in publication by mainstream countries was not surprising, since this pattern has occurred in many medical related topics, for example patent ductus arteriosus (Hsieh et al. 2004), asthma in children (Chen et al. 2005), stem cells (Li et al. 2009a), Helicobacter pylori (Suk et al. 2011), human papillomavirus (Lin et al. 2011), and desalination (Tanaka and Ho 2011).

Table 4 Characteristics of the 29 countries with first author publications

Of the 3,022 publications with both first and corresponding author information in Web of Science, 1,347 (45 %) were institution independent publications and 1,675 (55 %) were inter-institutionally collaborative publications. The inter-institutional collaboration rate of top-cited articles (55 %) was greater than that of all related articles in many fields, for example 53 % in atmospheric simulation (Li et al. 2009b) and acupuncture research (Han and Ho 2011), 44 % in solid waste research (Fu et al. 2010), 37 % in desalination research (Tanaka and Ho 2011), and 50 % in the water resources field (Wang et al. 2011), as well as that of classic citation articles in some medical fields, for example 12 % of the 100 top-cited articles in general surgical journals (Paladugu et al. 2002) and 8 % of 100 ophthalmology classic citations (Ohba et al. 2007). It was, however, smaller than that of fields such as Helicobacter pylori research (60 %; Suk et al. 2011) and global climate change (62 %; Li et al. 2011).

Results from several analytical methods, including correlations, nonparametric tests (e.g., the Mann–Whitney test), and multidimensional scaling (MDS), have pointed out that a majority of US universities were dominant in the center (Lee and Park 2012). Table 5 shows the top 20 institutions that published at least 50 top-cited papers, ranked according to the total number of top-cited articles. Among the top 20 institutions, 18 (90 %) were located in the USA. Harvard University ranked first with 231 papers, followed by Stanford University (120 papers), University of Texas (108 papers), and Massachusetts Institute of Technology (106 papers). Brigham & Women’s Hospital, Massachusetts General Hospital, and National Cancer Institute were the three non-university institutions. The two hospitals had fewer independent publications. The two non-US institutions were University of Oxford and University of Cambridge, ranked 15th and 18th, respectively. Furthermore, Duke University ranked 12th and Rockefeller University ranked 13th in independent publications. Duke University, American Cancer Society, Salk Institute for Biological Studies, and Osaka University all ranked 17th in first author publications. Salk Institute for Biological Studies also ranked 15th in corresponding author publications.

Table 5 Characteristics of the top 20 institutions

Harvard University not only ranked top in total articles but also had the highest j. It has been reported that Harvard University was ranked first, a rank that was re-coded as 100 for ease of interpretation (Lee and Park 2012). However, the rank by total top-cited publications and the rank by Y-index differed. Some institutions ranked higher by Y-index, such as National Cancer Institute, University of Cambridge, Yale University, University of California San Francisco, Massachusetts Institute of Technology, and University of California Berkeley, while others ranked lower, including Massachusetts General Hospital, University of Pennsylvania, University of Oxford, Johns Hopkins University, Brigham & Women’s Hospital, Columbia University, University of California San Diego, and University of Texas. Seven institutions in Table 5 with θ > 0.7854 had more corresponding author publications, for example Columbia University and Massachusetts General Hospital, while nine institutions with θ < 0.7854 had more first author publications, for example University of California Los Angeles.

Everyone listed as an author of an article has made an independent material contribution to the manuscript (Coats 2009). Of the 3,022 top-cited publications with first and corresponding author information in SCI-Expanded, there were 22,335 authors from 62 countries. The percentages of authors with one, two, three, and four top-cited publications were 82, 13, 3.2, and 1.0 %, respectively. Only 1 % of the 22,335 authors published at least five top-cited papers. Overall, the top-cited publications (3,022) were published by 2,636 (12 %) first authors and 2,481 (11 %) corresponding authors. The top 12 productive authors, who published more than 10 top-cited papers, were Lander, E. S. (17 papers), Wang, J. (15), Collins, R. (14), Peto, R. (14), Akira, S. (13), Vogelstein, B. (13), Brown, P. O. (12), Wang, Y. (12), Yusuf, S. (11), Botstein, D. (11), Thun, M. J. (11), and Murray, T. (11).

Figure 3 shows the distribution of the Y-index (j, θ) of the top 67 authors with j > 4.00. j is a publication intensity constant: an author with a higher j has more papers as first or corresponding author and takes the leadership role in more papers. Jemal, A. had 8 papers, in all of which he was both first author and corresponding author (θ = 0.7854), and had the highest j of 11.3, followed by Lieber, C. M. (j = 9.00), Kresse, G. (j = 8.49), and Mirkin, C. A. (j = 8.06). θ, a publication character constant, differentiates the nature of the leadership role. θ > 0.7854 means that an author published more corresponding author papers, and θ < 0.7854 means that an author published more first author papers. When θ = 0, j equals the number of first author articles, and when θ = ∞, j equals the number of corresponding author articles. Lieber, C. M. published 9 corresponding author papers and no first author paper (9.00, ∞), followed by Jemal, A. and Mirkin, C. A., who both had 8 corresponding author papers. Jemal, A. published 8 first author papers, followed by Kresse, G. with 6. Alivisatos, A. P., Ridker, P. M., Botstein, D., and Dekker, C. had the same j (j = 5.00). Botstein, D. and Dekker, C. (θ = 0.9273) published more first author papers than Alivisatos, A. P. and Ridker, P. M., who each published five corresponding author papers but no first author paper (θ = ∞). All authors in Fig. 3 had θ ≥ 0.7854.
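As a worked check of Eqs. (1) and (2) against the author values quoted in this section, the case of Jemal, A. (FP = RP = 8) gives

$$ j = \sqrt{8^{2} + 8^{2}} = \sqrt{128} \approx 11.31, \qquad \theta = \tan^{-1}\left( \frac{8}{8} \right) = \frac{\pi}{4} \approx 0.7854 $$

For Botstein, D. and Dekker, C., the reported pair (j = 5.00, θ = 0.9273) is consistent with FP = 3 and RP = 4. These counts are inferred here from the reported Y-index, since the text does not state them directly:

$$ j = \sqrt{3^{2} + 4^{2}} = 5, \qquad \theta = \tan^{-1}\left( \frac{4}{3} \right) \approx 0.9273 $$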

Fig. 3 Top 67 authors with Y-index (j > 4.00)

Conclusion

The 3,652 top-cited papers in nine document types published from 1991 to 2010 were analyzed. Of these, 2,541 articles were published in 365 journals listed in 122 Web of Science categories. Of the 8,073 journals in the JCR Science Edition in 2010, Science, Nature, New England Journal of Medicine, and Cell published the most top-cited articles and ranked as the top four. The USA ranked first by all indicators, although the article with the most citations was published by Thompson, Higgins, and Gibson from the European Molecular Biology Laboratory, Germany. Six of the top 12 articles with TC2010 > 10,000 were published as institution independent articles, and nine were country independent articles from the USA, Germany, Canada, and Japan. A new indicator, the Y-index, was proposed and successfully applied to evaluate the publication character of authors, institutions, and countries. The numbers of first and corresponding author articles were similar for institutions and countries, but varied significantly among individuals.

Harvard University was the most productive institution, ranking first in total, first author, and corresponding author top-cited publications, as indicated by the Y-index. Lander, E. S. was the most productive author, while Jemal, A. had the highest publication performance in first and corresponding author articles, as indicated by the Y-index. Both the first author and the corresponding author are major contributors to a published research work, but the quality of their contribution does differ significantly. It was shown that the Y-index can help bibliometric researchers look beyond the usual indices of total publication to the character of contribution as well.