Introduction

The journal impact factor was first developed to help librarians select quality journals (Garfield 1972, 2006). It has been widely but controversially used as a measure for evaluating journals, research, and researchers (Bornmann and Marx 2018; Lei and Yan 2016; Liu et al. 2016; Waltman 2016). One possible limitation that may challenge the validity of the journal impact factor when used as a measure of evaluation is its relatively simplistic algorithm. A journal’s impact factor in a certain year is calculated as the number of citations received in that year by all its items (such as original research articles, review articles, proceedings papers, editorials) published during the previous 2 years, divided by the number of citable items published during the same 2 years (Clarivate Analytics 2018). For example, a journal’s impact factor in 2016 is calculated as shown in Eq. (1):

$$ \text{Impact factor in 2016} = \frac{\text{Citations in 2016 to all items published in 2014 and 2015}}{\text{Number of citable items published in 2014 and 2015}} \tag{1} $$
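As a minimal sketch of Eq. (1), the calculation can be written as a short Python function. The function name, parameter names, and counts below are hypothetical and serve only to illustrate the arithmetic; they are not taken from the Web of Science.

```python
def impact_factor(citations_to_all_items: int, citable_items: int) -> float:
    """Eq. (1): citations in year Y to all items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations_to_all_items / citable_items


# Hypothetical counts for a journal's 2016 impact factor (illustration only):
# 1200 citations in 2016 to items from 2014-2015, 400 citable items in 2014-2015.
example_if_2016 = impact_factor(citations_to_all_items=1200, citable_items=400)
print(example_if_2016)
```

Note that the numerator counts citations to all items, whereas the denominator counts only citable items, so the two inputs are deliberately kept as separate arguments.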

Based on Eq. (1), if one or more items published in 2014 or 2015 received many more citations than other items, then these highly cited items might boost the impact factor of the journal in 2016. That is, the journal impact factor may be dramatically affected by the skewed distribution of citations. As one of the reviewers correctly points out, citation counts are not normally distributed, and thus the mean citation number is a potentially misleading summary of a highly skewed distribution. While the median may seem to be a better estimate in the case of a non-normal distribution, its use as an index for differentiating journals is not feasible, since most journals would receive an impact factor of 0 or 1. One extreme example of the issue with highly cited items is Sheldrick (2008), which was found to temporarily but dramatically boost the impact factor of its host journal, from 2.052 in 2008 to 49.926 in 2009 and 54.333 in 2010, before it dropped back to 2.076 in 2011 (Dimitrov et al. 2010; Foo 2013; Krauskopf 2013). Liu et al. (2018) recently reported another example of such a phenomenon. They found that the “Review of Particle Physics” series articles also boosted the host journals’ impact factors (i.e., the RPP phenomenon). Interestingly, although the RPP phenomenon may substantially affect the impact factor of journals with a lower impact factor and a smaller number of published items, its contribution to the impact factor of journals with a higher impact factor and a larger number of published items is relatively modest. Based on these findings, Liu and colleagues (Liu et al. 2018, p. 3) suggest excluding “the direct effects of publishing the RPP when calculating a journal’s impact factor”.
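The sensitivity of the mean to a single highly cited item, and the coarseness of the median, can be illustrated with a small sketch. The citation counts below are hypothetical and chosen only to make the effect visible; they do not describe any real journal.

```python
from statistics import mean, median

# Hypothetical citation counts for ten items in a small journal (illustration only).
typical_items = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]

# The same journal after publishing one extremely highly cited item.
with_outlier = typical_items + [500]

mean_before = mean(typical_items)      # mean of the ten typical items
mean_after = mean(with_outlier)        # mean once the outlier is included
median_before = median(typical_items)  # median of the ten typical items
median_after = median(with_outlier)    # median once the outlier is included

print(mean_before, mean_after)
print(median_before, median_after)
```

In this toy case the mean rises from 1.7 to 47 while the median only moves from 1.5 to 2, which is in line with the two observations above: a mean-based index can be driven by one item, and a median-based index would take only a handful of small values across journals.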

In fact, the Sheldrick (2008) case and the RPP phenomenon may not be unique. A closer look reveals that such items are all review articles. Review articles as a genre serve as “a forum for the synthesis of ideas” (Noguchi 2006, p. 16), and are categorised into history reviews, state-of-the-art reviews, theory reviews, and issue reviews. For example, Sheldrick (2008) is a history review, and the RPP articles are state-of-the-art reviews. Since review articles synthesise ideas and the state of the art in a research area, we hypothesise that they attract much more attention from colleagues and hence receive many more citations than academic pieces of other genres such as original research articles. Few studies have provided empirical evidence for this hypothesis, though researchers mention the point in their arguments (e.g., Martin 2016). One such study is Amiri and Michel (2018), which examined the citation counts of articles in five pharmacology journals and found that review articles received twice as many citations as original research articles. If the foregoing hypothesis is correct, then the impact factors of the host journals may be boosted not only by the RPP articles but also by the higher numbers of citations to other review articles. Accordingly, the Sheldrick (2008) case and the RPP phenomenon as reported in Liu et al. (2018) may not be unique, but rather pervasive.

To test this hypothesis, we conduct experiments to address two research questions as follows:

RQ1: Do review articles receive more citations than original research articles?

RQ2: Do the citations that review articles receive affect the impact factor of their host journals?

In the remainder of this article, we first introduce our research methods, and then report the findings. We end the article with some discussion.

Data and methods

The data for the experiments were extracted from the Web of Science on the portal of Huazhong University of Science and Technology on October 26–28, 2018. For RQ1, we extracted citations of both original research articles and review articles in the field of physics from 2009 to 2018. Concerning the search techniques, we first used “SU = Physics AND PY = 2009–2018” in the advanced search box to retrieve all items in physics published between 2009 and 2018, and then refined the results with “highly cited in field” and “article” or “review” for document type (i.e., other document types such as editorials and commentaries were excluded). The search code is as follows.

  • SU = Physics AND PY = 2009–2018

  • Refined by: ESI Top Papers: (Highly Cited in Field) AND DOCUMENT TYPES: (ARTICLE OR REVIEW)

  • Time span: 2009–2018. Indexes: SCI-EXPANDED.

Two points should be noted here. First, the information extracted on original research articles and review articles was refined by “highly cited papers”, since the Web of Science only reports citations of publications up to 10,000 items. Second, the data from 2009 to 2018 were used because the Web of Science only reports the citations of highly cited papers (the top 1% highly cited items in a field) in the most recent 10 years (InCites Help 2018).

For RQ2, we investigated the bibliometric information for Science, Nature, and Cell. We chose these journals as the targets for three reasons. First, Science, Nature, and Cell are widely recognised top journals in science. Second, they publish a large number of pieces each year. Third, and most importantly, they publish more than 60 review articles each year. The relatively large number of review articles provides a data set large enough for our experiment. We had initially hoped to experiment with the journals examined in Liu et al. (2018). However, these journals publish a very limited number of review articles in certain years. For example, Physical Review D, the journal with the largest publication size, published only one review article in 2012, and none from 2013 to 2017, according to the records in the Web of Science. With such a limited quantity, the validity of any findings from these journals would be open to challenge. The search techniques used for the retrieval of the data used in this part were largely similar to those used in the previous part, but with different parameters such as journal titles and years.

Using the bibliometric data extracted from the Web of Science, we first calculated the impact factor of the journals based on Eq. (1) in the most recent 10 years (from 2008 to 2017), and then computed the adjusted impact factor excluding review articles based on Eq. (2) below (taking the impact factor in 2016 as an example). With the values of impact factor and adjusted impact factor, we were able to examine whether the impact factor was affected by the citations of review articles. It is worth noting that the most recent 10 years end in 2017 rather than 2018 because the data for 2018 were incomplete at the time of this study.

$$ \text{Adjusted impact factor in 2016} = \frac{\text{Citations in 2016 to all items excluding review articles published in 2014 and 2015}}{\text{Number of citable items excluding review articles published in 2014 and 2015}} \tag{2} $$
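A minimal sketch of Eq. (2) is given below. It assumes that the citations to review articles and the number of review articles can be subtracted from the totals used in Eq. (1); the function name, parameter names, and counts are hypothetical and for illustration only.

```python
def adjusted_impact_factor(
    citations_to_all_items: int,
    citations_to_reviews: int,
    citable_items: int,
    review_items: int,
) -> float:
    """Eq. (2): the impact factor with review articles removed from both
    the numerator (their citations) and the denominator (their item count)."""
    remaining_items = citable_items - review_items
    if remaining_items <= 0:
        raise ValueError("no citable items remain after excluding reviews")
    return (citations_to_all_items - citations_to_reviews) / remaining_items


# Hypothetical counts (illustration only): 1200 citations in total, 300 of them
# to review articles; 400 citable items, 40 of them review articles.
example_adjusted_if = adjusted_impact_factor(
    citations_to_all_items=1200,
    citations_to_reviews=300,
    citable_items=400,
    review_items=40,
)
print(example_adjusted_if)
```

With these hypothetical counts the unadjusted value from Eq. (1) would be 1200/400 = 3.0 and the adjusted value is 900/360 = 2.5, so the gap between the two reflects the contribution of the review articles.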

Results

Do review articles receive more citations than original research articles?

The statistics for the mean citation numbers for the highly cited original research articles and the review articles from 2009 to 2018 are plotted in Fig. 1. It is obvious that the mean citation numbers for the review articles are always higher than those for original research articles. These findings are in accordance with our hypothesis that review articles may receive more citations than original research articles. This finding may also provide empirical evidence for such intuitive arguments in earlier studies, such as Martin (2016). As previously mentioned, review articles integrate ideas, findings, and state-of-the-art development in a research area, and may offer insights into research trends in the field, which as a result attract attention from colleagues and elicit citation in their future publications. In the next section, we will report results on the issue of whether the citations of review articles affect the impact factor of journals as was seen in the Sheldrick (2008) case and the RPP phenomenon (Liu et al. 2018a).

Fig. 1 Mean numbers of citations of highly cited original research articles and review articles

Do the citations that review articles receive affect the impact factor of their host journals?

The impact factors and adjusted impact factors for Science, Nature, and Cell are depicted in Figs. 2, 3, and 4. It is clear that the impact factors of the journals are higher than their adjusted impact factors for all the years examined. That is, when the review articles are removed, the impact factors of the journals will decrease. These findings confirm our hypothesis that citations to review articles affect the impact factor of their host journals. Such findings also seemingly provide partial evidence to support our argument that the Sheldrick (2008) case and the RPP phenomenon as reported in Liu et al. (2018a) are not unique, but rather pervasive, as long as a journal publishes review articles.

Fig. 2 (Adjusted) impact factors of Cell articles and review articles

Fig. 3 (Adjusted) impact factors of Nature articles and review articles

Fig. 4 (Adjusted) impact factors of Science articles and review articles

It should be noted that the contribution of review articles to the impact factors of journals in the present experiment is not as large as that in the Sheldrick (2008) and RPP cases. The smaller contribution of review articles may be explained by two points. First, the journals examined in the study, i.e., Science, Nature, and Cell, are top journals with high impact factors and large numbers of published pieces. Accordingly, the higher citations of review articles may contribute relatively modestly to the impact factor of these journals. Such a phenomenon was also observed by Liu et al. (2018a). In their study, the RPP effect on the impact factor of a journal with a high impact factor and a large number of published items, i.e., Physical Review D, was much smaller than that for other journals with lower impact factors and a smaller number of published pieces. Second, the citation gap between review articles and original research articles in Science, Nature, and Cell is much smaller than that between the Sheldrick (2008) and RPP cases and the original research articles in the journals examined in Liu et al. (2018a). The smaller citation gap may also explain the relatively modest contribution of review articles to the impact factor of journals in our experiment.

Despite the foregoing points, we cannot ignore the contributions of review articles to the impact factors of the journals. As the statistics presented in Table 1 show, the review articles account for an average of 10% of the impact factor of Cell, and 3% of that of Nature and Science. Indeed, in certain years, the contribution of review articles is nearly 20% of the impact factor. For example, in 2012 and 2013, 18% of the impact factor of Cell is attributable to citations of review articles. Without the review articles, the impact factor would decrease from 35.45 to 29.07 in 2012 and from 35.59 to 29.08 in 2013. In addition, some extremely highly cited review articles are found in these top journals with high impact factors. For example, a review article published in Cell in 2011 (Hanahan and Weinberg 2011) had received a total of 5313 citations in the 2 years after its publication (2415 in 2012 and 2898 in 2013). This review article was responsible for 73.44% (5313/7234 = 73.44%) of all citations to review articles in 2013 and 19.82% (5313/26,802 = 19.82%) of all citations to all items published in Cell in 2013. This example also offers evidence for our point that the Sheldrick (2008) case and the RPP effect are not isolated phenomena.
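The percentages quoted above can be rechecked with a few lines of Python. The sketch assumes that the contribution of review articles is the relative drop from the impact factor to the adjusted impact factor, i.e., (IF − adjusted IF)/IF; this definition and the helper name are our assumptions, while the figures themselves are those reported in the text.

```python
def review_share(impact_factor: float, adjusted_impact_factor: float) -> float:
    """Assumed definition: share (in %) of the impact factor that disappears
    when review articles are excluded."""
    return (impact_factor - adjusted_impact_factor) / impact_factor * 100


# Impact factors and adjusted impact factors of Cell as reported in the text.
cell_share_2012 = review_share(35.45, 29.07)
cell_share_2013 = review_share(35.59, 29.08)

# Citations to Hanahan and Weinberg (2011) as reported in the text.
hanahan_citations = 2415 + 2898
share_of_review_citations = hanahan_citations / 7234 * 100
share_of_all_citations = hanahan_citations / 26802 * 100

print(round(cell_share_2012), round(cell_share_2013))
print(round(share_of_review_citations, 2), round(share_of_all_citations, 2))
```

Under this assumed definition, both Cell values round to 18%, and the two citation shares round to 73.44% and 19.82%, consistent with the figures given above.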

Table 1 Statistics for impact factors (IF) and adjusted impact factors

Discussion

The present study finds that review articles consistently receive more citations than original research articles. This result provides empirical evidence for such intuitive arguments in previous studies (e.g., Martin 2016). The study also reveals that review articles affect the impact factor of journals. Specifically, review articles have been responsible for an average of 3–10% of the impact factors of the top journals Cell, Nature, and Science over the past 10 years. In addition, certain review articles are responsible for a relatively large proportion of citations to these top journals. One extreme example is a review article published in Cell in 2011 (Hanahan and Weinberg 2011), which accounted for approximately 20% of all citations to items published by the journal in 2013. Accordingly, the findings of the study show that the Sheldrick (2008) case and the RPP phenomenon (Liu et al. 2018a) are not unique. In fact, review articles play an important role in journal impact factors.

The impact factor of a journal is affected by many variables. For example, in a recent study, Shi et al. (2017) found that the impact factor was affected by the delay of publications in citing journals. Another example is the influence of documents such as editorials and commentaries on journal impact factors. Because citations received by editorials and commentaries are included only in the numerator and not in the denominator of the impact factor calculation, the impact factors of journals such as Nature and Science that publish a large number of editorials and commentaries may be skewed by the citations to such non-citable items. Future research may focus on the influence of such items on journal impact factors. In addition, in Martin (2016), the editor of Research Policy lists many seemingly legitimate strategies that people use to manipulate the journal impact factor, one of which involves two points concerning review articles that are of special interest to the present study. First, journals may invite higher numbers of review articles in order to improve their impact factors. Second, review journals often sit at the top of impact factor rankings. Such an important role of review articles may motivate researchers’ proposals to exclude review articles in impact factor calculation (e.g., Liu et al. 2018b), particularly those extremely highly cited reviews such as the Sheldrick (2008) case and the RPP series (Liu et al. 2018a).

However, the discussion should not pertain only to the exclusion of highly cited items for impact factor calculation. First, it is no fault of the highly cited items that they attract higher numbers of citations. They attract citations because of their academic value. Second, although highly cited pieces may affect the journal impact factor, they are not necessarily pertinent to the “manipulation” of the impact factor (Falagas and Alexiou 2008; Yu et al. 2010). Third, if highly cited items such as the Sheldrick (2008) case and the RPP series (Liu et al. 2018a) should be removed because they affect or distort the impact factor, should other highly cited pieces such as that published in Cell in 2011 (Hanahan and Weinberg 2011) also be excluded? In fact, highly cited pieces permeate journals. In addition, a quick search of the most highly cited items published in 2017 in physics reveals that the top four pieces are all published in the same journal (Abbott et al. 2017a, b, c; Akerib et al. 2017), and they will surely increase the impact factor of the journal in 2018 and 2019. But it may not be wise to exclude them from the calculation of the impact factor, since if that is the case, then all of the most highly cited items should be excluded. In fact, as one of the reviewers points out, the use of any single index, whether the impact factor or another measure, is inherently limited as the sole means of evaluating journals, individuals, or institutions.

Lastly, at least two points should be considered concerning the journal impact factor issue. On the one hand, rather than simply excluding the highly cited items, more sophisticated methods can be employed to prevent manipulation of the impact factor (e.g., Yang et al. 2016; Yu et al. 2011). On the other hand, the key point concerning the impact factor issue may not be the exclusion or inclusion of highly cited items, but the very use of the impact factor for evaluation purposes, which has been widely challenged (Paulus et al. 2018; Stephan et al. 2017). Thus, in addition to simplistic algorithms such as the journal impact factor, many other factors and methods should be considered for use in research evaluation (e.g., Hicks et al. 2015).