A comment to the paper by Waltman et al., Scientometrics, 87, 467–481, 2011

Opthof, Tobias; Leydesdorff, Loet

doi:10.1007/s11192-011-0424-8

Open access
Published: 17 June 2011

Volume 88, pages 1011–1016, (2011)


Abstract

In reaction to a previous critique (Opthof and Leydesdorff, J Informetr 4(3):423–430, 2010), the Center for Science and Technology Studies (CWTS) in Leiden proposed to replace their old “crown” indicator in citation analysis with a new one. Waltman et al. (Scientometrics 87:467–481, 2011a) argue that this change does not affect rankings at various aggregated levels. However, CWTS data is not publicly available for testing and criticism. Therefore, we comment by using previously published data of Van Raan (Scientometrics 67(3):491–502, 2006) to address the pivotal issue of how the results of citation analysis correlate with the results of peer review. A quality parameter based on peer review was significantly correlated neither with the two parameters developed by the CWTS in the past, CPP/JCSm (citations per paper/mean journal citation score) and CPP/FCSm (citations per paper/mean field citation score), nor with the more recently proposed h-index (Hirsch, Proc Natl Acad Sci USA 102(46):16569–16572, 2005). Given the high correlations between the old and new “crown” indicators, one can expect that the lack of correlation with the peer-review based quality indicator applies equally to the newly developed ones.

We react to a study by Waltman et al. (2011a), entitled “Towards a new crown indicator: An empirical analysis.” The authors go to great lengths to show that a change in the normalization, made in reaction to our previous critique of the Leiden “crown” indicators (Opthof and Leydesdorff 2010), did not significantly affect the rankings at various aggregated levels. Since the data of the Center for Science and Technology Studies (CWTS) under discussion were not publicly available [Footnote 1], we use a previous occasion on which Van Raan (2006) revealed some of the micro-data underlying the evaluations, in the case of 147 research groups in chemistry. The defense at that time was triggered by the introduction of the h-index by Hirsch (2005). How did the Leiden “crown” indicators work in comparison to the h-index? Unlike the CWTS citation indicators, the h-index is sensitive to the number of publications for which citation rates are compared. Decomposition of aggregated data allows for distinguishing mechanisms; for example, variance “within groups” versus “between groups.”

Since Narin (1976) suggested the use of bibliometrics for evaluative purposes, semi-industrial centers have sprung up, either connected to academia (such as in Budapest, Leiden, Leuven, Beijing, Shanghai, etc.) or as independent commercial enterprises (e.g., Science-Metrix in Montreal). Two major companies (Thomson Reuters and Elsevier) are also active in this market. In other words, citation analysis has become an industry. Intellectual property in the data and in the results of the analysis has become a major asset in this (quasi-)industry. Although contractors sometimes state that the results are freely available to the users, the licenses for the data (the Science Citation Index) often do not permit free publication of the results, so that the scientists under study are unable to check these evaluations themselves (cf. Opthof and Leydesdorff 2010). This practice of secrecy tends to shield the evaluation against the criticism that has been voiced against the use of citation analysis for evaluative purposes (Leydesdorff 2008; MacRoberts and MacRoberts 1987, 1996, 2010).

The invention of the h-index as a new statistic in 2005 (Hirsch 2005), however, challenged the leading researcher of CWTS (Van Raan 2006) to test whether this new indicator correlated with the “crown” indicators of scientometric evaluation in use by CWTS: CPP/FCSm (citations per paper/mean field citation score) and CPP/JCSm (citations per paper/mean journal citation score) (Schubert and Braun 1986; Vinkler 1986; Moed et al. 1995). These latter indicators have been used extensively for such purposes as the Leiden Rankings of universities, research evaluation at the institutional level, and science policy advice at national and international (e.g., EU) levels (e.g., Moed 2005). Vinkler (1996) considered this indicator, which he denoted RW, to be the most appropriate one for evaluation.
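The dependence of the h-index on the number of publications can be illustrated with a minimal sketch. The two citation lists below are hypothetical and chosen only for illustration; `h_index` and `citations_per_paper` are names introduced here, not functions from any of the cited studies.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_paper(citations):
    """Return the mean number of citations per paper (CPP)."""
    return sum(citations) / len(citations)


# Hypothetical groups (assumed values, not data from Van Raan 2006).
small_group = [30, 30, 30]   # few papers, each highly cited
large_group = [10] * 12      # many papers, each moderately cited

for name, cites in (("small", small_group), ("large", large_group)):
    print(name, "h =", h_index(cites), "CPP =", citations_per_paper(cites))
```

In this sketch the small group has the higher CPP but the lower h, because h can never exceed the number of papers.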

The CWTS study (VSNU 2002) was based on more than 18,000 publications of 147 research groups in chemistry and chemical engineering in the Netherlands for the years 1991–1998. A subset of this data was reanalyzed by Van Raan (2006). In addition to the citation indicators, the research groups under study were rated by peers for quality on a five-point scale. All fields within chemistry were covered by this set of university groups. The author notes that the various specialties exhibit different citation characteristics and that field normalization would therefore be essential (cf. Leydesdorff and Opthof 2010, 2011). CPP/FCSm normalizes CPP by the mean field citation score (FCSm), where a “field” is defined as a set of journals sharing a field code of the ISI Subject Categories. Analogously, CPP/JCSm normalizes by the mean citation score of the individual journals (Schubert and Braun 1986; Vinkler 1986; Waltman et al. 2011b).
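As a rough sketch of this normalization, assume that each paper comes with its own citation count and an expected citation score for its field. CPP/FCSm then divides the mean of the counts by the mean of the expected scores. For contrast, the sketch also computes the mean of the per-paper ratios, which reverses the order of averaging and dividing; this order of operations is what the normalization debate cited above is about. All names and numbers are hypothetical.

```python
def cpp_over_fcsm(citations, field_expected):
    """Ratio of averages: mean citations per paper over mean field citation score."""
    n = len(citations)
    cpp = sum(citations) / n
    fcsm = sum(field_expected) / n
    return cpp / fcsm


def mean_of_ratios(citations, field_expected):
    """Average of ratios: normalize each paper first, then take the mean."""
    ratios = [c / e for c, e in zip(citations, field_expected)]
    return sum(ratios) / len(ratios)


# Two hypothetical papers: one in a low-citation field, one in a high-citation field.
citations = [4, 20]
field_expected = [2.0, 20.0]

print("CPP/FCSm      :", round(cpp_over_fcsm(citations, field_expected), 3))
print("mean of ratios:", round(mean_of_ratios(citations, field_expected), 3))
```

The two values differ here because the paper in the high-citation field dominates both sums in the ratio of averages, whereas each paper counts equally in the mean of ratios.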

Van Raan (2006, p. 495) provided the data shown in Table 1 [Footnote 2].

Table 1 Example of the results of the bibliometric analysis for the chemistry groups

Table 1 shows the results for 12 research groups at one university that published 1,327 papers during this period, obtaining a total of 17,566 citations. The bibliometric indicators, the h-index, and the peer ratings are provided. In the latter, “5” indicates “excellent,” “4” means “good,” and “3” is classified as “satisfactory.” A rating below “3” is not considered “satisfactory,” but such a low rating did not occur in this set of data.

Table 2 shows the Pearson correlations (r) in the lower triangle and the Spearman rank correlations (ρ) in the upper triangle. As noted (cf. Van Raan 2006, p. 499), the h-index also depends on the number of publications, while the CWTS indicators do not. As could be expected, the two CWTS indicators are highly correlated with each other (r = 0.783; p < 0.01). However, the quality parameter Q is not significantly correlated with any of these scientometric indicators. Thus, we may conclude that the indicators are not validated by this study, despite the author’s claim to the contrary.
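A matrix of this layout, with Pearson coefficients below the diagonal and Spearman coefficients above it, could be assembled along the following lines. This is a sketch using only the Python standard library; the six rows of values are hypothetical and are not the data of Van Raan (2006). Tied values, which are unavoidable for a peer rating on a five-point scale, receive average ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mixed_correlation_matrix(columns):
    """Pearson in the lower triangle, Spearman in the upper triangle."""
    names = list(columns)
    matrix = []
    for i, row_name in enumerate(names):
        row = []
        for j, col_name in enumerate(names):
            if i == j:
                row.append(1.0)
            elif i > j:
                row.append(pearson(columns[row_name], columns[col_name]))
            else:
                row.append(spearman(columns[row_name], columns[col_name]))
        matrix.append(row)
    return names, matrix


# Hypothetical values for six groups (assumed for illustration only).
groups = {
    "CPP/JCSm": [0.9, 1.4, 1.1, 2.0, 1.6, 0.8],
    "CPP/FCSm": [1.0, 1.8, 1.2, 2.4, 1.5, 0.9],
    "h": [8, 15, 12, 14, 20, 7],
    "Q": [4, 5, 4, 4, 5, 3],
}
names, matrix = mixed_correlation_matrix(groups)
for name, row in zip(names, matrix):
    print(f"{name:9s}", "  ".join(f"{value:6.3f}" for value in row))
```

Significance levels are not computed in this sketch; with as few as 12 groups, a coefficient needs to be fairly large before it differs significantly from zero.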

Table 2 Pearson correlations (lower triangle) and Spearman rank correlations (upper triangle) among three citation indicators and one peer-review based quality indicator

Figure 1 shows the discriminating power of the h-index and the two indicators of CWTS (CPP/JCSm and CPP/FCSm) using the set provided in Table 1. We added error bars in order to show that the differences are contained within the margins of the standard errors of the measurement. Thus, none of the citation-based indicators is able to discriminate between the categories “good” and “excellent” which were distinguished during the peer review.
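The comparison behind such error bars can be sketched as follows: compute the mean and the standard error of the mean of an indicator for each peer-rating category, and check whether the intervals of one standard error around the means overlap. The indicator values are hypothetical, and the overlap check is only a visual heuristic of the kind conveyed by error bars, not a formal significance test.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for a standard error")
    return mean(values), stdev(values) / sqrt(len(values))


def error_bars_overlap(a, b):
    """True if the intervals mean +/- SEM of the two categories overlap."""
    (mean_a, sem_a), (mean_b, sem_b) = a, b
    return abs(mean_a - mean_b) <= sem_a + sem_b


# Hypothetical CPP/FCSm values per peer rating (assumed for illustration only).
by_rating = {
    4: [1.0, 1.4, 0.8, 1.6],   # groups rated "good"
    5: [1.2, 1.9, 0.9, 1.8],   # groups rated "excellent"
}
summary = {rating: mean_and_sem(values) for rating, values in by_rating.items()}
for rating, (m, sem) in summary.items():
    print(f"Q = {rating}: mean = {m:.3f}, SEM = {sem:.3f}")
print("error bars overlap:", error_bars_overlap(summary[4], summary[5]))
```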

In his Table 2, Van Raan (2006, p. 500) also provided aggregated data for the set of 147 research groups. In this table, the association between Q and h is significant (using χ²; p < 0.05), but the association between Q and CPP/FCSm is not when testing Q = 4 against Q = 5 (χ² = 4.211 [Footnote 3]; df = 2; p = 0.112). Thus, even at this aggregated level (N = 147), these results confirm the previous conclusion of Bornmann et al. (2010; cf. Van den Besselaar and Leydesdorff 2009): peer review systems and citation analysis are able to distinguish the tails of the distributions (low quality) from the high end of the set, but perform poorly in distinguishing between excellent and good research, to the extent that the correlation between evaluations based on these scientometric indicators and those based on peer review can be negative (cf. Neufeld and von Ins 2011).
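A test of this kind can be sketched for a table of two peer ratings by three indicator classes, which gives df = 2. The counts below are hypothetical. The continuity correction is implemented here by subtracting 0.5 from each absolute deviation before squaring; this is one common reading of a Yates-type correction and is an assumption, since the exact procedure used for the values above is not specified. For exactly two degrees of freedom, the upper-tail probability of the χ² distribution has the closed form exp(−χ²/2).

```python
from math import exp


def chi_square(table, continuity_correction=False):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            deviation = abs(observed - expected)
            if continuity_correction:
                # Assumed Yates-type correction applied to every cell.
                deviation = max(deviation - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail probability of chi-square with exactly 2 degrees of freedom."""
    return exp(-statistic / 2)


# Hypothetical counts: rows are Q = 4 and Q = 5, columns are three indicator classes.
table = [
    [10, 20, 10],
    [4, 15, 20],
]
for corrected in (False, True):
    statistic, df = chi_square(table, continuity_correction=corrected)
    print(f"corrected={corrected}: chi2 = {statistic:.3f}, df = {df}, "
          f"p = {p_value_df2(statistic):.3f}")
```

Applied to the uncorrected value reported in Footnote 3 (χ² = 5.559; df = 2), `p_value_df2` returns approximately 0.062.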

In Tables 19.3 and 19.4 (p. 243), Moed (2005) used the same data but, having access to the source data, added the larger set of similar results for biology and physics (whereas Table 1 above contains only the data for 12 research groups in chemistry at a single university, out of 147 chemistry groups at ten universities). By also aggregating CPP/FCSm values along the scale of “Citation impact classes,” he concluded (p. 242) that “a very high citation impact discriminated very well between departments rated excellent or good and those receiving lower peer ratings, but it did not discriminate so well between good and excellent departments in the perception of peers.”

This wording (“not so well”) suggests a poor correlation, whereas we showed above that there was no correlation at the level of the smaller chemistry set when using the values of CPP/FCSm before binning them into “Citation impact classes.” Citation analysis is thus not always helpful in distinguishing between good and excellent research, and aggregation may inadvertently obscure the absence of correlations. Unfortunately, the selection between “excellent” and “good” is one of the policy contexts in which citation analysis is used; for example, in rankings and funding schemes (e.g., Bornmann et al. 2010; Halffman and Leydesdorff 2010; Geuna and Martin 2003).

In summary, we argue that the industrial character of citation analysis for evaluative purposes has concealed technical flaws in these measurements, because the lack of openness about the data has precluded critical discussion in academia. Notwithstanding the prevailing use of these indicators in research evaluation and strategic decision-making, the statistical analysis of this scientometric data supports the claim of critics (e.g., MacRoberts and MacRoberts 2010) that citation analysis hitherto cannot legitimate the strategic selection of excellence.

Notes

1. One of us recently (Jan. 20, 2011) received access to this data in response to a request of the Dean of the Academic Medical Center of the University of Amsterdam. This communication was first submitted before that date.
2. In footnotes 4 and 5 on p. 464, Van Raan (2006) explains the rationale for using different citation windows for the h-index and the CWTS indicators.
3. This value is Yates-corrected because of one value smaller than five. Without this correction: χ² = 5.559; df = 2; p = 0.062.

References

Bornmann, L., Leydesdorff, L., & Van den Besselaar, P. (2010). A meta-evaluation of scientific research proposals: Different ways of comparing rejected to awarded applications. Journal of Informetrics, 4(3), 211–220.
Geuna, A., & Martin, B. R. (2003). University research evaluation and funding: An international comparison. Minerva, 41(4), 277–304.
Halffman, W., & Leydesdorff, L. (2010). Is inequality among universities increasing? Gini coefficients and the elusive rise of elite universities. Minerva, 48(1), 55–72.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102(46), 16569–16572.
Leydesdorff, L. (2008). Caveats for the use of citation indicators in research and journal evaluation. Journal of the American Society for Information Science and Technology, 59(2), 278–287.
Leydesdorff, L., & Opthof, T. (2010). Normalization at the field level: Fractional counting of citations. Journal of Informetrics, 4(4), 644–646.
Leydesdorff, L., & Opthof, T. (2011). Remaining problems with the “New Crown Indicator” (MNCS) of the CWTS. Journal of Informetrics, 5(1), 224–225.
MacRoberts, M. H., & MacRoberts, B. R. (1987). Another test of the normative theory of citing. Journal of the American Society for Information Science, 16, 151–172.
MacRoberts, M. H., & MacRoberts, B. R. (1996). Problems of citation analysis. Scientometrics, 36(3), 435–444.
MacRoberts, M. H., & MacRoberts, B. R. (2010). Problems of citation analysis: A study of uncited and seldom-cited influences. Journal of the American Society for Information Science and Technology, 61(1), 1–12.
Moed, H. F. (2005). Citation analysis in research evaluation. Dordrecht: Springer.
Moed, H. F., De Bruin, R. E., & Van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance: Database description, overview of indicators and first applications. Scientometrics, 33(3), 381–422.
Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the evaluation of scientific activity. Washington, DC: National Science Foundation.
Neufeld, J., & von Ins, M. (2011). Informed peer review and uninformed bibliometrics? Research Evaluation, 20(1), 31–46.
Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423–430.
Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9(5), 281–291.
Van den Besselaar, P., & Leydesdorff, L. (2009). Past performance, peer review, and project selection: A case study in the social and behavioral sciences. Research Evaluation, 18(4), 273–288.
Van Raan, A. F. J. (2006). Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics, 67(3), 491–502.
Vinkler, P. (1986). Evaluation of some methods for the relative assessment of scientific publications. Scientometrics, 10(3), 157–177.
Vinkler, P. (1996). Model for quantitative selection of relative scientometric impact indicators. Scientometrics, 36(2), 223–236.
VSNU. (2002). Chemistry and chemical engineering. VSNU Series ‘Assessment of Research Quality.’ Utrecht: VSNU. ISBN 90 5588 4979.
Waltman, L., van Eck, N. J., van Leeuwen, T. N., Visser, M. S., & Van Raan, A. F. J. (2011a). Towards a new crown indicator: An empirical analysis. Scientometrics, 87, 467–481.
Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., Visser, M. S., & Van Raan, A. F. J. (2011b). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1), 37–47.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Experimental Cardiology Group, Heart Failure Research Center, Academic Medical Center AMC, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands
Tobias Opthof
Department of Medical Physiology, University Medical Center, Utrecht, The Netherlands
Tobias Opthof
Amsterdam School of Communication Research (ASCoR), University of Amsterdam, Kloveniersburgwal 48, 1012 CX, Amsterdam, The Netherlands
Loet Leydesdorff

Corresponding author

Correspondence to Loet Leydesdorff.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Opthof, T., Leydesdorff, L. A comment to the paper by Waltman et al., Scientometrics, 87, 467–481, 2011. Scientometrics 88, 1011–1016 (2011). https://doi.org/10.1007/s11192-011-0424-8

Received: 11 May 2011
Published: 17 June 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s11192-011-0424-8
