Introduction

Born with glory, the h-index was formally introduced by Jorge E. Hirsch in 2005 to measure a scholar’s output in terms of both productivity and quality or, more precisely, citation impact (Hirsch 2005). Since then, the h-index has attracted wide attention from academics and practitioners (Gingras 2016; Schubert and Schubert 2019). The indicator can be extended to measure the output of any publication set, such as that of a research group or a country (Bornmann and Daniel 2009). Because of its limitations, such as inter-field differences and the insufficient weight it gives to highly cited papers, criticisms sprang up and various h-index variants, such as the g-index and the hg-index, to name just a few, were proposed in succession (Alonso et al. 2009; Bornmann et al. 2011; Egghe 2006). Underlying all of these discussions is a fundamental requirement: the h-index should be used without confusion.

In her highly cited paper entitled “Which h-index? -A comparison of WoS, Scopus and Google Scholar”, published in Scientometrics, Dr. Judit Bar-Ilan compared h-index values calculated from the WoS, Scopus and Google Scholar bibliographic databases (Bar-Ilan 2008).Footnote 1 In 2018, she also took part in an open debate about the h-index in Scientometrics, which was triggered by an article entitled “Multiple versions of the h-index: Cautionary use for formal academic purposes” (Bar-Ilan 2018; Bornmann and Leydesdorff 2018; Costas and Franssen 2018; Teixeira da Silva and Dobránszki 2018a, b).

However, a more easily overlooked scenario is that different h-index values can be generated even within Web of Science (WoS). Jacso was an early pioneer in discussing the names and time coverage of the WoS sub-datasets when computing the h-index (Jacso 2008). With the rapid evolution of WoS in recent years, a clear description of the data source used in calculating the h-index is becoming increasingly important. This is particularly true when emerging countries such as China rely heavily on metrics to identify talent and make funding decisions (Tang and Hu 2018). In memory of Dr. Judit Bar-Ilan, we return to the discussion of this easily neglected issue in research evaluation practice by probing a similar “which h-index” question within the Web of Science database.

WoS, WoSCC and WoK

Web of Science is one of the most widely adopted data sources for bibliographic searching and research evaluation, for good or flawed science (Harzing and Alakangas 2016; Tang et al. 2020; Zhu and Liu 2020). However, the term has been used interchangeably for different sub-datasets of WoS (Calver et al. 2017). To make sure we are on the same page regarding terminology, let us first clarify some easily confused notions: Web of Science, Web of Science Core Collection (WoSCC), and Web of Knowledge (WoK), as well as their relationship to the well-known Science Citation Index Expanded (SCIE).

Currently, WoS is a platform providing access to Clarivate Analytics’ multidisciplinary bibliographic databases.Footnote 2 According to Thomson Reuters, the former owner of WoS,Footnote 3 the integrated WoS platform was previously known as WoK and was renamed WoS in 2014 (Torres-Salinas and Orduña-Malea 2014). The WoS platform contains citation indexes (including WoSCC), product databases, and the Derwent Innovations Index, as shown in Table 1.Footnote 4

Table 1 Web of Science platform

WoSCC is a core database collection under the WoS platform. It was renamed from WoS in 2014.Footnote 5 Along with the expansion and integration of the WoSCC (Liu 2019; Jacso 2018; Rousseau et al. 2018), it now consists of two chemical indexes (i.e. Current Chemical Reactions and Index Chemicus) as well as the following eight citation indexes.

  • Science Citation Index Expanded (SCIE)

  • Social Sciences Citation Index (SSCI)

  • Arts and Humanities Citation Index (A&HCI)

  • Conference Proceedings Citation Index-Science (CPCI-S)

  • Conference Proceedings Citation Index-Social Sciences and Humanities (CPCI-SSH)

  • Book Citation Index-Science (BKCI-S)

  • Book Citation Index-Social Sciences and Humanities (BKCI-SSH)

  • Emerging Sources Citation Index (ESCI)

However, the phrase WoS is still often used to denote the WoSCC in practice. This may be due to confusion between the two concepts or simply to a wish for brevity; either way, it may mislead readers.

Calculation of the h-index

Which WoS?

Different h-index values for different WoS

The first factor that influences the calculation of the h-index is the identification of all the publications belonging to an entity. Two different scenarios may arise. First, the phrase WoS may denote the WoS platform. However, institutions may subscribe to different database packages according to their own needs. When scholars search with the WoS platform’s “All Databases” setting, the results may therefore differ depending on the institution through which the platform is accessed. That is to say, different database package subscriptions under the WoS platform may generate different h-index values.

Second, the phrase WoS may also refer to WoSCC. A recent study has shown that institutions may subscribe to different sub-datasets of WoSCC, with varying years of coverage (Liu 2019). That is to say, WoSCC-based h-index values may also differ when calculated at different institutions. Unfortunately, many scholars have not specified the details of the sub-datasets when using the WoSCC as the data source (Dallas et al. 2018; Liu 2019).
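
To make this dependence on subscription concrete, the following minimal Python sketch computes the h-index (the largest h such that h publications have each received at least h citations) over the publications visible under different hypothetical subscriptions. The publication list, the Paper type, and the function names are illustrative assumptions and are not taken from WoS; citation counts are held fixed so that only the visible publication set changes.

```python
from typing import Iterable, List, NamedTuple, Set


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


class Paper(NamedTuple):
    index: str       # citation index covering the paper (hypothetical assignment)
    year: int        # publication year
    citations: int   # citation count, held fixed across subscriptions


# Hypothetical publication list for one scholar; every value is invented.
PAPERS: List[Paper] = [
    Paper("SCIE", 2004, 30),
    Paper("SCIE", 2010, 12),
    Paper("SSCI", 2012, 9),
    Paper("CPCI-S", 2013, 6),
    Paper("ESCI", 2016, 5),
    Paper("BKCI-S", 2017, 4),
    Paper("SCIE", 2018, 2),
]


def h_for_subscription(papers: List[Paper], indexes: Set[str], first_year: int) -> int:
    """h-index over the papers visible under a given (assumed) subscription."""
    visible = [
        p.citations
        for p in papers
        if p.index in indexes and p.year >= first_year
    ]
    return h_index(visible)


ALL_INDEXES = {"SCIE", "SSCI", "CPCI-S", "ESCI", "BKCI-S"}
h_full = h_for_subscription(PAPERS, ALL_INDEXES, 1900)
h_core = h_for_subscription(PAPERS, {"SCIE", "SSCI"}, 1900)
h_recent = h_for_subscription(PAPERS, {"SCIE", "SSCI"}, 2008)
print(h_full, h_core, h_recent)
```

With these invented data, the full set of indexes gives h = 5, SCIE and SSCI alone give h = 3, and SCIE and SSCI with coverage starting in 2008 give h = 2, although the scholar and the citation counts are the same in all three cases.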

Non-transparent calculation of the h-index: empirical evidence

To better understand how prevalent this ambiguity is, we manually checked a sample of SCIE/SSCI-indexed publications from 2017 to 2019 and examined how they calculated the h-index using WoS.Footnote 6 We used “Web of Science” and “h index” as keywords in a topic search and limited the citation indexes to SCIE and SSCI.Footnote 7 A total of 137 records published from 2017 to 2019 were retrieved. We further restricted the publishing language to English, which left 129 records. We ended up with 127 records with full text available for further analysis.Footnote 8

Two authors read the full texts of these 127 records and tabulated how the h-index was calculated, where documented. Our examination showed that 99 of the 127 records used data from the WoS to calculate the h-index. Table 2 summarizes their distribution by journal.

Table 2 Journals of records that did not specify the details of the WoS sub-datasets when calculating the h-index

Yet over 40% of these records (47 of the 99) did not specify which sub-datasets of the WoS were used to calculate the h-index, including specialist publications in the category of Information Science and Library Science.

Which citation count?

The second factor that influences the value of the h-index is the times-cited counts of an entity’s publications. A record’s citation counts from Google Scholar, Scopus, and WoS usually differ (Martín-Martín et al. 2018); a similar situation also exists within WoS. Although the WoS help file states “If you view Times Cited for a record from anywhere in the world, the value is always the same”,Footnote 9 different versions of a record’s citation count also exist in WoS.

According to the help file for the WoS “All Databases” search, at least four citation count related field tags are provided: TC (times cited from WoSCC), Z8 (times cited from Chinese Science Citation Database), ZB (times cited from BIOSIS Citation Index), and Z9 (times cited from all citation indexes under the WoS platform).Footnote 10

Figure 1 shows the different versions of the citation count of one Nature paper retrieved through the WoS platform. The article we chose is titled “Collective dynamics of ‘small-world’ networks”. As shown in the red brackets in Fig. 1, the citation counts for the same paper differ greatly. Even when the bibliographic data are downloaded from the WoSCC, two citation count field tags are provided (TC and Z9).Footnote 11 Therefore, the choice of citation count version also influences the calculation of the h-index. Users generally use the TC field or the Z9 field to denote the citation count; however, many of them, including the authors of this letter, have not always specified which version is used. We also checked the full texts of the abovementioned 99 records, and most of them did not mention this point.

Fig. 1
Citation counts of one Nature paper. Note: Data accessed on Dec 2, 2019 from the WoS platform
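
The effect of the citation count field on the h-index can be sketched in Python as follows. The sketch assumes a tab-delimited export whose header row uses the field tags TI, TC and Z9; the four records and their counts are invented for illustration, and the function names are hypothetical rather than part of any WoS tool.

```python
import csv
import io
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list the condition holds for a prefix only,
    # so counting the ranks that satisfy it yields h.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Hypothetical tab-delimited export: title, TC (WoSCC count), Z9 (all-database count).
SAMPLE_EXPORT = (
    "TI\tTC\tZ9\n"
    "Paper A\t10\t14\n"
    "Paper B\t6\t8\n"
    "Paper C\t3\t5\n"
    "Paper D\t3\t4\n"
    "Paper E\t2\t4\n"
)


def h_index_by_field(export_text: str, field: str) -> int:
    """Compute the h-index from one citation count column of the export."""
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t")
    return h_index(int(row[field]) for row in reader)


h_tc = h_index_by_field(SAMPLE_EXPORT, "TC")
h_z9 = h_index_by_field(SAMPLE_EXPORT, "Z9")
print(h_tc, h_z9)
```

For this invented publication set, the TC column gives h = 3 and the Z9 column gives h = 4, so the same five records yield two different h-index values depending solely on which field is read.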

Conclusion

With the rapid updating of the WoS in recent years, simply stating that WoS was used as the data source, without clarification, will bring about confusion and inconsistency in metrics. This study distinguishes the concepts of WoK, WoS and WoSCC and further shows that different WoS may generate different h-index values. We argue that the h-index, despite its deficiencies, should, if used, at least be used consistently, with details of how it is calculated. We hope to remind both bibliometricians and research evaluators of the need to pay attention to the various possible h-index values even within the WoS database.

Twelve years after Dr. Judit Bar-Ilan’s highly cited paper on “which h-index”, this article extends the discussion on “which h-index” in research evaluation, but within one widely used bibliographic database, WoS. The same phenomenon also exists for other database-dependent metrics. Given the increasing use of various bibliographic databases, the features and limitations of each database should be stated explicitly (Falagas et al. 2008; Liu 2017; Liu et al. 2018; Tang et al. 2017; Zhu et al. 2019). We write this paper partly in memory of Dr. Judit Bar-Ilan; at the same time, we suggest that researchers and evaluation practitioners pay attention to the details of data sources, especially when using the WoS (Dallas et al. 2018; Liu 2019).