Introduction

In the field of academic evaluation, the h-index (Hirsch 2005) and its variants are quantitative evaluation methods. Publication number, citation counts, and average citations per publication are commonly regarded as the three main traditional evaluation methods. In contrast, the h-index combines two dimensions, the number of papers and each paper's citation counts, so it can effectively reflect a researcher's academic influence. Moreover, its calculation is simple and easy to implement. Originally, the h-index was applied to evaluate the academic influence of an individual researcher (Hirsch 2005). Since then, it has been used to evaluate journals (Braun et al. 2005; Schubert 2007; Olden 2007), conferences (Sidiropoulos et al. 2007), institutions (Molinari and Molinari 2008; Mugnaini et al. 2008), research themes (Banks 2006), and countries (Jacsó 2009). Although the h-index can be used in these different settings, it is not certain that it is better than the traditional methods. Meanwhile, the variants of the h-index have not been widely recognized by most scientific researchers; in most cases, the h-index and its variants are popular only among bibliometricians. The differences, advantages, and disadvantages of the h-index and its variants are also not clear. Waltman (2016) recommended that future research pay more attention to the theoretical foundation of indicators and the way they are used in practice. Our work compares how different indices behave when used to evaluate single publications.

The assessment of a single publication is the cornerstone of evaluating scientists, institutions, countries, and other units of scientific research output. At present, evaluators mainly apply citation counts as a quantitative method to assess single publications. However, citation counts only roughly reflect the direct impact of a publication; they cannot effectively reflect a publication's comprehensive influence. Schubert (2009) considered both the direct and the indirect impact of citations, since each reflects one aspect of the quality of a publication. Given this, he proposed the paper-level h-index for assessing a single publication, in order to reflect both the direct and indirect impact of that publication. Afterwards, Egghe (2010) explained the relation between a single publication's h-index and its total citations, incorporating a concavely increasing power law using the Lotkaian model. Thor and Bornmann (2011) built a web application to calculate the single publication h-index, m-index, h2-lower, h2-centre, and h2-upper for publications indexed by Google Scholar. In addition, Bornmann et al. (2011b) conducted an empirical investigation of the effectiveness of the single publication h-index. Their research objects were manuscripts that had been submitted to Angewandte Chemie International Edition (AC-IE) and ultimately published in AC-IE or in other journals. Their results indicated that editorial decisions and peer review ratings are correlated with h-index values: manuscripts that received positive ratings or were accepted by the AC-IE's editors in the peer review stage show higher h-index values than those that received negative ratings or were rejected by the editors. Thus, the Bornmann study confirmed that using the h-index for assessing single publications is effective.

Since the h-index can be used to assess single publications, could other Hirsch-type indices also be applied to assessing single publications? Do Hirsch-type indices behave the same as the h-index when assessing single publications? In the current research, 26 Hirsch-type indices (including the original h-index) and 3 traditional indicators, 29 indicators in total, were chosen as research objects and used to assess single publications.

In the following section, based on the original definitions of the 29 indicators and our new treatment of citation generations, the 29 paper-level metrics are defined or redefined to fit the evaluation of single publications.

Paper-level metrics

Hirsch (2005) defined the h-index to evaluate an individual scientist's output as follows: if a scientist has h of his total P papers with at least h citations each, and the other (P − h) papers have no more than h citations each, then the scientist has an h-index of h. Learning from this idea, Schubert (2009) applied the h-index to assessing a single publication, and defined the paper-level h-index as follows:

“h, of a publication as the citation h-index of the set of papers citing it, i.e. not more than h of the papers citing it should receive not less than h citations” (p. 560).

The objects of calculation of the paper-level h-index are the citation counts of the papers citing the single publication. Schubert (2009) took into account both the direct and the indirect influence of the papers citing a publication, but he did not give detailed definitions of direct and indirect influence when calculating the single publication h-index. Considering the complexity of reference networks, we consider it necessary to treat this issue explicitly. Rousseau (1987) proposed the concepts of first generation and second generation to describe reference networks. With respect to a certain paper, first generation publications are those which cite the paper, thereby having a direct influence on it; second generation papers are those which cite a first generation paper but are not themselves first generation, and so on.

In this current paper, we refine Rousseau’s definitions. To explain the reference network of a specific paper clearly, refer to the simplified schematic directed diagram in Fig. 1, where black nodes represent papers and the arrows point from the citing papers to cited papers.

Fig. 1

A simplified directed diagram of reference networks

Here, we call the evaluated paper the 0th generation publication (0G-publication); it is node A in Fig. 1.

Papers which refer only to the 0G-publication are pure 1st generation publications (pure 1G-publications): nodes C and D. As for papers which not only refer to the 0G-publication but also refer to publications in other generations, they have both direct and indirect influence on the 0G-publication. We define these papers to be non-pure 1G-publications, as exemplified by node B in Fig. 1. These two types will collectively be called mixed 1G-publications:

$$mixed\;1{\text{G-publications}} = pure\;1{\text{G-publications}} + non{-}pure\;1{\text{G-publications}}$$
(1)

The mixed 1G-publications are exactly the “first generation” papers defined by Rousseau (1987). In Fig. 1, the mixed 1G-publications are nodes B, C, and D. Schubert’s “direct impact” papers might be equivalent to our mixed 1G-publications. The reason why we must say “might” is that Schubert did not give a clear and unambiguous concept of direct impact.

Papers which refer only to the pure 1G-publications can be defined as pure 2nd generation publications (pure 2G-publications): nodes F and G in Fig. 1.

Nodes B and E in Fig. 1 are quite special. B has both a path to A of length 1 (i.e. B–A) and a path to A of length 2 (i.e. B–C–A). E has two paths to A of length 2 (i.e. E–B–A and E–C–A) and also a path to A of length 3 (i.e. E–B–C–A). Obviously, defining the generation of nodes B and E is not straightforward. Here, we define the non-pure 2G-publications as papers which do not themselves belong to the pure 2G-publications but have at least one path of length 2 to the 0G-publication, exemplified by nodes B and E in Fig. 1.

The pure and non-pure 2G-publications are collectively defined as mixed 2G-publications, namely:

$$mixed\;2{\text{G-publications}} = pure\;2{\text{G-publications}} + non{-}pure\;2{\text{G-publications}}$$
(2)

Similarly, we can define all the generations in a reference network. Papers that reference only the pure (N − 1)th G-publications are defined as pure Nth G-publications; in other words, all paths from these papers to the 0G-publication are of length N. If a paper has at least one path of length N to the 0G-publication in a reference network, but the paper itself does not belong to the pure Nth G-publications, then we define it as a non-pure Nth G-publication. These two types will collectively be called mixed Nth G-publications:

$$mixed\;N{\text{th}}\;{\text{G-publications}} = pure\;N{\text{th}}\;{\text{G-publications}} + non{\text{-}}pure\;N{\text{th}}\;{\text{G-publications}}$$
(3)

To summarize, nodes C and D are pure 1G-publications, and nodes F and G are pure 2G-publications. B is both a non-pure 1G-publication and a non-pure 2G-publication; E is both a non-pure 2G-publication and a non-pure 3G-publication. The pure and non-pure 1G-publications both have a direct influence on the 0G-publication, while the non-pure 1G-, pure 2G-, non-pure 2G-, pure Nth G-, and non-pure Nth G-publications jointly have an indirect influence on the 0G-publication.
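These generation definitions can be computed mechanically from a citation graph. The sketch below (function names are our own, not from the paper) enumerates the lengths of all directed paths from each citing paper to the 0G-publication and classifies papers as pure or non-pure Nth generation accordingly; it assumes the citation network is acyclic, as ordinary citation data is.

```python
def path_lengths(citations, target):
    """For each citing paper, the set of lengths of all directed paths to `target`.
    `citations` maps each citing paper to the list of papers it references."""
    memo = {}
    def lengths(node):
        if node == target:
            return {0}
        if node in memo:
            return memo[node]
        out = set()
        for cited in citations.get(node, []):
            out |= {l + 1 for l in lengths(cited)}
        memo[node] = out
        return out
    return {n: lengths(n) for n in citations}

def classify(citations, target, n):
    """Split papers into (pure, non-pure) Nth-generation publications.
    Pure: every path to the target has length n. Non-pure: at least one
    path has length n, but not all. Mixed NG = pure | non_pure."""
    pl = path_lengths(citations, target)
    pure = {p for p, ls in pl.items() if ls == {n}}
    non_pure = {p for p, ls in pl.items() if n in ls and ls != {n}}
    return pure, non_pure
```

Running this on the Fig. 1 network reproduces the classification above: C and D come out pure 1G, B non-pure 1G and non-pure 2G, F and G pure 2G, and E non-pure 2G and non-pure 3G.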

Rousseau (1987) also gave a definition of second generation papers. Without our distinctions, one might suppose that Rousseau's second generation papers are exactly our mixed 2G-publications. In fact, his second generation papers are only part of our mixed 2G-publications, namely the pure 2G-publications. Schubert's indirect influence might correspond to our mixed 2G-publications. As stated above, we say "might" because Schubert did not give a complete and clear concept. In fact, Schubert's definition of indirect influence is too narrow (it excludes the influence of mixed 3G-publications and subsequent generations) to have a direct analogue in our terminology. In addition, Hu et al. (2011) expanded Rousseau's (1987) research and, based on forward and backward citation generations, proposed the concepts of Hsn and Gsn. The definition of Hsn is exactly our mixed Nth G-publications, but Gsn is not our pure Nth G-publications: their Gsn contains more publications than our pure Nth G-publications for the same 0G-publication. Considering the direct and indirect impact of scientific publications and their age, Fragkiadaki and Evangelidis (2016) proposed another indirect indicator, the fpk-index, built on Hu et al.'s (2011) generations of citations, to assess scientific publications. The formula is as follows (Fragkiadaki and Evangelidis 2016, p. 669):

$$fp^{k} = \frac{{1 + \sum\nolimits_{i = 1}^{k} {\left( {\frac{1}{i} \times {\text{gen}}_{i} } \right)} }}{{n_{p} }}$$

where gen_i stands for the weighted citations based on the generation i they belong to, and n_p stands for the scientific age of the paper.
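As a sketch of how the fpk formula might be evaluated in code (our reading of gen_i as the raw count of i-th generation citations and of n_p as the paper's age in years is an assumption for illustration):

```python
def fpk(gen_counts, n_p, k=None):
    """fp^k = (1 + sum_{i=1}^{k} gen_i / i) / n_p  (Fragkiadaki & Evangelidis 2016).

    gen_counts[i - 1] -- assumed: number of i-th generation citations
    n_p               -- assumed: the paper's scientific age in years
    """
    if k is None:
        k = len(gen_counts)
    weighted = sum(gen_counts[i - 1] / i for i in range(1, k + 1))
    return (1 + weighted) / n_p
```

For example, a 2-year-old paper with 4 first-generation and 6 second-generation citations would score (1 + 4/1 + 6/2) / 2 = 4.0.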

Having considered these related concepts in reference networks, we can now define the 29 indicators analyzed in this study for assessing single publications.

To begin with, we adapt the original definitions of the three traditional indicators used to measure the research performance of a scholar (i.e. publication number, citation counts, and average citations per publication) to obtain indicators for assessing single publications:

  1. Publication Number of Mixed 1st Generation Publications (PNM1GP): This indicator equals the number of citations of the 0G-publication. For example, in Fig. 1, the PNM1GP is 3 (i.e. publications B, C, and D), which is also the citation count of publication A. If publication A is viewed as a scholar, this indicator is scholar A's publication number.

  2. Total Citations of Mixed 1st Generation Publications (TCM1GP): This indicator equals the cumulative number of citations of the mixed 1G-publications. It can also be called the 2nd generation citations of the 0G-publication (i.e. the total count of paths of length 2 to the 0G-publication). For example, in Fig. 1, the TCM1GP is 5 (i.e. E–B–A, E–C–A, F–C–A, G–D–A, and B–C–A). If publication A is viewed as a scholar, this indicator is scholar A's total citations.

  3. Average Citation Counts of Mixed 1st Generation Publications (ACCM1GP): This indicator is the average number of citations received by the mixed 1G-publications, calculated as the TCM1GP divided by the PNM1GP. If ambiguity can be avoided, it can also be called the "average citation counts of the 0G-publication". In Fig. 1, ACCM1GP = 5/3 = 1.67. If publication A is viewed as a scholar, this indicator is scholar A's average citations per publication.
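In code, the three indicators reduce to simple aggregates over the citation counts of the mixed 1G-publications (a sketch; the input list is assumed to hold one citation count per citing paper):

```python
def traditional_indicators(citations_of_1g):
    """citations_of_1g: citation count of each mixed 1G-publication
    (e.g. for node A in Fig. 1: B -> 1, C -> 3, D -> 1)."""
    pnm1gp = len(citations_of_1g)                  # citations of the 0G-publication
    tcm1gp = sum(citations_of_1g)                  # 2nd generation citations
    accm1gp = tcm1gp / pnm1gp if pnm1gp else 0.0   # average citations
    return pnm1gp, tcm1gp, accm1gp
```

For the Fig. 1 example this returns (3, 5, 1.67), matching the values above.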

In addition to these three traditional methods and the h-index (Hirsch 2005), the current study used another 25 kinds of Hirsch-type indices to assess single publications, including the g-index (Egghe 2006a, b), Wu’s w-index (Wu 2010), h(5,2)-index (Ellison 2010), a-index (Rousseau 2006; Jin et al. 2007), e-index (Zhang 2009), f-index (Tol 2009), h(2)-index (Kosmulski 2006), hg-index (Alonso et al. 2010), j-index (Todeschini 2011), maxprod (Kosmulski 2007), m-index (Bornmann et al. 2008), normalized h-index (Sidiropoulos et al. 2007), p-index (Prathap 2010a), ph-ratio (Prathap 2010b), q2-index (Cabrerizo et al. 2010), rational h-index (Ruane and Tol 2008), real h-index (Guns and Rousseau 2009), r-index (Jin et al. 2007), rm-index (Panaretos and Malesios 2009), tapered h-index (Anderson et al. 2008), t-index (Tol 2009), weighted h-index (Egghe and Rousseau 2008), Woeginger’s w-index (Woeginger 2008), μ-index (Glänzel and Schubert 2010), and π-index (Vinkler 2009).

To illustrate how these indicators are defined using our generation terminology, four typical Hirsch-type indices were selected: i.e. h-index, g-index, Wu’s w-index, and h(5,2)-index, and their definitions are given in detail as follows:

  1. h-index: This is defined to be the highest number h such that the 0G-publication has at least h mixed 1G-publications with h or more citations from mixed 2G-publications.

  2. g-index: This is defined to be the highest number g such that the top g mixed 1G-publications together received at least g² citations from mixed 2G-publications.

  3. Wu's w-index: This is defined to be the highest number w such that the 0G-publication has at least w mixed 1G-publications with 10w or more citations from mixed 2G-publications.

  4. h(5,2)-index: This is defined to be the highest number h such that the 0G-publication has at least h mixed 1G-publications with 5h² or more citations from mixed 2G-publications.

The h(5,2)-index is a special form of the h(a,b)-index; varying a and b yields infinitely many combinations. After defining the h(a,b)-index, Ellison (2010) selected 12 combinations and conducted empirical research on assessing economists. The results demonstrated that, for assessing economists, h(5,2) and h(10,1) perform best. Note that h(10,1) is exactly Wu's w-index.
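Three of the four definitions share the same shape: the largest k such that each of the top k mixed 1G-publications receives at least f(k) second-generation citations; the g-index uses a cumulative variant. A sketch under that reading (function names are ours):

```python
def generalized_h(cites, threshold):
    """Largest k such that each of the top k papers has >= threshold(k)
    citations; threshold must be non-decreasing in k."""
    cites = sorted(cites, reverse=True)
    k = 0
    while k < len(cites) and cites[k] >= threshold(k + 1):
        k += 1
    return k

def h_index(cites):           # at least h papers with >= h citations
    return generalized_h(cites, lambda k: k)

def wu_w_index(cites):        # at least w papers with >= 10w citations
    return generalized_h(cites, lambda k: 10 * k)

def h52_index(cites):         # at least h papers with >= 5h^2 citations
    return generalized_h(cites, lambda k: 5 * k * k)

def g_index(cites):           # top g papers together have >= g^2 citations
    cites = sorted(cites, reverse=True)
    total, g = 0, 0
    for c in cites:
        total += c
        if total >= (g + 1) ** 2:
            g += 1
        else:
            break
    return g
```

For the paper-level versions, `cites` holds the mixed-2G citation counts of each mixed 1G-publication. With counts [10, 8, 5, 4, 3], for instance, the h-index is 4 while Wu's w-index and the h(5,2)-index are both 1, illustrating how the latter two focus on the most highly cited citing papers.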

By reference to these four examples, the other indices can likewise be redefined for assessing single publications from the viewpoint of mixed 1G- and 2G-publications, so further definitions are not given here. The original definitions of these indices can be found in their original publications or in Rosenberg's review (Rosenberg 2014).

Data and method

To ensure comparability between one paper and another, we selected three journals in library and information science: Scientometrics, Journal of Informetrics, and Information Processing and Management. Their main themes focus on quantitative analysis of science, quantitative aspects of information, and information searching and retrieval, respectively. Papers published in these journals were taken as three groups of research objects. In October 2015, citation data for the mixed 1G- and 2G-publications were taken from the Web of Science. To apply the same cutoff date for citation statistics, only citation records from before December 31, 2014 were considered. Basic information about the data sources is shown in Table 1.

Table 1 Basic information about data sources

As Table 1 shows, the research objects are papers published in 2005 or in 2007. In all, 105 of 125 publications in Scientometrics, 94 of 115 publications in Information Processing and Management, and all 33 publications in the Journal of Informetrics had at least one citation from the publishing year to the statistical deadline; the remaining publications had no citations before December 31, 2014. The values of the 29 indices for publications without citations are mostly zero or cannot be calculated, and including them could distort the real correlations between indicators. Therefore, when calculating an indicator's value to assess a single publication, we only take into account publications with at least one citation.

According to our definitions of the 29 indices, index values were calculated for the 232 publications. Then the correlation coefficients between indicators were calculated, using Spearman's rank correlation coefficient, to compare the similarities between one indicator and another for assessing single publications. However, comparing every possible pair of indicators would make the overall patterns difficult to discern. To reduce the number of pairs considered, we note that the h-index is the prototype of all Hirsch-type indices; since the other indices are mostly based on it, we compare the other indicators to it. In addition, we compare against Wu's w-index (Wu 2010) because, although it is also a variation of the h-index, it gives more attention to a scholar's highly cited papers (Schreiber 2010; Kosmulski 2013), top cited papers (Panaretos and Malesios 2009; Chan et al. 2016), and most excellent papers (Wildgaard et al. 2014). Ellison (2010) confirmed that Wu's w-index performs better for evaluating economists. Therefore, in the current study the h-index and Wu's w-index were chosen as benchmarks. For every journal, correlations were calculated separately between each indicator and both the h-index and Wu's w-index for assessing single publications. The last step was an analysis of the similarities and differences between the indicators and the h-index or Wu's w-index.
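Spearman's rank correlation is simply the Pearson correlation computed on ranks. A self-contained sketch (helper names are ours; ties receive average ranks):

```python
import numpy as np

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):      # average ranks over tied values
            tied = v == val
            r[tied] = r[tied].mean()
        return r
    return np.corrcoef(ranks(x), ranks(y))[0, 1]
```

Because it works on ranks, any monotone relationship between two indicators (e.g. index values for the same set of publications) yields a coefficient of 1, which is why it suits comparisons of indicator rankings better than Pearson's r.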

In order to assess the reliability of the results, a factor analysis was also done for every journal as a check. Factor analysis can be used to reduce dimensions and to create a series of principal components which are independent of each other. Following Bornmann et al.'s (2008) study, we first transformed the scale of the values of the 29 indicators in the different journals using the function y = ln(x + 1), where x stands for the original value and y stands for the transformed value. Then we performed a principal component analysis with VARIMAX rotation. If the eigenvalue of a component is larger than 1, that component is recognized as a factor.

Results

Correlations of indicators with h-index

Figures 2, 3 and 4 show the correlations of the 28 indicators with the h-index for assessing publications in the different journals: Fig. 2 shows the correlations in Scientometrics; Fig. 3 shows the correlations in the Journal of Informetrics; Fig. 4 shows the correlations in Information Processing and Management. The correlation values are higher than 0.8 for most indicators, and some reach 0.95 or more. This justifies the conclusion that most indicators are highly correlated with the h-index. Note, however, that exceptions exist: the ph-ratio and the normalized h-index show negative correlations, and in some cases these were not statistically significant.

Fig. 2

The correlations of 28 indicators with the h-index in Scientometrics. Filled circle significant at 1 %

Fig. 3

The correlations of 28 indicators with the h-index in the Journal of Informetrics. Filled circle significant at 1 %. Filled triangle not significant at 5 %

Fig. 4

The correlations of 28 indicators with the h-index in Information Processing and Management. Filled circle significant at 1 %. Filled triangle not significant at 5 %

In Figs. 2, 3 and 4 the indices are ordered by correlation value, with higher correlations on the right-hand side of each graph. Each figure shows a gradually rising trend from left to right for the other 26 indicators (excluding the ph-ratio and normalized h-index), although the rise for the Journal of Informetrics is smaller than for the other two journals. A significant drop occurs for the ph-ratio and normalized h-index. In terms of overall shape, Figs. 2, 3 and 4 show the same trend, and they all passed the statistical test, as shown in Table 2 (see the section "Correlations between different journals"). Therefore, we can make a horizontal comparison between one journal and another and obtain consistent results.

Table 2 Spearman’s rank correlation coefficients between different journals

We find that for all three journals, the values of correlations of f-index, rational h-index, real h-index, j-index, hg-index, Woeginger’s w-index, and tapered h-index with h-index are higher than 0.96, and all ranked in the top 10. Moreover, the relative rankings of these seven indices in the different journals are basically the same. The correlation levels of the g-index, π-index, TCM1GP, weighted h-index, PNM1GP, h(2)-index, q2-index, and r-index with h-index are also high, but lower than the aforementioned seven indices. The indices to the left of these 15 indices in Figs. 2, 3 and 4 have significantly lower correlations. The correlation levels of the t-index, p-index, h(5,2)-index, Wu’s w-index, and e-index with the h-index are a little low. The correlation levels of the a-index, m-index, and ACCM1GP with the h-index are quite low. In addition, we should note that the rankings of maxprod and μ-index in different journals fluctuate sharply.

Correlations of indicators with Wu’s w-index

Figures 5, 6 and 7 show the correlations of the 28 indicators with Wu's w-index for assessing publications in the different journals. As shown in Figs. 5 and 6, the correlation values with Wu's w-index in Scientometrics and the Journal of Informetrics are usually higher than 0.75, and most correlation values in Information Processing and Management are higher than 0.7. Compared to the other two journals, Information Processing and Management has more indices with lower correlations. The normalized h-index shows a slightly negative correlation with Wu's w-index, and the ph-ratio a slightly positive one, but the correlation levels of these two indices are quite low; in some cases they do not even pass the statistical test. All in all, the correlations of the other indicators with Wu's w-index are also high, but they generally do not reach the levels observed for the h-index.

Fig. 5

The correlations of 28 indicators with Wu’s w-index in Scientometrics. Filled circle significant at 1 %. Filled triangle not significant at 5 %

Fig. 6

The correlations of 28 indicators with Wu’s w-index in the Journal of Informetrics. Filled circle significant at 1 %. Filled square significant at 5 %. Filled triangle not significant at 5 %

Fig. 7

The correlations of 28 indicators with Wu’s w-index in Information Processing and Management. Filled circle significant at 1 %. Filled triangle not significant at 5 %

In Figs. 5, 6 and 7, we find that in all three journals the correlations of the r-index, maxprod, e-index, π-index, p-index, TCM1GP, and weighted h-index with Wu's w-index all ranked in the top 10, whereas the correlations of these indices with the h-index are lower, as can also be observed in Figs. 2, 3 and 4. However, the rankings of the a-index, h(5,2)-index, and q2-index fluctuate sharply across journals, even though they are all highly correlated with Wu's w-index. For example, the h(5,2)-index ranked 1st in Scientometrics, but 10th and 11th in the Journal of Informetrics and Information Processing and Management, respectively. The f-index, rational h-index, real h-index, j-index, hg-index, Woeginger's w-index, and tapered h-index are more correlated with the h-index than with Wu's w-index. In addition, the correlations of ACCM1GP, μ-index, and PNM1GP with Wu's w-index are relatively low.

Correlations between different journals

If we merely compare the similarities and differences of indicators within one journal, the significance might be limited. To determine the usefulness of the 29 indicators for assessing single publications, we should make not only indicator-to-indicator comparisons but also comparisons between one journal and another. However, can one journal legitimately be compared to another? For each journal, we can use the different indicators to assess that group of publications, and different indicators will give different rankings of the publications in the journal. We can then calculate the correlations of the different indicators with the h-index and Wu's w-index for assessing publications in every journal. Eventually, we obtain a relative ranking of indicators according to the magnitude of their correlations with the h-index or Wu's w-index in each journal, and can thereby determine whether the relative rankings of indicators in different journals remain the same.

First, a Spearman's rank correlation test was done to verify that the relative rankings of indicators were basically consistent. The correlations between every pair of the three journals were at least 0.878, as shown in Table 2. The coefficients are all significant at the 0.01 level, showing that we can make journal-to-journal comparisons and draw consistent conclusions from these three journals.

Results of factor analysis

Details of the results of the factor analysis for the different journals are in Tables 3, 4 and 5. Indicators are ranked by their coefficients on factor 1. According to the coefficients of the indicators on the factors, in Tables 3 and 5 we call factor 1 "direct impact of the paper" and factor 2 "average impact of the paper", while in Table 4 we refer to factor 1 as "average impact of the paper" and factor 2 as "direct impact of the paper". As Tables 3 and 5 show, of all 29 indicators, the PNM1GP (i.e. the number of citations of the 0G-publication) loads highest on factor 1, with coefficients of 0.936 and 0.949, and the ACCM1GP (i.e. the average citation counts of the 0G-publication) loads highest on factor 2, with coefficients of 0.961 and 0.983. In Table 4, the coefficients of PNM1GP on factor 2 and of ACCM1GP on factor 1 are also quite high.

Table 3 The results of factor analysis of Scientometrics
Table 4 The results of factor analysis of Journal of Informetrics
Table 5 The results of factor analysis of Information Processing and Management

According to the factor analysis, the ph-ratio and normalized h-index are obviously always the last two indicators, far from the others. For the other 27 indicators within the same journal, the pattern is quite similar to the results of the Spearman's rank correlation coefficients. Based on the coefficients of the indicators (see the bold face in Tables 3, 4, 5), the important indicators which constitute the factor "direct impact of the paper" include the PNM1GP, h-index, rational h-index, Woeginger's w-index, μ-index, f-index, j-index, hg-index, real h-index, g-index, and tapered h-index. The main indicators which form the factor "average impact of the paper" are the ACCM1GP, m-index, a-index, p-index, e-index, maxprod, q2-index, r-index, Wu's w-index, h(5,2)-index, TCM1GP, and weighted h-index. Therefore, the h-index and Wu's w-index have been identified as significantly different types of indices. In addition, the results of the Spearman's rank correlation coefficients have been shown to be reliable: some indices (i.e. the f-index, rational h-index, real h-index, j-index, hg-index, Woeginger's w-index, and tapered h-index) are highly correlated with the h-index but not close to Wu's w-index, while other indices (i.e. the a-index, h(5,2)-index, q2-index, r-index, maxprod, e-index, p-index, and weighted h-index) have relatively low correlations with the h-index but are near Wu's w-index. On the whole, the results of these two methods (factor analysis and Spearman's rank correlation coefficients) verify each other. However, it is hard to discriminate the relative ranks of indicators using factor analysis alone, so combining its results with the Spearman's rank correlation results provides a useful contrast.

Discussion and conclusions

Based on results above, this study finds that when compared with the h-index or Wu’s w-index, these indicators show various characteristics for assessing single publications.

First, except for the normalized h-index and ph-ratio, the other 22 Hirsch-type indices are clearly correlated with the h-index and Wu's w-index for assessing single publications. Most indices' correlation values with the h-index and Wu's w-index are larger than 0.8. Therefore, these Hirsch-type indices probably do not substantially improve on the h-index. However, given the high correlations and their different characteristics, some of these indices might act as acceptable substitutes for the h-index.

Second, no matter which journal is considered, the f-index, rational h-index, real h-index, j-index, hg-index, Woeginger's w-index, and tapered h-index are highly correlated with the h-index but not with Wu's w-index. The relative rankings of these seven indices remain generally the same across the different journals. We find that these indices do not focus on top-cited papers as Wu's w-index does. Since these indicators' correlation coefficients with the h-index are high, there is redundancy in the information they provide (Navon 2009; Bornmann et al. 2011a). Therefore, in practice, the h-index is preferable because of its conciseness, and because the extra complication of the others produces very little extra information.

Third, the r-index, maxprod, e-index, p-index, and weighted h-index are also correlated with the h-index, but the degrees of correlation are lower than for the seven indices mentioned above. Although these five indices originate from the h-index, they might make significant improvements on the h-index. They have their own characteristics, and future researchers may notice their advantages and disadvantages, and summarize their effective application scopes. Furthermore, the a-index, h(5,2)-index, and q2-index are highly correlated with Wu’s w-index and moderately with the h-index, but the rankings fluctuate sharply in different journals. In our opinion, being similar to Wu’s w-index, these three indices are significantly different from the h-index. The reason why their rankings fluctuate is that the different indices pay attention to top-cited papers in different ways.

Fourth, the normalized h-index and ph-ratio are obviously different from the other indices. Their correlation values are very low, or negative. In most cases, the correlation coefficient is statistically non-significant (p > .05) or significantly negative (p < .01). To explore why the normalized h-index and ph-ratio show this special feature, we should examine their definitions and formulae when assessing single publications:

  1. Normalized h-index: \(h_{n} = h/P\)

  2. ph-ratio = \(p/ h\), where \(p = \sqrt[3]{{\frac{{N_{P}^{2} }}{P}}}\)

N_P stands for TCM1GP; P stands for PNM1GP.
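Under these definitions, both indices are simple ratios (a sketch; the argument names are ours):

```python
def normalized_h(h, pnm1gp):
    """hn = h / P: share of the citing papers that lie in the h-core."""
    return h / pnm1gp

def ph_ratio(tcm1gp, pnm1gp, h):
    """ph-ratio = p / h, with p = (N_P^2 / P)^(1/3) (Prathap 2010b)."""
    p_index = (tcm1gp ** 2 / pnm1gp) ** (1 / 3)
    return p_index / h
```

Note that both divide by a size-dependent quantity (P or h), which is why they behave as proportional indices rather than as cumulative ones.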

According to these definitions and formulae, the normalized h-index and ph-ratio can be classified as proportional indices. The normalized h-index equals the number of publications in the h-core divided by the PNM1GP, and thus reflects the percentage of highly cited publications. Although the ph-ratio is derived from the p-index, which is highly correlated with the h-index, it either fails the significance test or yields a negative coefficient, because its denominator is the h-index itself. Referring to the original article introducing the ph-ratio, we find that it mainly “reflects the sensitivity to the citation numbers in the tall core and the long tail that h by itself fails to capture” (Prathap 2010b, p. 155). These two indices reflect the proportion of highly cited articles to rarely cited ones, while the h-index only registers the number of highly cited papers. Therefore, we believe the normalized h-index and ph-ratio must be distinguished from the other 24 Hirsch-type indices.
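As a concrete sketch, the two proportional indices can be computed directly from the formulae above. The input values (h = 5, N_P = 300, P = 40) are hypothetical, chosen only to illustrate the calculation:

```python
def normalized_h_index(h: int, p_count: int) -> float:
    """Normalized h-index: the paper-level h divided by the number of
    first-generation citing publications (PNM1GP)."""
    return h / p_count


def ph_ratio(h: int, total_citations: int, p_count: int) -> float:
    """ph-ratio: p / h, where p is the cube root of N_P^2 / P.
    N_P is the TCM1GP (total citations); P is the PNM1GP."""
    p = (total_citations ** 2 / p_count) ** (1 / 3)
    return p / h


# Hypothetical publication: h = 5, 40 citing papers, 300 total citations.
print(round(normalized_h_index(5, 40), 3))      # 0.125
print(round(ph_ratio(5, 300, 40), 3))
```

Note that both outputs are ratios rather than counts, which is why their rankings diverge so strongly from those of the count-based Hirsch-type indices.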

Fifth, as for the three traditional indicators, TCM1GP shows a strong correlation with both the h-index and Wu’s w-index. The PNM1GP is moderately correlated with both, and more strongly with the h-index than with Wu’s w-index. The correlations of ACCM1GP with the h-index and Wu’s w-index are both low. Overall, the traditional indicators behave differently when assessing single publications: TCM1GP is the closest to the h-index and Wu’s w-index, while the PNM1GP and ACCM1GP are considerably farther from both.

Much research has reliably established that the h-index is an effective method of evaluating academic impact, and it has been widely applied in many areas of academic evaluation (Mingers 2009; Ellison 2010; Lacasse et al. 2011; Turaga and Gamblin 2012; Watkins and Chan-Park 2015). Consequently, in the authors’ opinion, a Hirsch-type index is most advantageous when its correlation with the h-index is neither excessively high nor excessively low. If the correlation with the h-index is too high, the index offers little improvement over the h-index, so it should be abandoned or perhaps integrated into the h-index in the future. If the correlation is too low, then even if the index contains additional information, we cannot be sure of its usefulness, because the extensive studies based on the h-index tell us nothing about whether that additional information helps evaluate academic impact.

To illustrate this idea, consider a hypothetical new index, the reverse h-index, whose ranking is exactly the opposite of the h-index’s; its correlation with the h-index would be −1. Though the reverse h-index duplicates none of the h-index’s information, it has no practical significance. Therefore, when an index is neither too far from nor too close to the h-index, it probably retains most of the advantages of the h-index while remaining somewhat independent of it. Such an index might provide additional information not captured by the h-index, and therefore has the potential to be a major improvement on it. Indices of this kind are the most promising complements to, or replacements for, the h-index.
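The reverse h-index thought experiment can be checked numerically: ranking five publications in exactly the opposite order yields a correlation of −1. The five-item ranking below is purely hypothetical:

```python
def pearson(x, y):
    """Pearson correlation coefficient; applied to ranks it equals
    the Spearman rank correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


h_ranks = [1, 2, 3, 4, 5]        # hypothetical ranking by h-index
reverse_ranks = h_ranks[::-1]    # the "reverse h-index" ranking

print(pearson(h_ranks, reverse_ranks))  # approximately -1
```

The coefficient is maximally negative, yet the ranking carries no information beyond what the h-index already provides, which is precisely the point of the example.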

What is the standard for “being neither too near to nor too far from the h-index”? We cannot give an explicit, specific benchmark. In this study, as far as the three journals are concerned, consider the following: (1) Using correlation coefficients of 0.65–0.95 as the standard, 11 indices (r-index, maxprod, e-index, p-index, weighted h-index, a-index, h(5,2)-index, q2-index, Wu’s w-index, μ-index, and t-index) qualify. (2) When 0.5–0.99 is taken as the criterion, 20 indices (r-index, maxprod, e-index, p-index, weighted h-index, a-index, h(5,2)-index, q2-index, Wu’s w-index, μ-index, t-index, π-index, PNM1GP, TCM1GP, ACCM1GP, m-index, h(2)-index, g-index, rm-index, and tapered h-index) qualify. (3) When the benchmark is a statistically significant positive correlation (p < .01), all indices except the normalized h-index and ph-ratio meet the standard, which makes this third option not very helpful. A possible extension of this paper is to determine which benchmarks best identify useful new indices.
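Applying such a benchmark is mechanical once the correlation coefficients are in hand. The sketch below uses made-up correlation values, not the ones measured in this study:

```python
# Hypothetical correlation values with the h-index (illustrative only).
correlations = {
    "r-index": 0.88,
    "e-index": 0.90,
    "Wu's w-index": 0.70,
    "tapered h-index": 0.97,
    "normalized h-index": -0.20,
}


def within_benchmark(corrs, low, high):
    """Keep indices whose correlation with the h-index lies in [low, high]."""
    return sorted(name for name, r in corrs.items() if low <= r <= high)


print(within_benchmark(correlations, 0.65, 0.95))
```

With these made-up values, criterion (1) retains the r-index, e-index, and Wu’s w-index, while the tapered h-index (too close to the h-index) and the normalized h-index (negatively correlated) are excluded.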