Introduction

Indicators are essential for research evaluation and constitute the core of applied bibliometrics (Vinkler 2010). Numerous indicators have been proposed in recent years [for a comparative overview see Yan et al. (2016) and Bornmann et al. (2011)]. Many bibliometricians have drawn attention to the difficulty of capturing a complete oeuvre in a single indicator and have suggested additional elements (Costas and Bordons 2007; Bornmann and Daniel 2009; Zhang 2009). Despite the conceptual weakness of any single indicator, the h-index (Hirsch 2005) has been widely accepted as a simple indicator of a researcher’s influence. However, this indicator too has been criticized for a number of drawbacks and inconsistencies (Costas and Bordons 2007; Wendl 2007; Waltman and Van Eck 2012; Bouyssou and Marchant 2011). In particular, its unsuitability for benchmarking researchers from different disciplines (Batista et al. 2006) underlines the need for contextualization (Wendl 2007). Another important disadvantage is that the h-index does not signal increasing numbers of citations of the most influential (highly-cited) papers (Vinkler 2010, p. 864). Consequently, a number of variants such as the Kosmulski-index h(2) have been proposed (Kosmulski 2006). More recently, a ‘fame’ index f2 has been developed, based on a categorization of academic articles, the gh-rating or ghent-rating (Fassin 2018).

The paper is structured as follows. A succinct literature study on recent research on normalization of citation distributions and on multiple authorship is followed by the presentation, development and refinement of the gh-rating, complemented with a practical application to different scientific disciplines. The principal drawbacks of the h-index are illustrated with practical examples, with special attention to the inability of the h-index to acknowledge highly cited papers and the difficulty of comparison between scientific fields.

In the next section, a proposal for normalization is formulated with the hf-ratio and the HF-rating. Their advantages are demonstrated with an application of inter-field comparison. In addition, a method for normalization of multiple authorship is developed, leading to the adapted fractional AHF-rating. Finally, the independence-index is introduced to analyze how independent the scholar is from co-authors for their most cited articles.

Succinct literature study on normalization

Radicchi et al. (2008) pointed to the universality of citation distributions. However, citation distributions substantially differ in skewness (Seglen 1992). Citation patterns vary over fields; those of Social Sciences (SSCI) journals differ from those of the Science Citation Index (SCI) journals (Leydesdorff et al. 2019).

A constant objective of the bibliometric field has been the quest for normalization, in order to allow interdisciplinary comparison. The h-index and its alternatives have not succeeded in this endeavor. Bornmann and Marx (2014) and Leydesdorff et al. (2016) have advocated counting highly-cited papers in the corresponding field of publication instead of papers in the h-core.

Leading bibliometrics researchers have pleaded for the appropriate use of normalized citations rather than bare citation counts in order to compare “like with like” (Bornmann and Leydesdorff 2018). The Leiden manifesto has relayed this request (Hicks et al. 2015). Therefore, h-index values are only comparable after proper field-normalization (Bornmann and Leydesdorff 2018).

Recent research articles in bibliometrics (Leydesdorff et al. 2011; Bornmann 2013) have put forward percentiles and percentile rank classes as schemes for normalizing citation counts. Based on a percentile approach, quartiles and deciles have been defined. Percentile thresholds have been introduced; in particular, the top 10% has been recommended as a proxy for excellence (Waltman and Schreiber 2013), while the top 1% Highly-cited papers over the last 10 years, as defined by the Web of Science, constitutes an even more severe criterion (Kosmulski 2018).

Most attempts at normalization work on a single criterion. Leydesdorff and Bornmann (2011a, b) have proposed a percentile approach, an I3 (Integrated Impact Indicator) scheme with 6 classes of ‘standard’ percentiles with linear weighted factors. Fassin (2018) has introduced variable h-type percentiles in his gh-rating framework, superimposed on the thresholds of the standard percentiles, with weighted factors in a geometric sequence. Those h-type percentiles, situated at the top of the citation distribution, are determined on the basis of the g, h, h′, h2 and h3-percentiles, based on their respective h-type index and corresponding h-type cores. The g-index of a set of articles is defined as the highest rank g such that these g articles together received at least g² citations (Egghe 2006). The h(2)-index or Kosmulski-index is equal to h2 if r = h2 is the highest rank such that the first h2 articles each received at least (h2)² citations. By analogy, and applying the generalization of the h(k)-index (k = 1, 2, 3, …) already proposed by Kosmulski (2006) and by other colleagues, the h3-index is equal to h3 if the author has h3 articles with at least (h3)³ citations;Footnote 1 for example, an h3-index of 5 means that this author has 5 articles with at least 5³ = 125 citations (Fassin and Rousseau 2019).
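These definitions can be made concrete in a short sketch. The following Python functions (an illustration added here, not part of the cited methodology) compute the h-index, the generalized h(k)-index and the g-index from a list of citation counts:

```python
def h_index(citations):
    """Largest h such that the top h papers each have at least h citations."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def hk_index(citations, k):
    """Generalized h(k)-index (Kosmulski): largest h such that the top h
    papers each have at least h**k citations; k=1 gives the ordinary h-index."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank ** k)

def g_index(citations):
    """Highest rank g such that the top g papers together received
    at least g**2 citations (Egghe 2006)."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank ** 2:
            g = rank
    return g

# An h3-index of 5 means 5 papers with at least 5**3 = 125 citations:
print(hk_index([200, 150, 130, 126, 125, 10], k=3))  # → 5
```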

More recently, Leydesdorff et al. (2019) introduced a variant with log-linearity within the general I3-scheme, I3*. The indexes I3 and I3* of the percentile approach by Bornmann and Mutz (2011) were originally designed for the journal indicator I3, where all publications of that journal’s dataset are taken into account.

A second aspect of normalization encompasses issues related to multiple authorship. Collaborative interdisciplinary research has developed in recent decades, encouraged by international cooperation, especially in life sciences such as medicine and plant sciences and in various subfields of physics (Cronin 2001). The number of co-authors has also increased over time, as teams play an increasing role in the production of scientific knowledge (Wuchty et al. 2007; Fang 2018). However, the degree of multiple authorship varies considerably between disciplines; it is much higher in physics and in medicine (and life sciences) than in management (and social sciences) or bibliometrics (Cronin 2001). Some disciplines have a tradition of large teams that publish some ten papers a year signed by all the authors of the team, while other disciplines focus on a small number of individual articles, written alone or with one or two colleagues. The increasing presence of multiple authors has become a classical problem for bibliometric indicators (Henriksen 2016; Fang 2018). Especially the h-index has been criticized for not taking multiple authorship into consideration (Schreiber 2009). Although Lee and Bozeman (2005) suggest that collaboration is positively correlated with research productivity, researchers may benefit disproportionately from working in larger research groups (Sahoo 2016; Berker 2018).

In order to address the issue of multiple authorship with larger numbers of publications, several methods for fractional counting have been developed over the years: complete-normalized fractional counting, where the number of citations is divided by the number of authors (Lindsey 1980, cited in Berker 2018; Van Hooydonk 1997; Leydesdorff and Bornmann 2011a, b); uneven counting methods such as geometric counting (Berker 2018) or the harmonic method (Hagen 2010), with a weighted factor that declines with the author’s rank; and, more recently, modified fractional counting (Sivertsen et al. 2019). Most of these counting methods are universally applicable to the calculation of most bibliometric indicators such as the h-index (Berker 2018; Schreiber 2009). All these methods lead to different fractions, and all present advantages and inconveniences (Berker 2018). Some methods give complete credit to the first author and no credit to the co-authors; other methods give proportional weights for different roles in the genesis of the paper; in some cases the first author gets as much credit as a single author, in others only a portion of it, and in still others corresponding and first authors receive a higher weighted factor than the other authors. These different methods of paper credit assignment pose the ethical dilemma of collaborative research and the necessity of fair comparison in research assessment (Fang 2018; Xu et al. 2016).
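As an illustration of the counting schemes listed above, the following sketch computes per-author credit shares. The formulas follow common formulations in the literature and should be checked against each cited source before use:

```python
def complete_normalized(n_authors):
    """Complete-normalized fractional counting: every co-author gets 1/N."""
    return [1 / n_authors] * n_authors

def harmonic(n_authors):
    """Harmonic counting (Hagen 2010): the r-th author gets a share
    proportional to 1/r, normalized so the shares sum to 1."""
    raw = [1 / r for r in range(1, n_authors + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def geometric(n_authors):
    """Geometric counting: the r-th of N authors gets 2**(N-r) / (2**N - 1)."""
    return [2 ** (n_authors - r) / (2 ** n_authors - 1)
            for r in range(1, n_authors + 1)]
```

For two authors, both the harmonic and the geometric scheme assign shares of 2/3 and 1/3, while complete-normalized counting assigns 1/2 each; the schemes diverge as the author list grows.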

The ghent-rating and refinements

In this section, the new rating system for academic publications, the gh-rating or ghent-rating recently proposed by Fassin (2018), is developed and refined. The ghent-rating is an innovative scheme based on a categorization of publications into tiers within similar citation ranges. It uses a large dataset, such as a field, as a framework in which to position the publications of an individual researcher and to calculate the researcher’s f2-index (here determined only on the basis of the h2-core of their publications).

The categorization of articles is accomplished as a function of their position in the citation distribution rank, compared to successive thresholds of standard percentiles and h-type percentiles. These ghent-ratings are comparable to financial ratings such as Moody’s and S&P ratings, with categories designated by the symbols AAA, AA, A, BA, BBB, BB, B, CCC, CC, C, D, E, etc.

The categorization makes use of a variable percentile approach based on recently developed h-type indexes (Hirsch 2005; Egghe 2006). The gh-rating focuses “on the range of publications with the highest citation impact – that is the range which is usually of most interest in the evaluation of scientific performance” (Bornmann 2013: 587). The levels set to categorize articles into the different categories are defined by a mix of standard levels for the higher percentile ranges (articles with fewer citations) and h-type percentiles for the lower percentile ranges, i.e., articles with higher numbers of citations. In practice, Fassin (2018) opts for a model with three superposed methods to define the thresholds. The basic division rests upon the standard percentiles, with two different methods at both ends of the distribution ranking: h-percentiles at the top end and fixed thresholds at the lower end, expressed through a minimum number of citations (0, 1 or 2).

The principle behind these ratings is an exponential increase of impact as a function of the higher grades of the highest-cited papers. The categories are divided into grades A, B, C, D, E and Z in declining order of citations, each with a corresponding weighted factor defined by a geometric sequence (Fassin 2018). The B, C and D categories are delineated by the 10%, 25% and 50% percentiles. The A-category is defined by the h-percentile, the percentage of articles within the h-core of the dataset. The Z-category groups the articles that have not received any citation yet, and the E-category groups the articles with 1 or 2 citations and those that have not reached the 50% percentile. Figure 1 presents the gh-rating categories on a synchronous citation distribution curve.
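A minimal sketch of this general-grade assignment, assuming the thresholds are supplied as the minimum citation counts needed to enter the h-core and the 10%, 25% and 50% percentile bands of the reference dataset (the threshold values in the example are hypothetical):

```python
def gh_grade(citations, h_core_min, top10_min, top25_min, top50_min):
    """Assign the general gh-rating grade of one paper, given the citation
    thresholds of the reference dataset (field, journal, ...)."""
    if citations == 0:
        return "Z"              # uncited papers
    if citations >= h_core_min:
        return "A"              # within the h-core of the dataset
    if citations >= top10_min:
        return "B"              # top 10%
    if citations >= top25_min:
        return "C"              # top 25%
    if citations >= top50_min:
        return "D"              # top 50%
    return "E"                  # 1-2 citations or below the 50% percentile

# Hypothetical field thresholds: h-core at 150, top 10% at 40, 25% at 15, 50% at 5:
print([gh_grade(c, 150, 40, 15, 5) for c in (200, 50, 20, 7, 2, 0)])
# → ['A', 'B', 'C', 'D', 'E', 'Z']
```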

Fig. 1

The citation distribution curve and the gh-rating categories (Fassin 2018)

The lower categories group a larger share of the articles of all authors in the database selection (25% for D and 15% for C); the higher categories comprise around 9% (for B) or 1% (for A) of all articles of the dataset. These general categories are further subdivided into subcategories starting with the letter of the general category, for example CCC, CCD, CC and C within the general category C for the 12.5%, 15%, 20% and 25% percentiles respectively.

Further subdivisions are calculated at the top of the distribution on the basis of h3, h2, h′, h and g-percentiles, based on their respective indexes. They define the respective AXX, AAA, AA, A and BA categories. The BA-category corresponds to the g-percentile (Fassin 2018).

The weighted factors are defined by a geometric sequence: 4, 2, 1, ½ and ¼, with the ‘normal’ weight of 1 assigned to the 10% percentile band (i.e. B). Sub-categories receive an intermediate weighted factor, as defined in Table 1. For the publications that fall within the top categories, within the h-core and g-core, a bonus system is constructed: a bonus of 0.25 for the g-core, 0.50 for the h-core, 1 for the h′-core, 2 for the h2-core or 3 for the h3-core is added to the starting weighted factor of 1 for the 10% B category or 2 for the higher cores of the 1% BBB category.Footnote 2 An additional bonus of 1 is added for the 0.1% highly-cited articles (for datasets of over 500 units). The bonus system for the h-percentiles helps mitigate the differences between h-indexes across databases: while the h-indexes of Scopus and the Web of Science may differ by 20%, indexes based on Google Scholar may reach double the h-indexes of the Web of Science (Teixeira da Silva and Dobrànszki 2018). In this bonus system, articles within the h-core are still differentiated according to whether they also belong to the 1% percentile category BBB or only to the 10% percentile category B.Footnote 3
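The bonus arithmetic described above can be sketched as follows; the starting weighted factors and bonuses are those stated in the text, while the function itself is only an illustration:

```python
# Bonus per deepest h-type core a paper reaches, as stated in the text.
CORE_BONUS = {"g": 0.25, "h": 0.5, "h'": 1, "h2": 2, "h3": 3}

def final_weighted_factor(start_wf, deepest_core=None, top_01_percent=False):
    """Final weighted factor of one paper: its starting factor (1 for the
    10% B band, 2 for the 1% BBB band), plus the bonus of the deepest core
    it reaches, plus 1 if it is among the 0.1% highly-cited papers
    (for datasets of over 500 units)."""
    wf = start_wf
    if deepest_core is not None:
        wf += CORE_BONUS[deepest_core]
    if top_01_percent:
        wf += 1
    return wf

# A paper in the 1% BBB band, the h3-core and the 0.1% band reaches the
# maximum final weighted factor of 2 + 3 + 1 = 6:
print(final_weighted_factor(2, "h3", top_01_percent=True))  # → 6
```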

Table 1 The categories and weighted factor of the gh-rating

The gh-rating applied to the field of bibliometrics

Table 1 presents the standard and h-type percentiles with the corresponding weighted factor, bonus, final weighted factor and grade. The last line illustrates the application of the methodology to determine the thresholds for the field of bibliometrics. The sub-categorization in the gh-rating, presented in the second half of the table, allows a better differentiation when comparing authors with less cited publications. These sub-classifications also allow a smoother transition: two papers at the 9% and 11% percentiles obtain a 10 or a 2 in the I3* scheme, while the gh-rating leads to B or CCC with weighted factors of 1 or 0.75.

Comparison of the weighted factors of the percentile ranks

The three percentile-based schemes I3, ghent-rating and I3* make use of respectively a linear, geometric or log-linear sequence for their weighted factors for the percentile ranks. The three schemes are compared in Table 2.

Table 2 Weighted factors of the percentile ranks (adapted from Leydesdorff et al. 2019, Table 1)

Where the log-linear scheme increases the spread between the 1% and 100% levels to 100, compared to 6 for the original I3-scheme, the gh-rating mediates between both schemes with a spread of 16, and of 50 for the absolute top (h3-core). The gh-rating offers differentiation within the top segment of highly-cited papers, namely a factor 3. The I3 scheme favours quantity: 3 papers in the 50%-percentile equal 1 paper in the 1%-percentile. In the I3* scheme, 10 papers in the lowest half of the citation distribution equal 1 paper in the 10%-percentile, and 10 papers in the 10%-percentile equal one paper in the 1%-percentile, compared to 2 in the gh-rating, where one paper in the top 1% is worth 8 papers in the second half of the distribution. The limitation of the papers taken into account for the f2-index to the h2-core puts the emphasis on the best-cited papers.

The approach of normalizing different points in the distribution curves, namely the thresholds of the standard and h-core percentiles, takes into consideration the variations in skewness of the citation distribution curves of different fields, journals or other datasets. The variable h-type percentiles in particular add an additional dimension of variation in skewness in the upper part of the distribution with the highest citations. h-type percentiles focus on the most important series of articles with the highest impact and introduce additional differentiation.

Drawbacks of the h-index and highly-cited papers

A major criticism of the h-index is its low differentiation and, more precisely, the fact that highly cited papers are not appropriately acknowledged. Bornmann et al. (2011) assert that the h-index is mainly determined by the productivity dimension (number of articles) and not by the impact dimension (citations). As an example, I compare the totally different citation distributions of three authors (first part of Table 3). All three have 9 papers and an equal h-index of 7. However, they present large differences: author X has 7 papers with 25 to 7 citations; author Y has 7 papers with more than 100 citations, with 1000 citations for the highest cited paper, just as author Z, who has a second paper with 75 citations and 7 other papers similar to author X. The total number of citations of author X amounts to 100, compared to 2506 for author Y and 1125 for author Z.
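The point can be verified with hypothetical per-paper citation counts chosen to match the totals described above (the exact values of Table 3 are not reproduced here): the three distributions differ enormously, yet all yield h = 7.

```python
def h_index(citations):
    """Largest h such that the top h papers each have at least h citations."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

# Hypothetical distributions consistent with the stated totals and h-indexes:
author_x = [25, 20, 15, 12, 10, 8, 7, 2, 1]            # 100 citations in total
author_y = [1000, 500, 300, 250, 200, 150, 103, 2, 1]  # 2506 citations in total
author_z = [1000, 75, 12, 10, 9, 8, 7, 2, 2]           # 1125 citations in total

print([h_index(a) for a in (author_x, author_y, author_z)])  # → [7, 7, 7]
```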

Table 3 Citation distributions of three different authors, with corresponding gh-rating and weighted factors (wf)

The gh-rating is now applied to the three authors of my example (second part of Table 3), based on the citation distribution thresholds of the library and information sciences (4th line in Table 2): author X has 4 C papers (grouped in general categories), 4 D papers and 1 E paper; the first paper of authors Y and Z obtains an AXX grade; author Y has 5 other A papers; the second paper of author Z is a B-paper. The corresponding weighted factors of the papers of the three authors are presented in the third part of Table 3.

Bibliometric differences between fields

The second major criticism of the h-index is the impossibility of comparison between different scientific fields. Table 4 presents selected bibliometric data of some of the most-cited scholars in their disciplines:Footnote 4 Witten in physics, Montagnier in medicine (Aids research), Shane in management (entrepreneurship research) and Bornmann in bibliometrics. For each author the table presents the number of papers, the total citations of their papers, the number of citations of their most cited paper, the average number of citations and the h-index. The right part of the table assigns the corresponding discipline, with the number of publications, the most-cited article of the field and the h-index of the field.

Table 4 Profile of highly-cited authors in 4 disciplines

The highly-cited authors in the 4 disciplines have different bibliometric profiles. Witten’s h-index of 141 is nearly double Montagnier’s h-index, 3 times the h-index of the best entrepreneurship scholars and 3.5 times the h-index of information scientist Bornmann. The total numbers of citations also present huge differences: Witten has 14 times more total citations than Bornmann, 3 times as many as Montagnier and about 9 times more than Shane.

At a lower level of total citations, younger researchers such as Bonnet in infectious disease research and Zellweger in entrepreneurship are in a comparable range of around 1000 citations, but Zellweger has an h-index of 11 while Bonnet’s reaches 22.

Moreover, the fields themselves differ widely: the database selections of physics and medicine group more than 500,000 articles each, about 10 times more than entrepreneurship research and 40 times more than bibliometrics. The number of citations of the most cited article in the field exceeds 30,000 in medicine and 25,000 in physics, against 4000 in entrepreneurship research and 1200 in bibliometrics. The h-indexes of the fields of physics and medicine are double that of the entrepreneurship field and five times that of bibliometrics.

The categorization of the gh-rating can be applied to each specific field or subfield. Table 5 presents a continuum of thresholds based on percentiles and the corresponding rating categories for four disciplines: physics, medicine, entrepreneurship and bibliometrics. The table gives the number of articles (n) and the h, h2, h3 and g-indexes for the datasets of each of the 4 disciplines, selected as ‘topic’ in the Web of Science search. The disciplines have been chosen on the basis of diversity in dataset size. The following columns in Table 5 show, for each field, the citation count of the highest cited article, the citation thresholds required for the 0.1%, 1%, 5%, 10% and 25% percentiles, and the thresholds for the g, h, h2 and h3-cores.

Table 5 Distribution data of different disciplines*

Classifying the publications of the selected authors in Table 6 according to the citation distribution of the most demanding field in terms of h-core, physics, places 37 articles of Witten in the field’s h-core, of which 1 in the h2-core, and 28 above the 0.1% threshold. Montagnier would have two articles in the h-core, of which 1 in the h3-core, while Shane would have two articles in the h-core and Bornmann none. Hirsch would have one article in the h-core in physics, and one for his famous bibliometrics article. Above the 1% threshold, Witten obtains 84 articles, compared to 10 for Hirsch, 9 for Montagnier, 8 for Shane and only 3 for Bornmann.

Table 6 Categorization of publications of highly-cited researchers following the distribution of physics or following the specific distribution of their field

A completely different result is found when categorizing the articles according to the categorization of their corresponding field, in the right part of Table 6: Montagnier according to the medicine citation distribution, Shane following the entrepreneurship citation distribution and Bornmann within the smaller bibliometrics sample. Bornmann now reaches 9 h-core articles, comparable to Shane’s and Montagnier’s 10, all in their respective fields. Similar comparisons can be made at the B- and C-level for younger researchers with lower numbers of citations, and for mid-career researchers.

Normalization: the hf-ratio and the HF-rating

The quest for normalization to allow interdisciplinary comparison has been a constant objective of the bibliometric field. The approach of categorization set out in Table 6, in line with the methodology based on the gh-rating, allows comparison of different researchers’ contributions. The comparison focuses on publications in the highest categories, and thus on the publications in the researcher’s h-core or h2-core. In a simplified version for comparative analysis, one could select the researcher’s i best articles and sum their corresponding weighted factors.Footnote 5

If \(f_{n}\) is the contribution of the nth paper, determined by the weighted factor of this publication (including the bonus), then \(f_{i}\) is the sum of the weighted factors of the i best cited publications: \(f_{i} = \sum f_{n}.\)

The top-four \(f_{i}\)-index, \(f_{4}\), is thus

$$f_{4} = f_{1} + f_{2} + f_{3} + f_{4}.$$

An alternative for a more balanced comparison of the impact of researchers is to select a fixed number of the highest cited papers and to calculate the average of their weighted factors:

$$hf_{i} = \left( {1/i} \right) \cdot \sum \, f_{n} .$$

Following a common approach in statistical analysis, where extreme data points are often dropped to obtain a more robust measurement, I propose an adjusted average obtained by dividing the sum of the i papers by (i − 1). So,

$$hf_{i\prime } = 1/\left( {i - 1} \right) \cdot \sum \, f_{n} .$$

This ratio is further called the researcher’s hf-ratio. This adjustment avoids disadvantaging younger researchers who have no more than i papers or whose ith paper has not yet attained the same impact.

Reverse conversion of this hf-ratio on the basis of the same Table 1 (limited to AAA) leads to the categories for researchers, or HF-ratings: AAA, AA, A, BA, BBB, BBC, BB, B, CCC, CCD, CC, C, D, E, etc., with corresponding percentiles.

In practice, I suggest selecting the researcher’s four most cited papers (i = 4). Three A papers will thus give an A-rating to the researcher. Several combinations can lead to a given threshold: a minimum HF-rating of B is obtained with 3 B papers, or with 2 B papers and 2 C papers, etc.

The division by a fixed minimum factor, 3 or (i − 1), also attenuates the contribution of authors with only one or two papers in the field but with exceptional numbers of citations: whereas the weighted factor f1 would give them the maximum count of 6, that single paper will give them an hf-ratio of only 2. In order to distinguish occasional authors with only one or two papers from researchers with a large body of research in the field, they receive only the basic categories A, B, C or D, set in italics.

The hf-ratio gives an average over the 4 most-cited papers; it does not signal the existence of one exceptionally highly-cited paper in that selection. In order to further differentiate authors with such an exceptional paper, an asterisk * or a ° sign is added to the grade: an asterisk on an HF-rating of A signals the existence of a paper in the h3-core of the field (grade AXX), while a ° sign on an HF-rating of B or C indicates the presence of a paper in the h-core of the field (minimum A-grade).

In practice, given the wide dissemination and acceptance of the h-index, I propose to add this rating (and the sign), based on the converted hf-ratio, to the author’s h-index to form an HF-rating (high fame).Footnote 6 This new HF-rating complements the well-known h-index with a relative indication of the researcher’s influence in their field, and signals the existence of a highly-cited paper.Footnote 7

Applied to my example, the sum of the weighted factors gives 1.95 for author X, 17 for author Y and 8.25 for author Z. Applying the adjusted average to these three authors, this sum is divided by 3, which leads to 0.65 for author X, 5.67 for author Y and 2.75 for author Z. Authors Y and Z have one exceptionally highly-cited paper in the h3-core. The reverse conversion assigns the categories DDD, AAA* and A* respectively. The ‘high fame’ HF-ratings of the three authors are thus 7DDD, 7AAA* and 7A*. "Appendix A" describes a heuristic for the calculation of the HF-rating.
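The arithmetic of this example can be sketched as follows, with hypothetical per-paper weighted factors whose sums match the stated totals of 1.95, 17 and 8.25 (the individual factors of Table 3 are not reproduced here):

```python
def hf_ratio(weighted_factors, i=4):
    """Adjusted average: the sum of the i best weighted factors divided by i - 1."""
    best = sorted(weighted_factors, reverse=True)[:i]
    return sum(best) / (i - 1)

wf_x = [0.75, 0.5, 0.4, 0.3]   # hypothetical split summing to 1.95
wf_y = [6, 4, 4, 3]            # hypothetical split summing to 17
wf_z = [6, 1, 0.75, 0.5]       # hypothetical split summing to 8.25

print(round(hf_ratio(wf_x), 2), round(hf_ratio(wf_y), 2), round(hf_ratio(wf_z), 2))
# → 0.65 5.67 2.75
```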

The advantages of the HF-rating: inter-field comparison

The proposed HF-rating offers several advantages, especially the possibility of comparing the impact of a researcher within their peer set. The HF-rating does not aim to rank, but leads to a rating in tiers of articles grouped in comparable categories. This allows benchmarking: it gives the average categorization of the researcher’s (i or 4) best cited papers benchmarked in their discipline or field.

This new HF-rating, based on the ghent-rating, introduces some qualitative elements into the evaluation of research and mediates between classic h-indexes. It is more selective and singles out more influential papers than the traditional h-indexes do.

But the great benefit of the HF-rating lies in its universal scope of application. Thanks to its normalizing character, the new HF-rating allows, to a certain degree, inter-field comparison. I illustrate this with a practical example in a few totally different disciplines.

As an illustration of interdisciplinary benchmarking, I present the calculation of the hf-ratio (Table 7) for a selected number of researchers in the four selected scientific disciplines: physics, Aids and entrepreneurship research, and bibliometrics. For each field, I select, besides one of the most influential researchers already presented, a promising scholar with around 1000 citations and a younger researcher with around 100 citations. For the entrepreneurship field, I also compare the three most influential authors. Hirsch, the founder of the h-index, is positioned in two fields: physics, where he has his largest contribution, and bibliometrics, where he has only a limited number of extremely impactful papers. In his case, there is no overlap between the fields.

Table 7 shows the number of publications and the total number of citations of each researcher, the number of citations of the highest-cited article (cmax), the average number of citations per paper, and their h and h2-indexes. Then follow the gh-ratings of their 4 most cited papers, and the hf-ratio as defined above. The table is completed with the HF-rating of those researchers.

Table 7 h-indexes and hf-ratios of a selected number of researchers in different disciplines: physics (PHYS), AIDS and entrepreneurship research (ENT), bibliometrics (BIBL)

The comparison illustrates the variety in citation habits and size of the different fields, which results in higher h-indexes for influential authors of broad and large disciplines. In contrast, impactful authors in smaller specialized disciplines have a lower h, but comparable hf-ratios. The resulting HF-rating allows, to a certain extent, inter-field comparison. The most influential authors in each discipline obtain an AAA categorization, independently of their largely different h-indexes (Witten with 141, Montagnier with 79, Shane with 35 and Bornmann with 41) and independently of the huge differences in total citations or in citations of their best-cited article, about 14 times higher for Witten than for Bornmann.

The three top researchers in entrepreneurship have different citation distributions. Shane and Wright have about equal h2-indexes, lower than Zahra’s, but Wright has the highest h-index, followed by Zahra and Shane; thanks to an exceptionally highly-cited paper, the average influence of Shane’s best papers is higher than that of Zahra’s and Wright’s. Shane obtains an HF-rating of AAA*, Zahra AAA and Wright AA, the opposite of the ranking following the h-index. Their HF-ratings (38AAA* for Shane, 43AAA for Zahra and 50AA for Wright) provide complementary information to the h-index alone.

The use of the hf4′ variant of the hf-ratio, where the 4 best papers are chosen and their sum divided by 3, also allows the work of younger authors to be evaluated and benchmarked. In an absolute ranking by number of papers or h-index, the researchers in physics and medicine would rank much higher than their colleagues in smaller disciplines, as discussed in the previous section. The comparison of the research oeuvre of Teynie in medicine (h-index of 8 for 11 papers) with Rinia in bibliometrics (10 papers with an h-index of 7) is nuanced by a CCD-grade for Teynie and a BA°-grade for Rinia, the average grade of their best papers benchmarked in their field. Teynie thus obtains an HF-rating of 8CCD and Rinia of 7BA°. For the younger researchers Rontynen in physics and Faba-Perez in bibliometrics, with equal h-indexes of 5 for respectively 5 and 19 papers, the HF-rating awards 5CCC to Rontynen and 5C to Faba-Perez.

Given the exponential aspect of the categorization of the gh-rating, the categories reflect a proportionate distribution in the categorization of researchers. As a logical result, in absolute terms, more researchers from the larger disciplines will be able to obtain the higher ratings than their colleagues from more limited disciplines.

However, like other indicators, the hf-ratio is only probably approximately correct (PAC) (Rousseau 2016). Differences in grading can still arise for articles near the thresholds. For example, if an article falls a few citations short of the h2-core threshold, it gets a weighted factor of only 2.5 rather than 4; an article just above the 1% threshold obtains only 1.5 rather than 2. Even if those differences are much smaller than in the I3* methodology (100 vs 10), they can have a slight impact on the researcher’s final grade, for example A rather than AA. However, the division of the sum of the 4 weighted factors by 3 rather than 4 compensates for those small errors; in fact, more scholars benefit from the rounding up and receive a slightly higher grade.

Normalization of multiple authorship

The presented HF-rating method has made use of the complete counting principle, where all authors of a paper receive the same weighted factor, independently of the number of authors, their rank or their specific role as first or corresponding author. However, co-authorship patterns vary across disciplines. Both the average number of articles per researcher and the average number of authors per article differ strongly between scientific fields. This diversity constitutes an additional difficulty for interdisciplinary comparison.

More than 30,000 physics scholars have published more than 20 publications, most in multiple authorship, compared to more than 10,000 Aids researchers, only 200 researchers in entrepreneurship, and 85 scholars in bibliometrics. While the median number of authors per paper is around 2 in entrepreneurship and 3 in bibliometrics, it climbs to more than double in medicine and physics. In entrepreneurship research, 75% of the papers have no more than 3 authors, compared to a maximum of 4 authors in bibliometrics and 7 authors in Aids research. About 20% of the entrepreneurship papers are single-authored, compared with under 10% in Aids research.

In order to tackle the issue of multiple authorship with larger numbers of publications, the various methods of fractional counting are universally applicable to the calculation of most bibliometric indicators, such as the h-index (Berker 2018). Building further on my approach of weighted factors depending on the position in the citation distribution curve, the fractional h-index is now calculated on a new citation distribution in which all citations are replaced by the total citations multiplied by the factor corresponding to the chosen fractional method. To calculate the impact of this transformation from complete counting to pure fractional counting, I take the example of author Y and simulate the adapted corresponding citations if each of those papers had been co-authored by respectively 2, 5, 10, or 50 authors (first part of Table 8). I then assign the corresponding gh-rating in the second part of Table 8 and the corresponding weighted factor (wf) in the third part. Papers with 3, 4, 6 to 9, 20 or 100 co-authors receive intermediate gh-ratings and weighted factors, based on similar calculations.
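The transformation from complete to pure fractional counting amounts to scaling each paper's citations by 1/n before re-grading; a minimal sketch, with hypothetical citation counts:

```python
def pure_fractional_citations(citations, n_authors):
    """Pure fractional counting: each paper's total citations are
    multiplied by 1/n_authors before the paper is re-positioned in
    the field's citation distribution and re-graded."""
    return [c / n for c, n in zip(citations, n_authors)]

# E.g. a 500-citation paper with 50 co-authors keeps only 10 adapted citations
print(pure_fractional_citations([500, 120, 30], [50, 5, 2]))  # [10.0, 24.0, 15.0]
```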

Table 8 Simulation of adapted corresponding citations, gh-rating and weighted factors

With an increasing number of authors, the category of the highly cited papers moves down: by two categories for 2 to 3 authors (e.g. from AAX to A for paper 2), and by more than two categories for larger numbers of authors. For papers in the h-core (A-grade), the decrease is one category up to 4 authors, and two categories for 5 or more authors. For articles with only a few citations, the category moves down by only one rank (from D to E, as for paper 8). For large numbers of co-authors, the h-core A-papers can decline to D with 20 co-authors or to E with 50 or more co-authors. The decline in grading is somewhat smaller for those top articles that lie largely above the thresholds of the 0.1% or h3-core.

The transformation results in adapted, reduced weighted factors for the grade of the articles. On average, the weighted factors are divided by 1.5 for 2 authors and by 2 to 3 for 5 authors. From 10 authors on, the reduction amounts to a factor of 4 or 5 for the most-cited papers; over 50 authors, to a factor of 10. For highly-cited articles (such as papers 1 and 2), largely above the h- or h2-thresholds, the reduction factor is somewhat lower. For articles with only a few citations, the reduction factor is also lower. Therefore, in order to avoid complete dilution, I suggest limiting the reduction of the weighted factor to the weighted factor for 10 co-authors.

I further propose a second amendment to the application of the pure fractional counting method. In order to acknowledge the important contribution of the first and corresponding authors, I suggest limiting the reduction of the total citations to 50% for the calculation of their grade. The second paper (in Table 8), with 500 citations, would receive category AAX if single-authored; reduced to the adapted 10 citations in the case of 50 co-authors, the category drops to DDD, which constitutes harsh treatment for the first or corresponding author. In my proposed amendment, the adapted category becomes AA for the first or corresponding author. The weighted factor of 5 in the complete counting approach, or 0.35 in the pure fractional approach, is attenuated to 3 in the adapted fractional approach. Paper 5 would obtain an A-grade in the complete counting approach, or a CC-grade for 10 co-authors in the pure fractional approach; in the adapted fractional approach this grade moves to BBC for the first author, while the weighted factor is reduced from 2 to 1.5 rather than towards 0.5.
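Both amendments can be combined into one sketch. Note one simplification: the first amendment is formulated on the weighted factor, but for illustration the cap is applied here directly to the adapted citations:

```python
def adapted_fractional_citations(citations, n_authors, first_or_corresponding):
    """Adapted fractional counting, sketched on the citation counts:
    1. dilution is capped at the level of 10 co-authors;
    2. for first or corresponding authors, the reduction of the
       total citations is limited to 50%."""
    adapted = citations / min(n_authors, 10)   # amendment 1: cap dilution
    if first_or_corresponding:
        adapted = max(adapted, citations / 2)  # amendment 2: at most -50%
    return adapted

# The 500-citation paper with 50 co-authors from Table 8:
print(adapted_fractional_citations(500, 50, True))   # 250.0 for the first author
print(adapted_fractional_citations(500, 50, False))  # 50.0 for the other co-authors
```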

These two amendments to pure fractional counting form an adapted fractional counting method. The publication category and its corresponding weighted factor are calculated by positioning the fractional citations or the adapted fractional citations in the citation distribution of the field. Similar to the hf-ratio, one can calculate the fractional hf-ratios fhf and ahf on the basis of the categories of the fractional or adapted citations. To differentiate them from the complete counting hf-ratio and its categories, signified with capital letters, lower case letters will be used for the fractional ahf- and fhf-ratios, with italics for the ahf-ratio.

Asymmetry in the selection of publications

Just as the hf-ratio, the ahf- and fhf-ratios are based on the i most cited articles, 4 in my proposal. Given the calculation, the 4 papers selected after fractional counting are not necessarily the same as those selected for the complete counting hf-ratio. Indeed, looking back at Table 8, one sees how fast the weighted factor diminishes. In practice, the fractional fhf-ratio usually selects the researcher’s 4 best papers on which they are the single, corresponding or first author.

Tables 9 and 10 illustrate the phenomenon of asymmetry in the selection of the 4 best papers for the hf-, ahf- and fhf-ratios for two authors from different disciplines with different profiles: Lutz Bornmann in bibliometrics, whose major papers are mainly single- or first-authored, and Sarah Bonnet in infectious diseases, whose papers mostly have more than 7 and up to 25 authors, and who is first or corresponding author of half of her papers. Tables 9 and 10 provide for each paper the number of authors (n a), the place in the author rank order (pl), and the adapted weighted factor (w); following the journal and its year of publication, the paper’s gh-rating category based on respectively adapted fractional counting (gh a), full fractional counting (gh fr) and full counting (gh); then the rankings according to the three methods (r a, r fr and r gh), the citations taken into account for the gh-rating (ca, cfr and cit), and finally the average citations per paper (avg y).

Table 9 Fractional counting versus complete counting: low multi-authorship
Table 10 Fractional counting versus complete counting: more multi-authorship

Bornmann’s most cited article remains the most cited article after the transformation to fractional counting, even though it was co-authored with another scholar. Bornmann’s 2nd, 3rd and 4th most cited articles, however, come in 8th, 4th and 6th in the pure fractional counting classification and only 4th, 5th and 7th in the adapted fractional counting classification, as they were written with one or two co-authors. Bornmann’s other single-authored papers, positioned in 5th, 8th, 11th and 17th place in the citation ranking, move to 2nd, 5th, 6th and 8th place in the adapted fractional counting.

The categories and weighted factors of the top 4 papers in the complete counting classification lead to 4.33, which corresponds to an AAA classification. The average in the adapted fractional counting gives 3.25, good for category aa, the same as in the pure fractional counting approach. This is due to the similar categories of the 4 best papers in both calculations (aa, a, a, ba). The small difference in grade between complete counting (AAA) and adapted fractional counting (aa) reflects the fact that Bornmann is the single author or first author of his most cited papers.

There is more differentiation in the other example, in Table 10, of Sarah Bonnet, a younger researcher. In her discipline, infectious diseases, there are many more co-authors than in management or library and information sciences.

The four selected articles in the total fractional counting classification are the 2nd, 4th, 5th and 14th most-cited articles in the complete citation ranking.

Bonnet’s four most-cited articles move to places 5, 1, 18 and 2 in the pure fractional counting method and to places 10, 7, 15 and 1 in the adapted fractional counting method. The reason is the larger number of co-authors, respectively 9, 5, 25 and 4, while Bonnet is first author of her 4th most-cited article. The differences in hf-ratio are larger: an average of 1.50 in the complete counting classification (BBC grade), an fhf-ratio of 0.48 in the total fractional counting (ddd), and an ahf-ratio of 0.83 in the adapted fractional counting (ccc).

A similarly large discrepancy between the three methods appears for Montagnier: 5.67 in the complete counting classification (AAA), 2.92 in the total fractional counting (a) and 4.42 in the adapted fractional counting (aaa).

These simulations show the severe penalization imposed by the pure fractional counting approach, which does not equitably acknowledge the conceptual work and coordination efforts of the first and corresponding authors of large teams. I therefore retain the adapted fractional counting as an alternative complement to the complete counting method, as it more fairly acknowledges first or corresponding authorship.

Applying these principles to the selection of authors from different disciplines in Table 7, I calculate the adapted fractional counting ahf-ratio and the pure fractional counting fhf-ratio, both on the basis of the four most-cited articles. "Appendix B" gives a practical heuristic for the calculation of the adapted fractional hf-rating. Table 11 presents the hf-ratios, the adapted ahf- and full fractional fhf-ratios, the corresponding HF-, AHF- and FHF-ratings, the recalculated fractional h-indexes haf and hff, and the independence-index I.

Table 11 Adapted fractional HF-rating and independence-index

This fractional AHF-rating is either the same as, or one or two categories lower than, the HF-rating based on complete counting.

The independence index

The third last column in Table 11 presents the I-index, the independence index, calculated by dividing the adapted fractional ahf-ratio by the hf-ratio. As its name implies, it indicates how independent the scholar is for their most cited articles. The higher the I-index, the higher the number of top-cited papers on which the author is single, first or corresponding author. Hirsch, for example, is the single author of 3 important papers in bibliometrics. Witten and Shane have written their major papers either alone or with one co-author.
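As a minimal sketch of the calculation, using Montagnier's ratios quoted above:

```python
def independence_index(ahf, hf):
    """I-index: adapted fractional ahf-ratio divided by the complete
    counting hf-ratio; values close to 1 signal that the scholar is
    single, first or corresponding author of their most cited papers."""
    return ahf / hf

# Montagnier's ratios from the text: ahf = 4.42, hf = 5.67
print(round(independence_index(4.42, 5.67), 2))  # 0.78
```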

The top authors Zahra, Montagnier and Bornmann have some impactful papers written in collaboration, where they are not the single or corresponding author. In medicine, but also in physics, the I-index is somewhat lower because of extensive international collaboration with large numbers of colleagues. The lower I-index for younger researchers illustrates the increasing impact of collaborative research in recent years.

Conclusion

This extension of the application of the ghent-rating leads to a normalized hf-ratio and the derived HF-rating. By focusing on the range of publications with the highest citation impact, I contribute to a better method for benchmarking researchers across different research disciplines. Indeed, while the h-index and most h-type indicators depend on the discipline, the normalization offered by the hf-ratio allows identifying tiers of comparable researchers across all fields. The HF-rating provides the average grade of a researcher’s best papers benchmarked in their own specific field.

This high-fame HF-rating responds to the call in bibliometrics for more qualitative indicators and ‘responsible metrics’ in the evaluation of scientific performance (Editorial Nature 2015). The additional information provided by the HF-rating adds context to the h-index. Thanks to the normalization based on a series of thresholds, the HF-rating constitutes a valuable step towards the universality of citation distributions. The innovation of the approach with variable h-type percentiles is to take into consideration the variations in skewness of citation distribution curves.

Contrary to the severe pure fractional fhf-ratio, the alternative adapted fractional ahf-ratio also provides equitable acknowledgement of the first and corresponding authors. It complements the complete counting hf-ratio. The ratio between the ahf- and hf-ratios, the independence I-index, quantifies the scholar’s contribution as single, first or corresponding author.

Like many other indicators, the hf-ratio is only probably approximately correct. The present ratio has limitations, as it is based on total citations and thus evaluates complete scientific careers, which favours established authors. It can, however, be applied to a limited time window of 5 or 10 years. The figure of 4 best papers has been chosen on the basis of simulations. However, some research fields with a higher frequency of publication, often with large numbers of authors, may choose a larger fixed number than 4.

The same methodology and HF-rating can also be applied for the benchmarking of scientific teams or universities.