Introduction

Bibliometrics is a field of study applied to the various scientific disciplines, which also includes bibliometric analysis, developed in the last twenty years thanks to the online availability of large databases. Bibliometrics uses mathematical and statistical techniques to analyze the distribution patterns of publications and to explore the impact within the scientific communities [1, 2]. Research evaluation requires a two-pronged approach:

  1. 1)

    Quantitative, i.e., in numerical terms of scientific impact, patentability, of the presence of contracts with companies interested in research topics;

  2. 2)

    Qualitative, i.e., the judgment of peers (evaluators) or peer-review, which is still the most important of the methods for a meaningful assessment of quality. It is clear that it would be necessary to associate one or more qualitative methods with quantitative methods [3].

In bibliometric analysis, the two best known bibliometric measures are:

  • The number of citations;

  • The impact factor or impact factor IF

Citation analysis and content analysis are bibliometric methods commonly used by bibliometry. Indicator in bibliometric analysis. However, it is not the only indicator proposed by the ISI (now Thompson) to refer to. Other indicators used are:

  1. 1)

    Immediacy index: measures how successful the work is having in the year of publication and in relation to how quickly a journal article is cited on average and how often the journal articles are cited in the same year;

  2. 2)

    Cited half life: measures the validity over time of the articles cited or the duration of the citations over time;

  3. 3)

    Rate of cites index: represents a quality index of the individual work, based on the axiom that the more the work is cited by other researchers, the more relevant its scientific value is;

  4. 4)

    The hirsch-index (H-index) based on the number of publications and the number of citations;

  5. 5)

    Citation impact: it is calculated for a specific subject or author or institution or country on the basis of the ratio between the number of citations received and the number of articles published [3, 4].

The IF: measures the level of scientific research, on a national and international scale, of scientific publications. It varies greatly from area to area; it is, in fact, a measure of the number of citations of the works published in a certain journal compared to the total number of works published by the same journal in previous years [5]. The purpose of this study is to present an innovative mathematical, bibliometric method which therefore allows to evaluate the reliability of the H-index, but above all to evaluate how much the individual author has influenced it through self-citations. The main question is “how much do self-citations affect the h-index?”.

Citation analysis, which uses citations in scientific intellectual productions to establish connections to other works or other researchers, is the cornerstone of the research discipline known as bibliometry: it is the examination of the frequency and pattern of citations in articles and/or texts in general. Although for many decades the ISI Institute for Scientific Information’s Science Citation Index, now Thompson's Web of Science WoS, has been regarded as the premier tool for measuring citations, for some time now Web services have been questioning the dominance not only of instruments of the ISI but of the IF itself.

Content analysis, on the other hand, also known as textual analysis when conducted exclusively on texts, is a standard methodology in the field of social sciences applied to the study of the content of communication. The IF is the best-known bibliometric.

“Not all self-citations are the same”

Starting from this assumption a mathematical model has been designed that should give an objective result, therefore, on how much an author is self-influencing his bibliometric parameters (h-index).

Materials and Methods

Hirsch Index

The h-index is an index proposed in 2005 by Jorge E. Hirsch of the University of California at San Diego to quantify the prolificacy and impact of scientists’ work, based on the number of their publications and the number of citations received. It assumes great importance since it verifies the real influence of a scientist on the community, regardless of single highly successful articles, or even the works of authors who, despite having published a lot, have produced only articles of little interest, as is the case using IF [4]. The H-index calculators are easily found on the net and are accessible to anyone: Scholar index, for example, is a software that interrogates Google Scholar through queries and provides the H-index of the author and/or of the publication required to verify the impact of the work on the scientific community in general. Another software that allows the calculation of the H-index on a specific author is QuadSearch–MetaSearch Engine [6].

Fi-Index (Fiorillo Index)

To remedy the alteration of this value by individual authors, a formula has been proposed, which leads to the achievement of a parameter called Fi-index, named after the author of this study. To obtain the Fi-index it was necessary to standardize some further parameters that are easy to find, the first of all is the “k value”. This is obtained as in the formula below, where “ncit” represents the number of total citations.

Total Citation value k

$$\frac{ncit}{{100}}$$

The percentages of self-citations are obtained in this way, where “cleancit” represents the citations, once the self-citations have been removed.

% of self-citation

$$100 - \frac{cleancit}{k}$$

Results

The result of this study led to the creation of a formula for obtaining a value defined as “Fi”. In particular, this “corrective” value can be used to evaluate how much a particular author has influenced his h-index during his career, with self-citations. This value is baptized with the Fi-index name.

Below are listed the methods for achieving the result, which will be reliable for any author who has modified his H-index value through self-citations. The greater this value, the greater the unreliability of the h-index. A value close to, or equal to 0, on the other hand, will indicate that the author has not influenced his bibliometric parameter.

Here instead the calculation of the Fi-index: the calculation starts from the “hindex” which precisely represents the h-index of a particular author. The result of the equation in square brackets must be subtracted from this value. Note that the value can fluctuate up and down. It can have a value of 0 only if the author’s self-citations do not affect his h-index, therefore once it has deviated from the value of 0, it is difficult for this to return to 0, but it will still be able to get closer. The range that the Fi-index can encompass is between 0 and the h-index of its researcher (or author). The main purpose is not to consider self-citations, for this any scientific search engine is enough. The aim is to understand how much those self-citations are affecting the h-index.

Fi-hindex

$$hindex - \left[ {\frac{{\left( {100 - \% selfcit} \right)}}{100} * hindex} \right]$$

It is possible to represent in a table some examples, anonymized, with raw data taken on the Scopus® Elsevier search engine.

From the last row of the Table 1, it is clear how the value is clearly different from the h-index with and without self-citations, and above all it is different from the difference of these two values. This is because the value indicates how much an author “drives” self-citations in favor of him. It is obvious that it is more complex to increase a high h-index with self-citations rather than a low h-index, the purpose of this index is to detect ad hoc self-citation.

Table 1 In this table different examples could be observed

Discussion

The index calculation is performed based on the distribution of citations that a researcher's publications receive. Hirsch's definition is as follows:

A scientist has an index “n” if at least “n” works among those he has published have been cited at least “n” times each. In other words, a scholar with an index of three published three cited works at least three times each. To better understand the operating method of calculation, another example is given below, more articulated and more similar to reality: an author has published six works with the number of citations indicated below (total citations 8 out of 6 publications) [1, 3].

Publication A, citations 0

Publication B, citations 3

Publication C, citations 0

Publication D, citations 3

Publication E, citations 1

Publication F, citations 1

There are four works that have at least 1 citation (two of these even more), so at least H = 1; moreover, there are two works that have at least two citations (they have three in particular), so surely H = 2, but there are only two works with at least three citations, so H cannot be equal to three. The Hirsh index of this author is therefore two. Note that the index would not change at all if A, C, E, F had two citations each (for a total of 14 citations out of six works). The index is structured to quantify by means of a single numerical index not only the production, but also the influence of a scientist, distinguishing him from those who had published many articles but of little interest. Furthermore, the index is not too influenced by single highly successful articles [7].

The effectiveness of the index is limited to the comparison between scientists of the same field, also because the conventions regarding publications can vary: in physics, a moderately productive researcher will typically have an index equal to the number of years of work, while scientists working in the medical or biological fields tend to have higher values. The more complex problem that arises in the attempt to calculate the index is to establish the scope in which to select the publications and citations to be considered. Since there is no single database that includes all scientific publications in all sectors, the index is dependent on the chosen database. Furthermore, it is not always easy to discriminate cases of homonymy, or to uniquely identify each single publication. For example, the index obtained using Google Scholar could be significantly different from that obtained using a specialized database. Hirsch noted that the index is generally well correlated, for a physicist, with having won prizes such as the Nobel Prize or being a member of some major academy [4, 8].

However, it is not difficult to find situations in which H fails to describe the importance of a scientist at all. For example, scientists who have had short careers are heavily penalized, as the index does not take into account their influence as they have only produced a limited number of contributions, no matter how decisive. For example, Évariste Galois’s index is two and will remain so forever; if Albert Einstein had died in early 1906, his index would have stopped at four or five [the value of the h-index increases even after the author’s death and for a long time, based on what is quoted, so it depends on how much the contributions made, even if few, were decisive and innovative [3, 9]. As a counter-proof, Einstein's H index still increases by one or two points every year] a value that certainly does not represent with dignity the importance of the studies that led to the 1905 publications. The validity of this objection is demonstrated by the analysis itself of the data reported at the end, which indicate a ranking of physicists based on the index: it immediately catches the eye that, in the face of physicists with h values higher than or close to 100 (Einstein in 2020 reaches an h = 325), in 2020 Richard Feynman has h = 60, Paul Dirac has h = 63.

The h index does not consider the context of the citations. For example, some works in an article are cited simply to facilitate an introduction, even if they have little meaning in the specific context, and have no resolving power to delimit citations made in a negative or fraudulent context (for example when a work is cited because it contains erroneous statements).The h index is affected by limitations in citation databases, particularly for articles prior to the 1990s.

The h-index does not take self-citations into account [10]. If a researcher writes many cumulative works in the same field, it is likely that he will cite his previous articles himself, and this tends to create a long queue of self-citations that can artificially increase the index [11]. Recently, however, platforms such as ResearchGate have introduced, thanks to the help of information technology, calculation methods that allow you to choose whether to display an index that takes into account or excludes self-citations [12]. The limit in this case is due to the fact that the platform carries out its calculations only based on the works uploaded to it, which depends on the willingness of the individual registered researchers to update their profile by entering all their searches [1].

The h-index does not take into account the number of authors of an article, thus benefiting the authors who decide to sign articles together. This means that even an author who has made a minimal contribution to the publication will obtain, for the purpose of calculating the index, a publication and the relative citations on account, like all the others. This clearly benefits whoever directs a research group, for example professors with a large number of doctoral students: it being normal for the professor to be among the authors, in what follows and advises students and revises the proofs of their publications, he obtains a number of articles in his name disproportionate to the actual work. The same phenomenon could occur in industrial research groups [13]. The h-index appears to emphasize work from large collaborations, rather than small groups of researchers or individuals. The h index favors sectors in which there is a lot of contingent interest, and therefore which see a higher-than-normal quantity of works being published and cited. A chore from an industry with many interests and investments will be mentioned more times than one from a niche industry with few experts dealing with it. For example, a researcher who deals with the evolution of the immune system in crustaceans is unlikely to have an index with a higher value than a doctor who studies very common infectious diseases of humans [9]. It might be thought that a simple method is to calculate an author’s h-index, removing self-citations. The advantage of the h-index is unequivocally clear, this in fact combines and measures the quantity of publications with their impact, hence the citations. It therefore allows to have a unique criterion useful for the evaluation of a researcher. It can be obtained easily and is also simple to understand, as well as more reliable than other parameters such as the simple impact factor, or the total number of documents.

Obviously, it is not without disadvantages, as already discussed, first of all it is useful to remember that this value (and the relative number of citations) varies according to the field of research; this makes it difficult to compare cross-sector researchers. It is certainly influenced by the length of a researcher’s career, considering a number of citations somewhat constant over time, just think that this value tends to increase even after the discharge/retirement/death of some researchers. Unfortunately, as already said, it can be altered with self-citations. Precisely for this reason, different indices have been proposed over the years, and the purpose of this study was precisely to develop a corrective factor, or another index. Without dwelling too much, we will briefly analyze which variants have been proposed over the years [3, 10, 14].

The first indices dating back to 2006 in fact presented improvements or changes to the standard h-index. The g-index by Egghe et al. [15] is in fact a measure that should also indicate the quality of a researcher and goes to consider the performance of the best articles.

The a-index by Jin et al. [16] includes in the calculation only manuscripts that have been included in the h-index of a particular researcher. In fact, assuming an h-index of 10, only the first 10 manuscripts are taken into consideration, and of these the citations are averaged.

The h(2)-index by Kosmulski et al. [14] has been proposed, just like the g-index, precisely to give more weight to the highly cited works of the individual authors. For example, an h(2)-index of 10 indicates that the author has at least 10 works with 100 citations each.

Some indices are referred to as aggregation. The hg-index by Alonso et al. [17] uses the h and g indices. The function of this index could be seen as a penalty for authors with a low h-index, in fact the aim is to limit, or eliminate the influence of 1–2, or at least a few highly cited works, compared to the rest of the scientific production.

Instead, the q(2)-index by Cabrerizo et al. [18] is based on two indices. This value in fact offers a global vision of a researcher’s production.

The r-index by Jin et al. [16] is a modification of the existing a-index, in fact in this case rather than dividing by the number of h-, which would punish authors with a high h-index, it proposes to take into consideration the square root the sum of the citations used to calculate the h-index.

The ar-index by Jin [16, 19] according to some authors “completes the h-index”, this is an adaptation of the r-index, it takes into account not only the intensity of the citations but also the age of the publications used to calculate the h- index. Until now, however, no index had taken into consideration time, in addition to the latter, in fact, values such as the m quotient by Hirsch [20] have also been proposed. As previously mentioned, the risk is that the value of h- is proportional or in any case correlated to the length of a career, in this way it is possible to overcome this problem, by dividing it by the number of years that have passed since the first publication.

The first f-index by Franceschini et al. [21], relate the h-index with the publication age. Other indices proposed over the years are for example the m-index by Bornmann et al. [22], this goes to calculate the average number of citations received from the articles taken into consideration for the calculation of the h-index. The v-index by Riikonen et al. [23], representing the percentage of the total of articles, useful for the calculation of the h-index; or again the e-index, this has the function of taking into consideration the excess of citations that the h-index does not take into consideration. Furthermore, another f-index by Katsaros et al. [24], this takes into consideration the coterminal citations, as an extension of the co-citations in which some authors are part of papers that cite other papers. In this way it is possible to discriminate against these individuals. Again, the ch-index by Ajiferuke et al. [25], this evaluates the number of citers rather than the number of citations. An index that evaluates self-citations is the b-index by Brown [26], but this does not evaluate the real number of self-citations but considers the self-citation rate constant in the publications of an author.

In this case it is obvious that there are no similar parameters (if not for the name, see the two already existing f-indexes), in fact the Fi-index takes into consideration not only the number of self-citations of the single author, but between the other, he “reads” how much these self-citations have increased his h-index. The advantage of this index is that it can also be used for interdisciplinary comparisons, unlike the h-index itself, which obviously provides very variable values based on the sector to which the researcher belongs. Furthermore, it is however necessary to remember that in some cases, the authors, especially when they have a particular line of research, are “forced” to cite themselves, as the material present during the bibliometric research is scarce or comes only from their group of studies; this obviously happens even if the topic is at the forefront.

This is certainly a valid method, but it is still not possible to have a clear value on how much that author is influencing his bibliometric parameters. There are certainly many methods nowadays to promote and improve one's bibliometric parameters [6]. Major publishers also push a lot in the world of social media, and in channels such as Facebook®, Twitter®, LinkedIn®, ResearchGate® and more to promote their manuscripts [27, 28]. Surely the publishers themselves, must be chosen with great care by the authors, and the impact factor of the chosen journals counts a in this.

Conclusion

It is very easy to understand how a gender assessment, and such an index, may seem inconvenient to many authors. The Fi-index therefore promises to highlight how much an author tends to influence his h-index with self-citations, giving an objective and decreasing parameter depending on how “clean” the h-index is. This parameter will be useful for the evaluations of researchers, even in the public sphere, and works for interdisciplinary comparisons Above all, the response of the scientific community will be sufficiently strong and critical. The values are objective, and the value of an author must be pitted and considered a regardless of bibliometric parameters, which unfortunately, nowadays, we are forced to chase.