1 Introductory Remarks

The research area connected to (i) the construction of indexes for the assessment of research production and (ii) the study of the properties of these indexes is very large and continues to grow. One could write entire books devoted to indicators and indexes. Below, we shall devote about 100 pages to indexes and indicators for the assessment of the research production of individual researchers and groups of researchers (a group may contain researchers from a department, research institute, university, system of research institutes, or even a national research community). In order to discuss as many indexes as possible in this small number of pages, the following strategy will be adopted: The corresponding indexes and indicators will be described briefly. Their characteristics (positive or negative) will not be discussed in much detail. Instead, examples of the calculation of indexes for two (actually existing) researchers from the same research field are presented. The reader may observe how each new index enlarges the knowledge of the evaluator about the characteristics of research production and about the differences between the two researchers. In addition, numerous references are presented where the strengths and weaknesses of the indexes are discussed by competent researchers. We stress again the introductory character of this book. Researchers who want to study the characteristics of scientometric indexes in more detail may need another, more extended approach, including, for example, the calculation of the indexes for various available databases, the study of relationships among indexes, and methodologies for rescaling indexes calculated for different time intervals. One such possible approach is presented and followed by Vinkler [1].

In Chaps. 2 and 3, the following practically oriented classification is adopted with respect to the indicators and indexes:

  1. Commonly used indicators and indexes for evaluation of research mainly of individual researchers, Chap. 2. The indexes discussed are based mainly on citations obtained by the publications written by the evaluated researchers [2–4];

  2. Additional indicators and indexes for evaluation of research of groups of researchers, Chap. 3. Indicators and indexes considered in this chapter will be connected both to the research publications of the evaluated group of researchers and to the citations of these publications.

Either of the above two classes of indexes and indicators may contain as subclasses the classes of indexes and indicators according to Vinkler [1, 5, 6], who proposed the following classification of indexes with respect to the number of sets they represent:

  1. Gross indexes (indicators): these refer to the measure of a single scientometric aspect of evaluated systems represented by a single scientometric set with a single hierarchical level. The gross indexes (indicators) may be represented by the following relationship:

    $$\begin{aligned} G = \sum \limits _{k=1}^N w_k i_k, \end{aligned}$$
    (2.1)

    where \(i_k\) is the kth item in the corresponding set, and \(w_k\) is the respective weight. An example of a gross indicator is the number of publications of a research group published for the period of evaluation (bibliometric size of the research group).

    Another example of a complex index connected to publications is the RPR-index (research potential realized index) [7]. Let N be the number of papers published in a journal (or the number of papers authored by a researcher, research group, research institute, etc.). Let \(N_c\) be the number of cited papers published in the journal (or the number of cited papers of the researcher, research group, etc.). Then

    $$\begin{aligned} \text {RPR} = \frac{N_c}{N} = \frac{1}{N} \sum \limits _{i=1}^N w_i, \end{aligned}$$
    (2.2)

    where \(w_i\) is equal to 1 if the ith paper is cited, and equal to 0 if the ith paper is not cited.

  2. Complex indexes (indicators): these refer to two or more sets or to a single set with more than a single hierarchical level. For the case of two sets \(\{A\}\) and \(\{B\}\), these indexes may be represented by the relationship

    $$\begin{aligned} C = f(A,B), \end{aligned}$$
    (2.3)

    where f is an appropriate function acting on the two sets. An example of a relationship for a complex index is

    $$\begin{aligned} C^* = \frac{\sum \limits _{i=1}^{N_A} {w_a}_i a_i}{\sum \limits _{i=1}^{N_B} {w_b}_i b_i}, \end{aligned}$$
    (2.4)

    where \({w_a}_i\) and \({w_b}_i\) are respective weighting factors. An example of a complex index is the impact factor (number of citations obtained by a journal for some time period divided by the number of published papers in the journal for this time period) [8–29]:

    $$\begin{aligned} G_i = \frac{C_i}{P_{i-1} + P_{i-2}}, \end{aligned}$$
    (2.5)

    where \(C_i\) is the number of citations obtained in the year i by the papers published in a journal in the years \(i-1\) and \(i-2\) (the numbers of these papers are \(P_{i-1}\) and \(P_{i-2}\)). The impact factor introduced by Garfield stimulated many researchers to construct such kinds of indexes [30–40].

  3. Composite indexes (indicators): these consist of several gross or complex indexes (indicators), usually with some weighting factors, and each representing a special aspect of the evaluated system. The relationship for this kind of index (indicator) is

    $$\begin{aligned} D = \sum \limits _{i=1}^N w_i \left( \frac{a_i}{\sum \limits _{i=1}^M a_i} \right) , \end{aligned}$$
    (2.6)

    where M is the number of evaluated research groups, \(N \le M\), and \(w_i\) is the respective weighting factor. An example of such an indicator is given in the RELEV method, which will be discussed in the next chapter of the book.
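The relationships (2.1), (2.2), and (2.5) can be illustrated by a minimal Python sketch. The function names and all input numbers are hypothetical and chosen only for illustration; they are not data from the text.

```python
def gross_index(items, weights=None):
    """Gross index G = sum_k w_k * i_k, Eq. (2.1).

    With unit weights and items equal to 1 per publication, G is simply
    the number of publications of the evaluated group.
    """
    if weights is None:
        weights = [1] * len(items)  # assumption: unit weights by default
    return sum(w * i for w, i in zip(weights, items))


def rpr_index(citation_counts):
    """RPR = N_c / N, Eq. (2.2): share of papers cited at least once."""
    n_papers = len(citation_counts)
    n_cited = sum(1 for c in citation_counts if c > 0)  # w_i = 1 if cited
    return n_cited / n_papers


def impact_factor(citations_in_year, papers_prev_year, papers_two_years_ago):
    """Impact factor G_i = C_i / (P_{i-1} + P_{i-2}), Eq. (2.5)."""
    return citations_in_year / (papers_prev_year + papers_two_years_ago)


# Hypothetical inputs, for illustration only.
print(gross_index([1, 1, 1]))      # three publications, unit weights
print(rpr_index([3, 0, 1, 0]))     # two of four papers are cited
print(impact_factor(150, 40, 60))  # 150 citations to 100 papers
```

The composite index (2.6) can be built in the same way by normalizing each contribution \(a_i\) by the total over the evaluated groups before weighting.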

2 Peer Review and Assessment by Indicators and Indexes

Non est ponenda pluralitas sine necessitate.

(Do not introduce more arguments than are necessary.)

William of Ockham

In order to increase the effectiveness of scientific research, various officials often implement concepts such as the “value for money” concept in science [41]. Such concepts must be used carefully, for they can lead to unexpected side effects (e.g., soon after monkeys learned the concept of money, the first prostitute monkey appeared [42]). The above remarks lead to a practical question: How to measure “value” in science? In order to perform such measurements, policymakers increasingly use quantitative data [43]. On the basis of statistical analysis of these data, one may construct indexes to measure research activity. Such an approach has been applied for more than a century. In recent decades, activity around the construction of indexes and the study of their properties has been concentrated in several branches of science, e.g., in scientometrics and several related branches for the case of indexes for evaluation of research production. Assessment of research organizations is an important element of the process of research management and implementation of research policy [44, 45]. Administrators of science have two main instruments for evaluation of research organizations:

  1. Peer review: evaluation of work by one or more people of similar competence as the producers of the work (peers) [46–66]. The main problem with this instrument is to find competent evaluators.

  2. Sets of indicators and indexes: this instrument may lead to quick, easy, and inexpensive evaluation of research performance [67–69]. The main problem here is that if the indicators and indexes are inappropriate, then the result of evaluation will not be adequate.

Competition at different levels (from individuals to countries) has led to demand for comparative indicators for scientific and other achievements [70]. In addition, indicators and indexes may also be used for other purposes, e.g., for measuring growth of science [71]. Such types of indicators and indexes will be discussed in Chap. 3.

3 Several General Remarks About Indicators and Indexes

The number of indicators applied in evaluations should be reduced to the possible lowest but still sufficient number of indicators.

Peter Vinkler

There are different points of view on indicators and indexes [5, 72, 73]. Below, the indicators and indexes will be understood from the point of view of statistics [74], i.e.,

Indicator: an observed value of a variable, or in other words, a sign of the presence or absence of the concept being studied.

Several indicators can be aggregated into a single index. Thus from the point of view of statistics, an index is

Index: a composite statistic—a measure of changes in a representative group of individual data points, or in other words, a compound measure that aggregates multiple indicators. Indexes summarize and rank specific observations.

Below we present four classifications of indicators. The first classification of indicators of scientific research is

  1. Input indicators: they are characteristics of the inputs of scientific organizations such as equipment; spent money; employed personnel.

  2. Output indicators: they are characteristics of the results and outcomes of the research process. This class of indicators will be of interest for us below.

The second classification is:

  1. Absolute indicators: they refer to one particular characteristic of research activity (number of articles published, money spent, number of citations, etc.).

  2. Relative indicators: they refer to the relationship between two or more aspects such as number of articles per research group or the number of citations per paper.

Relative indicators often are more useful for research evaluation.

The third classification of indicators is from the point of view of the type of research. From this point of view, there are three classes of indicators:

  1. Basic research indicators: These indicators are connected mainly to basic-research scientific papers and their citations.

  2. Experimental development indicators: These indicators are connected mainly to patents and their citations.

  3. Applied research indicators: These indicators are intermediate between the above two classes of indicators. They can be connected with applied research papers and their citations as well as with patents and their citations.

The fourth classification of indicators is from the point of view of the size of social systems and structures they measure. From this point of view, there exist the following classes of indicators [75–77]:

  1. Microindicators: indicators connected with individuals; indicators connected with research groups; indicators connected with status/target groups.

  2. Mesoindicators: indicators connected with university departments and university institutes; indicators connected with universities, research institutes, and funding agencies; indicators connected with academic fields; indicators connected with research and grant programs; indicators connected with cross-sectional fields.

  3. Macroindicators: indicators connected with scientific policies; indicators connected with national research and development systems; indicators connected with global developments.

Below, we shall focus on indexes and indicators connected to research publications and their citations. Usually these indexes are statistical functions defined on sets of bibliometric elements and units, and because of this relative complexity, there are requirements on the indexes, e.g., the indexes must be valid, i.e., we have to be sure that we really measure what we intend to measure. Any publication assessment method has to cover the amount of scientific information (e.g., number of scientific papers) produced by the evaluated researcher or group of researchers [1]; the acknowledgement of the published results (e.g., the number of citations) [78–84]; and the eminence of the publication channels. When used carefully, publication and citation data [85–87] are meaningful for measuring scientific output and its impact on the course of scientific research. The number of publications that a research group produces may represent its scientific production and its contribution to the generation of new knowledge (but be careful about duplication and the number of coauthors [88]). A scientist with famous collaborators may be highly cited, but this is not a sufficient condition for concluding that the scientist has made a large contribution to the advancement of science.

Publications usually contain new facts, new hypotheses, new theories or theorems, new explanations, or new syntheses of existing facts. This is a contribution to science, and the number of citations of the above information is a measure of the contribution to the advancement of research in the corresponding scientific field. But this indicator also must be used carefully. The number of citations depends on research area (chemists are usually much more cited than the mathematicians); number of collaborators and their position in the various scientific networks and systems, etc. [89]. Publications and citations are connected to the visibility of individual researchers and research collectives. But not all publications are equally visible. Visibility depends on the place of publication; on the language of publication; on the scientific field; on the current “fashion” in scientific research; on the presence of publications in international scientific databases, etc. Thus visibility as a characteristic for evaluation of researchers and research groups and organizations must be used with care.

Publications are an important channel for communication of scientific results, and the number of publications may be a quantitative measure of scientific production. In general, one can consider two criteria for evaluation of research production on the basis of publications:

  1. External criteria: number of articles, books, patents, etc. published by the scientist.

  2. Internal criteria: number of preprints, number of given seminars, number of written internal reports, etc.

To some extent, citations are a measure of the value of the scientific production of the corresponding scientist [30, 90]. In addition, citations are an important measure of the influence of the scientific production of the researcher. The citation is regarded as the scientometric unit of impact of scientific information. Higher scientific impact is revealed by a larger number of citations.

The indexes of research performance usually depend on the size of the analyzed data set. This is especially interesting for indexes connected to citation data [91], since it is often assumed that the level of excellence of a scientist is a function of his/her full citation record. An interesting fact in [91] is that at least fifty papers are needed in order to obtain a conclusion about the long-term scientific performance of two scientific authors and to discriminate between them on the basis of an appropriate one-dimensional (single) index of scientific performance. This means that citation-based one-dimensional indicators and indexes of research performance have to be used for discrimination between mature scientists (such as candidates for a professorship who have produced fifty or more papers) and not between young researchers. And if one wants to discriminate between scientists who have produced fewer than fifty papers each, one should use a multidimensional indicator (a set of indicators).

It is more difficult to evaluate individual researchers and to compare their achievements than to evaluate and compare the achievements of research groups. The reasons for this difficulty are the smaller sets of publications and citations and the increasing importance of nonscientific factors such as age, position, education, personal connections, etc. Thus in addition to the numerous indexes used in the evaluation, one should also use qualitative evaluation methods. Below we shall discuss many indexes for characterization of the results of the work of individual researchers. Almost all of them will be connected to the citations of publications of a researcher, since a citation may be considered a unit of impact of the information produced by the researcher.

4 Additional Discussion on Citations as a Measure of Reception, Impact, and Quality of Research

Citations are usually used to measure the reception of research results by the corresponding research community. Discussion about the use of citation-based indicators intensified when bibliometric indicators began to be used not only for monitoring national or institutional research performance but also as components of formulas for the funding of scientific research [92]. In the area of research management and science policy, citations are often used to measure the impact of research publications, or they even become a measure of the quality of the corresponding publication. The validity of such an approach depends on the number of citations. If a publication is very highly cited, then its impact is high, and it may be that its quality is also good. But if an article is not so highly cited, then is it of low quality, or is its impact low? Such a determination cannot be made immediately and without further investigation. Thus the use of the number of citations as a measure of impact or quality is not unproblematic, since there are many limitations, biases, or shortcomings connected to citation analysis [30, 93, 94]. Nevertheless, citations remain an important form of scientific information within the framework of documented science communication [95]. Not all citations, however, are given because of the quality of the cited paper [96]. Weinstock (in Current Contents # 12, 23 June 1971, reprint from [96]) gives fifteen reasons for using citations:

  1. Paying homage to pioneers.

  2. Giving credit for related work.

  3. Identifying methodology, equipment, etc.

  4. Providing background reading.

  5. Correcting one’s own work.

  6. Correcting the work of others.

  7. Criticizing previous work.

  8. Substantiating claims.

  9. Alerting to forthcoming work.

  10. Providing leads to poorly disseminated, poorly indexed, or uncited work.

  11. Authenticating data and classes of facts: physical constants, etc.

  12. Identifying original publications in which an idea or concept was discussed.

  13. Identifying original publications or other work describing an eponymic concept or term (as, e.g., Hodgkin’s disease, Pareto’s Law, Friedel–Crafts reaction).

  14. Disclaiming work or ideas of others.

  15. Disputing priority claims of others.

As we can see, for example, item 5 from the above list is certainly not connected to the quality of the cited work.

In addition to individual citations, there are many cases in which larger sets of citations have to be assessed. This will be one of the subjects of the next chapter of this book. Here we shall mention just that such sets of citations may be influenced by citation cliques (which are able to filter information sources), and numerous self-citations may be present in the set of citations of an individual researcher or of a group of researchers. Thus individual citations as well as sets of citations, and especially the frequency of such citations, can hardly be considered a measure of the quality of the cited work. Citations may give us information about the impact of the work of the researchers, and self-citations may give us some information too: a lack of self-citations over a longer period may indicate lack of originality in research. The presence of many self-citations may indicate a significant record of publication activity of the corresponding researcher or group of researchers.

There exist quantitative estimates of the share of self-citations. The study [97] led to the result that in the area of basic research, the average number of self-citations is about 20 % of the number of citations. Another estimate [98] obtained percentages between 10 and 30 %. These estimates are for synchronous self-citations (the rate of synchronous self-citations is calculated as the number of citations to oneself relative to the total number of references). Another possible rate for self-citations is the diachronous rate (number of self-citations divided by the total number of citations received) [99]. Synchronous and diachronous self-citation rates can be calculated for individual scientists, groups of scientists, journals, etc. Glänzel and coauthors [100] even obtained a square root law \(f(k)\approx (k+1/4)^{1/2}\) between the number of self-citations f and the number of foreign citations k (foreign citations are the non-self-citations). This law shows that the self-citations and foreign citations are not independent, i.e., the self-citations may be an essential part of scientific communication.
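The two self-citation rates and the square root law can be written down as a minimal Python sketch. The function names and the input numbers are hypothetical and serve only to illustrate the definitions given above.

```python
from math import sqrt


def synchronous_rate(self_references, total_references):
    """Citations to oneself relative to the total number of references given."""
    return self_references / total_references


def diachronous_rate(self_citations, total_citations_received):
    """Self-citations relative to the total number of citations received."""
    return self_citations / total_citations_received


def sqrt_law_self_citations(foreign_citations):
    """Square root law f(k) ~ (k + 1/4)^(1/2) for k foreign citations."""
    return sqrt(foreign_citations + 0.25)


# Hypothetical numbers, for illustration only.
print(synchronous_rate(6, 30))     # 6 of 30 references point to own work
print(diachronous_rate(25, 200))   # 25 of 200 received citations are self-citations
print(sqrt_law_self_citations(6))  # sqrt(6.25)
```

Under the square root law, the expected number of self-citations grows much more slowly than the number of foreign citations, so the share of self-citations falls for highly cited papers.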

Before beginning our discussion on the indexes used for assessment of research, let us note again that citation patterns are much influenced by subject characteristics. And the subject characteristics are different in different research fields, e.g., in chemistry and mathematics. Because of this, one should not use citations for cross-field comparison without appropriate normalization.

5 The h-Index of Hirsch

The h-index of Hirsch has become very popular in recent years [101–118]. Because of this, it is much discussed and modeled [119–133]. The h-index is defined as follows. Let us suppose that a certain scientist has N research publications. Let us rank these publications by decreasing number of citations (the most cited paper is at the top of the list, the second most cited paper is second, and so on, with the least cited paper at the bottom of the list).

A scientist has h-index equal to H if the top H of his/her N publications from the ranked list have at least H citations each.

The h-index is the solution of the equation

$$\begin{aligned} r=C(r), \end{aligned}$$
(2.7)

where C(r) is the number of citations of the rth publication from the ranked list of articles of the researcher. We note that the other publications of the researcher will have no more than h citations each.

The h-index [134] was introduced with the intention of measuring simultaneously the quality and quantity of scientific output. It was also introduced because of the disadvantages of other bibliometric indicators, such as the total number of papers (which does not account for the quality of scientific publications) and the total number of citations (which may be disproportionately affected by participation in a single publication; by the large influence of a certain class of papers, e.g., methodological papers that propose new techniques, methods, or approximations and typically generate many citations; or by many publications with few citations each).

The main reason for the popularity of the h-index is its simplicity [135]. The h-index has been calculated also for journals, topics, etc. [136–141]. Let us note the interesting research on correlations between the h-index and thirty-seven other similar indexes [142]. Several of these indexes will be described below.

Assuming that a researcher publishes a constant number of papers each year and that each published paper receives a constant number of citations per year (and this for each subsequent year), Hirsch [134] obtained two relationships when the publication time (which is approximately equal to the length of the scientific career of the scientist) is not too small. The relationships are

  • Relationship between total number of citations N and the Hirsch index h,

    $$\begin{aligned} N(t) \approx A h^2(t). \end{aligned}$$
    (2.8)
  • Relationship between Hirsch index and the time t (in years of research career),

    $$\begin{aligned} h(t) \approx b t, \end{aligned}$$
    (2.9)

    where A and b are some appropriate constants that can be different for different scientists; A has values between 3 and 5, and by b Hirsch classifies the scientists as

    • successful: \(b=1\);

    • outstanding: \(b=2\);

    • unique: \(b=3\).

Soon after its definition, the h-index was generalized to the \(h_\alpha \)-index [143].

A scientist has \(h_\alpha \)-index equal to \(H_\alpha \) if the top \(H_\alpha \) of his/her N publications from the ranked list have at least \(\alpha H_\alpha \) citations each.

If \(\alpha =1\), then \(h_\alpha = h\). The \(h_\alpha \) index has the following properties:

$$\begin{aligned} \lim _{\alpha \rightarrow 0} h_\alpha \sim p; \ \ \lim _{\alpha \rightarrow \infty } h_\alpha \sim c, \end{aligned}$$
(2.10)

where p is the number of papers published by the scientist that have been cited at least once and c is the number of citations of the most cited paper published by the scientist (these numbers can be called p-indicator and c-indicator).
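The definition of the \(h_\alpha \)-index translates directly into a short procedure. The following Python sketch is only an illustration: the function name and the toy citation list are hypothetical, and the code takes the largest rank \(H_\alpha \) for which the condition in the definition holds.

```python
def h_alpha(citations, alpha=1.0):
    """Largest H such that the top H papers have at least alpha * H citations each.

    For alpha = 1 this is the h-index of Hirsch.
    """
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= alpha * rank:
            h = rank
        else:
            # Citations decrease and alpha * rank increases along the list,
            # so the condition cannot hold again further down.
            break
    return h


toy = [10, 8, 5, 4, 3, 0]  # hypothetical citation counts
print(h_alpha(toy))              # alpha = 1: the ordinary h-index
print(h_alpha(toy, alpha=2.0))   # stricter threshold, smaller value
print(h_alpha(toy, alpha=0.01))  # very small alpha: counts papers cited at least once
```

For the toy list, a very small \(\alpha \) returns 5, the number of papers cited at least once, in agreement with the first limit in (2.10).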

5.1 Advantages and Disadvantages of the h-Index

The h-index is simple to calculate, and it encourages the performance of research work that is highly visible (and may be of high quality). In addition, the h-index is a measure of a combination of two important characteristics of research production: the number of publications and the citation impact of those publications. The h-index compares established scientists from the same scientific field. It does not discriminate much among the average scientists in the field. If a researcher has published many highly visible papers, then his/her h-index may increase with the accumulation of citations even if he or she no longer publishes.

When using the h-index for evaluation of research production, one should keep in mind that the h-index doesn’t account for the typical number of citations in different scientific fields or for the typical number of citations in different journals. In addition, the h-index doesn’t account for the number of authors of a paper. The index favors scientific fields with large numbers of researchers working in the field. Moreover, the index favors scientific fields with larger sizes of research groups working in the field. The h-index is bounded by the total number of publications, and thus it favors scientists with a longer career. Scientists who have written a small number of papers but have made important discoveries are at a disadvantage.

The h-index doesn’t account for the place of the scientist in the author list of the paper. In addition, the h-index does not account for authorship without authorization (the name of a researcher is put in the list of the authors without his/her knowledge or permission). The h-index can be manipulated through self-citations [144–148]. The h-index doesn’t account for the context of citations: a citation can be made in a positive context, but it can also be made in a negative context, and some citations can be more significant for the citing paper than others. Finally, the h-index doesn’t account for the citation bias connected to review papers.

The h-index is an attempt to achieve a balance between scientific productivity and quality of scientific production [91]. This index, however, assumes an equality between incommensurable quantities: number of papers and number of citations of a paper. A more general relationship between these two quantities could be

$$\begin{aligned} r^\alpha = \beta C(r). \end{aligned}$$
(2.11)

For the case of the h-index, \(\alpha = \beta =1\), and perhaps this is one of the simplest possible choices of the parameters \(\alpha \) and \(\beta \) [149].

Let us note an interesting effect related to the h-index: the h-bubble [150]. This effect is connected to the rapidly increasing number of citations gained by the authors who first began to study the characteristics of the h-index [151]. It is assumed that this fast growth forms a bubble like a stock market bubble. The question is whether after the bubble there will be a crash. The future will answer this question.

In order to give some simple examples of the calculation of the h-index and of some of the indexes described below in this chapter, we shall consider data about citations of the fifty most-cited publications of two actually existing researchers from the research area of applied mathematics. The ranked numbers of citations (the number of citations of the most-cited publication is listed first) are as follows:

  1. Researcher A (49 years old, 117 publications, 1375 citations): 93, 73, 67, 65, 59, 44, 43, 42, 38, 36, 36, 35, 34, 33, 33, 32, 29, 29, 29, 28, 27, 27, 26, 23, 23, 21, 21, 21, 20, 20, 20, 19, 18, 17, 16, 15, 15, 13, 11, 10, 10, 8, 8, 7, 7, 6, 6, 6, 6, 5.

  2. Researcher B (63 years old, 260 publications, 1562 citations): 113, 65, 58, 51, 49, 42, 41, 37, 36, 34, 31, 27, 27, 25, 24, 24, 23, 23, 22, 20, 18, 17, 17, 16, 16, 16, 16, 16, 14, 14, 14, 14, 13, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9.

The h-index of researcher A is \(h_A = 23\). The h-index of researcher B is \(h_B=20\). Thus the younger researcher has a larger h-index. The value of a single index, however, is not enough for comparison of the characteristics of the research production of the two researchers. Below, the values of additional indexes will be calculated. In such a way, an evaluator may obtain a table of values of appropriate indexes, and this table may be used for the quantitative part of the assessment of the research production. Such a table for our two researchers will be presented below.
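The calculation of the two h-indexes can be reproduced with a minimal Python sketch. The citation lists are the ones given above; the function names are hypothetical. As a by-product, the sketch also evaluates the ratio of the total number of citations to \(h^2\), which under the assumptions of Hirsch's model estimates the constant A in (2.8).

```python
def h_index(citations):
    """Largest rank r such that the r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_h_squared(total_citations, h):
    """Ratio N / h^2, an estimate of the constant A in N(t) ~ A h^2(t)."""
    return total_citations / h ** 2


# Fifty most-cited publications of each researcher (data from the text).
# The remaining publications have fewer citations than their rank,
# so they cannot change the h-index.
researcher_a = [93, 73, 67, 65, 59, 44, 43, 42, 38, 36, 36, 35, 34, 33, 33,
                32, 29, 29, 29, 28, 27, 27, 26, 23, 23, 21, 21, 21, 20, 20,
                20, 19, 18, 17, 16, 15, 15, 13, 11, 10, 10, 8, 8, 7, 7, 6,
                6, 6, 6, 5]
researcher_b = [113, 65, 58, 51, 49, 42, 41, 37, 36, 34, 31, 27, 27, 25, 24,
                24, 23, 23, 22, 20, 18, 17, 17, 16, 16, 16, 16, 16, 14, 14,
                14, 14, 13, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 9,
                9, 9, 9, 9, 9]

h_a = h_index(researcher_a)
h_b = h_index(researcher_b)
print(h_a, h_b)                            # 23 20
print(citations_per_h_squared(1375, h_a))  # total citations of A over h_A squared
print(citations_per_h_squared(1562, h_b))  # total citations of B over h_B squared
```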

5.2 Normalized h-Index

One can consider a normalized Hirsch index

$$\begin{aligned} h^* = \frac{h}{N}, \end{aligned}$$
(2.12)

where h is the Hirsch index of the researcher and N is the number of the researcher’s publications. The value \(h^*\) (for large enough N) is closer to an intensive quantity in comparison to the extensive quantity h that in most cases increases in the course of a scientific career.

For our two researchers, the normalized h-index has the following values: \(h^*_A = 23/117 \approx 0.1966\); \(h^*_B = 20/260 \approx 0.0769\). Note that \(h^*_A\) is more than twice \(h^*_B\). This is a serious difference that can give us a hint about the effectiveness of the two researchers with respect to the impact of the research information they produce.
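The arithmetic behind these values is a direct application of (2.12). A minimal Python sketch (the function name is hypothetical; the inputs are the h-indexes and publication counts of the two researchers):

```python
def normalized_h(h, n_publications):
    """Normalized Hirsch index h* = h / N, Eq. (2.12)."""
    return h / n_publications


h_star_a = normalized_h(23, 117)  # researcher A
h_star_b = normalized_h(20, 260)  # researcher B

print(round(h_star_a, 4))
print(round(h_star_b, 4))
print(h_star_a / h_star_b)  # ratio of the two normalized indexes
```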

Another normalization of the h-index was proposed in [152]. This normalized index is equal to the square of the h-index divided by the total number of the authorships of the papers (sum of the number of authors for all papers from the set of papers) that determine the h-index of the researcher. The idea of normalization of the h-index was developed further in [153] by construction of the MII-index. This index was constructed for institutions but can also be used for evaluation of a group of researchers who have written a sufficiently large number of papers. The definition of the MII-index is

$$\begin{aligned} \text {MII}= \frac{h}{10^\alpha N^\beta }, \end{aligned}$$
(2.13)

where

  • h: the h-index of the scientist;

  • N: number of papers published by the scientist;

  • \(\alpha \): intercept of the line describing the dependence of the h-index on the number of publications in the \(\log _{10}\)-scale;

  • \(\beta \): slope of the line describing the dependence of the h-index on the number of publications in the \(\log _{10}\)-scale.

Construction of the h(N) line is as follows. For the ith member of the group of scientists (or for each institution from the group of evaluated institutions), one plots on a log-log plot the point with coordinates \((N_i,h_i)\). The resulting points are fitted by a regression line

$$\begin{aligned} \log _{10} h_i = \alpha + \beta \log _{10} N_i + \varepsilon _i, \end{aligned}$$
(2.14)

and in such a way, one determines \(\alpha \) and \(\beta \).

The MII-index is constructed for comparison of the quality of research of institutions of different sizes. It can be applied also to a group of researchers with different productivities. A value of MII that is larger than 1 means that the corresponding researcher from the research group of interest performs better than the average in terms of his/her h-index. The MII-index can also be used for evaluation of performance of a research institute in a large enough group of institutes from the same research area.
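The fit (2.14) and the index (2.13) can be sketched in Python as follows. This is only a minimal illustration that uses an ordinary least-squares fit; the function names `fit_log_line` and `mii` and the sample group are hypothetical and are not taken from [153].

```python
import math

def fit_log_line(n_values, h_values):
    """Least-squares fit of log10(h) = alpha + beta * log10(N); returns (alpha, beta)."""
    xs = [math.log10(n) for n in n_values]
    ys = [math.log10(h) for h in h_values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def mii(h, n, alpha, beta):
    """MII-index (2.13): the h-index relative to the value predicted by the fitted line."""
    return h / (10 ** alpha * n ** beta)

# Hypothetical group of five institutions: (number of papers, h-index).
group = [(50, 8), (120, 14), (300, 20), (800, 35), (2000, 52)]
alpha, beta = fit_log_line([n for n, _ in group], [h for _, h in group])
scores = [mii(h, n, alpha, beta) for n, h in group]
```

Members of the group with a score above 1 lie above the fitted line, and members with a score below 1 lie below it.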

5.3 Tapered h-Index

The tapered h-index [154] is an extension of the h-index introduced in order to account for the citations of all papers of a researcher (and not only for the h papers that are cited at least h times). The definition of the index is as follows:

$$\begin{aligned} h_T = \sum \limits _{j=1}^N h_T (j), \end{aligned}$$
(2.15)

where \(h_T(j)\) is the score for the jth paper in the ranked list (with respect to citations) of the publications of the researcher. In other words, we assume that the researcher has N publications ranked by the number of citations \(n_1\ge n_2\ge \dots \ge n_N\). The number \(h_T(j)\) is determined as follows:

$$\begin{aligned} h_T(j)= & {} \frac{n_j}{2j-1}, \ n_j \le j ,\nonumber \\ h_T(j)= & {} \frac{j}{2j-1} + \sum \limits _{i=j+1}^{n_j} \frac{1}{2i-1}, \ \ n_j > j. \end{aligned}$$
(2.16)

The tapered h-index is never smaller than the h-index and is an additional characteristic that can be used to evaluate production (and impact of this production) of researchers.

We leave the calculation of the tapered h-indexes for the top cited fifty publications of our two researchers to the interested reader. The contributions of the first five publications that are not included in the h-index to the tapered h-index of the two researchers are:

  • Researcher A: 23 / 47, 23 / 49, 21 / 51, 21 / 53, 21 / 55;

  • Researcher B: 18 / 41, 17 / 43, 17 / 45, 16 / 49, 16 / 51.
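For readers who wish to carry out the calculation, the definitions (2.15) and (2.16) may be sketched in Python as follows. Exact fractions are used so that the contributions can be compared with the values listed above; the function names are ours and are only illustrative.

```python
from fractions import Fraction

def tapered_score(j, n_j):
    """Score h_T(j) of the paper at rank j (1-based) with n_j citations, as in (2.16)."""
    if n_j <= j:
        return Fraction(n_j, 2 * j - 1)
    return Fraction(j, 2 * j - 1) + sum(
        Fraction(1, 2 * i - 1) for i in range(j + 1, n_j + 1)
    )

def tapered_h_index(citations):
    """Tapered h-index (2.15): sum of the scores over the list ranked by citations."""
    ranked = sorted(citations, reverse=True)
    return sum(
        (tapered_score(j, n) for j, n in enumerate(ranked, start=1)),
        Fraction(0),
    )
```

For example, `tapered_score(24, 23)` gives the first contribution 23/47 listed for researcher A, i.e., the paper at rank 24 with 23 citations.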

5.4 Temporally Bounded h-Index. Age-Dependent h-Index

In the temporally bounded version of the h-index, one counts the citations of the articles for some time interval (for the last five years, for example), and then one ranks the papers with respect to the number of these citations.

A scientist has a temporally bounded h-index H if the top H of his/her N papers from the list have at least H citations each for some time interval (for example, for the last five years).

The temporally bounded h-index allows a comparison between the impacts of the papers of scientists working in the same scientific area. For our two researchers, the temporally bounded h-index for their citations for the last five years is:

  • Researcher A: \(h_A^{temp}=19\) (1041 citations for the last five years);

  • Researcher B: \(h_B^{temp}=12\) (724 citations for the last five years).

The h-index can be made age-dependent. The classic h-index is the solution of the equation (2.7). We can think about an appropriate inclusion of the time in the h-index in order to compensate for the length of the scientific career of younger scientists. One possibility is as follows. Let

$$\begin{aligned} C^*_r=C(r)/a_r, \end{aligned}$$
(2.17)

where \(a_r\) is the age of the rth paper from the ranked list. Let us perform a ranking \(C^*(r)\) of the papers with respect to the values of \(C^*_r\). Then we can define the age-dependent h-index as the point of intersection of the straight line \(y=r\) and the curve \(y=C^*(r)\), i.e., as the unique solution of

$$\begin{aligned} r=C^*(r) \end{aligned}$$
(2.18)
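A possible discrete version of (2.17) and (2.18) is sketched below in Python. Since ranks are integers, the sketch takes the largest rank r for which \(C^*(r) \ge r\); this discretization, the assumption that all ages are positive, and the function names are ours. The same helper `h_index` gives the temporally bounded h-index when it is applied to the citations counted within the chosen time interval.

```python
def h_index(values):
    """Largest rank r such that the r-th largest value is at least r."""
    ranked = sorted(values, reverse=True)
    h = 0
    for r, value in enumerate(ranked, start=1):
        if value >= r:
            h = r
        else:
            break
    return h

def age_dependent_h_index(citations, ages):
    """h-type index of the age-normalized citations C*_r = C(r) / a_r of (2.17).

    citations[i] and ages[i] refer to the same paper; ages must be positive.
    """
    rates = [c / a for c, a in zip(citations, ages)]
    return h_index(rates)
```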

5.5 The Problem of Multiple Authorship. \(\overline{h}\)-Index of Hirsch and gh-Index of Galam

Frequently, a publication has several coauthors [155–158]. Coauthorship can be used as a measure of scientific collaboration. On the basis of the observation of the coauthorship pattern, one can conclude that scientific collaboration has increased greatly during recent decades at different levels of aggregation, e.g., at the level of individual authors; at the level of collaboration between sectors such as universities, research institutes, and industry; and at the level of international collaboration.

There are many reasons why researchers collaborate. One list of such reasons is as follows [159]:

  1. Access to expertise.

  2. Access to equipment, resources, or “stuff” one doesn’t have.

  3. Improved access to funds.

  4. To obtain prestige or visibility; for professional advancement.

  5. Efficiency: multiplies hands and minds; easier to learn the tacit knowledge that goes with a technique.

  6. To make progress more rapidly.

  7. To tackle “bigger” problems (more important, more comprehensive, more difficult, global).

  8. To enhance productivity.

  9. To get to know people, to create a network, like an “invisible college”.

  10. To retool, learn new skills or techniques, usually to break into a new field, subfield, or problem.

  11. To satisfy curiosity, intellectual interest.

  12. To share the excitement of an area with other people.

  13. To find flaws more efficiently, reduce errors and mistakes.

  14. To keep one more focused on research, because others are counting on one to do so.

  15. To reduce isolation, and to recharge one’s energy and excitement.

  16. To educate (a student, graduate student, or oneself).

  17. To advance knowledge and learning.

  18. For fun, amusement, and pleasure.

The classic version of the h-index does not account for multiple authorship [160]. Because of this, Hirsch [161] defined another index, called the \(\overline{h}\)-index, as follows: A scientist has index \(\overline{h}\) if \(\overline{h}\) of his/her papers belong to his/her \(\overline{h}\)-core. A paper belongs to the \(\overline{h}\)-core of a scientist if it has \(\ge \overline{h}\) citations and in addition belongs to the \(\overline{h}\)-core of each of the coauthors of the paper. The \(\overline{h}\)-index shows one way to deal with multiple authorship in the process of evaluation of a researcher’s scientific production.

Another way has been proposed by Galam [162], who introduced the gh-index as follows. Let us consider the function \(g(r,k)\) that describes the fraction of the publication assigned to the rth author in the list of authors for a publication that has k coauthors. Then \(\sum \limits _{r=1}^k g(r,k) =1\). If an author has authored and coauthored T publications, then the fraction of publications that is assigned to this author will be

$$\begin{aligned} T_g = \sum \limits _{i=1}^T g_i(r,k). \end{aligned}$$
(2.19)

If the ith paper of the above set of T papers has \(n_i\) citations, then the number of citations of this paper that will be assigned to the rth author will be \(n_i g_i(r,k)\). Then the total number of citations that will be assigned to the investigated author will be

$$\begin{aligned} N_g = \sum \limits _{i=1}^T n_i g_i(r,k). \end{aligned}$$
(2.20)

There are different proposals for the form of the function \(g(r,k)\). Several of them are:

  • Egalitarian allocation: \(g(r,k) = \frac{1}{k}\) [163];

  • Arithmetic allocation: \(g(r,k)=\frac{2(k+1-r)}{k(k+1)}\) [164];

  • Geometric allocation: \(g(r,k) = \frac{2^{1-r}}{2(1-2^{-k})}\) [165], etc.

Galam proposed an allocation with bonuses for the first and for the last author of a publication as follows. Let a publication have k coauthors. We consider the decreasing arithmetic series \(k, k-1, \dots , 2, 1\) and two bonuses: \(\delta \) for the first author and \(\mu \) for the last author. Let us call them the bonus of the hard worker (the first author) and the bonus of the boss (usually the last author). The sum of the above arithmetic series and of the two bonuses is \(S_k = \frac{k(k+1)}{2} + \delta + \mu \). Then the function \(g(r,k)\) becomes

$$\begin{aligned} g(1,k)= & {} \frac{k+\delta }{S_k}; \nonumber \\ g(k,k)= & {} \frac{k-1+\mu }{S_k}; \nonumber \\ g(r,k)= & {} \frac{k-r}{S_k}; \end{aligned}$$
(2.21)

where \(g(1,k)\) and \(g(k,k)\) are defined when the publication has at least two coauthors, and \(g(r,k)\) for the intermediate positions \(1< r < k\) is defined only when the publication has at least three coauthors.

The final step is to set the values of the bonuses \(\delta \) and \(\mu \). These values have to be set by consensus. Possible relationships are \(\delta = 2 \mu \); \(\delta = 3 \mu +1\); etc. [162]. After setting the values of the bonuses, one can calculate the effective number of citations \(n_\mathrm{eff}(i)\) of the ith paper by

$$\begin{aligned} n_\mathrm{eff}(i) = n_i g_i(r,k), \end{aligned}$$
(2.22)

and then one can calculate the gh-index simply by calculating the h-index for the set of \(n_\mathrm{eff}(i)\), \(i=1,\dots ,N\). The gh-index obtained in such a way has a smaller value compared to h (the two indexes are equal only if the scientist has no coauthors for any publication).
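The allocation functions listed above and the calculation of the gh-index from the effective citations (2.22) can be sketched in Python as follows. This is only an illustration: the function names, the bonus values \(\delta = 1\), \(\mu = 0.5\) used in the checks, and the convention that a single author receives the whole paper are our assumptions and are not prescribed by [162].

```python
def egalitarian(r, k):
    """Equal share for each of the k coauthors."""
    return 1 / k

def arithmetic(r, k):
    """Arithmetic allocation: shares decrease linearly with the position r."""
    return 2 * (k + 1 - r) / (k * (k + 1))

def geometric(r, k):
    """Geometric allocation: each position receives half the share of the previous one."""
    return 2 ** (1 - r) / (2 * (1 - 2 ** (-k)))

def galam(r, k, delta, mu):
    """Allocation (2.21) with bonus delta (first author) and mu (last author)."""
    if k == 1:
        return 1.0  # assumption: a single author receives the whole paper
    s_k = k * (k + 1) / 2 + delta + mu
    if r == 1:
        return (k + delta) / s_k
    if r == k:
        return (k - 1 + mu) / s_k
    return (k - r) / s_k

def gh_index(papers, allocation):
    """h-index of the effective citations n_eff(i) = n_i * g(r_i, k_i).

    papers: iterable of (citations, position of the author, number of coauthors).
    """
    effective = sorted((n * allocation(r, k) for n, r, k in papers), reverse=True)
    gh = 0
    for rank, n_eff in enumerate(effective, start=1):
        if n_eff >= rank:
            gh = rank
    return gh
```

An allocation with fixed bonuses can be passed as `lambda r, k: galam(r, k, 1.0, 0.5)`.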

Finally, let us note one more index that deals with the problem of coauthorship [166]. This index is called the P-index of a researcher. Its definition is

$$\begin{aligned} P = \sum \limits _{k=1}^K A^*_k J_k, \end{aligned}$$
(2.23)

where

  • \(J_k\): journal impact factor of the journal where the kth paper of the researcher was published;

  • \(A^*_k\): \(A^*\)-index of the kth paper of the researcher.

The \(A^*\)-index is defined as follows. Let an article of the researcher have n coauthors that can be separated into \(m \le n\) groups, ordered by decreasing contribution, such that the coauthors within a group have the same credit, and let \(c_i\) be the number of coauthors in the ith group. The value of \(A^*\) for a coauthor from the group i is then

$$\begin{aligned} A^*(i) = \frac{1}{m} \sum \limits _{j=i}^m \frac{1}{\sum \limits _{k=1}^j c_k}. \end{aligned}$$
(2.24)

If no coauthors claim an equal contribution, then \(m=n\), \(c_i=1\), and

$$\begin{aligned} A^*(i) = \frac{1}{n} \sum \limits _{j=i}^n \frac{1}{j}. \end{aligned}$$
(2.25)
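A small Python sketch of the \(A^*\)-index and of the sum (2.23) is given below. It rests on two assumptions that we state explicitly: \(c_k\) is taken to be the number of coauthors in the kth group, and the outer sum for a coauthor from group i is taken to run from \(j=i\) to m, so that the credits of all coauthors of a paper add up to 1. The function names are hypothetical.

```python
from fractions import Fraction

def a_star(group_sizes, i):
    """Credit A*(i) of one coauthor from group i (1-based).

    group_sizes[k - 1] is c_k, the number of coauthors in group k; the groups
    are assumed to be ordered by decreasing contribution.
    """
    m = len(group_sizes)
    cumulative = sum(group_sizes[: i - 1])  # c_1 + ... + c_(i-1)
    total = Fraction(0)
    for j in range(i, m + 1):
        cumulative += group_sizes[j - 1]    # now c_1 + ... + c_j
        total += Fraction(1, cumulative)
    return total / m

def coauthor_p_index(papers):
    """Sum (2.23) of A*_k * J_k over the papers of a researcher.

    papers: iterable of (A* value of the researcher, journal impact factor).
    """
    return sum(a * j for a, j in papers)
```

For three coauthors without equal contributions, `a_star([1, 1, 1], 1)` gives 11/18 for the first author, 5/18 for the second, and 2/18 for the third.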

5.6 The m-Index

The m-index has been proposed in [167]. In order to define it, one needs to know about the Hirsch core. In the process of calculation of the Hirsch index, the papers of the scientist of interest are ranked with respect to the number of citations each of them has obtained. The papers from the ranked list whose rank is less than or equal to h build the Hirsch core of the ranked list of papers. Then the m-index is the median number of citations received by the papers in the Hirsch core.

The m-index focuses on the impact of publications with the highest citation counts and is a characteristic of the quality of the production of the evaluated scientist taken from the core of his/her most cited scientific production. The m-index for our two researchers is approximately:

  • Researcher A: \(m_A \approx 38.8\);

  • Researcher B: \(m_B \approx 34.6\).

The word approximately above was used because the multiplication of the numbers of citations for both researchers leads to a very large number that is represented only approximately by the simplest calculators. So the m-index usually can be calculated only approximately for researchers whose h-index is relatively large (e.g., greater than 15).

5.7 h-Like Indexes and Indexes Complementary to the Hirsch Index

One can define central area indexes and central interval indexes [168]. These indexes are connected to the Hirsch index and supply information complementary to the information obtained on the basis of the h-index. For example, the central area index of radius j is defined as follows:

$$\begin{aligned} A_j = (h-j) c_{h-j} + \sum \limits _{i=h-j+1}^{h+j} c_i; \ \ j=1, \dots , h-1, \end{aligned}$$
(2.26)

where h is the h-index and \(c_i\) are the citations received by the ith-ranked publication of the scientist (\(c_1\ge c_2\ge \dots \ge c_n\) for a scientist who has n publications). The idea of this index is to reduce one of the negative effects of the Hirsch index, which penalizes authors with heavy tails in their citation distribution. The central area index for such authors increases faster in comparison to the central area index for authors whose least-cited papers have a small number of citations.

Generalizations of the h- and g-indexes are presented in [169], and the robustness of the corresponding set of indexes is investigated. The result is that the most robust of them is the h-index, which is most insensitive to the extreme values of the corresponding citation distribution. At the expense of this, the h-index has quite low discriminating power (many scientists with different citation distributions can have the same h-index of their citations).

In [170], Egghe develops further an idea of Glänzel and Schubert [171] about characteristic scores and scales (CSS). The original idea is to determine, on the rank-order citation distribution, a sequence of points \(\varepsilon _k\), \(k = 1, 2, \dots \), and \(\sigma _k\), \(k = 1, 2,\dots \), where the \(\varepsilon _k\) are some ranks of papers and \(\sigma _k = \gamma (\varepsilon _k)\) are the corresponding characteristic scores, i.e., the number of citations to the paper of rank \(r =\varepsilon _k\). The function \(\gamma (r)\) gives the number of citations of the paper of rank r, and it is called the rank-order frequency function. In other words, let \(\sigma _1 = \mu \) be the average number of citations of the papers authored by a scientist. Let us discard all papers with fewer citations than \(\sigma _1\). The average number of citations of the remaining papers is \(\sigma _2>\sigma _1\). Let us remove the papers with fewer citations than \(\sigma _2\). The average number of citations of the remaining papers is \(\sigma _3>\sigma _2\). This process can be continued (as long as the set of remaining papers is not empty).
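The iterative procedure described above can be sketched in Python as follows. The function name, the cap `max_levels`, and the rule of stopping when no paper is discarded are our choices for the illustration.

```python
def characteristic_scores(citations, max_levels=5):
    """Successive characteristic scores sigma_1, sigma_2, ... of a citation list."""
    remaining = list(citations)
    scores = []
    while remaining and len(scores) < max_levels:
        sigma = sum(remaining) / len(remaining)
        scores.append(sigma)
        # Discard the papers with fewer citations than the current score.
        kept = [c for c in remaining if c >= sigma]
        if len(kept) == len(remaining):
            break  # all remaining papers have the same citation count
        remaining = kept
    return scores
```

For a hypothetical scientist with citation counts 0, 1, 2, 3, 4, 10, 20, the first score is the mean 40/7, the second is the mean of the two papers above it, and the third is the citation count of the single most cited paper.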

CSS is a set of indexes that characterize the distribution of citations of a scientist. As a multicomponent characteristic, it has the advantage of supplying evaluators with more information in comparison to the use of a single indicator (which is the average number of citations of the papers of the scientist). It can be shown that \(\sigma _k = \mu ^k\) for the case in which \(\gamma (r)\) satisfies Lotka’s law \(\gamma (r) \propto r^\alpha \) [172] (we shall discuss Lotka’s law in greater detail in Part III of this book). The idea of Egghe in [170] is to base the characteristic scores and scales on the h-index instead of on the average number of citations. For the case of validity of Lotka’s law, this leads to a sequence of values

$$\begin{aligned} h_k = \frac{-\sum \limits _{j=0}^{k-1} h_j+\left[ \left( \sum \limits _{j=0}^{k-1} h_j\right) ^2 + 4h^2\right] ^{1/2}}{2}, \end{aligned}$$
(2.27)

where \(h_0 = h\) and \(h_1 = h \frac{\sqrt{5}-1}{2}\) (h is the value of the h-index). Of course, the CSS can also be based on other indexes (on the g-index, for example).

Finally, let us discuss three recently introduced indexes that are complementary to the h-index [173, 174]. These indexes are called the perfectionism index (PIX), the extreme perfectionism index (EPIX), and the academic trace. Our notation is different from that in [173] in order not to confuse these indexes with the productivity indexes (PI) that will be discussed below.

Let us assume a researcher who has published p papers, and these publications have been cited C times. We recall that the h-index of the researcher separates his or her publications into two groups: the core (the h publications that are cited at least h times) and the tail (the other \(p-h\) publications). Let the number of citations of the publications from the core be \(C_H\) and the number of citations of publications from the tail be \(C_T\) (\(C=C_H+C_T\)). We define the following two quantities:

  • \(C_E = C_H - h^2\): this quantity accounts for the eventual large number of citations in the core area;

  • \(C_{TC}=h(p-h) - C_T\): this quantity penalizes researchers who wrote many papers that are not much cited (the mass producers).

Then the perfectionism index is

$$\begin{aligned} \mathrm{PIX} = \kappa h^2 + \lambda C_E - \nu C_{TC}, \end{aligned}$$
(2.28)

where \(\kappa \), \(\lambda \), and \(\nu \) are real numbers.

In order to define the extreme perfectionism index, we need also

  • \(C_{IC} = \sum \limits ^* (p-C_i)\),

where \(\sum \limits ^*\) means summation over all publications whose number of citations \(C_i\) is less than (the number of publications) p. Then the extreme perfectionism index is

$$\begin{aligned} \mathrm{EPIX} = \kappa h^2 + \lambda C_E + \mu C_T - \nu C_{IC}, \end{aligned}$$
(2.29)

where \(\kappa \), \(\lambda \), \(\mu \), and \(\nu \) are real numbers. The value of these numbers must be fixed, and the proposal to do this from [173] is just to set all of them to 1 (or to set some of them to 1 and the others to 0). Let us set the values of the parameters to 1. Then from (2.28), we obtain

$$\begin{aligned} \mathrm{PIX} = C + h(h-p). \end{aligned}$$
(2.30)

If a researcher has 65 publications with 900 citations and h-index equal to 20, then \(\mathrm{PIX}=0\). If the researcher has 1000 citations, then \(\mathrm{PIX}=100\). If the researcher has 500 citations, then \(\mathrm{PIX}=-400\). The classification of the influential scientists and mass producers is:

  1. If a researcher has \(\mathrm{PIX} >0 \), then he/she is an influential scientist;

  2. If a researcher has \(\mathrm{PIX} <0 \), then he/she is a mass producer.
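The calculation of PIX from a list of citation counts, and the shortcut (2.30) for unit weights, can be sketched in Python as follows. The function names are ours, and the sketch simply implements (2.28) with the quantities \(C_E\) and \(C_{TC}\) defined above.

```python
def h_index(citations):
    """Number of ranks r at which the r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for r, c in enumerate(ranked, start=1) if c >= r)

def pix(citations, kappa=1, lam=1, nu=1):
    """Perfectionism index (2.28) computed from the citation counts of all papers."""
    ranked = sorted(citations, reverse=True)
    p = len(ranked)
    h = h_index(ranked)
    c_core = sum(ranked[:h])          # C_H: citations of the core
    c_tail = sum(ranked[h:])          # C_T: citations of the tail
    c_e = c_core - h ** 2             # excess citations of the core
    c_tc = h * (p - h) - c_tail       # citations missing in the tail
    return kappa * h ** 2 + lam * c_e - nu * c_tc

def pix_unit_weights(c, h, p):
    """PIX for kappa = lambda = nu = 1, i.e., formula (2.30)."""
    return c + h * (h - p)
```

With 65 publications and an h-index of 20, `pix_unit_weights` gives 0, 100, and -400 for 900, 1000, and 500 citations, respectively, in agreement with the example above.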

The academic trace index is defined as follows [174]:

$$\begin{aligned} T = \frac{h^2}{p} + \frac{C_T^2}{C} + \frac{C_E^2}{C} - \frac{p_0^2}{p}, \end{aligned}$$
(2.31)

where

  • \(p_0\): number of publications that are not cited.

Another interesting index complementary to the h-index is defined in [152]. This index is

$$\begin{aligned} h_I = \frac{h^2}{N_a^{(T)}}, \end{aligned}$$
(2.32)

where h is the h-index and \(N_a^{(T)}\) is the total number of authors of the h-core of the corresponding author (multiple author occurrences in different papers are counted, e.g., if an author is coauthor in k papers, then he/she is counted k times).

It is claimed in [152] that the \(h_I\)-index rank plots collapse into a single curve. This is an important property, since in such a case, on the basis of the \(h_I\)-index, one can compare scientists from different scientific fields.

Dorogovtsev and Mendes [175] note that the use of only the h-index for assessment of research may lead to a reshaping of research behavior: misleading citation-based targets may substitute for the real aim of scientific research, namely strong results. If h is the value of the h-index and C is the number of citations of the articles of a researcher, then the region of small values of the ratio \(h/\sqrt{C}\) (i.e., the region where the researcher has a small number of very good articles that are highly cited) is occupied by outstanding researchers [105]. An interesting conclusion in [175] is that for given C, the h-index usually decreases with increasing \(\langle c \rangle = C/N\) (i.e., with increasing mean number of citations per paper, the h-index decreases). Thus it seems that the h-index favors modestly performing scientists and punishes stronger researchers with a large mean number of citations per paper. In order to make a better ranking of evaluated scientists on the basis of a single metric, the o-index was proposed in [175]:

$$\begin{aligned} o = \sqrt{\tilde{m}h}, \end{aligned}$$
(2.33)

where h is the value of the h-index of the researcher and \(\tilde{m}\) is the number of citations of the most cited paper of the same researcher. The motivation for such an index is that \(\tilde{m}\) accounts for the best result of the researcher, and h accounts for his/her persistence and diligence. In order to relate the o-index to the number of citations C of the researcher and to the mean number of citations per paper \(\langle c \rangle \), one may use the following estimates: \(h \sim \sqrt{C}\), and the number of citations of the most cited paper lies between the mean number of citations per paper \(n_2=C/N\) and the total number of citations \(n_1 = C\). Thus one may assume that \(\tilde{m} \sim C/\sqrt{N}\) (\(\tilde{m}^2 \sim n_1 n_2\)). Then

$$\begin{aligned} o \sim C^{3/4} N^{-1/4} = C^{1/2} \langle c \rangle ^{1/4}. \end{aligned}$$
(2.34)

Thus the o-index should grow with the average number of citations per paper.

The o-index considered above grows much faster with the number of citations than with the average number of citations per paper. If we want to put more weight on the average number of citations per paper, we can generalize the o-index as follows:

$$\begin{aligned} o_{\alpha ,\beta } = h^\alpha \tilde{m}^\beta . \end{aligned}$$
(2.35)

Then

$$\begin{aligned} o_{\alpha ,\beta } \sim C^{\alpha /2 + \beta } N ^{-\beta /2} = C^{(\alpha +\beta )/2} \langle c \rangle ^{\beta /2}. \end{aligned}$$
(2.36)

The o-index from [175] is

$$\begin{aligned} o = o_{1/2,1/2}. \end{aligned}$$
(2.37)

Let \(\beta >0\). Then if \(-\beta< \alpha <0\), we have \(\alpha + \beta < \beta \), which ensures a large weight of \(\langle c \rangle \). For example, let \( \alpha = -\beta + \delta \), where \(\delta >0\). Then

$$\begin{aligned} o_{\alpha ,\beta } \sim C^{\delta /2} \langle c \rangle ^{\beta /2}. \end{aligned}$$
(2.38)

If \(\delta \) is 0 or very close to 0, then the contribution of C to the index could be very small.
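The o-index (2.33) and its generalization (2.35) are simple to compute from a list of citation counts. The following Python sketch is only an illustration; the function names and the sample list are hypothetical.

```python
import math

def o_index(citations):
    """o-index (2.33): geometric mean of the h-index and the largest citation count."""
    ranked = sorted(citations, reverse=True)
    if not ranked:
        return 0.0
    # For a list sorted in decreasing order, the condition c >= r holds
    # exactly for the ranks r = 1, ..., h.
    h = sum(1 for r, c in enumerate(ranked, start=1) if c >= r)
    return math.sqrt(ranked[0] * h)

def generalized_o_index(h, m_max, alpha, beta):
    """Generalized o-index (2.35); h and m_max are assumed to be positive."""
    return h ** alpha * m_max ** beta
```

For the hypothetical list 100, 9, 4, 1 the h-index is 3 and the most cited paper has 100 citations, so that the o-index is \(\sqrt{300}\); the same value is obtained from `generalized_o_index(3, 100, 0.5, 0.5)`.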

Many other variants of h-indexes and h-like indexes exist. Let us note several of them:

  1. The two-sided h-index [176], which accounts for the papers and citations out of the Hirsch core and allows comparison of researchers with the same values of the h-index.

  2. The self-citations correction to the h-index [177] and to the g-index [178] (for discussion of the g-index, see Sect. 3.5).

  3. Multidimensional extension of the h-index [179].

  4. Successive h-indexes [180].

  5. h-type index of coauthor partnership ability [181].

  6. The \(q^2\)-index, which uses the number and impact of papers in the Hirsch core [182],

    $$\begin{aligned} q^2 = \sqrt{h m}, \end{aligned}$$
    (2.39)

    where h is the Hirsch index and m denotes the median number of citations received by papers in the h-core of the corresponding set of articles (this is the m-index discussed above). The \(q^2\)-index is designed to supply a more global view of the scientific production of researchers, since it is based on two indices that describe different dimensions of the research output: the h-index describes the number of papers (quantitative dimension) in a researcher’s productive core, while the m-index is connected to the impact of research output.

  7. The hg-index, which is the geometric mean of the h-index and g-index [183, 184]:

    $$\begin{aligned} hg = \sqrt{h \times g}. \end{aligned}$$
    (2.40)

    The value of the hg-index is between the value of the h-index of Hirsch and g-index of Egghe: \(h \le hg \le g\).

6 The g-Index of Egghe

Another very popular index based on the number of citations of the publications of a researcher is the g-index [185–188]. Let us make a list of the papers of a researcher ordered by the number of citations: the most-cited paper is at the top of the list, the second-most-cited paper is at place 2 of the list, and the least-cited paper is at the bottom of the list. Then:

The g-index is the largest natural number g such that the top g articles received (together) at least \(g^2\) citations.

The g-index accounts for the number of citations of the highly cited papers of a scientist. The citations from higher-cited papers are used to bolster lower-cited papers. Because of this, the value of the g-index is at least equal to the value of the h-index, and in most cases, the g-index has a larger value than the h-index of the corresponding scientist.

The g-index can be generalized as follows [143]. The g-index above is restricted to integer values. One can define a \(g^*\)-index that is not restricted to integer values. Let \(x_i\), \(i=1,\dots ,N\), be the number of citations of the ith article of a researcher ordered in such a way that \(x_1 \ge x_2 \ge \dots \ge x_N\). Let x(u) be a function that approximates the values of the sequence \(x_i\). Then one can define a continuous version of the g-index:

$$\begin{aligned} g^* = \max \left\{ u \mid \int \limits _{0}^{u} x(v) \, \mathrm{d}v \ge u^2 \right\} . \end{aligned}$$
(2.41)

The \(g^*\)-index is connected to the g-index as follows: \(g \le g^* < g+1\). The \(g^*\)-index can be generalized further. One can define the \(g^*_\alpha \)-index as follows:

$$\begin{aligned} g^*_\alpha = \max \left\{ u \mid \int \limits _{0}^{u} x(v) \, \mathrm{d}v \ge \alpha u^2 \right\} . \end{aligned}$$
(2.42)

It is clear that when \(\alpha = 1\), \(g^*_\alpha \) reduces to \(g^*\). In addition,

$$\begin{aligned} \lim _{\alpha \rightarrow 0} g^*_\alpha \sim s; \ \ \lim _{\alpha \rightarrow \infty } g^*_\alpha \sim c, \end{aligned}$$
(2.43)

where \(s = \sum \limits _{i=1}^N x_i\) is the total number of citations of all papers published by the scientist and c is the c-indicator defined above in the discussion devoted to the \(h_\alpha \)-index.

The g-index for our two researchers is as follows:

  • Researcher A: \(g_A = 33\);

  • Researcher B: \(g_B = 30\).

Let us note that the larger values of the g-index are more difficult to reach. Researcher B has 946 citations of his 31 most-cited publications; 960 citations of his 32 most-cited publications and 973 citations of his 33 most-cited publications. In order to reach a g-index of 31, he will need an additional 15 (\(29 - 14\)) citations of his top-cited 31 publications. In order to reach a g-index of 32 after reaching \(g=31\), he will need an additional 49 (\(63 - 14\)) citations of his top-cited 32 publications. Finally, in order to reach \(g=33\) from \(g=32\), he will need an additional 52 (\(65 - 13\)) citations of his top-cited 33 publications.
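The definition of the g-index and the arithmetic of the last paragraph can be checked with a short Python sketch. The function name is ours, and the sketch restricts g to the number of published papers. The cumulative citation counts 946, 960, and 973 are those quoted above for researcher B; as in the text, each step assumes that the previous value of g has just been reached with exactly the required number of citations.

```python
from itertools import accumulate

def g_index(citations):
    """Largest g such that the top g papers have together at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, total in enumerate(accumulate(ranked), start=1):
        if total >= rank ** 2:
            g = rank
    return g

# Citations of the 31, 32, and 33 most-cited publications of researcher B.
top_31, top_32, top_33 = 946, 960, 973

# Additional citations needed for g = 31: 961 - 946.
need_31 = 31 ** 2 - top_31
# After g = 31 is reached with exactly 961 citations, the 32nd paper adds 14.
need_32 = 32 ** 2 - (31 ** 2 + (top_32 - top_31))
# After g = 32 is reached with exactly 1024 citations, the 33rd paper adds 13.
need_33 = 33 ** 2 - (32 ** 2 + (top_33 - top_32))
```

The three values `need_31`, `need_32`, and `need_33` are 15, 49, and 52, in agreement with the text.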

The g-index can be temporally bounded. The temporally bounded g-index is the largest natural number g such that the top g articles received (together) at least \(g^2\) citations for some time interval (for example, for the last five years). The temporally bounded g-index allows for a comparison between the impacts of the papers of scientists working in the same scientific area. The g-index can be modified in order to account for multiauthorship of publications [189, 190].

Similar to the gh-index discussed above, one can obtain also a gg-index on the basis of the effective citations of the papers of the scientists as calculated by (2.22). There is a discussion as to whether the h-index or g-index is better [191]. Our experience shows that each of the two indexes gives a piece of information about the performance of researchers, and these pieces of information are not the same. Thus we recommend the use of both indexes together. For example, if one has to evaluate established researchers from the same research area of the natural sciences (on the occasion of competition for some award or some high academic position), then the set of the h-index and g-index is a good choice for a minimum set of indexes that can give an initial impression about the quantitative aspects of the results of the scientific work of the candidates.

7 The \(i_n\)-Index

This index simply counts the number of papers of the scientist that are cited at least n times. For example, the \(i_{10}\) index (used in Google Scholar) counts the number of papers that are cited at least ten times. There are two versions of this index:

Nonbounded \(i_n\)-index: This index counts the number of papers of the scientist that are cited at least n times over the scientist’s entire scientific career.

and

Temporally bounded \(i_n\)-index: This index counts the number of papers of the scientist that are cited at least n times within some time interval (for example, within the last five years).

The temporally bounded \(i_n\)-index allows a comparison between the impacts of the papers of scientists working in the same scientific area. The combination of the h-index, g-index, and several \(i_n\) indexes is another candidate for a set of indexes that may give a good initial impression about the quantitative aspects of the production of the evaluated researchers.

The \(i_n\) indexes for our two researchers are as follows:

  • Researcher A: \(i_{100}=0\); \(i_{50}=5\); \(i_{30}= 16\); \(i_{10}=41\);

  • Researcher B: \(i_{100}=1\); \(i_{50}=4\); \(i_{30}= 11\); \(i_{10}=44\).

Of interest is the temporally bounded \(i_{10}\) index of the two researchers for the last five years. It is:

  • Researcher A: \(i_{10}^{temp}=36\);

  • Researcher B: \(i_{10}^{temp}=16\),

which shows that many more units of scientific information of researcher A (36 publications) are recognized as relatively important in comparison with the units of research information (16 publications) of researcher B. But the longer research career of researcher B has led to a larger value of his non-temporally bounded \(i_{10}\)-index. With respect to the \(i_{30}\) and \(i_{50}\) indexes, researcher A has already an advantage (despite the shorter research career). Researcher B still has a lead with respect to \(i_{100}\).
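Both versions of the \(i_n\)-index can be sketched in Python as follows. The sketch counts the papers with at least n citations, which is the convention of the \(i_{10}\) index of Google Scholar; the function names and the data layout (one dictionary of yearly citations per paper) are assumptions made for the illustration.

```python
def i_n_index(citation_counts, n):
    """Number of papers with at least n citations."""
    return sum(1 for c in citation_counts if c >= n)

def citations_in_window(yearly_citations, first_year, last_year):
    """Citations of each paper received within [first_year, last_year].

    yearly_citations: list with one dict {year: citations} per paper.
    """
    return [
        sum(c for year, c in per_year.items() if first_year <= year <= last_year)
        for per_year in yearly_citations
    ]

# Hypothetical record of three papers.
papers = [{2010: 8, 2015: 5}, {2014: 12}, {2009: 30}]
career_totals = [sum(per_year.values()) for per_year in papers]
nonbounded_i10 = i_n_index(career_totals, 10)
bounded_i10 = i_n_index(citations_in_window(papers, 2012, 2016), 10)
```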

8 p-Index. \(IQ_p\)-Index

The p-index was introduced by Prathap [192, 193] on the basis of the exergy indicator

$$\begin{aligned} X = k^2 P, \end{aligned}$$
(2.44)

where P is the number of papers published by a scientist and \(k=C/P\) is the ratio of the number of citations C to the number P of papers published by the scientist. The p-index is defined on the basis of the indicator X as follows:

$$\begin{aligned} p= X^{1/3} =\left( k^2 P \right) ^{1/3}. \end{aligned}$$
(2.45)

The p-index is designed as a joint measure of publication–citation activity of a researcher. The values of this index for our two researchers are

  • Researcher A: \(p_A = 25.28\);

  • Researcher B: \(p_B = 21.09\).

The larger value of the p-index for researcher A is due to his better ratio between obtained citations and research publications. This ratio enters the index squared and compensates for the more than twice as large number of publications of researcher B.
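The values above can be reproduced with a few lines of Python. The sketch implements (2.44) and (2.45) directly and uses the totals of the two researchers (1375 citations of 117 publications for researcher A and 1562 citations of 260 publications for researcher B); the function name is ours.

```python
def p_index(citations, papers):
    """p-index (2.45): cube root of the exergy indicator X = k**2 * P with k = C / P."""
    k = citations / papers
    return (k ** 2 * papers) ** (1 / 3)

p_a = p_index(1375, 117)   # researcher A
p_b = p_index(1562, 260)   # researcher B
```

Rounded to two decimal places, `p_a` and `p_b` give 25.28 and 21.09.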

The \(IQ_p\)-index was introduced in [194] to measure the impact of a researcher along two dimensions: production (output, which is measured by the number of publications) and quality (measured by the number of citations). In order to define this index, one has to introduce a quantity called estimated citations E. It is defined as

$$\begin{aligned} E = \frac{ca(p+1)}{2}, \end{aligned}$$
(2.46)

where

  • a: age of the researcher;

  • p: number of papers written by the researcher;

  • c: correction factor reflecting the citations an average article receives in a particular research area. The value of c is based on the weighted aggregate journal impact factor of the top three subject categories in which the person has been cited.

Then \(IQ_p = QP\), where Q and P are the quality and production components of the index, defined as follows:

$$\begin{aligned} Q = \frac{C}{E}; \ \ P = p \frac{E/p}{p+E/p}, \end{aligned}$$
(2.47)

where C is the number of citations of the papers written by the scientist and the production P is measured by the number of adjusted papers [194]. The result is

$$\begin{aligned} IQ_p = \frac{C}{p+\frac{ac(p+1)}{2p}} \end{aligned}$$
(2.48)

Note that the value of this index depends on the manner of counting citations and publications.

Let us calculate the \(IQ_p\) index for our two researchers. We shall avoid the unknown quantity c in the following manner. For researcher B, we shall assume \(c=1\). This will correspond to 1562 citations/260 publications. Then the value of c for researcher A will be (1375 citations/117 publications)/(1562 citations/260 publications) \(=1.956\). Then for the two researchers, the values of the index are as follows:

  • Researcher A: \(IQ_p^A = 8.316\);

  • Researcher B: \(IQ_p^B = 5.356\).

The \(IQ_p\) index of researcher A is about 55 % larger than that of researcher B (\(8.316/5.356 \approx 1.55\)).

9 A-Index and R-Index

The equation for the A-index is [195]

$$\begin{aligned} A = \frac{1}{h} \sum _{i=1}^{h} C_i, \end{aligned}$$
(2.49)

where

  • h: the value of the h-index for the evaluated scientist.

  • \(C_i\): number of citations for the ith paper from the list of ranked papers connected with the h-index.

The A-index may be sensitive to the number of citations of a few highly cited papers. Consider two scientists, Alain and Paul. The h-index of Paul is larger than the h-index of Alain, but the most-cited papers of Alain are much more frequently cited than the papers of Paul. Then it can happen that the A-index of Alain is larger than the A-index of Paul.

Because of the above, one often uses an additional index called the R-index (R is used because the index contains a square root). Its equation is

$$\begin{aligned} R = \sqrt{\sum _{i=1}^{h} C_i} = \sqrt{A \cdot h}, \end{aligned}$$
(2.50)

where

  • h: the value of the h-index for the evaluated scientist.

  • \(C_i\): number of citations for the ith paper from the list of ranked papers connected with the h-index.

Because of the square root, the values of the R-index are not very large. Although there is no division by h, as in the case of A, the values of the two indexes do not differ much.
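The definitions (2.49) and (2.50) can be sketched in a few lines of code. The citation lists for Alain and Paul below are hypothetical; they are chosen only to reproduce the situation described above, in which the scientist with the smaller h-index has the larger A-index.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def a_and_r_index(citations):
    """Return (h, A, R) for a list of citation counts, Eqs. (2.49)-(2.50)."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    if h == 0:
        return 0, 0.0, 0.0
    core_sum = sum(ranked[:h])  # citations of the papers in the h-core
    return h, core_sum / h, math.sqrt(core_sum)


# Hypothetical citation counts.
alain = [120, 90, 5, 4, 1]
paul = [6, 6, 5, 5, 5, 2]
print("Alain:", a_and_r_index(alain))
print("Paul: ", a_and_r_index(paul))
```

In this example Paul has the larger h-index, while Alain has the larger A-index and the larger R-index, because two of his papers dominate the citations of his h-core.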

The R-index never decreases. This happens even if the corresponding scientist has ended his or her publication activity. One way to deal with this is to define an age-dependent R-index. The equation for this index is [195]

$$\begin{aligned} R^* = \sqrt{\sum _{i=1}^{h} \frac{C_i}{a_i}}, \end{aligned}$$
(2.51)

where

  • h: the value of the h-index for the evaluated scientist.

  • \(C_i\): number of citations for the ith paper from the list of ranked papers connected with the h-index.

  • \(a_i\): age of the ith article.
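A possible computation of the age-dependent index (2.51) is sketched below. We assume, for illustration only, that the age of an article is measured in years and that the publication year counts as age 1, so that no division by zero occurs; the data are hypothetical.

```python
import math


def age_dependent_r(papers):
    """Age-dependent R-index, Eq. (2.51).

    papers: list of (citations, age) pairs with age >= 1 (assumed convention).
    """
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)
    h = 0
    for rank, (cites, _age) in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    # Only the papers of the h-core contribute, each weighted by 1 / age.
    return math.sqrt(sum(cites / age for cites, age in ranked[:h]))


# Hypothetical data: (citations, age in years).
papers = [(40, 10), (9, 3), (4, 2), (1, 1)]
print(age_dependent_r(papers))
```

Here the h-core contains three papers, and the old, highly cited paper contributes only 40/10 = 4 to the sum under the square root.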

On the basis of the R-index, a dynamic h-type index can be defined [196]. This index is

$$\begin{aligned} d_h(T) = R(T) \nu _h(T), \end{aligned}$$
(2.52)

where R(T) is the R-index, equal to the square root of the sum of all citations received by articles belonging to the h-core at time T, and \(\nu _h(T)\) is the h-velocity at time T,

$$\begin{aligned} \nu _h(T) = \frac{dh}{dt}\mid _{t=T} = \lim _{t \rightarrow 0} \frac{h(T+t)-h(T)}{t}. \end{aligned}$$
(2.53)

The definition of \(d_h\) contains three time-dependent elements: the size and contents of the h-core; the number of citations received; and the h-velocity. According to [196], the time \(T=0\) should be chosen not at the beginning of the researcher’s career but five to ten years from the current moment of time (if the corresponding career is long enough). Then the function h(T) should be fitted for determination of \(\nu _h(T)\). There are several estimates of h(T) [123, 197]. The estimate of Egghe [123] is

$$\begin{aligned} h(t) = [P_\infty C(t)^{\alpha -1}]^{1/\alpha }, \end{aligned}$$
(2.54)

where C(t) is the continuous citation distribution function; \(P_\infty \) is the number of publications at \(t=\infty \); \(\alpha >1\) is the (Lotka) exponent for the citation function. Then

$$\begin{aligned} d_h(T) = R(T) \left[ P_\infty (\alpha -1) C(t)^{\alpha -2} \frac{dC}{dt} \right] \frac{[P_\infty C(t)^{\alpha -1}]^{(1-\alpha )/\alpha }}{\alpha }. \end{aligned}$$
(2.55)
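In practice, the h-velocity has to be estimated from a short time series of h-values. The sketch below uses a straight-line least-squares fit of h(T), which is our simplifying assumption and not the estimate (2.54); the function names and the data are hypothetical.

```python
import math


def h_velocity(years, h_values):
    """Least-squares slope of h against time, a simple estimate of dh/dt."""
    n = len(years)
    mean_t = sum(years) / n
    mean_h = sum(h_values) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in zip(years, h_values))
    den = sum((t - mean_t) ** 2 for t in years)
    return num / den


def dynamic_h(core_citations, years, h_values):
    """d_h = R * nu_h, Eq. (2.52), with R computed from the h-core at time T."""
    r_index = math.sqrt(sum(core_citations))
    return r_index * h_velocity(years, h_values)


# Hypothetical five-year window; T = 0 is placed at the start of the window.
years = [0, 1, 2, 3, 4]
h_values = [10, 11, 12, 13, 14]
core = [50, 40, 40, 30, 30, 30, 25, 25, 25, 25, 20, 20, 20, 20]  # 14 core papers
print(dynamic_h(core, years, h_values))
```

With these data the h-index grows by one unit per year and the h-core has received 400 citations, so that R = 20 and the dynamic index is 20.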

The values of the A-index and of the R-index for our two researchers are

  • Researcher A: \(A_A=40.6\); \(R_A = 30.561\);

  • Researcher B: \(A_B=38.6\); \(R_B = 27.784\).

The values of the A-index reflect the fact that the number of citations per publication from the h-core of researcher A is larger than the corresponding number of citations per publication of researcher B. The values of the R-index reflect the fact that the number of citations for the publications from the h-core of researcher A is larger than the number of citations for the publications from the h-core of researcher B.
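The reported values can be cross-checked by means of (2.50): since \(R = \sqrt{A \cdot h}\), the quotient \(R^2/A\) should be close to the (integer) h-index of the researcher. The short sketch below performs this check; the function name is ours, and the quotient is only approximately an integer because the values of A are rounded.

```python
def implied_h(a_index, r_index):
    """h implied by R = sqrt(A * h), that is, h = R**2 / A."""
    return r_index ** 2 / a_index


for name, a_val, r_val in [("A", 40.6, 30.561), ("B", 38.6, 27.784)]:
    print(name, round(implied_h(a_val, r_val), 2))
```

The quotients are close to 23 for researcher A and to 20 for researcher B.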

Let us end here the calculation of various indexes connected to the research production of the researchers A and B. We can summarize the obtained results as follows. We have calculated values only for a small number of the indexes discussed in this chapter. As an exercise, the interested reader may enlarge the table with the values of additional indexes. As one may see from Table 2.1, the values of the indexes give us compact quantitative information about the research production of researchers, and on the basis of the values of the indexes, we can compare the researchers. Such an evaluation should be made on the basis of a sufficiently large number of values of indexes. And in addition to quantitative evaluation, qualitative evaluation (peer review, etc.) of research production of researchers should be made.

Now let us continue the discussion of the indexes.

Table 2.1 Values of various indexes calculated for researchers A and B

10 More Indexes for Quantification of Research Production

10.1 Indexes Based on Normalization Mechanisms

  1. Index \(B_1\)

    For a set of n papers, this index is defined as [198–201]

    $$\begin{aligned} B_1 = \frac{\sum \limits _{i=1}^n c_i}{\sum \limits _{i=1}^n e_i}, \end{aligned}$$
    (2.56)

    where

    • \(c_i\): number of citations of the ith publication (\(i=1,2,\dots ,n\));

    • \(e_i\): expected number of citations of the ith publication.

    The expected number of citations \(e_i\) given the field and the year of publication is the average number of citations of all papers published in the same field and in the same year.

  2. Index \(B_2\)

    This index is defined as [198]

    $$\begin{aligned} B_2 = \frac{1}{n} \sum \limits _{i=1}^n \frac{c_i}{e_i}. \end{aligned}$$
    (2.57)

We note that the above two indexes should be used carefully for evaluation of sets of papers that were published very recently, since the expected number of citations \(e_i\) can then differ considerably from one publication year to the next.
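The difference between the two normalizations is that \(B_1\) is a ratio of sums, whereas \(B_2\) is a mean of ratios. A minimal sketch with hypothetical citation counts and hypothetical expected values illustrates that the two indexes need not coincide:

```python
def b1_index(citations, expected):
    """Ratio of sums, Eq. (2.56)."""
    return sum(citations) / sum(expected)


def b2_index(citations, expected):
    """Mean of ratios, Eq. (2.57)."""
    ratios = [c / e for c, e in zip(citations, expected)]
    return sum(ratios) / len(ratios)


# Hypothetical data: observed and expected citations of three papers.
citations = [10, 0, 6]
expected = [5.0, 2.0, 3.0]
print(b1_index(citations, expected))
print(b2_index(citations, expected))
```

In this example \(B_1 = 16/10 = 1.6\), while \(B_2 = (2+0+2)/3 \approx 1.33\): the uncited paper from a field with low expected citations weighs more heavily in \(B_2\).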

10.2 PI-Indexes

The popularity of the Hirsch index is due in great part to the fact that it is a composite index, because its value depends not only on the number and distribution of citations over journal papers but also on the number of papers. One of the problems of the h-index is that it is not appropriate for analysis of publication performance of scientists with a relatively small number of publications. Such a situation can arise in mathematics, for example. There are highly cited scientists with a relatively small number of publications. As additional indexes for quantification of results of scientific production in such cases, one can use the PI indexes [202]

$$\begin{aligned} PI(\log ) = \ln (PC^3), \end{aligned}$$
(2.58)

where

  • P: number of journal papers of the scientist;

  • C: total number of citations obtained by the journal papers;

$$\begin{aligned} PI(C)= & {} 0.01 (P+2C); \nonumber \\ PI(2C)= & {} 0.01(P+1.5C+2C_{3P}); \nonumber \\ PI(3C)= & {} 0.01(P+1.3C+3C_{3P}); \end{aligned}$$
(2.59)

where P and C are as above and \(C_{3P}\) is the number of citations of the three most-cited papers of the scientist.

One can imagine other kinds of PI indexes. For example,

$$\begin{aligned} PI_k = \ln (C_{kP})/(k), \end{aligned}$$
(2.60)

where \(C_{kP}\) are the citations of the most-cited k papers, etc.
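The PI indexes (2.58)–(2.60) are simple functions of P, C, and the citations of the most-cited papers. The following sketch computes them from a list of citation counts; the function names and the data are hypothetical, and the number of papers in (2.58) is denoted by P, as in the list of symbols above.

```python
import math


def pi_indexes(citations):
    """PI(log), PI(C), PI(2C), PI(3C) from per-paper citation counts.

    Requires at least one citation in total, since PI(log) uses a logarithm.
    """
    ranked = sorted(citations, reverse=True)
    P = len(ranked)        # number of journal papers
    C = sum(ranked)        # total number of citations
    C3P = sum(ranked[:3])  # citations of the three most-cited papers
    return {
        "PI(log)": math.log(P * C ** 3),
        "PI(C)": 0.01 * (P + 2 * C),
        "PI(2C)": 0.01 * (P + 1.5 * C + 2 * C3P),
        "PI(3C)": 0.01 * (P + 1.3 * C + 3 * C3P),
    }


def pi_k(citations, k):
    """PI_k = ln(C_kP) / k, Eq. (2.60)."""
    ranked = sorted(citations, reverse=True)
    return math.log(sum(ranked[:k])) / k


# Hypothetical scientist with few but highly cited papers.
citations = [50, 30, 20, 0, 0]
print(pi_indexes(citations))
print(pi_k(citations, 2))
```

For this list, P = 5 and C = \(C_{3P}\) = 100, so the indexes PI(2C) and PI(3C) reward the concentration of citations in the three most-cited papers.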

Another interesting kind of productivity index was introduced by Phelan [203, 204]. It is well suited for research fields in which the most important contributor is generally listed as the first author. In such fields, production might be better measured by an index that weights both first-author publications and citations. Such an index is

$$\begin{aligned} PI_i = \left( \frac{p_i c_i}{ \sum \limits _k p_k c_k} \right) ^{1/2}, \end{aligned}$$
(2.61)

where \(p_i\) equals the total number of first-authored publications and \(c_i\) equals the total number of citations from first-authored publications. The sum is over all k first authors of papers in the research field or subfield of interest. The value of \(PI_i\) can be multiplied by 100 for ease of reference.
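A minimal sketch of (2.61) is given below. The dictionary of first authors and its values are hypothetical; in an actual application, the sum would run over all first authors of the research field or subfield of interest.

```python
import math


def phelan_pi(first_author_stats):
    """PI_i of Eq. (2.61) for every first author.

    first_author_stats: {author: (p, c)} with p first-authored publications
    and c citations of these publications.
    """
    total = sum(p * c for p, c in first_author_stats.values())
    if total == 0:
        raise ValueError("the field has no cited first-authored publications")
    return {
        author: math.sqrt(p * c / total)
        for author, (p, c) in first_author_stats.items()
    }


# Hypothetical field with three first authors.
field = {"X": (4, 100), "Y": (3, 100), "Z": (3, 100)}
for author, value in phelan_pi(field).items():
    print(author, round(100 * value, 1))  # multiplied by 100 for ease of reference
```

By construction, the squares of the values \(PI_i\) sum to 1 over the field, so that \(PI_i^2\) may be read as the share of author i in the total of the products \(p_k c_k\).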

Vinkler [205] also proposed the index

$$\begin{aligned} \pi = 0.01 C_S \end{aligned}$$
(2.62)

where \(C_S\) is the total number of citations obtained by the S most-cited papers of the researcher. The number S is obtained as follows. One ranks all P publications of the researcher by the number of citations they have obtained. Then \(S=\sqrt{P}\).
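The computation of (2.62) can be sketched as follows. Since \(\sqrt{P}\) is in general not an integer, some rounding rule is needed; rounding to the nearest integer is our assumption here, and the data are hypothetical.

```python
import math


def pi_vinkler(citations):
    """pi = 0.01 * C_S, Eq. (2.62), with S = sqrt(P) rounded (assumed rule)."""
    ranked = sorted(citations, reverse=True)
    S = round(math.sqrt(len(ranked)))  # size of the elite set of papers
    return 0.01 * sum(ranked[:S])


# Hypothetical researcher with nine papers, hence S = 3.
citations = [300, 200, 100, 50, 40, 30, 20, 10, 0]
print(pi_vinkler(citations))
```

Here the three most-cited papers have 600 citations in total, so that \(\pi = 6\).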

10.3 Indexes for Personal Success of a Researcher

The h-index is not a proper quantity by which to compare scientists from different scientific fields, because of different citing behavior, different numbers of scientists working in different scientific fields, etc. Wu [206] proposed a field-independent index of the personal success of a researcher as follows:

$$\begin{aligned} F = \frac{1}{K} \sum \limits _{k=1}^K \sum \limits _{i \in k; i \in N } \frac{C_i(t)}{D_k(t)} \end{aligned}$$
(2.63)

where

  • \(k = \{1,\dots ,K\}\): index for numbering of subject categories in which the author has published;

  • \(i = \{1,\dots ,N\}\): index for numbering of published papers;

  • \(C_i(t)\): number of citations received up to some year of interest by the ith paper, published in the year t.

  • \(D_k(t)\): the average number of citations received up to the year of interest by all papers in the same publication year t as paper i and belonging to the same category k as the paper i.
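A minimal sketch of (2.63) follows. We assume, for simplicity, that every paper is assigned to exactly one subject category and that the baselines \(D_k(t)\) are available from a database; the category names and all numbers are hypothetical.

```python
def field_independent_success(papers, baselines):
    """F of Eq. (2.63).

    papers: list of (category, publication_year, citations).
    baselines: {(category, publication_year): D_k(t)}, the average number of
    citations of all papers of that category and publication year.
    """
    categories = {category for category, _year, _cites in papers}
    total = sum(
        cites / baselines[(category, year)]
        for category, year, cites in papers
    )
    return total / len(categories)  # division by K, the number of categories


# Hypothetical author who has published in two subject categories.
papers = [("math", 2010, 10), ("math", 2012, 4), ("physics", 2011, 30)]
baselines = {("math", 2010): 5.0, ("math", 2012): 2.0, ("physics", 2011): 15.0}
print(field_independent_success(papers, baselines))
```

Each of the three papers has twice the average number of citations of its category and year, so the double sum equals 6 and \(F = 6/2 = 3\).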

Another kind of success index (s-index) was proposed in [207–210]. It is connected to the indicator called the NSP (number of successful papers) [211]. From the point of view of NSP, a paper is successful if it has received more citations than there are references in its own list of references. The concept of a successful paper is refined further in the case of the success index. The paper i of a researcher is successful if its citations \(c_i\) are more numerous than the corresponding comparison term \(\text {CT}_i\) specific for the ith paper. In this case, the ith paper receives the score \(\text {sc}_i =1\). If the paper is not successful with respect to \(\text {CT}_i\), the ith paper receives the score \(\text {sc}_i =0\). The s-index is the sum of the scores \(\text {sc}_i\),

$$\begin{aligned} s = \sum \limits _{i=1}^p \text {sc}_i. \end{aligned}$$
(2.64)

The question is how to construct \(\text {CT}_i\). Two possible constructions are the following [209]:

  • The average (or median) number of references made by the articles published in the same journal and year of the publication concerned.

  • The average (or median) number of references made/received by a sample of publications representing the “neighborhood” of the publication concerned.
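Once the comparison terms are chosen, the computation of (2.64) is a simple count. In the sketch below, the comparison terms are hypothetical numbers standing for, e.g., the average number of references per article in the journal and year of each paper.

```python
def success_index(citations, comparison_terms):
    """s-index, Eq. (2.64): number of papers with c_i > CT_i."""
    return sum(
        1 for cites, ct in zip(citations, comparison_terms) if cites > ct
    )


# Hypothetical citations and comparison terms of four papers.
citations = [30, 12, 5, 40]
comparison_terms = [25.0, 12.0, 20.0, 35.5]
print(success_index(citations, comparison_terms))
```

Only the first and the fourth paper exceed their comparison terms (the second paper merely equals its term), so that \(s=2\). If \(\text {CT}_i\) is set equal to the number of references of the ith paper itself, the same function returns the NSP.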

The success index s can be connected, for example, to the h-index and to the g-index. Let all \(\text {CT}_i\) equal \(\chi \). Then the success index can be written as

$$\begin{aligned} s(\chi ) = \int \limits _{\chi }^{\infty } dj \ f(j), \end{aligned}$$
(2.65)

where f(j) can be connected to an information-production process as follows. An information-production process has sources (for example, publications) that produce items (which are citations when publications are the sources). Then f(j) is the density of the sources in item-density j. Let the size frequency function of the sources be a decreasing power law

$$\begin{aligned} f(j) = \frac{C}{j^\alpha }; \ C>0; \ \alpha \ge 1; \ j \ge 1 \end{aligned}$$
(2.66)

(this power law is called Lotka’s law and will be discussed in detail in Chap. 4). Then, for \(\alpha > 1\) (so that the integral in (2.65) converges), the success index is

$$\begin{aligned} s(\chi ) = \frac{C^*}{\chi ^{\alpha -1}}; \ \ C^* = \frac{C}{\alpha -1}. \end{aligned}$$
(2.67)

From the definition of success index \(s(\chi )\), one can easily see that:

  • If \(\chi = h\), then the success index \(s(\chi )\) is equal to the h-index of Hirsch;

  • If

    $$\begin{aligned} \chi = h \left( \frac{\alpha - 2}{\alpha - 1} \right) ^{1/\alpha }, \end{aligned}$$
    (2.68)

    then the success index \(s(\chi )\) is equal to the g-index of Egghe.
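The two statements above can be checked numerically. In the sketch below, h is obtained from the condition \(s(h)=h\), i.e., \(h^\alpha = C^*\), and the values \(C=2000\), \(\alpha =3\) are hypothetical. Note that (2.68) requires \(\alpha > 2\).

```python
def s_power_law(chi, C, alpha):
    """Success index of Eq. (2.67) for f(j) = C / j**alpha, alpha > 1."""
    return C / ((alpha - 1) * chi ** (alpha - 1))


def h_from_power_law(C, alpha):
    """Solution of s(h) = h, i.e. h**alpha = C / (alpha - 1)."""
    return (C / (alpha - 1)) ** (1 / alpha)


def chi_for_g(C, alpha):
    """Comparison term of Eq. (2.68); requires alpha > 2."""
    h = h_from_power_law(C, alpha)
    return h * ((alpha - 2) / (alpha - 1)) ** (1 / alpha)


C, alpha = 2000.0, 3.0  # hypothetical parameters of Lotka's law
h = h_from_power_law(C, alpha)
print(h, s_power_law(h, C, alpha))                 # the two values coincide
print(s_power_law(chi_for_g(C, alpha), C, alpha))  # value identified with g
```

For these parameters \(h=10\), and the success index at the comparison term (2.68) equals \(10 \cdot 2^{2/3} \approx 15.9\), which is larger than h, as expected for the g-index.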

Often, the personal success of a researcher is connected to his/her publication strategy. The publication strategy of a researcher can be characterized by two indexes: the PS-index (publication strategy index) [67, 201, 212] and the RPS-index (relative publication strategy index). These indexes use the impact factor of Garfield (see Sect. 3.16 from the next chapter). The indexes are defined as follows.

Publication strategy index

$$\begin{aligned} \text {PS} = \left( \sum \limits _{i=1}^N n_i G_i \right) /\left( \sum \limits _{i=1}^N n_i \right) , \end{aligned}$$
(2.69)

where

  • N: number of journals where the papers of the evaluated researcher (or evaluated research group) are published;

  • \(n_i\): number of papers published in the ith journal;

  • \(G_i\): impact factor of the ith journal.

The PS-index gives interesting additional information about the publication practices of the evaluated researchers. The index can be applied for monitoring the publication channels used by the evaluated researcher or group of researchers. Since researchers from different research fields use different channels, the value of the PS-index may depend greatly on the bibliometric characteristics of the research field. Because of this, the PS-index should be applied for comparison of sets of papers of authors working in similar research fields.

Relative publication strategy index

The RPS-index is calculated on the basis of the PS-index as follows:

$$\begin{aligned} \text {RPS} = \frac{\text {PS}}{G_m}, \end{aligned}$$
(2.70)

where \(\text {PS}\) is the value of the PS-index and \(G_m = \frac{1}{K} \sum \limits _{i=1}^K G_i\) is the mean of the impact factors of some reference set of K journals.
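Both indexes are weighted means and can be computed directly from the list of journals. The following sketch uses hypothetical numbers of papers and hypothetical impact factors; the reference set of journals is also an assumption made for illustration.

```python
def ps_index(journals):
    """PS of Eq. (2.69). journals: list of (n_i, G_i) pairs."""
    total_papers = sum(n for n, _g in journals)
    return sum(n * g for n, g in journals) / total_papers


def rps_index(journals, reference_impact_factors):
    """RPS of Eq. (2.70): PS divided by the mean impact factor G_m."""
    g_m = sum(reference_impact_factors) / len(reference_impact_factors)
    return ps_index(journals) / g_m


# Hypothetical researcher: (number of papers, impact factor) per journal.
journals = [(3, 2.0), (1, 4.0), (6, 1.0)]
reference_set = [1.0, 2.0, 3.0, 2.0]  # impact factors of K = 4 journals
print(ps_index(journals))
print(rps_index(journals, reference_set))
```

Here \(\text {PS} = 16/10 = 1.6\) and \(G_m = 2\), so that \(\text {RPS} = 0.8\): on average, the researcher publishes in journals whose impact factor is below the mean of the reference set.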

10.4 Indexes for Characterization of Research Networks

The theory of networks [213, 214] (and especially its branch devoted to social networks) already has many applications in different areas of research. Here are several examples:

  • biology [215, 216];

  • epidemic spreading [217–219];

  • crowd analysis [220];

  • human dynamics and community detection [221–224];

  • collaboration networks [225–232];

  • consensus formation and agreement dynamics [233–235];

  • study of spatial structures [236–238];

  • structure and evolution of the Internet [239–241];

  • rumor spreading [242, 243].

Network theory has also been applied to the study of the dynamics of research structures and to the evaluation of research production [244–246]. We expect that in the course of time, the number of these applications will grow steadily. Below, we give several examples of the use of concepts of network theory in the area of science dynamics and evaluation of research production.

  • Schubert, Korn, and Telcs [247] have constructed two indexes of Hirsch type to characterize properties of networks of scientists. The basic concept of these indexes is the degree h-index of a network. This index is defined as follows:

A network has a degree h-index of h if h is the largest number such that h of its nodes have degree at least h.

On the basis of the degree h-index of a network, two indexes have been constructed in [247]:

  • Degree h-index of paper \(h_p\): here the nodes of the network are the papers published in a journal, and the links are between papers that share at least one common author. Such a network of papers has degree h-index \(h_p\) if \(h_p\) is the largest number of papers in the network that have degree at least \(h_p\).

  • Degree h-index of authors \(h_A\): in this case, the nodes of the network are the authors who publish in a journal. Links of the network are between authors that coauthored at least one paper in the studied journal. In such a case, the network of authors has degree h-index \(h_A\), which is the largest number of authors in the network who have a degree at least \(h_A\).
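The degree h-index of authors can be computed from the author lists of the papers of a journal. The sketch below builds the coauthorship network and applies the h-type rule to the node degrees; the author labels and the list of papers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_h_index(edges):
    """Largest h such that h nodes of the network have degree at least h."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    degrees = sorted((len(n) for n in neighbours.values()), reverse=True)
    h = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


def coauthor_edges(papers):
    """Link two authors if they coauthored at least one paper."""
    edges = set()
    for authors in papers:
        for u, v in combinations(sorted(set(authors)), 2):
            edges.add((u, v))
    return edges


# Hypothetical author lists of four papers published in one journal.
papers = [["a", "b", "c"], ["a", "d"], ["b", "d"], ["e", "a"]]
print(degree_h_index(coauthor_edges(papers)))
```

The degrees of the authors a, b, c, d, e are 4, 3, 2, 2, 1, so that \(h_A = 2\). The index \(h_p\) can be obtained with the same function if the edges connect papers that share at least one author.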

Networks are important for the dynamics of science and scientific production [248, 249] (for example, important elements of scientific structure and processes are the collaboration networks and the networks connected to the citation of the results of scientific research). This importance is a factor for the increase in research on scientific networks and for the introduction of new indexes and indicators connected to these networks [250–254]. Let us mention several indexes and indicators for the reader’s information.

  • Network centrality in social networks has been much discussed since the famous paper of Freeman [255]. Network centrality refers to indicators and indexes that identify the most important vertices within a graph, connected to a certain (in our case scientific) network. An example of such an index is the C-index and its derivatives [256]. This index presents a network centrality measure for collaborative competence. Another network centrality measure is given by the l-index (lobby index) [257, 258].

  • Ausloos [259] measures the impact of the research of a scientist by means of his/her scientific network performance and defines the coauthor core in this network analogously to the core of papers defined by the h-index.

11 Concluding Remarks

A significant part of the discussion in this chapter was devoted to the h-index of Hirsch: to its strengths and weaknesses and to numerous h-like indexes and indexes complementary to the h-index. The reason for this is the popularity and widespread use of this index. Numerous other indexes are also discussed, and they may help evaluators to perform the quantitative part of the assessment of research production of individual researchers. It was demonstrated on the basis of data about citations of two researchers from the area of applied mathematics that these indexes may also provide useful information for the comparison of research production of researchers.

The indexes discussed above, e.g., the h-index, may also be calculated for research groups and departments as well as for research institutes, universities, and even for research communities of countries. Thus the indexes discussed in Chap. 2 may also be used for assessment of research production of groups containing many researchers.

We have applied numerous indexes above in the text in order to assess the research production of two researchers. The bibliometric analyses might go far beyond such computation and direct comparison of values of indexes. Analysis of links and relations in research networks and especially in copublication networks, analysis of citation impact, etc., may require a multidimensional approach and advanced data-analytical techniques such as cluster analysis or other data-analysis approaches that allow a simultaneous analysis of quantitative relationships among several variables. One example of an index that characterizes relations between two sets is the Jaccard index [260, 261]. Let us suppose we have two sample sets A and B. If A and B are both empty, one sets the Jaccard index \(J(A,B)=0\). Otherwise,

$$\begin{aligned} J(A,B) = \frac{\mid A \cap B \mid }{\mid A \cup B \mid } = \frac{\mid A \cap B \mid }{\mid A \mid + \mid B \mid - \mid A \cap B \mid }. \end{aligned}$$
(2.71)

The values of the Jaccard index are between 0 (inclusive) and 1 (inclusive). One can define the Jaccard distance

$$\begin{aligned} d_J(A,B) = 1 - J(A,B). \end{aligned}$$
(2.72)

An example of a bibliometric application of the Jaccard index is as follows. Let us suppose we have a list of references, and let A and B be two sample sets of references from this list containing \(n_A\) and \(n_B\) references. Let \(n_{AB}\) be the number of references that are present in both lists A and B. Then the Jaccard index for the two sample sets is

$$\begin{aligned} J = \frac{n_{AB}}{n_A + n_B - n_{A B}}. \end{aligned}$$
(2.73)

If the two lists of references are identical, then \(J=1\), and the corresponding Jaccard distance is \(d_J=0\). If the two lists of references are completely different (no references appear in both lists), then \(J=0\) and \(d_J=1\).
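The Jaccard index and the Jaccard distance of two lists of references can be computed with elementary set operations. In the sketch below, the reference labels are hypothetical, and the case of two empty sets follows the convention \(J=0\) adopted above.

```python
def jaccard_index(a, b):
    """Jaccard index of Eq. (2.71); two empty sets give 0, as in the text."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Jaccard distance of Eq. (2.72)."""
    return 1.0 - jaccard_index(a, b)


# Hypothetical reference lists with n_A = n_B = 3 and n_AB = 2.
refs_a = ["r1", "r2", "r3"]
refs_b = ["r2", "r3", "r4"]
print(jaccard_index(refs_a, refs_b))     # 2 / (3 + 3 - 2)
print(jaccard_distance(refs_a, refs_b))
```

The two lists share two of four distinct references, so that \(J = 0.5\) and \(d_J = 0.5\), in agreement with (2.73).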

The results of multidimensional bibliometric analyses can be presented very effectively by various kinds of maps and landscapes, and because of this, the importance of these kinds of visualization techniques is increasing continuously. Additional indexes characterizing relations among sets of units will be described in the next chapter.