Three novel indirect indicators for the assessment of papers and authors based on generations of citations

Fragkiadaki, Eleni; Evangelidis, Georgios

doi:10.1007/s11192-015-1802-4

Published: 31 December 2015

Volume 106, pages 657–694, (2016)

Scientometrics

An Erratum to this article was published on 25 February 2016

Abstract

A new indirect indicator is introduced for the assessment of scientific publications. The proposed indicator ($fp^{k}$-index) takes into account both the direct and indirect impact of scientific publications and their age. The indicator builds on the concept of generations of citations and acts as a measure of the accumulated impact of each scientific publication. A number of cases are examined that demonstrate how the indicator behaves under well-defined conditions in a Paper-Citation graph, such as when a paper is cited by a highly cited paper, when cycles exist, and when self-citations and chords are examined. Two new indicators for the assessment of authors are also proposed (fa-index and fas-index) that utilize the $fp^{k}$-index values of the scientific publications included in the Publication Record of an author. Finally, a comparative study of the $fp^{k}$ and $fa^{k}$ indices and a list of well-known direct (Number of Citations, Mean number of citations, Contemporary h-index) and indirect (PageRank, SCEAS) indicators is presented.

Introduction

Scientific publications are responsible for disseminating the research results and achievements of scientists and scientific groups. The term describes any scientific document that has been peer reviewed and published in a way that can assist other researchers and be referenced in their work. Different types of scientific documents can be considered, such as master's and doctoral theses, review articles, conference papers and journal articles, technical reports and documents, books and book chapters, short communications and commentaries. In the rest of the paper, the term paper will be used to describe any of the above items and the term author for scientists and researchers that publish papers.

Published papers do carry knowledge and their content has passed through a review process prior to their publication. Therefore, there is value attached to every published paper, though not all published papers have the same impact on their respective field. Several bibliometric indicators have been proposed to evaluate the importance of a paper and/or its acceptance by the scientific community.

The most fundamental indicator for assessing the scientific impact of a paper is the total number of citations received. A number of researchers have argued that the importance of a paper should be considered by examining not only its direct impact but also the impact of the papers that have cited it (Rousseau 1987; Dervos and Kalkanis 2005; Sidiropoulos and Manolopoulos 2005; Walker et al. 2007; Ma et al. 2008; Maslov and Redner 2008; Yan et al. 2011; Xiaojun et al. 2011; Egghe 2011b; Cheng et al. 2011). By doing so, one considers not only the visibility of the paper but also its prestige.

Consequently, a number of indirect indicators have been proposed, some of which are alterations or adaptations of the PageRank algorithm that was originally defined for ranking pages on the web (Page et al. 1999). More specifically, Ma et al. (2008) propose the application of PageRank to citation analysis and adapt the damping factor to better represent the walk of a random “researcher” rather than a random “surfer” (Chen et al. 2007). CiteRank (Walker et al. 2007; Maslov and Redner 2008) is another example of a PageRank-based algorithm for assessing a paper; it takes into account the age of the paper in order to increase its probability of being the starting point of a random walk. Prestige-Rank (Cheng et al. 2011) was proposed in order to account for the incompleteness of the Paper-Citation graph, which originates from the fact that no bibliometric database actually includes all the citations given to a particular paper. P-Rank (Yan et al. 2011) is another PageRank-based indicator that utilizes the Paper-Citation graph and information about the co-authors of the papers and the journals in which the papers have been published.
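To make the shared starting point of these indicators concrete, the following is a minimal sketch of plain PageRank computed by power iteration over a small citation graph. It is not the method of any of the cited works. The function name `pagerank`, the sample graph, the damping value of 0.85 (the value conventionally used for web pages) and the uniform redistribution of rank from papers without references are all assumptions made for illustration.

```python
def pagerank(references, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    `references` maps each paper to the list of papers it cites
    (no duplicate targets are assumed). Returns a dict paper -> score.
    """
    papers = sorted(set(references) | {t for ts in references.values() for t in ts})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers without references (dangling nodes) spread their rank uniformly.
        dangling = sum(rank[p] for p in papers if not references.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for source in papers:
            targets = references.get(source, [])
            for target in targets:
                # Each cited paper receives an equal share of the citing paper's rank.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: P2 cites P1, P3 cites P1 and P2, P4 cites P3.
sample_references = {"P2": ["P1"], "P3": ["P1", "P2"], "P4": ["P3"]}
scores = pagerank(sample_references)
top_paper = max(scores, key=scores.get)
```

In this toy graph the scores sum to one and `P1`, which is cited both directly and through a chain of citing papers, ends up with the highest score. Adapting the damping factor, as discussed above, would amount to passing a different `damping` argument.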

SCEAS Rank (Sidiropoulos and Manolopoulos 2005) takes a similar approach to PageRank but introduces an indicator that defines the contribution of direct citations to be greater than the contribution of indirect citations. It also specifies that indirect citations should have a greater impact on papers in their neighborhood than on distant papers. We examine both of these principles in this paper. Other examples are the Cumulative patent citations and the Weighted cumulative patent indicators (Atallah and Rodríguez 2006), which do not originate from PageRank but follow a different approach in evaluating indirect citations. These indicators were originally defined for a Patent-Citation graph, a network identical to the Paper-Citation graph if patents are replaced by papers. Their aim was to measure the impact of a patent by considering the direct and indirect citations received and the closeness of citations to the patent under scrutiny. Finally, another approach was followed in Fragkiadaki et al. (2011), where the f-value indicator accounts for all indirect citations and includes a reducing factor that can be used to simulate the different citation patterns between different scientific fields.

Apart from the indirect indicators for the assessment of papers, a number of indirect indicators have also been proposed for the assessment of authors. SARA (Radicchi et al. 2009) is an indicator that follows a PageRank approach applied to a Weighted Author-Citation graph but with slight differences, mainly around the distribution of impact from dangling nodes (authors that do not appear to cite any other author in the graph). Another indicator that constructs and uses the Author-Citation graph has been proposed by Fiala et al. (2008) and Fiala (2012). The authors introduce a modification of PageRank where citations between authors are examined individually based on a number of factors, such as the total number of publications of each author, the number of common publications between two authors, the number of distinct co-authors, the number of citations from one author to the other, as well as the year of each author-to-author citation. Another approach was followed by Kosmulski (2010) and Egghe (2011a, b). Both authors propose an indirect indicator based not only on the direct citations of a paper but also on the direct citations received by the citing papers (second generation citations). They choose to apply these indicators over a different set of papers included in the Publication Record of an author, thus producing different results meant to be used either as standalone (hfg-index) or as complementary (Indirect h-index). Finally, Xiaojun et al. (2011) propose the use of Generational indices as indirect indicators calculated per generation of citations with regard to a target paper and the use of Cross-generational indices as cumulative measurements of impact.

To summarize, there are a number of indirect indicators that one can use in order to assess the impact of a paper or author depending on the criteria at hand.

The first indicator proposed in this paper, $fp^{k}$-index, considers several aspects of the Paper-Citation graph, such as the existence of cycles, the existence of more than one citation path of the same or different length from a source paper to a target paper, as well as the scientific age of the paper, in order to produce the individual paper scores. The next two indicators proposed, fa-index and fas-index, are based on the individual $fp^{k}$-index values of the papers included in the Publication Record of an author. These indicators provide the means for assessing an author and we demonstrate that they are time aware and, in most cases, size independent. In addition, fas-index also accounts for the existence of self-citations for the individual authors of a paper.

In “Theoretical background” section, the Paper-Citation graph is presented in detail along with the different types of citation generations and some of the properties of the graph are discussed in more detail, like self-citations, chords and cycles. “The meaning of generations of citations” section further discusses citation generations and presents an example of the application of citation generations and citation generation counts in order to justify the reasons behind the type selected for the indicators introduced in this paper. In “$fp^{k}$-index definition” section, the $fp^{k}$-index indicator is defined and two examples of its application are presented in “Application and comparison of $fp^{k}$-index with Number of citations (NC) and PageRank” section. In that section, we compare $fp^{k}$-index to two well known indicators for the assessment of papers, namely, the Citation count and PageRank. The fa- and fas-index are defined in “$fa^{k}$ and $fas^{k}$ indices definition” section and an application of both indicators is given in “Application of the $fa^{k}$ and $fas^{k}$ indices” section. “Comparative study” section presents a comparative study of the proposed indicators to other well known indicators of direct and indirect impact found in the literature, along with experimental results for the rankings produced by each indicator based on the data provided by DBLP. Finally, the paper concludes in “Conclusions” section.

Theoretical background

We present an overview of the Citation graph along with the available meta-data information definitions for each paper participating in a closed paper collection. In addition, the generations of citations are examined in detail and a thorough example of the four types of forward generations is discussed. Generations of self-citations and the concept of chords are also considered.

Citation graph

Citation graphs are constructed from the meta-data available for the papers included in a closed set of papers. The base form of a citation graph is the Paper-Citation graph, but there are other types of derived graphs like the Author-Citation graph and the Journal-Citation graph. Derived graphs are constructed from the Paper-Citation graph by applying appropriate transformations as presented in Fragkiadaki and Evangelidis (2014). Here, we only present the Paper-Citation graph along with the notations used throughout this paper to describe the different properties of this graph.

The Paper-Citation graph is a directed graph whose nodes are the papers included in the collection and edges are defined based on the citations present in the Reference lists of these papers. A directed edge from a source paper (S) to a target paper (T) exists if the source paper (S) includes the target paper (T) in its list of references. We denote this relationship between papers S and T as “S references T” or “T is cited by S”, and the corresponding notation for this edge is $S\rightarrow T$.
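The edge definition above can be sketched in a few lines of Python. The edge list here is hypothetical and is not the graph of Fig. 1; the names `references`, `cited_by` and `has_edge` are likewise chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edges (S, T), each read as "S references T" or "T is cited by S".
# They are illustrative only and do not reproduce Fig. 1.
edges = [("P2", "P1"), ("P3", "P1"), ("P3", "P2"), ("P4", "P3")]

references = defaultdict(set)  # S -> set of papers in the reference list of S
cited_by = defaultdict(set)    # T -> set of papers that cite T directly

for source, target in edges:
    references[source].add(target)
    cited_by[target].add(source)


def has_edge(source, target):
    """Return True if the directed edge source -> target exists."""
    return target in references.get(source, set())
```

Keeping both mappings makes each direction of the relationship a single lookup: `references["P3"]` gives the reference list of `P3`, while `cited_by["P1"]` gives the papers that cite `P1`. The edge direction matters, so `has_edge("P3", "P1")` holds here while `has_edge("P1", "P3")` does not.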

Apart from the papers and the citation data, the Paper-Citation graph includes additional information originating from the meta-data available for each paper. This information includes the author list of each paper, the publication year and the publication journal. The different entities participating in this Paper-Citation graph along with the different properties of the graph are described by the following notations, as they were first presented in Fragkiadaki and Evangelidis (2014):

$\mathbf {P}=\{\mathbf {P}_{\mathbf{1}},\mathbf{P}_{\mathbf{2}},\ldots ,\mathbf{P}_{\mathbf{NP}}\}$ denotes the closed set of papers participating in a Paper-Citation graph and $\mathbf {NP}$ is the total number of papers included in the collection.
$\mathbf {A}=\{\mathbf{A}_{\mathbf{1}},\mathbf{A}_{\mathbf{2}},\ldots ,\mathbf{A}_{\mathbf{NA}}\}$ denotes the set of authors that have participated in any of the papers included in the Paper-Citation graph. $\mathbf {NA}$ denotes the total number of authors participating in the Paper-Citation graph.
$\mathbf {J}=\{\mathbf{J}_{\mathbf{1}},\mathbf{J}_\mathbf{2},\ldots ,\mathbf{J}_{\mathbf{NJ}}\}$ denotes the set of journals in which the papers of the Paper-Citation graph were published. $\mathbf {NJ}$ denotes the total number of journals participating in the Paper-Citation graph.

An example of a Paper-Citation graph can be found in Fig. 1. Using the notations presented earlier, the following hold for this graph:

$P=\{P_{1},P_{2},P_{3},P_{4},P_{5},P_{6},P_{7}\}$ is the set of papers in our collection and $NP=7$
$A=\{A_{1},A_{2},A_{3},A_{4},A_{5}\}$ is the set of authors and $NA=5$
$J=\{J_{1},J_{2},J_{3}\}$ is the set of journals and $NJ=3$

The Paper-Citation graph of Fig. 1 may also be presented in the form of a table, which we call the Paper-Citation table and for our sample graph is shown in Table 1. Each row of the table describes a particular paper and includes the list of co-authors, the publication year and publication journal, the list of papers referenced by the paper and the list of papers that directly cite the paper.

Table 1 Paper-Citation table for the Paper-Citation graph of Fig. 1
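As a rough illustration of how such a table could be held in code, the sketch below stores one row per paper and derives the "cited by" column from the reference lists. The rows are hypothetical and do not reproduce the contents of Table 1; the names `PaperRow` and `build_table` are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PaperRow:
    """One row of a Paper-Citation table."""
    paper: str
    authors: list
    year: int
    journal: str
    references: list
    cited_by: list = field(default_factory=list)


def build_table(rows):
    """Index rows by paper and fill in the cited_by column from the reference lists."""
    table = {row.paper: row for row in rows}
    for row in rows:
        row.cited_by = []
    for row in rows:
        for target in row.references:
            # Only citations inside the closed collection are recorded.
            if target in table:
                table[target].cited_by.append(row.paper)
    return table


# Hypothetical rows, not the data of Table 1.
rows = [
    PaperRow("P1", ["A1", "A2"], 2001, "J1", []),
    PaperRow("P2", ["A2"], 2003, "J2", ["P1"]),
    PaperRow("P3", ["A1", "A3"], 2005, "J1", ["P1", "P2"]),
]
table = build_table(rows)

# The sets P, A and J and their sizes follow directly from the rows.
papers = set(table)
authors = {a for row in rows for a in row.authors}
journals = {row.journal for row in rows}
```

For these sample rows, `len(papers)`, `len(authors)` and `len(journals)` play the roles of $NP$, $NA$ and $NJ$. Storing only the reference lists and deriving `cited_by` keeps the two columns consistent by construction.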
