Introduction

The use of bibliometric indicators to measure the research performance of researchers and research units is becoming established practice. These indicators are starting to be routinely used as criteria for academic recruitment and career progression, research funding, and public rankings of academic institutions or individual researchers (Lehmann et al. 2006).

Bibliometric studies of research performance currently use a diverse set of indicators, focusing on attributes of journals, publications, and citations. In terms of the complexity, quantity, and quality of the data and processing required, indicators range from simple counts to sophisticated normalized scores that control for document types, publication years, and scientific fields. Among the most widely used indicators are the crown indicator (Moed et al. 1995), the h-index (Hirsch 2005), the g-index (Egghe 2006), and the citation z-score (Lundberg 2007).

Applying these indicators to compare researchers from different scientific fields is difficult: either the indicators rely on field classifications with broad categories, which fail to capture increasingly specific multidisciplinary or interdisciplinary research profiles, or they fail to account for the differing publication and citation patterns among scientific fields.

This is an essential concern in contexts such as the one that motivated this study: a school-wide scientific excellence award created at the Faculty of Engineering of the University of Porto (FEUP). The award annually distinguishes a researcher with outstanding research performance over the preceding 5-year period.

In this paper, we describe a new indicator—the x-index—designed to address this critical concern, and use the data gathered for the 2009 award to illustrate its computation, utilization, and features. The paper is organized as follows: “Methodology” section describes the indicator and its development; “Data collection” section presents the data collection process; “Results” section displays the results of this study; “Discussion and conclusion” section contains a discussion of the results, thoughts on future work, and concluding remarks.

Methodology

To compare the candidates for the award, we were interested in an indicator that would meet three important requirements: include information on both the quantity and quality of publications; rely on a data set that does not require extensive extraction and computation effort; and provide appropriate comparisons among researchers from different scientific fields.

We started our work from the state-of-the-art indicator used to produce a successful Top-40 ranking of economics researchers in the Netherlands (Nederhof 2008).

The indicator is computed by adding individual publication scores:

$$ I = \sum\limits_{p = 1}^{n} {I_{p} } , $$
(1)

where n is the number of publications and $I_p$ is publication p's score.

Each publication p’s score is given by

$$ I_{p} = {\frac{2}{{N_{p} + 1}}}L_{p} C_{p} , $$
(2)

where $N_p$ is the publication's number of authors, $L_p$ is the publication's length normalized to the average number of pages of publications in the same journal in the same year, and $C_p$ is the journal's average number of citations per publication normalized to the field average. The factor $2/(N_p + 1)$ is a co-authorship share coefficient.
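For illustration, a minimal Python sketch of this publication score, assuming the normalized length and the normalized journal citation rate have already been computed from journal and field data (precisely the effort discussed below):

    # Sketch of the original publication score (Eq. 2); norm_length (L_p) and
    # norm_citation_rate (C_p) are assumed to be pre-computed normalized values.
    def original_publication_score(n_authors, norm_length, norm_citation_rate):
        # I_p = 2 / (N_p + 1) * L_p * C_p
        return 2.0 / (n_authors + 1) * norm_length * norm_citation_rate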

The indicator met our first requirement, as it includes information on the quantity and quality of publications. However, it did not meet the second requirement, due to the vast amount of data and processing required to normalize page lengths and average citation numbers for journals, and it only partially met the third requirement, because it relies on a scientific field classification with wide categories, which fails to appropriately characterize increasingly specialized individual research profiles.

To overcome these difficulties, we developed a new indicator, by adapting the original one, in the following ways:

  • To reduce data and processing requirements, we adapted the publication scores by replacing the journal score with the 5-year Impact Factor. This removed the normalization from the score, so we reintroduced it in a different way. We also left out the length of the publication, since it is increasingly a stable, journal-specific number. With these adaptations, we are able to compute an absolute score for the researcher.

  • To normalize that absolute score, we compare it to a weighted average of the absolute scores of the researchers with the most publications in each journal where the researcher has published. The weights are given by the proportion of each publication's score in the researcher's absolute score.

By focusing on the mix of journals in which a researcher publishes, instead of using an established scientific field classification, we obtain a bottom-up definition of the researcher's specific publication profile. By using data about the researchers with the most publications in each journal, who can be identified through functionality readily available from citation data providers, we substantially reduce the data extraction and processing effort required to compute the indicator.

The indicator is defined and computed in the following way:

  1.

    We compute an absolute score S of publishing quantity and quality, according to the following definition:

    $$ S = \sum\limits_{p = 1}^{n} {S_{p} } , $$
    (3)

    where $S_p$ is the individual quality score for publication p, and n is the total number of publications.

    The scores $S_p$ are computed in the following way:

    $$ S_{p} = {\frac{2}{{N_{p} + 1}}}F_{p} , $$
    (4)

    where $F_p$ is the 5-year Impact Factor of publication p's journal, and $2/(N_p + 1)$ is a co-authorship share coefficient, with $N_p$ being publication p's number of authors. This co-authorship share coefficient can be replaced with alternative definitions which, for instance, place different emphasis on collaboration (Abbasi 2009).

  2.

    In order to enable the comparison among scientific fields, we then build the x-index as a relative score:

    $$ X = \frac{S}{R}, $$
    (5)

    where R is a reference score for the researcher’s specific scientific field.

    This reference score R is computed as a weighted average of the individual publication reference scores $R_p$:

    $$ R = \sum\limits_{p = 1}^{n} {w_{p} R_{p} } . $$
    (6)

    The weights $w_p$ are given by the proportion of $S_p$ in the researcher's absolute score S:

    $$ w_{p} = {\frac{{S_{p} }}{S}}. $$
    (7)

    The individual publication reference score $R_p$ consists of the following average:

    $$ R_{p} = \frac{1}{3}\left( {S_{p}^{1} + S_{p}^{2} + S_{p}^{3} } \right), $$
    (8)

    where $S_p^a$ is the absolute score of the researcher with the a-th largest number of publications in publication p's journal.

    Individual publication reference scores $R_p$ may also be replaced by alternative definitions, such as a maximum, a minimum, a measure based on a different number of top researchers, or others.

When a researcher’s absolute score is the same as the average absolute score for the three top researchers in each journal, the value of the x-index is 1. When the absolute score is systematically above or below, the x-index will also be, respectively, above or below 1. In mixed circumstances, the magnitude of the x-index will depend on the specific magnitudes of the differences, and the weights of the publications.

Data collection

In this study we use data collected for the 2009 scientific excellence award. The award covered the 5-year period 2004–2008, and in this first edition only publications in journals covered by the 2008 Journal Citation Reports (JCR) were considered. To apply for the award, researchers were required to have at least 15 publications or 7.5 FTEP (Full-Time Equivalent Publications), with each publication contributing additively to the FTEP score with a co-authorship share coefficient of $2/(N_p + 1)$.
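Under this reading of the eligibility rule, the check reduces to a few lines of Python; author_counts, holding the number of authors of each of the candidate's eligible publications, is an assumed input prepared from the publication data.

    # Eligibility sketch: at least 15 publications, or at least 7.5 FTEP,
    # each publication contributing 2 / (N_p + 1) to the FTEP total.
    def is_eligible(author_counts):
        ftep = sum(2.0 / (n + 1) for n in author_counts)
        return len(author_counts) >= 15 or ftep >= 7.5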

There were 14 candidates for the award, from only two departments: Chemical Engineering (Department A) with 8 candidates, and Mechanical Engineering (Department B) with 6 candidates.

The candidates from the Chemical Engineering Department had mostly published in journals included in the following JCR categories: Biotechnology & Applied Microbiology, Analytical Chemistry, Chemical Engineering, Environmental Engineering, and Environmental Sciences. Individual candidates also had publications in more than one of these categories. A similar observation was made regarding the candidates from the Mechanical Engineering Department, who had the majority of their publications in journals included in the following JCR categories: Manufacturing Engineering, Mechanical Engineering, Composites Materials Science, and Applied Mathematics. The differences in categories between the departments provide evidence of the need to account for differing publication and citation patterns among scientific fields, whereas the diversity of categories within departments, and even for individual researchers, is evidence of the need to account for candidate-specific research profiles.

Publication data was extracted from the Institute for Scientific Information (ISI) Web of Science by a team from the Faculty's Library and Information Services. The extraction covered approximately 150 journals and required approximately 10 person-days, a considerable effort even for such a restricted number of candidates. A critical part of the extraction process was the identification of individual researchers, which was in general achieved with a combination of name and institution, and was facilitated by the fact that the data concerns researchers with high publication rates and journals covered by the JCR, both of which tend to be particularly careful with researcher identification data.

The data was imported into spreadsheets, in order to enable the subsequent computation of the scores.

Results

Score computation example

We start by providing detailed data about one of the researchers, researcher J, in order to illustrate how the x-index is computed. Names of researchers and titles of publications and journals are omitted for privacy reasons.

Table 1 shows the absolute scores of the three researchers with the most publications in each journal in which researcher J published. The reference score for each journal is the average of the three scores.

Table 1 Scores for the authors with the highest numbers of publications in the journals where researcher J published in the period 2004–2008

In Table 2 we display the remaining data required to compute the score: the number of authors $N_p$ and the 5-year Impact Factor $F_p$, involved in the computation of each publication's score $S_p$; the scores $S_p$, which are added to obtain the absolute score S = 24.934; and the weights $w_p$ and publication reference scores $R_p$, used in the weighted sum that yields the reference score R = 67.670. The x-index is then obtained by dividing the absolute score by the reference score.
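Explicitly, with the totals from Table 2,

$$ X = \frac{S}{R} = \frac{24.934}{67.670} \approx 0.3685. $$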

Table 2 Data for researcher J’s publications, required to compute the absolute score and the x-index

Researcher performance ranking

Table 3 contains the x-indexes of the researchers who applied for the award, and their ranking. We also display in Table 3 four other bibliometric indicators, built from partial information used by the x-index, together with the corresponding rankings of the researchers: number of publications, full-time equivalent publications, absolute score, and average 5-year Impact Factor.

Table 3 Research performance ranking using number of publications, full-time equivalent publications (FTEP), absolute score, average 5-year Impact Factor (IF) and x-index

The comparison of the rankings clearly shows the effect of the normalization in the x-index. Considering, for instance, the top 4 positions, the number of Department A researchers is 3 for publications and FTEP, 4 for the absolute score, and 2 for the x-index. The Spearman Rank Correlation Coefficient for the association between the x-index and the other four indicators is 0.70 for publications, 0.86 for FTEP, 0.66 for absolute scores, and −0.22 for the average 5-year Impact Factor. The mean absolute shift in rank is 2.5 for publications, 1.7 for FTEP, 3.0 for absolute score, and 5.1 for average 5-year Impact Factor.
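These rank-agreement statistics can be reproduced with a short Python sketch; it assumes two aligned lists of rank positions, one per indicator, and uses SciPy's spearmanr merely as an implementation convenience, not as part of the method.

    # Rank agreement between two indicators: Spearman correlation of the ranks
    # and the mean absolute shift in rank position.
    from scipy.stats import spearmanr

    def rank_agreement(ranks_a, ranks_b):
        rho, _ = spearmanr(ranks_a, ranks_b)
        mean_shift = sum(abs(a - b) for a, b in zip(ranks_a, ranks_b)) / len(ranks_a)
        return rho, mean_shift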

The differences relative to the average 5-year Impact Factor are particularly large. In fact, this indicator presents several shortcomings that the x-index addresses, chiefly that it fails to take into account the quantity of work and the number of authors, and that it is not normalized with respect to any definition of scientific field.

The effect is also visible, although more moderately, within departments. Table 4 shows the ranks within the set of researchers from Department A. The Spearman Rank Correlation Coefficient is lower for the average 5-year Impact Factor (−0.29), but higher for publications (0.81), FTEP (0.93), and absolute scores (0.81). The mean absolute shifts in rank are lower: 1.0 for publications, 0.5 for FTEP, 1.0 for absolute score, and 2.8 for average 5-year Impact Factor.

Table 4 Research performance ranking using number of publications, FTEP, absolute score, average 5-year IF, and x-index, for Department A researchers

Table 5 provides analogous data for the researchers from Department B. The Spearman Rank Correlation Coefficients are even higher (0.83 for publications, 0.94 for FTEP, 0.83 for absolute scores, and 0.20 for the average 5-year Impact Factor), and the mean absolute shifts in rank are lower (0.7 for publications, 0.3 for FTEP, 0.7 for absolute score, and 1.7 for the average 5-year Impact Factor).

Table 5 Research performance ranking using number of publications, FTEP, absolute score, average 5-year IF, and x-index, for Department B researchers

However, these variations in the rankings, even within departments, are in line with our assertion that research performance assessment must be tied closely to specific research profiles.

Choice of reference scores

When choosing the set of researchers with the largest number of publications as a reference for each journal, we seek a reference that is simultaneously representative of the journal's scientific field and of a large research output. Focusing only on the single researcher with the most publications entails two risks: that researcher can either be an outlier with an extremely high absolute score, or have a concentrated range of publication outlets and therefore an extremely low overall score. To avoid these situations, we chose the average of the three top researchers as the reference score.
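These alternatives, compared below, can be written as simple variations on equation (8); in this Python sketch, top_scores is assumed to hold the absolute scores of a journal's most-publishing researchers, ordered from the researcher with the most publications downwards.

    # Alternative journal reference scores R_p for publication p's journal.
    def ref_top3(top_scores):   # average of the three top researchers (Eq. 8, the x-index)
        return sum(top_scores[:3]) / 3.0

    def ref_top2(top_scores):   # average of the two top researchers
        return sum(top_scores[:2]) / 2.0

    def ref_top(top_scores):    # score of the researcher with the most publications
        return top_scores[0]

    def ref_max(top_scores):    # maximum among the three scores
        return max(top_scores[:3])

    def ref_min(top_scores):    # minimum among the three scores
        return min(top_scores[:3])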

Table 6 presents the values of the relative scores for different reference score alternatives: the maximum among the three scores ($S_{RELMAX}$), the minimum ($S_{RELMIN}$), the score of the researcher with the most publications in the journal ($S_{RELTOP}$), the average of the two top researchers ($S_{RELTOP-2}$), and the average of the three top researchers (the x-index).

Table 6 Research performance ranking using different reference score alternatives

The Spearman Rank Correlation Coefficient for the association between the different alternatives and the x-index is 0.93 for $S_{RELMAX}$, 0.96 for $S_{RELMIN}$, 0.95 for $S_{RELTOP}$, and 0.96 for $S_{RELTOP-2}$. The corresponding mean absolute shifts in rank are 1.1, 0.9, 1.0, and 0.9, respectively.

Including more information does in fact lead to shifts in the rankings, and although the differences between alternative rankings and the x-index ranking are not large (in this particular case, they occur mostly in the middle of the rankings, with the top two and bottom two positions remaining unchanged), they show that the two risks we have mentioned are present and have an impact.

Discussion and conclusion

This study describes the x-index, a new indicator for cross-disciplinary bibliometric research performance assessment, developed to meet three requirements: reflecting both publication quantity and quality; requiring modest data extraction and processing effort; and enabling comparison across diverse scientific field profiles.

The indicator was used to rank candidates for a scientific excellence award at FEUP. The data collected for that purpose was used to illustrate the indicator's computation, to show its suitability for comparing researchers from different scientific fields, and to validate our design choice concerning the number of top researchers involved in defining the journal reference scores.

The indicator therefore presents several merits, but it also has some limitations and may be the object of further improvement efforts:

  • Using a measure based on citation counts would permit a more meaningful assessment of scientific quality, since the Impact Factor has well-documented limitations (Lehmann et al. 2006). The generic structure of our indicator allows it to be adapted to measures based on citation counts, and even to other measures (see the sketch after this list). This would, however, require more complex, and possibly larger, extraction and processing work, to control for document type and to limit the citation counts to a certain time window, and in fields with a long time-to-citation it would inhibit the inclusion of recent publications in the assessment.

  • Due to the normalization procedure, the indicator is vulnerable to scale issues. If journals in the same scientific field are stratified, with non-intersecting clusters of researchers, one featuring high publication rates and high Impact Factors and another featuring low publication rates and low Impact Factors, researchers from the latter cluster may obtain x-indexes similar to, or higher than, those of researchers from the former.

    This vulnerability of the indicator is of a generic nature, and its impact will ultimately depend on whether such stratification occurs in practice. We are not aware of any studies on this effect, and we have not come across it in our study. Nevertheless, the fact that the reference scores are computed as the average of the three top researchers for each journal should help moderate the impact of stratification, by diversifying the set of researchers involved in the comparative assessment. Additionally, for the purpose of the award, we established a minimum threshold for the candidates' number of publications, which should also help by leaving out candidates with a very narrow set of publishing outlets. When circumstances require a broader and more thorough assessment, multiple indicators should naturally be used, in a decision-support perspective, by the assessment experts (Pendlebury 2009).

  • A critical piece in the computation of the indicator is the unique identification of researchers, which is still a weakness of citation databases. As we have mentioned earlier, highly published researchers, and journals included in the JCR, are likely to be more careful in establishing the conditions for this unique identification. On the other hand, there are ongoing efforts to address this problem (Enserink 2009). For the moment, efforts such as extending the normalization to averages, or automating the data extraction process, will keep facing strong difficulties.

  • Using only data from ISI and JCR journals excludes some types of publications, such as books, book chapters and conference proceedings, and obviously publications in journals that are not covered by ISI. However, the extension of x-index to other sources of citation data is straightforward. As an example, a procedure analogous to the one we have described could be performed using Scopus, which offers a wider coverage of scientific journals, and SJR (Falagas et al. 2008) as an alternative journal scientific prestige indicator.
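As mentioned in the first item of this list, the structure of the publication score makes such adaptations straightforward. The sketch below is purely illustrative: it replaces the 5-year Impact Factor in equation (4) with a hypothetical field-normalized citation count, where field_baseline stands for an assumed expected number of citations for publications of the same field, year, and document type.

    # Illustrative variant of Eq. 4 with normalized citations in place of F_p;
    # field_baseline is a hypothetical field/year/document-type citation baseline.
    def citation_based_score(n_authors, citations, field_baseline):
        return 2.0 / (n_authors + 1) * (citations / field_baseline)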

The scientific excellence award has fundamentally been established to recognize outstanding scientific work, but it also plays a role in incentivizing researchers to intensify their efforts to publish more, and to publish in prominent international journals. This is an important goal for FEUP as a whole, but it is an area where most departments need to radically improve their performance, as is noticeable in the concentration of applications to the award in just two departments.

Although the indicator does not cover all dimensions of research work, we believe its generic structure can be adapted to reflect a more complete appraisal, and to provide an even better picture of where researchers stand in comparison with the best in their fields. An important trade-off of these improvements is the increase in the amount of information extraction and processing required, which is not always compatible with the resources that a single Faculty or University can make available for this purpose. Based on success factors from other ranking experiences (Nederhof 2008), the ranking procedure has been designed to be transparent, so it has been, and remains, open to discussion and improvement.