Introduction

Citation analysis involves calculation of the number of citations to a particular author, article, or journal (Wade 1975). For a journal, the tally of citations is influenced by both the number of citations per article and the number of articles. Consequently, citation analysis of a journal typically involves calculation of an index that corrects for the number of published articles, to reflect the average citation tally for articles in that journal (Garfield 1972). In order to make the calculation objective and reproducible, the index must be calculated using a specified set of journals and a specified time period in which citations may occur. Any other adjustments, such as weighting of citations, must also be specified.

The first journal citation index to be calculated for a large set of journals was the journal impact factor (JIF) (Garfield 2006). It is calculated as the number of citations a journal receives in a given year to items published in the previous 2 years, divided by the number of articles published in the previous 2 years. The journals searched for citations are those listed by the Institute for Scientific Information (ISI), a division of Thomson Scientific (http://admin-apps.isiknowledge.com/JCR/JCR/). More recently, alternative indices that quantify average citations per article in a journal have been developed (Banks and Dellavalle 2008); these differ from the JIF in the time period for the tally, the set of journals searched, and other adjustments (such as weighting each citation by how frequently the journal providing that citation is cited).
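Restated as a formula (our notation, introduced here only for clarity), the JIF for year y is:

```latex
\mathrm{JIF}_{y} \;=\; \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
```

where C_y(t) is the number of citations received in year y by items the journal published in year t, and N_t is the number of articles the journal published in year t.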

The limitations of journal citation indices have been widely published and repeatedly reviewed (Hecht et al. 1998; Kurmis 2003; Rey-Rocha et al. 2001). Despite this, journal citation indices continue to be used in many important decisions, including which journals to access and read (Duy and Vaughan 2006; Kelland and Young 1994; Schein 2000), where to submit manuscripts (Cheung 2008), which researchers to fund and appoint (Adam 2002; Fassoulaki et al. 2001; Fuyuno and Cyranoski 2006), and which institutions produce higher quality research (Davis and Royle 1996). It is important to consider to what extent the particular journal citation index used might influence such decision making. One method of investigating this is to examine the degree of correlation between one journal citation index and another. Also, if significant correlations were identified between journal citation indices that seek to quantify average citations per article in a journal by different methods, this would provide evidence of the convergent validity of those indices.

Numerous studies have examined correlations between journal citation indices (Bollen et al. 2009; Davis 2008; Falagas et al. 2008; Franceschet 2010; Gordon 1982; Leydesdorff 2009; Midorikawa et al. 1984; Rousseau 2009; Smart and Elton 1982; van Leeuwen and Moed 2005; Yue et al. 2004), most of which have found at least moderate correlations. However, many of these studies examined small samples of journals (N < 200) (Davis 2008; Falagas et al. 2008; Gordon 1982; Rousseau 2009; Smart and Elton 1982) or analysed data now over a decade old (van Leeuwen and Moed 2005; Yue et al. 2004). Three recent studies have examined the degree of correlation between various journal citation indices calculated for large samples of journals. Leydesdorff (2009) found moderate to strong correlations between three such indices. Bollen et al. (2009) also found statistically significant correlations between the same indices amongst a much larger set of comparisons, although the exact values are not reported. Franceschet (2010) also found moderate to strong correlations between the four citation indices examined. However, these studies often correlated journal citation indices that are normalised for the number of articles published with those that are not, potentially reducing the degree of correlation. Furthermore, none of these three studies performed a systematic search for all available journal citation indices that are normalised for the number of articles published. The first aim of this study was therefore to test the correlations between any journal citation indices that were normalised for the number of articles published and that had been calculated and published for a substantial set of journals.

Another criticism of some journal citation indices is the presence of errors in the scores for some journals (Monastersky 2005; Reedijk 1998)—an issue that in some cases is exacerbated by a lack of transparency about the criteria used to determine which articles are included in the calculation (Hernán 2008; PLoS Medicine Editors 2006). An error in a journal’s publication and citation data may have a large impact on its score, in absolute and rank terms. If a pair of journal citation indices do correlate well overall, then errors might be readily identified by checking the scores of journals that have particularly discordant ranks on the two indices. The second aim of this study was therefore to examine whether a sample of the most discordantly ranked journals on a pair of indices contained a greater number of erroneous scores than a random sample of the ranked journals.

Methods

Initially, we sought to confirm that our list of journal citation indices was comprehensive. We conducted internet searches using terms for the JIF and one of the other journal citation indices that were already known to us (see Table 1), with the intent of identifying lists of journal citation indices. The following searches of Google Scholar (http://scholar.google.com/) were conducted on 3 August 2008:

Table 1 Journal citation indices and the definition of their method of calculation

“impact factor” AND (“article influence” OR Eigenfactor)

“impact factor” AND (“journal rank” OR SCImago)

“impact factor” AND (“trend line” OR Scopus)

One author screened the search results for titles and abstracts suggesting that an article might contain a list of journal citation indices. A full-text version of each such article was obtained, the locations of the term “impact factor” were found, and the adjacent sentences were read. Thomson Reuters’ immediacy index was identified but was not included as a separate index because it has the same formula as Scopus’ trend line index (Yue et al. 2004). We identified no additional published indices that quantify average citations per article in a journal.

For each index, we attempted to download all the scores published for the most recent complete calendar year. We downloaded the 2007 JIF scores for all available journals from the Journal Citation Reports on the ISI website (http://admin-apps.isiknowledge.com/JCR/JCR/) on 12 August 2008. We semi-automated the downloading of Eigenfactor’s article influence (EAI) scores for 2006 for all available journals from the Eigenfactor website (http://eigenfactor.org/) on 14 August 2008. We downloaded SCImago journal rank (SJR) scores for 2007 for all available journals from the SCImago website (http://www.scimagojr.com/journalrank.php/) on 16 August 2008. We were unable to automate a download of Scopus trend line (STL) scores for 2007, and we were unable to purchase calculated scores from the developers. To generate an STL dataset, one thousand journals were randomly selected from each of the other three datasets (JIF, EAI and SJR) using the random numbers function of Excel (Microsoft, Redmond, USA). Duplicates, identified by name or by ISSN, were removed. Two investigators independently entered each journal from this list on the Scopus website (http://www.scopus.com/) on 17 December 2008 and, if possible, retrieved its STL score for 2007. Discrepancies were resolved by referring back to the Scopus website. Descriptive statistics and the Kolmogorov–Smirnov test were used to describe the distribution of scores within each of the four datasets.
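For illustration, the sampling and de-duplication step could be reproduced as follows (a minimal sketch only, not the Excel procedure used in the study; the file names and column names are assumptions, and every record is assumed to carry a non-missing ISSN):

```python
import pandas as pd

# Assumed file names; each file holds one index's scores with at least
# 'title' and 'issn' columns (assumed column names).
datasets = ["jif_2007.csv", "eai_2006.csv", "sjr_2007.csv"]

samples = []
for path in datasets:
    df = pd.read_csv(path)
    samples.append(df.sample(n=1000))  # 1,000 journals drawn at random

merged = pd.concat(samples, ignore_index=True)

# Remove duplicates by name or by ISSN (titles normalised for comparison)
merged["title_norm"] = merged["title"].str.strip().str.lower()
merged = merged.drop_duplicates(subset="title_norm")
merged = merged.drop_duplicates(subset="issn")

print(len(merged), "unique journals to look up on the Scopus website")
```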

The four journal citation indices allowed six possible pairings. Within each pairing, we used the LOOKUP function of Excel to match journals that had been scored on both indices. Matches were sought using whichever of the following identifiers were available: full journal title, abbreviated journal title, International Standard Serial Number (ISSN), and the ISSN for an electronic version of the journal (eISSN). Journals that were automatically matched on at least two of these identifiers were paired without further checking. Where a full journal title needed to be matched with an abbreviated journal title to confirm a partial match, the LOOKUP function was used to search two spreadsheets that map titles to abbreviations: http://library.caltech.edu/reference/abbreviations/ and http://www.library.ubc.ca/scieng/coden.html. Where an ISSN needed to be matched with an eISSN to confirm a partial match, both were entered into an internet search engine (http://www.google.com/) to confirm, via the journal’s or publisher’s website, that both belonged to the named journal. After matching was complete, 20 randomly selected pairs of scores from each of the six pairings were compared against the original websites to check for errors introduced during the matching process.
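The matching logic could equally be scripted; the following Python sketch (column names are assumptions, and title normalisation and the manual checks described above are omitted) accepts a pair automatically when two records agree on the ISSN plus at least one further identifier:

```python
import pandas as pd

def match_journals(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Automatically pair journals scored on two indices.

    Each frame is assumed to have the columns 'title', 'abbrev', 'issn',
    'eissn' and 'score'. Candidate pairs are formed on ISSN; a pair is
    accepted without further checking when at least one other identifier
    also agrees (i.e. at least two identifiers in total). Journals that
    match only partially would still need the manual steps described in
    the text.
    """
    candidates = a.merge(b, on="issn", suffixes=("_a", "_b"))

    # Count how many of the remaining identifiers also agree
    extra_agreements = sum(
        (candidates[f"{col}_a"] == candidates[f"{col}_b"]).astype(int)
        for col in ("title", "abbrev", "eissn")
    )
    automatic = candidates[extra_agreements >= 1]
    return automatic[["issn", "score_a", "score_b"]]
```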

The relationship between each pair of journal citation indices was then tested by correlating the two scores across all matched journals. Because of the skewed distribution of the raw scores, Spearman’s rho was used as the measure of correlation, with a two-tailed significance test. Probabilities of less than 0.05 were considered significant. To further describe the degree of association, we used the descriptors suggested by Cozby for Spearman’s rho: 0 to 0.19 = very weak relationship; 0.20 to 0.39 = weak relationship; 0.40 to 0.59 = moderate relationship; 0.60 to 0.79 = strong relationship; and 0.80 to 1 = very strong relationship (Cozby 2008).
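In code, this step reduces to a single call to a standard library routine; a minimal sketch in Python using SciPy (the classification helper is ours, applying Cozby’s descriptors) is:

```python
from scipy.stats import spearmanr

# Descriptors for the strength of association (Cozby 2008)
STRENGTH = [(0.80, "very strong"), (0.60, "strong"),
            (0.40, "moderate"), (0.20, "weak"), (0.00, "very weak")]

def correlate(scores_a, scores_b):
    """Spearman's rho between two indices' scores for the same matched
    journals (two equal-length sequences), with a two-tailed p value."""
    rho, p_value = spearmanr(scores_a, scores_b)
    label = next(name for cutoff, name in STRENGTH if abs(rho) >= cutoff)
    return rho, p_value, label

# Example: correlate(matched["score_a"], matched["score_b"])
```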

Finally, we sought to compare the number of errors in the scores of journals that had very discordant ranks on a pair of indices with the number of errors in the scores of randomly selected journals. Within each of the six possible pairings of indices, we therefore compared the prevalence of errors among the 10 most discordantly ranked journals with the prevalence of errors among 10 randomly selected journals (a sketch of this selection step is given after the error definitions below). The following procedures were used to check for and classify errors.

Identifier errors: any identifiers (full title, abbreviated title, ISSN, or eISSN) were checked against the journal’s website or the publisher’s website to ensure that all belonged to the same journal. Note that we allowed for changes in the journal’s name or other identifiers over the relevant period.

Denominator errors: if the number of items published during the relevant period (that is, the number to be used in the denominator in the calculation of the index) was reported, this was checked against the tables of contents of the journal. An error was only recorded if the number of publications identified was logically impossible and did not depend on the definition of what items were eligible for inclusion in the denominator.

Mathematical errors: where data used in the calculation of the index were available, the calculation was checked.
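The selection of discordantly ranked journals referred to above could be implemented as follows (a sketch only; we assume here that discordance is measured as the absolute difference between a journal’s ranks on the two indices, which is one plausible reading of the procedure, and the column names are placeholders):

```python
import pandas as pd

def most_discordant(matched: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n journals whose ranks on two indices differ most.

    'matched' is assumed to hold columns 'score_a' and 'score_b' with the
    two indices' scores for the same journals. Discordance is taken as
    the absolute difference in rank (an assumption; see lead-in).
    """
    rank_a = matched["score_a"].rank(ascending=False)
    rank_b = matched["score_b"].rank(ascending=False)
    discordance = (rank_a - rank_b).abs()
    return matched.assign(discordance=discordance).nlargest(n, "discordance")

# The comparison sample would simply be: matched.sample(n=10)
```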

Results

We downloaded the JIF scores for 6,363 journals, the EAI scores for 7,556 journals, and the SJR scores for 12,751 journals. By merging three lists of journals (1,000 randomly selected from each of the JIF, EAI and SJR lists) and removing duplicates, we created a list of 2,731 journals whose STL scores were to be retrieved from the Scopus website. STL scores could be identified on the Scopus website for 2,550 (93%) of these journals and were recorded. The data for the four citation indices were strongly skewed (see Table 2).

Table 2 Descriptive statistics of the datasets of the four journal citation indices

The number of journals that could be matched in each of the six pairings of indices ranged from 1,856 to 6,508. Logically, the maximum number of journals that could be matched for each pairing of indices was the number of journals in the smaller dataset. The number of matched journals as a fraction of the number of journals in the smaller dataset ranged from 73 to 95% (see Table 3). No discrepancies were noted when 10 randomly selected pairs of scores were checked against the original website scores for each pair of indices.

Table 3 The number of matched journals as a fraction of the number of journals in the smaller of the two source datasets

Statistically significant, positive correlations were observed between all pairs of indices (see Figs. 1, 2, 3, 4, 5, 6). The strongest correlation was between JIF and SJR scores (Spearman’s rho = 0.89, p < 0.001) and was classified as ‘very strong’. The weakest correlation was between EAI and SJR scores (Spearman’s rho = 0.61, p < 0.001) and was classified as ‘strong’.

Fig. 1 JIF (X-axis) and EAI (Y-axis) ranks for 5,856 journals. Spearman’s rho = 0.79, p < 0.001

Fig. 2 JIF (X-axis) and SJR (Y-axis) ranks for 5,503 journals. Spearman’s rho = 0.89, p < 0.001

Fig. 3 JIF (X-axis) and STL (Y-axis) ranks for 1,856 journals. Spearman’s rho = 0.70, p < 0.001

Fig. 4 EAI (X-axis) and SJR (Y-axis) ranks for 6,508 journals. Spearman’s rho = 0.61, p < 0.001

Fig. 5 EAI (X-axis) and STL (Y-axis) ranks for 2,026 journals. Spearman’s rho = 0.70, p < 0.001

Fig. 6 SJR (X-axis) and STL (Y-axis) ranks for 2,421 journals. Spearman’s rho = 0.75, p < 0.001

The number of tied ranks in the SJR dataset was spuriously elevated because the website publishes scores to only two decimal places; for example, journals with underlying scores of 0.031 and 0.034 would both be displayed as 0.03 and be assigned tied ranks. This is evident in the horizontally aligned strings of data points in Figs. 2 and 4 and the vertically aligned data points in Fig. 6.

When 10 discordantly ranked journals and 10 randomly selected journals within each pair of indices were examined for errors, 13 errors were identified, all of which occurred among the discordantly ranked journals. Four were identifier errors, nine were denominator errors and none were mathematical errors (see Table 4). All were checked against the original websites and none were transcription errors on our part.

Table 4 Errors identified in the 10 most discordantly ranked journals and in 10 randomly selected journals for each pair of indices

Discussion

The method of matching journal scores from a pair of journal citation indices was very successful where the two indices are calculated from the same database of journals: SJR with STL (95%) and JIF with EAI (92%). Matching was somewhat less successful when the two indices were calculated from different databases of journals (73–85%). However, these results are consistent with the proportion of journals common to both databases, estimated at 84% (Gavel and Iselid 2008). Therefore, given the differences in source journals, the matching process can be regarded as similarly successful for all pairs.

The significant correlations between all four journal citation indices provide evidence of their convergent validity. Although the correlations were statistically significant overall, many individual journals’ scores differed markedly between indices. For each pair of citation indices there are examples of journals ranked within the top 10% on one index but within the bottom 10% on the other. This suggests that the choice of journal citation index could have a large impact on the types of decision making discussed in the Introduction. Decision makers could weigh the subtle differences in how the available indices are calculated and choose the most appropriate index for the decision at hand. For example, citations over a 5-year period are used in the calculation of the EAI, which may make it the most appropriate index when long-term citability is of interest.

Possible sources of discrepancy in the rankings include differences in the definitions of the indices, differences in the raw data sets contributing to the calculation of the indices, differences in the time periods over which rankings were calculated, and errors. Previously published observations about errors in citation indices have mainly centred on inconsistencies in eligibility for inclusion in the denominator of a citation index (PLoS Medicine Editors 2006) and errors in the reference lists of the articles used in the count of citations (Reedijk 1998; Opthof 1997). Instead, we examined errors in the matching of journal titles and ISSNs, errors in the tallying of published articles that are unrelated to the definition of the denominator, and errors in the mathematical calculation of the index. Data from our random sample are reassuring in that no errors were found in 30 journals in any of the four indices. However, our sample of discordantly ranked journals showed that some errors were present.

One strength of this study is that the procedure for matching journals was rigorous. Among the other recent studies of correlations between journal citation indices, one matched by title only (Leydesdorff 2009) and one did not describe the matching process (Bollen et al. 2009). Franceschet (2010) calculated all the indices from the same dataset and therefore can be presumed to have had perfect matching. However, the scores generated for the purpose of the study may not be reflective of the scores available to the general public, especially given the errors we identified in the published indices.

A limitation of this study is that the process of checking for errors was not exhaustive. Furthermore, it could not be applied with equal rigour to the four indices, because the four publishing websites include differing amounts of supporting information with their scores. For example, the Scopus website provides an easily replicable equation for the calculation of the index along with the figures used in the calculation, whereas the Eigenfactor website does not allow the mathematical calculation to be replicated. Therefore we believe that comparisons of error rates between the various indices based on our data are not valid and should be avoided. We sought only to compare the number of errors among scores for discordantly ranked journals with the number among scores for randomly selected journals. Since these two sets of scores received the same rigour of error checking overall, this comparison is valid.

The identification of a greater number of errors among discordantly ranked journals suggests a method by which publishers of the journal citation indices could seek to minimise errors in their data. By correlating their scores against those of the other indices as we have done, they may be able to locate some important errors—that is, those that greatly distort the relative ranking of individual journals—more rapidly. These methods are, however, too laborious for checking an individual journal’s score. We would encourage editorial board members and others interested in whether an individual journal’s score may be erroneous to visit the four publishing sites and to check whatever additional information is available for the types of errors we assessed.

Conclusion

There are strong correlations between four published indices that seek to represent the average number of citations received by the articles published in a given journal. Where the relative rank of a particular journal on two indices is very discordant, investigation of the source of the discrepancy may facilitate the identification of errors.