Introduction

After the advent of electronic journals, it has become much easier to collect journal usage data. Unlike in the hard-copy era, it is now possible to view scholarly communication through the eyes of the reader (Rowlands and Nicholas 2007, p. 223). Besides easy and cheap data recording, usage metrics have further advantages over citation metrics: they become available much sooner and reflect use on a much broader scope (Duy and Vaughan 2006, p. 513). While citations can be considered an indirect measure of use but a direct measure of usefulness, reads can be seen the other way round: a direct measure of use but only an indirect measure of usefulness. Accordingly, usage metrics can be regarded as complementary to citation metrics (e.g. Armbruster 2007; Bollen et al. 2005). Since e-journal usage data have been available for only a few years, there is a need for further research. Kurtz et al. (2005) even expect the similarities and differences of reads and citations to become a central issue of bibliometric research.

Several usage indicators have been suggested in recent years. For instance, Darmoni et al. (2002) introduced a so-called reading factor. Most suggestions are based on the classical citation indicators from the Journal Citation Reports (JCR), using download data (usually full-text article requests) instead of citations. The corresponding usage metrics are the usage impact factor (UIF) (Rowlands and Nicholas 2007; Bollen and Van de Sompel 2008), the usage immediacy index (Rowlands and Nicholas 2007) or download immediacy index (Wan et al. 2008), and the usage half-life (Rowlands and Nicholas 2007).

According to Bollen and Van de Sompel (2008), comparisons between citation and usage data can be made at a local and at a global level. Most of the existing studies used local usage data. Darmoni et al. (2002) accessed data from the Rouen University Hospital digital library and compared download and JIF rankings. Duy and Vaughan (2006), who analysed journal titles from the subject areas of chemistry and biochemistry at Concordia University library, found a significant correlation between electronic journal usage and local journal citation data but no relationship with the JIF. McDonald (2007), too, detected a moderate relationship between local article downloads and locally collected citations for more than 1,500 journals (mainly from the sciences) held by the California Institute of Technology. Bollen and Van de Sompel (2008) collected usage data from the entire California State University system and aggregated them for more than 6,000 journals. Their analysis shows a modest negative correlation between the UIF and the JIF. An analysis at discipline level revealed that the correlations differ between disciplines: a weak positive relation could be observed especially for disciplines with a higher proportion of graduate students and graduate faculty.

Global usage studies were mainly performed using subject repositories and open access journals. Chu and Krichel (2007) investigated the top 200 documents at RePEc, a digital library in economics. They related the number of downloads to the number of citations from SSCI and Google Scholar and found a moderate correlation in both cases. Brody et al. (2006) used download and citation data from arXiv.org and identified a significant and “sizeable” correlation between citations and downloads, which varied between 0.33 and 0.46 depending on the field considered. Kurtz et al. (2005) used the NASA Astrophysics Data System for their research. Besides examining the relationship between citation and readership numbers, the authors also compared the obsolescence function as measured by readership with the obsolescence function as measured by citations. Another study investigating the relationship between downloads and citations, and possible differences in obsolescence patterns, for articles published in a single journal was conducted by Moed (2005). In particular, Moed tried to determine the effect citations have on downloads and the effect that downloads made during the first three months of an article’s life-span have on later citations. Wan et al. (2008) concentrated on the download immediacy index and explored its relationships with several citation indicators. They retrieved the download data from the Chinese Academic Journal Full-Text Database. The citation indicators were extracted from the Chinese Academic Journals Comprehensive Citation Report, the Chinese equivalent of the JCR.

Methodology

The objective of this contribution is to examine, on a global scale, in which respects journal usage metrics differ from citation metrics. In particular, the following issues will be addressed:

  • growth in e-journal use;

  • comparison of citations and downloads (absolute and relative);

  • differences in obsolescence characteristics.

We conducted the last two analyses both at journal level and, for a few journals, on a paper-by-paper basis.

In our study, we employ a usage impact factor (UIF) and a usage half-life (UHL), both of which are calculated in analogy to the corresponding Thomson/ISI citation indicators. Since the journal impact factor (JIF) only considers citable documents, we follow Bollen and Van de Sompel (2008, p. 139) and use the article counts of the Journal Citation Reports (JCR). Accordingly, the formula for calculating the UIF for journal j in year i is

$$ {\text{UIF}}_{j}^{i} = \frac{{\text{FTA}}_{j}^{i \to i - 1} + {\text{FTA}}_{j}^{i \to i - 2}}{{\text{nart}}_{j}^{i - 1} + {\text{nart}}_{j}^{i - 2}} $$

where \( {\text{FTA}}_{j}^{i \to i - 1} \) is the number of full-text article requests in year i for articles published in journal j in year i-1, \( {\text{FTA}}_{j}^{i \to i - 2} \) the corresponding number for articles published in year i-2, and \( {\text{nart}}_{j}^{i - 1} \) and \( {\text{nart}}_{j}^{i - 2} \) the numbers of (citable) articles published in journal j in years i-1 and i-2 (according to JCR).
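
The following sketch illustrates this calculation; the function name and all example figures are purely illustrative and not taken from our data.

```python
# Minimal sketch of the UIF formula above; all figures are hypothetical.

def usage_impact_factor(fta_prev1, fta_prev2, nart_prev1, nart_prev2):
    """UIF for year i: downloads in year i of articles from years i-1
    and i-2, divided by the (citable) article counts of those years."""
    return (fta_prev1 + fta_prev2) / (nart_prev1 + nart_prev2)

# Example: 12,000 FTAs in year i to articles from year i-1, 9,000 to
# articles from year i-2; 300 and 280 citable articles in those years.
print(usage_impact_factor(12_000, 9_000, 300, 280))  # ~36.2
```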

Analogously to the cited half-life, the usage half-life is defined as the median age of the articles downloaded from a particular journal in the year under consideration. This means that half of a journal’s downloaded articles were published more recently than the usage half-life.
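
A minimal sketch of this computation, assuming the downloads of the considered year are given per publication year (the input dictionary and its values are invented; note that the JCR interpolates to fractional years, whereas this version returns whole years to keep the sketch short):

```python
# Usage half-life: median age of the articles downloaded in a given year.

def usage_half_life(ftas_by_pub_year, usage_year):
    """Return the median age (in whole years) of the articles downloaded
    in `usage_year`, weighting each age by its download count."""
    half = sum(ftas_by_pub_year.values()) / 2
    cumulative = 0
    # Walk from the most recent publication year backwards, as is done
    # for the cited half-life.
    for pub_year in sorted(ftas_by_pub_year, reverse=True):
        cumulative += ftas_by_pub_year[pub_year]
        if cumulative >= half:
            return usage_year - pub_year

# Hypothetical 2006 distribution: most use falls on the newest volumes.
ftas = {2006: 5000, 2005: 3200, 2004: 1800, 2003: 900, 2002: 400}
print(usage_half_life(ftas, 2006))  # 1 -> half the use is at most 1 year old
```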

The download data were provided by Science Direct (SD). We received the usage data for all SD oncology journals from 2001 to 2006 at journal level. For each journal and each year within this period, the sums of downloaded full-text articles (FTAs) were given per publication year back to 1995 (where the data reached back that far). Furthermore, access data at article level were made available for a few journals. For each article, the number of downloads per month was itemized up to the year 2006.

Citation data were obtained from both JCR and Scopus. From JCR we used the number of (citable) documents, the number of citations, the impact factor, the cited half-life, and the cited journal data on which the calculation of the cited half-life is based. For reasons of data reliability, we also retrieved the numbers of articles, review articles and citations from Scopus and computed the impact factor with the Scopus data.

Results

Growth of article downloads

Table 1 reveals that e-journal use has grown strongly, at least with regard to SD oncology journals. Between 2001 and 2006 the number of full-text article requests (FTAs) increased more than tenfold. This strong increase is only partly due to the longer time window of the access data in recent years (see the section on obsolescence characteristics). Calculating the growth rate on the basis of the 19 journals already included in Science Direct in 2001 still yields a sevenfold increase. This suggests that the overall growth was due only to some extent to the larger number of available e-journals; a large part of the increase was probably caused by the generally growing acceptance of e-journals at that time. As can also be seen, there was some consolidation in recent years, with growth rates levelling off.

Table 1 Growth of article downloads (articles from 1995 onwards) and of electronically available journals (subject category oncology) in Science Direct in the years 2001–2006
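
As a small illustration of these growth multiples, assuming yearly FTA totals like the following (the numbers are placeholders, not the actual values of Table 1):

```python
# Placeholder FTA totals; only the growth multiples mirror the text above.
ftas_all_journals = {2001: 250_000, 2006: 2_600_000}   # all oncology journals
ftas_2001_journals = {2001: 250_000, 2006: 1_750_000}  # the 19 journals of 2001

print(ftas_all_journals[2006] / ftas_all_journals[2001])    # 10.4 -> more than tenfold
print(ftas_2001_journals[2006] / ftas_2001_journals[2001])  # 7.0  -> sevenfold
```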

Analyses at journal level

Usage data versus citation data

Figure 1 displays the citation frequencies (left y-axis) in the period 2004–2006 and the number of downloads (right y-axis) in 2004 for 29 SD journals, arranged from left to right according to their FTA values. Although the ratio of citations to FTAs is quite uneven (1:37 with 2004 cites, 1:33 with 2005 cites, 1:29 with 2006 cites, n = 29), journals with higher download counts are in general cited more often. This is also reflected in a high correlation between the two indicators: we computed Spearman rank correlation coefficients of 0.89, 0.92 and 0.92 between the 2004 downloads and the 2004, 2005 and 2006 cites, respectively. The corresponding correlations between downloads and citations of the years 2005 (n = 31) and 2006 (n = 33) were similar (between 0.9 and 0.92).

Fig. 1 Comparison between number of citations (according to JCR 2004, 2005 and 2006) and number of FTAs (according to SD 2004 usage statistics) (n = 29 journals)
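
A minimal sketch of the rank correlation used throughout this section; the journal-level figures below are invented for illustration.

```python
# Spearman rank correlation between downloads and citations (toy data).
from scipy.stats import spearmanr

downloads_2004 = [152_000, 98_000, 61_000, 44_000, 20_500, 9_800]
citations_2005 = [4_100, 3_600, 1_500, 1_900, 600, 250]

rho, p_value = spearmanr(downloads_2004, citations_2005)
print(f"Spearman-Rho = {rho:.2f} (p = {p_value:.3f})")
```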

The correlations (Spearman-Rho) between the usage impact factor (UIF) and Thomson’s JIF (see Table 2) are not as high as those between FTAs and citations in the period 2004–2006, which might partly be due to the different obsolescence characteristics of downloads and citations (see the following section).

Table 2 Correlations (Spearman) between UIF and Thomson’s JIF (2004–2006)

As noted before, we also computed the journal impact factor with Scopus data. As in Gorraiz and Schloegl (2008), it turned out that there are data quality problems for a few journals (“Drug Resistance Update”, “EJC Supplements”, “Journal of Urology”, “Lancet Oncology”, “Seminars in Radiation Oncology”). After removing these journals from the analysis, we identified a very high correlation between Thomson’s JIF and the “Scopus impact factor” (Spearman-Rho = 0.96 in 2004 and 2006, 0.97 in 2005, n = 26). As a consequence, the correlations (Spearman-Rho) of the latter with the UIF are similar (see Table 3).

Table 3 Correlations (Spearman) between UIF and Scopus JIF (2004–2006)

Obsolescence characteristics

Timeliness seems to be crucial when it comes to downloading articles (see Table 4): 56% of all downloads in 2006 concerned articles published after 2004, and only 19% of the downloaded articles were older than 5 years. This obsolescence pattern is also illustrated in Fig. 2 for a few selected journals. For most of the journals (those for which access data for the volumes of the past 10 years were available and which were covered by JCR), the UHL lies between 1 and 2 years (mean = 1.7, max = 2.8, min = 0.8 years in 2006, n = 24).

Table 4 Obsolescence of oncology journals (with 10 years’ backfile data) (2006 FTAs) (n = 27)
Fig. 2 Obsolescence of downloaded articles from selected oncology journals (with 10 years’ backfile data and JCR coverage; the median was computed for 24 journals) in Science Direct (2006 FTAs) (the following interval notation is used: \([a;b[=\{x \in {\mathbb{R}}\,|\,a \leq x < b\}\))

A comparison of Figs. 2 and 3 illustrates the differences in obsolescence characteristics between citations and downloads quite well. In contrast to downloads, articles are hardly cited in their year of publication, and it takes a few years until articles reach their citation maximum. Accordingly, the cited half-life is also much higher (mean = 5.6 years in 2006, n = 24). This further implies that the difference between the download immediacy index, defined by Wan et al. (2008) as “the number of downloads of a journal’s articles within one publication year, divided by the number of published articles by that journal in that same year”, and Thomson’s (citation) immediacy index is even much greater than that between the half-lives.

Fig. 3 Obsolescence of citations in selected oncology journals (with 10 years’ backfile data and Science Direct coverage; the median was computed for 24 journals) (2006 JCR data) (the following interval notation is used: \([a;b[=\{x \in {\mathbb{R}}\,|\,a \leq x < b\}\))
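
The download immediacy index quoted above can be sketched as follows; the figures are hypothetical.

```python
# Download immediacy index per Wan et al. (2008): same-year downloads
# divided by the number of articles published in that year (toy figures).

def download_immediacy_index(ftas_same_year, articles_published):
    return ftas_same_year / articles_published

# A journal whose 300 articles of 2006 were downloaded 45,000 times in 2006:
print(download_immediacy_index(45_000, 300))  # 150.0
```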

Analyses at article level

Obsolescence characteristics

For recent volumes, the download pattern on a paper-by-paper basis is similar to that of its aggregation at journal level, i.e. articles have the highest number of download requests immediately after they become available electronically (see Fig. 4). Interestingly, the age distribution is different for older volumes. As the case of “Cancer Letters” shows (see Figs. 5 and 6), the download curve does not decline as quickly. Moreover, downloads may even increase again for short periods in the following months or even years, as can be observed for Articles 2, 6 and 8 in Fig. 5. One possible explanation could be that citations have a direct influence on downloads, which will be analysed in more detail in the following section. Another explanation might be that the wide-spread use of e-journals was only beginning at that time; as a consequence, articles from older journal issues were also downloaded more often, since no prior electronic access had existed.

Fig. 4 Downloads (selected articles and mean of 11 articles) from Cancer Cell (January issue, 2005) (FTAs per month)

Fig. 5 Downloads (selected articles and mean of 18 articles) from Cancer Letters (issue 2, vol. 162, 2001) (FTAs per month)

Fig. 6 Downloads (selected articles and mean of 13 articles) from Cancer Letters (issue 1, vol. 203, Jan. 2004) (FTAs per month)

Figure 6 reveals a further feature of e-journals: articles are often made available online before they appear in print. For instance, the first article of issue 1, volume 203 (January 2004) of “Cancer Letters” was placed online in September 2003, followed by eight more articles in October. Five of the 13 articles even had their download peak one month before they appeared in print.

Usage data versus citation data

In order to relate the download data of the articles to their citations, we complemented the former by retrieving from SCI the number of citations received up to 2006, the last year for which download data were available. As Tables 5, 6 and 7 show, the number of articles differs between SCI and Science Direct (e.g. 404 vs. 383 articles for “Cancer Cell”). The main reason for this difference is that Science Direct did not collect download data for editorial material, corrections and letters. A small part of the deviation can also be attributed to wrong or missing data (see also Gorraiz and Schloegl 2008).

Table 5 Citation and download data for articles in “Cancer Cell” (2004–2006)
Table 6 Citation and download data for articles in “Cancer Letters” (2001–2006)
Table 7 Citation and download data for articles in “Gynecologic Oncology” (2002–2006)

Tables 5, 6 and 7 also clearly show that the number of citations is considerably lower than the number of downloads. This is especially true for the publication year 2006, in which an average article was cited only 0.4 times in “Cancer Letters” and 0.3 times in “Gynecologic Oncology”. (72% of the articles in “Cancer Letters” and 81% of those in “Gynecologic Oncology” were not cited at all in 2006.) In “Cancer Cell”, which is more heavily cited than the two other journals, an average article received two citations in 2006. Downloads behave differently: they usually reach their maximum in the year of publication, and the cumulated download frequency increases only gradually in the following years. The lower download frequencies for articles from the years 2001 and 2002 (“Cancer Letters” and “Gynecologic Oncology”) might, as mentioned before, be due to the lower use of e-journals at that time.

Because of the different obsolescence characteristics of citations and downloads, the average citation-download ratio strongly depends on the publication year considered: it is very high for articles published recently (for instance, 1:1354 for “Cancer Letters” in 2006) and declines when a longer time window is considered (e.g. 1:42 for articles published in “Cancer Letters” in 2001).
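
A small illustration of this ratio, assuming per-year citation and download totals (the counts are invented; only the two ratios quoted above are reproduced by construction):

```python
# Citation-download ratio per publication year (hypothetical counts).
totals = {
    # pub_year: (citations, downloads)
    2001: (1_000, 42_000),
    2006: (10, 13_540),
}

for year, (cites, downloads) in sorted(totals.items()):
    print(f"{year}: 1:{downloads / cites:.0f}")  # 2001 -> 1:42, 2006 -> 1:1354
```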

The citation-download ratio can also differ hugely among articles: in the case of “Cancer Cell”, we computed a minimum of 1:22.4 and a maximum of 1:1581 for 2004 articles. As a consequence, the correlation between downloads and citations is lower at article level than at journal level. Because downloads and citations diverge most strongly in the publication year, no high correlation can be expected for 2006 (for instance, 0.32 for “Cancer Letters” and 0.41 for “Gynecologic Oncology”). When considering longer time windows, we identified moderate correlations (Spearman-Rho between 0.47 and 0.7).

Finally, as mentioned before, we investigated whether citations have an impact on article downloads. (Figures 7 and 8 are representative of the various analyses we performed.) Figure 7 shows the monthly distribution of downloads and citations for Article 2 of Fig. 5, whose downloads clearly increased again several months and even years after the initial maximum. As can be seen, it is difficult to identify a direct impact of citations on downloads. (Excluding self-citations did not lead to different insights either.) A possible explanation for the missing effect is the low number of citations. We therefore repeated the comparisons with the most cited articles of our journal sample (see Fig. 8 for the most cited article in “Cancer Cell”). However, these analyses also failed to prove a direct impact of citations on downloads.

Fig. 7 Downloads of and citations to Article 2 in “Cancer Letters” (issue 2, vol. 162, 2001)

Fig. 8 Downloads of and citations to the most cited article in “Cancer Cell” (issue 4/2005)

Conclusions

Our article aimed to investigate the similarities and differences between citation and usage data, both at journal level and on a paper-by-paper basis. It was shown that downloads and citations have different obsolescence patterns. Most articles are downloaded immediately after they are put online; in many cases they even reach their download maximum before they appear in print. In contrast, it takes a few years until articles reach their citation peak. This is reflected in different half-lives at journal level: while the average cited half-life was 5.6 years, we computed a mean usage half-life of 1.7 years for the 2006 Science Direct data. There is some distortion when analysing the obsolescence characteristics of older volumes because of the strong increase in e-journal use between 2001 and 2006, which is mainly due to the generally growing acceptance of e-journals during that period.

There is a moderate correlation between full-text article requests and article citations. Because of the different obsolescence patterns, however, it is essential to consider a time window of several years. The relationship is strong when comparing downloads and citations at journal level, because variations at article level compensate for each other. The correlation between UIF and JIF is weaker again because, in contrast to article downloads, a time frame covering only the past two years is too short to capture citations sufficiently. Even over a longer time span (for instance, a five-year impact factor), a very strong relationship cannot be expected, since article downloads have a much broader “meaning” than citations in the Web of Science, which reflect only a small aspect of scholarly communication. Or, in the words of Rowlands and Nicholas (2007, p. 222f): “authors and readers are not the same … They have different needs and different requirements of the journals literature.”