Introduction

Citation counts, or formulae based upon citation counts, are widely used as indicators of the scholarly impact of individual academic articles, journals and groups of articles. They are used to support expert judgement in formal evaluations, to support less formal decision making, and for self-evaluations. An important drawback of citation counts is that it can take several years for a typical article to be cited enough to point to its likely long-term impact. Thus, citation windows of several years are often used in citation analysis (e.g., Glänzel 2004), although 2 years can be enough to give limited information if reduced accuracy is acceptable (Stern 2014), and early citation counts may be combined with journal impact factors for improved estimates of long-term impact (Levitt and Thelwall 2011; Stegehuis et al. 2015).

In response to the need for early estimates of long-term impact, a range of faster impact indicators have been proposed, including altmetrics, which are derived from the social web (Piwowar and Priem 2013; Priem et al. 2010). Counts of readers in the social reference manager Mendeley (Gunn 2013) show promise because they appear earlier than citations but have moderate or strong correlations with them in most fields in the long term (Haustein et al. 2014; Thelwall 2017c, 2018). They are also better for identifying highly cited articles than journal-based citation indicators (Zahedi et al. 2017). In addition, Mendeley reader counts correlate positively with peer judgements of academic quality in most fields (HEFCE 2015). One previous study has taken advantage of the early availability of Mendeley reader counts to obtain early evidence of the effectiveness of an article dissemination strategy (Kudlow et al. 2017). Nevertheless, no previous study has assessed whether early Mendeley reader counts correlate with later citation counts, as has previously been shown in one context for Twitter (early Journal of Medical Internet Research tweets associate with later citations: Eysenbach 2011) and downloads (early arXiv downloads associate with later citations: Brody et al. 2006). This gap needs to be filled before Mendeley reader counts can be used with confidence as early impact indicators.

Several previous papers have addressed the influence of time on the relationship between citation counts and synchronous Mendeley reader counts. Based upon six library and information science journals, during the year in which a journal issue is published the correlation between the citation counts and Mendeley reader counts of its articles can be expected to grow from zero to weakly positive (Maflahi and Thelwall 2018). Similar results were obtained from an eighteen-month study of the Library and Information Science field (Pooladian and Borrego 2016). In the longer term, a study of 50 fields found that correlations between citation counts and Mendeley reader counts tended to be low in the year of publication but to increase annually for about 5 years, after which they became stable (Thelwall and Sud 2016). These correlations were based on a different set of publications for each time period, however, rather than tracking the same set of publications over time.

Only a minority of researchers use Mendeley, with one survey estimating 5–8% (Van Noorden 2014), and so Mendeley reader counts underestimate the total number of readers of an article by a factor of roughly 10–20. According to a different survey, users typically record articles that they have read or intend to read (Mohammadi et al. 2016). Combining these, it is reasonable to hypothesise that each Mendeley reader represents 10–20 article readers altogether. Mendeley users tend to be junior researchers, and so the counts are likely to be biased towards articles of interest to younger researchers (Mohammadi et al. 2015). They are also biased against topics of interest in countries that use Mendeley the least (Thelwall and Maflahi 2015).
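
The multiplier here is simply the reciprocal of the estimated usage rate, as the following back-of-envelope calculation (an illustration, not taken from the surveys) shows:

\[
\text{total readers} \approx \frac{\text{Mendeley readers}}{p}, \qquad p \in [0.05, 0.08] \;\Rightarrow\; \frac{1}{p} \approx 12\text{--}20 .
\]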

Other data sources have also been proposed for early impact indicators but all have drawbacks compared to Mendeley. Twitter mentions of research articles may give earlier evidence of interest, but tweets seem to reflect publicity much more than scholarly impact (Haustein et al. 2016). Most other proposed altmetrics have much lower coverage than Twitter and Mendeley in terms of the number of articles with non-zero scores (Costas et al. 2015; Thelwall et al. 2013), including other reference managers, such as BibSonomy (Borrego and Fry 2012). Article downloads are, in theory, almost ideal evidence of interest (Moed and Halevi 2016; Schloegl and Gorraiz 2010), especially with initiatives like COUNTER to standardise them, but they are not routinely shared by publishers. Google Scholar (Halevi et al. 2017) and Microsoft Academic (Harzing and Alakangas 2017; Hug et al. 2017) also provide earlier citations than traditional citation databases, but these are influenced to some extent by publication delays and give lower counts than Mendeley readers for recently published articles (Thelwall 2018).

The goal of this paper is to assess whether early Mendeley reader counts indicate later citation impact in the sense that they correlate strongly and positively with later citation counts. To be useful, early Mendeley reader counts must correlate more strongly with later citation counts than early citation counts do; otherwise, the latter would be preferable. The following research questions therefore drive the study.

  1. Do early reader counts correlate strongly with later citation counts in all fields?

  2. Do early reader counts correlate more strongly than early citation counts with later citation counts in all fields?

The term “strongly” is used loosely in the research questions. There are guidelines for interpreting correlation coefficients, such as treating 0.1 as small, 0.3 as medium and 0.5 as large in behavioural research (Cohen 1992). There is no standard interpretation of correlation coefficients for general research purposes, however, because their interpretation depends partly on the normal level of uncontrolled variability in a test. Correlations involving citation counts and Mendeley reader counts are also affected by average values (Thelwall 2016). Thus, there cannot be a simple guideline for interpretation when comparing datasets with different averages, as in the current paper. The solution adopted here is to use the term strong for correlations approaching 0.5, moderate for correlations close to 0.3, and weak for lower positive correlations, but to discuss the influence of time alongside the correlation coefficient values when relevant.

Methods

The research design was to correlate early reader and citation counts with later citation counts for a heterogeneous set of research fields.

Data

The raw data are partly reused from a previous paper (Thelwall 2017a) that analysed Mendeley reader counts for ten Scopus fields using data from February 2016. These ten categories were chosen to represent a range of different fields. On 2 February 2016, Scopus was queried for all articles indexed in these fields with a publication year of 2016. These articles would therefore be formally up to a month old, although they may have been previously published as online first or author preprints (Haustein et al. 2015). The Mendeley reader counts of these articles were then downloaded during 2–3 February 2016 using the Mendeley Application Programming Interface (API) via the free Webometric Analyst software. This program identified matching article records in Mendeley by using DOI searches (if present) as well as metadata searches (author names, title and publication year), totalling the reader counts of all matching records found (details in Thelwall and Wilson 2016; see also Zahedi et al. 2014).
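
For illustration, the sketch below shows a minimal reader-count lookup of this type. It assumes the Mendeley catalogue search endpoint, its doi and view=stats parameters, and the reader_count response field as documented around the time of the study; OAuth2 token acquisition is omitted, and the paper itself used Webometric Analyst rather than this code.

```python
# Hedged sketch: look up an article's Mendeley reader count by DOI via the
# Mendeley catalogue API (endpoint, parameters and response fields are
# assumptions based on the API documentation of the time).
import requests

CATALOG_URL = "https://api.mendeley.com/catalog"  # assumed endpoint
TOKEN = "YOUR_OAUTH2_BEARER_TOKEN"                # placeholder credential

def mendeley_reader_count(doi: str) -> int:
    """Sum reader counts over all catalogue records matching a DOI."""
    response = requests.get(
        CATALOG_URL,
        params={"doi": doi, "view": "stats"},  # "stats" view includes reader_count
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    # The endpoint returns a list of matching catalogue documents; counts for
    # duplicate records are totalled, mirroring the approach in the paper.
    return sum(record.get("reader_count", 0) for record in response.json())
```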

The dataset is dominated by first issues of journals published near the start of January 2016 but also includes additional issues of some journals published in early February. For simplicity, all were kept, although including the younger articles will tend to reduce the strength of the correlation coefficients. Previous research suggests that the influence of the additional month on Mendeley readers is probably minor (Maflahi and Thelwall 2018).

New for the current paper, Scopus citation counts (23 September 2017) and Mendeley reader counts (23–24 September 2017) were downloaded for the same ten fields, again querying Scopus for the earliest articles published in 2016 in each field. The two datasets were then merged, discarding records found only in the 2016 download or only in the 2017 download. Thus, each remaining article had Scopus citation counts from both February 2016 and September 2017 and, if it had been found in Mendeley, reader counts from one or both dates.
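
The merging step amounts to an inner join on a shared article identifier. The following pandas sketch is an illustrative reconstruction; the file and column names are assumptions, not from the paper.

```python
import pandas as pd

# Keep only articles present in both the February 2016 and September 2017
# downloads (records found in only one year are discarded by the inner join).
feb2016 = pd.read_csv("scopus_feb2016.csv")  # doi, field, cites_2016, readers_2016
sep2017 = pd.read_csv("scopus_sep2017.csv")  # doi, cites_2017, readers_2017
merged = feb2016.merge(sep2017, on="doi", how="inner")
```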

Analysis

For the first research question, the later citation counts (September 2017) were correlated with the early Mendeley reader counts (February 2016) separately for each field. It is important to separate fields before calculating a correlation coefficient because correlations can be inflated by mixing high and low citation specialisms. Spearman correlations were used instead of Pearson correlations because both citation counts (de Solla Price 1976) and Mendeley reader counts (Thelwall and Wilson 2016) are highly skewed.
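
A minimal sketch of this per-field calculation is below, using the illustrative merged table from the Data section sketch (column names remain assumptions).

```python
import pandas as pd
from scipy.stats import spearmanr

def field_correlations(merged: pd.DataFrame) -> pd.Series:
    """Spearman's rho between early readers and later citations, per field,
    so that high- and low-citation specialisms are not mixed."""
    return merged.groupby("field").apply(
        lambda g: spearmanr(g["readers_2016"], g["cites_2017"])[0]
    )
```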

Confidence intervals were calculated for each correlation coefficient using the Fisher (1915) transformation. This is important for fields with low sample sizes, for which the correlation coefficient may be imprecise. The confidence intervals estimate the underlying strength of association for each field: the set of articles is from one period, whereas the research questions address general relationships. The confidence intervals should be interpreted cautiously because the samples are not random (other months may give different values). Moreover, individual data points are also not fully independent (because articles are published in journals and journals may have different characteristics), violating the statistical assumptions behind confidence interval calculations.
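
The Fisher transformation gives approximate confidence limits of tanh(arctanh(r) ± z/√(n − 3)). A minimal implementation is sketched below; note that applying the simple 1/√(n − 3) standard error to Spearman's rho is a common approximation rather than an exact result.

```python
import numpy as np

def correlation_ci(rho: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a correlation coefficient
    via the Fisher transformation."""
    centre = np.arctanh(rho)          # Fisher z-transform of the correlation
    half_width = z / np.sqrt(n - 3)   # standard error of the transformed value
    return float(np.tanh(centre - half_width)), float(np.tanh(centre + half_width))
```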

For the second research question, the above results were compared to the correlation between the Scopus citation counts from February 2016 and September 2017.

Average citation counts and reader counts were calculated for each field as background information. Geometric rather than arithmetic means were used due to the skewed nature of the datasets (Thelwall and Fairclough 2015; Zitt 2012).
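
A sketch of the geometric mean calculation is below, assuming the ln(1 + x) offset convention commonly used for citation-type data containing zeros (Thelwall and Fairclough 2015); the offset is removed after inverting the logarithm.

```python
import numpy as np

def geometric_mean(counts) -> float:
    """Geometric mean with a +1 offset to accommodate zero counts:
    exp(mean(ln(1 + x))) - 1. The offset convention is an assumption
    based on common practice with skewed citation data."""
    x = np.asarray(counts, dtype=float)
    return float(np.expm1(np.log1p(x).mean()))
```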

Results

There were almost no citations recorded in Scopus in February 2016 to articles that it had indexed from 2016 (Table 1: Cites 2016 column). In contrast, at this date the average number of readers per article was 1. Correlations between these two counts were low and variable (Table 2), which might suggest that early Mendeley reader counts are not useful as citation impact indicators. Nevertheless, the early (February 2016) Mendeley reader counts have moderate or strong correlations with the later (September 2017) citation counts, so the low correlations between the two February 2016 datasets mask the usefulness of early Mendeley reader counts as indicators of citation impact. The reason for the low early correlations is that low average values for discrete data can mask the strength of the underlying relationship between two variables (Thelwall 2016). This conclusion is the same whether missing Mendeley reader counts are treated as missing values (removed from the dataset) or as unread articles (kept in the dataset with a reader count of 0).
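
This masking effect can be demonstrated with a small simulation (an illustration, not from the paper): two count variables driven by the same latent factor correlate weakly when their means are low, purely because Poisson sampling then produces mostly zeros and ties.

```python
import numpy as np
from scipy.stats import spearmanr

# Two count indicators share a latent "impact" factor. Low-mean sampling
# masks the underlying association; a higher mean reveals it.
rng = np.random.default_rng(0)
latent = rng.lognormal(sigma=1.0, size=5000)
rho_low, _ = spearmanr(rng.poisson(0.05 * latent), rng.poisson(0.05 * latent))
rho_high, _ = spearmanr(rng.poisson(5.0 * latent), rng.poisson(5.0 * latent))
print(f"low-mean rho:  {rho_low:.2f}")   # much weaker despite the same latent link
print(f"high-mean rho: {rho_high:.2f}")
```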

Table 1 Geometric mean citation counts and Mendeley reader counts per article for the ten fields
Table 2 Spearman correlations (95% confidence intervals) between Scopus citation counts from February 2016 and Mendeley reader counts from February 2016

The two categories with the lowest correlations between citation counts from 2017 and reader counts from 2016, Maternity and Midwifery and Occupational Therapy (Table 3), both have few articles. They have confidence intervals with upper limits of at least 0.43, and so it is plausible that larger samples from these areas would show at least moderate correlations. These two fields have the lowest and third lowest average reader counts in 2016, making the correlation tests the least powerful. Seven out of the 18 Maternity and Midwifery articles were from MCN The American Journal of Maternal/Child Nursing, including some articles that seemed to translate research for nurse practitioners (e.g., “Teen mothers’ mental health”, “Safe sleep: Hospitalized infants”, “Preeclampsia”), which may explain their low Mendeley reader counts (5 had no Mendeley readers in February 2016). The 36 Occupational Therapy articles were from four journals and so the results could be affected by journal-specific considerations. For example, there was only one February 2016 reader in total for the nine Journal of Vocational Rehabilitation articles (volume 1, issue 1, published 7 January 2016, according to Scopus). None of the articles in this journal issue had online preprints, according to Google Scholar, although two had post-publication author copies of the final article uploaded in June 2016 and April 2017. Thus, the low initial Mendeley reader counts may be partly due to a lack of preprint sharing in this journal specialism.

Table 3 Spearman correlations (95% confidence intervals) between Scopus citation counts from September 2017 and three other indicators (Scopus citation counts and Mendeley reader counts from February 2016 and Mendeley reader counts from September 2017)

The usefulness of early Mendeley readers as citation impact indicators can be seen from the 2017 citation counts correlating more strongly with 2016 readers than with 2016 citations (Table 3). Thus, early readers are better indicators of later citation impact than are early citations, even though early citations do positively correlate with later citations (confirming: Adams 2005). This is due to the much greater number of uncited articles than unread articles in the 2016 data.

The highest correlations reported are between citations and readers from 2017 (Table 3). This is probably due to the higher average values for Mendeley readers in 2017 compared to 2016 (Table 1), making the data more powerful (Thelwall 2016).

Discussion

This study is limited by the sample being only ten fields out of the 335 available in Scopus. The results may not apply to some fields, especially those with low Mendeley reader counts or low Scopus citation counts. It is also limited by the use of only one time interval (18 months) and one starting point. Although it seems likely that correlations would tend to be stronger for longer gaps to the citation count data, because the counts would have a higher average, this has not been proven. The extent to which the magnitudes of the correlations have been affected by any distinctive characteristics of early Mendeley readers is also unknown. For example, it is plausible that article authors form a higher proportion of early Mendeley readers than of later readers. The effects of low count averages and of any unusual properties of early readers cannot be separated from the correlation coefficient values alone.

The results complement prior research showing positive correlations between citation counts and Mendeley reader counts in the long term for all fields (Thelwall 2017c), and research showing that these correlations tend to be higher over longer time periods (Maflahi and Thelwall 2018; Thelwall and Sud 2016; Thelwall 2017a), by revealing, for the first time, that early Mendeley reader counts correlate with later citations. Although this seemed likely from previous studies, it was possible that early Mendeley readers were somewhat unusual and would therefore not correlate with later citations. For example, download counts have been shown to have a different temporal character to citation counts for one journal (Moed 2005), suggesting that early usage evidence may differ in character from later usage evidence. Although this might still be true to some extent, the evidence from the current paper suggests that it is not an important consideration. It is therefore safe to use early Mendeley reader counts as evidence of likely later citation impact.

Conclusions

The results give clear evidence that early Mendeley reader counts are useful indicators of later citation impact in most, and perhaps all, fields and are better than early citation counts in this regard. Added to prior evidence that reader counts and citation counts have moderate or strong correlations in almost all fields in the longer term (Thelwall 2017c), this establishes Mendeley reader counts as a useful early impact indicator that should be considered for evaluations involving recently published articles.

Citation counts are not universally useful as indicators of the quality of academic research, as judged by experts (HEFCE 2015), and so Mendeley reader counts inherit the limitations of citation counts in this regard.

The main drawback of Mendeley reader counts is that they can be spammed, and so they are not recommended for important evaluations of which the participants are aware in advance (Wouters and Costas 2012). Other limitations include the national and age biases discussed above. In addition, in some fields Mendeley reader counts may reflect a degree of educational or professional impact in addition to scholarly impact (Thelwall 2017b, c).

In summary, Mendeley reader counts are recommended as early impact indicators in the fields analysed when citation counts are valued as impact indicators, when there are no stakeholders that could manipulate Mendeley reader counts (or the stakeholders are not aware of the indicators in advance), and when the task involves recently published research (e.g., up to 2 years old).