Introduction

There is a comprehensive literature comparing academic publication and/or citation coverage for the Web of Science, Scopus, and Google Scholar (for recent examples see e.g. Harzing and Alakangas 2016; Martín-Martín et al. 2018), as well as a growing number of studies on Microsoft Academic (see e.g. Harzing and Alakangas 2017a, b; Hug and Brändle 2017; Hug et al. 2017; Thelwall 2017, 2018a). To date, however, very few studies have investigated coverage of the two newest sources for academic publication and citation data: Crossref and Dimensions.

Crossref, a not-for-profit organization, was founded in 2000 by 12 publishers to simplify the process of linking to research on other publisher platforms; since then it has grown to over 11,000 members from 128 countries (Fairhurst 2019). It has developed a wide range of functions over the years, but for this article our main interest is the addition of open citation data in April 2017, making it possible to use Crossref for citation analysis through an API. Since November 2017, Publish or Perish (Harzing 2007) has provided the option of searching for authors, journals and key words in Crossref. Although several articles have been published on the Crossref initiative, to the best of our knowledge there have been no articles reviewing its publication and citation coverage.
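For readers who want to experiment with these open citation data themselves, the sketch below shows a minimal query against the publicly documented Crossref REST API (api.crossref.org). The DOI is a placeholder, and the script only reads standard fields of the API's work metadata; it is an illustration, not part of the data collection for this article.

```python
# Minimal sketch: retrieve basic metadata and the citation count that
# Crossref exposes for a single DOI via its public REST API.
# The DOI below is a placeholder; replace it with any DOI of interest.
import requests

DOI = "10.1000/example.doi"  # placeholder, not a real DOI

response = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
response.raise_for_status()
work = response.json()["message"]

print("Title:    ", (work.get("title") or ["(no title)"])[0])
print("Journal:  ", (work.get("container-title") or [""])[0])
print("Cited by: ", work.get("is-referenced-by-count", 0))
```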

Dimensions was launched by Digital Science in January 2018 (see Orduña-Malea and Delgado-López-Cózar 2018 for an excellent summary of its history and functionality). This article focuses on the free version of Dimensions, which offers access to a subset of the data available in Dimensions Plus. To the best of our knowledge, there are only two published studies that have investigated Dimensions coverage. Thelwall (2018b) showed that for publications in the field of Food Science between 2008 and 2018 and a random sample of 10,000 publications for 2012 from all fields, coverage and citation counts of Dimensions were comparable to those of Scopus. Orduña-Malea and Delgado-López-Cózar (2018) showed that for most of the 17 Library and Information Science journals, the h5 index in Dimensions was only slightly lower than for Scopus, but substantially lower than for Google Scholar. A detailed comparison for the Journal of Informetrics showed that both publication and citation counts for the years 2013, 2014 and 2015 were almost identical for Dimensions and Scopus. Finally, a comparison between Scopus, Google Scholar Citations and Dimensions for all 28 authors who had won the Derek de Solla Price award showed that citations were significantly lower in Dimensions than in both Scopus and Google Scholar Citations, thus signaling disappointing coverage for author searches.

This article focuses on a detailed comparison across six data sources for an academic's full publication and citation record, as well as journal searches for six of the top journals in the field of Business & Economics. As such it provides three unique contributions. First, it presents the first study of Crossref coverage and allows us to verify whether, just over a year after its launch, Dimensions author coverage has improved. Second, it compares both Crossref and Dimensions coverage with no fewer than five other data sources. Third, by comparing author and journal coverage at the level of individual publications, it provides a more fine-grained comparison of coverage. Rather than investigating broad patterns, it thus demonstrates how an individual researcher can benefit from access to these two new data sources.

Data collection

To investigate Crossref and Dimensions coverage, I first conducted a detailed analysis of my own publication record. Despite this being a small-scale test, as discussed in more detail in Harzing (2016), there are four reasons why my own publication record is appropriate for this purpose. It includes a large number of publications in a wide range of journals in Management, International Business and Library & Information Science; it covers a 25-year period; and it includes a significant variety of non-traditional publications. Finally, given that Google Scholar covers virtually all of my significant academic publications, it provides an excellent baseline for our comparison across the six data sources. Second, I compared coverage for six top journals in the field of Business & Economics, focusing on a single volume published 10 years ago, thus allowing sufficient time for citations to accrue. Within Business, I included the sub-disciplines of Management, International Business, Accounting, Finance, and Marketing.

All data were collected in the second week of April 2019 with the aid of Publish or Perish (Harzing 2007). The free Publish or Perish software currently allows searches in six data sources—Crossref, Google Scholar, Google Scholar Profiles, Microsoft Academic, Scopus and the Web of Science—and has recently added experimental in-house support for Dimensions. Data for my own publication record were retrieved with an author search, paying special attention to the different search syntaxes in the various data sources. For Google Scholar, I used my manually curated Google Scholar Profile rather than the raw Google Scholar data. Data for the six journals were retrieved with a search for either the full journal title or the journal's ISSN. Only journal publications with substantive academic content were used for the comparison, thus excluding book reviews, calls for papers, editorial board notices, and errata. Results for all six data sources were subsequently exported to Excel, which allowed one-to-one matching of publications and a comparison of citation counts.
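The matching step also lends itself to simple scripting. The sketch below is not the exact procedure used for this article, but illustrates how two Publish or Perish CSV exports could be aligned on a normalised title and their citation counts compared; the file names, and the assumption that the exports contain Title and Cites columns, are illustrative only.

```python
# Minimal sketch (not the exact procedure used in the article): align two
# Publish or Perish CSV exports on a normalised title and compare citation
# counts per publication. File names and column names are assumptions.
import re
import pandas as pd

def normalise(title: str) -> str:
    """Lower-case a title and strip punctuation so minor formatting
    differences between data sources do not prevent a match."""
    return re.sub(r"[^a-z0-9 ]", "", str(title).lower()).strip()

crossref = pd.read_csv("crossref_export.csv")  # hypothetical export file
scopus = pd.read_csv("scopus_export.csv")      # hypothetical export file

for df in (crossref, scopus):
    df["key"] = df["Title"].map(normalise)

merged = crossref.merge(scopus, on="key", how="outer",
                        suffixes=("_crossref", "_scopus"))
merged["diff"] = merged["Cites_crossref"] - merged["Cites_scopus"]
print(merged[["key", "Cites_crossref", "Cites_scopus", "diff"]].head())
```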

Results

Publication coverage for an individual academic record

My full publication record comprises 84 journal articles, four books, 25 book chapters, a software program, and an online compilation of journal rankings (see Table 1). It also includes more than 100 conference papers and well over 200 “other” publications, such as white papers, newsletter/magazine articles, and blog posts. However, the conference papers are generally not available online, and the other publications would not typically be seen as academic publications. Hence, substantive coverage of these two publication categories would not be expected in any of the six data sources.

Table 1 Publication coverage for academic record across six data sources

As Table 1 shows, both Google Scholar and Microsoft Academic record all of my journal articles and books. Crossref and Dimensions miss one 1996 journal article in a journal that, after a change of publisher, no longer has online coverage for its early years of publication. Both cover only one of the four books, a 2018 Palgrave publication. Scopus does not record any of the books, but does include most of the journal articles. The same 1996 article is missing, as are two articles in European journals that were not yet listed in Scopus in their year of publication (2003 and 2008, respectively). The final missing journal article is one that was published online first and has not yet been allocated to a journal issue. The Web of Science does record one of the four books, the 2018 Palgrave book, but covers only 61 of the 84 journal articles.

With regard to book chapters, Google Scholar records virtually all of my book chapters, whereas Crossref, Microsoft Academic and the Web of Science record the seven individual chapters of the 2018 Palgrave book, as well as one (Crossref, Web of Science) to three (Microsoft Academic) other chapters. Dimensions and Scopus record only an incidental chapter or two. With regard to conference papers, Crossref and Dimensions record all of my Academy of Management (AoM) Proceedings papers since 2008, two of which were published in full and six as abstracts only. Microsoft Academic and Google Scholar record the same AoM Proceedings papers, but also papers from quite a few other conferences. Scopus records only the two AoM Proceedings papers that were published in full, whereas the Web of Science records none.

Only Google Scholar and Microsoft Academic record any of the “other” publications, mainly white papers, blog posts on the LSE Impact blog, and magazine articles. Further, even though it misses the Journal Quality List, Microsoft Academic does record the Publish or Perish software. None of the four other data sources—Crossref, Dimensions, Scopus or the Web of Science—records any of the non-traditional publications.

Overall, Crossref and Dimensions thus have better coverage than Scopus and the Web of Science for journal articles, book chapters and conference papers, and equally poor coverage for books and non-traditional publications. However, coverage in the two other free data sources, Google Scholar and Microsoft Academic, is substantially better.

Citation coverage for an individual academic record

Citation coverage for my full publication record was monitored on a monthly basis between early December 2018 and early April 2019. In this period, citation levels for all six data sources increased at a rate of 0.5–1.5% per month. Differences between the data sources remained stable; thus we discuss only the most recent round of data collection.

For my full publication record, Crossref and Dimensions report citation levels of 34% and 37% of those of Google Scholar (see Table 2). This is quite similar to Scopus (38%), whereas Microsoft Academic displays a much higher (88%) and the Web of Science a much lower (23%) share of Google Scholar citations. When comparing journal articles only, citation levels across the six data sources diverge less, with Microsoft Academic citation levels (98%) virtually on par with Google Scholar's, Crossref, Dimensions and Scopus sitting at 42–46% of Google Scholar citations, and the Web of Science bringing up the rear at 28%.

Table 2 Citation coverage for academic record across six data sources
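The percentages discussed above are simply each source's total citation count expressed relative to the Google Scholar total. A minimal sketch of that calculation is shown below; the totals are placeholder values chosen only to mirror the reported percentages, not the actual figures behind Table 2.

```python
# Minimal sketch: express each source's total citation count as a
# percentage of the Google Scholar baseline. The totals are placeholders,
# not the actual citation counts underlying Table 2.
totals = {
    "Google Scholar": 10000,      # baseline (placeholder value)
    "Microsoft Academic": 8800,
    "Scopus": 3800,
    "Dimensions": 3700,
    "Crossref": 3400,
    "Web of Science": 2300,
}

baseline = totals["Google Scholar"]
for source, cites in totals.items():
    print(f"{source:20s} {100 * cites / baseline:5.1f}% of Google Scholar")
```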

As overall citation levels can hide large differences for individual publications, we also compared Crossref and Dimensions citations with those of the four other data sources for each individual publication. Not surprisingly, given the difference in overall citation levels, both Crossref and Dimensions had lower citation levels than Google Scholar and Microsoft Academic for each of the publications.

For Scopus, the results are mixed. Although well over half of my publications in Crossref and two thirds of my publications in Dimensions are within a range of −5/+5 citations when compared to Scopus, between 20% (Dimensions) and 30% (Crossref) of my publications have substantially fewer citations in these sources than in Scopus. This might well be caused by the fact that Elsevier, the provider of Scopus, does not participate in the Initiative for Open Citations. At the other end of the spectrum, around 15% of my publications have substantially more citations in Crossref and Dimensions than in Scopus. However, the two most important of these are journal articles that are not included in Scopus.
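The per-publication comparison above amounts to bucketing each publication by its citation difference relative to Scopus, using a −5/+5 band. A minimal sketch of that bucketing is shown below; the DataFrame contents are placeholders, and in practice the counts would come from the matched exports described in the data-collection section.

```python
# Minimal sketch: bucket each publication by how far its Crossref citation
# count deviates from its Scopus count. The values below are placeholders;
# the +/-5 threshold follows the comparison described in the text.
import pandas as pd

matched = pd.DataFrame({
    "publication":    ["paper A", "paper B", "paper C", "paper D"],
    "cites_crossref": [12, 40, 3, 150],
    "cites_scopus":   [14, 60, 2, 120],
})

def bucket(diff: int) -> str:
    if diff < -5:
        return "substantially fewer than Scopus"
    if diff > 5:
        return "substantially more than Scopus"
    return "within -5/+5 of Scopus"

matched["bucket"] = (matched["cites_crossref"] - matched["cites_scopus"]).map(bucket)
print(matched["bucket"].value_counts(normalize=True).round(2))
```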

Fewer than 20% of the publications had a lower level of citations in Crossref and Dimensions than in the Web of Science; in most cases this was a difference of only a few citations. In contrast, well over half of my publications had more citations in Crossref than in the Web of Science, whereas this was the case for nearly two thirds of my publications in Dimensions. There were only two articles, one in JASIST and one in Scientometrics, that had a substantially higher number of citations in the Web of Science than in Crossref/Dimensions: 169 versus 155/144 and 40 versus 34/31, respectively.

Overall, Crossref and Dimensions thus have citation levels that are fairly similar to Scopus for most publications, substantially higher than the Web of Science for most publications, but significantly lower than Google Scholar and Microsoft Academic for all publications.

Analysis of publication and citation coverage for six Business & Economics journals

Table 3 reports publication coverage for six top journals in the field of Business & Economics. Four of the six data sources—Dimensions, Google Scholar, Scopus and the Web of Science—show identical coverage for each of the six journals. Crossref reports a higher number of publications for two of the journals, published by Springer and Oxford University Press respectively. This is caused by the fact that, for these two publishers, Crossref records the year in which the article first appeared online as the publication year. Microsoft Academic suffers from the same problem for the OUP journal, but not for the Springer journal. However, Microsoft Academic reports far fewer articles for the Journal of Finance than the other data sources. This is caused by Microsoft Academic listing the year in which the missing articles were published in the NBER working paper series as the year of publication. This often occurred many years before the official journal publication, and these articles were therefore missing when the search was confined to the year 2009. Both issues were reported to Crossref and Microsoft Academic and should be relatively easy to resolve.

Table 3 Publication coverage for 2009 volume of six top journals across six data sources

Table 4 reports citation levels for articles published in the 2009 volume of the six journals. Two patterns are apparent here. First, just as for my own publication record, Crossref and Dimensions report citation levels that are very similar or even nearly identical to those of Scopus, and higher than those of the Web of Science. When compared to Google Scholar citation levels, average citation coverage is 36.5% for Crossref, 38.5% for Dimensions, 38.6% for Scopus, and 33.9% for the Web of Science. At 99.5%, Microsoft Academic citation levels are almost identical to those of Google Scholar.

Table 4 Citation coverage for 2009 volume of six top journals across six data sources

Second, when compared to Google Scholar and Microsoft Academic, citation coverage in Crossref, Dimensions, Scopus and the Web of Science is substantially lower for the three journals in Accounting, Finance and Economics (on average around 30%) than for the three journals in Management, International Business and Marketing (on average around 44%). Obviously, we would need a bigger sample of journals to draw any firm conclusions, but it appears that, even within a single discipline, there is strong variance in coverage between the different data sources.

Conclusion

Our comparison of coverage across six data sources showed that the two new kids on the block, Crossref and Dimensions, hold their own when compared to Scopus and the Web of Science, but are beaten by Google Scholar and Microsoft Academic. This means that if our findings with regard to Crossref and Dimensions can be confirmed by larger-scale studies, our options for literature search and citation analysis would certainly have widened further.

All four free data sources have an edge in terms of recency, as they often record publications within weeks of them being published online. In contrast, although Scopus does list “in-press” articles, it typically does so much later than the four other data sources; the Web of Science only enters articles in its database once they are part of a print publication issue. Given that, especially in the Social Sciences, articles can be available online first for 1–3 years before being allocated to a print issue, this could be a major drawback when doing a literature review. It is also problematic when reviewing an academic's publication profile, as knowing about their recent publications is important in recruitment and promotion, when searching for reviewers, keynote speakers, and examiners, and even in preparing for a meeting with the academic in question.

The six data sources have very different search options and limitations. For author and journal searches, the ease of disambiguation varies substantially between data sources. For keyword searches, some of the data sources allow searching for title words only, for others the standard search covers the title, abstract and keywords, while yet others allow full-text searching. The ease of searching for multiple authors or journals in a single search also differs by data source, as do the availability of affiliation searches and the option to exclude author names or keywords. Thus, access to six sources of publication and citation data—four of which offer free access—with roughly equivalent publication coverage and varying levels of citation coverage offers academics a wide array of choices for literature reviews and citation analysis.