Introduction

In comparison to the many dozens of articles reviewing and comparing (coverage of) the Web of Science, Scopus, and Google Scholar (for the latest see e.g. Delgado-López-Cózar and Repiso-Caballero 2013; Wildgaard 2015; Harzing and Alakangas 2016), the bibliometric research community has paid very little attention to Microsoft Academic Search. A Google Scholar search for journal articles with Microsoft Academic Search (MAS) in the title produces only five results. The same search for Google Scholar, the Web of Science, or Scopus produces many hundreds of journal articles for each database. This is quite surprising given that Nature reporter Van Noorden (Van Noorden 2014), a frequent commentator on bibliometric developments, wrote: “A few years ago, Microsoft Academic Search (MAS) was vying with Google Scholar to be the web’s pre-eminent free scholarly search engine. Both products indexed tens of millions of scholarly documents, tracked their citations, and made profile pages for academics. […] The stage was set for bibliometric battle”.

Jacsó (2011) was the first to write about MAS, providing a review of the major content and software features, as well as its shortcomings. His verdict was: “this free bibliometric service is a project of great interest to those interested in metrics-based research performance evaluation” (Jacsó 2011: 983). Surprisingly, a full 3 years passed without any articles dealing with MAS until Ortega published two articles in 2014. The first compared 771 author profiles between Google Scholar Citations and MAS (Ortega and Aguillo 2014) and concluded that Google Scholar reported more publications and citations. The second (Ortega 2014) used MAS to study co-author networks and highly recommended MAS for collaboration studies, provided problems with duplicate profiles and infrequent updating could be resolved.

So why has the bibliometric community almost completely ignored MAS? One of the reasons might have been that its native interface was not very suitable for citation analysis. However, the same is true for Google Scholar, and bibliometric researchers have turned en masse to Publish or Perish (Harzing 2007) to do bibliometric research with Google Scholar. Publish or Perish has included a search option for MAS since 2013, which was used by Haley (2014) to compare Economics & Finance journals in Google Scholar and MAS. Haley found citation levels to be substantially higher in Google Scholar than in MAS, with the mean h-index roughly twice as high. Rank correlations, however, were found to be very high.

The last article published on MAS to date might explain the bibliometric community’s lack of enthusiasm. Orduña-Malea et al. (2014) published a comprehensive analysis of MAS coverage and showed (as many users had no doubt noticed through incidental searches) that almost no new coverage had been added since 2012. However, fast forward 2 years and Microsoft Academic Search has arisen from the ashes with a new service, Microsoft Academic, built on content that search engine Bing crawls from the web, including publisher websites, university repositories, and researcher and departmental web pages. Citation counts are the sums of the reference links between the papers.

However, the big question that will burn on bibliometricians’ minds is: Is its coverage any better than its previous incarnation? This article provides a first attempt to answer this question through a comparison of the publication and citation record of a single academic for each of the four main citation databases: Google Scholar, Microsoft Academic, the Web of Science, and Scopus.

An individual academic record as a case study

In order to assess the coverage of the new Microsoft Academic in comparison to Google Scholar, Scopus, and the Web of Science, I conducted a detailed analysis of my own publication record. Although this is obviously a limited test of Microsoft Academic as a new data source for citation analysis, there are several reasons why I think my own publication record presents an appropriate test.

First, it includes enough publications—varying from 47 in the Web of Science to 124 in Google Scholar—to avoid idiosyncratic results. In addition, with over 10,000 Google Scholar citations and relatively few publications without citations (generally 2016 publications and conference papers), citation levels are also high enough to avoid idiosyncratic results.

Second, covering 22 years (1995–2016), it includes both older and younger publications, including some papers only available in online first. This should allow us to assess to what extent Microsoft Academic covers older publications as well as very recent ones.

Third, it includes a wide variety of publications. Looking at the 124 Google Scholar publications, 47 are in journals that could be considered to be mainstream in their field, 22 are in secondary journals, 20 are book chapters, 15 are conference papers, 12 are white papers published only on my website, three are books, and three are non-refereed publications (two newsletter articles, one company report). The two final ones are a journal ranking available only on my website (The Journal Quality List) and a software program (Publish or Perish). This variety should allow us to assess the extent to which Microsoft Academic covers non-traditional publications.

Fourth, virtually all of my academic publications are included in Google Scholar, including all of my journal articles, books and book chapters, as well as 12 of my 14 white papers. The only two white papers that are not listed in Google Scholar relate to teaching (“Writing coursework assignments” and “How to address your teacher”). Not all of my conference papers are listed in Google Scholar, but this is only natural, as many of them never appeared online. Hence, my Google Scholar publication record provides an excellent baseline for our comparison across databases.

Data collection

Searches for Google Scholar and Microsoft Academic were conducted with Publish or Perish (Harzing 2007). Publish or Perish is used primarily in conjunction with Google Scholar, but has recently implemented experimental support for Microsoft Academic. It also offers extensive data import facilities, providing the ability to import, amongst others, Scopus and Web of Science data. Searches for Scopus and the Web of Science were thus conducted in their native interfaces, exported, and subsequently imported into Publish or Perish to allow for calculation of the various citation metrics. Results for all four databases were subsequently exported to Excel, allowing for one-on-one matching of publications and comparison of citation counts.

Only publications with substantive academic content were included in our comparison. This means that we excluded book reviews, errata and corrigenda for all four databases. Stray publications, i.e. publications referring to the same master record with slightly different bibliographic details, were merged into their master record for both Google Scholar and Microsoft Academic. Obvious parsing errors, such as lists of reviewers, were also excluded, as were publications by other authors in my edited textbook. There were far more stray publications and parsing errors for Google Scholar than for Microsoft Academic.

Microsoft Academic only displayed two clear parsing errors. In both cases authors of one publication were combined with publication details of another. In addition, there were about ten incongruous stray publications created by picking up pre-publication versions with a different title or publications from two different sources; none of these had any citations. A special category of stray publications in Microsoft Academic concerned five articles where citations were split between a version with the main title only, and a version with both the main title and a sub-title. In addition, there were two articles where citations were split between two versions of the document, because the separator between main and sub-title had been processed in different ways. For instance a question mark had been variously replaced by |[quest]| and a single letter q.
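The merging of split records described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used (the matching in this study was done by hand in Excel). The function names, the choice of a colon or question mark as the title separator, and the example titles and citation counts are all assumptions made for the sketch.

```python
import re
from collections import defaultdict


def main_title_key(title):
    """Reduce a title to a normalized key based on its main title only."""
    # Undo the "|[quest]|" artifact mentioned in the text (assumed to stand for "?").
    text = title.replace("|[quest]|", "?")
    # Assumption: the first colon or question mark separates main title and sub-title.
    main = re.split(r"[:?]", text, maxsplit=1)[0]
    # Lowercase and collapse punctuation and whitespace so near-identical titles match.
    return re.sub(r"[^a-z0-9]+", " ", main.lower()).strip()


def merge_strays(records):
    """Sum citation counts over records that share the same main-title key."""
    totals = defaultdict(int)
    for title, citations in records:
        totals[main_title_key(title)] += citations
    return dict(totals)


# Hypothetical records: one article split over three versions, plus one clean record.
example_records = [
    ("Does language matter", 10),
    ("Does language matter? A hypothetical sub-title", 5),
    ("Does language matter|[quest]| A hypothetical sub-title", 3),
    ("Another article: with a sub-title", 7),
]
merged = merge_strays(example_records)
```

Keying on the main title alone can wrongly merge distinct works that share a main title, and it does not catch the variant in which the question mark was replaced by a single letter q, so any automatic merge of this kind would still need a manual check.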

Finally, we discovered one other problem with Microsoft Academic that will need to be fixed before any metrics based on the year of publication can be used: the database indicated the wrong publication year for some papers, even though Microsoft Academic listed the correct journal volume. Incorrect year allocations are by no means uncommon in Google Scholar either. In fact, seven of my 124 publications were allocated the wrong publication year in Google Scholar, two because of inexplicable parsing errors (the source document displayed the correct year) and five because Google Scholar used a pre-publication version as its master record. However, as these incorrect year allocations were only 1 year “out”, this is not generally a major problem.

Incorrect year allocations were more frequent in Microsoft Academic: no fewer than eighteen out of my 89 publications had the wrong publication year. Out of these, one was an inexplicable parsing error of a fairly obscure book chapter and one occurred as Microsoft Academic used a 2012 reprint in a Romanian journal as the source for a 2008 white paper. Just as in Google Scholar, incorrect year allocations in Microsoft Academic also occurred because the pre-publication or online first version was used as a source record (seven occurrences in total). In all these cases the publication year was only 1 year out, which is unlikely to cause major problems. A more disturbing problem was the fact that nine of my publications carried the wrong year in spite of referring to a source document with the correct year. In these cases, the publication year was often “way out” (10 years or more in three cases). All nine records concerned journals published by either Emerald or Taylor & Francis, with the five Emerald records all being allocated a 2013 publication year (with actual publication years varying between 2001 and 2012). Hence, it would appear that there is a parsing problem with the websites of these two specific publishers, which will hopefully be resolved soon.
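To put the two error rates side by side, the counts reported above can be tallied in a short Python sketch. All counts come from the text; the category labels and variable names are my own shorthand.

```python
# Wrong-year records in Microsoft Academic, by cause (counts as reported above).
ma_wrong_year = {
    "inexplicable parsing error": 1,
    "reprint used as source": 1,
    "pre-publication or online first version used as source": 7,
    "correct year in source document, wrong year in database": 9,
}
ma_total_wrong = sum(ma_wrong_year.values())

# Share of records with a wrong publication year in each database.
ma_share = ma_total_wrong / 89   # 89 publications in Microsoft Academic
gs_share = 7 / 124               # 7 of 124 publications in Google Scholar

print(f"Microsoft Academic: {ma_total_wrong} of 89 ({ma_share:.1%})")
print(f"Google Scholar: 7 of 124 ({gs_share:.1%})")
```

On these counts, roughly one in five Microsoft Academic records carried a wrong year, against roughly one in eighteen for Google Scholar.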

Results

Figures 1 and 2 visually display the comparative coverage of the four databases with regard to publications and citations. For both cases, we will first discuss the overlap in coverage between the databases and then look at the publications and citations unique to each of the four databases. As our interest in this article is in Microsoft Academic coverage, we do not provide a comparison between Google Scholar on the one hand and the Web of Science and Scopus on the other hand, or between the Web of Science and Scopus. There are many publications that have already done so in the past, including most recently Harzing and Alakangas (2016).

Fig. 1 Comparing publication coverage between four databases

Fig. 2 Comparing citation coverage between four databases

Publications: overlap between the four databases

As indicated above, I have 124 unique publications in Google Scholar. Of these, 89 were also present in Microsoft Academic; this included all 69 of my journal publications, all three books, seven of the fifteen conference papers, seven of the twenty book chapters, one of the white papers, and both of the newsletter articles.

Of the 89 publications listed in Microsoft Academic, only 46 were listed in the Web of Science. All of these were journal articles. This included 40 of the 47 publications in mainstream journals, but only 6 of the 22 publications in secondary journals.

Of the 89 publications listed in Microsoft Academic, only 59 were listed in Scopus. All but three of these were journal articles. This included 44 of the 47 publications in mainstream journals (including two in-press articles) and 12 of the 22 publications in secondary journals. Scopus also covered two conference papers published in the Academy of Management’s best papers proceedings and one book chapter in the series Progress in International Business Research (Emerald publishers), a yearly research annual with selected papers presented at the European International Business Academy conference.

Conclusion

In comparison to the Web of Science and Scopus, Microsoft Academic covers a far larger number of publications that are listed in Google Scholar and—importantly—covers all journal publications and books that are also covered in Google Scholar. This suggests that Microsoft Academic has excellent coverage of what are usually considered to be the most important academic outputs: journal articles and books.

Publications: unique coverage in the four databases

Microsoft Academic compared with Google Scholar

There are no publications covered in Microsoft Academic that are not covered in Google Scholar (B1 = 0). Google Scholar included 35 publications that were not included in Microsoft Academic (A1 = 35). As indicated above, Microsoft Academic included all journal articles and books in our case study. Hence the 35 publications unique to Google Scholar were book chapters (13), white papers (11), conference papers (8), a web-based journal ranking (the Journal Quality List), a software product (Publish or Perish), and a company report.

For nearly half of these publications (17 publications), Google Scholar records are of the “[citation]” type, indicating that although Google Scholar found citations to these publications, it was not able to find the original publication. Of the remaining 18 publications, eleven publications were found on the author’s personal academic website, three on Google Books, three in online conference proceedings, and one on the website of Emerald publishing.

As Microsoft Academic did find seven of the book chapters, seven of the conference papers and one of the white papers, we tried to establish whether these publications differed in any way from the ones that were only listed in Google Scholar. This was easy for the sole white paper as Microsoft Academic actually found a reprint of this white paper in a Romanian journal. Of the seven book chapters, four were sourced from pre-publication versions at the author’s website, one from ResearchGate, one from an institutional repository, and one did not have a source item. Of the seven conference papers, four were sourced from the Academy of Management proceedings, one from a pre-publication version at the author’s website and two from university repositories. It is unclear why some book chapters and conference papers available as pre-publication versions on the author’s website were sourced by Microsoft Academic and others were not.

Microsoft Academic compared with Web of Science

In total, there are 43 publications covered in Microsoft Academic that are not covered in the Web of Science (B2 = 43). Microsoft Academic covered twenty non-journal publications (books, book chapters, conference papers, white papers, and newsletter articles) that were not included in the Web of Science.

However, seven of the articles published in mainstream journals included in Microsoft Academic were not included in the Web of Science either. For three of those, this was caused by the fact that the publications were either available only in online first (two) or were recently published, but not yet entered into the Web of Science database. The remaining four journal articles unique to Microsoft Academic concerned publications in 1995, 1996, 1997 and 2003 in journals that were not ISI listed at the time, but are included in the Web of Science now.

Of the twenty-two publications in secondary journals that are covered in both Google Scholar and Microsoft Academic, sixteen were not listed in the Web of Science at the time the publications appeared. These publications represent eleven different journals and all but one of the publications occurred between 2001 and 2008. Of these eleven journals, all but one (Footnote 1) are now included in the Web of Science.

In contrast, there is only one publication listed in the Web of Science that is not listed in Microsoft Academic (A2 = 1). This concerns a book chapter in an edited book, published by Routledge in 2011.

Microsoft Academic compared with Scopus

In total there are 30 publications covered in Microsoft Academic that are not covered in Scopus (B3 = 30). The comparison between Microsoft Academic and Scopus for non-journal publications is similar in nature to that between Microsoft Academic and the Web of Science in that Microsoft Academic included seventeen non-journal publications that Scopus did not cover.

The three unique publications in mainstream journals in Microsoft Academic included two articles published in 1995 and 1996 before the original start of Scopus coverage in 1996 (Footnote 2). A final publication in 2003 was published in a journal that was not listed in Scopus until 2005.

Of the 22 publications in secondary journals that are covered in both Google Scholar and Microsoft Academic, ten were not listed in Scopus at the time the publications appeared. These publications represent eight different journals and all but one of the publications occurred between 2001 and 2008. All eight journals are now included in Scopus, with Scopus adoption nearly always occurring only 1 or 2 years after the relevant articles were published.

In contrast, there are only two publications listed in Scopus that are not listed in Microsoft Academic (A3 = 2). This concerns the same book chapter as listed in the Web of Science, plus another book chapter in a research annual Advances in International Management, published by Emerald publishers in 2003.
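The A and B values reported in this section are simply the sizes of two set differences. The Python sketch below reproduces them under one loud assumption: the integer identifiers are placeholders chosen only so that the set sizes and overlaps match the counts in the text. They do not correspond to actual publications, and the function name is hypothetical.

```python
def unique_coverage(other, ms_academic):
    """Return (A, B): publications only in `other`, and only in Microsoft Academic."""
    return len(other - ms_academic), len(ms_academic - other)


# Placeholder identifiers; only the counts and overlaps are taken from the text.
google_scholar = set(range(124))          # 124 publications
ms_academic = set(range(89))              # 89 publications, all also in Google Scholar
web_of_science = set(range(46)) | {89}    # 46 shared with Microsoft Academic, 1 unique
scopus = set(range(59)) | {89, 90}        # 59 shared with Microsoft Academic, 2 unique

a1, b1 = unique_coverage(google_scholar, ms_academic)
a2, b2 = unique_coverage(web_of_science, ms_academic)
a3, b3 = unique_coverage(scopus, ms_academic)
```

With real data, the identifiers would be the matched master records from the one-on-one matching step, and the same two set differences would yield the unique publications themselves rather than just their counts.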

Conclusion

Microsoft Academic performs very well in our comparison of unique coverage in the four databases. On the one hand, it does not display any unique coverage vis-à-vis Google Scholar, whereas Google Scholar has 35 additional publications not covered by Microsoft Academic. On the other hand, it does display a substantial unique coverage vis-à-vis both the Web of Science (43 publications) and Scopus (30 publications). Unique coverage for the Web of Science and Scopus vis-à-vis Microsoft Academic is minuscule: one book chapter for the Web of Science and two book chapters for Scopus.

In addition to many non-journal publications, the unique coverage for Microsoft Academic includes 23 journal articles when compared to the Web of Science and 13 unique articles when compared to Scopus. It must be acknowledged that all but one of the journals in question are now covered in both the Web of Science and Scopus, thus indicating that they were by no means obscure journals. Hence, for very recent journal publications there might be little, if any, difference between the coverage of Google Scholar, Microsoft Academic, the Web of Science, and Scopus. This is of little solace, however, for academics with (an interest in) publications that stretch back in time. In those situations, only Google Scholar and Microsoft Academic will provide sufficient coverage.

Citations: overlap between the four databases

Figure 2 provides a visual illustration of both the overlap and the unique coverage of the four databases in terms of the citations associated with the relevant publications. For those 89 publications that overlap between Microsoft Academic and Google Scholar, Google Scholar has more than 2.5 times as many citations as Microsoft Academic.

Part of the reason for this is that Microsoft Academic citation counts for non-journal publications in particular were quite modest. With 97 citations, only the Managing the Multinationals book had a substantive number of citations, although this was still considerably lower than in Google Scholar (433 citations). However, for the two other books, the comparison with Google Scholar was even more unfavourable: 20 versus 203 citations for The Publish or Perish Book and 14 versus 392 citations for the International HRM textbook. Most of the seventeen conference papers, book chapters, and non-refereed publications had either zero or one citation in Microsoft Academic. In fact the total number of citations for these seventeen publications in Microsoft Academic was only 26. Google Scholar’s citation level for these seventeen publications was not very high either, but at 187 was still seven times as high.

When comparing citations for the 46 publications that are listed in both Microsoft Academic and the Web of Science, we find that Microsoft Academic has approximately 20 % higher citation levels overall. This doesn’t mean that every individual publication shows the same pattern. More than one-third of the publications (17 out of 46) have at least 20 % more citations in Microsoft Academic, going up to 94 and 170 % for two specific journal articles. Another third of the publications (16 out of 46) have between 3 % and 19 % more citations or have citation levels equal to the Web of Science. Thirteen articles had fewer citations in Microsoft Academic than in the Web of Science, but the difference in all cases was marginal: 1–3 citations for eleven articles and 4 or 5 for the remaining two.
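The three groups used in this comparison can be expressed as a small classification rule. The Python sketch below is an illustration only: the function name, the bucket labels, and the citation pairs are hypothetical, and the treatment of a zero Web of Science count (any positive Microsoft Academic count lands in the top group) is an assumption of the sketch.

```python
from collections import Counter


def bucket(ma_citations, wos_citations):
    """Classify one publication by its Microsoft Academic vs Web of Science citations."""
    if ma_citations < wos_citations:
        return "fewer in Microsoft Academic"
    # ma >= 1.2 * wos, written with integers to avoid rounding and division by zero.
    if ma_citations > wos_citations and 5 * ma_citations >= 6 * wos_citations:
        return "at least 20 % more"
    return "equal or up to 19 % more"


# Hypothetical (Microsoft Academic, Web of Science) citation pairs.
example_pairs = [(120, 100), (27, 10), (105, 100), (50, 50), (9, 11), (3, 0)]
bucket_counts = Counter(bucket(ma, wos) for ma, wos in example_pairs)
```

Applied to the 46 matched publications, a rule of this kind would yield the 17, 16 and 13 publications reported above.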

When comparing citations for the 59 publications that are listed in both Microsoft Academic and Scopus, we find that overall citation levels are very similar indeed, with citations in Microsoft Academic being less than 1 % lower than in Scopus. This is reflected in the article-by-article comparison where roughly half of the articles had more citations in Microsoft Academic and half had more citations in Scopus. Absolute differences, however, were fairly small; only eight articles differed by more than 10 citations either way, and more than half of the articles differed by 3 citations at most.

Conclusion

Microsoft Academic performs very well in terms of citation counts for articles that overlap with other databases. It outperforms the Web of Science for nearly all articles and is an equal to Scopus. Only Google Scholar still outperforms Microsoft Academic in this respect.

Citations: unique coverage in the four databases

In addition to comparing citations for publications that can be matched across databases, it is important to assess to what extent unique publications in each database contribute to the overall citation count.

Microsoft Academic compared with Google Scholar

As there are no publications unique to Microsoft Academic, there are no unique citations for Microsoft Academic when compared to Google Scholar (B1 = 0). There are, however, 35 unique publications in Google Scholar that have accumulated 1310 citations in total (A1 = 1310). Most of these citations came from Publish or Perish (521 citations) and two book chapters published in research annuals (189 and 101) that were not covered in Microsoft Academic. Five further publications unique to Google Scholar with significant citation levels were the Journal Quality List (79), three chapters on international assignments in three different editions of my International Human Resource Management book (67, 51 and 48 citations) and a conference paper comparing Google Scholar with the Web of Science (46 citations). Hence, 84 % of the unique citations in Google Scholar came from less than a quarter of the unique publications.
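The concentration figure at the end of this paragraph follows directly from the citation counts listed in it, as the short Python check below shows. The counts are taken from the text; the variable names are my own.

```python
# Citations of the eight most-cited publications unique to Google Scholar.
top_unique_citations = [521, 189, 101, 79, 67, 51, 48, 46]

total_unique_citations = 1310   # A1, all 35 unique publications
total_unique_publications = 35

top_sum = sum(top_unique_citations)
citation_share = top_sum / total_unique_citations
publication_share = len(top_unique_citations) / total_unique_publications

print(f"{top_sum} of {total_unique_citations} citations ({citation_share:.0%}) "
      f"come from {len(top_unique_citations)} of {total_unique_publications} "
      f"publications ({publication_share:.0%})")
```

The eight publications account for 1102 of the 1310 citations, while making up just under 23 % of the 35 unique publications.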

Microsoft Academic compared with Web of Science

There are 43 unique publications in Microsoft Academic when compared to the Web of Science, which have accumulated 1210 unique citations (B2 = 1210). Most of these unique citations came from journal publications, including four fairly highly cited publications (63–207 citations) in secondary journals. More than a third—generally either conference papers or very recently published journal articles—of the 43 unique publications had either no citations or just 1 citation. Hence, three quarters of the unique citations in Microsoft Academic came from just 16 % of the unique publications.

The only unique publication listed in the Web of Science (a book chapter in an edited book) didn’t have a single citation. Hence there are no unique citations in the Web of Science when compared to Microsoft Academic.

Microsoft Academic compared with Scopus

There are 30 unique publications in Microsoft Academic when compared to Scopus, which have accumulated 596 unique citations (B3 = 596). Most of these unique citations came from journal publications, including four fairly highly cited publications (37–71 citations) in secondary journals. A third of the 30 unique publications—generally either conference papers or book chapters—had no citations. Hence, more than three quarters of the unique citations in Microsoft Academic came from less than a quarter of the unique publications.

Only one of the two unique publications listed in Scopus (a book chapter in a research annual) had citations. As this book chapter was fairly highly cited (A3 = 85 citations), in contrast to the Web of Science, Scopus did have a non-negligible number of unique citations when compared to Microsoft Academic.

Conclusion

Microsoft Academic performs very well in our comparison of unique citations in the four databases. On the one hand, it does not display any unique citations vis-à-vis Google Scholar, whereas Google Scholar has 1310 additional citations not covered by Microsoft Academic. On the other hand, it does display a substantial number of unique citations vis-à-vis both the Web of Science (1210 citations) and Scopus (596 citations). Unique citations for the Web of Science and Scopus are either non-existent (Web of Science) or relatively modest (Scopus).

Most of the unique citations in Microsoft Academic relate to journal articles and it must be acknowledged that unique citations are concentrated in a fairly small number of unique publications. However, the conclusion that Microsoft Academic performs well in comparison to the Web of Science and Scopus in citation coverage as well as publication coverage is inescapable.

Conclusion

Our detailed comparison of coverage across four databases showed that Microsoft Academic significantly outperforms the Web of Science in terms of both publication and citation coverage. Microsoft Academic can also be considered to be at least an equal to Scopus on both counts. Only Google Scholar outperforms Microsoft Academic in terms of both publications and citations.

The biggest differences between Google Scholar and Microsoft Academic lie in two areas. First, Google Scholar includes coverage of non-standard research outputs, such as the Publish or Perish software, thus providing additional citations for unique publications. Second, Google Scholar has more citations for all of the overlapping publications, and substantially more in some cases.

We did find that the additional journal coverage of both Google Scholar and Microsoft Academic concerned journals that currently are included in both the Web of Science and Scopus, even though they were not at the time the articles in question were published. Thus differences between databases might become smaller over time. However, for those interested in a cross-section of younger and older publications, both Google Scholar and Microsoft Academic appear to be a better choice than the Web of Science or Scopus.

So what does this mean for an individual academic? A comparison of my h-index across databases shows it to be more than twice as high in Google Scholar (46) as in the Web of Science (22). Microsoft Academic (30) and Scopus (27) provide values in between these two extremes. In terms of the hIa, an individual annualized h-index (see Harzing et al. 2014), differences are smaller as both Scopus and the Web of Science miss coverage of a range of older articles, thus reducing the number of years since my first publication. As a result, the values of the hIa for Scopus (1.11), Microsoft Academic (1.10) and the Web of Science (1.06) are very close together. At 1.81, the hIa in Google Scholar is substantially higher.
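For readers unfamiliar with the two metrics, the Python sketch below shows how they can be computed. It rests on assumptions rather than on the Publish or Perish implementation. The hIa is taken, following my reading of Harzing et al. (2014), as the h-index of author-normalized citation counts divided by academic age. Academic age is counted inclusively, in line with the "22 years (1995–2016)" used earlier in this article. The example papers are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def hia(papers, first_year, current_year):
    """Individual annualized h-index for a list of (citations, n_authors) pairs."""
    # Normalize each paper's citations by its number of authors.
    normalized = [citations / n_authors for citations, n_authors in papers]
    # Assumption: academic age counts both the first and the current year.
    academic_age = current_year - first_year + 1
    return h_index(normalized) / academic_age


# Hypothetical record: five papers as (citations, number of authors).
example_papers = [(10, 1), (8, 2), (6, 3), (4, 1), (1, 1)]
example_h = h_index([citations for citations, _ in example_papers])
example_hia = hia(example_papers, first_year=2012, current_year=2016)
```

The sketch also makes visible why missing older articles matters: a database that lacks the earliest publications shortens the academic age in the denominator, which pulls hIa values closer together across databases even when the raw h-index differs widely.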

Overall, this first small-scale case study suggests that, provided some teething problems with regard to publication duplicates and wrong year allocations can be resolved, the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis, especially if coverage for books and non-traditional research outputs could be further improved. If our findings can be confirmed by larger-scale studies, Microsoft Academic might well turn out to combine the advantages of broader coverage, as displayed by Google Scholar, with the advantages of a more structured approach to data presentation, typical of Scopus and the Web of Science. If so, the new Microsoft Academic service would truly be a Phoenix arisen from the ashes.