Introduction

Despite the availability of alternatives such as Scopus and Google Scholar, Thomson Reuters’ ISI Web of Knowledge (or ISI for short) is still used in the majority of benchmarking analyses and bibliometric research projects. In addition, the current reward structure in the academic profession is increasingly focused on counting papers and citations (see Adler and Harzing 2009 for an analysis of the deleterious effects of this). Therefore, it is important to be aware of the limitations of any data provided by ISI. There are several papers that provide good overviews of the problems associated with the use of the ISI Web of Knowledge as a data source (e.g. Seglen 1997; Cameron 2005). Most of these problems revolve around ISI’s limited coverage, especially in the Social Sciences and Humanities. Poor aggregation of citations to minor variations of the same title is also listed as a disadvantage of the ISI (social) science citation index (Reedijk 1998; Harzing and van der Wal 2008).

The current paper, however, deals with another limitation that will be shown to disproportionally affect the Social Sciences: ISI’s misclassification of journal articles containing original research into the “review” or “proceedings paper” category. In the ISI Web of Knowledge each item is categorized into a particular “document type” category. Overall, there are nearly 40 different document types, but the most frequently used are: “article”, “review”, and “proceedings paper”. Although these document categories were initially created to help readers find relevant literature references, changes in academic reward structures mean that these categories could easily turn into value statements on the quality of research contained in them.

The ISI Web of Knowledge does not provide a definition of any document type in their helpfile, but in various documents (e.g. Journal Citation Report Quick Reference Card), Thomson contrasts “review articles” with “original research articles”. There is no commonly agreed definition of review articles and different disciplines might value them differently. However, in the Social Sciences review articles are normally considered to be articles that do not contain original data and simply collect, review and synthesize earlier research, without including substantial theoretical or conceptual development.

Thomson does not define proceedings papers in their helpfile either, but one can only assume them to be papers published in conference proceedings. Conference proceedings are a very common and respected outlet in some disciplines, such as Computer Science. However, in the Social Sciences they are seen as mere stepping stones to future publication in a peer-reviewed journal. The more prestigious conferences (such as the Academy of Management in the field of management) either do not publish proceedings or publish only short abstracted papers.

In most of the Social Sciences, neither review articles nor proceedings papers would be considered worthy of the quality stamp reserved for an original piece of research. Hence, having one of their publications misclassified by ISI might be problematic for individual academics. Disseminating new knowledge through journal publications is usually one of the key criteria for appointment, tenure and promotion in universities (Bailey 2002). Moreover, many universities now offer significant financial rewards for journal articles published in top journals (see e.g. McDonald and Kam 2007). Misclassification could distort international rankings of universities. For instance, one of the most influential rankings, the Shanghai Jiao Tong University ranking, only includes articles and proceedings papers in their analysis.

A reliable distinction between different document types is also important for bibliometric research as some researchers do not include proceedings papers (see e.g. Campanario et al. 2011), or reviews (see e.g. Davarpanah and Aslekia 2008) in their studies. Many even exclude both categories from their analyses (see e.g. Rodríguez-Navarro 2011; Shin et al. 2012; Choi 2012) and focus only on articles. Misclassification of journal articles might therefore distort bibliometric analyses. Finally, although Thomson (Thomson Reuters, nd) includes review articles and proceedings papers in their selection of “hot papers”, their media reporting (Thomson Reuters 2011) excludes review articles. Hence academics that have their papers misclassified as reviews are robbed of the possibility to have their paper publicized by Thomson as a hot paper.

When collecting the 2009 data for a longitudinal research project on editors and editorial board members (see Harzing and Metz 2012a, b; Metz and Harzing 2012), I noticed that some academics had a large number of papers categorised in the “review” and “proceedings paper” document type. This had not been the case in our earlier data collection rounds. A detailed investigation could find no indication that any of these papers were in fact review papers or conference papers. Without exception they were full-length journal articles published in top journals in their field that only publish original research. Their misclassification appeared to be due to Thomson Reuters’ application of inappropriate criteria.

Early in 2010, I wrote to Thomson to highlight the nature and seriousness of these misclassification problems. After a fairly protracted email exchange, they agreed to change the classification of any specific articles that I felt were misclassified. However, they did not appear to acknowledge the fundamental nature of my concerns. This article therefore investigates the occurrence of this misclassification on a larger scale. Based on this, I conclude that both ISI’s “proceedings paper” classification and its “review” classification illustrate a profound misunderstanding of research and publication practices in the Social Sciences.

Methodology

In this article, I report on a comprehensive, 11-year analysis of ISI document categories for three journals in each of four Social Science disciplines (business, management, sociology, and economics), as well as four Science disciplines (mathematics, chemistry, computer science, and neurosciences). In each of these eight disciplines, I selected the three journals with the highest journal impact factor, unless these journals were pure review journals. I also included the three most prominent journals publishing bibliometric research: Journal of the American Society for Information Science and Technology, Journal of Informetrics and Scientometrics.

If a journal appeared in more than one discipline, I used the next ranked journal instead. This was also done if one or more of the journals did not include abstracts in the Web of Science. As a secondary aim of the investigation was to establish whether Thomson had changed their practices since I notified them of the problems early in 2010, the analysis was split into two time periods: 2001–2009 and 2010–2011. Since data were collected between October and December 2011, not all of the 2011 journal issues had been entered into the ISI database. However, I have no reason to believe that this would substantially change the results. For all but one of the journals (Journal of Informetrics) data were available since 2001.

For each journal, I recorded the number of publications in the following three document types: articles, reviews, and proceedings papers. The other document types were ignored. For review articles, I also recorded the number of reviews that had more than 100 references. Finally, for this category I established, for each of the review articles, whether the article was correctly classified as a review. As the total number of review articles across the nine disciplines exceeded 1,100, this was a fairly substantial task. However, in the vast majority of cases the accuracy of the classification could easily be established by reading the abstract. Only for several dozen articles in the Sciences was it necessary to download and read the entire article. Articles were only judged to be correctly classified as reviews if they did not include original empirical or substantial conceptual or theoretical work. Articles correctly classified as reviews in the Social Sciences were normally either pure literature reviews or meta-analyses. In the Sciences, review articles generally summarized prior empirical research.
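The tallying procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: the record fields (`doc_type`, `n_refs`, `is_true_review`) and the sample records are hypothetical, and the judgement of whether a review was correctly classified was made by hand from abstracts and full texts, not by code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    journal: str
    doc_type: str                          # ISI document type, e.g. "article"
    n_refs: int                            # number of cited references
    is_true_review: Optional[bool] = None  # manual judgement, reviews only


def summarise(records):
    """Tally the three document types and the two review-specific counts."""
    counts = {"article": 0, "review": 0, "proceedings paper": 0}
    reviews_over_100 = 0
    reviews_correct = 0
    for r in records:
        if r.doc_type not in counts:
            continue  # all other document types are ignored
        counts[r.doc_type] += 1
        if r.doc_type == "review":
            if r.n_refs > 100:
                reviews_over_100 += 1
            if r.is_true_review:
                reviews_correct += 1
    return counts, reviews_over_100, reviews_correct


# Hypothetical records, for illustration only (not data from this study).
sample = [
    Record("Journal A", "article", 45),
    Record("Journal A", "review", 132, is_true_review=False),
    Record("Journal A", "review", 210, is_true_review=True),
    Record("Journal A", "proceedings paper", 60),
    Record("Journal A", "editorial material", 3),
]
counts, over_100, correct = summarise(sample)
```

Dividing `correct` by the number of reviews gives the share of correctly classified reviews reported per discipline in the Results.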

Results

Proceedings papers

In my initial investigation, I found that although proceedings papers were not defined in the helpfile, Thomson did in fact provide a definition of proceedings papers in an FAQ. This FAQ explains why the number of articles in the Web of Science has gone down and the number of proceedings papers has gone up since October 2008, at which time Thomson integrated the Conference Proceedings Index into the Web of Science (see Thomson Reuters 2008). According to Thomson a ‘Proceedings Paper’ is:

a document in a journal or book that notes the work was presented – in whole or in part – at a conference. This is a statement of the association of a work with a conference. Prior to October 2008, these items displayed as “Article” in the Web of Science product.

Indeed, when verifying several “proceedings papers” in our editorial board study, I found that acknowledgements in their articles carried innocent notes such as “A portion of this paper was presented at the annual meeting of the Academy of Management, San Diego, 1998” or “An earlier version of this paper was presented at the Annual Meeting of the Academy of Management, Chicago, 1999” or “This paper builds on and extends remarks and arguments made as part of a 2006 Keynote Address at the Interdisciplinary Perspectives on Accounting Conference held in Cardiff, UK”. Most of these papers were published before 2008. Hence ISI has changed these classifications retroactively.

Simply presenting an early version of your ideas in a 10–15 min (or shorter) slot at a conference or workshop (some of the acknowledgments even referred to small workshops), perhaps attended by fewer than a dozen people, could therefore mean that your academic contribution is “downgraded” by ISI to be a “proceedings paper”. This is the case even when the conference in question does not publish proceedings. Such categorization seems to show a rather limited understanding of the research process in the Social Sciences. Any research paper worth its salt will have been presented in at least one conference or workshop. In fact, it is unlikely that any paper would be accepted for publication in a top journal in the Social Sciences, without ever having been presented publicly to receive feedback.

Does that mean that from 2008 onwards all of the papers published in top journals in the Social Sciences were categorised as conference papers? No, this appears to happen only to those papers whose authors were honest enough to acknowledge that early versions of the paper had been presented at a conference, or to papers whose authors were kind enough to thank participants of a particular workshop for their input. A nice reward for being professional and collegial!
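The effect of such a rule can be illustrated with a small Python sketch. Thomson does not document how the association with a conference is detected, so the keyword pattern below is purely an assumption made for illustration; the function name `flagged_as_proceedings` is likewise hypothetical. The first three test strings are the acknowledgements quoted earlier in this section.

```python
import re

# Assumed keyword list; Thomson's actual detection procedure is not documented.
CONFERENCE_NOTE = re.compile(
    r"\b(conference|annual meeting|workshop|keynote address|symposium)\b",
    re.IGNORECASE,
)


def flagged_as_proceedings(acknowledgement: str) -> bool:
    """Return True if the acknowledgement mentions a conference-type event."""
    return bool(CONFERENCE_NOTE.search(acknowledgement))


notes = [
    "A portion of this paper was presented at the annual meeting of the "
    "Academy of Management, San Diego, 1998",
    "An earlier version of this paper was presented at the Annual Meeting "
    "of the Academy of Management, Chicago, 1999",
    "This paper builds on and extends remarks and arguments made as part of "
    "a 2006 Keynote Address at the Interdisciplinary Perspectives on "
    "Accounting Conference held in Cardiff, UK",
    "",  # an author who presented the paper but did not mention it
]
flags = [flagged_as_proceedings(n) for n in notes]
```

Under this assumed rule the classification depends only on what authors choose to disclose: the paper with an empty acknowledgement stays an "article", whatever its actual history.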

This categorisation process also appears to show a rather limited understanding of the review process in top journals in the Social Sciences. Early versions of a paper might have been presented at conferences. However, the paper that is subsequently submitted to a journal will normally be vastly different from the paper that was earlier presented at a conference. In the Social Sciences, conferences and workshops are often used as a means to test and polish ideas. Even if authors submit fairly polished papers to conferences, these papers will usually need to go through two to four rounds of revisions before they are accepted for the journal.

A longer and more extensive process of revision is likely for the many papers that are not accepted by the first journal approached. As acceptance rates for top journals in the Social Sciences are well below 10 %, the reality is that papers are often submitted to several journals before they get their first revise & resubmit. Maturation of the author(s)’ ideas, reorientation toward different journals, as well as the review process itself mean that virtually every paper published has been substantially revised. Hence, the end-product published by a journal often bears very little resemblance to the paper that was originally presented at a conference, years before publication.

When I revisited ISI’s proceedings paper classification in late 2011, I found that every single proceedings paper published in a journal was now double-badged as an “article” document type. Even though Thomson did not acknowledge the validity of my concerns or change their FAQ, they did change their categorization practices. Whilst this change of policy is of course good news for individual academics and bibliometricians, it raises the question of why Thomson maintains the proceedings paper category for journal articles at all.

Fortunately, Thomson seems to have realized this as well. Table 1 shows the proportion of proceedings papers in 2001–2009 and 2010–2011. As is immediately apparent, even though in the 2001–2009 period all but the Mathematics journals have articles co-badged as proceedings papers, from 2010 onwards most journals no longer have any of their articles classified as proceedings papers. In fact only six of our 27 journals still have articles double-badged as proceedings papers. Two of these six journals (Review of Financial Studies, and Bioconjugate Chemistry) only had one article classified as a proceedings paper and hence these might be incidental mistakes.

Table 1 Proportion of articles classified as proceedings papers in 27 journals in the Web of Science, 2001–2009 and 2010–2011

In the Information & Library Science discipline, the proportion of articles double-badged as proceedings papers was second only to Computer Science & Software Engineering between 2001 and 2009. After 2009, only Scientometrics and Journal of Informetrics had articles double-badged as proceedings papers. All articles concerned were published in a single issue of each journal as a selection of papers presented at two conferences in the field.

The importance of conference papers in the field of Computer Science & Software Engineering is well known. Hence, it is not surprising that between 2001 and 2009, this discipline had the highest proportion of journal articles classified as proceedings papers. After 2009, two of the journals in this category still have a significant proportion of papers classified as proceedings papers. Given that in this discipline it is not unusual for journals to have special issues consisting of selected conference proceedings papers, this seems appropriate. It is therefore rather surprising that the journal that had the largest proportion (73.8 %) of proceedings papers between 2001 and 2009 (ACM Transactions on Graphics), now has no articles at all that are double-badged as proceedings papers. This journal indicates that it has a very close link with the ACM SIGGraph conference and that there are two special issues a year with conference papers. So if there is one journal that should have articles classified as proceedings papers, it is ACM Transactions on Graphics. Thomson might have gone a little too far with their new classification policy.

Review articles

Why does Thomson classify papers that present original research as derivative work that synthesises the work of other academics? Simply because they have more than 100 references! In a discussion of the journal impact factor (see Thomson Reuters 1994) Thomson says:

In the JCR system any article containing more than 100 references is coded as a review. Articles in “review” sections of research or clinical journals are also coded as reviews, as are articles whose titles contain the word “review” or “overview.”

Thomson does not provide a rationale for why papers containing more than 100 references are considered review articles. Although some authors (see e.g. Sigogneau 2000) consider this an “unambiguous” criterion, the applicability of this criterion is discipline-dependent. A “real” review article providing, for instance, a literature review of 30 years of publications in a field is likely to have many references. However, the reverse does not hold true. Especially in the Social Sciences there are many papers with more than 100 references that are not review articles. One cannot presume a direct relationship between the number of references in a paper and its level of originality. Thomson does not provide a rationale for the seemingly arbitrary cut-off point either. Perhaps they simply saw 100 as a nicely convenient round figure?

In our editorial board study in Business & Management, I found that this cut-off was religiously applied. Every article with more than 100 references was classified as a review article. This was true even if the paper had sections titled “Theory and Hypotheses” and “Data and Methods” or the abstract explicitly referred to empirical work. Even when the title clearly referred to empirical work (e.g. by containing the words “an empirical investigation”), papers with more than 100 references would still be classified as reviews.
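The rule quoted above from Thomson Reuters (1994) is simple enough to write down as a short Python function. This is a sketch of the rule as documented, not Thomson's actual implementation; the function name, the parameter names and the example titles are hypothetical, and matching "review" as a whole word in the title is an assumption about how "contain the word" is applied.

```python
import re

TITLE_WORDS = re.compile(r"\b(review|overview)\b", re.IGNORECASE)


def jcr_document_type(title: str, n_refs: int, in_review_section: bool = False) -> str:
    """Apply the documented JCR coding rule to a single journal item."""
    if n_refs > 100:
        return "review"   # the reference-count criterion, content is ignored
    if in_review_section:
        return "review"   # item appears in a journal's "review" section
    if TITLE_WORDS.search(title):
        return "review"   # title contains "review" or "overview"
    return "article"


# Hypothetical titles, for illustration only.
empirical_long = jcr_document_type("Turnover in teams: an empirical investigation", 120)
empirical_short = jcr_document_type("Turnover in teams: an empirical investigation", 100)
peer_review = jcr_document_type("Bias in peer review", 40)
```

Nothing in the rule inspects the content of the paper: an explicitly empirical title with 120 references is coded as a review, the same title with exactly 100 references is coded as an article, and a short paper about peer review is coded as a review on its title alone.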

When I revisited ISI’s review classification for a larger set of journals in late 2011, I found that between 2001 and 2009, journals in three of the four Social Sciences had a substantial proportion of review articles (see Table 2). The only exception was economics, which only had an incidental number of papers classified as reviews. Further investigation showed that a very large proportion of the articles classified as reviews (93–95 %) had more than 100 references. In nine of the twelve journals this was true for either all, or all but one, of the articles classified as reviews. In the remaining three journals, all articles classified as reviews with fewer than 100 references were misclassified: they were in fact book reviews.

Table 2 Proportion of publications classified as reviews in 27 journals in the Web of Science, 2001–2009 and 2010–2011

So what proportion of these reviews in the Social Sciences were in fact correctly classified as reviews, i.e. did not contain original empirical or substantial conceptual research? In Economics, this proportion was relatively high (30 %). However, we should note that only 0.6 % of the articles were classified as review articles and hence this concerns only three articles. As most journal articles in Economics have far fewer than 100 references, having more than 100 references seems to provide some indication that the article might be a review article. However, in the other Social Sciences only 3–5 % of the review articles were correctly classified. The vast majority of articles that were correctly classified as reviews were meta-analyses. Only in Gender & Society were most of the articles correctly classified, but just as in Economics, articles in this journal typically have far fewer than 100 references. Hence only a very small proportion was classified as reviews in the first place.

In the Sciences, the picture is rather different. In both Computer Science/Software Engineering and Mathematics, only a very small proportion (0.1 %) of the publications was classified as a review article. In fact, four of the six journals in Computer Science and Mathematics had no publications classified as review articles at all. The remaining two journals in Computer Science and Mathematics only had one article each classified as reviews, in both cases because they had more than 100 references. Both articles were misclassified.

In Chemistry the proportion of review articles is also fairly low, 0.4 % overall. In Organic Letters only a minuscule proportion of the many papers (0.05 %) were classified as reviews. It is rather unclear why this was the case as all were empirical studies. Given the very small proportion, they might just be random mistakes. In Bioconjugate Chemistry and Biomacromolecules, 1.2–1.6 % of the papers were classified as reviews. In Bioconjugate Chemistry a quarter of the articles classified as reviews did contain original research. They all had between 101 and 110 references. In Biomacromolecules, nearly all of the reviews were correctly classified. Incorrectly classified reviews in Chemistry were all articles that had more than 100 references, although, in contrast to the Social Sciences, in these journals some articles with more than 100 references were correctly classified reviews.

In Neurosciences, the picture is more mixed. All journals have a substantial proportion of review articles. In Molecular Psychiatry and Nature Neuroscience, the vast majority (84–91 %) of the reviews were correctly classified as reviews. Those incorrectly classified in Molecular Psychiatry were virtually all articles that had more than 100 references, although some articles with more than 100 references were correctly classified reviews. Nature Neuroscience had a surprisingly high number of misclassified articles with fewer than 100 references (5 out of the 7 misclassified articles between 2001 and 2009). As all of these were published in the same issue, this might just be a temporary lapse of attention. The third journal, Behavioral and Brain Sciences, shows a very different picture. Although the founding editorial clearly establishes that BBS is not a review journal, over three quarters of its articles are classified as reviews. Although many have review elements, all of the articles, often 30–40 pages long, have significant new intellectual content. All of them have more than 100 references (many have more than 200 or even 300 references) and all of them are misclassified.

Finally, I looked at Information & Library Science journals. All three journals had a fairly low proportion of review articles, with the slightly higher proportion for Journal of Informetrics probably due to the fact that we only had 3 years of observation for this journal. In comparison to most of the Social Sciences, the proportion of reviews with more than 100 references is a little lower in Information & Library Science, although they still make up the majority of the review category. In addition, reviews in these journals are almost equally likely to be misclassified as in the Social Science journals.

So overall, journals in the Social Sciences tend to have far more articles classified as reviews, simply because they are more likely to have articles with more than 100 references. This is hardly surprising as the average number of references in the Social Sciences (except economics) tends to be higher than in the Sciences, not least because—at 20–30 pages—articles are usually longer than in the Sciences. The only journal in the Sciences with a high proportion of review articles is Behavioral and Brain Sciences, a journal that also publishes on Social Science topics and has 30–40 page articles. At 13.6 % Molecular Psychiatry also has a fairly high proportion of review articles, but in this case four fifths of the reviews are correctly classified.

There is one case in which publications with more than 100 references are systematically classified as articles rather than reviews. This is when they have been classified as proceedings papers (and subsequently double-badged as articles). This seems to imply that any paper that is presented at a conference can never be a review and leads to inconsistencies even within the flawed Thomson classification. For instance, of the 33 articles classified as conference proceedings in AMR, more than half had more than 100 references and hence according to the Thomson definition should have been classified as reviews.

As indicated above, the proportion of review articles has also increased over the years, coinciding with the increasing number of references. For nearly all of the Social Sciences journals the year with the largest number of reviews was either 2009 or 2008. After 2009, Thomson appears to apply a new policy as the proportion of review articles has declined dramatically. On average, over the 9 disciplines only 1.8 % of the publications are classified as reviews, a 73 % decline when compared with the 2001–2009 period. Fourteen of our twenty-seven journals no longer have any articles classified as reviews, including journals that between 2001 and 2009 had a very large proportion of review articles, such as the American Journal of Sociology. The majority of articles inappropriately classified as reviews after 2009 were published in the first (January) or second (February) issue of 2010. Hence, it is likely that they were classified before the change of policy was fully effective. The second major reason for misclassification of an article as review after 2009 was the occurrence of the words “peer review” in the title.
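The relative decline reported above also lets a reader back out the approximate 2001–2009 share of reviews, which appears in Table 2 but is not restated in the text. The short Python sketch below does this arithmetic; the variable names are illustrative, and because both inputs are rounded the result is only approximate.

```python
# Figures taken from the text: share of publications classified as reviews
# after 2009, and the relative decline compared with 2001-2009.
share_after_2009 = 1.8    # per cent
relative_decline = 0.73   # i.e. a 73 % decline

# new = old * (1 - decline)  =>  old = new / (1 - decline)
implied_share_2001_2009 = share_after_2009 / (1 - relative_decline)
```

With these inputs the implied 2001–2009 share is roughly 6.7 % of publications, so the drop is about five percentage points.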

Discussion and conclusion

This article reported on a comprehensive, 11-year analysis of document categories for 27 journals in nine Social Science and Science disciplines. It showed that for the Social Sciences both the “proceedings paper” and the “review” document type were nearly always used incorrectly. Proceedings papers were only classified as such because some authors were honest enough to acknowledge that a very early version of their ideas had been presented at a conference. Review articles were nearly invariably classified as such because authors had been very diligent in acknowledging prior research and as a result included more than 100 references in their paper.

Using these classification criteria shows a fundamental misunderstanding of research and publication practices in the Social Sciences. Nearly any article worth its salt in the Social Sciences will have been presented at a conference or workshop somewhere in order to receive early feedback and refine research ideas. As research projects are less easily reproducible, academics in the Social Sciences do not face the same urgency as in the Sciences to “beat” other academics in publishing new research results. It is not at all unusual to see five or even more years pass between data collection and publication. Just because some academics have the decency to thank their colleagues for feedback doesn’t mean that their journal articles should suddenly be “downgraded” to proceedings papers.

Journal articles in the Social Sciences on average also tend to be much longer than journal articles in the Sciences. This is partly due to the emphasis on original theory development—or as Hambrick (2007) would say “theory fetish”—in many journals, and partly due to the lower paradigmatic nature of these disciplines. As a result, journal articles in the Social Sciences often contain a much larger number of references than journal articles in the Sciences; in some journals the average number of references lies above 100. Needless to say, a cut-off score of 100 references doesn’t make much sense in this case. Economics and Information Science & Library Science are an exception. Articles in journals in these disciplines are generally shorter and have far fewer references. This reflects their relatively higher paradigmatic nature and lower emphasis on new theory development. It was therefore not surprising that far fewer articles were classified as reviews in these disciplines.

Turning to the Sciences, the use of the proceedings papers category was virtually non-existent in Chemistry, Mathematics and Neurosciences. Either papers are not presented at conferences in these fields to protect intellectual property or authors are less diligent in thanking their colleagues for feedback. In Computer Science/Software Engineering, the proceedings paper category was used more frequently. This does seem to make sense as conference proceedings are quite relevant in these fields. Apart from the anomaly of Behavioral and Brain Sciences, which saw nearly all of its 30–40 page articles classified as reviews, the review classification was rarer in the Sciences than in the Social Sciences. Moreover, in most cases the review classification was justified. Given that original research articles in the Sciences normally tend to have far fewer than 100 references, the 100+ cut-off is quite successful in identifying review articles.

For both the “proceedings paper” and “review” document types, Thomson appears to have substantially changed its policies since 2010. As a result, the proceedings paper category seems to have virtually died out and the review category has been reduced by 73 %. As many of the remaining review classifications occurred early in 2010, the review category might slowly fade into oblivion as well. This change of policy, however, was implemented without any changes in Thomson’s documentation and hence current policies are at odds with their own documented criteria.

Although there is no way of knowing whether Thomson Reuters’ change of policy was due to my criticism, the timing of it seems to suggest that the two might not be entirely unrelated. So the good news is that Thomson does appear to take criticism to heart. The bad news, however, is that the measures they have taken to address this critique seem to lead to even more inconsistency. Given the problems with the proceedings paper classification, Thomson has made the right decision to no longer use it. However, it would be much cleaner if this classification was removed for all articles published in journals. Now if researchers want to study publication patterns before 2010, they will still exclude a significant number of original research articles if they exclude the proceedings paper category.

Given the problematic nature of the review category in the Social Sciences, it would also be much cleaner if Thomson simply removed the >100 references criterion and retro-actively reclassified as articles all papers that were categorized as reviews based on this criterion. The >100 references criterion could be maintained for the Sciences as it seems to result in relatively few classification errors there. In journals designated as review journals, I would expect Thomson to classify all publications as reviews.

By differentiating criteria between the Sciences and the Social Sciences, Thomson would substantially improve classification accuracy. Although the Science Citation Index predates the Social Science Citation Index, there is no reason to continue to use Science-based criteria to evaluate the Social Sciences.