Introduction

Wikipedia, a blend of the words “wiki”—a technology that allows collaborative modification of a website—and “encyclopedia”, is a free online encyclopedia written collaboratively by the people who use it. At present, Wikipedia is the largest online encyclopedia: in May 2017 the English version alone contained nearly 5.4 million articles.Footnote 1

Altmetrics are non-traditional metrics proposed as an alternative or a complement to traditional citation impact metrics. Altmetrics cover other aspects of the impact of scientific works, such as the number of views, downloads, bookmarks or mentions in social media. A reference to a scientific article in Wikipedia can be seen as a metric that partially captures the impact of the article. Unlike other sources of altmetric data, such as social media, where the ease of the process may result in casual sharing of research results, citations in Wikipedia may indicate stronger engagement of the user with the article. Among its “five pillars”, Wikipedia enforces strict editorial guidelines striving “for verifiable accuracy, citing reliable, authoritative sources” that ensure quality and consistency across encyclopedia entries.Footnote 2 Citations allow Wikipedia editors to make their contributions verifiable by supporting them with trustworthy external sources, and enable readers to locate further information on topics of interest. Thus, citations in Wikipedia can be considered an indication of the transfer of scholarly output to a wider audience.

Nielsen (2007) was one of the first authors to examine citations in Wikipedia to articles in scholarly journals. He observed that Wikipedia citations correlated strongly with the total number of citations to a journal, but more weakly with the journal’s impact factor. Wikipedia contributors also showed a slight tendency to cite articles in high-impact journals such as Nature and Science. A similar trend was described by Stankus and Spiegel (2010), who observed that both titles topped the list of Wikipedia journal sources for entries on the brain and behavioural sciences. However, the picture differs in disciplines with distinct citing behaviours: Luyt and Tan (2010) found that most citations in a set of Wikipedia history entries were to books, with very few citations of academic journal material. Similarly, Halfaker and Taraborelli (2015) analysed the presence of ISBN, PubMed, DOI and arXiv identifiers in Wikipedia and found that most matches were to books and monographs. To sum up, Wikipedia citations of scholarly literature have been used as proxy measurements of the encyclopedia’s reliability, and differences in verifiability across topics have been identified (Mesgari et al. 2015).

Using a different approach, Huvila (2010) conducted a survey on Wikipedia editors’ information behaviour, identifying five groups of contributors who use different information sources. The results indicated a preference among contributors for sources that are available online, although a significant proportion of the original information was based on printed literature, personal expertise and other non-digital sources of information.

Finally, another line of inquiry has explored Wikipedia as an alternative source of evidence about the impact of research. Evans and Krauthammer (2011) searched for PubMed IDs and DOIs in Wikipedia and observed that the cited articles had higher citation counts than an equivalent random subset of articles. The fact that articles were cited in Wikipedia soon after publication suggested that Wikipedia citations might represent a resource for assessing articles’ impact. This opinion was shared by one-third of the bibliometricians who attended the 17th International Conference on Science and Technology Indicators (STI2012), who believed that the number of Wikipedia links or mentions of an article could be of use in author or article evaluation (Haustein et al. 2014). Using a different approach, Tarango et al. (2017) analysed the obsolescence of Wikipedia featured articles in Spanish and observed that more than 90% had last been modified in the two years prior to data collection.

Interest in Wikipedia as a source of altmetric data has grown in recent years. In February 2015, Altmetric.com, a start-up focused on tracking and analysing online activity relating to scholarly literature, announced that any mentions of articles and academic output in Wikipedia would be reflected in a new Wikipedia tab on the Altmetric details page.Footnote 3 In order to capture this information, the academic output that was mentioned had to be referenced with proper Wikipedia citation tags.Footnote 4 However, exploratory research led to doubts about the use of Wikipedia as a source of evidence of the impact of research. Lin and Fenner (2014) found that just 4% of PLOS articles had been cited in Wikipedia. Thelwall (2016) analysed the presence of astronomy and astrophysics research in Wikipedia, and concluded that the use of Wikipedia citations as a proxy for public interest in research articles was limited, due to the intermediate role of Wikipedia contributors: references reflect the interests of a small number of researchers and amateurs who are enthusiastic Wikipedia editors, rather than those of the general public. Subsequently, Kousha and Thelwall (2017) showed that only 5% of the articles indexed by Scopus between 2005 and 2012 had been cited in Wikipedia, although this percentage rose to 8% when reviews were considered. In contrast, 33% of the academic monographs indexed by Scopus had attracted at least one Wikipedia citation. They concluded that Wikipedia citations were not common enough to be used for impact assessment of articles in most fields. More recently, Teplitskiy et al. (2017) analysed whether journals’ impact factor and open access (OA) availability were related to their presence in Wikipedia. They found that a journal’s impact factor predicts its appearance in Wikipedia, and that OA availability also increases the odds of its articles being referenced in Wikipedia, although to a lesser extent.

The aim of the current study was to explore the coverage of Library and Information Science (LIS) literature published between 2001 and 2010 in Wikipedia by 2017. The research paid special attention to the methodological issues involved in the use of Wikipedia citations for research evaluation. Specifically, the study aimed:

  • to identify the methodological limitations of counting Wikipedia citations,

  • to quantify the proportion of LIS literature cited in Wikipedia,

  • to analyse the characteristics of Wikipedia entries that cite LIS literature, and

  • to measure the OA availability of the LIS articles cited.

Methods

In order to conduct the study, we retrieved the 26,542 articles and reviews published between 2001 and 2010 and indexed in the category “Information Science & Library Science” of the Social Sciences Citation Index in the Web of Science.

Afterwards, we searched for each of these articles in Wikipedia, and retrieved all the entries in which they were cited. In order to achieve this, we used the advanced search feature of Google, searching for all the words in the article title as an exact phrase and narrowing the results to those in the domain “wikipedia.org”. In the case of articles with very short titles (three or four words), the name of the first author was added to the query and the results were checked manually. All the searches were conducted between the second half of 2016 and early 2017, to allow for an extended period of at least five years from publication of an article. Citation analysis studies usually employ a shorter citation window (impact factors, for instance, are based on the citations received by articles published in the previous two years). However, since this study focuses on citations outside the academic community, an extended citation window seemed appropriate.
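As an illustration of how these queries can be composed, the following Python sketch builds the search strings described above (this is our own illustration, assuming Google’s “site:” operator as the mechanism behind the domain restriction; the function name and the example article are hypothetical):

    def build_query(title: str, first_author: str) -> str:
        # Exact title phrase, restricted to the wikipedia.org domain.
        query = f'"{title}" site:wikipedia.org'
        # For very short titles (three or four words), append the first
        # author's surname to reduce false matches, as described above.
        if len(title.split()) <= 4:
            query += f' "{first_author}"'
        return query

    # Hypothetical example:
    print(build_query("Information seeking behaviour", "Wilson"))
    # "Information seeking behaviour" site:wikipedia.org "Wilson"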

Any citation to an article in a Wikipedia entry was recorded, whether it appeared in the “references” section, under “additional reading” or embedded in the text (for instance, in a section of a Wikipedia entry entitled “Example studies that have leveraged the IS success model”). If an article was cited in several Wikipedia entries, all the instances were recorded. Similarly, citations of an article in different language versions of a single Wikipedia entry were each recorded. It should be borne in mind that the different language versions of a single Wikipedia entry are not translations, but are edited independently and therefore may cite different sources.

Finally, when the reference included a link to an external source, we visited the website to find out whether the full text of the article was available in OA. Again, in the case of articles cited in several Wikipedia entries, all the references were checked, since they may link to different sources. However, in the case of citations to a single article in different language versions of the same Wikipedia entry, only the reference in the English version (or the first version retrieved if the article was not cited in the English version) was checked.
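To give a sense of the data captured for each citation, the following minimal sketch represents one observation as a Python record (the field names are our own illustration, not the study’s actual coding scheme):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class WikipediaCitation:
        """One citation of a WoS-indexed LIS article in a Wikipedia entry."""
        article_title: str            # title of the cited LIS article
        entry_title: str              # Wikipedia entry containing the citation
        language_version: str         # e.g. "en"; versions are edited independently
        section: str                  # "references", "additional reading" or in-text
        entry_type: str               # e.g. "biography" or other entry type
        has_doi: bool                 # whether the reference includes a DOI
        external_link: Optional[str]  # URL given in the reference, if any
        oa_available: Optional[bool]  # full text freely available at the link?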

In sum, the research process proceeded as follows:

  1. Retrieval of LIS articles and reviews published between 2001 and 2010 indexed in WoS: 26,542 records.

  2. Google advanced domain search of LIS articles cited in Wikipedia: 982 citations of 766 articles. For each citation we measured:

    2.1 Completeness of the reference: author, title, journal title, DOI, etc.

    2.2 Type of Wikipedia entry citing the article.

    2.3 OA availability of the cited article when an external link was provided.

Results

Limitations of counting Wikipedia citations

The retrieval of Wikipedia citations to the academic articles in the sample proved to be a difficult task, due to the lack of standardization of bibliographic references. Table 1 provides examples of the most frequently observed problems. Reference 1 illustrates a reference that includes only the article’s title and a link to the full text stored on the publisher’s website. Reference 2 shows a slightly more complete citation including the journal and year of publication, in addition to the article title and publisher URL. Meanwhile, Reference 3 includes the author’s name, year of publication, title and URL. In this case, the link leads to a post-print copy of the article deposited in an institutional repository.

Table 1 Examples of incomplete references in Wikipedia

The degree of completeness of the references varies from entry to entry, even for a single article. Reference 4 includes only the names of the authors, year of publication and title, whereas Reference 5 provides a much more detailed citation of the same article obtained from a different Wikipedia entry. Abbreviated journal titles can make even relatively complete references difficult to retrieve, as in Reference 6, in which the Journal of the American Medical Informatics Association is abbreviated to JAMIA. Some references contained errors, such as Reference 7, in which the authors’ first names and surnames had been inverted.

The use of the “cite journal” templateFootnote 5 to create citations for scientific papers is inconsistent. It is common to find Wikipedia entries in which “references” have been edited using the recommended template, but citations included in sections such as “further reading”, “select bibliography” or “external links” have not. This is the case, for instance, with References 1 and 2 in Table 1. Even when the citation template is used, the examples in Table 1 show that many parameters may be missing. The inclusion of a DOI in the reference could be used to automatically extract Wikipedia citations to academic articles. However, for articles published in 2010, the latest year considered in our study, just 61 references out of 115 (53%) included a DOI. Again, there are examples of a single article cited with a DOI in one Wikipedia entry and without one in another.
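As an illustration of why automatic, identifier-based extraction is fragile, the following simplified sketch (our own; the regular expressions ignore nested templates and other edge cases, and real wikitext parsing would call for a dedicated parser such as mwparserfromhell) scans wikitext for “cite journal” templates and records whether each carries a doi parameter:

    import re

    # Simplified: matches a {{cite journal ...}} template with no nested
    # templates inside it.
    CITE_JOURNAL = re.compile(r"\{\{\s*cite journal(.*?)\}\}",
                              re.IGNORECASE | re.DOTALL)
    DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*([^|}\s]+)")

    def dois_in_cite_journal(wikitext: str) -> list:
        """Return the DOI of each {{cite journal}} template, '' if absent."""
        results = []
        for match in CITE_JOURNAL.finditer(wikitext):
            doi = DOI_PARAM.search(match.group(1))
            results.append(doi.group(1) if doi else "")
        return results

    sample = ("{{cite journal |last=Doe |title=A study "
              "|journal=JAMIA |doi=10.1000/xyz123 }}")
    print(dois_in_cite_journal(sample))  # ['10.1000/xyz123']

A reference added outside the template, as in References 1 and 2 in Table 1, would simply be invisible to this kind of extraction.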

Proportion of LIS literature cited in Wikipedia

Overall, just 2.9% (766 articles) of the LIS output published between 2001 and 2010 and indexed in the Social Sciences Citation Index had been cited in Wikipedia by the time of data collection. Since some articles had been cited in several Wikipedia entries, the total number of citations retrieved was 982 (Table 2).
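The percentage follows directly from the counts reported in the Methods:

    766 / 26,542 ≈ 0.0289, i.e. 2.9% of articles cited at least once,
    with 982 / 766 ≈ 1.28 citations per cited article on average.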

Table 2 LIS literature cited in Wikipedia by publication year

Citations in Wikipedia biographies

As could be expected, Wikipedia entries citing LIS literature were related to topics in the field. Frequently, these Wikipedia entries were biographical articles about well-known LIS scholars (such as Marcia J. Bates, Hope A. Olson and Tom Wilson, to name but a few) describing their education, work and awards, among other information.

Most of these biographical entries include a list of publications authored by the scholar in question (see, for instance, Reference 5 in Table 1). In fact, a total of 13.5% of the Wikipedia citations retrieved in our study were made in biographical entries about one of the authors of the cited article (Table 2). The proportion of citations in authors’ biographical entries was especially high for articles published in the first five years covered by our study, and decreased for more recent literature.

Open access availability of articles cited in Wikipedia

Scholarly journals often require expensive subscriptions. It is therefore questionable whether Wikipedia contributors have access to these sources or whether they rely on OA sources to edit entries. Our results show that 31.2% of the Wikipedia citations were linked to an OA source, with this percentage increasing for more recent literature (Table 2).

At this point, citations to a single article in several Wikipedia entries were counted separately, since a reference may link to an OA source in one entry but not in another. For instance, Table 3 shows four different linking options for a single article in four Wikipedia entries: Reference 1 does not include a link; Reference 2 links to the publisher’s website, which requires a subscription to access the full text; Reference 3 includes a broken link to the co-author’s personal website; and Reference 4 links to a freely available post-print copy of the article stored in the Internet Archive version of the page linked in Reference 3.

Table 3 Examples of different linking options to a single article

The 306 references that included an OA link pointed in roughly equal measure to three kinds of sources: publishers’ websites (fully OA journals, articles that became OA after an embargo period and OA articles in hybrid journals, among others), 39.2%; repositories (disciplinary or institutional), 30.4%; and websites (personal or departmental pages and social networks, among others), 30.4%.
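Assuming standard rounding, these percentages back-compute to whole-number counts:

    0.392 × 306 ≈ 120 (publishers’ websites)
    0.304 × 306 ≈ 93 (repositories)
    0.304 × 306 ≈ 93 (other websites)
    120 + 93 + 93 = 306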

In the case of citations to articles published in fully OA journals, we would expect to systematically find links to the full text available on the publisher’s website. However, this was not always the case. Table 4 shows two examples of references to articles available in OA journals that were not linked from Wikipedia. The first example corresponds to an article published in College and Research Libraries, a journal that is currently available in OA. The reference provides a (broken) link to the social network Academia.edu. A possible explanation is that the reference, according to the retrieval date, was introduced in December 2010, but College and Research Libraries did not become OA until the following year. The second example shows a reference to an article published in Information Research, a fully OA journal since its creation. However, the reference does not include a link. Clearly, if publishers’ freely available versions are not always linked by Wikipedia contributors, it is highly plausible that copies deposited in repositories or other sources are not linked either, making the results in Table 2 an underestimate of the OA availability of cited sources.

Table 4 Examples of references to OA articles in Wikipedia without links to the full-text

Discussion and conclusions

Among other altmetric indicators, citations in Wikipedia have been proposed as an alternative to traditional impact metrics. Citations of articles in Wikipedia can be seen as a metric that partially captures the societal and educational impact of an article on a wider audience beyond the academic community. However, the results of this study reveal severe limitations in the use of Wikipedia citations for research evaluation purposes.

The lack of standardization of Wikipedia references makes it difficult to measure them with a minimum level of precision. Unlike bibliographies in academic publications, where references are edited to ensure that they are correct, Wikipedia citations are frequently incomplete or even erroneous. Essential fields for the proper identification of articles, such as authors’ names or journal titles, may be missing, making it impossible to retrieve some citations. This, combined with the frequent absence of document identifiers such as DOIs, means that we cannot rely on automatic extraction of citations. If professionally edited citation indexes, such as Scopus and Web of Science, have been criticized for inaccuracies that make it difficult to retrieve some documents and distort bibliometric indicators (Franceschini et al. 2015, 2016), it is hard to contemplate using Wikipedia citations for research evaluation purposes. Bibliometric indicators based on Wikipedia citations are unlikely to meet the requirements of robustness and replicability necessary for use in decision-making processes.

The number of Wikipedia citations is also too small to be used in research evaluation. Less than 3% of LIS articles published between 2001 and 2010 had been cited in Wikipedia by the time of data collection. This figure results from a detailed search of individual articles including manual checks; given the lack of standardization and incompleteness of many references, any automatic attempt to retrieve Wikipedia citations would probably yield a lower figure. Given the scant information provided in some references, it is also possible that we missed some citations. Although our study focuses on a small discipline, LIS, the results are consistent with those obtained by Lin and Fenner (2014), who found that just 4% of PLOS articles had been cited in Wikipedia, and Kousha and Thelwall (2017), who reported that only 5% of the articles indexed by Scopus between 2005 and 2012 had been cited in the encyclopedia. The share of LIS articles cited by year of publication remained fairly stable throughout the decade analysed. This issue requires further study with a larger sample, since Wikipedia citations can be expected to behave differently from those in academic journals or other scholarly outputs. The fact that academic journals are addressed to a scholarly audience while Wikipedia is aimed at the general public may result in a lower level of obsolescence of citations in Wikipedia compared to those in academic articles, which tend to cite cutting-edge research.

In addition to the low percentage of scholarly articles cited in Wikipedia, attention must also be paid to the representativeness of these citations. As stated by Thelwall (2016), the use of Wikipedia citations as a proxy for public interest in research articles is limited, due to the intermediate role of Wikipedia contributors, with references reflecting the interests of a small number of researchers and amateurs who are enthusiastic Wikipedia editors, rather than those of the general public. Although our study does not deal with this issue directly, our results reveal some aspects that should also be considered when Wikipedia citation data are interpreted. One is the relatively large number of Wikipedia citations retrieved from biographies of the cited articles’ authors. Wikipedia biographies of relevant scholars often list their publications, which increases the number of citations received by well-known scholars in the field. This results in a phenomenon of accumulated advantage similar to the Matthew effect. Our results show that this phenomenon is more evident for older literature, which suggests that biographical Wikipedia entries tend to be created for more senior scientists.

The relationship between OA availability and Wikipedia citations is also of interest, since we can intuitively assume that easy accessibility makes articles more likely to be referenced (Teplitskiy et al. 2017). Our results show that 31.2% of the Wikipedia citations of LIS literature linked to an OA source, with this percentage increasing for more recent literature. However, this is probably an underestimate of OA availability due to the incompleteness of Wikipedia citations, and the fact that links to OA sources are frequently missing.