1 Introduction

Since its creation, Wikipedia has evolved into an important resource for patients, the general public, students and health professionals. Wikipedia is the largest online encyclopaedia with 4,541,520 articles in English (English Wikipedia, 2014). Important features of Wikipedia include: (1) it is available free of charge to everyone without prior registration or membership; (2) it covers different aspects of knowledge and entertainment including arts, biography, geography, history, mathematics, science, society, technology, business and health; (3) it enables users to add their own contributions, so its content is enriched as more people use it; and (4) it is seen by the general public, students, and health professionals as an important source of information.

With the changes introduced to medical curricula such as the introduction of problem-based learning (PBL) and the accommodation of self-regulated learning as part of the curriculum design, it has been noted that most medical students tend to search easily accessible online resources such as Google and Wikipedia for their ‘learning issues’ (Alegría, Boscardin, Poncelet, Mayfield, & Wamsley, 2014; Patil et al., 2014; Petty, 2013). While the new changes in the curriculum aimed at enhancing students’ skills to critically search for knowledge from several resources rather than study the content of a particular textbook, the development of technology and its availability have shifted students’ search for knowledge from paper-based resources to online resources. Although several online resources have been created for medical and health professionals such as Medscape (eMedicine), UpToDate Inc., the Merck Manual Medical Library and PubMed (Azer, 2014; Tyson, 2000), there is evidence that medical students usually prefer to start by searching general online resources such as Google and Wikipedia to find answers to their queries (Kingsley et al., 2011). There is also evidence that physicians use the Internet far more than the general public (Masters, 2008). The work of Hughes, Joshi, Lemonde, and Wareham (2009) showed that 53 % of Internet visits were made by junior physicians and were directed mainly to Google and Wikipedia. Although the participants were aware of the credibility risks of these online resources and of the risk of obtaining poor quality information, the junior physicians in the study preferred easily accessed resources such as those provided by Wikipedia. They viewed Wikipedia and Google as important sources of medical information. These views have also been found to apply to medical and allied health students (Guarino et al., 2014; Kolski, Arlt, Birk, & Heuwieser, 2013; Prasannan, Gabbur, & Haughton, 2014).

However, these online resources may not be created by academics or qualified experts. Furthermore, neither Wikipedia nor YouTube appoints expert editors to assess the work submitted, and materials are published online without prior peer-review or expert evaluation. The absence of prior review in assessing the quality and the scientific content of Wikipedia articles raises several questions regarding the adequacy and scientific accuracy of these resources. Two studies showed that physicians using online resources may become less vigilant towards potential errors, even when the information contradicts their existing knowledge (Lau & Coiera, 2008; Westbrook, Gosling, & Coiera, 2005). Recently Schmidt et al. (2014) demonstrated that media information about a disease gained from a source such as Wikipedia can cause internal medicine residents to misdiagnose similar-looking clinical cases. The problem of availability bias may arise from exposure to media-provided information about a disease, causing diagnostic errors (Schmidt et al., 2014). The effect of this bias is apparently associated with nonanalytic reasoning and can be counteracted by reflection. By this, we mean the tendency of residents to take shortcuts and jump to conclusions without careful analysis, possibly due to the effect of fast knowledge obtained from media information.

Despite the wide use of Wikipedia articles by medical students and junior doctors, there are limited studies exploring the accuracy of these learning resources and whether they have attained the standards required of scholarly/academic resources. To determine the suitability of Wikipedia articles as a source of information for medical students, it is important to determine whether the articles are scientifically accurate, up-to-date, free from errors, and without gaps or deficiencies in the information provided. It is also important to determine whether they are written at a reading level appropriate for college students rather than for lay persons in the general public. Therefore, the purpose of this study was to evaluate the suitability of the nervous system articles available on the English-language Wikipedia that might be used by medical students as part of their learning resources. To answer this question, Wikipedia articles were assessed with respect to: (1) scientific accuracy and comprehension; (2) frequency of updating and quality of references; (3) reliability; and (4) readability.

2 Methods

This study analysed the nervous system articles on the English Wikipedia database (http://en.wikipedia.org).

2.1 Wikipedia Articles

2.1.1 Study Design

To identify the topics on the nervous system and its disorders, five medical textbooks (Table 7.1) and the eMedicine (Medscape) website (www.emedicine.com) were searched. These books were used because they are recommended by most medical schools, have been reviewed in peer-reviewed journals such as the British Medical Journal and the New England Journal of Medicine, have been written and edited by medical experts, are regularly updated every 3–4 years, and have appeared in several editions over the past 30–60 years. eMedicine (Medscape) is a professional educational website written and regularly updated by medical consultants and eminent clinicians.

Table 7.1 Medical textbooks used as a standardized reference in evaluating Wikipedia articles

The aims of searching these resources were to: (1) ensure that topics needed by medical students in their undergraduate course in relation to the nervous system were identified and included in the search; and (2) use these resources as a standardized reference in the assessment of the accuracy and quality of information provided in Wikipedia articles. The chapters on the nervous system were reviewed and key topics were identified by three evaluators (the author, who is a medical consultant and professor of medical education, plus two medical graduates). The lists of identified key topics were discussed in a meeting. The three evaluators agreed upon a final list covering 42 topics.

2.1.2 Searching Wikipedia

The Wikipedia website (http://en.wikipedia.org/wiki/Wikipedia) was searched on 20 May 2014 for the 42 topics. The articles were printed out and a photocopy was given to each evaluator. The aim of using a photocopied version of the articles rather than the electronic version was to ensure that all evaluators were using the same version. This is particularly important as Wikipedia articles undergo continuous changes.

2.1.3 Instrument Used in Assessing Accuracy

The rating instrument used in this study was a modified version of the DISCERN instrument (Azer, 2014, see Appendix C for the modified version used). The DISCERN project was funded by the British Library and the NHS Executive Research and Development Program from 1996 to 1997 (DISCERN Project, 1999). The instrument consists of 15 questions, plus a question about the overall evaluation of the document, and was designed for the evaluation of the different aspects of healthcare related websites and information about treatment options. For example, the DISCERN instrument was used in assessing online resources on epidural anesthesia (Jaffe, Tonick, & Angell, 2014), mental health (Grohol, Slimowicz, & Granda, 2014), colorectal cancer (Grewal & Alagaratnam, 2013), and inflammatory bowel disease (Van der Marel et al., 2009).

However, the original DISCERN instrument is not suitable for evaluating Wikipedia articles as it was not designed to assess the scientific accuracy of the information given, the inclusion of illustrations, figures, tables or multimedia to support the topics, or whether there are gaps or deficiencies in the information given. These deficiencies drove the need to modify the DISCERN instrument as discussed in an earlier publication (Azer, 2014). The modified DISCERN instrument comprises ten questions and aims at providing a comprehensive assessment of Wikipedia articles in regard to: (1) aims of the article and the adequacy of subtitles used; (2) scientific accuracy of information provided and whether there were any personal views; (3) degree of balance between different parts, and whether sources of information were provided; (4) regular updating of the article and whether there were gaps or deficiencies in the article that need to be completed; (5) images, figures and tables provided; and (6) the overall rating of the article.

The original DISCERN scoring system has been used. Each question is rated on a 5-point scale, where 1 corresponds to ‘no’, 3 corresponds to ‘partially yes’, and 5 corresponds to ‘yes’. For the last question, 1 corresponds to ‘serious or extensive shortcomings’, 3 corresponds to ‘potentially important but not serious shortcomings’, and 5 corresponds to ‘minimal shortcomings’.
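The scoring arithmetic can be illustrated with a minimal Python sketch. It assumes that an article's score is the sum of the ten question ratings from one evaluator (consistent with the stated maximum of 50) and that the reported mean ± SD is taken across the three evaluators' totals using the sample standard deviation. The function names and the ratings below are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

NUM_QUESTIONS = 10            # the modified DISCERN instrument has ten questions
MIN_RATING, MAX_RATING = 1, 5  # each question is rated on a 5-point scale


def total_score(ratings):
    """Sum one evaluator's ratings for one article (possible range 10-50)."""
    if len(ratings) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} ratings, got {len(ratings)}")
    if any(not MIN_RATING <= r <= MAX_RATING for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings)


def summarize_article(ratings_by_evaluator):
    """Return (mean, sample SD) of the evaluators' total scores for one article."""
    totals = [total_score(r) for r in ratings_by_evaluator]
    return mean(totals), stdev(totals)


# Hypothetical ratings from three evaluators for a very weak article.
example = [
    [1] * 10,
    [1] * 10,
    [1] * 9 + [2],
]
m, sd = summarize_article(example)
print(f"{m:.2f} ± {sd:.2f}")  # prints "10.33 ± 0.58"
```

Whether the study used the sample or the population standard deviation is not stated, so the choice of `stdev` here is an assumption.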

2.1.4 Piloting the Study

Prior to applying the instrument to the Wikipedia articles, the use of the modified DISCERN instrument was piloted with the aim to: (1) orient the evaluators to the different items of the instrument and the scoring system; (2) ensure that evaluators were able to use the instrument; and (3) enhance the evaluators’ skills in applying the instrument through feedback on their assessment. For piloting purposes, ten Wikipedia articles other than those included in the study were selected and each evaluator was asked to evaluate them independently using the instrument. Articles that were scored differently were discussed and a resolution was reached.

Ten additional Wikipedia articles were selected and were again evaluated by the three evaluators using the modified DISCERN instrument as described earlier. The agreement between the evaluators was in the range of 75–85 %, which was considered satisfactory.

2.1.5 Assessing Wikipedia Articles on the Nervous System

The 42 articles identified were evaluated independently by the three evaluators using the modified DISCERN instrument. Interrater agreement between the evaluators for each item in the modified DISCERN instrument was calculated using Cohen’s kappa score (Kharbanda et al., 2012; Tsivgoulis et al., 2013).
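As an illustration of the statistic named above, the following Python sketch computes Cohen’s kappa for one pair of raters on one item. Cohen’s kappa is defined for two raters, so how the three evaluators were combined (for example, by averaging pairwise values) is an assumption and is not described in this chapter. The function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from each rater's
    marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1, 3 or 5) given by two evaluators to eight articles
# on a single question of the instrument.
evaluator_1 = [1, 3, 5, 5, 3, 1, 5, 3]
evaluator_2 = [1, 3, 5, 3, 3, 1, 5, 5]
print(round(cohens_kappa(evaluator_1, evaluator_2), 3))  # prints 0.619
```

In this example the raters agree on six of eight articles (0.75), but chance agreement is 0.34, so kappa is noticeably lower than the raw percentage agreement.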

2.1.6 Assessing References

The list of references at the end of each article was evaluated by the three evaluators independently. The aims of evaluating the references were to assess if the authors used appropriate resources to construct each article and what type of references they used. Therefore, the following points were considered in the evaluation: total number of references; number of peer-reviewed journals; number of educational guidelines and proceedings from professional societies, textbooks, professional and general websites; and others (such as news and media). Articles written for academic purposes need to rely on up-to-date, peer-reviewed references such as scientific/medical articles, educational guidelines and proceedings produced by professional societies rather than cite general references such as general websites, non-peer-reviewed articles, magazine articles, and news.

2.1.7 Frequency of Wikipedia Article Updates

Resources written for academic purposes are expected to be regularly reviewed and updated. Such reviews usually aim at enhancing the quality of content, adding up-to-date information and recent developments as well as related references. The frequency of updating articles was assessed through the ‘view history’ button next to ‘search’ at the top right part of each article. Information collected included: (1) date created; (2) total number of revisions; (3) total number of authors; (4) average time between edits; (5) average edits per month; (6) revisions in the last 12 months; and (7) average edits per user.
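The seven measures listed above can be derived from an article's revision list. The Python sketch below shows one way to do so, assuming each revision is available as a (timestamp, editor) pair. The chapter does not state how the history page derives each measure, so the definitions used here (for example, an average month of 30.44 days and a 365-day window for "the last 12 months") are assumptions, and the function name and data are hypothetical.

```python
from datetime import datetime, timedelta

DAYS_PER_MONTH = 30.44  # assumption: average length of a calendar month


def revision_stats(revisions, as_of):
    """Summarize a revision history.

    revisions: list of (timestamp, editor) tuples in any order.
    as_of: the date on which the history was inspected.
    """
    if len(revisions) < 2:
        raise ValueError("need at least two revisions")
    revisions = sorted(revisions)
    times = [t for t, _ in revisions]
    editors = {e for _, e in revisions}
    created = times[0]
    if as_of <= created:
        raise ValueError("as_of must be later than the creation date")
    n = len(times)
    span_days = (times[-1] - created).total_seconds() / 86400
    age_months = (as_of - created).total_seconds() / 86400 / DAYS_PER_MONTH
    cutoff = as_of - timedelta(days=365)
    return {
        "date_created": created,
        "total_revisions": n,
        "total_authors": len(editors),
        "avg_days_between_edits": span_days / (n - 1),
        "avg_edits_per_month": n / age_months,
        "revisions_last_12_months": sum(t >= cutoff for t in times),
        "avg_edits_per_user": n / len(editors),
    }


# Hypothetical history: four revisions by three editors.
history = [
    (datetime(2014, 1, 1), "editor_a"),
    (datetime(2014, 1, 11), "editor_b"),
    (datetime(2014, 1, 21), "editor_a"),
    (datetime(2014, 1, 31), "editor_c"),
]
stats = revision_stats(history, as_of=datetime(2014, 5, 20))
print(stats["avg_days_between_edits"])  # prints 10.0
```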

2.1.8 Assessing Readability

The aim of assessing the readability was to evaluate whether the reading level was appropriate for medical students. Two methods were used in assessing readability: the Flesch-Kincaid grade level and the Coleman-Liau index (Vargas, Chuang, Ganor, & Lee, 2014). The readability score is an indicator of the number of years of education that a person needs to be able to understand the text on the first reading. For example, a score of ten indicates that a tenth grade student can easily understand the topic. An online calculator, provided by Readability Formulas (http://www.readabilityformulas.com/free-readability-formula-tests.php), was used. Based on the instructions given, random samples of 150–160 words were copied from the beginning, middle and end of each article and placed into the space provided by the programme. Headings, external links, images and citation numbers were omitted from the text prior to conducting the calculation. The reading scores for each part were recorded and the mean and standard deviation were calculated for each article.
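For readers who wish to see how the two indices are derived, the Python sketch below applies the published formulas: Flesch-Kincaid grade level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, and Coleman-Liau index = 0.0588 × L − 0.296 × S − 15.8, where L is the number of letters and S the number of sentences per 100 words. The study used the online calculator, not this code. The tokenisation and the vowel-group syllable counter below are crude assumptions, so results may differ from those of the calculator.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text):
    """Return (sentences, words, letters, syllables) for a text sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text sample contains no words")
    letters = sum(ch.isalpha() for w in words for ch in w)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables


def flesch_kincaid_grade(text):
    sentences, words, _, syllables = text_counts(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(text):
    sentences, words, letters, _ = text_counts(text)
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical sample text (not taken from any article in the study).
sample = "The brainstem connects the cerebrum with the spinal cord. It regulates breathing."
print(flesch_kincaid_grade(sample), coleman_liau_index(sample))
```

Because the Coleman-Liau index relies on letter counts rather than syllables, it is less sensitive to the syllable heuristic than the Flesch-Kincaid grade level.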

2.2 Statistical Analysis

The mean, standard deviation, minimum and maximum were calculated. To assess the degree to which the evaluators agreed in their assessment, Cohen’s kappa interrater reliability was calculated. Correlations between the DISCERN scores and the number of updates and the number of peer-reviewed references were also calculated. The aim was to assess whether the number of updates, the number of authors/reviewers and the number of references were related to article quality.
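The correlation analysis can be sketched in Python as follows. The chapter does not name the correlation method; this sketch assumes the Pearson correlation coefficient r and takes R² as its square. The function names and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Proportion of variance shared by x and y under a linear relationship."""
    return pearson_r(x, y) ** 2


# Hypothetical data: quality scores and revision counts for five articles.
scores = [2, 1, 4, 3, 5]
revisions = [1, 2, 3, 4, 5]
print(round(r_squared(revisions, scores), 2))  # prints 0.64
```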

3 Results

3.1 Depth and Accuracy of Articles

Table 7.2 summarizes the number of pages, the scores calculated using the modified DISCERN instrument and the number of images, illustrations, tables and media/audios of each Wikipedia article. The number of pages ranged from one page for the article on smell to 36 pages for the article on stroke indicating that topics varied in regard to details given and depth of discussion. Also, this may be because some articles were incomplete, had gaps or deficiencies in their content and needed further work. Considering the number of pages of articles and the frequency of updating/reviewing since first created, there is evidence that some topics were of less interest to Wikipedians and were less frequently reviewed or improved compared to other articles.

Table 7.2 Accuracy score, number of images, illustrations, tables, readability scores and number of references of each Wikipedia nervous system article included in the study

The minimum DISCERN score was 10.33 ± 0.57 (mean ± SD) for the article titled ‘smell’ and the maximum score was 38.00 ± 1.00 for the article titled ‘multiple sclerosis’. The mean score for the 42 articles was 25.88 ± 5.97. In summary, 10 articles scored 30 or higher, 26 articles scored 20–29, and 6 articles scored 10–19 (out of a maximum possible score of 50). Among the top-scoring articles, the article on ‘multiple sclerosis’ covered 28 pages and had 13 images, illustrations and photos, 114 references, 2 external links and one further reading. The article on ‘stroke’ scored 36.00 ± 1.00, covered 36 pages, and had 12 images, illustrations, photos and related media, 158 references, 1 external link and 2 further readings.

On the other hand, among the articles with the lowest scores, the article on ‘smell’, scoring 10.33 ± 0.57, was one page only and had no images, illustrations, tables, photos or multimedia. The article also had no references or external links. The article on ‘encephalopathy’ is another example: it scored 15.00 ± 1.00 and had no images, illustrations, photos or tables, only three references, one external link and one further reading. It was not possible to measure the readability of articles comprising one page only.

Although the articles followed the Wikipedia template for medical/health related articles, some articles were incomplete and most articles were deficient in the areas of: (1) disease pathogenesis; (2) clinical picture; and (3) management of nervous system diseases. Agreement between evaluators was calculated using Cohen’s kappa; the kappa values (mean ± SD) ranged from 0.65 ± 0.10 to 0.79 ± 0.12.

3.2 Article References

Table 7.2 summarizes the total number of references, external links and further readings for each Wikipedia topic. The total number of references for the 42 articles was 1517 and the number varied from 0 to 158 references; 36.12 ± 38.82 (mean ± SD). There was weak correlation between the DISCERN scores and the number of total references. This suggests that the absolute number of references was not a good measure for assessing the quality of an article. This is particularly important as not all references were peer-reviewed articles and educational guidelines produced by professional bodies were lacking in the list of references in most articles.

Common problems found in citations and the list of references can be summarized as follows: (1) citation of wrong references, and failing to cite the appropriate references; (2) incomplete references (for example, missing journal or book title, missing year, volume or page numbers); (3) inconsistencies in the way the references are written; (4) failure to include guidelines of professional societies/associations; and (5) several statements in articles are missing appropriate references as in-text citations.

3.3 Frequency of Revisions

Table 7.3 summarizes key information about the articles and their history in regard to date created, number of revisions, number of authors, average time between edits, edits per month and edits in the last 12 months. The article histories show that the date of creation varied. For example, while the earliest article, ‘stroke’, was created on 16 April 2001, the most recent article, ‘brainstem glioma’, was created on 25 January 2008.

Table 7.3 Wikipedia articles on the nervous system: article history, date created, latest edit, number of revisions, number of authors, and frequency of revisions

While there was moderate correlation between the DISCERN score and the total number of revisions (R² = 0.38) and the total number of authors (R² = 0.42), there was weak correlation between the DISCERN score and the average edits per month (R² = 0.10).

3.4 Article Readability

To calculate readability, two methods were used. Table 7.2 shows the readability scores (mean ± SD) for each article as calculated by the Flesch-Kincaid grade level and the Coleman-Liau index. Readability ranged from 10.96 ± 1.30 to 17.90 ± 1.65 using the first method and from 10.00 ± 1.00 to 17.33 ± 2.30 using the second method. The article on ‘smell’ did not have enough text to calculate readability. A good correlation was found between the scores calculated by the two methods (R² = 0.650). The mean score for all articles was 14.21 ± 2.91 using the first method and 13.02 ± 2.74 using the second method. These readability scores indicate that the Wikipedia articles were geared to college level.

4 Discussion

The aim of this study was to evaluate the quality and accuracy of content of Wikipedia articles as learning resources commonly used by medical students. To evaluate these resources, we specifically evaluated the accuracy, clarity, quality of information and readability of Wikipedia articles on the nervous system. A total of 42 Wikipedia articles covering the nervous system diseases were evaluated using the modified DISCERN instrument (Azer, 2014).

Although Wikipedia articles followed the template created by Wikipedia for medical/health articles, most articles were deficient in addressing disease pathogenesis, clinical picture, and management of nervous system diseases. The accuracy of articles as measured by the modified DISCERN instrument had a mean score of 25.88 ± 5.97; only ten articles scored 30 or higher and over 50 % of articles scored 20–29 out of 50. Although images, illustrations, photos, and multimedia were incorporated in some articles to enhance their educational value, the quality of these images/illustrations was not at the standard expected of educational resources, and the images used were not labelled to explain radiological, microbiological and pathological changes.

As indicated by the Wikipedia administrators, several articles were incomplete. These deficiencies may be summarized as follows: (1) articles in their early stages, for example, the articles on ‘peripheral neuropathy’, ‘brainstem’, ‘autonomic nervous system’, ‘Broca’s area’, ‘encephalopathy’, ‘motor neuron’, ‘lower motor neuron lesion’, ‘smell’, and ‘spinal cord’; (2) articles showing deficiencies in some content or needing tables, images, illustrations or media to make the message meaningful and enhance their educational value, for example, the articles on ‘brainstem glioma’, ‘dizziness’, ‘encephalopathy’, ‘myopathy’, ‘sensory neuron’, ‘smell’, and ‘upper motor neuron lesion’; and (3) articles requiring the addition of proper citations for some statements, for example, the articles on ‘sensory neuron’ and ‘encephalitis’. Although the total number of references for the 42 articles was 1517, some articles had no references and a number of problems were identified in the lists of references and the quality of the references cited. Interestingly, none of the 1517 references was a Wikipedia citation. Recently, Bould et al. (2014) identified 1433 full text articles from 1008 journals indexed in Medline, PubMed or Embase that together contained 2049 Wikipedia citations. They also found that most of these citations occurred after December 2010. The Wikipedia citations were not limited to journals with a low or no impact factor, but appeared in many journals with a high impact factor. The authors warned journal editors and peer-reviewers to use caution when publishing articles that cite Wikipedia. The readability of Wikipedia articles was geared to college level, indicating that the articles were not written for the public and that the language used was suitable for medical students.

Recent research shows that Wikipedia has continuously worked to improve the quality of its medical/health content (Chiang et al., 2012; Rasberry, 2014). However, few articles meet the quality standards that medical schools would expect before recommending such resources to medical students. Similar findings were reached when researchers evaluated Wikipedia ‘gastroenterology’ and ‘hepatology’ articles (Azer, 2014). A few researchers reported that Wikipedia articles are useful resources for patients with hand illness (Burgos, Bot, & Ring, 2012), and a reliable source for nephrology patients although written at a college reading level (Thomas, Eng, de Wolff, & Grover, 2013). It was also reported that Wikipedia was a prominent source of online health information compared to other online health information providers (Laurent & Vickers, 2009). Others reported that the quality of osteosarcoma related information in the English version of Wikipedia was inferior to the patient information provided by the US National Cancer Institute (Leithner et al., 2010).

The methods used in evaluating Wikipedia articles in this study aimed at providing a critique of the accuracy, clarity, quality, and adequacy of the content of nervous system articles. Three evaluators conducted the assessment of the Wikipedia articles and the methods were used in earlier publications (e.g., Azer, 2014). The agreement among evaluators, expressed as Cohen’s kappa (mean ± SD), ranged from 0.65 ± 0.10 to 0.79 ± 0.12.

Wikipedia articles need peer-review by experts and professionals. Harnad (1999) described peer-review as a quality control and certification process to ensure accuracy and validity of material produced in an academic environment. The results from this study show that most articles were updated regularly: the mean ± SD number of revisions of the 42 articles was 1298 ± 1418.00 and the average time between edits varied from 0.8 to 70.5 days; 10.15 ± 13.94 (mean ± SD). However, anonymous users of Wikipedia made approximately 30 % of edits. Generally, it is difficult to know the actual experience, level of education and skills of the authors of Wikipedia articles.

Several suggestions have been made to improve the quality of editing of Wikipedia articles by doctors (Kint & Hart, 2012), experts and professionals who have specialized in the designated topic. Reavley et al. (2012) suggested that professional associations could create task forces for reviewing Wikipedia and even place an approval statement on acceptable articles. Wicks and Bell (2012) suggested that professional societies could nominate or suggest peer-reviewers that can take such responsibilities.

Recently, ‘WikiProject Medicine’ has been introduced where people interested in medical and health content on Wikipedia can discuss, collaborate or debate issues. Additionally, Wikipedia articles have also been categorised in regard to their status by administrators. For example, awarding of a golden star means ‘Featured Article’, or awarding of ‘A’ means Approved A-Class article etc. Details about Wikipedia categorisation are given on the following link (http://en.wikipedia.org/wiki/Category:FA-Class_medicine_articles). The aim of such categorisation is to help readers and editors/authors understand the relative status and possible veracity of the article.

The study reported in this chapter has several limitations; it evaluated only 42 Wikipedia articles and was limited to English-language topics on the nervous system. Therefore, these results cannot be generalized to other medical or healthcare disciplines. More work is needed in the future to evaluate Wikipedia articles on a wider range of medical and surgical diseases.

However, despite these limitations, this study raises important issues in the area of medical education and medical informatics particularly for problem-based learning programmes where self-directed learning is an important domain in the curriculum design (Artino, Cleary, Dong, Hemmer, & Durning, 2014). Expected directions in research in this area may include:

  1. Expanding the evaluation of Wikipedia articles to other medical and surgical topics so that a conclusive evaluation of Wikipedia articles could be made.

  2. Further assessment of the data provided by Wikipedia in regard to updating and revision of its articles in order to assess the quality of such revisions and understand why, despite recording higher numbers of revisions, articles were not at the standards required for an educational resource.

  3. Assessing the impact of engaging medical students in reviewing Wikipedia articles and critically assessing them on their learning and understanding of topics evaluated and studied.

5 Conclusion

This is ongoing research; the findings from this study suggest that there were deficiencies and scientific errors in most Wikipedia articles evaluated. Considering the tendency of medical students to depend on Wikipedia in their learning, it may be necessary to educate students in critically engaging with online information by, for example, using guidelines such as the criteria used in this study in evaluating online resources. Given the expectation of medical teachers that students should take responsibility for their self-regulated learning, Wikipedia articles could be a resource for critical evaluation and content improvement. These recommendations are necessary, as is training offered by medical schools to their students on how to select learning resources.