Social media has become commonplace in modern society as a means for individuals to communicate and interact in real-time digital environments. Increasingly, both patients and health care providers are using such platforms to learn and share health care related information.1 More than 50% of patients use social media to access health care information.2 Video sharing platforms, a subset of social media, are becoming increasingly popular sources of information. YouTube (Google LLC, San Bruno, CA, USA) attracts more than two billion active users a month, making it the most popular video sharing platform.3 As a result of this widespread popularity, it ranks second in total global internet traffic.4 Patients are increasingly accessing open access video sharing platforms like YouTube in search of health-related knowledge.1 YouTube’s widespread use and open access nature have resulted in a substantial repository of predominantly non-peer-reviewed health information. Given such widespread use and vast amounts of information, it is critical to appraise the quality of the information shared on the platform.

There has been an exponential increase in the number of studies published in the medical literature using data from social media.5 Yet, no systematic review has investigated the literature assessing perioperative anesthesia information on YouTube, although such evaluations have been conducted in other disciplines.6 Patients are clearly using the internet more and more to access health-related information. Nevertheless, inaccurate information may cause confusion and undue concern for patients,7 prompting the need for high-quality, regulated, and patient-centred information on the internet. In addition, medical trainees and physicians are increasingly using open access platforms like YouTube to gain further knowledge.8

Given the high prevalence of patients using YouTube as a source of health information, it is important to appreciate its uses, limitations, and current uptake among patients who will be receiving care from anesthesiologists. Such work will provide a foundation for future investigations and health policy. Our aim, therefore, was to conduct a systematic review assessing the overall quality of perioperative anesthesia videos on YouTube that have been evaluated in the literature.

Methods

Study methodology and search strategy

This systematic review is reported in adherence to the standards and guidelines established in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement where applicable (Electronic Supplementary Material [ESM] eTables 1 and 2).9 The project was registered on Open Science Framework (https://osf.io/ajse9; first posted 1 May 2023). We conducted comprehensive searches of the literature using the databases Embase, MEDLINE, and Ovid HealthSTAR, from inception until 1 May 2023. The search strategy was developed in consultation with a health sciences librarian and peer reviewed according to the Peer Review of Electronic Search Strategies (PRESS) guidelines.10 We used a combination of keywords, including “YouTube,” “health care,” “information,” and “surgery.” Search strategies can be found in ESM eTable 3. We reviewed the reference lists of included studies for possible additional studies.

Eligibility criteria

Inclusion criteria were original research articles investigating YouTube as a source of patient or trainee information on any topic regarding perioperative anesthesia (including but not limited to general anesthesia, regional blocks, and obstetrical anesthesia).

Exclusion criteria included articles outside of the defined scope, articles not examining YouTube as a source of patient information, reviews (although the reference lists of these papers were screened for additional eligible studies), commentaries, editorials, guidelines, news articles, conference abstracts/proceedings, articles without an associated full text, and articles in a language other than English. We also excluded studies investigating acute and chronic pain.

Study selection

All citations were imported into EndNote® X7 (Thomson Reuters, Toronto, ON, Canada) and underwent deduplication, which was manually confirmed to be accurate. Study screening was conducted in a two-stage process. First, two reviewers (two of A. P. J., M. N., M. V.) independently screened the titles and abstracts of each article against the inclusion criteria. Screening was preceded by pilot rounds of 100 studies each to ensure that interrater reliability (Cohen’s kappa) exceeded 0.6. Two reviewers (two of A. P. J., M. N., M. V.) then independently conducted full-text screening according to the inclusion criteria. During both stages, discrepancies were resolved through joint discussion and consensus between the two reviewers and a senior author (M. S.).

Data extraction and analysis

Data were extracted in duplicate into Microsoft® Excel (Microsoft Corporation, Redmond, WA, USA) by two independent reviewers (two of A. P. J., M. N., M. V.). Discrepancies in data extraction were resolved through joint discussion with a senior author (M. S.). Extracted parameters included study metadata (year of publication, publishing journal), methodological data (study objective, search criteria and methodology, type of analysis conducted), type of anesthesia, quantitative video data (video duration, views), educational video quality, and video source. The target audience was also assessed for each included paper.

For studies that reported only medians with interquartile ranges [IQRs] or ranges, mean and standard deviation (SD) were imputed using the methods proposed by Luo et al.,11 Shi et al.,12 and Wan et al.,13 following normality tests. In cases where data were missing from the published study, we emailed the authors to request access to the raw data.
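
To illustrate the kind of conversion these methods perform, the following is a minimal sketch in Python with a hypothetical example; it uses the simple large-sample approximations from Wan et al.13 rather than the sample-size-adjusted estimators applied in this review:

```python
def estimate_mean_sd_from_median_iqr(q1, median, q3):
    """Approximate mean and SD from a median and interquartile range.

    Large-sample approximations from Wan et al. (2014) for roughly
    normal data: mean ~ (q1 + median + q3) / 3, SD ~ (q3 - q1) / 1.35.
    """
    mean = (q1 + median + q3) / 3.0
    sd = (q3 - q1) / 1.35
    return mean, sd


# Hypothetical example: a study reporting views as median 1,500 (IQR 800-4,000)
mean, sd = estimate_mean_sd_from_median_iqr(800, 1500, 4000)
print(f"imputed mean = {mean:.0f}, SD = {sd:.0f}")  # mean = 2100, SD = 2370
```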

Where a study assessed overall video quality, the overall educational quality of its videos was classified as “poor,” “fair,” or “good,” as indicated by the authors’ global assessment of the videos in the study conclusion or discussion. The same approach was taken where multiple scales were used to evaluate video quality. Where authors did not explicitly comment on the overall educational quality of the videos, the overall assessment was determined according to a validated assessment scale (if one was used). If only an author-generated scale was used, quality was deemed “poor” if the score was 33% or lower, “fair” if between 33% and 66%, and “good” if above 66%.

We appraised the quality of individual studies using the framework published by D’Souza et al.,14 which focuses on five overarching questions to assist authors in appraising studies using data from social media platforms. Each study was assessed according to the framework and a binary score (“Yes/No”) was reported for each category. Overall study quality was graded as “good” if the score was ≥ 7, “fair” if the score was 3–6, and “poor” if the score was ≤ 2.
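
As a concrete restatement of these two grading rules (a minimal sketch; the function names and example values are ours, not drawn from the included studies):

```python
def grade_video_quality(score, max_score):
    """Three-level grade for an author-generated video quality scale."""
    pct = 100.0 * score / max_score
    if pct <= 33.0:
        return "poor"
    if pct <= 66.0:
        return "fair"
    return "good"


def grade_study_quality(framework_score):
    """Three-level grade for the D'Souza-style framework score."""
    if framework_score >= 7:
        return "good"
    if framework_score >= 3:
        return "fair"
    return "poor"


assert grade_video_quality(2, 5) == "fair"  # 40% of the maximum score
assert grade_study_quality(6) == "fair"     # scores of 3-6 grade as fair
```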

The video uploader source was divided into the following categories: academic institution/affiliation; advertisers/commercial/media; health care organization; patient/public; physician; other health care practitioner; and other.

Because of significant heterogeneity and the inability to preclude double counting, we did not conduct a quantitative meta-analysis. We used descriptive statistics to report the results as mean, SD, range, and n/total N (%). We generated heatmaps to visualize the proportion of videos uploaded from the aforementioned sources. We calculated Cohen’s kappa to assess interrater reliability and used Fisher’s exact test to compare the proportion of videos rated as poor among the anesthesia techniques studied. In all statistical tests, we used an alpha value of 0.05 to declare statistical significance.
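
The sketch below illustrates these two tests with hypothetical counts and ratings (not the review’s data); it assumes the common scipy and scikit-learn implementations:

```python
from scipy.stats import fisher_exact
from sklearn.metrics import cohen_kappa_score

# Hypothetical 2 x 2 table: videos rated poor vs. not poor for two of the
# anesthesia techniques (scipy's fisher_exact handles the 2 x 2 case;
# larger tables require, e.g., R's fisher.test).
table = [[10, 4],  # regional anesthesia: poor, not poor
         [5, 2]]   # vascular access:     poor, not poor
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher's exact test: P = {p_value:.2f}")  # significant if P < 0.05

# Interrater reliability between two reviewers' screening decisions.
reviewer_a = ["include", "exclude", "include", "include", "exclude"]
reviewer_b = ["include", "exclude", "exclude", "include", "exclude"]
print(f"Cohen's kappa = {cohen_kappa_score(reviewer_a, reviewer_b):.2f}")
```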

Results

Study selection

The initial literature search across three databases yielded 8,908 articles (Figure). Following the removal of 4,353 duplicates, we screened the titles and abstracts of 4,555 studies. We assessed a total of 20 full-text articles for eligibility, yielding 14 articles in the final analysis. The interrater reliability was 0.94 for title and abstract screening and 0.85 for full-text screening.

Figure

PRISMA flow diagram showing the search, screening, and selection process for the studies that were included in the systematic review

Study characteristics

Articles were published between 2012 and 2023, with a median publication year of 2020. Half of the papers (7/14) were published between 2020 and 2023 alone. Table 1 details individual study characteristics.15,16,17,18,19,20,21,22,23,24,25,26,27,28 Nine of 14 (64%) authors were contacted to access missing data; 3/9 (33%) responded to the correspondence, and 2/3 (67%) of those who responded provided additional data.

Table 1 Key study characteristics

Among the 14 included studies, there were 796 videos comprising 59.7 hr of video content and 47.5 million views. Studies included a mean (SD) of 47 (44) videos (Table 2). Regional anesthesia was studied most frequently (six, 43%), followed by vascular access (three, 21%), intubation (two, 14%), and obstetrical anesthesia (two, 14%), with general anesthesia studied least frequently (one, 7%) (ESM eFig. 1). Across studies, the mean video duration ranged from 2.3 to 11.5 min, and the mean video view count ranged from 1,348 to 486,933.

Table 2 Quantitative video characteristics and quality metrics by type of anesthesia

Overall, 12/14 (86%) of the included studies had multiple raters for video review, and of those, 7/12 (58%) calculated and reported interrater agreement. All studies conducted a content analysis. In terms of study quality, 50% of the studies were graded as good, 50% as fair, and 0% as poor (Table 3).

Table 3 Study quality assessment

In total, 14/14 (100%) studies conducted a quality assessment of videos, with 12/14 (86%) reporting the overall educational quality of videos as poor and 2/14 (14%) as good (Table 1). Among the 14 studies, a total of 17 different tools were used to assess quality, three of which were from the peer-reviewed literature while the remaining 14 were author-generated. The JAMA score was used in three studies (two reported findings), with mean scores ranging from 0.7 to 1.5 out of a possible maximum of four points (Table 2). The Global Quality Score was used in two studies, with mean scores ranging from 1.7 to 3.7 out of a possible maximum of five points. The modified DISCERN was employed in two studies, with mean scores ranging from 1.5 to 3.7 out of a possible maximum of five points.

There was no statistically significant difference in the proportion of videos rated as poor among the anesthesia techniques studied (Fisher’s exact test, P = 1.0) (Table 2).

Overall, 7/14 (50%) studies were graded as good on methodological quality assessment, whereas 7/14 (50%) were graded as fair and none as poor (Table 3). Studies most commonly lost points for having insufficient data (7/14) and for failing to provide future directions (5/14).

Approximately one third (35.4%) of videos were uploaded by sources that would be considered educationally reputable (academic institutions, health care organizations, physicians, or other health care practitioners), whereas 32.4% of videos did not have an upload source reported (ESM eFig. 2).

Discussion

Key findings

This systematic review investigated the educational quality of perioperative anesthesia videos for patients and trainees on YouTube that have been evaluated in the academic literature. The main findings were that 1) the overall educational quality of videos was poor, and the methodological quality of studies was fair; 2) the majority of the literature was published within the last several years and continues to grow rapidly; and 3) there is substantial heterogeneity in the tools used to evaluate quality within the literature. These findings are important because they highlight that YouTube is a social media platform actively used to educate patients and trainees alike on topics encompassing anesthesia; however, the overall quality of such information was found to be poor according to individual studies’ global conclusions.

There appears to be a growing demand for videos discussing anesthesia for both patients and trainees, as evidenced by the 796 videos and 47.5 million views included within this review. There are likely many excellent, high-quality videos on a variety of anesthesia topics posted to YouTube; however, clinicians should remain cautious in recommending YouTube as a sole or primary educational resource for trainees and patients. This concern regarding the quality of health care information on the internet is not new. Keelan et al. investigated the quality of health care information available on YouTube.29 They found that more than half of the immunization videos on YouTube contained information that contradicted the reference standard. These findings are consistent with our observation that the majority of YouTube videos discussing anesthesia topics and evaluated in the literature were of low educational quality. Dissemination of misleading information carries a substantial risk, which may be implicated in negative effects on patient care.7 The lack of peer review prior to uploading videos to YouTube, or any open access platform, is evidently a problem that continues to grow, and it is likely challenging for patients and some trainees to critically evaluate such information on their own. Strategies are needed to identify high-quality educational content on anesthesia for patients and trainees. YouTube has recently made efforts to label videos from accredited health care sources to assist viewers in evaluating the trustworthiness of information.30

The literature appears to be generally united in the observation that anesthesia videos posted to YouTube are of low quality. The platform is designed for entertainment rather than educational purposes, and its proprietary algorithm appears to promote videos with which users engage more. A number of factors, such as search history, geographic location, and age, likely affect the sequence of videos presented on YouTube. High-quality, peer-reviewed information designed for patients is available online in a variety of formats, including webpages and videos. Despite the existence of these resources, patients and trainees continue to access YouTube, likely because of its widespread popularity and ease of use. Moreover, multimedia modalities of education will likely continue to increase and, as some work has observed, can significantly improve the understanding of complex topics.31 This study is therefore important because it provides a comprehensive evaluation of the literature assessing YouTube videos for trainee and patient education regarding anesthesia.

We found that a substantial portion of the literature investigating YouTube as a source of patient and trainee education has been published in the recent past. A total of 17 unique tools were used to assess video quality across the included studies, three of which were from the peer-reviewed literature while the remaining 14 were author-generated. Study authors should consider incorporating commonly employed tools to assess videos in future studies (Table 4). Tools such as the DISCERN,32 the JAMA score,33 and the Global Quality Scale34 are examples that should be incorporated into future projects. Similarly, author-generated tools should be avoided when a validated instrument for the same purpose exists. In this review, we found that many authors independently developed tools to assess videos on YouTube, with some failing to cite peer-reviewed publications regarding best practices.18,22 Without appropriate validation, and with insufficient detail regarding how such tools were developed, the implications of the outcomes may be difficult to appreciate. Moreover, multiple raters should review videos, and the interrater reliability of their evaluations should be quantitatively assessed using statistics such as Cohen’s kappa.35 Many of the included studies increased the risk of bias by not employing multiple raters to assess videos.

Table 4 Recommendations for studies investigating YouTube videos in health care

The framework proposed by D’Souza et al. served as a novel tool to evaluate study methodological quality.14 As shown in Table 3, the vast majority of studies suffered in three key areas of assessment: limitations, inconsistency, and future directions. Many of the studies reviewed employed quality assessment tools that were not validated and/or did not reference the peer-reviewed literature, which is a major limitation. In addition, studies that failed to include multiple raters and/or report an assessment of interrater reliability were deemed inconsistent. Lastly, the vast majority of studies failed to discuss future directions. This is a substantial issue, as the authors engaged with this area of the literature are important stakeholders in advancing our understanding and highlighting key knowledge gaps for future study.

Strengths and limitations

A primary limitation of the current review is the lack of a validated quality assessment tool to assess the rigour of included studies. Although D’Souza et al. have published a framework to appraise studies investigating social media, the tool has not been validated.14 Indeed, no validated instrument has been developed to assess the overall quality of studies investigating social media and medical information. Future work should aim to develop a validated instrument, similar to the GRADE tool,36 to assess study quality. Moreover, given the tremendous heterogeneity of the tools used to evaluate video quality, we were unable to provide a more precise estimate of quality. With the above suggestions, future work may be able to provide increasingly detailed analyses. In addition, given the lack of detail provided in the original studies, we were only able to quantitatively pool composite scores, as opposed to analyzing the underlying components of these scores (e.g., DISCERN). This is further challenged by the lack of data on associated outcomes, such as patient knowledge. Finally, the systematic search was restricted to studies published in English, potentially excluding important papers published in other languages.

There are several important strengths of this review that merit consideration. There was high interrater reliability across data extraction and subsequent analyses. The search strategy was highly sensitive, as highlighted by the number of titles included in the preliminary screening compared with the final sample.

Recommendations and next steps

Ongoing work should focus on incorporating peer-reviewed and/or validated instruments for assessing the educational quality of videos. This will reduce the heterogeneity of assessment outcomes, permitting consistent reporting and facilitating aggregation across the body of literature. Table 4 summarizes recommendations for studies investigating digital health care information, such as perioperative anesthesia videos on YouTube. If authors choose to develop novel assessment tools for a given aspect of anesthesia, this should be done in reference to the peer-reviewed literature and the development process should be properly described. Strategies for improving the educational quality of available videos on YouTube, and the internet more broadly, are needed, along with future investigation of the effect that such information has on patient knowledge, decision-making, and potentially even clinical outcomes. How video quality affects measurable outcomes for both patients and trainees should also be examined in future studies.

Trainee education is always evolving, and new modalities of education are frequently incorporated in an effort to improve educational outcomes. The learning outcomes associated with the use of freely available online videos in trainee education merit attention. Future studies should identify the characteristics that make videos high-quality sources of educational information and examine how trainees develop as a result of engaging with such information. Previous literature has shown improved learning outcomes when videos were included as part of medical trainee educational material.37

Recently, YouTube launched a pilot program with the Council of Medical Specialty Societies and the National Academy of Medicine to identify credible sources of health-related information on the platform.30 This is one example of efforts that can be made to improve the ease with which individuals can access credible and reliable information online.

Conclusion

In conclusion, this systematic review found that the overall educational quality of patient- and trainee-targeted YouTube videos on perioperative anesthesia has been assessed in the literature as poor. There is certainly demand for such videos; however, the impact of inaccurate information on patients and trainees is not fully understood. A standardized methodology for evaluating online videos is merited to improve future reporting. More importantly, a peer-reviewed approach to online open access videos is needed to support online patient and trainee education in anesthesia.