Introduction

Fictional medical programs have long been a television staple, and since the 1994 premieres of ER and Chicago Hope the number of such programs has grown exponentially. These two programs ushered in a new era of medical programming in which the focus was on the lives of the doctors and nurses (Turow 2010). Furthermore, these programs, ER in particular, incorporated techniques such as using medical jargon and hiring physicians to serve on the writing staff to make the show as accurate as possible without sacrificing the story (Baer 1996).

At its peak in 1998, ER attracted over 47 million weekly viewers, and its success has been followed by other highly popular programs such as House M.D., Grey’s Anatomy, Scrubs, and Private Practice (Carter 2009). Although some of these programs are no longer on the air, their accessibility on services such as Netflix, which has over 62 million subscribers, helps them to continue reaching a vast audience (Maglio 2015). In the fall of 2015, the premiere episode of the new medical drama Code Black rated first in overall viewers at the time of airing, demonstrating that this type of programming is likely to remain popular for the foreseeable future (Dixon 2015).

These television programs are particularly popular among health professional students (Weaver et al. 2014), and the content of these programs may influence student knowledge, perceptions, and behavioral expectations (Baer 1996). Anecdotal reports in the literature regarding both recreational exposure and exposure as part of formal health professional education curricula highlight both positive and negative potential influences. For example, medical students have reported television medical dramas to be an important influence in learning how to position the airway for endotracheal intubation, but this may be problematic given that one study assessing intubation practices on ER found that airway positioning was always suboptimal (Brindley and Needham 2009). On the other hand, Dahms et al. (2014) reported that remembering an episode of House M.D. used for teaching medical students ultimately helped them diagnose a patient with cobalt poisoning in an actual medical setting. These television programs may also influence health professional students’ career choices, as suggested by a 25% increase in applications to emergency medicine residency programs after the premiere of ER (O’Connor 1998).

Furthermore, although many of these television programs originated in the United States (U.S.), they are available in other countries on television, DVD, and online streaming services. Thus, examining their influence on health professional students and their use in medical education is valuable to researchers and educators across the globe. It is important, however, to recognize that cultural differences, as well as variations in the structure of health professional education, may affect the impact of these television programs on health professional students across cultures. For example, the time spent in residency programs varies from country to country.

Considering the continuing availability of medical television programming and its potential for influence, it would be valuable to systematically assess the literature for research exploring the effect of fictional medical television programming on health professional students’ knowledge, perceptions, and/or behavior. Therefore, we conducted a systematic review in order to synthesize existing research, make recommendations for future study, and explore opportunities to utilize the vast quantity of fictional medical television programming to enhance medical education.

Methods

We designed and reported this study using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), which was designed to guide authors in comprehensive, evidence-based systematic reviews and meta-analyses (Moher et al. 2009).

Selection criteria

We created a comprehensive research protocol (Appendix 1 of ESM). Selected studies were required (1) to be scholarly, peer-reviewed research, (2) to involve an exposure to fictionalized U.S. medical television programming premiering in 1994 or later by an individual in a formal health professional training program, and (3) to assess associations between program exposure and outcomes. We included all studies that met inclusion criteria published prior to February 2015. To be considered scholarly research, articles had to report an original research study published in a peer-reviewed journal in medicine, the social sciences, or related fields. We defined fictional television programming according to the Academy of Television Arts & Sciences Emmy awards definition of primetime television drama or comedy series (Primetime rules and procedures: academy of television arts & sciences 2013). To be deemed medical programming, at least half of the program had to take place in a healthcare setting and at least half of the main characters had to be health professionals such as physicians or nurses. We limited selection to studies examining shows premiering in or after 1994 in light of literature supporting that year as a pivotal turning point in fictional medical television (Turow 2010). Furthermore, earlier programming was less relevant to our goals of making recommendations for future research and exploring opportunities to use these television programs in medical education. Formal health professional training programs included both undergraduate and graduate medical programs, nursing programs, and other programs designed to train professionals in public health, epidemiology, and/or health policy. Finally, included studies were required to assess the association between program exposure and students’ knowledge (e.g., acquisition of facts), perceptions (e.g., attitudes and beliefs), and/or behaviors (e.g., specific action steps). Therefore, we excluded studies that were only content analyses. For example, a 2008 article by Lim and Seet was not included because it did not formally assess how using House M.D. to teach ethics and professionalism affected medical students’ knowledge, perceptions, and/or behaviors (Lim and Seet 2008). Selection was not limited by sample size, age, gender, or location of study. Because of limitations regarding feasibility, we only included studies published in English.

Identification and selection of studies

In February 2015, we conducted searches in PubMed (which includes MEDLINE), PsycINFO (OvidSP), and CINAHL (EBSCOhost). A professional research librarian developed search strategies that were designed to be broad and tailored to the idiosyncrasies of each particular database (Appendix 2 of ESM). All searches included comprehensive lists of search terms related to entertainment education, prime-time television, medical television, and titles of particular medical programs (e.g. Grey’s Anatomy and Nurse Jackie). We hand-searched reference lists of included studies to identify additional relevant articles. We also contacted authors of all included studies to inquire whether they knew of any additional studies that fit our criteria.

Four researchers independently screened all article titles and abstracts to generate a set of articles for which there was any possibility for selection (Liberati et al. 2009). This initial screening narrowed the field to studies that, based on the title, abstract, and other metadata, had any likelihood of ultimately meeting selection criteria. Then, two researchers independently assessed the full texts of these selected articles for eligibility. During this process, researchers used structured abstraction forms that enabled subsequent comparison of independent assessments with inter-rater reliability statistics. Inter-rater reliability was outstanding (98.5% agreement, Cohen’s κ = 0.95). To minimize risk of reviewer bias, reviewers met to discuss any differences only after independently screening articles. In the case of disagreement, a third reviewer helped with the adjudication process. After adjudication, we achieved consensus in 100% of cases. For each article determined to not meet criteria, we assigned a primary reason for exclusion.
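
As an illustration of how percent agreement and Cohen’s κ are derived from two reviewers’ include/exclude decisions, the following minimal Python sketch may be helpful. The function name and the four cell counts are hypothetical: the counts are not reported in this article and were chosen only so that they total 68 decisions and yield values of a similar size to the statistics above.

```python
def agreement_and_kappa(both_include, a_only, b_only, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two raters
    making binary include/exclude decisions on the same articles."""
    n = both_include + a_only + b_only + both_exclude
    observed = (both_include + both_exclude) / n

    # Marginal probability that each rater says "include".
    a_include = (both_include + a_only) / n
    b_include = (both_include + b_only) / n

    # Agreement expected by chance if the raters decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts (assumption, not data from the review).
agreement, kappa = agreement_and_kappa(
    both_include=12, a_only=1, b_only=0, both_exclude=55
)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```

Because κ subtracts the agreement expected by chance, it is lower than raw percent agreement whenever one decision (here, exclusion) dominates.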

Data extraction

We then developed structured spreadsheets to facilitate complete and accurate data collection. One researcher abstracted (1) study background information, such as year and location of study; (2) participant-related information, such as student level (e.g. undergraduate medical, nursing) and participant demographics; (3) exposure-related information, such as name of program(s) and number of clips/episodes; (4) outcome-related information such as timing of assessments and main outcome measures; and (5) study quality information such as participant recruitment methods and study design (Appendix 1 of ESM). A second researcher independently verified all abstracted data.

We created relevant variables based upon the primary information collected in these five categories. For example, we classified studies into subgroups based on whether they examined clips in a classroom setting or sought associations between volume of viewing and outcomes, and whether the main outcome measure was knowledge, perceptions, and/or behavior. Operationalizing the data in this way facilitated synthesis of data and reporting of results (Appendix 1 of ESM).

Data analyses

Because of the wide variety of outcomes assessed and the lack of standard measurements for the outcomes that different studies had in common, we could not perform meta-analyses to quantitatively combine the data. Instead, we qualitatively described the data using standard methods of systematic review described by PRISMA (Moher et al. 2009) (Appendix 3 of ESM).

To assess study quality and therefore bias, we selected the Medical Education Research Study Quality Instrument (MERSQI) to formally evaluate each article. The MERSQI is a 10-item instrument designed to assess the quality of medical education research studies (Reed et al. 2007). The MERSQI contains six domains used to assess study quality, each of which has a maximum score of three, leading to a maximum total score of 18. Two researchers independently applied the MERSQI to all included studies, then met to compare scores for each study. Any scoring differences were easily adjudicated. We then calculated mean study quality and compared it to the mean score found by Reed et al. (2007) in their assessment of 210 medical education research studies.
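
The scoring procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the instrument itself: the domain labels follow the published MERSQI (the sampling and outcomes domains are not named in this article), the item-level scoring rules are omitted, and the two sets of rater scores are hypothetical.

```python
# Six MERSQI domains, each with a maximum of 3 points (total maximum 18).
MERSQI_DOMAINS = (
    "study_design",
    "sampling",
    "type_of_data",
    "validity_of_evaluation_instrument",
    "data_analysis",
    "outcomes",
)
MAX_PER_DOMAIN = 3.0


def mersqi_total(scores):
    """Sum the six domain scores after checking completeness and range."""
    missing = set(MERSQI_DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    for domain in MERSQI_DOMAINS:
        if not 0 <= scores[domain] <= MAX_PER_DOMAIN:
            raise ValueError(f"{domain} score out of range")
    return sum(scores[domain] for domain in MERSQI_DOMAINS)


def domains_to_adjudicate(rater_a, rater_b):
    """List the domains on which two independent raters disagree."""
    return [d for d in MERSQI_DOMAINS if rater_a[d] != rater_b[d]]


# Hypothetical scores for one study from two raters (assumption).
rater_a = {
    "study_design": 1.0, "sampling": 1.0, "type_of_data": 1.0,
    "validity_of_evaluation_instrument": 0.0,
    "data_analysis": 3.0, "outcomes": 1.5,
}
rater_b = dict(rater_a, outcomes=1.0)

print(mersqi_total(rater_a))                    # total for rater A
print(domains_to_adjudicate(rater_a, rater_b))  # domains needing discussion
```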

Results

Study identification and selection

Of 4044 potentially relevant published articles, 3541 represented unique studies. We eliminated 3473 based on initial assessment of title and/or abstract. Because we used broad search criteria to maximize the potential of finding relevant studies, many studies were easily identified at this phase as not relevant. Out of the remaining 68 full-text articles assessed for eligibility, 13 met selection criteria (Fig. 1). No authors of the 13 selected studies responded with any additional studies that met criteria. The most common reason for exclusion was that the article was not scholarly research published in English. The majority of the remaining excluded studies did not assess exposure to U.S. fictionalized medical television programming by individuals in a formal health professional training program (Fig. 1). Although we identified relatively few studies that fit our selection criteria, we felt it important to maintain our a priori research protocol, and we judged that further synthesis of these 13 articles would provide valuable information to the medical education community.
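
The counts reported above can be checked for internal consistency with a short Python sketch. The figures come from this section; the dictionary keys and derived variable names are illustrative labels, and the two derived counts (duplicates removed, full-text exclusions) are computed here rather than stated in the text.

```python
# Counts reported in the study selection paragraph.
flow = {
    "records_identified": 4044,
    "unique_records": 3541,
    "excluded_on_title_abstract": 3473,
    "full_text_assessed": 68,
    "included": 13,
}

# Derived quantities (computed, not quoted from the article).
duplicates_removed = flow["records_identified"] - flow["unique_records"]
remaining_after_screening = (
    flow["unique_records"] - flow["excluded_on_title_abstract"]
)
excluded_at_full_text = flow["full_text_assessed"] - flow["included"]

# The screening arithmetic should reproduce the reported full-text count.
assert remaining_after_screening == flow["full_text_assessed"]

print(duplicates_removed, remaining_after_screening, excluded_at_full_text)
```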

Fig. 1 Study selection. Of the 3541 unique articles retrieved using the defined search strategy, 13 remained after the exclusion process. *Although many articles did not meet inclusion criteria for more than one reason, each article was assigned a primary reason for exclusion. These numbers represent articles excluded for the primary reasons listed.

Quality characteristics

The mean MERSQI score for all 13 studies was 8.27 (Standard Deviation [SD] = 2.01) (Table 2). This is slightly lower than, but within one standard deviation of, the mean MERSQI score found by Reed et al. (2007) in their assessment of 210 medical education research studies (mean = 9.95, SD = 2.34). Ten studies (77%) received one point for the domain of study design because they used a single group cross-sectional or single group post-test only design, while the remaining three received 1.5 points for utilizing a “single group pretest and posttest” design (Appendix 4 of ESM). Overall, studies scored highest in the data analysis domain, which assesses the appropriateness of analysis for the study design and complexity of analysis (mean = 2.46, SD = 0.52). In this domain, all studies received one point for “data analysis appropriate for study design and type of data”, and one or two points depending on the complexity of analysis (one point for “descriptive analysis only”, and two points for “beyond descriptive analysis”) (Appendix 4 of ESM). Studies scored lowest in the validity of evaluation instrument domain, which assesses internal structure, content, and relationship to other variables of the evaluation instrument (mean = 0.46, SD = 0.51). For this domain, 7 studies (54%) received a score of 0 because they did not report on the content of the evaluation instrument, and 6 (46%) studies received a score of 1 because they reported the content, but not internal structure or relationship to other variables (Appendix 4 of ESM). Overall scores across the various domains were relatively homogeneous, with the largest standard deviation occurring in the type of data domain (mean = 1.46, SD = 0.88).
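
Two of the figures in this paragraph can be reproduced from the reported counts, as the following Python sketch shows. It uses only values stated above (7 studies scoring 0 and 6 scoring 1 in the validity domain, and the two overall means and standard deviations); the variable names are illustrative.

```python
from statistics import mean

# Validity of evaluation instrument domain: 7 studies scored 0, 6 scored 1.
validity_scores = [0] * 7 + [1] * 6
validity_mean = mean(validity_scores)

# Overall MERSQI means reported for this review and by Reed et al. (2007).
review_mean, review_sd = 8.27, 2.01
reed_mean, reed_sd = 9.95, 2.34

# The gap between the means, compared against each standard deviation.
gap = reed_mean - review_mean
within_one_sd = gap < review_sd and gap < reed_sd

print(round(validity_mean, 2), round(gap, 2), within_one_sd)
```

The gap between the two means is smaller than either standard deviation, which is the sense in which the review mean lies within one standard deviation of the Reed et al. (2007) mean.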

Population characteristics

Six studies (46%) involved undergraduate medical students (Aboul-Fotouh and Asghar-Ali 2010; McNeilly and Wengel 2001; Shevell et al. 2014; Weaver et al. 2014; Weaver and Wilson 2011; Williams et al. 2013), one (8%) involved nursing students (Weaver et al. 2013), two (15%) involved both medical and nursing students (Czarny et al. 2008; Jubas and Knutson 2012), two (15%) involved medical residents (Pavlov and Dahlquist 2010; Wong et al. 2009), one (8%) involved medical students, residents and attending physicians (van Ommen et al. 2014), and one (8%) involved graduate epidemiology students (Ostbye et al. 1997) (Table 1). The number of participants per study ranged from 8 to 484, with a mean of 181 (SD = 175) and a median of 92 (Interquartile range = 42, 362). The majority of studies (9, 69%) were conducted outside of the United States. Each study drew participants from only one institution.

Table 1 Study characteristics of studies examining impact of fictional medical shows on health professional students

Exposure characteristics

Five studies (38%) examined associations between the volume of routine viewing (as opposed to supervised viewing in an educational setting) and outcomes (Czarny et al. 2008; Weaver et al. 2013, 2014; Weaver and Wilson 2011; Williams et al. 2013). For these five studies, the average percent of respondents that watched fictional medical television was 84% (SD = 8%), with this number being slightly higher for medical students as compared to nursing students (87% [SD = 6%] and 75% [SD = 9%], respectively). The most commonly assessed television programs were ER and Grey’s Anatomy (6, 46% each), House M.D. (5, 38%), and Scrubs (3, 23%) (Table 1). The majority of students watched television programs by themselves. All studies that examined student-viewing habits focused on the impact of viewing on the health topic of professionalism and most (4) also included ethics (Table 1).

Eight studies (62%) assessed the association between outcomes and the use of clips as educational tools in a formal educational setting (Aboul-Fotouh and Asghar-Ali 2010; Jubas and Knutson 2012; McNeilly and Wengel 2001; Ostbye et al. 1997; Pavlov and Dahlquist 2010; Shevell et al. 2014; van Ommen et al. 2014; Wong et al. 2009). Regarding teaching topics, four of these studies utilized clips to teach about doctor/patient communication (Pavlov and Dahlquist 2010; Aboul-Fotouh and Asghar-Ali 2010; Wong et al. 2009; McNeilly and Wengel 2001), one about professionalism (Shevell et al. 2014), one about ethics (van Ommen et al. 2014), one about medical training (Jubas and Knutson 2012), and one about disease processes/medical terminology (Ostbye et al. 1997) (Table 1).

Outcomes

While most (11, 85%) studies collected written questionnaire data, the remaining two studies used semi-structured interviews (Jubas and Knutson 2012; van Ommen et al. 2014) (Table 2). All five studies that assessed routine viewing habits assessed participants’ perception of ethics and/or professionalism on these programs. Four of these studies also assessed behavior, with three asking whether students discussed plot lines with fellow students and whether they were asked about bioethical or medical topics from the television programs by friends or family (Czarny et al. 2008; Weaver and Wilson 2011; Williams et al. 2013), and one assessing the impact of viewing behavior on desire to become a nurse (Weaver et al. 2013) (Table 2).

Table 2 Study outcomes from studies examining impact of fictional medical shows on health professional students

All eight studies that examined the use of clips in a classroom setting reported high student satisfaction. Furthermore, three studies specifically reported that students expressed a desire to continue participating in learning that involved program clips (Ostbye et al. 1997; Pavlov and Dahlquist 2010; Wong et al. 2009). Six of these studies also assessed both student knowledge and perception, and all reported increased knowledge of the presented health topics (Table 2). Three studies assessed acquisition of knowledge with pre- and post-course multiple choice, Likert-scale, and/or open-ended questions (Aboul-Fotouh and Asghar-Ali 2010; McNeilly and Wengel 2001; Wong et al. 2009), and the remaining three measured student knowledge through open-ended survey questions at the end of the exposure only (Ostbye et al. 1997; Pavlov and Dahlquist 2010; Shevell et al. 2014). Some examples of open-ended questions designed to assess acquisition of knowledge include “did the clips illustrate elements of physicianship? Please explain.” (Shevell et al. 2014) and “name three things you as the physician might include in the boundaries you establish with a difficult patient.” (McNeilly and Wengel 2001). No studies assessed the impact of exposure to fictional medical shows on clinical practice.

Discussion

Despite a broad search and flexible selection criteria, there were relatively few peer-reviewed studies that examined the influence of fictional medical television programs on health professional students. Available studies suggest that these students commonly view fictional medical television, recall storylines from fictional narratives, and learn from these programs when they are utilized in a classroom setting.

Our review provides several concrete, actionable implications. First, none of the studies we identified directly examined whether there might be a difference between recreational exposure and intentional curricular exposure. It may be valuable for future research to investigate this question.

Second, the results of this review suggest potential for these television programs to serve as a springboard for education and to enhance current classroom activities. Clinical vignettes from television programs can be incorporated into course material to illustrate a range of medical topics, such as ethical dilemmas, differential diagnosis, and correct procedural techniques. Future research may help elucidate which specific ways of embedding material may be most associated with positive outcomes. For example, some studies were based upon integration of very brief clips while others relied upon full television programs. This inquiry may be particularly amenable to a qualitative approach. Furthermore, although few medical education interventions are judged by influence on clinical practice, future research on this topic may yield valuable results.

Third, our findings suggest that medical educators may wish to incorporate discussions about the portrayal of healthcare professionals on these television programs into course materials, as articles in our review suggest that fictional medical programs motivate career choice for both physicians and nurses (van Ommen et al. 2014; Weaver et al. 2013). The portrayal of physicians on television has changed dramatically throughout the years, with television doctor heroes who could do no wrong (e.g. Dr. Kildare) populating the airwaves in the 1960s, to physicians on current television shows that are generally portrayed as noble yet flawed (Turow 2010). Comparison with older shows may help spark discussion on the role of the physician and practice ideals, as well as provide an avenue to educate attending physicians about the potential influences of fictional medical programming on patients’ perceptions of healthcare and the healthcare workforce (Cho et al. 2011; Quick 2009; Stinson and Heischmidt 2012).

Fourth, more rigorous study design is needed. Improving such studies need not entail a shift to quantitative studies: there may be value in focusing on strengthening qualitative approaches. However, it would be valuable for future studies to be conducted at multiple institutions, to be conducted longitudinally, and to utilize validated assessment instruments. Given that higher quality study design has been associated with increased funding opportunities, improvement in study design may also improve funding for this line of research (Reed et al. 2007).

Limitations

An inherent limitation is that interpretation of selection criteria can be subjective. Therefore, we attempted to minimize bias by carefully defining our selection criteria with specific protocols and examples. It is possible that during the initial screening process certain articles could have been missed because they were not screened in duplicate. Our study was also limited in that we only examined television programs originating in the U.S. in the past twenty years. Considering that the majority of studies examining these programs were conducted outside of the U.S., it is possible that widening our scope to include non-U.S. television programs would have resulted in a greater number of studies in our review. Our scope was limited to shows that feature health professionals in a professional setting. Thus, we did not include shows such as Frasier that feature medical professional characters mostly outside of an office or hospital setting. These restrictions were put in place because our research question specifically asked about the impact of U.S. shows that take place in a medical setting, but a wider variety of programming might be valuable to study in the future. A common limitation in the reviewed studies is the use of convenience sampling. It would be valuable for future studies to use random sampling. Finally, the PRISMA guidelines were developed primarily for the reporting of reviews evaluating randomized trials. Despite this, many of the PRISMA guidelines are in keeping with the recommendations laid out by Reed et al. (2005) in their review on improving reporting and synthesizing of educational interventions. For example, the lack of objective evaluation methods across studies was an obstacle, but the PRISMA guidelines allowed us to develop methods to qualitatively synthesize the various evaluation methods. However, the heterogeneity of the small number of articles that fit the criteria for this study meant that it was not possible to utilize more quantitative synthesis or formal techniques of meta-analysis.

Conclusion

This systematic review suggests that fictional medical television programs may represent an untapped resource that can serve as teaching tools for students and medical professionals. Because existing studies have been limited in terms of sample size, scope, and study quality, it will be valuable for future research to utilize more rigorous study designs and to more directly assess the impact of these experiences on clinical practice instead of focusing on feasibility and acceptability.