Introduction

The flipped classroom (FC) model offers an alternative to traditional education, in which class time is often spent on passive lecture, by favoring more active learning. In a flipped classroom, students engage with material on their own before class sessions, often in the form of instructor-made slideshows, readings, or videos. Students therefore enter class having acquired a level of background knowledge, and valuable class time is spent practicing the application of this knowledge under the guidance of an expert (the instructor). Amid growing interest in more effective teaching methods in higher education, active learning has been shown to improve student performance in the undergraduate setting compared with a traditional lecture model [1, 2]. Given the promising theory behind a more active approach to learning, there have been calls to reform medical education specifically, replacing traditional didactic lectures with such methods [3, 4].

With research supporting the value of active over passive learning, the flipped classroom has gained popularity as an active learning modality in medical education, and the last decade has produced a growing body of research on the impact of FC on both student perception and performance. One recent meta-analysis of FC in health professions education found a significant improvement in assessment performance, though the review included studies from a variety of professional education settings, including medicine, pharmacy, nursing, and other healthcare fields. Additionally, 70% of respondents reported a preference for FC, though only 5 of the included articles had data on student preference [5]. Research on FC specifically in pre-clinical medical education is more limited. In medical education in general (including medical student and resident education), students still prefer FC; however, whether FC improves assessment performance remains unclear, with mixed results [6, 7, 8]. While the results for strict knowledge acquisition are mixed, FC implementation has been shown to improve learner performance in more procedural skills [7], though these findings may not apply as readily to pre-clinical coursework, where the primary goal is knowledge acquisition, as is the case in the organ system course described in our study.

The few studies that focused solely on pre-clinical medical education report results fairly consistent with those of broader studies. Although not strictly an FC model, Sheakley et al. found no practical difference in student performance on exam questions from traditional lectures compared with content delivered via instructor-guided independent learning activities, defined simply as instructor-created learning modules to be completed outside the classroom [9]. The lack of a follow-up active learning activity separates this approach from a true FC, and the absence of a performance improvement may highlight the active learning sessions as a key feature in the success of an FC approach that replaces traditional didactic sessions. In a more direct application, Street et al. found a slight improvement in an FC cohort’s assessment scores compared with a traditional cohort in a pre-clinical physiology course [10]. One commonly perceived benefit of FC is a deeper analytical understanding of content rather than low-level memorization. This was seen in a pre-clinical anatomy course in which FC students performed equally well on basic knowledge questions and better on analysis questions compared with students in a traditional classroom [11]. Notably, each of these prior studies examined only one discipline within pre-clinical education (anatomy, physiology, etc.). Pre-clinical education has recently shifted to an integrated approach, in which all material pertinent to an organ system (anatomy, pathophysiology, pharmacology, diagnostics, etc.) is covered together. Further research on FC in an integrated curriculum may better measure its value in today’s pre-clinical education landscape.

Clearly, there is an opportunity to implement FC effectively in pre-clinical medical education to improve student learning and attitudes. There are, however, challenges that accompany an FC approach and that are especially pronounced in medical education. Even in meta-analyses that found an overall positive student perception of FC, students still reported being unhappy with the increased time required to complete work before class [5], a common complaint about flipped classrooms. Additionally, the large amount of material to cover in pre-clinical medical education places further time constraints on students and creates another challenge for FC: optimizing pre-class material to maintain student engagement and preparedness for active learning sessions. The outside work in an FC should be of high quality and mindful of students’ desire to maximize efficiency; this allows students to fully engage with pre-class material and come prepared to instructor-facilitated sessions, maximizing learning [12].

This study sought to investigate the utility of a flipped classroom approach in a pre-clinical medical education setting that uses an integrated approach to learning foundational organ system knowledge, especially given the relative lack of primary research in this population of learners. Our goal was to assess whether a flipped classroom improves student performance on standardized examinations and student perceptions, while remaining mindful of common pitfalls of FC, namely the time constraints of didactic medical education and the efficient use of instructor-led active learning sessions.

Materials and Methods

Cohorts

Two cohorts of students (two consecutive classes of medical students) were compared: one received the traditional curriculum and the other a flipped classroom model. Notably, on the three standardized exams common to both cohorts before the endocrinology block (NBME subject exam: Biochemistry/Cell Biology; NBME subject exam: Musculoskeletal System; and the MCAT), there was no statistically significant difference between the cohorts (data not shown). Demographic information is provided in Table 1 and shows no obvious demographic differences between the cohorts.

Table 1 Demographic information of two cohorts (*at time of study implementation only binary gender options were available to matriculants)

Course Description

The “endocrinology block” is a required 2-week, integrated, multidisciplinary portion of a larger course covering the gastrointestinal, endocrine, and reproductive systems, including all relevant basic and clinical science disciplines linked to the Knowledge for Practice domain. Throughout higher education, and in this study, the traditional curricular model centers on the paradigm that students’ first pass through content is facilitated by an expert (usually via didactic lecture) and practiced application of concepts is done independently, whereas in the flipped classroom the first pass occurs independently and practiced application occurs with the expert. Here, both curricular models included passive and active instruction. The passive instruction always preceded the active instruction and thus represented the first pass; conversely, the active learning component of both models came second and represented practiced application. The specific characteristics of each curricular model are defined in Table 2. The impact on students’ time is reported in the Results.

Table 2 Descriptive characteristics of traditional and flipped classroom models (* indicates when expert faculty were present)

Passive instruction in the traditional model was in-person didactic lecture (with recordings available afterward), while the flipped classroom used pre-recorded modules or videos as the passive instruction modality. Again, for both curricular models the passive instruction modality was also the first pass; thus, the flipped classroom had no formal didactics. Flipped classroom students were expected to prepare for in-class learning in advance (i.e., watch or study the pre-recorded modules). The active learning component of the traditional model was a single independent, small-group, case-based assignment, whereas in the flipped classroom the active learning comprised four large-group, case-based question-and-answer applied learning sessions (with audience response), facilitated by an expert who provided feedback and clarification. In the flipped classroom, the four active learning sessions covered (1) diabetes mellitus, (2) hypothalamic-pituitary pathologies, (3) adrenal gland disorders, and (4) thyroid-parathyroid gland disorders. The pre-recorded modules assigned for each of the four sessions are listed in Table 3.

Table 3 Pre-recorded module assignments per active learning session in flipped classroom

Data Instruments

The change in curriculum was evaluated in two ways. The first was knowledge learned. Two multiple-choice assessments containing questions directly linked to the learning objectives of the endocrinology block were used to compare the traditional curriculum cohort with the flipped classroom cohort. The first assessment was a unit exam covering the 2-week endocrinology block. Thirty-one in-house authored questions were reused unaltered (KR20 = 0.77). Twenty-three of these questions were case-based and required higher-order thinking on Bloom’s taxonomy; the remaining eight did not. The second multiple-choice assessment was a customized NBME exam for the GI-Endo-Repro course. This exam was used unaltered for both cohorts and was administered through NBME CAS. Fifty NBME questions carried the ‘endocrinology’ tag in the NBME system (KR20 = 0.64), and a sub-score was generated for these endocrinology questions.
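KR20 (Kuder–Richardson Formula 20) values such as those reported above summarize the internal-consistency reliability of dichotomously scored (right/wrong) items: KR20 = k/(k − 1) × (1 − Σ pᵢqᵢ / σ²), where k is the number of items, pᵢ is the proportion of students answering item i correctly, qᵢ = 1 − pᵢ, and σ² is the variance of students’ total scores. The Python sketch below only illustrates that formula on a small hypothetical 0/1 response matrix; it is not the procedure used by the in-house exam software or by NBME, which the text does not describe, and the function name `kr20` is illustrative.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses: one row per student, one 0/1 entry per item (1 = correct).
    Hypothetical helper for illustration only.
    """
    n_students = len(responses)
    k = len(responses[0])  # number of items
    # p_i: proportion of students answering item i correctly; q_i = 1 - p_i
    p = [sum(row[i] for row in responses) / n_students for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Variance of total scores; population variance, matching the p_i * q_i item variances
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 students x 3 items
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))  # 0.75
```

Using the population variance for total scores keeps the denominator on the same footing as the pᵢqᵢ terms; some texts use the sample variance instead, which shifts the value slightly for small groups.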

The second outcome was students’ reactions. Surveys were administered to the flipped classroom cohort only. The survey consisted of 5-point Likert-scale items and free-response questions, all pertaining to students’ experiences with and opinions of the flipped classroom model. These questions were part of a larger survey administered each semester by student representatives on the curriculum committee. Broadly, the larger survey is intended to gather information about student satisfaction and engagement with each semester of the curriculum. The survey is anonymous and optional (71 of 165 students responded).

Data Analysis

We used a quasi-experimental design comparing examination performance between two cohorts of students. Independent-samples t-tests were used to compare cohort means and detect significant differences, and effect sizes are reported. Data were analyzed in GraphPad Prism. This study was deemed exempt by the University of Cincinnati Institutional Review Board.
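A minimal sketch of the two statistics named above, assuming a pooled-variance (Student’s) independent-samples t-test, since the text does not say whether Prism was configured for Student’s or Welch’s test, and assuming Cohen’s d is computed with the pooled standard deviation. The function name and inputs are hypothetical. It returns the t statistic, its degrees of freedom, and d, but not a p value, which requires the t distribution (for example, `scipy.stats.ttest_ind`, whose default `equal_var=True` corresponds to this pooled form).

```python
from math import sqrt
from statistics import mean, variance


def independent_t_and_d(a: list[float], b: list[float]) -> tuple[float, float, int]:
    """Pooled-variance independent-samples t statistic and Cohen's d.

    Hypothetical helper; a and b are per-student scores for the two cohorts.
    Returns (t, d, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    df = n1 + n2 - 2
    # Pooled standard deviation across both cohorts
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)
    diff = mean(a) - mean(b)
    d = diff / pooled_sd  # effect size in pooled-SD units
    t = diff / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return t, d, df


# Hypothetical scores for two tiny cohorts
t, d, df = independent_t_and_d([2, 4, 6], [1, 2, 3])
print(round(t, 3), round(d, 3), df)  # 1.549 1.265 4
```

Reporting d alongside t reflects the point made in the Results: with large cohorts, t (and thus p) grows with sample size, while d describes the size of the difference independently of n.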

Results

Impact on Instructional Time

The total instruction time of the endocrinology block declined slightly (9%) when shifting from the traditional curriculum to the flipped classroom curriculum (24 vs. 21 h). The teaching modalities and the hours devoted to each are summarized in Table 4. In general, migrating to the flipped classroom curriculum reduced hours of passive instruction and increased hours of active learning. The additional active learning time was required to meet the tenets of a flipped classroom model, and because adding days of instruction to the endocrinology block was not feasible, passive instruction time also had to be reduced. Importantly, this reduction was achieved through more efficient communication and post-recording editing, not by reducing the breadth or depth of content.

Table 4 Hours of instruction time in traditional versus flipped classroom curricular models (* indicates in-person attendance)

Although total instruction time declined only slightly, in-person requirements declined substantially in the flipped classroom (22 vs. 8 h). This reduction in in-person time affected both students and faculty. Students who explained their preference for the flipped classroom often mentioned the flexibility afforded by reduced in-person time. For faculty, developing the 8 h of in-person active learning sessions required a substantial upfront investment of time. However, faculty anecdotally reported appreciating the reduced in-person time and enjoying the sessions more, since students came to class already knowing the basics.

Impact on Student Performance

We compared student performance on both an in-house custom assessment and the endocrinology questions on the final exam, both of which contributed to the final course grade. The custom assessment was administered on the Monday after the final learning session on the preceding Friday. The assessment covered only the endocrinology block, and thirty-three of the questions were reused between the traditional curriculum cohort and the flipped classroom cohort. On the reused questions, the flipped classroom cohort performed significantly better than the traditional curriculum cohort (86.88 vs. 81.03%, p < 0.0001, Table 5). More important is the effect size (Cohen’s d): the flipped classroom cohort outperformed the traditional cohort by 0.61 standard deviations. Effect sizes are often missing from published reports of student performance, even though statistical significance (p value) is easy to achieve with large cohorts of students. It is also noteworthy that the improvement among lower-quartile students (+7.1%) was greater than among upper-quartile students (+2.9%) in the flipped classroom. This is most likely due to a ceiling effect among upper-quartile students, but it is important to recognize the impact FC had on lower-performing students.
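The quartile comparison above can be sketched as follows, under the assumption (not stated in the text) that a cohort’s lower and upper quartiles are its bottom and top 25% of students ranked by score, and that the reported gain is the flipped cohort’s quartile mean minus the traditional cohort’s quartile mean. The helper names and scores are hypothetical.

```python
from statistics import mean


def quartile_means(scores: list[float]) -> tuple[float, float]:
    """Mean score of the bottom and top 25% of one cohort (hypothetical helper)."""
    ranked = sorted(scores)
    k = max(1, len(ranked) // 4)  # students per quartile; any remainder falls in the middle
    return mean(ranked[:k]), mean(ranked[-k:])


def quartile_gains(traditional: list[float], flipped: list[float]) -> tuple[float, float]:
    """Flipped-minus-traditional difference for the lower and upper quartiles."""
    trad_low, trad_high = quartile_means(traditional)
    fc_low, fc_high = quartile_means(flipped)
    return fc_low - trad_low, fc_high - trad_high


# Hypothetical percent scores for two small cohorts
traditional = [60, 62, 70, 75, 80, 85, 90, 94]
flipped = [68, 70, 76, 80, 84, 88, 93, 95]
low_gain, high_gain = quartile_gains(traditional, flipped)
print(low_gain, high_gain)
```

Because scores are capped at 100%, the upper quartile has less room to rise, which is the ceiling effect the text invokes; a larger lower-quartile gain therefore narrows the gap between quartiles.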

Table 5 MCQ assessment performance

Similarly, the flipped classroom cohort outperformed the traditional cohort on the endocrine portion of the final exam. The final exam covered the gastrointestinal, endocrine, and reproductive system blocks and was sourced from a subscription to the commercially available NBME question bank. It was administered approximately 5 weeks after the in-house endocrine assessment (with the reproduction block intervening). The exam contained 172 questions, of which 50 were tagged by the NBME as endocrine system questions. The scores reported in Table 5 represent performance on only those 50 questions. The flipped classroom cohort performed significantly better than the traditional curriculum cohort (85.07 vs. 81.68%, p < 0.001, Table 5), with an effect size of 0.43 standard deviations. On the NBME final exam, performance improvements in upper- and lower-quartile students were similar, with a slightly larger improvement for lower-performing students (+2.6% and +3.7%, respectively).

Impact on Student Satisfaction

The flipped classroom cohort was familiar with the traditional curriculum model, which had been used in previous organ system blocks (e.g., cardiovascular and renal). It was therefore possible to assess their satisfaction with the flipped classroom overall and to evaluate whether the passive and active learning sessions achieved their goals. To review, the goal of the pre-session videos (passive) was to provide a more efficient first pass of the foundational science content in preparation for the active learning session. The goals of the active learning session were to increase understanding of basic concepts and to practice applying them. To evaluate whether the flipped classroom achieved these goals, 5-point Likert-scale survey items were administered to the flipped classroom cohort. Results for five questions addressing these goals and the flipped classroom overall are presented in Fig. 1.

Fig. 1

Survey results of the flipped classroom cohort of medical students, reporting their agreement (Likert scale: 1, strongly disagree; 5, strongly agree) with statements characterizing the flipped classroom curriculum. The x-axis shows the percentage of respondents who chose each Likert-scale answer. The categorical y-axis summarizes the questions asked (numbered 1 through 5).

Overall, students were more satisfied with the pre-session videos, rating the two related survey items at 4.42 and 4.14, respectively. Students found the pre-session videos concise and effective at delivering the required foundational science content and reported that they were important to view before attending the active learning session. Although students were generally satisfied with the active learning sessions, the mean scores for the two related survey items, which asked whether the sessions helped their understanding of and ability to apply concepts, were lower than those for the pre-session videos (3.58 and 3.70, respectively). When asked to state their overall preference between the flipped classroom model (experienced once, during the endocrinology block) and the traditional model (experienced in all prior organ system blocks), 53% preferred the flipped classroom, 22% preferred the traditional model, and 25% were neutral.
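The survey results above combine two views of the same Likert data: a mean item score (e.g., 4.42) and, as plotted in Fig. 1, the percentage of respondents choosing each level from 1 to 5. A minimal sketch with hypothetical responses follows; the function name is illustrative, and averaging the 1–5 codes treats them as interval data, a common convention when reporting Likert means.

```python
from collections import Counter
from statistics import mean


def summarize_likert(responses: list[int]) -> tuple[float, dict[int, float]]:
    """Return the mean score and the percentage of respondents at each level 1-5.

    Hypothetical helper; responses are integer codes (1 = strongly disagree,
    5 = strongly agree) for a single survey item.
    """
    counts = Counter(responses)
    n = len(responses)
    # Percent of respondents per level; levels nobody chose are reported as 0.0
    percentages = {level: 100 * counts[level] / n for level in range(1, 6)}
    return mean(responses), percentages


# Hypothetical responses to one item
item_mean, distribution = summarize_likert([5, 5, 4, 4, 4, 3, 2, 1])
print(item_mean)     # 3.5
print(distribution)  # {1: 12.5, 2: 12.5, 3: 12.5, 4: 37.5, 5: 25.0}
```

The distribution shows what a mean can hide: two items with similar means may differ in how polarized the responses are, which matters for the mixed free-response comments discussed later.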

Discussion

There is some flexibility in designing the elements of an FC model [12]. However, the core tenets are (a) a pre-session assignment accomplishing first-pass learning of material (generally lower-level Bloom’s) and (b) an in-person active learning session accomplishing higher-level comprehension and application of the learned material. The in-person session can sometimes also serve to review the pre-session materials; however, in a time-constrained curriculum such as medical school, that would be problematic. Thus, careful consideration of the context of the medical school experience was paramount when designing this FC. This study sought to evaluate the impact of an FC model that abided by the above tenets and that (i) was implemented in a multidisciplinary organ system course, (ii) was time-neutral or time-saving, (iii) did not increase student workload and preserved or improved student satisfaction, and (iv) was readily accepted by faculty, as 100% of didactic instructors for the traditional cohort accepted an invitation to serve as facilitators for the FC cohort.

Development of the FC model for this study drew on a focus group of student curriculum committee representatives, to ensure that student preferences were respected (and satisfaction maintained), combined with the course designers’ intentions and feasibility considerations. Student feedback was most important to the design of the pre-session work. In short, the pre-session videos were triaged to essential concepts, and the medium (narrated videos with animations and live markups) met students’ preferences. The FC model design was also influenced by principles originally developed to promote reading compliance in medical school [13]. The combination of reading-compliance principles and student feedback led to student fidelity in preparing for the active learning sessions. Additionally, trends in student learning across all levels of education are moving toward the inclusion of multimedia (e.g., videos) in first-pass learning [14].

The consistent messages from student representatives were to avoid requiring attendance to accomplish the learning objectives and to avoid increasing the workload. This informed the design, in which the pre-session videos were intended to cover all the learning objectives for the endocrinology block. Thus, students were explicitly informed that the sole purpose of the active learning sessions was to practice applying knowledge to patient scenarios. No new content was taught or assessed in the active learning sessions. In essence, attendance at these sessions became optional; however, 90–100% of students either attended or watched the recordings of the active learning sessions. Further studies could examine performance differences among students who attended the active learning sessions live and in person, students who opted to watch and “participate” later, and students who did not engage with the active learning sessions at all.

The FC improved student assessment performance on both a unit exam and the endocrinology portion of the NBME final exam. The latter was administered 5 weeks after the former, during which time students took two unit exams covering the reproductive system. The sustained improvement on the NBME final exam is important for two reasons: (1) it suggests the FC may also improve long-term retention, and (2) it provides evidence against exam item-selection bias, since NBME-sourced questions are unalterable and normed. This aspect of exam performance is often lacking in other FC studies [6, 10, 11, 12]. Furthermore, performance improved in all quartiles of students, with the lowest-quartile students showing the largest improvement on the in-house examination. This lowest-quartile improvement narrowed the performance gap between the lowest and highest quartiles. Additional work could determine whether the FC model can consistently close this gap and thus serve as a tool to minimize content-validity bias [15].

Because the cohorts differed in more than one didactic component, several factors may have contributed to the difference in exam performance. Although the condensed, more focused didactic modules in the FC cohort may have aided assessment performance, the active learning sessions likely also contributed to a greater ability to apply the material (and thus higher assessment scores), consistent with the FC literature cited previously. For example, Sheakley et al. found no difference in exam performance between traditional lectures and instructor-created modules [9], and Morton and Colbert-Getz highlighted the strength of a true flipped classroom in improving student performance on higher-level analytical questions [11]. However, because the passive and active components used in this study were intertwined, it is ultimately difficult to separate their individual and collective contributions to the results.

The average perception of the FC, based on student survey data, was positive (> 3.0 on a 5-point Likert scale); however, it was the lowest mean score among the Likert items (Fig. 1). Analysis of the free-response comments revealed a polarizing experience for students. Although these trends require more thorough study, 18 of 32 free-response comments from students in the agree/strongly agree groups for Fig. 1, question 1 (i.e., those who preferred the FC model), cited a pre-existing preference for independent learning and studying (without consistent in-person attendance). Moreover, students who preferred FC liked the “deeper dive” into diagnostics and clinical decision-making offered by the active learning sessions (cited in 10 of 32 comments). Students who did not prefer FC cited limited scheduled interactions with peers and faculty and difficulty maintaining an independent schedule without in-person lectures to attend (cited in 7 of 15 free-response comments). Several comments described how procrastinating on the pre-session work led to “not getting much” out of the active learning session, a common problem with FC [8, 12]. This concern could be mitigated in a curriculum where FC is the norm, as opposed to one where FC is implemented within an established curricular culture. Again, a more thorough qualitative analysis using a repeated survey instrument would be needed to confirm these trends.

Overall, this study demonstrates that an FC model can replace a traditional curriculum and, in doing so, save time, enhance assessment performance, and satisfy the majority of students.

Study Limitations

One limitation of this study is the variation in how and when students engaged with the active learning sessions. Because of institutional standards and norms, the active learning sessions could not be mandatory. Thus, some students attended live, while others watched a recording. Offering recordings was a deliberate study design choice to maximize the effect of the FC model. Even if active learning sessions are more impactful in person, some benefit would be expected from simply viewing a recording.

The applicability of all the tenets of this FC model will vary, even among organ system-based courses in medical school. One tenet in particular may be difficult to maintain: avoiding the teaching of new, testable content during the active learning session. Endocrinology offers a natural stepwise learning process; understanding basic feedback loops and knowing the lab testing options can lead to a plethora of clinical application scenarios. Clinical scenarios for pathologies of other organ systems may be less stepwise, thus requiring new concepts to be taught during the active learning session.