Introduction

Medical educators generally agree that students should be trained in settings that promote self-directed learning (SDL) in order to encourage higher-order thinking. As best practices in teaching and learning have evolved, the literature reflects the fact that talented and seasoned teachers cultivate this process through a variety of pedagogical and assessment methods, even in the challenging arena of the lecture hall [16]. This raises fundamental questions of whether all educators can acquire the highly effective skillset of master teachers through targeted professional development [7] and, if so, how that acquisition would be measured. In this paper, we describe the development of criteria for teaching faculty in their role as effective facilitators of student learning in interactive large group settings.

The studies by Newman and colleagues [8, 9] formulated a series of 11 criteria for lecturing skills that allowed “Overall lecture quality” to be assessed by peer review. Their criteria of lecturing performance, such as “Encourages appropriate audience interaction” and “Monitors audience’s understanding of material,” address key issues of interactive teaching and learning. Srinivasan and colleagues introduced a framework of teaching skills that encompasses a broad range of capabilities, including content knowledge, learner centeredness, and practice-based reflection [5]. The work of Prober and Heath [1, 2] deals with innovative approaches to medical education captured in their provocative article titles Lecture Halls without Lectures and Medical Education Reimagined. Rather than reinventing the large group lecture forum, these authors propose classroom methods that foster retention through relevance, such as integrating patient discussion to render lectures “stickier (more comprehensible and memorable).” As they note, however, this alone is still a lecture and works best combined with other strategies such as the “flipped classroom” (where students view video of the sticky lecture beforehand, allowing the lecturer to focus on interactive discussion and application of concepts in the large group setting), so that “Teachers would be able to actually teach, rather than merely make speeches” [1]. McLaughlin and colleagues successfully implemented a flipped classroom approach in which students had access to a recorded library of content-focused lectures [3]. In addition, several activities (e.g., audience response and open-ended questions; multifactorial pair and share) and micro-lectures were applied in their iteration of the flipped classroom to reinforce and consolidate learning.
However, previewing recorded lectures may not confer sufficient depth and breadth of foundational knowledge to prepare students for higher-order learning in large group encounters, and there is precious little time in the lecture setting to diagnose and address such gaps. The teacher/facilitator has to gauge readiness, optimize participation, and redirect the exchanges of learners to foster meaningful application of biomedical science to patient care. Master teachers intuitively establish this alchemy and balance the needs of learners with instructional goals and objectives. Our intention was to create an instrument to guide instructors on best practices for interactive teaching in a large group setting, based upon proven approaches in the literature and at our institutions.

To optimize interactive teaching/learning in groups of different sizes, our educators have engaged in a variety of hybrid approaches to large group teaching, which rely upon a foundational form of SDL framed by goals and learning objectives for each session. In this report, we describe the development of a skill rating instrument with specific standards that can be used to enhance educator effectiveness. While the criteria for the “Conductor of Interactive Learning” (COIL) tool were originally developed to promote mastery of interactive teaching skills, beta testing at multiple schools revealed broader applications for using the tool to assess faculty teaching skills. The COIL tool itemizes core teaching competencies in a visually structured way to form a platform for formative feedback: it describes key aspects of teaching style and learner engagement, and the criteria are anchored with descriptors across a continuum (from novice to master level). For faculty training and assessment purposes, this tool facilitates the discussion of these milestones after observed teaching. As such, we propose that the COIL tool is poised to become useful for self and peer assessment of interactive teaching and for cultivating a reflective practice that enhances interactive teaching skills.

Methods

At Hofstra Northwell School of Medicine, we use a hybrid SDL design for delivery of the medical science curriculum. In a hybrid SDL design, learning activities outside of class are followed by a large group interactive session conducted by a faculty member, who is both a content expert and skilled in the process of hybrid SDL-based interactive teaching with the Socratic Questioning pedagogy [10]. We refer to such a faculty member as a “Conductor of Interactive Learning” (COIL).

We defined a set of criteria necessary for COIL faculty in their role as effective facilitators of student learning. The goals and learning objectives for the session define the depth and breadth of the session content. A key component of higher-order interactive learning is for COIL faculty to trust learners to contribute to the learning partnership and to feel comfortable with a collaborative classroom. If learners do not complete prerequisite learning activities (pre-work) that focus on foundational stages of knowledge and comprehension, the interactive session is more likely to devolve into a traditional, unidirectional lecture delivered by the content expert. Similarly, learners should trust the COIL to engage them in higher-order interactive learning and to avoid reiteration of their hybrid SDL (H-SDL) foundational content in the classroom. To ensure success, it is vital to adhere to clearly defined COIL standards (skills) that guide educators in this role. In the development phase, our standards were developed for two domains in the assessment of COIL skills: (1) specific interactive skills (seven items) and (2) general educational fundamentals (nine items). We refer to these criteria for interactive teaching collectively as the 16 “COIL standards.”

Development of the COIL Tool

The COIL tool consists of the 16 standards and a rating scale for each of the standards. Next, we describe the four phases for the process of creating and finalizing the COIL standards and the rating scale.

Phase I: Creating the First Draft of COIL Standards

Standards that characterize interactive educators were formulated by the following: (1) review of instruments that have been validated for use in peer/self-rating of lecturing skills and educator competencies; (2) direct observation (by authors PJG, DEE) of in-house educators (n = 16), master educators selected as role models who were refining their own interactive teaching skills as they conducted large group learning sessions; and (3) consultation with a national group of medical education research experts (MEREs, n = 12), recruited by the lead author (PJG) through their affiliations as alumnae of the Harvard Macy Institute. As a result, a first working draft of the COIL standards was generated. This was an initial step in construct validation and in testing the reliability of the COIL instrument [11].

Phase II: Beta-Testing Standards and Finalization

We selected our COIL faculty from both senior and junior in-house educators who expressed enthusiasm and interest in learning new instructional skills for large group teaching. These early adopters assessed draft versions of our teaching criteria (COIL standards). They participated in the primary validation process using audience response clickers in an anonymous setting to objectively rate each of the newly drafted standards. For example, they rated the statement “This Standard is Important for Both Self and Peer Rating of COIL Skills” using a scale of (a) very important; (b) important; (c) usable with modification (= major modifications required); and (d) not usable (or not important enough to include). Results were subjected to multiple rounds of in-house review by core COIL faculty. Outside expertise was solicited for subsequent rounds of review: MEREs were given the complete drafts and charged with independently determining which standards merited inclusion. In later stages, all critical comments and baseline psychometric data (audience response data from earlier review rounds) were shared to negotiate consensus on the final iteration of COIL standards. The approach was designed to be inclusive and to recognize the diverse expertise of stakeholders who invested time and effort to support this project. After extensive consideration of all feedback from in-house educators and experts, we finalized the 16 COIL standards by consensus of the core COIL faculty, implementing the majority of recommendations (>90 %) after three rounds of external review (Table 1).

Table 1 The 16 COIL standards: concise version

Phase III: Expanding the Definition of Standards for Clarity

As the design of the standards progressed, the core COIL faculty determined that more detail was needed to limit variation in how the criteria were interpreted and applied, particularly because the standards were to be used for peer/self-assessment and individualized skill development. The developers then created elaborated descriptors to anchor each standard with specific examples (referred to as foundational components C1, C2, C3). We expanded on these descriptors with a series of statements capturing the meaning of each standard, with examples of specific behaviors. The expanded definitions of COIL standards were presented to the core teaching faculty over the course of three workshops. The final versions of all 16 standards were placed on a website for easy access by collaborators: http://medicine.hofstra.edu/pdf/department/scienceedu/scienceedu_coil_expanded_all.pdf.

Phase IV: Development of a COIL Skill Rating Scale

After developing the concise and elaborated versions of the standards that define the COIL skills, we focused on developing a rating scale to be used for each standard. Various rating scale formats were considered in search of an intuitive way to assess the behaviors defined as COIL skills. We deliberated upon eight individual scales and combinations, including 4-, 5-, and 7-point Likert scales [12] and the visual analog scale (VAS) commonly used for rating pain [13].

Variants of the VAS were considered, including gray and color continua, as well as combinations of VAS/Likert scales. A dual approach was chosen to capture both qualitative (ordinal, Likert) and quantitative (interval, VAS) data. In-house medical educators (n = 16) assessed each rating scale option for the following perceived qualities: “ease of use, familiarity, usefulness, inter-rater reliability, internal consistency, and generalizability.” Prior to open-forum discussion, the group rated their preferences using a scale from “Strongly Agree” to “Strongly Disagree.” Statements such as “Continuous Color Scales Would Work Best for Me” (personal best fit) were used for rating throughout.

For potential use as descriptors, two different word-series choices for the anchored rating scales were assessed by the 16 medical educators. We considered the 4-point series “Strongly Disagree, Disagree, Agree, Strongly Agree” and then decided on the more representative “Novice, Advanced Beginner, Proficient, Master” descriptors based on the Dreyfus scale [14]. The statement used for rating, “This Rating Scale is Both a Valuable and Easy to Use Instrument to Rate COIL Skills,” established this preference. Next, the MEREs independently evaluated their preferences for the COIL rating instrument, including numerical scale, VAS, and choice of descriptors. The core COIL faculty consolidated all written and oral feedback to finalize the COIL tool, comprising the 16 standards and the specific rating scale.

Results

Results of the Beta Testing of COIL Standards

Once the COIL tool was finalized, as described above, preliminary beta testing was conducted by the in-house faculty (n = 16). They used the COIL standards to peer assess one another’s teaching skills, as well as those of volunteer faculty and guest presenters (who knew of the mandate to be interactive but did not have standards to guide them). Each session was followed by an open-forum discussion of their results in implementing the COIL tool. The results are reported as percentages from the midline of the 4-point rating series, spanning 0 to −100 % on the negative side and 0 to +100 % on the positive side. As seen in Fig. 1, 15 of the 16 standards were rated “Very important” or “Important” by +75 to +94 % of observers. This gave us valuable insights for refining the language of the COIL standards by consensus of the core COIL program faculty (authors) and the expert consultant group (n = 12).
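This signed-percentage reporting convention can be sketched in a few lines of code; the function name and clicker counts below are hypothetical, chosen only to illustrate the arithmetic, not to reproduce the study’s data:

```python
def signed_midline_percentages(counts, n_raters):
    """Collapse 4-point clicker counts into signed percentages around the
    scale midline: "Very important" (VI) and "Important" (I) accumulate on
    the positive side; "Usable with modification" (UWM) and "Not usable"
    (NU) accumulate on the negative side, as reported in Fig. 1."""
    positive = counts.get("VI", 0) + counts.get("I", 0)
    negative = counts.get("UWM", 0) + counts.get("NU", 0)
    return (round(100 * positive / n_raters),
            -round(100 * negative / n_raters))

# Hypothetical example: 13 of 16 raters fall on the positive side.
pos, neg = signed_midline_percentages({"VI": 7, "I": 6, "UWM": 2, "NU": 1}, 16)
# pos is +81 and neg is -19 (rounded from +81.25 and -18.75)
```

A standard rated “Very important” or “Important” by all 16 raters would thus score +100 %, while a standard rejected by all raters would score −100 %.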

Fig. 1
figure 1

Results of beta testing COIL standards. Raw data (table below) and rounded percentages in a bar chart of anonymous clicker rating (1–4) for COIL standards as rated by in-house faculty (n = 16 responders) to “Value for self and peer assessment of COIL-Skills”. VI very important, I important, UWM usable with modification, NU not usable

Notably, only Standard 7 (S7) received a negative rating, −69 % across the “Usable with modification” (UWM) and “Not usable” (NU) categories. This standard differed from all others in that it was formulated to distinguish the situation where a clinical core leader brought in multiple colleagues to present an interdisciplinary case. Standard 7 was modified to reflect the leader’s role within an inter-professional team of teachers. After rewriting (“As moderator of an interdisciplinary group teaching presentation, effectively managed contributions and student involvement”), it was independently rated by all reviewers as “important.” In this case, the other standards applied to all members of the inter-professional team. Similarly, other standards with modest negative ratings (in the 20 % range), S4, S12, S14, and S15, were reworked and optimized by the core COIL program faculty and sent out for review.

Results of COIL Standards Grouping and Elaboration of Definitions

During open-forum discussions and workshops with the core COIL faculty, it was noted that not all 16 of the original standards referred precisely to “interactive” COIL skills. By consensus, it was decided to sort the standards into two distinct domains: standards S1–S7 apply directly to interactive skills (asterisked standards in Table 1), whereas standards S8–S16 represent fundamental educational skills that can be applied to a variety of presentations, regardless of group size and pedagogical approach (Table 1, standards without an asterisk).

As previously described for phase III of development, we created elaborated definitions of the standards to support the use of this instrument for (peer/self-directed) professional development. For brevity, examples of the concise and elaborated COIL standards for teaching skills are shown in Table 2 while the full version including fundamental non-interactive standards may be accessed on the website: http://medicine.hofstra.edu/pdf/department/scienceedu/scienceedu_coil_expanded_all.pdf.

Table 2 Elaborated versions of COIL skills (interactive skills only)

Results of Beta Testing of the COIL Skill Rating Scale

To inspire educators to achieve the goal of becoming adaptive experts in conducting interactive large group sessions, we provided them with the COIL rating scale, with the option of using this instrument for professional skill development via self and/or peer assessment of COIL skills (Fig. 2). We developed the tool to support both quantitative and qualitative rating of the COIL teaching standards being assessed. The example depicted in Fig. 2a allows raters to determine categorically which anchor statement and Likert box most closely correspond to the standard being assessed during direct observation of teaching. It also allows continuous rating of skills or fundamentals by means of a color spectrum-based visual analog scale across the top of the anchor statement and Likert box (Fig. 2b). When using the instrument, raters were instructed to first decide which anchored Likert statement best characterized the observed skill and then to use the VAS to quantify the level of proficiency the teacher demonstrated in that skill. Thus, two types of data collection with different resolution (see Fig. 2b) are supported by the COIL instrument: non-parametric from the Likert scale and parametric from the VAS.

Fig. 2
figure 2

The Hybrid Rating Instrument for COIL standards, allowing alignment of continuous and categorical ratings. a The Rating Instrument is a hybrid of standards placed above a “Color Spectrum Visual Analog Scale” (VAS) and, below that, a 4-point Likert scale with anchor statements that assess progressive competencies from Novice to Master. To rate the educational fundamental skills (S8–S16), the VAS (only) is marked with a cross along its progression (NA, not able to rate). b The position of the “X” mark on the parametric scale is measured with a mm ruler and derived as a whole-scale percentage, and its alignment with the 4-point non-parametric Likert scale is noted. Different parametric values can be associated with a single Likert box. The full series can be downloaded from the COIL website as a pdf: http://medicine.hofstra.edu/pdf/department/scienceedu/scienceedu_coil_rating_all.pdf
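The measurement step in Fig. 2b can be sketched as below. This is a minimal illustration, assuming a 100 mm VAS (as is conventional for pain scales); the function names are hypothetical. The key point is that the ruler reading and the Likert choice are recorded independently, so several VAS percentages can align with a single Likert box:

```python
def vas_percent(mark_mm, scale_length_mm=100.0):
    """Convert the position of the rater's "X" (measured with a mm ruler)
    into a whole-scale percentage of the visual analog scale."""
    return round(100.0 * mark_mm / scale_length_mm)

def record_rating(likert_anchor, mark_mm):
    """Pair the categorical (non-parametric) Likert choice with the
    continuous (parametric) VAS reading. Because the two are collected
    independently, a range of VAS percentages (e.g., 55-77 %) can fall
    under one Likert anchor."""
    return {"likert": likert_anchor, "vas_percent": vas_percent(mark_mm)}

# Hypothetical reading: "X" marked at 62 mm under the "Proficient" anchor.
rating = record_rating("Proficient", 62.0)
```

Collecting both values for each observed standard yields the paired ordinal and interval data described above.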

Primary assessment of the COIL rating scales by the in-house faculty (n = 16) gave rise to divided opinions about the 4-point Likert scale. The VAS was most highly rated (+54 % on the “Agree” side of the 4-point scale), while the “Continuous Gray Scale” was rated lowest (−91 % on the “Disagree” side). Initial broad resistance to the 4-point (forced choice) Likert scale was overcome by combining it with the VAS, which offered the ability to move meaningfully across a Likert box as a continuum. The final rating scale option was a dual scale: a 4-point anchored Likert scale adjacent to the continuous VAS. This approach was considered easy to use and allowed for co-collection of ordinal and interval data, with 100 % of the in-house faculty rating its efficacy as “Strongly Agree.” This version was adopted as the COIL rating scale of choice for implementation. Feedback from medical science educators after we presented COIL at national/international workshops suggested that using a visual scale in combination with the anchored Likert scale was an effective and complementary way to qualify the rater’s perception of observed teaching skills. This remains to be explored in subsequent validation research.

This last iteration of the COIL skill standards with the dual rating instrument was presented to the MEREs (n = 12) for final consideration, using the same questions posed to institutional faculty. The standards and instrument were accepted unanimously as “acceptable and innovative.” The MEREs all preferred the “Novice to Master” option (Fig. 2), which was adopted. Comments from MEREs included: “This is timely and should be pursued quickly”; “it is well justified since problems and needs are clearly there”; “Appropriate approach to evaluate faculty development in an innovative pedagogy.”

Discussion

Our initial goal in establishing COIL standards was to help educators adopt a new teaching approach aligned with our school’s educational directives (to find an antidote to traditional lectures). These criteria for interactive teaching were intended to guide faculty on “how and what to do” to effectively engage learners in a large group setting. The COIL tool development process began by comparing previously published tools with the real-life, independently derived practices of our best teachers. From the systematic observation of our master teachers and in-house teaching faculty during interactive learning sessions (by PJG, DEE), two distinct domains emerged (interactive skills and educational fundamentals) to ground the organization of our COIL standards. The next challenge was to qualify the degree of proficiency teachers demonstrated for each of these criteria, and it was this step that transformed our COIL standards from guidelines that model best practices in large group teaching into a versatile instrument for the assessment of faculty teaching.

While developing the elaborated definitions for COIL standards (to ensure reliability in application), we tested the practicality of using the COIL instrument for faculty development in interactive teaching and to provide structured formative feedback to teachers as they practiced these new strategies. The COIL standards can apply to learner groups of variable size and have the potential to complement interactive engagement of learners in multiple educational venues. Having combined standards for both interactive skills and educational fundamentals with unambiguous rating scales to measure competency, all in a single instrument, offered flexibility and ease of adaptation for the purposes of training and assessment of teaching skills.

Medical educators look for ways to make their educational sessions pertinent and clinically meaningful to budding physicians. The first component (C1) of the expanded definition of Standard 5 states, “Was able to reliably and consistently promote creative scenarios where examples unfolded to arrive at novel elements of comprehension.” This prompts clinician educators to introduce working examples of key clinical experiences into the interactive learning session or to integrate findings from case studies. Our observations of sessions conducted by practicing physicians, who often utilize Standard 5, support the perception that this fosters applied learning. Similarly, basic scientist educators could pair scientific foundations with clinical correlates and evidence-based medicine to achieve a similar goal.

The elaborated definitions of the standards have become an integral tool for professional development and have added value to our program. Further, once students acclimated to the rollout of COIL and consistently completed SDL activities, our COIL faculty were capable and ready to move learners up to higher levels of critical thinking in the large group sessions. It was essential that COIL faculty were consistent in facilitating applied, interactive learning in a compelling and intellectually challenging way. Large group attendance at most institutions is at a record low; this interactive milieu brings learners to the classroom to gain insights they simply cannot attain via SDL alone.

When we developed the elaborated definitions of COIL standards, it served to (1) provide individualized guidance to educators along varying developmental stages, (2) standardize how our teachers interpreted the standards (and behaviors which demonstrate their application), and (3) support novice raters while observing interactive sessions. We anticipated that the COIL tool would provide a systematic path to help end-users at any level of proficiency to observe, assess, and practice exemplary teaching skills in large group settings.

The design of this instrument happened organically, through a process of iterative refinements that specifically addressed the needs of our teaching faculty. Utilitarian features like the dual rating scales (Likert and visual analog) evolved because rating the demonstration and proficiency of skills is inherently challenging: the rater must make nuanced observations of behaviors and then translate the judgment of “how well” into checking off a Likert box (forced-choice set of 4; Fig. 2a). Data from a rating practice workshop (Fig. 2b) show how the visual analog scale permits the rater to calibrate the relative degree of proficiency once they have selected the “Proficient” box on the Likert scale. During beta testing of the rating instrument, participants universally appreciated the freedom this feature offered and reported having frequently “struggled to justify a choice between adjacent anchors in a Likert scale because the best fit was somewhere in between.” Note that in Fig. 2b, the range of values (from 55 to 77 %) on the visual analog scale under the “Proficient” Likert anchor is broad and reflects a low to high degree of proficiency. This information is potentially relevant and useful for monitoring the progression of individual skills (for ongoing professional development).

The main strength of this study is that it focuses on the pedagogy of large group interactive teaching and presents a “how to do it well” model with measurable standards for assessment. No standard instrument exists to rate the effectiveness of interactive teaching in large groups. Testing the COIL tool in multiple contexts and settings, we have learned that it positively supports teachers unfamiliar with this pedagogical approach and guides them through progressive stages of faculty development and practice to attain effective interactive teaching skills.

COIL’s dual rating scale for teaching skills (with descriptive anchors and paired qualitative/quantitative scales) stimulated animated discussion among beta testers at national and international workshops; the oft-repeated subjective feedback was that COIL is “an innovative way to overcome the common restrictions of other assessment tools.” Most notably, teachers appreciated how the visual scale permitted a nuanced way to characterize the proficiency of teaching skills, in contrast to the fixed Likert rating. A potential challenge of this approach is that it requires substantial professional development to foster consistency (inter-rater reliability), which is particularly important if the COIL tool is used as a peer assessment instrument to support continuous quality improvement.

Future Directions

The development of the COIL standards and teaching skill rating scales marked an important transition in the professional development of our teaching faculty and in how we prepared them for interactive teaching. With refinement, as this approach and the use of the COIL tool become systematic, they will enable longitudinal study of developmental milestones for effective teaching (interactive skills and educational fundamentals) in the large group setting. Combining the expanded COIL standards with frame-of-reference training [9, 15] will also help newly recruited faculty become proficient with COIL teaching skills.

It is critical that learners understand the importance of hybrid SDL so that, at the start of any session, students have attained at least level 2 of Bloom’s taxonomy (“Remember–Understand”) [16, 17]. This raises the question of whether the COIL tool should be used by students to rate faculty. It has been debated extensively in multiple arenas, including faculty development sessions at participating medical schools (n = 5) and in workshops (n = 3) at national and international meetings. The consensus opinion was that giving students access to the COIL tool could be constructive. Moreover, it was noted that students often co-opt faculty resources to their own advantage and to benefit their own learning; such access may foster transparency and more insightful participation as partners in the interactive classroom. We have concluded that the COIL tool has attained an appropriate level of content validation through our preliminary beta-testing efforts. It has also demonstrated potential applications for skill acquisition and self/peer assessment. As next steps, we aspire to (1) optimize reliability across users, (2) evaluate how the COIL tool may be utilized to track the progressive development of teaching skills through reliable peer/self-rating, (3) partner with evaluation experts to refine our validation studies, and (4) engage a larger pool of medical schools in testing the tool for potential use among learners and for faculty professional development.