Introduction

Understanding good college teaching practices has become increasingly important amid continued public criticism of the quality of higher education (Arum and Roksa 2011; Carey 2014; Guttenplan 2014; Kaminer 2013). With rising costs of college and diminishing public trust in higher education, the media have questioned the educational process inside college and university classrooms (Carey 2014). Responding to these public concerns, policy-makers have considered the importance of college teaching in preparing a 21st century workforce and democracy. In one example, the call for better teaching in the Science, Technology, Engineering, and Math (STEM) fields has reached the level of a national imperative, as policy-makers consider improvements in college teaching as central to broadening student participation in STEM fields and staying at the forefront of scientific and technological advances (Henderson et al. 2011). Henderson et al. conducted a meta-analysis of the literature on reforms in STEM college teaching and found that researchers across three fields have taken up this call to action: discipline-based education researchers, faculty development researchers, and higher education researchers. Beyond STEM, considering the cross-disciplinary work on college teaching, we would add the learning sciences to this list of fields advancing the understanding of college teaching strategies, largely from a cognitive perspective. Taken together, these fields have amassed a great deal of evidence about effective college teaching, yet the changes that professors have made in classroom practices have not been as substantial or pervasive as the evidence for efficacious practices would warrant. Beach et al. (2012) comment on the state of moving evidence to practice in college teaching:

“But despite having significant knowledge of effective practices, along with curricular and pedagogical resources, efforts to transform introductory sequences have met with only modest success. While we may know what to do, we do not know how to enact and sustain these reforms at scale.” (p. 53).

Studies of trends in faculty teaching practices across disciplines provide context for Beach et al.’s (2012) concern that the evidence for what works in college teaching has been made actionable by only a proportion of faculty. For example, results from the 2013–2014 national, cross-disciplinary survey of faculty from the Higher Education Research Institute indicated that less than half of faculty assigned group projects and 56 % reported using student inquiry to drive learning (Eagan et al. 2014). One promising finding was that 83 % of faculty reported incorporating class discussions, but this appears not to be a dominant practice during the class period given that 51 % of faculty reported using extensive lecturing in all or most of their classes. Additionally, these results are self-reported by faculty whose responses may suffer from social desirability bias, inflating estimates of active learning given that it is a well-known evidence-based practice. While substantive scholarship has focused on how faculty perceive their teaching at large (e.g., DeAngelo et al. 2009; Eagan et al. 2014; Umbach and Wawrzynski 2005) and how students perceive college teaching (e.g., NSSE 2015; Reason et al. 2006), little research examines the pervasiveness of evidence-based college teaching practices as embedded in courses. The few multi-institutional studies that do examine specific classroom practices at the course level were largely based on self-reports of students or faculty (e.g., Cabrera et al. 2001; Volkwein et al. 2007). A notable exception is Hora and Ferrare’s (2014) study, which relied on external observers to understand patterns of teaching practices using network analysis, but the study had a small sample (58 instructors across three institutions) and viewed classes individually using a variable-centric approach.
Interestingly, though, single institution studies have had more methodological diversity in the study of active learning practices. For example, Lammers and Murphy (2002) conducted an observational study of 58 college classrooms at a U.S. public university, and found that, on average, courses were instructor driven (i.e. no students actively engaged in course material) almost 50 % of the total class time. Yet, these single institution studies do not yield a broader perspective for the field. In order to understand pervasive trends in evidence-based college teaching, practices should be studied across multiple institutions, but understood as situated within the contexts of classrooms. The purpose of this multi-institutional study was to enter inside the college classroom using external observers to understand teaching practices at the course level by discerning patterns of pedagogical practices embedded in heterogeneous groups of courses. In essence, do courses display patterns of teaching practices in a systematic way as appraised by individuals other than instructors or students themselves?

Active Learning

Although higher education scholars have considered many ways to understand college teaching, active learning has been a central focus in the literature over several decades (Carroll 1963; Freeman et al. 2014; McKeachie and Kulik 1975). Active learning can be understood as pedagogical practices that engage students in their learning process, such as class discussions, group work, instructor questions, problem-based learning, and case studies (Arends and Castle 1991; Astin 1993; Braxton et al. 2000; Carini et al. 2006; McKeachie and Kulik 1975; Pascarella and Terenzini 2005; Rothkopf 1973; Van Der Meij 1994). The active learning paradigm grew out of the movement to transition from teacher-centered to learner-centered teaching practices. According to Robertson (2005), in learner-centered courses, “teachers construe themselves to be facilitators of student learning as opposed to teacher-centered teaching where teachers see themselves as disseminators or imparters of knowledge” (p. 181). In active learning, by engaging students in the classroom, instructors allow students to co-construct their learning experience around course ideas. In McKeachie and Kulik’s (1975) review of effective college teaching strategies, they contended that “since 1970…the importance of student participation and interpersonal interaction emerges more and more clearly” (p. 199). Rooted in this movement, the past two decades of most national survey studies of college teaching practices have focused on measuring the extent to which instructors interact with and engage their students (DeAngelo et al. 2009; Kuh 2008). For example, Koljatic and Kuh (2001) examined 14 years of data from the College Student Experience Questionnaire, and found that the frequency of student involvement in active learning did not change over time.

In addition to studies exploring teachers’ facilitation of active learning practices, several studies have focused on the effects of active learning practices. These studies contribute an important voice to the conversation because students’ classroom perceptions are heavily shaped by how faculty members structure class time, as active learning is associated with greater student self-reported educational gains (Laird et al. 2008; Kuh 2008; Pascarella and Terenzini 2005). This can be seen in Braxton et al.’s (2000) study, which found that exposure to active learning practices, such as class discussion, influenced students’ institutional commitment and persistence decisions. In another example, Cabrera et al. (2001) studied survey responses from more than 1200 students in engineering courses across seven schools and found that collaborative learning and instructor interaction contributed to important student outcomes, such as group skills, problem solving skills, and occupational awareness. However, each of these studies treated teaching practices as individual variables and did not examine whether there were different patterns of courses with collective teaching practices.

The results of active learning studies from discipline-based education researchers in the STEM fields move beyond student and faculty self-reports to find promise in a broad base of experimental studies. Freeman et al. (2014) conducted a meta-analysis of 225 studies of STEM undergraduate courses that compared student performance in courses with at least some active learning to courses with traditional lecturing alone. They found dramatic results, with an effect size of 0.47, indicating that students in courses with some component of active learning scored about half a standard deviation higher than students in standard lecture courses. While these results may seem conclusive, other scholars questioned whether a meta-analysis of these binary buckets (lecturing vs. active learning) provided an adequate picture of teaching practices. In Hora’s (2014) published response to Freeman et al.’s (2014) study, he argues that lecturing is not a monolithic category and that faculty can use a variety of different strategies while also maintaining “continuous exposition by the teacher,” which was the condition for “lecturing” in the Freeman et al. study. As such, Hora (2014) states, “skepticism is therefore warranted regarding the assumption that all of the 225 studies were measuring (or controlling for) the same type of instruction” (p. 3024). Following in this vein, the body of literature on active learning has been critiqued for using theoretical tools that are too blunt to understand the complex dynamics within a college classroom (Hora and Ferrare 2014). Teaching is a complex activity that is influenced by different behavioral factors, such as clarity, interaction, organization, enthusiasm, and feedback in a classroom (Feldman 1997; Marsh and Roche 1993).
Using broad categories such as “active” or “passive” may obscure important subtleties in the teaching process, such as the interaction between multiple forms of pedagogy and the disciplinary context (Hora and Ferrare 2014; Major and Palmer 2006; Oleson and Hora 2014; Shulman 2004a).

Developing Theory

Frameworks on college teaching culled from the learning sciences or socio-cultural studies examine the subject matter and subtler practices that may shape student learning. One example that explores such subtleties was Hora and Ferrare’s (2013) study examining class observation and faculty interview data from 57 faculty across three research institutions. They found that there were systems of practice embedded within disciplines and instructors that indicated configurations of teaching methods during a class period that exhibited five dimensions of teaching practices (teaching methods, pedagogical strategies, student–teacher interactions, cognitive engagement, instructional technology). In another example, Neumann’s (2014) qualitative study of a college philosophy class in a high diversity institution found that subject matter and disciplinary ideas, teacher’s framing of the material, learner’s prior experiences, and course context all shaped the learning process. While this qualitative study highlights the complexity of the teaching and learning process and draws in a cognitive element that was less visible in other frameworks, this study was not able to understand the breadth of this cognitively responsive paradigm across courses. Other qualitative studies of individual courses or small groups of courses (e.g., Lattuca et al. 2004) also focus on the specific disciplinary contexts that shape faculty practices.

In each of these studies, the developing teaching theory points to the use of varied teaching techniques in specific educational spaces, contexts, and conditions, and for specific students. In essence, an expert instructor knows what specific core ideas for the subject matter can be taught by using lecture, and this lecture may look different, taking on various rhetorical forms, at different times during the semester with different students (Bransford et al. 2000). Perhaps instead of two binary buckets of lecturing and active learning, teaching would look more like a web weaving between content, pedagogical technique, cognitive engagement, and course context. Hora and Ferrare (2014) demonstrate this phenomenon through network analysis of instructors’ teaching tools. Neumann’s (2014) work builds on the larger learning science literature base that Bransford et al. (2000) summarized in a report by the National Academies, demonstrating the important role of certain teaching practices in students’ learning. For example, teaching in depth for a small number of core ideas for the course is more beneficial for learning for transfer when compared to teaching for breadth across a wide variety of ideas (Bransford et al. 2000). Additionally, using a framework to organize and sequence the ideas in a way that is specific to the discipline has been shown to aid recall transfer. Metacognition and reflection about how a student is learning (what a student knows and does not know) have also been connected to learning. In each of these examples, the efficacious practices do not focus on the frequency of interaction (as does active learning), but on a consideration of the context of the learner, the teacher, and the course content (Neumann and Campbell 2016).
Understanding these complex dynamics that take shape between faculty, students, course content, and context is often learned through many years of practice, in several facets of faculty lived experiences, and in specific developmental experiences (Major and Palmer 2006; Neumann 2014; Oleson and Hora 2014; Shulman 2004a).

Methods for Multi-Institutional Study of College Teaching Practices

Self-report surveys have been the prevalent means for studying college teaching across disciplines in multi-institutional studies. Faculty surveys are a commonly used instrument to study faculty teaching for a variety of reasons, such as improvement or evaluation of teaching practices. Based on a study by Carnegie Foundation for the Advancement of Teaching, approximately 82 % of the universities consider faculty self-reported data to review classroom teaching (Magner 1997). Additionally, national surveys such as the Higher Education Research Institute Faculty Survey (HERI Faculty Survey) and the Faculty Survey of Student Engagement (FSSE) use questionnaires specific to pedagogical practices to determine the perception of faculty about their own teaching practices. The HERI Faculty Survey asks faculty to report their teaching practices across classes, whereas FSSE asks faculty to report class time allocated to particular teaching methods that are linked with high levels of learning and development. As an example of the pervasiveness of the use of faculty survey to measure college teaching nationally, the most recent HERI faculty survey report (2013–2014) was based on 16,112 full-time college and university faculty members at 269 four-year colleges and universities nationwide. Another often cited method for studying faculty-teaching practices is through student self-report surveys. In one prominent example, the 2015 NSSE instrument asked approximately 315,000 first year and senior students across 541 universities about the teaching practices in their courses.

These national surveys of college teaching are useful for understanding how students perceive their college course experiences and how faculty understand their teaching practice. Yet, the view that these methods yield is narrow in three primary ways. First, student surveys may not be able to capture developing theories that examine subtle dynamics that take shape between students, course content, and context. Students are not often equipped with adult learning theory or theory from the learning sciences that would be important when answering questions based on rich theory. Second, most multi-institutional studies of college teaching ask respondents to aggregate their experiences over the past year. This may reflect a general sense of student or faculty experiences at large, but aggregating teaching experiences over the past year does not give evidence of the specific teaching practices that are important given certain course content with specific students as embedded within a course. This is problematic when theories from the learning sciences and socio-cultural studies of teaching and learning demonstrate that pedagogical choices are made at the course level, with specific content for specific students. Finally, both faculty and students have a substantial stake in what takes place in the classroom, which may influence the perspective that self-reports yield. For example, the results of faculty self-reporting of their teaching practices may not correlate with the results of students’ perception of teaching in the classroom (Centra 1973; Feldman 1989a). Perhaps an outside observer who does not have a stake in the course could offer a unique perspective.

As demonstrated by qualitative scholars of college teaching and learning, theory on college teaching points to context and complexity, and therefore, the field is in need of a nuanced window into the college teaching and learning experience. Substantive scholarship has focused on self-reports of teaching behaviors by faculty (e.g., Umbach and Wawrzynski 2005; Volkwein et al. 2007) and self-reported educational engagement by students (e.g., Cabrera et al. 2001; Reason et al. 2006). Largely missing is scholarship that unpacks this issue from inside the classroom and across multiple institutions. Studies using a quantitative observational protocol may provide useful supplementary insights into classroom practices in a way that mirrors the complexity of recent theories, but with a view that aggregates across courses. Additionally, while current studies contribute to the field’s understanding of what teaching practices faculty enact (for example, what proportion of faculty use active learning and what proportion use lecture), these studies treat the occurrence of teaching behaviors individually rather than understanding that a heterogeneous population of teachers likely favors one set of teaching practices over another. New analytical tools enable us to understand how patterns of teaching behaviors are grouped together among certain clusters of courses. Therefore, the purpose of this study was to address the following research questions using quantitative observational data and latent class analysis (LCA):

  1. Are there classes of courses that cluster according to their pedagogies (class discussions, class activities, student questions, lecture, subject-matter expertise, prior knowledge, supporting changing views)?

  2. If so, what set of pedagogies is each class of courses more prone to enact?

Theoretical Framework

The theoretical framework for this study considers two distinct conceptualizations of college teaching: the landmark theory of active learning (Carroll 1963; McKeachie and Kulik 1975) and the emerging conceptualization of cognitively responsive teaching (Neumann 2014) that rests on several prominent theories from the learning sciences (Bransford et al. 2000; Shulman 2004a, b). The conceptualization of “active learning” was developed in the 1960s, and it grew up in a context of two simultaneous movements: “student-centered” learning (McKeachie and Kulik 1975) and “time on task” (Carroll 1963). Since that time, a broad range of pedagogies has been considered as active learning strategies in the literature base including, but not limited to, class discussions, group work, problem-based learning, studio-based learning, experiential education, and case studies (Arends and Castle 1991; Astin 1993; Braxton et al. 2000; Carini et al. 2006; McKeachie and Kulik 1975; Pascarella and Terenzini 2005; Rothkopf 1973; Van Der Meij 1994). These active learning techniques both focus students’ attention on course material (i.e., time on task) and create opportunities to bring forth the student’s context and learning needs (i.e., student-centered learning). The basis for this theory is that students learn best by engaging in course material through interacting with peers and the instructor and actively engaging in the course content.

The second framework we used, namely Neumann’s (2014) framework on cognitively responsive teaching, is an emerging conceptualization of what good college teaching entails, focusing on the complex interactions between subject matter, teacher, learner, and context. Neumann (2014) claims that cognitively responsive teaching: (1) engages students deeply and intentionally in core subject matter ideas embedded in the discipline (Neumann 2005, 2014; Shulman 2004a, b); (2) connects students’ learning of new subject matter ideas to their prior knowledge, both personal and cultural (Bransford et al. 2000; Ladson-Billings 1995); and (3) supports students whose prior knowledge clashes or challenges their learning of the new material (Bransford et al. 2000; Gonzalez et al. 2005).

Neumann’s (2014) first claim suggests that good teaching facilitates students’ encountering of subject matter ideas. Neumann argues that subject matter matters (Bransford et al. 2000; Dewey 1902, 1916; Shulman 2004a, b), and as such, student learning requires interaction with subject matter ideas culled from a discipline. This claim is based considerably on Shulman’s (2004a, b) theory of pedagogical content knowledge, which contends that good teaching moves beyond having expertise in a content area and also beyond having expertise in broad-based pedagogical practices. Instead, Shulman (2004a, b) suggests that good teaching entails understanding that certain pedagogy works better with certain content as culled from the discipline. Pedagogical content knowledge supports the idea that expert teachers understand how students react to course content taught in certain ways, and therefore use this insight to make decisions about sequencing content, the use of different pedagogical practices, and the unfolding of content ideas across the class session and the course.

Moving forward, Neumann’s (2014) second claim proposes that good teaching requires connecting students’ learning of new course ideas with their prior knowledge. Rooted in previous research that suggests that students’ learning of new ideas surfaces prior knowledge (Bransford et al. 2000; Dewey 1902; Shulman 2004a, b), this claim suggests that good teachers know how to engage students in conversations that bring forth their prior knowledge (Neumann 2014). Prior knowledge encompasses more than what the students know about the course content prior to the course (such as what a pre-test would measure); it also includes the lived experiences and understandings, often cultural, that shape the students’ ways of framing the course material. For example, in a history course on the Cold War, a student of Russian heritage may have prior knowledge that the professor would need to uncover, in depth, in order to proceed with learning.

Lastly, Neumann’s (2014) third claim suggests good teaching involves supporting students’ changing views. In this claim, good teachers provide opportunities for students to express being challenged by the juxtaposition of old and new ideas (Neumann 2014). This claim is rooted in the notion that good learning happens when a student reconciles differences between prior views and beliefs and new subject matter ideas from the course (Bransford et al. 2000; Shulman 2004a, b). While students’ prior knowledge in some cases may aid learning, in others, their prior knowledge can serve as a barrier. Returning to the example of the history course on the Cold War, the lived history of the Cold War may have been very different for a student whose family is from Russia than for a student whose family was living in the U.S. during the war. A professor can support both students in changing views by providing space for any dissonance between old views and new course ideas, and then supporting students both cognitively and emotionally through the dissonance that arises (Neumann 2014).

Methods

This study uses data from a multi-institutional quantitative observational study of 587 college classrooms. By “quantitative observation” we refer to an observation protocol that uses a closed-ended, highly structured rubric and coding scheme with raters who have been specifically trained to rate according to the conceptual framework used in this study (Johnson and Christensen 2010; Stallings and Mohlman 1988; Waxman et al. 2004). A benefit of using this method is that external raters, with knowledge of teaching and learning in higher education, can appraise the use of instructional practices in the classroom rather than relying on self-reported practices by instructors or students. Additionally, as Hora and Ferrare (2014) describe, this systematic observational method allows for raters to witness the subtle dynamics among students, faculty, course content, and context by being present in the classroom.

Sampling

We purposively selected nine institutions to examine course practices across varying institutional types (2 research, 2 comprehensive, 5 liberal arts) and levels of selectivity (highly selective through broad access). Although findings from these institutions are not generalizable to all undergraduate institutions, the institutions represent a broad swath of institutional types. To assist in understanding the transferability of results, we present the characteristics of these institutions in Table 1.

Table 1 Characteristics of institutions

Within institutions, we employed random sampling of undergraduate courses, stratified by faculty category, class size, and discipline. Prior higher education literature has linked these factors to teaching quality or learning outcomes (Astin 1993; Johnstone and Maloney 1998; Pascarella and Terenzini 2005; Toth and Montagna 2002). We also weighted our sample of courses by seats within class size, ensuring that very small classes did not have an undue influence on our findings. While most classes were small, enrollment was clustered in larger classes, so a sample that considered the number of seats in each course better represented what a majority of students would experience at these institutions. We selected 350 courses from each institution, and faculty had the option to consent to include each course that was selected. Thirty-four percent of faculty who were invited to participate for one or more courses agreed to participate (a response rate similar to those of national faculty surveys). We were not able to observe all consented courses because of scheduling and observer constraints; among consented courses, we purposively selected courses that contributed to the representativeness of our sample along our strata. Observed courses were representative in terms of course mode (online vs. on site), but courses taught by tenure-line faculty were slightly over-represented. As expected based on our sampling scheme, larger courses were over-represented due to weighting by enrollment. We compare the observed courses with the population of courses at the nine institutions in Table 2.
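
One way to picture seat-weighted sampling within strata is sketched below in Python. This is an illustration under stated assumptions, not the sampling procedure used in this study: the field names (`id`, `stratum`, `seats`), the function name `sample_by_seats`, the per-stratum quota, and the toy course list are all hypothetical, and the weighted draw without replacement uses the Efraimidis-Spirakis key method as a convenient stand-in.

```python
import random
from collections import defaultdict


def sample_by_seats(courses, per_stratum, seed=0):
    """Draw up to `per_stratum` courses from each stratum, without
    replacement, favoring courses with more enrolled seats.

    courses: list of dicts with hypothetical keys "id", "stratum", "seats".
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for course in courses:
        by_stratum[course["stratum"]].append(course)

    sample = []
    for stratum in sorted(by_stratum):
        # Efraimidis-Spirakis: key = u ** (1 / weight); keep the largest keys.
        keyed = [
            (rng.random() ** (1.0 / c["seats"]), c["id"])
            for c in by_stratum[stratum]
            if c["seats"] > 0
        ]
        keyed.sort(reverse=True)
        sample.extend(course_id for _, course_id in keyed[:per_stratum])
    return sample


# Hypothetical course list; stratum labels combine faculty category and size.
courses = [
    {"id": "A", "stratum": "tenure-line/large", "seats": 200},
    {"id": "B", "stratum": "tenure-line/large", "seats": 150},
    {"id": "C", "stratum": "adjunct/small", "seats": 12},
    {"id": "D", "stratum": "adjunct/small", "seats": 18},
    {"id": "E", "stratum": "adjunct/small", "seats": 25},
]
print(sample_by_seats(courses, per_stratum=2))
```

Under this reading, a course with many seats is more likely to be drawn than a small seminar in the same stratum, which is why larger courses end up over-represented relative to a simple count of courses.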

Table 2 Representativeness of observed courses compared with the population of courses at selected institutions

Procedures

Site teams of seven to ten extensively trained observers visited each institution for 1 week during mid-semester. Observers participated in an observation training of approximately 30 hours. Observers were required to pass a certification process to demonstrate their knowledge of the conceptual frameworks and the observation procedures. They also were required to pass an inter-rater test to demonstrate reliability with ratings from the principal investigator. For each course, two observers rated the entire duration of one class period (with some exceptions due to scheduling conflicts where one rater observed).

Data Sources

We created rubrics for the class observations to measure active learning, passive learning, and the three facets of Neumann’s (2014) cognitively responsive teaching framework. The development of the rubrics was assisted by both content experts (in teaching quality) and methodological experts (in survey and rubric design), who tested the rubrics for content and response process validity. Active learning and cognitively responsive teaching were operationalized for this study as the presence or absence of seven course behaviors that mapped onto the theoretical framework. Specifically, we considered three broad categories of active learning techniques: class activities, class discussion, and student questions. Class discussion was observed in a class session as a free-flowing back and forth conversation among students or between students and the professor. We distinguished this from class activities that were more structured; for example, a faculty member could have given instructions for students to complete a task or set of related tasks during the class. A final form of active learning we included was when students asked questions of the professor during the class. Arguably, this is the least “active” technique, but it still indicated student engagement and some degree of interactivity during the class session. In addition to the three active learning practices, we included lecture in our model. The lecture was observed in a class session as the instructor presenting course material to the students without interaction among students or between the instructor and students. This definition of lecture is similar to the one used in the Freeman et al. (2014) study of active learning and lecture practices: “continuous exposition by the teacher.” Finally, we included three teaching practices that mirror the three claims of good teaching in Neumann’s framework on cognitive responsiveness: the orchestration of in-depth subject-matter ideas, understanding students’ prior knowledge, and supporting students by bridging their prior knowledge to the new core ideas of the course. In order to demonstrate how the rubric on Neumann’s (2014) framework of cognitively responsive teaching might manifest in specific course contexts, we provide an example of rubric tuning to a fictitious course in research methods in Table 3.

Table 3 Cognitively responsive teaching tuning example, fictitious course in quantitative research methods in psychology

We present the seven behaviors, their operationalization, their mapping onto the theoretical frameworks, and their descriptive statistics in Table 4. We investigated patterns of courses across these seven dichotomous behaviors.

Table 4 Operationalization and descriptive statistics for seven course behaviors

Inter-rater Reliability

To calculate the inter-rater reliability of the observation data, we used a one-way, absolute, average-measure, mixed-effects intra-class correlation (ICC) calculation, which is appropriate for the current study for four reasons: the observation categories were ordinal; the classes were random but the coders were fixed (i.e., we are not generalizing to a larger pool of raters outside the study); we used more than one rater per class; and we had an interest in the consistency of the absolute value of the ratings (Hallgren 2012). For the observation data, the ICC across all items was 0.705, with ICCs of sub-scales ranging from 0.664 to 0.787 (Cicchetti’s 1994 cutoff values: 0.60–0.74 good; ≥0.75 excellent).
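
As a rough illustration of this kind of calculation, the following Python sketch computes a one-way, average-measures ICC from a courses-by-raters table and labels the result with Cicchetti's (1994) bands. It is not the computation performed for this study: the function names (`icc_one_way_average`, `cicchetti_label`) and the toy ratings are hypothetical, the formula is the standard one-way ANOVA form often written ICC(1,k), and the "fair" and "poor" bands below 0.60 come from Cicchetti (1994) rather than from the text above.

```python
def icc_one_way_average(ratings):
    """One-way, average-measures ICC, often written ICC(1,k).

    ratings: list of rows, one per observed course, each holding the
    scores given by the k raters (every row must have the same k >= 2).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 courses, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-course mean square from the one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-course mean square (rater disagreement about the same course).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    if ms_between == 0:
        raise ValueError("courses do not differ; ICC is undefined")
    return (ms_between - ms_within) / ms_between


def cicchetti_label(icc):
    """Map an ICC to Cicchetti's (1994) qualitative bands."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


# Toy data (hypothetical): three courses, two raters who differ by one point.
toy = [[1, 2], [2, 3], [3, 4]]
print(icc_one_way_average(toy), cicchetti_label(0.705))
```

The average-measures form is used here because the reported quantity concerns the reliability of the mean of the raters' scores rather than of a single rater.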

Disciplinary Considerations

Given that we studied college teaching in the context of disciplinary core ideas, we placed observers, as much as possible, in courses that matched their disciplinary backgrounds, so that observers would be able to understand core concepts and how the frameworks apply within each discipline. We also conducted post hoc analyses to determine the extent to which disciplinary expertise matters in the ratings. We investigated whether the inter-rater reliability would be greater in courses where both observers had subject matter expertise compared with courses where only one observer had substantive subject matter expertise. We found that although pairings of two expert observers had slightly higher inter-rater reliability than pairings of one expert observer with one non-expert observer (ICC = 0.737 and 0.715, respectively), both pairings had good inter-rater reliability.

Analyses

While the literature on teaching practices has adopted a variable-centered approach, we opted for latent class analysis (LCA) instead. Variable-centered approaches such as factor analysis assume that the relation among variables is the by-product of a homogeneous population. The end result is a set of scales quantifying the intensity of each variable. This homogeneity condition is not tenable when dealing with discrete or categorical events such as the presence or absence of teaching behaviors in a particular course setting (e.g., Masyn and Nylund-Gibson 2012; Wang and Wang 2012). Instead, LCA assumes that the heterogeneity of observed patterns of behaviors may be the product of subpopulations or latent classes (Geiser 2012; Masyn 2013; Wang and Wang 2012). In other words, LCA is a statistical method for finding subtypes of related cases (latent classes) according to their observed values on a set of categorical or nominal indicators in cross-sectional data (Geiser 2012; Masyn and Nylund-Gibson 2012). While cluster analysis has traditionally been used to discover subpopulations of individuals, it has several limitations. Above all, cluster analysis is not an inferential technique (Wang and Wang 2012): the number of clusters is determined by rules of thumb and examination of graphs, which introduce a great deal of subjectivity on the part of the analyst. By contrast, LCA is an inferential method that allows the analyst to compare alternative models of latent classes (Geiser 2012; Masyn 2013; Wang and Wang 2012).
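For reference, the standard latent class model for dichotomous indicators can be written as below. The notation is illustrative and not taken from the study: \(y_j \in \{0, 1\}\) records whether behavior \(j\) was observed in a course, \(\pi_c\) is the proportion of courses in latent class \(c\), and \(\rho_{jc}\) is the probability that a course in class \(c\) enacts behavior \(j\). The product form reflects the usual assumption that behaviors are independent within a class.

\[
P(Y_1 = y_1, \ldots, Y_J = y_J) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_j} \left(1 - \rho_{jc}\right)^{1 - y_j}, \qquad \sum_{c=1}^{C} \pi_c = 1
\]

In the present setting \(J = 7\), and a model with \(C\) classes has \(7C + (C - 1)\) free parameters.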

Latent class analysis requires meeting three conditions: having a relatively small number of behaviors, having a sufficient sample size, and having data that are missing completely at random (MCAR) (Masyn and Nylund-Gibson 2012). The sample of 587 courses is sufficiently large to analyze classes resulting from a hypothetical set of 49 combinations associated with our seven behaviors. Moreover, the condition that missing data on each of the 49 combinations occur completely at random was also met (\(\chi_{251}^{2} = 93.5\), p = 1.0). We ran our LCA in Mplus statistical software (version 7.3).

Choosing the number of latent classes of courses that represent the behaviors under examination follows a step-wise process that combines statistical model-fit indicators with model usefulness indicators based on classification quality and theoretical underpinnings related to the substantive interpretability of the classes (Masyn 2013; Wang and Wang 2012). We started with a single-class model, testing the hypothesis of homogeneity of teaching classes, and then increased the number of classes by one each time to examine an alternative hypothesis of heterogeneity of teaching practices. To determine model fit, we used a combination of indicators as recommended by the LCA literature. These included the likelihood ratio chi-squared (\(\chi_{\text{LR}}^{2}\)) statistic, BIC, BIC-adjusted, AIC, and the adjusted Lo–Mendell–Rubin (\(\chi_{\text{LMR-LRT}}^{2}\)) statistic (see Footnote 1). To determine model usefulness, we examined the entropy (E) statistic and the classification quality as determined by the average latent class probability for the most likely latent class membership by latent class. We also studied conditional probabilities of behaviors in each class to determine whether the classes were distinct and homogeneous in terms of course behaviors. Besides statistical indicators, we brought interpretability considerations to aid in the selection of the final model, as suggested in the literature (e.g., Geiser 2012; Wang and Wang 2012).
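The entropy statistic mentioned above can be sketched as follows. This is an illustration rather than the study's code: it assumes the commonly used relative entropy, which rescales the total classification uncertainty in the posterior class probabilities to the 0–1 range, and the function name and example matrices are hypothetical.

```python
import numpy as np

def relative_entropy(posteriors):
    """Relative entropy of a classification (1 = perfectly crisp).

    posteriors: 2-D array-like, one row per course, one column per
    latent class, with each row summing to 1.
    """
    p = np.asarray(posteriors, dtype=float)
    n, k = p.shape
    # Replace zeros by 1 inside the log so that 0 * log(0) counts as 0
    safe = np.where(p > 0, p, 1.0)
    uncertainty = -np.sum(p * np.log(safe))
    return 1.0 - uncertainty / (n * np.log(k))

# Hypothetical posteriors (not study data)
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vague = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
entropy_crisp = relative_entropy(crisp)
entropy_vague = relative_entropy(vague)
```

A value near 1 means that nearly every course is assigned to a single class with high probability; a value near 0 means that the posteriors are close to uniform.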

Results

Table 5 reports the results of six alternative cluster models of seven teaching behaviors. As we added classes to the model one at a time, we examined the absolute and relative fit indices and considered whether each model significantly improved the fit, as evidenced by the adjusted differences in \(\chi_{\text{LMR-LRT}}^{2}\) (see last column in Table 5). We also brought to bear our knowledge of the literature in guiding our model selection.

Table 5 Fit and modification fit indices for alternative cluster models of seven teaching behaviors

The 1-class model, corresponding to the hypothesis that there is no heterogeneity in teaching behaviors across the 587 courses, was rejected. The single-class model has the lowest log-likelihood (−2364) and the largest BIC (4772), BIC-adjusted (4749), and AIC (4741) among all six models tested, suggesting that this model fits the data worse than all other models. In other words, there is heterogeneity in the way courses cluster together in our sample. While the two-class and three-class models were improvements upon the one-class model, as indicated by lower BIC, BIC-adjusted, and AIC, they did not fit the data as well as the four- and five-class models. We examined the fit of the four- and five-class models closely, given their similar fit values (see Table 5). On close inspection, we found support for a latent class model consisting of five classes. The likelihood ratio statistic for the five-class model is non-significant (\(\chi_{\text{LR}}^{2} = 96.9\), p > 0.05), suggesting that it adequately fits our data. The five-class model also has the lowest AIC value (4241), and its entropy value of 0.821 is above the 0.80 threshold, meeting the condition for a reliable model (Wang and Wang 2012). While the four-class model has lower BIC values and a higher entropy value than the five-class model, the BIC-adjusted value for the five-class model represents only a slight worsening of fit relative to the four-class model. Moreover, the adjusted \(\chi_{\text{LMR-LRT}}^{2}\) test comparing the four-class model with the five-class model (see last column in Table 5) was statistically significant, indicating that adding the fifth class indeed improves the model fit. In addition to meeting the fit criterion, each component of the five-class model has a high degree of internal consistency, as shown in Table 6. Finally, the classes identified by the five-class model, as shown below, are interpretable. Collectively, this evidence lends strong support for the five-class model.
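As a rough check on how these indices relate to one another, the sketch below recomputes them for the one-class model. It is illustrative only: it reads the reported value of −2364 as a log-likelihood, assumes the standard definitions of AIC, BIC, and the sample-size-adjusted BIC, and counts free parameters for seven dichotomous behaviors. The function names are hypothetical.

```python
import math

def lca_free_parameters(n_classes, n_items):
    """Free parameters in an LCA with dichotomous items:
    one probability per item per class, plus class proportions."""
    return n_classes * n_items + (n_classes - 1)

def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC, sample-size-adjusted BIC)."""
    deviance = -2.0 * log_lik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    # Adjusted BIC replaces n with (n + 2) / 24
    abic = deviance + n_params * math.log((n_obs + 2) / 24)
    return aic, bic, abic

# One-class model: reported value -2364, 587 courses, 7 behaviors
k1 = lca_free_parameters(1, 7)
aic, bic, abic = information_criteria(-2364, k1, 587)
```

Under these assumptions the recomputed values fall within about two points of the reported 4741, 4772, and 4749, a gap consistent with the log-likelihood having been rounded to a whole number.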

Table 6 Average latent class probabilities for the most likely latent class membership (row) by latent class (column)

Table 7 presents the results of the LCA, revealing five distinct clusters. The rows report, for each of the seven behaviors, the conditional probability that a course in each latent class enacted that behavior. Following Masyn’s (2013) recommendation, we judged whether a behavior characterized a particular class based on 0.70 and 0.30 thresholds (we also highlight probabilities that are close to these cut-offs).
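The threshold rule can be sketched as a small helper. This is an illustration, not the study's procedure: the treatment of values exactly at a cut-off is an assumption, and the probabilities below are hypothetical rather than values from Table 7.

```python
def characterize(prob, high=0.70, low=0.30):
    """Label a conditional probability against the two cut-offs."""
    if prob > high:
        return "likely"
    if prob < low:
        return "unlikely"
    return "ambiguous"

# Hypothetical conditional probabilities for one latent class
example_class = {
    "lecture": 0.95,
    "class activities": 0.12,
    "class discussion": 0.55,
}
labels = {behavior: characterize(p) for behavior, p in example_class.items()}
```

Behaviors labeled "ambiguous" are the ones that do not clearly distinguish a class and therefore call for closer inspection.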

Table 7 Unconditional and conditional probabilities for the 5-class LCA model (N = 587)

Class 1, labeled ‘Comprehensive’, represents a category of courses that was likely to enact a variety of teaching practices, including all seven practices investigated in this study. This class was the largest among the five classes, representing 55 % of the sample. We labeled this class as ‘Comprehensive’ because the courses were likely to use teaching practices aligned with both frameworks we examined in this study (active learning and cognitively responsive teaching), as well as lecture. Notably, contrary to the literature that pits active learning against lecture (Freeman et al. 2014), this latent class enacted both lecture and active strategies. Additionally, instructors of the ‘Comprehensive’ courses delved deeply into subject-matter knowledge and students’ prior knowledge.

Class 2, labeled ‘Traditional Lecture,’ was characterized chiefly by lecturing and subject matter knowledge. This class represented 19.3 % of the courses in our sample. In particular, these courses used traditional and teacher-centered modes of instruction, and did not enact active learning or cognitively responsive teaching practices. The internal consistency of this class, while still in the acceptable range, was lower than that of the other classes (0.736), and the class had some overlap with class 3, ‘Active Lecture’ (Table 6). Given the overlap between the ‘Traditional Lecture’ and the ‘Active Lecture’ classes, there may have been certain courses that were emerging from traditional passive pedagogies and moving in the direction of becoming more active in their teaching practices.

Class 3, labeled ‘Active Lecture’, included a combination of lecture and active learning practices. Representing 19.1 % of courses, this class enacted traditional lecture and also incorporated class activities and student questions. Like the ‘Traditional Lecture’ group of courses, the ‘Active Lecture’ courses demonstrated teaching with in-depth subject-matter knowledge. However, these courses were not characterized by teaching practices that delved into students’ prior knowledge, according to Neumann’s (2014) framework.

Class 4, labeled ‘Integrative Discussion’, is the smallest among the five classes, accounting for only about 2 % of the courses. This class was selective in its practices. For example, ‘Integrative Discussion’ courses enacted active learning by using class discussion, but not activities or student questions. These courses also focused on subject-matter knowledge and on supporting students in making connections between their prior knowledge and the new course ideas. In this way, these courses may have used discussion to help students integrate their prior knowledge with new course ideas.

Class 5 represented only about 5 % of the courses examined. We called this class ‘Active Only’ because it enacted active learning by getting students engaged in class activities, but it was not organized around either in-depth subject matter expertise or the students’ prior knowledge. Courses in this class were unlikely to include class discussions, lecture, student questions, or any of Neumann’s (2014) three cognitively responsive teaching practices (subject matter knowledge, students’ prior knowledge, or supporting changing views). Out of the seven teaching practices, this class was only characterized by class activities, leaving us to question the focus and direction of these activities.

We closely examined the evidence for the final two latent classes given that they included only a small proportion of the courses in this study. While they are small, the ‘Integrative Discussion’ and ‘Active Only’ latent classes had very high internal consistency (0.871, 0.973 respectively). Additionally, these two latent classes of courses were consistent in both the 4-class and 5-class models that we tested, lending support for them as distinct latent classes. Finally, these classes represented meaningful categories of courses given the theory on active learning and cognitively responsive teaching, which we consider more fully in the discussion section. While we believe that the evidence from the model supports the inclusion of these smaller latent classes, future studies should attempt to replicate these results with larger samples.

Limitations

This study has certain noteworthy limitations. First, this study investigates teaching practices across disciplinary contexts. The classes of courses might change if the study were conducted within one particular discipline, and we see this as an important area of further research. However, given that this is an initial examination of a broad number of college classrooms, this study provides a view across classes of courses and the practices they enact. The way this study considers college course practices across disciplinary contexts is similar to other multi-institutional studies of college teaching (e.g., Carini et al. 2006; Umbach and Wawrzynski 2005). Second, the study covers 587 courses, but we visited only one class session per course, mid-semester. This is a limitation given that teaching unfolds across a semester and the ordering of course ideas is germane to pedagogical content knowledge (Major and Palmer 2006; Shulman 2004b). Yet the study does reveal a snapshot in time of the practices across nine institutions, and perhaps, in the aggregate, this gives a sense of the “mid-semester” practices taking place at these institutions, even if individual practices may vary from course to course. Another possible limitation is the reactivity involved in observation (participants may behave differently while observed; Anderson and Burns 1989; Jacob et al. 1987). We attempted to limit these biases by instructing observers to be as unobtrusive as possible and by reassuring participants about the confidentiality of the data. Finally, this study includes a sample of only nine institutions and is not necessarily generalizable to the broader population of higher education institutions. We provide information about the institutions in our sample in Table 1 to assist in interpreting whether the results transfer to other higher education settings. For these reasons, we consider our study to be an exploratory examination of the complexity of teaching practices that occurs among courses.

Discussion

This study used latent class analysis (LCA) to determine whether there were patterns of courses mid-semester based on teaching behaviors from active learning theory and cognitively responsive teaching theory. In prior literature, studies examined which courses enacted individual practices, such as active learning techniques (Cabrera et al. 2001; Freeman et al. 2014; Hora and Ferrare 2014). However, by using LCA, we were able to learn whether groups of courses enacted sets of teaching practices (e.g., active learning in combination with lecture and cognitive responsiveness). Our study found that five groups of courses enacted specific patterns of educational practices, with strong consistency within each group. In essence, this means that the courses within the study are heterogeneous in that there are five groups of courses, but within each group, the courses act similarly. These results characterize a strong LCA model (Wang and Wang 2012).

We see two main contributions of this study to the literature on college teaching and learning. First, understanding groups of courses and the teaching practices that characterize them gives insight into how teachers are, or are not, enacting both traditional and newer college teaching frameworks, and yields a multidimensional perspective on teaching practices that a variable centric approach cannot depict (Wang and Wang 2012). Second, surprisingly few multi-institutional studies focus at the course level when examining college teaching. Multi-institutional studies of college teaching have mainly surveyed students about their overall course experiences (Kuh 2001; Reason et al. 2006) or faculty about their overall teaching practices (DeAngelo et al. 2009; Feldman 1989a, b; Umbach and Wawrzynski 2005). This study observed almost 600 college classes to view the educational practices as they unfold within courses.

Returning to the theoretical contributions of this study, within the literature base on college teaching, we see a continued evolution of understanding effective practices. Although the literature base is broad and non-linear, we see a general movement from didactic teacher-centered paradigms (i.e., lecture) to active learning and student-centered paradigms to responsive paradigms (i.e., cognitively, culturally, or context responsive). We see artifacts of each of these movements in the classes of courses found in our study. We also find that the classes of courses in this study depart from the traditional ways of describing these frameworks by uncovering the complex interplay across these frameworks in the way they manifest in the classroom.

The literature on college teaching has largely critiqued the traditional paradigm, which focuses on the instructor as the purveyor of expertise to students (as receivers) through a recitation or lecture (McKeachie and Kulik 1975). Our study revealed that one class of courses, the ‘Traditional Lecture’ class, still enacted these more traditional teaching practices, based on a teacher- and expertise-centered paradigm. These courses, at least in the one class session observed, appear to be unsuccessful in enacting active learning techniques, as they did not engage students in class activities, class discussion, or student questions. These courses also did not appear to take into account the cognition of the students by incorporating teaching behaviors that use students’ prior knowledge or connect that prior knowledge to the core ideas of the course. It is possible that these courses are “stuck” in older paradigms of teaching; they have not taken up the call to active learning and engagement, nor have they delved into the minds of students through Neumann’s (2014) framework. Another possibility, given that we observed one class session mid-semester, is that these courses used this particular class session for lecture and may enact active learning and cognitively responsive teaching in subsequent class sessions. Future research that investigates these ‘Traditional Lecture’ courses across the semester would provide additional insight into how these practices unfold over time.

The ‘Traditional Lecture’ class represented about 19 % of the courses in this study. In this respect, our findings run contrary to earlier studies that found that lecture was still a predominant teaching strategy in college teaching even after considerable evidence in favor of active learning practices had accumulated (Freeman et al. 2014; Hora and Ferrare 2013; Koljatic and Kuh 2001; Lammers and Murphy 2002). This discrepancy between our findings and the literature base can be traced back to the different analytical methods used in examining teaching practices. If a variable-centric rather than latent class approach had been used, we would have reported that 85.5 % of courses in our study included a lecture, a finding largely consistent with prior literature using variable-centered methods for studying teaching behaviors. By using LCA, we were able to detect patterns of class practices that certain courses enact. The ‘Traditional Lecture’ group was the only class of courses to enact lecture with subject-matter expertise, but no other active learning or cognitively responsive practices.

Moving beyond traditional lecture, for decades, evidence accumulated that involving students in their education through active learning and student-centered practices was a more effective college teaching practice and contributed significantly to student learning gains (Cabrera et al. 2001; Colbeck et al. 2001) when compared to traditional lecture and teacher-centered approaches. More recently, Freeman et al. (2014) conducted a meta-analysis of college teaching studies and concluded that lecture was a “pariah” in college teaching, and should be left behind in favor of “active” practices. Our study reveals good news, in that four of the five classes of courses in our study were likely to include active learning techniques. However, LCA moves beyond this approach by examining the sets of practices that are enacted together with the active learning practices. This approach, perhaps, provides a more dynamic picture of the college classroom than the one painted by variable-centered methods.

For example, the ‘Active Lecture’ group uses lecture, student questions, and class activities, and so is ‘active’ in engaging students in their learning process, but it also utilizes the traditional lecture approach. Yet this group of courses does not surface students’ prior knowledge or connect that knowledge to the new core ideas of the course. In essence, the ‘Active Lecture’ courses have integrated the active learning framework into their repertoire, but do not yet enact cognitively responsive teaching. Similarly, the ‘Active Only’ class of courses was likely to enact class activities, but unlikely to portray in-depth subject matter ideas or respond to students’ cognitions of the core ideas. We see these classes as potential evidence that the emphasis on singular categories (such as active learning/lecturing; Hora and Ferrare 2014) may be causing interactive education to take place without the associated meaning-making. Additionally, research using a variable-centric approach cannot see heterogeneous groups of courses utilizing sets of practices. It therefore obscures which active learning courses or lecture courses also enact associated meaning-making using cognitively responsive paradigms.

Perhaps the most promising finding from our study was the ‘Comprehensive’ class, representing the largest proportion of courses in our study (55 %). The courses in the ‘Comprehensive’ class enacted active learning through class activities, class discussion, and student questions. While the findings of this study support active learning in the classroom, they also move beyond these binary categories. Contrary to prior literature that casts active learning strategies as favorable and lecture as problematic (Freeman et al. 2014), we find that the ‘Comprehensive’ courses use both lecturing and active learning practices. By using several modes of delivery (lecture, activity, discussion), these courses are also able to enact Neumann’s cognitively responsive framework. Out of all five classes of courses, the ‘Comprehensive’ courses, which use both active learning and lecture practices, were the only ones that also enacted all three elements of Neumann’s (2014) framework: in-depth subject matter expertise, understanding students’ prior knowledge, and bridging students’ prior knowledge with new core ideas of the course. Neumann (2014) contends, based on a framework from the learning sciences, that these three elements are necessary for learning. Lecturing may be an important ingredient of pedagogical content knowledge and of connecting core ideas to students’ own understandings, under certain specific course contexts.

The final class in our analysis, ‘Integrative Discussion’, represents only 2 % of the courses in our sample. Yet, we see this class of courses as important in understanding how Neumann’s framework may be enacted in courses. These courses held class discussions and the instructor described in-depth subject matter ideas and helped bridge students’ prior knowledge to these core ideas. It may be that this particular class session mid-semester was devoted to discussing, in-depth, the integration of the course ideas to students’ prior knowledge. Perhaps class sessions earlier in the semester within the same course may have used other pedagogical techniques or focused on understanding the students’ prior knowledge of the core ideas. Perhaps the process of integrating core subject matter ideas with students’ prior knowledge requires in-depth discussion among students and instructor (Neumann 2014).

Our findings seem to support Hora and Ferrare’s (2014) conclusion regarding the policy approach to teaching: “In attempting to categorize teaching into simple dichotomous groups (e.g., lecturing vs. interactivity), policy makers are provided with an inaccurate and coarsely grained perspective of teaching that does not reflect the realities of classroom practice” (p. 36). By contrast, our LCA was able to uncover how teaching practices (lecture, active learning, cognitive responsiveness) work together or separately in classes of courses. Simply ‘becoming active’ without the associated meaning-making or framing from students’ lived experiences or the core ideas of the course may not be enough to produce learning. Likewise, there may be moments in courses where focusing on certain kinds of modes or pedagogical approaches, such as seen in the ‘Integrative Discussion’ class of courses, may be warranted. In this way, the duality of a theory that juxtaposes active against passive learning does not seem to fit the data in this study, but this warrants future research. While this was an important finding given that the literature base often draws upon this duality, we wonder whether more nuanced understandings of what “active learning” might mean could be more meaningful in terms of representing what college teachers practice in their courses. For example, had the rubrics more specifically examined studio-based or problem-based learning (Kwan 2009; Little and Cardenas 2001) instead of a broader category of “class activities,” would the resulting course clusters have been more meaningful?

The findings of this study suggest further research into college classrooms to understand the complex dynamics that take shape between students, faculty, course content, and context (Campbell in press; Neumann 2014). There has been a great deal of attention on the high-impact practices touted by the Association of American Colleges and Universities (AAC&U), such as culminating experiences, service learning, and internships (Kuh 2008). In addition to these practices, which often focus on the co-curriculum or a specialized curriculum, our research points to investigating the complex phenomena inside and across college classrooms. Additional studies could further research what the ‘Comprehensive’ courses look like in practice. Further research could consider the timing of these courses in the beginning, middle, and end of the semester and how these sets of practices change over time. Future research could also investigate the necessary conditions for these class memberships: do certain institutional types or course characteristics, such as class size, discipline, or faculty characteristics, breed ‘Comprehensive’ or ‘Active Only’ courses? Perhaps most importantly, do these classes of courses predict certain learning outcomes? Based on the college teaching literature and broader literature from the learning sciences, the individual practices of active learning and of understanding and integrating students’ prior knowledge have been found effective in student learning (Cabrera et al. 2001; Colbeck et al. 2001; Freeman et al. 2014; Neumann 2014). However, this study found that these practices are utilized together in groups of courses, which leads to the question of whether these groups predict outcomes. Finally, for the faculty who seem to be ‘stuck’ in the traditional lecture model, further research could investigate whether there are ways to enhance faculty understanding of these new frameworks and to help faculty make them actionable.