Introduction

The National Science Education Standards (NSES; National Research Council (NRC), 1996), aimed at globally improving scientific literacy, emphasized an inquiry-based instructional approach to science teaching and learning. Building on this foundation, the recent publication of the Next Generation Science Standards (NGSS Lead States, 2013) further emphasized the teaching of science content through the integration of scientific practices and crosscutting concepts. Despite these recommendations, even experienced middle school teachers are implementing few inquiry strategies into their instruction (Capps & Crawford, 2013). To address this issue, a professional development model was developed based on Bandura’s (1986) social cognitive theory of learning and the belief that in order to effectively teach science as inquiry, teachers must have confidence in or self-efficacy beliefs about their own abilities to implement this approach. In addition to self-efficacy, teachers must hold beliefs consistent with inquiry practices, obtain inquiry teaching skills, and have time to practice implementing these skills with students in a supportive environment (Blanchard, Southerland & Granger, 2009; Luft, 2001; Singer, Lotter, Gates, & Feller, 2011). Based on this prior research, the primary research questions for this study were as follows:

  1. How does this professional development model influence teachers’ quality of inquiry instruction as well as their personal self-efficacy and outcome expectancy for teaching science as inquiry?

  2. How does this professional development model add to our knowledge of effective inquiry professional development?

Despite the emphasis on inquiry in reform documents (NRC, 2000) and research on the benefits of inquiry teaching, few secondary science teachers are successfully implementing inquiry (Capps & Crawford, 2013; Roehrig, Kruse & Kern, 2007). To understand this research-to-practice gap, Crawford (2014) described the need for more empirical studies that investigate how professional development (PD) influences teachers’ implementation of inquiry practices. Although the general characteristics of effective PD are known (Garet, Porter, Desimone, Birman & Yoon, 2001), empirical research on the effectiveness of inquiry PD models has been limited (Capps, Crawford & Constas, 2012), and programs have met with only partial success due to complex factors such as teachers’ belief systems, teachers’ lack of understanding of inquiry, and other school-level constraints (Blanchard et al., 2009). This study sought to fill a gap in the inquiry PD literature through an investigation of a model inquiry PD program.

Literature Review

Bandura’s (1986) social cognitive theory, informed by literature on effective PD, guided the design of the PD model which focused on increasing teachers’ inquiry-based instructional practices through an emphasis on the five essential features of inquiry (NRC, 2000).

Inquiry Teaching

Inquiry teaching in science consists of a variety of instructional strategies that teachers use to guide students to understand scientific knowledge and scientific practices (NRC, 2012). Our program defined inquiry teaching using the five essential features of inquiry described in the NSES (NRC, 2000): (1) engaging with “scientifically oriented questions,” (2) “giving priority to evidence in responding to questions,” (3) “formulating explanations from evidence,” (4) “connecting explanations to scientific knowledge,” and (5) “communicating and justifying explanations” (NRC, 2000, p. 29). Engaging students in these activities helps them understand how scientific knowledge is generated and modified (Crawford, 2014; NRC, 2000). In the USA, the Next Generation Science Standards (NGSS Lead States, 2013) explicitly defined eight scientific practices that move teachers away from the often misunderstood term inquiry (Osborne, 2014). Crawford (2014) showed how these five essential features of inquiry align with many of the science practices in the NGSS; however, the NGSS document provides an increased focus on students engaging in modeling and argumentation. Our PD program took place before our states’ adoption of new NGSS-inspired standards, and therefore, the program instructors explicitly taught participants the five essential features of inquiry. However, participants also engaged in argumentation and modeling content experiences to prepare them to transition to new standards. The five essential features were also used as the foundation of the inquiry efficacy belief instrument used in this study (Smolleck, Zembal-Saul & Yoder, 2006).

Theory of Social Learning

According to Bandura (1993), teachers who have a high sense of efficacy about their teaching capabilities create a mastery-oriented environment that is more supportive of developing students’ intrinsic motivation to learn. Supporting research shows that teachers with higher teaching self-efficacy are also more willing to try new instructional techniques and to persevere through difficult tasks (Ross & Bruce, 2007; Tschannen-Moran, Hoy & Hoy, 1998). Ashton (1984) described how teachers with higher self-efficacy have a greater sense of personal accomplishment, hold positive expectations for student achievement, and believe it is their responsibility to alter instruction to increase students’ learning.

Dimensions of Self-Efficacy.

Many researchers have described the constructs of teacher efficacy and developed tools used to measure self-efficacy (Klassen, Tze, Betts & Gordon, 2011; Pajares, 1996; Tschannen-Moran et al., 1998). The majority of researchers believe that a teacher’s efficacy beliefs depend on both the tasks he or she has to perform and the context in which he or she has to perform these tasks (Tschannen-Moran et al., 1998).

Self-efficacy beliefs can be further broken down into two dimensions: personal self-efficacy and outcome expectancy (Bandura, 1986). In the case of teaching inquiry, personal self-efficacy is the individual teacher’s belief that he or she is capable of teaching an inquiry lesson. The second dimension of efficacy, outcome expectancy, is a person’s belief that their performance of a task will have a positive outcome. For example, within an inquiry lesson, outcome expectancy would be the teacher’s beliefs about the effects of their instruction on students’ science achievement. If teachers do not have successful experiences teaching or learning science as inquiry, it is unlikely that they will implement science as inquiry in their classrooms and unlikely that they will believe students will learn through this performance.

Effective Professional Development

Desimone’s (2009) literature review described five “core features of professional development” that have been shown to improve teacher knowledge and skills: a focus on subject matter content; opportunities for teachers to actively engage in learning; “coherence,” or consistency among teachers’ beliefs, the teaching context (e.g. district initiatives), and the goals of the PD; sufficient duration; and teacher collaboration (p. 183). Other research has focused on specific aspects of inquiry PD that support Desimone’s core features. For example, when teachers engaged in 80 or more hours of content-focused inquiry PD, improved inquiry instruction was found (Supovitz & Turner, 2000). Similarly, Penuel, Fishman, Yamaguchi and Gallagher (2007) found support for the core features of coherence and subject matter content but also described the importance of providing context-specific support for teacher planning and implementation of complex inquiry curriculum. Likewise, Grigg, Kelly, Gamoran and Borman (2013) found that teachers implemented only partial inquiry cycles due to a lack of coherence between program and school goals and insufficient implementation support. Related to this, Capps et al. (2012) investigated the inclusion of effective PD features (Garet et al., 2001; Loucks-Horsley, Love, Stiles, Mundry & Hewson, 2003) within 17 inquiry PD programs. As a result of their review, they called for additional research into inquiry PD programs, especially into the components that might be necessary to support teachers in enacting this complex pedagogy. Similarly, Desimone (2009) described PD features, such as a focus on student work, the role of curriculum, and the impact of teacher reflection, that require additional empirical support before they can be considered “core features” of effective PD (p. 186). Our PD model included all five core PD features as well as a unique focus on student work through the practice-teaching and reflection sessions.

Professional Development and Teacher Efficacy for Inquiry Instruction

Effective PD programs have been shown to influence teachers’ beliefs as well as their instructional practices (Lotter, Yow, & Peters, 2014; Pajares, 1996). As Wilson and Berne (1999) state, “learning is hard” (p. 201), and teachers’ beliefs about inquiry (Lotter, Harwood, & Bonner, 2007; Luft, 2001; Wheeler, Bell, Whitworth & Maeng, 2015) often further interfere with their use of this learning strategy. Many studies in science education have shown the influence of PD on teachers’ general self-efficacy to teach science using the Science Teaching Efficacy Belief Instrument (STEBI, Riggs & Enochs, 1990; e.g. Hechter, 2011; Khourey-Bowers & Simonis, 2004; Lumpe, Czerniak, Haney & Beltyukova, 2012). However, only a few studies have shown a connection between teachers’ self-efficacy and their ability to teach science as inquiry. For example, Haney, Lumpe and Czerniak (2002) found that five out of six teachers in their PD program with high self-efficacy were scored as effective teachers using a protocol that included inquiry instruction. In their 3-year study, Lakshmanan, Heath, Perlmutter and Elder (2011) found, through the use of the STEBI, an increase in the participating teachers’ personal science teaching efficacy, but not in their science teaching outcome expectancy, after the teachers engaged in a PD program that focused on increasing their content knowledge and use of inquiry instruction. They also found that teachers with higher initial science teaching outcome expectancies enacted higher quality inquiry-based lessons. Posnanski (2002), who reported an increase in teachers’ personal self-efficacy beliefs but not in their outcome expectancy beliefs after a PD program, stated that “The implementation of innovative, new or standards-based teaching practices may precede the capacity of teachers to focus on student learning” (p. 212). Alternatively, the structure of a PD program, with practice-teaching and reflection opportunities, may result in an earlier focus on student learning issues and increases in teachers’ outcome expectancy beliefs.

Only one other study has investigated teachers’ efficacy beliefs around teaching science as inquiry with an instrument designed to measure the specific construct of inquiry science teaching (Smolleck & Mongan, 2011). This is a hole in the literature given that efficacy beliefs have been shown to be task specific (Tschannen-Moran et al., 1998), and thus research showing changes in teachers’ efficacy to teach science through general science teaching efficacy instruments may not necessarily equate to teachers’ efficacy for teaching science as inquiry.

Professional Development Model

The PD model began with a 2-week Summer Institute (Institute) and continued with three 4-h follow-up sessions held on Saturdays during the academic year. The Institute was divided into four main segments over the 2-week period (7 h a day for 10 days) that included whole-group inquiry instruction, small group content instruction, practice-teaching with middle school students, and small group reflection sessions. The middle school students were participants in a summer science enrichment program recruited through local school flyers.

Inquiry Pedagogy Sessions

During the Institute, the teachers spent between 30 and 60 min engaged in a morning session in which they participated in inquiry activities and pedagogy discussions as students. For example, the teachers participated in a guided laboratory in which they tried to determine how sound traveled through a toy phone (two cups with yarn connecting). The teachers were asked to make a few testable hypotheses (e.g. If I make the string longer, then sound will take longer to reach my partner), create an initial explanation of the phenomena and then test their predictions using additional materials such as different cups, different attachments, and metal cans (Cottam, 2006). After testing their variables and recording their observations, they were led through a whole-group discussion to collaboratively formulate a final explanation of the phenomenon. Following each activity, there was an explicit discussion of how the activity related to the five essential features of inquiry (NRC, 2000).

Content Instruction

After the whole-group pedagogy session, teachers were divided into three content area teams (cohort 1: energy, genetics, and astronomy; cohort 2: energy, genetics, geology) by grade level. In each group, a University content instructor (Ph.D. in science/science education) as well as a pedagogical instructor (middle school teacher who had completed this PD before) led the teachers through inquiry lessons. The science content was taught using locally developed middle school project-based curriculum units. For example, the sixth-grade Energy unit used a driving question of “How can I build a house that will keep me cool in our state?” to help the teachers investigate the state standards around energy and electricity. As part of this unit, teachers built and tested “coolers” using different insulating or conducting materials and analyzed temperature data to determine which materials would keep food warm for the longest time. During these sessions, the teachers learned content by experiencing a middle school curriculum as students as well as participating in additional experiences to advance their content knowledge.

Practice Teaching

During the first 2 days of the Institute, the pedagogical instructors taught lessons from the curriculum to the middle school students taking part in the summer enrichment program (two 90-min classes). After observing this model teaching for 2 days, teacher teams (three to five teachers) practice-taught lessons to the students that were adapted from the curriculum used during the content sessions. This practice teaching allowed participants to immediately enact their newly learned inquiry strategies and also to work through and adapt the new curriculum in a low-stakes environment. Each team of teachers taught six 90-min classes to small groups of students (8–15 students).

Reflection on Teaching

The teachers collaboratively reflected on both the content instruction and their own practice-teaching sessions daily. All practice sessions were observed by at least one project staff member (model teacher, science instructor, education faculty). The observers then met with teachers whom they observed to discuss their strengths as well as “missed opportunities” with inquiry teaching and how they could make adjustments before the next teaching session. The teachers were encouraged to try new strategies and step out of their instructional comfort zones during the following practice-teaching sessions.

Academic Year Follow-up

The teacher participants were asked to enact the inquiry units that they learned during the Institute with their own students during the following academic year. The PD program continued during the academic year with three 4-h Saturday sessions. These workshops began with an hour-long whole-group pedagogy session, and then participants engaged in grade-specific inquiry lessons to increase their content knowledge. In these small content groups, the teachers also reflected on their current use of inquiry practices in their classrooms. Outside of the Saturday workshops, in-classroom teacher support included occasional (if requested by teachers) project staff observations of teacher instruction and access to Institute materials.

Method

Participants

This study investigated two cohorts of teachers; each cohort participated in the program for only 1 year. The participants included 102 middle-level science teachers in a Southeastern state enrolled in the year-long PD program on the use of inquiry-based methods. Of the 102 participants, 52 were part of cohort 1 and 50 belonged to cohort 2. These participants attended a 2-week intensive Institute in the summer and continued with follow-up sessions and support during the subsequent academic year. Middle school science teachers who completed an online application were selected to participate on a first-come, first-served basis, with priority given to participants from partnering high-needs school districts. The sample was primarily female (83.7%), had a median of 10 years of teaching experience (range: 1 to 34 years), and included approximately equal representation of the three grade level groups (29.7% sixth grade, 35.6% seventh grade, and 34.7% eighth grade). Sixty-five of the participating teachers identified themselves as Caucasian (63.7%) and 37 identified themselves as African American or Black (36.3%).

Study Procedures

Efficacy Beliefs.

The Teaching Science as Inquiry (TSI; Dira-Smolleck, 2004; Smolleck & Yoder, 2008) instrument was used to measure the participants’ self-efficacy for teaching science as inquiry at three time points: at the beginning and end of the 2-week Institute and again at the end of the academic year. The TSI measures self-efficacy beliefs in regard to the teaching of science as inquiry and includes 69 Likert-type items. Smolleck et al. (2006) showed the TSI to have strong evidence of construct and content validity and internal scale reliability with coefficient alpha values ranging from 0.60 to 0.78.

The sample in the current study comprised in-service middle school teachers, whereas Smolleck et al. (2006) gathered validity evidence for the TSI with pre-service elementary teachers. The in-service teacher participants were informed that the questions were phrased for pre-service teachers and were asked to consider their answers with respect to their own experiences as practicing teachers. The TSI instrument authors granted permission to use the items exactly as written for this research.

The TSI addresses Bandura’s (1986) two dimensions of efficacy, personal self-efficacy, and outcome expectancy, to effectively teach science as inquiry. The TSI includes 34 items that were developed as indicators of teachers’ personal self-efficacy to implement inquiry strategies in the classroom (Smolleck & Yoder, 2008). The other 35 items on the TSI were developed as indicators of outcome expectancy for students in an inquiry-based classroom (Smolleck & Yoder, 2008). The TSI items were also cross-classified by the two types of efficacy as reflecting one of five essential features of inquiry as described by NSES (NRC, 2000) that apply across all grade levels in Supplemental Table S1. For example, “I have the necessary skills to determine the best manner through which children can obtain scientific evidence” was categorized as an item that measured teacher self-efficacy related to essential feature 3 (Dira-Smolleck, 2004, p. 2). Similarly, “My students will make use of data in order to develop explanations as a result of teacher guidance” was categorized as an item that measured outcome expectancy related to essential feature 3 (Dira-Smolleck, 2004, p. 3).

TSI Instrument Analysis.

Each of the items used a five-point Likert-type response scale that ranged from Strongly Disagree to Strongly Agree and used a midpoint of Uncertain. A total score for each of the five essential features of inquiry within each type of efficacy was computed. The five-point scale was coded as 1 = Strongly Disagree, 2 = Disagree, 3 = Uncertain, 4 = Agree, and 5 = Strongly Agree in order to compute these scores. Descriptive statistics were calculated for each score. A one-way repeated-measures ANOVA was run on each essential feature score to investigate whether the response changes were statistically significant from pre- to post-Institute, from post-Institute to follow-up, and from pre-Institute to follow-up (total change). A Bonferroni correction was used to control the inflated type I error rate arising from multiple comparisons within each analysis. Assumptions of normality of the distributions of pre-Institute scores and post-Institute scores and sphericity (i.e. common variance between the differences in scores from pre- to post-Institute) for each of the five essential features were tested and deemed tenable.
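The scoring and analysis steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the toy scores, and the 0.05 family-wise alpha are hypothetical, and the sketch computes only the omnibus F statistic and a Bonferroni-adjusted threshold for the three pairwise comparisons.

```python
# Illustrative sketch only: names, toy data, and alpha = 0.05 are assumptions.

# Coding of the five-point response scale, as described in the text.
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Uncertain": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def total_score(responses):
    """Sum the coded responses of one teacher to the items of one feature."""
    return sum(LIKERT_CODES[r] for r in responses)


def rm_anova_f(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per teacher, one total score per administration.
    Returns (F, df_time, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    time_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_time - ss_subject
    df_time, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_time / df_time) / (ss_error / df_error)
    return f_value, df_time, df_error


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Hypothetical totals (pre, post, follow-up) for four teachers.
    toy_scores = [[10, 14, 12], [12, 15, 14], [11, 16, 13], [13, 17, 14]]
    f_value, df_time, df_error = rm_anova_f(toy_scores)
    print(f"F({df_time}, {df_error}) = {f_value:.2f}")
    print(f"per-comparison alpha = {bonferroni_alpha(0.05, 3):.4f}")
```

In practice, a statistics package would also supply the p values and the sphericity test; the hand-rolled version is shown only to make the partition of variance explicit.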

Additional Survey Questions.

In addition to the TSI questions, each of the three survey administrations included other open-ended questions assessing the teachers’ understanding of inquiry teaching (e.g. “How would you now describe an effective inquiry lesson?”), their views of the effectiveness of the Institute components (practice teaching and reflection sessions), and their views on student learning during their inquiry unit implementation (e.g. “Did you perceive a difference in your students’ learning during this [inquiry] unit in comparison to other units you teach?”). A Methods Supplement provides a listing of all the survey questions.

Analysis of Open-Ended Survey Questions.

The open-ended survey questions were read multiple times, and teachers’ responses were recorded and placed into initial categories using a constant comparative method (Bogdan & Biklen, 1998). Initial categories were then counted for each question to determine major themes within and across the survey questions and PD years. Initial categories related to the teachers’ discussion of instructional strategies they gained (POE, evidence-based explanations, etc.), their definitions of inquiry teaching, and their discussion of student learning. Teachers’ definitions of inquiry and their descriptions of their unit implementation were reduced into categories and organized into tables based on the survey administration. Other overarching themes describing what the teachers valued from the PD were combined into the final themes that are discussed in the “Results” section.

Two science education researchers (author 1 and author 3) initially coded a subset of the teacher survey responses separately. A comparison of the independently coded responses yielded a 70% agreement. The two raters met and came to consensus on coding categories and counts for the remaining survey responses until they were in 100% agreement.
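The agreement figure reported above is a simple percent agreement, which can be sketched as follows. The category labels and the ten toy codes are hypothetical and serve only to show the arithmetic; they are not the study’s coded responses.

```python
# Illustrative sketch only: the codes below are invented for the example.

def percent_agreement(codes_a, codes_b):
    """Percentage of responses assigned the same category by two raters."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both raters must code the same set of responses.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


if __name__ == "__main__":
    # Hypothetical category codes from two independent raters.
    rater_1 = ["strategy", "definition", "learning", "strategy", "definition",
               "strategy", "learning", "definition", "strategy", "strategy"]
    rater_2 = ["strategy", "definition", "learning", "definition", "definition",
               "strategy", "strategy", "definition", "learning", "strategy"]
    print(percent_agreement(rater_1, rater_2))
```

Percent agreement does not adjust for agreement expected by chance, which is one reason the raters’ subsequent consensus process matters for the final counts.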

Pre- and Post-Institute Inquiry Lesson.

Project staff observed (either directly or virtually) one inquiry lesson taught by each teacher before the Institute and one taught during the academic year after the Institute. Each lesson was analyzed using the Electronic Quality of Inquiry Protocol (EQUIP, Marshall et al., 2010), which has been shown to effectively measure science teachers’ quality of inquiry teaching. The EQUIP consists of 19 indicators divided into four factors to evaluate inquiry instruction: (1) Instruction, (2) Discourse, (3) Assessment, and (4) Curriculum. For each factor, inquiry is divided into Pre-Inquiry (level 1), Developing Inquiry (level 2), Proficient Inquiry (level 3), and Exemplary Inquiry (level 4). Each level of inquiry is operationally defined for each factor. For example, within the Discourse factor, one item, Complexity of Questions, is described at the Proficient Inquiry level as “Questions challenged students to explain, reason, and/or justify.” Similarly, Proficient Inquiry for the item, Questioning Ecology, states: “Teacher successfully engaged students in open-ended questions, discussions, and/or investigations” (Marshall, Horton, Smart & Llewellyn, 2008). According to Marshall et al. (2010), level 3 or Proficient Inquiry is aligned with current standards-based instruction. Although created before the release of NGSS in the USA, the EQUIP aligns with NGSS’s emphasis on science practices through its emphasis on student-student discourse, student argumentation, and in-depth conceptual understanding rather than surface-level memorization (NGSS Lead States, 2013).

EQUIP Analysis.

Prior to this study, a science education researcher (author 1) and two program staff members (former science teachers) were trained on the EQUIP through a set of training videos with expert scores provided on the instrument website. The training continued until all raters came to at least 95% agreement without comparing scores. All inquiry lessons in this study were scored by one of these previously trained researchers. From these scores, means and standard deviations were calculated for each teacher for each EQUIP factor (Instruction, Discourse, Assessment, and Curriculum) as well as a total sum score of all 19 items.

Missing Data.

The original sample of participants included 102 middle school science teachers. Because some participants did not complete all aspects of the data collection, the final sample size was 82. Bivariate associations were tested to compare respondents with missing data on any of the TSI and EQUIP variables to respondents with no missing data on these variables. This analysis resulted in no significant correlations between patterns of missing data and observed demographic variables for participants, suggesting that the values were missing at random (MAR) and that excluding participants with missing values did not result in biased estimates.
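One way such a bivariate check could be set up is to flag each participant as having any missing TSI or EQUIP value and then correlate that flag with a demographic variable. The sketch below is an assumption about the general approach, not the study’s procedure: the record keys (`tsi_total`, `equip_total`, `years`) and the toy records are hypothetical, and significance testing is left out.

```python
# Illustrative sketch only: record keys and values are invented.
from math import sqrt


def missing_indicator(record, variables):
    """Return 1 if any listed variable is missing (None) in the record, else 0."""
    return int(any(record.get(v) is None for v in variables))


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 flag this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical participant records; None marks a missing score.
    records = [
        {"tsi_total": 120, "equip_total": 40, "years": 1},
        {"tsi_total": 131, "equip_total": 45, "years": 2},
        {"tsi_total": None, "equip_total": 38, "years": 3},
        {"tsi_total": 125, "equip_total": None, "years": 4},
    ]
    flags = [missing_indicator(r, ["tsi_total", "equip_total"]) for r in records]
    years = [r["years"] for r in records]
    print(flags, round(pearson_r(flags, years), 3))
```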

Results

Changes in Teachers’ Inquiry Instruction

Teachers’ pre-/post-Institute inquiry lessons (n = 58) were evaluated using the EQUIP observation protocol (Marshall et al., 2010). Teachers were encouraged to teach an inquiry lesson addressing the content standards covered during the Institute. Sixty-four percent (n = 37) of the teachers’ post-Institute lessons addressed content standards from the Institute.

The teachers who provided before and after Institute videos significantly increased their mean EQUIP scores from pre-Institute to post-Institute for all four EQUIP factors (Instruction, Discourse, Assessment, and Curriculum) as well as improving their overall EQUIP score (sum of 19 items) from a mean of 2.11 on the pre-Institute video to a mean of 2.42 on the post-Institute video (Table 1). Teachers’ instructional changes from pre-Institute to post-Institute within the Discourse, Assessment, and Curriculum factors had large effect sizes (Cohen’s d greater than 0.5), showing practical significance in these areas. Although the teachers improved the quality of their inquiry instruction as measured by the EQUIP, the teachers’ mean scores on two of the four scales (Instruction and Curriculum) remained within Developing Inquiry (level 2) on the EQUIP. Teachers’ mean scores on the Discourse and Assessment subscales moved from Pre-Inquiry (level 1) to Developing Inquiry (level 2) from before to after their participation in the PD. These data show that the teachers’ instruction could be further improved to reach Proficient Inquiry (level 3), which is the level that Marshall et al. (2010) describe as being aligned with national science standards documents. These data also show that our participants began the program using mostly teacher-directed (levels 1 and 2) strategies within their self-defined “inquiry” lessons.

Table 1 Comparison of teachers’ mean pre-Institute inquiry lesson EQUIP scores to mean post-Institute inquiry lesson score by subscale (n = 58)
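The effect sizes reported with these comparisons are Cohen’s d values. The text does not state which variant of d was computed, so the sketch below assumes one common choice, the mean difference divided by the square root of the average of the pre and post variances. The function name and toy scores are hypothetical.

```python
# Illustrative sketch only: the pooled-SD variant of d is an assumption,
# and the scores are invented.
from math import sqrt
from statistics import mean, stdev


def cohens_d(pre, post):
    """Standardized mean change from pre to post using a pooled SD."""
    pooled_sd = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(post) - mean(pre)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical mean EQUIP factor scores for three teachers.
    pre_scores = [1.0, 2.0, 3.0]
    post_scores = [2.0, 3.0, 4.0]
    print(cohens_d(pre_scores, post_scores))
```

Other variants (for example, dividing by the standard deviation of the difference scores) can give noticeably different values for the same paired data, so the choice should be reported alongside the estimate.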

Teacher Personal Efficacy and Outcome Expectancy

The sample size used in the TSI instrument analysis with complete data from both cohorts was 82 participants. Fifty-one of the teachers were Caucasian (40 females, 11 males), and 31 were African American (29 females, 2 males). On the pre/post-Institute administrations for cohort 1, item 2 was inadvertently left off the TSI survey that was administered. This item was excluded from all analyses, including the internal consistency measures. The Cronbach’s alpha values for each subscale at each administration ranged from 0.59 to 0.90 and are provided in Supplemental Table S1.
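For readers unfamiliar with the internal consistency statistic, Cronbach’s alpha for a subscale of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the function name and the toy responses are hypothetical, not data from the study.

```python
# Illustrative sketch only: the responses below are invented.
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha.

    responses: one row per respondent, one coded item response per column.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical coded responses: four respondents, three items.
    toy = [[4, 4, 5], [3, 3, 4], [5, 5, 5], [2, 3, 3]]
    print(round(cronbach_alpha(toy), 3))
```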

Changes in Personal Self-Efficacy Across Inquiry Features.

Participants increased their perceived personal self-efficacy across all five features of inquiry after completing the PD. Means and standard deviations of the total scores are provided in Supplemental Table S2. Changes in total scores across all the essential features and the ANOVA F and p values (Table 2) for each inquiry feature show that the total scores had statistically significant increases from pre- to post-Institute for each feature with the exception of learner gives priority to evidence in responding to questions. Practical significance followed this same pattern with all features except learner gives priority to evidence in responding to questions (Cohen’s d = .278) having large effect sizes (Cohen’s d greater than .74). Statistically significant declines were observed from post-Institute to follow-up for the four features with significant increases during the Institute (Table 2). However, the total change from pre-Institute to follow-up was statistically significant and positive for all five features. From pre-Institute to follow-up, two features, learner formulates explanations from evidence (Cohen’s d = .53) and learner connects explanations to scientific knowledge (Cohen’s d = .596) showed medium effect sizes, while all other effect sizes for this time period were small.

Table 2 F and p values for personal self-efficacy total scores by inquiry features between three administrations of the TSI instrument

Changes in Outcome Expectancy Across Inquiry Features.

The teachers’ total change from pre-Institute to 1-year follow-up was statistically significant and positive for the three inquiry features associated with students developing and communicating explanations. The two features associated with students developing and testing scientific questions did not have statistically significant changes. Means and standard deviations of the total scores are provided in Supplemental Table S3. Changes in total scores across all the essential features of inquiry and the ANOVA F and p values (Table 3) indicate that the total score from pre- to post-Institute had statistically significant increases for three of the five inquiry features: learner formulates explanations from evidence, learner connects explanations to scientific knowledge, and learner communicates and justifies explanations. However, all effect sizes were small except for learner communicates and justifies explanations, which had a medium effect size (Cohen’s d = .53). The teachers’ changes in outcome expectancy from post-Institute to follow-up were not statistically significant for any of the five inquiry features. Thus, the teachers maintained their outcome expectancy levels pertaining to explanations from after the program to the end of the year.

Table 3 F and p values for outcome expectancy total scores by inquiry features between three administrations of the TSI instrument

Teachers’ Beliefs About Inquiry and Their Instructional Changes

After the 2-week PD, teachers were asked on the post-Institute survey (n = 102) to describe both their greatest area of growth and their main learning gains from the PD. An additional question on the end-of-year survey also asked the teachers to describe the changes they made, if any, to their instruction after attending the PD program. Themes were consistent across both PD cohort years with the greatest emphasis on an improved understanding of inquiry, increased use of inquiry strategies (POE, CER, phenomena first), more effective teacher questioning, and the creation of a more student-centered classroom.

Improved Understanding of Inquiry.

In teachers’ descriptions of their greatest learning gains over the program (post-Institute survey), 40% of teachers’ comments referenced their gains in understanding inquiry teaching in general or their learning of specific inquiry teaching strategies (e.g., POE, phenomena first approach). Throughout the PD program, the instructors modeled different pedagogical structures for helping students gather evidence through inquiry-based experiences to support and communicate their scientific arguments. Instructors emphasized a “phenomena first” or “explore before explain” instructional format. Program instructors also taught lessons using two structured protocols to improve students’ scientific explanations: Claim, Evidence, and Reasoning (CER; McNeil & Krajcik, 2008) and Predict, Observe, and Explain (POE; White & Gunstone, 1992). Teachers also practiced these protocols during their sessions with students. A sixth-grade teacher’s comment on her implementation of phenomena first teaching is representative of many of these comments. She stated, “The phenomena first idea of letting them experience it before I do any discussion. The quicker I can get them into the phenomena without discussion the better. Then it makes for the richest conversations ever with them” (cohort 2, post). A seventh-grade teacher stated that her gains included: “POE! Have students make predictions, and then let them do the active lab and after, explain what is happening using their data. Phenomena first” (cohort 1, post). In the end-of-year surveys, teachers also identified these instructional practices as strategies they had adopted into their own instruction. For example, an eighth-grade teacher stated, “Claim, Evidence, and Reasoning—I have put these into place for every lesson and lab I teach” (cohort 2, EOY).

In addition to referencing specific strategies, teachers also described a better understanding of inquiry teaching in general. An eighth-grade teacher stated, “I have a better vision of how it should look in the classroom after teaching the students in the summer program. I think teaching the actual content in the summer to students then discussing how the lesson went helped me tremendously” (cohort 2, EOY). Similarly, an eighth-grade teacher in cohort 1 stated, “I also realized that many of my inquiry activities weren’t really inquiry. Even if a lesson is active and fun, it doesn’t make it inquiry” (post). Teachers also emphasized the mastery experiences of the content sessions and the practice teaching as a key to their changes. An eighth-grade teacher stated:

I have had administrators and fellow teachers comment about including more inquiry driven lessons in the content. But, none of them took (or had) the time to explain what it is, what inquiry looks like in a classroom setting, or strategies used to implement inquiry. …Here I was given the concrete experience with content groups, which made me feel confident I could use it. Then we were able to apply the strategies in a classroom setting with a support system of fellow teachers. (cohort 1, post)

Teachers’ increased understanding of inquiry was also shown in the changes in their descriptions of an effective inquiry lesson from the pre-Institute to the end-of-year surveys (Supplemental Table S4). Teachers moved from describing inquiry more generally as involving students engaged and exploring through hands-on activities and teacher facilitation to more nuanced descriptions that included the need to assess students’ prior knowledge, engage students in higher level thinking questions, engage students in collecting and analyzing data, and in arguing from evidence.

More Effective Teacher Questioning.

Another commonly identified area of growth on the post-Institute survey (21% of teacher responses) and end-of-year survey (18%) was teachers’ improvement in their questioning techniques. A sixth-grade teacher made this representative comment about her growth over the Institute, “I sat down with the students instead of my normal lecture position. I also asked questions and let the students answer them. I allowed the students to guide the discussion, and answer each other’s questions” (cohort 2, post). Teachers described their growth in questioning as including “not just answering students questions, but asking questions to help lead them to their own answers” (cohort 1, post, 7th), “remembering to ask why and keep asking questions” (cohort 1, post, 7th) and “using more analysis type questioning” (cohort 2, EOY, 8th). These self-reported gains are further supported by the significant change in teachers’ EQUIP Discourse scores from their pre-Institute to post-Institute lessons. Often, teachers described how the practice-teaching and reflection sessions pushed them to analyze their questioning skills. A seventh-grade male teacher described how the reflection sessions “made me really look at what I was doing during my teaching. They made me explore the types of questions I ask and what my goal is with each lesson that I teach” (cohort 1, post). A male sixth-grade teacher described, through the practice teaching, “…feeling the freedom of pushing myself to encourage students to learn and think without being as concerned about a supervisor’s criticisms toward my methods of instruction. I am more willing to make mistakes and learn from them” (cohort 2, post).

Less Teacher Telling/More Student-Centered Instruction.

The third most commonly identified area of growth on the post-Institute survey, and the second most on the end-of-year survey, was a pedagogical change from a teacher-directed to a more student-centered classroom. Seventeen percent of the teachers’ comments on the post-Institute survey and 19% on the end-of-year survey dealt with teachers’ gains in this area. A seventh-grade male teacher made this representative statement:

Although I am still learning, the main thing I have gained from this institute is that I need to do a better job of getting my students to THINK and not lead and guide them to the ‘answers’ … This institute has given me experience on using inquiry-based instruction so I feel now that I can take a risk on my part, just like my kids will be doing. (cohort 1, post)

Teachers’ comments in this area also emphasized the importance of seeing students learning while engaged in inquiry either during the practice-teaching component of the PD or in their own classrooms. An eighth-grade teacher stated on the post-Institute survey, “My greatest growth is seeing how allowing students to investigate new concepts on their own is not a waste of time when it is facilitated and structured in the right way” (cohort 2). Similarly, a seventh-grade teacher stated that her main instructional change after the program was “letting them look and explore. It was hard to give up class time to let them make observations and think about why. I learned that giving them that time was not a waste; it made them think and learn” (cohort 2, EOY).

The reflection sessions on the practice teaching, which allowed for immediate feedback from content and pedagogy experts and their teaching peers, were also described as valuable by all the teachers in the post-survey. This eighth-grade teacher’s statement is representative of the majority of teachers when she described the reflection sessions as “very useful in seeing others perspectives and how to make each lesson better than it was that day” (cohort 1, post).

After the program, teachers described having a better understanding of inquiry as a teaching strategy, improved questioning skills, and a decrease in the amount of teacher-directed instruction in their classroom through engaging their students in collaborative inquiry experiences and dialogue.

Discussion

The PD model investigated in this study resulted in observed changes in the quality of inquiry instruction in participating middle school teachers’ classrooms as well as changes in teachers’ personal self-efficacy and outcome expectancy for teaching science as inquiry. Teachers’ EQUIP scores significantly increased across all four factors (Instruction, Discourse, Assessment, and Curriculum) after participating in the program, and teachers’ lessons moved from Pre-Inquiry (level 1) to Developing Inquiry (level 2) within the Discourse and Assessment factors. Teachers also described how the program improved their understanding of inquiry teaching, improved their ability to facilitate student collaboration through improved questioning and less direct instruction, and increased their use of inquiry strategies.

Personal Self-Efficacy

Findings for the personal self-efficacy scale showed statistically significant gains from pre- to post-Institute for four of the five essential inquiry features, slight but statistically significant losses from post-Institute to 1-year follow-up for the same four features, and statistically significant gains from pre-Institute to 1-year follow-up for all five inquiry features. These results provide support for the PD model, especially given the argument that teachers’ personal self-efficacy is thought to be relatively stable after their induction years (Ross, 1994; Tschannen-Moran et al., 1998). Henson (2001) in a review of teacher efficacy stated that “positively impacting teachers’ efficacy beliefs is unlikely outside of longer-term PD that compels teachers to think critically about their classrooms and behave actively in instructional improvement” (p. 831). Our PD model, designed with inquiry teaching experiences through participating as students in inquiry lessons, observing experts and peers instructing middle school students, and teaching students themselves, increased teachers’ personal self-efficacy to teach science as inquiry. Teachers worked in teams to plan, adapt, and enact inquiry lessons and received immediate peer and “expert” feedback through group reflection sessions. This feedback on both teacher practice and student learning issues was immediately incorporated into the group’s next inquiry enactment. This cycle of planning, teaching, and reflection may have helped build the teachers’ efficacy for teaching through inquiry (Clarke & Hollingsworth, 2002; Singer et al., 2011). Labone (2004) described how positive feedback encouraging teachers to set specific goals for improving their instruction is most beneficial when connected with teaching experiences, as was done in our PD through guided reflection on the teachers’ practice teaching. Henson (2001) found similar gains in teachers’ personal teacher efficacy after teachers’ involvement in collaborative research.

Henson’s study points to the importance of teacher collaboration and goal-directed actions for changing teacher efficacy. Our PD model also emphasized daily goal setting based on collaborative practice-teaching evaluations. When student learning was not observed, teachers discussed strategies to change their pedagogy before their next teaching session, allowing them to try out different techniques and immediately analyze student learning. Palmer’s (2011) research supports this outcome: that study found that teachers’ science teaching efficacy improved through constructive feedback from a perceived expert after inquiry teaching experiences. In Palmer (2011), the enactments took place in teachers’ classrooms rather than in a PD workshop. In contrast, our model provided for additional practice and reflective feedback before teachers enacted inquiry practices in their classrooms.

The summer practice-teaching experience involved small groups of students (10–15) and teachers working in teams to plan and instruct the students. Although this context provided a good practice environment, these factors may have negatively impacted the teachers’ beliefs about their own ability to teach science as inquiry with their students in their own classrooms with larger numbers of students. Additional instructional coaching or in-classroom mentoring during the academic year (beyond the supplied academic year workshops and classroom observations) might have reduced teachers’ post-Institute drop in efficacy beliefs.

Outcome Expectancy

Teachers’ outcome expectancy beliefs significantly increased across the program for three of the five essential features of inquiry. This is a worthwhile outcome of our PD model, as many prior PD studies have shown changes in teachers’ self-efficacy for teaching science but not their outcome expectancy (Lakshmanan et al., 2011; Posnanski, 2002). The practice teaching with immediate feedback and reflective planning allowed the teachers to gain confidence with inquiry instructional skills and observe students learning through inquiry. The reflection on inquiry practice continued, although to a limited degree, with instructor feedback given to the teachers during the academic year workshops and teachers’ implementation of the lessons developed during the Institute with their own students. Many teachers reported in the end-of-year survey that their own students learned from their implemented inquiry lessons and thus the teachers continued to increase their perceived outcome expectancy for the explanation-related features of inquiry during the academic year.

Essential Features of Inquiry

Explanations.

Both the personal self-efficacy and the outcome expectancy features associated with explanations demonstrated growth over the entire PD program. Throughout the program, the instructors modeled different pedagogical structures (e.g., CER and POE) to support students’ scientific argument construction. Palmer (2011) found similar increases in teachers’ personal self-efficacy beliefs (the outcome expectancy scale was not analyzed) through their PD that helped teachers understand a specific six-step inquiry-teaching protocol, through both direct modeling of the strategy and teacher observations of pre-service teachers using this strategy with students. Thus, the integration of specific instructional protocols and opportunities to practice those protocols with students within PD models provide important scaffolding for teachers who implement new instructional practices.

Questioning.

For inquiry features that dealt with questioning, the teachers in our study increased their personal self-efficacy on learner engages in scientifically oriented questions but not on learner gives priority to evidence in responding to questions (pre to post and post to follow-up). The teachers did not increase their outcome expectancy with questioning.

Inquiry can range on a continuum from completely teacher directed to student directed. Our PD model emphasized guided inquiry, in which the teachers provided students with the question to investigate but allowed students to collect and analyze their own findings and reach their own conclusions. Thus, teachers might have felt less confident in engaging students in scientifically oriented questions or more open-inquiry experiences. In the surveys, teachers described how they believed they had increased their skills with asking students probing and higher level thinking questions, with little focus on engaging students in open-inquiry investigations. The change in teachers’ mean EQUIP Discourse subscale scores from Pre-Inquiry (level 1) before the program to Developing Inquiry (level 2) after the program further supports the teachers’ described changes. On average, the teachers were not yet engaging in discourse practices at Proficient Inquiry (level 3) as measured by the EQUIP, which would have included more student-directed discussion. The skill of connecting evidence back to supporting or not supporting a research question has been shown to be difficult for both students (Chang & Linn, 2013) and teachers (Capps & Crawford, 2013). Capps and Crawford (2013) stated that in their case study of eight teachers, “only one of the eight teachers was able to describe an instance where she helped students develop questions to investigate” (p. 513). Zangori, Forbes and Biggers (2013) found that elementary teachers spent more time engaging students with data collection and less time on data analysis and reflection, calling for teacher support for students’ evidence-based explanations through explicit scaffolding within curriculum or PD.

Conclusion

As teachers move to implement new standards across the content areas, PD must be designed to provide opportunities for teachers to practice with reform-based instructional strategies, reflect on this instruction, and increase both their efficacy and their instructional skills. On average, our teachers significantly improved the quality of their inquiry-based instruction after participating in the PD model. Guskey (1986) described a model for teacher change in which changes in teachers’ beliefs followed teachers’ observed changes in student learning after implementation. Our teachers were asked to implement inquiry instruction, despite their efficacy beliefs, during the practice-teaching sessions. The practice-teaching and reflection sessions were often described as being instrumental in changing teachers’ understanding of inquiry teaching and their beliefs about how students learn science best. Thus, our PD model may have given teachers the skills and knowledge to help them move past their initial beliefs.

Reflecting on Desimone’s (2009) five core features of effective PD, this study provides additional empirical support for these features as well as providing evidence for the inclusion of teacher reflection on student learning issues in inquiry PD (Capps et al., 2012). Teachers in our PD commented on the importance of seeing students learn through inquiry experiences as well as the value of reflecting with their peers on how to further improve their instruction and student learning. Further research will need to determine if practice teaching is vital for moving teachers forward or if other avenues that focus on student work, such as professional learning communities, are just as effective.