Introduction

International leaders in science, engineering, technology, and math-related learning research have made important discoveries about how students learn and have offered recommendations on how students should be taught to maximize learning outcomes, critical reasoning skills, and the development of supportive learning communities (e.g. Ben-David & Zohar, 2009; Cohen & Spillane, 1993; Colburn, 2000; De Jong & Van Joolingen, 1998; Drayton & Falk, 2001; Duschl, Schweingruber, & Shouse, 2007; Flavell, 1979; Herron, 1971; Krajcik, Czerniak, & Berger, 2003; Kuhn & Pearsall, 1998; Organisation for Economic Co-operation and Development [OECD], 2016; Rutherford & Ahlgren, 1991; Schwab, 1962; Von Secker & Lissitz, 1999). The USA’s National Research Council (NRC, 1996, 2005, 2012) provides recommendations on procedures for teaching science-related topics in K-12 classrooms, and one of its most frequently referenced guides is the National Science Education Standards. One recommendation is to use inquiry-based learning methods that cover content through the scientific method and scientific reasoning. The NRC’s (2005) How Students Learn: History, Mathematics, and Science in the Classroom expands the process of thinking scientifically to include opportunities for students to reflect on what they are learning and how that information relates to their prior knowledge. The NRC’s Next Generation Science Standards (NGSS; 2012) was published as a renewed declaration of the importance of teaching scientific concepts within the framework of the scientific inquiry process. The NGSS performance expectations for K-12 disciplinary core ideas recommend inquiry-based learning processes such as designing experiments, collecting and analyzing data, presenting data visually, and discussing and critiquing data interpretations orally and in writing.

Although most science educators encourage the integration of scientific research processes into the learning environment, inquiry-based learning should be interwoven into a coherent curriculum that serves as an instructional blueprint for student learning (Beane, 1995; NRC, 1996). Inquiry-based learning methods are only one component of an integrated set of pedagogical processes and assessments that facilitate the investigation of concepts, the development of skills, the mastery of objectives, and an understanding of how knowledge is built, communicated, and critiqued. International assessments of competencies in science and math-related fields, such as the Programme for International Student Assessment (OECD, 2016), emphasize measuring skill competencies along with communication and application to real-world scenarios. Students’ ability to demonstrate mastery of skill competencies through both oral communication and writing can be enhanced by participation in inductive scientific research processes designed within an inquiry-based learning environment (e.g. Barthlow & Watson, 2014; Nadelson, 2009; Nybo & May, 2015; Prince & Felder, 2007).

Inquiry-based instruction is usually associated more with indirect than with direct instructional methods; however, it can encompass components of both. Typically, more teacher-facilitated, direct instruction is introduced early in the use of inquiry-based learning, with a gradual transition to indirect, student-led hypothesis development, testing, and critiquing as students master the skills for conducting scientific research. The Scholastic Inquiry Observation (SIO) instrument was designed to provide feedback on the implementation of inquiry procedures that progress from teacher-led demonstrations and explanations to student-led investigations. One benefit of inquiry-based learning methods is that students learn how scientific research is conducted using both inductive and deductive reasoning. These skills can help students investigate and make decisions about scientific, technological, and even personal issues, and can increase their potential for working in, or with others in, scientifically literate fields; both outcomes are crucial components of a coherent curriculum (NRC, 1996).

Many current recommendations for teaching STEM-related content through scientific inquiry build on Schwab’s (1962) foundational essay “Science as Enquiry,” in which he recommended that scientific training focus on teaching students to conduct scientific investigations that evaluate and interpret previously constructed theories, which should be considered transitory in nature. Schwab emphasized that, because scientific theories change rapidly, it is inappropriate to regard them as “fact,” and that students and scientists must be taught effectively how to test prior assumptions and theories.

Researchers such as Nadelson (2009) discuss the frustration that can accompany attempts to incorporate inquiry-based learning activities into K-16 classrooms. Students commonly lack the prerequisite skills and content knowledge to engage actively in inquiry-based or inductive learning activities at first, so the effective use of inquiry-based learning processes can depend on how clearly basic content has been covered through other pedagogical methods. For this reason, the NRC (2005) and Duschl et al. (2007) provide recommendations on the faculty development needed to prepare teachers to teach in inquiry-based learning environments. In response to these and earlier recommendations (e.g. NRC, 1996; Rutherford & Ahlgren, 1991), many teacher development programs have focused on implementing inquiry-based learning environments in K-12 and post-secondary classrooms, and instruments have been developed to measure inquiry implementation and associated programmatic outcomes (e.g. Brandon, Taum, Young, & Pottenger, 2008; Cianciolo, Flory, & Atwell, 2006; Sawada et al., 2002; Tafoya, Sunal, & Knecht, 1980; Wainwright, Flick, Morrell, & Schepige, 2004).

Instruments designed to measure the use of inquiry in formal learning environments vary in the types of inquiry they measure and in how they measure inquiry implementation. The purpose of this study is to present an instrument developed to measure inquiry based on the specific types of learning processes occurring, the level of inquiry being implemented, and the level of active student participation in the inquiry activity. The instrument, psychometric data, and results from 6 years of classroom observations in an inquiry-based faculty development program, used primarily in science and math classrooms, are provided.

What is Inquiry-Based Instruction and Inquiry Teaching?

In a formal inquiry-based learning environment, the teacher functions as the facilitator of the learning process, with the goal of developing scientific investigators who build their knowledge of science, mathematics, technology, and engineering-related concepts through an investigative and experiential process (Ben-David & Zohar, 2009; Center for Science Education, 2006; De Jong & Van Joolingen, 1998; Drayton & Falk, 2001; Herron, 1971; NRC, 1996, 2005, 2012; Rutherford & Ahlgren, 1991; Schwab, 1962). The teacher helps students develop an understanding of new theories, concepts, and their relationships by investigating outcomes and how those outcomes relate to, or contrast with, their current knowledge base (Flavell, 1979; Keselman, 2003; Kuhn & Pearsall, 1998; Veenman, 2012).

There are many types of project-based or hands-on learning models in the field of education, and some do not include inquiry-based instruction. Woods (2014) compiled a list of 33 types of learning environments related to project-based instruction, ranging from direct instruction models to student-driven empirical research studies. Two components distinguish these 33 environments: the types of learning activities (e.g. lecture, hypothesis creation, researching, small group discussion) and who actively initiates or facilitates the learning activity (teachers or students). In developing our instrument, we focus on both of these key components: the type of learning activity and the learning facilitator.

Although inquiry-based instruction is most commonly researched in science and engineering fields, inquiry learning is incorporated in other fields (e.g. math, geography, persuasive writing), and scientific inquiry does not require an extended project. Many different types and levels of inquiry-based learning can be observed, and all can strengthen students’ inductive reasoning and their understanding of the scientific process. On one day, students may identify hypotheses or predictions to be investigated and design experiments to test them. On other days, students might discuss evidence presented by others and critique their interpretations. Inquiry in science can be as simple as having students predict and observe differences in the time it takes a marble to drop through a liter of different liquids (e.g. water, oil, honey) to understand the concept of viscosity. Inquiry in math might include discovering the value of pi by comparing the ratio of circumference to diameter across circles of varying sizes. Inquiry-based learning elements can range from simple observations to expanded investigations.
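As a minimal sketch of the arithmetic behind the pi activity, the Python snippet below averages circumference-to-diameter ratios across several circles. The measurement values are hypothetical, not classroom data, and the helper name `estimate_pi` is illustrative. Dividing by the radius instead of the diameter gives roughly 2π, which is why the diameter is the appropriate comparison.

```python
import math
from statistics import mean


def estimate_pi(diameters, circumferences):
    """Average the circumference-to-diameter ratio over several circles."""
    if not diameters or len(diameters) != len(circumferences):
        raise ValueError("need one circumference per diameter")
    return mean(c / d for d, c in zip(diameters, circumferences))


# Hypothetical student measurements in centimetres (illustrative only).
diameters = [5.0, 10.0, 15.2, 20.1]
circumferences = [15.8, 31.3, 47.9, 63.0]

pi_estimate = estimate_pi(diameters, circumferences)
print(f"C/d estimate: {pi_estimate:.3f} (math.pi = {math.pi:.3f})")

# Comparing against the radius instead doubles the ratio (about 2 * pi).
radius_ratio = mean(c / (d / 2) for d, c in zip(diameters, circumferences))
print(f"C/r estimate: {radius_ratio:.3f}")
```

Averaging several circles, rather than relying on one, smooths out individual measurement error in the same way a class would pool its results.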

Why is Inquiry-Based Learning Important?

The professional development program in the current study was built on the importance of incorporating the processes used in scientific research into a coherent curriculum (e.g. Schwab, 1962), which can help emerging scientists better understand the nature of science (e.g. Abd-El-Khalick et al., 2004; McComas, 2004). Many science standards publications are built, in part, on approaching science as the building and refining of knowledge rather than as the teaching of predetermined concepts. This does not mean that scientific principles and theories are unimportant in the science curriculum, nor that pedagogical processes should not be investigated for their effectiveness in demonstrating scientific concepts. Indeed, researchers such as Nadelson (2009) discuss the frustration of trying to implement effective inquiry-based learning methods when participants lack a sufficient foundation of scientific knowledge for developing and refining hypotheses and interpreting scientific outcomes. However, the “nature of science” includes building scientific knowledge through the collection of empirical evidence and recognizing that conclusions can be affected by how that evidence is interpreted (Abd-El-Khalick et al., 2004). Further, scientific knowledge is continually revised as new evidence is obtained. Teaching emerging scientists how to investigate hypotheses and evaluate theories is therefore key to building the future of scientific research and to developing a global community that understands the evolving nature of science. As Schwab (1962) explained, it is imperative to teach students that scientific theories differ from facts, and that we must learn to test our theories and challenge our current conceptions in order to advance scientific knowledge. He further argued that failing to teach students about the continual expansion and refinement of prior scientific beliefs can produce adults who believe their prior education was faulty or wrong, rather than an important building block toward a better understanding of scientific principles.

The goal of the professional development program implemented in the current study was to help middle school teachers implement lessons that used scientific inquiry processes to investigate scientific and mathematical principles. Recommendations from earlier researchers (Nadelson, 2009; Schwab, 1962) to introduce levels of inquiry instruction into a classroom incrementally were included in the professional development training, and the need to evaluate different levels of inquiry became apparent early in the training process.

Need for an Inquiry Observation Instrument

When building a professional development program to train teachers to develop and implement inquiry learning modules, we identified the need for an observational instrument that trainers could use to identify (a) the specific types of inquiry activities being implemented and (b) the level of inquiry at which these activities were integrated into the lesson. The National Science Foundation-funded program K-12: I Do Science (KIDS) partnered graduate students in math, science, and engineering fields with pairs of math and science teachers in five school districts to develop inquiry-based learning environments. The professional development was designed by a team of university faculty, middle and high school teachers, and a former industry engineer. The graduate students worked in classrooms with the teacher teams 10 h a week for an academic year. The professional development model was similar to Lotter, Yow, and Peters’s (2014) community of practice model: it began with team-based content instruction and real-world simulation training, followed by long-term implementation facilitated by a reflective community of practice. Activities focused on integrating math and science concepts to emphasize their interconnectedness, and were implemented using inquiry methods that provided both a context for how STEM research is conducted and authentic experiences for learning math and science (American Association for the Advancement of Science [AAAS], 1993; Common Core State Standards Initiative [CCSSI], 2010; Heflich, Dixon, & Davis, 2001; NRC, 2012; So, 2012). One key goal of the KIDS four-week summer training program was to develop a “minds-on” component of inquiry that went beyond “hands-on” experiential learning. Identifying when activities transitioned from “students following procedures” to “active student engagement” in the questioning and interpreting process was crucial (Schwab, 1962).

When designing the inquiry-based instruction training, we determined that no publicly accessible instrument gave trainers a way to identify whether specific types of inquiry-based learning procedures were being used in the classroom and whether students were actively engaged in the inquiry process. A review of available assessments yielded a list of instruments that provided useful information about inquiry implementation but not the detail desired for a training program incorporating both the stable and fluid inquiry investigations described by Schwab (1962). The need to assess whether different types of inquiry, appropriate levels of inquiry implementation, and the desired degree of student engagement were being achieved in classrooms led to the development of the Scholastic Inquiry Observation (SIO) instrument.

Inquiry-based instruments reviewed in 2001 during the development of the KIDS program included the Assessment of Inquiry Potential (AIP; Tafoya et al., 1980), the Reformed Teaching Observation Protocol (RTOP; Sawada et al., 2000), and the OCEPT Classroom Observation Protocol, also referred to as the Oregon Teacher Observation Protocol (O-TOP; Wainwright et al., 2004). Other instruments investigated after the SIO was initiated include the Inquiry Observation Protocol (IOP; Cianciolo et al., 2006), the Inquiry Science Observation Coding Sheet (ISOCS; Brandon et al., 2008), and the Inquiry Science Implementation Scale (ISIS; Brandon, Young, Pottenger, & Taum, 2009). Instruments such as the O-TOP and RTOP provide useful information on different components of inquiry, but commonly aggregate types of inquiry into broader individual item stems. A measure that provided more detail about the specific components of inquiry-based learning was desired for assessing the professional development program used in this study. The ISOCS (Brandon et al., 2008) compares learning activities using three types of inquiry processes (i.e. authoritarian, descriptive, and Socratic inquiry) with a relatively complex set of interactional responses among teacher strategy, learning activity, student response, teacher follow-up, and student-teacher proximity. The ISOCS identifies and describes which theoretical framework of inquiry a teacher is using rather than identifying concrete indicators of the types of inquiry implemented and the level of inquiry implementation.

Brandon et al. (2009) created the ISIS, on which teachers self-report their use of inquiry. This format was not desirable for our training situation, in which graduate students’ and teachers’ knowledge of inquiry processes was expected to change over the course of the training; their self-reports of inquiry-based processes would therefore not have rested on the same operational definitions before and after training. The AIP (Sunal, Sunal, Sundberg, & Wright, 2008; Tafoya et al., 1980) provides useful information about the inquiry strategies included in curricular materials, which can indicate the potential for inquiry in a set of instructional materials, but not the actual level of inquiry that occurs when a lesson is implemented. However, its descriptions of the levels of inquiry that could be achieved (based on the level of student involvement in the inquiry process) were very useful and served as a model for the SIO level of inquiry implementation ratings.

The last inquiry-based instrument investigated was the IOP developed by Cianciolo et al. (2006). The IOP provides a more differentiated list of inquiry processes than the prior instruments and is the most similar to the SIO in its types of items. However, the IOP measures inquiry implementation by frequency of occurrence and does not measure the degree to which students facilitate or direct the learning process, which is a key component of inquiry environments. The SIO was therefore developed to meet the need for an instrument that rates each type of inquiry characteristic or procedure by its level of inquiry implementation.

Theoretical Framework for SIO Instrument Development

The theoretical model used to develop the KIDS training, and subsequently the SIO instrument, drew on sources including the National Science Education Standards (NRC, 2000), Drayton and Falk’s (2001) descriptions of inquiry-oriented classrooms, Herron’s (1971) overview of the nature of scientific inquiry, the NRC’s (2005) How Students Learn: History, Mathematics, and Science in the Classroom, Tafoya et al.’s (1980) and Schwab’s (1962) levels of inquiry, and, most importantly, an unpublished web-based technical report, Conceptualizing Inquiry Science Instruction, prepared by Education Development Center personnel for the Inquiry Synthesis Project, which was accessed in 2001 and later published (Center for Science Education, 2006).

Inquiry in learning has been defined in a number of ways in the literature. Compatibility with a theoretical framework was therefore essential during the inquiry program development phase, and the ability to define what could be observed in an inquiry-based classroom was essential during the SIO instrument development phase. The theoretical framework began with the NRC’s (2000) description of classroom inquiry, which included (a) the use of scientific questions and hypotheses (b) to gather supporting evidence that (c) leads to the development of explanations that (d) should be evaluated and critiqued and (e) finally communicated and justified to others. Many researchers have defined inquiry similarly, as acquiring knowledge through observing, hypothesizing, collecting and analyzing data, developing conclusions, making predictions, and communicating findings to others (e.g. Buch & Wolff, 2000; Center for Science Education, 2006; Drayton & Falk, 2001; Edelson, Gordin, & Pea, 1999; Madill et al., 2001). These tenets were the primary focus of the observational instrument development.

Inquiry-Based Guidelines/Instruments Assisting SIO Development

Types of Inquiry Activities

Levy and Minner (Center for Science Education, 2006, p. 4) provided the greatest assistance in designing the format of the SIO instrument with their focus on three primary areas: “presence of science content,” “type of student engagement,” and “elements of the inquiry domain present in the components of instruction.” Levy and Minner combined their detailed elements of inquiry into five general areas of question, design, data, conclusion, and communication. For the SIO instrument, greater detail of the elements within an inquiry domain was desired; thus, Levy and Minner’s original definitions of the elements of inquiry in combination with characteristics obtained from Drayton and Falk (2001), Herron (1971), and the National Science Education Standards (NRC, 1996) were used to develop 18 types of inquiry learning activities and inquiry-related processes (which are listed under Instrumentation in the Methods section).

Level of Inquiry Being Implemented

A surprising realization was that inquiry-based learning activities can be used in classrooms without any actual inquiry learning taking place. When observing inquiry-based lessons, our observers saw the same activity implemented with a high level of inquiry in one classroom and essentially no inquiry in another, depending on the type of instructional guidance the teacher provided. The second step was therefore developing an assessment that measured the level of inquiry being facilitated. Schwab’s (1962), Herron’s (1971), and Tafoya et al.’s (1980) definitions of the levels of inquiry of an activity, combined with Levy and Minner’s (Center for Science Education, 2006) definition of types of student engagement, were used to develop the scale format described in the Methods section.

Level of Active Student Engagement

The third component important for measuring inquiry activity was the level of student engagement. The definition used for the SIO instrument focuses on a directly observable outcome: the percentage of students actively engaged in each type of inquiry learning component. The author developed categories on a 0 to 6 scale, ranging from 0, indicating that fewer than 5% of the students observed were actively engaged, to 6, indicating that more than 95% of the students were actively engaged.

Supplemental Scales: Student Interest and Mastery of Objectives

Two Likert-type scales on the SIO measure observed student interest and demonstrated student mastery of objectives. Many researchers treat student interest as a factor that can affect student engagement, which, in turn, is correlated with student learning. The mastery of objectives scale is considered even more important because the explicit investigation of content is crucial when using inquiry-related instruction (e.g. Hodson, 1996; Roth & Garnier, 2007; Windschitl, Thompson, & Braaten, 2008). Research indicates that although hands-on activities without a specific content objective can increase student interest, they do little to improve scientific reasoning or the understanding of how a given concept fits into the currently accepted set of postulates. The Mastery of Objectives scale is an observational measure of student mastery of the academic content specifically identified for the learning activity.

Methods

Data Source

Evaluation personnel observed middle school classrooms, collecting data on the types of inquiry used, the level of inquiry implemented, the degree of active student engagement, student interest, and the demonstrated level of student mastery of objectives. From 2006 to 2011, one faculty member and four graduate assistants collected 164 observations using the SIO instrument. Although the data were collected primarily in math and science classrooms, the instrument was not developed solely for math and science; it focuses on inquiry-based learning processes that can be integrated into a variety of teachers’ preferred pedagogical methods. Although education experts and researchers recommend integrating math, science, and technology content in learning activities (e.g. AAAS, 1993; CCSSI, 2010; Heflich et al., 2001; NRC, 2012), they also note that this integration is challenging because of certain differences between math and science instructional methods (So, 2012). This study compares the inquiry practices of teachers trained in inquiry through interdisciplinary teams in order to better understand which types of inquiry are most commonly selected, and which are implemented at the highest levels of inquiry, in math and science classrooms. The results are intended to identify both similarities and differences observable within an integrated math and science learning model.

A total of 85 teacher-scientist teams were observed over the 6 years, with the goal of observing at least two lessons per team. Forty-two 6th and 7th grade teachers participated, some for more than 1 year. Forty-six scientists participated, most of whom worked with two different teachers on a curriculum team (one math and one science). The classrooms were located in ten schools in six school districts. The school districts were not selected at random but on the basis of their location within a 30-mile radius of the research team’s institution, and teachers were invited to apply for program participation. The majority of 6th and 7th grade math and science teachers in the ten schools participated during the 11-year program timeline. Two school districts were in small rural communities with grade levels as small as 60 students. Four school districts were moderately large, with two to four middle schools per district and approximately 375 to 935 students per middle school. The percentage of students receiving free and reduced lunch ranged from 26 to 80% across schools, and the percentage of minority students ranged from 13 to 70%, with Hispanic students forming the largest minority group.

Instrumentation

The observational instrument includes 18 learning activities or resources on which the observers rate the classroom environment in terms of level of inquiry and degree of active student engagement. Sixteen of the activities and resources are grouped into three components of inquiry:

Working with hypotheses

  • Generation of hypotheses/predictions

  • Selection of hypotheses to be investigated

  • Designing procedures for testing hypotheses

  • Testing hypotheses/predictions

  • Testing conclusions/interpretations

Communication in inquiry

  • Brainstorming

  • Verbally interpreting outcomes

  • Discussing interpretations in small groups

  • Critiquing others’ interpretations

  • Asking questions to further understanding

Hands-on inquiry

  • Hands-on demonstration of activity/concept

  • Gathering/recording data

  • Visual representation of concept or data

  • Creation of graphs/charts

  • Manipulation of materials/active learning tools

  • Use of scientific technology

Based on previous pilot study data, two items on the SIO do not fit within any of the three subscale areas: large group discussion and writing summaries of results. The complete instrument can be viewed online at https://www.academia.edu/34131309/Scholastic_Inquiry_Observation_Instrument.
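One way to picture the instrument’s structure is as a mapping from subscales to items. The Python sketch below encodes the 16 grouped items and the 2 ungrouped items listed above. The scoring function is a hypothetical illustration that assumes a subscale score is the mean of its item ratings, a rule not specified here; the same mapping could hold either the 0–4 implementation ratings or the 0–6 engagement ratings.

```python
from statistics import mean

# Subscales and items as listed in the SIO Instrumentation section.
SIO_SUBSCALES = {
    "Working with hypotheses": [
        "Generation of hypotheses/predictions",
        "Selection of hypotheses to be investigated",
        "Designing procedures for testing hypotheses",
        "Testing hypotheses/predictions",
        "Testing conclusions/interpretations",
    ],
    "Communication in inquiry": [
        "Brainstorming",
        "Verbally interpreting outcomes",
        "Discussing interpretations in small groups",
        "Critiquing others' interpretations",
        "Asking questions to further understanding",
    ],
    "Hands-on inquiry": [
        "Hands-on demonstration of activity/concept",
        "Gathering/recording data",
        "Visual representation of concept or data",
        "Creation of graphs/charts",
        "Manipulation of materials/active learning tools",
        "Use of scientific technology",
    ],
}

# The two items that did not fit any subscale in the pilot data.
UNGROUPED_ITEMS = ["Large group discussion", "Writing summaries of results"]

ALL_ITEMS = [item for items in SIO_SUBSCALES.values() for item in items] + UNGROUPED_ITEMS


def subscale_scores(ratings):
    """Return a mean rating per subscale for one observation.

    ASSUMPTION: mean-of-items scoring is illustrative only. `ratings` maps
    every item name to its integer rating for a single observed lesson.
    """
    missing = [item for item in ALL_ITEMS if item not in ratings]
    if missing:
        raise KeyError(f"missing ratings for: {missing}")
    return {name: mean(ratings[item] for item in items)
            for name, items in SIO_SUBSCALES.items()}


# Usage with hypothetical implementation-level ratings (0-4).
example = {item: 0 for item in ALL_ITEMS}
example.update({"Brainstorming": 3, "Gathering/recording data": 2,
                "Creation of graphs/charts": 2})
print(subscale_scores(example))
```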

The inquiry learning-related components are scored by level of inquiry implementation (0 to 4), similar to Tafoya et al.’s (1980) ratings:

  • 0 = Activity not observed

  • 1 = Teacher demonstrates the process or activity/students do not participate (Student Observation)

  • 2 = Teacher facilitates the process/students participate (Teacher Controlled Activity)

  • 3 = Teacher initiates/students facilitate (Students Actively Involved in design and guidance of the learning process)

  • 4 = Students initiate and facilitate (Student-led Learning)
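The rating rules above hinge on who initiates and who facilitates the activity. The following Python sketch restates them as a small decision function. The function and argument names are hypothetical, and combinations that the rubric does not describe (for example, a student-initiated activity facilitated by the teacher) raise an error rather than being assigned a level the rubric does not define.

```python
from enum import IntEnum


class InquiryLevel(IntEnum):
    """SIO level-of-inquiry codes (0-4)."""
    NOT_OBSERVED = 0
    STUDENT_OBSERVATION = 1         # teacher demonstrates; students do not participate
    TEACHER_CONTROLLED = 2          # teacher facilitates; students participate
    STUDENTS_ACTIVELY_INVOLVED = 3  # teacher initiates; students facilitate
    STUDENT_LED = 4                 # students initiate and facilitate


ROLES = {"teacher", "students"}


def inquiry_level(observed, initiator, facilitator, students_participate):
    """Hypothetical helper mapping observed roles onto the 0-4 rubric."""
    if not observed:
        return InquiryLevel.NOT_OBSERVED
    if initiator not in ROLES or facilitator not in ROLES:
        raise ValueError("initiator and facilitator must be 'teacher' or 'students'")
    if facilitator == "students":
        if initiator == "students":
            return InquiryLevel.STUDENT_LED
        return InquiryLevel.STUDENTS_ACTIVELY_INVOLVED
    # From here on the teacher is facilitating.
    if initiator == "students":
        # Not one of the combinations the rubric describes.
        raise ValueError("student-initiated, teacher-facilitated activity is not defined")
    if students_participate:
        return InquiryLevel.TEACHER_CONTROLLED
    return InquiryLevel.STUDENT_OBSERVATION


# Usage: a teacher-led lab in which students carry out the steps.
print(inquiry_level(True, "teacher", "teacher", students_participate=True).name)
```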

This process for rating inquiry-related activities differs from prior instruments such as the RTOP, IOP, and O-TOP in that the goal is not only to identify what type of inquiry process is being used but also to determine the degree to which students are learning to facilitate the process of scientific inquiry. We wanted to identify which types of inquiry processes appear to lend themselves more readily to higher levels of student facilitation (or student ownership) of inquiry, whether the level of student facilitation increases with experience, and whether this differs across types of classrooms.

A second measure of the inquiry activities is the percentage of students actively engaged, recorded on a rating scale of 0 to 6. Having a large percentage of students actively engaged during the learning process is crucial: a high level of inquiry that includes only 10% of the students may not be as effective as a moderate level of inquiry that includes 95% of the students. The rating scale used for level of student engagement is as follows:

  • 0 = 0–5% (essentially none of the students)

  • 1 = 5–20% (a few students)

  • 2 = 20–40%

  • 3 = 40–60% (approximately one half of the class)

  • 4 = 60–80%

  • 5 = 80–95% (strong majority of the students)

  • 6 = 95–100% (almost all of the class)
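Because adjacent ranges in this list share endpoints (5% appears in both categories 0 and 1, for instance), converting an observed percentage to a code requires a boundary rule. The Python sketch below assumes that a value on a shared cut point goes to the higher category, except that 95% maps to 5, consistent with the earlier definition of a 6 as over 95% of students. The function name is hypothetical.

```python
import bisect

# Lower cut points (percent) for engagement codes 1-5.
CUT_POINTS = [5, 20, 40, 60, 80]


def engagement_code(percent_engaged):
    """Convert % of students actively engaged to the 0-6 SIO code (hypothetical helper).

    ASSUMPTION: a value on a shared cut point goes to the higher category,
    except 95%, which stays in category 5 because 6 means "over 95%".
    """
    if not 0 <= percent_engaged <= 100:
        raise ValueError("percent_engaged must be between 0 and 100")
    if percent_engaged > 95:
        return 6
    # bisect_right counts how many cut points are <= the value.
    return bisect.bisect_right(CUT_POINTS, percent_engaged)


# Usage: 14 of 28 students engaged is 50% of the class.
print(engagement_code(100 * 14 / 28))
```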

Mastery of Objectives and Student Interest Scales

In addition to the inquiry-based learning scales, two Likert-type scales were developed to measure student Mastery of Objectives and Student Interest. These scales consist of seven and four items, respectively, and use the same 0 to 6 scale as the student engagement in inquiry ratings: 0 indicates that fewer than 5% of the students are demonstrating mastery of the objectives or showing interest in the learning activity, and 6 indicates that more than 95% of the students are demonstrating content mastery or interest. The operational definitions for the Mastery of Objectives and Student Interest scales are:

  • Mastery of objectives. The degree to which a group of subjects demonstrates mastery of an objective or set of objectives, as measured through independent observations. The type of mastery includes basic content understanding, evaluating hypotheses, integrating new knowledge with prior content mastered, critiquing interpretations, identifying misconceptions, generating inquiries beyond the primary objectives, and reflecting on what was learned. This scale is not a measure of student enjoyment or interest.

  • Student interest. The observed level of listening, enthusiasm displayed, and active participation in a learning activity by a group of subjects. It includes the degree to which participants do not appear bored or frustrated with the learning process.

Procedure

Psychometric analyses of the instrument include inter-rater reliability comparisons, internal consistency analyses, factor analysis, and descriptive scale information. A report of item-level observational outcomes including the types of inquiry that occurred most frequently at the highest levels of inquiry in middle school math and science classrooms is provided, in addition to the types of inquiry that include the largest degree of active student participation.
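The procedure does not name the internal consistency statistic. Cronbach’s alpha is a common choice, so the Python sketch below assumes it, computing α = k/(k − 1) × (1 − sum of item variances / variance of total scores) over a matrix of observations by items. The function name and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows = observations, columns = items.

    Uses sample variances throughout; the n/(n-1) factors cancel, so
    population variances would give the same result.
    """
    if len(scores) < 2:
        raise ValueError("need at least two observations")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0-6 ratings on the four Student Interest items for five observations.
interest = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [6, 6, 5, 6],
]
print(round(cronbach_alpha(interest), 3))
```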

Results

Inter-Rater Reliability

There are 164 classroom observations in the dataset; however, these include duplicate classroom observations used in the inter-rater training phase of the study. Ninety-five independent observations with complete data on all variables were used in the psychometric analyses of the instrument validation phase. There were five raters (1 faculty member and 4 graduate assistants), with three training periods held during transitions in graduate student personnel. There were four levels of rater preparation.

  • First, raters were trained on each inquiry characteristic, describing examples of each and developing examples that would (and would not) be appropriate classifications of each characteristic.

  • Second, lesson scenarios were presented to each rater, and they identified which inquiry characteristics were being demonstrated and at what level.

  • Third, raters observed the same lesson being conducted in the schools, rated the lesson using the SIO, and compared ratings for each inquiry characteristic. Item-level differences in interpretation were identified, and additional training was completed where needed. Scale-level values were compared across rater groups using inter-rater consistency correlations.

During training, item-level ratings showed 62% perfect agreement for the inquiry implementation items (5-point rating scale) and 56% perfect agreement for the student engagement in inquiry items (7-point rating scale). The percentages of ratings within 1 point were 78 and 73% for the two inquiry scales, respectively. The average difference in ratings was 0.22 across the inquiry implementation items (5-point scale) and 0.31 across the student engagement items (7-point scale). Perfect agreement on the student interest and mastery of objectives scales (7-point scale) was 56 and 54%, respectively, with average differences of 0.22 and 0.05, and 90 and 79% of the ratings on these two scales were within 1 point. Pairwise inter-rater consistency correlations for the four scale scores of inquiry implementation, student engagement in inquiry, student interest, and mastery of objectives ranged from 0.85 to 0.98, 0.61 to 0.99, 0.74 to 0.95, and 0.39 to 0.83, respectively, during the training phase. Based on the inter-rater reliability evaluation, the observations of one of the five raters were removed from the remaining data analyses because of that rater’s low consistency with the other raters (resulting in 95 independent, usable observations).

Inquiry Processes Used in Science and Math Classes

The purpose of the instrument is to provide feedback to teachers and faculty development personnel on the types of inquiry learning processes being used at differing levels of inquiry implementation and on the percentage of students actively participating in each inquiry-related process. In this section, item-level data are presented. The teachers observed with the SIO instrument had participated in 1 to 5 years of a 4-week summer training program on incorporating inquiry-based learning in their classrooms. The inquiry data for the current sample may not be representative of a general population of teachers because of this inquiry training and because these teachers had a graduate student working with them 10 hours each week to implement inquiry learning processes. In addition, these teachers were engaged in a training model that integrated math and science content into lessons. The data show which types of inquiry occurred, and at what levels of implementation, when activities were facilitated in science versus math classrooms. Four figures are presented: the first two show Level of Inquiry Implementation (Fig. 1) and Level of Active Student Engagement in Inquiry (Fig. 2) for science classroom observations, and Figs. 3 and 4 show the same two sets of inquiry-based characteristics for math classrooms.
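The item-level percentages described here (the share of lessons at each level for an item, and the share in which the item occurred at any level) could be tabulated as in this Python sketch. It assumes each lesson receives one integer rating per item, with 0 meaning the item was not observed; the function names and example ratings are hypothetical.

```python
from collections import Counter


def level_distribution(ratings, levels=range(5)):
    """Percentage of lessons at each rating level for one inquiry item.

    Assumes 0 = not observed and higher values = higher levels (0-4 for
    inquiry implementation; pass range(7) for the 0-6 engagement ratings).
    """
    counts = Counter(ratings)
    return {level: 100 * counts[level] / len(ratings) for level in levels}


def pct_observed(ratings):
    """Percentage of lessons in which the item occurred at any level (rating > 0)."""
    return 100 * sum(r > 0 for r in ratings) / len(ratings)


# Hypothetical implementation ratings for one item across ten science lessons.
item_ratings = [0, 4, 4, 3, 1, 0, 2, 4, 0, 3]
print(level_distribution(item_ratings))  # share of lessons at each level
print(pct_observed(item_ratings))        # share of lessons with any occurrence
```

Running the same functions per item and per subject area yields the kind of per-level percentages that stacked bar charts of this sort display.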

Fig. 1 Percentage of inquiry characteristics in science classrooms at each level of inquiry

Fig. 2 Percentage of inquiry lessons in science classrooms classified by percentage of students actively engaged in the activity

Fig. 3 Percentage of inquiry lessons in mathematics classrooms at each level of inquiry

Fig. 4 Percentage of inquiry lessons in mathematics classrooms classified by percentage of students actively engaged in the activity

In the science classrooms, the most common inquiry activity implemented at any level of implementation was “manipulation of materials” (in 86% of lessons observed; see Fig. 1). The most common inquiry-based process used at the highest level of inquiry (where students initiate the action) was “asking questions to further understanding,” which occurred in 24% of the lessons. In a further 8% of science lessons, students facilitated the questioning after teacher prompting. The next three most common inquiry processes at the highest level of inquiry were “brainstorming,” “discussing interpretations in small groups,” and “critiquing others’ interpretations and responses,” with students initiating these processes in 20, 17, and 16% of the lessons, respectively. Brainstorming, small group discussion, and critiquing occurred in 29, 40, and 29% of lessons, respectively, at the “teacher initiates/student facilitates” level of inquiry. The inquiry-related processes used least in the science classrooms were the “creation of graphs and charts” and “use of scientific technology.”

Figure 2 provides data on the percentage of students actively engaged when each inquiry process was employed. The inquiry process used most commonly at the highest level of inquiry, “asking questions to further understanding,” did not actively engage a high percentage of the students in science classrooms (only 3% of lessons had 80–95% of the students actively engaged). In an additional 20% of the lessons, approximately one half of the students were actively engaged in “asking questions to further understanding.” Student engagement was much higher for the next three most commonly used inquiry processes, “brainstorming,” “discussing interpretations in small groups,” and “critiquing others’ interpretations/responses,” with 20 to 24% of the lessons having 80% or more of the students actively engaged. Whenever activities included working with hypotheses or predictions, a large proportion of the students were actively engaged, with 32 to 44% of the lessons having more than 80% of students participating. The activities that actively engaged the largest percentage of students in science classrooms were “manipulating active learning materials,” “visually representing concepts or data,” and “gathering/recording data.” Although the “use of scientific technology” and the “creation of graphs or charts” were the least used inquiry-related activities, they were the most successful at actively engaging 95% or more of the students whenever they were employed.

In the math classrooms, the most commonly used inquiry process at any inquiry level was “verbally interpreting outcomes” (in 87% of math lessons; see Fig. 3). The inquiry-based process used most frequently at the highest level of inquiry (where students initiated the learning) was “critiquing others’ interpretations,” observed in 5% of the lessons. When the two highest levels of inquiry were combined, the most common inquiry processes were “verbally interpreting outcomes” (in 40% of the lessons) and “manipulation of materials” (in 27% of lessons). The three characteristics most often observed at the highest inquiry levels in math were also three of the five most common high-inquiry characteristics in the science classrooms. Inquiry-related activities that occurred frequently in math classrooms but were typically facilitated by the instructor were “large group discussion,” “asking questions to further understanding,” “visually representing concepts or data,” and “gathering/recording data.” Processes that tended to have higher levels of inquiry implementation in math than in science included “creation of graphs/charts,” “visually representing concepts or data,” and “using scientific technology.”

The procedures most likely to actively engage 80% or more of the students in math classes were “visually representing concepts or data,” “gathering/recording data,” “creating graphs or charts,” “hands-on demonstrations,” and “manipulating materials.” Of the 62% of lessons in which students participated in “visually representing a concept or data,” 61% had 80% or more of the students actively engaged (see Fig. 4). Compared with science classrooms, “creating graphs or charts” involved a much higher percentage of active participants in math classrooms. Another noticeable difference was that hypothesis development, selection, and design were not used as frequently in math classes as in science classes; however, we were encouraged that 13–24% of the math lessons incorporated the testing of hypotheses and predictions. Many of our math teachers indicated that hypothesis development and testing are not as intuitive or easy to include in math lessons as in science lessons, and that they have to consciously work them into their lessons.

Scale Score Development for Level of Inquiry Implementation and Active Student Engagement in Inquiry

A secondary, research-oriented goal was to combine items into scale scores, which provide a mechanism for predicting outcomes such as student academic performance or for making group-level comparisons based on faculty development impacts or classroom differences. For this purpose, it was hypothesized that the sets of ratings could be averaged to obtain six inquiry-based scales along with the two measures of student interest and mastery of objectives. Before the scale score averages could be used, the reliability and validity of the hypothesized scales had to be assessed. Eight scales were initially hypothesized:

  • Level of inquiry implementation—hypothesis usage (five items)

  • Level of inquiry implementation—inquiry communication (five items)

  • Level of inquiry implementation—hands-on activities (six items)

  • Level of active student engagement—hypothesis usage (five items)

  • Level of active student engagement—inquiry communication (five items)

  • Level of active student engagement—hands-on activities (six items)

  • Mastery of objectives (seven items)

  • Student interest (four items)
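Averaging item ratings into subscale scores, as hypothesized above, might look like the following Python sketch. The item IDs (`hyp_1`, `comm_1`, `hands_1`, and so on) and the function name are hypothetical placeholders rather than the SIO's actual item stems, and the sketch assumes that unobserved items are recorded as 0 and included in the average.

```python
from statistics import mean

# Hypothetical item IDs standing in for the SIO item stems (not reproduced here).
SUBSCALE_ITEMS = {
    "hypothesis_usage": [f"hyp_{i}" for i in range(1, 6)],        # five items
    "inquiry_communication": [f"comm_{i}" for i in range(1, 6)],  # five items
    "hands_on_activities": [f"hands_{i}" for i in range(1, 7)],   # six items
}


def subscale_scores(item_ratings, subscale_items=SUBSCALE_ITEMS):
    """Average one lesson's item ratings into subscale scores.

    The same item stems are rated twice (level of inquiry implementation and
    level of active student engagement), so call this once per rating type.
    """
    return {
        name: mean(item_ratings[item] for item in items)
        for name, items in subscale_items.items()
    }


# Hypothetical implementation ratings for one lesson (0 = not observed).
lesson = {item: 0 for items in SUBSCALE_ITEMS.values() for item in items}
lesson.update({"hyp_1": 2, "hyp_2": 2, "hyp_5": 1, "comm_2": 1, "hands_1": 3, "hands_2": 3})
print(subscale_scores(lesson))
```

The Mastery of Objectives (seven items) and Student Interest (four items) scales could be added to the mapping in the same way.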

Initial reliability estimates were calculated using Cronbach’s alpha (see Table 1), based on the same 95 observations used for the item-level analyses. Internal consistency was acceptable for the Hypothesis Usage and Inquiry Communication items on the level of inquiry implementation scales (α = 0.91 and 0.82, respectively). The Hypothesis Usage subscale for the level of active student engagement had α = 0.90; however, the Inquiry Communication subscale for active student engagement was below desired levels (α = 0.66). The Hands-on Activities subscales for both level of inquiry implementation and active student engagement had insufficient internal consistency (α = 0.37 and 0.53, respectively); they are therefore not appropriate for use as scales and were not investigated further. The Hands-on items are recommended only for item-level analyses. The internal consistency of the Mastery of Objectives and Student Interest subscales was 0.78 and 0.85, respectively. Average scale scores and standard errors are included in Table 1.
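For reference, Cronbach’s alpha for a subscale of k items is α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The Python sketch below implements that standard formula; the function name and example ratings are hypothetical, and it is not the software used in the study. A commonly cited rule of thumb treats values of about 0.70 or higher as acceptable, which is consistent with the 0.66, 0.53, and 0.37 values above being judged insufficient.

```python
from statistics import variance


def cronbach_alpha(observations):
    """Cronbach's alpha for one subscale.

    observations: one row per classroom observation, each row holding the k item
    ratings of that subscale. Sample variances are used throughout.
    """
    k = len(observations[0])
    if k < 2 or len(observations) < 2:
        raise ValueError("need at least two items and two observations")
    item_variances = [variance(column) for column in zip(*observations)]
    total_variance = variance([sum(row) for row in observations])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four observations x three items.
rows = [[0, 1, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
print(round(cronbach_alpha(rows), 3))
```

Because items that rarely vary together contribute item variance without adding much covariance to the total score, subscales built from loosely related activities (such as the Hands-on items) tend to produce low alphas.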

Table 1 Descriptive statistics of inquiry-related item groupings for scale-level use (N = 95)

Factor Analysis

Confirmatory factor analyses were conducted to empirically test the theoretical groupings. The original set of 16 items was built from a theoretical model with three sections (Hands-on Activities, Hypothesis Usage, and Inquiry Communication). Because the Hands-on Activities sections for both inquiry implementation and student engagement had insufficient internal consistency, a two-factor model (Hypothesis Usage and Inquiry Communication) was fit to the level of inquiry implementation ratings and repeated for the level of student engagement ratings. These sets of scales were analyzed separately because level of inquiry implementation and level of active student engagement were measured using the same set of item stems (non-independent). Two additional confirmatory factor analyses were conducted for the remaining two scales, Student Interest and Mastery of Objectives, to investigate whether each met unidimensional scale requirements as hypothesized.

All confirmatory factor analyses were conducted in Mplus using robust maximum likelihood estimation, as recommended for Likert-type items. Because of the small sample size with complete data on all items (N = 95), Hu and Bentler’s (1998) model fit recommendations for small samples (e.g. N ≤ 250), CFI/TLI > 0.95 and SRMR < 0.09, were used. The two-factor confirmatory model for the 10 level of inquiry implementation items (Hypothesis Usage and Inquiry Communication) met the recommended fit criteria, with CFI = 0.953 and SRMR = 0.084. The initial two-factor model for the 10 level of student engagement items did not meet both criteria (CFI = 0.909, SRMR = 0.070) and required one modification, allowing correlated errors between two items, to reach acceptable fit index values (CFI = 0.953; SRMR = 0.052).
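The following Python sketch shows the textbook comparative fit index (Bentler, 1990) computed from model and baseline chi-square values, and applies the CFI > 0.95 and SRMR < 0.09 cutoffs to the fit indices reported in the preceding paragraph. The function names are hypothetical, and the CFI function is the plain, non-robust form: Mplus’s robust (MLR) version applies scaling corrections that this sketch does not reproduce.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline (independence) chi-squares.

    Plain non-robust form; robust MLR versions apply scaling corrections
    that are not reproduced here.
    """
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_baseline == 0 else 1.0 - d_model / d_baseline


def meets_cutoffs(cfi_value, srmr_value, cfi_min=0.95, srmr_max=0.09):
    """Apply the CFI > 0.95 and SRMR < 0.09 criteria used in this study."""
    return cfi_value > cfi_min and srmr_value < srmr_max


# Fit indices reported for the inquiry implementation and student engagement models.
reported = {
    "implementation, two-factor": (0.953, 0.084),
    "engagement, initial": (0.909, 0.070),
    "engagement, one correlated error": (0.953, 0.052),
}
for model, (c, s) in reported.items():
    print(model, meets_cutoffs(c, s))
```

The initial engagement model fails only on CFI, which is why a single residual correlation was enough to bring it within both criteria.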

We recommend that the items representing Hypothesis Usage and Inquiry Communication be used as scale scores for both the level of inquiry implementation and the level of student engagement. The Hands-on Activities components of the two inquiry measures are not recommended as scale scores and should be used only as individual items providing evaluation feedback on the usage and level of inquiry associated with these activities in the classrooms.

The third and fourth confirmatory factor analyses were conducted for the Student Interest and Mastery of Objectives scales. The four items of the Student Interest scale exhibited fit indices of CFI = 0.982 and SRMR = 0.036, indicating strong model fit. The seven items of the Mastery of Objectives scale had poor fit in the initial confirmatory factor analysis (CFI = 0.629, SRMR = 0.156). Including the recommended modifications, which allowed three pairs of items to have correlated residuals, raised the fit indices to acceptable levels (CFI = 0.951, SRMR = 0.050).

Inquiry Subscale Comparisons for Science and Math Classes

Scale score averages for the Level of Inquiry Implementation, Active Student Engagement, Student Interest, and Mastery of Objectives scales were computed for science and math classes (see Table 2). At first inspection, averages appear low for Level of Inquiry Implementation; however, all types of inquiry activities are rarely included in a single 45- to 75-min instructional session. Averages therefore commonly include ratings of 0 for many of the characteristics, so an overall average greater than 1 can represent a classroom where some inquiry activities occurred at a fairly high level.
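A worked example with hypothetical ratings (not drawn from the study’s data) makes this concrete. Suppose a lesson is rated 4 and 3 on two of the five Hypothesis Usage implementation items and 0 on the remaining three, which were not observed. Its subscale score is

```latex
\bar{x}_{\text{Hypothesis Usage}} = \frac{4 + 3 + 0 + 0 + 0}{5} = \frac{7}{5} = 1.4
```

so an average only slightly above 1 is compatible with a lesson in which some activities reached the top of the implementation scale.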

Table 2 Inquiry implementation and active student engagement scale score descriptive statistics

The effect size difference between science and math classes on the Inquiry Implementation of Hypothesis Usage subscale was d = 0.36, with science classrooms implementing hypothesis development and usage at a higher level of inquiry than math classrooms. This corresponded with teacher reports that hypothesis development and testing were more easily integrated into science classrooms, whereas many of the hands-on and data manipulation activities were easier inquiry procedures to implement in math classrooms. A similar effect size difference (d = 0.42) favored science over math classrooms on the Implementation of Inquiry Communication subscale. Student Engagement in Hypothesis Usage was also higher in science classrooms than in math classrooms (d = 0.45); however, the difference in Student Engagement in Inquiry Communication was small (d = 0.16). On the Student Interest scale, the effect size difference between math and science classrooms was d = 0.60, with observed interest rated higher in the science classes. There was a small difference (d = 0.28) between math and science classrooms on the Mastery of Objectives scale, with science classrooms rated only slightly higher on demonstrated content mastery.
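The text does not state which form of d was computed. The Python sketch below assumes the common pooled-standard-deviation form of Cohen’s d (difference in means divided by the pooled SD); the function name and the example scale scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a - group_b) using the pooled SD.

    The pooled-SD form is an assumption; other variants (e.g. Hedges' g)
    would give slightly different values.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical Hypothesis Usage scale scores for science and math lessons.
science = [1.2, 0.8, 1.6, 0.4, 1.0]
math_lessons = [0.6, 1.0, 0.4, 0.8, 0.2]
print(round(cohens_d(science, math_lessons), 2))
```

A positive value means the first group (science here) scored higher, matching the direction of the comparisons reported above.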

The last descriptive measure provided is a comparison of the relationships among the six subscales for the sample (see Table 3). There was a strong relationship between Inquiry Implementation of Hypothesis Usage and Student Engagement in Hypothesis Usage (r(93) = 0.87, p < .05) and a moderate relationship between Implementation of Inquiry Communication and Student Engagement in Inquiry Communication (r(93) = 0.60, p < .05). The Student Interest scale was expected to correlate with the two Student Engagement scales, but the relationships were weak (r(93) = 0.24 and 0.39, p < .05, for Hypothesis Usage and Inquiry Communication, respectively), indicating that these scales overlapped in content less than initially hypothesized. Mastery of Objectives had moderate relationships with both sets of Hypothesis Usage and Inquiry Communication scales, and the Hypothesis Usage and Inquiry Communication scales were more strongly related to the Mastery of Objectives scale than to the Student Interest scale. Higher usage of inquiry, in both level of implementation and student engagement, was more likely to occur alongside explicit demonstrations of student mastery of the objectives.

Table 3 Relationships among the four inquiry-related scale scores and two student-centered scales

Discussion

The number of faculty development programs focused on inquiry-based learning strategies for K-12 classrooms is large, including broad educational initiatives such as the National Science Foundation’s Graduate-K12 (GK12), Science of Learning Centers (SLC), Math and Science Partnerships (MSP), and Computer and Information Science and Engineering (CISE) programs, which have funded numerous science, technology, engineering, and math projects incorporating scientific inquiry methodologies. Different types of faculty development, and the data needed to assess their inquiry implementation, can call for measurement instruments with different foci. Many inquiry-related observational instruments are available for varying purposes. The SIO instrument was developed to obtain information about the level of implementation of specific types of inquiry processes, based on the degree to which students were active facilitators of the learning environment as described by Schwab (1962). This differs slightly from available instruments that focus on frequency of occurrence or on an aggregated or holistic perception of the classroom. Item-level analyses using the SIO have given our faculty development team a detailed understanding of the types of activities our teams are creating and of how effectively they are implemented at varying inquiry levels in different types of classrooms.

In our study, the inquiry process that students most often initiated at the highest level of inquiry was “asking questions to further understanding,” which was observed in 24% of the science lessons. An additional 8% of science lessons had students successfully facilitating the questioning after teachers initiated it. Unfortunately, fewer than 50% of the students tended to be independently engaged in asking questions to further understanding. The next three most common processes students initiated in science classes were brainstorming, small group discussion of outcomes, and critiquing conclusions of results, and the proportion of students actively engaged with these three processes was higher than that for “asking questions to further understanding.” Students did not commonly initiate certain hypothesis-related components in science classrooms, such as selecting hypotheses to be tested, designing procedures for testing hypotheses, or testing hypotheses and/or conclusions. This behavior is consistent with the difficulties reported by Nadelson (2009) and theorized by Schwab (1962) in training students to initiate the scientific research process within the classroom environment. The highest levels of student engagement occurred in gathering and recording data, visually representing concepts or data, and manipulating active learning materials. We recommend that, as students become more comfortable with these processes under more directed learning techniques, they be pushed to generate, select, and test hypotheses based on the data they have collected and visually presented. The SIO is designed to provide feedback on progress from a direct toward a more indirect learning environment that models the scientific method.

Asking questions to further understanding was also common in the mathematics classrooms, but at a much lower level of inquiry than in science. Gathering and recording data, visually representing concepts or data, and large group discussion appeared to be facilitated by the instructor in math classrooms rather than initiated by students; however, these processes engaged a large proportion of the students, typically greater than 80%. Creating graphs or charts was used more frequently, and engaged a larger proportion of the students, in math classes than in science classes. As in science, hypothesis development, selection, and design were not common; however, frequencies of 13 to 24% in math classrooms were encouraging given teachers’ comments on the perceived difficulty of training students to develop and test hypotheses addressing math concepts. Also encouraging was the degree of relationship between students demonstrating mastery of the content objectives and the use of inquiry methods: the higher the level of inquiry used and the greater the proportion of students engaged, the more likely it was that explicit demonstrations of content mastery were observed. These relationships were stronger than those with perceived student interest.

Teacher mastery of how to facilitate varying types and levels of inquiry learning processes is instrumental in building the student scientific reasoning skills that are a crucial component of coherent math and science curricula. Training teachers to use inquiry learning processes in which students are engaged at higher levels of inquiry can be difficult, and the SIO is designed to obtain data on progress from teacher-led to student-led inquiry implementation. As the SIO instrument continues to be used, further analyses will be conducted to monitor the relationships between the subscales of items on the four primary inquiry scales and how the results correlate with student achievement outcomes and with students’ ability to use scientific reasoning skills to investigate real-world problems. Of particular interest will be how the instrument functions with a more diverse population of teachers, including teachers who use higher levels of inquiry and teachers without inquiry-based faculty development training who may use few inquiry-based learning procedures. Currently, the use of the Inquiry Implementation of Hypothesis Usage, Implementation of Inquiry Communication, Student Engagement in Hypothesis Usage, and Student Engagement in Inquiry Communication scales is supported by the internal consistency and factor analysis results. However, further study is needed to provide evidence of their relationship to secondary inquiry measures. The individual item analyses can provide valuable information about the level of inquiry observed based on the way a lesson is facilitated in the classroom, demonstrating to trainees how the same lesson plan can be implemented at a high level of inquiry with one group of students and at a low level of inquiry with another. In addition, item-level results provide useful information about the types of inquiry observed across the different inquiry learning activities that have been developed and across classrooms that cover different content fields.

Lessons Learned

We learned during the testing of the SIO that two common components of inquiry-based learning (large group discussion and writing summaries of results) are not strongly correlated with other inquiry-related activities, although they are considered useful tools in inquiry classroom environments. This was observed both in the pilot study and in the current validation study. Large group discussion appears to be common in traditional lecture environments as well, so higher values for large group discussion were associated with both high and low levels of other types of inquiry communication. Writing summaries of results likewise did not correlate highly with other inquiry communication activities. Writing summaries was the least frequent activity observed in math classrooms and one of the least common in science classrooms, and all occurrences were at low levels of inquiry. The resulting lack of variation in responses may contribute to the low correlations, and this outcome could change in samples in which this activity is used more often at higher inquiry levels.

We also learned it is possible to create lessons where students engage at the highest level of inquiry on all ten hypothesis usage and inquiry communication items. One observation obtained an average of 4.0 on both sets of Inquiry Implementation of Hypothesis Usage and Implementation of Inquiry Communication items. A review of the data indicated that this was not an error, and in fact the classroom was rated a 0 on the hands-on activity components. This was an activity in which students were presenting and discussing results from experiments they had designed, with the audience providing critiques and suggesting follow-up experiments that could be used to test their results. This activity was adopted as a model lesson for guiding students to the highest level of inquiry initiation.

It will be important to triangulate SIO scores with secondary measures of classroom inquiry, student engagement, student interest, and mastery of objectives. Recommendations include investigating the relationship of SIO inquiry scores with both content-specific student academic data and student-level inquiry outcome measures, such as Chang, Chen, Guo, Cheng, Lin, and Jen’s (2011) Competency in Scientific Inquiry and Competence in Communication measures. These relationships would provide evidence of whether increased levels of observed inquiry in the classroom are related to students’ effectiveness in participating in the inquiry process, communicating their findings, and ultimately mastering content. In general, the SIO instrument is a publicly available option for obtaining feedback on specific types of inquiry implementation. Its feedback is based on the level of inquiry implemented rather than frequency of occurrence, and it includes measures of active student engagement, interest, and mastery of objectives.