Introduction

Reform of science education in the United States is a difficult, multifaceted task. Within the realm of science education reform, emphasis rests heavily on the importance of scientific inquiry experiences for K-12 learners. The National Science Education Standards (NSES; National Research Council [NRC], 1996), a current reform document aimed at improving scientific literacy for all, strives to achieve this challenging goal by emphasizing an approach to teaching and learning about science that highlights scientific inquiry as a prominent feature. The National Committee on Science Education Standards and Assessment (NRC) has asserted that students should “engage in aspects of inquiry as they learn the scientific way of knowing the natural world, but they should also develop the capacity to conduct complete inquiries” (p. 23). The problem, however, is that many teachers report that they have never experienced teaching or learning science as inquiry (Kleine et al., 2002; Windschitl, 2002). This lack of experience is particularly common at the elementary level.

If science reform is going to be successful and our elementary children are to be provided with effective science instruction, preservice teachers must first be provided with opportunities to experience success as learners of science in reform-oriented contexts (Enochs & Riggs, 1990; Kleine et al., 2002; Riggs, 1988). They themselves must experience first-hand how learning science as inquiry takes place within an elementary school setting (Windschitl, 2002). Based on the idea of Bandura's (1977) social learning theory, if preservice teachers can experience success within a science methods course, then they are more likely to model effective instruction within their own elementary classroom, which, in turn, may promote the success of their elementary students in the area of science. Thus, teacher preparation experiences have become a logical target for change.

While teachers should utilize an arsenal of different strategies when teaching science, the NSES (NRC, 2000) described five essential features of classroom inquiry that apply across all grade levels:

  1. Learner engages in scientifically oriented questions,
  2. Learner gives priority to evidence in responding to questions,
  3. Learner formulates explanations from evidence,
  4. Learner connects explanations to scientific knowledge, and
  5. Learner communicates and justifies explanations. (p. 29)

These essential features introduce important aspects of science to students while simultaneously helping them develop knowledge of specific science concepts. Within each of the five essential features of classroom inquiry, there are variations that fall along a continuum of inquiry experiences. To determine whether an experience is categorized as full or partial inquiry, one must consider the amount of student and teacher involvement. Partial inquiries have greater teacher involvement and less student involvement, whereas full inquiries have greater student involvement and less teacher involvement. For example, when students are engaging in scientifically oriented questions, the amount of inquiry will vary depending on the origin of the question. If the question was presented by the teacher, the amount of inquiry will be less than if the students developed their own questions to investigate. In essence, within any classroom, a lesson may be more or less student directed, depending on the variation of the features implemented.

Given the nature of teaching and learning science as inquiry, it is important to point out that this type of learning and teaching is not a neat and tidy process. The NRC (2000) described a more specific set of variations to encompass inquiry learning. The NRC proposed a definition that is derived from “…the abilities of inquiry, emphasizing questions, evidence, and explanations within a learning context” (p. 24). At the center of this definition are the five essential features of classroom inquiry.

These features draw attention to students engaging in scientifically oriented questions and giving priority to evidence when formulating explanations. In addition, students evaluate their explanations in light of alternative explanations and then communicate their proposed explanations to others (NRC, 2000).

Although scientific inquiry can be defined and described in a number of ways, for the purposes of this research, we use the definitions set forth by the NRC (2000). Our definition draws upon the essential features of classroom inquiry as the essence of inquiry, particularly the notion of giving priority to evidence and explanation.

Theoretical Underpinnings

In this section, we discuss how previously established research provided a foundation for this project. These areas include the importance of scientific inquiry experiences, social learning theory, self-efficacy, and self-efficacy within the context of science education. The focus of this research on beliefs is a result of the relationship between attitude and subsequent behavior (Bandura, 1986). Bandura's theory of social learning describes two dimensions of efficacy beliefs—personal self-efficacy and outcome expectancy—upon which behavior is based. As explained by social learning theory, if teachers do not have successful experiences teaching or learning science as inquiry, it is unlikely that these teachers will implement science as inquiry in their elementary science classrooms (Bandura, 1977).

Factors Driving Science Education Reform

Given the current trends in contemporary science education reform, science as inquiry is considered to be an important part of the restructuring. A rich learning environment, with a focus on inquiry-based learning, creates opportunities for children to internalize or transform new information, which then allows students to “…create and expand their individual cognitive structures” (Lee & Krapfl, 2002, p. 250). This type of learning supports conceptual understanding as opposed to rote learning of science concepts and facts (Kleine et al., 2002; Lee & Krapfl, 2002).

Children should know how to pursue their own questions about the world around them. This pursuit, however, does not happen naturally in the classroom, and students will need to be supported in their attempts to understand phenomena. When science is taught through the process of inquiry, children will have the opportunity to pose questions and seek answers based on observation and exploration. Students can then use the evidence gathered throughout this process to answer their own questions, as well as future questions that may arise. Inquiry allows students the opportunity to explore, yet simultaneously requires them to learn something about how science is done (Drayton & Falk, 2001; NRC, 2000).

This inquiry approach to teaching and learning science allows the teacher to become more facilitative in their instruction and the students to become more self-directed. As a result of this shift from a more teacher-centered classroom to a more student-centered classroom, students are able to establish “long-term conceptual understandings of science” (Kleine et al., 2002, p. 39). There is substantial empirical and theoretical evidence to support the assertion that inquiry-based science instruction is a starting point for personal construction of meaning and can also lead to higher achievement for students (Anderson, 1997; Freedman, 1997; Von Secker & Lissitz, 1999). As noted in theoretical claims supported by empirical evidence, “…greater emphasis on inquiry-based teaching is associated with higher science achievement” (Von Secker, 2002, p. 156). Scientific inquiry introduces students to the content of science, as well as the processes of investigation. “It provides the logical framework that enables students to understand scientific innovation” (Drayton & Falk, 2001, p. 25).

Although the push for science as inquiry within elementary classrooms is enormous, it has yet to become a consistent feature of science classroom practice (Damnjanovic, 1999; Drayton & Falk, 2001; Weiss, Banilower, McMahon, & Smith, 2001; Wells, 1995; Windschitl, 2002). One possible reason for the lack of inquiry teaching in the elementary science classroom is a mismatch between teacher beliefs and the context of science (Windschitl, 2002). The problem with the current teaching of science is that it fails to reflect the changes in science that have occurred over the years; the process of science and the teaching of science have drifted apart. Although many classrooms today give the illusion that inquiry is provided, such lessons typically involve structured procedures, with the results provided through the textbook as well as through the classroom experience (Drayton & Falk, 2001; Schwab, 1962). The reason for this lack of visibility lies largely in the conceptions of teachers (Windschitl, 2002). Many teachers believe that teaching science as inquiry is very difficult and cumbersome to implement and manage within classroom practice. Teachers believe that implementation is obstructed by constraints such as time and money. They also feel that teaching science as inquiry “is possible only with above-average students” and, therefore, do not attempt to integrate inquiry into their regular education classrooms (Lee & Krapfl, 2002; as cited in Windschitl, 2002, p. 115).

Although some of these reasons for the lack of science as inquiry experiences for elementary children may be viable, teachers themselves need to feel confident utilizing inquiry, both as learners and as teachers, so students can learn to participate in the processes of science (Kleine et al., 2002; Windschitl, 2002). One vehicle for achieving this level of confidence is through the investigation of how self-efficacy may impact teacher practice.

Social Learning Theory and Self-Efficacy Defined

One way to explain why some teachers choose to eliminate science instruction from their daily routine is through social learning theory. Bandura's social learning theory suggests that “people develop a generalized expectancy concerning action-outcome contingencies based upon life experiences” (Riggs, 1998, p. 2). “The strength of peoples' convictions in their own effectiveness is likely to affect whether they will attempt to cope with given situations; hence, perceived self-efficacy influences choice of behavior settings” (Bandura, 1977, p. 193). When individuals “…judge themselves as capable of handling situations that would otherwise be intimidating, they become involved in activities and behave assuredly, however when situations exceed one's own coping skills, individuals tend to fear and avoid” these difficult situations (Bandura, p. 194).

Efficacy expectations are presumed to influence levels of performance. Research also has indicated the predictive power of one's sense of self-efficacy on subsequent performance: “Efficacy expectations are a major determinant of people's choice of activities, how much effort they will expend, and of how long they will sustain effort in dealing with stressful situations” (Bandura, 1977, p. 194).

Developing self-confidence as a teacher of science is crucial, especially for preservice elementary teachers. In a national survey, Weiss et al. (2001) indicated that elementary teachers teach science an average of 25 minutes per day as opposed to about 114 minutes per day for reading and language arts. Several reasons are given to explain this phenomenon, including “lack of a strong background in science content, inadequate facilities and equipment, the congested curriculum, poor instructional leadership, and teacher attitude” (Enochs & Riggs, 1990, p. 694). Although elementary teachers are responsible for teaching all content areas, it is often noted that many elementary teachers do not feel comfortable teaching science (Martin, 2000). The difference between teachers who allow for more science instruction within their elementary classrooms and those who do not may be related to their self-confidence. Those teachers who do not believe in their ability to teach science (low self-efficacy) may avoid science instruction whenever possible (Enochs, Scharmann, & Riggs, 1995). Given the relationship among beliefs, attitudes, and behavior in elementary science teaching, efficacy beliefs are potentially powerful variables that can influence the amount of science instruction time, as well as the achievement of students in science, at the elementary level.

Because Bandura (1981) described self-efficacy as a situation-specific construct, we chose to create an instrument that measures self-efficacy in regard to the teaching of science as inquiry. Although there are many instruments used to measure teacher self-efficacy in general terms, specificity was necessary within this study. Teacher efficacy beliefs appear to be dependent upon the specific teaching situations (Riggs & Enochs, 1990). Because elementary teachers typically teach all subjects—and may not feel equally effective at teaching all of them—a subject-specific instrument was more informative for the purposes of this study.

As noted earlier, Bandura (1977) asserted that the most complete prediction of human behavior can be derived from knowledge of both self-efficacy and outcome expectancy variables. Personal self-efficacy is “a judgment of one's ability to organize and execute given types of performances, whereas an outcome expectation is a judgment of the likely consequence such performances will produce” (p. 21). These principles underlie the development of the instrument in this study. The instrument we created was based on contemporary ideas about inquiry, as well as grounded in the fundamental ideas of Bandura, particularly the notion of self-efficacy being a context-specific construct.

The level of motivation an individual has for a given situation, their associated feelings toward the situation, and their subsequent behaviors are “based more on what they believe, rather than on what is objectively true” (Bandura, 1997, p. 2). “Unless people believe they can produce desired effects by their actions, they have little incentive to act” (Bandura, 1997, p. 3). Bandura (1995) explained the varying effects of perceived sense of efficacy as follows:

A strong sense of efficacy enhances human accomplishment and personal well-being in many ways. People with high assurance in their capabilities in given domains approach difficult tasks as challenges to be mastered rather than as threats to be avoided. Such an efficacious outlook fosters intrinsic interest and deep engrossment in activities. These people set themselves challenging goals and maintain strong commitment to them. They heighten and sustain their efforts in the face of difficulties. They quickly recover their sense of efficacy after failures or setbacks. They attribute failure to insufficient effort or to deficient knowledge and skills that are acquirable. They approach threatening situations with assurance that they can exercise control over them. Such an efficacious outlook produces personal accomplishments, reduces stress, and lowers vulnerability to depression. (1995, p. 11)

On the other hand,

People who have a low sense of efficacy in given domains shy away from difficult tasks, which they view as personal threats. They have low aspirations and weak commitment to the goals they choose to pursue. When faced with difficult tasks, they dwell on their personal deficiencies, the obstacles they will encounter, and all kinds of adverse outcomes rather than concentrate on how to perform successfully. They slacken their efforts and give up quickly in the face of difficulties. They are slow to recover their sense of efficacy following failure or setbacks. Because they view insufficient performance as deficient aptitude, it does not require much failure for them to lose faith in their capabilities. They fall easy victim to stress and depression. (1995, p. 11)

These quotes from Bandura support the factors previously discussed that contribute to the omission of science in the elementary classroom. Understanding the foundation upon which these quotes were built will help educators to more thoroughly understand why teachers choose to eliminate science from their curriculum and how these obstacles can be overcome.

Self-Efficacy Within the Context of Science Education

To more fully address the idea that teacher efficacy is context specific, Ashton, Buhr, and Crocker (1984, as cited in Tschannen-Moran & Woolfolk-Hoy, 2001) developed a series of vignettes. These vignettes described situations a teacher would be likely to encounter and required the teacher to make judgments regarding their effectiveness in responding to the situations (Tschannen-Moran & Woolfolk-Hoy, 2001). A sample item follows:

A small group of students is constantly whispering, passing notes and ignoring class activities. Their academic performance on tests and homework is adequate and sometimes even good. Their classroom performance, however, is irritating and disruptive. How effective would you be in eliminating their disruptive behavior? (p. 788)

Although this instrument was useful in that it addressed the assumption that teacher efficacy is context specific, it did not receive wide acceptance and has not been used in any other research studies.

Drawing upon this work, Gibson and Dembo (1984) developed and validated the Teacher Efficacy Scale (TES), a questionnaire of 30 items using a 6-point Likert-type scale. This instrument was designed to measure teachers' self-efficacy beliefs by addressing the areas of their effort, skill, training, and experience. While the TES was believed to measure both general teaching efficacy (GTE) and personal teaching efficacy (PTE), other research did not substantiate this distinction. The TES was criticized for not clearly capturing the dimension of personal efficacy as described by Bandura's definition of the self-efficacy construct. In addition, the TES instrument developed by Gibson and Dembo was a global measure of the two efficacy factors. This caused concern because it was not consistent with Bandura's conception of efficacy as a situation-specific construct. This concern later encouraged others to further develop situation-specific self-efficacy instruments.

To add to the literature base on self-efficacy with attention to Bandura's description of a situation-specific construct, Riggs (1988) extended the work of Gibson and Dembo (1984). Riggs developed and validated an instrument to measure teachers' personal self-efficacy and outcome expectancy beliefs for science teaching and learning. This instrument was entitled the Science Teaching Efficacy Belief Instrument (STEBI), also referred to as the STEBI-A.

This work by Riggs was further extended by Enochs and Riggs (1990) to address preservice elementary science teachers' self-efficacy. This instrument was entitled Science Teaching Efficacy Belief Instrument for Prospective Teachers (STEBI-B). To accomplish this task, the Riggs (1988) “STEBI-A was modified from an inservice orientation to that of preservice” (Enochs & Riggs, p. 696). Items were reworded in the future tense to allow “…for the construct to be viewed in a different situational context” (Enochs & Riggs, p. 696). Both the STEBI-A and the STEBI-B have become widely used in science education to inform teacher educators about the science beliefs of prospective teachers.

Although the Science Teaching Efficacy Belief Instrument for Prospective Teachers (STEBI-B) has been determined to be a valid and reliable instrument used to investigate preservice elementary science teachers' self-efficacy, it does not measure teaching efficacy in teaching science as inquiry. For example, the following statements, “I understand science concepts well enough to be effective in teaching elementary science” and “When a student has difficulty understanding a science concept, I am usually at a loss as to how to help the student understand it better” (Riggs & Enochs, 1990, p. 635) seem like effective statements for measuring science teaching; however, they do not capture the essence of scientific inquiry. Although this measure was a reflection of how science was taught during its time period, it does not capture the essential features of classroom inquiry. This measure was developed before reform documents were published; therefore, it is not a reflection of contemporary education reform.

In addition, the statements “I find it difficult to explain to students why science experiments work” and “I am typically able to answer students' science questions” again do not support the process of scientific inquiry (Riggs & Enochs, 1990, p. 635). These statements imply that the teacher must explain to students the many phenomena that exist in the world of science and that every experiment has a certain procedure that must be followed to arrive at the correct answer. These statements do not draw attention to students engaging in scientifically oriented questions and giving priority to evidence when formulating explanations. Furthermore, the statements do not involve students in evaluating their explanations in light of alternative explanations and then communicating their proposed explanations to others. Although the STEBI was a useful tool in its time, the current standards associated with contemporary science education reform require a new instrument to fit the ever-changing complexities of science education.

More recently, Tschannen-Moran and Woolfolk-Hoy (2001) began work on a new measure of efficacy. This new measure, named the Ohio State Teacher Efficacy Scale (OSTES), is said to be “…superior to previous measures of teacher efficacy” (p. 801). This statement was based on the assertion that the measure captures a broad range of capabilities necessary for good teaching, yet is not so specific that it becomes useless for comparing teachers in different domains or contexts. The OSTES is considered to be reasonably valid and reliable. The researchers have asserted that this measure “…moves beyond previous measures to capture a wide range of teaching tasks” (p. 801). In addition, this instrument is unique in that it addresses a broad range of teaching responsibilities, including assessment, meeting individual student needs, motivating student engagement and interest, and addressing student misconceptions.

Significance of the Study

Although various researchers have set out to improve teacher education in admirable ways, few of these attempts have involved the relationship between self-efficacy beliefs and practice. Given Bandura's (1977, 1986, 1995) assertions that self-efficacy beliefs have a powerful influence over one's behavior, it is important for teacher educators to further investigate the influence of teacher beliefs on classroom practice. Because of the relationship among beliefs, attitudes, and behavior, the purpose of this study was to create an instrument that measures preservice teachers' self-efficacy in regard to the teaching of science as inquiry. At present, no instrument measures self-efficacy and its impact on the teaching of science as inquiry, despite the current trends in science education reform and the renewed interest in inquiry. Our goal, therefore, was to contribute to the science education literature by addressing where self-efficacy and inquiry science teaching connect.

As such, by developing, validating, and establishing the reliability of an instrument to measure self-efficacy beliefs of preservice elementary teachers with regard to the teaching of science as inquiry, the current reform efforts will also be addressed. We hope this instrument will provide a foundation through which researchers can identify certain individuals and investigate the connections between beliefs and actual teaching behaviors and classroom practices. Through the completion of more extensive research coupled with this instrument, science educators may come to more clearly understand the connection between teacher beliefs and the teaching of elementary science as inquiry.

Method

Participants

Data were collected from elementary education majors during their senior year at a central Pennsylvania university. Participants were enrolled in a science methods course during the time of data collection. The Teaching Science as Inquiry (TSI) instrument was administered to 190 prospective elementary school teachers in six sections of a science methods course during the week of September 8, 2003, and then again during the week of December 1, 2003, within a university classroom setting. This group of participants represents the intended target population for the final instrument. The intended population would, therefore, include preservice elementary science teachers and beginning practicing science teachers.

Development of the TSI

A 13-step process, described below, was used to develop the TSI and to build validity and high reliability into the instrument.

  • Step 1: Defining the Construct. According to Graziano and Raulin (2000), a construct “is an idea constructed by the researcher to explain events observed in a particular situation. They are explanatory fictions because, in most cases, we do not know the real reason for a particular event. Once formulated, constructs are used as if they are true to predict relationships between variables in situations that had not previously been observed” (p. 419).

In this study, the instrument to be developed measured preservice teachers' self-efficacy in regard to the teaching of science as inquiry. Specifically, the two dimensions of self-efficacy to be measured are personal self-efficacy and outcome expectancy as defined by Bandura (1977).

Teaching and learning science as inquiry as recognized by the National Science Education Standards (NRC, 2000) involves five essential features:

  1. Learner engages in scientifically oriented questions,
  2. Learner gives priority to evidence in responding to questions,
  3. Learner formulates explanations from evidence,
  4. Learner connects explanations to scientific knowledge, and
  5. Learner communicates and justifies explanations. (p. 29)

  • Step 2: Item Preparation, Version 1. The first phase involved the development of a preliminary instrument to assess preservice teachers' self-efficacy in regard to inquiry science teaching. To develop this preliminary version of the instrument, the researchers utilized the text Inquiry and the National Science Education Standards (NRC, 2000). To devise items representative of the construct to be studied, the researchers first developed Version 1 of the TSI. This version of the instrument was composed of a broad set of 31 statements to capture the nature of teaching science as inquiry. Example statements included “allowing student interest to guide the curriculum” and “provide opportunities for students to discuss the experiments in which they participated.” Although the items represented in Version 1 of the instrument followed the standards set forth by the NSES (NRC), they were only a broad set of items that clearly needed to be more thoroughly developed.

  • Step 3: Content Validity, Version 1. A panel composed of six faculty members from the University of Florida, The Pennsylvania State University, and the University of Missouri and three graduate students from The Pennsylvania State University, representing the areas of science education and self-efficacy research, was assembled for the purpose of judging the content validity of Version 1 of the instrument. Each member of the panel had science classroom experience at the elementary school level, the middle school level, or both. Independent preliminary feedback from each member of the panel was collected and used as a means to revise the items. Overall, two broad themes emerged: basic grammatical revisions and content revisions. To address these themes, many of the 31 statements were revised to become more specific and clearer. The most significant revision made to Version 1 was that several statements were added to the instrument. The goal in adding these items was to more thoroughly represent the ideal of teaching science as inquiry. The panel also suggested including items that represent the five essential features of classroom inquiry. As a result of the activities identified in the first three steps, items from Version 1 were revised. This provided a basis for final item preparation for the next phase of the development process.

  • Step 4: Content Validity, Version 2. Upon completion of Version 2 of the instrument, a letter that explained the review process and the 81 draft items of Version 2 were submitted to the panel of nine experts. Directions, as well as a definition of “inquiry,” were provided with the instrument to ensure that all reviewers responded to the instrument in the same manner. The panel reviewed each of the items independently for clarity and comprehension. They were also asked to categorize each of the 81 items into five groups corresponding to the five essential features of classroom inquiry (listed in the “Introduction” and “Method” sections).

In addition to categorizing the 81 statements, panel members were also invited to provide suggestions for rewording or rephrasing the statements. Additionally, the panel was also encouraged to identify statements they would like to see added to the list. Example statements included “I will be able to engage students in designing the learning environment in an attempt to allow for diversity of problems and methods,” “I possess the ability to ‘let go’ and allow students to devise their own problems to investigate,” and “The majority of evidence is derived from instructional materials such as a text book.”

Comments for improving the items and the categorizations were recorded directly on the instrument. Feedback was collected by the researcher and used to revise the items. Information concerning the categorization of each of the items was used to verify whether the items represented the intended essential features of classroom inquiry. The researcher analyzed the panel's feedback to identify patterns among the data. Items with broad agreement remained on the instrument, whereas items with broad disagreement were revised according to the panel's suggestions or removed from the instrument if suggestions were not provided. Revisions also included basic grammatical corrections and content clarification. In particular, the reviewers suggested revising some of the statements to better represent the five essential features of classroom inquiry previously listed.
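The triage of items by panel agreement described above can be illustrated with a minimal sketch. The source does not report a numerical criterion for agreement, so the `keep_at` threshold, the feature labels, the item identifiers, and the ratings below are all hypothetical and serve only to show the logic of retaining high-agreement items and flagging the rest for revision.

```python
from collections import Counter

# Abbreviated labels for the five essential features of classroom inquiry
# (NRC, 2000). The short names are an assumption made for this sketch.
FEATURES = ("questions", "evidence", "explanations", "knowledge", "communication")


def summarize_item(ratings):
    """Return (modal_feature, agreement) for one item's panel ratings.

    agreement is the share of reviewers who chose the most common feature.
    """
    counts = Counter(ratings)
    modal_feature, modal_count = counts.most_common(1)[0]
    return modal_feature, modal_count / len(ratings)


def triage(panel_ratings, keep_at=0.75):
    """Label each item 'keep' or 'review' according to panel agreement.

    keep_at is a hypothetical cutoff; the study does not state one.
    """
    decisions = {}
    for item_id, ratings in panel_ratings.items():
        feature, agreement = summarize_item(ratings)
        status = "keep" if agreement >= keep_at else "review"
        decisions[item_id] = (status, feature, agreement)
    return decisions


# Hypothetical ratings from a nine-member panel for two items.
example_ratings = {
    "item_01": ["questions"] * 8 + ["evidence"],
    "item_02": ["evidence"] * 4 + ["explanations"] * 3 + ["knowledge"] * 2,
}
decisions = triage(example_ratings)
```

In this sketch an item flagged "review" would still need a human decision, since the study revised such items when the panel offered suggestions and removed them only when no suggestions were provided.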

These revised items were then submitted to the panel of faculty members and graduate students for further review. The faculty members and graduate students revised the items until they judged that clarity and comprehensibility had been achieved for each item. After two rounds of review by the faculty members and graduate students, 94 items were identified for Version 3.

As a result of the activities identified in Step 4, items from Version 2 were revised. This provided a basis for final item preparation for the next phase of the development process.

  • Step 5: Content Validity, Version 3. Using the newly revised version of the instrument, another round of content validity review was conducted. The reviewers for this round consisted of three faculty members from The Pennsylvania State University representing science education and self-efficacy research. These faculty members were given the items on the instrument one at a time and asked to place each item on a larger representation of the essential features of classroom inquiry and their variations (NRC, 2000, p. 29). Throughout this process, the researcher asked the reviewers to verbally convey their thought processes and the reasoning behind each of their placements. During this time, the members of the review panel collectively provided feedback to the researcher. To ensure that all feedback would be considered and reflected in the next version of the instrument, the researcher audiotaped and then transcribed this meeting.

  • Step 6: Revision of Items. As a result of the process discussed in Step 5, Version 3 of the instrument was revised. Most of the revisions consisted of rephrasing the items to better represent the construct of science as inquiry, as well as Bandura's (1977) definition of self-efficacy. Specifically, upon suggestions of the faculty members, the researcher rephrased many of the items to capture the two dimensions of self-efficacy described by Bandura: personal self-efficacy and outcome expectancy. The faculty members also suggested adding seven new items to the instrument in an attempt to better address the definition of self-efficacy and the definition of teaching science as inquiry. Again, during the process of revision, items with vast agreement remained on the instrument, whereas items with vast disagreement were either revised or removed from the instrument. In addition, many items were revised for clarity.

The activities identified in Steps 5 and 6 provided a basis for final item preparation for the next version of the instrument, Version 4, which consisted of 65 items. Example statements include “As a teacher of science, I will be able to offer multiple suggestions for creating explanations from data,” “As a science teacher, I will provide opportunities for students to become critical decision makers when evaluating the validity of scientific explanations,” and “I possess the skills necessary for guiding my students toward explanations that are consistent with experimental and observational evidence.”

  • Step 7: Content Validity, Version 4. Similar to all of the other versions, content validity was conducted on Version 4 of the instrument. Four faculty members from The Pennsylvania State University representing science education and self-efficacy research were brought together for the purpose of judging each item's representation of Bandura's (1977) two dimensions of self-efficacy. The reviewers analyzed each of the items independently. Similar to previous content-review processes, comments from each reviewer regarding improving the items were also recorded directly on the instrument. Feedback was collected by the researcher and used to revise the items. As a result of this review process, it was clear that there were more personal self-efficacy items than outcome expectancy items. Each reviewer offered suggestions pertaining to how several of the items could be transformed from personal self-efficacy to outcome expectancy by simply changing the wording of the item. These comments were used as a means to revise and ensure that there was equal distribution of items within the two dimensions of self-efficacy. In addition, a few of the items were revised for clarity and comprehension.

As a result of the activities identified in Step 7, items from Version 4 were revised. This provided a basis for final item preparation for the next phase of the development process—the creation of Version 5.

  • Step 8: Content Validity, Version 5. Version 5 of the instrument also underwent a round of content validity. During this time, faculty members from The Pennsylvania State University reviewed each of the items to ensure that there was a balance of items within the construct of self-efficacy as defined by Bandura (1977). Specifically, the reviewers viewed each of the items to ensure that there was a relatively equal distribution between the number of items representing personal self-efficacy and the number of items representing outcome expectancy. Each reviewer analyzed the items independently and again made comments directly on the instrument. Basic grammatical and content revisions were made to the items. In addition, revisions were made to enhance the comprehension and clarity of the items.

As a result of the activities identified in Step 8, items from Version 5 were revised and, consequently, Version 6 was created.

  • Step 9: Content Validity, Version 6. Version 6 of the instrument also underwent a round of content validity. During this time, faculty members from The Pennsylvania State University again reviewed each of the items to ensure that there was a balance of items within the construct of self-efficacy as defined by Bandura (1977). In addition, the reviewers read each of the items to ensure that each clearly illustrated teaching science as inquiry. Specifically, the reviewers again placed each of the items into categories representing the 24 variations of the essential features of classroom inquiry (NRC, 2000, p. 29). Once complete, each of these cells was analyzed to ensure that there was a relatively equal distribution between the number of items representing personal self-efficacy and the number of items representing outcome expectancy within each of the 24 cells. While completing this content validity process, each reviewer analyzed the 65 items independently and again made comments directly on the instrument. Basic grammatical and content revisions were made to the items. In addition, revisions were made to enhance the comprehension and clarity of the items.

As a result of this round of content validity, revisions were made to Version 6 of the instrument and, consequently, Version 7 was created. Table 1 summarizes the results for the item distribution for Version 7 of the instrument. Each cell represents the essential features of classroom inquiry and their variations. The typeface (italics or underline) of each item indicates personal self-efficacy and outcome expectancy.

Table 1 Distribution of Items
  • Step 10: Administration of Version 7. Version 7 of the instrument contained 69 items and was administered to the 190 preservice elementary teachers in six sections of a science methods course during the week of September 8, 2003. Of the 190 participants, 91% were female and 9% were male. These groups represented the intended population for the final instrument.

  • Step 11: Analysis of Data, Version 7. The data obtained from administering the 69-item Version 7 to the science methods classes were used to identify the items to be included in the TSI (using SPSS, Version 11.0.4). The following guiding question was developed for this purpose:

What is the most reliable and valid combination of items to compose the TSI for the purposes of assessing preservice elementary teachers' self-efficacy beliefs in regard to teaching science as inquiry and the two dimensions of self-efficacy: personal self-efficacy and outcome expectancy?

This question required the researcher to use data from Version 7 to examine the construct validity of the items and the contributions each item made to the reliability of the instrument. Hence, data from the Version 7 items were examined for evidence of construct validity. Item score to total test score correlation and item contribution to total test reliability were used to identify the strongest items and, therefore, eliminate those that were not positively contributing to the overall reliability of the instrument. Item balance across the 24 variations of the essential features of classroom inquiry was also examined to determine the reliability of the instrument. Coefficient alpha, a measure of internal consistency, was utilized to examine the reliability of the instrument. The strongest combination of construct valid and reliable items that had balanced representation within the essential features of classroom inquiry and their variations was identified using a combination of these procedures.
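The item-level statistics named above can be illustrated with a minimal sketch. The study ran these analyses in SPSS; the Python below is not the authors' procedure, only an illustration under stated assumptions. It assumes that the item score to total score correlation is the corrected form (each item correlated with the sum of the remaining items) and that an item's contribution to reliability is judged by recomputing alpha with that item deleted. The function names and the response matrix are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_analysis(scores):
    """Corrected item-total correlation and alpha-if-deleted for every item."""
    k = len(scores[0])
    results = []
    for j in range(k):
        item = [row[j] for row in scores]
        # Remaining items, with item j removed (the "corrected" total).
        rest = [row[:j] + row[j + 1:] for row in scores]
        rest_total = [sum(r) for r in rest]
        results.append({
            "item": j,
            "item_total_r": pearson(item, rest_total),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return results


# Hypothetical Likert-type responses: 5 respondents x 4 items (not study data).
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
]

print("alpha:", round(cronbach_alpha(responses), 4))
for row in item_analysis(responses):
    print(row)
```

Under this reading, an item would be a candidate for removal when its corrected correlation is low and alpha rises once it is deleted; an item whose removal lowers alpha is contributing positively to reliability.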

Internal consistency values ranged from .4906 to .7429. Please refer to Tables 2 and 3 for specific internal consistency data. These values met or exceeded the requirements set forth by Sax (1974) and Nunnally (1978) pertaining to first-generation instrument construction. Outcome expectancy for the category “Learner Connects Explanations to Scientific Knowledge” produced the lowest alpha, .4906. Although this alpha is not as high as one would like it to be, omitting any item from this category would only lower the reliability of the instrument. In addition, one factor that contributes to the reliability of a test is the number of items on the test (Anastasi & Urbina, 1997). For this particular analysis, there were only four items in the category resulting in the .4906 alpha. The small number of items within this category may have accounted for the low reliability.
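The dependence of alpha on the number of items can be made concrete with the standardized form of coefficient alpha, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean inter-item correlation. This is a sketch under assumptions: the study does not report mean inter-item correlations, so the value of r used below is hypothetical, and the alphas reported from SPSS are not necessarily in standardized form.

```python
def standardized_alpha(k, mean_r):
    """Standardized coefficient alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical mean inter-item correlation, held fixed while the item count grows.
MEAN_R = 0.2

for k in (4, 8, 12):
    print(k, "items ->", round(standardized_alpha(k, MEAN_R), 3))
```

With the mean inter-item correlation held at this hypothetical value, alpha is about .50 for four items and about .75 for twelve, which illustrates how a four-item category can yield a low alpha even when its items are no more weakly related than those in longer categories.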

Table 2 Reliability Results for Self-Efficacy and the Essential Features of Classroom Inquiry—Version 7
Table 3 Reliability Results for Outcome Expectancy and the Essential Features of Classroom Inquiry—Version 7

The data obtained during these analyses were used to arrange items for Version 8 of the TSI. Due to the reliability results as determined by the internal consistency, as well as the correlation data, Version 8 of the TSI consisted of the identical items that were present on Version 7. Revisions made to Version 7 were aesthetic in nature and were, therefore, done to enhance the readability and visual appearance of the instrument. In addition, those revisions may also contribute to the ease with which participants complete the instrument. Font size was enlarged to make the items easier to read, sentence starters were added to reduce redundancy while reading the items, and shading was added to every other item to make the instrument easier to complete.

  • Step 12: Administration of Version 8. A second construct validity and reliability study was conducted on the TSI Version 8 during the week of December 1, 2003. This was done to further develop evidence of the instrument's construct validity and to collect data on the internal reliability and test-retest reliability of the instrument. Version 8 of the instrument contained 69 items and was administered to the 184 preservice elementary teachers in the same six sections of the science methods course. Of the 184 participants, 90% were female, and 10% were male. These groups again represented the intended population for the final instrument. The resulting data were used in formulating the TSI as described in Step 13, below.

    Table 4 Reliability Results for Self-Efficacy and the Essential Features of Classroom Inquiry—Version 8
  • Step 13: Analysis of Data, Version 8. Data obtained from the administration of the 69-item Version 8 of the TSI to the science methods classes were used to identify items to be included in the final version of the instrument. As in Step 11, the researcher examined the construct validity of the items and the contributions each item made to the reliability of the instrument. Data from Version 8 of the instrument were examined for evidence of construct validity. Item score to total test score correlation and item contribution to total test reliability were used to identify the strongest items. Item balance across the 24 variations of the essential features of classroom inquiry was also examined to determine the reliability of the instrument. Coefficient alpha, a measure of internal consistency, was utilized as a means to examine the reliability of the instrument. The strongest combination of construct valid and reliable items that had balanced representation within the essential features of classroom inquiry and their variations was identified using these procedures in combination. Internal consistency values ranged from .6034 to .7833. Please refer to Tables 4 and 5 for specific internal consistency data. These values again met or exceeded the requirements set forth by Sax (1974) and Nunnally (1978) pertaining to first-generation instrument construction. In addition, most of these results met the stricter standards described by Anastasi and Urbina (1997) and Isaac and Michael (1997).

Table 5 Reliability Results for Outcome Expectancy and the Essential Features of Classroom Inquiry—Version 8

One-way analysis of variance (ANOVA) scores on the TSI were compared across the six sections of the science methods course. The .05 level for statistical significance was used to determine if statistically significant differences in the subscale scores existed on the TSI among the six sections. Please refer to Tables 6 through 9 for further information regarding the ANOVA scores.

Table 6 Analysis of Variance Results for Self-Efficacy (SE) and Outcome Expectancy (OE) by Gender—Version 7
Table 7 Analysis of Variance Results for Self-Efficacy (SE) and Outcome Expectancy (OE) by Gender—Version 8
Table 8 Analysis of Variance Results for Self-Efficacy (SE) and Outcome Expectancy (OE) by Section—Version 7
Table 9 Analysis of Variance Results for Self-Efficacy (SE) and Outcome Expectancy (OE) by Section—Version 8
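The comparison of subscale scores across groups can be sketched as a one-way ANOVA F ratio computed from between-group and within-group sums of squares. The function name and the section scores below are hypothetical and are not the study's data. The sketch stops at the F ratio and its degrees of freedom; judging significance at the .05 level additionally requires the F distribution (available, for example, through scipy.stats.f_oneway, which returns both the statistic and the p value).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical subscale means for respondents in three sections (not study data).
sections = [
    [3.9, 4.1, 4.0, 3.8],
    [4.2, 4.0, 4.3, 4.1],
    [3.7, 4.0, 3.9, 4.1],
]

f_ratio, df_b, df_w = one_way_anova(sections)
print(f"F({df_b}, {df_w}) = {f_ratio:.3f}")
```

A small F ratio, with a p value above .05, would indicate that the groups did not differ reliably on the subscale.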

Data obtained during these analyses indicated that this pool of items best represented the intended means of the instrument. Due to the reliability results as determined by the internal consistency, as well as the correlation data, the final version of the TSI consisted of the identical items that were present on Version 8. No further revisions were made for the purposes of this research.

Conclusion

Based on the instrument development processes used and the associated data analysis results, the TSI appears to be a content and construct valid instrument with high to moderate internal reliability and high to moderate test-retest reliability qualities for use with preservice elementary education teachers to assess self-efficacy in regard to the teaching of science as inquiry. Instrument reliability results are summarized in Tables 2 through 5. The Cronbach's alpha internal consistency data revealed an acceptable level of reliability in the scores for first-generation instruments (Nunnally, 1978; Sax, 1974).

Implications

Based on the study results, the experiences of the investigators, and prior literature, three areas of implications are specifically addressed. These areas are implications for research, policy, and practice.

Research Implications

The construct validity of an instrument is never fully established or achieved (Nunnally, 1970); thus, it is important to continue examining the construct validity of the TSI. In the process, the reliability of the instrument, including test-retest reliability, should continue to be assessed.

There are several research implications targeted toward teacher efficacy and science education. Research is needed to further explore the effects of self-efficacy on teacher development and how self-efficacy may affect eventual classroom practice. The TSI is a valuable tool for science teacher educators working in practical and research settings to assess the self-efficacy beliefs of prospective elementary teachers with regard to the teaching of science as inquiry.

The TSI should be used in combination with other data collection techniques to more fully determine the self-efficacy beliefs of prospective teachers. These data collection techniques may include, but are not limited to, observations of teachers engaged in the teaching of science as inquiry, as well as interviews with prospective teachers, to more clearly understand their ideas and beliefs associated with the teaching of science as inquiry. Although quantitative and qualitative research methods have been regarded as being “fundamentally different modes of inquiry,” both can be pursued rigorously (Shavelson & Towne, 2002, p. 19). Shavelson and Towne noted that the current trend of research in education to make greater use of qualitative methods at the expense of quantitative methods has created a dialogue of criticism. The nation's commitment to make “…scientific literacy for all a reality in the 21st century” (NRC, 1996, p. ix) requires continued efforts to improve the research capacity of science education (NRC; Shavelson & Towne). “What makes research scientific is not the motive for carrying it out, but the manner in which it is carried out” (Shavelson & Towne, p. 20). Hence, using a mixed-methods approach to investigate preservice teacher self-efficacy should assist in achieving this goal.

Observation of preservice teachers in the classroom could provide additional information in relation to the predictive validity of the instrument. By observing classroom teaching, one would be able to determine if a particular score on the instrument transferred into behavior and practice. Additionally, development of a form of the TSI for practicing elementary teachers, as was done with the STEBI-B, should be pursued. Furthermore, using the information obtained from both versions of this instrument, as well as classroom observations and interviews, comparisons can be made between the scores of preservice and inservice elementary science teachers.

In addition, because the instrument utilized a forced-choice response from the prospective elementary teachers, interviews should also be conducted. These interviews may more fully explore the prospective elementary teachers' thoughts associated with the items on the TSI, as well as their responses to these items. This interview process could indicate if the preservice teacher truly understood the meaning of the items and if the researcher thoroughly understood the prospective elementary teacher's responses to the items. The inclusion of interviews would allow for a deeper understanding of the preservice teachers' self-efficacy in regard to the teaching of science as inquiry.

The idea of preservice elementary teachers' self-reporting inflated self-efficacy perceptions with regard to the teaching of science as inquiry also needs to be investigated. Because this study relied on self-reported data, it would behoove researchers to further investigate the idea of social desirability bias ([SDB]; Nancarrow & Brace, 2000; Ray, 1990). SDB refers to the possibility of respondents reporting what they perceive is socially desirable, rather than what might be the actual case. Such researchers as Phillips and Clancy (1972, as cited in Nancarrow & Brace, 2000) believed that SDB occurs because of two factors: “the general strength of need for approval felt by an individual (personality trait) and the demands of a particular situation.” Another important factor that may contribute to SDB is the desire “to present oneself in a favorable light to others and/or a self-esteem preservation function” (as cited in Nancarrow & Brace). Although these are possible concerns, there are many ways in which researchers may reduce this problem. For example, a Likert-type scale, similar to the one used for the TSI, is one viable solution to the problem of SDB (Ray).

Another area warranting further consideration is the concept of types of teacher efficacy. Although research has indicated that positive teacher efficacy is an appropriate goal (Ashton, 1985; Ashton & Webb, 1982, 1986; Bandura, 1977, 1986; Tschannen-Moran & Woolfolk-Hoy, 2001), Wheatley (2000) identified eight different types of positive teacher efficacy that he considers problematic: “traditional methods, traditional goals, too-certain efficacy, overly-optimistic novices, hypothetical future efficacy, pretend teacher efficacy, competitive teacher efficacy, and independent teacher control” (pp. 18–21).

The foundation for Wheatley's (2000, 2002) research is the belief that teachers' doubts about their teaching efficacy often have important benefits for teacher learning and education reform. Wheatley believed that these “doubts are essential to widespread success of education reform, particularly for reforms that promote progressive meaning-centered education” (2002, p. 5). Although these assertions conflict with most of the previous research on teacher efficacy, it is important to carefully explore the meaning of these findings, as well as their relationship to education reform. Thus, to more fully understand and encourage the types of teacher efficacy that support teacher development, new approaches to teacher efficacy research are needed. To identify how teacher efficacy, confidence, and doubt may work together, research should be conducted within the context of the daily realities of teaching (Wheatley, 2002). Additionally, these new approaches for investigating teacher efficacy should include qualitative means of research.

Policy and Practice

The TSI is a useful tool for the evaluation of science methods courses with an emphasis on inquiry teaching. Although research and evaluation are two separate entities, the aforementioned research techniques and methods should facilitate program improvement, development, and assessment.

“Like other research, evaluation attempts to describe, to understand the relationships between variables, and to trace out the causal sequence” (Weiss, 1972, p. 8). Evaluation applies the methods of social research and is distinguished by its intent, the purpose for which it is done, rather than the method or subject matter. The purpose of evaluation research is to measure the effects of a program against the goals it set out to accomplish (Payne, 1994; Weiss, 1972). For example, test-retest data derived from the TSI can be used in combination with other research techniques using an experimental model. This evaluation can help to identify if a particular course is achieving what it purports with regard to the teaching of science as inquiry. The data derived from this analysis are then used to make decisions about the current program and future programs.

When analyzing these data, however, it is important for researchers to realize that reliability values may decrease from pretest to posttest. This, in fact, was evident in the reliability results presented in Tables 2 and 4. The scores reported for “learner engages in scientifically oriented questions” in relation to self-efficacy decreased from an alpha of .6884 on the pretest to an alpha of .6579 on the posttest. It is the researchers' conjecture that when students initially entered the science methods course, their conceptions associated with “learner engages in scientifically oriented questions” were different from what is actually involved in the teaching of science as inquiry. Teaching science as inquiry requires teachers to possess a “sophisticated set of judgments about science, students, learning, and teaching” (NRC, 1996, p. 37). The central strategy for teaching science as inquiry is to use authentic questions generated from students' experiences. Teachers provide students with the opportunity to investigate these questions by giving students investigations or by guiding students toward designing investigations of their own. As a result,

Teachers of science are constantly making decisions, such as when to change the direction of a discussion, how to engage a particular student, when to let a student pursue a particular interest, and how to use an opportunity to model scientific skills and attitudes. (p. 33)

Consequently, this complex decision-making process requires teachers to struggle with the tension between guiding students toward a set of predetermined goals and allowing students to set and meet their own goals (NRC, 1996). Thus, when preservice teachers experience a science methods course that engages them in the teaching of science as inquiry, their previously held conceptions may change. These preservice teachers may come to realize that the teaching of science as inquiry is much more complex and more difficult than they had originally thought.