1 Introduction

The quality of students’ science learning at middle-grade levels plays an important role in shaping their later science outcomes (Kwon and Lawson 2000; Jackson and Davis 2000), such as their decisions to enroll in science classes in high school, to choose a STEM-related college major, and finally to pursue a STEM career (Britner and Pajares 2006; Chen and Pajares 2010). These educational outcomes further affect economic prosperity, democratic process, and individual pursuit of equality and happiness in a nation (Quinn and Cooc 2015). However, U.S. middle-grade students showed poor science performance in international assessments (Kastberg et al. 2016; Provasnik et al. 2016), which has become a national concern during the past several decades (National Science Board 2018). Researchers have paid special attention to this issue and suggested a multitude of factors that are related to it. How teachers teach science and how confident students are in learning science are two key factors identified by many researchers (Bandura et al. 2001; Fogleman et al. 2011; Høigaard et al. 2015; Sweller 2009).

Science teaching approaches are at the center of students’ science learning quality (Fogleman et al. 2011). Although many teaching approaches are general to almost all subjects (Dole et al. 1991; Hattie et al. 1996), a set of approaches specific to science teaching is implemented in science classes. These approaches have often been categorized into traditional didactic teaching and practice-based science teaching (National Research Council 2012). A substantial body of research has examined the relationship between these two main types of teaching and student achievement. However, these studies have reported significant effects for both types of teaching, leaving open the question of which teaching approach is more effective for students’ science learning (Cairns and Areepattamannil 2019; Furtak et al. 2012; Jerrim et al. 2019; Lazonder and Harmsen 2016; Minner et al. 2010; Rönnebeck et al. 2016). In addition, the mechanism through which teaching approaches affect students’ science achievement has seldom been addressed in the literature. Researchers have suggested that students’ academic self-efficacy may play a role in the relationship (Chen and Pajares 2010). However, this hypothesis has not yet been empirically examined.

Students’ self-efficacy, another key factor affecting students’ science achievement, has also been examined in many studies (Aurah 2017; Bandura et al. 2001; Boz et al. 2016; Britner 2008; Høigaard et al. 2015; Schraw et al. 2006). However, many studies have focused on students in elementary school (Griggs et al. 2013), high school (Britner 2008; Kupermintz 2002; Lau and Roeser 2002; Tsai et al. 2011), and college (Alt 2015; Bartimote-Aufflick et al. 2016; Bilgin et al. 2015; Gormally et al. 2009), and comparatively fewer studies have focused on middle school students (Britner and Pajares 2001, 2006; Chen and Pajares 2010). Additionally, although extensive research has supported the significant relationships between self-efficacy and science achievement as well as between teaching approaches and science achievement, little empirical evidence has been obtained from real school environments in a way that incorporates the hierarchical relationship between teachers and students (Jiang and McComas 2015). Very few studies have collected large-scale, nationally representative data. Moreover, previous studies have demonstrated that self-efficacy mediates the relationship between previous and current academic achievement as well as between learning environment and achievement (Diseth 2011; Moriarty et al. 1995). Nevertheless, this mediating effect has not been investigated for the relationship between teaching approaches and academic achievement. Further, the limited literature examining the relationship between science teaching approaches and student self-efficacy has yielded mixed conclusions (Britner and Pajares 2001; Sungur and Tekkaya 2006).

Given these gaps in the literature, the current study aims to examine the direct and indirect relationships between teaching approaches, student self-efficacy in science, and student science achievement using the US eighth-grade dataset from the 2011 TIMSS large-scale assessment. It addresses the following research questions:

  1. How do teaching approaches directly associate with student science achievement?

  2. How do teaching approaches directly associate with student self-efficacy?

  3. How does student self-efficacy in science directly associate with student science achievement?

  4. Does student self-efficacy in science mediate the relationship between teaching approaches and student science achievement?

2 Theoretical framework and literature review

2.1 Teaching approaches and student science achievement

Two theoretical lenses frame this study. The first concerns how different teaching approaches influence student achievement. In this study, teaching approaches are categorized into generic teaching and science-domain specific teaching approaches (Russ et al. 2016). Generic teaching approaches refer to strategies designed to initiate, sustain, and foster students’ cognitive engagement across different subject areas and grade levels. These approaches can lead to positive science learning outcomes (Azevedo 2015) by influencing students’ attention allocation, metacognitive monitoring, positive emotions (Broughton et al. 2011), learning goal setting (Zimmerman 1990), and involvement in academic tasks (Kyriakides et al. 2013). For example, empirical studies showed that questioning strategies and providing or reinforcing objectives directly influence achievement (Schroeder et al. 2007).

Science-domain specific teaching approaches reflect specific scientific processes (Bransford et al. 1999) and help students develop an understanding of scientific content, processes, and values at the same time (Southerland et al. 2007). Two different types of teaching approaches are currently found in most science classrooms (Windschitl and Calabrese Barton 2016). The first draws on a practice-based view of teaching and of science itself. This practice-based science teaching includes activities such as designing, planning, and conducting investigations, conducting experiments, and developing explanations based on observations. These science practices provide students with opportunities to apply reasoning to key concepts in science, participate in the discussion of science, and solve authentic problems (Sinatra et al. 2015; Windschitl and Calabrese Barton 2016). This notion has been directing science education reform in the US and has helped shift the focus of science teaching from didactic teaching to approaches that engage students in science practices (NGSS Lead States 2013; NRC 1996). At the same time, traditional didactic science teaching is often seen as a popular, yet ineffective, science teaching method because it transmits science facts to students mainly through teachers’ lectures and students’ drill and practice aimed at memorizing factual knowledge (Crawford et al. 2005; Smerdon et al. 1999).

Although there have been many discussions about these two types of science teaching in the literature, studies that examined the effects of different science teaching approaches on student achievement led to mixed results. In a study by Wilson, Taylor, Kowalski, and Carlson (2010), researchers randomly assigned 58 14–16-year-old students to either a science practices group or a control group. In the science practices group, students were exposed to learning activities, such as designing and implementing investigations, analyzing and interpreting data, constructing explanations and designing solutions, and evaluating scientific arguments. In the control group, students received traditional didactic instruction. Students in the science practices group performed significantly better than those in the control group on science achievements. This result is consistent with another study (Taraban et al. 2007) that assessed the achievement of 408 high-school students from six classrooms in Texas. Among the six classrooms, some were characterized with science practices focusing on lab activities, while others were traditional teaching classrooms with the characteristics of teachers’ direct transmission of information, whole-class activities, and “cookbook” experiments. However, these results are challenged by a study conducted by Wolf and Fraser (2007), which involved two science teachers and 165 7th grade students in eight classes located in Long Island, New York. The researchers did not find a significant difference between the effects of these two teaching strategies on student science achievement. 
The four classes of students who engaged in science practices, including developing and planning their own investigations, working collectively toward unique goals, and sharing individual findings through conducting several laboratory experiments based on the physical science curriculum, scored slightly higher than the other four classes, which followed a set of procedures provided by teachers and completed a findings worksheet, but the difference was not statistically significant. Similarly, Glasson (1989) found that ninth-graders from a classroom focusing on science practices did not outperform their peers from a classroom using direct demonstration for instruction. A limitation of the studies cited above is that student samples were mostly selected from limited geographic areas and therefore cannot represent students across the whole nation. The results from some large datasets, such as the Programme for International Student Assessment (PISA), indicated that science practice-based teaching was negatively related to the science achievement of adolescents in Canada (Areepattamannil et al. 2011), Finland (Lavonen and Laaksonen 2009), England (Jerrim et al. 2019), and even across 54 countries (Cairns and Areepattamannil 2019). The current study intends to join this ongoing conversation by examining the relationship between student science achievement and science teaching approaches based on a nationally representative US sample collected in the 2011 TIMSS assessment.

2.2 Self-efficacy in science

The second theory guiding this study is that student self-efficacy has a positive impact on academic achievement (Bandura 1997). According to Bandura (1997), self-efficacy in human functioning, like people’s other motivational and affective states and actions, is “based more on what they believe than on what is objectively true” (p. 2). A wealth of research has supported the positive role of student self-efficacy in their achievement, performance, and other outcomes in different subjects (Artino 2012; Greene 2015; Greene and Miller 1996; Miller et al. 1996; Linnenbrink and Pintrich 2003), including in science (Andrew 1998; Britner and Pajares 2006; Chen and Usher 2013; Jansen et al. 2015; Luzzo et al. 1999; Pajares et al. 2000; Yazici et al. 2011).

Previous studies have demonstrated that students’ self-efficacy in science is affected by student demographic backgrounds, such as gender, race, and grade; by student beliefs, such as implicit beliefs of ability, epistemic beliefs, and beliefs about learning science (Britner and Pajares 2001, 2006; Chen and Usher 2013; Griggs et al. 2013; Tsai et al. 2011); and by mastery experiences, one of the primary sources of self-efficacy (Bandura 1997; Britner and Pajares 2006; Luzzo et al. 1999). Past research has also shown that student self-efficacy in learning is influenced by how academic tasks are provided and how students are encouraged to do their work, and that other teaching approaches, such as recognizing students’ effort, providing opportunities for improvement, and focusing on individual learning progress, help students improve their self-efficacy in learning (Ames 1992; Ryan et al. 1998).

In addition, a few studies revealed that student self-efficacy improved through innovative teaching or curricula, such as engaging students in authentic inquiry-oriented science investigations (Britner and Pajares 2001), game-based learning (Meluso et al. 2012), the embedding of technology (Liu et al. 2011), and the use of animation (van der Meij et al. 2015). One study implementing the Responsive Classroom approach, which focuses on social-emotional learning, reported a significant increase in student self-efficacy and a decrease in anxiety in math and science (Griggs et al. 2013).

However, the literature is limited in the number of studies that examine the effects of teaching practices on student self-efficacy in science. These studies are also limited methodologically because they have employed mostly experiments or quasi-experiments and reported the effects of the interventions on the outcomes through statistical analyses of mean differences. What is lacking in the literature is probably a study that takes into account hierarchies existing between teachers and students and examines the relationships through a modelling approach. Furthermore, the limited number of empirical studies also yielded mixed results about the relationship between teaching approaches and self-efficacy. For instance, a study found that more inquiry-oriented science investigations provide middle school students with mastery experiences that are necessary to the development of strong science self-efficacy (Britner and Pajares 2001). However, another study did not find the same results in a 6-week problem-based teaching session with high school students in a biology class (Sungur and Tekkaya 2006).

Despite the connections between student self-efficacy in science and science achievement as well as between teaching approaches and student self-efficacy, surprisingly few empirical studies have put these three aspects together in one study and examined their direct and indirect relationships. However, self-efficacy was corroborated in a few studies as a mediator (Diseth 2011; Moriarty et al. 1995). For instance, in one experimental study that examined the effects of learning environments on achievement and behavior, researchers found that competitive learning environments enhanced students’ performance but that this effect was difficult to maintain if students had low self-efficacy (Moriarty et al. 1995). A more recent study showed that the relationship between students’ past and current academic performance was mediated by self-efficacy (Diseth 2011). Further, past studies have found a mediating effect of factors similar to self-efficacy, such as positive affect, on the relationship between teaching practices and students’ science achievement (Long 2016).

2.3 Hypotheses

Based on the preceding literature review, we hypothesized:

  1. Student self-efficacy in science has a significant, direct, and positive effect on student science achievement.

  2. Three types of teaching approaches (one generic teaching approach and two science-domain specific teaching approaches) have significant direct effects on student science achievement. The generic teaching approach and science practice-based teaching have positive effects, while traditional didactic science teaching has a negative effect.

  3. Three types of teaching approaches have significant direct effects on student self-efficacy in science. The generic teaching approach and science practice-based teaching have positive effects, while traditional didactic science teaching has a negative effect.

  4. Three types of teaching approaches have significant indirect effects on student science achievement through student self-efficacy in science.

3 Methods

3.1 Data source and sample

In this study, we used the dataset from the 2011 Trends in International Mathematics and Science Study (TIMSS), which is the fifth cycle of the assessment. The data were collected from 10,382 8th-grade students and 865 teachers in the U.S. They were selected through two-stage stratified cluster sampling with probabilities proportional to size (PPS), the sampling design consistently used in international large-scale assessments such as TIMSS and PISA. Specifically, schools were randomly selected with PPS at the first stage, and teachers and students in one or more classes were randomly selected at the second stage. This sampling approach ensures the randomness and representativeness of the sample and thus increases the generalizability of the results (Foy et al. 2013).
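To make the two-stage design concrete, the following is a minimal sketch of one common way to draw a PPS sample (systematic selection along cumulative sizes), followed by a random class draw within each sampled school. The enrollment figures, class labels, function name, and seed are hypothetical and are not taken from TIMSS; the operational TIMSS procedure also involves stratification and other steps not shown here.

```python
import random

def pps_systematic_sample(sizes, n, rng):
    """Return indices of n units drawn with probability proportional to size."""
    step = sum(sizes) / n            # sampling interval
    start = rng.random() * step      # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, next_point = [], 0.0, 0
    for unit, size in enumerate(sizes):
        cumulative += size
        # A unit is selected by every point that falls inside its size range.
        while next_point < n and points[next_point] < cumulative:
            selected.append(unit)
            next_point += 1
    return selected

# Hypothetical eighth-grade enrollments for eight schools (not TIMSS data).
enrollments = [120, 300, 80, 450, 200, 150, 600, 100]
rng = random.Random(2011)

# Stage 1: schools selected with probability proportional to enrollment.
schools = pps_systematic_sample(enrollments, n=3, rng=rng)

# Stage 2: one intact class chosen at random within each sampled school.
classes_per_school = {i: ["A", "B", "C"] for i in range(len(enrollments))}
sample = [(s, rng.choice(classes_per_school[s])) for s in schools]
print(sample)
```

Because larger schools occupy a longer stretch of the cumulative size scale, they are more likely to contain a selection point, which is what "proportional to size" means in this sketch.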

3.2 Measures

3.2.1 Student science achievement

Student science achievement was an endogenous (outcome) variable in this study. It was measured by test items covering science content and cognitive domains. Four content domains were included in the assessment, and each domain contained a few topics. The domains and their proportions in the assessment were as follows: biology (35%), chemistry (20%), physics (25%), and earth science (20%). Three cognitive domains were also included in the assessment: knowing facts, procedures, and concepts; applying knowledge and conceptual understanding in real-life problems; and reasoning with unfamiliar situations, complex contexts, and multi-step problems. A multi-matrix design was employed to collect students’ science achievement data, in which each student received only a subset of test items from the item pool (Martin and Mullis 2012a, b). After the assessment data were collected, an item response theory approach was used to scale science achievement for each student, which was represented by a set of five plausible values (Martin and Mullis 2012a, b). These values are randomly drawn from a distribution of achievement scores that approximates the student’s true ability (Martin and Mullis 2012a, b) (see Table 1). The final science achievement score used in the study was the average of the five plausible values.
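As an illustration of the last step, the sketch below averages five plausible values into a single score for one student. The five numbers and the function name are hypothetical; only the averaging rule comes from the description above.

```python
from statistics import mean

def average_plausible_values(plausible_values):
    """Average the five plausible values reported for one student."""
    if len(plausible_values) != 5:
        raise ValueError("expected exactly five plausible values")
    return mean(plausible_values)

# Hypothetical plausible values for a single student (not TIMSS data).
student_pvs = [512.3, 530.1, 498.7, 521.4, 540.0]
score = average_plausible_values(student_pvs)
print(round(score, 1))
```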

Table 1 Means and standard deviations of items

3.2.2 Student self-efficacy in science

We treated student self-efficacy in science as both an endogenous and an exogenous variable. Three items in the student questionnaire measured this construct: “I usually do well in science”, “I learn things quickly in science”, and “I am good at working out difficult science problems”. A 1–4 Likert scale was used, where 1 referred to “Agree a lot” and 4 referred to “Disagree a lot”. These three items were selected because they were strong expressions of student self-efficacy in learning science. All the items were reverse coded so that a higher value means a higher degree of agreement. The reliability among the items estimated by Cronbach’s α was .82, which was high (see Table 1).
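The reverse coding and reliability estimate described above can be sketched as follows. The responses are hypothetical (five respondents, three items), and the function names are illustrative; the formula is the standard Cronbach’s α based on item variances and the variance of the summed score.

```python
from statistics import variance

def reverse_code(value, low=1, high=4):
    """Flip a Likert response so that a higher value means more agreement."""
    return low + high - value

def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    item_variances = [variance(responses) for responses in items]
    totals = [sum(answers) for answers in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))

# Hypothetical raw responses (1 = "Agree a lot", 4 = "Disagree a lot").
raw_items = [
    [1, 2, 1, 3, 4],
    [1, 2, 2, 3, 4],
    [2, 2, 1, 4, 4],
]
recoded = [[reverse_code(v) for v in item] for item in raw_items]
alpha = cronbach_alpha(recoded)
print(round(alpha, 2))
```

Reverse coding all items by the same rule leaves α unchanged; it matters for interpretation, so that higher scores indicate stronger self-efficacy.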

3.2.3 Teachers’ generic teaching

We treated teachers’ generic teaching as another exogenous variable in the study. Four items in the teacher questionnaire measured this construct: “Summarize what students should have learned from the lesson”, “Relate the lesson to students’ daily lives”, “Use questioning to elicit reasons and explanations”, and “Bring interesting materials to class”. They were measured on a 1–4 Likert scale (1-“Every or almost every lesson”, 4-“Never”). All the items were reverse coded so that a higher value means a higher frequency. The reliability among the items estimated by Cronbach’s α was .53, which was acceptable (see Table 1).

3.2.4 Teachers’ science practice-based teaching

We treated science practice-based teaching as an exogenous variable. Four items in the teacher questionnaire measured this construct: “Observe natural phenomena and describe what they see”, “Design or plan experiments or investigations”, “Conduct experiments or investigations”, and “Relate what they are learning in science to their daily lives”. These items represent the most popular inquiry-based teaching practices in science. They were measured on a 1–4 Likert scale, where 1 referred to “Every or almost every lesson” and 4 referred to “Never”. All the items were reverse coded so that a higher value means a higher frequency. The reliability among the items estimated by Cronbach’s α was .72, which was adequate (see Table 1).

3.2.5 Teachers’ traditional didactic science teaching

We treated teachers’ traditional didactic science teaching as an exogenous variable. Three items in the teacher questionnaire measured this construct: “Read their textbooks or other resource materials”, “Have students memorize facts and principles”, and “Take a written test or quiz”. A 1–4 Likert scale was used, where 1 referred to “Every or almost every lesson” and 4 referred to “Never”. All the items were reverse coded so that a higher value means a higher frequency. The reliability among the items estimated by Cronbach’s α was .69, which was very close to adequate (see Table 1).

3.2.6 Control variables

We selected student socioeconomic status (SES) and the frequency of science homework as control variables. SES has long been considered one of the most important variables that affect student achievement and should be treated as a control variable (Bornstein and Bradley 2003; Bradley and Corwyn 2002; Coley 2002; Long and Pang 2017; Milne and Plourde 2006). Although whether homework positively or negatively affects student achievement is still controversial, it was suggested to be one representation of Opportunity to Learn (OTL) (Schmidt et al. 2015) and has been found to be a significant predictor of student achievement (Cooper and Robinson 2006; Leone and Richards 1989; Maltese et al. 2012; Trautwein 2007). In addition, Trautwein (2007) differentiated the effects of homework time, homework frequency, and homework effort on student achievement and found that homework frequency, rather than the other two, was a significant predictor of achievement.

In this study, student SES was measured by the home educational resources index provided by TIMSS in the student questionnaire. It was based on a 1–3 scale (1-“Many resources”, 3-“Few resources”). Homework frequency was measured by students’ response to one item, “How often teacher give you homework in science?”, on a 1–5 Likert scale (1-“Every day”, 5-“Never”). Both variables were reverse coded so that a higher value means more resources and a higher frequency.

3.3 Data analysis

3.3.1 Preliminary analysis

Because of the complex sampling design of TIMSS, we employed several statistical techniques to increase the precision of the measures of the variables (Rutkowski et al. 2010; Rutkowski et al. 2014). Weights were first applied to all the variables in IBM SPSS Statistics (Version 24). Variables at the student level were weighted by the total student weight, and variables at the teacher level were weighted by the science teacher weight. Missing values were then examined and handled. The missing rate ranged from 1.78 to 4.02% for student variables and from 27.4 to 38.0% for teacher variables, and no missing values were found on the plausible values of student science achievement. A multiple imputation (MI) approach was used in combination with two-level multilevel structural equation modeling (MSEM) to handle missing values in Mplus 8.3, and five imputations were made for each variable (Enders et al. 2015).
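The two preliminary checks described above, computing a missing rate and applying sampling weights, can be sketched as follows. The values, weights, and function names are hypothetical. For simplicity, the weighted mean here skips missing responses, whereas the analysis described above handled missing data through multiple imputation.

```python
def missing_rate(values):
    """Share of responses that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean over the observed (non-missing) responses only."""
    observed = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in observed)
    return sum(v * w for v, w in observed) / total_weight

# Hypothetical item responses and sampling weights for four students.
responses = [3, None, 4, 2]
student_weights = [10.0, 20.0, 30.0, 40.0]

rate = missing_rate(responses)
w_mean = weighted_mean(responses, student_weights)
print(rate, w_mean)
```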

3.3.2 Modelling approach

In order to test the hypotheses regarding the direct and indirect effects of the exogenous variables on the endogenous variables at both the teacher and student levels, we applied a traditional multilevel structural equation modelling (MSEM) approach. Like Hierarchical Linear Modeling (HLM), MSEM can account for the variances at varying hierarchical levels. However, MSEM has advantages over HLM in the following three aspects. First, the approach is a synthesis of multilevel modeling and structural equation modeling (Morin et al. 2014; Rabe-Hesketh et al. 2007). Second, the approach can control both measurement and sampling errors, thus providing more accurate estimates of structural relationships among variables (Muthén and Asparouhov 2009). Third, the approach can conduct mediation analysis and further illustrate the direct and indirect effects of the variables at different levels on the endogenous variables (Preacher et al. 2010; Trusz 2018; Wendorf 2002).

MSEM analyses were conducted using the multilevel structural equation module in Mplus 8.3 (Muthén and Muthén 2010). To perform the MSEM analyses, we followed the procedure proposed by Hayduk (1987). A measurement model was first set up to examine whether the measures of the variables were consistent with the theoretical constructs (see Fig. 1). Then a path model was used to test the hypothesized relationships among variables at the two levels (see Fig. 2). At the student level, two control variables, student SES and homework frequency, and student self-efficacy were included in the model as predictors of student science achievement. At the teacher level, the same two control variables, student self-efficacy, and three types of teaching approaches were included in the model as predictors of students’ science achievement. At the same time, student self-efficacy was used as a mediator. In the hypothesized model, the exogenous variables (i.e. three types of teaching approaches) are level-2 variables, the mediator (i.e. student self-efficacy) is a level-1 variable, and the endogenous variable (i.e. student science achievement) is also a level-1 variable. Therefore, the model is a 2-1-1 MSEM model (Preacher et al. 2010).
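In a 2-1-1 design, the predictor varies only between teachers, so the indirect effect is defined at the between level as the product of two paths: predictor to mediator (a) and mediator to outcome (b). The sketch below illustrates this product-of-coefficients logic with a first-order (Sobel) standard error. The coefficients, standard errors, and function names are hypothetical, and the Sobel approximation is an assumption made for illustration; the text above does not state how the significance of the indirect effects was tested.

```python
import math

def indirect_effect(a, b):
    """Product-of-coefficients estimate of the indirect effect."""
    return a * b

def sobel_se(a, se_a, b, se_b):
    """First-order (Sobel) standard error of the product a * b."""
    return math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

# Hypothetical between-level paths (not estimates from this study):
# a = teaching approach -> self-efficacy, b = self-efficacy -> achievement.
a, se_a = 0.20, 0.05
b, se_b = 0.30, 0.05

ab = indirect_effect(a, b)
se_ab = sobel_se(a, se_a, b, se_b)
z = ab / se_ab
print(round(ab, 3), round(se_ab, 3), round(z, 2))
```

With these illustrative inputs, the z statistic exceeds 1.96, so the hypothetical indirect effect would be judged significant at the .05 level under the normal approximation.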

Fig. 1 Measurement model. All the factor loadings are above the line

Fig. 2 Multilevel structural equation model. Solid and single-arrow lines refer to significant and positive paths, solid and double-arrow lines refer to significant and negative paths, and dotted lines refer to insignificant paths. Perf.-B = Students’ Science Performance at Between Level; Perf.-W = Students’ Science Performance at Within Level; SSEF-B = Students’ Self-Efficacy at Between Level; SSEF-W = Students’ Self-Efficacy at Within Level; SPT = Teachers’ Science Practice-based Teaching; STR = Teachers’ Traditional Didactic Science Teaching; GT = Teachers’ Generic Teaching; SES = Students’ Socioeconomic Status; HW = Students’ Frequency of Homework

Mplus reports the chi-square test, Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Standardized Root Mean Square Residual (SRMR) as model fit indices (Muthén and Muthén 2010). According to previous studies (Hox and Bechger 1999; Hu and Bentler 1999; Schermelleh-Engel et al. 2003), CFI and TLI values of .90 or greater suggest an acceptable fit, and values of .95 or greater suggest a good fit. An RMSEA value of .06 or smaller suggests a good fit. An SRMR value of .08 or smaller indicates an acceptable fit. In interpreting the indirect effects, we focus on two points recommended by Rucker et al. (2011): the statistical significance of the indirect effect and the size of the effect.
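The cutoffs listed above can be written as a small decision rule. The function name and the fit values passed to it are hypothetical; the thresholds are the ones stated in this section, and the labels for values outside the stated ranges ("poor", "above cutoff") are illustrative wording rather than categories from the cited sources.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label each fit index using the cutoffs stated in the text."""
    def incremental(value):
        # CFI and TLI: .95 or greater is good, .90 or greater is acceptable.
        if value >= 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "poor"

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        "RMSEA": "good" if rmsea <= 0.06 else "above cutoff",
        "SRMR": "acceptable" if srmr <= 0.08 else "above cutoff",
    }

# Hypothetical fit values, used only to exercise each branch.
labels = classify_fit(cfi=0.97, tli=0.92, rmsea=0.05, srmr=0.09)
print(labels)
```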

4 Results

4.1 Descriptive statistics and correlations

US students’ average score in the science assessment was 523.88 (SD = 77.11), which was close to the score reported in the 2011 TIMSS (i.e. 525). Among the three items measuring students’ self-efficacy, students rated the item “I usually do well in science” (M = 3.28, SD = .83) higher than the other two items. This means that students agreed more strongly that they do well in science than that they learn things quickly in science or are good at solving difficult science problems. Among the three types of teaching, teachers reported more generic teaching than science practice-based teaching and traditional didactic science teaching. For the four science practice-based teaching approaches, teachers reported a higher frequency of the practice “Relate what they are learning in science to their daily lives” (M = 3.37, SD = .74) than of the other three practices. The practice “Design or plan experiments or investigations” was reported with the lowest frequency (M = 2.47, SD = .73). The three traditional didactic science teaching approaches were reported with similar frequency, with “Read their textbooks or other resource materials” being the most frequent (M = 2.70, SD = .85). The teachers reported a high frequency of the four generic teaching approaches: “Use questioning to elicit reasons and explanations” (M = 3.75, SD = .53), “Summarize what students should have learned from the lesson” (M = 3.69, SD = .60), “Relate the lesson to students’ daily lives” (M = 3.57, SD = .61), and “Bring interesting materials to class” (M = 3.23, SD = .71). The mean of the students’ SES indicator was 2.14 (SD = .53), indicating that American students’ overall home educational resources were just above average. The average of students’ reports of the frequency of science homework was 3.04 (SD = 1.12), meaning that American students’ science homework frequency was about 3 or 4 times a week (see Table 1).

Overall, the correlations among the items were not high, ranging from .00 to .65. The three items measuring student self-efficacy in science had high correlations with one another (i.e. .63, .55, and .65). A few items measuring teachers’ science practice-based teaching approaches had high correlations, such as the correlation between “Design or plan experiments or investigations” and “Conduct experiments or investigations” (r = .65). Two items measuring teachers’ traditional didactic science teaching had a comparatively high correlation: “Read their textbooks or other resource materials” and “Have students memorize facts and principles” (r = .48). In addition, one item measuring teachers’ generic teaching, “Relate the lesson to students’ daily lives”, and one item measuring teachers’ science practice-based teaching, “Relate what they are learning in science to their daily lives”, had a correlation of .54 (see Table 2).

Table 2 Zero-order correlations among variables

4.2 Measurement model

We conducted a Confirmatory Factor Analysis (CFA) in Mplus to test the measurement model before testing the MSEM. The results showed that χ²(71) = 582.84, RMSEA was .03, CFI was .96, TLI was .94, SRMR at the within level was .01, and SRMR at the between level was .07. These indicate an excellent fit between the model and the data. All the factor loadings in the measurement model were statistically significant at p < .001, and most of the loadings ranged from .50 to .98. The factor loadings for the three items measuring students’ self-efficacy at the within level were .70 (“I usually do well in science”), .85 (“I learn things quickly in science”), and .74 (“I am good at working out difficult science problems”), respectively. Two items measuring teachers’ science practice-based teaching had high factor loadings (λ = .74 for “Design or plan experiments or investigations” and λ = .79 for “Conduct experiments or investigations”). Two items measuring teachers’ traditional didactic science teaching had similar factor loadings (λ = .66 for “Read their textbooks or other resource materials” and λ = .67 for “Have students memorize facts and principles”). One item measuring teachers’ generic teaching had the highest factor loading within that construct (λ = .53 for “Relate the lesson to students’ daily lives”), and the factor loadings of the other three items were close to .50. All three items measuring student self-efficacy at the between level loaded strongly on the latent variable, with all factor loadings higher than .90 and the loading of “I learn things quickly in science” being .98 (see Fig. 1).

4.3 Multilevel structural equation modeling

The results of the final multi-level structural equation modeling (MSEM) analysis showed that χ2 (117) = 1073.53, RMSEA was .03, CFI was .93, TLI was .91, SRMR at the within level was .05, and SRMR at the between level was .08. All these values indicate an adequate fit between the model and the data.
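As a rough consistency check, the reported RMSEA values can be approximated from the reported chi-square statistics and degrees of freedom. The sketch below uses the common single-level formula, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))), and assumes N equals the 10,382 students in the sample. Both choices are assumptions: the two-level estimator in Mplus may compute the index differently, so this is an approximation rather than a reproduction of the software output.

```python
import math

def rmsea(chi_square, df, n):
    """Approximate RMSEA from a chi-square statistic (single-level formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

N_STUDENTS = 10382  # assumption: total student sample size from Sect. 3.1

measurement_rmsea = rmsea(582.84, 71, N_STUDENTS)    # CFA measurement model
structural_rmsea = rmsea(1073.53, 117, N_STUDENTS)   # final MSEM
print(round(measurement_rmsea, 2), round(structural_rmsea, 2))
```

Under these assumptions, both values round to .03, in line with the figures reported for the two models.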

4.3.1 Relationships at the within level

At the within level, student SES and science homework frequency were statistically significant predictors of student science achievement (p < .001). Student SES was a positive predictor, whereas homework frequency was a negative one. This suggests that students who have more home educational resources are more likely to perform better in science, whereas students who have science homework more frequently are less likely to have higher science achievement. After controlling for student SES and homework frequency, student self-efficacy was still a significant and positive predictor of science achievement (p < .001) (see Fig. 2). This supports our hypothesis 1 at the within level. As shown in Table 3, student self-efficacy in science had a larger standardized coefficient (β = .20, SE = .01) than SES (β = .08, SE = .01) and homework frequency (β = −.04, SE = .01). This means that student self-efficacy in science was a stronger predictor of student science achievement than the two control variables at the student level.
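The significance of the within-level paths can be approximated from the reported standardized coefficients and standard errors with a Wald z test. This is a minimal sketch assuming a normal sampling distribution; because the published values are rounded to two decimals, the resulting z and p values are only approximations of those produced by the original analysis, and the name wald_test is hypothetical.

```python
import math


def wald_test(beta, se):
    """Return the Wald z statistic and its two-sided normal p value."""
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Within-level paths to science achievement, as (beta, SE) from Table 3.
within_paths = {
    "self_efficacy": (0.20, 0.01),
    "ses": (0.08, 0.01),
    "homework_frequency": (-0.04, 0.01),
}

for name, (beta, se) in within_paths.items():
    z, p = wald_test(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.2g}")
```

Even the smallest coefficient here (homework frequency) is four standard errors from zero, which is in line with the reported p < .001.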

Table 3 Path standardized coefficients and standard error of MSEM model

4.3.2 Relationships at the between level

At the between level, student SES was a significant and positive predictor of student science achievement (p < .001), but homework frequency was nonsignificant (p = .61). This suggests that students with more home educational resources taught by the same teacher also performed better in the science assessment than those with fewer resources. However, homework frequency was not related to science achievement at the group level. After controlling for the two variables, student self-efficacy in science was still a significant and positive predictor (p < .001). This shows that students with higher self-efficacy in science still outperformed those with lower self-efficacy in science achievement even though they were taught by the same teacher. It also supports our hypothesis 1 at the between level. Comparatively speaking, student SES (β = .72, SE = .03) had a larger standardized coefficient than student self-efficacy in science (β = .31, SE = .05) (see Table 3). This suggests that student SES accounted for a larger share of the variance in science achievement than student self-efficacy in science at the between level.

However, the three types of teaching approaches did not have significant direct effects on student science achievement (p = .42 for science practice-based teaching, p = .50 for traditional didactic science teaching, and p = .97 for generic teaching). This indicates that none of these three types of teaching approaches directly affected student science achievement. These results did not support our hypothesis 2.

Interestingly, among the three types of teaching, only generic teaching was a significant predictor of student self-efficacy in science (p = .02). This indicates that students with a teacher using more generic teaching are more likely to have higher self-efficacy, whereas the frequency of science-domain-specific teaching was not related to student self-efficacy in science. The standardized coefficient of generic teaching was .25 (SE = .10). The results partially supported our hypothesis 3.

Further, no significant indirect effects on student science achievement through student self-efficacy in science were found for the two types of science-domain-specific teaching approaches (p = .25 for science practice-based teaching, p = .58 for traditional didactic science teaching). However, a significant indirect effect on student science achievement through self-efficacy in science was found for generic teaching (p = .04). This shows that students with a teacher using more generic teaching approaches are more likely to have higher science achievement through the effect of their self-efficacy in science, although the coefficient of the indirect effect is comparatively small (β = .08, SE = .04). These results partially supported our hypothesis 4.
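The indirect effect of generic teaching can be approximated as the product of the two between-level paths reported above: generic teaching to self-efficacy (β = .25, SE = .10) and self-efficacy to achievement (β = .31, SE = .05). The sketch below uses the first-order delta method (Sobel) standard error. This is an assumption for illustration, since the text does not state how the standard error of the indirect effect was obtained, and the inputs are rounded published values; the name indirect_effect is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def indirect_effect(a, se_a, b, se_b):
    """Product-of-coefficients indirect effect with a first-order
    delta-method (Sobel) standard error.

    a: path from predictor to mediator; b: path from mediator to outcome.
    ASSUMPTION: a and b are treated as uncorrelated estimates.
    """
    estimate = a * b
    se = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = estimate / se
    return estimate, se, z, two_sided_p(z)


# Between-level paths as reported in Section 4.3.2.
estimate, se, z, p = indirect_effect(a=0.25, se_a=0.10, b=0.31, se_b=0.05)
print(f"indirect = {estimate:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.3f}")
```

The product .25 × .31 = .0775 rounds to the reported indirect effect of .08. The approximate standard error and p value need not match the reported ones exactly because of rounding and the assumed estimation method.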

5 Discussion

5.1 Contributions

This study sought to examine the direct and indirect relationships among teaching approaches, student self-efficacy, and science achievement using data on US eighth-grade students collected in the 2011 TIMSS assessment. It makes the following contributions to the current literature. First, it examines the relationships among the three variables by using a large-scale, nationally representative sample and a multi-level structural equation modeling approach that takes into account the hierarchical relations between teachers and students. This provides the literature with valid and generalizable study results. Second, it identifies a mechanism through which generic teaching approaches relate to student science achievement, namely the mediating effect of student self-efficacy in science. Third, it provides empirical evidence to support the differences between generic teaching approaches and science-domain teaching approaches and their dissimilar effects on students’ self-efficacy in science and science achievement. Fourth, it shows that self-efficacy in science is a stronger predictor of student science achievement than the control variables at the student level, corroborating the importance of students’ self-theories in learning in a general sense.

5.2 Discussion

This study found that none of the three types of teacher-reported teaching practices (generic teaching, science practice-based teaching, and traditional didactic teaching) had a significant direct effect on science achievement. This result is not consistent with some previous empirical studies that identified a positive relationship between generic teaching approaches, such as questioning and providing or reinforcing objectives, and student academic achievement (Ames 1992; Dole et al. 1991; Hattie et al. 1996; Schroeder et al. 2007). However, this result is consistent with other recent empirical studies using data from TIMSS and PISA, another large international data set, that reported a nonsignificant or even negative relationship between practice-based science teaching approaches and student science achievement (Cairns and Areepattamannil 2019; Jerrim et al. 2019; Jiang and McComas 2015; Long 2016). Therefore, this finding again empirically challenges a major assumption of science education reform, which proposes to improve student science achievement by shifting the focus of science teaching from traditional didactic teaching to engaging students in science practices (NGSS Lead States 2013; NRC 1996).

This lack of significant effects also suggests that implementing a complex science activity to achieve meaningful goals in the classroom might require support from resources above and beyond teaching. The studies showing positive relationships between practice-based science teaching and student achievement (Wilson et al. 2010; Taraban et al. 2007) usually involved effective professional development, curriculum materials, and specific practice guides to support teacher and student learning (Windschitl and Calabrese Barton 2016). Therefore, additional factors that may help teachers implement science practice-based teaching need to be examined in future studies.

The current study also showed that only generic teaching was a significant predictor of student self-efficacy in science, whereas neither of the two science-domain-specific teaching approaches (practice-based and didactic) was. This finding contributes to our understanding of how regular teaching approaches affect student self-efficacy in science by providing empirical evidence based on a large-scale, nationally representative dataset (Cheung 2015). It also adds more complexity to the current literature by agreeing with some scholars (Britner and Pajares 2001) but disagreeing with others (Sungur and Tekkaya 2006). It may also suggest that science-specific teaching approaches are related to students’ self-efficacy in science through other student variables, such as peer pressure within science practices (Gao and Wang 2014; Wang and Lin 2005) and students’ cultural values. These variables can be examined in future studies.

Another important finding in this study is that student self-efficacy in science was significantly and positively associated with science achievement. This finding further supports theories from Bandura (1997) and Greene (2015) and adds to existing empirical studies in the field of science education (Aurah 2017; Boz et al. 2016; Britner 2008; Chen and Pajares 2010). It also encourages science teachers to continue to improve students’ self-efficacy in science and thereby help students improve their science achievement (Britner 2008; Britner and Pajares 2001, 2006; House 2008; Lavonen and Laaksonen 2009). However, further research is needed to scrutinize the mechanisms behind this relationship and identify what other factors mediate the relationship, such as the time or effort individual students spend on science learning (Schmidt et al. 2018).

This study also reported a surprising and significant indirect association between generic teaching and science achievement through student self-efficacy in science, even though generic teaching was not directly and significantly associated with achievement. This finding empirically corroborated the mediating role of self-efficacy in science between teaching approaches and science achievement and supported the view that student self-efficacy in science is influenced by generic teaching approaches, such as encouraging students to do their work (Azevedo 2015). This finding also suggested that the reason science teaching approaches did not increase student science achievement may be that these approaches suppress the improvement of students’ academic self-efficacy (Chen and Pajares 2010). Further research, such as qualitative studies, could be conducted to explore how and why this suppression happens.

This study further found a positive effect of one control variable, student SES, on science achievement and a negative effect of another control variable, science homework frequency, at level 1. The SES result is in line with previous studies that have consistently identified SES as a significant factor affecting student science achievement, including studies using large-scale datasets (Long and Pang 2017; Maltese et al. 2012). Researchers have also indicated that the association between homework frequency and achievement varies with the measures and the level of analysis chosen in the study, which has been referred to as chameleon effects (Trautwein et al. 2009). However, it remains unclear whether and how teaching approaches interact with student SES. Therefore, in the future, it is important to examine how teaching approaches are related to non-teaching factors, such as the social, economic, cultural, and historical contexts in which teaching and curriculum practices are situated (Berliner 2009; Sykes et al. 2010).

5.3 Limitations

This study provides interesting insights into the current literature. However, it is not without limitations. First, the 2011 TIMSS used in the study is a secondary database. Teacher and student participants in this dataset were asked to self-report the frequencies of different teaching activities in the science classroom through questionnaires. The lack of observations of teaching activities could affect the interpretation of how the three types of teaching approaches were regularly used in the classrooms. Therefore, the findings of this study need to be verified and further examined in future studies based on systematic observations and other approaches, such as interviews. Second, only some components of generic teaching, science practice-based teaching, and traditional didactic science teaching approaches were surveyed in the instruments. Other components of the teaching approaches may be unrepresented and need to be explored using more complete instruments in the future. Third, although the hierarchical relations between teachers and students were considered in this study, causal inferences about the relationships among the three variables should be drawn with caution. Finally, this study only examined direct and indirect relationships among three variables. In the future, the relationships among more variables and moderation or mediated moderation effects will be examined.

6 Conclusion

Using US eighth-grade data obtained from the 2011 TIMSS, this study examined two important theoretical assumptions that have guided science education reform during the past two decades. Using an MSEM approach, it provides empirical evidence not only for the relationships among teaching approaches, student self-efficacy in science, and science achievement, but also for the mediation effect of student self-efficacy in science between teaching approaches and science achievement. Our findings indicated a significant mediation effect of student self-efficacy on the relationship between generic teaching practices and student science achievement, even though none of the teaching approaches identified in this study was directly associated with student science achievement. This suggests that simply changing science teaching and curriculum in the reform of science education may not improve student science achievement. Rather, the reform needs to pay substantial attention to the complex relationships between teaching approaches and student science achievement, especially how teaching approaches influence student self-beliefs.