Introduction

The mathematics achievement gap is present in comparisons of males to females across the K-12 grades (Ellison and Swanson 2010; Ganley and Lubienski 2016; Parker et al. 2017; Robinson and Lubienski 2011). Robinson and Lubienski (2011) found that kindergarten girls and boys were equal in mathematics achievement; however, girls began to lose ground to boys in the primary grades before regaining some of the difference in the middle years of schooling (Grades 6–8). According to Ganley and Lubienski (2016), differences between boys and girls in mathematics confidence in primary school are larger than differences in mathematics interest, and may be one of the main reasons for the gap in mathematics achievement.

Studies have shown that the class achievement gap, or socio-economic status (SES) achievement gap, is present within the preschool years and persists until graduation, with lower-income students continually scoring lower in both mathematics and reading test scores (Pianta et al. 2008; Sirin 2005; Votruba-Drzal et al. 2008). Disparities in informal arithmetic knowledge among lower-income and upper-middle-income schools could also be a contributing factor to the mathematics achievement gap. Case et al. (2001) found 75% of the children in an upper-middle-income school already had informal arithmetic knowledge before entering school, whereas only 7% of children in a lower-income school had this knowledge.

Achievement gaps are often explained by postulating differences in family structure, home chaos, parenting practices, executive functioning, cultural capital and mother’s education (e.g., Crook and Evans 2014; Grimm et al. 2010; McKown 2013; Reardon 2011). However, an additional aspect of these achievement gaps might be differences in students’ classroom engagement and motivation.

Student engagement is defined as a student’s active behavioural involvement in school and learning activities (Fredricks et al. 2004). Three dimensions of engagement have been proposed in the literature (Fredricks et al. 2004): affective (emotional), cognitive, and behavioural. Behavioural engagement refers to observable behaviour such as overt attention, classroom participation, and question asking. Cognitive engagement refers to mental effort, such as thinking about course content, using strategies, and concentrating. Affective engagement refers to positive emotions during class, such as interest, enjoyment, and enthusiasm. The current study examines classroom engagement (Wang et al. 2014) as rated by the teacher, as this is the type of engagement most closely related to learning outcomes.

Motivation was measured through self-reported attitudes toward mathematics and students’ perceptions of their competence in mathematics, two commonly measured aspects of motivation (Schunk et al. 2013). Attitude toward mathematics refers to perceptions of interest, enjoyment, and liking; positive attitudes are related to academic achievement (e.g., Bergin 1999; Cvencek et al. 2011; Dupeyrat et al. 2011; Ganley and Vasilyeva 2011; García et al. 2016; Lambic and Lipkovski 2012; Nosek and Smyth 2011; Pinxten et al. 2014; Villavicencio and Bernardo 2013). For example, Dupeyrat et al. (2011) found that self-perceptions of mathematics competence predicted mathematics achievement among French high school students, and Ganley and Vasilyeva (2011) found that attitudes toward mathematics were associated with mathematics achievement for girls but not boys among U.S. middle school students. In addition, enjoyment of mathematics is associated with higher mathematics achievement in upper primary students (García et al. 2016), in middle school students (Pinxten et al. 2014), and in first-year tertiary students (Villavicencio and Bernardo 2013).

Teacher-rated student engagement, a behavioural measure of engagement, and motivation, the perceived internal interest and competence in mathematics, were assessed to determine whether the two variables would show similar patterns of prediction. This comparison is important because most research relies on self-report to measure motivation; in this study, we examined the individual and moderating influences of self-reported motivation and teacher-reported behavioural engagement. The individual and combined influences of motivation and engagement on mathematics achievement have implications for the biological sex and SES achievement gaps in mathematics (Collie et al. 2019; Collie and Martin 2017; Martin et al. 2015; OECD 2015; Planty et al. 2009) and for the practice of measuring motivation variables with different methods.

The purpose of the current study was to examine whether self-reported motivation and teacher-reported engagement predict mathematics test scores across groups delineated by biological sex and SES.

Methods

Data source and participants

The Early Childhood Longitudinal Study (ECLS-K) is a nationally representative, longitudinal study which began with 21,260 kindergartners in 1998–1999. The study examined multiple childhood variables and their influence on academic performance. The National Center for Education Statistics (NCES) was responsible for all data collection (Tourangeau et al. 2009). A multisource and multimethod approach was utilised that included data from students, their parents, the students’ classroom teachers and the school administrator. The data include direct assessments of students’ achievement in mathematics, reading and science. Initial data were collected in the fall of the participants’ kindergarten year. Subsequent data collection waves included the following: (a) spring of the kindergarten year, (b) fall and spring of the 1st grade, and (c) spring of the 3rd, 5th and 8th grades. The seventh and final wave of data was collected in the 2006–2007 school year.

A nationally representative sample of children who were in kindergarten in 1998–99 was created using a multistage probability sample design. The primary sampling units (PSUs) were geographic areas consisting of counties or groups of counties; schools were then sampled within the PSUs, and finally students within the schools. The data collection included superstrata to help ensure a representative sample based upon a number of demographic variables, including race/ethnic group and socio-economic status (SES). See Tourangeau et al. (2009) for more details on data collection and sampling technique.

A subset of the original ECLS-K sample was used, restricted to the 3rd and 5th grades (waves five and six). The 3rd and 5th grades were chosen to ensure a more stable score on mathematics achievement than is available in earlier grades. Because the analysis included teacher-level variables, the 8th grade (seventh wave) was purposely not included: students are more likely to stay with one teacher during the primary years, which allows the teacher-level variable to be paired with individual student-level factors. Owing to attrition, the sixth wave of data contains 15,403 students from the original sample. Cases without weights were not included in the analysis (n = 3583). In addition, 474 cases were dropped because they had no Taylor series design correction statistic, and 2755 cases were dropped because they had no racial or SES values listed in the fifth or sixth wave of data, leaving a final study sample of 8591.
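The sample derivation above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the function name and the dictionary labels are illustrative and are not part of the ECLS-K documentation.

```python
# Counts as reported in the text; labels are shorthand for the stated reasons.
wave6_students = 15_403
exclusions = {
    "no sampling weight": 3_583,
    "no Taylor series design correction statistic": 474,
    "no race or SES value in wave five or six": 2_755,
}


def final_sample(start, dropped):
    """Subtract each exclusion in turn and return the remaining count."""
    remaining = start
    for reason, n in dropped.items():
        remaining -= n
        print(f"{reason}: -{n} -> {remaining}")
    return remaining


final_n = final_sample(wave6_students, exclusions)
```

Applying the three exclusions in sequence leaves 8591 cases, which matches the final study sample reported above.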

The students in the study sample were almost equally male, 4350 (50.6%), and female, 4241 (49.4%). The students were predominantly white, 5439 (63.3%), with 826 (9.6%) Black, 1413 (16.4%) Hispanic, 453 (5.3%) Asian, 96 (1.1%) Native Hawaiian or other Pacific Islander, 169 (2.0%) American Indian or Alaska Native, and 195 (2.3%) multi-racial. SES was divided into quintiles by ECLS study staff, based upon the total population of the wave, with the first quintile being the lowest bracket and the fifth quintile being the highest bracket. The number of cases in each SES category for the study sample differed between the fifth wave (3rd grade) and the sixth wave (5th grade); however, the movement between quintiles was small. Because of the reduction in sample size, SES was not divided precisely into 20% increments within the study sample. In the fifth wave, the sample consisted of 14.5% (1246) in the lowest category, 17.5% (1504) in the lower-middle, 18.8% (1619) in the middle, 22.3% (1918) in the upper-middle, and 26.8% (2304) in the highest SES quintile. For ease of interpretation we labelled the lowest category of SES as ‘lower’, the second as ‘lower-middle’ class, the third as ‘middle’ class, the fourth as ‘upper-middle’ class, and the fifth as ‘highest’.
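As a quick consistency check on the fifth-wave SES distribution, the reported counts can be converted back into percentages. This is a minimal sketch using only the counts stated above; the variable names are illustrative.

```python
# Fifth-wave (3rd grade) counts per SES quintile, as reported in the text.
ses_counts = {
    "lower": 1246,
    "lower-middle": 1504,
    "middle": 1619,
    "upper-middle": 1918,
    "highest": 2304,
}

sample_size = sum(ses_counts.values())

# Share of the study sample in each quintile, rounded to one decimal place.
ses_shares = {
    label: round(100 * n / sample_size, 1) for label, n in ses_counts.items()
}

for label, share in ses_shares.items():
    print(f"{label}: {share}%")
```

The counts sum to the study sample of 8591, and the computed shares show the unequal quintile sizes that result from applying population-based cut points to a reduced sample.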

Analysis

The analysis included weights for the last wave used, the 5th grade (sixth wave), to account for possible bias in the data due to attrition across the years of data collection. Taylor series design effects were also included in the analysis to account for the complex sampling design of the ECLS-K dataset. Patterns of missing data were examined and the data were found to be missing at random, with the highest percentage of missing data (14%) in the item response theory (IRT) mathematics scores for the 3rd grade. Missing data were imputed using a Markov chain Monte Carlo (MCMC) algorithm to produce 25 data sets for analysis. Finally, multiple linear regression was used to test the effect of mathematics motivation and school engagement on mathematics IRT scores for each combination of biological sex and SES for 3rd and 5th grade scores. All analyses were conducted using Stata version 14.
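The text does not state how estimates from the 25 imputed data sets were combined. A common approach is Rubin's rules, which average the point estimates and add the within- and between-imputation variances. The sketch below assumes that approach; the function name and the input values in the usage line are hypothetical and are not results from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical values for three imputations, for illustration only.
pooled_b, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what makes the pooled standard error larger than the average of the per-data-set standard errors, reflecting the uncertainty introduced by the missing values.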

Instruments

3rd and 5th grade motivation for mathematics

Mathematics motivation was measured with items from the Self-Description Questionnaire (SDQ; Marsh et al. 1983), administered to students during the 3rd and 5th grade data collection waves. Five items measured attitudes toward mathematics (e.g., I enjoy doing work in mathematics, I cannot wait to do mathematics each day, I like mathematics) and three items measured perceived mathematics competence (e.g., I am good at mathematics). Children were asked to rate each item on a 4-point scale where 1 = not at all and 4 = very true. The scale’s reliability and validity are well established (see Marsh et al. 1983), and reliability was also high in the study’s sample (3rd grade α = .89, 5th grade α = .92).
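The α values above are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed from item-level responses, the function below applies the standard formula; the function name and the toy responses in the usage lines are hypothetical and are not ECLS-K data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, each with one entry per respondent
    (all lists must have the same length and respondent order).
    """
    k = len(items)
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses: three items answered identically by four students.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
toy_alpha = cronbach_alpha(toy_items)
```

When every item ranks respondents identically, as in the toy data, alpha reaches its maximum of 1; values near .90, as reported for this scale, indicate that the items vary together closely.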

3rd and 5th grade engagement

The Social Rating Scale (SRS), an adaptation of the Social Skills Rating System (Gresham and Elliot 1990), was used to measure classroom engagement. The classroom teacher completed the scale for each student. The scale consisted of six items measuring aspects of classroom behaviour such as learning independence, flexibility, eagerness to learn, and attentiveness. Teachers answered each item on a four-point scale where 1 = never and 4 = very often. The scale’s validity and reliability are well established (see Gresham and Elliot 1990) and the split-half reliability was high (3rd grade α = .91, 5th grade α = .91).

Mathematics achievement

Mathematics achievement was measured using IRT standardised mathematics test scores. The mathematics assessment items measured a wide range of mathematics skills, including ordinality, sequence, rate and measurement, and multiplication/division. All scores were transformed and reported as T-scores to facilitate comparisons to national averages (Tourangeau et al. 2006).
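A T-score rescales a standardised score to a mean of 50 and a standard deviation of 10. The sketch below illustrates the transformation under a simplifying assumption: it standardises against the unweighted mean and standard deviation of the scores supplied, whereas the ECLS-K T-scores are referenced to the national distribution. The function name and input values are hypothetical.

```python
from statistics import mean, pstdev


def to_t_scores(raw_scores):
    """Rescale scores to mean 50 and SD 10 relative to the supplied scores.

    Assumption: unweighted, within-sample standardisation for illustration.
    """
    mu = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("scores have no variance; T-scores are undefined")
    return [50 + 10 * (x - mu) / sd for x in raw_scores]


# Hypothetical raw scores, for illustration only.
t_scores = to_t_scores([10.0, 20.0, 30.0])
```

On this scale a score of 60 sits one standard deviation above the reference mean, which is what makes group means directly comparable to the reference population.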

Grouping variables

Biological sex was self-reported by the students during the 5th grade wave of data collection. Family SES was computed as a composite of five items: father’s and mother’s occupation, father’s and mother’s education level, and household income. SES was divided into quintiles by ECLS study staff, where the 1st quintile represented the lowest 20% and the 5th quintile represented the highest 20% of the full wave population. The grouping variable was computed by combining biological sex with each level of SES. Means and standard deviations for the variables included in the analysis and sub-sample sizes are reported for each group in Table 1 for the fifth wave (3rd grade) and in Table 2 for the sixth wave (5th grade).

Table 1 Descriptive statistics by Biological Sex and SES for 3rd grade
Table 2 Descriptive statistics by Biological Sex and SES for 5th grade

Results

Teacher-rated student engagement and student self-reported motivation for mathematics were used as predictors of mathematics achievement in 3rd and 5th grades. The analysis examined patterns of mathematics achievement within each combination of biological sex and SES quintile. The overall means for mathematics IRT scores, mathematics motivation, and school engagement for both 3rd and 5th grade are shown in Fig. 1 for males and females separately. Figure 2 illustrates the same means for each category of SES. A total of 20 multiple regressions (2 biological sex groups × 5 SES quintiles × 2 grades) were run to predict mathematics test scores using mathematics motivation and engagement as independent variables. Results are shown in Table 3 for 3rd grade and in Table 4 for 5th grade.
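The group-wise design can be sketched as one weighted regression per biological sex × SES group, repeated for each grade. The example below is an illustration only: the analyses in the study were run in Stata, and this sketch produces weighted point estimates without the Taylor series standard errors or the pooling across imputed data sets. All function names and demo records are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_ols(rows):
    """Weighted least squares of score on motivation and engagement.

    rows: iterable of (weight, motivation, engagement, score).
    Returns [intercept, b_motivation, b_engagement].
    """
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for w, motivation, engagement, score in rows:
        x = (1.0, motivation, engagement)
        for i in range(3):
            xtwy[i] += w * x[i] * score
            for j in range(3):
                xtwx[i][j] += w * x[i] * x[j]
    return solve(xtwx, xtwy)


def fit_by_group(records):
    """Fit one regression per (sex, SES) group.

    records: iterable of (sex, ses, weight, motivation, engagement, score).
    """
    groups = {}
    for sex, ses, w, motivation, engagement, score in records:
        groups.setdefault((sex, ses), []).append((w, motivation, engagement, score))
    return {group: weighted_ols(rows) for group, rows in groups.items()}


# Hypothetical records for two groups in one grade, built so that
# score = 10 + 2*motivation + 5*engagement for the first group and
# score = 20 + 1*motivation + 3*engagement for the second.
demo = [
    ("male", "lower", 1.0, 1, 1, 17), ("male", "lower", 2.0, 2, 1, 19),
    ("male", "lower", 1.5, 1, 2, 22), ("male", "lower", 0.5, 3, 4, 36),
    ("female", "lower", 1.0, 1, 1, 24), ("female", "lower", 1.0, 2, 1, 25),
    ("female", "lower", 2.0, 1, 2, 27), ("female", "lower", 1.0, 2, 3, 31),
]
coefficients = fit_by_group(demo)
```

Running the same function on the records for each grade across all ten groups yields the 20 regressions described above.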

Fig. 1 Mean mathematics IRT scores, mathematics motivation, and engagement by biological sex

Fig. 2 Mean mathematics IRT scores, mathematics motivation, and engagement by SES quintile

Table 3 Multiple linear regression results by Biological Sex and SES predicting 3rd grade IRT mathematics scores with mathematics motivation and engagement
Table 4 Multiple linear regression results by Biological Sex and SES predicting 5th grade IRT mathematics scores with mathematics motivation and engagement

For the 3rd grade, school engagement predicted mathematics IRT scores for all groups (all p values < .01) except males in the highest SES quintile. Mathematics motivation was a less consistent predictor, predicting mathematics IRT scores only for lower-middle, upper-middle and highest males and for lowest and highest females (all p values < .05).

The results for the 5th grade are similar; school engagement is again the more consistent predictor, predicting mathematics IRT scores for all groups of students examined (all p values < .05). Mathematics motivation is associated with mathematics IRT scores in the 5th grade only for male students (male lower, male upper-middle and male highest) (all p values < .05).

Trends for IRT mathematics mean scores, mathematics motivation, and school engagement for males and females, for both 3rd and 5th grades, are shown in Fig. 1. Note that 5th grade scores are higher than 3rd grade scores, and that males have lower engagement scores but higher levels of mathematics motivation than females during both time periods. Figure 2 contains the trends for IRT mathematics mean scores, mathematics motivation, and school engagement for each of the SES quintiles, for both 3rd and 5th grades. Mathematics achievement increases as SES level increases, as does engagement; however, mathematics motivation is relatively flat across the five groups. Figure 3 examines the combination of biological sex and SES for IRT mathematics mean scores, mathematics motivation and school engagement for 3rd and 5th grades. Mathematics achievement increases across the SES groups, with males having slightly higher scores. Females show lower levels of mathematics motivation than males, but mathematics motivation is relatively level across SES groups within both males and females. Females display higher levels of engagement than males across groups.

Fig. 3 Mean mathematics IRT scores, mathematics motivation, and engagement by biological sex and SES quintile

Discussion

The main finding of the study is that school engagement, as rated by teachers, is a more universal predictor of mathematics achievement across biological sex and SES, for both 3rd and 5th grades, than motivation/interest in mathematics as rated by students. This finding is most pronounced among the female students examined, for whom mathematics motivation predicted mathematics achievement only in the lowest and highest SES quintiles in 3rd grade. School engagement was also a consistent predictor of mathematics achievement among males, except for males in the highest SES quintile in the 3rd grade. It seems that a student’s observable behaviour predicts mathematics achievement. It may be that observable engagement in school is an active ingredient for achievement. Students’ self-reports of motivation are more consistent across biological sex/SES groups, and this may have led to a restriction of range that suppressed the influence of this factor. It is important to note, however, that motivation was measured specifically for mathematics, whereas engagement was measured for the classroom in general. It is therefore possible that the findings reflect a difference between mathematics motivation and general classroom behaviour rather than a difference between self-report and teacher report.

Within the study’s population we did not find that self-reported motivation was a consistent predictor of standardised test scores in mathematics. It may be that motivation without the knowledge or the skills to implement the behaviours necessary does not lead to academic achievement. It is possible that students, across biological sex and income groups, are motivated but do not know how to use the motivation to develop effective learning activities. In support of this, Fryer (2010) found that financial incentives did not improve students’ grades. Students were motivated to receive the monetary incentive but did not have the behavioural repertoire to achieve the grades necessary. Several different incentive systems were implemented across multiple major metropolitan cities within the United States, and very few were successful in increasing academic achievement. In discussing the results, Fryer pointed to the fact that motivation does not correspond to knowledge of the procedures and skills necessary to succeed. However, if the steps necessary to achieve the goal have been learned, such as reading, then incentives increase the behaviour of reading books. When the behaviour is more complex and less procedurally clear, such as increasing scores on standardised tests, then incentives seem to have little to no influence. Fryer also asked students what they thought they could do to increase their standardised test scores; they mentioned being careful when they read the questions of the test but none mentioned “reading the textbook, studying harder, completing their homework, or asking teachers or other adults about confusing topics” (p. 33). The students seemed to concentrate on the test-taking moment and not on strategies to engage in learning. The findings from the current study support Fryer’s assertions.

The restricted variance in motivation for mathematics across biological sex and SES indicates that students in the ECLS-K data report experiencing similar levels of motivation. However, there was no clear connection between the level of motivation and the activities that constitute school engagement, which did predict mathematics achievement for the vast majority of groups examined.

The students examined in the study reported lower mathematics motivation in the 5th grade than in the 3rd grade. Several previous studies have found declining levels of motivation and perceptions of competence as students get older (e.g., Jacobs et al. 2002; Lepper et al. 2005). However, engagement scores were fairly constant across 3rd and 5th grade. School engagement did increase as SES increased, but again this increase was parallel between the 3rd and 5th grades. This may be another difference between self-reported student-level mathematics motivation and teacher-reported behavioural school engagement measures.

A strength of the current study is the use of standardised IRT test scores from the ECLS-K data set and not teacher-assigned grades which may be influenced by the teachers’ bias toward students that are more engaged in class or have other behavioural or demographic characteristics such as high SES. The current study also uses a standard measure of SES (household income, father’s and mother’s education, and their occupations) and not a limited proxy of SES such as a dichotomous measure of free or reduced-price lunch.

As with all secondary data analysis, the original study was not designed to measure the differences between measures of motivation and engagement within the groups examined. It would be interesting to have both students and teachers rate motivation and engagement in order to compare the congruence between the sources and their predictive power for academic achievement. However, we are limited to comparing self-reported mathematics motivation with teacher-reported classroom engagement. Another limitation is that school engagement was measured as a general construct, whereas motivation was measured specifically for the domain of mathematics. This limitation may be tempered by the fact that behaviours included within school engagement may be important across the various academic domains.

Future research is needed to compare the predictive power of self-reported measures of both motivation and engagement with that of teacher-rated observable measures of the same constructs, especially in light of the developmental stage of primary students. Could teachers’ reports of motivation and engagement have more validity in predicting academic achievement within this population? It would also be useful to develop more precise measures of motivation and engagement from the teachers’ perspectives, as some of the observable behaviours may be confounded between the two constructs.

The current study points to a potentially important difference in school engagement between males and females from different SES backgrounds with regard to mathematics achievement. Replication of our findings is certainly warranted, and more work is needed in this area; nevertheless, this difference may have important implications for targeting resources to help reduce the mathematics achievement gap by increasing behavioural engagement in low-achieving students. The findings also inform educational researchers about a potentially important difference between motivation and school engagement in their predictive power when examining mathematics achievement across SES groups and between males and females.