1 Introduction

Educational achievement is often surprisingly gendered. After all, legislators, policymakers, and educators have worked for more than a generation to encode a principle of equal opportunity in education. Boys and girls generally attend the same schools, sit in classrooms alongside one another, and learn the same lessons. Why then do gender disparities in achievement persist? This question becomes all the more perplexing when one considers that—despite the frequent rhetoric around the issue—it is not a simple matter of one gender having an overall edge in terms of achievement. Girls lag behind when the focus is on achievement in mathematics or science, especially in the upper grades or at the top end of the distribution. Boys underperform relative to girls when the focus is on language skills, particularly reading, classroom behavior, or grades (see Buchman et al. 2008 for a review). It is less a matter of the education system failing one gender relative to the other and more a matter of the system failing to produce gender equality in achievement.

This is a concern because any gender disparity in achievement—particularly when it emerges early—is likely to be perpetuated and spill over into other educational outcomes. There are suggestions, for example, that girls’ underperformance at the upper extreme of the mathematical achievement distribution is linked to them being less likely to (i) enroll in advanced math and science classes in high school, (ii) complete science and technology degrees in university, and (iii) subsequently be employed in technology-related occupations such as engineering or computer science (see Penner and Paret 2008; Nollenberger et al. 2016; Lavy and Sand 2015). Boys’ weaker literacy skills and poorer classroom behavior (at least as assessed by teachers) have been linked to their higher retention rate (Entwisle et al. 2007), while increased retention, in combination with more disciplinary incidents and lower grades, are thought to explain much of the relative gap in young men’s propensity to attend college (Jacob 2002). Finally, there are concerns that teachers’ biases in the way they teach, direct their attention, or assess performance all have the potential to amplify any gaps in objectively measured achievement (see DiPrete and Jennings 2012; Cornwell et al. 2013; Lavy and Sand 2015).

The objective of this paper is to contribute to a deeper understanding of gender disparities in achievement by analyzing the gender gap in third-grade test scores in numeracy and reading. In particular, our estimation relies on unusually rich panel data from the Longitudinal Study of Australian Children (LSAC) which was specifically designed to provide an in-depth understanding of child development. The LSAC data have the important advantage of being able to be linked to each child’s results on the national, standardized achievement tests that all Australian children take biennially. Moreover, information on parents’ expectations for and investments in their children as well as each child’s school readiness allows us to distinguish between gender gaps that exist before children arrive at school from those that emerge after. We begin by using quantile regression to demonstrate that the magnitude of the gender gap in literacy and numeracy test scores differs between low and high achievers in ways that are related to students’ social and economic circumstances. Borrowing from the literature on gender wage differentials, we then adopt an Oaxaca-Blinder (OB) approach to decomposing the gender gap in average standardized numeracy and reading test scores into their various components, paying particular attention to how these factors vary with children’s socio-economic status (Blinder 1973; Oaxaca 1973). This approach allows us to shed light on the cumulative importance of the factors underpinning the achievement gap—many of which may be individually insignificant—providing a useful indication of the potential causes and policy responses to be explored in more detail (Fortin et al. 2011).

We make three important contributions to the literature. First, we utilize a standard OB approach to decompose gender gaps in numeracy and reading achievement into two components: one due to endowment effects (i.e., the different characteristics of boys and girls) and one due to differential responses (i.e., the differences in outcomes for boys and girls with the same characteristics).Footnote 1 OB decomposition methods have been fundamental to deepening our understanding of gender gaps in labor market outcomes, particularly wages, for more than 40 years, but to date have only rarely been applied to the study of gender gaps in educational achievement.Footnote 2 Although it is common for researchers to analyze either those outcomes favoring boys or those outcomes favoring girls (but not both simultaneously),Footnote 3 we consider both using a unified framework in order to gain a more nuanced understanding of the process through which gender disparity in achievement arises. Second, while some researchers find that gender gaps in test scores are more pronounced among disadvantaged children (e.g., Entwisle et al. 2007), others argue that the gender gap in achievement is more pervasive (e.g., Fryer and Levitt 2010). We add new evidence to this debate, which to date has largely been based on US data, by documenting the magnitude of the achievement gap across the distribution and investigating the link between socio-economic status and gender inequality in achievement in the context of Australia. Finally, we look at primary-school achievement using an objective, standardized achievement test. Although much of the empirical literature focuses on achievement gaps in secondary or post-secondary outcomes (Cornwell et al. 2013), new evidence is emerging that there are gender gaps in test scores as early as kindergarten (e.g., Penner and Paret 2008; Husain and Millimet 2009; Fryer and Levitt 2010; DiPrete and Jennings 2012; Cornwell et al. 2013). This disparity in early achievement is particularly worrying because the cumulative nature of the learning process has the potential to compound any gaps in achievement over time.

We find that girls in low- and middle-socio-economic status (SES) families have an advantage in reading, while boys in high-SES families have an advantage in numeracy. Girls score higher on their third-grade reading tests in large part because they were more ready for school and had better teacher-assessed literacy skills in kindergarten. Boys’ advantage in numeracy occurs because they achieve higher numeracy test scores than girls with the same education-related characteristics.

In Section 2, we briefly review the vast literature on gender gaps in early educational achievement paying particular attention to the potential role of socio-economic status. Details of our data, estimation sample, and key measures are presented in Section 3, while the magnitude of the gender gap in reading and numeracy is discussed in Section 4. The results of our decomposition analysis highlighting the source of the gender gap in achievement can be found in Section 5, while our conclusions and suggestions for future research are discussed in Section 6.

2 Literature review

There is a vast literature demonstrating that children’s educational achievement varies with their gender. Gaps in boys’ and girls’ achievement do not always exist of course, but when they do, they are in one sense remarkably easy to summarize: boys do better in numeracy and girls do better in literacy (see OECD (2015); Lavy and Sand (2015) for recent and particularly helpful reviews). In another sense, this characterization is vastly over-simplified. There is a striking lack of uniformity in the achievement gap. The relationship between gender and relative educational achievement varies with the social, cultural, and educational context for example (Pope and Sydnor 2010; Nollenberger et al. 2016; Lavy and Sand 2015), opening the possibility that each might play a role in generating the gap. Achievement gaps also vary with students’ race and ethnicity (Penner and Paret 2008; Husain and Millimet 2009), with their families’ and peers’ socio-economic status (Entwisle et al. 2007; Legewie and DiPrete 2012) as well as across the achievement distribution itself (e.g., Penner and Paret 2008).

A number of explanations have been proposed for the disparity in boys’ and girls’ educational achievement. These include (i) biological differences, particularly in spatial vs. verbal skills (e.g., Levine et al. 2005); (ii) parents’ gender-specific expectations for and investments in their children (e.g., Baker and Milligan 2016; Bertrand and Pan 2013); (iii) social and cultural influences (e.g., Guiso et al. 2008; Nollenberger et al. 2016); (iv) gender differences in the acquisition of social and behavioral skills (e.g., DiPrete and Jennings 2012); and (v) gender-specific educational practices, including teacher bias (e.g., Dee 2007; Gibbons and Chevalier 2008; Legewie and DiPrete 2012; Cornwell et al. 2013; Lavy and Sand 2015).

Despite the multitude of explanations put forward for the gender gap in educational achievement, it is fair to say that the literature has been better at documenting its existence than explaining its source. There is mixed empirical support for many plausible explanations of the gender gap and little to no support for others. Fryer and Levitt (2010), for example, find the mathematics gender gap is largest among children who attend private school, have highly educated mothers, and have mothers working in math-related occupations—precisely the groups for whom we might expect the gap to be the smallest. Similarly, Dee (2007) and Holmlund and Sund (2008) reach different conclusions about the potential for more male reading teachers along with more female math and science teachers to close gender gaps in these subjects, while Lavy and Sand (2015) and Gibbons and Chevalier (2008) disagree about the importance of teacher bias in gender inequalities in achievement. Parents do appear to make gender-specific investments in their children, yet this seems to contribute to the gender gap in some outcomes (e.g., preschool math and reading scores) (Baker and Milligan 2016) but not others (e.g., disruptive behavior) (Bertrand and Pan 2013).

These mixed messages about the mechanisms underpinning the achievement gap are perhaps not surprising. Generating credible scientific evidence on the issue is difficult because it is often nearly impossible to disentangle particular pathways (e.g., biological from environmental conditions) or to measure concepts like stereotypes and prejudices and test their empirical predictions (Lavy and Sand 2015). A large part of the challenge lies in finding explanations for the gender gap in achievement that are nuanced enough to account for heterogeneity in the relationship between gender and achievement across: (i) domains (i.e., numeracy vs. literacy); (ii) the achievement distribution; or (iii) characteristics like age, race, and socio-economic status. Levine et al. (2005), for example, argue that biological explanations of boys’ advantage in spatial skills, at least as currently formulated, would not predict the systematic variation across socio-economic status that we observe.

The bottom line is that inconsistency in the pattern of achievement gaps across groups or contexts makes it unlikely that one unified theory will ever provide a compelling explanation for the overall relationship between gender and educational achievement. At the same time, variation of the sort described above can be exploited to rule some mechanisms into the possibility set and others out. There is little doubt, for example, that educational practices are often gendered, but this is unlikely to provide an explanation for achievement gaps that emerge in preschoolers. Similarly, if biological factors cannot explain racial differences in gender achievement gaps, then it seems reasonable to turn to social and cultural explanations (Penner and Paret 2008). In our view, the heterogeneity in achievement gaps across domains and across socio-economic status offers particularly promising avenues to explore. The mathematics curriculum is highly structured in comparison with other subjects like English (see Riegle-Crumb 2006), and there is evidence that math test scores may be more sensitive to principals’ and teachers’ actions than are English test scores (see Clark et al. 2009; Rivkin et al. 2005). Given this, it is possible that the relative importance of families and schools in shaping gender achievement gaps may depend on whether our focus is numeracy or literacy. Moreover, the interaction of socio-economic status with educational achievement points to the salience of family background, resource constraints, parental and school investments, and the like in explaining the gender gap in educational achievement.

Our goal is to contribute to a more nuanced understanding of gender inequality in educational achievement by investigating the extent to which these factors can explain the gender gap in students’ numeracy and reading test scores.

3 Data

Our data come from the Longitudinal Study of Australian Children (LSAC) which is a national study designed to provide an in-depth understanding of children’s development. The study commenced in 2004 with the recruitment of two cohorts: one cohort of 5107 children aged 0–1 years old (the birth or “B cohort”) and another of 4983 children aged 4–5 years old (the kindergarten or “K cohort”) and their families across all states and territories of Australia. Interviews have subsequently been conducted with families every 2 years (see Soloff et al. 2005 for details).

3.1 Educational achievement measures

The LSAC data can be linked to standardized test scores from the National Assessment Program-Literacy and Numeracy (NAPLAN) which assesses all Australian students in grades 3, 5, 7, and 9 in reading, writing, language conventions (spelling, grammar, and punctuation), and numeracy using a common test administered nationwide on the same day. NAPLAN has been conducted annually since 2008. The reporting scales range from 0 to 1000 and are constructed so that scores can be compared across school grades and over time. For example, a score of 500 in third-grade reading in 2008 means the same as a score of 500 for fifth-grade reading in 2009. Each single-year grade progression represents an increase of approximately 40 points on the scale (or 80 points between consecutive NAPLAN testing grades).

The LSAC data include school achievement measures based on standardized national test scores, allowing us to extend the international literature on student achievement. Much of the recent US evidence relies on Early Childhood Longitudinal Study-Kindergarten (ECLS-K) data which measure achievement through a direct assessment of children’s cognitive development by the interviewer. Other data sources that include standardized test scores, e.g., the Programme for International Student Assessment (PISA), have far less detailed information on children’s characteristics and do not include any data on children’s development, parental expectations, or teachers’ assessments, for example. The unique combination of standardized test scores and detailed information about children is extremely useful in shedding new light on the source of the gender gap in achievement.

3.2 Socio-economic status

To assess the relationship between gender inequality in educational achievement and socio-economic status, we categorize students as having low, medium, or high socio-economic status (SES). Specifically, the LSAC data include a measure of socio-economic position which is constructed from the standardized scores from three components: (i) income (standardized average income of both primary caregivers), (ii) educational attainment (standardized years of education for both primary caregivers), and (iii) occupational prestige (Jones and McMillan’s 2001 standardized status scale for both primary caregivers). These three components are then averaged and normalized to have a mean of 0 and a standard deviation of 1. This combined measure—based upon that designed by Willms and Shields (1996) for the National Longitudinal Survey of Children (NLSC)—provides a robust, parsimonious, and continuous measure of socio-economic position. Such a broad, multidimensional notion of family disadvantage is often preferable to more traditional measures based on low income alone (Corak 2006; Heckman 2011; Kautz et al. 2014). Children in the bottom third of the distribution are categorized as having low SES, while those in the middle and top thirds are categorized as having medium and high SES, respectively.Footnote 4
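The construction of the SES groups described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the equal weighting of the three components, the use of the population standard deviation, and the six hypothetical families are ours and are not taken from LSAC or from the authors' code.

```python
import statistics

def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def ses_groups(income, education, prestige):
    """Return (index, labels): a normalized composite and low/medium/high thirds."""
    components = [zscore(income), zscore(education), zscore(prestige)]
    # Average the three standardized components per family, then re-normalize.
    composite = zscore([statistics.fmean(row) for row in zip(*components)])
    # With n=3, statistics.quantiles returns the two tercile cut points.
    lower, upper = statistics.quantiles(composite, n=3)
    labels = ["low" if s <= lower else "medium" if s <= upper else "high"
              for s in composite]
    return composite, labels

# Hypothetical illustrative data for six families (not LSAC values).
income = [30, 45, 60, 80, 100, 150]
education = [10, 12, 12, 15, 16, 18]
prestige = [20, 35, 40, 55, 70, 85]
index, labels = ses_groups(income, education, prestige)
print(labels)
```

Working with a continuous index first and cutting it into thirds afterwards keeps the grouping rule separate from the measurement of socio-economic position, so alternative cut points can be tried without rebuilding the index.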

3.3 Controls

The LSAC data are extremely detailed, allowing us to account for children’s characteristics, behavior, family backgrounds, and home as well as classroom environments. Importantly, unlike the ECLS-K data, LSAC provides information for fathers as well as mothers. Our detailed controls give us the opportunity to simultaneously investigate the contribution of different mechanisms to the gender gap in educational achievement.

We account for measures of school readiness in order to assess whether gender differences in learning exist prior to school entry. Specifically, we control for each child’s age four “Who am I?” (WAI) score normalized to have a mean of 0 and a standard deviation of 1. The WAI score provides a direct assessment of school readiness, i.e., the cognitive processes underlying the acquisition of early literacy and numeracy skills such as: pre-writing skills (ability to copy shapes, letters, and words), pre-literacy skills (recognition of letters and sounds), and pre-numeracy skills (recognition of numbers and ability to count) (see de Lemos and Doig 1999). It has been previously used by researchers to assess how school readiness varies with characteristics such as indigenous status (Leigh and Gong 2009) and handedness (Johnston et al. 2009).

We also control for children’s reasoning ability using a subtest from the Wechsler Intelligence Scale for Children (WISC), 4th Edition (Wechsler 2003), a standardized, reliable, and widely used measure of children’s intelligence.Footnote 5 In particular, our subtest provides a unique assessment of abstract, nonverbal intelligence. Each child is presented with a sequence or group of designs, and then is required to fill in the missing design from a number of choices.

Parental investments may vary with children’s gender. Alternatively, the same investment may have a differential effect on boys’ and girls’ academic achievement. For both reasons, we control for the level of parental investment using a range of measures including: the number of age-appropriate books in the home (Wößmann 2003); the frequency at which the child is being read to by an adult (Leibowitz 1977; Hill and O’Neill 1994); parents’ involvement in children’s daily activities (sharing meals, brushing teeth) (Amato and Rivera 1999); and parents’ help with homework. Gender gaps in early achievement may also originate in deeply rooted societal or cultural expectations about gender roles. Consequently, we also account for mothers’ expectations about their children’s educational attainment and mothers’ labor force status (Fan et al. 2015).

Previous researchers have argued that gendered educational practices, including teacher bias, can also contribute to the gender gap. Consequently, in addition to the child’s school readiness, we also control for indicators of school type as well as for the teacher’s absolute and relative assessment of each child’s achievement level in reading and math. Our absolute achievement measure is based on teachers’ evaluation of how well each child performs with respect to a number of numeracy and literacy skills. This measure has been widely used in the previous literature (Cornwell et al. 2013; Robinson and Lubienski 2011). Teachers also report how well the child is doing in reading and math relative to his peers in the classroom. Relative achievement is often ignored in the literature (Samson and Lesaux 2008) but is likely to be important in light of the emerging evidence that relative ability has important implications for educational achievement over and above that associated with absolute ability (Elsner and Isphording 2015). These controls allow us to assess how teachers’ perceptions of the gender gap in achievement are linked to the actual achievement gap on standardized tests.

Boys’ and girls’ classroom behavior differs in ways that are related to their academic achievement (Bertrand and Pan 2013). We therefore investigate whether gender differences in a measure of children’s antisocial behavior, hyperactivity/inattention, emotional symptoms, peer relationship problems, and conduct problems contribute to differences in academic achievement in third grade.Footnote 6 Finally, we also control for children’s demographic characteristics (e.g., birth weight, age at test, and indigenous status), family background (e.g., household type and parental education), preschool attendance, and whether the teacher has completed the questionnaire (see Appendix for more details on control variables).

Descriptive statistics for all control variables by SES are provided in Table 1 (see Appendix 1 for more details on control variables).Footnote 7 Interestingly, gender differences in learning exist prior to school entry: irrespective of SES, girls score better on the WAI school readiness assessment given at age 4, suggesting that they already have better pre-school literacy and numeracy skills. In contrast, we find no gender difference in children’s reasoning ability (WISC).

Table 1 Summary statistics for control variables

Parental investments do not differ substantially with children’s gender. Boys and girls have the same number of age-appropriate books in the home, are read to by an adult at the same frequency, and are given the same amount of parental help with homework. Still, girls in low- and medium-SES families experience more parental involvement in their daily activities (sharing meals, brushing teeth). Mothers have higher expectations for their daughters’ educational attainment than they do for their sons’. In comparison with their brothers, girls are more likely to be expected to complete a university degree and less likely to complete a vocational degree. These gender gaps in parental expectations are larger in low- and medium-SES families than in high-SES families.

The schooling environment differs for boys and girls in ways that depend on families’ social and economic background. Girls in low-SES families are more likely to attend Catholic schools than are boys, while medium-SES girls attend government schools more frequently than do boys. Teacher assessments are also gendered. Teachers assess high-SES boys as having a relative advantage in numeracy and girls as having an advantage in literacy, both in absolute and relative terms, irrespective of their families’ SES. Consistent with the literature, boys are reported to have more behavioral problems than are girls. Antisocial behavior and hyperactivity/inattention are more common for boys irrespective of their SES background, while peer relationship and conduct problems are concentrated at the lower end of the SES spectrum. In contrast, emotional symptoms (nervousness, worry, headaches, stomach aches, etc.) are more prevalent for girls at the higher end. Finally, we also find the usual gender differences in demographic characteristics, with boys being born on average 100 g heavier and being 1 month older at the time of test taking because they are more likely to be delayed in school entry.

Decomposition results from our preferred specification, which includes these controls, are discussed in detail in Section 5.2 below. In addition, we conduct a number of robustness checks to determine how sensitive our results are to the inclusion of a range of other measures discussed in the literature, e.g., parenting style, approach to learning, and parental background. These additional results can be found in Section 5.5.

3.4 Estimation sample

Given our interest in early achievement gaps, we focus on NAPLAN test scores in third grade when standardized testing begins in Australia. Unfortunately, 23 % of third-grade test scores are missing for the kindergarten cohort (cohort K) because many of these children were enrolled in third grade in 2007 before NAPLAN tests were introduced. As a result, our analysis centers on children from the birth cohort (cohort B).

Because of different school starting ages across states and some parents’ decisions to delay their children’s school entry, children born in the same year (i.e., in the same LSAC cohort wave) may be enrolled in one of three different sequential grades and therefore take the third-grade NAPLAN test in one of three different calendar years (2010, 2011, or 2012). Moreover, the timing of LSAC interviews (from March to December every 2 years) differs from that of NAPLAN (in May each year). Both pose challenges in establishing a correspondence between NAPLAN scores and the information collected in LSAC. More specifically, some children will have taken the third-grade NAPLAN test before the wave 5 interviews, others after. To avoid explaining the gender gap in test scores with controls measured after the test, we use wave 4 data (when children were 6–7 years old), i.e., prior to any child taking the third-grade NAPLAN test, when constructing our controls.

We necessarily make a number of sample restrictions. First, we restrict our sample to the 67 % of cohort B children for whom we have third-grade test scores.Footnote 8 Second, we drop 4 % of the initial sample for whom we do not have the WAI scores at age 4 or the WISC score at age 6 (226 observations). Third, we drop 139 children for whom we have missing information on the following variables: socio-economic position (used to define the indicators of SES), birth weight, mother’s education, household type, number of books, reading to children, help with homework, mother’s involvement, SDQ, school type, and teachers’ assessments. In the case of all other variables, we retain as much sample as possible by recoding any missing observations to zero and including an indicator variable in the model to control for this recoding.Footnote 9 This results in an estimation sample of 3073 children (i.e., 60 % of the B cohort at wave 1).Footnote 10
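The treatment of missing values in the remaining controls (recode to zero and add an indicator) can be illustrated as follows. The helper name and the example values are hypothetical, and the sketch assumes missing responses are represented by `None`.

```python
def recode_missing(values):
    """Replace missing entries (None) with 0 and return a parallel 0/1 missing flag."""
    recoded = [0.0 if v is None else float(v) for v in values]
    flag = [1 if v is None else 0 for v in values]
    return recoded, flag

# Hypothetical control variable with one missing response.
x = [3.0, None, 5.0]
x_recoded, x_missing = recode_missing(x)
print(x_recoded, x_missing)
```

Entering both the recoded variable and its flag in a regression lets the flag absorb the average outcome of children with a missing response, so the observation is kept without the arbitrary zero driving the coefficient on the control itself.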

4 The magnitude of the gender achievement gap

The boys in our sample score on average 9 points lower in reading and 12 points higher in numeracy on their third-grade NAPLAN assessments than do their female classmates (see Fig. 1). Each of these disparities translates into approximately 3 months of normal academic progression.Footnote 11 Importantly, the gender gap in reading is evident only for children in families at the bottom and middle of the socio-economic distribution with low-SES boys, for example, having a reading level in third grade that is five academic months behind that of low-SES girls. At the same time, boys’ advantage over girls in numeracy exists only among advantaged children. In particular, high-SES boys are nearly six academic months ahead of high-SES girls in numeracy.
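As a rough check on the order of magnitude, point gaps can be converted into months using the approximate progression of 40 points per grade year reported in Section 3.1. The sketch below assumes a 12-month grade year and a linear conversion; the authors' own conversion may differ, and the function name is ours.

```python
POINTS_PER_GRADE_YEAR = 40  # approximate progression stated in Section 3.1

def points_to_months(points, points_per_year=POINTS_PER_GRADE_YEAR):
    """Convert a NAPLAN point gap into months, assuming a 12-month grade year."""
    return 12 * points / points_per_year

reading_gap_months = points_to_months(9)    # boys' average deficit in reading
numeracy_gap_months = points_to_months(12)  # boys' average advantage in numeracy
print(round(reading_gap_months, 1), round(numeracy_gap_months, 1))
```

Under these assumptions both average gaps fall in the range of roughly 3 to 4 months of progression.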

Fig. 1 Mean NAPLAN reading and numeracy test scores by gender and SES

These results are consistent with previous research highlighting the relationship between gender disparities in educational achievement and socio-economic status. Entwisle et al. (2007), for example, find that the reading skills of boys who are receiving meal subsidies are lower than those of girls, while among students who do not receive meal subsidies, gender makes little difference in reading levels. Similarly, Penner and Paret (2008) find that boys’ advantage in mathematics is most pronounced among students whose parents have a college or advanced degree, while Levine et al. (2005) demonstrate that boys outperform girls on spatial tasks in middle- and high-income schools but not in low-income schools.

Gender gaps in mean achievement often obscure a great deal of heterogeneity in the performance of different students, and it can often be particularly useful to know whether achievement gaps exist among high achievers, among students who struggle, or across the entire distribution. We investigate this issue by estimating simultaneous conditional quantile regressions of third-grade test scores \(\left(T_{i}^{j}\right)\) on an indicator of students’ gender. Specifically,

$$ T_{i}^{j}=\alpha_{0}^{j\tau }+\alpha_{1}^{j\tau }M_{i}+\varepsilon_{i}^{j\tau} \tag{1} $$

where i indexes individuals, j indexes subject areas (i.e., reading and numeracy), τ denotes the respective quantile of the test score distribution, and M is the indicator variable taking the value 1 for boys and 0 for girls. Equation 1 is estimated simultaneously at every decile of the test-score distribution and the estimated coefficients (along with a 95 % confidence interval) are presented graphically in Fig. 2.Footnote 12 As we condition only on students’ gender, the estimates obtained from these conditional quantile regressions capture the unconditional (raw) gender gap in educational achievement at different points of the achievement distribution.
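Because Eq. 1 contains only an intercept and a gender indicator, the coefficient on the indicator at a given quantile corresponds to the difference between the boys' and the girls' test score at that quantile. The sketch below computes these raw decile gaps directly. It is an illustration only: the scores are hypothetical, the quantile convention is our choice (sample quantiles can differ slightly from a quantile-regression solver in finite samples), and no standard errors are computed.

```python
import statistics

def decile_gaps(scores_boys, scores_girls):
    """Boys-minus-girls difference at each decile (tau = 0.1, ..., 0.9)."""
    q_boys = statistics.quantiles(scores_boys, n=10, method="inclusive")
    q_girls = statistics.quantiles(scores_girls, n=10, method="inclusive")
    return [b - g for b, g in zip(q_boys, q_girls)]

# Hypothetical scores: boys are more spread out than girls around the same centre.
girls = [380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480]
boys = [360, 374, 388, 402, 416, 430, 444, 458, 472, 486, 500]
gaps = decile_gaps(boys, girls)
print([round(g, 1) for g in gaps])
```

In this constructed example the gap is negative at the bottom of the distribution, zero at the median, and positive at the top, which is the kind of pattern a single mean comparison would hide.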

Fig. 2 The overall gender gap across the test score distribution. Notes: the black line represents the estimated coefficients from conditional quantile regressions which are estimated simultaneously at every decile of the test-score distribution. The grey area represents the 95 % confidence interval on these estimates

The equality of the gender gap in test scores across the entire test-score distribution is strongly rejected.Footnote 13 Girls have a relative advantage in reading over much of the distribution (though not among the top third of achievers), while boys’ stronger performance in numeracy is most evident in the top half of the achievement distribution (see Fig. 2). In contrast, Penner and Paret (2008) find that by third grade, boys in the US outperform girls in mathematics throughout nearly the entire distribution.

Finally, estimating quantile regressions separately for children from low-, medium-, and high-socio-economic backgrounds highlights the strong link between the distribution of achievement gaps and socio-economic status.Footnote 14 High-SES boys outperform high-SES girls throughout the entire numeracy distribution (Fig. 3), while low- and medium-SES girls have an achievement advantage in the bottom half to two thirds of the reading distribution (Fig. 4). In all other cases, there is little evidence of systematic gender gaps in achievement.

Fig. 3 The gender gap in numeracy by SES across the test score distribution. Notes: the black line represents the estimated coefficients from conditional quantile regressions which are estimated simultaneously at every decile of the test-score distribution. The grey area represents the 95 % confidence interval on these estimates

Fig. 4 The gender gap in reading by SES across the test score distribution. Notes: the black line represents the estimated coefficients from conditional quantile regressions which are estimated simultaneously at every decile of the test-score distribution. The grey area represents the 95 % confidence interval on these estimates

5 The source of gender achievement gaps

5.1 Decomposition approach

Decomposition analysis has been at the center of efforts to understand the source of the gender gap in labor market outcomes, in particular wages, for nearly half a century. Knowing the relative importance of various factors in contributing to gender disparities in the labor market has been important in highlighting the potential opportunities for policy response. Our objective is to apply this approach to investigate the source of the gap in boys’ and girls’ test scores. In effect, we will separate the endowment effect (i.e., disparity in boys’ and girls’ school readiness, family background, age, noncognitive skills, etc.) from the response effect (i.e., disparity in the way endowments are translated into boys’ vs. girls’ achievement) which together lead to the overall achievement gap.

Specifically, the gender gap in third-grade test scores can be decomposed as follows:

$$ \bar{T}_{M}^{j}-\bar{T}_{F}^{j} = \bar{X}_{M}^{j}\hat{\beta}_{M}^{j}-\bar{X}_{F}^{j}\hat{\beta}_{F}^{j} = \left( \bar{X}_{M}^{j}-\bar{X}_{F}^{j} \right)\hat{\beta}_{P}^{j}+\left( \bar{X}_{M}^{j}\left( \hat{\beta}_{M}^{j}-\hat{\beta}_{P}^{j} \right)-\bar{X}_{F}^{j}\left( \hat{\beta}_{F}^{j}-\hat{\beta}_{P}^{j} \right) \right) $$
(2)

where \(\hat{\beta}_{M}^{j}\), \(\hat{\beta}_{F}^{j}\), and \(\hat{\beta}_{P}^{j}\) are the estimated coefficients from an OLS regression of test scores on the full set of covariates \(X^{j}\) using the male, female, and pooled samples, respectively (see Neumark 1988; Jann 2008). In effect, Eq. 2 allows the gender gap in achievement to be written in terms of boys’ and girls’ average endowments \(\left(\bar{X}\right)\) and the response functions \(\left(\hat{\beta}\right)\) which map those endowments into test scores. Gendered response effects are compared with a gender-neutral benchmark which, following Neumark (1988), is constructed using estimates from the pooled sample.Footnote 15 Thus, the first term on the right-hand side of Eq. 2 captures the endowment effect, i.e., the part of the gender gap in test scores that arises because boys and girls have different endowments of the things (characteristics) that support good educational outcomes. These differences in boys’ and girls’ average endowments \(\left(\bar{X}\right)\) are evaluated (weighted) using the vector of gender-neutral responses \(\left(\hat{\beta}_{P}^{j}\right)\). Thus, this term captures gender differences in what we will refer to as “educational endowments.” The second right-hand side term captures the response effect, i.e., the part of the gender gap that arises because children’s endowments do not get translated into test scores in a gender-neutral way. The response effect is itself composed of two components: first, the gap in test scores due to the deviation in boys’ responses from the gender-neutral benchmark \(\left(\bar{X}_{M}^{j}\left(\hat{\beta}_{M}^{j}-\hat{\beta}_{P}^{j}\right)\right)\) and, second, the gap in test scores due to the fact that girls’ response function is also not gender neutral, i.e., \(\left(\bar{X}_{F}^{j}\left(\hat{\beta}_{F}^{j}-\hat{\beta}_{P}^{j}\right)\right)\). The total response effect, which we will refer to as “educational responses,” is equal to the sum of these two components.Footnote 16
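The aggregate decomposition can be sketched in a few lines of Python. This is a minimal illustration rather than the estimation code behind our results. The function names (`solve`, `ols`, `decompose`) and the toy data are hypothetical, the covariate matrix is assumed to contain a leading column of ones, and the pooled regression is assumed to omit a gender indicator.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def col_means(X):
    return [sum(col) / len(X) for col in zip(*X)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decompose(X_m, y_m, X_f, y_f):
    """Return (endowment effect, response effect) for the male-female gap."""
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    b_p = ols(X_m + X_f, y_m + y_f)  # pooled sample, gender-neutral benchmark
    xbar_m, xbar_f = col_means(X_m), col_means(X_f)
    endowment = dot([m - f for m, f in zip(xbar_m, xbar_f)], b_p)
    response = (dot(xbar_m, [b - p for b, p in zip(b_m, b_p)])
                - dot(xbar_f, [b - p for b, p in zip(b_f, b_p)]))
    return endowment, response


# Toy data (hypothetical): columns are [intercept, school readiness].
X_m = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y_m = [0.2, 0.5, 0.9, 1.2]
X_f = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y_f = [0.4, 0.8, 1.0, 1.5]

endowment, response = decompose(X_m, y_m, X_f, y_f)
raw_gap = sum(y_m) / len(y_m) - sum(y_f) / len(y_f)
print(endowment, response, raw_gap)
```

Because each regression includes an intercept, the two components add up to the raw gap in mean scores, which is a useful check on any implementation.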

In addition to the aggregate decomposition shown above, it is also possible to consider a detailed decomposition in which the contribution of each individual factor to the overall gender gap in achievement is isolated. Previous researchers have noted, however, that in detailed decompositions, the response effects for categorical variables will depend on the choice of the omitted category in the underlying regression model (e.g., Jones 1983; Oaxaca and Ransom 1999; Jann 2008; Fortin et al. 2011). While the literature occasionally refers to this as an identification issue, others argue that it is a conceptual problem in interpretation (see Fortin et al. 2011). Our interest is in investigating the importance of overarching concepts (e.g., parental education or family structure) rather than heterogeneity between specific groups (e.g., high-school dropouts vs. college graduates; single- vs. couple-headed families, etc.) making these interpretation issues less challenging for us. Thus, we do not adopt an estimation approach that would be invariant to the choice of the base category (Jann 2008) but would also be more difficult to interpret (Fortin et al. 2011). Instead, we allow our detailed response effects to depend on the base category we have chosen and, where relevant, interpret them in this light.Footnote 17
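The dependence of detailed response effects on the omitted category can be illustrated with a small sketch. It assumes a single categorical covariate (school type), in which case the OLS dummy coefficients reduce to differences in cell means. All shares, mean scores, and group sizes below are hypothetical values chosen for illustration. They are not estimates from the LSAC data.

```python
CATEGORIES = ["public", "catholic", "independent"]

# Hypothetical inputs: the share of each gender in each school type and the
# mean standardized score within each gender-by-school-type cell.
SHARE = {
    "M": {"public": 0.6, "catholic": 0.2, "independent": 0.2},
    "F": {"public": 0.6, "catholic": 0.2, "independent": 0.2},
}
MEAN = {
    "M": {"public": 0.30, "catholic": 0.35, "independent": 0.45},
    "F": {"public": 0.40, "catholic": 0.20, "independent": 0.25},
}
N = {"M": 100, "F": 100}  # hypothetical group sizes


def pooled_cell_means():
    """Mean score in each school type when boys and girls are pooled."""
    pooled = {}
    for c in CATEGORIES:
        w_m, w_f = N["M"] * SHARE["M"][c], N["F"] * SHARE["F"][c]
        pooled[c] = (w_m * MEAN["M"][c] + w_f * MEAN["F"][c]) / (w_m + w_f)
    return pooled


def dummy_coefficients(cell_means, base):
    """OLS on a full set of dummies with `base` omitted: the intercept is the
    base cell mean and each dummy coefficient is a cell mean minus that mean."""
    intercept = cell_means[base]
    slopes = {c: cell_means[c] - intercept for c in CATEGORIES if c != base}
    return intercept, slopes


def response_effects(base):
    """Split the response effect into its intercept and school-type parts."""
    a_m, b_m = dummy_coefficients(MEAN["M"], base)
    a_f, b_f = dummy_coefficients(MEAN["F"], base)
    a_p, b_p = dummy_coefficients(pooled_cell_means(), base)
    intercept_part = (a_m - a_p) - (a_f - a_p)
    school_part = sum(
        SHARE["M"][c] * (b_m[c] - b_p[c]) - SHARE["F"][c] * (b_f[c] - b_p[c])
        for c in b_p
    )
    return intercept_part, school_part


for base in CATEGORIES:
    intercept_part, school_part = response_effects(base)
    print(base, round(school_part, 3), round(intercept_part + school_part, 3))
```

In this sketch the response effect attributed to school type changes with the base category, while the total response effect (intercept plus school type) does not. This is why the aggregate decomposition is unaffected by the choice of base category.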

In what follows, we focus on the magnitude of the aggregate educational endowments vs. educational responses components of the gap as well as on the contribution of individual factors (e.g., school readiness, demographic characteristics, etc.) separately. Both are instrumental in highlighting the source of the gender gap in educational achievement.

5.2 The achievement gap in reading

The results of our decomposition analysis of reading achievement are presented in Table 2 separately by students’ socio-economic status.Footnote 18 The top panel shows boys’ and girls’ average standardized test scores and the magnitude of the gender gap in students’ standardized reading scores. The share of the reading gap attributable to differences in boys’ and girls’ educational endowments is presented in columns 1, 3, and 5 of the bottom panel of Table 2, while columns 2, 4, and 6 of the bottom panel show the educational responses.

Table 2 Oaxaca decomposition results for NAPLAN reading test scores by SES

There is no gender gap in reading among advantaged children. High-SES boys and girls perform equally well. Girls in low- and medium-SES families, however, score significantly higher on the NAPLAN reading test than do boys from the same socio-economic background. Differences in boys’ and girls’ educational endowments account for 68.2 % (0.116 std.) of the 0.170-std. gender gap in disadvantaged children’s reading achievement. Similarly, the gap in medium-SES boys’ reading achievement (0.159 std.) is more than explained by their relative educational endowments. In effect, boys’ underperformance in reading can be directly linked to the fact that, on average, they have less of the things that tend to be associated with reading achievement.
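As an arithmetic check on the low-SES figures, and assuming that the two components of Eq. 2 sum to the reported gap (magnitudes only, with signs suppressed as in the text):

$$ \frac{0.116}{0.170}\approx 0.682, \qquad 0.170-0.116=0.054 $$

On this reading, roughly 0.054 std. of the low-SES reading gap would be attributable to educational responses. This remainder is implied by the additive identity rather than quoted from Table 2.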

Disparities in two educational endowments appear to be especially important. First, girls’ age 4 “WAI” scores are higher, suggesting that they were more ready for school when they started. Second, girls’ literacy skills at age 6 were rated by their teachers as being higher than those of their male classmates. Together, these two factors account for 0.113 std. of the overall 0.170 std. in the third-grade reading deficit among low-SES boys and for 0.144 std. of the overall 0.159-std. gap in medium-SES boys’ reading test scores. Finally, medium-SES parents report that their sons have significantly more social, emotional, psychological, and behavioral problems, while low-SES parents have much lower expectations for their sons’ educational attainment than they do for their daughters.Footnote 19 Both contribute to boys’ under-performance in reading.

Interestingly, a relative lack of school readiness and literacy skills in kindergarten also negatively affects high-SES boys’ relative reading achievement. Specifically, these two factors result in a 0.120-std. deficit in advantaged boys’ reading relative to girls. Unlike their less-advantaged peers, however, high-SES boys overcome this deficit in educational endowments because their reading scores respond more positively to the endowments they do have. This leaves their overall reading achievement similar to that of high-SES girls.

Taken together, these results confirm previous findings in the literature demonstrating the importance of environmental factors, including home environments and expectations, in understanding gendered educational outcomes. Family disadvantage appears to disproportionately impede the pre-market development of boys (Autor et al. 2016). Boys’ noncognitive development, for example, is particularly responsive to the relative lack of parental investments associated with growing up in a single-parent family (Bertrand and Pan 2013).

It is also important to draw attention to several things which do not contribute to the disparity in boys’ reading achievement. Specifically, parents are more likely to delay the school start of their sons rather than their daughters (Brent et al. 1996; Buchman et al. 2008), implying that boys are slightly older than their female classmates when they sit the NAPLAN tests. This age difference improves boys’ relative reading performance. That is, if boys on average were the same age as girls when taking the NAPLAN test, our estimates indicate that the gender gap in reading would be approximately 0.020–0.030 std. larger than it in fact is.

Moreover, as children’s gender is exogenously assigned, there is little difference in the characteristics of the families in which boys and girls grow up. Thus, it is not surprising that there is little role for differences in family structure, parental education, indigenous status, etc. to generate gender differences in educational endowments that would contribute to the gender disparity in achievement. More surprising perhaps is the lack of a role for abstract, nonverbal intelligence in explaining the gender gap in literacy. Specifically, gender differences in reasoning ability (WISC) do not explain the gender gap in literacy. Similarly, the disparity in boys’ and girls’ third-grade reading achievement is unrelated to teachers’ assessments of their math skills at age 6.

Although most of the gender gap in reading stems from boys and girls having different endowments of the things associated with good educational outcomes, it is also the case that boys and girls with the same educational endowments (e.g., family structure, school readiness, age, etc.) do not achieve the same reading scores on average. Two results are particularly noteworthy. First, reading achievement is related to disadvantaged children’s birth weight in ways that are different for boys and girls (column 2 bottom panel). Specifically, NAPLAN reading scores are positively associated with low-SES boys’ birth weight, but negatively associated with low-SES girls’ birth weight (see Appendix 3). This contributes to reducing the gender gap in reading for children in disadvantaged families. That is, if reading achievement responded to boys’ and girls’ birth weight in the same way, we estimate that the gender gap in reading among low-SES children would be nearly half a standard deviation (0.474 std.) larger. A child’s birth weight is driven by a range of factors including in utero nutrition, family circumstances (SES), maternal behavior, genetics, etc. and is commonly used as an indicator of child health (e.g., Currie and Moretti 2007; Currie 2009; Almond and Currie 2011). Previous researchers have found that higher birth-weight infants achieve higher levels of educational attainment (see Chatterji et al. 2014 for a review).

Second, there are gender differences in the relationship between advantaged children’s reading achievement and the type of school they attend. The average reading achievement of high-SES girls attending either Catholic or independent schools is significantly lower (0.177 and 0.161 std., respectively) than that of high-SES girls attending public schools (see Appendix 3). In contrast, high-SES boys attending independent schools have higher reading scores (0.161 std.) on average than those high-SES boys attending public schools. Although advantaged boys and girls are equally likely to attend private schools (see Table 1), these gender differences in the way that reading achievement is related to (responds to) the type of school that a child attends make an important contribution in eliminating the gender gap in reading achievement among high-SES children. In effect, the reading achievement of high-SES boys would lag an additional 0.116 std. behind that of their female classmates if the gendered response of reading achievement to school type were eliminated.

Finally, high-SES girls’ reading advantage is reduced by gender differences in the response of reading achievement to general intelligence (i.e., WISC scores) and increased by gender differences in the response of reading achievement to: (i) fathers’ education and (ii) teacher-assessed math skills at age 6.

5.3 The achievement gap in numeracy

Decomposition results highlighting the source of the gender gap in numeracy scores are presented in Table 3. As before, the top panel shows the magnitude of the gender gap in students’ standardized numeracy scores. The bottom panel presents the share of the numeracy gap attributable to differences in boys’ and girls’ educational endowments in the odd-numbered columns and educational responses in the even-numbered columns.

Table 3 Oaxaca decomposition results for NAPLAN numeracy test scores by SES

In contrast to reading, the gender gap in numeracy achievement favors boys rather than girls and is concentrated at the top of the SES distribution. There is no statistical difference in the numeracy achievement of low- and medium-SES boys and girls. Boys in high-SES families, however, have third-grade numeracy scores that are 0.297 std. higher than high-SES girls. This gap is completely unexplained by differences in boys’ and girls’ educational endowments. After all, girls are more prepared for school when they start (as reflected in their age four WAI scores) and have better reading skills (as reported by kindergarten teachers). Both contribute to significantly reducing boys’ relative numeracy achievement (0.123 std. in total). Boys, on average, are heavier at birth, are slightly older at the test date, and have more numeracy skills at age 6 (as assessed by teachers), all of which contribute to raising their relative numeracy test scores. On balance, the effect of gendered educational endowments among high-SES children is small (0.051 std.), negative (i.e., favors girls), and statistically insignificant.

The advantage that high-SES boys have in numeracy relative to high-SES girls stems from gendered educational response effects, that is, from differences in the numeracy achievement of advantaged boys and girls with the same education-related characteristics. In particular, as with reading achievement, gender differences in the relationship between numeracy and the type of school high-SES children attend partly account for boys’ advantage in numeracy. The numeracy scores of advantaged girls attending Catholic schools are significantly lower (0.285 std.) than those of advantaged girls attending public schools (see Appendix 4). This results in a statistically significant response effect associated with school type. High-SES boys’ numeracy advantage would be 0.095 std. smaller in the absence of these gender disparities in the relationship between numeracy achievement and school type. In addition, high-SES boys gain much more from attending preschool in terms of numeracy achievement than do high-SES girls (0.436 std.). This result is surprising in light of other evidence that preschool attendance either favors girls’ relative educational achievement (Apps et al. 2013) or is gender neutral (Fitzpatrick 2008). All other response effects are statistically indistinguishable from zero. The cumulative effect is that high-SES boys have numeracy scores that are 0.348 std. higher than their female peers because they achieve better results than girls do with the same education-related characteristics.
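The high-SES numeracy figures reported in this subsection are consistent with the additive structure of Eq. 2. Taking the endowment effect as −0.051 std. (favoring girls) and the response effect as 0.348 std. (favoring boys), which is our reading of the signs in the text:

$$ \underbrace{-0.051}_{\text{endowments}}+\underbrace{0.348}_{\text{responses}}=0.297 $$

which matches the 0.297-std. gap in high-SES children’s numeracy scores.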

Like their more advantaged counterparts, low- and medium-SES boys also have a substantial numeracy advantage as a result of gendered educational response effects. Specifically, boys in low-SES (medium-SES) families have numeracy scores that are 0.191 std. (0.250 std.) higher than girls with the same socio-economic background. However, this is completely counterbalanced by the relative deficits in their educational endowments—most importantly school readiness, parents’ educational expectations, teacher-reported literacy skills and own behavior. The two effects work in opposite directions resulting in an insignificant gender gap in numeracy among low- and medium-SES children. In contrast, the relative deficit in endowments experienced by high-SES boys is much smaller, leaving them with an overall numeracy advantage.

We can only speculate about the reasons that boys achieve higher numeracy test scores than do girls with the same education-related characteristics. One important possibility is that we are simply failing to measure important drivers of children’s numeracy achievement. Our data do not permit us, for example, to explicitly control for spatial skills, and there is evidence that gender gaps in spatial skills are larger among more advantaged children (Levine et al. 2005). Alternatively, although we control for teachers’ assessments of children’s absolute and relative ability, we may be failing to completely account for the role of gendered educational practices in generating gender gaps in achievement. Comparisons of nonblind classroom exams with results from exams marked anonymously indicate that teacher bias favors boys in some contexts (Lavy and Sand 2015) and girls in others (Terrier 2016). Critically, teacher bias can have long-term consequences in improving achievement levels and promoting more advanced curriculum choices for whichever gender is favored (Lavy and Sand 2015; Terrier 2016).

5.4 Summary

Taken together, these results indicate that girls in low- and medium-SES families have an achievement advantage in reading, while boys in high-SES families have an achievement advantage in numeracy. In all other cases, boys and girls perform equally well on reading and numeracy achievement tests.

Girls in low- and medium-SES families score significantly higher on the third-grade NAPLAN reading test than do boys from the same socio-economic background in large part because they were more ready for school at age 4 and had higher teacher-assessed literacy skills at age 6. In contrast, high-SES boys’ advantage in numeracy occurs because they achieve higher numeracy test scores than girls with the same education-related characteristics. Most importantly, high-SES boys benefit more from preschool and do not face an achievement penalty associated with attending Catholic rather than public school.

5.5 Sensitivity analysis

We conduct a number of sensitivity tests in order to shed light on the relative importance of children’s school readiness (WAI score) and reasoning ability (WISC score) in understanding observed gender gaps in educational achievement. Moreover, the previous literature suggests that a wide range of factors, e.g., social and behavioral skills, approach to learning, parenting style, etc., may be important in shaping gender differences in test scores. In our preliminary analysis, however, we found that many of these factors were not significantly related to children’s achievement over and above the other controls in the model. Hence, we exclude them from our preferred specification and test the sensitivity of our conclusions to this choice. The results of all of our sensitivity analyses are summarized in Table 4 and discussed below.

Table 4 Robustness results: Oaxaca decompositions of reading and numeracy achievement using alternative models

Specification A of Table 4 reproduces the decomposition results from our preferred specification for comparative purposes. The next three specifications of Table 4 present aggregate Oaxaca-Blinder (OB) decompositions that drop (i) the WISC score, (ii) the WAI score, and (iii) both the WISC and WAI scores from the model. Interestingly, failing to account for children’s reasoning ability (specification B) has virtually no effect on the decomposition of reading and numeracy scores for high-SES children and only a modest effect on the decomposition of reading scores for low- and medium-SES children. In these cases, our substantive conclusions would be the same irrespective of whether or not we took children’s reasoning ability into account. At the same time, reasoning ability is somewhat more important in understanding the gender gap in low- and medium-SES children’s numeracy achievement. Specifically, failing to take reasoning ability into account exacerbates the estimated negative educational endowment and positive educational response effects that boys experience. In contrast, accounting for children’s school readiness through their WAI score (specification C) is fundamental to understanding gender gaps in third-grade reading and numeracy achievement across the SES spectrum. The advantage that girls have in terms of educational endowments is substantially understated if we ignore the effects of school readiness, leaving the response effects overstated as a result. This problem is compounded if we account for neither school readiness nor reasoning ability (see specification D).
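The mechanism behind this sensitivity can be seen in a deliberately stylized sketch. The data are hypothetical and constructed so that scores depend on school readiness in exactly the same way for boys and girls, with girls simply having more of it. The names `fit_line` and `decompose` are illustrative, and the only covariate is readiness.

```python
from statistics import mean


def fit_line(x, y):
    """Simple OLS of y on an intercept and x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


def decompose(x_m, y_m, x_f, y_f, include_readiness=True):
    """Endowment and response effects with or without the readiness covariate."""
    if include_readiness:
        a_m, s_m = fit_line(x_m, y_m)
        a_f, s_f = fit_line(x_f, y_f)
        a_p, s_p = fit_line(x_m + x_f, y_m + y_f)  # pooled benchmark
    else:
        # Intercept-only model: each intercept is a sample mean, slopes are zero.
        a_m, a_f, a_p = mean(y_m), mean(y_f), mean(y_m + y_f)
        s_m = s_f = s_p = 0.0
    xbar_m, xbar_f = mean(x_m), mean(x_f)
    endowment = (xbar_m - xbar_f) * s_p
    response = (((a_m - a_p) + xbar_m * (s_m - s_p))
                - ((a_f - a_p) + xbar_f * (s_f - s_p)))
    return endowment, response


# Hypothetical data: identical response function, higher readiness for girls.
readiness_m = [1.0, 2.0, 3.0, 4.0]
readiness_f = [2.0, 3.0, 4.0, 5.0]
score_m = [0.1 + 0.3 * r for r in readiness_m]
score_f = [0.1 + 0.3 * r for r in readiness_f]

full = decompose(readiness_m, score_m, readiness_f, score_f, True)
restricted = decompose(readiness_m, score_m, readiness_f, score_f, False)
print("with readiness:   ", full)
print("without readiness:", restricted)
```

With readiness in the model, the whole gap is assigned to endowments. Once readiness is dropped, the same gap is assigned to responses, even though boys and girls respond identically by construction.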

We also consider whether or not the insignificant effect of parental investments, parental education, and family structure in generating gender differences in educational achievement stems from the inclusion of controls for children’s school readiness, reasoning ability, parents’ expectations, and teacher-assessed skill levels. These capabilities may themselves be driven by the preschool investments that parents make in their children, and including them in our model may imply that we are in a sense over-controlling. We investigate this issue by conducting the decomposition analysis for a model in which the WISC score, WAI score, parents’ expectations, and all teacher assessments are excluded. The results are presented in specification E. We find that medium-SES girls continue to have a reading advantage in third grade because they have better educational endowments even when these important endowments are excluded from the model. In all other cases, however, the gendered educational responses are more than sufficient to explain the entire gender gap in achievement levels, ruling out an important role for gendered parental investments in generating the achievement gaps we observe.

Finally, the LSAC data provide information about a range of other factors which have been suggested as contributing to the gender gap in achievement. We investigate the importance of these factors by estimating a “kitchen-sink” model that adds to our preferred specification: (i) children’s acquisition of social and behavioral skills (i.e., teacher-assessed approach to learning as in Bertrand and Pan 2013); (ii) children’s preference for math or reading; (iii) parental background (i.e., whether or not parents were raised in a family in which the only breadwinner was the father); (iv) parents’ parenting style (i.e., disengaged, permissive, authoritarian, and authoritative) (see Wake et al. 2007)Footnote 20; (v) income; and (vi) a vector of other contextual variables (e.g., parents’ relationship quality, a cluttered house, etc.). Results are reported in specification F of Table 4. We find no evidence that the inclusion of this broader set of characteristics adds substantively to our understanding of the relative importance of either endowment or response effects in shaping the disparity in boys’ and girls’ achievement in reading and numeracy.

6 Conclusions

Achieving gender equality in education is a key social objective. Differences in boys’ and girls’ educational achievement are particularly concerning because they are likely to be perpetuated, spilling over into other educational outcomes and undermining efforts to achieve gender equality more generally. After reviewing the evidence, the OECD (2015) recently concluded that gender disparities in educational achievement at age 15 are not the result of differences in aptitude. Rather “given equal opportunities, boys and girls, men and women have equal chances of achieving at the highest levels” (p. 13). If true, this implies that policy effort should be directed towards equalizing opportunities.

This paper makes an important contribution to this debate by investigating the source of gender disparities in early educational achievement. Specifically, we decompose the gender gap in third-grade standardized test scores in reading and numeracy into two components. One component is due to differences in boys’ and girls’ endowments of education-related characteristics, while the second is due to gender differences in the way that test scores respond to those endowments. This approach has a long tradition in analyses of gendered labor market outcomes but, to our knowledge, has not been applied to the study of early academic achievement. It has the advantage of allowing us to use a unified framework to consider the collective importance of a vast range of education-related characteristics, many of which may be individually insignificant.

Our results lead us to three important conclusions. First, like others in the literature, we find that gaps in educational achievement are linked to children’s socio-economic status. The relative advantage that girls have in reading exists only among children in low- and middle-SES families, while boys’ relative advantage in numeracy only occurs in high-SES families. There is no innate skill advantage for either one gender or the other that manifests itself in all contexts. Rather, the gendered nature of educational achievement differs across domains (e.g., reading vs. numeracy) and ends of the socio-economic spectrum.

Second, it is clear that the source of the gender gap in achievement differs across domains. Girls score higher on their third-grade reading tests because they have better endowments of the things associated with higher educational achievement. Specifically, they were more ready for school and had better teacher-assessed literacy skills in kindergarten. This results in low- and medium-SES girls having significantly higher reading test scores than boys. High-SES girls also have an advantage in educational endowments, but lag behind high-SES boys in the way these endowments are translated into reading achievement. These two effects work in opposite directions leaving high-SES girls and boys with virtually identical reading achievement.

In contrast, boys’ numeracy advantage stems not from better educational endowments but from an advantage in the way their test scores respond to (are associated with) these characteristics. That is, boys achieve higher numeracy test scores than do girls with the same education-related characteristics. In the case of low- and medium-SES boys, this advantage in response effects is large enough to compensate for their lower educational endowments leaving them with the same numeracy achievement as their female classmates. High-SES boys, however, have an advantage in numeracy because the positive response effects are larger than the negative endowment effects. In particular, high-SES boys’ numeracy advantage would be substantially smaller if the relationship between numeracy achievement on the one hand and preschool and school type on the other were gender neutral.

Third, while we cannot definitively rule out gendered educational practices as a source of the gender gap in children’s standardized test scores, this seems unlikely to be the full story, particularly in the case of reading achievement. Importantly, our results add to the small body of evidence showing that achievement gaps exist in early primary school, before children have been exposed to long periods of gender-biased schooling. Moreover, girls score higher on their third-grade standardized tests largely because of the skills they already possessed before entering school and in kindergarten. It is this skill advantage which produces a reading advantage (and eliminates a numeracy disadvantage) for girls in the bottom two thirds of the SES distribution. None of this suggests that the school environment itself is the main source of the gender gap in achievement. At the same time, high-SES boys have higher numeracy achievement than high-SES girls in third grade in large part because they gain much more from attending preschool and lose less from attending Catholic (as opposed to public) schools, raising the possibility that gendered educational practices in these settings favor boys’ numeracy achievement. Alternatively, it is also possible that sorting into these educational settings varies by gender.

Despite these conclusions, there remains a great deal that we do not yet fully understand. Variation in the magnitude and direction of the gender achievement gap across domains and socio-economic status has led some researchers to speculate about the nature of the interactions that might produce this complex pattern of results. Levine et al. (2005) postulate, for example, that in high-SES families, boys engage in relatively more spatially relevant activities than do girls which would potentially explain boys’ numeracy advantage. Similarly, Penner and Paret (2008) argue that variation in gender differences in math scores may be due to variation in gender stereotypes or the transmission of cultural resources within groups. Research testing these hypotheses would be particularly valuable in identifying sensible policy responses.

There is also surprisingly little evidence that the gendered nature of investments in children varies by socio-economic status, let alone that this is the source of the gender gaps in achievement we observe. Baker and Milligan (2016), for example, provide cross-national evidence that from an early age parents spent more time with girls reading, telling stories, and teaching words and letters. This could certainly explain girls’ advantage in literacy. The gender gap in parental investments in literacy, however, is largest among less-educated mothers in the UK and among highly educated mothers in the US and Canada, leaving relative reading achievement by SES difficult to explain. It would be useful to know more about the pathways through which children’s SES produces gender inequality in educational achievement.

It is clear that the pattern of achievement gaps across domains and family circumstances is complex, making it unlikely that a single overarching process drives the relationship between gender and educational achievement. We need to do more to identify which mechanisms are relatively more important and in which circumstances.