Introduction

In most western countries, women currently outperform men in education (DiPrete and Buchmann 2013), but continue to experience significant disadvantages in the labor market, even when they complete higher education (Bobbitt-Zeher 2007). These gender inequalities reflect a complex configuration of determinants involving several family- and job-related factors, but it is well documented that gender segregation in higher education (GSHE) is a significant pathway to these labor market inequalities, because women tend to major in fields that carry higher risks of unemployment and overeducation and lead to less lucrative jobs (Barone and Ortiz 2011; Bobbitt-Zeher 2007; Gerber and Schaefer 2004; Nunez and Livanos 2010).

Understanding the mechanisms behind GSHE is therefore a major theoretical challenge with important policy implications. Unfortunately, while several potential explanations for GSHE have been proposed, much less progress has been made in testing their explanatory power. As discussed below, a few empirical tests have already been carried out, but they have considered only one or two mechanisms separately and have provided mainly negative results, for instance showing that ability in mathematical and scientific subjects fails to account for GSHE. An additional limitation is that most of these empirical tests concern only the USA, while much less is known about Europe, where educational systems are organized differently in several respects, most notably because curricular differentiation in high school is often more rigid and formalized.

Using a rich longitudinal dataset, in this work we propose a joint test of seven potential explanations for GSHE. We take Italy as a test case and consider a recent cohort of upper secondary school leavers. In Section 2 we discuss these seven potential explanations and in Section 3 we provide some background concerning the Italian case. In Section 4 we present our data, measures, and modeling strategy, which rests on the decomposition technique proposed by Breen et al. (2013). Section 5 presents the empirical results, and Section 6 concludes.

Explaining gender differences in field of study choice

Previous research indicates that the qualitative pattern of GSHE displays remarkable similarities across western countries (Barone 2011; Charles and Bradley 2002, 2009; Vaarmets 2018). Women are systematically underrepresented in STEM fields, such as engineering, computing, technology, and, to a lesser extent, physics; they are strongly overrepresented in the humanities, the social sciences,Footnote 1 teacher education, and, to a lesser extent, in medicine and other health-related fields; other fields such as the life sciences, architecture, law, business, and economics tend to be more gender-balanced. Moreover, empirical research indicates that the humanities and the social sciences offer less favorable labor market prospects in terms of unemployment rates, earnings, and the risks of overeducation and skill mismatch, while on the same outcomes graduates from engineering and computing, medicine, and the other health-related fields perform above average (Assirelli 2015; Barone and Ortiz 2011; Davies and Guppy 1997; Reimer et al. 2011). Hence, the connection between GSHE and gender inequalities in career prospects is largely due to the underrepresentation of women in engineering and computing and to their overrepresentation in the humanities and the social sciences. These two specific gender imbalances across fields thus constitute the focus of this paper.

Moreover, previous research reports that GSHE is highly stable across cohorts: the qualitative and quantitative patterns of gendered field of study choices show a remarkably persistent conformity to gender stereotypes (Barone 2011; England and Li 2006; Mann and DiPrete 2013; van de Werfhorst 2017). This resilience is thought to reflect the persistence of gender essentialist stereotypes (Charles and Bradley 2009). Gender essentialism promotes gender biases in the set of skills, preferences, and beliefs that are transmitted from one generation to the next (Buchmann et al. 2008; Correll 2004).

Several specific mechanisms may relate gender essentialist stereotypes to GSHE. First, gender essentialism can affect skill development through socialization processes operating through significant adults as well as through gender-segregated peer networks. Indeed, a common explanation for women’s underrepresentation in some math-intensive and rewarding fields points to gender gaps in math and science skills (Ceci and Williams 2010). This first explanation may be derived from rational choice theory, which argues that individuals tend to prefer educational options that enhance their chances of success. In OECD countries, boys still outperform girls in mathematics. According to the PISA study (OECD 2014), the average gender gap among 15-year-old students across OECD countries is quite modest (11 points on the PISA scale), but it nearly doubles among the top decile of performers (20 points), and a similar tendency can be identified for scientific skills. In Italy, the gender gap in math skills is comparatively high (18 points) and reaches 29 points in the top decile. It should be noted that in several countries, including Italy, students have already been tracked by the age of 15, i.e., when they are tested in the PISA study. Therefore, this kind of descriptive evidence is likely to reflect both genuine differences in student skills and the gendered consequences of curricular tracking. At any rate, the empirical studies that have tested whether gender differences in math and science skills mediate GSHE report poor support for this hypothesis (Ellison and Swanson 2010; Hedges and Nowell 1995; Morgan et al. 2013; Xie and Shauman 2005), at least for the USA, and there is increasing evidence that the overrepresentation of boys in scientific fields cannot be explained by their advantage in math and science (Vaarmets 2018).

Less is known about a second, complementary skill-based explanation for GSHE, namely, the comparative advantage hypothesis (Jonsson 1999; Vaarmets 2018). If students prefer fields of study in which they enjoy a comparative advantage in terms of academic performance, then girls should on average be more likely to select fields that place greater weight on language and communication skills, such as the humanities and the social sciences. The scant empirical evidence regarding this second rational choice hypothesis fails to support it (Jonsson 1999; Riegle-Crumb et al. 2012; van de Werfhorst et al. 2003), but more research is needed before firm conclusions can be drawn.

A third rational choice mechanism revolves around gender differences in career preferences. In particular, women may opt for less lucrative fields because they are more family-centered and thus attach lower value to prestige, earnings, and career prospects, while placing a higher value on work-family reconciliation and expressive motives, such as self-realization and social utility (Cattaneo et al. 2017; Ceci and Williams 2010; Garcia-Aracil et al. 2007; Zafar 2013). However, gender differences in career orientations are not very pronounced in recent cohorts, and they seem to fall short of explaining GSHE, at least in the USA (Bobbitt-Zeher 2007; Konrad et al. 2000; Mann and DiPrete 2013; Menon et al. 2017). Moreover, these arguments are difficult to reconcile with the dramatic increase in the presence of women in several legal and business professions, which can be demanding in terms of working time arrangements.

Let us recapitulate the hypotheses that we can derive from these first three rational choice explanations:

  • H1: gender differences in school performance in scientific subjects mediate the association between gender and field choice;

  • H2: gender differences in school performance in scientific subjects relative to performance in humanistic subjects mediate the association between gender and field choice;

  • H3: gender differences in career orientations mediate the association between gender and field choice.

However, rather than maximizing their expected utility in a narrow economic sense, girls and boys may maximize the consumption value of education by choosing the fields that correspond most closely to the subjects that they have enjoyed in high school, or that lead to their preferred occupations. Their expressive preferences for school subjects and for specific occupations associated with different tertiary fields thus represent two additional potential explanations for GSHE. These hypotheses have been put forward in an article by Morgan et al. (2013), who stressed the importance of collecting fine-grained measures of occupational plans, which tap the detailed patterns of gendered career orientation, rather than just considering the overall level of aspiration for highly or poorly lucrative jobs. This argument about expressive preferences therefore differs from the rational choice argument that girls prefer soft fields because they attach less importance to career prospects. The underlying micro-level mechanism in the former case is not a cost-benefit calculation, but rather the emotional, symbolic value attached to specific subjects, educational programs, and occupations, regardless of their economic profitability. Indeed, there is evidence that detailed measures of “horizontal preferences” for school subjects and occupations are strong predictors of field of study choice that partly account for GSHE, at least in the USA (Morgan et al. 2013). For instance, girls more often develop a preference for humanistic subjects (e.g., arts) and for related occupations (e.g., artistic jobs), which favors enrollment in the humanities or the social sciences (Cech 2013; Morgan et al. 2013).

Furthermore, we assess the contribution to GSHE of peer influences, which may reflect a variety of mechanisms: expressive motivations such as imitation and the wish to preserve friendship ties, but also, along the lines of bounded rationality models, the role of peers as a reference group and as sources of information about specific fields. We are aware of some case studies reporting that students are more likely to choose a given field if their classmates do the same (De Giorgi et al. 2010; Ost 2010), but we could not find any empirical test of the explanatory contribution of peer effects to GSHE.

The gender-biased influences of significant adults should be factored in as well. Teachers, parents, and school counselors more or less directly and explicitly encourage girls and boys to follow different educational tracks and tertiary programs (Zafar 2013; Gunderson 2012; Jacobs 1995). In particular, gender differences in curricular choice in high school partly reflect social control mechanisms operating through gender-biased perceptions of the “natural” inclinations and preferences of students (Frank et al. 2008; Gabay-Egozi et al. 2015). Therefore, curricular track choices in high school represent an additional explanatory candidate, given that they strongly predict future field of study choices. The evidence for the USA suggests that gendered patterns in course-taking in high school contribute to GSHE only to a rather limited extent (Frank et al. 2008).Footnote 2 Much less is known about European nations. Two comparative analyses report that earlier tracking is associated with higher GSHE at the macro level (Imdorf et al. 2015; Smyth and Steinmetz 2008), but we could not find any individual-level analysis assessing to what extent track or curricular choices mediate GSHE. This is a significant research gap, because several educational systems in Europe display a fine-grained curricular differentiation in high school that channels girls and boys into different educational pathways at an early age, when the influences of significant adults can be particularly important.

Let us recapitulate the hypotheses that we can derive from culturalist explanations:

  • H4: gender differences in expressive preferences for specific occupations mediate the association between gender and field choice;

  • H5: gender differences in expressive preferences for school subjects mediate the association between gender and field choice;

  • H6: gender differences in the field of study preferences of close classmates mediate the association between gender and field choice;

  • H7: gender differences in secondary track curricular choices mediate the association between gender and field choice.

The Italian educational system and gender segregation in higher education

In Italy, primary and lower secondary education is comprehensive and takes 8 years to complete, from the age of 6 to 14. Lower secondary school leavers are faced with three main options: academic (licei), technical (istituti tecnici), and vocational schools (istituti professionali). Academic schools are more prestigious and display a marked underrepresentation of male, working class, and immigrant students. Each of these tracks comprises several curricula, which are strongly patterned along gendered lines. Within the academic track, girls are underrepresented in the scientific curriculum and overrepresented in the foreign languages and socio-pedagogic curricula. In technical and vocational tracks, girls are underrepresented in industry-oriented curricula and overrepresented in service-oriented curricula (tourism, catering, accounting, etc.).

Teachers’ track and curriculum recommendations are not binding, but they have an important influence on families’ choices. Parents play an important role at this stage, given the young age of their children and the lack of standardized assessments of children’s skillsFootnote 3 (Aastrup 2007; Cavalli and Facchini 2001). Italy is thus an interesting case study because, as in several other countries in continental Europe, curricular choices are made earlier and are more formalized and constraining than in the USA, which has been the focus of most previous empirical studies.

All of these upper secondary tracks and curricula take 5 years to complete and provide access to any program in higher education. However, universities often set a numerus clausus or entry examinations, which can be more or less demanding across tertiary fields. Higher education in Italy comprises a large university sector and a small sector of short post-secondary vocational programs. University education is articulated into 3-year bachelor courses and 2-year master courses. Field of study choice occurs at the entry into the bachelor level and mobility between fields in the transition to the master level is rare (ANVUR 2016).

Finally, it may be noted that the level and pattern of GSHE in Italy closely resemble those documented for other western countries (Barone 2011; Charles and Bradley 2009; Triventi 2010). Vertical and horizontal gender segregation in the labor market is quite pronounced in Italy (Almalaurea 2015), but less so than in other industrialized nations: possibly owing to the low rate of female labor market participation, employed women are a more selected population in Italy (Charles and Grusky 2004). However, in younger cohorts, the activity rate of female graduates is in line with those observed in other OECD countries (OECD 2016).

The research design: data, variables, and methods

This study is based on a longitudinal survey carried out in 31 high schools covering all types of upper secondary tracks. These schools are located in four provinces (Milan, Vicenza, Bologna, and Salerno) that cover different geographical areas of the country. We first drew a random sample of schools proportionally stratified by province and school track (i.e., the number of students from each province and track in the sample is proportional to their share in the overall population of the four provinces). All senior students (grade 13) of these schools were interviewed on three separate occasions (N = 4805). The first wave was conducted at the beginning of the senior school year (October 2013) using self-administered questionnaires in the classrooms; the response rate was 99%. The second wave took place at the end of the same school year (May 2014) and was based on computer-assisted telephone interviews (CATI); the cumulative response rate was 82.8%. The third wave was conducted in November 2014 and recorded students’ university enrollment decisions, again using CATI; the cumulative response rate was 79.1%. Overall, the high level of participation of schools and students supports the external validity of our study.
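The idea of proportional stratification can be illustrated with a small allocation routine. This is a minimal sketch, not the authors’ sampling procedure: in the study, schools were drawn and all their seniors interviewed, whereas the sketch simply allocates a target number of sampled students to each province-by-track stratum. The function name and the stratum sizes are hypothetical.

```python
import math


def proportional_allocation(stratum_sizes, sample_size):
    """Allocate sample_size units across strata in proportion to stratum size.

    Floor quotas are topped up by largest-remainder rounding, so the
    allocations sum exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: sample_size * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Hypothetical senior-student counts per (province, track) stratum.
population = {
    ("Milan", "academic"): 5000,
    ("Milan", "technical"): 3000,
    ("Salerno", "vocational"): 2000,
}
print(proportional_allocation(population, 100))
# {('Milan', 'academic'): 50, ('Milan', 'technical'): 30, ('Salerno', 'vocational'): 20}
```

Largest-remainder rounding is one common way to keep the total fixed when quotas are not whole numbers; other rounding rules would serve equally well for illustration.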

The analytical sample, which is restricted to university students for whom we have valid information on the chosen field of study, includes 2403 individuals.Footnote 4

The primary outcome of interest for our analyses is field of study choice among university students (measured in wave 3). More specifically, following our previous arguments concerning the labor market prospects of different fields (Section 2), we analyze gender differences in enrollment in occupationally weak, highly feminized fields (the humanities and social sciences), as well as in occupationally strong, masculine fields (engineering, computing). In other words, we focus on the fields of study that contribute most to gender inequalities in the labor market, because they display a marked imbalance (higher than 10%) in the shares of male and female students and are associated with significantly above- or below-average labor market outcomes.Footnote 5 We rely on a simple distinction between fields with below-average, average, and above-average occupational returns because it is robust across undergraduate and graduate programs as well as across different occupational indicators, such as unemployment risks, earnings, and overeducation. This basic pattern is well documented in the literature on returns to fields of study for Italy and other western countries (Almalaurea 2015; Reimer et al. 2011).

As regards school performance in scientific subjects (H1), we consider students’ self-reported grades in math and in science (the two scientific subjects that are taught in high school tracks) on a scale from 0 to 10. As regards the comparative advantage hypothesis (H2), following Jonsson’s (1999) seminal paper, we take the ratio between the mean grade in Italian and foreign languages (the two humanistic subjects that are taught in all high school tracks) and the mean grade in math and science. As regards the high school curriculum (H7), we rely on a detailed eight-category variable that refers to the track (academic, technical, vocational) and to the specific curriculum chosen by the student: scientific, classical studies, psychology-pedagogy, and foreign languages curricula for the academic track; industry-oriented, service-oriented, and foreign languages curricula for the technical track; industry-oriented and service-oriented curricula for the vocational track. The service-oriented curricula in the technical track typically train for skilled white-collar professions (e.g., accountant, surveyor), while those in the vocational track are more focused on tourism, catering, and personal services. In order to estimate the model for the KHB decomposition, we had to aggregate, on the one hand, the industry-oriented curricula of the technical and vocational tracks and, on the other hand, the service-oriented and foreign languages curricula.
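The comparative advantage measure described above can be sketched as a simple ratio of domain means. This is a minimal illustration, assuming equal weights for the two subjects within each domain; the function name and the example grades are hypothetical, not survey values.

```python
def comparative_advantage(italian, foreign_lang, math_grade, science):
    """Ratio of the mean humanistic grade to the mean scientific grade.

    Grades are on the 0-10 school scale. Values above 1 indicate a relative
    advantage in humanistic subjects; values below 1 one in scientific subjects.
    """
    humanistic = (italian + foreign_lang) / 2
    scientific = (math_grade + science) / 2
    if scientific == 0:
        raise ValueError("mean scientific grade must be positive")
    return humanistic / scientific


# Hypothetical student: 7.4 in Italian, 7.3 in English, 7.0 in math and science.
print(round(comparative_advantage(7.4, 7.3, 7.0, 7.0), 3))  # 1.05
```

A ratio (rather than a difference) makes the measure scale-free, so it captures relative strengths irrespective of a student’s overall grade level.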

As regards career orientations (H3), we use the following question: “To what extent do you attach importance to your future career prospects when choosing a field of study? Indicate a value between 1 (not at all) and 10 (completely).” As regards students’ expressive preferences for specific occupations (H4), interviewees were asked to report the occupation that they would like to have within 10 years, and their open-ended answers were then coded into a nine-category variable. Similarly, for subject preferences in high school (H5), students were asked to indicate up to three favorite subjects and their answers were coded into four categories: humanistic, scientific, technical disciplines, and a separate category for business and law. For the analyses, we take into account the first preference expressed by students.

Finally, as regards peer influences (H6), each student was asked to indicate up to three closest friends in the classroom; we could map the final choices of these classmates from wave 3 and constructed two variables marking whether the student has none, only one, or at least two friends choosing the relevant field (the humanities and social sciences, when modeling access to these fields; engineering and computing, when this is the outcome of interest). As an alternative specification, we can replace the final choices of these classmates with their initial preferences, recorded in wave 1. We can also consider the percentage of classmates or schoolmates choosing the relevant field. We present the results based on the first solution and comment on those based on the alternative specifications. It should be noted that all three specifications refer only to school-based peer influences; we therefore cannot extend our conclusions to peer influences occurring outside the school context.
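The construction of the peer variable can be sketched as follows. This is a minimal illustration under stated assumptions: classmate IDs, field labels, and the function name are hypothetical, and a nominated friend with no recorded wave 3 choice is simply counted as not choosing the relevant field (the paper does not specify how such cases were handled).

```python
def peer_category(friend_ids, final_field, relevant_fields):
    """Classify how many nominated close friends chose a relevant field.

    friend_ids: up to three classmate IDs nominated by the student.
    final_field: mapping classmate ID -> field chosen (wave 3). Friends
        without a recorded choice count as not choosing a relevant field
        (an assumption of this sketch).
    """
    count = sum(1 for f in friend_ids[:3] if final_field.get(f) in relevant_fields)
    if count == 0:
        return "none"
    if count == 1:
        return "one"
    return "at least two"


HUM_SOC = {"humanities", "social sciences"}
choices = {"s2": "humanities", "s3": "engineering", "s4": "social sciences"}
print(peer_category(["s2", "s3", "s4"], choices, HUM_SOC))  # at least two
print(peer_category(["s3"], choices, HUM_SOC))              # none
```

The same function serves both outcomes by swapping `relevant_fields`, e.g. a set of engineering and computing labels when modeling enrollment in those fields.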

We estimate a sequence of binomial logit models among students who enrolled at university, introducing each of the above measures sequentially and following temporal order whenever possible. We first look at the corresponding parameters and at the variations of gender effects across models; we then use the KHB (Karlson-Holm-Breen) method for a formal decomposition exercise for categorical outcomes based on mediation analysis (Breen et al. 2013). Because it holds constant both the scale of the latent error and its fit to the assumed distribution, this method generalizes the standard decomposition approach used in linear regression to nonlinear probability models. Birth year, country of birth, province of residence, and parental education are introduced in the models as control variables, as are measures of general school performance both before track choice in high school and at the final exam in high school. Standard errors are clustered at the school level.
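The core logic of the KHB decomposition can be illustrated with a minimal pure-Python sketch. It assumes a single binary regressor x (e.g., a gender dummy), a single continuous mediator z, and no control variables; all function names and the simulated data are hypothetical, and this is not the authors’ code. The mediator is residualized on x by OLS; the coefficient of x in a logit that includes this residual (the “reduced” model) gives the total effect, and the coefficient of x in the full logit including z gives the direct effect. Because both models have identical fit and therefore share the same latent scale, their difference is the indirect effect.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(rows, y, iters=30):
    """Logit maximum likelihood by Newton-Raphson; each row starts with 1.0."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def khb(x, z, y):
    """KHB decomposition of the effect of x on y through one mediator z."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxz / sxx
    # Residualize z on x (OLS with intercept).
    z_res = [zi - (zbar + slope * (xi - xbar)) for xi, zi in zip(x, z)]
    full = fit_logit([[1.0, xi, zi] for xi, zi in zip(x, z)], y)
    reduced = fit_logit([[1.0, xi, ri] for xi, ri in zip(x, z_res)], y)
    total, direct = reduced[1], full[1]
    return {"total": total, "direct": direct, "indirect": total - direct,
            "z_coef": full[2], "slope": slope}


# Simulated (hypothetical) data: x = gender dummy, z = mediator, y = field choice.
rng = random.Random(1)
x = [float(rng.random() < 0.5) for _ in range(2000)]
z = [xi + rng.gauss(0.0, 1.0) for xi in x]
y = [float(rng.random() < 1.0 / (1.0 + math.exp(-(-0.5 + 0.3 * xi + 0.8 * zi))))
     for xi, zi in zip(x, z)]
res = khb(x, z, y)
print({k: round(v, 3) for k, v in res.items()})
```

In this one-mediator case the indirect effect equals the logit coefficient of z times the OLS slope of z on x. With several mediators and controls, as in the paper, each mediator would be residualized in the same way; standard errors for the indirect effect require further derivations not shown here.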

This study improves on previous research by incorporating several potential mechanisms of GSHE simultaneously, but we stress that some potentially important layers of GSHE are still missing. For instance, we are not able to directly operationalize teachers’ or parents’ influences.Footnote 6 To the extent that we omit some potentially important determinants of field of study choices that could correlate with the determinants incorporated in our models, we cannot rule out omitted-variable bias. Reverse causality is also a potential concern: for instance, girls may want to enroll in the humanities more often for reasons other than those discussed above and may then develop occupational plans that are coherent with these choices as a way of reducing cognitive dissonance. Our longitudinal design reduces this risk because information on the determinants of field of study choices is collected about 1 year (wave 1) before the information on field choices (wave 3), but it does not eliminate the problem. Hence, we cannot interpret the estimated coefficients in our models as truly “causal effects.”

Results

Descriptive results

In this sub-section, we present descriptive analyses of the evolution of gender differences in field preferences across waves, as well as of the determinants of these preferences. These analyses provide the background for the multivariate results presented in Section 5.2. Figure 1 shows the variations over time in the field preferences of male and female students who enrolled at university. As explained above, the data for wave 1 refer to intended field of study choices at the beginning of the senior high school year, while wave 3 refers to the actual decisions of the students. Starting with strong fields, it can be seen that for both girls and boys, preferences for engineering and ICT increase moderately over time, while those for medicine and health-related fields decline considerably, most likely because of the strong selectivity of entry examinations in these highly regulated fields.

Fig. 1 Variation over time of field preferences by gender, detailed classification of fields of study. Descriptive analysis

Conversely, the shares of both female and male students choosing the humanities and the social sciences increase significantly between initial preferences and final decisions. Overall, we observe a significant decline over time of strong fields and a corresponding increase of weak fields for both genders. Interestingly, this trend is more pronounced for girls: 27.9% of them planned to select an occupationally strong field, but only 13.9% finally chose one. This decline (14 percentage points) is almost twice as large as the one observed among their male counterparts (from 44 to 36.6%, i.e., 7.4 points); conversely, the share of girls who choose a weak field (33.8%) is substantially higher (+8.5 percentage points) than the share of girls who initially planned to do so (25.3%). We detect a much smaller increase among boys (+3.3 points). Hence, gender differences in access to rewarding fields of study widen substantially between the beginning of the senior high school year and university enrollment (Alon and DiPrete 2017).
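The percentage-point changes discussed above can be reproduced directly from the reported shares. This short check uses only the figures given in the text; the variable and function names are illustrative.

```python
# Shares (%) reported in the text: (wave 1 intention, wave 3 enrollment).
girls_strong = (27.9, 13.9)
boys_strong = (44.0, 36.6)
girls_weak = (25.3, 33.8)


def change_pp(shares):
    """Percentage-point change from wave 1 to wave 3."""
    start, end = shares
    return end - start


girls_drop = -change_pp(girls_strong)
boys_drop = -change_pp(boys_strong)
print(round(girls_drop, 1), round(boys_drop, 1))  # 14.0 7.4
print(round(girls_drop / boys_drop, 2))           # 1.89
print(round(change_pp(girls_weak), 1))            # 8.5
```

The ratio of about 1.9 is what underlies describing the girls’ decline as almost twice as large as the boys’.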

Table 1 displays the distributions by gender of the independent variables. These distributions are informative about the potential explanatory power of each factor: if a factor’s distribution does not vary by gender, it cannot contribute to gender differences in field of study choice. As can be seen, gender differences across high school curricula are strong.

Table 1 The distribution of the determinants of field of study choices across male and female students

In particular, within the academic track, girls are overrepresented in the classical studies curriculum (14.7 vs. 10.3% for boys) and, more markedly, in the foreign languages and psychology-pedagogy curricula (34.2 vs. 6.5% for boys). Conversely, boys are heavily overrepresented in the scientific curriculum of the academic track (51.6 vs. 31% for girls), as well as in the industry-oriented curricula of the technical (12.9%) and vocational (2.9%) tracks, while girls dominate in the foreign language and service-oriented curricula of these tracks. Hence, curricular differences may be a relevant mechanism of GSHE.

By contrast, when inspecting the average grades in math and science of female and male students, it is apparent that girls outperform boys in both subjects in this recent cohort of students. Moreover, when contrasting performance in the two humanistic subjects with performance in the two scientific subjects, we see that girls enjoy a slight but statistically significant comparative advantage over boys in humanistic subjects. However, for both genders, school performance varies only moderately across domains. In particular, for girls, the average grades in Italian and in English are 7.4 and 7.3, respectively; the corresponding value for boys is 6.9 in both subjects.Footnote 7 Overall, girls outperform boys in all subjects, and only slightly more so in humanistic than in scientific subjects. Hence, skill-based explanations are not promising candidates to account for GSHE. It should also be noted that in Italy, owing to the comparatively high drop-out rates in the first 2 years of upper secondary education, high school seniors are a selected population with respect to school performance.

Instead, subject preferences reveal strong gender differences along the expected lines: boys more often prefer scientific subjects (43.9 vs. 28.3% for girls) or technical subjects (7.3 vs. 1.2%), while girls display a much stronger interest in humanistic subjects (63.9 vs. 39% for boys). Gender differences in the preferred types of occupations are also very pronounced, particularly as regards engineering and ICT professions (10.8% for boys vs. 1.4% for girls), managerial jobs (12.8 vs. 7.1%), teaching (1.4 vs. 5.1%), and care-oriented occupations (4.3 vs. 11.4%). Hence, expressive preferences for school subjects and future occupations are strongly patterned along gender lines and may thus contribute to the explanation of GSHE. At the same time, as regards the subjective importance of finding jobs with good career prospects, we observe a small but statistically significant difference, with girls attaching slightly more importance to career prospects than boys, which casts further doubt on the explanatory potential of the rational choice mechanisms.

Finally, girls more often mention as close friends one classmate (27.8%) or at least two classmates (19.7%) who enroll in the humanities or the social sciences, while boys more often mention one (22%) or at least two (7.4%) classmates enrolling in engineering or ICT. Hence, the gender segregation of friendship networks may contribute to GSHE.

Multivariate analyses

In this section, we examine whether the factors described above affect field of study choices by means of two sets of binomial logistic regression models referring to enrollment in the humanities and social sciences (Table 2) or in engineering and ICT (Table 3). We consider only students who enrolled at university and report average marginal effects, which can be interpreted as differences between categories in the probability (in percentage points) of enrolling in a given field. We comment on the parameters referring to the potential mediators of GSHE, as well as on the variations across models in the coefficients for gender, which provide a first indication of the explanatory power of these mediators; this is later tested more formally using the KHB method.
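How an average marginal effect (AME) of a binary regressor is obtained from a fitted logit can be sketched as follows. This is a minimal illustration, not the authors’ estimation code: the coefficients and covariate rows are hypothetical. For each observation, the dummy is switched to 1 and to 0 while the other covariates keep their observed values, and the resulting differences in predicted probabilities are averaged.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def ame_binary(beta, rows, j):
    """Average marginal effect of binary regressor j in a fitted logit.

    beta: coefficients (intercept first); rows: covariate vectors starting
    with 1.0. Returns the mean of P(y=1 | x_j=1) - P(y=1 | x_j=0).
    """
    total = 0.0
    for row in rows:
        on, off = list(row), list(row)
        on[j], off[j] = 1.0, 0.0
        eta_on = sum(b * v for b, v in zip(beta, on))
        eta_off = sum(b * v for b, v in zip(beta, off))
        total += logistic(eta_on) - logistic(eta_off)
    return total / len(rows)


# Hypothetical coefficients: intercept, female dummy, one control.
beta = [0.0, math.log(3.0), 0.0]
rows = [[1.0, 0.0, 2.5], [1.0, 1.0, -1.0], [1.0, 0.0, 0.3]]
print(round(100 * ame_binary(beta, rows, 1), 1))  # 25.0 (percentage points)
```

Here the control has a zero coefficient, so every observation contributes 0.75 - 0.50; with non-zero control coefficients the AME depends on the observed covariate distribution, which is why it is averaged over the sample rather than evaluated at a single point.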

Table 2 Determinants of enrollment in the humanities and social sciences at university: binomial logistic regression models, average marginal effects
Table 3 Determinants of enrollment in engineering and ICT at university: binomial logistic regression, average marginal effects

Model 1 in Table 2 refers to the total effect of gender on the probability of choosing a humanistic or social science field (boys are the reference category), controlling for socio-demographic covariates and for the final school mark at the lower secondary examinations.Footnote 8 As can be seen, we detect a strong overall gender gap, which amounts to 24.6 percentage points.

In model 2, we introduce a set of dummy variables that refer to the upper secondary curriculum chosen by each student. This is a powerful predictor of field of study choice: relative to students attending the scientific curriculum in the academic track (reference category), those in classical studies curricula have a 25.1 percentage-point higher probability of enrolling in the humanities and social sciences, while those in foreign language and psycho-pedagogical curricula have a 49.2-point higher probability; these fields are also chosen more often by students of foreign language curricula in technical tracks (+39 points) and of service-oriented curricula in vocational tracks (+30.3 points). Crucially, following the introduction of this single variable, the effect of gender drops from 24.6 (model 1) to 7.2 percentage points (model 2), in line with hypothesis 7. This reflects the strong gender imbalances across secondary curricula described in the previous section and the strong effects that these curricula exert on field of study choices.

Model 3 indicates that one additional point in the average grade in math and in science reduces the probability of enrolling in humanistic and social science fields by 5.6 and 3.9 percentage points, respectively. Hence, performance in scientific subjects is a powerful predictor of field of study choice. However, contrary to hypothesis 1, it does not seem to mediate gender differences across fields, given that the parameter for gender changes little in model 3. This could be expected given that, in this recent cohort of students, boys do not report higher grades than girls in scientific subjects. The measure of comparative advantage (model 4) does not have a statistically significant effect on field of study choice (contrary to hypothesis 2), and the same applies to the overall mark at the final upper secondary examinations. Overall, general school performance in lower and upper secondary education (models 1 and 4) has little impact on the outcome; the only relevant skill-based predictor is performance in scientific subjects.

Models 5 and 6 introduce student preferences for school subjects and for specific occupations, to test hypotheses 4 and 5. These preferences are strong predictors of the outcome. Relative to students who mention a preference for scientific subjects (reference category, model 5), a preference for humanistic subjects in high school raises the chances of enrolling in the corresponding fields by 16.2%. Moreover, aspiring to service-oriented, humanistic, and teaching professions correlates strongly with later enrollment in the humanities and social sciences, with parameters of 27.1%, 35.6%, and 30.5%, respectively. When these variables referring to students' expressive preferences are added in models 5 and 6, the residual effect of gender decreases further, from 8.1% to 6.3%. Interestingly, these two factors also partly mediate the effect of secondary curricula. In particular, the parameters for curricula in classical studies, other humanistic tracks, foreign languages in the technical track, and service-oriented curricula in vocational tracks shrink substantially, by between one fourth and one third of their values in model 4.

Model 7 assesses the relevance of instrumental preferences for field of study choices (hypothesis 3). In line with rational choice theory, more career-oriented students display a lower propensity to enroll in the humanities and social sciences: each point on this scale, which ranges from 0 to 10, reduces the chances of enrolling in these fields by 1.3%. However, the effect of gender is virtually unchanged. This was expected, because girls are not less career-oriented than boys. Finally, model 8 indicates that, controlling for all previous covariates, the share of close classmates choosing the humanities and social sciences is a significant predictor of respondents' own field of study choice. In particular, having at least two friends who choose these fields raises the chances of choosing them by 8.5%. Contrary to hypothesis 6, the parameter for gender is unaffected by the inclusion of this variable.

The pattern of results for enrollment in engineering and computing (Table 3) closely resembles the one just discussed. The overall effect of gender, which is now negative, is almost as strong (−22.3%), and curricular choices in secondary education strongly influence field of study choice. Moreover, performance in math is a relevant predictor of enrollment in these fields, while grades in science and the other skill-based predictors do not correlate with the outcome. Expressive preferences for scientific and technical subjects and for engineering and ICT professions are powerful predictors, while career aspirations do not seem to play any role. The effects of classmates' field of study choices are also confirmed.

As before, the initial gender differential in access to these fields is substantially reduced once these mediators are introduced. In this case, however, secondary curricular choices seem less prominent, while expressive preferences for specific occupations seem to play a more important role.

The results of the KHB decomposition, reported in Fig. 2, confirm the pattern of findings from the logistic regressions. In particular, curricular track alone accounts for almost two thirds (64%) of gender differences in access to the humanities and social sciences, and for almost one third (29%) of the gender gap in engineering and computing. The two skill-based explanations and career orientations have no explanatory power. For the first outcome their explanatory power is even negative, meaning that, if anything, they push girls away from these occupationally weak fields. Expressive preferences for specific subjects and occupations make a small but significant contribution to explaining gender differences in the humanities and social sciences (8% if we compare models 4 and 6), while their explanatory power is nearly twice as high (15%) for engineering and computing. Altogether, these determinants of field of study choice account for 67% of the gender gap in soft fields and for 47% of the gap in hard fields.
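The KHB logic behind Fig. 2 can be illustrated with a minimal sketch. This is not the authors' code. It assumes simulated data (the variables `female`, `track`, `pref`, and `soft_field` are hypothetical), a hand-rolled Newton-Raphson logit in numpy, and no control variables. It also measures effects on the logit-coefficient scale, whereas the figures reported above may be expressed on a different scale and net of further controls. The key step is to compare the gender coefficient in the full model with the gender coefficient in a reduced model in which the mediators are replaced by their residuals from a regression on gender. Because both models have the same fit, the two coefficients share the same scale, and their difference splits exactly into per-mediator contributions.

```python
import numpy as np


def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logit fitted by Newton-Raphson.

    X must already contain an intercept column; returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix X'WX
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def khb(x, Z, y):
    """KHB-style decomposition of the effect of x on y through mediators Z.

    Returns total, direct, and indirect effects on the logit scale, plus the
    per-mediator contributions theta_j * tau_j, which sum to the indirect effect.
    """
    n = len(y)
    base = np.column_stack([np.ones(n), x])
    # Residualize each mediator on x by OLS; tau holds the x-coefficients.
    coefs, *_ = np.linalg.lstsq(base, Z, rcond=None)
    tau = coefs[1]
    R = Z - base @ coefs
    # Full model: x plus the mediators themselves gives the direct effect.
    full = fit_logit(np.column_stack([base, Z]), y)
    # Reduced model: x plus the residualized mediators. Same fit and scale as
    # the full model, so its x-coefficient is a comparable total effect.
    reduced = fit_logit(np.column_stack([base, R]), y)
    direct, total = full[1], reduced[1]
    theta = full[2:]
    return {
        "total": total,
        "direct": direct,
        "indirect": total - direct,
        "contributions": theta * tau,
        "share_explained": (total - direct) / total,
    }


# Hypothetical simulated data (not the study's data): girls choose a "soft"
# track more often and hold stronger humanistic preferences; both mediators
# raise enrollment in soft fields.
rng = np.random.default_rng(0)
n = 5000
female = rng.integers(0, 2, n).astype(float)
track = (rng.random(n) < 0.3 + 0.4 * female).astype(float)
pref = 0.5 * female + rng.normal(size=n)
eta = -1.0 + 0.5 * female + 1.5 * track + 0.8 * pref
soft_field = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

result = khb(female, np.column_stack([track, pref]), soft_field)
print({k: np.round(v, 3) for k, v in result.items()})
```

Dividing each element of `contributions` by `total` gives per-mediator shares, analogous to the bars in Fig. 2. In this simulated example the shares are illustrative only.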

Fig. 2 KHB decomposition of gender differences across fields. Percentage explained by: secondary curricular track, school performance in scientific subjects, school performance in scientific subjects relative to performance in humanistic subjects, career orientations, expressive preferences for school subjects, expressive preferences for specific occupations, and field of study preferences of close classmates.

Finally, let us comment on the results of some robustness checks (available from the authors upon request). First, the pattern of findings does not change significantly if we run the models separately for each field (humanities, social sciences, engineering, computing), although the estimates become more unstable because of the smaller number of positive outcomes. Moreover, as reported in the online appendix, a three-category multinomial logit model (humanities and social sciences, engineering and ICT, gender-neutral fields) yields the same results as two sets of binomial logits. The same holds for a four-category specification in which gender-neutral fields are further disaggregated (economics and law vs. scientific fields). Second, we have experimented with more and less detailed specifications of the secondary curriculum variable: as expected, more detailed specifications display slightly higher explanatory power, but the differences are small. Third, measuring peer effects over a broader range of fields, or as the percentage of class- or schoolmates selecting more and less rewarding fields, leaves unchanged the conclusion that peer effects matter for field of study choice but account for GSHE only marginally. Fourth, in the models reported in Tables 2 and 3, we have introduced the covariates in chronological order: general marks in lower secondary education, track choice at the beginning of upper secondary education, subject-specific marks in the previous school year, student preferences for subjects and occupations in the last year, and peers' final field of study choices. Inverting the order in which subject preferences, occupational preferences, career orientations, and peer influences are introduced does not change the pattern of results, and our substantive conclusions are confirmed. The same holds if we enter occupational preferences in high school before track choice, assuming that these preferences proxy unmeasured preferences in lower secondary school, i.e., before the secondary track is chosen (see the online appendix, web address to be reported).

Concluding remarks

This article has used fresh longitudinal data on a recent cohort of Italian high school seniors to assess seven potential explanations for gender differences across fields. Let us summarize our three main sets of results. First, skill-based explanations referring to gender differences in performance in scientific subjects, or to a comparative advantage in humanistic subjects, have no explanatory power. School performance in math affects field of study choices, but in recent cohorts gender differentials in this domain are negligible, at least among high school seniors (see footnote 7). Similarly, the more importance students attach to future career prospects, the lower their chances of choosing the humanities and social sciences. However, this factor fails to account for GSHE because girls are not less career-oriented than boys. Overall, this first set of results suggests that rational choice explanations for GSHE do not find empirical support. At least in this recent cohort of high school graduates, girls do not seem to lack the skills or the ambition needed to enroll in more rewarding fields.

Second, expressive motivations related to preferences for school subjects and for specific occupations mediate GSHE to a significant extent, because these two factors are strongly patterned along gender lines and affect field of study choices. Additionally, our results suggest that the educational choices of the closest classmates correlate with respondents' own choices, although they do not mediate GSHE. Overall, our analyses suggest that culturalist approaches stressing the role of expressive preferences over and above cost-benefit calculations have significant heuristic value. Their importance is greater for gender differences in engineering and ICT than for gender differences in the humanities and social sciences.

Our third and most important result concerns the key role of curricular track choices. This single factor mediates 64% of the gender differences in access to the humanities and social sciences and 29% of the gender differences in access to engineering and ICT. This is because curricular choices in high school are heavily segregated along gender lines, and curricular track strongly influences field of study choices. Selecting a specific track in high school has several important consequences. It exposes students to a specific curriculum, through which they may develop more interest in specific disciplines. It also exposes them to a specific form of anticipatory socialization, which shapes the range of potential occupations they envisage. Finally, it places them among different classmates, whose own educational and occupational preferences can affect students' preferences. Indeed, our results indicate that the strong influence of curricular track is partly mediated by expressive preferences for school subjects and for specific occupations and, to a minor extent, by peer influences. However, even after these determinants of field choice are taken into account, the effects of track choice remain strong. We suspect that the effect of secondary curricula also reflects the different chances of success in different fields that these curricula afford. More generally, a limitation of our data is that they contain no information on students' subjective beliefs about their chances of success in different fields or about the perceived profitability of those fields. These beliefs could further account for gender differences across tertiary fields (Barone et al. 2017).

With the data at hand, we cannot examine what shapes curricular choices at earlier stages. As suggested in Section 2, there is evidence in the literature that significant adults (teachers, parents, counselors) play an important role, particularly at early educational transitions, when students are younger. However, social norms about which educational fields are appropriate for men and women may not be very strong: for instance, students choosing gender-atypical fields do not seem to receive less support from their parents (Mastekaasa and Smeby 2008). A limitation of this study is that we have no direct information on the influence of significant others.

Moreover, it is possible that earlier expressive preferences for specific subjects and occupations influenced secondary track choices. If these earlier preferences affected track choice, and if they exert a direct impact on field choice net of track choice, this would lead us to overestimate the role of track choices. More generally, it is important to stress that our results refer to one specific educational transition, namely, access to university education. Our longitudinal data do not cover the earlier stages of the educational career and thus we cannot assess the dynamic evolution of the mechanisms that we consider. In this study, we can only observe the net influences of each of the above-mentioned mechanisms as they operate at the end of upper secondary education. This may explain why we are not able to entirely account for gender differences across fields, which is another limitation of our study.

Even if we cannot analyze what lies behind the influence of secondary curricula, the finding that this factor accounts for a substantial part of GSHE is important for reorienting the scientific and policy debate on this matter. It suggests that GSHE is to a substantial extent predetermined well before students complete high school and choose a field of study. This conclusion differs from the results concerning the USA, where upper secondary education is not tracked and curricular choices are more fluid. Our results cannot be mechanically transferred to other countries, but we would expect to observe similar patterns in other European countries, where school-based tracking is common. At the macro level, there is already evidence that GSHE is stronger where early tracking is prevalent (Imdorf et al. 2015; Smyth and Steinmetz 2008). The scientific debate discussed in Section 2 seems too focused on the proximate mechanisms operating when students must choose a field of study. Future research should devote more attention to the role of curricular tracking for GSHE in European contexts and to the mechanisms behind it.

Future developments of our work also involve a deeper understanding of the interplay between the different determinants of field of study choice. Since our research questions focus on the mediating mechanisms of GSHE, we carried out a standard mediation analysis. However, the influence of the different determinants of field choice may vary by gender, which leaves room for a moderation analysis. It is also important to keep in mind that our modeling approach posits a unidirectional relation between track choice and field of study, whereas reality is necessarily more complex. It involves multi-directional progressions, which can be both reversible and open to anticipation effects, whereby occupational plans retroactively influence anticipated field of study choices and/or track choices (Xie and Shauman 2005, pp. 57–60).

As regards policy implications, many interventions to reduce GSHE take the form of educational and counseling activities that try to enhance girls' interest in STEM fields of study and the related occupations, or to raise awareness of gender stereotypes among students, teachers, and parents in high school (Timmers et al. 2010). The risk is that these interventions arrive too late if they target only university choices and leave gendered curricular choices unchanged.