Bandura (1977, 1982, 1986) defined self-efficacy as the perceived ability of an individual to succeed at or accomplish certain tasks. Academic self-efficacy is essential to academic success (Lent et al. 1984, 1986, 1987). The criterion-related validity of academic self-efficacy has been documented by several primary studies and one meta-analysis. Multon et al. (1991) analyzed 36 studies that examined the relation of academic self-efficacy with performance and persistence and identified a mean correlation of r = 0.38 for performance and r = 0.34 for persistence. Given the importance of academic self-efficacy to academic achievement and persistence, determining whether academic self-efficacy is associated with other important variables is worthwhile.

Gender differences in academic self-efficacy have been investigated extensively in recent decades. Although many researchers have examined gender differences in academic self-efficacy, findings have been inconsistent. As conventional narrative reviews (Pajares 2002, 2003) may be influenced by subjectivity and bias, meta-analysis that quantitatively summarizes studies is an alternative to a narrative review. In a meta-analysis of US and Canadian participants, Whitley (1997) examined gender differences in computer self-efficacy and found that the mean effect size was weak to moderate. The major limitation of this meta-analysis was its narrow scope, as it focused specifically on computer self-efficacy and ignored many relevant studies conducted outside North America. Generally, Eastern cultures emphasize collectivism, while Western cultures prioritize individualism. Cultural differences in self-efficacy were identified by Scholz et al. (2002), who determined whether general self-efficacy is a universal construct for 25 countries. They demonstrated that participants from collective cultures, such as those in Japan and Hong Kong, had low self-efficacy. As culture may play a role in the determination of academic self-efficacy, this meta-analysis is an extension of the meta-analysis by Whitley (1997) of gender differences in computer self-efficacy. Specifically, gender differences in all major components of academic self-efficacy were examined in individualistic and collective cultures. Consequently, the principal goal of this study is to provide insight into average gender differences in academic self-efficacy via a meta-analysis. Such a synthesis can determine the generalizability of findings by individual studies and be utilized as reference points when examining gender differences in academic self-efficacy. Academic self-efficacy can be assessed in various contexts based on its subject area, participant age, and culture. 
Coinciding with this purpose is a specific examination of gender differences in academic self-efficacy given different subject areas, participant ages, cultures, and study features. Elucidating the magnitude of gender differences in academic self-efficacy across learning contexts may also clarify performance differences between males and females and help improve the academic self-efficacy of males and females in ways that promote academic success.

Structure and conceptualization of academic self-efficacy

Self-efficacy is an important construct of achievement motivation. According to the social cognitive theory of self-efficacy (Bandura 1977, 1982, 1986), performance accomplishments, vicarious learning, verbal persuasion, and physiological state are the four factors that determine an individual’s self-efficacy and affect an individual’s choice about whether to engage in a specific task and persist to complete a specific task (Bandura 1977, 1982, 1986). Performance accomplishment is the most significant determinant of an individual’s self-efficacy. For instance, students experiencing success in completing a task have a high self-efficacy for that task. Experiences of failure typically undermine this self-efficacy unless they are attributed to a lack of effort or poor strategies. For individuals lacking experience, observing others performing tasks affects their self-efficacy. Individuals use such information to assess their likelihood of completing a specific task. Verbal and nonverbal feedback from others has a relatively weaker effect than performance accomplishment and vicarious learning on individual self-efficacy. Persuaders may attempt to elevate an individual’s self-efficacy. Finally, physiological states, such as anxiety or tension, are determinants of an individual’s ability to complete a specific task. Generally, a negative physical state (i.e., high anxiety level) is associated with poor outcomes and low self-efficacy (Bandura 1977, 1982, 1986).

As self-efficacy and self-concept are two important components of self-beliefs, their conceptual similarity and dissimilarity must be clarified. Pajares and Miller (1994) claimed that self-efficacy denotes an individual’s perceived ability to complete a specific task. Consequently, self-efficacy is directly related to a task, context, or situation. Self-concept is a more general and global assessment of self-attitudes than self-efficacy. Self-concept can be domain-specific but not task-specific. The determination of self-worth is typically based on social comparisons, while the assessment of self-efficacy is related to specific tasks. That is, self-concept is determined based on an external reference, while self-efficacy is based on an internal reference (Marsh et al. 1991).

The similarity and difference between self-concept and self-efficacy are useful during meta-analyses of self-beliefs (Valentine et al. 2004). If replicable findings are obtained for different self-beliefs, the generalizability of self-terms is high. Although numerous meta-analyses have examined gender differences in self-esteem, meta-analyses of gender differences in all key self-efficacy components are lacking. Indeed, findings obtained by individual studies for the relationship between gender and self-efficacy are mixed. Given this inconsistency for the role of gender differences in academic self-efficacy, summarizing research in this area is necessary.

Previous review of gender differences in self-efficacy

Narrative reviews have been conducted regarding gender differences in academic settings. Pajares (2005), who summarized research on gender differences in math self-efficacy, reached four major conclusions. First, most studies indicated that male students had higher mathematics self-efficacy than females, while other studies did not. This inconsistency was related to variables used in regression equations. Second, gender differences in mathematics self-efficacy typically develop during middle school and increase as student age increases. Third, female students do not have higher mathematics self-efficacy than male students at any educational level. Finally, male students typically have higher mathematics self-efficacy than females, even when males and females have comparable achievement levels or when females outperform males. The pattern of gender differences in writing self-efficacy differs from that in mathematics self-efficacy. Pajares (2003) reviewed literature on gender differences in writing self-efficacy, concluding that females generally have higher writing self-efficacy than males during middle school; this gender gap disappears or reverses as students age. For gender differences in self-efficacy for self-regulated learning, Pajares (2002) proposed that female students were generally more confident than male students.

In a meta-analysis of 82 studies with 104 effect sizes based on 40,491 US and Canadian participants, Whitley (1997) examined gender differences in computer self-efficacy and obtained a mean effect size of d = 0.41, indicating that average computer self-efficacy for males was 0.41 standard deviations above the average computer self-efficacy for females. Significant heterogeneity among effect size estimates was a function of participant age. High school students had a higher mean effect size than college and elementary school students. Furthermore, adult and college students had higher mean effect sizes than elementary school students. That is, the gender gap in computer self-efficacy was largest among high school students, smaller among college students, and smallest among elementary school students. As gender differences existed in computer self-efficacy, they may also exist in academic self-efficacy; thus, hypothesis 1 is that gender differences exist in academic self-efficacy.

Moderators of gender differences in academic self-efficacy

Literature on gender differences in academic self-efficacy suggests that the magnitude of effect size may vary as a function of the educational setting in which academic self-efficacy is measured. Since this study analyzes a large number of studies, the aim is to identify settings of academic self-efficacy in which gender differences are large. Moderator variables were chosen based on self-efficacy theory and empirical findings, including subject area, participant age, culture, and other study features, such as publication status, of analyzed studies.

Subject area

Researchers have suggested that patterns of gender difference in academic self-efficacy vary among domains. Britner and Pajares (2001), who examined science self-efficacy for 272 grade 7 students, found that girls had higher science self-efficacy and self-efficacy in self-regulated learning than boys. Anderman and Young (1994) did not find significant gender differences in science self-efficacy between boys and girls (mean age, 11.5). Cassidy and Eachus (2002), who investigated computer self-efficacy for 94 males and 113 females, found that males had higher computer self-efficacy than females. Similarly, in an investigation of Taiwanese students, Chou (2001) reported that gender differences in computer self-efficacy favored grade 10 boys. Gender differences in computer self-efficacy favoring males were also reported by Coffin and MacIntyre (1999), Miura (1987), and Qutami and Abu-Jaber (1997). Peng et al. (2006) reported mixed results for gender differences in computer self-efficacy. They surveyed 1,417 Taiwanese college students and found that no gender differences exist in their beliefs of their ability to use the Internet; however, significant gender differences favored males in beliefs about their ability to use the Internet for communication.

Friedel et al. (2007) investigated mathematics self-efficacy of 1,021 grade 7 students and found that no gender differences existed. In a longitudinal investigation of mathematics self-efficacy for a cohort of children in grades 5–7, Kenney-Benson et al. (2006) identified no significant gender differences in two waves of data. Furthermore, no gender differences in mathematics self-efficacy were identified in several studies (O’Brien et al. 1999; Pajares and Kranzler 1995). Conversely, Hackett (1985) identified significant gender differences favoring males in a study of 262 undergraduate students. Several researchers (Lapan et al. 1996; Matsui et al. 1990; Pajares and Miller 1994; Randhawa et al. 1993; Wang 2003) have also identified gender differences in mathematics self-efficacy favoring males.

In terms of writing self-efficacy, Pajares and Valiante (1996) examined the self-efficacy of 218 grade 5 students using a group-administered measure and found that significant gender differences favored girls. Similar analytical results favoring girls were identified by Stang (2001). However, Pajares and Johnson (1995) failed to observe gender differences in writing self-efficacy in a survey of grade 9 students. A review of self-concept research indicated that gender differences existed in various domains. Wilgenbusch and Merrell (1999), who analyzed 22 studies measuring multidimensional self-concept, identified gender differences that were consistent with gender stereotypes. As d = 0.28, average mathematics self-concept for males was 0.28 standard deviations above the average mathematics self-concept for females. Conversely, females had higher verbal self-concept than males, with d = −0.23. As the effect of subject area on gender differences in self-concept was noted in previous meta-analyses, hypothesis 2 is proposed:

  1. Hypothesis 2:

    Subject area significantly predicts variation of gender differences in academic self-efficacy.

Age

Self-belief theorists have argued that self-beliefs change across life stages. For example, Goetz et al. (2010) demonstrated that the domain-specific self-concepts of young children are less distinct than those of relatively older children, adolescents, and adults. However, findings for the effect of age on academic self-efficacy are inconsistent. For example, Liew et al. (2008) found that academic self-efficacy changed little from grade 1 to grade 2. On the other hand, Caprara et al. (2008) used a six-wave design to examine the development of self-regulatory efficacy in a sample of 412 students aged 12 at study inception, with a 1-year interval between measurements. They demonstrated that self-regulatory efficacy declined progressively. Some cross-sectional research has demonstrated that age moderates gender differences in academic self-efficacy. Hunter et al. (2005) analyzed speaking and listening self-efficacy of 577 grade 5, 594 grade 8, and 556 grade 11 Canadian students using a five-item questionnaire. Gender differences were moderated by age. The beliefs of both boys and girls in their abilities as effective listeners increased with age. For the remaining items, female self-efficacy declined from grade 5 to grade 8 and then returned to near its original level in grade 11. In contrast, Zimmerman and Martinez-Pons (1990), who examined the development of academic self-efficacy for 30 grade 5, 30 grade 8, and 30 grade 11 gifted students, failed to find evidence of a significant interaction between gender and grade. Lloyd et al. (2005) also found no significant interaction between gender and age. Because findings on the effect of age on gender differences in academic self-efficacy are inconsistent, the moderating effect of participant age should be considered. Thus, hypothesis 3 is as follows:

  1. Hypothesis 3:

    Participant age significantly predicts variation in gender differences in academic self-efficacy.

Culture

Predictions based on culture should stem from the knowledge of cultural differences in academic self-efficacy. For instance, personal achievement is emphasized in individualist cultures (Hofstede 1984), whereas common interest is emphasized in collective cultures (Hui and Triandis 1986). That is, self-efficacy may have different meanings across cultures. Scholz et al. (2002) compared the psychometric properties of the general self-efficacy scale for 19,120 participants from 25 countries. Generally, participants from collective cultures had relatively low general self-efficacy. Cultural differences in academic self-efficacy were also identified. For example, Kim and Park (2006) suggested that US students perceived their ability as high, even though they did poorly in mathematics and the sciences. Conversely, students from East Asia, where collectivism is emphasized, tended to have lower academic confidence than students from individualist cultures. To test cultural differences in math self-efficacy, Lee (2009) used exploratory and confirmatory factor analyses to examine student data obtained by the Programme for International Student Assessment 2003 project. Lee found that participants from collective cultures, such as those from Japan and Korea, had low math self-efficacy despite high scores on math tests. Further, Kling et al. (1999) performed a meta-analysis of 216 effect sizes and found that the country effect on gender differences in self-esteem was significant (d = 0.17 for American participants; 0.24 for Australian, Canadian, and Norwegian participants; and 0.31 for other countries). As empirical studies support cultural differences in academic self-efficacy and a significant country effect on gender differences in self-esteem was found in a previous meta-analysis, investigating a possible culture effect on gender differences in academic self-efficacy is worthwhile. Thus, hypothesis 4 is proposed:

  1. Hypothesis 4:

    Culture significantly predicts variation of gender differences in academic self-efficacy.

Publication status

Meta-analysts are often concerned about the file-drawer problem, which refers to the tendency for studies with non-significant results to remain unpublished (Sutton 2009). Thus, a meta-analysis of only published data may overstate the strength of relationships among variables. Gender differences in academic self-efficacy may vary as a function of study publication status; hence, hypothesis 5 is proposed:

  1. Hypothesis 5:

    Publication status significantly predicts variation of gender differences in academic self-efficacy.

Variance analysis

Interpretations of mean differences can be erroneous when the variances of two groups are unequal. Feingold (1992) indicated that when both gender groups have the same mean academic self-efficacy but different variability, the gender with the higher variability will be overrepresented among individuals with extremely high and extremely low academic self-efficacy. When the gender groups differ in both mean and variability, the ratio of males to females differs between the low and high ends of the academic self-efficacy distribution. As such, gender differences at low and high levels of academic self-efficacy will differ from the effect size in a given study. For instance, if females have a higher mean academic self-efficacy and higher variability than males, females are overrepresented among individuals with high academic self-efficacy. The gender difference among individuals with high academic self-efficacy will then exceed the effect size in that study. Conversely, the gender difference among individuals with low academic self-efficacy will be smaller than the effect size for that study. In contrast, when the gender with the higher mean academic self-efficacy has lower variability, the gender difference among individuals with low academic self-efficacy will exceed the effect size in that study, and the gender difference among individuals with high academic self-efficacy will be smaller than the effect size. Since interpretations of mean-level differences depend on whether equality of variance holds, one must test for gender differences in the variability of academic self-efficacy. Kling et al. (1999) examined gender differences in variability in self-esteem for 174 samples and found no significant difference in variance between males and females. Whether the variance of self-efficacy is the same for males and females has not been established by integrating analytical results across studies; thus, a quantitative assessment is worthwhile.
Based on the equality of variance in previous meta-analyses, hypothesis 6 is proposed:

  1. Hypothesis 6:

    No gender differences exist in variance of academic self-efficacy between males and females.
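The variance argument attributed to Feingold (1992) can be illustrated numerically. The following is a minimal sketch that assumes normally distributed scores, an assumption made here only for illustration; the function name `tail_ratio` and all input values are hypothetical and are not taken from any study in the meta-analysis.

```python
from statistics import NormalDist

def tail_ratio(mean_m, sd_m, mean_f, sd_f, cutoff, upper=True):
    """Male-to-female ratio of the proportion scoring beyond `cutoff`.

    Assumes normal score distributions (illustrative assumption only).
    """
    dist_m = NormalDist(mean_m, sd_m)
    dist_f = NormalDist(mean_f, sd_f)
    if upper:
        return (1 - dist_m.cdf(cutoff)) / (1 - dist_f.cdf(cutoff))
    return dist_m.cdf(cutoff) / dist_f.cdf(cutoff)

# Hypothetical case 1: equal means, male SD 10% larger than female SD.
high_equal_means = tail_ratio(0.0, 1.1, 0.0, 1.0, cutoff=2.0)
low_equal_means = tail_ratio(0.0, 1.1, 0.0, 1.0, cutoff=-2.0, upper=False)

# Hypothetical case 2: males higher in both mean (by 0.1) and variability.
high_both = tail_ratio(0.1, 1.1, 0.0, 1.0, cutoff=2.0)
low_both = tail_ratio(0.1, 1.1, 0.0, 1.0, cutoff=-2.0, upper=False)

print(high_equal_means, low_equal_means, high_both, low_both)
```

In the first case the more variable group is overrepresented equally in both tails; in the second, the overrepresentation is larger in the high tail than in the low tail, which is why a single mean-level effect size can misstate gender differences at the extremes.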

Method

Literature search

A computerized search of the ERIC and ProQuest Dissertations and Theses databases was performed using possible combinations of the keywords self-efficacy, gender, and sex to identify studies published through February 2008. Studies had to meet three inclusion criteria. First, the study had to include a measure of academic self-efficacy or a measure of domain-specific efficacy (for example, math, science, or computer). Studies examining general self-efficacy, career/vocational self-efficacy, health self-efficacy, or family or gender role self-efficacy were excluded. Second, to allow computation of the average effect size, the study had to report the sample size. Finally, the study had to be published in English. The search yielded 2,102 hits, and 998 studies were retrieved for further review based on their titles, keywords, and abstracts. The included studies are listed in “Appendix 1.”

Coding

Besides information used to calculate effect sizes (g and variance ratio), weights (numbers of female and male participants), and direction of the difference between the academic self-efficacy scores of female and male students, the following information was also coded for each study: (a) domain of self-efficacy measure, (b) mean sample age, (c) country where the study was conducted, and (d) publication status.

Domain of self-efficacy measure

The self-efficacy measure domain was coded as one of seven categories: language arts, mathematics, science, social sciences, computers, general academics, and others. When the category of others was chosen, the domain of self-efficacy was specified.

Mean participant age

Participant mean age was recorded. When participant grade level was reported, 5 years were added to the grade level to obtain an estimate of mean age (for example, grade 7 corresponds to an estimated mean age of 12 years).

Country where the study was conducted

The country where the study was conducted was specified.

Publication status

Publication status was coded as journal article, dissertation, thesis, or conference paper.

The author and one former student of the author served as coders in this study. To achieve a high level of agreement, a coder training manual, reference guide, and coding sheet were developed by the author. At each training meeting, each coder used the reference guide and manual to independently code 5 articles. Coding problems were discussed and changes to coding sheets were made accordingly. After the initial training meeting, each coder independently coded all studies. Discrepancies between coders were resolved through discussion. In the course of these conversations, the coding schema was revised. For categorical variables, inter-rater agreement exceeded 88% for all coding categories (domain of academic self-efficacy, country where the study was conducted, and publication status). For continuous variables (mean sample age and sample sizes for males and females), inter-coder reliability exceeded 0.89 in all cases.
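As an illustration of how agreement on a categorical coding variable might be quantified, the following minimal sketch computes simple percent agreement between two coders. The source does not state which agreement index was used, so treating it as simple percent agreement is an assumption; the function name `percent_agreement` and the example codes are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items to which two coders assigned the same category.

    Assumption: simple percent agreement (no chance correction).
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical domain codes assigned by two coders to five studies.
coder_1 = ["mathematics", "science", "computers", "mathematics", "language arts"]
coder_2 = ["mathematics", "science", "computers", "general academics", "language arts"]

agreement = percent_agreement(coder_1, coder_2)
print(agreement)
```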

Analyses

The effect size used in this study was Hedges’ g (Hedges and Olkin 1985), computed by subtracting the female mean from the male mean and then dividing by the pooled standard deviation of both groups. That is,

$$ g = \frac{{M_{{\text{m}}} - M_{{\text{f}}} }}{{S_{{\text{p}}} }} $$

where $M_{\text{m}}$ is the mean for males, $M_{\text{f}}$ is the mean for females, and $S_{\text{p}}$ is the pooled standard deviation for males and females. Test statistics such as t values, F values, and p values were converted to g with conversion formulas (Rosenthal 1994). Positive values of g indicate that males had higher academic self-efficacy than females, while negative values indicate that females had higher academic self-efficacy than males. Most of the effect sizes were computed from means and standard deviations presented in primary studies, while some were converted from r, t, or univariate F statistics. All g values were corrected for overestimation of the population effect size, which occurs especially in small samples, using the formula provided by Hedges and Olkin (1985). In other words, adjusted g was obtained by multiplying unadjusted g by 1 − 3/(4n − 9), where n is the total sample size. Then, weighted mean effect sizes were computed to estimate the average effect sizes. Specifically, each effect size was weighted by the inverse of its variance. The sum of these products was then divided by the sum of the inverse variances to obtain the weighted mean. The significance of the mean effect size was tested by computing a 95% confidence interval. If the 95% confidence interval includes 0, the mean effect size is not significantly different from zero. Otherwise, it is significantly different from zero.
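The computations described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, the sampling-variance formula is the common large-sample approximation (the source does not state which formula was used), and the confidence interval uses the normal critical value 1.96.

```python
import math

def hedges_g(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Bias-corrected standardized mean difference (male minus female)."""
    pooled_var = ((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2)
    g_unadjusted = (mean_m - mean_f) / math.sqrt(pooled_var)
    n_total = n_m + n_f
    correction = 1 - 3 / (4 * n_total - 9)  # small-sample correction factor
    return g_unadjusted * correction

def g_variance(g, n_m, n_f):
    """Assumed large-sample approximation to the sampling variance of g."""
    return (n_m + n_f) / (n_m * n_f) + g ** 2 / (2 * (n_m + n_f))

def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean with a 95% confidence interval."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical study: males M = 10, SD = 2, n = 50; females M = 9, SD = 2, n = 50.
g = hedges_g(10.0, 2.0, 50, 9.0, 2.0, 50)
v = g_variance(g, 50, 50)

# Hypothetical pair of effect sizes with equal variances.
mean_g, (ci_low, ci_high) = weighted_mean_effect([0.2, 0.4], [0.01, 0.01])
print(g, v, mean_g, ci_low, ci_high)
```

A positive g means the male mean is higher, and a confidence interval that excludes 0 corresponds to a mean effect size that is significantly different from zero.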

Under fixed-effects assumptions, all studies are assumed to share the same true effect size, and variation in observed effect sizes is attributed to sampling error. Because this assumption is implausible, the random-effects model, which attributes variation in effect sizes to both sampling error and a random between-study component, was used. The homogeneity of effect sizes was tested with Q, which is distributed approximately as χ² with k − 1 degrees of freedom, where k is the number of effect sizes. A significant Q indicates heterogeneity among effect sizes, in which case moderators are introduced to explain the variability.
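The homogeneity statistic and a random-effects mean can be sketched as follows. The source does not name the estimator of the between-study variance, so the DerSimonian-Laird estimator used here is an assumption chosen because it is widely used; all function names and data are hypothetical.

```python
def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted mean under the fixed-effects model."""
    weights = [1.0 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)

def q_statistic(effects, variances):
    """Homogeneity statistic Q; compared with chi-square on k - 1 df."""
    mean = fixed_effect_mean(effects, variances)
    return sum((g - mean) ** 2 / v for g, v in zip(effects, variances))

def dl_tau_squared(effects, variances):
    """Between-study variance (DerSimonian-Laird; an assumed estimator)."""
    weights = [1.0 / v for v in variances]
    q = q_statistic(effects, variances)
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)

def random_effects_mean(effects, variances):
    """Weighted mean with weights 1 / (sampling variance + tau squared)."""
    tau2 = dl_tau_squared(effects, variances)
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)

# Hypothetical effect sizes with equal sampling variances.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
q = q_statistic(effects, variances)
tau2 = dl_tau_squared(effects, variances)
re_mean = random_effects_mean(effects, variances)
print(q, tau2, re_mean)
```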

The variance ratio was computed by dividing the male variance by the female variance. A variance ratio exceeding 1 indicates higher variability among males than females. Conversely, a variance ratio smaller than 1 demonstrates higher variability among females than males. Finally, a variance ratio of 1 indicates equal variability in males and females. Because a ratio is asymmetric around 1, arbitrarily placing the male variance in the numerator overstates male variability when raw ratios are averaged (Katzman and Alliger 1992; Kling et al. 1999; Shaffer 1992); therefore, variance ratios were log-transformed before the weighted mean was calculated.
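The effect of the log transformation can be seen in a small sketch. The weighting scheme is not specified in the source, so the weights below (hypothetical total sample sizes) are an assumption, as is the function name `mean_variance_ratio`.

```python
import math

def mean_variance_ratio(variance_ratios, weights):
    """Weighted mean of variance ratios, averaged on the log scale."""
    log_ratios = [math.log(vr) for vr in variance_ratios]
    mean_log = sum(w * lr for w, lr in zip(weights, log_ratios)) / sum(weights)
    return math.exp(mean_log)  # back-transform to the ratio scale

# Two hypothetical samples: males twice as variable in one,
# females twice as variable in the other.
ratios = [2.0, 0.5]
weights = [100, 100]  # assumed weights, e.g. total sample sizes

log_scale_mean = mean_variance_ratio(ratios, weights)
raw_mean = sum(w * vr for w, vr in zip(weights, ratios)) / sum(weights)
print(log_scale_mean, raw_mean)
```

Averaging the raw ratios gives 1.25, which wrongly suggests greater male variability, whereas averaging on the log scale returns a ratio of 1.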

Independence

Multiple effect sizes are considered non-independent when they are from the same participant sample. The most common situation is when multiple effect sizes were obtained from participant responses to different domains of self-efficacy. When a study reported multiple effect sizes for different subject areas from the same sample, they were averaged to form a single effect size representing that study. In analyzing the moderating effect of subject area on gender differences in academic self-efficacy, however, multiple effect sizes were treated as independent.
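The averaging rule for dependent effect sizes can be sketched as follows. The record layout, sample identifiers, and values are hypothetical; a simple unweighted average within each sample is assumed because the source does not specify otherwise.

```python
from collections import defaultdict

def average_within_sample(records):
    """Collapse (sample_id, domain, g) records to one mean g per sample."""
    grouped = defaultdict(list)
    for sample_id, _domain, g in records:
        grouped[sample_id].append(g)
    return {sample_id: sum(gs) / len(gs) for sample_id, gs in grouped.items()}

# Hypothetical records: sample "A" reports two domains, sample "B" one.
records = [
    ("A", "mathematics", 0.2),
    ("A", "science", 0.4),
    ("B", "computers", 0.5),
]

independent_effects = average_within_sample(records)  # for the overall mean
domain_effects = [(domain, g) for _sid, domain, g in records]  # for the subject-area analysis
print(independent_effects, len(domain_effects))
```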

Results

Outlier analysis

Two outlier analyses were performed to examine whether the mean effect size was robust after excluding extreme effect sizes and sample sizes. For the 247 independent effect sizes, the mean was g = 0.08 with a 95% confidence interval of 0.03 to 0.12. Because this interval excludes zero, the mean effect size was significantly different from zero. Three potential extreme values, one extremely high (1.40) and two extremely low (−1.26 and −1.60), were analyzed to determine their effect on the weighted mean effect size. When these extreme values were excluded one at a time, the mean effect size was comparable, and thus, these studies were included in further analysis. For the outlier analysis of sample sizes, when the study with sample size n = 5,455 was excluded, the mean effect size was 0.08. When the study with sample size n = 4,018 was excluded, the mean effect size was again 0.08. Since these two studies did not unduly affect the magnitude of the mean effect size, they were also retained.
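A leave-one-out check of this kind can be sketched as follows. This is an illustration under stated assumptions: inverse-variance weights are used, the function names are hypothetical, and the data are invented rather than drawn from the 247 samples.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean effect size."""
    weights = [1.0 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances):
    """Weighted mean recomputed with each effect size excluded in turn."""
    means = []
    for i in range(len(effects)):
        rest_g = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        means.append(weighted_mean(rest_g, rest_v))
    return means

# Hypothetical data: the last effect size is extreme.
effects = [0.1, 0.1, 1.4]
variances = [0.01, 0.01, 0.01]

overall = weighted_mean(effects, variances)
without_each = leave_one_out(effects, variances)
print(overall, without_each)
```

If the recomputed means stay close to the overall mean, as reported in the text, the extreme values can be retained.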

Study characteristics

One hundred eighty-seven studies yielded 247 independent samples. Of these, 27 studies yielded multiple independent samples. Specifically, 15 studies contained 2 data sets, 4 contained 3 data sets, 3 contained 4 data sets, 3 contained 5 data sets, 1 contained 6 data sets, and 1 contained 12 data sets. “Appendix 2” lists the sample size for males and females, mean age, country where the study was conducted, domain of academic self-efficacy, variance ratio, and effect size for each study included in the meta-analysis. Of the 247 samples, 64 came from journal articles, 166 from doctoral dissertations, 2 from master’s theses, and 15 from conference papers.

The country where the research was conducted was unavailable in four studies. In the remaining studies, the majority of the samples were conducted in the USA (N = 201), 14 were conducted in Taiwan, 9 in Canada, 5 in Australia, 3 in Israel, and 2 in Japan. China, Greece, India, Malaysia, Norway, Sultan, Sweden, Turkey, and UK each accounted for one sample. Mean participant age was available in 235 studies. Three studies measured academic self-efficacy longitudinally over a 1-year interval. Specifically, academic self-efficacy was first measured when participants were 10 years old then re-measured at 11 years old in Anderman (1994). In Graham (2000), participants were measured twice at 11, 12, and 13 years old. Meanwhile, in Scott (2000), participants were measured at both 12 and 17 years old. For the remaining studies (k = 232), the mean participant age was 16.61 years old. The total number of participants was 68,429 (32,666 males and 35,763 females).

Coding multiple effect sizes for various subject areas from the same participant sample yielded 269 effect sizes. Of these, 34 studies focused on language arts self-efficacy, 78 on mathematics self-efficacy, 25 on science self-efficacy, 5 on social sciences self-efficacy, 53 on computer self-efficacy, 55 on general academic self-efficacy, and 16 for others (e.g., statistics). Three of 269 effect sizes involved multiple domains of self-efficacy. One effect size involved both language arts and social science self-efficacy, and two effect sizes involved math and science self-efficacy:

  1. Hypothesis 1:

    The existence of gender differences in academic self-efficacy.

The effect sizes ranged from −1.60 to 1.40, with mean g = 0.08. These findings indicate that the average academic self-efficacy for males is 0.08 standard deviations above the average academic self-efficacy for females. To address the issue of whether all 247 effect sizes estimate the same population parameter, a homogeneity test was conducted. The meta-analytic model fit statistic was Q = 300.00, p < 0.05. As the hypothesis of homogeneity was rejected, moderator analyses were introduced to explain the systematic variability in effect sizes.
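The reported significance of Q can be checked against the χ² distribution with k − 1 = 246 degrees of freedom. The sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom, which avoids any external library; the function name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; exact for even df.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = math.exp(-half)  # i = 0 term of the sum
    total = term
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return total

k = 247  # number of independent effect sizes
p_value = chi2_sf_even_df(300.00, k - 1)
print(p_value)
```

The resulting probability falls below 0.05, consistent with the rejection of homogeneity reported above.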

Moderator analysis

  1. Hypothesis 2:

    The existence of subject area effect on gender differences in academic self-efficacy.

Table 1 lists the moderator analyses for domain, age, culture, and publication status. Multiple effect sizes were coded when studies assessed multiple components of academic self-efficacy. The mean effect sizes for language arts, mathematics, social science, and computer self-efficacy differed significantly from 0. For language arts self-efficacy, the mean effect size was g = −0.16, indicating higher female language arts self-efficacy. For mathematics self-efficacy, the mean effect size was g = 0.18, indicating higher male mathematics self-efficacy. Higher male than female self-efficacy was also observed for computer self-efficacy. These findings are consistent with gender stereotypes. Although males exhibited higher social science self-efficacy, this finding was based on only five data points, and therefore, caution is necessary in interpreting this result.

Table 1 Moderator analyses for variation of gender differences in academic self-efficacy
  1. Hypothesis 3:

    The existence of age effect on gender differences in academic self-efficacy.

The mean age of the samples was classified based on school levels and categorized into the following age groups: 6–10 (elementary school), 11–14 (middle school), 15–18 (high school), 19–22 (college), and over 23 years old. As shown in Table 1, the effect sizes for the age groups of 15–18 and over 23 years old were significantly different from zero. The 95% confidence intervals for the age groups of 6–10, 11–14, and 19–22 years old included zero, indicating no gender differences in academic self-efficacy for these age groups. The largest effect size occurred for the group aged over 23 years old, but at 0.23 it was small according to the guidelines of Cohen (1988). The between-groups homogeneity statistic was non-significant, $Q_{\text{B}}$ = 9.22, p = 0.06. As the mean effect sizes appeared to increase with age, weighted regression analysis using age as a continuous variable was employed for the hypothesis testing. The regression coefficient b = 0.01 (p < 0.01) indicates that each 1-year increase in mean age is associated with an average increase of 0.01 in effect size. Mean participant age explained 3.17% of the variance among the effect sizes. One note of caution is that this proportion of explained variance is an underestimate (Aloe et al. 2010). Some variability in effect sizes is due to random sampling error at the study level and cannot be explained by any moderator. Hence, 100% of the variation in meta-analytic data is never theoretically explainable. Because total variance in effect sizes includes random sampling error, the ratio of explained variance to total variance underestimates the proportion of explainable variance that is accounted for (Aloe et al. 2010).
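A weighted regression of effect size on mean age can be sketched with the closed-form weighted least-squares solution for a single predictor. The source does not describe the weights or software used, so the weights, data, and function name below are hypothetical; the data are constructed so that the slope equals the reported coefficient of 0.01.

```python
def weighted_regression(ages, effects, weights):
    """Weighted least-squares slope and intercept for one predictor."""
    total_w = sum(weights)
    mean_age = sum(w * a for w, a in zip(weights, ages)) / total_w
    mean_g = sum(w * g for w, g in zip(weights, effects)) / total_w
    sxy = sum(w * (a - mean_age) * (g - mean_g)
              for w, a, g in zip(weights, ages, effects))
    sxx = sum(w * (a - mean_age) ** 2 for w, a in zip(weights, ages))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_age
    return slope, intercept

# Hypothetical samples lying exactly on g = -0.08 + 0.01 * age.
ages = [8.0, 12.0, 16.0, 20.0, 25.0]
effects = [-0.08 + 0.01 * a for a in ages]
weights = [50.0, 100.0, 200.0, 150.0, 80.0]  # assumed inverse-variance weights

slope, intercept = weighted_regression(ages, effects, weights)
predicted_change_10_years = 10 * slope
print(slope, intercept, predicted_change_10_years)
```

With a slope of 0.01 per year, two samples that differ by 10 years in mean age are expected to differ by about 0.10 in effect size.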

The age analyses presented thus far pool the effect sizes across different domains of academic self-efficacy. Because mathematics self-efficacy was measured in numerous samples (k = 78), the age effect on mathematics self-efficacy was tested. Of these 78 samples, the mean sample age was available for 74. The results of the analysis of mathematics self-efficacy by age group are presented in Table 2. The five age groups significantly explained the variation in effect sizes, Q B = 12.97, p = 0.01. The 95% confidence intervals for the groups of 6–10 and 11–14 years old included 0, indicating no gender differences in mathematics self-efficacy for the two youngest age groups. The mean effect sizes for the age groups of 15–18, 19–22, and over 23 years old were 0.20, 0.36, and 0.33, respectively. Thus, the age effect for mathematics self-efficacy showed relatively large mean effect sizes in the older age groups.
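
The p values attached to the two between-groups statistics for age can be checked with a short sketch. It assumes, following the usual convention, that Q B is referred to a chi-square distribution with degrees of freedom equal to the number of groups minus one (here 5 − 1 = 4). The helper `chi2_sf_even_df` is a hypothetical name and handles only even degrees of freedom, where the upper-tail probability has a closed form.

```python
import math


def chi2_sf_even_df(q, df):
    """Upper-tail probability of a chi-square variable for even df.

    For df = 2m the survival function has the closed form
    exp(-q/2) * sum_{i=0}^{m-1} (q/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = q / 2.0
    term = 1.0   # (q/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


n_groups = 5
df = n_groups - 1  # assumed degrees of freedom for Q B

p_all_domains = chi2_sf_even_df(9.22, df)   # pooled academic self-efficacy
p_mathematics = chi2_sf_even_df(12.97, df)  # mathematics self-efficacy only

print(f"Q B = 9.22:  p = {p_all_domains:.3f}")
print(f"Q B = 12.97: p = {p_mathematics:.3f}")
```

Under this assumption the two values round to 0.06 and 0.01, matching the reported p values.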

Table 2 Magnitude of gender differences in math self-efficacy as a function of age group
  1. Hypothesis 4:

    The existence of culture effect on gender differences in academic self-efficacy.

Cultures were classified as individualistic or collective. Individualistic cultures included samples from the USA, Canada, Australia, Greece, Sweden, Norway, and the UK, while collective cultures included samples from Taiwan, Japan, China, Malaysia, Turkey, Israel, and India (Morling and Lamoreaux 2008). As shown in Table 1, the effect size based on participants from individualistic cultures was 0.08, indicating that males had higher academic self-efficacy than females, although the effect was small. The effect size for collective cultures was not significantly different from 0. As Q B (1.19) was not significant, the magnitude of gender differences in academic self-efficacy did not differ significantly between individualistic (0.08) and collective (0.07) cultures.

  1. Hypothesis 5:

    The existence of the effect of publication status on gender differences in academic self-efficacy.

Whether the magnitude of gender differences in academic self-efficacy varied as a function of publication status was tested. No evidence supports the effect of publication status on gender differences in academic self-efficacy (Q B = 4.21, p > 0.05). Effect sizes for journal articles and doctoral dissertations differed significantly from 0. However, the effect sizes were also small at 0.10 and 0.08, respectively.

To test for the existence of publication bias, three additional statistical tests were conducted. The correlation between ranks of standardized effect sizes and sample size was computed. Kendall’s rank correlation (τ = −0.03) and Spearman rank correlation (r s = −0.04) were both non-significant (p > 0.05), providing no evidence of publication bias. Rosenthal’s (1991) fail-safe number was estimated to determine the number of missing studies with a mean effect size of 0 that would be needed to reduce the mean effect size from statistical significance to non-significance. To render the mean gender difference in academic self-efficacy non-significant at the 0.05 level, 5,748 additional unpublished studies would be required. Orwin’s (1983) fail-safe number was utilized to estimate the number of missing studies needed to reduce the mean effect size to a specified criterion value. When d = 0.01 was used as the criterion, Orwin’s fail-safe number was 1,881, meaning that 1,881 studies would be needed to bring the mean effect size (d = 0.08) in this meta-analysis to d = 0.01. Both Rosenthal’s and Orwin’s fail-safe numbers exceeded the criterion number (5k + 10 = 1,245, where k = 247 is the number of effect sizes used to estimate the mean effect size; Rosenthal 1991), indicating that publication bias did not threaten the validity of study findings.
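
The arithmetic behind the criterion number and Orwin’s fail-safe number can be sketched as follows. The function names are illustrative, and the Orwin computation assumes that the missing studies have a mean effect size of 0. Because it uses the rounded mean of 0.08, it does not reproduce the reported 1,881 exactly; that figure was presumably computed from an unrounded mean (a mean near 0.086 would give roughly 1,880).

```python
def rosenthal_tolerance(k):
    """Criterion number 5k + 10 against which fail-safe numbers are compared."""
    return 5 * k + 10


def orwin_fail_safe_n(k, mean_effect, criterion_effect, missing_mean=0.0):
    """Number of missing studies (with mean effect `missing_mean`) needed
    to pull the mean of k observed effects down to `criterion_effect`."""
    return k * (mean_effect - criterion_effect) / (criterion_effect - missing_mean)


k = 247                 # number of effect sizes reported in the text
mean_effect = 0.08      # rounded mean effect size reported in the text
criterion_effect = 0.01

tolerance = rosenthal_tolerance(k)
orwin_n = orwin_fail_safe_n(k, mean_effect, criterion_effect)

print(f"criterion number 5k + 10 = {tolerance}")
print(f"Orwin fail-safe number with rounded inputs = {orwin_n:.0f}")
print(f"exceeds criterion: {orwin_n > tolerance}")
```

Even with the rounded input, the resulting number is well above the criterion of 1,245, which is the comparison the conclusion rests on.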

  1. Hypothesis 6:

    Equality of variance between males and females.

Standard deviations for females and males were not available for 42 samples, and therefore, the variance ratios could be computed for 205 independent samples. There was one outlier with a variance ratio of 340.27. This extreme variance ratio may have resulted from the small sample size (N = 12). For the other samples, the variance ratio ranged from 0.12 to 7.76. Because the variance ratios overestimate male variance, the variance ratio was log-transformed to correct this bias. The weighted mean of the log-transformed variance ratios was 0.09, with a 95% confidence interval of −0.01 to 0.03. As the 95% confidence interval included 0, the mean was not significantly different from 0. The weighted mean log-transformed variance ratio of 0.09 corresponds to a variance ratio of 1.02, indicating that the male variance was approximately 102% as large as the female variance. Hence, the difference was negligible.

Conclusions and discussion

This study summarized research on gender differences in academic self-efficacy. Standardized mean differences (N = 68,429) were analyzed in 247 independent samples. The overall effect demonstrates that males have slightly higher academic self-efficacy than females (g = 0.08). Further analysis suggests that content domain was a significant moderator in explaining variation in gender differences in academic self-efficacy. These differences are consistent with gender differences in previous meta-analyses (Wilgenbusch and Merrell 1999; Whitley 1997). Categorical model analyses indicate that gender differences exist in the four domains of academic self-efficacy—language arts, mathematics, computer, and the social sciences. Females had higher language arts self-efficacy than males, while males had higher self-efficacy in mathematics, computer, and the social sciences than females. As the number of studies on gender differences in self-efficacy of the social sciences was insufficient, findings for social sciences self-efficacy should be taken cautiously. Conversely, no gender differences exist in science self-efficacy. Further, the gender difference in global academic self-efficacy was quite small. The hierarchical structure of academic self-efficacy may in part account for the inconsistent findings for academic self-efficacy. Future studies should consider this hierarchy, as noted by self-concept researchers (e.g., Marsh and Craven 2006; Valentine et al. 2004).

Because self-esteem and self-efficacy are important components of self-beliefs, comparing gender differences in self-efficacy with those in self-esteem is reasonable. The mean effect size of 0.08 in this study exceeds the gender difference in self-esteem (r = −0.01) reported by Sahlstein and Allen (2002) and is similar to that in self-esteem (g = −0.08) among adults aged over 60 years reported by Pinquart and Sörensen (2001). The effect in this meta-analysis is smaller than that for self-esteem identified by Kling et al. (1999) and Major et al. (1999). For domain-specific self-efficacy, Whitley (1997) found that the mean gender difference in computer self-efficacy was d = 0.41. However, this synthesis reveals a small effect (g = 0.18) for computer self-efficacy. The smaller effect size for computer self-efficacy in this meta-analysis relative to the previous one may be due to differences in sample characteristics. Whitley (1997) included US and Canadian samples only, whereas 10 of the 53 effect sizes in this meta-analysis were not based on samples from North America. Compared with domain-specific self-concept, gender differences in domain-specific self-efficacy were relatively small. Specifically, gender differences in language arts and mathematics self-efficacy in this study were −0.16 and 0.18, respectively, whereas in the study by Wilgenbusch and Merrell (1999), gender differences for verbal and mathematics self-concept were −0.23 and 0.28, respectively.

Effect size may vary with respondent age. For instance, Pajares (2002) proposed that males and females have similar levels of mathematics self-efficacy during elementary school, while males develop higher mathematics self-efficacy than females by middle school. The age effect was supported by this meta-analysis. Effect sizes for students aged 15–18 and >23 years differed significantly from 0, as the 95% confidence intervals did not include 0. For mathematics self-efficacy, no evidence existed for the emergence of a significant gender difference from childhood to early adolescence; in the groups of students aged 6–10 and 11–14, effect sizes did not differ significantly from 0. Conversely, in all groups of students aged over 14, effect sizes were statistically significant, with males having higher mathematics self-efficacy than females. The finding that males had higher mathematics self-efficacy than females after early adolescence may be explained by age trends in the magnitude of gender differences in mathematics achievement. Hyde et al. (1990), who analyzed 100 studies, found that males had higher mathematics achievement than females during high school and that this difference increased as student age increased. More specifically, females had slightly higher mathematics achievement than males in elementary and middle school (−0.06 and −0.07, respectively). Males had higher mathematics achievement than females in high school, with a mean effect size of d = 0.29, and this difference continued through college, with a mean effect size of d = 0.41, and adulthood, with a mean effect size of d = 0.59. The practical implication is that programs designed to improve the academic self-efficacy of females are needed, especially for adult women. Further, future research should examine gender differences in domain-specific self-efficacy longitudinally to determine whether gender differences increase during different life stages.
Longitudinal methods examining gender differences in academic self-efficacy are also required to identify the effect of such differences on course selection and career choice. Most existing research on gender differences in academic self-efficacy has measured academic self-efficacy at a single time point. The cross-sectional approach provides a snapshot of gender differences in academic self-efficacy, whereas the longitudinal approach can represent developmental trajectory. If gender differences are dynamic, applying longitudinal methods is worthwhile.

People in different cultures may have different patterns of gender differences in academic self-efficacy, resulting from different socialization practices. For instance, both academic and athletic success are emphasized in Western cultures, whereas academic success is the only focus in Asian schools. The hypothesis that culture may be a significant moderator of gender differences in academic self-efficacy was not supported in this study. No significant differences in academic self-efficacy existed between individualistic and collective cultures, likely due to low statistical power and the diversity of countries. As the number of studies from collective cultures was not sufficiently large, statistical power was low. Moreover, because a wide variety of countries was grouped into only two categories, individualistic and collective, culture effects within each category may have canceled out.

Researchers may choose not to report non-significant effect sizes because non-significant outcomes may not be published (Sutton 2009). Therefore, publication status may be a moderating factor for gender differences in academic self-efficacy. Findings obtained by this meta-analysis do not support this contention.

Many researchers (Feingold 1992; Hedges and Friedman 1993; Kling et al. 1999) cautioned that any interpretation of mean-level difference can be misleading when the assumption of equality of variance does not hold. This study tested the assumption of equality of variance of gender groups. Consistent with the finding for self-esteem (Kling et al. 1999), males and females displayed similar variances in terms of academic self-efficacy. Since equality of variance of gender groups holds, comparing means of academic self-efficacy between males and females is valid.

Pajares (1996) argued that mixed findings for self-efficacy may result from an inappropriate measurement of self-efficacy. Some measures, including self-efficacy in mathematical problem-solving (Pajares and Miller 1994), self-efficacy for writing skills (Shell et al. 1989), self-efficacy for performance for division problems (Schunk 1981), and self-efficacy for reading tasks (Shell et al. 1995), were designed to assess task-specific confidence. Self-efficacy for academic achievement (Bandura 1989) assesses domain-specific beliefs. Unfortunately, a comparison of gender differences in academic self-efficacy across scales was not pursued due to the large number of scales and small number of studies using the same scale. Future research should determine whether gender differences in self-efficacy depend on instrument scales. Further, some studies did not report reliability estimates for academic self-efficacy measures. Of the remaining studies, some reported reliability estimates based on subscales, while others provided reliability estimates for the entire scale. As a lack of consistency exists in the manner in which reliability estimates are reported, the effect of a reliability estimate of an academic self-efficacy measure on gender differences in academic self-efficacy was not examined in this study. Future research should address this possibility.

To summarize, gender differences in academic self-efficacy were statistically significant but small. However, these small effects may have practical importance. Lent et al. (1986) and Lent et al. (2005) suggested that academic self-efficacy is a key variable in academic/career choice for both male and female students. Moreover, gender gaps in academic self-efficacy increased with age in this meta-analysis. Consequently, a small effect size during early life may result in differential economic achievement between males and females because of differences in academic self-efficacy, course selection, and career choices. Future research is needed to examine the consequences of this small effect of gender differences in academic self-efficacy on occupational choice and career achievement.

Despite its contributions to the literature, this study has several limitations. First, gender differences in self-efficacy were analyzed in academic contexts but not in the athletic domain. Findings of this meta-analysis may therefore be inapplicable to such a context. Second, this study included only studies measuring academic self-efficacy via self-reported data. Studies that experimentally assessed participant academic self-efficacy were excluded. Findings of this meta-analysis may therefore be inapplicable to state-like academic self-efficacy.