The question of how to promote students’ academic achievement and educational attainment is of utmost concern in educational settings (Denissen et al., 2007; Marsh & Martin, 2011). Academic self-concept (ASC) is considered a pivotal factor that influences a number of educational variables such as academic achievement (Valentine et al., 2004), achievement goals (Niepel et al., 2014), academic interests (Marsh et al., 2005), educational attainment level (Marsh & O’Mara, 2008), and course selections (Marsh & Yeung, 1997).

Over the past few decades, extensive research has investigated the relationship between ASC and achievement. Since the 1970s, three models have been proposed to describe the relationship: the self-enhancement model, the skill-development model, and the reciprocal effects model (REM) (Calsyn & Kenny, 1977; Marsh & Craven, 2006). According to the self-enhancement model, ASC is the determinant of academic achievement, not vice versa. The skill-development model posits that academic achievement predicts ASC but that ASC does not affect achievement (Calsyn & Kenny, 1977). The REM, which reconciles these two either-or views, postulates that early ASC influences later academic achievement and that prior achievement leads to gains in subsequent ASC within domains over time (Marsh & Craven, 2006; Marsh & Martin, 2011). Although previous longitudinal studies supported the REM (e.g., Chen et al., 2013; Möller et al., 2014; Niepel et al., 2014; Sewasew & Schroeders, 2019), this relationship has not been investigated fully from a developmental perspective.

In the present study, we conducted a meta-analysis to systematically integrate previous longitudinal studies on the relationship between ASC and achievement, aiming to provide further evidence for REM and examine the development of the relationship between ASC and achievement from childhood to adolescence. In addition to student age, we investigated other theoretical and methodological moderators that might influence the relationship between ASC and achievement, including student achievement level, types of achievement measurement, domain of measurement, match between domain specificity of ASC and achievement, time lags between measurement waves, variable types, sample types, and publication year.

The Reciprocal Effects Model

Shavelson et al. (1976) broadly defined self-concept as “a person’s self-beliefs formed through experience with and interpretations of his or her environment” (p. 411). In an academic context, ASC has received extensive attention because of its direct relevance to educational settings. It pertains to one’s beliefs about one’s ability in academic domains (Susperreguy et al., 2017) and is typically measured by “being” and “feeling” questions such as “I get good marks in (a subject)” or “I learn things quickly in (a subject)” (see Marsh, 1990; Wigfield et al., 1997). ASC is distinguished from academic self-efficacy, which refers to learners’ convictions that they can perform well in academic tasks to achieve their goals (Bandura, 2010).

While ASC and academic self-efficacy share some commonalities (e.g., being self-related beliefs and predictors of educational outcomes and emotion), three key differences should be kept in mind. (a) The central element: ASC mainly taps individuals’ perceived academic competence based on past accomplishments and circumstances, whereas academic self-efficacy refers to people’s perceived confidence in successfully coping with different tasks in academic situations (Bong & Skaalvik, 2003; Marsh et al., 2019; Pietsch et al., 2003). (b) Composition: more recent studies have shown that ASC can be separated into cognitive and affective components (e.g., Arens et al., 2014; Arens & Hasselhorn, 2015; Kadir et al., 2017; Pinxten et al., 2014). The cognitive component is concerned with students’ sense of competence or the judgement of their own academic ability in a specific subject, as measured by items such as “Are you good at (a subject)” or “Do you get good marks in (a subject)” (Bong & Skaalvik, 2003; Kadir et al., 2017). The affective component taps individuals’ affective-motivational reactions toward the domain with items such as “Are you interested in (a subject)” or “Do you enjoy in (a subject)” (Bong & Skaalvik, 2003; Kadir et al., 2017). In contrast, academic self-efficacy “exclusively taps the cognitive aspect of students’ self-perceptions” (Bong & Skaalvik, 2003, p. 13). (c) The nature of evaluation: the formation of ASC is most heavily influenced by social and dimensional comparison, whereas the level of academic self-efficacy primarily depends on individuals’ previous mastery experience in similar tasks (Bong & Skaalvik, 2003; Pietsch et al., 2003). Given the distinctions between the two constructs, our study focuses on the relationship between ASC and achievement.

The nature of the relationship between ASC and achievement is one of the “thorniest issues” in the field of educational psychology (Pajares & Schunk, 2001) and remains to be resolved. As Marsh and Martin (2011) and Shavelson et al. (1976) mentioned, achievement is the main source of the formation of ASC, so the effect of achievement on ASC needs little further argument. The reciprocal internal/external frame of reference model also proposes that individuals develop and shape their ASC through social, dimensional, and temporal comparisons (Sewasew & Schroeders, 2019). That is, students perceive their domain-specific competence by comparing their achievement in a particular domain (e.g., a school subject) with the achievement of their classmates in the same domain (social comparison; Festinger, 1954), with their achievement in other subjects (dimensional comparison; Möller & Marsh, 2013), and with their own prior achievement in the same domain (temporal comparison; Albert, 1977).

However, does a link from prior ASC to subsequent achievement in fact exist? Self-belief is an important driver of people’s behaviors and thoughts (Graham & Weiner, 1996). It is partly on the basis of beliefs that people choose what challenges to take on, how much effort to expend in the endeavor, and how long to persist in the face of obstacles (Bandura, 2010; Graham & Weiner, 1996). In addition, it has been suggested that students who have a positive view of themselves might strive to align their behavior and performance with their self-image (Sherman, 2013; Sherman & Cohen, 2006). It appears that students with a positive ASC are more motivated to self-regulate (e.g., adopting effective learning strategies, putting in effort, and persisting) to protect the consistency between self-concept and behaviors, and thus achieve academic success in a domain (Marsh, 1990; Marsh & Martin, 2011; Wigfield & Eccles, 2000). Hence, the process linking ASC and achievement is likely to be reciprocal in theory: success in a domain enhances students’ perception of competence and gives them an intrinsic pleasure in the process of mastering the subject. As students work and become more skillful in a domain, their sense of competence for performing well is maintained. In turn, their perceived high competence motivates them to invest more effort to achieve a cascade of success in the future (e.g., Harter, 1978; Marsh & Martin, 2011).

To date, the reciprocal effect of ASC and achievement has been documented in some studies. Based on data from a large project in Germany (N = 1,508), Möller et al. (2011) found that ASC and achievement were reciprocally related in both mathematical and verbal domains. Using two independent samples from an ongoing longitudinal study focusing on students’ self-concept development, Niepel et al. (2014) replicated the REM within the same domain. Sewasew and Schroeders (2019) assessed 16,216 students comprising a nationally representative sample from diverse ethnic and socioeconomic backgrounds and revealed a positive reciprocal relationship between ASC and achievement in the verbal and mathematical domains. Moreover, Valentine et al. (2004) identified 56 longitudinal studies exploring the relationship between self-beliefs (i.e., self-concept, self-esteem, self-efficacy) and achievement, supporting the REM. Based on path analyses, Huang’s (2011) study, in which 32 longitudinal studies investigating the relationship between self-concept (i.e., self-concept, self-esteem) and achievement were identified, also supported the REM.

However, although it is generally accepted that ASC and achievement are reciprocally related, some results in the extant literature did not support the REM. Marsh (1990) evaluated the causal order of general ASC and achievement using a four-wave design (i.e., early 10th grade, late 11th grade, late 12th grade, and 1 year after normal high school graduation). Results revealed that prior ASC affected subsequent grades but that prior grades had no effect on subsequent ASC, thereby supporting the self-enhancement model. Viljaranta et al. (2014) assessed students’ ASC and achievement (i.e., teacher-rating and school grade) when they were in Grades 1, 2, 4, and 7, and the results supported the skill-development model. These mixed results might be explained by the diversity of participants, measurement instruments, or methodological characteristics across studies.

Relationship Between ASC and Achievement from a Developmental Perspective

Previous studies have provided support for the generalizability of the REM across age (e.g., Guay et al., 2003; Helmke & van Aken, 1995; Marsh et al., 1999). For instance, Guay et al. (2003) evaluated general ASC and achievement in elementary students (Mean age = 9 years old) on three occasions at 1-year intervals and found that the REM was the best fitting model for all waves and age cohorts. Skaalvik and Valås (1999) assessed mathematical and verbal achievement and the corresponding self-concept twice for three cohorts of children (i.e., Grades 3, 6, and 8 at the beginning of the study). Their results indicated that the relationship between ASC and achievement was consistent with the skill-development model in childhood and adolescence, and no developmental effects were found.

However, some researchers have suggested a developmental perspective in which the nature of the relationship between ASC and achievement might demonstrate a trend from the skill-development model to the reciprocal model with age (Skaalvik & Hagtvet, 1990; Weidinger et al., 2018; Wigfield & Karpathian, 1991). In preschool and early elementary school, children tend to overestimate their academic competence and have an extremely positive ASC that is typically less relevant to external indicators such as grades and standardized tests (Marsh, 1990; Spinath & Spinath, 2005). Helmke and van Aken (1995) concluded that during elementary school, ASC is dominated mainly by cumulative achievement-related success and failure and has a minimally significant impact on subsequent grade or test performance. As students grow older, their ASC becomes increasingly realistic and differentiated, and depends more strongly on external achievement criteria (Marsh & Craven, 2006; Wigfield et al., 1997).

Some studies have lent support to the developmental hypothesis. Skaalvik and Hagtvet (1990) conducted a study with two cohorts of Norwegian school children who were assessed on two occasions on self-concept of ability and teacher-rating. Results showed a skill-development model in Grades 3 and 4 but a reciprocal model in Grades 6 and 7.

Weidinger et al. (2018) examined this developmental perspective using 542 children (Time [T]1: Mean age = 7.95 years old) whose mathematics competence beliefs and grades were evaluated on seven occasions, with an interval of four months between each wave. The results showed that mathematics competence beliefs declined over time and that the direction of the relationship between mathematics competence beliefs and mathematics grades changed over time. That is, at the end of second and the beginning of third grade, the skill-development model predominated over the self-enhancement model; at the end of elementary school, the REM was predominant.

Chen et al. (2013) assessed students’ self-concept in the mathematical and verbal domains using a multiple-cohort (i.e., Grades 5 and 10 at the beginning of the study), multiple-occasion (two occasions with a 1-year interval) design. Results showed that for elementary school students in childhood, the skill-development model was found in the verbal domain but the REM was found in the mathematical domain; for high school students in adolescence, ASC and achievement were reciprocally related in both domains. In addition, the effect of achievement decreased with age, whereas the effect of ASC increased with age, indicating a developmental trend. An overview of longitudinal studies that could suggest the developmental effect is presented in Table 1.

Table 1 Overview of longitudinal studies investigating the developmental change of the relationship between ASC and achievement (ordered according to the publication year)

To the best of our knowledge, most current studies have examined the development of the relationship between ASC and achievement in childhood or adolescence (e.g., Helmke & van Aken, 1995; Sewasew & Schroeders, 2019; Viljaranta et al., 2014; Weidinger et al., 2018), and few studies have directly explored this development from childhood to adolescence. Moreover, in their meta-analysis, Valentine et al. (2004) explored only the moderating role of students’ age in the effect of self-beliefs on achievement and found that age was not a significant moderator. This focus may have been insufficient for an examination of the development of the relationship between self-beliefs and achievement. Therefore, this relationship has not been investigated fully from a developmental perspective. The present meta-analysis tests whether the nature of the relationship between ASC and achievement changes with age by simultaneously examining the moderating effect of participants’ age on both the path leading from ASC to achievement and the path from achievement to ASC.

Potential Moderators of Relationship Between ASC and Achievement

Beyond students’ age, some studies have shown that student achievement level affects the relationship between ASC and achievement (Gottfried et al., 2007; Prast et al., 2018). Academic experience varies from student to student: some students often experience academic success, while others often experience failure (Prast et al., 2018). Gottfried et al. (2007) showed that initial academic experience in school had long-term implications for students’ subsequent motivation and achievement. For instance, compared with typical-achieving students, low-achieving students experience less accomplishment, less positive reinforcement from others, less academic motivation, and more academic anxiety, which may weaken the association between ASC and achievement (Hampton & Mason, 2003; Prast et al., 2018). However, some longitudinal research demonstrated that the relationship between ASC and achievement was similar across students with different achievement levels (Gorges et al., 2018; Möller et al., 2014; Seaton et al., 2015). Hence, it is unclear whether student achievement level affects the relationship between ASC and achievement.

The differences in the types of achievement measurement across studies might also account for the inconsistency among conclusions. Meyer et al. (2019) argued that the strength of the association between ASC and achievement differs according to the type of achievement measurement. Achievement is typically measured by school grades, teacher-rating, or standardized tests, which capture different aspects of achievement (Pinxten et al., 2010; Valentine et al., 2004). Because ASC possesses motivational properties, achievement that is measured in a way more strongly related to motivation is also more strongly related to ASC (Wylie, 1974). Standardized tests are typically conducted by research institutions or teams who commonly cannot announce test results or students’ relative level; thus, they are low-stakes and not of personal significance for the students themselves (Pinxten et al., 2010). In contrast, school grades and teacher-rating often take additional information about students (e.g., academic effort or diligence) into account when rating students’ performance (Helmke & van Aken, 1995; Nagengast et al., 2013). Hence, it is hypothesized that achievement based on grades or teacher-rating is more closely related to ASC than achievement based on standardized tests.

In addition, some studies have revealed that the relationship between ASC and achievement is remarkably domain specific (Chen et al., 2013; Liem et al., 2015). Chen et al. (2013) and Möller et al. (2014) found that the REM was supported in the mathematical but not in the verbal domain. The domain specificity might be related to the characteristics of different subjects. The verbal domain, which typically contains a variety of learning materials (e.g., poems, biographies) and classroom activities (e.g., grammar training, reading), is more broadly defined. In contrast, the topics and formats covered in the mathematical domain are more narrowly defined, primarily including problem-solving exercises and nonverbal knowledge (Dweck, 1986; Götz et al., 2010; Stodolsky & Grossman, 1995), so students’ perceived competence and achievement are based on a more limited range of information sources. Therefore, it is speculated that subject domain could moderate the relationship between ASC and achievement.

Furthermore, Marsh’s early research (1992) found that achievement and ASC in corresponding subjects were more highly correlated (mean r = 0.57) than those in noncorresponding subjects (mean r = 0.33). According to the specificity-matching principle, a domain-specific self-concept shows more predictive power for outcomes that belong to the same degree of specificity; therefore, ASC is expected to show the greatest predictive power for achievement that exhibits the same degree of specificity (Swann Jr et al., 2007). Thus, we aimed to examine whether studies with domain-matching measures show a higher correlation than studies without matching measures.

In addition to the moderators proposed above, we also attempted to examine some potential methodological moderators, including time lags between measurement waves, variable types, and sample types. Collins and Graham (2002) stated that it is important to highlight the effect of time lags on effect sizes when including longitudinal studies in a meta-analysis. Theoretically, a minimum amount of time is required to detect prospective effects between ASC and achievement. The effect sizes are zero when the time interval is zero; they increase and reach a maximum as the time interval lengthens to a specific value, and then they should decrease to close to zero after a long time interval (Cole & Maxwell, 2003; Taris & Kompier, 2014). A previous meta-analysis by Valentine et al. (2004) found that the relationship between self-belief and achievement varied with the time intervals, and that effect sizes based on longer time intervals (> 18 months) were larger than those based on shorter time intervals (< 6 months). We therefore anticipated that time lags between measurement waves would moderate the relationship between ASC and achievement.

Leonhart et al. (2008) showed that, compared with variables calculated at the manifest level, latent constructs could yield more valid and potentially higher estimates. Since latent variables cannot be measured directly, they are typically operationalized by two or more manifest variables through structural equation modeling (SEM). With reference to the assumptions of Classical Test Theory, a combination of answers to two or more questions might provide a more accurate estimate of the latent variable behind these manifest variables (Hair et al., 1998; Loehlin, 2004). In addition, the use of latent variables and SEM could correct for unreliability of scales and estimate the measurement error variance directly (Leonhart et al., 2008; Loehlin, 2004). Given these considerations, latent variable analysis may yield a less biased, more valid estimation of true values and a better representation of theoretical constructs. Therefore, whether the use of latent variables affects the relationship between ASC and achievement is an issue of interest to us.

Moreover, most of the longitudinal studies included in our current meta-analysis used either representative sampling or convenience sampling. Convenience sampling is a non-probabilistic (non-random) sampling technique in which members of the target population are selected because they meet practical criteria, such as easy recruitment for the study, willingness to participate, or geographical proximity (Etikan et al., 2016; Sedgwick, 2013). However, there are clear disadvantages to convenience sampling, the foremost being low external validity due to sample bias. Moreover, the variability and bias cannot be measured or controlled (Acharya et al., 2013). In contrast, representative sampling usually utilizes random selection procedures and weighting techniques to maximize the representativeness of a sample (Zelinski et al., 2001). These two sampling techniques might result in differences in the quality of the data collected. Therefore, we attempted to examine whether the effect sizes would vary with sample types.

Finally, the latest publication year of the studies included in Valentine et al.’s (2004) meta-analysis is 2001. However, 49 studies included in the current meta-analysis were published after 2001. Hence, we used the publication year as a moderator to investigate whether studies after 2001 have supported the REM.

The Present Meta-analysis

The first aim of the present meta-analysis was to extend previous work by systematically integrating longitudinal studies on the relationship between ASC and achievement, and to provide further evidence for the REM. The one-stage meta-analytic SEM (OSMASEM) analysis, which combines the advantages of meta-analysis and SEM, was performed to examine whether the REM was established.

In addition, given that the relationship between ASC and achievement has not been examined fully from a developmental perspective, a secondary aim of our study was to test the development of the relationship between ASC and achievement from childhood to adolescence by simultaneously examining the moderating effect of participant age on both the path leading from ASC to achievement and the path from achievement to ASC.

Finally, previous studies have not yielded consistent conclusions on the moderating effect of some variables. We systematically investigated some theoretical and methodological moderators that might influence the relationship between ASC and achievement, including student achievement level, types of achievement measurement, domain of measurement, match between domain specificity of ASC and achievement, time lags between measurement waves, variable types, sample types, and publication year.

Method

Study Selection and Eligibility Criteria

Literature Search

Relevant studies were identified by an exhaustive search that proceeded in three ways. First, we searched reviews and research papers on electronic databases, including PsycINFO, PubMed, Web of Science, ERIC, and Google Scholar, using combinations of the keywords (a) (“self-concept” OR “academic perception” OR “academic construction” OR “academic perspective” OR “academic structure” OR “expectancy-value belief” OR “competence belief”), paired with (b) (achievement OR performance OR tests OR grade OR examination OR “grade point average”), and (c) (longitudinal OR causal OR prospective OR retrospective OR “risk factor” OR vulnerability OR antecedent OR cross-lagged OR predict OR “follow-up”). A full-text search strategy was used to avoid omitting eligible studies to the greatest extent possible. For all articles considered, we reviewed the abstract and examined the full text whenever a paper might include a relevant effect. To reduce publication bias, we also searched for as many unpublished dissertations and other unpublished studies as possible on ProQuest Dissertations and Theses and Google Scholar. The final search was conducted in November 2020. Second, if a study was identified as an eligible article, its reference list was further reviewed to examine whether there were additional eligible studies. Third, the reference sections of two previous meta-analyses (i.e., Valentine et al., 2004; Huang, 2011) on the relationship between self-concept and achievement were screened. A total of 1725 abstracts were scanned for potential inclusion in this review (see Fig. 1 for PRISMA flow diagram).
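The three keyword blocks can be read as one Boolean query. The following minimal sketch only illustrates that structure. It assumes that “paired with” means a logical AND between blocks; the helper names `or_block` and `build_query` are hypothetical, and each database has its own query syntax that may require adjustment.

```python
# Keyword blocks transcribed from the search description.
# Assumption: the blocks are combined with AND; real databases differ in syntax.
SELF_CONCEPT_TERMS = [
    '"self-concept"', '"academic perception"', '"academic construction"',
    '"academic perspective"', '"academic structure"',
    '"expectancy-value belief"', '"competence belief"',
]
ACHIEVEMENT_TERMS = [
    "achievement", "performance", "tests", "grade", "examination",
    '"grade point average"',
]
DESIGN_TERMS = [
    "longitudinal", "causal", "prospective", "retrospective",
    '"risk factor"', "vulnerability", "antecedent", "cross-lagged",
    "predict", '"follow-up"',
]


def or_block(terms):
    """Join the terms of one block with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks):
    """Combine several OR-blocks with AND (hypothetical helper)."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(SELF_CONCEPT_TERMS, ACHIEVEMENT_TERMS, DESIGN_TERMS)
print(query)
```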

Fig. 1 PRISMA flow diagram illustrating the process of identifying eligible studies

Eligibility Criteria

Studies that fulfilled the following criteria were included in the current meta-analysis: (a) ASC and achievement were assessed with continuous and explicit measures. (b) To avoid confusion between ASC and other related self-beliefs, only studies that conformed with the definition of ASC were included, while those measuring academic self-efficacy and self-esteem were excluded. If a study assessed expectancy-value beliefs but ASC was used to operationalize the expectancy component, this study was included. (c) The study used a longitudinal design. (d) For OSMASEM analysis, only studies in which ASC and achievement were measured simultaneously at two time points were included. (e) When separately analyzing the two cross-lagged paths, studies in which at least one of the constructs (i.e., ASC or achievement) was measured at two time points were included. (f) Sufficient information was provided to calculate effect sizes. Only studies published in English were included owing to the language limitations of the researchers. There were no restrictions on the publication date of studies.

Study Selection

A total of 155 articles were assessed in full text, 68 of which met the eligibility criteria, including 61 peer-reviewed papers, 5 unpublished dissertations, and 2 conference papers. Among the remaining 87 studies, we sought to contact the authors of those that met the following criteria: (a) ASC and achievement were taken as the central constructs, and (b) most of the data for calculating the effect size were provided (e.g., only one correlation coefficient was missing). We eventually contacted the authors of 19 studies via email but did not receive any replies. See Fig. 1 for specific reasons for study exclusion at each stage of the search process.

Independence of Effect Sizes

Since large-scale longitudinal studies require enormous human and financial resources, some studies that met the eligibility criteria may have used overlapping samples. If several papers were based on the same sample, we included only one study, chosen according to the following criteria (in order of importance): (a) the study was most directly related to the current research (for example, if one study provided data on both ASC and achievement but another included data on only one of these constructs, the study with more information was selected); (b) the study contained the most comprehensive coding information and complete statistical data; and (c) the study was the most recently published. After careful inspection, we identified 18 studies involving overlapping samples. To ensure the independence of effect sizes, nine of these papers were included.

Coding of Studies

We coded the following information for each study included in the current meta-analysis: sample size, participants’ mean age, achievement level, types of achievement measurement, match between domain specificity of ASC and achievement, domain of measurement, time lags between assessments, variable types, sample types, publication year, and effect sizes.

Effect Sizes

A cross-lagged regression coefficient generated by OSMASEM analysis that represented the effect of the predictor on the outcome variable after controlling for the initial levels of the outcome variable was used as effect size. Six Pearson correlation coefficients required for OSMASEM analysis needed to be coded. Among them were the concurrent correlation between two variables (i.e., the correlation of T1 ASC with T1 achievement, the correlation of T2 ASC with T2 achievement), the stability of ASC and achievement (i.e., the correlation of T1 ASC with T2 ASC, the correlation of T1 achievement with T2 achievement), and the cross-lagged correlation between ASC and achievement (i.e., the correlation of T1 ASC with T2 achievement, the correlation of T1 achievement with T2 ASC).
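As an illustration of how the six coded correlations fit together, the sketch below arranges them into a symmetric 4 × 4 correlation matrix for one study. The variable order, the function name, and all numeric values are assumptions made for this example; they are not taken from any included study, and the sketch does not reproduce the OSMASEM estimation itself.

```python
# Order of variables in the 4 x 4 matrix (an assumed convention).
VARIABLES = ["ASC_T1", "ACH_T1", "ASC_T2", "ACH_T2"]

# Hypothetical correlations for one study; not taken from any real data set.
EXAMPLE_STUDY = {
    ("ASC_T1", "ACH_T1"): 0.40,  # concurrent correlation at T1
    ("ASC_T2", "ACH_T2"): 0.45,  # concurrent correlation at T2
    ("ASC_T1", "ASC_T2"): 0.60,  # stability of ASC
    ("ACH_T1", "ACH_T2"): 0.70,  # stability of achievement
    ("ASC_T1", "ACH_T2"): 0.35,  # cross-lagged: T1 ASC with T2 achievement
    ("ACH_T1", "ASC_T2"): 0.38,  # cross-lagged: T1 achievement with T2 ASC
}


def build_correlation_matrix(correlations):
    """Return a symmetric 4 x 4 matrix (list of lists) with a unit diagonal."""
    size = len(VARIABLES)
    matrix = [[1.0 if i == j else None for j in range(size)] for i in range(size)]
    for (first, second), value in correlations.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"correlation out of range: {value}")
        i, j = VARIABLES.index(first), VARIABLES.index(second)
        matrix[i][j] = matrix[j][i] = value
    # Any remaining None means one of the six correlations was not supplied.
    if any(cell is None for row in matrix for cell in row):
        raise ValueError("all six correlations are required")
    return matrix


matrix = build_correlation_matrix(EXAMPLE_STUDY)
for name, row in zip(VARIABLES, matrix):
    print(name, row)
```

A study that reports only some of the six coefficients raises an error here, which mirrors the eligibility requirement that sufficient information be available to calculate effect sizes.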

Age

Because participants’ ages changed across the measurement waves of the longitudinal studies included in the current meta-analysis, it was impossible to code their precise age. Instead, we coded the sample as childhood, adolescence, or adulthood according to developmental phase. For studies that did not provide the students’ age, we inferred this variable from other valid information such as grade. For example, if a study adopted eighth-grade students as participants but did not report their age, we grouped them into adolescence.

Achievement Level

Achievement level was coded as “typical-achieving” if the sample was unselected, “low-achieving” if the sample was described as at-risk or poorly performing, or “high-achieving” if the sample included students with high achievement, with excellent performance, or from the highest track.

Types of Achievement Measure

We coded achievement as standardized tests, grades, or teacher-rating based on how achievement was measured.

Match of Domain Specificity Between ASC and Achievement

This moderator was coded as “match” if the measures of ASC and achievement belonged to the same subject (e.g., verbal self-concept and verbal achievement); otherwise, we coded it as “mismatch” (e.g., general ASC and mathematical achievement, or mathematical self-concept and GPA).

Domain of Measurement

The domain was coded as “mathematical” if the relationship between mathematical self-concept and mathematics achievement was reported, or “verbal” if the relationship between self-concept and achievement in the verbal domain was reported.

Time Lags Between Measurement Waves

We coded time lags between measurement waves in the following three categories: (a) less than or equal to 6 months; (b) more than 6 months and less than or equal to 12 months; and (c) more than 12 months. If a longitudinal study included more than two waves with an interval of several months, we coded it multiple times. For example, if a study included four waves with an interval of 6 months between each wave, it would be coded three times (T1–T2, time lag is 6 months; T1–T3, time lag is 12 months, and T1–T4, time lag is 18 months).
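The coding rule can be sketched as follows. The function names and category labels are hypothetical; the sketch follows the worked example in the text, in which every later wave is paired with the first wave.

```python
def code_time_lag(months):
    """Map a time lag in months to one of the three coding categories."""
    if months <= 0:
        raise ValueError("a time lag must be positive")
    if months <= 6:
        return "<= 6 months"
    if months <= 12:
        return "> 6 and <= 12 months"
    return "> 12 months"


def lags_from_first_wave(wave_months):
    """Pair every later wave with the first wave and code each lag.

    wave_months lists the measurement times in months, e.g. [0, 6, 12, 18].
    """
    first = wave_months[0]
    return [
        (f"T1-T{k}", month - first, code_time_lag(month - first))
        for k, month in enumerate(wave_months[1:], start=2)
    ]


# Four waves spaced 6 months apart, as in the example in the text.
for pair, lag, category in lags_from_first_wave([0, 6, 12, 18]):
    print(pair, lag, category)
```

With four waves at 0, 6, 12, and 18 months, the study is coded three times, once in each category.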

Variable Types

This moderator was coded as “latent” if the reported effect sizes were based on the latent variable level, or “manifest” if the reported effect sizes were based on the manifest variable level.

Sample Types

Sample types were coded as “representative sampling” if calculated effect sizes were based on a nationally representative sample, or “convenience sampling” if the calculated effect sizes were based on the sample selected by the researcher at a specific time and in a certain location.

Publication Year

Since the latest publication year of the studies included in Valentine et al.’s (2004) meta-analysis is 2001, we coded studies published in or before 2001 as “before 2001” and studies published after 2001 as “after 2001.”

Sample Sizes

We used the sample sizes at T2 as the coding information. If some eligible studies only presented sample sizes at Time 1, we used other valid indicators such as the percent of attrition from the first assessment to the second assessment to estimate the sample sizes at Time 2.
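One way to carry out such an estimate is sketched below. The function name is hypothetical, the numbers are invented for illustration, and rounding to the nearest whole participant is an assumption, since the exact procedure is not described.

```python
def estimate_t2_sample_size(n_t1, attrition_rate):
    """Estimate the Time 2 sample size from the Time 1 size and attrition.

    attrition_rate is a proportion between 0 and 1 (e.g. 0.125 for 12.5%).
    Rounding to the nearest whole participant is an assumption of this sketch.
    """
    if n_t1 < 0:
        raise ValueError("sample size cannot be negative")
    if not 0.0 <= attrition_rate <= 1.0:
        raise ValueError("attrition rate must lie between 0 and 1")
    return round(n_t1 * (1.0 - attrition_rate))


# Hypothetical study: 1,000 participants at Time 1 and 12.5% attrition.
print(estimate_t2_sample_size(1000, 0.125))
```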

In the coding process, sample sizes were coded as continuous variables that were centered for the analyses. Participant age, achievement level, types of achievement measurement, match or mismatch between domain specificity of ASC and achievement, domain of measurement, time lags between measurement waves, variable types, sample types, and publication year were dummy-coded and submitted to F tests (as described below) to examine differences. See Online Resource 1 for summaries of the studies included in the meta-analysis.

Coder Reliability

All articles were coded by the first author, and 34 randomly selected articles were coded by the second author to provide estimates of interrater agreement. Interrater agreement was high (ICC > .95 and κ > .97 for continuous and categorical variables, respectively), and discrepant ratings were discussed until consensus was reached.

Meta-analytic Procedure

The metaSEM package provides functions for univariate analysis, multivariate analysis, three-level meta-analysis, two-stage SEM (TSSEM), and one-stage meta-analytic SEM (OSMASEM) using the SEM approach via the OpenMx package in the R statistical platform. We chose the OSMASEM approach because it is the most suitable for processing longitudinal relationships between variables at continuous time points and building a cross-lagged model (Cheung, 2014). The metaSEM package, using maximum likelihood estimation, relies on the sum rather than the average of sample sizes to compute the standard errors for the path coefficients, which increases the sensitivity of significance tests. For computation, we input the matrix of sample size-weighted mean correlations in a dataset (Cheung & Chan, 2005).

To comprehensively test the direction of ASC and achievement, we separately analyzed the paths leading from ASC to achievement and from achievement to ASC. The effect size of each cross-lagged path was calculated based on the following equation (Cohen et al., 2013):

$$ \beta_{Y1.2} = \frac{\gamma_{Y1} - \gamma_{Y2}\,\gamma_{12}}{1 - \gamma_{12}^{2}} $$

where βY1.2 is the standardized regression coefficient of X1 predicting Y2 after eliminating the effect of Y1 on Y2; γY1 is the cross-lagged correlation between the predictor and the outcome (e.g., the correlation of T1 ASC with T2 achievement); γY2 is the autoregressive coefficient for the outcome variable (e.g., the correlation of T1 achievement with T2 achievement); and γ12 is the concurrent correlation between the two variables, that is, the cross-sectional correlation (e.g., the correlation of T1 ASC with T1 achievement).
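The equation can be illustrated with a short Python function. This is a sketch for readers rather than the code used in the analyses; the function name `cross_lagged_beta`, its argument names, and the input values in the usage note are hypothetical.

```python
def cross_lagged_beta(r_cross: float, r_auto: float, r_concurrent: float) -> float:
    """Standardized cross-lagged coefficient from three correlations.

    r_cross      : cross-lagged correlation (e.g., T1 ASC with T2 achievement)
    r_auto       : autoregressive correlation of the outcome (e.g., T1 with T2 achievement)
    r_concurrent : concurrent T1 correlation (e.g., T1 ASC with T1 achievement)
    """
    denominator = 1 - r_concurrent ** 2
    if denominator <= 0:
        raise ValueError("the concurrent correlation must lie strictly between -1 and 1")
    return (r_cross - r_auto * r_concurrent) / denominator
```

With illustrative inputs of 0.30 for the cross-lagged correlation, 0.60 for the autoregressive correlation, and 0.40 for the concurrent correlation, the function returns (0.30 − 0.24) / 0.84, or approximately 0.071, which shows how a moderate raw correlation can shrink once the baseline level of the outcome is controlled.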

Following three previous meta-analyses (Khazanov & Ruscio, 2016; Kuykendall et al., 2015; Sowislo & Orth, 2013), cross-lagged effect sizes were also transformed using Fisher’s Zr transformation. The effect sizes were calculated and analyzed using robust variance estimation (RVE) with the robu() function of the “robumeta” package in the R environment (version 3.3.3). RVE is a recently proposed meta-analytic method for dealing with dependent effect sizes (Fisher & Tipton, 2015). Because of the multidimensionality of ASC, most studies reported multiple correlated outcomes (e.g., the effect of mathematical achievement on mathematical self-concept; the effect of verbal achievement on verbal self-concept). Traditional meta-analytic methods are ill-equipped to deal with the complex and often unknown correlations among non-independent effect sizes, whereas RVE can provide valid point estimates, standard errors, and hypothesis tests even when the degree and structure of dependence between effect sizes are unknown (Fisher & Tipton, 2015; Hedges et al., 2010).
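For readers unfamiliar with the Fisher transformation mentioned above, a minimal Python sketch follows. It shows only the transformation and its inverse, not the RVE model itself; the function names are assumptions introduced for illustration.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's Zr transformation: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def inverse_fisher_z(z: float) -> float:
    """Back-transform a Fisher Zr value to the correlation metric."""
    return math.tanh(z)
```

For small effects, the transformed and untransformed values are nearly identical (an effect of 0.15 becomes about 0.151), so the transformation mainly matters for larger coefficients, whose sampling distributions are more skewed.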

Given the high correlation between ASC and achievement, ρ was set to 0.80. ρ is the user-specified within-study effect size correlation, and its default value is 0.8 in RVE (Hedges et al., 2010). The user-specified value must be between 0 and 1 when using the correlated effects model (Fisher & Tipton, 2015). In fact, different settings of the ρ value would not significantly influence the final results. We also used the Wald_test() function in the clubSandwich package (version 0.2.2) in R to conduct F tests on models with dummy-coded coefficients, with the bias-reduced linearization adjustment for clustered standard errors and degrees of freedom estimated with Hotelling’s \( T_Z^2 \) method. In the effect size analyses, we used a random effects model, which rests on the assumption that variation between studies is systematic and not only due to random error.

For each effect size, the z-score of the standardized residuals, which approximately follows a standard normal distribution, was calculated to identify statistical outliers. If the absolute z-score of the standardized residuals exceeded 1.96, the study was deemed to be an outlier (Viechtbauer & Cheung, 2010). Moreover, Cook’s distance plots were inspected to examine the influence of outliers on the results. If the absolute z-score of the standardized residuals exceeded 1.96 but the Cook’s distance plots demonstrated that the outlier did not exert a significant influence on the results, the study was retained.

To increase the robustness of the obtained effect size, we used three methods to test for publication bias. First, we visually inspected a funnel plot where the standard error for each study was plotted on the y-axis and the effect size on the x-axis. In the absence of publication bias, this plot would be expected to show a symmetrical shape. If the bottom of the funnel plot was asymmetrical, there would be evidence of publication bias. Second, we conducted Egger’s regression intercept test for funnel plot asymmetry. Finally, we used the trim-and-fill procedure to calculate the possible number of missing studies based on asymmetry in the funnel plot and compute an estimated overall effect size on this basis.
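The logic of Egger’s regression test can be sketched as follows. This is the classic formulation (regressing the standardized effect on precision and testing the intercept), written in plain Python for illustration; it is not necessarily the exact routine used in the present analyses, and the function name `egger_intercept` is an assumption.

```python
import math
from typing import Sequence, Tuple


def egger_intercept(effects: Sequence[float],
                    std_errors: Sequence[float]) -> Tuple[float, float, int]:
    """Return (intercept, standard error of intercept, degrees of freedom).

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. An intercept far from zero suggests
    funnel plot asymmetry.
    """
    k = len(effects)
    if k != len(std_errors) or k < 3:
        raise ValueError("need at least three effects with matching standard errors")
    x = [1.0 / se for se in std_errors]                    # precision
    y = [es / se for es, se in zip(effects, std_errors)]   # standardized effect
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = k - 2
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / df * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, df
```

The test statistic is the intercept divided by its standard error, evaluated against a t distribution with k − 2 degrees of freedom, where k is the number of effect sizes.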

Results

Description of Studies

The 68 studies included in the present meta-analysis were published between 1981 and 2020, with a median publication year of 2011. In total, 240 effect sizes were reported based on 72 independent samples. Sample sizes ranged from 43 to 14,985 (M = 1,538, SD = 2,687, Median [Mdn] = 546). The proportion of female participants ranged from 0 to 100%, with a mean of 47.17%. The average age of participants at the first assessment varied between 6.60 and 20.50 years (M = 12.19, SD = 2.92, Mdn = 12.20). Time lags between assessments ranged from 3 to 66 months (M = 20.62, SD = 13.46, Mdn = 18). Most studies were conducted in the US (33.8%), Germany (17.6%), China (10.3%), Canada (5.9%), and Australia (5.9%). In total, 58.8% of studies explored the relationship between ASC and achievement in verbal or mathematical domains, while the remainder explored the relationship between general ASC and achievement.

Preliminary Analyses

Preliminary analyses identified two outliers in six analyses (for details, see Table 2). However, Cook’s distance plots demonstrated that the outliers in all analyses did not have a significant influence on the results.

Table 2 Effect sizes for the relationship between ASC and achievement

Since we were more interested in the REM of the relationship between ASC and achievement, we conducted publication bias tests on two cross-lagged paths. For the cross-lagged effect of T1 achievement on T2 ASC, an inspection of the funnel plot did not show asymmetry, indicating the absence of publication bias (see Fig. 2a). However, Egger’s test for funnel plot asymmetry suggested possible publication bias, t (59) = − 2.53, p = 0.02. The significant evidence for publication bias prompted a concern that if more null effects of achievement on ASC had been published, these studies might have reduced the effect to a trivial or negligible size. To address this concern, we used Duval and Tweedie’s (2000) trim-and-fill procedure to look for missing studies on the left of the mean. However, the analysis did not estimate any missing studies (estimated missing = 0; SE = 4.50), showing that Egger’s test may overestimate publication bias. For the cross-lagged effect of T1 ASC on T2 achievement, the funnel plot indicated that the distributions of effect sizes exhibited a symmetrical shape (see Fig. 2b). The Egger’s test also returned nonsignificant results, t (60) = − 0.78, p = 0.44. Similarly, trim-and-fill analysis showed no evidence of missing studies (estimated missing = 0; SE = 4.30).

Fig. 2 (a) Funnel plot of the effect of achievement on ASC. (b) Funnel plot of the effect of ASC on achievement

Primary Analyses

First, we used the one-stage meta-analytic SEM to meta-analyze the studies in which ASC and achievement were both measured at least twice (N = 55). The results demonstrated that the concurrent correlations of ASC and achievement were 0.35, p < 0.01 at T1 and 0.13, p < 0.01 at T2. In addition, T1 ASC significantly predicted T2 ASC (β = 0.45, p < 0.01), and T1 achievement significantly predicted T2 achievement (β = 0.64, p < 0.01). Most importantly, ASC and achievement were reciprocally related, although the effect of achievement on ASC was stronger; that is, achievement predicted ASC, β = 0.16, p < 0.01, and ASC predicted achievement, β = 0.08, p < 0.01 (see Fig. 3), supporting the REM.

Fig. 3 Maximum likelihood estimation of the cross-lagged model

Additionally, to comprehensively test the REM, we separately analyzed the paths leading from achievement to ASC (N = 61) and from ASC to achievement (N = 62). The results supported the REM once more, and the effect of achievement on ASC (β = 0.15, p < 0.01, 95% CI [0.12, 0.18]) was larger than the effect of ASC on achievement (β = 0.09, p < 0.01, 95% CI [0.07, 0.11]) (see Table 2, Fig. 4a and b), strengthening our confidence in the reciprocal relationship between ASC and achievement. We also examined the concurrent correlations at T1 and T2 and the prospective but uncontrolled correlations between the two constructs, which allowed us to compare our meta-analytic results with those of previous research. As we hypothesized, when the initial level of the outcome was not controlled, the concurrent correlations (r T1 = 0.40, [0.35, 0.44], p < 0.01; r T2 = 0.43, [0.38, 0.49], p < 0.01) and the prospective correlations between ASC and achievement (r achievement → ASC = 0.35, [0.31, 0.40], p < 0.01; r ASC → achievement = 0.35, [0.31, 0.39], p < 0.01) were far larger than the controlled cross-lagged effects. The weighted mean effect sizes are displayed in Table 2.

Fig. 4 (a) Forest plot of the effect of achievement on ASC. The size of each square indicates the relative weight assigned to that study in the analysis. Error bars represent the 95% confidence interval of the effect. This meta-analysis indicates that achievement significantly predicted changes in ASC after controlling for the baseline level of ASC. (b) Forest plot of the effect of ASC on achievement. This meta-analysis indicates that ASC significantly predicted changes in achievement after controlling for the baseline level of achievement

Moderator Analyses

In moderator analyses, we first focused on the moderators of the effect of achievement on ASC. As expected, this effect was heterogeneous, Q (60) = 874.78, p < 0.01. Moreover, the percentage of between-study variance (I²) was 92.21%, suggesting that some variables may moderate the effect of achievement on ASC.

As expected, student age significantly moderated the effect of achievement on ASC, t (54.5) = − 2.26, p = 0.03. The effect of achievement on ASC was stronger for children (β = 0.18, p < 0.01) than for adolescents (β = 0.12, p < 0.01), which indicated that the effect of achievement on ASC decreased as students grew older. Moderator analyses revealed no significant differences by student achievement level, F (2, 1.83) = 1.27, p = 0.45, types of achievement measurement, F (2, 7.81) = 0.40, p = 0.68, match/mismatch of domain specificity between achievement and ASC, t (56.7) = − 0.90, p = 0.37, or domain of measurement, t (29) = − 0.92, p = 0.36.

Finally, methodological variables did not significantly moderate the effect of achievement on ASC. Neither time lags between measurement waves, F (2, 25.6) = 2.26, p = 0.13, nor variable types, t (40.5) = − 0.23, p = 0.82, significantly moderated the effect of achievement on ASC. No significant difference in effect sizes between representative and convenience sampling was found, t (38) = 0.14, p = 0.89. The effect sizes in studies published before 2001 and after 2001 did not differ significantly, t (27) = 0.49, p = 0.63 (see Table 3).

Table 3 Moderations on the effect of achievement on ASC

In addition, we conducted a moderator analysis of the path leading from ASC to achievement. The heterogeneity test indicated significant heterogeneity among effect sizes (Q (61) = 372.51, p < 0.01), and the percentage of between-study variance (I²) was 85.20%.

Three variables emerged as significant moderators of the effect of ASC on achievement. The first was student age, t (49) = 2.85, p = 0.006. The effect of ASC on achievement was stronger for adolescents (β = 0.11, p < 0.01) than for children (β = 0.06, p < 0.01), the opposite of the age pattern found for the effect of achievement on ASC. The second significant moderator was student achievement level, t (10.2) = − 4.51, p = 0.01. The effect sizes for low-achieving students (β = 0.01, p < 0.01) were significantly weaker than those for typical-achieving students (β = 0.11, p < 0.01). Moreover, types of achievement measurement significantly moderated the effect of ASC on achievement, F (2, 4.88) = 6.74, p = 0.04. In line with our hypothesis, when achievement was measured with standardized tests, the effect of ASC on achievement was smallest (β = 0.06, p < 0.01), compared with grades (β = 0.13, p < 0.01) and teacher ratings (β = 0.10, p < 0.01).

We then found that neither match or mismatch of domain specificity between achievement and ASC (t (53.7) = 0.94, p = 0.35) nor domain of measurement (t (27.5) = − 0.85, p = 0.40) moderated the effect of ASC on achievement. In addition, no methodological variables had a significant moderating effect on effect sizes. Time lags between measurement waves did not moderate the effect of ASC on achievement, F (2, 21.6) = 0.98, p = 0.39. The use of latent variables did not moderate the effect of ASC on achievement, t (35.1) = 1.04, p = 0.30. The effect sizes did not vary according to sample types (t (31.8) = − 0.68, p = 0.50) or publication year (t (23.6) = 0.68, p = 0.50) (see Table 4).

Table 4 Moderations on the effect of ASC on achievement

For comparison with Valentine et al.’s meta-analysis (2004), we included studies that reported regression coefficients to conduct primary effects and moderator analyses. A more detailed analysis is presented in Online Resource 2.

Difference in the Relationship Between ASC and Achievement

Primary and moderator analyses showed that the effect of ASC on achievement and the effect of achievement on ASC were both significant in childhood and in adolescence, supporting the generalizability of the REM across age. However, combining the moderator analyses of the two paths showed that in childhood, the effect of achievement on ASC (β = 0.18, p < 0.01, 95% CI [0.15, 0.22]) was stronger than the effect of ASC on achievement (β = 0.06, p < 0.01, 95% CI [0.04, 0.08]), whereas in adolescence, the effect of achievement on ASC (β = 0.12, p < 0.01, 95% CI [0.09, 0.16]) was almost the same as the effect of ASC on achievement (β = 0.11, p < 0.01, 95% CI [0.08, 0.14]) (see Fig. 5). This pattern demonstrates a trend from a strong skill-development effect to a pronounced reciprocal effect with age within the framework of the REM.

Fig. 5 The development of the relationship between ASC and achievement from childhood to adolescence

Discussion

The current meta-analysis explored the longitudinal relationship between ASC and achievement and its moderators by synthesizing the available longitudinal studies. Consistent with previous longitudinal studies (e.g., Wigfield & Eccles, 2000; Marsh et al., 2002; Gorges et al., 2018; Weidinger et al., 2018; Sewasew & Schroeders, 2019), our results showed that ASC and achievement affected each other mutually within domains over time, supporting the REM.

However, the mutual dependency did not indicate that effect sizes were equal: the skill-development effect was stronger than the self-enhancement effect. The small effect sizes of the REM may be related to the moderate stability of ASC (r = 0.45) and achievement (r = 0.64) (Seaton et al., 2015). The REM suggests that students are likely to feel more competent in subjects in which they perform better, and the sense of competence also affects subsequent achievement. As Prast et al. (2018) found, perceived competence could significantly predict T3 achievement and played a partial mediating role in the association between previous and subsequent achievement. Our findings corroborated the argument that a positive ASC is an important educational outcome and a mediating variable that promotes the development of achievement (Marsh & Martin, 2011; Möller et al., 2009).

Moderator analyses revealed that the effect of ASC on achievement and the effect of achievement on ASC were moderated by student age, although the support for the REM was found in both childhood and adolescence. The nature of the relationship between ASC and achievement demonstrated a trend from a strong skill-development effect to a more pronounced reciprocal effect with age within the framework of the REM, which was inconsistent with the results of Valentine et al.’s meta-analysis (2004). However, it should be noted that in Valentine et al.’s meta-analysis (2004), more than half of the included studies used middle school students as samples, perhaps limiting the comparison of effect sizes at different developmental stages.

What factors might explain this developmental change? During elementary school, students’ ASC undergoes a process of shaping and reshaping that is dominated by cumulative achievement-related success and failure (Skaalvik & Hagtvet, 1990). The motivational properties of ASC have not been fully developed; thus, their influence on achievement is limited (Helmke & van Aken, 1995). However, ASC is expected to exert increasing effects on achievement or other academic choices when it is firmly established and more stable (Wigfield & Karpathian, 1991). For instance, Guay et al. (2003) have found that the reliability and stability of ASC improved with students’ age, which increased the potential for influencing subsequent achievement. The developmental change of the relationship between ASC and achievement also supports the hypothesis that self-beliefs and achievement gradually become a coherent self-stabilizing system with age (Dweck, 2002).

Age differences in the relationship between ASC and achievement might be related to the maturation of cognitive ability, which enables students to make inferences regarding their own and others’ competence by integrating multiple sources of information such as performance feedback and reflected appraisals from significant others (Weidinger et al., 2018). In addition, high cognitive abilities could enhance the coordination between previous self-beliefs, thus making it more likely that self-concept predicts external achievement indicators (Skaalvik & Hagtvet, 1990). Furthermore, the educational environment changes significantly for students beyond elementary school, with students having to participate in an increasing number of normative evaluations and receive more judgments from teachers (Dweck, 2002; Stipek & Iver, 1989). Such feedback helps students to establish competence beliefs more firmly, thereby boosting the effect of ASC on achievement. Taken together, these developments lead to a strong skill-development effect for younger children but a pronounced reciprocal effect for older students.

In addition, we found that achievement level significantly moderated the effect of ASC on achievement. The effect of ASC on achievement for low-achieving students was significantly weaker than that for typical-achieving students. Low-achieving students experience less success and more academic challenges, which can weaken their academic motivation and greatly reduce the effect of ASC on achievement (Hampton & Mason, 2003; Prast et al., 2018). Nevertheless, we cannot ignore the implications of ASC for achievement among low-achieving students. Instead, high ASC might be especially beneficial for subsequent achievement among low-achieving students because the motivational properties of ASC could initiate adaptive learning strategies and behaviors such as effort and persistence, which would have a positive effect on future achievement and success (Marsh, 1990; Marsh & Martin, 2011; Wigfield & Eccles, 2000).

Consistent with our hypothesis, type of achievement measurement was a significant moderator of the effect of ASC on achievement. Studies using standardized tests as indicators of achievement tended to report a weaker relationship between ASC and achievement than studies using grades or teacher ratings as indicators. As mentioned in the introduction, standardized testing is not high-stakes and not of personal significance for the students, while grades and teacher ratings often serve other functions such as motivating students (Kriegbaum et al., 2018; Pinxten et al., 2010). Because ASC possesses motivational properties, it correlates more closely with grades and teacher ratings, which are directly related to motivation (Valentine et al., 2004; Wylie, 1974).

Moderator analyses demonstrated that the match between domain specificity of ASC and achievement did not significantly affect the relationship between ASC and achievement, which was inconsistent with Valentine et al.’s meta-analysis (2004). In the present study, we coded this moderator as mismatch when general self-concept corresponded to mathematical/verbal achievement, or the mathematical/verbal self-concept corresponded to GPA. In fact, the general ASC and mathematics achievement are not totally mismatched. However, in Valentine et al.’s meta-analysis (2004), the researchers coded the moderator as mismatch if ASC and achievement were based on different subjects (e.g., mathematical self-concept corresponded to verbal achievement). Therefore, it was not surprising that the significant moderating effect of match between domain specificity of ASC and achievement was not found in our study.

Similarly, the moderating effect of domain was not found in the present study. Although Chen et al. (2013) and Möller et al. (2014) found a reciprocal relationship for the mathematical but not for the verbal domain, some characteristics of the two studies should be noted. For instance, the stability of verbal achievement (r = .75) was much higher in Chen et al.’s research (2013), and the time lag between T1 and T2 was as long as 4 years in Möller et al.’s (2014) study, which may have masked the effect of prior verbal self-concept. Language and reading are often viewed as feminine domains, whereas mathematics is stereotyped as a masculine domain (Marsh, 1989). Skaalvik and Skaalvik (2014) revealed that male students had significantly higher ASC, performance expectations, and intrinsic motivation for mathematics than female students. The differential socialization hypothesis also states that gender-related differences in the socialization process may not fully enhance girls’ self-concept and performance in the mathematical domain and may fail to reinforce boys’ self-concept and performance in the verbal domain (Marsh, 1989). Therefore, future research is needed to investigate the interaction of subject and gender on the relationship between ASC and achievement.

Furthermore, we found that the effect of achievement on ASC and the effect of ASC on achievement did not significantly vary as a function of time lags. Our results suggested that the skill-development effect and self-enhancement effect are detectable and stable across a wide range of time lags. Nevertheless, although our meta-analysis covered a wide range of time lags (from 2 to 66 months), the number of studies with time lags of less than 1 year was too small, which may restrict the statistical power. Thus, these results are merely tentative and should not be regarded as definitive evidence. Future research should further investigate the effect of time lags on the relationship between ASC and achievement, and the mediating mechanisms that account for the temporal stability of these effects. Although there is no general standard for optimal measurement intervals, Helmke and van Aken (1995) suggested that students’ age and school system should be considered when determining the time interval between measurement waves.

Although some studies maintained that calculating effect sizes at the latent level within SEM can yield more valid and potentially higher estimates, the use of latent variables did not cause the relationship between ASC and achievement to be significantly higher than that found with manifest variables (Leonhart et al., 2008; Valentine et al., 2004). As for the sample type of the original study, although representative sampling has some advantages over convenience sampling, its limitations are clear. In general, representative sampling has relatively broader aims, so that some variables of interest cannot be incorporated. For example, Project for the Analysis of Learning and Achievement in Mathematics only targeted students’ mathematical domain, and the Youth in Transition Project only measured students’ general ASC, which pertained specifically to a single academic domain. Therefore, it was not surprising that no significant difference in effect sizes was found between representative and convenience sampling. In the same way, longitudinal studies after 2001 also supported the reciprocal relationship between ASC and achievement, strengthening the robustness of the REM.

Limitations and Future Directions

Although the present meta-analysis of longitudinal studies has gone a step beyond previous studies, several limitations should be borne in mind when interpreting certain results. First, our research did not allow for a causal inference regarding the relationship between ASC and achievement because the included studies all adopted a correlational design. Therefore, we cannot rule out a third variable that affects both constructs, particularly given the modest effect sizes observed here. For example, academic interest is related to high achievement and ASC (Marsh et al., 2005), but the extent to which academic interest accounts for their relationship is unclear. Future investigations could use experimental designs, quasi-experimental designs, and intervention strategies to directly test the causal ordering between achievement and ASC.

Second, although the present study found that the relationship between achievement and ASC varied as a function of age and presented some evidence for changes in this relationship over time, it did not provide a full picture of the developmental trajectory of the relationship between the two constructs. Therefore, future studies could follow the same cohorts of participants from childhood to adolescence, using longitudinal studies with a multicohort–multioccasion design, to derive a linear or non-linear trajectory of the relationship between ASC and achievement. In addition, although we have inferred that this development may be due both to improvement in students’ cognitive abilities and to changes in educational contexts, more in-depth research is needed to explore the mechanism underlying the change in the relationship between ASC and achievement.

Third, our study is limited in that we did not systematically search for unpublished studies. Although we searched for unpublished studies to reduce publication bias, we did not request the data on unpublished studies from researchers who had published many studies in the field, which might have resulted in some unpublished data not being included.

Finally, abundant research has revealed that ASC and academic self-efficacy conceptually and empirically represent two distinct constructs (e.g., Bong & Skaalvik, 2003; Ferla et al., 2009; Pietsch et al., 2003), so we only included longitudinal studies where ASC or achievement was measured at least twice. However, it is undeniable that we failed to contrast the difference between ASC and academic self-efficacy when focusing only on ASC.