Introduction

Internalizing and externalizing behaviors represent two broad dimensions of behavioral problems, typically assessed in terms of severity, which respectively encompass depression, anxiety, and social withdrawal (internalizing) and aggression, opposition, and hyperactivity-inattention (externalizing) (e.g., Achenbach 2009; Krueger and Markon 2006). Although these behaviors form distinct dimensions, their co-occurrence is well documented among youth (Gilliom and Shaw 2004; Krueger and Markon 2006; Reitz et al. 2005). However, their temporal ordering remains unclear. Some research supports positive unidirectional relationships according to which externalizing behaviors increase later levels of internalizing behaviors (e.g., Curran and Bollen 2001; Van der Giessen et al. 2013), or internalizing behaviors predict later increases in externalizing behaviors (e.g., Herrenkohl et al. 2010; Vieno et al. 2008). In contrast, accumulating evidence suggests negative longitudinal relationships whereby each problem predicts decreases in the other problem, particularly in adolescence (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Lee and Bukowski 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016; Wiesner 2003). Using teacher ratings of behavioral problems, which are more likely to focus on observable behaviors than self-reports and less likely to suffer from self-consistency and memory biases, this study aims to clarify the temporal ordering of internalizing and externalizing behaviors. Furthermore, this study explores the extent to which the observed relationships generalize to samples of adolescents with average-to-high levels of cognitive abilities relative to those with low levels of cognitive abilities, for whom the mechanisms involved in the co-occurrence of internalizing and externalizing behaviors may differ.

Theoretical Perspectives on the Co-Occurrence of Internalizing and Externalizing Behaviors

The high level of co-occurrence between internalizing and externalizing behaviors in young populations has been well documented, with correlations generally close to, or above, .50 (e.g., Burt et al. 2008; Jackson et al. 2000). Many explain this co-occurrence by the presence of common determinants, and research supports the presence of shared determinants for internalizing and externalizing behaviors (e.g., Caron and Rutter 1991; Burt et al. 2008; Lee and Stone 2012; Moilanen et al. 2010). However, although shared determinants may explain the co-occurrence of internalizing and externalizing behaviors at a specific time point, they are unlikely to explain how each problem may be involved in the development of the other, particularly during adolescence given that the first onset of these behaviors typically occurs in childhood.

Mutually Reinforcing Relationships

Alternative theoretical explanations for the co-occurrence of internalizing and externalizing behaviors focus on the likely impact of one type of problem on the emergence of the other. First, Capaldi’s (1991, 1992) Failure Hypothesis suggests that the presence of externalizing behaviors impinges on the development of competence in the social and academic areas, which precipitates failure experiences in these areas. The accumulation of these failure experiences is proposed to lead to an increased risk of developing internalizing behaviors. Second, the Acting Out Hypothesis claims that youth may “mask” internalizing behaviors by acting out and engaging in externalizing behaviors (Carlson and Cantwell 1980; Gold et al. 1989), a phenomenon that may be amplified in childhood and adolescence when internalizing symptoms tend to be accompanied by increased levels of irritability (Wolff and Ollendick 2006). Even more direct mechanisms have been proposed through which internalizing behaviors increase the reactivity of neural regions associated with threat responses, leading to difficulties of emotion regulation in stressful situations and an increased likelihood of aggressive responses to these situations (Drabick et al. 2010).

Third, internalizing behaviors may also predict increased levels of externalizing behaviors through the action of mechanisms similar to those invoked by the Failure Hypothesis. Thus, internalizing behaviors are also associated with a more aversive interpersonal style, lower levels of social competence, and a tendency to withdraw from social interactions (Joiner and Coyne 1999), possibly leading to peer rejection and victimization (i.e., experiences of failures in the social area), two known predictors of externalizing behaviors (Marsh et al. 2011). Similarly, through their associations with difficulties of concentration, internalizing behaviors may also lead to failures in the academic area (Moilanen et al. 2010; Quiroga et al. 2013). As their experiences of failures in these two areas accumulate, youth may in turn be more likely to join deviant peer networks, further increasing their likelihood of developing externalizing behaviors (Connell and Dishion 2006; Oland and Shaw 2005).

Fourth, these unidirectional hypotheses have recently been integrated in Moilanen et al.’s (2010) Adjustment Erosion Hypothesis. This hypothesis states that both internalizing and externalizing behaviors lead to lower levels of social and academic competence, resulting in accumulated failures, in turn resulting in increased risks for the other type of problem. Mutually reinforcing relationships are thus expected between internalizing and externalizing behaviors. It is noteworthy that the Failure Hypothesis and the Adjustment Erosion Hypothesis implicitly suggest that the reciprocal effects of each behavior on the other might increase with age due to the accumulation of failure experiences. In line with this possibility, a fifth model suggests that adolescence might represent a particularly critical period for investigating possible reciprocal relationships between internalizing and externalizing behaviors. Indeed, in a more comprehensive model hereafter referred to as the Socio-Developmental Milestones Perspective, Oland and Shaw (2005) explicitly propose that these reciprocal relationships should increase with time. More precisely, this model proposes that externalizing behaviors should not lead to the development of internalizing behaviors as long as youth have not reached the socio-developmental milestone of developing cognitive abilities for self-evaluation, self-reflection, and empathy that are necessary to realize the negative impact of externalizing behaviors in their lives. This model also suggests that internalizing behaviors should not lead to the development of externalizing behaviors as long as youth do not develop strong intimate peer relationships. The absence of intimate relationships limits the possibility to experience negative interactions, negative feedback, and rejection, which may in turn reduce the risk of developing externalizing behaviors in reaction to these experiences. As the likelihood of attaining these socio-developmental milestones increases with age, so should the strength of the reciprocal relationships between internalizing and externalizing behaviors.

Mutually Suppressing Relationships?

So far, all theoretical models used in this line of enquiry have assumed positive predictive relationships between these two types of problems, and are thus unable to explain prior reports suggesting that these relationships may in fact be negative in adolescence (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Lee and Bukowski 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016; Wiesner 2003). Interestingly, alternative theoretical perspectives emerging from animal models, neuropsychological research, and from examination of the clinical characteristics of these behavior problems indirectly suggest that these relationships may also be negative, with one type of behavior predicting decreases in the other.

Thus, drawing parallels with the expanded fight, flight or freeze mechanisms involved in reactions to threatening situations (Bracha et al. 2004; Maack et al. 2015), internalizing behaviors are often assumed to correspond to the last two mechanisms, which are incompatible with the display of externalizing responses. Thus, anxiety is known to result in a priming of the typical fight-or-flight response in the anticipation of a threatening situation, but this priming typically results in decisions to avoid the anticipated source of stress (Kunimatsu and Marsee 2012; Zinbarg et al. 1992). As such, it is not surprising to note that avoidance is an important part of the clinical picture of anxiety (Maack et al. 2015; Zinbarg et al. 1992), and tends to be particularly marked in social situations (Miers et al. 2014). It is also worth noting that social withdrawal is considered to represent a key component of internalizing behaviors (Achenbach 2009; Rubin et al. 2009), and that recent research has also shown avoidance to be a key component of the clinical picture of depression (Ottenbreit et al. 2014). An even closer correspondence is assumed between depression and the “freeze” response to threatening situations (Zinbarg et al. 1992). Thus, the Social Competition Hypothesis of depression (Price et al. 1994; Nesse 2000), which has been more recently expanded to include anxiety (Price et al. 2004), proposes that internalizing behaviors are an evolved version of primitive adaptive processes gone awry. Taking the example of ritualistic fighting for dominance, a behavior common in other mammalian species, this theory proposes that depression (and to a lesser extent anxiety) is similar to the automated yielding behaviors implicated in the “freeze” or “flight” response to stressful situations. As was the case for avoidance, these automated “yielding” responses are also incompatible with the display of externalizing responses.

Whereas the above considerations suggest that higher levels of internalizing disorders may possibly predict decreases in levels of externalizing behaviors, additional models and research results suggest that higher levels of externalizing disorders may also predict decreases in levels of internalizing behaviors. Thus, externalizing behaviors may represent a form of adaptation to troubled environments, or a way through which youth may seek alternate forms of success (Hayes and Ciarrochi 2015). Adopting an operant perspective on externalizing behaviors, Snyder and his colleagues (Snyder 2002; Snyder and Patterson 1995; Snyder and Stoolmiller 2002) noted that externalizing behaviors often serve an important function, being associated with a variety of positive and negative reinforcements. For instance, reacting with aggression to situations of coercion, conflict, or even bullying may help to end the aversive situation sooner. Similarly, externalizing behaviors are also known to lead to social interactions with deviant peers who tend to value and reinforce delinquent activities. These delinquent activities may provide short term experiences of “success”, in terms of gaining attention from peers and adults. Altogether, these “desirable” outcomes of externalizing behaviors may compensate for their associated failure experiences, potentially leading to lower levels of internalizing behaviors. Still, it is important to keep in mind that the possible social benefits of externalizing behaviors may only appear during adolescence, based on recent research evidence suggesting that childhood externalizing behaviors rather tend to be associated with less positive social interactions with peers (Gooren et al. 2011; Van Lier and Koot 2010; Van Lier et al. 2012).

The Directionality of the Associations between Internalizing and Externalizing Behaviors

Testing the directionality of reciprocal relationships between two types of behavioral problems requires an autoregressive cross-lagged design, such as the one illustrated in Fig. 1, which includes three measurement points for consistency with the present study. As shown in Fig. 1, identifying the directionality of these associations requires the demonstration that internalizing behaviors at Time T predict externalizing behaviors at Time T + 1 (cross-lagged paths P1 and P2), and that externalizing behaviors at Time T predict internalizing behaviors at Time T + 1 (reciprocal cross-lagged paths R1 and R2), while controlling for the cross-sectional associations (paths C1–C3) and longitudinal stability (paths A1–A2 and B1–B2) (Mitchison et al. 2015). The numerical component (1, 2, 3) associated with each of these relationships (A, B, P, R, and C) indicates that they can be alternatively freely estimated or constrained to equality across time periods (as in tests of predictive equilibrium described later; Cole and Maxwell 2003). Similarly, the subscript g associated with each of these relationships indicates that they can be alternatively freely estimated or constrained to equality across groups of participants in the context of multi-group models.

Fig. 1 Autoregressive Cross-Lagged Model for Testing the Longitudinal Relationships between Internalizing and Externalizing Behaviors
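
For readers who prefer an algebraic statement, the path diagram can be written as the pair of equations below. This is a sketch under stated assumptions rather than the exact specification estimated in the study: the text does not say which stability path belongs to which behavior, so A is assumed here to denote the stability of internalizing behaviors (INT) and B that of externalizing behaviors (EXT). Intercepts, the measurement model, and the group subscript g are omitted, and the residual terms are notation introduced for illustration.

```latex
\begin{align}
\mathrm{INT}_{t+1} &= A_{t}\,\mathrm{INT}_{t} + R_{t}\,\mathrm{EXT}_{t} + \varepsilon^{\mathrm{INT}}_{t+1}, \\
\mathrm{EXT}_{t+1} &= B_{t}\,\mathrm{EXT}_{t} + P_{t}\,\mathrm{INT}_{t} + \varepsilon^{\mathrm{EXT}}_{t+1}, \qquad t = 1, 2, \\
C_{1} &= \operatorname{corr}\bigl(\mathrm{INT}_{1}, \mathrm{EXT}_{1}\bigr), \qquad
C_{t+1} = \operatorname{corr}\bigl(\varepsilon^{\mathrm{INT}}_{t+1}, \varepsilon^{\mathrm{EXT}}_{t+1}\bigr).
\end{align}
```

In this notation, a mutually reinforcing pattern corresponds to positive P and R, a mutually suppressing pattern to negative P and R, and constraining a path to equality across time periods amounts to setting, for example, P1 = P2.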

Self-Reported Studies of Reciprocal Effects

So far, some studies have investigated bidirectional relationships between self-reported ratings of internalizing and externalizing behaviors using a scheme similar to Fig. 1, and relying mainly on samples of participants in their late childhood or adolescence. The results from these studies are inconsistent, with only a minority supporting direct or indirect positive bidirectional relationships, generally small in magnitude (Lee and Stone 2012) or limited to girls (Klostermann et al. 2015; Measelle et al. 2006). Other studies have supported direct or indirect unidirectional positive relationships going from internalizing behaviors to externalizing behaviors (Herrenkohl et al. 2010; Vieno et al. 2008), or from externalizing behaviors to internalizing behaviors, again limited to specific subpopulations (boys: Wiesner 2003; Klostermann et al. 2015; maltreated girls: Brentsilver et al. 2011). Finally, some studies failed to identify any cross-lagged relationships between internalizing and externalizing behaviors (Akse et al. 2007; Overbeek et al. 2001). In a comprehensive study involving four measurement points taken over two years in a sample of adolescents, Wiesner (2003) reported a negative relationship whereby depression measured in wave 2 predicted decreases in delinquency 6 months later among girls. In a similar study of Korean fourth graders repeatedly measured over a 4 year period, Lee and Bukowski (2012) found that initial levels of externalizing behaviors predicted decreases over time in internalizing behaviors among boys and girls, whereas internalizing behaviors predicted decreases in externalizing behaviors among boys.

An important limitation of these studies is their exclusive reliance on self-reports (e.g., Van der Giessen et al. 2013), which raises two potential problems. First, self-reported data may lead to inflated estimates of stability (the autoregressive paths A1–A2 and B1–B2) due to self-consistency and memory biases whereby respondents recall their prior answers, or interpret their current behaviors in light of their past. As a result of potentially inflated stability estimates, it might become harder to predict change over time. Arguably, similar biases are likely to be present in any report of behavioral problems provided by the same informant over time. Teachers’ reports of youth behaviors as they occur in school over multiple school years may remedy this limitation given that teachers change each year so that their assessments have to remain based on observed behaviors, rather than prior knowledge of past behaviors. A second limitation of self-report data is that they may lead to inflated estimates of the cross-lagged relationships between constructs. Although the focus of such assessments should be on behavioral problems of sufficient severity to be observable, youth knowledge of their own internal states may lead them to over-estimate some of the relationships. For instance, they are likely to be aware of the sadness or anxiety that accompanies their delinquent activities, they may feel withdrawn from interactions with socially competent peers even if they externally appear well integrated into a delinquent peer group, or they may know that their acting out behaviors are attempts to cover up internalizing symptoms. In contrast, informant reports are more likely to focus on observable behaviors, allowing for a clearer understanding of which observable behavior leads to the other.

Informant-Reported Studies of Reciprocal Effects

We are aware of thirteen studies which have relied on informant reports. The first four focused on samples of children rated by their parents. Mathiesen et al. (2009) and Stone et al. (2015) failed to identify any form of cross-lagged relationships among parental ratings of young children’s internalizing and externalizing behaviors, whereas Curran and Bollen (2001) found evidence of significant relationships between externalizing behaviors and later increases in internalizing behaviors among a sample of slightly older children. Boylan et al. (2010) relied on data from a sample of children initially aged 6–7 and followed biannually across three measurement waves. Their results revealed small positive relationships between maternal ratings of boys’ oppositional symptoms and later increases in depressive symptoms, but this effect did not generalize across time waves. Among girls, the authors found evidence of negative relationships between mothers’ ratings of depressive symptoms and later decreases in oppositional symptoms (again limited to a subset of time waves), supporting Wiesner’s (2003) observation of negative relationships among girls. However, these authors also reported that the measurement model underlying mothers’ ratings of depressive symptoms among girls was not invariant over time, casting doubts on the comparability of results across time waves.

The next four studies relied on a combination of self-reports and parental reports in samples of children and adolescents. Van der Giessen et al. (2013) assessed reciprocal relationships between self-reported ratings of depressive symptoms and parental ratings of adolescents’ aggressive behaviors in a three-wave longitudinal study with yearly intervals. The results revealed positive longitudinal relationships whereby aggressive behaviors predicted later increases in depressive symptoms, whereas the opposite relationship between depressive symptoms and later levels of aggressive behaviors was non-significant. Using data from a sample of at-risk boys measured at ages 6, 8, 10, 11, and 12 through parental reports and self-reports, Moilanen et al. (2010) reported similar relationships, albeit limited to a subset of time waves. Burt et al. (2008; for similar results obtained with the same sample, see Masten et al. 2005) tested the longitudinal relationships among ratings of internalizing and externalizing behaviors also obtained from a combination of parental reports and self-reports taken in early adolescence and 7, 10, and 20 years later. In this study, the only significant longitudinal relationship was a negative one between ratings of internalizing behaviors at age 17 and ratings of externalizing behaviors at age 20. Finally, Rogosch et al. (2010) followed a sample of 415 children initially aged between 7 and 9 years across four time points until they reached their late adolescent years (15–18 years). In childhood, they relied on combined ratings of internalizing and externalizing behaviors provided by camp counsellors and teachers, whereas only self-reports were used in adolescence. This study revealed that internalizing behaviors in early childhood and early adolescence respectively predicted a decrease in externalizing behaviors in late childhood and late adolescence. These studies (Burt et al. 2008; Rogosch et al. 2010) support the presence of negative relationships between internalizing and externalizing behaviors (Boylan et al. 2010; Lee and Bukowski 2012; Wiesner 2003).

Apart from Rogosch et al. (2010), we are aware of five additional studies relying on teachers’ ratings of children’s internalizing and externalizing behaviors, three of which relied on much younger samples than the present study. Van Lier and Koot (2010) annually followed a sample of children from kindergarten to Grade 4. Although their results revealed a direct positive relationship between teachers’ ratings of externalizing behaviors in kindergarten and internalizing behaviors in Grade 1, none of the other direct relationships proved significant. In a second study of kindergartners followed over a two-year period, Gooren et al. (2011) also reported evidence of a significant relationship between teachers’ ratings of conduct disorders and later levels of depression, but showed this relationship to be mediated by the effects of conduct disorders on peer rejection. No evidence was found of a reciprocal relationship, direct or indirect, between teachers’ ratings of depressive symptoms and later levels of conduct disorders. The third study (Van Lier et al. 2012) essentially replicated these findings among a sample of slightly older children aged between 6 and 8, demonstrating an indirect relationship (with no evidence of a direct relationship) between teachers’ ratings of externalizing behaviors at age 6 and internalizing behaviors at age 8, as mediated by lower levels of academic achievement and higher levels of peer victimization at age 7. Again, no direct or indirect relationship was found between teachers’ ratings of internalizing behaviors and later levels of externalizing behaviors.

Two other studies considered samples of children followed up into adolescence. Englund and Siebenbruner (2012) reported negative relationships between teachers’ ratings of internalizing behaviors at age 7 and levels of externalizing behaviors at age 9. However, none of the reciprocal relationships between teachers’ ratings of externalizing behaviors and later levels of internalizing behaviors, or of the later relationships between internalizing and externalizing behaviors (age 9–12 and 12–16) proved significant, which could possibly be explained by the small sample size used in this study (N = 191). Van der Ende et al. (2016) considered possible reciprocal relationships between internalizing and externalizing behaviors in adolescence, contrasting parent and teacher reports. Parental reports of externalizing behaviors positively predicted later increases in parental reports of internalizing behaviors, whereas no evidence was found to support the reciprocal relationship between internalizing and externalizing behaviors when using parental reports. In contrast, teachers’ reports revealed a negative relationship between ratings of internalizing behaviors and later levels of externalizing behaviors.

In sum, prior results obtained using the scheme presented in Fig. 1 remain inconsistent, suggesting that reciprocal relationships between internalizing and externalizing behaviors may be much smaller than previously assumed. Yet, accumulating evidence suggests that these relationships may be negative, particularly in research conducted among samples of adolescents or older children, and using informant reports (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Lee and Bukowski 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016; Wiesner 2003).

Predictive Equilibrium

When examining these prior results, it is noteworthy that among the few studies including more than two time waves, most studies in which significant relationships were observed found these relationships to be limited to a subset of the time intervals considered (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Klostermann et al. 2015; Masten et al. 2005; Moilanen et al. 2010; Rogosch et al. 2010; Van Lier and Koot 2010; Van Lier et al. 2012; Wiesner 2003). However, few studies directly and quantitatively tested whether the predictive relationships could be considered to be different, or equivalent, across time waves—what is commonly referred to as a test of predictive equilibrium (Cole and Maxwell 2003; Mitchison et al. 2015). The two studies that tested the predictive equilibrium of the relationships supported their generalizability across three time waves (Van der Ende et al. 2016; Van der Giessen et al. 2013), suggesting that such tests might be important. Thus, it remains unknown whether the observed discrepancies across time waves are related to random sampling variations around a common set of population parameters, or whether these discrepancies reflect meaningful developmental differences. Statistically, a predictive model that has reached equilibrium has the advantage of being more parsimonious (i.e., resulting in the free estimation of fewer time-invariant predictive paths), thus maximising the statistical power of the analyses and the stability of the estimation process.

More precisely, when examining the reciprocal relationships between internalizing and externalizing behaviors, there are three distinct elements to consider in the assessment of the predictive equilibrium of the system (Cole and Maxwell 2003; Mitchison et al. 2015). A first test concerns whether the stability of internalizing and externalizing problems varies across time periods (i.e., whether the autoregressive paths A1 and B1 estimated between T and T + 1 are equal to the same paths A2 and B2 estimated across T + 1 to T + 2). So far, research shows that both types of problems tend to present high levels of stability and chronicity (Beyers and Loeber 2003; Burt et al. 2008; Van der Ende et al. 2016; Van der Giessen et al. 2013). There is, however, also evidence of time variation in the stability of internalizing and externalizing behaviors, with studies suggesting that stability may increase with age (Buist et al. 2004; Reitz et al. 2005).

A second test concerns whether cross-lagged relationships between internalizing and externalizing behaviors vary across time (i.e., whether cross-lagged paths P1 and R1 estimated between T and T + 1 are equal to the same paths P2 and R2 estimated across T + 1 to T + 2). This test verifies whether the impact of one problem on the other changes as a function of time. For example, the Failure Hypothesis and the Adjustment Erosion Hypothesis both suggest that the reciprocal effects of each type of behavior on the other might increase with age due to the accumulation of failure experiences, a possibility that appears supported by Wiesner (2003). Similarly, Oland and Shaw’s (2005) Socio-Developmental Milestones Perspective explicitly suggests that the reciprocal relationships between internalizing and externalizing behaviors should become more marked during adolescence.

A third test concerns whether the time-specific correlations between internalizing and externalizing behaviors differ across time periods (i.e., whether the correlation C1 estimated at Time T is equal to the correlations C2 and C3 estimated at Time T + 1 and T + 2). This test verifies whether co-occurrence due to the impact of shared determinants increases or decreases over time. So far, research suggests that co-occurrence remains relatively stable across developmental periods (Beyers and Loeber 2003; Reitz et al. 2005), consistent with the idea that it reflects the action of shared determinants.
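
The logic of comparing freely estimated and time-constrained paths can be illustrated with a small simulation. The sketch below is not the analytic approach of the present study: it assumes observed (not latent) scores, ordinary least squares instead of structural equation modeling, and arbitrary illustrative values for the paths. All function names and parameter values are hypothetical. It generates three waves of data from a system in equilibrium, estimates A, B, P, and R separately for each interval, and then re-estimates them after pooling both intervals, which serves as a rough analogue of constraining the paths to equality over time.

```python
import math
import random


def correlated_pair(rng, sd, c):
    """Draw two normal deviates with standard deviation sd and correlation c."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return sd * z1, sd * (c * z1 + math.sqrt(1.0 - c * c) * z2)


def simulate_panel(n, a, b, p, r, c, seed=0):
    """Return three waves, each a list of (INT, EXT) pairs for n simulated youth.

    The same paths are used for both intervals, so the simulated system is
    in predictive equilibrium by construction.
    """
    rng = random.Random(seed)
    waves = [[correlated_pair(rng, 1.0, c) for _ in range(n)]]
    for _ in range(2):
        nxt = []
        for int_t, ext_t in waves[-1]:
            e_int, e_ext = correlated_pair(rng, 0.7, c)
            nxt.append((a * int_t + r * ext_t + e_int,    # stability A, cross-lag R
                        b * ext_t + p * int_t + e_ext))   # stability B, cross-lag P
        waves.append(nxt)
    return waves


def ols_two_predictors(x1, x2, y):
    """Slopes of y on x1 and x2 (intercept included) via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def fit_paths(prev, nxt):
    """Estimate A, R (predicting INT) and P, B (predicting EXT) for one interval."""
    int_t = [row[0] for row in prev]
    ext_t = [row[1] for row in prev]
    a_hat, r_hat = ols_two_predictors(int_t, ext_t, [row[0] for row in nxt])
    p_hat, b_hat = ols_two_predictors(int_t, ext_t, [row[1] for row in nxt])
    return {"A": a_hat, "R": r_hat, "P": p_hat, "B": b_hat}


def fit_constrained(waves):
    """Pool both intervals so that a single set of paths is estimated."""
    return fit_paths(waves[0] + waves[1], waves[1] + waves[2])


if __name__ == "__main__":
    # Illustrative values only: stable behaviors, small negative cross-lags.
    waves = simulate_panel(4000, a=0.6, b=0.6, p=-0.1, r=-0.1, c=0.5, seed=1)
    for t in range(2):
        print(f"interval {t + 1} (free):", fit_paths(waves[t], waves[t + 1]))
    print("pooled (constrained):", fit_constrained(waves))
```

Because the pooled fit estimates four paths from twice as many observations, its estimates are less variable than the interval-specific ones, which mirrors the parsimony and power argument made for models that have reached equilibrium. In an actual analysis, the constrained and unconstrained models would be compared with formal fit indices within a structural equation modeling framework.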

Generalizability to Youth with Low Levels of Cognitive Abilities

Research has demonstrated that youth with low levels of cognitive abilities present a higher likelihood of exhibiting internalizing and externalizing behaviors than their peers with average-to-high levels of cognitive abilities, making it particularly important to study the mechanisms involved in the development of these behaviors in this population (Dekker et al. 2002; Metcalfe et al. 2013; Weeks et al. 2014; Whitaker and Read 2006). These higher levels of internalizing and externalizing behaviors are not surprising considering that risk factors for these behaviors also tend to be more prevalent among youth with low levels of cognitive abilities than among their peers with average-to-high levels of cognitive abilities, including more limited social skills, as well as higher levels of exposure to stressful life conditions, peer rejection, and victimization (Rose et al. 2011; Valas 1999; Wehmeyer 2005). However, although numerous studies have demonstrated higher levels of internalizing and externalizing behaviors among youth with low levels of cognitive abilities, no research has yet examined the reciprocal relationships between these behaviors in this population. This limitation is important as it restricts the extent to which knowledge gained in the far more numerous studies conducted among youth with average-to-high levels of cognitive abilities can be generalized to those with low levels of cognitive abilities.

As reviewed above, the broader research literature aiming to explain the relationships between internalizing and externalizing behaviors typically relies on the Failure Hypothesis (Capaldi 1991, 1992) and the Adjustment Erosion Hypothesis (Moilanen et al. 2010). These models rely on the assumption that each type of behavioral problem is likely to increase the risk of experiencing failures in a variety of domains, which in turn should lead to increased risks of also developing the other type of problem. Of particular relevance to youth with low levels of cognitive abilities, the Dual-Risk Model (Sameroff 1983) proposes that vulnerability traits, such as low levels of cognitive abilities, tend to increase the risk of experiencing problematic behavioral outcomes when exposed to environmental stressors. With the higher prevalence of internalizing and externalizing behaviors among youth with low levels of cognitive abilities, and their more frequent experiences of stresses (or “failures”) in both the social and educational areas (e.g., Craven et al. 2015), the combination of these hypotheses suggests stronger reciprocal relationships between internalizing and externalizing behaviors among youth with low levels of cognitive abilities than among their peers with average-to-high levels of cognitive abilities.

In contrast, the more limited cognitive and social skills of youth with low levels of cognitive abilities are also likely to interfere with the mechanisms proposed to be at play in explaining possible reciprocal relationships between both types of behavioral problems. Thus, Oland and Shaw (2005) suggest that reciprocal relationships between internalizing and externalizing behaviors are conditional on the attainment of specific socio-developmental milestones. The cognitive deficits that characterize youth with low levels of cognitive abilities make it less likely for them to successfully attain these developmental milestones, and may even lead them to develop altogether different definitions of “success” (e.g., Weeks et al. 2014). These observations suggest weaker reciprocal relationships between internalizing and externalizing behaviors among youth with low levels of cognitive abilities than among youth with average-to-high levels of cognitive abilities. Similarly, even though internalizing behaviors are known to lead to higher levels of social rejection, withdrawal, and victimization (Joiner and Coyne 1999), such experiences are already far more frequent for youth with low levels of cognitive abilities (Rose et al. 2011; Valas 1999; Wehmeyer 2005), and thus less likely to be further increased by internalizing behaviors. For youth with low levels of cognitive abilities already experiencing heightened social and educational difficulties, the power of these experiences to further enhance internalizing or externalizing behaviors (as proposed in the Failure Hypothesis, Adjustment Erosion Hypothesis, and Dual-Risk Model) may thus be minimal, suggesting weaker relationships than among youth with average-to-high levels of cognitive abilities.

Given that the co-occurrence of internalizing and externalizing behaviors is associated with greater impairment in terms of psychological, physiological, and social wellbeing (Fanti and Henrich 2010; Newman et al. 1998), which already tends to be compromised among youth with low levels of cognitive abilities (Helps 2015; Kiddle and Dagnan 2011), understanding the interplay between both types of problems appears to be of critical importance in this population. The dearth of robust empirical investigations of the mechanisms at work in this vulnerable population makes such investigations all the more important.

The Present Study

The first aim of the present study is to provide further insights on the nature of the longitudinal associations between teachers’ ratings of adolescents’ levels of observable internalizing and externalizing behaviors as they occur over three consecutive years. The theoretical perspectives typically invoked to explain how internalizing and externalizing behaviors may be reciprocally related over time (i.e., the Failure Hypothesis, the Acting Out Hypothesis, the Adjustment Erosion Hypothesis, and the Socio-Developmental Milestones Perspective) assume mutually reinforcing relationships between these types of behaviors whereby each one predicts increases in the other over time. A number of studies have provided tentative support for these hypotheses, showing either bidirectional (e.g., Lee and Stone 2012), or unidirectional (e.g., Curran and Bollen 2001; Herrenkohl et al. 2010; Van der Giessen et al. 2013) positive relationships among internalizing and externalizing behaviors, although additional studies failed to find evidence of significant relationships (e.g., Akse et al. 2007; Mathiesen et al. 2009; Overbeek et al. 2001). However, accumulating evidence, particularly from studies relying on informant reports of internalizing and externalizing behaviors conducted among samples of adolescents, suggests that these relationships might be negative (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Lee and Bukowski 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016; Wiesner 2003). Although theoretical perspectives emerging from biological (fight-flight-freeze) or clinical (avoidance) research indirectly explain these unexpected negative relationships, these alternative models have never been used to guide research in this area.
Furthermore, a key limitation of prior research is a frequent lack of consideration of the extent to which the relationships generalize to the various time intervals considered through systematic tests of predictive equilibrium. This limitation could have restricted the ability of these prior studies to detect significant relationships, and makes it impossible to determine whether any observed discrepancy reflects meaningful developmental differences rather than random sampling variations. Thus, although both theory and research allow us to expect significant reciprocal longitudinal relationships among teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors, the direction of these relationships remains an open research question due to the accumulation of contradictory research evidence. In sum, we consider the following two research questions:

Will the reciprocal relationships between teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors be mutually reinforcing (i.e., positive) or mutually suppressing (i.e., negative) (Research Question 1)?

To what extent will the reciprocal relationships between teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors generalize across the two time intervals considered between the three annual waves of measurement of the present study (Research Question 2)?

The second aim of this study is to assess whether the longitudinal relationships between teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors generalize across matched samples of adolescents with low levels of cognitive abilities and average-to-high levels of cognitive abilities. Although prior research leads us to expect higher levels of internalizing and externalizing behaviors among adolescents with low levels of cognitive abilities compared to their peers with average-to-high levels of cognitive abilities, evidence regarding the associations between these two types of behavioral problems is lacking among populations of youth with low levels of cognitive abilities. From a theoretical perspective, differential relationships can be expected among these two groups of adolescents. However, the nature of these differences remains again an open research question. Indeed, whereas some theoretical frameworks (i.e., the Failure Hypothesis, the Adjustment Erosion Hypothesis, and the Dual-Risk Model) suggest stronger reciprocal relationships between internalizing and externalizing behaviors among adolescents with low levels of cognitive abilities when compared to their peers with average-to-high levels of cognitive abilities, other perspectives suggest weaker reciprocal relationships (e.g., the Socio-Developmental Milestones Perspective). In sum, we consider the following research question and hypothesis:

Adolescents with low levels of cognitive abilities will present higher mean levels of internalizing and externalizing behaviors than those with average-to-high levels of cognitive abilities (Hypothesis 1).

To what extent will the reciprocal relationships between teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors generalize across matched samples of adolescents with low versus average-to-high levels of cognitive abilities (Research Question 3)?

Method

Sample, Procedure, and Matching

This study relies on a sample drawn from the Wollongong Youth Study, which was conducted in a number of regular secondary schools from the same Catholic Diocese and located in the regional and metropolitan areas of Wollongong (New South Wales, Australia). In 2003, all adolescents attending Grade 7 in the participating schools were targeted for participation and followed annually thereafter, resulting in a total sample of N = 979 adolescents (503 males; 476 females; 12–14 years old; M age = 12.41). We focus on three measurement waves (Grades 8, 9, and 10) for which teachers’ ratings of internalizing and externalizing behaviors were available. Socioeconomic indicators, such as family occupation, structure, and first language, closely match National Australian trends at the time of the study as reported by the Australian Bureau of Statistics (2005), supporting the idea that this sample is representative of the Australian population (Ciarrochi et al. 2012). This longitudinal study received annual approval from the university research ethics committee. Parents and adolescents also provided informed consent on an annual basis. For additional details on the Wollongong Youth Study, see Ciarrochi et al. (2012) and Heaven et al. (2009).

Upon entering secondary schools (in Grade 7), all students were required by the Department of Education and Training to complete two standardized measures of verbal and numerical aptitudes. The first test, called “English Language and Literacy Assessment (ELLA)”, assesses verbal aptitudes in writing, reading, and language (α = .87), while the second test, called “Secondary Numeracy Assessment Program (SNAP)”, assesses numeracy aptitudes in number, measurement, space, data, and numeracy problem-solving (α = .95). Although these tests are not specifically designed to assess intelligence (IQ), similar tests of cognitive abilities are known to underpin a common global (g) factor (Deary et al. 2007; Frey and Detterman 2004). Furthermore, these specific tests have been found to be significantly related to the abbreviated Wechsler Scale of Intelligence (Heaven et al. 2011), to significantly predict future academic performance (Heaven and Ciarrochi 2008), and to represent valid proxy measures of global IQ (Ciarrochi et al. 2012; Heaven and Ciarrochi 2012; Heaven et al. 2011; also see Weeks et al. 2014). In order to select the subsample of participants with low levels of cognitive abilities, we retained all adolescents who scored in the lowest 15 % on the score distribution of the ELLA and SNAP tests. The 15th percentile was selected as the cut-off point as (a) prevalence estimates of students with low levels of cognitive abilities in mainstream settings in Australia fall between 12 and 16 % (OECD 1999), and (b) 15 % of the population have an IQ that falls one standard deviation below the mean IQ score (Wechsler et al. 2003).
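As an illustration only, the percentile-based selection described above could be sketched as follows. The text does not specify how the ELLA and SNAP scores were combined, so the composite used here (the mean of the two standardized scores) is an assumption, as are the function names and the example scores, which are hypothetical and not data from the study.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def lowest_percent(ella, snap, percent=15):
    """Return the indices of students in the lowest `percent` of a composite.

    ASSUMPTION: the composite is the mean of the two standardized scores;
    the source does not state how ELLA and SNAP were combined.
    """
    composite = [(a + b) / 2 for a, b in zip(z_scores(ella), z_scores(snap))]
    # Integer arithmetic so the retained group never exceeds the cut-off.
    n_keep = len(composite) * percent // 100
    ranked = sorted(range(len(composite)), key=lambda i: composite[i])
    return sorted(ranked[:n_keep])


# Hypothetical scores for 20 students (not data from the study).
ella = [50 + i for i in range(20)]
snap = [40 + 2 * i for i in range(20)]
low_group = lowest_percent(ella, snap)
```

With 20 hypothetical students and a 15 % cut-off, three students are retained. A different composite (e.g., requiring a low score on both tests) would be equally compatible with the description given in the text.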

This procedure resulted in a sample of N = 138 adolescents with low levels of cognitive abilities (90 boys; 48 girls; 12–14 years old; M age = 12.41). Most (79.7 %) were from an Anglo-Australian background. In terms of socio-economic status, 7.4 % had parents corresponding to an underclass level (e.g., retired, unemployed), 39.0 % to a working class level, 44.0 % to an intermediate social class level, and 9.6 % to a salariat (i.e., white collar professional, managers) social class level. These adolescents are best described as performing below or at national minimum standards in literacy and numeracy (Australian Curriculum, Assessment and Reporting Authority 2015). The higher proportion of males is not surprising as it is well documented that males are more likely to be identified as presenting learning difficulties than females (Australian Bureau of Statistics 2012; Westwood and Graham 2000).

Using the MatchIt package implemented in R (Ho et al. 2011; R Core Team 2013), we extracted a matched comparison sample of N = 556 adolescents with average-to-high levels of cognitive abilities (312 boys; 244 girls; 12–14 years old; M age = 12.41; 85.6 % native English speakers) from the rest of the Wollongong Youth Study sample. For purposes of the present study, exact matching was conducted on the basis of school, gender, age in Grade 7, and first language. To retain as much information as possible, we matched up to six participants with average-to-high levels of cognitive abilities to a single comparable participant with low levels of cognitive abilities. This 6:1 matching resulted in a set of weights that were used in all models so that the sample reflected the matching procedure (see Thoemmes and Kim 2011, for a review of propensity score matching procedures). These weights are such that their sum in each group is equal to the adolescents with average-to-high levels of cognitive abilities. Most (82.0 %) of these participants were from an Anglo-Australian background. In terms of socio-economic status, 4.2 % had parents corresponding to an underclass level, 28.6 % to a working class level, 49.2 % to an intermediate level, and 18.0 % to a salariat level. Because age, gender, and first language were used in the matching procedure, it is not surprising to note that these two samples did not differ from one another in terms of age, gender, and ethnic minority status (p ≥ .05). However, the sample of adolescents with average-to-high levels of cognitive abilities tended to come from a slightly higher socio-economic background (M = 1.97) than their peers with low levels of cognitive abilities (M = 1.79; p ≤ .05).
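The logic of exact matching with up to six comparison participants per target participant can be illustrated with a short sketch. This is not the R code used in the study and does not reproduce the output of the matching package. It is a simplified Python illustration under stated assumptions: participants are grouped into strata sharing the same matching key, comparison participants beyond the 6:1 limit are dropped in list order (an arbitrary choice), and comparison weights follow one common stratum-weighting convention in which they sum to the number of retained comparison participants. All identifiers and records are hypothetical.

```python
from collections import defaultdict


def exact_match_weights(records, max_ratio=6):
    """Exact matching on `key`, keeping up to `max_ratio` comparisons per case.

    records: dicts with 'id', 'low' (True for low cognitive abilities) and
    'key' (hypothetical tuple: school, gender, age, first language).
    """
    strata = defaultdict(lambda: {"low": [], "comp": []})
    for r in records:
        strata[r["key"]]["low" if r["low"] else "comp"].append(r["id"])

    # Keep only strata that contain both kinds of participants.
    kept = []
    for s in strata.values():
        if s["low"] and s["comp"]:
            limit = max_ratio * len(s["low"])
            kept.append((s["low"], s["comp"][:limit]))  # arbitrary truncation

    n_low = sum(len(low) for low, _ in kept)
    n_comp = sum(len(comp) for _, comp in kept)

    weights = {}
    for low, comp in kept:
        for i in low:
            weights[i] = 1.0
        # ASSUMED convention: comparison weights sum to n_comp overall.
        w = (len(low) / len(comp)) * (n_comp / n_low)
        for i in comp:
            weights[i] = w
    return weights


# Hypothetical records (not data from the study).
records = (
    [{"id": "A_low", "low": True, "key": ("A", "M", 12, "EN")}]
    + [{"id": f"A_c{i}", "low": False, "key": ("A", "M", 12, "EN")} for i in range(8)]
    + [{"id": "B_low", "low": True, "key": ("B", "F", 12, "EN")}]
    + [{"id": f"B_c{i}", "low": False, "key": ("B", "F", 12, "EN")} for i in range(2)]
    + [{"id": f"C_c{i}", "low": False, "key": ("C", "F", 13, "EN")} for i in range(3)]
)
weights = exact_match_weights(records)
```

In this hypothetical example, stratum A retains six of its eight comparison participants, stratum B retains both, and stratum C is discarded because it contains no participant with low levels of cognitive abilities.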

Instruments

Demographics

At the beginning of the study, participants self-reported their age (in unit increments) and gender (coded 0 for males and 1 for females). They also reported the main occupation of both of their parents, as well as the language they spoke most often at home, their birth country, and the birth country of their parents. All participants who reported having been born in Australia, speaking English at home, and having at least one parent also born in Australia were coded as being from an Anglo-Australian background (0) whereas the remaining participants were coded as being from an ethnic minority background (1). Finally, parental occupations were classified as a function of the major groupings of the Australian and New Zealand Standard Classification of Occupations (Australian Bureau of Statistics 2006). These were then simplified into broad categories of salariat (coded 3: including white collar professionals, managers, etc.), intermediate (coded 2: including trades, advanced clerical, service, etc.), working (coded 1: including elementary clerical or service workers, laborers, production, etc.), and underclass (coded 0: including unemployed, retired, homemakers, and pensioners; Jackson et al. 2012). These ratings were averaged across parents into a single measure of socio-economic status.
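The coding scheme described above can be summarized in a brief sketch. The function and variable names are hypothetical, and the handling of a missing parental occupation (averaging over the available parent only) is an assumption that the text does not address.

```python
# Social class codes as described in the text.
SES_CODES = {"underclass": 0, "working": 1, "intermediate": 2, "salariat": 3}


def ses_score(parent_classes):
    """Average the social class codes of the parents.

    ASSUMPTION: a parent with no reported occupation (None) is skipped.
    """
    codes = [SES_CODES[c] for c in parent_classes if c is not None]
    return sum(codes) / len(codes) if codes else None


def minority_status(born_in_australia, english_at_home, parent_born_in_australia):
    """Return 0 for an Anglo-Australian background, 1 for an ethnic minority."""
    anglo = born_in_australia and english_at_home and parent_born_in_australia
    return 0 if anglo else 1


example_ses = ses_score(["salariat", "working"])
example_minority = minority_status(True, False, True)
```

Here a participant with one salariat and one working class parent receives a socio-economic score of 2.0, and a participant who does not speak English at home is coded 1 regardless of birth country.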

Internalizing and Externalizing Behaviors

At the end of each school year, homeroom teachers provided ratings of adolescents’ internalizing and externalizing behaviors. Homeroom teachers meet with the target adolescent on a daily basis and are in charge of monitoring their school life across disciplines in a specific school year. Homeroom teachers change across school years for any given adolescent. The items used for these ratings originate from the teacher version of the Multidimensional Peer Nomination Inventory (Pulkkinen et al. 1999). Internalizing behaviors were assessed by seven items (α = .83–.84 across time waves) covering symptoms of depression (e.g., “Is sad and depressed”) and anxiety (e.g., “Is shy in front of other students”). Externalizing behaviors were assessed by 13 items (α = .91–.93 across time waves) covering direct (e.g., “Teases smaller and weaker students”) and indirect (e.g., “Spreads rumors about other people’s personal matters when he/she is mad at them”) aggression, and symptoms of hyperactivity (e.g., “Talks all the time”) and inattention (e.g., “Is unable to concentrate on anything”). Ratings were given on a 4-point scale ranging from 0 (indicating that a specific behavior has not been observed) to 3 (indicating that the behavior describes the student very well). In this study, global scales of internalizing and externalizing behaviors are used.

A number of studies support the reliability and validity of these measures. Pulkkinen et al. (1999) showed that teachers’ ratings of boys’ and girls’ (aged 12) internalizing and externalizing behaviors on this measure presented a satisfactory level of scale score reliability (α = .93–.94 for externalizing and .77–.82 for internalizing), factor validity, and concurrent validity with parent and peer ratings of the same variables. Pulkkinen et al. (1999) also showed that these teachers’ ratings were related in theoretically expected ways with gender (e.g., with boys scoring higher on externalizing behaviors), and were stronger predictors of peer reports of internalizing and externalizing behaviors than parent ratings. More recently, Heaven et al. (2009) demonstrated that teachers’ ratings on these measures presented satisfactory levels of scale score reliability (α = .88–.93), were moderately stable over time (Grades 7–11), and presented concurrent validity with adolescents’ self-reported ratings of antisocial personality. Ciarrochi et al. (2007) also provided evidence of scale score reliability (α = .83–.93), as well as of concurrent and predictive validity for teachers’ ratings on this instrument in relationship with adolescents’ (aged 12–14) self-reported ratings of hope, self-esteem, and positive attributional style.

Analyses

All models were estimated, while incorporating sample weights, using the robust weighted least squares estimator (WLSMV) available in Mplus 7.3 (Muthén and Muthén 2014) to account for the ordered categorical nature of the four-point response scales used for teacher ratings (Finney and DiStefano 2013; Rhemtulla et al. 2012). WLSMV estimation thus makes no assumption regarding the underlying normality of the response scales, and results in the estimation of normally distributed latent factors, which represent the key variables of interest in the present study.

In this study, teachers completed an average of 2.67 annual ratings of the participants, with 77.4 % of the participants having access to 3 sets of annual ratings, 12.1 % of the participants having access to 2 sets of annual ratings, and 10.5 % of the participants having access to 1 set of annual ratings, for a total of 1852 sets of time-specific observations. A total of 637 participants had access to teachers’ ratings in Grade 8, 613 in Grade 9, and 577 in Grade 10. Comparable figures for participants with average-to-high levels of cognitive abilities are: M = 2.66; 3 sets: 77.0 %; 2 sets: 12.2 %; 1 set: 10.8 %; total: 1480; Grade 8: 505; Grade 9: 491; Grade 10: 462. Comparable figures for participants with low levels of cognitive abilities are: M = 2.70; 3 sets: 79.0 %; 2 sets: 11.6 %; 1 set: 9.4 %; total: 372; Grade 8: 132; Grade 9: 122; Grade 10: 115. Once missing time waves are taken into account, none of the participants had more than one time-specific missing response at the item level. To account for missing responses, models were estimated using all of the available information, relying on the algorithms implemented in Mplus for WLSMV (Asparouhov and Muthén 2010). This procedure has comparable efficacy to multiple imputation, while being more efficient (Enders 2010; Larsen 2011), and allows missing data to be conditional on all variables included in the model.

The main predictive model estimated in this study is illustrated in Fig. 1, where the ovals represent the latent constructs estimated directly from their items. This model will be estimated as a multi-group model across samples of participants with low, versus average-to-high, levels of cognitive abilities. The results from this model will be used to address Research Question 1. The measurement part from this model was specified as invariant across groups and time waves based on the results from preliminary CFA models reported in the Appendix, which helps to maximize the parsimony, stability, and statistical power of the model. Latent means estimated as part of these preliminary CFA models will further be used to test Hypothesis 1. In the estimation of multiple-group longitudinal models, it is important to assess whether the predictive system has reached equilibrium (Cole and Maxwell 2003; Mitchison et al. 2015). These tests verify whether the overall pattern of associations between internalizing and externalizing behaviors generalizes across time periods. We started with the estimation of a model in which all predictive paths and correlations were freely estimated across time and groups. From this model, invariance constraints were progressively added to the autoregressive paths (A and B), the cross-lagged paths (P and R), and the time-specific correlations (C). For all tests of invariance, we started with a conservative model in which constraints were imposed across groups and time points, before moving to less demanding models of invariance across groups or time when results failed to support complete invariance assumptions. The results from these tests will further be used to address Research Questions 2 (equivalence of the predictions across time intervals) and 3 (equivalence of the predictions across groups).
These models were then re-estimated while controlling for the effects of age, gender, minority status, and socio-economic status on the estimated relationships.
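For readers who prefer an algebraic statement, the structural part of an autoregressive cross-lagged model of this kind can be written as follows. This is a generic sketch rather than a transcription of Fig. 1: the notation is ours, and the assignment of the labels A and B to the autoregressive paths, and of P and R to the cross-lagged paths, for internalizing (INT) and externalizing (EXT) behaviors respectively is an assumption. Here g indexes the group and t the measurement wave.

```latex
\begin{align}
\mathrm{INT}^{(g)}_{t+1} &= A^{(g)}_{t}\,\mathrm{INT}^{(g)}_{t} + R^{(g)}_{t}\,\mathrm{EXT}^{(g)}_{t} + \zeta^{(g)}_{\mathrm{INT},\,t+1} \\
\mathrm{EXT}^{(g)}_{t+1} &= B^{(g)}_{t}\,\mathrm{EXT}^{(g)}_{t} + P^{(g)}_{t}\,\mathrm{INT}^{(g)}_{t} + \zeta^{(g)}_{\mathrm{EXT},\,t+1} \\
C^{(g)}_{t+1} &= \operatorname{corr}\!\left(\zeta^{(g)}_{\mathrm{INT},\,t+1},\ \zeta^{(g)}_{\mathrm{EXT},\,t+1}\right)
\end{align}
```

Under this notation, complete predictive equilibrium and equivalence across groups would correspond to the constraints $A^{(g)}_{t} = A$, $B^{(g)}_{t} = B$, $P^{(g)}_{t} = P$, and $R^{(g)}_{t} = R$ for all g and t, with analogous constraints on the time-specific correlations $C^{(g)}_{t}$. At the first wave, where no predictors are included, the time-specific correlation is taken between the latent constructs themselves rather than between disturbances.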

Model fit was evaluated using typical indicators as implemented in conjunction with WLSMV estimation (Hu and Bentler 1999; Yu 2002): (a) the chi-square statistic (χ²), (b) the comparative fit index (CFI), (c) the Tucker-Lewis index (TLI), and (d) the root mean square error of approximation (RMSEA) and its 90 % confidence interval. According to typical interpretation guidelines (Hu and Bentler 1999; Marsh et al. 2005; Marsh et al. 2004; Vandenberg and Lance 2000), values greater than .90 and .95 for the CFI and TLI respectively reflect adequate and excellent fit, while values smaller than .08 or .06 for the RMSEA respectively indicate acceptable and excellent fit. For the comparison of nested models used in the tests of measurement invariance (see Appendix) and predictive equilibrium, we examined changes in these fit indices, based on the recommended guidelines (Chen 2007; Cheung and Rensvold 2002; Vandenberg and Lance 2000) that a more constrained (or invariant) model can be considered as providing an equivalent fit to a less constrained model when it is accompanied by a CFI or TLI decline of .010 or less and by an RMSEA increase of .015 or less. However, given that these recommended guidelines have so far been validated only for tests of measurement invariance, their relevance to tests of predictive invariance remains unknown. Thus, for tests of predictive invariance, we consider any decrease in model fit to suggest possible non-equivalence of the predictive paths, and combine examination of changes in goodness-of-fit indices with a more attentive consideration of parameter estimates. It should be noted that with WLSMV, the χ² values are “estimated” as the closest integer necessary to obtain a correct p-value, making it possible for the χ² and CFI values to be non-monotonic with model complexity. These apparent increases in fit should simply be interpreted as reflecting equally-fitting models.
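The interpretation guidelines described above can be expressed as a simple decision rule. The sketch below is illustrative only: the function names, the label used for values falling below the guidelines, and the fit values in the example are hypothetical and are not taken from Table 1. Differences are rounded to three decimals because fit indices are conventionally reported at that precision.

```python
def describe_fit(cfi, tli, rmsea):
    """Label each index according to the guidelines cited in the text."""
    def incremental(value):
        if value > 0.95:
            return "excellent"
        if value > 0.90:
            return "adequate"
        return "below guideline"  # label is ours, not from the source

    if rmsea < 0.06:
        rmsea_label = "excellent"
    elif rmsea < 0.08:
        rmsea_label = "acceptable"
    else:
        rmsea_label = "below guideline"
    return {"cfi": incremental(cfi), "tli": incremental(tli), "rmsea": rmsea_label}


def equivalent_fit(less_constrained, more_constrained,
                   max_decline=0.010, max_rmsea_increase=0.015):
    """Apply the measurement-invariance guideline to two nested models."""
    d_cfi = round(more_constrained["cfi"] - less_constrained["cfi"], 3)
    d_tli = round(more_constrained["tli"] - less_constrained["tli"], 3)
    d_rmsea = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return (d_cfi >= -max_decline
            and d_tli >= -max_decline
            and d_rmsea <= max_rmsea_increase)


# Hypothetical fit values (not taken from Table 1).
free = {"cfi": 0.950, "tli": 0.945, "rmsea": 0.050}
constrained = {"cfi": 0.948, "tli": 0.943, "rmsea": 0.051}
labels = describe_fit(**free)
same_fit = equivalent_fit(free, constrained)
```

Note that this rule encodes the guideline validated for tests of measurement invariance. For tests of predictive invariance, the stricter criterion adopted in this study (treating any decrease in fit as a possible sign of non-equivalence) would correspond to setting both thresholds to zero and inspecting the parameter estimates.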

Results

The results from the preliminary measurement models reported in the Appendix supported the adequacy of the a priori longitudinal measurement models, their strict invariance over time and groups of participants with low and average-to-high levels of cognitive abilities (i.e., the comparability of the ratings over time and groups), and the composite reliability of the latent constructs. Latent correlations also showed that both internalizing (r = .466–.505) and externalizing (r = .452–.639) behaviors were quite stable over time. In contrast, longitudinal correlations between internalizing and externalizing behaviors remained small, and often non-significant, although they tended to increase at later time waves. These results further supported Hypothesis 1 in showing that, when latent means were fixed to zero and latent variances to one for identification purposes for Grade 8 adolescents with average-to-high levels of cognitive abilities, adolescents with low levels of cognitive abilities presented higher levels of internalizing (+.520, p ≤ .01) and externalizing (+.322, p ≤ .01) behaviors in Grade 8 (expressed in standard deviation units). The magnitude of these differences remained stable over time (internalizing: +.450 and +.458 SD units in Grades 9 and 10; externalizing: + .261–.277 SD units in Grades 9 and 10). Finally, for both groups of adolescents, average levels of externalizing behaviors remained stable over time (non-significantly different from the grand mean of 0), while average levels of internalizing behaviors slightly decreased (−.259 SD per year, p ≤ .01).

Starting from the final model of strict measurement invariance across groups and time points to ensure the complete comparability of teachers’ ratings of internalizing and externalizing behaviors, we first estimated an autoregressive cross-lagged predictive model (corresponding to Fig. 1) in which all predictive paths and time-specific correlations were freely estimated across groups and time (Model 1). The goodness-of-fit of all predictive models is reported in Table 1. These results show that Model 1 provided an acceptable level of fit to the data according to CFI and TLI ≥ .900 and an excellent fit to the data according to an RMSEA ≤ .060. From this model, the autoregressive (Model 2) and cross-lagged paths (Model 3) were constrained to be equal across groups and time waves. For both models, fit indices including a correction for parsimony (TLI, RMSEA) improved as a result of these constraints (∆TLI = +.003–+.007; ∆RMSEA = −.001), while the CFI showed no decrease. These results thus support that the predictive system has reached equilibrium across time waves, and is fully equivalent across groups. However, when time-specific correlations were also constrained to be equivalent across groups and time points (Model 4), all goodness-of-fit indices revealed a slight decrease (∆CFI = −.002; ∆TLI = −.002) accompanied by a significant chi-square difference test. Although this decrease remained relatively small, it calls into question the equivalence of the time-specific correlations across groups and time points. An examination of the time-specific correlations obtained in the previous model (Model 3) suggests that they may not be fully equivalent across time waves. To verify this possibility, two models were estimated, one in which the time-specific correlations were constrained to be equivalent across groups (Model 5) but not time waves, and one in which they were constrained to be equivalent across time waves, but not groups (Model 6).
The goodness-of-fit indices associated with Model 6 mimic those obtained for Model 4, suggesting that the time-specific correlations differ across time waves. In contrast, Model 5 supports the equivalence of the time-specific correlations across groups. Model 5 was thus retained as the final model. The results from the models including the demographic controls (Models 7–13) lead to identical conclusions, supporting the superiority of Model 12 in which all regressive paths are constrained to equality across groups and time waves, whereas the time-specific correlations are allowed to differ across time waves, but not across groups.

Table 1 Goodness-of-Fit Indices for the Predictive Models

To answer Research Questions 2 and 3, these results show that the predictive relationships across (paths P and R in Fig. 1) and within (paths A and B in Fig. 1) constructs were fully equivalent across the two time intervals considered in this study (Grades 8–9 and Grades 9–10), as well as across groups of participants with average-to-high, and low, levels of cognitive abilities. However, they also show that although the time-specific correlations among constructs (C in Fig. 1) were equivalent across groups of participants with average-to-high, and low, levels of cognitive abilities, they tend to change over time.

Parameter estimates for Models 5 and 12 are reported in Table 2. The results from models excluding (Model 5) or including the demographic controls (Model 12) are essentially identical, supporting the idea that these controls were not necessary. We thus focus on the results from Model 5 (excluding controls). Standardized estimates from this model are also reported in Fig. 2. These results first show moderately high (and similar) levels of longitudinal stability in levels of internalizing (β = .566 to .654) and externalizing (β = .598 to .692) behaviors in both groups. The size of these coefficients still suggests that changes over time remain possible, and possibly frequent. The results show that each type of problem is also related to changes in the levels of the other problem over time, although these reciprocal relationships are smaller than the autoregressive estimates of stability. These relationships are also slightly more pronounced between internalizing behaviors and later levels of externalizing behaviors (β = −.138 to −.154), than the reciprocal relationships between externalizing behaviors and later levels of internalizing behaviors (β = −.099 to −.133) for both groups. These reciprocal relationships are negative, suggesting that prior levels of internalizing or externalizing behaviors predict later decreases in the other type of problem once stability and time-specific correlations are taken into account. Finally, the results show that time-specific correlations between both constructs appear to be relatively small in Grades 8 and 9 (r = .152 to .313), but become substantially larger in Grade 10 (r = .822 and .952) for both groups of adolescents.

Table 2 Results from Models 5 (without controls) and 12 (with controls)
Fig. 2 Standardized Results from the Final Predictive Model (Model 5 in Table 1). Note. As the unstandardized coefficients (b) are fully equivalent across groups and time waves, any variations in the size of the standardized (β) coefficients or correlations (r) are related to variations in group- and time-specific variance estimates. Coefficients accompanied by the subscript a/h-ca refer to participants with average-to-high levels of cognitive abilities, whereas those accompanied by the subscript l-ca refer to participants with low levels of cognitive abilities. *p ≤ .01

Thus, to answer Research Question 1, these results reveal moderately high levels of developmental stability for both constructs, as well as levels of time-specific associations between them that tend to increase over time. Furthermore, they reveal small, yet consistent, negative associations whereby levels of internalizing behaviors tend to predict decreases in externalizing behaviors over time, and levels of externalizing behaviors tend to predict decreases in internalizing behaviors over time.

Discussion

The high level of co-occurrence between internalizing and externalizing behaviors among youth is well-documented (e.g., Reitz et al. 2005). However, the temporal ordering of both types of problems remains ambiguous. In addition, the fact that youth with low levels of cognitive abilities tend to present higher levels of internalizing and externalizing behaviors than youth with average-to-high levels of cognitive abilities at any given time is also well documented (Dekker et al. 2002; Metcalfe et al. 2013; Whitaker and Read 2006). However, the extent to which knowledge regarding temporal relationships between these two types of problems generated among adolescents with average-to-high levels of cognitive abilities generalizes to youth with low levels of cognitive abilities remains essentially unknown. To address these issues, this study explored the bidirectional temporal associations between teachers’ ratings of internalizing and externalizing behaviors among matched samples of adolescents with low levels of cognitive abilities and average-to-high levels of cognitive abilities extracted from a representative sample of Australian adolescents across three time waves.

Associations between Teachers’ Ratings of Internalizing and Externalizing Behaviors

Our first two research questions aimed to uncover the types of relationships that would be observed between teachers’ ratings of adolescents’ levels of internalizing and externalizing behaviors (Research Question 1), and whether these relationships would generalize across time points (Research Question 2). Our results revealed moderately strong levels of cross-sectional associations between internalizing and externalizing behaviors that tended to increase over time, moderately high rates of stability for both types of ratings over time, and mutually suppressing relationships, also stable over time, whereby each type of behavior longitudinally predicted decreases in the other type of behavior.

Increasing Levels of Cross-Sectional Associations

Looking at the pattern of relationships emerging from the final predictive model, we noted significant time-specific correlations between teachers’ ratings of internalizing and externalizing behaviors. This result is interesting, especially given that the correlation between both types of problems observed in Grades 8 and 9 (r = .152–.313) is slightly lower than what could have been expected on the basis of prior research with children or adolescents (Burt et al. 2008; Gilliom and Shaw 2004; Reitz et al. 2005). A possible explanation for this lower correlation might be that the mechanisms supporting co-occurrence may slightly break down across the transition into adolescence, as youth enter puberty, get used to a new school and form a new social network (Cicchetti 2013; Eccles et al. 1993). However, as the situation in the new school stabilizes to become more familiar to the adolescents, and risk factors for both types of problems start to accumulate, then co-occurrence levels increase. It should be noted that, although the size of the time-specific correlations observed in Grade 10 may appear to be quite high (r = .822 and .952), this correlation represents the association between the disturbances associated with both latent constructs, that is the part of each construct remaining unexplained by the predictive paths estimated in the model. The Grade 10 correlation among these two constructs estimated as part of the preliminary measurement models (r = .261–.531) thus provides a more realistic estimate of the simple associations between the constructs at the end of the study. Still, these results suggest that future studies should devote attention to shared risk factors, their possible accumulation in specific individuals, and their evolution across key life transitions.

Because this study relies on teacher ratings, it can be assumed that teachers’ knowledge of the internal states associated with behavioral problems as they occur in adolescents remains limited. More precisely, their assessments had to remain based strictly on observed behaviors occurring within the school context, which could have led to a slight underestimation of internalizing and externalizing behaviors, particularly of internalizing behaviors when they are hidden by externalizing behaviors. These ratings may thus have led to a slight underestimation of the time-specific correlations between both types of behavioral problems, reinforcing the need for future research to rely on multisource assessments obtained from a wider range of informants (including students, parents and peers) in a variety of life contexts (e.g., home, peer groups, school) (for a discussion of multi-informant strategies, see De Los Reyes et al. 2015).

Longitudinal Stability

Our results also revealed that both internalizing and externalizing behaviors presented moderately high rates of longitudinal stability. Given that the ratings for each student were provided by a different teacher each year, the observed rates of stability cannot be assumed to reflect teachers’ memory or consistency biases, through which a teacher could expect a specific adolescent to display consistent behaviors across school years. Rather, these rates of stability can be expected to reflect mainly observable behavioral stability, which is consistent with previous research showing that both internalizing and externalizing behaviors tend to present high levels of chronicity and developmental stability (e.g., Beyers and Loeber 2003; Keiley et al. 2000). Prior findings also suggest that stability may increase with age as internalizing and externalizing behaviors become more chronic (Osgood et al. 1988; Reitz et al. 2005). However, these prior studies have considered longer developmental spans as well as key life transitions, whereas the current study considers a more normative developmental period covering three consecutive secondary school years. Should rates of stability increase with age, it is thus possible that these increases appear before (Reitz et al. 2005) or after (Osgood et al. 1988) the period covered here. Still, our results show that during adolescence, these rates of stability remained moderate, suggesting that change is possible, and that change mechanisms may be of particular interest to intervention research.

Mutually Suppressing Relationships between Internalizing and Externalizing Behaviors

Among the mechanisms underlying changes, our results revealed that these two types of problems shared negative reciprocal relationships over time, so that levels of any one type of behavioral difficulty predicted decreases in the levels of the other type of behavioral difficulty over time. In other words, our study reveals that, once the predictive cross-lagged relationships are properly controlled for the stability and co-occurrence of both types of behaviors, internalizing and externalizing behaviors share mutually suppressing reciprocal relationships over time. These results are consistent with prior reports of negative cross-lagged relationships between internalizing and externalizing behaviors among samples of older children or adolescents (Lee and Bukowski 2012; Wiesner 2003), particularly in research based on informant reports (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016).

Although it is true that only a subset of prior studies found negative cross-lagged relationships between both types of behavioral problems, it is important to note that with two exceptions (Van der Ende et al. 2016; Van der Giessen et al. 2013), prior studies did not test the predictive equilibrium of the system. This may explain why previous results have been so inconsistent. Most previous studies including more than two waves of measurement found relationships between internalizing and externalizing behaviors that were limited to a subset of the time intervals considered (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Klostermann et al. 2015; Masten et al. 2005; Moilanen et al. 2010; Rogosch et al. 2010; Van Lier and Koot 2010; Wiesner 2003). Without systematic tests of predictive equilibrium, it is impossible to ascertain whether differences across time intervals reflect true developmental disparities, or simply random sampling variations. In the current study, we found evidence of predictive equilibrium for both the stability paths and the reciprocal paths, which supports the generalizability of the results across the two time periods considered here and maximizes the statistical power to detect significant relationships by making the model more parsimonious.
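
To make the notion of predictive equilibrium more concrete, a generic two-construct cross-lagged model can be written in equation form. The notation is introduced here for illustration only and is not taken from the model specification used in this study: I and E denote the latent internalizing and externalizing constructs, t the grade, β the stability paths, γ the cross-lagged paths, and ζ the disturbances. A path indexed by t is the path from Grade t − 1 to Grade t.

```latex
% Generic cross-lagged model for t = 9, 10 (requires amsmath)
\begin{align}
I_{t} &= \beta_{I,t}\, I_{t-1} + \gamma_{IE,t}\, E_{t-1} + \zeta_{I,t} \\
E_{t} &= \beta_{E,t}\, E_{t-1} + \gamma_{EI,t}\, I_{t-1} + \zeta_{E,t}
\end{align}
% Predictive equilibrium: each path is constrained to be equal across
% the two intervals (Grade 8 to 9 and Grade 9 to 10)
\begin{align}
\beta_{I,9} &= \beta_{I,10}, & \beta_{E,9} &= \beta_{E,10}, \\
\gamma_{IE,9} &= \gamma_{IE,10}, & \gamma_{EI,9} &= \gamma_{EI,10}
\end{align}
```

In this notation, mutually suppressing relationships correspond to negative values of both γ paths, whereas positive co-occurrence is carried by a positive correlation between the two disturbances at each grade. Imposing the equality constraints halves the number of stability and cross-lagged paths to be estimated, which is the sense in which the model becomes more parsimonious.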

A particularly intriguing observation comes from the fact that prior studies of teachers’ ratings conducted with younger populations all report some form of direct, or indirect, positive relationships between externalizing behaviors and later levels of internalizing behaviors, but no reciprocal relationships between internalizing behaviors and later levels of externalizing behaviors (Gooren et al. 2011; Van Lier and Koot 2010; Van Lier et al. 2012). Similarly, most studies in which negative relationships were noted between internalizing and externalizing behaviors relied on samples of adolescents or older children (Burt et al. 2008; Englund and Siebenbruner 2012; Lee and Bukowski 2012; Masten et al. 2005; Rogosch et al. 2010; Van der Ende et al. 2016; Wiesner 2003). These observations suggest that the mutually suppressing relationships identified here might be specific to adolescence, and that future research should devote attention to the identification of the mechanisms at play in these developmental differences. It is important to keep in mind that the current study mainly serves to open a window into an ongoing causal chain which typically starts in childhood, and may last well into adulthood. For now, although the exact nature of the reciprocal relationships between internalizing and externalizing behaviors remains under-documented, research suggests that these relationships are much smaller than their cross-sectional associations and stability, and possibly negative.

The observation of negative, or mutually suppressing, relationships between ratings of internalizing and externalizing behaviors during the adolescent period appears inconsistent with the key theoretical models typically invoked in studies of these reciprocal relationships (e.g., Capaldi 1991, 1992; Carlson and Cantwell 1980; Gold et al. 1989; Drabick et al. 2010; Moilanen et al. 2010; Oland and Shaw 2005; Wolff and Ollendick 2006), which explicitly assume these relationships to be positive and mutually reinforcing. However, the results are not inconsistent with other well-established biological or clinical theoretical models often used to describe the clinical picture of internalizing and externalizing behaviors in disconnection from one another. For instance, based on parallels drawn with the expanded fight, flight or freeze responses to threatening situations (Bracha et al. 2004; Maack et al. 2015), internalizing behaviors are often assumed to correspond to the last two types of responses, which are incompatible with the display of externalizing behaviors. Thus, although anxiety is known to result in a priming of the “fight-or-flight” response, this priming often results in avoidance behaviors, which form a critical component of its clinical picture (e.g., Zinbarg et al. 1992). Similar avoidance mechanisms are also known to be part of other forms of internalizing behaviors, such as social withdrawal (Rubin et al. 2009) and depression (Ottenbreit et al. 2014). Likewise, the Social Competition Hypothesis (Price et al. 1994, 2004; Nesse 2000) notes similarities between internalizing behaviors and primitive responses of yielding or “freezing” in the face of challenging situations, making the reliance on externalizing responses less likely.

Conversely, Snyder’s (2002; Snyder and Patterson 1995; Snyder and Stoolmiller 2002) reinforcement perspective suggests that externalizing behaviors may, particularly in adolescence, represent a form of adaptive reaction to problematic environments, providing the affected individuals with a variety of negative and positive reinforcements, which may in turn limit the possibility that they will develop internalizing behaviors. The current results add to accumulating evidence (Boylan et al. 2010; Burt et al. 2008; Englund and Siebenbruner 2012; Klostermann et al. 2015; Masten et al. 2005; Moilanen et al. 2010; Rogosch et al. 2010; Van Lier and Koot 2010; Wiesner 2003) suggesting that more scientific attention should be devoted to achieving a complete understanding of the mechanisms involved in the reciprocal relationships between internalizing and externalizing behaviors. Furthermore, these results suggest a few promising theoretical perspectives, as outlined above, to guide these future investigations (e.g., Kunimatsu and Marsee 2012; Price et al. 1994, 2004; Snyder 2002; Zinbarg et al. 1992), even though they have yet to be systematically applied to studies of the interrelationships between internalizing and externalizing behaviors. It is important to reiterate, however, that although we found mutually suppressing longitudinal reciprocal relationships between internalizing and externalizing behaviors during this particular developmental period, the time-specific correlations remained positive, supporting the co-occurrence of these behaviors.

It is noteworthy that some previous studies suggested that the observed relationships may differ as a function of gender (Boylan et al. 2010; Lee and Bukowski 2012; Wiesner 2003; but see Van der Ende et al. 2016). Although we showed that adding gender as a control variable in the analyses did not change the overall pattern of results, the relatively small gender-differentiated samples made it impossible to investigate more systematically whether the observed relationships differed as a function of gender. More generally, the analyses conducted here assume that the relationships identified in our final model applied equally to all participants. Although we did systematically investigate the effects of one source of sample heterogeneity related to cognitive ability levels, the possibility that the observed relationships may differ as a function of other participant characteristics has not been carefully explored. This possibility needs to be investigated more thoroughly in future research, which could potentially benefit from the adoption of a person-centered perspective aiming to directly identify sources of heterogeneity within the sample under consideration (Morin and Wang 2016; Ram and Grimm 2009). A person-centered approach might help to provide complementary perspectives on the relationships under consideration, by allowing researchers to identify subgroups of participants following distinctive longitudinal trajectories of internalizing behaviors, externalizing behaviors, and their co-occurrence.

Generalizability to Adolescents with Different Levels of Cognitive Ability

The present study systematically investigated the extent to which the longitudinal relationships between teacher ratings of internalizing and externalizing behaviors would generalize to samples of adolescents with low versus average-to-high levels of cognitive abilities. Our preliminary analyses first demonstrated that teachers’ ratings of internalizing and externalizing behaviors were fully comparable (invariant) across both groups of adolescents. Given previously expressed concerns regarding the ability of youth with low levels of cognitive abilities to properly report on their internal states and behaviors (e.g., Finlay and Lyons 2001), research with this population often relies on informant reports. However, teacher reports may themselves be biased by their own assumptions regarding the way specific student characteristics (such as low levels of cognitive abilities) may influence the way constructs (such as internalizing behaviors) are manifested, or the extent to which these constructs can be observed. Our results suggest that this was not the case, and that teachers were able to provide equivalent ratings of internalizing and externalizing behaviors for adolescents with low levels of cognitive abilities, relative to their ratings of students with average-to-high levels of cognitive abilities. As further evidence of the validity of teachers’ ratings, our results are in line with those from previous studies relying on alternative sources of information (Dekker et al. 2002; Whitaker and Read 2006) and support our hypothesis that adolescents with low levels of cognitive abilities tend to present higher levels of internalizing and externalizing behaviors than their peers with average-to-high levels of cognitive abilities. Our results further showed that these differences are stable in adolescence, even though internalizing behaviors slightly decreased over time for both groups (de Ruiter et al. 2007).

As a response to our final research question, our predictive model revealed that all relationships between internalizing and externalizing behaviors were fully replicated and equivalent across samples of adolescents with low versus average-to-high levels of cognitive abilities. These results are important, and tentatively suggest that current knowledge regarding the co-occurrence of internalizing and externalizing behaviors obtained within samples with average-to-high levels of cognitive abilities can be expected to generalize to youth with low levels of cognitive abilities. Although we expected that at least some of the relationships would differ as a function of the limited cognitive and social skills of youth with low levels of cognitive abilities, our results rather suggest that internalizing and externalizing behaviors tend to follow similar developmental pathways for adolescents with low levels of cognitive abilities compared to adolescents with average-to-high levels of cognitive abilities.

More precisely, a variety of theoretical perspectives (i.e., Capaldi 1991, 1992; Moilanen et al. 2010; Sameroff 1983) suggest that the more common exposure to experiences of failure in the social and academic area typically reported among adolescents with low levels of cognitive abilities should result in the observation of stronger mutually reinforcing relationships between internalizing and externalizing behaviors among this population. In contrast, because of their typically greater deficits in the cognitive and social areas, adolescents with low levels of cognitive abilities should be less likely to attain the social developmental milestones necessary for the emergence of such mutually reinforcing relationships (e.g., Oland and Shaw 2005; Weeks et al. 2014), leading us to expect weaker reciprocal relationships. Similarly, possible ceiling effects in the accumulation of negative experiences could limit the possibility that additional failure experiences could result in stronger relationships between internalizing and externalizing behaviors. Thus, although these various perspectives disagree with one another, they all converge in the suggestion that reciprocal relationships between internalizing and externalizing behaviors should differ (being either stronger, or weaker) among samples of adolescents with low levels of cognitive abilities when compared to their peers with average-to-high levels of cognitive abilities. Our results suggest otherwise, and reveal that the key difference appears to be related only to the quantity of symptoms (potentially due to the greater number of risk factors for youth with low levels of cognitive abilities: e.g., Rose et al. 2011; Valas 1999; Wehmeyer 2005), rather than to more qualitative differences in the way these symptoms relate to one another. Still, our results also suggest the need for additional theoretical developments in this area. Indeed, a common characteristic of these prior theoretical propositions is that they all assumed that relationships between internalizing and externalizing behaviors would be mutually reinforcing. However, by revealing mutually suppressing relationships between internalizing and externalizing behaviors, our results suggest that these various theories may not be able to properly explain reciprocal relationships between internalizing and externalizing behaviors as they occur in adolescence. Understanding why these relationships turned out to be similar, rather than different, among these two populations should be anchored in a better understanding of why these relationships are mutually suppressing in the first place. Although we proposed a variety of tentative explanations for these results, future research is clearly needed to test, revise, and improve on these explanations.

Finally, it is important to keep in mind that the type of educational placement experienced by adolescents with low levels of cognitive abilities is an important contextual factor impacting upon their development (e.g., Tracey et al. 2003). The Wollongong Youth Study, which provided the sample for this study, did not collect information about adolescents’ type of educational placement and did not include specialized establishments, which limits our ability to consider the role of this factor and to draw conclusions about populations experiencing the lowest levels of cognitive functioning. Similarly, the current study adopted a non-categorical approach focusing on students’ levels of cognitive functioning rather than on the etiology of their cognitive difficulties (e.g., autism, Down syndrome). Still, information about specific developmental and learning disabilities, had it been available, might have provided us with a more nuanced understanding of the mechanisms involved in the development of internalizing and externalizing behaviors for these adolescents. In addition, cognitive abilities were assessed through a standardized test of literacy and numeracy. Despite evidence, previously presented, that this test provides a highly reliable proxy for IQ, it remains, at best, an indirect measure of IQ and cognitive functioning. As such, future research would do well to more carefully consider the impact of school placements, diagnoses of developmental disabilities and cognitive levels as assessed through formal standardized IQ tests on the current results.

Conclusion

Internalizing and externalizing behaviors are frequent among adolescents, are accompanied by a variety of negative developmental consequences, and often co-occur. So far, this co-occurrence has been assumed to reflect the action of common determinants, but also the presence of mutually reinforcing relationships between these two types of behavioral problems. For preventive purposes, the presence of such mutually reinforcing relationships would suggest the possibility of preventing, or at least reducing, the risk for both types of behavioral problems through interventions targeting only one type of behavioral problem. However, prior research has led to ambiguous results regarding the nature of the reciprocal relationships found among these two types of behavioral problems, particularly during adolescence, when accumulating evidence suggests that these relationships could be mutually suppressing, rather than reinforcing. In providing further evidence that these relationships were indeed mutually suppressing, the present study calls for improved theoretical developments in this area. More importantly, it suggests that preventive interventions cannot simply assume that reducing the risk for one type of behavioral problem would directly translate to a reduction in the risk of the other behavioral problem. Rather, it suggests the importance of developing joint programs targeting both types of behavioral problems.

Another key contribution of the present study is the observation that the relationships fully generalized to adolescents with low levels of cognitive abilities, who are known to present a higher level of risk for both types of behavioral problems than their peers with average-to-high levels of cognitive abilities (Dekker et al. 2002; Metcalfe et al. 2013; Whitaker and Read 2006). This result is particularly important given the relative rarity of research focusing on samples of youth with low levels of cognitive abilities, which can be partly explained by the challenges posed by conducting research with this population (e.g., difficulty of access, complexity of obtaining reliable measures, smaller samples). Possibly because of these challenges, developmental research on youth with low levels of cognitive abilities has often been conducted in relative disconnection from developmental research conducted with more normative populations, often leading to the development of interventions that are specific to youth with low levels of cognitive abilities. By providing evidence that the pattern of relationships between internalizing and externalizing behaviors generalizes to both samples, the present research suggests that the more extensive knowledge base accumulated with normative populations may generalize to samples of adolescents with low levels of cognitive abilities, at least in regard to the constructs considered in the present study. In doing so, these results also suggest the possibility of developing preventive interventions that would equally apply to all adolescents, pending further research aiming to better document the extent to which the mechanisms involved in the development of both types of behavioral problems also generalize to both populations.