The Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013) uses the consistency of behaviors associated with various disorders across settings for some diagnostic classifications. For Attention-Deficit/Hyperactivity Disorder (ADHD), symptoms of either inattention or hyperactivity/impulsivity must be present in at least two settings for the diagnostic criteria to be met; the number of symptoms present beyond the diagnostic threshold indicates severity of the disorder. For Oppositional Defiant Disorder (ODD), symptoms of angry/irritable mood, argumentative/defiant behavior, and vindictiveness must be present in at least one setting; the number of settings in which these symptoms are present indicates severity of the disorder. Clinicians and researchers often rely on different reporters to assess the presence of symptoms across settings. For instance, children’s teachers report on child behaviors at school and children’s parents report on child behaviors at home. However, agreement between teachers and parents on reports of children’s externalizing behaviors is consistently reported to be low. Although there has been a substantial amount of research documenting these discrepancies (e.g., Achenbach et al., 1987; De Los Reyes et al., 2015), little prior work has explored the degree to which different raters’ reports of behavior correspond to objective indicators of functioning in other domains. Furthermore, no prior work has explored whether these different raters’ reports of behavior account for predominantly unique or overlapping variance in such indicators. If raters’ reports have substantially different relations to an objective indicator that is negatively affected by ADHD or ODD symptoms, this raises questions about the broad use of multi-informant report in diagnostics. Exploring these questions may help clarify the importance of symptoms in different settings, clarify issues related to clinical utility of different raters, and provide a basis for selecting assessment and treatment targets in clinical settings.

Rater Discrepancies

A substantial body of evidence documents a lack of agreement between parent and teacher reports of externalizing behaviors (Antrop et al., 2002; De Los Reyes et al., 2013, 2015; Lane et al., 2013; Nelson & Harwood, 2011; Stone et al., 2013). A meta-analysis of 119 studies highlighted that parent and teacher reports had a small level of association (r = .27); however, the same meta-analysis indicated that reports of behavior from similar informants (e.g., two parents) had a relatively high level of association (r = .60; Achenbach et al., 1987). Corroborating the results of this older meta-analysis, results of more recent reviews also demonstrated that parent and teacher reports of various behaviors do not have high inter-rater reliability (De Los Reyes et al., 2013, 2015). For example, De Los Reyes et al. (2015) reported an average correlation of 0.28 between parent and teacher reports of externalizing behaviors across 162 studies published subsequent to the Achenbach et al. (1987) meta-analysis. Although some studies indicate that discrepancies are higher with younger children (e.g., Holmberg & Bolte, 2014), results of the De Los Reyes et al. (2015) meta-analysis and other studies (Narad et al., 2015) indicated discrepancies are consistent through adolescence. These modest correlations raise questions about the utility of either parent or teacher ratings.

De Los Reyes and colleagues (e.g., De Los Reyes & Epkins, 2023; De Los Reyes et al., 2013) noted that the traditional approach to discrepancies in reports of behavior is to focus on the degree of overlap. For example, variance common to two reports reflects the construct of interest and the non-overlapping variance reflects non-construct variability. De Los Reyes et al. (2013) introduced the Operations Triad Model (OTM) to shift the focus from “convergence as validity” toward a focus on understanding reasons for discrepancies. Within the OTM, there are three types of measurement operations: converging, diverging, and compensating. Converging operations reflect the traditional expectation that two measures of the same construct will agree, leading to the same conclusion (e.g., psychopathology present, psychopathology not present). Diverging operations reflect instances when two measures do not agree, leading to different conclusions, but both reflect meaningful variation in the construct being assessed. Compensating operations reflect instances when two measures do not agree, leading to different conclusions, but the differences do not reflect meaningful variation in the construct being assessed because of methodological differences, error, or validity issues related to one or both measures. One goal of the OTM is to promote identification of situations in which discrepancies reflect meaningful variation and provide clinical utility.

The OTM provides a framework for considering when divergent ratings of behavior may be useful or when divergent ratings suggest a measurement or validity issue. It is possible that children’s behavior is objectively different in different settings, and that teachers and parents accurately report the behavior they observe, resulting in discrepant ratings. This would be a case of diverging operations, and each rater’s report would be expected to provide clinically useful information. It is also possible that there is variation in the validity of different raters’ behavior ratings that yields discrepant ratings, which would indicate compensating operations. There are potential reasons for differences in teacher and parent ratings of objectively similar behaviors. First, because of their more extensive exposure to children, teachers have a better frame of reference for distinguishing typical and atypical behaviors for children of a certain age or developmental level than do parents; therefore, teachers’ behavior ratings may be more accurate or more consistent from child to child. Second, behavioral expectations are likely to be stable across time and uniform across children in schools. Consequently, behavior that does not fall within expected behavioral norms is likely more salient, perhaps leading to more accurate or more consistent reporting. In contrast, behavioral expectations at home may be less stable across time (e.g., parents adapting expectations based on child behavior), less uniform across children (i.e., different parents have different expectations), or both. Thus, the identification of behavior that does not fall within expected behavioral norms may be idiosyncratic, leading to less accurate or less consistent reporting by parents. Finally, because of stable and uniform behavioral expectations in school, children face a greater demand for adaptation to those expectations than they do at home. In a child’s home, behavioral expectations may be more likely to be adapted to the individual or, conversely, expectations may be held as rigid, which could lead to developmentally inappropriate expectations. Therefore, it is possible that in school settings, children unable to regulate behaviors to meet developmentally appropriate behavioral expectations are those who are accurately identified by teachers as exhibiting developmentally atypical levels of externalizing behaviors, but children identified as not meeting behavioral expectations at home may not be those exhibiting developmentally atypical levels of externalizing behaviors.

Associations Between Externalizing Disorders and Other Domains

Externalizing diagnoses, such as ADHD and ODD, and the behaviors that define these disorders, are associated with impairments in other domains including academic performance (e.g., Allan et al., 2018; Evans et al., 2020; Gray et al., 2017; Kulkarni et al., 2020; Masten et al., 2005; Ogg et al., 2016; Owens et al., 2015; Owens et al., 2016; Tarver et al., 2014), social competence and peer relationships (e.g., Evans et al., 2020; Owens et al., 2015; Owens et al., 2016; Tarver et al., 2014), and other dimensions of self-regulation, like executive function (e.g., Romero-Lopez et al., 2017; Yang et al., 2022). In terms of the relation between externalizing behaviors and academic performance, both Arnold et al. (2020) and Frazier et al. (2007) conducted meta-analyses demonstrating that higher levels of ADHD symptoms were associated with lower levels of academic achievement, but these meta-analyses did not distinguish between inattentive and hyperactive/impulsive symptoms. Based on a qualitative synthesis of 41 studies, Gray et al. (2017) concluded that inattention in both clinical and non-clinical samples was significantly and negatively associated with academic outcomes. Symptoms of ODD are associated with poor academic performance, even when controlling for ADHD symptoms (Daley & Birchwood, 2010; Frazier et al., 2007), and have been associated with greater educational difficulties through high school (Burke et al., 2014).

Despite substantial discrepancies between teacher and parent report, studies tend to use either teacher or parent ratings, without acknowledging the reported discrepancies between different raters’ reports or the possible differential associations with outcome domains (e.g., Burke et al., 2014; Fuchs et al., 2006; Swanson, 2011). Additionally, some studies do not differentiate between the two raters in analyses (e.g., Daucourt et al., 2020; Gray et al., 2017). For example, although the reviews by Gray et al. (2017) and Daucourt et al. (2020) summarized studies that included both teacher and parent reports of children’s ADHD-related behaviors, they did not examine teacher and parent reports separately. However, examination of the studies included in Gray et al. (2017) and Daucourt et al. (2020) revealed that correlations between teachers’ ratings of ADHD-related behaviors and academic outcomes ranged from 0.15 to 0.60, with an average correlation of 0.47. In contrast, correlations between parents’ ratings of ADHD-related behaviors and academic outcomes ranged from 0.17 to 0.37, with an average correlation of 0.25.

Despite discrepancies, separate raters’ reports may still represent valid indicators of the presence or absence of various symptoms, as the different raters may be reporting on different aspects of the underlying psychopathology. If this were the case, it would be expected that reports of externalizing behaviors from different raters would have similar associations with other functional domains despite discrepancies between each other. However, if separate raters’ ratings have differential relations to a functional outcome, such as academic achievement, that is negatively affected by the presence of externalizing behaviors and disorders, then raters may be supplying non-equivalent diagnostic information. Evidence suggests that requiring parents and teachers to agree on the presence of a behavior for an ADHD diagnosis reduces diagnostic accuracy (e.g., Martel et al., 2021). Although a comparison of the association between teacher and parent reports with outcomes in domains affected by a disorder is not a study of diagnostic accuracy, findings from such a study may have important implications for future clinical research, including the utility of different raters’ reports for diagnoses.

Current Study

Whereas discrepancies between parent and teacher reports of externalizing behaviors have been well documented (e.g., Achenbach et al., 1987; De Los Reyes et al., 2015), much less research has examined the associations these different behavior reports have with academic achievement or other outcome domains. The research that has summarized the associations between behavior reports and academic achievement typically has not distinguished between parent and teacher reports (e.g., Daucourt et al., 2020; Gray et al., 2017). One way to address whether the discrepancies between parent and teacher reports of externalizing behaviors represent diverging operations or compensating operations is by comparing the associations of teacher and parent reports with another domain. Consequently, the primary goal of this study was to compare correlations between parent-rated externalizing behaviors and academic skills with correlations between teacher-rated externalizing behaviors and academic skills in a sample of preschool-age children. A secondary goal of this study was to examine the degree to which parent and teacher reports of externalizing behaviors yielded incremental predictive utility for academic achievement. Based on our analysis of data from studies included in the Gray et al. (2017) and Daucourt et al. (2020) reviews, it was hypothesized that teacher ratings of externalizing behaviors would have stronger predictive utility for academic achievement than would parent ratings of externalizing behaviors. Based on the expectation that discrepancies between teacher and parent reports reflect diverging operations, it was hypothesized that parent and teacher reports would have incremental predictive utility. Thus, when ratings from both parents and teachers were included in a predictive model, they would both account for unique as well as shared predictive variance.

Method

Participants

This study used an archival dataset from a large-scale study that involved the prediction and prevention of reading difficulties for preschool children who were at higher-than-average risk for such difficulties. Children were recruited from preschools in private centers and the Title I preschools of the local school district in north Florida in three successive years beginning in 2008. The sample used in this study included all children from the larger dataset who had parent ratings of behavior, teacher ratings of behavior, and academic outcome data. The sample included 695 children (376 boys, 318 girls, 1 unknown) who ranged in age from 48 months to 63 months (mean age = 55.05 months; SD = 3.63 months) at the time of their initial assessments. The sample was racially and ethnically diverse; 46.9% of the children were White, 43.2% were Black/African American, 3% were Hispanic, 2.7% were Asian, 0.3% were American Indian, 3.3% identified as more than one race, and the race of 0.4% of the sample was unknown. Within the sample, 63.9% of participants’ families earned $50,000 or less, 14.8% earned between $50,000 and $75,000, 13.1% earned between $75,000 and $100,000, 6.4% earned between $100,000 and $150,000, and 1.7% earned $150,000 or more a year. The highest level of education for parents of participants in the sample was typically some college education (30.1%). However, 11.2% of participants had at least one parent who had earned an Associate’s degree, 21.2% had earned a Bachelor’s degree, 7.1% had earned a Master’s degree, and 3.5% had earned a Doctoral degree. Some families had a high school diploma or GED as the highest level of education attained (16.2%); other families had some high school (8.6%) or no high school/less than 8th grade (0.6%).

Measures

Conners’ Rating Scale for Preschool Children. Parents and teachers rated children’s externalizing behaviors using either a hybrid version of the Conners’ Rating Scale (CRS) that combined 44 items from multiple versions of the CRS (e.g., Conners, 1969, 1997; see Gerhardstein et al., 2003) or a 15-item version of the CRS developed by Purpura and Lonigan (2009) made up of a subset of the 44 items in the hybrid version. Items in both versions of the CRS represent three dimensions: Inattention, Hyperactivity/Impulsivity, and Oppositional Defiant Behavior. The 15-item scale was created using item-response theory analysis to reduce the time needed for parents and teachers to complete the measures while retaining the scales’ ability to discriminate children with different levels of behavioral problems along the three aforementioned factors. For both versions, parents and teachers rated behaviors on a 4-point Likert scale that ranged from zero, indicating that the behavior was not present at all, to three, indicating that the behavior was frequently present. For all participants, parent and teacher CRS scores were created by using the 15 items common to both the 44- and 15-item versions. Correlations between the 44- and 15-item versions are high (r = 0.92 for the Inattention subscale, 0.94 for the Hyperactivity/Impulsivity subscale, and 0.96 for the Oppositional Defiant subscale; Purpura & Lonigan, 2009). For both parents and teachers, the reliability of the CRS-15 ranged from acceptable to high for the Inattention (parent rating α = 0.81; teacher rating α = 0.90), Hyperactivity/Impulsivity (parent rating α = 0.81; teacher rating α = 0.75), and Oppositional Defiant (parent rating α = 0.78; teacher rating α = 0.88) scales.

Preschool Comprehensive Test of Phonological and Print Processing. Participants were administered the Preschool Comprehensive Test of Phonological and Print Processing (Pre-CTOPPP; Lonigan et al., 2002), which measures three- to five-year-old children’s phonological awareness, print knowledge, and oral language skills. The Pre-CTOPPP was the development version of the Test of Preschool Early Literacy (Lonigan et al., 2007). Children’s phonological awareness skills were measured with two subtests: Blending and Elision. The Blending subtest consists of 21 items that require children to combine word sounds to form a word. The Elision subtest consists of 18 items that require children to say a word, then remove part of that word and say the new word. The Print Knowledge subtest consists of 36 items related to print concepts, letter name and letter sound recognition, and letter name and letter sound production. The Receptive Vocabulary subtest consists of 40 items for which children are required to point to a picture that represents a word spoken by the examiner. The Definitional Vocabulary subtest contains 40 items that require children to first provide the word that matches a picture and then provide information about the feature or function of the pictured item. All subtests include at least one practice item. Internal consistency for all Pre-CTOPPP subtests ranges from acceptable to high for three- to five-year-old children (α = 0.76 to 0.95).

Test of Early Mathematics Ability–Third Edition. Children completed the Test of Early Mathematics Ability (TEMA-3; Ginsburg & Baroody, 2003) to assess overall mathematics ability. The TEMA-3 consists of 72 items that are scored in a binary, pass/fail fashion. The TEMA-3 produces an overall math ability score but does not have subscales that assess specific dimensions of mathematical ability. The TEMA-3 has high internal consistency with three- (α = 0.92) and four- (α = 0.93) year-olds, as well as a two-week test-retest reliability of 0.82.

Child Math Assessment–Abbreviated. Children completed the Child Math Assessment–Abbreviated (CMA-A; Starkey et al., 2004) to assess their informal mathematical knowledge across a range of concepts. The CMA-A is composed of four sets of tasks that measure four dimensions of early mathematical knowledge. The first task has children solve simple addition and subtraction problems that involve a single set of objects that is initially visible and then hidden from view. The second task has children construct a set of objects equal in number to a set that the children were shown. The third task asks children to recognize various shapes. The final task involves copying a repeating pattern using sets of objects that vary in color and identity from the objects in the model pattern. Each task is composed of multiple problems. The internal consistency of the CMA-A is acceptable for preschool children (α = 0.79; Preschool Curriculum Evaluation Research Consortium, 2008).

Procedure

The current study was approved by the Florida State University Institutional Review Board (HSC2010.0144). Parents of all participating children provided written informed consent/permission for their children’s participation, and all classroom teachers consented to participate. The data used for this study came from children’s assessment of academic skills during the fall of children’s preschool year. Children gave verbal assent prior to the beginning of testing. Parents and teachers completed the CRS in the fall of children’s preschool year (i.e., October through early December), coincident with the assessment of children’s academic skills. Trained research assistants administered assessments to children in a quiet area of their preschools. Research assistants were trained to criterion performance on the measures through didactic presentations, modeling, a performance assessment of test administration to an adult posing as a child, and live observations of assessment with feedback. Assessments were conducted over several 30- to 45-minute sessions, within a two- to three-week period.

Results

Some individuals (n = 38) did not answer one item on the CRS. The items not answered differed across participants. Consequently, all scales were prorated, such that any child who had a score for at least four of the five items on a subscale still received a total subscale score that was utilized for analyses (i.e., available items within a scale were summed and then divided by the number of items summed). Correlations within and between, as well as descriptive statistics for, parent and teacher ratings of children’s externalizing behaviors are shown in Table 1. All behavior ratings were significantly correlated. Paired-samples t-tests were used to examine whether the mean scores from the three scales were different for parents and teachers. The results of these tests indicated that parents rated oppositional defiant, t(685) = -6.55, p < .001, and hyperactive/impulsive, t(688) = -8.02, p < .001, behaviors at significantly higher levels than did teachers, but inattentive behaviors, t(691) = 0.38, p = .70, were rated at similar levels by parents and teachers.
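The proration rule described above can be illustrated with a minimal sketch. The function name, the `min_answered` parameter, and the example ratings are hypothetical and are not taken from the study's analysis scripts; the sketch only assumes five items per subscale rated 0 to 3, with unanswered items coded as `None`.

```python
from typing import Optional, Sequence


def prorated_subscale_score(
    item_ratings: Sequence[Optional[int]],
    min_answered: int = 4,  # assumption: at least four of five items required
) -> Optional[float]:
    """Sum the answered items and divide by the number of items summed.

    Returns None when fewer than `min_answered` items were answered.
    """
    answered = [rating for rating in item_ratings if rating is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical ratings for one five-item subscale (0 = not present, 3 = frequent).
complete = prorated_subscale_score([2, 1, 3, 0, 2])         # all five answered
one_missing = prorated_subscale_score([2, 1, None, 0, 2])   # four answered
too_sparse = prorated_subscale_score([2, None, None, 0, 2])  # only three answered
print(complete, one_missing, too_sparse)
```

Because the score is a mean of the available items, a child with one unanswered item remains on the same 0 to 3 metric as children with complete data.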

Table 1 Correlations between and descriptive statistics of parent and teacher ratings of externalizing behaviors

Zero-order correlations between parent and teacher ratings of behavior and all academic outcome variables are shown in Table 2. Teacher ratings of inattentive, hyperactive/impulsive, and oppositional defiant behaviors were all significantly associated with all academic outcomes. Parents’ ratings of inattentive and hyperactive/impulsive behaviors were significantly associated with all academic outcomes; parents’ ratings of oppositional defiant behaviors were only significantly associated with TEMA-3, print knowledge, and expressive vocabulary scores. For both parent and teacher ratings, correlations were nominally higher for math, Print Knowledge, and language outcomes than they were for phonological awareness outcomes.

Table 2 Correlations of parent and teacher ratings of externalizing behavior domains with academic outcomes and comparisons of strength of parent and teacher correlations

Comparing Predictive Utility of Parent and Teacher Reports

Steiger’s Z tests (Steiger, 1980) were used to compare the strength of associations of parent and teacher ratings with academic outcomes, separately for each behavior dimension and academic outcome (see Table 2). For inattentive behaviors, teacher ratings had significantly stronger associations than did parent ratings with all academic outcomes other than Blending. For hyperactive/impulsive behaviors, there were no significant differences between parent and teacher ratings in associations with any academic outcome. Results for oppositional defiant behaviors were similar to those for inattentive behaviors: teacher ratings had significantly stronger associations with academic outcomes than did parent ratings for all academic outcomes except Blending and Print Knowledge.
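For readers unfamiliar with the test, the following is a minimal sketch of one common form of Steiger's (1980) Z for two dependent correlations that share a variable (here, the academic outcome). It uses Fisher-transformed correlations and a pooled estimate of their covariance. The function name and the correlation values in the example are hypothetical, and the sketch is not a reproduction of the software used in the study; using n = 695 assumes complete data for every pair, which may not hold for each comparison.

```python
import math
from typing import Tuple


def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int) -> Tuple[float, float]:
    """Compare two dependent correlations that share variable j.

    r_jk: correlation of outcome j with rater k (e.g., teacher rating)
    r_jh: correlation of outcome j with rater h (e.g., parent rating)
    r_kh: correlation between the two raters
    n:    number of cases contributing to all three correlations
    Returns (Z, two-sided p value under the standard normal distribution).
    """
    z_jk = math.atanh(r_jk)  # Fisher r-to-z transformation
    z_jh = math.atanh(r_jh)
    r_bar = (r_jk + r_jh) / 2.0  # pooled correlation with the outcome
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    cov = psi / (1 - r_bar**2) ** 2  # covariance of the two transformed correlations
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return z, p


# Hypothetical values: teacher-outcome r = -0.35, parent-outcome r = -0.20,
# parent-teacher r = 0.30, and n = 695.
z_stat, p_value = steiger_z(-0.35, -0.20, 0.30, 695)
print(f"Z = {z_stat:.2f}, p = {p_value:.4f}")
```

The test accounts for the correlation between the two raters, which an independent-samples comparison of correlations would ignore.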

Multivariate Prediction of Academic Outcomes

Multi-level regression analyses were conducted to determine if parent reports and teacher reports of children’s externalizing behaviors accounted for portions of variance beyond each other and beyond child age at time of first assessment, child gender, child race, and highest education of parents. Children’s preschools were included as a clustering variable to account for the nesting of these data. Because multi-level regression models do not include true R² values, pseudo-R² values were calculated using residuals from conditional and unconditional models to represent the total variance explained by full models (LaHuis et al., 2014). The pseudo-R² values of control variables (i.e., age, gender, race, and highest education of parent; control block) and behavior variables (i.e., parent- and teacher-rated externalizing behaviors; behavior block) were calculated in a similar manner. Finally, unique predictive variance (sr²) for each variable in a model was calculated as the change in pseudo-R² when that variable was and was not included in the model (e.g., the pseudo-R² from the full model minus the pseudo-R² from a model without teacher-rated externalizing behaviors generates the sr² of teacher-rated externalizing behaviors). Predictive variance shared among the variables in one block (i.e., control block or behavior block) can be computed in two steps: first, subtract the pseudo-R² of the other block from the model pseudo-R²; second, subtract the sum of the sr² values of the variables in the block from that difference.
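The variance bookkeeping described above can be sketched as follows. The text does not specify which of the measures reviewed by LaHuis et al. (2014) was used, so the sketch assumes a proportional reduction in residual variance from the unconditional to the conditional model; that choice, the function names, and all variance values are illustrative assumptions rather than the study's actual computations.

```python
def pseudo_r2(resid_var_unconditional: float, resid_var_conditional: float) -> float:
    """Proportional reduction in residual variance relative to the
    unconditional (intercept-only) model. Assumed variant, see lead-in."""
    return 1.0 - resid_var_conditional / resid_var_unconditional


def unique_variance(r2_full: float, r2_without_predictor: float) -> float:
    """sr2: pseudo-R2 of the full model minus pseudo-R2 of the model
    that omits the predictor of interest."""
    return r2_full - r2_without_predictor


# Hypothetical total residual variances (not values from the study).
var_null = 100.0             # unconditional model
var_full = 70.0              # controls + parent rating + teacher rating
var_without_teacher = 80.0   # controls + parent rating
var_without_parent = 71.0    # controls + teacher rating

r2_full = pseudo_r2(var_null, var_full)
sr2_teacher = unique_variance(r2_full, pseudo_r2(var_null, var_without_teacher))
sr2_parent = unique_variance(r2_full, pseudo_r2(var_null, var_without_parent))
print(round(r2_full, 2), round(sr2_teacher, 2), round(sr2_parent, 2))
```

Each sr² requires refitting the model without the predictor of interest, so a model with two behavior ratings involves at least three fitted models in addition to the unconditional model.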

Results of models that included children’s early mathematics abilities are shown in Table 3. Nominally, ratings of inattentive behavior accounted for more variance in math abilities than did ratings of hyperactive/impulsive or oppositional defiant behaviors (see Table 3 for pseudo-R² values). Teacher-rated inattentive behaviors, hyperactive/impulsive behaviors, and oppositional defiant behaviors accounted for more unique variance than did parent-rated behaviors. For models including inattentive behaviors, the sr² value of teacher-rated behaviors was 0.10 for both the TEMA-3 and the CMA-A. In contrast, the sr² value of parent-rated behaviors was 0.01 for the TEMA-3 and 0.00 for the CMA-A. The variance unique to the behavior block was 0.15 and 0.13 for the TEMA-3 and CMA-A, respectively, indicating that parent and teacher ratings shared 3% of the variance accounted for in TEMA-3 and CMA-A scores. Finally, sr² values for parent-rated and teacher-rated behaviors ranged between 0.00 and 0.02 for models including hyperactive/impulsive and oppositional defiant behaviors. Child race did not account for unique variance for TEMA-3 or CMA-A scores.
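As a worked illustration of the shared-variance calculation, the sketch below applies the subtraction to the rounded CMA-A values reported for the inattentive model (behavior-block variance of 0.13, teacher sr² of 0.10, parent sr² of 0.00). The function name is hypothetical. Because the inputs are rounded, results computed this way can differ by about a percentage point from values based on unrounded estimates.

```python
from typing import Sequence


def shared_variance(block_unique_variance: float, sr2_values: Sequence[float]) -> float:
    """Variance shared among predictors within a block: the variance unique
    to the block minus the sum of each predictor's unique variance (sr2)."""
    return block_unique_variance - sum(sr2_values)


# Rounded CMA-A values from the inattentive model reported in the text.
cma_a_shared = shared_variance(0.13, [0.10, 0.00])
print(round(cma_a_shared, 2))
```

With these inputs, the difference corresponds to the 3% of variance in CMA-A scores that parent and teacher ratings are reported to share.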

Table 3 Results of mixed-model regression using CRS scales from parents and teachers on math skills

Similar multi-level regression analyses were conducted to determine if parent ratings and teacher ratings accounted for different amounts of variance in children’s early literacy skills (see Table 4). Nominally, ratings of inattentive behaviors accounted for more variance in early literacy skills than did ratings of hyperactive/impulsive or oppositional defiant behaviors across all early literacy skills (i.e., Blending, Elision, and Print Knowledge subtests). The behavior block for the inattentive model accounted for 23% of the variance in Print Knowledge outcomes, 16% of which was unique variance. Thus, 4% of the unique variance captured by behavior ratings was shared between parent ratings and teacher ratings. Additionally, the behavior block for inattentive behaviors accounted for 9% of the variance in Elision scores, 7% of which was unique. Teacher ratings contributed as much unique variance as variance they shared with parent ratings (3%), whereas parent ratings accounted for only 1% of the unique variance in that outcome. Teacher and parent ratings in all other models produced sr² values between 0.00 and 0.02. Consequently, these results indicated that teacher ratings of inattentive behavior displayed incremental utility for Print Knowledge scores. Child race did not account for unique variance for scores on any measure of early literacy skill.

Table 4 Results of mixed-model regressions using CRS scales of parent and teacher report on early literacy skills

Results for models predicting children’s language skills are shown in Table 5. Again, ratings of inattentive behaviors nominally accounted for more variance in language skills than did ratings of hyperactive/impulsive behaviors or oppositional defiant behaviors. Parent and teacher ratings of hyperactive/impulsive behavior and oppositional defiant behavior yielded sr² values between 0.00 and 0.01. Teacher-rated inattention contributed unique variance to both Receptive Vocabulary and Expressive Vocabulary scores, whereas parent-rated inattention produced negligible sr² values. The inattentive-behavior block accounted for 9% and 12% of variance in Receptive Vocabulary and Expressive Vocabulary subtests, respectively, and 7% and 10% of that variance was unique variance, respectively. Similar to other academic outcomes, all the variance that parent ratings accounted for in Receptive Vocabulary and all but 1% of the variance in Expressive Vocabulary was shared with teacher ratings and the control block. Conversely, teacher-rated inattention accounted for 6% and 5% of the unique variance in Receptive and Expressive Vocabulary subtests, respectively. Consequently, only teacher ratings of inattention displayed incremental predictive utility for Receptive and Expressive Vocabulary scores. Child race accounted for marginal amounts of unique variance in language skills; the White racial category contributed 1% unique variance to Receptive and Expressive Vocabulary in inattentive models as well as to Receptive Vocabulary in hyperactive/impulsive and oppositional defiant models.

Table 5 Results of mixed-model regression using CRS scales of parent and teacher report on early language skills

Discussion

Although multi-informant assessment of children’s externalizing behavior problems is considered critical for the assessment of some mental health disorders (e.g., ADHD, ODD; American Psychiatric Association, 2013), parent and teacher ratings of externalizing behavior have consistently low levels of agreement (Achenbach et al., 1987; De Los Reyes et al., 2015). Such findings could indicate that both parents and teachers provide unique and useful information about children’s behavior. Alternatively, differences between raters could indicate issues related to the validity or utility of one or the other rater. Studies in which behavior ratings are used to predict other outcomes often treat parent and teacher ratings as interchangeable (e.g., Daucourt et al., 2020; Gray et al., 2017). In this study, which directly compared the predictive utility of parent and teacher ratings for academic outcomes, teacher ratings were generally more strongly associated with academic outcomes than were parent ratings, and only teacher ratings provided consistent and non-trivial unique predictive information. These results indicate that there may not be uniform utility in multi-informant assessment.

Although teacher ratings of externalizing behaviors were more strongly associated with children’s academic outcomes than were parent ratings of the same behaviors, this finding was not consistent across domains of externalizing behavior or across domains of academic outcomes. For both teachers and parents, ratings of inattentive behaviors were more strongly associated with academic outcomes than were ratings of hyperactive/impulsive behaviors and oppositional defiant behaviors. Results of Steiger’s Z tests provided partial support for the primary hypothesis that teacher ratings would have stronger associations with academic outcomes than would parent ratings. Although teacher ratings of hyperactive/impulsive behavior had nominally larger correlations than did parent ratings, there were no significant differences in correlations for any of the seven academic outcome measures. It is unclear why there were significant differences in associations for academic outcomes between teacher and parent ratings of oppositional defiant behaviors but not hyperactive/impulsive behaviors. It is, however, notable that teacher-parent agreement on oppositional defiant behaviors was the lowest among the externalizing behavior domains examined. For both parent and teacher ratings, correlations were nominally higher for math, Print Knowledge, and language outcomes than they were for phonological awareness outcomes.

Consistent with results from univariate analyses and in partial support of the second hypothesis, teacher ratings contributed more unique variance to academic outcomes than did parent ratings in multivariate analyses. Multi-level regression analyses revealed that there was partial overlap between parent and teacher ratings, as indicated by non-zero amounts of shared predictive variance for teacher and parent ratings. Further mirroring results of univariate analyses, teacher and parent ratings of inattentive behaviors accounted for more variance in children’s academic outcomes than did ratings of hyperactive/impulsive or oppositional defiant behaviors. Control variables accounted for larger portions of variance than did behavior ratings for models examining ratings of hyperactive/impulsive and oppositional defiant behaviors. Parent ratings only contributed more unique variance than did teacher ratings of the same behaviors on one occasion, and parent ratings often contributed no unique variance when considering teacher ratings and the control variables. Conversely, teacher-rated inattention contributed more unique variance than any other variable for six out of the seven measures of academic outcomes examined. Consequently, our findings only support teacher-rated inattention as consistently displaying incremental predictive utility. However, the total amount of variance explained was modest, and more variability in academic outcomes was unexplained than accounted for by the combination of parent and teacher ratings and the control variables.
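
The distinction between unique and shared predictive variance can be illustrated with a small commonality-style decomposition. The sketch below is a simplification under stated assumptions: it uses ordinary single-level least squares rather than the multi-level models reported here, the data are simulated solely for illustration, and all names (`r_squared`, `commonality`, and the simulated variables) are hypothetical.

```python
import numpy as np


def r_squared(y, *predictors):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()


def commonality(y, teacher, parent, controls):
    """Split the variance explained beyond controls into unique and shared parts."""
    r2_controls = r_squared(y, controls)
    r2_full = r_squared(y, controls, teacher, parent)
    # Unique variance: drop in R-squared when one rater is removed
    unique_teacher = r2_full - r_squared(y, controls, parent)
    unique_parent = r2_full - r_squared(y, controls, teacher)
    # Shared variance: what the behavior block explains that is not unique
    shared = (r2_full - r2_controls) - unique_teacher - unique_parent
    return {
        "unique_teacher": unique_teacher,
        "unique_parent": unique_parent,
        "shared": shared,
    }


# Simulated (not real) data: both raters observe the same underlying
# behavior, but the simulated parent rating is noisier.
rng = np.random.default_rng(0)
n = 2000
behavior = rng.normal(size=n)
controls = rng.normal(size=(n, 2))
teacher = behavior + rng.normal(scale=0.5, size=n)
parent = behavior + rng.normal(scale=1.5, size=n)
y = -0.6 * behavior + 0.4 * controls[:, 0] + rng.normal(size=n)

parts = commonality(y, teacher, parent, controls)
```

In this decomposition, the three components sum to the variance explained by the behavior block beyond the control variables, so a rater with near-zero unique variance adds little once the other rater is in the model.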

Results of multivariate analyses were in line with prior literature. For example, Gray and colleagues (2017) reported that ratings of inattention accounted for between 5% and 16% of the variance in standardized measures of academic outcomes, and, in the current study, the behavior block for inattentive models accounted for 4% to 23% of the variance in standardized academic outcomes. Some existing research supports the notion that inattentive behaviors, more so than hyperactive/impulsive behaviors or oppositional defiant behaviors, form a distinct risk factor for poor academic outcomes in children (e.g., Massetti et al., 2008). Notably, there were differences in strengths of associations depending on the academic outcome. This could be due to inattentive behaviors playing a larger role in the development of some academic skills than others. Thus, the present findings corroborate existing research indicating that inattentive behaviors have stronger associations with academic outcomes than do other domains of externalizing behaviors and are therefore a more salient risk factor for poor academic outcomes. Future research should further explore the differential relations of inattentive behaviors with specific academic skills.

Compensating or Divergent Operations?

The modest associations between parent and teacher reports in this study were consistent with prior studies (e.g., Achenbach et al., 1987; De Los Reyes et al., 2015) and may indicate validity issues. Despite parent and teacher ratings of child behaviors having differential associations with an objective outcome measure (i.e., academic outcomes), both parents’ and teachers’ ratings accounted for unique variance in some of those outcomes (i.e., parent ratings and teacher ratings accounted for non-zero portions of unique variance in 33% and 81% of the 21 behavior-domain-by-academic-outcome models, respectively). However, some of the variance accounted for was shared between parent and teacher ratings. Thus, there were not many cases where both raters were providing novel predictive information. Given the substantive differences in unique variance accounted for by teacher ratings, the results of this study suggest that there may be limited value in including parent ratings in models to understand children’s academic outcomes. Teachers’ ratings provided novel information more often than did parents’ ratings, and, in cases where there was shared predictive variance, it was generally less than, or equal to, the unique variance provided by teacher ratings. Even for inattentive models, in which behavior ratings captured more variance than in the models that included other domains of externalizing behaviors, the amount of variance shared between parent and teacher ratings always exceeded the amount of variance parent ratings uniquely captured.

This study focused solely on academic outcomes as an objective indicator of functioning, which leaves open the possibility that the obtained pattern of results is relevant for only this domain of functioning. That is, because teachers rate children’s behaviors in the context of learning (i.e., school), teacher ratings may be better indicators of academic success, whereas parents rate children’s behaviors outside of school, and their ratings thus may be better indicators of non-academic functioning, such as executive function or social skills. Alternatively, teachers’ ratings may reflect implicit explanations for why a child is doing worse academically than other children (e.g., “this student does not pay attention” or “this student is impulsive”); in this case, the academic performance, and not the child’s behavior, drives the teacher’s rating. However, the possibility that teachers’ ratings of behavior served as a proxy for children’s academic competencies seems an unlikely explanation in this study. First, ratings were collected in the fall of the school year, which would limit (but not eliminate) the amount of time that behaviors could interfere with learning. Second, teachers’ ratings were more predictive of academic outcomes than were parent ratings across most outcome measures, including measures of language, and preschool exposure seems to have a larger effect on non-language skills than on language skills (e.g., Ansari et al., 2020). Finally, preschools do not have a strong emphasis on direct instruction as is expected in later grades, and the quality of language instruction is often judged to be low (e.g., Justice et al., 2008).

As noted previously, the OTM was intended to shift focus away from convergence as validity toward understanding reasons for discrepancies and identifying when discrepant results provide information with clinical utility. Compensating operations describe situations in which measures do not agree, leading to different conclusions, but the differences between measures do not reflect meaningful variation in the construct of interest. Compensating operations may be due to methodological differences, error, or validity issues related to one or both measures. In this study, teacher and parent ratings had similar reliabilities, indicating that measurement error was unlikely to be a significant factor in discrepancies. Jungersen and Lonigan (2021) evaluated the measurement invariance of the parent and teacher report on the CRS and found that teachers and parents were reporting on largely the same constructs (i.e., partial metric invariance), and that teachers and parents were reporting differences in levels of the underlying constructs in similar ways (i.e., partial scalar invariance). These results suggest that discrepancies between raters are not the result of teachers and parents interpreting the externalizing behavior constructs differently.

Within the OTM, diverging operations describe situations in which measures do not agree, leading to different conclusions, but both measures reflect meaningful variation in the construct being assessed. Diverging operations can be due to factors such as unique perspectives that may result from observing events or individuals in different contexts or having context-related understandings of behavior. Although teachers’ ratings of externalizing behaviors being a stronger correlate of preschool children’s academic skills and having consistent unique predictive utility above parent ratings could be a function of such context-related factors or teachers’ context-related understanding of behavior, such an explanation seems less likely given findings from studies examining functional outcomes other than academic skills.

Studies that directly compare the utility of teacher and parent ratings of children’s behavior for the prediction of some independent, objective outcome are relatively rare; however, available evidence indicates a general pattern of stronger associations between teachers’ ratings and measures of other outcomes than the associations between parents’ ratings and that outcome. For instance, in terms of non-rating-based measures of inattention, hyperactivity/impulsivity, or both, Wang et al. (2015) reported that teacher ratings of inattention had the strongest correlations with scores on the Test of Variables of Attention compared to parent or clinician ratings of inattention. Similarly, based on a relatively large sample of Korean third- and fourth-grade children, Cho et al. (2011) reported a correlation between teacher-rated inattention and omission errors on a continuous performance task (CPT) that was significantly higher than the correlation between parent-rated inattention and omission errors, and the correlation between teacher-rated hyperactivity/impulsivity and commission errors on a CPT was significantly stronger than the correlation for parent-rated hyperactivity/impulsivity and commission errors.

Studies have reported a similar pattern of results for measures of executive function. Backer-Grondahl et al. (2019) noted that teacher-reported externalizing behaviors were associated with their measure of effortful control (EC; similar to inhibitory control) but parent-reported externalizing behaviors were not related to their measure of EC. Data from the study by Cho et al. (2011) indicated that teacher ratings of inattention were more strongly associated with scores on a Stroop task and interference on the Children’s Color Trails Test than were parent ratings of inattention. Jungersen and Lonigan (2020) reported that first- and second-grade teachers’ ratings of inattention and hyperactivity/impulsivity on two different measures of externalizing behaviors were more strongly correlated with a latent variable representing six executive function tasks than were parents’ ratings on the same measures.

The present findings coupled with existing research suggest that the context in which the child is observed is not the primary cause of observed differential associations. If children are behaving differently in different contexts, those behavioral discrepancies would cause mean rating differences, which would then lead to significant differences in associations with outcomes. However, mean ratings of inattentive behaviors, the domain for which differences in associations with academic outcomes were the largest, were not significantly different between teachers and parents, whereas parents reported significantly more hyperactive/impulsive behaviors than did teachers, even though this was the domain for which there were no significant differences in associations with academic outcomes. Thus, it becomes more likely that parents do not have a strong understanding of when various behaviors are problematic or what behaviors may be age appropriate because parents observe fewer children and likely have extensive observations of only their own children. This may lead to developmentally inappropriate expectations of a child that cause parents to erroneously identify developmentally typical levels of externalizing behaviors as problematic. Conversely, teachers have classrooms that consist of children of varying temperaments and achievement levels. Therefore, teachers may understand what behaviors are age-appropriate and when the presence of specific behaviors is more likely to impede relevant functional outcomes. Such potential differences in parents’ and teachers’ experiences could be a reason for the observed differences in parents’ and teachers’ behavior ratings, which could represent a diverging operation. However, even if raters differ for meaningful reasons, such as understanding when behaviors are more likely to cause an impediment, each rater would be expected to provide unique information. Considering that parent ratings uniquely accounted for at most 2% of the variance in a given outcome, parents’ ratings did not contribute much beyond teachers’ ratings. Because parent ratings did not contribute non-trivial amounts of unique variance beyond teacher ratings and there is no evidence to support measurement error, rater discrepancies likely reflect some other form of compensating operations.

Limitations and Future Directions

This study had numerous strengths, including a large and racially diverse sample and multiple standardized measures of children’s academic skills. Despite these strengths, some limitations were present. This study only examined concurrent associations between parent and teacher ratings and children’s academic abilities. Future research should examine predictions of longitudinal outcomes. Additionally, this study used only academic achievement data as an outcome measure. Future studies should examine the associations of teacher and parent ratings with multiple functional outcomes to assess whether raters account for largely overlapping, or largely unique, portions of variance. Moreover, comparing parent and teacher ratings’ associations with direct behavioral observations, performed by a researcher who was trained to some criterion, as well as assessing behavioral ratings’ associations with ratings of academic competency, could further the understanding of rater discrepancies.

Conclusions

The results of this study, when put into context with existing research, suggest that the information provided by different informants does not have universal utility across outcomes. Clinicians utilize multi-informant reports for various diagnoses that require consistency of symptoms across settings (e.g., ADHD, ODD), but there is no current, evidence-based, gold-standard practice by which to utilize multi-informant reports. Because diagnoses require consistency of symptoms across settings, one might think that diagnosis of such a disorder requires both teachers and parents to endorse the same behavior. However, such a designation is not required; clinicians and researchers employ varying strategies to combine reports from multiple informants. Additionally, some research suggests that, for the diagnosis of ADHD, requiring a parent and teacher to agree on the presence of a given behavior results in worse negative predictive value, sensitivity, and specificity, and similar positive predictive value (100 for both; 99.1 for either), compared to counting a behavior as present when either the parent or the teacher endorses it, with a diagnosis made by a clinician serving as the standard against which these methods were compared (Martel et al., 2021).

Although many studies have compared parent and teacher reports of children’s externalizing behaviors to each other, as well as assessed the predictive utility of either parent or teacher ratings on some objective domain of functioning, no studies have directly compared the predictive utility of these different raters on academic achievement to assess each rater’s unique contributions. Results of this study provided partial support for the primary hypothesis; teacher ratings of inattentive and oppositional defiant behaviors had consistently stronger correlations with academic outcomes than did parent ratings. The results of the current study provided some support for the second hypothesis; teacher ratings more consistently contributed unique predictive variance to academic outcomes than did parent ratings. This study provided evidence that teacher ratings generally captured more unique variance in academic outcomes than the variance they shared with parent ratings or the variance that parent ratings uniquely captured. Given existing research showing that teacher ratings are also stronger correlates of executive function measures than are parent ratings (Cho et al., 2011; Backer-Grondahl et al., 2019), the observed differential associations are not likely to be explained by context. Thus, these findings open the possibility that teachers’ ratings provide more useful information than parent ratings due to teachers’ ability to identify normative behaviors. Consistent with the OTM, developing a better understanding of the outcomes for which discrepancies reflect meaningful variation and provide clinical utility can improve diagnostic accuracy.