Attention-Deficit/Hyperactivity Disorder (ADHD) is characterized by developmentally inappropriate levels of inattention, impulsivity/hyperactivity, or both (American Psychiatric Association 2000). Individuals with ADHD represent a heterogeneous group who display substantial variation in the patterning and the severity of the 18 symptoms (Nigg et al. 2005). One approach to an optimal ADHD phenotype is to consider the unity and diversity of the symptoms of inattention and hyperactivity/impulsivity. Previous exploratory and confirmatory factor analytic studies have consistently found that inattention and hyperactivity/impulsivity are distinguishable but correlated dimensions in both clinical and population samples (Willcutt et al. under review). However, some researchers have found that this model is quite vulnerable to developmental changes in ADHD symptoms (Hart et al. 1995; Larsson et al. 2011), whereas others have shown that ADHD symptoms may be both quantitatively and qualitatively different depending on gender (Gershon 2002). There is also support for a correlated three-factor model of inattention, hyperactivity, and impulsivity (e.g., Proctor and Prevatt 2009). Nonetheless, current data overwhelmingly support the concurrent and predictive validity as well as clinical importance of the distinction between the symptom domains of inattention and hyperactivity/impulsivity in terms of developmental trajectory, comorbid mental disorders, academic and social impairment, and neuropsychological dysfunction (Willcutt et al. under review). In contrast with the perspective that inattention, hyperactivity, and impulsivity are fully separable dimensions, it is possible that most symptoms, currently clustered under two dimensions of Inattention and Hyperactivity/Impulsivity, co-occur and cluster under a general “ADHD” dimension. However, some symptoms from each dimension might cluster under one or more orthogonal factors. 
It is becoming evident that factor models of ADHD need to account for both concurrent overlap and separability between the ADHD symptom domains.

An alternative approach, which accounts for an overarching general ADHD factor as well as unique symptom dimensions, is a hierarchical factor model (also known as a “bifactor model”; Holzinger and Swineford 1937; Schmid and Leiman 1957) that has one general factor with additional specific factors. Some researchers posit that a single general ADHD factor accounts for covariation among all symptoms simultaneously with two (or three) separate, specific factors (i.e., inattention and hyperactivity/impulsivity; see Fig. 1) that vary independently from the general factor (e.g., Toplak et al. 2009). The specific factors thus contribute independent covariance among the symptoms beyond that accounted for by the general factor. Hence, the hierarchical model suggests that there are distinct, as well as overlapping, causal pathways that converge on ADHD. Support for this model may improve our understanding of ADHD as a heterogeneous condition and reconcile apparently contradictory findings of general (Sonuga-Barke et al. 2008) and separate (Sagvolden et al. 2005) influences on ADHD symptom domains.

Fig. 1 Heuristic representations of the two hierarchical models of attention-deficit/hyperactivity disorder

Evidence of the superiority of the hierarchical model of ADHD over one-, two-, or three-factor non-hierarchical models has been demonstrated in general population samples of children, samples of children and adolescents with ADHD recruited from the community, and clinic-referred samples, using parent, teacher, adolescent, or clinician reports (e.g., Dumenci et al. 2004; Gibbins et al. 2011; Martel et al. 2010a, b; Toplak et al. 2009; a summary of studies on the hierarchical models of ADHD is presented in Suppl. Table 1). Despite their large representative samples, the two existing population-based studies have important limitations. First, Dumenci and colleagues (2004) only used teacher reports and were therefore not able to confirm their results with other key informants, such as parents. Second, Ullebø and colleagues (under review) tested only a hierarchical model with three specific factors (i.e., inattention and separate hyperactivity and impulsivity factors). Furthermore, neither of these studies tested the stability of their models longitudinally. In light of the temporal instability of DSM-IV ADHD symptoms and subtypes and given that specific ADHD symptoms can wax and wane over time (Lahey et al. 2005; Larsson et al. 2011), it is also important to test the hierarchical models prospectively. Evidence of superiority of the hierarchical models in comparison to other models in longitudinal designs may confirm the well-accepted and empirically driven conceptualization that a latent general ADHD construct holds across time despite variation in the specific symptom presentation.

The unique objective of this study was to test the hierarchical models of ADHD in a normative sample of elementary school-aged children aged between 6 and 9, with a particular emphasis on potential informant, sex, and temporal differences over a 12-month period. To do so, we used a behavior rating scale designed to be sensitive to the full range of the behaviors of interest in a general school population: specifically the SWAN (Strengths and Weaknesses of ADHD-symptoms and Normal-behavior) scale (Swanson et al. 2001a). It is an empirical question whether hierarchical models fit data well in community samples as well as clinical ones. Evidence that they do would suggest that we are capturing an important structure or model of how these symptoms are organized across both general and clinical populations.

We therefore tested two hierarchical models with either two specific factors (inattention and hyperactivity/impulsivity) or three specific factors (inattention and separate hyperactivity and impulsivity factors). To test whether the hierarchical models are the optimal fitting models, we compared them to other factor analytic models published in the field of ADHD and examined them across informants. More specifically, we contrasted the fit of these hierarchical models with the fit of non-hierarchical models, namely a one-factor model and confirmatory factor models with two and three correlated factors. Additionally, confirmatory factor models in which each symptom is related to only one factor and its direct relationships with other factors (i.e., cross-loadings) are fixed to zero (as in our non-hierarchical correlated factor models) may be overly restrictive for ADHD measurement scales. Instead, a non-hierarchical exploratory factor model in which each item is allowed to load on both an inattention and a hyperactivity factor may be more realistic (i.e., all cross-loadings are freely estimated; see, e.g., Asparouhov and Muthén 2009), in that some manifest symptoms might be influenced by both an inattention construct and a hyperactivity construct. Hence, with this non-hierarchical exploratory factor model each ADHD symptom is determined by not one but two factors, a characteristic it shares with the general-factor hierarchical model but not the strict non-hierarchical confirmatory factor model. Thus, the hierarchical model and the non-hierarchical exploratory model have a similar number of parameters and might be expected to provide near-equivalent fit to data (see Footnote 1). To investigate this possibility, we also compared the fit of the hierarchical models with that of a two-factor exploratory model.
We predicted that, regardless of informant, sex, or time point, the hierarchical model, with one general factor and two specific factors, would be the best fitting model, and one which could account for both the overlap and separateness of the inattention and hyperactivity/impulsivity symptoms.
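The claim that the hierarchical and exploratory models have a similar number of parameters can be illustrated with a simple count. The sketch below is not taken from the study: it assumes 18 items, factor variances fixed to 1, a saturated mean structure (intercepts not counted), and an oblique rotation for the exploratory model. The helper name `free_parameters` is hypothetical.

```python
def free_parameters(n_items, n_loadings, n_factor_covariances, n_rotation_constraints=0):
    """Count free covariance-structure parameters (assumed setup, see lead-in).

    loadings + one residual variance per item + free factor covariances,
    minus any constraints needed to fix the rotation of an exploratory model.
    """
    return n_loadings + n_items + n_factor_covariances - n_rotation_constraints


P = 18                       # number of SWAN items
MOMENTS = P * (P + 1) // 2   # unique variances and covariances among the items

models = {
    "one-factor": free_parameters(P, 18, 0),
    "correlated 2-factor CFA": free_parameters(P, 18, 1),
    "correlated 3-factor CFA": free_parameters(P, 18, 3),
    # each item loads on the general factor and on one specific factor
    "hierarchical, 2 specific": free_parameters(P, 36, 0),
    "hierarchical, 3 specific": free_parameters(P, 36, 0),
    # every item loads on both factors; oblique rotation removes m(m-1) = 2 parameters
    "exploratory 2-factor": free_parameters(P, 36, 1, n_rotation_constraints=2),
}

degrees_of_freedom = {name: MOMENTS - k for name, k in models.items()}

for name, k in models.items():
    print(f"{name:28s} parameters = {k:2d}  df = {degrees_of_freedom[name]}")
```

Under these assumptions the hierarchical models have 54 free parameters and the exploratory two-factor model has 53, whereas the correlated-factor models have 37 to 39. The counts reported by any particular software run may differ if additional constraints are imposed.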

Method

Participants and Procedure

Participants were parents and teachers of 524 children (259 boys, 265 girls), aged 6 to 9 years (M = 7.60; SD = 0.92 in Year 1 and M = 8.61; SD = 0.93 in Year 2). The majority of children had English as their primary language (97%) and were primarily Caucasian (86%). Children were recruited from grades 1 to 3 in seven public elementary schools, which constitute 20% of the 33 schools in a large rural and suburban district school board in Southern Ontario, Canada. The data were collected as part of a 2-year prospective study investigating behavioral symptoms of attention, cognitive measures of attention, and academic outcomes in elementary school-aged children (SSHRC #410-2008-1052). Data collection for the present study occurred in November of each year to ensure that the children had sufficient time to settle down and that teachers had observed and interacted with the children for 2–3 months.

Complete data for teacher ratings were available for 510 participants in Year 1 (i.e., teacher ratings were not completed for 14, or 2.7%, of the 524 children) and 491 in Year 2 (i.e., 96% retention). Complete data for parent ratings were available for 350 participants in Year 1 (69% of sample) and 347 participants in Year 2 (68% of sample; 99% retention in Year 2). Whereas parents rated their children at both time points, different sets of teachers rated the same set of children in Year 1 and Year 2.

This study was approved by the University of Toronto, the Hospital for Sick Children, and the participating school boards. Principals interested in having their school participate contacted the research team to learn more about the study. Information sessions for teachers of grades 1 to 3 were provided along with study information and consent packages. Consenting teachers then received research packages (containing a cover letter, study information, consent forms, and the study questionnaires) to give to parents. Eligibility criteria were: 1) education in mainstream classrooms in either English or French (25% were in French immersion); 2) no major sensory or physical impairment that would preclude a child from hearing the instructions or completing the assessment tasks; and 3) written informed consent from the child’s teacher and parent and verbal assent from the child.

Participating parents were mostly mothers (90%) and had diverse levels of education: less than a high school degree (2%), high school degree or equivalence (6%), some college (9%), college degree (32%), some university (11%), university degree (31%), or post-graduate college degree (9%). According to parental report, some children had the following learning and mental health issues: ADHD (4%), language impairment (3%), learning disability (3%), and a behavior problem (2%). We compared the profiles of consenting parents (and their children) who did versus those who did not complete the behavior ratings to test for any systematic differences. As indicated in Suppl. Table 2, participating parents were more likely to have English as their primary language and their children were more likely to be enrolled in French immersion classes. Children with non-participating parents were rated by their teacher as having more problems with inattention and hyperactivity (as measured on the SWAN), but fewer peer problems, than children with participating parents. However, it is important to note that the teacher SWAN scores of children with or without completed parent questionnaires were both solidly in the average range (see Suppl. Table 2).

Among the 52 teachers who participated, 87% were female and 81% were Caucasian. Classroom size varied from 13 to 26 students with an average of 19.7 students per class (SD = 2.5) in Year 1, whereas classroom size varied from 16 to 30 students with an average of 21.6 students per class (SD = 3.6) in Year 2. The number of years of experience as a teacher ranged from 1 year to 33 years with an average of 14.6 years (SD = 7.9). Also, 40% of teachers possessed additional qualifications in special education. All had taught at least one student with frequent problems in paying attention and concentrating, with only about 2% of teachers reporting currently having no student with such problems. Moreover, 43% of teachers reported currently having several inattentive students in their classroom (1–3 students), 35% had quite a few (4–5 students), and 20% had many students with such problems (more than 5 students). As indicated in Table 1, and as expected from previous population studies, parents in our sample rated boys as having more ADHD symptoms than girls according to the SWAN. Teachers rated boys as having more ADHD symptoms and other problematic behavior compared to girls, on both the SWAN and Strengths and Difficulties Questionnaire (SDQ). Both parents and teachers rated boys as having fewer prosocial skills than girls on the SDQ. Both informants rated at least twice as many boys as girls as being severely impaired. However, when considering the whole sample, only 10–11% of the participating children were rated as being severely impaired, confirming that this sample is generally unimpaired in daily functioning (see Table 1).

Table 1 Descriptive statistics for the SWAN rating scale and SDQ in year 1: means and SDs (in parentheses)

Measures

SWAN Questionnaire

The Strengths and Weaknesses of ADHD-symptoms and Normal-behavior Scale (SWAN; Swanson et al. 2001a) is a revised version of the SNAP-IV (Swanson 1995), a behavior rating scale measuring inattentive, hyperactive, and impulsive behaviors as specified in DSM-IV-TR (American Psychiatric Association 2000). The SWAN is also based on the 18 ADHD symptoms listed in the DSM-IV-TR, divided into two subscales of nine items corresponding to the domains of Inattention (items 1–9) and Hyperactivity/Impulsivity (items 10–18). The SWAN differs from most behavior rating scales used for assessing developmental psychopathology in that the symptoms of ADHD are reworded to reflect relative strengths (better than average) as well as weaknesses (worse than average). For instance, the DSM-IV symptom “Often avoids, dislikes, or reluctantly engages in tasks requiring sustained mental effort” is reworded as “Engage in tasks that require sustained mental effort”; “Often has difficulty awaiting turn” is rephrased as “Await turn (stand in line and take turns)”. The items are rated on a seven-point scale (−3 = far above average; −2 = above average; −1 = somewhat above average; 0 = average; 1 = somewhat below average; 2 = below average; 3 = far below average; Swanson et al. 2001a). Thus, in line with the conventions for positive scores to reflect weaknesses or problem behaviors, positive scores on the SWAN indicate report of worse than average ADHD symptoms, whereas negative scores indicate better than average behavior. Children’s total scores ranged from a minimum of −27 to a maximum of 27 for each subscale. A significant advantage of the SWAN is that it yields an approximately normal distribution of the subscale total scores (Cornish et al. 2008; Hay et al. 2007; Polderman et al. 2007; Robaey et al. 2007; Swanson et al. 2001a) and avoids potential psychometric flaws that are associated with skewed distributions such as over-identifying extreme cases of ADHD (Hay et al. 2007; Swanson et al. 2001b). ADHD represents one tail of the normal distribution of ADHD symptoms (Levy et al. 1997), but it is highly likely that some children in the general population will have better than average self-regulation of attention, activity and impulse control, and thus fall at the opposite tail of the distribution (Swanson et al. 2001a).
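The subscale scoring described above can be sketched as follows. This is a minimal illustration rather than the official SWAN scoring procedure: the function name `score_swan` and the dictionary keys are hypothetical, and the only facts taken from the text are the item ranges (1–9 and 10–18) and the −3 to 3 response scale.

```python
INATTENTION = slice(0, 9)                  # items 1-9 (zero-based positions 0-8)
HYPERACTIVITY_IMPULSIVITY = slice(9, 18)   # items 10-18 (zero-based positions 9-17)


def score_swan(ratings):
    """Return subscale sums and means for one child's 18 SWAN ratings.

    Each rating is an integer from -3 (far above average) to 3 (far below
    average), so every subscale sum lies between -27 and 27.
    """
    if len(ratings) != 18:
        raise ValueError("expected 18 item ratings")
    if any(r not in range(-3, 4) for r in ratings):
        raise ValueError("ratings must be integers from -3 to 3")

    scores = {}
    for name, items in (("inattention", INATTENTION),
                        ("hyperactivity_impulsivity", HYPERACTIVITY_IMPULSIVITY)):
        total = sum(ratings[items])
        scores[f"{name}_sum"] = total
        scores[f"{name}_mean"] = total / 9
    return scores


example = score_swan([1, 0, -1, 2, 0, 0, 1, -2, 0] + [3] * 9)
```

Positive values indicate worse than average behavior, in line with the convention described above.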

In our sample, the total scores (i.e., mean of items) from the SWAN in Year 1 were approximately normal for the overall General ADHD scale (Parent version: skewness = −0.08, kurtosis = 0.22; Teacher version: skewness = −0.23, kurtosis = −0.07) and the Inattention (Parent version: skewness = 0.19, kurtosis = 0.53; Teacher version: skewness = −0.09, kurtosis = −0.24) and Hyperactivity/Impulsivity (Parent version: skewness = −0.22, kurtosis = 0.10; Teacher version: skewness = −0.14, kurtosis = 0.16) subscales (see Footnote 2). The internal consistency coefficients in Year 1 for the General ADHD scale and the Inattention and Hyperactivity/Impulsivity subscales were homogeneously excellent, ranging from 0.93 to 0.98 for parents’ and teachers’ ratings. Pearson correlation coefficients between the parent SDQ hyperactivity scale (which includes 5 items that probe for symptoms of inattention and hyperactivity/impulsivity) and the SWAN Inattention, Hyperactivity, and General scales were 0.28, 0.39, and 0.36, respectively, in Year 1. Similarly, corresponding correlations between the teacher SDQ hyperactivity scale and the SWAN Inattention, Hyperactivity, and General scales were 0.30, 0.36, and 0.35, respectively, in Year 1. All correlations were significant (in all cases p < 0.01). Virtually identical results were found with the Year 2 data in terms of the SWAN scales’ skewness, kurtosis, internal consistency, and correlations with the SDQ hyperactivity subscale.

These psychometric properties are consistent with those of previous studies (e.g., Robaey et al. 2007; So et al. 2008). Good concurrent validity exists between the SWAN and other clinical tools that use the exact wording of the DSM-IV criteria for ADHD. For instance, concurrent validity with different structured clinical interviews (e.g., Schedule for Affective Disorders and Schizophrenia for School-Age Children—Present and Lifetime Version—KSADS-PL, Kaufman et al. 1997; Diagnostic Interview Schedule for Children 4.0, Shaffer et al. 2000) and symptom-based parent and teacher ratings (e.g., Conners’ Rating Scales revised, Conners et al. 1998a, b) has been previously reported in the literature (e.g., Cornish et al. 2005; Robaey et al. 2007; Toplak et al. 2009). For example, correlations between the Conners teacher Inattention and Hyperactivity/Impulsivity subscales and SWAN subscales were strong and ranged from 0.87 to 0.91, whereas they ranged from 0.75 to 0.82 on the parent version of the scales (Cornish et al. 2005).

Strengths and Difficulties Questionnaire

We used the Strengths and Difficulties Questionnaire (SDQ; Goodman 1997) as a screening measure of social, emotional and behavioral functioning, as well as of impairment. Each of the 25 items of the SDQ has the response options of 0 (not at all), 1 (a little, sometimes) or 2 (very much, all of the time). Responses for each item are grouped into one of five 5-item subscales (prosocial, hyperactivity, emotional, conduct and peer problems), and a total difficulties score is also obtained. Besides covering common areas of emotional and behavioral difficulties, it also enquires whether the informant thinks that the child has a problem in these areas and, if so, asks about resultant distress and impairment at home and school. Each impairment question is rated from 0 (not at all/only a little), to 1 (quite a lot), to 2 (a great deal). The maximum score for parent-rated impairment scale is 10 (5 questions) whereas the maximum score for teacher-rated impairment is 6 (3 questions). The SDQ is well established in research on the mental health of children and adolescents and there is good evidence of its psychometric properties across different cultures. Results from a recent review of 48 studies (N = 131,223) on reliability and validity of the parent and teacher SDQ confirmed strong psychometric properties, including adequate internal consistency, test–retest reliability, inter-rater agreement, construct validity, concurrent validity, and discriminant validity for the parent and teacher versions (Stone et al. 2010).

In our study, we used the SDQ Hyperactivity Scale to estimate the number of children in our sample in Year 1 who would probably be at-risk for ADHD classification; its internal consistency was good (parent version, Cronbach’s α = 0.84; teacher version, Cronbach’s α = 0.90). More specifically, we used the following modified SDQ predictive algorithm: a) parent “or” teacher SDQ Hyperactivity Scale ≥ 6 (corresponding to the 90th percentile); b) some hyperactivity/inattention symptoms rated on the other informant (parent or teacher) as indicated by scores on the SDQ Hyperactivity Scale ≥ 4; and c) manifest impairment, as rated by both parent and teacher on SDQ impact scale (scores of 2 or more; representing the 90th percentile). This scoring algorithm takes both symptom severity and level of impairment into account, and uses both parent and teacher ratings to indicate problems in more than one context. According to this algorithm, 7 children (1%; 5 boys, 2 girls) would meet “probable ADHD” criteria in our general community sample.
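The three-part screening rule above can be written as a small function. This is a sketch of one reading of the rule, not the authors' code: the function name `probable_adhd` and its argument names are hypothetical, and criterion (c) is interpreted here as requiring an impact score of 2 or more from each informant.

```python
def probable_adhd(parent_hyperactivity, teacher_hyperactivity,
                  parent_impact, teacher_impact):
    """Apply the modified SDQ screening rule described in the text.

    (a) one informant's SDQ Hyperactivity score is 6 or more,
    (b) the other informant's SDQ Hyperactivity score is 4 or more,
    (c) both informants rate impairment (impact score of 2 or more).
    """
    symptoms_flagged = (
        (parent_hyperactivity >= 6 and teacher_hyperactivity >= 4)
        or (teacher_hyperactivity >= 6 and parent_hyperactivity >= 4)
    )
    impairment_flagged = parent_impact >= 2 and teacher_impact >= 2
    return symptoms_flagged and impairment_flagged


# Hypothetical children, for illustration only
print(probable_adhd(7, 4, 2, 3))   # elevated on both scales and impaired
print(probable_adhd(7, 3, 2, 3))   # second informant below the threshold of 4
```

Requiring a lower threshold from the second informant keeps the rule sensitive to cross-context symptoms without demanding that both informants report scores at the 90th percentile.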

Data Analysis

The 18 symptoms used to assess ADHD from the SWAN parent and teacher reports were used for the present analyses. Analyses were conducted separately for each informant on the questionnaire. All teacher-report models were fitted using aggregated analysis under complex sampling as described by Muthén and Satorra (1995) to account for non-independence of observations due to each teacher rating multiple students in the class. Confirmatory factor analysis (CFA) was utilized to test six potential factor structures underlying the 18 ADHD symptoms: a) a one-factor model of inattention/hyperactivity/impulsivity; b) a non-hierarchical two-factor model with correlated inattention and hyperactivity/impulsivity factors; c) a non-hierarchical three-factor model with correlated inattention, hyperactivity, and impulsivity factors; d) a hierarchical model of a general ADHD factor with two specific factors of inattention and hyperactivity/impulsivity; e) a hierarchical model of a general ADHD factor with three specific factors of inattention, hyperactivity, and impulsivity, and f) a non-hierarchical exploratory two-factor model. As explained above, hierarchical models differ from non-hierarchical models in that they account for the covariation among all ADHD symptoms in terms of a general factor capturing the shared variation of all items, along with separate, uncorrelated specific factors capturing residual covariance among particular subgroups of items (see Fig. 1). Alternatively, non-hierarchical models include only factors for subgroups of items, and the relationships among items from different subgroups are accounted for by allowing these factors to correlate.
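The contrast between the two model families can be written compactly. The notation below is introduced here for illustration and does not appear in the original analysis: x_j is the rating on symptom j, s(j) is the symptom domain to which item j belongs, G is the general factor, S and F are domain factors, and ε_j is a residual.

```latex
% Hierarchical (bifactor) model: every item loads on G and on one specific factor,
% and all factors are mutually uncorrelated
x_j = \nu_j + \lambda^{G}_{j}\, G + \lambda^{S}_{j}\, S_{s(j)} + \varepsilon_j,
\qquad \operatorname{Cov}(G, S_s) = 0, \quad \operatorname{Cov}(S_s, S_{s'}) = 0 \ (s \neq s')

% Non-hierarchical correlated-factor model: each item loads on one factor only,
% and relationships across domains are carried by the factor covariances
x_j = \nu_j + \lambda_j\, F_{s(j)} + \varepsilon_j,
\qquad \operatorname{Cov}(F_s, F_{s'}) = \phi_{ss'} \ \text{(freely estimated)}
```

In the first form, two items from different domains covary only through G; in the second, they covary only through the correlation between their domain factors.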

Because the SWAN items have a seven-point response scale, they were factor analyzed as continuous variables with traditional maximum likelihood estimation, which research shows is preferable with seven-category ordinal variables (Finney and DiStefano 2006). Although each of the individual symptoms was approximately normally distributed (across items, parent report univariate skewness ranged from −0.27 to 0.28 and univariate kurtosis ranged from −0.45 to 1.13 for both years, while teacher report skewness ranged from −0.19 to 0.20 and kurtosis from −0.64 to 0.74), Satorra-Bentler fit statistics and robust standard errors (Satorra and Bentler 1988) were used to account for potential multivariate non-normality. All models were estimated using Mplus (version 5.2; Muthén and Muthén 2002). Model fit was evaluated using the root mean square error of approximation (RMSEA), comparative-fit index (CFI), and Tucker-Lewis index (TLI), with acceptable model fit indicated by RMSEA values of 0.08 or lower along with CFI and TLI values of 0.95 or higher (Yu and Muthén 2002).
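For readers less familiar with these indices, the sketch below shows their textbook definitions in terms of the model and baseline chi-square statistics, together with the cutoffs stated above. It is an illustration only: it uses unscaled chi-square values, whereas the study used Satorra-Bentler scaled statistics, and some programs use N rather than N − 1 in the RMSEA denominator. All function names are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (textbook form, N - 1 denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_worst = max(chi2_model - df_model, chi2_baseline - df_baseline, 0.0)
    return 1.0 if misfit_worst == 0 else 1.0 - misfit_model / misfit_worst


def tli(chi2_model, df_model, chi2_baseline, df_baseline):
    """Tucker-Lewis index, which penalizes model complexity through chi2/df ratios."""
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    return (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)


def acceptable_fit(rmsea_value, cfi_value, tli_value):
    """Cutoffs stated in the text: RMSEA <= 0.08, CFI >= 0.95, TLI >= 0.95."""
    return rmsea_value <= 0.08 and cfi_value >= 0.95 and tli_value >= 0.95


# Hypothetical chi-square values for an 18-item model (baseline df = 18 * 17 / 2 = 153)
chi2_m, df_m, chi2_b, df_b, n = 250.0, 117, 5000.0, 153, 350
print(round(rmsea(chi2_m, df_m, n), 3),
      round(cfi(chi2_m, df_m, chi2_b, df_b), 3),
      round(tli(chi2_m, df_m, chi2_b, df_b), 3))
```

The chi-square inputs in the usage lines are invented for demonstration and are not results from this study.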

To test factorial invariance across sex, we followed procedures outlined by Widaman and Reise (1997) using a series of nested multiple-group CFA models. Although nested model comparisons often rely on χ² difference tests, recent methodological research suggests that examining alternative fit indices is preferable for this purpose because the χ² statistic is overly sensitive to sample size and ignores model parsimony. Thus, following Cheung and Rensvold (2002) and Chen (2007), in the current analyses two models were considered to have equivalent fit if the decrease in CFI (ΔCFI) was 0.01 or less and if the increase in RMSEA (ΔRMSEA) was not greater than 0.01. First, to establish configural invariance, we examined the fit of a two-group model in which the basic model specification was identical for boys and girls, but all parameters were free to vary across sex (except those needed for overall model identification). Next, to test weak invariance, the fit of the initial configural invariance model was compared to that of a model with all factor loadings constrained to be equal across sex. If weak invariance held, we then tested strong invariance by comparing the fit of the weak invariance model to that of a model with all factor loading and all symptom intercept parameters constrained to equality across groups.
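The sequential decision rule described above can be sketched as follows. The function names and the fit values in the usage example are hypothetical; only the 0.01 thresholds for ΔCFI and ΔRMSEA and the configural, weak, strong ordering come from the text. A small numerical tolerance is added so that a difference of exactly 0.01 is not rejected because of floating-point rounding.

```python
EPSILON = 1e-9  # guards against floating-point error at the 0.01 boundary


def equivalent_fit(less_constrained, more_constrained, threshold=0.01):
    """Compare two nested models given as (CFI, RMSEA) tuples.

    Fit is treated as equivalent when CFI drops by no more than `threshold`
    and RMSEA rises by no more than `threshold`.
    """
    cfi_drop = less_constrained[0] - more_constrained[0]
    rmsea_rise = more_constrained[1] - less_constrained[1]
    return cfi_drop <= threshold + EPSILON and rmsea_rise <= threshold + EPSILON


def highest_invariance_level(fits):
    """Walk configural -> weak -> strong, stopping at the first non-equivalent step."""
    level = "configural"
    for next_level in ("weak", "strong"):
        if not equivalent_fit(fits[level], fits[next_level]):
            break
        level = next_level
    return level


# Hypothetical (CFI, RMSEA) values, for illustration only
fits = {"configural": (0.960, 0.060), "weak": (0.957, 0.061), "strong": (0.955, 0.062)}
print(highest_invariance_level(fits))
```

Each step is compared with the immediately preceding model rather than with the configural model, which matches the order of comparisons described in the paragraph.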

To test for longitudinal factorial invariance by year, we compared a series of single-group hierarchical models following Widaman et al. (2010). Specifically, a given model contained a set of general and specific factors for both Year 1 symptoms and Year 2 symptoms. The Year 1 general factor freely covaried with the Year 2 general factor, and likewise each Year 1 specific factor freely covaried with the corresponding Year 2 specific factor but all other interfactor covariances were constrained to zero. Additionally, the residual term for each Year 1 symptom was allowed to covary with that for the corresponding symptom from Year 2. Given this structure, we again fitted models representing configural, weak, and strong invariance based on whether factor loading and intercept parameters for Year 1 symptoms were constrained to equal those for the corresponding Year 2 symptoms.
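The pattern of free and fixed interfactor covariances in this longitudinal specification can be made explicit with a small matrix. The sketch below assumes the hierarchical model with two specific factors; the factor labels (G, IA, HI) are hypothetical shorthand, and the matrix encodes only which covariances are freely estimated, not their values.

```python
import numpy as np

factor_names = ["G", "IA", "HI"]  # general, inattention, hyperactivity/impulsivity
labels = [f"{name}_Y{year}" for year in (1, 2) for name in factor_names]
# labels: G_Y1, IA_Y1, HI_Y1, G_Y2, IA_Y2, HI_Y2

n_per_year = len(factor_names)
free_covariance = np.zeros((len(labels), len(labels)), dtype=bool)

# Each Year 1 factor covaries freely only with the same factor in Year 2;
# every other interfactor covariance stays fixed at zero.
for i in range(n_per_year):
    free_covariance[i, i + n_per_year] = True
    free_covariance[i + n_per_year, i] = True

for label, row in zip(labels, free_covariance.astype(int)):
    print(f"{label:6s}", row)
```

The residual covariances between corresponding Year 1 and Year 2 symptoms follow the same same-item-across-time pattern, but they sit in the residual covariance matrix rather than the factor covariance matrix.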

Results

Model Selection

Our results for parent- and teacher-report data (see Table 2) suggest that both the hierarchical 2-factor model and the hierarchical 3-factor model have better fit compared to the simple one-factor model and the non-hierarchical, correlated factor models. As detailed in Table 2, for Year 1 parent data, the hierarchical 2-factor model had slightly better fit (CFI = 0.96, TLI = 0.95, RMSEA = 0.057) than the hierarchical 3-factor model (CFI = 0.96, TLI = 0.94, RMSEA = 0.059), while the reverse occurred with Year 2 data (hierarchical 2-factor model: CFI = 0.96, TLI = 0.95, and RMSEA = 0.064; hierarchical 3-factor model: CFI = 0.97, TLI = 0.96, and RMSEA = 0.057). In Year 1 teacher data, the hierarchical 3-factor model best fit the data (hierarchical 2-factor model: CFI = 0.94, TLI = 0.93, and RMSEA = 0.081; hierarchical 3-factor model: CFI = 0.96, TLI = 0.94, and RMSEA = 0.072), but in Year 2 the hierarchical 2-factor model had the best fit (hierarchical 2-factor model: CFI = 0.97, TLI = 0.96, and RMSEA = 0.066; hierarchical 3-factor model: CFI = 0.97, TLI = 0.96, and RMSEA = 0.068). However, within both years, the fit of each hierarchical model for both parent and teacher SWAN was very similar. We therefore conclude that the hierarchical models with one general factor and either two or three specific factors (see Fig. 1) fit the data better than all competing models, using both parent- and teacher-report data in Year 1. This conclusion is cross-validated with the follow-up data from Year 2 (see Table 2). However, it is apparent that all of these models fit better for parents than for teachers.

Table 2 Fit of ADHD symptom CFA models for parent and teacher SWAN

Hierarchical Two-Factor Model Interpretation

With the parent-rating data, all symptoms had significant, positive loadings on the general ADHD factor, with completely standardized loadings ranging from 0.51 to 0.91 (all ps < 0.05; see Table 3 for standardized factor loading estimates and R² values). With the exception of a few symptoms from the inattention symptom domain (e.g., “listen when spoken to directly”, “ignore extraneous stimuli”, and “sustain attention on tasks or play activities”) that were more strongly related to the general factor than their specific factor, most inattention symptoms were related similarly to the general factor and their specific factor. In contrast, all symptoms from the hyperactivity/impulsivity domain were more strongly related to the general factor than their specific factor. With the teacher-rating data, all symptoms had significant, positive loadings on the general ADHD factor, with completely standardized loadings ranging from 0.67 to 0.95 (all ps < 0.05; see Table 3 for standardized factor loading estimates and R² values). All symptoms from the inattention symptom domain were more strongly related to the general factor than their specific factor. Similarly, all symptoms from the hyperactivity/impulsivity domain were more strongly related to the general factor than their specific factor. The most important implication of these findings is that a general factor substantially influences all of the ADHD symptoms, while the specific factors account for additional variance not attributable to this general factor.
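Because the general and specific factors are uncorrelated in the hierarchical model, the variance explained in each symptom splits cleanly into a general part and a specific part. The notation below is introduced here for illustration, and the numbers in the example are hypothetical rather than taken from Table 3.

```latex
% Completely standardized loadings of item j on the general factor (G)
% and on its specific factor (S), with orthogonal factors:
R^2_j = \left(\lambda^{G}_{j}\right)^2 + \left(\lambda^{S}_{j}\right)^2

% Hypothetical example: \lambda^{G}_{j} = 0.80, \ \lambda^{S}_{j} = 0.40
R^2_j = 0.80^2 + 0.40^2 = 0.64 + 0.16 = 0.80
```

In this hypothetical case the general factor would account for 0.64 of the item's variance and the specific factor for a further 0.16, which is the sense in which the specific factors capture additional variance beyond the general factor.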

Table 3 Completely standardized factor loadings for hierarchical 2-factor model fitted to year 1 parent- and teacher-report data

The interpretation of the standardized factor loading estimates for the hierarchical three-factor model in Year 1 was similar to that of the hierarchical two-factor model (see Table 4). Similarly, the overall pattern of factor loadings of the hierarchical models in Year 2 was virtually identical to our findings in Year 1 (see Suppl. Tables 3 and 4).

Table 4 Completely standardized factor loadings for hierarchical 3-factor model fitted to year 1 parent- and teacher-report data

Parent versus Teacher Reports

Comparison of the parent and teacher reports indicates that the symptom factor loadings on the general ADHD factor and R² values appeared stronger for teachers than parents on all the hierarchical models in both years. However, the findings for the general factor were similar across informants, as the models for both produced moderate to strong loadings (i.e., >0.40) on this factor. Across hierarchical models, informants, and time points, inattention and hyperactivity/impulsivity (whether considered together or separately) symptoms were consistently more strongly related to the general factor than to their respective separate factors. One exception is that some parent-reported inattention symptoms in Year 1 were similarly related to the general factor and their specific factor. The loadings on the specific hyperactivity/impulsivity factors were often weak and sometimes non-significant or even negative. Overall, these findings suggest a single, dominant general latent factor underlying all 18 SWAN items.

Measurement Invariance of Sex, Informant, and Time

We examined factorial invariance by sex for both the hierarchical two-factor model and the hierarchical three-factor model separately for parent and teacher report and separately for Year 1 and Year 2 data. We found that strong measurement invariance holds across sex for both the hierarchical two-factor model and the hierarchical three-factor model and across both parent- and teacher-report (see Tables 5 and 6). This finding was cross-validated with the Year 2 follow-up data. Thus, the measurement properties of the SWAN are equivalent for boys and girls with respect to relating the ADHD constructs implied by the hierarchical factor models to the 18 observed symptoms. We also found that strong measurement invariance holds across Year 1 and Year 2 data for both the hierarchical two-factor model and the hierarchical three-factor model and across both parent- and teacher-report (see Tables 5 and 6). Thus, the measurement properties of the SWAN are stable across a 12-month period with respect to relating the ADHD constructs implied by the hierarchical factor models to the 18 observed symptoms (see Footnote 3).

Table 5 Results of factorial invariance testing for hierarchical 2-factor model
Table 6 Results of factorial invariance testing for hierarchical 3-factor model

Discussion

In the current longitudinal general population-based study, we evaluated the relative fit of six competing potential factor structures to determine the best model for the relationships among inattention, hyperactivity, and impulsivity symptoms, as reported by parents and teachers for elementary school-aged boys and girls using the SWAN scale. Hierarchical models with a general factor and two or three specific factors had the best fit regardless of informant, sex, or time point. These findings indicate a general ADHD factor that accounts for covariation among all symptoms characteristic of ADHD, as well as distinct specific factors that explain additional relationships among symptoms beyond general ADHD-related psychopathology. Our findings therefore corroborate those of other researchers documenting the superiority of general factor models in ADHD, using various ADHD rating scales and clinical interviews (see Suppl. Table 1). Complementing the results of previous studies, most of which were conducted using clinical samples, our study provides further evidence that hierarchical models of ADHD also fit data well in a community sample of children. This suggests that such models capture an important structure in how the ADHD symptoms are organized across samples and populations. The fact that we found good fit for both hierarchical models was not surprising and is in line with Toplak et al.’s (2009) findings for a clinical sample of adolescents with the teacher SWAN. Those authors interpreted the general factor model with two specific factors because it is more parsimonious than the model with three specific factors. It is also interesting to note that in the current study, hyperactivity/impulsivity factor loadings in the two-factor solution appear more consistent across parents and teachers than in the three-factor solution. In other words, positive loadings for parents tend to parallel positive loadings for teachers. However, and perhaps more importantly, a dominant general factor is present regardless of whether there are two or three specific factors.

Notably, both parent and teacher reports converged and confirmed the superiority of the general factor models over the other models. However, it is apparent that neither the general factor models nor the non-hierarchical models fit as well for teachers as for parents. We propose several explanations for this difference. First, parents and teachers base their ratings on different contextual demands (i.e., home and school, respectively). Second, parents and teachers have different time frames of experience with a particular child, which may influence their ratings. Third, teachers have access to a different reference population (e.g., the whole class of students) than parents (e.g., siblings). Nonetheless, the fact that there is measurement invariance across the two time points for teachers as well as for parents provides evidence of temporal stability of the general factor models across informants. Furthermore, on both parent and teacher reports, the inattention and hyperactivity/impulsivity (whether considered together or separately) symptoms were generally more strongly related to the general ADHD factor than to their respective separate factors. The factor loadings on the specific hyperactivity/impulsivity factor(s) were often weak and sometimes non-significant or even negative. Taken together, these results highlight an integral association between symptoms of inattention and hyperactivity/impulsivity, suggesting overlap among associated etiological pathways (Sonuga-Barke et al. 2008).

At a theoretical level, our findings converge with findings from both clinical and population studies which suggest a single, general latent construct of ADHD psychopathology underlying all 18 DSM-IV ADHD symptoms. Although specific and independent dimensions of symptoms do exist, our findings imply that individual symptoms in each ADHD domain co-occur more often than they occur in isolation. This result can perhaps partly explain some of the inter-individual heterogeneity manifest in ADHD and is in line with research findings documenting important variability in neuropsychological endophenotypes of ADHD (Sonuga-Barke et al. 2010). This finding may also help explain why co-occurrence of impairments across these dimensions (as opposed to isolated deficits) is the norm (e.g., Conners 2008). Notably, the non-hierarchical exploratory factor model also permits overlap between inattention and hyperactivity/impulsivity symptoms by allowing cross-loadings; this is theoretically consistent with the notion of the general factor in the hierarchical models tested here.

Limitations and Future Directions

Ours is one of three investigations of the hierarchical model of ADHD using a general population sample (Dumenci et al. 2004; Ullebø et al. under review), and among these it is the only one to adopt a longitudinal design. Nonetheless, there are three key limitations of the present study. First, our findings are limited to a specific age range (6 to 9) and so may not be generalizable to other age groups. However, evidence of the superiority of the hierarchical models of ADHD has been established with diverse age groups, including children, adolescents, and adults (see Suppl. Table 1). Second, although ours is the only known longitudinal study of the hierarchical models of DSM-IV-TR ADHD symptoms, it is important to keep in mind that we had only two time points, separated by only 1 year. One year might not be enough time for developmental differences to emerge. Nonetheless, our results indicate temporal stability of the ADHD hierarchical models over a 1-year follow-up, despite the change in one of the informants (teachers). Finally, in the current study we examined the ADHD symptoms alone in order to better understand the symptoms as an entity in DSM-IV and because our sample consisted of children from the general population.

It will be important in future studies to explore the factor structure of ADHD together with that of other disruptive behavior disorders and other correlates of ADHD, including cognitive functions, motivational processes, and personality trait profiles (e.g., Martel et al. 2011). Regarding the hierarchical models in particular, it will be crucial to examine the criterion-related and incremental validity of the separate general and specific factors. For example, evaluating hierarchical models that incorporate ADHD and ODD in relation to other models is likely to improve understanding of the structural relationship between these two conditions and to have important implications for diagnostic assessment, prognosis, and treatment (Martel et al. 2010a). This kind of research will allow us to examine further consistencies and inconsistencies across clinical and community samples and, eventually, to identify where the important differences lie.

Clinical Implications

Notwithstanding the limitations indicated above, our study has important clinical implications for finding an optimal diagnostic system that would enable clinicians and researchers to recognize and describe meaningful heterogeneity in ADHD in the DSM-V. Our findings are consistent with the notion that children with ADHD are characterized by substantial inter-individual heterogeneity (Sonuga-Barke 2002; Sonuga-Barke et al. 2010), with some children exhibiting high levels of both inattentive and hyperactive-impulsive symptoms and other children exhibiting a different symptom profile, with high levels of symptoms in only one domain. However, in line with a recent continuous alternative to nominal subtypes of ADHD (Lahey and Willcutt 2010), it seems critical that the DSM-V committees consider the general ADHD factor and the relative contributions of inattention and hyperactivity/impulsivity symptoms in the review of diagnostic criteria.