Attention deficit/hyperactivity disorder (ADHD) is a childhood-onset disorder that is characterized by developmentally inappropriate levels of hyperactive, impulsive, and inattentive behaviors that cause impairment in multiple settings (American Psychiatric Association 2013). Over the past 4 decades, the most prominent changes to the diagnostic criteria for ADHD have involved the relative emphasis of specific behaviors (inattention, hyperactivity, impulsivity) or the conceptualization of how these behaviors relate to each other. Most recently, researchers have emphasized that a single general factor explains most of the variation in ADHD symptoms and that this general factor is more reliable than the subdimensions of inattention and hyperactivity-impulsivity (Arias et al. 2018; Gomez et al. 2018). As such, in this study we focus on individual differences in overall levels of ADHD behaviors.

A longstanding belief has been that individual differences in ADHD behaviors reflect neurobiological dysfunction (Levy 1991; Levy et al. 1998; Swanson et al. 1998; Wender 1975; Zametkin and Rapoport 1987). Although the sophistication of methods has increased over time (e.g., from gross brain abnormalities to differences in neural network organization; from heritability to genome-wide association or epigenetic expression method studies), the underlying assumptions about the origins of ADHD behaviors remain unchanged (Castellanos and Aoki 2016; Cortese 2012; Middeldorp et al. 2016; Walton et al. 2017). ADHD behaviors are presumed to reflect cognitive impairments, which reflect aberrant neurobiological development, which results from complex interactions between genes and environmental factors.

Decades of research have investigated the specific cognitive processes that are presumed to give rise to ADHD behaviors. Although early conceptualizations of ADHD as a purely attention disorder have been dismissed (Huang-Pollock et al. 2005), numerous cognitive processes—including executive functions (EFs), delay aversion, reaction-time variability, and vigilance/arousal—have been linked to ADHD (Coghill et al. 2014; Sonuga-Barke et al. 2010; Tamm et al. 2012). The current consensus is that ADHD is a neuropsychologically heterogeneous disorder across the lifespan (e.g., Kofler et al. 2019; Mostert et al. 2015; Sjowall and Thorell 2019; Wahlstedt et al. 2009). That is, as a group, individuals with ADHD exhibit impairments in multiple cognitive domains relative to their typically developing peers. However, the specific cognitive impairments vary across individuals with ADHD, no single cognitive domain is necessary or sufficient to inform diagnosis, and a minority of individuals with ADHD do not appear to exhibit any cognitive impairment.

Meta-analytic studies have consistently reported medium-sized associations between multiple cognitive processes and ADHD symptomatology in middle childhood through adulthood (Alderson et al. 2007; Doyle 2006; Hervey et al. 2004; Homack and Riccio 2004; Willcutt et al. 2005). This evidence has only been extended to early childhood relatively recently (Pauli-Pott and Becker 2011; Schoemaker et al. 2013). In their meta-analysis of 23 studies, which included over 3000 children, Pauli-Pott and Becker (2011) reported moderate-sized associations (rs = .18–.38) between multiple domains of cognitive function (i.e., response suppression, interference, delay aversion, vigilance, working memory) and ADHD symptomatology. In their meta-analysis of 22 studies, which included over 4000 children, Schoemaker et al. (2013) reported a moderate-sized association (r = .22) between EF skills and multiple domains of externalizing behavior, including ADHD (rs = 0.24, 0.17, 0.13 for inhibitory control, working memory, and cognitive flexibility, respectively). Based on this evidence, researchers have proposed early intervention efforts that target improved cognitive function as a strategy to prevent ADHD or mitigate ADHD-related impairments (Sonuga-Barke and Halperin 2010).

The rationale for early interventions that target improvements in children’s cognitive function as a strategy for reducing ADHD behaviors is predicated on the longstanding notion that ADHD behaviors result from cognitive dysfunction (i.e., these are within-person associations). However, this assumption is not well founded. Meta-analyses report group differences between ADHD and non-ADHD youth on (or relate continuous measures of ADHD behaviors to) cognitive performance measures. However, the individual studies from which effect sizes are derived primarily involve cross-sectional or case-control designs, which conflate between- and within-person sources of variation. Even when the studies that contribute to meta-analyses involve longitudinal designs, the descriptive statistics that are used to generate effect sizes typically conflate between- and within-person sources of variation. Hence, the interpretation of an average effect size of EF task performance and ADHD behaviors of r = 0.30 is ambiguous. Children with higher levels of ADHD behaviors may perform more poorly on EF tasks because EF deficits contribute to their ADHD behaviors. Alternatively, children with higher levels of ADHD behaviors may perform more poorly on EF tasks because of other differences that co-occur with ADHD behaviors (e.g., unmeasured genetic, cognitive, or experiential factors). The former interpretation reflects a within-person association, while the latter reflects a between-person association. With few exceptions (e.g., Arnett et al. 2016), studies of ADHD have rarely explicitly attended to between versus within-person associations.

Measuring both EF skills and ADHD behaviors in the same individuals across multiple occasions is one way in which between- and within-person sources of variation can be disaggregated. However, this disaggregation is contingent on how the repeated-measures data are analyzed (Curran and Bauer 2011). Conceptually, one can imagine a set of time-invariant variables that contribute to increased levels of ADHD behaviors or, equivalently, that increase risk for diagnosis (e.g., parental ADHD, low birth weight). These time-invariant risk factors are assumed to exert a constant effect on ADHD behaviors (i.e., on any given measurement occasion, children who were exposed to time-invariant risk factors will, on average, exhibit higher levels of ADHD behaviors) and represent between-person sources of variation. Importantly, we do not have to specify what these time-invariant variables are (or even measure them) in order to estimate their effects. If repeated measures of ADHD are available, we can estimate the contribution of all measured and unmeasured time-invariant influences on ADHD behaviors based on the assumption that they exert a constant effect at each measurement occasion. At the same time, one can imagine a set of time-varying variables that contribute to increased levels of ADHD behaviors (e.g., EF skills). In contrast to time-invariant risk factors, time-varying risk factors change over time and can exert time-dependent contributions to ADHD behaviors. Time-varying risk factors must be explicitly measured (their impacts cannot be inferred) and they represent within-person associations (i.e., they reflect the unique association of a risk factor with ADHD at each measurement occasion, above and beyond the influence of all time-invariant risk factors).

In the broader social science literature, random-effect and fixed-effect regression models are the two most commonly employed strategies for distinguishing between- and within-person associations using repeated-measures data (Allison 2009; Halaby 2004; Wooldridge 2002). Here, we followed Bollen and Brand’s (2010) structural equation modeling approach for testing between- and within-person associations between EF skills and ADHD behaviors. Their approach subsumes and expands on traditional random and fixed-effects regression models.
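The distinction can be illustrated with a small simulation. The sketch below is not the structural equation model used in this study; it uses person-mean centering, the classic fixed-effects estimator, and every parameter value (sample size, effect sizes, variances) is a hypothetical assumption chosen for illustration. An unmeasured time-invariant factor raises ADHD behaviors and is correlated with EF, while the simulated within-person effect of EF is small.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_children=2000, n_waves=3, beta=-0.05, seed=1):
    """Simulate EF and ADHD scores; return (pooled_slope, within_slope).

    All parameter values are illustrative assumptions, not study estimates.
    """
    rng = random.Random(seed)
    ef_raw, adhd_raw = [], []            # pooled scores (between + within)
    ef_centered, adhd_centered = [], []  # person-mean-centered scores (within only)
    for _ in range(n_children):
        eta = rng.gauss(0, 1)  # unmeasured time-invariant influence on ADHD
        # EF is correlated with eta: children higher on eta have lower EF
        ef = [-0.5 * eta + rng.gauss(0, 1) for _ in range(n_waves)]
        adhd = [beta * e + eta + rng.gauss(0, 0.5) for e in ef]
        ef_mean = sum(ef) / n_waves
        adhd_mean = sum(adhd) / n_waves
        ef_raw += ef
        adhd_raw += adhd
        ef_centered += [e - ef_mean for e in ef]
        adhd_centered += [a - adhd_mean for a in adhd]
    return ols_slope(ef_raw, adhd_raw), ols_slope(ef_centered, adhd_centered)


pooled, within = simulate()
print(f"pooled slope: {pooled:.3f}")  # mixes between- and within-person variation
print(f"within slope: {within:.3f}")  # close to the simulated beta
```

Under these assumptions the pooled slope is several times larger in magnitude than the within-person slope, because it absorbs the correlation between EF and the time-invariant factor.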

The overarching objective of this study is to explicitly test the between- and within-person associations between children’s performance on EF skills and ADHD behaviors in early childhood. EF skills are a set of domain-general cognitive processes that include inhibitory control, working memory, and cognitive flexibility (Diamond 2013; Garon et al. 2008). EF skills represent one of multiple cognitive processes implicated in ADHD. In early childhood, EF skills are unidimensional (Espy 2017; Karr et al. 2018). Given our focus on early childhood, we prioritized tests of overall EF skills on ADHD behaviors. However, given that inhibitory control has long been considered a defining characteristic of ADHD (Barkley 1997; Nigg 2001), we repeated all analyses focusing on the time-varying effects of inhibitory control on ADHD behaviors. In both cases, our primary interest was determining whether the frequently reported bivariate association between EF skills and ADHD behaviors represented between- versus within-person sources of variation. Based on the meta-analytic results of Pauli-Pott and Becker (2011) and Schoemaker et al. (2013), we hypothesized that the unadjusted bivariate association between EF skills and ADHD behaviors would be r ≈ 0.30, with a somewhat smaller association at age 3 versus 5 years. We also hypothesized that the within-person association between EF skills and ADHD behaviors would be smaller in magnitude after we controlled for all time-invariant, between-person sources of variation, though the specific size of the within-person effect was unclear given the absence of directly relevant previous studies.

Method

Participants and Procedures

The Family Life Project is a longitudinal study of rural poverty that involves families who delivered a new child between September 2003 and August 2004 in one of six counties in eastern North Carolina or central Pennsylvania. Complex sampling procedures were employed to recruit a representative sample of 1292 children from rural counties, with oversampling of low-income families in both states and of African American families in North Carolina. A full characterization of the sampling plan and study has been detailed elsewhere (Vernon-Feagans et al. 2013). The University of North Carolina at Chapel Hill Internal Review Board provided oversight of the Family Life Project.

Following newborn hospital screening, families who were selected and agreed to participate were formally enrolled into the study by receiving a home visitor when the target child was approximately 2 months old. Participating children and their families subsequently completed multiple home and school visits. The current study makes use of data that were collected in home visits when children were 3 years, 4 years, and 5 years old. Specifically, this study is limited to 1160 children who had ADHD or EF data from at least one of the age 3-, 4-, or 5-year visits. Participating children (N = 1160) did not differ from nonparticipating children (N = 132) with respect to state of residence (40% vs. 36% residing in Pennsylvania, p = 0.51), living in a household that was recruited into the low-income stratum (78% vs. 73% poor, p = 0.33), primary caregiver educational status at study enrollment (80% vs. 82% with a high school degree or general education diploma, p = 0.62), sex of the child (51% vs. 54% male, p = 0.57), race of the child (43% vs. 36% African American, p = 0.28), or child first-born status (39% vs. 46%, p = 0.25). The average age of children at the 3-year visit was M = 3.09 years (SD = 0.15), at the 4-year visit was M = 4.03 years (SD = 0.13), and at the 5-year visit was M = 5.05 years (SD = 0.27). The child’s primary caregiver completed ADHD ratings at each visit.

Measures

ADHD Symptom Ratings

Consistent with previous studies (Pelham et al. 1992), primary caregivers rated the presence of 18 DSM-IV symptoms for ADHD using a 4-point Likert scale (0 = not at all, 1 = just a little, 2 = pretty much, 3 = very much) at the age 3-, 4-, and 5-year visits. We previously demonstrated that the ADHD items in this sample at these ages were most parsimoniously summarized by a single factor (Willoughby et al. 2012b). As such, the mean rating of all 18 items was used to summarize ADHD at each assessment (αs = 0.91, 0.91, and 0.93 at ages 3, 4, and 5 years, respectively). Of the 1160 sampled children, 5 (0.4%) did not have an ADHD rating at any timepoint, 72 (6.2%) had a rating at 1 timepoint, 97 (8.4%) had a rating at 2 timepoints, and 986 (85.0%) had a rating at all 3 timepoints.
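For readers less familiar with the internal-consistency coefficient reported above, the following is a minimal sketch of how a mean symptom score and Cronbach's alpha can be computed from item ratings. The toy ratings (four children, three items) are hypothetical, and the sketch assumes complete data; it is not the scoring code used in this study.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data; rows = children, columns = items."""
    k = len(rows[0])
    # Sample variance of each item across children
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score across children
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on the 0-3 scale (0 = not at all ... 3 = very much)
ratings = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
scores = [mean(row) for row in ratings]  # mean item rating per child
print(f"alpha = {alpha:.3f}")
print(scores)
```

With 18 items, the same function applies unchanged; only the number of columns differs.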

Executive Function (Willoughby and Blair 2016)

A common battery of EF tasks was administered at the age 3-, 4-, and 5-year visits. One research assistant presented EF tasks to the child using an open, spiral-bound flipbook (20 cm × 36 cm [8 in. × 14 in.]), and a second recorded the child’s responses into a laptop for subsequent scoring. A standard script and administration procedures were employed throughout (i.e., items were demonstrated for each task, and children were given up to three practice trials to demonstrate their ability to understand task demands; children who did not demonstrate understanding during training were not administered the full task). The task administration procedures; the psychometric properties of individual tasks; and the overall battery score, retest reliability, and criterion validity of these tasks have been elaborated elsewhere (Willoughby et al. 2010; Willoughby and Blair 2011; Willoughby et al. 2012a, b, c). Abbreviated task descriptions are provided below.

Working Memory Span (WMS; Working Memory)

This span-like task required children to perform the operation of naming and holding in mind two pieces of information simultaneously (i.e., the names of colors and animals in pictures of “houses”) and to activate one of them (i.e., animal name) while overcoming interference occurring from the other (i.e., color name). Items were more difficult as the number of houses increased (each house included a picture of a color and animal). The WMS was administered at the age 3-, 4-, and 5-year visits and completed by 787 (67.8%), 958 (82.6%), and 983 (84.7%) children at the 3-, 4-, and 5-year visits, respectively.

Pick the Picture Game (PTP; Working Memory)

This self-ordered pointing task presented children with a series of 2, 3, 4, or 6 pictures in a set. Children were instructed to pick a picture and on each subsequent page instructed to pick a different picture within the set until each picture had “received a turn.” This task required working memory because children had to update which pictures in each item set they had already touched (the spatial location of pictures changed across trials and was uninformative). Pilot testing had revealed that the PTP was too difficult for many 3-year-olds and therefore was administered only at the 4- and 5-year assessments and completed by 934 (80.5%) and 1004 (86.6%) children at the 4- and 5-year visits, respectively.

Silly Sounds Stroop (SSS; Inhibitory Control)

This Stroop task presented children with pictures of cats and dogs and asked children to make the sound opposite of what was associated with each picture (e.g., to meow when shown a picture of a dog). This task required inhibitory control, as children had to inhibit the tendency to associate bark and meow sounds with dogs and cats, respectively. The SSS was administered at the age 3-, 4-, and 5-year visits and completed by 479 (41.3%), 894 (77.1%), and 995 (85.8%) children at the 3-, 4-, and 5-year visits, respectively.

Spatial Conflict (SC; Inhibitory Control)

This task presented children with a response card that had one picture of a car and another of a boat. Initially, all stimuli (pictures of cars or boats identical to that on the response card) appeared directly above the corresponding photo on the response card (i.e., in a location that was spatially compatible with their placement on the response card). Children were instructed to touch “their” boat or car, according to which one they saw on the stimulus page. Subsequently, test items required a contra-lateral response (i.e., children were to touch their picture of the car even if its position relative to the stimulus card was reversed). This task required inhibitory control as children had to override the spatial location of test stimuli with reference to their response card. The SC was administered at the 3-year assessment and completed by 881 (75.9%) children.

Spatial Conflict Arrows (Inhibitory Control)

This task was identical in format to the SC task with the exception that the response card consisted of two black dots (“buttons”) and the test stimuli were arrows, presented one at a time, that pointed to the left or right. Children were instructed to touch the left button for a left-pointing arrow and the right button for a right-pointing arrow. Initially, all left-pointing arrows appeared either centered or directly above the left button, and all right-pointing arrows appeared either centered or above the right button. In the final portion of the assessment, some arrows pointed left but appeared above the right button, or vice versa. The task was administered at the 4- and 5-year assessments and completed by 985 (84.9%) and 1033 (89.1%) children at the 4- and 5-year visits, respectively.

Animal Go no-Go (GNG; Inhibitory Control)

This was a standard go no-go task in which children were instructed to click a button (which made an audible sound) every time they saw an animal (i.e., go trials) except when it was a pig (i.e., no-go trials). Varying numbers of go trials appeared before each no-go trial, including, in standard order, 1-go, 3-go, 3-go, 5-go, 1-go, 1-go, and 3-go trials. The GNG was administered at the age 3-, 4-, and 5-year visits and completed by 444 (38.3%), 796 (68.6%), and 980 (84.5%) children at the 3-, 4-, and 5-year visits, respectively.

Something’s the Same Game (STS; Attention Shifting)

This task presented children with a pair of pictures for which a single dimension of similarity was noted (i.e., color, size, or object type). Subsequently, a third picture was presented, and children were asked to identify which of the first two pictures was the same as the new picture. This task required the child to shift their attention from the initial labeling to a new dimension of similarity (e.g., a small blue chair and large blue flower would be described as “the same” in color; a small red dog would be described as “the same” as the small blue chair based on size). The STS was administered at the age 3-, 4-, and 5-year visits and completed by 842 (72.6%), 971 (83.7%), and 1024 (88.3%) children at the 3-, 4-, and 5-year visits, respectively.

Executive Function Task Scoring and Composite Formation

Children completed an average of 3.0 (SD = 1.7) of the 5 tasks that were administered at the age 3 assessment, 4.8 (SD = 2.1) of the 6 tasks that were administered at the age 4 assessment, and 5.2 (SD = 1.9) of the 6 tasks that were administered at the age 5 assessment. Correlations between the EF tasks within time points were weak to moderate (age 3: rs = 0.08–0.25; age 4: rs = 0.10–0.32; age 5: rs = 0.07–0.28). As elaborated elsewhere (Willoughby et al. 2012a, b, c), item-response theory models (i.e., graded-response and two-parameter logistic models, depending on the task) were used for task scoring. All tasks exhibited longitudinal measurement invariance and were scored on a common developmental scale (where M = 0, SD = 1 was defined as average performance at the age 4 assessment). In our previously published studies, we used individual EF task scores as reflective indicators of a latent variable of EF (e.g., Willoughby et al. 2010; Willoughby et al. 2012a, b, c). However, more recently, we have determined that individual EF task scores are better represented as a composite score (Willoughby et al. 2016, 2017). This EF composite was formed by taking the mean of each child’s task scores and was used as the primary independent variable in the models tested below. Of the 1160 sampled children, 39 (3.4%) did not have an EF composite score at any timepoint, 87 (7.5%) had a composite score at one timepoint, 169 (14.6%) had a composite score at two timepoints, and 865 (74.6%) had a composite score at all three timepoints.
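The composite described above can be sketched as the mean of whichever task scores a child completed. In the sketch below, the task scores are hypothetical values on the common developmental scale, the abbreviation "SCA" for Spatial Conflict Arrows is introduced here only for illustration, and returning no composite when no task was completed is an assumption of the sketch.

```python
def ef_composite(task_scores):
    """Mean of the task scores a child completed; None if no task was completed."""
    completed = [s for s in task_scores.values() if s is not None]
    if not completed:
        return None
    return sum(completed) / len(completed)


# Hypothetical IRT-based scores for one child at the age 4 visit;
# None marks a task that was not completed.
child = {"WMS": 0.4, "PTP": None, "SSS": -0.2, "SCA": 0.1, "GNG": None, "STS": 0.5}
composite = ef_composite(child)
print(composite)
```

Averaging over available tasks lets children with partial task data contribute a composite score, at the cost of composites that rest on different numbers of tasks.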

Analytic Strategy

We estimated a series of structural equation models following the approach described by Bollen and Brand (2010). As depicted in Fig. 1, the general form of these models was:

$$ {ADHD}_{it}={\beta}_t{EF}_{it}+{\lambda}_t{\eta}_i+{\varepsilon}_{it} $$

Fig. 1 General Panel Model. ADHD = attention deficit hyperactivity disorder; EF = executive function. β = coefficients relating time-specific measures of EF to ADHD behaviors; ε = variation in ADHD unexplained by time-varying EF and latent time-invariant factors; λ = coefficients for the magnitude of the associations between latent time-invariant influences on ADHD; φ = correlations between unmeasured time-invariant factors and EF.

In words, ADHD is a dependent variable for the ith child at time t (here the age 3-, 4-, and 5-year assessments). EF is a time-varying independent variable for the ith child at time t. Beta (β) is a vector of coefficients that relate time-specific measures of EF to ADHD behaviors. Eta sub i (η_i) is a latent variable that represents the impact of all time-invariant factors that influence ADHD behaviors for child i. Lambdas (λ) are coefficients that represent the magnitude of the associations between these latent time-invariant influences on ADHD at time t. One lambda coefficient is fixed to 1 for model identification; we chose age 4 because of our interest in understanding whether the effect of EF on ADHD differed across age 3 and 5 assessments. The epsilon (ε) term represents the variation in ADHD (for the ith child at time t) that is unexplained by time-varying EF and latent time-invariant factors.
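As a brief worked step, the model equation shows why an unadjusted bivariate association mixes the two sources of variation. Taking the covariance of both sides with EF at time t, and assuming for this sketch that the residual ε is uncorrelated with EF, gives:

$$ \operatorname{Cov}\left({ADHD}_{it},{EF}_{it}\right)={\beta}_t\operatorname{Var}\left({EF}_{it}\right)+{\lambda}_t\operatorname{Cov}\left({\eta}_i,{EF}_{it}\right) $$

The first term carries the within-person effect of EF. The second term carries the between-person contribution, which depends on how strongly the latent time-invariant influences covary with EF. Dividing by the variance of EF shows that the bivariate regression slope equals β only when that covariance is zero.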

The magnitude and statistical significance of the beta coefficients were of primary substantive interest. However, we began by fitting a sequence of models that imposed increasingly restrictive constraints on the lambda, beta, and epsilon parameters, along with the correlations between eta and EF (represented as phi [φ] parameters). Our sequence of models followed the backward search strategy that was described by Bollen and Brand (2010). We used likelihood ratio tests, global model fit statistics (i.e., comparative fit index [CFI], root mean squared error of approximation [RMSEA]), and information criteria (Akaike information criterion, Bayesian information criterion) to determine which model specification provided the best fit to the observed data. It was only after we had identified this best-fitting model that we interpreted the association between EF and ADHD. All models were fit using a robust full-information maximum-likelihood estimator that took individual probability weights and stratification variables into account to appropriately represent the complex sampling design. Full-information maximum-likelihood estimation also accommodated missing data (i.e., all available EF and ADHD data for each child were included) and is considered a statistical best practice (Schafer and Graham 2002). All models were estimated using version 8.0 of Mplus (Muthén and Muthén 1998–2017).

Results

Bivariate Associations

Descriptive statistics for parent-reported ADHD symptoms and children’s EF composite scores are provided in Table 1. The across-time correlations for ADHD measures were large (rs = 0.60–0.72, ps < 0.001) and for EF were moderate to large (rs = 0.36–0.61, ps < 0.001). The within-time associations between EF and ADHD were consistent with previous meta-analytic results (rs = 0.20, 0.30, and 0.29 for the age 3-, 4-, and 5-year assessments, respectively, all ps < 0.001). In terms of mean differences, there were substantial increases in performance on the EF composite and modest decreases in parent-reported ADHD symptomatology.

Table 1 Descriptive statistics

Model Description, Fit, and Comparisons

A synopsis of global model fit, information criteria, and likelihood ratio tests that were used for model comparisons appears in Table 2. Model 1 was a baseline model in which all parameters were freely estimated and fit the data well, χ2 (3) = 3.9, CFI = 0.99, RMSEA (95% confidence interval [CI]) = 0.016 (0.000, 0.054). Model 2 tested whether the lambda parameters could all be constrained to 1, which would imply that the latent time-invariant influences on ADHD were constant across ages 3–5 years. Although global fit for Model 2 was acceptable (see Table 2), this model fit the data significantly worse than Model 1, Δχ2 (2) = 8.6, p = 0.014. An inspection of the parameter estimates from Model 1 suggested that the lambda parameter at age 3 was weaker than at ages 4 and 5. Model 2b freed the constraint on the lambda parameter at age 3 and no longer differed in fit from Model 1, Δχ2 (1) = 2.8, p = 0.10. Building on Model 2b, Model 3 tested whether the beta parameters could be constrained to be equal, which would imply that the time-varying effect of EF on ADHD was constant across ages 3–5 years. This constraint was reasonable, as Model 3 fit the data as well as Model 2b, Δχ2 (2) = 1.2, p = 0.55.

Table 2 Summary of model fit statistics and model comparisons

Model 4 tested whether the covariances between the latent variable eta and the time-varying measures of EF could be constrained to 0, which would imply that the latent time-invariant influences on ADHD were independent of the effects of EF on ADHD. Model 4 fit the data more poorly than Model 3, Δχ2 (3) = 70.4, p < 0.001. An inspection of the parameter estimates from Model 3 clearly indicated that all three of the covariances between the latent variable and time-varying measures of EF were non-zero, and no further constraints for the phi parameters were considered.

Continuing to build from Model 3 (because none of the constraints in Model 4 were supported), Model 5 tested whether the residual variances (epsilon) could be constrained to be equal, which would imply that the combined time-varying and time-invariant effects on ADHD were equivalent across ages 3–5 years. Model 5 fit the data more poorly than did Model 3, Δχ2 (2) = 31.1, p < 0.001. An inspection of the parameter estimates from Model 3 suggested that the epsilon parameter at age 3 was larger than that at ages 4 and 5. Model 5b freed the constraint on the epsilon parameter at age 3 but continued to fit the data more poorly than Model 3, Δχ2 (1) = 5.7, p = 0.017.

Consistent with the nested likelihood ratio tests, a comparison of the information criteria across models also indicated that Model 3 represented the most parsimonious representation of the data (see Table 2). Hence, although all models met traditional standards for acceptable global model fit, the likelihood ratio tests and information criteria converged in identifying Model 3 as providing the best fit to the observed data. Having identified the model that best represented the observed data, we next focused on the interpretation of model parameters.
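The p values attached to the chi-square differences reported above can be checked with a short sketch. It only converts a reported statistic and its degrees of freedom into an upper-tail probability; it assumes the reported differences can be referred directly to a chi-square distribution (with a robust estimator, any scaling correction would already be reflected in the reported values). Small discrepancies are expected because the reported statistics are rounded.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k = 1
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        k = 2
        q = math.exp(-half)             # closed form for df = 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Chi-square differences and degrees of freedom as reported in the text
comparisons = {
    "Model 2 vs. 1": (8.6, 2),
    "Model 2b vs. 1": (2.8, 1),
    "Model 3 vs. 2b": (1.2, 2),
    "Model 4 vs. 3": (70.4, 3),
    "Model 5 vs. 3": (31.1, 2),
    "Model 5b vs. 3": (5.7, 1),
}
for label, (delta_chi2, df) in comparisons.items():
    print(f"{label}: p = {chi2_sf(delta_chi2, df):.3g}")
```

A nonsignificant difference means the added constraints do not worsen fit appreciably, so the more restrictive model is retained.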

Effects of Executive Function Skills on ADHD Behaviors

Based on Model 3, the within-person effect of EF on ADHD at age 3 was β = −0.052 (95% CI = −0.093 to −0.011), p = 0.012; at age 4 was β = −0.051 (95% CI = −0.090 to −0.011), p = 0.013; and at age 5 was β = −0.043 (95% CI = −0.078 to −0.009), p = 0.014. In contrast, the between-person correlations involving the time-invariant latent influences on ADHD and time-specific indicators of EF were φs = −0.211, −0.286, and −0.312, for age 3, 4, and 5 years, respectively, all ps < 0.001. Figure 2 replaces the general form of the panel model with model-based estimates. To reinforce the importance of identifying the model that provided the best fit to the observed data, we summarized the point estimate and confidence interval for the within-person effects of EF on ADHD across all the models in Fig. 3. It is noteworthy that the estimated effect of EF skills on ADHD behaviors was largest when the latent time-invariant influences on ADHD were assumed to be uncorrelated with time-varying measures of EF skills (Model 4), a condition that was not supported by the data. This assumption cannot be tested in studies that employ cross-sectional designs or in studies that employ longitudinal designs but that do not parameterize models in ways that permit the explicit partitioning of between- and within-person variation in ADHD behaviors.

Fig. 2 Representation of Model 3 Estimates. *p < 0.05. **p < 0.01. ***p < 0.001

Fig. 3 Standardized Effects of EF Skills on ADHD Behaviors, by Model and Child Age

Supplementary Models

Up to this point, we have focused on the association between an overall measure of EF and ADHD. Although combining multiple EF tasks into a single score improves the reliability of measurement (Willoughby et al. 2017), ADHD may be more strongly associated with specific dimensions of EF skills, such as inhibitory control. To test this possibility, we recomputed the EF composite using up to three inhibitory control tasks and re-estimated the full sequence of structural equation models. The same pattern of results was evident with respect to the model comparisons. However, the magnitude of the effect was somewhat smaller for inhibitory control than for all EF tasks combined: age 3 β = −0.037 (95% CI = −0.077 to 0.003), p = 0.068; age 4 β = −0.034 (95% CI = −0.071 to 0.003), p = 0.069; age 5 β = −0.028 (95% CI = −0.059 to 0.002), p = 0.071. A summary of these results appears in supplementary materials (Online Resource Fig. 1 and Online Resource Table 1).

Up to this point, we have focused on overall ADHD behaviors. However, there is a long tradition in the ADHD literature of distinguishing inattentive from hyperactive-impulsive behaviors (Willcutt et al. 2012). We re-estimated the full sequence of structural equation models using EF composite scores with inattentive scores and then separately with hyperactive-impulsive scores. Model 3 continued to be the best representation of the data. None of the within-person associations between inattention and EF were statistically significant: age 3 β = −0.036 (95% CI = −0.078 to 0.007), p = 0.100; age 4 β = −0.035 (95% CI = −0.077 to 0.007), p = 0.100; age 5 β = −0.030 (95% CI = −0.067 to 0.006), p = 0.103. In contrast, the within-person associations between hyperactive-impulsive behaviors and EF were small but statistically significant: age 3 β = −0.064 (95% CI = −0.106 to −0.023), p = 0.003; age 4 β = −0.061 (95% CI = −0.101 to −0.021), p = 0.003; age 5 β = −0.053 (95% CI = −0.088 to −0.018), p = 0.003. A summary of these results is available in supplementary materials (Inattentive: Online Resource Fig. 2 and Online Resource Table 2; Hyperactive/Impulsive: Online Resource Fig. 3 and Online Resource Table 3).

Discussion

A prevailing assumption in ADHD research has been that individual differences in ADHD symptomatology reflect the outward manifestation of underlying cognitive impairments. EF skills represent one domain of cognitive functioning that has been consistently related to ADHD symptoms across the life span. EF skills undergo pronounced normative improvements in early childhood, which has led to suggestions that early intervention efforts that target EF skill development may provide a means of preventing ADHD symptomatology or mitigating symptom-related impairments (Halperin et al. 2012; Sonuga-Barke and Halperin 2010). The purpose of this study was to leverage repeated-measures data to provide a stronger test of whether the association between EF skills and ADHD behaviors in early childhood reflects differences that exist between versus within children.

We observed bivariate associations between EF skills and ADHD behaviors that were in line with previous meta-analyses, including evidence of stronger associations among older versus younger preschool-aged children (rs = 0.30 vs. 0.20). These age differences were consistent with the idea that the association between EF skills and ADHD is less pronounced during periods of substantial normative developmental change (Pauli-Pott and Becker 2015). Notably, we have previously established that our measures of EF skills and ADHD behaviors exhibited longitudinal measurement invariance during the ages studied here (Willoughby et al. 2012a, b, c). This helps to rule out the possibility that these age-related differences in the association between EF skills and ADHD behaviors are an artifact of poorer quality measurement at age 3 versus age 5 years.

Most studies that have investigated the association between EF skills and ADHD in early childhood have not provided strong control for confounder variables that may account for some or all of these associations. Here, we took advantage of repeated-measures data to control for all time-invariant confounder variables that might have accounted for the association between EF skills and ADHD behaviors. We fit a range of models that differed with respect to the assumptions that were made about the nature of the association between EF skills and ADHD behaviors, and we used formal statistical criteria to discern which model specification provided the best fit to the observed data. Our emphasis on model comparisons was important because all the models exhibited acceptable global model fit (see Table 2). Nonetheless, some models provided relatively better fit than others. We selected the model that provided the best representation of the observed data. It was only after we identified this model that we began to interpret parameter estimates. The benefits of this approach are evident from a review of Fig. 3, which demonstrates how the magnitude of the within-person association between EF and ADHD varies across a series of models, any one of which would have been considered to have “good enough fit” if considered in isolation.

In this study, the association between EF skills and ADHD behaviors primarily resulted from differences between children, which is consistent with confounding. That is, the results of this study suggest that EF skills are moderately to strongly correlated with other time-stable influences on children’s ADHD behaviors. When those time-stable influences were controlled for, the time-specific within-person associations between EF skills and ADHD behaviors were small, far smaller than what is conveyed by meta-analytic studies (which typically conflate between- and within-person effects).
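The between- versus within-person distinction can be illustrated with a minimal simulation sketch. This is not the structural equation modeling approach used in this study; it substitutes simple person-mean centering on simulated data. All names (`simulate`, `decompose`) and all parameter values (the loading of −0.6 on the stable factor, the within-person effect of −0.05, the noise scales) are hypothetical choices made only for illustration. The sketch assumes that a single time-stable child-level factor influences both EF and ADHD scores, and that occasion-specific deviations in EF have only a small effect on ADHD.

```python
import numpy as np


def simulate(n_children=2000, n_waves=3, within_effect=-0.05, seed=0):
    """Simulate EF and ADHD scores that share a time-stable confounder.

    All numeric values are hypothetical and chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    # One stable (time-invariant) factor per child, shared by both outcomes.
    stable = rng.normal(size=(n_children, 1))
    # Occasion-specific EF deviations around each child's stable level.
    ef_dev = rng.normal(size=(n_children, n_waves))
    ef = stable + ef_dev
    # ADHD depends strongly on the stable factor, weakly on EF deviations.
    noise = rng.normal(scale=0.8, size=(n_children, n_waves))
    adhd = -0.6 * stable + within_effect * ef_dev + noise
    return ef, adhd


def decompose(x, y):
    """Return pooled, between-person, and within-person correlations.

    x and y have shape (n_children, n_waves).
    """
    x_mean = x.mean(axis=1, keepdims=True)
    y_mean = y.mean(axis=1, keepdims=True)
    # Pooled: ignores the nesting of occasions within children.
    pooled = np.corrcoef(x.ravel(), y.ravel())[0, 1]
    # Between: correlation of child-level means.
    between = np.corrcoef(x_mean.ravel(), y_mean.ravel())[0, 1]
    # Within: correlation of deviations from each child's own mean,
    # which removes every time-invariant influence.
    within = np.corrcoef((x - x_mean).ravel(), (y - y_mean).ravel())[0, 1]
    return pooled, between, within


if __name__ == "__main__":
    ef, adhd = simulate()
    pooled, between, within = decompose(ef, adhd)
    print(f"pooled r  = {pooled:.3f}")
    print(f"between r = {between:.3f}")
    print(f"within r  = {within:.3f}")
```

Under these assumptions, the pooled correlation is moderate in size even though the within-person correlation is close to zero, because the pooled estimate mixes the two sources of variation. Person-mean centering removes only time-invariant influences; time-varying confounders would remain.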

In many respects, the results of this study are unsurprising. The discordance between performance-based measures of EF skills and ADHD behaviors is well known, which is why performance-based EF measures are not used to diagnose ADHD. Paradoxically, a recurring idea in the literature, which has received some empirical support, is that efforts to improve a child’s EF skills (or related neurocognitive processes) will have corresponding benefits in terms of their ADHD behaviors (e.g., Spencer-Smith and Klingberg 2015). Although the methods of improving EF skills vary across studies—from meta-cognition training, to increased physical activity, to enhanced play, to adaptive computerized games—the theory of change underlying many interventions that are used in early childhood is the same (Halperin et al. 2013; Hoza et al. 2015; Tamm and Nakonezny 2015; van Dongen-Boomsma et al. 2014). This contradiction about whether an individual’s EF skills contribute to their ADHD behaviors—and if so, by how much—abounds in the literature. The results of this study suggest that most of the association between EF skills and ADHD behaviors derives from differences that exist between, not within, children.

It is instructive to contrast our approach with other recent studies that have investigated the association between ADHD symptoms and EF skills in early childhood. For example, Sjöwall and colleagues reported that measures of EF skills that were obtained in early childhood predicted ADHD behaviors and academic achievement in adolescence, above and beyond the effects of preschool ADHD behaviors (Sjöwall et al. 2017). Pauli-Pott and colleagues reported that EF skills (and related neurocognitive processes) that were measured at ages 4 and 5 predicted ADHD symptom changes from age 4 to 5 but that the converse was not true (Pauli-Pott et al. 2017). Tibu and colleagues reported that psychosocial deprivation experienced in early childhood predicted EF skills and ADHD behaviors in middle childhood and adolescence and that EF skills served to mediate the association between early deprivation and later ADHD behaviors (Tibu et al. 2016a, 2016b). These studies and others like them are interpreted as evidence that individual differences in EF skills precede and contribute to the manifestation of ADHD behaviors, which is inherently a within-person inference. However, despite the use of longitudinal designs, none of these studies analyzed repeated-measures data in ways that explicitly distinguished between- and within-person sources of variation. As such, they offer a relatively weak basis of inference for the assertion that ADHD behaviors reflect underlying impairments in EF skills.

Another approach for investigating the association between EF skills and ADHD behaviors has involved recruiting samples of children who meet diagnostic criteria for ADHD and a matched group of typically developing comparison youth. All children complete a common set of EF (and/or other cognitive performance-based) tasks. Statistical methods are then used to identify subgroups of children who are homogeneous with respect to their patterns of EF (and/or other cognitive) task performance (e.g., Fair et al. 2012; Rajendran et al. 2015; Roberts et al. 2017). These studies provide an elegant way of empirically documenting cognitive heterogeneity among youth with ADHD, which was anticipated over a decade ago (see Nigg et al. 2005). Nonetheless, these studies do not inform questions about whether ADHD behaviors reflect underlying EF impairments, even for the subgroups of children who exhibit elevated ADHD symptomatology and who exhibit impairments in EF task performance. Once again, these studies do not test within-person associations.

This study had five limitations. First, a variety of cognitive processes that have been implicated in the development of ADHD were not included in this study (e.g., delay aversion, reaction-time variability). It is unclear whether a similar pattern of results would have been evident had these other constructs been included. Second, we focused on the association between EF skills and ADHD behaviors, without consideration of whether these behaviors resulted in impairment, which may limit the clinical utility of this study. Third, our results indicated that stable inter-individual difference variables accounted for most of the association between EF skills and ADHD behaviors. However, we did not consider the origin or predictors of these individual differences, which is an important direction for future research. Fourth, our study is based on a community sample of families and children who were recruited from nonmetropolitan, low-wealth counties in North Carolina and Pennsylvania. It is unclear to what extent these results would generalize to other settings. Fifth, we capitalized on normative changes that occur in EF skill development and ADHD behaviors in early childhood. The approach that we used here is not well suited to circumstances in which within-person variation in EF skills and ADHD behaviors is not expected.

Modern perspectives on developmental psychopathology emphasize the multilevel nature of all disorders, which represent complex associations among genetic, neural, cognitive, behavioral, and environmental factors. ADHD has long been characterized in this vein, such that individual differences in ADHD behaviors have been presumed to reflect developmental impairments in specific cognitive processes. Much of the extant evidence that has been marshalled to support this assumption conflates between- and within-person sources of variation, even though within-person variation is implicated in theoretical models of ADHD and is uniquely relevant to clinical decision making. We hope that the results of this study will challenge others to think critically about this distinction. Although we have emphasized the merits of using repeated-measures data, experimental studies remain the strongest test of whether impaired EF skills contribute to the emergence and/or maintenance of ADHD behaviors in individual children. Such studies narrowly target improvements in EF (or other related cognitive) skills and evaluate whether these experimentally induced improvements result in improvements in ADHD behaviors.