Introduction

The use of electroencephalography (EEG) in ADHD began more than 75 years ago, when Jasper et al. [1] reported a slowing of the EEG rhythms at fronto-central sensors, a putative indicator of abnormal brain function, in a group of “behavior problem children” described as hyperactive, impulsive, and highly variable. The relative maturity of this finding, when compared with the recency of other major advances in neuroimaging [2, 3], underscores the potential for clinical utility of EEG. EEG is readily accessible and inexpensive, and it measures, with millisecond temporal resolution, the electrical activity produced by neuronal ensembles of the cerebral cortex. Yet, 75 years later, clinical applications of EEG in psychiatry are extremely controversial, with the primary question being whether the knowledge gained from EEG has any practical diagnostic value [4-6]. A number of excellent recent reviews have discussed the methodological limitations, from a clinical standpoint, of using EEG as a diagnostic [6], and have pointed to a relative lack of diagnostic studies [5, 7•].

In this review, we focus instead on the current state of EEG metrics that have potential for application in diagnosis of ADHD. We first summarize the most recent progress in two classes of neurophysiological features that have been associated with group differences between ADHD and other populations, the slow EEG rhythms described by Jasper et al. [1] and event-related potentials (ERP). Second, we evaluate new developments that more directly address the links between EEG features and clinical heterogeneity in ADHD — significant factors that may have limited past utility of EEG in ADHD diagnosis. We cautiously conclude that these new developments, which include multivariate analyses and resolution of EEG signals into their neural generators, place EEG on a path to transition from a research tool to an aid in clinical evaluation of ADHD.

Neurophysiological Candidates for Biomarkers of ADHD

The signal measured by each EEG sensor is a time course representing, in amplitude, the amount of summed electrical activity occurring somewhere in the cortex. This signal is rich in information that can be extracted using a variety of different techniques. In Fig. 1, we summarize common techniques for extracting information from the EEG time course that can be used to diagnose ADHD. Meaningful information content is typically extracted from such a signal in one of two ways: by quantifying the power (absolute magnitude) of oscillations of the signal spanning an interval of minutes (typically during resting conditions [Fig. 1a, lower left]), or by computing the average change in latency or amplitude of the electrical potential in the range of hundreds of milliseconds following or preceding some event (i.e., the event-related potential, ERP [Fig. 1a, upper right]). The former is broadly compatible with a measure of brain state, whereas the latter measures transient cortical dynamics. A combination of these techniques can be used to quantify event-related changes in power (and therefore brain state [Fig. 1a, lower right]). Since EEG signals are collected from many spatial locations on the scalp, representing the contributions of many sources in the cortex, these spectral (i.e., power) and temporal metrics can be computed for different electrodes (Fig. 1b). The EEG dataset is therefore a combination of temporal, spectral, and spatial (e.g., electrodes) “features”, all of which can be used to assist diagnosis (Fig. 1c). In the following discussion we review recent developments in common metrics used in ADHD diagnosis.

Fig. 1

Diagnosis of ADHD can be based on temporal (a), spectral (a) and spatial (b) features of EEG, either alone or in combination (c). Raw EEG (a, top left) can be decomposed into spectral components that are quantified by power, which represents the amplitude of oscillations of varying frequencies that are present in the continuous signal. These measures capture the background “state” of brain activity. Alternatively, the data can be segmented (or epoched) around an event of interest (x). The epochs are averaged and normalized by pre-stimulus activity to produce the event-related potential. These measures quantify temporal dynamics of information processing. By combining spectral analysis with event-related averaging, one can analyze event-related spectral power, that is, changes in synchronization that may represent changes in the brain state during information processing. Spatial features (b) of EEG are scalp topography maps (spectral and time-domain values across electrodes) and their estimated cortical sources. Any of the spatial, temporal and spectral features of the EEG signal may be used to distinguish between patients with and without ADHD. Diagnosis based on EEG features therefore benefits from multivariate approaches that use patterns across features to classify patients (c). The lower panel shows an extreme example of the benefit of multivariate classification. Whereas each measure alone shows only weak trend differences between the two populations (e.g., red = ADHD, blue = Control), the combination of the two metrics (middle scatter plot) produces a linear function that dissociates between the two groups (e.g., ADHD fall above and Control fall below the line). Such an approach is likely to be of value in ADHD, which is known to exhibit significant variability in EEG measures.

Theta: Slowed Brain Rhythms

The most robust EEG feature associated with ADHD is elevated power of slow waves (4-7 Hz “theta”) and/or decreased power of fast waves (14-30 Hz “beta”), typically recorded over fronto-central electrodes, which are sometimes combined and quantified by the theta/beta ratio (TBR) [8, 9]. This conclusion was bolstered by early reports of medium to large effect sizes, ranging from 0.62 to 3.08 [10, 11, 12••], for group differences in TBR and of diagnostic sensitivities and specificities in excess of 90 % in multiple reports [13-15]. The relevance of TBR to ADHD was further strengthened by initial hypotheses linking the increases of slow wave activity contributing to this ratio with hypoarousal [9, 16], one of the earliest characterizations of the disorder [17]. The combination of a clear theoretical rationale and significant statistics inevitably contributed to the July 15, 2013 news release (http://www.fda.gov/newsevents/newsroom/pressannouncements/ucm360811.htm) by the Food and Drug Administration announcing approval of a Neuropsychiatric EEG-Based Assessment Aid (NEBA) System, rooted in TBR, for the diagnosis of ADHD.
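
As a rough illustration of how a TBR might be computed from a single resting-state channel, the following Python sketch estimates band power from a windowed periodogram and takes the ratio of theta (4-7 Hz) to beta (14-30 Hz) power. The function names, the sampling rate, the use of mean (rather than summed) power per band, and the synthetic signals are all assumptions made for illustration; they are not taken from the reviewed studies or from the NEBA system.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean periodogram power of `signal` in the band [f_lo, f_hi] Hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()            # remove the DC offset
    window = np.hanning(signal.size)           # taper to limit spectral leakage
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return power[in_band].mean()

def theta_beta_ratio(signal, fs, theta=(4.0, 7.0), beta=(14.0, 30.0)):
    """Ratio of theta-band to beta-band power (band edges follow the text)."""
    return band_power(signal, fs, *theta) / band_power(signal, fs, *beta)

# Synthetic example (hypothetical values): a 6 Hz "theta" and a 20 Hz "beta"
# sinusoid plus a small amount of white noise, sampled at 250 Hz for 20 s.
fs = 250.0
t = np.arange(0, 20.0, 1.0 / fs)
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(t.size)
beta_wave = 1.0 * np.sin(2 * np.pi * 20 * t)
strong_theta = 2.0 * np.sin(2 * np.pi * 6 * t) + beta_wave + noise
weak_theta = 1.0 * np.sin(2 * np.pi * 6 * t) + beta_wave + noise

tbr_strong = theta_beta_ratio(strong_theta, fs)
tbr_weak = theta_beta_ratio(weak_theta, fs)
print(f"TBR with strong theta: {tbr_strong:.2f}, with weak theta: {tbr_weak:.2f}")
```

Doubling the theta amplitude roughly quadruples the ratio, because power scales with the square of amplitude. In practice the ratio also depends on choices such as band edges, electrode, and recording condition, which vary between studies.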

The timing of the NEBA news release coincides with rising concern over the accuracy and reliability of TBR as a diagnostic [4] (Table 1). At least five recent studies have failed to replicate theta or TBR differences in ADHD versus non-ADHD, between groups of children [18] and adults [19, 20], as well as cross-sectionally across age [21, 22, 23•]. In a study of 101 children (62 diagnosed with ADHD), Ogrim et al. [18] reported sensitivity of 63 % and specificity of only 58 % in differentiating between children with and without ADHD based on TBR, in contrast with an accuracy of 85 % based on classification by omission errors alone. In a cross-sectional analysis using logistic regression, Buyck and Wiersema [21] reported accuracy of 89.8-96.5 % in TBR predicting age (theta decreased with age), but only 49.2-54.8 % accuracy in predicting whether an individual has ADHD. Similarly, Liechti et al. [22] found 81 % accuracy in predicting age based on TBR, but only 53 % accuracy in predicting diagnosis. In the largest study of the TBR to date, no significant differences were found between 562 children, adolescents, and adults with ADHD compared to 309 non-ADHD controls, although modest heterogeneity was attributed to ADHD subtype and psychiatric co-morbidity [23•]. These null results were confirmed in a recent meta-analysis by Arns et al. [12••], who reported a diminishing, non-significant TBR effect size that was significantly associated with year of study publication (r = -0.96, p = 0.002). Notably, the TBR for the ADHD group has remained fairly stable, whereas the control group TBR has steadily risen over the years between 2006 and 2013 [12••]. Furthermore, in two recent attempts to explicitly test the posited [16] association between TBR and arousal, Clarke et al. [24] and Barry et al. [25] have reported no significant relationship between TBR and skin conductance level (SCL). Rather, they replicated their previously reported associations between SCL and power in the “alpha” (8-14 Hz) frequency range [26, 27]. The absence of a TBR-SCL correlation was despite detection of significant group differences in SCL and theta power, indicating that a lack of statistical power was not the mediating factor in the null result. It therefore remains unclear with what cortical activity, cognitive functions and behavioral symptomatology the TBR is associated. These studies do suggest, however, that TBR is not reliable in discriminating between individuals with and without ADHD.
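
For readers less familiar with the metrics quoted above, the following Python sketch shows how sensitivity, specificity, and accuracy follow from the counts of correct and incorrect classifications. The counts are hypothetical, chosen only so that a balanced sample reproduces a sensitivity of 63 % and a specificity of 58 %; they are not the counts reported by any of the studies cited here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: ADHD cases classified as ADHD        fn: ADHD cases classified as non-ADHD
    tn: controls classified as non-ADHD      fp: controls classified as ADHD
    """
    sensitivity = tp / (tp + fn)               # proportion of ADHD cases detected
    specificity = tn / (tn + fp)               # proportion of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical balanced sample: 100 individuals with ADHD and 100 controls.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=63, fn=37, tn=58, fp=42)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, accuracy={accuracy:.3f}")
```

Note that accuracy depends on the proportion of cases in the sample, whereas sensitivity and specificity do not, which is one reason the three values are best reported together.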

Table 1 Studies examining the diagnostic utility of theta/beta ratio in ADHD

Event-Related Potentials: Abnormal Cortical Processing

The transient neural dynamics captured by ERPs have also been explored for distinguishing features of ADHD. An advantage of ERPs is that, because they capture the temporal evolution of neural activity following a prescribed event, they can be linked with specific stages of processing. They can be used to distinguish, for instance, between deficits of sensory processing (e.g., expected to occur within the first 100 ms or so following a tone) and post-sensory discrimination (e.g., expected only after sensory processing has completed), thus potentially allowing for a more refined diagnostic.

In practice, classification success using features of ERPs has been modest, hovering in the range of 70-80 % [8, 30]. In a review of a decade of ERP research (2002-2012), Johnstone et al. [7•] pointed to group differences between ADHD and controls in a variety of ERP features related to executive functions such as selective attention (P2, P3), response inhibition (N2, P3), error detection (ERN, Pe), and feedback processing (FRN), but noted that the results were quite variable, and that systematic studies of diagnostic success were largely absent from the literature. We are aware of only one recent meta-analysis [31] of the P3, a positive voltage deflection around 300 ms that has been associated with stimulus evaluation and response selection [32, 33], conducted across six studies in adults. This analysis revealed a medium effect size (Cohen’s d = -0.55) for distinguishing adults with and without ADHD, which is consistent with the conclusions of Johnstone et al. [7•] but requires further research and reporting of sensitivity and specificity. The calculation of ERP features such as peak amplitudes or latencies, however, can be susceptible to high variance when relatively few trials (<50) are averaged, especially when only one sensor is considered. This may have limited the efficacy of ERP features in predicting ADHD diagnosis in prior studies.
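
The dependence of ERP estimates on trial count can be illustrated with a small simulation. The Python sketch below epochs nothing from real data; instead it assumes a hypothetical P3-like waveform buried in independent Gaussian noise on every trial, applies pre-stimulus baseline subtraction and averaging, and compares the error of the averaged ERP for 30 versus 300 trials. The waveform shape, noise level, sampling rate, and function names are illustrative assumptions only.

```python
import numpy as np

def erp_from_epochs(epochs, times, baseline=(-0.2, 0.0)):
    """Average epochs (trials x samples) after subtracting each trial's
    mean voltage in the pre-stimulus baseline window (in seconds)."""
    epochs = np.asarray(epochs, dtype=float)
    in_baseline = (times >= baseline[0]) & (times < baseline[1])
    corrected = epochs - epochs[:, in_baseline].mean(axis=1, keepdims=True)
    return corrected.mean(axis=0)

fs = 250.0                                      # assumed sampling rate (Hz)
times = np.arange(-0.2, 0.8, 1.0 / fs)          # -200 ms to 800 ms around the event
# Hypothetical "true" ERP: a positive deflection peaking near 300 ms.
true_erp = 5.0 * np.exp(-((times - 0.3) ** 2) / (2 * 0.05 ** 2))

rng = np.random.default_rng(1)

def simulate_epochs(n_trials, noise_sd=10.0):
    """Same underlying ERP on every trial plus independent Gaussian noise."""
    return true_erp + noise_sd * rng.standard_normal((n_trials, times.size))

def rms_error(n_trials):
    """Root-mean-square difference between the averaged and the true ERP."""
    erp = erp_from_epochs(simulate_epochs(n_trials), times)
    return float(np.sqrt(np.mean((erp - true_erp) ** 2)))

err_30 = rms_error(30)
err_300 = rms_error(300)
print(f"RMS error with 30 trials: {err_30:.2f}, with 300 trials: {err_300:.2f}")
```

Under these assumptions the residual noise in the average shrinks roughly with the square root of the number of trials, so a tenfold increase in trials reduces the error by about a factor of three. Real EEG noise is neither white nor independent across trials, so this should be read as a best-case sketch.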

Partially in response to this limitation, there has been a rise in the use of multivariate analyses that exploit the co-variation between measures from many time points and many sensors to characterize group differences (Fig. 1c, Table 2). The gain in power from these approaches is evident in studies by Mueller et al. [34••] and Nazhvani et al. [35], who used machine learning algorithms and a combination of ERP-derived metrics to achieve classification accuracy in excess of 90 %. Mueller et al. [34••] reported sensitivity and specificity of 91 % in predicting diagnosis in a sample of 150 adults (75 with ADHD), exploiting a combination of five response-inhibition ERP features identified using independent component analysis. In a smaller sample (n = 36), focusing on visual evoked responses to flashes of light, Nazhvani et al. [35] developed an algorithm that identified the combination of time points at which the ERP amplitude maximized the accuracy of group discrimination. Using this approach they reached an accuracy of 94.6 % in discriminating adults with ADHD from controls and an accuracy of 92.9 % in distinguishing adults with ADHD from those with bipolar mood disorder. Similarly, in three recent applications of machine learning approaches to predict diagnostic category based on spectral power across a range of frequency bands and higher-order descriptors, accuracy ranged from 86 % to 97 %. Using a combination of spectral power and fractal features (see glossary) of EEG time series, one study reported diagnostic accuracy to be 86.4 %, with fractal features showing the strongest discrimination [36]. Ahmadlou and Adeli [37] reported maximal accuracy of 95.6 % based on the combination of theta band synchronization at electrodes O2/P4 and frontal electrodes, and delta band synchronization at electrode T5 and frontal electrodes. Similarly, Abibullaev and An [38] obtained a maximal accuracy of 97 %, using relative theta measures recorded from nine frontal scalp electrodes. Based on these accuracy rates, we may conclude that the potential of multivariate machine learning tools in EEG-based diagnostics is intriguing but, as such studies remain sparse and the results offer no simple interpretation (see also the section on individual-level diagnostics), requires further replication and validation.
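
The benefit of combining features, as in the extreme example of Fig. 1c, can be sketched with simulated data. In the Python example below, two hypothetical EEG features share a large source of between-subject variability and differ only slightly between groups, so each feature alone classifies poorly, whereas a linear combination found with Fisher's linear discriminant separates the groups well. The data-generating model, group sizes, and function names are assumptions for illustration; this is not the method of any study reviewed here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200                                         # simulated participants per group

def make_group(label, n, rng):
    """Two features that share subject-level variability; the group effect
    (label = +1 or -1) pushes the two features in opposite directions."""
    shared = rng.standard_normal(n)
    f1 = shared + 0.3 * label + 0.2 * rng.standard_normal(n)
    f2 = shared - 0.3 * label + 0.2 * rng.standard_normal(n)
    return np.column_stack([f1, f2])

adhd = make_group(+1, n, rng)
control = make_group(-1, n, rng)

def threshold_accuracy(high, low):
    """Accuracy of a midpoint threshold on 1-D scores, where `high` is the
    group expected to score higher."""
    cut = (high.mean() + low.mean()) / 2.0
    return (np.sum(high > cut) + np.sum(low <= cut)) / (high.size + low.size)

def fisher_direction(a, b):
    """Fisher's linear discriminant weights: the inverse of the summed
    within-group covariance applied to the difference of group means."""
    within = np.cov(a, rowvar=False) + np.cov(b, rowvar=False)
    return np.linalg.solve(within, a.mean(axis=0) - b.mean(axis=0))

acc_f1 = threshold_accuracy(adhd[:, 0], control[:, 0])
acc_f2 = threshold_accuracy(control[:, 1], adhd[:, 1])   # feature 2 is lower in the ADHD group
w = fisher_direction(adhd, control)
acc_combined = threshold_accuracy(adhd @ w, control @ w)
print(f"feature 1: {acc_f1:.2f}, feature 2: {acc_f2:.2f}, combined: {acc_combined:.2f}")
```

These accuracies are computed on the same data used to fit the discriminant, so they are optimistic; a realistic evaluation would estimate accuracy on held-out participants, for example by cross-validation.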

Table 2 Studies employing multivariate analyses and novel measures in EEG-based diagnosis of ADHD

Capturing Clinical Heterogeneity in EEG-Based Diagnostics

The limited success of EEG in diagnosis of ADHD is perhaps not surprising when considering the heterogeneity in etiology, symptoms and treatment outcomes of the disorder, a fact that has led most theorists to favor multiple pathway models over single-cause explanations of the disorder [40-45]. Affected neural circuits in ADHD have included nigrostriatal, mesolimbic, and mesocortical dopamine pathways [41], noradrenergic dysregulation of posterior attentional pathways [46], delayed development of frontal cortex [47], and atypical functioning of default mode network [48]. The plausibility of multiple pathways causing the same set of ADHD symptoms implies that a single EEG measure, like TBR, will be effective in predicting ADHD in only a subset of those diagnosed. A growing awareness of increased heterogeneity even in non-ADHD populations is noted by Arns et al. [12••], who report that the significant heterogeneity in the control group TBR is responsible for the attenuated effect size of the TBR in ADHD. This means that the major challenge for any diagnostic is to contend with heterogeneity, not only of the disorder but also at the larger population level. Current efforts to disambiguate the links between clinical heterogeneity of ADHD and EEG-based metrics include multivariate, multidimensional analyses of EEG features previously associated with ADHD (Fig. 1c), as well as mechanistically motivated studies of the neural correlates of EEG features.

Emerging Multivariate EEG Profiles

Both exploratory and targeted analyses of spectral power in the continuous EEG recording have indicated the presence of multiple multidimensional clusters within ADHD, suggesting that any one spectral power (or ERP) feature is likely to be characteristic of only a subset of children with ADHD. Using a multivariate cluster analysis, Clarke et al. [49] characterized the variability within EEG data of 264 children (155 with ADHD) to reveal five behaviorally and symptomatically unique clusters. Of these, only two (55 participants, 36 % of the ADHD sample) showed elevated theta (and reduced beta), one of which was associated with reduced ADHD symptoms and the other with enhanced self-enjoyment. The remaining three clusters were characterized by: elevated beta (12.5-25 Hz) power that was coupled with symptoms of delinquent behaviors (n = 36), elevated slow wave (<12.5 Hz) power that was coupled with indices of maturational lag (n = 38), and elevated frontal alpha (7.5-12.5 Hz) power that was coupled with markers of ritualistic obsessive behaviors (n = 26). Clearly, elevated spectral power in the theta band (or any particular frequency band for that matter) is not a feature that is homogeneous within ADHD but characterizes only a subset of patients.

Furthermore, the overall spectral profile may also vary with gender and clinical subtype [50, 51, 52•, 53]. In several studies TBR was reported to be enhanced in the combined subtype of ADHD more than in the inattentive subtype [23•, 52•], and only in boys in other studies [51, 52•]. These data indicate that there are important sources of heterogeneity in ADHD (and the larger population) that make it unlikely that any one EEG or ERP feature can capture all or the majority of the variance. Although previous studies have focused on ADHD subtype, which is one of the most salient diagnostic divisions, this too may be tenuous. It is notable that, in a comprehensive review of the validity of DSM-IV subtype criteria, Willcutt et al. [54•] have concluded that the distinction between inattentive, combined and hyperactive sub-types is minimally supported, in contrast to the symptom dimensions of inattention and hyperactivity, which were well supported. It seems therefore that a sub-group characterized by an elevated spectral power or ERP feature need not correspond to an existing sub-type.

Alternative approaches have been proposed to adapt EEG-based diagnostics to the heterogeneity of the ADHD clinical sample. Hermens et al. [55, 56] argued that EEG features are best utilized as part of a larger profile and for prediction of treatment response rather than as a diagnostic. Defining response criteria based on performance on cognitive tests and various EEG features (including resting state spectral power and ERP-related features), they achieved a sensitivity of 80-90 % and a specificity of 90-95 %. In more recent logistic regression analyses, Ogrim et al. [57, 58] identified EEG features that, as part of a larger profile, predicted positive response to methylphenidate (determined by symptom reductions), as well as the side effects. In these analyses, responders were characterized by higher baseline theta-band and alpha-band power, whereas side effects were predicted by a number of baseline ERP components including visual evoked potentials, anticipatory potentials and P3 amplitude. Interestingly, neither medication response nor side effects were predicted by ADHD sub-type (combined versus inattentive), in agreement with the conclusions of Willcutt et al. [54•]. Finally, Clarke et al. [59], in an 11-year follow up, reported that individuals whose ADHD persisted into adulthood had greater childhood global relative beta power, reduced frontal relative theta power, and increased frontal absolute and relative beta power. These studies demonstrate proof-of-concept utility of EEG as a predictor of outcome rather than as a diagnostic, and highlight the value of multivariate profiling. They also demonstrate the challenge with this approach: the profiles are complex, which makes their interpretation and direct comparison across studies difficult at best. Their generalizability remains to be tested.

Emerging Neural Mechanisms of EEG Features

In contrast to large-scale data mining efforts that seek to extract predictive content out of EEG (and other) data, efforts are emerging that aim to map the neural mechanisms of EEG features. From this perspective, clinical heterogeneity of ADHD may be addressed by identifying the neural pathways that account for distinct groups of ADHD symptoms or neurocognitive performance profiles. Mapping these to their EEG correlates could furthermore improve the diagnostic accuracy of EEG. This approach, though in relative infancy, is important because circuitry has been extensively studied in ADHD using magnetic resonance imaging (MRI) [5, 43, 60-67], and thus provides priors for the categories that may be nested within EEG metrics in ADHD. Group differences in neural activity and connectivity have been documented for fronto-striato-cerebellar dopamine pathways [41, 68], attentional pathways [46, 64, 65], as well as default mode network [43, 48, 67]. Preliminary reports suggest that the degree of dysfunction in these networks can vary with ADHD subtype, comorbidity and symptoms. Fair et al. [69], having analyzed the connectivity patterns of 648 datasets (455 from individuals with ADHD), suggested that ADHD inattentive subtype is associated with a dysfunction of dorsolateral prefrontal cortex, a core node in the striatal and attentional pathways, whereas ADHD combined subtype is associated with dysfunction of default mode network regions. Arnsten and Rubia [62] reviewed differential network involvement according to comorbidity, highlighting, for example, dysfunction of attentional systems in obsessive-compulsive disorder that is more prominent than in major depressive disorder or conduct disorder.

The large body of knowledge derived from MRI and functional MRI may be helpful in refining the efficacy of EEG measures in ADHD diagnosis and prognosis. Direct studies comparing MRI and EEG in the context of ADHD are sparse, but at least three lines of work have begun to yield results. In 2007, Sonuga-Barke and Castellanos [48] put forth the hypothesis that default mode network activity is dysfunctional and interferes with attentional control. This hypothesis has spurred a number of experiments that collectively suggest a positive relationship between slow fluctuations of the default mode network and ultra-slow fluctuations of the EEG signal, which together appear to account for response variability [70-74]. These slow fluctuations may prove helpful in identifying patients with alterations of default mode network functioning.

Using a different approach, Lenartowicz et al. [75•] combined two techniques, independent component analysis and event-related spectral analysis, to link poor spatial working memory in a group of ADHD children with ineffective stimulus encoding. Specifically, poor working memory encoding was associated with weaker desynchronization (i.e., decrease) of alpha power (resulting in elevated alpha power) and elevated midline theta power that were, through source imaging of independent components, associated with occipito-parietal and frontal brain regions, respectively. The data suggested that stimulus-locked desynchronization in alpha power in posterior regions, which was notably absent in the ADHD group during spatial working memory performance, may be a marker for the efficacy of interactions within the attention network, one of the primary networks implicated in ADHD. Mazaheri et al. [53, 76] put forth a similar notion, supported by disrupted coherence (see glossary) between alpha-theta frequency band power in individuals with ADHD.

The most direct test of EEG-fMRI interactions is made possible by concurrent recording of the two modalities. This approach was adopted by Karch et al. [77, 78] who, in a preliminary study with only eight patients and eight controls, demonstrated that the stimulus-locked fronto-central N2 ERP response was associated with reduced involvement of attention-related neural structures, including middle and medial prefrontal cortex and insula. A combined approach can therefore offer validation of EEG metrics as they relate to existing network hypotheses as well as provide mechanistically driven priors for network-based sources of heterogeneity in ADHD. The primary challenge in concurrent EEG-fMRI recordings is degradation of the EEG signal by induced artefacts from MR gradients and cardiac activity that are amplified by head motion [79-81], meaning that small trial numbers and hyperkinetics (e.g., fidgeting) associated with ADHD (and with childhood) can seriously compromise the signal-to-noise ratio of the EEG in the ADHD demographic. New methods are, however, emerging that will allow practitioners to more accurately characterize (and eliminate) the noise [82, 83•, 84, 85], promising increasing feasibility of the technique.

Individual-Level Diagnostics

Independent of the sensitivity and specificity that a diagnostic tool achieves across individuals, its diagnostic value is also influenced by at least three other factors: robustness, interpretability, and feasibility. Considering, as an example, the X-ray in diagnosing tuberculosis, it is not only sensitive and specific, it also produces a signal that is distinct from noise (robust), allows for clear interpretation based on the presence or absence of masses in the lungs (interpretable), and can be administered in less than 10 minutes within an outpatient clinic (feasible). Can any EEG-derived metric come close to meeting these criteria? In terms of robustness, we suggest that spectral-based metrics outweigh the potential value of ERP-based measures. Spectral power indices such as relative level of alpha band power in a resting state recording, or event-related changes in alpha power, produce identifiable signal with a short recording of 10-15 minutes and with (in event-related analyses) as few as 30 trials [e.g., 75•]. In contrast, ERP analyses rely on averaging over in excess of 100 trials [86] in order to arrive at a reliable signal for each individual. Furthermore, the typical ERPs that have been most relevant to ADHD diagnosis, such as the error related negativity or P3 responses to unexpected events (i.e., oddball paradigm), are produced by stimuli that by design must occur infrequently (10-30 %). The diagnostic test must therefore be at minimum 2-4 times longer than the duration of the data that is of interest. It is noteworthy that, because of volume conduction, EEG metrics at any electrode represent the summation of multiple electrical events from, possibly, different locations in cortex. Approaches that attempt to un-mix these signals into their source components (Fig. 1b), such as independent component analysis [87, 88] and inverse modeling [89, 90], improve the signal-to-noise ratio (of either spectral or ERP measures), and so, in the case of ERP analysis, can counteract some of the single-trial variability associated with the technique [e.g., 34••, 57, 75•].

Multivariate approaches that combine many different metrics, including behavioral and other neuroimaging measures, into a diagnostic profile, face a different problem. These methods, because of their computational power, can be very robust. The more variables in the diagnostic, the more likely it is that some kind of a signal will be available from at least a subset of such variables. However, such profiling faces a problem with feasibility and with interpretation. While machine-learning algorithms are becoming increasingly available, a norm in big-data mining, obtaining large amounts of data can lead to long and tiring sessions. The collection of cognitive, neuropsychological, EEG and other physiological measures requires multiple hours of testing time, which creates a confound of fatigue and learning-transfer. In other words, exhaustive tests can be impractical, undermining the feasibility of the diagnostic. Multivariate profiling also raises the issue of interpretation. Most measures of cognition, behavior and neurophysiology fall on a spectrum for the population. They cannot be trivially dichotomized into a ‘yes’ or ‘no’ answer. This means that such diagnostics rely on norms and/or databases, which themselves require updating as well as interpretation [56]. This problem is compounded as more measures become involved. Of course, multivariate profiles can be, and typically are, thresholded to output a binary decision. However, depending on the design of the algorithm, such profiles can be susceptible to errors when the incoming patient’s profile does not fit perfectly in either the ADHD or the non-ADHD template (i.e., they can over-fit to the training data, failing to generalize). Balancing robustness, interpretability, feasibility, reliability and validity is not an easy goal, particularly against a backdrop of EEG heterogeneity. Although it is still unclear which combination of EEG markers will ultimately be identified, we believe that parsimony allowing for heterogeneity should be the goal.

Conclusion

As the past 75 years will attest, finding a simple diagnostic measure for ADHD (i.e., behavioral, cognitive, etiologic, neurophysiologic, or neurobiologic) has not been possible and such a measure may not exist at all. Needless to say, we believe that EEG/ERP are not ready to serve as tools to diagnose or aid in the diagnosis of ADHD. Our caveat, based on review of current literature, is that this conclusion is not specific to EEG/ERP but reflects a general problem of univariate measures or markers (biological or otherwise) being used to predict clinically heterogeneous disorders such as ADHD. Nonetheless, it is notable that the relatively high (>90 %) sensitivities and specificities reported using EEG far exceed those of the most advanced classification attempts using anatomical and functional MRI data. The competition put forth by the ADHD-200 consortium, challenging scientists to develop novel diagnostic profiles based on over 700 MRI datasets [91•], resulted in a range of accuracies from 55 to 78 % (arrived at by internal cross-validation). EEG thus remains a strong contender for a spot in the clinical setting, contingent on continued efforts, via multivariate analyses and refined studies of EEG signal generators, to capture additional sources of heterogeneity in ADHD.