Introduction

Attention-deficit/hyperactivity disorder (ADHD) is associated with important difficulties in academic, psychosocial and community functioning [1]. Between 30 and 60 % of individuals with ADHD have a comorbid behaviour disorder (oppositional defiant disorder or conduct disorder), and this comorbidity is associated with a poor prognosis [1].

The cognitive difficulties experienced by individuals with ADHD stem from a deficit in executive functions (EFs) [2], which are the mental capacities necessary to formulate, plan and perform the actions required to reach an objective. EFs include working memory (WM), response inhibition, sustained attention, planning, cognitive flexibility, and task switching, all of which are impaired in individuals with ADHD [3]. Furthermore, EF deficits contribute substantially to the functional impairment of individuals with ADHD [4].

Working memory is the cognitive function that facilitates the active maintenance and manipulation of information without external stimuli for a period long enough to use this information for some purpose [5]. Several meta-analyses have described the presence of WM deficits in individuals with ADHD [3, 6, 7], which is one of the core deficits of this disorder [2]. Moreover, WM is a fundamental cognitive function that underpins other more complex cognitive functions, such as other EFs and academic achievement [8]. An intervention aimed at improving this cognitive ability in ADHD might, therefore, be of critical importance in the treatment of this disorder.

Although there has been increasing development in cognitive training programmes in recent years, their effectiveness on some cognitive deficits and clinical symptoms of ADHD has been questioned in several reviews [9–12] and meta-analyses [13–16]. Small effect sizes have been reported in these domains. These meta-analyses have important limitations, such as the mixture of cognitive training programmes that enhance different cognitive abilities. However, the poor results obtained have led some authors to suggest that the cognitive and symptomatic aspects of ADHD are not related and contribute independently to the overall functional impairment of this disorder [17].

Klingberg et al. [18] developed Robomemo® Cogmed Working Memory Training™ (CWMT), a computerized WM training programme with different auditory and visuospatial WM tasks that are presented in the form of attractive games designed for children. This training has been used in different populations and has been effective at improving some cognitive functions and psychiatric symptoms [10, 16, 18, 19]. In healthy adults and in ADHD, CWMT has been shown to produce changes in brain activity in the areas involved in WM [20–23] and to facilitate dopaminergic transmission [24], which plays an important role in this cognitive function.

The effect of training on non-trained task performance can be differentiated into near-transfer effects (post-training improvement of performance in tasks similar to the training tasks) and far-transfer effects (post-training improvement on tasks that are different in nature or appearance from the training tasks) [25]. Far-transfer effects occur when two different tasks share an underlying processing component and neuroanatomical areas or neural circuits [26].

Several methodologically rigorous studies (randomized, placebo-controlled, double-blind) performed on children and adolescents with ADHD have shown that CWMT produces near-transfer effects [27–31] that persist for up to 3 months [28]. However, the possible far-transfer effect has been poorly investigated, as very few studies using ADHD samples have analysed whether CWMT improves EFs rating scales [30–34]. Other studies have analysed the effect on performance-based measures of EFs (PBMEF) [28, 29, 31, 33, 35], on academic achievement [29, 31, 33, 34] and on ADHD symptoms [27–31, 33, 34, 36], yielding mixed, inconclusive results. Moreover, few of these studies were longitudinal [28, 31–33], which may be the key to finding far-transfer improvements. Unfortunately, most of these studies are methodologically poor.

Aims of the study

The main objective was to analyse the effect of CWMT on EFs scales in a sample of children with ADHD with or without comorbid disruptive behaviour disorders with a randomized, double-blind, placebo-controlled, parallel-group clinical trial with a 6-month post-intervention follow-up. Our secondary objectives were to study other far-transfer effects on clinical symptoms, functional impairment, PBMEFs, and academic achievement.

The authors’ hypothesis was that training produces short- and long-term far-transfer and near-transfer improvements.

Two similar versions of CWMT were compared. They differed only in the adjustment of difficulty, which was automatically adapted to the highest achievable level of each participant in the experimental version (adaptive training), while it was maintained at a low level of achievement in the placebo version (non-adaptive training). This allowed a control of the non-specific effects of the intervention, such as the passage of time, the maturation of participants and their familiarity with the task.

Methods

Study design

This is a randomized, double-blind, placebo-controlled, parallel-group clinical trial. Participants were randomized (1:1) to an experimental group (CWMT) (adaptive training) or a control group (non-adaptive training). The flow chart is presented in Fig. 1.

Fig. 1 Flow chart of study participants

Participants

A power analysis was performed assuming the criterion of a 1 SD group difference in the WM subscale of the BRIEF (parent and teacher versions) compared to the standardized sample in the normal population, a risk α = 5 % and a statistical power (1 − β) of 95 %. Assuming a 20 % dropout during the study, the sample size was 63. Another sample size calculation was performed with visuospatial and auditory WM performance-based tasks, because in the absence of an increase in WM capacity it is theoretically unclear why WM training should lead to improvements on far-transfer tasks [9]. We assumed a 1 SD group difference, a risk α = 5 %, a statistical power (1 − β) of 95 %, and a 20 % dropout. The final sample size included 66 subjects.
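The arithmetic behind a calculation of this kind can be sketched as follows. This is only an illustration under stated assumptions: a two-sided, two-group comparison, the normal approximation to the sample size formula, and inflation of the total by (1 + dropout). The text does not say which formula or software the authors used, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float, power: float) -> int:
    """Per-group n for a two-sided, two-group comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for power (1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


def inflate_for_dropout(total: int, dropout: float) -> int:
    """Assumption: expected dropout is handled by recruiting total * (1 + dropout)."""
    return math.ceil(total * (1 + dropout))


per_group = n_per_group(effect_size_d=1.0, alpha=0.05, power=0.95)
total = 2 * per_group
recruit = inflate_for_dropout(total, dropout=0.20)
print(per_group, total, recruit)  # 26 52 63
```

Under these assumptions the result agrees with the first figure reported above (63); a different formula or dropout correction would give a slightly different number.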

Participants were enrolled in the study and were randomly assigned to one of the intervention groups by a member of the research team, using a computer-generated sequence. The study group allocation was blinded to children, their family, their teachers and the professionals who performed the cognitive assessments. In addition, participants, families and teachers were unaware of the difference between the experimental and the control training (i.e. the automatic adjustment of difficulty). The double-blind condition was maintained in all evaluations conducted throughout the study.
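A computer-generated 1:1 allocation sequence of the kind described could look like the sketch below. The text does not state whether simple, blocked or stratified randomization was used, so the balanced shuffle, the seed, the arm labels and the participant identifiers are all hypothetical.

```python
import random


def allocate_1_to_1(participant_ids, seed):
    """Return {participant_id: arm}, with the two arms balanced as evenly as possible."""
    rng = random.Random(seed)  # fixed seed makes the sequence reproducible and auditable
    n = len(participant_ids)
    arms = ["adaptive"] * (n // 2) + ["non_adaptive"] * (n - n // 2)
    rng.shuffle(arms)          # random order of a balanced list of arm labels
    return dict(zip(participant_ids, arms))


ids = [f"P{i:02d}" for i in range(1, 67)]  # 66 hypothetical participant codes
allocation = allocate_1_to_1(ids, seed=2010)
counts = {arm: list(allocation.values()).count(arm)
          for arm in ("adaptive", "non_adaptive")}
print(counts)
```

To preserve blinding, a list like this would be held by the team member responsible for allocation and not shared with assessors, families or teachers.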

Patients were recruited from cases seen at the Child and Adolescent Psychiatric Unit of the University Hospital Mútua Terrassa from June 2010 to March 2012. A total of 66 outpatients participated in the study. All were diagnosed with combined-type ADHD according to the DSM-IV-TR criteria. Comorbidity with other disruptive behaviour disorders (i.e. oppositional defiant disorder or conduct disorder) according to the DSM-IV-TR criteria was accepted. All diagnoses were confirmed using the semi-structured Kiddie-Schedule for Affective Disorders and Schizophrenia, Present and Lifetime Version (K-SADS-PL) [37] interview that was administered to participants’ parents. Other inclusion criteria included age between 7 and 12 years; T scores on the Conners ADHD index for parents and teachers >70 at the time of diagnosis; no previous psychological or pharmacological treatment for ADHD; and access to a personal computer with Internet connection. Exclusion criteria included IQ < 80; comorbidity with autism spectrum disorder, psychosis, affective or anxiety disorder, consumption of toxic substances, or learning disorder; history of traumatic brain injury in the last 2 years; and perceptual-motor alterations that would preclude the use of a computer. Participants whose educational or socio-economic context would make it unlikely for families to comply with the study requirements and follow the treatment procedure (subjects whose families did not speak Spanish or were monitored by social services due to suspected abuse/neglect) were also excluded from the study. Furthermore, children who participated in fewer than 20 training sessions were excluded from the subsequent data analysis, as were those who initiated other pharmacological or psychological treatments during study participation.

Table 1 displays the socio-demographic and clinical characteristics of the participants at T0. No significant differences between groups were observed with respect to any of these variables or to the questionnaires scores, performance-based measures, academic achievement, or composite scores.

Table 1 Baseline socio-demographic and clinical characteristics of participants, and p value of differences between groups in these variables

This study respected the principles outlined in the current legislation regarding clinical investigation (Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects -WMA, 2004-, Organic Law 15/1999 on the Protection of Personal Data and Law 41/2002 on the Autonomy of the Patient) and was approved by the Clinical Research Ethics Committee of the University Hospital Mútua Terrassa. Following a thorough description of the study, verbal assent from the children and written informed consent from the parents were obtained. When the study was completed, participants in the control group were offered CWMT. This study is registered as ISRCTN00767728 (www.controlled-trials.com).

Intervention

The experimental group underwent CWMT RoboMemo® (2005, Cogmed Cognitive Medical Systems AB, Stockholm, Sweden), which consisted of visuospatial, auditory, and location memory and tracking of moving visual objects as WM tasks. Each training session included 90 trials and had a duration of 30–45 min. Participants attended 5 sessions per week over a 5-week period for a total of 25 sessions. The level of difficulty was automatically adjusted to the performance of each participant, thus generating a prolonged cognitive demand that exceeded existing capacity limits to keep the task challenging throughout the training phase and thereby maximize WM performance gains [38]. This is based on the fact that cognitive plasticity is driven by a prolonged mismatch between functional organismic supplies and environmental demands [39].

The control group (non-adaptive training) engaged in the MegaMemo (2005, Cogmed Cognitive Medical Systems AB, Stockholm, Sweden), which consists of the same WM tasks as CWMT RoboMemo® but without the adjustment for difficulty, i.e. they performed simpler tasks. The remaining characteristics were the same for both groups, and both conditions were translated into Spanish.
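The contrast between the two conditions can be illustrated with a toy difficulty rule. This is not the Cogmed algorithm, which is not described here; the one-up/one-down staircase, the level bounds and the function names are assumptions chosen only to show what "adaptive" versus "non-adaptive" means in practice.

```python
def next_level(level, correct, adaptive, min_level=2, max_level=9):
    """Hypothetical rule: one step up after a correct trial, one step down after an error.

    In the non-adaptive condition performance is ignored and the level never changes.
    """
    if not adaptive:
        return level
    step = 1 if correct else -1
    return max(min_level, min(max_level, level + step))


def run_session(responses, adaptive, start_level=2):
    """Return the difficulty level before each trial plus the level after the last one."""
    levels = [start_level]
    for correct in responses:
        levels.append(next_level(levels[-1], correct, adaptive))
    return levels


responses = [True, True, True, False, True, True]  # made-up trial outcomes
adaptive_levels = run_session(responses, adaptive=True)
fixed_levels = run_session(responses, adaptive=False)
print(adaptive_levels)  # [2, 3, 4, 5, 4, 5, 6]
print(fixed_levels)     # [2, 2, 2, 2, 2, 2, 2]
```

In the adaptive run the level tracks performance and keeps the task near the participant's limit, whereas the fixed run stays at the low starting level regardless of accuracy.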

Training was conducted in the children’s home, under the supervision of a family member. The response to each session, training time and number of sessions completed were recorded on an Internet database. A member of the research team (coach) who was the same for the two experimental conditions examined this information on a weekly basis and contacted each family via telephone to ensure adherence to the rules and resolve queries. Training included feedback on performance with respect to each task and a reinforcement game at the end of each session. Families were advised to add an additional reward at the end of each session. After randomization, children were given the corresponding training programme (CWMT RoboMemo® or non-adaptive training) on a CD, which contained no more than 25 training sessions. Participants in the analysis received no other pharmacological or psychological treatment until the end of their participation in the study, as verified by asking families and by checking the records of participants’ visits to the unit. Participants, relatives and teachers were blinded to group assignment throughout the study.

Outcome measures

An improvement index score was calculated for participants in the experimental group by subtracting the start index (results of days 2 and 3 of training) from the max index (results from the two best training days).
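The subtraction described above can be sketched as follows. The sketch assumes that each of the two indices is the mean of the two days involved; the text does not specify how the two days are combined, and the daily values below are invented for illustration.

```python
from statistics import mean


def improvement_index(daily_index):
    """daily_index: one index value per completed training day, day 1 first."""
    if len(daily_index) < 3:
        raise ValueError("need at least three training days")
    start_index = mean(daily_index[1:3])        # days 2 and 3 of training
    max_index = mean(sorted(daily_index)[-2:])  # the two best training days
    return max_index - start_index


example_days = [60, 62, 66, 70, 75, 81, 79, 90, 94]  # hypothetical daily values
print(improvement_index(example_days))  # 28
```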

Assessments were conducted at baseline (T0), 1 to 2 weeks post-training (T1) and 6 months post-training (T2). Participants, their parents, their teachers, and the professionals who performed the cognitive assessments were blinded to group assignment. The professionals who administered the cognitive assessments were graduates in psychology who had been appropriately trained. These assessments were conducted over two sessions that were separated by a maximum time interval of 1 week and were always administered in the same sequence. A written recommendation was delivered to parents and teachers, asking them to complete all questionnaires regarding the status of the child during the previous week. The primary and secondary outcome measures used are presented in Table 2. We computed composite scores for WM and clinical symptoms (ADHD, behaviour, emotional symptoms and social behaviour) to deal with the risk of committing a Type I error if we analysed all measures separately. The subscales used for the composite scores are presented in Table 3. The arithmetic mean of the corresponding standardized scores was calculated as the final composite score. An index score was calculated for WM because the measure of a cognitive ability is more robust when it is obtained by the combination of several tasks that measure the same processes. This reflects their shared performance or ability [9].
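A composite score of this kind (the arithmetic mean of standardized subscale scores) can be sketched as below. The sketch standardizes each subscale against the sample itself and assumes all subscales are scored in the same direction; whether the authors standardized against the sample or against normative data is not stated, and the subscale names and values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores against the sample mean and SD (an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(subscales):
    """subscales: {name: [raw score per participant]}, same participant order in each list.

    Returns one composite per participant: the mean of that participant's z-scores.
    """
    standardized = [z_scores(scores) for scores in subscales.values()]
    return [mean(per_participant) for per_participant in zip(*standardized)]


raw = {
    "task_a": [1, 2, 3],     # hypothetical raw scores for three participants
    "task_b": [10, 20, 30],
}
print(composite(raw))  # [-1.0, 0.0, 1.0]
```

Analysing one composite instead of each subscale separately reduces the number of tests performed, which is the Type I error argument made above.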

Table 2 Description of outcome measures
Table 3 Composite scores of cognitive measurements and clinical symptoms, and subscales used for calculations

Statistical analysis

A descriptive statistical analysis of the variables age, sex, years of schooling and comorbid disorders was performed. The Chi-square test was used to compare all categorical variables between groups at baseline (when not applicable, Fisher’s exact test was used), and Student’s t test was applied for quantitative variables (or the Mann–Whitney U test, when the t test was not applicable).

The following variables were created for efficacy assessment: score changes between study time points T0, T1 and T2 (T1–T0, T2–T1, T2–T0), and an adjusted analysis was performed using a general linear model while controlling for age, sex and presence of a disruptive behaviour disorder. This procedure is equivalent to an analysis of covariance with age, sex and comorbidity with disruptive behaviour disorders as covariates. The analyses were conducted as complete case analyses, i.e. did not include missing values. Effect sizes (d′), that is, the difference between the change in scores T1–T0, T2–T1, T2–T0 for each group divided by the pooled standard deviations of both groups at T0 [52], and its CI at 95 % were calculated and classified as small (0.2), moderate (0.5) and large (0.8). Statistical tests were conducted assuming two-tailed contrasts with an α significance level of 5 %. The Statistical Package for the Social Sciences (SPSS®, version 17.0) was used for statistical analysis.
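The effect size defined above (between-group difference in change scores divided by the pooled baseline SD) can be computed as in the following sketch. The confidence interval uses a common large-sample approximation to the standard error of a standardized mean difference, which is an assumption: the text does not say how the intervals were obtained. All scores below are invented.

```python
import math
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))


def effect_size(exp_pre, exp_post, ctl_pre, ctl_post):
    """Difference in mean change (experimental minus control) over pooled SD at baseline."""
    exp_change = mean([post - pre for pre, post in zip(exp_pre, exp_post)])
    ctl_change = mean([post - pre for pre, post in zip(ctl_pre, ctl_post)])
    return (exp_change - ctl_change) / pooled_sd(exp_pre, ctl_pre)


def approx_ci95(d, n_exp, n_ctl):
    """Assumed large-sample approximation; not necessarily the authors' method."""
    se = math.sqrt((n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl)))
    return d - 1.96 * se, d + 1.96 * se


# Hypothetical raw scores (lower = better), four children per group
exp_t0, exp_t1 = [60, 64, 68, 72], [54, 58, 62, 66]
ctl_t0, ctl_t1 = [61, 65, 67, 71], [59, 63, 65, 69]

d = effect_size(exp_t0, exp_t1, ctl_t0, ctl_t1)
low, high = approx_ci95(d, len(exp_t0), len(ctl_t0))
print(round(d, 2), round(low, 2), round(high, 2))
```

With raw scores where lower values mean fewer problems, a negative value indicates a larger reduction in the experimental group, which matches the sign convention used in the figures.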

Results

Of the 65 participants analysed at T0, 6.15 % (n = 4) completed fewer than 20 training sessions (two due to IT problems, two dropped out) and were not included in the subsequent data analysis. The other participants (93.85 %) completed the 25 training sessions over, on average, 35.15 calendar days (SD: 3.15), with no statistically significant differences between groups in this respect (Z = −0.54, df = 59, p = 0.59). However, 9.2 % (n = 6) of participants dropped out of the study between T1 and T2 due to starting pharmacological treatment. No significant differences were found between the experimental and control groups with respect to the proportion of dropouts during any study period (Fisher’s exact test: from T0 to T1: χ² = 3.65, df = 1, p = 0.08; from T1 to T2: χ² = 0.18, df = 1, p = 0.51; from T0 to T2: χ² = 2.41, df = 1, p = 0.12). The last participant excluded from the data analysis after participation in the study was excluded due to a diagnosis of pervasive developmental disorder not otherwise specified. Missing values refer to questionnaires that were not completed (T0: 1 WFIRS-P, 1 SDQ-teacher; T1: 1 BRIEF-teacher; T2: 1 BRIEF-parent, 1 SDQ-parent, 4 BRIEF-teacher, 2 Conners-teacher, 5 TRF, 3 SDQ-teacher) or to cognitive measurements that were not administered due to organizational or technical reasons (T0: 1 Tower of London, 1 TMT B, 1 reading comprehension test; T1: 1 CPT II, 1 TMT B; T2: 1 CPT II). The study was conducted between June 2010 and December 2012.

Online Resource Table 4 includes the mean and SD of performance-based measures, academic achievement, questionnaires and composite scores at T0, T1 and T2 for both groups. Online Resource Table 5 includes the results of the general linear model analysis. The results referring to the subscales of the questionnaires regarding clinical symptoms are not included, but are available upon request.

The mean improvement index for the experimental group was 30 (SD: 13.04).

Primary outcome measures

With respect to EFs scales (BRIEF) as assessed by parents, no significant differences were observed between T0 and T1. In contrast, between T1 and T2, the experimental group improved significantly more than the control group according to the WM subscale (t = −2.73, df = 4, p = 0.01) with a large effect size (d′ = −0.86, 95 % CI −1.37 to −0.35), and this difference was also significant at T2–T0 (t = −2.56, df = 4, p = 0.01) with a moderate to large effect size (d′ = −0.61, 95 % CI −1.11 to −0.11). Furthermore, statistically significant improvements were found between T1 and T2 with respect to the plan/organize subscale (t = −2.02, df = 4, p = 0.05) and the metacognition index (t = −2.25, df = 4, p = 0.03), with moderate to large effect sizes (d′ = −0.71, 95 % CI −1.21 to −0.21; and d′ = −0.78, 95 % CI −1.28 to −0.27). In the BRIEF teacher version, from T0 to T1, the experimental group improved significantly more than the control group on the following subscales: initiate (t = −2.50, df = 4, p = 0.01) with a moderate effect size (d′ = −0.55, 95 % CI −1.05 to −0.05); WM (t = −2.11, df = 4, p = 0.04), and metacognitive index (t = −1.97, df = 4, p = 0.05) with a small to moderate effect size (d′ = −0.36, 95 % CI −0.85 to 0.13; and d′ = −0.37, 95 % CI −0.86 to 0.12). These differences increased at T2 (t = −2.20, df = 4, p = 0.03; t = −2.47, df = 4, p = 0.02; t = −2.44, df = 4, p = 0.02 at T2–T0) with a moderate effect size evidenced for the initiate subscale (d′ = −0.57, 95 % CI −1.02 to −0.07) and with large effect sizes in the WM subscale and the metacognitive index (d′ = −0.84, 95 % CI −1.35 to −0.33; and d′ = −0.81, 95 % CI −1.31 to −0.30). 
At T2, a significant improvement was also observed for the monitor subscale (t = −2.32, df = 4, p = 0.02 at T2–T1; t = −2.16, df = 4, p = 0.04 in T2–T0) with a moderate to large effect size (d′ = −0.72, 95 % CI −1.22 to −0.21 at T2–T1; d′ = −0.79, 95 % CI −1.30 to −0.28 in T2–T0) and the shift subscale (t = −2.04, df = 4, p = 0.05 at T2–T1) with a small to moderate effect size (d′ = −0.39, 95 % CI −0.88 to 0.10) (Fig. 2).

Fig. 2 Effect size on subscales of BRIEF parent and teacher version. Negative effect sizes indicate a greater reduction of raw score in the experimental group compared to the control group; hence, a greater improvement in EF scales in the experimental group. Small effect size: 0.2; moderate effect size: 0.5; large effect size: 0.8. *p ≤ 0.05 in the adjusted analysis with score changes using a general linear model, controlling for age, sex and the presence of disruptive behaviour disorders

Secondary outcome measures

A significant improvement in ADHD symptoms composite score for the experimental group compared to the control group was reported by parents from T1 to T2 (t = −2.69, df = 4, p = 0.01) with a small to moderate effect size (d′ = −0.39, 95 % CI −0.88 to 0.10) and by teachers from T0 to T2 (t = −2.25, df = 4, p = 0.03) with a moderate to large effect size (d′ = −0.69, 95 % CI −1.20 to −0.18) (Fig. 3). On the Conners Rating Scales-Revised, only marginally significant differences were found at T2–T0 on the inattention subscale of the parent version (t = −1.76, df = 4, p = 0.08) and on the ADHD index subscale of the teacher version (t = −1.88, df = 4, p = 0.07). No significant differences were found for any other composite scores or clinical symptoms scale assessed by parents or teachers.

Fig. 3 Effect size in ADHD symptoms composite score for parents and teachers. Negative effect sizes indicate a reduction in raw score in the experimental group compared to the control group; hence, a greater improvement in ADHD symptoms in the experimental group. Small effect size: 0.2; moderate effect size: 0.5; large effect size: 0.8. *p ≤ 0.05 in the adjusted analysis with score changes using a general linear model, controlling for age, sex and the presence of disruptive behaviour disorders

Significant improvements in functional impairment (WFIRS-P) for the experimental group compared to the control group were registered from T1 to T2 on the school learning behaviour subscale (t = −2.43, df = 4, p = 0.02) with a large effect size (d′ = −0.86, 95 % CI −1.37 to −0.35) (Fig. 4). No statistically significant improvements were detected on any other subscale of the WFIRS-P.

Fig. 4 Effect size in Weiss Functional Impairment Rating Scale (WFIRS-P). Negative effect sizes indicate a reduction in raw score in the experimental group compared to the control group; hence, a greater improvement in WFIRS-P scales in the experimental group. Small effect size: 0.2; moderate effect size: 0.5; large effect size: 0.8. *p ≤ 0.05 in the adjusted analysis with score changes using a general linear model, controlling for age, sex and the presence of disruptive behaviour disorders

Regarding performance-based measures, the experimental group improved significantly more than the control group between T0 and T1 on the WM composite score (t = 3.67, df = 4, p < 0.01) with a large effect size (d′ = 0.81, 95 % CI 0.30 to 1.32). This significant difference persisted at T2 (t = 2.00, df = 4, p = 0.05 at T2–T0) though with a minor effect size (d′ = 0.12, 95 % CI −0.67 to 0.61).

From T0 to T1, the experimental group improved significantly more than the control group in CPT II commission errors (t = −2.27, df = 4, p = 0.03), with a small to moderate effect size (d′ = −0.40, 95 % CI −0.89 to 0.09), and in detectability (t = 2.50, df = 4, p = 0.01), with a moderate to large effect size (d′ = 0.60, 95 % CI 0.10 to 1.10). Both improvements persisted at T2, with no significant change between T1 and T2 (t = 0.93, df = 4, p = 0.36; t = −1.50, df = 4, p = 0.14 at T2–T1). No statistically significant differences were found regarding any of the other performance-based measures or reading comprehension.

Adjusted multiple linear regression analysis examined the effect of comorbidity, age and sex, on the efficacy of training. The variable group and the other predictor variables were never simultaneously significant in any of the regression models (see Online Resource Table 5), i.e. these values did not have a significant effect on the efficacy of training.

Discussion

CWMT produced far-transfer effects on a sample of children with ADHD. The strongest effects were observed on primary outcome measures EFs scales assessed by both parents and teachers, especially long-term effects. Both informants described improvements on the BRIEF metacognition index or its subscales with important effect sizes (most >0.70). The metacognition index represents the “ability to initiate, plan, organize, and sustain future-oriented problem solving in working memory” [40] and consequently assesses behaviours related to trained cognitive ability (WM), though the behaviours differ in nature and appearance. This far-transfer effect occurs because WM and EFs share neuroanatomical areas or neural circuits [53] and underlying processing components as WM underlies more complex EFs [54].

Very few randomized placebo-controlled trials using CWMT and samples of children with ADHD have evaluated the effect on the EFs scales [30, 34]. Gray et al. [34] analysed only the effect on a WM teacher rating scale post-training using an intent-to-treat analysis, and found no improvements in a sample of adolescents with severely impairing learning disorder and coexisting ADHD. The great severity of the participants’ condition may have hindered the effectiveness of the intervention. Another possible explanation for these discrepancies in results may be related to differences in training intensity. In our study, the improvement index for the experimental group was statistically significantly greater than that in Gray et al. [34] (t = 4.34, df = 61, p < 0.01), possibly because of the reward added by families after each training session (in addition to the game included in the programme) or upon final completion of the 25 training sessions for the whole sample analysed. Steeger et al. [30] examined the individual and combined post-training effects of CWMT and behavioural parent training on adolescents with ADHD, using the same non-adaptive training as in the present study. Steeger et al. [30] concluded that an uncontrolled potential bias operating in the non-adaptive control group may explain why they found no benefit of CWMT on EFs scales. We also believe that this bias may exist, especially post-training, as we will argue below.

Long-term far-transfer improvements on ADHD symptoms were also observed by both parents and teachers according to composite scores that included scales of ADHD symptoms assessed using different instruments, with small to moderate effect sizes in the case of parents and moderate to large effect sizes for teachers. We note, however, that as the differences between pre-training and follow-up in ADHD symptoms reported by parents were not statistically significant, we should be cautious in interpreting these results. It is likely that a later follow-up at 9–12 months would clarify the actual evolution of this symptomatology. In contrast, only marginally significant differences on the long-term assessment were found in Conners Rating Scales-Revised, specifically, inattention on the parent version, and ADHD index on the teacher version. This questionnaire differentiates clusters of ADHD symptoms of inattention and hyperactivity/impulsivity and adds an ADHD index that includes items with the highest factorial loadings [41]. This suggests that the training effect did not occur in a specific manner on a specific type of ADHD symptoms but, rather, on the overall symptomatology of the disorder. The improvement in ADHD symptomatology along with EFs would support the existence of a relationship between the cognitive and symptomatic aspects of ADHD, which contradicts what some authors suggest [17].

Most randomized, placebo-controlled trials using CWMT and samples of children with ADHD did not find improvements in ADHD symptoms reported by parents or teachers [27, 29, 30, 34]; two of them used an intent-to-treat analysis [29, 34], none was longitudinal and only one was double-blind [29]. In symptoms rated by parents, only one clinical trial found post-training improvements [28] that remained at the 3-month follow-up with a small to moderate effect size, similar to that found in the present study. No previous clinical trial has described significant improvements in ADHD symptoms as rated by teachers post-training [28–30, 34] or at follow-up at 3 months [28]. The multiple differences between our study and previous studies make it difficult to compare results, but in view of our results it seems important to conduct long-term follow-ups to detect improvements in these symptoms. The only study with a similar follow-up was Egeland et al. [33], who performed an 8-month follow-up, but this study did not adhere to a rigorous methodology (not double-blind and used a passive control group) and, similar to Gray et al. [34], its training intensity was lower (the improvement index for the experimental group was statistically significantly greater in the present study: t = 2.32, df = 66, p = 0.02). Our results are similar to those found in a meta-analysis which describes that CWMT has significant benefits on symptoms of inattention in daily life, mainly rated by parents, with moderate effect sizes at post-training, and small to moderate effect sizes at follow-up [55].

Parents and teachers showed low concordance in identifying the moment when improvements in EFs scales and in ADHD symptoms occurred. Teachers detected improvements immediately after completing training, which continued to improve until 6-month follow-up. In contrast, the majority of improvements according to parents occurred from post-training to follow-up. A more detailed observation of the data indicates that parents in the experimental group actually detected some improvements post-training, like parents of the control group. But from post-training to follow-up, parents in the experimental group continued to detect improvements, while parents of the control group detected a worsening. We believe this can be explained by two reasons: first, the possible existence of a nocebo effect in the experimental group because the adaptive version is more frustrating than the non-adaptive [56] and children with ADHD and/or other disruptive behaviour disorders have difficulty managing frustration [1], and an important effort on the part of the parents is required [30, 57]. This may lead to a lower perception of initial improvements by parents in the experimental group. Something similar might have occurred in Green et al. [27] that evaluated the effect of CWMT on ADHD symptoms with questionnaires rated by parents along with an observational system considered a good indicator of the behavioural response of children. Interestingly, while they found post-training improvements in this observational task, improvements in ADHD symptoms were not indicated by parents. The second reason is the possible existence of a placebo effect in the control group that might explain the post-training improvement (not significant) described by the parents of the control group. Non-adaptive training is easier as the difficulty level does not increase, hence there are a greater number of correct trials (with corresponding positive feedback) and a smaller number of errors. 
This may increase child motivation and positive interactions by the family member who supervises the child’s training compared to the experimental group (in which a greater number of errors occur in each training session). Supportive interactions between the parent and child can have direct benefits on improving the behaviour of children [58], and increased support and collaborative problem solving between the parents and the coach can improve parent ratings of ADHD symptoms [59]. Other recent CWMT studies using non-adaptive training have found similar trends indicating improvements in the control condition [29, 30, 60]. Furthermore, such uncontrolled potential biases may reduce the opportunity to find treatment effects, especially at short term. This view is diametrically opposed to that expressed by some authors who consider that the non-adaptive training used in this study is not a valid placebo condition because it is less motivating and requires less training time than the experimental condition so that it reduces the amount and quality of parent–child–coach interactions during training. This would facilitate finding therapeutic effects rated by parents in the experimental group [12, 61]. It is unlikely to account for the results obtained in this study, since far-transfer effects were also detected by teachers, who were blind to treatment condition and did not participate in the training of children. We should also note that both groups completed the 25 training sessions in a similar number of days, and that no significant differences were found between groups with respect to the proportion of dropouts which would imply that children with non-adaptive training were not less motivated.

Long-term far-transfer improvement in functional impairment was observed on the school learning behaviour subscale (WFIRS-P) with a large effect size. This is the only WFIRS-P subscale that specifically assesses adaptation to the school environment. This result is consistent with the results described above: the greatest long-term improvements occurred in the school environment (on ADHD symptoms and EFs scales rated by teachers), with moderate to large effect sizes. Importantly, these similarities were detected by different raters, as functional impairment was assessed only by parents. We believe that this convergence supports the validity of our results. We note, however, that because the differences between pre-training and follow-up on the learning behaviour subscale were not statistically significant, caution must be exercised when interpreting these results. A later follow-up would likely clarify the actual evolution of this subscale. To our knowledge, this is the first placebo-controlled study to assess the effect of CWMT on a functional impairment scale for children with ADHD. Van der Donk [31] did not find improvements in functional impairment in a sample of children with ADHD, but that study was not placebo-controlled because it compared CWMT with another intervention that included paper-and-pencil WM training, psychoeducation about EFs, and strategies for optimizing generalization to the classroom situation.

Our results indicate greater improvements in the school environment than at home. This may be because the school environment is more structured and allows for better supervision and support in applying the trained skill to new situations, which is one of the basic principles that facilitate generalization in neurorehabilitation [62].

Fewer far-transfer effects were detected in PBMEF, as only response inhibition and sustained attention (CPT II) improved post-intervention, with small to moderate effect sizes and without significant long-term worsening. Only two randomized, double-blind, placebo-controlled clinical trials have evaluated the effect of CWMT on PBMEF in children with ADHD [28, 29], and of these, only the study of Klingberg et al. [28] found effects on response inhibition post-training, with small to moderate effect sizes. Conversely, Chacko et al. [29], using an intent-to-treat analysis, found no post-training effect on sustained attention or impulsivity. This may be due to differences in training intensity, as the improvement index was higher in our study (98.46 % of our participants showed an improvement index >17 compared to 84 % in Chacko et al.).

The relatively few effects on PBMEF in our study, compared with the effects on EFs scales, may be related to the inherent limitations of these measures. Such limitations include their questionable sensitivity and specificity [63], the structured and interactive nature of their assessment situations, which reduces demands on EFs [64], the requirements of standardisation, reliability, and validity, which restrict their ecological validity [63, 65], and their attempt to tap specific components of EF in isolation [66]. In contrast, EFs scales have greater ecological validity [67, 68] because they capture the integrated, multidimensional, relativistic, priority-based decision making that is often demanded in real-world situations [66]. Furthermore, EFs scales are more sensitive to the cognitive deficits associated with ADHD than laboratory measures [68]. In addition, unlike performance-based measures, which assess short time periods (usually 5–30 min tasks), EFs scales assess behaviour over considerably longer periods (from weeks to months) and are consequently more useful as an indicator of cross-temporal behavioural organization and problem solving toward a goal [68]. This would explain why the most prominent results on these scales were detected in the long-term assessment.

We did not detect improvements in learning as assessed with a measure of reading comprehension (Canals). To our knowledge, this is the first clinical trial to evaluate the long-term effect of CWMT on reading comprehension in a sample of children with ADHD. WM predicts reading comprehension [69] because WM acts as a buffer for retaining ordered strings of words as their corporate meaning is reconstituted [70]. Two previous clinical trials of CWMT in samples of children with ADHD, both using an intent-to-treat analysis, found no improvements in reading comprehension after training [29, 34]. Performance on measures such as reading and mathematics is strongly influenced by prior learning and is relatively insensitive to recent changes in learning capacities [71]. To detect improvements on a measure of reading comprehension, a longer follow-up may be necessary during which the child can exploit his or her improved WM capacity [71]. Furthermore, the outcome measure used in our study may lack sufficient sensitivity to detect subtle and developing changes. We also note that the measure of reading comprehension is not an exhaustive or representative measure of learning in general. Consequently, the absence of improvement on this measure does not necessarily mean that these children have not improved in overall school performance, as suggested by the results obtained on the scale of functional impairment.

Post-intervention near-transfer improvement on WM was observed as assessed by a robust composite score [9] with a large effect size. This remained significant over the long term with a smaller effect size. These results coincide with those of several other clinical trials [27–30, 72], one of them using intent-to-treat analysis [29], that show that WM is a cognitive function that can be improved with training. This is of crucial importance as WM deficits are core symptoms in ADHD [2, 6] and WM is a cognitive function that underlies other EFs [54], which would account for the far-transfer effects observed in this study.

Although post-training improvements were detected, the improvements were generally greater at follow-up. One possible explanation lies in the characteristics of the aspects evaluated: EFs scales assess behaviour over periods of weeks to months [68], so time is required to detect changes in these domains and in the aspects that depend on them (i.e. ADHD symptoms and functional impairment related to EF). A second possible explanation is that a certain amount of time is required for the child to exploit his or her improved WM capacity and to produce improvements in other EFs and related aspects. Third, there may be a greater influence of bias post-training, as argued above.

Training was equally effective regardless of comorbidity with disruptive behaviour disorders, age and sex, as statistical analysis controlled for the effect of these variables. In any event, the number of participants with oppositional defiant disorder was relatively low, and none presented with conduct disorder. Therefore, further studies should be conducted to confirm this result.

The main limitations of the study were as follows. First, although the same measurements were used in the different assessments, this does not exclude a possible test–retest effect (procedural learning). The inclusion of a third experimental condition (waiting list control group) would make it possible to differentiate non-specific effects of training from test–retest effects. Second, because of the comprehensive evaluation used in this study, we risked committing a Type I error if we analysed all measures separately. On the other hand, we risked committing a Type II error if we corrected for multiple comparisons with a strict correction. Instead, we chose to compute robust composite measures whenever possible, but this was not possible for all cognitive functions and questionnaires. Third, the analyses were not conducted as intent-to-treat analyses, but as complete case analyses. Fourth, the results cannot be generalized to children with ADHD and IQ < 80, to those with comorbidities other than disruptive behaviour disorders, to those whose educational or socio-economic context would make it unlikely for families to comply with the treatment procedure, to those under 7 or over 12 years old, or to those who have already received psychological or pharmacological treatment for ADHD. Fifth, we did not assess how well blinding was maintained, and, as expectancy effects were not evaluated, we cannot be sure that these effects were similar in the experimental and control groups. Sixth, the effects on clinical symptoms were assessed with subjective questionnaires administered to parents and teachers, and no child self-report, expert clinical impression, or other objective measure of ADHD symptoms was considered. Seventh, a longer follow-up could clarify the evolution of some of the results found in this study. 
Future studies should use placebo conditions that control for the non-specific aspects of treatment while still allowing the beneficial aspects of training to be detected. The use of training with an increasing level of difficulty in the control condition (i.e. adaptive training that stimulates other cognitive skills) may be useful. Future studies should also identify which individual differences influence the effectiveness of the training, and investigate whether there is a synergistic or differentiated effect of evidence-based interventions for ADHD and WM training. It would also be important to determine whether the effectiveness of this training is maintained beyond 6 months and for how long, and whether some kind of training reminder may be useful. The strengths of this study lie in its design, its low dropout rate, the thoroughness of the assessments conducted, and the long-term follow-up.
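
The trade-off between Type I and Type II errors mentioned among the limitations can be illustrated with a short worked calculation. This is only a sketch under assumptions not taken from the study: the number of outcome measures $m = 20$ and the significance level $\alpha = 0.05$ are hypothetical, and the tests are assumed to be independent.

```latex
% Hypothetical illustration: m = 20 independent tests at alpha = 0.05
% (neither value is taken from the study).
\[
\mathrm{FWER} = 1 - (1 - \alpha)^{m} = 1 - 0.95^{20} \approx 0.64
\]
% Bonferroni correction: each test is evaluated at alpha / m.
\[
\alpha_{\mathrm{Bonferroni}} = \frac{\alpha}{m} = \frac{0.05}{20} = 0.0025
\]
```

Under these assumptions, testing every measure separately would give roughly a 64 % chance of at least one false positive, whereas a strict per-test threshold of 0.0025 would make real effects harder to detect. Pooling related measures into composite scores lowers $m$ and so eases both problems.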

The results obtained have an important clinical impact because they show that CWMT affects underlying impaired mechanisms in ADHD by producing a significant, lasting impact on some of the core deficits of the disorder and on some of the cognitive abilities and functional aspects that depend on the core deficits, especially in more structured environments. CWMT can be a complementary treatment to current evidence-based interventions for ADHD (behavioural and pharmacological) since it has more lasting effects over time and produces improvements in some aspects on which “gold standard” treatments have no clear effectiveness, such as EF deficits [13]. The combination of these interventions may also produce synergistic effects on ADHD symptoms and functional adaptation, although this hypothesis should be examined. We are aware that we must be cautious in interpreting some of the improvements described by parents. Nevertheless, the global similarities found across different raters and in performance-based measures are unlikely to be due to chance and thus support the validity of our results. The results obtained suggest that a rigorous methodological design with long-term follow-up and high training intensity is key to identifying far-transfer effects.

The higher training intensity reached in this study may be related to better adherence, because all subjects analysed completed the 25 training sessions, whereas in most previous studies subjects performed 20–25 sessions. Some characteristics of our sample may have favoured adherence, such as the scarcity of ADHD with comorbid ODD, the inclusion of children (not teenagers), who probably enjoy and are more motivated by training exercises with a child-oriented appearance, or the exclusion of subjects whose educational or socio-economic context would make it unlikely for families to comply with the treatment procedure. Furthermore, although the number of dropouts was very low, we performed a complete case analysis. Therefore, and based on the results obtained in the present study, we consider that CWMT may be a recommended intervention for children with ADHD aged between 7 and 12 years who are newly diagnosed without previous treatment, mostly without comorbidity, and whose family environment can supervise the training (with the help of the coach) so that they are able to complete it.

This study was registered in www.controlled-trials.com, November 27, 2013, identifier: ISRCTN00767728.

Aitana Bigorra reports no competing interests; Maite Garolera reports no competing interests; Silvina Guijarro reports no competing interests; Amaia Hervas has served on advisory boards and as a speaker with honorarium and travelling expenses for Shire.