Introduction

Non-pharmacological therapies for ADHD, especially Neurofeedback (NF) and PC-supported cognitive training (CogT), have gained increasing popularity over the last few years. Nevertheless, there is still controversy regarding the clinical efficacy of these treatments for ADHD. Although several meta-analyses of NF report satisfactory and even good clinical evidence when based on parents’ ratings [1, 2], other studies have claimed that there is no convincing evidence when “probably blinded” ratings are taken into account [3]. Unless behavioral observations are conducted (e.g., [4]), teacher ratings are considered the most blinded outcome [3]. Since teachers—in contrast to parents—do not usually actively participate in any aspect of the treatment, they are viewed as less biased. A comparable debate is underway with regard to CogT for ADHD [5, 6].

However, different reasons may be responsible for the low agreement between parent and teacher ratings. On the ADHD symptom level, interrater agreement between parents and teachers is generally low to at best moderate [7] and lower for inattention symptoms than for hyperactivity/impulsivity symptoms [8]. There is a lack of predictive value of parent symptom ratings for teacher-rated impairment [9]. Furthermore, parent and teacher symptom ratings may be differentially influenced by factors such as language problems of the child [10], socioeconomic status of the family, and externalizing behaviors [11, 12]. Parents generally report a greater severity of ADHD symptoms than do teachers [8]. Moreover, parents also report larger treatment effects across medication studies compared to teachers [13], although not systematically so [14, 15]. Teachers seem particularly sensitive to detecting medication-induced short-term reductions in hyperactivity, which may lead to erroneous carry-over effects in their ratings of other symptom dimensions [16]. In the long run, teacher reports seem to be more sensitive to changes on the hyperactivity/impulsivity dimension compared to parent or clinician reports, but studies are not consistent [17]. Correlations between parents’ and teachers’ change scores after medication are low or not significant, which has led to the conclusion that parent and teacher reports may not be interchangeable in the evaluation of pharmacological treatment effects [18].

Standardized classroom observations have been used as a more objective alternative to teacher ratings in intervention studies. Blinded objective observations are viewed as the “gold standard” for treatment evaluation [19]. Several intervention studies used the Behavioral Observation of Students in Schools (BOSS; [20]) to document behavioral change [4, 21, 22]. After NF training, significantly improved BOSS off-task behavior was reported [4]. However, the validity of such measures is limited, because they only provide a proxy for core ADHD symptoms. Cross-validation with informant ratings is, therefore, needed.

Another approach for evaluating the validity of informant ratings is to analyze their stability over time. Waiting lists are frequently used as a passive control condition. However, waiting for therapy may change symptoms. While a deterioration of symptoms during waiting time has been described for internalizing disorders, an attenuation of symptoms seems more common in externalizing disorders [24]. To the best of our knowledge, waiting time effects and their possible differential impact on teachers’ and parents’ ratings have not yet been systematically analyzed in ADHD, although significant changes may occur in both parent and teacher ratings (e.g., [4, 23]).

In the present study, we compared the effects of slow cortical potential (SCP) NF to an individualized approach of CogT in children and adolescents with ADHD. SCP NF aims at the phasic regulation of cortical excitability into a more activated or deactivated state [24]. Instead of training that aims at a normalization of possibly deficient EEG patterns, SCP training can be interpreted as neuroregulatory skill training [25, 26]. In the CogT, the training domains were tailored to the participants’ neuropsychological deficits. This enabled us to take into consideration the heterogeneity of neuropsychological deficits of patients with ADHD [27, 28] (see [29] for more details on the individualization).

One major objective of the study was to test the implementation of these treatments in schools. By doing so, we hoped to open up training opportunities for a clientele that might be unwilling or unable to pursue treatment in an outpatient facility during leisure time, such as families with lower socioeconomic status or adolescents. Treatment at school should facilitate transfer, and by involving teachers in the organization of individualized treatments, placebo effects might be reduced in parents—and possibly induced in teachers. To control for informant-specific bias, a waiting time of 3 months was introduced for all participants before training. The duration of the waiting period matched the subsequent duration of the training. Yet, no separate passive control group was included in the study design. Time effects were, however, controlled for within subjects through the addition of the pre-treatment waiting time.

The following hypotheses guided our research:

1. Differential treatment effects. Studies comparing NF to CogT have reported a superiority of the former over the latter [4, 30]. Therefore, we expected to find larger effects in the NF group. A positive treatment response according to parent ratings was expected in both treatment groups (symptom reduction of ≥ 25% in at least 50% of subjects).

2. Effects of school vs. clinical settings. Owing to the active involvement of teachers, the facilitation of transfer, and the disengagement of parents in the school setting, we expected differential setting effects on informants: better outcome in the school compared to the clinical setting according to teachers; better outcome in the clinical compared to the school setting according to parents.

3. Subjective vs. objective ratings. A significant improvement after both types of treatment was expected in blinded standardized classroom observations. Correlations between changed classroom behavior and teacher ratings were expected, as both measures relate to the same context.

4. Waiting time effects. We expected small but significant waiting time effects in informant ratings and larger treatment effects than waiting time effects.

5. Changing informants. A change of the teacher may bias results. Ratings by the same teachers were assumed to have a better comparability and thus be more sensitive to detecting change than ratings by different teachers.

Methods

Participants

The final sample comprised 77 children and adolescents with ADHD (see Table 1 for sample characteristics). To be included in the study, participants had to be aged between 8.5 and 16 years and to present clinically relevant symptoms of ADHD, with or without hyperactivity, based on parent and teacher ratings on the Conners-3 DSM-IV ADHD indices (one of two ADHD DSM-IV indices reaching T values ≥ 65, the other T ≥ 60 according to both teachers’ and parents’ ratings). Exclusion criteria were severe comorbidities, autism, tics, or other psychiatric disorders as assessed by the Developmental and Well-Being Assessment (DAWBA; [31]). Further exclusion criteria were neurological diseases, intake of medication other than MPH, and an estimated IQ below 80 (Wechsler Intelligence Scale for Children IV, short form of four subtests [32]). Medication with MPH was allowed under the condition that clinically relevant ADHD symptoms were still present and the dosage was kept stable throughout the study.

Table 1 Sample characteristics

Recruitment and randomization

Children in the clinical setting were recruited via outpatient clinics, clinicians in private practices, school psychological services, or parents’ support groups. Most children in the school setting were recruited directly in schools via teachers or school psychology services. Recruitment lasted for 2.5 years (December 2013–June 2016). In both settings, children were stratified by gender and age and randomized in parallel (1:1), with block sizes of 2, to either NF or CogT (Fig. 1). Allocation was determined by tossing a coin.
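The allocation scheme described above (1:1, blocks of 2, within strata) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, where allocation was decided by a physical coin toss: here a seeded pseudo-random shuffle stands in for the coin, and the function name, the participant IDs, and the age bands are hypothetical, since the text does not state how age was banded for stratification.

```python
import random
from collections import defaultdict

ARMS = ("NF", "CogT")


def block_randomize(participants, seed=None):
    """Assign arms 1:1 in blocks of 2 within each (gender, age_band) stratum.

    `participants` is a list of (participant_id, gender, age_band) tuples in
    enrolment order. Returns a dict {participant_id: arm}.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> arms still open in the current block
    allocation = {}
    for pid, gender, age_band in participants:
        stratum = (gender, age_band)
        if not pending[stratum]:
            # New block of 2: a single "coin toss" decides which arm comes first.
            block = list(ARMS)
            rng.shuffle(block)
            pending[stratum] = block
        allocation[pid] = pending[stratum].pop(0)
    return allocation


# Hypothetical participants for illustration only (not study data).
demo = [
    ("p01", "m", "8-11"), ("p02", "m", "8-11"),
    ("p03", "f", "12-16"), ("p04", "m", "8-11"),
    ("p05", "f", "12-16"), ("p06", "m", "8-11"),
]
result = block_randomize(demo, seed=1)
```

With a block size of 2, the second participant in each stratum-specific block always receives the arm the first did not, which keeps group sizes balanced within strata at the cost of making every second allocation predictable.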

Fig. 1 Flow chart of the randomized-controlled trial

Parents, children, and teachers gave written consent. The study was approved by the Ethics Committee of the University of Zurich, Switzerland. Clinical trial registration, https://clinicaltrials.gov, NCT02358941.

Setting

Thirty-eight participants of the final sample were tested and trained in their school (15 different schools), and 39 participants were tested and trained in an outpatient clinic (two clinics). In the school setting, training took place during ordinary school time, i.e., during supervised homework or leisure time, or during school lessons when the student was allowed to be absent from class. Training schedules and goals were worked out and evaluated in collaboration with the teacher. The involvement of parents was minimized. The training began with two to three double sessions (2 × 45–60 min) per week, and continued with one to two sessions per week, over a period of 10–14 weeks. Due to feasibility constraints, at the beginning, the intensity of training sessions differed slightly between settings. In the clinical setting, training began as a vacation course, with daily double sessions over 2 weeks, usually followed by a short therapy break and five double sessions over 5–8 weeks, administered during leisure time. The involvement of teachers was limited to the completion of questionnaires. Training schedules and goals were established in collaboration with parents.

Procedure and outcome measures

Parents completed the Conners-3 rating scale (German version; [33]) and the Behavior Rating Inventory of Executive Function (BRIEF) (German version; [34]) three times: at baseline (T1), 3 months later before the start of the training (T2), and after the training (T3). Similarly, teachers completed the Conners-3 and the BRIEF teacher version three times (see Fig. 2). Additional assessments will be reported elsewhere. Before and after the training, standardized school observations were conducted by blinded, trained raters with the BOSS [20, 35]. Raters were blind with regard to type of intervention, setting, and time point of the assessment; i.e., the child was rated before and after treatment by different observers. Throughout the study, 28% of classroom observations were completed by two experienced observers. Acceptable inter-observer agreement was reached, with a mean percentage agreement of 94% (93–96%) and a mean kappa of .67 (.64–.73). The mean time interval between classroom observation and completion of Conners-3 teacher ratings was 2.2 weeks (SD = 1.8).
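The two inter-observer statistics reported above, percentage agreement and Cohen's kappa, can be computed as in the following minimal Python sketch. It assumes each observer produces one categorical code per observation interval; the two-category coding ("engaged" vs. "off_task") is a simplification for illustration and the interval codes are hypothetical, not study data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of intervals on which both observers gave the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code sequences of equal length")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(codes_a)
    p_o = percent_agreement(codes_a, codes_b)
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    # Expected agreement if both observers coded independently
    # with their own marginal category frequencies.
    categories = set(codes_a) | set(codes_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical interval codes from two observers (illustration only).
obs_1 = ["engaged", "engaged", "off_task", "engaged", "off_task",
         "engaged", "engaged", "engaged", "off_task", "engaged"]
obs_2 = ["engaged", "engaged", "off_task", "engaged", "engaged",
         "engaged", "engaged", "engaged", "off_task", "engaged"]

agreement = percent_agreement(obs_1, obs_2)
kappa = cohens_kappa(obs_1, obs_2)
```

Because kappa subtracts the agreement expected by chance, it is typically well below the raw percentage agreement when one category dominates, which is one way a high percentage agreement and a more moderate kappa can coexist.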

Fig. 2 Study design. BOSS, Behavioral Observation of Students in Schools; BRIEF, Behavior Rating Inventory of Executive Function; PO, primary outcome; SO, secondary outcome

Interventions

SCP NF was administered with the Theraprax training device (Neuroconn) (the same device as used in a recent multicenter study [36]). The patients were supposed to steer a feedback item on the screen downward or upward by changing brain activity. In 50% of the trials, the task was to decrease brain activity and in the other 50% to increase brain activity. Feedback-EEG was recorded at Cz. Throughout the training course, an increasing number of trials provided delayed feedback (transfer trials).

In CogT, children were trained with CogniPlus, a software program developed for the rehabilitation of neurological patients (Schuhfried). The precursor of CogniPlus, Aixtent, was found to positively affect multiple measures of attention in children with ADHD [37]. CogniPlus consists of adaptive game-like training tasks that target neuropsychological functions such as alertness, sustained attention, working memory, selective attention, divided attention, and inhibition. An individual training program of four tasks was established for each child according to his/her main difficulties. Both types of interventions were complemented by elaborated transfer exercises to promote transfer to daily life. In the clinical setting, parents were introduced to transfer strategies and to the use of transfer cards, while in the school setting, the teachers received these instructions. However, neither parents nor teachers were informed about the actual treatment allocation. After completion of the training, parents and teachers in the school setting were asked about their assumptions regarding the type of treatment.

Statistical analysis

Repeated-measures MANOVAs with between-subjects variables treatment group (NF vs. CogT) and setting (school vs. clinic) were run to compare ratings before (T2) and after (T3) training for Conners-3 ADHD DSM-IV indices, BRIEF indices, and BOSS scores (engagement; off-task behavior). Post hoc t tests were conducted to analyze changes within groups. For the analysis of waiting effects, repeated-measures MANOVAs were calculated with three assessment times. Repeated contrasts were conducted to analyze changes from T1 to T2, from T2 to T3, and from T1 to T3. Partial eta-squared effect sizes (ηp²) were reported for MANOVA and ANOVA results (.01 = small, .06 = medium, .14 = large). Analyses were computed using IBM SPSS software version 23. Missing values were only imputed for T1 or T2, by estimating missing values proportionally to changes in the group (six missing BRIEF parent indices [2.8%] and nine BRIEF teacher indices [3.9%]). To account for effects of teacher change, outcomes were analyzed using a linear mixed model approach in R (nlme package; [38]) with Tukey post hoc contrasts in groups with and without teacher change (lsmeans package; [39]). Pearson correlations were used to assess agreement between changes in classroom behavior and informant ratings.
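For readers who want to relate an F statistic to the partial eta-squared benchmarks given above, the following minimal Python sketch may help. It uses the standard identity for a univariate F test, partial eta-squared = (F × df_effect) / (F × df_effect + df_error), and the thresholds .01, .06, and .14 from the text. The function names, the "negligible" label for values below .01, and the example F value are assumptions for illustration; the study's own values were computed in SPSS.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from a univariate F test."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be >= 0 and degrees of freedom must be > 0")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Classify partial eta-squared with the .01 / .06 / .14 benchmarks."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # label below .01 is an assumption, not from the text


# Hypothetical F test with 1 and 75 degrees of freedom (illustration only).
example = partial_eta_squared(13.0, 1, 75)
example_label = effect_size_label(example)
```

Applied to the benchmarks alone, a value of .048 falls in the small range, .068 in the medium range, and .151 in the large range.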

Sample size

With expected small-to-medium effect sizes for pre–post-training changes for both treatments, with four groups, three assessment times, and an alpha of .05 and power of .80, a required minimum total sample size of 72 was calculated (G*Power; [40]). Therefore, we intended to recruit a minimum of n = 20 in each group for a minimum total n = 80 with anticipation of several dropouts. Primary outcome measures were Conners-3 DSM-IV ADHD symptom scales (parent and teacher ratings).

Results

Groups were similar with regard to age, gender, IQ, medication, and comorbid disorders (Table 1). The socioeconomic status (SES) was significantly lower in families that participated in the school setting (U = 355, p = .047).

Intervention and setting effects

Results of pre- to post-training effects on Conners-3 ADHD DSM-IV indices, BRIEF indices, and BOSS measures are shown in Table 2. All MANOVAs yielded significant main effects of time, indicating a decrease in symptoms over time. Subsequent ANOVAs confirmed this improvement on all scale indices with the exception of teacher-rated DSM-IV hyperactivity/impulsivity. An improvement of medium size was found for classroom off-task behavior. For parent-rated ADHD DSM-IV indices, we found a significant time by training interaction effect, which was significant for the inattention index only. Post hoc t tests indicated more improvement in inattention after CogT (M = −5.08, SD = 5.51) than after NF (M = −1.68, SD = 5.51; t(75) = 2.823, p = .006). Responder rates (T2/T3) of both interventions reached the criterion of more than 50% according to parents, with large effect sizes. According to teacher ratings, responder rates were much lower (31–37%), with effect sizes in the small-to-medium range. Between T1 and T3, the odds ratio indicated that more participants of the NF group than of the CogT group reached the responder criterion in teacher-rated DSM-IV hyperactivity/impulsivity. Responder rates and odds ratios are shown in Table 3.
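The responder criterion (symptom reduction of at least 25%) and the odds ratio between two groups can be sketched in Python as follows. This is an illustration under assumptions: the text does not specify which score metric the percentage reduction was computed on, so the sketch simply applies it to whatever pre/post scores are passed in, and all scores and counts below are hypothetical rather than taken from Table 3.

```python
def is_responder(pre, post, threshold=0.25):
    """True if the score dropped by at least `threshold` as a fraction of `pre`."""
    if pre <= 0:
        raise ValueError("pre-treatment score must be positive")
    return (pre - post) / pre >= threshold


def responder_rate(pairs, threshold=0.25):
    """Share of (pre, post) pairs that meet the responder criterion."""
    if not pairs:
        raise ValueError("need at least one (pre, post) pair")
    flags = [is_responder(pre, post, threshold) for pre, post in pairs]
    return sum(flags) / len(flags)


def odds_ratio(responders_a, n_a, responders_b, n_b):
    """Odds of responding in group A divided by the odds in group B."""
    non_a = n_a - responders_a
    non_b = n_b - responders_b
    if non_a == 0 or responders_b == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return (responders_a * non_b) / (non_a * responders_b)


# Hypothetical (pre, post) symptom scores, illustration only.
scores = [(20, 14), (20, 16), (16, 12), (12, 12)]
rate = responder_rate(scores)

# Hypothetical counts: 12 of 20 responders in group A vs. 8 of 20 in group B.
example_or = odds_ratio(12, 20, 8, 20)
```

An odds ratio above 1 in this sketch means that responding is more likely in group A than in group B; a confidence interval would be needed to judge whether such a difference is statistically reliable.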

Table 2 Descriptive statistics and MANOVA results on main outcome measures
Table 3 Responder rates to neurofeedback and cognitive training between pre- and post-training assessment (T2–T3) and between baseline and post-training assessment (T1–T3)

There was a significant main effect of setting on parent-rated Conners-3 and BRIEF indices, indicating more severe impairment in children treated in the clinical setting. The setting had otherwise no effect on any outcome variable.

Comparison of blinded classroom observations and informant ratings

Pre–post changes in observed off-task behavior and engagement were not significantly correlated with changes in teacher-rated Conners-3 or BRIEF indices (the same applied for ratings by the same teacher or by different teachers analyzed separately). The change in parent-rated BRIEF behavioral regulation showed a small but significant correlation in the expected direction with changed BOSS engagement (r = −.20, p = .049) (Table S1, Online Supplementary Material).
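The analysis above correlates change scores rather than raw scores. A minimal Python sketch of that computation follows; the function names and all values are hypothetical and serve only to show the mechanics. It also shows why a negative coefficient is the "expected direction" here: on rating scales where higher scores indicate more problems, improvement is a negative change, whereas improvement in observed engagement is a positive change.

```python
from math import sqrt


def change_scores(pre, post):
    """Post minus pre for each participant."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [b - a for a, b in zip(pre, post)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five participants (illustration only).
rating_pre = [60, 58, 65, 55, 62]        # higher = more problems
rating_post = [54, 56, 57, 55, 58]
engagement_pre = [50, 60, 45, 70, 55]    # higher = more engaged
engagement_post = [60, 63, 57, 69, 60]

r = pearson_r(change_scores(rating_pre, rating_post),
              change_scores(engagement_pre, engagement_post))
```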

Waiting time effects

Waiting time effects are depicted in Fig. 3 (for detailed results, see Table S2, Online Supplementary Material). As the setting did not affect outcomes, it was dropped from the analyses. Significant waiting time effects (T1/T2 contrasts) for ADHD DSM-IV indices of large size were revealed for parent ratings (mean ηp² = .151) and teacher ratings (mean ηp² = .171). BRIEF waiting time effects were smaller, especially in parent ratings (parent mean ηp² = .018; teacher mean ηp² = .068), and only the T1/T2 contrast in teacher-rated BRIEF behavioral regulation became significant (ηp² = .086). Waiting time effects on teachers’ Conners-3 ratings were larger than treatment effects (T1/T2 mean ηp² = .171 vs. T2/T3 mean ηp² = .048). The opposite pattern was found for parent ratings (T1/T2 mean ηp² = .151 vs. T2/T3 mean ηp² = .270). T1/T3 contrasts of all outcome measures indicated highly significant improvements.

Fig. 3 Mean Conners-3 ADHD indices and BRIEF indices by parents and teachers with 95% CI across three assessment times, separated by training (solid line: NF; dashed line: CogT)

Effect of teacher change

The teacher changed in 15% (n = 12) of the sample between T2 and T3. Linear mixed model analyses revealed that only the subgroup with teacher change showed significant treatment effects with regard to DSM-IV inattention (b = 5.75, t(149) = 4.10, p = .001). A similar trend, with a better outcome rated by a new teacher, was found for BRIEF metacognition (b = 15.25, t(141) = 3.63, p = .005) (Fig. S1, Online Supplementary Material).

Blinding

The degree of blinding was tested by analyzing parents’ and teachers’ post hoc dichotomized responses (knew or guessed treatment condition correctly = “unblinded” vs. did not know or guessed wrongly = “blinded”). Parents and teachers in the school setting did not significantly differ in the proportion of blinding (Table S3, Online Supplementary Material). As expected, significantly more parents in the school setting (40%) than in the clinical setting (14%) were blinded. Teachers were blinded in 50% of cases in the school setting. Mixed models revealed no significant time by treatment by blinding interaction on any outcome measure.

Discussion

The present study analyzed informant-specific outcomes for two treatments by manipulating the setting, by adding an objective measure of behavioral change, and by assessing the stability of ratings over a waiting period.

Treatment effects. Parents and teachers both indicated significant improvements after training (T2/T3) for NF and CogT, with larger effects according to parents (in line with the literature, e.g., [6]). Parents indicated a superiority of CogT over NF. However, this superiority might be related to incidental waiting effects: when changes from T1 to T3 are considered, the superiority of CogT disappears. Contrary to assumptions, no differential treatment effect in favor of NF emerged, with the exception of the responder rate (T1/T3) according to teachers. Blinding did not seem to be associated either with differential treatment effects or with weaker treatment response.

Setting effects. The setting had no significant impact on treatment response according to either of the informants. Involvement in the organization of the treatment did not seem to alter the teachers’ perception of behavioral change, which weakens the argument that Hawthorne effects are responsible for parent-teacher informant discrepancies. Parent ratings of impairment were generally higher in the clinical setting group, which may have resulted from the recruitment procedure: children in the school setting were mainly identified by teachers or school psychology services. Hence, the pressure of problems at home might have been less marked. Thus, whether children with ADHD are recruited via parents or via schools seems to have an impact on symptom severity ratings and should be considered in the interpretation of informant discrepancies. The demographic difference between settings in terms of socioeconomic status suggests that implementing the treatment directly in the participants’ schools makes it possible to reach a different clientele. The results support the feasibility of both treatments in schools.

Blinded observations. Blinded classroom observations showed improved off-task behavior after treatment, independently of type of intervention or setting. The improvement is comparable to that reported by Steiner et al. [4], but is of moderate effect size at best. It should be kept in mind that raters of the present study were blinded not only with regard to treatment but also with regard to setting and time. Changed classroom behavior was not correlated with changes in teacher ratings, suggesting that teachers might have provided a less objective rating of treatment response.

Waiting time. Significant improvements in ADHD symptom ratings were found over a waiting period of 3 months according to both informants prior to training. These effects were larger than expected. Strikingly, teacher-rated waiting time effects exceeded treatment effects. Although a combination of possible explanations (such as maturation, spontaneous recovery, legitimation effects at screening, regression to the mean; see [41, 42]) may have contributed to this finding, it raises questions about the sensitivity to change of ADHD symptom scales, which usually serve as primary outcome measures. BRIEF indices, however, were less affected by waiting time than ADHD symptom ratings. This discrepancy might be attributable to the differential nature of the rating scales. While the BRIEF is supposed to assess the construct of executive functioning in daily behavior, the clinical Conners-3 ADHD symptom scales are designed to capture the whole phenotype of a rather broadly defined developmental disorder. Notably, parent and teacher ratings of ADHD symptoms seem to reflect a considerable amount of state variance (14–52%) [43], which is problematic when evaluating treatments. The finding, however, might equally reflect the fact that both interventions were primarily targeting regulatory control instead of ADHD symptoms, which would be more validly revealed by BRIEF ratings. Another explanation for small teacher-rated treatment effects compared to moderate-to-large waiting effects might be that repeated measurement or involvement in the organization of the treatment led to unrealistic hopes and expectations. This might have caused teachers to judge treatment effects more critically directly after training. If teachers expected treatment effects similar to the sudden impact of medication, they were bound to be disappointed, given the slow and discrete nature of improving behavioral control.
Essentially, for teachers, normal behavior of other students seems to constitute the frame of reference. A difficult child will continue to be different in the view of teachers despite some possible improvement. It is also conceivable that teachers will rate changes more markedly with the passage of more time after training. Nonetheless, three out of four teacher-rated indices indicated small-to-medium improvements directly after training, and teacher-rated BRIEF metacognition did not improve during waiting time but only improved after treatment.

Effects of changing informants. Contrary to expectation, improvements were more pronounced when a new teacher completed the post-training assessment. This does not necessarily imply that a new teacher provided a less preconceived view of the students’ behavior. It does, however, contradict the hypothesis that rater changes and, in consequence, a diminished reliability of ratings, might contribute to smaller treatment effects according to teachers.

Specific effects. Analyses of the specific effects of both training methods (i.e., the associations between learned parameters and clinical outcome) should provide deeper insight into the mechanisms of the training and will be presented elsewhere [29, 44]. Some NF and CogT studies reported such direct associations with clinical outcomes rated by parents [45–47] or teachers [48, 49]. We believe that future studies will substantially benefit from applying such approaches, as these could contribute to the discussion about which informant may have provided a more sensitive rating of treatment effects (e.g., Janssen et al. [45] reported EEG parameter changes after NF related to parent ratings but not teacher ratings).

Limitations

Conclusions to be drawn from the comparison of settings are limited due to the fact that this allocation was not randomized, although the random allocation to NF or CogT still ensured that randomized-controlled trials were performed in both settings. Consequently, and as a result of the recruitment procedure, the socioeconomic status (SES) of parents in the school setting was lower than that of parents in the clinical setting. This might present a certain confound, and the limited effects of setting on treatment should be interpreted with caution. Another—interesting, but possibly limiting—result was that the initial symptom severity according to parents was lower in school-recruited children.

Although group sizes were adequately powered to detect medium within-between interactions, they might have been too small to unveil smaller effects or three-way interactions across the four groups.

The lack of a passive control group presents a further limitation of the study, which becomes even more essential given the substantial improvements in behavioral ratings across the waiting period. The possibility to interpret the efficacy of the applied treatments is, therefore, limited.

Conclusion

We found evidence of comparable effectiveness of NF and CogT for children and adolescents with ADHD according to blinded and unblinded outcome measures. Consistent with the literature, treatment effects were more pronounced in parent than in teacher ratings. Results from the closer examination of informant-related outcomes cast serious doubt on the assumption that teacher ratings are more immune to bias, while being as sensitive as parent ratings. Altogether, the findings of a lack of effect of setting and blinding, missing correlations with objective measures of change, and a positive rather than a negative impact of teacher change suggest that instead of “probable blinding”, other reasons might be responsible for the seemingly smaller transfer of training effects to the school context. Our results suggest that both parents and teachers should be regarded as relevant sources which may contribute different pieces of information to the evaluation of treatment.