Introduction

Dimensional assessment of externalizing symptoms in children often relies on observer ratings by parents or teachers and, to a lesser degree depending on the patient’s age, on self-rating scales (Barkley 2006; Barkley and Murphy 2006). Interviews, however, may be beneficial because clinicians can rule out misinterpretations of items and can judge the clinical relevance of the reported symptoms. When both the caregiver and the child are interviewed, the judgement rests on a broader basis: the statements of caregiver and child as well as the child’s behaviour during the examination. Moreover, interviews allow for blind outcome assessment, which is commonly considered an important methodological standard in treatment research (e.g. Chambless and Hollon 1998; Lonigan et al. 1998; Moher et al. 2001; Boutron et al. 2008).

However, existing interviews focus mainly on categorical diagnoses. The objective of our study was therefore to examine the psychometric properties of a scale derived from the Kiddie-SADS (Kaufman et al. 1996; Deutsche K-SADS-Arbeitsgruppe 2001) for the dimensional assessment of externalizing symptoms (symptoms of attention deficit hyperactivity disorder, ADHD, and oppositional defiant disorder, ODD) in children and adolescents with ADHD. The severity of externalizing symptoms was operationalized as the number of diagnostic criteria fulfilled within a given time period. Sum scores of items covering diagnostic criteria have been shown to yield dimensional symptom scores that are sensitive to change and are valid and reliable measures of treatment outcome (e.g. the Disruptive Behavior Rating Scale, DBRS, Barkley and Murphy 2006; the German “Fremdbeurteilungsbogen für Hyperkinetische Störungen”, FBB-HKS, Doepfner and Lehmkuhl 1998; Froelich et al. 2002; Doepfner et al. 2004). In the following, we present the psychometric properties of the externalizing symptom scale based on the K-SADS interview, including inter-rater reliability, internal consistency, item difficulties and discriminative power of the items, and aspects of convergent, discriminant and factorial validity.

Materials and methods

Participants

The sample was recruited from patients of the Department for Child and Adolescent Psychiatry at Würzburg University Hospital. Inclusion criteria were as follows: (1) a diagnosis of ADHD according to DSM-IV criteria (APA 2000); (2) age 6–16 years, inclusive; (3) voluntary participation and written informed consent of the patient and of the persons with care and custody of the child. We excluded patients with organic mental disorder, childhood autism or psychosis. The ADHD diagnosis was based on the clinical diagnosis documented in the patient’s case chart and was additionally confirmed with a structured checklist covering the DSM-IV criteria. Coexisting disorders were coded according to ICD-10 (WHO 1993) as documented in the case charts. We initially included 61 patients. One patient was later excluded because autistic disorder was diagnosed, and one family withdrew informed consent, stating that they lacked the time to take part in the investigation. Thus, 59 patients were included in the data analysis (10 inpatients, 16.9%; 49 outpatients, 83.1%; 39 males, 66.1%; 20 females, 33.9%). The mean age of the patients was m = 9.66 years (SD = 2.30; min = 6, max = 16), and the mean IQ was m = 99.68 (SD = 13.21; min = 75, max = 133). Forty-one patients (69.5%) lived with both biological parents; the remaining 18 patients (30.5%) lived with their mothers, while their fathers lived apart. Twenty-six patients (44.1%) had received outpatient treatment before contact with our child psychiatric unit was established, and 6 (10.2%) had received inpatient treatment previously. At the time of the investigation, 49 patients (83.1%) were receiving ADHD medication (methylphenidate: n = 43, 72.9%; amphetamine: n = 5, 8.5%; atomoxetine: n = 1, 1.7%). Subtypes of ADHD and coexisting disorders in the study sample are shown in Table 1.

Table 1 Subtype of ADHD and coexisting disorders in the study sample (N = 59)

Most patients presented with the combined subtype of ADHD. Twenty-four patients (40.7%) had no additional disorder, 23 (39.0%) had one, 10 (16.9%) had two and 2 (3.4%) had three additional disorders. The most frequent coexisting disorders were conduct or oppositional defiant disorder, elimination disorders, adjustment disorders and developmental disorders. The diagnoses documented in the case charts and the information given by the parents diverged with respect to specific developmental disorders of scholastic skills: according to the parents, these had previously been diagnosed much more frequently than the case charts indicated (reading and spelling disorders: n = 9; disorder of arithmetical skills: n = 5).

Materials

ADHD–ODD scale based on the Kiddie-SADS

To assess externalizing symptoms, we applied the ADHD and ODD sections of the Kiddie-SADS (German version: K-SADS-Arbeitsgruppe 2001). The K-SADS is a detailed, semi-structured diagnostic interview, widely used internationally, that covers mental disorders in children and adolescents according to ICD-10 and DSM-IV. To obtain a dimensional ADHD–ODD score, we modified the interview to assess the current presence of the DSM-IV diagnostic criteria for ADHD and ODD within the preceding 2 weeks. Mother and child were interviewed separately. On the basis of this information and of the child’s behaviour during the examination, the interviewer decided whether each of the 26 criteria (18 ADHD criteria, 8 ODD criteria) was present or absent during the 2-week period under investigation. As in the original K-SADS, the interviewer also coded the decisions based solely on the mother’s statements and solely on the child’s statements; however, the critical variable is the interviewer’s clinical judgement based on all available information. The original K-SADS uses a three-point rating for each diagnostic criterion (0: not fulfilled, 1: subthreshold symptoms, 2: criterion fulfilled). We reduced this to a dichotomous decision (criterion fulfilled or not) to ensure that the scale covers only clinically significant symptoms. The externalizing symptom score was computed as the number of fulfilled diagnostic criteria (theoretical range 0–26). In addition, subscales were defined on the basis of content validity: an ADHD scale covering the 18 DSM-IV ADHD criteria, an AD scale covering the 9 attention-deficit criteria, an HI scale covering the 9 hyperactivity and impulsivity criteria, and an ODD scale covering the 8 ODD criteria.
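
The scoring rule amounts to counting fulfilled criteria in the total scale and in each subscale. The following Python sketch illustrates it under one assumption: the item order (positions 0–8 inattention, 9–17 hyperactivity/impulsivity, 18–25 ODD) and the function and key names are hypothetical choices for illustration, not part of the K-SADS or of the study materials.

```python
from typing import Dict, Sequence

# Hypothetical item order (an assumption for illustration only):
# positions 0-8 = DSM-IV inattention criteria, 9-17 = hyperactivity/impulsivity,
# 18-25 = ODD criteria. Each entry is the interviewer's dichotomous judgement.
SUBSCALES = {
    "AD": range(0, 9),
    "HI": range(9, 18),
    "ADHD": range(0, 18),
    "ODD": range(18, 26),
    "ADHD_ODD": range(0, 26),
}


def score_adhd_odd(criteria: Sequence[bool]) -> Dict[str, int]:
    """Count fulfilled criteria for the total scale and for each content-based subscale."""
    if len(criteria) != 26:
        raise ValueError("expected 26 dichotomous criteria (18 ADHD + 8 ODD)")
    return {name: sum(bool(criteria[i]) for i in idx) for name, idx in SUBSCALES.items()}


# Example: all 9 inattention criteria and 3 ODD criteria fulfilled
example = [True] * 9 + [False] * 9 + [True] * 3 + [False] * 5
print(score_adhd_odd(example))
# {'AD': 9, 'HI': 0, 'ADHD': 9, 'ODD': 3, 'ADHD_ODD': 12}
```

Because the total is a simple count, the same dichotomous judgements also support the alternative categorical use of the data (checking whether the DSM-IV thresholds are met).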

Strengths and Difficulties Questionnaire, SDQ

The SDQ is a brief questionnaire for children and adolescents aged approximately 3–16 years that assesses externalizing and internalizing symptoms with 25 items in five subscales of five items each: (1) behavioural difficulties, (2) hyperactivity and attentional difficulties, (3) emotional distress, (4) difficulties getting along with other children, (5) kind and helpful behaviour. A total difficulties score is computed from subscales (1) to (4). For our study, we used the parent-rated version of the SDQ.

The SDQ is used internationally for screening and research purposes, and its reliability and validity are well documented (Goodman 1997; Goodman and Scott 1999; Klasen et al. 2000; Becker et al. 2004).

Parent-rating scale for ADHD, FBB-HKS (“Fremdbeurteilungsbogen für hyperkinetische Störungen”)

The FBB-HKS is part of the German Diagnostic System for Mental Disorders in Childhood and Adolescence (DISYPS-KJ; Doepfner and Lehmkuhl 1998). Its 20 items correspond to the ICD-10 and DSM-IV symptom criteria and yield an ADHD total score. Subscales cover the core symptoms of ADHD: (1) inattention, (2) hyperactivity and (3) impulsivity. The psychometric properties of the FBB-HKS are well documented and satisfactory (Bruehl et al. 2000; Doepfner et al. 2006).

Both instruments, the SDQ and the FBB-HKS, were used to assess aspects of the convergent and discriminant validity of the ADHD–ODD scale. In addition to the ADHD–ODD scale, the interview with the child’s mother included a semi-structured section covering the child’s history, socio-economic background and treatment. Coding sheets were included for the coexisting diagnoses documented in the patient’s case chart. The DSM-IV diagnosis of ADHD was confirmed with a structured checklist completed by the patient’s child psychiatrist or clinical psychologist (DCL-HKS, Doepfner and Lehmkuhl 1998).

Procedure

The study protocol was approved by the Ethics Committee of the Faculty of Medicine at the University of Würzburg.

The patients’ families were approached about participation during regular outpatient or inpatient sessions at our department. Two of the 61 initially included patients were excluded (see Participants). No fees were paid for study participation. The ADHD–ODD interviews were conducted at our department, first with the mother and then with the child, and took 1–1.5 h per mother–child pair. All interviews were videotaped to allow re-coding by a second, independent rater. The mothers completed the additional questionnaires (FBB-HKS, SDQ) within 2 days before or after the interview and were asked to base their ratings on their child’s behaviour during the 2 weeks preceding the interview. The child’s primary case manager (child psychiatrist or clinical psychologist) completed the checklist covering the DSM-IV ADHD criteria blind to the questionnaire and interview results. The patients’ case charts were reviewed for coexisting diagnoses by two students working on their master’s or doctoral theses (A.H., M.S.), who also conducted the majority of the interviews (n = 27 each). The first seven interviews were conducted by the senior authors (M.W., T.J.) and served as training interviews for the students; the next nine were conducted by the students under supervision. These 16 interviews were also videotaped and re-coded by the students and the senior authors. Because the ratings of the two students correlated highly with those of the senior authors (r = 0.97 and 0.98, respectively), we considered the training successful and decided to include these training interviews in the sample used for data analysis.

Statistical analyses

To assess inter-rater agreement for the ADHD–ODD scale, we computed Pearson correlation coefficients between the sum scores of the rater conducting the interviews (rater 1) and the sum scores of the rater coding the videotapes of the interviews (rater 2). To rule out systematic bias, the mean sum scores of rater 1 and rater 2 were compared using a two-tailed t-test for dependent samples. Because these analyses indicated high objectivity, all further analyses used the scores obtained by rater 1 in the original interview.
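
The two inter-rater statistics can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the SPSS procedure used in the study; the function names and the six pairs of sum scores are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Dependent-samples t statistic: mean difference / its standard error, df = n - 1."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # stdev() uses n - 1; identical differences (stdev 0) would raise ZeroDivisionError
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical sum scores of the interviewer (rater 1) and the video re-coder (rater 2)
rater1 = [12, 15, 8, 20, 10, 14]
rater2 = [12, 14, 8, 19, 10, 13]
r = pearson_r(rater1, rater2)
t, df = paired_t(rater1, rater2)
```

A high r alone cannot reveal a constant offset between raters, which is why the mean comparison is added. The two-tailed P value follows from the t distribution with n − 1 degrees of freedom, e.g. `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.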

To assess the internal consistency of the ADHD–ODD scale and its subscales, we computed Cronbach’s alpha (for dichotomous data, this coefficient is equivalent to Kuder–Richardson’s “formula 20”). Item difficulty was computed as the number of patients fulfilling a criterion divided by the sample size, expressed as a percentage. Discriminative power of the items was computed as the point-biserial correlation between each item and the sum score of the remaining items (part-whole correction).
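
For readers who want to reproduce these item statistics outside SPSS, here is a minimal NumPy sketch, assuming the data are arranged as a patients × items matrix of 0/1 judgements. The function name `item_analysis` is hypothetical. The point-biserial coefficient is computed as an ordinary Pearson correlation, which is what it is when one variable is dichotomous.

```python
import numpy as np


def item_analysis(X):
    """Classical item analysis for an (n_patients x n_items) matrix of 0/1 judgements.

    Returns Cronbach's alpha (identical to KR-20 for dichotomous items), item
    difficulties in percent, and part-whole corrected item-total correlations.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    total = X.sum(axis=1)
    # With ddof=0 the variance of a 0/1 item is p*q, i.e. the KR-20 form;
    # the variance ratio, and hence alpha, is the same with ddof=1.
    alpha = k / (k - 1) * (1 - X.var(axis=0).sum() / total.var())
    # Difficulty: percentage of patients fulfilling each criterion
    difficulty = 100 * X.mean(axis=0)
    # Correlate each item with the sum of the *other* items (part-whole correction)
    r_it = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(k)])
    return alpha, difficulty, r_it
```

An item fulfilled by every patient (or by none) has zero variance, so its corrected correlation is undefined (NumPy returns `nan`); such items carry no information for these statistics.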

Discriminant and convergent validity were also investigated using Pearson correlation coefficients, which were compared using Fisher’s Z test. Patients with coexisting CD or ODD diagnoses were expected to score higher on the ADHD–ODD scale than patients without these diagnoses; these subgroups were compared using one-tailed t-tests for independent samples.
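
Fisher’s Z test rests on the transformation z = atanh(r), which makes sampling distributions of correlations approximately normal with variance 1/(n − 3). The sketch below shows the textbook form for two correlations from independent samples; the function name is hypothetical. The text does not state which variant was used. When the two correlations share a variable and come from the same patients, tests for dependent correlations (e.g. Steiger’s procedure) account for that overlap, and this sketch does not.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two correlations via Fisher's r-to-z transformation (independent samples).

    Returns the z statistic and its two-tailed P value under the standard normal.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-tailed normal P value
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed
```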

The analysis of factorial validity was exploratory because the prerequisites for factor analysis were not fully met (dichotomous items, small sample size). We computed a principal component analysis (PCA) followed by VARIMAX rotation. All statistical analyses were performed using SPSS 16.0.
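
The factor-analytic steps (PCA on the item correlation matrix, Kaiser criterion, VARIMAX rotation) can be sketched with NumPy as follows. This is an illustrative reimplementation, not SPSS’s code. It applies Kaiser row normalization, which SPSS also uses by default for VARIMAX, but convergence settings differ, so loadings may deviate slightly from SPSS output. The function names are hypothetical.

```python
import numpy as np


def pca_loadings(X):
    """Eigenvalues (descending) and unrotated component loadings of the item correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return eigvals, loadings


def kaiser_n_factors(eigvals):
    """Kaiser criterion: retain components with eigenvalue greater than 1."""
    return int(np.sum(np.asarray(eigvals) > 1))


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal VARIMAX rotation with Kaiser row normalization."""
    L = np.asarray(L, dtype=float)
    h = np.sqrt((L ** 2).sum(axis=1))  # square roots of the communalities
    A = L / h[:, None]                 # normalize rows before rotating
    p, k = A.shape
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        B = A @ T
        G = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                     # closest orthogonal matrix to the gradient
        crit = s.sum()
        if crit_old and crit < crit_old * (1 + tol):
            break
        crit_old = crit
    return (A @ T) * h[:, None], T     # undo the row normalization
```

For m retained components, the percentage of total variance explained is `100 * eigvals[:m].sum() / n_items`. Because VARIMAX is an orthogonal rotation, it redistributes this variance across factors without changing the total or the items’ communalities.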

Results

Inter-rater agreement

There was a high correlation between the ADHD–ODD sum scores of rater 1 and rater 2 (r = 0.98, P < 0.01; see Fig. 1), and the mean sum scores of the two raters did not differ significantly (m1 = 12.70, SD1 = 5.72; m2 = 12.50, SD2 = 5.49; t = 1.28, df = 58, P = 0.21).

Fig. 1 Inter-coder agreement: sum scores of the ADHD–ODD scale by rater 1 (conducting the interview) and rater 2 (re-coding the videotape of the interview)

As described in the Materials and methods section, interview data on the children’s symptoms are also available separately for the information given by the mother and for that given by the child. Correlational analyses of the sum scores showed that the interviewer’s clinical judgement (the “critical” ADHD–ODD score) was based mainly on the mothers’ statements (see Table 2).

Table 2 Pearson correlations between the ADHD–ODD scores based on the statements of either the mother (score mother) or the child (score child) or on all information available (ADHD–ODD score) (N = 59)

Internal consistency

The internal consistency of the ADHD–ODD scale was acceptable (Cronbach’s alpha = 0.85). As expected, the internal consistency of the shorter subscales defined by content validity was lower, since alpha decreases with the number of items (see Table 3).

Table 3 Reliability of the ADHD–ODD scale and subscales (N = 59)

Item difficulty and discriminative power of the items

The distribution of item difficulties of the ADHD–ODD scale had its mode in the medium difficulty range and showed acceptable variability (see Fig. 2). Mean item difficulty was m = 48.85 (SD = 15.37, min = 15.25, max = 76.27). The most difficult item (difficulty = 15.25, i.e. the symptom was present in only 15% of the patients) was the ODD symptom of being spiteful or vindictive.
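
With N = 59, every item difficulty is a whole number of patients divided by 59 and expressed as a percentage. Writing n_j for the number of patients fulfilling criterion j (our notation), the reported extremes back-calculate to 9 and 45 patients:

```latex
p_j = 100 \cdot \frac{n_j}{N}, \qquad
p_{\min} = 100 \cdot \frac{9}{59} \approx 15.25, \qquad
p_{\max} = 100 \cdot \frac{45}{59} \approx 76.27
```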

Fig. 2 Difficulty of the items of the ADHD–ODD scale. Item difficulty (theoretical range 0–100%) is shown in 10% intervals

The distribution of the discriminative power of the items of the ADHD–ODD scale is shown in Fig. 3. Mean discriminative power was m = 0.39 (SD = 0.14, min = 0.07, max = 0.58). Most items had medium to high discriminative power. The five items with low discriminative power (< 0.30) were “being easily distracted by extraneous stimuli”, “failing to give close attention to details or making careless mistakes”, “being angry and resentful”, “being spiteful or vindictive” and “blaming others for his or her mistakes or misbehavior”.

Fig. 3 Discriminative power of the items of the ADHD–ODD scale

Validity

Correlations between the ADHD–ODD scores and the FBB-HKS and SDQ scores are shown in Table 4.

Table 4 Pearson correlations between the ADHD–ODD scale and mother-ratings of child behaviour (N = 59)

Correlations between the ADHD–ODD score and mother ratings of the children’s externalizing behaviour were moderate to high, with absolute values between 0.48 and 0.70 (FBB-HKS covering ADHD symptoms; SDQ subscales “behavioural difficulties” and “hyperactivity and attentional difficulties”; negative correlation with the SDQ subscale “kind and helpful behaviour”). The correlation between the ADHD–ODD score and the SDQ subscale “emotional distress” was lower and differed significantly from the correlation between the ADHD–ODD score and the FBB-HKS score (Z = 3.72, P < 0.01). Overall, this pattern of correlations supports the convergent and discriminant validity of the scale.

As expected, patients with coexisting ODD or CD (n1 = 19, including adjustment disorders) had significantly higher scores on the ADHD–ODD scale (m1 = 15.37, SD1 = 6.20) than patients without these additional externalizing disorders (n2 = 40, m2 = 11.46, SD2 = 5.06; t = 2.58, P < 0.01).
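
Because group sizes, means and SDs are reported, the t value can be recomputed from them. The sketch below assumes the classic pooled-variance (Student) t-test, which the text does not name explicitly. With the reported values it yields t ≈ 2.58 on 57 degrees of freedom, consistent with the reported result; a Welch test would give a somewhat different statistic. The function name is hypothetical.

```python
import math


def pooled_t_from_stats(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                  # SE of the mean difference
    return (m1 - m2) / se, df


t, df = pooled_t_from_stats(15.37, 6.20, 19, 11.46, 5.06, 40)
print(round(t, 2), df)  # prints: 2.58 57
```

For the directional hypothesis, the one-tailed P value is the upper tail of the t distribution, e.g. `scipy.stats.t.sf(t, df)` if SciPy is available.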

As stated earlier, subscales of the ADHD–ODD scale were defined on the basis of content validity (items reflecting (1) ADHD symptoms, (2) attentional difficulties only, (3) hyperactive-impulsive symptoms only and (4) ODD symptoms). Correlations between the ODD subscale (4) and the ADHD subscales (1–3) ranged from 0.41 to 0.51. The correlation between attentional difficulties (2) and hyperactive-impulsive symptoms (3) was 0.46 (all coefficients significant at P < 0.01, two-tailed). These moderate correlations suggest that the subscales cover related but distinct constructs.

An exploratory factor analysis was performed. An initial PCA yielded eight components with eigenvalues greater than 1 (Kaiser criterion), accounting for 66% of the total variance. Based on Cattell’s scree test, a second analysis extracted three factors, which accounted for 40% of the total variance after rotation. Table 5 lists the items with their factor loadings.

Table 5 Factorial structure of the ADHD–ODD scale, loadings on factors 1–3

Factor 1 predominantly represents attentional difficulties; some hyperactive-impulsive and oppositional symptoms also load on this factor (being loud, unwillingness to wait, difficulties in anger control, arguing). Factor 2 is predominantly characterized by hyperactive-impulsive symptoms together with some inattentive and oppositional symptoms (not listening, not admitting misbehaviour). The core symptoms of ODD load on factor 3.

Discussion

Our results indicate high objectivity in the assessment of externalizing symptoms using the ADHD and ODD sections of the K-SADS: the inter-rater correlation was 0.98. Re-coding videotaped interviews is a common and accepted method for analysing inter-rater agreement, and its economy of time and effort justifies this strategy. However, it tends to overestimate objectivity, because it reflects only the objectivity of coding the statements made during one particular semi-structured exploration; differences in how two interviewers would conduct the exploration do not enter the estimate. A more conservative and more valid method would be a second exploration carried out by an independent interviewer, which would probably yield lower agreement between the two investigators. This should be kept in mind when interpreting our results and those of other studies reporting “excellent” objectivity.

With respect to other aspects of reliability (item difficulties, discriminative power of items, internal consistency), the psychometric properties of the ADHD–ODD scale were satisfactory. The low discriminative power of individual items would permit the construction of a shorter scale by eliminating them. However, a scale covering all DSM-IV criteria of ADHD and ODD allows alternative scoring strategies that may be useful (e.g. a dichotomous decision as to whether the criteria for ADHD or ODD are fulfilled at the time of investigation). Furthermore, the full score has higher face validity because it reflects the number of fulfilled criteria out of a comprehensive list of all diagnostic criteria.

The examined aspects of discriminant and convergent validity were also satisfactory. Given the high correlation of the ADHD–ODD scale with mother ratings of ADHD (FBB-HKS) and with the behavioural difficulties score of the SDQ, and given that the ADHD–ODD score relied mainly on the mothers’ statements, one might argue for using parent ratings to assess externalizing child problems and thus avoiding time-consuming interviews. This is clearly an option when time and effort are critical constraints in a given study protocol or diagnostic setting. However, the rather low agreement between different informants is a well-known phenomenon in the assessment of externalizing child behaviour, so multimodal assessment is warranted. The ADHD–ODD scale has the advantage of integrating mother statements, child statements and behavioural observations during the interview into the interviewer’s clinical judgement. This may be useful in clinical trials that require a single primary outcome criterion. Our study group is currently using the ADHD–ODD scale as the primary endpoint in a trial of combined treatment for children and mothers who are both affected by ADHD (Jans et al. 2008, 2009).

The psychometric properties justify the use of the scale as an instrument for assessing a broad spectrum of externalizing symptoms. Additionally, the subscales defined by content validity (criteria of inattention, hyperactivity/impulsivity, ODD) may be used for a more differentiated analysis, although our results point to rather low reliability of these subscales. Constructing subscales on the basis of factor analysis may be fruitful. Because of the small sample size, however, our results are preliminary in this regard and do not allow a well-founded scale construction.

A limitation of the study is that we investigated a referral sample of children and adolescents with ADHD. In all participants, ADHD had already been diagnosed, and most patients had already been treated before entering the study. Patients and parents were therefore familiar with diagnostic procedures, which may have helped them give appropriate answers during the interview and when completing the rating scales. Thus, the scale may perform slightly less well in an unselected sample. On the other hand, its psychometric properties may even be better in a population-based sample, because greater symptom variance allows higher correlations. A less restricted clinical sample would have allowed the investigation of other aspects of validity, e.g. the scale’s ability to discriminate between patients with ADHD and patients with other child psychiatric disorders. This question could not be addressed within our study design. Furthermore, data on the scale’s sensitivity to change are missing and must be obtained in further studies.