Introduction

As many as 50 % of youth with Autism Spectrum Disorder (ASD; which includes Autistic Disorder, Asperger’s Syndrome, and Pervasive Developmental Disorder-Not Otherwise Specified) also experience clinically significant anxiety (de Bruin et al. 2007; Leyfer et al. 2006; Sukhodolsky et al. 2008; van Steensel et al. 2011). Common comorbid anxiety disorders reported in children and adolescents with ASD include obsessive-compulsive disorder (OCD; 17–37 %), separation anxiety disorder (SAD; 9–38 %), specific phobia (26–57 %), social phobia (13–40 %), panic disorder (2–25 %), and generalized anxiety disorder (GAD; 15–35 %) (Leyfer et al. 2006; Simonoff et al. 2008; van Steensel et al. 2011; see White et al. 2009, for a review). Youth with ASD and clinical anxiety experience impairment above and beyond core ASD symptoms in school, home, and family functioning (Bellini 2004; Chamberlin et al. 2007; Kim et al. 2000; Lewin et al. 2011; Muris et al. 1998; Sukhodolsky et al. 2008) and are at an increased risk for peer rejection, depression, and loneliness (Attwood 2003; Bauminger and Kasari 2000; Kim et al. 2000; Storch et al. 2012a; Tantam 2003). Consequently, early identification of clinical anxiety symptoms is crucial in this population.

Despite the difficulties associated with diagnosing anxiety disorders in youth with ASD (e.g., separating subclinical anxiety symptoms from ASD symptoms given symptom overlap, lack of clarity in differential diagnosis, poor agreement among informants, lack of child insight, difficulty of the parent reporting on child internal states, cognitive and language limitations of the child; van Steensel et al. 2011; White et al. 2009; Wood and Gaddow 2010), few empirical studies have explored the psychometric properties of anxiety assessments in this population (see Nadeau et al. 2011). In particular, the Anxiety Disorders Interview Schedule for DSM-IV: Child and Parent Versions (ADIS-IV-C/P; Silverman and Albano 1996), a structured diagnostic measure with complementary parent and child interviews that has demonstrated utility in assessing youth with ASD (Grondhuis and Aman 2012), has received little psychometric evaluation in this population despite its frequent use (e.g., Reaven et al. 2011; Storch et al. 2013; Wood et al. 2009). Among typically developing children and adolescents, the ADIS-IV-C/P has generally demonstrated strong reliability across time (Silverman et al. 2001) and poor to strong agreement among informants (Choudhury et al. 2003; Grills and Ollendick 2003).

To date, four studies have investigated inter-rater agreement of the ADIS-III-C/P and ADIS-IV-C/P (Lyneham et al. 2007; Lyneham and Rapee 2005; Rapee et al. 1994; Silverman and Nelles 1988). Lyneham and colleagues (2007) examined the inter-rater agreement of the ADIS-IV-C/P by comparing clinician ratings of 153 typically developing youth aged 7 to 16 years performed face-to-face with parents and their children, and clinician ratings performed after viewing a videotape of the assessment. Inter-rater agreement ranged from good to excellent for principal diagnosis (kappa [k] ranging from .80 to 1.0), individual anxiety disorders (k ranging from .80 to 1.0), and comorbid disorders (k ranging from .65 to .77). Agreement for principal diagnosis and all anxiety disorders based solely on child information or solely on parent information ranged from good to excellent (k ranging from .72 to .94 and k ranging from .78 to .95, respectively). However, patterns of disagreement were noticed when clinicians tried to determine if GAD or social phobia was the principal diagnosis, which may reflect limitations of the ADIS-IV-C/P in separating GAD symptoms from those of social phobia.

Lyneham and Rapee (2005) examined the inter-rater agreement of the ADIS-IV-C/P by comparing clinician ratings of 73 typically developing youth aged 6 to 12 years performed face-to-face with parents and their children, and clinician ratings performed over the telephone. Inter-rater agreement was good to excellent for principal diagnosis (k = .86), individual anxiety disorders (k ranging from .63 to .86), and other disorders (k ranging from .79 to .91). Researchers concluded that telephone administration of the ADIS-IV-C/P was as reliable and valid as face-to-face administrations for determining the presence or non-presence of anxiety disorders in children, which suggests that formats other than face-to-face administration of the ADIS-IV-C/P may be reliable. Similar inter-rater agreement was found in older versions of the ADIS-C/P that corresponded to the DSM-III (e.g., Rapee et al. 1994; Silverman and Nelles 1988).

In addition to examining overall inter-rater reliability, we sought to examine if the child’s age (see Lyneham et al. 2007; Rapee et al. 1994; Storch et al. 2012b) and ASD diagnosis (see Muris et al. 1998; van Steensel et al. 2011) moderated agreement. In a meta-analysis performed on anxiety disorders in youth with ASD, van Steensel and colleagues (2011) identified 31 studies involving 2,121 youth (age <18 years) with ASD and found that anxiety disorders were more likely to be diagnosed in adolescents with ASD than in younger children with ASD. Older youth with ASD were more likely to report anxiety symptoms, suggesting that rates of anxiety disorders may increase with age or that youth are better able to report their anxiety symptoms as they age. Consequently, greater accuracy in the reports of older youth with ASD may result in better inter-rater agreement than with younger youth with ASD. These results have been supported by several other studies using a variety of anxiety measures in typically developing children (Edelbrock et al. 1985; Silverman and Eisen 1992; Rapee et al. 1994), although some studies have found no significant moderating effect of age on inter-rater agreement (e.g., Lyneham et al. 2007; Rapee et al. 1994). Autism spectrum disorder diagnosis is hypothesized to moderate agreement because of varying levels of deficits in communication (Buitelaar et al. 1999; Prior et al. 1998), abstract reasoning (de Bruin et al. 2006; Prior et al. 1998), and insight across ASD diagnoses (Gillott et al. 2001; Kim et al. 2000; Sukhodolsky et al. 2008). For example, inter-rater agreement may be better in youth with Asperger’s Syndrome, who may have greater insight into their anxiety symptoms and fewer communication deficits than youth with Autistic Disorder.

Moreover, comorbid disorders that occur across ASD diagnoses (e.g., oppositional defiant disorder, attention difficulties, depression; Muris et al. 1998; van Steensel et al. 2011) can impair clinicians’ ability to accurately diagnose anxiety disorders and to agree upon a diagnosis. By obscuring anxiety symptoms, hindering anxiety assessments, and, overall, making it more difficult for a clinician to retrieve relevant information, these comorbid conditions add to the difficulties clinicians already face when assessing anxiety in youth with ASD (e.g., cognitive and language limitations of the child).

To date, no study has investigated the inter-rater reliability of the ADIS-IV-C/P in children and adolescents with ASD despite its increasingly frequent use (e.g., Storch et al. 2013; Wood et al. 2009), prompting investigators to highlight the need to examine the psychometric properties of this measure in youth with ASD (Grondhuis and Aman 2012). Investigating the inter-rater reliability of a measure such as the ADIS-IV-C/P is essential for several reasons. First, inaccurate or incomplete assessment of a child’s anxiety symptoms can lead to inappropriate and ineffective treatment. For example, misclassifying repetitive behaviors and/or restricted interests as obsessive-compulsive symptoms may translate into incorrect treatment decisions. Second, reliable diagnoses are needed to ensure treatment specificity. Treatment aims and protocols can be tailored to address the child’s unique anxiety symptoms. If raters cannot sufficiently agree upon the presenting anxiety symptoms, an inaccurate treatment protocol may be used that does not target the child’s anxiety and comorbid conditions. Third, investigating the reliability of a measure allows researchers to explore the extent to which bias and other relevant factors may impact raters’ abilities to reach objective diagnoses (Groth-Marnet 2009; Gwet 2012). Lastly, inter-rater reliability is needed to properly screen research participants and match clinical characteristics of the participants to the appropriate treatment.

Given this, the purpose of this study was to examine the inter-rater agreement of anxiety and comorbid disorders as endorsed by the parent and child on the ADIS-IV-C/P. We had two primary aims. First, we examined the inter-rater agreement on the ADIS-IV-C/P with respect to principal diagnosis, individual anxiety disorders, and comorbid DSM-IV disorders as endorsed by the child and parent, and a clinician diagnosis. Second, we examined whether clinician inter-rater agreement on clinician diagnoses was moderated by child’s age and ASD diagnosis.

Method

Participants

Participants were 70 parents and their children (ages 7–16 years) with an autism spectrum disorder diagnosis, confirmed through the administration of the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 2000) and Autism Diagnostic Interview-Revised (ADI-R; Rutter et al. 2003). Participants were recruited through referrals, flyers, brochures and various organizations for one of four randomized controlled studies. These studies examined the efficacy of cognitive behavioral therapy for anxiety in youth with ASD. In the present study, participants were included if they met criteria for ASD, as assessed by the ADOS and ADI-R, met DSM-IV diagnostic criteria for a primary anxiety disorder (i.e., SAD, GAD, social phobia, OCD) and had a full scale IQ equal to or above 70. Participants were excluded if they met criteria for bipolar disorder, schizophrenia or schizoaffective disorder within the past 6 months, displayed clinically significant suicidality or engaged in suicidal behaviors within the last 6 months, recently initiated or increased psychiatric medication and/or had parents who were unwilling to accompany their children for multiple study visits.

Measures

Anxiety Disorders Interview Schedule for DSM-IV–Child and Parent Version (ADIS-IV-C/P)

The ADIS-IV-C/P (Silverman and Albano 1996) is a clinician-administered, structured interview used to assess the presence, severity and level of interference of anxiety disorders and common disorders in youth based upon the criteria set by the DSM-IV-TR (APA 2000). Parent and child were interviewed separately and a list of diagnoses endorsed by the parent and child was recorded. Clinician diagnosis was determined by the clinician after considering the disorders endorsed by parent and child. Severity ratings of each diagnosis ranged from 0 (not at all interfering) to 8 (very much interfering). Severity ratings greater than or equal to 4 indicated clinical significance. The principal diagnosis represented the most distressing/interfering anxiety disorder. The ADIS-IV-C/P has demonstrated strong psychometric properties in typically developing youth, including test-retest reliability (Silverman et al. 2001), inter-rater reliability (Silverman and Nelles 1988), and concurrent validity (Wood et al. 2002).

Autism Diagnostic Interview-Revised (ADI-R)

The Autism Diagnostic Interview-Revised (ADI-R) (Rutter et al. 2003) is a standardized semi-structured clinical diagnostic interview for assessing ASD in children and adults based on the diagnostic criteria for autism in the DSM-IV-TR (APA 2000). The ADI-R focuses on behaviors in the three content areas or domains often displayed by children and adults with ASD: quality of social interaction, communication and language, and repetitive, restricted and stereotyped interests and behaviors (Rutter et al. 2003). The ADI-R has demonstrated strong psychometric properties, including test-retest reliability, inter-rater reliability, and discriminant validity (Lord et al. 1994).

Autism Diagnostic Observation Schedule (ADOS) - Module 3

The ADOS–Module 3 is a structured observation assessment used to elicit atypical language use, social interaction, and stereotyped behaviors of individuals suspected of having ASD (Lord et al. 2000). The ADOS has demonstrated strong psychometric properties, including test-retest reliability, inter-rater reliability, and discriminant validity (Lord et al. 1999; Lord et al. 2000).

Interviewers

The original interviewers who audio-recorded their ADIS-IV-C/P interviews were research assistants with prior experience working with youth with ASD and anxiety. They were trained to reliably administer the ADIS-IV-C/P. Training of the original interviewers involved didactic trainings, in vivo observation, coding audiotaped assessments, and weekly meetings with a licensed psychologist. The original interviewers had achieved an inter-rater agreement of 80 % or above on training tapes. Parent and child interviews were conducted by the same interviewer. A second interviewer who was blind to the exact age and ASD diagnosis of the child and was trained to administer the ADIS-IV-C/P (i.e., an interviewer who had observed and rated multiple ADIS-IV-C/P under the supervision of a qualified and reliable interviewer of that ADIS-IV-C/P, as specified above, and had achieved an inter-rater agreement of 80 % or above on all ADIS-IV-C/P observed) was used to establish inter-rater agreement.

Procedures

At the initial study visit, written parent consent and child assent were obtained and the parent and their child were administered a series of measures by trained clinicians, including the ADIS-IV-C/P. In all clinical studies, parents consented and children assented to the audio recording of measures and to the use of the recordings in research. Full ADIS-IV-C/P modules were administered to the child and parent separately. After completing the interview, the rater assigned diagnoses based upon parent and child interviews. The second rater listened to the audiotapes of previous ADIS-IV-C/P taken at the screen visit and scored the ADIS-IV-C/P based upon these recordings. The order in which the parent and child recordings were rated was randomized. Parent, child and clinician diagnoses were compared to assess inter-rater agreement. All studies were approved by the local institutional review board.

Data Analysis

Cohen’s Kappa (Cohen 1960) was calculated to evaluate agreement for each individual anxiety and comorbid diagnosis. Kappa values for inter-rater agreement were calculated for severity ratings that were 4 or greater, which signified an endorsement of an anxiety diagnosis and/or comorbid diagnosis per ADIS-IV-C/P criteria. A 2 × 2 Cohen Kappa table was used to calculate a kappa coefficient for each individual anxiety and comorbid diagnosis. Per study inclusion criteria, generalized anxiety disorder, social phobia, separation anxiety disorder, and obsessive-compulsive disorder were the anxiety diagnoses that could be chosen to represent the principal diagnosis. Consequently, a 4 (anxiety diagnosis) × 2 (raters) Cohen’s Kappa table was used to calculate a kappa coefficient for principal diagnosis. The 95 % confidence intervals for Cohen’s Kappa coefficients were calculated using the formula provided by Blackman and Koval (2000). The following guidelines set by Mannuzza et al. (1989) were used to interpret kappa values: values less than 0.40 are considered poor agreement, values 0.40–0.60 are considered fair agreement, values 0.60–0.74 are considered good agreement, and values greater than 0.74 are considered excellent agreement.
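
As an illustration of the calculations described above, the following Python sketch converts severity ratings into binary endorsements, computes Cohen’s Kappa for two raters, and maps the coefficient onto the agreement bands of Mannuzza et al. (1989). It is a minimal sketch and not the analysis code used in this study. The function names are hypothetical, the confidence interval uses a simple large-sample standard error rather than the Blackman and Koval (2000) formula, and treating a value of exactly 0.60 as good agreement is an assumption, because the published bands overlap at that boundary.

```python
from collections import Counter
from math import sqrt

CLINICAL_CUTOFF = 4  # severity ratings of 4 or greater count as an endorsed diagnosis


def is_endorsed(severity):
    """Convert a 0-8 severity rating into a binary diagnostic decision."""
    return severity >= CLINICAL_CUTOFF


def agreement_proportions(rater_a, rater_b):
    """Return (n, observed agreement, chance agreement) for two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' marginals.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return n, p_o, p_e


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for binary or multi-category labels."""
    _, p_o, p_e = agreement_proportions(rater_a, rater_b)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


def kappa_confidence_interval(rater_a, rater_b, z=1.96):
    """Approximate 95 % CI using a simple large-sample standard error.

    Assumption: this is NOT the Blackman and Koval (2000) formula.
    """
    n, p_o, p_e = agreement_proportions(rater_a, rater_b)
    kappa = cohen_kappa(rater_a, rater_b)
    se = sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return max(-1.0, kappa - z * se), min(1.0, kappa + z * se)


def interpret_kappa(kappa):
    """Map kappa onto the Mannuzza et al. (1989) bands (0.60 treated as good)."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa <= 0.74:
        return "good"
    return "excellent"
```

For an individual diagnosis, each rater’s severity ratings would first be passed through `is_endorsed`, which yields the two columns of a 2 × 2 table. For principal diagnosis, the diagnostic labels assigned by each rater can be passed to `cohen_kappa` directly.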

Participants were split into two groups, the child group (aged 7–11 years, n = 41) and the adolescent group (aged 12–16 years, n = 29), to investigate whether age was a moderator of inter-rater agreement. Participants were split into three groups, youth with Autistic Disorder, youth with Asperger’s Syndrome, and youth with PDD-NOS, to investigate whether ASD diagnosis was a moderator of inter-rater agreement. To determine moderator effects, the kappa coefficient (i.e., the point estimate) for each individual diagnosis in one group was examined to determine whether it fell within the confidence interval for the same diagnosis in the other group(s) of the moderator.
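
The moderator check described above can be sketched in Python as follows. This is one possible reading of the procedure, offered under stated assumptions: the names `KappaEstimate` and `shows_moderation` are hypothetical, and the rule that a moderator effect is flagged whenever any group’s point estimate falls outside another group’s confidence interval is an interpretation of the text, not a documented algorithm. The numbers in the usage note are illustrative only and are not study data.

```python
from typing import Mapping, NamedTuple


class KappaEstimate(NamedTuple):
    """Kappa point estimate with its confidence interval for one group."""
    kappa: float
    ci_low: float
    ci_high: float


def falls_within(estimate: KappaEstimate, other: KappaEstimate) -> bool:
    """True if estimate's kappa lies inside the other group's interval."""
    return other.ci_low <= estimate.kappa <= other.ci_high


def shows_moderation(groups: Mapping[str, KappaEstimate]) -> bool:
    """Flag a moderator effect for one diagnosis.

    Assumed rule: moderation is flagged if any group's point estimate
    falls outside the confidence interval of any other group.
    """
    names = list(groups)
    for first in names:
        for second in names:
            if first != second and not falls_within(groups[first], groups[second]):
                return True
    return False
```

With illustrative values, `shows_moderation({"child": KappaEstimate(0.80, 0.60, 1.00), "adolescent": KappaEstimate(0.90, 0.75, 1.00)})` evaluates to `False`, because each estimate lies inside the other group’s interval. The same function accepts three groups, as would be needed for the ASD diagnosis comparison.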

Results

Sample

Of the 70 participants, consisting mainly of male participants (n = 51), 23 participants were diagnosed with Autistic Disorder, 32 participants were diagnosed with Asperger’s Syndrome, and 15 participants were diagnosed with PDD-NOS. The mean age of the sample was 11 years (SD = 2.26 years). Demographics and diagnostic characteristics of study participants are presented in Table 1. Demographic characteristics were not significantly different within ASD child studies and ASD adolescent studies from which the participants were recruited.

Table 1 Demographic and diagnostic characteristics of the sample

Agreement on Principal Diagnosis, Individual Anxiety Disorders and Comorbid DSM-IV Diagnoses

The kappa coefficient for inter-rater agreement on principal diagnosis was 0.91, which signified excellent agreement.

Kappa coefficients for inter-rater agreement on parent and child ratings for individual anxiety disorders and comorbid disorders were 1.00, which signified excellent agreement. No disagreements were found in clinician-to-clinician ratings of parent and child ratings. Inter-rater agreement on individual anxiety disorders was excellent (k = 0.85–1.00). Inter-rater agreement on mood disorders and externalizing disorders was excellent (k = 0.89–1.00). Kappa coefficients for individual anxiety disorders and other comorbid DSM-IV diagnoses are presented in Table 2.

Table 2 Kappa coefficients for inter-rater agreement on parent and child ADIS and clinician diagnoses

Moderators of Inter-rater Agreement

Age

Age was not found to be a significant moderator of inter-rater agreement. However, inter-rater agreement varied more in the adolescent group than in the child group, ranging from good to excellent (k = 0.73–1.00) in the adolescent group versus excellent (k = 0.90–1.00) in the child group. In the adolescent group, good agreement was found on SAD (k = 0.73) while all other anxiety disorders and comorbid DSM-IV diagnoses had excellent agreement (k = 0.83–1.00). Excellent agreement was found across age groups on principal diagnosis (child group: k = 0.88, adolescent group: k = 0.94). See Table 3 for inter-rater agreement on individual diagnoses using age as a moderator.

Table 3 Kappa coefficients for inter-rater agreement on clinician diagnoses by age and ASD diagnosis

ASD Diagnosis

ASD diagnosis was not found to be a significant moderator of inter-rater agreement. Excellent agreement on individual anxiety diagnoses and comorbid DSM-IV diagnoses was found within each ASD diagnosis, with the exception of good agreement for SAD in youth with PDD-NOS. Kappa coefficients ranged from 0.81 to 1.00 in the Autistic Disorder group (excellent agreement), from 0.93 to 1.00 in the Asperger’s Syndrome group (excellent agreement), and from 0.73 to 1.00 in the PDD-NOS group (good to excellent agreement). Excellent agreement was found across ASD diagnoses on principal diagnosis (Autistic Disorder: k = 0.93, Asperger’s Syndrome: k = 0.85, PDD-NOS: k = 1.00). See Table 3 for inter-rater agreement on individual diagnoses using ASD diagnosis as a moderator.

Discussion

The present study examined the inter-rater reliability of the ADIS-IV-C/P in high-functioning youth with ASD. Clinician inter-rater agreement on principal diagnosis (k = 0.91) was excellent. As others have reported (e.g., Lyneham et al. 2007), modest discrepancies were noticed in clinician ratings when deciding whether social phobia or GAD was the principal diagnosis. One possible explanation is that the overlap in the diagnostic criteria of specific anxiety disorders may contribute to inter-rater disagreements on the principal diagnosis (Lyneham et al. 2007). For example, clinicians may disagree about whether social phobia stands alone as the primary diagnosis or is subsumed under GAD in a child with ASD. Clinician disagreements underscore the notion that anxiety in youth with and without ASD is a dimensional construct that cannot always be easily mapped onto a categorical system of classification, as specified by the DSM-IV on which the ADIS-IV-C/P is grounded.

Clinician agreement on the presence of individual anxiety diagnoses (k = 0.85–0.97) and other comorbid diagnoses (k = 0.89–1.00) was excellent but not perfect. Although agreement was strong, disagreement may arise due to a number of variables. Clinicians may differ in how they determine whether an anxiety or comorbid disorder is clinically significant enough to warrant a diagnosis (e.g., behavioral observations, physiological manifestations, number of presented symptoms to meet DSM-IV criteria), which can contribute to inter-rater disagreement. Moreover, features that characterize ASD symptomology and/or are commonly observed in youth with ASD (e.g., communication and cognitive deficits, difficulty interpreting and understanding emotions) may restrict the youth’s level of insight and ability to reliably convey his/her emotional states, thus leading to inter-rater disagreement. Our finding of excellent agreement across anxiety and comorbid diagnoses suggests that agreement was not substantially affected by these clinician, child, and ASD variables.

Inter-rater agreement on parent and child ratings was excellent, suggesting that information gathered from parent and child interviews can be reliably captured by two separate clinicians. The interview structure of the ADIS-IV-C/P allows for a clear and direct report of parent and child ratings of the severity and level of interference of individual anxiety and comorbid disorders (e.g., clear cut-off severity score to meet diagnosis). Consequently, raters are more likely to agree that the parent and the child reported an individual anxiety or comorbid condition to be clinically significant.

Consistent with previous findings (e.g., Lyneham et al. 2007; Rapee et al. 1994; see Storch et al. 2012b for exceptions), age and ASD diagnosis were not significant moderators of clinician inter-rater agreement. Overall, excellent agreement was found across age groups and ASD diagnoses. Inter-rater agreement did not vary across ASD diagnoses, indicating that ASD diagnosis does not significantly impact rater agreement. As a spectrum disorder, youth with ASD can vary on the frequency and severity of ASD symptomology and may not be best categorized as belonging to one category versus another. Although not statistically significant, inter-rater agreement varied more in the adolescent group (ages 12–16 years) than in the child group (ages 7–11 years). One possible explanation is that because children may be less reliable at reporting their anxiety symptoms, clinicians may rely more heavily on parents’ report. In contrast, adolescents may be better reporters of their anxiety, which consequently presents more information for clinicians to consider. With more information available, clinicians may be more likely to differ on what information they use to decide the presence or absence of a disorder, resulting in greater inter-rater disagreement.

Several study limitations should be noted. First, due to the use of archival tapes, a second face-to-face ADIS-IV-C/P interview could not be performed to obtain inter-rater reliability. Whereas the present findings suggest strong inter-rater agreement of a single interview, it is difficult to speculate on consistency in anxiety diagnoses across two independent interviews, especially when separated by time. Additionally, face-to-face interviews may provide clinicians with further details about anxiety symptoms and the reliability of parent and child reports. For example, facial or body cues such as expressions of boredom or a need to quickly finish the assessment may inform clinicians about the reliability of parent and child reports. Second, the small sample size in some groups, limited statistical power, and robust overall agreement may explain why age group and ASD diagnosis were not found to be significant moderators of inter-rater agreement. The overlap in confidence intervals across groups signified that age and ASD diagnosis did not significantly impact inter-rater agreement. Third, assessments were limited to structured diagnostic interviews. Inclusion of a flexible, expert clinician interview can strengthen accuracy of diagnostic impressions (Lewin and Piacentini 2010). Finally, a majority of the sample were Caucasians with ASD, limiting the generalizability of the results.

The present study is the first to support the inter-rater reliability of the ADIS-IV-C/P in youth with ASD. Study results have several clinical implications. First, developing accurate treatment goals and treatment plans without a reliable case conceptualization of the child is not possible (Cormier et al. 2009). Inaccurate or incomplete assessment of a child’s anxiety symptoms can lead to an inappropriate and ineffective treatment (e.g., King et al. 2009). For example, a clinician who mistakes restricted interests and repetitive behaviors for OCD symptoms may administer a treatment protocol that is inappropriate for the youth or does not follow the recommended treatment. Lastly, researchers must be able to reliably assess symptoms in youth with ASD to enroll appropriate participants for their studies and to assign treatments that appropriately match each child’s clinical characteristics. For example, if a youth with ASD can be reliably diagnosed with an anxiety disorder and a comorbid diagnosis such as oppositional defiant disorder, treatment targeted at reducing problematic behaviors prior to or in conjunction with the anxiety treatment may maximize treatment efficacy by removing treatment barriers associated with comorbid conditions (e.g., low motivation and low homework compliance) (Storch et al. 2008). Consequently, understanding the functionality and impairments associated with anxiety and comorbid conditions in youth with ASD through reliable assessments and matching patient characteristics to certain interventions is critical to the success of individualized treatments for youth with ASD and anxiety (Wood et al. 2009).