Introduction

Best practice in diagnosing autism spectrum disorders (ASD) requires a multimethod approach that includes observation of the child, caregiver interview, assessment of developmental levels, a detailed developmental history, and screening for associated disorders such as Fragile X (Filipek et al., 2000). Over the past decade, several ASD screening and diagnostic instruments have been developed based on the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition criteria (DSM-IV; American Psychiatric Association, 1994). These include the Checklist for Autism in Toddlers (CHAT; Baron-Cohen et al., 1996), the Modified Checklist for Autism in Toddlers (M-CHAT; Robins, Fein, Barton, & Green, 2001), the Gilliam Autism Rating Scale (GARS; Gilliam, 1995), the Social Communication Questionnaire (SCQ; Rutter, Bailey, & Lord, 2003), the Autism Diagnostic Observation Schedule (ADOS; Lord, Rutter, DiLavore, & Risi, 2002), and the Autism Diagnostic Interview-Revised (ADI-R; Rutter, LeCouteur, & Lord, 2003). Some of these instruments have received more psychometric investigation than others.

The Childhood Autism Rating Scale (CARS; Schopler, Reichler, DeVellis, & Daly, 1980; Schopler, Reichler, & Renner, 1988) was developed to differentiate autism from other developmental disorders. Its development predates the current conceptualization of autism as a spectrum of disorders, as reflected in the International Classification of Diseases-10th Edition (ICD-10; World Health Organization, 1992) and the DSM-IV. Schopler et al. (1980) developed items based on typical child development and on the diagnostic criteria sets of Kanner (1943), Creak (1961), Rutter (1978), and Ritvo and Freeman (1978). The disorder’s long-recognized primary features include qualitative impairments in reciprocal social interaction and in language and communication, together with early onset. At the time the CARS was developed, investigators differed as to whether insistence on sameness and stereotyped behaviors (Rutter, 1978) and sensory peculiarities (Ritvo & Freeman, 1978) were also primary features. CARS items assess all of these features and yield a single Total Score, suggesting that the instrument measures a unitary construct. This is consistent with the then-current DSM-III (APA, 1980) classification of Infantile Autism, the only specified pervasive developmental disorder.

The CARS is a 15-item paper-and-pencil measure that quantifies the severity of behaviors associated with autism. Items are rated on a scale from 1 (“normal”) to 4 (“severely abnormal”), with midpoint ratings (e.g., 2.5) allowed, so Total Scores range from 15 to 60. Total Scores at or above 30 strongly suggest the presence of autism: scores from 30 to 36.5 indicate a mild symptom presentation, and scores at or above 37 indicate moderate to severe autism. The CARS requires relatively little training to administer and is widely used in the assessment of individuals suspected of having autism (Saemundsen, Magnusson, Smari, & Sigurdardottir, 2003). CARS items may be scored from direct behavioral observations in various settings, interview data, and/or chart review (Schopler et al., 1988), and data indicate that CARS scores and classifications are consistent across these assessment methods (Schopler et al., 1988). Although the CARS predates the DSM-IV and many new DSM-IV-based tools have become available, its continued widespread use makes it necessary to assess its diagnostic and research utility.
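The scoring arithmetic described above can be sketched in Python. This is a minimal illustration, not the published scoring procedure: the helper names `cars_total` and `cars_band` are hypothetical, the item labels follow the names used in this article, and the band labels paraphrase the cut-offs given in the text.

```python
# Hypothetical helpers illustrating CARS Total Score arithmetic.
# Assumptions: ratings run from 1 to 4 in half-point steps; band labels
# paraphrase the cut-offs described in the text.

CARS_ITEMS = (
    "Relating to People",
    "Imitation",
    "Emotional Response",
    "Body Use",
    "Object Use",
    "Adaptation to Change",
    "Visual Response",
    "Listening Response",
    "Taste, Smell, Touch Response and Use",
    "Fear or Nervousness",
    "Verbal Communication",
    "Nonverbal Communication",
    "Activity Level",
    "Level and Consistency of Intellectual Response",
    "General Impressions",
)


def cars_total(ratings: dict) -> float:
    """Sum the 15 item ratings after validating item names and values."""
    if set(ratings) != set(CARS_ITEMS):
        raise ValueError("expected exactly the 15 CARS items")
    for item, value in ratings.items():
        # Valid values are 1.0, 1.5, ..., 4.0 (doubling gives a whole number).
        if not 1.0 <= value <= 4.0 or (value * 2) % 1 != 0:
            raise ValueError(f"{item}: rating must be 1-4 in half-point steps")
    return float(sum(ratings.values()))


def cars_band(total: float) -> str:
    """Map a Total Score (15-60) onto the bands described in the text."""
    if total < 30:
        return "below autism range"
    if total < 37:  # 30 to 36.5
        return "mild"
    return "moderate to severe"
```

For example, a child rated 2 on every item receives a Total Score of 15 × 2 = 30, the lowest score in the autism range, while a total of 29.5 falls just below it.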

Psychometrics

Reliability

Overall, the literature supports the CARS’ reliability. Most studies report acceptable internal consistency, with alpha coefficients often at or above .90 (Nordin, Gillberg, & Nydin, 1998; Saemundsen et al., 2003; Schopler et al., 1980) and of .85 in one sample (Sturmey, Matson, & Sevin, 1992). Only one study reported an alpha as low as .73 (Garfin, McCallon, & Cox, 1988). Interrater agreement data are less favorable, with Pearson correlations for the Total Score at or below .71 (Schopler et al., 1980; Sevin, Matson, Coe, Fee, & Sevin, 1991) and kappas at or below .40 (Nordin et al., 1998; Sponheim & Spurkland, 1996). Schopler et al. (1988) reported a test-retest coefficient of .88 for a child sample evaluated twice within one year. Mesibov (1988) reported statistically significant changes in 10 of the 15 item scores for adolescents assessed twice approximately 4 years apart and suggested lowering the Total Score classification cut-off to 27 for this group.
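Coefficient alpha, the internal consistency index reported in these studies, compares the sum of the item variances with the variance of the total score. A minimal NumPy sketch, assuming a complete respondents-by-items matrix with no missing ratings; `cronbach_alpha` is a hypothetical helper name:

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Coefficient alpha for a respondents x items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Assumes complete data (no missing ratings) and at least two items.
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # sample variance of total scores
    return float(k / (k - 1) * (1.0 - item_variances.sum() / total_variance))
```

Applied to all 15 CARS item columns it gives the Total Score alpha; applied to a subset of columns it gives the alpha of a factor-based subscale.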

Validity

Several studies have reported data attesting to the CARS’ validity. Data generally support its ability to discriminate between autistic and non-autistic samples (Eaves & Milner, 1993; Garfin et al., 1988; Mesibov, 1988; Sevin et al., 1991; Sponheim, 1996; Teal & Wiebe, 1986). Data are mixed as to whether the CARS can discriminate among the ASDs (cf. Nordin et al., 1998; Sponheim, 1996). The instrument correlates with clinician ratings and with clinical classifications based on the DSM-III-R (American Psychiatric Association, 1987), DSM-IV, and ICD-10 (cf. Schopler et al., 1980; Sponheim, 1996; Van Bourgondien, Marcus, & Schopler, 1992). Two studies examined the relationship between the CARS and the ADI-R. Pilowsky, Yirmiya, Shulman, and Dover (1998) reported 91.8% agreement in diagnostic classification for positive cases of autism, 44.4% agreement for negative cases, and an overall kappa of .36. Saemundsen et al. (2003) reported strong relationships between the CARS and ADI-R subscales. Most diagnostic classification studies support the CARS’ utility in diagnostic decision-making.
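Cohen’s kappa corrects observed agreement for the agreement expected by chance from each source’s marginal classification rates, which is why high percent agreement on positive cases can coexist with a modest kappa, as in Pilowsky et al. (1998). A minimal sketch for two sets of categorical classifications (for example, autism versus non-autism from the CARS and from the ADI-R); the function name and inputs are illustrative, and no data from that study are reproduced here:

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b) -> float:
    """Chance-corrected agreement between two sets of categorical classifications."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two equally long, non-empty sequences")
    n = len(ratings_a)
    # Observed proportion of cases classified identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each source's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

With four cases classified [1, 1, 0, 0] by one source and [1, 0, 0, 0] by the other, observed agreement is .75 but chance agreement is .50, so kappa is only .50.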

Only two studies have investigated the underlying factor structure of the CARS. DiLalla and Rogers (1994) evaluated children (N = 69) diagnosed with Autistic Disorder, PDD-NOS, or other developmental disorders (n = 18). CARS data were collected through direct observation of the children interacting with a familiar adult in semi-structured, play-based activities. Data were entered into a principal components analysis (PCA) with oblique (Direct Oblimin) rotation. The rationale for the nonorthogonal rotation appeared to be the authors’ a priori conceptualization of the constructs as correlated. The Kaiser (1960) criterion and the scree test (Cattell, 1966) indicated three components accounting for 64% of the total variance.
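The Kaiser criterion retains components whose eigenvalues exceed 1, and because the eigenvalues of a p-item correlation matrix sum to p, each eigenvalue divided by p gives the proportion of total variance its component explains. A minimal NumPy sketch under those definitions; the scree test is a visual judgment and is not automated here, and `kaiser_components` is a hypothetical name:

```python
import numpy as np


def kaiser_components(corr):
    """Return (number retained, eigenvalues, % of total variance per component)."""
    R = np.asarray(corr, dtype=float)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    # For a correlation matrix the eigenvalues sum to the number of items.
    pct_variance = 100.0 * eigenvalues / eigenvalues.sum()
    n_retained = int((eigenvalues > 1.0).sum())
    return n_retained, eigenvalues, pct_variance
```

For a toy matrix with two independent blocks of three items intercorrelated at .60, the eigenvalues are 2.2, 2.2, and four values of .4, so the criterion keeps two components that together explain about 73% of the variance.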

The components were labeled Social Impairment, Negative Emotionality, and Distorted Sensory Response. Social Impairment emerged as the largest component, accounting for 52% of the total variance; Negative Emotionality (9%) and Distorted Sensory Response (8%) accounted for far less. Items with loadings at or above .40 were used to form factor-based scales (DiLalla & Rogers, 1994). The authors did not specify whether the reported loadings were coefficients from the structure matrix (zero-order correlations with the component) or from the pattern matrix (analogous to partial regression weights).
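For oblique solutions, the two matrices are linked by the factor correlation matrix Φ: structure = pattern × Φ. When components are uncorrelated (Φ = I) they coincide, so the distinction matters only for oblique rotations such as Direct Oblimin. A minimal NumPy sketch with made-up loadings, not values from either study; `structure_from_pattern` is a hypothetical name:

```python
import numpy as np


def structure_from_pattern(pattern, phi):
    """Structure coefficients (item-factor correlations) from an oblique solution.

    pattern: items x factors pattern matrix (regression-like weights)
    phi:     factors x factors factor correlation matrix
    """
    return np.asarray(pattern, dtype=float) @ np.asarray(phi, dtype=float)


# Hypothetical two-item, two-factor example (not data from either study).
pattern = np.array([[0.70, 0.00],
                    [0.00, 0.60]])
phi = np.array([[1.00, 0.50],
                [0.50, 1.00]])
print(structure_from_pattern(pattern, phi))
```

With factors correlated at .50, each item has a nonzero structure coefficient (.35 and .30) on the factor where its pattern loading is zero, so a fixed cut-off such as .40 can assign items differently depending on which matrix is reported.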

Stella, Mundy, and Tuchman (1999) analyzed archival CARS data from a pediatric neurology clinic. Well-trained clinicians conducted the assessments, and CARS scores were based on direct observations and parent interview. CARS protocols were included in the study only for subjects whose CARS classification agreed with a DSM-III-R diagnosis of autism. The PCA was based on 90 protocols: 66 from children with autism and 24 from children diagnosed with PDD-NOS.

The authors reported a method similar to that of DiLalla and Rogers (1994) but did not state their component extraction rules. They reported both Varimax and an unspecified oblique rotation of the extracted components. Varimax rotation reduced the number of CARS items that loaded on multiple components (Stella et al., 1999), and the final (Varimax) solution represented uncorrelated components. This solution indicated five components accounting for 64% of the variance: Social Communication, Emotional Reactivity, Social Orienting, Cognitive and Behavioral Consistency, and Odd Sensory Exploration. Factor-based scales consisted of items loading at or above .45 on a component. The authors discussed the potential use of CARS factor-based subscales to describe individual differences among children diagnosed within the autism spectrum.

Table 1 presents components obtained in the DiLalla and Rogers (1994) and Stella et al. (1999) studies. The studies differ with respect to both the number and content of the components; however, both studies identified social-communication, emotional, and sensory components. These components reflect the DSM-IV core diagnostic and associated sensory features of autism.

Table 1 Principal components obtained by DiLalla and Rogers (1994) and Stella et al. (1999)

Present Study

The present study seeks to replicate the components identified in the previous studies and to further investigate the CARS’ underlying factor structure. The previous principal components studies support the CARS’ ability to assess multiple constructs. This study replicates the PCA procedures of DiLalla and Rogers (1994) and Stella et al. (1999) and adds a second set of factor analytic procedures to help identify the CARS’ internal structure. Unlike PCA, which analyzes the total variance of the items, principal axis factor analysis analyzes only the variance the CARS items share (common variance). Relative to the previous studies, the dataset analyzed here comes from a larger sample of children diagnosed with an ASD. Larger samples can yield more stable factor analytic solutions, thereby increasing confidence in the validity of the identified constructs. A replicable multidimensional scale would support assessment of relative impairments across reciprocal social interaction, communication, and stereotyped and repetitive behaviors. This would further assist clinicians with differential diagnosis within the autism spectrum and with developing intervention plans tailored to individual symptom profiles. Thus, results from this study will inform the relative utility of the CARS in the context of newer screening and diagnostic measures for autism.
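In matrix terms, and in our own notation rather than the authors’, the two methods differ in which matrix is factored. Here $\mathbf{R}$ is the $p \times p$ item correlation matrix ($p = 15$ for the CARS), $\boldsymbol{\Lambda}$ holds its eigenvalues, $\mathbf{V}_m$ and $\boldsymbol{\Lambda}_m$ are the first $m$ eigenvectors and eigenvalues, $\mathbf{A}$ is the loading matrix, and $h_j^2$ is the communality of item $j$, commonly initialized with the squared multiple correlation and then iterated:

```latex
\begin{aligned}
\text{PCA:}\quad & \mathbf{R} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\top},
  \qquad \mathbf{A} = \mathbf{V}_m \boldsymbol{\Lambda}_m^{1/2},
  \qquad \operatorname{tr}(\mathbf{R}) = p \\
\text{PAF:}\quad & \mathbf{R}_h = \mathbf{R} - \boldsymbol{\Psi},
  \qquad \boldsymbol{\Psi} = \operatorname{diag}\!\left(1 - h_1^2, \dots, 1 - h_p^2\right),
  \qquad h_j^{2\,(0)} = 1 - \frac{1}{(\mathbf{R}^{-1})_{jj}}
\end{aligned}
```

PCA keeps the unit diagonal of $\mathbf{R}$, so each component absorbs item-specific and error variance as well as shared variance; PAF replaces the diagonal with communalities, so its factors account only for variance the items have in common.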

Method

Database

Archive review identified CARS protocols completed within a developmental evaluation clinic of a midsized medical center in Western New York between 1995 and 2002. CARS protocols were administered by a developmental pediatrician or a licensed psychologist with expertise in diagnosing ASDs according to DSM-IV and ICD-10 criteria. Items were scored based on direct observations, caretaker report, and chart review. CARS data were one source of diagnostic information used in the developmental evaluations. Protocols from children diagnosed with an ASD (Autistic Disorder, PDD-NOS, and two “rule out” ASD cases) were included in the analyses (N = 164). Of these, 23.8% (39 children) obtained CARS Total Scores at or below 29.5, that is, below the autism cut-off of 30. These protocols were retained to capture the variation in CARS-assessed symptoms among children diagnosed according to contemporary criteria.

Table 2 presents sample demographic data. The sample consisted primarily of toddlers and preschool-age children. The male-to-female ratio of 4.65:1 is generally consistent with prevalence data indicating that children with ASDs are disproportionately male (American Psychiatric Association, 1994). The sample was predominantly white, with approximately 18% from other racial/ethnic backgrounds, which generally reflects the racial/ethnic make-up of the clinic’s catchment area. Although cognitive functioning was not reported in nearly 16% of the archival records, the percentage of children with significant cognitive delays is consistent with data reviewed by Klinger, Dawson, and Renner (2003). The mean CARS Total Score (34.40, SD = 5.72) is above the recommended cut-off of 30 for strong consideration of an ASD diagnosis.

Table 2 Subject demographics (N = 164)

Procedure

Principal Components Analysis

Replication of the DiLalla and Rogers (1994) and Stella et al. (1999) procedures involved principal components analyses (PCA) with SPSS 13.0 (SPSS Inc., 2004). Extracted components were rotated orthogonally and obliquely. The Kaiser (1960) criterion and scree test guided component extraction, and rotations included Varimax and Direct Oblimin procedures. Items with loadings on components at or above .40 were retained. This value is considered significant by many researchers (Pedhazur & Schmelkin, 1991) and is consistent with DiLalla and Rogers. PCA results were compared to the previous studies’ findings.
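The .40 assignment rule, together with a check for items that load on several components or on none, can be sketched as follows. The helper name, item names, and loadings are placeholders, not results from this study:

```python
import numpy as np


def assign_items(loadings, item_names, threshold=0.40):
    """Assign each item to every component on which |loading| >= threshold.

    Returns (assignments, cross_loading_items, unassigned_items).
    """
    L = np.asarray(loadings, dtype=float)
    assignments, cross_loading, unassigned = {}, [], []
    for name, row in zip(item_names, L):
        hits = [j for j, value in enumerate(row) if abs(value) >= threshold]
        assignments[name] = hits
        if len(hits) > 1:
            cross_loading.append(name)  # loads on several components
        elif not hits:
            unassigned.append(name)     # loads on none at this cut-off
    return assignments, cross_loading, unassigned


# Placeholder loadings for three hypothetical items on two components.
loadings = [[0.72, 0.10],
            [0.45, 0.42],
            [0.20, 0.15]]
print(assign_items(loadings, ["item_a", "item_b", "item_c"]))
```

Counting cross-loading items in this way is one simple operational reading of parsimony when comparing rotations; raising the threshold to .45, as Stella et al. (1999) did, shrinks both the assignments and the cross-loading list.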

Principal Axis Factor Analysis

Principal axis factor analysis (PAF) procedures were performed subsequent to PCA. As noted above, PAF analyzes common variance among the CARS items. Several extraction and rotation procedures were performed to help determine the most parsimonious and conceptually meaningful accounting of the CARS’ underlying structure. Extraction procedures consisted of: (a) use of the Kaiser (1960) and scree test criteria; and (b) specification of three- and five-factor solutions. Factors were rotated orthogonally (Varimax and Quartimax) and obliquely (Promax and Direct Oblimin). CARS items were assigned to factors on which their loadings were at or above .40, in the structure matrix (orthogonal rotations) or in the pattern matrix (oblique rotations). Table 3 presents the item correlation matrix analyzed through PCA and PAF.
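Iterated principal axis factoring can be sketched in NumPy: start from squared multiple correlations as communality estimates, place them on the diagonal of the correlation matrix, extract the leading eigenvectors, recompute communalities from the loadings, and repeat until they stabilize. This is a generic textbook version assumed for illustration, not SPSS’s implementation, and `principal_axis` is a hypothetical name; it does not guard against Heywood cases (communalities at or above 1):

```python
import numpy as np


def principal_axis(corr, n_factors, max_iter=1000, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix.

    Returns (unrotated loadings, final communalities). Heywood cases
    (communalities reaching 1) are not detected.
    """
    R = np.asarray(corr, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1 / diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # keep common variance only
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)  # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2
```

The unrotated loadings returned here would then be passed to an orthogonal or oblique rotation. For a correlation matrix generated exactly by one factor, the iteration recovers that factor’s loadings up to sign.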

Table 3 Sample correlation matrix (N = 164)

Results

Table 4 presents PCA results of Varimax rotation of factors extracted according to the Kaiser (1960) and scree test criteria.

Table 4 Principal components results: Varimax rotation

Four components accounted for 57.16% of the total variance. Varimax and Direct Oblimin rotations yielded similar item loading patterns. In the Direct Oblimin solution, Object Use loaded on the first component, which appears to reflect a social construct. PCA replicated only DiLalla and Rogers’ (1994) Negative Emotionality component, which also emerged when five components were extracted. The third component reflects sensory and stereotypy items. The fourth component contains Activity Level and Level and Consistency of Intellectual Response and is difficult to interpret.

Table 5 presents PAF results of the Promax rotation of four factors extracted according to the Kaiser (1960) and scree test criteria.

Table 5 Factor pattern loadings from principal axis factor analysis with Promax rotation

Four factors accounted for 41.67% of the common variance among CARS items. The oblique rotations produced more parsimonious results (i.e., few items loaded on multiple factors), and these solutions included all items except Object Use. The Promax solution yielded the most conceptually meaningful patterns of item loadings and factor intercorrelations. Relative to the PCA components, the factors identified through PAF represent more conceptually meaningful constructs.
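Promax starts from a Varimax solution, raises the loadings to a power (κ = 4 is the SPSS default) to build a simpler target, and then rotates obliquely toward that target by least squares, which also yields factor intercorrelations of the kind reported in Table 6. A minimal NumPy sketch without Kaiser normalization, assumed for illustration rather than reproduced from SPSS; the function names are hypothetical:

```python
import numpy as np


def _varimax(A, max_iter=500, tol=1e-8):
    """Raw Varimax (no Kaiser normalization); returns (rotated, rotation)."""
    p, k = A.shape
    R = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
        R = u @ vt
        previous, criterion = criterion, s.sum()
        if previous and criterion / previous < 1 + tol:
            break
    return A @ R, R


def promax(loadings, power=4):
    """Promax rotation of an items x factors matrix of unrotated loadings.

    Returns (pattern, phi, structure): pattern loadings, factor
    correlations, and item-factor correlations (structure = pattern @ phi).
    """
    A = np.asarray(loadings, dtype=float)
    V, R = _varimax(A)
    # Target: loadings raised to `power`, keeping their signs.
    target = V * np.abs(V) ** (power - 1)
    # Least-squares transformation from Varimax loadings toward the target.
    U, *_ = np.linalg.lstsq(V, target, rcond=None)
    # Rescale columns so the implied factors have unit variance.
    U = U @ np.diag(np.sqrt(np.diag(np.linalg.inv(U.T @ U))))
    pattern = V @ U
    T = R @ U                     # overall transformation of A
    phi = np.linalg.inv(T.T @ T)  # factor correlation matrix
    return pattern, phi, pattern @ phi
```

Because the transformation is not orthogonal, the factors may correlate, yet pattern × Φ × patternᵀ still reproduces the common variance implied by the unrotated loadings; higher powers produce simpler targets and usually larger factor correlations.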

The first factor is labeled Social-Communication (α = .78). Its content includes Verbal Communication, Nonverbal Communication, Imitation, Level and Consistency of Intellectual Response, and General Impressions. The second factor consists of Relating to People and Visual Response and is named Social Interaction (α = .61). The third factor, Stereotypies and Sensory Abnormalities (α = .54), consists of Taste, Smell, Touch Response and Use, Listening Response, and Body Use. The fourth factor, Emotional Regulation (α = .59), consists of Emotional Response, Adaptation to Change, Fear or Nervousness, and Activity Level. Internal consistency coefficients fall below .80 for all factors.

Table 6 presents factor intercorrelations obtained through the Promax rotation. The only statistically nonsignificant correlation is between Social Interaction and Emotional Regulation.
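The article does not state how significance was assessed. Assuming each factor correlation is treated like an ordinary Pearson r from N = 164 cases (an approximation, since estimated factor correlations do not follow the sampling distribution of a simple Pearson r), a minimal standard-library sketch shows what size of coefficient reaches p < .05; both function names are hypothetical:

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 via the Fisher z approximation."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt 2).
    return math.erfc(abs(z) / math.sqrt(2.0))


def critical_r(n: int, z_crit: float = 1.959963984540054) -> float:
    """Smallest |r| reaching two-sided p < .05 under the same approximation."""
    return math.tanh(z_crit / math.sqrt(n - 3))


print(round(critical_r(164), 3))
```

Under this assumption, correlations of roughly .15 or larger would reach significance with N = 164, so a nonsignificant factor correlation here would be a small one.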

Table 6 Factor intercorrelations: Principal axis factor analysis with Promax rotation

Discussion

This study assessed the factor structure of the CARS through analysis of archival data. Principal components analyses (PCA) that followed the procedures of DiLalla and Rogers (1994) and Stella et al. (1999) failed to replicate their results. Principal axis factor analysis, which assessed common variance among CARS items, identified four conceptually meaningful factors. Although the specific findings differed across studies, each identified constructs that map onto the DSM-IV core symptoms of ASDs and their associated features. Differences in CARS administration procedures, assessment settings, and child characteristics might account for the pattern of findings across studies. This study adds to the psychometric literature supporting the CARS’ technical adequacy and conceptual relevance, even though the instrument was developed before the DSM-IV. In addition, the CARS’ practical administration features further support its utility as a screening tool within a multifaceted assessment protocol.

The DiLalla and Rogers (1994) study, the Stella et al. (1999) study, and the present study differed in important methodological respects. DiLalla and Rogers collected CARS data during a semi-structured, play-based observation of a child interacting with a familiar adult and noted that this observation elicited the behaviors assessed by the CARS. Stella et al. and the present study analyzed archival CARS data collected by clinicians with extensive experience in diagnosing ASDs. In both of these studies, CARS ratings were based on direct observations and parent report; the present study also included chart review. The degree of similarity between the observational and interview methods of Stella et al. and those of the present study cannot be determined. Both of these studies examined samples of children with ASDs, whereas the DiLalla and Rogers findings were based on a sample that included a substantial number of children not diagnosed with an ASD.

Across all three studies, many items appear to be robust indicators of social communication and interaction, sensory abnormalities, and emotional regulation. In all three studies: (a) Verbal Communication, Nonverbal Communication, Imitation, Relating to People, and Visual Response were indicators of a social construct; (b) Emotional Response and Fear or Nervousness reflected an emotional construct; and (c) Taste, Smell, Touch Response and Use indicated a sensory construct. The present study was consistent with DiLalla and Rogers (1994) with respect to: (a) Level and Consistency of Intellectual Response and General Impressions, which formed part of a social construct; (b) Listening Response, which indicated a sensory construct; and (c) Adaptation to Change, an indicator of an emotional construct. Activity Level was an indicator of an emotional construct in both Stella et al. (1999) and the present study. Object Use and Body Use were indicators of different constructs across studies; perhaps these items are the most sensitive to variability across assessment contexts.

Validity data must be interpreted in context, which includes administration procedures, settings, and sample characteristics (American Educational Research Association, 1999). Given the differences in these variables across studies, it is perhaps not surprising that specific constructs were not directly replicated. However, conceptually meaningful constructs consistent with contemporary DSM-IV nosology were replicated across these studies. This evidence suggests that, although it was developed more than a decade before the DSM-IV, the CARS remains a clinically relevant screening tool that appears to assess autism-specific constructs consistent with current conceptualizations of the disorder, and that it does so across a variety of settings.

Given the development of several new autism screening and diagnostic measures, the CARS’ empirically supported technical adequacy, cost-effectiveness, and practicality have important implications for practice. The tool remains a reliable and valid screening instrument that can be used with youth across a wide age span. The CARS discriminates between children with and without an ASD and correlates well with other ASD measures. Relative to other measures (e.g., ADI-R, ADOS), the CARS requires little training, is easy to score, and is flexible in its administration procedures, making it useful in a variety of settings, including schools, outpatient mental health offices, primary care settings, and diagnostic clinics. Appropriately trained front-line community practitioners, such as school psychologists, can be confident that elevated CARS scores warrant a referral for more comprehensive differential diagnostic assessment. School psychologists who routinely conduct psychoeducational assessments can provide diagnosticians with a wealth of information in addition to CARS data, as part of comprehensive, community-based support for children with suspected ASD.

The preponderance of empirical data supports the CARS’ utility in clinical decision-making. We believe that the psychometric literature would benefit from studies using more standardized administration procedures across settings. Future studies could then systematically determine the extent to which the internal structure of the CARS depends on context.