Introduction

Comprehensive gold standard diagnostic evaluations for autism spectrum disorders (ASD) include three components: the clinical evaluation, the Autism Diagnostic Observation Schedule (ADOS) (Lord et al. 2002) and the Autism Diagnostic Interview-Revised (ADI-R) (Rutter et al. 2003b). In addition to providing a tremendous amount of clinical information, this three-tiered approach plays a crucial role in standardizing diagnostic practice (Risi et al. 2006) and has shaped guidelines and best practice for ASD evaluation (Klin et al. 2005). Drawbacks of such exhaustive assessments are the time and level of training required. Thorough diagnostic evaluations are costly and can take many hours to complete (Shattuck and Grosse 2007). In non-research settings, it is not always feasible to administer an ADOS or ADI-R; this often results in dependence on a clinical evaluation without the supplementation of a standardized observational assessment. In such circumstances, the clinician often must rely on the mental status examination.

The traditional mental status examination, used universally in psychiatry, does not provide flexibility to accommodate the developmental perspective necessary for the evaluation of patients with ASD. In particular, there is no structured way to observe and record a patient’s social, communicative and behavioral functioning. Instead, information is often recorded in a non-standardized manner or remains undocumented. A literature review and PubMed search revealed a dearth of publications regarding the mental status examination of individuals with ASD. Furthermore, screening questionnaires such as the M-CHAT (Robins et al. 2001), Social Communication Questionnaire (Rutter et al. 2003a), and Social Responsiveness Scale (Constantino 2002) rely on parent report and do not provide the opportunity to record clinical observations. A recent cross-sectional study reported that only 8% of pediatricians endorsed screening for ASD (Dosreis et al. 2006). Accordingly, new American Academy of Pediatrics guidelines were created in 2007 to address the lack of ASD assessment in pediatric practice (Johnson et al. 2007). To our knowledge, the Autism Mental Status Examination (AMSE) is the first mental status examination designed to evaluate individuals with ASD.

The AMSE, developed at the Seaver Autism Center for Research and Treatment at the Mount Sinai School of Medicine, is a brief diagnostic classification tool that structures and prompts the observation and recording of social, communicative and behavioral functioning in patients with ASD. The AMSE is designed to take place seamlessly in the context of a clinical examination and comprises eight items, each of which is scored on a 0 (no symptom) to 2 (moderate/severe symptom) scale, with a possible total score ranging from 0 to 16. The AMSE combines signs and symptoms into social, communication and behavioral domains and thus reflects the current conceptualization of autism as specified in the DSM-IV-TR (American Psychiatric Association 2000). The AMSE is intended to support a clinical diagnosis of ASD and cannot be used independently to diagnose autism.

The initial development of the AMSE involved four phases: (a) identification of the signs and symptoms to be targeted for assessment; (b) structuring of the AMSE and its scoring mechanism; (c) presentation of the AMSE to expert panels of autism researchers and clinicians; and (d) revision and refinement of the AMSE through pilot testing. The AMSE underwent several phases of modification, resulting in an eight-item assessment tool that provides the examiner with the opportunity to record both observed signs and reported symptoms. The eight items, as outlined in Table 1, include: eye contact, type of interaction, shared attention, language, pragmatics, repetitive behaviors, preoccupations, and unusual sensitivities.

Table 1 Summary of AMSE items and scoring guidelines

As the development of the AMSE continues, several areas of potential utility hold promise. First, the exam provides structured and reliable rapid assessment of patients with ASD in the context of a clinical examination. Second, the exam can be used to support a DSM-IV-TR (American Psychiatric Association 2000) diagnosis. Lastly, the AMSE may act as a useful resource in determining research eligibility. Findings from an initial phase of testing (N = 45) (Grodberg et al. 2010) suggest that the AMSE may successfully predict which patients will be classified as having an ASD on the ADOS.

The aims of the current study are (a) to assess the reliability of the AMSE and (b) to calculate sensitivity and specificity to determine the optimal AMSE cutoff score for an ASD classification on the ADOS.

Methods

Participants

All participants were part of ongoing recruitment at the Seaver Autism Center for Research and Treatment at the Mount Sinai School of Medicine and all consented to participate in the Center’s clinical assessment protocol approved by the Mount Sinai Program for the Protection of Human Subjects. Eighty consecutive subjects (61 males, 19 females) aged 18 months through 38 years (M = 12.7, SD = 8.05) participated in this study. Inter-rater reliability was assessed with 44 participants (32 males, 12 females) aged 2 years through 38 years (M = 13.8, SD = 8.47). For participants under 18, a parent or guardian provided consent and the participant gave verbal assent when appropriate. Participants received a written report of results from the evaluation.

Procedure

The Center’s clinical assessment protocol, which is used to determine research eligibility for many studies at the Center, consists of several individual components. First, there is an initial clinical evaluation, which is performed by a child and adolescent psychiatrist. The AMSE is administered in the context of this clinical evaluation. Second, all patients are administered an ADOS at a later time and by a different clinician. The ADOS is then scored by research-reliable ADOS raters with experience using the ADOS on a routine basis. Following the clinical evaluation and the ADOS, the ADI-R is administered to the patient’s parent or guardian. Adaptive and cognitive measures are also used to support diagnoses and guide treatment recommendations.

For the present study, the two child and adolescent psychiatrists who administered the AMSE had previously established 95% inter-rater reliability. When feasible, a psychiatric resident, medical student, licensed clinical psychologist or trainee (i.e. doctoral student, postdoctoral fellow) sat in the room with the psychiatrist during the clinical evaluation and used the AMSE to co-rate the patient being evaluated. Before co-rating, each trainee underwent a brief didactic training session on the use and scoring of the AMSE. The primary investigator conducted all training sessions. Scores between the psychiatrists and the co-rater were not reconciled at any time.

Materials

Autism Mental Status Examination

The AMSE is an 8-item diagnostic classification tool that prompts the examiner to observe and record social, communicative and behavioral functioning in the context of a clinical examination. Each item is scored on a 0–2 scale with possible total scores ranging from 0 to 16; higher scores reflect greater symptom severity. Social items must be observed during the clinical exam. Communication and behavioral items can be observed or reported and, in some cases, observed items receive a weighted score. The AMSE cannot be used independently to diagnose autism.
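The scoring rule described above (eight items, each scored 0 to 2, summed to a total of 0 to 16) can be illustrated with a minimal sketch. The identifier names and the example ratings below are hypothetical and chosen only for illustration; the sketch also does not model the weighting of observed items, because that rule is not specified in this text.

```python
# Item names follow the eight AMSE items listed in the Introduction;
# the Python identifiers themselves are illustrative assumptions.
AMSE_ITEMS = (
    "eye_contact",
    "type_of_interaction",
    "shared_attention",
    "language",
    "pragmatics",
    "repetitive_behaviors",
    "preoccupations",
    "unusual_sensitivities",
)


def amse_total(item_scores):
    """Sum eight item scores (each 0, 1, or 2) into a 0-16 total."""
    missing = set(AMSE_ITEMS) - set(item_scores)
    extra = set(item_scores) - set(AMSE_ITEMS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for name, score in item_scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{name}: score must be 0, 1, or 2, got {score!r}")
    return sum(item_scores.values())


# Hypothetical ratings for illustration only (not study data).
example = dict.fromkeys(AMSE_ITEMS, 0)
example.update(eye_contact=1, pragmatics=2, preoccupations=2)
print(amse_total(example))  # 5
```

Rejecting incomplete or out-of-range ratings, rather than silently summing them, keeps a partially completed form from being mistaken for a low total.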

Autism Diagnostic Observation Schedule

The ADOS is a standardized, semi-structured diagnostic instrument designed to help identify individuals with autism spectrum disorder (Lord et al. 2000). The assessment takes approximately 45 min to administer and is separated into four modules based on an individual’s language level. The ADOS combines unstructured play with structured activities and interview questions to probe for social, communicative, and behavioral symptoms that are associated with ASD. Two separate cut-off scores are available: autism and autism spectrum. In this study, the autism spectrum cut-off criteria on Communication, Socialization, and total Social and Communication scores were used to determine diagnostic classification on the ADOS.

Results

Inter-rater reliability was measured using the two-way, random effects, single measure intraclass correlation coefficient (ICC). The absolute-agreement ICC for total scores on the AMSE was 0.97. As shown in Table 2, intraclass correlations for individual items ranged from 0.78 to 1.0. Internal consistency was assessed using Cronbach’s α. The eight items on the AMSE produced an α of 0.72, indicating fair internal consistency (Cicchetti 1994).
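For readers who want to see how these two statistics are computed, the following is a minimal sketch using the standard textbook formulas for the two-way random effects, absolute agreement, single measure ICC (often written ICC(2,1)) and for Cronbach’s α. The function names and the input layout (one row per participant) are assumptions made for illustration; this is not the software used in the study, and the example ratings are hypothetical.

```python
from statistics import mean, variance


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is a list of rows, one per subject, each with one score per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns), and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def cronbach_alpha(items):
    """Cronbach's alpha; `items` has one row per subject and one column per item."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: two raters in perfect agreement on three subjects.
print(icc_2_1([[0, 0], [1, 1], [2, 2]]))  # 1.0
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a rater who scores systematically higher or lower than another reduces the coefficient even when the two rank participants identically.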

Table 2 Inter-rater reliability [intraclass correlation coefficient (ICC)] for each item of the AMSE

Within this high-risk sample, 80% of participants met autism spectrum cut-off criteria on the ADOS. Diagnostic accuracy was assessed by the nonparametric measure of area under a receiver-operating characteristic (ROC) curve. The ROC curve analysis was used to determine a criterion cut-off score based on AMSE total scores. As displayed in Fig. 1, area under the ROC curve (AUC) was 0.93 (95% confidence interval [CI]: 0.86–0.99). This indicates that the AMSE was able to differentiate between ASD and non-ASD classifications on the ADOS.
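The nonparametric AUC has a direct interpretation: it is the probability that a randomly chosen participant from the ADOS-positive group has a higher AMSE total than a randomly chosen participant from the ADOS-negative group, with ties counted as one half. The sketch below computes it that way. The function name and the example scores are hypothetical, and the confidence interval reported above is not reproduced here.

```python
def roc_auc(scores_pos, scores_neg):
    """Nonparametric AUC: P(positive score > negative score), ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical total scores for illustration only (not study data).
pos = [6, 7, 5, 9]
neg = [2, 4, 5]
print(round(roc_auc(pos, neg), 3))  # 0.958
```

This pairwise count equals the trapezoidal area under the empirical ROC curve, so it gives the same value as plotting the curve and integrating it.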

Fig. 1 Receiver operating characteristic (ROC) curve for the AMSE

The most effective cut-off score was estimated at a total score of greater than or equal to 5. This cut-off score produced a sensitivity of 94% and a specificity of 81% (see Table 3). Positive predictive value, at this cut-off, was 95% and negative predictive value was 76%. Increasing the cut-off score to greater than or equal to 7 lowered sensitivity to 61%, but improved specificity to 100%.
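The trade-off between the two cut-off scores can be made concrete with a short sketch that classifies a participant as positive when the total score is greater than or equal to the cut-off and then compares that classification with a reference label. The function names, the small data set, and the resulting values below are hypothetical and serve only to show the calculation; they are not the study data.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(totals, reference_positive, cutoff):
    """Compare 'total >= cutoff' against a reference classification."""
    tp = fp = fn = tn = 0
    for total, positive in zip(totals, reference_positive):
        flagged = total >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical totals and reference labels, for illustration only.
totals = [5, 6, 8, 9, 4, 2, 3, 5]
reference_positive = [True, True, True, True, True, False, False, False]

for cutoff in (5, 7):
    print(cutoff, classification_metrics(totals, reference_positive, cutoff))
```

In this toy data set, raising the cut-off from 5 to 7 removes the single false positive but misses more true cases, which is the same direction of trade-off reported above.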

Table 3 Sensitivity, specificity, positive predictive value and negative predictive value of AMSE cut-off scores for a classification of ASD on the ADOS

Discussion

The AMSE structures the way we observe and record social, communicative, and behavioral functioning in children, adolescents, and adults suspected of having an ASD. As an autism-focused mental status examination, its administration is embedded into the clinical examination. Preliminary results demonstrate that scores on the AMSE predict results on the ADOS and thus provide rapidly assessed standardized observational data that can supplement a clinical diagnosis. We intend for the AMSE to add accuracy and precision to clinical examinations and clinicians’ diagnostic impressions when treating patients with ASD, especially when an ADOS is not feasible.

The present study examined the reliability and utility of the AMSE. Results indicate acceptable internal consistency and excellent inter-rater reliability for AMSE total scores. Agreement between raters on each of the 8 items ranged from good to excellent, suggesting consistency in clinical judgment. While two child and adolescent psychiatrists with expertise in autism diagnostic practices administered the AMSE to all participants, the co-rater’s level of training and expertise varied. These findings suggest that physicians and clinicians without substantial autism experience may be able to reliably administer and score the AMSE.

We concluded that a cutoff greater than or equal to 5 produced excellent sensitivity and good specificity on the AMSE, using an “autism spectrum” classification on the ADOS as the outcome criterion. In addition, ninety-five percent of participants in the sample were correctly classified when a cutoff of 5 was used. The excellent classification accuracy of the AMSE in our initial sample suggests that the AMSE may hold promise as a diagnostic classification tool. Future studies will further assess this potential.

While preliminary findings are promising, several limitations must be considered. First, all participants were evaluated at an autism research center, and thus the distribution of ASD and non-ASD cases was not reflective of the prevalence of ASD in the general population. Due to the high-risk nature of the sample, the majority of participants presented with some symptoms of ASD. This overrepresentation of symptom severity has significant implications for the potential use of the AMSE as a diagnostic tool in the general population. A study is currently underway using a broader sample to assess the discriminant validity of the AMSE. Second, all ADOS assessments were administered by clinicians independent of the AMSE examiner; however, in several cases, the co-rater also administered the ADOS. In order to minimize bias, all cases were videotaped and supervised by independent raters who are research-reliable on the ADOS. Third, the internal consistency of the AMSE was only moderate. This may be attributed to the number of items on the examination, which is lower than that of other ASD assessments. We suspect that a greater number of items would increase internal consistency. In addition, while the present study uses the AMSE as a unidimensional tool, future studies must assess whether multiple factors contribute to AMSE total scores. Finally, the current study consists of participants at various ages and levels of intellectual functioning. Future studies must examine whether alternate cutoff scores may better predict an ASD classification in different subgroups.

The AMSE is the first mental status examination structured to examine signs and symptoms of ASD. Findings from the present study suggest that the AMSE may be a useful and much-needed abbreviated observational diagnostic assessment tool that has promising preliminary psychometric properties. Overall, the AMSE has demonstrated excellent classification accuracy and acceptable reliability. It is our goal to provide a standardized and rapid autism-focused mental status examination that can be administered seamlessly in the context of both clinical and research evaluations. Future studies should (a) examine the classification accuracy of the AMSE in relation to a consensus diagnosis as determined by ADOS, ADI-R, and clinical judgment; (b) explore variables such as age and IQ to estimate the most appropriate criterion cutoff at various levels of functioning; (c) assess the relationship between AMSE total scores and the “calibrated severity metric” of the ADOS to evaluate the AMSE’s ability to assess symptom severity (Gotham et al. 2009); and (d) demonstrate reliability and utility of the AMSE in a larger population across multiple sites with inclusion of other patient populations to clarify the effects of developmental delay or other Axis I disorders on AMSE performance.