The prevalence of autism spectrum disorder (ASD), a neurodevelopmental condition characterized by differences in social communication and by restricted and repetitive behaviors, has been found to be increasing over recent decades, with current US estimates of 1 in 36 in 8-year-old children and higher rates in males than females (Maenner et al., 2023; Centers for Disease Control and Prevention (CDC), 2014 & 2023). The diagnosis of ASD is based on behavioral assessments, and, despite improvements in diagnostic reliability over the last 25 years, there remain issues of both under- and over-diagnosis (Hill et al., 2015; Skellern et al., 2005; Fombonne, 2023; Zuckerman et al., 2013) that affect clinical management, access to services, population rates, and health/educational costs. Autistic individuals are also at risk for a variety of struggles in learning, vocational attainment, and independent living (CDC, 2023). Increased service accessibility and awareness of ASD seem to be driving higher rates of diagnosis than ever before. Although identifying autistic individuals early (true positives) has promise for improving outcomes, there is increasing concern about misdiagnosis and overdiagnosis of ASD (false positives), which could negatively impact treatment and trajectories for these children and decrease our ability to accurately study the disorder at the population level (Luciano et al., 2014; Fombonne et al., 2021).

Despite a higher degree of awareness, there is still a problem of late diagnosis, as the average age at diagnosis in recent surveys is 4–5 years of age (Hill et al., 2015; van’t Hof et al., 2021; Maenner et al., 2023). In some underserved groups, such as racial or ethnic minorities and non-English-speaking or low-income families, ASD is either diagnosed later or underdiagnosed, as shown by significantly lower early identification rates in the US in CDC surveys (La Roche et al., 2018; Zuckerman et al., 2013), although the most recent prevalence estimates indicate that this gap in identification of ASD in racial and ethnic minority groups is shrinking to some degree (Maenner et al., 2021, 2023). However, autistic Black children in this US sample of children aged 8 were significantly more likely to be identified as having intellectual disability, suggesting that perhaps Black children with the highest needs are being identified, thereby missing a subset of Black children with ASD with relatively lower support needs. Misdiagnosis is also common in individuals with more complex psychiatric profiles, including those with prominent mood concerns, severe ADHD, personality disorders, schizophrenia and anxiety, among others (Luciano et al., 2014; Havdahl et al., 2016; Greene et al., 2022).

Conversely, concerns regarding overdiagnosis have also been noted (Graf et al., 2017; Fombonne, 2018, 2023). Overdiagnosis may be related to increased awareness of ASD in individuals with average or above IQ, provision of access to education services under a school, but not medical, eligibility of ASD, and overdiagnosis of ASD in individuals with other delays or co-occurring psychological diagnoses that could result in social difficulties (Hill et al., 2015; Fombonne, 2023; Van Schalkwyk et al., 2015). In fact, a study of Australian child psychiatrists and pediatricians indicated that, when uncertain about an ASD diagnosis, 58% had erred on the side of diagnosing ASD to qualify children for special educational supports (Skellern et al., 2005). There are few studies that have systematically examined the extent of misdiagnosis in either direction or the factors that influence it. This is a key gap in the field, as early and accurate diagnosis and treatment have been shown to be beneficial in improving quality of life and functional outcome in children with ASD. Additionally, our ability to study ASD in large community samples and at the population level is hampered by inaccurate diagnoses. Overidentification also has the possibility of taxing an already strained disability service system.

As part of screening for inclusion in a larger neuroimaging study, we evaluated, with research reliable diagnostic procedures, a high number of children who presented to the study with a prior diagnosis of ASD. This provided an opportunity to estimate the proportion of accurate/inaccurate diagnoses and to identify correlates of misdiagnosis among children presenting with a prior community ASD diagnosis. The specific aims of the study were:

Aim 1: to estimate the number and frequency of false positive ASD diagnoses observed in a research-referred sample; and

Aim 2: to compare clinical characteristics and sources of information between children with confirmed and unconfirmed ASD diagnosis in order to identify the origins of this discrepancy.

Methods

Study Design

Participants were drawn from the screening visit that was conducted as the first visit in an ongoing study examining neuroimaging correlates of children with ASD and ADHD. The basic study design is a case control comparison involving multi-method assessments of two groups: (1) children with an ASD diagnosis confirmed as part of the research evaluation (ASD+) and (2) children with community diagnoses of ASD who did not meet ASD diagnostic criteria within the research evaluation (ASD-).

Participants

Participants included 232 children aged 7–12 years with historical ASD diagnoses who sought to be research participants in a neuroimaging study. Study recruitment occurred through posters in the community and on the hospital campus and the University Hospital Autism clinical program, community support groups and outreach, and targeted mailing and emails to patients with ICD-9/ICD-10 codes of ASD in their electronic health record. Inclusion criteria for the study included: an existing diagnosis of ASD; fluent in English; able to see and hear adequately for study completion; no major head trauma, no diagnoses of intellectual disability, schizophrenia, seizures or tic disorder; no medication needs that would be incompatible with washout; and no MRI contraindications.

Procedures

Potential participants completed a telephone screener regarding demographic variables, inclusion criteria, and ASD diagnosis history. Participants who passed telephone screening were then scheduled for an initial study visit. Participants provided consent and assent in compliance with IRB requirements. At the initial visit, participants completed cognitive assessments, behaviorally-based autism testing (Autism Diagnostic Observation Schedule-Second Edition (ADOS-2); Lord et al., 2012a, b) and a caregiver-based standardized clinical interview for symptoms of ASD (Autism Diagnostic Interview-Revised (ADI-R); Rutter et al., 2003). Caregivers completed questionnaires to provide the child’s developmental and medical history, social-emotional and behavioral adjustment, and language functioning. Due to the heavy time burden of completing ADI-R interviews and the significant proportion of children who did not ultimately meet ASD study inclusion criteria following the first visit, study procedures switched to a streamlined initial visit that postponed the ADI-R until a later visit. Thus, all 232 participants completed the ADOS-2, and 188 of them completed both the ADI-R and ADOS-2. The ADOS-2 and ADI-R were administered by one of three doctoral-level clinical psychologists who all attained research reliability. All ADOS-2 assessments were videotaped for subsequent review. All data were collected between 2011 and 2018, and importantly, before this study was planned; therefore, all scores and ratings were obtained in a manner that was strictly blind to the hypotheses of this study.

Measures

Autism Diagnostic Observation Schedule-Second Edition (ADOS-2; Lord et al., 2012a, b; Gotham et al., 2009). One of three research reliable clinical psychologists administered and scored the ADOS-2 Module 3 (for verbally fluent children) to all participants. The ADOS-2 is a 45–60 min standardized semi-structured observational measure of social, communication, rigidity and restricted interests that includes interacting with an examiner across several activities and contexts. A recent meta-analysis examining the ADOS-2 estimated sensitivity ranging from 0.89 to 0.92 and specificity ranging from 0.81 to 0.85 (Lebersfeld et al., 2021).

Autism Diagnostic Interview-Revised (ADI-R; Rutter et al., 2003). One of three research reliable doctoral-level clinical or developmental psychologists administered and scored this comprehensive caregiver interview. Questions focus on current and lifetime behaviors related to ASD symptoms, including but not limited to social communication, language, gestures, sensory processing, behavioral rigidity, restricted interests and repetitive behaviors. Reliability is strong and a recent meta-analysis examining the ADI-R estimated sensitivity of 0.75 and specificity of 0.82 when examined and pooled across both clinical and research samples (Cicchetti et al., 2008; Lebersfeld et al., 2021). Notably, this study reported higher ADI-R specificity in research versus clinical samples (Research = 0.85, Clinical = 0.72).

Wechsler Intelligence Scale for Children, 4th Edition (WISC-IV; Wechsler, 2003). A clinical psychologist administered a reliable and valid group of cognitive subtests (Block Design, Information, and Vocabulary) of the WISC-IV to estimate full scale IQ (estimated FSIQ; Sattler & Dumont, 2004).

Social Responsiveness Scale, 2nd Edition (SRS-2; Constantino & Gruber, 2012). Caregivers completed the SRS-2, a 65-item measure used to assess autism symptom severity. Each question is answered using a 4-point Likert scale. The SRS-2 has been previously shown to reliably distinguish individuals with ASD from individuals with other psychiatric diagnoses (Constantino et al., 2003; Constantino & Todd, 2000).

Psychiatric Symptoms and Diagnoses

We relied on a combination of questionnaires and a clinical diagnostic approach. Co-occurring diagnoses were evaluated with a best-estimate clinical diagnosis approach after review of ADOS-2 videotapes and other observational data, medical and educational records, and parent diagnostic interviews and completed questionnaires when available.

The Multidimensional Anxiety Scale for Children, 2nd Edition (MASC-2; March, 2012). The MASC-2 is a multi-rater 50-item, 4-point Likert-type scale used to assess the presence and severity of various symptoms of anxiety (i.e., Physical Symptoms, Harm Avoidance, Social Anxiety, Separation/Panic, and Obsessive and Compulsive behaviors) in children 8 to 19 years old. The MASC-2 demonstrates good discriminant validity (March, 2012). It was completed by caregivers.

Children’s Communication Checklist, 2nd Edition (CCC-2; Bishop, 2006). The CCC-2 is a 70-item caregiver report measure used to assess communication skills related to overall speech, vocabulary, sentence structure, and social/pragmatic language skills in children ages 4–16 years old. Dolata et al. (2022) have recently shown that pragmatic language scores of the CCC-2 are highly predictive of autism diagnosis and of the prominence of autistic features.

Procedure for Determining Group Membership

A clinical expert team review was conducted for each potential participant to implement a best-estimate clinical diagnosis procedure for ASD status. To do so, a team of at least one of the three licensed psychologists who administered the assessments and one licensed child psychiatrist (also a certified trainer for the ADI-R and the ADOS-2) reviewed videos of each participant’s ADOS-2; clinical, school, and medical record information (including previous testing); and results from all research assessments described above (as well as additional structured interviews and questionnaires about mental health that were included in the parent study). These clinicians had research reliability on the ADOS-2 and extensive autism-specific clinical experience. They then independently rated the presence/absence of ASD and their degree of certainty. Each rater was asked the degree to which they were certain a subject had ASD (0 = certain no ASD diagnosis, 10 = certain of ASD diagnosis). Only in cases of disagreement regarding the presence/absence of ASD did the experts confer during the process. A unanimous clinical expert consensus was required for inclusion as ASD+ by DSM-IV and DSM-5 criteria. Agreement was high, with Pearson’s correlation coefficients for ratings of certainty of ASD diagnosis ranging from 0.96 to 0.99 across the rating pairs (4 pairs in total). The team also identified co-occurring psychiatric disorders using all information available.
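The pairwise agreement check described above can be sketched as follows. This is a minimal illustration only: the rater names and certainty ratings are hypothetical and are not study data, and the source does not describe how the correlations were computed beyond naming Pearson's coefficient.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical certainty ratings (0 = certain no ASD, 10 = certain of ASD)
# given by three raters to the same eight children. NOT study data.
ratings = {
    "rater_a": [0, 1, 9, 10, 2, 8, 10, 0],
    "rater_b": [0, 2, 9, 10, 1, 9, 10, 1],
    "rater_c": [1, 1, 10, 10, 2, 8, 9, 0],
}

# One coefficient per pair of raters.
pairwise_r = {
    (r1, r2): pearson_r(ratings[r1], ratings[r2])
    for r1, r2 in combinations(ratings, 2)
}
for pair, r in pairwise_r.items():
    print(pair, round(r, 3))
```

Pearson's coefficient captures how closely two raters' certainty scores move together. It does not by itself show that the raters assigned the same absolute values.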

Two groups were compared: individuals confirmed to have ASD by our group of experts (ASD+) and individuals who were determined not to meet criteria for ASD, despite reporting a prior history of ASD diagnosis in the community (ASD-).

Statistical Analysis

Descriptive statistics were used to characterize participant demographics, caretaker-reported behaviors, additional diagnoses, medication use, and clinical instruments for the overall cohort and according to ASD diagnosis (ASD- vs. ASD+); frequencies and percentages for categorical variables and means and standard deviations for continuous variables were calculated. To test for differences across groups in each of our collected variables, we used chi-square or Fisher’s exact tests for categorical variables and t-tests for continuous variables. All analyses were performed in Stata/SE 15.1.

Data used in this study were collected at the initial eligibility visit (described above), and as noted, due to the heavy time burden of completing all interviews/questionnaires and the significant proportion of children who did not meet ASD study inclusion criteria following the first visit, study procedures switched to a streamlined initial visit. Thus, there were participants that did not complete the ADI-R (n = 44), WISC-IV subtests (n = 55) or MASC (n = 97), and this was disproportionately among participants in the ASD- group. To assess the potential bias introduced by these protocol changes, we assessed if those ASD- participants who did not complete all assessments differed in demographics and ADOS-2 scores from ASD- participants who did complete all assessments. We compared medians and interquartile ranges for continuous ADOS-2 sub-scores using Mann-Whitney U tests and compared frequencies and percentages for categorical variables using chi-square tests. No statistically significant difference was found between the 2 groups. We did not use any further statistical or imputation techniques to handle missing data, as it would have been outside the scope of this descriptive study. Whether a participant was enrolled in special education was added to the comprehensive phone screener around 2013 after recruitment began, and because of this, 123 participants had missing values for this characteristic. All other response missingness was non-systematic and not associated with meeting ASD criteria. We report missingness as table footnotes or in the table itself.

Results

Demographic Characteristics of the Study Sample

Of the 232 participants, 123 were labeled ASD+ (i.e., meeting full diagnostic criteria for ASD in the comprehensive research criteria clinical evaluation), whereas 109 were labeled ASD- (i.e., participants who indicated they had existing diagnoses of ASD who did not meet diagnostic criteria by expert consensus). ASD+ and ASD- groups did not differ in age (p = 0.61; Total M = 10.71 years, SD = 2.31 years), gender (p = 0.20; Total % Male = 81.5%), or ethnicity (p = 0.86; Total % Hispanic/Latinx = 13.5%; see Table 1). Additionally, groups had commensurate racial makeups (p = 0.22), totaling to 79.1% White, 2.0% Black, 3.6% American Indian/Alaska Native, 2.0% Asian, 1.0% Hawaiian/Pacific Islander, and 12.2% Multi-Racial across both groups. ASD+ and ASD- groups did not differ in terms of reported family income (p = 0.20), though there was a high degree of missingness in this data point. A majority of the participants were residents of Oregon (74.2%), and the remainder were from Washington (24.9%), Idaho (0.5%), and Canada (0.5%). About half the children (51.1%) in the ASD+ group were receiving support through their schools through an individualized educational program (IEP) or 504 plan compared to approximately a quarter (24.2%) of the ASD- group (p < 0.01), although caution is advised as only 109 participants had valid data for this variable.
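As a worked illustration of Aim 1, the proportion of unconfirmed diagnoses can be computed from the two group counts given above. The confidence interval is an addition for illustration only: the source reports no interval, and the choice of the Wilson score method is an assumption.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

n_total = 232        # participants presenting with a prior ASD diagnosis
n_unconfirmed = 109  # ASD- after the expert consensus review

proportion = n_unconfirmed / n_total
lower, upper = wilson_interval(n_unconfirmed, n_total)
print(f"unconfirmed: {proportion:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The interval describes sampling uncertainty for a research-referred sample of this size. It says nothing about how well the sample represents all children with community diagnoses.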

Table 1 Characteristics of study cohort by ASD diagnosis

Early Neurodevelopmental Features

On a caregiver developmental history questionnaire, the ASD+ group was more likely to have a caregiver endorse current language disorder or delays (33.0%) and a lifetime history of language or articulation differences (60.2%) compared to the ASD- group (14.9% and 43.8%; p < 0.01, p = 0.02, respectively; see Table 2). Additionally, on the ADI-R, development in the first 3 years was judged as more delayed and more consistent with autism in the ASD+ group compared to the ASD- group. The reported age at which first phrases were used trended toward significance such that the ASD+ group was slightly older than the ASD- group at time of first phrases used (see Table 1).

Table 2 Early neurodevelopmental features, by ASD diagnosis

Psychiatric & Developmental History

As the study began in 2011, DSM-IV was utilized for caregiver reports of previous diagnoses. All individuals invited to participate in the study were identified by their families as having ASD and had received a prior diagnosis of Autistic Disorder (62.7%), Asperger’s Syndrome (23.2%), or Pervasive Developmental Disorder – Not Otherwise Specified (PDD-NOS; 14.1%), and no group differences were observed among these initial reported diagnoses (all p’s > 0.05).

Caregiver-Reported Diagnoses

The incidence of caregiver-reported lifetime psychiatric disorders (not including ASD) was higher in the ASD- sample (p = 0.022), with 73.4% reporting no existing psychiatric diagnoses apart from ASD, 22.9% reporting 1, and 3.7% reporting 2 or more psychiatric diagnoses (see Table 3). By contrast, 87.0% of the ASD+ sample reported no existing psychiatric diagnoses apart from ASD, 9.8% reported 1, and 3.3% reported 2 or more psychiatric diagnoses. Groups had similar rates of all specific psychiatric disorders (all p’s > 0.05). Notably though, some specific diagnoses (e.g., adjustment disorder, trauma-related disorders, disruptive mood disorders, mood disorders) were reported rather infrequently in the current sample; therefore, those results should be interpreted with caution. Groups did not differ in prevalence of other non-psychiatric diagnoses such as prior developmental delay (p = 0.27) or sensory processing disorder (p = 0.26). No caregivers reported diagnoses of psychosis (a study exclusion criterion) or tic disorders. The ASD- group trended towards elevated levels of ADHD, though this difference fell short of statistical significance (p = 0.06).
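The comparison of diagnosis counts across groups can be sketched as a chi-square test on a 2 x 3 table. The cell counts below are an assumption: they are back-calculated from the reported percentages and group sizes (109 ASD-, 123 ASD+), not taken from Table 3. The source also does not say whether a chi-square or Fisher's exact test was used for this comparison.

```python
from math import exp

# Counts back-calculated from the reported percentages (an assumption):
# number of caregiver-reported psychiatric diagnoses other than ASD (0, 1, 2+).
observed = {
    "ASD-": [80, 25, 4],   # 73.4%, 22.9%, 3.7% of 109
    "ASD+": [107, 12, 4],  # 87.0%, 9.8%, 3.3% of 123
}

def chi_square_statistic(table):
    """Pearson chi-square statistic for a list of equal-length count rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

chi2 = chi_square_statistic(list(observed.values()))
# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and for df = 2
# the chi-square upper-tail probability has the closed form exp(-x / 2).
p_value = exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = 2, p = {p_value:.3f}")
```

Under these reconstructed counts the p-value comes out at roughly 0.02, in line with the reported value. Two cells have expected counts below 5, a situation in which an exact test is often preferred.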

Table 3 Psychiatric and developmental diagnoses, by informant and ASD diagnostic confirmation status

Research Team-Provided Diagnoses

The research evaluation team then provided psychiatric and developmental diagnoses based on their evaluation findings and all available data from the broader parent study (structured clinical interviews, medical, clinical and school reports). The incidence of research team-provided psychiatric disorders (not including ASD) was elevated in the ASD- sample (p < 0.01), with 38.5% meeting criteria for no psychiatric diagnoses, 44.7% meeting for 1, and 13.9% meeting for 2 or more psychiatric diagnoses (see Table 3). By contrast, 64.2% of the ASD+ sample met criteria for no other psychiatric diagnoses apart from ASD, 30.9% met for 1 additional diagnosis, and 4.9% met for 2 or more psychiatric diagnoses. Comparatively higher rates of anxiety (p < 0.01), ADHD (p < 0.02), and disruptive behavior disorders (p < 0.02) were found within the ASD- group, whereas the groups showed no significant differences in rates of adjustment disorders, trauma-related disorders, mood disorders, psychosis, Tourette Syndrome or tic disorders, developmental delay, language disorder, or sensory processing disorder (all p’s > 0.05).

Medication Use

Of all psychotropic and non-psychotropic medications, stimulants were the most commonly used medications for both ASD+ (16.4%) and ASD- samples (17.2%). There were no differences in rate of medication use (p = 0.69) or type of medications used across groups.

Behavioral Observations of Autism Symptoms

The ASD+ sample scored significantly higher on the ADOS-2 in regard to the restricted and repetitive behavior total score (p < 0.01), social affect total score (p < 0.01), and ADOS-2 total score (p < 0.01; see Table 4). The ASD+ sample also had elevated calibrated severity scores (CSSs; M = 7.64, SD = 1.45) compared to the ASD- group (M = 2.50, SD = 1.93; p < 0.01). Groups did not differ, however, in regard to other observed behaviors during the ADOS-2 (i.e., “E-codes”) including overactivity/agitation (p = 0.22), tantrums, aggression, negative or disruptive behavior (p = 0.54), or anxiety (p = 0.86). Overall, 100% of the ASD+ sample received an ADOS-2 diagnostic classification of at least “Autism Spectrum”, with 95% meeting the more stringent classification of “Autism”. This is in contrast to the ASD- sample, in which 14.7% received an ADOS-2 diagnostic classification of at least “Autism Spectrum”, with 11.0% meeting for the “Autism” classification. The mean CSS of the ASD+ group fell within the 6–10 “Autism” range (M = 7.64; SD = 1.45), whereas the mean CSS of the ASD- group fell within the “Nonspectrum” range of 1–3 (M = 2.50; SD = 1.93).
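The size of the CSS difference can be illustrated by computing a t statistic directly from the summary values above. This is a sketch under two assumptions: the source says only that t-tests were used, so the Welch (unequal variance) form is chosen here for illustration, and the group sizes assume that every participant has a CSS, which is plausible because all 232 participants completed the ADOS-2.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# CSS summary statistics reported above; group sizes are assumed to be the
# full group counts (123 ASD+, 109 ASD-).
t_stat, df = welch_t(7.64, 1.45, 123, 2.50, 1.93, 109)
print(f"t = {t_stat:.1f}, df = {df:.0f}")
```

A statistic this large corresponds to a p-value far below 0.01, consistent with the reported result.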

Table 4 Autism instrument scores, by ASD diagnostic confirmation status

Caregiver-Reported Autism Symptoms (SRS-2 and ADI-R)

Results from the caregiver-completed SRS-2 revealed no significant differences in any domain or the SRS-2 total score between the ASD+ and ASD- samples, with both showing high mean scores on this instrument in the total scores and all subtest scores (see Table 4). The ADI-R showed statistically significant group differences, with the caregivers of the ASD+ sample consistently reporting higher levels of ASD symptoms across both item-level and total scores. Overall, 88.6% of the ASD+ group had clinically elevated ADI-R total scores, in comparison to 56.2% of the ASD- group.

Cognitive Profiles

On the WISC-IV subtests administered, the ASD- sample demonstrated relatively greater performance on both the information (p < 0.01) and vocabulary (p < 0.01) subtests, but not the block design subtest (p = 0.29; see Table 5). Relatedly, individuals in the ASD- sample also obtained significantly higher estimated full scale IQs (M = 106.57, SD = 19.08) compared to the ASD+ group (M = 97.64, SD = 21.59; p < 0.01), although both group means fell in the average range of function.
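To put the estimated FSIQ difference on a standardized scale, a rough effect size can be computed from the reported means and standard deviations. This is an illustrative addition, not an analysis from the study. Because the number of participants per group with WISC-IV data is not given in the text, the sketch weights the two variances equally rather than by sample size, which is an assumption.

```python
from math import sqrt

def cohens_d_equal_weight(mean1, sd1, mean2, sd2):
    """Standardized mean difference using the root mean of the two variances."""
    pooled_sd = sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd

# Estimated FSIQ, ASD- minus ASD+ (summary values from the text).
d_fsiq = cohens_d_equal_weight(106.57, 19.08, 97.64, 21.59)
print(f"estimated FSIQ: d = {d_fsiq:.2f}")
```

By common rules of thumb, a value in this range is usually described as a small-to-medium difference, which fits the observation that both group means fall in the average range.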

Table 5 Cognitive and Clinical instrument scores, by ASD diagnostic confirmation status

Caregiver-Reported Anxiety Symptoms

Caregivers of ASD+ participants reported comparatively elevated symptoms of anxiety across a number of domains on the MASC-2, including the following indices and domains: the Anxiety Disorder Index (p = 0.01), Humiliation/Rejection (p < 0.01), Physical Symptoms (p = 0.01), Separation Anxiety/Phobias (p = 0.04), Social Anxiety (p = 0.01), Tense/Restless (p = 0.01), and the MASC-2 total score (p = 0.01; see Table 5).

Caregiver-Reported Language Use

No caregiver-reported group differences in functional language use, as measured by the CCC-2, were observed in the current sample (p’s > 0.05; see Table 5).

Discussion

To our knowledge, the current investigation is the first to utilize both the ADI-R and ADOS-2 to examine children who were ultimately excluded from ASD research but who presented with existing community-based ASD diagnoses. Strikingly, nearly half (47%) of the participants in the current study did not go on to meet rigorous diagnostic criteria for ASD in a research-based evaluation. Although investigations examining discrepancies between community- and research-based ASD diagnoses are scarce, our results build upon an investigation by Hausman-Kedem and colleagues (2018), in which nearly one-quarter of participants with a community-based ASD diagnosis were classified as non-autism spectrum based on ADOS-2 evaluation. A meta-analysis conducted by Lebersfeld and colleagues (2021) reported reduced specificity of the ADI-R and mixed findings regarding the accuracy of the ADOS-2 in clinical versus research settings. Given recent increased prevalence of ASD (Maenner et al., 2021, 2023; CDC, 2023) and widespread debates in the field about diagnosis, overdiagnosis, and concerns regarding interpretation of ADOS-2 and ADI-R scores in common clinical use (Bishop & Lord, 2023; Duvall et al., 2022; Fombonne, 2023), it is critical to understand possible over- and under-diagnosis of ASD and identify patterns or factors that may contribute to misdiagnosis of ASD.

The current data suggest few group differences regarding demographic characteristics (e.g., age, gender, race, ethnicity) between youth who met expert group consensus for confirmation of ASD and youth who were excluded from study consideration due to not meeting ASD criteria. When considering caregiver report tools in the current study, SRS-2 scores were very high in both groups and did not differentiate them, which may be related to the SRS-2’s role as a screening instrument versus a diagnostic measure and underscores the importance of caution in interpreting reports of ASD symptoms without comprehensive assessment and observation. This is also consistent with recent findings that have shown the SRS-2 to have decreased discriminant validity when used with psychiatrically complex patients, such as those with clinically significant anxiety (Capriola-Hall et al., 2021; South et al., 2017) or ADHD symptomatology (Grzadzinski et al., 2016; Havdahl et al., 2016). In contrast with caregiver-derived reports, diagnostic tools that are examiner dependent (i.e., ADOS-2, ADI-R) revealed group differences that may inform diagnostic discrepancies. Specifically, higher ADOS-2 algorithm scores and CSSs were seen in ASD+ youth compared to ASD- youth. Results from the ADI-R also revealed group differences on both item-level data and total scores, such that individuals classified as ASD+ had heightened ADI-R scores. Although the ADI-R showed group differences, it is important to note that, based on caregiver accounts, over half of the ASD- group also had elevated ADI-R total scores (56.2% of the ASD- group compared to 88.6% of the ASD+ group). Additionally, over 70% of the ASD- group obtained elevated scores on individual domain scores in the ADI-R for reciprocal social interaction, qualitative abnormalities in communication and restricted, repetitive and stereotyped behavior patterns.
Of note, caregiver report of the timing of first ASD symptoms seems to provide limited value in differentiating groups as almost all in the ASD- group reported divergence in early development on the ADI-R (95.9%). In sum, this suggests that caregiver reports of ASD symptoms remain high in children who do not go on to meet diagnostic criteria for ASD. While inclusion of the ADI-R in addition to the ADOS-2 improves the accuracy of diagnostic outcomes (Risi et al., 2006; Ventola et al., 2006; Kim & Lord, 2012), in this study, when used alone, the ADOS-2 appears to provide more accurate differentiation between the team consensus categorizations than use of the ADI-R alone. In fact, through a meta-analysis, Lebersfeld and colleagues (2021) found the ADOS-2 to be the more stable and accurate assessment in both clinical and research settings when compared to the ADI-R.
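The contrast between the two instruments can be made concrete by treating the expert consensus label as the reference and reading the percentages reported in the Results as sensitivity and specificity. This is a rough sketch, not an independent accuracy analysis: the consensus raters had access to both ADOS-2 videos and ADI-R results, so neither instrument is independent of the reference, and the ADI-R figures rest on the subset of participants who completed it.

```python
# Proportion of each group flagged by each instrument, as reported in Results.
# "Flagged" means an ADOS-2 classification of at least "Autism Spectrum" or a
# clinically elevated ADI-R total score.
flagged = {
    "ADOS-2": {"ASD+": 1.000, "ASD-": 0.147},
    "ADI-R": {"ASD+": 0.886, "ASD-": 0.562},
}

def agreement_with_consensus(rates):
    """Sensitivity and specificity relative to the expert consensus label."""
    sensitivity = rates["ASD+"]        # flagged among consensus ASD+
    specificity = 1.0 - rates["ASD-"]  # not flagged among consensus ASD-
    return sensitivity, specificity

summary = {name: agreement_with_consensus(r) for name, r in flagged.items()}
for name, (sens, spec) in summary.items():
    print(f"{name}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Read this way, the two instruments differ most in specificity: the ADI-R total score flagged more than half of the children whose diagnosis was not confirmed, whereas the ADOS-2 classification flagged about one in seven.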

No group differences emerged in specific caregiver-reported psychiatric diagnoses; however, the research evaluation team provided diagnoses of anxiety, ADHD, and disruptive behavior disorders relatively more frequently to ASD- than ASD+ participants. Thus, behaviors and clinical symptoms that the families, and perhaps community-based providers, may have conceptualized as ASD were felt to be better attributed to other psychiatric diagnoses. The ASD+ group demonstrated relatively elevated anxiety symptoms on a caregiver-reported measure of anxiety, which is consistent with previous research (e.g., Kent & Simonoff, 2017; Simonoff et al., 2008) and may reflect that the MASC-2 was more sensitive to rigidity in general (i.e., cognitive and/or behavioral), which can be associated with both anxiety and ASD.

It is also possible that the high rate of ASD- individuals within the current sample can be attributed, in part, to qualification for school-based services under the eligibility classification of ASD, the criteria for which vary from state to state and over time. Youth who receive that classification at school may qualify with different criteria than DSM-5 (impact on learning as determining factor, less assessment for other conditions that may also result in social difficulties) and may not have completed a formal medical assessment (Laidler, 2005). It is possible that caregivers mistook an academic classification of ASD for a medical diagnosis and then, in some cases, carried this understanding forward to their medical providers, possibly resulting in the cognitive bias of diagnostic momentum (Streiner, 2021).

This distinction is particularly important given many widely cited prevalence studies of ASD rely on caregiver report (Blumberg et al., 2013) and school records (Maenner et al., 2021, 2023) of ASD diagnoses, suggesting that current rates may, in fact, overestimate the prevalence of ASD. Notably, a high percentage of individuals within the ASD- group were identified as having an ICD-9 code associated with ASD documented within their electronic medical record. Identifying ASD diagnoses from patient electronic medical records is a common recruitment strategy within ASD research and is also used regularly as an empirical outcome measure. For example, a large study investigating the accuracy of a commonly used ASD screening measure, the Modified Checklist for Autism in Toddlers with Follow-Up (M-CHAT/F), defined their outcome of later ASD diagnosis by the presence of prior visit diagnoses of ASD and/or ASD included within a patient’s medical record “problem list” (Guthrie et al., 2019). While this approach enables researchers to examine ASD rates and the efficacy of screening instruments at a large scale without conducting individual ASD assessments for all participants, our current findings suggest using those metrics may falsely identify patients who do not truly meet diagnostic criteria for ASD, and it does not account for variable diagnostic stability (Wiggins et al., 2012; Woolfenden et al., 2012). Consequences of such misclassification can be more serious for etiological studies (e.g., genetics) or treatment efficacy studies. 
In clinical settings, over-identification of ASD may be following a similar trend of overdiagnosis across mental health conditions (especially in higher SES countries; Merten et al., 2017) and may further increase demand for already overextended ASD resource programs (e.g., county-based developmental disabilities and school-based services), thereby making it even more challenging for those who truly meet diagnostic criteria for ASD to access needed services (Pinals et al., 2022).

The current study has many strengths, including the use of a large sample of individuals who identified as having an existing diagnosis of autism spectrum disorder. Traditionally, in clinical settings, an ASD diagnosis is provided by one provider or a small interdisciplinary team of providers. The current study utilized a team of experts and standardized assessment tools to come to diagnostic consensus to ensure accuracy. Most importantly, the current study addresses a critical gap in the literature regarding overdiagnosis and misdiagnosis. Our results highlight the importance of comprehensive assessment that includes utilizing an examiner-led standardized behavioral observation, such as the ADOS-2, in combination with developmental interviews and comprehensive record review to facilitate accurate diagnostic impressions, especially for individuals with age expected cognitive skills. Our results also suggest that while caregiver reports of autism spectrum symptoms remain critically useful in understanding an individual’s unique strengths and challenges, overlapping traits or mimics may lead to inaccurate diagnoses. This may help to inform the utility and cost-effectiveness of instruments and measures employed within clinic settings.

The current study is not without its limitations. The sample comprised school-age children with an estimated FSIQ within the typically developing range who were seeking to participate in a neuroimaging study, as participants required a certain level of communication in order to safely tolerate the scan without sedation; results may not apply to younger or more developmentally impaired individuals. The exclusion of children with medical complications precluding neuroimaging assessment or families reluctant to participate in this type of research may also impact our ability to extend findings to all individuals. Of note, data collection began in 2011 and spanned changes in diagnostic criteria and measure editions (e.g., WISC); additionally, practices and trends in diagnosis have likely continued to shift over this time in both community and research settings. Thus, continued characterization of research samples (including children who volunteer for research with community-based diagnoses but do not meet ASD diagnostic threshold for inclusion) would be beneficial. Given the multiple study visits, changes in study protocol over the course of several years, and demanding nature of the study, missing data were unavoidable. While we do present inferential statistics in the form of bivariate analyses, we chose not to employ additional estimation strategies that could handle missing data, as the nature of this investigation was descriptive and not guided by a specific hypothesis. As noted, protocol changes reduced the administration of interviews/questionnaires, and this mainly affected the ASD- sample. Our examination of those impacted by this protocol change indicated that these participants did not differ from ASD- participants who fully participated. This provides some evidence that the protocol changes did not introduce bias in our estimates and reduces the need for additional statistical techniques.
Despite this, missing data are a limitation, and the implications related to ADI-R scores may be a particular consideration. Additionally, over 79% of participants in the current study identified as Caucasian. Though this is generally representative of the geographic region in which the study was conducted, the homogeneity of race and ethnicity limits our understanding of racial/ethnic differences in group membership and the generalizability of findings. This is an important area for future research, as the CDC currently notes that, although the racial gap in diagnosis of ASD appears to be narrowing for some select groups (e.g., Black Americans with co-occurring intellectual disability), early diagnosis remains poor across minority groups and disparities persist in Hispanic and bilingual groups broadly (Maenner et al., 2021, 2023). Similarly, socioeconomic status of participants remains largely unknown, further narrowing the conclusions that can be made. Despite these limitations, we believe the compensating strengths are unique in speaking to concerns regarding possible overdiagnosis of ASD, given that nearly half of volunteers for this study presenting with school eligibility, community diagnoses and/or ASD documented in their medical chart ended up not meeting expert consensus for ASD using standardized tools.

Conclusions

Overall, clinicians conducting ASD evaluations should be comprehensive and not only consider using standardized examiner-administered autism-specific measures to ensure higher diagnostic accuracy but also critically assess other possible psychiatric conditions that may mirror some ASD symptoms. Specifically, clinicians should consider the possible impact that other psychiatric disorders and symptoms (especially ADHD, anxiety and disruptive behaviors) may have on social communication and behaviors, as caregiver reports of ASD symptoms remained high in many children who did not go on to meet diagnostic criteria for ASD. The bar for entering into research-based studies on ASD is high (expert consensus) and inclusion criteria are likely more stringent than the assessment methods commonly utilized in the community; thus, the utilization of data from medical records, community-based diagnoses and education-based eligibilities as proxies of ASD diagnosis in population-based studies should be considered cautiously.