Introduction

Since the first autism survey conducted in England (Lotter, 1966), epidemiological surveys have increased in number and complexity. Contrasting with the first studies that were simple head counts of children already diagnosed with a severe autism phenotype and residing in small, circumscribed geographical areas, current surveys now include large populations, multiple sites, stratified samples and rely on intricate sets of screening procedures followed by some form of diagnostic confirmation. However, no agreed-upon formula exists for planning and conducting a survey, and there is no standardization of autism survey methodology. As a result, differences in methodologies account for substantial heterogeneity in survey findings. Each survey has unique design features that reflect the local educational and health services infrastructure and that are influenced by current social policies for children with disabilities in the local region or country. Moreover, survey protocols vary in whether they include parents, teachers and subjects with Autism Spectrum Disorder (ASD) as participants, and rely on variable screening and diagnostic procedures and instruments. As such, prevalence differences between studies are hazardous to evaluate and whether observed discrepancies are due to method factors or true differences in population parameters cannot usually be determined.

Here, after providing a brief review of the autism survey literature, we delve into specific areas of survey methodology where advances have been made or progress is still needed. We refer readers interested in a more detailed analysis to our most recent reviews (Fombonne et al., 2021a; MacFarlane et al., 2021) and commentaries (Fombonne, 2018, 2019) and to those from other scholars (Chiarotti & Venerosi, 2020; Jinan et al., submitted). Of note, the scope of these reviews is restricted to prevalence studies of autism and does not include studies of environmental risk factors. As well, each review has employed slightly different inclusion and exclusion criteria; thus, it is recommended to consult them in sequence to be fully comprehensive.

Substantive survey findings

We recently reviewed 141 prevalence surveys of autism conducted between 1966 and 2020 (Fombonne et al., 2021a). We included all published cross-sectional surveys of population-based samples that aimed to estimate the prevalence of ASD. Studies designed to test the efficacy of screening tools or screening programs were excluded. Cohort studies that yielded incidence rate estimates using person-years denominators were excluded; in a few instances, we included cohort analyses that generated cumulative incidence proportions that can often approximate prevalence. We also excluded studies with a target population size of less than 5,000 individuals (because they lack precision and are more prone to sampling biases), studies where no diagnostic confirmation by a professional was available (e.g. surveys relying exclusively on parent reports), studies published in a language other than English or as abstracts only, and duplicate studies conducted on the same samples. The 141 surveys were conducted in 37 countries and half of them have been published since 2012. The median size of populations surveyed was 62,000 (interquartile range (IQR): 15,500–279,616) and the median age of participants was 8.0 years (IQR: 6.6–9.5); only two surveys focused specifically on adults. The median number of subjects with ASD included in studies was 177 (IQR: 59–987), the median proportion without intellectual disability (ID) was 53.25% (IQR: 31.6–64.5), and the median male:female ratio was 4.1 (117 surveys; IQR: 3.1–4.8). There was a marked increase over time in prevalence (correlation between prevalence and year of publication: Spearman rank r = 0.58; p < 0.001) as well as in the proportion of participants without ID (correlation between proportion without ID and year of publication (67 studies): Spearman rank r = 0.49; p < 0.001). A detailed methodological review of these surveys is available elsewhere (Fombonne et al., 2021a).
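The precision argument behind the 5,000-individual threshold can be sketched with a standard confidence interval calculation. The example below is an illustration only, not the procedure used in the review: it assumes a hypothetical true prevalence of 1% observed exactly in each survey, and compares a Wilson score interval at the 5,000 threshold, at the median population size of 62,000, and at a smaller hypothetical size of 1,000.

```python
import math

def wilson_interval(cases: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval (95% by default) for a proportion."""
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical scenario: a 1% prevalence observed exactly in each survey.
for n in (1_000, 5_000, 62_000):
    cases = round(0.01 * n)
    lo, hi = wilson_interval(cases, n)
    print(f"N={n:>6}: {100 * lo:.2f}% to {100 * hi:.2f}% "
          f"(width {100 * (hi - lo):.2f} percentage points)")
```

The interval narrows steadily as the surveyed population grows, which is the sense in which small surveys lack precision; sampling bias is a separate problem that this calculation does not capture.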

In this article, we chose to display one and only one prevalence estimate for each country where a survey had been carried out. Studies were selected according to the following criteria: most recent prevalence estimate in a given country, estimate for primary school age (6–12), largest sample size, and better methodological quality as defined by more intensive case ascertainment and case confirmation approaches. Table 1 includes the 37 selected studies that provide an up-to-date summary of prevalence studies worldwide. Figure 1 displays the findings for Europe and the rest of the world. Interested readers can also consult the interactive global autism prevalence map publicly available at: https://prevalence.spectrumnews.org/. The studies were conducted in 37 countries, half of them since 2018. Some countries (e.g. Russia) have no prevalence data yet available; Africa and South America have very few published studies, most of which failed to meet our inclusion criterion due to their small sample size. There was no study from any of the 28 low-income countries (as classified by the World Bank, 2020); 26 of the 37 surveys originated from high-income countries and 11 from middle-income countries (see Table 1). For all 37 studies, the median age of participants was 8.0 years, the median male:female ratio was 4.1 (32 studies) and the median proportion of participants without ID was 60% (16 studies; IQR: 32.0%–73.3%). Prevalence results were heterogeneous and ranged from 0.043% to 2.68%; the average prevalence in the 11 middle-income countries was significantly lower than that for the remaining 26 surveys in high-income countries (0.47% vs 1.0%; t = -2.3; p = 0.026). Restricting the analysis to the 26 high-income countries (see Table 1), the median prevalence was 0.97% (IQR: 0.41%–1.31%).

Table 1 Selected prevalence surveys of ASD (37 countries)

Fig. 1 Prevalence of ASD per 10,000 in 37 countries (world map on left, Europe inset on right)

We discuss below some methodological issues that must be borne in mind when evaluating the published literature; we outline methodological advances in recent surveys as well as persisting challenges in conducting, analyzing and interpreting these surveys.

Advances and remaining challenges

Case definition and case status determination

An important aspect of survey methodology is how caseness is defined and case status determined in individual participants in each study. There is no uniform approach to case definition across published studies. Some surveys simply use diagnoses from electronic medical records, some rely on an autism special education eligibility that varies across countries and even across areas and over time within the same countries, and some rely on endorsement by caregivers of a single questionnaire item, while others perform in-person clinical assessments. Many, if not most, surveys use combinations of modalities. Reliance on a particular mode of defining caseness often has predictable consequences for prevalence estimation. Thus, surveys of large national registries or administrative databases usually result in downward bias in prevalence estimation since only cases already identified and diagnosed are counted. Conversely, surveys that rely on parent report in a household survey often overestimate prevalence (see below). However, in most studies, investigators have attempted to confirm directly an ASD diagnosis in a participant (or a subsample of participants) by reviewing the symptomatology and developmental history and referring it to a set of established diagnostic criteria, such as the ICD or the DSM.

Here, several issues need to be considered. First, the terminology of ‘meeting diagnostic criteria’ does not magically guarantee the validity of caseness unless careful attention is paid to the quality of the data used to score these criteria and to how much clinical wisdom was infused into this process. The DSM/ICD algorithms for PDD or ASD are only guiding principles which can help organize the available information and provide final coherence to clinical data stemming from different data sources and informants. Yet, how data are collected, by whom, from which informants and using which methods, and how discrepancies between data sources are resolved, are essential features to consider in gauging the validity of case confirmation in a given survey. Guidance by nosographic definitions has the merit of increasing the reliability of symptom identification and, through that process, the reproducibility of diagnoses across investigators. Indeed, one of the major accomplishments of the DSM-III has been to put emphasis on interrater reliability in psychiatric diagnosis, both for PDDs and for other psychiatric conditions. However, although it is contingent upon high reliability of measurement being achieved first, validity is a separate issue that requires demonstrations other than mere agreement between raters.

Second, even when gold standard tools such as the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al., 2003) and the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2002) are employed for in-person assessments, case status confirmation based on ‘scoring above/below threshold’ results is far from sufficient. In reputable investigations such as the Simons Simplex Collection (Lord et al., 2012) or the Collaborative Programs of Excellence in Autism (Lainhart et al., 2006), scoring rules and cut-offs had to be modified to maintain adequate sensitivity of both instruments in selecting participants in these specific samples. It is worth noting that in both studies expert clinical judgement was employed to provide final confirmation of diagnosis and inclusion in the study. Validity of case status determination does not reside solely in any instrument or its scores; rather, it requires a higher-order, interpretative process informed by expert clinical judgement. It is important to remember that even instruments like the ADI-R and the ADOS have been developed to be used in conjunction and that their results must be reviewed and interpreted by a clinical expert (Risi et al., 2006; Lord et al., 2012). Mechanical translation of scores into diagnosis is unwise. Similarly, it ought to be remembered that diagnostic algorithms of the ICD and DSM have been validated against a gold standard that was precisely the clinical judgment of experts (see for example Volkmar et al., 1994). The importance of expert clinical judgement in making final decisions about caseness is generally acknowledged in epidemiological investigations even when they do not rely on in-person assessments with the ADOS and the ADI-R.
Thus, the surveillance definition implemented by CDC in its surveys guides clinicians’ evaluation of records materials along the nosographical criteria and algorithms but it also allows clinicians to rule out ASD based on insufficient or conflicting information. Unfortunately, how often this clause was used has not been reported and its influence on prevalence estimation remains therefore unknown. Similarly, quality and certainty ratings assigned by CDC clinicians to cases have not been examined with respect to their potential impact on prevalence estimation.

Third, diagnostic algorithms and ADI-R/ADOS cut-offs have been calibrated against control samples that have typically included participants with either typical development or intellectual disability and developmental delays without autism. The performance of these tools may be diminished when applied to samples enriched with varied types of psychopathology (Bastiaansen et al., 2011; Grzadzinski et al., 2016; Havdahl et al., 2016; Matsuo et al., 2015; Turban & van Schalkwyk, 2018) or with other types of neurodevelopmental and genetic disorders (e.g. Garg et al., 2013; Morotti et al., 2020). For example, Grzadzinski et al. (2016) reported that 20–30% of children with ADHD but without ASD scored over the cut-offs of standardized autism diagnostic tools (ADI-R and ADOS); likewise, in a study investigating the impact of both parent-reported and clinician-reported behavioral/emotional problems on ratings of autistic symptoms, Havdahl et al. (2016) showed that the presence of co-occurring problems increased ADOS, ADI-R and Social Responsiveness Scale (SRS; Constantino & Gruber, 2005) scores, resulting in decreased specificity of ASD instruments. Moreover, epidemiological samples include school (rather than preschool) age subjects with language and intellectual skills within the normal range. At that age, many psychiatric disorders are associated with social-communication symptoms (e.g. lack of friendships, self-centeredness, low empathy) and even restricted and repetitive behaviors (e.g. behavioral rigidity, obsessions), allowing autism symptoms to be easily ‘scored’ albeit wrongly endorsed. This concern is heightened for older children or adults with language and intellectual skills within the normal range when they are newly diagnosed as part of their participation in an epidemiological study, especially in the absence of a previous neurodevelopmental evaluation and/or of a developmental history suggestive of prior autistic abnormalities.
Differentiating autism in the context of psychiatric comorbidity presents challenges both to the performance of standardized instruments and to clinical judgement. However, in the absence of experienced clinical evaluation, simply scoring criteria and mechanical reliance on algorithms, either from record reviews or diagnostic instruments, may easily be misleading. Fourth, surveys have incorporated in their case status definition ill-defined diagnostic subtypes such as PDD-NOS, whether ICD or DSM (until recently) was used. To illustrate, PDD-NOS could be diagnosed based on the presence of only two diagnostic criteria (one social, one other), with no requirement for evidence of abnormality before age three. Therefore, contamination of cases with phenocopies of all kinds was a strong possibility. There again, false positives are more likely when mechanical rules devoid of clinical judgement are used to establish caseness. Fifth, screening and diagnostic confirmation should rely on reasonably independent procedures. If record review is used as the main procedure to screen and to confirm diagnosis, the risk of circularity is very high, as exemplified in one CDC diagnostic validation study (Bakian et al., 2015). For a child with a clinical ASD diagnosis or ASD special education eligibility, the documentation in his or her medical or educational record will obviously contain descriptions in support of that classification, making it in turn difficult to truly evaluate its validity. Centers for Disease Control and Prevention (CDC) surveys are particularly vulnerable to this problem due to their specific record review methodology (see Van Naarden Braun et al., 2007).
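To make the third point above more concrete, the consequence of reduced specificity for cases identified by a cut-off alone can be sketched with Bayes' rule. All inputs below are hypothetical: a 1% prevalence, a sensitivity of 0.90, and three illustrative specificity values. The two lower values (0.80 and 0.70) simply mirror a 20–30% false-positive rate of the order reported in children with ADHD; applying them to a whole population is an assumption made for illustration, not an empirical claim.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a participant scoring above the cut-off is a true case."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs, chosen only to illustrate the direction of the effect.
prevalence, sensitivity = 0.01, 0.90
for specificity in (0.95, 0.80, 0.70):
    ppv = positive_predictive_value(prevalence, sensitivity, specificity)
    print(f"specificity {specificity:.2f}: PPV = {100 * ppv:.1f}%")
```

Under these assumptions, most participants scoring above threshold would not be true cases, which is one way to see why mechanical reliance on scores without clinical review is hazardous at population-level base rates.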

The repeated changes in nosographical systems create another source of measurement uncertainty in autism studies in general. There was relatively strong parallelism between ICD-10 and DSM-IV that was unfortunately lost with the recent changes in DSM-5. Nevertheless, the new, single, unified concept of Autism Spectrum Disorder (ASD), which replaces the previous umbrella diagnostic class of Pervasive Developmental Disorder (PDD), has increased specificity that should benefit epidemiological research. Preliminary studies comparing the effects of using DSM-IV or DSM-5 on prevalence estimates have shown that, all else being equal, the shift from DSM-IV to DSM-5 leads to a decrease of 13% to 20% in prevalence within the same study datasets (Kim et al., 2014; Maenner et al., 2014). The decrease in prevalence is largely due to subjects with a DSM-IV diagnosis of PDD-NOS no longer meeting ASD criteria in DSM-5 (a 37% decrease in Kim et al., 2014). Likewise, in a recent CDC survey of children age eight (Baio et al., 2018), the prevalence based on DSM-5 behavioral criteria for ASD was 18.1% lower than that based on DSM-IV-TR (for precise calculations, see Fombonne, 2018); a similar pattern emerged from CDC surveys of children age four surveyed in 2010, 2012 and 2014 (Christensen et al., 2019), where the prevalence was 1.70% for the DSM-IV-based definition, 20% higher than the 1.41% estimate derived from DSM-5. As it introduced a new DSM-5-based case definition, the new CDC surveillance definition provided a “grandfather” clause by which subjects with a history of a PDD diagnosis would automatically meet criteria for the new definition even though DSM-5 behavioral criteria would not necessarily be met. This practical choice was in line with the DSM-5 recommendation to provide a new DSM-5 ASD diagnosis to individuals having a “well-established” DSM-IV PDD diagnosis.
However, because the old and new surveillance definitions are embedded in each other, the net effect on prevalence due to the change from DSM-IV to DSM-5 cannot be evaluated; consequently, apparent similarity between DSM-IV and DSM-5 derived prevalence (Baio et al., 2018) should not be taken as evidence that the two sets of diagnostic criteria perform equally in terms of sensitivity and specificity.
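When comparing the percentage differences quoted above, it helps to keep track of which estimate serves as the reference. The short sketch below uses only the two figures cited for children age four (1.70% under the DSM-IV-based definition, 1.41% under DSM-5); the function name is ours.

```python
def relative_change(value: float, reference: float) -> float:
    """Change of `value` relative to `reference`, as a proportion."""
    return (value - reference) / reference

dsm_iv, dsm_5 = 1.70, 1.41  # prevalence (%) cited for children age four

# DSM-IV estimate expressed relative to the DSM-5 estimate
print(f"DSM-IV vs DSM-5: {100 * relative_change(dsm_iv, dsm_5):+.1f}%")
# DSM-5 estimate expressed relative to the DSM-IV estimate
print(f"DSM-5 vs DSM-IV: {100 * relative_change(dsm_5, dsm_iv):+.1f}%")
```

The same pair of estimates reads as roughly 20% higher or roughly 17% lower depending on the reference, so "decrease" figures from different studies are only comparable when they share the same base.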

The defunct PDD-NOS diagnostic category is not missed. It was an ill-defined diagnostic category with poor inter-rater agreement. In a review of previous surveys (Fombonne, 2003), we noted that the proportion of PDD-NOS diagnosed in epidemiological surveys was highly variable, accounting for anywhere between 20% and 70% of the spectrum diagnoses reached in surveys. As narrated by Volkmar et al. (2000), a printing mistake in the 1994 DSM-IV manual initially enforced a hyper-lax definition of PDD-NOS (one social OR communication criterion was sufficient) that was subsequently corrected (one social AND one communication criterion now required) in the DSM-IV-TR Edition (APA, 2000). The fact that in CDC surveys, the proportion of PDD-NOS diagnoses has revolved around 40% of the caseload adds further challenges to the interpretation of CDC survey results (Fombonne, 2018; Mandell & Lecavalier, 2014). In an attempt to increase its specificity, the CDC surveillance case definition for PDD-NOS added the requirement of the presence of at least 1 of 19 autism discriminators (see list in Baio et al., 2018). Unfortunately, the effect on prevalence of applying or not applying this requirement has not been reported. As with PDD-NOS, a poor level of reliability of the Asperger disorder diagnosis was documented in epidemiological surveys (Fombonne & Tidmarsh, 2003), favoring its removal from DSM-5 as a separate diagnostic entity. Thus, findings from epidemiological studies concurred with those from other studies (e.g. Lord et al., 2012) in showing that reliability for subtypes within the autism spectrum was mediocre whereas it was excellent for differentiating spectrum and non-spectrum diagnoses.

The problems of parental reports

In our reviews, we excluded surveys that relied solely on parental responses collected in various national health surveys due to concerns about the validity of the case definition employed and of the resulting prevalence estimates. Surveys using large nationally representative samples, such as the US National Survey of Children’s Health (NSCH), have yielded prevalence estimates relying on highly problematic caseness determination. Gains in sample size, participants’ age range and representativeness were mitigated by reliance on simple yes/no answers by household informants to one or a few survey questions (“Did a doctor or health professional ever tell you that [child’s name] had autism, Asperger’s disorder, pervasive developmental disorder, or autism spectrum disorder?”) to establish caseness (Kogan et al., 2018). Similar unconfirmed parent reports were used in other population surveys in the US and elsewhere (Table 2). In these surveys, non-clinically trained interviewers recorded verbatim answers from respondents without further checking, children were not seen, and additional diagnostic evaluation reports were not collected or reviewed. In each of these surveys, the prevalence was estimated to be much higher than that derived from more rigorous population surveys performed at the same time in the same country. For example, the prevalence of 1.7% reported by Russell et al. (2014) in the UK compares to prevalence figures of 1.16% and 0.94% reported in the UK by Baird et al. (2006) and Baron-Cohen et al. (2009) at the same time; likewise, in the US, the recent 2.5% prevalence estimated in the National Health Interview Survey (NHIS; Zablotsky et al., 2020) and in the National Survey of Children’s Health (NSCH; Xu et al., 2018) is higher than the latest 1.85% prevalence figure from CDC (Maenner et al., 2020). To illustrate further the limitations of this type of survey and the considerable concerns about what a ‘case’ really means, a study by Zablotsky et al. (2015) showed that changes in the wording, format and placement of the single autism question in the National Health Interview Survey resulted in a sharp prevalence increase from 1.25% in 2011 to 2.24% in 2014, a difference seen as arising purely from questionnaire design modifications. Much caution should therefore be exercised when interpreting or using these survey results.

Table 2 Surveys using parent reports for case definition

Novel approaches to case finding/ascertainment

Classically, surveys identified cases by zooming in on children already diagnosed with autism or other behavioral or developmental problems. This approach to case ascertainment did not permit researchers to identify cases without a previously recognized condition and resulted in imperfect sensitivity of case ascertainment procedures (due to false negatives). The addition of a regular school survey component in recent surveys (Alshaban et al., 2019; Baron-Cohen et al., 2009; Fombonne et al., 2016; Kim et al., 2011) and in new studies in China (Sun et al., 2019; Zhou et al., 2020) has addressed this concern from a study design perspective (Table 3). However, new issues arose with the implementation of this approach. First, screening tools such as the Social Responsiveness Scale (SRS), the Social Communication Questionnaire (SCQ; Berument et al., 1999) and others show only mediocre specificity, especially among children with elevated levels of concurrent anxiety, attention deficit or other psychiatric symptoms (Fombonne et al., 2021b; Grzadzinski et al., 2016; Hus et al., 2013); moreover, their cut-offs have not been well calibrated for use in general population studies, and when both teachers and parents are used as informants, no clear rules exist for combining their often-discrepant results. As seen in Table 3, multiple screeners have been employed to survey school samples, reflecting opportunistic rather than data-derived choices. Second, and most importantly, participation has been relatively low (30–70%) in the initial screening and in the other survey phases (e.g. participation in a stage two diagnostic confirmation session). Statistical analyses of these complex survey designs were made adequate by applying a series of weights to account for different sampling fractions and participation rates at each survey phase. However, in doing so, strong, unchecked assumptions had to be made as to whether participation was associated (or not) with caseness.
In the complete absence of information about non-participants (which is the usual situation), the assumption that non-participants do not differ from participants with respect to the presence/absence of autism is a guess rather than a tested proposition. Parents of children with autism have unusually high participation in surveys (Fombonne, 2003), making it plausible that non-participants have ‘less’ autism than participants. Differential participation in that direction may have biased prevalence estimates upwards, a possibility appropriately discussed in the Korean study by its authors (Kim et al., 2011) as well as by other commentators (Pantelis & Kennedy, 2016). Conversely, prevalence could be underestimated if parents of children with ASD were less likely to participate.
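The dependence of a weighted estimate on the untested participation assumption can be sketched as a simple sensitivity analysis. This is an illustration and not the weighting scheme of any cited survey: the participant prevalence of 2% and the relative prevalence values assigned to non-participants are hypothetical, and only the 30–70% participation range comes from the text above.

```python
def population_prevalence(p_participants: float,
                          participation: float,
                          relative_prev_nonparticipants: float) -> float:
    """Mix the participant prevalence with an assumed non-participant prevalence.

    relative_prev_nonparticipants = 1.0 reproduces the usual assumption that
    non-participants have the same prevalence as participants.
    """
    p_non = relative_prev_nonparticipants * p_participants
    return participation * p_participants + (1 - participation) * p_non

p_participants = 0.02  # hypothetical prevalence observed among participants
for participation in (0.3, 0.5, 0.7):
    for relative in (0.5, 1.0, 1.5):
        estimate = population_prevalence(p_participants, participation, relative)
        print(f"participation {participation:.0%}, non-participants at "
              f"{relative:.1f}x: population prevalence {100 * estimate:.2f}%")
```

The lower the participation, the more the population estimate is driven by the assumed, unobserved prevalence among non-participants, in either direction.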

Table 3 Contribution of school survey to overall prevalence estimates

Nevertheless, important findings were obtained by adding a school survey component to the study designs. As can be seen in Table 3, the prevalence estimated by the school survey alone was never nil, and ranged from 0.054% to 1.89% confirming that screening school children allows for the identification of new cases that would otherwise have been missed by previous methodology relying on children already diagnosed with some form of disability. Moreover, within each survey, the relative contribution of the school prevalence component to the overall population prevalence ranged from 13 to 72% (median: 33%, 10 surveys from Table 3). It is probable that this wide variation reflects differences in school survey methodology across these studies although it might also reflect true differences across populations in the proportion of diagnosed/undiagnosed children. Unfortunately, there is no way to test these competing interpretations. Current preliminary evidence (Table 3) suggests nonetheless that up to a third of cases of autism in a population could be missed in studies that do not survey schools.
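The link between the school component's relative contribution and the proportion of cases that would be missed without it can be written out explicitly. In the sketch below, the two component prevalences are hypothetical values chosen to give a contribution of about one third; only the 33% median contribution is taken from the text.

```python
def school_contribution(known_prev: float, school_prev: float) -> float:
    """Share of the overall prevalence contributed by newly identified school cases."""
    return school_prev / (known_prev + school_prev)

def underestimation_factor(contribution: float) -> float:
    """Factor by which overall prevalence exceeds an estimate without the school component."""
    return 1 / (1 - contribution)

# Hypothetical components: 0.8% previously diagnosed, 0.4% found by school screening.
share = school_contribution(0.008, 0.004)
print(f"school contribution: {share:.0%}")
print(f"factor at the 33% median contribution: {underestimation_factor(0.33):.2f}")
```

Read this way, a contribution of one third means that a survey restricted to already-diagnosed children would need to be scaled up by about one half to match the overall estimate, under the assumption that the school component is itself unbiased.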

Execution of school surveys has confronted investigators with huge sample size and manpower issues, specifically due to the very high numbers of screen-positive, and sometimes screen-negative, children to be assessed in second phases of diagnostic confirmation. Innovative techniques have been used to tackle this issue. For example, in a multisite Chinese study where 32.9% of school participants screened positive on an autism questionnaire, Zhou et al. (2020) implemented a second-step screening procedure combining a brief semi-structured direct observation and group interviews in the school setting in order to eliminate a large proportion of false positives on the initial screener and thereby reduce the second phase sample to a manageable size. In the Qatar study, Alshaban et al. (2019) devised a brief semi-structured telephone interview allowing for rapid evaluation of a high number of screen-negative children, leading to more valid and precise estimation of prevalence. The need to combine informants and data sources in efficient ways has led some European investigators (e.g. Narzisi et al., 2020; Fuentes et al., 2021) to develop a nested screening procedure whereby teachers are asked first to nominate children with suspected social communication or restricted/repetitive problems with a six-item Teacher Nomination Form (Hepburn et al., 2008). Parental screening is subsequently obtained only for the small sample of participants who first screened positive on teacher measures, allowing researchers to limit the final number of screen-positive participants, here defined as screening positive on both teacher and parent informants. The efficiency of this approach is very attractive; however, its accuracy depends heavily on the performance and properties of the initial teacher identification, the sensitivity of which remains unknown.
For example, it is possible that teacher nomination could disproportionately miss girls with autism, ‘passive’ autistic children in Wing’s nomenclature or those without behavioral problems. The method also creates another stratification layer within the screening phase that complicates the survey data analysis.
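Why the accuracy of a nested procedure hinges on the first step can be seen from the arithmetic of serial screening. The sketch below assumes that teacher and parent screens err independently given true case status, which is a simplifying assumption, and all sensitivity and specificity values are hypothetical since the teacher nomination sensitivity is unknown.

```python
def serial_sensitivity(sens_first: float, sens_second: float) -> float:
    """A case is detected only if it screens positive at both steps."""
    return sens_first * sens_second

def serial_specificity(spec_first: float, spec_second: float) -> float:
    """A non-case is a false positive only if it fails both steps."""
    return 1 - (1 - spec_first) * (1 - spec_second)

parent_sens, parent_spec = 0.90, 0.90   # hypothetical parent screener properties
teacher_spec = 0.80                     # hypothetical teacher nomination specificity
for teacher_sens in (0.9, 0.7, 0.5):    # hypothetical, the true value is unknown
    sens = serial_sensitivity(teacher_sens, parent_sens)
    spec = serial_specificity(teacher_spec, parent_spec)
    print(f"teacher sensitivity {teacher_sens:.1f}: "
          f"combined sensitivity {sens:.2f}, combined specificity {spec:.2f}")
```

Requiring two positives improves specificity but caps the combined sensitivity at that of the teacher step, so cases not nominated by teachers cannot be recovered by the parent screen.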

In sum, the addition of general schools to the samples surveyed in autism epidemiology was a logical improvement that has proven to be contributory. However, the methods used to screen and confirm cases in large samples of typically developing children need to be refined and adequately tested for their performance and cost-effectiveness.

The male preponderance in autism

The male preponderance in autism is a well-recognized feature of the disorder, one that has been steady through decades of research. In a review of 29 surveys published up to 2001, we previously reported an average male:female ratio of 4.3 (Fombonne, 2003). In our latest all-inclusive review of 141 surveys published from 1966 to 2020, a median sex ratio of 4.1 was derived from 117 studies with sex data (Fombonne et al., 2021a); likewise, the sample size weighted average was 4.13. A sex ratio of 4.1 is equivalent to observing 80–81% males in surveyed samples. As illustrated in Fig. 2, this sex ratio has not changed over time, as shown by the non-significant Spearman correlation between sex ratio and year of publication.
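The conversion between a male:female ratio and the percentage of males in a sample is a one-line calculation, shown here for the ratios quoted above (the function name is ours).

```python
def proportion_male(sex_ratio: float) -> float:
    """Proportion of males implied by a male:female ratio of `sex_ratio`:1."""
    return sex_ratio / (sex_ratio + 1)

for ratio in (4.1, 4.13, 4.3):
    print(f"ratio {ratio}:1 -> {100 * proportion_male(ratio):.1f}% males")
```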

Fig. 2 Sex ratio in ASD surveys over time (117 studies)

In a recent meta-analysis of 54 surveys (data collected 1990–2011), Loomes et al. (2017) found that the male:female prevalence ratio was similarly 4.2:1. The authors rightly pointed out that conventional calculations of the sex ratio (dividing the number of affected males by that of affected females) do not adequately capture the increase in risk associated with male sex. Indeed, a better measure is the prevalence odds ratio (obtained by dividing the prevalence in males by the prevalence in females, hence the terminology of ‘prevalence odds ratio’ or POR). The difference between the sex ratio and the POR is that the latter adjusts for the relative sizes of the unaffected male and female populations under study. For example, in the New Zealand study (Table 1; Bowden et al., 2020), the sex ratio is 4.59 when calculated as the ratio of the number of affected males to that of affected females (2,577/561); however, the prevalence odds ratio decreases to 4.34 if the prevalence in males (2,577/163,185) is divided by the prevalence in females (561/154,236). This change reflects the slightly higher proportion (51.4%) of males than of females in the underlying population. We have nonetheless kept our reporting of the conventional sex ratio because: a) details about the male and female population denominators are not always available in published articles whereas sex ratio is routinely reported or can be calculated; b) when population denominators by sex are available, simple calculations of the POR (as described above) may be erroneous in complex survey designs where survey weights should be applied separately for each sex to account for unequal sampling fractions and participation rates at different survey phases, and; c) using sex ratio will facilitate comparisons since it is a widely reported metric.
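The New Zealand figures quoted above can be reproduced directly. The sketch below computes the conventional sex ratio and the ratio of sex-specific prevalences from the four counts given in the text; it also computes an odds-based version as a check, on the assumption that the quoted denominators are total male and female populations. With prevalence this low, the odds-based ratio and the ratio of prevalences are nearly identical.

```python
males_asd, females_asd = 2_577, 561        # affected males and females
males_pop, females_pop = 163_185, 154_236  # population denominators by sex

# Conventional sex ratio: affected males divided by affected females
sex_ratio = males_asd / females_asd

# Ratio of prevalences, as described in the text
prevalence_ratio = (males_asd / males_pop) / (females_asd / females_pop)

# Odds-based version (assumes the denominators are total populations)
odds_males = males_asd / (males_pop - males_asd)
odds_females = females_asd / (females_pop - females_asd)
odds_ratio = odds_males / odds_females

share_male = males_pop / (males_pop + females_pop)

print(f"sex ratio:         {sex_ratio:.2f}")
print(f"prevalence ratio:  {prevalence_ratio:.2f}")
print(f"odds ratio:        {odds_ratio:.2f}")
print(f"males in population: {100 * share_male:.1f}%")
```

This simple calculation ignores survey weights, which is precisely the limitation noted in point b) above for complex survey designs.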

After grouping surveys according to risk of bias, active/passive ascertainment and availability of IQ data, Loomes et al. (2017) reported a POR of 3.25 in 20 surveys with active case ascertainment, and of 3.32 in 17 surveys with low risk of bias. They concluded that the typical “4:1 male-to-female ratio is inaccurate” and that the true ratio is “lower than 3.5:1”. Furthermore, they interpreted this result as supportive of theories of female camouflage and systematic underdiagnosis. As explained above, the analysis could not account for other characteristics of survey design (sex-specific participation rates at different phases, survey weights) and of individual participants (e.g. sex-associated exclusion/inclusion criteria such as genetic disorders (e.g. Fragile X) and other comorbidities) that may modify the results. The subset of 20 studies had small sample sizes: there were fewer than 1,900 ASD participants in the 20 surveys with active case ascertainment. Like any biological variable, sex ratio in autism studies has a sampling distribution and variability across studies is to be expected. The dispersion of sex ratios across surveys is well illustrated in Fig. 2. Demonstrating further this variability, the median sex ratio of the 32 surveys with available sex ratio data (representing > 12,000 participants) included in Table 1 is 4.1:1 with a range of 1.5 to 6.7 (IQR: 3.3–4.4). In the most recent CDC survey (Maenner et al., 2020; Table 1), the male:female ratio ranged from 3.4:1 (Missouri) to 4.5:1 (Arkansas), with an overall value of 4.3:1. Thus, asymptotic convergence of sex ratios towards a central value matters more than any specific study estimate. Furthermore, the interpretation of Loomes et al.’s lower sex ratio as evidence of underdiagnosis in females was unsubstantiated, as are several corollary claims linking female camouflaging and underdiagnosis (Fombonne, 2020).
Besides, in school surveys of ASD where new, previously undiagnosed, ASD cases were identified, we found no evidence that more females than males were previously undiagnosed (see Table 3, right column); if anything, the trend was for an even higher male:female ratio among newly diagnosed participants, as would be expected in samples of school children without intellectual disability.

Our review therefore does not support the hypothesis that the male preponderance in ASD has been overestimated or that it has changed over the last 50 years. Indeed, the ratio of 4 males to 1 female remains a robust characteristic of ASD both in epidemiological and clinical samples.

Age considerations

When evaluating surveys, careful consideration should be paid to the age range of included participants. Surveys have generally focused on school-age children and there are reasons why this is a good sampling choice. By ages 6 to 10, diagnoses can be verified and validated with robust instruments and methods. At lower ages, some children will be missed since the age of diagnosis is often delayed up to primary school entry or later. At older ages, some improvements in milder forms of the autism phenotype can pose difficulties for both identification and diagnostic confirmation. Importantly, a reason to focus on primary school-age is that, in most countries, school attendance is compulsory after age six which allows comprehensive, publicly available sampling frames to be used by survey researchers. In addition, most children with autism show some impaired functioning for learning and adaptive behavior that makes them eligible for school special support services, rendering them easier to identify in surveys.

In some studies that capitalize on existing databases or registries, prevalence estimates may be biased towards lower values when denominators include infants, toddlers or older adults. For different reasons, those age groups are less likely to be diagnosed with autism: infants and toddlers simply because they have little or no likelihood of having already been diagnosed, and adults because of secular changes in awareness and ASD identification. Therefore, inclusion of very young or adult age groups in prevalence calculations is not recommended as it will bias the prevalence estimate towards lower values. For example, in their analysis of the German national health insurance database, Bachmann et al. (2018) report for 2012 a prevalence of 0.38% when considering the whole age range 0–24. However, the prevalence in age groups < 1 and 18–24 was much lower (about 0.11% and 0.18%, respectively) whereas a more accurate population estimate of 0.60% was obtained for the 6–11 year-old age group. Thus, while it may be useful for descriptive purposes to report prevalence at different ages, prevalence derived from school-age samples is likely more valid and accurate to inform service planning and public health policy.
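The dilution effect described above follows from the pooled prevalence being a population-weighted average of age-specific prevalences. The sketch below illustrates this with entirely hypothetical age bands, case counts and population sizes (they are not taken from Bachmann et al., 2018, or from any other survey); only the direction of the effect is of interest.

```python
# HYPOTHETICAL counts per age band, chosen for illustration only.
age_bands = {
    "0-5":   {"cases": 120, "population": 60_000},
    "6-11":  {"cases": 360, "population": 60_000},
    "12-17": {"cases": 300, "population": 60_000},
    "18-24": {"cases": 140, "population": 70_000},
}


def prevalence(cases: int, population: int) -> float:
    """Prevalence as a simple proportion of cases in the population."""
    return cases / population


def pooled_prevalence(bands: dict) -> float:
    """Prevalence over all bands: total cases divided by total population."""
    total_cases = sum(band["cases"] for band in bands.values())
    total_population = sum(band["population"] for band in bands.values())
    return prevalence(total_cases, total_population)


school_age = prevalence(**age_bands["6-11"])
pooled = pooled_prevalence(age_bands)

print(f"6-11 year olds: {school_age:.2%}")
print(f"all age bands:  {pooled:.2%}")
```

With these made-up inputs, the pooled figure falls well below the school-age figure because the youngest and oldest bands contribute many people to the denominator but few diagnosed cases to the numerator.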

Yet, even within the school age range, cross-sectional surveys that sample different age groups sometimes exhibit age-associated differences in prevalence that are difficult to interpret. For example, in some surveys of relatively narrow age ranges (6–12), prevalence was at its maximum in children aged eight or nine and lower at older ages, which is inconsistent with autism being a lifelong disorder (e.g. Alshaban et al., 2019; Yeargin-Allsopp et al., 2003). Differences in sampling frames, participation rates, access to diagnosis and services or awareness could explain these results although these age effects often remain unexplained. Thus, age trends in prevalence are best evaluated in surveys that provide lifetime prevalence rates in cohorts followed over time rather than in cross-sectional surveys of contiguous birth cohorts. Typically, and reflecting age-related patterns in the diagnosis of ASD, S-shaped curves portray low prevalence in preschoolers, followed by a steady increase through primary school age and progressive plateauing at older ages. An example of such a pattern can be found in a recent Italian study where prevalence in 2001–2003 birth cohorts rose steadily with age from 0.40% among 3–5 year olds to 0.96% among 9–11 year olds and to 1.19% among 15–17 year olds (Valenti et al., 2019). These trends in age-specific prevalence must be interpreted in the context of the specific survey methodology. Surveys that rely mostly on passive counts of already diagnosed cases will yield school-age prevalence figures that likely underestimate the population prevalence at that age. For example, in the new Canadian surveillance study (Ofner et al., 2018), only 72% of those participants diagnosed by age 17 had been diagnosed by age 8 and 10% were diagnosed after age 12. The CDC methodology circumvents the problem related to late diagnoses of ASD by allowing new cases to be confirmed in previously undiagnosed children at age eight.
Of note, consistent with the Canadian data, about 20% of the case load of CDC surveys corresponds to such cases. In general, surveys that are designed to identify yet-undiagnosed cases should yield more accurate prevalence estimates at any age and exhibit less marked age effects.

Adult surveys are still scarce. Pioneering studies were performed in England on combined samples of adults living in typical households or in accommodations for adults with ID (Brugha et al., 2016). These authors reported a prevalence of 1.1% with no variation across different age bands. The prevalence was much higher in the subsample with moderate to severe ID that also had a low male:female ratio compared to the usual male preponderance found in the sample without ID. This survey piloted thoughtful adult survey methodology (Brugha et al., 2012). Limitations were a low participation rate in the subsample with ID, and the small number of affected adults among those without ID. In the US, prevalence of autism among adults 18–65 years old registered in Medicaid in 39 states was reported recently (Jariwala-Parikh et al., 2019). In 2008, the overall adult prevalence was 0.37% and marked birth cohort effects were seen as illustrated by the prevalence varying from 0.82% among 18–25 year olds down to 0.05% among 46–65 year olds. Few other studies have shown similar decreases of adult prevalence with age (e.g. Bachmann et al., 2018), the magnitude of which suggests that lack of awareness and diagnostic services for older cohorts, rather than differential mortality, accounted for this effect. Speaking to the importance of the population of adults with ASD, a simulation study by Dietz et al. (2020) estimated the national and state prevalence of ASD among US adults ages 18–84, taking into account prevalence data from the NSCH, mortality data for children and adults in the US, and the standardized mortality ratio that recapitulates the excess mortality in adults with ASD. The authors predicted that the current prevalence of ASD among adults over age 18 would be 2.21%, ranging from 1.97% in Louisiana to 2.42% in Massachusetts, and that 5.5 million adults were living with ASD at the national level.
However, such models depend on assumptions and input data that are not necessarily correct. For example, Dietz et al. (2020) used for their modeling prevalence data for 3–17 year olds obtained from the National Survey of Children’s Health (NSCH), a survey that notably relies on unconfirmed parent reports (see above and Table 2).

There is no doubt that more surveys of adults with ASD are necessary, not only to estimate the prevalence or track time trends in prevalence but in order to identify patterns of psychiatric and medical comorbidity and unmet service needs of this growing fraction of the population (Fombonne et al., 2020; Hand et al., 2020).

Worldwide studies and cultural issues

It is beyond the scope of this article to review in detail the issues raised by the world emergence of surveys of autism and by the cross-cultural questions they pose. Two matters are addressed. First, is there evidence today that autism is either very rare or very abundant in some areas in the world? This question is important as geographical variation in incidence might provide important etiological clues either on genetic or environmental causation. The second question briefly touches upon variability across cultures of the expression of the autism phenotype and of its measurement, specifically as it applies to epidemiological surveys. Readers interested in a thorough reflection about cross-cultural issues are referred to the excellent conceptual framework proposed recently by de Leeuw et al. (2020).

The last 20 years have seen a welcome expansion of ASD epidemiological surveys of child populations, worldwide. Of the 197 world countries, prevalence estimates exist for only 37 countries (Table 1). As shown on the world map (Fig. 1), data are still lacking in many low- and middle-income countries, especially in Africa, South America, Russia, Caucasus and Central Asia. In many countries, lack of awareness and of diagnostic and intervention expertise persist alongside social stigmatization (e.g. Alshaigi et al., 2020; Yu et al., 2020). However, with the development of the internet and social media, and of advocacy organizations, it has become more difficult for governments to ignore the individual, familial and societal problems associated with autism and neurodevelopmental disorders in general. Epidemiological surveys are a natural starting point for developing clinical and research expertise on these conditions; and governments and their decision-making agencies understandably demand local, quantitative data to guide their service planning decisions.

Everywhere it has been investigated, autism has been found. Small case series appeared in the literature in 1972 for Africa and 1982 for China. Following these seminal clinical descriptions, basic surveys followed consisting of simple head counts that underestimated the prevalence as they only included diagnosed cases in areas where diagnostic services were scarce. As services expanded, prevalence increased; for example, in Oman where specialized autism services were recently established, prevalence in Muscat rose to 0.37% compared to a previous Omani estimate of 0.014% (Al-Mamari et al., 2019). And when more fully-developed survey methods are deployed, prevalence in the neighborhood of 1% has been reported in countries as diverse as India, Qatar, Mexico or China (Table 1), figures that are commensurate to those from high-income countries. However, the variability in survey methodology from one country to the other makes it impossible to draw inferences about underlying differences, if they exist, in true population prevalence. Therefore, with today’s available published data, there is no evidence that there are countries with either very low or very high autism rates, or meaningful between-country variations in prevalence. By the same token, true differences could exist and remain undetected with current methodological limitations.

Turning to the second issue, the similarity of the autism phenotype and of its clinical presentations across cultural groups has been rather striking in our experience of conducting studies in varied cultural settings. Across countries, investigators have relied upon international diagnostic criteria and employed them without difficulty. Diagnostic tools such as the Autism Diagnostic Interview Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS) have now been translated into multiple languages, and implemented successfully in survey diagnostic confirmation phases (e.g. Alshaban et al., 2019; Fombonne et al., 2016; Kim et al., 2011; Zhou et al., 2020). Investigators from the Korean study specifically examined the cultural applicability of the ADOS and ADI-R in diagnosing autism in Korean children and concluded that both DSM diagnostic criteria and scores on standardized diagnostic tools performed well in that population (Kim et al., 2016). Thus, it appears that the concept of autism has some universality even though it might be labelled and named differently in some cultures (for example, “Takiwātanga” among Māori of New Zealand, which means “in his/her own time and space”; www.tepou.co.nz and Bowden et al. (2020)).

Even though a common concept of autism is identified, it remains possible that differences across cultures in the expression and measurement of its manifestations may occur. Indeed, some cultural adaptations of autism tools have been necessary here and there. In China, the birthday party task of the ADOS Module 1 needed to be replaced by an equivalent task since birthday parties are not part of the familial traditions. In South Africa, the screwdriver toy of the Toddler ADOS needed to be removed when used in townships where this particular tool is commonly associated with violence and murder (de Vries, personal communication). In several Asian countries, eye contact from children to adults is discouraged (although Kim et al. (2016) disputed that claim for Korean children) and rules for appropriate social behavior emphasize compliance in children. Chinese parents do not normally expect their child to reciprocally imitate facial expressions or to point fingers at objects to show interest, which may reduce the predictive validity of some items of the M-CHAT (Zhang et al., 2006) or that of other screening instruments. In turn, these different cultural expectations in child rearing may require an adjustment in professional definition and evaluation of reciprocity in social interactions. For example, we previously adapted a version of the Social Communication Questionnaire in Inuktitut to use as a screening tool among Inuit communities of Northern Canada, only to discover that, to mean ‘No’ or ‘Yes’, wrinkling the nose or raising the eyebrows are often substituted for conventional shaking and nodding of the head (Fombonne et al., 2006). Comparisons of Indian, English and Japanese children on the Autism Quotient showed that some items perform differently in some cultural groups (Carruthers et al., 2018). The item ‘Enjoys social occasions’ performed poorly with Indian parents who typically raise their children with strong expectations for social conformity.
Likewise, compared to Greek and Italian counterparts, US toddlers endorsed social interaction difficulties at higher frequencies on a toddler autism screener (Matson et al., 2017). In pioneering observations, Lotter (1978) reported a generally lower frequency of stereotyped behaviors, rocking and hand flapping in African samples. In the US, higher frequency of endorsement of routines and rituals, preoccupations with parts of objects and sensorimotor difficulties was documented in White compared to Black autistic children in record reviews at one CDC survey site (Sell et al., 2006); in other studies, Black children were reported to have more co-occurring ADHD symptoms than White children (Jarquin et al., 2011; Jo et al., 2015). Yet, direct observations of larger samples of Black and White children in the US evaluated with the ADOS did not confirm these differences (Fombonne & Zuckerman, 2021).

Overall, the reported differences across cultural groups are inconsistent and of small magnitude; to date, reports of cultural variation in symptom expression are best viewed as preliminary and require replication in larger samples after proper adjustment on background factors such as age, gender, language and cognitive level as well as on method of data collection. Nevertheless, these preliminary observations call for appropriate cultural sensitivity in working across cultures and may necessitate the occasional change in questionnaire item wordings or testing apparatus. While a single item’s performance might change according to cultural context, it appears that tools, in their totality, maintain measurement properties comparable to those established in Western countries where they were developed. For example, when screening tools were calibrated in local samples, the performances of the Social Communication Questionnaire (SCQ) or the Social Responsiveness Scale (SRS) were comparable in Qatar and Saudi Arabia (SCQ) and Mexico (SRS) to original UK and US studies (Aldosari et al., 2019; Fombonne et al., 2012). Finally, the magnitude of cultural effects on item or criterion endorsement and discriminant power does not appear to be larger than that already reported for gender, age, language or intellectual level within culturally homogeneous samples although formal comparisons of effect sizes remain to be performed.

Thus far, examples of cross-cultural differences in ASD symptom profiles remain largely anecdotal and a systematic investigation of differences in the expression and measurement of the autism phenotype across cultures remains to be conducted. Cross-cultural comparisons have been performed in other areas of psychopathology, e.g. the WHO world studies of schizophrenia in the 1970s, the US-UK comparisons of ADHD diagnostic approaches in the 1980s, and more recently, cross-national comparisons of child psychopathology measured with the Child Behavior Checklist (Rescorla et al., 2007) or the Strengths and Difficulties Questionnaire (Kovess-Masfety et al., 2016). Investigators who are embarking on autism surveys should keep in mind that their research data could be leveraged by joining international collaborations set up to test more systematically the transcultural robustness of the autism phenotype and of its measurement.

Databases, ad hoc surveys and surveillance

Prevalence studies of autism vary in their methodological complexity, feasibility, duration, generalizability and costs. The datasets used to generate prevalence estimates are not comparable across studies and their respective merits and limitations should be recognized. For convenience, we grouped them into three types: administrative databases and registries, cross-sectional surveys, and surveillance programs.

Studies that use existing databases with routinely-collected health information provide an easy opportunity to generate preliminary prevalence estimates for a given population. Investigators have used health insurance databases (e.g. Bachmann et al., 2018; Segev et al., 2019), educational databases (e.g. Thomaidis et al., 2020) or regional or national registries (e.g. Delobel-Ayoub et al., 2020; Valenti et al., 2019). Some distinct advantages of such data sources are that they do not require costly data collection efforts, they have large and representative samples, they incorporate follow-up updates to clinical information allowing estimation of cumulative incidence or prevalence at different ages, they encompass cohorts born over long periods permitting detection of secular changes, they may include well-suited control groups of participants without ASD, and they may sometimes be merged with other databases containing more detailed health or socio-demographic information. Their limitations include reliance on electronic diagnoses/categories that cannot be verified, case definitions that reflect prevailing professional practice rather than research informed concepts, inability to capture undiagnosed or misdiagnosed participants, and intake that is contingent upon changing health or educational policies that in turn directly influence prevalence estimation.

A second type of survey consists of cross-sectional investigations performed at one point in time in a given area or population. In the last 15 years, with increasing worldwide awareness, government authorities in low- to middle-income countries have initiated such studies often after extensive lobbying of influential individuals and newly-formed local family associations supported by advocacy organizations such as Autism Speaks, the World Health Organization or grassroots non-governmental organizations (Hoekstra et al., 2018; Rosanoff et al., 2015). The goal of these surveys is to generate an initial local prevalence estimate to gauge the magnitude of the health problem and to provide necessary information to decision-makers in charge of service planning. These ad hoc surveys provide a useful baseline against which surveys in other geographical areas or in the future can be calibrated. In addition to yielding a prevalence figure, carefully collected data can add value in describing trajectories of children with ASD in the local health and educational system, performing case–control comparisons of risk factors, developing locally-validated new tools for screening and diagnosis, collecting genetic specimens (e.g. saliva samples) and biomarkers in searchable repositories and creating an opportunity to follow up a population-based sample in order to study factors associated with later outcomes. It is not uncommon for such surveys to take four to five years to be completed from the initial planning to the final results. Idiosyncrasies of local health and educational systems, and differences in levels of awareness, engagement and expertise, result in major differences in survey design that ultimately make comparisons across surveys hazardous.

Finally, some countries have deployed programs aiming at monitoring autism in the population, often alongside surveillance of other developmental disabilities. The most important surveillance program, the Autism and Developmental Disabilities Monitoring (ADDM; https://www.cdc.gov/ncbddd/autism/addm.html), was launched in 2000 by the CDC in the US. The ADDM network comprises up to 16 sites that have estimated about every two years the prevalence of ASD among eight year-old children. The methodology of the ADDM relies on a systematic health and education records review that is relatively cost-effective (children are not assessed in person) and allows children without a prior diagnosis to be counted as cases if the behavioral pattern described in records meets criteria for the surveillance case definition (see Van Naarden Braun et al. (2007) for details of ADDM methodology, and Baio et al. (2018) for a good summary). Since 2014, the ADDM has also tracked ASD among four year-old children. The most recent ADDM survey yielded a prevalence of 1.85% among eight year-olds (Maenner et al., 2020; Table 1), with, for the first time, similar prevalence in White and Black children (1.85% and 1.83% respectively) but still lower prevalence in Hispanic children (1.54%), a male:female prevalence ratio of 4.3, a 33% frequency of associated intellectual disability, and an average age at diagnosis of 4.25 years for the 74% of children diagnosed prior to the survey. ADDM surveys have been useful in tracking prevalence and associated characteristics of ASD in the US population over time. Of note, ADDM surveys rely on convenience samples that are not nationally representative; in addition, the geographical distribution of ADDM sites has varied over time, complicating the assessment of time trends.
Other limitations of the ADDM methodology have been discussed elsewhere and include particular concerns about the validity of the surveillance case definition (Mandell & Lecavalier, 2014; Fombonne, 2018).

Repeated national surveys conducted in the US (National Health Interview Survey (NHIS); National Survey of Children’s Health (NSCH)) have also been used to track prevalence in the US population over time. The strengths of national surveys lie in their sampling methodology and representativeness, and their inclusion of wider age ranges, but as discussed above, they are seriously limited by the case definition they employ (see also Table 2). The CDC maintains a visualization tool that allows comparisons of these different data sources in the US (https://www.cdc.gov/ncbddd/autism/data/index.html).

In Canada, a newly formed National Autism Surveillance System (NASS) has released its first results from 2015 concerning almost two million children ages 5 to 17, using administrative data from seven Provinces and Territories (Ofner et al., 2018; Table 1). The NASS case definition relies on ICD- or DSM-derived ASD diagnoses provided or confirmed by licensed health care professionals. The prevalence was 1.52% with a male to female ratio of 4:1 at all ages; 56% of the 29,099 cases had been diagnosed by age six, 72% by age eight, and 92% by age 12. In Europe, 14 countries of the European Union have engaged in a large multifaceted cooperative program to develop early detection programs, validate biomarkers, train professionals, improve support for adults and propose policies (Autism Spectrum Disorders in the European Union (ASDEU); www.asdeu.eu). Another objective is to investigate the prevalence of autism in 12 countries using a methodology that focuses on school age, identifies diagnosed children as well as undiagnosed ones through school surveys, relies on common instrumentation (SCQ, ADI-R, ADOS), and uses in-person assessments. Additionally, exploration of European regional and national registries is being pursued as a complementary strategy (Delobel-Ayoub et al., 2020). Details on the methodology can be found on the web site of ASDEU and in publications (Boilson et al., 2016; Narzisi et al., 2020; Fuentes et al., 2021). Although it is too early to evaluate the efficacy and success of NASS and ASDEU as surveillance programs for Canada and the EU, the increasing interest in establishing population surveillance of ASD is noticeable in several high-income countries.

Conclusions

Epidemiological studies of ASD have expanded worldwide with a median estimate of 1% providing a conservative figure for ASD population prevalence. Comparisons of results across studies should be made with extreme caution due to irreducible heterogeneity pertaining to case definition and ascertainment strategies unique to each survey. Definitions of ASD used in population surveys often do not coincide with those required in rigorous clinical research protocols such as randomized clinical trials or molecular genetic investigations. Survey definitions are influenced by the need to comprehensively capture cases (optimizing sensitivity) and to estimate service needs for developmentally impaired children (that may come at a price for specificity). The addition of a survey component where mainstream schools were surveyed has consistently proven that general school screening is required if a comprehensive picture of ASD is to be provided. However, screening schemes for school samples need to be further researched, available screeners compared, and more cost-effective approaches properly evaluated.

Several countries are now considering the implementation of national registries or surveillance programs that will help track trends in prevalence and incidence of ASD in their populations in the future. To improve these programs, several additional features could be considered. First, more extensive validation of cases included in household surveys or registries would be beneficial. For example, validation of parental reports in the NSCH and NHIS even on subsamples could considerably augment their usefulness considering their acknowledged strengths in sampling and representativeness. Second, incorporating a follow-up of samples recruited as part of the ADDM network studies (now being planned) and of other surveys would provide critical information about diagnostic stability and developmental trajectories as well as their predictors. Accordingly, when designing new surveys, investigators should plan forward and implement ethically approved policies authorizing participant re-contacting in future investigations. Third, broadening the focus of surveys to the larger realm of neurodevelopmental disorders would increase their public health relevance and would also allow examination of important questions of boundaries and overlap between the autism phenotypes and other developmental disorders (motor, language, ADHD, etc.) and genetic syndromes. Premature and arbitrary decisions on what to include and what to not include in the definition of autism have historically proven to be detrimental to scientific enquiry. It is true that epidemiology appreciates binary codes and states (diseased/not diseased) that are necessary for prevalence calculations. Yet, there is more to epidemiological studies than calculating a proportion; inclusion of dimensional measurements of disease related constructs and of co-occurring phenomena and risk factors in population based samples would go a long way to advance current debate about the autisms and overlapping phenotypes. 
Fourth, systematic incorporation in survey protocols of standardized measures of behavioral problems and psychiatric disorders should be considered both at the screening stage and at the diagnostic confirmation stage. As discussed above, co-occurring behavioral problems influence the performance of autism screening and diagnostic tools in a way that can only be elucidated with contemporaneous and separate measurement of those problems. In a child already diagnosed with autism, this approach will facilitate the assessment of comorbid disorders, while in children without a previous ASD diagnosis, it will provide the means to increase the specificity of a new ASD diagnosis. Surveys of school age children, teenagers and adults would especially benefit from such additions to their instrumentation. Fifth, diagnostic criteria for ASD have changed over time and, with them, the case definitions used in epidemiological surveys and surveillance programs. In future studies, new definitions and criteria should be introduced while keeping prior criteria/definitions operational. This will allow the impact on prevalence of the changes in those definitions to be tested and will preserve the possibility of evaluating time trends meaningfully. Sixth, to a large and unfortunate extent, surveys of autism and registries have failed to incorporate measures in the biological and genetic domains that are needed to tease apart the autism behavioral and cognitive heterogeneity. Addressing the disconnect between epidemiological surveys of autism and studies of its biological mechanisms should be regarded as a priority for the future of autism epidemiology. Leveraging epidemiological investigations by systematically developing regional registries and repositories may respond to that need.
To summarize, we suggest that in planning future surveys and surveillance programs of ASD, investigators should systematically consider enhancing their research protocols by expanding the scope of enquiry to a broad array of neurodevelopmental conditions, adding longitudinal follow-up extensions, collecting genetic samples, and incorporating neuroimaging and biological sampling so as to maximize the return of information for their professional community, the participants and their families, and their funders.