Introduction

Autism Spectrum Disorder (ASD) affects as many as 1 in 68 children (Christensen et al. 2016). Individuals with ASD display a range of difficulties in the socio-communicative domain as well as restricted, repetitive patterns of behavior, interests or activities (American Psychiatric Association 2013). These difficulties limit everyday functioning and have a major impact on quality of life, both for the individuals and their families (Karst and Van Hecke 2012). Several randomized controlled studies have shown that early intensive behavioral intervention during the preschool years is associated with better cognitive and adaptive functioning and with reduced severity of core ASD symptoms (Dawson et al. 2010; Estes et al. 2015; Warren et al. 2011). However, long delays exist between initial concerns, referral, diagnosis, and intervention delivery. Although limited service availability likely accounts for these delays, the difficulty for primary care providers to identify early signs of ASD in young children may also be a contributing factor. Clinical features in the socio-communicative domain are subtle, variable among toddlers, and changing over time (Landa 2007; Lord et al. 2012; Ozonoff et al. 2010). Moreover, some behaviors such as repetitive movements can also be found in typically developing children (Leekam et al. 2007). Therefore, efforts have been made to facilitate screening of toddlers at-risk for ASD in primary care settings by designing standardized screening instruments that are objective, inexpensive, easy to use, and brief (Dumont-Mathieu and Fein 2005).

The Modified-CHecklist for Autism in Toddlers (M-CHAT) is a promising ASD-specific screening instrument for improving early detection of ASD in primary care settings (Robins et al. 2001). The M-CHAT is a parental questionnaire intended for use in the general population, consisting of 23 yes/no items targeting core symptoms of ASD in children between 16 and 30 months (Kleinman et al. 2008). It was originally developed in the USA as an extension of the CHecklist for Autism in Toddlers (CHAT; Baron-Cohen et al. 1992). The M-CHAT retains the CHAT’s format as well as the first nine items while eliminating the observation section and expanding the parent report items. The M-CHAT also includes a structured follow-up interview (FUI) for parents of children who initially screened positive. This two-step screening process allows additional details to be collected on children’s at-risk responses in order to reduce the number of false positive cases. Note that a revised version of the M-CHAT, the M-CHAT-R/F, has recently been proposed by Robins et al. (2014). The present study, however, focuses on the original M-CHAT and its validation on a French population sample.

In their initial study, Robins et al. (2001) determined 2 cut-off scores based on the 23 items and on 6 critical items showing the best discriminant capacity: item 7 (proto-declarative pointing); item 14 (response to name); item 2 (interest in peers); item 9 (showing); item 15 (following pointing); item 13 (imitation). A child screened positive on the M-CHAT if he failed any 3 items out of the 23 or if he failed 2 critical items. Applying these criteria on a mixed-sample of 1122 children screened during their well-child visit with primary care providers (level 1 sample) and 171 children screened through early intervention service providers (level 2 sample), the authors reported a sensitivity of 0.87–0.97, a specificity of 0.95–0.99, a positive predictive value (PPV) of 0.36–0.79, and a negative predictive value (NPV) of 0.99 depending on which cut-off scores were used and whether or not the FUI was taken into account in the screening process. Subsequent studies further confirmed the effectiveness of the M-CHAT in detecting young children at-risk for ASD. These studies also highlighted the importance of the FUI in the screening process, especially in a low risk, general population sample (level 1 sample) (Chlebowski et al. 2013; Kleinman et al. 2008; Robins 2008). Of the 3309 low-risk children screened between 16 and 30 months in the Kleinman et al. (2008) study, 189 failed the initial screening, among which 20 received a diagnosis of ASD, yielding a PPV of 0.11. However, only 31 children failed both the initial screening and the FUI, thereby reducing significantly the number of false positive cases and increasing the PPV up to 0.65. In another non-overlapping sample of 4797 low-risk children screened between 15 and 24 months, Robins (2008) reported comparable results: with the M-CHAT alone the PPV was 0.058, whereas the combined use of the M-CHAT and FUI yielded a PPV of 0.57. 
The largest study to date included a sample of 18,989 children screened between 18 and 24 months across 2 US regions (Chlebowski et al. 2013). While the PPV with the M-CHAT alone was very low (0.06), the PPV with the M-CHAT and FUI was within the range of the prior studies (0.54). Overall, these studies provided empirical support for the utility and effectiveness of the M-CHAT when combined with FUI in low-risk children, consistently demonstrating that more than half of all children who failed both the M-CHAT and FUI present ASD.

The utility and effectiveness of the M-CHAT have also been investigated in a number of non-English speaking countries (e.g. Canal-Bedia et al. 2011; Kamio et al. 2014; Nygren et al. 2012; Wong et al. 2004). The majority of these studies agree on the clinical utility of the M-CHAT, even if its performance may vary slightly across cultures and countries. As pointed out by Wallis and Pinto-Martin (2008), this variability highlights the importance of validating and adapting screening instruments that were developed in a different country and culture.

The Current Study

The French health-care system provides free universal access to several medical check-ups during childhood, including three compulsory well-child visits in the second year of life, which makes it particularly suitable for implementing a systematic ASD screening procedure (Rogé et al. 2009). However, although the French National Health Agency, following a number of health organizations across the world, recommends the use of the M-CHAT between 18 and 24 months during well-child visits, no study has so far been conducted to validate the M-CHAT on a French sample (Baghdadli et al. 2006; García-Primo et al. 2014). Therefore, the aim of this study was to validate the M-CHAT on a French population sample of 24-month-old children in order to provide French primary care providers with decision rules regarding a child’s risk status.

Method

Participants

Participants were 24-month-old children living in the Midi-Pyrénées area. The current sample includes exclusively low-risk children drawn from the general population and recruited from one of two sources: either during the 24-month well-child visit at their pediatrician’s office, or at 24 months of age at the daycare center that they attended. Children were not included if (1) they already had a diagnosis of ASD, (2) they were born preterm, before 37 weeks of pregnancy, or (3) they had severe sensory or motor impairments.

A total of 1250 children (663 males, 53%) were screened with the M-CHAT at 24 months. Of these 1250 children, 298 were screened by their pediatrician and 952 were screened by one of the staff members at their daycare center.

Screening Procedure

After initial contacts with all pediatricians and daycare centers from the Midi-Pyrénées area, outlining the nature of the study, a total of 175 pediatricians and 400 daycare center staff members (including childcare workers and pediatric nurses) agreed to complete a 2-h training course on ASD. The 2-h training course was given by one of the authors and included an introduction to ASD, a discussion of the importance of early screening, the use of the M-CHAT and the CHAT, and the study procedure.

After completion of the training course, voluntary pediatricians (n = 17) and voluntary daycare centers (n = 62) invited all families with a 24-month-old child eligible for the study to fill in the informed consent form and the French version of the M-CHAT (resulting from a forward and back translation procedure). In order to detect false negative cases, professionals were also invited to observe the child according to the five observation items from the CHAT at 24 months and again at 30 and 36 months. They were also invited to raise any concerns about a child regardless of his/her M-CHAT and CHAT scores. Once completed, the M-CHAT and the CHAT forms were sent to the laboratory for scoring.

Scoring followed the original scoring approach from Robins et al. (2001). When a child failed the M-CHAT (i.e. any 3 M-CHAT items or 2 of the six critical items), parents were contacted by phone and the FUI was administered by one of the authors. If parental concerns remained (i.e. M-CHAT score still indicated risk for ASD after FUI), a free clinical/developmental evaluation was offered to parents. In order to detect children with possible ASD who did not fail the M-CHAT at 24 months, parents were also contacted by phone when one observation item from the CHAT was failed at any time (24, 30 and 36 months), and a free clinical/developmental evaluation was offered if necessary.
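The decision rule described above can be illustrated with a minimal sketch. It assumes that item responses have already been recoded as pass/fail and that "any 3 items" and "2 critical items" mean at least 3 and at least 2, respectively. The function name and constants are hypothetical and serve only as an illustration; the critical item numbers are those listed for Robins et al. (2001) in the Introduction.

```python
# Hypothetical illustration of the M-CHAT decision rule described in the text.
# Assumption: each item has already been recoded as passed or failed.

# Critical items as listed in the Introduction (Robins et al. 2001).
CRITICAL_ITEMS = frozenset({2, 7, 9, 13, 14, 15})
TOTAL_ITEMS = 23


def screens_positive(failed_items):
    """Return True if the set of failed item numbers meets either cut-off."""
    failed = set(failed_items)
    # Reject item numbers outside the 23-item questionnaire.
    if not failed <= set(range(1, TOTAL_ITEMS + 1)):
        raise ValueError("item numbers must be between 1 and 23")
    fails_total_cutoff = len(failed) >= 3
    fails_critical_cutoff = len(failed & CRITICAL_ITEMS) >= 2
    return fails_total_cutoff or fails_critical_cutoff
```

Under this sketch, a child failing items 7 and 14 screens positive on the critical-item cut-off alone, whereas a child failing only items 1 and 3 screens negative.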

Evaluation Procedure

Evaluations took place either at the laboratory or at the child’s daycare center depending on the family’s preference. Any child suspected of ASD was evaluated with the French adaptation (Rogé et al. 2009) of the Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord et al. 2000) to target symptoms of ASD. In addition, a developmental evaluation was conducted with the Psycho Educational Profile Revised (PEP-R; Schopler et al. 1990) and the Vineland Adaptive Behavior Scales (VABS; Sparrow et al. 1984) in order to identify the presence of a developmental delay. Evaluations were conducted by one of three authors, all trained in the use and scoring of the ADOS-G in young children, while one of the research assistants videotaped the session. If the score on the ADOS-G reached the cut-off for ASD, the family was referred to an independent team for a formal clinical evaluation to confirm the diagnosis of ASD. Following the evaluations, families received a written report and an intervention proposal.

Data Analyses

Sensitivity, specificity, and predictive values were examined on the basis of the screening and evaluation results to investigate the performance of the M-CHAT in identifying children at-risk for ASD in a low-risk, French general population sample. These psychometric properties were calculated both for the M-CHAT alone and for the M-CHAT combined with the FUI. Additional analyses included score and item level analyses. For score level analyses, the total number of failed items and the number of failed critical items after taking into account the FUI were compared between children with ASD, typically developing (TD) children, and children with a developmental delay (DD). On the item level, a descriptive analysis of the percentage of children in each group who failed each item was first conducted, and Chi square analyses were then used to compare the frequency of children who failed each item in the ASD and non-ASD groups before and after the FUI.

Results

Screening Results

Of the 1250 children screened at 24 months, 108 (8.6%) failed the M-CHAT and required the FUI. Of the 108 that screened positive, 85 (79%) were contacted to perform the FUI and specify their risk status (the remaining 23 screen-positive cases could not be contacted). Of these 85 children, 20 (24%) screened positive at the FUI and were offered an evaluation and 65 (76%) screened negative after the FUI. None of these 65 children required an evaluation afterwards.

Of the 1142 children that initially screened negative on the M-CHAT, 15 potential false negative cases were identified with the observation items from the CHAT at 24 months and 1 potential false negative case was identified through physician concern at 36 months. All potential false negative cases (n = 16) were offered an evaluation.

A follow-up at the age of 30 months with the CHAT observation items was performed on 862 children that initially screened negative on the M-CHAT (70% of the sample), and another one was performed at the age of 36 months on 431 children (35%). None of these follow-ups led to the identification of potential false negative cases.

Evaluation Results

A total of 36 children were offered an evaluation either because they continued to screen positive on the M-CHAT after the FUI (n = 20) or because they were identified as potential false negative cases at 24 months or later (n = 16). All families agreed to participate in the evaluation.

Of the 20 cases that screened positive, 12 (60%) were diagnosed with ASD. The other 8 children all presented a developmental delay, as indicated by a score below the cut-off for ASD on the ADOS and significant difficulties in the verbal and/or non verbal domains revealed on the PEP-R and VABS.

Of the 16 potential false negative cases, 6 (38%) were diagnosed with ASD: 5 were identified with the CHAT observation items at 24 months and 1 was identified through physician concern at 36 months (i.e. screen-negative case on both the M-CHAT and the CHAT observation items). The other 10 potential false negative cases comprised one typically developing child and nine children presenting a developmental delay.

Thus, a total of 18 children with ASD were identified in the current sample: 12 (67%) children were true positive cases, failing both the M-CHAT and the FUI at 24 months, and 6 (33%) children were false negative cases. Of the 18 children with ASD, 11 (61%) were screened by pediatricians and 7 (39%) were screened at the daycare center they attended. None of them were younger siblings of children with ASD.

A total of 17 children with a developmental delay were also identified: 8 (47%) children screened positive on the M-CHAT and continued to screen positive after the FUI and 9 (53%) children were identified with the CHAT observation items at 24 months (Fig. 1).

Fig. 1 Flowchart showing screening and evaluation results. (a) Unable to be contacted for FUI. (b) Evaluations based on cases detected through the observation items of the CHAT at 24 months (n = 15), and physician concern at 36 months (n = 1). (c) Detected through the observation items of the CHAT at 24 months (n = 5), and physician concern at 36 months (n = 1). Neg negative, Pos positive

Clinical Validity of the M-CHAT: Sensitivity, Specificity, and Predictive Values

Calculations of the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the M-CHAT, with and without FUI, were based on 1227 children (648 males), after excluding 23 screen-positive cases whose FUI could not be conducted and assuming that all children whose follow-up could not be carried out were true negative cases. Only the psychometric properties of the combined scoring methods are reported (i.e. any 3 M-CHAT items or 2 of the six critical items). Separate use of these two scoring methods is likely to decrease the sensitivity, which is not desirable in a low-risk, general population sample. Combining these two scoring methods, 12 true positive cases and 6 false negative cases were identified, yielding a sensitivity of 0.67, 95% CI (0.41, 0.86). Sensitivity remains the same with or without the FUI, given that the FUI is primarily intended to reduce the number of false positive cases. Without FUI, 73 false positive cases and 1136 true negative cases were identified. This yields a specificity of 0.94, 95% CI (0.92, 0.95). When the M-CHAT was combined with the FUI, the number of false positive cases dropped to 8, yielding a specificity of 0.99, 95% CI (0.98, 0.99).

With the M-CHAT alone the PPV was 0.14 and the NPV was 0.99, whereas the combined use of the M-CHAT and FUI yielded a PPV of 0.60 and a NPV of 0.99. The administration of the FUI is thus essential as it significantly increases the PPV of the screening.
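The point estimates reported in this section follow directly from the 2 × 2 classification counts, as the short sketch below illustrates. The counts for the M-CHAT alone are those given in the text; the number of true negatives for the M-CHAT plus FUI (1201) is not stated explicitly and is derived here as 1227 − 12 − 6 − 8, which is an assumption of this illustration. The function name is hypothetical, and confidence intervals are not reproduced because the interval method is not specified.

```python
# Illustrative computation of screening metrics from 2 x 2 counts.

def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as a dictionary."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# M-CHAT alone: counts as reported in the text.
mchat_alone = diagnostic_metrics(tp=12, fp=73, fn=6, tn=1136)

# M-CHAT plus FUI: tn = 1227 - 12 - 6 - 8 is derived, not reported.
mchat_plus_fui = diagnostic_metrics(tp=12, fp=8, fn=6, tn=1201)

for label, metrics in (("M-CHAT alone", mchat_alone),
                       ("M-CHAT + FUI", mchat_plus_fui)):
    summary = ", ".join(f"{name} = {value:.3f}" for name, value in metrics.items())
    print(f"{label}: {summary}")
```

With these counts, the sketch reproduces the reported sensitivity, specificity, and PPV to two decimals; the NPV comes out at approximately 0.995 in both cases.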

Score Level Analysis

The average number of total failed M-CHAT items was 7.67 (SD = 5.98, range 0–17), 2.53 (SD = 2.40, range 0–8) and 0.53 (SD = 0.70, range 0–2) in the ASD, DD, and TD groups respectively. Given the violation of both the normality and the homogeneity of variances assumptions, a bootstrapped (N = 10,000) one-way analysis of variance was used to test for differences between groups. A significant effect of group (p < .001) was found. Post-hoc tests indicated that children with ASD failed significantly more items than children with DD (p < .001) and TD children (p < .001).

The average number of failed critical M-CHAT items was 2.50 (SD = 2.15, range 0–6), 0.47 (SD = 0.71, range 0–2) and 0.03 (SD = 0.17, range 0–1) in the ASD, DD and TD groups, respectively. Likewise, a bootstrapped one-way analysis of variance revealed a significant effect of group (p < .001). Post-hoc tests indicated that children with ASD failed significantly more critical items than children with DD (p < .001) and TD children (p < .001).
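The general idea of a bootstrapped one-way analysis of variance can be sketched as follows. The resampling scheme actually used in this study is not described, so the sketch makes an explicit assumption: under the null hypothesis of no group difference, groups of the original sizes are redrawn with replacement from the pooled scores, and the p value is the proportion of resampled F statistics at least as large as the observed one. The function names and the small data set are hypothetical and do not reproduce the study data.

```python
# Minimal sketch of a bootstrapped one-way ANOVA (pooled resampling under H0).
# Assumption: this is one common scheme, not necessarily the authors' method.
import random


def f_statistic(groups):
    """One-way ANOVA F statistic; needs more observations than groups."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # Degenerate resample: no within-group variability.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_anova_p(groups, n_boot=10_000, seed=0):
    """Bootstrap p value for the group effect, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_boot):
        # Redraw each group from the pooled scores, keeping group sizes.
        resampled = [rng.choices(pooled, k=size) for size in sizes]
        if f_statistic(resampled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (exceed + 1) / (n_boot + 1)


# Made-up scores for three groups, for illustration only.
toy_a = [7, 9, 12, 3, 10, 8]
toy_b = [2, 3, 1, 4, 2, 3]
toy_c = [0, 1, 0, 1, 0, 0]
p_value = bootstrap_anova_p([toy_a, toy_b, toy_c])
```

Pooled resampling is only one possible choice; a scheme that recenters each group and resamples within groups would preserve unequal variances, which may be closer to the motivation stated above.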

Item Level Analysis

Table 1 shows the percentage of children who failed each item of the M-CHAT by group before FUI, along with the results of the Chi square tests for each item between the ASD and Non-ASD groups.

Table 1 Percentage of children who failed each item of M-CHAT by group before the FUI

In the ASD group, the most frequently failed items were items 5 (pretend play) and 7 (proto-declarative pointing), failed by 55.56% of the children; items 8 (functional play), 13 (imitation), and 19 (joint attention), failed by 50%; and items 9 (showing) and 22 (wandering without purpose), failed by 44.44%. In the DD group, the most frequently failed items were items 6 (proto-imperative pointing), 10 (eye contact), 22 (wandering without purpose), and 23 (social reference), all failed by 23.53% of the children, and items 7 (proto-declarative pointing), 17 (looking object), 18 (unusual finger movements), 19 (joint attention), and 20 (deaf suspicion), all failed by 17.65%. In the TD group, the most frequently failed items were item 11 (oversensitive to noise; 17.20% failed), item 22 (wandering without purpose; 12.25% failed), item 23 (social reference; 6.54% failed), item 20 (deaf suspicion; 5.87% failed) and item 8 (functional play; 5.12% failed). Chi square analyses revealed that all but 3 items, namely item 1 (physical play), item 10 (eye contact) and item 11 (oversensitive to noise), were more frequently failed by children with ASD than by children without ASD.

When combined with the FUI, results remain unchanged for the ASD and DD groups. For the TD group, all percentages fall below 0.2%, meaning that almost no TD children failed items from the M-CHAT after the FUI.

Discussion

The aim of this study was to investigate the performance of the M-CHAT in identifying children at-risk for ASD at 24 months in a low-risk, French general population sample. Out of a sample of 1227 children, 18 received a diagnosis of ASD, 12 of whom were identified with the M-CHAT plus FUI. This shows the utility of this autism-specific screening tool in primary-care settings. The PPV values in the current sample were very similar to those from other low-risk samples of children (Chlebowski et al. 2013; Kleinman et al. 2008; Robins 2008). Specifically, we found a PPV of 0.60 when the M-CHAT was combined with the FUI, meaning that about 60% of children who screened positive on the M-CHAT and who continued to screen positive after FUI presented ASD. However, when used alone, the PPV of the M-CHAT was 0.14, suggesting that only about 1 in 7 children who screened positive presented ASD. As noted by Kleinman et al. (2008), given this unacceptably low PPV, the M-CHAT alone is not advocated in the low-risk, general population. Instead, primary care providers should systematically administer the FUI for those who initially screen positive on the M-CHAT. The FUI is used to gather additional information on a child’s risk status in order to avoid unnecessary referrals and parent concern as well as to improve the performance of the test. Our results are in agreement with the 2-step screening process outlined by Robins (2008) when the M-CHAT is used in the low-risk general population. Moreover, all children who continued to screen positive after the FUI in our sample presented a developmental concern, either ASD or DD, which suggests that the use of the M-CHAT plus FUI is beneficial not just for children with ASD. This result also indicates that only a minimal number of children, if any, would be referred for further evaluation despite not presenting any developmental concern (Chlebowski et al. 2013). 
Importantly, all families agreed to participate in the follow-up evaluation that was offered to them, which is a major strength of this study.

Investigating the test’s psychometric properties, we found a sensitivity of 0.67. This sensitivity is lower than what is reported in the majority of studies (Kleinman et al. 2008; Canal-Bedia et al. 2011; Nygren et al. 2012; Robins et al. 2001) but higher than the one reported in Kamio et al. (2014). This discrepancy between studies in estimating the sensitivity of the M-CHAT may reflect differences in study designs, notably whether or not an attempt was made to identify potential missed cases (i.e. false negative cases). Sensitivity decreases as the number of false negative cases increases. In the current study, we both employed a concurrent screening instrument, the CHAT observations, and attempted to rescreen all children with the CHAT observations at 30 and 36 months. This led to the identification of 6 additional children with ASD who all screened negative on the M-CHAT at 24 months. Nevertheless, although sensitivity is not as high as in other studies, it is still higher than when screening relies on non-standardized strategies (Sand 2005). Specificity and NPV values were high in the current sample, meaning that children who screened negative on the M-CHAT were unlikely to present ASD. However, it is important to stress that sensitivity, specificity, and NPV values should be considered with caution. This is particularly true for sensitivity and NPV values, which both represent upper-bound estimates of the “true sensitivity” and “true NPV” of the M-CHAT, since any undetected false negative case would lower both. Although we attempted to follow as many as possible of the children who initially screened negative on the M-CHAT, only 70% of the sample was followed up until 30 months and 35% until 36 months. None of these children were identified as false negative cases but a possibility of false negative cases remains among the children we were unable to follow.

To further investigate the response pattern, we analyzed the frequency with which each item was failed in the different groups. We found that, before the FUI, all items were significantly more frequently failed in the group of children with ASD, except item 1 (physical play), item 10 (eye contact) and item 11 (sensitivity to sound), which were failed equally often by ASD and non-ASD children. Atypical eye contact is a cardinal feature of ASD emerging early in life. However, in our sample, none of the children with ASD failed item 10 while this item was failed by 2% of non-ASD children. This intriguing result could be due to the content of the question, which does not emphasize the communicative nature of eye contact. In the revised version of the M-CHAT (Robins et al. 2014), the authors rephrased this item to include examples of communicative contexts for further clarity (e.g. talking to him/her, playing with him/her, dressing him/her).

Limitation

One important limitation of this study relates to the low number of children screened at 24 months. This mostly results from the low participation of primary care providers. Of the 175 pediatricians that agreed to receive a 2-h training course, only 17 took part in this study. This indicates the challenges to overcome in order to facilitate uptake of ASD screening in the health care system. The reasons for this low participation rate remain unknown but could include logistic challenges, lack of time, or not feeling comfortable with the tools (Dosreis et al. 2006; Ip et al. 2015; Pinto-Martin et al. 2008; Zwaigenbaum et al. 2015). Factors influencing pediatricians’ participation in ASD screening still need further investigation (Ip et al. 2015; Zwaigenbaum et al. 2015). This area of research will be important in order to develop effective strategies that could enhance ASD screening in pediatric settings. In the current study, the introduction of the screening program was best followed in daycare centers. Nevertheless, its implementation in such settings faces important drawbacks since only 16.5% of children under 3 years of age benefit from daycare center services in France (Observatoire National de la Petite Enfance 2014).

In addition, the hypothesis of a selection bias among pediatricians may be considered. Of the 298 children screened by pediatricians, 11 were ultimately diagnosed with ASD. Given this high proportion of children with ASD, it is likely that pediatricians did not administer the M-CHAT on a routine basis to every child who came to their office. The M-CHAT was more likely used as a confirmation test when pediatricians had concerns about a child. Therefore, there was a higher probability for these children to present a risk of ASD. This underlines the importance of emphasizing that ASD screening should be systematic in pediatrician practice (Dosreis et al. 2006; Radecki et al. 2011).

Conclusion

This study is the first to validate an ASD-specific screening instrument, the M-CHAT, for young children in France, providing French practitioners with guidelines regarding its use in primary care settings. Our results add to the evidence that use of the M-CHAT alone is not recommended in a low-risk, general population. The FUI should systematically be administered to children who initially screen positive. With this 2-step screening procedure, the M-CHAT is an effective and useful screening instrument that correctly identifies the majority of children with ASD. The M-CHAT has the potential to facilitate early screening of ASD in primary care settings. However, our results also suggest that further studies are needed to investigate the feasibility and acceptability of the M-CHAT in clinical practice in order to identify factors that would encourage ASD screening. Ultimately, this would help guide appropriate political decisions regarding the implementation of ASD screening in France.