Introduction

Commitment to the early detection of Autism Spectrum Disorders (ASD) in children has resulted in more children being identified in the pre-school period [1]. However, there is still a gap of 18 months between parents’ first concerns and the definitive diagnosis [2]. This diagnostic delay increases parental distress [3, 4] and delays treatment, which is unfortunate because there is evidence that early intervention leads to better outcomes. Studies have shown that children with ASD make greater advances when they start treatment before 4 years of age than at an older age [5, 6], and early intervention is associated with language gains, improved social behavior, and decreased symptoms of autism [6].

Despite the importance of identifying and treating ASD early, it remains difficult to detect ASD in young children. Clinical features such as failure to develop peer relationships and lack of varied pretend play emerge over the first years of development and cannot be used as diagnostic criteria at an early age, and symptoms of repetitive movements, such as flapping or rocking, may be present in typically developing infants [7]. In addition, the characteristics of autism change over time and can become more obvious and severe, or more subtle and difficult to detect. The need for screening instruments to facilitate the early recognition of autism and related disabilities has long been recognized. The first instrument developed for this purpose was the Checklist for Autism in Toddlers (CHAT) [3]. Based on the Theory of Mind construct and joint attention, the CHAT was designed to screen for autism in children at 18 months of age in the general population. It includes nine parent-report items combined with five observational items that are completed by a home visitor. Researchers in the USA subsequently modified and extended the instrument to a 23-item parent-report questionnaire [8]. The Modified Checklist for Autism in Toddlers (M-CHAT) incorporates the original nine CHAT parent-report questions and additional items, and eliminates the practitioner observation component of the CHAT. Another instrument was recently developed for early screening: the Early Screening of Autistic Traits (ESAT) [9]. This questionnaire consists of 14 yes–no items and was developed as a primary screen to identify children at risk of ASD in the general population at 14 months. The ESAT is an empirically based, bottom-up constructed questionnaire, which means that it is based on a review of prominent early signs and symptoms of ASD as reported in earlier studies. In this respect it differs from a top-down instrument such as the CHAT, whose key screening items were chosen to fit a conceptual model.

Although these screening instruments have been tested and validated in high-risk samples and in unselected samples of children of different ages, their psychometric properties as primary screening instruments for children aged 18 months in the general population have not yet been fully investigated or replicated. Since there is still a gap between parents’ first concerns and clinical evaluation, it is of interest to know if a screen-positive result is related to other parental worries, behavioral and medical problems and referrals. Previous studies by Glascoe [10, 11] and Tervo [12], for example, found that parental concerns relate directly to their child’s wellbeing and development. Also, parental judgment about whether or not to comply with professional recommendations reflected a fairly accurate estimate of the severity of their child’s autistic symptoms [13]. However, there are some complications that add to the difficulties in early identification of, and intervention for, children at risk. For example, health providers may minimize or dismiss parental concerns [14]. To complicate things further, previous studies found that when parental suspicions of problems or symptoms are not systematically elicited, more than 40 % of parents do not reveal them [10]. When parents did express their concerns, this did not increase the likelihood of referral to diagnostic and treatment services from a primary physician [15]. Using a standardized tool or instrument improves the exchange of information between parents and health care providers. Therefore, this study focuses on the very important stage before clinical assessment, by addressing parental concerns, behavioral and medical problems and early referral status in relation to screening outcomes on the ESAT and M-CHAT.
If the ESAT or M-CHAT strongly differentiates between the degree and content of parental concerns for children who screen positive and children who screen negative, this would contribute importantly to the validity of both instruments in being able to identify children at high risk for autism spectrum disorders or other serious developmental problems at an early age. In addition, there have been no head-to-head comparisons of the two instruments in the same population.

Here, we report results from a large Norwegian prospective pregnancy cohort study that included the ESAT and M-CHAT on the 18-month questionnaire. We investigated: (1) proportions of children that screened positive for ASD on the ESAT or the M-CHAT; (2) the extent to which screening positive on the ESAT and M-CHAT is associated with clinical referral by 18 months; and (3) whether screening positive on the ESAT or M-CHAT is associated with other aspects of children’s development, health, and behavior.

Methods

Design

This study was part of an ongoing, prospective pregnancy cohort study of more than 100,000 births in Norway, the Norwegian Mother and Child Cohort Study (MoBa) [16–18]. The MoBa cohort was initiated in 1999 and had included 100,000 pregnant women by April 2008. Pregnant women at participating hospitals were recruited to the study through a postal invitation in connection with the routine ultrasound examination offered to all pregnant women in Norway at 17–18 weeks of gestation (http://www.fhi.no/morogbarn). Overall, 44 % of those invited agreed to participate; informed consent was obtained from all participants. Information on the health and lifestyle of the pregnant women, their partners, and, subsequently, their children was collected by means of questionnaires during pregnancy and post partum. The current study is based on the quality-assured data files released for research version II. The Regional Committee for Medical Research and the Norwegian Data Inspectorate approved the study. In this study, we used the data collected with the questionnaire that mothers completed when their child was 18 months.

Participants

The study population was a subsample of the children from the MoBa study. The two inclusion criteria were that the child became 18 months during the inclusive period 25 July 2003 to 29 March 2005, and that the mother had completed the questionnaire when the child was 18 months. Of the 16,919 children that met the first criterion, 13,015 also met the second criterion (response rate for completion of the questionnaire at 18 months was 76.6 %). Sixty-seven children were excluded because all M-CHAT and/or ESAT data were missing, leaving 12,948 questionnaires for analysis. Non-systematic missing data on items were coded as system missing, did not add to the total scores and did, therefore, not contribute to a possible screen (positive) status. Cases were included if they had at least one item of the ESAT and M-CHAT filled in. M-CHAT and ESAT data were complete (all items filled in) for 11,952 of the 12,948 children (12,102 cases on M-CHAT items and 12,666 cases on ESAT items). The mean age of the children at completion of the questionnaire was 18.53 months (SD 0.54). The sample consisted of 6,616 boys (51.1 %), 6,290 girls (48.6 %), and 42 children whose sex was not reported (0.3 %). In the total sample, 91.8 % of the children had parents who were both native Norwegian speakers, and 87.7 % had grandparents who had Norwegian as mother tongue.

In a population-based design, it is of great importance to know if the sample is a good representation of the general population. For this study, family characteristics that are related to the age of identification of ASD are of special interest, such as parents’ socio-economic status [19]. In the current study, the gross income of parents was divided into income groups with a range of 99,000 NOK. The median gross income group of the sample (including child support, unemployment benefits, and other allowances) was NOK 200,000–299,000 (€27,000–40,000) for mothers and NOK 300,000–399,000 (€40,500–54,000) for fathers, which was higher than that of the Norwegian population overall, since the median income in Norway 2003 was NOK 186,500 (€25,000) for women and NOK 285,600 (€38,000) for men [20].

Procedure

When the child was 18 months, the mother was sent a questionnaire on maternal and child health. If the mother did not return the questionnaire within 3 weeks, a written reminder was sent. All forms were scanned and the data were quality controlled and de-identified before being given to the researchers [16]. The research team was completely separate from any clinical services that were used by parents and their children, and the results obtained from the 18-month questionnaire were not communicated to or used by general practitioners.

Instruments

The 18-month questionnaire included questions about the child’s nutrition, growth, health, illnesses, medications, development, behavior, and everyday life, and about parental health and welfare. The section on child development and behavior incorporated the M-CHAT and the ESAT. The order of the items from the two instruments was not counterbalanced or interspersed. We used a fixed format.

M-CHAT

The M-CHAT is an extension of the Checklist for Autism in Toddlers (CHAT) [3]. The M-CHAT includes the first nine parent-report items of the CHAT and 14 additional items that were created based on hypotheses from the literature, clinical instruments used to evaluate older children, and the clinical experience of the M-CHAT authors. Six M-CHAT items, identified by discriminant function analysis (DFA), are considered essential: protodeclarative pointing, response to name, interest in peers, bringing things to show parents, following a point, and imitation. Children who fail any 3 of the 23 items or 2 of the 6 critical items are considered screen positive. The psychometric properties (sensitivity, specificity, positive and negative predictive power) and reliability of the M-CHAT have been investigated in both low- and high-risk samples aged between 16 and 48 months [8, 21–24]. In these studies, the proportion of the population that screened positive ranged from 5.7 to 14 %; the positive predictive value (PPV) was low (PPV 0.058–0.11), but was increased by telephone follow-up (0.57–0.65) [21, 24]. Pandey et al. [22] reported PPVs after telephone follow-up of 0.28 and 0.61 for younger and older low-risk groups of children, respectively. A recent study by Chlebowski et al. [25] showed that the M-CHAT is an effective screening instrument for ASD when the two-step screening procedure is used, including the follow-up interview to reduce the number of false positives. They also found that children with a total score of 7 or higher can be directly referred for further clinical evaluation and bypass the follow-up interview. Another recent large population study showed that screening with the M-CHAT alone in a general population at 18 months is not effective in identifying the majority of children who ultimately received a diagnosis of ASD, since only a third of the children diagnosed with ASD scored above the cut-off on the M-CHAT at 18 months [26].
For the present study, the M-CHAT was translated (and back translated) into Norwegian from English (the original version). All 23 M-CHAT items are represented in the 18-month questionnaire. The scoring algorithm defines screen positive as failing 3 of the 23 M-CHAT items or 2 of 6 critical items. For the current study, we did not include a telephone interview, which is consistent with the American Academy of Pediatrics’ recommendations on early screening for ASD [27].
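
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the study's scoring code. The function name and the input format are hypothetical, and the critical item numbers (2, 7, 9, 13, 14, 15) are an assumption based on the commonly published M-CHAT numbering; they should be checked against the questionnaire version actually used. Missing items are treated as not failed, in line with the handling of missing data described under Participants.

```python
from typing import Optional, Sequence

# Assumption: 1-based numbers of the six critical items in the commonly
# published M-CHAT numbering. Verify against the version actually used.
CRITICAL_ITEMS = (2, 7, 9, 13, 14, 15)


def mchat_screen_positive(
    failed: Sequence[Optional[bool]],
    critical_items: Sequence[int] = CRITICAL_ITEMS,
) -> bool:
    """Return True if the child screens positive on the M-CHAT.

    failed[i] is True if item i+1 was failed, False if it was passed,
    and None if it is missing. Missing items add nothing to either count.
    """
    if len(failed) != 23:
        raise ValueError("expected 23 M-CHAT items")
    total_failed = sum(1 for f in failed if f is True)
    critical_failed = sum(1 for i in critical_items if failed[i - 1] is True)
    # Screen positive: any 3 of the 23 items, or 2 of the 6 critical items.
    return total_failed >= 3 or critical_failed >= 2
```

For example, a child who fails only items 2 and 7 would screen positive under this rule, whereas a child who fails only two non-critical items would not.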

ESAT

The ESAT was developed to identify ASD in children in the (developmental) age of 0–36 months in the general population. It consists of 14 items and covers the domains of pretend play, joint attention, interest in others, eye contact, verbal and non-verbal communication, stereotypes, preoccupations, reaction to sensory stimuli, emotional reaction, and social interaction. Items of the ESAT are answered with “yes” for typical and “no” for atypical behavior. Children with “no” answers on at least 3 of the 14 ESAT items are screen positive [9]. The ESAT has been evaluated in a two-stage screening procedure in a study involving 14-month-old children (N = 31,724) from the general population who had been prescreened by physicians at well-baby clinics using a 4-item prescreening instrument consisting of the first four items of the ESAT. Children who failed any of the four items were considered prescreen positive, and parents were asked to complete the 14-item ESAT. Children who screened positive on the complete 14-item ESAT were invited for further systematic psychiatric examination. Eighteen of 73 high-risk children were subsequently diagnosed with ASD. The remaining high-risk children had developmental language disorder or mental retardation [28]. The ESAT was recently introduced as a screening instrument and is available in an educational package, ESAT: “screening of ASD at a young age”, containing a set of ESAT questionnaires and a theoretical and practical manual for identifying, screening and diagnosing ASD [29].

For the present study, the ESAT was translated (and back translated) into Norwegian from Dutch (the original version). The complete 14-item ESAT was included in the 18-month questionnaire. The scoring algorithm defines screen positive as failing 3 of the 14 items.
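
As a minimal sketch of the ESAT rule, assuming answers are stored as the strings "yes" (typical) and "no" (atypical), with None for a missing item: the function name and data layout are hypothetical, and missing answers are assumed not to count as failures.

```python
from typing import Optional, Sequence


def esat_screen_positive(
    answers: Sequence[Optional[str]], cutoff: int = 3
) -> bool:
    """Return True if at least `cutoff` of the 14 ESAT items are "no".

    Each answer is "yes" (typical), "no" (atypical), or None (missing).
    """
    if len(answers) != 14:
        raise ValueError("expected 14 ESAT items")
    n_failed = sum(1 for a in answers if a == "no")
    return n_failed >= cutoff
```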

Referral status

The main outcome was referral to developmental services as reported by the parents in the 18-month questionnaire. This included referrals to educational services, child habilitation units, and child psychiatry services. Each service has specific functions: the educational services are for the assessment of eligibility for resources due to learning difficulties; the child habilitation units are for the medical examination and diagnostic work-up for children with neurodevelopmental disorders; and the child psychiatry services are for mental health issues and the diagnostic work-up of behavior problems. All children with disabilities are ultimately seen by the educational services. The special needs of populations of these three different services, therefore, overlap.

Other clinical and developmental characteristics

The health and development of the children that tested positive on the M-CHAT and/or on the ESAT was investigated, using preselected items from the 18-month questionnaire. Medical problems were assessed with four items reflecting aspects of child health often associated with poor development, including worries about physical development, hearing problems, diverging head circumference, and seizures. Up to 30 % of individuals with autism spectrum disorder (ASD) have comorbid epilepsy, while the prevalence of epilepsy in all children is 2–3 % [30, 31]. Motor development was assessed with six items on gross and fine motor development. Although they are not core symptoms of autism, movement disorders and delayed motor development have been associated with ASD [32–34]. Parental concerns were assessed with three items on parental “worries” about the child’s physical development and behavior. Parental concerns about their child’s behavior are associated with mental health issues [10]. However, at least one study has found that parental worries, as scored using the Parents’ Evaluation of Developmental Status (PEDS), were not predictive of M-CHAT screen positivity [23]. Social, emotional, language and communication, and joint attention were assessed with items derived from standard developmental scales such as the Ages and Stages Questionnaires (ASQ) [35]; Emotionality, Activity, Shyness, Sociability Scale (EAS) [36, 37]; Social Communication Questionnaire [SCQ; previously called Autism Screening Questionnaire (ASQ)] [38]; NonVerbal Communication Checklist (NVCC: Schjolberg, In preparation). The source of the items is indicated in Table 2 (because Table 2 consists mainly of results, the table is placed in the Results section).

In this study, all items were binomial and scored in the same direction with a score 0 for a normal answer and score 1 for an abnormal answer. In this way, the percentage of children scoring on an item reflects the percentage of children having developmental, health, and behavioral problems.

Data analyses

First, we investigated the overlap between the M-CHAT and ESAT by comparing the proportion of children that tested positive with either instrument, using the McNemar test [39, 40]. To measure the level of agreement between the M-CHAT and ESAT, a tetrachoric correlation was calculated with the statistical program R [41] using the Polycor package version 0.7-5 [42] and an algorithm for polychoric correlations reported by Drasgow [43] and Olsson [44]. The tetrachoric correlation estimates what the correlation between binary raters would be if ratings were made on a continuous scale [45], and the trait underlying the rating is conceived to be continuous. The tetrachoric correlation is preferable to Kappa if the observed prevalence of responses in one of two available categories is low. Its value is interpreted in the same way as a Pearson correlation [45]. Cohen [46] gives the following guidelines for the interpretation of effect sizes in social sciences: a correlation coefficient of 0.10 is thought to represent a weak or small association; a correlation coefficient of 0.30 is considered a moderate correlation; and a correlation coefficient of 0.50 or larger is thought to represent a strong or large correlation. However, the criteria for the interpretation of a correlation coefficient are in some ways arbitrary and should not be followed too strictly.
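
To make the tetrachoric correlation concrete, the following is a rough standard-library Python sketch, not the polycor routine used in the study. It assumes a latent standard bivariate normal trait, takes the two thresholds from the marginal proportions, and then finds by bisection the correlation that reproduces the observed proportion of double positives. All function names are hypothetical, and the 2 × 2 counts in the usage note are invented for illustration.

```python
import math
from statistics import NormalDist


def _bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = 1.0 - r * r
    return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)) / (
        2.0 * math.pi * math.sqrt(q)
    )


def _upper_orthant(h, k, rho, steps=400):
    """P(X > h, Y > k) under a standard bivariate normal with correlation rho.

    Uses the identity dP/d(rho) = density(h, k; rho), integrated from 0
    (independence) to rho with Simpson's rule (steps must be even).
    """
    nd = NormalDist()
    base = (1.0 - nd.cdf(h)) * (1.0 - nd.cdf(k))
    if rho == 0.0:
        return base
    width = rho / steps
    total = _bvn_density(h, k, 0.0) + _bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _bvn_density(h, k, i * width)
    return base + total * width / 3.0


def tetrachoric(both_pos, only_a, only_b, both_neg, tol=1e-6):
    """Tetrachoric correlation for a 2 x 2 table of two binary ratings."""
    n = both_pos + only_a + only_b + both_neg
    nd = NormalDist()
    # Thresholds on the latent scale from the marginal positive rates.
    h = nd.inv_cdf(1.0 - (both_pos + only_a) / n)
    k = nd.inv_cdf(1.0 - (both_pos + only_b) / n)
    p_both = both_pos / n
    lo, hi = -0.999, 0.999
    # The orthant probability increases with rho, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if _upper_orthant(h, k, mid) < p_both:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With invented counts of 40 double positives, 10 and 10 discordant pairs, and 40 double negatives, both thresholds are zero and the result can be checked against the closed form sin(2π(0.40 − 0.25)) ≈ 0.809. The sketch requires that neither rating is all positive or all negative.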

Secondly, we investigated whether testing positive on the M-CHAT or ESAT was associated with referral to developmental services by 18 months, by calculating PPV and negative predictive values (NPV). The PPV reflects the proportion of children that screened positive that were referred to any of the three developmental services (educational, habilitation, child psychiatry). The NPV reflects the proportion of children that screened negative that were not referred to any of the developmental services. PPV and NPV were calculated separately for the two instruments.

Thirdly, we investigated clinical and developmental differences between the children that screened positive or negative on the M-CHAT and ESAT. The percentage of children in each group whose mothers had indicated that their children had medical, motor developmental, emotional, language and communication, or social interaction problems or showed ‘abnormal’ joint attention and play (see Table 2) was compared, using a z test of two independent proportions. In addition, we compared the referred and non-referred children among ESAT and M-CHAT screen-positive children on the clinical and developmental variables using Chi-square tests or Fisher’s exact test. This could give some insight into the reason why children were referred to developmental services. The statistical analyses were performed with the statistical software package SPSS 20.0.
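
The z test of two independent proportions mentioned above can be sketched as follows. This is the textbook pooled-variance version written with the Python standard library. It is an assumption that it matches the exact variant run in SPSS, and the counts in the usage note are invented.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 == p2 for two independent groups.

    x1 of n1 and x2 of n2 are the numbers with the characteristic.
    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For instance, `two_proportion_z_test(30, 100, 20, 100)` compares a hypothetical 30 % against 20 % and gives z of about 1.63. The test assumes the two groups do not share members, so it would not be appropriate for directly comparing overlapping groups without adjustment.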

Results

Screen-positive scores of the M-CHAT and ESAT

Of the 12,948 children, 71 screened positive on the ESAT (0.5 %) and 826 screened positive on the M-CHAT (6.4 %) (p < 0.01). Cross tabulation of ESAT by M-CHAT screening status showed that 93.5 % of all children screened negative on both instruments, 0.4 % screened positive on both instruments, 0.2 % screened positive on the ESAT but negative on the M-CHAT, and 6.0 % screened negative on the ESAT but positive on the M-CHAT (Table 1). The tetrachoric correlation was 0.685, indicating a strong agreement between the M-CHAT and ESAT.
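
As a hedged illustration of the comparison above, the 2 × 2 cell counts can be derived from the reported totals (71 ESAT positives, 826 M-CHAT positives, 51 positive on both, N = 12,948), assuming every child has a screening status on both instruments. The McNemar statistic then depends only on the two discordant cells. The function below is a plain uncorrected McNemar test using the Python standard library and is not necessarily the exact variant the authors ran.

```python
import math

# Totals reported in the text; cell counts below are derived from them
# under the assumption that no screening status is missing.
n_total = 12948
esat_pos = 71
mchat_pos = 826
both_pos = 51

esat_only = esat_pos - both_pos        # ESAT positive, M-CHAT negative
mchat_only = mchat_pos - both_pos      # ESAT negative, M-CHAT positive
both_neg = n_total - both_pos - esat_only - mchat_only


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) and p-value."""
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = mcnemar(esat_only, mchat_only)
```

The derived cells (20, 775, and 12,102) reproduce the rounded percentages in the text, and the resulting p-value is far below 0.01, consistent with the reported result.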

Table 1 Crosstabs of ESAT screen scores by M-CHAT screen scores

ESAT and M-CHAT screen scores were the highest for the lower income groups and the lowest for the highest income groups. Although these differences were significant, partly due to the large numbers in the study, the magnitude of the differences for the ESAT and M-CHAT between the income groups was very small (maximum difference between income groups ESAT: 0.17; M-CHAT: 0.37), and the correlations between the separate income groups and the screen score of the ESAT and M-CHAT were very low (r min–max = −0.012 to 0.028). Further, no difference was found in the distribution of referred children across the income groups (father and mother); between 0.6 and 2.8 % of the children in each income group had been referred (income mother: χ² = 9.91; p = 0.13; income father: χ² = 8.73; p = 0.27).

Screening status and clinical referral

In total, 184 children from the study population were referred to a developmental service. Of these 184 referred children, 79 (42.9 %) had screened positive on the M-CHAT, 21 (11.5 %) had screened positive on the ESAT, 21 (11.5 %) had screened positive on both instruments and 103 (56.0 %) had screened negative on both the ESAT and M-CHAT. Conversely, 79 of 826 (9.6 %) children that screened positive on the M-CHAT had been referred to developmental services, 21 of 71 (29.6 %) children that screened positive on the ESAT had been referred to developmental services, and 21 of 51 (41.2 %) children that screened positive on both instruments had been referred by 18 months. Figure 1 shows the percentages of children referred to the developmental services (children could be referred to more than one service).

Fig. 1 Percentage of children who screened positive or negative on the M-CHAT, ESAT, or M-CHAT plus ESAT combined that was referred to developmental services. *The designation ‘Not screen positive’ refers to children that screened negative on the M-CHAT in the M-CHAT series, screened negative on the ESAT in the ESAT series, or screened negative on the M-CHAT and/or ESAT in the M-CHAT/ESAT series (all children except the children that screened positive on both instruments). Children could be referred to more than one service.

The PPV of the ESAT (range 0.070–0.296) was higher than that of the M-CHAT (range 0.012–0.096), both for overall referral and for referral to specific services. This difference was most marked for the category ‘Referred to child psychiatry’, where 7.0 % (5 of 71) of children that screened positive on the ESAT had been referred compared with 1.2 % (10 of 826) of the children that screened positive with the M-CHAT (Fig. 1). The PPV of screening with both instruments was consistently high for specific services, and more than 40 % for any service referral. The NPV was high for both instruments (values >0.98).
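
The arithmetic behind these predictive values can be retraced with a short Python sketch. It uses only counts reported in the text (N = 12,948; 184 referred; 79 of 826 M-CHAT positives and 21 of 71 ESAT positives referred). It assumes that every child has both a screening status and a referral status, which may not hold exactly given missing data, so the NPVs in particular are approximate. The function name is hypothetical.

```python
def ppv_npv(n_total, n_screen_pos, n_referred, n_pos_and_referred):
    """PPV and NPV of a screen against referral, from marginal counts."""
    true_pos = n_pos_and_referred
    false_pos = n_screen_pos - true_pos
    false_neg = n_referred - true_pos
    true_neg = n_total - n_screen_pos - false_neg
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Any-service referral, using counts quoted in the text.
mchat_ppv, mchat_npv = ppv_npv(12948, 826, 184, 79)
esat_ppv, esat_npv = ppv_npv(12948, 71, 184, 21)
```

Under these assumptions the PPVs round to 0.096 for the M-CHAT and 0.296 for the ESAT, and both NPVs exceed 0.98, matching the values reported above.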

Other clinical and developmental concerns

For all domains examined (Table 2), a higher percentage of screen positives than of screen negatives was rated as abnormal. This held true for both the M-CHAT and ESAT. For example, 33.3 % of the children that screened positive on the ESAT but only 3.3 % of the children that screened negative on the ESAT showed delayed motor development. Likewise, 15.1 % of the children that screened positive on the M-CHAT but only 2.7 % of the children that screened negative on the M-CHAT showed delayed motor development. Two-sample z tests between independent proportions were performed to determine whether there was a significant difference between the percentages of children screening positive with the ESAT, M-CHAT or both the ESAT and M-CHAT on items concerning medical, motor developmental, emotional, language and communication, or social interaction problems or ‘abnormal’ joint attention and play. On nearly all items, the proportion of children with problems in the ESAT screen-positive group differed significantly from the proportion of children with problems in the M-CHAT screen-positive group. The proportion of children showing clinical and developmental problems was about twice as high among children that screened positive with the ESAT as among children that screened positive with the M-CHAT. No difference was found in the proportion of problems between the ESAT screen-positive group and the group of children that screened positive with both the ESAT and M-CHAT (see Table 2).

Table 2 Percentage of children with additional characteristics, as rated by their parents, expressed in terms of the number of children screening positive or negative on the M-CHAT, ESAT, and M-CHAT plus ESAT combined (N = 12,948)

To gain insight into the clinical and developmental profiles and parental concerns among screen-positive children on the M-CHAT and/or ESAT, we compared those who were referred with those who were not. Overall, a higher percentage of children who were screen positive on the M-CHAT and/or ESAT and were referred to a developmental service (R) showed motor and language problems and ‘abnormal’ joint attention and play compared to screen-positive children who were not referred (NR) (p < 0.05). All percentages for referred and non-referred screen-positive children to any developmental service on the additional clinical and developmental variables are given in Online Resource 1. The percentage of children with behavioral, health and developmental problems among referred ESAT screen positives tended to be higher than among referred M-CHAT screen positives; this seems especially the case for ‘abnormal’ joint attention and play. For the children screening positive on both the M-CHAT and ESAT, the percentages are comparable to the ESAT screen-positive group. Of special interest are the characteristics of the screen-positive children referred to child psychiatry services. Among M-CHAT screen positives, significant differences (p < 0.05) between children referred and not referred to psychiatry services were found only for behavioral problems, delayed or aberrant language, fine motor problems and imperative pointing. For the ESAT and M-CHAT/ESAT screen positives, no significant differences were found for psychiatry services, even though the percentages of these groups are higher than those of the M-CHAT screen-positive group and the percentages of problems for the referred children tended to be higher than for the non-referred children.
For referrals to the Child Habilitation Unit and School Psychology Services, the most marked differences between referred and non-referred children were found for motor developmental, language and joint attention problems, with few or none for behavioral problems.

Discussion

This study is one of the first to use parent-completed (without involvement of health professionals) screening instruments (ESAT and M-CHAT) in an unselected general population. The aims of the study, which involved 18-month-old children, were to explore the differences between the proportions and the overlap thereof of children that screened positive on the ESAT or the M-CHAT, to investigate the association between screening positive on either instrument and referral to developmental services, and to establish the clinical and developmental profiles and parental concerns of children that screened positive on either or both instruments.

Results indicated a significant tenfold difference in the proportion screening positive on the M-CHAT (6.4 %) and the ESAT (0.5 %). These findings may suggest that the M-CHAT (without follow-up interview) is overinclusive and/or the ESAT is too conservative. The prevalence of ASD among older children is estimated at around 0.9 % [47], lower than the M-CHAT screen-positive rate (6.4 %) and higher than that of the ESAT (0.5 %). The relatively high screen-positive rates of the M-CHAT in the current study are in line with previous studies (6–14 %) [21–24]. The psychometric properties of the ESAT have yet to be tested in a general unselected population.

Further analysis revealed differences in service referral between the two instruments. Almost 30 % of the children that screened positive on the ESAT had been referred to a specialist service by 18 months, which was nearly three times higher than the referral rate of children screening positive on the M-CHAT. Among children that screened positive on both instruments, more than 40 % had been referred to a service by 18 months. Of the children that screened positive on either the M-CHAT or the ESAT, most were referred to educational services; few were referred to child psychiatry services. However, the proportion of children referred to child psychiatry services was six times higher among children that screened positive on the ESAT than among children that screened positive on the M-CHAT. This suggests that children who screen positive on the ESAT have more psychiatric problems than children that screen positive on the M-CHAT. Across all developmental services, a higher percentage of ESAT screen-positive children than of M-CHAT screen-positive children tended to have ‘abnormal’ joint attention and play, which is considered a primary marker for early detection of ASD. Results suggest further that the M-CHAT possibly identifies children with (milder) lower intellectual and adaptive functioning. This is in line with another large population study of MoBa, which shows that as many as 7.3 % of the children in the non-ASD group scored above cut-off on the M-CHAT 23-item criterion at 18 months, and that only a third of the children screening positive at 18 months were diagnosed with ASD at a later age [26].

The PPV of the ESAT and M-CHAT for clinical referral at 18 months was moderate for children that screened positive on both instruments, low for children that screened positive on the ESAT, and very low for children that screened positive on the M-CHAT. The PPV of the M-CHAT for clinical referral ranging from 0.012 to 0.096 was comparable with PPVs for ASD reported by Kleinman et al. [21] and Robins [24], namely, 0.11 and 0.058, respectively, for children that screened positive on the M-CHAT. The PPV of the M-CHAT in this study could be higher if we had incorporated the follow-up interview or used the suggested cut-off of 7 which bypasses the follow-up interview [25]. However, using the more conservative cut-off of 7, we would have missed some of the children with screen scores between 3 and 6, who have perhaps milder symptoms, but still have an increased risk for having developmental problems. There are no reports of the PPV of the ESAT for ASD in the general population at very young age. The low PPV we found in this study could reflect the very young age of the sample coupled with the fact that ASD is often diagnosed late.

Interestingly, it seems there were also qualitative differences in the nature of problems that were picked up by the instruments. In all domains of health and development evaluated in the exploratory analysis, a higher proportion of screen positives as compared to screen negatives, endorsed abnormality on other clinical and developmental problems, as rated by their parents. Overall, a higher proportion of children that screened positive on the ESAT than positive on the M-CHAT had these problems. Although not significantly different from ESAT screen positives, parents of children that screened positive on both instruments reported more clinical and developmental concerns in our post hoc analysis. These children also had the highest referral rates for developmental services. This may be because the use of a combination of screening instruments means that there are more questions or items, and that sometimes more than one item addresses a certain aspect or behavior. This could increase the likelihood that parental concerns about developmental problems are addressed. Further, more than one question about a particular aspect or behavior might make parents more aware of the importance of this aspect or behavior. This suggests that investigation of the predictive value of individual items might lead to further optimizing the screening instruments.

In sum, when choosing a screening instrument for the general population, investigators need to weigh false positives against false negatives. If the goal is to identify children at risk who should be closely monitored, it might be better to select too many rather than too few children for follow-up, making the M-CHAT the preferred instrument. However, selecting children with false-positive screening results for further follow-up might cause unnecessary parental anxiety and increase costs, making the ESAT the preferred instrument. Moreover, the ESAT differentiates more than the M-CHAT between children who screen positive and children who screen negative in the degree and content of parental concerns, which strongly supports its validity in identifying children at high risk for autism spectrum disorders or other serious developmental problems. As mentioned before, identifying children at high risk for autism spectrum disorders at an early age with a standardized instrument facilitates communication between parents and health care providers, reduces diagnostic delay and, subsequently, parental distress, and enables early intervention, which leads to better outcomes. Investigation of the predictive value of individual items might be warranted before a recommendation or strategy for the use of the different questionnaires can be given.
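The trade-off between false positives and false negatives can be made concrete with a small sketch. It assumes, for illustration only, that a child screens positive when the screen score is at or above the cut-off, and it uses an invented toy sample of (screen score, outcome present) pairs; neither the scoring rule nor the numbers are taken from this study.

```python
from typing import Dict, List, Tuple


def screen_counts(children: List[Tuple[int, bool]], cutoff: int) -> Dict[str, int]:
    """Tally screening outcomes for one cut-off.

    Each child is a (screen_score, has_outcome) pair. A child is treated as
    screen positive when screen_score >= cutoff (an assumption of this sketch).
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, has_outcome in children:
        positive = score >= cutoff
        if positive and has_outcome:
            counts["tp"] += 1
        elif positive:
            counts["fp"] += 1
        elif has_outcome:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Invented toy sample: (screen score, outcome present).
toy_sample = [
    (0, False), (1, False), (2, False), (3, False), (4, True),
    (5, False), (7, True), (8, False), (1, True), (0, False),
]

for cutoff in (3, 7):
    print(cutoff, screen_counts(toy_sample, cutoff))
```

In this toy sample, raising the cut-off from 3 to 7 reduces the number of false positives but increases the number of false negatives, which is the balance investigators have to weigh when choosing an instrument or a cut-off.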

Limitations

The cross-sectional nature of the study means that we cannot establish whether, for example, parental concerns preceded referral to developmental services, or whether referral triggered symptom detection. Our data are based on parental information only. This could lead to an underestimation of the PPV of both instruments, because it is plausible that concerns increase as children get older. Parents, however, are the main and in almost all cases the sole informants about very young children’s behavior problems, and there is a body of evidence indicating the merit and validity of parent information [10–13].

Final diagnoses (ASD: yes or no; mental handicap: yes or no) are crucial for examining the psychometric properties of the screening instruments. However, outcome measures such as clinical referral, the presence of parental concerns, and behavioral and medical measures are highly relevant and generally accepted proxy measures of “caseness”. These proxy measures can give us better insight into the utility of the screening instruments, pending definitive analyses based on specific clinical diagnoses.

As mentioned earlier, in a population-based design it is of great importance to know whether the sample is a good representation of the general population. Although many contextual factors could play a role, we believe that SES is the basis of this comparison, since SES is itself related to many other contextual factors and parents’ socio-economic status has been found to be related to the age of identification of ASD [19]. However, Norway has relatively small differences in socio-economic status and ethnicity in the population compared with some other countries, and the health and welfare system in Norway is well developed. In the current study, the sample had a somewhat higher SES than the population, there was no relation between SES and referral status, and screen scores were slightly higher in the lower income groups. This means that the screen scores found on the ESAT and M-CHAT could have been slightly higher still if more low-SES families had participated. This effect applies to both instruments and counteracted rather than augmented our finding that the high scores on the M-CHAT items in particular may lead to over-identification of high-risk children. We feel justified in concluding that our results have not been biased to any relevant extent by SES factors.

Strengths

The strength of the study is its population-based design. Moreover, the wealth of questionnaire data made it possible to investigate the screening instruments in relation to other variables related to health and development. Lastly, the longitudinal design of the MoBa mother-and-child cohort study provides the opportunity to determine the sensitivity and specificity of these screening instruments as children grow older and undergo diagnostic assessments. The longitudinal information obtained in the MoBa study might provide insight into population screening and public health requirements for an adequate instrument, and the costs of screening large populations.

Conclusion

Our findings suggest that children who screen positive on the ESAT and the M-CHAT have different profiles in terms of their clinical and developmental characteristics. The ESAT identified fewer children as being at risk of ASD than the M-CHAT, but a higher proportion of children who screened positive on the ESAT were referred to developmental services, and the ESAT tended to identify more children with medical, emotional, language, and behavioral problems. The M-CHAT identified more children at risk of ASD, but these children were less often referred to developmental services and had fewer problems associated with other aspects of development, health, and behavior than the children who screened positive on the ESAT. Since, in a post hoc analysis, the combination of the two instruments appeared to be more effective than either instrument alone in identifying children referred to clinical services at 18 months, further analysis at the level of single items is warranted to improve these screening instruments.