Introduction

Autism was previously believed to be a very rare disorder affecting only 2–4/10,000 children, but several recent prevalence studies report rates above 1% (Baird et al. 2006; Honda et al. 2005; Sumi et al. 2006). Changes in diagnostic criteria, increased awareness and the recognition of autism as a spectrum disorder are claimed to be the main reasons for the dramatic rise in autism spectrum disorder (ASD) prevalence (Fombonne 2003; Gillberg and Wing 1999; Williams et al. 2006a, b). As the diagnostic process for ASD is both time consuming and a highly specialized task, the increasing number of children suspected of suffering from an ASD puts the services available under pressure. This in turn has increased the need for efficient ASD screening tools. A consequence of reconsidering ASD as a spectrum across intellectual functioning and comorbid conditions is the great heterogeneity in the way ASD symptoms are presented. This heterogeneity poses a considerable challenge for the development of effective ASD screening instruments. In addition, the low prevalence of disorders (e.g. ASD) in general populations causes the positive predictive value (PPV) of a test to drop (Clark and Harrington 1999; Haynes et al. 2006) in spite of high specificity (the capacity to yield a negative result for a person not having the target condition) and sensitivity (the capacity to yield a positive test result for a person with the target condition).
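The dependence of PPV on prevalence follows directly from Bayes' rule. The sketch below is only an illustration: the function name and the sensitivity, specificity and prevalence values are assumptions chosen for the example, not estimates from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: P(condition | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test with sensitivity and specificity both 0.90,
# applied at three assumed prevalence levels.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With the test properties held fixed, the PPV falls as the condition becomes rarer, because false positives from the large unaffected group come to outnumber true positives.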

Several screening devices validated in different settings are now available, but there is still no ASD screening test fully validated as a general population screen for school-age children (Williams and Brayne 2006). Only the Childhood Asperger Syndrome Test (CAST: Scott et al. 2002) and the Autism Spectrum Screening Questionnaire (ASSQ) (Ehlers and Gillberg 1993; Williams et al. 2005) were specifically developed to screen for ASDs in mainstream primary schools. The CAST has shown very good sensitivity and specificity, but only moderate test–retest reliability. Other problems with the validation of the CAST include the low response rate and the lack of case ascertainment procedure for screen negative children (Allison et al. 2007; Scott et al. 2002; Williams et al. 2006a, b, 2005). One major shortcoming of the CAST is that some items concern early development, and the CAST can therefore not be used by teachers or other educational professionals, only by parents.

The ASSQ was developed to screen school children for Asperger Syndrome (AS) (Ehlers and Gillberg 1993), but was later relabelled Autism Spectrum Screening Questionnaire as it appeared to work well as a screen for other ASDs too. Although the ASSQ consists of only 27 items, it has been shown to be both valid and reliable, with good sensitivity and specificity in clinical settings (Ehlers et al. 1999). It has further been shown to have good internal consistency and a stable three-factor structure (Posserud et al. 2008b). The ASSQ could be appropriate for use as a population screen, as it is short and suitable for completion by both parents and teachers. Although it has been used in this way (Bilenberg et al. 2005; Ehlers and Gillberg 1993; Mattila et al. 2007; Posserud et al. 2006), data on specificity and sensitivity in this setting are still lacking.

Informant Differences

Low to moderate agreement between parents and teachers is a general finding in reports on child psychiatric symptoms (Achenbach et al. 1987; Kumpulainen et al. 1999; Touliatos and Lindholm 1981), and this has been interpreted as reflecting true differences in behaviour according to setting. In autism research, however, low agreement has often been viewed as a sign of poor validity of the instrument. Although an ASD assessment in clinical work usually includes gathering information from various sources, little work has been done on informant specific contribution in autism research. Studies analyzing the potentially very important effect of informant in studies of autistic symptoms show discrepant results, and conclude that the issue should be more extensively investigated (Ehlers et al. 1997; Hughes et al. 1997; Konstantareas and Homatidis 1989; Szatmari et al. 1994). When screening for autism, typically only one informant (parents or teachers) is involved in the first questionnaire phases. However, from the knowledge we have of autism and other mental health problems, it is possible that asking both parents and teachers might improve case finding. Parents and teachers see the same child in very different settings; teachers have more opportunities to see the child interact with other children, whereas parents see more of the child in a one-to-one interaction. A high functioning child with ASD may have subtle difficulties that may be hard to ascertain unless the child is being stressed (Wing 1996). He or she may function very well in a one-to-one setting with adults, but fail completely in the unstructured recesses in school. Studies show that parents often have autistic traits, or the broader autism phenotype (Bolton et al. 1994; Lainhart et al. 2002), and they may therefore not be able to identify the social impairment, whereas teachers may miss the passive and withdrawn child with ASD. 
In a recent study using ASSQ, only 24% of AS cases were identified by both parents and teachers (Mattila et al. 2007), further indicating that using both parents and teachers as informants is necessary for optimal case finding.

The aim of the present study was to evaluate the properties of the ASSQ as a general school population screen for ASD. The ASSQ was used as part of a large total population screen for mental health problems in the Bergen Child Study (BCS), where both parents and teachers filled in the questionnaire (Heiervang et al. 2007; Posserud et al. 2006). A population derived sample of ASSQ screen positive children and controls was assessed for the presence of ASD diagnoses. We explored the effects of informant and different screening cut-offs on sensitivity and specificity, with the aim of identifying the most efficient use of the ASSQ in the general population, e.g. public schools and primary health care.

Method

The BCS—First Phase

The screening for ASD was part of a larger longitudinal study—the BCS—assessing mental health among children in Bergen (N = 9430). The initial wave of the BCS consisted of three phases, which are described in detail in other publications (Posserud et al. 2006; Heiervang et al. 2007). The first phase consisted of a questionnaire including the ASSQ and other questions, sent to both parents and teachers of all children aged 7–9 in Bergen. Children whose parents gave informed consent to participate and had a matching teacher ASSQ questionnaire (N = 6609) were eligible for the second and third BCS phases, and are referred to as the identified sample.

Screening Criteria

The cut-off for the ASSQ screen was set at the 98th percentile, i.e. scores exceeding 18 points on parent ASSQ and 15 points on teacher ASSQ. Two hundred and twelve children in the identified sample scored above this cut-off on parent and/or teacher ASSQ, and were defined as ASSQ screen positive.
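The screening rule above can be written as a short function. This is a minimal sketch: the function and constant names are hypothetical, and only the two cut-off values come from the text.

```python
PARENT_CUTOFF = 18   # parent ASSQ scores exceeding this are screen positive
TEACHER_CUTOFF = 15  # teacher ASSQ scores exceeding this are screen positive


def is_screen_positive(parent_score, teacher_score):
    """Screen positive if the parent and/or the teacher score exceeds its cut-off."""
    return parent_score > PARENT_CUTOFF or teacher_score > TEACHER_CUTOFF


# Illustrative score pairs (not study data).
print(is_screen_positive(19, 3))   # parent score above cut-off
print(is_screen_positive(18, 15))  # both scores exactly at the cut-off
print(is_screen_positive(5, 16))   # teacher score above cut-off
```

Because the rule uses "exceeding", a score exactly equal to the cut-off does not count as screen positive.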

BCS—Second Phase

The second phase of the BCS comprised the Development and Well-Being Assessment (DAWBA) parent interview (Goodman et al. 2000). The DAWBA is a structured interview including open-ended questions; the interview can be performed by lay interviewers, or on the internet, and is scored by a trained child psychiatrist (www.dawba.com). The later versions of the DAWBA have a scoring algorithm for the common ASD categories from the DSM-IV/ICD-10.

Parents of all ASSQ screen positive children were invited to the DAWBA, and 87 interviews were performed (41% of invited). As part of the overall study, 938 ASSQ screen negative children were also interviewed with the DAWBA (Fig. 1).

Fig. 1

Flow-chart for the three BCS study phases. Phase 1 boxes are in white, Phase 2 in light gray and phase 3 in dark gray. The children going directly from P1 to P3 were invited extra due to their being part of a separate study of children with chronic physical illness (N = 19)

BCS—Third Phase: ASSQ Validation Sample

The third phase of the BCS was designed to resemble a clinical psychiatric assessment, including Kiddie-SADS (www.kiddiesads.com) (Kaufman et al. 1997), WISC-III (Wechsler 1992), Movement ABC and additional tests of cognitive function. Two hundred and ninety-seven children from the identified sample were assessed in the third phase (4.5%). There were 194 boys (65%) and 103 girls (35%), mean age at assessment was 9.5 years, and mean full scale IQ (FSIQ) was 89 (N = 296; one boy was not testable).

The sample in the third phase can best be described as a quasi-case-control group of children. To provide a validation sample that was both clinically meaningful, but that could also provide indications of the ASSQ sensitivity and specificity in a general population sample, the children in the third phase consisted of a large high-risk group of children (either screen positive on the ASSQ or other questions, and/or having received a diagnosis in the DAWBA) and a large control group of screen negative children.

All children who were ASSQ screen positive and participated in the second phase were invited to the third phase (N = 54, 62% of invited), along with all children who received any diagnosis in the DAWBA (N = 59, 67% of invited). A random sample of ASSQ screen negative children from the second phase was also included. This group consisted of both children whose questionnaires were completely negative on all questions, not only the ASSQ (N = 86, 62% of invited), and of children who were screen positive on other parts of the questionnaire (N = 77, 58% of invited). As part of another study, all children with a chronic physical illness (any disorder, including physical disability, epilepsy, asthma and allergy) were invited directly from the first phase to the third phase (if they had not already participated in the second phase) to increase their participation (see Fig. 1). See Hysing et al. (2007) for further details on the chronic disorder study. Although these children were recruited through a different pathway, they were included in the ASSQ validation sample as well, both because children with chronic neurologic disorders have an increased risk of having an ASD, and because their numbers were relatively small.

The ASSQ validation sample thus constituted a very broad and varied group of children, derived from a total population sample but oversampled for case-ness, and oversampled for ASSQ screen positive children. Therefore, the ASSQ validation sample was expected to have a higher prevalence of both ASD and other mental health problems being relevant confounders of ASD, but it also included a large number of screen negative children to be able to make estimates of sensitivity and specificity representative of a population setting.

ASSQ Screen Positive Children

Including two ASSQ screen positive children with a chronic disorder who came directly from the first phase, the sample comprised in total 56 ASSQ screen positive children, corresponding to 26.4% of all ASSQ screen positive children from the identified sample. The main level of non-response was from the first to the second phase, where only 87 of 212 invited parents came for assessment. Attrition analyses to identify possible selection bias did not show any significant differences between responders and non-responders from the first to the second phase (Heiervang et al. 2007) or from the second phase to the third phase (Posserud et al. 2008a).

ASD Diagnostic Procedure—Third Phase

A certified interviewer (M.P.) assessed 48 of the 56 ASSQ screen positive children in the third phase with the Diagnostic Interview for Social and Communication Disorders (DISCO, tenth revision) (Wing et al. 2002). The remaining eight ASSQ screen positive children were not interviewed with the DISCO due to the parent/child not wanting to come back for assessment after the Kiddie-SADS interview (N = 1) or impracticality arranging another interview day (N = 7). In all these cases, records were reviewed closely by the first author and in addition either the child was seen by the first author or discussed with the Kiddie-SADS interviewer. To ensure the finding of possible false screen negative cases, all Kiddie-SADS interviewers were instructed to refer children with suspected social difficulties on the basis of the Kiddie-SADS results to the first author. Two ASSQ screen negative children were referred for DISCO assessment according to this procedure. In order to reduce interviewer bias, a random selection of ASSQ screen negative children (N = 14) was assigned for DISCO assessment. All DISCO interviews were conducted blind to the previous assessment and ASSQ score. In total 64 DISCO interviews were completed. The first author interviewed and/or interacted with all children whose parents took the DISCO interview, and in 48/64 children she was the Kiddie-SADS interviewer as well.

The DISCO is an interview for the systematic gathering of information that enables the interviewer to make a diagnosis within the autistic spectrum (Leekam et al. 2002; Wing et al. 2002). Similar to the Autism Diagnostic Interview-Revised (ADI-R) (Lord et al. 1994) it involves a diagnostic scoring algorithm to produce diagnoses according to the ICD-10, DSM-IV or other diagnostic criteria, but in contrast to the ADI-R the interviewer is expected to score all the items on the basis of all available information, including observation of the child. Using the DISCO is therefore conceptually similar to a combined diagnostic procedure involving both ADI-R and the Autism Diagnostic Observation Schedule (ADOS) (Lord et al. 2000). In addition, the DISCO contains items on early development and a section on activities of daily life and thus gives the interviewer some idea of the level of functioning in several different aspects of daily life, not only social functioning and communication.

Diagnoses

For the purpose of this study the concept of ASD comprised Autistic Disorder/Childhood Autism, Asperger’s Disorder/Syndrome and Pervasive Developmental Disorder—Not Otherwise Specified/Atypical Autism (DSM-IV, ICD-10). Children with clear autistic traits but where there was conflicting evidence of symptom levels between the parental report and the clinical observation (such that the symptoms reported were too scarce to diagnose an ASD), were classified as broader autism phenotype (BAP) (Bolton et al. 1994). Mental retardation (MR) was defined as a WISC-III FSIQ < 70 (American Psychiatric Association 2000). All the information from observation, testing, Kiddie-SADS and DISCO was used to make the final diagnosis. All the protocols were discussed individually in detail with the last author (C.G.) before final diagnosis was made.

Analyses

Receiver operating characteristic (ROC) analyses were performed to assess the discriminating power of the ASSQ (Hanley and McNeil 1982). ROC area under the curve (AUC) was calculated for ASD including BAP. Sensitivity, specificity, positive predictive value (PPV) and negative predictive values (NPV) were calculated for the recommended cut-off from the ROC. The sensitivity, specificity and ROC for the DAWBA were calculated for the children assessed in the second phase (N = 1025), for ASD versus no ASD as the DAWBA does not include any measure of the broader autism spectrum. Sensitivity and specificity for DAWBA ASD were calculated for the screening criteria used in the study (the 98th percentile) only because the validity of a DAWBA generated ASD is still not settled (Posserud et al. 2008a, b).

Results

The DAWBA Interview and ASD

Ten of the 87 ASSQ screen positive children interviewed in the second phase were diagnosed as having an ASD, versus none among the 938 ASSQ screen negative children, resulting in sensitivity of 1.0 and a specificity of 0.92 for the combined teacher and parent 98th percentile cut-off on the ASSQ vs. a DAWBA diagnosis of ASD. Looking at the ROC for parent and teacher ASSQ versus a DAWBA diagnosis, the AUC was very high for parents (0.98, 95%CI 0.97–0.99), and slightly lower for teachers (0.93, 95%CI 0.87–1.0) (see Fig. 2).
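The reported sensitivity and specificity can be checked from the counts given above. The sketch below is a plain 2×2 calculation; the function name is hypothetical, and the PPV and NPV it also returns are derived arithmetic from the same counts rather than figures reported for the DAWBA comparison.

```python
def screening_metrics(true_pos, false_pos, false_neg, true_neg):
    """Standard 2x2 screening statistics from cell counts."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Counts from the text: 87 screen positive children, of whom 10 had a DAWBA ASD;
# 938 screen negative children, of whom none had a DAWBA ASD.
dawba = screening_metrics(true_pos=10, false_pos=87 - 10, false_neg=0, true_neg=938)
for name, value in dawba.items():
    print(f"{name}: {value:.2f}")
```

The specificity of 0.92 corresponds to 938 of the 1,015 children without a DAWBA ASD being screen negative.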

Fig. 2

Receiver operating characteristic curves for ASSQ versus a DAWBA generated ASD

The DISCO Interview and ASD

Fourteen children were found to have a DISCO diagnosed ASD; six children were diagnosed with childhood autism (one girl), six children with Asperger’s syndrome (one girl), and two with atypical autism. In addition, nine boys were classified as BAP. Two children with ASD were ASSQ screen negative, whereas the remaining 12 children with ASD and the 9 children with broader spectrum were ASSQ screen positive on either parent and/or teacher ASSQ (Table 1).

Table 1 ASSQ screen status versus diagnosis

Several of the children with ASD had MR, other psychiatric disorders and epilepsy. MR, disruptive behaviour disorders (defined as K-SADS any disorder of oppositional defiant disorder, conduct disorder, and/or attention deficit/hyperactivity disorder, ever), affective and anxiety disorders (defined as K-SADS any affective and/or anxiety disorder, ever) were also very common among the false ASSQ screen positive children (Table 2). Only four out of the 35 ASSQ false screen positive children had neither low intellectual ability nor any psychiatric disorder. Table 2 presents an overview of comorbid problems found in the children with ASD, BAP and children who were false positive on the ASSQ.

Table 2 Clinical characteristics

Screening Properties of the ASSQ

Receiver operating characteristic (ROC) analyses were performed to assess the discriminant power of the ASSQ in distinguishing ASD (including BAP) from non-ASD cases. Figure 3 shows ROC curves for the ASSQ versus a diagnosis of ASD/BAP for parent ASSQ, teacher ASSQ, and for the combined ASSQ, defined as the highest ASSQ score on either parent or teacher ASSQ. The area under the curve (AUC) indicated strong discriminant ability for both parent and teacher ASSQ with ROC AUC of 0.90 (95% CI 0.85–0.95) and 0.89 (95% CI 0.81–0.97), respectively. Combining parent and teacher ASSQ resulted in even better screening properties, with AUC of 0.94 (95% CI 0.89–0.98) for the combined score. Table 3 shows the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for the recommended cut-offs from the ROC analyses (Fig. 3).
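As an illustration of how the combined score and the AUC relate, the sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counted as one half), which is the Wilcoxon interpretation discussed by Hanley and McNeil (1982). The function names and all score pairs are hypothetical; the example does not reproduce the study data or the software used for the reported analyses.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def combined_score(parent_score, teacher_score):
    """Combined ASSQ: the higher of the parent and teacher scores."""
    return max(parent_score, teacher_score)


# Hypothetical (parent, teacher) ASSQ score pairs, for illustration only.
cases = [(20, 9), (6, 22), (17, 17)]
controls = [(3, 1), (10, 4), (2, 12), (0, 0)]

parent_auc = roc_auc([p for p, _ in cases], [p for p, _ in controls])
teacher_auc = roc_auc([t for _, t in cases], [t for _, t in controls])
combined_auc = roc_auc(
    [combined_score(p, t) for p, t in cases],
    [combined_score(p, t) for p, t in controls],
)
print(parent_auc, teacher_auc, combined_auc)
```

Taking the maximum lets a case flagged by only one informant still rank above the non-cases, which is one way combining informants can raise the AUC.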

Table 3 ROC-AUC, sensitivity, specificity, PPV and NPV of the ASSQ
Fig. 3

Receiver operating characteristic curves for ASSQ versus a DISCO generated ASD

Informant Agreement

Informant agreement to define a child as ASSQ screen positive in the third phase was low with κ = 0.20, i.e. identical to the findings from the population sample (Posserud et al. 2006). The ratio of ASD among children whom both informants identified as ASSQ screen positive was nonetheless very high; five of the nine children with both parent and teacher ASSQ above the 98th percentile were classified as ASD (four children) or BAP (one child).
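For readers less familiar with the kappa statistic, the sketch below shows how Cohen's kappa is obtained from a 2×2 agreement table between two informants. The function name and the cell counts are hypothetical and chosen only to illustrate the calculation; the full parent-by-teacher table for this sample is not given in the text.

```python
def cohen_kappa(both_pos, parent_only, teacher_only, both_neg):
    """Cohen's kappa for two raters making a positive/negative decision."""
    n = both_pos + parent_only + teacher_only + both_neg
    observed = (both_pos + both_neg) / n
    parent_pos = (both_pos + parent_only) / n
    teacher_pos = (both_pos + teacher_only) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = parent_pos * teacher_pos + (1 - parent_pos) * (1 - teacher_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not study data.
kappa = cohen_kappa(both_pos=10, parent_only=20, teacher_only=20, both_neg=50)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so two informants can agree on most children overall and still show a low kappa when they rarely agree on who is positive.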

Table 4 shows the 23 children diagnosed with ASD or BAP according to screen status and informant. Teachers identified more of the children, but the difference was mainly in the identification of the spectrum cases, where the majority of children with BAP (7 out of 9) were identified only by teachers. When children with BAP were excluded, parents and teachers were equally important for identifying ASD cases. Interestingly, neither of the two girls was identified by teachers.

Table 4 ASD screen status by informant of the children with ASD/BAP diagnosis

Discussion

The ASSQ Screen

We found that the ASSQ had good screening properties for all ASD in a total population sample. Using the combined criteria of parent and/or teacher ASSQ score above cut-off gave the optimal screening properties of both high sensitivity and high specificity. We found that more than 90% of the children (21 out of 23) who received a diagnosis of autism or broader autism phenotype were screen positive on either parent and/or teacher ASSQ using the 98th percentile, corresponding to a teacher ASSQ above 15 points and a parent ASSQ score above 18 points. ROC analysis indicated that the ASSQ worked very well, with high AUC for both parents and teachers, and especially when combining the two informants. The optimal cut-off indicated from the ROC curve was ≥17 on either parent or teacher questionnaire, corresponding to an estimated sensitivity of 0.91 and specificity 0.86.

Although the ASSQ was designed to capture ASD in higher functioning individuals, we found that the ASSQ also was efficient in capturing ASD in low functioning children. MR and borderline intellectual functioning (i.e. IQ 70–85) were the most common findings among false positive children, probably due to the general instruction to rate the child compared to typically developing peers.

Issues Concerning Using Parent Versus Teacher as Informants

In line with the hypothesis that parents may have difficulties appreciating more subtle social difficulties in their children, we found that only two out of the nine children with BAP were identified through the parental screen.

There were only two girls with ASD (and no girls among the BAP cases) in our sample. One of the girls was parent screen positive, and neither was teacher screen positive. The numbers were small, and the high sex ratio may be pure coincidence. However, in the questionnaire phase of the BCS, teachers reported generally very low scores for girls, indicating that girls with ASD would mainly be identified through parental screen (Posserud et al. 2006). The combination of girls being identified through parental ASSQ and the finding that subtle ASD problems were mainly identified through teacher ASSQ may thus explain the lack of girls with BAP in the third phase.

The complete agreement between the parent ASSQ screen and the parent DAWBA ASD seen on the ROC supports the notion that a DAWBA ASD relies entirely on the parental report. Therefore, the validity of a DAWBA generated ASD seems to be limited to the validity of the parental report concerning ASD problems (Posserud et al. 2008a).

Conclusions

The ASSQ is an effective screening tool for all ASD in a general population setting. Whenever possible, both parents and teachers should complete the ASSQ, as the best screening properties were obtained by combining teacher and parent information. The present study suggests that the cut-off should be ≥17 points for both informants. Using the same screening criteria may however work less well for girls with ASD, and the results indicate that gender specific screening criteria could be warranted.

Strengths and Limitations

The main limitations are the complicated design of the study with non-random sampling of ASSQ screen negative children, and the relatively small number of children diagnosed with ASD. The ASSQ validation sample is not a perfect representation of the general population, as it included more children with high ASSQ scores and other mental health problems than would be expected in a general population. However, differences in prevalence of the target disorder between samples do not alter test properties, whereas differences in severity of the condition do (Haynes et al. 2006). As all children with a diagnosis and all ASSQ screen positive children were invited to participate, it is probable that the children seen in the third phase had conditions in all degrees of severity, as in a general population. Also, the analyses of non-response bias from the first phase to the second phase showed no attrition effects here (Heiervang et al. 2007) nor from the second to the third phase (Posserud et al. 2008a).

Another limitation of the study was not applying the gold standard (the DISCO interview) to all children in the validation sample, introducing a potential verification bias. To minimize this bias, we ensured that all Kiddie-SADS interviewers contacted the first author whenever suspicion of ASD or autistic traits arose. The two cases of ASD that were found among the ASSQ screen negative children indicate that this strategy worked well.