Autism spectrum disorder (ASD) is a neurodevelopmental condition that is defined by persistent difficulties in social communication and interaction, and by patterns of restrictive, repetitive, and stereotyped behaviours and interests (American Psychiatric Association (APA), 2013). Until recently, the diagnosis of ASD has focused predominantly on childhood. Yet there are a significant number of individuals for whom ASD is not diagnosed until adulthood. In particular, adults without intellectual disabilities tend to be overlooked (National Audit Office, 2009).

It is important to note that prior to the publication of the Diagnostic and Statistical Manual Version 5 (DSM-V; APA, 2013), the term Asperger Syndrome (AS) or Asperger’s Disorder was used when an individual reported no significant delay in language or cognitive development in early childhood. This term has since been subsumed by ASD in diagnostic manuals - the DSM-V (APA, 2013) and the 11th revision of the International Classification of Diseases (ICD-11; World Health Organisation, 2019). The debate on terminology used in a diagnostic process is beyond the scope of this article, but it is the view of the authors that the terminology used should reflect the diagnostic manual that was used in reaching a diagnosis. For example, if a prior version of a diagnostic manual such as the ICD-10 is used, the diagnosis should reflect the diagnostic categories and the terminology of the ICD-10. However, for the purposes of this article the term ASD will be used, both when referring to autism in general, and when referring to specific assessment tools and questionnaires that were originally developed for the assessment of AS.

Following an update to the government’s ‘Think Autism’ strategy (Department of Health & Social Care, 2014) to support adults on the autism spectrum in England to get a diagnosis and access support, the service in this study was commissioned in 2014 to meet a gap in services. The service aims to provide diagnosis-only for individuals who are referred specifically for suspected ASD, but with no comorbid intellectual disability, and are not currently being seen by secondary or acute mental health services who might otherwise carry out the assessment. The service is part of the NHS in England.

In England and Wales, the National Institute for Health and Care Excellence (NICE), an executive non-departmental public body that is sponsored by the Department of Health and Social Care, provides evidence-based recommendations for the diagnosis and care management of adults and young people with ASD. Using the NICE guidance is not mandatory, but it is used as a guide in both the NHS and private practice in the UK. Hence diagnostic services in the NHS and private practice will differ or share similarities depending on the extent to which they adhere to NICE guidance. The service in this study was set up to adhere to this guidance.

To provide a thorough assessment of ASD in adults, the NICE (2012) guidelines recommend a combination of structured questionnaires, which are typically self-report measures, and diagnostic measures that include more in-depth assessment through semi-structured clinical interview and structured tasks. The Adult Asperger’s Assessment (AAA), along with other diagnostic measures, was used by the service because it is one of the measures recommended by the NICE (2012) guidelines. The AAA was purposely designed to assess adults without an intellectual disability and is based on the diagnostic criteria of the DSM-IV for AS. The AAA is also more stringent than the DSM-IV, as a greater number of symptoms are required to be met for a diagnosis (Baron-Cohen, Wheelwright, Robinson & Woodbury-Smith, 2005).

The AAA comprises four sections that are completed during a clinical interview. In addition, the AAA includes three structured questionnaires that are routinely completed before the interview. The referred individuals complete the Adult Autism Quotient (AQ50) and the Empathy Quotient (EQ), with the designated advocate (as recommended by NICE, 2012), and someone who knows the individual well, preferably a parent, completes the Relative’s Questionnaire (RQ; Baron-Cohen et al., 2005).

Despite a 787% increase in the number of people in England diagnosed with ASD between 1998 and 2018 and the development of several specialist autism teams, there seems to be a waiting list of up to 2 years in many adult diagnostic services (Beresford et al., 2020; Westminster Commission on Autism, 2021). In the context of this increased demand for the assessment of adults, there remains a scarcity of studies that investigate the psychometric properties of diagnostic tools used in the adult population (Wigham et al., 2019). The review of existing measures specifically developed for adults, such as the AAA, as opposed to measures that were adapted from children’s measures, seems important for future adult diagnostic services’ consideration of diagnostic tools.

Although the extent of the current use of the AAA in adult diagnostic services is unclear, and the term AS is no longer formally used as a diagnosis, the structured questionnaires developed alongside the AAA may well continue to be used in the NHS as well as private practice, as the AAA is recommended by NICE. The Westminster Commission on Autism (2021) makes reference to the low levels of satisfaction regarding the diagnostic process and identifies the high volume of paperwork that usually forms part of the assessment as one of the challenges of the diagnostic process. Hence, the value that the lengthy EQ, AQ and RQ add to the diagnostic process is important to investigate. Furthermore, the EQ, AQ and RQ are freely available online, and lay people and researchers are likely to continue to use the questionnaires. Although Wigham et al. (2019) stress that structured questionnaires alone should not be used to exclude further autism assessment, the use of structured questionnaires that do not require a trained assessor can be useful in the community, and in primary and secondary healthcare settings, to gather information about a possible neurodevelopmental disability. Therefore, further research has relevance for any future or current diagnostic service using the AAA, as well as for healthcare settings that wish to gather information, and for lay people and researchers who wish to use the AQ, EQ and RQ.

While a number of studies have considered the utility of the AQ (e.g. Ashwood et al., 2016), few studies have investigated the clinical usefulness of the combined use of all three of the structured questionnaires. The original study that proposed the AAA as a diagnostic method (Baron-Cohen et al., 2005) used a clinical population; however, the sample size was small (N = 42), with a particularly small sample of those who did not receive a diagnosis (N = 8). In addition, the discriminant validity of the structured questionnaires, that is, their sensitivity (i.e., ability to detect those with ASD) and specificity (i.e., ability to exclude those without ASD), was not considered.

Further studies have been carried out to investigate the discriminant validity of the AQ50; however, the findings are inconclusive (Ashwood et al., 2016; Booth et al., 2013; Woodbury-Smith, Robinson, Wheelwright & Baron-Cohen, 2005). In an early study by Woodbury-Smith et al. (2005), the AQ50 was found to discriminate with moderate accuracy between those who were diagnosed with ASD and those who were not in their diagnostic service for adults without an intellectual disability. The AQ50 was moderately good in terms of sensitivity (0.77) and specificity (0.72) for the cut-off of > 32. In a later study comparing participants with a pre-existing diagnosis of ASD with a general population control group, the sensitivity and specificity of the AQ50 were good (≥ 80%; Booth et al., 2013); however, the ecological validity is limited compared to studies involving all participants presenting to diagnostic services for a first-time assessment, meaning that diagnostic accuracy may be inflated (Wigham et al., 2019). Ashwood et al. (2016) more recently reported findings in a more ecologically valid setting with individuals referred to an ASD assessment clinic that used the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1997) and the Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord et al., 2000) to diagnose ASD in all participants. For both the diagnostic cut-off (> 32) of the AQ50 and the screening cut-off (> 26), the sensitivity was above 71%; however, specificity was less than 38%. In addition, the self-rated AQ50 scores did not correlate with the clinician-rated ASD behaviours (i.e. ADOS-G), and correlated only weakly with early-life symptoms (i.e. ADI-R).

The discriminant validity of the other structured questionnaires that form part of the AAA, namely the EQ and the RQ, has not previously been assessed in adults without an intellectual disability. This is important to consider given that the AAA continues to be used in clinical practice to aid diagnosis of ASD. The current study also provides a context in which to interpret the use of the questionnaires in previous studies and assessments. The service in this study is meeting a previously unmet need for individuals without an intellectual disability who reach adulthood with undiagnosed ASD. However, due to the specific nature of the service and therefore of the sample, it might be expected that the measures would yield low levels of specificity in this study’s population, similar to those reported for the AQ50 in the diagnostic service studied by Ashwood et al. (2016), because most referred individuals have at least some ASD traits, although not always enough to meet diagnostic criteria. A further important consideration to highlight is that the questionnaires were not originally designed to fully capture ASD. For example, the EQ (Baron-Cohen & Wheelwright, 2004) specifically considers empathy, and does not cover the full ASD phenotype. Hence, it needs to be considered whether a combination of the questionnaires may better predict ASD diagnosis.

The purpose of this study was to examine the structured questionnaires of the AAA as part of a service evaluation process with the hope of reviewing their utility and how the measures are currently being used. Therefore, the key aim was to address the validity of the structured questionnaires used as part of the Adult Asperger’s Assessment, through answering the following questions:

• To assess discriminant validity, can each of the three structured questionnaires, the Adult Autism Quotient, the Relative’s Questionnaire and Empathy Quotient, individually discriminate between those with a diagnosis of ASD and those without?

• Do the clinical cut-off points on each of the structured questionnaires have good sensitivity (i.e., how good is the test at picking up people with ASD?) and specificity (i.e., how good is the test at excluding people without ASD)?

• To assess combined predictive power, does a combination of all three measures improve the ability to discriminate between those with and without an ASD diagnosis?

Methodology

Study Population

Our study population consisted of 259 individuals who underwent a full diagnostic assessment with the Mid and West Essex Adult ASD Diagnostic Service between September 2014 and March 2019. Of this sample, 38 had not completed the RQ measure, and were therefore excluded from the study. A further 6 individuals were recorded as receiving a “probable diagnosis” and were also excluded, leaving 215 individuals in the analysis. As a significant number of assessed individuals were excluded, a comparison of diagnosis rate was made to determine whether the included sample was representative. There was no significant difference in the number of individuals diagnosed with ASD between the full population assessed during this period and the sample included in this study. Individuals were adults without concomitant intellectual disability who were referred to the service via their GP. Referral criteria included that they must be 18 years or over, and not open to either a secondary or acute Mental Health Team or a Community Learning Disabilities Team.

The participants’ demographic data (age, gender and ethnicity) are summarized in Table 1. The overall sample comprised 215 individuals (118 male) with a mean age of 31 years at the time of assessment. In terms of ethnicity, the majority were White British (N = 148; 69%). Of those assessed, 179 (83%) received a positive diagnosis of ASD, and 36 were assessed as not meeting requirements for a diagnosis. The ASD participant group (96 male) had a mean age of 29 years 11 months, and the non-ASD participant group (22 male) had a mean age of 36 years 1 month. There was no significant difference in the proportion of males and females between groups (χ2 = 0.68, df = 1, p = .41); however, the non-ASD group was significantly older than the ASD group (t (213) = -3.08, p = .002).
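The gender comparison can be checked from the counts given above (96 of 179 ASD participants and 22 of 36 non-ASD participants were male). The sketch below is an illustrative recomputation, not the SPSS procedure used in the study. It assumes a Pearson chi-squared test without continuity correction, and the function name is hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: ASD, non-ASD. Columns: male, female (counts taken from the text).
chi2, p = pearson_chi2_2x2(96, 179 - 96, 22, 36 - 22)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 0.68, p = 0.41
```

Under these assumptions the recomputed values agree with the reported χ2 = 0.68 and p = .41.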

Ethical considerations

At the start of the ASD assessment process, service users were informed that anonymized data could be used for service evaluation: a leaflet was provided to explain the use of data, and consent was obtained. Ethical approval was not needed for this service evaluation, due to the use of retrospective anonymized data, and a Privacy Impact Analysis (PIA) was carried out with Information Governance in line with the local trust policy. This PIA detailed how the data was to be stored, kept confidential and anonymized.

Measures

The measures focused on in this study were routinely carried out by the ASD Diagnostic Service. The Adult Asperger’s Assessment (Baron-Cohen et al., 2005) comprises a semi-structured interview and three structured questionnaires: the Autism Quotient, the Empathy Quotient and the Relative’s Questionnaire.

The Autism Quotient (AQ50; Baron-Cohen, Wheelwright, Skinner, Martin & Clubley, 2001) contains 50 statements assessing five key areas: social skills, attention switching, attention to detail, communication and imagination. Individuals score in the range 0–50, with a clinical cut-off of 32 or more. The questionnaire is a self-report measure that is completed by the referred individual. The respondent must answer ‘definitely agree’, ‘slightly agree’, ‘slightly disagree’ or ‘definitely disagree’ to each question. The scores are binarized into ‘ASD responses’ (1) and ‘non-ASD responses’ (0), scoring one point if the respondent records the ASD-related behaviour as either ‘mildly’ or ‘strongly’ (Woodbury-Smith et al., 2005). Moderate to high internal consistency for the subscales has been reported (Cronbach’s alpha = .63-.77), and test-retest reliability was good (r = .70; Baron-Cohen et al., 2001).
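The binarized scoring rule described above can be sketched as follows. This is a minimal illustration only: the item key (which items count agreement as the ASD response) is hypothetical and is not taken from the published AQ50 scoring key, and the function name is an assumption.

```python
AGREE = {"definitely agree", "slightly agree"}
DISAGREE = {"definitely disagree", "slightly disagree"}

def score_aq(responses, agree_keyed):
    """Return the binarized total: one point per item answered in the ASD direction.

    responses: dict mapping item number -> response string
    agree_keyed: set of item numbers where agreeing is the ASD response
                 (HYPOTHETICAL key, for illustration only)
    """
    total = 0
    for item, answer in responses.items():
        asd_answers = AGREE if item in agree_keyed else DISAGREE
        # 'Slightly' and 'definitely' both score 1; strength is ignored.
        if answer in asd_answers:
            total += 1
    return total

# Toy three-item example with a made-up key.
toy_key = {1, 3}
toy_responses = {
    1: "slightly agree",       # agree-keyed, agreed       -> 1
    2: "definitely disagree",  # disagree-keyed, disagreed -> 1
    3: "definitely disagree",  # agree-keyed, disagreed    -> 0
}
toy_total = score_aq(toy_responses, toy_key)
print(toy_total)  # 2
```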

The Empathy Quotient (EQ; Baron-Cohen & Wheelwright, 2004) comprises 60 statements, 40 of which assess empathy and 20 of which are control items. It has a range of 0–80, with a cut-off score of 30 or less. The questionnaire is similarly completed by the assessed individual, and the statements are endorsed on a scale ranging from ‘strongly agree’ to ‘strongly disagree’. Each of the empathy items is given a score of 1 point if the empathic behaviour is recorded as ‘slightly’ or 2 points if it is recorded as ‘strongly’; half of the items are worded to be ‘agree’ responses and half are ‘disagree’ (Baron-Cohen & Wheelwright, 2004). This measure has been shown to have good internal consistency (Cronbach’s alpha ranging from 0.88 to 0.99; Allison, Baron-Cohen, Wheelwright, Stone & Muncer, 2011; Wakabayashi et al., 2006).

The Relative’s Questionnaire (RQ; adapted from CAST; Scott, Baron-Cohen, Bolton & Brayne, 2002) contains a range of questions covering social impairments, communication impairments and repetitive or stereotyped behaviour. A relative (preferably a parent) completes it retrospectively, thinking of the referred individual as a child. A “yes” or “no” response is given to each question. A score of 0–31 is given, with a clinical cut-off of 15 or more. The CAST has been shown to have good internal consistency (Cronbach’s alpha ranging from 0.71 to 0.81 dependent on age and rater; Holmboe et al., 2014).
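The three clinical cut-offs described in this section run in different directions: high scores meet the cut-off on the AQ50 and RQ, whereas low scores meet it on the EQ. A minimal sketch of turning total scores into the binary "meets cut-off" flags is given below; the thresholds come from the text, while the function and variable names are hypothetical.

```python
# Cut-offs as stated in the text: AQ50 >= 32, EQ <= 30, RQ >= 15.
CUTOFFS = {"AQ50": (32, "high"), "EQ": (30, "low"), "RQ": (15, "high")}

def meets_cutoff(measure, score):
    """True if the total score falls on the clinical side of the cut-off."""
    threshold, direction = CUTOFFS[measure]
    if direction == "high":
        return score >= threshold
    return score <= threshold

def binarise(scores):
    """Map each measure's total score to 1 (meets cut-off) or 0 (does not)."""
    return {measure: int(meets_cutoff(measure, s)) for measure, s in scores.items()}

example = binarise({"AQ50": 34, "EQ": 35, "RQ": 15})
print(example)  # {'AQ50': 1, 'EQ': 0, 'RQ': 1}
```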

Assessment procedure

The questionnaires were completed by the assessed individual and their family with support from an independent advocate, and returned in advance of the face-to-face assessment. Independent advocates, who offer pre- and post-diagnostic support, are assigned by the service. The questionnaires are then scored by entering the responses into a Microsoft Excel document and running a macro.

After the questionnaires have been returned, the person referred for the assessment is interviewed using the semi-structured interview of the AAA, as is a parent or family member who knows them well and who can provide additional data about their developmental history and day-to-day functioning. The responses on the questionnaires shape some of the questions asked during the clinical interview, but do not directly determine the diagnostic decision making. This interview is conducted by specialist professionals who have formal training and several years of experience in the assessment of ASD, and is usually completed as part of a full-day assessment. In addition, a sensory screening tool, selected structured tasks from Module 4 of the Autism Diagnostic Observation Schedule (e.g. the story-telling task; ADOS; Lord, Rutter, DiLavore & Risi, 2008), and informal observations made while having lunch with the referred individuals are used to inform the decision around a diagnosis of ASD. Diagnosis is made with reference to the ICD-10 diagnostic criteria, and in consultation with other clinicians in the diagnostic service. For the purpose of this study, the assessment outcome is reported as positive diagnosis (1), or no diagnosis (0).

Data analysis

All analyses were carried out using SPSS version 24.0. To determine the discriminant validity of each of the structured questionnaires (AQ50, EQ, and RQ), the mean scores of the ASD and sub-threshold (i.e., non-ASD) groups were first compared with independent t-tests. Secondly, the area under the curve (AUC) scores were examined from nonparametric receiver operating characteristic (ROC) curve analyses (see Figs. 1, 2 and 3). The ROC curve plots true positives (i.e., the likelihood of an individual with an ASD diagnosis meeting the clinical cut-off point on the measure) against false positives (i.e., the likelihood of an individual without a diagnosis of ASD meeting the clinical cut-off point on the measure; Habibzadeh et al., 2016). Streiner & Cairney (2007) suggested the following benchmarks for interpreting AUC scores: 0.50-0.70 (low accuracy), 0.70-0.90 (moderate accuracy), and > 0.9 (high accuracy). The diagnostic accuracy of the AQ50, the EQ and RQ at previously reported cut-offs was then calculated. Chi-squared tests were used to test whether the various cut-off points predicted ASD diagnosis significantly better than chance, with meeting the threshold and receiving a clinical diagnosis both coded as binary variables (i.e., with ‘1’ representing a score meeting the cut-off point or a positive ASD diagnosis, and ‘0’ representing a score not meeting the cut-off point or no ASD diagnosis). The positive predictive value (PPV) and negative predictive value (NPV) were also calculated for each cut-off score.
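The quantities named in this section can be illustrated with a short sketch. It is not the SPSS procedure used in the study: the data are hypothetical toy values, the function names are assumptions, and the AUC is computed through its rank interpretation (the probability that a randomly chosen diagnosed individual scores higher than a randomly chosen undiagnosed one, with ties counted as one half), which is equivalent to the area under the nonparametric ROC curve.

```python
def diagnostic_accuracy(flags, diagnoses):
    """Sensitivity, specificity, PPV and NPV from binary cut-off flags and diagnoses."""
    pairs = list(zip(flags, diagnoses))
    tp = sum(1 for f, d in pairs if f == 1 and d == 1)
    fp = sum(1 for f, d in pairs if f == 1 and d == 0)
    fn = sum(1 for f, d in pairs if f == 0 and d == 1)
    tn = sum(1 for f, d in pairs if f == 0 and d == 0)
    return {
        "sensitivity": tp / (tp + fn),  # diagnosed cases meeting the cut-off
        "specificity": tn / (tn + fp),  # undiagnosed cases not meeting it
        "ppv": tp / (tp + fp),          # those meeting the cut-off who are diagnosed
        "npv": tn / (tn + fn),          # those not meeting it who are undiagnosed
    }

def auc(scores, diagnoses):
    """Nonparametric AUC: share of (diagnosed, undiagnosed) pairs ranked correctly."""
    pos = [s for s, d in zip(scores, diagnoses) if d == 1]
    neg = [s for s, d in zip(scores, diagnoses) if d == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def interpret_auc(value):
    """Benchmarks from Streiner & Cairney (2007) as quoted in the text."""
    if value > 0.9:
        return "high"
    if value >= 0.7:
        return "moderate"
    if value >= 0.5:
        return "low"
    return "below chance"

# HYPOTHETICAL toy data: AQ50-style totals and diagnosis (1 = ASD, 0 = no ASD).
toy_scores = [35, 40, 28, 33, 30, 36, 25, 34]
toy_dx = [1, 1, 1, 1, 0, 0, 0, 1]
toy_flags = [int(s >= 32) for s in toy_scores]

acc = diagnostic_accuracy(toy_flags, toy_dx)
toy_auc = auc(toy_scores, toy_dx)
print(acc, round(toy_auc, 2), interpret_auc(toy_auc))
```

For a measure such as the EQ, where low scores indicate ASD, the scores would need to be negated (or the comparison reversed) before computing the AUC in this way.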

To determine whether using the three structured questionnaires in combination improves the ability to correctly predict ASD diagnosis (i.e. combined predictive power), a binary logistic regression was performed, using scores on the AQ50, EQ and RQ as predictor variables (where ‘1’ equals a score above cut-off and ‘0’ equals a score below cut-off). These variables were entered into logistic regression in a single step, with age and gender also being entered as potential covariate predictors of an ASD diagnosis. Age was included as a covariate since the ASD and non-ASD groups were not matched on this criterion and the range was wide, as is expected in an adult diagnostic service where services for adults born before 1980 have only recently been available (Brugha et al., 2011). Gender was also included since prevalence rates are higher in males than females (Werling & Geschwind, 2013). The goodness of fit of the model was evaluated with chi-squared with a significance threshold of 0.05, and the sensitivity and specificity of the model were evaluated by reviewing how many participants were correctly assigned to the diagnosis or no diagnosis group.
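The structure of such a model can be sketched as below. This is an illustration under stated assumptions rather than a reproduction of the analysis: the records are hypothetical toy data, age and gender are omitted for brevity, and the coefficients are fitted by plain gradient ascent on the log-likelihood instead of the iterative routine SPSS uses. All function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(weights, rows, outcomes):
    ll = 0.0
    for x, y in zip(rows, outcomes):
        p = sigmoid(sum(w * v for w, v in zip(weights, x)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

def fit_logistic(rows, outcomes, lr=0.5, steps=5000):
    """Maximise the log-likelihood by gradient ascent (a swapped-in, simple optimiser)."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            err = y - sigmoid(sum(w * v for w, v in zip(weights, x)))
            for j, v in enumerate(x):
                grad[j] += err * v
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights

def nagelkerke_r2(ll_model, ll_null, n):
    """Cox-Snell R2 rescaled so that its maximum is 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

# HYPOTHETICAL toy records: (AQ50, EQ, RQ) cut-off flags and diagnosis (1 = ASD).
flags = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1),
         (0, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
diagnosis = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

rows = [(1.0,) + tuple(float(v) for v in f) for f in flags]  # leading 1.0 = intercept
weights = fit_logistic(rows, diagnosis)
odds_ratios = [math.exp(w) for w in weights[1:]]  # Exp(beta) for AQ50, EQ, RQ

n = len(diagnosis)
p_bar = sum(diagnosis) / n
ll_null = n * (p_bar * math.log(p_bar) + (1 - p_bar) * math.log(1 - p_bar))
ll_model = log_likelihood(weights, rows, diagnosis)
r2 = nagelkerke_r2(ll_model, ll_null, n)
print([round(o, 2) for o in odds_ratios], round(r2, 2))
```

The model chi-squared statistic used to evaluate goodness of fit corresponds to 2 × (ll_model − ll_null), with degrees of freedom equal to the number of predictors entered.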

Results

Discriminant validity of AQ50, EQ and RQ

Table 2 shows skew and kurtosis statistics and standard errors for each of the structured questionnaires. These scores and visual inspection of the distributions suggested that they did not depart significantly from normality in this sample. Table 2 also displays the mean and standard deviation scores for the ASD and non-ASD groups on each of the structured questionnaires. Independent t-tests demonstrated that the ASD group had significantly higher mean scores on the AQ50 and the RQ, and a significantly lower mean score on the EQ, than the non-ASD group. Cohen’s d effect sizes (all ≥ 0.61) suggested moderate practical significance of these findings. However, it is important to note that the AQ50 and EQ mean scores for both the ASD and non-ASD groups reached the clinical cut-off.

Table 1 Demographic characteristics for whole sample, and ASD and non-ASD subgroups
Table 2 Mean and standard deviation of AQ50, EQ and RQ for ASD and non-ASD group; Skewness and Kurtosis for distribution of total dataset for each questionnaire

The AUCs obtained from the ROC curves for each structured questionnaire (Fig. 1) represent the average value of sensitivity across all possible values of specificity (Table 3). For all three questionnaires, the AUC values reached significance, meaning the structured questionnaires predicted ASD diagnosis better than chance; however, the AQ50 just met the threshold for a moderate level of accuracy (0.70), and the EQ and RQ were in the low range in terms of level of accuracy.

Fig. 1 Receiver Operating Characteristic curves for the AQ50 (i), EQ (ii), and RQ (iii)

Table 3 Area Under the ROC Curve for structured questionnaires (AQ50, EQ and RQ) predicting ASD diagnosis

Sensitivity, Specificity, and Positive and Negative Predictive Values for cut-off points

Table 4 summarizes the diagnostic accuracy of the AQ50, the EQ and RQ at various cut-off points. The AQ50, with the original diagnostic cut-off score of ≥ 32, predicted ASD diagnosis better than chance (χ2 (1, N = 259) = 11.05, p < .001). It showed a fair level of sensitivity in predicting those receiving a diagnosis when scoring above cut-off, although its specificity was lower (Table 4). The positive predictive value was high, meaning that more than four-fifths of those scoring ≥ 32 on the AQ50 did receive a diagnosis of ASD. However, the low negative predictive value suggests that more than two-thirds of individuals who scored below 32, who were predicted not to receive a diagnosis, did go on to get a diagnosis of ASD. The “screening” cut-off score of ≥ 26 (Baron-Cohen et al., 2001; Woodbury-Smith et al., 2005) was not able to predict ASD diagnosis better than chance (χ2 (1, N = 259) = 2.12, p = .14). While it yielded a higher level of sensitivity, its specificity was very low. The pattern was similar in showing a high positive predictive value and a low negative predictive value.

Table 4 Diagnostic accuracy of the AQ50, the EQ and RQ

The EQ, with the previously established clinical cut-off score of ≤ 30, and the RQ, with a clinical cut-off score of ≥ 15, both predicted ASD diagnosis better than chance (EQ: χ2 (1, N = 259) = 54.65, p < .001; RQ: χ2 (1, N = 259) = 12.88, p < .001). The sensitivity in predicting diagnosis on the EQ was good and the RQ was fair, but as per the AQ50, the specificity of the measures was poor. While the positive predictive values were high, the negative predictive values were poor.

Combined predictive power of the AQ50, EQ and RQ

To determine how well the structured questionnaires worked in combination to identify an ASD diagnosis, these variables were entered as a single step into binary logistic regression with diagnosis (ASD or no ASD) as the dependent variable. The goodness of fit of the model was statistically significant (χ2 (5) = 35.88, p < .001), but the effect size was small, with the model being able to explain 25.8% of variance in the data (Nagelkerke R2 = 0.26). The model correctly classified 85% of the cases; however, although there was a very high level of sensitivity (97%), the specificity was very low (25%). In terms of the covariates, increasing age (Wald = 5.1, p = .02; Exp (β) = 1.0, 95% CI = 0.94, 1.06) was a significant predictor of a positive ASD diagnosis, but gender was not (Wald = 1.77, p = .18; Exp (β) = 0.57, 95% CI = 0.24, 1.3). The AQ50 (Wald = 5.95, p = .02; Exp (β) = 0.91, 95% CI = 0.85, 0.98) and the RQ (Wald = 7.32, p = .01; Exp (β) = 4.03, 95% CI = 0.87, 0.98) were both significant predictors; however, the EQ did not significantly predict ASD diagnosis (Wald = 0.55, p = .46; Exp (β) = 1.2, 95% CI = 1.0, 1.01).
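The reported overall classification rate can be cross-checked against the reported sensitivity and specificity, since overall accuracy is their average weighted by group size. The sketch below uses the group sizes given in the Study Population section (179 with ASD, 36 without) and the rounded percentages above; it is an arithmetic check only, and the variable names are hypothetical.

```python
n_asd, n_non_asd = 179, 36            # group sizes from the Study Population section
sensitivity, specificity = 0.97, 0.25  # rounded values reported for the model

# Expected number of correctly classified cases in each group.
correct = sensitivity * n_asd + specificity * n_non_asd
accuracy = correct / (n_asd + n_non_asd)

# Share of the sample with an ASD diagnosis, for comparison.
base_rate = n_asd / (n_asd + n_non_asd)

print(round(accuracy, 2), round(base_rate, 2))  # 0.85 0.83
```

The result matches the reported 85%, and shows that this figure is driven largely by the high proportion of diagnosed individuals in the sample.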

Discussion

The findings of this study suggested that the AQ50, RQ and EQ did not discriminate well between those with and without a diagnosis of ASD. While those diagnosed with ASD scored significantly higher on the AQ50 and the RQ, and significantly lower on the EQ, the mean scores on the AQ50 and EQ for both those with and without an ASD diagnosis met previously determined clinical cut-off points. In addition, both groups’ mean scores were higher on the AQ50, and lower on the EQ, than the mean scores of typical populations (without ASD) in previous studies (e.g., in Wheelwright et al., 2006, mean AQ50 = 16.3; mean EQ = 44.3). This study tested whether these predetermined cut-off points on the AQ50 (≥ 32), RQ (≥ 15) and EQ (≤ 30) (Baron-Cohen et al., 2005) had good sensitivity in picking up those with ASD, and specificity in excluding those without ASD. While this has been investigated in previous studies for the AQ50 (e.g. Ashwood et al., 2016), the sensitivity and specificity for the cut-off scores have not been considered for the EQ and RQ in adults without an intellectual disability. This is important to consider as these cut-off points are recommended for clinical use. The three structured questionnaires showed a similar pattern of results in demonstrating the ability to predict diagnosis better than chance, with fair to good levels of sensitivity, meaning those with a diagnosis were likely to meet the clinical cut-off. However, specificity was poor: the questionnaires had a reduced ability to correctly identify people without ASD, so those without an ASD diagnosis were unlikely to score on the non-clinical side of the cut-off.
High positive predictive values and low negative predictive values meant that while the structured questionnaires were able to accurately predict diagnosis in those scoring above clinical cut-off (> four-fifths), around two-thirds of those scoring below clinical cut-off scores were ‘false negatives’, who did have a diagnosis of ASD. Furthermore, the area under the ROC curve tests found that the AQ50 just met the threshold for a moderate level of accuracy, and the RQ and EQ’s levels of accuracy were in the low range, suggesting that only limited evidence of the diagnostic validity of the three structured questionnaires was found within this sample.

As each questionnaire describes a different aspect of the autism phenotype, this study also assessed how well a combination of all three measures could discriminate between those with and without an ASD diagnosis. The analysis found that a combination of the AQ and RQ was able to accurately identify those with a diagnosis of ASD. However, there was a high number of false positives (i.e., poor specificity), where 75% of those who did not receive a diagnosis also scored above clinical cut-off on the questionnaires. In addition, within this model, the EQ did not significantly discriminate between those with and without ASD. Therefore, these findings suggest that using the questionnaires collectively also showed limited diagnostic validity.

To our knowledge, this is the first study to consider the discriminant validity of the EQ and the RQ in an adult population without an intellectual disability. The findings for the AQ50 are similar to those of other studies conducted in similar clinical settings (Ashwood et al., 2016), but contrast with the findings of a larger study that found high levels of sensitivity and specificity, and a greater ability to discriminate between ASD adults and healthy controls (Booth et al., 2013). One explanation for this variability is the nature of the non-ASD group. The AQ50 is better at discriminating individuals with ASD when compared to a control group comprising individuals from the general population. However, case-control studies have been shown to inflate the accuracy of diagnostic tests, a phenomenon known as ‘spectrum bias’ (Ashwood et al., 2016; Lijmer et al., 1999).

Indeed, psychometric research involving carefully screened comparison groups is not necessarily representative of a clinical setting (Bezemer, Blijd-Hoogewys, & Meek-Heekelaar, 2021). In such studies, the ASD group are already aware of their diagnosis before completing the measures (e.g. Booth et al., 2013). It is therefore important to test whether the measures have predictive power within an ecologically valid ASD assessment service, where they are completed before a clinical diagnosis is made. Data from the present study and other clinical samples suggest that group differences (ASD vs. non-ASD) in the constructs measured in these questionnaires are weaker in clinical practice (Bezemer et al., 2021), despite the questionnaires being originally developed through carefully designed studies comparing to matched control groups.

Interestingly, a further recent study conducted with a clinical sample (Bezemer et al., 2021) showed higher levels of specificity in the AQ50 (0.90 for a cut-off of ≥ 32 and 0.72 for a cut-off of ≥ 26) than we found in the present study, yet still lower overall predictive value compared with studies comparing ASD groups to the general population. The key difference between the two samples was that Bezemer et al.’s (2021) sample had a significantly lower prevalence rate of ASD (63% vs. 83% in the present study) and a lower rate of missed patients. Therefore, even within clinical settings there is wide variability in the predictive power of the AQ50, making it challenging to draw comparisons.

Within a clinical setting, why then do the AQ50, EQ and RQ have limited ability to discriminate between those with and without a diagnosis of ASD? The ‘false positives’ (i.e., poor specificity) could be due to the nature of the sample, as some individuals without a positive diagnosis had a number of ‘traits’, but not a sufficient number to warrant a diagnosis. It may also be that for those in the non-ASD group who have a number of ‘traits’, a greater level of public awareness of ASD over the last 5–10 years (Dillenburger et al., 2013) may have led to people being more vigilant in ‘noticing’ ASD-related difficulties. This may lead to a ‘confirmation bias’ when completing the questionnaire measures, and potentially explain why both the ASD and the non-ASD groups’ mean scores met the cut-off points, and were higher than in previous studies carried out before adult autism assessment services were more widely available (e.g. Woodbury-Smith et al., 2005). Even if the diagnosis is yet to be made, all participants are aware that they are being referred for an ASD assessment when completing the measures. There is also some evidence that the AQ is sensitive to anxiety symptoms (Ashwood et al., 2016), whereas clinicians are able to distinguish between ASD and ‘ASD-like’ symptoms. While the participants in this study were not open to a community mental health team, this does not exclude the possibility that some had mental health difficulties.

The rate of ‘false negatives’ could be explained by some individuals with ASD potentially having poor insight into their condition, making it challenging for them to self-report their own behaviours and symptoms, which could lead to an underreporting of difficulties (Bishop & Seltzer, 2012). This may partly explain the findings for the self-report AQ50 and EQ, yet similar results were found for the parent-rated RQ. A difficulty with the RQ is that it requires a parent to consider childhood behaviours retrospectively, which may also affect the accuracy of reporting. In addition, as there is a high level of genetic heritability in ASD (Hallmayer, Cleveland & Torres, 2011), the parent may struggle with similar issues and therefore may not experience their child’s difficulties as unusual. While the questionnaires can act as a prompt to ask clarifying questions when completing the AAA (Baron-Cohen et al., 2005), the rate of false negatives suggests that these measures may be less useful as a screening measure to determine who would need an assessment. Using these questionnaires as screening measures could result in a number of service users who would have gone on to receive a diagnosis not being assessed.

Although not an explicit aim, a strength of the study within its clinical population is the proportion of females (approximately half) who participated, whereas other studies typically include significantly more males. An additional interesting finding in terms of gender is that, within the sample in this study, females were at least as likely to receive a diagnosis (85%) as males (81%). Staff members in the diagnostic service informally reported that they are mindful of the gender differences in ASD presentation when they conduct the AAA’s clinical interview, which appears to be reflected in the gender balance in diagnosis. This contrasts with research that has shown a gender bias, with females scoring significantly lower on items of early childhood restricted and repetitive behaviours on the Autism Diagnostic Interview-Revised (Tillmann et al., 2018). A more fine-grained analysis of sex differences on items within the AAA’s semi-structured interview would need to be carried out to examine this finding in more depth.

Limitations

While a strength of this study is its ecologically valid population of individuals presenting to a naturalistic setting of an ASD diagnostic service, this comes with some caveats. The high rate of diagnosis (83%) means that it cannot be assumed that the same prevalence rate applies to other outpatient settings. For example, prevalence was lower at 73% in Ashwood et al. (2016), bringing into question the generalisability of the results. The high rate of diagnosis also meant that the group sizes were not equal: the non-ASD group was small (N = 36), with a larger variance compared to the ASD group. Had the group sizes been matched, the differences between groups would probably have been more pronounced. It was not possible to perform a priori power calculations, as retrospective data were used to evaluate the use of questionnaires within a particular service, as opposed to a prospectively designed study aiming to address a particular question.

A second limitation is the non-blind design: the clinicians who made the decision on ASD diagnosis were privy to the results of the structured questionnaires, meaning the results may have influenced the diagnostic decision (i.e. incorporation bias; Hajian-Tilaki, 2013). The measures are considered initially in the referral process, but on the day of the assessment the diagnosis is not made on the basis of the questionnaire scores, particularly as clinicians are aware of the limitations of the measures. However, the measures give qualitatively useful information, as they contain certain characteristics pertaining to diagnostic criteria that will be further explored during the interview, which certainly limits the extent to which there is independence between the variables. This was unavoidable because the questionnaires are part of routine clinical practice, and it is an inherent limitation of retrospective data collected within clinical services. Ideally, clinicians would be blind to the responses, or at least to whether the scores meet the cut-off points, although this would be challenging to maintain in clinical practice with limited numbers of staff working in the service. Importantly, however, each clinical decision was made on the basis of a detailed clinical interview, ICD-10 criteria and team discussion.

Clinical implications

The results of this study suggest limited discriminant validity of the structured questionnaires of the AAA. This has relevance for clinical practice, as it highlights that they should be used with caution in the context of diagnosing or screening for ASD. The results also confirm previous findings suggesting that pre-assessment structured questionnaires are not currently a suitable substitute for a clinician’s judgment in a diagnostic interview (Ashwood et al., 2016).

The findings also bring into question whether the structured questionnaires generate enough useful information to warrant the time taken for individuals to complete the measures pre-assessment. Service users have commented that they are overwhelming and time consuming to complete. Clinicians working in the service mentioned that answers to specific questions sometimes lead them to probe further during assessment, and that individuals sometimes add written responses that provide useful information. However, the findings in this study raise doubt as to how the questions are interpreted by the person completing the questionnaire, hence the ‘construct validity’ of the answers, and therefore how useful the structured questionnaires can really be as part of an assessment battery. An alternative ASD trait questionnaire might provide better quality information and hence better diagnostic validity. Potentially, a novel scale could be developed by exploring how clinicians discriminate ASD from ‘ASD-like’ characteristics, so as to develop a pre-assessment measure with better diagnostic utility (Ashwood et al., 2016).

As noted above, the questionnaires were not originally designed to fully capture ASD (e.g., the EQ measures empathy). However, a combination of all three questionnaires did not improve the ability to discriminate between those with and without a diagnosis. The NICE guidelines (2012) propose that a combination of structured questionnaires and diagnostic tools should be used to aid diagnosis of ASD, but these findings highlight potential limitations in the use of the AAA’s structured questionnaires as part of a diagnostic battery in clinical practice and research. Further research is warranted to determine the most effective, efficient and acceptable combinations of questionnaires and diagnostic methods.

Conclusions

In conclusion, the present study found that the AQ50, EQ and RQ questionnaires, both on their own and in combination, were limited in their ability to discriminate between those with and without a diagnosis in an ASD diagnostic service assessing adults without a learning disability. The structured questionnaires showed good levels of sensitivity in picking up those with ASD, but poor levels of specificity in excluding those without ASD. This raises the question of how useful they are as part of the diagnostic assessment process. However, these findings should be interpreted with caution in light of the small sample size of the non-ASD group. These findings have implications for the service in reviewing the efficiency of the assessment process.