Introduction

Formally, autism spectrum disorders (ASDs)—though considered highly heterogeneous—are defined with respect to three main characteristics: deficits in social interaction, deficits in communication, and a restricted behavioural repertoire (APA 2000). Intellectual disability, physical disorders and sensory symptoms are also commonly associated with ASDs (Matson et al. 2011; Matson and Shoemaker 2009; Rogers et al. 2003).

Increasingly, and for a number of different reasons, researchers are measuring autistic traits in those without ASD. One motivation is the argument that such research has the potential to enhance the understanding of the determinants and consequences of clinical ASD. Clinical samples can be limited in size due to the relatively low frequency of cases and including non-clinical cases can increase statistical power by capturing some sub-clinical variability in autistic characteristics (e.g. Lundström et al. 2012; Sung et al. 2005). For example, measures of the ‘broader autism phenotype’ or of ‘autistic-like traits’ aim to capture subclinical characteristics or traits which are related to a genetic liability to ASDs and may, therefore, assist in identifying the genetic determinants of ASD (Wheelwright et al. 2010).

Autistic traits in those without ASD may also be interesting in their own right. Several studies have examined the relation of autistic traits in such populations to personality traits and interests (Austin 2005; Baron-Cohen et al. 2001; Kunihira et al. 2006). One line of research in this paradigm has led to the development of the Autism Quotient Questionnaire (AQ) and its variants (e.g. Baron-Cohen et al. 2006; Hoekstra et al. 2011), which aim to quantify an individual’s position on an assumed continuum from ASD to normality (Baron-Cohen et al. 2001).

A number of authors have explicitly addressed the hypothesis that individuals with an ASD are simply those who score at the extreme end of a normal distribution of traits and abilities related to social adaptation and communication (Constantino and Todd 2003). Under this hypothesis, a distinction is made between clinical autism and autistic-like traits (ALTs) in the general population, but the distinction is one of degree and not kind. The key difference is that those with a clinical diagnosis of ASD cross a threshold beyond which functioning is significantly impaired, whereas those high on ALTs but without a clinical diagnosis of ASD remain able to cope with the social interaction demands of society (Lundström et al. 2012). This hypothesis has been addressed by examining the latent distribution of ASD measures across those with and without ASD using taxometric or latent class analysis methods, with the evidence at present pointing to ASD as a distinct category from ALTs (Frazier et al. 2010, 2012). Again, this line of research depends on measuring autistic traits in individuals without a clinical diagnosis of ASD.

Accurately identifying those who fall into the ASD category has also been a prominent concern among clinicians and researchers. Studies seeking to develop new psychometric tools for identifying individuals with ASD have utilised samples without ASD to evaluate the discriminative power of the instruments (e.g. Allison et al. 2011; Baron-Cohen et al. 2001). Such studies allow the utility of candidate screening tools to be assessed in non-clinical populations, where they may act as a ‘red flag’ to assist frontline health professionals in making the decision whether to refer an individual for full diagnostic assessment for an ASD (Allison et al. 2011).

An important concept when utilising a single inventory across different groups, e.g. individuals with and without ASD, is measurement equivalence (Kim and Yoon 2011). Briefly, tests of measurement equivalence assess the hypothesis that the constructs being measured by an inventory have the same meaning and measurement properties across the groups. There are several available methods for testing the measurement equivalence of an instrument across groups in both confirmatory factor analytic (CFA) and item response theory (IRT) frameworks (Kim and Yoon 2011). When several layers of constructs are of interest, i.e. when the instrument has a higher-order structure, multi-group CFA approaches provide an efficient means of gauging measurement invariance at all levels of this higher-order structure. This has the advantage of allowing specific loci of a lack of invariance to be identified, e.g. in first- or second-order traits. This is of relevance when assessing invariance of a measure of autistic traits because it is thought that these traits form a hierarchy including at least two levels of generality (Hoekstra et al. 2011). Work by Hoekstra et al. (2011) has supported the existence of both first-order autistic traits and a second-order ‘Social Behaviour’ construct.

Measurement equivalence can be assessed at a number of levels. Depending on the purpose of measuring autistic traits in individuals without ASD, different levels of measurement equivalence are required in order for the conclusions drawn to be valid. The importance of achieving different levels of measurement equivalence for different types of analyses has been widely discussed in general (e.g. see Borsboom 2006); however, the implications for ASD research have received less attention. In the present study, we outline the levels of equivalence required for the applications outlined above and then proceed to test these in a measure of ASD commonly used both in those with and in those without ASD.

Although stronger forms of equivalence may be needed for some applications of ASD measures, their most common uses rely only on metric invariance. Metric invariance implies that the same latent traits underlie the observed variables in both groups. Metric invariance is, therefore, important if the aim is to make inferences about the causes and consequences of autistic traits from autistic traits in the general population to ASDs in the clinical ASD population. In addition, in combined analyses of two groups measured by a test that does not exhibit metric invariance across these groups, differences in factor loadings can result in the appearance of additional non-substantive factors (Meredith and Teresi 2006). This has the potential to confuse attempts to identify the dimensionality of autistic traits and any putative correlates of identified dimensions.

Scalar invariance may also be desirable in some cases. Scalar invariance is required whenever group means are to be compared; otherwise, observed mean differences or a lack thereof may not represent the extent of difference in the latent trait of interest (Borsboom 2006). For example, group differences in scores on ASD measures have been used as an index of the discriminative power of a test (e.g. Baron-Cohen et al. 2001); however, without assessing scalar invariance, it is not known how this observed difference corresponds to mean differences on the latent traits measured by such a test. Similarly, it is possible that a test of ASD traits used to screen for or diagnose ASD could systematically under- or over-estimate the scores of individuals with ASD relative to individuals without ASD if the test is biased, thus reducing its discriminative power. Assessing scalar invariance can evaluate this possibility. Finally, any aggregated factor analyses of scores of those with and without ASD in which scalar invariance does not hold can result in additional factors appearing due to the differences in thresholds across the groups (Meredith and Teresi 2006). Again, where the dimensionality of ASD traits may be of interest, this could produce misleading results.

It is important to test empirically, and not simply assume, measurement equivalence across groups of people with and without ASD, as there are potentially a number of reasons to think that it could be violated. For example, particular items may be less relevant, or have a different meaning or interpretation to individuals without, as compared to individuals with, a clinical diagnosis of ASD. In addition, ASD is associated with difficulties in self-perception, such that there may be systematic under-reporting or possible increased inaccuracies in any self-report measure of autistic traits in individuals with ASD (Johnson et al. 2009). To our knowledge, no study has previously examined measurement invariance of a test of autistic traits across individuals with and without ASD. It was, therefore, the aim of the present study to test for invariance of a popular measure of autistic traits across individuals with a clinical diagnosis of ASD and control individuals without a diagnosis of ASD.

Methods

Participants

ASD Group

We utilised archival data from case notes on a total of 148 participants with a diagnosis of Asperger syndrome (AS) or high functioning autism (HFA), who had previously completed the full AQ (Baron-Cohen et al. 2001). HFA was defined as meeting the criteria for Autism but having normal intellectual function while AS was defined as meeting the criteria for HFA but with no history of language delay (Baron-Cohen and Wheelwright 2004). The sample included 107 males and 41 females. The age range of the sample was 17–62 with a mean of 33.3 (SD = 10.7). For a comprehensive description of this sample, including information on data collection and the diagnostic process please see Kuenssberg et al. (2012).

Non-ASD Control Group

We recruited a control group of 166 participants from a large university community and from social networking sites. Information was collected via an online questionnaire, which included both the full AQ (Baron-Cohen et al. 2001) and questions on the demographic characteristics of participants. The control sample included 40 male and 126 female participants. The age of the control sample ranged from 17 to 65 with a mean of 30.1 (SD = 11.30).

Measures

The AQ-S is a short form of the full AQ (Baron-Cohen et al. 2001) motivated by the benefits of a shorter form for use in large scale studies where the full AQ may be too lengthy. Hoekstra et al. (2011) arrived at the structure and content of the AQ-S using item selection and validation analyses from the full AQ in a sample of individuals with ASD and 2 control populations. The AQ was designed to capture the core dimensions of autistic traits in adults with normal intelligence. The AQ-S includes 28 of the 50 original AQ items and retains its broad dimensionality. Thus, the items measure the domains of: social skills, routine, switching, imagination, and numbers/patterns and a higher-order social behaviour factor defined by the first four of these five factors. In ASD and control samples, CFA analyses have generally found reasonable fit of this structure for the AQ-S (Kuenssberg et al. 2012; Hoekstra et al. 2011).

In the present study, both our ASD sample and control sample completed the full AQ, but we selected only the AQ-S items for analysis. Each item of the AQ-S has four response options ranging from ‘definitely agree’ to ‘definitely disagree’. Half of these items are reverse keyed, and we rescored these items so that, for all items, higher scores indicate a higher degree of autistic traits. Although previous analyses of the full AQ have tended to use a dichotomous scoring system, we elected to use the full four-point scale for our analyses because multi-group CFA with dichotomous items has lower power and requires larger sample sizes to detect non-invariance (Kim and Yoon 2011).
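The rescoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the procedure actually used: the numeric 1 to 4 coding and the item numbers in `REVERSE_KEYED` are hypothetical placeholders, because the text does not list which items are reverse keyed.

```python
# Hypothetical placeholder: the actual reverse-keyed AQ-S items are not listed in the text.
REVERSE_KEYED = {1, 3, 8}

N_OPTIONS = 4  # four response options, assumed here to be coded 1..4


def rescore(item_number: int, response: int) -> int:
    """Return the response scored so that higher values mean more autistic traits."""
    if not 1 <= response <= N_OPTIONS:
        raise ValueError(f"response must be in 1..{N_OPTIONS}, got {response}")
    if item_number in REVERSE_KEYED:
        # Mirror the scale: 1 <-> 4 and 2 <-> 3.
        return N_OPTIONS + 1 - response
    return response
```

Keeping all four categories, rather than collapsing them into two, preserves the ordinal information that the multi-group analyses rely on.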

Statistical Analysis

Data Screening

As a check on the appropriateness of the data for our planned statistical analysis, we examined data skewness, missingness and communalities.

Measurement Invariance Analysis

We estimated a series of CFA models in order to first assess the fit of the proposed structure of the AQ-S in both the ASD and control groups. We also fit these same CFA models to only the males and only the females in the sample. This was to assess the possibility that sex differences in autistic traits contributed to any non-invariance observed across the ASD and control samples (e.g. see Rivet and Matson 2011).

Next, we estimated multi-group CFA models to assess measurement equivalence. Multi-group CFA for the analysis of measurement equivalence can be implemented using either a forwards or backwards approach. In the forwards approach, the baseline model is the least constrained model and cross-group equality constraints are added successively. In the backwards approach, the baseline model is fully constrained and mis-specified constraints successively released. Here we employ a forward selection method for our analyses. In assessing the equivalence of a measure of autism, the forwards multi-group CFA approach is most appropriate because it allows the less strict assumption of factorial or metric invariance (equality of factor loadings across groups) to be tested before proceeding to a test of scalar invariance (equality of factor loadings and thresholds across groups) and the even stronger assumption of strict invariance (equality of factor loadings, thresholds and item residual variances). Forward selection has been shown to detect differential item functioning with greater accuracy and to be less prone to the substantial type I error produced by backward selection (Stark et al. 2006).

Here we followed the sequence of analyses suggested by Chen et al. (2005) for high-order structures. We first tested for configural equivalence across the ASD and control groups for the complete second-order structure. Next, metric equivalence constraints were added to the first, then second order factor loadings. Finally, scalar equivalence constraints were added to the first, then second order thresholds and intercepts.
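The forward sequence of nested models can be written down as a small data structure, which makes explicit that each step keeps all earlier constraints and adds one more set. The sketch below is illustrative only: the model labels M1 to M4 follow those used in the model specification, whereas the class name, the level names and the wording of the constraint sets are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvarianceModel:
    label: str
    level: str
    equal_across_groups: tuple  # parameter sets constrained equal so far


# Each step adds one set of cross-group equality constraints to the previous model.
STEPS = [
    ("M1", "configural", None),
    ("M2", "first-order metric", "first-order loadings"),
    ("M3", "second-order metric", "second-order loadings"),
    ("M4", "first-order scalar", "item thresholds"),
]


def build_forward_sequence(steps=STEPS):
    """Accumulate constraints so that each model is nested within its predecessor."""
    models, constrained = [], ()
    for label, level, new_constraint in steps:
        if new_constraint is not None:
            constrained = constrained + (new_constraint,)
        models.append(InvarianceModel(label, level, constrained))
    return models


for model in build_forward_sequence():
    print(model.label, model.level, list(model.equal_across_groups))
```

Because every model is nested within the one before it, a decline in fit at a given step can be attributed to the constraints added at that step.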

Model Estimation

All models were estimated in Mplus 6.11 (Muthén and Muthén 2010) using weighted least squares mean- and variance-adjusted (WLSMV) estimation with theta parameterization to account for the categorical measurement scale of the questionnaire (Rhemtulla et al. 2012).

Model Specification

Single Group CFAs

As a preliminary test of the appropriateness of invariance analysis in our data, we individually fit single group CFA models to both the first and second order proposed structure of the AQ-S in the ASD and control groups (Table 3). We fit the model suggested by Hoekstra et al. (2011) in the original validation study for the AQ-S and depicted in Fig. 1. For the purposes of scaling and identification, we fixed the variance of the latent factors to be 1.0.

Fig. 1

Structural diagram and standardized parameter estimates for the second order metrically invariant model (M3, Table 5). i item. Item numbers correspond to the numbers from the original AQ. Parameter estimates are presented for both the ASD group (on the left hand side of the forward slash) and the non-ASD group (right hand side of the forward slash)

MG-CFA Configural Invariance

The configural invariance model (Table 5, M1) provides the baseline model for all subsequent analyses (see section “Model Evaluation”). Here, factor loadings and item thresholds are free to vary across both groups. Factor variances are again fixed to 1.0 across both groups for identification. Latent factor means were fixed to zero in both groups, while item residual variances were fixed to 1.0 in both groups.

MG-CFA Metric Invariance

Metric invariance was assessed in both the first-order factors (Table 5, M2) and the second-order factor (Table 5, M3) by constraining the loadings to equivalence across groups.

MG-CFA Item Threshold Invariance

Next, equivalence constraints were placed on the thresholds of the indicator items across groups (Table 5, M4). The AQ-S has a four-point response scale, resulting in three thresholds per item. In this model, the factor means of the first-order factors were freely estimated in the second group (here the control group), whereas the factor means and variances remained fixed at 0 and 1.0, respectively, in the reference group (here the ASD group).

Model Evaluation

Model fit was evaluated using the comparative fit index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) and Weighted Root Mean Square Residual (WRMSR). We judged fit to be good when CFI and TLI values were >.90–.95, WRMSR values were <.90, and RMSEA values were <.08 (Hu and Bentler 1999; Schermelleh-Engel et al. 2003; Yu 2002). Though it is often desirable to compare models, especially non-nested models, based on the Akaike and Bayesian Information Criteria (AIC and BIC), these are not available in the current analyses because we utilise an item-level estimation technique (WLSMV), which does not maximize the log-likelihood required to calculate AIC and BIC.

In testing measurement invariance, the assumptions of invariance are taken to hold if the additional constraints placed on models yield negligible changes in model fit. There are no uniformly agreed-upon criteria for judging change in model fit within invariance models. Here we rely on the studies of Vandenberg and Lance (2000) and Chen (2007) and take a drop in fit of −.010 for the TLI and −.005 for the CFI, alongside a change in RMSEA of .010, to be indicative of a substantive decline in model fit and, thus, of a violation of the assumptions of the invariance constraints.
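The change-in-fit criteria can be expressed as a simple check, sketched below in Python. The cut-off values are those stated above; the function name, the sign convention (constrained model minus less constrained model) and the example deltas are assumptions made for illustration. The sketch deliberately reports each index separately, because the indices are weighed alongside one another rather than combined by a fixed rule.

```python
# Cut-offs stated in the text for a substantive decline in fit.
CFI_DROP = -0.005
TLI_DROP = -0.010
RMSEA_RISE = 0.010


def flag_fit_decline(delta_cfi: float, delta_tli: float, delta_rmsea: float) -> dict:
    """Report, per index, whether the change in fit reaches its cut-off.

    Deltas are (constrained model) minus (less constrained model).
    """
    return {
        "CFI": delta_cfi <= CFI_DROP,
        "TLI": delta_tli <= TLI_DROP,
        "RMSEA": delta_rmsea >= RMSEA_RISE,
    }


# Hypothetical changes in fit, for illustration only.
print(flag_fit_decline(delta_cfi=-0.002, delta_tli=-0.001, delta_rmsea=0.003))
print(flag_fit_decline(delta_cfi=-0.020, delta_tli=-0.015, delta_rmsea=0.012))
```

A mixed pattern, in which only one index reaches its cut-off, still calls for judgement about whether the constraints are tenable.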

Results

Descriptive Statistics

Substantial skew and kurtosis were evident in both the ASD (±2.43 and ±7.22 respectively) and control (±1.27 and ±1.24 respectively) groups. However, WLSMV is robust to non-normality, and as such, the levels of skew and kurtosis were not deemed problematic. Item communalities in both the ASD (range = .24–.60, mean = .40) and control (range = .21–.75, mean = .44) groups suggested the data were suitable for factor analysis.

In both samples, the percentage of missing data was low (ASD = .27 %; control = .84 %). Less than 5 % missingness is generally considered unproblematic (Tabachnick and Fidell 2007).

Subscale Means

Subscale means and standard deviations, first split by ASD versus control and then by males and females, are provided in Table 1. For the social skills, routine, and switching factors, means were higher in ASD relative to controls and in males relative to females. For the imagination and numbers/patterns subscales, however, there was essentially no observed mean difference between ASD and control or between males and females (see section “Discussion”).

Table 1 AQ-S subscale means and standard deviations for sample sub-groups

Table 2 reports the correlations among the five subscale scores of the AQ-S. Given the sample sizes in the current study, differences in Pearson’s correlations of approximately .20 would be indicative of a significant difference. Consideration of the estimates in Table 2 suggests that although correlations between subscales show some variation across groups, these variations do not reach the threshold for statistical significance. However, it is important to note that this may be a reflection of the size of the current samples.
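The figure of approximately .20 can be checked with a test for the difference between two correlations from independent samples. The text does not state which test underlies the figure, so the Fisher r-to-z approach below is an assumption, and the correlation values are hypothetical; only the group sizes are taken from the present study.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Group sizes from the present study; the correlations are hypothetical.
N_ASD, N_CONTROL = 148, 166
for r_asd, r_control in [(0.50, 0.30), (0.50, 0.40)]:
    z, p = compare_independent_correlations(r_asd, N_ASD, r_control, N_CONTROL)
    print(f"r = {r_asd:.2f} vs {r_control:.2f}: z = {z:.2f}, p = {p:.3f}")
```

Under this test the difference required for significance depends on the size of the correlations themselves, so .20 is best read as a rough guide rather than a fixed cut-off.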

Table 2 Subscale inter-correlations for ASD, control, males and females

Single Group CFAs

Fit indexes for the single group CFA analyses are provided in Table 3. Model fit for the first-order and second-order models in the ASD and control groups was acceptable based on the lower limits of model fit, and in line with model fits reported in previous studies. Fit in the ASD group was slightly better than that of the control group. The inclusion of the second-order factor in both the ASD and control groups resulted in a small decrease in model fit at the third decimal place in the CFI, TLI and RMSEA. However, given the magnitude of the factor correlations (ranging from r = .46 to .77; see also the sum score correlations in Table 2), which indicate the plausibility of a second-order factor, and past research on both the AQ and AQ-S suggesting its inclusion, the small differences in fit were not deemed substantive enough to support rejecting the second-order model. Modification indices (MI) and expected parameter changes (EPC) suggested the presence of some cross-loadings and correlated residuals in both samples and pointed to item complexity as an important source of mis-fit in the model. In the ASD group, the largest MIs and EPCs were associated with cross-loadings of item 41 on social behaviour (MI = 16.13, EPC = .26) and item 2 (MI = 15.71, EPC = .43) on numbers and patterns. In the control group, the largest MIs and EPCs were again associated with cross-loadings of item 41 on social behaviour (MI = 20.65, EPC = .36) as well as item 46 on Social Skills (MI = 16.80, EPC = .72). Given that fit was well within acceptable levels and because we wished to avoid capitalising on chance (e.g. McDonald and Ho 2002), we did not make any post hoc modifications to our models based on these MIs and EPCs.

Table 3 Model fit statistics for single group models

In the male, female and whole sample groups, however, the fit of the first-order models was poor (and we, therefore, did not attempt to fit a second-order model). A possible explanation for this was that differences between the ASD and control groups (i.e. non-invariance) meant that whenever a group contained both ASD and control individuals, there was a breakdown in factor structure (e.g. see Meredith and Teresi 2006). We tested this possibility by fitting single group CFAs to two samples to which participants were randomly assigned and which were, therefore, heterogeneous with respect to both sex and ASD versus control diagnosis. Table 4 provides details of the sex and ASD status of the random groups as compared to the whole sample. In addition, we fit single group CFAs to the sample of females without ASD and the sample of males with ASD in order to test the model on samples which were homogeneous with respect to both sex and ASD diagnosis. The purpose of these additional analyses was to help us to identify the source of mis-fit in the sex homogeneous, ASD heterogeneous samples.

Table 4 Cross tabulation of sex and ASD status in the whole and random samples

The model fit well in the male group excluding those without an ASD diagnosis and in the female group excluding those with an ASD diagnosis. In the groups mixed with respect to sex and ASD diagnosis, fit was poor and similar to that in the single sex groups which were heterogeneous with respect to ASD diagnosis. This pointed to differences between individuals with ASD and controls as the source of mis-fit in the mixed groups and suggested that we would observe non-invariance between ASD and control groups in subsequent stages of analysis.

Multi-Group CFA Measurement Invariance

Next, we sequentially tested for measurement invariance across ASD and control groups. The multi-group configurally invariant model (Table 5, M1) showed reasonable fit to the data and provided the initial baseline model for model fit comparisons. The assumption of factor loading or metric invariance was supported in the first-order model (Table 5, M2). Additional constraints on the second order factor loadings (Table 5, M3; see also Fig. 1) yielded a difference in CFI of −.011, above the suggested difference value of −.005. However, considered alongside the changes in TLI and RMSEA, invariance of the second order factor loadings was considered to hold. Therefore, the assumptions of factor loading, or metric invariance, were deemed to be reasonable for the AQ-S in both the first and second order factors across ASD and control groups.

Table 5 Model fit statistics for multi-group invariance models

We then proceeded to assess scalar invariance. The difference in model fit statistics between models M3 and M4 suggested that the assumption of scalar invariance of the indicator variables did not hold (∆CFI = −.18; ∆TLI = −.16; ∆RMSEA = .032). We investigated the source of misfit based on the MI. However, after twelve individual item thresholds had been released, model fit was still poor, and so model modification was terminated. All thresholds for items 3, 4, 23 and 46 were released, suggesting that these items performed very differently across groups. Considering the response counts for these items confirmed this. For example, for item 4 (I frequently get so strongly absorbed in one thing that I lose sight of other things), the distribution of responses across the four points of the response scale was 2, 2, 30 and 113 in the ASD group, and 60, 66, 30 and 10 in the control group.
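To give a feel for how differently item 4 behaves in the two groups, the response counts can be converted into probit thresholds, as sketched below. This is a descriptive illustration only and not the model-based estimate from the multi-group CFA: it assumes a standard normal latent response within each group, so the resulting values mix any latent mean difference with threshold non-invariance. The function name is introduced here for illustration.

```python
from itertools import accumulate
from statistics import NormalDist


def marginal_thresholds(counts):
    """Probit thresholds implied by observed category counts.

    Assumes a standard normal latent response in the group, so a four-category
    item yields three thresholds.
    """
    total = sum(counts)
    cumulative = list(accumulate(counts))[:-1]  # drop the final category (proportion 1)
    return [NormalDist().inv_cdf(c / total) for c in cumulative]


# Item 4 response counts reported in the text.
asd = marginal_thresholds([2, 2, 30, 113])
control = marginal_thresholds([60, 66, 30, 10])
print("ASD:    ", [round(t, 2) for t in asd])
print("Control:", [round(t, 2) for t in control])
```

A uniform shift in all three thresholds could in principle be absorbed by a difference in latent means, so it is differences in the spacing between thresholds across groups that are more suggestive of the non-invariance detected by the formal test.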

As the focus of the current study was not on mean differences in latent factor scores, and because we were unable to achieve first-order scalar invariance, we did not continue to fix the intercepts of the second-order factors.

Discussion

In the present study, we found that the higher-order structure of the AQ-S suggested by previous studies exhibited configural and metric invariance across an ASD and control group in a series of multi-group CFA models. We were not, however, able to achieve scalar invariance even in the first-order model. After releasing equality constraints on 12 item thresholds, the model fit remained outside of the bounds that would indicate acceptable fit.

Implications

The importance of assessing measurement equivalence of a measure used across two groups is widely acknowledged; however, the present study was, to our knowledge, the first to examine the measurement equivalence of a questionnaire used across both ASD and control groups. This is important because measures of autistic traits are frequently administered to both groups, with varying research goals.

Which level of invariance is required and how problematic violations of invariance are depend on the specific purpose of using the measure in the two groups (Borsboom 2006). Our finding that the measure exhibited invariance at the metric level suggests that the manifest variables of the AQ-S measure the same latent traits across ASD and control groups. From a substantive point of view, this is interesting because it implies that the same constructs can describe autistic traits in ASD and control groups. This does not imply that these traits are necessarily continuously distributed across these groups (e.g. see Frazier et al. 2010), but it does suggest that even if the ASD and non-ASD groups represent distinct latent categories, the two categories may still be definable in terms of the same latent autistic traits. To formally test this hypothesis, it would be interesting to conduct latent class or taxometric analyses, including tests of invariance across categories, with the AQ-S; however, we did not have the requisite sample size in the present study.

We did not observe complete scalar invariance, even in our first-order model. A lack of scalar invariance suggests test bias and means that equal observed scores on the AQ-S do not necessarily imply equal levels of autistic traits in an individual drawn from an ASD versus a non-ASD population. This may make sense in the context of the ALT-ASD continuity hypothesis, which proposes that those who score high on ALTs may not receive a diagnosis of ASD if they are able to compensate and achieve adaptive levels of social functioning in spite of being predisposed to ASD. This could result in non-diagnosed individuals having higher levels of autistic traits for the same AQ-S score as an individual with a clinical diagnosis of ASD. It is also consistent with a systematic under-reporting of symptoms in individuals with ASD due to deficits in self-perception (Johnson et al. 2009). This latter hypothesis would predict that scalar invariance could be observed if data were obtained from informant raters rather than via self-report, and it would be interesting to test in future research.

Strengths and Limitations

In the present study we examined the measurement equivalence of the AQ-S because it is a popular measure of autistic traits used frequently in individuals both with and without ASD. Historically, defining the core features of ASD in light of its complexity and heterogeneity has proven a challenge (Rajendran and Mitchell 2007). The AQ and its derivative, the AQ-S, represented an attempt to tackle this issue and distil ASD into its core trait dimensions using psychometric tools of analysis. The AQ has undergone extensive psychometric evaluation and the majority of studies support its psychometric utility. Wheelwright et al. (2010) note that it shows consistent results over time, produces highly heritable scores, predicts clinical diagnosis of ASD and correlates with brain function, genetic polymorphisms in ASD candidate genes, social attention and prenatal testosterone levels. The more recently developed AQ-S yielded scores that correlated strongly with those of the full AQ in both control samples and an ASD sample, suggesting minimal loss of information in moving to a briefer inventory (Hoekstra et al. 2011). They also demonstrated good sensitivity and specificity in classifying individuals as having or not having a clinical diagnosis of ASD (ibid).

One issue, however, is that the AQ was designed for the express purpose of measuring autistic traits in adults with normal intellectual functioning (Baron-Cohen et al. 2001). For example, behaviours that are less common in high functioning ASD than low functioning ASD such as repetitive sensori-motor behaviours are not represented in the AQ-S. In fact, estimates suggest that approximately 50–70 % of individuals with ASD have an intellectual disability (Matson and Shoemaker 2009). The AQ-S is, therefore, not applicable to a large proportion of individuals with ASD and also may be less likely to show invariance across a low functioning ASD group and control group. Nevertheless, this does not undermine the validity of the current results with respect to individuals of normal intellectual functioning.

It is also important to consider the effect of our particular sample on the generalisability of results. We used a clinical sample of individuals with an ASD diagnosis, as well as a sample drawn from the general population. The relative rarity of clinical disorders can limit the sample sizes attainable for studies on clinical disorders. In the present study our sample size was small for an application of multi-group CFA; however, simulation studies have suggested that a lack of invariance can be detected for group sizes as small as 100 if the effect size is not small and that, once the sample size reaches 200 per group, a lack of invariance can generally be detected well, irrespective of the effect size (Kim and Yoon 2011). Our group sample sizes of 148 for the ASD group and 166 for the control group fell between these two values, suggesting that our study was sufficiently powered to detect a lack of invariance unless it was of only small effect size. Thus, while we can rule out moderate to large differences in the factor structure of the AQ-S across individuals with and without ASD in the current sample, replication of these findings and larger samples may be required to formally test for small differences in factor loadings across groups. We were also more likely to observe non-invariance by using a general population sample as our comparison control group. Had we used a sample comprising, for example, relatives of individuals with ASD, who would likely have had levels of autistic traits more similar to those of a clinical ASD sample, we may not have observed such a large decrease in fit when adding scalar constraints across the ASD and control groups.

Another factor which could affect the generalisability of the present results is the differing sex ratios of the control and ASD groups. ASD is more prevalent in males than females, with estimates suggesting an average sex ratio of 4.3:1 in the disorder and with an even higher male:female ratio in samples of individuals with higher IQs, such as that utilised in the present study (Rivet and Matson 2011). Thus, in the present study the sex ratios favoured males in the ASD group and females in the control group. It is possible, given that there has been much discussion of possible sex differences in ASD, that our failure to find threshold invariance partly reflects these possible sex differences (Rivet and Matson 2011). The results from our single group CFAs using both sex/ASD homogeneous and heterogeneous samples suggested that the non-invariance had its origin in differences between ASD and control individuals rather than in sex differences. However, although these models are suggestive, they did not provide a formal test of differences across ASD status and sex. Such tests would be possible by applying a four-group invariance analysis, with homogeneous sex/ASD groups. Again, the current sample was not large enough for such an analysis. Sex differences in ASD are, nevertheless, not yet well understood and further research would be required to examine possible sex differences in the constructs measured by the AQ-S.