Individuals with autism spectrum disorders (ASD), including Autistic Disorder, Asperger’s Disorder, and Pervasive Developmental Disorder-Not Otherwise Specified (PDD-NOS), exhibit pervasive deficits in social interaction and communication and often engage in restricted and repetitive behaviors (American Psychiatric Association 2000). Family and twin studies demonstrating high recurrence rates and heritability provide evidence for genetic transmission of the disorder (Bailey et al. 1995; Bolton et al. 1994; Folstein and Rutter 1977). These studies have also identified a set of traits in unaffected relatives that are qualitatively similar to the defining features of autism. These traits are therefore thought to represent the genetic liability for ASD and are referred to as the broader autism phenotype (BAP; Piven et al. 1997). The broad objective of this paper is to provide a thorough psychometric evaluation of self-report measures of the BAP in a non-clinical sample.

It has recently been proposed that the BAP can and should be studied in the general population (e.g., Constantino and Todd 2003). To this end, a number of studies have found that autistic behavioral traits are continuously distributed in population-based samples (Constantino and Todd 2003; Hoekstra et al. 2007; Hurst et al. 2007a; Stewart and Austin 2009). These traits are heritable (Constantino and Todd 2003, 2005; Hoekstra et al. 2007) and associated with a wide range of measures of social functioning (Ingersoll 2010; Jobe and Williams White 2007), personality structure (Austin 2005a; Wakabayashi et al. 2006), related psychopathology like mood and anxiety disorders (Kunihira et al. 2006), cognitive traits (Wheelwright et al. 2006), and neurological processing (Di Martino et al. 2009). This pattern of findings from non-clinical samples suggests that BAP features have broader relevance for describing and predicting psychological functioning in the general population.

The original studies of the BAP in family members utilized extensive family history interviews, structured interviews of personality, and standardized measures of language functioning (Landa et al. 1992; Murphy et al. 2000; Piven et al. 1997a, b). However, recent studies examining the BAP in the general population have relied on self- or informant-report questionnaires, most of which were designed to identify ASD rather than the BAP. The most commonly used measure is the Autism Spectrum Quotient (AQ; Baron-Cohen et al. 2001). The AQ is a self-report questionnaire that was originally developed to identify ASD among adults with normal intelligence. It contains five theoretically defined subscales of autistic behavior: social skills, attention switching, attention to detail, communication, and imagination.

The AQ has been shown to distinguish between individuals with high-functioning ASD and typically developing individuals (Baron-Cohen et al. 2001), as well as individuals with other psychiatric disorders (Woodbury-Smith et al. 2005). A number of studies have used the AQ as an index of the BAP, both in relatives of individuals with ASD and in the general population. These studies have found that parents of children with ASD score higher on several subscales of the AQ than parents without children with ASD (Auyeung et al. 2010; Woodbury-Smith et al. 2005) and that autistic traits measured on the AQ are normally distributed in the population (Hurst et al. 2007a) and heritable (Hoekstra et al. 2007). Further, consistent with the finding that the BAP is significantly more common in males (Szatmari et al. 2000) and that individuals with ASD are more likely to have a relative in a scientific occupation (Baron-Cohen et al. 1998), several studies have found that males score higher on the AQ than females, as do individuals whose parents are in scientific occupations or who are pursuing scientific occupations themselves (Austin 2005; Baron-Cohen et al. 2001). Finally, a number of studies in non-clinical samples have found an association between AQ scores and anxiety and depression (Kunihira et al. 2006), schizotypy (Hurst et al. 2007), personality traits such as low extraversion and high neuroticism (Austin 2005; Wakabayashi et al. 2006), and poor interpersonal functioning indicated by fewer friendships, higher loneliness, and greater likelihood of being bullied (Jobe and Williams White 2007; Kunihira et al. 2006).

Another questionnaire that has been used in BAP research in the general population is the Social Responsiveness Scale (SRS). The SRS is a parent/teacher questionnaire that was developed in a general population sample to distinguish children with ASD from children with typical development or other psychiatric disorders (Constantino 2002). The published instrument was developed for children ages 4–17; however, the SRS has recently been adapted for use with adults (SRS-A) by changing the wording on several items (Constantino and Todd 2005). The SRS focuses on reciprocal social behaviors which have been found to be impaired in ASD, but includes items related to all three ASD symptom domains. It has five theoretically defined subscales: social awareness, social cognition, social communication, social motivation, and autistic mannerisms. The SRS has been found to effectively distinguish between children with ASD and those with non-ASD diagnoses (Constantino et al. 2003, 2007). It has also been shown to have reasonable convergence with well-established, clinician-administered measures of ASD, including the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS) (Constantino et al. 2003). A number of studies that have used the SRS as an index of the BAP have shown that siblings of children with ASD score higher on the SRS than siblings of children with non-ASD psychopathology (Constantino et al. 2007) and that autistic traits measured on the SRS are continuously distributed and heritable in the general population (Constantino and Todd 2003, 2005). Finally, a recent study found that self-reported autistic symptoms on a modified version of the SRS were associated with a number of psychiatric problems including anxiety, depression, atypicality (having unusual thoughts or perceptions), and interpersonal difficulties, in a non-clinical sample of college students (Kanne et al. 2009).

Unlike the two previous instruments which were originally developed to identify individuals with ASD, the Broad Autism Phenotype Questionnaire (BAPQ) was specifically developed to identify individuals with the BAP (Hurley et al. 2007). The BAPQ was developed in a sample of parents of children with ASD and was designed to correspond to a conceptualization of the BAP as a set of personality traits. As such, it has demonstrated convergent validity with direct clinical assessment of the BAP using interview, clinical assessment, informant report, and consensus ratings by trained raters (Hurley et al. 2007). The BAPQ has three theoretically defined subscales thought to represent the key components of the BAP: aloof, pragmatic language, and rigidity. As the most recently developed scale, there is only a single published study on the BAPQ and it has not yet been evaluated in the general population.

Although studies based on the AQ, SRS, and BAPQ have found autistic traits to be correlated with deficits found in individuals with ASD, it is unclear how well the instruments actually perform at measuring the BAP in the general population. Further, it is unclear how well these measures compare in terms of their psychometric properties. Given the recent acceleration in research studies investigating the BAP in the general population, as well as the potential utility of self-report questionnaires in genetic studies of autistic traits in population-based samples, it is important to assess and compare the psychometric characteristics of these self-report questionnaires of the BAP in non-clinical samples.

A valid and useful measure of the BAP in the general population would have several characteristics. First, it would produce scores with a high degree of internal consistency, both in terms of the internal consistency of the overall score and the individual subscales. Previous studies of the SRS and BAPQ have shown acceptable (i.e., >.85) levels of internal consistency for the overall scores and individual subscales (Constantino et al. 2003; Hurley et al. 2007). Research on the AQ has found it to have moderate (.67) (Hurst et al. 2007b) to acceptable (>.70) internal consistency for the overall AQ score (Austin 2005; Baron-Cohen et al. 2001). However, internal consistency estimates for the individual scales can be significantly lower, ranging from a low of .34 to a high of .77 across studies.

Second, given the higher rate of ASD (APA 2000) and the BAP (Piven et al. 1997a, b) in males, it would be expected that males would score higher than females on the measure. All three measures have been shown to exhibit this pattern in previous studies; however, at least one study on the AQ in the general population has not found this sex-linked difference (e.g., Hurst et al. 2007b). Additionally, given research suggesting that dimensional autistic traits are continuously distributed in the general population (Constantino and Todd 2003; Hoekstra et al. 2008), it would be expected that the overall scores on the measure would be continuously distributed. A number of studies have found the AQ to be normally distributed in the general population (Baron-Cohen et al. 2001; Hoekstra et al. 2008; Hurst et al. 2007b). Findings on the SRS have been less consistent, with some suggesting a normal distribution in the general population (Constantino et al. 2006) and others suggesting a positively skewed distribution (Constantino and Todd 2005). The distributions of the BAPQ scores in the general population have not been previously reported.

Third, it is expected that a valid measure of the BAP in the general population would have a replicable factor structure. The measures vary in their number of subscales as well as their content. Both the AQ and SRS-A contain five subscales; however, factor analytic studies of these two instruments have suggested different structures. The AQ has been found to have between two and four factors (Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007b; Stewart and Austin 2009). Social skills and communication consistently emerge as prominent factors; however, the replication of other factors such as attention to detail and imagination tends to be less consistent. The SRS has been found to be best represented by a single factor (Constantino et al. 2004). This finding is likely due to the fact that almost all items on the SRS-A relate to social and communication behavior, which have been found to substantially overlap in individuals with ASD (Lord et al. 2000; Tadevosyan-Leyfer et al. 2003). The factor structure of the BAPQ has not yet been evaluated, although its subscales correspond to the three defining behavioral domains of ASD (social, stereotyped-repetitive, and communication deficits) and the conceptualized structure of the BAP in relatives (Piven et al. 1997a, b). Thus, the precise structure of these measures is unclear at this point.

Fourth, a valid measure should show theoretically predictable patterns of association with criterion constructs (i.e., criterion validity), and a superior measure would increment other measures of the BAP in predicting the above related constructs (i.e., incremental validity). A number of psychopathology and personality constructs have been proposed to relate to the BAP. In terms of psychopathology, both social phobia and obsessive–compulsive disorder have been found to be higher in relatives of individuals with ASD, and have been proposed as features of the BAP (Micali et al. 2004; Piven and Palmer 1999). Studies have also found rates of depression to be higher in relatives, although there is some debate as to whether depression is part of the BAP (Ingersoll and Hambrick 2011; Micali et al. 2004; Piven and Palmer 1999). There also appears to be a significant overlap between the BAP and schizoid personality disorder (Constantino et al. 2009). Further, there is evidence that schizotypy shares a number of features with the BAP, particularly involving the facet of social anhedonia (Hurst et al. 2007a, b). Likewise, given the interpersonal deficits that are thought to comprise the BAP (e.g., fewer and poorer quality friendships), it would be expected that scores on measures of the BAP should be positively correlated with measures of interpersonal problems.

Last, measures of the BAP should be linked with the Big Five domains. Research on the BAP in relatives of those with ASD suggests that individuals with the BAP present with aloof, shy, over-sensitive, anxious, impulsive, irritable, and eccentric personality features. These traits have been shown, via factor analysis, to represent three broad constellations: withdrawn, difficult, and tense (Murphy et al. 2000), which map conceptually onto three dimensions of the five factor model of personality (Goldberg 1993): low extraversion, low agreeableness, and high neuroticism, respectively. Subsequent research has confirmed that individuals with the BAP, as well as those with a diagnosis of autism, score low on the traits of extraversion and agreeableness and high on the trait of neuroticism (Austin 2005; De Pauw et al. 2010; Wakabayashi et al. 2006).

In light of these criteria for evaluating measures of the BAP, the goal of the present study was to compare the AQ, SRS-A, and BAPQ in terms of gender differences, internal consistencies, distributions of scores, factor structure, and criterion-related validity in a large, non-clinical sample in order to provide evidence that will assist researchers in selecting an instrument in further investigations of the BAP in non-clinical samples.

Method

Participants

Six hundred and eighty undergraduate students at a large, Midwestern university were recruited through a departmental subject pool and participated for course credit. Participants were recruited in two separate samples (N1 = 341; N2 = 339). Fifty-four participants (8%) were excluded due to incomplete data, yielding a final sample of 312 participants in Sample 1 and 314 in Sample 2. The total sample was 76% female (N = 476) and ranged in age from 18 to 45 years (M = 19.68, SD = 2.02). Participants’ reported ethnicity was 84.2% White, 6.2% Black, 8% Asian/Pacific Islander, 1% Hispanic, .3% Native American, and .2% Bi/Multiracial.

Measures

Broader Autism Phenotype

Participants in both samples completed three self-report measures of the BAP. The Autism Spectrum Quotient (AQ; Baron-Cohen et al. 2001) is a 50-item self-report scale for identifying high-functioning autism in individuals with normal intelligence. Each item is rated on a 4-point scale, “definitely agree”, “slightly agree”, “slightly disagree”, and “definitely disagree.” Items are scored dichotomously as “0” or “1”, with “1” representing answers in the direction of autism. The Social Responsiveness Scale-Adult (SRS-A; Constantino and Todd 2005) is a 65-item informant-report questionnaire for assessing autistic traits. Each item on the SRS-A is rated on a 4-point (0–3) scale, “not true”, “sometimes true,” “often true,” and “almost always true.” For the purposes of this study, the SRS-A was adapted to a self-report format by changing the wording in the instructions to “Please check the box that best describes how others would describe your behavior.” The Broad Autism Phenotype Questionnaire (BAPQ; Hurley et al. 2007) is a 36-item self-report scale developed to identify the broader autism phenotype in relatives of individuals with ASD. Each item of the BAPQ is rated on a 6-point (1–6) scale, from “very rarely” to “very often.” Scores are summed for the first two measures, and averaged for the third, with higher scores indicative of greater likelihood of expressing the BAP.
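As a rough illustration of the three scoring rules just described, the sketch below scores each measure from raw item responses. The function names and the `agree_keyed` item key are hypothetical; the actual keying of each AQ item comes from the published instrument, and any reverse-keyed items on the SRS-A or BAPQ are assumed to have been recoded before scoring.

```python
from statistics import mean

# Response labels for the AQ's 4-point scale, as listed in the text.
AQ_AGREE = {"definitely agree", "slightly agree"}
AQ_DISAGREE = {"slightly disagree", "definitely disagree"}


def score_aq(responses, agree_keyed):
    """Sum of dichotomous (0/1) AQ item scores.

    responses   -- dict mapping item number to a response label
    agree_keyed -- set of item numbers on which agreeing points toward
                   autism (hypothetical key, supplied by the caller)
    """
    total = 0
    for item, answer in responses.items():
        if answer not in AQ_AGREE | AQ_DISAGREE:
            raise ValueError(f"unexpected response for item {item}: {answer!r}")
        agreed = answer in AQ_AGREE
        # An item scores 1 when the answer falls in the keyed direction.
        total += int(agreed == (item in agree_keyed))
    return total


def score_srs_a(ratings):
    """Sum of SRS-A item ratings, each on the 0-3 scale."""
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("SRS-A ratings must be integers from 0 to 3")
    return sum(ratings)


def score_bapq(ratings):
    """Mean of BAPQ item ratings, each on the 1-6 scale."""
    if any(r not in (1, 2, 3, 4, 5, 6) for r in ratings):
        raise ValueError("BAPQ ratings must be integers from 1 to 6")
    return mean(ratings)
```

Because the AQ and SRS-A are summed while the BAPQ is averaged, the three totals are on different metrics and are comparable only after standardization or through correlations.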

Random Responding

The Personality Assessment Inventory (PAI; Morey 1991) Infrequency scale was administered to both samples to detect participants who may have produced random data. It asks questions that are endorsed very infrequently in normative studies. Individuals with T scores >75 were removed from the data, meaning that the data provided by any respondent who endorsed such items at a level that is 2.5 standard deviations higher than observed in the PAI community normative sample were not included in this study. This procedure led to the removal of 8 participants in sample 1 (N = 304) and 17 participants in sample 2 (N = 297).
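The correspondence between the T-score cutoff and standard deviation units follows from the usual T-score convention, in which the normative sample has a mean of 50 and a standard deviation of 10. A minimal sketch of the exclusion rule under that convention, with hypothetical function names:

```python
def t_to_sd_units(t_score, norm_mean=50.0, norm_sd=10.0):
    """Distance of a T score from the normative mean, in SD units."""
    return (t_score - norm_mean) / norm_sd


def is_random_responder(t_score, cutoff=75.0):
    """Flag respondents whose Infrequency T score exceeds the cutoff."""
    return t_score > cutoff


# A T score of 75 sits 2.5 SDs above the normative mean.
cutoff_in_sd = t_to_sd_units(75.0)
```

The comparison is strict, so a respondent with a T score of exactly 75 would be retained under this reading of the rule.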

Sample 1 Criterion Measures

Several criterion measures assessing psychopathology and personality constructs with theoretical relations to the BAP were administered to sample 1. The Social Phobia Inventory (SPIN; Connor et al. 2000) is a 17-item self-report scale used to identify symptoms of social phobia (alpha = .92). The Schizotypal Personality Questionnaire-Brief (SPQ-B; Axelrod et al. 2001) is a 22-item measure of schizotypal personality features (alpha = .81). The Obsessive–Compulsive Inventory-Revised (OCI-R; Foa et al. 2002) is an 18-item self-report questionnaire that measures obsessive–compulsive symptoms (alpha = .93). The Center for Epidemiologic Studies Depression Scale (CES-D; Radloff 1977) is a 20-item self-report scale that measures depressive symptoms in the general population (alpha = .92). The Big Five Inventory (BFI; John et al. 1991) is a 44-item measure of Big Five personality traits (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness). The alpha coefficients for these scales ranged from .77 to .95. The Inventory of Interpersonal Problems-Short Circumplex (IIP-SC; Hopwood et al. 2008; Soldz et al. 1995) is a 32-item measure of interpersonal problems. Its total score (alpha = .90) represents interpersonal distress.

Sample 2 Criterion Measures

A different set of psychopathology and personality criterion measures was administered to sample 2. Two scales from the Personality Diagnostic Questionnaire-4 (PDQ-4; Hyler 1996), which was designed to assess the DSM-IV criteria for personality disorders, were administered to assess obsessive–compulsive (alpha = .48) and schizoid (alpha = .67) personality disorders. The scales have 8 and 7 items, respectively.

Analyses

Analyses were conducted in several stages designed to evaluate internal consistency, gender differences, dimensionality, and structure of the three BAP measures, test their criterion validity, and test their ability to increment one another in criterion predictions. We computed Cronbach’s alpha internal consistency coefficients for each scale and the total score of each measure in the combined sample. We then compared scores across genders to test the hypothesis that men would generally achieve higher scores than women. We used a conservative Type I error rate of .01 to account for multiple tests of gender differences. We next visually examined the distributions for the subscales of each measure using the full sample, with the expectation that the distributions would be continuous. Given previous reports of a normal distribution for the AQ and SRS, we also computed skewness and kurtosis values in order to evaluate the magnitude and significance of scale departures from normality. All of the above analyses involved the combined sample.
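To make the first and last of these steps concrete, the sketch below computes Cronbach's alpha from item-level scores and tests a scale's skewness against zero. The skewness test shown (the statistic divided by its standard error, compared with the two-tailed critical z for the chosen alpha level) is an assumption about how departures from normality might be evaluated, not a description of the software used in this study; the function names are illustrative. Kurtosis can be treated analogously.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items -- list of k lists, one per item, each holding the n respondents'
             scores on that item (same respondent order in every list)
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def skewness_test(scores, alpha=0.01):
    """Return (skewness, standard error, significant?) for one scale."""
    n = len(scores)
    m = mean(scores)
    m2 = sum((v - m) ** 2 for v in scores) / n
    m3 = sum((v - m) ** 3 for v in scores) / n
    # Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient).
    skew = (m3 / m2 ** 1.5) * sqrt(n * (n - 1)) / (n - 2)
    # Standard error of skewness under normality.
    se = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 for .01
    return skew, se, abs(skew / se) > critical_z
```

With several hundred respondents the standard error of skewness is small, so even a modest asymmetry can reach significance at p < .01.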

To test the structure of the measures we conducted an exploratory factor analysis (EFA) with principal axis factoring and oblique (oblimin) rotation on each inventory. We forced the number of factors proposed by previous theory and research for each instrument. We assessed replicability in two ways. First, we conducted the EFAs in the combined sample, and correlated the pattern coefficients with binary codes based on the scoring algorithms published previously for each measure. These point-biserial correlations provide an indication of the congruence between the observed solution and theoretical expectations. Second, we conducted the same analysis in each sample separately, and rotated the factors from Sample 2 to the Sample 1 target matrix using Procrustean rotation methods as implemented in Mplus. This permitted the computation of Pearson congruence coefficients across samples. Conventionally, congruence coefficients >.85 are regarded as acceptable for demonstrating structural generalizability. However, even well-validated measures of personality sometimes fall short of this benchmark (Hopwood and Donnellan 2010). For the purposes of comparing measures in this study, these coefficients were considered relative to one another as opposed to conventional standards.
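The two replicability indices can be sketched for a single factor as follows. The point-biserial correlation is simply a Pearson correlation in which one variable is binary (here, 1 if an item is scored on the corresponding theoretical scale and 0 otherwise). For the cross-sample comparison, the sketch uses Tucker's congruence coefficient, which is an assumption about the exact index; the factor extraction and Procrustean rotation themselves are not reproduced here, and all names and loadings are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; point-biserial when one input is coded 0/1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    cross = sum(p * q for p, q in zip(loadings_a, loadings_b))
    norm_a = sum(p * p for p in loadings_a)
    norm_b = sum(q * q for q in loadings_b)
    return cross / sqrt(norm_a * norm_b)


# Hypothetical pattern coefficients for four items on one factor, and a
# binary code marking which items belong to the matching theoretical scale.
pattern = [0.8, 0.7, 0.1, 0.0]
scale_key = [1, 1, 0, 0]
fit_to_theory = pearson_r(pattern, scale_key)

# Hypothetical loadings for the same factor in two samples after rotation.
cross_sample = congruence([0.5, 0.6, 0.7], [0.55, 0.58, 0.72])
```

Unlike the Pearson correlation, the congruence coefficient does not center the loadings, so it is sensitive to the overall level of the loadings as well as to their pattern.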

To assess the criterion validity of the BAP measures we computed bivariate correlations between the total scores for each BAP measure and a number of variables that are theoretically or empirically related to the BAP, as described above. The correlation coefficient for each of these analyses was given as an indication of the relation between each BAP measure and the criterion variables. We evaluated incremental validity by computing partial correlations between each BAP measure and the criterion variables, while controlling for each of the other two BAP measures.
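The incremental validity analysis can be illustrated with a second-order partial correlation: the correlation between one BAP measure and a criterion after the other two BAP measures have been partialled out. The sketch below builds it from the standard recursion on first-order partials; the function names and the toy data are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _partial(r_xy, r_xz, r_yz):
    """Partial correlation of x and y controlling for a single z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r(x, y, z1, z2):
    """Correlation of x and y controlling for both z1 and z2."""
    # Step 1: remove z1 from each pair of interest.
    xy_1 = _partial(pearson_r(x, y), pearson_r(x, z1), pearson_r(y, z1))
    xz2_1 = _partial(pearson_r(x, z2), pearson_r(x, z1), pearson_r(z2, z1))
    yz2_1 = _partial(pearson_r(y, z2), pearson_r(y, z1), pearson_r(z2, z1))
    # Step 2: remove z2 from the z1-adjusted correlations.
    return _partial(xy_1, xz2_1, yz2_1)


# Toy data: the criterion equals 1.5 * focal + 0.5 * other_b, so the focal
# measure relates perfectly to it once other_b is controlled.
focal = [1, 1, 1, 1, -1, -1, -1, -1]
other_a = [1, 1, -1, -1, 1, 1, -1, -1]
other_b = [1, -1, 1, -1, 1, -1, 1, -1]
criterion = [2, 1, 2, 1, -1, -2, -1, -2]
incremental = partial_r(focal, criterion, other_a, other_b)
```

In the study's terms, `focal` would be one BAP total score, `other_a` and `other_b` the remaining two, and `criterion` one of the outcome variables.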

Results

Internal consistencies for the three BAP measures are given in Table 1. Alphas for the total scores for each BAP measure were adequate (>.70). The values were somewhat low (i.e., all <.70) for the individual subscales of the AQ. In contrast, all of the SRS-A subscale alphas except Social Awareness (.64) and all of the alphas for the BAPQ subscales were >.70, suggesting adequate internal consistency. Means and standard deviations in the full sample and across men and women are also reported in Table 1. Independent samples t-tests showed that men had significantly (p < .01) higher scores than women on the overall SRS-A and all of its subscales as well as the overall BAPQ and two of its subscales (Aloof and Pragmatic Language). In contrast, there were no sex differences on the overall AQ and only its Imagination subscale showed the expected sex differences.

Table 1 Characteristics of the scales of three BAP measures across both study samples

Table 1 also shows skew and kurtosis statistics and standard errors. Visual inspection of the distributions suggested that the total scores for each measure and their subscales were continuously distributed. Using a p value of .01, the overall BAPQ and two of its subscales (Pragmatic Language and Rigidity) did not depart significantly from normality. In contrast, the overall SRS-A and all of its subscales as well as the overall AQ and three of its subscales (Social Skills, Communication, and Imagination) were positively skewed. The total SRS-A and two AQ subscales, Social Skills and Attention to Detail, showed excessive kurtosis.

Item-level EFAs were conducted for each measure in the combined sample in order to explore their structure and produce a target matrix with which to test the replicability of that structure against theoretical binary codes. Analyses of the AQ showed that 17 factors had eigenvalues >1. Examination of the scree plot and a minimum average partial (MAP) procedure (Velicer 1976) both suggested the retention of 3 (19.45% variance explained) factors. Following oblimin rotation, factor correlations ranged from .04 to .17.

Thirteen SRS-A factors had eigenvalues >1. However, the MAP test recommended five factors, and the scree plot suggested the existence of 3 (36.28% variance explained) or perhaps 4 (38.41%) factors. It is notable that previous research on this instrument has suggested that its items can be represented by a single factor. However, in this sample, the first factor only explained 27.20% of the variance in its item intercorrelations. In the three factor solution, 13 items loaded most strongly on the rotated second factor and 12 loaded most strongly on the rotated third factor; in the four factor solution 13, 13, and 15 items loaded most strongly on factors 2 through 4, respectively. Notably, none of the items on an ostensible fifth factor had the strongest pattern coefficient. Although these results are ambiguous in terms of the precise structure of the SRS-A, they are clearly inconsistent with the interpretation of a single factor. For the four factor solution, factor correlations ranged from .11 to .55.

Results from an EFA of the BAPQ were the most consistent with theoretical expectations among the three measures evaluated here. Although 7 factors had eigenvalues >1 and the MAP test suggested the extraction of four factors, the scree plot clearly indicated the extraction of three factors, which explained 38.44% of the variance among BAPQ items. Only two items from the fourth extracted factor demonstrated the strongest pattern coefficients. Overall, these results support the instrument’s proposed three-factor structure. The intercorrelations of these three factors ranged from .18 to .25 following oblimin rotation.

Table 2 shows the replicability of the proposed structures of these three measures compared to theoretical expectations and across samples. Replicability was examined two ways. First, factor pattern coefficients from the combined sample analyses described above were correlated with binary codes reflecting the items that are scored on the scales of each measure. These point-biserial correlations indicate the degree to which the empirically derived factors correspond to the theoretical structure of the test. Results show that all three measures had questionable values for some scales, but also that the BAPQ outperformed the other measures (average coefficient = .77), and the AQ (.55) outperformed the SRS-A (.28). This was even the case when interpretation was restricted to those factors that were supported empirically by EFAs as described above (i.e., average coefficient among the first 3 for the AQ = .56, among first 4 for the SRS-A = .33). Second, EFAs were conducted in both samples, with the pattern coefficients from Sample 2 rotated with a Procrustean routine designed to maximize its convergence with the Sample 1 matrix. Only the BAPQ had coefficients across all factors that were at or above the .85 benchmark that is often used to signify satisfactory congruence. Three such factors achieved this value for the AQ and 2 did for the SRS-A.

Table 2 BAP factor replicability

We conducted a series of bivariate and partial correlation analyses to test the convergent and incremental validity of each of the BAP measures. Table 3 shows the bivariate correlations between the total score for each BAP measure and the outcome variables. Each BAP measure correlated strongly with the other two measures (rs ranged from .55 to .66). These correlations were generally stronger than those between the BAP measures and the related constructs. Each BAP measure also correlated significantly and in the expected direction with a number of theoretically related constructs. In general, the SRS-A and BAPQ correlated more strongly with a greater number of variables than the AQ. However, in many cases the magnitude of these correlations differed only slightly.

Table 3 Total score correlations with theoretical criteria for each measure

Given the overall similarity of these criterion correlations, it is useful to assess the ability of each instrument to increment the others. These results are depicted in Table 4. Each measure incremented the others in predicting a number of criterion variables. It was again evident that the AQ had somewhat lesser validity. For instance, the BAPQ and SRS-A provided incremental validity at p < .01 beyond the AQ for 22/24 (92%) of outcome variables, whereas the AQ provided incremental information beyond the other measures for only 11/24 (46%) of outcome variables. Further, in cases where the AQ did significantly increment the other BAP measures, its partial correlation coefficients were generally lower. For example, the average of the absolute values of significant partial r’s for the AQ was .21, compared to .34 for the SRS-A and .30 for the BAPQ.

Table 4 Incremental criterion-related validity (partial r) of three measures of BAP

Discussion

This study compared three self-report measures of the broader autism phenotype with regard to their psychometric properties in a non-clinical sample. Evaluation criteria included internal consistency, expected gender differences, continuity of scale distributions, factor structure, and criterion-related validity. In many cases, our findings replicated previous research on the psychometric properties of the measures; however, by administering all measures to the same individuals, we were able to directly compare these three inventories. This study therefore provides much needed comparative data for researchers interested in selecting an instrument best suited to their purposes.

Internal Consistency, Gender Differences, and Score Distributions

Each BAP measure evidenced satisfactory internal consistency for its total scale (Cronbach’s alphas >.70). The BAPQ exhibited internal consistency >.70 for all three of its subscales and the SRS-A exhibited values above this benchmark for four of its five subscales. In contrast, none of the individual subscales of the AQ were found to have alphas greater than .70. Thus, our findings are consistent with other work in suggesting modest internal consistency for the individual subscales of the AQ (e.g., Hurst et al. 2007a, b). Given the fact that the AQ and BAPQ scales are of similar length, the BAPQ within-scale items are clearly more homogeneous. Austin (2005) identified a three factor structure in the AQ whose internal reliabilities were somewhat improved over the original subscales. However, this is not necessarily surprising given that these scales had more items than the original scales, and even the revised scales have been found to have low internal consistency in some studies (e.g., Hurst et al. 2007b; Ingersoll 2010).

Consistent with previous theory and findings, males scored higher than females on the total scale for the SRS-A and each of its subscales, as well as on the total scale for the BAPQ and two of its three subscales (Aloof and Pragmatic Language). However, males scored higher than females on only one of the AQ subscales (Imagination), and no difference was found for the total AQ score. This finding was unexpected as the majority of studies of the AQ in the general population have suggested that males score higher than females, both on the overall score and the majority of the subscales (e.g., Baron-Cohen et al. 2001; Hoekstra et al. 2008; Stewart and Austin 2009).

All measures and their subscales showed a continuous distribution, consistent with the proposed distribution of autistic characteristics. The distribution of the total BAPQ and two of its three subscales was normal. The total SRS-A and all of its subscales, as well as the total AQ and three of its subscales were positively skewed and the total SRS-A and two of the AQ subscales were excessively kurtotic. Although it is common for measures of psychopathology to be skewed, previous research has suggested that the SRS (Constantino et al. 2006) and AQ (Hoekstra et al. 2008; Hurst et al. 2007b) are normally distributed in the general population. Thus, the fact that we failed to find a normal distribution of scores for these instruments is surprising. It is possible that the difference between the current and previous findings may have been due to our chosen alpha level (p < .01) for determining skew. Additional research is needed to determine whether the distribution of autistic traits in the general population is indeed normal.

Differences in the internal consistency and distribution of scores on the AQ across studies may be due to cultural differences in populations on which the AQ was used (Hurst et al. 2007a, b). For example, studies which have found higher internal consistency for the individual subscales as well as the expected sex differences were conducted in the UK (Baron-Cohen et al. 2001) or Japan (Wakabayashi et al. 2006). In contrast, studies reporting lower internal consistency and a lack of sex differences were conducted in the US (Hurst et al. 2007a; Hurst et al. 2007b). However, even the UK and Japanese studies reported alphas in the marginally acceptable range. Thus, there is mounting evidence that the scale reliabilities of the AQ are inferior to those of the SRS-A and BAPQ when used in the general population.

Factor Structure

The BAPQ was found to be superior to both the SRS-A and the AQ in terms of the replicability of its theoretical factor structure. Findings from an exploratory factor analysis were consistent with the BAPQ’s three-factor theoretical structure. Further, the three-factor solution obtained from our data corresponded well with the original scales and replicated across samples. These findings suggest that the theoretically-derived scales converge well with the empirical data and that the proposed factor structure should be retained.

Results of the extraction algorithms from an EFA of the AQ were ambiguous, but overall they suggested the interpretation of 3 factors. Although the AQ is proposed to have 5 factors, these factors explained limited variance in scale inter-correlations, and the point-biserial correlations between the factor pattern coefficients obtained from the combined sample EFA and binary codes reflecting item composition on the measure’s subscales were low. Furthermore, our interpretation of a 3-factor solution is consistent with other evaluations of the AQ using EFA (Austin 2005). This finding, along with low coefficient alphas for the subscales, suggests that the originally proposed structure of the AQ is unsatisfactory when used with non-clinical samples. Given evidence of a 3-factor solution, this measure could be improved by refining these factors rather than retaining the original five (Hurst et al. 2007b).
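The correspondence index described above can be illustrated with a short sketch. A point-biserial correlation is a Pearson correlation in which one variable is binary; here the continuous variable is an item's pattern coefficient on a factor and the binary variable codes whether the item belongs to the theoretical subscale. The function names and the six-item example are hypothetical, and details such as whether signed or absolute coefficients were used are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def factor_scale_correspondence(pattern_coefficients, on_subscale):
    """Point-biserial r between pattern coefficients and binary subscale codes."""
    codes = [1.0 if member else 0.0 for member in on_subscale]
    return pearson_r(pattern_coefficients, codes)


# Hypothetical factor: the first three items belong to the theoretical subscale.
coefficients = [0.70, 0.60, 0.65, 0.10, 0.05, 0.00]
membership = [True, True, True, False, False, False]
print(f"correspondence r = {factor_scale_correspondence(coefficients, membership):.3f}")
```

A value near 1 indicates that the items loading on an empirical factor are the same items assigned to the theoretical subscale; low values indicate that the empirical factors cut across the proposed subscales.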

Previous work on the SRS using principal components analysis has suggested that most of the items load onto a single factor that explains the majority of the variance. This finding, along with a high correlation between subscales, has led the measure’s authors to suggest that the SRS (and SRS-A) is best represented by a single underlying dimension of reciprocal social behavior (Constantino et al. 2003, 2004). Evidence regarding factor extraction in our sample was ambiguous for the SRS-A, but overall the results of an EFA on SRS-A items were inconsistent with this interpretation in that they suggested 3 or 4 factors. Furthermore, factors 2 through 4 showed sizeable pattern coefficients with a wide range of items and explained significant variance in item inter-correlations. However, these factors corresponded somewhat poorly to their theoretical item composition and did not replicate well across samples, suggesting structural problems for the SRS-A.

Criterion Validity

Bivariate and partial correlation analyses were used to examine the convergent and incremental validity of each of the BAP measures. Each measure correlated strongly with the other measures. Further, on average, the BAP measures correlated better with each other than with the related constructs. Although not surprising, this finding indicates that all of the BAP measures are tapping a similar construct. This finding is important given that research examining the convergent validity of self-report measures of the BAP is limited.

As predicted, each of the measures correlated with related constructs and incremented the others across a number of criterion variables; however, the AQ provided relatively less incremental validity than the other two. Thus, the SRS-A and BAPQ emerged as superior to the AQ in terms of criterion-related validity. This result may have to do with the poorer reliability of AQ scales.
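The logic of incremental validity can be made concrete with a first-order partial correlation, which expresses how strongly one BAP measure relates to a criterion after a competing measure has been partialled out. The sketch below is an illustration only: the function name and the correlation values are hypothetical, and it controls for a single competing measure, whereas the analyses in this study may have controlled for more than one.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x (measure) and y (criterion) controlling for z (other measure)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: measure-criterion r = .50, the two measures correlate .50,
# and the competing measure also correlates .50 with the criterion.
print(f"partial r = {partial_r(0.50, 0.50, 0.50):.3f}")
```

A measure shows incremental validity when this partial correlation remains meaningfully different from zero; the more strongly two measures overlap, the less room either has to increment the other.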

The SRS-A and BAPQ performed similarly in terms of their criterion validity. Nonetheless, it is informative to examine the specific patterns of incremental validity for the two measures. Both measures were closely related to social phobia and interpersonal problems, constructs which represent specific difficulties with social/interpersonal functioning. The SRS-A appears to be more closely related to OCD and depression, mood and anxiety constructs found to be higher in first- and second-degree relatives of individuals with ASD (Piven and Palmer 1999), and schizotypy, which shares a number of features with the BAP and may be difficult to distinguish from the BAP in a general population sample (Hurst et al. 2007a). In contrast, the BAPQ is more closely related to the personality traits of extraversion, agreeableness and neuroticism, which have been associated with the BAP (Murphy et al. 2000), as well as OCPD and low openness, which share key features of rigidity with the BAP. Thus, the BAPQ appears to be more closely related to personality constructs that relate to aloofness (introversion) and rigidity (low agreeableness and OCPD), characteristics that have been proposed as key components of the BAP (Hurley et al. 2007; Piven et al. 1997a, b). This correspondence is not surprising given that the BAPQ was validated against direct clinical assessment of the BAP, which defined the BAP, in part, based on aloof and rigid personality features on the Modified Personality Assessment Schedule—Revised (M-PAS-R; Hurley et al. 2007). Taken together, these results suggest that, while both the SRS-A and BAPQ show strong criterion validity in terms of their convergence with related constructs and their ability to increment the other measures in predicting those constructs, the BAPQ is more closely related to the defining features of the BAP, whereas the SRS-A is more closely related to peripheral features of the phenotype and may be indicative of psychopathology more generally.

Conclusions

There is evidence that all three instruments examined in this study can be used as reasonably valid self-report measures of the BAP in non-clinical samples of college students. However, the AQ had the weakest performance in this study; its scales were less reliable than those of the other measures, it did not show expected gender differences, its factor structure was ambiguous, and it had the weakest criterion validity (see Footnote 3). Thus, there are reasons to recommend the SRS-A and BAPQ over the AQ.

Moreover, these data suggest that the BAPQ is the best of the three measures we considered. For example, the BAPQ had adequate internal consistency, the expected distribution of scores and gender differences, an anticipated and replicable factor structure, and criterion and incremental validity. It was the only measure with a replicable factor structure, which furthermore corresponds to theories about the nature of the BAP. Indeed, the subscales of the BAPQ may be particularly and uniquely useful in providing information that is separate from and augmentative of the total score. In addition, the BAPQ is substantially shorter (36 items) than the other two instruments (AQ: 50 items; SRS-A: 65 items). Finally, the BAPQ was the only measure that was developed to assess the BAP, rather than ASD. As such, it has been found to accurately discriminate between individuals who do and do not express the BAP based on direct clinical assessment (Hurley et al. 2007). In addition, the fact that the BAPQ exhibited psychometric properties similar to those obtained in a parent sample suggests that this instrument can be used effectively in studies of the BAP in the general population as well as population-based genetic studies of ASD.

Limitations

There are several limitations to the current study. First, our study used only self-report measures of the BAP with adults. While the goal was to compare the psychometric properties of these self-report measures, there is some evidence that self-report of autistic-like symptoms by children may yield lower heritability estimates than parent- or teacher-reports (Ronald et al. 2008). The only study to compare self- versus informant-report of the BAP using the BAPQ found that both methods were valid; however, informant-report was slightly superior in terms of sensitivity and specificity (Hurley et al. 2007). Thus, a multi-method approach that uses both self- and informant-report instruments would likely yield a more valid measure of the BAP (Hurley et al. 2007). A related limitation is that we converted the SRS-A from an informant-report measure to a self-report measure by changing the wording to be consistent with the other measures. Although various studies of personality functioning have suggested that this type of modification yields similar findings in terms of validity, it is unknown how this adaptation may have affected the measure’s psychometric properties.

Another limitation is that we did not compare the measures to direct clinical assessment (i.e., interview/behavioral) of the BAP. Thus, we could not compare the instruments in their ability to accurately identify the BAP as defined by this standard. Further, we selected measures of psychopathology that were expected to relate to the BAP. Thus, we were not able to compare the ability of the measures to discriminate the BAP from other forms of psychopathology that would not be expected to share characteristics. Previous work has found that the BAPQ could discriminate between parents who met and did not meet criteria for the BAP based on direct clinical assessment, and that the AQ and SRS can discriminate individuals with ASD from those with other forms of psychopathology, suggesting that these measures hold promise in this regard. However, additional research is needed that can compare these measures to direct clinical assessment of the BAP and a range of psychopathology in order to further evaluate their convergent and discriminant validity. In particular, it is important to establish that poor social functioning associated with the BAP (e.g., failure to pick up on social cues) can be effectively discriminated from poor social functioning that is the result of other related conditions (Hurst et al. 2007b).

Finally, our sample consisted of predominantly white, middle-class college students in the United States; thus, the results may not generalize to individuals from other nations, ethnic groups, or SES brackets. Given the growing number of studies examining the BAP in non-selected, college-aged samples, we believe that we conducted an important first step in comparing measures. However, it is important to compare the instruments in more heterogeneous samples of individuals, including parents and adult siblings of individuals with ASD, who would be expected to be at a higher risk of expressing the BAP.

In sum, the present results converge with previous studies in suggesting that the BAP can be assessed with self-report inventories. Such measures can provide a quick and easy method for assessing autistic characteristics that would be useful in population-based studies of ASD. The BAPQ seems to be an especially promising instrument. Although there is room for improvement, the BAPQ is quick to administer, has adequate reliability and factor structure, and converges well with a number of related constructs. Further, its subscales may be useful in examining the heritability and correlates of individual dimensions of autistic traits in the general population.