Introduction

Autism is a neurodevelopmental variant often associated with intellectual disability, but at reported rates which vary widely, even across autistic cohorts born at similar times in the same country (e.g., 15 % in Williams et al. 2008, vs. 55 % in Baird et al. 2006). The DSM-5 requires an autism spectrum diagnosis to specify whether it is accompanied by intellectual disability, yet the text refers to autistics’ “(often uneven) intellectual profile” (APA 2013, p. 51), which suggests that assessing autistic intelligence is not necessarily straightforward. Indeed, findings that the measured intelligence of autistic individuals varies, sometimes dramatically, according to which instrument is used are among the most durable in the history of autism research, but also among the most overlooked as to their full implications.

Both Raven’s Progressive Matrices (RPM) and Wechsler scales are major instruments used to estimate human intelligence, yet they are strikingly different in how they are structured and administered. Wechsler involves the individual administration of several subtests, some of them culture-specific, which assess a limited number of specific abilities considered to reflect latent general abilities. RPM is a one-format 60-item matrix reasoning test which minimizes the need for task instructions, for culture- or experience-specific abilities, and for other specific abilities which may be important (e.g., fine motor or speech skills) but do not necessarily reflect a person’s intelligence, particularly if the person is atypical. While Wechsler scoring is to some degree subjective and dependent on how skilfully it is administered, this is not the case for RPM, an untimed test (in its standard, most commonly used version; Raven et al. 1998) where each item has only one correct response.

For the typical population, these two very different approaches to intelligence testing provide similar estimates of intellectual potential (see Mackintosh 2011, for a review). This does not, however, necessarily hold true for adults or children on the autism spectrum, whose RPM scores have been reported as significantly, and sometimes dramatically, higher than their Wechsler full-scale IQ (FSIQ) (Barbeau et al. 2013; Bölte et al. 2009; Dawson et al. 2007; Hayashi et al. 2008; Soulieres et al. 2009), including in a large population-based sample (Charman et al. 2011). Autistics’ Wechsler Verbal and Performance IQ (VIQ and PIQ) may, in addition, be significantly lower than their RPM scores (e.g., Dawson et al. 2007). A functional neuroimaging study, in which reaction time data were collected during a self-paced RPM task, also suggests important autistic advantages on this test: autistics answered as accurately but significantly faster than nonautistics, while recruiting perceptual resources to a larger extent than nonautistics on more difficult and complex RPM test items (Soulieres et al. 2009).

These findings merit attention for several related reasons. First, in the field of human intelligence, RPM has long held unique importance as a paradigmatic test representing the constructs of fluid intelligence (Flynn 2000; Mackintosh 2011) and general intelligence (Neisser 1998). Second, RPM is recognized as the most complex single test of intelligence (Snow et al. 1984). Third, RPM is not a simple, rote, low-level, or perceptual task; instead, RPM makes demands on a wide range of abilities, such as attentional control and high-level integration or abstract reasoning (Carpenter et al. 1990; Kane and Engle 2002; Unsworth and Engle 2005), which have long been presumed deficient if not absent in autism (see Dawson et al. 2007; Stevenson and Gernsbacher 2013, for reviews). Fourth, while RPM is often described as a “non-verbal” test, in the nonautistic population, verbal abilities are crucial in determining performance (e.g., Fox and Charness 2010). Thus, autistics’ RPM performance presents interesting challenges to commonly invoked theories of autistic limitations (e.g., “disordered complex information processing;” Au-Yeung et al. 2013, p. 84) and to the recurring premise that autism per se causes low intelligence (e.g., Vivanti et al. 2013).

Nevertheless, Wechsler scales, and similar IQ or developmental test batteries, remain dominant in autism research (Joseph 2011; Mottron 2004) and very likely in practice. This has affected and continues to affect how autistic individuals are matched for research and, moreover, how their potential and progress are assessed and predicted (Courchesne et al. 2012). In addition, while RPM has been remarkably durable and unchanging, this has not been the case for Wechsler: autistics are also affected when Wechsler tests change in both their content and structure, sometimes substantially, as new versions are created (Nader et al. 2012). The latest Wechsler Intelligence Scale for Children (WISC-IV; Wechsler 2003) introduces substantial changes from the previous WISC-III version (Wechsler 1991), at many levels. The long-standing Wechsler grouping of subtests into dichotomous PIQ and VIQ has been discarded; the four index scores have acquired greater importance and their structure has been changed; and several subtests have been revised or discarded while entirely new subtests have been added. See Table 1 for the WISC-IV subtests and structure, also described in Measures, below.

Table 1 WISC-IV subtests and index structure

Perhaps the most marked WISC-IV modification is the new Perceptual Reasoning Index (PRI), a major change from WISC-III PIQ or any of its indexes. PRI combines the well-known Block Design subtest, on which many autistics have a “peak of ability” (Caron et al. 2006; Happe 1994; Shah and Frith 1993), with two completely new subtests, Picture Concepts and Matrix Reasoning, which differ from WISC-III PIQ subtests in being untimed and less dependent on motor abilities. Furthermore, the new Matrix Reasoning subtest resembles some aspects of RPM. Interestingly, previous studies describing the cognitive profile of autistic individuals on WISC-IV suggest a strength on the Matrix Reasoning subtest (Mayes and Calhoun 2008; Oliveras-Rentas et al. 2012) and that WISC-IV PRI is better suited to capture autistics’ visual and abstract reasoning abilities than is WISC-III PIQ.

Thus, in view of fundamental changes to plausibly autism-relevant aspects of WISC-IV, our aim was to compare WISC-IV scores (FSIQ and all indexes) to RPM scores in both autistic and nonautistic children. We also aimed to compare autistic and non-autistic children’s WISC-IV index profiles and their relative performance on the new PRI.

Methods

Participants

All participant data came from the research database of the Autism Specialized Clinic at Riviere des Prairies Hospital (Montreal, Canada). Informed consent from parents was provided for all study participants; assent was also obtained from the children. The study was formally approved by the ethics committee of Riviere des Prairies Hospital.

Autistic Children

Previous research found different patterns of cognitive abilities in autism spectrum individuals subgrouped according to speech development anomalies (i.e., in autistic vs. Asperger subgroups) (Barbeau et al. 2013; Soulières et al. 2011). For this reason, and for comparability with previous studies, we chose to limit this study to autism spectrum children characterized by speech delays and/or other speech development anomalies. It is also important to note that intellectual disability is more often assumed to characterize this specific subgroup of autism spectrum children, who have the specific diagnosis of autism and meet DSM-IV “Autistic Disorder” criteria, than other autism spectrum subgroups (e.g., DSM-IV Asperger’s or PDD-NOS).

Thus, we retrieved data from all children who met criteria for the specific diagnosis of autism and had completed both WISC-IV and RPM. From this sample, autistic children who had a known, diagnosable genetic syndrome or additional neurological condition were excluded, leaving a non-syndromic or idiopathic autism group, that is, autism without a known primary and/or possibly confounding major co-occurring condition. The resulting sample consisted of 25 autistic children (24 males, 1 female) aged 6–16 years (see Table 2). While by definition they presented with speech onset delays and/or atypicalities in their development, all 25 autistic children were able to complete the 10 main subtests comprising WISC-IV FSIQ and its indexes.

Table 2 Participant descriptive data

Autism diagnosis was based on DSM-IV-TR criteria (APA 2000). The diagnostic process combined expert interdisciplinary clinical judgment with two gold standard research diagnostic instruments: Autism Diagnostic Interview-Revised (ADI-R; Lord et al. 1994) and Autism Diagnostic Observation Schedule-General (ADOS-G; Lord et al. 1999). As noted above, autism diagnosis was limited to children with speech delay (first single words after 24 months and/or first phrases after 36 months) and/or other speech atypicalities during their development (echolalia, stereotyped language, or pronoun reversal). Children on the autism spectrum but without speech onset delays or atypicalities (e.g., children with a clinical diagnosis of Asperger syndrome) were excluded from the present study.

Among the 25 autistic children, 18 were diagnosed using both ADI-R and ADOS-G, one was diagnosed using ADI-R only, three using ADOS-G only, and three using direct observation based on the ADOS-G procedure. All ADI-R and ADOS-G scores for the autistic children were above the established cut-offs for “autism spectrum disorder” with a majority (17/21) scoring at least 10 for “autism” on the ADOS. Diagnostic evaluations were performed by an interdisciplinary team specialized in the field at the Autism Clinic in Riviere des Prairies Hospital.

Typical (Non-autistic) Children

For the control group, we retrieved data from all typical children who had completed both WISC-IV and RPM. The control group comprised 22 typically developing, non-autistic children (16 males, 6 females) between 6 and 15 years of age (Table 2). Typical child participants selected from the database were all recruited within the community. Using a semi-structured interview, participants with a personal or family history of psychiatric, neurological or other medical conditions affecting brain development were identified and their data were excluded. All included typical children had a typical academic background.

The two groups were matched on age at WISC-IV (p = .422). However, RPM was not always administered at the same time as WISC-IV, so the groups were not matched on age at RPM (p = .025). Furthermore, age differed significantly between the two tests only for autistic children (p = .007). While most control participants (20/22) came to the lab and performed both tests on the same visit, many autistic participants (11/25) had previously been evaluated with WISC-IV at the clinic and performed the RPM subsequently, when they came to participate in another study.

Measures

Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV)

WISC is an individually administered test battery that assesses intelligence in school-aged children (6–16 years, 11 months). The 4th edition (Wechsler 2003) comprises 10 core subtests, yielding four index scores that combine into one FSIQ. The full names, organization, and structure of WISC-IV subtests and index scores are shown in Table 1. The Canadian version of WISC-IV was used along with Canadian norms.

WISC-IV Verbal Comprehension Index (VCI) has some resemblance to VIQ in WISC-III, combining three of its five VIQ subtests (Similarities, Comprehension and Vocabulary). As noted above, in contrast, the new WISC-IV PRI departs substantially from WISC-III PIQ, in which four subtests were timed. The PRI includes the familiar timed Block Design subtest from WISC-III PIQ, but adds two new subtests, Matrix Reasoning and Picture Concepts, neither of which is timed. The Working Memory Index (WMI: Digit Span and Letter–Number Sequencing) and Processing Speed Index (PSI: Coding and Symbol Search) are the third and fourth distinct indexes which together comprise WISC-IV FSIQ.

Raven’s Progressive Matrices

The standard version of RPM was administered to all participants, with no time limit, as recommended in the major manual (Raven et al. 1998), from which norms were obtained. RPM is composed of 60 items divided into 5 sets organized such that difficulty and complexity increase both within and across sets. All items are designed similarly; that is, they share the same format. A matrix of geometric designs is presented with the final cell of the matrix left blank; the participant has to choose which of six or eight alternative presented solutions best completes the matrix.

The latest RPM norms for U.S. children date from 1998, with wide percentile bands that are relatively, and intentionally, less precise than the scoring for Wechsler scales. However, RPM percentiles for both groups were derived from raw scores using the same norms, and it can therefore be hypothesized that these norm limitations affected both groups similarly.

Procedure

Children completed WISC-IV and RPM at the same time (14 of 25 autistics; 20 of 22 nonautistics), or at two different times (11 of 25 autistics; 2 of 22 nonautistics; see detailed information in “Participants” section). In both groups, the tests were administered individually by neuropsychologists unaware of this study and its hypotheses. For the majority of all participants (44/47), WISC-IV was administered first and RPM second but, as noted above, the gap between the two tests varied across participants. Therefore, further analysis was conducted with a subgroup of children for whom a year or less separated the two administrations (see “Results” section).

Data Analysis Strategies

First, within each group, repeated-measures analysis of variance (ANOVA) assessed discrepancies across WISC-IV index and full-scale standard scores.
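As an illustration of this first step, a one-way repeated-measures ANOVA can be sketched in a few lines. This is a minimal sketch, not the SPSS procedure actually used in the study: the function name `rm_anova_oneway` and the toy scores are hypothetical, no sphericity correction is applied, and the effect size is computed as partial eta squared, which is an assumption about the variant reported here.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per participant, each row holding that participant's
    score on every repeated measure (e.g., every index score).
    Returns (F, df_effect, df_error, partial_eta_squared).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of repeated measures
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition total variability into measure, participant, and residual parts.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    partial_eta_sq = ss_cond / (ss_cond + ss_error)
    return f_value, df_effect, df_error, partial_eta_sq


# Made-up scores for three participants on three measures (not study data).
toy = [[1, 2, 3], [2, 3, 7], [3, 4, 5]]
f_value, df1, df2, eta_sq = rm_anova_oneway(toy)
print(f"F({df1}, {df2}) = {f_value:.2f}, partial eta squared = {eta_sq:.3f}")
```

In practice each row would hold one child's index and full-scale standard scores, and follow-up pairwise comparisons would identify which scores differ.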

Second, RPM versus full-scale WISC-IV performance was compared in the two groups by inspecting percentiles derived from mean standard scores (WISC-IV) and from mean raw scores and chronological ages (RPM) for each group.

Third, individual RPM and full-scale WISC percentiles were plotted for each group, and statistical comparisons were conducted using individual percentiles. Parametric (t tests) and non-parametric (Mann–Whitney U) analyses were conducted to compare groups on full-scale WISC-IV versus RPM. Similar analysis strategies (ANOVAs and Wilcoxon signed-rank tests) were used to compare WISC-IV indexes and RPM within each group.
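The between-group non-parametric comparison can be illustrated with a minimal Mann–Whitney U sketch. It relies on the normal approximation without a tie correction, which is a simplification; the helper names `average_ranks` and `mann_whitney_u` and the example values are hypothetical and do not reproduce the SPSS output of the study.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, z, two-sided p) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, z, p


# Made-up percentile scores for two small groups (not study data).
u, z, p = mann_whitney_u([15, 22, 30, 41], [35, 48, 60, 72])
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

With samples as small as the toy example, an exact test would normally be preferred over the normal approximation.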

All procedures and strategies yielded similar results. Thus, when both parametric and non-parametric strategies were used, only the non-parametric analysis is presented. Data were analyzed with SPSS 19.0, with a significance level of p ≤ .05 (corrected for multiple comparisons with a Bonferroni correction, where applicable). Effect sizes are reported as eta squared for analyses of variance and as Pearson’s r for non-parametric comparisons.
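To make the within-group comparisons concrete, the following minimal sketch runs a paired Wilcoxon signed-rank test, converts the resulting z into an effect size r, and applies a Bonferroni threshold. Several points are assumptions rather than details given in the text: r is taken as |z| divided by the square root of the number of non-zero pairs (conventions differ on which N to use), the normal approximation is used without a tie correction, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(first, second):
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns (z, two-sided p, r), with r = |z| / sqrt(number of non-zero pairs).
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    n = len(diffs)
    abs_d = [abs(d) for d in diffs]
    # Mid-rank of each absolute difference (tied values share their mean rank).
    ranks = [
        sum(v < x for v in abs_d) + (sum(v == x for v in abs_d) + 1) / 2
        for x in abs_d
    ]
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (w_plus - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, abs(z) / math.sqrt(n)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


# Made-up paired percentiles for five children (not study data).
battery = [10, 20, 30, 40, 50]
matrices = [12, 25, 31, 47, 60]
z, p, r = wilcoxon_signed_rank(battery, matrices)
threshold = bonferroni_alpha(0.05, 5)  # hypothetical family of five comparisons
print(f"z = {z:.2f}, p = {p:.3f}, r = {r:.2f}, significant: {p <= threshold}")
```

A positive z here means that the second score of each pair tends to be the higher one.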

Results

WISC-IV Index Profiles

The autistic children produced a distinctive profile with significant discrepancies between indexes (p < .001, η² = .277). Mean FSIQ (87.8, SD 15.7), VCI (83.6, SD 16.8), WMI (87.0, SD 20.5), and PSI (87.8, SD 13.8) were at similar levels. However, their PRI (101.5, SD 17.5) was significantly higher than their FSIQ, VCI, WMI, and PSI (all p < .05).

The typical children also showed discrepancies between indexes but with a different profile (p < .001, η² = .257). Their mean FSIQ was 110.3 (SD 14.8), with similar PRI (110.7; SD 12.4), VCI (112.8; SD 18.3) and PSI (105.3; SD 13.2) scores. Their WMI (99.9; SD 11.9) score was significantly lower than their FSIQ (p < .05), PRI (p < .001) and VCI (p < .05). See Table 2 for WISC-IV FSIQ and index scores in both groups.

RPM Versus WISC-IV: Group Percentile Comparison

The autistic child group’s RPM score was at the 60th percentile, compared to WISC-IV FSIQ at the 21st percentile, which represents a discrepancy of 39 percentile points between tests. For the typical child group, RPM score was at the 73rd percentile, very close to their full-scale WISC-IV score, at the 75th percentile. See Fig. 1, which also shows group WISC-IV index scores in percentiles.
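The WISC-IV group percentiles can be approximated from the mean standard scores under a normal model. This is a minimal sketch assuming the usual Wechsler composite scaling (mean 100, SD 15); the study presumably relied on published norm tables, so the function name `standard_score_to_percentile` and the normal-curve shortcut are illustrative assumptions. RPM percentiles come from age-based norm tables and cannot be reproduced this way.

```python
from statistics import NormalDist

# Wechsler composite scores are scaled to a mean of 100 and an SD of 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def standard_score_to_percentile(score):
    """Percentile rank of a standard score under a normal model."""
    return 100 * IQ_SCALE.cdf(score)


# Group mean FSIQ values as reported in the WISC-IV Index Profiles section.
for label, mean_fsiq in [("autistic", 87.8), ("typical", 110.3)]:
    print(label, round(standard_score_to_percentile(mean_fsiq)))
```

Rounded to whole percentiles, this gives 21 for the autistic group and 75 for the typical group, matching the FSIQ percentiles reported above.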

Fig. 1 For each group, performance in percentiles on WISC-IV FSIQ, the 4 WISC-IV indexes, and RPM

RPM Versus WISC-IV: Individual Percentiles

Individual scores on RPM and full-scale WISC-IV are shown for autistic and typical children in Fig. 2.

Fig. 2 Distribution of individual scores on full-scale WISC-IV and RPM in percentiles for (a) autistic and (b) non-autistic (typical) children. Data points to the left of the lower diagonal lines represent participants whose RPM scores were greater than their full-scale WISC-IV scores; data points to the left of the top diagonal lines represent participants whose RPM scores were more than 50 percentile points greater than their full-scale WISC-IV scores

While no autistic child scored over the 90th percentile on WISC-IV, more than one-fourth (28 %) did on RPM; and while only 28 % of the autistic children scored at or above the 50th percentile on the WISC-IV, nearly two-thirds (64 %) of the group did so on RPM. Four (16 %) of the autistic children would be judged as intellectually disabled according to their WISC-IV scores, but none would be so judged on RPM. Indeed, all autistic children scoring in the WISC-IV intellectual disability range performed at the 10th percentile or higher on RPM. Finally, only 20 % of the autistic children showed a discrepancy of 10 percentiles or less between the two tests, while 20 % displayed a discrepancy of 50 percentile points or more, their RPM scores being higher. In contrast, the majority (64 %) of the typical children showed an under-10 percentile point difference between the two tests, and only one typical child achieved a discrepancy greater than 50 percentile points between full-scale WISC-IV and RPM scores.

In statistical comparisons, Wilcoxon signed-rank tests indicated that the autistic children’s RPM scores were significantly higher than their WISC-IV full-scale IQ (p < .0005; r = .611), VCI (p < .0005; r = .618), WMI (p < .005; r = .438) and PSI (p < .0005; r = .605) scores. The only WISC-IV index score that did not significantly differ from their RPM score was the PRI (p = .14). Unlike the autistic children, the typically developing children did not show discrepancies between RPM and WISC-IV FSIQ, VCI, and PRI scores, all p > .05. However, their WMI and PSI scores were significantly lower than their RPM scores (both p < .05).

A Group by Test (WISC-IV FSIQ vs. RPM) ANOVA revealed a significant interaction, Wilks’ Lambda = .71, F(1, 45) = 18.5, p < .0005, η² = .29, indicating that the difference between the two tests was significantly greater in the autistic group than in the typical group.
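The reported interaction statistics are mutually consistent, as a short check shows. For an effect with a single numerator degree of freedom, partial eta squared can be recovered from F and its degrees of freedom, and Wilks' Lambda equals one minus that value. The function names below are hypothetical, and treating the reported effect size as partial eta squared is an assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def wilks_lambda_single_df(f_value, df_error):
    """Wilks' Lambda for an effect with one numerator degree of freedom."""
    return df_error / (df_error + f_value)


# Values reported for the Group by Test interaction: F(1, 45) = 18.5.
eta_sq = partial_eta_squared(18.5, 1, 45)
lam = wilks_lambda_single_df(18.5, 45)
print(round(eta_sq, 2), round(lam, 2))
```

Both values round to the figures given in the text (.29 and .71).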

Finally, a comparison between the autistic and typical children’s scores revealed that while their WISC-IV FSIQ scores differed significantly (p < .0005), their RPM scores did not (p = .56).

Further Analyses

In order to ensure that the time gap between the two tests did not have an effect on the results, further analyses were performed without children for whom more than a year separated the two administrations. This led to a subgroup of 18 autistic children with no difference between age at WISC-IV and age at RPM (p = .331). All analyses presented above were conducted for this subgroup and led to the same results. In addition, to determine whether sex had an influence on the results, all analyses were repeated with all females excluded, i.e., with a subgroup of 24 autistic males and 16 nonautistic males. Again the results were the same.

Discussion

Our findings provide early evidence that the latest version of Wechsler for children, WISC-IV, may underestimate the intelligence of children whose specific DSM-IV-based diagnosis is autism, that is, autism spectrum children with speech onset delays and/or other speech development anomalies. Autistic children achieved significantly higher RPM than WISC-IV full-scale scores, a discrepancy not found in typical control children. This result is consistent with and adds to previous findings (e.g., Barbeau et al. 2013, which includes an overview) suggesting that RPM may better represent autistic intelligence than do Wechsler scales.

Because RPM is a complex test of general and fluid intelligence (Neisser 1998), our current findings, combined with previous related results, challenge the recurring view that autism is incompatible with the development of genuine human intelligence (Hobson 2002; Vivanti et al. 2013). Similarly, our and previous related findings challenge the still-common view that autistic strengths are limited to rote memory, isolated “islets” of ability (Happe 1999), or other simple low-level skills (Just et al. 2004, 2012). Instead, complex reasoning and novel problem-solving abilities may be important in autism. Accurately assessing these abilities, across the lifespan, should be a priority.

Our results also provide preliminary evidence, within a limited range of scores, suggesting that WISC-IV PRI better estimates autistic intelligence than both WISC-IV FSIQ and WISC-III PIQ (e.g., Dawson et al. 2007; Nader et al. 2012). PRI subtests to some extent limit demands on typicality (e.g., the requirement to typically process or produce speech; typical motor abilities; typical range of knowledge or experience), while concentrating on visuospatial and abstract reasoning abilities central to the concepts of fluid and general intelligence. Possibly because of this (Caron et al. 2006; Jones et al. 2009; Samson et al. 2012; Stevenson and Gernsbacher 2013), in autistics who can perform all WISC-IV subtests, PRI may better reflect their potential than the other WISC-IV index scores or WISC-IV FSIQ.

However, to assess autistics’ potential, RPM remains a better choice, due to a larger body of evidence encompassing a wide variety of autistics (see Introduction for a review). This includes Asperger adults and children, who achieved RPM scores at the same level as their Wechsler peaks of ability (Soulières et al. 2011). This also includes pilot data from school-aged autistic children who have very little or no speech: most could not complete any WISC-IV subtest yet most could perform Raven’s Colored Progressive Matrices (RPM, board form) and achieved scores in the normal range of intelligence (Courchesne et al. 2012). Indeed, the current study adds to existing data suggesting caution when interpreting autistic results on Wechsler scales or similar test batteries, which may disadvantage autistics for reasons unrelated to their intellectual potential. If such test batteries are used, RPM could and perhaps should be used as a complementary test to facilitate interpretation of results.

There is now accumulating evidence that RPM may have specific properties which make it particularly important for assessing autistics, as suggested by their ability, not found in typical individuals, to efficiently engage perceptual or low-level resources in high-level tasks (Heaton et al. 2008; Koshino et al. 2005; Mottron et al. 2006; Soulieres et al. 2009). It may be especially important that individuals are free to solve RPM items how they wish, that explicit instructions are minimal or absent in RPM administration, that each RPM item contains all necessary information to arrive at (and verify) a solution, and that each RPM item can be solved in multiple different ways. RPM thus encompasses the kind of multi-level density or redundancy of complex information found in areas in which autistics are known to spontaneously excel (e.g., music, 3-D drawing), given the opportunity (e.g., Miller 1989; Pring 2008; Heaton 2009; Boso et al. 2013; see also Mottron et al. 2009, 2013, for a review). At the same time, RPM plausibly carries much less requirement for “typicality” in cognitive processes than do Wechsler and similar tests. Finally, consistently found discrepancies between autistics’ RPM and Wechsler performance suggest “hidden” potential (e.g., for resourcefulness and learning; Courchesne et al. 2012) that should be encouraged, rather than discouraged, in educational and other approaches to autism.