The DSM-5 diagnosis of Autism Spectrum Disorder now requires specifying whether or not there is an associated intellectual impairment (American Psychiatric Association 2013). Diagnostic assessment often takes place during the preschool years, an age at which intellectual assessment is particularly challenging (Akshoomoff 2006). The decreased testability of children on the Autism Spectrum (AS) has an important effect on test results (Eagle 2003). In their review, Filipek et al. (1999) insisted on the importance of test choice and administration with autistic children, particularly when they are young, non-verbal or “low functioning”. Indeed, because of the behavioural challenges they present, these autistic children are often considered “untestable” and may therefore be wrongly judged as intellectually impaired (Eagle 2003).

Another challenge regarding intellectual assessment in autistic preschoolers is the few IQ tests available and suited for this population. For example, the Bayley Scales of Infant and Toddler Development (Bayley and Reuner 2006) is largely used in young children, but it was shown to be unpredictive of later non-verbal IQ when used with autistic children before 4 years of age (Lord and Schopler 1989), and it does not include a separate subscale for non-verbal abilities. This latter characteristic makes it unfit for non- or minimally-verbal autistic children, which represent the majority of autistic children in preschool years (Pickett et al. 2009; Wodka et al. 2013). One of the most widely used cognitive tests in preschool autistic children is the Mullen Scales of Early Learning (MSEL) (Filipek et al. 1999; Swineford et al. 2015). The validity of the MSEL was recently studied in a sample of autistic children with the conclusion that the MSEL has a good convergent and divergent validity (Swineford et al. 2015). However, in this study the MSEL subscales were compared with items from different diagnosis assessment tools or language tests (Swineford et al. 2015) and were not compared to any other IQ or developmental test. To our knowledge, only two studies did compare the MSEL to another IQ test (the Differential Ability Scale) in AS children (Bishop et al. 2011; Farmer et al. 2016). While performance on MSEL was strongly correlated to that on Differential Ability Scale, the performance of autistic children was significantly lower on the MSEL (Farmer et al. 2016). Akshoomoff (2006) demonstrated that autistic children show more off-task behaviours when assessed with the MSEL, which can, in turn, have an impact on their level of engagement. 
Despite the fact that very few studies have specifically investigated the use of the MSEL in autism, and that little is therefore known about the intellectual profile of autistic children on that test, it is often the only IQ measure used in studies focussing on autistic preschoolers (e.g. Dawson et al. 2010; Zwaigenbaum et al. 2005).

A few other IQ tests are available at preschool age and could be suited to the AS population. However, research is sparse in documenting the cognitive profile on different tests at that age. Indeed, heterogeneous performance within IQ tests is known to characterize the profile of both autistic children and adults (Akshoomoff 2006; Harris et al. 1991; Mayes and Calhoun 2008; Nader et al. 2015; Oliveras-Rentas et al. 2012; Soulières et al. 2011) and important discrepancies between tests are also well documented in the autism spectrum (Baum et al. 2014; Bölte et al. 2009; Dawson et al. 2007; Grondhuis and Mulick 2013; Hayashi et al. 2008; Sahyoun et al. 2009; Shah and Holmes 1985), but little is known on the intellectual profile during the preschool period. The few studies that investigated the cognitive profile of autistic preschoolers showed that the discrepancy between verbal and non-verbal abilities in favor of the latter is not only also present during preschool years, but that it might even be of greater amplitude early in life, as verbal abilities tend to increase with age in autism (Harris et al. 1991; Mayes and Calhoun 2003a, b). These studies however did not administer different tests to the same child, which only allows an analysis of the within-test cognitive profile. Given the known discrepancies not only among, but also between IQ tests in the AS, it would be important to further document AS preschoolers’ profile on different IQ tests.

We previously showed, in a sample of minimally verbal school-aged autistic children, that conventional or standardized assessment is not well suited to this population (Courchesne et al. 2015). In that study, we documented the performance of minimally verbal school-aged autistic children on a strength-informed assessment, that is, an assessment using non-verbal visual reasoning tests on which autistic individuals typically perform well. None of the participants were testable with a conventional assessment such as the Wechsler scales, but the vast majority of the children were able to complete the strength-informed assessment. This strength-informed assessment included the board form version of the Raven’s Coloured Progressive Matrices (a non-verbal intelligence test) and other potential indicators of intellectual level in autism, namely visual cognitive tasks. The chosen visual tasks were a Visual Search task and the Children Embedded Figures Test (CEFT) (Karp and Konstadt 1963).

In this previous study, the inclusion of visual tasks was motivated by the fact that perceptual abilities are historically linked to the study of intelligence (for a historical overview, see Deary et al. 2004; Mackintosh 2011) and were shown to be correlated with intellectual abilities in both autistic and non-autistic children and adults (Barbeau et al. 2013; Deary et al. 2004; Hill et al. 2011; Meilleur et al. 2014; Wallace et al. 2009). Furthermore, autistic individuals were shown to have superior abilities in various visual tasks (Jarrold et al. 2005; Kaldy et al. 2016; O’Riordan 2004; Perreault et al. 2011; Schlooz and Hulstijn 2014; Soulières et al. 2011), and many of these tasks are fast and simple to administer compared to conventional IQ tests.

Little is known about visual perceptual abilities in autistic preschoolers, but at least a few studies suggest that perception may also be superior in this age group. Indeed, Kaldy et al. (2011) showed enhanced visual search in 1- to 3-year-old autistic children. Morgan et al. (2003) showed enhanced performance on the Preschool Embedded Figures Test and on a Pattern Construction task in a group of autistic children aged 3 to 5 years. Pellicano et al. (2006) obtained similar results on these same two tasks in a sample of autistic children aged 4 to 7 years. Furthermore, other research findings, such as a preference for geometric forms in autistic toddlers (Pierce et al. 2011) or a faster response in an attention cueing task in 2-year-old autistic children (Chawarska et al. 2003), might indicate enhanced perceptual processes very early in autistic development.

The objective of the present study was therefore to replicate and extend the results of our previous study (Courchesne et al. 2015) to preschool-aged autistic children. We first aimed at documenting testability in this sample of very young autistic children, that is, at the moment of diagnosis. Is a cognitive assessment feasible at this age, and with which instruments? How can we maximize testability? To do so, we compared five different tests and experimental tasks (conventional and strength-informed) in preschool-aged AS and typically developing (TD) children. We hypothesized that a lower proportion of AS children would be able to complete the cognitive assessment. Our second aim was to document and compare the intellectual profile of these children on conventional versus strength-informed tools. We hypothesized that AS children would have a better performance in strength-informed tools and that their profile on conventional tests would be characterized by higher scores in domains not requiring language, while TD children would have a more homogeneous profile within and across tools. We also aimed at documenting the associations between conventional and strength-informed test performance.

Method

Participants

Fifty-two autistic children and fifty-four typically developing children aged from 31 to 77 months at the time of testing were assessed in this study (see Table 1). All children who received an AS diagnosis at the Rivière-des-Prairies Hospital AS clinic during the study period, and who had no known genetic condition, were solicited to be part of a research participant database. All those who signed the consent form to be included in the database, and who matched the age criteria, were solicited to participate in the present study. There were no other exclusion criteria for that group. The autistic children received an Autism Spectrum Disorder (ASD) diagnosis based on gold standard instruments and clinical expert judgment. Fourteen were assessed with both the ADOS (Lord et al. 2000) and the ADI-R (Lord et al. 1994), while 38 were assessed with the ADOS only. The TD children were recruited in the community through announcements in childcare centers and preschools in the Greater Montreal area. TD children were screened through an interview to ensure that neither they nor their first-degree relatives presented, or had previously presented, any developmental or neurological disorder. Informed consent was obtained for all participants prior to the beginning of the study. The study was approved by the Research and Ethic Committee of Rivière-des-Prairies Hospital.

Table 1 Participant characteristics and ratio of participants who completed each test

The level of language of the autistic children in our sample was estimated using the Expressive Language subscale of the Vineland Adaptive Behavior Scales, second edition (VABS; Sparrow et al. 2005), through an interview with parents. This type of assessment was chosen in order to access the language capacity of the child at home, in natural and optimal conditions, as well as to minimize the length of direct testing with the child. Asking the parents provided an estimate of how much language their child was able to understand and produce. Twenty-eight autistic children (54%) had scores that placed them below the 2nd percentile, seven (13.5%) had scores between the 2nd and 8th percentile, seven (13.5%) had scores between the 9th and 24th percentile, and seven (13.5%) had scores in the average range, i.e. between the 25th and 75th percentile. We were unable to reach the parents to complete the assessment of three (5.5%) participants. These data are provided for informational purposes, as the level of expressive language was not an exclusion criterion.

Conventional Tools

Mullen Scales of Early Learning (MSEL) (Mullen 1995)

The MSEL is a measure of cognitive and motor abilities that is widely used with young children since it is normed from birth to 68 months. The test is composed of five subscales: Gross Motor (normed only for children from birth to 33 months), Visual Reception, Fine Motor, Receptive Language and Expressive Language.

Wechsler Preschool and Primary Scales of Intelligence (WPPSI-IV) (Wechsler 2012)

The Wechsler scales figure among the most widely used intelligence tests in both clinical and research settings (Neisser et al. 1996). The fourth edition of the preschool and primary version (WPPSI-IV) is normed from 2 years 6 months to 7 years 7 months. There is one version for children younger than 4 years and one version for children aged 4 years and older. They respectively include 5 (Receptive Vocabulary, Information, Block Design, Object Assembly, Picture Memory) and 6 (Information, Similarities, Block Design, Matrix Reasoning, Picture Memory, Bug Search) mandatory subtests that allow computing a Full-Scale IQ.

Strength-informed Tools

The strength-informed tools were also selected based on our previous study (Courchesne et al. 2015). The adaptations made in that first study, in order to minimize the requirement for language production or comprehension and to avoid the need to point to respond, were also deemed appropriate for preschoolers (and are considered part of what we call flexible testing; see below).

Raven’s Colored Progressive Matrices (board form) (RCPM) (Raven et al. 1998)

Although classified among the strength-informed tools in the present paper, the RCPM is also included when we refer to “intelligence tests” and in the “cognitive profile” results. The RCPM is an IQ test that uses non-verbal material, is relatively independent of culture and measures fluid intelligence (Neisser et al. 1996). It is composed of three sets of 12 matrices of increasing difficulty. The child has to choose, among 6 pieces, the one that best completes the matrix. The RCPM has norms from the Netherlands, dating from 1982, for children as young as 3 years 9 months and up to 10 years 2 months. For the present paper, the RCPM was administered to all participants regardless of their chronological age. Raw scores were used in analyses when possible since groups were matched on age. In the analyses for which the use of percentiles was necessary, the Netherlands norms were used for all participants, which led to the exclusion of children younger than 3 years 9 months at the time of RCPM administration. The number of excluded participants is mentioned prior to each analysis. The RCPM was used in previous studies with autistic children and was shown to significantly improve testability and performance in minimally verbal autistic children, compared to other tests such as the Wechsler matrix subtest (Courchesne et al. 2015). In the board form of the RCPM (available from Pearson Clinical Assessment Canada), there is an actual hole in the 2 × 2 matrix and the child can pick the piece of his choice among the 6 pieces placed underneath the matrix and manually put it in the hole. This puzzle-like version is known to be better suited to children with intellectual disabilities (Down syndrome or idiopathic intellectual deficiency) or Autism Spectrum Disorder (Bello et al. 2008).

Visual Search Task

The visual search task consists of finding a target letter among distracters. It was adapted from the computerized version of the visual search task in O’Riordan et al. (2001) and was similar to the task used in Courchesne et al. (2015). In order to minimize verbal instructions, “absent” trials were withdrawn and the target letter was printed and given to the child prior to each trial. Three different letters were successively used as targets and were embedded among 5, 15, 25, 50 or 75 distracters; the trials with 50 and 75 distracters were added to the task used in Courchesne et al. (2015). There were two conditions. In the Feature condition, the target letter differed from distracters in shape (e.g. a red S hidden among red Ts and green Xs). In the Conjunction condition, the target shared color with some of the distracters and shape with others, so that only the conjunction of attributes defined the target (e.g. a red X hidden among red Ts and green Xs). Each combination of number of distracters (5) and condition (2) was presented six times, for a total of 60 trials. The targets and distracters spanned approximately 1.8 × 2.7 cm each. The targets were printed on thick plasticized cardboard (3 × 2.4 cm). The sets of distracters with the embedded target letter were printed on 28 × 21.5 cm plasticized sheets presented on the table in front of the child after giving him the appropriate target. The time needed to place the letter on the target and the number of successful trials were recorded.
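The factorial structure of the task can be made explicit with a short sketch. The set sizes, conditions and number of repetitions come from the description above; the variable and function names are illustrative, and the assignment of the three target letters to trials is not modelled because it is not specified.

```python
from itertools import product

# Design values taken from the task description; names are illustrative only.
SET_SIZES = (5, 15, 25, 50, 75)           # number of distracters per display
CONDITIONS = ("feature", "conjunction")   # the two search conditions
REPETITIONS = 6                           # presentations of each combination


def build_trials():
    """Return one dict per trial, crossing condition x set size x repetition."""
    return [
        {"condition": cond, "n_distracters": size, "repetition": rep}
        for cond, size, rep in product(
            CONDITIONS, SET_SIZES, range(1, REPETITIONS + 1)
        )
    ]


trials = build_trials()
print(len(trials))  # 2 conditions x 5 set sizes x 6 repetitions
```

Crossing the three factors this way yields the 60 trials reported for the task, 30 in each condition.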

Children Embedded Figures Test (CEFT) (Karp and Konstadt 1963)

The CEFT consists of finding a target figure hidden within a larger meaningful line drawing. There are two different targets, a triangle and a house. The CEFT is composed of 14 practice trials and 25 test trials. In order to minimize verbal instructions, the instruction not to turn the target, which is normally part of the test, was removed for all participants. Also, the targets were cut out of thick cardboard and the appropriate target was given to the child prior to each trial. The time required to place the target on the hidden figure was recorded and the number of successful trials was computed.

Procedure

The MSEL was generally administered in the first assessment session since it was also used as part of another study that was done prior to the beginning of this study. The order of administration of the other tests was counterbalanced. All the tests were administered by trained graduate students in clinical psychology in a room designed for the assessment of young children at Rivière-des-Prairies Hospital. The length of each assessment session depended on the child’s attentional capacity, but generally lasted about an hour to an hour and a half, including many short breaks. In order to obtain the maximal performance of the child, the items that were administered at the end of a session, and for which attention level was deemed not optimal, were re-administered at the beginning of the next session.

The tests were administered using what we refer to as flexible testing, a combination of adaptations made in order to maximise the testability of both autistic and typical children, which was used with both conventional and strength-informed tools. Some of these adaptations were also implemented in Courchesne et al. (2015), although they were not referred to as flexible testing. Flexible testing included, but was not limited to, administering the items of one subscale until the child was inattentive for one reason or another, and switching tests and subscales as often as necessary during one session. Also, we used what the child was spontaneously doing as a means of assessing his abilities (e.g. if the child spontaneously stacked the blocks during an activity, the corresponding Fine Motor item of the MSEL was coded). Even though some of these adaptations are allowed in the standardized procedures (e.g. subtest and item switching in the MSEL), flexible testing encompassed a wider range of minor adaptations that were applied to both standardized and non-standardized tests and was used for the assessment of both groups. The strength-informed tests were particularly suited to flexible testing since they can easily be split over different assessment sessions, and since the autistic children manifested an intrinsic interest in these tests. For example, when a child was inattentive and did not cooperate for one subscale of the WPPSI-IV or the MSEL, the administrator could present a few items of the Visual Search Task, then administer one of the series of the RCPM, and the child was often able to refocus and respond to the new tasks presented. Hence, the strength-informed tests were often used to prolong the assessment sessions. Flexible testing, in addition to being used to maximize the duration of each assessment session, was also a means to access the maximal potential of the child.
Indeed, trying to keep the child seated and focussed for prolonged periods of time to administer each test in the prescribed order was vain and did not allow for an appreciation of the child’s abilities to complete the tasks. This is why flexible testing was prioritized.

Results

Testability

The first objective was to investigate testability among young AS and TD children. As we hypothesized, not all participants were able to complete all tests. The number of participants in each group who completed each test is presented in Table 1. In order to get an approximation of testability, we used two approaches. First, we compared between groups the number of sessions needed to complete participation in the study (i.e. attempting or completing every test) and the number of subtests (or subscales) successfully completed. As expected, despite being assessed in a similar number of sessions (AS: M = 4.23, SD = 1.55; TD: M = 4.09, SD = 1.03; t(104) = .54, p = .59), AS children completed a significantly lower number of subtests than TD children (AS: M = 12.01, SD = 6.60; TD: M = 17.19, SD = 2.72; t(104) = −5.31, p < .001). Second, as the vast majority (n = 46) of TD children successfully completed all subtests, and in order to obtain an individual estimate of testability that did not show such a ceiling effect, a testability ratio was computed by dividing the number of subtests completed by the number of assessment sessions. This testability ratio is thus an indicator of the attention level of the child, his capacity to remain seated and on task, his motivation to complete the tasks, etc. As expected based on our first estimate of testability, an independent t test indicated that AS children had a significantly lower testability ratio (AS: M = 2.80, SD = 1.54; TD: M = 4.46, SD = 1.32; t(104) = −5.96, p < .001). Furthermore, testability significantly correlated with age in both the AS (r(50) = .41, p < .005) and the TD groups (r(52) = .39, p < .05), with older children completing more subtests in each assessment session. Finally, this testability ratio was not correlated with performance on any of the cognitive tests (MSEL, WPPSI-IV, RCPM) in either group (all p’s > .05).

Cognitive Profile

The analyses on cognitive profile were done using a subgroup of AS (n = 26) and TD (n = 31) children who were able to complete all tests and were matched on age (p = .12). Bonferroni corrections were applied where appropriate.

The second objective of the study was to compare the intellectual profile of the two groups. We first compared the proportion of AS children scoring below the normal range (IQ < 70) according to each intelligence test (MSEL, WPPSI-IV and RCPM) among those who were able to complete the three IQ tests with the use of flexible testing. The proportion was 50% (13/26) according to the MSEL and 19% (5/26) according to the WPPSI-IV Full Scale IQ. This percentage dropped to 15% (4/26) when only considering the non-verbal subtests (Block Design and Matrix), and to 4% (1/26) according to the RCPM. None of the TD children scored below the normal range on any of the three intelligence tests.

Assessment with Conventional Tools

MSEL Profile. Autistic children performed significantly lower (M = 71.04; SD = 19.04) than TD children (M = 105.94; SD = 18.11) on the MSEL standard score (t(45) = −7.01, p < .001). In order to explore the profile of both groups on the MSEL and to test whether the performance of autistic children would be higher on non-verbal than on verbal subscales, the four MSEL subscales were compared using a two-way 2 (group) × 4 (subscale) mixed ANOVA. The analysis revealed a significant interaction between Subscale and Group (Wilks Lambda = .65, F(3, 52) = 9.18, p < .001, ηp2 = .35). Post-hoc comparisons using paired t tests (Bonferroni correction applied) revealed that Visual Reception was significantly higher than both the Expressive Language and Receptive Language subscales in the AS group (both p’s < .001), while no differences were found between the other subscales in the AS group or between any subscales in the TD group (all p’s > .05). See Fig. 1 for the profile on the MSEL.

Fig. 1

Mean performance in standard scores on the four subscales of the MSEL for each group. Note. Standard scores used in the graph were derived from T scores for comparison purposes only; analyses were conducted with the original T scores. *p < .05; MSEL, Mullen Scales of Early Learning

WPPSI-IV Profile. The autistic children performed significantly lower (M = 86.27; SD = 22.52) than TD children (M = 113.58; SD = 12.66) on the WPPSI-IV Full-Scale IQ (t(55) = −5.50, p < .001) and on all WPPSI-IV subtests (all p’s < .003), except Block Design, for which the difference was not significant following Bonferroni correction (p = .02).

To further investigate the WPPSI-IV profile, the six subtests included in the WPPSI-IV version for children aged 4 years and above were then used in a two-way 2 (group) × 6 (subtest) mixed ANOVA (the subtests included only in the version for children aged 2 years 6 months to 4 years were excluded from this analysis since only 4 AS children and 3 TD children in this age range completed them). The analysis revealed a significant Group × Subtest interaction (Wilks Lambda = .67, F(2, 54) = 13.05, p < .001, ηp2 = .33). Post-hoc paired t tests with Bonferroni corrections indicated that Similarities and Information were significantly lower than Block Design, Picture Memory, Matrix Reasoning and Bug Search (all p’s < .002) in the AS group. In the TD group, Similarities was significantly lower than Block Design, Picture Memory and Bug Search, and Information was significantly lower than Picture Memory (all p’s < .003). See Fig. 2 for the profile on the WPPSI-IV subtests.

Fig. 2

Mean standard scores for each mandatory subtest of the WPPSI-IV for each group. Note. Standard scores used in the graph were derived from scaled scores for comparison purposes only; analyses were conducted with the original scaled scores. *p < .05; WPPSI-IV, Wechsler Preschool and Primary Scales of Intelligence, fourth edition
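The captions of Figs. 1 and 2 mention that T scores and scaled scores were re-expressed as standard scores for plotting. The exact conversion used is not described; a minimal sketch, assuming the usual linear rescaling between metrics (T scores: mean 50, SD 10; scaled scores: mean 10, SD 3; standard scores: mean 100, SD 15), would be:

```python
def t_score_to_standard(t_score):
    """Rescale a T score (mean 50, SD 10) to a standard score (mean 100, SD 15)."""
    return 100 + 15 * (t_score - 50) / 10


def scaled_to_standard(scaled_score):
    """Rescale a scaled score (mean 10, SD 3) to a standard score (mean 100, SD 15)."""
    return 100 + 15 * (scaled_score - 10) / 3


# One SD below the mean in either metric maps to a standard score of 85.
print(t_score_to_standard(40), scaled_to_standard(7))
```

Both functions preserve the distance from the normative mean in SD units, which is why the resulting values are only suitable for display alongside IQ-type scores and not a substitute for the original scores used in the analyses.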

Assessment with Strength-informed Tools

RCPM. The analysis on the RCPM was conducted with raw scores because (1) groups were matched on age, (2) fourteen of the 32 TD children included in this subsample performed at ceiling on the RCPM when their scores were converted into percentiles and (3) the RCPM is not normed for participants younger than 45 months. None of the participants performed at ceiling when using raw scores on the RCPM. Performance on the RCPM did not differ between groups: autistic children (M = 22.38/36; SD = 7.34) performed similarly to TD children (M = 22.19/36; SD = 4.59), t(55) = .115, p = .91. Similar results were obtained when using percentiles.

Visual Search Performance. As expected in this type of task, accuracy on the Visual Search Task was at ceiling for most participants and did not differ between groups. As for the variable of interest, response times, a three-way 2 (group) × 2 (condition) × 5 (number of distracters) mixed ANOVA revealed a significant interaction between Condition and Number of distracters (Wilks Lambda = .36, F(4, 52) = 22.82, p < .001, ηp2 = .64). There was also a main effect of Group, the AS group being significantly slower than the TD group (F(1, 55) = 12.01, p < .005, ηp2 = .18), but post-hoc comparisons showed that between-group differences were not significant in every condition and were often non-significant in the harder conditions (see Fig. 3 for significant differences). Post-hoc tests revealed that, for each number of distracters except 5 (p = .63), both groups had slower response times in the Conjunction condition than in the Feature condition (all p’s < .001). Also, pairwise post-hoc comparisons indicated that both groups had significantly slower response times as the number of distracters increased within each condition; the only difference that was not significant was between F5 and F15 (p = .06). Overall, these results suggest that both groups had slower response times when the number of distracters increased, particularly in the Conjunction condition (see Fig. 3 for between-group comparisons).

Fig. 3

Mean Visual Search response times in seconds for each group and condition. *p < .05; F, feature; C, conjunction

Children Embedded Figure Test (CEFT). Independent t tests showed that AS children did not differ from TD children on CEFT score (AS children: M = 12.72, SD = 4.90; TD children: M = 12.64, SD = 5.47; t(54) = .06, p = .96) or response times (AS children: M = 23.76 s, SD = 12.17; TD children: M = 20.31 s, SD = 9.05; t(54) = 1.21, p = .23). See Fig. 4.

Fig. 4

CEFT mean score (left) and CEFT mean response time for successful trials (right) for each group. Note. Score is on a maximum of 25. Response time is in seconds. CEFT, Children Embedded Figure Test

Comparisons and Associations Between Tests

In order to directly compare performance across tests, a two-way 2 (group) × 3 (test) mixed ANOVA was done using performance in percentiles on each test, on subgroups in the age range of all test norms (n = 24 AS and 29 TD children). A significant Group × Test interaction was found (Wilks Lambda = .72, F(2, 49) = 9.41, p < .001, ηp2 = .34). Post-hoc comparisons using paired t tests for each group separately indicated that performance on each test significantly differed, with performance on the RCPM being significantly higher than that on both the WPPSI-IV and the MSEL, and performance on the WPPSI-IV being significantly higher than that on the MSEL (all p’s < .003).

To verify whether the magnitude of the difference between intelligence tests was, as predicted, greater in the AS group, difference variables were computed. The magnitude of the difference between tests was indeed significantly greater in the AS group, relative to the TD group, between WPPSI-IV and RCPM (t(51) = −4.01, p < .001) and between MSEL and RCPM (t(35) = 4.68, p < .001). Differences between WPPSI-IV and MSEL were of similar magnitude in both groups (t(54) = .34, p = .74) (see Fig. 5). Since percentiles can be considered as ranked variables, non-parametric analyses were also conducted and similar results were obtained.
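The logic of this analysis can be sketched as follows: a difference variable is computed for each child, and the two groups are then compared on that variable. The specific non-parametric test is not named above; the sketch assumes a Mann-Whitney U statistic, and all percentile values shown are invented for illustration.

```python
def difference_scores(test_a, test_b):
    """Per-child difference between two tests (test_a minus test_b)."""
    return [a - b for a, b in zip(test_a, test_b)]


def mann_whitney_u(group1, group2):
    """U statistic for group1: number of (x, y) pairs with x > y, ties count 0.5."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical percentiles for four children per group (not study data).
as_rcpm, as_wppsi = [90, 75, 95, 60], [40, 30, 50, 55]
td_rcpm, td_wppsi = [95, 90, 85, 99], [80, 70, 75, 90]

as_diff = difference_scores(as_rcpm, as_wppsi)
td_diff = difference_scores(td_rcpm, td_wppsi)
u_as = mann_whitney_u(as_diff, td_diff)
print(as_diff, td_diff, u_as)
```

A U value close to the number of possible pairs (here 4 × 4 = 16) would indicate that the between-test differences tend to be larger in the first group; a p value would additionally require the U distribution or a normal approximation, which is left out of this sketch.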

Fig. 5

Mean standard score for each intelligence test and each group. Note. Standard scores for the RCPM were derived from percentiles for each individual. Standard scores are used in the graph to allow comparison with the other figures; analyses were performed with percentiles. *p < .05; MSEL, Mullen Scales of Early Learning; WPPSI-IV, Wechsler Preschool and Primary Scales of Intelligence, fourth edition; RCPM, Raven’s Coloured Progressive Matrices, board form
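The caption above states that RCPM standard scores were derived from percentiles, without detailing the procedure. One common approach, assumed here for illustration, is to map each percentile onto a normal distribution with a mean of 100 and an SD of 15 using the inverse cumulative distribution function:

```python
from statistics import NormalDist

# Normal distribution of standard (IQ-type) scores: mean 100, SD 15.
STANDARD_SCORES = NormalDist(mu=100, sigma=15)


def percentile_to_standard(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to a standard score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return STANDARD_SCORES.inv_cdf(percentile / 100)


for pct in (2, 50, 80, 93):
    print(pct, round(percentile_to_standard(pct), 1))
```

Under this assumption the 50th percentile corresponds to a standard score of 100, and a percentile of about 2 falls just below the cut-off of 70 used elsewhere in the paper. Percentiles of exactly 0 or 100 have no finite equivalent, so ceiling scores would need a convention of their own.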

Since we hypothesized that AS children would perform better on non-verbal subtests, we explored how the three intelligence tests compared with regard to the assessment of non-verbal abilities. To do so, a mixed Group × Test ANOVA was done using the WPPSI-IV Matrix Reasoning subtest, the MSEL Visual Reception subscale and the RCPM (all in percentiles). Results indicated a main effect of Test (Wilks Lambda = .25, F(2, 46) = 70.82, p < .001; ηp2 = .76) with no interaction (p > .05). Post-hoc comparisons, with Bonferroni corrections applied, indicated that, in both groups, the MSEL Visual Reception subscale (AS children: M = 27th, SD = 31.64; TD children: M = 55th, SD = 29.56) was significantly lower than the RCPM (AS children: M = 80th, SD = 32.96; TD children: M = 93rd, SD = 7.83; both p’s < .001), but not than Matrix Reasoning (AS children: M = 40th, SD = 30.02; TD children: M = 69th, SD = 21.98; both p’s > .05). Matrix Reasoning was also significantly lower than the RCPM in both groups (both p’s < .001).

Correlations were computed between the three intelligence tests (MSEL and WPPSI-IV IQ scores, RCPM raw score) and Visual Search mean time in seconds, while controlling for age. In the AS group, significant negative correlations were found between Visual Search time and WPPSI-IV IQ score (r(23) = −.74, p < .001), between Visual Search time and MSEL IQ score (r(22) = −.56, p = .004) and between Visual Search time and RCPM score (r(23) = −.61, p = .001). In the TD group, only the WPPSI-IV score correlated significantly with Visual Search time (r(28) = −.42, p = .021).
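A correlation that controls for age is a first-order partial correlation. The sketch below is not the authors' analysis script; it uses the standard formula based on the three pairwise Pearson correlations, and the data are invented for illustration. With one covariate the degrees of freedom are n − 3, which is consistent with the values reported above (e.g. 26 − 3 = 23 in the AS group).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: age in months, search time in seconds, IQ-type score.
age = [40, 48, 55, 60, 66, 72]
search_time = [9.5, 8.1, 8.4, 6.0, 6.2, 4.9]
iq_score = [78, 85, 80, 101, 95, 110]

r_partial = partial_corr(search_time, iq_score, age)
df = len(age) - 3
print(round(r_partial, 2), df)
```

Partialling out age matters here because both search speed and test scores improve with age in preschoolers, so a raw correlation between them could reflect shared maturation alone.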

The same analyses were done using CEFT score, while controlling for age. CEFT time was not used since time is calculated only for successful items, which would bias the correlations. In the AS group, a significant positive correlation was found between CEFT score and RCPM (r = .54, p = .007) and between CEFT and WPPSI-IV (r = .40, p = .03), but not between CEFT and MSEL (r = .23, p = .29), while the only significant correlation in the TD group was between CEFT and RCPM (r = .44, p = .02).

Discussion

Summary of Findings

This study investigated the testability and cognitive profile of preschool autistic children using conventional versus strength-informed tools. First, as expected, testability was lower in AS than in TD children, but increased for strength-informed relative to conventional tools. Testability increased with age but was not related to performance on intelligence tests in either autistic or TD children. Second, autistic children in our sample performed better on strength-informed visual reasoning tasks than on conventional IQ batteries that include both non-verbal and verbal subtests. The autistic preschoolers in the present sample even had similar performance to TD children on two of the three strength-informed tools (RCPM and CEFT) and on the harder conditions of the Visual Search, despite much lower performance on conventional tests. Discrepancies within and between intelligence tests were greater in AS than in TD preschoolers. The intellectual profile of our sample was as follows: MSEL < WPPSI-IV < RCPM, with greater between-test discrepancies in the autistic group. Differences between subscales/subtests were also mainly found in the AS group. These differences were found despite the use of flexible testing in both groups and for all tests. Third, we observed more associations between perceptual and intellectual abilities in preschool AS children than in TD children. Indeed, the three IQ tests correlated with Visual Search, and two (WPPSI-IV and RCPM) with CEFT, in the AS group. Only the WPPSI-IV correlated with Visual Search, and only the RCPM with CEFT, in the TD group.

Testability

The results of the present study suggest that testability is an important issue when assessing young autistic children, in both clinical and research settings. For the participants who were able to complete one or more tests or subtests (i.e. we considered that the child minimally understood what was asked of them in the task), performance on these completed tasks was not correlated with the number of assessment sessions needed to obtain a valid score. Since no association was found between testability and performance on the tests, a deficit should not be inferred systematically from an inability or refusal to complete a given task. Hence, the capacity to comply with the demands of a specific test does not seem to be indicative of the child’s intellectual abilities and could result from many different factors, such as difficulties with verbal language, which could be totally independent of the child’s IQ (Mayes and Calhoun 2003a). More generally, the interpretation of “failures” is particularly challenging in this population. It is often unclear whether the child was unable to answer correctly because of an actual invalid response, a refusal to answer, an inability to attend to the task, a failure to understand what is asked, etc. (Eagle 2003).

A clinical indication of this reality is the fact that many parents told the experimenter that their child was usually able to do similar tasks at home, or that their child knew the answer to many items, etc. This kind of observation from the parents, or from other people who know the child, is common in the assessment of autistic children (Eagle 2003). Clinicians also find it challenging to assess young autistic children (Akshoomoff 2006) and researchers tend to exclude children deemed “untestable” from their studies. For example, many studies have inclusion criteria such as having an IQ higher than a certain value, which leads to the exclusion of participants unable or less able to complete intellectual assessment and to an inevitable bias in the results. The proportion of autistic children who are actually “untestable” is hard to estimate since the number of participants excluded is not necessarily reported in the studies. Furthermore, when some participants are unable to complete the assessment despite meeting the inclusion criteria, the proportion of “untestable” children reported is necessarily lower than what was found in the present study, where no exclusion criteria were applied from the beginning. For example, Sutera et al. (2007) reported that approximately 14% of 2-year-old AS children were untestable on the MSEL or Bayley Scales of Infant Development, which is slightly lower than what was found in the present study (19% did not complete the MSEL).

These testability issues raise important considerations for experimental and clinical assessment with this population. It might be difficult to have access to the full potential of an autistic child in an assessment setting (Eagle 2003; Filipek et al. 1999). Such assessments should therefore include (1) multiple sessions if necessary and (2) different types of tests that make it possible to appreciate both strengths and weaknesses, in order to have a complete portrait of the child’s level of functioning and cognitive abilities.

Cognitive Profile

The present results suggest caution when choosing and interpreting tests at such a young age. It was previously shown in other samples of autistic preschoolers that performance on the MSEL (Munson et al. 2008; Swineford et al. 2015), on the Leiter-R (Kuschner et al. 2007), on the Stanford-Binet fourth and fifth editions (Grondhuis et al. 2018; Grondhuis and Mulick 2013; Mayes and Calhoun 2003a) and on the Differential Ability Scale (Farmer et al. 2016) differed depending on the subscale and/or subtest investigated, with non-verbal abilities generally being significantly higher than verbal abilities. We replicated this uneven profile within IQ tests in both MSEL and WPPSI-IV in our sample of preschoolers, thus highlighting that FSIQ should be neither calculated nor interpreted when such important differences are present. Furthermore, regarding between-test differences, in both of our groups, the MSEL depicted lower performance than other tests. Importantly, average performance on MSEL would have been even lower if, as the test manual indicates, a score of zero had been attributed to children who refused to collaborate, and if flexible testing had not been used. This latter result is in line with what was found by Munson et al. (2008), who showed that the performance of the majority (59%) of autistic children in their large sample was very low in both the verbal and non-verbal subscales of the MSEL. Consequently, using the MSEL as a sole indicator of intellectual potential in either research or clinical settings could lead to an underestimation of the child’s abilities and, in turn, be detrimental to the optimization of the child’s potential. Furthermore, the stability of the MSEL was shown to be lower in autistic compared to non-autistic children, particularly for the verbal subscales (Chawarska et al. 2009).

Moreover, as was shown in school-age children, teenagers and adults on the autism spectrum (Barbeau et al. 2013; Bölte et al. 2009; Charman et al. 2011; Dawson et al. 2007; Nader et al. 2014; Sahyoun et al. 2009), the present results replicate the finding that RPM is the test on which autistic individuals perform the best. Despite the fact that the RCPM was also the test on which TD children performed the best, we showed that the between-tests difference was significantly greater in the autistic group. Thus, by extending the Wechsler-RPM discrepancy to preschool age, the present results contribute to this now well-established finding in the autistic population. Our results also suggest that the RPM is a relative strength in the autistic preschoolers who were able to complete the three IQ tests, and their performance on this test was identical to that of TD children, who were only matched on chronological age and who had a mean measured Wechsler FSIQ of 113. This finding thus reinforces the relevance of including the RPM when assessing autistic children or adults in order to measure and have access to both the strengths and weaknesses of the individual.

As for the correlation between perception and intelligence previously reported in autistic children and adults (Barbeau et al. 2013; Courchesne et al. 2015; Meilleur et al. 2014; Soulières et al. 2011), the present results suggest that it is also present during preschool years in autism, at least in the subgroup of autistic children who were able to complete IQ tests. First, AS children demonstrated good performance on Visual Search and CEFT in our study. Although they did not outperform TD children, it is important to mention that the AS children performed as well as TD children on the CEFT and in the harder conditions of the Visual Search, despite having on average a Wechsler IQ more than 30 points lower. This suggests better visual perceptual abilities in AS children than predicted based on their Wechsler IQ. Alternatively, it is also possible that the superiority in perception develops with age, being less evident during preschool years. Second, despite the small sample size and the fact that testability issues might have particularly affected the results on the perceptual tasks (timed tasks on which attention level, motivation, etc. might have a bigger impact), significant correlations with cognitive tests were found in the AS group, more so than in the TD group. Furthermore, correlations with IQ tests were systematically stronger in the AS group, thus suggesting that perception plays a greater role in intelligence test results in autism. The exact nature of the relationship between perceptual abilities and intelligence is still unknown, but the present results are in line with previous findings and with the Enhanced Perceptual Functioning model of autism, which suggests a generally greater role of perception in autistic cognition (Mottron et al. 2006).

Limitations and Future Directions

The present study replicates and extends our previous findings with minimally verbal school-aged autistic children (Courchesne et al. 2015). It is among the first attempts to document the intellectual profile of autistic children during preschool years. The data presented are thus preliminary by nature, but still informative regarding testability issues and cognitive profile.

The use of flexible testing leads to testability data that are not test-specific. For example, it is impossible to record the off-task time during each test and compare it, since as soon as the child became bored or uninterested we switched to another subtest/test. This assessment technique is however promising and may allow better testability. It will be of paramount importance to document this technique more systematically in future studies, for example by documenting how many switches are necessary for each child, or recording the time spent on each subscale and on each test in total. It will also be important to compare overall testing time using flexible testing versus conventional assessment. Additionally, it would be relevant to document the cognitive profile of autistic preschoolers on other tests frequently used with this population (Leiter-3, Stanford-Binet-5, Bayley Scales of Infant Development, Differential Ability Scale, etc.) and on other strength-informed tasks or subtests (inspection time, mental rotation, Picture Concepts from the Wechsler, Frame and Block version of the Leiter, Picture Similarities from the Differential Ability Scales, etc.). Our results are also limited by the small number of participants who were able to complete the three IQ tests and who were hence included in the comparison analysis. A replication of these findings with a bigger sample would give a more precise idea of the proportion of children who are “testable” during preschool years, and testing the children over many time points would further document the relationship between testability and age. A larger sample followed over many years would also allow further investigation of the development of the superiority in visual tasks in autism and its possible association with intellectual abilities.
Furthermore, regarding the complexity of the tasks, in light of the counter-intuitive finding that more complex tasks or items are sometimes better performed than simpler ones, it would be relevant to specifically test this hypothesis empirically. Clinically, this finding could also be relevant and we encourage clinicians to try more complex items even if the simpler ones were failed. Our understanding of cognitive development in autism is still sparse and research results strongly suggest that it may not follow the same rules as in typical development.