Introduction

Autism spectrum disorders (ASD) are characterized by persistent impairments in communication and social interaction and by restricted, repetitive patterns of behaviour, interests or activities (DSM-5) [1]. One of the cognitive impairments proposed to underlie (at least some of) these symptoms is executive dysfunction [2–4]. However, investigating executive functioning (EF) in ASD and its association with ASD symptomatology entails certain difficulties. For instance, the EF construct is ill-defined and many different definitions co-exist [5, 6]. In general, EF is described as an umbrella term covering several interrelated but distinct higher-order cognitive functions, serving goal-oriented regulation of thoughts and actions [7, 8]. Yet, there is no consensus on the constituting factors. Different factor-analytic studies have yielded different results, depending on the measures included in these analyses [5]. Here, we discern the five overlapping EF domains described by Pennington and Ozonoff [9] and Hill [3]: (1) inhibition, the ability to suppress a certain behaviour or to ignore distracting information; (2) cognitive flexibility (or set-shifting), the ability to shift between different thoughts or actions; (3) generativity (or fluency), the ability to generate novel ideas; (4) working memory, the ability to hold certain information active while performing a task; and (5) planning, the ability to look ahead before starting to perform a task. Across studies on EF in ASD, deficits have been reported for each of these domains, but many inconsistencies emerge (for reviews see [3, 9–12]). This obscures insight into the actual EF impairments in ASD. Since most studies only examined a subset of the five EF domains and often used different tasks and different samples, it is impossible to disentangle the contributions of the different sources of these inconsistencies. Therefore, this study assessed a wide range of EF processes within the same sample, enabling us to investigate the influence of task and sample characteristics independently and providing a broad picture of the EF profile in ASD.

Different studies have used different tasks to measure a specific EF ability, often leading to inconsistent results (for reviews see [3, 9–12]). One possible explanation for these inconsistencies is task impurity: solving an EF task always requires a combination of EF and non-EF processes, and different EF tasks require a different combination of these processes [5, 8]. This task impurity precludes an unequivocal interpretation of research findings. We therefore tried to overcome this problem by measuring each EF domain separately and by controlling for the contribution of possible confounding EF and non-EF variables. First, we applied a within-subject design in which we compared performance on two task conditions that share the non-EF requirements but differ in the particular EF process. Calculating the difference score between both conditions then yields a purer measure of that particular EF ability. Second, we looked for converging evidence. Convergence is obtained when different instruments assessing the same EF yield the same findings. Selecting highly differing instruments that claim to assess the same underlying EF ability increases the probability that convergent deficits are due to an actual EF impairment and not merely to a deficiency in any of the additional non-EF processes. We therefore measured each EF domain with several instruments tapping into different additional processes. Third, when a particular EF deficit was found, we sought to dissociate it from any confounding variables. Accordingly, we included several non-EF measures that are involved in the EF tasks. Finally, if group differences were also found on possible confounds, we investigated whether EF impairments remained while controlling for these confounds. Confounding variables comprised various non-EF and EF abilities. For example, Miyake and Friedman [8] suggested that inhibition is a common component in all EF tasks. To determine the potential EF confounds in our EF tasks, we calculated the correlations between all laboratory EF measures.

Another hypothesis addressing the inconsistent findings in the EF ASD literature postulates that individuals with ASD are more impaired in open-ended compared to highly structured assessment situations [13–15]. In open-ended (or so-called ‘self-ordered’) tasks, there are several possible strategies to perform the task and the participant has to implicitly infer the correct behaviour while being free in strategy choice. These tasks are considered to be more ecologically valid compared to highly structured tasks. In highly structured or constrained EF tasks, explicit instructions clearly indicate what the participant has to do and how this has to be done. This hypothesis emerged from the striking observation that individuals with ASD often display pronounced EF deficits in daily life, while performing adequately on highly structured laboratory tasks [10]. White and colleagues [15] compared the performance of children with ASD versus typically developing (TD) children on a series of constrained versus open-ended tasks and showed that all open-ended tasks revealed impairments in ASD, while none of the more constrained tasks did. In line with this hypothesis, we included both an open-ended and a more constrained task variant for the cognitive flexibility, generativity and working memory domains. (Note that inhibition is hard to measure ‘purely’ with an open-ended task, while measuring planning seems to be incompatible with a highly structured task.) Furthermore, task administration was complemented with parent reports of EF in daily life, which may be considered the most open-ended assessment situation.

Besides differences in task characteristics, inconsistent findings among EF studies might be due to differences in sample characteristics. Identifying the particular EF profile of individuals with ASD requires comparing their performance with that of an appropriate control group without ASD. Apart from differences in clinical status, participant groups could differ on many other aspects that may influence their EF abilities. IQ, for instance, has been shown to correlate with some but not all EF measures [16, 17]. Age as well has been associated with EF performance. Different maturation trajectories have been described for different EF measures depending on task complexity, with maturation being reached later for the more complex tasks (for reviews see [18, 19]). Given these associations, it is important to control for group differences in age and IQ. However, the way in which this control is accomplished may induce differences in cognitive profiles and may thus contribute to the observed inconsistencies between studies [20, 21].

Moreover, differences between studies in age, IQ and gender of participants may also result in a different pattern of impaired and intact EF abilities in ASD. Thus far, few studies have directly investigated this topic. In particular the effect of gender on EF abilities in ASD has barely been explored (for a general review of gender effects in ASD, see [22]). Since ASD is far more common in boys than in girls, many studies only include boys or ensure that the gender-ratio is group-wise matched.

In the present study, we investigated EF in children (8–11 years) versus adolescents (12–18 years) and in boys versus girls with ASD as compared to TD controls, group-wise matched for age, gender, performance IQ (PIQ) and full-scale IQ (FSIQ). Large samples were recruited to yield sufficient statistical power to detect group differences.

In addition to examining group differences in EF and potential associations with age, gender and IQ, we also aimed to address the relationship between EF and ASD symptoms. Executive dysfunctions have been particularly related to the restricted, repetitive behaviours and interests (RRBIs) of individuals with ASD [2–4, 23–27]. Especially impairments in response inhibition, cognitive flexibility and generativity have been proposed to provoke these RRBIs [28]. Lopez and colleagues [24] also demonstrated that some EF abilities are related to RRBIs (cognitive flexibility, response inhibition and working memory), while others are not (planning and fluency). Concerning the relationship between EF and social (interaction and communication) ASD symptoms, different opinions exist. Several authors have suggested that deficient EF may also cause some of the social problems in ASD [3, 4]. More recently, however, Happé and Ronald [2] suggested that no single cognitive account can explain the whole array of ASD symptoms, but that different accounts independently relate to different symptom domains, with executive dysfunctions being selectively and specifically associated with RRBIs. In line with this view, several studies found that EF correlated with RRBIs, but not with social communication or interaction symptoms [23, 25–27]. To investigate this further, we examined the association between EF performance and both RRBIs and social ASD symptoms.

Method

Participants

One hundred and seventeen Dutch-speaking children, aged between 8 and 18 years, participated in the study. All had a verbal (VIQ), performance (PIQ), and full-scale IQ (FSIQ) above 70. Fifty-nine participants had a formal diagnosis of ASD, made by a multidisciplinary team according to DSM-IV-TR criteria [29]. Individuals with a neurologic disorder or severe sensory constraints were excluded, but 16 participants were diagnosed with a co-occurring developmental disorder (seven had an Attention Deficit/Hyperactivity Disorder [ADHD], one had a tic disorder, four had dyslexia, two had a developmental coordination disorder and two had an anxiety disorder) and six of them took psychoactive medication during the study. Fifty-eight participants were typically developing (TD) children, who were recruited through schools, personal contacts and advertisements. According to parental reports, none of the TD children nor any of their first-degree relatives presented with a neurological or psychiatric disorder.

A subset of this total sample was included in the group comparisons. For these analyses group membership was more strictly defined, resulting in the exclusion of five individuals with ASD whose diagnosis could not be confirmed with the Developmental, Dimensional and Diagnostic Interview (3di) [30], and three TD children who scored 2 SD above the mean on the Social Responsiveness Scale (SRS) [31]. Additionally, none of the TD children showed repetitive or stereotyped patterns of behaviour as measured with the Repetitive Behavior Scale—Revised (RBS-R) [32]. Participants of both groups were group-wise matched for gender, chronological age, PIQ and FSIQ, resulting in two groups comprising 50 children each. Descriptive statistics for both groups are displayed in Table 1. To allow an unconfounded investigation of the effects of age (children versus adolescents) and gender (boys versus girls) on EF in ASD versus TD, each of the subsamples were group-wise matched for all other variables (see Table 1 in Online Resource 1).

Table 1 Characteristics of the participants matched for gender, age, PIQ and FSIQ

Measures

After describing the laboratory tasks, the EF questionnaire measuring EF in daily life and the instruments measuring ASD symptoms are presented.

Intelligence

Intelligence was assessed with an abbreviated version of the Dutch Wechsler Intelligence Scale for Children (WISC-III-NL) [33] or Wechsler Adult Intelligence Scale (WAIS-III-NL) [34], containing four subtests: Vocabulary, Similarities, Picture Completion and Block Design [35].

Inhibition

A computerized Go/No-Go task measures prepotent response inhibition. After a 200 ms presentation of a fixation cross, a geometrical figure [triangle (20 %), square (60 %) or circle (20 %)] was displayed for 1000 ms and participants had to press the response button as fast as possible (i.e. a Go-trial), except when a triangle was displayed (i.e. a No-Go-trial). The figures were presented in three different colours and two different sizes. Visual feedback was provided for 600 ms if participants responded incorrectly or too slowly. Successive trials were separated by an intertrial interval of 500 ms. A first practice block of 10 trials, with feedback after each trial, was followed by a second practice block consisting of 10 trials that were identical to the experimental trials. Afterwards, participants completed 120 randomly intermingled trials comprising 20 % No-Go-trials. The inhibition outcome variable is the percentage of No-Go errors. The error percentage and mean reaction time (RT) on equally infrequent Go-trials (circles) provide an indication of non-inhibitory processes like sustained attention and impulsivity.

The Flanker task is similar to the one described by Christ, Kester, Bodner and Miles [36] and measures resistance to distractor interference. After presentation of a fixation cross (500 ms), a target stimulus (an arrow pointing left or right) was displayed and participants had to press the corresponding response button (left or right, respectively). On compatible trials the target was flanked by four arrows (two on each side) pointing in the same direction as the target (← ← ← ← ← or → → → → →). On incompatible (inhibitory) trials, the target was flanked by four arrows pointing in the opposite direction (→ → ← → → or ← ← → ← ←). Each arrow subtended 1.6° of visual angle and the adjacent arrows were separated by 0.2°. For each trial, stimuli remained on the screen until a response was made, or until more than 3000 ms elapsed. Visual feedback was provided for 1000 ms. After an interval of 1000 ms the next trial began. After completing practice blocks without and with flankers and with extensive feedback, participants completed 120 randomly intermingled trials (60 compatible, 60 incompatible). The outcome measure, the inhibition cost, was defined as the difference between incompatible and compatible trials (incompatible minus compatible), computed separately for mean RT and for error percentage.
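The difference-score logic of the inhibition cost can be illustrated with a minimal Python sketch. The trial layout (a list of dicts with the keys `compatible`, `correct` and `rt`) is a hypothetical format chosen for illustration, not the format used in the study; in line with the data analysis rules stated later, mean RT is computed on correct trials only.

```python
from statistics import mean


def inhibition_cost(trials):
    """Return (RT cost in ms, error cost in percentage points).

    `trials` is a list of dicts with hypothetical keys:
    'compatible' (bool), 'correct' (bool) and 'rt' (milliseconds).
    """
    def summarise(subset):
        # Mean RT uses correct trials only; error % uses all trials.
        correct_rts = [t["rt"] for t in subset if t["correct"]]
        error_pct = 100.0 * sum(not t["correct"] for t in subset) / len(subset)
        return mean(correct_rts), error_pct

    compatible = [t for t in trials if t["compatible"]]
    incompatible = [t for t in trials if not t["compatible"]]
    comp_rt, comp_err = summarise(compatible)
    incomp_rt, incomp_err = summarise(incompatible)
    # Cost = incompatible minus compatible, for RT and for errors.
    return incomp_rt - comp_rt, incomp_err - comp_err


example_trials = [
    {"compatible": True, "correct": True, "rt": 400},
    {"compatible": True, "correct": True, "rt": 420},
    {"compatible": False, "correct": True, "rt": 480},
    {"compatible": False, "correct": False, "rt": 350},
]
rt_cost, error_cost = inhibition_cost(example_trials)
```

The same subtraction pattern applies to the switch costs of the cognitive flexibility tasks, with switch and maintain trials in place of incompatible and compatible trials.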

Cognitive flexibility

The Wisconsin Card Sorting Task With Controlled Task Switching (WCST-WCTS) requires self-directed or internally controlled rule shifting and was previously described by Van Eylen et al. [37]. This is the more open-ended cognitive flexibility task, since no explicit instructions are provided about the rules that should be applied, nor that a rule switch will occur. Compared to the original WCST, the influence of confounding variables is minimized by reducing social demands, working memory and generativity load, and by providing a within-subject calculation of the switch cost. On each trial, three cards were presented on a computer screen: one at the top and two at the bottom. Participants had to indicate which of the two cards at the bottom matched the card at the top, based on either colour or shape. The correct sorting rule was not made explicit, but had to be derived based on the feedback. The sorting rule changed without explicit warning after a variable number of consecutive correct trials. The main outcome measures are the mean number of perseveration errors and the switch cost RT (switch trial RT minus maintain trial RT).

The Switch task assesses externally controlled rule shifting (based on [38]). This is a highly structured cognitive flexibility task, because a cue is shown on each trial, explicitly indicating which rule should be applied and therefore also providing information about when to switch and where to switch to. As in the WCST-WCTS, social demands and other confounds are minimized in this computerized task. Participants watched a grid divided into four squares with a double-headed arrow in the centre pointing either horizontally or vertically (1600 ms). After 200 ms, a red dot appeared in one of the four squares (1400 ms), followed by an empty grid (800 ms). As soon as the red dot appeared, participants had to press the button corresponding with the position of the dot, on a diamond-like four-button response box (using only their index fingers). If the arrow pointed horizontally, participants had to indicate whether the dot was on the left or right side of the grid by pressing the left or right button. If the arrow pointed vertically, participants had to indicate whether the dot was in the lower or upper half of the grid by pressing the bottom or top button. After 2–7 repeat trials a switch trial occurred, in which the direction of the arrow changed. The task comprised four blocks, each containing 36 trials (including six switch trials). The task was preceded by two practice blocks where feedback (correct/incorrect) was provided. The main outcome measures are the switch cost RT and the switch cost error percentage (switch trial error percentage minus maintain trial error percentage).

Generativity

The Uses of Objects task [39, 40] is an open-ended test measuring the ability to generate new ideas (ideational fluency). Participants were asked to generate as many useful uses as they could for six different objects (90 s per object). Half of the objects had an obvious conventional function (conventional items) and half of them had no clear established function (non-conventional items). We intermittently presented a conventional and a non-conventional item in a fixed order across participants. Scoring was similar to Bishop and Norbury [39], and differentiated correct, incorrect (not useful, implausible or vague responses, or when merely a description of the object was provided), redundant and repetition (a literal repetition of a previous idea) responses. The number of correct responses is the main outcome measure, counted for the conventional and non-conventional items separately and combined (total correct responses). Additionally, we calculated the total number of responses, and the percentage of incorrect, redundant and repetition responses.

The Design Fluency test is part of the Delis–Kaplan Executive Functions System (D-KEFS) [41]. Although this task is still somewhat open-ended (which is necessary to measure generativity), it is more constrained than the Uses of Objects task, since several rules are imposed that explicitly restrict the correct way of performing the task. It consists of three conditions, but we only focused on the first one, providing a basic test of design fluency. In this condition, rows of boxes were presented on a piece of paper, with each box containing the same array of black dots. The participant had to draw a different design in each box by connecting the dots using four straight lines and each line had to touch at least one other line at a dot. The number of unique and correct designs provides a measure of generativity.

Spatial working memory

The Spatial Working Memory test is part of the Cambridge Neuropsychological Test Automated Battery (CANTAB) [42] and assesses the ability to retain and manipulate spatial information. It is a self-ordered, open-ended task, because the participant has to work out a suitable strategy on his own. A number of boxes (4, 6 or 8) were presented on a touch screen and the participant had to find a ‘token’ in each of them by touching the correct box. Only one token was hidden at a time, and within the same trial it was never hidden in the same box again. A trial terminated when the token was found in each of the boxes. The test comprised 12 trials (four trials for each box number) and was preceded by three practice trials. An error was defined as the selection of a box which certainly did not contain a token, either because the participant revisited a box in which a token was previously found or because the participant revisited a box that was already found to be empty during the same search. The main outcome measure is the number of errors, counted for the 4, 6 and 8 box trials separately and combined (total errors). Since an organized search strategy can minimize working memory load, a summary index of the applied strategy (i.e. the number of search sequences starting with a different box) was also registered as a control measure.
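The error definition above can be made concrete with a short Python sketch. It is an illustration only: the input layout (one list of touched boxes per search, ending with the box that held the token) is a hypothetical representation, and the CANTAB software's own scoring may differ in detail.

```python
def count_swm_errors(searches):
    """Count errors within one trial of a self-ordered search task.

    `searches` is a hypothetical representation: a list of searches, each
    an ordered list of touched boxes that ends with the box in which the
    token was found.
    """
    found_boxes = set()  # boxes that already yielded a token in this trial
    errors = 0
    for search in searches:
        opened_empty = set()  # boxes found empty during the current search
        for box in search[:-1]:  # every touch except the final, successful one
            # Error type 1: box held a token earlier in the trial.
            # Error type 2: box was already opened in this same search.
            if box in found_boxes or box in opened_empty:
                errors += 1
            opened_empty.add(box)
        found_boxes.add(search[-1])
    return errors


# Four-box trial: the second search revisits box 1 (token found there
# earlier) and touches box 2 twice before finding the token in box 3.
example_trial = [[0, 1], [1, 2, 2, 3], [0, 2], [0]]
total_errors = count_swm_errors(example_trial)
```

Touching box 0 again in the third search is not counted, because a box that was empty in an earlier search may still hold a token later in the trial.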

The Spatial Span subtest of the Wechsler Non Verbal-NL [43] measures spatial working memory. It is a highly constrained task, since explicit instructions clearly indicate what the participant has to do and how this has to be done. A board containing 10 blocks in a specific configuration was placed in front of the participant. After the experimenter tapped a number of blocks, the participant had to touch the same blocks, either in the same order (forward condition) or in reversed order (backward condition). Each condition started with a practice trial, followed by 16 experimental trials (sequentially increasing the number of tapped blocks from two to nine, with two trials for each number). The forward condition was administered before the backward condition. The number of correct trials was counted for the forward and backward condition separately and combined (total correct trials).

Planning

The Tower test of the D-KEFS [44] was administered to assess planning. Participants had to build a designated tower in as few moves as possible by moving five disks varying in size across three pegs, while moving only one disk at a time and never placing a larger disk on a smaller one. At the beginning of each trial, the experimenter placed a number of disks on the pegs in a predetermined starting position and displayed a picture showing the ending position of the disks. The move accuracy ratio (the actual number of moves divided by the number of minimally required moves) reflects the effectiveness of the employed strategy. Additionally, we assessed the time needed to make the first step and the mean time per step.

Motor screening and processing speed

The Motor Screening test is part of the CANTAB [42] and screens for basic visual, motor and task comprehension difficulties. Participants had to touch a cross, displayed at different locations on a touch screen, as fast as possible. Response latency of the correct trials was recorded.

RTs on the compatible trials of the Flanker task and on the maintain trials of the WCST-WCTS and Switch task were used as a general measure of processing speed.

EF in daily life: Behavior Rating Inventory of Executive Function (BRIEF)

The Behavior Rating Inventory of Executive Function (BRIEF) is a parent-report questionnaire assessing impairments in EF in daily life [45]. We report the four subscales that match with the delineated EF domains: Inhibition, Shifting (flexibility), Working Memory and Planning.

ASD symptoms

The Developmental, Dimensional and Diagnostic interview (3di) is a computerized semi-structured interview providing a quantitative score for the three main symptom domains of ASD (according to the DSM-IV-TR: social interaction, social communication and RRBIs) and for several other domains of development and general functioning (including co-occurring psychiatric disorders) [30]. It also contains a diagnostic algorithm for ASD. Agreement with ICD-10-classification [46] and Autism Diagnostic Interview-Revised [47] is very good [30].

The Social Responsiveness Scale (SRS) for children and adolescents is a normed questionnaire, developed to assess a wide range of behaviours characteristic of ASD [31, 48]. It consists of five so-called ‘treatment scales’: Social Awareness, Social Cognition, Social Communication, Social Motivation and Autistic Mannerisms. By applying factor-analysis Frazier et al. [49] demonstrated that a two-factor model, dividing SRS social and autistic mannerisms scales consistent with DSM-5 ‘social communication/interaction’ and RRBIs domains, best explains the variance in SRS scores. Accordingly, we summed the scores of the ‘social’ scales to obtain one index of social (communication and interaction) ASD symptoms, while the score on the Autistic Mannerisms scale was taken as an index of RRBIs.

The Repetitive Behavior Scale—Revised (RBS-R) assesses the RRBIs observed in individuals with ASD [32]. The total score is reported. The questionnaire was translated to Dutch by translation and back-translation.

Procedure

Participants were tested individually in a quiet room, either at the University Hospital or at school. Besides the tasks described above, additional local–global visual processing tasks were administered as part of another study. The whole testing took about 4 h, divided into four 1-h sessions. Enough breaks were provided to avoid fatigue. Even for the computerized tasks, it was possible to take a break whenever necessary. When the participant became inattentive (failed to respond during stimulus presentation) the task paused and was only resumed when he/she was ready. Additionally, computerized tasks were alternated with other task formats to provide enough variation. To avoid order effects, the order of sessions and the order of tasks within a session were counterbalanced. Participants received a small reward for their participation.

Computerized tasks were run on a Dell Latitude E6400 notebook. Stimuli of the Go/No-Go, Flanker and Switch task were presented on the notebook’s screen. For the other computerized tasks, a 17-in. Elo Entuitive touch screen was used.

Questionnaires were completed by the participants’ parents. The 3di was administered to the parents as part of a research project from Wouter De la Marche [50]. 3di data were only collected from individuals with ASD.

Data analyses

Prior to analysis, appropriate transformations (square root or logarithm base 10) were applied if necessary to obtain normally distributed variables. In the tables, the values for the mean and standard deviations result from the raw, non-transformed variables. For the RT data, only the correct trials were used and within-subject outliers (>2.5 SD of the participant’s own mean) were excluded. Group outliers (>2.5 SD of the group mean) were excluded for all variables. Analyses were performed with and without exclusion of group outliers, except for five variables that only showed a normal distribution after outlier exclusion. As analyses with and without outlier exclusion yielded essentially the same results, for the former set of variables only analyses including group outliers are reported.
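As an illustration of the 2.5 SD exclusion rule, the following Python sketch trims a list of values around their own mean. Several details are assumptions not stated in the text: the rule is applied in both directions, the sample standard deviation is used, and trimming is done in a single pass.

```python
from statistics import mean, stdev


def trim_outliers(values, criterion=2.5):
    """Drop values lying more than `criterion` SDs from the mean of `values`.

    Assumptions for illustration: two-sided rule, sample SD, single pass.
    """
    if len(values) < 2:
        return list(values)
    centre, spread = mean(values), stdev(values)
    if spread == 0:
        return list(values)
    return [v for v in values if abs(v - centre) <= criterion * spread]


# Twelve plausible correct-trial RTs (ms) plus one very slow response.
example_rts = [480, 500, 520] * 4 + [1500]
trimmed_rts = trim_outliers(example_rts)
```

The same function could be applied per participant (within-subject outliers) or to the participant scores of one group (group outliers). Note that with very few observations no value can exceed 2.5 SD, so the rule only has an effect once enough trials are available.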

For all main EF measures, the effects of group (ASD versus TD), age (children versus adolescents), gender and all two-way interactions were investigated. The three-way interaction between group, age and gender was not included in the model, because the number of observations in each cell was too small to produce reliable results. An adapted backward model selection procedure was applied. Starting from the full model including all effects, all effects with a p value ≥ 0.20 were subsequently eliminated. For the remaining effects, all possible model combinations were fitted and the best model was selected based on the Akaike and Bayesian Information Criteria (AIC and BIC, respectively) [51]. Only this final best model is reported. Since the effect of group was our main interest, it was always included in the model. Whenever group differences emerged on the laboratory tasks, we examined whether these remained after controlling for possible confounding variables, by including the confounds as covariates in the analyses. To determine the potential EF confounds in our laboratory EF tasks, we calculated Spearman partial correlations (corrected for age and FSIQ) between the main EF measures (see Table 2 in Online Resource 1). If two EF measures of different EF domains were significantly correlated and group differences were found on both measures, we examined whether group differences on these EF measures remained when including the other measure as a covariate (based on the presumed influence of one EF measure on the other; see Footnote 1). Furthermore, for all main EF measures on which we found significant group differences, we also checked the influence of ASD probands with a co-occurring ADHD diagnosis by repeating the analyses, excluding these participants from the sample. Doing so, all significant group differences remained (see Footnote 2).
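For reference, the two information criteria used in the model selection have the following standard definitions. The symbols are introduced here for illustration and do not appear in the original text: $k$ is the number of estimated parameters, $n$ the number of observations and $\hat{L}$ the maximized likelihood of the fitted model.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

For both criteria, lower values indicate a better trade-off between fit and complexity. BIC penalizes additional parameters more heavily than AIC whenever $\ln n > 2$, that is, for $n \geq 8$.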

Table 2 Performance on the main outcome variables per group per EF task

For all additional EF and non-EF measures, only the effect of group was examined. ANOVA was applied for most measures. Repeated-measures mixed model analyses were used when within-subject variables were included (for the main outcome measures of the Uses of Objects task, the Spatial Working Memory test and the Spatial Span test), and to analyse the repeated measures of the processing speed variables. For scores that could not be transformed to a normal distribution, non-parametric Mann–Whitney U tests were used (% Go errors and RBS-R Total score). Post hoc tests were corrected for multiple comparisons using Tukey–Kramer correction. A significance level of p < 0.05 (two sided) was adopted for all analyses. For all main EF measures, Cohen’s d effect sizes were calculated by dividing the estimated group difference (Least Square Means) in the final model by the pooled standard deviation ($\sqrt{(\sigma_1^2 + \sigma_2^2)/2}$). An effect size ranging from 0.2 to 0.3 is considered small, values around 0.5 are medium and values of 0.8 or above are considered large effects [52].
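The effect size computation described above can be written as a short Python sketch. The function and argument names are illustrative; in the study the group difference is the model-based estimate (Least Square Means), which is simply passed in here as a number.

```python
from math import sqrt


def cohens_d(group_difference, sd_1, sd_2):
    """Cohen's d: group difference divided by the pooled standard deviation.

    The pooled SD is sqrt((sd_1**2 + sd_2**2) / 2), as in the text.
    """
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return group_difference / pooled_sd


# Hypothetical numbers: a difference of 4 points with group SDs of 3 and 4.
example_d = cohens_d(4.0, 3.0, 4.0)
```

With these hypothetical inputs the pooled SD is about 3.54, giving a d of roughly 1.13, which would count as a large effect under the benchmarks cited above.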

Spearman (partial) correlations were calculated (on the entire sample, N = 117) to investigate the association between EF measures and age, FSIQ and ASD symptoms. Additionally, we applied generalized linear models to test whether the correlations between EF measures and ASD symptoms differed between both groups. Since these models provide parametric tests, such analyses could only be performed for normally distributed data (and not for the following variables: the measures of the RBS-R, the inhibition cost % errors of the Flanker task, the perseverative errors of the WCST-WCTS and the switch cost % errors of the Switch task).
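A Spearman partial correlation corrected for two covariates (such as age and FSIQ) can be sketched in plain Python as a Pearson partial correlation computed on ranks. This is one standard way to obtain such a coefficient, not necessarily the routine used in the study; all function names are illustrative, and no significance test is included.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def spearman_partial(x, y, cov_1, cov_2):
    """Spearman correlation of x and y, controlling for two covariates."""
    rx, ry, r1, r2 = (ranks(v) for v in (x, y, cov_1, cov_2))
    # Step 1: remove the first covariate from every pair involved.
    xy_1 = partial(pearson(rx, ry), pearson(rx, r1), pearson(ry, r1))
    x2_1 = partial(pearson(rx, r2), pearson(rx, r1), pearson(r2, r1))
    y2_1 = partial(pearson(ry, r2), pearson(ry, r1), pearson(r2, r1))
    # Step 2: remove the second covariate from the adjusted correlations.
    return partial(xy_1, x2_1, y2_1)


# Hypothetical scores for six participants.
ef_score = [1, 3, 2, 5, 4, 6]
symptom_score = [2, 1, 4, 3, 6, 5]
age = [8, 9, 10, 12, 15, 17]
fsiq = [100, 95, 110, 105, 90, 120]
rho_partial = spearman_partial(ef_score, symptom_score, age, fsiq)
```

The recursive formula breaks down when a variable is perfectly rank-correlated with a covariate (division by zero), so real data should be checked for that case.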

For some variables there were missing data, mostly limited to one participant per measure. For the RBS-R, many data were missing because the questionnaire was added to the protocol at a later stage. For the Uses of Objects task, only data of the participants included in the matched sample were scored.

Results

Group comparisons

Table 2 displays descriptive statistics for both groups based on the final model for each of the main EF measures. Corresponding effect sizes comparing ASD versus TD are presented in Fig. 1. Descriptive statistics for both groups for the additional EF and non-EF measures are displayed in Table 3. Descriptive statistics comparing children versus adolescents on the main EF measures are displayed in Table 4.

Fig. 1 Effect sizes (expressed as differences in standard deviations [SD]) and 95 % confidence limits (CL) for group differences in performance on the main EF measures. Positive scores indicate better performance for TD individuals compared to individuals with ASD

Table 3 Group comparisons for ASD versus TD on additional EF and non-EF measures
Table 4 Comparison of children versus adolescents on the main EF measures

Inhibition

On the Go/No-Go task, the percentage No-Go errors was higher in individuals with ASD compared to TD individuals. Additionally, children made more No-Go errors compared to adolescents. For RT and percentage errors on equally infrequent Go-trials, both groups performed comparably.

On the Flanker task, no group differences were found.

Cognitive flexibility

On the WCST-WCTS, the ASD group made more perseverative errors than the TD group, and children made more perseverations than adolescents. For switch cost RT, there was a non-significant trend towards a higher switch cost in the ASD group.

On the Switch task, the switch cost RT was similar for both groups, but higher in children than adolescents, and higher in girls than boys (F(1,95) = 4.2, p = 0.04). The switch cost error percentage was higher in the ASD group, but this effect was only significant for the children (Group × Age interaction: F(2,92) = 4.34, p = 0.01). Moreover, children in the ASD group had a higher switch cost error percentage than adolescents (t(45) = 2.89, p = 0.02), while no age effect was observed in the TD group (t(47) = −0.55, p = 0.95).

Significant group differences on both cognitive flexibility tasks were reduced, but remained significant after controlling for possibly confounding EF impairments. More specifically, the number of perseverative errors on the WCST-WCTS and the switch cost error percentage on the Switch task correlated with the main outcome measures of the Go/No-Go task and the Spatial Working Memory task (see Table 2 in Online Resource 1), for which EF impairments were found in ASD individuals. After controlling for these impairments in inhibition (i.e. percentage No-Go errors) and working memory (i.e. total errors on the Spatial Working Memory task), ASD individuals still made more perseverative errors on the WCST-WCTS (F(1,87) = 4.60, p = 0.03) and had a higher switch cost error percentage on the Switch task (F(1,86) = 6.83, p = 0.01). However, this group difference on the Switch task was again only significant for the children (Group × Age interaction: F(2,86) = 2.45, p = 0.09; children ASD versus TD: p = 0.006; adolescents ASD versus TD: p = 0.98).
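The switch cost measures reported in this section can be illustrated with a short Python sketch. It assumes the conventional definition of a switch cost as performance on switch trials minus performance on repeat trials, and it computes the RT cost over correct trials only. Both choices, as well as the trial format and the function name, are assumptions made for illustration and are not taken from the task descriptions of this study.

```python
from statistics import mean


def switch_costs(trials):
    """Compute switch costs from (trial_type, rt_ms, correct) tuples.

    trial_type is 'switch' or 'repeat' (hypothetical labels).
    Returns a dict with the RT cost (ms) and the error-percentage cost.
    """
    by_type = {"switch": [], "repeat": []}
    for trial_type, rt_ms, correct in trials:
        by_type[trial_type].append((rt_ms, correct))

    def mean_correct_rt(rows):
        # Assumption: RT is averaged over correct responses only.
        return mean(rt for rt, ok in rows if ok)

    def error_percentage(rows):
        return 100.0 * sum(1 for _, ok in rows if not ok) / len(rows)

    return {
        "rt": mean_correct_rt(by_type["switch"]) - mean_correct_rt(by_type["repeat"]),
        "error_pct": error_percentage(by_type["switch"]) - error_percentage(by_type["repeat"]),
    }
```

Under these assumptions, a participant can show a raised error cost together with an unremarkable RT cost, which is the pattern described above for the ASD group.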

Generativity

Individuals with ASD generated fewer correct answers on the Uses of Objects task, due to a higher percentage of redundant and incorrect responses. However, the total number of responses and the percentage of literal repetitions were equal in both groups. Furthermore, more correct answers were given on non-conventional than on conventional items (F(1,100) = 163.84, p < 0.001), and by adolescents than by children. There was also a significant Item Type × Age interaction (F(1,100) = 6.76, p = 0.01), with a significant age effect only for non-conventional items (t(157) = 4.86, p < 0.001; conventional: t(157) = 2.31, p = 0.10). The effect of item type, however, was significant for both age groups (non-conventional compared to conventional, children: t(100) = 7.14, p < 0.001; adolescents: t(100) = 11, p < 0.001). After controlling for VIQ (a non-EF confound), group differences were reduced but remained significant for the number of correct answers (F(1,100) = 9.49, p = 0.003), the percentage of redundant answers and the percentage of incorrect answers (both p = 0.01). For all these variables the effect of VIQ did not differ by group (no VIQ × Group interaction; all p > 0.37).

Both groups generated a similar number of correct responses on the Design Fluency test, but adolescents generated more correct answers than children.

Spatial working memory

On the Spatial Working Memory test, a main effect of group was found, but individuals with ASD only made significantly more errors in the most difficult condition with eight boxes (Group × Number of Boxes interaction: F(2,198) = 4.00, p = 0.02). Individuals in both groups made more errors as the number of boxes increased (F(2,198) = 301.18, p < 0.001). Children also made more errors than adolescents, but only on trials with six or eight boxes (Age × Number of Boxes interaction: F(2,198) = 22.96, p < 0.001; children versus adolescents: 4 boxes: t(263) = 0.58, p = 0.99; 6 boxes: t(263) = 4.53, p < 0.001; 8 boxes: t(263) = 8.84, p < 0.001). The correlation analyses between the main EF measures revealed a significant correlation between the number of total errors on this task and the measures of both cognitive flexibility tasks for which EF impairments were found in ASD individuals (see Table 2 in Online Resource 1). After controlling for these cognitive flexibility impairments (i.e. number of perseverative errors on the WCST-WCTS and the switch cost error percentage on the Switch task), no significant group differences were found (main effect of group: F(1,92) p = 0.70; Group × Number of Boxes interaction: F(2,184) = 2.59, p = 0.08; the effect of group was non-significant for every number of boxes: p > 0.46). The control measure of the Spatial Working Memory test, namely search strategy, did not differ between the groups.

There was no effect of forward versus backward spatial span measured with the Spatial Span test. There was a non-significant trend towards a reduced spatial span in ASD. Children had a smaller spatial span than adolescents.

Planning

Although individuals with ASD needed more time to take the first step on the Tower test, no group differences were observed for the move accuracy ratio or for the mean time per step.

Motor screening and processing speed

There were no group differences on the Motor Screening test or on any of the processing speed measures.

BRIEF

Individuals with ASD presented significantly more problems than TD individuals on all analysed subscales of the BRIEF: Inhibition, Shifting (or cognitive flexibility), Working Memory and Planning. The cognitive flexibility problems were the most pronounced. For inhibition, group differences were more pronounced for children than for adolescents, but remained significant for both age groups (Group × Age interaction: F(2,91) = 4.84, p = 0.01). Moreover, the effect of age was only significant for ASD individuals (t(48) = 3.08, p = 0.01; TD group: t(48) = −0.47, p = 0.97).

Correlations

Table 5 displays correlations between main EF measures and age, FSIQ and ASD symptoms.

Table 5 Spearman correlations between main EF measures and age, FSIQ and ASD symptoms

Correlations with age and FSIQ

Increasing age was generally associated with better EF performance on the tasks: fewer errors on the Go/No-Go task, the WCST-WCTS and the Spatial Working Memory task; a lower switch cost (% errors and RT) on the Switch task and a lower inhibition cost RT on the Flanker task; and more correct answers on the Uses of Objects, the Design Fluency and the Spatial Span task. Age was also negatively correlated with inhibition problems as measured with the BRIEF. There was also a trend for a negative correlation between age and inhibition cost % errors of the Flanker task. The number of restricted, repetitive behaviours as measured with the RBS-R decreased with age as well.

Likewise, higher FSIQ was associated with better EF performance on the tasks: fewer errors on the WCST-WCTS and the Spatial Working Memory task; a lower switch cost RT on the Switch task; and more correct answers on the Uses of Objects, the Design Fluency and the Spatial Span task. FSIQ was also negatively correlated with all BRIEF scales and ASD symptomatology as measured with the SRS and the RBS-R.

Correlations between EF measures and ASD characteristics

A higher score on social problems and RRBIs as measured with the SRS was associated with poorer EF performance: more perseverative errors on the WCST-WCTS, a higher switch cost (% errors) of the Switch task, fewer correct answers on the Uses of Objects task, a higher score on all BRIEF scales and a trend for a higher percentage No-Go errors. Additionally, a trend was observed for a positive correlation between social problems and the move accuracy ratio of the Tower test and between RRBIs and the number of correct trials of the Spatial Span task.

Similarly, a higher score on RRBIs as measured with the RBS-R total scale was significantly associated with fewer correct responses on the Uses of Objects task and a higher score on all BRIEF scales. A trend for a higher percentage of No-Go errors was also observed.

None of these associations differed significantly between the groups (all p > 0.13).

Discussion

Influence of task characteristics: open-ended versus structured EF tasks

Compared to TD individuals, individuals with ASD showed impairments in all EF domains, with the most pronounced and most consistently found deficits in cognitive flexibility. No impairments were found on any of the control measures and EF group differences remained after controlling for possible non-EF confounds, suggesting that the observed group differences effectively reflect executive dysfunction in ASD. However, full convergence was not reached for any of the EF domains, since group differences within a specific EF domain depended on task characteristics and occasionally on the age of the participants.

For inhibition, individuals with ASD showed impaired prepotent response inhibition (Go/No-Go task), but intact resistance to distractor interference (Flanker task). Correspondingly, Hill [3] postulated that inhibition of prepotent responses is particularly impaired in individuals with ASD, with sparing of other types of inhibition. However, recent meta-analyses indicated problems with both prepotent response inhibition and interference control, with a larger effect size for prepotent response inhibition difficulties, but with many inconsistencies between studies [53].

For cognitive flexibility, generativity and working memory, two tasks were included per domain, varying in the degree of open-endedness. Group differences in these domains were generally more pronounced and more stable over development for the more open-ended tasks. Cognitive flexibility deficits were observed on both flexibility tasks, but group differences were found for both age groups only on the more open-ended WCST-WCTS, while on the more structured Switch task impairments were restricted to the ASD child group (8–11 years). These findings replicate previous reports of intact performance on the Switch task for adults with ASD [54]. Furthermore, impaired performance on the WCST-WCTS in individuals with ASD has been previously described by Van Eylen et al. [37] and is consistent with many other reports of increased perseverative errors on the original WCST (for reviews, see [55, 56]). However, a serious disadvantage of the original WCST version is the impurity of the task, in that other EF and non-EF processes are involved [55]. Our study provides a major contribution to the understanding of cognitive flexibility impairments in ASD by showing that these deficits are also found on a purer WCST variant and remain after controlling for impairments in response inhibition and working memory. On both cognitive flexibility tasks, we also found that individuals with ASD made more switch errors but had a comparable switch cost RT, probably due to a speed-accuracy trade-off.

For generativity and working memory, group differences were only observed on the most open-ended tasks (Uses of Objects and Spatial Working Memory), but not on the more structured ones (Design Fluency and Spatial Span). More specifically, on the Uses of Objects task the total number of responses and the percentage of literal repetitions were equal in both groups. However, ASD individuals generated fewer correct answers due to a higher number of incorrect and redundant responses. These findings largely correspond to previous reports of generativity problems in individuals with ASD when measured with the Uses of Objects task [39, 40, 57, 58], and intact generativity abilities when measured with the Design Fluency test [59]. When reviewing spatial working memory studies, largely consistent deficits have been found on the Spatial Working Memory task, with more spared performance on spatial span tasks (for reviews, see [10, 60]). Interestingly, we found that group differences on the Spatial Working Memory task disappeared after controlling for cognitive flexibility impairments.

Regarding planning, we observed subtle impairments on the Tower test (of the D-KEFS): individuals with ASD appeared to need more time to generate a plan, but they executed the plan in a manner similar to TD individuals. The lack of a group difference on the main outcome measure of the Tower test was quite surprising, given that planning is one of the EF domains most consistently found to be impaired in individuals with ASD (for reviews, see [3, 12]). Using the same D-KEFS Tower test, Lopez et al. [24] described reduced planning abilities in ASD, suggesting that the lack of group differences in our study is probably not due to the specific planning task used. However, our findings are in line with several other studies reporting intact performance of individuals with ASD on a measure of planning (for reviews, see [3, 10, 12]). Finally, the most open-ended EF measures (from the BRIEF), reflecting EF in daily life, revealed the most pronounced impairments in all measured EF domains (inhibition, cognitive flexibility, working memory and planning).

Taken together, our study indicates that individuals with ASD show more EF difficulties in open-ended settings (both daily life and laboratory tasks) than on more structured tasks. Some EF impairments were even restricted to open-ended situations only. However, the implications of these findings are less clear. One interpretation holds that open-ended situations are more taxing and require more executive control, hence making them more sensitive to EF impairments, whereas highly constrained tasks provide more structure and organization, thereby relieving the EF demands [13]. Accordingly, it may be concluded that EF impairments in ASD are rather subtle, as they preferentially show up in taxing open-ended situations. An alternative view refers to the task impurity problem and suggests that poorer EF performance that is restricted to open-ended situations is not due to core executive dysfunction, but results from difficulties with other processes inherent to unconstrained tasks. This view has recently been gaining traction, with several theories postulating that impairments on open-ended EF tasks may result from another underlying cause. For example, White [14] stresses that unconstrained, open-ended tasks do not provide explicit instructions indicating what to do and how to do it, but that this information has to be inferred implicitly. She further hypothesizes that individuals with ASD have an impairment in ‘Inferring Implicit Information’ (Triple I impairment), due to mentalizing difficulties, underlying their impairments on open-ended tasks. According to Gomot and Wicker [61], executive processes closely rely on predictive abilities allowing a flexible adaptation to changing environmental contingencies. They argue that individuals with ASD are particularly poor at dealing with information that is rather unpredictable and less controllable (as in open-ended tasks), because they have a dysfunction in the ability to spontaneously adapt predictions in a flexible manner (see also [62]).

Based on our findings, we can conclude that individuals with ASD have genuine impairments in prepotent response inhibition and cognitive flexibility, since these impairments were also found on highly constrained, ‘pure’ tasks and remained after controlling for possible EF confounds. Additionally, subtle working memory problems are suggested, because group differences were not found on a highly constrained working memory task and deficits on the open-ended Spatial Working Memory task were restricted to the most difficult condition. As performance on the easier but equally open-ended conditions of the Spatial Working Memory task was spared, this hints that impairment on the most difficult condition was not merely due to the open-endedness of the task, but rather points to subtle working memory problems that only show up in taxing situations. However, since group differences disappeared after controlling for cognitive flexibility deficits, the problems on the Spatial Working Memory task may result from cognitive inflexibility (which mainly plays a role in the most difficult condition of the task). Problems with generativity and planning were also only observed in open-ended assessment situations, but it is less clear whether these deficits are due to subtle impairments in these EF domains or result from the open-ended nature of the measurements. Further research is needed to provide clarity.

Influence of sample characteristics on EF

We observed pronounced effects of both age and IQ on EF performance, but few effects of gender. The only gender effect was observed on the Switch task, with a higher switch cost RT for girls than for boys.

The observed age effects in this study are consistent with previous reports (for reviews, see [18, 19]). Generally, adolescents performed better than children, but the size of the age effect depended on the task and the complexity of the task conditions. Evidently, task conditions targeting processes that mature the most during adolescence are the most sensitive to age effects. On the Spatial Working Memory test, for instance, age effects were only observed for the most difficult conditions (6 and 8 boxes). This suggests that the performance on the 4-box items already matures around the age of 12 years, while working memory development continues throughout adolescence for more complex conditions [19]. On the Uses of Objects task, however, adolescents outperformed children only on the easier non-conventional items (easier than conventional items probably because they do not require disengagement from the conventional meaning). This finding may indicate a slower maturation for the difficult conventional items, suggesting that the performance on these items has not yet matured in adolescence, while this might be the case for the non-conventional items. On several measures, no age effects were found, probably because children had already reached a mature level. Regarding the Flanker task and the Tower test, this corresponds to review findings indicating little improvement during adolescence [5, 19]. On the WCST-WCTS, children needed more trials to perform a switch (i.e. made more perseverative errors), but once a switch was made their switch cost RT was comparable to that of adolescents. Finally, for the BRIEF, age effects were only observed for the Inhibition subscale. This seems to fit with the normed data of the BRIEF, yielding no age effects for planning and rather stable performance from 9 years onwards for shifting and working memory [45]. The results of the correlation analyses largely correspond with the reported differences between children and adolescents. Significant correlations of considerable size were observed only for the measures showing an age effect.

In sum, although it has been suggested that different EF domains follow different maturation trajectories, our data suggest that the effect of age mainly depends on the specific measure used. For the BRIEF, only inhibition correlated with age. However, for the EF tasks an age effect was observed for all EF domains, except for planning. One possible explanation for the reduced age effects on the BRIEF subscales is that parents already take into account the age of their child when reporting its EF problems (based on expectations of ‘normal’ behaviour at that age), thus masking true age effects in EF abilities. Nevertheless, even for the EF tasks the age effect depended on the specific task used to measure a particular EF domain, with generally larger age effects for more complex tasks or task conditions.

Better EF performance was also associated with higher FSIQ, although on the task measures no such correlation was found for inhibition and planning. Friedman et al. [17] also found that working memory was associated with intelligence, while inhibition was not.

Overall, our findings replicate and complement previous reports of age and IQ effects on EF. Given these strong effects, it is clear that group differences in these variables are potential confounds that should be controlled for (however, see [63] for a different opinion). In our study, this was done by matching the groups for age, PIQ and FSIQ. Since groups differed on VIQ, this factor was included as a covariate if relevant (i.e. when group differences were found on the verbal EF task, namely the Uses of Objects task). Whether or not (and how) these confounds are controlled for could affect the emerging EF profile and therefore might cause inconsistencies between studies [63, 64]. Moreover, since the effects of age and IQ strongly depend on the EF measure, the impact of controlling for their contribution will be task dependent. Furthermore, it is important to be aware of the potential risks associated with controlling for IQ [63, 64]. Given the sporadic and small main effect of gender, group differences in gender ratio seem to be less crucial to control for, as they barely affect EF.

Furthermore, no group by gender interactions were found, indicating that differences in EF performance between TD and ASD individuals were not influenced by gender. So far, the only other study investigating such a group by gender interaction in children reported that response inhibition impairments in individuals with ASD were larger in girls than in boys [65]. However, Lai et al. [66] failed to replicate this finding in adults and did not observe a group by gender interaction for generativity impairments either, in line with our results (for a general review of gender effects in ASD, see [22]).

On two measures, however, there was a significant group by age interaction. Children and not adolescents with ASD showed increased switch cost errors on the Switch task and reported more pronounced inhibition problems on the BRIEF. Interestingly, for both measures no age effects were observed in TD individuals, while adolescents with ASD outperformed children with ASD. Accordingly, these impairments may represent a developmental delay in ASD individuals that gradually resolves (or at least reduces) while growing older. Alternatively, with increasing age, ASD individuals might mobilize compensatory mechanisms to (partly) overcome their impairments. Indirect support for the latter is provided by a neuroimaging study demonstrating intact behavioural performance but atypical brain activity in adults with ASD performing the Switch task [54]. Nevertheless, all other EF impairments in ASD remained stable throughout development. This corresponds to the more general finding of developmental stability of individual differences in EF [8].

These findings suggest that differences between studies in participants’ age or gender only contribute marginally to the reported inconsistencies in EF impairments in ASD.

EF and ASD symptomatology

Finally, we investigated the association between EF and ASD symptoms. Given the significant correlation between all measured ASD symptoms and FSIQ, and between RRBIs (as measured by RBS-R) and age, correlations between EF and ASD characteristics were corrected for FSIQ and age. Overall, we observed that poorer EF performance was associated with more social problems and RRBIs. Performance on some EF measures was even more correlated with social problems than with RRBIs. Several other studies also demonstrated an association between EF and social impairments [67–70]. These findings contradict the view that EF would selectively relate to RRBIs [2].
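One common way to correct a rank correlation for covariates such as age and FSIQ is a partial Spearman correlation. The Python sketch below illustrates the idea; it is an assumption about how such a correction can be carried out, not a description of the exact procedure used in this study. It rank-transforms every variable (average ranks for ties) and then applies the standard recursive formula for partial correlations. All function names and inputs are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, covariates):
    """Correlation of x and y with the covariates partialled out (recursive)."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, covariates):
    """Partial Spearman correlation: partial correlation computed on ranks."""
    ranked = [average_ranks(c) for c in covariates]
    return partial_corr(average_ranks(x), average_ranks(y), ranked)
```

A call such as `partial_spearman(ef_scores, srs_scores, [age, fsiq])`, with four hypothetical lists of equal length, would then give the association between an EF measure and a symptom score with age and FSIQ partialled out.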

In general, the pattern of findings in our study was highly similar for both symptom domains, but depended on the specific measures used. Concerning the EF tasks, more ASD symptoms were mainly associated with reduced cognitive flexibility and generativity (but the latter only when measured with the open-ended Uses of Objects task). Additionally, we observed a trend towards a significant association between both symptom domains and response inhibition and between social ASD problems and planning. Concerning the relationship with RRBIs, our findings largely correspond with the view of Turner [28] that mainly reductions in cognitive flexibility, generativity and response inhibition are associated with elevated RRBIs. Note, however, that the correlation between cognitive flexibility and RRBIs was not significant when measured with the RBS-R, possibly because it was based on a smaller sample. Furthermore, when EF measures were based on parent report (measured with the BRIEF), stronger associations were found and both ASD symptom domains were highly significantly associated with increased impairments in all EF domains: inhibition, working memory, planning and particularly shifting. These stronger associations may be due to a common informant bias, since the EF and symptom measures were both based on parent report. This pattern of findings suggests that the mixed results in the literature concerning the association between EF and ASD symptoms might be due to differences in both EF and ASD symptom measures, as well as differences in sample size (for a review, see [71]).

Also note that the observation of a correlation between EF and ASD symptom severity does not imply a causal relationship. Executive dysfunction accounts of ASD postulated that impaired EF causes (some) ASD symptoms, as it mediates the relationship between brain abnormalities and behaviour [2, 4, 72]. However, more recent views offer alternative perspectives. Johnson [73], for example, suggested that ASD symptoms and EF problems may each have a different underlying cause and that EF impairments moderate the relationship between biological factors and ASD symptomatology. In his view individuals with strong EF skills are better able to compensate for atypicalities in brain systems early in life, and are therefore less likely to receive a (severe) diagnosis later in life. In other words, poor EF skills are considered an additional risk factor for developing ASD. Other authors suggested that impairments on EF tasks and ASD symptoms have a common underlying cause, creating a spurious correlation between them. According to White [14], they are both due to mentalizing and Theory of Mind difficulties, while Gomot and Wicker [61] point to a dysfunction in the ability to build flexible predictions (see also [62]). Further research is needed to refine, differentiate and test these views (for some suggestions, see [71]).

Conclusion and future perspectives

This study addressed the influence of task and sample characteristics on a wide range of EF abilities in individuals with ASD and matched TD controls. EF was measured with an extensive battery designed to reduce task impurity. This yielded new insights into the inconsistencies between studies examining EF abilities of individuals with ASD. These inconsistencies largely seem to result from differences in task characteristics (with more pronounced deficits in open-ended than in highly structured assessment situations) and less from differences in the investigated sample features. However, the strong influence of age and IQ on EF indicates that group differences in these factors are potential confounds that should be controlled for when studying EF. Additionally, although EF impairments were associated with more severe social and non-social ASD symptoms, further research is needed to clarify the nature of this relationship.

This study offered a more advanced insight into EF in ASD, but several issues still await further exploration. Here, we list a few prominent ones. First, our study only included 8- to 18-year-old children and adolescents with a normal IQ; hence, it remains to be shown whether our findings can be generalized to individuals outside this age and IQ range. In particular, more research is needed to investigate EF at pre-school age and in the elderly (for studies in these age groups, see [70, 74–82]). However, studying these age groups entails additional difficulties and requires different, age-appropriate EF measures [83, 84]. Second, age effects in our study were examined cross-sectionally. Further insight into the maturational trajectories and possible developmental delays of individuals with ASD should be obtained through longitudinal designs. Third, despite our efforts to increase the validity of our measures, further research is needed to elucidate the underlying constructs of each measure, e.g. by performing factor analyses. However, a reliable factor analysis on the various EF measures would require larger samples. Moreover, it is difficult to determine the validity of the measures without uniform definitions of the underlying constructs [5, 6]. Fourth, the effect of task open-endedness should be investigated more thoroughly by systematically varying the degree of open-endedness within the same task. Finally, ASD refers to a very heterogeneous group of disorders, suggesting that one cognitive profile does not apply to the entire ASD population and that many factors other than the ones addressed here (e.g. psychiatric co-occurrence) could influence the cognitive profile of individuals with ASD. Therefore, trying to delineate more homogeneous subgroups, each with a different EF profile, and investigating which factors determine subgroup membership seems to be a germane future approach [85].