Introduction

Autism is a neurodevelopmental disorder characterised by severe difficulties in social interaction and in communication, a restricted repertoire of interests and activities, and repetitive, stereotyped behaviours (American Psychiatric Association 2000).

An extensive literature has focused on diminished social functioning, leading to the hypothesis that specific impairments in theory of mind (ToM) and social cognition constitute the core features of autism spectrum disorders (ASD) (e.g. Baron-Cohen 1995; Baron-Cohen et al. 1985, 1986; Frith 1989; Happé and Frith 1996). The understanding of others’ behaviour depends mostly on our ability to infer motives, beliefs, desires, and intentions from the observed ongoing actions. Previous evidence in individuals with ASD has shown that while ToM, i.e. the ability to attribute beliefs, might be severely impaired, no major deficits have been reported in the understanding of others’ desires and intentions (Baron-Cohen et al. 1986; Carpenter et al. 2001). Desires and intentions are simple mental states and emerge earlier than beliefs in typical ontogenesis. A primitive grasp of intentions and an elementary understanding of the link between desires, action, and goals are exhibited by 2-year-old infants, while a more sophisticated ToM, based on belief attribution, is attained between about 3 and 5 years of age (Wellman 1990; Wellman and Woolley 1990). Carpenter et al. (2001) reported that young children with autism were able to understand unfulfilled intentions and the goal state of an intended action, suggesting that although they might have a slightly less complex understanding of others’ intentions, disturbances in this domain are not as marked as deficits in ToM and joint attention. Similarly, Aldridge et al. (2000) found that 2- to 4-year-old children with autism could imitate purposive actions on objects, even when the intended action was not completed. However, the evidence is somewhat controversial. Visual recognition of human biological movements (Blake et al. 2003) and imitation of actions and gestures of other people (Dewey et al. 2007; Smith and Bryson 1994, 1998; Williams et al. 2001) appeared to be disrupted in children with autism.
More recently, D’Entremont and Yazbek (2007) found that, after observing an actor performing intentional and accidental actions on the same objects, typically developing children imitated more intentional actions, whereas children with autism tended to imitate intentional and accidental actions equally often, and to reproduce the action sequence in the same order as the experimenter, without reference to the agent’s intentions. Taken together, these findings strongly suggest that although individuals with autism are able to reproduce movements faithfully, difficulties in imitation might increase when they are not explicitly instructed to imitate or when they have to appreciate or identify others’ mental states, the model’s goals, or the social-communicative signals associated with intention reading (D’Entremont and Yazbek 2007; Tomasello et al. 2005).

However, as already noted (D’Entremont and Yazbek 2007; Huang et al. 2002), the major difficulty with the unfulfilled intention paradigm is that children may be able to succeed in performing (unseen) intended actions without any understanding of the actor’s intentions. It is possible that the attribution of intentions is based on other forms of non-imitative social learning (Heyes 1994; Tomasello et al. 2005), such as emulation learning, object movement re-enactment or stimulus enhancement, in which actions are reproduced through attention to movements or object affordances.

It has been suggested that impairments in action understanding and imitation in individuals with ASD can be explained by an abnormal mirror neuron system (MNS) (Williams et al. 2001; Oberman and Ramachandran 2007; Iacoboni and Dapretto 2006). Studies on monkeys have shown that mirror neurons are cells in premotor area F5c of the inferior frontal cortex and of the rostral inferior parietal cortex that fire when a monkey executes goal-directed actions (grasping, holding and manipulating objects) and also when it simply observes the same actions performed by others (Gallese et al. 1996; Kohler et al. 2002; Rizzolatti et al. 1996). In humans, recent imaging studies have shown that Broca’s area in the inferior frontal gyrus, previously known for speech processing, is crucially active during action observation (Hamzei et al. 2003; Binkofski and Buccino 2004; Petrides 2005). The MNS also includes the adjacent premotor areas (Brodmann’s area 6), the inferior parietal lobe, as well as regions within the middle frontal gyrus. According to the MN theory, during action observation, the automatic activation of the same neural mechanism triggered by action execution is at the basis of a direct form of non-conceptual action understanding (Gallese et al. 1996; Rizzolatti et al. 1996). This is achieved by means of an embodied simulation device that employs a forward mechanism to anticipate the sensory consequences of ongoing actions, at a processing level that does not entail the use of any explicit cognitive elaboration or declarative representation. Difficulties in understanding others’ actions in persons with ASD would be due to failure in translating observed meaningful movements (grasping, reaching, etc.) into the personal motor vocabulary, thus preventing the observer’s MNS from resonating with the observed action. For this reason, the MNS is thought to be involved when the observed action is consistent with or closely related to the observer’s behavioural repertoire (e.g. Buccino, Binkofski and Riggio 2004; Calvo-Merino et al. 2005; Cross et al. 2006), while when a new action is observed, action and intention understanding require additional neurocognitive mechanisms which lie outside of the MNS (Brass et al. 2007). Alternatively, inferential theories argue that, although motor representations might have some role in action perception, action comprehension involves interpretative inferential mechanisms that analyse visual characteristics of an event (Gergely and Csibra 2003).

More recently, the concept of mirror system has been enlarged by a series of studies showing that, in addition to a mechanism based on mirror neurons, there is a more complex mechanism based on ‘action constrained’ mirror neurons that code not only what the observer sees, but also the chain of motor actions that the observer would likely predict once she/he has encoded the agent’s intention (Fogassi et al. 2005). Using electromyographic (EMG) recordings, Cattaneo et al. (2007) have investigated the ability of children with ASD and children with typical development in the execution of their actions and in the observation of the same actions performed by others (grasping with the right hand a piece of food placed in front of the subject, bringing it into the mouth and eating it, or grasping a piece of paper and putting it into a container). During the reaching for and grasping to eat sequence, unlike children with typical development, no activation of the mouth muscles was found in children with autism, and a delayed activation only appeared during the last phase of the action sequence, corresponding to bringing the food to the mouth. More interestingly, unlike the comparison group, no EMG activity of this muscle was found during the observation of the action of bringing the food to the mouth, suggesting that children with autism were unable to anticipate their own action and to predict others’ behaviour. According to the authors, difficulties with intention understanding in autism are the result of defective activation of chains of action-constrained neurons. Many of these neurons have mirror properties, i.e. they selectively discharge when an initial motor act is part of a given action chain, and are implicated in the selection of impending motor acts during execution as well as in understanding another’s intention during action observation.

Interestingly, the concept of ‘chains of action-constrained neurons’ may be akin to the notions of ‘script’ and ‘action schema’ (Schank and Abelson 1977; Read 1987). Script knowledge defines the set of relations between actors and actions, the goal hierarchies, and the temporal order of events, i.e. how an event is related to other events and how, taken together, these events fit into a coherent structured goal-directed scenario. These structural features characterise all complex event knowledge, such as action or narrative knowledge. Scripts are schematically represented in long-term memory in particular spatio-temporal and situational contexts, and hierarchically organized with goals and subgoals. They provide knowledge about behavioural and social rules, adapted to specific contexts and characterized by a gradient of familiarity (Grafman 1989). The strength of the script representation can vary, from highly rehearsed routines with a low activation threshold, to less familiar plans, or sequences that sometimes must be assembled anew. Script components are organized within a particular spatial–temporal order and hierarchically structured as goals and instrumental actions or sub-goals. These elements are thought to be related to one another much like items of a semantic network, with various degrees of association strength. How often an action is performed within a given script, in a particular context, and how instrumental it is in bringing about a desired goal, jointly determine the prototypicality of that action.

In a previous study, Loveland and Tunali (1991) assessed whether children and adolescents with high-functioning autism were able to employ an accepted social script in a conversational situation. The authors concluded that they might have some scriptal competence for the situation, but this knowledge is not spontaneously used, or sufficient to formulate socially appropriate behaviour. Loveland and Tunali (1991) identified several factors that might explain difficulties in social script processing in people with autism, including the affective content (e.g. a distressing personal experience), the conversational context and the ability to shift between two conversation contents.

Using a picture arrangement task, in a previous study (Zalla et al. 2006), we showed that children with autism had difficulties in constructing script scenarios representing sequences of goal-directed actions, while they exhibited a preserved ability to arrange mechanical–physical events. Interestingly, despite these difficulties, children with autism were able to identify the last picture representing the sequence goal. Several considerations, in particular, the greater number of sequence errors occurring predominantly midway in the action sequences, supported the hypothesis that their impairment affects the ability to represent the internal structure of the action knowledge, e.g. the causal and the hierarchical relationship between the individual events and the overall goal.

The current study aims to investigate the ability of a group of children and adolescents with ASD to predict the outcome of a sequence of goal-directed actions. Unlike our previous study, in which participants had to reconstruct the global action sequence (Zalla et al. 2006), here, the task was to infer on-line the most likely outcome in the context of a dynamic scenario, which was free of affective content and interpersonal situations.

Participants were presented with short videotaped scenarios showing an actor executing two types of activities: (1) familiar actions exemplifying the conventional use of common objects, and (2) non-familiar actions performed on known objects, that is actions that were rarely or never executed. Participants watched incomplete movies, as the sequences were stopped before attainment of the goal. Four pictures, each depicting a possible outcome, were then simultaneously presented and the task was to identify the event that would suitably complete the action sequence. The suitable action always conformed to the object’s conventional use. On the basis of previous evidence, we expected participants with ASD to be impaired in selecting the appropriate outcome. We also expected difficulties with event prediction to increase for less familiar actions, as compared to familiar actions, i.e. those actions belonging to the observer’s motor repertoire and stored as script knowledge.

Materials and methods

Participants

The demographic and clinical information about the three groups is displayed in detail in Table 1.

Table 1 Demographic and clinical characteristics of the groups (years, months)

Eighteen children and adolescents (17 males and 1 female) with autism spectrum disorders (ASD), 13 children and adolescents (8 males and 5 females) with moderate mental retardation or learning disabilities, and 19 children (12 males and 7 females) with typical development participated in this study. Children and adolescents with ASD were recruited at the Institut de Traitement des Troubles de l’Affectivité et de la Cognition (ITTAC, Vinatier, Villeurbanne), a specialised clinical service for autism spectrum disorders.

The diagnosis based on DSM-IV criteria was made by a qualified paediatrician or paediatric neurologist using different sources of information including an extensive standardised psychological evaluation, clinical observation, parents’ interview about the child’s social, emotional, and behavioural functioning, review of autistic symptoms and developmental history, prior evaluations, and pre-school and school records. Interviews with parents using the ADI-R (Autism Diagnostic Interview, Lord et al. 1994; French translation: Plumet et al. 1994) confirmed the diagnoses. The elevated scores indicated problematic behaviour in the three following areas: reciprocal social interaction, communication and stereotyped behaviours. The cut-off points for the three classes of behaviour are reciprocal social interaction 10, communication 8, and stereotyped behaviours 3, respectively. All participants scored above the cut-off points. To rate the severity of autism symptoms, the Childhood Autism Rating Scale (CARS; Schopler et al. 1988, French translation: Pry and Aussilloux 2000) was completed for each participant. Intellectual functioning was assessed using the verbal and performance scores of one of the Wechsler Intelligence Scales (Wechsler Preschool and Primary Scale of Intelligence, Revised: Wechsler 1989; Wechsler Intelligence Scales for Children, third or fourth edition: Wechsler 1991, 2003). One participant received the K-ABC (Kaufman and Kaufman 1983) and two participants received the PEP-R (Schopler et al. 1990); only the global score was retained for them (Table 1).

Participants with moderate mental retardation or learning disabilities (MR) were recruited from the Ecole Louis Armand, a specialised school in Villeurbanne. All completed the Wechsler Intelligence Scales for Children, fourth edition (Wechsler 1991). Participants with typical development (TD), with no history of neurological or psychiatric disturbances, or learning difficulties, were recruited at a primary school in Villeurbanne. They were matched with the two clinical groups for global mental age, and we assumed that their mental age corresponded to their chronological age.

Participants with ASD were matched for mental age with both the group with MR (t(29) = 0.82; P = 0.42) and the group with TD (t(35) = −0.41; P = 0.43). They were matched with participants with MR for chronological age (t(29) = 1.13; P = 0.26) and for IQ level (full-scale: t(29) = −0.16, P = 0.87; verbal: t(26) = −1.1, P = 0.3), whereas they scored significantly higher in Performance IQ (t(26) = 2.66, P = 0.013).

Because of the greater number of male participants with ASD, they did not match for gender with the two comparison groups. Given the unequal sex distributions, we performed a statistical analysis for gender effect. No significant gender differences were found on test performances.

To evaluate ToM abilities, the three groups received two false belief tasks: the Sally and Ann task (Baron-Cohen et al. 1985) and the Smarties task (Perner et al. 1989). Three children with ASD did not understand the tests and no response was obtained. All the other participants passed control (memory and reality) questions. Six of the 18 children with ASD passed both false belief tests, while three children passed only one of the two tests and six failed both tests. One child with MR failed both false belief tasks, three passed only one and nine passed both tasks. All participants with TD passed the Sally and Ann and the Smarties tasks. Ethical permission for the study was granted by the local medical ethics committee and all participants’ parents gave informed, signed consent to participate in this study.
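As a bookkeeping check, the false belief outcomes reported above can be tallied per group. The following is a minimal Python sketch, not part of the original analysis: the dictionary layout, key names and function names are assumptions introduced for illustration, and only the counts are taken from the text.

```python
# Counts are copied from the text; the structure and key names are illustrative.
FALSE_BELIEF_OUTCOMES = {
    "ASD": {"no_response": 3, "passed_both": 6, "passed_one": 3, "failed_both": 6},
    "MR": {"no_response": 0, "passed_both": 9, "passed_one": 3, "failed_both": 1},
    "TD": {"no_response": 0, "passed_both": 19, "passed_one": 0, "failed_both": 0},
}

# Group sizes as reported in the Participants section.
GROUP_SIZES = {"ASD": 18, "MR": 13, "TD": 19}


def counts_are_consistent():
    """Check that the outcome counts of each group add up to the group size."""
    return all(
        sum(FALSE_BELIEF_OUTCOMES[group].values()) == GROUP_SIZES[group]
        for group in GROUP_SIZES
    )


def pass_rate(group):
    """Proportion of the whole group that passed both false belief tasks."""
    return FALSE_BELIEF_OUTCOMES[group]["passed_both"] / GROUP_SIZES[group]


if __name__ == "__main__":
    print(counts_are_consistent())
    for group in GROUP_SIZES:
        print(group, round(pass_rate(group), 2))
```

Under these counts, one third of the ASD group passed both tasks, against nine of 13 participants with MR and all participants with TD.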

Procedure

Participants were presented with 20 soundless videotaped scenarios on a colour screen showing an actor performing a sequence of meaningful actions. In accordance with the general concept of a script (Schank and Abelson 1977), the videos represented typical events and familiar activities following a natural temporal order. All the movies were filmed with the same actor, outside and inside an apartment, so as to make them ecologically closer to real situations. A pre-test was conducted to select 20 familiar action sequences performed on common objects from a preliminary pool of approximately 36 action sequences. All the selected objects and their functions were common everyday life tools; the criterion of familiarity was based on the frequency of object use and the degree of familiarity for each object was established through parents and caretakers’ interview. Objects never or rarely used were considered as non-familiar whereas objects frequently used were considered as familiar.

Action sequences were classified into two categories: (a) ten sequences of familiar actions (eating a sandwich, writing a letter, packing a suitcase, cutting bread, reading a book, getting dressed, posting a letter, watching television, having a meal, preparing a birthday party) and (b) ten sequences of non-familiar actions (cooking a meal, shaving, starting a car, lighting a candle, preparing a toilet bag, putting a nail on the wall, dressing a cake, hanging a painting, ironing, leaving for winter vacation). The selected scenarios were comparable in terms of duration; each sequence lasted approximately 40–42 s.

Participants were tested individually in a quiet room at the ITTAC. Before testing, a practice session was conducted to familiarise them with the task and the touch screen. Instructions were given orally by the experimenter. Participants watched a series of incomplete videotaped vignettes; the videotapes were stopped immediately preceding the completion of the action and the participant was asked to predict the next step. For example, a videotape scenario showed a man writing a letter, then writing an address on the envelope and leaving home. As soon as he left home, the tape was stopped and after a brief interval (3 s) of dark screen, participants were asked the question “What happens next?” and were required to point to one of the four pictures appearing on the screen which would correspond to the final sequence action goal (see “Appendix”). The situation was designed to elicit a response indicating that ‘posting the letter’ was the next appropriate step. The four images displayed on the computer screen, each depicting a possible outcome, were simultaneously presented and participants were requested to answer the question by choosing the picture that would appropriately complete the videotaped vignette, by touching the computer screen. Participants were also free to provide verbal responses, although this was not required. The four pictures belonged to four distinct categories: (a) an appropriate, most likely outcome involving the object’s conventional use (e.g. cutting with a knife, writing with a pencil, eating with a fork); (b) an unusual, less likely, outcome; (c) a temporally preceding action event, and (d) an incongruous outcome (Fig. 1). The degree of appropriateness and the probability of each proposed outcome, as well as the degree of familiarity of the actions, were established in a pre-test conducted on a group of 30 individuals who were unaware of the purpose of the study. The purpose of this task was twofold.
First, it allowed investigation of whether individuals with ASD possessed a basic level of action knowledge, enabling them to predict core components of common scripts. Second, it allowed assessment of whether they were able to generalize a script to non-familiar actions performed on known objects. The order of presentation of the scenarios and the four pictures was randomised. To control for any perceptive or attentional problem that might have affected their understanding of the movie contents, children were also asked to describe the scenes and pictures depicting the goal and helped by the experimenter in case of difficulties. Completion of the task took about thirty minutes, including a short break. The videotaped vignettes were presented on a computer screen, using Presentation experimental software and experimental data were recorded by the software.
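The trial structure described above (20 scenarios, four outcome pictures per scenario, randomised order of scenarios and pictures) can be summarised in a short sketch. The study itself was run with the Presentation software; the Python code below is only an illustration under stated assumptions: the `Trial` class, the `build_session` function, the category labels and the `seed` parameter are hypothetical names, and the scenario lists are those given in the Procedure.

```python
import random
from dataclasses import dataclass

# Scenario names as listed in the Procedure.
FAMILIAR = [
    "eating a sandwich", "writing a letter", "packing a suitcase",
    "cutting bread", "reading a book", "getting dressed", "posting a letter",
    "watching television", "having a meal", "preparing a birthday party",
]
NON_FAMILIAR = [
    "cooking a meal", "shaving", "starting a car", "lighting a candle",
    "preparing a toilet bag", "putting a nail on the wall", "dressing a cake",
    "hanging a painting", "ironing", "leaving for winter vacation",
]

# Hypothetical labels for the four picture categories (a) to (d).
OUTCOME_CATEGORIES = ("appropriate", "less_likely", "preceding", "incongruous")


@dataclass
class Trial:
    scenario: str
    familiar: bool
    picture_order: tuple  # picture categories in on-screen order


def build_session(familiar, non_familiar, seed=None):
    """Return one randomised session: shuffled scenarios, shuffled pictures."""
    rng = random.Random(seed)
    scenarios = [(name, True) for name in familiar]
    scenarios += [(name, False) for name in non_familiar]
    rng.shuffle(scenarios)  # randomise the order of the scenarios
    trials = []
    for name, is_familiar in scenarios:
        order = list(OUTCOME_CATEGORIES)
        rng.shuffle(order)  # randomise the position of the four pictures
        trials.append(Trial(name, is_familiar, tuple(order)))
    return trials


if __name__ == "__main__":
    for trial in build_session(FAMILIAR, NON_FAMILIAR, seed=1)[:3]:
        print(trial)
```

Passing a fixed `seed` makes a given session reproducible; whether the original randomisation was per participant or per session is not stated in the text.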

Fig. 1

Examples of pictures representing four possible outcomes (a an appropriate, likely event; b a less likely event; c a temporally preceding event, and d an incongruous event) for the familiar action sequence ‘Eating a sandwich’ (upper panel) and for the non-familiar action sequence ‘Cooking a meal’ (lower panel)

Statistical analyses

Statistical analyses included a two-way ANOVA, a repeated-measures ANOVA with the two factors of group (ASD, MR, TD) and condition (familiar and non-familiar sequences), unpaired t tests, and Pearson correlation analysis. Fisher’s exact test was used for post hoc comparisons. The significance threshold for all analyses was set at P < 0.05.

Results

Mean number of correct responses

We calculated the number of goal pictures correctly identified by participants in each group on the two conditions (familiar and non-familiar action sequences). All videotaped vignettes taken together, participants with ASD correctly identified a smaller number of goal pictures (mean = 14 ± 3.9), in comparison to both MR (mean = 18.2 ± 1.6) and TD (mean = 19 ± 1.1) groups. Repeated-measures ANOVA analysis yielded significant main effects of group (F(2,47) = 18.15, P < 0.0001) and sequence type (F(1,47) = 8.47, P = 0.005), as well as a significant interaction (F(2,47) = 3.31, P = 0.045). Participants with ASD performed at a lower level than both participants with MR (mean diff. = −2.08; P < 0.0001) and participants with TD (mean diff. = −2.47; P < 0.0001), while the two comparison groups did not differ. The sequence type effect was due to the greater number of correct responses produced in the familiar action condition, as compared to the non-familiar action condition (mean diff. = −0.4; P = 0.01). The significant interaction was due to the fact that both participants with MR (mean diff. = 1; P = 0.018) and those with TD (mean diff. = 0.47; P = 0.04) performed better in the familiar action condition compared to the non-familiar condition, while no effect of familiarity was found in participants with ASD (mean diff. = −0.059; P = 0.93) (Fig. 2). Interestingly, while failing to select the correct picture corresponding to the appropriate outcome, many participants with ASD were able to describe the general script theme. They also managed to infer the correct response by mimicking the corresponding sequence or verbally provided the correct response using contextual or semantic object knowledge. When verbal responses were considered, the group effect was no longer significant (F(2,47) = 2.33, P = 0.1). The Fisher’s exact test revealed that they did not differ from the MR group (mean diff. = −0.42; P = 1.6) and that their performance remarkably improved in comparison to the TD group in both the familiar (mean diff. = −0.58; P = 0.04) and non-familiar (mean diff. = −0.6; P = 0.09) conditions.

Fig. 2

Mean number of correct responses in predicting familiar and non-familiar action sequences in each group. Whiskers report standard deviations. Asterisks indicate significant comparisons (*P < 0.05)

Reaction times (RTs)

Repeated-measures ANOVA on RTs revealed that the main effect of group was not significant (F(2,47) = 1.26, P = 0.29), nor was the Group × Sequence Type interaction effect (F(2,47) = 2.67, P = 0.079). The significant effect of condition (F(1,47) = 4.76, P = 0.034) was due to participants being faster in identifying the pictures for the familiar action sequences than for the non-familiar action sequences (mean diff. = −0.91; P = 0.047).

The ASD group’s RTs were 10 (±4) and 9.6 (±4.5) s for familiar and non-familiar action sequences, respectively, whereas the group with TD took 7.3 (±2.3) and 8.9 (±3) s and the group with MR 7.51 (±3.2) and 9.4 (±5) s for familiar and non-familiar action conditions, respectively. It should be noted that although the Group × Condition effect was not significant, only the two comparison groups were faster on the familiar action condition, as compared to the non-familiar action condition.

Types of error

We further explored the distribution of errors in selecting the goal picture to define the type of error for the three groups. Participants were presented with four pictures belonging to four categories and they had to choose the one that better depicted the following event of a given action sequence. Their responses were classified in accordance with the four proposed pictures and error types as (a) error of probability, (b) error of temporal inversion, and (c) error of incongruity.
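The classification of responses into error types described above can be sketched as a simple mapping from the chosen picture category to an error label. This is a minimal illustration, not the authors’ scoring procedure: the category labels, the `ERROR_TYPE` mapping and the `tally_errors` function are hypothetical names, and the example responses are invented for demonstration only.

```python
# Hypothetical labels: picture category chosen -> error type it is scored as.
ERROR_TYPE = {
    "less_likely": "probability",       # (a) error of probability
    "preceding": "temporal_inversion",  # (b) error of temporal inversion
    "incongruous": "incongruity",       # (c) error of incongruity
}


def tally_errors(choices):
    """Count each error type; 'appropriate' choices are correct, not errors."""
    counts = {error: 0 for error in ERROR_TYPE.values()}
    for choice in choices:
        if choice == "appropriate":
            continue
        counts[ERROR_TYPE[choice]] += 1
    return counts


if __name__ == "__main__":
    # Invented responses for one participant, for demonstration only.
    responses = ["appropriate", "preceding", "preceding", "less_likely"]
    print(tally_errors(responses))
```

Per-participant counts of this kind are what a Group × Type of Error analysis would take as input.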

A two-way ANOVA with one between-subject factor (Group) and one within-subject factor (Type of Error) was conducted, collapsing across sequence conditions (familiar and non-familiar actions). We observed a highly significant main effect of group (F(2,47) = 28.1, P < 0.0001), as well as a significant main effect of type of error (F(2,4) = 5.52, P = 0.005) and a Group × Type of Error interaction effect (F(2,4) = 6.08, P = 0.0002). The effect of group was due to participants with ASD performing worse than the groups with TD (mean diff. = 1.72; P < 0.0001) and with MR (mean diff. = 1.42; P < 0.0001), whereas the effect of type of error was due to a smaller number of errors of incongruity, as compared to the number of temporal inversion (mean diff. = 0.94; P = 0.0003) and probability (mean diff. = 0.54, P = 0.03) errors. The significant interaction was due to the ASD group committing a greater number of temporal inversion errors, as compared to the groups with TD (mean diff. = 2.97; P < 0.0001) and with MR (mean diff. = 3.15; P < 0.0001), whereas they did not differ from participants with MR in the number of errors of probability and incongruity committed. The group with ASD committed a greater number of errors of probability (mean diff. = 1.24; P = 0.0018) and incongruity (mean diff. = 0.95; P = 0.0005), as compared to the group with TD (Fig. 3).

Fig. 3

Mean number of errors committed by each group as a function of the type of error. Whiskers report standard deviations. Asterisks indicate significant comparisons (*P < 0.001; **P < 0.0001)

Effects of mental age and severity of autistic symptoms on task performance

To assess whether the performance of participants with ASD on this task was independent of verbal mental age (VMA), and non-verbal mental age (NVMA), we computed a correlation analysis, using the Pearson Product Moment test, between scores on these measures and the total number of correct responses produced in action prediction. While no significant correlation was observed between VMA scores and the total number of correct responses, NVMA significantly correlated with the total number of correct responses (r = 0.57, z = 2.26; P = 0.024) (Fig. 4). We also computed the correlation between severity of autistic symptoms, as indexed by both CARS and ADI scores, and the total number of correct responses for participants with ASD. These correlations were not significant.
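The relation between the reported z statistic and its P value can be checked with the standard normal distribution, and the z statistic itself is what the Fisher transformation of r would yield. The sketch below is an illustration rather than the authors’ procedure: the function names are hypothetical, and the sample size passed to `fisher_z` is an assumption, since the number of participants entering the correlation is not stated in the text.

```python
import math


def fisher_z(r, n):
    """z statistic for a Pearson r via the Fisher transformation (n > 3)."""
    return math.atanh(r) * math.sqrt(n - 3)


def two_sided_p(z):
    """Two-sided P value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Reported values: r = 0.57, z = 2.26, P = 0.024.
    print(round(two_sided_p(2.26), 3))
    # n = 15 is an assumed sample size, not a figure given in the text.
    print(round(fisher_z(0.57, 15), 2))
```

The reported P = 0.024 agrees with a two-sided test of z = 2.26; with the rounded r = 0.57, an assumed n of 15 gives a z close to, though not exactly, the reported value.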

Fig. 4

Scatterplot showing significant relationship between scores in non-verbal mental age and the total number of correct responses produced by the group with ASD

Discussion

The aim of the present study was to investigate the ability of a group of children and adolescents with ASD to predict others’ behaviour. Their performance was compared to that of two groups: participants with moderate mental retardation and participants with typical development. All participants were presented with short videotaped scenarios showing an actor performing familiar and non-familiar actions. Objects were all known to participants and the category of familiarity was defined on the basis of individual frequency of object use. Participants watched incomplete scenarios, which were stopped before completion of the action sequence. Once the sequence was stopped, four pictures, each representing a different outcome, were presented simultaneously and participants were requested to select the one depicting the most likely continuation of the preceding action sequence.

Two main findings emerged from the present results. First, when asked to predict the most likely outcome, participants with ASD produced fewer correct responses, as compared to both comparison groups and, unexpectedly, their performance did not improve when action prediction involved actions belonging to their personal motor repertoire. Conversely, both comparison groups made fewer errors when action prediction involved frequently performed actions. Second, when we looked more closely at the type of error, children and adolescents with ASD predominantly chose the picture depicting a temporally preceding sequence event, whilst this type of error was significantly less frequent in the other two groups. The group with MR committed a slightly greater number of errors of probability and errors of incongruity, whereas the latter types were almost absent in participants with TD.

Notably, participants with ASD were able to understand the action goal and verbally reported the ensuing events using alternative cognitive strategies, although they remained unable to identify the corresponding event pictures. As indicated by correlation analyses, performance on action prediction was not related to VMA, but rather to NVMA, suggesting that VMA, in itself, could not explain task improvement and that language ability could not compensate for their impairment in processing action information through the visual modality. Hence, it is likely that goal understanding could have been achieved by eliciting object affordances, as suggested by the fact that they sometimes mimicked the motor acts directed towards the object, or alternatively by retrieving knowledge of object functions from semantic memory.

Recently, behavioural and imaging research has provided substantial support for a MNS dysfunction explanation of ASD during action observation and motor control (Blake et al. 2003; Bernier et al. 2007; Dapretto et al. 2005, 2006; Hadjikhani et al. 2006; Theoret et al. 2005; Oberman et al. 2005). In particular, two recent studies (Cattaneo et al. 2007; Fabbri-Destro et al. 2009) have shown that children with autism are unable to organize their motor actions as a chain of motor acts and concluded that a deficit in the chained organization of the motor system is chiefly responsible for the autistic impairment in understanding others’ actions. The MNS hypothesis argues that observers use their own motor representations to comprehend the meaning of the actions, thus predicting that action comprehension should be limited to actions that are within the observer’s motor repertoire.

Unlike motor theories, inferential theories of action ascribe to the MNS the functional role of action understanding by means of a predictive simulation device, which enables one to infer the most likely and optimal motor sequence for a given goal (Csibra 2007; Jacob and Jeannerod 2005). According to this view, the MNS is involved in monitoring and anticipating one’s own action as well as in predicting another individual’s ensuing actions, once the goal is known. As also argued by Wood et al. (2008), contrary to the common view that mirror neurons compute the goal of an action, evidence from monkeys and humans fits better with the notion that inferential processes, rather than a direct-matching mechanism, allow comprehension of the meaning and goal-directedness of actions, with mirror neuron activation reflecting this process.

In the present study, the fact that participants with ASD failed to figure out the correct response for both familiar and non-familiar actions suggests that their impairment extends beyond the motor simulation device and the intuitive sensory-motor understanding of others’ actions. Indeed, rather than pointing to a specific impairment of the MNS and to its causal role in goal understanding, the present results show that difficulties in action prediction using visual information in children with autism are due to an impaired higher-level inferential mechanism of action analysis. Similarly, in a previous study using a picture arrangement task (Zalla et al. 2006), we reported poor performance in script processing in a group of children with autism. Although they were able to identify the semantically related events in the condition in which four distinct script sequences were simultaneously presented, they encountered severe difficulties in reconstructing the correct sequential order of events, even when they were able to correctly identify the overall goal. Interestingly, both studies converge on the notion of a diminished ability to represent the integrative features of script knowledge, that is, the causal and hierarchical structure of the component action units. At the conceptual level, scripts provide information about the action sequence structure by defining how an event is related to other events of the script and how all the events are organised within a coherent goal-directed scenario (Read 1987). By encoding the action schema and the instrumental steps necessary to achieve a given goal, the script knowledge structure translates overall intentions into specific sensorimotor programs. At the lowest level, failure to select the optimal instrumental action sequence leading to the goal would lead to an inadequate activation of the visuomotor simulation mechanism, which is crucially involved in both action execution and action observation.
In accordance with this interpretation, more recently, E. Somogyi et al. (The development of intention. a comparative study, unpublished) have shown that children with autism do not differentiate between rational and inefficient actions during an imitation task, suggesting an impairment in selecting instrumental actions for imitation, given a known goal.

It has been suggested that a defective computational mechanism responsible for means-end analysis, which allows selecting the appropriate instrumental actions and setting priorities among the component actions in relation to the stated goal, would be responsible for the disorganisation of behaviour following frontal lobe damage (Duncan 1986). Substantial evidence supports the view that understanding ‘why’ a sequence of actions was performed (e.g. ‘because the agent wanted to have dinner’) and ‘how’ to optimally achieve an intended goal requires inferential processes depending on the activity of the prefrontal lobe (Duncan 1986; Fuster 2002; Tanji and Hoshi 2008; Petrides 2005; Ostlund, Winterbauer and Balleine 2009). Consistently, previous lesion studies (Sirigu et al. 1996; Zalla et al. 2001, 2003) have shown that damage to the dorsolateral regions of the prefrontal cortex, including Brodmann’s areas 45–46 and more anterior areas in the medial frontal gyrus, generates selective impairments affecting the ability to optimally plan a sequence of actions and to process the structural features of script knowledge for both routine and novel actions (Sirigu et al. 1995, 1998; Zalla et al. 2001, 2003). These regions, which are part of the neural circuit implementing a hierarchical and sequential organization of nested elements into meaningful structures (Petrides 2005), are adjacent to, and functionally interconnected with, Broca’s area, the homologue of F5, which has been proposed to be part of the MNS in humans (Buccino et al. 2004; Fazio et al. 2009).

Previous studies have suggested that basic knowledge of everyday elementary scripts is preserved in children with high-functioning autism and relatively spared language skills, but that they have difficulties in spontaneously using their generalized event knowledge to guide social and interactive behaviour (Lewis and Boucher 1988; Loveland and Tunali 1991). It is worth noting that, in the present study, although both comparison groups performed quite well in predicting another’s action, script knowledge in participants with ASD was not completely disrupted: with all conditions taken together, they were able, on average, to identify the correct event for more than half of the scenarios. Hence, although participants with ASD were less competent than their matched comparison groups, the present findings should not be interpreted to mean that they lack script knowledge altogether. By choosing one of the previously occurring events belonging to the observed sequence, participants with ASD exhibited spared script semantic knowledge and were able to infer others’ goals and intentions by means of compensatory cognitive processes. To explain their relatively preserved performance, we speculate that different cognitive strategies, presumably based on object affordance or semantic knowledge, might be employed by participants with ASD to compensate for an inadequate predictive inferential mechanism. This could explain a diminished competence in using script knowledge, especially in everyday life situations, despite relatively spared performance in experimental contexts assessing conceptual script knowledge.

Deficits in perceiving complex visual scenes might also be explained by the “weak central coherence” theory (Frith 1989; Happé 1999; Loth et al. 2008), in terms of an increased focus on details and difficulties with integrating information into a coherent whole. An insufficient ability to group details in a complex context would lead to a fragmented perceptual scene and difficulties in deriving meaning from this information. Action understanding, and social functioning in general, which requires fast on-line integration of context-dependent information, would be seriously hampered under such conditions. However, the “weak central coherence” theory cannot account for the type of error committed by participants with ASD in predicting the final step of an action sequence, i.e. the fact that they predominantly selected a temporally preceding action step while they rarely chose the incongruous and the less probable outcomes.

Conclusion

In conclusion, the present study confirms previous evidence showing atypical action representation in people with autism (Cattaneo et al. 2007; Fabbri-Destro et al. 2009; Loveland and Tunali 1991; Zalla et al. 2006) and provides additional evidence for the hypothesis that a diminished sensitivity to the action sequence structure, due to an impaired means-end analysis process, would disrupt the ability to understand and predict others’ actions. The hypothesis of an impaired means-end analysis process might explain difficulties in both predicting others’ behaviour and planning one’s own action, and is consistent with a more general executive dysfunction theory of autism (Hermelin and O’Connor 1970; Hill 2004; Hughes 1996). At the cognitive representational level, a diminished understanding of the action structure might lead to adherence to inflexible routines during planning and to an inability to mentally disengage from the actual situation, that is, to inhibit reference to one’s current perceptual representation of reality while framing the likely impending event during action observation. Abnormalities in event knowledge such as those found in this study may contribute to a better understanding of some of the problems that individuals with ASD have in spontaneously understanding real-life social situations, and of some aspects of executive deficits and behavioural inflexibility. These results may have important clinical implications and should be taken into consideration when designing intervention programs.