Introduction

Our social environment is inherently dynamic, and interacting successfully in it requires anticipating how it might change in the (immediate) future. One important source of information with which to anticipate others’ behaviour is gaze direction, which is indicative of perceptual and attentional states, goals and intentions (Baron-Cohen 1995). For example, when a person performs an action, he/she almost inevitably looks toward the goal of the action (Hollands et al. 2002; Land et al. 1999; Wilkie et al. 2008). Therefore, an observer can use the actor’s gaze direction to infer the goal of the action, which may help to anticipate how the action will continue (Flanagan and Johansson 2003; Rotman et al. 2006).

The anticipation of how an action will proceed in the immediate future has been studied using representational momentum paradigms (Freyd and Finke 1984; Graf et al. 2007; Jarraya et al. 2005; Thornton and Hayes 2004; Wilson et al. 2010). After observing a short action episode, participants are required to compare the remembered final position of that action with a test stimulus that is either before the final position, or extrapolated beyond the final position. Participants are more likely to remember the final position as being closer to the extrapolated position, suggesting they overestimated how far the action had progressed. This phenomenon has been attributed to the formation of a representation of the object’s most likely location in the immediate future, which causes an observer to remember the final position of the action as being further along the observed trajectory than it actually was.

However, by varying the gaze direction of the actor, action sequences may also be subject to a backward memory displacement (Hudson et al. 2009; Hudson and Jellema 2011). In these studies participants observed a head rotate towards them, while the gaze direction was either leading, or lagging behind, head rotation. Participants overestimated the end-point of the rotation when gaze direction was leading the head rotation (i.e. looking to where the head would rotate to in the immediate future) and underestimated the end-point when gaze direction was lagging behind head rotation (i.e. looking back in the direction from where the head started its rotation). In an additional condition, participants were asked to estimate how far a non-animate cylindrical object of comparable size and shape had rotated. This control object rotated in an identical way and possessed two features that mimicked the gaze direction of the rotating head, both in terms of the directional meaning it conveyed and in terms of the low-level visual appearance of the black pupil shifting within the white sclera. However, estimations of the rotation of the non-animate object were not affected by the ‘gaze’ manipulations. Thus action anticipation, as evidenced by a distorted memory for the actor’s final position, is determined not only by a visual analysis of the kinematics of the actor’s movements (the head rotation itself), but also by attributions made regarding the behavioural intention of the agent as conveyed by their gaze direction. This effect was called ‘social cue related anticipation of movement’, or for short, ‘social anticipation’.

Social Cue Processing in Autism Spectrum Disorder

Autism spectrum disorder (ASD) is a pervasive developmental condition, characterised by abnormal social development and impaired communicative abilities, and associated with rigid repetitive behaviours, obsessive interests, and lack of emotion and imaginative play (Rutter 1978; WHO 1992; DSM-IV 2004). A lack of spontaneous and involuntary interpretation of others’ social cues in terms of goals, intentions and states of mind has been argued to be characteristic of ASD (Baron-Cohen et al. 1995; Jellema et al. 2009; Senju et al. 2009), as well as of other neurodevelopmental disorders such as schizophrenia (van ‘t Wout et al. 2009). However, when individuals with ASD are explicitly instructed to extract meaning, they may perform as well as typically developed (TD) controls (e.g. Happé 1997). Although an impaired ability to utilise action kinematics and object knowledge to predict the most likely end-point of an action sequence has been demonstrated in children and adolescents with low functioning autism (Zalla et al. 2010), it is as yet unknown to what extent problems in the involuntary processing of social cues in ASD may give rise to problems in anticipating others’ actions.

Gaze perception in ASD is a complex issue. When explicitly required to follow gaze direction, individuals with ASD are able to discriminate gaze direction as precisely as TD individuals and infer to which object another person is looking (Leekam et al. 1997). Furthermore, not only can they infer what another person can see (1st order perspective taking) but also can represent what the object looks like from the other person’s perspective (2nd order perspective taking) (Tan and Harris 1991).

There is some evidence that individuals with ASD show intact reflexive orienting to gaze direction (Chawarska et al. 2003; Kemner et al. 2006; Kylliainen and Hietanen 2004; Senju et al. 2004; Vlamings et al. 2005), although this ability seems developmentally delayed until the child has reached a verbal mental age of around 48 months (Leekam et al. 1998; Leekam et al. 2000). Nevertheless, there is consensus that the capacity for joint attention, which builds on the ability to follow gaze, is impaired in ASD (Baron-Cohen 1995; Charman et al. 2000).

Social Versus Non-Social Cues

One view as to the origins of the impairments in social perception, and particularly in gaze processing, is that individuals with ASD process gaze direction as a non-social stimulus, without interpreting the actor’s gaze direction in terms of underlying goals and intentions, while still being capable of computing and following another’s direction of gaze (e.g. Nation and Penny 2008). Thus, the strategy employed is atypical and based on perceiving gaze direction as a low-level directional cue, rather than as an intentional cue. Alternative conceptualizations for this dichotomy have been referred to as the mechanistic processing mode versus the mentalistic processing mode (Driver et al. 1999; Jellema and Perrett 2002, 2007), or as the feature correspondence versus social reading hypothesis (Ristic et al. 2005).

Several lines of evidence support the notion that individuals with ASD do not discriminate as much between social and non-social cues as TD individuals do. For example, while TD individuals are quicker to orient their spatial attention in response to non-animate directional cues (e.g. arrows) than to the averted gaze of another person, individuals with ASD are just as quick for both stimulus types (Chawarska et al. 2003; Vlamings et al. 2005). Furthermore, for TD individuals the gaze cueing effect differs depending on the visual hemifield to which the gaze cue is directed, whereas no such asymmetry is evident for the cueing effect in response to arrows (Frischen and Tipper 2006; Vlamings et al. 2005). In contrast, ASD individuals show no visual hemifield differences in the cueing effect for either stimulus type, suggesting that for them the gaze cue does not have a special meaning over and above that of the arrow cue (Vlamings et al. 2005). Further, individuals with ASD exhibit a similar cueing effect for both gaze cues and arrow cues despite being explicitly informed that the cues are counter-predictive of target location. This is in contrast to TD individuals, who show a reduced cueing effect to counter-predictive arrow cues but not in response to counter-predictive gaze cues. This suggests that TD individuals process gaze cues automatically, in contrast to those with ASD, who fail to discriminate between the social and non-social cue types and process them in a comparable manner (Senju et al. 2004).

The Current Study

The aim of the current study was to establish whether individuals with ASD are able to implicitly use gaze direction to infer the goal of another’s action and use this to anticipate the future motion trajectory. Furthermore, by comparing their performance in response to an animate stimulus and an equivalent non-animate control stimulus, we aimed to explore underlying mechanisms. That is, if individuals with ASD process social cues in a mechanistic, non-social, way, then the gaze manipulation (or its non-social equivalent) would have an effect both on estimations of how far the head had rotated and on estimations of how far the non-animate stimulus had rotated. On the other hand, if individuals with ASD, like TD individuals, process social cues in a mentalistic way, then the ‘gaze’ manipulation would have an effect on estimations of head rotations but not on estimations of non-animate object rotations.

Methods

Participants

Autism Spectrum Disorder (ASD) Group

Twenty-four students with high-functioning ASD were recruited through disability services at universities in the Northeast of England (UK). These individuals had previously been diagnosed with Asperger’s Syndrome, based on DSM-IV criteria (American Psychiatric Association 1994). Diagnostic evaluations consisted of psychiatric observations and review of prior records, which included assessments on the CARS (Childhood Autism Rating Scale) or GARS (Gilliam Autism Rating Scale). At the time of testing, the ADOS (Autism Diagnostic Observation Schedule, module 4; Lord et al. 1999) was administered by Hollie G. Burnett. Three participants did not meet the ADOS criteria for ASD (total cut off <7) and were excluded, and one was excluded on the basis of error rates (see below). This left 20 participants in the ASD sample (5 females, 15 males), with a mean ADOS score of 8.0 (SD = 0.9), and a mean age of 22.6 years (SD = 6.5 years) (see Table 1).

Table 1 Participant characteristics

Directly prior to the experiment, participants completed an online version of the AQ (Baron-Cohen et al. 2001; Hoekstra et al. 2008; Wheelwright et al. 2006). The AQ is a fifty-statement, self-administered questionnaire designed to measure the degree to which an adult with normal intelligence possesses traits associated with ASD. It covers social skills, attention switching, attention to detail, communication and imagination. Their mean AQ score was 30.2 (SD = 8.3). Participants also completed the WAIS-III (Wechsler Adult Intelligence Scale; Wechsler 1997). Their mean IQ score was 117.2 (SD = 9.8) (see Table 1 for subscores).

Typically-Developed (TD) Group

The TD group consisted of 24 undergraduate Psychology students. Four participants were excluded because of high error rates (see below), leaving 20 in the TD group (9 females, 11 males), with a mean age of 22.6 years (SD = 7.6 years), a mean AQ score of 18.1 (SD = 6.2) and a mean WAIS-III score of 113.7 (SD = 8.4) (Table 1). The TD and ASD groups did not differ in terms of age (t(38) = .022, p = .982), gender composition (χ²(1) = 1.76, p = .185) or IQ (t(38) = 1.22, p = .232), but AQ scores were significantly higher in the ASD group than in the TD group (t(38) = 3.35, p = .002). All TD and ASD participants had normal or corrected-to-normal vision and provided written informed consent prior to the experiment. Participants received course credit or a fee for taking part. The University of Hull Ethics committee approved the study.
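The gender-composition comparison can be recomputed from the group counts given above (5 females and 15 males in the ASD group; 9 females and 11 males in the TD group). The sketch below assumes an uncorrected Pearson chi-square test on the 2 × 2 table; the text does not state whether a continuity correction was applied, and the function name is illustrative only.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: ASD, TD; columns: females, males (counts taken from the text).
chi2, p = pearson_chi2_2x2(5, 15, 9, 11)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

Under this assumption the recomputed statistic agrees with the reported value to rounding.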

Stimuli

Stimuli were created using Poser 6 (Curious Labs, Inc., Santa Cruz, CA, and e frontier, Inc., Scotts Valley, CA, USA) and presented using E-Prime software (Psychology Software Tools, Inc., Sharpsburg, PA, USA) on a 21-inch monitor (100 Hz refresh rate). Participants observed a rotating stimulus followed by a test stimulus.

Rotating Stimulus

The stimulus was depicted rotating 60° towards the observer, starting from a full profile view (90° from front view) and ending at an angle 30° from front view. Smooth continuous motion was induced by presenting 16 frames for 40 ms each at 4° interpolations. On each trial, the stimulus was either animate or non-animate. The animate stimulus was a human head (either male or female). The gaze direction was either leading the head rotation by 30°, or lagging behind the head rotation by 30° (Fig. 1a, b). The subtended height of the stimulus was 7.0° for the female, and 6.5° for the male stimulus. As the face rotated, the subtended width of the stimuli varied from 5.1° to 4.0° for the female, and 5.7° to 5.1° for the male. The non-animate stimulus was a cylinder of the same size, colour and texture as the animate stimulus. Down the vertical midline that marked the ‘front’ of the object were two cubes half submerged into the surface (Fig. 1c). Half of the cube surface was white, half was black. The configuration of the black and white areas varied between trials so as to mimic the appearance of the positions of the dark pupil and white sclera of the different gaze directions for the animate stimulus. In the equivalent of the gaze-leading condition, the black half was on the side corresponding to the direction of rotation. In the equivalent of the gaze-lagging condition it was on the opposite side. The subtended height of the non-animate stimulus was 6.3°, and the subtended width varied from 4.3° to 3.3° as it rotated.
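The frame timing described above can be laid out explicitly. The following is a minimal sketch, assuming that the first frame shows the 90° profile view and the last frame the 30° end-point, so that 15 steps of 4° span the 60° rotation; the function name and its parameters are illustrative and not taken from the original stimulus scripts.

```python
def rotation_frames(start_deg=90, end_deg=30, step_deg=4, frame_ms=40):
    """Return (onset_ms, head_angle_deg) for each frame of the rotation."""
    n_frames = (start_deg - end_deg) // step_deg + 1
    return [(i * frame_ms, start_deg - i * step_deg) for i in range(n_frames)]


frames = rotation_frames()
# Each frame is shown for 40 ms, so the sequence lasts n_frames * 40 ms.
total_ms = len(frames) * 40
print(len(frames), frames[0], frames[-1], total_ms)
```

With these assumptions the schedule contains 16 frames and lasts 640 ms, matching the presentation duration of the rotating stimulus given in the procedure.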

Fig. 1

Stimuli. a Trial sequence depicting the animate stimulus in the gaze-leading condition with a symmetrical test stimulus, in which the choices are 10° before (−) and 10° after (+) the final angle of the rotating stimulus. b The end-point of the rotating stimulus of the animate stimulus in the gaze-lagging condition. c The end-points of the rotating stimulus for the non-animate stimulus are depicted in the equivalent conditions. The two test stimuli are shown at 10° before (−) and 10° after (+) the final angle of the rotating stimulus

Test Stimulus

The test stimulus consisted of two static images of the same stimulus side by side, each at a different angle of orientation. The participant’s task was to select the stimulus that was at an angle most similar to the final angle of the rotating stimulus. One was oriented before (−) the final angle of the rotating stimulus (i.e. at an orientation observed in the rotating stimulus) and the other was oriented after (+) the final angle (i.e. extrapolated beyond the final angle along the observed trajectory). The gaze direction of the test faces was aligned with head orientation (i.e. gazing straight ahead) so that the test faces used in the gaze-leading and gaze-lagging conditions were identical. This meant that if a difference was found between the two gaze conditions, then it had to be due to the immediate perceptual history, and not to the test faces themselves.

In all trials, one choice was oriented 10° from the final angle (before or after). The deviation of the remaining choice from the final angle varied along three levels.

1. Symmetrical trials: The deviation of the remaining choice was also 10°, so that both choices differed by an equal amount from the final angle of the rotating stimulus (−10°/+10°). Thus, participants were forced to choose between two equally wrong responses as neither choice was more similar to the correct final angle than the other. A bias for choosing the ‘after’ choice as more similar to the final angle of the rotating stimulus would reflect an overestimation of the amount of head rotation, a bias for choosing the ‘before’ choice an underestimation.

2. Asymmetrical trials: The deviation of the remaining choice from the final angle was increased to 20°. As participants were required to judge which choice was at an angle most similar, or closest, to the final angle of the rotating stimulus, the 10° choice was the ‘correct’ answer as it was closer to the final angle of the rotating stimulus than the 20° choice. The aim of the asymmetrical trials was to investigate if gaze direction could induce an incorrect answer despite the presence of a correct answer.

3. Catch trials: The remaining choice was 40° from the final angle. The correct answer was obvious enough for these trials to be used as catch trials.
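The three levels of the test stimulus can be summarised as a small classification rule. This is an illustrative sketch only: the function and constant names are hypothetical, and the deviations are expressed as absolute angular distances (in degrees) of the ‘before’ and ‘after’ choices from the final angle.

```python
# Deviation of the remaining choice (deg) -> trial type, as described in the text.
TRIAL_TYPES = {10: "symmetrical", 20: "asymmetrical", 40: "catch"}


def classify_test_pair(before_dev, after_dev):
    """Return (trial_type, correct_choice) for a pair of test-stimulus deviations.

    correct_choice is None when both choices are equally far from the final angle.
    """
    if min(before_dev, after_dev) != 10:
        raise ValueError("one choice must deviate by 10 degrees")
    other = max(before_dev, after_dev)
    if other not in TRIAL_TYPES:
        raise ValueError("remaining choice must deviate by 10, 20 or 40 degrees")
    if before_dev == after_dev:
        correct = None  # symmetrical trial: no correct answer exists
    else:
        correct = "before" if before_dev < after_dev else "after"
    return TRIAL_TYPES[other], correct


print(classify_test_pair(10, 10))
print(classify_test_pair(10, 20))
print(classify_test_pair(40, 10))
```

Only the symmetrical pair lacks a correct answer, which is why a systematic preference there can be read as a response bias rather than as accuracy.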

Design and Procedure

Each trial began with a fixation cross at the centre of the screen (1,000 ms) followed by the rotating stimulus (640 ms), after which the test stimulus was presented until a response was made (Fig. 1a). Participants completed 84 trials. The symmetrical condition (−10°/+10°) was the main focus of the study as it was most sensitive to a possible response bias induced by gaze direction, due to the two choices being equally different from the final angle of the rotating stimulus. Participants completed 48 of these trials. The asymmetrical conditions (−10°/+20° and −20°/+10°) were less sensitive to a response bias and participants completed 24 of these trials. The remaining 12 trials were catch trials. The direction of rotation (left or right), and the position of the before and after choices in the test stimulus (left or right), were counterbalanced across trials. For the animate stimulus, the identity (male or female) was counterbalanced across trials. The correct answer (before or after) present in the asymmetrical and catch trials was counterbalanced across trials. Instructions were given verbally and in writing (on screen). Participants were instructed that on each trial they would see an object rotate towards them, and that they had to remember the angle at which it stopped. This would be immediately followed by two static objects side by side, each at a different angle. Their task was to choose which of the two objects was at an angle most similar to the final angle of the rotating object. They chose either the stimulus on the left or right side of the screen by pressing the ‘f’ and ‘k’ keys respectively (labelled accordingly). Participants were instructed to prioritise accuracy over speed, but to respond within 3 s. No mention of the gaze manipulations was made. It was also not mentioned that in the test stimulus one of the choices was ‘before’ and the other ‘after’ the actual final angle.

Results

The mean catch trial error rate was 10.7% (SD = 9.8%). The error rates of the ASD group (M = 9.7%, SD = 10.2%) and TD group (M = 11.7%, SD = 9.7%) did not differ from each other (t(43) = .747, p = .459). Five participants made more than 25% errors and were excluded from the analysis (one participant in the ASD group, four in the TD group). Of the remaining participants the mean RT was 1,670.5 ms (SD = 416.4 ms). Trials were excluded if response times were less than 250 ms or more than 2 SD above each participant’s mean RT, leading to 4.95% of trials being excluded.
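The two exclusion rules can be sketched as follows. Several details are assumptions rather than facts from the text: that each participant completed 12 catch trials (taken from the design description), that the sample standard deviation (n − 1) was used, and that each participant’s mean and SD were computed over all of that participant’s trials before trimming. The function names and the example response times are hypothetical.

```python
from statistics import mean, stdev


def exclude_participant(catch_errors, n_catch=12, max_error_rate=0.25):
    """True if the catch-trial error rate exceeds the 25% criterion."""
    return catch_errors / n_catch > max_error_rate


def keep_trials(rts_ms, min_rt=250.0, n_sd=2.0):
    """Keep RTs that are at least min_rt and at most mean + n_sd * SD."""
    upper = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if min_rt <= rt <= upper]


# Hypothetical response times (ms) for one participant.
example_rts = [200, 1500, 1600, 1700, 1800, 1600, 1700, 6000]
print(exclude_participant(3), exclude_participant(4))
print(keep_trials(example_rts))
```

In this made-up example the 200 ms anticipation and the 6,000 ms outlier are dropped, while three errors out of twelve catch trials (exactly 25%) would not lead to exclusion.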

The mean proportions of ‘after’ responses were entered into a three-way ANOVA with Stimulus type (non-animate vs. animate) and Gaze direction (leading vs. lagging) as within-subjects factors and Group (ASD vs. TD) as a between-subjects factor. There were more trials in the symmetrical (−10°/+10°) condition than in the asymmetrical (−10°/+20°, −20°/+10°) condition, reflecting the higher sensitivity for a possible response bias induced by gaze direction in the former condition (in the symmetrical condition the two choices were equally incorrect, while in the asymmetrical conditions a correct response was present). We therefore analysed the two conditions separately. Because participant exclusion based on error rates created near-ceiling performance in the catch trials, these trials were not included in the analysis.
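For concreteness, the dependent measure can be computed per participant as the proportion of ‘after’ responses in each Stimulus type × Gaze direction cell. The sketch below is illustrative only; the data layout, the function name and the example trials are hypothetical and do not come from the study.

```python
from collections import defaultdict


def proportion_after(trials):
    """Proportion of 'after' responses per (stimulus, gaze) cell.

    trials: iterable of (stimulus, gaze, response) tuples,
    where response is either 'before' or 'after'.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_after, n_total]
    for stimulus, gaze, response in trials:
        counts[(stimulus, gaze)][0] += int(response == "after")
        counts[(stimulus, gaze)][1] += 1
    return {cell: n_after / n_total for cell, (n_after, n_total) in counts.items()}


# Hypothetical responses from a single participant.
example_trials = [
    ("animate", "leading", "after"),
    ("animate", "leading", "after"),
    ("animate", "lagging", "before"),
    ("animate", "lagging", "after"),
    ("non-animate", "leading", "after"),
    ("non-animate", "lagging", "after"),
]
cell_means = proportion_after(example_trials)
print(cell_means)
```

The four cell values per participant are what would then enter the mixed ANOVA, with Group as the between-subjects factor.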

Symmetrical (−10°/+10°) Trials

There was a significant main effect of Stimulus type, with the Non-animate stimulus eliciting more ‘after’ responses than the Animate stimulus (F(1, 38) = 25.1, p < .001, \( \eta_{p}^{2} = .398 \)). There was also a significant main effect of Gaze direction (F(1, 38) = 26.3, p < .001, \( \eta_{p}^{2} = .409 \)), with more ‘after’ responses in the gaze-leading condition than in the gaze-lagging condition. The interaction between Stimulus type and Gaze direction was significant (F(1, 38) = 4.53, p = .04, \( \eta_{p}^{2} = .106 \)), and crucially, the three-way interaction between Gaze direction, Stimulus type and Group was significant (F(1, 38) = 4.62, p = .038, \( \eta_{p}^{2} = .108 \)). There were no further significant main effects or interactions. To investigate the three-way interaction further, two-way ANOVAs with Gaze direction and Stimulus type as within-subjects factors were conducted separately for each group.
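The effect sizes reported alongside each F value can be cross-checked with the standard identity relating partial eta squared to F and its degrees of freedom, η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the four effects reported above; the function name is illustrative, and small deviations are expected because the F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (F, reported partial eta squared) for the symmetrical trials, all with df = (1, 38).
reported = {
    "Stimulus type": (25.1, 0.398),
    "Gaze direction": (26.3, 0.409),
    "Stimulus type x Gaze direction": (4.53, 0.106),
    "Gaze direction x Stimulus type x Group": (4.62, 0.108),
}
recomputed = {
    name: partial_eta_squared(f_value, 1, 38)
    for name, (f_value, _) in reported.items()
}
for name, (f_value, eta) in reported.items():
    print(f"{name}: F = {f_value}, reported = {eta}, recomputed = {recomputed[name]:.3f}")
```

The recomputed values agree with the reported ones to within rounding.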

For the TD group, there was a significant main effect of Stimulus type, with the Non-animate stimulus eliciting more ‘after’ responses than the Animate stimulus (F(1, 19) = 11.1, p = .003, \( \eta_{p}^{2} = .369 \)), and a significant main effect of Gaze direction (F(1, 19) = 14.6, p = .001, \( \eta_{p}^{2} = .432 \)), with gaze-leading eliciting more ‘after’ responses than gaze-lagging, and a significant interaction between Stimulus type and Gaze direction (F(1, 19) = 10.7, p = .004, \( \eta_{p}^{2} = .36 \)). The Gaze-leading condition elicited significantly more ‘after’ responses than the Gaze-lagging condition for the Animate stimulus (F(1, 19) = 27.3, p < .001, \( \eta_{p}^{2} = .59 \)) but not for the non-animate stimulus (F(1, 19) = .907, p = .353, \( \eta_{p}^{2} = .046 \)).

For the ASD group there was a significant main effect of Stimulus type, with the Non-animate stimulus eliciting more ‘after’ responses than the Animate stimulus (F(1, 19) = 14.1, p = .001, \( \eta_{p}^{2} = .426 \)). There was a significant main effect of Gaze direction as the gaze-leading condition elicited significantly more ‘after’ responses than the gaze-lagging condition (F(1, 19) = 11.8, p = .003, \( \eta_{p}^{2} = .383 \)). However, there was no interaction between stimulus type and Gaze direction whatsoever (F(1, 19) = 0.0, p = .989, \( \eta_{p}^{2} = .000 \)). The gaze-leading condition elicited significantly more ‘after’ responses than the Gaze-lagging condition for both the Animate stimulus (F(1, 19) = 7.22, p = .015, \( \eta_{p}^{2} = .275 \)) and the Non-animate (F(1, 19) = 5.15, p = .035, \( \eta_{p}^{2} = .213 \)) stimuli.

As both the TD and ASD groups exhibited an effect of gaze direction for the animate stimulus it was important to directly compare the magnitude of this gaze effect (gaze-leading − gaze-lagging = gaze effect). The effect of gaze direction for the TD group (M = 25.5, SD = 21.7) was marginally larger than the gaze effect for the ASD group (M = 12.6, SD = 20.8, t(38) = −1.95, p = .058).
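The group comparison of the gaze effect can be approximated from the summary statistics alone. The sketch below assumes an independent-samples t test with pooled variance, 20 participants per group, and a difference taken as ASD minus TD (suggested by the negative sign of the reported statistic); none of these details is stated explicitly in the text, and the function name is illustrative.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Gaze effect (gaze-leading minus gaze-lagging), ASD group first, then TD group.
t_value, df = pooled_t(12.6, 20.8, 20, 25.5, 21.7, 20)
print(f"t({df}) = {t_value:.2f}")
```

The value recomputed from the rounded means and SDs is close to, but not identical with, the reported t(38) = −1.95; the small difference may reflect rounding of the summary statistics or details of the test that are not given in the text.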

Asymmetrical Trials

The same analysis was conducted for the asymmetrical trials (Fig. 2), showing again a significant main effect of Stimulus type (F(1, 38) = 18.9, p < .001, \( \eta_{p}^{2} = .332 \)) and a significant main effect of Gaze direction (F(1, 38) = 5.53, p = .024, \( \eta_{p}^{2} = .127 \)). None of the other main effects or interactions were significant (all ps > .05) but the interaction between Gaze direction and Stimulus type approached significance (F(1, 38) = 2.86, p = .099, \( \eta_{p}^{2} = .07 \)). The effect of Gaze Direction was significant for the Animate stimulus (F(1, 39) = 13.7, p = .001, \( \eta_{p}^{2} = .26 \)) but not for the Non-animate stimulus (F(1, 39) = .25, p = .62, \( \eta_{p}^{2} = .006\)).

Fig. 2

The effect of gaze direction on the estimations of stimulus rotation in response to the animate and non-animate stimuli for the typically-developed (TD) and autism spectrum disorder (ASD) groups. Results for the symmetrical trials (−10°/+10°; top panel) and asymmetrical trials (−20°/+10°, −10°/+20°; bottom panel) are shown. Error bars represent SEM (standard error of the mean)

Discussion

Social cues such as gaze direction and emotional expressions are informative of the goals and intentions underlying another person’s actions, and previous studies have shown that they can influence, in an involuntary manner, an observer’s anticipation of how the action is most likely to continue in the immediate future (Hudson et al. 2009; Hudson and Jellema 2011). The aim of the current study was to investigate if social cues have a similar influence on the ability of individuals with ASD to anticipate other people’s actions and, if so, whether the underlying mechanism diverges from that employed by TD controls.

The results showed a strong effect of gaze direction on action anticipation for the TD group. A head was estimated to have rotated further when gaze was leading the direction of rotation than when it was lagging behind rotation. Furthermore, this was a specifically social bias in that it was only observed for the animate stimulus. The features that replicated the relative positions of the dark pupil and white sclera did not influence estimations of how far the non-animate control stimulus had rotated. These results replicate the results of Hudson et al. (2009) in which TD individuals showed a similar effect of gaze direction on estimations of head rotation, and were also unaffected by equivalent visual manipulations when estimating the rotation of a non-animate stimulus. It testifies to the robustness of the effect, and corroborates the conclusion that in TD individuals the action anticipation in the animate condition was not caused by the low-level visual appearance of the gaze direction, but that the effect relied on gaze direction being interpreted in terms of the action intentions of the actor.

We speculate that in TD individuals the observed gaze direction activated representations of the agent’s attention and the direction of attentional focus. When these representations are integrated with the perception of the head rotating towards the observer, they endow the action with an intention, which is either to continue to approach (gaze-leading condition) or to discontinue, or slow down, the approach (gaze-lagging condition). These action intentions affect the observer’s anticipation about how the action is most likely to continue in the immediate future, resulting in biases in the observer’s memory for the action’s final position. As no intention was attributed to the non-animate stimulus, the TD individuals showed no ‘gaze’-induced biases in their estimations of how far the non-animate object had rotated.

It should, however, be noted that the non-animate stimulus, in contrast to the animate stimulus, was novel and unfamiliar to participants. This in itself may have contributed to the discrepant responses elicited by the two stimulus types, irrespective of whether they were social or non-social. Future research will have to explore this further.

The ASD group also showed a significant effect of gaze direction for the animate stimulus, which might suggest that they too are capable of involuntarily inferring how an action will continue based on social cues. However, unlike the TD group, their estimations of how far the non-animate stimulus had rotated were equally influenced by the non-animate ‘gaze’ manipulations. This opens up the possibility that the estimations of the ASD individuals in the animate condition were influenced by the low-level visual appearance of the gaze manipulations, rather than by the associated intentions conveyed by gaze direction. The reliance on the low-level visual information may have resulted in an inappropriate application of this information to the non-animate stimulus. However, on the basis of the current data, we cannot exclude the possibility that the ASD individuals, just like the TD group, relied on the intentional meaning of social cues in the animate condition, and only in the non-animate condition were affected by the low-level directional cues.

In the asymmetrical trials, which were less sensitive to possible response biases than the symmetrical trials, no group differences were found. In the animate condition, both TD and ASD groups showed an effect of gaze direction, similar to that in the symmetrical trials. This meant that the influence of gaze was strong enough for participants to perceive a test head oriented 20° from the final angle as more similar to the final angle than a test head oriented 10° from the final angle. In the non-animate condition, no effect of the equivalent-gaze manipulations was evident for either the TD or ASD group. For the TD group this mirrored the findings in the symmetrical trials. However, for the ASD group, it contrasted with the symmetrical trials. It suggests that the effect of gaze direction in the animate condition was stronger than the effect of the equivalent manipulations in the non-animate condition, and that the added ‘sensitivity’ of the symmetrical trials was necessary to bring out the latter effect. This difference is not due to the visual manipulations in the non-animate condition being less visually salient than the gaze manipulation of the animate stimulus; the black and white equivalent-eyes of the non-animate stimulus were bigger and more conspicuous than the eyes of the animate stimulus (see Fig. 1).

The present study suggests that individuals with a mild form of ASD are able to implicitly integrate low-level visual information about gaze direction and bodily actions, without implicitly referring to intentionality. The current results support the view that implicit Theory of Mind (ToM) is impaired in ASD (Ruffman et al. 2001; Senju et al. 2009). However, the current findings do not allow us to say anything about whether explicit ToM is intact, as the short durations of the stimuli prevent the employment of deliberate effortful reasoning processes (explicit ToM).

The current results cannot be accounted for by more general deficits in attention (Fine et al. 2008) or in visual perception (Behrmann et al. 2006; Kern et al. 2006). Such general deficits would have manifested themselves in an impaired processing of the gaze manipulations in the ASD group, whereas we found an enhanced processing of the non-animate ‘gaze’ cues compared to the control group. Similarly, a more selective attentional deficit in spontaneously attending to gaze direction could not explain the results, as the ASD group was significantly affected by the gaze manipulations.

An inability to discriminate between social and non-social stimuli resulting in an equivalent response to the two stimulus classes agrees with the proposal that social processing difficulties in ASD may originate in part from an impaired ability to comprehend intentional behaviour (Driver et al. 1999; Nation and Penny 2008; Ristic et al. 2005). These findings are in line with research into reflexive orienting of visual spatial attention in response to the averted gaze of another person, which, like the current task, involves the involuntary processing of gaze direction. In these studies, TD individuals show subtle differences in orienting in response to gaze direction compared to non-social directional cues such as arrows, while individuals with ASD tend to treat these cues equally. For example, when explicitly instructed that in the majority of trials (80%) the target will appear on the side opposite to that indicated by the gaze and arrow cues, children with autism still succumb to the automatic or exogenous effect of both the gaze and arrow cues (at short SOA of 100 ms; Senju et al. 2004). Thus, at an SOA of 100 ms they detected the target fastest when it appeared on the side indicated by the gaze and arrow cues. However, TD children showed this automatic facilitatory effect only for the gaze cue, not for the arrow cue. The consensus from these studies and the current study is that although superficially the social cue processing in individuals with ASD may appear intact, closer inspection reveals qualitative differences, possibly reflecting the use of an atypical strategy based on low-level cues.

Such compensatory mechanisms have also been proposed to underpin the perception of other socially relevant stimuli, such as emotional facial expressions. Those with ASD may exhibit comparable abilities in expression recognition, but may focus on local features rather than global configuration, and may revert to explicit cognitive or verbally mediated processes rather than implicit emotion processing with which to do so (see Harms et al. 2010 for a review).

It remains an open question how the individuals with ASD “compensated” for their lack of what one could call a ‘social module’ or ‘intentionality detector’ (e.g. Baron-Cohen 1995). Did the absence of such a module cause the individuals with ASD to simply revert to more general perceptual mechanisms (Johnson et al. 2005), or did they actively, over the years, develop compensatory mechanisms (built on these general perceptual mechanisms) to try to navigate their way in the social world? On either account, the ASD group would be utilising a mechanism that is not as specialised in the processing of social information as the mechanism of the TD group.

The visual analysis of eye gaze direction has been proposed to proceed initially via a sub-cortical route for the rapid and basic processing of gaze direction based on low spatial frequencies (e.g. Sander et al. 2007; Senju and Johnson 2009). This is followed by a slower and more accurate cortical pathway encompassing the animate motion processing areas of the superior temporal sulcus (STS) and the spatial processing areas of the intraparietal sulcus (Haxby et al. 2000). This cortical pathway carries out a more sophisticated analysis, which incorporates the context in which the stimulus occurs and contributes to the attribution of intentionality. Possibly the persistent influence of low-level features in the ASD group arose from a disturbed balance between these subcortical and cortical systems, leading to an over-reliance on the low-level aspects (e.g. Senju and Johnson 2009). Indeed, abnormal STS functioning has been observed in individuals with ASD and has been proposed to constitute a major source of their socio-cognitive deficiencies (Redcay 2008; Zilbovicius et al. 2006).

The participant sample in the current study had diagnoses of Asperger’s syndrome (AS) or high-functioning autism (HFA) with normal, or above normal, IQ scores. It remains to be investigated whether individuals with low-functioning ASD are also influenced by gaze manipulations for the animate or non-animate stimuli. Possibly, the sample in the current study were better able to develop non-mentalistic strategies to navigate the social world, such as using the low-level visual appearance of the eyes, than individuals with low-functioning ASD. If individuals with ASD, and especially those with high-functioning ASD or Asperger’s syndrome, indeed employ compensatory mechanisms and strategies to navigate the social world, then one would expect to find cases of over-attribution of intentionality and of problems in distinguishing voluntary and involuntary actions. There are indeed such cases. One example comes from imitation research. Whilst TD children will imitate intentional actions but not accidental actions, children with ASD will imitate both, thus treating accidental actions in the same manner as intentional ones (D’Entremont and Yazbek 2007). A further example comes from research on faux pas detection. In a faux pas, a person accidentally causes offence to another person. Individuals with Asperger’s syndrome are able to detect the cause of the offence (e.g. what was said), but tend to attribute this to malicious intent rather than to a mistake (Baron-Cohen et al. 1999; Zalla et al. 2009). However, this is most likely due to cognitive compensation using overlearned abstract knowledge of normative rules (Zalla et al. 2009), rather than to an over-reliance on low-level visual features as in the current study. The dichotomy between over- and under-attribution of intentionality (in high- versus low-functioning ASD, respectively) is worthy of further investigation and might go some way to resolving the sometimes contradictory findings in the literature. 
This distinction may also explain why the results are not entirely consistent with some previous research examining the ability of individuals with ASD to understand the actions of others from social cues (Jellema et al. 2009). In the Jellema et al. (2009) study, TD individuals underestimated the distance between two static agents depicted as running and looking toward each other as compared to when they were depicted as running toward each other but looking away from each other (i.e. the head was shown to be looking over the shoulder). This underestimation was not evident in individuals with ASD. However, the ASD sample comprised individuals diagnosed with low-functioning as well as high-functioning ASD, unlike the current study, which employed a sample of individuals with high-functioning ASD only. Furthermore, the methodology of Jellema et al. (2009) differed in that motion was not directly observed as in the current study, but had to be inferred from the form of the depicted agents (e.g. articulation of the limbs). In addition, in the Jellema et al. (2009) task the action and test stimuli were separated by a mask of 1 s duration, and the task therefore relied more on visual working memory than the current task.

In studies of motor contagion, in which action observation facilitates action execution, it has been shown that TD individuals involuntarily read motor intentions from the gaze direction of an actor (Castiello 2003; Pierno et al. 2006), similar to the findings of the current study using a non-motor perceptual task. However, in sharp contrast to the TD individuals, individuals with ASD failed to read motor intentions from gaze (Becchio et al. 2007; Pierno et al. 2006). This seems to contradict our finding that individuals with high-functioning ASD did take the actor’s gaze direction into account. There are at least two possible reasons for this discrepancy. First, the impairment in using gaze might only become apparent in those with low-functioning ASD (the ASD sample in the motor contagion studies was low-functioning and, with a mean age of 11.1 years, also considerably younger than the sample in the current study). Second, the discrepancy could be related to the way in which reading of motor intentions was measured. The current study looked for a bias in a perceptual judgment of the observed action, while the motor contagion studies used kinematic parameters of the action executed by the observer. Possibly, the requirement to translate perception into action prevented the information about the actor’s gaze from influencing the ASD individuals.

Conclusion

The problem individuals with ASD have in understanding the behaviour of others may be partially due to an inability to involuntarily extract the other’s behavioural intentions as conveyed by social cues such as gaze direction. This study demonstrated that individuals with high-functioning ASD are able to anticipate others’ actions from their gaze direction, but also suggested that they employ an atypical strategy. This strategy seems to be based on the visual appearance of the eyes without full comprehension of the goals and intentions conveyed by the actor’s gaze direction. In this way, they are able to mimic the social processing behaviour of their TD peers using an alternative mechanism. This poses interesting questions about the ability of individuals with mild forms of ASD to develop cognitive skills and behaviours to compensate for their impaired social processing abilities.