Introduction

Nonverbal communication governs a great part of our daily interactions with our kind (Mehrabian 2007). This spontaneous type of communication serves an important role in guiding our attention toward objects relevant to our social life. These shifts of attention can arise as a response to different types of cues, like direction of gaze or pointing gestures (Sodian and Thoermer 2004; Daum et al. 2013; Aldaqre et al. 2014; Paulus and Fikkert 2014). When presented in a shared communicative context, attention allocation in the direction of such cues can be fostered (Tomasello et al. 2007; Senju and Csibra 2008).

Impairments in these communication and social interaction skills are at the core of the diagnosis for autism spectrum disorder (ASD; American Psychiatric Association 2013). These impairments include how people diagnosed with ASD respond to social stimuli (Dawson et al. 2004; Senju et al. 2004; for a review, see Ames and Fletcher-Watson 2010). For example, while typically developing individuals prefer to attend to social aspects of a scene presented to them (such as the faces of two people who are conversing), people with ASD do not show such a preference; in contrast, they show increased attention to nonsocial elements of such a scene (like the shadows of the two people; Falck-Ytter and von Hofsten 2011). Additionally, when social cues (e.g., gaze direction) are presented in conflict with nonsocial cues (e.g., motion cues) by referring to two different objects, people with ASD attend to both objects to the same extent, unlike their typically developing peers, who show a clear preference for the object cued by the social cue (Aldaqre et al. 2014).

It has been argued that these atypical behaviors are rooted in deficits on a basic social–perceptual level, which in turn affects the development of social cognition (Tager-Flusberg 2001; Schultz 2005; Palmer et al. 2015). This argument is supported, for example, by the observation that individuals with ASD differ from neurotypical controls in the immediate perception of social stimuli such as faces (Schultz 2005). However, to date we lack comprehensive evidence for putative deficits on a social–perceptual level as previous research mainly focused on face processing. Further, results from a recent study thoroughly re-examining paradigms to test action perception suggest that the perceptual system to process other people’s actions is intact in ASD (Cusack et al. 2015). Thus, the hypothesis of social–perceptual deficits in ASD remains controversial and additional evidence is required to address this issue.

One important aspect of social perception is spontaneous attentional orienting in response to social cues. This mechanism, present early in development (Gredebäck and Daum 2015), can be assessed by employing the well-established spatial cueing paradigm, developed by Posner (1980). In this paradigm, participants are typically presented with a central cue (e.g., an arrow) directed to the left or right. The cue is then followed by a peripheral target object that appears either in the direction of the cue (congruent condition) or in the opposite direction (incongruent condition). Reaction times (RTs) upon target detection are measured. If participants employ the directionality of the central cue correctly, the typical pattern of results is slower RTs for incongruent relative to congruent trials (Posner 1980). This difference between RTs will be referred to as the cueing effect; the increase in RTs is an indication that participants shift their covert visual attention in the direction of the cue, which results in faster detection of targets that appear at a location that is congruent with this direction compared to a location that is opposite to the cue direction. In the latter case, participants need to reallocate their attention to the actual target location (Posner 1980; Hood et al. 1998; Driver et al. 1999).
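The cueing effect described above can be written down as a simple difference of condition means. The following Python sketch is only an illustration: the function name, the trial dictionary layout and the RT values are hypothetical and do not come from the study's data or analysis scripts.

```python
from statistics import mean


def cueing_effect(trials):
    """Return mean incongruent RT minus mean congruent RT (in ms).

    A positive value means slower responses on incongruent trials,
    which is the typical pattern in the spatial cueing paradigm.
    """
    congruent = [t["rt_ms"] for t in trials if t["congruent"]]
    incongruent = [t["rt_ms"] for t in trials if not t["congruent"]]
    if not congruent or not incongruent:
        raise ValueError("need at least one trial per congruency condition")
    return mean(incongruent) - mean(congruent)


# Made-up values, for illustration only.
example_trials = [
    {"congruent": True, "rt_ms": 210.0},
    {"congruent": True, "rt_ms": 220.0},
    {"congruent": False, "rt_ms": 240.0},
    {"congruent": False, "rt_ms": 250.0},
]
effect_ms = cueing_effect(example_trials)
```

With these invented values, the congruent mean is 215 ms and the incongruent mean is 245 ms, so the sketch yields a cueing effect of 30 ms.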

Little is known about how individuals with ASD shift their visual attention in response to social cues, and the available evidence in this regard is inconclusive. A study by Ristic et al. (2005) demonstrated that older adolescents and adults with ASD did not show the cueing effect when presented with a non-predictive gaze cue. On the other hand, many studies show that the ability to follow gaze cues is intact in ASD. For example, Chawarska et al. (2003) employed the spatial cueing paradigm to test whether 2-year-old children with and without ASD shift their visual attention in response to perceived eye movement. They presented their participants with a face whose gaze could be directed to the left or right, followed by a target on the congruent or incongruent location relative to where the face was looking. The gaze cue was not predictive, as the target could appear at the cued side with a 50 % probability. Saccadic reaction times (SRTs) were calculated as the latency between target onset and participants’ initiation of a saccade toward that target. Their main finding suggests that visual attention of children with ASD is modulated by perceived eye movements the same way as in their neurotypical peers (NT). These results were replicated by Senju et al. (2004, Experiment 1) and extended to nonsocial cues (an arrow). Interestingly, children with ASD showed a similar cueing effect for gaze and arrow cues, while NT children showed a persistent cueing effect for gaze cues compared to arrows (Senju et al. 2004, Experiment 2).

Comparing responses to social versus nonsocial stimuli is insufficient to characterize specific communication deficits in ASD, as this disregards the communicative value of social stimuli. This is particularly important in light of theoretical accounts on early cultural learning such as natural pedagogy, which suggests that human infants are particularly tuned to attend to and receive information from someone who shows an intention to communicate with them (Csibra and Gergely 2009). Indeed, empirical work suggests that infants show increased attention toward people who initiate an ostensive situation with them (Farroni et al. 2002) and prefer attending to objects cued by those people (Senju and Csibra 2008). Even on a more basic level, relating to the nature of the stimulus, Falck-Ytter et al. (2010) distinguished between the social and the communicative aspects of faces. This distinction gives rise to a different perspective in looking at the atypical behavior of people with ASD in social situations: we can compare how the communicative aspect of cues affects visual attention allocation of people with ASD, which can expand our understanding of how they respond to social stimuli.

To this end, we used the spatial cueing paradigm (Posner 1980) and manipulated the Cue Type to be a pointing gesture, a social cue with high communicative value, or a grasping gesture, a social cue with a low communicative value. We refer to both cues as being social because any human (and as such social) stimuli are presumably processed differently in ASD. Indeed, previous research revealed that atypical visual information processing in ASD is not restricted to eyes or faces of others, but is also observable when presenting bodies, arms and hands (e.g., Klin et al. 2002; Fletcher-Watson et al. 2009; Boria et al. 2009; Wang et al. 2015). The key difference between pointing and grasping exploited in this study is that pointing is typically used to communicate a location to somebody. In contrast, even though it also has clear directional information, grasping is usually used to perform an action that is relevant to oneself and not meant to be communicative. Yet, hand preshaping of a grasping gesture is sufficient to trigger information processing on the goal of the grasping action, for example whether the grasper prepared to reach for a small or big object (Ambrosini et al. 2011). In sum, while both cues are directional and convey information about action goals or intentions, only pointing is clearly communicative. Additionally, the advantage of using pointing and grasping cues instead of gaze cues, as was the case in several previous studies, is that it avoids the increased affective reaction of people with ASD in response to gaze cues, which might hinder their attention orienting ability to these cues (Frischen et al. 2007).

Pointing and grasping were shown to be effective cues in shifting visual attention in typically developing infants (Daum and Gredebäck 2011a; Daum et al. 2013; Paulus and Fikkert 2014). For example, Daum and Gredebäck (2011a) used the spatial cueing paradigm with 3-, 5- and 7-month-old infants. The cue in their experiment was a static picture of a grasping hand. They have shown that, starting at 5 months of age, infants shift their attention in response to grasping hands, but not to grasping mechanical claws. Using a similar paradigm, Daum et al. (2013) showed that 12-month-old infants shift their attention in response to pointing gestures (see Bertenthal et al. 2014, for a similar effect in even younger children). These effects were also shown to be present in typically developing adults where both pointing and grasping gestures resulted in a reliable and comparable cueing effect (Daum and Gredebäck 2011b).

We hypothesized that, if individuals with ASD have general difficulties in processing social cues, we should observe a reduced cueing effect for participants with ASD, as well as a difference in SRTs as compared to the NT group. Further, no differences in SRT between the pointing and the grasping gestures would be expected. If, however, it is the processing of the communicative value of social stimuli that is impaired in ASD, we should find a difference in SRTs between the pointing and grasping gesture for individuals with ASD, but not in the NT group (cf. Daum and Gredebäck 2011b). Accumulating evidence suggests that social cognition undergoes substantial developmental changes beyond childhood (Burnett et al. 2011). Thus, we recruited adolescents with and without ASD to test whether such ongoing development is also present in the processing of gestures.

Different cue–target stimulus-onset asynchronies (SOA) were employed, to assess the temporal dynamics of the cueing effect for the cues employed (cf. Driver et al. 1999). This might reveal differences in the efficiency of processing pointing or grasping gestures. Previous findings employing gaze cues showed a decrease in RTs as SOA increased (Driver et al. 1999; Senju et al. 2004; Langdon and Smith 2005). Additionally, the cueing effect for very short SOAs (100 ms) was found for nonsocial cues like arrows (Senju et al. 2004; Langdon and Smith 2005) and for pointing and grasping gestures (Daum and Gredebäck 2011a, b) but not for gaze cues (e.g., Driver et al. 1999, Experiment 2). Based on these findings, we expected that participants would show the cueing effect for both Cue Types employed in this study at all SOAs. However, if people with ASD do not process the cues employed in this study efficiently, we would expect them to show no cueing effect at the shortest SOA.

Due to the relatively high communicative value of pointing cues—in addition to their directional information (Daum et al. 2013)—it would be reasonable to assume that processing such cues is more demanding than cues which have a relatively low communicative value, like grasping. Hence, in addition to the saccadic responses, pupil dilation was used as a measure of mental effort during task performance (Beatty and Lucero-Wagoner 2000; Moresi et al. 2008). This made it possible to investigate any differences in processing load of these cues between ASD and NT groups. It is worth noting that the current task is an endogenous cueing task, in which cues are presented centrally and participants’ responses to such cues are characterized as being voluntary (Müller and Rabbitt 1989). Therefore, we reasoned that additionally measuring mental effort is suitable for such a task.

Pupil dilation was previously found to be a sensitive measure of Cue Congruency, showing larger pupils for incongruent relative to congruent cues (Moresi et al. 2008). This suggests higher mental effort in processing the incongruent cue–target relation, probably due to reallocating attention to the actual target location after the target had already appeared. Thus, the pupil dilation data were expected to support that of SRTs: participants in both groups were expected to show larger pupils for the incongruent compared to the congruent cue condition. However, if individuals with ASD have difficulties with processing the communicative value of social cues, resulting in a higher processing load for these cues in ASD, we should observe larger pupils for the pointing relative to the grasping cues.

Methods

Participants

The final sample included 18 adults with ASD (9 females; M age = 37.8 years, SD = 10.8 years) and 18 neurotypical (NT) adults (9 females; M age = 36.9 years, SD = 13.9 years). Eight additional adults were excluded from the analyses due to refusal to continue the session (1 ASD), technical problems with the experimental procedure (3 ASD and 3 NT) or later change in the diagnosis (1 ASD). Additionally, we tested 14 adolescents with ASD (4 females; M age = 16.4 years, SD = 1.9 years) and 14 NT adolescents (5 females; M age = 14.3 years, SD = 1.1 years). Fourteen additional adolescents were excluded from the analyses due to technical problems (4 ASD and 10 NT). Participants in this age group were particularly difficult to test because they tended to partially close their eyelids, resulting in a large amount of gaze data loss.

Adult participants in the NT group were recruited through advertisement at the local hiring office, while the adolescents in the NT group were recruited through local schools. Participants in the ASD group were recruited through local autism centers and clinics and were diagnosed by a qualified clinical psychologist or psychiatrist. Diagnoses had to meet the International Classification of Diseases 10th Revision (ICD-10, WHO 1993) criteria for Asperger syndrome (15 adults and 11 adolescents), or childhood autism (3 adults and 3 adolescents). Individual medical records were consulted to confirm the diagnoses of participants with ASD.

To support group assignment, all participants completed the validated German shortened version of the Autism Quotient (AQ-k, cutoff criterion: score ≥17; Baron-Cohen et al. 2001; German version by Freitag et al. 2007). Additionally, the caregivers of all adolescent participants completed the German version of the Social Responsiveness Scale (SRS; cutoff criterion: T-score ≥61; Constantino and Gruber 2005; German version by Bölte and Poustka 2008). Further, they filled out the questionnaire for social communication (SCQ) to assess communication skills and social function across the entire development (lifetime form, discriminative cutoff: sum score ≥10 for comparisons between ASD and NT) and in the last 3 months (current form; Rutter et al. 2003; German version by Bölte and Poustka 2006). Some participants in the ASD sample scored below the cutoff point on the different control measures (2 adults and 4 adolescents on the AQ-k; 2 adolescents on the SRS and SCQ). As the main results remained unchanged when these participants were excluded, they were kept in the final sample.

Other measures for the adult participants included the Culture-Fair Test 20-R for nonverbal intelligence (CFT 20-R; Weiss 2006) and a German vocabulary test (Mehrfachwahl-Wortschatz-Intelligenztest (MWT-B); Lehrl 2005) for verbal intelligence. Adolescent participants until the age of 16 years completed the German Hamburg-Wechsler Intelligence Scale for Children—4th Edition (HAWIK-IV, Petermann and Petermann 2008) and those from 16 years and above completed the German version of the Wechsler Intelligence Scale for adults (WIE, von Aster et al. 2006). All participants completed four subtests, namely the Similarities and the Vocabulary tests for verbal comprehension, and the Block Design and Matrix Reasoning tests for perceptual reasoning. Demographic data of the participants are presented in Table 1. All participants were matched on verbal and nonverbal intelligence, and they had normal or corrected-to-normal vision. Additionally, adult participants were matched for age. Note that the intelligence matching procedure resulted in a significant age difference between adolescents with and without ASD.

Table 1 Mean scores (standard deviation in brackets) of demographics and control measures, listed for adolescents and adults with and without ASD

This study was reviewed and approved by the local ethics committee. Adult participants and caregivers of the adolescent participants gave written consent before starting the experiment, and were given monetary compensation for their participation.

Stimuli

Participants were presented with a modified version of Posner’s spatial cueing paradigm (Posner 1980). In this paradigm, participants were presented with a series of short trials. Each trial started with the presentation of a central fixation cross for 1500 ms (Fig. 1). Subsequently, the cross was replaced by a static picture of a hand, which served as a central cue. The Cue Type was manipulated to be a hand either pointing or grasping toward the left or the right periphery. The cue pictures were closely controlled for size, color and luminance. After a varied stimulus-onset asynchrony (SOA) of 100, 250 or 400 ms, a colored circle (the target) appeared either to the left or to the right side of the cue and lasted for 1000 ms, after which the trial ended. We chose these SOAs because they had produced a cueing effect with the same cues in a pilot experiment with typically developing adults. Moreover, to avoid predictability of the target location and therefore measure the spontaneous orienting of attention to the cue, Cue Congruency was counterbalanced across trials with equal probability of congruent and incongruent trials. Target location was counterbalanced, and the SOA, type and congruency of the cue were randomized.
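One way to picture the design is as a fully crossed, shuffled trial list. The Python sketch below is a hedged illustration only: the exact construction of the trial lists is not described in the text, so the full crossing of factors, the number of repetitions per cell, the seed and all names are assumptions. Only the factor levels and the durations (1500 ms fixation, SOAs of 100, 250 and 400 ms, 1000 ms target) are taken from the description above.

```python
import itertools
import random

# Factor levels and timings taken from the text.
CUE_TYPES = ("pointing", "grasping")
CONGRUENCY = ("congruent", "incongruent")
SOAS_MS = (100, 250, 400)
TARGET_SIDES = ("left", "right")
FIXATION_MS = 1500
TARGET_MS = 1000


def build_trials(repeats_per_cell, seed=0):
    """Cross all factors, repeat each cell, and shuffle the order.

    repeats_per_cell and seed are hypothetical parameters; the study's
    actual trial counts per cell are not reported in this section.
    """
    trials = []
    cells = itertools.product(CUE_TYPES, CONGRUENCY, SOAS_MS, TARGET_SIDES)
    for cue_type, congruency, soa_ms, target_side in cells:
        # The cue points at the target on congruent trials, away otherwise.
        if congruency == "congruent":
            cue_direction = target_side
        else:
            cue_direction = "right" if target_side == "left" else "left"
        for _ in range(repeats_per_cell):
            trials.append({
                "cue_type": cue_type,
                "cue_direction": cue_direction,
                "congruency": congruency,
                "soa_ms": soa_ms,
                "target_side": target_side,
                "fixation_ms": FIXATION_MS,
                "target_ms": TARGET_MS,
            })
    random.Random(seed).shuffle(trials)
    return trials


trial_list = build_trials(repeats_per_cell=2)
```

Because every cell is repeated equally often, half of the trials in such a list are congruent and the target appears equally often on each side, so the cue carries no information about the target location.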

Fig. 1 Trial timeline is presented with possible Cue Types (a grasping and b pointing) and Cue Congruency (c congruent and d incongruent)

Apparatus and procedure

During the experiment, participants sat on a height-adjustable office chair, approximately 60 cm away from the eye tracker. Gaze data were recorded with a Tobii T60 eye tracker (Tobii Technology, Sweden), at 60 Hz sampling rate. The eye tracker is integrated in a 17-inch display, on which the stimuli were presented. A two-computer setup was used for stimulus presentation and gaze data recording. The stimulus material was presented using the software Presentation® (version 16.0, www.neurobs.com) and the Tobii Workspace Extension (Martin 2012), while calibration of the eye tracker and data acquisition was performed using the software TobiiStudio (version 3.2.2, Tobii Technology, Sweden).

At the beginning of the session, participants were asked to sign the written consent and to fill in demographic information. Then, they were instructed to sit in front of the eye tracker. After a five-point calibration, instructions were presented on the display and explained to the participants. Participants were instructed to look at the fixation cross presented in the middle of the screen and to look at the target when it appeared as fast and accurately as possible. Additionally, they were told that the hand gesture could be directed to the left or right side, suggesting that it is not predictive of the location of target appearance. They were also instructed to avoid blinking during the cue and target presentation as far as possible, to minimize data loss. The experiment had 300 trials for the adult participants and 204 trials for the adolescents, divided into 6 blocks, and participants were given a short break between blocks. After the experiment had finished, participants were asked to sit again in front of the table to complete the autism quotient test and the verbal and nonverbal intelligence tests. In some cases, the control measures were administered before the experiment started, while the experimental equipment was prepared.

Data analysis

All data preprocessing and analyses were done using the statistical computing language “R” (R Core Team 2013) and the following packages: “aspace” (Bui et al. 2012) for access to trigonometric functions which are necessary to convert from pixels to degrees of visual angle, “reshape2” (Wickham 2007) for data manipulation, “zoo” (Zeileis and Grothendieck 2005) for interpolating missing values and “ez” (Lawrence 2013) for performing analyses of variance. Gaze data were averaged from both eyes, and missing data points were linearly interpolated with a maximum gap of two consecutive samples.
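The two preprocessing steps named above, converting pixels to degrees of visual angle and linearly interpolating short gaps, can be sketched as follows. The study used R packages for this; the Python version here is only an illustration, and the function names and the `px_per_cm` parameter (pixel density of the display) are assumptions. The viewing distance of approximately 60 cm and the maximum gap of two samples come from the text.

```python
import math


def pixels_to_degrees(pixels, px_per_cm, distance_cm=60.0):
    """Convert an on-screen extent in pixels to degrees of visual angle.

    Uses the standard formula 2 * atan(size / (2 * distance)).
    px_per_cm is a hypothetical display parameter, not reported in the text.
    """
    size_cm = pixels / px_per_cm
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def interpolate_gaps(samples, max_gap=2):
    """Linearly interpolate runs of None no longer than max_gap samples.

    Longer runs, and runs at the start or end of the series, stay missing.
    """
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start
        # Fill only interior gaps that are short enough.
        if start > 0 and i < n and gap <= max_gap:
            left, right = out[start - 1], out[i]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                out[start + k] = left + step * (k + 1)
    return out


filled = interpolate_gaps([1.0, None, None, 4.0])
too_long = interpolate_gaps([1.0, None, None, None, 5.0])
```

In this sketch a gap of two missing samples between 1.0 and 4.0 is filled with 2.0 and 3.0, whereas a gap of three samples is left untouched.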

Saccadic reaction times

Saccades were identified using a velocity-based filter (Salvucci and Goldberg 2000) with a threshold of 52°/s. A saccade was defined as all consecutive raw gaze samples with a velocity higher than that threshold. Saccadic reaction time (SRT) was calculated as the time interval between target onset and saccade onset. Trials in which the saccade did not start from the cue region or went toward the opposite location from the target were excluded. Additionally, SRTs that were shorter than 100 ms from target onset or that deviated by more than 3 standard deviations from the individual mean were excluded from the analyses. By employing these criteria, 27.7 % of all trials were excluded (range 2.4–81 %; mean percent of excluded trials for ASD adults = 19.7 %; ASD adolescents = 35.5 %; NT adults = 23.6 %; NT adolescents = 36.5 %).
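A velocity-threshold filter of this kind can be sketched in a few lines. The Python code below is an illustrative approximation, not the authors' R implementation: the function names are hypothetical, gaze positions are assumed to be already expressed in degrees, and the order in which the two SRT exclusion rules are applied is an assumption. The 60 Hz sampling rate, the 52°/s threshold, the 100 ms minimum and the 3 SD criterion come from the text.

```python
import math
from statistics import mean, stdev


def sample_velocities(x_deg, y_deg, rate_hz=60.0):
    """Point-to-point gaze velocity in deg/s; the first sample gets 0."""
    velocities = [0.0]
    for i in range(1, len(x_deg)):
        dist = math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1])
        velocities.append(dist * rate_hz)
    return velocities


def saccadic_reaction_time(x_deg, y_deg, target_onset_idx,
                           rate_hz=60.0, threshold_deg_s=52.0):
    """Time in ms from target onset to the first above-threshold sample.

    Returns None if no sample after target onset exceeds the threshold.
    """
    velocities = sample_velocities(x_deg, y_deg, rate_hz)
    for i in range(target_onset_idx + 1, len(velocities)):
        if velocities[i] > threshold_deg_s:
            return (i - target_onset_idx) * 1000.0 / rate_hz
    return None


def filter_srts(srts_ms, min_ms=100.0, n_sd=3.0):
    """Drop missing SRTs, anticipations (< min_ms) and outliers (> n_sd SDs).

    Assumption: mean and SD are computed over all non-missing SRTs of one
    participant before either rule is applied.
    """
    valid = [s for s in srts_ms if s is not None]
    if len(valid) < 2:
        return [s for s in valid if s >= min_ms]
    m, sd = mean(valid), stdev(valid)
    return [s for s in valid if s >= min_ms and abs(s - m) <= n_sd * sd]


# Synthetic trace: fixation at 0 deg, then a rightward jump.
x = [0.0] * 10 + [1.0, 3.0, 5.0, 5.0]
y = [0.0] * len(x)
srt_ms = saccadic_reaction_time(x, y, target_onset_idx=4)
kept = filter_srts([50.0, 200.0, 210.0, 190.0])
```

In the synthetic trace, the first sample-to-sample movement of 1° corresponds to 60°/s at 60 Hz, which exceeds the threshold six samples after target onset, so the sketch returns an SRT of 100 ms.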

Pupil dilation

Pupil dilation during the target display was analyzed. The average pupil diameter during the fixation cross for every trial for every participant served as individual baseline (length: 1500 ms) for the pupil dilation analysis. Trials on which the eye tracker did not record any gaze data during the baseline segment were excluded from these analyses (15.7 % of all trials). The percentage of change from baseline was then calculated and used as the dependent variable. This measure takes negative values when pupil diameter during the baseline is larger than that during the analyzed time segment. Previous research indicates that pupil dilation as indicator of mental effort during stimulus processing has a peak around 600 ms after stimulus onset (see Laeng et al. 2012, for review). Moreover, Verschoor et al. (2015) suggested a peak for pupil dilation between 500 and 2000 ms after stimulus onset. For this reason, we focused our analysis on a time interval beginning 500 ms after target onset and lasting until 1000 ms after target onset.
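The dependent variable described above can be sketched as a baseline-corrected percentage. The Python code below is an illustration under stated assumptions: the function name, the argument layout and the treatment of partially missing samples are hypothetical, while the trial-wise baseline (the fixation period) and the 500 to 1000 ms analysis window follow the text.

```python
from statistics import mean


def percent_pupil_change(baseline_mm, times_ms, pupil_mm,
                         window_ms=(500.0, 1000.0)):
    """Percent change of mean pupil diameter in the window vs. baseline.

    baseline_mm: pupil samples recorded during the fixation cross.
    times_ms:    sample times relative to target onset.
    pupil_mm:    pupil samples during the target display.
    Missing samples are given as None. Returns None when the baseline or
    the analysis window contains no valid sample (trial excluded).
    """
    valid_baseline = [p for p in baseline_mm if p is not None]
    if not valid_baseline:
        return None
    baseline = mean(valid_baseline)
    lo, hi = window_ms
    window = [p for t, p in zip(times_ms, pupil_mm)
              if p is not None and lo <= t <= hi]
    if not window:
        return None
    return (mean(window) - baseline) / baseline * 100.0


# Made-up values: baseline 4.00 mm, window 3.96 mm, i.e. a 1 % decrease.
change = percent_pupil_change(
    baseline_mm=[4.0, 4.0, None, 4.0],
    times_ms=[0.0, 250.0, 500.0, 750.0, 1000.0],
    pupil_mm=[4.5, 4.2, 3.96, 3.96, 3.96],
)
```

With these invented values the pupil is smaller in the analysis window than during baseline, so the measure is negative (about −1 %), matching the sign convention described in the paragraph above.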

Results and discussion

Saccadic reaction times

An analysis of variance (ANOVA) was employed to investigate SRTs, with the within-subject factors Cue Type (pointing and grasping), Cue Congruency (congruent and incongruent) and SOA (100, 250 and 400) and the between-subject factors Group (ASD and NT) and Age (adults and adolescents). Figure 2 shows mean SRTs separately for all conditions for all groups. When necessary, t tests were used to explore significant main effects and two-way interactions.

Fig. 2 Average saccadic reaction times (SRT) for each Group, Age, Cue Type and SOA on congruent and incongruent trials. Error bars represent standard error of the mean (SEM)

As expected, the ANOVA revealed that participants had overall faster SRTs for the congruent than the incongruent condition (F(1, 60) = 45.01, p < 0.001, \(\eta_{G}^{2}\) = 0.04). The analyses further revealed that SRTs were slower for the ASD group than for the NT group (F(1, 60) = 4.73, p < 0.05, \(\eta_{G}^{2}\) = 0.05). There was also a main effect of SOA (F(2, 120) = 7.61, p < 0.001, \(\eta_{G}^{2}\) = 0.01), showing that SRTs were overall slower for the 100 ms SOA than for the 250 and 400 ms SOAs. This effect was qualified by a significant two-way interaction between Cue Congruency and SOA (F(2, 120) = 4.75, p < 0.015, \(\eta_{G}^{2}\) = 0.003), showing that the size of the congruency effect varied with SOA. Further comparisons for the congruent condition revealed that SRTs for the 100 ms SOA (225.94 ms) were slower than those for the 250 ms (206.98 ms; t(63) = 6.05, p < 0.001, Cohen’s d = 0.31) and 400 ms (213.03 ms; t(63) = 2.23, p < 0.05, Cohen’s d = 0.23). As for the incongruent condition, SRTs for the 400 ms SOA (231.06 ms) were faster than those for the 100 ms (243.90 ms; t(63) = −2.53, p < 0.05, Cohen’s d = −0.23) and 250 ms (239.48 ms; t(63) = −2.05, p < 0.05, Cohen’s d = −0.16). Differences in SRTs between 250 and 400 ms in the congruent condition and between 100 and 250 ms in the incongruent condition did not reach significance (all ps > 0.178). Moreover, the two-way interaction between Group and SOA was significant (F(2, 120) = 3.44, p < 0.05, \(\eta_{G}^{2}\) = 0.01), showing that SRTs were slower for the ASD than the NT group only for the 100 ms SOA (t(62) = 2.36, p < 0.05, Cohen’s d = 0.59). This difference approached significance for the 250 ms SOA (t(62) = 1.77, p = 0.081), but not for the 400 ms SOA (p = 0.204). Further comparisons within the ASD group revealed that SRTs for the 100 ms SOA (252.69 ms) were slower than those for the 250 ms (234.31 ms; t(31) = 3.76, p < 0.001, Cohen’s d = 0.34) and 400 ms (228.08 ms; t(31) = 3.58, p < 0.005, Cohen’s d = 0.50). As for the NT group, the difference in SRTs between 100 and 250 ms SOA approached significance (t(31) = 1.82, p = 0.078), showing a trend of slower SRTs for the 100 ms SOA. Differences in SRTs between 250 and 400 ms in the ASD and in the NT groups and between 100 and 400 ms in the NT group did not reach significance (all ps > 0.228). All p values for the main effect of SOA and the interactions of SOA with other factors were corrected using Greenhouse–Geisser epsilon, due to the violation of the sphericity assumption. Finally, the two-way interaction between Group and Age was marginally significant (F(1, 60) = 3.22, p = 0.078, \(\eta_{G}^{2}\) = 0.04), showing that SRTs were slower in the ASD than the NT group for the adolescents only (t(26) = 2.26, p < 0.05, Cohen’s d = 0.85) but not for the adults (p = 0.674). All other main effects and interactions did not reach significance (all ps > 0.078).

These results show that SRTs of participants in both groups were slower in the incongruent than in the congruent conditions, even when analyzed for each SOA separately. As for the Cue Type, no differences in the cueing effect between the pointing and the grasping cues were found. While previous studies suggested that social cues have a more persistent cueing effect than nonsocial cues (Senju et al. 2004), this difference was not related to SRTs during the observation of grasping and pointing gestures. Rather, it was in the ability to unlearn social cues and to perform saccades toward the opposite side of the cue, an ability which was not assessed in the current study. Additionally, the cues used in Senju and colleagues’ study varied in their social nature, whereas both cues used in the current study are of a social nature, and therefore, we did not expect differences in SRTs between these cues.

Moreover, no difference in the cueing effect between the ASD and NT groups was found. This finding can be considered evidence that people with ASD shift their attention following the direction of both social cues. This extends previous studies reporting an intact ability to shift attention in response to gaze cues in people with ASD (Chawarska et al. 2003; Senju et al. 2004; Aldaqre et al. 2014) by showing that their directing of attention in response to other social cues, like gestures, seems not to be impaired.

The current results also indicate that people with ASD shift their visual attention spontaneously in response to these cues, even at short SOAs. This is in line with previous findings showing a cueing effect for pointing and grasping cues at very short SOAs in typically developing adults (Daum and Gredebäck 2011b) and points to an advantage for fast processing of these cues relative to gaze cues, as the latter showed no cueing effect at short SOAs (Driver et al. 1999; Senju et al. 2004). Given the lack of a significant interaction between Group, Cue Congruency and SOA, we conclude that both the ASD and the NT groups responded to the cues presented in the current study spontaneously. This suggests efficient processing of communicative as well as non-communicative social cues in both groups. The most important theoretical insight from our findings, however, is that it seems unlikely that atypical behavior in ASD results from a deficit in processing cues of social nature in general or impaired social perception.

Interestingly, no difference in cueing effect was found between the adolescents and adults. Unlike previous studies, which reported that adults with ASD do not respond spontaneously to non-predictive gaze (Ristic et al. 2005), both age groups in the current study did so in response to pointing and grasping cues. Taken together, the current findings and those from Ristic et al. (2005) suggest a disparity in how people with ASD respond to different social cues, with non-predictive gaze seemingly losing its ability to modulate attention while pointing and grasping gestures maintain that ability. This may indirectly support the hypothesis suggested by Frischen et al. (2007) that people with ASD develop an ability to inhibit looking at other people’s eyes in order to avoid the increased emotional response which accompanies eye contact.

Additionally, we replicated previous findings (Driver et al. 1999) showing that SRTs were longer for shorter SOAs, an effect which could be due to participants’ expectation of target appearance, and their increasing preparedness to respond (Driver et al. 1999).

We also found that the ASD group had overall slower SRTs than the NT group. This increased reaction time in ASD, compared to an NT group, was previously reported (Senju et al. 2004; Pruett et al. 2011). A plausible explanation for this finding is the impaired ability to disengage from the central cue in people with ASD (Senju et al. 2004). There is considerable evidence of an impaired ability to disengage attention in ASD (Casey et al. 1993; Adams and Jarrold 2012), and considering that the cue in the current study, similarly to Senju et al. (2004) and Pruett et al. (2011), remained present even after target appearance, the ASD group may have required more time to disengage from the cue and perform their saccade. On the other hand, when the central cue disappears from the scene with target appearance, as in the study by Chawarska et al. (2003), people with ASD were found to have faster RTs relative to the NT group, which gives further support to the explanation provided.

Error rates

Erroneous trials were coded as those in which the first saccade after target presentation went to the opposite side of the target. Then, the percentage of erroneous trials was calculated for each condition and participant and analyzed by means of an ANOVA. The analysis revealed that adults had lower error rates (9.45 %) than adolescents (19.71 %; F(1, 60) = 7.18, p < 0.01, \(\eta_{G}^{2}\) = 0.07). Additionally, error rates differed between SOAs (F(2, 120) = 4.78, p < 0.05, \(\eta_{G}^{2}\) = 0.01), with higher error rates for the 250 ms SOA (16.79 %) than the 100 ms (14.28 %; t(63) = 2.88, p < 0.01, Cohen’s d = 0.15) and 400 ms (14.37 %; t(63) = 2.12, p < 0.05, Cohen’s d = 0.15). The two-way interaction between Group and Cue Congruency was also significant (F(1, 60) = 5.43, p < 0.05, \(\eta_{G}^{2}\) = 0.003). Further investigation of this interaction by means of paired-samples t tests revealed higher error rates for the ASD group in the incongruent (16.57 %) than in the congruent condition (14.63 %); however, this difference was only marginally significant (t(191) = 1.88, p = 0.062). All other comparisons did not reach significance (ps > 0.112).

Adolescents showed overall more erroneous responses than adults, probably due to the continued development of response inhibition in adolescence (Blakemore and Choudhury 2006). However, no differences between the ASD and the NT groups were found, indicating that people with ASD were as prone to move their gaze to the uncued location as the NT group, suggesting the same level of efficiency in responding to the type of cues used.

Pupil dilation

Percent pupil diameter change from baseline (Fig. 3) was analyzed by means of an ANOVA with the within-subject factors Cue Type, Cue Congruency and SOA and the between-subject factors Group and Age. As expected, pupils were larger in the incongruent condition (−0.52 %) compared to the congruent condition (−1.12 %; F(1, 60) = 41.59, p < 0.001, \(\eta_{G}^{2}\) = 0.03). Pupils were also larger for pointing (−0.75 %) than for grasping cues (−0.89 %; F(1, 60) = 6.23, p < 0.05, \(\eta_{G}^{2}\) = 0.002). Additionally, pupil diameter increased as SOAs got longer (F(2, 120) = 30.38, p < 0.001, \(\eta_{G}^{2}\) = 0.03), potentially indicating that pupils dilated gradually over time (as the interval between baseline and target onset varies with SOA). The analysis further revealed a significant interaction between Group, Age and Cue Type (F(1, 60) = 4.65, p < 0.05, \(\eta_{G}^{2}\) = 0.002) and between Cue Type and Cue Congruency (F(1, 60) = 4.79, p < 0.05, \(\eta_{G}^{2}\) = 0.002). Moreover, the interactions between Group, Cue Congruency and SOA (F(2, 120) = 3.57, p < 0.05, \(\eta_{G}^{2}\) = 0.002), between Age, Cue Congruency and SOA (F(2, 120) = 5.65, p < 0.005, \(\eta_{G}^{2}\) = 0.004) and between Group, Cue Type and SOA (F(2, 120) = 3.22, p < 0.05, \(\eta_{G}^{2}\) = 0.002) were all significant. It is worth noting that the number of participants might not have been large enough to achieve high statistical power, a problem that is difficult to overcome when testing clinical samples, such as people diagnosed with ASD.
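To make the dependent measure concrete, the following sketch shows one common way of computing percent pupil change from baseline. It assumes, as an illustration only, that the baseline is the mean pupil diameter over a baseline window and that the value entering the analysis is the mean over the analyzed time segment; the function and argument names are hypothetical.

```python
def percent_pupil_change(baseline_samples, segment_samples):
    """Percent change of mean pupil diameter relative to a baseline.

    Assumption: both arguments are non-empty sequences of pupil
    diameters in the same unit (e.g. mm), and the baseline is the
    mean of the baseline window.
    """
    baseline = sum(baseline_samples) / len(baseline_samples)
    segment = sum(segment_samples) / len(segment_samples)
    # Negative values mean the pupil was smaller than at baseline.
    return 100.0 * (segment - baseline) / baseline
```

Under this definition, the negative condition means reported above indicate pupils that were smaller than at baseline in every condition, so that "larger" pupils in the incongruent condition (−0.52 %) correspond to less constriction than in the congruent condition (−1.12 %).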

Fig. 3

Average percent pupil change from baseline for each Group, Age and Cue Type on congruent and incongruent trials. The shaded area represents the time segment from which data are analyzed

Notably, the five-way interaction between Cue Type, Cue Congruency, SOA, Group and Age reached significance (F(2, 120) = 5.17, p < 0.01, \(\eta_{G}^{2}\) = 0.003). Thus, all other observed effects are dependent on this significant five-way interaction, and we therefore broke it down to assess which specific difference was driving this interaction. The order in which factors were split to perform separate ANOVAs for each of their levels was such that the factors of key interest (Cue Type and Cue Congruency) were split later. First, separate ANOVAs for each SOA were performed, to assess which SOA produces a four-way interaction between the other factors. This analysis revealed a significant four-way interaction for the 250 ms SOA (F(1, 60) = 9.79, p < 0.005, \(\eta_{G}^{2}\) = 0.01) but not for the other SOAs (all Fs < 0.26, ps > 0.6). To explore this interaction further, we analyzed the data for each Age separately for the 250 ms SOA and found a significant three-way interaction between Group, Cue Congruency and Cue Type for the adults (F(1, 34) = 11.56, p < 0.005, \(\eta_{G}^{2}\) = 0.01) but not for the adolescents (p = 0.108). We followed up on this significant interaction with two separate ANOVAs for adults with and without ASD (Fig. 4) and found a two-way interaction between Cue Type and Cue Congruency for the ASD group (F(1, 17) = 15.89, p < 0.001, \(\eta_{G}^{2}\) = 0.02) but not for the NT group (p = 0.218). Further investigation of the significant interaction revealed that, for the 250 ms SOA, pupils of adults with ASD were larger for the incongruent (−0.60 %) than the congruent condition for pointing cues (−1.66 %; t(17) = 6.58, p < 0.001, Cohen’s d = 0.73) but not for grasping cues (incongruent = −1.06 %; congruent = −1.23 %; p = 0.392). Moreover, pupils were larger for pointing than for grasping cues in the incongruent condition (t(17) = 3.49, p < 0.005, Cohen’s d = 0.34), but this pattern was reversed in the congruent condition (t(17) = −2.40, p < 0.05, Cohen’s d = −0.28).

Fig. 4

Average percent pupil change from baseline for each group and Cue Type on congruent and incongruent trials from the adult sample for 250 ms SOA only. Error bars represent standard error of the mean (SEM). Asterisks show significant differences (p < 0.05) for the t tests used to investigate the two-way interaction between Cue Type and Cue Congruency found in the ASD group

Due to the complex pattern of results provided by the pupil dilation data, we focus in this discussion on findings crucially informative for our hypotheses. First, we found an overall increase in pupil dilation for the incongruent cue condition compared to the congruent cue condition. This is in line with the findings of Moresi et al. (2008) from healthy adult participants, and it confirms the hypothesis that the pupil dilation data mirror the cueing effect on reaction times found in the literature. Increased mental effort in the incongruent condition is reflected by this increase in pupil dilation, and it can be attributed to participants having to reallocate their attention to the target location after the target has already appeared (e.g., Posner 1980).

Cue Type was shown to affect pupil dilation as well. Participants had overall larger pupils for the pointing than the grasping cues. Since the stimuli were controlled to have similar physical properties, this effect is not likely to be due to low-level perceptual differences between the pictures employed. Rather, it indicates different cognitive loads for processing each Cue Type, with pointing requiring more mental effort to trigger a response than grasping. This finding can be explained by the precedence of understanding grasping over pointing gestures in the course of development (Sodian and Thoermer 2004). Previous studies have shown that beginning at the age of 5 months, infants successfully shift their attention in response to a grasping cue (Daum and Gredebäck 2011a; Wronski and Daum 2014), which is around the time they start to produce proficient goal-directed grasping gestures (von Hofsten et al. 1998). On the other hand, infants start shifting their visual attention in response to pointing gestures at around 12 months of age (Daum et al. 2013), which is when they also produce such protodeclarative gestures (Liszkowski et al. 2004). Taken together, these findings suggest that processing cues with high communicative value is more effortful than processing cues with low communicative value.

The difference in pupil dilation between pointing and grasping cues was particularly present in the ASD group. It has been proposed that the understanding of other people’s behavior might be promoted by the ability to produce the same action (Aglioti et al. 2008; Paulus 2012), and the production of protodeclarative pointing was shown to be impaired in 4-year-old children with ASD (Baron-Cohen 1989). This impairment might still be present at an older age as part of impaired communication, manifesting itself by the increased mental effort while processing pointing cues.

General discussion

The main aim of this study was to investigate whether people with ASD are as sensitive as neurotypical controls to social cues that vary in their communicative value. Participants were presented with a classical spatial cueing paradigm, with pointing (i.e., high communicative value) and grasping hands (i.e., low communicative value) serving as central cues, and were instructed to perform a saccade toward the target as soon as it appeared. The target could appear either at the congruent or at the incongruent location relative to the cue, and we were interested in the increased SRT in the incongruent relative to the congruent condition, that is, the cueing effect.

Although the response to communicative social cues in NT and ASD looks the same at a behavioral level, as measured by saccadic reaction times, it might recruit more and/or different cognitive resources, as indicated by the current pupil dilation findings. Similar patterns of results can be found in other domains as well, such as in cognitive aging. When performing the same task, older and younger adults have often been found not to differ much in their behavioral response. However, despite the similarity in observable behavioral response, older adults often exhibit quite different patterns of brain activation than younger adults. This is evident, for example, by an age-dependent increase in prefrontal cortex activation (Cabeza et al. 1997; Reuter-Lorenz et al. 2000). It is assumed that the frontal activation increasing with age is an indicator for an adaptive brain that engages in compensatory scaffolding in response to the challenges posed by declining neural structures and function (Park and Reuter-Lorenz 2009). Based on the present findings, it can be predicted that such a dissociation between behavioral and neural responses could also be found for processing of pointing gestures in individuals with and without ASD. More research is needed to clarify whether such dissociation exists.

In the introduction, we noted an important difference between the gestures employed in this study: pointing is used for communicating information, while grasping is not. The increased mental effort in processing communicative cues observed in people with ASD might be a contributing factor to their difficulty in learning from others. This can have implications for both formal and informal learning situations and should be considered when designing intervention programs for teaching people with ASD. Yet, before drawing strong conclusions, it would be important to consolidate these findings in future studies employing different types of communicative and non-communicative cues.

The high rate of data loss has some implications for future studies. Here, we focus on four recommendations. First, with regard to the number of trials which had to be excluded, we suggest using an eye tracker with a higher sampling rate. This could increase the probability of having enough gaze samples during the time segment of interest. Second, as for the number of participants excluded, we recommend monitoring participants during the experiment to make sure that they are looking at the screen and that the eye tracker is recognizing and tracking their eyes properly. This can be done with the tracking-quality monitor typically bundled with modern eye trackers. Third, if the experiment is long, as the current one was, we recommend building in the possibility to pause the procedure to instruct participants to adjust their posture or open their eyes wider if needed. Lastly, we advise future researchers to use stimulus timing suitable for the age group they plan to test. This might limit the possibility of comparing results across age groups, although how much this matters depends on the main question to be addressed.

In conclusion, together with previous findings, our study demonstrates that people with ASD are sensitive to social cues. They do not differ from neurotypical controls in the immediate perception of gestures. However, apart from this intact sensitivity, we found that individuals with ASD process cues with high communicative value, like pointing, more effortfully than those with low communicative value, like grasping. Additionally, while adults with ASD were previously shown not to orient to non-predictive gaze cues, ASD adults in the current study were shown to respond spontaneously to other non-predictive social cues, like pointing and grasping. These findings speak to social perception accounts of ASD (e.g., Tager-Flusberg 2001; Schultz 2005) in two ways. First, they are in line with recent evidence that social perception is intact in ASD (Cusack et al. 2015). Second, despite the intact sensitivity to social cues, deviant cognitive load during the processing of these cues suggests that such information might nonetheless be processed differently in individuals with ASD. Our findings thus challenge social perception deficit accounts of ASD and point to the need to consider the possibility that the processing of social information is not impaired in ASD but qualitatively different from that in neurotypical controls. Future research is required to elucidate potential qualitative differences in social perception in ASD.