Difficulties in social communication and interaction, alongside restricted and repetitive behaviours and interests, form a core symptom cluster of autism spectrum disorder (ASD), affecting the initiation and reciprocation of nonverbal, emotional, and communicative interactions (American Psychiatric Association, 2013). There is an emerging consensus that these social difficulties may originate, in part, from a general overreliance on bottom-up sensory input at the expense of top-down information (e.g., Palmer et al., 2015; Van de Cruys et al., 2014). Reduced top-down influence would particularly affect social interactions, as the hidden goals and intentions that drive others’ observed behaviour need to be constantly disambiguated (e.g., Bach et al., 2014; Barresi & Moore, 1996; Hohwy & Palmer, 2014). In typically developing individuals, such top-down influences allow others’ behaviour to be seen not simply as the movement of limbs, but in terms of the mental states that cause it, as if others’ intentions, beliefs, and emotions were “drawn” onto the behaviour (for reviews, see Bach & Schenke, 2017; Gallagher, 2008; Teufel et al., 2010). This direct perception of the goals and intentions that imbue observed behaviour with meaning may be less evident in ASD, so that behaviour is perceived only in terms of what is explicitly visually available.

So far, however, research has primarily tested situations in which the relevant high-level information was merely implied by the observed behaviour rather than explicitly given, and therefore had to be spontaneously inferred from the action kinematics. This limitation is crucial because there is emerging evidence that, while people with autism can reason about others’ mental states when explicitly asked, they tend not to do so spontaneously (e.g., Apperly & Butterfill, 2009; Frith & Frith, 2008; Schaller & Rauh, 2017; Senju et al., 2010; Tager-Flusberg, 2001; Zalla et al., 2010). Despite being able to differentiate actions based on explicitly available visual information—for example, recognizing different actions or discriminating them from robotic or scrambled motion (e.g., Cusack et al., 2015)—people with autism find action perception particularly difficult when it requires spontaneous inferences of higher-level information, such as the goals implied by an action (Ganglmayer et al., 2019; Hudson et al., 2012), an actor’s false beliefs (Schuwerk et al., 2016; Senju et al., 2010), the emotions implied by biological motion (Centelles et al., 2013), or the intention behind communicative gestures (von der Luhe et al., 2016).

To convincingly test whether ASD involves a generalized reduction in top-down processing, it needs to be established that these difficulties persist even when the relevant higher-level information is explicitly given, in a form that makes spontaneous inferences unnecessary. This study provides this crucial test. It utilizes a recent paradigm (Hudson, Nicholson, Ellis, & Bach, 2016; Hudson, Nicholson, Simpson, et al., 2016) that reliably reveals how social perception in neurotypical (NT) individuals is biased by the integration of both explicit top-down intention information and intention information merely implied by the actor’s behaviour. In each trial, participants view, on a monitor, an actor’s hand near an object and are given explicit information about the actor’s goal, either by hearing the actor’s verbal statements (e.g., “I’ll take it!”; “I’ll leave it!”) or by themselves instructing the actor (“Take it!”; “Leave it!”). They then briefly view the actor either reaching for or withdrawing from the object, with the action equally likely to follow the stated goal or to go against it. The hand disappears mid-trajectory, and participants simply report the last seen position of the hand, either on a touch screen or using a probe comparison task. Across several studies, these perceptual judgments were reliably distorted by the goal implied by the action kinematics, such that a reach was perceived nearer the object than it really was and a withdrawal further away (the representational momentum effect; Freyd & Finke, 1984). Importantly, perceptual judgments were also biased by the actor’s explicit intention statements: participants reported hands further towards an object if the intention was to “take” it and further away if it was to “leave” it. Crucially, this effect increased the more reliably the intention predicted the action (Hudson, Bach, & Nicholson, 2018), consistent with an optimal integration of prediction information into one’s perceptual estimates (Ernst & Banks, 2002).

These studies show that, in NT participants, action perception is enriched by top-down goal expectations derived from both explicit intention information and action kinematics (Bach et al., 2014; Hudson, McDonough, et al., 2018; McDonough et al., 2019; McDonough et al., 2020). Here, we used this task with individuals with autism and a matched group of NT individuals, who completed both the spoken instruction and heard intention variants of the experiment. This study directly tests whether action observation in ASD is generally immune to top-down information, or whether top-down influence emerges when the goals are provided explicitly through the intention statements. Moreover, comparing the spoken and heard versions of the task will reveal whether those with ASD generally overweight self-generated compared with other-generated predictions, as some proposals suggest (Lawson et al., 2014).

Method

Participants

Neurotypical participants (n = 28) were recruited from Plymouth University and the wider community. ASD participants (n = 28) were recruited from local community services in the region via intermediaries within and outside of Plymouth University. All participants had normal or corrected-to-normal vision and gave written informed consent. All ASD participants had received a clinical diagnosis of autism spectrum disorder, meeting either DSM-IV/5 (American Psychiatric Association, 1994, 2013) or ICD-10 (World Health Organization, 1993) criteria. Sample size was constrained by participant availability but matched previous studies. After exclusions (see Results), our sample size of 23 in each group provides 80% power (G*Power; Faul et al., 2007) to detect across-groups within-subject effect sizes of .42, within-groups effect sizes of .61, and between-subjects effect sizes of .84. The smallest effect sizes of interest detectable with this paradigm (SESOI; Lakens et al., 2018) are .30, .43, and .59, respectively. The experiment was approved by the Plymouth University ethics board.
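These sensitivity figures can be approximated outside G*Power; the sketch below uses Python's statsmodels package to recover the smallest detectable effect sizes for the reported sample sizes at 80% power. The exact G*Power settings (test family, tails, correlation among repeated measures) are not restated here, so this is an illustrative approximation rather than the original calculation.

```python
# Approximate reproduction of the sensitivity analysis; assumes two-tailed
# tests at alpha = .05 with n = 23 per group (n = 46 overall).
from statsmodels.stats.power import TTestPower, TTestIndPower

alpha, power, n_group = 0.05, 0.80, 23

# Across-groups within-subject contrast (all 46 participants, paired test)
dz_all = TTestPower().solve_power(nobs=46, alpha=alpha, power=power)

# Within-group contrast (paired test within one group of 23)
dz_group = TTestPower().solve_power(nobs=n_group, alpha=alpha, power=power)

# Between-subjects contrast (independent-samples test, 23 per group)
d_between = TTestIndPower().solve_power(nobs1=n_group, alpha=alpha,
                                        power=power, ratio=1.0)

print(round(dz_all, 2), round(dz_group, 2), round(d_between, 2))
# approximately 0.42, 0.61, 0.84, matching the values reported above
```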

Participants completed the Autism-Spectrum Quotient (AQ; Baron-Cohen et al., 2001), a 50-item self-administered questionnaire that measures the degree of autistic traits in clinical and NT populations along five dimensions (social skills, attention switching, imagination, attention to detail, and communication). Initial feedback from potential participants and their support workers suggested condensing the four response options (definitely/slightly agree, definitely/slightly disagree) to two (agree, disagree). Because each AQ item is scored as 1 or 0, this change entailed no loss of variance in AQ scores. Participants also completed the Vocabulary and Matrix Reasoning subtests (FSIQ-2) of the WASI IQ test (Wechsler, 1999), although we were unable to collect these data from two NT participants. After participant exclusions (see below), the two groups did not differ in age, gender, FSIQ-2, Verbal t score, or Perception t score, whereas AQ scores were higher for the ASD group (see Table 1).

Table 1 Participant information after exclusions

Apparatus and stimuli

The experiment was administered using Presentation (NeuroBS) on a Hewlett Packard s230tm EliteDisplay Touchscreen (1,920 × 1,080 px, 60 Hz). A Logitech PC120 combined microphone and earphone headset delivered auditory stimuli and registered verbal responses.

The stimulus set consisted of eight action sequences of 26 frames each (960 × 540 px). They were filmed with a Canon Legria HFS200 at 30 fps and separated into individual frames with MovieDek. Each sequence showed a hand reaching from the right towards a goal object on the left, depicting the initial part of the action but stopping before object contact. The sequences were digitally manipulated with Corel PaintShop Pro X6 so that all background details were replaced with a uniform black background. Four sequences showed natural reaches towards objects that were safe to touch (water bottle, water glass, wineglass, handle of a knife). In the other four sequences, the safe objects were digitally replaced by a painful object of similar size (cactus, broken water glass, broken wineglass, knife blade), ensuring identical reach kinematics in both conditions.

Each sequence started with the hand at a neutral middle point (randomly chosen from frames 11–16) and progressed forwards or backwards through the sequence in two-frame steps for 3, 4, or 5 frames (80 ms each) to depict reaches for, or withdrawals from, the object. The response stimuli were created by erasing the hand from a single frame of each sequence, leaving just the object. When displayed directly after the action, this created the impression that the hand had simply disappeared.
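For illustration, the frame-stepping logic described above can be sketched as follows; the function and its parameters are our own reconstruction from this description, not the original stimulus-presentation script.

```python
import random

def trial_frames(direction, n_steps):
    """Return the frame indices shown on one trial (a sketch, not the
    original script; frame numbering follows the description above).

    The hand is first shown at a neutral frame (randomly chosen from
    11-16) and then steps through the 26-frame sequence in two-frame
    increments, forwards for reaches and backwards for withdrawals,
    for 3, 4, or 5 frames of 80 ms each.
    """
    start = random.randint(11, 16)
    step = 2 if direction == "reach" else -2
    return [start] + [start + step * i for i in range(1, n_steps + 1)]

print(trial_frames("reach", 5))       # e.g. [13, 15, 17, 19, 21, 23]
print(trial_frames("withdrawal", 3))  # e.g. [12, 10, 8, 6]
```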

Audio stimuli of an actor saying “I’ll take it” and “I’ll leave it” (1,000-ms duration) were recorded with an M-Audio Microtrack 2 Digital Voice Recorder and were presented binaurally, but biased 50% towards the right ear to match the implied location of the actor.
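Such a spatial bias could be implemented, for example, by attenuating the left channel of the stereo file; the snippet below is one plausible reading of a “50% bias towards the right ear” (the file names, the soundfile library, and the exact attenuation are assumptions, not details reported in the study).

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library; any WAV reader would do

# Read the 1,000-ms mono recording and write a stereo version in which the
# left channel is attenuated to half amplitude, one plausible reading of a
# "50% bias towards the right ear".
mono, fs = sf.read("ill_take_it.wav")
stereo = np.column_stack([0.5 * mono, mono])  # left at 50%, right at 100%
sf.write("ill_take_it_stereo.wav", stereo, fs)
```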

Procedure

Participants first completed two training sessions, which served to progressively familiarise them with the task requirements (see Supplementary Materials for analysis). In the first training session (42 trials), participants reported, via touchscreen, the position of a briefly presented static hand. In the second training session (48 trials), participants judged the final position of a moving hand after it had disappeared. As in the main experiment, participants were instructed to touch the tip of the index finger.

Participants then completed two blocks of the experiment. Each trial (see Fig. 1) started with a written instruction to hold down the spacebar with the right hand, to prevent tracking of the stimulus with the finger, while the other hand rested on the desk or lap. A fixation cross was then presented for 500 to 1,000 ms, followed by the first frame of the stimulus sequence (hand in neutral position). In separate blocks, participants either verbally instructed the actor (Spoken Instruction) or heard the actor state an intention (Heard Intention). In the Spoken Instruction block, participants said “Take it!” if the object was safe to grasp and “Leave it!” if the object was potentially painful. These responses were not recorded, but correct performance was monitored throughout by the experimenter. In the Heard Intention block, participants heard the actor say “I’ll take it” if the object was safe and “I’ll leave it” if the object was potentially painful. Block order was counterbalanced between participants. In both blocks, there was 50% congruency between intention statement and action, and participants were explicitly told that the actor was just as likely to do what they had said or been instructed to do as to do the opposite. The biases we measure therefore reflect people’s prior expectation that others’ statements truthfully signal their intentions, rather than contingencies learned within the experiment.

Fig. 1 Trial sequence in the experimental blocks

In both conditions, the action sequence began 1,200 ms after the onset of the intention statement or the registration of the verbal response by Presentation’s sound threshold logic, showing the hand reaching for or withdrawing from the object, for 3, 4, or 5 frames. When the hand disappeared mid-action, participants released the spacebar and, with the same hand, touched the screen where the last seen position of the tip of the index finger had been. For each experimental session, participants completed 10 practice trials, followed by two blocks of 48 trials, providing all factorial combinations of object type (safe, painful), object identity (4), action direction (reach, withdrawal), and sequence length (3, 4, 5 frames).
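The factorial structure of the 48-trial blocks can be illustrated with a short sketch; the condition labels are ours, and the experiment itself was run in Presentation (NeuroBS), so this is only a conceptual illustration of the design.

```python
from itertools import product
import random

object_types = ["safe", "painful"]
object_ids = ["bottle", "glass", "wineglass", "knife"]  # four exemplar pairs
directions = ["reach", "withdrawal"]
lengths = [3, 4, 5]                                      # frames shown

# Every factorial combination appears once: 2 x 4 x 2 x 3 = 48 trials
block = list(product(object_types, object_ids, directions, lengths))
assert len(block) == 48
random.shuffle(block)  # present in a random order within the block
```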

Trial and participant exclusion

Trials were excluded if the correct response procedure was not followed (participants released the spacebar prior to offset of the visual stimulus), or if the response initiation time (spacebar release after stimulus offset) or response execution time (spacebar release to touch-screen response) was less than 200 ms or more than three standard deviations above the group mean. Trial exclusions for the NT group (9%, SD = 6.5%) and ASD group (14%, SD = 15%) did not differ, t(44) = 1.61, p = .116, leaving on average 174 trials per participant in the NT group (22 per condition) and 163 trials per participant in the ASD group (20 per condition). After exclusions, response initiations were slower to intention/action incongruent than congruent actions, F(1, 44) = 7.52, p = .009, ηp2 = .146, and the ASD group was slower than the NT group to initiate responses, F(1, 44) = 5.05, p = .03, ηp2 = .103, and to execute responses, F(1, 44) = 19.6, p < .001, ηp2 = .308, but this group effect did not interact with any of the conditions of interest (see Supplementary Materials).

Participants were excluded (NT = 4; ASD = 3) if their responses showed no consistent relationship with the visual stimuli, as assessed by four a priori criteria (Hudson, Bach, & Nicholson, 2018): (1) if the selected screen position in each trial was not significantly correlated with the real screen position on either axis; (2) if the correlation coefficient between the real and selected screen positions was more than three standard deviations below the group median on either axis; (3) if the mean screen distance (pixels) between the real and selected screen positions was more than three standard deviations above the group mean; or (4) if not enough valid trials (<50%) remained after the trial exclusions described above. One additional NT participant was excluded because they misunderstood the task, and two ASD participants were excluded for terminating the experiment before the second block.
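For concreteness, the four participant-level criteria can be expressed as a small function. The sketch below is our reconstruction from the description above (variable names and the p < .05 threshold for criterion 1 are assumptions), not the analysis code used in the study.

```python
import numpy as np
from scipy.stats import pearsonr

def exclude_participant(real_xy, resp_xy, valid_fraction, group_stats):
    """Apply the four a priori participant-exclusion criteria (a sketch).

    real_xy, resp_xy : (n_trials, 2) arrays of real and reported positions (px)
    valid_fraction   : proportion of trials surviving trial-level exclusions
    group_stats      : dict with the group median and SD of the real-response
                       correlations per axis, and the group mean and SD of the
                       mean localisation error (hypothetical field names)
    """
    for axis in (0, 1):                                     # x- and y-axis
        r, p = pearsonr(real_xy[:, axis], resp_xy[:, axis])
        if p >= .05:                                        # criterion 1 (alpha assumed)
            return True
        if r < group_stats["median_r"][axis] - 3 * group_stats["sd_r"][axis]:
            return True                                     # criterion 2
    mean_error = np.linalg.norm(resp_xy - real_xy, axis=1).mean()
    if mean_error > group_stats["mean_err"] + 3 * group_stats["sd_err"]:
        return True                                         # criterion 3
    if valid_fraction < .50:                                # criterion 4
        return True
    return False
```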

Analysis

To assess the prediction that location judgments would be shifted in the direction of the actor’s explicit goal (nearer the object after “take”; farther after “leave”) and in the direction of the action trajectory (nearer for reaches, farther for withdrawals), we measured the difference between the hand’s real location and where participants localized it on the touch screen. For each trial, the perceptual bias in pixels was calculated by subtracting the real disappearance point of the tip of the index finger from the participant’s touch-screen response. Positive values reflect rightward (x-axis) and upward (y-axis) biases from the real position, and negative values reflect leftward (x-axis) and downward (y-axis) biases.
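In code, this trial-level bias measure amounts to a simple subtraction; the column names below are hypothetical, and the y-axis is assumed to increase upwards, matching the sign convention just described.

```python
import pandas as pd

# Hypothetical trial-level table with the real disappearance point of the
# fingertip ("real_x", "real_y") and the touched position ("touch_x",
# "touch_y"), all in pixels.
trials = pd.read_csv("touch_responses.csv")

# Bias = reported position minus real position. Positive values are
# rightwards (x) and upwards (y), assuming a coordinate frame in which
# y increases upwards.
trials["bias_x"] = trials["touch_x"] - trials["real_x"]
trials["bias_y"] = trials["touch_y"] - trials["real_y"]
```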

As in prior work (Hudson, Bach, & Nicholson, 2018; Hudson, McDonough, et al., 2018; McDonough et al., 2019), participants generally touched the screen to the right of (34.0 px) and below (−105.3 px) the real stimulus position, reflecting well-known shifts towards a stimulus’ centre of gravity (Coren & Hoenig, 1972). The overall distance from the real position did not differ between groups, t(44) = 1.51, p = .140, but ASD participants showed a larger rightwards bias (ASD: 41.2 px; NT: 26.9 px; t = 2.28, p = .027) and a smaller downwards bias (ASD: −96.1 px; NT: −114.5 px; t = 2.57, p = .014). As these biases reflect general localization errors that are independent of our effects of interest, and were also present in the training sessions with static and dynamic stimuli (see Supplementary Material), the mean biases across groups were subtracted from each participant’s mean response, leaving all comparisons of interest unaffected.

These difference scores were analysed separately for the x-axis and y-axis with a mixed-effects analysis of variance (ANOVA), with Intention Type (heard intention, spoken instruction), Action (reach, withdrawal), and Intention (take it, leave it) as within-subject factors, and Group (NT, ASD) as a between-subjects factor. Main effects of Action will reveal the extent to which responses are biased away from the hand’s real position towards the goals implied by the action kinematics (further leftwards towards the object for reaches than for withdrawals). Main effects of Intention will reveal the extent to which responses are biased away from the hand’s real position towards the explicitly provided goal information (further leftwards for goals to “take” the object rather than “leave” it; see Footnote 1). Interactions with Group will reveal whether the ASD and NT groups differ in how much perceptual judgments are biased by kinematics (Action × Group) and by explicit goal information (Intention × Group).
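As a simplified illustration of the key Intention × Group test, the sketch below collapses the design to a single within-subject factor, since common Python routines (e.g., pingouin's mixed_anova) handle only one within-subject factor per call; the full 2 × 2 × 2 × 2 mixed ANOVA reported here requires software that supports several within-subject factors (e.g., the afex package in R, or JASP). Column and file names are assumptions.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: one row per trial with participant id,
# group (NT/ASD), intention (take/leave), and the x-axis bias in pixels.
trials = pd.read_csv("bias_long_format.csv")

# Average over the remaining factors so that each participant contributes
# one cell mean per intention condition.
cell_means = (trials.groupby(["participant", "group", "intention"],
                             as_index=False)["bias_x"].mean())

# One within-subject factor (Intention) x one between-subjects factor (Group).
aov = pg.mixed_anova(data=cell_means, dv="bias_x", within="intention",
                     subject="participant", between="group")
print(aov[["Source", "F", "p-unc", "np2"]])
```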

There were no between-group differences in variance on any of the effects of interest (Levene’s max F = 3.69, p = .061). Bayes factors (BFs) are reported for crucial two-sided tests with default priors using JASP (JASP Team, 2020); BF10 quantifies how much more likely the observed data are under H1 than under the null hypothesis. All other main effects and interactions were evaluated against a Bonferroni-corrected alpha threshold to guard against incidental findings due to hidden multiplicity in ANOVAs (Cramer et al., 2016).

Results

x-axis

As predicted, and replicating our prior work, there was a main effect of Action, F(1, 44) = 6.27, p = .016, ηp2 = .13, 95% CI [2.2, 23.7], BF10 = 2.18, showing that perceived disappearance points of reaches were shifted further leftwards (towards the object) than those of withdrawals, and a main effect of Intention, F(1, 44) = 53.4, p < .001, ηp2 = .548, 95% CI [6.9, 12.4], BF10 = 2.763e+6 (see Fig. 2), showing that perceptual judgements were shifted further leftwards after an explicit intention to “take” the object than to “leave” it, the predicted perceptual bias towards explicit action goals.

Fig. 2 The perceptual bias (selected screen position − real screen position) on the x-axis and y-axis for the NT (top row) and ASD (bottom row) groups, depending on whether explicit intentions were passively heard (left panels) or actively spoken (right panels). The intersection of the crosshairs at 0,0 represents the real disappearance point on any given trial, corrected for overall (across-group) biases in finger placement, with the object located to the left. In each panel, bias in the direction of the observed kinematics (main effect of Action) is reflected in how much perceived disappearance points of reaches (squares) are shifted leftwards compared with those of withdrawals (circles). The bias in the direction of prior intentions (main effect of Intention) is reflected in how much explicit intentions to “take” the object (filled markers) shift perceived disappearance points leftwards compared with intentions to “leave” the object (empty markers). Error bars represent 95% confidence intervals

The main question was whether the perceptual biases induced by explicit intention information were smaller in the ASD group, as would be predicted from a generally reduced reliance on top-down information. However, there was no evidence for an interaction of Intention and Group, F(1, 44) = 1.78, p = .190, ηp2 = .04, 95% CI [−8.9, 1.8], BF10 = 0.60. Separate analyses provided decisive evidence that intentions to “take” or “leave” the object biased perceptual judgments towards and away from this object in both groups, with similar effect sizes in the ASD, t(22) = 4.97, p < .001, d = 1.03, 95% CI [6.7, 16.2], BF10 = 451.7, and the NT group, t(22) = 6.05, p < .001, d = 1.26, 95% CI [5.2, 10.6], BF10 = 4809. Thus, explicit goal information biases social perception in both groups, arguing against a generalized impairment in top-down processing.

Regarding motion prediction cues, the analysis revealed an interaction of Action and Group, F(1, 44) = 4.30, p = .044, ηp2 = .09, 95% CI [0.6, 42.3], BF10 = 1.61. Analysis of the NT group provided strong evidence that seeing reaches and withdrawals biased perceptual judgements towards or away from objects, t(22) = 4.43, p < .001, d = .92, 95% CI [12.6, 34.7], BF10 = 141.5. However, for the ASD group, this analysis provided considerable evidence against such a bias, t(22) = .25, p = .804, d = .05, 95% CI [−20.6, 16.1], BF10 = 0.23. Thus, while perceptual judgments are biased by explicit intention information in ASD just as in NT participants, they are unaffected by goals implied by the action kinematics.

Finally, the analysis revealed the predicted interaction of Intention Type (spoken, heard) and Intention (take it, leave it), F(1, 44) = 17.5, p < .001, ηp2 = .28, 95% CI [3.1, 8.9], BF10 = 171.8, showing that spoken intentions induced larger perceptual shifts than heard intentions; this did not differ by Group, F(1, 44) = 1.2, p = .277, ηp2 = .03, 95% CI [−2.7, 9.0], BF10 = 0.48. As we had no further predictions, all remaining effects were evaluated against a Bonferroni-adjusted threshold to guard against alpha inflation in the ANOVA (p < .003; Cramer et al., 2016). No further main effects or interactions passed this adjusted threshold.

y-axis

As we had no predictions for the y-axis, a Bonferroni-adjusted alpha of p < .003 was employed (Cramer et al., 2016). A main effect of Action, F(1, 44) = 139.6, p < .001, ηp2 = .76, 95% CI [31.8, 44.9], BF10 = 2.597e+12, showed that reaches were perceived higher than withdrawals, consistent with a bias in motion direction (i.e., upwards for reaches, downwards for withdrawals). There also was a main effect of Intention, F(1, 44) = 28.6, p < .001, ηp2 = .394, 95% CI [2.1, 4.5], BF10 = 7116. Hand disappearance points were perceived higher if the intention was to “Take” than to “Leave”. There were no interactions between Intention and Group, F(1, 44) = .05, p = .825, ηp2 = .001, 95% CI [−2.2, 2.8], BF10 = 0.30, or between Action and Group, F(1, 44) = 1.4, p = .243, ηp2 = .03, 95% CI [−5.4, 20.8], BF10 = 0.51.

Relative weighting of motion and goal information

We next sought to establish whether explicit intention information and action kinematics exert an equal influence on perceptual judgments, or whether one is weighted more strongly than the other, and how this weighting may differ between groups. For each participant, we derived statistically orthogonal indices of the effect of each predictor type by computing the main effect of Action (withdrawals − reaches) and the main effect of Intention (Leave It − Take It). A mixed-effects ANOVA was conducted with Group (NT, ASD) as a between-subjects factor, and Intention Type (heard, spoken) and Predictor Type (motion, intention) as within-subjects factors (see Fig. 3a). The analysis revealed an interaction between Predictor and Group, F(1, 44) = 4.71, p = .035, ηp2 = .097, 95% CI [1.8, 48.2], BF10 = 1.88. The NT group showed a larger influence of motion (M = 23.7 px, SD = 25.6) than of intention information (M = 7.9 px, SD = 6.2), t(22) = 2.97, p = .007, d = 0.75, 95% CI [4.7, 26.8], BF10 = 6.52. In contrast, the ASD group showed no such difference, t(22) = .902, p = .377, d = 0.34, 95% CI [−30.4, 11.9], BF10 = .315, although numerically there was a larger influence of intention information (M = 11.4 px, SD = 11.0) than of motion information (M = 2.2 px, SD = 42.5).
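The two per-participant indices, and a simplified version of the Predictor × Group test, can be sketched as follows; collapsing over Intention Type, the 2 × 2 interaction F is equivalent to the square of an independent-samples t statistic computed on the per-participant difference between the two indices. Column and file names are hypothetical.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical per-participant condition means of the x-axis bias (pixels),
# one row per participant x action x intention cell, collapsed over
# Intention Type and the stimulus factors.
cells = pd.read_csv("participant_cell_means.csv")
wide = cells.pivot_table(index=["participant", "group"],
                         columns=["action", "intention"], values="bias_x")

# Per-participant indices described above: main effect of Action
# (withdrawal - reach) and main effect of Intention (leave - take).
motion = wide["withdrawal"].mean(axis=1) - wide["reach"].mean(axis=1)
intention = (wide.xs("leave", axis=1, level="intention").mean(axis=1)
             - wide.xs("take", axis=1, level="intention").mean(axis=1))

# Simplified Predictor x Group test on the per-participant difference scores.
diff = (motion - intention).rename("motion_minus_intention").reset_index()
nt = diff.loc[diff["group"] == "NT", "motion_minus_intention"]
asd = diff.loc[diff["group"] == "ASD", "motion_minus_intention"]
print(ttest_ind(nt, asd))
```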

Fig. 3 a The relative weighting of Motion information and Intention information on the perceptual shift for the NT (left) and ASD (right) groups. Error bars represent standard error of the mean. b The relationship between the weighting of Motion and Intention information for each participant in the NT (left) and ASD (right) groups

There was a three-way interaction of Intention Type, Predictor, and Group, F(1, 44) = 6.12, p = .017, ηp2 = .122, 95% CI [5.7, 55.2], BF10 = 3.22. When analysed separately, the NT group provided considerable evidence against an interaction of Intention Type and Predictor, F(1, 22) = .78, p = .387, ηp2 = .034, 95% CI [−12.7, 5.1], BF10 = 0.311: perceptual judgments were weighted more in favour of motion information than intention information in both the Spoken and Heard conditions. This interaction was, however, present in the ASD group, F(1, 22) = 5.34, p = .031, ηp2 = .195, 95% CI [2.7, 50.5], BF10 = 1.96, who, unlike the NT group, weighted intention information more strongly than motion information in the Spoken condition but not the Heard condition.

Exploratory correlational analyses

We assessed the relationship between the weighting of motion and intention information across participants (see Fig. 3b). The NT group showed no correlation in the perceptual shift caused by motion and intention information, r = .136, n = 23, p = .536, 95% CI [−.314, .585], BF10 = 0.31. However, in the ASD group, those participants who afforded less weight to motion information gave more weight to intention information, r = −.503, n = 23, p = .014, 95% CI [−.895, −.111], BF10 = 4.33.

Finally, we tested whether the relative influence of motion and intention information was associated with individual differences in IQ or autistic traits, as measured by the AQ and its subscales. For each participant, the perceptual shift elicited by intention information was subtracted from that elicited by motion kinematics, indexing the extent to which each participant weighted each predictive source. The NT group exhibited no relationship between this relative weighting and the overall AQ score, any of its subscales, or the verbal or perception WASI subtests. The ASD group exhibited no relationship between autistic-like traits and the preference for motion or intention information, except for a negative association with the social skills subscale, r = −.538, p = .008, 95% CI [−.920, −.155], BF10 = 7.0, which was marginal against the Bonferroni-corrected threshold of p = .008: the greater the difficulties in social skills, the larger the relative influence of explicit intention information on the perceptual bias. Moreover, there were significant relationships between the extent of motion information weighting and performance on the verbal, r = .489, p = .018, 95% CI [.094, .885], BF10 = 3.64, and perceptual subtests, r = .523, p = .010, 95% CI [.137, .910], BF10 = 5.70, indicating that the higher the IQ score, irrespective of domain, the greater the influence of motion information compared with intention information.
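These exploratory correlations can be reproduced along the following lines; the per-participant score table and its column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical per-participant table containing the relative weighting index
# (motion effect minus intention effect, as above) together with AQ subscale
# and WASI subtest scores.
scores = pd.read_csv("participant_scores.csv")
asd = scores[scores["group"] == "ASD"]

r, p = pearsonr(asd["motion_minus_intention"], asd["aq_social_skills"])
print(f"social skills subscale: r = {r:.3f}, p = {p:.3f}")

r, p = pearsonr(asd["motion_minus_intention"], asd["wasi_verbal_t"])
print(f"WASI verbal subtest:    r = {r:.3f}, p = {p:.3f}")
```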

Discussion

Here, we tested whether social perception in autism spectrum conditions is impervious to top-down expectation (e.g., Palmer et al., 2015; Van de Cruys et al., 2014). Replicating our prior work (Hudson, Bach, & Nicholson, 2018; Hudson, Nicholson, Ellis, & Bach, 2016; Hudson, Nicholson, Simpson, et al., 2016), in NT participants, perceptual judgments of others’ actions were predictively biased by both the goals implied by the observed kinematics (a reach or withdrawal) and by the explicit intentions (to “take” or “leave” the object). The ASD participants differed markedly from this pattern, but in a manner inconsistent with a generally weaker top-down influence. The explicit intention statements to “take” or “leave” the object biased perceptual judgments as in NT participants, revealing that top-down influence of higher-level goal information onto action perception is fundamentally intact. However, ASD participants exhibited no perceptual shift in response to the goals implied by the observed movements. Thus, top-down expectations about another’s goals shape social perception in ASD just as they do in NT individuals, but only if these expectations are explicitly provided and do not have to be spontaneously inferred from the observed kinematics itself.

These findings challenge the growing consensus that social perception in ASD is not influenced by top-down modulation (e.g., Palmer et al., 2015; Van de Cruys et al., 2014). Instead, our findings imply a more differentiated account in which prediction channels for perceptually derived top-down information are down-weighted in ASD while others remain intact, perhaps guided by the relative reliability of the prediction sources (Ernst & Banks, 2002). Indeed, our analysis of the relative weighting of both types of intention cues found that neurotypical participants weighted prior motion information more strongly than intention information, as expected given that current motion is the more reliable predictor of future position. In contrast, for the ASD group, explicit intention information exerted a greater influence on the top-down predictive bias than action kinematics, specifically when these intentions were given to the actor by the participants (rather than passively heard). In addition, in ASD participants, but not in the NT group, the weighting of the two predictors was negatively associated, with the influence of intention information increasing the more that of kinematic information decreased. This suggests a difficulty integrating the two sources of information, which is resolved by down-weighting kinematic relative to explicit intention information (Ernst & Banks, 2002; Hudson, Bach, & Nicholson, 2018; Zaki, 2013), although these results should be interpreted with caution given the small sample sizes.

The question then is why autistic people would form predictions only when the predictive cue is made explicit. This may speak to the hierarchical organization of the social perception system. Higher-level predictions, such as the assumption that others will behave in the most efficient way possible to achieve their goals (Marsh et al., 2015), generalize across space and time and are translated into expectations about what one will perceive further down the hierarchy, possibly via goal-to-kinematic transformations in the motor system (Kilner, 2011). Action expectations can, however, also form locally, within the perceptual system for analysing biological motion (Hudson, McDonough, et al., 2018; McDonough et al., 2019; Scholl & Gao, 2013). Our data suggest that it is in these low-level prediction processes that the differences in social perception may originate (Palmer et al., 2017). This could reflect either the use of kinematics to generate perceptual predictions about an action’s future course, or the derivation of a coherent motion percept from the sequential motion frames, which requires predictive processes (Kourtzi & Shiffrar, 1999; Yantis & Nakama, 1998) but is compromised in ASD (David et al., 2010).

These findings of a reduced reliance on motion prediction map nicely onto wider issues in biological motion perception in ASD, whereby small reductions in biological motion sensitivity may give rise to upstream difficulties in the interpretation of social information from motion cues, such as emotion (for meta-analyses, see Federici et al., 2020; Todorova et al., 2019). Interestingly, motion perception sensitivity is related to the extent of the autistic condition (Kaiser & Shiffrar, 2009), and we similarly found that the underweighting of motion relative to intention information was negatively associated with IQ and social skills. While these findings need to be confirmed in more complex and realistic behaviours, they suggest that the shift from spontaneous inferences to explicit information could become more pronounced in a more heterogeneous sample that encompasses all levels of the autistic spectrum. Moreover, perceptual processes may still be recruited for prediction if observers are provided with explicit information about the reliability of the sensory information (e.g., Hudson, Bach, & Nicholson, 2018).

The current findings have implications beyond the domain of social perception. Atypicalities in social perception overlap with those in nonsocial perception, reflecting a broader difference in integrating top-down input that results in both perceptual enhancements and impairments (Behrmann et al., 2006; Mottron et al., 2006). An increased reliance on bottom-up sensory evidence at the expense of top-down information can explain many characteristics of perceptual anomalies in ASD, such as the preference for local details over global configuration (Happé & Frith, 2006), higher motion coherence thresholds (Milne et al., 2002; Zaidel et al., 2015), or the reduced use of complex/second-order motion cues (Bertone et al., 2003; Kaiser & Shiffrar, 2009). We suggest that such findings may reflect not a general reduction in the use of high-level information to shape sensory input, but rather a specific difficulty in spontaneously generating the relevant inferences from the input itself.

Conclusion

Social perception in ASD is subject to top-down modulation, but only if this information is explicitly given as abstract information that is independent of the stimulus itself. Such perceptual processes in the social domain may represent an especially notable example of a more general atypicality in ASD perception. This opens up a new avenue for perceptual research in ASD: perceptual processes may in fact be open to top-down influences if the relevant information is explicitly provided.