Social communication is a multifaceted process, underpinned by fundamental skills including the interpretation of the emotional states and thoughts of others (Adolphs 2001; Frith and Blakemore 2006). Recognition of expressed emotions involves integrating nonverbal and verbal cues, including facial expressions, body gestures, contextual cues and vocal prosody (Borod et al. 2000). The developmental course of emotion processing typically begins in early infancy, with individuals continuing to develop their emotion recognition repertoire with age (Durand et al. 2007). Infants aged 4–7 months show an emerging ability to recognise happy and sad expressions (Walker-Andrews 1998; Young-Browne et al. 1977). By the age of 10, children have mostly achieved adult-level proficiency in emotion recognition (Mondloch et al. 2003). As individuals progress into adolescence and adulthood, they gradually become more efficient at processing increasingly complex emotions.

Atypical emotion recognition processing is postulated to underpin the social communication difficulties observed in autism spectrum disorders (ASD) (Baron-Cohen 2004; Williams and Gray 2013). Behavioural studies investigating the emotion processing abilities of individuals with ASD across the developmental trajectory report that emotion recognition abilities do not mature progressively in individuals with ASD as they do in neurotypical individuals (Lozier et al. 2014). While reduced emotion recognition accuracy has been reported for both basic (Bölte and Poustka 2003; Eack et al. 2015; Falkmer et al. 2011; Griffiths et al. 2017; Uljarevic and Hamilton 2013) and complex emotions among individuals with ASD with normal intellectual functioning (Fridenson-Hayo et al. 2016; Golan et al. 2006a), some studies report comparable emotion recognition performance to neurotypical controls (Castelli 2005; Leung et al. 2013). The heterogeneity in the current literature limits our ability to delineate the precise developmental course of emotion processing in ASD. Discrepancies in these findings are likely due to variations in experimental demands and demographic factors (Harms et al. 2010). For instance, studies assessing the recognition of basic emotions often yield mixed findings, with some reporting comparable recognition in individuals with ASD (McCabe et al. 2013; Spezio et al. 2007) and others observing decreased basic emotion recognition accuracy for individuals with ASD (Berggren et al. 2016; Falkmer et al. 2011). Other studies suggest that emotional valence influences emotion recognition performance in ASD, with accuracy differing for negatively valenced, but not positive, basic emotions (Ashwin et al. 2006). There is also evidence of a selective impairment for emotions presented for briefer durations, suggesting delayed spontaneous processing of emotional expressions in individuals with ASD (Clark et al. 2008).

Variation in stimulus presentation may offer another explanation of the heterogeneous findings reported in the current literature (Harms et al. 2010; Nuske et al. 2013). Contemporary emotion recognition experimental designs in ASD have largely focused on presenting static and typical basic emotions, with few studies employing dynamic stimuli with naturalistic elements. Static prototypical stimuli may not accurately reflect the demands of everyday social interactions, which often require rapid processing of multiple social cues. This is in line with previous research comparing the performance of people with ASD and neurotypical controls on both static and dynamic emotional stimuli, which concluded that emotion recognition accuracy during viewing of dynamic stimuli more consistently differentiated these groups (Cassidy et al. 2015; Chevallier et al. 2015). Neuroimaging studies have reported differential neural activation amongst individuals with ASD when presented with dynamic relative to static expressions, with dynamic expressions associated with reduced activation in social brain regions such as the amygdala and fusiform gyrus (Pelphrey et al. 2007). A more nuanced understanding of the multimodal emotion processing abilities of individuals with ASD may depend on additional research employing more ecologically valid stimuli.

To date, ASD-related research adopting dynamic naturalistic stimuli has predominantly focussed on evaluating impairments in mentalising or theory of mind (Mathersul et al. 2013; Müller et al. 2016; Murray et al. 2017; Rosenblau et al. 2015), with comparatively fewer studies utilising dynamic naturalistic stimuli specifically in the context of emotion recognition. One such assessment is the Reading the Mind in Films Task (RMFT). This task examines the ability to identify complex emotional concepts from video clips of social scenes. Previous work using the RMFT has observed impaired emotion recognition accuracy in children and adults with high-functioning ASD compared to neurotypical controls (Golan et al. 2006b, 2008). While this research suggests that ASD may be associated with difficulties in extracting complex emotional information from social scenes, further exploration of the underlying mechanisms contributing to this impairment is warranted (Bird et al. 2011; Klin et al. 2002; Nakano et al. 2010).

Eye tracking may provide valuable insights into the visual processing mechanisms underpinning the emotion recognition performance of individuals with ASD. Systematic reviews in this area report collective evidence for divergent gaze patterns towards facially expressed emotions and eye avoidance amongst adults with ASD (Black et al. 2017; Harms et al. 2010). However, findings from eye tracking studies in ASD are somewhat inconsistent, a pattern linked to factors such as heterogeneity in study design and participant demographics (Guillon et al. 2014). While some studies report that adolescents and adults with ASD have reduced eye gaze towards dynamic faces and social stimuli compared to neurotypical controls (Bird et al. 2011; Klin et al. 2002), others report similar visual scanning patterns when comparing participants with and without ASD (Kuhn et al. 2010; Nakano et al. 2010). The role of atypical gaze strategies in contributing to ASD-linked difficulties in recognising complex emotional information from naturalistic social contexts needs to be further understood.

The present study therefore sought to examine the visual processing mechanisms of adults with ASD during the recognition of complex emotional concepts presented within naturalistic social scenes. In light of previous research (Golan et al. 2006b), we administered the RMFT to adults with ASD and neurotypical controls. Eye gaze was recorded throughout this assessment, enabling comparison of the visual processing mechanisms of both groups. Specifically, it was hypothesised that

  1. Adults with ASD compared to neurotypical adults would be less accurate in recognising the complex emotional concepts presented in social scenes.

  2. Adults with ASD compared to neurotypical adults would exhibit a reduced percentage of fixation time towards target characters (i.e. the subject of the emotional information) and greater fixation time towards non-social display regions for each social scene.

Methods

Participants

Adults with ASD and neurotypical adults were recruited via the Curtin University Autism Research Group community pool and local service providers throughout Perth, Western Australia. Participants with ASD had a confirmed diagnosis of ASD based on consensus from a multidisciplinary team, Asperger’s syndrome (AS), or Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS) as specified under the Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5) (American Psychiatric Association 2013) or according to the previous DSM-IV diagnostic criteria (American Psychiatric Association 2000). At present, diagnostic assessment by a multidisciplinary team is considered the ‘gold standard’ for autism diagnosis in Australia (Whitehouse et al. 2018). Participants with ASD diagnosed with significant neurodevelopmental and/or mental health disabilities, such as intellectual disability, epilepsy and bipolar disorder, were excluded. Neurotypical adults with no history of neurodevelopmental disorder or current psychiatric diagnoses, who scored below the clinical cut-off (raw score of 68) for autistic traits on the adult self-report Social Responsiveness Scale (SRS-2) (Constantino and Gruber 2011), were eligible for inclusion. All participants were required to have sufficient understanding of verbal and written English, and normal or corrected vision. In total, 69 individuals were initially recruited for this study. Twelve adults with ASD were subsequently excluded, due to the absence of a confirmed ASD diagnosis (n = 2), significant comorbidities (intellectual disability, n = 2; bipolar disorder, n = 1), technical issues during data recording (n = 5), or eye tracking calibration failure (n = 2). Nine neurotypical adults were excluded for scoring above the SRS-2 cut-off for clinically significant symptoms (n = 5) or because of technical issues in data recording (n = 4).
Neurotypical adults were matched with the participants with ASD according to age, gender, Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI) and Full Scale Intelligence Quotient (FS-IQ) (Wechsler 2011). The final sample included 23 adults with ASD (age M = 25.81, SD = 9.91 years) and 25 matched neurotypical adults (age M = 27.31, SD = 9.00 years). Table 1 summarises the participants’ demographic characteristics. This study was approved by the Human Research Ethics Committee at Curtin University, Perth, Western Australia (Approval Number: 52/2012).

Table 1 Participants characteristics

Measures

Social Responsiveness Scale—Second Edition

The Social Responsiveness Scale, Second Edition (SRS-2) Adult Self Report (Constantino and Gruber 2011) screens for autistic trait severity and consists of 65 four-point Likert-scaled items relating to five dimensions: social awareness, social cognition, social communication, social motivation, and restricted interests and repetitive behaviours. Total raw scores above 67 indicate mild to severe autism symptomology. The SRS-2 has demonstrated excellent internal consistency (rtt = .94–.96), good test–retest reliability (rtt = .88–.95) and reliability coefficients (rtt = .61–.92) (Bruni 2014), and is validated cross-culturally (Bölte, Poustka and Constantino 2008).

Wechsler Abbreviated Scale of Intelligence—Second Edition

The Wechsler Abbreviated Scale of Intelligence, Second Edition (WASI-2) (Wechsler 2011) is the abbreviated version of the Wechsler Adult Intelligence Scale, Fourth Edition, assessing general cognitive abilities across verbal reasoning (Vocabulary and Similarities) and perceptual reasoning (Block Design and Matrix Reasoning). The WASI-2 has shown excellent internal consistency (rtt = .90–.92), stability coefficients (rtt = .83–.94) and inter-rater reliability (rir = .94–.99) (McCrimmon and Smith 2012).

Reading the Mind in Films Task

The Reading the Mind in Films-Adult Task (RMFT) provides an ecologically valid assessment of complex emotion recognition skills (Golan et al. 2006b), comprising 22 social scenes taken from four movies. Each scene conveys an emotional state of a specified character within varying social contexts. For example, social scenes vary in the number of characters (one to five main characters) and in setting (home and public settings). Prior to watching each scene, a target character is specified and participants are asked to identify the emotion that character expresses at the end of the scene. All film clips include the dialogue from the original movie. After each film clip, a question slide with four multiple choice options appears and participants indicate which option best represents the emotion of the target character at the end of the scene by pressing a key on a keyboard.

Apparatus

Eye tracking data was recorded using the SensoMotoric Instruments 60 Hz remote eye tracker (RED), a stationary contact-free device, capturing movements within a 40 × 20 cm range at a distance of 70 cm. The RED recording unit was integrated with two external devices: a laptop controlling data acquisition and a 40 inch television screen for stimuli presentation. The experimental setup was managed via SMI Experiment Centre and presented at a resolution of 800 × 600 pixels. IViewX, sampling at 60 Hz, managed the eye movement data acquisition.

Procedure

Following completion of socio-demographic questionnaires and screening assessments, participants were oriented to the eye tracker and completed a five-point calibration protocol. Because the RMFT scenes were sampled from four featured movies and series, participants first indicated whether they had previously watched any of these films. All participants confirmed that they were naïve to all four movies. Verbal instructions were then provided, followed by one practice trial. The question slide orienting the participants to a target character and four multiple choice options was shown prior to each video clip. This was followed by a one second fixation cross and the video clip. After the video clip, the question slide was presented for a second time and participants were asked to indicate their answer on the keyboard before moving on to the next question. The presentation order for each task is shown in Fig. 1. No restriction was placed on response time. All participants were provided with a handout listing definitions of each emotion.

Fig. 1 Order of presentation for each question in the Reading the Mind in Films Task. (1) Question slide, (2) fixation cross to validate the eye tracking recording, (3) video, (4) question slide presented a second time, at which point participants recorded their answers (Golan et al. 2006a; 2006b, p. 116)

Data Preparation

Behavioural and eye tracking data output was obtained from SMI BeGaze software and managed using SPSS Version 24. Trials were excluded from analysis if the eye tracker failed to detect any gaze throughout the trial. Percentage accuracy scores were calculated by dividing the number of correct responses by the number of included trials and multiplying by 100. Gaze measures were derived from correct response trials only, so that gaze behaviour was examined only in instances where the emotion was successfully recognised.
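The scoring rule described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the study used SMI BeGaze and SPSS); the `Trial` record and both function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Trial:
    """Hypothetical per-trial record; field names are illustrative assumptions."""
    correct: bool        # chosen option matched the target emotion
    gaze_detected: bool  # False if the tracker recorded no gaze in the trial


def percentage_accuracy(trials: Iterable[Trial]) -> Optional[float]:
    """Correct responses as a percentage of included (gaze-detected) trials."""
    included = [t for t in trials if t.gaze_detected]
    if not included:
        return None  # accuracy is undefined when every trial was excluded
    n_correct = sum(t.correct for t in included)
    return 100.0 * n_correct / len(included)


def gaze_analysis_trials(trials: Iterable[Trial]) -> List[Trial]:
    """Trials retained for gaze measures: included and answered correctly."""
    return [t for t in trials if t.gaze_detected and t.correct]
```

Under these assumptions, a trial with no detected gaze is removed from the denominator instead of being scored as incorrect.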

Participants with calibration deviation exceeding 1.5° of visual angle were excluded from the analysis. Fixations were defined as gaze samples held within 1° of visual angle for a minimum duration of 100 ms (Falkmer et al. 2008). Rectangular areas of interest were then dynamically defined over the facial and body regions of the ‘target character’ (central character identified in the video) and ‘other characters’ (supporting characters in the video scene). An example of the defined areas of interest is shown in Fig. 2. A fifth area of interest, ‘elsewhere’, was defined as the remaining areas of the display not occupied by the other areas of interest. For each video, the total fixation time to each area of interest was then calculated as a percentage of the video duration. From this, the average percentage of fixation time was derived for the ‘target character face’, ‘target character body’, ‘other character face’, ‘other character body’ and ‘elsewhere’ areas of interest.
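The percentage fixation time measure described above can be sketched as follows. This is an illustrative Python example, not the BeGaze computation: it assumes fixations have already been detected and labelled with an area of interest, and the `(label, duration_ms)` input format and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Area-of-interest labels as named in the text.
AOIS = (
    "target character face",
    "target character body",
    "other character face",
    "other character body",
    "elsewhere",
)

MIN_FIXATION_MS = 100.0  # minimum fixation duration stated in the text


def percent_fixation_time(
    fixations: Iterable[Tuple[str, float]], video_ms: float
) -> Dict[str, float]:
    """Total fixation time per area of interest, as a percentage of video duration.

    Each fixation is a hypothetical (aoi_label, duration_ms) pair.
    """
    if video_ms <= 0:
        raise ValueError("video_ms must be positive")
    totals = {aoi: 0.0 for aoi in AOIS}
    for label, duration_ms in fixations:
        if label not in totals:
            raise ValueError(f"unknown area of interest: {label!r}")
        if duration_ms >= MIN_FIXATION_MS:  # shorter events are not counted
            totals[label] += duration_ms
    return {aoi: 100.0 * total / video_ms for aoi, total in totals.items()}


def mean_percent_across_videos(per_video: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-video percentages for each area of interest."""
    if not per_video:
        raise ValueError("need at least one video")
    return {aoi: sum(v[aoi] for v in per_video) / len(per_video) for aoi in AOIS}
```

Because the denominator is the full video duration, the five percentages in this sketch need not sum to 100: time spent in saccades, blinks or tracking loss is not attributed to any area of interest.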

Fig. 2 Example of the defined areas of interest: ‘target character face’, ‘target character body’, ‘other character face’ and ‘other character body’. Areas outside the defined areas of interest were specified as ‘elsewhere’ (Golan et al. 2006a; 2006b, p. 116)

Statistical Analysis

Group characteristics of the ASD and neurotypical adults were compared using independent samples t tests for continuous data. To determine whether the groups were matched on gender distribution, gender was compared using a Chi square test. Greenhouse–Geisser correction was applied for analyses violating the assumption of sphericity. Effect sizes were reported as partial eta squared (partial η2), with the alpha level set at α = 0.05.
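For readers unfamiliar with the effect size reported here, partial eta squared is the effect sum of squares divided by the sum of the effect and error sums of squares, which can equivalently be recovered from an F statistic and its degrees of freedom. The Python sketch below illustrates both standard forms; the function names are hypothetical and the example values in the note are made up, not taken from this study.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from sums of squares: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error < 0 or (ss_effect + ss_error) == 0:
        raise ValueError("sums of squares must be non-negative and not both zero")
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value: float, df_effect: float, df_error: float) -> float:
    """Equivalent form using F and its degrees of freedom.

    Follows from F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

With illustrative values, `partial_eta_squared(10.0, 90.0)` and `partial_eta_squared_from_f(4.0, 1.0, 36.0)` describe the same effect, since F × df_effect / df_error equals SS_effect / SS_error.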

Results

Emotion Recognition Accuracy

Emotion recognition accuracy was analysed using one-way analysis of variance (ANOVA) to determine group differences between participants with and without ASD. No significant group difference in emotion recognition accuracy was found, F(1,45.06) = 0.22, p = 0.64, partial η2 = 0.01. Participants with ASD performed similarly to the neurotypical participants in recognising complex emotions (ASD, M = 55.50%, SD = 15.10; neurotypical, M = 57.45%, SD = 13.82).

Percentage Fixation Time

To examine the group differences in fixation time, a group (ASD vs. neurotypical) by area of interest (target character face vs. other character face vs. target character body vs. other character body vs. elsewhere) factorial repeated measures ANOVA was conducted. As Mauchly’s Test of Sphericity indicated a violation of the assumption of sphericity, χ2(9) = 152.75, p < 0.01, Greenhouse–Geisser correction was applied. Analysis revealed a main effect of area of interest, F(1.6, 28979.38) = 146.93, p < 0.01, partial η2 = 0.76, which importantly was qualified by a significant group by area of interest interaction, F(1.6, 732.09) = 3.71, p = 0.04, partial η2 = 0.08.

Follow-up pairwise comparisons with Bonferroni adjustments indicated that participants with ASD (M = 20.24%, SD = 14.73) had increased fixation time towards ‘elsewhere’ compared to neurotypical participants (M = 13.38%, SD = 6.21), p = 0.04 (Fig. 3). There was a near-significant trend indicating that participants with ASD (M = 39.14%, SD = 11.65) had reduced fixation time towards ‘target character face’ compared to neurotypical adults (M = 45.68%, SD = 11.50), p = 0.06. No other group differences in fixation time were found.

Fig. 3 Average proportion of fixation time across all areas of interest (target character face, target character body, other character face, other character body and elsewhere) in adults with ASD and neurotypical adults

Discussion

The current study investigated the gaze strategies of adults with ASD and neurotypical controls during the processing of complex emotional content in naturalistic social scenes. The findings revealed an interesting contrast between the emotion recognition performance and the visual processing mechanisms employed by each group. While participants with ASD demonstrated comparable emotion recognition performance to controls, there was evidence of divergent gaze patterns towards non-social information. These findings offer insights into the possible visual processing strategies adopted by individuals with ASD during the processing of social information and recognition of complex emotional social scenes.

This study predicted that adults with ASD would be less accurate in recognising complex emotions presented in a dynamic ecologically valid assessment, a hypothesis consistent with previous research (Golan et al. 2006b, 2008; Müller et al. 2016). This prediction was, however, not supported, with comparable emotion recognition accuracy observed across adults with ASD and their neurotypical peers. These findings suggest that adults with ASD with normal to above-average intelligence quotients have an intact ability to recognise emotions during the viewing of social scenes (Gepner et al. 2001; Hillier and Allinson 2002; Tracy et al. 2011), and that emotion recognition difficulties may not be universally present in ASD (Nuske et al. 2013).

It is possible that certain methodological factors may have contributed to the discrepancies in emotion recognition accuracy findings between the current study and those of Golan et al. (2006b). Previous research indicates that the emotion recognition difficulties of individuals with ASD become increasingly apparent during tasks with increasing complexity (Harms et al. 2010; Nuske et al. 2013). The presence of complex emotions presented in an ecologically valid manner suggests that the RMFT should be sufficiently complex for adults with ASD. The RMFT contains four multiple choice options, reducing the probability of a correct response by chance alone, and both groups performed above chance levels in this study. In addition, no ceiling effect was observed for either group. Despite the advantages of the RMFT, an absence of difference in emotion recognition accuracy suggests that the RMFT may not be sufficiently complex to elicit differences in emotion recognition performance between individuals with ASD and neurotypical individuals. A previous review reported emotion recognition deficits in ASD typically appear during tasks with increasing attentional and cognitive demands, including tasks providing explicit prompting of the presented emotion (Nuske et al. 2013). Guided by the previous experimental protocol of the RMFT, this task is explicit in its presentation of social scenarios, with scenarios bookended by questions such as, “At the end of the scene, how is the younger man feeling?” presented before and after the social scenario. These questions explicitly direct participants towards the central character within a scenario and may serve to alleviate some of the cognitive demand associated with emotion processing (Klin 2000), possibly enabling the participants with ASD to correctly recognise the emotion.
Pre-prompting was previously reported to improve the emotion recognition performance and alter the visual search patterns in ASD, reducing the deviance observed between individuals with ASD and neurotypical individuals (Joosten et al. 2016; Kliemann et al. 2013; Senju 2013). Additionally, the RMFT presented response items in a multiple choice format and allowed unrestricted time to respond, possibly alleviating some of the cognitive demand associated with the processing requirements of these tasks (Clark et al. 2008; Klin 2000; Nuske et al. 2013; Tardif et al. 2007). Further insights may be gained from designs which impose constraints on response time and present varying, or even open-ended, response formats.

The eye tracking measurements obtained in this study revealed that, in comparison to controls, the individuals with ASD spent a greater percentage of time fixating on the non-social regions of the scenes, which were perhaps more peripheral in conveying emotional tone. No other significant differences in eye gaze were found, although one trend was observed suggesting that individuals with ASD may have spent less time processing the target characters, who were the subject of the emotional content of each scene. These findings are consistent with research reporting that individuals with ASD direct less of their visual attention towards socially salient stimuli (Chita-Tegmark 2016; Guillon et al. 2014) and may not prioritise social information in the same way as neurotypical individuals (Sato et al. 2017).

Interestingly, the atypical visual processing of social stimuli was observed in conjunction with comparable behavioural performance. This combination of results suggests that the individuals with ASD may have employed altered processing strategies during emotion recognition (Harms et al. 2010). Given that comparable visual attention was found towards body regions, together with increased fixation time towards ‘elsewhere’, it is possible that the participants with ASD were utilising other visual cues, such as body language or contextual information, to successfully infer the emotional state of the target character. Consistent with this notion, previous studies have reported that individuals with ASD may show comparable recognition of body language, but reduced recognition of facially expressed emotion, suggesting varying emotion recognition abilities in individuals with ASD across different modalities (Nuske et al. 2013; Peterson et al. 2015). Future research might experimentally manipulate these regions, such as through occlusion, examining the differential impact this has on the performance of individuals with ASD compared to controls. Findings from this study highlight the need to further explore the potential role of the altered processing style employed by individuals with ASD during the processing of naturalistic social emotional information. Future research may seek to further explore why atypical gaze patterns present in the absence of concurrent emotion recognition difficulties.

Dynamic ecologically valid assessments such as the RMFT present social information with visual and auditory information, permitting the evaluation of the multisensory integration abilities of participants with ASD in understanding emotion cues (Magnée et al. 2011). While adults with ASD demonstrated intact emotion recognition behavioural performance, the differences in eye tracking results observed between adults with ASD and neurotypical adults suggest an altered audio-visual processing strategy in ASD. It is plausible that individuals with ASD may draw on auditory information when interpreting complex social emotional scenarios (Rice et al. 2012). However, since unimodal visual versus auditory processing was not investigated, conclusive evidence on the audio-visual processing abilities of individuals with ASD could not be determined in this study. Research examining the effect of separating the visual and auditory information presented in the RMFT on the performance of individuals with ASD relative to neurotypical controls may provide further insights into audio-visual processing and emotion recognition abilities in individuals with ASD.

Limitations

The RMFT employed in the current study presented videos of social scenarios with visual and auditory information varying in nuanced emotional tone. The ecological validity of this task represents a strength of the present work, enabling unique insights into how complex emotional content may be extracted from naturalistic social situations and furthering understanding of the nature of emotion processing in adults with ASD. However, limitations of the present study are also acknowledged, including the nature of the clinical sample used. All participants with ASD in the present study may be considered higher-functioning, as they all exhibited normative levels of verbal and intellectual ability. It is possible that this may have contributed to the higher levels of emotion recognition accuracy observed for this group (Harms et al. 2010). Caution in the generalisation of the present findings is therefore warranted. While the present findings suggest that ASD is associated with altered attentional processing of social emotional scenes, it is noted that only a modest sample size was collected for the present study. Future research may seek to determine whether the present findings can be replicated across larger samples.

Conclusion

In summary, the present study provided evidence that the recognition of complex emotional concepts may be intact for adults with ASD in response to the viewing of naturalistic social scenarios. Concurrent findings of aberrant gaze behaviour for these individuals, however, point towards an altered processing style compared to neurotypical perceptions of social salience. This study provides understanding of the emotion recognition mechanisms which may characterise ASD.