By their very nature, social interactions are fluid and require the continual monitoring of our interactional partners. Our visual system is well designed for this task, making a saccade to a new location approximately three to five times per second (Krekelberg, 2011). Directing our attention toward others’ faces, particularly their eyes, is an important social mechanism (Bodenschatz et al., 2019; Emery, 2000; Ristic et al., 2005; Tomasello, 1995). For instance, through our eyes and facial expressions, we can communicate aspects of our personality, convey humour, our current emotional state and where we are currently directing our attention. Furthermore, during face-to-face interactions, neurotypical (NT) participants have been found to fixate more on an interaction partner’s eyes when listening rather than speaking (Ho et al., 2015), when information is ambiguous (Macdonald & Tatler, 2013) or to indicate that their conversational turn has finished (Ho et al., 2015).

Given the importance of how and where our eyes are directed during everyday social interactions, it is perhaps unsurprising that studies have made links between social and relational difficulties in certain groups of individuals and their use of gaze (Dawson et al., 2004; Horley et al., 2003; Langdon et al., 2006; Sasson et al., 2016; Senju & Johnson, 2009; Shean & Heefner, 1995; Wieser et al., 2009). One such group are autistic children and adults (Dawson et al., 2004; Senju & Johnson, 2009). In a large meta-analysis examining whether autistic individuals differ in their social attention and interest in the social world, Chita-Tegmark (2016) found a medium effect for overall diminished social attention for autistic individuals in eye-tracking studies, though the mixed findings across studies may reflect methodological differences. Chita-Tegmark (2016) highlighted some of the possible moderating factors that might account for differing results in the field, including the nature of the stimuli, pointing out that static stimuli (for instance, photos of social scenes) lack biological motion and that this may moderate the impact on whether differences emerge between autistic and non-autistic participants. Similarly, in a systematic review of eye-tracking studies in autism, Black et al. (2017) found that there was significant evidence of a reduced focus on the eye region of emotionally expressive faces in autistic adults, but the evidence was much weaker for autistic adolescents and children. Some have criticised the tests of social attention used in these eye-tracking studies as being unrealistic and lacking ecological validity (Fletcher-Watson et al., 2009; Kingstone, 2009), often presenting “social” information as static photographs depicting only the face (Dalton et al., 2005; Ristic et al., 2005; Rutherford & Towns, 2008), lacking contextual cues, which might otherwise act as attentional triggers to compete with the face. 
A face presented with little context is likely to be attended to by all participants (Cole et al., 2016), but such data do not capture participants’ spontaneous gaze usage in real-world settings. Some have argued that real-person interactions are essential for understanding social attention (Cole et al., 2016). Furthermore, some researchers have highlighted the importance of establishing whether social attention is engaged similarly when a social partner is physically present or not (Freeth & Bugembe, 2019; Risko et al., 2016). It is notable that in some autobiographical accounts, the autistic authors report difficulties in the use and timing of gaze during real-life social interactions, with some authors describing feelings of violation and sensory overload (Trevisan et al., 2017).

Attempts have been made to investigate the use of gaze during face-to-face interactions among autistic and NT individuals. These studies have primarily employed either video recording of gaze patterns or some form of eye-tracking device, with a predetermined interaction between a participant and a researcher, including measuring eye gaze during the completion of an increasingly challenging arithmetic task (Riby et al., 2012) and several studies that have sought to manipulate components of the interaction, such as specifically encouraging direct or averted gaze by a participant or researcher (Birmingham et al., 2017; Falck-Ytter et al., 2015; Freeth & Bugembe, 2019; Riby et al., 2012). A rare example of a more naturalistic study of eye gaze with autistic participants was conducted by Hanley et al. (2015), which involved a real-life social interaction with the experimenter about social topics, in which they found subtle differences in attention to the eye region between the autistic and NT participants.

Several studies have reported gaze behaviour towards interaction partners that bears some similarity between NT and autistic participants. For instance, Birmingham et al. (2017) found that, despite the autistic youths being slower to follow the experimenter’s gaze, as a form of joint attention, most participants (ASD and NT) followed the gaze in at least half the trials. Doherty-Sneddon et al. (2013) found typical amounts of gaze aversion while thinking in autistic youths but noted that they used less gaze aversion with unfamiliar conversational partners than the neurotypical control group. In a study with autistic adults, Falkmer et al. (2011) found no differences between samples in the proportion of fixations across static and dynamic stimuli but did find that the control group had longer mean fixation durations than the autistic sample.

Some have argued that autistic individuals modulate their gaze behaviour similarly to NT individuals, at least to some degree, during an interaction (Doherty-Sneddon et al., 2012; Doherty-Sneddon et al., 2013; Freeth & Bugembe, 2019; Riby et al., 2012). However, it has been noted that in experimental paradigms that encourage or facilitate spontaneous gaze behaviour, autistic participants engage in smaller proportions of gaze fixations on the eye or face region (Falck-Ytter, 2015; Falck-Ytter et al., 2015; Freeth & Bugembe, 2019; Hanley et al., 2015) than NT participants. There is considerable heterogeneity in the design of gaze behaviour studies in autism research, with some focused on the impact of gaze on cognitive performance (Doherty-Sneddon et al., 2012; Falck-Ytter et al., 2015; Riby et al., 2012). Other studies focus on the face as the area of interest (AOI) rather than the eye region (Nadig et al., 2010; Riby et al., 2012), while still others have prioritised the examination of differences during phases of conversation (listening, thinking, talking) between autistic and NT samples (Doherty-Sneddon et al., 2013; Freeth & Bugembe, 2019).

There are several hypotheses regarding the use of eye gaze during interaction in autistic participants, which could explain group differences. Less eye contact may reflect low social motivation (Chevallier et al., 2012) or a lack of reflexive motivation to make eye contact (Dalton et al., 2005; Kylliäinen et al., 2012). Alternatively, reduced eye contact could function to manage cognitive load for autistic individuals, involving a reflexive avoidance of the eyes (Kliemann et al., 2012; Tanaka & Sung, 2016). For NT individuals, averting one’s gaze has long been viewed as a means of managing cognitive load (Glenberg et al., 1998). Autistic individuals report that eye contact is challenging and takes conscious effort (Trevisan et al., 2017). Some previous studies using static images and non-naturalistic experimental designs may have been vulnerable to artefacts of the experimental paradigm. For instance, the use of static images is unlikely to generate self-conscious emotions, increase anxiety or lead to cognitive overload, all of which may play a role in reducing autistic participants’ ability to manage their interpersonal eye gaze in real-life settings.

While it is clear that eye gaze is important in social interactions, to indicate turn-taking (Ho et al., 2015), signal social interests (Emery, 2000) and show or withhold intimacy (Macdonald & Tatler, 2013), an additional feature of good eye contact is the information it affords for interpreting the emotional state of one’s conversational partner. It has been argued that poor eye gaze may lead to an underdeveloped ability to identify emotions in others. Dadds et al. (2012) have argued that lack of attention to the eye region of the face, as shown by fewer fixations on the eye region, is associated with poorer emotion recognition. Schurgin et al. (2014) demonstrated that distinct eye fixation patterns are associated with successful emotion recognition. Specific deficits in recognising emotions, which are associated with autistic traits, may be due to these differences in eye fixation patterns (Sasson et al., 2016).

Additionally, several studies have highlighted that autistic participants show less preference for the eye region in eye gaze behaviour and that this predicted emotion recognition skills (Kliemann et al., 2010; Kliemann et al., 2012). Autistic participants have also been found to move their gaze away from the eye region, which has been linked to differences in amygdala activity between controls and autistic participants (Kliemann et al., 2012). However, few studies have attempted to link eye gaze behaviour in real-life face-to-face interactions with associated emotion recognition ability.

In the current study, we used the latest eye-tracking technology to conduct semi-naturalistic face-to-face interaction with a sample of autistic and NT adults. Eye-tracking metrics were derived from common paradigms used in the field (Holmqvist et al., 2011) to include fixation count, duration of eye gaze and time to first fixation, all of which are considered important measures of eye gaze behaviour. In order to make the interaction as natural and engaging as possible, we chose to centre the conversation around everyday topics (holidays and hobbies). As noted above, while several studies have attempted to separate the speaking and listening phases of an interaction or manipulated how eye gaze is utilised (Falck-Ytter et al., 2015; Freeth & Bugembe, 2019; Riby et al., 2012), it is important to establish whether a more natural back and forth of conversation reveals significant differences in the eye gaze in autistic participants. So, in the present study, the researcher asked a participant about a topic and then provided their response once the participant had finished. We anticipated that this more ‘dialogue-like’ structure would be sensitive to social and emotional difficulties experienced in day-to-day life within the autistic group. We hypothesised that autistic participants would use less eye gaze toward an interaction partner during the participant’s speaking rather than listening phase of the conversation. However, we predicted that eye contact would be reduced in the autistic sample, as measured by the duration, number of fixations and time to first fixation. Likewise, we predicted that duration, number of eye gaze fixations and time to first fixation would all predict poorer emotion recognition ability as measured by a separate emotion recognition task.

Methods

Participants

Participants for the autistic group were recruited through a local government-funded support service for autistic individuals or through a disability support service within the university. The design of the study was discussed with an autistic adult with expertise in autism-friendly initiatives. Participants for the NT group were recruited through advertisements in the university, circulated to all post-graduate students by email by two of the authors, and all confirmed that they did not have autism or any other neurodevelopmental disorder. None of the participants in either sample wore glasses. All participants provided informed consent.

In total, 10 autistic adults (3 female) and 10 NT adults (3 female) took part (for participant characteristics, see Table 1). All autistic participants had received a formal clinical diagnosis of autism through multidisciplinary assessment in order to access the services that were used for recruitment. Except for two autistic participants, all had attained or were working towards college-level education. To investigate other potential group differences, the participants completed three self-report questionnaires: the Autism Spectrum Quotient-10 (AQ-10; Allison et al., 2012), the Empathy Quotient-Short (EQ-Short; Greenberg et al., 2018) and the Toronto Alexithymia Scale (TAS-20; Bagby et al., 1994). To assess verbal IQ, they also completed the Peabody Picture Vocabulary Test (PPVT-4; Dunn & Dunn, 2012).

Table 1 Means, standard deviations, range and statistical test of group difference across participant characteristics

In terms of verbal IQ, except for two individuals in the autistic group who fell within the moderately low range, all participants scored within the average range. There was no significant difference between groups, F(1, 18) = 2.80, p = .12 (ASD group M = 95.9, SD = 14.66; NT group M = 104.7, SD = 7.86). As expected, the autistic group scored significantly higher on the AQ-10, F(1, 18) = 33.34, p < .001, d = 2.58 (ASD group M = 6.9, SD = 2.08; NT group M = 1.7, SD = 1.95). This indicated that the autistic group as a whole fell in or near the clinical range on the AQ-10, but none of the NT group did so. The autistic group was also found to have greater difficulties with empathy, scoring significantly lower on the EQ-Short than the NT group, F(1, 18) = 47.92, p < .001, d = −3.10 (ASD group M = 11.1, SD = 6.72; NT group M = 35.0, SD = 8.01). On the TAS-20, the autistic group scored significantly higher than the NT group overall, F(1, 18) = 54.71, p < .001, d = 3.31 (ASD group M = 63.1, SD = 8.2; NT group M = 35.0, SD = 8.78), demonstrating higher levels of alexithymia, a difficulty in recognising and naming one’s own emotions, which can affect understanding of other people’s emotions. A score over 61 provides a strong indication of alexithymia; six of the autistic participants, but none of the NT participants, scored above this threshold.
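The effect sizes above can be checked from the reported group means and standard deviations. The following is a minimal Python sketch, assuming that d is Cohen's d computed with a pooled standard deviation (the text does not state which variant was used); the function name `cohens_d` is illustrative only.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d for two independent groups, using the pooled SD.

    Assumption: the pooled-SD variant of d; the paper does not specify
    which formula was applied.
    """
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_var)


# AQ-10 summary statistics as reported in the text (ASD group first).
d_aq = cohens_d(6.9, 2.08, 10, 1.7, 1.95, 10)
print(round(d_aq, 2))
```

With the first group entered as the autistic sample, a positive d indicates a higher autistic-group mean, matching the sign convention used in the text.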

Procedure

All of the recordings were conducted by the first author in the same room in the university (for images of the experimental setup, please see Fig. 1). Participants sat 100 cm from the researcher during the face-to-face interaction. In order to measure gaze during the study, participants wore eye-tracking glasses (Pro Glasses 2, Tobii). Through corneal reflection, the glasses sample eye movements at a rate of 100 Hz (per eye). An external scene-viewing camera, which is integrated into the bridge of the glasses, provided a video recording from the participant’s perspective at 25 fps. A standard calibration procedure for the glasses, as outlined in the Tobii manual, was used with each participant before the start of each recording to ensure that high-quality data were collected. This involved asking participants to look at a calibration target provided by Tobii, which was positioned 100 cm from the person and held still while the software conducted the calibration procedure. Once the software indicated successful calibration, the experiment could begin.

Fig. 1 Static images extracted from the eye-tracking glasses’ front-facing camera, which give an overview of (a) the experimental setup, (b) how AOIs were drawn and (c) how AOIs were adjusted

The face-to-face interaction was semi-structured, with each participant being asked the same series of questions about various topics, including holidays and hobbies. Participants were then asked to complete an emotion recognition task and a series of questionnaires, which were presented through the website Qualtrics (Qualtrics, 2014), on a Dell 1680 × 1050 screen (60 Hz). The emotion recognition task required participants to identify the emotions of static faces, which were shown for 500 ms. After each face was presented, participants were required to select from seven emotion-related responses: neutral, angry, sad, happy, surprise, disgust or fear. In total, 42 faces were pseudo-randomly presented, showing six different people displaying the seven emotional expressions. The faces of three males and three females were used, taken from Ekman’s ‘Pictures of Facial Affect’ (Ekman & Friesen, 1976). A total score for Emotion Recognition Accuracy (ERA) was calculated for each participant, in addition to scores for each individual emotion. Participants then completed the three self-report questionnaires (AQ-10, EQ-Short, TAS-20) and the PPVT-4. Finally, participants were debriefed about their experience of the study. The majority of the participants completed the whole study within 60 min.
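The scoring of the emotion recognition task can be illustrated with a short Python sketch. It assumes that accuracy is expressed as the percentage of correctly labelled faces, both overall (ERA) and per emotion; the trial representation, the function name `score_emotion_task` and the example responses are hypothetical and are not taken from the study materials.

```python
from collections import defaultdict

# The seven response options described in the procedure.
EMOTIONS = ("neutral", "angry", "sad", "happy", "surprise", "disgust", "fear")


def score_emotion_task(trials):
    """Return (total % correct, {emotion: % correct}).

    `trials` is a sequence of (shown_emotion, chosen_emotion) pairs;
    this layout is an assumption made for illustration.
    """
    shown = defaultdict(int)
    correct = defaultdict(int)
    for target, response in trials:
        shown[target] += 1
        if response == target:
            correct[target] += 1
    n_trials = sum(shown.values())
    if n_trials == 0:
        raise ValueError("no trials to score")
    per_emotion = {e: 100.0 * correct[e] / shown[e] for e in EMOTIONS if shown[e]}
    total = 100.0 * sum(correct.values()) / n_trials
    return total, per_emotion


# Hypothetical participant: six faces per emotion, two angry faces labelled as fear.
example_trials = [(e, e) for e in EMOTIONS for _ in range(6)]
example_trials[6] = ("angry", "fear")
example_trials[7] = ("angry", "fear")
era, by_emotion = score_emotion_task(example_trials)
```

In this hypothetical case, 40 of the 42 faces are labelled correctly, and the errors are confined to the angry expression.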

Measures

ASD Traits

The ten-item Autism Spectrum Quotient-10 (AQ-10; Allison et al., 2012) was used to assess autistic traits of all participants. Higher scores indicate a greater degree of autistic traits. As can be seen in Table 1, the ASD group had a mean score above the clinical threshold of 6, while the neurotypical group had a mean of just 1.7.

Empathy Quotient-Short

The EQ-Short is a ten-item version of the full Empathy Quotient measure, designed to assess empathic responding and awareness of others’ mental states (Greenberg et al., 2018).

Toronto Alexithymia Scale

The TAS-20 is a twenty-item measure of alexithymia (Bagby et al., 1994), which commonly co-occurs with autism (Kinnaird et al., 2019), and is associated with differences in emotion recognition abilities (Cook et al., 2013).

Peabody Picture Vocabulary Test

The PPVT-4 is a widely used measure of receptive vocabulary (Dunn & Dunn, 2012) and correlates highly with verbal intelligence scores (Mulhern et al., 2017).

Data Analyses

Given the richness of data contained in the 15 min of social interaction, and in line with previous eye-tracking studies, we used a limited time window to sample the data (Birmingham et al., 2017; Freeth & Bugembe, 2019). Data were processed for two 10-s samples, one early and one late in each interactional episode, chosen to coincide with periods when the participant was either talking (talk phase) or listening (listen phase), and mean scores were generated across the two sampling periods.

The metrics were generated for areas of interest (AOIs) drawn around the researcher’s eyes (an example can be seen in Fig. 1) and around the face region. For each phase and AOI, measures were generated for the fixation count, mean duration of gaze on the AOI and time to first fixation. Using a limited number of AOIs is not uncommon; only four of the previous fourteen studies that have used face-to-face interactions have included an expanded selection of AOIs (Falkmer et al., 2011; Freeth & Bugembe, 2019; Hanley et al., 2014, 2015).

Participant characteristics and emotion recognition data were analysed using one-way ANOVA. The eye-tracking data were analysed using 3-way mixed-model ANOVAs, 2 (group: autistic vs NT) × 2 (AOI: eyes vs face) × 2 (phase: talk vs listen), and linear regression on both the autism symptom measure (AQ) and the measure of Emotion Recognition Accuracy (ERA). However, given the small sample sizes, we could not be sure that the data would be robust to violations of normality for any of the statistical testing; therefore, we utilised permutation testing for all ANOVAs and linear regressions. This is an approach in which resampling is used to create a distribution for the test statistic (Collingridge, 2013). The process involves permuting the observed data multiple times to generate a test statistic distribution, against which one can compare the observed values. Permutation tests require the assumption of exchangeability under the null hypothesis, which is not always met in complex ANOVA and linear regression designs. Some authors use a method of restricting permutations based on exchangeability units (Anderson & ter Braak, 2003), premised on the idea that any unknown parameters in a model that are not being tested should be kept constant under permutation. However, recent work by Frossard and Renaud (2021) developed methods to transform the data to reduce the impact of nuisance variables before the permutation. A further benefit is that permutation testing is a powerful alternative to Family-Wise Error Rate correction methods such as Bonferroni (Camargo et al., 2008). These methods are available in the R package permuco, which we used in this data analysis. Data were visualised using the ggplot2 package, with box-and-whisker plots in the Tukey style showing the median.
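The basic logic of a permutation test can be illustrated for the simplest case, a one-way comparison of two groups. The Python sketch below is not the permuco implementation used in the analysis and does not include the nuisance-variable transformations of Frossard and Renaud (2021); it shuffles group labels and recomputes the F statistic. The function names, the example scores and the (hits + 1) / (permutations + 1) estimator of the p value are assumptions made for illustration.

```python
import random
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way between-subjects ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        # Degenerate case: no within-group variability.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(group_a, group_b, n_perm=10_000, seed=0):
    """Estimate a p value by shuffling scores between the two groups."""
    rng = random.Random(seed)
    observed = one_way_f([group_a, group_b])
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count permutations at least as extreme as the observed statistic.
        if one_way_f([pooled[:n_a], pooled[n_a:]]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores for two small groups (not data from this study).
p_value = permutation_p([4, 5, 6, 7, 9], [1, 2, 2, 3, 5], n_perm=2000)
```

Because the reference distribution is built from the observed scores themselves, the test does not rely on the normality assumption, which is the motivation given above for its use with small samples.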

Results

Eye-Tracking Face-to-Face Interaction

Gaze Data Integrity

No significant difference was found between groups for the percentage of total valid gaze data, averaged across AOIs, t(18) = −1.43, p = .17, d = −0.64 (ASD group M = 80%, SD = 16%; NT group M = 88%, SD = 7%). Eye-tracking data always contain some missing data points due to events that prevent an accurate reading at a particular moment in time, such as the participant blinking. Because variability in the amount of data collected is not uncommon (e.g. Freeth & Bugembe, 2019), a commonly employed approach is to convert data into a proportion for analysis, namely, by dividing the total duration of gaze for an AOI by the total amount of data collected within the 10-s sampling window. This method was used when reporting time-based analyses.

To test the hypothesised differences in eye gaze patterns, on either the face or the eye region, between autistic and NT participants, three measures were used: the total number of fixations (fixation count), the duration of gaze (gaze duration) and the time in seconds that it took each participant to first look at a particular AOI (time to first fixation).
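The three measures can be made concrete with a short Python sketch operating on a list of fixations from a single 10-s window. The `Fixation` record, its field names and the example values are hypothetical, and the sketch assumes that each fixation has already been assigned to at most one AOI; it is not the export format or processing pipeline of the eye-tracking software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    start_s: float       # onset, relative to the start of the sampling window
    duration_s: float    # length of the fixation in seconds
    aoi: Optional[str]   # "eyes", "face", or None when outside both AOIs


def fixation_count(fixations: List[Fixation], aoi: str) -> int:
    """Number of fixations that landed on the given AOI."""
    return sum(1 for f in fixations if f.aoi == aoi)


def gaze_proportion(fixations: List[Fixation], aoi: str, valid_data_s: float) -> float:
    """Gaze duration on the AOI divided by the amount of valid data collected."""
    if valid_data_s <= 0:
        raise ValueError("valid_data_s must be positive")
    return sum(f.duration_s for f in fixations if f.aoi == aoi) / valid_data_s


def time_to_first_fixation(fixations: List[Fixation], aoi: str) -> Optional[float]:
    """Onset of the earliest fixation on the AOI, or None if it was never fixated."""
    onsets = [f.start_s for f in fixations if f.aoi == aoi]
    return min(onsets) if onsets else None


# Hypothetical window with 8.0 s of valid gaze data out of 10 s.
window = [
    Fixation(0.4, 0.6, "face"),
    Fixation(1.2, 0.8, "eyes"),
    Fixation(2.5, 1.0, None),
    Fixation(4.0, 1.6, "eyes"),
]
eyes_count = fixation_count(window, "eyes")
eyes_share = gaze_proportion(window, "eyes", valid_data_s=8.0)
eyes_ttff = time_to_first_fixation(window, "eyes")
```

Dividing by the amount of valid data rather than by the nominal 10 s keeps participants with more missing samples comparable with those with fewer.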

Fixation Count

The mixed model ANOVA for fixation count revealed a significant between-subjects main effect of group (F(1, 18) = 4.61, p = .046, \({\eta}_p^2\) = .20), with NT participants making more fixations on average (M = 7.03, SD = 3.89) than ASD participants (M = 4.55, SD = 4.04). As can be seen from Fig. 2, the autistic group made significantly fewer fixations overall compared with the NT group. Neither the within-subjects main effect of phase (F(1, 18) = 3.08, p = .10) nor that of AOI (F(1, 18) = 0.20, p = .66) was statistically significant. This supports the hypothesis that autistic participants would make fewer fixations during the conversation than NT participants.
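The reported effect size can be related to the F statistic and its degrees of freedom. The Python sketch below assumes that the effect size is the standard partial eta squared, which equals F × df(effect) / (F × df(effect) + df(error)); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of group on fixation count, F(1, 18) = 4.61.
eta_group = partial_eta_squared(4.61, 1, 18)
print(round(eta_group, 2))
```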

Fig. 2 Mean count of fixations by phase, AOI and group

The interaction between group and phase was not significant (F(1, 18) = 2.56, p = .13) nor was the interaction between group and AOI (F(1, 18) = 3.10, p = .093) or between group, phase and AOI (F(1, 18) = 0.28, p = .605).

Duration of Gaze on the Eyes or Face

The mixed model ANOVA (see Fig. 3) revealed a significant between-subjects main effect of group (F(1, 18) = 6.39, p < .02, \({\eta}_p^2\) = .26), with NT participants having a greater overall gaze duration than autistic participants (NT M = 40%, SD = 27%; ASD M = 22%, SD = 25%). This supports the hypothesis that autistic participants would have a shorter duration of gaze during the conversation than NT participants. There was also a within-subjects main effect of phase (F(1, 18) = 25.84, p < .001, \({\eta}_p^2\) = .59), with listening (M = 41%, SD = 31%) being associated with a greater percentage of gaze duration than talking (M = 20%, SD = 18%). There was a within-subjects main effect of AOI (F(1, 18) = 33.98, p < .001, \({\eta}_p^2\) = .65), with a greater percentage of gaze duration on the face (M = 43%, SD = 30%) than the eyes (M = 19%, SD = 18.5%). The interaction between group and phase was not significant (F(1, 18) = 1.93, p = .18), nor was the interaction between group and AOI (F(1, 18) = 0.128, p = .73).

Fig. 3 The mean duration of gaze by group, across AOI and phases

However, there was an interaction between phase and AOI (F(1, 18) = 11.35, p < .005, \({\eta}_p^2\) = .39). As expected, on the contrasts within phases, gaze duration was longer on the face than on the eyes (while listening, t = 0.571, p < .001; while talking, t = 4.92, p < .001). Furthermore, gaze duration was greater on the face while listening than on the eyes while talking (t = 7.75, p < .001). However, there was no significant difference between gaze duration on the face while talking and on the eye region while listening (t = −0.518, p = .95). The interaction between group, phase and AOI was not significant (F(1, 18) = 0.28, p = .60). As can be seen in each panel of Fig. 3, the autistic group had a smaller mean duration of gaze than the NT group for both phases and across both AOIs.

Time to First Fixation on Eyes or Face

Autistic and NT participants did not differ significantly overall in time to first fixation (F(1, 18) = 1.17, p = .29) (see Fig. 4). This does not support the hypothesis that autistic participants would be slower to first fixation than NT participants. There was a significant within-subjects main effect of phase (F(1, 18) = 12.49, p < .002, \({\eta}_p^2\) = .41), with listening being associated with a shorter mean time to first fixation (M = 1.15 s, SD = 1.05) compared with talking (M = 2.52 s, SD = 2.09). A group by phase interaction was also significant (F(1, 18) = 6.17, p < .02, \({\eta}_p^2\) = .26), with autistic participants being quicker (M = 1.83 s, SD = 1.80) than NT participants (M = 3.20 s, SD = 2.18) to establish eye contact during the talking phase and slower (M = 1.43 s, SD = 1.27) than NT participants (M = 0.88 s, SD = 0.71) during the listening phase. Separate analysis by group revealed that the NT participants’ time to first fixation varied significantly by phase (F(1, 9) = 15.44, p < .003, \({\eta}_p^2\) = .36), but the ASD group’s time to first fixation did not (F(1, 9) = 0.66, p = .44). There was no significant within-subjects main effect of AOI (F(1, 18) = 0.43, p = .53). However, there was a significant interaction between phase and AOI (F(1, 18) = 5.24, p < .03, \({\eta}_p^2\) = .23), with the mean time to first fixation on the face (M = 0.73 s, SD = 0.44) being significantly shorter than that on the eyes (M = 1.57 s, SD = 1.31) during the listening phase. The three-way interaction between group, phase and AOI was not significant (F(1, 18) = 0.01, p = .92).

Fig. 4 The mean time to make first fixation on the AOI of an interaction partner by group and phase

Emotion Recognition Task

In addition to examining group differences in gaze behaviour in interpersonal interactions, we also sought to examine if such differences might predict the ability of participants in a standardised lab-based experiment on emotion recognition. The purpose of this task was to investigate the overall ability of each group to correctly identify the emotional expressions on photographs of static faces, which may be related to reduced eye contact in autistic participants. Differences were found for the total Emotion Recognition Accuracy (ERA) scores and for scores across emotional expressions of angry, happy and neutral. All the p values quoted in this section are for null hypothesis tests with 10,000 permutations. A significant difference was found between groups for the angry expression (F(1, 18) = 7.19, p = .02, d = −1.19), with the NT group more accurately identifying faces as angry (NT group M = 68.3%, SD = 14.4%; ASD group M = 48.4%, SD = 18.5%). The NT group (M = 100.0%, SD = 0.0%) also achieved significantly higher accuracy than the autistic group (M = 92.9%, SD = 10.1%) for the happy expression (F(1, 18) = 4.91, p = .04, d = −0.99). The NT group (M = 93.3%, SD = 11.6%) also achieved significantly higher accuracy for the neutral expression than the autistic group (M = 69.9%, SD = 30.2%) (F(1, 18) = 5.24, p = .02, d = −1.02). The NT group (M = 70.0%, SD = 5.4%) achieved a significantly higher ERA score than the autistic group (M = 56.0%, SD = 14.2%) (F(1, 18) = 8.46, p = .004, d = −1.30). No significant differences were found between groups for the sad, fearful, surprised or disgusted emotional expressions (all p > .12). Finding no group differences across several emotional expressions is not uncommon in this research area (Uljarevic & Hamilton, 2013). Furthermore, some level of group differences across expressions, particularly for the negative emotional expression of anger, is consistent with the broad research base (e.g. Humphreys et al., 2007).

Linear Regression

To examine how various eye movements may relate to the autistic symptoms and Emotion Recognition Accuracy (ERA) score, we carried out two exploratory linear regressions, with the AQ and ERA as the dependent variables. All three eye gaze metrics were included as predictors, but we focused on the data for the eyes AOI, as this may be regarded as the more stringent measure of typical behaviour.

As can be seen in Table 2, fixation count while listening and gaze duration while talking were significant predictors of autistic symptom scores on the AQ-10, with fewer fixations and shorter gaze duration being associated with higher AQ-10 scores. In Table 3, the linear regression onto Emotion Recognition Accuracy highlights a similar relationship, with gaze duration while talking being a significant predictor of accuracy and greater accuracy being associated with increased duration. However, in contrast to the AQ-10 scores, the fixation count while listening did not predict emotion recognition skills. Additionally, verbal IQ emerged as a significant predictor of emotion recognition.

Table 2 Linear regression to AQ
Table 3 Linear regression to emotion recognition accuracy

Discussion

The current study investigated how autistic adults and NT adults may differ from one another in how they use eye contact during a semi-naturalistic face-to-face social interaction. Some strong group differences emerged, including that the autistic participants had a significantly lower fixation count than the NT participants and a significantly shorter gaze duration, while there was no overall group difference for the time to make the first fixation. The first two findings confirmed our hypothesis of a reduction in the quantity and length of gaze fixations during interpersonal interactions and are concordant with Hanley et al.’s (2015) findings. Additionally, autistic participants were quicker than NT participants to first fixation while talking but slower than NT participants while listening. This difference in the interaction between phase and group was mostly due to more modulation of time to first fixation between phases in the NT group, whereas the autistic participants’ time to first fixation did not differ significantly between phases. To our knowledge, this apparent lack of modulation in the speed of first fixation by autistic participants has not previously been reported in the literature, and it appears to be a novel finding.

Some level of difference between groups in how gaze is directed towards an interaction partner is consistent with most studies that have employed face-to-face interactions with autistic and NT groups (e.g. Doherty-Sneddon et al., 2013; Freeth & Bugembe, 2019; Hanley et al., 2015; Mirenda et al., 1983). Autistic participants, in a similar manner to NT participants, demonstrated a longer gaze duration for the listening phase of a conversation compared to the speaking phase. The main effect of phase was very strong, and the adjustment of eye contact depending on the conversational phase is a robust finding within the NT population (e.g. Cook, 1977; Ehrlichman, 1981; Freeth et al., 2013; Ho et al., 2015). Furthermore, although not consistent across studies, observing an adjustment of gaze duration depending on the conversational phase has sometimes been reported in autistic participants previously (e.g. Doherty-Sneddon et al., 2013; Falck-Ytter et al., 2015; Freeth & Bugembe, 2019). Our data provides strong support for the suggestion that autistic people modify their gaze patterns according to the phase of the conversation.

Despite this similarity in response to the conversational phase, the autistic participants consistently fixated less than the NT participants on both the eye region and the face region during each phase of the conversation. This shared response to conversational phase in autistic and NT participants may sometimes have masked the more subtle eye gaze differences between the groups in previous studies. In other words, the modification of eye gaze behaviour according to the phase of conversation by autistic participants may have made differences from neurotypical participants more challenging to detect.

Freeth and Bugembe (2019) reported similar findings to ours but also noted significant heterogeneity in the behaviour of their autistic sample. Heterogeneity was also noted in the current autistic sample in some aspects of gaze behaviour, specifically in the mean duration of gaze to the face when listening. The current study benefited from access to data for both the face region and the eye region specifically, as measures of gaze behaviour.

Beyond examining group differences between the two samples, we also used linear regression to explore the relationship between gaze behaviours and autistic symptoms. We found that fixation count while listening and duration of gaze while talking both predicted autistic traits. Additionally, the gaze duration while talking and verbal IQ both significantly predicted emotion recognition scores across the two samples. While many studies have attempted to assess eye gaze during emotion recognition tasks across autistic and NT samples (Leung et al., 2013; Santos et al., 2012; Van der Donck et al., 2021; Wang & Adolphs, 2017), previous studies have not linked data from gaze behaviour in live social interactions with facial expression recognition data. If difficulties with eye contact are partly mediated through the emotional arousal and competing processing demands created in real-life social situations, one would not expect to be able to detect these differences with experimental designs that present the stimulus through static images and video screens that are unlikely to generate sufficient emotional arousal in participants. Further work is needed to assess the degree to which arousal plays a part in these differing eye gaze patterns, and one useful additional measure to include in future studies would be heart-rate monitoring during social interactions.

Many studies have attempted to explore gaze behaviour in autism, often in rigorous but rather artificial experimental designs. The technology to carry out eye-tracking has made significant advances in the last ten years, from apparatus that required contact lenses, bite-bars (Mele & Federici, 2012), chinrests (Sumner et al., 2021) and headrests (Vaidya et al., 2014) to much more sophisticated equipment very similar to everyday spectacles or reading glasses. As Boraston and Blakemore (2007) pointed out, many experimental studies of autism using eye-tracking do not involve real people, and some of this reluctance to use live conversational partners in this research paradigm was due to the clunky and difficult-to-manage hardware used for eye-tracking. However, modern eye trackers enable a more realistic experimental design, in which the subject “looks back” at the participant, creating a complex, real-world emotional and social scenario in which we can explore the eye gaze of the autistic person. Building on the work of Hanley et al. (2015), this study has demonstrated that with modern, unobtrusive eye-tracking technology, even with modest sample sizes, it is possible to identify some of the differences in gaze behaviour between autistic and neurotypical adults, and that these differences appear to be related to core traits of autism and emotion recognition ability.

Limitations and Future Research

This study had several limitations. Data-intensive studies involving visual search and eye-tracking often rely on small samples due to the high volume of data produced by each participant and the burden imposed by coding the AOI data from each still image from a 25 fps video recording. Sample sizes as small as that of the current study can be seen in other published research involving visual search (Hanley et al., 2015; Papeo et al., 2019; Suzuki & Cavanagh, 1995). However, a risk with small sample sizes is that a study will lack statistical power and therefore have a low probability of finding true effects (Button et al., 2013). Contrary to this concern, all of the ANOVA effect sizes reported in the current study were values of partial eta-squared greater than 0.14, the threshold for large effect sizes according to Cohen (1988). Additionally, all the statistical tests were calculated using permutation testing to reduce the risk posed by potential violations of normality. Even so, small sample sizes do reduce the generalisability of results, and therefore, replication, preferably with a larger sample, would be beneficial.
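To illustrate the two statistical ideas mentioned above, the following is a minimal sketch in Python, not the analysis code used in this study. It assumes a simplified two-group, one-way comparison (in which eta-squared and partial eta-squared coincide) rather than the mixed group-by-phase design analysed here. The function names and all numbers are hypothetical and purely illustrative; they are not data from the study.

```python
import random
from statistics import mean


def eta_squared(group_a, group_b):
    """Eta-squared for a two-group, one-way design.

    Computed as SS_between / (SS_between + SS_within). In a one-way
    design this is identical to partial eta-squared.
    """
    groups = (list(group_a), list(group_b))
    grand_mean = mean(groups[0] + groups[1])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (ss_between + ss_within)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one. No normality assumption is required.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical fixation counts, invented for illustration only.
    hypothetical_group_1 = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
    hypothetical_group_2 = [20.0, 22.0, 19.0, 24.0, 21.0, 23.0]
    effect = eta_squared(hypothetical_group_1, hypothetical_group_2)
    p = permutation_p_value(hypothetical_group_1, hypothetical_group_2)
    print(f"eta-squared = {effect:.3f} (large if > 0.14), p = {p:.4f}")
```

With these invented values, the effect size would be compared against the 0.14 threshold in the same way as the partial eta-squared values described above. A full replication would need a permutation scheme that respects the repeated-measures structure of the design.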

We only included autistic participants without a co-occurring intellectual disability and neurotypical participants. We do not know whether these gaze differences would have been seen in autistic adults with intellectual disabilities or, for that matter, other neurodevelopmental differences, though it is important to note that the emotion accuracy score was predicted by a combination of gaze behaviour and verbal IQ. Further work is required to disentangle the role of intellectual functioning in these processes.

During de-briefing, some autistic participants commented on their own self-conscious eye-gaze behaviour. Future research would benefit from testing whether the conscious, intentional control of eye-gaze by autistic participants plays any role in differences from neurotypical eye-gaze. The social interactions in this study were only semi-naturalistic: although they involved real-world conversational partners and happened face-to-face, the experimenter guided the structure of the conversation. Likewise, the study only had scope to analyse data on a couple of topics (holidays and hobbies) and did not touch upon challenging or negative topics, which might have produced different responses. Further studies that collect data on eye-gaze behaviour in autistic adults in truly naturalistic settings would be highly beneficial, particularly those testing the potential role that anxiety in social situations may play in attenuating autistic eye contact.