Introduction

Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by compromised social communication and restricted and repetitive behavior. Abundant research has demonstrated that individuals with ASD differ from those with typical development (TD) with respect to both visual fixation patterns and oculomotor performance (Chita-Tegmark, 2016; Frazier et al., 2017; Johnson et al., 2016; Kanner, 1943).

The implementation of eye tracking technology has greatly advanced our understanding of both visual attention and oculomotor dynamics in ASD. The majority of eye tracking research has focused on examining characteristics of visual fixation in individuals with ASD, and a variety of ASD-related features have been revealed. One of the most prominent features is that individuals with ASD present decreased visual fixation on social stimuli and increased fixation on nonsocial stimuli (Chita-Tegmark, 2016; Frazier et al., 2017; Klin et al., 2002). Decreased fixation on socially relevant information might lead to failures to detect important cues for social interaction, which would further contribute to impaired social interaction (Murias et al., 2018). It has also been found that, compared to people with TD, individuals with ASD tend to pay less visual attention to biological motion (movements produced by humans or animals) (Klin et al., 2009), and have a higher preference for repetitive movements (e.g., a butterfly performing circular movements) versus random movements (e.g., a butterfly flying randomly) performed by the same cartoon characters (Wang et al., 2018). Interestingly, some studies reported that people with ASD performed better than their TD peers in local or piecemeal processing tasks, in which participants were required to detect a unique element among a number of foils (Gliga et al., 2015; Grinter et al., 2009). Although mixed results have been reported (Jones et al., 2017; Nadig et al., 2010), it is widely accepted that individuals with ASD exhibit an atypical pattern of visual fixation in certain circumstances.

In terms of oculomotor performance, abnormalities in fixation maintenance, saccade, and visual pursuit have been reported in ASD. For instance, Nowinski et al. (2005) investigated the suppression of intrusive saccades and the ability to sustain eccentric gaze in individuals with ASD. Participants were required to fixate on a central target, a peripheral target, and the remembered central target location. No difference was found in terms of frequency of intrusive saccades, but their results demonstrated a significant increase in amplitude of intrusive saccade and a significant decrease in time of target re-fixation after intrusive saccades in ASD individuals when fixating on the remembered central target location (Nowinski et al., 2005). Regarding saccadic eye movement, visually-guided tasks were typically implemented, in which participants were required to fixate on a peripheral target as rapidly as possible after focusing on a central target. Reduced saccade accuracy (Miller et al., 2014; Schmitt et al., 2014), longer saccade latency (Miller et al., 2014), and elevated instability of fixation (Johnson et al., 2016; Sumner, Hutton, & Hill, 2020) have been reported in individuals with ASD compared to their TD peers. Concerning visual pursuit, Sumner, Hutton and Hill (2020) found that children with ASD spent significantly less time when visually following a target, which oscillated in a horizontal motion at different frequencies (Sumner et al., 2020).

Again, despite the fact that atypical ocular motion has been frequently reported in ASD, considerable controversy exists. For instance, unlike the above-mentioned studies reporting reduced saccade accuracy and longer saccade latency in ASD, Kovarski et al. (2019) investigated characteristics of eye movements in ASD during different visual tasks, and their results showed that children with ASD were even faster and more accurate in visually reaching targets in certain tasks than their TD counterparts (Kovarski et al., 2019). Indeed, the discrepancy among findings in previous studies might result from participants’ characteristics (e.g., age and sex ratio) and task variance (Harrop et al., 2018; Kovarski et al., 2019; Schmitt et al., 2014).

To date, oculomotor performance in ASD has been mainly examined in stimuli viewing scenarios. No study has been found to investigate whether eye movements are different in individuals with ASD during face-to-face interactions. In fact, a few studies demonstrated that the physical presence of a social partner was a potent stimulus eliciting a different pattern of neural response and gaze behavior from tasks without genuine interpersonal interactions (Freeth, Foulsham, & Kingstone, 2013; Pönkänen et al., 2011). These findings suggest that oculomotor performance in ASD during face-to-face interaction might be different from that in stimuli viewing scenarios.

Brief Introduction to This Study

Using a head-mounted eye tracker, this study collected the eye movements of children with ASD and children with TD while they were engaged in a face-to-face conversation with an interviewer. Since the eye tracker was tightly attached to the head, there was no relative movement between the head and the device. Therefore, eye movement could be estimated by analyzing the gaze behavior. The primary objective of this study was to investigate the randomness and the amount of eye movement in ASD, which have rarely been examined in prior research. The investigation of these two aspects of oculomotor performance in ASD not only provides opportunities to understand the abnormalities in the expansive neural network that controls eye movement, but also offers quantifiable behavioral markers that might facilitate the objective identification of ASD. This is of particular scientific and practical significance given that the current diagnosis of ASD relies heavily on observational evaluation, which is negatively affected by a variety of subjective factors such as caregivers’ report bias and clinicians’ insufficient experience in detecting ASD (Möricke, Buitelaar, & Rommelse, 2016; Tebartz van Elst et al., 2013).

Apart from comparing group differences, we also examined how much time was needed for the group difference in eye movement to reach significance. This is a critical question in both ASD research and clinical application because children with ASD are sensitive to wearing devices, and it can be very difficult to get them to follow experimental instructions (Dufour & Lanovaz, 2020). Therefore, knowledge about the amount of time needed to reveal the group difference has valuable implications for developing diagnostic tools using eye movement data.

Method

Participants

Eye tracking data of this study were obtained from a larger experiment which examined the natural social behavior in children with ASD during a face-to-face conversation (Z. Zhao et al., 2021a; Zhong Zhao et al., 2021b). The sample size was estimated as 34 (n = 17 for ASD and TD, respectively) using power analysis (between-group t-test, d = 1, α = 0.05, power = 0.8) on eye tracking data during face-to-face interactions (Falck-Ytter, 2015; Hutchins & Brien, 2016). Finally, 20 children with ASD and 23 children with TD were enrolled in our study. Data of 4 participants (1 ASD and 3 TDs) were excluded due to data loss in the eye tracking process. Children with ASD were recruited from a first-class mental health center in China. The accuracy of the ASD diagnosis was assured through a variety of rigorous procedures. First, the diagnosis of ASD was made by a licensed psychiatrist with no less than 5 years’ clinical experience by strictly following the DSM-IV criteria. Afterwards, the ASD diagnosis was further evaluated by a senior psychiatrist. A consultation with at least two additional senior psychiatrists would be conducted if disagreement took place. In addition, participants with ASD needed to fulfill the following criteria: (a) aged between 6 and 13 years old; (b) having at least average non-verbal intellectual ability (IQ was first screened by the psychiatrist and subsequently assessed as IQ ≥ 70 with the Raven’s Advanced Progressive Matrices); (c) absence of other clinical conditions such as schizophrenia and ADHD, and not on medication at the time of experiment; (d) being capable of maintaining average verbal communication (assessed by a speech–language psychologist in preliminary screening). Children with TD were healthy participants included from local schools if aged between 6 and 13, and reporting no physical or mental disorders. In addition, no ASD/ADHD was reported in the first-degree relatives of this group. 
Children with TD also received the Raven’s Advanced Progressive Matrices to ensure that their IQ was above the average level. Written informed consent approved by the local ethics committee was provided by participants’ caregivers. Participants were compensated with 200 CNY for their participation in the experiment. The experimental protocol conformed to the Declaration of Helsinki. Participants’ demographics are presented in Table 1.
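The sample size estimate quoted above (d = 1, α = 0.05, power = 0.8) can be checked with a short calculation. The sketch below is only an illustration: it assumes a two-sided test and uses a normal approximation with a small-sample correction term rather than whatever software the authors actually used, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group n for a two-sided, two-sample t-test.

    Assumption: normal approximation plus the correction term z_alpha^2 / 4.
    This is an illustrative shortcut, not the authors' documented procedure.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * (z_alpha + z_power) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


print(n_per_group(d=1.0))  # prints 17
```

Under these assumptions the result is 17 per group, i.e., 34 in total, which agrees with the figure reported in the text.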

Table 1 Subject demographics and group comparisons

Eye Tracker

Participants wore a head-mounted eye tracker (Tobii Pro Glasses 2, sampling frequency 50 Hz; Tobii Technology, Stockholm, Sweden) during the conversation, and they were seated 80 cm away from the interviewer’s chair (Fig. 1a). Tobii Pro Glasses 2 is a lightweight, glasses-like eye tracker that tracks the natural gaze behavior of the wearer without constraining head movement. The direction of eye gaze was recorded by four sensors receiving near-infrared light beams reflected by the user’s eyes. The eye tracker also comprised a scene camera that video-recorded the scene in front of the wearer (Fig. 1b). In this way, a time series of gaze allocation in the coordinate system of the scene camera could be obtained. Since the eye tracker was tightly worn (without causing self-reported discomfort), there was no relative movement between the scene camera and the head. Therefore, the eye movement in the eye sockets could be estimated from the position of gaze allocation.

Fig. 1 (a) Experimental setup. (b) Scene camera and the coordinate system. The x and y coordinates correspond to the horizontal and vertical directions of eye movement, respectively

In our experiment, participants were instructed to act naturally, and not to move the glasses (eye tracker) or make abrupt or intense head movements during the conversation with an interviewer. The genuine function of the eye tracker was not revealed to them. All participants were asked whether they knew the genuine function of the glasses after the whole experiment, and none of them were aware that it was used for recording eye movement.

Experimental Procedure

Prior to the start of the conversation, participants were arranged to sit on a chair and to wear the eye tracker under the experimenter’s guidance. Afterwards, the one-point calibration procedure was conducted, in which participants focused their gaze on the center point of the calibration card that was placed around the position of the interviewer’s body (within Tobii’s recommended calibration distance of 0.5–1.5 m). Next, the interviewer came in and sat on the other chair once the calibration was completed. Note that the interviewer was not informed of the participants’ group membership, and she was specifically required to behave consistently across all participants regarding the way she spoke, gestures, and other body movements.

The structured conversation began with the interviewer greeting the participant, followed by the chronologically arranged sessions/topics: Generic question (also referred to as the 1st session), Hobby sharing (2nd session), Yes–no question (3rd session), and Question raising (4th session). Only data from the first session were analyzed in the present study. In this session, the interviewer posed six questions to the participant, and the purpose was to help the two people become familiar with each other (please refer to “Appendix” for the questions asked in this session).

The conversation was videotaped by two still cameras for the purpose of studying the participant’s social behavior. One camera (Samsung HMX-F90, sampling frequency 25 Hz) recorded both persons’ behavior during the conversation, with each person separated equally on the left and right side of the recording view. The other camera (Logitech C270, sampling frequency 30 Hz) was placed beside the interviewer to capture the participant’s behavior from the front view.

Eye Tracking Data Analysis

All the x and y coordinates of the gaze point were exported by the Tobii Pro Lab in units of pixels. The coordinate system for the recorded gaze data was based on the scene camera video, whose size was 1920 pixels (x-axis) × 1080 pixels (y-axis) (Fig. 2).

Fig. 2 Illustration of the coordinate system of the scene video camera and the position of gaze allocation

Computation of Randomness and Amount of Eye Movement

Given a time series of gaze points (a1, a2, a3,…,ai), the first step was to set the length of the examined time window. The purpose of creating the time window was to determine the time duration needed to reveal the group differences in eye movement measures. We examined time durations of k second(s) (where k = 1, 2, 3,…, up to 25). A maximum time window of 25 s was chosen based on the principle of preserving the maximum number of participants, as some participants would have to be excluded due to great data loss (> 30%) when the duration of the time window was greater than 25 s (Wang et al., 2020). Since the sampling frequency of the eye tracker was 50 Hz, the number of gaze points within the examined time window would be 50 * k.

Second, the randomness and amount of eye movement were calculated within the examined time window. We used entropy analysis to quantify the randomness of eye movement in this study. The term entropy was originally coined in thermodynamics to denote the form of energy no longer available for physical work (Clausius, 1867). In probability theory, information theory, and the theory of dynamical systems, it is a variable that quantifies the level of uncertainty, complexity, or irregularity (Kolmogorov, 1959; Shannon, 1948). In the ASD literature, prior studies have implemented entropy analysis to quantify the randomness/regularity of movement in postural sway, head movement, and visual scanning (Fournier et al., 2014; Wang et al., 2020; Zhong Zhao et al., 2021b). Higher entropy is associated with an elevated level of randomness in movement (i.e., less regular movement), and lower entropy indicates a lower degree of randomness in movement (i.e., more regular movement). In this study, the randomness of eye movement was computed with Shannon entropy based on the gaze allocation data. Given that the size of the scene camera video was 1920 × 1080 pixels, the image was divided into 22,000 (200 × 110) equally sized blocks (Fig. 3). This meant that the size of each block was roughly 10 × 10 pixels, which was neither so small that extremely close gaze allocations were treated as different points, nor so large that distantly allocated gazes fell in the same block. To examine the randomness of gaze allocation in these blocks within k second(s), Shannon entropy was computed as:

Fig. 3 Exemplary illustration of gaze allocation for an ASD and a TD participant at a 5 s time window

$$\mathrm{Entropy}=-\sum_{m=1}^{n}p\left({x}_{m}\right)\,{\mathrm{log}}_{2}\,p\left({x}_{m}\right),$$

where n = 22,000 and p(x_m) is the probability of gaze falling in the mth block (blocks containing no gaze points contribute zero to the sum).
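The entropy computation over the 200 × 110 grid can be sketched in a few lines. This is a minimal illustration under stated assumptions: gaze points are (x, y) pixel pairs inside the 1920 × 1080 frame, invalid samples have already been removed, and p(x_m) is estimated as the share of gaze points falling in block m. The names `block_index` and `gaze_entropy` are hypothetical and not taken from the original analysis.

```python
import math
from collections import Counter
from typing import List, Tuple

WIDTH, HEIGHT = 1920, 1080   # scene-camera frame size in pixels (from the text)
COLS, ROWS = 200, 110        # 200 x 110 = 22,000 blocks (from the text)


def block_index(x: float, y: float) -> Tuple[int, int]:
    """Map a gaze point in pixels to its (column, row) block."""
    col = min(int(x * COLS / WIDTH), COLS - 1)   # clamp points on the right edge
    row = min(int(y * ROWS / HEIGHT), ROWS - 1)  # clamp points on the bottom edge
    return col, row


def gaze_entropy(points: List[Tuple[float, float]]) -> float:
    """Shannon entropy (bits) of block occupancy for the given gaze points."""
    counts = Counter(block_index(x, y) for x, y in points)
    total = sum(counts.values())
    # Empty blocks have p = 0 and contribute nothing, so only occupied blocks
    # are summed. p * log2(1 / p) equals -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

With this estimate, gaze points that all fall in one block give an entropy of 0 bits, whereas gaze points spread evenly over four different blocks give 2 bits, matching the interpretation that more sparsely allocated gaze yields higher entropy.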

As for the amount of eye movement, the first step was to compute the sum of the distance between two consecutive gaze points within the time window, measured in pixels. To counterbalance the variance of gaze loss in different k-second time windows, the amount of eye movement within the time window was calculated as:

$$\text{Amount of eye movement}=\frac{\text{sum of distances}\times 50k}{\text{number of valid gaze points}}.$$
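The normalization in the formula above can be illustrated as follows. The sketch assumes that lost samples are encoded as `None` and that distances are summed between successive valid gaze points, so a lost sample is simply skipped; the text does not state how gaps were handled, so both choices are assumptions, as is the function name `amount_of_movement`.

```python
import math
from typing import List, Optional, Tuple

Point = Optional[Tuple[float, float]]  # None marks a lost sample (assumed encoding)
RATE_HZ = 50  # sampling frequency stated in the text


def amount_of_movement(window: List[Point], k: int) -> float:
    """Path length in pixels, rescaled by (50 * k) / number of valid gaze points."""
    valid = [p for p in window if p is not None]
    if len(valid) < 2:
        raise ValueError("need at least two valid gaze points")
    # Assumption: distances are taken between successive valid points.
    path = sum(math.dist(a, b) for a, b in zip(valid, valid[1:]))
    return path * RATE_HZ * k / len(valid)
```

When no samples are lost in a k-second window, the scaling factor equals 1 and the measure is simply the path length; with gaze loss, the factor scales the observed path length up in proportion to the missing samples.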

Third, the examined k-second time window shifted forward in time by one gaze point, and the calculation of Shannon entropy and amount of eye movement repeated. The window continuously shifted until the end of the trial. Note that time windows with few data points (data loss > 30%) were discarded from further calculation (Wang et al., 2020). The illustration of time window and window shift is presented in Fig. 4.

Fig. 4 Illustration of time window and window shift

Finally, the randomness and amount of eye movement within the whole trial were computed by taking the means across all the examined k-second time windows. In this way, the randomness and amount of eye movement within 1 s, 2 s,…,25 s could be obtained.
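The window shift, the 30% data-loss rule, and the final averaging described above can be sketched together. This is an illustrative sketch rather than the authors' code: lost samples are assumed to be encoded as `None`, and `measure` stands for any per-window function (for example, an entropy or path-length function) supplied by the caller. The name `windowed_mean` is hypothetical.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Point = Optional[Tuple[float, float]]  # None marks a lost sample (assumed encoding)
RATE_HZ = 50           # sampling frequency stated in the text
MAX_LOSS_PERCENT = 30  # windows with more than 30% lost samples are discarded


def windowed_mean(
    series: Sequence[Point],
    k: int,
    measure: Callable[[List[Tuple[float, float]]], float],
) -> Optional[float]:
    """Mean of `measure` over all k-second windows, shifted one sample at a time."""
    size = RATE_HZ * k  # 50 * k gaze points per window
    values = []
    for start in range(len(series) - size + 1):
        valid = [p for p in series[start:start + size] if p is not None]
        lost = size - len(valid)
        # Integer comparison avoids floating-point trouble at exactly 30% loss.
        if lost * 100 > MAX_LOSS_PERCENT * size:
            continue
        values.append(measure(valid))
    return mean(values) if values else None  # None if no window survives
```

Calling `windowed_mean(series, k, measure)` for k = 1, 2, ..., 25 would then give one value per time window length for each participant.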

Results

Table 1 presents participants’ demographics. Results showed a significant difference in nonverbal IQ between ASD and TD in the present study, suggesting that nonverbal IQ might be a potential confounder for eye movement measures. To address this concern, Pearson correlation tests were conducted to examine the relation of IQ with the randomness and the amount of eye movement.

Results showed that none of the measures of randomness of eye movement was significantly correlated with IQ (all p values > 0.05), but all the measures of amount of eye movement were significantly correlated with IQ (all p values < 0.05). Based on these results, IQ was entered as a covariate in comparisons of the amount of eye movement, but not in comparisons of the randomness of eye movement.
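For readers who want to reproduce this kind of check, a Pearson correlation and its test statistic can be computed as sketched below. The function names are hypothetical, and the sketch stops at the t statistic (with n − 2 degrees of freedom); converting it to a p value requires a t distribution from a statistics package, which is not shown here.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for testing r = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

In the present design, `x` would hold the participants' IQ scores and `y` one eye movement measure at a given time window, so the test is repeated once per measure.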

Randomness of Eye Movement

Independent t-tests were performed to examine whether the ASD group significantly differed from the TD group with respect to the randomness of eye movement. All the results are presented in Table 2 and Fig. 5. Results showed that the entropy values were significantly higher in the ASD group than those in the TD group at all time windows from 1 s through 25 s, indicating a higher level of randomness in the ASD children across all the examined time durations.
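A two-sample comparison of this kind can be sketched as follows. Which variant of the independent t-test was used for Table 2 is not stated; the sketch assumes Welch's unequal-variance form, which is a guess motivated only by the fractional degrees of freedom, t(24.13), reported later in this paper for the interviewer's movement. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Here `a` and `b` would hold the per-participant entropy values of the ASD and TD groups at one time window length, and the test would be repeated for each of the 25 lengths.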

Table 2 Comparison of randomness of gaze allocation between ASD and TD within different time durations
Fig. 5 Comparison between ASD and TD on the randomness of gaze allocation (upper panel) and the corresponding p values (bottom panel) at different lengths of time window

Amount of Eye Movement

ANCOVAs were performed to examine whether the amount of eye movement differed significantly between ASD and TD. The dependent variables were the amounts of eye movement at different time windows. The independent variable was group, and IQ was entered as a covariate in all ANCOVAs. Results presented in Table 3 and Fig. 6 showed that the ASD group had a significantly greater amount of eye movement than the TD group only within short time windows (≤ 3 s). The statistical significance disappeared at longer time durations.
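With one two-level factor and one covariate, an ANCOVA of this form is equivalent to a linear regression of the dependent variable on a group indicator plus the covariate. The sketch below illustrates that equivalence under the usual assumption of no group-by-covariate interaction; it is not the authors' analysis script, and the function names are hypothetical. It first removes the covariate from both the outcome and the group indicator, then regresses one set of residuals on the other.

```python
from typing import List, Tuple


def residualize(v: List[float], z: List[float]) -> List[float]:
    """Residuals of v after a simple linear regression on z (with intercept)."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = (sum((a - mz) * (b - mv) for a, b in zip(z, v))
             / sum((a - mz) ** 2 for a in z))
    return [(b - mv) - slope * (a - mz) for a, b in zip(z, v)]


def ancova_group_effect(
    y: List[float], group: List[float], covariate: List[float]
) -> Tuple[float, float]:
    """Covariate-adjusted group difference and its F statistic (df = 1, n - 3)."""
    ry = residualize(y, covariate)      # outcome with the covariate removed
    rg = residualize(group, covariate)  # group indicator (0/1) with the covariate removed
    sgg = sum(g * g for g in rg)
    b = sum(g * r for g, r in zip(rg, ry)) / sgg          # adjusted group difference
    sse = sum((r - b * g) ** 2 for g, r in zip(rg, ry))   # residual sum of squares
    f = b * b * sgg / (sse / (len(y) - 3))
    return b, f
```

In the present design, `y` would be the amount of eye movement at one time window length, `group` a 0/1 indicator for TD/ASD, and `covariate` the IQ score; the p value follows from the F distribution with (1, n − 3) degrees of freedom.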

Table 3 Comparison of amount of eye movement between ASD and TD within different time durations
Fig. 6 Comparison between ASD and TD on the amount of eye movement (upper panel) and the corresponding p values (bottom panel) at different lengths of time window

Discussion

The present study investigated the characteristics of oculomotor performance in children with ASD during a face-to-face conversation. Results demonstrated a higher level of randomness and short-term excessive eye movement in children with ASD as compared to those with TD. Unlike previous studies in which participants looked at static or dynamic stimuli, the innovation of this study was that we investigated the dynamics of eye movement during a face-to-face conversation with a human partner. In addition, we examined the randomness and the amount of eye movement, which revealed novel features of eye movement that have rarely been studied in prior research. Further, the analysis of time duration in the present study could provide important information for designing the duration of future experiments or clinical tests.

In a typical face-to-face interaction with a real person, individuals do not consistently stare at the social partner; rather, gaze shifts occur all the time. For instance, individuals tend to look at the partner when the person is speaking (Jones et al., 2017; Klin et al., 2002), and look away when dealing with tasks with high cognitive load (Doherty-Sneddon et al., 2002; Glenberg, Schroeder, & Robertson, 1998). Previous studies also demonstrated that individuals look at different important features such as the mouth, eyes, and body to help decode the intention and emotion of the social partner (Melinger & Levelt, 2005; Sasson et al., 2016). Our results demonstrated that children with ASD produced more eye movement at all time scales, although a significant difference was found only within 3 s (Fig. 6). This finding was consistent with prior research which showed that children with ASD made more saccadic eye movements and had shorter spontaneous fixation durations (Kemner et al., 1998; Nackaerts et al., 2012; Wass et al., 2015).

Apart from excessive eye movement, our results also showed that the eyes moved in a less regular fashion in children with ASD, as evidenced by more sparsely allocated gaze points. These results might be partially explained by the attentional deficit in selectively attending to social information over irrelevant information among children with ASD (Chita-Tegmark, 2016; Frazier et al., 2017). When interacting with a real human, individuals with TD fixate more on important social information (i.e., face and body), and less on the background (Z. Zhao et al., 2021a; Zhong Zhao et al., 2021b). In contrast, visual fixation in ASD was more equally distributed between social and irrelevant information, leading to more sparsely allocated gaze points, and thus a higher entropy value.

Participants’ head movement was not constrained, and the eye tracker was tightly attached to the head in our study. This meant that gaze allocation in the coordinate system of the scene camera did not reflect how participants fixated on external stimuli, as a fixation shift from one external object to another could be realized through head movement without making eye movements. A recent study also computed Shannon entropy of gaze allocation to understand the scanning strategy children with ASD used in face viewing tasks (Wang et al., 2020). Consistent with our finding, their research showed significantly greater entropy in gaze allocation in children with ASD. However, Wang et al. (2020) used a remote eye tracker in their experiment, and thus the gaze allocation simply reflected how participants looked at the presented stimuli. In our study, by contrast, the greater entropy in children with ASD could be interpreted as indicating that the eyes moved in a more random (or less regular) fashion in these participants.

The present study examined the influence of time duration, and found significantly higher entropy across all examined time durations and a significantly greater amount of eye movement at time durations of 3 s or less. The results of the entropy analysis echoed Wang et al.’s finding, which showed increased entropy in children with ASD even within a very short amount of time (500 ms) (Wang et al., 2020). As for the amount of time needed to reveal the group difference in the amount of eye movement, it was not clear why significance existed only within 3 s. Nevertheless, these results suggest that time duration is an important factor that influences the group difference, and that a short eye tracking test might suffice to reveal the ASD-related deficits.

Limitations and Future Directions

Exploring oculomotor dynamics during a face-to-face conversation was one of the innovations of this study since it expanded our understanding of eye movement performance in a natural interpersonal interaction. However, this also meant that various factors were not controlled (e.g., distractions in the background, and the interviewer’s interactive behavior). For instance, although the interviewer was required to treat all participants equally, her behavior was not exactly the same across all participants. Since visual attention is sensitive to the behavior of the social partner, the group difference could possibly be explained by variation in the interviewer’s interactive behavior. To address this concern, we implemented an image differencing technique (Alviar et al., 2020; Ramseyer & Tschacher, 2011) to estimate the overall amount of movement the interviewer made during the conversation. Statistical analysis failed to reveal a group difference in the amount of the interviewer’s movement [t(24.13) = 0.93, p = 0.361]. Strictly speaking, however, similar amounts of movement do not necessarily imply that the form and the temporal structure of the interviewer’s movement were the same for these two groups of participants. This is a serious issue for all studies that intend to adopt genuine social interaction tasks, since a real person cannot behave in exactly the same way with different persons, or even with the same person at different times.

Our study investigated the characteristics of eye movement when participants answered generic questions. A sizable amount of research has demonstrated that social gaze behavior varies with tasks (Falck-Ytter, 2015; Falck-Ytter, Carlstrom, & Johansson, 2015; Hutchins & Brien, 2016). A few studies reported that eye movement varies with task demands and stimulus familiarity in individuals with TD (Benson, Piper, & Fletcher-Watson, 2009; Kemner et al., 1998). For instance, Kemner et al. (1998) reported more saccadic movement in children with TD when viewing familiar relative to unfamiliar objects (Kemner et al., 1998). In contrast, individuals with ASD show less variability in treating different stimuli (Benson et al., 2009; Kemner et al., 1998). In this vein, it is reasonable to argue that eye movement differences between ASD and TD are task-specific. Thus, whether our results could be replicated in other tasks or contexts requires further investigation.

Conclusion

By implementing a head-mounted eye tracker, our study showed that children with ASD made more eye movement in a less regular fashion when conversing with a real person. Given the close relation between eye movement and the neural networks that control it, the atypical pattern of eye movement might be indicative of a structural or functional change in ASD. In addition, our results indicated that oculomotor performance might contain objective biomarkers that could be harnessed to identify ASD. This is of particular significance in ASD as it might offer opportunities to develop labor- and time-saving means of ASD diagnosis.