Introduction

Autism is a neurodevelopmental disorder characterized by impairments in three behavioral domains: social interaction, communication, and repetitive behaviors/obsessive interests (American Psychiatric Association, 2000). Many of the abnormal behaviors that constitute the clinical evidence for autism do not appear in mature form before the second and third years of life (e.g., problems with peer relationships and verbal communication skills), which is part of the difficulty in pinpointing the age of onset and typical pattern of progression. Early detection studies designed to identify behavioral markers of the disorder in infancy (including population-based screening studies, retrospective home movie studies, and studies of high-risk populations) are underway to try to overcome this diagnostic obstacle and open new avenues for treatment and intervention (Dumont-Mathieu & Fein, 2005; Osterling, Dawson, & Munson, 2002; Sigman, Dijamco, Gratier, & Rozga, 2004; Yirmiya et al., 2006; Zwaigenbaum, Bryson, Brian, Roberts, & Szatmari, 2005).

The sibling recurrence rate for autism is approximately 5–10% (Sumi, Taniai, Miyachi, & Tanemura, 2006), which represents at least a 20-fold increase over the rate in the general population (Ritvo, Jorde, Mason-Brothers, Freeman, et al., 1990). Because infants with an older sibling with autism represent a “high-risk” cohort, studying these individuals is a strategy to improve efficiency in detecting early behavioral markers. This report is a ‘snapshot’ of 55 infants (31 at-risk, and 24 comparison infants) tested at 6 months of age as part of a larger prospective follow-along effort.

Face Scanning and Social Interaction

Face scanning behavior emerges quite early in typically-developing infants. Newborns show a preference for looking at face-like objects, and will track a schematic moving pattern with a face-like configuration further into the peripheral visual field than a non-face-like pattern (Johnson, Dziurawiec, Ellis, & Morton, 1991). Between the first and second months of postnatal life, typically-developing infants show a rapid growth of sustained visual attention, and a shift in the pattern with which they visually scan a face. At one month, infants scan the edge of live faces, looking mostly at the external contour (hairline and chin). At two months, infants fixate on the eyes and mouth more than on the outer contour of the face, and look longer at talking than at silent moving faces (Haith, Bergman, & Moore, 1977; Maurer & Salapatek, 1976; Pascalis, Deschonen, Morton, Deruelle, & Fabregrenet, 1995). At approximately eight weeks of age, a preference for fixating faces rather than other patterns in the central visual field emerges (Maurer & Barrera, 1981). In a recent longitudinal study of developmental changes in face scanning of video recordings of the mother, with infants 6–26 weeks of age, Hunnius and Geuze (2004) found that the percentage of time infants looked at the mother’s mouth region increased from 39.98% at six weeks to 57.19% at 26 weeks of age. This might reflect their growing interest in language.

Many individuals with autism do not seem to attend to faces in a typical fashion. Young children with autism tend to use the lower half of the face to identify pictures of familiar people, in contrast to comparison participants (Langdell, 1978). Adolescents with autism have been shown to be impaired at identifying the emotion in pictures of faces when the mouth region is obscured (Hobson, Ouston, & Lee, 1988). Recent findings in adults with autism have uncovered abnormalities in how these individuals visually scan faces. Pelphrey et al. (2002) studied the visual scan paths of five high-functioning adult males with autism, and five adult male comparison participants, when viewing pictures of expressive faces, and found that the participants with autism spent less time looking at the feature areas of the face (eyes, nose, mouth), and more time looking at non-feature areas (cheeks, chin, side of the face). Klin, Jones, Schultz, Volkmar, and Cohen (2002) measured gaze behavior in 15 high-functioning adolescents and adults with autism and 15 matched comparison participants while they watched short video clips showing intense social interaction. The comparison participants paid special attention to the eye regions of the actors when watching the clips, and spent relatively little time watching their mouths, whereas the participants with autism showed the opposite pattern (viewing time for eyes was 65% in the comparison group and 25% in the autism group; for mouths, 21% and 41%, respectively). The authors found that fixation times on mouths and objects, but not on eyes, were strong predictors of the degree of social competence.

There have, however, been several reports that do not support the hypothesis that face processing and face scanning are abnormal in autism, or that face processing deficits are distinct from a general perceptual abnormality (Bar-Haim, Shulman, Lamy & Reuveni, 2006; Dapretto et al., 2006; Davies, Bishop, Manstead, & Tantam, 1994; Teunisse & de Gelder, 1994; Volkmar, Sparrow, Rende, & Cohen, 1989). For recent reviews of these studies, see Jemel, Mottron, and Dawson (2006), and Sasson (2006).

Retrospective studies of home movies of children’s first birthday parties have found that infants later diagnosed with autism demonstrated less attention to faces than typically-developing comparison infants or infants with developmental delay (Osterling, Dawson, & Munson, 2002). In a prospective study of one infant who later developed autism, from a family with a high incidence of the disorder, Dawson, Osterling, Meltzoff, & Kuhl (2000) found that the infant exhibited poor eye contact and low levels of social engagement during the second half of the first year of life.

Broader Phenotype

Some of the relatives of individuals with autism possess sub-clinical features of the syndrome (including difficulties in social interaction), likely due to shared autism-risk genes; these features are collectively termed the ‘broader phenotype’. They have been observed in parents and siblings of children with autism. Lainhart et al. (2002) found that 32% of parents of children with autism demonstrated the broader autism phenotype (defined as a tendency toward rigidity, hypersensitivity, or impairments in speech or friendships). Bolton et al. (1994) report that approximately 20% of siblings of children with autism demonstrate aspects of the autism syndrome, including impairments in language, learning, and social interaction. Dorris, Espie, Knott, and Salt (2004) found that children with a sibling with Asperger’s syndrome perform more poorly than controls on a version of Baron-Cohen’s ‘Eyes Test’, in which participants are asked to label a picture of the eye region of a face with an emotional word describing what the person is thinking or feeling. These studies suggest that the broader phenotype of autism is not rare among first-degree relatives of affected individuals, often affects social interaction, and may involve differences in face processing or theory of mind abilities.

The Still Face Experiment

Previous studies of face scanning in individuals with autism have used static images of faces or pre-recorded video stimuli. Gaze behavior during the viewing of slides or movies might differ from behavior during social interaction: the additional complexity of engaging in interaction might place demands on the face processing system that are not present with other stimuli. For tasks in which instructions are given (e.g., matching or identifying emotion, gender, or some other attribute) different strategies for extracting information might generate different scan patterns in individuals with autism relative to comparison participants, whether or not there is an underlying neural difference in face processing ability (see Yarbus, 1961; Hayhoe & Ballard, 2005). For these reasons, we decided to study infant face scanning during live reciprocal social interaction with a caregiver, conducted via a closed-circuit TV system.

Reciprocal social interaction was studied using the Still Face experiment (Gusella, Muir, & Tronick, 1988; Tarabulsy et al., 2003; Toda & Fogel, 1993; Tronick, Als, Adamson, Wise, & Brazelton, 1978). In this paradigm, the mother engages the infant socially, by smiling, talking to the infant, and making eye contact, for a predetermined period. She then presents the “Still Face”, by looking at the infant while displaying a blank, neutral expression. She resumes reciprocal interaction for the final episode. The pre- and post-Still Face interaction episodes allow the researcher to assess the social capabilities of the infant, and the Still Face episode to observe how the infant reacts to perturbation. The basic Still Face effect, demonstrated by researchers using face-to-face interaction (Tronick et al., 1978), as well as TV-video interaction methodology (Gusella et al., 1988), is that infants decrease their smiling frequency and increase gaze aversion when the adult interrupts the ongoing interaction and presents the Still Face.

The paradigm has been adapted to the specific study of autism: Yirmiya and colleagues (2006) found that 4-month-old infants with an older sibling with autism were less likely to become upset during the Still Face (2 of 18 at-risk infants started to fret or cry, versus 8 of 19 comparison participants). The duration of this episode was therefore longer for at-risk than comparison infants. This finding suggests that infants at-risk for autism may react differently to the negative affective component of the Still Face experienced by typically-developing infants.

Aims and Hypotheses

The ultimate goal of the project is to prospectively test face-scanning and affective/communicative behavior during reciprocal social interaction in infants at elevated risk for autism, to determine if impairments can be observed in infancy for children who later develop autism or related conditions. For this paper, we examined whether differences could be observed between the at-risk and comparison groups at 6 months of age. Diagnostic outcomes for this cohort are not yet known.

We hypothesized that some of the infants in our at-risk cohort would demonstrate a flattened, muted affective profile during the interaction: less smiling, less gaze aversion, and less fussing/crying. Eye tracking work by Klin et al. (2002) suggested that adults with autism tend to gaze at the mouth at the expense of the eyes when viewing social scenes; we therefore hypothesized that a subgroup of infants in the at-risk group would gaze more at their mother’s mouth and make less eye contact than comparison infants.

Method

Participants

Infants with an older sibling with autism (n = 31), or with an older sibling without autism (n = 24), were tested for this study. Infants were recruited using a database maintained by the M.I.N.D. Institute, and by word of mouth. For infants with an older sibling with autism (“at-risk”), the diagnosis of the proband was re-confirmed by clinicians affiliated with the project, using the Autism Diagnostic Observation Schedule (ADOS) and the Social Communication Questionnaire (SCQ). For infants with an older sibling without autism, a brief phone intake interview with the parents was used to confirm that the older sibling demonstrated no symptoms of autism, and the older sibling was required to receive an SCQ score below 15, the cut-off for autism spectrum disorders (Berument, Rutter, Lord, Pickles, & Bailey, 1999).

Ninety-one 6-month-old infants visited the lab between mid-2004 and mid-2006. Of these, 21 infants (23%; 12 [6 males] in the at-risk group, and 9 [4 males] in the comparison group) cried and could not be soothed before the experiment could be administered, during the first interaction episode (pre-Still Face), or during the first 30 sec of the Still Face episode. Data from these infants were not recorded. There were three pairs of siblings; one infant from each pair was randomly excluded. Insufficient eye-tracking data (defined as less than 15% of the Still Face episode) were recorded for 12 of the remaining 67 infants (6 in each group), leaving a total of 55 for analysis (31 at-risk and 24 comparison infants). Data on infant affective behaviors were not recorded for 1 of the 55 infants; therefore, certain analyses (e.g., smiling and gaze aversion) were conducted on the subset of 54 infants.

Three infants were brought to the lab by the father, and 1 by a grandmother (all in the at-risk group). Three of these 4 demonstrated very typical gaze behavior, and it was felt that the most conservative action was to retain them in the analysis. For simplicity, the term “mother” is used to refer to the interaction partner throughout this article.

Mean age (corrected for prematurity) was 6.06 months (SD = 0.28, range 4.83–6.44) in the at-risk group, and 5.90 months (SD = 0.35, range 4.96–6.54) in the comparison group. Fifteen of the 31 at-risk infants were male; 16 of the 24 comparison infants were male. There were no significant group differences in gender (χ2(1) = 1.86, n.s.).

Experimenters were blind to the participant’s group affiliation.

Apparatus

The Tobii ET-17 used in the study is a binocular infrared bright-pupil corneal-reflection video-oculographic eye-tracker. The fixed wide-angle camera allows data to be recorded from a freely-moving person, within a virtual ‘box’ 20 cm on each side. Gaze signals can be reacquired 100 milliseconds after blinks or other interruptions; there is no delay caused by reorientation of the camera, unlike narrow-field-of-view systems. The published precision of the system is 0.5° of visual angle; a conservative estimate for errors in calculated gaze position with infants is 1° of visual angle (approximately one centimeter on the monitor).

The experiment was conducted using two adjacent rooms. In one room, the mother was seated in front of a video camera and a TV showing a live image of the infant; in the other room, the infant was held on a person’s lap in front of a video camera and the monitor of the eye-tracking system. The live image of the mother was presented to the infant on the eye-tracking monitor. Microphones and speakers allowed the infant and the mother to hear each other.

Procedure

Calibration

A five-point calibration routine was used. The target was a looming/vanishing blue circle presented on a background of white dots, at each corner of the screen, and in the center. At each location, calibration data was collected when the eye-tracking operator observed the infant look in the direction of the target. After calibration, the operator examined a plot of gaze data recorded during the acquisition of each calibration point; bad points (instances in which the infant was not looking at the target, or was moving his/her eyes) were discarded and the routine was repeated to collect useable calibration data for that quadrant of the screen.

Still Face Procedure

The image of the mother was presented to the infant to begin the pre-Still Face interaction. Once the infant showed signs of orienting to the mother and engaging in social interaction, the operator started a timer. After 60 sec, the mother was instructed to present the Still Face—to stop interacting and display a neutral, expressionless face while looking at the infant’s eyes on the TV screen. After 60 sec of Still Face, the mother was instructed to re-engage the infant. The experiment was ended after 60 sec of post-Still Face interaction.

The experiment was halted if the infant started crying during the pre-Still Face episode, or during the first 30 sec of the Still Face episode. Up to two additional attempts to gather the data were made later in the session, after napping or feeding, if possible. If the infant started to cry during the final 30 sec of the Still Face, the episode was stopped early and the mother was instructed to re-engage the infant.

Data Synchronization, Coding, and Coder Reliability

Synchronization tests conducted in our lab indicate that the synchrony between the video of the infant, the mother, and recorded gaze data was within 500 milliseconds. Coders scored behavioral states frame-by-frame from a video of the infant during the interaction, and from a synchronized video of the mother with superimposed infant visual fixations, using Noldus Observer v. 5.0 software. Coders were blind to the participant’s group affiliation. Five categories of infant behavior were coded: smiling (smiling or not smiling), gaze direction (toward the screen or averted), affective state (upset or not upset), vocalization (on or off), and point of visual fixation.

Eye-tracking data were coded using an area-of-interest (AOI) approach. The AOIs were: the hair, left and right eye regions (including the eye brow), the nose, the bridge of the nose (between the eyes), the forehead, mouth, chin, cheeks, and non-face areas (the background behind the mother, and her shoulders). For each recording, a list of fixations was generated from the raw data, and imported into Noldus Observer software for coding. A fixation was defined as a series of data points within a 30-pixel radius (approximately 1.5° of visual angle) for a minimum duration of 100 milliseconds. Coders replayed the video and assigned each fixation to one of the 10 AOIs. Since the episode length differed for different recordings (chiefly the Still Face episode, due to early termination for infants who fussed or cried), behaviors were expressed as a percentage of the duration of the episode.
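The fixation criterion above (samples within a 30-pixel radius for at least 100 milliseconds) can be illustrated with a minimal sketch. The text gives only the two thresholds, so the grouping rule used here (compare each new sample with the running centroid of the current group), the `(time_ms, x, y)` sample format, the 20 ms sample spacing in the toy data, and the names `Fixation` and `detect_fixations` are all assumptions for illustration, not the procedure used by the eye-tracking software.

```python
from math import hypot
from typing import Iterable, List, NamedTuple, Tuple


class Fixation(NamedTuple):
    start_ms: float
    end_ms: float
    x: float  # centroid of the grouped samples, in pixels
    y: float


def detect_fixations(samples: Iterable[Tuple[float, float, float]],
                     radius_px: float = 30.0,
                     min_duration_ms: float = 100.0) -> List[Fixation]:
    """Group consecutive gaze samples (time_ms, x, y) into fixations.

    Assumed rule: a sample joins the current group if it lies within
    radius_px of the group's running centroid; a group counts as a
    fixation if it spans at least min_duration_ms.
    """
    fixations: List[Fixation] = []
    window: List[Tuple[float, float, float]] = []

    def centroid() -> Tuple[float, float]:
        return (sum(s[1] for s in window) / len(window),
                sum(s[2] for s in window) / len(window))

    def flush() -> None:
        # Keep the group only if it lasted long enough, then start over.
        if window and window[-1][0] - window[0][0] >= min_duration_ms:
            cx, cy = centroid()
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
        window.clear()

    for t, x, y in samples:
        if window:
            cx, cy = centroid()
            if hypot(x - cx, y - cy) > radius_px:
                flush()
        window.append((t, x, y))
    flush()
    return fixations


# Hypothetical data: 8 stable samples (0-140 ms), then a 40 ms glance elsewhere.
toy = [(i * 20.0, 400.0 + i % 2, 300.0) for i in range(8)]
toy += [(160.0 + i * 20.0, 600.0, 320.0) for i in range(3)]
found = detect_fixations(toy)
print(found)
```

In this toy run only the first group qualifies as a fixation; the later 40 ms glance falls below the 100 ms minimum and is dropped. The sketch does not handle gaps in the gaze signal (for example, blinks), which a real pipeline would need to treat explicitly.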

Coders were trained to 95% agreement (based on the measured duration of behavior in each category) before starting to code independently. Two coders scored social/communicative behaviors from the infant video, and three scored visual fixation data from the mother’s video. Data from 28 infants and toddlers scored by all three coders were used to calculate the Intraclass Correlation Coefficient (ICC) for each infant behavior variable; scores for each episode of the experiment were analyzed as separate cases, creating 84 cases for comparison. For the duration of infant smiling, single measure ICC was 0.95. For the duration of infant gaze at the monitor, ICC was 0.86. For the duration of infant negative affect, ICC was 0.92. For the duration of infant vocalization, ICC was 0.63 (because coders could not reliably code the duration of vocalization, this variable was not analyzed further).
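For readers unfamiliar with the statistic, the following is a minimal sketch of a single-measure ICC. The text does not say which ICC model was used; the sketch assumes the two-way random-effects, absolute-agreement form often written ICC(2,1). The function name `icc_2_1` and the ratings are hypothetical and are not data from this study.

```python
from typing import List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measure ICC (assumed two-way random, absolute agreement).

    ratings[i][j] is the score given to case i by coder j.
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of coders
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical percent-duration scores: 4 cases, each scored by 3 coders.
toy_ratings: List[List[float]] = [
    [12.0, 14.0, 13.0],
    [40.0, 38.0, 41.0],
    [75.0, 77.0, 74.0],
    [55.0, 52.0, 56.0],
]
print(f"ICC = {icc_2_1(toy_ratings):.3f}")
```

Values near 1 indicate that most of the variance in scores comes from differences between cases rather than from disagreement between coders.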

Data from 9 eye-tracking files coded by all three coders were used to calculate ICCs for select eye-tracking AOIs; scores for each episode were analyzed as separate cases, creating 27 cases for comparison. Note that for these files, the list of fixations to be coded is the same for each coder; the coder’s role is to assign each undefined fixation to the correct AOI. The ICC values for the three coders were uniformly high for the three AOIs examined (eyes, nose, and mouth): the lowest single value from the correlation matrix for any two of the three coders was 0.93 (for nose), with most above 0.96.

Results

Overview

In this section, we present data on established measures of the Still Face paradigm (smiling, gaze aversion, and negative affect), as well as new measures describing infant visual fixation patterns. For the fixation measures, we proceed from an analysis of the total amount of fixation, to fixations directed at the face versus other areas of the image, to fixations directed at individual facial features, to an analysis of an index of fixations directed at eyes relative to the eyes and mouth. Cluster analysis is performed using this index, and the resultant clusters are tested against experimental group. Final analyses compare the mean durations of social/communicative behaviors between clusters.

Infant Smiling, Gaze Aversion, and Negative Affect

In order to confirm that our methodology produced the anticipated effect on infant behavior, we examined the percent duration of smiling and gaze aversion during each episode of the experiment. Two (group) by 3 (episode) repeated-measures ANOVAs revealed a main effect of episode (for smiling: F(2,51) = 27.58, P < 0.001; for gaze aversion: F(2,51) = 22.24, P < 0.001), but no group effect, or group by episode interaction, for either behavior (see Fig. 1). Simple effects analysis indicated that infants smiled more during the pre-Still Face than the Still Face (t(53) = 7.52, P < 0.001), and looked away from the mother more during the Still Face than the pre-Still Face (t(53) = 6.83, P < 0.001). The effect sizes (measured using Pearson’s r, with the convention that r = 0.10 represents a small effect, 0.30 represents a medium effect, and 0.50 represents a large effect) were 0.72 and 0.68, indicating that these were large effects. There was no difference between the pre- and post-Still Face episode for either behavior (for smiling, t(53) = 1.20, n.s., r = 0.16; for gaze aversion, t(53) = 1.78, n.s., r = 0.24), indicating that infants returned to pre-perturbation baseline behavior during the post-Still Face.
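The effect sizes quoted above are consistent with the standard conversion from a t statistic to Pearson's r, r = sqrt(t² / (t² + df)). The text does not state the formula, so treat the conversion as an assumption; the helper name `effect_size_r` is illustrative.

```python
import math


def effect_size_r(t: float, df: int) -> float:
    """Convert a t statistic to r, assuming r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


# t values and degrees of freedom (df = 53) as reported in the text.
reported = [
    ("smiling, pre- vs. Still Face", 7.52),
    ("gaze aversion, pre- vs. Still Face", 6.83),
    ("smiling, pre- vs. post-Still Face", 1.20),
    ("gaze aversion, pre- vs. post-Still Face", 1.78),
]
for label, t in reported:
    print(f"{label}: r = {effect_size_r(t, 53):.2f}")
```

Under this assumption the four comparisons give r = 0.72, 0.68, 0.16, and 0.24, matching the values in the text.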

Fig. 1 Percent smiling, gaze aversion, and negative affect (fussing and crying) during each episode of the experiment. Error bars show ±2 SE of the mean

To investigate the hypothesis that infants at-risk for autism demonstrate less negative affect than comparison infants, we analyzed the percent duration of negative affect (fussing and crying), and the overall likelihood for infants in each group to become upset. A 2 (group) by 3 (episode) repeated-measures ANOVA for the percent duration of negative affect revealed a significant main effect of episode (F(2,51) = 14.61, P < 0.001), but no main effect of group or group by episode interaction (see Fig. 1). Infants cried and fussed more during the Still Face and post-Still Face episodes (simple comparison between pre- and Still Face: t(53) = −4.53, P < 0.001; for pre- versus post-Still Face, t(53) = −5.16, P < 0.001). Effect sizes for these comparisons were large (r = 0.53 and 0.58, respectively). The mean duration of the Still Face episode was the same in both groups (one-way ANOVA F(1,52) = 0.06, n.s., at-risk M = 57.36 sec, SD = 8.46, comparison M = 57.92, SD = 7.68). There was no difference in the number of infants in each group who were scored as demonstrating negative affect for more than one second during the Still Face (14 of 24 comparison infants, compared to 17 of 30 at-risk infants; χ2(1) = 0.037, n.s.).

Percentage of Eye Tracking Data Recorded

The total percentage of each episode during which eye tracking data was recorded also showed a main effect of the episode of the experiment (F(2,52) = 18.36, P < 0.001): less data was recorded during the Still Face episode than the pre-Still Face, when infants looked away from the screen more (simple comparison for these two episodes: t(54) = 5.98, P < 0.001, r = 0.63). Less data was recorded during the post-Still Face than during the pre-Still Face episode (t(54) = 4.08, P < 0.001). This medium-sized effect (r = 0.49) was most likely due to the increase in infant fussiness between the pre- and post-Still Face episodes. These results demonstrate that the effect of the Still Face manipulation, as measured by infant gaze direction coded from video, can also be observed using only eye tracking data. There was no main effect of group (F(1,53) = 0.65, n.s.) or group by episode interaction for the total amount of eye tracking data that was recorded (F(2,52) = 0.36, n.s.).

Face Versus Non-face Gaze

We next examined gaze duration at face vs. non-face areas in each of the conditions, using a ratio of the amount of gaze to the face divided by the total gaze. A 2 (group) by 3 (episode) repeated-measures ANOVA demonstrated that both groups looked almost exclusively at the face and hair, and not at the background behind the mother, the mother’s neck, or shoulders (pre-SF, at-risk group M = 97.7%, SD = 3.2%, comparison group M = 94.4%, SD = 6.5%; Still-Face at-risk M = 92.8%, SD = 9.9%, comparison M = 90.6%, SD = 10.4%; post-SF at-risk M = 93.9%, SD = 12.3%, comparison M = 94.3%, SD = 8.6%). There was a significant effect of episode; the ratio of gaze to the face decreased slightly during the Still Face, and then returned to baseline (for episode, F(2,52) = 5.78, P < 0.01; simple comparison between pre- and Still Face: t(54) = 3.49, P < 0.01; between pre- and post-Still Face, t(54) = 1.82, n.s.). Effect sizes for these simple comparisons were medium (r = 0.43) and small (r = 0.24). There was no main effect for group and no significant interaction between group and episode.

Gaze Duration for Individual Regions of the Face

In order to analyze gaze to the eyes, mouth, and other face areas independently, we conducted a 3 (face region) by 3 (episode) by 2 (group) repeated-measures ANOVA, comparing the percentage of each episode infants fixated the eyes, mouth, and sum of other face areas (nose, hair, forehead, chin, and cheeks). The denominator for these variables was the total duration of the episode. The analysis revealed a main effect of episode (F(2,53) = 24.52, P < 0.001), but no main effect of face region or group (for region, F(2,106) = 0.38, n.s.; for group F(1,53) = 1.11, n.s.). There was a significant region by episode interaction (F(4,212) = 19.78, P < 0.001); no other interactions were significant. Contrasts for this interaction (given in Table 1) revealed that gaze to the three face regions changed between the pre- and Still Face episodes, but not between the pre- and post-Still Face episodes. This indicates that infants changed their pattern of looking at the face during the Still Face, and returned to baseline behavior during the post-Still Face. Figure 2 shows the percent duration of gaze to the eyes, mouth, and other face areas, by group and episode.

Table 1 Within-subjects contrasts for face region by episode interaction

Fig. 2 Percentage of each episode during which infants visually fixated the eyes, mouth, and other face areas. Error bars show ±2 SE of the mean

Eye-Mouth Index

To further investigate gaze to the eyes and mouth, and to capture changes in gaze to these regions within each participant across episodes independent of differences in total gaze or episode duration, we computed an Eye-Mouth Index (EMI): the proportion of gaze to the eyes out of the time the infant attended to either the eyes or the mouth (Eyes/(Eyes + Mouth); see Footnote 1). A low EMI (less than 0.50) means that the infant looked relatively more at the mouth; a high EMI (greater than 0.50) means that the infant looked relatively more at the eyes.
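As a small worked illustration of the index, the sketch below computes Eyes/(Eyes + Mouth) from gaze durations. The function name `eye_mouth_index`, the use of milliseconds, and the choice to return `None` when an infant looked at neither region are assumptions for illustration; the example durations are made up.

```python
from typing import Optional


def eye_mouth_index(eyes_ms: float, mouth_ms: float) -> Optional[float]:
    """Return Eyes / (Eyes + Mouth), or None if neither region was fixated."""
    total = eyes_ms + mouth_ms
    if total <= 0:
        return None  # index is undefined without any eye or mouth gaze
    return eyes_ms / total


# Hypothetical infant: 1.2 s on the eyes and 3.6 s on the mouth in one episode.
emi = eye_mouth_index(1200.0, 3600.0)
print(emi)  # 0.25, i.e., relatively more gaze to the mouth
```

Because the denominator includes only eye and mouth gaze, the index is unaffected by how long the episode lasted or how much the infant looked elsewhere.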

A 2 (group) by 3 (episode) repeated-measures ANOVA for Eye-Mouth Index (EMI) revealed a main effect of the episode of the experiment (F(2,106) = 51.78, P < 0.001). Simple effects tests demonstrated that the mean EMI increased between the pre-Still Face and Still Face episodes (t(54) = −8.27, P < 0.001), but did not differ between the pre- and post-Still Face episodes (t(54) = −0.33, n.s.). This indicates that infants shifted to looking more at the mother’s eye region relative to the mouth during the Still Face, and returned to baseline during the post-Still Face. The effect size was large for the comparison between pre- and Still Face (r = 0.75), and negligible for the comparison between pre- and post-Still Face (r = 0.05). There was no group main effect (F(1,53) = 2.56, P = 0.11), or group by episode interaction (F(2,106) = 1.47, n.s.). Figure 3 shows the mean EMI during each episode, by group.

Fig. 3 Mean Eye–Mouth Index (i.e., Eyes/(Eyes + Mouth)) by group and by cluster during each episode. Error bars show ±2 SE of the mean

Cluster Analysis and Chi Squares

A close examination of EMI scores suggested the possibility of a subgroup of infants who displayed unusually low scores (i.e., very little gaze at the eyes). As an empirical test of this observation, a hierarchical agglomerative cluster analysis (using Ward’s method, with squared Euclidean distance as the similarity measure) was used to classify infants on the basis of gaze behavior (Aldenderfer and Blashfield, 1984; see Footnote 2). The cluster analysis was conducted without respect to experimental group, using three variables: Eye-Mouth Index scores during each episode of the experiment.
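To make the clustering step concrete, here is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion on squared Euclidean distances, using the Lance-Williams update to recompute distances after each merge. It is not the software used in the study; the function name `ward_clusters`, the stopping rule (a fixed number of clusters), and the EMI triples are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ward_clusters(points: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Merge points bottom-up with Ward's method until n_clusters remain."""
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    dist: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(points)), 2):
        # Squared Euclidean distance between the two observations.
        dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    def d(a: int, b: int) -> float:
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(points)
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose merge costs the least.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d(*p))
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            # Lance-Williams update for Ward's method.
            dist[(k, next_id)] = (
                (ni + nk) * d(k, i) + (nj + nk) * d(k, j) - nk * d(i, j)
            ) / (ni + nj + nk)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical EMI scores (pre-, Still Face, post-) for six infants.
toy_emi = [
    (0.90, 0.95, 0.90), (0.85, 1.00, 0.80),   # mostly eyes
    (0.10, 0.15, 0.10), (0.20, 0.10, 0.15),   # mostly mouth
    (0.20, 0.90, 0.25), (0.30, 0.85, 0.20),   # mouth, then eyes, then mouth
]
groups = ward_clusters(toy_emi, n_clusters=3)
print(groups)
```

On this toy input the three profiles separate cleanly into pairs. With real data the number of clusters is a judgment call, usually guided by where the merge cost jumps.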

Three clusters were generated: infants who gazed predominantly at the eyes (a “high EMI” cluster, n = 33, 16 at-risk and 17 comparison); infants who gazed predominantly at the mouth (a “low EMI” cluster, n = 11, 10 at-risk and 1 comparison); and infants who gazed predominantly at the mouth during the pre- and post-Still Face episodes (when the mother was speaking), and predominantly at the eyes during the Still Face episode (a “low–high–low” EMI cluster, n = 11, 5 at-risk and 6 comparison; see Footnote 3). Figure 3 shows the mean EMI during each episode, by cluster. Figures 4 and 5 show visual fixation data during the Still Face episode for each infant in the study, by group and cluster.

Fig. 4 Hotspot plots of visual fixations during the Still Face episode for 30 of 31 infants in the at-risk group (a plot for 1 infant could not be created due to data corruption subsequent to fixation coding; the missing infant received an EMI score during the Still Face of 0.13, and belongs at the end of the second-to-last row). The blue box contains recordings from infants in the high EMI and low–high–low clusters; the orange box contains infants in the low EMI cluster. The plots are arranged in order of Eye–Mouth Index during the Still Face episode, which decreases from left to right and top to bottom of the figure (except in the blue box, which contains several “tied” recordings from infants with an EMI of 1.0). The mean EMI for each row is shown in the column on the left. Each fixation is represented as a shaded circle with a radius corresponding to approximately 1 degree of visual angle. The color scale represents the cumulative duration of fixation from 0 to 2000 milliseconds, with red representing longer duration. Note that the plots do not show the total duration of fixation; if several fixations occurred in the same location, the durations were added until the scale maximum was reached

Fig. 5 Hotspot plots of visual fixations during the Still Face episode for infants in the comparison group (n = 24). The blue box contains recordings from infants in the high EMI and low–high–low clusters; the orange box contains the infant in the low EMI cluster. The mean EMI for each row is shown in the column on the left

Chi-square analysis using the 2 experimental groups and 3 clusters (i.e., a 2 by 3 contingency table) demonstrated a significant association between group and cluster (Likelihood Ratio χ2(2) = 7.76, P < 0.05). In order to investigate the “group by category” interaction reflected by this result, we conducted “simple effects” analyses, by comparing each cluster to each other cluster, by group. There was a significant association between group and cluster when the low EMI cluster was compared to the high EMI cluster (χ2(1) = 7.12, P < 0.01); there was also a significant association for the low EMI versus low-high-low cluster comparison (χ2(1) = 5.66, P < 0.05). There was no association when the high EMI cluster was compared to the low-high-low cluster (χ2(1) = 0.03, n.s.). These findings reflect the fact that, based on the overall ratio of proportions (low-EMI at-risk infants/all at-risk infants)/(low-EMI comparison infants/all comparison infants) = (10/31)/(1/24), which is strictly a relative risk rather than an odds ratio, at-risk infants were 7.74 times more likely to belong to the low EMI cluster than comparison infants.
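The likelihood-ratio statistics and the ratio of proportions quoted above can be checked from the cluster counts reported earlier (at-risk: 16 high, 10 low, 5 low-high-low; comparison: 17, 1, 6). The sketch below assumes the usual likelihood-ratio formula G = 2 Σ O ln(O/E) with expected counts taken from the table margins; the function name `likelihood_ratio_chi2` is illustrative, and the statistics package used in the study may differ in rounding.

```python
import math
from typing import List, Sequence


def likelihood_ratio_chi2(table: Sequence[Sequence[int]]) -> float:
    """Likelihood-ratio chi-square, G = 2 * sum(O * ln(O / E)), for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Rows: at-risk, comparison. Columns: high EMI, low EMI, low-high-low.
counts: List[List[int]] = [[16, 10, 5], [17, 1, 6]]
g_overall = likelihood_ratio_chi2(counts)
g_low_vs_high = likelihood_ratio_chi2([[10, 16], [1, 17]])
g_low_vs_lhl = likelihood_ratio_chi2([[10, 5], [1, 6]])

# Proportion of each group that falls in the low EMI cluster, and their ratio.
risk_ratio = (10 / 31) / (1 / 24)

print(f"overall G = {g_overall:.2f}")
print(f"low vs. high G = {g_low_vs_high:.2f}")
print(f"low vs. low-high-low G = {g_low_vs_lhl:.2f}")
print(f"ratio of proportions = {risk_ratio:.2f}")
```

Computed this way, the statistics land within a few hundredths of the reported values, and the ratio of proportions comes out at 7.74.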

In order to determine if the group by EMI cluster association could be related to underlying differences in behaviors other than visual fixation, we conducted 2 by 3 repeated-measures ANOVAs for smiling, gaze aversion, negative affect, and overall fixation metrics (the total percentage of each episode during which eye tracking data was recorded, and the ratio of face- to non-face fixation). Since the group by cluster interaction was generated by the low EMI cluster, we collapsed the high EMI and low-high-low clusters (i.e., we compared the 11 infants in the low EMI cluster to the other 44 infants). There were no significant main effects for the between-subjects factor (low EMI cluster versus other) or factor by episode interactions for any of these analyses (for smiling, between-subjects factor main effect F(1,52) = 0.03, n.s., interaction F(2,104) = 0.09, n.s.; for gaze aversion, factor main effect F(1,52) = 0.15, n.s., interaction F(2,104) = 1.07, n.s.; for negative affect, factor main effect F(1,52) = 0.05, n.s., interaction F(2,104) = 0.04, n.s.; for total percentage of gaze data recorded, factor main effect F(1,53) = 0.89, n.s., interaction F(2,106) = 0.40, n.s.; for face-versus-non-face gaze, factor main effect F(1,53) = 0.29, n.s., interaction F(2,106) = 0.17, n.s.). These results suggest that the 11 infants in the low EMI cluster did not demonstrate differences in social/affective behavior, or in the duration of fixation, compared to infants exhibiting other visual fixation patterns.

Discussion

In general, the at-risk and comparison groups were remarkably similar with respect to the established measures of the Still Face effect (smiling, gaze aversion, and negative affect). While it is conceivable that future retrospective analyses may uncover subtle behavioral differences among infants who later develop autism or the broader phenotype, our first hypothesis, that infants in the at-risk group would show a muted affective profile and an attenuated response to the Still Face manipulation, was not substantiated.

In contrast to another recent study testing the response of infants at-risk for autism to the Still Face manipulation (Yirmiya et al., 2006), we found similar levels of distress in both groups. Several methodological differences might account for the failure to replicate the Yirmiya et al. finding. The infants in the present study were two months older, and the Still Face episode (which lasted up to 60 sec) was shorter than in the Yirmiya et al. report. Furthermore, our administration procedures were specifically designed to reduce the likelihood that an infant would begin to cry, in order to maximize the recording of gaze data during the Still Face. This was accomplished by having the infant held in a person’s lap (rather than in an infant seat), limiting each episode of the experiment to one minute (some other labs have also used 60 sec, while others have used longer intervals), and having the person holding the infant soothe the child by patting and bouncing.

When the duration of visual fixation to the eye region, mouth, and the sum of other face areas was analyzed independently for each region, we did not find overall group differences. There were also no overall group differences when an index of gaze to the eyes relative to the eyes and mouth was analyzed using a parametric ANOVA. Heterogeneity of variance, due to the presence of a putative subgroup within the at-risk cohort, may have attenuated the power of the parametric analysis to detect overall group differences.

Our second hypothesis, that a subgroup of at-risk infants would demonstrate reduced gaze to the mother’s eyes and increased gaze to her mouth, was substantiated. The subgroup was identified using cluster analysis, conducted without respect to experimental group. Gaze behavior cluster membership was significantly associated with experimental group; this association was due to the low EMI cluster, which was overwhelmingly composed of at-risk infants. Eleven out of the total n of 55 infants demonstrated very little gaze at the eyes, throughout the experiment. Of these 11, 10 had an older sibling with autism; these infants constituted about a third of the at-risk cohort (10/31). The other infant in this subgroup belonged to the comparison cohort; the visual fixation pattern of limited gaze at the eyes and considerable gaze at the mouth was rare in this cohort (this infant was 1 out of 24 comparison infants).

The remaining infants in both groups (n = 44) made considerable eye contact during the Still Face episode. Eleven of these (5 at-risk and 6 comparison) were identified as belonging to a separate behavioral cluster (“low-high-low” EMI), characterized by a tendency to gaze at the mother’s mouth during the pre- and post-Still Face episodes, when she was speaking. A total of 33 infants (16 at-risk and 17 comparison) maintained a high degree of gaze at the eyes throughout the experiment.

Many factors likely interact to regulate gaze behavior during reciprocal interaction. Information about the direction of the mother’s gaze, and her emotional state, can be ‘read’ from her eyes; gaze at the mouth may be involved in learning to read expression and parse language. This is consistent with the behavior of adults during ‘lip-reading’; when adults view video of people talking audibly, they tend to gaze at the speaker’s eyes, but when viewing silent video of people talking, they tend to look at the mouth (Lansing & McConkie, 2003).

Our study suggests that the factors regulating gaze behavior are strongly affected by the Still Face manipulation (this is supported by the main effect of episode and region by episode interactions for visual fixation variables). One hypothesis is that infants who look predominantly at the mother’s eyes during the Still Face episode (23/24 comparison and 21/31 at-risk) are seeking to extract social information. Six-month-olds have not been shown to be capable of mental state attribution (Csibra, 2003), but some would argue that the infant’s interest in looking at the mother’s eyes reflects a drive to determine her direction of gaze (“is this Still Face directed at me?”), or her intent (“why is she no longer interacting?”). A different hypothesis (not invoking social information processing) is that the mother’s mouth and eyes are both attractors of infant gaze when the mother is talking; when she poses the Still Face, her mouth loses its visual appeal and infants default to looking at the eyes. The extension of these hypotheses to the low EMI subgroup is that infants who make little eye contact may lack the drive to extract social information from the mother’s eye region, or may be demonstrating an imbalance between hypothetical eye and mouth gaze attractors. Interestingly, despite markedly different visual fixation behavior, the infants in the low EMI subgroup did not differ from other infants with respect to social/affective measures. This suggests that gaze at the mother’s eyes may not be necessary for an infant to experience the negative affective component of the Still Face manipulation. These findings tend to argue against the notion that infants in the low EMI subgroup have diminished drive to extract social information; however, the Still Face effect may be so powerful that nuanced gaze behavior and/or normal social motivation is not necessary for the infant to generate the appropriate response. The unequal group sizes (11 low EMI infants versus 44 others) in the analyses comparing social/affective behaviors across gaze behavior clusters dictate that these results be viewed as suggestive but not definitive.

Several researchers have proposed that early developmental failures in regions of the brain involved in face processing are instrumental in causing autism. Schultz and colleagues hypothesize that the amygdala is involved in directing visual attention to the eye region of faces; abnormal development of the amygdala results in reduced visual attention to the eye region; this, in turn, results in reduced visual input to cortical face processing areas, especially the fusiform gyrus. Deprived of appropriate input, these areas fail to develop normally, and the infant does not acquire “expertise” in extracting information such as expression, identity, and gender from faces (Schultz, 2005; Schultz et al., 2003). According to this model, deficits in the amygdala-fusiform system have a cascading effect on the acquisition of other social skills, and on language, resulting in autism. In the contrasting Davidson model (Dalton et al., 2005), amygdala hyperactivity produces abnormal anxiety and fear when affected infants view faces, especially the eye region. This heightened fear response to social stimuli causes affected infants to avert their gaze from the eyes, and (similar to the Schultz model) this results in atypical development of the fusiform and orbitofrontal cortical areas necessary for the development of normal social abilities.

The converse of these behavioral hypotheses regarding the pathogenesis of autism is that putative face processing deficits observed in this population are a result of having autism, which acts on brain systems necessary for normal social behavior at various stages of development, through as-yet unknown mechanisms. A hypothesis put forth by Dawson and colleagues (Dawson, Webb, & McPartland, 2005; Dawson et al., 2002; Dawson et al., 2005) posits that face processing deficits are secondary to a primary deficit in social motivation, which results in reduced attention to faces, due to a failure of dopaminergic reward pathways activated in response to social reward (such as eye contact). It is worth pointing out that because these theories posit different neural mechanisms to explain the same behavior, they cannot be disambiguated using eye-tracking data alone.

An outstanding question relevant to this debate between cause- and consequence-hypotheses is the relationship between face scanning and face processing in the brain. Is abnormal face scanning due to fundamental disruption of face processing networks that extract information from the face (e.g., amygdala, fusiform gyrus, superior temporal gyrus, and orbitofrontal cortex), or are these areas functionally intact, but the motivation to extract the information absent? This could be clarified by studies relating eye movement patterns to indices of face-processing activity, such as ERP N170 and event-related fMRI studies in individuals with autism, and in the typical population.

It is not known how features of the broader autism phenotype will manifest in infants, but abnormal face processing, and related scanning differences, might be one such manifestation. In a recent study of face recognition performance and event-related brain potentials to face- versus non-face stimuli in parents of children with autism, Dawson et al. (2005) found that this group of parents demonstrated a decrement in face recognition relative to visual spatial and verbal abilities. The authors also found that these parents failed to show shorter-latency N170 ERP responses to faces compared to non-face stimuli, relative to comparison participants. They suggest that face processing ability may be a functional trait marker of genetic susceptibility to autism. The proportion of at-risk infants in our sample in the low EMI cluster (10/31, 32%) is comparable to the proportion of infants in this cohort who are predicted to develop the broader phenotype (estimates range from 20–40%; Bailey, Palferman, Heavey, & Le Couteur, 1998). Once outcome measures sensitive to the presence of the broader phenotype of autism in toddlers and young children are developed, we will be able to test the correspondence between measures based on clinically-recognizable problems (of peer interaction, language delays, shyness, etc.) and measures of face scanning in infancy.

Limitations of the study are the modest sample size of 55 infants, the lack of outcome data allowing association between our findings and autism or the broader phenotype, and the inherent confound of infant research: social interaction could only be tested for infants who did not cry before or during the initial phases of the paradigm. A methodological limitation is that the mother/infant interaction was conducted through a closed-circuit TV system. A modest attenuation of infant smiling, and reduced gaze toward the mother, has been reported when face-to-face interaction is compared to TV-video (Gusella et al., 1988). Our methodology elicited the expected infant behaviors and behavioral changes (e.g., the Still Face effect), but we cannot rule out the possibility that the altered sensorium (the sound of the mother’s voice from a speaker behind the eye tracking monitor, and the large, bright image of her face) might be affecting the at-risk infants differently than the controls. This could be the case if sensory processing or integration is different in this cohort. The fact that there were no group by episode, or cluster by episode, interactions for the total amount of eye tracking data that were recorded may help to mitigate these concerns: these findings suggest that the methodology was effective at capturing attention equivalently in both groups, and in the clusters.

Our study suggests that studying gaze patterns during live interaction, in children and adults already diagnosed with autism, would also be useful. This would help address the ongoing debate about whether individuals with autism demonstrate abnormal face scanning dissociable from other perceptual or attentional deficits, which has been addressed using static and recorded dynamic stimuli. It would also help clarify the significance of the present findings: is diminished attention to the eyes relative to the mouth during a Still Face interlude a marker for the presence of autism or the broader phenotype in older individuals? It is also possible that low EMI during the Still Face episode is an endophenotype indexing genetic risk for autism and its more broadly defined features, similar to smooth pursuit eye movement abnormalities that have been found in unaffected family members of individuals with schizophrenia (Hong et al., 2006). Such findings would support the use of this behavior to assist in the identification of autism or the broader phenotype in pre-lingual infants.

Finally, the study has also generated useful new information about typical development. There is still a great deal that is not known about infant visual fixation patterns during social interaction, when viewing faces, and when performing other tasks. One advantage of the design is that by recruiting a sizable cohort of comparison infants, we generate norms for gaze behavior during the Still Face paradigm. These normative data may help researchers design the next generation of experiments in the field of early identification of autism and autism spectrum disorders.