Introduction

The human face. It is our identity—the historical record of who we are in the present, who we were in the past and who we will be in the future. The face reflects our internal emotions and cognitions, providing clues to others about what we may be feeling and thinking in the moment. As face perceivers, most people are “face experts”—able to recognize familiar faces and interpret facial emotions in a single glance without conscious effort or forethought. However, for individuals with autism spectrum disorder (ASD), it is often challenging to recognize the identity of faces and to correctly decode their displayed emotions. If individuals with ASD have difficulty perceiving and understanding the meaning revealed in a face, it is not surprising that they would encounter problems during everyday social interactions that depend heavily on interpreting facial cues. In this paper, we will explore the perceptual, motivational and social bases of face processing in autism. These accounts help to explain how breakdowns in face processing can lead to problems in social and emotional functions related to ASD. We propose that patterns of face impairment in autism are best explained by a perceptual strategy that involves avoiding face information in the eye region. An “eye avoidance” hypothesis explains why individuals with ASD have difficulty recognizing faces, interpreting facial emotions and understanding the intentions of others through the social meaning conveyed in the eyes. Based on this account, we will discuss how the “eye avoidance” perspective can be incorporated into interventions aimed at enhancing social and emotional functioning in individuals with autism.

Autism Face Recognition Abilities: Perception and Recognition of Facial Identity

Although not a defining characteristic of the disorder, many persons with autism show deficits in their perception and recognition of face identity. Compared to typically developing individuals, persons with ASD struggle in tasks involving the discrimination of facial identities (Behrmann et al. 2006b; Tantam et al. 1989; Wallace et al. 2008), recognition of familiar faces (Boucher and Lewis 1992) and immediate recognition of novel faces (Blair et al. 2002; Boucher and Lewis 1992; Gepner et al. 1996; Hauck et al. 1998; Klin et al. 1999). In a large sample of children with ASD and typically developing children matched for age and full-scale IQ (66 per group), the ASD group performed significantly worse than the typically developing (TD) group on face matching tasks across expression (Cohen’s d: 1.00), when the eyes were masked (Cohen’s d: .56) and when the mouth was masked (Cohen’s d: .89) (Wolf et al. 2008). Other studies argue for the face-specificity of these impairments because individuals with ASD do not differ from control participants in their ability to recognize non-face objects, such as cars and houses (Lopez et al. 2004; Wallace et al. 2008; Wolf et al. 2008).
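For orientation, Cohen’s d expresses the difference between two group means in units of their pooled standard deviation; the standard formula (stated here as general background, not as reported by Wolf et al. 2008) is

\[ d = \frac{\bar{M}_{\mathrm{TD}} - \bar{M}_{\mathrm{ASD}}}{s_{\mathrm{pooled}}}, \qquad s_{\mathrm{pooled}} = \sqrt{\frac{(n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}} \]

so a value of d = 1.00 indicates that the group means differ by roughly one pooled standard deviation.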

Despite the mounting evidence, Simmons et al. (2009) suggest that many of the identified face deficits apply to the processing of unfamiliar faces and do not apply to the processing of familiar faces. In their recent and extensive review of the literature, Weigelt et al. (2012) found that about half of the reviewed studies (N = 46) provided evidence in favor of face deficits in autism whereas the other half (N = 44) showed no difference between ASD and non-ASD groups. Across those studies, however, systematic breakdowns in face processing emerged: face deficits are most pronounced when the face task involves either immediate or long-term memory for faces or requires processing information from the eyes.

A fundamental issue in the autism face research is whether the observed face deficits of individuals with ASD reflect a qualitative breakdown or quantitative impairment in the face processing system. That is, do individuals with ASD lack a fundamental operation and/or neural mechanism that is critical to normative face processing? Alternatively, individuals with ASD may possess the cognitive strategies and neural mechanisms typical of normative face processing, but employ these operations differently than neurotypical individuals.

The Holistic Face Hypothesis: Are Face Deficits in Autism Due to a Lack of Holistic Processing?

A viable test between the qualitative and quantitative accounts is the measure of holistic processing. It has been argued that, more than other forms of recognition, face processing is “holistic” in that recognition depends on the integration of the individual eye, nose and mouth parts. In the face recognition literature, three tasks have served as the gold standards of holistic processing: the face inversion task, the composite task and the parts-wholes task. As revealed by these measures, recognition of faces is more holistic than recognition of non-face objects (e.g., cars, houses). If individuals with ASD lack the holistic operation that is essential to normal face recognition, we would expect them to show less holistic processing on the inversion, composite and part-whole tasks compared to non-ASD participants.

The Face Inversion Effect

Although all objects are more difficult to recognize when seen upside down, inversion disproportionately disrupts the recognition of face identity relative to the recognition of non-face objects (Yin 1969). As one of the most robust phenomena in the face recognition literature, the Face Inversion Effect has been demonstrated across short- and long-term memory paradigms and is found regardless of whether study faces or test faces are inverted. It is hypothesized that normal holistic face processes are disrupted when a face is inverted, forcing the observer to process the face stimulus not as an integrated whole, but in a piecemeal fashion. If individuals with autism lack holistic face strategies, the prediction is that they would show a reduced Face Inversion Effect compared to non-ASD individuals. However, the face inversion evidence bearing on a qualitative impairment of holistic processing in autism is mixed.

In support of the qualitative view, Hobson et al. (1988) found that children with and without autism did not differ in their ability to memorize and recognize upright faces. However, when tested several days later, the ASD group failed to show the typical Face Inversion Effect and recognized more faces in the inverted condition than the typically developing (TD) group. Similarly, Rose et al. (2007) found that in an immediate memory task, children with ASD showed no difference in their ability to recognize upright and inverted faces whereas TD children exhibited a reliable Face Inversion Effect.

However, other studies have shown that individuals with ASD, like individuals without ASD, exhibit a reliable Face Inversion Effect. Lahaie et al. (2006) found that in an immediate memory recognition task, adults with ASD exhibited a normal inversion effect for faces, but not for artificial objects. Similarly, Scherf et al. (2008) found that both typically developing and ASD groups of children and adults displayed a reliable Face Inversion Effect and that the groups did not differ with respect to the magnitude of the effect. The absence of group differences casts doubt on the qualitative view of atypical ASD face processing, suggesting that individuals with ASD, like typical individuals, perceive upright faces in terms of the whole face and inverted faces in terms of their parts.

The Composite Face Effect

A more direct measure of holistic face processing is the Face Composite Task. Here, a composite face stimulus is formed by combining the top half of one face identity with the bottom half of another face identity (see Fig. 1). The impression of the face composite is that it resembles neither the person depicted in the top half nor the person depicted in the bottom half, but takes on a new, emergent identity. In the Face Composite Task, the participant is asked to report the identity of the person in the cued top half (or bottom half) of the face while ignoring information in the uncued bottom (or top) half. The main finding is that neurotypical individuals find it difficult to selectively attend to the cued portion of the face due to holistic interference from the to-be-ignored half (Young et al. 1987). Critically, the holistic interference effect is diminished when the top and bottom halves are misaligned or when the aligned composite face is inverted (Rossion, n.d.). The Face Composite Task thus provides strong evidence that face perception is holistic, whereby the top half of a face influences the perception of the bottom half and vice versa.

Fig. 1

The face composite task. A composite face is created by joining the top half of one face with the bottom half of another face. In the example, participants would be asked to judge whether the top halves of the faces are the same or different when the composite faces are either a aligned or b misaligned

Do individuals with ASD experience the same degree of holistic interference in the Face Composite Task as non-ASD individuals? In an initial study, Teunisse and de Gelder (2003) reported an absence of holistic interference in individuals with autism: persons with ASD recognized the top halves of faces equally well whether they were shown in aligned or misaligned composite faces. However, the results of this study have been criticized because the ASD and non-ASD groups were not equated for IQ and additional between-group statistics were unreported. After controlling for these variables, Gauthier et al. (2009) found that individuals with ASD demonstrate interference from irrelevant parts when the top and bottom halves of faces are aligned and, surprisingly, when the top and bottom halves are misaligned. For misaligned faces, the authors speculated that individuals with autism may attend to both halves of the face, thereby producing a type of “contextual” holistic interference. Finally, Nishimura et al. (2008) found that adult participants with ASD, like age- and IQ-matched participants without ASD, demonstrate a composite effect in which the normal holistic interference is observed in the aligned composite, but not the misaligned composite. Like the inversion studies, results from the face composite studies fail to convincingly link autism with a failed holistic face processing system. Depending on the selected study, the empirical results indicate that individuals with ASD exhibit either a typical holistic interference effect (Nishimura et al. 2008), no holistic interference (Teunisse and de Gelder 2003) or a “super” holistic interference (Gauthier et al. 2009).

The Part/Whole Task

Similar to the Face Composite Task, the Part/Whole Task is a direct measure of holistic processing. In this paradigm, participants study a whole face for a brief study period and are then asked to make a forced-choice recognition decision. In the isolated condition, a target face part (e.g., the eyes) from the study face and its foil are presented by themselves. In the whole face condition, the target face part is shown in the original study face alongside a foil whole face in which the non-target features (e.g., nose, mouth) are held constant (see Fig. 2). An advantage of the Part/Whole Task is that it tests holistic memory for individual face features. According to the holistic hypothesis, recognition of face parts should be better in the whole face context than in isolation if the parts are integrated into a holistic representation. Consistent with this prediction, part recognition is superior when the parts are presented in the context of whole faces, but no evidence of holistic recognition is found in the context of scrambled faces, inverted faces or non-face stimuli (houses) (Tanaka and Farah 1993).

Fig. 2

Part/Whole Task. a Participants are shown a study face. At test, participants are asked to identify a “part” of the target face (e.g., eyes) presented either b in isolation or c in the whole face. In the isolated part and whole face test conditions, the target and foil items differ only with respect to the eye part under test
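To make the design concrete, the following minimal sketch (in Python; the labels and trial structure are our illustrative assumptions, not the materials of the studies discussed below) lays out the conditions of a Part/Whole trial: the feature under test (eyes or mouth) is crossed with the test context (isolated or whole face), and the target and foil displays differ only in that feature.

# Illustrative sketch of the Part/Whole design (hypothetical labels, not the
# authors' stimuli). Each trial tests one feature in one context; the holistic
# effect is the accuracy advantage for "whole" over "isolated" trials.
from dataclasses import dataclass
from itertools import product

@dataclass
class PartWholeTrial:
    feature: str   # which feature is under test: "eyes" or "mouth"
    context: str   # test display: "isolated" or "whole"
    target: str    # identifier of the studied feature
    foil: str      # identifier of the alternative (foil) feature

def build_conditions():
    """Cross the feature under test with the test context (2 x 2 design)."""
    return [
        PartWholeTrial(feature, context,
                       target=f"{feature}_studied", foil=f"{feature}_foil")
        for feature, context in product(["eyes", "mouth"], ["isolated", "whole"])
    ]

if __name__ == "__main__":
    for trial in build_conditions():
        print(trial)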

Several investigators have applied the Part/Whole Task to evaluate holistic processing in ASD populations (Faja et al. 2009; Joseph and Tanaka 2003; Wolf et al. 2008). Joseph and Tanaka (2003) found that typically developing children showed holistic effects for the eyes and the mouths, but that children with ASD exhibited a strong holistic effect for mouths only—not for eyes. This result was replicated in a large-scale study in which children with ASD and age- and IQ-matched, non-ASD children (n = 66 in each group) demonstrated a robust part/whole effect (Wolf et al. 2008). However, while the two groups did not differ on the “mouth” trials, children with ASD performed significantly worse on the “eye” trials than non-ASD children. Faja et al. (2009) also reported an overall holistic effect for adults with ASD that was on par with non-ASD adults of equivalent age and IQ. However, in contrast to the previous studies where a holistic advantage was exhibited for the mouth (Joseph and Tanaka 2003; Wolf et al. 2008), the ASD group showed a stronger holistic effect for the eye features. A possible explanation is that the adults in the Faja et al. (2009) study had developed compensatory face strategies focusing on the eyes, unlike the children tested in the other studies (Joseph and Tanaka 2003; Wolf et al. 2008). Overall, the Part/Whole results indicate that individuals with ASD recognize face parts better when shown within whole faces than when shown in isolation.

Lopez et al. (2004) hypothesize that individuals with autism can apply holistic strategies to faces when properly cued. Youth with ASD were matched with typically developing controls by chronological age. Participants were shown a full face image followed by a short delay. The task then required individuals to discriminate between two alternative face parts, choosing the one they had previously viewed in the presented face. The two possible face parts were displayed either embedded within the previously shown face or as isolated, stand-alone face parts. Before each face presentation, participants were either cued to pay attention to specific face areas or left uncued, receiving no hints or clues. Results showed that prior cueing enhanced discrimination of face parts in the whole face condition for participants with ASD, just as it did for their typically developing peers. This quantitative analysis further supports the view that individuals with ASD possess both holistic and part-based strategies in their perceptual repertoire and can apply either approach when instructed to do so.

In summary, the converging results from the Face Inversion, Face Composite and Part/Whole studies indicate that individuals with autism exhibit normal holistic recognition of faces. Individuals with autism, like neurotypical individuals, are impaired in their recognition of upside-down faces, have difficulty dissociating the top and bottom halves of faces in the Face Composite Task, and show superior recognition of face parts when presented in the whole face stimulus. Face recognition deficits in autism cannot be explained by the absence of a fundamental holistic face mechanism and, therefore, the findings argue against the qualitative explanation of the face impairment.

The Perceptual Account: Are Face Deficits in Autism Due to a Local Processing Bias?

A local, rather than global, bias in how individuals with autism process visual stimuli has been proposed as another qualitative source of the autism face deficit. According to the local processing view, individuals with ASD are biased toward attending to the local details and features of an object or face, and this finer level of perceptual analysis comes at the expense of processing the global organization of the stimulus. The local processing perspective fits with Kanner’s original description of autism as ‘the inability to experience wholes without full attention to the constituent parts.’ While typically developing individuals tend to process information by extracting overall meaning or gist, some suggest that autism is characterized by a weak or absent drive for global coherence (Frith and Happé 1994). Proponents suggest that, given their bias for detail-focused and localized processing, persons with ASD lack the global strategies that are prerequisite for successful face recognition. Might a local versus global analysis explain why individuals on the spectrum struggle to recognize changes in expression and identity?

Evidence does indeed suggest that individuals with ASD exhibit some enhanced abilities in local-oriented visual search tasks (Joseph et al. 2009; Kemner et al. 2008; O’Riordan et al. 2001), sensory tasks involving luminance and texture discrimination (Bertone et al. 2005) and block design completion tasks (Shah and Frith 1993; Minshew et al. 1997). In addition, other studies suggest that individuals with autism are superior in detecting embedded figures (Happé 1996; Jolliffe and Baron-Cohen 1997; Shah and Frith 1983). The key qualitative question then becomes: does this local processing strength interfere with global processing ability? Does being a good visual discriminator destine individuals with autism to struggle with perceptual grouping and global shape perception?

The Navon Task

A number of investigations have explored local versus global processing in autism using the classic Navon Task (Navon 1977). In this hierarchical visual processing task, individuals are presented with local and global stimuli simultaneously (see Fig. 3). As a measure of global processing, participants are asked to report the large letter while ignoring the smaller letters; as a measure of local processing, they are asked to report the small letters while ignoring the large letter. Performance among individuals with autism has demonstrated that the relationship between local and global processing is not a zero-sum game: distinct local and global processing styles can and do co-exist in autism.

Fig. 3
figure 3

Examples of the Navon hierarchical letters task. a consistent, b neutral and c conflicting conditions are displayed
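To make the stimulus structure concrete, the following minimal sketch (in Python; the letter shapes and choice of letters are our illustrative assumptions, not the stimuli of the studies discussed here) renders a global letter out of repeated local letters. Pairing a global “H” with local “H”s corresponds to a consistent trial, whereas pairing it with local “S”s corresponds to a conflicting trial.

# Illustrative sketch of a Navon-style hierarchical letter: a global "H"
# shape rendered from repeated local letters. The bitmap and letters are
# arbitrary choices for demonstration.
GLOBAL_H = [
    "X...X",
    "X...X",
    "XXXXX",
    "X...X",
    "X...X",
]

def navon(global_bitmap, local_letter):
    """Replace each filled cell of the global shape with the local letter."""
    return "\n".join(
        "".join(local_letter if cell == "X" else " " for cell in row)
        for row in global_bitmap
    )

if __name__ == "__main__":
    print(navon(GLOBAL_H, "S"))  # conflicting: global "H" built from local "S"s
    print()
    print(navon(GLOBAL_H, "H"))  # consistent: global "H" built from local "H"s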

Mottron et al. (2003) compared persons with ASD to individuals without a diagnosis using a local disembedding task and a global Navon letter task. Results indicated that those with autism were not only faster to identify embedded figures in the local perceptual task, but were also no slower to identify letters in the hierarchical global task. Individuals with ASD and their neurotypical peers shared typical patterns of a local bias with larger letters and a global bias with smaller letters. Further evidence for global processing in autism has been documented by Deruelle et al. (2006). In a separate Navon-inspired letter task, the performance of thirteen children with ASD was compared with typically developing children matched in one control group by verbal mental age and in a second control group by chronological age. A significant bias towards global processing was demonstrated in all three groups; children with ASD were again comparable to controls in their global perceptual strategies. Finally, when asked to respond randomly to Navon stimuli, individuals with autism, like neurotypical control participants, were as likely to respond at the global level as at the local level (Wang et al. 2004). The cumulative evidence indicates that enhanced local processing in autism does not render global strategies impaired, underdeveloped, or non-functional. The default setting of autistic perception appears to be locally oriented relative to the global default setting of non-autistic processing, but the local bias does not come at the expense of global perception (Mottron et al. 2006).

Object Versus Face Recognition Tasks

With global processing strategies seemingly intact in autism, researchers have also considered whether the autism face deficit is due to additional perceptual impairments. If there is a general perceptual deficit in autism, impairment should extend to other, non-face stimuli (Behrmann et al. 2006b). One study suggests that detail-focused and localized perception in autism may underlie poorer visual recognition memory for cats, horses, and motorbikes (Blair et al. 2002). Individuals with autism have also been found to be slower to discriminate artificial objects (i.e., Greebles), suggesting that a generalized deficit in perceptual processing may interfere with configural processing and the ability to make distinctions between structurally similar objects within a shared object class (Behrmann et al. 2006a; Scherf et al. 2008).

Other findings suggest that individuals with ASD show equal or even superior object recognition abilities. Whereas recognition was poorer for cats, horses and motorbikes, Blair et al. (2002) reported that persons with autism were equal to typical controls in visual recognition memory for buildings and leaves. Boucher and Lewis (1992) found that, compared to learning disabled peers, children with autism performed better in memory for buildings. Wolf et al. (2008) assessed specific face recognition skills alongside immediate memory for cars and discrimination of houses in persons with autism and typically developing controls. Matched by age and IQ, children, adolescents, and young adults with ASD were significantly poorer in face recognition tasks including matching facial identity across expressions, matching identity across masked features, and identifying faces from immediate memory in continuous trials. Yet, compared to neurotypical participants, those with autism were equal in their memory for cars and performed significantly better in discriminating houses (Wolf et al. 2008).

Local Bias as an Explanation for Face Deficits in Autism

As an account of face recognition deficits in autism, it has been speculated that local processing is optimally suited for making detailed discriminations in simple geometric stimuli, but is less optimal for making the more global discriminations required in subordinate-level categorization of complex objects such as faces (Behrmann et al. 2006a, b). According to this local account, the source of face impairment in autism is a general local perceptual strategy rather than a specific impairment in face processing per se. For example, children with autism outperformed neurotypical children on a face recognition task when the details of a face were enhanced through high spatial frequency filtering, whereas neurotypical children outperformed children with autism when the global configural information was preserved through low spatial frequency filtering (Deruelle et al. 2006). However, there is evidence to suggest that the local strategy is not a general perceptual bias, but is influenced by the type of object category. When asked to make local discriminations between the size and spacing of house, car and face features, the ASD group performed equal to or better than age- and IQ-matched control participants on the house and car tasks, consistent with a local bias account (Wallace et al. 2008; Wolf et al. 2008). Critically, when the same local task was embedded in face stimuli, individuals with autism performed reliably worse than control participants, particularly on judgments involving the size and spacing of the eye features. These results indicate that the local strategy in autism is not a uniform perceptual bias: it benefits performance for specific non-face stimuli but breaks down for specific regions of the face. In the next section, we will examine the evidence that the selective face deficits in autism result from a distinct impairment in gathering information from the eyes.

The “Eye Avoidance” Hypothesis of Autism Face Processing

Not only do the eyes provide a “window into the soul,” they hold the key to our identity; it is through the eyes that we recognize the person. In experiments by Schyns et al. (2002), participants were asked to perform a face recognition task in which selected areas of a blurred face stimulus were randomly uncovered with Gaussian apertures known as “Bubbles”. Recognition was most successful when the Bubbles revealed the eye and eyebrow regions, indicating that these features are the most diagnostic for face identification (Schyns et al. 2002; Vinette et al. 2004). Face recognition is most disrupted when the eye and eyebrow features of a face are occluded compared to when the nose and mouth features are covered (Sadr et al. 2003; Sekuler et al. 2004), regardless of whether faces are presented in their upright or inverted orientations (Sekuler et al. 2004). When participants are asked to make judgments about face identity, emotion or gender, their eye fixations concentrate just below the eye region, and this area of fixation is optimal for recognition (Peterson and Eckstein 2011; van Belle et al. 2010). Thus, among healthy adults, the behavioral research demonstrates that the eyes are the most critical features for recognizing a face.
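As a rough illustration of how such a revelation mask can be constructed, the sketch below (in Python with NumPy; a simplified, single-scale version written under our own assumptions, not the multi-scale procedure of Schyns and colleagues) sums randomly placed Gaussian apertures into a mask that, when multiplied with a grayscale face image, reveals only the regions under the bubbles.

import numpy as np

def bubbles_mask(height, width, n_bubbles=10, sigma=8.0, rng=None):
    """Sum randomly placed Gaussian apertures into a [0, 1] revelation mask."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width))
    for _ in range(n_bubbles):
        cy, cx = rng.uniform(0, height), rng.uniform(0, width)
        mask += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

# Usage (with a hypothetical grayscale image array `face_image` of shape (H, W)):
# revealed = face_image * bubbles_mask(*face_image.shape)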

When people fail to take eye information into account, systematic breakdowns in face recognition ensue, as shown by patients with acquired prosopagnosia caused by temporal lobe brain damage. Closer examination of their face recognition impairments shows that these patients are able to discriminate differences in the lower mouth region of the face (i.e., size of the mouth, spacing between the nose and mouth) equally as well as non-brain-damaged control participants, but are selectively impaired in their ability to discriminate information in the eye region (i.e., size of the eyes, spacing between the eyes) (Bukach et al. 2008; Rossion et al. 2009). Whereas healthy adults look at the eye region, prosopagnosic patients attend to the mouth region (Caldara et al. 2005; Xivry et al. 2008). It has been speculated that the mouth strategy in prosopagnosia may reflect the patients’ difficulty in processing the high spatial frequency information contained in the eye region (Caldara et al. 2005).

Eye Deficits in Identity and Expression Recognition in Autism

Like prosopagnosic patients, individuals with autism tend not to look at the eyes of another person and, instead, preferentially attend to information in the mouth region. However, unlike a prosopagnosic patient, the person with autism may be uncomfortable and feel threatened by looking at someone’s eyes. In his best-selling memoir, John Robison (2007) begins by painfully recounting his almost daily interrogation by parents, relatives, teachers, and principals: “Look me in the eye young man!… What are you hiding?… You’re up to something. I know it!” (p. 1). Instead, Robison “would glance up at their hostile faces and feel squirmier and more uncomfortable and unable to form words… would quickly look away.” Despite ongoing pleas and threats demanding eye contact, Robison shares that “us with Asperger’s are just not comfortable doing it. In fact, I don’t really understand why it’s considered normal to stare at someone’s eyeballs” (p. 3). The absence of eye contact and reduced attention to the eyes of another person is an early warning sign of autism. By the first year of life, children who are later diagnosed with ASD exhibit a lack of attention to faces (Osterling et al. 2002) and diminished eye contact (Zwaigenbaum et al. 2005).

Eye Tracking and Face Strategies in Autism

Eye tracking is a powerful technique for linking eye movement patterns to their underlying cognitive strategies. In one of the first eye tracking studies in autism, Pelphrey et al. (2002) assessed the eye tracking patterns of individuals with autism while they viewed a static face. They found that, compared to a control group matched for age and IQ, the adults with ASD displayed disorganized scanning patterns, concentrated less on the core eye, nose and mouth features of the face and focused more on the external features (hair, chin, clothing). Critically, when individuals with autism do attend to the core features, their gaze patterns are directed to the mouth area and they spend less time inspecting the eyes (Dalton et al. 2005; Pelphrey et al. 2002; Spezio et al. 2007).

In the real world, faces are not static entities, but are constantly moving and reacting to dynamic social situations. It is not clear whether the eye tracking evidence obtained from static face stimuli generalizes to the face-to-face interactions that we encounter in our everyday lives. To simulate naturalistic conditions, Klin et al. (2003) recorded the scan paths of individuals with autism while they viewed an emotionally charged scene from the film “Who’s Afraid of Virginia Woolf?”. While watching this interaction, individuals with autism spent more viewing time looking at the mouths of the actors and at unrelated objects in the scene and less time on the actors’ eyes. Collectively, the eye tracking studies show that whether viewing a static image or a dynamic video of faces, individuals with autism show a preference for the mouth features and avoidance of the eye features.

Developmentally, there is evidence to suggest that the mouth bias in autism emerges relatively early in ontogeny (Jones et al. 2008; Klin and Jones 2008). In a case study, Klin and Jones (2008) showed a 15-month-old toddler diagnosed with autism a video of a female actor playing a variety of childhood games (e.g., peek-a-boo and pat-a-cake). The child fixated on the actor’s mouth, whose movements were synchronous with her speech and motor movements (e.g., clapping hands). Although non-autistic toddlers were similarly drawn to the speech movements of the actor, their gaze patterns were equally distributed across the eye and mouth regions of the actor. In a larger group study, fifteen 2-year-old children with ASD were compared to 36 typically developing children and to 15 developmentally delayed but nonautistic children when presented with videos of childhood games (Jones et al. 2008). The 2-year-old children with autism exhibited a significant increase in looking time at the mouth region and a decrease in looking time at the eyes in comparison to both control groups. Furthermore, their fixation time on the eyes was correlated with their level of social competence, such that less fixation on the eyes predicted greater levels of social disability (Jones et al. 2008). Jones and colleagues argued that the mouth preference is likely to exert a negative impact on subsequent social development given the importance of the eye region for the extraction of expression and identity information. However, a recent study suggests that the linkage between the mouth bias and social function is not straightforward. When viewing a dynamic social scene, Rice et al. (2012) found that visual fixation time on the mouth positively correlated with social disability for children with similar verbal and non-verbal IQ scores. In contrast, for children with high verbal IQs, mouth fixation time negatively correlated with social disability (i.e., the more time fixated on the mouth, the better the social function). The authors speculated that for these children, language and attention to the mouth may be their main tool for navigating the demands of social interaction.

There are two possible interpretations of the foregoing eye tracking findings. On one hand, it might be that persons with autism actively avoid looking at the eyes of the face because they find them socially threatening. On the other hand, it is plausible that people with ASD are not repelled by the eyes, but are spontaneously drawn to the mouth features. To test these competing accounts, persons with ASD were assigned the task of categorizing emotional faces as fearful, happy or neutral (Kliemann et al. 2012). Participants were instructed to fixate on a point located either in the upper eye or lower mouth region of the face. When cued to the eye region, individuals in the ASD group made more and faster saccades away from the eyes than when cued to the mouth. When cued to the mouth, participants in the typically developing group automatically shifted their gaze away from the mouth and towards the eyes, whereas participants in the ASD group were less inclined to saccade to the eyes. These results were interpreted as support for the hypothesis that persons with ASD actively and reflexively avoid the eye region of the face in an attempt to reduce social contact with others (Kliemann et al. 2012).

Consequences of an Eye-Avoidance Strategy in Recognition of Identity and Expression

The preference for information in the mouth region and avoidance of information in the eye region should produce behavioral differences in the face processing of individuals with ASD. This prediction was directly tested in the Dimensions Task, in which two faces are presented side-by-side (see Fig. 4) and participants are instructed to indicate whether the faces are the “same” or “different”. For the “different” trials, the two faces can vary in eye information (e.g., size of the eyes, spacing between the eyes) or mouth information (e.g., size of the mouth, spacing between the nose and mouth). Whereas children with and without autism perform equally well in their discrimination of changes in the mouth region, children with ASD are selectively impaired in their discrimination of spacing and size differences in the eye region (Rutherford et al. 2007; Wolf et al. 2008). The eye-mouth difference is also evident in the previously discussed Part/Whole recognition task. When asked to identify a part from a previously studied face, typically developing individuals show a reliable advantage over individuals with ASD in their recognition of the eyes. However, the opposite pattern is observed for discriminations in the mouth region, where individuals with autism are either superior (Joseph and Tanaka 2003) or equal (Wolf et al. 2008) to typically developing peers. Like patients with prosopagnosia, individuals with ASD preferentially attend to information in the lower mouth region over information in the eye region and, like prosopagnosic patients, individuals with ASD show deficits in their face processing abilities.

Fig. 4

The Face Dimensions Task. Examples of a complete set of the face stimuli. a faces differing in the distance separating the eyes (configural/eyes manipulation), b faces differing in the distance between the nose and mouth (configural/mouth manipulation), c faces differing in the size of the eyes (featural/eyes manipulation) and d faces differing in the size of the mouth (featural/mouth manipulation)

The eye avoidance strategy used by individuals with autism similarly affects their recognition of facial expressions. Comparisons of individual expressions reveal that participants with ASD perform reliably worse than neurotypical participants in identifying the angry expression. Anger is considered a ‘top half’ emotion in which the majority of the expressive information is conveyed in the upper half of the face (Calder et al. 2000; Smith and Cottrell 2005). Given the tendency of individuals with ASD to avoid the eye region of the face in deference to information in the lower mouth region (Klin et al. 2002; Riby et al. 2009; Rutherford et al. 2007; Wolf et al. 2008), it is not surprising that perception of anger is differentially compromised relative to the other facial expressions. These perceptual strategies have been further explored with the Part/Whole Expression Task. When asked to recognize the eyes or mouths from happy and angry expressions in isolation or in the whole face, typically developing individuals showed a whole face advantage for the eyes, whereas the individuals with ASD performed better on the mouths, shown either in isolation or in the whole face (Tanaka et al. 2012). The eye avoidance strategy places individuals with autism at a disadvantage when trying to decode most facial expressions, which are jointly determined by facial muscles in the upper eye and lower mouth regions of the face (Calder and Jansen 2005; Smith and Cottrell 2005).

According to Baron-Cohen et al. (2001, 1997), individuals with ASD fail to correctly read the “language of the eyes” and often miss the subtle, socially relevant cues that the eyes convey. In their study, participants with ASD and typically developing participants were presented with photographs of people posing basic facial emotions (e.g., happy, sad, disgusted, angry) and complex mental states (e.g., guilt, thoughtfulness, flirtatiousness, arrogance) (see Fig. 5). Participants were asked to make a forced-choice response after viewing a stimulus of either the whole face, the eyes alone, or the mouth alone. Participants with ASD performed less well than non-ASD controls on judgments involving both basic emotions and complex mental states. Importantly, they performed markedly worse on the “eyes alone” trials in both the basic emotion and mental state categories. The Baron-Cohen findings are relevant because they show that the eyes are not only important for helping us understand what someone is feeling, but also for providing insights into what someone is thinking.

Fig. 5

Mental states task. Eyes and mouth features depicting the mental states of a guilt, b thoughtfulness, c flirtatiousness and d arrogance

Eye Processing and Physiological Arousal in Autism

What factors might account for the eye-avoidance face processing strategy in autism? Despite the importance of the eyes for recognition of identity and expression, they may be the most threatening area of the face for individuals with ASD. Eye contact is a potent signal, sending a powerful message to its receiver as an invitation for social engagement and intimacy (Kleinke 1986). In everyday face-to-face encounters, social communication is initiated and regulated through the “language of our eyes”. For individuals with autism, avoiding eye contact may be an effective strategy for discouraging social interactions. Indeed, people with autism anecdotally comment that looking into another person’s eyes is an unpleasant, even painful experience (Robison 2007).

To examine the link between eye contact and emotional arousal, children with ASD and neurotypical children viewed face stimuli displaying either a direct gaze (with eye contact) or an averted gaze (with no eye contact) while their skin conductance was recorded (Kylliainen and Hietanen 2006). Whereas the neurotypical children showed no changes in skin conductance as a function of the gaze condition, children with autism exhibited a stronger skin conductance reaction to the direct gaze, indicating physiological hyperarousal (Bradley et al. 2001). Past research has shown a good correspondence between skin conductance levels and subjective reports of emotional arousal (Hietanen et al. 2008), suggesting that the elevated skin response reflects a heightened emotional response to the eye gaze stimulus.

Do eye gaze and skin conductance play a functional role in identity and expression recognition in autism? In a study by Joseph et al. (2008), children with and without ASD were given a face recognition task with faces displaying direct or averted gaze while skin conductance levels were monitored. For children with ASD, a negative correlation was found between skin conductance amplitude and recognition performance for direct-gaze faces. Interestingly, there was no association between skin conductance activity and recognition performance for faces depicting an averted gaze. These results indicate that the autonomic reaction to faces making direct eye contact may interfere with face identity recognition in children with ASD.

The Neural Substrates of Face and Eye Processing

At the neuroanatomical level, a network of brain structures including the fusiform gyrus, amygdala and superior temporal sulcus plays a key role in mediating face-to-face social and emotional interaction. The fusiform gyrus selectively responds to face stimuli over other types of non-face objects (Kanwisher et al. 1997; Puce et al. 1995), shows greater activation to familiar than unfamiliar faces (Lehmann et al. 2004) and is sensitive to the individual identities of familiar faces (Rotshtein et al. 2005). Sharing dense connections with the fusiform gyrus, the subcortical amygdala is tuned to the coding of facial expressions (Adolphs et al. 1998) and is triggered by expressive information in the eye region, such as the widened sclera that signals the expression of fear (Kawashima et al. 1999; Morris et al. 2002; Whalen et al. 2004). The superior temporal sulcus is responsive to a person’s eye movements (Puce et al. 1998; Wicker et al. 1998; Hoffman and Haxby 2000) as cues to their intentions and goals, such as when a person shifts their eyes to a cup of coffee they are about to pick up (Pelphrey et al. 2003). Collectively, the fusiform gyrus, amygdala and superior temporal sulcus form a network of neural structures that respond to the saliency of faces in the environment and code facial information about a person’s identity, expression and goals (Schultz 2005).

It is hypothesized that the neural circuitry mediating social interaction in neurotypical individuals is compromised in people with autism (Schultz 2005). When individuals with autism view faces, they show reduced activation of the fusiform gyrus relative to neurotypical individuals, suggesting that face stimuli are less engaging for them (Schultz et al. 2000; Pierce et al. 2001; Wang et al. 2004). This claim has been challenged in a recent meta-analysis in which Samson and colleagues (Samson et al. 2012) found that ASD and non-ASD groups did not differ in their overall activation of face-related brain areas (i.e., the fusiform gyrus and related occipital face area). Critically, the brain activity of the ASD group extended to the anterior fusiform gyrus, parahippocampal gyrus, and bilateral striate and extrastriate areas, suggesting a larger, more diffuse face network in individuals with autism compared to non-autistic individuals.

For individuals with autism, activation of the fusiform gyrus and amygdala is modulated by fixations to the eyes and is increased when attention is directed to the eyes, independent of emotion or familiarity (Dalton et al. 2005). According to the authors, eye fixation is associated with negative overarousal mediated by amygdala activation, and the diminished gaze fixation commonly displayed by individuals with autism is a compensatory strategy to regulate overarousal to social stimuli (Dalton et al. 2005). Recent neuroimaging results have shown that individuals with ASD exhibit increased amygdala activation when making a gaze movement away from the eyes, a sign of “eye avoidance” (Kliemann et al. 2012). In a study by Pelphrey et al. (2005), participants observed a virtual actor looking towards a target checkerboard in a congruent trial and looking away from the target in an incongruent trial. Whereas neurotypical participants registered the violation of expectation in the incongruent trial with increased superior temporal sulcus (STS) activation, participants with ASD did not exhibit a difference in STS activation between congruent and incongruent trials. The researchers hypothesized that even when participants with ASD are attending to shifts in eye gaze, they are not sensitive to the social contingencies and expectations that are contained in the eye movement (Pelphrey et al. 2005).

Eye Avoidance or Mouth Preference?

Can the foregoing evidence be explained as a mouth preference rather than an eye avoidance? That is, individuals with autism may not be averse to looking at the eyes, but may instead be attracted to information in the mouth region. In language perception, for example, attending to the mouth is crucial for integrating dynamic visual input with the ongoing speech signal. Developmental studies with three- to six-month-old infants show that visual information about speech articulation not only enhances phoneme discrimination, but also contributes to the learning of phoneme boundaries in infancy (Kuhl and Meltzoff 1982; Teinonen et al. 2008). It is conceivable that the child with ASD learns to attend to information in the mouth region as a strategy to compensate for the language deficits associated with the disorder. As a consequence, the encoding of information in the mouth is enhanced at the expense of information in the eyes. Although a plausible account, there is little data to support the “mouth preference” hypothesis.

In face discrimination tests in which mouth information was manipulated independently of eye information, there is no evidence to suggest that individuals with autism outperform age- and IQ-matched neurotypical participants in discriminating the size, shape or spatial distances in the mouth region (Rutherford et al. 2007; Wolf et al. 2008). On expression recognition tasks, the ASD group performed more poorly than the neurotypical group on the disgust expression, where the majority of the diagnostic information is contained in the mouth area (Rump et al. 2009; Tanaka et al. 2012; Wright et al. 2008). On a Part/Whole recognition task, participants with ASD performed as well as, but not better than, the typically developing group on recognition of the mouth part shown either in isolation or in the whole face (Joseph and Tanaka 2003). Taken together, there is little empirical support for the claim that individuals with ASD demonstrate a face processing advantage for the mouth, as predicted by the “mouth preference” hypothesis.

Summary

Although individuals with ASD show deficits in their ability to recognize faces, their deficits seem to be primarily centered on the eye region of the face. Empirical research has shown that people with autism have difficulty discriminating information in the eye region of the face on tasks that involve the recognition of identity (Joseph and Tanaka 2003; Rutherford et al. 2007; Wolf et al. 2008), expression (Rump et al. 2009; Tanaka et al. 2012; Wright et al. 2008) and mental states (Baron-Cohen et al. 1997). Eye-tracking studies show that individuals with autism look at other parts of the face (Jones et al. 2008; Klin et al. 2009; Pelphrey et al. 2002) and actively avoid looking at the eyes (Kliemann et al. 2012).

According to the “eye avoidance” hypothesis, the eyes are an emotionally charged region of the face that elicits, in persons with ASD, an immediate visceral response reflected in elevated skin conductance and increased amygdala activity. The “eye avoidance” strategy is an adaptive, compensatory perceptual strategy that focuses on external features (clothing, hair) or other facial features (mouth, chin). This approach protects individuals with ASD from the discomfort and threat posed by the eyes. The drawback of the “eye avoidance” strategy is that it has cascading effects on the ability to encode and discriminate information about facial identity, expression, and intention, and it further interferes with social processing.

Implications for Intervention

Although the emerging evidence indicates that face processing is impaired in autism, the precise connection between autism and face abilities is difficult to disentangle. On one hand, it is conceivable that compromised face processing abilities contribute to the very core of the social and communication deficits associated with the autism condition (Dawson et al. 2005; Schultz 2005). However, the converse relationship is also plausible: poor social and communication skills and a general disinterest in people might lead to less motivation to attend to faces, further exacerbating impaired face recognition abilities and degrading social skills. Thus, autism and face processing appear to have a reciprocal relationship in which deficient face processing and impaired interpersonal communication contribute to a downward spiral in social function. Regardless of the direction of the relationship between face processing and autism, training in face recognition abilities provides a practical avenue for intervention with straightforward implications for ameliorating the social deficits of ASD.

Training protocols based on subordinate level recognition have been developed and successfully applied to teaching expert recognition of artificial objects (Gauthier and Tarr 1997) and real world objects (e.g., birds, cars) (Scott et al. 2006; Tanaka et al. 2005). In the realm of face recognition, expertise training has been effective for improving face recognition in patients with developmental prosopagnosia (DeGutis et al. 2007) and in healthy adults who perceive other-race faces poorly (Lebrecht et al. 2009; McGugin et al. 2011; Tanaka and Pierce 2009).

Applying perceptual expertise protocols to teach face recognition skills in autism is more challenging than teaching other types of object and face recognition skills. As discussed in the previous section, because faces are perceived as threatening and aversive stimuli, individuals with ASD will be less motivated to engage in the level of face training necessary to improve their recognition skills. Despite the inherent obstacles, efforts to teach face processing skills to individuals with ASD have produced some, albeit limited, success. Faja et al. (2007) conducted individualized laboratory face training sessions over a three-week period with a small group of adults with ASD. The results showed that trained individuals (N = 5) significantly improved in their ability to discriminate spacing differences in the eye region. Although these results provide “proof of concept” for the face training approach in ASD, the intervention tested only a small number of participants and the individualized lab training was not practical for a large-scale intervention.

Other computer-based programs have provided a reasonable alternative to the one-on-one training method. Computer-based training is a desirable method of autism intervention because it: (1) is cost-effective and easy to disseminate, (2) provides a consistent learning environment and (3) can be modified according to the unique needs of the ASD learner (Battocchi et al. 2010). Programs such as the Emotion Trainer (Silver and Oakes 2001), the Frankfurt Test and Training of Facial Affect Recognition (Bolte et al. 2002), FaceSay (Hopkins et al. 2011) and Let’s Face It! (Tanaka et al. 2010) have been developed for individuals with ASD and have achieved some success. For example, after 20 hours of playing the Let’s Face It! program, children with ASD improved in their holistic recognition of eye features (Tanaka et al. 2010). In another study, children who played the social skills program FaceSay twice a week for 6 weeks improved in their recognition of facial emotions and exhibited more positive social interactions on the playground (Hopkins et al. 2011). In the future, technological innovations in automatic face recognition (Deriso et al. 2012), robotics (Scassellati et al. 2012) and virtual reality (Kandalaft et al. 2013) may further improve the efficacy of face training for persons with ASD.

Final Words

This review began by describing the face recognition deficits in autism. We highlighted how impaired face recognition can lead to cascading problems in social and emotional functioning, and we then investigated three accounts of the phenomenon: the holistic, local perceptual bias, and eye avoidance hypotheses of autism face processing. Our analysis suggests that face recognition deficits in autism cannot be explained by the absence of holistic or global perceptual strategies. Rather, the “eye avoidance” hypothesis helps to account for the selective pattern of face deficits in autism across eye tracking, group comparison, skin conductance, and neuroanatomical studies. Further exploration of the “eye avoidance” hypothesis provides optimism for future face training interventions.