Introduction

In their seminal paper, Young, Hellawell and Hay (1987) presented a new ‘composite face paradigm’ for investigating the perception of facial configurations. The authors described how, when the top half of one face was aligned with the bottom half of another, the resulting composite arrangement induced the perception of a novel facial configuration; the presence of the bottom half altered the appearance of the top half, and vice versa. The illusory fusion of the two aligned halves hindered recognition—observers took longer to identify the source faces. Importantly, however, the illusion was observed only when the composite arrangements were aligned spatially and presented upright; free from illusion-induced interference, observers were able to identify the sources of upright, misaligned halves with comparative ease (Fig. 1). When presented upside-down, observers’ recognition of the source faces was broadly similar in the aligned and misaligned conditions.

Fig. 1

When aligned with different lower halves, it is surprisingly difficult to recognise that the upper regions of the two composites are identical (left top). However, the illusion-induced interference is greatly diminished when the composites are misaligned (right top). When composite arrangements are shown upside-down, little illusion-induced interference is seen in either the aligned (left bottom) or misaligned (right bottom) conditions

The composite face illusion has proved enormously influential. Crucially, the effect reveals a tendency to integrate feature information from disparate facial regions. Findings from studies employing composite face techniques have therefore informed the development of holistic theories of face perception, which posit that local facial features are integrated into a unified representation for the purposes of efficient analysis and interpretation (Farah, Wilson, Drain, & Tanaka, 1998; Maurer, Le Grand, & Mondloch, 2002; Piepers & Robbins, 2013). Moreover, the finding that greater illusory fusion is induced by upright arrangements suggests that the orientation-specific processes responsible for composite interference may also be disrupted in the classic face inversion effect, whereby orientation inversion disproportionately impairs the recognition of faces, relative to other classes of object (Yin, 1969).

The purpose of this review is to provide a focussed overview of the existing composite face literature with a view to identifying priorities for future research. We begin by discussing some of the practical issues associated with measurement (Measuring observers’ susceptibility to the illusion). We then consider how susceptibility to the illusion develops during ontogeny, and the relative contribution of genetic and environmental factors (Development). In subsequent sections, we consider the neural signatures associated with the illusion (Neural basis), and discuss findings obtained from clinical populations (The susceptibility of clinical populations to the illusion). Having reviewed the available empirical data on the composite face effect, we present and evaluate different theoretical accounts of the illusion (Theoretical interpretations). We end the review with discussion of six priorities for future composite face research (Future directions).

Measuring observers’ susceptibility to the illusion

Paradigms

In the original composite study (A. W. Young et al., 1987), participants were presented with a single facial composite, and the degree of illusion-induced interference inferred from their response latencies. When asked to identify the source of a task-relevant half (e.g., upper half; hereafter ‘target half’), whilst disregarding the remaining task-irrelevant half (e.g., lower half; hereafter ‘distractor half’), observers were disproportionately slower in the upright-aligned condition. The application of this naming paradigm is limited, however, by the need to use familiar faces. To allow participants to name the source of the target half, composites must be constructed from personally familiar, celebrity, or otherwise learned faces. This is potentially problematic in light of known differences in the perceptual processing of familiar and unfamiliar faces (Jenkins & Burton, 2011; Murphy, Ipser, Gaigg, & Cook, 2015). For example, observers are better able to match familiar faces using their internal features, including the eyes, nose, and mouth (Osborne & Stevenage, 2008). In contrast, unfamiliar face matching is frequently based on external features, such as hairstyle and face shape. Familiar faces also place lower demands on visual working memory (Jackson & Raymond, 2008) and are easier to detect under conditions of reduced attention, compared with unfamiliar faces (Jackson & Raymond, 2006).

Since the initial description of the illusion, simultaneous and delayed matching procedures have been developed that overcome the need to use familiar faces (i.e., that allow investigation of holistic processing of unfamiliar faces). Crucially, these paradigms confirm that arrangements constructed from unfamiliar faces also produce composite face effects. In simultaneous-matching paradigms, two composite arrangements—either aligned or misaligned, upright or inverted—are presented simultaneously and observers asked whether two target regions (e.g., the upper halves) are identical or not (Hole, 1994). In delayed-matching paradigms, participants are again presented with two composite arrangements and asked to judge whether the target regions are identical or not. However, rather than appear simultaneously, the composites are presented sequentially (Goffaux & Rossion, 2006; Le Grand, Mondloch, Maurer, & Brent, 2004; Michel, Rossion, Han, Chung, & Caldara, 2006). In a popular variant, the first composite is presented briefly (e.g., 500 ms), followed by an inter-stimulus-interval, typically between 500 ms and 2 s. Thereafter, the second composite is presented until the participant responds (e.g., Michel et al., 2006). In both matching procedures, observers’ responses are disproportionately impaired (slower and/or less accurate) when composite arrangements are aligned and presented upright, than when misaligned or inverted.

Original and complete matching procedures

In simultaneous and sequential matching paradigms, the factorial combination of target-half (T same, T different) and distractor-half (D same, D different) yields four possible trial types: congruent-same (T same, D same), incongruent-same (T same, D different), congruent-different (T different, D different), incongruent-different (T different, D same). Note that on ‘congruent’ and ‘incongruent’ trials, susceptibility to the illusion therefore helps and hinders observers, respectively. Traditionally, researchers have employed a variant whereby distractor halves always differ, and therefore exert different illusory biases on the target regions (e.g., Cassia, Picozzi, Kuefner, Bricolo, & Turati, 2009; de Heering, Houthuys, & Rossion, 2007; Goffaux & Rossion, 2006; Konar, Bennett, & Sekuler, 2010; Le Grand et al., 2004; Robbins & McKone, 2007). The original matching procedure excludes congruent-same and incongruent-different trial types (Fig. 2a). More recently, some groups have advocated the use of so-called ‘complete design’ matching procedures (Fig. 2b), incorporating identical and different distractor halves and all four trial types (e.g., Cheung, Richler, Palmeri, & Gauthier, 2008; Richler, Cheung, & Gauthier, 2011a; Richler, Gauthier, Wenger, & Palmeri, 2008; Richler, Mack, Palmeri, & Gauthier, 2011). Importantly, upright face composite effects calculated using the complete and original matching designs do not correlate (Richler & Gauthier, 2014), suggesting that the variants measure different phenomena; however, the reason for the discrepancy remains contested (Meinhardt, Meinhardt-Injac, & Persike, 2014; Richler & Gauthier, 2013; Rossion, 2013).
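The factorial structure described above can be made concrete with a short sketch. The function and variable names below are illustrative assumptions, not taken from any published analysis code; the sketch simply enumerates the four trial types and shows which of them the original design retains.

```python
from itertools import product


def trial_type(target_same: bool, distractor_same: bool) -> str:
    """Label a matching trial using the congruency terminology of the text.

    A trial is 'congruent' when the distractor halves stand in the same
    relation (same/different) as the target halves, so that the illusion
    pushes the observer towards the correct response.
    """
    congruency = "congruent" if target_same == distractor_same else "incongruent"
    correct_response = "same" if target_same else "different"
    return f"{congruency}-{correct_response}"


# Complete design: every combination of target and distractor relation.
COMPLETE_DESIGN = [trial_type(t, d) for t, d in product([True, False], repeat=2)]

# Original design: distractor halves always differ.
ORIGINAL_DESIGN = [trial_type(t, False) for t in (True, False)]

if __name__ == "__main__":
    print("Complete design:", COMPLETE_DESIGN)
    print("Original design:", ORIGINAL_DESIGN)
```

Fixing the distractor relation at 'different' leaves only the incongruent-same and congruent-different cells, which is why the original design omits the congruent-same and incongruent-different trial types.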

Fig. 2

Illustration of the (a) original and (b) complete designs used in simultaneous and sequential matching paradigms

Proponents argue that, compared to the original matching procedure, the complete design estimates observers’ susceptibility to the composite illusion more accurately. Unlike estimates inferred using the complete design, composite effects measured using the original matching task may be influenced by differences in response bias (Richler & Gauthier, 2014). For example, estimates of composite interference derived from original matching designs are disproportionately driven by responses on incongruent-same trials where the presence of the different distractors makes it harder to recognise that same target halves are identical. Where observers have a pre-existing bias to respond ‘different’, original matching designs may underestimate their susceptibility to the illusion. Consistent with this view, belief manipulations that affect response bias distort composite effect estimates measured using the original matching design, but not the complete design (Richler, Cheung et al., 2011a).

However, the complete design has also been criticised. Importantly, the effects of alignment seen in the additional congruent-same and incongruent-different trials may have a different origin from those seen in the incongruent-same and congruent-different trials included in the original matching design. Specifically, the facilitation seen on congruent-same trials, and the interference seen on incongruent-different trials, may reflect domain-general processes akin to response priming and response conflict, not perceptual integration (Rossion, 2013). Accordingly, complete designs may be prone to strategic effects (see section below on Strategic and contextual influence), and may be more likely to find effects with non-face composites (see section on Domain specific vs. domain general processing); for example, composite effects have been observed for unfamiliar Chinese characters using this paradigm (Hsiao & Cottrell, 2009). A further objection relates to the predicted pattern of responses in the congruent-different condition. On these trials, different distractor regions are paired with different target regions. On some trials it is reasonable to assume that the additional different signal present in the distractors is fused with the physical differences between the targets, thereby making it easier for observers to respond ‘different’. However, where the physical difference between the targets is already large, it is conceivable that fusion with distractor regions with less obvious differences may actually dilute the difference signal, hindering correct responding (Robbins & McKone, 2007).
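To make the contrast between the two designs concrete, the sketch below shows one common way each composite effect might be summarised. It is a minimal illustration under stated assumptions: the function names are hypothetical, the performance score could be accuracy or a sensitivity index, and the numbers in the example are invented for demonstration rather than drawn from any study.

```python
def original_design_effect(same_aligned: float, same_misaligned: float) -> float:
    """Alignment effect on 'same' trials (distractors always differ).

    Assumes higher scores mean better performance, so a positive value
    indicates worse performance when the halves are aligned.
    """
    return same_misaligned - same_aligned


def complete_design_effect(scores: dict) -> float:
    """Congruency-by-alignment interaction.

    `scores` maps (alignment, congruency) pairs to a performance score.
    The congruency advantage (congruent minus incongruent) is computed
    separately for aligned and misaligned composites, then differenced.
    """
    aligned = scores[("aligned", "congruent")] - scores[("aligned", "incongruent")]
    misaligned = scores[("misaligned", "congruent")] - scores[("misaligned", "incongruent")]
    return aligned - misaligned


if __name__ == "__main__":
    # Hypothetical proportions correct, for illustration only.
    example = {
        ("aligned", "congruent"): 0.90,
        ("aligned", "incongruent"): 0.70,
        ("misaligned", "congruent"): 0.85,
        ("misaligned", "incongruent"): 0.80,
    }
    print(original_design_effect(same_aligned=0.70, same_misaligned=0.80))
    print(complete_design_effect(example))
```

Because the first index draws only on trials where the correct answer is 'same', a general tendency to respond 'different' feeds directly into it. The second index compares congruent and incongruent trials that include both 'same' and 'different' correct responses, which is the basis of the claim that it is less sensitive to response bias.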

Control conditions

While different research groups employ different control conditions, the aim of these manipulations is consistent: to rule out simpler, generic accounts of the performance decrements seen when upright composites are aligned. The most commonly used control conditions, misalignment and inversion (see Fig. 1), were introduced by Young and colleagues (1987). Interference is greatly reduced when composites are misaligned horizontally, excluding the possibility that the interference is simply caused by the proximity of the distractor half. Transposing the position of upper and lower halves (Abbas & Duchaine, 2008) and presenting the halves side-by-side (Richler, Tanaka, Brown, & Gauthier, 2008) also speak against this account. Similarly, the finding that composite interference is greatly diminished, although not eliminated entirely (e.g., Susilo, Rezlescu, & Duchaine, 2013), when stimulus arrangements are shown upside-down confirms that performance decrements in the upright-aligned condition are not induced by the presence of the continuous boundary when the halves are aligned; whereas the boundary is disrupted by misalignment, it is preserved by inversion (Rossion, 2013). Where effects of alignment are seen for upright, but not inverted arrangements, authors can also exclude the possibility that the differential interference observed in these conditions reflects differences in the size of the spotlight of visuospatial attention (e.g., McKone et al., 2013). For these reasons, many consider the use of convergent inverted and misaligned control conditions necessary for clear interpretation of composite interference (McKone et al., 2013; Rossion, 2013). Finally, some investigations have included parallel tasks where observers judge composites constructed from non-face objects, including cars (Cassia et al., 2009) and dogs (Robbins & McKone, 2007). Where observed, findings of face-specificity argue against the possibility that decrements in the upright-aligned condition reflect domain-general interference that may be produced by any object with a canonical orientation (e.g., Robbins & McKone, 2007).

Which judgments are affected?

In the original description of the illusion, observers were slower to name the identity of the faces from which target regions were sourced as a consequence of the composite interference (A. W. Young et al., 1987). It appears, however, that composite interference affects several other attributions, extending beyond facial identity. For example, viewers are slower to identify the emotion present in a target half, when paired with a distractor half sourced from the same face, but expressing a different emotion. Once again, disproportionate interference is seen when composite arrangements are presented upright and aligned, with little or no interference seen when the halves are misaligned or inverted (Calder, Young, Keane, & Dean, 2000; Palermo et al., 2011; Tanaka, Kaiser, Butler, & Le Grand, 2012; Wegrzyn, Bruckhaus, & Kissler, 2015). Composite interference also appears to bias the perception of facial gender and age. Observers were slower and made more errors in categorising the gender of target halves when composite faces were constructed using halves from different genders, relative to same-gender and misaligned control conditions (Baudouin & Humphreys, 2006). Similarly, when estimating the age of composite top halves, observers’ estimates were biased towards the age of the bottom halves, a pattern of performance that was attenuated, but not eliminated, by inversion (Hole & George, 2011). Composite interference also distorts evaluative impressions. Top halves of upright composite faces were judged more attractive (Abbas & Duchaine, 2008) and trustworthy (Todorov, Loehr, & Oosterhof, 2010) when aligned with attractive and trustworthy bottom halves, respectively.

Low-level stimulus characteristics

Several studies have sought to investigate how observers’ susceptibility to the composite face illusion is affected by low-level image manipulations. For example, results obtained using the original matching procedure suggest that filtered composites that preserve low spatial frequencies induce stronger illusion-induced interference than composites that preserve high spatial frequencies (Goffaux & Rossion, 2006). While this finding is consistent with the view that holistic processing is supported predominantly by low spatial frequencies (Sergent, 1986), it has been contested; subsequent findings obtained using the complete design have found no effect of spatial frequency (Cheung et al., 2008). A related suggestion is that composite effects vary as a function of face size, with arrangements resembling faces viewed at ~2 m inducing the strongest illusions when measured using the complete design (Ross & Gauthier, 2015). Facial composites presented in grayscale have been found to induce greater composite interference in the original matching procedure than composites shown in colour (Retter & Rossion, 2015). Interestingly, photographic negation has little effect on the strength of the illusion when measured using the original matching procedure; grayscale composites induce similar interference in positive and negative contrast polarity (Hole, George, & Dunsmore, 1999; Taubert & Alais, 2011).

Development

Susceptibility to the composite face illusion appears to be present early in development. For example, de Heering and colleagues (2007) found that 4-, 5- and 6-year-olds displayed similar susceptibility to each other and to adults when tested using the original matching procedure (see also Carey & Diamond, 1994; Mondloch, Pathman, Maurer, Le Grand, & de Schonen, 2007; Susilo, Crookes, McKone, & Turner, 2009). Comparable results have also been reported for expression composites (Durand, Gallay, Seigneuric, Robichon, & Baudouin, 2007). Susceptibility, however, may be present at an even younger age. Composite interference has been reported in 3-year-olds when tested on a simultaneous variant of the original matching procedure (Cassia et al., 2009). Remarkably, recent evidence from a visual preference paradigm indicates that infants as young as 3 months may also be susceptible to the illusion. When misaligned, 3-month-old infants exhibited a looking preference for familiar eye-regions over unfamiliar eye-regions. However, when aligned with an unfamiliar mouth, no systematic preference was observed (Turati, Di Giorgio, Bardi, & Simion, 2010).

Early susceptibility to the composite face illusion may partly reflect a genetic contribution. In a study of 173 twin pairs (102 monozygotic and 71 dizygotic, age 7–19 years) tested using the original matching procedure, the correlation between the composite effects estimated for the monozygotic twins exceeded that seen for the dizygotic twins (Zhu et al., 2010). The authors estimated that genetic heritability accounted for 31% of the variance in the twins’ susceptibility to the illusion. As is often the case with heritable traits (e.g., Haworth et al., 2010), the genetic influence appeared to increase with age; the variance attributable to genetic factors was greater in the 13- to 19-year-olds than in the 7- to 12-year-olds. No concordance was observed for a non-face measure of global-to-local interference based on the Navon Task (Navon, 1977), suggesting that the heritability of the composite face illusion may show a degree of face-specificity.
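The logic of the twin comparison can be illustrated with Falconer's textbook approximation, which is offered here only as an aid to intuition; it is an assumption of this illustration, and not necessarily the estimation method used by Zhu et al. (2010). Writing r_MZ and r_DZ for the within-pair correlations of monozygotic and dizygotic twins, and assuming purely additive genetic effects and equally shared environments across the two twin types:

```latex
h^2 \approx 2\,\bigl(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}\bigr)
```

Under these assumptions, a larger monozygotic than dizygotic correlation yields a positive heritability estimate, whereas equal correlations would imply no detectable genetic contribution.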

Complementary evidence suggests that visual experience also plays a critical role in shaping susceptibility to the composite face illusion. Twelve individuals (age 9–23 years) deprived of visual experience during the first 3–6 months of development due to bilateral congenital cataracts, showed little or no susceptibility to the composite face illusion when measured using the original matching design. Free from illusion-induced interference, these individuals outperformed controls when the target and distractor halves were aligned (Le Grand et al., 2004). When tested using the original matching design, Michel and colleagues (2006) found that observers’ susceptibility to the composite illusion was also greater when participants viewed composite faces of their own race (but for conflicting results obtained with the complete design see Horry, Cheong, & Brewer, 2015; Zhao, Hayward, & Bülthoff, 2014; for a review of this literature, see Hayward, Crookes, & Rhodes, 2013). Similarly, Susilo and colleagues (2009) found stronger effects using the original matching procedure when composites were constructed from own-age faces (but see Wiese, Kachel, & Schweinberger, 2013). However, preschool teachers (i.e. observers with extensive contact with child faces) failed to show an own-age effect when tested using the original matching task; they showed similar composite effects for arrangements constructed from own-age faces and the faces of children (de Heering & Rossion, 2008). Where observed, these biases may therefore be products of experience.

Neural basis

Neural markers revealed by fMRI and EEG

The neural representation of composite faces has been addressed using fMRI adaptation paradigms. These studies take advantage of the fact that repeated presentation of the same stimulus elicits a weaker change in the blood-oxygen-level-dependent (BOLD) signal, than presentation of two different stimuli. The same top half of a face elicits a larger response—a bigger release from adaptation—in the fusiform face area (FFA) when aligned with a lower half sourced from a different facial identity than when taken from the same face (Schiltz, Dricot, Goebel, & Rossion, 2010; Schiltz & Rossion, 2006). However, this difference is not found when the top and bottom face regions are spatially misaligned (Schiltz et al., 2010; Schiltz & Rossion, 2006; but see Harris & Aguirre, 2010) or when arrangements are inverted (Schiltz & Rossion, 2006).

The composite face illusion is also associated with a characteristic electroencephalography (EEG) signature. In comparison to aligned composite faces, consistent evidence indicates that misalignment elicits a delayed and enhanced N170 (Jacques & Rossion, 2009, 2010; Kuefner, Jacques, Prieto, & Rossion, 2010; Letourneau & Mitchell, 2008; Soria Bauser, Schriewer, & Suchan, 2014; Wiese et al., 2013)—a response that bears striking similarity to the modulation elicited by face inversion (Eimer, 2000). Consistent with behavioural findings, the effect associated with misalignment is not observed when the composites are inverted (Jacques & Rossion, 2010). However, the response elicited may also depend on the stimulus attributes; a stronger effect of misalignment has been reported for young faces in comparison to older faces, regardless of participant age (Wiese et al., 2013). For aligned composite faces a larger N170 is also observed when sequentially presented top halves differ than when they are identical (Jacques & Rossion, 2009). Crucially, however, a similar increase in the N170 is observed when aligned distractor halves induce the illusion that two identical target halves differ (Jacques & Rossion, 2009; Kuefner et al., 2010). This effect is not seen with misaligned composites (Jacques & Rossion, 2009).

Brain stimulation and neuropsychology

Findings that illusory composite fusion can modulate FFA and N170 responses are of great interest. However, it is important to note that these observations are correlational. For example, the fMRI adaptation effects may reveal a causal role for the FFA in the emergence of the illusory percept, or may merely be a consequence of the illusion. The functional significance of these neural signatures thus remains unclear. In contrast, brain stimulation and neuropsychological paradigms potentially provide evidence that a particular neural substrate is causally involved in the experience of the illusory percept; where a region makes a necessary contribution to the illusion, interference in the form of stimulation or acquired lesion, should alter observers’ susceptibility. To date, few investigations have employed brain stimulation to understand the neural basis of the composite face illusion. However, a recent study employing the complete design found that transcranial direct current stimulation (tDCS) applied to occipito-temporal cortex reduced observers’ susceptibility to the composite face illusion, relative to a sham condition (Yang et al., 2014; but see Renzi et al., 2015).

In cases of Acquired Prosopagnosia (AP), individuals are left with face recognition difficulties following brain injury. To date, surprisingly few APs have been tested on composite face tasks. However, patients with damage to posterior (Busigny, Joubert, Felician, Ceccaldi, & Rossion, 2010; Ramon, Busigny, & Rossion, 2010) or anterior (Busigny et al., 2014) regions of the right temporal lobe have been found to exhibit reduced composite face effects, measured using original matching procedures, relative to matched controls. In contrast, typical susceptibility to the illusion, measured using the original matching procedure, was reported in patient Herschel, who developed AP following extensive lesions predominantly in the right occipitotemporal cortex (Rezlescu, Pitcher, & Duchaine, 2012). When tested using the complete design, patient LR, an individual who developed AP following focal damage to the anterior portion of the right temporal lobe, exhibited a typical composite effect (Bukach, Bub, Gauthier, & Tarr, 2006). However, this result is qualified by an abnormal pattern of responses; specifically, the time taken by LR to identify the top half of the face was influenced by the bottom half in both the aligned and misaligned conditions, indicative of an atypical composite effect (Busigny et al., 2014).

Patient CK, who exhibited severe integrative object agnosia, showed broadly typical recognition of upright aligned faces; for example, CK was able to identify celebrity faces as well as age-matched controls, even when the identity was obscured by a disguise. However, CK’s recognition was drastically impaired relative to controls when upper and lower face halves were misaligned horizontally, or when spatially aligned faces were inverted (Moscovitch, Winocur, & Behrmann, 1997). Interestingly, CK’s recognition was largely unaffected when left and right facial halves were misaligned vertically, hinting at qualitative differences between the disruption induced by horizontal and vertical displacement. Because of the nature of CK’s injury (MRI revealed no obvious circumscribed lesion), it is not possible to localize the sources of his impairments. Nevertheless, these results suggest that the neurocognitive mechanisms responsible for processing upright aligned faces dissociate from those mediating recognition of inverted or horizontally misaligned faces (see discussion of gated processing in Domain specific vs. domain general processing).

The susceptibility of clinical populations to the illusion

Developmental prosopagnosia

Individuals with developmental prosopagnosia (DP) experience life-long face recognition difficulties, despite normal levels of intelligence and typical low-level vision (Behrmann & Avidan, 2005; Cook & Biotti, 2016; Duchaine & Nakayama, 2006b; Susilo & Duchaine, 2013). Some DPs appear to show reduced susceptibility to both the identity (Avidan, Tanzer, & Behrmann, 2011; Palermo et al., 2011; but see Susilo et al., 2010), and expression (Palermo et al., 2011) variants of the composite face illusion, assessed using original matching procedures. However, DP is thought to be a heterogeneous condition (e.g., Susilo & Duchaine, 2013), and cases have been described who show broadly typical composite face effects on simultaneous variants of the original matching task (Le Grand et al., 2006). The contribution of task differences (e.g., simultaneous vs. sequential matching paradigms) to the discrepant findings remains unclear. We also note that several studies employing the original matching design fail to report participants’ performance on different trials (e.g., Avidan et al., 2011; Palermo et al., 2011). While individual differences in illusion susceptibility may be most pronounced in the incongruent-same condition (e.g., Le Grand et al., 2004), failure to report observers’ performance when targets differ makes it impossible to assess the potential contribution of response bias to group differences, where observed.

Autism Spectrum Disorder

It has been suggested that people with autism spectrum disorders (ASD) may exhibit reduced susceptibility to the illusion. Observers with ASD often focus on local features and may therefore experience problems forming integrated global representations (Happe & Frith, 2006). Moreover, individuals with ASD often exhibit reduced sensitivity to illusions induced by contextual influence (Behrmann, Thomas, & Humphreys, 2006; Simmons et al., 2009). Research examining composite face effects in ASD has yielded mixed results (Weigelt, Koldewyn, & Kanwisher, 2012). High-functioning adults with ASD showed composite face effects broadly comparable to those of age- and IQ-matched controls on a sequential matching task (Nishimura, Rutherford, & Maurer, 2008). In contrast, a sample of adolescents with ASD failed to show the typical composite effect on a sequential matching task; their matching ability was very similar in the upright-aligned and upright-misaligned conditions (Teunisse & de Gelder, 2003). Whereas the foregoing studies employed the original matching design, Gauthier, Klaiman and Schultz (2009) found evidence of atypical composite face effects in a sample of adolescents with ASD using the complete design; the distractor halves induced similar interference in both aligned and misaligned conditions.

Schizophrenia

There is growing interest in the face recognition of observers with schizophrenia. In particular, it has been suggested that they may exhibit reduced holistic representation (Bortolon, Capdevielle, & Raffard, 2015). Schwartz, Marvel, Drapalski, Rosse and Deutsch (2002) utilized the composite face paradigm to examine holistic processing in a sample of 19 medicated observers with schizophrenia. Using the naming paradigm (A. W. Young et al., 1987), the authors found a comparable composite effect (i.e., greater composite interference in the aligned, than in the misaligned condition) in their clinical sample and matched controls, suggesting typical susceptibility to the illusion. However, in this particular investigation the authors utilized a misaligned control condition only; it is unclear whether the misalignment advantage is also seen for inverted composite arrangements (McKone et al., 2013).

Theoretical interpretations

Many explanations of the composite face effect appeal to the idea that upright aligned composites gain access to high-level face-specific processing, whereas misaligned and inverted composites do not. In contrast, some authors have challenged this view and have advanced domain-general accounts (see section on Domain specific vs. domain general processing). Domain-specific accounts have been influenced heavily by the concepts of configural and holistic representations (Configural and holistic processing). However, while these ideas remain valuable theoretical heuristics, they are under-specified as theories in their own right (Burton, Schweinberger, Jenkins, & Kaufmann, 2015; Piepers & Robbins, 2013; Richler, Palmeri, & Gauthier, 2012). Drawing on the computer vision literature, attempts have therefore been made to provide instantiated image-processing accounts of the composite face illusion (Image processing models).

Domain specific vs. domain general processing

Faces may recruit additional perceptual processing not engaged by other classes of object (Kanwisher, 2000; McKone, Kanwisher, & Duchaine, 2007; McKone & Robbins, 2011). However, employing resource-intensive processing indiscriminately is inefficient. Some stages of face processing may therefore be gated, whereby sophisticated domain-specific processing only commences once a face has been detected in the environment (Tsao & Livingstone, 2008). Detection of simple, proto-facial features, or ‘faciotopy’ (Henriksson, Mur, & Kriegeskorte, 2015), possibly mediated by subcortical visual processing, may be sufficient to engage more complex cortical processing (Johnson, 2005; Shah, Gaule, Bird, & Cook, 2013). When composite faces are aligned and presented upright, the presence of the intact facial arrangement may therefore permit access to face-specific processing, responsible for the composite illusion. Inverted or misaligned composites may lack the basic faciotopy necessary to gain access to the highest levels of face-specific processing, and therefore do not induce composite interference (Tsao & Livingstone, 2008). It is striking that these manipulations also disrupt the residual face recognition ability seen in patient CK (see Brain stimulation and neuropsychology). Potential accounts of this high-level face processing are described in the following sections (see Configural and holistic processing and Image processing models).

In contrast, some authors reject the idea that the composite-face illusion is a product of face-specific representation mechanisms; rather, they argue that composite interference reflects a form of automatic processing recruited by ‘objects of expertise’. Objects of expertise are categories of objects (1) with which the observer has extensive visual experience, and (2) comprising exemplars that share a common prototypical feature arrangement. Thus, birds (Gauthier, Skudlarski, Gore, & Anderson, 2000) and dogs (Diamond & Carey, 1986) may be objects of expertise for ornithologists and dog show judges, respectively. Following extensive experience individuating exemplars, observers are thought to process the separate features of objects of expertise as a unified whole (Diamond & Carey, 1986; Richler et al., 2012; Richler, Wong, & Gauthier, 2011; A. C. Wong, Palmeri, & Gauthier, 2009).

Consistent with this domain-general account, composite effects obtained using the complete design have been reported for non-face objects of expertise encountered naturally, including cars (Bukach, Phillips, & Gauthier, 2010; Gauthier, Curran, Curby, & Collins, 2003), words (A. C. Wong et al., 2011), Chinese characters (A. C. Wong et al., 2012), and chess boards (Boggan, Bartlett, & Krawczyk, 2012). Similar composite effects have been reported with synthetic non-face objects, including ‘Greebles’ (Gauthier & Tarr, 2002; Gauthier, Williams, Tarr, & Tanaka, 1998) and ‘Ziggerins’ (A. C. Wong et al., 2009), where expertise is acquired through lab-based training. Crucially, the type of training administered appears to modulate illusion susceptibility. Participants who received individuation training—naming particular exemplars—exhibited greater composite effects than those who received categorisation training—judging which group a given exemplar belonged to (A. C. Wong et al., 2009). There have been relatively few reports of composite effects for non-face objects obtained using the original matching procedure. For example, Robbins and McKone (2007) failed to find a composite effect when dog experts were presented with arrangements constructed from dog stimuli. Similarly, Greeble experts failed to show composite effects for Greebles (Gauthier & Tarr, 2002; Gauthier et al., 1998). Recently, however, a study found evidence of composite effects for body postures using the original procedure (Willems, Vrancken, Germeys, & Verfaillie, 2014), possibly reflecting similarities in the way that faces and bodies are processed (Minnebusch & Daum, 2009; see section: Not all facial composites are created equal ).

Configural and holistic processing

It has been widely suggested that upright faces may recruit a rapid parallel analysis of the whole face. The terms ‘holistic’ and ‘configural’ are often used interchangeably to describe this whole-face processing, leading to some confusion in the literature (e.g., Piepers & Robbins, 2013). However, two broadly distinct accounts may be delineated.

Faces are defined by the presence and prototypical arrangement of certain basic features; i.e., two eyes above a nose and mouth, so-called first-order relations. Because all faces share this common arrangement, the spatial distances between internal features, so-called second-order relations, may be particularly important for the recognition of individual faces (Diamond & Carey, 1986). The processing of these second-order relations has been commonly termed ‘configural’ processing. It is possible that the composite face illusion alters configural processing; for example, when composite faces are presented upright and aligned, observers may perceive a novel configuration that hinders recognition of the constituent regions (Hancock, Burton, & Bruce, 1996; A. W. Young et al., 1987). Consistent with this interpretation, observers’ ability to discriminate faces that share common features and differ only in their spatial arrangement (e.g., eyes close together or far apart), is greatly diminished when stimuli are inverted (Barton, Keenan, & Bass, 2001; Freire, Lee, & Symons, 2000; Goffaux & Rossion, 2006; Haig, 1984; Leder & Bruce, 2000; Leder, Candrian, Huber, & Bruce, 2001; Rhodes, Brake, & Atkinson, 1993; Searcy & Bartlett, 1996).

Alternatively, variation in feature shape and spatial relations may be described within a single non-decomposable holistic representation: a ‘Gestalt’ that cannot be broken down into its constituent parts or the inter-relations between them (Farah et al., 1998). Because different distractor regions induce the modelling of a new face Gestalt, they alter the appearance of the target half. This view accords with recent findings that the composite face illusion alters observers’ perception of feature shape as well as feature configurations (Hayward, Crookes, Chu, Favelle, & Rhodes, 2016). This holistic account is also consistent with the part-whole effect, whereby individual features are judged more accurately in the context of an upright face than when shown in isolation, despite the context remaining uninformative (Tanaka & Farah, 1993). Interestingly, this contextual advantage disappears when stimulus arrangements are shown upside-down. Similarly, individuals’ abilities to discriminate exemplars that differ only in (1) their features or (2) their inter-feature spacing are highly correlated for upright faces, but not for inverted faces or non-face objects (Yovel & Kanwisher, 2008).

In an attempt to reconcile these views, it has been proposed that holistic, configural, and piecemeal (‘parts-based’) processing, are points on a processing continuum defined by degree of feature integration (Reed, Stone, Grubb, & McGoldrick, 2006). However, the distinction between holistic and configural processing remains contentious. Changing the spatial relations between features necessarily changes the Gestalt representation hypothesised by the holistic account. The distinction between the accounts therefore relies on the assumption that feature shape may be changed independently of the spatial relations between the features. While this view is accepted by some authors (Farah et al., 1998; Yovel & Kanwisher, 2008), it is disputed by others on the ground that changing local contour and shading cues also changes the configural relations between features (Hancock et al., 1996; McKone & Yovel, 2009; Piepers & Robbins, 2013).

Image processing models

According to the Gabor jet model (Biederman & Kalocsai, 1997; Wiskott, Fellous, Krüger, & von der Malsburg, 1997), faces are described by columns of multi-scale, multi-orientation filters (Fig. 3a), with their receptive fields centered on facial landmarks defined relative to the first-order features (Fig. 3b). The model retains biological plausibility, insofar as a Gabor jet approximates the multi-scale, multi-orientation tuning properties of cells in a V1 hypercolumn (De Valois & De Valois, 1988). For a given face, a Gabor jet centered on the left pupil, with different filters for eight orientations and five scales, would yield a 40-item vector describing the contrast variation around the left eye (Fig. 3c). Face recognition is achieved by comparing the combined readout from all of the Gabor jets with stored templates for known faces. Importantly, variation in one part of the face will affect kernels with medium and large receptive fields centered on other parts of the face (Xu, Biederman, & Shah, 2014). Where aligned composite faces are shown upright, image variation present in the lower face half will therefore alter the readout of Gabor jets centered on features in the upper half, biasing observers’ perception of its identity (Herald, Shah, Xu, Biederman, & Juarez, 2015).

Fig. 3
figure 3

(a) Illustration of a Gabor Jet, comprising different multi-scale, multi-orientation filters. (b) The receptive fields of the filters in a particular Gabor Jet are centered on a particular facial landmark. The receptive fields of the intermediate and larger scale filters overlap and cover wide areas of the face. (c) Gabor jets comprising n filters yield a vector of n items describing the contrast variation at the particular facial landmark sampled by the Gabor Jet
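The logic of the Gabor jet account can be illustrated with a minimal sketch. The code below is not the implementation used by the cited authors: the filter wavelengths, envelope widths, image size, and landmark position are all illustrative assumptions, and random pixel arrays stand in for face images. It builds one jet of eight orientations by five scales at a landmark in the upper half of two 'composites' that share an upper half but differ in their lower half, and then compares the two 40-item vectors.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel (even part + 1j * odd part) of shape (size, size)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)   # axis of the carrier grating
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel = envelope * np.exp(1j * 2.0 * np.pi * x_theta / wavelength)
    # Remove the DC component so that uniform regions produce no response
    return kernel - envelope * (kernel.sum() / envelope.sum())

def gabor_jet(image, row, col, n_orient=8, n_scale=5):
    """Response magnitudes of n_scale * n_orient filters centred on (row, col)."""
    responses = []
    for s in range(n_scale):
        wavelength = 4.0 * 2.0 ** (s / 2)   # assumed half-octave scale spacing
        sigma = wavelength                  # assumed envelope width
        size = 2 * int(3 * sigma) + 1
        padded = np.pad(image, size // 2, mode="constant")
        patch = padded[row:row + size, col:col + size]   # centred on (row, col)
        for o in range(n_orient):
            kernel = gabor_kernel(size, wavelength, o * np.pi / n_orient, sigma)
            responses.append(np.abs(np.sum(patch * np.conj(kernel))))
    return np.array(responses)

# Two stand-in 'composites': identical upper halves, different lower halves
rng = np.random.default_rng(0)
top = rng.random((64, 128))
composite_a = np.vstack([top, rng.random((64, 128))])
composite_b = np.vstack([top, rng.random((64, 128))])

landmark = (40, 40)                         # hypothetical landmark in the upper half
jet_a = gabor_jet(composite_a, *landmark)   # 40-item vector
jet_b = gabor_jet(composite_b, *landmark)

# Mean absolute change in response at each scale, smallest filters first
change_per_scale = np.abs(jet_a - jet_b).reshape(5, 8).mean(axis=1)
print(change_per_scale)
```

Under these assumptions, filters whose receptive fields do not reach the lower half return identical responses for the two images, whereas the larger filters centred on the same upper-half landmark respond differently. This is the sense in which variation in one face half can alter the description of the other.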

There has also been interest in the utility of unsupervised data reduction algorithms as models of human face perception. Following an initial pre-processing stage (input images are typically cropped to a common aspect ratio and faces aligned using sets of fiducial points), algorithms are able to extract dimensions that describe the variation within a set of input images. To date, many authors have used principal components analysis (PCA) to illustrate the value of this approach (Calder, 2011; Calder, Burton, Miller, Young, & Akamatsu, 2001; Calder & Young, 2005; Hancock et al., 1996). When applied to faces, PCA returns the n Eigenface dimensions (whole face components that may be combined linearly to reconstruct a given facial exemplar) that most effectively describe the variation present within the set of sampled images (Turk & Pentland, 1991). Thereafter, facial exemplars are represented as vectors in the face-space defined by the n Eigenface dimensions. Importantly, where PCA is applied only to intact, naturalistic faces, components will reflect the natural covariation present in different face halves; for example, happy and angry mouth regions will have been encountered in the presence of accompanying happy and angry eye-regions. Unless the algorithm encounters faces with different emotion signals in their top and bottom halves, the resulting PCA space will lack the dimensionality with which to represent mixed-emotion composites (Cottrell, Branson, & Calder, 2002). The illusion-induced interference first reported by Calder et al. (2000) may therefore reflect the process of modelling emotion-incongruent composites within an emotion-congruent face-space. By appealing to the same logic, PCA models can explain other types of composite binding such as that seen for age and gender (see Which judgments are affected?). Many of the principles and predictions derived from PCA models generalize well to other data reduction algorithms, notably Architecture II independent components analysis (ICA; Bartlett, Movellan, & Sejnowski, 2002; Calder & Young, 2005; Nestor, Plaut, & Behrmann, 2013).
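The dimensionality argument can be made concrete with a toy simulation. The sketch below is an illustration under strong simplifying assumptions, not a reproduction of any cited model: synthetic pixel vectors replace face images, a single latent 'expression' value drives both halves of every training face, and only one component is retained. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 50                                  # pixels per face half (assumed)
top_pattern = rng.normal(size=n_pix)        # how the top half varies with expression
bottom_pattern = rng.normal(size=n_pix)     # how the bottom half varies with expression

def make_face(z_top, z_bottom):
    """Concatenate a top and a bottom half, each scaled by its expression value."""
    return np.concatenate([z_top * top_pattern, z_bottom * bottom_pattern])

# Training set of intact faces only: both halves share one expression value
z_train = rng.normal(size=200)
train = np.stack([make_face(z, z) for z in z_train])
train += 0.05 * rng.normal(size=train.shape)            # small pixel noise

# PCA via singular value decomposition of the mean-centred training images
mean_face = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean_face, full_matrices=False)
components = vt[:1]                         # retain the first component

def reconstruct(face):
    """Project a face into the learned space and map it back to pixels."""
    weights = (face - mean_face) @ components.T
    return mean_face + weights @ components

def reconstruction_error(face):
    return np.linalg.norm(face - reconstruct(face))

congruent = make_face(1.0, 1.0)             # e.g., 'happy' top with 'happy' bottom
composite = make_face(1.0, -1.0)            # e.g., 'happy' top with 'angry' bottom

err_congruent = reconstruction_error(congruent)
err_composite = reconstruction_error(composite)
# Distance between the composite's true top half and its modelled top half
top_half_shift = np.linalg.norm(reconstruct(composite)[:n_pix] - composite[:n_pix])
print(err_congruent, err_composite, top_half_shift)
```

In this toy space the congruent face is reconstructed almost perfectly, whereas the incongruent composite is not, and the top half of its best-fitting reconstruction differs from the top half that was actually presented. Whether human observers behave in a comparable way is an empirical question that the simulation does not address.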

Future directions

In the foregoing sections, we have reviewed the body of empirical findings reported with composite paradigms and existing theoretical accounts of the illusion. In this section, we look forward, highlighting priorities for future research. In particular, several critical questions remain unresolved. These gaps in our knowledge undermine attempts to evaluate and refine theoretical accounts of the composite face illusion.

The functional significance of the composite face illusion

The view that holistic representation, as measured by the composite face illusion, is causally related to face recognition ability remains extremely popular; for example, holistic processing may permit accurate and efficient representation (Maurer et al., 2002; Piepers & Robbins, 2013). However, the functional significance of the composite face illusion remains uncertain. First, the literature is inconsistent with respect to the relationship between individuals’ susceptibility to the composite face illusion and other markers of holistic representation, including the part-whole (Tanaka & Farah, 1993) and face-inversion (Yin, 1969) effects. Whilst some authors have found associations between these measures using the complete design (DeGutis, Wilmer, Mercado, & Cohan, 2013), others using the original matching procedure have not (Durand et al., 2007; Wang, Li, Fang, Tian, & Liu, 2012). For example, Wang and colleagues (2012) found susceptibility to the composite and part-whole effects were unrelated. Second, studies comparing observers’ susceptibility to the composite face illusion and face recognition ability have also yielded equivocal results; whilst some studies have found little or no correlation using the original matching procedure (de Heering & Maurer, 2014; Konar et al., 2010; Wang et al., 2012), other authors have observed associations with ability using the complete (DeGutis et al., 2013) and original procedures (Avidan et al., 2011). It is likely that differences between the composite tasks used (e.g., Richler, Cheung, & Gauthier, 2011b), their analyses (e.g., DeGutis et al., 2013), the sensitivity of face recognition measures (e.g., Duchaine & Nakayama, 2006a), and sample composition (e.g., Avidan et al., 2011), all contribute to the discrepant findings. It is imperative that future research teases apart these sources of variation to understand the functional significance of the illusion.

Strategic and contextual influences

The composite face illusion is frequently attributed to automatic holistic processing—either mediated by face-specific mechanisms or a product of perceptual expertise (see Theoretical interpretations ). Importantly, however, several results suggest that strategic and contextual factors influence susceptibility to the illusion. For example, Navon priming manipulations that bias observers’ attention towards global form increase susceptibility to the composite-face illusion measured using the complete design (Gao, Flevaris, Robertson, & Bentin, 2011). Conversely, induction of negative mood (Curby, Johnson, & Tyson, 2012) and the presence of unrelated observers (Garcia-Marques, Fernandes, Fonseca, & Prada, 2015), were found to decrease illusion susceptibility when measured using the original matching procedure. In a study employing the complete design, greater distractor interference was observed for composites made from unfamiliar non-face objects when trials were preceded by an aligned composite face (Richler, Bukach, & Gauthier, 2009). Aligned composites also induce more interference when preceded by misaligned composites—possibly reflecting the creation of a larger ‘attentional window’ that biases subsequent perceptual processing—than when preceded by aligned composites (Richler et al., 2009). Where observed, composite effects for unfamiliar non-face objects (e.g., Hsiao & Cottrell, 2009), may be a product of contextual and strategic influences, whereas composite effects observed with objects-of-expertise, including faces, may be the product of increasingly automatic holistic processing (Richler, Wong et al., 2011). Understanding how strategic and contextual factors influence performance on composite paradigms will inform investigation into the perceptual origins of the illusion (Richler, Cheung et al., 2011a; Rossion, 2013).

Top-down or bottom-up?

There is growing interest in how flexible strategic factors, including attention and response base-rates, can affect the size of composite face effects (see Strategic and contextual influences). A related question is the extent to which the illusion itself is a product of ‘bottom-up’ feed-forward processing of upright aligned composite arrangements, or a ‘top-down’ interpretation imposed on a sensory description (see Gregory, 1997). Several theoretical approaches, including the Gabor jet (Xu et al., 2014) and PCA accounts (Cottrell et al., 2002), model composite interference as an emergent property of feed-forward descriptive processes. Alternatively, however, the composite illusion may be a product of a top-down modelling process. The goal of visual perception is to infer the causes of sensory input; percepts can be thought of as inferences from sensory data and knowledge derived from the past (Friston, 2005; Gregory, 1997; Kersten, Mamassian, & Yuille, 2004). The generative models we have acquired for faces will likely reflect the whole-face covariation we experience during the course of our lives. These whole-face models may aid perception of ambiguous forms or expressions under typical conditions, but will tend to generate holistic illusions when composite inputs come from different faces.
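One way to see how a generative model that encodes whole-face covariation could produce such an illusion is a two-variable Gaussian toy example. This is purely an illustrative assumption and is not a model proposed in the cited work: each face half is reduced to a single number, the prior correlation between halves (rho) and the sensory noise variance are arbitrary values, and the percept is taken to be the posterior mean.

```python
import numpy as np

def perceived_top(y_top, y_bottom, rho=0.8, noise_var=0.5):
    """Posterior mean of the top-half value given noisy samples of both halves.

    Prior: (top, bottom) ~ N(0, [[1, rho], [rho, 1]]), standing in for learned
    whole-face covariation. Likelihood: each half is observed with independent
    Gaussian noise of variance noise_var. All values are hypothetical.
    """
    prior_cov = np.array([[1.0, rho], [rho, 1.0]])
    gain = prior_cov @ np.linalg.inv(prior_cov + noise_var * np.eye(2))
    return (gain @ np.array([y_top, y_bottom]))[0]

# Identical top-half input paired with two different bottom halves
same_source = perceived_top(1.0, 1.0)        # bottom consistent with the top
different_source = perceived_top(1.0, -1.0)  # bottom taken from a 'different face'

# With no learned covariation between halves, the bottom half has no influence
independent_a = perceived_top(1.0, 1.0, rho=0.0)
independent_b = perceived_top(1.0, -1.0, rho=0.0)
print(same_source, different_source, independent_a, independent_b)
```

When the prior links the two halves, the same top-half input is estimated differently depending on the bottom half; when the prior treats the halves as independent, the estimate of the top half does not change. The sketch only shows that the inference described above is sufficient in principle to yield cross-half interference.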

While little attempt has been made to distinguish these possibilities, several lines of evidence suggest that a top-down account is plausible. For example, should the composite illusion be an emergent property of feed-forward processing, some degree of illusory bias might be expected where distractor regions are masked and fail to reach conscious awareness. However, studies employing continuous flash suppression suggest that distractor face regions do not induce illusory bias when presented outside of conscious awareness (Axelrod & Rees, 2015). Conversely, when observers are consciously aware of stimulus arrangements, a surprising array of stimuli successfully elicit the composite illusion. For example, photographic negation drastically distorts the appearance of faces (Galper, 1970); in particular, the inversion of shape-from-shading cues, whereby patterns of shading are used to infer three-dimensional form, gives negated faces a grotesque appearance (Kemp, Pike, White, & Musselman, 1996). While this manipulation might be expected to severely hamper feed-forward processing, studies employing the original matching paradigm suggest that arrangements shown in typical polarity and photographic negative induce broadly similar composite effects (Hole et al., 1999; Taubert & Alais, 2011). Similarly, compelling demonstrations of the composite illusion can be achieved using abstract cartoon faces (Fig. 4). While cartoon and naturalistic faces may share conceptual characteristics, they bear little physical resemblance. Knowledge about faces therefore appears to modulate the composite face illusion in the absence of typical appearance.

Fig. 4
figure 4

Compelling demonstrations of the composite illusion can be provided using highly abstract cartoon-face stimuli. In the upright-aligned condition (left), the presence of the different mouths makes it harder to recognise that the eyes are the same. The illusion is diminished in the misaligned (middle) and inverted (right) conditions

Not all facial composites are created equal

While considerable research has addressed differences in individuals’ susceptibility to the composite face illusion (Avidan et al., 2011; DeGutis et al., 2013; Richler, Cheung et al., 2011b; Wang et al., 2012), there is a paucity of research examining inter-stimulus differences. While it is recognised that some facial composites induce stronger illusions than others (e.g., Richler & Gauthier, 2014; Ross, Richler, & Gauthier, 2015), little explanation has been offered for these differences. A better understanding of these differences might inform theoretical accounts; for example, it will be interesting to see whether image processing approaches (e.g., Gabor jet and PCA accounts; see Image processing models) can model this inter-stimulus variability. Moreover, an appreciation of inter-stimulus variability may help disambiguate some of the equivocal aspects of the literature (e.g., the functional significance of the illusion, findings from clinical populations). Some of the variability may reflect low-level image differences including image scale, spatial frequency and colour information (see Low-level stimulus characteristics). Differences in shape and texture variation may also be important. For example, in original matching paradigms, distractors with incongruent shape but congruent texture exert more influence on a target region than distractors with incongruent texture but congruent shape (Jiang, Blanz, & Rossion, 2011).

A further possibility is that observers are detecting emotion cues in some of the ostensibly ‘neutral’ faces used to construct facial composites (Fig. 5). When posing for photos, actors seeking to appear neutral may appear anxious or bored. Moreover, it is not always easy to distinguish a stranger’s permanent facial shape from their transient facial expressions; e.g., whether an unfamiliar actor is scowling or simply has narrow eyes (e.g., Todorov, Said, Engell, & Oosterhof, 2008). Crucially, observers may therefore perceive emotion where actors do not intend to convey emotion. It is well established that facial emotions induce strong composite effects (Calder et al., 2000; Palermo et al., 2011; Tanaka et al., 2012). Facial composites rich in perceived emotion cues may therefore induce stronger composite effects when observers are asked to judge whether target regions are, or are not identical (e.g., in original or complete matching paradigms). Interestingly, gaze direction—known to modulate the perception of facial emotion; for example, direct gaze makes expressions appear angrier (Adams & Kleck, 2003, 2005)—has been found to influence the magnitude of composite effects measured using the original matching design (S. G. Young, Slepian, Wilson, & Hugenberg, 2014). We also note that a study employing the original matching procedure recently found composite effects with expressive body postures (Willems et al., 2014), suggestive of holistic coding of body posture. In contrast, authors have failed to observe composite effects with neutral bodies (e.g., Soria Bauser, Suchan, & Daum, 2011).

Fig. 5
figure 5

Examples of facial composites taken from a commonly used stimulus set developed by Le Grand et al. (2004). While the composites are constructed with ostensibly neutral faces, subtle emotion cues are present in many of the faces

The nature of holistic information representation

The composite face illusion is frequently attributed to whole-face processing (see Domain specific vs. domain general processing). However, surprisingly little is known about the nature of the integrated representations derived from this analysis. At least two lines of evidence suggest that current ‘holistic’ and ‘configural’ accounts may be under-specified. First, observers’ face recognition is surprisingly insensitive to vertical and horizontal stretching; for example, observers can easily recognise faces stretched to twice their true height (Hole, George, Eaves, & Rasek, 2002). Whole-face processing is thought to mediate accurate and efficient description of the spatial relationships between features (see Rossion, 2008). However, such findings suggest that spatial relations may be coded relatively (for example, the height of one feature may be described through reference to the height of another) rather than through absolute metrics, where horizontal and vertical distances are measured in units akin to degrees of visual angle (Burton et al., 2015). Second, recent findings suggest that, in comparison to vertical information, the horizontal information structure of faces contributes disproportionately to face recognition (Dakin & Watt, 2009) and is disproportionately sensitive to orientation inversion (Goffaux & Dakin, 2010). These findings raise the possibility that holistic representations may be dominated by horizontal information. Composite face paradigms may offer researchers a way to test these hypotheses.
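The contrast between relative and absolute coding of spatial relations can be illustrated with a small numerical sketch. The landmark coordinates and the two coding schemes below are hypothetical and are chosen only to show why a relative code would tolerate stretching; they are not taken from the cited studies.

```python
import math

# Hypothetical landmark positions (x, y) in arbitrary units; y increases downwards
landmarks = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip": (50.0, 65.0),
    "mouth": (50.0, 85.0),
}

def stretch(points, sx=1.0, sy=1.0):
    """Scale every landmark horizontally by sx and vertically by sy."""
    return {name: (x * sx, y * sy) for name, (x, y) in points.items()}

def absolute_code(points):
    """Vertical eye-to-nose and eye-to-mouth distances in raw units."""
    eye_y = points["left_eye"][1]
    return (points["nose_tip"][1] - eye_y, points["mouth"][1] - eye_y)

def relative_code(points):
    """Eye-to-nose distance as a fraction of the eye-to-mouth distance."""
    eye_to_nose, eye_to_mouth = absolute_code(points)
    return eye_to_nose / eye_to_mouth

tall = stretch(landmarks, sy=2.0)            # face stretched to twice its height

print(absolute_code(landmarks), absolute_code(tall))   # absolute distances change
print(relative_code(landmarks), relative_code(tall))   # the ratio does not
```

In this sketch the absolute vertical distances double under stretching while the ratio of one vertical distance to another is unchanged. Note that the invariance holds only for relations expressed within a single axis; a ratio of a horizontal to a vertical distance would change when the face is stretched along one dimension.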

Dynamic face processing

While the overwhelming majority of existing research has examined the composite face illusion using static faces, the faces we typically encounter outside of the lab are moving (O'Toole, Roark, & Abdi, 2002). A recent study employing the naming paradigm found that the presence of an aligned dynamic distractor impairs identification of dynamic target halves, learned beforehand (Favelle, Tobin, Piepers, Burke, & Robbins, 2015). Importantly, composite interference is diminished when the dynamic halves are misaligned or presented upside-down. Similarly, when asked to judge the speed of eye opening and closing movements, the presence of task-irrelevant mouth opening and closing induces illusory slowing of the eye changes (Cook, Aichelburg, & Johnston, 2015). Interestingly, this illusion-induced interference is only seen in upright dynamic faces, at particular relative-phase relationships. Whereas static composite effects are disrupted by spatial misalignment, phase-specificity suggests that dynamic integration may be sensitive to temporospatial alignment.

It is important that future research addresses the relationship between composite interference seen with dynamic faces and the recognition of moving faces. The presence of motion cues often improves facial recognition (Knight & Johnston, 1997; Lander, Christie, & Bruce, 1999). However, observers’ ability to use motion cues is sensitive to orientation, suggestive of holistic or configural representation (Hill & Johnston, 2001; Knight & Johnston, 1997). Interestingly, observers with ASD are relatively poor at recognising individuals’ facial motion signature (O'Brien, Spencer, Girges, Johnston, & Hill, 2014) and show little or no susceptibility to the illusory slowing induced by dynamic cross-feature interactions (Shah, Bird, & Cook, 2016). Moreover, the relationship between the dynamic and static composites remains unclear; while both illusions exhibit sensitivity to alignment and orientation, it is uncertain whether they are products of common processes or representations.

Conclusion

Results obtained from composite face procedures have contributed significantly to our understanding of holistic face processing, the detrimental effects of inversion, and aberrant face perception in clinical populations. The ongoing value of the paradigm is illustrated by its recent application to investigate dynamic face processing. However, composite procedures have been the subject of intense scrutiny, particularly over the last decade, and there is a growing sense that the composite face illusion, whilst easy to illustrate, is deceptively difficult to measure and interpret. Considerable debate has focussed on the use of original and complete designs (Richler & Gauthier, 2014; Richler et al., 2012; Rossion, 2013), control conditions (McKone et al., 2013), the analyses employed (DeGutis et al., 2013), and the effects of strategic and contextual factors (Richler, Wong et al., 2011). We have suggested that inter-stimulus variability may also affect the susceptibility estimates obtained by different authors and warrants greater consideration. Despite overarching disagreements on how to refine the paradigm, the composite face effect remains a compelling and fascinating visual illusion. As a tool, it still holds much potential in our investigation of how faces and other objects of expertise are processed.