Introduction

Faces are multidimensional visual stimuli that are capable of transmitting a great deal of information regarding a host of physical and social attributes. These attributes include (but are not limited to) the identity, sex, emotional expression, and gaze direction of the face. A fundamental question in the study of faces concerns the manner by which facial attributes are integrated into a unified phenomenal experience (Bruce & Young, 1986; Haxby, Hoffman, & Gobbini, 2000; Fitousi, 2013; Fitousi & Wenger, 2013; Young & Yamane, 1992). Whereas extensive research has been conducted on the binding of simple features, such as color, shape, and spatial location (Treisman, 1996; Hommel, 1998), less effort has been invested in studying the binding of more complex dimensions, such as facial features. The present study sought to fill this gap. It addressed the question of whether facial features (i.e., identity, emotion, and gaze direction), as well as non-facial features (i.e., spatial location and response location), are integrated in and across perception and action.

A novel hypothesis advanced in the present study postulates that people create, maintain, and retrieve transient memory structures of facial features. The quest for such “face files” in the present study has been inspired by the notions of “object files” (Kahneman, Treisman, & Gibbs, 1992) and “event files” (Hommel, 1998, 2004). These notions have been instrumental in the study of objects and attention (Gordon & Irwin, 1996; Henderson, 1994; Hommel, 2005), but they have rarely been applied to faces. Harnessing concepts and methodologies from these literatures, the current investigation yielded consistent evidence for the existence of “face files”—transient memories of facial and non-facial feature bindings. The results bear important implications for current theories of face and object recognition (Haxby et al., 2000), person construal (Freeman & Ambady, 2011), and feature binding (Hommel, 1998; Treisman, 1996).

Faces and the binding problem

The primate brain codes the dimensions of perceptual objects in a distributed manner (Hubel & Wiesel, 1977; Felleman & Van Essen, 1991). In this process, elementary features, such as color, shape, and location, are represented in different feature maps in the visual cortex (Livingstone & Hubel, 1987, 1988). A major challenge facing our perceptual systems is that of recombining the separate features into veridical representations of the viewed objects (Treisman & Gelade, 1980). To accomplish this task, the primate brain must coordinate information from several independent and often temporally discordant sources. This formidable computational challenge has often been dubbed the binding problem (Singer & Gray, 1995; Treisman, 1996; von der Malsburg, 1999). One notable example of a binding problem in perception is the finding of “illusory conjunctions” of color and shape (Treisman & Schmidt, 1982).

Very much like objects, faces may pose binding problems to our visual system. This is because facial attributes are represented as separate codes in the brain. There is now ample evidence to suggest the involvement of a distributed network of brain areas that is responsible for the perception of specific facial dimensions (Haxby et al., 2002). For example, the processing of facial expression is governed by the amygdala (Breiter et al., 1996), whereas the processing of facial identity is carried out mainly in the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997) and the superior temporal sulcus (STS; Haxby et al., 2000). Moreover, recordings in the temporal cortex of nonhuman primates (Rolls & Tovee, 1995; Sugase, Yamane, Ueno, & Kawano, 1999) support the existence of neuronal activity that is distributed across many neurons (Rogers & McClelland, 2004; Spivey & Dale, 2004).

Given the involvement of a highly distributed network in processing facial attributes, an acute binding problem may arise. Consider a situation in which you are presented with two facial identities, each conveying a different facial emotion (Jim happy, Dan sad). Your visual system must ensure that each identity is integrated with the correct emotion (Jim + happy, and Dan + sad). This is not a trivial task. Binding problems with faces may be even more difficult than with elementary low-level features. This is because faces, in addition to carrying invariant attributes (e.g., identity and gender), transmit a great deal of dynamic information, such as eye gaze and emotional expression. These attributes frequently change their physical appearance as well as their semantic meaning and thus require greater effort in maintaining accurate bindings.

A concrete example may be instructive here. Imagine you are standing in a crowded airport terminal, expecting your uncle to show up. You suddenly detect someone who is smiling at you. Then, you note that this “stranger” is approaching you. Finally, you understand that the man who is weeping on your shoulder is your uncle. Your uncle’s face went through many feature changes in the course of a relatively short period of time. Still, you succeeded in maintaining a single coherent representation. How can this be accomplished? It is likely that some sort of binding mechanism has been operative. Evidence for such a binding mechanism comes primarily from situations in which binding fails. In the well-known McGurk effect (McGurk & MacDonald, 1976), the vocal sound produced by a face is erroneously integrated with the lip movements, such that the perceiver hears a different phoneme than the one articulated.

Facial attributes and “person construal”

Cognitive psychologists have invested much effort in studying the perceptual mechanisms that govern face processing (Bruce & Young, 1986; Burton, Bruce, & Johnston, 1990; Calder & Young, 2005; Farah, Wilson, Drain, & Tanaka, 1998; Fitousi & Wenger, 2013; Fitousi, 2015, 2016; Haxby et al., 2000). Social psychologists have also studied the implications of perceiving the faces of others. This work has come to be known as “person construal” (Fiske & Neuberg, 1990; Freeman & Ambady, 2011; Macrae, Bodenhausen, & Milne, 1995). Person construal research investigates the lower-level perceptual mechanisms that produce social cognitive phenomena. A recent influential theory by Freeman and Ambady (2011) has proposed that perception of the social attributes in a face is a dynamic process that evolves over hundreds of milliseconds. In this model, perceptual processing of irrelevant social face attributes can partially activate other face attributes, including motor actions. Event-related potential (ERP) studies supported this conjecture, showing that the extraction of facial attributes (e.g., sex, race, and age) is immediately and concomitantly shared with the motor cortex (Freeman, Ambady, Midgley, & Holcomb, 2011).

Another source of support for the interactive theory of Freeman and Ambady (2011) comes from studies on response trajectories (Freeman, Pauker, Apfelbaum, & Ambady, 2010; Freeman & Ambady, 2009). In these studies, participants classify faces on a predefined facial attribute (e.g., age) by moving their hand toward one of two labels on the screen. The faces also vary on an irrelevant dimension (e.g., gender). Participants’ hand trajectories are often attracted to the label carrying the name of the irrelevant facial attribute (e.g., woman), indicating its abrupt online activation. These studies support the idea that face attributes interact with other face attributes at perceptual, cognitive, or motor levels. Freeman and Ambady’s (2011) theory contributes valuable insights into the interaction of perceptual and motor aspects of face perception, but it is silent with respect to the binding mechanism that shapes the ultimate representation. What is needed is a broader theoretical framework that can shed light on the binding of facial and motor attributes. The following section proposes such a framework.

From “object files” to “event files”

A systematic analysis of feature binding with objects has been performed by Kahneman and Treisman (1984) and Kahneman et al. (1992). They used a preview task in which a letter appears in a prime display, and then the same or a different letter is presented in a probe display. Naming latencies for the probe letter were faster if the letter’s identity was repeated and associated with the same object/location. Kahneman et al. (1992) called this the object-specific preview effect. According to these authors, the processing of a visual object leads to the creation of an “object file”, an episodic representation of the object’s identity and location that allows its identification in spite of spatiotemporal discontinuities.

Considerable progress in understanding “object files” has been made by Hommel (1998). He has advanced the theory in various creative ways (Hommel, 2004, 2005; Hommel & Colzato, 2009). First, Hommel showed that priming effects can be documented even when an object’s location is not repeated, but other features of the object are (i.e., object-nonspecific repetition effects). Second, he demonstrated that “object files” may consist of a subset (i.e., binary bindings) of their features, not necessarily the entire list of features, as argued by Kahneman et al. (1992). Third, he showed that object-nonspecific repetition effects represent a processing cost, rather than a benefit (Hommel & Colzato, 2009). In particular, repeating two given features (e.g., a red square) or alternating both of them (e.g., a blue triangle) yields performance levels that are superior to those observed in conditions in which one of the features is repeated and the other is alternated (e.g., a red triangle). This pattern is called partial-repetition costs (Hommel, 2004, p. 496). Fourth, Hommel introduced the concept of action codes. These are motor and response attributes that are distributed in the brain and are amenable to integration just like visual feature codes (Hommel, Müsseler, Aschersleben, & Prinz, 2001). When action codes integrate with feature codes, they create an “event file”—a mid-level representation or a pointer to a visuo–motor episodic trace (Hommel, 1998). For example, responding to a red object with your right hand may lead to the binding of the red color with the motor code associated with the right hand. Complete repetition or alternation of the features in this newly created combination would enjoy more efficient processing than partial repetitions.
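To make the logic of partial-repetition costs concrete, the following toy sketch (an illustration written for this exposition, not a model taken from Hommel's work; the baseline and conflict-cost values are arbitrary) simulates the core retrieval assumption: responding to S1 binds its features into an event file, any feature repeated in S2 retrieves that file, and a retrieved feature that no longer matches the current episode produces a conflict cost.

```python
# Toy illustration of event-file retrieval (hypothetical parameter values).

BASELINE_RT = 780      # hypothetical baseline RT in ms
CONFLICT_COST = 50     # hypothetical cost per mismatching retrieved feature

def simulate_rt(s1, s2):
    """s1 and s2 are dicts of feature codes, e.g., {'color': 'red', 'response': 'left'}.
    Responding to S1 is assumed to bind all of its features into one event file.
    Any feature shared with S2 retrieves the file; retrieved features that
    mismatch the current episode produce a conflict cost."""
    repeated = {f for f in s1 if s1[f] == s2[f]}
    if not repeated:                      # complete alternation: nothing is retrieved
        return BASELINE_RT
    mismatching = {f for f in s1 if s1[f] != s2[f]}
    return BASELINE_RT + CONFLICT_COST * len(mismatching)

s1 = {'color': 'red', 'response': 'left'}
print(simulate_rt(s1, {'color': 'red',  'response': 'left'}))   # complete repetition  -> 780
print(simulate_rt(s1, {'color': 'blue', 'response': 'right'}))  # complete alternation -> 780
print(simulate_rt(s1, {'color': 'red',  'response': 'right'}))  # partial repetition   -> 830
```

Under these assumptions, complete repetitions and complete alternations yield the same baseline latency, whereas partial repetitions are slowed, which is exactly the partial-repetition-cost pattern described above.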

The distributed coding of simple attributes, such as color, shape, and orientation, in the primate brain is well established (Livingstone & Hubel, 1987). But are more complex attributes, such as facial dimensions, coded in a distributed fashion? Haxby and his colleagues (Haxby et al., 2002; Hoffman & Haxby, 2000) have presented evidence for the existence of a neural system of separate localized regions in the human brain. This system specializes in processing facial attributes. In this system, the ventral temporal cortex and the fusiform gyrus (Kanwisher et al., 1997) are responsible for the processing of invariant facial aspects, such as identity, whereas the superior temporal sulcus (STS) is responsible for the processing of variant attributes, such as eye gaze and emotion (Vuilleumier, Armony, Driver, & Dolan, 2001). The distributed neuronal model proposed by Haxby and his colleagues (Haxby et al., 2002; Hoffman & Haxby, 2000) suggests that facial attributes are coded in separate brain areas. To date, no direct attempt has been made to study how these face codes are integrated with each other, or how they are bound with action codes (Hommel, 2000).

Overview of the present experiments

Using simple colored shapes, Hommel (1998) adduced consistent evidence for the presence of binding processes, supporting the existence of both visuo–visuo integrations (i.e., form and color, form and location, and color and location) and visuo–motor integrations (i.e., color and response location, form and response location). Hommel’s (1998) methodology and results provide strong evidence for the existence of “object files” and “event files” with low-level features. The present study tested the hypothesis that similar “object files” and “event files” exist for face attributes. A recent study by Keizer, Colzato, and Hommel (2008) documented integrations of faces with houses, motion, and manual response. The present study departs from the Keizer et al. study in an important way. In that study, the whole face served as the elementary unit of integration, whereas here, facial attributes (e.g., eye gaze and expression) are the integration units, and the main question of interest concerns the binding of these attributes.

Five facial and non-facial attributes were selected for testing: facial identity, emotion (i.e., expression), eye-gaze direction, the face’s spatial location, and the location of the manual response emitted toward the face. Subsets of these five attributes were tested in a series of four experiments. These attributes were chosen because they represent the most important and most studied face attributes (cf. Haxby et al., 2000, 2002). Another reason is that they encompass both variant (i.e., emotion and gaze direction) and invariant (i.e., identity) attributes (Haxby et al., 2000, 2002).

A word is in order regarding the non-facial attribute of spatial location. A priori, it seems likely that faces are individuated via their identity (John’s face). However, there is also the possibility that faces are individuated through their location in space. Interestingly, spatial location has not been considered a consequential variable in face recognition studies, although it has been attributed a fundamental role in tagging and addressing “object files” (Kahneman & Treisman, 1984; Kahneman et al., 1992; Wolfe & Bennett, 1997). Hommel has documented partial-repetition costs for combinations of location and response and of location and form, but not for combinations of location and color (Hommel, 1998). It is, therefore, crucial to see whether spatial location is critical to the individuation of faces or to the integration of facial features into an “object file” or “face file.”

The paradigm deployed throughout the present experiments is similar to that used by Hommel (1998, 2004; see also Zmigrod, de Sonneville, Colzato, Swaab, & Hommel, 2013). It is a variation on the original preview method developed by Kahneman et al. (1992). Each trial consisted of a sequence of displays, starting with a response cue, followed by a face (S1), which was then replaced by a blank. The blank was then replaced by another face stimulus (S2). The response to the first face, S1, is termed R1, and the response to the second face, S2, is termed R2. Figure 1 shows a schematic illustration of the displays and timings in the experiments. On a given trial, each of the features could be either repeated or alternated from S1 to S2. Similarly, the response feature (i.e., left- vs right-hand response) could be repeated, alternated, or neutral from R1 to R2. The neutral condition means that no response was required for R1. This condition can help decide whether repetition was beneficial or alternation was harmful for performance. R2 was executed according to the dimension that was relevant for response (e.g., identity) in the given experiment. The target dimension for response was varied across experiments.
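As a concrete summary of this classification scheme, the sketch below codes a single trial in terms of repetition versus alternation of each stimulus feature and of the response; the field and function names are illustrative and do not come from the original experiment software.

```python
# Classify a trial by whether each stimulus feature and the response
# repeat or alternate from S1/R1 to S2/R2 (illustrative field names).

def classify_trial(s1, s2, r1, r2):
    relations = {feat: ('repeated' if s1[feat] == s2[feat] else 'alternated')
                 for feat in s1}
    if r1 is None:                       # bidirectional cue: no R1 was required
        relations['response'] = 'neutral'
    else:
        relations['response'] = 'repeated' if r1 == r2 else 'alternated'
    return relations

s1 = {'identity': 'A', 'emotion': 'sad', 'location': 'top'}
s2 = {'identity': 'A', 'emotion': 'angry', 'location': 'top'}
print(classify_trial(s1, s2, r1='left', r2='right'))
# {'identity': 'repeated', 'emotion': 'alternated',
#  'location': 'repeated', 'response': 'alternated'}
```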

Fig. 1 Illustration of the displays used in Experiments 1–4, following Hommel (1998). R1 and R2 are responses to the first stimulus (S1) and second stimulus (S2)

In the present experiments, each facial dimension could take one of two values. Thus, facial identity could belong to either person A or person B (Experiments 1 and 2); similarly, facial emotion could take one of two possible values—sad vs angry in Experiments 1 and 2, or frightened vs angry in Experiments 3 and 4; eye-gaze direction was either averted to the left or to the right (in Experiments 3 and 4), and the spatial location of the face was either on the top or bottom of the screen (Experiments 1–4).

Three effects of major theoretical significance may emerge in this priming setup (Hommel, 1998, 2004). The first is a main effect of stimulus or response feature repetition. Perceivers may benefit from the repetition of the facial identity shown in S1 (e.g., Jim) in S2 (e.g., Jim), or from the repetition of the R1 response (e.g., right-hand key) in R2 (e.g., right-hand key). In that case, perceivers may respond faster to the probe in the identity-repeated condition than in the identity-alternated condition (Burton, Kelly, & Bruce, 1998; Ellis, Young, Flude, & Hay, 1987). This type of effect does not imply integration of features, but it indicates feature priming in short-term memory.

A second type of effect is called partial-repetition costs (Hommel, 2004) and is due to the repetition or alternation of combinations of features from S1 to S2. To better understand how this effect is measured, consider the following three types of trials: (1) complete repetitions are trials in which the two features of the stimulus in S1 (e.g., Jim + happy) are repeated in S2 (e.g., Jim + happy), (2) complete alternations are trials in which the two features in S1 (e.g., Jim + happy) are replaced by two different features in S2 (e.g., David + sad), and (3) partial repetitions are trials in which one of the features in S1 is repeated in S2, whereas the other feature is alternated (e.g., Jim + happy in S1 and David + happy in S2). Partial-repetition costs (Hommel, 2004) are recorded when performance in the partial-repetition trials is worse than that in the complete-repetition or complete-alternation trials. The presence of such costs entails the formation of an “object file” consisting of a pairwise binding trace of the two pertinent features (Hommel, 1998).
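For a worked example, the sketch below computes a partial-repetition cost as the difference between the average of the two partial-repetition cells and the average of the complete-repetition and complete-alternation cells, using the identity × response cell means reported for Experiment 1 below; this interaction contrast is one common way to summarize the cost, not the only one.

```python
# Partial-repetition cost as an interaction contrast over the four cell means.
# Cell means (ms) are those reported for the identity x response binding in Experiment 1.

cells = {
    ('repeated',   'repeated'):   790,   # identity repeated,   response repeated
    ('alternated', 'alternated'): 782,   # identity alternated, response alternated
    ('repeated',   'alternated'): 830,   # identity repeated,   response alternated
    ('alternated', 'repeated'):   839,   # identity alternated, response repeated
}

complete = [cells[('repeated', 'repeated')], cells[('alternated', 'alternated')]]
partial  = [cells[('repeated', 'alternated')], cells[('alternated', 'repeated')]]

cost = sum(partial) / len(partial) - sum(complete) / len(complete)
print(f"partial-repetition cost = {cost:.1f} ms")   # 48.5 ms
```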

A third type of result is due to the repetition or alternation of feature–response combinations. The repetition or alternation of a specific conjunction of stimulus and response features in S1–R1 (e.g., Jim + left key in S1) may facilitate performance if the conjunction is completely repeated in S2–R2 (e.g., Jim + left key in S2) or completely alternated (e.g., David + right key in S2), relative to a condition in which only one of the features is repeated and the other is alternated (e.g., Jim + right key in S2). Partial-repetition costs with response–stimulus features indicate the formation of an “event file” (Hommel, 1998, 2004). In the theoretical context of the present study, this type of effect may speak to the integration of response codes with facial attributes.

Experiment 1

Faces in Experiment 1 varied on four dimensions: identity, emotion, spatial location, and response location. The relevant dimension for response was facial identity. A central goal of the experiment was to examine whether facial identity plays a crucial role in the formation of “face files”. Mitroff, Scholl, and Noles (2007) have shown that the response to facial identity was speeded if the identity reappeared in a previously presented object, irrespective of the object’s location. The results of Mitroff et al. (2007) suggest the involvement of episodic tokens in the formation of “object files”. It is highly likely that facial identity is an important feature in the formation of “face files”, allowing a coherent representation when a face undergoes spatiotemporal discontinuities. However, the Mitroff et al. study was not designed to probe the binding of identity with other facial attributes of perceptual and conceptual variability (e.g., emotion).

Spatial location is another feature that might be operative in the formation of “face files”, serving the visual system as an anchor or pointer toward the perceived face. This is a plausible idea, since spatial tagging mechanisms, such as inhibition of return (IOR; Posner & Cohen, 1984), have been shown to affect the detection of faces (Tipper, Weaver, Jerreat, & Burak, 1994). If this hypothesis is correct, partial-repetition costs are expected with spatial location. Another prediction follows from Hommel’s work (1998, 2004) on the binding of visual and action codes. Hommel found that the task-relevant feature is highly likely to be bound with the response code. It is therefore predicted that facial identity, which serves here as the relevant feature, will be integrated with the response code. Finally, Kahneman and Treisman (1984; see also Kahneman et al., 1992) have argued that the creation of “object files” is exhaustive, in the sense that it requires the binding of all constituent features. If such an exhaustive process occurs with faces, partial-repetition costs involving all four dimensions are expected. This would be indicated by a four-way identity × emotion × location × response interaction.

Method

Participants

Twenty volunteers from Ariel University took part in this experiment. These were young male and female undergraduate students (aged 20–28) who participated in partial fulfillment of course credit. All reported normal or corrected-to-normal vision, normal hearing, and unencumbered use of their two hands.

Apparatus and stimuli

The experiment was controlled by a desktop computer. Viewing distance was 76 cm from the computer screen. The stimuli consisted of three 3.16° × 2.7° black square outlines arranged vertically from the top to the bottom of the screen (see Fig. 1). Four facial identities were deployed, consisting of two female and two male faces. Two separate sets of faces were constructed for the male and female faces (see Fig. 2). Each set of images was created by crossing two unfamiliar facial identities (person A and person B) with two facial expressions (sad and angry). The face images were downloaded with permission from the Karolinska Directed Emotional Faces (KDEF) database (Lundqvist, Flykt, & Ohman, 1998). The images were altered with the free GIMP software. Each face image subtended 1.88° × 2.33°. The faces were equated for size, brightness, and overall shape. The face stimuli were presented as gray-scale images over a gray or black frame (see Fig. 2).

Fig. 2 a Four male face images and b four female face images used in Experiment 1. Each set of faces was created by crossing two levels of unfamiliar facial identity (A and B) with two levels of facial emotion (angry and sad)

Each face could appear either in the upper box or in the lower box (see Fig. 1). A middle box, at the center of the screen, was used for presenting the cue for response (R1). Response cues were solid black arrows pointing to the right, to the left, or in both directions (when no R1 response was needed). Responses were made by pressing the left (“z”) or right (“m”) key on a QWERTY keyboard.

Procedure and design

The procedure and design were similar to those reported by Hommel (1998). Each experimental trial started with an arrow cue presented for 1500 ms. Participants withheld their response (R1) to the first stimulus (S1) if the arrow was bidirectional. Participants made a response (R1) to S1 according to the cue if the arrow pointed in only one direction (left or right). A leftward-pointing arrow required a left-hand key response and a rightward-pointing arrow required a right-hand key response. Participants were informed that there would be no systematic relationship between S1 and R1, so that they should execute the precued response at the onset of S1 while ignoring S1’s features. The second response (R2) was always a binary-choice reaction to the second stimulus (S2). The critical stimulus dimension in S2 was facial identity. Half of the participants responded to “identity A” with a right-hand response (“m”) and to “identity B” with a left-hand response (“z”), while the other half responded with the reverse assignment. To extend the validity of the results beyond a specific identity and gender, one group of participants (n = 12) was presented with the male images (Fig. 2a), whereas the other group of participants (n = 8) was presented with the female images (Fig. 2b).

Figure 1 shows a typical sequence of events in a trial. Each trial began with an arrow cue presented for 1500 ms, followed by a blank interval of 500 ms. Then, the S1 face appeared for 500 ms and R1 was expected. S1 was then replaced by another blank interval of 500 ms, followed by S2. At this stage, R2 was expected. S2 remained on the screen for 2500 ms or until response. An inter-trial interval of 2500 ms preceded the presentation of a new response cue. A block consisted of the factorial combination of S2 identity (person A vs person B), R1 response (left vs right vs both), emotion (sad vs angry), location (top vs bottom box), and R2 response (left vs right), the possible relationships between S1 and S2 (i.e., repetition vs alternation) regarding identity, emotion, and location, and the three possible relationships between R1 and R2 (repetition, alternation, or single response). Each experimental block consisted of 192 trials. The experiment consisted of three blocks of trials. The order of trials within each block was chosen randomly by the computer. A 1-min break was allowed between blocks.
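The sketch below shows one way such a 192-trial block could be assembled, assuming that S1's identity, emotion, and location are derived from S2's values together with the desired repetition/alternation relations, and that R2 is determined by S2's identity through the response mapping; the variable names and the specific key assignment are illustrative rather than taken from the original experiment code.

```python
# Illustrative construction of one block: 2 (S2 identity) x 2 (S2 emotion) x 2 (S2 location)
# x 2 x 2 x 2 (repetition vs alternation of identity, emotion, location) x 3 (R1 cue) = 192 trials.
import itertools
import random

IDENTITIES = ['A', 'B']
EMOTIONS = ['sad', 'angry']
LOCATIONS = ['top', 'bottom']
R1_CUES = ['left', 'right', 'none']          # 'none' = bidirectional arrow, no R1 required
RESPONSE_MAP = {'A': 'right', 'B': 'left'}   # hypothetical identity-to-key mapping

def other(value, levels):
    """Return the alternative level of a binary feature."""
    return levels[0] if value == levels[1] else levels[1]

trials = []
for s2_id, s2_emo, s2_loc, id_rel, emo_rel, loc_rel, r1 in itertools.product(
        IDENTITIES, EMOTIONS, LOCATIONS, ['rep', 'alt'], ['rep', 'alt'], ['rep', 'alt'], R1_CUES):
    s1 = {'identity': s2_id if id_rel == 'rep' else other(s2_id, IDENTITIES),
          'emotion':  s2_emo if emo_rel == 'rep' else other(s2_emo, EMOTIONS),
          'location': s2_loc if loc_rel == 'rep' else other(s2_loc, LOCATIONS)}
    s2 = {'identity': s2_id, 'emotion': s2_emo, 'location': s2_loc}
    trials.append({'s1': s1, 's2': s2, 'r1_cue': r1, 'r2': RESPONSE_MAP[s2_id]})

random.shuffle(trials)    # random trial order within the block
print(len(trials))        # 192
```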

Results

Trials in which the response was incorrect, or in which RTs were longer than 1900 ms or shorter than 150 ms, were removed from the analysis. These amounted to 8.6 % of the total number of trials. Mean RTs and mean proportion of errors were calculated for each possible level of stimuli and responses in the two tasks (R1 and R2). A five-way ANOVA with stimulus set (male, female), response (repeated, alternated), emotion (repeated, alternated), identity (repeated, alternated), and location (repeated, alternated) as factors was performed on mean RTs. Because the effect of stimulus set was far from significance, the data were collapsed across this factor and entered into a four-way ANOVA. Table 1 reports those mean RTs along with the error rates (see also Table 5 in the Appendix for an exhaustive list of the ANOVA effects). A significant main effect of emotion [F(1, 19) = 12.47, MSE = 24,055, p < 0.005] revealed that repeating facial expression led to faster responses (802 ms) than alternating it (819 ms). Most importantly, the response × identity interaction [F(1, 19) = 43.43, MSE = 184,767, p < 0.00001] indicated partial-repetition costs due to bindings of the response feature with the task-relevant facial feature of identity (see Fig. 3a). Responses were faster when the identity and response features both repeated or both alternated (790 and 782 ms) than when only one of them repeated and the other alternated (830 and 839 ms). It is important to emphasize that interpreting binding effects strictly requires focusing on the interaction as such. The main effects, whether significant or not, are irrelevant to the interpretation of the binding effect. A response × location interaction [F(1, 19) = 5.41, MSE = 10,586, p < 0.05] reflected the binding of spatial location with response (see Fig. 3b). Responses were faster when the location and response features both repeated or both alternated (807 and 802 ms) than when only one of them repeated and the other alternated (811 and 821 ms). In addition to the creation of these “event files”, which reflected visuo–motor bindings, a significant identity × emotion interaction [F(1, 19) = 4.6, MSE = 11,712, p < 0.05] indicated the binding of identity and emotion (see Fig. 3c), and thus the creation of “object files”. Faster responses were recorded when facial identity and emotion both repeated or both alternated (795 and 813 ms) than when only one of them repeated and the other alternated (808 and 825 ms). Two-tailed t tests verified that the benefits and costs associated with all these pairwise bindings were significantly different from zero (all ps < 0.05).
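For readers who wish to reproduce this kind of analysis, the sketch below illustrates the trimming criteria and a four-way repeated-measures ANOVA on subject-level cell means; it assumes a long-format trial table with hypothetical column names and uses statsmodels' AnovaRM, which is only one of several ways to run such an analysis.

```python
# Illustrative RT trimming and 2x2x2x2 repeated-measures ANOVA (hypothetical column names).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# trials: one row per R2 trial with columns
# subject, rt (ms), correct (bool), response_rel, emotion_rel, identity_rel, location_rel
trials = pd.read_csv('exp1_trials.csv')          # hypothetical file

clean = trials[(trials['correct']) & (trials['rt'] >= 150) & (trials['rt'] <= 1900)]

# Subject-level mean RT in each of the 16 repetition/alternation cells
cell_means = (clean
              .groupby(['subject', 'response_rel', 'emotion_rel',
                        'identity_rel', 'location_rel'], as_index=False)['rt']
              .mean())

anova = AnovaRM(cell_means, depvar='rt', subject='subject',
                within=['response_rel', 'emotion_rel', 'identity_rel', 'location_rel']).fit()
print(anova)
```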

Table 1 Mean reaction times (RT) in ms and percentage of error (PE) for R2 in Experiment 1 for conditions of repetition and alternation in S1 and S2 and in R1 and R2
Fig. 3 Feature binding in Experiment 1. a Facial identity and response. b Spatial location of the face and response. c Facial emotion and facial identity

Discussion

Experiment 1 revealed partial-repetition costs with both facial and non-facial attributes, adducing consistent evidence for the formation and retrieval of both “object files” and “event files” with facial attributes. These episodic structures are dubbed herein “face files”. The current patterns extend those observed with color-shape objects (Hommel, 1998). They show that (a) binding can take place with subsets of features rather than the entire list of features (Kahneman et al., 1992) and (b) integration of response and stimulus features can occur with task-relevant as well as with task-irrelevant stimulus features (Hommel, 2004). The results support the hypothesis that high-level social and motor categories conveyed by faces are extracted, abstracted, and become available to perception and action. The results are commensurate with Freeman and Ambady’s (2011) interactive model, according to which social aspects of a face interact with each other as well as with motor codes.

Note that spatial location interacted with the response, but not with any of the facial features, whereas the task-relevant attribute (i.e., identity) was bound with the response feature and with the facial attribute of emotion. This might be because identity served as the task-relevant dimension. An alternative explanation is that facial identity serves as a quintessential facial dimension in the individuation of a face. According to this account, identity should be automatically bound with the response, as well as with other facial features. In addition, this should hold true even when identity is not the relevant dimension for the task at hand. A plausible hypothesis is, therefore, that it is facial identity and not spatial location that supports the retrieval of integrated face attributes. A central goal of Experiment 2 was to decide between these two hypotheses. In Experiment 2, facial emotion was made the relevant dimension for response. If the former hypothesis is correct, facial identity should not be integrated with the response. If the latter hypothesis is correct, facial identity should be integrated with the response feature, as well as with other features.

Experiment 2

Experiment 2 was identical to Experiment 1 in terms of design, procedure and stimuli, except for the fact that emotion served as the relevant dimension for response. Participants were asked to ignore the facial identity as well as other irrelevant features.

Method

Participants

A new group of eleven young volunteers from Ariel University took part in Experiment 2. These were male and female undergraduate students who participated in partial fulfillment of course credit. All reported normal or corrected-to-normal vision, normal hearing, and unencumbered use of their two hands.

Apparatus and stimuli

The apparatus and stimuli were identical to those reported in Experiment 1, except that only the female set of face stimuli was used.

Procedure and design

The procedure and design were identical to those reported in Experiment 1. The only difference between the two experiments was that in the current experiment participants responded to the facial emotion rather than to the facial identity of the target face. Participants indicated whether the face was sad or angry by pressing one of two response keys. Response assignment was counterbalanced across observers.

Results

Trials in which the response was incorrect, or in which RTs were longer than 1900 ms or shorter than 150 ms, were removed from the analysis. These amounted to 4.6 % of the total number of trials. Mean RTs and mean proportion of errors were calculated for each possible level of stimuli and responses in the two tasks (R1 and R2). Table 2 reports those mean RTs along with the error rates. A four-way ANOVA with response (repeated, alternated), emotion (repeated, alternated), identity (repeated, alternated), and location (repeated, alternated) as factors was performed on mean RTs. The full list of ANOVA effects is presented in Table 6 in the Appendix. A marginally significant main effect of spatial location [F(1, 10) = 4.37, MSE = 11,641, p = 0.06] revealed that repeating that feature led to faster responses (822 ms) than alternating it (835 ms).

Table 2 Mean reaction times (RT) in ms and percentage of error (PE) for R2 in Experiment 2 for conditions of repetition and alternation in S1 and S2 and in R1 and R2

Most importantly, the response × emotion interaction [F(1, 10) = 18.12, MSE = 50,889, p < 0.005] indicated partial-repetition costs due to bindings of the response feature with the task-relevant facial feature of emotion (see Fig. 4a). Responses were faster when the emotion and response features both repeated or both alternated (816 and 810 ms) than when only one of them repeated and the other alternated (861 and 834 ms). A response × identity interaction [F(1, 10) = 10.64, MSE = 19,912, p < 0.005] reflected the binding of facial identity with response (see Fig. 4b). Responses were faster when the identity and response features both repeated or both alternated (810 and 830 ms) than when only one of them repeated and the other alternated (841 and 841 ms). In addition to these visuo–motor bindings, which indicated the presence of “event files”, a significant identity × emotion interaction [F(1, 10) = 9.10, MSE = 17,321, p < 0.05] indicated the binding of identity and emotion (see Fig. 4c) and, therefore, the emergence of an “object file”. Faster responses were recorded when identity and emotion both repeated or both alternated (824 and 817 ms) than when only one of them repeated and the other alternated (853 and 828 ms). Two-tailed t tests verified that the benefits and costs associated with all these pairwise bindings were significantly different from zero (all ps < 0.05). The response × identity × emotion interaction was also significant [F(1, 10) = 5.28, MSE = 18,503, p < 0.05].

Fig. 4 Experiment 2. Partial-repetition costs indicating the binding of a response and facial emotion, b response and facial identity, and c facial emotion and identity

Error analysis A similar four-way ANOVA was performed on error rates. The analysis revealed a main effect of emotion [F(1, 10) = 8.84, MSE = 0.016, p < 0.05], indicating that more errors (5.0 %) were committed when emotion was repeated than when it was alternated (3.1 %). No other significant effects were found in the error analysis.

Discussion

In Experiment 2, facial emotion served as the task-relevant dimension. The results provided further evidence for the formation of “face files”. The existence of these episodic structures suggests that visual and motor attributes are abstracted, extracted, and integrated into temporary constructs in visual short-term memory. Both visuo–visuo and visuo–motor bindings were obtained with facial emotion and identity. The task-relevant feature of emotion was bound with the response, as was facial identity. Emotion and identity were also bound together. This outcome supports the hypothesis that facial identity is an essential attribute in the formation of face files, as it was automatically integrated into a “face file”, even though it was not relevant for response.

Another conclusion that can be drawn is that spatial location did not play a significant role in the binding process. The almost non-existent involvement of spatial location in binding effects in the last two experiments is quite surprising. Previous studies with colored geometric shapes have shown that spatial location is often integrated with visual and motor features (Hommel, 1998; Kahneman et al., 1992; van Dam & Hommel, 2010). Why were bindings with spatial location missing here? It should be noted that an auxiliary experiment was conducted using the same displays and methods adopted in Experiments 1–2. In this experiment, I replicated Hommel’s (1998) Experiment 1, including the exact patterns of partial-repetition costs with spatial location. Thus, the absence of bindings with spatial location in the last two experiments is not due to differences in methods. There are two possibilities that can account for the lack of such binding effects with faces. One is that spatial location is not operative in the binding of facial features. An alternative account is that spatial location may be active only when it becomes relevant for the processing of the task-relevant dimension. Testing these two hypotheses became possible in the next two experiments by introducing the dimension of eye-gaze direction.

In the next set of experiments, the facial dimension of eye-gaze direction was varied in a newly created set of face stimuli. Gaze direction is a facial attribute of considerable social import. It is extremely useful in reading other people’s attention, intentions, and actions. In recent years, this facial attribute has been under extensive scrutiny (Calder, Beaver, Winston, Dolan, Jenkins, Eger, & Henson, 2007; Engell & Haxby, 2007; Friesen & Kingstone, 1998; Frischen, Bayliss, & Tipper, 2007). One of the most intriguing discoveries is that gaze direction induces spatial attentional shifts by acting as a visual cue for location (Friesen & Kingstone, 1998). When perceivers see a leftward-looking face, their responses are faster to targets located on the left, and the same holds for rightward-looking faces and targets on the right. In Experiment 3, gaze direction served as the target dimension. Because gaze direction is coded in a spatial code, it seemed likely that it would interact with the spatial location and response features.

Experiment 3

The goal of Experiment 3 was to further test the binding process with facial and non-facial attributes. The focus of this experiment was the possibility that the spatial codes of location and response are integrated with gaze direction. A new set of faces was constructed in which the eye gaze of the face could be directed either to the left or to the right (see Fig. 5). Participants indicated the direction of the face’s eye gaze while ignoring variations in facial emotion, location, and response. Based on the spatial qualities of gaze direction (Friesen & Kingstone, 1998), it was predicted that spatial location would come to play a vital role in the formation of “face files”, such that it would be integrated with gaze direction as well as with other facial and non-facial features.

Fig. 5 Four face images used in Experiments 3 and 4. The faces were created by crossing two levels of facial emotion (anger and fear) with two levels of gaze direction (leftward and rightward)

Method

Participants

A new sample of seventeen young volunteers from Ariel University took part in Experiment 3. These were male and female undergraduate students who participated in partial fulfillment of course credit. All reported normal or corrected-to-normal vision, normal hearing, and unencumbered use of their two hands.

Apparatus and stimuli

A new set of stimuli was created for this experiment. The face images were downloaded with permission from the Karolinska Directed Emotional Faces (KDEF) database (Lundqvist et al., 1998). The images were modified using the free GIMP software to create the four images presented in Fig. 5. Two levels of gaze direction (left, right) were crossed with two levels of facial emotion (anger, fear). The same female identity from Experiment 1 (person B) was used. The emotion values were changed from those used in the previous experiments to extend the conceptual replicability of the results. The images were equated on size, brightness, and overall shape.

Procedure and design

The procedure and design were identical to those reported in Experiment 1. The task-relevant dimension for response was gaze direction. Participants pressed the right-hand key (“m”) if the gaze was averted to the right. They pressed the left-hand key (“z”) if the gaze was averted to the left. Participants were asked to ignore all the other irrelevant dimensions.

Results

Trials in which the R1 or R2 response was incorrect, or in which RTs were longer than 1900 ms or shorter than 150 ms, were removed from the analysis. These amounted to 6.9 % of the total number of trials. Mean RTs and mean proportion of errors were calculated as a function of the possible relationships between the stimuli and the responses of the two subtasks (R1 and R2); that is, according to whether the emotion, gaze direction, or location of S1 and S2 was repeated or alternated, and whether R2 was preceded by a same, different, or no response. Table 3 presents the mean RTs and error rates in the different conditions. A four-way analysis of variance (ANOVA) with response (repeated, alternated), emotion (repeated, alternated), gaze direction (repeated, alternated), and location (repeated, alternated) as factors was performed. The full list of ANOVA effects is presented in Table 7 in the Appendix.

Table 3 Mean reaction times (RT) in ms and percentage of error (PE) for R2 in Experiment 3 for conditions of repetition and alternation in S1 and S2 and in R1 and R2

Reaction time A main effect of emotion [F(1, 16) = 5.38, MSE = 4954, p < 0.05] revealed that responses were faster when emotion repeated (731 ms) than when emotion alternated (739 ms). Most importantly, various two-way interactions signaled the presence of visuo–motor and visuo–visuo bindings (see Fig. 6a–f). First, a significant response × gaze interaction [F(1, 16) = 7.08, MSE = 16,394, p < 0.05] indicated that response and the task-relevant feature of gaze direction were bound together. RTs were faster when these features both repeated or both alternated (740 and 715 ms) than when one repeated and the other alternated (743 and 742 ms). A response × location interaction [F(1, 16) = 13.17, MSE = 11,800, p < 0.005] revealed the binding of response with location. RTs were faster when the response and location features both repeated or both alternated (731 and 726 ms) than when one repeated and the other alternated (732 and 751 ms). Response was also integrated with emotion [F(1, 16) = 7.61, MSE = 10,312, p < 0.05], as indicated by faster RTs when the two features repeated or alternated together (731 and 727 ms) than when one of them repeated and the other alternated (731 and 752 ms).

Fig. 6 Experiment 3. Partial-repetition costs indicating the binding of a gaze direction and response, b spatial location and response, c emotion and response, d emotion and location, e gaze direction and emotion, and f gaze direction and location

Evidence for visuo–visuo bindings was indicated by the significant gaze × emotion interaction [F(1, 16) = 9.923, MSE = 10,451, p < 0.01]. Faster RTs were recorded when both features repeated or alternated (731 and 726 ms) than when only one of them repeated and the other alternated (730 and 752 ms). A marginally significant location × emotion interaction [F(1, 16) = 4.45, MSE = 6449, p = 0.05] was found (723 and 738 ms in mutual repetitions or alternations vs 739 and 741 ms in the contrasting case), and a marginally significant location × gaze interaction [F(1, 16) = 4.31, MSE = 8464, p = 0.054] was recorded (733 and 726 ms in mutual repetitions or alternations vs 731 and 751 ms in the contrasting case). These two interactions pointed to the binding of location with emotion and of location with gaze. t tests revealed that most of the partial-repetition effects were significant.

Error analysis A similar four-way ANOVA was performed on the error rates. No significant effects were observed.

Discussion

In Experiment 3, gaze direction served as the relevant dimension for response. Participants were instructed to ignore variations in facial emotion, response, and spatial location. A number of pairwise bindings were detected across motor and visual face attributes. In addition to the expected binding of the task-relevant dimension of gaze direction with the motor response feature, gaze was integrated with spatial location and emotion. It seems that including the dimension of gaze direction in the stimulus set, and rendering it the relevant face attribute, activated the spatial location code. This, in turn, led to the binding of spatial location with other features. Such an outcome supports the hypothesis that the absence of binding effects with spatial location in Experiments 1 and 2 is not due to some unique characteristic of faces, but rather to the facial features used. The findings of genuine bindings with spatial location in Experiment 3 are commensurate with our initial hypothesis that gaze direction is coded in some sort of a spatial code that is shared with the spatial location code. These results support Hommel’s (1998) conjecture that binding is more likely across features that share common codes. The binding of emotion with gaze direction is consistent with recent studies showing interactions between the two dimensions (Adams, Gordon, Baird, Ambady, & Kleck, 2003; Adams & Kleck, 2003, 2005).

It is interesting to note the involvement of spatial location in bindings with gaze direction, as well as with emotion and response features. A plausible explanation for this might be the relevance of gaze direction for task completion in this experiment. As mentioned earlier, gaze direction is known to induce reflexive shifts of spatial attention (Friesen & Kingstone, 1998). Gaze direction is also instrumental in providing the observer with cues for emotion (Adams et al., 2003; Adams & Kleck, 2003, 2005). This might account for the observed bindings of location with gaze direction and facial emotion. If this explanation is correct, one would expect to find a reduction or even a complete abolishment of the involvement of spatial location in binding when gaze direction stops serving as a relevant dimension for response. This prediction was tested in Experiment 4 by turning gaze direction into an irrelevant (though present) dimension of the stimulus set.

Experiment 4

Experiment 4 was identical to Experiment 3 in terms of procedure, design, and stimuli. The only difference was that facial emotion was made the relevant dimension for response instead of gaze direction.

Method

Participants

A new sample of seventeen young volunteers from Ariel University took part in Experiment 4. Participants were young male and female undergraduate students who participated in partial fulfillment of course credit. None of them participated in the previous experiments. All of them reported normal or corrected-to-normal vision and unencumbered use of their two hands.

Apparatus and stimuli

Apparatus and stimuli were identical to those reported in Experiment 3.

Procedure and design

Procedure and design were identical to those reported in Experiment 3. The only difference was that facial emotion served as the relevant dimension for response in this experiment. Response assignment was counterbalanced across participants.

Results

Trials in which the R1 or R2 response was incorrect, or in which RTs were longer than 1900 ms or shorter than 150 ms, were removed from the analysis. These amounted to 9.8 % of the total number of trials. Table 4 gives the mean RTs and error rates in the various conditions. These means were entered into a four-way analysis of variance (ANOVA) with response (repeated, alternated), emotion (repeated, alternated), gaze direction (repeated, alternated), and location (repeated, alternated) as factors. The full list of ANOVA effects appears in Table 8 in the Appendix. A main effect of emotion repetition [F(1, 16) = 10.05, MSE = 65,877, p < 0.005] was recorded, suggesting that repeating emotion led to slower responses (843 ms) than alternating it (811 ms). Most importantly, evidence for the existence of “face files” was indicated by a highly significant response × emotion interaction [F(1, 16) = 33.76, MSE = 140,273, p < 0.000001] and a significant response × gaze-direction interaction [F(1, 16) = 6.50, MSE = 54,862, p < 0.05] (see Fig. 7a, b). The response × emotion interaction indicated partial-repetition costs due to bindings of the response feature with the task-relevant facial feature of emotion. Responses were faster when the emotion and response features both repeated or both alternated (825 and 783 ms) than when only one of them repeated and the other alternated (840 and 860 ms). Paired comparisons confirmed that the costs and benefits associated with response and emotion were significantly greater than zero (all ps < 0.05).

Table 4 Mean reaction times (RT) in ms and percentage of error (PE) for R2 in Experiment 4 for conditions of repetition and alternation in S1 and S2 and in R1 and R2
Fig. 7 Experiment 4. a Partial-repetition costs indicating the binding of the response feature and facial emotion. b Partial-repetition costs indicating the binding of the response feature and gaze direction

The response × gaze-direction interaction reflected partial-repetition costs due to the binding of the response feature (left, right) with the task-irrelevant facial feature of gaze direction (left, right). Responses were faster when the gaze direction and response features both repeated or both alternated (814 and 811 ms) than when only one of them repeated and the other alternated (832 and 850 ms). Paired comparisons confirmed that the costs and benefits associated with the response and gaze-direction features were significantly greater than zero (all ps < 0.05). This result suggests that although it was not relevant for the task at hand, gaze direction was integrated into a “face file”. No other effects reached significance.

Error analysis A similar four-way ANOVA was performed on error rates. The analysis revealed a main effect of response [F(1, 16) = 4.73, MSE = 0.047, p < 0.05], indicating that fewer errors were committed (2.9 %) when response was repeated than when response was alternated (5.5 %). A two-way response × emotion interaction [F(1, 16) = 30.32, MSE = 0.085, p < 0.0001] signaled higher error rates (4.6 and 7.4 %) when response and emotion both repeated or both alternated than when one of them repeated and the other alternated (3.7 and 1.2 %). This result looks like a mirror image of the RT result, and thus might suggest a speed–accuracy tradeoff strategy.

Discussion

In Experiment 4, a single facial identity varied on gaze direction, facial emotion, and spatial location. Facial emotion served as the target dimension. The results attested once more to the primacy of the task-relevant dimension (i.e., emotion) in binding with the response feature. The irrelevant dimension of gaze direction was also integrated with the response. Commensurate with our initial hypothesis, spatial location was not involved in any of the bindings. It seems that once gaze direction becomes an irrelevant dimension for response, its close associate and code-sharing dimension—spatial location—turns into a dormant feature. Such an account may be attributed to a reduction in the amount of attention allocated to gaze direction and, consequently, to weaker activation spreading to the spatial location codes.

General discussion

A series of four experiments provided substantial evidence for the formation and retrieval of transient memory structures with face attributes. In Experiment 1, participants responded to the identity of two unfamiliar faces varying on facial emotion and spatial location. Partial-repetition costs (Hommel, 1998) indicated the bindings of identity with the response feature, location with response, and identity with emotion. In Experiment 2, facial emotion was made the relevant dimension for response. Similar pairwise bindings were recorded. In Experiment 3, a single identity varied on gaze direction, emotion, and spatial location, with gaze direction serving as the relevant feature for response. Several visuo–motor and visuo–visuo bindings were documented, including pairwise conjunctions with spatial location. In Experiment 4, facial emotion served as the target feature. The task-relevant dimension of emotion and the task-irrelevant dimension of gaze direction were both integrated with the response feature.

Taken collectively, the results from the four experiments converged on the conclusion that social and physical attributes of faces are integrated into temporary memory structures of pairwise visuo–visuo and visuo–motor bindings. The current empirical patterns extend previous findings with the low-level features of color, shape, and location (Hommel, 1998). Here, it has been shown that binding can take place with attributes of higher representational complexity than the routine colored shapes (Hommel, 1998, 2004). At the neuronal level, the integration of facial attributes with each other, as well as with motor features, requires the engagement of a broader network of neuronal substrates (Haxby et al., 2002; Sugase et al., 1999). This is implied by the logic that correct and efficient binding of facial attributes necessitates the activation of long-term representations as well as prior knowledge of social categories (Freeman & Ambady, 2011).

The present study is not the first to incorporate face stimuli in a binding task. Mitroff et al. (2007) presented evidence for the integration of faces with objects, showing that the response to facial identity is enhanced when the face reappears within the boundaries of the same object. Keizer et al. (2008) demonstrated spontaneous integration between blended images of faces and houses. The present study extends these results in several important ways. First, the binding unit in the Mitroff et al. (2007) and Keizer et al. (2008) studies is the whole face itself, whereas in the present study, it is the face attribute (e.g., identity and emotion). The current study demonstrates that binding can occur at different points along a spectrum that has, at one end, the whole face as the unit of integration and, at the other end, the face attribute (e.g., identity) as the unit of integration. Second, the classification of face images is more difficult than that of house images. Faces share many overlapping low-level features and thus belong to the same basic-level category (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). Consequently, within-face discriminations should rely more heavily on the activation of long-term nodes.

A closer look at the contents of “face files” in the current study reveals that they all consisted of pairwise bindings of task-relevant and response features. This type of integration was evident in each and every one of the experiments. Other visuo–motor bindings with task-irrelevant features were also prevalent, depending on whether those features shared a representational code (e.g., a spatial code in the case of gaze direction and response). These findings are commensurate with Hommel’s claims concerning the likelihood that a feature will be integrated into an “event file” given its role in a particular task (Hommel, 2004). According to this idea, task-relevant feature dimensions are “intentionally weighted” (Hommel, Memelink, Zmigrod, & Colzato, 2014) and thus have better chances of being integrated. This can also explain why spatial location was activated when gaze direction became the task-relevant dimension, but remained dormant when gaze direction was no longer relevant for task completion. The current results are also consistent with an embodied cognition view of face recognition. According to this approach, faces are embodied entities (Spivey & Dale, 2004) that, in addition to conveying perceptual information, also activate a rich network of action programs in the viewer, depending on context.

One issue that deserves comment concerns the possibility that the binding effects observed here capture some type of configural learning rather than the genuine integration of complex facial attributes. According to this argument, the small number of face stimuli used may have encouraged participants to respond to a learned configuration of the low-level features rather than to the criterial abstracted facial attribute. Several lines of evidence militate against such a possibility. First, had there been any configuration effects, the outcome pattern should have been different from the one observed. If people responded to the configuration, the 2 × 2 × 2 stimulus-feature design should have partitioned into two conditions: one in which S2 is the exact replica of S1, and one in which it is not. This should have resulted in a four-way interaction, in which only exact repetitions speed up response repetitions and slow down response alternations, while all seven other conditions do the opposite. Since the current findings do not reflect such a case, they provide positive evidence against configurational processing. Another source of evidence against the possibility of configurational processing comes from studies demonstrating that learning does not interact with binding effects (Colzato, Raffone, & Hommel, 2006; Hommel & Colzato, 2009). Colzato et al. (2006) examined how color-shape binding is affected by conjunction probabilities and learning. They found that the effects of binding and learning were independent of each other. There is good reason to believe that this is also the case here.
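To spell this argument out, the sketch below enumerates the eight S1–S2 relation cells of the 2 × 2 × 2 stimulus-feature design and marks the single cell in which S2 is an exact replica of S1; under a purely configural account, only that cell should behave differently from the remaining seven, which is the pattern the hypothetical four-way interaction would reflect.

```python
# The 2 x 2 x 2 stimulus-feature design partitions into one exact-replica cell
# and seven cells in which S2 differs from S1 on at least one feature.
import itertools

cells = list(itertools.product(['rep', 'alt'], repeat=3))   # identity, emotion, location relations
exact_replica = [c for c in cells if all(rel == 'rep' for rel in c)]
others = [c for c in cells if c not in exact_replica]

print(exact_replica)    # [('rep', 'rep', 'rep')]  -> S2 is the exact replica of S1
print(len(others))      # 7 -> remaining cells, which a configural account treats alike
```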

Binding of facial features and “person construal”

The results of the present investigation accord well with recent social cognition studies. In particular, the results fit nicely with research on “person construal”. This burgeoning area of study investigates the low-level perceptual mechanisms that generate social phenomena (Fiske & Neuberg, 1990; Freeman & Ambady, 2011; Macrae et al., 1995). Studies in this domain have shown that when people categorize faces on a predefined category (e.g., gender), other social categories (e.g., age and race) are activated (Cloutier, Freeman, & Ambady, 2014; Freeman, 2014). To account for these findings, a dynamic model has recently been proposed by Freeman and Ambady (2011). The model postulates interactive and simultaneous influences of bottom–up face processing, which activates all possible category representations (e.g., male, female, White, Black), and top–down information sources (e.g., attentional states due to task demands). This mode of processing implies that all perceptual, cognitive, and motor attributes associated with the face are processed in parallel (Freeman et al., 2010, 2011). The model also predicts that the most important facial categories, those that are most relevant, or those that were recently active will be activated more strongly.

A limitation of Freeman and Ambady’s (2011) model is that it is not clear whether—beyond the interactive influences postulated—facial features are bound together in short-term memory, and if so, how. A recent study by Martin, Swainson, Slessor, Hutchison, Marosi, and Cunningham (2015) has yielded results that can speak directly to this point. When participants categorized faces on sex (i.e., man vs woman), repetitions and alternations of the previous trial’s irrelevant face category (i.e., race and age) affected performance. Responses were faster when the relevant feature and the irrelevant feature repeated or alternated together than when one of the features alternated and the other repeated. Martin et al.’s (2015) intention was not to study integration across social face attributes, although the partial-repetition costs they documented seem to support the notion of binding. These researchers did not couch their results in terms of feature binding, and their paradigm does not permit a clear dissociation between the visual and motor components of bindings. This has become possible with the “event file” paradigm (Hommel, 1998) deployed in the current study. Future work should seek to reveal the mechanisms that govern the binding of action and perception of facial attributes.

Binding of facial features and the dual-route model

Haxby and his colleagues (Haxby et al., 2000, 2002; see also Bruce & Young, 1986) have proposed a dual-route model of face recognition. In this model, the representations of facial dimensions (e.g., sex, expression, identity, and gaze direction) are distributed along two separate routes: one route is dedicated to the processing of invariant facial dimensions (e.g., facial identity), while the other route is responsible for the processing of variant facial dimensions (e.g., emotion and eye gaze). The model predicts the emergence of perceptual interactions between any two invariant (e.g., identity and gender) or any two variant (e.g., gaze and emotion) facial attributes. In contrast, the dual-route model predicts independence between a variant and an invariant feature (e.g., identity and emotion). A vast literature has been dedicated to testing predictions of the dual-route model (Bartlett, Searcy, & Abdi, 2003; Calder et al., 2007; Le Gal & Bruce, 2002; Fitousi & Wenger, 2013; Soto, Vucovich, Musgrave, & Ashby, 2014). The bulk of the evidence accumulated to date has been generally in agreement with the dual-route model (Haxby et al., 2000, 2002). Recently, however, several studies have yielded results that are inconsistent with the dual-route model. In particular, violations of independence between facial identity and emotion have been reported (Fitousi & Wenger, 2013; Soto et al., 2014; Yankouskaya, Booth, & Humphreys, 2012).

The results of the current investigation cannot be fully accommodated by the dual-route model; some of them are in agreement with the model, but some are much less so (see also Calder & Young, 2005). As predicted by the dual-route model, two variant features, gaze direction and facial emotion, did interact in the binding process (see also Adams et al., 2003; Adams & Kleck, 2003, 2005; Graham & LaBar, 2007; Hietanen & Leppänen, 2003), but this held true only when gaze direction was the task-relevant dimension. In contrast to the prediction of the dual-route model, it was found here that the variant and invariant facial attributes of emotion and identity do interact in the binding process. These results suggest that binding can take place within and across the two allegedly independent routes and that the binding process is not symmetric. These patterns cannot be fully accommodated by the dual-route model. Another limitation of the dual-route model (Haxby et al., 2000, 2002) is that it does not address the possible interactions between facial features and action codes. The current “face file” approach can guide future theorizing on the dual-route model in this respect. Future work should study in more detail the formation and function of “face files”.