What’s in a “face file”? Feature binding with facial identity, emotion, and gaze direction

Fitousi, Daniel

doi:10.1007/s00426-016-0783-0


Original Article
Published: 17 June 2016

Volume 81, pages 777–794, (2017)

Psychological Research


Daniel Fitousi¹


Abstract

A series of four experiments investigated the binding of facial (i.e., facial identity, emotion, and gaze direction) and non-facial (i.e., spatial location and response location) attributes. Evidence for the creation and retrieval of temporary memory face structures across perception and action has been adduced. These episodic structures, dubbed herein “face files,” consisted of both visuo–visuo and visuo–motor bindings. Feature binding was indicated by partial-repetition costs. That is, repeating a combination of facial features or alternating them altogether led to faster responses than repeating only one of the features while alternating the other. Taken together, the results indicate that: (a) “face files” affect both action and perception mechanisms, (b) binding can take place with facial dimensions and is not restricted to low-level features (Hommel, Visual Cognition 5:183–216, 1998), and (c) the binding of facial and non-facial attributes is facilitated if the dimensions share common spatial or motor codes. The theoretical contributions of these results to “person construal” theories (Freeman, & Ambady, Psychological Science, 20(10), 1183–1188, 2011), as well as to face recognition models (Haxby, Hoffman, & Gobbini, Biological Psychiatry, 51(1), 59–67, 2000) are discussed.


Introduction

Faces are multidimensional visual stimuli that are capable of transmitting a great deal of information regarding a host of physical and social attributes. These attributes include (but are not limited to) the identity, sex, emotional expression, or gaze direction of the face. A fundamental question in the study of faces concerns the manner by which facial attributes are integrated into a unified phenomenal experience (Bruce, & Young, 1986; Haxby, Hoffman, & Gobbini, 2000; Fitousi, 2013; Fitousi, & Wenger, 2013; Young, & Yamane, 1992). Whereas extensive research has been conducted on the binding of simple features, such as color, shape, and spatial location (Treisman, 1996; Hommel, 1998), less effort has been invested in studying the binding of more complex dimensions, such as facial features. The present study sought to fill in this gap. It addressed the question of whether facial features (i.e., identity, emotion, and gaze direction), as well as non-facial features (i.e., spatial location and response location) are integrated in and across perception and action.

A novel hypothesis advanced in the present study postulates that people create, maintain, and retrieve transient memory structures of facial features. The quest for such “face files” in the present study has been inspired by the notions of “object files” (Kahneman, Treisman, & Gibbs, 1992), and “event files” (Hommel, 1998, 2004). These notions have been instrumental in the study of objects and attention (Gordon, & Irwin, 1996; Henderson, 1994; Hommel, 2005), but they have rarely been applied to faces. Harnessing concepts and methodologies from these literatures, the current investigation yielded consistent evidence for the existence of “face files,” that is, transient memories of facial and non-facial feature bindings. The results bear important implications for current theories of face and object recognition (Haxby et al., 2000), person construal (Freeman, & Ambady, 2011), and feature binding (Hommel, 1998; Treisman, 1996).

Faces and the binding problem

The primate brain codes the dimensions of perceptual objects in a distributed manner (Hubel, & Wiesel, 1977; Felleman, & Van Essen, 1991). In this process, elementary features, such as color, shape, and location, are represented in different feature maps in the visual cortex (Livingstone, & Hubel, 1987, 1988). A major challenge facing our perceptual systems is that of recombining the separate features into veridical representations of the viewed objects (Treisman, & Gelade, 1980). To accomplish this task, the primate brain must coordinate information from several independent and often temporally discordant sources. This formidable computational challenge has often been dubbed the binding problem (Singer, & Gray, 1995; Treisman, 1996; von der Malsburg, 1999). One notable example of a binding problem in perception is the finding of “illusory conjunctions” with color and shape (Treisman, & Schmidt, 1982).

Very much like objects, faces may pose binding problems to our visual system. This is because facial attributes are represented as separate codes in the brain. There is now ample evidence to suggest the involvement of a distributed network of brain areas that is responsible for the perception of specific facial dimensions (Haxby et al., 2002). For example, the processing of facial expression is governed by the amygdala (Breiter et al., 1996), whereas the processing of facial identity is carried out mainly in the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997) and the superior temporal sulcus (STS; Haxby et al., 2000). Moreover, recordings in temporal cortex of nonhuman primates (Rolls, & Tovee, 1995; Sugase, Yamane, Ueno, & Kawano, 1999) support the existence of neuronal activity that is distributed across many neurons (Rogers, & McClelland, 2004; Spivey, & Dale, 2004).

Given the involvement of a highly distributed network in processing facial attributes, an acute binding problem may arise. Consider a situation in which you are presented with two facial identities with each conveying a different facial emotion (Jim happy, Dan sad). Your visual system must ensure that each identity is integrated with the correct emotion (Jim + happy, and Dan + sad). This is not a trivial task. Binding problems with faces may be even more difficult than with elementary low-level features. This is because faces, in addition to carrying invariant attributes (e.g., identity and gender), transmit a great deal of dynamic information, such as eye-gaze and emotional expressions. These attributes frequently change their physical appearance as well as their semantic meaning and thus require greater effort in maintaining accurate bindings.

A concrete example may be instructive here. Imagine you are standing in a crowded airport terminal, expecting your uncle to show up. You suddenly detect someone who is smiling at you. Then, you note that this “stranger” is approaching you. Finally, you understand that the man who is weeping on your shoulder is your uncle. Your uncle’s face went through many feature changes in the course of a relatively short period of time. Still, you succeeded in maintaining a single coherent representation. How can this be accomplished? It is likely that some sort of binding mechanism has been operative. Evidence for such a binding mechanism comes primarily from situations in which binding fails. In the well-known McGurk effect (McGurk, & MacDonald, 1976), the vocal sound produced by a face is erroneously integrated with the lip movements, such that the perceiver hears a different phoneme than that articulated.

Facial attributes and “person construal”

Cognitive psychologists have invested much effort in studying the perceptual mechanisms that govern face processing (Bruce, & Young, 1986; Burton, Bruce, & Johnston, 1990; Calder, & Young, 2005; Farah, Wilson, Drain, & Tanaka, 1998; Fitousi, & Wenger, 2013; Fitousi, 2015, 2016; Haxby et al., 2000). Social psychologists have also studied the implications of perceiving the faces of others. This work has come to be known as “person construal” (Fiske, & Neuberg, 1990; Freeman, & Ambady, 2011; Macrae, Bodenhausen, & Milne, 1995). Person construal research investigates the lower level^{Footnote 1} perceptual mechanisms that produce social cognitive phenomena. An influential theory by Freeman and Ambady (2011) proposes that perception of the social attributes in a face is a dynamic process that evolves over hundreds of milliseconds. In this model, perceptual processing of irrelevant social face attributes can partially activate other face attributes, including motor actions. Event-related potential (ERP) studies supported this conjecture, showing that the extraction of facial attributes (e.g., sex, race, and age) is immediately and concomitantly shared with the motor cortex (Freeman, Ambady, Midgley, & Holcomb, 2011).

Another source of support for the interactive theory of Freeman and Ambady (2011) comes from studies on response trajectories (Freeman, Pauker, Apfelbaum, & Ambady, 2010; Freeman, & Ambady, 2009). In studies of this type, participants classify faces on a predefined facial attribute (e.g., age) by moving their hand toward one of two labels on the screen. The faces also vary on an irrelevant dimension (e.g., gender). Participants’ hand trajectories are often attracted to the label carrying the name of the irrelevant facial attribute (e.g., woman), indicating its abrupt online activation. These studies support the idea that face attributes interact with other face attributes at perceptual, cognitive, or motor levels. Freeman and Ambady’s (2011) theory contributes valuable insights into the interaction of perceptual and motor aspects of face perception, but it is silent with respect to the binding mechanism that shapes the ultimate representation. What is needed is a broader theoretical framework that can shed light on the binding of facial and motor attributes. The following section proposes such a framework.

From “object files” to “event files”

A systematic analysis of feature binding with objects has been performed by Kahneman and Treisman (1984) and Kahneman et al. (1992). They used a preview task in which a letter appears in a prime display, and then the same letter or a different letter is presented in a probe display. Naming latencies for the probe letter were faster if the letter’s identity was repeated and associated with the same object/location.^{Footnote 2} Kahneman et al. (1992) called this the object-specific preview effect. According to these authors, the processing of a visual object leads to the creation of an “object file”, an episodic representation of the object’s identity and location that allows its identification in spite of spatiotemporal discontinuities.

Considerable progress in understanding “object files” has been made by Hommel (1998). He has advanced the theory in various creative ways (Hommel, 2004, 2005; Hommel, & Colzato, 2009). First, Hommel showed that priming effects can be documented even when an object’s location is not repeated, but some of its other features are (i.e., object-nonspecific repetition effects). Second, he demonstrated that “object files” may consist of a subset (i.e., binary bindings) of their features, not necessarily the entire list of features, as argued by Kahneman et al. (1992). Third, object-nonspecific repetition effects represent a processing cost, rather than a benefit (Hommel, & Colzato, 2009). In particular, repeating two given features (e.g., a red square followed by a red square) or alternating both of them (e.g., a red square followed by a blue triangle) yields performance levels that are superior to those observed in conditions in which one of the features is repeated and the other is alternated (e.g., a red square followed by a red triangle). This pattern is called partial-repetition costs (Hommel, 2004, p. 496). Fourth, Hommel introduced the concept of action codes. These are motor and response attributes that are distributed in the brain and are amenable to integration just like visual feature codes (Hommel, Müsseler, Aschersleben, & Prinz, 2001). When action codes integrate with feature codes, they create an “event file,” a mid-level representation or a pointer to a visuo–motor episodic trace (Hommel, 1998). For example, responding to a red object with your right hand may lead to the binding of the red color with the motor code associated with the right hand. Complete repetition or alternation of the features in this newly created combination would enjoy more efficient processing than partial repetitions.

The distributed coding of simple attributes, such as color, shape, and orientation, in the primate brain is well established (Livingstone, & Hubel, 1987). But are more complex attributes, such as facial dimensions, also coded in a distributed fashion? Haxby and his colleagues (Haxby et al., 2002; Hoffman, & Haxby, 2000) have presented evidence for a neural system in the human brain, composed of separate localized regions, that specializes in processing facial attributes. In this system, the ventral temporal cortex and the fusiform gyrus (Kanwisher et al., 1997) are responsible for the processing of invariant facial aspects, such as identity, whereas the superior temporal sulcus (STS) is responsible for the processing of variant attributes, such as eye gaze and emotion (Vuilleumier, Armony, Driver, & Dolan, 2001). This distributed neural model thus suggests that facial attributes are coded in separate brain areas. To date, no direct attempt has been made to study how these face codes are integrated with each other, or how they are bound with action codes (Hommel, 2000).

Overview of the present experiments

Using simple colored shapes, Hommel (1998) adduced consistent evidence for the presence of binding processes, supporting the existence of both visuo–visuo integrations (i.e., form and color, form and location, and color and location) and visuo–motor integrations (i.e., color and response location, form and response location). Hommel’s (1998) methodology and results provide strong evidence for the existence of “object files” and “event files” with low-level features. The present study tested the hypothesis that similar “object files” and “event files” exist for face attributes. A study by Keizer, Colzato, and Hommel (2008) documented integrations of faces with houses, motion, and manual response. The present study departs from the Keizer et al. study in an important way. In that study, the whole face served as the elementary unit of integration, whereas here, facial attributes (e.g., eye gaze and expression) are the integration units, and the main question of interest concerns the binding of these attributes.

Five facial and non-facial attributes were selected for testing: facial identity, emotion (i.e., expression), eye-gaze direction, the face’s spatial location, and the location of the manual response emitted toward the face. Subsets of these five attributes were tested in a series of four experiments. One reason for choosing these attributes is that they are among the most important and most studied face attributes (cf. Haxby et al., 2000, 2002). Another reason is that they encompass both variant (i.e., emotion and gaze direction) and invariant (i.e., identity) attributes (Haxby et al., 2000, 2002).

A word is in order regarding the non-facial attribute of spatial location. A priori, it seems likely that faces are individuated via their identity (John’s face). However, there is also the possibility that faces are individuated through their location in space. Interestingly, spatial location has not been considered as a consequential variable in face recognition studies, although it has been attributed a fundamental role in tagging and addressing “object files” (Kahneman, & Treisman, 1984; Kahneman et al., 1992; Wolfe, & Bennett, 1997). Hommel has documented partial-repetition costs for combinations of location and response and of location and form, but not for combinations of location and color (Hommel, 1998). It is, therefore, crucial to see whether spatial location is critical to the individuation of faces, or for the integration of facial features into an “object file” or “face file.”

The paradigm deployed throughout the present experiments is similar to that used by Hommel (1998, 2004, see also Zmigrod, de Sonneville, Colzato, Swaab, & Hommel, 2013). It is a variation on the original preview method developed by Kahneman et al. (1992). Each trial consisted of a sequence of displays, starting with a cue to response, followed by a face (S1), and replaced by a blank. The blank was then substituted by another face stimulus (S2). Response to the first face, S1, is termed R1, and response to the second face, S2, is called R2. Figure 1 shows a schematic illustration of displays and timings in the experiments. On a trial, each one of the features could be either repeated or alternated from S1 to S2. Similarly, the response feature (i.e., left- vs right-hand response) could be repeated, alternated, or neutral from R1 to R2. The neutral condition means that no response was required in R1. This condition can help decide whether repetition was beneficial or alternation was harmful for performance. The execution of R2 was performed according to the relevant dimension for response (e.g., identity) in the given experiment. The target dimension for response was varied across experiments.

In the present experiments, each facial dimension could take one of two values. Thus, facial identity could belong to either person A or person B (Experiments 1 and 2); similarly, facial emotion could take one of two possible values—sad vs angry in Experiments 1 and 2, or frightened vs angry in Experiments 3 and 4; eye-gaze direction was either averted to the left or to the right (in Experiments 3 and 4), and the spatial location of the face was either on the top or bottom of the screen (Experiments 1–4).

Three effects of major theoretical significance may emerge in this priming setup (Hommel, 1998, 2004). The first is a main effect of stimulus or response feature repetition. Perceivers may benefit from the repetition of the facial identity of S1 (e.g., Jim) in S2 (e.g., Jim), or from the repetition of the R1 response (e.g., right-hand key) as R2 (e.g., right-hand key). In that case, perceivers may respond faster to the probe in the identity-repeated condition than in the identity-alternated condition (Burton, Kelly, & Bruce, 1998; Ellis, Young, Flude, & Hay, 1987). This type of effect does not imply integration of features, but it indicates feature priming in short-term memory.

A second type of effect is called partial-repetition costs (Hommel, 2004) and is due to the repetition or alternation of combinations of features from S1 to S2. To better understand how this effect is measured, consider the following three types of trials: (1) complete repetitions are trials in which the two features of the stimulus in S1 (e.g., Jim + happy) are repeated in S2 (e.g., Jim + happy), (2) complete alternations are trials in which the two features in S1 (e.g., Jim + happy) are replaced by two different features in S2 (e.g., David + sad), and (3) partial repetitions are trials in which one of the features in S1 is repeated in S2, whereas the other feature is alternated (e.g., Jim + happy in S1 and David + happy in S2). Partial-repetition costs (Hommel, 2004) are recorded when performance in the partial-repetition trials is worse than that in the complete repetition or complete alternation trials. The presence of such costs entails the formation of an “object file” consisting of a pairwise binding trace of the two pertinent features (Hommel, 1998).
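The three trial types can be made concrete with a short sketch. The `Face` tuple and the `classify_transition` function are hypothetical names introduced here for illustration only; they are not part of the study's materials, and the two-feature case (identity and emotion) is assumed.

```python
from typing import NamedTuple


class Face(NamedTuple):
    """A face described by two binary features (illustrative only)."""
    identity: str
    emotion: str


def classify_transition(s1: Face, s2: Face) -> str:
    """Label an S1 -> S2 transition by how many of the two features repeat."""
    identity_repeated = s1.identity == s2.identity
    emotion_repeated = s1.emotion == s2.emotion
    if identity_repeated and emotion_repeated:
        return "complete repetition"
    if not identity_repeated and not emotion_repeated:
        return "complete alternation"
    # Exactly one feature repeats while the other alternates.
    return "partial repetition"


prime = Face("Jim", "happy")
for probe in (Face("Jim", "happy"), Face("David", "sad"), Face("David", "happy")):
    print(prime, "->", probe, ":", classify_transition(prime, probe))
```

Under this labeling, partial-repetition costs correspond to slower or less accurate responses on the "partial repetition" trials than on the other two types.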

A third type of result is due to the repetition or alternation of feature–response combinations. Responding to a specific stimulus–response conjunction in S1–R1 (e.g., Jim + left key) may be facilitated if that conjunction is completely repeated in S2–R2 (e.g., Jim + left key) or completely alternated (e.g., David + right key), relative to a condition where only one of the features is repeated and the other is alternated (e.g., Jim + right key in S2–R2). Partial-repetition costs with response–stimulus features indicate the formation of an “event file” (Hommel, 1998, 2004). In the theoretical context of the present study, this type of effect may speak to the integration of response codes with facial attributes.

Experiment 1

Faces in Experiment 1 varied on four dimensions: identity, emotion, spatial location, and response location. The relevant dimension for response was facial identity. A central goal of the experiment was to examine whether facial identity plays a crucial role in the formation of “face files”. Mitroff, Scholl, and Noles (2007) have shown that the response to facial identity was speeded if identity reappeared in a previously presented object irrespective of the object’s location. The results of Mitroff et al. (2007) suggest the involvement of episodic tokens in the formation of “object files”. It is highly likely that facial identity is an important feature in the formation of “face files”, allowing a coherent representation when a face undergoes spatiotemporal discontinuities. However, the Mitroff et al. study was not designed to probe identity binding with other facial attributes of perceptual and conceptual variability (e.g., emotion).

Spatial location is another feature that might be operative in the formation of “face files”, serving the visual system as an anchor or pointer toward the perceived face. This is a plausible idea, since spatial tagging mechanisms, such as inhibition of return (i.e., IOR, Posner, & Cohen, 1984), have been shown to affect the detection of faces (Tipper, Weaver, Jerreat, & Burak, 1994). If this hypothesis is correct, partial-repetition costs are expected with spatial location. Another prediction follows from Hommel’s work (1998, 2004) on the binding of visual and action codes. Hommel found that the task-relevant feature is highly likely to be bound with the response code. It is therefore predicted that facial identity, which serves here as the relevant feature, will be integrated with the response code. Finally, Kahneman and Treisman (1984, see also Kahneman et al. 1992) have argued that the creation of “object files” is exhaustive, in the sense that it requires the binding of all constituent features. If such an exhaustive process occurs with faces, full-repetition costs with all four dimensions are expected. This would be indicated by a four-way interaction of identity × emotion × location × response.

Method

Participants

Twenty young volunteers from Ariel University took part in this experiment. These were young male and female undergraduate students (aged 20–28) who participated in partial fulfillment of course credit. All reported normal or corrected-to-normal vision, normal hearing, and unencumbered use of their two hands.

Apparatus and stimuli

The experiment was controlled by a desktop computer. Viewing distance was 76 cm from the computer screen. The stimuli consisted of three 3.16° × 2.7° black square outlines arranged vertically from the top to the bottom (see Fig. 1). Four facial identities were deployed. These consisted of two females and two males. Two separate sets of faces were constructed for the male and female faces (see Fig. 2). Each set of images was created by crossing two unfamiliar facial identities (person A and person B) with two facial expressions (sad and angry). The face images were downloaded with permission from the Karolinska Directed Emotional Faces (KDEF) database (Lundqvist, Flykt, & Öhman, 1998). The images were altered with the free GIMP software. Each face image subtended 1.88° × 2.33°. The faces were equated for size, brightness, and overall shape. The face stimuli were presented as gray-scale images over a gray or black frame (see Fig. 2).
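As a rough check on the reported stimulus geometry, a visual angle θ viewed from distance d corresponds to a physical extent of 2·d·tan(θ/2). The sketch below applies this standard relation to the reported face dimensions, assuming the 76 cm viewing distance held for all stimuli; the function name is illustrative and not taken from the article.

```python
import math


def visual_angle_to_size_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) subtending `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 76.0  # as reported in the apparatus description

# Reported face image dimensions in degrees of visual angle.
face_width_cm = visual_angle_to_size_cm(1.88, VIEWING_DISTANCE_CM)
face_height_cm = visual_angle_to_size_cm(2.33, VIEWING_DISTANCE_CM)

print(f"face image: {face_width_cm:.2f} cm x {face_height_cm:.2f} cm")
```

On these assumptions, each face image measured roughly 2.5 cm by 3.1 cm on the screen.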

Each face could appear either in the upper box or in the lower box (see Fig. 1). A middle box, at the center of screen, was used for presenting the cue for response (R1). Response cues were full black arrows which were pointing to the right, left, or both directions (when no response in R1 was needed). Responses were made by pressing the left (“z”) or right (“m”) keys on a QWERTY keyboard.

Procedure and design

The procedure and design were similar to those reported by Hommel (1998). Each experimental trial started with an arrow cue for 1500 ms. Participants withheld their response (R1) to the first stimulus (S1) if the arrow was bidirectional. Participants made a response (R1) to S1 according to the cue if the arrow was pointing only in one direction (left or right). A leftward pointing arrow required a left-hand-key response and a rightward pointing arrow required a right-hand-key response. Participants were informed that there would be no systematic relationship between S1 and R1, so that they should execute the precued response at the onset of S1 while ignoring the irrelevant dimensions of S1. A second response (R2) was always a binary-choice reaction to the second stimulus (S2). The critical stimulus dimension in S2 was facial identity. Half of the participants responded to “identity A” with a right-hand response (“m”) and to “identity B” with a left-hand response (“z”), while the other half responded with the reverse assignment. To extend the validity of the results beyond a particular identity and gender, one group of participants (n = 12) was presented with the male images (Fig. 2a), whereas the other group of participants (n = 8) was presented with the female images (Fig. 2b).

Figure 1 shows a typical sequence of events in a trial. Each trial began with an arrow cue presented for 1500 ms followed by a blank interval for 500 ms. Then, S1 face appeared for 500 ms and R1 was expected. S1 was then replaced by another blank interval for 500 ms followed by S2. At this stage, R2 was expected. S2 remained on the screen for 2500 ms or until response. An inter-trial interval of 2500 ms preceded the presentation of a new response cue. A block consisted of the factorial combination of S2 identity (person A vs person B), R1 response (left vs right vs both), emotion (sad vs angry), location (top vs bottom box), and R2 response (left vs right), the possible relationships between S1 and S2 (i.e., repetition vs alternation) regarding identity, emotion and location, and the three possible relationships between R1 and R2 (repetition, alternation, or single response). Each experimental block consisted of 192 trials. The experiment consisted of three blocks of trials. The order of trials in each block was chosen randomly by the computer. A 1 min break was allowed between the blocks.
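One way to read the factorial structure of a block is sketched below. This is an interpretation, not the author's script: it assumes that R2 is fully determined by S2 identity through the key mapping (so it is not crossed as an independent factor), and that the R1–R2 relationship follows from the R1 cue. The variable names are hypothetical. Under these assumptions the crossing reproduces the reported 192 trials per block.

```python
from itertools import product

# Factors crossed within a block (assumed reading of the design).
s2_identity = ("A", "B")
s2_emotion = ("sad", "angry")
s2_location = ("top", "bottom")
identity_relation = ("repeated", "alternated")   # S1 -> S2
emotion_relation = ("repeated", "alternated")    # S1 -> S2
location_relation = ("repeated", "alternated")   # S1 -> S2
r1_cue = ("left", "right", "none")               # "none" = bidirectional arrow

factor_names = (
    "s2_identity", "s2_emotion", "s2_location",
    "identity_relation", "emotion_relation", "location_relation", "r1_cue",
)

# One dictionary per trial type in the fully crossed design.
block = [
    dict(zip(factor_names, levels))
    for levels in product(
        s2_identity, s2_emotion, s2_location,
        identity_relation, emotion_relation, location_relation, r1_cue,
    )
]

trials_per_block = len(block)
trials_per_participant = 3 * trials_per_block  # three blocks were run

print(trials_per_block, trials_per_participant)
```

In the experiment the order of trials within each block was randomized; a call such as `random.shuffle(block)` would mimic that step.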

Results

Trials in which responses were incorrect, or in which RTs were longer than 1900 ms or shorter than 150 ms, were removed from the analysis. These amounted to 8.6 % of the total number of trials. Mean RTs and mean proportion of errors were calculated for each possible level of stimuli and responses in the two tasks (R1 and R2). A five-way ANOVA with stimulus set (male, female), response (repeated, alternated), emotion (repeated, alternated), identity (repeated, alternated), and location (repeated, alternated) as factors was performed on mean RTs. Because the effect of stimulus set was far from significance, the data were collapsed to a four-way ANOVA. Table 1 reports those mean RTs along with the error rates (see also Table 5 in the Appendix for an exhaustive list of the ANOVA effects). A significant main effect of emotion [F(1, 19) = 12.47, MSE = 24,055, p < 0.005] revealed that repeating facial expression led to faster responses (802 ms) than alternating it (819 ms). Most importantly, the response × identity interaction [F(1, 19) = 43.43, MSE = 184,767, p < 0.00001] indicated partial-repetition costs due to bindings of the response feature with the task-relevant facial feature of identity (see Fig. 3a). Responses were faster when both identity and response features repeated or alternated (790 and 782 ms) than when only one of them repeated and the other alternated (830 and 839 ms). It is important to emphasize that interpreting binding effects strictly requires focusing on the interaction as such. The main effects, whether significant or not, are irrelevant to the interpretation of the binding effect. A response × location interaction [F(1, 19) = 5.41, MSE = 10,586, p < 0.05] reflected the binding of spatial location with response (see Fig. 3b). Responses were faster when both location and response features repeated or alternated (807 and 802 ms) than when only one of them repeated and the other alternated (811 and 821 ms).
In addition to the creation of these “event files”, which reflected a visuo–motor binding, a significant identity × emotion interaction [F(1, 19) = 4.6, MSE = 11,712, p < 0.05] indicated the binding of identity and emotion (see Fig. 3c), and thus the creation of “object files”. Faster responses were recorded when facial identity and emotion both repeated or both alternated (795 and 813 ms) than when only one of them repeated and the other alternated (808 and 825 ms). Two-tailed t tests verified that the benefits and costs associated with all these pairwise bindings were significantly different from zero (all ps < 0.05).
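The size of each binding effect can be summarized from the cell means reported above as the mean RT on partial-repetition trials minus the mean RT on complete repetition and complete alternation trials. The sketch below is an illustrative calculation on the reported means only, not a reanalysis of the data; it assumes that both values in each reported pair belong to the same category (complete or partial), and the function name is hypothetical.

```python
from statistics import mean


def partial_repetition_cost(complete_ms, partial_ms):
    """Mean RT of partial-repetition cells minus mean RT of complete cells."""
    return mean(partial_ms) - mean(complete_ms)


# (complete cells, partial cells), in ms, as reported for Experiment 1.
reported_means = {
    "response x identity": ((790, 782), (830, 839)),
    "response x location": ((807, 802), (811, 821)),
    "identity x emotion": ((795, 813), (808, 825)),
}

costs = {
    name: partial_repetition_cost(complete, partial)
    for name, (complete, partial) in reported_means.items()
}

for name, cost in costs.items():
    print(f"{name}: {cost:.1f} ms")
```

On this summary, the response × identity binding produces by far the largest cost (about 48 ms), whereas the response × location and identity × emotion bindings yield costs of roughly 12 ms each.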

Table 1 Mean reaction times (RT) in ms and percentage of errors (PE) for R2 in Experiment 1 for conditions of repetition and alternation in S1 and S2 and in R1 and R2


Discussion

Experiment 1 revealed partial-repetition costs with both facial and non-facial attributes, adducing consistent evidence for the formation and retrieval of both “object files” and “event files” with facial attributes. These episodic structures are dubbed herein “face files”. The current patterns extend those observed with color-shape objects (Hommel, 1998). They show that: (a) binding can take place with subsets of features rather than the entire list of features (Kahneman et al., 1992) and (b) integration of response-stimulus features can occur with task-relevant as well as with task-irrelevant stimulus features (Hommel, 2004). The results support the hypothesis that high-level social and motor categories conveyed by faces are abstracted, extracted, and become available to perception and action. The results are commensurate with Freeman and Ambady’s (2011) interactive model, according to which social aspects of a face interact with each other as well as with motor codes.

Note that spatial location interacted with the response, but not with any of the facial features, whereas the task-relevant attribute (i.e., identity) was bound with the response feature and with the facial attribute of emotion. This might be because identity served as the task-relevant dimension. An alternative explanation is that facial identity serves as a quintessential facial dimension in the individuation of a face. According to this account, identity should be automatically bound with response, as well as with other facial features. In addition, this should hold true even when identity is not the relevant dimension for the task at hand. A plausible hypothesis is, therefore, that it is facial identity and not spatial location that supports the retrieval of integrated face attributes. A central goal of Experiment 2 was to decide between these two hypotheses. In Experiment 2, facial emotion was made the relevant dimension for response. If the former hypothesis is correct, it is expected that facial identity would not be integrated with response. If the latter hypothesis is correct, it is expected that facial identity would be integrated with the response feature, as well as with other features.

Experiment 2

Experiment 2 was identical to Experiment 1 in terms of design, procedure and stimuli, except for the fact that emotion served as the relevant dimension for response. Participants were asked to ignore the facial identity as well as other irrelevant features.