Faces are stimuli of vital importance in the human environment and are hence thought to be processed by a dedicated neural network. In addition, displayed emotional expressions have important implications during social interaction and may further raise the relevance of perceived faces. Although a mechanism that encodes both faces and emotion-related information at the same time may seem parsimonious, it could in fact make it difficult to recognize a person independently of his or her emotional expression. Therefore, mechanisms that account for the high proficiency in face processing might be separated from those that encode emotional meaning (Haxby et al. 2000). The present study provides evidence for this claim by showing a dissociation of face- and emotion-specific processes, suggesting that both have evolved rather independently in humans.

In event-related brain potentials (ERPs), face-selectivity is reflected by an enhancement of the so-called N170, a negative amplitude peaking around 150–200 ms over temporo-occipital sites (e.g., Bentin et al. 1996; Rossion et al. 2000). Functionally, processes reflected by the N170 have been related to stages of face-specific structural encoding (Bentin and Deouell 2000; Eimer 2000a, b; Itier and Taylor 2004; but see Thierry et al. 2011), that is, the generation of a holistic internal face representation utilized for later processes, as for instance the explicit recognition of a person (cf. Bruce and Young 1986; Eimer et al. 2011).

Brain imaging studies in humans suggest the N170 to stem from a network in posterior regions of the extrastriate visual cortex. This network comprises the inferior occipital and the middle lateral fusiform gyri—termed occipital (OFA) and fusiform face area (FFA), respectively—as well as the superior temporal sulcus in the lateral temporal cortex (Haxby et al. 2000; see also, Dalrymple et al. 2011; Deffke et al. 2007; Ishai 2008; Itier and Taylor 2004; Sadeh et al. 2010). Although the site of the N170 amplitude fits the relative location of this face-selective network, precise contributions of the different brain regions to the N170 remain controversial. Most consistently, the FFA in the fusiform gyrus has been reported to be selectively activated by faces (e.g., Kanwisher et al. 1997, 1999; Puce et al. 1996). However, according to a recent lesion study, the FFA is neither sufficient nor necessary for generating the face-selective N170 component (Dalrymple et al. 2011). In fact, the N170 may result from simultaneous activations in different brain regions, varyingly recruited depending on the concrete task at hand (Calder and Young 2005; Hoffman and Haxby 2000).

Critically, the involvement of face-unspecific, domain-general functions may coincide with face-specific processes (for review see, Palermo and Rhodes 2007; Ishai 2008). Traditionally, it has been assumed that structural encoding of a face and encoding of its changeable features, such as emotional expressions, take place independently and in parallel (Bruce and Young 1986). In line with this idea, many ERP studies found no influence of emotional expressions on N170 amplitudes; however, at variance with traditional accounts, a considerable number of studies reported the N170 to be modulated by emotional facial expressions (see Table 1), yielding higher amplitudes mainly for negative relative to neutral faces.

Table 1 Effects of emotional facial expressions on the N170 in studies using upright, naturalistic faces

To date, the reasons for these contradictory findings remain unclear. Thus, the question of whether the N170 as an index of face-specific processes is affected by emotion has not yet been conclusively answered. Importantly, looking at the opposing lines of evidence more closely, a striking methodological difference appears: The large majority of studies that found emotion effects on the N170 amplitude used the average activity across all electrodes as reference (average reference; e.g., Batty and Taylor 2003), whereas most of the studies that did not find such effects used single or paired common references located at or close to the mastoids (linked earlobes or mastoids; e.g., Ashley et al. 2004; Pourtois et al. 2005).

Although different types of references do not alter the topography of ERPs across the scalp, they may change the magnitude characteristics of ERPs at certain electrode sites. This principle is evident from theoretical considerations and has repeatedly been shown over the last four decades (e.g., Brandeis and Lehmann 1986; Koenig and Gianotti 2009; Lehmann 1977; for a recent review, Michel and Murray 2012). For the N170, this has been empirically demonstrated by Joyce and Rossion (2005): The more posterior the reference is located, that is, the closer to electrodes where the N170 is typically measured, the smaller the N170 peak amplitude at these sites. In contrast, a fronto-centro-parietal positivity, the vertex positive potential (VPP) suggested to be a polarity reversal of the posterior N170, became increasingly pronounced with a more posterior reference site. The same should apply to emotional modulations over posterior electrodes in the N170 time window (see, Junghöfer et al. 2006).

By definition, the voltage of any potential on the scalp is set to zero at the site of the reference electrode. Hence, any experimental effects will be zero at the location of the reference. So, chances are higher to observe a significant effect if the reference site is further away from the maximal effect site. If the reference happens to be located at the site of a maximal experimental effect, the effect will be pushed to a totally different site. Therefore, using a common reference electrode makes it hard to draw conclusions about the actual location of an effect, unless one has a priori knowledge about its distribution. The average reference, in contrast, is much less sensitive to prior assumptions about electrically neutral locations and thus tends to leave effects at sites where they in fact happen to be maximal (see, Koenig and Gianotti 2009).
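The arithmetic behind this argument can be illustrated with a minimal sketch. The voltages below are hypothetical values for a single time point, and the function name `rereference` is introduced here for illustration only; it is not taken from any analysis software used in this study.

```python
from statistics import mean

def rereference(voltages, ref_channels=None):
    """Re-reference the scalp voltages of one time point.

    voltages: dict mapping electrode name -> voltage in microvolts.
    ref_channels: electrodes whose mean becomes the new zero level;
    None means all electrodes (average reference).
    """
    names = list(voltages) if ref_channels is None else list(ref_channels)
    ref_value = mean(voltages[name] for name in names)
    return {name: v - ref_value for name, v in voltages.items()}

# Hypothetical voltages (microvolts) recorded against the left mastoid (A1).
recorded = {"A1": 0.0, "A2": -0.4, "Fz": 9.0, "Cz": 10.4, "P9": -3.0, "P10": -4.0}

average_ref = rereference(recorded)                             # average reference
mastoid_ref = rereference(recorded, ref_channels=["A1", "A2"])  # mean of both mastoids

# The difference between any two electrodes does not depend on the reference ...
diff_avg = average_ref["Cz"] - average_ref["P10"]
diff_mast = mastoid_ref["Cz"] - mastoid_ref["P10"]

# ... whereas the absolute amplitude at a posterior site does.
posterior_avg = average_ref["P10"]
posterior_mast = mastoid_ref["P10"]
```

In this toy example the posterior negativity at P10 is larger under average reference than under mastoid reference, while the Cz minus P10 difference is identical in both montages. The average-referenced values also sum to zero across electrodes.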

Importantly, the emotion-sensitive early posterior negativity (EPN) also arises during the time window of the N170. The EPN consists of a relatively increased negativity over temporo-occipital electrodes that starts around 150 ms after stimulus onset, is most pronounced between 250 and 300 ms, and typically emerges for emotional relative to neutral stimuli. It has been associated with increased perceptual encoding of emotional stimuli in extrastriate visual cortex and occurs spontaneously, that is, even when emotion is not relevant to the current task (for review, Schupp et al. 2006; Rellecke et al. 2012). Like the N170 peak amplitude, the EPN amplitude has been reported to be most pronounced for threat-related expressions (e.g., Schupp et al. 2004; Rellecke et al. 2011), and to occur in ERPs referred to average reference rather than to mastoid or earlobe common references (Junghöfer et al. 2006). In the latter case, emotion effects seem to occur primarily over fronto-central regions (see, Eimer and Holmes 2007), which may reflect a polarity reversal of the EPN (see, Junghöfer et al. 2006), similar to the inverse relation of the VPP and the N170.

Since the EPN will yield more negative amplitudes for emotional facial expressions over temporo-occipital regions, at least in average-referenced ERPs, it needs to be taken into consideration that it is not necessarily the N170 component itself that underlies the emotion effect. Instead, it is conceivable that the emotion effects at typical N170 electrodes may be due to superimposed EPN activity. Note that focussing only on ERPs at posterior sites of the scalp will not distinguish between the N170 and EPN, as both components appear as enhanced negativities there. Thus, to separate N170- and EPN-related activity from each other, a more promising approach is to compare the components’ overall spatial distributions across the scalp (topography) in the same time window. Differences in scalp distribution indicate that the neural generator configurations of the N170 and EPN differ in at least some respect (Lehmann and Skrandies 1980; McCarthy and Wood 1985; Skrandies 1990). This difference may consist in (partially) non-overlapping generator sources but also in different relative contributions from the same sources. None of these differences would be expected if the N170 and EPN reflected the same process; finding them would therefore support the traditional notion that face- and emotion-specific processes are carried out independently and in parallel (Bruce and Young 1986).

By comparing scalp distributions during face processing within the same time window, two previous studies have already distinguished EPN from N170 activity, showing that emotion effects on temporo-occipital amplitudes were due to the former but not the latter component (Rellecke et al. 2011; Schacht and Sommer 2009). The first aim of the present study was to replicate these findings and to show that face- and emotion-specific encoding are in fact based on independent, parallel processes as indicated by distinct scalp distributions of the EPN and N170 component, respectively. By applying different reference montages to the same data, the second aim of the present study was to assess to which degree the use of different referencing techniques (average reference vs. mastoid) alters emotion effects at typical N170 electrodes. This might offer an explanation for the contradicting reports regarding emotion effects on the posterior N170 amplitude in the literature. However, note that if topographical differences emerged for emotion effects and the N170 in the same time window, this would suggest previously reported emotional modulations of the N170 to in fact reflect superimposed EPN activity.

The first part of this investigation involved a reanalysis of data from the study by Rellecke and co-workers (2011), in which faces with angry, happy and neutral expressions had been presented in an easy and superficial face-word decision task. In that study, we had compared N170 and EPN effects based on ERP mean amplitudes averaged across 150–200 ms relative to face onset. In contrast, other studies focussed on emotion effects on the exact N170 peak amplitude rather than the averaged activity for a certain interval (see Table 1). In order to better compare our results with previous reports, the current reanalysis assessed emotion effects and corresponding topographies coinciding with the exact N170 peak amplitude over temporo-occipital electrodes.

The second experiment involved a design similar to that of the original study by Rellecke et al. (2011); however, inverted images of the same stimuli were also included. Inversion preserves all physical properties of an image while disrupting the holistic configuration of diagnostic facial features relevant for expression recognition (Bartlett and Searcy 1993), and thus interferes with emotion processing in faces (cf., Ashley et al. 2004; Eimer and Holmes 2002). Since different emotional expression categories naturally differ in physical stimulus characteristics, face inversion is useful to ascertain the validity of early emotion effects. If emotion effects occur only for upright but not inverted faces, this argues that they reflect the recognition of emotional meaning rather than physical differences between different expression types.

Experiment 1

Methods

Participants

Twenty-four participants between 18 and 35 years of age (11 female) contributed data to the experiment. They were reimbursed with course credits or payment. All participants were right-handed (according to Oldfield 1971), native German speakers with normal or corrected-to-normal vision and without any psychotropic medication or any history of psychiatric condition according to self-report.

Stimuli

Colour portraits of 50 different persons (25 female) displaying angry, happy, and neutral expressions were taken from the Karolinska database (Lundquist et al. 1998) and the NimStim Face Stimulus Set (Tottenham et al. 2009), yielding 150 pictures in all. All images were edited to a unitary format by applying a mask with ellipsoid aperture framing the stimuli within an area of 126 × 180 pixels (4.45 × 6.35 cm) and rendering only the facial area visible (see Fig. 1a). Luminance (according to Photoshop™) and contrast (SD of luminance divided by M of luminance; cf. Delplanque et al. 2007) of each image were automatically adjusted by eliminating extreme pixels using Photoshop; multivariate analyses of variance confirmed that facial expressions did not vary on these parameters (mean luminance and contrast: angry = 122 and 0.3; happy = 123 and 0.3; neutral = 123.2 and 0.3), Fs(2,147) ≤ 2.4, ps = .690 and .095, respectively. Each face stimulus was presented at the centre of the screen on a dark grey background and appeared only once during the experiment.
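The contrast measure described above (SD of luminance divided by M of luminance) can be sketched as follows. The pixel values are hypothetical, and the use of the population rather than the sample standard deviation is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev

def luminance_contrast(pixels):
    """Return (mean luminance, contrast), with contrast defined as SD / M."""
    m = mean(pixels)
    # Assumption: population SD; the source does not say which SD was used.
    return m, pstdev(pixels) / m

# Hypothetical 0-255 luminance values of a very small image patch.
patch = [90, 110, 130, 150, 120, 140]
mean_lum, contrast = luminance_contrast(patch)
```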

Fig. 1 Examples of angry, neutral, and happy facial expressions (from left to right). a Upright stimuli used in Experiment 1 and 2. b Inverted stimuli used in Experiment 2 (Color figure online)

In addition, 150 word stimuli were included (white Arial letters; mean length: 6.3 ± 1.5 letters, 3.1 ± 0.6 syllables; height: 8 mm; maximal width: 39 mm) for purposes other than the present report (see, Rellecke et al. 2011).

Procedure

The study was performed in accordance with the Declaration of Helsinki. After giving written informed consent, participants were seated at a viewing distance of 55 cm in front of a computer screen in a dimly lit, sound-attenuated room. Participants were instructed to classify each stimulus as “face” or “word” as quickly and accurately as possible by pressing one of two horizontally arranged buttons. Each trial started with a fixation cross presented for 500 ms at the centre of the screen, followed by a face or word displayed for 1,000 ms at the same location. In order to minimize ocular artifacts, participants were asked to look at the fixation cross before stimulus appearance and to avoid movements and eye blinks during a trial. Responses had to be given within an interval of 1,500 ms after stimulus onset. The next trial always started 1,500 ms after stimulus onset. A practice block of about 12 trials was conducted using stimuli not used in the experiment proper. The practice trials were repeated until performance was considered appropriate. The main experiment consisted of 300 trials (150 faces and 150 words) divided into six blocks of 50 trials each, with a break after each block. The stimulus-response mapping was balanced across participants and stimulus presentation was fully randomized. Each stimulus appeared once for each participant, yielding 50 trials in each experimental cell.

EEG Recording and Pre-Processing

The electroencephalogram (EEG) was recorded from 56 electrodes according to the extended 10–20 system (Pivik et al. 1993). Most electrodes were mounted within an electrode cap (Easy-Cap™). The left mastoid electrode (A1) was used as initial reference and AFz served as ground. Four additional external electrodes were used for monitoring the vertical and horizontal electrooculogram. Impedance was kept below 5 kΩ, using ECI electrode gel (Expressive Constructs Inc., Worcester, MA). All signals were amplified (Brain Amps) with a band pass of 0.032–70 Hz; sampling rate was 250 Hz. Offline, the continuous EEG was corrected for blinks via BESA software (Brain Electrical Source Analysis, MEGIS Software GmbH) and segmented into epochs of −200 to +1,000 ms relative to stimulus onset. These epochs were recalculated to average reference (AR) and average mastoid reference (MR; averaged across the left and right mastoid), yielding two separate sets of ERPs comprising 57 and 55 electrodes, respectively. ERPs were calculated for the edited set of raw data and referred to a 200-ms pre-stimulus baseline. Epochs containing incorrect responses were discarded, as were epochs showing amplitudes exceeding −200 or +200 μV or voltage steps larger than 100 μV per sampling point in any of the channels (artefacts).
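The two artefact criteria can be written down as a short sketch. This only illustrates the stated thresholds, not the software routine actually used; the function name `is_artifact` and the example epochs are hypothetical.

```python
def is_artifact(epoch, amp_limit=200.0, step_limit=100.0):
    """Flag an epoch if any channel violates one of the two criteria.

    epoch: list of channels, each a list of samples in microvolts.
    amp_limit: absolute amplitude threshold (microvolts).
    step_limit: maximal voltage step between adjacent samples (microvolts).
    """
    for channel in epoch:
        # Criterion 1: amplitude below -200 or above +200 microvolts.
        if any(abs(v) > amp_limit for v in channel):
            return True
        # Criterion 2: step larger than 100 microvolts per sampling point.
        if any(abs(b - a) > step_limit for a, b in zip(channel, channel[1:])):
            return True
    return False

# Hypothetical two-channel epochs.
clean = [[0.0, 10.0, -20.0, 5.0], [3.0, -4.0, 8.0, 1.0]]
too_large = [[0.0, 90.0, 180.0, 250.0], [3.0, -4.0, 8.0, 1.0]]
too_steep = [[0.0, 120.0, 50.0, 40.0], [3.0, -4.0, 8.0, 1.0]]

kept = [e for e in (clean, too_large, too_steep) if not is_artifact(e)]
```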

Data Analysis

N170 peak amplitudes were automatically detected as local minima between 100 and 200 ms at P10, P9, PO10, and PO9, separately, in both AR and MR data. These electrodes were chosen as they showed the largest N170 peak amplitudes at symmetrical electrode sites across hemispheres. Peak amplitudes were analyzed by means of repeated measures analysis of variance (ANOVA), α = .05, including the factors Reference (AR, MR) and Emotion (angry, happy, neutral); Electrode was included as an additional 4-level within-subject factor. For peak amplitudes showing significant main effects of Emotion and/or interactions between Emotion and Reference, Bonferroni-adjusted post hoc tests were performed by separate ANOVAs in AR and MR data, involving pair-wise comparisons of emotional categories.
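A minimal sketch of such a peak search is given below. It assumes that the "local minimum" is the most negative sample within the 100 to 200 ms window, and it uses the sampling rate (250 Hz) and epoch start (−200 ms) reported above. The function name `n170_peak` and the synthetic waveform are hypothetical.

```python
from math import exp

def n170_peak(erp, sfreq=250.0, epoch_start_ms=-200.0, window_ms=(100.0, 200.0)):
    """Return (amplitude, latency in ms) of the most negative sample in the window.

    erp: list of voltages (microvolts) for one electrode, one sample per
    1000 / sfreq ms, with the first sample at epoch_start_ms.
    """
    step_ms = 1000.0 / sfreq
    candidates = []
    for i, value in enumerate(erp):
        latency = epoch_start_ms + i * step_ms
        if window_ms[0] <= latency <= window_ms[1]:
            candidates.append((value, latency))
    # min() compares voltages first, so the most negative sample is selected.
    return min(candidates)

# Synthetic ERP from -200 to +1,000 ms: a negative deflection centred at 160 ms.
times = [-200.0 + 4.0 * i for i in range(301)]
erp = [-12.0 * exp(-((t - 160.0) / 25.0) ** 2) for t in times]

amplitude, latency = n170_peak(erp)
```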

For those pair-wise emotional categories that yielded significantly different peak amplitudes at N170 electrodes (P10, P9, PO10, and PO9), ERPs across the scalp coinciding with the exact time point of the average peak (across N170 electrodes for each participant separately) were analyzed by means of repeated measures ANOVA, involving the within-subject factors Emotion (2 levels) and Electrode [57 (AR) or 55 (MR) levels].

To test whether Emotion effects were distinguishable from the N170 with regard to their topography, amplitude differences were eliminated by vector scaling (McCarthy and Wood 1985). Vector scaling adjusts for effects of amplitude by dividing the voltage at each electrode by the root mean square of activity across all electrodes (i.e. global field power, GFP; Skrandies 1990) for the same time point and condition. Therefore, one can infer that any difference across electrodes between two conditions is due to the spatial distribution of ERPs rather than amplitude. After adjusting for amplitude, repeated measures ANOVAs were performed, including all electrodes as within-subject factor levels. Note that in order to avoid comparing overlapping data sets, different emotion conditions were used for the calculation of the N170 and Emotion effect topographies (e.g. N170 to happy vs. Emotion effect of angry minus neutral expressions).
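The scaling step can be sketched as follows, taking the description above literally (division by the root mean square across electrodes). The topographies are hypothetical five-electrode examples and the function name `vector_scale` is introduced for illustration only.

```python
from math import sqrt

def vector_scale(topography):
    """Divide each electrode's voltage by the RMS across all electrodes."""
    rms = sqrt(sum(v * v for v in topography) / len(topography))
    return [v / rms for v in topography]

# Hypothetical topographies (microvolts) at one time point.
n170 = [-8.0, -6.0, 2.0, 5.0, 7.0]
same_shape = [0.25 * v for v in n170]       # same distribution, smaller amplitude
other_shape = [-2.0, -1.5, 1.5, 1.5, 0.5]   # different distribution

scaled_n170 = vector_scale(n170)
scaled_same = vector_scale(same_shape)
scaled_other = vector_scale(other_shape)

# After scaling, a pure amplitude difference vanishes ...
max_diff_same = max(abs(a - b) for a, b in zip(scaled_n170, scaled_same))
# ... whereas a difference in spatial distribution remains.
max_diff_other = max(abs(a - b) for a, b in zip(scaled_n170, scaled_other))
```

In this sketch, `max_diff_same` is zero up to rounding error, whereas `max_diff_other` is not. This is the property that allows a remaining condition by electrode interaction to be attributed to topography rather than amplitude.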

The average reference sets the mean value of the ERP amplitude across all electrodes within a given condition to zero. Therefore, for repeated measures ANOVAs on AR data including all electrodes across the scalp, only effects in interaction with electrodes are meaningful. To keep results comparable across the two reference conditions, only electrode interactions will be reported for both AR and MR data. Degrees of freedom were Huynh–Feldt corrected to account for violations of the sphericity assumption; we report the original degrees of freedom, the correction factor epsilon (ε), and corrected p-values.

Results

Reaction times (M = 424 ms) and error rates (missed and incorrect responses, M = 2.6 %) were both unaffected by Emotion, Fs(2,46) < 1.8, ps ≥ .174 (see Rellecke et al. 2011).

The latency of the N170 peak did not vary between the selected electrodes, F(3,69) = 1.7, p = .200 (mean peak latency: M = 158 ms, range 156–163 ms). A main effect of Reference on peak amplitudes was observed, F(1,23) = 69.7, p < .001, ηp² = .752. As can be seen in Fig. 2a, this effect was due to much more pronounced peak amplitudes in AR relative to MR data. An effect of Emotion also occurred, F(2,46) = 21.0, p < .001, ηp² = .477, which interacted with Reference, F(2,46) = 13.8, p < .001, ηp² = .376. Post hoc comparisons of emotion categories revealed that peak amplitudes in both AR and MR data significantly differed for angry (M = −15.5 and −8.9 μV) compared to neutral (M = −13.5 and −8.1 μV) and happy expressions (M = −13.8 and −8.2 μV); however, the Emotion effect was stronger in AR than MR data, Fs(2,46) = 26.1 versus 7.2, ε (only AR) = .832, ps < .01, ηp²s = .532 versus .239. As depicted in Fig. 2a, AR and MR data both yielded increased peak amplitudes between 100 and 200 ms for angry relative to neutral and happy expressions, but these effects were more pronounced in average-referenced ERPs.

Fig. 2 Effects of emotional facial expressions and reference during the N170 peak amplitude. ERPs to angry (red), happy (blue), and neutral (grey) facial expressions averaged across typical N170 electrodes (P10, P9, PO10, PO9) in average reference (solid lines) and mastoid reference (dashed lines). For illustration purposes ERP curves were filtered at 15 Hz. Larger maps (range: −16 to 16 μV) depict the topography of the N170 (averaged across facial expressions), surrounding smaller maps (range: −2 to 2 μV) depict Emotion effects of angry relative to neutral (left) and happy expressions (right) for the same time point in average reference (bottom) and mastoid reference (top). a In Experiment 1, peak amplitudes differed for angry relative to neutral and happy expressions, respectively, in both average reference and mastoid reference. b In Experiment 2, peak amplitudes differed between angry and neutral expressions (upright faces) only in average reference. Inversion of the stimuli completely eliminated this emotion effect (right graph) (Color figure online)

Effects across all electrodes were also significant for angry compared to neutral and happy expressions in both AR, Fs(56,1288) = 9.4 and 9.5, εs = .079 and .116, ps < .001, ηp²s = .290 and .292, and MR data, Fs(54,1242) = 5.9 and 9.9, εs = .075 and .090, ps < .001, ηp²s = .203 and .300. As can be seen in Fig. 2a, Emotion effects were associated with an increased posterior negativity accompanied by a fronto-centro-parietal positivity for angry relative to other facial expressions in AR and MR data. Notably, the posterior negativity was stronger in AR data, while in MR data the anterior positivity was more pronounced.

Comparison of vector-scaled topographies of the N170 (for happy or neutral expressions) and emotion effects (for angry relative to neutral or angry relative to happy expressions, respectively) yielded significant differences in both AR, Fs(56,1288) = 5.8 and 5.7, εs = .131 and .147, ps < .001, ηp²s = .202 and .198, and MR data, Fs(54,1242) = 7.5 and 6.3, εs = .143 and .163, ps < .001, ηp²s = .247 and .215. Note that topographies are independent of the reference, so AR and MR data yield highly similar results; slightly different F-values for topographical comparisons within each data set are due to the different numbers of included electrodes. As apparent from Fig. 2a, topographies of the Emotion effects yielded more pronounced centro-parietal positivities than the N170 component in both AR and MR data.

Discussion

As indicated in the introduction, we hypothesized ERP effects over posterior sites of the scalp to be more pronounced in average than mastoid reference. This was confirmed by the finding of larger overall peak amplitudes over typical N170 electrodes and their stronger modulation by emotional facial expressions in average-relative to mastoid-referenced data. Therefore, it appears that effects of emotional facial expressions coinciding with the N170 peak amplitude over temporo-occipital electrodes are more likely to surface in average-referenced ERPs. In contrast, in ERPs referenced to mastoids, posterior peak amplitudes and their modulation by Emotion were less pronounced at N170 electrodes due to the spatial proximity of the site of reference (cf., Junghöfer et al. 2006).

Importantly, comparison of vector-scaled topographies yielded that overall spatial distributions of the Emotion effect and the N170 differed. This suggests that effects of emotional facial expressions on amplitudes at typical N170 electrodes do not reflect a modulation of the N170 component itself—or at least not solely of the N170—but rather superimposed activity of an emotion-sensitive component. As hypothesized, a likely candidate is the EPN, most pronounced for threat-related expressions (e.g., Schupp et al. 2004; Rellecke et al. 2011, 2012), so increased negative amplitudes for angry relative to other facial expressions can be expected to occur during the time of the N170 peak.

Topographies of the EPN and N170 seemed to mainly differ due to the distribution of an accompanying fronto-centro-parietal positivity. Such fronto-centro-parietal positivities are very likely polarity reversals of the EPN and N170, respectively (Joyce and Rossion 2005; Junghöfer et al. 2006). Different topographies of the EPN and N170 across the scalp suggest at least partly distinct neural generators involved in emotion- and face-specific encoding, respectively—which is in accordance with traditional models of face processing assuming both mechanisms to proceed in parallel and independently (cf. Bruce and Young 1986).

Experiment 2

Since inversion is known to interfere with face and emotional expression processing by disrupting the coherence of diagnostic facial features (e.g., Eimer 2000b; Eimer and Holmes 2002; Haxby et al. 1999), we also included inverted stimuli in “Experiment 2”. If the increased negativity for emotional (angry) expressions over N170 electrodes was indeed based on the recognition of emotional meaning in faces, it should occur only for upright but not for inverted faces. We again expected the N170 peak amplitude and the emotion effects coinciding with it over temporo-occipital sites to be more pronounced in average- than in mastoid-referenced data.

Methods

Twenty participants between 21 and 32 years of age (six female) contributed data to the experiment. They were reimbursed with course credits or payment. All participants were right-handed (Oldfield 1971), native German speakers with normal or corrected-to-normal vision and without any psychotropic medication or any history of psychiatric condition according to self-report.

The same stimuli were used as for “Experiment 1” but each face and word stimulus was also inverted by flipping the image along its vertical axis, that is, turning it upside down (see Fig. 1a, b), yielding a total stimulus set of 600 items containing 150 upright and 150 inverted images for faces and words each.

The procedure was identical to “Experiment 1”, apart from the main experiment consisting of 600 trials divided across 12 blocks. EEG recording and pre-processing were identical to “Experiment 1”, as were the statistical analyses of ERPs, except for the inclusion of Orientation (upright, inverted) as an additional factor.

Results

Reaction times (M = 404 ms) and mean error rates (M = 2.3 %) were both unaffected by Emotion and Orientation, Fs ≤ 2.3, ps ≥ .145.

Again, latencies of N170 peaks did not differ between selected electrodes, F(3,57) = 2.0, p = .147 (mean peak latency: M = 170 ms, range 164–174 ms). As in “Experiment 1”, a main effect of Reference was observed for peak amplitudes at N170 electrodes, F(1,19) = 74.7, p < .001, ηp² = .797. As can be seen in Fig. 2b, this difference was, again, due to larger peak amplitudes in AR relative to MR data. This time, no main effect of emotional facial expressions on peak amplitudes over N170 electrodes occurred, F(2,38) = 1.2, p = .305. However, an Emotion × Reference interaction was again observed, F(2,38) = 5.1, p < .05, ηp² = .211. In addition, an Emotion × Orientation interaction emerged, F(2,38) = 6.1, p < .01, ηp² = .244. Post hoc tests showed that these interactions were due to emotional facial expressions affecting peaks at N170 electrodes only for upright stimuli in AR data, F(2,38) = 6.5, p < .05, ηp² = .254. Here, amplitudes were significantly larger for angry (M = −13.4 μV) relative to neutral expressions (M = −12.1 μV), with a trend for angry versus happy expressions (M = −12.5 μV), p = .072. Emotion effects were entirely absent in MR data and did not occur for inverted faces in AR data, Fs(2,38) ≤ 3.8, ps ≥ .064. As can be seen in Fig. 2b, face inversion eliminated the Emotion effect on peak amplitudes at N170 electrodes in average-referenced ERPs.

For angry relative to neutral upright facial expressions in AR data, ERPs across all electrodes also differed significantly, F(56,1064) = 3.1, ε = .098, p < .01, ηp² = .142. As can be seen in Fig. 2b (AR data), angry expressions induced an increased temporo-occipital negativity, this time accompanied by an anterior positivity focussing on centro-parietal regions more strongly. Again, comparison of vector-scaled topographies yielded significant differences between the Emotion effect (angry minus neutral) and the N170 component (for happy expressions), F(56,1064) = 6.9, ε = .111, p < .001, ηp² = .267 (AR data).

Discussion

Once again we found peak amplitudes at N170 electrodes and their emotional modulation to be more pronounced when ERPs were referred to average reference than when mastoid reference was applied. Again, overall spatial distributions of the original N170 and the Emotion effect significantly differed for the same time point, suggesting the N170 component itself to be unaffected by emotion. Instead, emotional amplitude modulations coinciding with the N170 peak amplitude over temporo-occipital electrodes seem to originate from superimposed EPN activity (Schacht and Sommer 2009; Rellecke et al. 2011). A further finding is that Emotion effects occurred only for upright but not for inverted faces. This supports the notion that the EPN is in fact based on the processing of emotional meaning rather than low-level physical features in facial stimuli.

However, unlike in the first experiment, this time ERPs referred to mastoids did not yield any Emotion effect on peak amplitudes over N170 electrodes. Notably, overall, Emotion effects were smaller in “Experiment 2” than in “Experiment 1”, as apparent from the effect of angry relative to neutral expressions across electrodes being larger in “Experiment 1” (ηp² = .290) than in “Experiment 2” (upright faces; ηp² = .142). Thus, it seems that in mastoid-referenced data, the likelihood of obtaining emotional modulations for peak amplitudes between 100 and 200 ms at N170 electrodes decreases with the overall impact of emotional facial expressions on ERPs; that is, the closer the reference is located to posterior regions and the weaker the emotion effect across electrodes, the less likely the detection of Emotion effects at temporo-occipital electrodes coinciding with the N170 component (cf., Junghöfer et al. 2006).

The overall decrease of the effect of emotional facial expressions in “Experiment 2” relative to “Experiment 1” is most likely due to the use of inverted stimuli. Task and trial sequencing were exactly the same as in “Experiment 1”; thus, the intermixed and random presentation of inverted and upright stimuli is the most plausible cause of the reduced impact of emotional facial expressions. Since inverted and upright faces are processed differently (Haxby et al. 1999, 2000), presenting stimuli of different orientations in random order could have changed the implicit processing strategy of the participants (cf., Rellecke et al. 2012). The scalp distribution of the Emotion effects also seemed different, with an anterior positivity more strongly centring on centro-parietal electrodes in “Experiment 2”, which further suggests that the quality of emotion processing differed to some degree between the two experiments.

General Discussion

We investigated whether face-specific processing as indicated by the N170 entails the encoding of emotion. The previous literature on this phenomenon has yielded conflicting evidence, which is likely due to the different referencing procedures applied (cf., Junghöfer et al. 2006).

Replicating findings of Joyce and Rossion (2005), we found that the face-locked N170 peak amplitude over typical temporo-occipital electrodes decreased when the reference was located at the mastoids relative to when average reference was used. The same held true for effects of emotional facial expressions: Emotion effects coinciding with the N170 peak amplitude at temporo-occipital electrodes were more pronounced when ERPs were referred to average reference than when mastoids were used. More generally speaking then, the closer the sites of recording and reference are located to each other, the less pronounced are effects in the ERP. This offers an explanation as to why many studies using average reference reported emotional modulations of the N170 peak amplitude, while others using more posterior reference sites did not find such an effect. Moreover, it appears conceivable that reports on emotional expression modulations arising in the N170 time window over anterior sites in linked earlobes reference montages (for review, Eimer and Holmes 2007) reflect effects at the positive pole of the EPN shifted to frontal electrodes by mastoid references.

Note that site of reference alone does not completely determine whether effects of emotional facial expressions are observed over temporo-occipital electrodes in the N170 time window. It is still possible to obtain emotion effects with a mastoid reference as we did in “Experiment 1” (see also, Williams et al. 2006), or to find no emotion effect with average reference (e.g., Dennis and Chen 2007). This hints at other experimental factors relevant for the occurrence of increased posterior negativities for emotional facial expressions during the N170 peak amplitude; most likely factors affecting the overall magnitude of the emotion effect in the N170 time window. We thus conclude that the site of reference merely determines the likelihood of emotional facial expression effects to be detected during the N170 peak amplitude.

Most importantly, given the overall spatial distributions of the N170 and the Emotion effect to differ for the same time point (in both mastoid- and average-referenced data), emotional modulations of the N170 peak amplitude as reported by others appear spurious. In fact, different topographies for the N170 and EPN indicate—at least to some degree—different neural sources to be active in parallel for face- and emotion-related processes, respectively, which is in line with traditional models of face processing (Bruce and Young 1986). Therefore, emotional modulations coinciding with the N170 peak amplitude at temporo-occipital electrodes seem to reflect a component superimposed on the N170 wave shape. The topography of this effect resembled the emotion-sensitive EPN (Schupp et al. 2006). Notably, as must be the case from simple mathematical considerations, changing the reference from mastoids to average did not change the distinct topographical shapes of the N170 and the Emotion effect (EPN), but only shifted the magnitudes of potentials to more positive or negative values (most apparent in Fig. 2a; see, Junghöfer et al. 2006). That is, changing the point of reference leaves intact all topographical features (relative amplitudes between any given pair of locations) except for the zero equipotential line.

In sum, our findings suggest that apparent effects of emotional facial expressions on the N170 peak amplitude may be misleading and at least in part reflect processes other than a modulation of the N170 component itself. Instead, emotion effects coinciding with the N170 peak amplitude over temporo-occipital electrodes appear to consist in EPN activity at the same latency; given the high human expertise in face processing, it is plausible that emotional meaning can be extracted at such an early stage. Detectability of the overlapping EPN depends on the referencing procedure, with emotion effects being less pronounced at posterior electrodes with more posterior reference sites. Our data are in line with a functional dissociation of processes related to face (N170) and emotion (EPN) encoding, as suggested by traditional models of face processing.