Introduction

Telling apart complex visual patterns is a fundamental brain function. Among all visual patterns, the face holds a special value for the human brain: it is a highly complex stimulus made of multiple parts, and it is the most familiar and socially relevant stimulus in the visual environment. Throughout the lifespan, the human visual system is exposed to thousands of individual face exemplars, which form a highly homogenous visual category (Fig. 1). Being able to discriminate these faces from one another (i.e., individual face discrimination) accurately and rapidly is critical for human beings.

Fig. 1

The human brain is able to tell apart complex visual patterns of a visually homogenous category, such as faces, efficiently and rapidly, even when the faces have never been encountered (unfamiliar faces), and even when they do not have external features, as in this figure. The difficulty of this task is illustrated when the different faces are presented in an unusual orientation, for instance if the exact same faces are presented upside-down, as above. In this inverted orientation, their visual similarity is striking. Measuring and characterizing the nature of this high-level visual discrimination process remain particularly challenging

Behavioral studies have revealed the visual cues playing an important role in individual face discrimination, whether the faces are familiar or unfamiliar. For instance, 3D shape and surface (i.e., texture, color) cues provide complementary information (Bruce and Young 1998; Jiang et al. 2006). Furthermore, among all facial parts, the region of the eyes is particularly diagnostic for telling faces apart (Haig 1985; Sadr et al. 2003). However, rather than being perceived independently, facial parts appear to be integrated into a unified representation of the whole face, a so-called holistic/configural representation (Sergent 1984; Young et al. 1987; Tanaka and Farah 1993; Maurer et al. 2002; Rossion 2013).

Individual face discrimination involves a large set of brain areas, in particular those areas responding preferentially to faces in the right ventral occipito-temporal cortex (e.g., Gauthier et al. 2000; Grill-Spector and Malach 2001; Davies-Thompson et al. 2009). Event-related potential (ERP) recordings on the human scalp during face repetition indicate that individual faces can be discriminated as early as 160–170 ms from stimulus onset (e.g., Jacques et al. 2007; for later individual face discrimination responses, see e.g., Schweinberger et al. 2002).

Because of their remarkable ability to discriminate individual faces, human adults have been defined as “face experts” (Carey 1992). Nevertheless, the expert level of performance at individual face discrimination is only attained following a long developmental course and considerable visual experience (Mondloch et al. 2003; Crookes and McKone 2009; Germine et al. 2011). Moreover, there is a large amount of inter-individual variance in this ability in the normal adult population (Bowles et al. 2009; Wilmer et al. 2010). Deficient early visual experience, such as in people with a history of early life visual deprivation from bilateral congenital cataracts, can be detrimental to individual face discrimination in adulthood (LeGrand et al. 2001; de Heering and Maurer 2014). In cases of brain damage to right occipito-temporal areas leading to prosopagnosia, individual face discrimination can be severely and specifically disrupted (e.g., Sergent and Signoret 1992; Barton 2008; Busigny et al. 2010; see also Valentine et al. 2006). Additionally, this function can be impaired in typical adults by simple stimulus manipulations, such as picture-plane inversion (Yin 1969; for review, see Rossion 2009) or contrast-reversal (Galper 1970; Russell et al. 2006), and people are generally poor at discriminating individual faces of a nonexperienced morphology (e.g., the “other-race effect”, Malpass and Kravitz 1969; for review: Rossion and Michel 2011).

Despite the importance of individual face discrimination and the large amount of research on this function, it is still poorly understood. For this reason, many issues remain hotly debated: the respective importance of certain facial cues over others for individual face discrimination, the face-specificity and the nature of this process (e.g., holistic/configural or part-based), its developmental course, whether it is affected by long-term familiarity of the faces or not, how it can be measured and characterized, etc. A major reason for this gap in our knowledge is the lack of a highly sensitive and objective measure of individual face discrimination, i.e., a measure that is not contaminated by cognitive and decisional processes and which can be recorded using the same stimulation paradigm in all kinds of human populations. This paper provides a review of a highly promising approach recently developed to fill this gap and to make decisive progress in understanding individual face discrimination: fast periodic visual stimulation (FPVS), coupled with the recording of electrical (or magnetic) brain responses from the human scalp.

Fast periodic visual stimulation (FPVS) and the steady-state visual evoked potential (SSVEP)

The SSVEP

When a stimulus, or the property of a stimulus, is repeated at a fixed rate, i.e., periodically, it generates a periodic change in voltage amplitude in the electrical activity recorded on the human scalp by electroencephalogram (EEG), or a change in the magnetic fields as captured by magnetoencephalography (MEG). In ideal conditions, this electrophysiological response is stable in phase and amplitude over time, and for this reason, it has been defined as a “steady-state” response (Regan 1966). This “steady-state” EEG response was first recorded in response to visual stimuli in the 1960s by researchers who were interested in responses to luminance flicker (Van der Tweel and Lunel 1965; Regan 1966) and called this response the “steady-state visual evoked potential” (SSVEP). Since a SSVEP is a periodic response, it is confined to a specific frequency and it is thus natural to analyze it in the frequency domain instead of the time-domain. The stimulus frequency f determines the response frequency content: the response spectrum can have narrow-band peaks at frequencies that are directly related to the stimulus frequency (Fig. 2).
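The frequency-domain view described above can be illustrated with a minimal simulation. The sketch below is not taken from any of the cited studies: the sampling rate, noise level and response amplitude are arbitrary assumptions, and `amplitude_spectrum` is a hypothetical helper name. It only shows that a small periodic response buried in broadband noise appears as a narrow peak at the stimulation frequency f.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """Return (freqs, amplitudes): single-sided FFT amplitude spectrum of a real signal."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amps = np.abs(np.fft.rfft(signal)) * 2.0 / n
    return freqs, amps

# Assumed (illustrative) values: 60 s sampled at 500 Hz, with a 1 µV periodic
# response at f = 3.5 Hz embedded in white noise of 5 µV standard deviation.
fs, duration, f_stim = 500.0, 60.0, 3.5
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(0)
eeg = 1.0 * np.sin(2 * np.pi * f_stim * t) + rng.normal(0.0, 5.0, t.size)

freqs, amps = amplitude_spectrum(eeg, fs)

# Look for the largest component between 1 and 6 Hz, the range shown in Fig. 2b.
band = (freqs >= 1.0) & (freqs <= 6.0)
peak_freq = freqs[band][np.argmax(amps[band])]
print(f"largest component between 1 and 6 Hz: {peak_freq:.3f} Hz")
```

Because the 60-s epoch contains an integer number of stimulation cycles, the response falls into a single frequency bin rather than leaking into its neighbors.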

Fig. 2

a Sinusoidal contrast modulation (0–100 %) of different face identities at a periodic rate of 3.5 Hz (Rossion and Boremanse 2011). b EEG spectral power between 1 and 6 Hz following only 60 s of the periodic stimulation depicted in (a). The EEG spectrum, averaged across 12 participants, is depicted for the right occipito-temporal electrode site, PO8, as illustrated on the topographical map of the back of the head, which shows the largest response at the fundamental frequency of stimulation (3.5 Hz). Stimuli and data from Rossion and Boremanse (2011). Note the narrow frequency bin at which the response of interest is recorded

Beyond the SSVEP: the FPVS approach

SSVEPs—or steady-state visual evoked magnetic fields (SSVEFs) in MEG—have been recorded mainly to low-level visual stimuli, or to periodic changes of low-level attributes of simple stimuli: luminance, contrast, spatial frequency, color or motion (e.g., Regan 1966, 1974; Tyler and Kaitz 1977; Zemon and Ratliff 1982; Braddick et al. 1986; Norcia et al. 2002). Many studies have also used SSVEPs to investigate spatial and selective attention; an increase in attention to one of two simultaneously flickering stimuli increases specifically the response to the attended stimulus (Morgan et al. 1996; Müller et al. 2006) and may even reduce the attention to an unattended stimulus (Chen et al. 2003). In all these contexts, the SSVEP is thought to be a stable repetitive response (with the exception of its modulation by fluctuations of attention), which is obtained when the exact same stimulus is repeated at a high-frequency rate (i.e., roughly above 5–6 Hz).

A SSVEP response is traditionally thought to be located over occipital medial electrode sites (typically Oz), originating mainly from the primary visual cortex (Muller et al. 1997; Di Russo et al. 2007; but see Srinivasan et al. 2006, and “The frequency-tuning function of individual face discrimination” section of the present paper). It has also been proposed that SSVEP responses originate from the mere linear sum of transient event-related responses (Bohórquez et al. 2007; Capilla et al. 2011; see Galambos et al. 1981 in the auditory domain and Regan 1989 for an in-depth discussion of this issue). However, whether these assumptions are valid or not, they concern the term “SSVEP” not the principle of fast periodic visual stimulation. FPVS is a more generic term that defines the approach, rather than the assumed nature of the brain response, and for this reason, it will be used preferably here.

Using periodic stimulation, it is clear that periodic responses in EEG (or MEG): (1) can be obtained at various frequency rates, including relatively slow rates (e.g., 3 Hz, Appelbaum et al. 2010), even though using higher-frequency rates than in typical behavioral or transient ERP experiments can be advantageous (“The frequency-tuning function of individual face discrimination” section); (2) can be obtained even when the stimulus changes substantially at the periodic rate (e.g., in attentional blink paradigms, Keil et al. 2006; Talsma et al. 2006; see Footnote 1); (3) are not confined to low-level visual areas, provided that complex stimuli and adequate frequency rates are used; and (4) are not necessarily stable in amplitude over time. All of these aspects of the approach are well illustrated by FPVS studies of individual face discrimination, as introduced in the next section.

FPVS in high-level vision: a first approach of individual face discrimination

A proof of concept

Presenting different face pictures to human observers for 1 min at a fixed rate of 3.5 Hz, namely 3.5 faces per second for a total of 210 face stimulations, leads to a clear EEG response confined to a narrow 3.5-Hz frequency bin (Fig. 2; Rossion and Boremanse 2011). This response is about 1 µV, which is not a large amplitude compared to standard ERP responses to transient events. However, the signal-to-noise ratio (SNR) of this response, computed by comparing the amplitude at the 3.5-Hz frequency bin (the signal) to that of its neighboring bins in the spectrum (the noise), is extremely high: following only 1 min of recording, the response at the targeted frequency can be three or four times larger than the response at neighboring bins (i.e., an increase in the SNR of 300–400 %). This 3.5-Hz response to faces peaks over posterior electrode sites, with a maximum over the right occipito-temporal cortex (channel PO8) (Fig. 2b).
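The SNR computation described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the analysis pipeline of the cited study: the function name `snr_at_bin`, the number of neighboring bins (20), the choice to skip the bins immediately adjacent to the target, and all simulation values are assumptions made for the example.

```python
import numpy as np

def snr_at_bin(amps, idx, n_neighbors=20, skip=1):
    """Amplitude at bin `idx` divided by the mean amplitude of surrounding bins.

    `n_neighbors` bins are used in total (half on each side of the target),
    leaving out the `skip` bins immediately adjacent to it. Both defaults
    are assumptions chosen for illustration.
    """
    half = n_neighbors // 2
    below = amps[idx - skip - half: idx - skip]
    above = amps[idx + skip + 1: idx + skip + 1 + half]
    noise = np.concatenate([below, above])
    return amps[idx] / noise.mean()

# Assumed (illustrative) recording: 60 s at 500 Hz, 3.5 Hz stimulation.
fs, duration, f_stim = 500.0, 60.0, 3.5
n_faces = int(round(f_stim * duration))  # 3.5 faces/s for 60 s

t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(1)
eeg = 1.0 * np.sin(2 * np.pi * f_stim * t) + rng.normal(0.0, 10.0, t.size)
amps = np.abs(np.fft.rfft(eeg)) * 2.0 / t.size

# With a resolution of 1/duration Hz, the target bin index is f_stim * duration.
target = int(round(f_stim * duration))
snr = snr_at_bin(amps, target)
print(f"{n_faces} stimulations, SNR at {f_stim} Hz = {snr:.1f}")
```

On a flat spectrum the function returns 1 (no signal), so values well above 1 indicate a response that stands out from the local noise level.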

Importantly, the 3.5-Hz response obtained with this paradigm cannot be attributed specifically to faces, let alone to individual face discrimination. Rather, it reflects a global change in contrast between a uniform gray background and a face picture. Thus, the response contains a mixture of low-level and high-level visual responses. However, this response can be contrasted to the response obtained when the exact same face picture is now repeated at a 3.5-Hz rate (Fig. 3). In this latter case, the 3.5-Hz response remains substantial. However, this response is now much lower in amplitude (SNR decrease of about half), decreasing specifically over the right occipito-temporal cortex. Hence, comparing (i.e., subtracting) the two conditions reveals a differential 3.5-Hz response over bilateral occipito-temporal cortices, with a right hemispheric dominance (Fig. 3).

Fig. 3

EEG power spectrum and topographical maps of the difference between “different faces” and “same face” conditions, as obtained following 60 s of stimulation at 3.5 Hz (grand-averaged data over 12 participants, Rossion and Boremanse 2011)

The only difference between the two conditions is whether the face identity varies or not at every cycle: all the different faces are equalized in luminance, and there is also a substantial change in face size at every cycle in both conditions (range of 82–118 % of a base face, Fig. 2 of Rossion and Boremanse 2011, see movie 1 in supplementary material for 4 s of stimulation at 3.5 Hz, i.e., 14 faces). Hence, this differential response between the two conditions can be safely attributed to the brain’s discrimination of individual faces (including low-level differences, such as color variations, contrast and power spectrum, which are part of what constitutes face identity and the contribution of which can be estimated by other stimulus manipulations, see “Inversion and contrast reversal” section).

The originality of the present FPVS approach lies in the comparison of two conditions in which the property of interest (here, face identity) either varies periodically or remains stable at every periodic stimulation cycle. Hence, this approach differs from other “SSVEP” studies that have presented complex visual stimuli at fast periodic rates. For instance, in attentional blink paradigms, different words or letters are presented at a fast periodic rate (e.g., 6–8 Hz) in all sequences and the SSVEP response is compared for different targets included within the sequences (Keil et al. 2006; Talsma et al. 2006). In other studies, the exact same image, namely a visual scene or an isolated object (e.g., Keil et al. 2003; Moratti et al. 2004; Kaspar et al. 2010), or even a face in recent studies (McTeague et al. 2011; Wieser et al. 2012; Gruss et al. 2012), is presented at a fixed rate in a given trial of a few seconds. The stimulation rate is usually high (10 Hz or more), and the periodic EEG response obtained during a few seconds is usually measured at medial occipital sites and compared across different task manipulations, or different kinds of stimuli (e.g., a pleasant vs. an unpleasant image). In contrast, in the approach presented here, it is the property of interest that changes or remains constant at a periodic rate, the response of interest being isolated by a comparison of the two conditions (see Footnote 2).

The strengths of FPVS

The first experiment, presented in the “A proof of concept” section, illustrates most of the virtues of the FPVS approach. These advantages do not lie in spatial or temporal resolution (i.e., in determining where and when specific brain processes happen). Spatial resolution is limited when recording on the scalp with EEG or MEG: there is an inherent uncertainty regarding the localization of the generators of electrophysiological responses recorded on the scalp (Luck 2005; Nunez and Srinivasan 2005). Hence, even though the differential response measured when contrasting different and identical sequences of faces is observed over the right lateral occipital, or right occipito-temporal, cortex, the exact cortical sources of this effect remain unknown and probably involve many populations of neurons distributed in several occipital and temporal areas (see “Frequency-tuning of individual face discrimination in human fMRI” section).

Temporal resolution of the FPVS approach in EEG/MEG is also limited: the relatively high-frequency rates of stimulation used in common practice make it difficult or impossible to define the time-course of the response with respect to the stimulus onset. That is, at 3.5 Hz, still a relatively slow rate in such experiments, the stimulus onset asynchrony is only 286 ms (1/3.5 s). Hence, there is an overlap between the responses elicited by the different stimuli, making it difficult to identify the exact timing of a response relative to the stimulus onset.

Therefore, even though FPVS provides a direct neural response, at a global scale (the activity of populations of neurons recorded at a distance), and even if this response has a distinct signature on the scalp, this approach should be considered primarily as an alternative to psychophysics, or behavioral measures (Regan 1977). Nevertheless, compared to psychophysics or other approaches in systems and cognitive neuroscience, such as the measure of EEG responses to transient events, the FPVS approach has at least four invaluable strengths that are illustrated by the result described above and below.

First, FPVS provides an objective measure of a process, namely a response at an exact frequency that is defined in advance by the experimenter. A 3.5-Hz stimulation provides an exact 3.5-Hz response (and harmonics, see below), and all of the process of interest appears to be confined within this small frequency bin. In contrast, transient events lead to multiple ERP components in overlapping time-courses, or transient event-related synchronizations or desynchronizations (ERS/ERDs) which spread over various relatively large and variable frequency bands (i.e., theta, beta, alpha and gamma range) (see Klimesch et al. 2007; Mouraux and Iannetti 2008). Even when a particular component of interest is targeted (e.g., “the face-sensitive N170 component”, e.g., Bentin et al. 1996; George et al. 1996; see Jeffreys 1989 for early evidence; Rossion and Jacques 2011 for review), its presence and temporal definition are highly subjective, paving the way for inadequate definition and measurement (see Rossion and Jacques 2008).

This leads to the second advantage of the FPVS approach: the quantification of the response is quite straightforward. In the above example, the magnitude of the 3.5-Hz response can be measured unambiguously, without post hoc definition of a particular time-window or a frequency band of interest. Moreover, its SNR can be computed directly and compared across conditions.

Third, the response obtained following periodic visual stimulation has an extremely high SNR, allowing the recording of subtle differences between visual stimuli in a very short amount of time: following only 1 min of stimulation, there was a 50 % difference in SNR between two conditions. Importantly, this high SNR is not merely due to the fact that many images are presented in a short period of time: there is a real benefit of stimulating at the exact same frequency rate for a few tens of seconds because the response of interest concentrates in a very narrow frequency band compared to the total EEG bandwidth, providing very high SNR (Regan 1989). The approach is thus relatively immune to artifacts and a meaningful response can be obtained without corrections for blinks or other artifacts, making the analysis relatively straightforward and the results highly reproducible across studies.

Finally, the measure is implicit: the periodic response can be measured in the absence of an overt behavioral measure, so that it is not contaminated by decisional processes. In the example above, participants simply monitored rare nonperiodic changes of the color of a fixation cross, but were not instructed to explicitly process face identity (Rossion and Boremanse 2011). Because it is implicit, the approach can be used similarly in typical human adults and in populations who are unable to provide overt behavioral responses, such as infants or certain patient populations.

All of these characteristics of the FPVS approach make it particularly appealing for understanding a complex visual function, such as individual face discrimination.

Neural mechanisms and temporal dynamics

Adaptation/repetition suppression

Why is the periodic EEG response larger when different faces are presented at every cycle as compared to the repeated presentation of the exact same face? This effect can be related to the well-known reduced neural response to repeated compared to unrepeated visual shapes, in particular faces, that was first reported at the neuronal level in the monkey inferior temporal (IT) cortex (e.g., Baylis and Rolls 1987; Miller et al. 1991; Ringo 1996), and later in humans in many neuroimaging studies (Grill-Spector and Malach 2001; Grill-Spector et al. 2006) as well as on the scalp in visual evoked potentials (e.g., Itier and Taylor 2002; Jacques et al. 2007; Walther et al. 2013; see also Kovács et al. 2006 for category-level adaptation in ERPs). This effect is generally defined as a form of neural habituation, repetition suppression, or visual adaptation (Henson 2003; Grill-Spector et al. 2006; Kohn 2007). In the monkey IT cortex, different facial identities are coded by overlapping yet different patterns of responses in the same populations of face-selective neurons (Leopold et al. 2006; Rolls and Tovee 1995; Young and Yamane 1992). Hence, one can speculate that presenting a different face than the previous one, at every 3.5-Hz cycle, triggers a partially different population of neurons in the human brain than the previously activated population of neurons. Compared to the suppressed response when the exact same face stimulus is presented at every cycle, the overall 3.5-Hz periodic response thus remains high in amplitude during the whole stimulation sequence. Importantly, this effect is specific to regions of the brain that can distinguish between different faces. In low-level visual areas, a high-amplitude response can be maintained merely by changing a low-level property—stimulus size—at every cycle, so that the difference between conditions is not found over the primary visual cortex (typically electrodes Oz, Pz).

Harmonics

If a slightly higher 4-Hz stimulation rate is used, with grayscale faces either changing identity at every cycle or with identity being constant, the amplitude and SNR spectra now show clear responses at the 4-Hz stimulation rate (Fig. 4; Rossion et al. 2012). As in the study described above, higher harmonics are also observed at exact integer multiples of the stimulus frequency (e.g., 2f: 8 Hz and 3f: 12 Hz, Fig. 4). When using a squarewave stimulation (abrupt periodic onset/offset), these harmonic components are present in the stimulus. However, higher harmonic components than the fundamental frequency rate are present even when the input stimulation is sinusoidal. This could be due either to stimulus constraints (i.e., at a 100-Hz refresh rate, each frame is a 10 ms step, so that the stimulus is not a pure sinusoid but a staircase function) or to the double response of low-level visual processes to contrast modulation (i.e., one increase and one decrease in contrast per cycle). Yet, even a pure sinusoid generates frequencies in the response (the output) that are not present in the stimulus (the input), demonstrating the presence of nonlinear neural mechanisms (e.g., firing threshold) (e.g., Regan 1966, 1989). Interestingly, when stimulating at 4 Hz, both the fundamental (4 Hz) and second harmonic (8 Hz) responses are larger in amplitude over the right occipito-temporal cortex for different than identical faces (Fig. 5a; Rossion et al. 2012). Differences are not observed at higher harmonics (12 Hz, etc.), at which responses are localized over occipital medial electrode sites (Figs. 4 and 5). Although a peak at the 8-Hz second harmonic in the frequency spectrum does not necessarily reflect a neural response occurring 8 times/s, these observations suggest that there is a limited bandwidth of frequencies of interest for observing individual face discrimination responses, with a high cut-off frequency somewhere in between 8 and 12 Hz (see “The frequency-tuning function of individual face discrimination” section).
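The point that a nonlinear mechanism creates harmonics absent from a sinusoidal input can be checked numerically. In the sketch below, the “firing threshold” is modeled as a simple half-wave rectification, which is an assumption made for illustration and not a model of the actual neural response; sampling rate and duration are likewise arbitrary.

```python
import numpy as np

def amplitude_at(signal, fs, freq):
    """Single-sided Fourier amplitude of `signal` at `freq` (Hz)."""
    t = np.arange(len(signal)) / fs
    return 2.0 * np.abs(np.mean(signal * np.exp(-2j * np.pi * freq * t)))

# Assumed (illustrative) values: a pure 4 Hz sinusoidal input, 10 s at 1000 Hz.
fs, duration, f_stim = 1000.0, 10.0, 4.0
t = np.arange(int(fs * duration)) / fs
stimulus = np.sin(2 * np.pi * f_stim * t)

# Toy nonlinearity: nothing is passed on below a threshold of 0.
response = np.maximum(stimulus, 0.0)

in_f, in_2f = amplitude_at(stimulus, fs, f_stim), amplitude_at(stimulus, fs, 2 * f_stim)
out_f, out_2f = amplitude_at(response, fs, f_stim), amplitude_at(response, fs, 2 * f_stim)
print(f"input : f = {in_f:.3f}, 2f = {in_2f:.3f}")
print(f"output: f = {out_f:.3f}, 2f = {out_2f:.3f}")
```

The input contains energy only at f, whereas the thresholded output also contains a component at 2f. Other nonlinearities would distribute energy differently across harmonics, so the relative harmonic amplitudes here should not be read as a prediction for EEG data.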

Fig. 4

a Grand-averaged EEG spectrum displayed between 2 and 14 Hz for a periodic stimulation of grayscale faces at 4.0 Hz (N = 20 participants, 4 sequences of 60 s/participant, see Rossion et al. 2012). Note that the data are expressed in µV rather than power (µV²) values as in Figs. 2 and 3. Responses are observed at the fundamental stimulation frequency of 4 Hz and its harmonics (2f = 8 Hz, 3f = 12 Hz, etc.). The response at the fundamental frequency is the largest over right occipito-temporal sites, with two foci at the second harmonic (right occipito-temporal and occipital medial sites). The medial occipital topography at 12 Hz is characteristic of typical SSVEP topographies in response to low-level stimuli and/or frequency rates above 10 Hz. Note also the broadband increase in EEG amplitude between 8 and 12 Hz (alpha band). b The same data as displayed in (a), but the spectrum has been normalized by dividing the amplitude at each frequency bin by the average of the 20 neighboring bins, in order to estimate SNR spectrum (no signal = SNR of 1)

Fig. 5

a Grand-averaged SNR EEG spectrum of electrode PO8 displayed between 3 and 13 Hz for a periodic stimulation of grayscale faces at 4.0 Hz (N = 20 participants, 4 sequences of 60 s/participant; Rossion et al. 2012). The difference between conditions (“different faces” and “same face”) is limited to the fundamental frequency (4 Hz) and the second harmonic (8 Hz), with similar topographical maps. There is no difference between conditions at the third harmonic (12 Hz). b The differential SNR maps (4 and 8 Hz averaged) reflecting the individual discrimination response for grayscale upright faces, inverted faces and contrast-reversed faces

Inversion and contrast reversal

When faces are presented upside-down or contrast-reversed, there is a substantial decrease in the differential periodic response (at 4 and 8 Hz) to different and identical faces, specifically over the right occipito-temporal cortex, and to a lesser extent over the left homologous region (Fig. 5b). This is the case even in the absence of any difference in behavioral performance at the orthogonal task (detecting changes of color of the fixation cross) between conditions. Since these two stimulus manipulations greatly affect individual face discrimination as measured behaviorally (inversion: Yin 1969; Rossion 2009 for a review; contrast-reversal: Russell et al. 2006), these observations support the relevance of the implicit measure of individual face discrimination obtained with FPVS: it is a high-level discrimination response, which reflects at least partly our specific visual expertise with faces. These observations also support the view that the difficulty in discriminating upside-down faces has a perceptual basis (Freire et al. 2000; Rossion 2008, 2009) and pave the way for investigating individual face discrimination responses to subtle stimulus manipulations.

Time and phase information

In principle, whereas ERP studies to transient stimulation provide critical information about the time-course of processes, timing information is thought to be lost in the EEG response to FPVS (Luck 2005; “The strengths of FPVS” section). However, the apparent advantage provided by transient ERP studies in terms of absolute timing information should be qualified, at least in face perception research. Indeed, following more than two decades of standard ERP studies in this field, there is still no agreement regarding the exact time-course of individual face discrimination. One reason for that is that the effects of individual face discrimination on visual components obtained following transient stimulation, such as the N170, are relatively small (Itier and Taylor 2002; Jacques et al. 2007). Hence, they cannot be reliably identified at the individual level without averaging many trials, and these effects are highly susceptible to modifications of methodological parameters (Rossion and Jacques 2011). For later components such as the N250, at which effects of individual face repetition are more consistently observed (e.g., Schweinberger et al. 2002), the ambiguity lies in the identification and definition of the component itself (usually a relatively small and wide negative deflection) on the scalp that can overlap with the time of saccadic eye movements and decisional responses. Moreover, not all timing information is lost with FPVS because the phase of the periodic response can usually be accurately extracted from the EEG spectrum within a small frequency bin and reveal useful information regarding the relative time differences of the response across conditions. For instance, in the two studies described so far, we extracted the phase information of the response obtained following stimulation of upright and inverted faces at 3.5 Hz (Rossion and Boremanse 2011) and 4 Hz (Rossion et al. 2012) (Fig. 6). In both cases, there was remarkable consistency across individual participants in the phase of the response at the fundamental frequency, with all conditions of interest in the study being associated with a relatively close average response phase value (Fig. 6). Most interestingly, there was a systematic delay for inverted as compared to upright faces (Rossion and Boremanse 2011; Rossion et al. 2012; see also Tsuruhara et al. 2014) and for contrast-reversed faces, as compared to typical contrast grayscale faces (Rossion et al. 2012; see Fig. 6). This delay was estimated at about 20 ms for inverted relative to upright faces. Considering that the stimulus is only revealed progressively due to the sinusoidal stimulation, this effect is well in agreement with behavioral and ERP findings (Jeffreys 1993; Bentin et al. 1996; Rossion et al. 1999), indicating that the phase of the FPVS response can be used to derive useful information about relative processing times between conditions or between populations (e.g., patients or children vs. typical adult observers).
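The conversion from a phase difference at the stimulation frequency to a relative time delay can be sketched as follows: a delay τ shifts the phase of a component at frequency f by 2πfτ. The simulation below uses assumed values (sampling rate, noise level, the condition names `upright` and `inverted`, and the helper names) purely for illustration; it is not the phase analysis used in the cited studies.

```python
import numpy as np

def phase_at(signal, fs, freq):
    """Phase (radians) of the Fourier component of `signal` at `freq` (Hz)."""
    t = np.arange(len(signal)) / fs
    return np.angle(np.sum(signal * np.exp(-2j * np.pi * freq * t)))

def relative_delay(phase_ref, phase_test, freq):
    """Delay (s) of the test response relative to the reference response.

    The phase difference is wrapped to (-pi, pi], so the result is only
    defined within plus or minus half a stimulation cycle.
    """
    dphi = np.angle(np.exp(1j * (phase_ref - phase_test)))
    return dphi / (2.0 * np.pi * freq)

# Assumed (illustrative) values: 50 s at 500 Hz, 4 Hz responses, with the
# second condition lagging the first by 20 ms.
fs, duration, f_stim, true_delay = 500.0, 50.0, 4.0, 0.020
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(2)
upright = np.sin(2 * np.pi * f_stim * t) + rng.normal(0.0, 0.5, t.size)
inverted = np.sin(2 * np.pi * f_stim * (t - true_delay)) + rng.normal(0.0, 0.5, t.size)

estimated = relative_delay(phase_at(upright, fs, f_stim),
                           phase_at(inverted, fs, f_stim), f_stim)
print(f"estimated delay: {estimated * 1000:.1f} ms")
```

Since phase is circular, a delay is recovered only modulo one stimulation cycle (250 ms at 4 Hz), which is why this kind of estimate informs relative timing between conditions rather than absolute latency from stimulus onset.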

Fig. 6

a Time-domain representation (1 s = 4 cycles at 4 Hz, data from Rossion et al. 2012). To obtain the figure, four 50-s trials of data of each individual participant (N = 20) were cropped in 50 small epochs of 1 s (4 cycles), averaged by condition, and then narrow-band filtered (3–5 Hz) before being grand-averaged. Here, a pooling of 3 occipito-temporal channels (PO8, PPO10h, PO10) shows the difference in latency (phase) for inverted and contrast-reversed faces as compared to upright faces. b The phase values extracted exactly at 4 Hz for the 4 conditions (mean across participants, with standard errors indicated by dotted lines). In that study, phase values were relatively consistent across participants, and across conditions, with a delay of about 20 ms between upright and inverted faces, and further delay for contrast-reversed and contrast-reversed inverted faces. These displays indicate that relative timing information can be meaningful with this approach, even though the absolute time delay between the response and the stimulus onset is ambiguous

Temporal dynamics

Using a long sequence of stimulation is quite unusual in typical FPVS studies, even though it carries two important advantages (Regan 1989). First, the Fourier transform of a long EEG epoch provides a very high frequency resolution (i.e., 1/60 s ≈ 0.0167 Hz for 60 s of stimulation). Such a high frequency resolution is important because all of the response, and thus all the effect of interest, is concentrated in a few discrete frequency bands that are very narrow and occupy a very small fraction of the total EEG bandwidth. In contrast, biological noise is distributed throughout the EEG spectrum, so that the technique is almost immune to artifacts and provides a SNR in the bandwidth of interest that can be very high (Regan 1989; Rossion et al. 2012).
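The relation between epoch length and frequency resolution (bin width = 1/T) can be verified directly. In this short sketch, the sampling rate of 512 Hz and the list of durations are assumptions chosen for illustration, and `frequency_resolution` is a hypothetical helper name.

```python
import numpy as np

def frequency_resolution(duration_s):
    """Spacing (Hz) between adjacent FFT bins for an epoch of `duration_s` seconds."""
    return 1.0 / duration_s

fs = 512.0  # assumed sampling rate (Hz); the resolution does not depend on it
resolutions = {}
for duration in (2.0, 10.0, 60.0):
    n_samples = int(fs * duration)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    resolutions[duration] = freqs[1] - freqs[0]  # bin spacing reported by NumPy
    print(f"{duration:5.0f} s epoch -> bins every {resolutions[duration]:.4f} Hz")
```

A 2-s epoch gives bins 0.5 Hz apart, so a response and the noise within half a hertz of it share a bin; with a 60-s epoch, the same response is separated from noise lying only a few hundredths of a hertz away.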

The second advantage of using a long stimulation train comes from the fact that the periodic response may take several seconds to reach its maximum amplitude. Given this, a sequence of short duration may not be optimal to disclose differences between two conditions of interest (i.e., an effect of adaptation may be counteracted by the buildup of the periodic response). This does not necessarily mean that the larger response observed to a train of different than identical faces takes time to emerge during a stimulation sequence: the very first appearance of a different face than the previous one in the periodic stimulation sequence (i.e., AAAAAAB) may be enough to increase the response specifically at the periodic stimulation rate. To demonstrate this, one can present a long train of identical faces at 4 Hz, for instance for 15 s, before suddenly introducing a periodic change in identity for the remaining duration of the sequence (Figs. 7, 8). In this situation, an abrupt increase in the response is observed, the very first change in face identity leading to a large release of the 4-Hz electrophysiological response. Importantly, this release is confined to the 4-Hz stimulation frequency and concerns only the right occipito-temporal region (Figs. 7, 8). This observation points to rapid temporal dynamics of the discrimination of individual faces, at the exact frequency of stimulation, and provides the rationale for using rapid oddball paradigms, as introduced in the “Fast oddball face discrimination” section (Liu-Shuang et al. 2014).
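One simple way to follow the amplitude of the 4-Hz response over time is to estimate it in successive short windows that each contain a whole number of cycles. The sketch below simulates a response whose amplitude doubles at 15 s; the amplitudes, noise level, sampling rate and 1-s window are all assumptions, and this windowed estimate is a stand-in for illustration, not the band-pass filtering or wavelet analysis shown in Figs. 7 and 8.

```python
import numpy as np

def windowed_amplitude(signal, fs, freq, win_s=1.0):
    """Fourier amplitude at `freq` in consecutive non-overlapping windows.

    `win_s` should span a whole number of cycles of `freq` (1 s = 4 cycles at 4 Hz).
    """
    n_win = int(round(fs * win_s))
    n_windows = len(signal) // n_win
    t = np.arange(n_win) / fs
    probe = np.exp(-2j * np.pi * freq * t)
    segments = signal[: n_windows * n_win].reshape(n_windows, n_win)
    return np.abs(segments @ probe) * 2.0 / n_win

# Assumed (illustrative) sequence: 30 s at 250 Hz, 4 Hz response whose
# amplitude jumps from 0.5 to 1.0 when face identity starts changing at 15 s.
fs, duration, f_stim, t_change = 250.0, 30.0, 4.0, 15.0
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(3)
envelope = np.where(t < t_change, 0.5, 1.0)
eeg = envelope * np.sin(2 * np.pi * f_stim * t) + rng.normal(0.0, 1.0, t.size)

amps = windowed_amplitude(eeg, fs, f_stim)
before, after = amps[:15].mean(), amps[15:].mean()
print(f"mean 4 Hz amplitude: {before:.2f} before the change, {after:.2f} after")
```

Short windows trade frequency resolution for time resolution, so each estimate is noisier than one obtained from the full sequence; averaging across sequences, as in Fig. 7, reduces that noise.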

Fig. 7

a Band-pass-filtered (3–5 Hz) EEG data centered on the fundamental stimulation frequency (4 Hz) and grand-averaged across 4 stimulation sequences of 20 participants (Rossion et al. 2012). The displayed segment (PO8 channel) shows 5 s (10–15 s) of presentation of identical faces, followed by the presentation of different face identities at every cycle. There is a massive and immediate increase in amplitude (in gray) when different faces are presented relative to when the exact same face continues to be presented (in black). b In this figure, two seconds of data (PO8, between 14 and 16 s) has been low-pass filtered (cut-off 20 Hz) and grand-averaged, in order to show the shape of the response following the immediate introduction of the new face identity (after 15 s, or 60 faces at 4 Hz, of presentation of the exact same face). A large negative response of exactly one cycle width (about 250 ms, between 15.25 and 15.50 s) is elicited in this condition compared to the condition when there is no change in face identity

Fig. 8

A time–frequency (TF) representation based on the continuous Morlet wavelet transform (CWT) of EEG of 6 subjects tested at a periodic rate of 4 Hz (occipito-temporal channel PO8, unpublished data, same protocol as in Rossion et al. 2012). Epoch length is −2 to 84 s. EEG epochs are cleaned up for artefacts by an independent component analysis (ICA, Makeig et al. 1997). Between 9 and 16 trials per subject are averaged in the time-domain. Time-domain averaging is performed within subject, before Morlet wavelet transform (initial spread of the Gaussian wavelet set to 10/w0 (w0 being the central frequency of the wavelet), see Mouraux and Iannetti 2008). Explored frequencies ranged from 0 to 15 Hz in steps of 0.15 Hz. The time–frequency maps are expressed in percentage relative to baseline (prestimulus interval of 1 s, −1.5 to −0.5 s): event-related synchronization (ERS) or desynchronization (ERD). Following the initial event-related potentials shortly after stimulation onset (0 s), there is a synchronization to the frequency of stimulation at 4 Hz lasting the whole stimulation sequence. The introduction of the different faces after 15 s (panel on top) boosts this 4-Hz response, which is larger than when the same face continues to be presented at every stimulation cycle

The frequency-tuning function of individual face discrimination

What stimulation frequency should, and could, be used to measure individual face discrimination? Traditional periodic visual stimulation studies have used relatively high-frequency rates, under the assumptions that a steady brain response requires such high-frequency rates and that the visual system’s optimal frequency rate is around 10 Hz (Regan 1966) or 13 Hz (Silberstein et al. 1990). However, such relatively high-frequency rate values may only lead to the largest response when stimulating with exactly identical low-level visual stimuli (i.e., luminance flicker). In the studies described above, a relatively slow rate of 3.5 or 4 Hz was used to provide enough time to the visual system to process each individual face sufficiently deeply (i.e., at the individual level) before the next one appeared in a train of stimulation. However, in reality, using 14 different frequency rates between 1 and 16.66 Hz (i.e., a face every 1,000 to ~60 ms) shows that the difference between the two conditions (different faces – identical face) is observed for stimulation frequencies above 3 Hz and below 9 Hz, peaking at a frequency rate of about 6 Hz (5.88 Hz, Alonso-Prieto et al. 2013; Figs. 9, 10). Strikingly, despite substantial inter-individual variability in SNR, the frequency-tuning function is remarkably similar across individuals, peaking at right occipito-temporal sites for all suitable frequency ranges (Fig. 10; Alonso-Prieto et al. 2013).

Fig. 9

SNR EEG spectra illustrated for 3 of the 14 frequencies of stimulation in the study of Alonso-Prieto et al. (2013): 1.0 Hz (left), 5.88 Hz (center) and 12.50 Hz (right). SNR spectra were extracted from a right occipito-temporal region of interest of nine channels based on topographical maps, and grand-averaged over all recording sessions and four participants. Clear responses are observed at the fundamental frequency of stimulation and its harmonics. Note the numerous harmonic responses at 1.0 Hz, and the particularly large response observed at 5.88 Hz, with a much higher SNR when different faces are presented as compared to the presentation of the exact same face. Note that the three panels have different y-axis ranges

Fig. 10

a Grand-averaged topographical maps (back of the head) of the differential (different faces–identical face) SNR at each of 14 frequencies of stimulation (8 stimulation sequences of 50 s averaged, for each of four participants). Negative values were minimal and set to zero in the display. b Frequency-tuning functions of individual face discrimination for two participants (S1 and S3), for each condition and their difference at right occipito-temporal electrode sites (mean and standard errors computed across subjects and sessions)

What do these observations mean? They indicate that there is a suitable bandwidth, or area of frequencies, for disclosing individual face discrimination responses, centered around 6 Hz (the “Fundamental Frequency Face Area”, “FFFA”). At rates above 8.33 Hz (cycle duration of 120 ms), there are no consistent differences between the two conditions, suggesting that each individual face cannot be processed at a sufficiently fine-grained level before the next face interrupts its processing. Thus, even though an observer might be able to perceive differences among individual faces at frequency rates above 8.33 Hz, the face perception system cannot synchronize to, i.e., process, every single face that is presented in the sequence. These results therefore provide original information about the temporal bottleneck of individual face discrimination in humans: At least 120 ms is necessary to process a face at the individual level and a duration of 170 ms seems to be sufficient for a full individualization process.
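The correspondence between stimulation rate and the time available per face is simple arithmetic (cycle duration = 1/frequency). The short Python sketch below, which uses a hypothetical helper name, makes the values quoted above explicit for a few of the tested rates.

```python
def cycle_duration_ms(rate_hz):
    """Duration of one stimulation cycle, in milliseconds."""
    return 1000.0 / rate_hz

# A few of the stimulation rates discussed in the text.
for rate in (1.0, 3.0, 5.88, 8.33, 16.66):
    print(f"{rate:5.2f} Hz -> {cycle_duration_ms(rate):6.1f} ms per face")
```

At 8.33 Hz each face is allotted about 120 ms, and at 5.88 Hz about 170 ms, which are the two durations referred to in the paragraph above.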

The absence of a repetition suppression effect at low-frequency rates (below 3 Hz) is difficult to understand. It may be due to an ISI between individual faces that is too long, which allows release from adaptation. That is, information about a repeated individual face may remain present in the neural response for no more than 250–300 ms and disperse thereafter. In line with this suggestion, modulations of the face-sensitive N170 component to individual face repetition have been found mainly at short intervals between the adapter and the target face (e.g., 100–300 ms in Caharel et al. 2009; Jacques et al. 2007), while studies using longer ISIs generally failed to reveal such effects (e.g., Schweinberger et al. 2002; see Rossion and Jacques 2011 for a discussion of this issue). Alternatively, the absence of effect at low-frequency rates may be because “transient” ERP components are clearly distinguishable, since there is time to return to baseline before the next stimulus arrives. Differences between conditions may thus take place on successive components of different polarities (e.g., P1, N170, P2 and N250), so that the overall difference between conditions may be canceled out when measured on the scalp (see Fig. 7 in Alonso-Prieto et al. 2013).

Irrespective of the correct explanation, it seems that for complex visual stimuli such as faces at least frequency rates above 3 Hz—i.e., when there are no clearly identifiable successive ERP components—should be used to provide visual discrimination responses at the periodic frequency of stimulation. Hence, although the term “fast” in FPVS is relative, it is informative because a relatively fast rate of visual stimulation is indeed necessary to capture some of the phenomena of interest such as high-level visual discrimination.

Frequency-tuning of individual face discrimination in human fMRI

With a method of low temporal resolution, such as fMRI, the response measured, i.e., the differential blood oxygen level-dependent (BOLD) signal between blocks of different faces compared to the same repeated face, cannot be specific to the frequency of stimulation. Yet, interestingly, considering only the magnitude of the BOLD response over several seconds, the frequency-tuning function obtained with fMRI in face-selective areas of the occipito-temporal cortex is similar to the function obtained with EEG, peaking also at 6 Hz (Gentile and Rossion 2014; Fig. 11). The face-selective regions include the fusiform and occipital “face areas” (FFA and OFA), as well as the posterior superior temporal sulcus (pSTS). In fMRI, however, this effect is due both to a maximal response to different faces in a range of 3–6 Hz and to a sharp drop of the BOLD signal from 6 Hz onward when the same face is repeated during a block (Gentile and Rossion 2014).

Fig. 11

a Frequency-tuning function of individual face discrimination as obtained with fMRI (Gentile and Rossion 2014). The data are displayed for the independently localized core face-sensitive areas (FFA, OFA and pSTS) averaged across 6 individual brains. b Beta weights (extracted from a general linear model (GLM) analysis) related to the difference between the different faces and same face conditions across frequencies for all areas. The different tones of red colors in the brain maps are related to the right and left FFA of the six different subjects. The same rationale is used for the right OFA (tones of green) and for the right pSTS (tones of blue). The asterisks in correspondence with 6 Hz for the right FFA and the right OFA represent the significant difference between the adaptation effects at 6 Hz in comparison with all other frequencies

These observations complement the EEG observations by providing a neural basis of the effect observed on the scalp, indicating that face-selective cortical areas process each individual face in full when these successive faces are presented every 160–170 ms. Yet, in fMRI, differences between the two conditions remain significant at slow (<3 Hz) and high rates (>9 Hz) in most functional areas (Fig. 11; Gentile and Rossion 2014). This discrepancy with the EEG results suggests that at high stimulation frequency rates the system bypasses some of the different faces presented in succession: the fMRI effect, but not the EEG effect, can be driven by the perception of only a subset of different faces during a block. As for the significant effects at low-frequency rates in fMRI, they are not surprising because most fMRI studies that investigate individual face discrimination through repetition suppression effects rely on relatively low stimulation rates (i.e., from 0.5 Hz, that is, 1 face every 2 s, to 3 Hz) (e.g., Davies-Thompson et al. 2009). The discrepancy with EEG suggests therefore that these fMRI effects occur at a different timescale than the exact rate of stimulation. Nevertheless, this relatively slow rate appears to be suboptimal to disclose large repetition suppression effects. Also, such rates may be too slow to disclose individual face repetition effects related to individual face discrimination in regions such as the pSTS (Fig. 11), which is tuned to rapidly changing (i.e., dynamic) stimuli (Puce et al. 1998; Schultz and Pilz 2009).

Practical implications and the golden range of frequencies

What are the implications of this frequency-tuning function for individual face discrimination? One implication is that at the behavioral level, human observers should find it difficult to discriminate individual faces presented at rates above 8.33 Hz. Obviously, they should not perform better at individual face discrimination between 3 and 9 Hz than at slower stimulation rates (<3 Hz). However, it may be that these intermediate-frequency rates allow better isolation of the contribution of high-level face discrimination processes to performance than slower rates, at which many other processes can contribute to (i.e., contaminate) performance. For instance, the difference in performance between the discrimination of upright and inverted faces might be larger at 6 Hz than at 1 Hz. If this is the case, a 3–9-Hz rate may reduce inter-trial and inter-individual variability in behavioral performance at individual face discrimination and may potentially be used to characterize abnormal processes in patients suffering from face recognition difficulties.

In terms of EEG research, the 5.88-Hz peak of the individual discrimination function corresponds exactly to a 170-ms cycle length, which is the peak of the face-sensitive N170 ERP component (Bentin et al. 1996; Rossion and Jacques 2011), and also the earliest time-point at which individual faces are discriminated over the right occipito-temporal cortex (Jacques and Rossion 2006; Jacques et al. 2007). This observation suggests that face-related fast periodic responses on the scalp may reflect the linear summation of many successive transient N170 face-sensitive responses, as proposed for the 40-Hz auditory steady-state response (ASSR) (Galambos et al. 1981; Santarelli et al. 1995) or SSVEP responses to low-level stimuli (e.g., checkerboard reversal, Capilla et al. 2011; see Regan 1989; see also Heinrich 2010 for an illustration of the superposition phenomenon in the steady-state response). That is, at frequency rates around 6 Hz, at occipito-temporal sites, the ERP components could combine to form a single, stable, composite wave dominated by the N170, with a difference between conditions accumulating at every cycle.

Importantly, this does not mean that with FPVS, one necessarily looks at overlapping, individual ERPs that are linearly superimposed: the periodic response could instead be generated by an increase of an ongoing neural oscillation, driven by the periodic stimulation train. Interestingly, in the study of Alonso-Prieto et al. (2013), resonances at 6 Hz over occipito-temporal regions were observed not only when stimulating at this fundamental frequency, but also at lower-frequency rates leading to 6-Hz harmonic responses (i.e., 6F at 1 Hz, 3F at 2 Hz, 2F at 3 Hz). In all these cases, the response at 6 Hz was associated with the largest amplitude of all harmonics, although no differences between conditions were observed on these harmonic responses. Brain oscillations in this theta band, and in particular at frequency rates at which the effects of visual discrimination of faces are observed (4–8 Hz), have been associated with memory encoding (Buzsaki et al. 1994; Klimesch et al. 1996), prediction of visual detection (Busch and VanRullen 2010; Liebe et al. 2012) and accumulation of evidence in perceptual decision making (van Vugt et al. 2012). This functional role of theta oscillations thus generally agrees with the finding of a 4–6-Hz maximal response when presenting different faces at a periodic rate.

Regardless of the factors underlying the shape of this temporal frequency-tuning function, these observations suggest that this function should be seriously considered. Had we used a stimulation rate of 10 Hz or so in our first EEG investigation, we would have failed to disclose any difference between different and identical faces, i.e., an individual face discrimination response. Similarly, the conclusions of recent studies that have found relatively small repetition suppression effects at 3 Hz that vanish when using emotional faces (Gerlicher et al. 2014) should be qualified. More fundamentally, a close look at the scalp topographies of the two conditions of interest in the EEG frequency-tuning study reveals an interesting observation (Fig. 12). At slow frequency rates (1 or 2 Hz), the domain of transient ERPs, the response focuses on (right) lateral occipital cortices but does not differ between conditions. At rates above 9 Hz, the domain of traditional SSVEP studies, the response focuses on medial occipital electrode sites in both conditions (same and different faces). It is only when using different stimuli within a “middle” range of frequency rates (3–9 Hz) that the response is localized over high-level visual cortices (Fig. 12, Alonso-Prieto et al. 2013).

Fig. 12

Topographical maps (back of the head) of the SNR at the first harmonic when different faces (above) or the same face (below) are presented at every cycle (grand-averaged data across sessions and participants). The difference between these two conditions is shown in Fig. 10. The color scales are adapted to the maximal SNR value at each frequency. For different faces (above), note the peaks of activity at right occipito-temporal sites until about 9.09 Hz, and the second peak of activity over medial occipito-parietal channels (i.e., from 3.03 Hz until 16.66 Hz). The highest SNR response was recorded at 5.88 Hz. For identical faces (below), the peak of activity at right occipito-temporal sites is clearly visible only at low-frequency rates (1–2 Hz), and the scalp topographies are very similar from 4.0 Hz onward, with a focus on medial occipito-parietal sites. This figure suggests that the optimal temporal frequencies for capturing individual face discrimination processes range from 3 to 9 Hz

This observation concurs with previous studies showing that the spatial distribution of the fast periodic visual response is extremely sensitive to the driving frequency and that responses are recorded over parietal, temporal and frontal lobes only over limited frequency ranges in comparison to occipital responses (Narici et al. 1998; Srinivasan et al. 1999; Ding et al. 2006; Srinivasan et al. 2006). The strong dependence of responses far from primary visual areas on the flicker frequency does not easily fit a framework in which the “SSVEP” is generated by only localized occipital sources. More specifically for our purpose, these observations suggest that transient and fast periodic EEG studies of individual face discrimination—and visual discrimination in general—may have missed the ideal temporal frequency rates to investigate and understand this function.

Fast oddball face discrimination

In all the FPVS studies discussed so far, the two conditions of interest—different faces and identical faces—are recorded at different times, in different sequences (Rossion and Boremanse 2011; Rossion et al. 2012; Alonso-Prieto et al. 2013). Measuring discrimination based on a subtraction between EEG amplitude for “different” and “same” faces assumes that the two conditions differ only in one discrete feature, the variation of face identity during a block. Yet, when they are not recorded simultaneously, this might not necessarily be the case and there is always a concern that various factors can affect one condition more than the other. For instance, participants may not maintain the same level of attention/arousal to sequences of the exact same faces compared to those of different faces.

An elegant way to overcome this limitation is provided by measuring the discrimination between base and oddball stimuli differing on the dimension of interest (here, face identity) within the same periodic sequence. For instance, one can present the same individual face (A) at a frequency of 5.88 Hz (base frequency f) for 60 s and introduce different oddball faces (B, C, D…) at a rate of every 5th base face, i.e., at an oddball frequency of 1.18 Hz (= f/5 = 5.88 Hz/5) (Fig. 13) (Liu-Shuang et al. 2014; see movie 2 in supplementary material, for 15 s of stimulation). In these conditions, a robust measure of individual face discrimination is observed at the exact frequency at which face identity changes (every 850 ms, or 1.18 Hz) and at its harmonics (e.g., 2f/5 = 2.35 Hz). This response is localized over the right occipito-temporal cortex and differs from the large response at the base frequency, peaking over medial occipital sites (Oz, Fig. 14). The discrimination response can be obtained in a few minutes in every individual brain, without asking participants to explicitly attend to the faces or to the oddball faces. When the exact same faces are presented upside-down or with their contrast reversed, the individual face discrimination response at the oddball frequency is substantially reduced. However, importantly, the response at the base frequency remains virtually unaffected (Fig. 15).
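The structure of such a sequence, and the frequencies at which responses are expected, can be sketched as follows (Python, standard library only). The function name and the way oddball identities are cycled through a fixed list are assumptions made for the illustration; in an actual experiment the oddball faces would typically be drawn from a larger pool.

```python
import itertools

def oddball_sequence(n_stimuli, base="A", oddballs=("B", "C", "D"), period=5):
    """Return face labels with an oddball identity at every `period`-th position."""
    odd = itertools.cycle(oddballs)
    return [next(odd) if (i + 1) % period == 0 else base for i in range(n_stimuli)]

base_hz = 5.88                   # base stimulation frequency f
period = 5                       # one oddball every 5th face
oddball_hz = base_hz / period    # f/5, about 1.18 Hz
harmonics = [round(k * oddball_hz, 2) for k in range(1, 5)]

print("".join(oddball_sequence(15)))
print(f"oddball frequency = {oddball_hz:.2f} Hz, first harmonics = {harmonics}")
```

Note that the fifth harmonic of the oddball frequency (5f/5) coincides with the base frequency itself, so a response at that frequency cannot be attributed to the oddball alone.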

Fig. 13

a The fast periodic oddball paradigm used by Liu-Shuang et al. (2014). Faces are presented by sinusoidal contrast modulation at a rate of 5.88 cycles per second. At fixed intervals of every 5th base face (= 5.88/5 Hz = 1.18 Hz), different oddball faces are presented. b SNR spectrum of the right occipito-temporal ROI for faces (grand-average, four sequences of 60 s for each of 12 participants). The channels composing this ROI are indicated with black dots on the 3D head in the upper right. On the SNR spectrum, only significant oddball responses are labeled. Note that while the 1.18-Hz response appears small, it has a SNR of 1.49, corresponding to a 49 % response increase (see also Fig. 14). Below the spectrum, 3D topographies of each harmonic response are displayed. The largest oddball response is observed over the occipito-temporal regions, with a clear right hemisphere lateralization

Fig. 14

a From Liu-Shuang et al. (2014). Grand-average SNR spectrum for a single trial (60 s of recording) for the right occipito-temporal ROI (channels shown on the blank 3D topography on the left). Only significant responses are labeled (Z-score >1.96 computed by using the average and standard deviation of surrounding frequency bins). b 3D topographies of oddball responses in individual participants (average of four trials). SNR values of harmonic frequencies that are significant at the group level are averaged (1F/5 = 1.18 Hz until 8F/5 = 9.41 Hz). The color-scale represents response magnitude according to individual participants’ peak SNR value, which is shown above each topographical map

Fig. 15

SNR spectra of a right occipito-temporal region (Liu-Shuang et al. 2014) for stimulation with grayscale and contrast-reversed faces. On the SNR spectra, only significant peaks (Z-score >1.96) are labeled. Below the spectra, 3D topographies of each harmonic are shown at their individual color scales (the scales’ maxima are shown above each topography). a In the grayscale condition, all the responses related to face discrimination are significant from the fundamental oddball frequency (1.18 Hz) until the 6th harmonic (7.05 Hz). b For contrast-reversed faces, not only are there fewer significant responses, but these are also of much smaller magnitude. Contrary to grayscale faces, there is no clear right lateralization. Importantly, the response at the base rate (5.88 Hz) does not differ in magnitude between conditions (grayscale vs. contrast-reversed), the difference being limited to the individual face discrimination response

This fast oddball periodic paradigm (Heinrich et al. 2009; Liu-Shuang et al. 2014) has all the advantages of the FPVS approach (objectivity, sensitivity and implicit measure) and can isolate a discrimination response without relying on a subtractive operation between separately recorded conditions. Instead, a periodic response at the oddball frequency is already a measure of a difference in response magnitude to the base and oddball stimuli. Hence, a significant signal at the oddball frequency is sufficient in itself to infer visual discrimination. This method is flexible, since the experimenter can easily manipulate the dimension and the finesse of the discrimination by varying the nature of the base and oddball stimuli. Moreover, the different behavior of the base and oddball frequencies across conditions and their differences in scalp topography (medial occipital for the 5.88-Hz base frequency and right occipito-temporal for the 1.18-Hz oddball frequency) imply that these responses reflect distinct processes: one related to individual face discrimination (oddball frequency) and the other the general responsiveness of the system to visual pattern stimulation (base stimulation frequency).
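To make concrete how a signal at the oddball frequency can be declared significant without any subtraction between conditions, the following minimal sketch (Python, standard library only) computes a Z-score for one frequency bin from the mean and standard deviation of surrounding bins, with the 1.96 threshold mentioned in the figure captions. The spectrum values, the bin index, the number of neighboring bins and the helper name are hypothetical and chosen only for illustration.

```python
import statistics

def z_score(spectrum, k, n_neighbors=10, skip=1):
    """Z-score of bin k against the mean and SD of surrounding bins."""
    idx = [k + s * d for d in range(skip + 1, skip + 1 + n_neighbors) for s in (-1, 1)]
    neighbors = [spectrum[i] for i in idx if 0 <= i < len(spectrum)]
    return (spectrum[k] - statistics.mean(neighbors)) / statistics.stdev(neighbors)

# Hypothetical amplitude spectrum (arbitrary units): a fluctuating noise floor...
spectrum = [1.0 + 0.1 * ((i * 7) % 5 - 2) for i in range(60)]
# ...with an added response at the (hypothetical) bin of the oddball frequency.
oddball_bin = 30
spectrum[oddball_bin] += 1.0

z = z_score(spectrum, oddball_bin)
print(f"z = {z:.2f}, exceeds 1.96: {z > 1.96}")
```

The same computation can be repeated at each harmonic of the oddball frequency, since all of these frequencies are known in advance from the design of the stimulation sequence.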

The potential neural mechanisms that generate the periodic oddball face discrimination response could be similar to those underlying the mismatch negativity response in the auditory (MMN, Näätänen et al. 1978) and visual domains (Pazo-Alvarez et al. 2003; Kimura et al. 2011), although there are many reasons to believe that the oddball periodic response cannot be related directly to a transient ERP component such as the MMN. Based on what is known about the discrimination of the specific visual category used here, i.e., faces, one can speculate about the neural mechanisms that underlie the oddball response: perhaps different subgroups of neurons within the same population respond to the repeated base face compared to the periodically introduced different oddball faces (e.g., Young and Yamane 1992; Abbott et al. 1996; Leopold et al. 2006). These two types of responses can be distinguished at a larger scale on the scalp only if the amplitude of the periodic response at the rate of face change (1.18 Hz) differs from that of the responses at the base stimulation frequency (5.88 Hz) (see Fig. 6 in Liu-Shuang et al. 2014). Thus, similarly to other studies using FPVS (Rossion and Boremanse 2011; Rossion et al. 2012; Alonso-Prieto et al. 2013), the fast periodic oddball paradigm probably shares the same neural mechanisms as classic repetition suppression paradigms (Henson 2003; Grill-Spector et al. 2006; Jacques et al. 2007), but with the advantages of increased objectivity, sensitivity and direct isolation of the differential response of interest.

Summary and perspectives

The studies summarized in this review have provided novel information regarding the temporal frequency-tuning of individual face discrimination, indicating that at least 120 ms is necessary to process a face at the individual level and that a duration of 170 ms seems to be sufficient for a full individualization process. In addition, contrary to the conclusions of many fMRI adaptation studies, the face-selective area of the pSTS appears to take part in individual face discrimination, but only at fast rates of stimulation. However, the main contribution of this review has been to highlight the practical advantages of the FPVS approach to measure a fundamental function of the human brain: visual discrimination of individual faces. The studies reviewed suggest that by using a relatively fast (above 3 faces/second) periodic input, one can record an electrical brain signature of individual face discrimination that has many advantages over behavioral or conventional EEG measures in response to transient visual stimulation. These advantages have been summarized in “The strengths of FPVS” section, and the studies presented after this section have only reinforced these advantages: (1) the objective signature of individual face discrimination (i.e., at an experimentally defined frequency), (2) the direct quantification of this response and estimation of its SNR, (3) the very high SNR obtained even for short acquisition durations and (4) the fact that the visual response is measured implicitly and thus can be measured by means of the same paradigm in all populations. The fast periodic oddball paradigm presented in the “Fast oddball face discrimination” section adds to these advantages the possibility to measure a visual discrimination response without having to compare the response to two kinds of stimuli recorded at different times. Because of these advantages, the approach offers a powerful alternative to behavioral measures of individual face discrimination, which are relatively slow to collect and confounded by cognitive and decisional factors.

The approach presented here is only constrained by a few factors. First, the input needs to be periodic. This is a potential weakness because periodicity means predictability and, thus, a response that could be potentially confounded by an observer’s expectations. However, the fast rate of the input and the possible changing properties limit the potential impact of top-down expectancies. For instance, participants tested with the fast oddball paradigm notice that some faces change from the base face, but they are unable to determine the periodicity of the oddball face. A second factor to consider is the length of visual stimulation. A long sequence of stimulation—as used in all studies presented here—is undoubtedly advantageous in terms of SNR. However, short independent trials rather than long sequences may be useful or mandatory in some experimental paradigms. Fortunately, since the temporal dynamics of individual face discrimination is extremely fast, as illustrated in the “Neural mechanisms and temporal dynamics” section, a long sequence of stimulation is not mandatory. For instance, short oddball trials can provide high SNR responses (see Fig. 6 in Liu-Shuang et al. 2014). Other aspects of the approach, such as the use of sinusoidal rather than square wave stimulation, are not central as long as the input is periodic.

As mentioned above, the strengths of the approach outlined here, at least when using EEG or MEG, are not in providing information about the neural basis (where) and the time-course (when) of individual face discrimination. Yet, not all information about “where” and “when” is lost with this approach. First, the signature of individual face discrimination on the scalp is remarkably consistent across studies and across participants, indicating a dominance of occipito-temporal regions, in particular in the right hemisphere. Given that the SNR is much higher with FPVS than with transient stimulation, it is reasonable to believe that constrained or unconstrained source localization procedures would work better with periodic responses than with standard ERP/ERMF measures in order to identify the regions coding for individual faces (see Appelbaum et al. 2006, 2010 for EEG source localization of periodic responses in fMRI-defined functional regions of the visual cortex). Nevertheless, fMRI studies indicate that a large set of areas are involved in this function, all along the ventral occipito-temporal cortex. Thus, it is likely that the resolution of this issue will come from fMRI or direct intracerebral recordings in the human brain rather than measurements on the scalp. Second, there is more information about timing in fast periodic visual EEG responses than usually thought. For instance, the temporal frequency-tuning function that has been discussed in “The frequency-tuning function of individual face discrimination” section provides timing information (the time it takes to process a stimulus) that would be difficult to obtain with standard ERP measures. Also, for frequency rates of interest of up to 4 Hz at least, the relative difference in time between conditions can be estimated by analyzing the phase of the response, revealing meaningful and consistent timing delays between upright and inverted faces for instance (Fig. 6).
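The conversion from a phase difference to a relative time delay follows from the standard relation Δt = Δφ / (2πf). The Python sketch below illustrates it with a hypothetical helper and a hypothetical phase lag of 30 degrees; these values are not taken from the studies reviewed.

```python
import math

def phase_to_delay_ms(phase_diff_rad, rate_hz):
    """Convert a phase difference at the stimulation frequency into a time delay (ms)."""
    # Wrap the phase difference to [-pi, pi): delays are only defined modulo one cycle.
    wrapped = (phase_diff_rad + math.pi) % (2 * math.pi) - math.pi
    return 1000.0 * wrapped / (2 * math.pi * rate_hz)

# Hypothetical example: a 30-degree phase lag between two conditions at 4 Hz.
delay = phase_to_delay_ms(math.radians(30), 4.0)
print(f"{delay:.1f} ms")
```

Because phase is circular, such an estimate is unambiguous only for delays shorter than half a stimulation cycle (125 ms at 4 Hz), which is one reason why the approach is easier to apply at relatively low stimulation rates.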

Moreover, measuring individual face discrimination implicitly, i.e., while participants complete an entirely orthogonal task, or no task at all, and are unaware of the experimental manipulation, opens the possibility of using the approach with participants who are unable to provide explicit discrimination responses due to decisional, motor or cognitive limitations, such as patients or infants, and of comparing their response to that of typical adults. Nevertheless, in the latter population, it will be important in future research to relate the implicit measures of individual face discrimination obtained with FPVS to behavioral measures of face processing, as obtained in the Benton face matching test (Benton and Van Allen 1968) or the Cambridge Face Memory Test (CFMT, Duchaine and Nakayama 2006a), for instance. Given the wide interest in inter-individual variability in face recognition performance in the normal adult population (Bowles et al. 2009; Wilmer et al. 2010) and in characterizing cases of so-called developmental or congenital prosopagnosia (Behrmann and Avidan 2005; Duchaine and Nakayama 2006b), an approach such as FPVS that can characterize a high-level visual function rapidly and more objectively than behavioral measures or transient ERP responses in such populations (e.g., Towler and Eimer 2012) may be particularly useful.

Conclusions

To conclude, this paper has provided a summary of the recent use of fast periodic visual stimulation (FPVS) to make significant progress in understanding visual discrimination of individual faces. Besides individualization, which is probably the most fine-grained, i.e., difficult, process to perform on a face, there are many other signals that can be extracted from faces for social categorization (e.g., age, sex, ethnic origin and expression) and for which an efficient and implicit measure of discrimination, such as that provided by fast periodic visual stimulation, could prove extremely useful in the future. In the same vein, the FPVS approach could be relatively easily implemented to test visual discrimination of nonface objects and letter strings, opening a real avenue for future research on the human brain’s high-level visual discrimination function in general.