Introduction

Human visual experience is strongly modulated by culture. For over 20 years, cross-cultural research in cognitive science and, more recently, in neuroscience has shown that culture shapes the way we perceive the world (for a review, see Han et al. 2013). Studies comparing Western Caucasian (WC) and East Asian (EA) observers have provided convergent evidence on the cultural perceptual biases characterizing the cognitive styles of those populations (e.g., Nisbett et al. 2001; Nisbett and Miyamoto 2005). Westerners preferentially focus on local information in object (e.g., Masuda and Nisbett 2001), scene (e.g., Masuda and Nisbett 2006), and face perception (e.g., Blais et al. 2008; Caldara et al. 2010). In contrast, individuals from EA cultures, such as China or Japan, display a perceptual bias towards global information processing. Recent evidence has suggested that this cultural contrast might rely on culture-specific tuning towards visual spatial frequency information (Miellet et al. 2013). More precisely, WC observers preferentially use high spatial frequency information from foveal vision (e.g., see Miellet et al. 2013 for evidence from face recognition). In contrast, EA observers preferentially process contextual information by relying on extra-foveal vision during face recognition (Miellet et al. 2013), and for change detection of both low-level visual stimuli (e.g., color blocks; Boduroglu et al. 2009) and complex real-world stimuli (e.g., natural scenes; Masuda and Nisbett 2001).

These perceptual tunings have been related to attentional differences across cultural groups (Nisbett et al. 2001). A series of studies has consistently demonstrated that WC observers attend to and process local features more effectively, while EA observers exhibit a global attention bias (McKone et al. 2010; Nisbett and Miyamoto 2005; Hedden et al. 2008; Kitayama et al. 2003). For example, Kitayama et al. (2003) presented Westerners and Easterners with a vertical line within a square frame and subsequently asked the observers to report the length of the line (i.e., rod-and-frame task). In the task requiring local selective attention (i.e., reporting the absolute length without reference to the surrounding frame), WC observers were more accurate than EA observers, whereas EA observers outperformed WC observers in the task requiring global selective attention (i.e., reporting the relative length with reference to the surrounding frame). EA observers are also more distracted by unrelated global information and faster at detecting targets at the global level compared to Westerners (Boduroglu et al. 2009; Petrova et al. 2013), which suggests they might have a global selective attention bias. Within this framework, McKone et al. (2010) used Navon stimuli to directly quantify the global/local attention bias between Westerners and Easterners. Navon figures are hierarchical stimuli comprising a large global shape made up of small local shapes (Navon 1977). One of the main advantages of these images is that participants viewing the very same stimulus (thus perfectly matched in terms of low-level visual properties) can be cued to, or can show a visual preference for, either the local or the global elements (McKone et al. 2010; Navon 1977). Interestingly, McKone et al. (2010) reported that only East Asian observers identified target letters faster when they appeared at the global level. These authors attributed this performance advantage to a more effective global-selective attention in EA compared to WC observers (McKone et al. 2010).

However, the difference in attention selectivity between cultures has been questioned in a number of studies. For example, a direct replication of Kitayama et al. (2003) reported that both WC and EA observers performed better at tasks requiring global- rather than local-selective attention (Zhou et al. 2008). Eye tracking results also showed that EA observers do not overtly direct more attention (i.e., fixations) towards global information than WC observers (Evans et al. 2009; Miellet et al. 2010; Rayner et al. 2009, 2007; but see Chua et al. 2005). In other cross-cultural studies using Navon figures, Caparos et al. (2013) tested observers from the Himba culture (i.e., a remote culture from North Namibia, see Davidoff et al. 2008) and reported a perceptual bias to local features in this population. However, despite this very strong local perceptual bias, the Himba did not underperform on global-selective tasks compared to Western controls (Caparos et al. 2013). Altogether, the extent to which attentional processes play a key role in the cultural differences reported above, and the presence of this perceptual bias across Western and Eastern cultures, remain to be clarified.

The electroencephalography (EEG) technique is highly relevant to probe this question and identify whether and when attentional processes drive such cultural perceptual biases in global/local processing. Notably, attention modulates a particular electrophysiological signature in human observers: the P1 event-related potential (ERP) (Hillyard et al. 1973; Voorhis and Hillyard 1977; for a review, see Luck et al. 2000). The P1 is a positive deflection peaking roughly 100 ms after stimulus onset on the occipital scalp surface, larger for attended than unattended information. Surprisingly, studies directly comparing two groups of observers did not reveal differences on this component (Goto et al. 2010; Lewis et al. 2008). For example, Goto et al. (2010) manipulated the semantic incongruity between objects and scenes. They only observed cultural differences in the ERP amplitudes at later stages of information processing (i.e., N400 components; Kutas and Hillyard 1980; Holcomb and Neville 1991). While methodological shortcomings such as significant differences in low-level visual properties of the stimuli should be taken into consideration (Petrova et al. 2013), conventional ERP analysis relying on absolute amplitude differences across conditions might not be sensitive enough to reveal early effects (see Vizioli et al. 2010). In order to increase the sensitivity of the electrophysiological signals, here we adopted one of the most powerful methods used in the neurophysiological literature, which relies on the neural repetition effect. Adaptation is a well-established paradigm used to reveal the nature of information coding at the perceptual and neurophysiological levels (for a review, see Grill-Spector et al. 2006). In this framework, “repetition suppression” (RS) is the reduction of neural activity, in stimulus-related regions, associated with the presentation of two or more stimuli in rapid succession (Grill-Spector et al. 2006; Henson 2003; Henson and Rugg 2003; Wiggs and Martin 1998). It has been argued that RS represents a “sharpening” mechanism within the neural population that engages in the processing of the repeated stimulus (Grill-Spector et al. 2006; Wiggs and Martin 1998). Thus, the amount of RS reflects the capacity of neural populations to discriminate different information and can be viewed as a novelty detection mechanism and a more efficient coding of the sensory input (Caharel et al. 2009; Grill-Spector et al. 2006; Jenkins et al. 2010; Vizioli et al. 2010). RS has been reported in EEG and fMRI studies across a variety of cognitive tasks (e.g., Gutnisky and Dragoi 2008; Müller et al. 1999; Todorovic and de Lange 2012; Vizioli et al. 2010). Interestingly, RS has recently been observed in cross-cultural studies on the blood oxygenation level-dependent (BOLD) signal with functional magnetic resonance imaging (Goh et al. 2007; Jenkins et al. 2010). Jenkins et al. (2010) found a significantly larger RS to incongruent scenes than to congruent scenes in bilateral occipital cortex for Chinese participants. This observation suggests that EA observers engage more in object processing when the object appears in incongruent than in congruent contexts, which also indicates a bias towards global processing in this population. However, the temporal dynamics of this cultural perceptual bias are still largely unexplored.

To clarify this issue, we took full advantage of the conjoint use of Navon stimuli with an EEG adaptation paradigm, in order to map out the temporal dynamics of global/local shape processing in WC and EA observers. EA and WC observers viewed sequences of two Navon figures (an adaptor and a target, see Fig. 1), while we recorded their scalp EEG signals. Importantly, Navon images were composed of elementary geometric shapes equally familiar to both groups of observers (and not of Roman letters, with which Westerners have greater experience). Participants performed an active categorization task requiring the selective detection of a potential change occurring on either the global or the local features of the target image. To control for electrophysiological artifacts generated by saccades, we simultaneously recorded eye movements, and participants were trained to fixate at the center of the screen. Moreover, instead of applying a conventional electrophysiological analysis that would only focus on the target ERPs, we adopted the novel single-trial analysis method developed by Vizioli et al. (2010). Electrophysiological responses elicited by the target shapes were subtracted from those elicited by the adaptor shapes independently at the single-trial level (see Methods). To better estimate their temporal sensitivity, we then used a data-driven spatio-temporal analysis carried out independently at all electrodes and all time-points. Our results showed that EA observers were more efficient at detecting global changes compared to WC observers. This behavioral effect was paired with larger RS responses for global changes as early as 80 ms after stimulus onset, on the attention-sensitive P1 component. These results confirm and refine the cultural perceptual bias for global processing in Easterners, a process that occurs very early and is rooted in selective attention.

Fig. 1

Experimental procedure. The color of the target indicates the condition in the current study (as shown in the legend). (Color figure online)

Methods

Participants

Twelve East Asians (7 females) and 12 Western Caucasians (6 females) took part in the study. All participants were students from the University of Glasgow with normal or corrected-to-normal vision. Participants’ ages ranged from 20 to 29 years (mean age 25.7). All the EA participants were from Mainland China, and had no previous experience with a Western country. The maximum duration of residence in the UK for the EA participants upon testing was less than 6 months. The experiment received the approval of the local ethical committee and all participants provided written informed consent.

Stimuli and procedure

Stimuli were hierarchical Navon figures composed of five different shapes: circle, cross, diamond, square, and triangle (Navon 1977; see Fig. 1). We employed geometric shapes to avoid differences between the two groups of observers in familiarity with alphabet letters. The local shapes (0.6° × 0.6° of visual angle at a viewing distance of 70 cm) were equally spaced and arranged to create the global shapes (5.8° × 5.8°), resulting in a total of 25 different stimuli. To minimize the effect of low-level visual adaptation stemming from identical images, the target Navon stimulus was slightly larger than the adaptor (0.72° × 0.72° at the local level and 6° × 6° at the global level). Stimuli were presented on a Dell P1130 19″ CRT monitor with an 800 × 600 pixel gray background, at a refresh rate of 170 Hz.
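As an illustration of the stimulus geometry described above, the following minimal Python sketch converts the reported visual angles into physical on-screen sizes at the 70 cm viewing distance and enumerates the 25 global/local shape combinations. The function and variable names are hypothetical and are not taken from the original experiment code; the conversion uses the standard relation size = 2 · d · tan(θ/2).

```python
import math
from itertools import product

VIEWING_DISTANCE_CM = 70.0  # viewing distance reported in the text


def visual_angle_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# The five shapes named in the text; each can serve as global or local shape.
SHAPES = ["circle", "cross", "diamond", "square", "triangle"]

# Every (global shape, local shape) pairing: 5 x 5 combinations.
STIMULI = list(product(SHAPES, SHAPES))

# Adaptor sizes reported in the text (degrees of visual angle).
local_cm = visual_angle_to_cm(0.6)
global_cm = visual_angle_to_cm(5.8)

if __name__ == "__main__":
    print(f"{len(STIMULI)} stimuli")
    print(f"local element: {local_cm:.2f} cm, global shape: {global_cm:.2f} cm")
```

Converting these sizes to pixels would additionally require the physical dimensions of the visible screen area, which are not given here.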

At the beginning of the experiment, participants sat in a dimly lit, sound-attenuated, electrically shielded booth. Each trial started with a fixation cross (0.3°) presented at the center of the screen for 300 ms. The adaptor was presented for 800 ms, followed by an interval whose duration ranged randomly between 150 and 300 ms. The target was then presented for 600 ms. Following the offset of the target stimulus, participants were required to respond by pressing a key on a standard computer keyboard. A randomized inter-trial interval between 1,200 and 1,500 ms preceded the beginning of the following trial.

We used a forced-choice task. During the experiment, participants categorized the feature change between each pair of adaptor and target. There were four conditions: (1) identical; (2) global congruent, where only the local features were changed; (3) local congruent, where only the global features were changed; (4) different, where both local and global features were changed. Upon the offset of the target, participants assigned the trial to one of the conditions by pressing one of the four pre-assigned buttons, corresponding, respectively, to the letters “d”, “f”, “j”, and “k”. An example sheet (similar to Fig. 1) was provided prior to beginning the experiment to explain the four possible conditions. Participants were instructed to respond as quickly and as accurately as possible. To prevent perceptual and decisional bias, we counterbalanced all 25 stimuli so that each stimulus appeared in the same proportion as an adaptor and as a target in each condition; the trial sequence was also randomized. Moreover, to minimize potential EEG artifacts generated by eye movements, we monitored and processed participants’ eye movements online for trial validation. Trials containing a blink or saccade during stimulus presentation were excluded. We defined a trial as valid if the gaze drift during stimulus presentation was less than 1° of visual angle away from the fixation cross. Prior to commencing the experiment, participants performed 20 practice trials to familiarize themselves with the task. The experiment ended when a participant completed 600 valid trials (4 conditions × 150 repetitions). Participants were given a short break every 100 trials. The whole experiment was programmed in Matlab 2007b, using the Psychophysics Toolbox (PTB-3, Brainard 1997; Kleiner et al. 2007).
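The condition labels and the trial-validation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Navon` type, the function names, and the representation of gaze samples as distances from fixation (in degrees) are hypothetical and do not come from the original Matlab implementation.

```python
from typing import Iterable, NamedTuple


class Navon(NamedTuple):
    """A hierarchical stimulus: one global shape built from one local shape."""
    global_shape: str
    local_shape: str


def classify_change(adaptor: Navon, target: Navon) -> str:
    """Label an adaptor/target pair with one of the four conditions."""
    same_global = adaptor.global_shape == target.global_shape
    same_local = adaptor.local_shape == target.local_shape
    if same_global and same_local:
        return "identical"
    if same_global:
        return "global congruent"  # only the local features changed
    if same_local:
        return "local congruent"   # only the global features changed
    return "different"             # both levels changed


def is_valid_trial(gaze_offsets_deg: Iterable[float],
                   blink_or_saccade: bool,
                   max_drift_deg: float = 1.0) -> bool:
    """Accept a trial only if there was no blink or saccade and every gaze
    sample stayed less than `max_drift_deg` away from the fixation cross."""
    if blink_or_saccade:
        return False
    return all(offset < max_drift_deg for offset in gaze_offsets_deg)
```

For example, an adaptor made of a global square of local circles followed by a target made of a global square of local crosses would be labelled "global congruent", because the global shape is preserved while the local shape changes.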

Eye tracking and EEG recording

Eye movements were recorded by means of a Desktop-Mount EyeLink 2K eyetracker (SR Research Ltd., Mississauga, Ont., Canada) with a temporal resolution of 1,000 Hz, a spatial resolution of 0.01° of visual angle and an average gaze position error of about 0.25°. Eye movements were recorded monocularly from the dominant eye. The eye movement data were recorded via Matlab (R2006a), using the Psychophysics (PTB-3) and EyeLink Toolbox extensions (Brainard 1997; Cornelissen et al. 2002). Calibration and validation of eye fixation were conducted before the experiment and repeated whenever necessary during the procedure.

The EEG data were recorded continuously at 1,024 Hz from a 128-channel Biosemi Active Two EEG system (Biosemi, Amsterdam, Netherlands). Ag/AgCl electrodes were placed in a nylon cap following the 10-5 system (Oostenveld and Praamstra 2001). Data were band-pass filtered (0.1–100 Hz) online. Electrode impedance was kept within ±25 kΩ. The recording reference and ground consisted of two active electrodes [common mode sense (CMS); driven right leg (DRL). For further information see www.biosemi.com]. Participants were asked to minimize blinking, head movement, and other body movement.

Behavioral and EEG analysis

We carried out 2 (groups of observers) × 4 (feature changes) mixed-model ANOVAs independently on reaction time (RT) and accuracy rate (ACC). Group of observers (WC and EA) was the between-subjects factor and feature change (identical; global congruent; local congruent; and different) was the within-subjects factor. Moreover, to take both measurements into account, the inverse efficiency score (IES; Bruyer and Brysbaert 2011; Townsend and Ashby 1978, 1983) was calculated as:

$$ \text{IES} = \text{RT} / \text{ACC}. $$
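As a brief worked illustration of this formula, the Python sketch below computes the IES from a mean reaction time and an accuracy expressed as a proportion. The function name and the numerical values are hypothetical and chosen only to show the arithmetic; they are not data from this study.

```python
def inverse_efficiency(mean_rt_ms: float, accuracy: float) -> float:
    """Inverse efficiency score: RT divided by the proportion of correct responses.

    `accuracy` is a proportion in (0, 1]; lower scores mean more efficient
    performance, since errors inflate the effective reaction time.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be a proportion in (0, 1]")
    return mean_rt_ms / accuracy


# Hypothetical example: two observers with the same RT but different accuracy.
careful = inverse_efficiency(600.0, 1.0)
error_prone = inverse_efficiency(600.0, 0.75)
```

With these illustrative numbers, the observer who answers correctly on only 75 % of trials receives a higher (worse) score than the observer with perfect accuracy, even though their raw reaction times are identical.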

Only trials with a correct behavioral response were analyzed (93.7 % for WC and 95.5 % for EA, detailed in the Results section). EEG data were preprocessed and analyzed using Matlab and the EEGLAB toolbox (Delorme and Makeig 2004). The raw signal was referenced off-line to an average reference, and low-pass filtered at 40 Hz. Noisy electrodes were excluded and interpolated, on a subject-by-subject basis, using a spline interpolation algorithm implemented in EEGLAB. EEG data were epoched from −100 to 500 ms relative to stimulus onset, separately for adaptor and target, independently per condition. The baseline was defined as the 100 ms preceding stimulus onset. Baseline correction was applied by removing the average baseline amplitude from every time point independently for each epoch. To fully exploit the paired nature of the adaptation design, trials were only accepted if the signals elicited by both the adaptor and the target stimuli were below the rejection threshold. Single-trial repetition suppression (stRS) was then calculated for the accepted trials only, by subtracting the target epoch from the adaptor epoch. For more details about the procedure, see Vizioli et al. (2010).
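The baseline correction, paired trial rejection, and adaptor-minus-target subtraction described above can be sketched with NumPy as follows. This is a simplified illustration rather than the original Matlab/EEGLAB pipeline: the array layout (trials × channels × time points), the function names, and in particular the 100 µV absolute-amplitude rejection threshold are assumptions, since the rejection criterion is not specified in the text.

```python
import numpy as np


def baseline_correct(epochs: np.ndarray, times_ms: np.ndarray) -> np.ndarray:
    """Subtract the mean pre-stimulus amplitude from each epoch and channel.

    epochs: array of shape (n_trials, n_channels, n_times)
    times_ms: array of shape (n_times,), time relative to stimulus onset
    """
    baseline = (times_ms >= -100) & (times_ms < 0)
    return epochs - epochs[:, :, baseline].mean(axis=2, keepdims=True)


def single_trial_rs(adaptor: np.ndarray, target: np.ndarray,
                    times_ms: np.ndarray, reject_uv: float = 100.0):
    """Return (stRS, keep_mask) for paired adaptor/target epochs.

    A trial is kept only if BOTH its adaptor and target epochs stay below
    the (assumed) absolute-amplitude threshold `reject_uv`.
    """
    adaptor = baseline_correct(adaptor, times_ms)
    target = baseline_correct(target, times_ms)
    ok_adaptor = np.abs(adaptor).max(axis=(1, 2)) < reject_uv
    ok_target = np.abs(target).max(axis=(1, 2)) < reject_uv
    keep = ok_adaptor & ok_target
    # stRS: target epoch subtracted from the adaptor epoch, trial by trial
    return adaptor[keep] - target[keep], keep
```

Because the subtraction is performed within each trial before any averaging, positive stRS values on a positive component indicate a smaller response to the target than to the adaptor.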

We carried out the same 2 (group of observers) × 4 (feature change) mixed-model ANOVAs independently at all electrodes and all time-points on the stRS amplitude. To correct for multiple comparisons, we used robust statistics, including bootstrap (Wilcox 2005) and a multivariate clustering technique (Maris and Oostenveld 2007; Vizioli et al. 2010). Within each group, we first centered each stRS trial on the mean amplitude so that each condition had a mean of zero. We then used bootstrap, sampling subjects with replacement, to derive an estimate of the sampling distribution under the null hypothesis of no difference across the conditions’ means. We carried out the mixed-model ANOVAs (as described above) independently at all electrodes and time-points on the centered bootstrapped stRS. The significant F values (p < 0.05) were then grouped into spatiotemporal clusters (Maris and Oostenveld 2007; Vizioli et al. 2010). We computed the sum of F values in every cluster and selected the maximum cluster sum. After repeating the same procedure 500 times, we sorted all the maximum cluster sums according to their values and selected the 95th percentile as the cluster threshold. The F-value sums of the original ANOVA clusters were then compared with the bootstrap cluster threshold. If an observed cluster sum was equal to or larger than the threshold sum obtained under H0, all of the time-points and electrodes contained in that cluster were considered significant (Vizioli et al. 2010).
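The cluster-thresholding logic of this procedure can be illustrated with the simplified Python sketch below. It makes several assumptions that differ from the actual analysis: clusters are formed only from runs of consecutive significant time-points within each electrode (the original method also groups neighboring electrodes), the F maps and significance masks are assumed to have been computed elsewhere, and all function names are hypothetical.

```python
import numpy as np


def cluster_sums(f_map: np.ndarray, sig_mask: np.ndarray) -> list:
    """Sum F values over each run of consecutive significant time-points.

    f_map, sig_mask: arrays of shape (n_electrodes, n_times).
    Simplification: clusters are temporal only, one electrode at a time.
    """
    sums = []
    for f_row, sig_row in zip(f_map, sig_mask):
        running, in_cluster = 0.0, False
        for f_value, significant in zip(f_row, sig_row):
            if significant:
                running += float(f_value)
                in_cluster = True
            elif in_cluster:
                sums.append(running)
                running, in_cluster = 0.0, False
        if in_cluster:  # close a cluster that reaches the end of the epoch
            sums.append(running)
    return sums


def max_cluster_sum(f_map: np.ndarray, sig_mask: np.ndarray) -> float:
    """Largest cluster sum in one (bootstrap) F map, or 0 if no cluster."""
    return max(cluster_sums(f_map, sig_mask), default=0.0)


def cluster_threshold(max_sums_h0, percentile: float = 95.0) -> float:
    """Percentile of the bootstrap distribution of maximum cluster sums."""
    return float(np.percentile(max_sums_h0, percentile))


def significant_clusters(observed_sums, threshold: float) -> list:
    """Flag observed clusters whose F-value sum reaches the H0 threshold."""
    return [s >= threshold for s in observed_sums]
```

In use, `max_cluster_sum` would be applied to each of the 500 F maps obtained from the centered bootstrap samples, `cluster_threshold` to the resulting 500 values, and `significant_clusters` to the cluster sums of the original, uncentered data.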

Results

Behavioral results

The behavioral results are summarized in Fig. 2. The 2 × 4 ANOVAs carried out on RT, ACC and IES showed significant main effects of feature change for RT [F (3, 66) = 52.07, p < 0.05], ACC [F (3, 66) = 30.17, p < 0.05], and IES [F (3, 66) = 49.49, p < 0.05]. Post hoc paired-sample t tests indicated that both groups of observers responded faster and more accurately in the identical condition than in the others. No main effect of group of observers was observed for any of the three measures [RT: F (1, 22) = 0.96; ACC: F (1, 22) = 1.66; IES: F (1, 22) = 1.52. All p > 0.05].

Fig. 2

Results for reaction time (a), accuracy rate (b), and inverse efficiency score (c). Only global congruent and local congruent conditions are shown for the inverse efficiency score (c). Error bars show standard errors of the means

Importantly, we observed a significant interaction of group of observers (2) × feature change (4) for the IES [F (3, 66) = 3.59, p < 0.05]. To clarify this result, we performed post hoc paired-sample t tests on the IES values between the local and global congruent conditions independently for each group of observers. This analysis revealed that WC observers identified local congruent items (M = 673 ms, SD = 293) significantly more slowly than global congruent ones [M = 551 ms, SD = 229, t (11) = 2.79, p < 0.05]. No difference was observed between global and local congruent conditions for EA observers [t (11) = 1.41, p > 0.05].

Single-trial RS results

The minimum number of accepted trials across all subjects and conditions was 100. Figure 3 shows the adaptor ERP, target ERP, and stRS for the electrode where the F value for the group of observers × feature change interaction was maximal (i.e., P4).

Fig. 3

ERP and stRS for P4. Mean ERPs elicited by the adaptor (solid line) and the target shapes (dashed line) for the four conditions (identical, red line; global congruent, green line; local congruent, blue line; different, yellow line) for WC (a) and EA (b) observers. stRS responses are shown in c for WC and d for EA observers. (Color figure online)

After correction for multiple comparisons, the mixed-model ANOVA revealed a significant interaction (group of observers × feature change) at two spatial–temporal clusters (Fig. 4a). The first cluster was located at right occipito–parietal electrodes (Fig. 4b). Its time-window, which ranged from 60 to 110 ms, was within the latency of the P1 component. The F value of the first cluster reached its maximum at 88 ms at electrode P4 [minimum F (3, 66) = 2.24; maximum F (3, 66) = 8.32, p < 0.05]. This was the same electrode where the P1 component reached its maximum amplitude. Post hoc paired-sample t tests showed that global congruent trials elicited significantly larger stRS responses than local congruent trials in EA observers [t (11) = 3.26, p < 0.05]. No significant difference between global and local congruent conditions was observed for WC observers [t (11) = 0.39, p > 0.05. See Fig. 4e].

Fig. 4

Significant interaction of spatial–temporal ANOVAs. a Significant F values of the group of observers × feature change interaction on the stRS amplitude. The x-axis depicts the time points of the stRS epoch, while the y-axis indicates the 128 electrodes. The electrode order is unrelated to their spatial topographical positions. b, c, d Portray the F-value topography maps at the local peaks of the P1 and N2 spatial–temporal clusters. For illustration purposes, only the first two local peaks of the N2 cluster are shown. b Represents the F-value topography of the P1 spatial–temporal cluster at 88 ms (latency of the peak F value), which was maximal at electrode P4. c, d Show F-value topographies of the N2 spatial–temporal cluster at two local F-value peaks (236 and 273 ms). The peak electrodes were CP6 and C2, respectively. e, f, g Illustrate the mean stRS responses of global congruent and local congruent conditions for WC and EA observers at each F-value peak latency at the respective electrodes. For display purposes, stRS on the negative component was multiplied by −1. Thus, more positive values indicate larger repetition suppression. Error bars show standard errors of the means

The second significant spatial–temporal cluster occurred within the time-window ranging from 200 to 350 ms (i.e., the posterior N2 component) and was spatially located around the anterior parietal electrodes (Fig. 4b). F-value topography maps revealed three local maxima with different spatial distributions (see Fig. 4c, d). The first occurred at 236 ms at electrode CP6 with a right-lateralized topography [F (3, 66) = 8.42, p < 0.05, Fig. 4c]. Post hoc paired-sample t tests indicated larger stRS responses to local congruent compared to global congruent conditions for EAs [t (11) = 2.4, p < 0.05] but not WCs [t (11) = 0.18, p > 0.05, Fig. 4f]. The second maximum occurred at 273 ms over centro-parietal electrodes [C2, F (3, 66) = 10.01, p < 0.05, Fig. 4d]. Post hoc paired-sample t tests indicated that WCs displayed larger stRS responses to global congruent compared to local congruent conditions [t (11) = 2.24, p < 0.05]. No significant differences were observed for EAs [t (11) = 0.21, p > 0.05, Fig. 4g]. The third maximum was observed at 312 ms at electrode C5 [F (3, 66) = 9.29, p < 0.05]. Post hoc paired-sample t tests showed significantly larger stRS responses to local congruent compared to global congruent conditions for WCs [t (11) = 4.30, p < 0.05]. No significant differences were observed for EA participants [t (11) = 2.145, p > 0.05].

Discussion

The present study aimed to address whether and when cultural perceptual tunings to global/local information are driven by attention. We directly tested this hypothesis on Western Caucasian (WC) and East Asian (EA) observers while recording their scalp electrophysiological signals during the visual categorization of hierarchical shapes differing in their global/local properties. We used an adaptation paradigm, as well as a single-trial method coupled with robust data-driven analyses on all electrodes and time points. The behavioral results showed that EA observers display a global advantage, processing visual changes in global features more effectively than WC observers. More importantly, our electrophysiological data showed that global/local feature repetition modulates the P1 and N2 ERP components as a function of culture. Specifically, East Asian observers showed greater sensitivity to global congruency on the P1 component compared to WC observers. These observations confirm the existence of, and refine our knowledge on, the cultural attentional biases in visual information sampling, and are consistent with previous observations (Kitayama et al. 2003; McKone et al. 2010). Importantly, this early neurophysiological signature of cultural diversity in Easterners cannot be accounted for by the inherent nature of the stimuli and/or task, as it was not present in Western Caucasian observers. Westerners showed sensitivity to the discrimination of hierarchical Navon shapes at later stages.

Attention modulation on the P1 component

We reported an interaction between the culture of the observers and the neural sensitivity to global/local information coding at early stages of visual information processing. Firstly, only EA participants showed larger stRS responses to global compared to local feature repetition, as early as 80 ms after stimulus onset, within the time window of the P1 ERP component. The P1 is a positive deflection peaking between 70 and 120 ms at central occipital electrodes, which has been consistently related to attention (Luck et al. 2000). The P1 amplitude modulations observed in EA observers suggest that the global precedence in this population is modulated by attention. Importantly, there is a growing body of evidence positing that larger RS responses reflect greater neural coding efficiency (Caharel et al. 2009; Grill-Spector et al. 2006; Vizioli et al. 2010). Although the precise neural mechanisms of RS are still debated, numerous models have been proposed to account for the reduction in neural activity following stimulus repetition (Grill-Spector et al. 2006; Schacter et al. 2007). The most prominent account, the sharpening model, holds that the neuronal representations of a stimulus become less redundant and sparser with repetition (Grill-Spector et al. 2006). Under the same model, RS can also be considered as a neural novelty detection mechanism (Vizioli et al. 2010). Therefore, the larger amount of RS to global features for EA observers can result from a more efficient neural coding of global information at this stage, which is reflected in the sharpening of global feature representations as a function of attention. Moreover, the absence of significant differences in the amount of stRS between global and local feature repetition in Westerners suggests that, at least at early neural stages of processing, there is no preferential tuning to either global or local information coding in this population.

In line with our findings, the sensitivity to global/local selective attention on the posterior P1 component has previously been reported in a number of studies (Han et al. 1997, 1999, 2000; Lin et al. 2008). For example, the priming of interdependent self-construal, which is closely related to global perceptual bias (Kühnen and Oyserman 2002; Lin and Han 2009), enlarged the P1 amplitude to global compared to local features in Chinese participants (Lin et al. 2008). Similarly, Han et al. (1997, 1999, 2000) showed that selective attention to global or local features modified the P1 amplitude in EA observers. The data reported here are in line with previous studies, reinforcing the view of an early sensitivity to global information coding on the P1 component in Easterners only. In contrast, WC observers did not show sensitivity to their preferred (i.e., local) features in this early component. It is worth pointing out that in the current task observers were forced to attend equally to information at both the global and the local levels, since they could not predict the nature of the potential change in the target shape. Therefore, in this task global processing might be more dominant than local processing. Thus, the absence of sensitivity in Westerners on the P1 component could be accounted for by the inherent, general, slower processing of local compared to global features in visual information processing.

Moreover, EA observers showed a behavioral advantage on the global task compared to WC observers. This advantage is likely to arise from their early attentional bias towards global information. In fact, we found that WC observers were less efficient at detecting global than local feature changes, while EA observers performed equally well in both conditions. The behavioral disadvantage of WC observers in the global task seems to stem instead from differences in visual saliency between global and local features. Specifically, the visual processing of global features precedes the analysis of local information (Navon 1977). This global precedence effect would conflict with local information, thus inhibiting responses to the detection of local features (Han et al. 2001; Miller and Navon 2002; Navon 1977). For example, participants are slower at identifying local targets in the presence of interference from a global shape, even when this information is irrelevant (Miller and Navon 2002). The change of global features is disruptive for WC observers when they are required to detect changes in local information (i.e., to distinguish between the local congruent and different conditions). By comparison, EA observers benefit from top-down attentional control over global features, thus limiting the disturbance from the visual salience induced by global feature changes (e.g., the interface hypothesis, McMains and Kastner 2011).

Cognitive control effect on the N2 component

We also observed an interaction between the culture of observers and global/local information tuning over the central-anterior-parietal electrode site at around 200–350 ms. A fine-grained adaptation pattern revealed distinctive temporal dynamics of global/local processing between Westerners and Easterners. Specifically, EA observers showed larger stRS responses to local compared to global feature repetitions over the right anterior-parietal electrodes at 240 ms. WC observers instead first displayed larger stRS responses to global compared to local features at 270 ms, and then showed the reverse pattern at 320 ms. This modulation occurs in the time window of the N2 component, which has a distinct topography (see Fig. 4c, d) and has been related to cognitive control (see Folstein and Petten 2008 for a review). Cognitive control is partly defined as a strategy regulation process during the response to novelty or (mis)match detection (Nieuwenhuis et al. 2003; Yeung et al. 2004; Zhang et al. 2003). For example, slower behavioral responses resulting from larger demands on cognitive control are correlated with larger N2 amplitudes and delayed peak latencies (Gehring et al. 1992; Nieuwenhuis et al. 2003). Moreover, inhibition of a behavioral response also elicits larger N2 amplitudes, especially when stimuli share similar visual features (Folstein and Petten 2008). Here, regardless of their cultural background, all the observers engaged a larger amount of neural resources during the processing of the culturally non-preferred features to overcome the interference from the culturally preferred features. Therefore, the smaller amount of stRS elicited by local feature repetition in WC observers and the smaller amount of stRS for global feature repetition in EA observers on the N2 component might reflect a more demanding engagement of cognitive resources devoted to performing (mis)match detection for the non-preferred feature repetition. More specifically, these resources might be related to the inhibition of the culturally preferred bias in visual information sampling (i.e., consistent local features for WCs and consistent global features for EAs).

General discussion

The combination of EEG and Navon stimuli has been widely used to investigate the temporal dynamics of attention modulations in global/local shape processing (Han et al. 1997, 1999, 2000; Heinze et al. 1998; Heinze and Münte 1993; Malinowski et al. 2002). Surprisingly, while this approach has been applied extensively to WC observers and EA observers separately, no previous study had directly compared these two groups of observers. Here, we overcame this limitation by testing both groups. As discussed above, our results mapped out and confirmed that the temporal dynamics of global/local processing are indeed different between Easterners and Westerners. More precisely, EA observers show a distinct early sensitivity between global and local feature coding compared to WC observers. Such a difference in the temporal dynamics of visual shape processing is driven by an initial attentional selectivity and tuning that differs between these two cultural groups: global selective attention in Easterners and local selective attention in Westerners (McKone et al. 2010).

Previous studies on global/local visual shape categorization have suggested that the processing of hierarchical visual stimuli is lateralized in the brain. A left hemisphere advantage has been reported for local processing at the encoding stage, and a right hemisphere advantage for global processing (e.g., Folstein and Petten 2008; Han et al. 2002; Lamb et al. 1990). Unfortunately, it is difficult to interpret our results in terms of hemispheric asymmetries. Although we found an interaction whose effect was lateralized in both spatial–temporal clusters (i.e., over right scalp electrode locations in P1 and over both sides of the scalp in N2), we neither manipulated the visual field of presentation nor counterbalanced the response keys across participants. Interpreting our findings without these controls could be misleading, and further research is required to elucidate this issue.

Conclusion

The present study demonstrated that the temporal dynamics of global–local feature coding are shaped by culture. Our data show an early attentional bias in Easterners for global visual properties during an active matching task with Navon hierarchical stimuli. East Asian observers showed stronger EEG repetition suppression responses to global compared to local features on the attention-sensitive P1 component. This global precedence effect in Easterners was paired with more efficient processing of global feature changes at later stages. Western Caucasian observers instead displayed the expected differential electrophysiological responses between global and local information processing, but at later electrophysiological stages (i.e., the N2 component). These findings support the view that cultural biases in visual perception are driven by selective attention. Importantly, they also provide cultural neural signatures, and their temporal dynamics, for global/local feature processing. Overall, these distinct neural markers could represent the entry level of the more apparent and striking differences observed at the behavioral level between observers from Western and Eastern cultures.