Introduction

Women are known for enhanced susceptibility to emotional stimuli compared to men [1,2,3,4,5]. This sex-related difference is mainly manifested in women’s high sensitivity to negative stimuli. For instance, women, compared to men, exhibit stronger neural activation for negative photographs [6], recognize angry and disgusted faces at lower emotion intensity [2], have less schizophrenia-related impairment in identifying negative emotions [3], and show a reduced threshold of emotional responding to negative scenes [5]. Furthermore, this susceptibility is also reflected in women’s reduced emotion regulation compared to men. For instance, women have fewer choices of adaptive regulation strategies [7], and they regulate negative feelings with less involvement of the neural networks underpinning emotional control, such as the dorsolateral prefrontal, orbitofrontal, and anterior cingulate cortices [8]. Also, while men are slower in attention orienting to happy than to fearful faces, women show similar latencies of attention orienting to both facial expressions [1]. A couple of recent studies have also reported that women are more susceptible than men to startle responses evoked by rare or novel stimuli [9, 10].

However, women and men differ, in most cases, in biological sex and also in the identity of gender role. It is so far unclear whether the well-known sex differences in emotion processing should be accounted for by biological sex or by gender role. Studies of adolescent development have established a phenomenon of gender-role conformity in which early adolescence serves as a period of intensified pressure to conform to gender role expectations from family, peers, school, and the media [11,12,13]. As a result, adolescent development brings boys increased masculinity rather than femininity, while bringing girls increased femininity but not masculinity [14,15,16,17]. Consistent with this evidence, a recent study based on a large sample of college students showed that male adults in general score higher on masculinity but lower on femininity than females [18]. Another line of supporting evidence comes from the studies of gender role violation. Specifically, recent evidence shows that violation of typical, sex-consistent gender roles may predispose individuals, in particular females, to be victims of aggression [19] or to increased victim-blaming by others [20]. Previous studies in gender role have suggested that increased femininity is associated with reduced assertiveness, higher relationship concerns, and increased social dependence, which in turn result in greater risk of depressive symptomatology [14, 21]. By contrast, masculinity is linked with increased assertiveness, more self-confidence, enhanced independence, and reduced depression [22]. Thus, it is reasonable to infer that femininity contributes, at least in part, to a high susceptibility to negative emotion and depression, an effect often observed and interpreted in the framework of sex differences. In this regard, it is essential to isolate the effect of sex from that of gender role in interpreting the phenomenon of sex difference in sensitivity to emotional stimuli. 
Also, it is important, for ecological consideration, to clarify how sex and gender role each contribute to individual variability in real-life psychopathological symptoms, such as anxiety and depression. However, as far as we know, no study has directly investigated how sex and gender role contribute, independently or interactively, to one’s susceptibility to emotion.

An early study by Kopper and colleagues [22] measured gender role socialization, interpersonal behavior, and mental health variables. The authors reported that masculine gender role, rather than femininity or androgyny, predicts reduced negative emotional states including depression and guilt. In addition, masculinity uniquely predicts increased positive emotional states including self-confidence. However, none of these effects were predicted by sex. Similarly, a later study investigated how sex and gender role identification are related to anger processing, using two anger-evoking scenarios. The results showed that gender role rather than sex effectively predicts anger expression, experience, and control, in that masculine subjects reported higher levels of trait anger, more outward anger expression, and less anger control than feminine subjects [23]. Despite converging evidence of gender role effects in emotion processing, there is evidence showing that sex itself, after isolating the contribution of gender role, is related to hemispheric lateralization during processing of basic emotions [18]. This study further showed that sex modulated gender role effects in emotion-related hemispheric lateralization [18]. Based on these studies, we predicted that biological sex and gender role may each contribute uniquely to one’s sensitivity to emotion-evoking stimuli. As gender role identification is a postnatal, higher-order form of social learning that develops slowly, while sex is determined by prenatal chromosomal features, it is likely that the effect of sex occurs at earlier time points (more automatic/bottom-up) than that of gender role when a technique with high temporal resolution (e.g., EEG) is used. Specifically, gender role may account more than sex for gender-related differences in neural sensitivity to emotional stimuli in high-order, late processing stages such as the attention allocation-related N2 and the cognition-related P3.

To address the question raised above, two complementary studies were conducted. Study 1 isolated sex and gender role factors by a 2 (man, woman) by 2 (feminine, masculine) orthogonal design, using a non-emotional oddball task combined with EEG. The purpose was to explore how sex and gender role each modulated brain susceptibility to emotional stimuli in a lab setting, after controlling for emotion-related traits across different samples. The oddball task was used to allow an emotional stimulus to occur infrequently in an emotion-irrelevant cognitive context, similar to emotional events that occur in natural situations [24, 25]. Another consideration was that this paradigm has proven to be reliable in eliciting attention-related N2 and cognition-related P3 components that are sensitive to emotion-related individual differences [5, 26, 27]. In addition, we manipulated the emotion intensity of positive and negative stimuli in order to detect individual differences in the threshold of emotion elicitation in brain potentials [26, 28]. In Study 2, for ecological consideration, we used an observation method to measure masculinity, femininity, and self-reports of anxiety and depression for psychometric analyses in a large sample of subjects. The purpose was to further determine how gender role and sex influence the real-life prevalence of emotion-related psychopathological symptoms.

Materials and Methods (Study 1)

Participants

To calculate the sample size required in the study, we conducted a priori power analyses using G*Power [29] with effect size f set at 0.30 (medium level), power (1 - β) set at 0.90, number of groups set at 4, number of measurements set at 6, and α = 0.05. This showed that a sample size of n = 96 is required to detect a medium-sized effect with the current design. Thus, 96 paid volunteers including 48 masculine (18–29 years; 26 males) and 48 feminine participants (18–29 years; 22 males) were recruited from Southwest University in China. One masculine man and one feminine woman were excluded from data analysis due to excessive artifacts in the EEG data (Table 1).

Table 1 Gender role measures for the four samples orthogonalized by sex and gender role.

The subjects were selected from a large pool of 450 students who completed the Chinese Sex-Role Inventory (CSRI) [30, 31], an inventory that has been widely used to measure gender expression and gender roles ([32]; please refer to supplementary material (Part A) for the introduction of CSRI). Masculine and feminine samples were produced by the median split on the positive scales for the masculine (median = 69) and feminine dimensions (median = 70) of the CSRI [31]. See supplementary material (Part B) for a detailed description of participants. Female participants were asked for the first day of the last menses and duration of the menstrual cycle (in days), to examine potential effects of the menstrual cycle on emotion processing [for results, see supplementary material (Part G)]. Written informed consent was obtained from every participant prior to the experiment. The experimental protocol was approved by the Ethics Committee of Southwest University and this study was performed strictly in accordance with the approved guidelines.
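
The median split described above can be illustrated with a minimal sketch. The two medians are those reported in the text; the exact assignment rule (above one median and at or below the other) and the example scores are assumptions made for illustration, and the actual selection criteria are described in supplementary material (Part B).

```python
# Medians reported in the text for the positive scales of the CSRI.
MASCULINE_MEDIAN = 69
FEMININE_MEDIAN = 70


def classify_gender_role(masculine_score, feminine_score):
    """Return 'masculine', 'feminine', or None (not selected).

    Assumption: a participant counts as masculine when scoring above the
    masculine median and at or below the feminine median, and vice versa.
    """
    high_m = masculine_score > MASCULINE_MEDIAN
    high_f = feminine_score > FEMININE_MEDIAN
    if high_m and not high_f:
        return "masculine"
    if high_f and not high_m:
        return "feminine"
    return None  # high on both or low on both: left out of this sketch


# Hypothetical (masculine, feminine) scores, for illustration only.
pool = [(80, 60), (62, 85), (75, 78), (50, 55)]
groups = [classify_gender_role(m, f) for m, f in pool]
```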

Stimuli

The present study included two modified oddball sessions (positive and negative). Each session consisted of 4 blocks of 100 trials (55 standard vs 45 deviant pictures). The standard picture was a natural scene of a cup which was kept constant through the experiment (Fig. 1), while there were 180 deviant pictures (90 for positive and another 90 for negative sessions) selected from the Chinese Affective Picture System [33]. The 90 pictures in each session were grouped into three categories according to their emotional intensity (30 high, 30 mild, and 30 neutral pictures). Each block presented 15 pictures for each condition. Blocks 3–4 presented all the pictures again to increase the number of trials. The order of the two sessions was counterbalanced across subjects.
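
As an illustration of the session layout, the following sketch builds four blocks of 100 trials (55 standard, 45 deviant, 15 deviants per intensity level). The picture identifiers, the function name, and the way pictures are split between the first two blocks are hypothetical; the text specifies only the trial counts and that blocks 3–4 repeat all pictures.

```python
import random


def build_session(pictures_by_intensity, standard="cup", seed=0):
    """Build 4 blocks of 100 trials (55 standard, 45 deviant) for one session.

    pictures_by_intensity maps 'high', 'mild', 'neutral' to 30 picture IDs each.
    Blocks 1-2 show every deviant once; blocks 3-4 show them all again.
    """
    rng = random.Random(seed)
    blocks = []
    for _repeat in range(2):
        # Assumed: pictures are reshuffled before each pass through the set.
        shuffled = {k: rng.sample(v, len(v)) for k, v in pictures_by_intensity.items()}
        for half in range(2):
            deviants = []
            for pics in shuffled.values():
                deviants += pics[half * 15:(half + 1) * 15]
            trials = [("deviant", p) for p in deviants] + [("standard", standard)] * 55
            rng.shuffle(trials)  # random order of standards and deviants
            blocks.append(trials)
    return blocks


# Hypothetical picture identifiers.
pictures = {
    level: [f"{level}_{i:02d}" for i in range(30)]
    for level in ("high", "mild", "neutral")
}
session = build_session(pictures)
```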

Fig. 1

Trial structure and stimulus samples (taking a positive session for example).

Each picture was assessed for its valence and arousal on a 9-point scale with a large sample of Chinese participants in a previous survey [34]. In the positive session, the three categories of deviant pictures differed significantly from one another in valence (mean: high = 7.18, mild = 6.62, neutral = 5.57; F(2,87) = 90.4, P < 0.001) and arousal (high = 7.08, mild = 6.47, neutral = 5.63; F(2,87) = 79.5, P < 0.001). Similarly, the deviant pictures used in the negative session were significantly different in valence (high = 2.21, mild = 3.79, neutral = 5.36; F(2,87) = 192, P < 0.001) and arousal (high = 7.18, mild = 5.90, neutral = 4.51; F(2,87) = 89.4, P < 0.001).

Procedure

As shown in Fig. 1, each trial began with a 300-ms fixation followed by a blank screen for a random duration between 500 and 1500 ms. Then, the picture was presented for 2000 ms. Participants were required to press, as accurately and quickly as possible, the “F” key on the keyboard with the left index finger for the standard stimulus and the “J” key with the right index finger for the deviant stimulus. Standard and deviant pictures were presented in a random order for each participant. Ten practice trials were given before the formal experiment. The standard picture in the practice was the same as that in the formal experiment, while the deviant stimuli were neutral pictures that were not presented in the subsequent experiment.

EEG Recording and Routine Analysis of ERPs

Brain electrical activity was recorded referentially against the left mastoid and re-referenced offline to the average of the left and right mastoids, by a 64-channel amplifier with a sampling frequency of 250 Hz (NeuroScan Inc., Herndon, VA). EEG data were collected with electrode impedances below 8 kΩ. No online filter was used. Vertical electrooculograms (EOGs) were recorded supra- and infra-orbitally at the left eye. The horizontal EOGs were recorded from the left versus the right orbital rim. Ocular artifacts were removed from the EEGs, followed by data filtering (0.01–30 Hz; a slope of 24 dB/oct). Filtered data were segmented into epochs beginning 200 ms prior to the onset of pictures, and each epoch was baseline-corrected with respect to the mean voltage over the 200 ms preceding picture onset. Only EEG epochs with correct responses were included for analyses.

This study focused on the ERPs elicited by emotional pictures of different valence and intensity. We analyzed the average amplitudes of occipital P1 and N1, frontal N2, and parietal P3 components. The occipital P1 and N1 were measured at the electrode sites O1, O2, PO3, PO4, PO7, and PO8 between 90 and 120 ms (for P1) and 140 and 180 ms (for N1) after picture onset. The N2 was measured as the average amplitude at the electrode sites Fz, FCz, F1, F2, FC1, and FC2 from 210 to 270 ms after picture onset. The parietal P3 was measured as the average amplitude at the electrode sites CPz, Pz, P1, P2, P3, P4, CP1, and CP2 from 330 to 560 ms after picture onset. See supplementary material (Part C) for detailed method of routine ERP analyses.
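
The mean-amplitude measure can be sketched as follows, using the 250 Hz sampling rate, the 200-ms pre-stimulus baseline, and the N2 window and sites given in the text. The data layout (a dictionary of per-channel voltage lists), the function names, the 1000-ms epoch length, and the toy epoch are assumptions for illustration only.

```python
SAMPLING_RATE_HZ = 250   # from the recording description
EPOCH_START_MS = -200    # epochs begin 200 ms before picture onset

N2_SITES = ["Fz", "FCz", "F1", "F2", "FC1", "FC2"]
N2_WINDOW_MS = (210, 270)


def times_ms(n_samples):
    """Time of each sample relative to picture onset, in ms."""
    step = 1000 / SAMPLING_RATE_HZ
    return [EPOCH_START_MS + i * step for i in range(n_samples)]


def mean_amplitude(epoch, channels, window_ms):
    """Baseline-corrected mean amplitude over channels and a time window.

    epoch maps channel name -> list of voltages (in µV) for one epoch.
    """
    t = times_ms(len(epoch[channels[0]]))
    per_channel = []
    for ch in channels:
        trace = epoch[ch]
        pre = [v for ti, v in zip(t, trace) if ti < 0]
        baseline = sum(pre) / len(pre)
        win = [v - baseline for ti, v in zip(t, trace)
               if window_ms[0] <= ti <= window_ms[1]]
        per_channel.append(sum(win) / len(win))
    return sum(per_channel) / len(per_channel)


# Toy epoch (assumed 1000 ms long): 1 µV everywhere, 3 µV inside the N2 window.
t = times_ms(250)
toy_trace = [3.0 if 210 <= ti <= 270 else 1.0 for ti in t]
toy_epoch = {ch: list(toy_trace) for ch in N2_SITES}
n2 = mean_amplitude(toy_epoch, N2_SITES, N2_WINDOW_MS)
```

In this toy case the baseline is 1 µV, so the baseline-corrected mean within the window is 2 µV.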

Statistical analysis was performed using SPSS Statistics 20.0 (IBM, Somers). Descriptive data are presented as the mean ± SD. Four-way repeated-measure ANOVAs were performed on behavioral and ERP data, with Intensity (high, mild, neutral) and Valence (positive/negative) as two within-subject factors, and sex (male, female) and gender role (masculine/feminine) as two between-subject factors. For N1 and P1, hemisphere (left/right) was added as another within-subject factor. Post hoc multiple comparisons were conducted using the Bonferroni test. Degrees of freedom were corrected by Greenhouse-Geisser correction whenever appropriate. Source localization was performed using the standardized low-resolution brain electromagnetic tomography (sLORETA, v20081104) [35].

Single-trial Classification Based on ERP Data

A machine learning method was used to find ERP markers that discriminate emotional pictures with various intensities, and to determine whether gender role modulates these ERP markers. In line with this purpose, we performed two single-trial analyses using logistic regression (LR) classifiers, which have been widely applied in the field of cognitive neuroscience [36,37,38].

First, binary classification was performed. Single-trial amplitudes of short intervals (60 ms) in a sequential order were used to discriminate between emotional (including high and mild intensity) and neutral pictures, as well as between low (including mild and neutral) and high intensity pictures. Classification features were averaged across short intervals (236 short intervals in each trial) [37, 39, 40]. Each interval had 58 mean amplitudes from the 58 recording electrodes. To prevent overfitting and retain spatial information, the 58 electrodes were grouped into 8 regions on the scalp [Fig. S1, supplementary material (Part D)]. Then the 8-dimensional feature vector (amplitudes averaged across within-region electrodes) was put into two separate LR classifiers to label the picture currently presented as “emotional” or “neutral”, and “high” or “low”.

Second, ternary classification was performed using combined LR classifiers. Mean amplitudes were calculated within the same time window as in routine ERP analyses (210–270 ms for N2 and 330–560 ms for P3), resulting in a 2 (N2, P3) × 58 (number of channels) feature matrix, which was then stacked into a feature vector of 116 dimensions. To prevent overfitting and for the sake of data illustration, linear discriminant analysis [41] was used to shrink the feature dimension before LR classification.

To ensure that the predictive ERP measures have validity across individuals, we used a between-subject cross-validation strategy [42]. In particular, the LR classifiers were trained on 46 masculine (or feminine) datasets and then applied to the remaining masculine (or feminine) participant (i.e., leave-one-out cross-validation). To evaluate the performance of the classifiers, the receiver operating characteristic curve was plotted and the area under the curve (AUC) was calculated to quantify the prediction results. See supplementary material (Part D) for detailed method of single-trial ERP analyses.
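
The between-subject cross-validation scheme can be illustrated with a small, self-contained sketch. It is not the authors' implementation (see supplementary material, Part D): the logistic regression here is fitted by plain batch gradient descent, the AUC is computed from pairwise rank comparisons, and the data are synthetic 8-dimensional feature vectors from five hypothetical subjects.

```python
import math
import random


def predict_proba(w, x):
    """Logistic model; the last weight is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi + [1.0]):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_subject_out(data):
    """data maps subject -> (X, y); train on all others, test on the held-out one."""
    results = {}
    for held_out in data:
        X_train, y_train = [], []
        for subj, (X, y) in data.items():
            if subj != held_out:
                X_train += X
                y_train += y
        w = train_logistic_regression(X_train, y_train)
        X_test, y_test = data[held_out]
        results[held_out] = auc([predict_proba(w, x) for x in X_test], y_test)
    return results


# Synthetic data: 5 hypothetical subjects, 40 trials each, 8 regional features;
# "emotional" trials (label 1) carry a shift on the first feature.
rng = random.Random(1)
data = {}
for subj in range(5):
    X, y = [], []
    for trial in range(40):
        label = trial % 2
        features = [rng.gauss(0, 1) for _ in range(8)]
        features[0] += 2.0 * label
        X.append(features)
        y.append(label)
    data[f"s{subj}"] = (X, y)

aucs = leave_one_subject_out(data)
```

Because each test subject never contributes to training, the resulting AUCs reflect how well the ERP features generalize across individuals rather than within them.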

Post-experiment Rating of Emotion (Picture Valence and Arousal)

An emotion assessment using the Self-Assessment Manikin procedure was conducted after EEG recording [43], to explore the subjective emotion induced by each set of images. Subjects were required to rate the emotional valence (1, very unpleasant to 9, very pleasant) and arousal (1, relaxed to 9, very excited) they felt for each image. The onset sequence of images was randomized across emotion conditions.

Materials and Methods (Study 2)

To examine potential influence of sex and gender role in real-life prevalence of emotion-related psychopathological symptoms, we recruited another 300 college students from Neijiang Normal University (n = 206) and Shenzhen University (n = 94). These participants were required to fill out the CSRI, the Beck Depression Inventory Second Edition (BDI-II [44]), and Spielberger’s State-Trait Anxiety Inventory (STAI) [45]. Fourteen participants were excluded according to the criterion of social desirability cutoff of CSRI [supplementary material (Part B)], resulting in a total of 286 valid datasets (151 females). Multiple linear regression (enter method) was implemented with sex (male labeled 0 and female labeled 1) and Feminine Score (a differential score between feminine and masculine traits; referred to as Feminine Score hereafter) as predictors, and trait anxiety (trait form of STAI) and depression level (BDI-II) as two separate dependent variables. Similarly, female participants were asked for the first day of the last menses and average duration of the menstrual cycle to examine potential menstrual cycle effects [for results, see supplementary material (Part H)].
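
The regression model can be sketched as an ordinary least squares fit with an intercept, sex (0 = male, 1 = female), and Feminine Score as predictors. The solver below uses the normal equations with Gauss-Jordan elimination rather than any statistics package, and the data are hypothetical values constructed to follow a known linear rule, not results from the study.

```python
def ols(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        pivot_value = A[c][c]
        A[c] = [v / pivot_value for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical participants: (sex, feminine scale, masculine scale).
participants = [(0, 60, 70), (0, 75, 70), (1, 67, 70), (1, 82, 70), (0, 70, 70), (1, 78, 70)]
predictors = [(sex, fem - masc) for sex, fem, masc in participants]  # Feminine Score
# Hypothetical outcome generated as 30 + 2 * sex + 0.5 * Feminine Score.
trait_anxiety = [30 + 2 * sex + 0.5 * fs for sex, fs in predictors]

intercept, b_sex, b_feminine = ols(predictors, trait_anxiety)
```

With noise-free toy data the fit recovers the generating coefficients; with real questionnaire data the coefficients would be accompanied by standard errors and significance tests.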

Results (Study 1)

Behavioral Data: Reaction Time and Accuracy

For reaction time, the main effect of gender role was significant [F(1,90) = 4.9, P = 0.029, \( \upeta_{\text{p}}^{2} \) = 0.052]. Masculine participants (563 ± 79 ms) showed longer response times than feminine participants (533 ± 58 ms). For accuracy rate, no significant difference was found between conditions due to ceiling effects (99.0% ± 1.8%).
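
For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The helper name below is ours; because the published F values are rounded, small discrepancies in the third decimal are to be expected for some effects.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Main effect of gender role on reaction time: F(1,90) = 4.9
rt_effect = round(partial_eta_squared(4.9, 1, 90), 3)  # 0.052
```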

Behavioral Data: Picture Rating

In this section, only the most important findings are reported. Please refer to supplementary material (Part E) for all statistical results.

Valence Rating

There was a significant three-way interaction of valence × intensity × gender role [F(2,180) = 3.7, P = 0.045, \( \upeta_{\text{p}}^{2} \) = 0.039]. To break down the three-way interaction, we tested the intensity by gender role interaction separately for positive and negative pictures. For positive pictures, there was a significant gender role by intensity interaction [F(2,184) = 12.0, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.118]; for negative pictures, however, we found no significant interaction between them (F < 1). To clearly illustrate the different patterns of valence rating between masculine and feminine participants, an index called differential valence was calculated by subtracting the valence rating for neutral pictures from that for emotional pictures. We found a significant two-way interaction of valence × gender role [F(1,90) = 5.0, P = 0.028, \( \upeta_{\text{p}}^{2} \) = 0.053]: feminine participants (1.2 ± 0.6) had a higher differential valence for positive pictures than masculine participants [0.7 ± 0.7; F(1,90) = 16.1, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.152], and this was positively correlated with self-reported depression (n = 94, r = 0.277, P = 0.007); the differential valence for negative pictures, however, did not differ between the two groups (F < 1).

Arousal Rating

There was a significant two-way interaction of intensity × gender role [F(2,180) = 5.1, P = 0.016, \( \upeta_{\text{p}}^{2} \) = 0.054]. Similarly, an index of differential arousal was calculated by subtracting the arousal rating for neutral pictures from that for emotional pictures. We found a significant main effect of gender role [F(1,90) = 8.1, P = 0.005, \( \upeta_{\text{p}}^{2} \) = 0.083]: feminine participants (1.4 ± 1.1) had a higher differential arousal for emotional pictures than masculine participants (1.1 ± 1.2), irrespective of stimulus valence and intensity.

Routine Average ERP

Occipital P1

The main effect of sex was significant [F(1,90) = 8.8, P = 0.004, \( \upeta_{\text{p}}^{2} \) = 0.090; Fig. 2]: females (3.5 ± 3.1 μV) showed larger P1 amplitudes than males (2.0 ± 2.7 μV). The main effect of hemisphere was significant [F(1,90) = 7.8, P = 0.006, \( \upeta_{\text{p}}^{2} \) = 0.080; left vs right = 2.5 ± 3.1 vs. 3.1 ± 2.9 μV].

Fig. 2

Waveforms and topography of the occipital P1 and N1 components. Waveforms were calculated by averaging the data at O1, O2, PO3, PO4, PO7, and PO8. The P1 topography was calculated by averaging the data at 90–120 ms after picture onset. The N1 topography was calculated by averaging the data at 140–180 ms minus the data at 90–120 ms.

Occipital N1

The main effect of gender role was significant [F(1,90) = 11.3, P = 0.001, \( \upeta_{\text{p}}^{2} \) = 0.112; Fig. 2]: feminine participants (–4.9 ± 3.7 μV) showed larger N1 amplitudes than masculine participants (–3.0 ± 3.0 μV). The main effect of sex was also significant [F(1,90) = 5.8, P = 0.018, \( \upeta_{\text{p}}^{2} \) = 0.061]: females (–4.6 ± 3.7 μV) showed larger N1 amplitudes than males (–3.3 ± 3.1 μV). In addition, the main effect of hemisphere was significant [F(1,90) = 40.0, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.308; left vs right = –3.1 ± 3.3 vs –4.9 ± 3.4 μV].

Fronto-central N2

The main effect of valence was significant [F(1,90) = 16.7, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.157; positive vs negative picture = –5.6 ± 4.0 vs –6.5 ± 4.0 μV]. The main effect of intensity was significant [F(2,180) = 19.9, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.181; high = –6.7 ± 4.3 μV, mild = –6.0 ± 4.0 μV, neutral = –5.4 ± 3.7 μV, paired Ps ≤ 0.007]. We found a significant two-way interaction between intensity and gender role [F(2,180) = 3.6, P = 0.032, \( \upeta_{\text{p}}^{2} \) = 0.038], and between valence and intensity [F(2,180) = 17.7, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.164]. More importantly, there was a significant three-way interaction of valence × intensity × gender role [F(2,180) = 5.4, P = 0.006, \( \upeta_{\text{p}}^{2} \) = 0.056; Fig. 3A].

Fig. 3

The fronto-central N2 component. A Waveform and topography. The waveform was calculated by averaging the data across Fz, FCz, F1, F2, FC1, and FC2. The N2 topography was calculated by averaging the data at 210–270 ms after picture onset. B Source localization of the correlation coefficient between the N2 activity in the mildly negative condition and Feminine Score (a feminine-masculine differential score). Colored areas indicate significant brain regions (P < 0.05), with bright yellow reflecting the most significant areas.

To break down the three-way interaction, we tested the intensity by gender role interaction for positive and negative pictures. For negative pictures, there was a significant gender role by intensity interaction [F(2,184) = 8.4, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.084]; the effect of intensity was found in both masculine [F(2,89) = 24.2, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.352] and feminine participants [F(2,89) = 18.7, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.296], but with different patterns. In masculine individuals, highly negative pictures (–7.9 ± 4.4 μV) evoked larger N2 amplitudes than both mildly negative (–5.4 ± 3.8 μV, P < 0.001) and neutral pictures (–5.6 ± 3.6 μV, P < 0.001), with no difference between the latter two conditions (P = 1.000). In feminine individuals, both highly (–7.5 ± 4.1 μV, P < 0.001) and mildly (–6.9 ± 3.8 μV, P < 0.001) negative pictures evoked larger N2 amplitudes than neutral pictures (–5.1 ± 3.7 μV), with no difference between the former two conditions (P = 0.456). Further correlation analysis showed that the N2 amplitudes for mildly negative pictures were correlated with the anxiety (n = 94, r = –0.198, P = 0.056) and depression scores (r = –0.258, P = 0.012), i.e., a larger N2 corresponded to a higher level of depression. For positive pictures, we found neither a significant interaction between intensity and gender role (F < 1; masculine: high = –5.5 ± 4.2 μV, mild = –5.9 ± 3.6 μV, neutral = –5.6 ± 3.9 μV; feminine: high = –5.6 ± 3.9 μV, mild = –5.5 ± 4.6 μV, neutral = –5.1 ± 3.8 μV), nor significant main effects of intensity (F < 1) and gender role (F < 1).

In line with the three-way interaction, which demonstrated that feminine participants were more sensitive to mildly negative pictures than masculine participants, we found that the N2 amplitudes for mildly negative pictures were significantly correlated (r(94) = –0.333, P = 0.001; Fig. 4A) with Feminine Score (feminine trait minus masculine trait). To localize the neural sources associated with this correlation, sLORETA was used to regress the N2 activity in the mildly negative condition on Feminine Score (Fig. 3B). The most significantly enhanced cortical source within the N2 time window was located in the anterior cingulate (Brodmann’s area (BA) 24, MNI coordinates = [−5, 30, 0]; r = 0.381, P < 0.001). Other significant regions included the superior frontal gyrus and medial frontal gyrus.
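
The sign of this correlation is easy to misread, because the N2 is a negative-going component: a more negative amplitude is a larger N2. The sketch below computes a Pearson correlation on hypothetical values (not data from the study) to show that a negative r here means larger N2 amplitudes at higher Feminine Scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical participants: Feminine Score and N2 amplitude (µV).
feminine_score = [-20, -10, 0, 10, 20]
n2_amplitude = [-4.0, -5.0, -5.5, -6.5, -7.0]  # more negative = larger N2

r = pearson_r(feminine_score, n2_amplitude)  # negative in this toy example
```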

Fig. 4

Correlation (n = 94) between ERP indexes and Feminine Score. A Correlation between the N2 amplitudes for mildly negative pictures and Feminine Score. B Correlation between the P3 amplitudes for positive pictures and Feminine Score. Pearson correlation coefficient (r) and significance level (P) are reported in the plots.

Parietal P3

The main effect of valence was significant [F(1,90) = 6.8, P = 0.011, \( \upeta_{\text{p}}^{2} \) = 0.070; positive vs negative = 9.6 ± 3.5 vs 10.2 ± 3.8 μV]. The main effect of intensity was significant [F(2,180) = 23.0, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.204]. Emotional pictures with high intensity (10.7 ± 3.8 μV) evoked larger P3 amplitudes than those with mild (9.7 ± 3.5 μV, P < 0.001) and neutral intensities (9.4 ± 3.6 μV, P < 0.001); no significant difference was found between the latter two conditions (P = 0.201). More importantly, there was a significant three-way interaction of valence × intensity × gender role [F(2,180) = 5.2, P = 0.006, \( \upeta_{\text{p}}^{2} \) = 0.055; Fig. 5A].

Fig. 5

The parietal P3 component. A Waveform and topography. The waveform was calculated by averaging the data across CPz, Pz, P1, P2, P3, P4, CP1, and CP2. The topography was computed by averaging the data at 330–560 ms after the picture onset. B Source localization of the correlation coefficient between P3 activity in the positive condition and Feminine Score.

To break down the three-way interaction, we tested the intensity by gender role interaction for positive and negative pictures. For positive pictures, there was a significant gender role by intensity interaction [F(2,184) = 9.0, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.089]: an effect of intensity was found in feminine [F(2,89) = 23.0, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.341] but not masculine participants (F < 1; high = 9.2 ± 3.4 μV, mild = 8.9 ± 3.5 μV, neutral = 9.0 ± 3.7 μV). Specifically, feminine individuals showed larger P3 amplitudes for both highly (11.2 ± 3.4 μV, P < 0.001) and mildly positive pictures (10.6 ± 3.5 μV, P = 0.001) than neutral pictures (9.0 ± 3.2 μV), with no difference between the former two conditions (P = 0.516). Further correlation analysis showed that the P3 amplitudes for positive pictures were positively correlated with the anxiety (n = 94, r = 0.234, P = 0.024) and depression scores (r = 0.212, P = 0.041). For negative pictures, we found neither a significant interaction between intensity and gender role (F < 1), nor a significant main effect of gender role (F < 1); only the main effect of intensity was significant [F(1,92) = 17.2, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.158]: highly negative pictures (11.2 ± 4.1 μV) evoked larger P3 amplitudes than both mildly negative (9.7 ± 3.6 μV, P < 0.001) and neutral pictures (9.8 ± 3.6 μV, P < 0.001), with no difference between the latter two conditions (P = 1.000).

In line with the three-way interaction, which demonstrated that feminine participants are more sensitive to positive pictures than masculine participants, we found that the P3 amplitudes for positive pictures (averaged across high- and mild-intensity conditions) were significantly correlated with Feminine Score (r(94) = 0.240, P = 0.020; Fig. 4B). To localize the neural sources associated with this correlation, sLORETA was used to regress the P3 activity in the positive condition on Feminine Score (Fig. 5B). The most significantly enhanced cortical source within the P3 time window was located in the precuneus (BA 7, MNI coordinates = [15, –60, 25]; r = 0.253, P = 0.014). Other significant regions included the posterior cingulate and superior temporal gyrus.

Single-trial Classification of Emotional Pictures with Diverse Intensities

Binary Classification

To verify the ERP findings obtained using the routine epoch-averaging method, this section examines whether single-trial amplitudes within the N2 and P3 time windows can discriminate between emotional and neutral, as well as between high and low intensity pictures.

When highly and mildly positive pictures were discriminated from neutral pictures, the AUC produced by the LR classifier based on feminine datasets showed a distinct peak across the time window 300–600 ms (peak AUC = 0.91 ± 0.10; first column of Fig. 6), while the AUC of the null hypothesis (H0: no predictive information in the ERP) had a mean 95% CI of [0.49, 0.62]. The brain region that contributed most to the classification [32.3% ± 4.2%; F(7,322) = 316, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.873; paired t-test P < 0.001] consisted of parietal electrodes such as CPz, Pz, P1, P2, P3, P4, CP1, and CP2 [refer to supplementary material (Part F) and Table S1 for electrode contribution]. However, the AUC based on masculine datasets did not show any peaks above the 95% CI of the H0 distribution ([0.49, 0.63]).
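
One common way to obtain a null distribution for the AUC is label permutation: the class labels are shuffled many times, the AUC is recomputed for each shuffle, and the 2.5th and 97.5th percentiles give a 95% interval under H0. Whether this is the procedure used here is an assumption; the authors' method is described in supplementary material (Part D). The scores and labels below are synthetic.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def null_auc_interval(scores, labels, n_permutations=1000, seed=0):
    """95% interval of the AUC when labels carry no information (permutation)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_aucs = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_aucs.append(auc(scores, shuffled))
    null_aucs.sort()
    lower = null_aucs[int(0.025 * n_permutations)]
    upper = null_aucs[int(0.975 * n_permutations) - 1]
    return lower, upper


# Synthetic classifier outputs for 60 trials (30 per class).
rng = random.Random(42)
scores = [rng.random() for _ in range(60)]
labels = [0, 1] * 30

lower, upper = null_auc_interval(scores, labels)
```

An observed AUC peak counts as informative only when it lies above the upper bound of this interval.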

Fig. 6

Binary classification. LR classifiers were trained using the 58-channel ERPs (60 ms in length) to discriminate emotional from neutral pictures (first and third columns) and high and low intensity pictures (second and fourth columns). For the ERP data of 47 masculine (top row) and 47 feminine (bottom row) participants, the leave-one-out cross-validation resulted in 47 testing sets. The waveforms show the area under the curve (AUC) produced by the LR classifiers (pink for individuals and red for the mean) and the 95% confidence interval (CI) of the null hypothesis (black dashed lines).

When highly positive pictures were discriminated from mildly positive and neutral pictures, the AUC produced by the LR classifier based on either masculine or feminine datasets did not show any distinctive peaks above the 95% CI of H0 distribution ([0.48, 0.62]; second column of Fig. 6).

When highly and mildly negative pictures were discriminated from neutral pictures, the AUC based on feminine datasets showed a distinct peak across the time window 200–300 ms (peak AUC = 0.89 ± 0.15; third column of Fig. 6). The brain region that contributed most to the classification (25.4% ± 3.1%; F(7,322) = 131, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.740; paired t-test P < 0.001) consisted of frontal electrodes such as Fz, FCz, F1, F2, FC1, and FC2 (Table S1). However, the AUC based on masculine datasets did not show any peaks above the 95% CI of H0 distribution ([0.48, 0.62]).

When highly negative pictures were discriminated from mildly negative and neutral pictures, the AUC based on masculine datasets showed two peaks across the time windows 200–300 ms (peak AUC = 0.91 ± 0.11) and 300–600 ms (peak AUC = 0.87 ± 0.12; fourth column of Fig. 6). The brain regions that contributed most to the classification consisted of frontal [24.8% ± 3.3%; for the first AUC peak; F(7,322) = 89.6, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.661; paired t-test P < 0.001] and parietal electrodes [36.2% ± 5.0%; for the second AUC peak; F(7,322) = 247, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.843; paired t-test P < 0.001]. Meanwhile, the AUC based on feminine datasets showed a peak across the time window 300–600 ms (peak AUC = 0.89 ± 0.11). The brain region that contributed most to the classification [34.2% ± 3.4%; F(7,322) = 248, P < 0.001, \( \upeta_{\text{p}}^{2} \) = 0.843; paired t-test P < 0.001] consisted of parietal electrodes (Table S1).

It is worth noting that the AUC peaks shown in Fig. 6 occurred at approximately the same times as the peaks of differential amplitudes between high, mild, and neutral pictures in Figs. 3 and 5.
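The comparison of an observed AUC against the 95% CI of the H0 distribution can be illustrated with a minimal sketch. It assumes the null distribution is obtained by shuffling trial labels, which is a common choice but is not confirmed by the text above. The function names, the toy scores, and the number of permutations are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one trial of each class")
    # Count pairs in which the positive trial outscores the negative one;
    # tied pairs count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def null_auc_interval(scores, labels, n_perm=2000, seed=0):
    """Central 95% interval of the AUC under shuffled labels (assumed null model)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(auc(scores, shuffled))
    null.sort()
    lo = null[int(0.025 * n_perm)]
    hi = null[int(0.975 * n_perm) - 1]
    return lo, hi

# Hypothetical classifier outputs for 20 held-out trials
# (1 = emotional picture, 0 = neutral picture).
scores = [0.91, 0.85, 0.80, 0.77, 0.70, 0.66, 0.61, 0.58, 0.52, 0.40,
          0.55, 0.49, 0.45, 0.41, 0.37, 0.33, 0.30, 0.22, 0.15, 0.08]
labels = [1] * 10 + [0] * 10

observed = auc(scores, labels)
lo, hi = null_auc_interval(scores, labels)
above_chance = observed > hi  # True when the AUC exceeds the null interval
```

In this reading, an AUC peak is treated as distinctive only when it lies above the upper bound of the null interval, which mirrors the comparison against [0.48, 0.62] reported above.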

Ternary Classification

According to the results of routine ERP analyses, the N2 and P3 contribute to the intensity discrimination of positive and negative pictures. In this section, single-trial N2 and P3 amplitudes are used to classify emotional pictures into high, mild, and low (i.e., neutral) intensity.

When highly and mildly positive pictures as well as neutral pictures were separated, the AUCs produced by the LR classifiers based on masculine and feminine datasets were 0.42 ± 0.08 and 0.60 ± 0.10, respectively, while the AUC of the null hypothesis had a mean 95% CI of [0.30, 0.44]. This result indicates that single-trial N2 and P3 of masculine participants cannot discriminate the three categories of pictures, while the N2 and P3 of feminine participants are helpful for discrimination. Three typical datasets of masculine and feminine participants are shown in Fig. 7A and C.

Fig. 7

Ternary classification. Single-trial N2 and P3 amplitudes (116 dimensions) were first reduced in dimensionality by linear discriminant analysis (LDA) to obtain optimized feature vectors. Then combined LR classifiers were trained using the optimized N2/P3 feature vectors to classify positive (A and C) and negative pictures (B and D) into highly intense, mildly intense, and neutral categories. Here we show optimized classification features from three typical masculine (upper row) and feminine (lower row) participants. Each dot indicates an individual trial. Data samples are plotted along the first two of five optimized discriminant dimensions. The feature vectors were converted from amplitude values to arbitrary units for display purposes only.

When highly and mildly negative pictures as well as neutral pictures were separated, the AUCs produced by the LR classifiers based on masculine and feminine datasets were 0.59 ± 0.09 and 0.89 ± 0.08, respectively, while the AUC of the null hypothesis had a mean 95% CI of [0.31, 0.44]. This result indicates that single-trial N2 and P3 are helpful for discriminating the intensity of negative pictures, and that the ERP data from feminine participants achieved better performance than those from masculine participants [independent samples t(92) = –16.7, P < 0.001]. Three typical datasets of masculine and feminine participants are shown in Fig. 7B and D.
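As a rough arithmetic check of the group comparison above, the following sketch recomputes an independent-samples t statistic from the rounded summary values. It assumes that the ± values are standard deviations across the 47 participants of each group and that a pooled-variance (Student) test was used; neither assumption is stated in the text, and the function name is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Masculine vs. feminine AUCs for negative pictures (values as reported above).
t, df = pooled_t(0.59, 0.09, 47, 0.89, 0.08, 47)
```

Under these assumptions the degrees of freedom equal 92, matching the reported test, and the statistic comes out at about –17.1. The small gap from the reported –16.7 is within what rounding of the means and standard deviations to two decimals can produce.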

Results (Study 2)

The 286 valid datasets comprised 135 males and 151 females. Among the 135 males, there were 43 masculine, 38 feminine, 32 androgynous, and 22 undifferentiated subjects. Among the 151 females, there were 32 masculine, 47 feminine, 40 androgynous, and 32 undifferentiated subjects. Masculine and feminine samples were produced by the median split on the positive scales for the masculine (median = 69) and feminine dimensions (median = 70) of the CSRI [31].
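The median split can be sketched as follows. The four-way mapping is the conventional scheme for two-dimensional sex-role inventories and is assumed here rather than taken from the text; in particular, the treatment of scores exactly at the median (counted as low) is an assumption, and the participant tuples are hypothetical.

```python
from collections import Counter

def gender_role(masc, fem, masc_median=69, fem_median=70):
    """Classify one participant by a median split on both CSRI dimensions.

    Assumption: a score strictly above the median counts as 'high'.
    """
    high_m = masc > masc_median
    high_f = fem > fem_median
    if high_m and high_f:
        return "androgynous"
    if high_m:
        return "masculine"
    if high_f:
        return "feminine"
    return "undifferentiated"

# Hypothetical participants: (sex, masculine score, feminine score).
participants = [
    ("male", 75, 60),
    ("male", 62, 78),
    ("female", 80, 82),
    ("female", 55, 64),
    ("female", 66, 74),
]

# Tally gender-role categories within each sex.
tally = Counter((sex, gender_role(m, f)) for sex, m, f in participants)
```

Because the classification depends only on the two scale scores, masculine and feminine groups can each contain both males and females, which is what allows gender role to be separated from sex.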

To explore the influence of sex and gender role on anxiety and depression, hierarchical regression was performed with sex and Feminine Score entered as the first and second predictors, respectively. Results showed that Feminine Score but not sex significantly predicted the levels of trait anxiety (standardized coefficient = 0.178, P = 0.003) and depression (standardized coefficient = 0.174, P = 0.003; Fig. 8; Models 1 to 4 in Table 2).

Fig. 8

Influence of sex and gender role on trait anxiety and depression (n = 286). A Relation between sex and trait anxiety (measured by the Trait form of the STAI). B Relation between gender role (measured by Feminine Score) and trait anxiety. C Relation between sex and depression level (measured by the BDI-II). D Relation between gender role and depression level. Standardized coefficient (β) and significance level (P) are reported in the plots.

Table 2 Hierarchical regression of sex and gender role on trait anxiety and depression

To give equal attention to the factor sex, another hierarchical regression was performed, with Feminine Score and sex entered as the first and second predictors, respectively. Results again showed that Feminine Score but not sex significantly predicted the levels of trait anxiety and depression (Models 5 to 8 in Table 2).
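The logic of entering predictors in sequence can be sketched with a small ordinary least squares routine that reports the variance explained by the first predictor and the increment gained by adding the second. This is an illustrative sketch, not the authors' analysis: the helper names and the toy data are hypothetical, and the reported standardized coefficients and P values are not reproduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def r2_change(first, second, y):
    """R^2 after step 1 and the increment in R^2 when the second predictor is added."""
    step1 = r_squared([first], y)
    step2 = r_squared([first, second], y)
    return step1, step2 - step1

# Hypothetical toy data: sex (0 = male, 1 = female), Feminine Score, trait anxiety.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
fem = [60, 72, 65, 78, 62, 74, 68, 80]
anx = [38, 45, 40, 50, 39, 47, 43, 51]

sex_first = r2_change(sex, fem, anx)  # sex entered first, then Feminine Score
fem_first = r2_change(fem, sex, anx)  # Feminine Score entered first, then sex
```

In the toy data, anxiety tracks Feminine Score closely, so the increment is large when Feminine Score is added after sex and negligible when sex is added after Feminine Score. Running the regression in both orders, as done above, guards against attributing shared variance to whichever predictor happens to be entered first.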

Discussion

With the effects of gender role and biological sex on emotional reaction dissociated, the findings of this study confirmed that gender role, compared to biological sex, plays a critical role in determining humans’ susceptibility to emotion. Specifically, in an event-related potential-based experiment, Study 1 indicated that feminine subjects showed specific emotional effects for mildly negative stimuli in N2 amplitudes that were absent in masculine subjects. Meanwhile, feminine subjects showed a specific emotional effect for highly and mildly positive stimuli in P3 amplitudes that was absent in masculine subjects. These neurophysiological results were in line with the behavioral findings that feminine subjects showed greater emotional arousal than masculine subjects for emotional pictures, irrespective of valence and intensity. Complementary to these results, Study 2, using a large sample of psychometric data, showed that feminine gender role rather than biological female sex predicted increased psychopathological symptoms of anxiety and depression in real life. All of these findings were independent of the menstrual cycle of females (Part G, supplementary material).

The occipital P1 and a later N1 are known to reflect early visual processing of stimuli sensitive to the manipulation of selective attention, with increased amplitudes for attended relative to unattended conditions [46,47,48]. We did not find any significant effect of group factors, either sex or gender role, on emotion effects in these early components. This result is consistent with prior ERP findings that sex differences in emotion processing mainly occur in later ERP components, such as the fronto-central N2 and parietal P3, indicative of attention orienting and controlled cognitive processing, respectively [1, 5, 49,50,51]. It is worth noting that the main effect of sex (P1, 90–120 ms) occurred earlier than the main effect of gender role (N1, 140–180 ms), suggesting that biological sex modulates overall visual processing faster than environmentally shaped gender role.

More importantly, after gender role was controlled for in the male and female samples, the established sex differences in response to emotional stimuli, as evident in oddball N2 and P3 components [1, 5], were no longer found in the current study. Instead, we found clear gender role differences, with feminine subjects, irrespective of sex, exhibiting enhanced N2 amplitudes to mildly negative stimuli and greater P3 amplitudes to pleasant stimuli of diverse intensities compared to masculine subjects. Fronto-central N2 activity during the oddball paradigm is considered to reflect attention orienting and vigilance toward unexpected, salient, and potentially harmful events [26, 34, 49, 52,53,54]. The subsequent parietally peaking P3 component in a non-emotional task has been accepted to reflect controlled, inhibitory processing of task-irrelevant, interfering negative information [25, 27], or to reflect heightened evaluation of stimulus meanings and increased approach motivation in a pleasant context [55, 56]. All the subjects, irrespective of sex or gender role, responded more intensely to highly negative stimuli across N2 and P3 components. This is in line with the notion that an emotional negativity bias, which prioritizes processing of salient negative stimuli irrespective of processing stage, is important for adaptive living in a changing environment [57, 58].

The key finding was the emotional effects of mildly negative stimuli on N2 amplitudes and of positive stimuli on P3 amplitudes in feminine but not in masculine subjects, irrespective of sex. The validity of fronto-central N2 and parietal P3 as biomarkers of the feminine sample in response to negative and positive stimuli was consistently verified by our machine learning analyses based on single-trial ERP data, using both binary and ternary classifiers. This finding was supported by the correlation analyses showing that N2 amplitudes for mildly negative stimuli and P3 amplitudes for positive stimuli both increased as a function of feminine scores.

In addition, sLORETA results showed that the correlation between N2 and feminine score was localized to the anterior cingulate cortex, superior frontal gyrus, and medial frontal gyrus, which subserve the attention allocation process [59, 60]; this is consistent with previous source localization findings in attention-related tasks such as the oddball task [53, 61,62,63]. Meanwhile, the correlation between P3 and feminine score was localized to the precuneus (conscious or elaborative information processing [64]), posterior cingulate cortex, and superior temporal gyrus, which is in line with previous source estimations for the late positive component in emotion-related [27, 65, 66] and emotion-unrelated ERP studies [67,68,69]. This evidence, together with the current source estimations, suggests that the enhanced emotional reactivity of the feminine sample in N2 and P3 was mediated by attention- and cognition-related networks including the anterior/posterior cingulate, medial/superior frontal cortices, superior temporal gyrus, and precuneus.

Previous studies of sex-related differences in emotion processing, without controlling for gender roles, consistently reported that females are more reactive to aversive stimuli than males [2, 6, 70], which was supported by a recent meta-analysis of sex differences in emotion-related neural activation [71]. Also, a couple of studies examined sex differences in the experience of positive emotion, without considering gender roles. They reported that females are more responsive to positive depictions [1, 6] and experience positive emotions like affection, contentment, and joy more frequently and intensely than males [72], despite the existence of opposing evidence showing no consistent sex difference in positive emotion [5, 71].

In contrast to these findings, the robust phenomenon of sex differences was replaced by prominent gender role differences when gender role (masculinity, femininity) was isolated from sex in our study. This is consistent with early studies showing that feminine gender role predicts greater social dependency and depressive psychopathology [22]. Another line of supporting evidence comes from management psychology studies using an employee sample, which show that feminine individuals react more intensely and positively to social support than masculine ones do, irrespective of sex [73]. The gender role-related enhancement of response to cues of social support is considered to explain the unique attention bias of women for positive facial expressions [1] and sex differences in the prevalence of depression [74]. Consistent with this idea, we found that the feminine compared to the masculine sample showed greater emotional effects for pleasant stimuli in P3 amplitudes and valence rating, two indexes positively correlated with self-reported anxiety and/or depression. Based on these results, the feminine-related sensitivity to social support is likely to contribute to the finding that the feminine sample exhibited enhanced emotional reactivity to positive stimuli in P3 amplitude.

In summary, the findings of the current study indicated that gender role modulates lab-based emotion processing and real-life mental health more than sex, possibly due to the central role of gender role in determining cognitive, affective, and social function. Developmental studies have indicated that psychosocial variables, such as gender role socialization, relational dependence, ruminative coping, and body image concerns, contribute substantially to the prevalence of affective disorders in girls during adolescence [12, 75]. Aside from the pivotal role of intensified feminine gender role in the development of depression in girls, multiple studies indicate that feminine gender role positively predicts relational dependence, ruminative coping, body dissatisfaction, and depression [22, 76]. By contrast, the masculine gender role positively predicts problem-focused coping, self-confidence, and independence, and negatively predicts depression [22, 77, 78]. Also, socialized masculinity has proven to be a protective factor for mental health in men, as controlling for masculinity leads to a greater likelihood for men versus women to experience depressive symptoms [78]. Consistent with this evidence, the current findings, both lab-based (Study 1) and psychometry-based (Study 2), suggest that gender role rather than sex is the primary determinant of susceptibility to emotion. Finally, recent evidence implies that sex-related hormones play an important role in the early development of gender identity [79]. Also, sex-related hormonal levels are indicated to modulate several lines of emotional processing, such as facial affect recognition [80], pleasantness ratings of positive images [81], and aggressiveness [82]. In this regard, it is necessary for future studies to measure sex-related hormonal levels (androgen and estrogen) in males and females of varying gender roles, in order to clarify the neurochemical mechanisms underlying the gender role effects on susceptibility to emotion.