Introduction

Facial expressions of basic emotions (e.g., sadness, happiness) are recognized across cultures at accuracy rates better than chance and thus are considered by many to be universal (Ekman 1972; Ekman and Friesen 1976; Ekman et al. 1969; Izard 1971). Nevertheless, a relative in-group advantage has been documented in the recognition of emotional facial expressions (Beaupré and Hess 2006; Elfenbein and Ambady 2002a, b, 2003a, b; Matsumoto 2007). People are typically more accurate at recognizing emotions expressed by members of their own cultural group than emotions expressed by members of another cultural group. Research further shows this “relational” effect in recognition accuracy cannot be fully explained by factors such as greater familiarity with in-group members’ facial physiognomies or physical features (e.g., hair style), greater motivation and attention for decoding in-group members, or anxiety in decoding unfamiliar out-group members (Elfenbein and Ambady 2002b; Elfenbein et al. 2002, 2004).

Elfenbein and Ambady (2002b, 2003b) proposed that this in-group advantage reflects knowledge about cultural variations and nuances in expressing and communicating emotions, which they term cultural dialects. In addition to universal expressive features, they suggest, people also acquire culturally specific expressive features through social learning. These cultural dialects give in-group members an advantage in recognizing each other’s expressions. Subsequent research findings have lent support to their theory. Researchers have demonstrated, for example, that the same posed expressions of serenity, shame, contempt, anger, sadness, surprise, and happiness involved activation of different facial muscles for Quebecois and Gabonese Canadians (Elfenbein et al. 2007). Moreover, exposure to another culture tends to reduce the in-group advantage (Elfenbein and Ambady 2002b, 2003c) and the closer two cultures are in cultural and physical distance, the smaller the in-group advantage (Elfenbein and Ambady 2003a).

However, most of the previous research tested the in-group advantage using prototypical, full-blown expressions, which may be limiting because expressions occurring in daily life are typically more subdued. Rarely are strong emotions evoked in everyday life and even when they are, their expression tends to be modulated according to social display rules (Ekman and Friesen 1969; Hess et al. 1995; Matsumoto et al. 2002; Matsumoto et al. 2009). Thus, subtle expressions and expressions of subtle-to-moderate intensity may be more common in everyday communication than strong, full-blown expressions. Whether or not recognition of these expressions exhibits the same pattern of in-group advantage as the full-blown expressions is important to investigate because it allows us to gauge the extent to which cultural dialects influence emotion communication in everyday life and whether cultural forces other than cultural dialects influence the recognition process.

In addition, full-intensity facial expressions may not be the best stimuli for investigating the in-group advantage. Indeed, the in-group advantage has not been unequivocally supported (Matsumoto 2002, 2007; Matsumoto et al. 2009), leading Matsumoto (2002) to suggest that the in-group advantage may be best observed when expressive signals are of mid-range clarity (e.g., moderately intense). When expressive signals become extremely clear or extremely weak, Matsumoto suggested, the in-group advantage may disappear because the expressions are either too recognizable or unrecognizable to people from all cultural groups.

Moreover, when it comes to detecting subtle emotional expressions, there may be cultural differences in sensitivity overall, regardless of whether the expresser is an in-group or out-group member. For instance, there is evidence to suggest that East Asians tend to be more vigilant to subtle social signals of positive and negative emotions and show greater efficacy in processing such signals than Westerners (Ishii et al. 2011). In East Asian cultures, interdependence is emphasized and greater value is placed on interpersonal relatedness and harmony, which contrasts with the Western notion of independence and autonomy of the self (Markus and Kitayama 1991). As such, East Asians are more concerned with acceptance and fitting in the social hierarchy and report a stronger motivation than Westerners for “preserving face”, a concept that relates to the combined social reputation of one and one’s close others (Heine 2007; Ho 1976). To people from East Asian cultures, it is more important to avoid social conflict and failing social expectations than it is to promote and assert the self (Heine 2007). Correspondingly, previous studies have shown that East Asians demonstrate greater vigilance to social signals of approval or disapproval than people from Western cultures (Heine 2007; Ho 1976; Ishii et al. 2011; Kitayama et al. 2007).

Alternatively, East Asians’ desire for fitting in and preserving interpersonal harmony may result in a reduced sensitivity to subtle expressions of negative emotions in others. Matsumoto and colleagues reported that East Asians tended to rate expressions of sadness, anger, and fear as less intense than did individuals from individualistic cultures, presumably out of a concern that these emotions potentially could disrupt social harmony (Matsumoto 1989; Matsumoto and Ekman 1989; Matsumoto et al. 1999, 2008). East Asians’ desire to avoid social embarrassment or conflict may thus make them relatively “oblivious” to subtle expressions of negative emotions in others to ensure smooth social interaction. If this is true, we would expect then that East Asians would actually be less accurate than Westerners in detecting subtle negative emotional expressions that potentially could disrupt interpersonal harmony, such as anger and sadness, but probably not different in detecting subtle expressions of positive emotions.

In sum, while an in-group advantage has been observed in the recognition of emotional expressions (e.g., Elfenbein and Ambady 2002a, b, 2003a, b), these studies have sometimes had conflicting findings (Matsumoto 2002, 2007; Matsumoto et al. 2009), and the majority of these studies used full-blown, prototypical facial expressions of emotions. Subtler, lower-intensity emotional expressions may be much more typical in daily life. Research using such expressions is sparse, and the literature on social display rules and cultural differences in emotion experience suggests that when comparing people from East Asian cultures and people from Western cultures, the East Asians may either demonstrate greater ability to detect subtle facial expressions, due to vigilance for threats to social harmony, or lesser ability to detect subtle facial expressions, due to a desire to avoid signals of disharmony. It is currently unknown which alternative may be true (if either) and whether these effects are universal or vary by whether the expresser is an in-group versus out-group member. Filling this gap in the literature could extend our understanding of how people from different cultures vary in their perception of emotional expressions, which is increasingly important as our world becomes more and more globalized.

In order to investigate these questions, we exposed American and Chinese undergraduates to Caucasian and Chinese faces depicting angry, sad, and happy expressions of subtle, low and moderate intensity. We chose American and Chinese participants to represent, respectively, Western individualistic cultures and East Asian collectivistic cultures (Triandis 1995). As described earlier, the diametric differences between Western and East Asian cultures in values, meaning systems, customs, and behavioral conduct rules would allow us to discern possible cultural patterns above and beyond the in-group advantage in the recognition of facial expressions. Angry and sad expressions were selected to represent negative emotions that signaled social disapproval or failure and happy expressions were used to represent positive emotion that signaled social approval. To create expressions of subtle, low, and moderate intensities, we used a morphing procedure that utilized full-blown emotional expressions and neutral expressions from the same expresser to develop faces that varied in the percentage of emotional expression they expressed (Feldman Barrett and Niedenthal 2004; Hess et al. 1997; Niedenthal et al. 2001). The benefit of the morphing procedure is that it simulates intensity increases in facial expressions reasonably well, allows control over individual differences in the time course of expressions, and has good ecological validity (Feldman Barrett and Niedenthal 2004; Hess et al. 1997; Niedenthal et al. 2001).

Grounded in the previous literature, we predicted a relative in-group advantage in the recognition of expressions at the moderate intensity level (that is, Americans would be more accurate than Chinese in judging Caucasian facial expressions of moderate intensity and vice versa). For subtle and low intensities, we predicted that rather than observing an in-group advantage, we may observe a main effect of culture in sensitivity to these subtle facial expressions. Two alternative predictions can be made regarding this cultural difference. Based on the literature suggesting East Asians may be more vigilant for cues implying social acceptance or rejection, Chinese judges may outperform Americans in judging expressions of these subtle intensities. However, based on the literature suggesting that ignoring subtle negative facial expressions may help to preserve social harmony, Chinese judges may underperform American judges in judging subtle expressions of anger and sadness, but not happiness. By varying the cultural background of both the expressers and the judges and varying the intensity of expressions, our study was designed to elucidate the boundary conditions of the in-group advantage.

Method

Participants

One hundred seventy-seven American undergraduates from a liberal arts college in the Northeast region who identified themselves as Caucasian (33 males, 142 females, and 2 unspecified on gender) and one hundred ninety Chinese undergraduate students from a university in Mainland China who identified themselves as Han Chinese (the largest ethnic group in China with 91.59 % of the population being classified in this category) and reported having never been abroad (75 males, 109 females, and 6 unspecified on gender) participated in the study in exchange for extra credit in psychology classes (Footnote 1). Both the American and Chinese institutions are 4-year institutions and students in both institutions predominantly lived on campus. Both institutions were highly homogeneous in ethnicity; the American sample was predominantly white (91.2 % of the sample identified as Caucasian) and the Chinese sample was predominantly Han-Chinese (89.2 % of the sample identified as Han). The American sample (M age = 18.9, SD = 2.2; females 80.2 % vs. males 18.6 %) was somewhat younger than the Chinese sample (M age = 19.7, SD = 1.1; females 57.4 % vs. males 39.5 %), t = −4.33, p < .001, and had a higher proportion of females than the Chinese sample as well, χ² = 17.35, p < .001. Analyses with age as a covariate and gender as a fixed factor, however, did not reveal any significant effect of age or gender on recognition accuracy in preliminary analyses. Therefore age and gender were not included in the analyses reported below.

Procedure

Participants completed the study in small groups in psychology labs in the U.S. and Mainland China, respectively. The labs were comparably equipped with desktop PCs, the E-Prime software program, and a private testing compartment for each participant. Upon arrival, participants first reported baseline mood on a 20-item mood scale adopted from the Positive and Negative Affect Schedule (PANAS; Watson et al. 1988) (Footnote 2). Next, they completed two computer tasks. The emotion recognition task was always completed second (Footnote 3). The study was conducted in English for American participants and in Chinese for Chinese participants. The research protocols and instructions were originally written in English and translated into Chinese using back-translation procedures to ensure the accuracy of the translation and the equivalence of the procedure. Participants were debriefed after completing the study and thanked for their participation.

Apparatus and materials

Facial stimuli

Color photos of facial expressions of anger, sadness, happiness, and neutral expressions were used in the present study. The Caucasian facial expressions were selected from the Karolinska Directed Emotional Faces database (KDEF, Lundqvist et al. 1998). The KDEF contains frontal straight views of emotional expressions of Caucasian models who were between 20 and 30 years of age. The models were instructed to evoke a target emotion and then to show the expression clearly and strongly; digital color photos were taken of their posed expressions. In a pilot study, we asked 8 outside American judges to rate the set on a 7-point scale (1 = very poor expression of target emotion to 7 = very good expression of target emotion). Based on the ratings, we selected 2 males and 2 females expressing angry, sad, happy, and neutral expressions that had the best representativeness ratings. The selected expressions were independently rated on intensity of the intended emotions in pretests by nine Chinese raters and fourteen American raters who were unaware of the purpose of the study, using a 9-point scale (1-not at all, 4-moderate, 9-extremely). The average intensity ratings for the four expressers are reported in Table 1.

Table 1 Pretest intensity ratings of Caucasian and Chinese faces

The Chinese facial expressions were created following a similar procedure as the KDEF. A group of 20 Han-Chinese models who were between 18 and 34 years of age and who had never been to or lived outside of Mainland China were recruited. Models were instructed to first display a neutral expression and color digital photos were taken of the expression. They were then instructed to recall or imagine as vividly as possible an event that made him or her feel intensely sad, angry, and happy, discuss it in detail, re-experience the emotion as vividly as possible, and then show the emotion as strongly and clearly as possible through his or her facial expression. Similar to the KDEF, models wore gray T-shirts during the photographing and were seated against a white wall at about 3 m away from the camera. They wore no eyeglasses, earrings, or makeup and did not have facial hair (beard, mustache). Probe questions were asked during the recall to help the models better generate or retrieve the emotion. Neutral expressions were always photographed first, followed by expressions of anger and sadness, balanced in order among the models, and happy expressions were always photographed last. An independent group of ten American raters and twenty-seven Chinese raters rated the intensity of the intended emotional expressions in two separate pretests, following the same rating procedure described earlier for the KDEF photos. Based on the pretest results, 2 Chinese males and 2 Chinese females were selected whose expressions were judged to be most genuine, clear, and intense in the set and also rated closest to the selected KDEF expressions in intensity. The average intensity ratings of the selected Chinese expressions are reported in Table 1. 
As shown, the selected Chinese expressions were comparable to the selected Caucasian faces in intensity on sad, happy, and neutral expressions, respectively, t(50) = −0.72, t(52) = 0.24, and t(53) = −0.21, all non-significant, but were lower than Caucasian faces in intensity on anger, t(43) = −3.98, p < .001. Even though this difference in intensity is consistent with previous findings showing that Chinese emotional expressions were generally less intense than Caucasian expressions (Elfenbein et al. 2002), it may have affected recognition accuracy. To rule out any possible effect this difference may have on recognition accuracy, we statistically controlled for pretest intensity of the expressions in subsequent analyses. Details are discussed below.

In addition, in an effort to validate the Chinese expressions, twenty-four independent American raters who were unaware of the purpose of the study rated the Chinese expressions on recognizability (raw percentage hit rate) and fourteen of these people also rated the selected KDEF Caucasian expressions on recognizability. Following the procedure used in the validation of the KDEF set (Goeleven et al. 2008), raters were asked to choose an emotion label that best described the emotion portrayed in the expression from nine possible choices (i.e., anger, sadness, happiness, fear, surprise, disgust, contempt, neutral, and others). Raters viewed expressions one at a time, in a randomized order. Recognizability of the Chinese expressions, in comparison to the Caucasian expressions selected from the KDEF set for the present study was 68.8 vs. 78.6 % for angry expressions, 78.2 vs. 73.2 % for sad expressions, 93.1 vs. 96.4 % for happy expressions, and 64.2 vs. 69.6 % for neutral expressions, respectively. These numbers were largely comparable to previously reported recognizability of 78.8, 76.7, 92.7 and 62.6 % for the angry, sad, happy, and neutral expressions in the KDEF set in a Caucasian sample (Goeleven et al. 2008), and were also comparable to or even better than recognizability of face sets used in other studies (e.g., Elfenbein et al. 2004). Thus the result lent support to the validity of the Chinese expressions in the present study.

Facial expressions of varying intensity levels

Morphing software (Black Belt Systems 2000) was used to generate the varying intensity levels of each expression for both the Caucasian and Chinese expressers. The morphing technique uses each person’s neutral expression as the beginning anchor and his or her clear full emotional (angry, sad, and happy) expression as the final anchor to produce a movie composed of 100 facial composites for each emotional expression. The composites show, successively, a face changing gradually in intensity, from displaying a neutral expression to displaying a clear full-blown emotional expression (e.g., neutral → clearly angry). Previous research has shown that recognition accuracy increases incrementally in relation to intensity level and starts to level off when the intensity level reaches 75 % (Zhang and Parmley 2010). Therefore, we used the range of 15, 30, 45, and 60 % to represent intensity levels ranging from subtle to moderate.
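The idea of anchoring a morph at the neutral expression (0 %) and the full expression (100 %) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each face is reduced to a short list of numeric features and that intermediate frames are simple linear blends; the commercial morphing software also warps and blends image pixels, which is not reproduced here. The function name `blend` and the feature values are hypothetical.

```python
def blend(neutral, full, percent):
    """Linearly interpolate between two equally sized lists of numbers.

    percent = 0 returns the neutral anchor, percent = 100 the full expression.
    """
    if len(neutral) != len(full):
        raise ValueError("anchors must have the same length")
    w = percent / 100.0
    return [(1.0 - w) * n + w * f for n, f in zip(neutral, full)]


# Hypothetical feature values for one expresser (not real data).
neutral_face = [0.0, 10.0, 20.0]
full_face = [10.0, 30.0, 20.0]

# The four intensity levels sampled in the study.
INTENSITY_LEVELS = [15, 30, 45, 60]
frames = {p: blend(neutral_face, full_face, p) for p in INTENSITY_LEVELS}

for percent, features in frames.items():
    print(percent, [round(v, 2) for v in features])
```

Under this simplification, each sampled frame sits a fixed fraction of the way from the neutral anchor to the full-blown anchor, which is what allows intensity to be compared across expressers.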

Emotion recognition task

The expressions at the four frames were presented in a computerized emotion recognition task, using E-Prime stimulus presentation software, version 2.0 (2002). Before the emotion recognition task, participants completed two practice trials in which they viewed and labeled expressions of one Caucasian and one Chinese male target. Participants then viewed the target emotional expressions, one at a time, in the center of the screen, and in a randomized order. The emotional expressions (angry, sad, and happy emotions at four frames with four Caucasian and four Chinese expressers) each were shown twice. We felt that showing the expression twice struck a good balance between increasing the reliability of the accuracy measure and avoiding having the participants become too familiar with the expression. The neutral expressions, used as fillers, were interspersed among the emotional expressions and were each shown 10 times. Participants viewed each expression and categorized it by pressing one of the keys (angry, sad, happy, neutral) on the keyboard. Once participants entered a response, the expression disappeared and the next expression appeared on the screen.
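The trial structure described above can be sketched in Python as follows. This is an illustrative reconstruction, not the E-Prime script used in the study: the function name `build_trials`, the expresser identifiers, and the random seed are hypothetical, and the sketch assumes one neutral photo per expresser. Under that assumption the design yields 192 emotional trials and 80 neutral filler trials; these totals are derived here and are not stated in the text.

```python
import random
from itertools import product

EMOTIONS = ["angry", "sad", "happy"]
INTENSITIES = [15, 30, 45, 60]  # percent frames
# Four expressers per culture; identifiers are hypothetical.
EXPRESSERS = [f"{culture}_{i}" for culture in ("Caucasian", "Chinese") for i in range(1, 5)]
EMOTION_REPEATS = 2   # each emotional expression shown twice
NEUTRAL_REPEATS = 10  # each neutral filler shown 10 times


def build_trials(seed=0):
    """Return a shuffled list of trial dictionaries for one participant."""
    emotional = [
        {"expresser": e, "emotion": emo, "intensity": pct}
        for e, emo, pct in product(EXPRESSERS, EMOTIONS, INTENSITIES)
    ] * EMOTION_REPEATS
    # Assumption: one neutral photo per expresser serves as the filler.
    neutral = [
        {"expresser": e, "emotion": "neutral", "intensity": 0}
        for e in EXPRESSERS
    ] * NEUTRAL_REPEATS
    trials = emotional + neutral
    random.Random(seed).shuffle(trials)  # randomized presentation order
    return trials


trials = build_trials()
n_neutral = sum(1 for t in trials if t["emotion"] == "neutral")
print(len(trials), len(trials) - n_neutral, n_neutral)
```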

Data preparation

Because we were primarily interested in cultural differences, recognition accuracy was averaged among the four expressers of each cultural group to create a composite accuracy of recognizing in-group versus out-group members. In addition, to simplify data presentation and analyses, recognition accuracies at 30 and 45 % were averaged to represent accuracy at the low intensity level, contrasting with the subtle (15 %) and moderate (60 %) intensity levels (Footnote 4). Table 2 shows means and standard deviations of raw accuracy (raw hit rate) as a function of emotion, culture of expresser, and culture of judge.

Table 2 Mean raw recognition accuracy as a function of emotion, culture of expresser, and culture of judge

Raw accuracy, however, is not a good measure of recognition accuracy because it does not take response biases into consideration. Therefore, an unbiased hit rate (Hu) was used to measure recognition accuracy (Wagner 1993). Hu was computed by multiplying the raw hit rate (e.g., the number of times sadness was correctly identified divided by the total number of times sad expressions were presented) and differential accuracy (e.g., the number of times sadness was correctly identified divided by the number of times participants used the label of sadness across all stimuli). Hu scores ranged from 0 to 1. We calculated Hu for each participant, separately for the recognition of each expression at the 15, 30, 45 and 60 % frames. As reported earlier, the original Chinese expressions were rated somewhat lower in pretest intensity than the original Caucasian expressions. As a result, the morphed Caucasian and Chinese expressions may differ somewhat in intensity as well, which may have affected recognition accuracy. To correct for this, each Hu was regressed onto the pretest intensity of the corresponding original expression, and the unstandardized Hu residues were used in the analyses. Hu residues represent recognition accuracy after adjusting for variance in the intensity of the original expressions; they range from negative to positive values, with higher values indicating greater accuracy. As with raw accuracy, Hu residues for the 30 and 45 % frames were averaged to represent adjusted recognition accuracy at the low intensity level. Hu residues were then averaged among the four expressers of each culture to create a composite accuracy of recognizing in-group versus out-group members at the subtle, low, and moderate intensity levels (Footnote 5).
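The two computations described above, the unbiased hit rate and the intensity adjustment, can be sketched in Python. The confusion counts, Hu scores, and pretest intensities below are made-up toy values, and the function names are hypothetical. The adjustment is shown as a simple one-predictor least-squares regression, which is one plausible reading of the procedure; the text does not specify the unit of analysis over which the regression was run.

```python
def unbiased_hit_rate(confusion, emotion):
    """Unbiased hit rate: raw hit rate multiplied by differential accuracy.

    confusion[stimulus][response] holds response counts for one participant.
    """
    hits = confusion[emotion].get(emotion, 0)
    n_presented = sum(confusion[emotion].values())
    n_label_used = sum(row.get(emotion, 0) for row in confusion.values())
    if n_presented == 0 or n_label_used == 0:
        return 0.0
    return (hits / n_presented) * (hits / n_label_used)


def ols_residuals(y, x):
    """Unstandardized residuals from regressing y on a single predictor x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy confusion counts (hypothetical): rows are stimuli, columns are responses.
confusion = {
    "sad": {"sad": 6, "neutral": 2},
    "angry": {"angry": 5, "sad": 2, "neutral": 1},
    "happy": {"happy": 8},
    "neutral": {"neutral": 7, "sad": 1},
}
hu_sad = unbiased_hit_rate(confusion, "sad")  # (6/8) * (6/9)

# Toy Hu scores and pretest intensities (hypothetical values).
hu_scores = [0.50, 0.62, 0.41, 0.70]
pretest_intensity = [6.1, 7.0, 5.5, 7.4]
hu_residues = ols_residuals(hu_scores, pretest_intensity)
print(round(hu_sad, 3), [round(r, 3) for r in hu_residues])
```

In the toy counts, sadness is identified correctly on 6 of 8 sad trials, but the sadness label is used 9 times overall, so the raw hit rate of .75 is discounted by the 6/9 differential accuracy. The residuals center on zero, which is why the adjusted accuracy values reported in the Results can be negative.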

Results

A 2(Judge culture: American vs. Chinese) × 2(Expresser culture: Caucasian vs. Chinese) × 3(Emotion: angry, sad, happy) × 3(Intensity level: subtle, low, moderate) mixed ANOVA, with repeated measures on the latter three variables, was conducted on the composite adjusted recognition accuracy (Hu residues). Above all effects, a significant four-way Judge culture × Expresser culture × Emotion × Intensity interaction was found, F(4, 362) = 5.33, p < .001, η² = .06, indicating that recognition accuracy varied as a function of judges’ culture, expressers’ culture, emotion, and intensity levels of the expressions. Therefore, a 2(Judge culture) × 2(Expresser culture) × 3(Intensity) repeated-measures ANOVA was conducted separately for each emotion.
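As a reading aid, the effect sizes reported in this section can be related to the F statistics with a short Python sketch. It assumes that the reported η² values are partial eta squared, recoverable from F and its degrees of freedom; the text does not state this explicitly, but the four-way interaction above is numerically consistent with that assumption. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Four-way interaction reported above: F(4, 362) = 5.33.
four_way = partial_eta_squared(5.33, 4, 362)
print(round(four_way, 2))
```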

Angry expressions

The ANOVA revealed a significant main effect of expresser culture, F(1, 365) = 27.1, p < .001, η² = .07. Overall, Caucasian expressions (M = −0.07, SE = 0.003) were more accurately judged than Chinese expressions (M = −0.08, SE = 0.002). A significant main effect of Intensity was also found, F(2, 364) = 2,892.4, p < .001, η² = .94; as the intensity of the expressions increased from subtle to moderate, recognition accuracy also increased, respectively, M = −0.23, −0.06, and 0.07 (SEs = 0.003) at the subtle, low, and moderate intensities. Significant two-way interactions were found between Judge culture × Expresser culture, F(1, 365) = 55.9, p < .001, η² = .13, Judge culture × Intensity, F(2, 364) = 27.8, p < .001, η² = .13, and Expresser culture × Intensity, F(2, 364) = 583.96, p < .001, η² = .76. Most important of all, a significant three-way interaction was found between Judge culture × Expresser culture × Intensity, F(2, 364) = 20.0, p < .001, η² = .10. Panel A in Fig. 1 illustrates the three-way interaction.

Fig. 1 Comparing American Judges and Chinese Judges in recognition accuracy in identifying Caucasian and Chinese facial expressions of emotions of varying intensities. Error bars represent standard errors. AJ American Judges, CJ Chinese Judges, CauExp Caucasian Expressions, ChiExp Chinese Expressions. a Angry expressions, b sad expressions, c happy expressions

To tease apart the three-way interaction, simple effects analyses (two-tailed, with Holm-Bonferroni corrections, adjusting for the number of multiple comparisons) were conducted, comparing American judges and Chinese judges at each of the intensity levels in judging in-group versus out-group members. At the subtle intensity level, American judges were more accurate than Chinese judges in judging both Caucasian and Chinese expressions, respectively, F(1, 365) = 4.99, p = .052, η² = .013 for Caucasian expressions and F(1, 365) = 7.38, p = .024, η² = .02 for Chinese expressions. As the intensity level increased to the low intensity level, an in-group advantage was found. Specifically, American judges were more accurate than Chinese judges in judging Caucasian expressions, F(1, 365) = 22.16, p < .001, η² = .06, and Chinese judges were more accurate than American judges in judging Chinese expressions, F(1, 365) = 7.69, p = .024, η² = .02. At the moderate intensity level, American and Chinese judges did not differ in judging Caucasian expressions, F(1, 365) < 0.000, ns, but Chinese judges were more accurate than American judges in judging Chinese expressions, F(1, 365) = 84.31, p < .001, η² = .19.
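For readers unfamiliar with the Holm-Bonferroni procedure, the following minimal Python sketch shows how adjusted p values are obtained from a family of raw p values. The raw values in the example are hypothetical; the text does not report the unadjusted p values or the exact number of comparisons in each family, so the sketch does not reproduce the figures above.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for a family of three comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw)
print([round(p, 3) for p in adjusted])
```

Because the smallest p value is multiplied by the full number of comparisons and later ones by progressively smaller multipliers, the procedure is less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.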

Sad expressions

The ANOVA for sad expressions also revealed a significant main effect of expresser culture, F(1, 365) = 312.04, p < .001, η² = .46. Again, Caucasian expressions (M = −0.02, SE = 0.003) were more accurately judged than Chinese expressions (M = −0.08, SE = 0.002). A main effect of judge culture group was found, F(1, 365) = 48.97, p < .001, η² = .12, showing that American judges (M = −0.04, SE = 0.002) were overall more accurate than Chinese judges (M = −0.06, SE = 0.002). A main effect of Intensity was also found, F(2, 364) = 1,477.02, p < .001, η² = .89, showing again that as the intensity of the expressions increased, recognition accuracy also increased, respectively, M = −0.16, −0.05, and 0.05 (SE = 0.002, 0.002, and 0.003) for the subtle, low, and moderate intensity levels. In addition to the main effects, the analysis also revealed significant two-way interactions between Judge culture × Expresser culture, F(1, 365) = 7.42, p = .007, η² = .02, Judge culture × Intensity, F(2, 364) = 19.48, p < .001, η² = .10, and Expresser culture × Intensity, F(2, 364) = 260.97, p < .001, η² = .59. These interactions, however, were qualified by a significant three-way Judge culture × Expresser culture × Intensity interaction, F(2, 364) = 5.02, p = .007, η² = .03. Panel B in Fig. 1 illustrates the three-way interaction.

To tease apart the interaction, simple effects analyses (two-tailed, with Holm-Bonferroni corrections) were conducted at each of the intensity levels. American judges were more accurate than Chinese judges in judging both Caucasian and Chinese sad expressions at both the subtle and low intensity levels, respectively, F(1, 365) = 13.5 and 37.8, ps < .001, η² = .04 and .09 for judging Caucasian expressions at subtle and low intensities and F(1, 365) = 38.7 and 26.3, ps < .001, η² = .10 and .07 for judging Chinese expressions at subtle and low intensities. At moderate intensity, an in-group advantage was found. American judges were more accurate than Chinese judges in judging Caucasian expressions, F(1, 365) = 6.06, p = .028, η² = .02, and Chinese judges were more accurate than American judges in judging Chinese expressions, F(1, 365) = 15.02, p < .001, η² = .04.

Happy expressions

The ANOVA for happy expressions revealed a marginal main effect of Expresser culture, F(1, 365) = 2.86, p = .09, and a significant main effect of Intensity, F(2, 364) = 3,729.8, p < .001, η² = .95, which showed once again that as the intensity of the expressions increased, recognition accuracy also increased, respectively, M = −0.21, 0.05 and 0.12 (SEs = 0.003) for the subtle to moderate intensity levels. No main effect of judge culture group was found. Significant two-way interactions were found between Judge culture × Intensity, F(2, 364) = 8.99, p = .018, η² = .024, and Expresser culture × Intensity, F(2, 364) = 339.58, p < .001, η² = .65. However, they were qualified by a significant three-way interaction between Judge culture group × Expresser culture group × Intensity, F(2, 364) = 4.05, p = .018, η² = .02. Panel C in Fig. 1 illustrates the three-way interaction.

Simple effect comparisons between American judges and Chinese judges (two-tailed, with Holm-Bonferroni corrections) indicated, however, that American judges did not differ from Chinese judges in judging happy expressions at all intensity levels except at the low intensity level where American judges were more accurate than Chinese judges in judging Caucasian expressions, F(1, 365) = 8.9, p = .018, η² = .024.

Discussion

The present study compared Americans with Chinese in the recognition of subtle to moderately intense angry, sad, and happy expressions of in-group and out-group members. Evidence for the relative in-group advantage was found in the recognition of angry and sad, but not happy expressions. Specifically, a clear in-group advantage was found in the recognition of angry expressions at the low intensity level; Americans and Chinese both were more accurate in judging expressions of in-group than out-group members. A partial in-group effect was also found in the recognition of angry expressions at the moderate level; Chinese were more accurate than Americans in judging the Chinese angry expressions even though they did not differ from Americans in judging the Caucasian angry expressions. Consistent with these results and our predictions, a clear in-group advantage was also found in the recognition of sad expressions at the moderate level; Americans and Chinese both were more accurate at recognizing sad expressions expressed by in-group than out-group members. Together, these findings confirmed the existence of the relative in-group advantage in the recognition of angry and sad expressions at low to moderate intensity.

Why might our findings have shown an in-group advantage at the low intensity level for angry expression, rather than our predicted in-group advantage at the moderate level? Matsumoto (2002) suggested that the in-group advantage may be best observed when expressive signals are of mid-range clarity. Even though mid-range clarity is clearly related to intensity, it is also related to specific emotions. Happy, and to a lesser degree, angry expressions, were found to have lower in-group advantages than other emotions because these two expressions have higher signal clarity and are more recognizable than the other emotions (Elfenbein and Ambady 2002b; Matsumoto 2002). This inherent difference in signal clarity between different emotional expressions probably explained why the in-group advantage was not observed for happy expressions but was observed at the low intensity for angry expressions and at the moderate intensity for sad expressions.

The present findings also demonstrated that the in-group advantage for angry and sad expressions has an intensity boundary. As the expression intensity dropped to lower levels, the in-group advantage disappeared, replaced by a main effect of judge culture in recognition accuracy. American judges were more accurate than Chinese judges in judging both Caucasian and Chinese angry expressions at the subtle intensity level. American judges were also more accurate than Chinese judges in judging both Caucasian and Chinese sad expressions at both the subtle and low intensity levels. The two cultural groups did not differ in judging subtle happy expressions. The present findings suggest that the in-group advantage demonstrated by cultural members subsides in recognizing subtle- to low-intensity negative expressions, and where the in-group advantage stops, cultural differences in sensitivity to very subtle expressions come to the fore.

As described earlier, two alternative hypotheses could be made about possible cultural differences in the recognition of subtle expressions. The present findings appear to support the second hypothesis, which predicted that Chinese individuals would be less accurate than Americans in judging negatively valenced subtle expressions, perhaps because negative expressions have more troublesome implications for interpersonal harmony and could potentially be disruptive to that harmony (Matsumoto 1989; Matsumoto and Ekman 1989; Matsumoto et al. 2008). Possibly, the desire to preserve “face” and maintain interpersonal harmony makes Chinese less sensitive or more oblivious to subtle negative expressions because accurate perception of these expressions disturbs interpersonal equilibrium.

The present findings suggest that cultural forces other than the in-group advantage are at work in the recognition of subtle facial expressions. Previous research has noted that people in individualistic cultures have greater personal autonomy and show greater skills in entering and leaving social relationships (Triandis et al. 1988). Personal feelings and expressions of these feelings are paramount in interpersonal communication in individualistic cultures. In contrast, Chinese culture is a prototypical example of vertical collectivism that emphasizes both social interdependence and hierarchy (Singelis et al. 1995; Triandis 1995; Shavitt et al. 2011). People are expected to act in accordance with their social places in the context of hierarchical relations with others (Hsu 1981; Shavitt et al. 2011; Yang 1981). Preservation of harmony in social relations is more important than expressions of personal feelings in Chinese society. These different cultural values probably orient the Chinese toward neutrality and ambivalence in negative social interactions (Peng and Nisbett 1999), and orient Americans toward being more perceptive of others’ feelings and intentions.

As discussed earlier, the present study did not find any difference between Americans and Chinese in recognition of happy expressions except at the low intensity, where Americans were found to be more accurate than Chinese in judging the Caucasian expressions. This overall lack of strong cultural effects likely reflects a ceiling effect in judging happy expressions (Elfenbein and Ambady 2002b; Matsumoto et al. 2002). Happy expressions are more recognizable than other expressions because they involve fewer facial muscles and less subtle muscle actions (Matsumoto et al. 2002). People recognize happy expressions with very high accuracy even at reduced intensity levels (Zhang and Parmley 2011). In the present study, the raw recognition accuracy for happy expressions already reached around 84% at the low intensity level and around 97% at the moderate intensity level. Homogeneity in recognizing happy expressions may have excluded any cultural effect. The fact that American and Chinese judges did not differ in recognizing subtle happy expressions, in contrast with the results for anger and sadness, suggests that the cultural difference in judging subtle expressions is particular to judgments of angry and sad expressions. Negative emotions such as anger or sadness have more troublesome implications for interpersonal harmony, which probably makes accurate detection of these expressions more threatening and uncomfortable to the Chinese.

Strengths of the present study include the use of expressions that varied in both intensity and the cultural background of the expresser, and the inclusion of a large sample of both Chinese and American judges residing in their home countries (in contrast, many studies compare East Asian and American participants who are both studying in Western universities). Our novel results address an important question posed by the past literature: for subtler, more everyday emotional expressions, would East Asian participants demonstrate an emotion recognition advantage over American participants (due to a greater vigilance for threats to social harmony) or an emotion recognition disadvantage relative to American participants (due to a desire to avoid threats to social harmony)? The current results suggest the latter, which, following replication and extension, may refine contemporary theories of the in-group advantage and of cross-cultural differences in emotion recognition.

However, the present study had several limitations. First, for negative emotions, we only examined angry and sad expressions because anger and sadness signal social disapproval or loss. It would be interesting to see whether the present findings will be replicated using other negative emotions such as shame, guilt, contempt, and disgust, and other positive emotions such as contentment and surprise. A broadened investigation will allow for a more general conclusion to be drawn. Second, our sample sizes were large and the effect sizes of key findings were generally small. The small effect sizes suggest that while the differences observed were statistically significant, they may not correspond to meaningful differences in experience. We recruited large samples because of the complexity of the research design and the number of factors we wished to examine in the study. The effect sizes, despite being small, are comparable to the effect size (r²) of .06 (95% confidence interval: .02–.12) that Elfenbein and Ambady (2002b) reported in their meta-analysis of forty-eight studies on the in-group advantage (Footnote 6). Additionally, we note that two features of the present study contributed to the small effect sizes. One is that the design of the study was complex and involved many main effects and two-way interactions, resulting in reduced three-way interaction effects among the independent variables. The other, and more important, is that we used Hu residuals to measure recognition accuracy. While Hu residuals adjusted for differences in intensities of the original expressions and thus provided a better measurement of recognition accuracy, they did remove a large portion of variance in the data and likely artificially reduced the effect sizes. In this sense, the present results can in fact be regarded as robust because even with such stringent data criteria, the results remained significant. Prentice and Miller (1992) argued that the size of an effect depends not just on the relationship between the independent and dependent variables but also on the operations used to generate the data, and that small effects can be important if they hold even under the most inauspicious circumstances. Our data fit this description. Furthermore, the cultural differences found in the present study made theoretical sense and were consistent with known differences between American and Chinese cultures.

A third limitation is that although the morphing procedure keeps the timing of the transition of the expression constant, it does not represent real-time movements in facial expressions. The photos that participants viewed were static, and it is not clear how watching the actual movement of the expression might affect the perception of these expressions. Actual movement of expressions includes dynamic features such as tempo, duration, velocity, and unfolding of expressions that are not reflected in static expressions and play a unique role in nonverbal communication (Ekman 1984; Edwards 1998; Harwood et al. 1999). Previous research has produced mixed results as to whether dynamic expressions affect recognition accuracy (Fiorentini and Viviani 2011; Krumhuber et al. 2013; Wehrle et al. 2000). Future research should examine participants’ accuracy in assessing facial expressions as they are naturally shown in a video sequence to see if similar cultural patterns emerge. Fourth, whenever one performs cross-cultural studies of this sort, one must question whether the samples differed above and beyond cultural background. Other than the difference between the two samples in age and sex distribution (which was nevertheless found to have no effect on recognition accuracy), the major way in which the samples differed was that the American sample was recruited from a small liberal arts college while the Chinese sample was recruited from a top-tier Chinese university. Thus, the Chinese students in the study were part of a larger educational institution and were a more selective group of students. What effect these variations in sampling might have on recognition of facial expressions is unclear. The fact that we found a clear in-group advantage in the recognition of angry and sad expressions at the low and moderate intensity for both samples suggests that any effect that this sampling difference might have on the recognition of facial expressions may be negligible. Finally, we only compared American judges with Chinese judges. Before broad conclusions can be made about differences between East Asian and Western cultures, additional judges from various cultures of each type should be examined.

Despite these limitations, the present study provided evidence that the in-group advantage exists at the low and moderate intensity levels, thus extending the previous research. The present findings also suggest a cultural difference in sensitivity to subtle expressions of negative emotions. Compared to Americans, Chinese appear to show a reduced sensitivity to expressions of anger or sadness at subtle or low intensities, which may be related to their strong desire to preserve interpersonal harmony. The present findings have important implications for cross-cultural communication of emotions, suggesting that cultures differ in how sensitive people are to subtle emotional signals. As our global culture evolves to be increasingly interactive, understanding how emotion recognition varies across cultures matters a great deal for the quality and success of these interactions.