People tend to make trait inferences from facial features (Oosterhof & Todorov, 2008; Todorov et al., 2009) with remarkable speed and consensus (Rule et al., 2013; Todorov et al., 2009). Such inferences can be drawn from facial features alone (Oosterhof & Todorov, 2008; Todorov, Baron, & Oosterhof, 2008). Moreover, these evaluations can predict important social outcomes (Antonakis & Dalgas, 2009; Rule & Ambady, 2008; Wilson & Rule, 2015; Zebrowitz & McDonald, 1991). However, the role that cultural context plays in face-to-trait inferences has not been fully examined.

Previous cross-cultural studies have widely documented cultural differences in social inferences (Choi et al., 1999; Nisbett et al., 2001). These studies have shown that European Americans tend to display an analytic thinking style and make trait inferences more readily than East Asians (e.g., Na & Kitayama, 2011; Shimizu et al., 2017; Shimizu & Uleman, 2021), whereas East Asians tend to display a holistic thinking style and pay more attention to situational factors than European Americans (e.g., Choi & Nisbett, 1998; Miyamoto & Kitayama, 2002). Given these cultural differences, the extent to which people ascribe traits to individuals based on facial features may be weaker among East Asians than among European Americans, a possibility that has not been directly tested.

The goal of the current research is to fill this gap. We made two predictions. First, although both European Americans and East Asians ascribe traits to individuals based on facial features, European Americans will make stronger trait ascriptions than East Asians. Second, cultural differences in holistic attention may partly mediate cultural differences in trait ascription.

A considerable body of cross-cultural research has documented cultural differences in how people make social inferences (Choi et al., 1999; Masuda et al., 2019). In European American cultural contexts that highlight an independent model of individuals, people are defined in terms of their internal attributes, independently of situations. In such cultural contexts, people readily and spontaneously infer a trait from another person’s behavior, paying limited attention to situational information (Gilbert & Malone, 1995). In contrast, in East Asian cultural contexts that emphasize an interdependent model of individuals, people are defined largely in terms of their relationships to social contexts. In such cultural contexts, people are more likely to take situational factors into consideration when making social judgments (Choi & Nisbett, 1998; Miyamoto & Kitayama, 2002) and less likely to show spontaneous trait inference (Na & Kitayama, 2011; Shimizu et al., 2017). These findings raise the possibility that the degree to which people ascribe traits to another person based solely on individual attributes such as facial features (i.e., without any situational information) may also differ across cultures.

At the same time, cross-cultural studies that examined trait judgments from faces found high cross-cultural consensus in face-to-trait inferences (Na et al., 2015; Rule et al., 2010; Zebrowitz et al., 2012). These studies asked participants to judge a trait based on faces and found high correlations between American and non-American respondents’ judgments. For example, Na and colleagues (2015) found that Korean and European American respondents’ inferences of competence based on political candidates’ facial features were highly correlated with each other, even though the inferred traits predicted election outcomes less well in Korea than in the U.S. However, cross-cultural consensus about which faces are more likely than others to be ascribed a given trait (i.e., the ordering of faces along a trait dimension) does not preclude cultural differences in the degree to which people ascribe traits to faces.

In particular, although people across cultures may ascribe similar traits to individuals based on their facial features, the extent to which trait inferences vary as a function of a stimulus’s trait-signaling intensity may differ across cultures. Small changes in trait-related facial cues may convey graded information about potential personality traits, thus facilitating a corresponding judgment about a person (Lynn & Barrett, 2014). However, the degree to which people use such changes in facial cues to infer the corresponding traits of the target person may vary across cultures. European Americans, who tend to infer traits readily from another person’s features, may be more sensitive than East Asians to changes in trait-signaling facial cues and may adjust their trait judgments accordingly, whereas East Asians tend to also rely on situational information when inferring others’ traits. Taking situational factors into consideration when inferring traits from faces would reduce the relative impact of changes in trait-signaling facial cues on trait inferences.

Although it does not directly address trait-signaling intensity, there is suggestive evidence of cultural differences in the degree of face-to-trait inference. Walker and colleagues (2011) found that Asian respondents were slower and made more errors than Western respondents when presented with a pair of faces (one with enhanced and one with reduced salience of a personality trait) and asked to identify the one that matched the given trait. Building on this evidence, we tested potential cultural differences in the degree to which trait inferences track trait-signaling intensity by comparing European American and Korean respondents’ trait judgments of Caucasian and Asian faces created using model-based techniques (see the Method section for details). We hypothesized that, although both European Americans and East Asians ascribe traits to individuals based on facial features, European Americans would make stronger trait ascriptions than East Asians. We further hypothesized that cultural differences in holistic attention to the situation would mediate cultural differences in trait ascription: European Americans, who tend to ascribe traits mainly based on facial features, would show stronger face-to-trait inferences than East Asians partly because East Asians tend to also consider situational factors when ascribing traits.

We further explored whether there would be an out-group homogeneity effect and whether the effect would depend on respondents’ cultural backgrounds. It is well documented that people tend to view out-group members as more homogeneous by focusing on shared categorical features, while individuating in-group members by focusing on unique characteristics (Judd & Park, 1988; Park & Rothbart, 1982; Quattrone & Jones, 1980; Fiske & Neuberg, 1990). Moreover, studies examining neural responses to in-group and out-group faces suggest that in-group faces are processed more readily and deeply than out-group faces (Adams et al., 2010; Hughes et al., 2019; Ratner & Amodio, 2013; Van Bavel et al., 2008, 2011). It is thus likely that people make more differentiated trait ascriptions for own-race faces than for other-race faces when perceiving faces that gradually change in trait-signaling intensity. At the same time, this bias does not always occur and depends on multiple contextual factors (Linville et al., 1989; Rubin & Badea, 2012; Simon, 1992), including perceivers’ cultural background (e.g., Lee & Ottati, 1993). For example, using a recognition task, Ng and colleagues (2016) found that East Asian Canadians showed a weaker own-race face memory effect than European Canadians (but see Sutherland et al., 2018). We thus explored whether such cultural differences can be observed even in a perceptual task using controlled facial stimuli.

To test these hypotheses, we used model-based techniques (Oosterhof & Todorov, 2008) to create Caucasian and Asian face stimuli with finely graded continua of trait signals ranging from submissive to dominant or untrustworthy to trustworthy. We then compared the extent to which European American and Korean respondents ascribe dominance and trustworthiness traits to these faces along the continuum of trait intensity. We predicted that although both European Americans and Koreans ascribe traits that track the trait intensity predicted by the model, European Americans would make stronger trait ascriptions than Koreans. Furthermore, we examined whether attention to situational information would partly mediate cultural differences in trait ascription. We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. The preregistration can be accessed at http://aspredicted.org/blind.php?x=7eq64c.

Method

Materials

Sets of Caucasian and Asian face stimuli that vary on two underlying trait dimensions (trustworthiness and dominance) were created following the methods of Oosterhof and Todorov (2008). We first randomly generated 300 Caucasian and 300 Asian faces using FaceGen Modeller 3.5, with emotional expressions set to neutral. We then recruited 70 participants (35 European Americans and 35 Koreans) to rate the 300 own-race faces on trustworthiness and dominance on a scale ranging from 1 to 9. For each social dimension, the mean judgments averaged across these participants were regressed on the 130 shape and texture components of the faces, and the best linear fit defined the dimension of trustworthiness or dominance. Using these fitted component weights, we determined how to alter the normalized face shape vector so as to increase or decrease trustworthiness (or dominance) in steps of 1 SD. We then orthogonalized the trustworthiness and dominance vectors so that the two dimensions would be independent of each other (see Oosterhof & Todorov, 2008, for the detailed procedure of model construction). Based on the models, 20 faces of each race were generated and manipulated at seven levels of trustworthiness and dominance (−4.5, −3, −1.5, 0, 1.5, 3, and 4.5 SD). This resulted in 140 faces (20 identities × 7 levels) per race for each trait dimension, and thus 280 Caucasian and 280 Asian faces in total (see Fig. 1). All faces were generated as middle-aged, gender-neutral faces and were presented against a black background.
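The model-construction steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors’ pipeline or FaceGen’s API: we assume each face is a vector of 130 components, that a trait direction is the ordinary least-squares weight vector, that orthogonalization is a single Gram-Schmidt step (dominance made orthogonal to trustworthiness), and that 1 SD along an axis is the SD of the random faces’ projections onto it. The function names are hypothetical and the ratings are simulated.

```python
import numpy as np

LEVELS = [-4.5, -3.0, -1.5, 0.0, 1.5, 3.0, 4.5]  # SD units used in the study


def trait_direction(components, mean_ratings):
    """Least-squares fit of mean trait ratings on the face components.

    Returns the component weights (intercept dropped): the direction in face
    space along which the predicted rating increases fastest.
    """
    n_faces = components.shape[0]
    design = np.column_stack([np.ones(n_faces), components])
    coef, *_ = np.linalg.lstsq(design, mean_ratings, rcond=None)
    return coef[1:]


def orthogonalize(first, second):
    """Gram-Schmidt step: unit `first`, and unit `second` with its projection on `first` removed."""
    u1 = first / np.linalg.norm(first)
    residual = second - np.dot(second, u1) * u1
    return u1, residual / np.linalg.norm(residual)


def sd_step(components, unit_axis):
    """Length of 1 SD along an axis: SD of the random faces' projections onto it."""
    return np.std(components @ unit_axis, ddof=1)


def make_continuum(base_face, unit_axis, step, levels=LEVELS):
    """Shift one face along a trait axis by each level (in SD units)."""
    return np.array([base_face + level * step * unit_axis for level in levels])


# Demo on simulated data (hypothetical: 300 random faces x 130 components).
rng = np.random.default_rng(0)
faces = rng.standard_normal((300, 130))
trust_ratings = faces @ rng.standard_normal(130) + rng.standard_normal(300)
dom_ratings = faces @ rng.standard_normal(130) + rng.standard_normal(300)

trust_axis, dom_axis = orthogonalize(
    trait_direction(faces, trust_ratings), trait_direction(faces, dom_ratings)
)
step = sd_step(faces, trust_axis)
continuum = make_continuum(faces[0], trust_axis, step)  # 7 versions of one face
```

Which axis is held fixed during orthogonalization is a design choice in this sketch; because the two axes are orthogonal, moving a face along the trustworthiness axis leaves its position on the dominance axis unchanged. The original procedure is described in Oosterhof and Todorov (2008).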

Fig. 1

A 2D model of face evaluation. Examples of (A) a Caucasian and (B) an Asian face varying on the two orthogonal dimensions: trustworthiness and dominance. The face changes were generated from a computer model based on trustworthiness and dominance judgments of 300 Caucasian and 300 Asian faces. The extent of face changes is presented in SD units.

Additionally, to test our hypothesis that situational attention would partially mediate cultural differences in the magnitude of facial trait inferences, we asked participants to rate the extent to which they had considered various factors when judging the faces. Four items pertained to contextual and demographic factors (i.e., possible surrounding situation, race, gender, and age), four items pertained to facial features (i.e., eyes, brows, nose, and mouth), and two additional items served as fillers (i.e., overall impression and emotional expression).

Participants

Based on a power analysis in G*Power (power = 0.95, alpha = 0.05, effect size f = 0.1), we preregistered a target sample of 120 participants from each culture. Anticipating that some respondents would be excluded, we recruited 127 European American undergraduate students in the U.S. and 150 Korean undergraduate students in South Korea. A sensitivity analysis with power = 0.95, alpha = 0.05, total sample size = 252, and number of measurements = 5 indicated a minimum detectable effect size of f = 0.12, equivalent to η² ≈ 0.01. Participants received course credit for taking part. Following our preregistered plan, we excluded participants who failed an attention check, who provided the same response to more than 90% of questions, or who were not European Americans (U.S. sample) or not Koreans (Korean sample). After these exclusions (5 participants from the U.S. sample and 21 from the Korean sample), responses from 122 European Americans and 129 Koreans were included in the data analysis.
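The preregistered exclusion rules and the effect-size conversion in the sensitivity analysis can be expressed as a short Python sketch. The helper names are hypothetical. The conversion uses the standard relation between Cohen’s f and (partial) η², η² = f²/(1 + f²); it checks the reported conversion and does not reproduce the G*Power analysis itself.

```python
from collections import Counter


def is_straightliner(responses, threshold=0.90):
    """True if one response value accounts for more than `threshold` of all answers."""
    top_count = Counter(responses).most_common(1)[0][1]
    return top_count / len(responses) > threshold


def should_exclude(responses, passed_attention_check, in_target_group):
    """Exclude if any criterion is met: failed attention check, straightlining,
    or not belonging to the target cultural group for that sample."""
    return (not passed_attention_check) or is_straightliner(responses) or (not in_target_group)


def f_to_eta_squared(f):
    """Convert Cohen's f to (partial) eta squared: eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


print(round(f_to_eta_squared(0.12), 4))  # prints 0.0142, i.e. about .01
```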

Procedure

We employed a 2 (Respondents’ culture: European Americans vs. Koreans; between-subjects) × 2 (Face ethnicity: Caucasian vs. Asian; between-subjects) × 2 (Trait type: dominance vs. trustworthiness; between-subjects) × 7 (Level of trait: −4.5, −3, −1.5, 0, 1.5, 3, 4.5 SD; within-subject) mixed design. Participants were randomly assigned to one of four conditions (2 face ethnicities × 2 trait types). Each participant saw 140 Caucasian or Asian faces that varied on either dominance or trustworthiness and rated each face on that trait on a 9-point scale. Within each condition, the 140 faces were presented in a random order for each participant. After rating all the faces, participants rated, on a 7-point scale, the factors they had taken into consideration when judging the faces (e.g., possible surrounding situation, demographic factors, specific facial features). Finally, all respondents rated how distracted they had been during the rating task on three items (e.g., “I was distracted while rating the faces”) using a 7-point scale.
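The trial structure of this design can be sketched in Python. This is an illustration under assumptions: each participant is assigned to a cell by a simple random draw (which, unlike blocked assignment, does not guarantee equal cell sizes), and each face is identified by a hypothetical identity index and trait level. Respondents’ culture is a measured factor, so it is not assigned here.

```python
import itertools
import random

LEVELS = [-4.5, -3.0, -1.5, 0.0, 1.5, 3.0, 4.5]  # trait levels in SD units
FACE_ETHNICITIES = ["Caucasian", "Asian"]
TRAITS = ["dominance", "trustworthiness"]
N_IDENTITIES = 20  # face identities per race

# The four between-subjects cells: face ethnicity x trait type.
CONDITIONS = list(itertools.product(FACE_ETHNICITIES, TRAITS))


def build_trials(face_ethnicity, trait, rng):
    """Return all 140 faces (20 identities x 7 levels) for one cell in a fresh random order."""
    trials = [
        {"face_ethnicity": face_ethnicity, "trait": trait, "identity": i, "level": level}
        for i in range(N_IDENTITIES)
        for level in LEVELS
    ]
    rng.shuffle(trials)  # a new order for every participant
    return trials


def run_participant(rng):
    """Randomly assign one participant to a cell and build that participant's trial list."""
    face_ethnicity, trait = rng.choice(CONDITIONS)
    return build_trials(face_ethnicity, trait, rng)


trials = run_participant(random.Random(42))
```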

Results

Cultural differences on trait ascription

We first standardized each respondent’s trait ratings within that respondent to control for potential cultural differences in response bias. We then performed linear mixed model (LMM) analyses with the standardized trait ratings (see Footnote 1) as the dependent variable and Face ethnicity, Trait type, Level of trait, Respondents’ culture, and all their interaction terms as predictors. In this model, the slope of ratings across the seven trait levels of each face (i.e., the intensity of trait ascription) was allowed to vary randomly (see Table S2 for detailed results). There was a main effect of Level of trait (B = 0.27, SE = 0.006, t(40) = 41.1, p < .001): the level of trait linearly predicted trait ascription in both cultures, such that as the trait intensity of a face increased, a more extreme trait was inferred. However, this main effect was qualified by a Respondents’ culture × Level of trait interaction (B = 0.02, SE = 0.004, t(35,060) = 5.92, p < .001). Supporting our prediction, the level of trait had a larger impact on trait ratings for European American respondents (B = 0.28, 95% CI [0.27, 0.29]) than for Korean respondents (B = 0.26, 95% CI [0.24, 0.27]; z = 5.92, p < .001; see Table S3).
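Within-respondent standardization and the idea of a per-respondent trait-ascription slope can be sketched with NumPy. This is a simplified two-stage stand-in, not the LMM reported here: the LMM estimates all slopes jointly, with random slopes and the full set of interactions, which a mixed-model package (e.g., lme4 in R or statsmodels’ `mixedlm` in Python) would handle. The sketch shows why standardizing within respondents removes differences in scale use while keeping the slope on trait level interpretable. The data are simulated and the function names hypothetical.

```python
import numpy as np

LEVELS = np.array([-4.5, -3.0, -1.5, 0.0, 1.5, 3.0, 4.5])


def standardize_within(ratings):
    """z-score one respondent's ratings against that respondent's own mean and SD."""
    ratings = np.asarray(ratings, dtype=float)
    return (ratings - ratings.mean()) / ratings.std(ddof=1)


def respondent_slope(levels, ratings):
    """OLS slope of one respondent's standardized ratings on trait level."""
    slope, _intercept = np.polyfit(levels, standardize_within(ratings), deg=1)
    return slope


# Simulated example: two hypothetical respondents who differ only in scale use
# (response bias) but track trait level in exactly the same way.
levels = np.tile(LEVELS, 20)  # 20 identities x 7 levels
rng = np.random.default_rng(3)
modest_user = 5 + 0.4 * levels + 0.5 * rng.normal(0.0, 1.0, levels.size)
extreme_user = 5 + 1.5 * (modest_user - 5)  # same pattern on a stretched scale

# Within-respondent standardization removes the scale difference, so the slopes match.
print(respondent_slope(levels, modest_user), respondent_slope(levels, extreme_user))
```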

This interaction was further moderated by Trait type (B = 0.02, SE = 0.008, t(35,060) = 2.18, p = .02). Although cultural differences in the magnitude of trait ascription were found for both traits, they were more prominent for dominance (Americans: B = 0.31; Koreans: B = 0.27; z = 5.72, p < .001) than for trustworthiness (Americans: B = 0.26; Koreans: B = 0.24; z = 2.65, p = .008; see Table S4).
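The z statistics comparing slopes between the two cultures have the general form of a Wald test on a difference. A minimal Python sketch follows, assuming two independently estimated slopes (a reasonable approximation for the separate U.S. and Korean samples) and a standard normal reference distribution. The reported tests were computed from the fitted model, so plugging the rounded values from the text into this function will not reproduce them exactly; the helper names and example values are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.959963984540054):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def slope_difference_z(b1, se1, b2, se2):
    """Wald z test for the difference between two independent slopes.

    Returns (z, two-tailed p), with p from the standard normal distribution.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical example: slopes 0.30 (SE 0.03) vs. 0.20 (SE 0.04) give z = 2.0.
z, p = slope_difference_z(0.30, 0.03, 0.20, 0.04)
```

When both slopes come from the same model (e.g., in-group vs. out-group faces within one culture), their covariance should also enter the denominator, which this sketch ignores.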

Next, to test whether this effect was driven by cultural differences in distraction, we first ran a t-test on self-reported distraction. Although we did not find a cultural difference in distraction (t(249) = 1.54, p = .12), we controlled for Distraction and the Distraction × Level of trait interaction to check whether the Respondents’ culture × Level of trait interaction would remain after accounting for distraction. Controlling for Distraction did not change the results; the Respondents’ culture × Level of trait interaction remained significant (B = 0.03, SE = 0.004, t(35,050) = 6.33, p < .001), indicating that the observed effect is not attributable to distraction.

Attention to situation as a mediator

We first standardized the ratings of the factors that participants reported taking into consideration, within each participant, to control for potential cultural differences in response bias. We adjusted for multiple testing using the Bonferroni correction. As predicted, Koreans paid more attention to the surrounding situation than European Americans did (Americans: M = −0.75; Koreans: M = −0.43; t(248) = −2.89, η² = 0.03, p = .033). At the same time, European Americans paid more attention than Koreans to gender (Americans: M = −0.35; Koreans: M = −0.68; t(249) = 3.14, η² = 0.04, p = .015) and age (Americans: M = −0.24; Koreans: M = −0.64; t(249) = 4.43, η² = 0.07, p < .001), which did not support our prediction that Koreans, given their holistic attention, would be more likely than European Americans to attend to demographic factors (see Table S5 for analyses of the other factors). Attending to demographic characteristics such as gender and age when perceiving faces may reflect attention to the individuating attributes of the person rather than attention to situational information per se. We thus focused on attention to “possible surrounding situation” as a measure of situational attention and tested whether it mediates cultural differences in the magnitude of facial trait inferences.
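The within-participant standardization of the factor ratings and the Bonferroni adjustment can be sketched in Python. We assume a ratings matrix with one row per participant and one column per rated factor (ten in this study, including the fillers); the function names and ratings are hypothetical. A participant who gives the same rating to every factor has an SD of zero and cannot be standardized this way, so such rows would need to be handled separately. A negative standardized mean, as reported above, indicates that a factor was rated below the participant’s own average across factors.

```python
import numpy as np


def standardize_rows(ratings):
    """z-score each participant's factor ratings (one row per participant).

    Rows with zero variance would cause division by zero; screen them out first.
    """
    ratings = np.asarray(ratings, dtype=float)
    means = ratings.mean(axis=1, keepdims=True)
    sds = ratings.std(axis=1, ddof=1, keepdims=True)
    return (ratings - means) / sds


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Two hypothetical participants rating ten factors on a 7-point scale.
ratings = [[2, 5, 6, 6, 4, 4, 5, 5, 7, 3],
           [4, 3, 3, 5, 6, 6, 7, 6, 5, 2]]
z = standardize_rows(ratings)
```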

We then tested for a cultural main effect on attention to situation with a Respondents’ culture × Face ethnicity × Trait type ANOVA. Results revealed a cultural main effect on attention to situation (F(1, 242) = 7.69, η² = 0.03, p = .006): Koreans demonstrated greater attention to situation (M = −0.42) than did Americans (M = −0.75). To test the prediction that the Respondents’ culture × Level of trait interaction on trait inferences operates through attention to situation, we conducted a moderated-mediation analysis using multilevel SEM (see Fig. 2). The analysis revealed that respondents’ attention to situation partially accounted for cultural differences in the magnitude of facial trait inferences. In particular, there was a significant effect of culture on situational attention (B = −0.33, z = −34.30, p < .001), as well as a significant effect of Level of trait (B = 0.26, z = 99.2, p < .001) and a significant Situational attention × Level of trait interaction on trait ratings (B = −0.02, z = −7.64, p < .001). That is, in line with the prediction, the level of trait had a weaker impact on trait ratings at higher situational attention (1 SD above the mean; B = 0.26, 95% CI [0.25, 0.26]) than at lower situational attention (1 SD below the mean; B = 0.28, 95% CI [0.28, 0.30]). The Culture × Level of trait interaction on trait ratings remained significant (B = 0.02, z = 4.13, p < .001). Finally, the index of the indirect effect was significant (B = 0.01, z = 7.45, p < .001). These results suggest that situational attention partly accounted for the Respondents’ culture × Level of trait interaction (see Table S6 for detailed results).
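A worked equation helps show why the index of the indirect effect is the product of two reported paths. Assume, for illustration only, that the multilevel model has the structure below (our reconstruction; the symbols are ours, not the authors’), with Culture coded +0.5 for European Americans and −0.5 for Koreans as in Fig. 2, M denoting situational attention, and other terms in the model omitted:

```latex
\begin{aligned}
M_i &= a_0 + a\,\mathrm{Culture}_i + u_i,\\
Y_{ij} &= b_0 + b_1\,\mathrm{Level}_{ij} + b_2\,M_i + b_3\,(M_i \times \mathrm{Level}_{ij})
        + c_1\,\mathrm{Culture}_i + c_2\,(\mathrm{Culture}_i \times \mathrm{Level}_{ij}) + e_{ij},\\
\frac{\partial Y_{ij}}{\partial\,\mathrm{Level}_{ij}}
  &= b_1 + b_3 M_i + c_2\,\mathrm{Culture}_i
   = (b_1 + b_3 a_0) + (c_2 + a\,b_3)\,\mathrm{Culture}_i + b_3 u_i,\\
\text{index of indirect effect} &= a\,b_3 = (-0.33)(-0.02) = 0.0066 \approx 0.01.
\end{aligned}
```

Under this structure, the cultural difference in the slope on trait level splits into a direct part (c₂ = 0.02) and an indirect part (a·b₃ ≈ 0.007), both positive: European Americans’ lower situational attention (a < 0) combined with the weakening effect of attention on the slope (b₃ < 0) adds to their steeper slope. Because the coefficients in the text are rounded, this product is only an approximate check on the reported index.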

Fig. 2

Moderated mediation model with multilevel SEM used to test whether situational attention mediated the Respondents’ culture × Level of trait interaction. Respondents’ culture: European Americans = 0.5, Koreans = −0.5.

Out-group homogeneity effects

The means of standardized trait rating for in-group and out-group faces are shown in Fig. 3. We performed Linear Mixed Model (LMM) analyses with the standardized trait ratings as the dependent variable and Face ethnicity, Trait type, Level of trait, Respondents’ culture, and all their interaction terms as predictors.

Fig. 3

(A) Americans’ trustworthiness judgments of faces generated by the trustworthiness model. (B) Koreans’ trustworthiness judgments of faces generated by the trustworthiness model. (C) Americans’ dominance judgments of faces generated by the dominance model. (D) Koreans’ dominance judgments of faces generated by the dominance model. All judgments were standardized within respondents. Error bars show the standard error of the mean of the standardized scores. Asi = Asian faces; Cau = Caucasian faces.

Supporting the out-group homogeneity effect, we obtained a Culture × Level of trait × Face ethnicity interaction (B = 0.09, SE = 0.008, t(35,060) = 11.02, p < .001). To understand this three-way interaction, we tested the Level of trait × Face ethnicity interaction within each respondent culture. For Americans, the level of trait had a larger impact on trait ratings for in-group (Caucasian) faces (B = 0.32, 95% CI [0.30, 0.34]) than for out-group (Asian) faces (B = 0.25, 95% CI [0.23, 0.27]). The difference between the slopes for Asian and Caucasian faces was significant (B for Asian vs. Caucasian = −0.07, z = −5.31, p < .001), suggesting an out-group homogeneity effect. For Koreans, however, the impact of the level of trait on trait ratings did not differ significantly between in-group (Asian) faces (B = 0.27, 95% CI [0.25, 0.29]) and out-group (Caucasian) faces (B = 0.25, 95% CI [0.23, 0.27]; B for Asian vs. Caucasian = 0.02, z = 1.41, p = .16). Thus, the out-group homogeneity effect was weaker (and even non-significant) among Koreans than among European Americans (see Table S7).

General discussion

Our research is the first to employ a data-driven statistical model to generate facial stimuli across cultures and to show that, while trait inferences closely follow the trait intensity predicted by the computer models in both cultures (thus showing cross-cultural similarities), their magnitudes differ across cultures. Specifically, our research revealed stronger face-to-trait inferences among European Americans than among Koreans. Exploratory analyses also indicated that attention to situational information partly mediated these cultural differences. These findings make important contributions to the field of social cognition. Prior research showed that people ascribe traits to individuals based on facial features (Todorov, Pakrashi, & Oosterhof, 2009). Our data add to this body of research by showing that, although both European Americans and East Asians ascribe traits to individuals based on facial features, European Americans make stronger trait ascriptions than do East Asians. Given the known effects of facial features on important outcomes, such as election outcomes and judicial decisions (Antonakis & Dalgas, 2009; Zebrowitz & McDonald, 1991), the current findings have implications for potential cultural differences in such outcomes and open avenues for future research. For example, Asians’ attention to situational information may lead them to make less extreme trait inferences from facial features, which could result in a weaker effect of facial appearance on electoral outcomes (e.g., Na et al., 2015).

It is worth noting that prior research showed that Chinese respondents’ social judgments of faces relied on the trustworthiness/approachability dimension but not on the dominance dimension (Sutherland et al., 2018; Wang et al., 2019). Importantly, however, we found cultural differences in face-to-trait inferences not only for dominance but also for trustworthiness. Thus, our findings cannot be attributed simply to the use of culturally less relevant dimensions. At the same time, cultural differences were larger for dominance than for trustworthiness, suggesting that face-to-trait inferences may be stronger for culturally relevant dimensions. Future research should examine whether cultural differences extend to other dimensions, such as competence (Sutherland et al., 2018; Wang et al., 2019).

Our data also showed that respondents who reported paying attention to situations were less likely to ascribe traits based on subtle changes in trait-signaling facial cues, contributing to cultural differences in the magnitude of trait inference. The correlational nature of these data does not allow us to rule out other causal relationships (e.g., reverse causality). However, given that attention has been shown to underlie cultural differences in trait inferences (Shimizu & Uleman, 2021), we believe it is more likely that attention underlies cultural differences in the magnitude of trait inference than that the magnitude of trait inference underlies cultural differences in attention.

Furthermore, our research provides evidence for the out-group homogeneity effect and its cultural moderation. Prior research showed that people individuate own-race faces more than other-race faces at the earliest stages of perception (Hughes et al., 2019) and that in-group faces are processed more deeply than out-group faces (Ratner & Amodio, 2013; Van Bavel et al., 2008, 2011). Extending these findings, we found that people tend to make stronger trait ascriptions when perceiving own-race faces that gradually change in trait-signaling intensity than when perceiving corresponding other-race faces. Moreover, our data suggest that the out-group homogeneity effect is weaker for East Asian than for European American respondents. In fact, in our paradigm, East Asians (Koreans) did not show a significant out-group homogeneity effect toward Caucasian faces.

Our finding contributes to the body of existing cross-cultural evidence suggesting that Asians tend to show a weaker out-group homogeneity effect (Lee & Ottati, 1993; Ng, Steele & Sasaki, 2016). It would be fruitful for future research to examine potential mechanisms that may underlie such cultural differences. For example, greater familiarity with the out-group (Linville, Fischer, & Salovey, 1989) and higher power of the out-group (Shriver & Hugenberg, 2010) have been suggested to lead to greater perceived variability of the out-group and may play a role in the weaker out-group homogeneity effect among Asians.

There are limitations to the current research. First, although the exploratory analysis found that self-reported attention to surrounding situations partly mediated cultural differences in the strength of face-to-trait inferences, the exact nature of this mediation process is not yet clear. Future research should elucidate which aspects of the surrounding situation Asians attend to and how such attention weakens face-to-trait inferences. Second, although computer-generated faces allowed us to manipulate subtle changes while matching and controlling facial features across cultures, such faces are not equivalent to real human faces. Future research should test whether the current findings replicate with real faces.

Our research investigated how culture shapes judgments of personality traits from facial features by using a set of controlled facial stimuli created with a data-driven statistical model. However, it would also be important to identify the specific objective facial features underlying judgments of personality traits across cultures. For example, prior research has shown that a person’s facial width-to-height ratio (fWHR) is associated with perceived dominance and trustworthiness (Carré & McCormick, 2008). In general, faces with high fWHR are judged higher on dominance and lower on trustworthiness. Future research may examine how culture interacts with the target person’s fWHR in shaping such perceptions, for example, whether Asian perceivers’ dominance judgments are less affected by the target person’s fWHR than Caucasian perceivers’ judgments are.

Furthermore, although our research used gender-neutral facial stimuli, the observed effects might interact with the target faces’ gender. In the existing literature, some researchers have found that fWHR is positively correlated with perceived dominance for both men and women (Geniole et al., 2015), whereas others have concluded that, unlike male fWHR, female fWHR is not linked to perceived dominance (Mileva et al., 2014). It would be fruitful for future research to investigate whether the trait ascription (especially of dominance) based on facial features found in the current study is weaker for female faces than for male faces.

All in all, it is our hope that our research expands the field’s understanding of face-based trait inferences by elucidating both their cultural similarities and differences. Given the key role that face-based trait inferences play in social outcomes such as elections and court sentences, our findings carry important social implications.