Introduction

Human faces are sexually dimorphic. Structural facial dimorphism is often captured in landmark-based indices, such as the global masculinity index (Penton-Voak et al. 2001). In the last decade, a new index, the facial width-to-height ratio (fWHR; bizygomatic width divided by face height (distance between the upper lip and the brow ridge or nasion); Carré and McCormick 2008; Weston et al. 2007), has received considerable attention. fWHR has been associated with a large number of behavioural traits that influence human social interactions. These include traits such as fighting ability (Třebický et al. 2015), dominance and threat behaviour (display of selfish, pejorative or aggressive traits; Geniole et al. 2015), perceived formidability (Zilioli et al. 2015), and aggressiveness (Carré and McCormick 2008; Haselhuhn et al. 2015; Lieberz et al. 2017). An above-average fWHR has also been found in CEOs (chief executive officers) and prosocial leaders of NGOs (non-governmental organizations, N = 103, Hahn et al. 2017), hence the association between fWHR and social rank does not only embody antisocial behaviour. Instead, fWHR could be thought of as a predictor of social rank more generally, in both competitive and prosocial settings. A link between observer-rated dominance and fWHR has also been reported in Capuchin monkeys (N = 64; Lefevre et al. 2014), thus suggesting the link may be evolutionarily old. Here we study whether intrasexual differences in facial masculinity, as indexed by global facial masculinity and fWHR, are associated with various measures of testosterone in men and women.

Testosterone as an Underlying Mechanism

One potential underlying mechanism for the association between fWHR and dominance- or aggressiveness-related traits is the steroid hormone testosterone (T). A longitudinal study has shown that exposure to prenatal T is associated with facial masculinity in adult men and women (r = .51, N = 183; Whitehouse et al. 2015). Thus, interindividual differences in fWHR may be reflected in T concentrations. Across two samples, Lefevre et al. (2013) tested the associations between men’s salivary T reactivity (measured as T change from before to after a speed dating event), baseline T, fWHR and global facial masculinity (a composite measure of five sexually dimorphic components: (i) eye size, (ii) lower face height/face height, (iii) cheekbone prominence, (iv) face width/lower face height, and (v) mean eyebrow height, see Fig. 1 and Penton-Voak et al. 2001). Measures of fWHR showed positive relationships with baseline T (sample 1: r = .13, n = 181; sample 2: r = .26, n = 79) and T reactivity levels (sample 1: r = .21, n = 181). Associations with T reactivity, but not baseline T, remained stable after controlling for age and BMI. However, no association of global facial masculinity emerged with baseline T or with T reactivity (Lefevre et al. 2013). An earlier study investigated the link between global facial masculinity, baseline T and T reactivity after a rigged competitive task in men (n = 57; Pound et al. 2009). In contrast to Lefevre and colleagues (Lefevre et al. 2013), the findings of Pound and colleagues (Pound et al. 2009) suggest that facial masculinity is associated with T reactivity (r = .36) but not baseline T (r = .19), at least in the winning condition. Hodges-Simeon et al. (2016) investigated the association of pubertal T levels with fWHR in peri-pubertal men and observed no robust relationship between pubertal T and fWHR (r = .13; N = 75). However, when controlling for age this association was significant (r = .28; see Welker et al. 2016 for re-analyses of these data also finding a significant positive effect). Eisenbruch et al. (2017) found no link between fWHR and either baseline T or T reactivity (r = .09 and r = .06, respectively, N = 133 men). However, fWHR was significantly correlated with baseline T in an ethnic sub-sample of white men (r = .23, n = 85). Bird and colleagues (Bird et al. 2016) conducted an internal meta-analysis on correlations between fWHR and baseline T and T reactivity levels across seven samples of men (total N = 793), which revealed no significant associations (baseline T: r = −.04, T reactivity: r = −.03). A second meta-analysis encompassing their seven studies and the two samples from Lefevre and colleagues (Lefevre et al. 2013) was conducted, but the link between baseline T and fWHR remained non-significant (r = .01, N = 1041). Hence, recent evidence seems to show that fWHR may not be linked with T levels (baseline T and T reactivity), whereas findings for global facial masculinity are mixed. This study was conducted to provide further clarity in men, and to additionally explore these relationships in women.

Fig. 1 Landmarks used for morphometric calculations. Measure of fWHR: (a-b)/(c-d); eye size: (e-h)/(f-g); cheekbone prominence: (c-d)/(k-l); face width/lower face height: (c-d)/(e-i); lower face/face height: (e-i)/(j-i)

Testosterone Measures from Hair Samples

Previous studies relied primarily on the assessment of T levels from saliva samples. Recently, a new method of measuring hormone concentrations from human hair samples was developed (e.g., Gao et al. 2013; Russell et al. 2012), which has been validated showing significant associations between salivary and hair androgen measures (r = .67 for T, Wang et al. 2019). Assessments of hormone levels from hair samples are supposed to reflect mean hormonal concentrations over a longer time compared to salivary baseline measures (approximately 1 month per 1 cm of hair; Russell et al. 2012). Given that fWHR and global facial masculinity are stable traits developing over many years in childhood and adolescence, as are the personality traits that have been found to be associated with them, these more long-term measures of T levels from hair represent an intriguing alternative for investigating relationships with facial characteristics (e.g., Grotzinger et al. 2018).

Sexual Dimorphism in fWHR

fWHR has been postulated to be sexually dimorphic based on biometric analyses of skulls (Weston et al. 2007), which is in line with associations of fWHR with behavioural traits implicated in intrasexual selection (a diverse range of aggression- and dominance-related behaviours and traits), mainly in men. Assuming a positive association between fWHR and T concentrations, considerably higher T levels in men, compared to women, may corroborate the idea of sexually dimorphic fWHR. A small degree of male-biased sexual dimorphism in fWHR has indeed been reported in a meta-analysis (k = 32 samples, overall N = 10,853, d = 0.11; Geniole et al. 2015). Larger fWHR in men compared to women was also found in a large sample of Commonwealth Games athletes (N = 3479, d = 0.31; Kramer 2015), but differences were non-significant when controlling for body size. Interestingly, ethnicity was reported to moderate these findings, in that significant sex differences in fWHR were only found in White and Black subsamples but not in Asian-Oriental, Indian-Oriental, or other subsamples. In Kramer’s (2017) meta-analysis of cranial data, men’s skulls had a slightly larger fWHR compared to women’s (k = 87 samples, N = 7941, d = 0.09). However, when accounting for ethnic diversity and using facial instead of cranial data, there was little evidence of sexual dimorphism in fWHR (k = 24 samples, N = 4161; Kramer 2017). Only weak evidence was found by Lefevre and colleagues (Lefevre et al. 2012), who reported a significant sex difference in fWHR in only one out of four samples (with larger fWHR in women, the effect of which faded when controlling for BMI; overall N = 924, d = 0.03), and a further null-finding was reported by Robertson et al. (2017; N = 444, partial η² = .00). In sum, there is still much controversy over whether fWHR is sexually dimorphic, and further research on this question is warranted.

Study Aim

In light of previous null-findings on the link of baseline T levels with fWHR (Bird et al. 2016) and global facial masculinity (e.g., Lefevre et al. 2013), the aim of the present study is to provide further evidence on the associations of these two facial metrics with salivary baseline T levels in men, as well as investigating them in women. To our knowledge, this study is the first to examine salivary T associations in women (but see Whitehouse et al. 2015 on prenatal T and facial masculinity in both sexes). Additionally, competition-induced salivary T reactivity in men (as in Lefevre et al. 2013) and, for the first time, hair T in women are explored as alternative measures. Finally, sexual dimorphism in fWHR and global facial masculinity are examined.

Methods

Participants

One hundred and sixty-five men (age: M = 24.2 years, SD = 3.3, range 18–34) and 157 women (age: M = 23.3 years, SD = 3.4, range 18–35) were recruited from a local participant pool (88.6% students and 98.8% European ethnicity for men, 98.1% students and 93.6% European ethnicity for women). The final sample sizes for baseline T (N = 140, see below) and T reactivity (n = 104) had sufficient power (> .80) to detect effect sizes of Pearson’s r > .23 and r > .26, respectively (Cohen 1992; Faul et al. 2007). Both samples were part of larger studies on hormonal effects in mating contexts (see Jünger et al. 2018; Kordsmeyer and Penke 2018).

3D Face Measures

Participants’ faces were 3D scanned twice for men and thrice for women. For each participant the most suitable scan (in terms of neutral facial expression and standardized head position) was chosen. Due to inadequate facial 3D scans, strong beardedness or the participants not wanting their scans to be used, the samples in this study were reduced to 140 men and 151 women. The remaining scans were aligned horizontally and 27 facial landmarks were placed using Morph Analyser 2.4 (Tiddeman et al. 2000), of which 12 were relevant for this study (Fig. 1). Two independent coders for men’s and women’s scans each placed the landmarks after receiving extensive training. The landmarks’ x- and y-coordinates from the two coders were aggregated and used to calculate fWHR by dividing the bizygomatic width (distance between the left and right zygion; Fig. 1) by face height (distance between the nasion and the vermilion of the upper lip, labrale superius; Stirrat and Perrett 2010; Weston et al. 2007). Additionally, four established sexually dimorphic facial metrics (eye size, cheekbone prominence, face width/lower face height, and lower face/face height) were computed and aggregated to form a global facial masculinity index (as in Penton-Voak et al. 2001). Intercoder reliabilities were good for all facial metrics (for men: fWHR ICC = .96, p < .001; global facial masculinity ICC = .91, p < .001; for women: fWHR ICC = .92, p < .001; global facial masculinity ICC = .85, p < .001; except for cheekbone prominence in men, ICC = .61, p = .08; for women: ICC = .75, p < .001).
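The fWHR computation from landmark coordinates can be illustrated with a minimal Python sketch. It assumes that the two coders' coordinates are aggregated by a simple mean; the landmark names, coordinate values and function names are hypothetical and are not taken from the study's analysis script (which was written in R).

```python
import math


def mean_landmarks(coder_a, coder_b):
    """Average the (x, y) coordinates placed by two coders, landmark by landmark.

    Assumption: 'aggregated' is taken to mean the arithmetic mean.
    """
    return {
        name: (
            (coder_a[name][0] + coder_b[name][0]) / 2,
            (coder_a[name][1] + coder_b[name][1]) / 2,
        )
        for name in coder_a
    }


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fwhr(landmarks):
    """Bizygomatic width divided by face height (nasion to labrale superius)."""
    width = distance(landmarks["zygion_left"], landmarks["zygion_right"])
    height = distance(landmarks["nasion"], landmarks["labrale_superius"])
    return width / height


# Hypothetical coordinates (arbitrary units) for one face, placed by two coders.
coder_a = {
    "zygion_left": (10.0, 50.0),
    "zygion_right": (150.0, 50.0),
    "nasion": (80.0, 30.0),
    "labrale_superius": (80.0, 100.0),
}
coder_b = {
    "zygion_left": (12.0, 50.0),
    "zygion_right": (148.0, 50.0),
    "nasion": (80.0, 32.0),
    "labrale_superius": (80.0, 102.0),
}

example_fwhr = fwhr(mean_landmarks(coder_a, coder_b))
```

In this toy example the averaged width is 138 units and the averaged height is 70 units, so the ratio is 138/70.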

Hormonal Measures

Participants were asked to refrain from drinking alcohol, exercising, taking recreational or non-prescribed clinical drugs on the day of the study, from ingesting caffeine (coffee, tea, coke) or sleeping 3 hours before the study, and from eating, drinking (except for water), smoking or brushing teeth 1 hour before their scheduled appointment (Geniole et al. 2013; Lopez et al. 2009). To check participants’ adherence to these instructions and to assess further potential influences on the saliva samples and hormonal levels, a screening questionnaire was administered at the beginning of the session (Schultheiss and Stanton 2009). None of the participants reported taking hormonal medication or supplements. All saliva samples were taken in the afternoon to control for circadian variation in hormonal levels (between 2 pm and 6 pm for men, between 11.30 am and 6 pm for women; Schultheiss and Stanton 2009). Participants provided at least 2 ml of saliva via unstimulated passive drool through a straw, as recommended by Schultheiss and colleagues (Schultheiss et al. 2012). The samples were immediately transported to an ultra-low temperature freezer (−80 °C), where salivary testosterone levels (T) are stable for at least 36 months (Granger et al. 2004). At the end of the data collections, saliva samples were shipped on dry ice to the Technical University of Dresden, where men’s samples were analysed using chemiluminescence-immuno-assays with high sensitivity (IBL International, Hamburg, Germany) and women’s samples via liquid chromatography mass spectrometry (LCMS; Gao et al. 2015). The intra- and inter-assay coefficients of variation (CVs) were below 11%. For men’s baseline T levels, saliva samples were collected on two different days (the second sample was collected approx. one week after the first) and then averaged to obtain a more reliable estimate of baseline T (due to large within-individual variation; Idris et al. 2017).
For part of the men’s sample (n = 104), T reactivity was measured during the second session after the male participants engaged in a dyadic intrasexual competition (four disciplines: table pinball, snatching game, arm wrestling, turn-taking verbal fluency game; outcome naturally emerging) led by an attractive female confederate (for more detail, see Kordsmeyer and Penke 2018). Two post-competition salivary samples were collected, the first immediately after the competition (approx. 15 min after onset), the second 15 to 20 min later. For women, overall four saliva samples were taken for baseline T measures, two each in the luteal and follicular phases of their menstrual cycle, across two consecutive cycles (i.e., in each cycle one sample 16–18 days before and another 4–11 days before the next menstrual onset; cycle phase was validated with luteinizing hormone tests; for details see Jünger et al. 2018). Because hormonal levels are known to fluctuate during the menstrual cycle (Tsepelidis et al. 2007), only the two samples from the luteal phases were used, as were the 3D facial scans from these sessions. Two hair samples were taken from female participants, during their second and fourth visits (for approx. two thirds of the sample in their luteal phases, for the remaining third in their fertile phase; hair T measures are available from n = 128 women as aggregates from two hair samples). Hair at the back of the participant’s head was pinned up, and two hair strands (each with a diameter of approximately 3 mm, almost full length of participants’ hair) were separated. Then, both strands were tied with a thread, cut as close as possible to the scalp, and packed up in aluminium foil. Finally, the scalp-near ends of the strands were marked on the foil and samples were shipped to the Technical University of Dresden, where only the last grown centimetre of each hair sample was analysed using the LC-MS/MS method (Gao et al. 2013), thus presumably representing the mean testosterone levels within the previous month (Russell et al. 2012). All saliva (baseline, post-competition) and hair T measures were checked for outliers and winsorized to ± 3 SDs (Mehta et al. 2015; see supplementary material for details). The Shapiro-Wilk test revealed that assumptions of normality were violated for both salivary baseline and post-competition T measures (for men: Ws < .96, ps < .001; for women: Ws < .90, ps < .001); hence, all variables were log10-transformed (after which assumptions of normality were met, Ws > .96, ps > .05, except for the first baseline T measure in men, W = .97, p = .003; the assumption was met for the aggregate of both baseline T measures in men, W = .99, p = .19).
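The outlier treatment and transformation can be sketched in Python as follows. This is an illustration only: it assumes that winsorizing clamps values to the mean ± 3 sample SDs computed once on the raw data (the exact procedure is described in the supplementary material, which is not reproduced here), and the function names and example values are hypothetical.

```python
import math
import statistics


def winsorize_sd(values, n_sd=3.0):
    """Clamp values lying more than n_sd sample SDs from the mean to that boundary.

    Assumption: mean and SD are computed once, on the raw (unclamped) values.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(v, lower), upper) for v in values]


def log10_transform(values):
    """Log10-transform strictly positive hormone concentrations."""
    return [math.log10(v) for v in values]


# Hypothetical raw concentrations (arbitrary units): 19 typical values, one extreme.
raw_t = [50.0] * 19 + [500.0]
winsorized_t = winsorize_sd(raw_t)
log_t = log10_transform(winsorized_t)
```

Note that with very small samples no value can lie more than 3 sample SDs from the mean, so the toy example uses 20 values to make the clamping visible.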

Statistical Analysis

Pearson correlations were calculated to examine bivariate associations of (1) fWHR, (2) global facial masculinity, (3) cheekbone prominence, (4) face width/lower face height, (5) lower face/face height and (6) eye size with levels of baseline T in both sexes as well as T reactivity in men. Correlations were run using the means of the two baseline measures taken for both sexes, the means of the two hair measures for women, and the T reactivity aggregate in men. For T reactivity measures, percent changes from baseline levels (using the saliva sample obtained on the day of the second session) to post-competition levels were determined (in accordance with Carré et al. 2014; Roney et al. 2003). The difference between baseline and post-competition levels was divided by baseline T. Since both T reactivity values were highly correlated (r = .65, p < .001), we aggregated them into a single T reactivity value. In line with Lefevre and colleagues (Lefevre et al. 2013), partial correlations controlling for age and BMI were calculated (the latter from participants’ measured height and weight, since body size has been suggested to be related to facial metrics and T levels; Kramer 2017; Třebický et al. 2015). Two-sided t-tests were conducted to examine mean differences in fWHR between men and women. To examine sexual dimorphism in global facial masculinity we used raw instead of the usual z-standardized scores and excluded eye size, because its absolute scale with considerably larger values compared to the other ratio measures (with values close to 1) would bias mean differences. We aggregated the remaining three components of global facial masculinity as follows: global facial masculinity = lower face height/face height − cheekbone prominence − face width/lower face height (thus in slight contrast to Penton-Voak et al. 2001, who formed a similar aggregate based on five components). All analyses were performed using R (R Core Team 2015).
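As a hedged illustration of two of these steps, the following Python sketch computes an aggregated percent-change T reactivity score and a partial correlation controlling for two covariates (such as age and BMI). It is not the authors' R script: the aggregation is assumed to be a simple mean of the two percent changes, the partial correlation uses the standard recursive formula, and all function names and values are hypothetical.

```python
import math


def percent_change(baseline, post):
    """Percent change from baseline to a post-competition sample."""
    return (post - baseline) / baseline * 100.0


def t_reactivity(baseline, post_1, post_2):
    """Mean of the two percent changes (assumed aggregation rule)."""
    return (percent_change(baseline, post_1) + percent_change(baseline, post_2)) / 2.0


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y with one covariate z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r_two_covariates(x, y, z1, z2):
    """Correlation of x and y controlling for z1 and z2 (recursive formula)."""
    r_z2z1 = pearson_r(z2, z1)
    r_xy_z1 = first_order_partial(pearson_r(x, y), pearson_r(x, z1), pearson_r(y, z1))
    r_xz2_z1 = first_order_partial(pearson_r(x, z2), pearson_r(x, z1), r_z2z1)
    r_yz2_z1 = first_order_partial(pearson_r(y, z2), pearson_r(y, z1), r_z2z1)
    return first_order_partial(r_xy_z1, r_xz2_z1, r_yz2_z1)


# Hypothetical data for six participants.
fwhr_values = [1.8, 2.0, 1.9, 2.2, 2.1, 1.7]
reactivity = [t_reactivity(100.0, 120.0, 110.0), 5.0, -3.0, 12.0, 0.5, 8.0]
age = [20.0, 25.0, 30.0, 22.0, 28.0, 24.0]
bmi = [21.0, 24.0, 22.0, 26.0, 23.0, 25.0]

partial_r = partial_r_two_covariates(fwhr_values, reactivity, age, bmi)
```

The recursive formula gives the same result as correlating the residuals of both variables after regressing each on the two covariates, and it needs no external libraries.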

Data Availability

The data and analysis script associated with this research are available at https://osf.io/3ze4g.

Results

Descriptive statistics and sex differences for all variables are shown in Table 1; correlations between main variables are shown in Table 2. Baseline T and T reactivity were negatively correlated in men. In women, saliva T and hair T were positively correlated. fWHR correlated negatively with global facial masculinity and was not associated with baseline T in either sex. In men, fWHR was not related to T reactivity, even after controlling for age and BMI (r = −.05, p = .62).

Table 1 Descriptive statistics and sex differences for all main variables
Table 2 Bivariate Pearson correlations between all main variables

Neither global facial masculinity nor any of its four components was related to baseline T in either sex (men: rs < |.17|, ps > .06; women: rs < |.17|, ps > .07) or to T reactivity in men (rs < |.14|, ps > .19), even after controlling for age and BMI (rs < |.13|, ps > .23). Similarly, when analysing the two T reactivity measures in men separately, no significant associations with fWHR, global facial masculinity or any of its four components were found (rs < |.14|, ps > .17; Table S1 in the online supplementary material).

Examining sexual dimorphism in fWHR, a two-tailed t-test showed that fWHR was larger for women than for men (t = 3.11, p < .01, d = 0.37). Global facial masculinity was higher in men compared to women (t = 4.69, p < .001, d = 0.56). Concerning the components of global facial masculinity, men showed a larger eye size than women (t = 15.06, p < .001, d = 1.73), and a larger lower face/face height (t = 5.36, p < .001, d = 0.64). Face width/lower face height was larger in women (t = 3.72, p < .001, d = 0.44). No significant sex difference emerged for cheekbone prominence (t = 1.05, p = .29, d = 0.12).
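For readers unfamiliar with the effect size reported here, a minimal Python sketch of Cohen's d and the corresponding independent-samples t statistic is given below. It assumes the pooled-SD (Student) variant; the text does not state which variant of the t-test or of d was used, and the function names and data are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def pooled_t(group_a, group_b):
    """Student's t statistic for two independent groups with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    return cohens_d(group_a, group_b) / math.sqrt(1 / n_a + 1 / n_b)


# Hypothetical fWHR-like values for two small groups.
group_women = [2.0, 4.0, 6.0]
group_men = [1.0, 3.0, 5.0]

example_d = cohens_d(group_women, group_men)
example_t = pooled_t(group_women, group_men)
```

Under the pooled-variance assumption, d and t differ only by the factor sqrt(1/n1 + 1/n2), which is why larger samples yield larger t values for the same d.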

Discussion

Previous studies suggested associations of fWHR with dominance and aggression. Testosterone (T) has been hypothesized as an underlying mechanism, with some studies reporting positive relationships of baseline or reactive T with fWHR and a related measure, global facial masculinity, though results were inconsistent and a recent meta-analysis revealed non-significant associations for fWHR in men. The aim of the present study was to add further evidence on potential links of fWHR and global facial masculinity with T levels in men and to additionally examine these effects in women, as well as to provide further evidence for a potential sexual dimorphism in fWHR and in global facial masculinity.

Our results suggest that fWHR is associated neither with salivary baseline T in either sex, nor with competition-induced T reactivity (only measured in men) or hair T (only measured in women). This contradicts initial findings, which showed a positive link between fWHR and T levels (Lefevre et al. 2013), but is in line with more recent studies, which revealed no association of fWHR with T levels in adult (meta-analysis, Bird et al. 2016; Eisenbruch et al. 2017) and adolescent men (Hodges-Simeon et al. 2016). Hence, although fWHR has been linked with a vast range of dominance and aggression-related behaviours (Carré and McCormick 2008; Geniole et al. 2015; Hahn et al. 2017; Lieberz et al. 2017; Třebický et al. 2015; Zilioli et al. 2015), it does not seem to be robustly related to T levels (see Noser et al. 2018 for a recent finding of T moderating the association between fWHR and narcissism in men). Objectively measured global facial masculinity, an index of sexually dimorphic structural facial characteristics (Lefevre et al. 2013; Penton-Voak et al. 2001), was not related to any of the T measures, contradicting a previous study which indicated an association of facial masculinity with T reactivity (Pound et al. 2009). Nevertheless, our null result corresponds to the findings of Lefevre and colleagues (Lefevre et al. 2013), who found no evidence for such an association. Thus, the relationship of T levels with global facial masculinity is questionable.

As this and earlier studies and meta-analyses have shown that fWHR appears not to be related to baseline or reactive T (e.g., Bird et al. 2016), the question remains which underlying mechanisms can contribute to explaining links of fWHR with focal individuals’ behaviour and how they are treated by others (e.g., Geniole et al. 2013; Zilioli et al. 2015). For example, an individual with higher fWHR may be perceived as more dominant or aggressive, without fWHR being calibrated to the individual’s traits or behavioural tendencies, instead partly explicable by observers’ sensory bias (e.g., Haselhuhn et al. 2013; for similar perceptual biases based on facial characteristics in the context of an economic game and criminal-sentencing decisions, see Eisenbruch et al. 2016, and Wilson and Rule 2015, respectively). This may result in a self-fulfilling prophecy for the focal individual, in that others treat individuals with higher fWHR as if they were more dominant or aggressive, who in turn habitually tend to behave in such a way. Still, other mechanisms likely have an influence and should be investigated further.

Different Measurements of T Levels

It is worth mentioning that we did not investigate pubertal or perinatal T levels, which might show stronger links with facial characteristics than baseline or reactive T levels, due to developmental links with traits implicated in craniofacial growth (Hodges-Simeon et al. 2016; Whitehouse et al. 2015). Moreover, due to potentially unreliable measures of salivary T, especially for women’s considerably lower T concentrations, the assessment of salivary T is not without pitfalls (Fiers et al. 2014). To tackle this, we aggregated saliva T measurements across two samples in order to get a more stable assessment of both baseline T and T reactivity. Furthermore, we also explored, for the first time, T assessment from women’s hair samples as a correlate of intrasexual differences in structural facial dimorphism; hair samples are thought to provide more reliable and aggregated T measures than saliva (Gao et al. 2013; Russell et al. 2012; T measures from saliva and hair samples have been shown to be highly correlated, Wang et al. 2019). However, no associations of hair T with any of the facial measures were detected. This contrasts with earlier findings on some behaviours and traits which were shown to be related to measures of hair T, but not salivary baseline T (e.g., aggression in N = 460 adolescents, partly moderated by hair cortisol; Grotzinger et al. 2018). In general, our results do not support the proposal that T functions as an underlying mechanism for the link between fWHR and dominance.

Sexual Dimorphism

In our study, men showed a significantly lower fWHR than women. This is rather unexpected, given that higher fWHR is thought to signal dominance and aggression, and previous findings suggested a higher fWHR for men compared to women (Carré and McCormick 2008; see Geniole et al. 2015 for a meta-analysis), while other studies found little or no evidence of sexual dimorphism in fWHR (Kramer et al. 2012; Kramer 2017; Lefevre et al. 2012; Robertson et al. 2017). One possible explanation would be that fWHR is suited to explain differences in masculinity among men but not between the sexes. Another suggestion comes from research showing that fWHR measures vary across different scanning procedures (e.g., Lefevre et al. 2012 found a lower fWHR in men compared to women in one of four samples, but only for measures from 3D facial scans and not from 2D facial photos). The fact that we used 3D face scans may partly explain this unexpected finding. Furthermore, it has been suggested that sexual dimorphism might be hard to detect on the facial surface due to greater facial adiposity in women, possibly obscuring any differences in the underlying bone structure (Kramer et al. 2012; Lefevre et al. 2012). Since results are mixed and explanations rather speculative, sexual dimorphism in fWHR is in need of further investigation. Nevertheless, global facial masculinity was higher for men compared to women, indicating this measure may be better suited to describe between-sex differences in masculine facial characteristics than fWHR, although we only used three out of five components of the original index (Penton-Voak et al. 2001). Despite the broad use in the literature, measurements of facial masculinity based on just a few facial landmarks might not be the most valid method. Instead, approaches like geometric morphometrics might be superior in detecting masculine (sexually dimorphic) face shapes because they reflect biological factors and shape differences more accurately (Windhager et al. 2011; Mitteroecker et al. 2015).

Conclusion

Overall, our findings provide no evidence for a link of fWHR and global facial masculinity with salivary baseline T in men or women, T reactivity in men, or hair T in women. Hence, hypotheses that T levels are an underlying mechanism for the development of and/or sexual dimorphism in fWHR or global facial masculinity are not supported. Finally, our study does not support the proposal that fWHR is larger in men. Rather, in our samples, women had a higher fWHR than men. Thus, future studies should take into account that neither fWHR nor the global masculinity index reflects current T levels, rendering it unlikely that T underlies their relationship with dominance-signalling, and that global facial masculinity is better suited than fWHR to capture objective structural sexual dimorphism in the face.