As women’s recruitment and participation in traditionally male occupations increase, research suggests that organizations face challenges in retaining talented women beyond entry-level jobs (Soares et al. 2013; Soyars 2017; Yee et al. 2016). Women in the workforce often encounter gender stereotypes and biases that reinforce the existing gender hierarchy, which may impede their advancement to higher levels of leadership. Specifically, gender bias and reinforcement of stereotypes in leader performance evaluations may hinder women’s career aspirations and retention as well as limit their ability to be promoted.

In particular, existing gender hierarchies that are predicated on long-standing and widely held gender beliefs may implicitly influence performance evaluations. Gender, as with other social statuses, forms status hierarchies based on the relative values associated with each gender (Berger et al. 1977). For instance, with respect to gender, men are considered higher status and women lower status; with respect to professional position (e.g., leader vs. subordinate), leaders are considered higher status and subordinates lower status. Associated with social status are beliefs that often reinforce the status hierarchy.

Gender status beliefs that reinforce women’s perceived lower status position relative to men include stereotype content such as gendered language within performance evaluations for leaders in the form of descriptive and proscriptive characteristics. Descriptive characteristics (generally positive qualities) reinforce who we should be or how we should behave, whereas proscriptive characteristics (generally negative qualities) tell us who we should not be or how we should not behave (Prentice and Carranza 2002). These descriptive and proscriptive characteristics may have a gendered, stereotypic quality (i.e., masculine or feminine) such that men and women may differentially receive particular characteristics related to their gender status and role as a leader (Bem 1974; Prentice and Carranza 2002). Given the relative status of role and gender characteristics, leader performance evaluations may implicitly reinforce existing status hierarchies despite efforts to use objective evaluation criteria or meritocratic organizational practices and policies.

To better understand this implicit phenomenon, leader evaluation research examines leader status characteristics based on an agentic-communal dichotomy and finds that agentic characteristics are valued (higher status) whereas communal characteristics are not (lower status) (Abele and Wojciszke 2014; Bakan 1966; Bem 1974; Eagly 1987). Because men are described and prescribed as agentic and women as communal, women leaders are often evaluated as being status-incongruent. Women leaders (people of lower gender status in a position of higher status) often receive more proscriptive feedback because they are violating the gender status hierarchy (Rudman et al. 2012). We suggest that female employees in roles that are status-incongruent (e.g., leader) will receive less descriptive and more proscriptive feedback than will male employees who are status-congruent. Further, status-incongruent female employees may receive more masculine proscriptive feedback based on violations of role status and more feminine proscriptive feedback based on gender status. We explore these hypotheses by examining “real world” evaluations of women and men training to be military officers. Because both the leader role and the military are traditionally masculine, men are status-congruent and women are status-incongruent in this domain.

The present research using leader performance evaluations offers contributions to the existing literature on gender, status beliefs, and gender stereotypes in several ways. Whereas most research examining gendered leadership attributions is situated in experimental academic settings or real-world settings with limited access to data, the secondary data analyzed in our research allow us to examine perceptions of leadership performance based on real-world, routine, anonymous evaluations from a military service academy. Not only has the military historically been at the forefront of social change and inclusion, but it is also the largest employer in the United States and thus it is an especially relevant domain in which to assess the potential prevalence of gender bias (Lundquist 2008; Moskos 1993; Sampson and Laub 1996). Our results are consistent with previous experimental findings and support existing theoretical frameworks. Moreover, we are able to quantify both the breadth and depth of stereotype and gender status beliefs as conveyed in subjective performance evaluations, something not previously feasible without this type of data. Finally, we contribute to the theoretical literature by examining the interrelationship of role status and gender status as distinct bases for evaluating performance.

Gender Stereotypes, Status, and Leadership

The lower retention and advancement of women, especially in traditionally male professions, are often attributed to discrimination and prejudice against women in stereotypically masculine work roles. Stereotypes and expectations about who a leader is “supposed” to be impact how individual leaders are evaluated (Galinsky et al. 2013; Gündemir et al. 2014). These stereotypes and expectations are associated with status characteristics, such as gender (Wagner and Berger 1997). Status characteristics theory (SCT) states that socially significant, salient, and observable characteristics (e.g., gender, race) form status hierarchies based on relative value, competence, and prestige, as defined by broadly shared cultural beliefs. Stereotypes and their associated content reflect these cultural beliefs and provide rules for social interaction, evaluation, and judgment. Within the workplace, people with higher status characteristics (e.g., men, Whites) often receive advantage through higher performance expectations, more prestige, and increased influence (Berger et al. 1977).

Status can be ascribed based on individual characteristics (e.g., gender, race, age) or achieved as something that is earned (e.g., leadership position, academic degree, military rank). The gender status hierarchy influences performance expectations such that in higher status positions (e.g., leader), competence (e.g., performance) expectations for women are lower than for men (Ridgeway 2001). Moreover, SCT helps explain why women (lower ascribed status) often lack perceived legitimacy in leadership positions.

Status characteristics are useful in understanding the shared beliefs we hold about who we should be or how we should behave (descriptive) and who we should not be or how we should not behave (proscriptive), particularly in work roles. When considering a higher status role like leader, descriptive status characteristics are often associated with men and masculinity, whereas proscriptive traits for leaders are often associated with women and femininity. Research on leader performance evaluations using status characteristics is largely based on the agentic-communal dichotomy (Bakan 1966). Agentic behavior is associated with instrumental, task-focused, and goal-oriented characteristics, whereas communal behavior is linked to relationship-oriented, nurturing, and warmth characteristics (Abele and Wojciszke 2014; Bem 1974; Eagly 1987). Agentic qualities and behaviors are characterized as higher status, and communal traits and behaviors as lower status (Abele and Wojciszke 2014).

Agentic leadership qualities and behaviors consist of two distinct constructs related to competence and dominance (Abele and Wojciszke 2014; Rosette et al. 2016). Agentic competence as a leader relates to a person’s abilities and skills to lead others in a goal-oriented manner and is typically found in descriptive traits of leaders (Abele and Wojciszke 2014; Rosette et al. 2016). In contrast, agentic dominance is understood as an assertive leader establishing control over others with an emphasis on competitiveness (Abele and Wojciszke 2014; Rosette et al. 2016). Dominance traits of a leader may be either descriptive or proscriptive traits (Rosette et al. 2016; Rudman et al. 2012).

Leader evaluation research examining the competence and dominance constructs finds that women leaders are often evaluated as having either an agentic deficiency (i.e., viewed as lacking competence to be a leader) or an agentic penalty (i.e., penalized for displays of dominance). An agentic deficiency makes it difficult to be hired into or be deserving of a leadership position, and agentic dominance impacts perceived legitimacy in using agentic behaviors in leadership roles (Rosette et al. 2016). Dominance is often perceived as not congruent with stereotypical feminine communality and can lead to perceptions of lack of warmth (not feminine) according to stereotype content research (Eckes 2002; Fiske et al. 2002). Consequently, women leaders face a particular challenge: they must often establish their competence as leaders through agentic characteristics and behavior (Rudman et al. 2012), which can in turn be penalized as dominance.

The combination of warmth and competence within gender stereotypes reinforces the gender hierarchy for women, who are expected to have lower competence and higher warmth (Eckes 2002; Fiske et al. 2002). The warm but not competent woman (e.g., the “housewife” stereotype) poses no threat to the gender hierarchy. However, a high competence and low warmth woman (e.g., the “career woman” stereotype) challenges the gender hierarchy (Eckes 2002, p. 112). The stereotype that women should not be cold or uncaring (because they should be warm and caring) further penalizes women using a proscriptive characterization. Both dominance and competence as agentic characteristics operate as status maintenance for the gender status hierarchy and provide justification and motivation for penalties in the form of negative, or at least less positive, performance evaluations for women than for men. This is the double bind women may experience: being negatively evaluated as lacking competence (when perceived as communal) and as being too dominant (when perceived as agentic).

Gender status characteristics and stereotype content for military service members are similar to those for leaders. Military leaders are expected to be decisive, independent, confident, and competitive with a command and control style of leadership. These expectations are consistent with those for male leaders; however, they are inconsistent with expectations for female leaders, who are expected to be helpful, kind, gentle, and emotionally expressive using a participative and collaborative style of leadership (Archer 2013; Boldry et al. 2001; Boyce and Herd 2003; Ebbert and Hall 1993; Francke 1997; Looney et al. 2004; Morgan 2004). Thus, because gender stereotypes for military women are inconsistent with expectations for military leaders, they may contribute to women leaders’ negative performance evaluations.

The U.S. Military as a Case Study

The U.S. military offers an ideal environment for directly examining the relationship among gender, status, and stereotype content because it is a social institution that has long been considered a vanguard of social change and has institutionalized role expectations and a formal performance evaluation system (Atkinson 2015; Lundquist 2008). Epitomizing masculine-type work, the military was, until recently, highly gender-segregated, limiting women’s ability to compete with men on equal footing (Pellerin 2015; Segal et al. 2016). Despite the military’s gender integration efforts over the last 40 years, men represent 84% of the active duty forces and are retained at almost twice the rate of women in combat specialties (U.S. Department of Defense 2016). Beyond representation and the type of work, military culture reinforces a hypermasculine identity with the ideal warrior being brave, unemotional, fit, and ready to fight (Archer 2013; Barrett 1996). This ideal masculine warrior is socialized through basic training and everyday military life, and this paradigm may influence leadership styles or perceptions.

Institutional military structure is premised on an “up-or-out” career model, whereby one is either promoted or separated (i.e., not retained) (Rosen 1992). Evidence suggests that military women and men perform similarly in relevant training and other objective measures such as awards, physical fitness scores, grade point average, military science grades, and rankings (Biernat et al. 1998; Boldry et al. 2001). The military is perceived to be meritocratic because promotion is largely based on expertise and competency; however, subjective factors are also highly relevant (Atkinson 2015; Lundquist 2008).

Our research examines whether stereotype content and status incongruity arise in subjective performance evaluations of those training to be military leaders at the United States Naval Academy (USNA). USNA is one of three U.S. military service academies (the others are the United States Air Force Academy and the United States Military Academy [Army]), which are major military accession sources in addition to Reserve Officer Training Corps (ROTC) and Officer Candidate School (OCS). Military service academies are four-year public colleges where students graduate with Bachelor of Science degrees. Students receive military, physical, and character training in preparation for commissions as military officers in their respective services. The four-year leader education and development programs are expressly designed to indoctrinate and socialize students into the military profession, with the hope that they will make it a career. For all four years at USNA, students (“Midshipmen”) both work and live in their professional units (“companies”), which minimizes the separation between professional and personal life and means that students get to know members of their company on both a personal and a professional level. The work includes company leadership and organization, dissemination of information from Academy leadership, military leadership training, and counseling and guidance, among other duties. As part of the leadership development process, students are evaluated on their professional competence by superiors and peers in their professional units.

Performance evaluation data on Midshipmen offer a unique opportunity to examine the relationship between gender and assigned leadership characteristics and to assess evidence-based organizational practices. For men and women who are broadly similar with respect to academic standards, physical standards, and military standards, we would expect similar evaluations in the absence of biases and stereotypes. However, theory suggests that gender status beliefs may penalize women in traditionally masculine roles who violate the gender status hierarchy. The application of subjective leadership characteristics by peers and upper-class students offers insight into how status incongruity and related penalties may be reflected in performance evaluations. Based on previous research, we anticipate that in this military leadership context women will be evaluated differently and more harshly than their male peers.

The Present Study

We hypothesize that women training to be military leaders will be perceived as status-incongruent based on their gender and leader role and that this incongruity will be observed in their performance evaluations. Based on SCT, evidence of status incongruity will be associated with women receiving fewer descriptive and more proscriptive characteristics than men (Hypothesis 1). As leaders, men are status-congruent (higher status for gender and role) and expected to be agentic (competent and dominant), whereas women are not status-congruent or expected to be agentic. Therefore, when we consider the gendered component of these characteristics, we anticipate that men will receive more masculine descriptive characteristics whereas women will receive more feminine descriptive characteristics (Hypothesis 2). Also, because women leaders are status-incongruent (lower gender status and higher role status), they will receive more feminine proscriptive characteristics (agentic deficiency) and masculine proscriptive leadership characteristics (agentic penalty) than men will (Hypothesis 3).

Additionally, analysis at the level of individual characteristics enables us to explore how agentic competence and agentic dominance may be attributed to our participants. We expect that individual descriptive and proscriptive characteristics will be assigned consistent with Hypothesis 2 and Hypothesis 3. Building on the theoretical underpinnings of SCT, we explore leader agency through assignment of specific characteristics in performance evaluations. Specifically, we expect that men will be more likely to receive individual masculine descriptive characteristics (i.e., agentic competence and agentic dominance) and that women will be more likely to receive individual feminine descriptive characteristics (i.e., communal) (Hypothesis 4). Finally, we expect that women will be more likely than men will be to receive individual feminine and masculine proscriptive characteristics (Hypothesis 5).

Method

Leader Evaluation Process

Data were drawn from the Midshipman Aptitude for Commissioning system and merged with demographic and performance (military, academic, and physical) measures drawn from an institutional database with approval from USNA’s Institutional Review Board. Students evaluate one another in multiple ways using the Academy’s Midshipman Aptitude for Commissioning system (this is akin to a 360-degree feedback system in professional contexts, albeit without subordinate input). At the end of each semester students are required to anonymously rank all of their classmates within their company (i.e., approximately 40 peers per class year) who are of the same year or younger, placing each person into a quintile. Then they must determine who the top three performers in the top quintile are, as well as identify the bottom three performers in the bottom quintile. For each of these six individuals they must make a single selection (one attribute) from a predetermined list of 44 positive and 45 negative characteristics that best describes the individual’s professional and leadership traits (henceforth referred to as “leadership attributes”). (They may provide leadership attributes for other students as well, but it is not required.)
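To make a rater's obligations concrete, the following minimal Python sketch represents one end-of-semester submission and checks its required parts. The `Ballot` class and `missing_requirements` function are hypothetical illustrations, not the Academy's actual system; the sketch assumes quintile 1 is the top and quintile 5 the bottom, and it does not model the ordering within the top and bottom three.

```python
from dataclasses import dataclass, field


@dataclass
class Ballot:
    """One rater's end-of-semester submission (hypothetical structure)."""
    quintile: dict        # ratee id -> quintile, 1 (top) to 5 (bottom)
    top_three: list       # ratee ids chosen from quintile 1
    bottom_three: list    # ratee ids chosen from quintile 5
    attribute: dict = field(default_factory=dict)  # ratee id -> one attribute


def missing_requirements(ballot):
    """Return the unmet requirements; an empty list means the ballot is complete."""
    problems = []
    if any(q not in range(1, 6) for q in ballot.quintile.values()):
        problems.append("every ratee must be placed in quintile 1-5")
    if len(ballot.top_three) != 3 or any(ballot.quintile.get(r) != 1 for r in ballot.top_three):
        problems.append("top three must be three ratees from quintile 1")
    if len(ballot.bottom_three) != 3 or any(ballot.quintile.get(r) != 5 for r in ballot.bottom_three):
        problems.append("bottom three must be three ratees from quintile 5")
    # Mapping each ratee to a single string enforces "one attribute per person".
    for ratee in ballot.top_three + ballot.bottom_three:
        if ratee not in ballot.attribute:
            problems.append(f"{ratee} needs an attribute")
    return problems


# Usage: 15 hypothetical ratees, three per quintile.
ratees = [f"m{i}" for i in range(15)]
ballot = Ballot(
    quintile={r: 1 + i // 3 for i, r in enumerate(ratees)},
    top_three=["m0", "m1", "m2"],
    bottom_three=["m12", "m13", "m14"],
    attribute={"m0": "analytical", "m1": "dependable", "m2": "decisive",
               "m12": "sloppy", "m13": "apathetic", "m14": "careless"},
)
print(missing_requirements(ballot))  # []
```

Optional attributes for other ratees would simply be additional entries in the `attribute` dictionary.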

The leadership attributes available for selection are presented in a single alphabetical list complete with descriptions to raters. (See online supplement for the complete list including definitions and valence.) The rankings and leadership attributes are intended to capture how other students perceive the target and are largely subjective (United States Naval Academy 2016). Students understand that these evaluations are influential in the assignment of student leadership positions in conjunction with objective measures (e.g., grades, fitness scores, class standing). Although the evaluations are important and the institution wants students to take the process seriously and provide responses that accurately reflect their observations, it is unclear exactly what information and criteria students base their evaluations on, and for some students the evaluations may reflect popularity more than professional performance.

Participants

We obtained data on all students (evaluatees) enrolled at USNA in the Spring semester of the 2014–2015 academic year. Because evaluators were anonymous, their demographic data, including gender, were unavailable for this analysis. We excluded students studying abroad (n = 23) and students who were foreign nationals or visiting USNA for the semester (n = 82).

Many students in high-level leadership positions (called “stripers” because of the insignia they wear) were missing peer evaluation data because, owing to the nature of their positions away from their companies, they were not ranked by their peers at the end of the semester. Because striper assignment is partially based on class standing, however, omitting these stripers might meaningfully skew results. Therefore, for those stripers without peer performance data for the Spring semester, we imputed rankings and attributes from the previous semester (n = 63). Forty-five students who held striper positions in both the Spring and Fall semesters had no peer performance measures in either semester and were dropped. The resulting dataset comprises 4344 students.
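The carry-forward rule for stripers can be sketched as follows. This is a minimal illustration assuming peer data are stored per student as a record (or `None` when missing); the function name and record layout are hypothetical. Non-stripers with missing Spring data are kept unchanged because the text does not describe that case.

```python
def impute_stripers(spring, fall, stripers):
    """Fill in missing Spring peer data for stripers from the previous (Fall) semester.

    spring, fall: dict mapping student id -> peer-evaluation record, or None if missing
    stripers: set of student ids holding striper positions
    Returns (dataset, n_imputed, dropped_ids).
    """
    dataset, n_imputed, dropped = {}, 0, []
    for sid, record in spring.items():
        if record is None and sid in stripers:
            if fall.get(sid) is not None:
                dataset[sid] = fall[sid]   # carry the Fall rankings/attributes forward
                n_imputed += 1
            else:
                dropped.append(sid)        # no peer data in either semester: excluded
        else:
            # Present records, and non-stripers (a case the text does not cover), kept as-is.
            dataset[sid] = record
    return dataset, n_imputed, dropped


spring = {"a": {"quintile": 2}, "b": None, "c": None}
fall = {"a": {"quintile": 3}, "b": {"quintile": 1}, "c": None}
data, n_imputed, dropped = impute_stripers(spring, fall, stripers={"b", "c"})
print(n_imputed, dropped)  # 1 ['c']
```

In the study, the two branches of this rule correspond to the 63 imputed and 45 dropped stripers reported above.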

Men composed more than three-quarters of the student body (nmen = 3349, 77%; nwomen = 995, 23%). A majority of the student body was White (n = 2841, 65%), with 482 (11%) Hispanic, 304 (7%) African American, 293 (7%) Asian American, 21 (.5%) Native Hawaiian/Pacific Islander, 20 (.5%) Native American, and 383 (9%) “Other.” Classes were approximately equally distributed, with 1031 (24%) seniors, 1055 (24%) juniors, 1114 (26%) sophomores, and 1144 (26%) first years; the differences are largely due to attrition. Age was excluded from analysis because all students are required to matriculate by age 23 and graduate by age 27. Although everyone at the Academy must participate in athletics, only about a quarter were varsity athletes (n = 1108, 26%), whereas the remainder were involved in intramural and club sports.

Measures

Descriptive and Proscriptive Attributes

The Midshipman Aptitude for Commissioning system identifies 89 leadership attributes Midshipmen can ascribe to one another, and it explicitly assigns them a valence in the context of leadership at the Naval Academy. Because the attributes identified as “positive” are consistent with descriptive leadership traits, and the attributes identified as “negative” are consistent with proscriptive leadership traits, we labeled the 44 positive attributes “descriptive” and the 45 negative attributes “proscriptive.” The analysis considers the descriptive and proscriptive attributes together because all 89 attributes are available for selection when assigned by evaluators. However, to interpret the assignment of these as distinct descriptive and proscriptive categories, we examine the type of attribution separately.

Feminine and Masculine Attributes

To address gendered assignment of leadership attributes, attributes were labeled as feminine, masculine, or neutral based on gender assignment derived from earlier research (e.g., Bem Sex Role Inventory, Personal Attributes Questionnaire; see online supplement). Both an undergraduate research assistant and the second author reviewed previous literature for how characteristics were coded. Where our attributes mapped directly onto characteristics identified previously, we used the gender assignment from that research. Where attributes did not map directly, we looked for a closely related term and used its gender assignment in conjunction with the institutionally provided definition (e.g., “apathetic” is one of our attributes but does not appear in the research we examined; however, Prentice and Carranza (2002) list “detached,” which we used as a synonym for “apathetic”). If there was disagreement or it was unclear how to code an attribute based on pre-existing literature, we labeled the attribute as neutral. All attribute labeling was reviewed by the first author.
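The decision rule above (direct match, then a synonym, otherwise neutral) can be expressed as a small lookup. This is a sketch of the rule only: the lookup tables below are toy, hypothetical inputs, "detached" is labeled masculine here solely to match the outcome reported for "apathetic", "some_term" is a placeholder for a term on which sources disagree, and the human review steps are not modeled.

```python
def code_attribute(attr, prior_coding, synonyms):
    """Label one attribute 'masculine', 'feminine', or 'neutral'.

    prior_coding: term -> set of gender labels that earlier instruments assign it
    synonyms: attribute -> closely related term found in earlier research
    """
    # Use a direct match first, then fall back to a synonym, if any.
    term = attr if attr in prior_coding else synonyms.get(attr)
    labels = prior_coding.get(term, set())
    # No match (unclear) or conflicting labels (disagreement) -> neutral.
    return next(iter(labels)) if len(labels) == 1 else "neutral"


# Hypothetical toy inputs for illustration only.
prior = {"analytical": {"masculine"}, "compassionate": {"feminine"},
         "detached": {"masculine"}, "some_term": {"masculine", "feminine"}}
syn = {"apathetic": "detached"}
for a in ["analytical", "compassionate", "apathetic", "some_term", "candid"]:
    print(a, code_attribute(a, prior, syn))
```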

It is worth emphasizing that the attribute gendering is distinct from the descriptive/proscriptive nature of the leadership characteristics. Of the 44 descriptive leadership attributes, 11 were characterized as masculine (analytical, athletic, competent, confident, courageous, decisive, inspiring, logical, practical, proactive, and resourceful), 15 feminine (charismatic, civil, compassionate, dependable, diplomatic, enthusiastic, honest, intuitive, loyal, mature, organized, polished, respectful, self-aware, and team-player), and 18 neutral (articulate, candid, dedicated, diligent, energetic, ethical, industrious, innovative, judicious, level-headed, methodical, principled, resilient, responsible, self-disciplined, self-reliant, thorough, and versatile). Of the 45 proscriptive leadership attributes, 18 were characterized as masculine (abrasive, abusive, apathetic, arrogant, blunt, careless, confrontational, disorganized, egocentric, forgetful, inconsiderate, lethargic, opportunistic, overbearing, ruthless, selfish, sloppy, and stubborn), 10 feminine (excitable, frivolous, gossip, indecisive, inept, panicky, passive, scattered, temperamental, and unpredictable), and 17 neutral (argumentative, complacent, impetuous, inattentive, incurious, indifferent, irresponsible, lackadaisical, mistrustful, sarcastic, sleepy, uncommitted, unprincipled, unproductive, untruthful, vague, and vain).
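Because gendering and valence are crossed, the six cells can be tallied directly from the lists above. The Python sketch below transcribes those lists (the dictionary layout is ours, not the Academy's) and checks that each of the 89 attributes is coded exactly once.

```python
from collections import Counter

# Attribute lists transcribed from the paragraph above, keyed by (valence, gendering).
ATTRIBUTES = {
    ("descriptive", "masculine"): (
        "analytical athletic competent confident courageous decisive "
        "inspiring logical practical proactive resourceful"
    ),
    ("descriptive", "feminine"): (
        "charismatic civil compassionate dependable diplomatic enthusiastic honest "
        "intuitive loyal mature organized polished respectful self-aware team-player"
    ),
    ("descriptive", "neutral"): (
        "articulate candid dedicated diligent energetic ethical industrious innovative "
        "judicious level-headed methodical principled resilient responsible "
        "self-disciplined self-reliant thorough versatile"
    ),
    ("proscriptive", "masculine"): (
        "abrasive abusive apathetic arrogant blunt careless confrontational disorganized "
        "egocentric forgetful inconsiderate lethargic opportunistic overbearing ruthless "
        "selfish sloppy stubborn"
    ),
    ("proscriptive", "feminine"): (
        "excitable frivolous gossip indecisive inept panicky passive scattered "
        "temperamental unpredictable"
    ),
    ("proscriptive", "neutral"): (
        "argumentative complacent impetuous inattentive incurious indifferent irresponsible "
        "lackadaisical mistrustful sarcastic sleepy uncommitted unprincipled unproductive "
        "untruthful vague vain"
    ),
}


def crosstab(attributes):
    """Count attributes in each (valence, gendering) cell."""
    return Counter({cell: len(words.split()) for cell, words in attributes.items()})


table = crosstab(ATTRIBUTES)
all_words = [w for words in ATTRIBUTES.values() for w in words.split()]
assert len(all_words) == len(set(all_words)) == 89  # every attribute coded exactly once
for valence in ("descriptive", "proscriptive"):
    row = {g: table[(valence, g)] for g in ("masculine", "feminine", "neutral")}
    print(valence, row)
```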

Counts, Proportions, and Relative Frequencies

The leadership evaluation process produces count data: the number of times each Midshipman was assigned each attribute. For instance, if a Midshipman has a count of 3 for a given attribute (e.g., analytical), then this Midshipman was characterized as such by three other Midshipmen in the company. For each Midshipman in our dataset, we have the number of times she or he was assigned each of the 89 leadership attributes. Broadly (i.e., across attributes), we consider the counts (e.g., the total number of descriptive assignments and the total number of proscriptive assignments). When considering the attributes individually, however, we consider the counts relatively; specifically, we examine: (a) breadth (or diversity) of the attribute as indicated by the proportion of the population (4344 Midshipmen [3349 men and 995 women]) to ever receive the attribute (1 = at least once, 0 = never) and (b) depth (or intensity) of the attribute as indicated by the frequency of assignment of that attribute relative to the other 88 attributes (81,774 total attribute assignments [51,699 descriptive and 30,075 proscriptive]). The key distinction in these measures is the denominator: the denominator for the proportions is the total number of Midshipmen, whereas the denominator for the relative frequencies is the total number of attribute assignments. The proportions and relative frequencies allow us to identify differences, respectively, in how widely an attribute is used to describe the population and how often a particular attribute is used relative to others.
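The difference in denominators can be shown on a toy group. In this minimal sketch each Midshipman is represented as a dictionary of attribute counts (a hypothetical layout); the proportion divides by the number of Midshipmen, and the relative frequency divides by the number of attribute assignments.

```python
def breadth_and_depth(group, attribute):
    """Breadth (proportion) and depth (relative frequency) of one attribute.

    group: list of dicts, one per Midshipman, mapping attribute -> times assigned
    """
    n_people = len(group)
    ever = sum(1 for counts in group if counts.get(attribute, 0) > 0)
    n_assignments = sum(sum(counts.values()) for counts in group)
    n_this = sum(counts.get(attribute, 0) for counts in group)
    proportion = ever / n_people                 # denominator: Midshipmen
    relative_frequency = n_this / n_assignments  # denominator: attribute assignments
    return proportion, relative_frequency


# Four hypothetical Midshipmen
group = [
    {"analytical": 3, "competent": 1},
    {"competent": 2},
    {"analytical": 1, "sloppy": 1},
    {"honest": 2},
]
print(breadth_and_depth(group, "analytical"))  # (0.5, 0.4)
```

Calling the function once for the men and once for the women would give per-gender figures of the kind compared in the Results; using each gender's own assignment total as the denominator is our assumption.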

Results

The men and women in our study were comparable in military performance with respect to their cumulative military grade point average (Mmen = 3.18, SD = .36; Mwomen = 3.17, SD = .38, on a scale of 0 to 4.0; p = .354) and company military ranking (Mmen = 18.5, SD = 10.64; Mwomen = 18.3, SD = 10.65, on a scale of 1 to 41; p = .558). Therefore, consistent with previous research (e.g., Biernat et al. 1998; Boldry et al. 2001), something other than objective performance presumably accounts for gender differences in the subjective performance evaluations and attribute assignment.

In our discussion of the results that follows, it is important to understand the distinction between attributes and attribute assignments. Suppose a target in our study received analytical twice, competent three times, and all other attributes zero times. This target received two attributes and five attribute assignments. We consider the first measure as the diversity, or variety, of attributes received and the second measure as the intensity of attribute assignment—how, in this case, “positively” the target is viewed.
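The worked example above translates directly into two per-target counts. This is a minimal sketch in which a target's evaluations are a hypothetical dictionary of attribute counts.

```python
def diversity_and_intensity(counts):
    """counts: attribute -> times this target received it."""
    diversity = sum(1 for n in counts.values() if n > 0)  # distinct attributes received
    intensity = sum(counts.values())                      # total attribute assignments
    return diversity, intensity


# The target described above: analytical twice, competent three times, all else zero.
target = {"analytical": 2, "competent": 3, "sloppy": 0}
print(diversity_and_intensity(target))  # (2, 5)
```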

Given that Midshipmen are required to assign leadership attributes only to the top three and bottom three in their ranking, one might assume that many Midshipmen are not assigned any attributes. This is not the case. The percentage of men who never received a descriptive attribute was 1.5% compared to 1.6% of women. The percentage of men who never received a proscriptive attribute was 13% compared to 7.4% of women.

The number of descriptive attribute assignments and the number of proscriptive attribute assignments received by men and women were compared using the Wilcoxon rank sum test. The men and women in our study did not differ significantly with respect to the number of descriptive assignments received (Mdnmen = 10, Mdnwomen = 9, p = .098); however, the men received significantly fewer proscriptive assignments compared to the women (Mdnmen = 4, Mdnwomen = 5, p < .0001), providing partial support for Hypothesis 1.
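For readers who want to reproduce this kind of comparison, here is a minimal pure-Python sketch of a two-sided Wilcoxon rank sum test using the large-sample normal approximation with average ranks and a tie correction. The paper does not state which variant (exact, continuity-corrected, or tie-corrected) was used, so this is one reasonable choice rather than the authors' procedure; `scipy.stats.ranksums` and `scipy.stats.mannwhitneyu` are library alternatives. The sample data in the usage lines are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test, normal approximation with tie correction.

    Returns (W, z, p), where W is the rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1       # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return w, z, p


# Hypothetical counts of proscriptive assignments for two small groups
men = [4, 3, 5, 2, 4, 6, 3]
women = [5, 6, 4, 7, 5]
w, z, p = rank_sum_test(men, women)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```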

The numbers of gendered (masculine, feminine, neutral) descriptive and proscriptive attribute assignments received by men and by women were also compared, again using the Wilcoxon rank sum test. The significance level used to test for differences in these comparisons was set at α* = .05/6 = .0083 (standard overall α = .05 Type I error rate with a Bonferroni correction to adjust for the six comparisons). Under the attribute classification described previously, there were significant differences in the number of assignments to men and women for masculine descriptive attributes (Mdnmen = 3, Mdnwomen = 2, p < .0001) and feminine descriptive attributes (Mdnmen = 3, Mdnwomen = 4, p < .0001), providing support for Hypothesis 2. Moreover, whereas women received significantly more feminine proscriptive attributes than did men (Mdnmen = 0, Mdnwomen = 1, p < .0001), women did not receive significantly more masculine proscriptive attributes (Mdnmen = 2, Mdnwomen = 2, p = .010, which exceeds α*), providing partial support for Hypothesis 3.
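The Bonferroni decision can be checked against the reported p-values with a few lines of Python. The two neutral comparisons are omitted because their p-values are not reported in this section, and "p < .0001" is entered as its upper bound.

```python
ALPHA = 0.05        # overall Type I error rate
N_COMPARISONS = 6   # {masculine, feminine, neutral} x {descriptive, proscriptive}
alpha_star = ALPHA / N_COMPARISONS

# p-values reported above; "p < .0001" is entered as its upper bound .0001.
reported_p = {
    "masculine descriptive": 0.0001,
    "feminine descriptive": 0.0001,
    "feminine proscriptive": 0.0001,
    "masculine proscriptive": 0.010,
}
significant = {name: p < alpha_star for name, p in reported_p.items()}
print(f"alpha* = {alpha_star:.4f}")  # alpha* = 0.0083
for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```

The masculine proscriptive comparison (p = .010) would pass an uncorrected .05 threshold but not the corrected one.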

Turning attention now to the individual attributes, the proportion of Midshipmen who were assigned a given attribute (at least once) and the relative frequency of assignment of that attribute were computed separately for men and women. Fisher’s exact test (a small sample alternative to the Chi-square test) was used to test for gender differences in both the proportion and the relative frequency of the individual attributes with significance level α* = .05/89 = .00056 (overall α = .05 Type I error rate with a Bonferroni correction to adjust for multiple comparisons). Effect sizes for proportions were calculated using Cohen’s h (an analog to Cohen’s d for proportions) (Cohen 1988). For example, the proportion of men assigned the attribute analytical was .580, whereas the proportion of women assigned analytical was .482; this difference is statistically significant (p < .0001, Cohen’s h = .197). The relative frequency of analytical also differs significantly by gender (p < .0001), where, relative to the other attributes, analytical is used with greater frequency for men than for women (.0598 versus .0430, respectively). (Complete results are available in the online supplement.)
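Both quantities in this paragraph can be computed with the standard library. The sketch below implements Cohen's h as the difference of arcsine-transformed proportions and a two-sided Fisher's exact test for a 2 × 2 table that sums the probabilities of all tables with the same margins that are no more likely than the observed table (a common two-sided convention; `scipy.stats.fisher_exact` is a library alternative). The usage lines rebuild approximate "analytical" counts from the rounded proportions reported above, so they illustrate the computation rather than reproduce the study's exact table. For relative frequencies, the unit would be attribute assignments rather than Midshipmen.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(k):  # hypergeometric P(top-left cell = k | fixed margins)
        return math.comb(row1, k) * math.comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# "Analytical": rounded proportions turned back into approximate counts.
men_yes = round(0.580 * 3349)    # about 1942 of 3349 men
women_yes = round(0.482 * 995)   # about 480 of 995 women
p = fisher_exact_two_sided(men_yes, 3349 - men_yes, women_yes, 995 - women_yes)
h = cohens_h(0.580, 0.482)
print(f"h = {h:.3f}; significant at alpha* = .05/89: {p < 0.05 / 89}")
```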

Table 1 summarizes the gender differences in the proportion of Midshipmen ever assigned the individual attributes, categorized by the gendering of the attributes. Statistically significant gender differences were detected on 11 of the 44 descriptive attributes (see Table 1a) and on 10 of the 45 proscriptive attributes (see Table 1b). In partial support of Hypothesis 4, men were more likely to receive 5 masculine descriptive attributes (analytical, competent, athletic, logical, and practical), 1 neutral descriptive attribute (level-headed), and none of the feminine descriptive attributes; whereas women were more likely to receive 1 masculine descriptive attribute (proactive), 1 neutral descriptive attribute (energetic), and 3 feminine descriptive attributes (compassionate, enthusiastic, and organized). In support of Hypothesis 5, women were more likely than men to receive all 10 of the proscriptive attributes for which there was a statistically significant gender difference (selfish, vain, inept, frivolous, gossip, excitable, scattered, temperamental, panicky, and indecisive).

Table 1 Significant gender differences in proportions: Descriptive and proscriptive attributes

Table 2 shows a similar pattern in gender differences for the relative frequency of attribute assignment. Statistically significant gender differences were detected on 14 of the 44 descriptive attributes (see Table 2a) and on 14 of the 45 proscriptive attributes (see Table 2b). These results are largely consistent with the findings for proportions (Table 1), although more attributes differ significantly. That we find more significant gender differences when evaluating the relative frequencies is not surprising: a single Midshipman who receives a single attribute many times can strongly affect the relative frequency (the attribute is counted each time it is assigned) but not the proportion (the individual is counted once).
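A toy example makes the asymmetry visible. In the hypothetical groups below, exactly one target in each group ever receives "gossip," so the proportions match, but concentrating five assignments on that one target multiplies the relative frequency by five.

```python
def proportion_and_relative_frequency(group, attribute):
    """Share of targets ever assigned `attribute`, and its share of all assignments."""
    ever = sum(1 for counts in group if counts.get(attribute, 0) > 0)
    total = sum(sum(counts.values()) for counts in group)
    hits = sum(counts.get(attribute, 0) for counts in group)
    return ever / len(group), hits / total


# Two hypothetical groups of four targets with 20 assignments each.
group_a = [{"gossip": 1, "honest": 4}, {"honest": 5}, {"honest": 5}, {"honest": 5}]
group_b = [{"gossip": 5}, {"honest": 5}, {"honest": 5}, {"honest": 5}]
print(proportion_and_relative_frequency(group_a, "gossip"))  # (0.25, 0.05)
print(proportion_and_relative_frequency(group_b, "gossip"))  # (0.25, 0.25)
```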

Table 2 Significant gender differences in relative frequency: Descriptive and proscriptive attributes

As shown in Table 2a, men received 10 descriptive attributes with greater relative frequency, 6 of which are masculine (analytical, competent, athletic, confident, logical, and practical), 3 neutral (versatile, articulate, and level-headed), and 1 feminine (dependable). Women received 4 descriptive attributes with greater relative frequency, of which none are masculine, 1 is neutral (energetic), and 3 are feminine (compassionate, enthusiastic, and organized). Table 2b shows the corresponding results for the proscriptive attributes. Of the 14 proscriptive attributes for which there was a significant gender difference, women received 12 with greater relative frequency (selfish, opportunistic, vain, inept, frivolous, passive, scattered, gossip, excitable, panicky, temperamental, and indecisive). Only 2 proscriptive attributes (arrogant and irresponsible) were assigned with greater relative frequency among men. Again, there is strong support for Hypothesis 5.

Discussion

Based on SCT and status incongruity, we predicted that men and women would receive different performance evaluations. Our results show that, overall, women received more proscriptive leadership attributes than men did, but a similar number of descriptive leadership attributes (Hypothesis 1). Within the descriptive leadership attributes, women received more feminine attributes and men received more masculine attributes (Hypothesis 2). For proscriptive leadership attributes, however, women received more feminine attributes than men but a similar number of masculine attributes (Hypothesis 3). We also found significant gender differences in the individual descriptive attributes, with women more likely to receive 5 attributes (1 masculine, 1 neutral, and 3 feminine) and men more likely to receive 6 attributes (5 masculine and 1 neutral) (Hypothesis 4). For individual proscriptive attributes, women were more likely to receive 10 attributes (1 masculine, 1 neutral, and 8 feminine) (Hypothesis 5). Consistent with prior meta-analytic research on gender differences in cognitive, communication, and social and personality variables (Hyde 2005), our effect sizes (Cohen’s h) were, with few exceptions, relatively small. Although these effects may seem small, such differences can be of practical importance in the workplace. Indeed, research shows that small biases against women in performance evaluations can accumulate over time into large disparities in gender diversity at senior leadership levels (Martell et al. 1996).

We found support for SCT: men’s higher ascribed gender status is congruent with their higher role status as leaders, whereas women were evaluated as occupying an incongruent position, combining lower gender status with higher role status. Examining the collective and individual leadership attributes, we found that men were more likely to receive 6 of the 29 masculine/neutral descriptive attributes and none of the feminine descriptive attributes. In contrast, women were more likely than men to receive only 2 of the masculine/neutral descriptive attributes, but 3 of the 15 feminine descriptive attributes. SCT is further supported by our finding that women were more likely than men to receive all 10 of the proscriptive attributes for which there was a statistically significant gender difference.

Because most of the proscriptive leadership attributes that women were more likely to receive were feminine (8 of the 10), it appears that these women may have been evaluated more often on competence (agentic deficiency) than on dominance (agentic penalty). Consistent with previous research, this pattern might imply that these women employ a stereotypically feminine (communal) leadership style. If so, this could explain why the women in our study were more likely to be characterized as inept, a term that implies an unspoken questioning of their competence.

Although we hypothesized that women would also receive masculine proscriptive attributes more often than men, our data provide little support for this prediction. This may suggest either that these women are not employing an agentic leadership style or that the dominance penalty is less prevalent in this context. According to SCT, women who lead with greater agency (dominance) are more likely to experience backlash (agentic penalty) in the form of proscriptive attributes (e.g., abrasive, abusive, argumentative, arrogant, or confrontational) that emphasize the masculine authority they have usurped. With the exception of selfish, no masculine proscriptive attribute was assigned more often to women.

Because women were generally more likely to receive feminine descriptive and proscriptive leadership attributes, we considered the possibility that evaluators attempt to maintain the gender status hierarchy by evaluating women with attributes that emphasize what women are not: stereotypically masculine leaders. Of note, compassionate was the attribute of any type most commonly assigned to women. Compassion is a desirable attribute for any leader, regardless of gender, yet it is generally more associated with women leaders than with men leaders (Parker et al. 2015). Similarly, organized, another attribute associated with women leaders (Parker et al. 2015), was assigned to women more than to men. Thus, there is evidence that feminine leadership attributes are being assigned in a way that is consistent with maintenance of the gender status hierarchy.

Our results suggest that women in the military may face a more subtle version of the double bind. Only one masculine proscriptive attribute, selfish, was assigned more often to women, whereas we had expected more penalties for agentic dominance in the military context. Instead, women were assigned more feminine proscriptive leadership attributes (e.g., inept, frivolous, gossip, and excitable), which may be a penalty for being perceived as communal. Of note, the neutral proscriptive attribute vain was also more likely to be assigned to women. In the military, personal appearance is valued and emphasized in terms of a professional appearance in uniform. However, women whose appearance is perceived as more feminine or overtly enhanced (e.g., makeup, nail polish, hairstyle), even in ways that may make them feel more professional, could draw attention to their femininity and therefore be evaluated as incongruent with the leader role. We also acknowledge that vain may not be properly categorized as neutral.

Finally, the absence of gender differences in men’s and women’s cumulative military grade point averages and company military rankings suggests that something other than objective performance (i.e., bias) accounts for the gender differences in subjective performance evaluations and attribute assignment. However, it remains possible that some of these evaluations are grounded in accurate perceptions of leadership, because the data do not allow us to compare attribute assignment with actual performance. For instance, a person may receive high marks on the aggregate performance measures available to us yet achieve them in a way that leads an evaluator to judge their leadership style as selfish. Nevertheless, previous research on relevant leadership traits (e.g., personality traits, intelligence, emotional intelligence, creativity) suggests that any significant gender differences may be attributable to other evaluative processes, such as bias and stereotype content (Baer and Kaufman 2008; Halpern and LaMay 2000; Petrides and Furnham 2000; Schmitt et al. 2008).

Limitations and Future Directions

Because we analyzed real-world data from an existing leadership performance evaluation system, our research has several limitations. First, the evaluator’s gender is unknown (to preserve anonymity in the performance evaluation system). Because both men and women contribute to maintaining the gender status hierarchy, we would not expect any difference in how men and women assign leadership attributes based on stereotypes (Greenwald and Banaji 1995; Rudman et al. 2012). Nevertheless, it would be valuable to understand whether men and women assign gendered descriptive and proscriptive leadership attributes differently in the present context. It would also be helpful to understand the criteria evaluators used, because in some cases they may have had access to more objective performance data, which would allow a more comprehensive depiction of how and when proscriptive and descriptive attributes are assigned. Finally, an accurate depiction of each target’s leadership style would enable analysis of who was penalized for agentic versus communal styles of leadership.

Beyond gender, further analysis of attributes may provide more detailed knowledge of how particular attributes relate to each other and to factors such as age, race and ethnicity, and important professional qualifications. We contend that an intersectional analysis of gender and race/ethnicity would be of particular interest and importance in today’s workplace. Multicultural perspectives on leadership performance and evaluations are conspicuously sparse in the literature and would be useful for organizational leaders and human resources managers.

Another area to explore is recent research suggesting that the effects of status incongruity and threats to the gender hierarchy in organizations and industries are observable at macro levels. In analyzing leader effectiveness and evaluations, two studies find that gender differences at the organization and industry levels moderate leader evaluations (Ko et al. 2015; Paustian-Underdahl et al. 2014). Our results are consistent with the finding of these macro-level analyses that men are evaluated as more effective in more masculine, male-dominated organizations, professions, and industries (e.g., men being evaluated as competent and women as inept). Nonetheless, examining gender composition at each level of leadership may further clarify status effects in performance evaluations. Beyond historically male-dominated industries such as the military, where men outnumber women at all levels, it may be useful to examine industries with overall gender balance but with women’s representation decreasing at successively higher leadership levels (e.g., advertising and pharmaceuticals).

Status incongruity and a defense of the gender status quo at a macro level may also help explain why women are more likely to receive vague feedback in performance evaluations, feedback that is more closely tied to their communal traits as caregivers than to their traits as leaders (Correll and Simard 2016). Consistent with this line of research, women in our study were more likely to be evaluated positively as compassionate and negatively as inept. Future research that incorporates organizational gender composition, evaluator gender, and objective individual performance outcomes may refine our understanding of the relationship among status, performance, and stereotypes. Particularly useful would be an analysis of how the language used in performance evaluations varies with whether a desired outcome was achieved.

Finally, longitudinal research examining leadership style and evaluations could provide critical information on employee outcomes associated with gender status beliefs. Specifically, performance evaluations that can be linked to retention and promotion outcomes would provide valuable data to practitioners establishing policy and evaluating best practices.

Practice Implications

The type and amount of evaluation criteria play a key role in whether gendered performance expectations shape evaluations. Research shows that when evaluation criteria and levels of performance are more ambiguous, evaluators are more likely to rely on stereotyped expectations (Heilman 2012). Additionally, when less relevant performance information is available, evaluators are more likely to infer performance from stereotypes (Heilman et al. 2004; Swim et al. 1989). The subjective nature of the leadership performance evaluations available to our research participants, together with ambiguous and scant performance criteria, may facilitate evaluations based on gender-stereotyped expectations. Alternatively, our results could reflect the backlash that successful women receive in masculine organizational contexts and on male-typed tasks (Heilman et al. 2004).

From a practical perspective, our results add to the wealth of research demonstrating how, in the absence of other information, ambiguous and subjective evaluations facilitate evaluators’ use of gender stereotypes. The existing performance system assumes that participants use appropriate criteria to complete their evaluations. However, the minimal guidance it provides may create an environment in which gender status beliefs can come into play. Creating specific, objective criteria based on goals, skills, and outcomes that can be assessed with available tools and metrics may yield more accurate and useful evaluations. Further, trait-based evaluation systems that use phrases or other preselected evaluation content should choose trait language deliberately, after careful testing, to minimize status beliefs and stereotype content.

Finally, our research complements prior work by providing additional evidence of how status characteristics influence performance perceptions, and it has important implications for reducing gender inequities throughout the career pipeline. Identifying and removing stereotype content and biased language embedded in job advertisements and recruiting materials is vital for employers seeking to attract and hire diverse talent (Bolukbasi et al. 2016; Gaucher et al. 2011). Additionally, research on performance standards for accountability, promotion, attribution rationalization, and stereotype threat continues to be instrumental in understanding why women may receive subtle, if not explicit, messages that they are not the right fit for the job (Biernat et al. 2010; Castilla 2008; Cuddy et al. 2004; Davies et al. 2005; Kalev et al. 2006; Lerner and Tetlock 1999; Rudman 1998; Rudman and Glick 1999).

Conclusions

Industries and professions are urgently trying to retain talented women, who often receive formal and informal messages that they do not belong or fit and who are penalized for their authentic leadership styles. Even in esteemed institutions such as the military service academies, with their reputation for producing leaders of character to serve the nation, gender status beliefs are pervasive and may be unknowingly contributing to retention problems when women make career decisions years later. Our findings suggest that the status dynamics described by SCT and status incongruity may be reinforced in a U.S. military leadership context through an institutional, formal performance evaluation system.

In the present paper, we employed SCT and status incongruity to analyze real-world leadership performance evaluation data and found support for the prediction that women leaders are evaluated with a greater variety of proscriptive attributes. Additionally, our finding that women are evaluated with a more limited variety of descriptive leadership attributes adds theoretical nuance. Not only are women penalized for violating the gender status hierarchy by being evaluated with more proscriptive attributes, but they are also penalized by receiving fewer types of individual descriptive attributes (less variety). Although this finding is consistent with previous experimental research showing that women receive similar overall numbers of descriptive evaluations (Eagly et al. 1992; Rudman and Glick 2001; Rudman et al. 2012), it also expands what we know about the variety of descriptive evaluations. This reasoning leads us to ask whether our findings are specific to the military leadership context or might also be observed in other professions and industries, especially those that are historically male or have a hypermasculine culture.

Although women received more proscriptive leadership attributes, the proscriptive attributes more often used in their evaluations were feminine, not the masculine proscriptive attributes that status incongruity and the agentic dominance penalty would predict. Military women in a hypermasculine culture are challenged to fit in as leaders while also contending with gender stereotypes (Archer 2013). In this cultural context, we expected the agentic dominance penalty to be amplified, with masculine proscriptive attributes outnumbering feminine ones, but this was not the case. Our data do not allow us to explain this result, but we suggest two possible explanations. First, these women are college students at a military service academy who may not have adopted a more traditionally masculine leadership style. Alternatively, over the course of their time in the military leadership setting, some women may have received enough negative feedback and backlash about an agentic leadership style that they adapted to a more traditionally feminine leadership style.

Our findings provide important evidence to organizational leaders and human resources managers seeking to develop transparent evaluation processes that identify, develop, and promote the most talented people, regardless of gender. Research on status characteristics contributes to our knowledge of gendered language in performance evaluations and can assist researchers and practitioners with developing interventions. Understanding how gender status beliefs are associated with evaluation processes may facilitate changing workplace culture to be more gender-inclusive through less biased and stereotypical performance evaluations.