Introduction

Despite decades of women’s progress in earning college degrees and becoming managers (Perry 2013), women remain a small minority in the highest levels of leadership (i.e., the “glass ceiling”; Catalyst 2018; Morrison et al. 1994). Widespread gender stereotypes and leader prototypes that more closely associate men with leadership can drive differential evaluations of male and female leaders, perpetuating a gender gap in leadership (Eagly and Karau 2002; Eagly et al. 1992; Lord and Hall 2003; Schein 1973, 2001). In response, some women may adopt more masculine or agentic behaviors (e.g., being more voluble; Brescoll 2011) that are congruent with leader prototypes but incongruent with their gender role, leading to negative backlash (Brescoll 2011; Brescoll and Uhlmann 2008; Rudman 1998; Rudman et al. 2012). Organizations may also implement policies and programs ostensibly intended to support and retain female employees, yet these initiatives can counterintuitively create additional barriers for potential female leaders (e.g., gender quotas and maternal leave benefits can highlight women’s gender roles and stereotypes; Gloor et al. 2018; Heilman et al. 1997). For these reasons, business ethics scholars consider the persistent and pervasive gender gap in leadership a pressing issue in need of innovative and effective solutions (e.g., Hernandez Bark et al. 2016; O’Neil et al. 2008; Oakley 2000).

We propose a team-level intervention that aims to “fix the game, not the dame” by changing group-level leader prototypes through the group’s gender composition rather than through women’s behaviors. Guided by role congruity theory (Eagly and Karau 2002) and the social identity model of organizational leadership (Hogg 2001; van Knippenberg and Hogg 2003), we propose that the gender composition of the group (i.e., the percentage of women in the team) shapes team leader prototypes and may thereby weaken or even override societal leader stereotypes, improving followers’ responses to female leaders. Gender stereotypes, defined as widely held, persistent, and pervasive oversimplified images or ideas of a particular type of person or group (Schein et al. 1996), can be extremely difficult to change (Moss-Racusin et al. 2014). Group prototypes, however, are more malleable than stereotypes and are largely shaped by contextual norms. Prototypicality is a “set of characteristics possessed by most category members” (Cronshaw and Lord 1987, p. 97) and may be benchmarked against the leader category (i.e., attributes that characterize “leaders,” such as gender) or the group (i.e., attributes that characterize the follower group). Because leaders’ demographic characteristics are generally immutable but group prototypes can be altered and are relevant for leadership effectiveness (e.g., Giessner et al. 2009; also see van Knippenberg 2011), we target team-level prototypes for intervention.

We integrate the social identity model of organizational leadership with role congruity theory to examine how team gender composition may serve as a contextual moderator of the relationship between leader gender and team perceptions of leader prototypicality and trustworthiness. According to the social identity theory of leadership, a leader who is perceived as more group prototypical (i.e., as embodying the group’s identity as a team or organization) gains influence and legitimacy because group members believe that he or she represents what is group-normative (Hogg 2001; van Knippenberg and Hogg 2003). A leader may be viewed as more group-normative to the extent that he or she shares demographic characteristics with team members. Role congruity theory (Eagly and Karau 2002) suggests that, in general, prejudice toward female leaders results from the perceived incongruity between the characteristics of women and the requirements of leadership roles. This prejudice can vary with features of the leadership context and the characteristics of leaders’ evaluators (Eagly and Karau 2002). Therefore, prejudice against female leaders may weaken as the gender diversity of the group increases, because a female leader then shares her gender with more of her evaluators and thus appears more group-normative.

We manipulate gender composition at the team level in two randomized, multiple-wave, and multiple-source field experiments to examine whether the male advantage in perceptions of leader prototypicality and trustworthiness can be eliminated in more gender-balanced teams. Our key contributions are threefold. First, we bridge classic work on gender and leadership (Eagly and Karau 2002) with leadership and group prototype research (Hogg 2001; van Knippenberg and Hogg 2003). The idea that group prototypes may override broader societal stereotypes has been theorized (van Knippenberg 2011) and described in the context of organizational compositions or professional stereotypes (Eagly and Karau 2002; Perry et al. 1994). However, to our knowledge, this proposition has not yet been experimentally tested in the field or examined at the team level.

Second, individual and organizational efforts to improve gender equity often focus on women themselves, which can inadvertently create additional barriers to their success (e.g., Brescoll 2011; Brescoll and Uhlmann 2008; Gloor et al. 2018; Heilman et al. 1997; Rudman 1998; Rudman et al. 2012). Thus, typical interventions to improve gender equity may be misdirected or even morally questionable. We aim to improve gender equity in leadership evaluations by adjusting team compositions rather than by changing female leaders’ behaviors in the workplace or creating new company-wide hiring or family policies for female employees.

Finally, ethics and social justice perspectives have been the prevailing arguments for employees’ equal employment and career opportunities regardless of their demographic background (e.g., Dwertmann et al. 2016; Eagly and Carli 2007). Our research goes a step further by causally illustrating how gender equity in leadership may also be self-reinforcing. That is, having more gender-diverse teams may be an admirable goal in its own right, but more diverse teams may also create a fairer playing field for female leaders.

In the following sections, we review the relevant literature on gender and leadership, focusing on the context in which leadership is enacted: teams. We define and describe group prototypicality, leadership prototypes, and trust, summarizing the relevant research and the key propositions we aim to test. Finally, we outline several theoretical and practical implications, including the relevance of team gender composition for leaders, organizations, and business ethics more broadly, as well as several specific ideas for future research.

Leadership as a Group Process

Leadership is defined by its context because a leader cannot exist without followers. Given the highly embedded nature of leadership, it is important to remember that leaders not only lead the group, but are also members of the groups they lead (e.g., the President of the USA is also an American; van Knippenberg 2011). This shared identity shapes followers’ responses to and evaluations of their leaders, which is a process driven by prototypes.

People are quick to categorize themselves and others into groups, which are cognitively represented by prototypes (Turner 1985; Turner et al. 1987). Derived from cognitive psychology (Rosch 1978), prototypes are “fuzzy sets of attributes that define and prescribe attitudes, feelings, and behaviors that characterize one group and distinguish it from other groups” (Hogg 2001, p. 187). Comparable to stereotypes, prototypes serve as mental heuristics that are retrieved in relevant situations to guide perception, self-conception, and eventual action (Cronshaw and Lord 1987; Hogg 2001). However, prototypes also comprise a contextual element that allows them to be responsive to specific social contexts (Hogg et al. 1998, 2006). For example, a liberal, democratic leader may be viewed as more prototypical of her constituents in metropolitan New York, but less prototypical in rural South Carolina.

According to the social identity model of organizational leadership, group prototypes are inherently context based (Hogg 2001; van Knippenberg and Hogg 2003); as group composition changes, group prototypes evolve accordingly. In other words, the context shapes what constitutes a typical group member and thus who would be a typical “leader” of this group. These benchmarking processes have direct implications for leadership evaluations because perceptions of group representativeness are strong and positive predictors of the degree to which leaders can influence followers (van Knippenberg 2011). We build on this work to examine the effects of team gender composition on leader prototypes (i.e., mental representations of what constitutes “good” leadership) for male and female leaders. That is, we manipulate leaders’ objective group prototypicality in terms of gender to causally examine its effect on team perceptions of how “leader-like” the leader is.

Leader Gender, Team Gender Composition, and Leadership Evaluations

Role congruity theory is based on social role theory (Eagly 1987), which explains that historical distributions of men and women into breadwinner and homemaker roles, respectively, have produced societal gender norms as well as actual differences in behavior. Women and men are expected to have attitudes and skills congruent with their traditional roles, which creates stereotypes that foster gendered responses to leadership (Eagly and Karau 2002). Meta-analytic results bolster this assertion, indicating that men are perceived as more closely fitting stereotypical leadership prototypes and thus are evaluated more favorably than women (Eagly et al. 1992). More recently, however, a meta-analysis found that gendered responses to leadership vary according to certain contextual moderators, including the percentage of male raters evaluating the leader (Paustian-Underdahl et al. 2014). Indeed, women and men are viewed as equally effective leaders in gender-diverse groups of followers (Paustian-Underdahl et al. 2014).

In line with this work, the social identity model of organizational leadership (van Knippenberg and Hogg 2003) argues that leaders’ group context (i.e., the team) can influence followers’ responses to leadership beyond leader characteristics. In other words, leaders are more effective in mobilizing and influencing followers when the leader’s identity more closely reflects that of the team or group (Hogg 2001). Most research has examined subjective characteristics of identity (e.g., attitudes; Giessner et al. 2013, Study 1; Hais et al. 1997; Monzani et al. 2014; van Knippenberg and van Knippenberg 2005); however, initial findings also support this effect for leaders’ objective characteristics (e.g., sex). For example, Hogg et al. (2006) compared perceptions of male and female leaders as a function of whether group norms emphasized stereotypically masculine or feminine qualities. They found that gendered group norms relate to leaders’ group prototypicality contingent upon leader gender. Social identity and social influence theories (e.g., Ashforth and Mael 1989; Hogg and Abrams 1988; Tajfel and Turner 1986) similarly argue that groups are a critical source of social influence and information.

We build on this work by dovetailing propositions from role congruity (Eagly and Karau 2002) and social identity theories (Hogg 2001; van Knippenberg and Hogg 2003) to propose that when women lead male-dominated groups, they may be viewed as less prototypical leaders compared to male leaders and female leaders of more gender-balanced groups. When female leaders are more representative of their followers, as they would be in gender-balanced teams, equity in evaluations of male and female leaders should be restored. Given the extent to which many organizations and leadership roles continue to be male-dominated (Catalyst 2018; Eagly and Karau 2002; Swiss Federal Statistical Office 2014), leading to negative consequences for female leaders (see Heilman 2012), we propose that team gender composition could be a prime point of intervention. Specifically, team gender composition may reduce or even override the more general, societal leader stereotypes to improve follower responses to female leaders as the team gender composition shifts from male-dominated to more gender-balanced. In other words, male leaders are likely viewed as more prototypical than female leaders are in male-dominated groups, but this effect should dissipate in more gender-balanced groups.

Hypothesis 1

The male advantage in leader prototypicality is smaller in gender-balanced teams than in male-majority teams.

Group Prototypicality and Trust

Leadership scholars view the trust of subordinates as an essential component of effective leadership (e.g., Bennis and Nanus 1985; Braun et al. 2013; Dirks and Ferrin 2002; Zand 1997). For example, Conger and Kanungo (1998, p. 46) state that “…leading implies fostering changes in followers through the building of trust and credibility. In turn, trust enables and builds enduring commitment in the pursuit of a future goal.” Indeed, trust is a universal element of leadership theories (Dansereau et al. 2013, p. 800). Thus, trust in leadership is an important leadership outcome that is likely affected by leaders’ group prototypicality.

Social identity leadership scholars argue that prototypicality is an important driver of effective leadership and group performance because more group prototypical leaders are more trusted to pursue the group’s interests (e.g., Giessner and van Knippenberg 2008; Giessner et al. 2009; van Knippenberg and Hogg 2003). Indeed, leaders’ group prototypicality has consistent positive effects on a range of leadership evaluations, such as performance ratings and effectiveness (see van Knippenberg 2011). We build on this research by arguing that not only leaders’ group prototypicality but also their prototypicality as leaders is associated with leader trust and effectiveness.

Hypothesis 2

Team ratings of leader prototypicality are positively associated with trust in the leader.

By integrating role congruity (Eagly and Karau 2002) and social identity theories (Hogg 2001; van Knippenberg and Hogg 2003), we propose that team gender composition will interact with leader gender to affect team ratings of leader trust through team ratings of leader prototypicality. Teams consisting of more men are likely to see a male leader as more prototypical than a female leader; thus, the male leader should be viewed as more trustworthy. However, as teams become more gender-balanced, this male advantage should dissipate, thus increasing gender equity in team leadership evaluations. Formally,

Hypothesis 3

The male leadership advantage in leader trust via leader prototypicality is smaller in gender-balanced teams than in male-majority teams.

In Study 1, we test our first hypothesis regarding the effects of leader gender and team gender composition (i.e., leader group prototypicality) on team ratings of leader prototypicality. Then, in Study 2, we expand on this finding by testing the extent to which leader gender and team gender composition interact to relate to team perceptions of trust in the leader through team leader prototypicality ratings (Fig. 1).

Fig. 1

Complete theoretical model. Note Control variables are in gray; academic major was also a control variable in Study 2

Methods

Data for Studies 1 and 2 were collected in the same manner and in the same context, but one year apart (2014 and 2015, respectively). In Study 2, we replicate Study 1 and additionally extend our model conceptually to predict trust in the leader.

Sample and Procedure

We conducted two randomized field experiments among teams of business, economics, and informatics (IT) students at a large university in Western Europe. Followers were incoming first-year students participating in an orientation event that began on the first day of the semester and lasted for the duration of the semester. Our specific context was organizational socialization (Fang et al. 2011), in which our leaders served as “organizational insiders” guiding newcomers with institutionalized socialization tactics (i.e., “learning experiences as part of a cohort with clearly defined sequences and timed training and orientation activities”; Jones 1986, p. 131). Leaders were more experienced students enrolled in a leadership and group organization course. Leaders applied for this course and received academic credit upon completion. In each study, 35 leaders were selected using systematic criteria (e.g., previous academic performance) and trained for two days. Leaders were not trained in a specific leadership style; instead, they received a general theoretical overview of leadership intended to develop their skills and effectiveness during the event.

Data were collected via pencil-and-paper surveys distributed in person. Surveys were administered in the participants’ native language of German with items forward- and back-translated from English. Surveys were completed after team members spent approximately six hours with their teams and leaders during the orientation event. Leaders were responsible for designing activities for their groups, including study tips and strategies for academic success, as well as physical and social orientation to the university campus and preparation for an intergroup competition. Thus, leaders’ strategic goals included teams’ academic and social orientation to campus as well as encouraging followers’ creativity and performance for the competition. Leaders also organized subsequent events for their teams; this orientation event was the first, but not the last, event during which the leaders, followers, and teams would interact.

Measures

We used a multiple-wave and multiple-source approach to data collection. All perceptual measures had 6-point response scales (1 = does not apply at all to 6 = completely applies). Follower gender (male = 0, female = 1) was self-reported, and leader gender (male = 0, female = 1) was objective information entered by researchers. We manipulated leaders’ objective group prototypicality via team gender composition. Leader and follower ratings were both measured by surveys with additional filler items to further disguise our study’s purpose as well as standard orientation day evaluation items (e.g., satisfaction with the amount of information received and the organization of the event) for the dean’s office.

Leader Prototype

We assessed followers’ perceptions of their leader’s embodiment of a prototypical leader with three items (Cronshaw and Lord 1987): the leader is a typical leader, exhibits the behavior of a leader, and fits one’s image of a leader (α = .88 in Study 1; α = .89 in Study 2).

Group Prototypicality

We randomly determined team gender composition (i.e., leaders’ objective prototypicality of the group in terms of gender) as male-majority (20% women) or gender-balanced (50% women). Our manipulations were strategically chosen to mirror the current share of women in leadership roles (i.e., 20%; Swiss Federal Statistical Office 2014) and an approximately equal gender split (50%). Furthermore, team gender compositions of 20–50% are feasible in modern workplaces given that women have composed at least half of college-degree earners for several decades (Perry 2013). These team gender compositions also echo the skewed (20% women) and balanced (40–60% women) designations from critical mass theory (Kanter 1977).

Due to common issues with field experiments (e.g., new participants and no-shows on the day of the event), team gender compositions were not always exactly 20% and 50%. We therefore also asked leaders to report the exact number of men and women in their teams. Because these counts are more accurate and informative than the dichotomous condition variable, we used the leader-reported team gender composition in our empirical analyses; our results remain largely the same in size and significance when the dichotomous variable is used instead.

Trust in Leader

We assessed followers’ trust in their leaders in Study 2. We used the 6-item behavioral trust scale developed by Gillespie and Mann (2004; α = .89). Items assessed included the extent to which followers trust their leaders’ skills, judgment, and values as well as how willing they are to share their feelings and personal information with the leader (e.g., to what extent do you trust your leader in regards to…relying on his/her task-related skills and abilities? …sharing your personal beliefs?; α = .82).

Control Variables and Robustness Tests

According to relational demography theory (Tsui et al. 1992; Tsui and O’Reilly 1989), followers may prefer leaders of their own gender. Thus, we controlled for follower gender (0 = male, 1 = female). Theoretically, group identity may be more cohesive in smaller groups; practically, followers may interact more with their teammates and leaders in smaller groups. Post (2015) also found a female leadership advantage in larger teams. For these reasons, we also controlled for group size. The results remained unchanged when these two control variables were excluded from the analyses.

By design, we hold constant potentially meaningful facets of group diversity (e.g., age and tenure), as recommended by team diversity scholars (Jackson et al. 2003; van Knippenberg and Schippers 2007), to better isolate the effect of team gender diversity. Specifically, 97% of the participants were 18–25 years old, and all were new to the university. Three groups in Study 2 were not from business and economics, so we included academic major as a control variable.

To rule out alternative explanations for our findings, we also assessed leaders’ self-conceptions as leaders (Epitropaki et al. 2017) as well as leader–follower gender match (Tsui and O’Reilly 1989). Leaders’ self-reported prototypicality was measured with the same three items used to assess team ratings (Cronshaw and Lord 1987) but adapted for leaders’ self-ratings. Leaders completed these items twice: once before (α = .84 in Study 1; α = .72 in Study 2) and once after leading their teams (α = .89 in Study 1; α = .82 in Study 2).

Results

Results of Study 1

The analyses were conducted with Mplus (version 7.4). Continuous predictors were centered prior to the analyses, and unstandardized coefficients are reported. Given our relatively small sample sizes, we calculated one-tailed tests for all directional hypotheses.
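As a minimal sketch of the predictor coding described above, the snippet below converts leader-reported counts into a team share of women, grand-mean centers that continuous predictor, and forms the leader gender × composition product term. Leader gender stays coded 0/1 because only continuous predictors were centered. The dictionary keys, the example teams, and the choice to center across teams (rather than across participants) are illustrative assumptions, not the authors’ data structure or Mplus setup.

```python
from statistics import mean


def percent_women(n_women: int, n_men: int) -> float:
    """Share of women in a team, in percentage points."""
    return 100 * n_women / (n_women + n_men)


def build_predictors(teams):
    """Return per-team predictors for a leader gender x composition model.

    teams: list of dicts with hypothetical keys 'leader_female' (0 = male,
    1 = female), 'n_women', and 'n_men' (leader-reported counts).
    """
    shares = [percent_women(t["n_women"], t["n_men"]) for t in teams]
    grand_mean = mean(shares)  # centering across teams is an assumption
    rows = []
    for team, share in zip(teams, shares):
        centered = share - grand_mean
        rows.append({
            "leader_female": team["leader_female"],  # dichotomous, left uncentered
            "women_pct_c": centered,
            "female_x_women_pct": team["leader_female"] * centered,
        })
    return rows


example = [
    {"leader_female": 0, "n_women": 2, "n_men": 8},  # male-majority, 20% women
    {"leader_female": 1, "n_women": 5, "n_men": 5},  # gender-balanced, 50% women
]
for row in build_predictors(example):
    print(row)
```

Centering does not change the interaction coefficient, but it makes the leader gender coefficient interpretable as the gender difference at the average team composition.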

Descriptive Statistics

Descriptive statistics and correlations are displayed in Table 1. Of 512 followers, 12 were excluded because of missing data, and the 74 followers in 3 teams were excluded because randomization had been compromised in those teams, leaving a final sample of 426 participants (38.5% women). Team gender composition was randomly assigned as male-majority (20% women) or balanced (50% women). However, as is common in field experiments, the actual proportion of women varied slightly across groups (e.g., because of no-shows or newcomers who had not signed up for the event), and the team share of women ranged from 20 to 63.64% (M = 37.22, SD = 12.31).

Table 1 Study 1 descriptive statistics, correlations, and scale reliability

To ensure the validity of our manipulations, we conducted a series of manipulation checks. First, leader reports showed that balanced teams comprised a greater share of women (M = 46.81, SD = 8.00) than did male-majority teams (M = 26.87, SD = 6.28), Cohen’s d = 2.77. Second, we included a measure of followers’ own perceptions of their group gender composition, on which followers rated their team composition from 1 (all men) to 5 (all women). Analysis of this item indicated that followers noticed their group gender composition and rated it in line with our manipulation for balanced (M = 2.94, SD = 0.35) and male-majority teams (M = 2.23, SD = 0.54), t(425) = 16.01, p < .001, Cohen’s d = 1.56. These checks reflect very large effects, indicating that our manipulations were effective.
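The effect sizes in the manipulation checks can be recomputed from the reported means and standard deviations. The sketch below computes Cohen’s d as the mean difference divided by a pooled SD. Because group sizes are not reported here, it falls back on the equal-n pooled SD, the root mean square of the two SDs; that choice is an assumption, although it reproduces the reported values for both studies to two decimals.

```python
import math


def cohens_d(m1, sd1, m2, sd2, n1=None, n2=None):
    """Standardized mean difference (m1 - m2) / pooled SD.

    With group sizes, each variance is weighted by its degrees of freedom;
    without them, the pooled variance is the plain average of the two
    variances, which is exact only for equal group sizes.
    """
    if n1 is not None and n2 is not None:
        pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    else:
        pooled_var = (sd1 ** 2 + sd2 ** 2) / 2
    return (m1 - m2) / math.sqrt(pooled_var)


# Leader-reported share of women, balanced vs. male-majority teams (Study 1)
print(round(cohens_d(46.81, 8.00, 26.87, 6.28), 2))  # 2.77
# Follower-perceived composition on the 1-5 item (Study 1)
print(round(cohens_d(2.94, 0.35, 2.23, 0.54), 2))  # 1.56
# Leader-reported share of women (Study 2)
print(round(cohens_d(41.42, 11.18, 17.05, 10.27), 2))  # 2.27
```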

Preliminary Analyses

The data represent 426 participants nested within 32 teams. Our hypotheses propose that leader gender and team composition affect group-level perceptions of the leader. We therefore computed the intraclass correlation coefficient ICC(1), the proportion of total variance in the dependent variable that lies between groups (ICC(1) = .06). Although modest, this value suggests meaningful between-group variance in perceptions of leader prototypicality (Bryk and Raudenbush 1992). We also examined whether aggregation was justified by calculating r*wg(j) values, which indicated agreement above the typical .70 cutoff for leader prototypicality (average r*wg(j) = .89). Given these r*wg(j) values, our theoretical reasons (Bliese 1998), and an ICC(2) of .46, which also suggested reliable group means (Castro 2002; LeBreton and Senter 2008), we aggregated leader prototypicality. A comparison of the null model with a model allowing for random intercepts also indicated a significant difference, χ2(1, N = 426) = 5.86, p < .01, indicating significant intercept variation across groups.
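The aggregation statistics above can be sketched as follows, assuming the standard one-way random-effects ANOVA formulas for ICC(1) and ICC(2) and the usual r*wg(j) definition (one minus the ratio of the mean observed item variance to the variance expected under a uniform null on the 6-point scale). Using the mean team size as k is a simplification for unequal team sizes. As a consistency check, stepping up the reported ICC(1) of .06 with the Spearman-Brown formula at about 13.3 followers per team (426/32) gives an ICC(2) of about .46, matching the value reported for Study 1.

```python
from statistics import mean, variance


def icc_from_groups(groups):
    """ICC(1) and ICC(2) from a one-way random-effects ANOVA.

    groups: one list of individual ratings per team. The mean team size is
    used as k, which is exact only when all teams are the same size.
    """
    scores = [x for g in groups for x in g]
    n_total, n_groups = len(scores), len(groups)
    grand = mean(scores)
    k = n_total / n_groups
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


def icc2_from_icc1(icc1, k):
    """Spearman-Brown step-up: reliability of a team mean based on k raters."""
    return k * icc1 / (1 + (k - 1) * icc1)


def r_star_wg_j(ratings, n_options=6):
    """r*wg(j) for one team's multi-item ratings.

    ratings: one row per rater, one column per item. The no-agreement
    variance is that of a uniform distribution over n_options categories.
    """
    n_items = len(ratings[0])
    item_vars = [variance([row[j] for row in ratings]) for j in range(n_items)]
    null_var = (n_options ** 2 - 1) / 12
    return 1 - mean(item_vars) / null_var


# Study 1: ICC(1) = .06 with 426 followers in 32 teams
print(round(icc2_from_icc1(0.06, 426 / 32), 2))  # 0.46
```

The reported average r*wg(j) corresponds to averaging `r_star_wg_j` over all teams.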

Hypothesis Testing

The linear mixed-effects model indicates no significant main effect of leader gender on leader prototypicality (b = −0.12, p = .19; see Table 2). There is also no main effect of team gender composition on leader prototypicality (b = 0.01, p = .39). However, as expected, these null main effects are qualified by a significant interaction between leader gender and team gender composition (b = 0.01, p = .03). As depicted in Fig. 2, when there are more men in the team, female leaders are rated as less prototypical leaders than male leaders. However, this effect is eliminated when there are more women in the team. This pattern of results supports Hypothesis 1.

Table 2 Study 1 linear mixed-effects models
Fig. 2

Interaction between leader gender and group gender composition (Study 1). Note The interaction is plotted at ± 1 SD from the mean or 22 and 52% women in the team (respectively)
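The lines in Fig. 2 follow from the interaction model. With leader gender coded 1 for female and team gender composition grand-mean centered, the model-implied female-minus-male gap at composition c is b_gender + b_interaction × c. The sketch below evaluates this gap using the rounded Study 1 estimates and about ±15 percentage points (the Fig. 2 note places ±1 SD at 22% and 52% women). It assumes that composition enters the model in percentage points. These are point estimates only, not simple-slope significance tests, and they are approximate because the coefficients are rounded.

```python
def female_minus_male_gap(b_gender, b_interaction, women_pct_c):
    """Model-implied female-minus-male difference in leader prototypicality.

    Assumes y = b0 + b_gender*female + b_comp*comp + b_interaction*female*comp
    + controls, with female coded 1 and comp = centered % women.
    """
    return b_gender + b_interaction * women_pct_c


B_GENDER, B_INTERACTION = -0.12, 0.01  # rounded Study 1 estimates
for label, c in (("-1 SD (about 22% women)", -15.0),
                 ("+1 SD (about 52% women)", 15.0)):
    print(label, round(female_minus_male_gap(B_GENDER, B_INTERACTION, c), 2))
```

The gap is about −0.27 scale points in male-majority teams and shrinks to roughly zero (0.03) in more balanced teams, which is the pattern shown in the figure.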

Results of Study 2

Descriptive Statistics

Descriptive statistics and correlations are displayed in Table 3. Of 467 followers, 33 were excluded because of missing data, yielding a final sample of 434 participants (31.33% women). The team share of women ranged from 0 to 63%, with an average of 31.37% (SD = 16.15). As intended by our manipulation, leaders of balanced teams reported more women on their teams (M = 41.42, SD = 11.18) than did leaders of male-majority teams (M = 17.05, SD = 10.27), Cohen’s d = 2.27.

Table 3 Study 2 descriptive statistics, correlations, and scale reliability

We conducted a series of t-tests to examine potential differences in sample characteristics between Studies 1 and 2. The only significant difference was in follower gender: 38% of followers were women in Study 1, whereas only 31% were women in Study 2 (p < .05). This difference arose mainly because male followers were overrepresented in the three teams that were excluded from Study 1 due to compromised randomization. Note that we controlled for follower gender in all analyses.

Preliminary Analyses

The data represent 434 participants nested within 35 teams. ICC(1) values indicated the proportion of total variance in leader prototypicality and trust in the leader that lies between groups (ICC(1) = .11 and .05, respectively). These values suggest meaningful between-group variance in perceptions of leader prototypicality and trust in the leader (Bryk and Raudenbush 1992). The r*wg(j) values indicated agreement above the typical .70 cutoff for leader prototypicality and trust (average r*wg(j) = .90 and .94, respectively). The ICC(2) values for leader prototypicality (.62) and trust in the leader (.40) also suggested reliable group means, supporting aggregation (Bliese 1998; Castro 2002; LeBreton and Senter 2008). Comparisons of the null model with a model allowing for random intercepts indicated a significant difference for leader prototypicality [χ2(1, N = 434) = 23.11, p < .001] as well as for trust in the leader [χ2(1, N = 434) = 6.64, p < .01].
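The random-intercept comparisons are likelihood-ratio tests on 1 degree of freedom. The sketch below converts such a statistic into a p-value using the closed-form chi-square(1) survival function, erfc(√(x/2)). Because the null model fixes the intercept variance at zero, which is the boundary of its parameter space, a common correction halves the naive p-value. Which reference distribution was used here is not stated, so the sketch reports both. This choice matters for statistics near conventional cutoffs, such as 5.86 relative to .01.

```python
import math


def lrt_pvalue_1df(chi2_stat: float, boundary: bool = False) -> float:
    """p-value for a likelihood-ratio statistic with 1 degree of freedom.

    boundary=True halves the naive p-value, i.e., uses a 50:50 mixture of
    chi-square(0) and chi-square(1) as the reference distribution.
    """
    p = math.erfc(math.sqrt(chi2_stat / 2))  # chi-square(1) survival function
    return p / 2 if boundary else p


for label, stat in (("Study 1 prototypicality", 5.86),
                    ("Study 2 prototypicality", 23.11),
                    ("Study 2 trust", 6.64)):
    print(f"{label}: naive p = {lrt_pvalue_1df(stat):.4f}, "
          f"boundary-corrected p = {lrt_pvalue_1df(stat, boundary=True):.4f}")
```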

Hypothesis Testing

As expected, we replicated a significant interaction effect of leader gender and team gender composition on leader prototypicality (b = 0.02, p = .003; see Table 4). As depicted in Fig. 3, when there are more men in the team, female leaders are rated as less prototypical leaders than male leaders. However, this effect is eliminated with more women in the team. This pattern of effects replicates and further supports Hypothesis 1.

Table 4 Study 2 linear mixed-effects models
Fig. 3

Interaction between leader gender and group gender composition (Study 2). Note The interaction is plotted at ± 1 SD from the mean or 15 and 47% women in the team (respectively)

As an extension of Study 1, we also tested the relation between leader prototypicality and trust in the leader in Study 2. As expected, leader prototypicality was positively associated with leader trust (b = 0.41, p < .05), supporting Hypothesis 2. To test our proposed mediated moderation effect, we followed the recommendations of Preacher and Hayes (2008) and estimated the indirect effects, conditional on the moderator, with 95% confidence intervals (CIs). We calculated the conditional indirect effects of leader gender on leader trust via leader prototypicality at three values of the moderator, team gender composition (the mean and ± 1 SD). The results show that female leaders are viewed as less trustworthy (via leader prototypicality) in male-dominated teams (b = −0.14, 95% CI = [−.26, −.03], at −1 SD of team gender composition), but this effect becomes nonsignificant with more women in the team (b = −0.05, 95% CI = [−.11, .02] and b = 0.05, 95% CI = [−.02, .12], at the mean and +1 SD of team gender composition, respectively). These results support Hypothesis 3 and our overall model (Fig. 1).
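A conditional indirect effect of this kind is the product of the moderated first-stage path, a1 + a3·w, and the second-stage path b. Bootstrap procedures such as those recommended by Preacher and Hayes (2008) resample the raw data, which are not available here. The sketch below therefore swaps in a different technique, the Monte Carlo method for indirect-effect CIs, which needs only coefficient estimates and their sampling variances. It draws (a1, a3) from a bivariate normal distribution and b independently, then takes percentiles of the product. Only b = 0.41 and the Study 2 SD of 16.15 come from the text. The a-path estimates, all standard errors, and the covariance are hypothetical placeholders, so the printed intervals are illustrative rather than a reproduction of the reported CIs.

```python
import random


def conditional_indirect_effect(a1, a3, b, w):
    """Indirect effect of X on Y through M at moderator value w: (a1 + a3*w) * b."""
    return (a1 + a3 * w) * b


def monte_carlo_ci(a1, a3, b, w, se_a1, se_a3, cov_a1_a3, se_b,
                   draws=20000, level=0.95, seed=1):
    """Percentile CI for the conditional indirect effect by Monte Carlo simulation.

    (a1, a3) come from one regression and are drawn from a bivariate normal
    with the given covariance; b comes from a separate regression and is
    drawn independently.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 covariance matrix of (a1, a3)
    l11 = se_a1
    l21 = cov_a1_a3 / se_a1
    l22 = (se_a3 ** 2 - l21 ** 2) ** 0.5
    sims = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a1_draw = a1 + l11 * z1
        a3_draw = a3 + l21 * z1 + l22 * z2
        b_draw = rng.gauss(b, se_b)
        sims.append(conditional_indirect_effect(a1_draw, a3_draw, b_draw, w))
    sims.sort()
    lower = sims[int((1 - level) / 2 * draws)]
    upper = sims[int((1 + level) / 2 * draws) - 1]
    return lower, upper


# Hypothetical a-path inputs; b = 0.41 is the reported estimate.
A1, A3, B = -0.30, 0.02, 0.41
for w in (-16.15, 0.0, 16.15):  # centered % women at -1 SD, mean, +1 SD
    est = conditional_indirect_effect(A1, A3, B, w)
    lo, hi = monte_carlo_ci(A1, A3, B, w, se_a1=0.10, se_a3=0.006,
                            cov_a1_a3=0.0, se_b=0.15)
    print(f"w = {w:+.2f}: effect = {est:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero at one moderator value but includes it at others is the pattern used above to support Hypothesis 3.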

Supplementary Analyses

Statistical Power

The number of groups tested in Studies 1 and 2 could be considered small (N = 32 and N = 35), which could raise concerns about statistical power. However, insufficient power is more of a concern when effects are not found. Thus, insufficient statistical power is less of a threat to our key findings of interest, for which we found repeated empirical support. Insufficient statistical power is also one reason why we did not conduct simple slopes analyses. Simple slopes tests have been shown to be unreliable when sample sizes are small (e.g., Liu et al. 2017). Methods experts (e.g., Dawson 2014) also argue that simple slopes tests are not always necessary.

In the following, we describe several robustness checks that were conducted to rule out alternative explanations for our findings. To optimize statistical power, we combined data sets from Studies 1 and 2 for the following analyses.

Leader Ratings

Although there is ample evidence to support our prediction that team responses to leaders change according to leader gender and team gender composition, it is unclear whether only followers’ perceptions of leaders change or whether leaders’ conceptions of themselves also change. If leaders’ self-perceived similarity to the leader prototype also changes depending on the teams they lead, their group-directed actions may be altered as well. This notion builds on the social identity theory of leadership (Hogg 2001; van Knippenberg and Hogg 2003), according to which leaders’ self-perceived prototypicality and degree of team identification predict their team-oriented attitudes and behaviors. However, most research in this area has examined followers’ perceptions of leaders’ group prototypicality (see van Knippenberg 2011) rather than leaders’ self-perceived prototypicality as leaders. Indeed, a recent review by Epitropaki et al. (2017) has similarly highlighted the sparse work on leadership identity.

Given this paucity of research, we also include an exploratory assessment of leaders’ self-reported leader prototypicality. We assessed the relationship between leader gender and leader prototypicality at two points in time: before and after leaders worked with their teams. Only in the latter instance could leaders have possibly been influenced by their objective group prototypicality because this information was unknown to them before they led their teams.

Across the two studies, about half of the leaders were women (51.43%) and about half of the groups were gender-balanced (54.29%). Of 70 leaders, 65 (92.9%) returned completed surveys at both time points. These leaders were roughly evenly distributed across leader gender (n = 35, or 53.8% women) and team gender conditions (n = 33, or 50.8% gender-balanced teams).

We ran a series of mixed analyses of variance (ANOVAs) with time (Time 1, Time 2) as a within-subject factor and leader gender (male or female) and team gender composition (male majority or gender balanced) as between-subject factors. The results indicate no significant main effects or interactions apart from an overall increase in self-rated leader prototypicality from Time 1 (M = 4.23, SD = 0.81) to Time 2 (M = 4.48, SD = 0.88; F(1, 61) = 7.33, p = .009, ηp² = 0.11); all other ps = .36–.70. Thus, leaders’ self-rated prototypicality does not appear to be affected by our intervention of group gender composition.
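
The reported statistics can be checked by hand. Assuming the two between-subject factors were fully crossed (four cells) and each of the 65 leaders contributed one rating at each time point, the error degrees of freedom for the time effect are (65 − 4) × (2 − 1) = 61, matching F(1, 61), and partial eta squared follows from F as F·df_effect / (F·df_effect + df_error). The helper names below are illustrative, not taken from any statistics package.

```python
def mixed_anova_error_df(n_subjects: int, n_between_cells: int, n_within_levels: int) -> int:
    """Error df for a within-subject effect in a mixed design: (N - g) * (k - 1)."""
    return (n_subjects - n_between_cells) * (n_within_levels - 1)


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# 65 leaders; 2 x 2 between-subject cells (leader gender x team gender composition); 2 time points.
df_error = mixed_anova_error_df(n_subjects=65, n_between_cells=4, n_within_levels=2)
print(df_error)  # 61
print(round(partial_eta_squared(7.33, 1, df_error), 2))  # 0.11
```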

Leader–Follower Gender Match

Drawing on relational demography theory and findings (e.g., Tsui and O’Reilly 1989), followers’ ratings of leaders in more gender-balanced teams may reflect the gender composition of leader–follower dyads rather than team-based perceptions of prototypicality. That is, with more women in the team, there are more gender-matched pairs for female leaders, which could provide an alternative explanation for our findings.

To test the potential effect of dyadic similarity, we created a new variable of gender match (1) or mismatch (0) between followers and leaders. We then included this variable in our previous model predicting leader prototypicality. This new variable does not explain additional variance in our model, nor does it change our overall patterns of results. Thus, our results do not seem to be explained by female (male) followers’ higher ratings of leaders of the same gender.
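
The dyadic control described above amounts to a follower-level dummy variable. A minimal sketch, assuming a hypothetical record layout with one row per follower rating (the class and function names are illustrative), shows both the coding and why the control matters: for a female leader, the share of matched dyads equals the team's share of women, so dyadic match and team gender composition move together. The model into which we entered this variable is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dyad:
    """One follower's rating of their team leader (hypothetical record layout)."""
    follower_gender: str  # "female" or "male"
    leader_gender: str    # "female" or "male"


def gender_match(dyad: Dyad) -> int:
    """Dummy code: 1 if follower and leader share a gender (match), 0 otherwise (mismatch)."""
    return int(dyad.follower_gender == dyad.leader_gender)


def matched_share(dyads: List[Dyad]) -> float:
    """Proportion of gender-matched follower-leader dyads in one team."""
    if not dyads:
        raise ValueError("a team needs at least one follower")
    return sum(gender_match(d) for d in dyads) / len(dyads)


def make_team(n_followers: int, n_women: int, leader_gender: str) -> List[Dyad]:
    """Build a hypothetical team of follower-leader dyads with n_women female followers."""
    return [
        Dyad("female" if i < n_women else "male", leader_gender)
        for i in range(n_followers)
    ]


# Hypothetical 10-follower teams: the matched share for a female leader rises with the
# number of women, while the matched share for a male leader falls.
for n_women in (2, 5):
    female_led = matched_share(make_team(10, n_women, "female"))
    male_led = matched_share(make_team(10, n_women, "male"))
    print(n_women, female_led, male_led)
```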

Discussion

Our findings provide evidence that the male advantage in leadership ratings (i.e., prototypicality and trustworthiness) can be mitigated in gender-balanced teams. In doing so, we provide causal support for the social identity theory of organizational leadership (Hogg 2001; van Knippenberg and Hogg 2003) and identify a boundary condition of role congruity theory (Eagly and Karau 2002). We also provide evidence against two alternative explanations: changes in leaders’ self-perceived prototypicality depending on the teams that they lead, and dyadic gender match (i.e., female followers rating female leaders more positively). In the following, we outline the implications of our findings for theory, practice, business ethics, and society.

Theoretical Implications

We aimed to bridge classic work on gender and leadership (Eagly and Karau 2002) with leadership and group prototypes research (Hogg 2001; van Knippenberg and Hogg 2003) to make several core contributions and outline specific areas for future research. First, the proposition that team contextual features and group prototypes can counteract broader societal biases regarding women’s incongruity as leaders has been theorized (Eagly and Karau 2002; Hogg 2001; van Knippenberg and Hogg 2003; van Knippenberg 2011). However, to our knowledge, this proposition has not yet been tested. We provide causal evidence for this idea by showing that gender differences in perceptions of leadership prototypicality dissipate with increasing gender diversity in teams.

However, it is important to note that the teams in our study never exceeded 63% women. It is possible that there may be a critical point at which team gender composition no longer helps and might even be detrimental to female leaders. For example, social psychology research has found a stigma-by-association effect for female leaders who lead majority female teams (Pryor et al. 2012). Field experimental evidence also supports this proposition such that stigma toward individual team members, as well as teams as a whole, increased with the proportion of women on the team (West et al. 2012). Thus, future research should seek to delineate the boundary conditions of the positive effects of team gender composition for teams and female leaders, extend this research from intragroup to intergroup perceptions, and examine other organizational outcomes also influenced by team gender composition (e.g., team performance; Hoogendoorn et al. 2013).

Second, we demonstrated a new type of prototypicality benchmarking. To date, social identity leadership researchers have mostly manipulated leaders’ perceived group prototypicality via fabricated feedback about leaders’ values or beliefs (e.g., Giessner et al. 2013, Study 1; Hais et al. 1997; Monzani et al. 2014; van Knippenberg and van Knippenberg 2005). This was often done for individual participants who were ostensibly in groups or anticipated group interaction (e.g., Hais et al. 1997; Hogg et al. 1998, 2006; Monzani et al. 2014), organized in virtual teams, or with virtual leaders (e.g., Giessner et al. 2013, Study 1; van Knippenberg and van Knippenberg 2005). That such manipulations produce effects speaks to the power of group prototypes. However, we found converging effects after manipulating leaders’ objective group prototypicality (in terms of gender); our followers were also nested in actual groups, and they interacted with real team members and leaders for several hours. Because leadership is inherently a social process (Chemers 2001), it is perhaps most appropriately or accurately studied through social interactions. However, a key distinction between this past work and the current research is that we did not measure leaders’ perceived group prototypicality (i.e., the extent to which the leader represents what is characteristic about the team), as is common in social identity studies of leadership (see van Knippenberg 2011). Thus, it is unclear whether and how our manipulation would have influenced team perceptions of leaders in this regard.

Third, beyond our implications for female leaders, our findings stimulate avenues of inquiry about male leaders. Interventions such as ours should not affect team ratings of men, because male and female leaders should be similarly prototypical in more gender-balanced groups (Hogg 2001; van Knippenberg and Hogg 2003). Our results seem to support this proposition in Study 1 (Fig. 2), but not in Study 2 (Fig. 3). This discrepancy may be data driven, given the different ranges of team gender composition represented in our figures (i.e., lower boundaries of 15% and 22% women). That is, a recognizable number of women in the team may be required to influence teams’ ratings of their female leaders (i.e., a cross-level effect), and this value could fall between 15% and 22%. This idea echoes themes from tokenism theory, according to which a “critical mass” of more than 20% women (Kanter 1977) or a “magic number” of three women (Joecks et al. 2013; Konrad et al. 2008; Torchia et al. 2011) is required to influence teams (i.e., within-level effects) or organizational outcomes. However, this is only one possible post hoc explanation.
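
The two critical-mass criteria mentioned above, a share of more than 20% women and an absolute count of three women, imply different thresholds depending on team size. A minimal arithmetic sketch, with hypothetical team sizes and treating "more than 20%" as a strict inequality (the function names are illustrative):

```python
from typing import Dict


def min_women_to_exceed(team_size: int, pct: int) -> int:
    """Smallest number of women k such that k / team_size is strictly above pct percent.

    Integer arithmetic avoids floating-point rounding at exact boundaries
    (e.g., 2 of 10 is exactly 20%, which does not exceed 20%).
    """
    if team_size < 1 or not 0 <= pct < 100:
        raise ValueError("team_size must be >= 1 and pct must be in [0, 100)")
    return (pct * team_size) // 100 + 1


def critical_mass_thresholds(team_size: int, pct: int = 20, magic_number: int = 3) -> Dict[str, int]:
    """Women needed under a share-based rule versus a fixed-count rule."""
    return {
        "share_rule": min_women_to_exceed(team_size, pct),
        "count_rule": magic_number,
    }


# Hypothetical team sizes for illustration.
for size in (5, 10, 20):
    print(size, critical_mass_thresholds(size))
```

In small teams the share rule is met with fewer than three women, whereas in larger teams it requires more than three, so the two criteria coincide only for some team sizes.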

Fourth and finally, our findings also have implications for the diversity literature. Scholars have argued that actual diversity is a key facet of diversity climate, because more diversity means a more favorable climate (see Dwertmann et al. 2016). Our findings illustrate why and how this holds true for gender and leadership. Meta-analytic evidence supports a similar idea such that women have fewer leadership disadvantages in settings with more balanced organizational gender demography (Eagly et al. 1995; Paustian-Underdahl et al. 2014). Thus, beyond the ethics and social justice arguments for equal career opportunities (e.g., Dwertmann et al. 2016; Eagly and Carli 2007), more diverse teams and organizations may also have a self-reinforcing effect by creating a more level playing field for female leaders.

Practical Implications

Our findings also offer implications for practice, for example, in guiding team formation and leader assignments. Teams are becoming more gender diverse as more women enter traditionally male-dominated fields (Bureau of Labor Statistics 2018). However, despite their representation at lower levels, women remain a glaring minority in leadership positions (e.g., only 5.2% of CEOs are women; Catalyst 2018). According to our findings, such demographic changes at lower levels may also benefit female leaders in ways that have been overlooked to date, but only if teams are designed with gender in mind.

Furthermore, organizations invest extraordinary sums in leadership training programs despite a lack of evidence of training transfer (Baldwin and Ford 1988; Burke and Day 1986; Burke and Hutchins 2007). Individual interventions that train women to display more masculine or agentic behaviors also risk backlash (Brescoll 2011; Brescoll and Uhlmann 2008; Rudman 1998; Rudman et al. 2012), whereas organizational interventions may incidentally reinforce gender roles and biases (e.g., see Gloor et al. 2018; Heilman et al. 1997). Thus, we encourage organizations to consider team-based interventions to restore gender equity in leadership evaluations.

In the case that teams are already established or must be constructed based on non-gender-based criteria (e.g., employee education or expertise), practitioners can also use our findings to inform their interpretations of leader evaluations. For example, a woman in a male-majority team may rate a female supervisor as negatively as her male teammates do. This effect would be unexpected according to relational demography perspectives (Tsui et al. 1992; Tsui and O’Reilly 1989) and could be misinterpreted as evidence of problematic female same-sex interactions in organizations (see Sheppard and Aquino 2017), damaging the case for increasing women in the workforce. Thus, the potential influence of the team gender context on evaluations, such as performance reviews or 360-degree feedback, should not be overlooked; it can easily be assessed by including a single item about team gender.

These practical measures may also help to resolve three particular moral issues relevant to employees, organizations, and society. First, there is a fundamental moral case for equity in leadership appraisals if male and female leaders perform the same behavior but receive different reactions to or ratings of their performance. These ratings are then used to designate penalties and rewards and thus perpetuate the gender gap in leadership. Such a situation would be unfair for competent female leaders and overly reassuring for less competent male leaders. Given general moral preferences for fair performance appraisals (Dusterhoff et al. 2014), the consequences of such inequities could also resonate throughout teams and organizations.

Second, growing evidence indicates that women may be more moral, ethical, other-oriented, and socially compassionate leaders than men (see Eagly 2005). Empirical evidence suggests that a larger number of women serving on boards of directors is associated with more ethical firm behavior (Nekhili and Gatfaoui 2013). A recent meta-analysis has also found that female board representation is positively associated with board monitoring and firm profitability (Post and Byron 2015). Thus, female leaders may lead in a more moral manner than men, bringing a more ethically oriented style to their teams and organizations without necessarily sacrificing the bottom line.

Finally, there are ethical implications pertaining to an inefficient use of the labor force. Students invest significant time and effort in their studies, whereas society invests substantial funds in their training and education (OECD 2017). Thus, attracting and retaining trained female talent by recruiting more female leaders and ensuring equitable ratings and rewards for existing female leaders also makes sense as a societal priority. Such an initiative also has implications for workforce sustainability in the long term, particularly where increasingly common immigration restrictions necessitate a more efficient use of locally trained talent (including women), especially for highly specialized workers (Dutu 2014).

Strengths, Limitations, and Future Research

Our study is methodologically rigorous, including a replication of large effects and a conceptual extension. King et al. (2013) endorse field experiments such as ours as a gold standard for external validity, especially when examining sensitive topics such as gender bias. We also reduce the threat of common method variance (Podsakoff et al. 2012) by using data collected from different sources (e.g., followers and leaders), including objective data (e.g., team share of women and leader gender), and collecting measures at different times (e.g., before and after the orientation event). Moreover, interaction effects cannot be artifacts of common method variance (Siemsen et al. 2010). Because we manipulated team gender composition by randomly assigning leaders and followers to teams, we can draw causal conclusions from our findings. That is, leader gender predicts team ratings of leader prototypicality and trust depending on the team gender composition. We also provide evidence against several other alternative explanations (e.g., that leaders’ self-conceptions as leaders or increasing shares of female follower–leader dyads drive this effect).

However, as with any study, our research has its limitations. For example, our student sample could limit the generalizability of our results. However, aligned with common conceptions of leadership, our student leaders guided their teams toward shared strategic goals (e.g., academic and social orientation as well as creativity and performance for the intergroup competition). Although our student leaders had no evaluative or disciplinary influence on their followers, they could dismiss individuals from the event. Thus, despite our explicit references to our more senior students as “leaders” in our measures and event organization, such an arrangement may be more representative of modern, flatter hierarchies (e.g., project managers or peer leadership) than of more traditional conceptions of leadership; the former are becoming increasingly common in today’s more interdependent organizations (Rajan and Wulf 2006; Wegman et al. 2016). We recommend that future research replicate and extend our findings by testing these hypotheses within organizational teams to better understand the extent to which team gender composition and leader gender relate to leader evaluations in ongoing work teams. However, because leaders and followers in existing organizational teams are not randomly assigned, this type of design could be threatened by endogeneity.

Future research should also examine the effects of interventions such as ours on team ratings of male leaders. Although men should be rated similarly to women in more gender-balanced groups because they are equally prototypical of their groups (Hogg 2001; van Knippenberg and Hogg 2003), our results support this idea only in Study 1. As previously discussed, this could be due to the specific ranges of team gender compositions depicted in our graphs, which differed across studies. However, future research with a more continuous range of team gender compositions is needed to better assess this possibility.

We also encourage studies in other countries to examine whether cultural preferences for gender egalitarianism may influence our findings. The Western European country where our studies occurred has lower ratings for gender egalitarianism in comparison with Eastern or Nordic European countries (House et al. 2004). Similarly, the disciplines studied, management and economics, are more masculine both in stereotypes and in actual demographic composition. Thus, future research could also examine team responses to male and female leaders in traditionally female disciplines or occupations such as teaching or nursing. However, such contexts contribute less to the larger patterns of social and economic inequality (Budig 2002) than the traditionally male contexts that we examined.

Finally, we created teams with low or balanced shares of women. Although this allowed us to maintain generalizability to typical work groups, we were unable to draw conclusions about groups that were all male or all female. Field studies in actual organizations could help address this concern because many organizations have teams comprising a variety of gender compositions. However, this option would not solve the potential endogeneity problem.

Conclusions

Our results highlight the benefits of increasingly diverse employees for female leaders and organizations, but only if teams are designed with leaders in mind. Indeed, if the leadership game is rigged in men’s favor, women may face a double bind of backlash regardless of their ability or performance, which poses a continuing ethical dilemma for organizations and society. However, according to our reasoning and results, there is hope for restoring gender equity and equality in leadership if we fix the game, not the dame.