Introduction

Research on women in science has received substantial attention since the 1980s, motivated by concerns about inequality and occupational segregation. This work has provided valuable insights into many aspects of women scientists’ career development (e.g., Xie et al. 2003). In 1980, the Congress of the United States passed the Women in Science and Technology Equal Opportunity Act (Handelsman et al. 2005). Bentley and Adamson (2003) prepared a report for the National Science Foundation that reviews the literature on gender differences in the careers of academics. The authors summarize their results, highlighting evidence that women in academic careers are disadvantaged compared to men: they earn less, publish less frequently, and are less likely to be promoted to senior positions. A 1990 parliamentary government report, Women at the Top, also stressed that “it is wholly inexcusable that centers of modern academic teaching and excellence … should remain bastions of male power and privilege” (cited in Osborn 1994). Such differences violate the Mertonian norm of universalism in science, which holds that scientists should be evaluated only on the quality of their work (Andersen 2001; Bornmann et al. 2007; Lincoln et al. 2012).

Gender inequality, imbalances, and biases in science have remained a widely discussed and explored area in recent years (see, e.g., Holman et al. 2018; Gay-Antaki and Liverman 2018; Bendels et al. 2018a, b; Caplar et al. 2017; van den Besselaar and Sandström 2016, 2017; Lerback and Hanson 2017; Handley et al. 2015; Sheltzer and Smith 2014; West et al. 2013; Larivière et al. 2013; Bedi et al. 2012; Conley and Stadmark 2012; Ceci and Williams 2011), as barriers to women in science are still widespread (Larivière et al. 2013). The continued male domination of academia (King et al. 2017) is, for example, illustrated by the disparity in publication records: women account for less than 30% of fractionalized authorships or research pools globally, while men have almost twice as many first-authored articles (Larivière et al. 2013; Nature Cell Biology Editorial 2018).

Despite the growing share of women in all positions, the higher one moves up the ranking ladder, the lower the number of women (van Arensbergen et al. 2012). Concerns regarding subconscious gender biases in hiring and promotion processes have been expressed (Williams and Ceci 2015; Reuben et al. 2014; Treviño et al. 2018). For example, Sheltzer and Smith (2014) find evidence that elite male faculty (who have been awarded major grants or awards) from biology laboratories at leading academic institutions in the US are less likely to give junior positions (graduate students and postdoctoral researchers) to women, compared to other male faculty or elite female faculty. As observed in tenure track systems, such biases are particularly likely when these processes depend on the decision of a direct superior rather than a committee (Bakker and Jacobs 2016). However, a study looking at the introduction of the tenure track system at Wageningen University in The Netherlands indicates that promotion rates for men and women were already fairly equal before the tenure track system (which led to a further relative increase in female promotions) was introduced in 2010 (Bakker and Jacobs 2016). A recent PNAS study reports that women also remain underrepresented among research grantees, comprising less than one-third of National Institutes of Health (NIH) grant recipients, although their funding longevity after receiving a first major NIH grant is similar to that of men (Hechtman et al. 2018). On the other hand, Marsch et al.’s (2009) carefully executed comprehensive meta-analysis of peer reviews of grant applications uses a multilevel approach and indicates that grant applications in higher education are not subject to gender differences. Some differences were observed among peer reviews of fellowship applications, but the effects were small. Boyle et al. (2015) found that UK social-science funding did not show gender biases. Controlling for academic position, few gender differences were found with regard to application rate, success rate, and grant size.

Academic research agencies regularly provide detailed statistical information on the progress of women in science (see, e.g., the Committee on Women in Science and Engineering, the Committee on the Status of Women in the Economics Profession, the Committee on Women in Psychology, the Committee on the Status of Women in Sociology, the National Science Foundation (e.g., the Women, Minorities, and Persons with Disabilities in Science and Engineering initiative), or the European Commission with the She Figures 2018 report). Certain research fields have also developed their own initiatives to promote the careers and monitor the progress of women, including the Committee on the Status of Women in the Economics Profession (a standing committee of the American Economic Association), which since 1971 has actively provided mentoring workshops and activities for female junior scholars to assist in overcoming the tenure hurdle (Blau et al. 2010).Footnote 1 For example, reports for the economics profession indicate that while the number of female assistant professors and associate professors tripled between 1974 and 1989, the proportion of female full professors only grew from around two percent in the late 1970s to three percent in the late 1980s (Torgler and Piatti 2013). Likewise, the American Psychological Association Committee on Women in Psychology works to ensure that women achieve equality within the psychological science community,Footnote 2 while the Committee on the Status of Women in Sociology addresses educational, workplace, research, and visibility issues.Footnote 3 In physics, the Step Up For Women program provides resources for high school physics teachers, aiming to reduce barriers and inspire young women to major in physicsFootnote 4 by showcasing those who have been successful (Skibba 2019). 
Meanwhile, the National Science Foundation’s ADVANCE program is seeking to develop systematic approaches that increase the participation and advancement of women in academic STEM careers.Footnote 5 Other initiatives include the Rita Levi-Montalcini Foundation and the Christiane Nüsslein-Volhard Foundation, both launched by female Nobel laureates. The University of Michigan’s ADVANCE program developed an interactive theatre program that engages participants with typical academic situations. This program helps address interpersonal behaviours that negatively affect the campus climate and atmosphere. The ADVANCE program at the University of Wisconsin-Madison designed workshops to sensitize search committees to potential biases (Handelsman et al. 2005). Other efforts to facilitate and increase female engagement in research include the L’Oréal-UNESCO for Women in Science program and AcademiaNet (Nature Editorial 2013). Equality is one of two global priorities announced by UNESCO, which in May 2019 hosted a roundtableFootnote 6 to discuss the drivers of the science gender gap. UNESCO’s recent reportFootnote 7 from its GenderInSITE initiativeFootnote 8 on gender in science, innovation, technology, and engineering also addresses important questions regarding where the primary changes are occurring, which forces are driving development, what progressive policies and practices have emerged, and how these policies and practices are shaping scientists’ leadership pathways. The goal is that the resulting information should help reduce male dominance at all levels of scholarly careers (Nature Editorial 2013).

Relatively few scholars are responsible for a relatively large share of scientific output, and these top scholars serve as role models for others. Experts emphasize the importance of increasing the number of female role models as a key factor in reducing this gender gap (Shen 2013; Morgan et al. 2013), especially as—compared with men—women obtain more inspiration and aspiration from outstanding female role models than male role models (Lockwood 2006). A recent large-scale field experiment covering 98 high school classrooms in the Paris region demonstrated that a one-hour exposure to a female role model led to a 10% increase in the share of 12th-grade girls enrolled in a STEM undergraduate program (Breda et al. 2020). In an attempt to assess the efficacy of such initiatives, a 2004 randomized control trial examined the usefulness of the American Economic Association’s CeMENT mentoring program (co-supported by the National Science Foundation), which exposes female participants to senior female role models. This experiment randomly allocated around 80 applicants into treatment and control groups based on (non)acceptance into the workshop. A post-treatment analysis five years after the workshop indicated that participants were 27 percentage points more likely to have secured an NSF or NIH grant and 23 percentage points more likely to have 2.4 more publications overall, with at least one being in a top-tier publication (Blau et al. 2010). Meta-studies on mentoring in general also tend to demonstrate its positive benefits and, although associated effects are small (Allen et al. 2004; DuBois et al. 2002), they reveal larger effects for academic mentoring (Eby et al. 2008). Others have emphasized that choices or motivation can lead to status differences and such differences are subject to constraints and sociocultural influences (Ceci et al. 2009). 
Therefore, studying the top scientists can provide guidance and insights into success cases, especially if differences between fields are observed. Women’s life choices, such as whether and when to have children, are powerful predictors of career success (Ceci et al. 2009), but women in different fields face the same life choice decisions. Top scholars could share sound advice regarding their experiences and success, which may help women moving through the pipeline (Handelsman et al. 2005).

Given the abundance of research that focuses on the actual career processes or progressions of women relative to men in academia and identifies gender gaps and performance differences around the central tendency of the distribution of the science workforce, this study explores the gender imbalances among those who have already made their mark in science, across many fields and countries. It is important to look at top scientists because productivity and reputation in science are highly skewed (e.g., Radicchi et al. 2008; Redner 1998; Seglen 1992; Simon 1955; Lotka 1926). Therefore, documenting female representation at the top should be informative for addressing the science gender gap (Aguinis et al. 2018). A key advantage of looking at top scientists is that this is a relatively homogeneous group of scholars in terms of capacity to produce successful and innovative ideas (Chan et al. 2016a, b), in particular when comparing fields. Using a large-scale dataset of more than 94,000 top-cited scientists in 21 fields across 43 countries, we are able to complement previous research at the global level (Gender in the Global Research Landscape 2017; Larivière et al. 2013) as well as studies that compare gender differences between fields (e.g., Holman et al. 2018; King et al. 2017) by providing a systematic overview of gender disparity among top scientists. In particular, we first study the distribution and representativeness of the most cited female and male scientists across different countries and fields, comparing the gender participation gap with the gaps identified in previous studies from other levels of science. Using a large set of bibliometric measures, we also study and explore the gender difference in scientific activities such as productivity, impact, and collaboration patterns among these top scientists. Importantly, we outline and compare these differences across fields. 
As the proportion of female scientists varies according to the scientific discipline, such variation could potentially explain the pattern in gender difference across fields. Lastly, we examine how relative female scientific performance or success across countries correlates with the nation’s gender equity status in general, emphasising that cultural, environmental, and institutional factors might explain the observed gender differences at higher levels in academic fields and the scientific workforce.

Gender differences among science elites

Because productivity and reputation in science are highly skewed (e.g., Radicchi et al. 2008; Redner 1998; Seglen 1992; Simon 1955; Lotka 1926), documenting female representation at the top should be informative for addressing the science gender gap. A key advantage of looking at top scientists is that this is a relatively homogenous group of scholars in terms of capacity to produce successful and innovative ideas (Chan et al. 2016a, b). In general, respect and fame among peer scientists are seen as important factors for scholars (Samuelson 2004). Several previous studies analysing top scientists have looked at Nobel laureates (see, e.g., Ashton and Oppenheim 1978; Zuckerman 1992, 1996; Breit and Hirsch 2004; Mazloumian et al. 2011; Bucchi 2014; Chan and Torgler 2012, 2015a, Chan et al. 2014a, b, c; Chan et al. 2015, 2016a, b; Chan et al. 2018; Mixon et al. 2017; Fahy 2018; Hansson 2018; Schlagberger et al. 2016; Widmalm 2018). Previous research has shown that women are strongly under-represented among Nobel Laureates (Lunnemann et al. 2019). Scholars such as Simonton (1988, 2013) have dedicated a lot of attention to the study of scientific geniuses. Some contributions in that area have looked at female geniuses such as Marie Curie (Goldsmith 2005) or Maria Goeppert Mayer, who shared the Nobel Prize in 1963 with Eugene Wigner (Rossiter 1993). Chan and Torgler (2015b) explored the question of whether great minds appear in batches and Chan et al. (2019) looked at eminent scholars to analyse fame trajectories throughout time. When looking at the literal fame of 5631 scientists born between 1800 and 1969 (based on Google Books data regarding the number of times a scientist’s full name was mentioned in digitalized English-books published between 1800 and 2000) (Bohannon 2011), we find that only 7.87% are females (N = 443), while some field differences are observed (Chemistry: 5.35%, Physics: 4.37%, Biology: 8.56%, Mathematics: 4.32%, Social Sciences: 14.09%).Footnote 9

Various studies also show that women as a group are underrepresented in senior academic ranks (Jena et al. 2015; Blumenthal et al. 2018; Santos and Van Phu 2019; Yang et al. 2019; Qamar et al. 2020; Niemeier and González 2004), even after controlling for factors that affect promotions (for an overview see Bentley and Adamson 2003), which affects the chances of increasing the numbers of eminent female scientists. Nittrouer et al. (2018) analysed data on colloquium talks at prestigious universities and colleges as they are a reflection of academic research reputation and status. Their results indicate that men give more colloquium talks at top US colleges and universities than women, which indicates that women are less visible, a result that cannot be explained by turning down talks or thinking that colloquium talks are less important. However, women seemed not to be less active in conducting international research collaboration, which does help to increase visibility and reputation (Aksnes et al. 2019).

Another set of literature explores gender disparities among high-impact authors or top journals. Torgler and Piatti (2013) explore women’s contributions to the American Economic Review, a leading journal in economics, over a period of 100 years (1911–2010). They find that the share of female contributors increased from 2.4 to 12.24%. However, a noticeably large proportion of female authorship for the years 1943 to 1945 (between 7.41 and 11.11%, with the largest share, 24%, in 1944) is due to the fact that many economists were then employed in wartime efforts. Interestingly, Liu et al. (2020) looked at 257,642 articles and 130,397 scholars in economics between 1933 and 2017. Their work indicates that among the scholars with a paper citation frequency of more than six (which they classify as the highest impact group), females number only two-fifths as many as male scholars but are cited more often (on average 10.15 times) than males (9.66 times). Mixed evidence is available on gender differences related to publishing in high impact factor journals (Dehdarirad et al. 2015). A recent study finds gender differences, with females submitting to less competitive journals, when controlling for environmental conditions (e.g., department size and set-up, size of the research group, prestige of the university and research focus, third-party funding) and individual factors such as career age and rank, marital status and children, motivations, or personality characteristics (Mayer and Rathmann 2018).

Using a large sample of high-performing scientists in mathematics, genetics, and psychology, Aguinis et al. (2018) find that females are underrepresented among star performers, especially when considering the upper tail of the distribution. This echoes a significant number of studies reporting that female scholars publish less (see Aksnes et al. 2019), while some studies report no statistically significant gender differences (for a discussion, see Dehdarirad et al. 2015). In terms of citation performance, many studies show that when looking at citations per publication, no gender differences are found, particularly when comparing citation rates of single-author publications (e.g., Peñas and Willett 2006; Over 1982; Elsevier 2017; see Ceci et al. 2014 for review). Some studies observed that women have even higher citation scores (see van Arensbergen et al. 2012). Nevertheless, it is reported that multi-author publications with females in key author positions (first or last author) tend to receive fewer citations in general (Larivière et al. 2013), including when publishing in high-impact journals, indicating that even established scientists are not immune to gender bias (Bendels et al. 2018a, b). Consequently, the overall gender differences in citations seem to be driven by females’ lower research productivity and/or the fact that women are less likely to secure high-impact key authorships in collaborative publications (Broderick and Casadevall 2019). Potthoff and Zimmermann (2017) report a gender-based fragmentation of communication. Gender boundaries are observed in the attention, exchange, and integration of knowledge. As male and female scholars tend to cite publications by authors of their own gender, extensive gender homophily in citations is found (Ghiasi et al. 2018). Additionally, one study finds that gender disparities in the rate of receiving invitations for commentary and contribution worsen with author seniority and scientific track record (Thomas et al. 2019), which could be attributed to the over-representation of males on journal editorial boards (Ioannidou and Rosania 2015; Topaz and Sen 2016; Bhaumik and Jagnoor 2019; Hafeez et al. 2019).

A large literature has looked at the implications of fame, exploring whether we observe a Matthew Effect (see, e.g., Merton 1968, 1973; Cole 1979; Goldstone 1979; Allison and Stewart 1974; Allison et al. 1982; Medoff 2006; Azoulay et al. 2014; Chan et al. 2014a, b, c; Bothner et al. 2011). Rossiter (1993) argued that women were not only unrecognized in their own time but that well-known female scientists have also been obliterated from history. As an example, Rossiter mentioned Trotula, an eleventh-century physician famous for her cures of and writings on women’s diseases. Other women, such as Rosalind Franklin and Lise Meitner, are further cases of credit denial or lack of Nobel credit. Rossiter (1993) calls this the Matilda Effect, whereby outstanding scientific contributions from women are credited to men or are overlooked entirely. A study on awards conferred by 13 STEM disciplinary societies indicates that between 2000 and 2010, women were eight times less likely to win a scholarly award and three times less likely to win a young investigator award than men (Lincoln et al. 2012). In addition, the authors clarified that women win a lower proportion of awards than expected, based on their representation in the nomination pool. Asplund and Welle (2018) assert that culturally ingrained perceptions of STEM individuals stress that “[m]erit alone does not predict our judgement—instead we are influenced by our unconscious set of expectations and stereotypes. This means candidates who are not in line with the stereotypical idea of a “high-performer” find themselves working against an invisible barrier” (p. 635).Footnote 10

Method

Data

To measure gender differences amongst the very successful scholars in science, we employ the database made publicly available by Ioannidis et al. (2019), which encompasses information on the 105,026 highest ranked scientists from almost two hundred fields in terms of a composite citation indicator that considers six career-citation metrics (excluding self-citations) (Ioannidis et al. 2016). Specifically, the composite index is composed of the standardized and log-transformed values of two bulk impact measures (total number of citations and h-index) and four co-author-adjusted impact measures (hm-index, number of citations to single-authored publications, number of citations to single- and first-authored publications, and number of citations to single-, first-, and last-authored publications). These selected scientists thus represent the most impactful authors (approximately the top 1.5%) among the 6,880,389 authors who published at least five articles between 1960 and 2017, with citations counted over the period 1996 to 2017 in the Scopus citation collection (Ioannidis et al. 2019). Together, the top-cited scientists generated more than 831.4 million un-weighted non-self-citations between 1996 and 2017.
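As a rough illustration of how a composite of this kind can be assembled, the sketch below log-transforms six metrics, scales each by its maximum across authors, and sums the results. The scaling-by-maximum step, the dictionary keys, and the toy numbers are assumptions made for illustration only; the authoritative procedure is the one described in Ioannidis et al. (2016).

```python
import math

# Hypothetical keys for the six career-citation metrics named in the text.
METRICS = ["citations", "h", "hm", "cites_single", "cites_single_first",
           "cites_single_first_last"]

def composite_scores(authors):
    """Return one composite score per author (a list of dicts keyed by METRICS).

    Assumption: each metric is transformed with log(1 + x) and divided by the
    largest transformed value in the sample, so each component lies in [0, 1].
    """
    logs = {m: [math.log1p(a[m]) for a in authors] for m in METRICS}
    # Guard against a metric that is zero for every author.
    maxima = {m: max(values) or 1.0 for m, values in logs.items()}
    return [sum(logs[m][i] / maxima[m] for m in METRICS)
            for i in range(len(authors))]

# Toy usage: one author at the sample maximum on every metric, one at zero.
top = {m: 10 for m in METRICS}
none = {m: 0 for m in METRICS}
scores = composite_scores([top, none])
```

Under this assumed scaling, the score ranges from 0 to 6, and ranking authors by it yields the ordering from which a top percentile could be selected.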

The dataset records the institutional affiliation listed on each scientist’s most recent publication, and the corresponding country was used to assign scientists to 95 countries. Scientists with no affiliation information, who comprise 9.58% of all scientists in the database, were excluded from the country-level analysis. For the research field-level analysis, scientists were categorized into the three-level classification (domain-field-subfield) developed by Science-Metrix, based on the most common subfield of the journals in which they had published. For the sake of statistical power, we focus only on countries and subfields with at least 30 scientists whose gender is identified.

To compare the gender ratio between top-cited scientific authors and other levels of science, we obtained data on gender differences in authorships across countries from Larivière et al. (2013)Footnote 11 and across fields from the Science-Metrix report (2018). Other country-level gender-related indicators were gathered from UIS Statistics (UNESCO Institute for Statistics), World Values Survey (Inglehart et al. 2014), OECD Science, Technology and Industry Scoreboard (OECD 2015), Global Gender Gap Report (World Economic Forum 2018), and Human Development Reports (United Nations Development Programme). Further details on data collection and collation are available in the Supplementary Material. We provide the data and scripts used to produce the study results at https://osf.io/5dhmp/.

Gender classification

To classify the gender of scientists, we match scientists’ given names to country-specific gender-name lists from multiple online sources, namely, Genderize.io (Strømgren 2016), Global Name Data project, Behind the Name, and Wikipedia, based on the country of the scientist’s most recent institutional affiliation. If a given name was listed only as initials, the scientist was omitted from the analysis, except where gender could be classified by surname (e.g., names from Slavic countries or surname suffix). Names for which the gender was undetermined by the automatic classifier were resolved manually by conducting web searches. Resolving difficulties in name and gender disambiguation allows us to provide a more complete global perspective on gender differences in science. Overall, we are able to assign gender to 89.94% of all scientists in the sample. The full details of gender assignment are given in the Supplementary Material.
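The matching logic described above can be sketched as a country-specific lookup. The table contents, the function name, and the return labels below are hypothetical, and the surname-based exception is deliberately left out; this is a minimal sketch of the idea, not the classifier used in the study.

```python
import re

OMIT = "omit"      # only initials available: dropped from the analysis
MANUAL = "manual"  # not found in any list: to be resolved by web search

def classify_gender(given_name, country, name_table):
    """Look up the first given name in a (country, name) -> gender table."""
    # Split on whitespace and periods so "A. B." yields the token "A".
    first = re.split(r"[\s.]+", given_name.strip())[0].lower()
    if len(first) <= 1:
        return OMIT
    return name_table.get((country, first), MANUAL)

# Toy table: the same name can map to different genders by country.
NAME_TABLE = {
    ("IT", "andrea"): "male",
    ("DE", "andrea"): "female",
}

italy = classify_gender("Andrea", "IT", NAME_TABLE)
germany = classify_gender("Andrea", "DE", NAME_TABLE)
initials = classify_gender("A. B.", "IT", NAME_TABLE)
unknown = classify_gender("Xyzzy", "IT", NAME_TABLE)
```

Keying the table on country as well as name is what lets a name such as "Andrea" resolve differently depending on the scientist's most recent affiliation.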

Distribution of gender across ranks

To understand the distribution of gender among the most cited scientists within a country and field (Figs. 3 and 4), we calculate the gender ratio in each ranking percentile (five percent intervals) based on Ioannidis et al.’s composite (non-self-) citation indicator (Ioannidis et al. 2016). This calculation is conducted for each area of interest. We use a simple linear regression model to determine whether women are disproportionately distributed at the bottom of the ranking, using a scientist’s percentile ranking as the predictor of gender.
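A minimal sketch of these two steps follows: the share of women per five-percent bin, and an ordinary least squares slope of a 0/1 female indicator on percentile rank. The convention that percentile 0 denotes the highest-ranked scientist, the function names, and the toy data are assumptions for illustration.

```python
def share_by_bin(percentiles, is_female, width=5):
    """Share of women in each percentile bin, keyed by the bin's lower edge."""
    bins = {}
    last_bin = 100 // width - 1
    for p, f in zip(percentiles, is_female):
        b = min(int(p // width), last_bin)  # percentile 100 joins the last bin
        n, k = bins.get(b, (0, 0))
        bins[b] = (n + 1, k + f)
    return {b * width: k / n for b, (n, k) in sorted(bins.items())}

def ols_slope(x, y):
    """Slope and intercept of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy data: percentile 0 is the top of the ranking; 1 marks a woman.
percentiles = [1, 2, 97, 98]
is_female = [0, 0, 1, 1]
shares = share_by_bin(percentiles, is_female)
slope, intercept = ols_slope(percentiles, is_female)
```

With this convention, a positive slope means that the probability of a scientist being a woman rises toward the lower-ranked end of the list.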

Performance by gender

To measure how top scientists differ in scientific activities, we construct the corresponding mean difference between genders at the subfield level. Specifically, we report gender differences in career productivity (total number of publications and total number of single-, first-, and last-authored publicationsFootnote 12), collaboration and authorship pattern (share of single-authored papers among all publications and share of first- or last-authored papers among all multi-authored publications), career research impact (total number of citations received, total number of citations to single-, first-, or last-authored publications, and the corresponding citations per publication, as well as the Hirsch h-index (Hirsch 2005, 2007) and the Schreiber hm-index with multi-authorship adjustment (Schreiber 2008)Footnote 13), credit allocation based on authorship patterns (ratio of total citations and per publication citations received to single-authored publications to all publications and ratio of citations to first- and last-authored publications to citations on all multi-authored publications), and self-citation rate. For each metric, we perform a paired t-test on the subfield gender differences to see if such a difference is systematic across the most cited scholars in science. To account for the skewed nature of publication and citation counts, we also perform the test on related metrics with log-transformation as well as Wilcoxon signed-rank tests on median difference (SI Appendix, Fig. S4). Furthermore, we present the gender differences across fields by standardizing the effect size by means of Cohen’s d (Fig. 6), using the number of scientists within the subfield as weights. To assess whether the results in gender differences are sensitive to the academic age of the scientists, we also report additional regression results that control for the year of first publication of each scientist (see SI Appendix, Table S4). The findings are both qualitatively and quantitatively in line with the main results.
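The core computations named above can be sketched with the standard library alone: a paired t statistic over subfield-level means, Cohen's d with a pooled standard deviation, and a size-weighted average of effect sizes. The pooled-variance form of Cohen's d, the function names, and the toy numbers are assumptions; the p-value step and the Wilcoxon test are omitted.

```python
import math
import statistics

def paired_t(female_means, male_means):
    """Paired t statistic and degrees of freedom over subfield mean pairs."""
    diffs = [f - m for f, m in zip(female_means, male_means)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

def cohens_d(female_values, male_values):
    """Standardized mean difference using the pooled sample variance."""
    nf, nm = len(female_values), len(male_values)
    pooled_var = ((nf - 1) * statistics.variance(female_values)
                  + (nm - 1) * statistics.variance(male_values)) / (nf + nm - 2)
    diff = statistics.mean(female_values) - statistics.mean(male_values)
    return diff / math.sqrt(pooled_var)

def weighted_mean(values, weights):
    """Average of per-subfield effect sizes weighted by subfield size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy usage with three hypothetical subfields.
t_stat, dof = paired_t([1, 2, 3], [2, 4, 6])
d = cohens_d([1, 2, 3], [2, 3, 4])
overall = weighted_mean([1.0, 3.0], [1, 3])
```

Pairing at the subfield level means each subfield contributes one difference, so large subfields do not dominate the test; the weights enter only when effect sizes are aggregated across subfields.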

Results

Overall, we find that only about 15% of the most cited scientists are women, who together generated only 13.8% of all un-weighted non-self-citations in the database records. This indicates a more pervasive inequality than that found in scientific production, where women account for 30% of fractionalized authorships globally (Larivière et al. 2013) and in high-quality research (Bendels et al. 2018a, b). While men dominate the top scientists list in all countries (Fig. 1a) and in nearly all scientific disciplines (SI Appendix, Fig. S1), it is interesting to examine the degree to which those differences vary.

Fig. 1

Women as share of top-cited scientists across countries. a, We demonstrate the proportion of men and women among the most cited scientists in 43 countries. The size of the pie is scaled to reflect the total number of top scientists with country colouration indicating the ratio of top-cited scientists to the total number of researchers in the higher education sector in the country, based on the latest available records from UNESCO’s Institute for Statistics. b, Women-to-men ratio among top-cited authors is higher in countries with a larger top scientists-researcher ratio (Pearson’s correlation = 0.381, P = 0.0117). Table S1 contains the underlying data. See Figure S3 for an enlarged map for European countries

To increase the reliability and representativeness of our findings at the country level, we focus on countries with more than 30 scientists whose gender is assigned (N = 43). The average percentage of women in the country sample is 11.83% (s.d. = 0.046). The ten countries with the highest shares of women among their top authors are Finland (20.45%), Portugal (18.81%), New Zealand (18.32%), Slovenia (18.18%), South Africa (18.04%), Australia (17.83%), the US (17.43%), the UK (17.12%), Canada (16.80%), and France (16.75%). The 10 with the lowest shares are Czech Republic (7.35%), Japan (7.22%), Austria (7.08%), Egypt (6.67%), Taiwan (6.54%), Iran (6.45%), China (6.29%), Republic of Korea (4.99%), Chile (4.26%), and Saudi Arabia (2.08%). It is not surprising to see Finland in first place, as 25 years ago, the country introduced a requirement for minimum representation of 40% of either gender on committees responsible for public spending; this includes research funding. Ten years ago, women already made up 50% of the board of the Academy of Finland and the scientific committees (Boyle et al. 2015).

We find that those countries with more top-cited scholars, relative to the number of researchers in the higher education sector within the country,Footnote 14 also tend to have a higher ratio of women scholars (Pearson’s r = 0.381, P = 0.0117, n = 43; Fig. 1b). An environment with a relatively larger pool of women scholars may be conducive to (or may attract) a relatively greater share of top women scholars, even though the share of the former is substantially higher than that of the latter. For example, the proportion of females ranked as top scientists is highly correlated with a country’s share of female scientific authors (Pearson’s r = 0.514, P = 0.0005), higher education sector researchers (Pearson’s r = 0.338, P = 0.0306), PhD holders (Pearson’s r = 0.497, P = 0.006), and corresponding authors (Pearson’s r = 0.406, P = 0.0114) with an average inter-ratio difference of 28.78 (s.d. = 5.98, n = 42), 28.99 (s.d. = 8.55, n = 41), 26.09 (s.d. = 6.15, n = 29), and 11.63 (s.d. = 8.42, n = 38) percentage points, respectively (SI Appendix, Fig. S2).Footnote 15 Notably, countries with greater female participation in science have larger inter-ratio differences (i.e., further away from equal women representation at the top (red diagonal line), see SI Appendix, Fig. S2), which indicates a higher degree of underrepresentation of women among top-cited scientists.
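For readers who want to reproduce this kind of country-level comparison, the sketch below computes a Pearson correlation and the inter-ratio difference, taken here as the share of women in a reference group minus their share among top-cited scientists, in percentage points. The function names and toy values are hypothetical, and the definition of the difference is an assumption based on the description above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def inter_ratio_difference(share_reference_pct, share_top_pct):
    """Gap in percentage points between a reference share and the top share."""
    return share_reference_pct - share_top_pct

# Toy usage: perfectly proportional shares give a correlation of 1.
r = pearson_r([1, 2, 3], [2, 4, 6])
gap = inter_ratio_difference(40.0, 12.0)
```

A larger positive gap corresponds to a point lying further from the line of equal representation.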

In general, the gender gap pattern among top scholars across fields is similar to those reported in previous studies (e.g., Holman et al. 2018; Bendels et al. 2018a, b; Larivière et al. 2013; European Union 2019). We find that Natural Sciences is the most gender-imbalanced discipline, with the lowest average share of women among top-cited scientists (9.54%) and the smallest variation (s.d. = 0.035) among its fields and subfields (n = 33 subfields with more than 30 scientists whose gender is classified), compared to Applied Sciences (10.9%, s.d. = 0.052, n = 33 subfields), Economic and Social Sciences (17.12%, s.d. = 0.08, n = 21 subfields), Health Sciences (21.12%, s.d. = 0.119, n = 55 subfields), and Arts and Humanities (24.96%, s.d. = 0.08, n = 6 subfields) (SI Appendix, Fig. S1). The only subfield with more than 50% of female top-cited authors is Nursing (79.41%).Footnote 16 Contrary to the country-level findings, the degree of underrepresentation of women among top-cited scientists does not correlate with the overall gender ratio of a scientific field (Fig. 2). Instead, the level of women’s underrepresentation (relative to all Scopus authors) is roughly the same across fields (vertical distance to the diagonal line).

Fig. 2

Correlation between the proportion of women among the most cited scientists and among Scopus authors, by subfield. Each colored marker represents a subfield; marker size is scaled to the number of top scientists in the subfield. Horizontal lines show the proportion of female top-cited scientists in each field. Red dashed diagonal lines indicate equal representation of women. Proportion of female Scopus authors (2006–2015) is based on Science-Metrix (2018). Underlying values are shown in Table S1

Next, we examine the gender ratio variation within the list of most cited scientists by country and discipline. We scale the overall ranking outcomes into percentile ranks within each panel (Figs. 3 and 4). Across both disciplines and countries, we find a strong trend: the proportion of women is lowest among the very top scholars (superstars) and increases steadily as we move to lower-ranked scholars. This suggests that men disproportionately dominate the top citation ranking positions, which is consistent with the increasing underrepresentation of women as they move up the academic ladder, often referred to as the ‘leaky pipeline’ (Rees 2002; Benschop and Brouns 2003; European Commission 2019; Kloot 2004; Van den Brink 2010). For example, only 12.4% of the top-cited scientists in the first quartile of the distribution are women compared to 15.2%, 16.6%, and 16.8% in the second, third, and fourth quartiles, respectively (Fig. 3a). On average, the top one percentile of the most cited scientists has the lowest share (5.77%) of women scientists, with the lowest value (2.35%) in Natural Sciences and the highest (8.49%) in Applied Sciences (Fig. 3b). The change is steeper in Economic and Social Sciences and Health Sciences, two areas in which the proportion of women is higher than in either Natural or Applied Sciences. The difference between the first and last quartiles is largest in Economic and Social Sciences at 8.65 percentage points (14.03−22.68%), followed by Health Sciences at 5.89 percentage points (14.19−20.08%). Towards the lower ranks in the distribution, the increase in the proportion of women is smaller in Applied Sciences (2.99 percentage points, 8.86−11.85%) and Natural Sciences (3.94 percentage points, 7.48−11.42%), likely because both disciplines have low ratios of female participation overall.
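The quartile comparison above can be sketched as follows: given a list of scientists ordered from the highest to the lowest rank, rescale each position into a rank bin and compute the share of women per bin. This is a minimal sketch under stated assumptions; the function name, the equal-width binning rule, and the eight-person example list are hypothetical and only illustrate the idea.

```python
def share_women_by_bin(is_woman_ranked, n_bins=4):
    """Percent of women in each of n_bins equal-width rank bins (bin 0 = top ranks).

    is_woman_ranked: booleans ordered from the highest-ranked scientist downward.
    """
    n = len(is_woman_ranked)
    if n < n_bins:
        raise ValueError("need at least one scientist per bin")
    totals = [0] * n_bins
    women = [0] * n_bins
    for position, is_woman in enumerate(is_woman_ranked):
        bin_index = position * n_bins // n  # rank position rescaled to a bin
        totals[bin_index] += 1
        women[bin_index] += int(is_woman)
    return [100.0 * w / t for w, t in zip(women, totals)]

# Hypothetical ranked list (True = woman); illustration only.
ranked_flags = [False, False, False, True, False, True, True, True]
quartile_shares = share_women_by_bin(ranked_flags)
```

With `n_bins=100` the same function would give shares at one-percentile intervals, provided the list contains at least 100 scientists.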

Fig. 3

Gender ratio across ranking positions based on citation performance. We show that the proportion of women among the most cited scientists increases with the ranking a overall, b across disciplines, and c across countries, based on the composite citation indicator, excluding self-citation [31]. Rank percentile is rescaled from the overall ranking within each panel. Red dot represents the mean share of women at a one percentile interval for the overall ranking (a), and at a five percentile interval for discipline (b) and country (c) rankings; Blue solid line represents local mean-smoothed (kernel half-width = 5) average proportion of women across rank percentiles with shaded area as the 95% confidence interval; Green dashed line in panels b, c shows the world smoothed average for comparison. For illustration, we display only countries with at least 300 top-cited authors and omit results for Arts and Humanities and General discipline because of the small number of observations. Figure 4 provides results for the 15 largest research fields. Underlying data for all countries, disciplines, fields, and subfields are provided in Table S2

Fig. 4

Gender composition among top-cited authors by research field over rank. Research fields in Applied Sciences, Natural Sciences, Health Sciences, and Economic & Social Sciences are categorized under the Science-Metrix classification system. Author ranking within each field is scaled into percentile rank with five percent intervals. Red dots indicate the mean proportion of women within each five percent interval. Blue solid lines represent the local mean-smoothed average proportion of women with shaded areas as the 95% confidence intervals. Green dashed lines show the discipline local mean-smoothed average as the reference. Fields classified under disciplines of Arts & Humanities (Philosophy & Theology (N = 70), Historical Studies (N = 146), and Communication & Textual Studies (N = 96)) and General (General Arts, Humanities & Social Sciences (N = 6), and General Science & Technology (N = 111)) were not analysed because of the small number of observations. Built Environment & Design (N = 187) was also omitted for the same reason. Males dominate the top positions among top authors in most fields, although the proportion of women increases with rank except in fields with a very low female representation (i.e., Mathematics & Statistics, Engineering, Information Technology, and Physics & Astronomy)

Breaking out the results by scientific field reveals relatively stable gender proportions in Mathematics and Statistics, Physics and Astronomy, Engineering, Information and Communication Technologies, and Enabling and Strategic Technologies (see Fig. 4). In this analysis, the areas with the highest proportions of women among their top authors are Public Health and Services (36.06%), Communication and Textual Studies (33.66%), Psychology and Cognitive Science (27.52%), and Social Sciences (22.98%), while the lowest are Mathematics and Statistics (6.26%), Engineering (7.23%), Physics and Astronomy (7.68%), and Earth and Environmental Sciences (9.09%) (SI Appendix, Fig. S1). The overall mean by subfield is 16.35% with an s.d. of 0.12 (see the Supporting Information for more detail, including the 171 subfields).

Using a simple linear regression, we find that the share of women scientists decreases by around 1 percentage point for every 10-percentile increase in rank for the US (b = 0.1, P < 0.001, CI95% = [0.088;0.113]), UK (b = 0.099, P < 0.001, CI95% = [0.07;0.128]), Canada (b = 0.083, P < 0.001, CI95% = [0.043;0.123]), Germany (b = 0.078, P < 0.001, CI95% = [0.049;0.106]), Australia (b = 0.099, P < 0.001, CI95% = [0.049;0.148]), Denmark (b = 0.131, P = 0.001, CI95% = [0.055;0.208]), Sweden (b = 0.126, P < 0.001, CI95% = [0.065;0.187]), Austria (b = 0.079, P < 0.067, CI95% = [−0.006;0.164]) and France (b = 0.131, P < 0.001, CI95% = [0.074;0.189]). We also identify a moderate relation for the Netherlands (b = 0.069, P = 0.012, CI95% = [0.015;0.123]), Italy (b = 0.066, P = 0.020, CI95% = [0.01;0.121]), and Spain (b = 0.066, P = 0.068, CI95% = [−0.005;0.137]), but the slope for the remaining countries with at least 300 scientists is not statistically significant: Belgium (b = 0.015, P = 0.706, CI95% = [−0.064;0.094]), Finland (b = 0.078, P = 0.190, CI95% = [−0.039;0.194]), Norway (b = 0.007, P = 0.901, CI95% = [−0.1;0.114]), Switzerland (b = 0.050, P = 0.088, CI95% = [−0.007;0.107]), New Zealand (b = 0.028, P = 0.718, CI95% = [0.015;0.123]), Israel (b = 0.060, P = 0.158, CI95% = [−0.023;0.143]), Hong Kong (b = 0.046, P = 0.365, CI95% = [−0.053;0.145]), Taiwan (b = 0.038, P = 0.399, CI95% = [−0.051;0.127]), China (b = 0.002, P = 0.938, CI95% = [−0.04;0.043]), Japan (b = 0.007, P = 0.686, CI95% = [−0.025;0.038]), and South Korea (b = 0.033, P = 0.410, CI95% = [−0.045;0.11]).
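To make the interpretation of the slope b concrete, the sketch below fits an ordinary least squares line of the share of women (in percent) on rank percentile. It assumes that higher percentiles denote lower-ranked scholars and that b is expressed in percentage points per percentile, so that b ≈ 0.1 corresponds to roughly 1 percentage point per 10 percentiles. The function name and the synthetic data are hypothetical; the sketch does not reproduce the reported confidence intervals.

```python
def ols_slope_intercept(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic example: share of women rises by 0.1 pp per rank percentile.
rank_percentile = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
share_women = [10.0 + 0.1 * p for p in rank_percentile]

slope, intercept = ols_slope_intercept(rank_percentile, share_women)
change_per_10_percentiles = 10 * slope  # in percentage points
```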

Next, to better understand how the most prominent scientists differ by gender, we take a look at their productivity (total number of papers and number of papers as single, first, or last author; Fig. 5a), collaboration (share of single-author papers; Fig. 5b), authorship pattern (share of first or last author in multi-authored papers; Fig. 5b), impact (total non-self-citations, citations per paper, and h- and hm-index; Fig. 5c–e), credit allocation (share of total citations received and relative citations per paper to single, first, or last author papers; Fig. 5f–g), and percentage of self-citations (Fig. 5h). To account for specific practices among scientific fields and subfields, we compare men and women within each subfield for which there are more than 30 scientists with gender assigned (n = 149 subfields), and test if there is a systematic difference across all subfields. We find that men tend to have larger numbers of total publications and citations (e.g., the average gender difference of total publications; number of single, first, and last author publications; citations; citations to single/first/last author papers; h-index; and hm-index between men and women across the subfields equals 25.1 (CI95% = [19.8;30.4]; Cohen’s d = 0.254), 19.9 (CI95% = [16.7;23.1]; Cohen’s d = 0.29), 569.5 (CI95% = [387.6;751.5]; Cohen’s d = 0.106), 483.9 (CI95% = [383.3;584.5]; Cohen’s d = 0.144), 1.24 (CI95% = [0.81;1.66]; Cohen’s d = 0.111), and 1.14 (CI95% = [0.91;1.37]; Cohen’s d = 0.195), respectively, with statistical significance at the 0.1% level in two-tailed t-tests; Fig. 5a, c, and e). Yet, on average, women slightly outperform men in terms of citations per paper or citations per single/first/last-authored paper, by 2.64 (CI95% = [−4.11;−1.17]; Cohen’s d = −0.099) and 2.67 (CI95% = [−4.36;−0.99]; Cohen’s d = −0.091), respectively (P = 0.0006 and P = 0.0021, Fig. 5d).
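The subfield-level comparison can be sketched as a paired analysis: for each subfield, take the mean of a metric for men and for women, then summarise the men-minus-women differences across subfields. The sketch below is illustrative only. In particular, it assumes the paired-samples form of Cohen's d (mean difference divided by the standard deviation of the differences); the text does not state which variant of Cohen's d was used, and the four subfield values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_summary(men_means, women_means):
    """Mean difference, paired t statistic, and paired-samples Cohen's d.

    Each list holds one subfield average per entry, in the same subfield order.
    """
    diffs = [m - w for m, w in zip(men_means, women_means)]
    n = len(diffs)
    diff_mean = mean(diffs)
    diff_sd = stdev(diffs)  # sample standard deviation of the differences
    t_stat = diff_mean / (diff_sd / math.sqrt(n))
    cohens_dz = diff_mean / diff_sd  # assumed variant; see lead-in
    return diff_mean, t_stat, cohens_dz

# Hypothetical subfield averages of a productivity metric; illustration only.
men = [10.0, 12.0, 14.0, 16.0]
women = [9.0, 10.0, 13.0, 13.0]
diff_mean, t_stat, cohens_dz = paired_summary(men, women)
```

A positive mean difference indicates higher values for men and a negative one higher values for women, matching the sign convention used for the effect sizes in the text.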

Fig. 5

Gender difference in top authors. We show the overall gender differences for 149 scientific subfields in a productivity, b collaboration and authorship pattern, c–e impact, f, g credit allocation, h self-citation frequency. For each subfield, we calculate the average for men and women separately and analyse the pairwise differences between gender using a two-tailed t-test. All citation measures exclude self-citation counts. Share of first and last author publications and citations received are compared to all multi-authored papers. Solid black dots represent the mean of subfield averages of men and women. **P < 0.01; ***P < 0.001; n.s. not significant

While there is an equal likelihood of either gender being the first author on multi-authored papers (difference of − 0.2 pp, P = 0.624), women are less likely to be the last author (by 1.8 pp, P = 0.0002, Cohen’s d = 0.121) and have a slightly smaller proportion of single-author publications in their portfolio relative to all publications (by 1.3 pp compared to men, P = 0.0001, Cohen’s d = 0.121) (Fig. 5b). Nevertheless, women receive more citations (and citations per paper) on their first author papers, relative to their multi-authored papers (P = 0.0026, Cohen’s d = − 0.083 and P = 0.0003, Cohen’s d = − 0.09), but less on papers for which they are the last author (P < 0.0001, Cohen’s d = 0.098 and P < 0.0001, Cohen’s d = 0.146), compared to men (Fig. 5f, g). While the share of citations received on single-author papers is not significantly different between men and women (P = 0.228, Cohen’s d = 0.053), women have a slightly higher number of citations per publication on their single-author papers compared with their multi-authored papers (P = 0.0095, Cohen’s d = − 0.066) (Fig. 5f, g).

Comparing the year of first publication in Scopus between men and women top-cited scientists shows that male scientists are 3.2 years older (in terms of academic age) than their female counterparts (Cohen’s d = − 0.326, P < 0.0001, n = 149).Footnote 17 Despite this difference, the subfield average year of first publication is 1983.3 (s.d. = 3.48) for male scientists and 1986.5 (s.d. = 3.34) for female scientists, which shows that both male and female top-cited scientists in our sample are past their mid-career stage. Nevertheless, we provide robustness checks to assess whether the above findings are sensitive to scientists’ academic age by controlling for year of first publication fixed effects using linear regression estimation (SI Table S4). We find the two sets of results are indeed very similar, meaning the observed gender difference in scientific activities is unlikely to be due to the difference in academic age.

Moreover, even though there are substantial gender differences observed within disciplines (Fig. 6), we note interesting exceptions in fields historically perceived as male dominated. That is, women scientists in Mathematics and Statistics, Physics and Astronomy, Engineering, Information and Communication Technologies, and Built Environment and Design all perform, on average, equal to their top male peers with respect to the number of single/first/last author papers produced,Footnote 18 as well as the total and normalized citations generated from this output. Interestingly, in more gender-balanced fields such as Social Sciences, Psychology and Cognitive Sciences, and Public Health and Health Services, the contrast in quantity-quality of research output between women and men is more apparent.

Fig. 6

Gender difference in top authors by field. a productivity, collaboration and authorship pattern, and self-citation frequency, b impact and credit allocation. Coloured dots represent the effect size (Cohen’s d) of a subfield for the respective metric within the field classification, based on Science-Metrix classification of authors. Positive Cohen’s d indicates stronger male performance while negative shows stronger female performance. Overall field level effect size is computed as the weighted average gender difference (men minus women) of the subfields, with the weights as the number of scientists within the subfield (dot size). Statistical significance based on z test. P < 0.1; *P < 0.05; **P < 0.01; ***P < 0.001
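The field-level aggregation described in this caption amounts to a weighted average, with each subfield weighted by its number of scientists. A minimal sketch follows; the function name and the three subfield values are hypothetical, and whether the weights are applied to raw differences or to standardised effect sizes is an assumption left open here.

```python
def weighted_mean(values, weights):
    """Weighted average of subfield-level gender differences (men minus women)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical subfield differences and subfield sizes; illustration only.
subfield_diff = [0.20, -0.10, 0.05]
subfield_n = [300, 100, 100]
field_diff = weighted_mean(subfield_diff, subfield_n)
```

Because of the weighting, a large subfield with a positive difference can outweigh several small subfields with negative differences.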

With respect to frequency of self-citations, we find (contrary to a recent analysis of 1779−2011 JSTOR data (King et al. 2017)) there is no statistically significant difference in self-citations either between genders overall (P = 0.358, Cohen’s d = 0.029), or in most fields among top-cited scholars (Fig. 5h). Nevertheless, women in Clinical Medicine (diff. = 0.52 pp, Cohen’s d = − 0.08), Psychology and Cognitive Sciences (diff. = 1.05 pp, Cohen’s d = − 0.14), and Physics and Astronomy (diff. = 0.58 pp, Cohen’s d = − 0.1) have slightly higher self-citations on average, while men in Public Health and Services (diff. = 1.04 pp, Cohen’s d = 0.17) and Economics and Business (diff. = 1.05 pp, Cohen’s d = 0.28) self-cite significantly more often than their women counterparts (Fig. 6).

To test for the correlation between female academic performance and gender equity or equality indicators in each affiliated country (Fig. 7; Supporting Information 1.2), we first make use of two indexes from the 2018 Human Development Report: the Gender Development Index and the Gender Inequality Index. The data indicate how men and women differ in economic involvement (earned income and labor market participation rate), health (life expectancy and reproductive health), education (years of schooling), and political activity (proportion of parliamentary seats held by women). We also draw on the Global Gender Gap Index provided by the World Economic Forum, which measures the relative gaps between women and men across the same four key areas in human life. We then link these indicators to cultural attitudes towards women (Guiso et al. 2008) using value and attitudinal variables from the World Values Survey, including preferential treatment for men during job scarcity, whether men are better political leaders or business executives than women, and whether university is more important for a boy than for a girl. Not only do we observe a high correlation among the different indicators of gender equality (bottom right corner, Fig. 7), but the strongly significant correlations (at the bottom left) indicate that female scientific performance or success (relative to male) is positively correlated with more institutional gender equality in public life and decision making, and lower preferential discrimination against women. Yet, there are no strong correlations between these indicators and the average performance difference between women and men (lower middle). We thus infer that either gender equity-promoting institutional conditions and values are conducive to better female representation among scholars, or that better conditions attract more (but not necessarily better) women scientists.

Fig. 7

Achievement of women scientists and gender equality proxies. We report the pairwise Spearman rank correlations between gender differences among top scientists with overall gender equality indicators (n = 43). We measure aggregated gender differences among top scientists by the ratio of women among all top scientists (FemRatioTop), difference between the total number of single, first, or last author publications produced by all men and women scientists (geometric mean difference (women–men), TtlSFLPapersLDiff) and the number of unweighted non-self-citations towards these publications (geometric mean difference (women–men), TtlSFLCiteLDiff). RankDiff is the difference in rank percentile (based on composite citation performance) between the average women and men scholars within a country. To compare performance between the average top men and women, we first compute the standardized z-score of each metric for all scholars within a subfield, then calculate the difference of the average z-scores between women and men. We considered the number of single, first, or last author publications (SFLPapersDiff) and the number of non-self-citations on these publications (SFLCiteDiff), composite citation (CompositeDiff), citations per paper of single, first, and last author publications (CiteperSFLDiff), and self-citation percentage (SelfciteDiff). Country aggregated gender-related indicators include Gender Development Index (GenDevIndex), Gender Inequality Index (reverse-scored, RevGenInequIndex), Global Gender Gap Index (GenGapIndex), and measures derived from Values Surveys (JobRight, PoliLeader, BusExec, and UniForBoy). See Supporting Information regarding data sources of variables used. P < 0.1; *P < 0.05; **P < 0.01; ***P < 0.001; n.s. not significant (two-tailed test)
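The standardisation step described in this caption can be sketched as follows: each scholar's metric is converted to a z-score within their subfield, and the gender gap is the difference between the average z-scores of women and men. This is a minimal sketch under stated assumptions. The record layout, the function name, and the use of the population standard deviation are hypothetical choices, and the example values are invented.

```python
from statistics import mean, pstdev

def zscore_gender_gap(records):
    """Average within-subfield z-score of women minus that of men.

    records: iterable of (subfield, gender, value) with gender "F" or "M".
    """
    values_by_subfield = {}
    for subfield, _, value in records:
        values_by_subfield.setdefault(subfield, []).append(value)
    # Mean and (population) standard deviation of the metric per subfield.
    stats = {s: (mean(v), pstdev(v)) for s, v in values_by_subfield.items()}

    z_by_gender = {"F": [], "M": []}
    for subfield, gender, value in records:
        mu, sigma = stats[subfield]
        if sigma > 0:  # skip subfields without any variation
            z_by_gender[gender].append((value - mu) / sigma)
    return mean(z_by_gender["F"]) - mean(z_by_gender["M"])

# Hypothetical records; the second subfield is the first scaled by ten,
# so standardisation puts both on the same footing.
records = [
    ("A", "M", 10.0), ("A", "M", 20.0), ("A", "F", 30.0), ("A", "F", 40.0),
    ("B", "M", 100.0), ("B", "M", 200.0), ("B", "F", 300.0), ("B", "F", 400.0),
]
gap = zscore_gender_gap(records)
```

Standardising within subfields keeps fields with very different publication and citation volumes from dominating a country-level average.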

Conclusions

Overall, the results indicate that although more female top scholars come from countries with higher shares of women scientists, the ratios for the former lag behind the overall proportion of the latter. Despite signs of recent improvement in some countries and fields (Supplementary Results Fig. S5–S7; Holman et al. 2018), this result suggests there is a serious need for future adjustment. According to the large-scale global survey on Women in Physics (n = 15,000), the barriers faced by women scholars extend well beyond the academic field, incorporating sex-based differences in resources, professional opportunities, and allocation of family responsibilities.

Notably, we find higher citation inequality than publication productivity inequality among those top scholars. Men are quite dominant among the top scholars in many scientific disciplines and countries, even when classifying the top scholars into percentiles to identify the superstars among them. The lowest proportion of women is observed among that rank of superstars (compared to other percentiles; Aguinis et al. 2018). For many countries, the share of women scientists decreases for every rank percentile increase, with the exception of Belgium, Finland, Norway, Switzerland, and New Zealand. This is consistent with the notion that gender imbalance in academic career advancement requires further attention (Benschop and Brouns 2003; Van den Brink 2010; Huang et al. 2020a, b).

Our findings based on top scholars also echo the gender differences in scientific activities identified in other studies. Women on average outperform men in terms of overall citations per paper and citations per first-authored paper (Bendels et al. 2018a, b; Thelwall 2020b, c; Frandsen et al. 2020), despite differences in total productivity and impact (Huang et al. 2020a, b). In addition, in some traditionally male-dominated fields such as mathematics and statistics, physics and astronomy, or engineering, the performance of top female scholars is equal to that of their male colleagues. Thus, increasing the pool of female scholars could help increase the number of female scholars at the top. Environments with a stronger female representation contribute to the production of quality female scholars, an important insight, as the natural sciences are still the most gender imbalanced discipline with respect to top scientists. The number of female scientists in a particular field is not per se a guarantee of producing top scientists, but having more women scholars within a country helps to increase the number of top women scholars.

Given the substantial heterogeneity observed in this study between countries and fields, additional case studies would provide valuable information on both the barriers to female participation and the initiatives that may help remove such obstacles. Suggestions for the latter include more active employer support via workload adjustment during life event shocks that can hinder career progression (e.g., parenting) or family-friendly amenities such as on-campus daycare (Zakaib 2011). Both these measures would facilitate a better work-home balance, thereby raising incentives for women to become academics. Likewise, allowing more time for (or part-time engagement in) the tenure track career path would mitigate the strong disincentive for female academics to have children (Ceci and Williams 2011). A study of 11,000 research grant applications in the Netherlands exploring gender productivity differences finds that gender performance differences among the younger generation of researchers have disappeared (van Arensbergen et al. 2012). Where differences in this cohort existed, young female researchers outperformed their male counterparts. Another study tracking the productivity and impact of health science researchers over a period of 16 years, starting from enrolment in the PhD program, reports little or no gender difference in productivity (Frandsen et al. 2020). In some cases, women also outperform men.

The fact that the top women in fields traditionally characterized by greater female underrepresentation (i.e., Physics and Astronomy, Mathematics and Statistics, Engineering, Economics) perform better than or equal to their male colleagues underscores the potential of gender-stereotypes and gendered expectations to undermine scientific advancement in these fields. Perceptions of these disciplines as inhospitable to women can self-reinforce by discouraging applications to or engagement in these environments (Williams and Ceci 2015), perpetuating science’s historical losses from the marginalization and disempowerment of women scientists.

A key strength of our paper is its exploration of a large number of factors measuring gender equality at the country level; a key contribution is its finding of a high correlation among those different indicators of gender equality and female scientific performance or success (relative to male). Thus, our results also emphasize that science is not immune to cultural norms and institutional conditions: societies that are more committed to gender equality or equity and that have less discriminatory attitudes towards women and better female political representation also have a higher proportion of women among their top scientists. Societies that support egalitarian principles and tolerance are thus nurturing a better scientific environment for women. Conversely, inequality in gender norms and stereotypes embedded in the cultural and political domain strengthens and legitimizes unequal gender systems that are harmful to female representation in science and, thus, to scientific progress (Miller et al. 2014). Given that a healthy, resilient, inclusive, and regenerative academic ecosystem requires full involvement of women in decision making, engagement, and leadership, reducing attitudes and behaviours that constrain or undervalue women will benefit not only science but academia in general. However, the substantial heterogeneity between countries indicates that we need more insights into how environmental and institutional conditions affect gender representation at the top. It may also be interesting to explore whether better scholars migrate to environments with better conditions. Thus, we need more dynamic mapping and studies that provide a causal understanding of how environmental conditions affect female scientific success.

Our results are limited by several factors. The first concerns coverage of the Scopus database, from which the underlying data on top-cited scientists were drawn. The selection and representativeness of top scientists is directly impacted by the database’s journal coverage. Despite the wide collection of 77 million items from more than 25,000 titles (Baas et al. 2020), studies have shown that Scopus’s journal coverage is biased against research in Arts and Humanities and Social Sciences, thereby over-representing Natural Sciences, Engineering, and Life Sciences research (Mongeon and Paul-Hus 2016; Martín-Martín et al. 2018a, b; de Moya-Anegón et al. 2007; Archambault et al. 2006), a pattern that also applies to highly-cited documents (Martín-Martín et al. 2018a, b). Non-English research produced in non-western countries is also underrepresented in Scopus (Baas et al. 2020; Martín-Martín et al. 2018a, b; Mongeon and Paul-Hus 2016; Vera-Baceta et al. 2019). While it is difficult to gauge the level of under-representation of high impact researchers in these fields and countries, composing the list of top-cited authors using other databases is likely to result in ranking differences (e.g., Huang et al. 2020a, b). It is also unclear how this would affect our results in terms of gender differences.Footnote 19 Thus, replication studies using other databases (e.g., Google Scholar, Web of Science, Microsoft Academic Graph, Dimensions) are needed. Additionally, although the composite citation measures proposed by Ioannidis et al. (2016) to rank the research impact of scientists account for multiple citation metrics, including adjustments for multiple authorship, alternative rankings (e.g., using fractional counting of citation performance or field normalization) might produce different results given the gender difference in co-authorship patterns and citation practice across fields.Footnote 20 The current set of results does not account for scientists’ mobility because a scientist’s country is inferred from the affiliation on their most recent publication as opposed to their country of birth, residency, or citizenship. In addition, measuring top scientific performance via citation does not consider that a scientist may be outstanding along several dimensions, for example, in terms of great discoveries or contributions to societal needs. There are other interesting ways of measuring scientific success or (social) influence looking, for example, at awards obtained, patents generated, or invitations received (e.g., advisory positions, keynote speeches etc.) (see Chan et al. 2016a, b). Some studies have used the number of pages indexed by Google or Bing (Aguinis et al. 2012; Chan et al. 2016a, b) or speaking fees received (Chan et al. 2014a, b, c) as measures of external scholarly influence.

In general, using such a large-scale descriptive dataset is useful to map the current situation of women at the top of their rankings across a large number of fields. It also allows exploration of how macro factors are correlated with a country’s situation. However, such a dataset only allows the understanding of certain types of issues or questions that are useful in understanding the broader problem. To provide greater depth, such large-scale datasets need to be complemented with surveys, interviews, case studies, archival work, or natural and field experiments. For instance, survey data can provide more detailed information on the contextual aspects within an environment and across individual characteristics. Surveys can measure, for example, personality factors such as the Big Five, which can influence interactions in academia. Women tend to be more agreeable than men, which could affect their ability to collaborate and interact (Fell and König 2016). Experimental investigations can provide important insights into the causal channels. Moss-Racusin et al. (2012), for example, randomly assigned science faculty members to one of two gender conditions and showed preferential evaluation and treatment of a male student for work in the laboratory (compared to an equally qualified female student). Witteman et al. (2019) took advantage of a natural experimental situation that arose due to changes in the funding setting in 2014; the Canadian federal health research funding agency divided investigator-initiated funding into new grant programs, namely with and without a focus on the principal investigator. Results indicate that programmes that primarily fund people rather than projects show stronger gender gaps, which means that women were assessed less favourably when the principal investigator was evaluated rather than the project quality. In general, only the use of a large variety of methods will allow scholars to take into consideration the richness and complexity of this area of research.

Supplementary information

Supplementary information is available for this paper at https://osf.io/5dhpm/.