Dear Scientometrics Editors,
Interest in studying gender differences in science has increased over the last decade in the field of bibliometrics. Calls for diversity in science (e.g., CoARA [Footnote 1]), evidence demonstrating a gender gap in science (Sugimoto & Larivière, 2023), as well as the expansion of algorithmic approaches for author disambiguation (Tekles & Bornmann, 2020) and gender assignment (Mihaljević et al., 2019) have greatly helped explore publication and citation patterns within the scientific workforce. This qualitative leap has been made possible partly by an improvement in the quality of metadata produced by major bibliometric data providers [Footnote 2]. By exploring gender inequalities in science, the scientometric community has raised awareness of this topic and explored the mechanisms producing such inequalities.
However, we have noted an important lack of rigour in reporting the use of algorithms that assign gender to names, which play a major role in the findings reached by these papers. Some voices advocate that journals should collect self-declared gender information (Ribarovska et al., 2021), but this may conflict with researchers’ privacy rights. Although these algorithms are used to draw conclusions about social groups rather than to determine the gender of individuals, they make important assumptions that have not been tested. First, they presuppose that gender can be inferred from names (or images of faces), which is not necessarily true: a name that is usually associated with one gender is not always borne by a person of that gender. Second, many given names are unisex (they can apply to both male and female authors), and the association often depends on the author’s country of origin. Third, they treat gender as a binary variable, making other identities, such as those of non-binary or trans authors, invisible (Lindqvist et al., 2021; Rasmussen et al., 2019).
There are further limitations, however, which are often overlooked or left unreported. Gender algorithms usually work better with Western (and English) names than with Asian names, as current methods have performed poorly on non-Roman names and, overall, non-Western names (Karimi et al., 2016). Geographically unequal representations of gender in global analyses can lead to biased findings [Footnote 3]. This is related to the fact that gender assignment algorithms rely on lists of gendered names as a fundamental component. These lists are often not reported in the studies. It is critical to understand how they are composed, as some do not consider the cultural and regional variations that can exist within countries. A notable example of this limitation is found in Slavic countries, where gender assignment based solely on given names is less effective than using both given names and surnames, since surnames can also carry gender information (Mryglod et al., 2023). Moreover, some services, such as NamSor and Gender API, do not transparently report the sources of their name-gender lists, leaving uncertainty about their origin and reliability. Increased transparency in the description of gender assignment methods is therefore essential to address these limitations and to promote more robust and inclusive practices within the field.
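To make the role of name lists concrete, the following is a minimal sketch of a list-based assignment step. It assumes a hypothetical, hand-made table of counts per given name and country; the counts, the 0.9 threshold and the function name `assign_gender` are illustrative only and do not reproduce any real service or dataset.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy name list: (given name, country code) -> (female count, male count).
# All counts are invented for illustration.
NAME_COUNTS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("andrea", "IT"): (5, 95),
    ("andrea", "ES"): (97, 3),
    ("maria", "ES"): (99, 1),
}


def assign_gender(given_name: str, country: Optional[str], threshold: float = 0.9) -> str:
    """Return 'female', 'male' or 'unknown' for a given name and country.

    A label is returned only when one gender reaches the threshold share
    in the list; everything else is left unassigned.
    """
    if country is None:
        return "unknown"  # without a country the list cannot be consulted
    key = (given_name.strip().lower(), country)
    if key not in NAME_COUNTS:
        return "unknown"  # name/country pair not covered by the list
    female, male = NAME_COUNTS[key]
    total = female + male
    if total == 0:
        return "unknown"
    if female / total >= threshold:
        return "female"
    if male / total >= threshold:
        return "male"
    return "unknown"  # ambiguous (unisex) name in this country
```

Even in this toy version, the outcome depends entirely on which list is used, which countries it covers and where the threshold is set. These are the details that often go unreported.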
Table 1 includes a brief analysis that illustrates the extent to which transparency is needed in these studies. We retrieved journal articles published since 1981 that matched the following search query in Scopus:
TITLE-ABS-KEY (gender) OR TITLE-ABS-KEY (wom?n) OR TITLE-ABS-KEY (*male) AND (LIMIT-TO (EXACTSRCTITLE, "Scientometrics"))
We retrieved a total of 271 records, of which 222 used some method to infer gender in their dataset. Of these, 28.4% assigned gender manually, through online searches or based on the researchers’ knowledge of gendered names, and 27% did not report how they obtained the gender information. A further 16.2% obtained gender information from secondary official data, in most cases governmental or university data, in which gender was already assigned to the subjects. Nineteen articles (8.6%) used third-party algorithms (e.g., genderize.io, Gender API). The drawback of these methodologies is their lack of replicability. The first three cases (manual assignment, no information and secondary data) are impossible to trace back; however, the use of algorithms is no easier to examine for robustness. For instance, there is no information about where the data behind NamSor’s Gender Guesser come from [Footnote 4]. Gender API, another commonly used service, simply states that its data come from “publicly available data, governmental data and manual additions/corrections” [Footnote 5].
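The shares above are relative to the 222 articles that inferred gender. As a quick consistency check, and assuming the percentages were rounded to one decimal place, approximate counts can be back-calculated as sketched below. Only the count of 19 is stated in the text; the other counts are estimates derived from the rounded shares.

```python
# Number of articles that used some method to infer gender (from the text).
TOTAL = 222

# Shares reported in the text, in percent.
shares = {
    "manual assignment": 28.4,
    "not reported": 27.0,
    "secondary official data": 16.2,
    "third-party algorithms": 8.6,
}

# Back-calculated counts: approximations, because the shares are rounded.
approx_counts = {label: round(TOTAL * pct / 100) for label, pct in shares.items()}

# The one count stated in the text (19 articles) reproduces its reported share.
third_party_share = round(19 / TOTAL * 100, 1)

for label, count in approx_counts.items():
    print(f"{label}: about {count} of {TOTAL} articles")
```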
However, in recent years, research has started to focus more extensively on this methodological issue, and we found 19 articles (8.6%) that designed their own method to assign gender, either from scratch or by combining previous methodological approaches. In this last group we find exemplary cases of transparent, robust and replicable reporting on gender assignment. Ma et al. (2023) recognize the challenge of assigning gender to names and the binary nature of such assignment, and develop a method to assign gender in their dataset. Fell and König (2016) include a step-by-step validation of their initial results. El-Ouahi and Larivière (2023) dedicate an appendix to discussing their gender assignment method. Chan and Torgler (2020) include a detailed account of the combination of methods used in their supplementary material.
Devoting time and space to explaining the gender assignment process is not only feasible but essential for understanding its caveats and for critically contrasting findings with previous research. We therefore call for transparency when reporting gender assignment. Moreover, good research needs to be replicable. Providing a clear methodology and allowing replicability is of great importance for the development of science. We encourage all researchers to apply these principles to their research.
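One way to put this call into practice is to publish a small machine-readable record alongside the methods section. The sketch below is only an illustration: the class name `GenderAssignmentReport`, its fields and all example values are assumptions, not an established reporting standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class GenderAssignmentReport:
    """Hypothetical summary of how gender was assigned in a study."""
    method: str                   # e.g. manual, secondary data, third-party tool, own method
    tool_and_version: str         # name and version of any tool used
    name_list_sources: List[str]  # where the name-gender lists come from
    input_fields: List[str]       # e.g. given name, surname, country
    categories: List[str]         # labels the method can output
    probability_threshold: float  # minimum confidence needed to assign a label
    n_authors: int                # authors in the dataset
    n_assigned: int               # authors that received a label
    validation: str               # how the results were checked

    def coverage(self) -> float:
        """Share of authors that received a gender label."""
        return self.n_assigned / self.n_authors if self.n_authors else 0.0


# Example values are invented for illustration.
report = GenderAssignmentReport(
    method="third-party tool",
    tool_and_version="hypothetical-tool 1.0",
    name_list_sources=["source not disclosed by provider"],
    input_fields=["given name", "country"],
    categories=["female", "male", "unknown"],
    probability_threshold=0.9,
    n_authors=1000,
    n_assigned=820,
    validation="manual check of a random sample of 100 authors",
)

report_json = json.dumps(asdict(report), indent=2)
print(report_json)
```

A record of this kind makes unreported choices visible, for example an undisclosed name list or a low coverage rate for some countries.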
Notes
Footnote 2. Web of Science started including authors’ full names in 2007 (they have been searchable in the database since 2011), and Scopus announced their inclusion in 2022. In addition, most bibliometric databases now include their own researcher identifier.
Footnote 3. They may also lead to inaccurate findings (e.g., Andrea is commonly assigned to men in Italy and to women in Spain).
References
Chan, H. F., & Torgler, B. (2020). Gender differences in performance of top cited scientists by field and country. Scientometrics, 125(3), 2421–2447. https://doi.org/10.1007/s11192-020-03733-w
El-Ouahi, J., & Larivière, V. (2023). On the lack of women researchers in the Middle East and North Africa. Scientometrics, 128(8), 4321–4348. https://doi.org/10.1007/s11192-023-04768-5
Fell, C. B., & König, C. J. (2016). Is there a gender difference in scientific collaboration? A scientometric examination of co-authorships among industrial–organizational psychologists. Scientometrics, 108(1), 113–141. https://doi.org/10.1007/s11192-016-1967-5
Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., & Strohmaier, M. (2016). Inferring gender from names on the web: A comparative evaluation of gender detection methods. Proceedings of the 25th International Conference Companion on World Wide Web. https://doi.org/10.1145/2872518.2889385
Lindqvist, A., Sendén, M. G., & Renström, E. A. (2021). What is gender, anyway: A review of the options for operationalising gender. Psychology & Sexuality, 12(4), 332–344. https://doi.org/10.1080/19419899.2020.1729844
Ma, Y., Teng, Y., Deng, Z., Liu, L., & Zhang, Y. (2023). Does writing style affect gender differences in the research performance of articles? An empirical study of BERT-based textual sentiment analysis. Scientometrics, 128(4), 2105–2143. https://doi.org/10.1007/s11192-023-04666-w
Mihaljević, H., Tullney, M., Santamaría, L., & Steinfeldt, C. (2019). Reflections on gender analyses of bibliographic corpora. Frontiers in Big Data. https://doi.org/10.3389/fdata.2019.00029
Mryglod, O., Nazarovets, S., & Kozmenko, S. (2023). Peculiarities of gender disambiguation and ordering of non-English authors’ names for Economic papers beyond core databases. Journal of Data and Information Science, 8(1), 72–89. https://doi.org/10.2478/jdis-2023-0001
Rasmussen, K. C., Maier, E., Strauss, B. E., Durbin, M., Riesbeck, L., Wallach, A., Zamloot, V., & Erena, A. (2019). The nonbinary fraction: Looking towards the future of gender equity in astronomy. arXiv. https://doi.org/10.48550/arXiv.1907.04893
Ribarovska, A. K., Hutchinson, M. R., Pittman, Q. J., Pariante, C., & Spencer, S. J. (2021). Gender inequality in publishing during the COVID-19 pandemic. Brain, Behavior, and Immunity, 91, 1–3. https://doi.org/10.1016/j.bbi.2020.11.022
Sugimoto, C. R., & Larivière, V. (2023). Equity for women in science: Dismantling systemic barriers to advancement. Harvard University Press.
Tekles, A., & Bornmann, L. (2020). Author name disambiguation of bibliometric data: A comparison of several unsupervised approaches. Quantitative Science Studies, 1(4), 1510–1528. https://doi.org/10.1162/qss_a_00081
Funding
This work was funded by the Ministerio de Universidades (FPU21/02320, Elvira González-Salmón) and by the Ministerio de Ciencia e Innovación (MCIN/AEI/https://doi.org/10.13039/501100011033, FSE invierte en tu futuro, Nicolas Robinson-Garcia; PID2020-117007RA-I00, Nicolas Robinson-Garcia).
Ethics declarations
Conflict of interest
Nicolas Robinson-García, co-author of the article, is an associate editor in Scientometrics.
About this article
Cite this article
González-Salmón, E., Robinson-Garcia, N. A call for transparency in gender assignment approaches. Scientometrics 129, 2451–2454 (2024). https://doi.org/10.1007/s11192-024-04995-4
DOI: https://doi.org/10.1007/s11192-024-04995-4