1 Introduction

Team diversity refers to the individual differences between members of a team. It can be present on various dimensions such as value diversity (e.g., beliefs, goals, values), information diversity (e.g., experience, knowledge, background), and social diversity (e.g., gender, age, race) (Jehn et al. 1999). Inspired by the social diversity dimension (Jehn et al. 1999), in this paper we define “perceived diversity” as the perceived internal diversity aspects that individuals are born with (e.g., gender, age, race, and nationality). Perceiving these aspects in a person can activate the prejudices, stereotypes, or biases that other individuals might hold against that person (Evans 2003; Heiniger and Mercie 2018; Bertrand and Mullainathan 2004). For example, when person A believes that they have perceived these diversity aspects in person B, this might activate whatever biases person A holds about people with those diversity aspects.

Some of the problems associated with diversity in working teams can be explained by the Similarity-Attraction theory and the social categorization perspective. The Similarity-Attraction theory postulates that individuals working in groups prefer working with others similar to them (Byrne 1971). The social categorization perspective predicts that group members are more prone to like, trust, and co-operate with similar others (Homan et al. 2007). Many companies are aware of the lack of diversity in their organizations, and this has resulted in a wave of efforts to increase the diversity of employees in tech organizations worldwide.

The diversity of a team is essential beyond ethical reasons. In the past decades, diversity has increasingly been recognized as an essential feature of a team (Page Scott 2007). For example, without gender diversity, teams may focus more on doing things faster and less on doing new things (Østergaard et al. 2011); and without race or nationality diversity, teams might not benefit from multiple points of view, availability of knowledge and skills, and constructive conflict (Shachaf 2008). Since Software Engineering (SE) activities involve teams of developers, it is interesting to understand how members of SE teams interact with their similar and dissimilar peers when developing software products.

Recent studies have shown that SE teams have problems associated with perceived diversity aspects in both industrial and Open Source Software (OSS) environments. Blincoe et al. found that 12% of men working in IT teams admitted to having a conscious bias against women [SLR(Blincoe et al. 2019)]. Davidson et al. found that OSS contributors have witnessed discrimination towards others, especially against non-native English speakers and women [SLR(Davidson et al. 2014)].

However, while researchers have made progress in showing that gender diversity increases innovation and productivity (Østergaard et al. 2011; Tourani et al. 2017) [SLR(Vasilescu et al. 2015)], reduces turnover and conflict within teams [SLR(Vasilescu et al. 2015)], and produces more user-friendly software [SLR(Burnett et al. 2016)], comparatively less research has examined other perceived diversity aspects. Hence, through a Systematic Literature Review (SLR) we analyze the previous studies that have been published until May 2020 on the topic of perceived diversity in SE. Our aim is to identify what has been studied and discovered related to perceived diversity aspects in SE, what means have been proposed to mitigate perceived diversity issues, and what needs to be done in SE to increase the knowledge about perceived diversity. Our results help researchers be aware of all the work that has been done and all the work that needs to be done so that they take further actions against these perceived diversity issues. Also, our results may help practitioners to identify which tools and methods they can use to make effective decisions on assessing the inclusivity and diversity of both their teams and their products.

Five literature reviews of some perceived diversity aspects in SE precede our SLR. Three reviews focused on gender diversity (Canedo et al. 2019; Spichkova et al. 2017; Silveira and Prikladnicki 2019), one on cultural diversity (Fazli and Bittner 2017), and one on the characteristics of diversity in SE (Menezes and Prikladnicki 2018). This last literature review is similar to ours, but it was a work in progress that included only 29 studies. Our SLR differs from the previous reviews because (i) we analyze studies that address not only gender and cultural diversity (in our SLR, culture is a sub-dimension of nationality) but also other perceived diversity aspects (e.g., age and race), and (ii) we summarize the outcomes studied and identify gaps in the literature. We make four significant contributions by presenting:

  • A summary of 131 studies addressing perceived diversity in SE. Researchers in SE can use these studies as the basis of future investigations into perceived diversity.

  • A subset of 41 studies on inclusivity that report tools, models, and practices proposed to help assess perceived diversity in SE. Researchers and practitioners can use these results to assess perceived diversity in their teams.

  • An identification of important limitations and threats to validity when analyzing perceived diversity. Researchers studying perceived diversity can use this outcome to mitigate or avoid known threats in their studies.

  • An identification of gaps in the current perceived diversity research that suggests areas such as race for further investigation. Researchers might use the gaps identified to carry out further research.

This paper is organized as follows. Section 2 presents the perceived diversity background and previous related SLR studies published in the SE field. In Section 3, we present our SLR methodology, research selection criteria, and process. Section 4 shows the results of applying our assessment criteria to 131 studies and synthesizes our results. Section 5 discusses the results associated with the different perceived diversity aspects. Section 6 identifies the threats to the validity of this SLR. Finally, in Section 7 we summarize and present our conclusions.

2 Background and Related Work

2.1 Perceived Diversity Aspects

Software Engineering (SE) concerns not only the technical aspects of how to build and develop software but also human aspects (Lenberg et al. 2015), as software products are created by developers (Hongyun et al. 2009).

When software developers work in face-to-face teams, they can perceive diversity aspects (e.g., gender, race, and nationality) of other members of the team. Similarly, when software developers work in online teams, they can infer the perceived diversity aspects of others based on the developers’ names, photos, pronouns, English fluency in their comments, and their online profiles. For example, developers in online communities are aware of the gender, ethnicity, and age of most of their team members [SLR(Vasilescu and Filkov 2015)].

For this systematic literature review, we considered four perceived diversity aspects that we believe to be the most relevant for the Software Engineering community. Gender is related to an individual’s own gender identity, typically as man, woman, or non-binary (Usher 2006). In our SLR, gender-related studies encompass topics such as gender identity and gender perception. Age is the biological age of people. In our SLR, age-related papers study the age of developers. Race is a social construct used to categorize diverse populations; it is linked with physical characteristics such as skin color. In our SLR, race-related papers encompass previous studies that analyzed different racial and ethnic groups such as Black, Hispanic, Asian Pacific Islander, or White, among others. Nationality is the country of origin, language, or culture that characterizes social groups. The nationality papers in this SLR study the communication between teams, stereotypes that can trigger bias against developers, and national beliefs that can influence SE practices and practitioners.

2.2 Related Work

Prior to our SLR, five literature reviews analyzed perceived diversity aspects in SE.

Canedo et al. (2019) conducted a systematic literature review in the context of Open Source Software (OSS) communities. Their study analyzed 24 papers to find factors that could help increase the engagement of women in contributing to OSS. Their SLR identifies some factors that cause a lack of interest among women, and some possible solutions. The main findings of Canedo et al.’s SLR (2019) indicate that women are underrepresented in the OSS community, accounting for less than 10% of the total developers. Furthermore, they stated that the reason for this under-representation may be associated with women’s workplace conditions, which may support men’s gender bias. Our SLR expands Canedo et al.’s SLR (2019) because it focuses not only on women and OSS but on four perceived diversity dimensions in OSS and industry.

Spichkova et al. (2017) conducted a literature review of gender diversity aspects within the field of Software Architecture. The authors only found one paper published in this field. Therefore, their results indicated a big gap in Software Architecture literature as the majority of the publications on gender diversity aspects within SE were not focused on this field.

Silveira and Prikladnicki (2019) conducted a systematic mapping study in the context of SE and Agile Methodologies to identify how diversity is discussed in SE. They identified 221 qualified papers in their systematic mapping. These qualified papers studied Gender (129), LGBTQI (2), Age (10), Race (7), Cultural (67), and Disabilities (1). Our SLR differs from the systematic mapping because our study explains the outcomes of the papers in more depth. While Silveira and Prikladnicki (2019) report the frequency of papers per year, per diversity aspect, and per conference, we study the outcomes, the methods and tools proposed, whether these studies show any type of bias or inclusivity-efforts, and the limitations and threats found in previous studies. Furthermore, we investigated why Silveira and Prikladnicki’s (2019) systematic mapping identifies 221 qualified papers whereas our SLR identifies 131 qualified papers. After comparing both outcomes, we noticed that some of the studies in Silveira and Prikladnicki (2019) did not meet the inclusion criteria of our SLR because the publications were not specifically from SE journals or conferences. Another difference is that the systematic mapping includes papers related to software management and agile methodologies, but our SLR does not.

Fazli and Bittner (2017) conducted a systematic literature review to identify the impact of national cultural factors on collaborative software development approaches. Fazli and Bittner’s SLR (2017) analyzes 20 papers, and its results indicate that there are differences in communication, interaction, and decision-making during collaborations. Such differences may cause problems in distributed projects because of cultural ignorance issues.

The closest study to ours is the SLR by Menezes and Prikladnicki (2018), which aims to analyze the characteristics of diversity in SE through a systematic literature review. Their SLR included 29 papers, and their findings relate the types of perceived diversity to an SE domain. Our SLR includes the 29 papers from Menezes and Prikladnicki’s SLR (2018) and goes further to identify not only the SE domain, but also the type of study and the tools, methods, and processes proposed. Furthermore, our SLR also summarizes the outcomes studied so far.

3 Systematic Literature Review Methodology

We conducted a Systematic Literature Review to review the literature on the perceived diversity in Software Engineering. We followed the SLR approach identified by Kitchenham and Charters (2007). This approach presents appropriate guidelines that have been derived from guidelines in medical research and adjusted to suit software engineering. Besides these guidelines, we also followed the structure of Hall et al.’s work (2011) for conducting the review and presenting the results.

All the steps of our SLR (i.e., research questions, inclusion and exclusion criteria, and research selection process) are documented in this section. The research questions help us formulate the aim of this systematic review. The inclusion and exclusion criteria help us assess each potential study. The research selection process helps us find as many studies relating to the research questions as possible. The related data are available online for further validation and replication.

3.1 Research Questions

This SLR aims to summarize the existing evidence concerning the perceived diversity in Software Engineering. Our purpose is to help researchers and practitioners to identify what has been studied so far, what has been proposed to help foster perceived diversity in SE, and what are the limitations and threats faced in previous SE perceived diversity studies.

  • RQ1: What are the types of perceived diversity research studies in SE? Motivation: With this question, our goal is to provide an overview of the state of the art of perceived diversity in SE and to help researchers and practitioners better understand what has been studied and what the results of these studies are. Based on the results from RQ1, we can identify gaps in the current literature and create a call for future action on perceived diversity in SE. To that end, we identified the frequency of the papers on each perceived diversity aspect per year, the most frequent venues that publish perceived diversity studies, and the number of papers published for each perceived diversity aspect. Our primary contribution in this RQ is the summary of the outcomes for each perceived diversity aspect in SE.

  • RQ2: What has perceived diversity research proposed to foster diversity-inclusiveness in SE? Motivation: With this question, we want to identify and describe the tools, methods, and practices proposed by previous studies. To that end, we identified the papers that describe and prove inclusivity efforts and summarized their outcomes. Researchers can use these results to further study perceived diversity issues. Furthermore, practitioners may use these results to assess or foster diversity and inclusiveness in SE.

  • RQ3: What are the challenges faced by SE researchers when studying perceived diversity? Motivation: With this question, we aim to help researchers identify the threats to validity and limitations that previous studies have faced when studying perceived diversity in SE. That way, researchers examining perceived diversity can easily identify whether their study presents known threats to validity. To that end, we summarized the threats and limitations sections of the papers included in the SLR.

3.2 Inclusion and Exclusion Criteria

The inclusion criteria for studies to be included in our SLR are:

  • We included studies that analyze any perceived diversity aspect related to SE.

  • We included studies that analyze any perceived diversity aspect related to SE in student teams that simulate SE activities.

  • We included studies that are written in English.

  • We included studies published as journal papers, conference proceedings, or workshop proceedings.

  • We included studies published in computer science venues.

  • We included studies published only in mature venues where SE papers are typically published.

  • We included peer-reviewed studies.

The exclusion criteria for our SLR are:

  • We removed book chapters, work-in-progress papers, posters, and Master’s or Ph.D. theses because they may be published as papers as well and we did not want to double count.

  • We removed short versions of long version papers to mitigate duplicate results.

  • We removed discussion papers, i.e., those papers studying the perceived diversity in a psychological, social, or philosophical context because they were not related to SE activities.

  • We removed software engineering education papers that describe studies to increase the gender diversity in SE courses. For example, papers that explained the results of implementing certain criteria, frameworks, models, or practices at Universities. We are aware that this is an important discussion that we are not including in this SLR and that should be analyzed in future research, but this SLR focuses on perceived diversity aspects when engineering software instead of when studying software engineering.

  • We removed papers that analyze perceived diversity from a software management perspective (i.e., planning, scheduling, resource allocation, execution, delivery) or a team management perspective (i.e., communication) because these papers were not related to perceived diversity in software development.

  • We removed papers that describe machine learning techniques for identifying some perceived diversity aspect in people because they are not related to SE per se.

  • We also removed literature review papers because they are synthesizing the results from previous studies.

3.3 Research Selection Process

We searched the IEEE database and Google Scholar. We used Google Scholar because it has high accuracy in locating citations and provides more results than other search engine databases. We also used the IEEE database because it is a solid academic source. To identify the tentative articles, we used the following search string. We limited the search to the title only:

[Search string: shown in figures d and e, not reproduced in this text]

The search in Google Scholar and the IEEE database resulted in a total of 5671 papers. During the cleaning process, we removed 2477 entries because they were duplicated entries, Master’s and Ph.D. thesis entries, unrelated entries (e.g., keynotes, panels, welcome messages), or entries from venues unrelated to software or engineering computing. During the first iteration, we read the title and the venue of publication and removed 2822 entries based on the exclusion criteria. When the title was insufficient to decide, we looked over the abstract. During the second iteration, we read the abstracts of the tentative studies and removed 138 papers based on the exclusion criteria. When the abstract was insufficient to decide, we looked over the paper. In the third iteration, we examined the tentative papers thoroughly and removed 168 papers based on the exclusion criteria. In the fourth iteration, we investigated the references of the resulting set of publications (a process known as snowballing). This iteration was repeated until no new relevant studies were found. We added 51 new papers to the set of publications. In the fifth iteration, we identified the prolific authors in perceived diversity aspects from the set of publications and contacted them to assess the 117 publications. Based on the prolific authors’ recommendations, we added another 14 papers to our SLR list. Note that 8 of the 14 recommended papers were published after May 2020; therefore, they were not found in our first search. Table 1 presents the number of tentative papers after each iteration.
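As a rough sanity check, the counts reported in the selection process can be replayed in a short Python sketch. The step labels are shorthand introduced here for illustration, and the intermediate totals are derived from the reported numbers rather than copied from Table 1.

```python
# Counts reported in the selection process; labels are illustrative shorthand.
initial_hits = 5671
steps = [
    ("cleaning (duplicates, theses, unrelated entries)", -2477),
    ("iteration 1: title and venue screening", -2822),
    ("iteration 2: abstract screening", -138),
    ("iteration 3: full-text screening", -168),
    ("iteration 4: snowballing", +51),
    ("iteration 5: prolific authors' recommendations", +14),
]


def running_totals(start, deltas):
    """Return (label, papers remaining) after applying each step in order."""
    totals = []
    current = start
    for label, delta in deltas:
        current += delta
        totals.append((label, current))
    return totals


totals = running_totals(initial_hits, steps)
for label, remaining in totals:
    print(f"{label}: {remaining}")
```

Under these assumptions, the set holds 117 papers after snowballing and 131 after the fifth iteration, which matches the 117 publications sent to the prolific authors and the 131 qualified papers analyzed in the results.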

Table 1 Resulting set of publications after each iteration

Until the fourth iteration, the first and second authors of this paper divided the resulting set of publications in half and excluded papers based on the exclusion and inclusion criteria. When the first or second author had doubts about the inclusion or exclusion of a paper, they annotated it as an undecided paper. Then, both authors discussed the undecided papers with each other until they reached an agreement to include or exclude these papers. In the fifth iteration, the first and third authors discussed the prolific authors’ recommendations until they reached an agreement. These procedures help improve the reliability of the results.

4 Results

We first describe the data that we extracted from the 131 qualified papers. We then descriptively synthesize the data with a quantitative summary.

Demographic Data

The first paper studying perceived diversity in our dataset of 131 papers was published in 2003. The 131 papers were published by more than 490 unique authors from different countries within Africa, Australia, Asia, Europe, North America, and South America. Figure 1 shows a yearly trend of the paper counts grouped by the perceived diversity aspects. We can observe an increasing trend of studies in perceived diversity aspects from 2013 with peaks in 2018 and 2019.

Fig. 1 Yearly trend of the paper counts grouped by the perceived diversity aspects

Conferences were the most common publication type (65%), followed by journals (28%) and workshops (7%). The three conferences with the most publications were: “IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)” (15%), “International Conference on Software Engineering (ICSE)” (13%), and “ACM Conference on Human Factors in Computing Systems (CHI)” (12%). The journal publications were more widely spread across different journals. The three journals with the most publications were: “IEEE Software” (30%), “IEEE Transactions on Software Engineering (TSE)” (13%), and “Empirical Software Engineering (EMSE)” (8%). Finally, the workshop with the highest number of publications was: “ICSE-Workshop on Gender Equality in Software Engineering” (67%).

SE knowledge area

We classified each paper based on the knowledge area that the authors addressed. To identify the knowledge area, we used the “Software Engineering Body of Knowledge (SWEBOK)” framework (Bourque et al. 2014). Table 2 shows the descriptions of the SWEBOK knowledge areas that we found in the papers and the percentage of papers within each knowledge area.

Table 2 Description and percentages of the knowledge area studied in the 131 papers included in our SLR according to the SWEBOK framework

Purpose data

We identified two main purposes among the 131 papers: (1) papers that show perceived diversity biases or differences in SE, and (2) papers that describe or present inclusivity-efforts to tackle perceived diversity in SE. While 69% of the papers show perceived diversity bias or differences in SE activities, only 31% of the papers describe inclusivity-efforts to assess the perceived diversity in SE.

Research method of studies

We classified the 131 papers based on the research method used, following the five most relevant empirical research methods described by Easterbrook et al. (2008). In addition to these five methods, we added two more categories (“Mixed study” and “Not applicable”) because some papers cannot be easily classified into the five methods. The description of each research method and the percentage of papers that fall into each category are presented in Table 3.

Table 3 Description of the research method used on the studies included in our SLR

Perceived diversity aspects

We classified the papers based on the four perceived diversity aspects: Gender, Race, Age, and Nationality.

Gender diversity was the most studied dimension, with 61% of the papers. These papers encompass gender identity studies, gender perception studies, and studies on transgender developers. Nationality diversity was the second most studied dimension, with 10% of the papers. Nationality papers analyze the communication between teams, stereotypes that can trigger bias against developers, and cultural/national beliefs that can influence SE practices and practitioners. The third most studied dimension was age diversity, with 8% of the papers. Finally, race diversity studies accounted for 2%. The remaining 19% of the papers studied a mixture of perceived diversity aspects; the most common combinations were Gender-Age (6%), Gender-Race (3%), and Gender-Nationality (3%).
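The shares reported above can be checked for consistency with a short Python sketch. The approximate paper counts it derives are an assumption: they come from rounding the reported percentages against the 131 papers and are not counts stated in this section.

```python
# Shares (in percent) of the 131 papers per perceived diversity aspect.
TOTAL_PAPERS = 131
shares = {
    "Gender": 61,
    "Nationality": 10,
    "Age": 8,
    "Race": 2,
    "Mixed aspects": 19,
}

# The categories are mutually exclusive, so the shares should add up to 100%.
total_share = sum(shares.values())

# Approximate counts only: the percentages are already rounded in the text.
approx_counts = {
    aspect: round(TOTAL_PAPERS * pct / 100) for aspect, pct in shares.items()
}

print(f"Total share: {total_share}%")
for aspect, count in approx_counts.items():
    print(f"{aspect}: about {count} papers")
```

Because the percentages are rounded, the derived counts should be read as estimates; here they happen to add up to 131.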

Outcome Data

We extracted the main results of the papers, the threats to validity related to perceived diversity, and, where possible, the dependent and independent variables used in the studies. Figure 2 shows the paper counts grouped by the SWEBOK activity and the study methodology used within the different perceived diversity aspects. Most of the papers in our SLR address gender diversity; are case studies, survey studies, or controlled experiments; and fall under either the professional practice knowledge area or no clearly defined SWEBOK knowledge area (not applicable).

Fig. 2 Perceived diversity aspects grouped by the SWEBOK activity (right) and the study methodology (left) used in the 131 papers from our SLR

Table 4 presents the synthesis of the 131 papers based on the data extracted previously. Each paper is classified based on the diversity type, the research methodology, and the purpose. Note that we did not use a statistical meta-analysis method to combine numerical results because the heterogeneous nature of the 131 papers makes a meta-analysis difficult to carry out.

Table 4 Synthesis of the social data from the 131 papers

4.1 RQ1: What are the Types of Perceived Diversity Research Studies in SE?

Since the findings from the 131 papers are very broad, we present what has been studied and discovered about perceived diversity in SE in different sections. These sections summarize the main findings and report (1) differences within the perceived diversity aspects; (2) relationships between the perceived diversity aspects and SE metrics; (3) SE practitioners’ perceptions about diversity aspects; and (4) challenges, barriers, and motivations faced by SE participants within the perceived diversity aspects.

4.1.1 Differences Within the Perceived Diversity Aspects

From the 131 studies in our SLR, we found results related to differences in gender, nationality, age, and race.

Gender differences

We report gender differences between women and men because the studies included in this SLR only reported differences based on these two genders. Findings from previous studies show both statistically significant and non-significant gender differences between men and women in different fields. We first describe the findings related to statistically significant gender differences and then the findings related to non-significant gender differences.

Statistically significant gender differences

The studies from this SLR report gender differences in how developers solve problems [SLR(Beckwith et al. 2006)], how developers tinker [SLR(Beckwith et al. 2006)], how developers use debugging strategies [SLR(Grigoreanu et al. 2009)], and when developers use those strategies successfully [SLR(Grigoreanu et al. 2009)]. For example, findings show that (1) there are significant gender differences in the features developers elect to use and in their willingness to tinker and explore features [SLR(Burnett et al. 2010)], (2) men tend to switch more frequently between debugging strategies [SLR(Cao et al. 2010)], (3) women tend toward underconfidence and men tend to use more unfamiliar software features [SLR(Beckwith et al. 2005)], (4) software environments are often aligned with the needs of men rather than women [SLR(Beckwith et al. 2005)], and (5) some end-user tools for debugging may not fully support women’s debugging strategies [SLR(Subrahmaniyan et al. 2008)].

Similarly, studies reporting gender differences in code review practices indicate that (1) accepted pull requests submitted by women and men provide descriptions of similar length and generate a similar number of discussions [SLR(Imtiaz et al. 2019)], (2) when gender is known, women tend to have their contributions accepted more often than men when the contributors are insiders to a project, but men’s acceptance rates are higher when the contributors are outsiders to a project [SLR(Terrell et al. 2017)], (3) while women concentrate their work across fewer projects and organizations, men contribute to a higher number of projects and organizations [SLR(Imtiaz et al. 2019)], (4) there are gender differences in the use of positive opinion words, emoticons, and expletives during code review: men tend to express more positive or negative sentiments, whereas women tend to write more neutral comments rather than express strong sentiments [SLR(Paul et al. 2019)], and (5) men and women follow different comprehension strategies when reading source code [SLR(Zohreh Sharafi et al. 2012)].

There are also gender differences in pair programming related to coordination, communication, and collaboration. Same-gender pairs tend to be democratic, but mixed-gender pairs tend to have one authoritarian partner, and women preferred women partners [SLR(Kaur Kuttal et al. 2019)]. However, the findings indicate that productivity in pair programming is not affected by differences in gender pair types [SLR(Gómez et al. 2017)].

Finally, there are gender differences within the OSS community and the Stack Overflow (SO) community. OSS studies found that (1) there is gender bias in the authorship of contributions to OSS projects over the past 50 years: men have contributed more than 92% of OSS code in that period, and women reached 10% of the yearly contributions in 2019 [SLR(Zacchiroli 2020)], and (2) the perceptions and preferences of women and men developers on GitHub differ statistically significantly: women developers significantly prefer in-person communication and are more frequently aware of the gender, ethnicity, and age of most of their team members [SLR(Vasilescu and Filkov 2015)]. Studies on SO reported that (1) men and women posted equally often on that platform, but men contributed a statistically significantly higher number of posts [SLR(Kuechler et al. 2012)], and (2) there are significant gender differences in participation and success on SO: while women ask more questions, men obtain more votes because they answer more questions [SLR(Wang 2018),(May et al. 2019)].

No statistically significant gender differences

Findings indicate that there are no statistically significant gender differences between men and women in terms of (1) the ability to learn the new features when debugging [SLR(Beckwith et al. 2007)], (2) the difficulty of the strategies used for program comprehension [SLR(Fisher et al. 2006)], (3) the time using debugging strategies [SLR(Cao et al. 2010)], (4) the individual productivity in OSS projects [SLR(Bosu and Sultana 2019)], (5) accuracy, required time, and effort in source code reading [SLR(Zohreh Sharafi et al. 2012)], and (6) timing of the review, content category, and character length in the mobile app’s feedback [SLR(Guzman and Rojas 2019)].

Nationality differences

The results from the SLR indicate nationality differences in academic, industrial, and OSS contexts. Findings in the academic context reported that while student teams formed by homogeneous cultures perform better initially, student teams formed by heterogeneous cultures perform better in the long term [SLR(Anderson et al. 2019)]. In an industrial context, the national differences between two vendor teams of the same organization were responsible for the differences in their software-testing approaches (concerning the testing-team structure, thought process, expectations, primary focus, and trust levels) [SLR(Shah and Harrold 2013)]. Finally, in an OSS context, GitHub contributors from countries with a low human development index (HDI) face more rejections than contributors from higher-HDI countries [SLR(Furtado et al. 2020)].

Age differences

The results from the SLR indicate that some developers tend to use stereotypes and do not include older populations, and that older adults prefer to participate as consultants because they may face some confidence barriers [SLR(Kopeć et al. 2018)]. In SO, users’ reputation scores increase with age well into the 50s, and users in their 30s tend to focus on fewer topics than both younger and older users [SLR(Morrison and Murphy-Hill 2013)].

Racial differences

The results from the SLR indicate that there are no traces of explicit racism in the written comments left by GitHub developers in code review practices. However, while the pull requests from perceptibly non-white developers tend to be closed without being merged and without a stated reason, the pull requests from perceptibly white developers tend to receive a reason [SLR(Nadri et al. 2020)].

4.1.2 Relationships Between the Perceived Diversity Aspects and SE Metrics

We have identified the relationship between dependent and independent variables that have been studied in the papers and summarized the empirical evidence, i.e., the hypothesis tested, assumptions, or bias, that have been reported so far with relation to the perceived diversity aspects. The independent variables are the different perceived diversity aspects. The dependent variables are related to SE metrics.

Gender

The studies have reported negative and positive relationships between gender diversity in SE and SE metrics in different online communities and industrial workplaces. We first describe the results related to online communities and then present the results related to industrial workplaces.

Online communities

Studies analyzing the SO community indicate that (1) the scoring SO uses is designed to reward practices more common among men (posting and answering questions) than among other people [SLR(Vasilescu et al. 2014)], (2) women who encountered other women in SO were more likely to engage sooner than those who did not [SLR(Ford et al. 2017)], and (3) it is uncommon for a woman who posts a question to receive an answer or comment from another woman, but most interaction among women occurs when the post is initiated by a woman [SLR(Morgan 2017)]. Similarly, studies analyzing the GitHub community indicate that (1) women are more likely to continue participating in a project when the diversity of the team members’ expertise is higher [SLR(Qiu et al. 2019)], (2) the presence of women in development teams generally reduces community smells [SLR(Catolino et al. 2019b)], (3) gender diversity has a significant positive effect on productivity [SLR(Vasilescu et al. 2015)], and (4) developers performing code reviews spent the majority of their time reviewing the source code in a pull request, but they also spent a considerable amount of time looking at technical and social signals of the pull requester [SLR(Ford et al. 2019)]. Finally, studies analyzing gender diversity in online communities generally state that (1) the majority of developers in library OSS are men between 30 and 49 years old [SLR(Choi and Pruett 2015)], (2) women’s participation and leadership in OSS are not representative of the true population, as less than 10% of the developers contributing to OSS are perceived as women [SLR(Bosu and Sultana 2019)], (3) women participate more in library OSS than in general OSS projects [SLR(Choi and Pruett 2015)], (4) women start to participate in FLOSS projects at a later age than men, mainly with tasks other than coding, and have different reasons to start and remain contributing to FLOSS projects [SLR(Robles et al. 2016)], and (5) women are often hesitant to make contributions to a new project in GitHub [SLR(Wang et al. 2018)].

Industrial workplace

Studies analyzing industrial workplaces indicate that (1) the gender of software engineers was a statistically significant predictor of self-rated productivity, as software engineers who identified themselves as women and who chose to write in their gender identity reported significantly higher self-rated productivity than men [SLR(Murphy-Hill et al. 2019)], (2) gender diversity has a significant positive effect on performance [SLR(Gila et al. 2014)], (3) while the self-perception of men end-user debuggers is positively correlated with their debugging performance, the self-perception of women end-user debuggers is not [SLR(Chintakovid and Wiedenbeck 2009)], (4) women do not seem to be willing to acknowledge their own performance in software development teams, but their performance is highly regarded by their teammates [SLR(Bastarrica and Simmonds 2019)], (5) all professional software developers, both men and women, have implicit gender biases which may impact their decision-making when evaluating other developers’ contributions [SLR(Yi and Redmiles 2019)], and (6) there is a gender pay gap of 18% in a UK company in which only 11% of senior positions were held by women [SLR(Kirton and Robertson 2018)]. Finally, studies performed with students simulating industrial software teams indicate that (1) gender-diverse teams are more effective and coordinated than non-gender-diverse teams [SLR(Marques 2015)], and (2) gender alone, and cognitive style and gender together, were positively associated with feature novelty in teams, with women designing significantly more novel software features [SLR(Pretorius et al. 2020)].

Nationality

Studies from this SLR reported that diversity in the language of team members (1) has a negative impact on the quality of the work [SLR(Pieterse and van Eekelen 2018)], (2) has a negative impact on community engagement [SLR(Daniel et al. 2013)], (3) might lead to difficulties in the communication between GitHub users [SLR(Vasilescu et al. 2015)], and (4) together with the country of residence, has a positive effect on market success [SLR(Daniel et al. 2013)]. Findings related to diversity in the country of residence indicate that developers’ country (1) has a statistically significant relationship to the project’s success, but with a small effect size [SLR(Aué et al. 2016)], (2) is linked with lower team politeness [SLR(Ortu et al. 2017)], and (3) has a statistically significant relationship with the acceptance of pull requests [SLR(Rastogi et al. 2018)]. Findings related to diversity in the culture of developers indicate that (1) people working together in a team will form ties with each other despite their cultural differences [SLR(Dong et al. 2016)], (2) cultures with high levels of inequality in the hierarchical structure of software development organizations may have negative effects on forming and retaining software development teams [SLR(Borchers 2003)], and (3) the communication in distributed software development teams is negatively affected when there is high cultural diversity in the team [SLR(Casey 2009)].

Age

Studies from this SLR reported that (1) the age of code reviewers does not affect the correctness and efficiency of their reviews [SLR(Murakami et al. 2017)], (2) age diversity has a low correlation with team efficiency [SLR(Altiner and Ayhan 2018)], (3) fully involving older adults improves the process of developing apps for the older population, and (4) developers older than 40 years and with more than 15 years of experience can contribute to increasing the knowledge of software development in SO [SLR(Morrison et al. 2016)].

Race

Studies from this SLR reported that (1) non-white attendees are under-represented at the R community’s useR! conference [SLR(Bollmann et al. 2017)], (2) gender and racial stereotypes are embedded in the behavior of modern software [SLR(Brun and Meliou 2018)], and (3) racial diversity in student software development teams does not have an impact on how well the team members cooperate to develop software products [SLR(Pieterse and van Eekelen 2018)].

4.1.3 Participants’ Perceptions About Perceived Diversity in SE

We have summarized the perceptions, experiences, and reflections of participants who work on and contribute to software engineering activities. These perceptions, experiences, and reflections are related to the perceived diversity aspects and have been reported in the 131 papers.

Gender

Studies related to FLOSS communities have found that many participants express a positive sentiment towards the participation of women in FLOSS projects. However, women contributors have faced sexism or encountered sexist incidents [SLR(Lee and Carver 2019)]. In fact, the harsh and sexist treatment faced by women is said to be “as constant as it is extreme”, and harsh discussions about which piece of source code should get accepted or merged into the software lead the project to behave as a “pushyocracy” instead of a meritocracy, a prime reason why women leave these communities [SLR(Nafus 2012)]. Previous findings also report that (1) the majority of older adults claimed to have witnessed discrimination towards others in FLOSS, especially against non-native English speakers and women [SLR(Davidson et al. 2014)], (2) successful OSS women believe that having codes of conduct in the projects would help to increase and retain women and allies [SLR(Singh 2019b)], and (3) transgender developers have expressed discomfort when participating in hackathons and are concerned about LGBTQ-phobia in these events [SLR(Prado et al. 2020)]. Similarly, studies related to the industrial work environment have found that (1) women are less satisfied with the spirit of their team at their company [SLR(James et al. 2017)], (2) 12.3% of the men admitted to having a conscious bias against women and 40.2% of women have felt gender discrimination during their career [SLR(Blincoe et al. 2019)], (3) developers perceived the gender diversity of a team as being less important than developer experience or team size to mitigate community smells [SLR(Catolino et al. 2019a)], (4) gender minorities in teams formed predominantly by men feel less included when the team uses audio or visual communication channels to discuss work [SLR(Hui and Farnham 2016)], (5) transgender developers think that working remotely can foster a more inclusive environment as they can control their professional identity [SLR(Ford et al. 2019)], and (6) men or women software architects who exhibit “feminine expertise”Footnote 6 are perceived as successful in dealing with the human aspects of software architecting [SLR(Razavian and Lago 2015)].

Nationality

Studies from this SLR reported that (1) teams with perceptions of higher levels of diversity are more willing to cooperate when they are geographically dispersed [SLR(Robert 2016)], (2) intercultural competencies are not specifically important for team participants in the coding phase of the software product, but are seen as very important in tasks with high levels of communication [SLR(Holtkamp et al. 2015)], and (3) some cultures can lead to strong stereotypical beliefs that challenge women in teams dominated by men; these women have perceived difficulties in having their work recognized, proving themselves, and feeling like members of their team [SLR(Adikaram and Wijayawardena 2015)].

Age

Studies from this SLR reported that (1) software testing teams think that the age diversity of the team is not an important factor for improving the performance when compared with personality diversity and experience diversity [SLR(Kanij et al. 2011)], and (2) developers’ performance expectations in China, Germany, Poland, and Bulgaria are biased towards middle-aged employees over younger and older employees. These developers have negative age stereotypes toward older or younger employees depending on the country [SLR(Schloegel et al. 2018)].

Race

Studies from this SLR reported that Black women have experienced complete isolation in the field of computer science and they do not know if this negative experience is because of their gender or their race [SLR(Thomas et al. 2018)].

4.1.4 Challenges, Motivations, and Barriers Faced by SE Participants Within the Perceived Diversity Aspects

We have summarized the challenges, motivations, and barriers of participants who work on and contribute to software engineering activities. These challenges, motivations, and barriers are related to the perceived diversity aspects and have been reported in the 131 papers.

Gender

Studies focused on community-based environments have identified that (1) while the most common barriers among women in FLOSS are related to social factors such as not being taken seriously, needing to prove themselves, or finding it hard to find a mentor or attract attention, the most common barriers among men in FLOSS are related to entry factors such as the build environment, tooling, license terms, and the attitude of other participants [SLR(Lee and Carver 2019)], (2) there are five hidden barriers identified in SO that forestall women from contributing [SLR(Ford et al. 2016)], (3) newcomer women developers face gender bias related to multiple problem-solving facets in tools and infrastructure [SLR(Padala et al. 2020)]; in fact, 73% of the barriers in tools and infrastructure faced by OSS newcomers had some form of gender bias [SLR(Mendez et al. 2018)], (4) the competence-confidence gap is a threat to women’s contribution in GitHub [SLR(Wang et al. 2018)], and (5) while stereotypical web interfaces can trigger a weaker sense of belonging in women exposed to these interfaces, a neutral web interface does not negatively affect the sense of belonging of men or women [SLR(Metaxa-Kakavouli et al. 2018)]. Similarly, studies focused on industrial contexts have identified that (1) adult women in Finland were motivated to change career into the software industry because of the variety of new jobs offered, the equal wage, and a flexible work climate [SLR(Hyrynsalmi and Hyrynsalmi 2019)], (2) women face challenges related to their self-esteem when talking about their skills [SLR(Hyrynsalmi and Hyrynsalmi 2019; Hyrynsalmi 2019)] and find the male-dominated industry sometimes challenging [SLR(Hyrynsalmi 2019)], and (3) women and men are affected by different types of interruptions during social isolation. While most of the challenges faced by women are related to support with housework and child care responsibilities, most of the challenges faced by men are related to the work space [SLR(Machado et al. 2020)].

Nationality

GitHub developers have reported challenges related to social barriers, i.e., language, economic reasons, or political reasons, in the communication channels they use for work in GitHub [SLR(Storey et al. 2016)]. These challenges have also been reported by mentors in OSS as the main barriers faced by their newcomer mentees [SLR(Balali et al. 2018)].

Age

Studies from this SLR reported that (1) developers older than 40 years and with more than 15 years of experience are less motivated to interact in SO than their younger peers [SLR(Morrison et al. 2016)], (2) the top three motivations of older adults to contribute to OSS are intrinsic motivation, community identification, and altruism [SLR(Davidson et al. 2014)], (3) the top three barriers perceived by older adults in their first time in OSS are lack of communication, installation issues, and documentation issues [SLR(Davidson et al. 2014)], (4) older adults face challenges on their first contribution and more social challenges in more recent contributions [SLR(Davidson et al. 2014)], and (5) the media in the United States has pictured ageism as a major barrier in the process of hiring developers in large software companies [SLR(Baltes et al. 2020)].

4.2 RQ2: What the Perceived Diversity Research has Proposed to Foster Diversity-Inclusiveness in SE?

To answer RQ2 we have identified the tools, processes, models, and practices reported in the papers whose purpose was Inclusivity-Efforts. These papers help to mitigate bias in SE by proposing different elements to assess the perceived diversity of teams, or by identifying perceived diversity-inclusiveness issues. The results from RQ2 can be used by both researchers in perceived diversity and practitioners willing to increase diversity.

4.2.1 Tools, Models, and Theories to Foster Perceived Diversity-Inclusiveness in SE

GenderMag

It is a software inspection method to find and address gender biases in software that affect users’ problem-solving experiences [SLR(Burnett et al. 2016; Burnett et al. 2018)]. Previous research shows that GenderMag is very effective in identifying gender-inclusiveness issues when evaluating technologies [SLR(Burnett et al. 2016; Hill et al. 2016; Vorvoreanu et al. 2019; Burnett et al. 2017)] and was found to be a powerful tool to uncover potential gender biases in system functionality and interface design [SLR(Cunningham et al. 2016)]. Although GenderMag is partially based on gendered personas, it did not promote gender stereotyping [SLR(Hill et al. 2017)]. Recently, Hilderbrand et al. have reported real-world experiences on how software teams can embed GenderMag into their development processes [SLR(Hilderbrand et al. 2020)]. To help practitioners, GenderMag has a GenderMag Recorder’s Assistant, which is a semi-automated visual tool that reduces the cognitive load for practitioners who work with GenderMag [SLR(Mendez et al. 2018)].

InclusiveMag

It is a generalization of the GenderMag method that assesses whether the software supports a particular dimension of diversity. InclusiveMag helps researchers to develop new inclusive design methods and evaluate the software product in terms of inclusiveness [SLR(Mendez et al. 2019)].

AID Tool

It is an automated detector for gender-inclusivity bugs in OSS project pages. This tool automates the GenderMag method for detecting gender-inclusivity bugs in OSS projects [SLR(Chatterjee et al. 2021)].

Themis

It is an open-source tool that helps discover and debug discrimination in different perceived diversity dimensions (race, age, and gender) in software products. This tool automatically generates tests to detect and measure discrimination [SLR(Angell et al. 2018; Galhotra et al. 2017)].
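The idea of automatically generated discrimination tests can be illustrated with a minimal sketch. It is not the Themis implementation or its API. It assumes a causality-style measure in which random inputs are generated, only the protected attribute is changed, and the score is the fraction of inputs whose decision flips. The function name `causal_discrimination_score`, the two toy decision functions, and the `DOMAINS` table are hypothetical names introduced only for illustration.

```python
import random
from typing import Callable, Dict, Sequence

Record = Dict[str, object]


def causal_discrimination_score(
    decide: Callable[[Record], bool],
    domains: Dict[str, Sequence[object]],
    protected: str,
    samples: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random inputs whose decision changes when only the
    protected attribute is altered (illustrative measure, not Themis itself)."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(samples):
        # Generate one random test input from the declared value domains.
        record = {name: rng.choice(list(values)) for name, values in domains.items()}
        baseline = decide(record)
        # Try every other value of the protected attribute, all else fixed.
        for alternative in domains[protected]:
            if alternative == record[protected]:
                continue
            if decide({**record, protected: alternative}) != baseline:
                flagged += 1
                break
    return flagged / samples


# Hypothetical systems under test (assumptions for the example only).
def biased_decision(record: Record) -> bool:
    threshold = 50 if record["gender"] == "man" else 70
    return record["income"] >= threshold


def neutral_decision(record: Record) -> bool:
    return record["income"] >= 60


DOMAINS = {
    "gender": ["man", "woman", "non-binary"],
    "income": list(range(0, 101, 10)),
}

if __name__ == "__main__":
    print(causal_discrimination_score(biased_decision, DOMAINS, "gender"))
    print(causal_discrimination_score(neutral_decision, DOMAINS, "gender"))
```

In this toy setup the neutral decision never depends on the protected attribute, so its score is zero, while the biased decision is flagged for every sampled input whose income falls between the two thresholds.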

Gender-Extended Research and Development (GERD)

It is a model that combines gender studies approaches and computer science thinking. The model identifies seven core processes or phases of research and development and aims to contribute to the more inclusive development of technology. It also encourages reflection on how the construction and use of software systems are socially embedded [SLR(Draude and Maaß 2018)].

The intergroup contact theory

This theory was proposed by Gordon W. Allport (Allport et al. 1954) and posits that increasing interactions between different groups of people can help reduce biases under some conditions. A recent study found that intergroup contact helps reduce implicit gender biases in both general and SE-specific contexts [SLR(Yi and Zhang 2020)].

Ethics-aware SE

It is a model that captures, analyzes, and reflects the ethical values of different stakeholders in the SE processes and software specifications. This model assists in the creation of ethical software development by ensuring (1) that developers create software for organizations that are aligned with their ethical values and (2) that organizations follow their ethical principles [SLR(Aydemir and Dalpiaz 2018)].

A framework to recognize fair and unfair modern code reviews

This framework is based on fairness theory and can be used to study and manage social behavior in modern code reviews [SLR(German et al. 2018)].

Support for Participant Involvement in Rapid and Agile software development Labs (SPIRAL)

SPIRAL is a method that provides strategies for the direct involvement of older adults in the development process [SLR(Kopeć et al. 2018)].

Redesign of SO reward system

The scoring SO uses is designed to reward practices more common among men than among other users. Thus, May et al. have proposed an alternative scoring system that equalizes the rewards obtained in SO and does not penalize any group of users in absolute terms. With this redesign, the median woman is marginally more successful than the median man [SLR(May et al. 2019)].

4.2.2 SE practices that Help to Foster Diversity

  1. A formal definition of software fairness testing and a causality-based measure of discrimination [SLR(Galhotra et al. 2017)];

  2. Codes of conduct and women-only spaces, which facilitate women’s engagement and retention in online communities [SLR(Singh and Brandon 2019; Singh 2019a)];

  3. Politeness, which helps to involve women in FLOSS projects [SLR(Moon 2013)];

  4. Brainstorming techniques, which help participants to feel satisfied with the team’s process. The use of these strategies supports the satisfaction and outcomes of minority participants working in a team [SLR(Filippova et al. 2017)];

  5. Interventions to reduce age stereotypes in software development: an awareness-based intervention and a cooperation-based workshop [SLR(Schloegel et al. 2016)];

  6. Ethnographic studies as a useful and usable approach to empirical software engineering research [SLR(Sharp et al. 2016)];

  7. Agile values of collaboration as a framework for addressing gender diversity issues in a team [SLR(Judy 2012)];

  8. The inclusion of two features in an end-user software development environment. The first feature provides a way to express less confident judgments and the second feature provides better explanations in the learning process of the end-user system [SLR(Beckwith et al. 2005)]. Furthermore, these features are effective in removing unintended barriers that affect the performance of women without harming men’s performance [SLR(Grigoreanu et al. 2008)];

  9. Living Labs: this approach can help women professionals in their work as it offers opportunities to share experiences, provide insights, and create social changes in a collaborative environment [SLR(Ahmadi et al. 2018)];

  10. Guidelines for encouraging older adults (> 50 years) in FOSS. These guidelines focus on providing feedback by projects and improving the communication by first-time older contributors and by projects [SLR(Davidson et al. 2014)];

  11. Two proposed approaches that can address subjects’ implicit gender biases at individual and organizational levels [SLR(Yi and Redmiles 2019)]. The first approach is to design continuous training courses in which people learn about their own implicit biases. The second approach states that software development organizations should encourage and hire women to take counter-stereotypical roles such as technical leadership;

  12. Research approaches such as perspective alignment, community trust-building, and narrative research to understand community building [SLR(Ford 2020)];

  13. Five recommendations for gender-inclusive hackathon events: (1) have a gender-inclusive organizing team; (2) foster inclusive communication; (3) make safety visible; (4) provide good working conditions to participants; and (5) showcase transgender people in the event [SLR(Prado et al. 2020)];

  14. Participatory and interdisciplinary approaches, which can help to avoid gender discrimination in software development [SLR(Irrgang 2018)]; and

  15. Empirical SE research on intersectionality to better understand software engineering in general and obtain more realistic and diverse studies regarding culture, gender, and ethnicity [SLR(Gren 2018)].

4.3 RQ3: What are the Challenges Faced by SE Researchers when Studying Perceived Diversity?

To answer RQ3 we have identified the threats to validity and limitations reported by the authors in the papers. The results are grouped by the type of limitation or threat found and can be used by researchers on perceived diversity in SE to mitigate the possible threats to validity that their studies can have. Notice that some papers did not present a threats-to-validity section and some threats were not directly related to perceived diversity.

Students as subjects

The low number of women students in class when compared to men students can influence the results [SLR(Bastarrica and Simmonds 2019)].

Measuring cultural team diversity

One of the limitations when measuring the cultural diversity of a team with Hofstede’s framework (Hofstede 2005) is the underlying assumption that a few dimensions can explain cultural beliefs and values (Shachaf 2008). Therefore, the use of ethnographic methods to discover the cultural beliefs and values of a team may be a better approach.

Gender identification

Men and women can misrepresent their genders in their online profiles [SLR(Ford et al. 2016)]. Indeed, around 30% of the surveyed participants who masked their gender by choosing gender-neutral pseudonyms were women participants [SLR(Lee and Carver 2019)]. When this occurs, the results of studies based on labels extracted from online profiles might not be accurate [SLR(Terrell et al. 2017)]. Furthermore, the tools and heuristics used to identify the gender of developers might not give reliable information when users do not use their real names [SLR(Qiu et al. 2019)]. Another limitation is the assumption that users/developers can be classified into only three gender groups (men, women, and non-binary), and then focus the study on only two genders (men and women) [SLR(Guzman and Rojas 2019)]. Risks implied by reducing gender studies to men and women are related to the inherent marginalization of non-binary individuals [SLR(Izquierdo et al. 2018)]. Researchers should acknowledge the possible limitations in their sampling and/or gaps in the generalizability of their study if they have only included people of binary genders (Scheuerman et al. 2021).
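The labeling limitation above can be made concrete with a minimal sketch. It assumes a purely name-based heuristic backed by a tiny lookup table (`NAME_TO_GENDER`, a hypothetical stand-in for whatever tool a study uses) and shows one way to keep an explicit "unknown" bucket and report how many profiles could be labeled at all. The heuristic is binary by construction, which is exactly the restriction that the studies cited above warn about. Function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical lookup table; real studies use much larger name resources,
# and any such table only yields a perceived, binary label.
NAME_TO_GENDER: Dict[str, str] = {
    "alice": "woman",
    "maria": "woman",
    "bob": "man",
    "john": "man",
}


def infer_gender(display_name: Optional[str]) -> str:
    """Return a perceived label from the first name, or 'unknown' when the
    profile gives no usable name (pseudonym, empty field, unlisted name)."""
    if not display_name or not display_name.strip():
        return "unknown"
    first_token = display_name.strip().split()[0].lower()
    return NAME_TO_GENDER.get(first_token, "unknown")


def labeling_report(display_names: Iterable[Optional[str]]) -> Dict[str, object]:
    """Count inferred labels and report the share of profiles that were labeled."""
    counts = Counter(infer_gender(name) for name in display_names)
    total = sum(counts.values())
    labeled = total - counts["unknown"]
    coverage = labeled / total if total else 0.0
    return {"counts": dict(counts), "coverage": coverage}


if __name__ == "__main__":
    profiles = ["Alice Smith", "xX_coder_Xx", "Bob", None]
    print(labeling_report(profiles))
```

Reporting the coverage next to the results lets readers judge how much of the sample rests on inferred labels and how many contributors were left out of the analysis.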

Unbalanced representation of real demographics

When doing a qualitative study in FLOSS, the number of women’s responses is normally very low (1-5% (Ghosh et al. 2002) [SLR(Storey et al. 2016)], 10.9% [SLR(Lee and Carver 2019)]). Researchers should be aware of this and try to balance the share of women’s responses in these studies [SLR(Lee and Carver 2019)].

Generalization

When survey respondents and interview participants come from the same geographical location or culture, the results from the study may not generalize [SLR(Blincoe et al. 2019)]. Furthermore, when analyzing OSS projects, authors cannot generalize their results because OSS projects can be extremely different in terms of product, participant type, community structure, and governance [SLR(Bosu and Sultana 2019)]. The culture of participants in a case study can threaten the generalization of results too [SLR(Gilal et al. 2016)]. Also, a recent study states that considering intersectionality in research studies is important to understand the needs of different individuals, as the experiences of Black women are often different from the experiences of both Black men and non-Black women in the United States (Ross et al. 2020).

5 Discussion

This SLR shows that perceived diversity studies in SE have critically reflected upon practices, methods, and even tools concerning inequality issues in both the development of software and the social aspects of the software. Specifically, while most of the studies analyzed in this SLR have shown social issues related to gender, less has been done to research how racial, age, and nationality diversity relates to SE.

Thus, we believe that more research initiatives are needed to help the SE research community understand the challenges and needs that other groups face when participating in software engineering. In the following subsections, we discuss the findings of this SLR for each one of the perceived diversity aspects studied in SE.

5.1 Gender Diversity in SE

The study of gender diversity in SE has been essential to demonstrate the benefits that gender diversity brings to the workplace. Previous studies have shown that gender diversity is a factor associated with an increase in productivity [SLR(Vasilescu et al. 2015)] and innovation (Østergaard et al. 2011) in projects. However, as can be seen in Table 4, much of this gender research has focused on identifying bias through case studies, rather than on proposing models, tools, or processes to assess gender diversity in software development teams. Of the 80 studies on gender diversity, 53 (66%) have analyzed bias in SE and 27 (34%) have described Inclusivity-Efforts.

Furthermore, although 60% of the papers analyzed in this SLR have focused on gender, only two have analyzed transgender developers [SLR(Ford et al. 2019; Prado et al. 2020)]. The transgender population has experienced and is still experiencing inexcusably high levels of harassment and violence (Lombardi et al. 2002). In the United States, 46% of transgender people had experienced verbal harassment (James et al. 2016). Therefore, we believe that gender identity, i.e., transgender and non-binary people, should be further studied in SE to understand their needs and to foster safer spaces for everyone.

Recently, LinkedIn reported that women in the technology area represent 24.4% of their users, while men represent 77.6% (Hall and Durruthy 2020). This gap is also present in the results from our SLR since women developers are still underrepresented in OSS and industrial companies after more than a decade of research [SLR(Robles et al. 2016; Izquierdo et al. 2018)]. Women still face discrimination from their peers [SLR(Blincoe et al. 2019)]. This result may indicate that more research into Inclusivity-Efforts in gender diversity should be done. OSS and industry should invest in reducing the gender gap. These efforts will likely have to be coordinated with other efforts to continue removing existing barriers and moving toward an equitable integration in developer teams.

Currently, many companies are aware of the gender gap and are actively trying to reduce it. Specifically, big companies such as Google, Microsoft, and Mozilla are encouraging women to work in the software industry through initiatives such as Google’s Diversity,Footnote 7 Microsoft’s Global Diversity and Inclusion program,Footnote 8 and Mozilla’s Diversity and Inclusion Strategy.Footnote 9 In this way, these companies aim to create a better workplace free from prejudice and discrimination.

5.2 Racial Diversity in SE

Unfortunately, people might hold unconscious beliefs about various social groups that can be triggered by the perceived race derived from one’s physical appearance or one’s name. This unconscious bias can disadvantage some races to the benefit of others. Some studies in social psychology have demonstrated that, to get one callback when applying for a job, applicants identified as African-American from their names need to send almost double the resumes that applicants associated with White names need to send (Bertrand and Mullainathan 2004). Furthermore, companies are less willing to hire a Mexican-Spanish-accented applicant than a standard American English-accented applicant for software engineering jobs (Hosoda et al. 2012).

Hence, the study of racial diversity in SE can help to identify the benefits of tackling problems from different perspectives (Shachaf 2008). We strongly believe that understanding the needs and barriers of different developers may help to avoid and mitigate stereotyping in the workplace. The results of this SLR indicate a gap in the literature on racial diversity in SE, as we found only a few papers addressing it. There is a lack of empirical evidence about race issues in SE. Some initial work reports on the negative experiences of Black women in the field of computer science [SLR(Thomas et al. 2018)]. Additionally, previous research has not specifically considered the perceptions of other ethnic subgroups such as Hispanics or indigenous people. Quesenberry and Trauth argue that it is not enough to simply increase the numbers of underrepresented minorities; organizations must adapt their practices to support the needs of these minorities (Quesenberry and Trauth 2012). Thus, we believe that to foster a healthier SE community by understanding and addressing minorities’ needs, researchers should further study racial diversity in SE.

Furthermore, Thomas et al. report that Black women do not know if their negative experience was because of their gender or their race (Thomas et al. 2018). Therefore, we also believe that a better understanding of the intersectionality issues faced by SE developers would be essential to avoid negative experiences.


5.3 Nationality Diversity in SE

The study of nationality diversity can help to identify the benefits of developing software in a cross-cultural environment and how to solve cultural issues. Teams’ geographical diversity can be observed particularly in online collaborative environments, as developers are often from geographically separated localities [SLR(Vasilescu et al. 2015)]. Previous studies have demonstrated the cross-cultural issues faced by developers in distributed software development (Krishna et al. 2004). For example, Mishra and Mishra identified communication, collaboration, and coordination issues as the most negative issues in distributed software development (Mishra and Mishra 2014). Shachaf found that cultural and language differences negatively impact cost, trust, cohesion, and team identity (Shachaf 2008). Paris et al. demonstrated that, regardless of the papers’ quality, papers with authors from some regions receive fewer citations than papers with authors from other regions (Paris et al. 1998).

In this SLR, most of the studies about nationality diversity show that collaboration between industrial organizations from different cultures may give rise to misunderstandings, discomfort, or cultural issues that may end with people leaving the organization. For example, Casey [SLR(Casey 2009)] studied the implications of misunderstanding and not addressing cultural differences in an Irish multinational organization that off-shored part of its software development process to Malaysia. Casey [SLR(Casey 2009)] found that some cultural differences forced key Malaysian personnel to leave the organization, which had serious implications for the success of the project. Managers from both the Irish and Malaysian organizations recognized the need for cultural training. Therefore, this SLR indicates the need for cultural training to avoid misunderstandings and discomfort between organizations.

However, less is known about the challenges that developers of different nationalities face in OSS development. In online collaborative environments, there is a huge diversity of nationalities, as developers are often from geographically separated localities [SLR(Vasilescu et al. 2015)]. We found only one paper that studied the relationship between developers’ country of residence and the acceptance or rejection of their contributions to OSS projects [SLR(Rastogi et al. 2018)]. Other papers have studied geographical location, but only in combination with culture. We believe that collaboration in OSS may face challenges because developers of many different nationalities may collaborate in one project. Researchers should therefore further investigate nationality diversity in online collaborative environments for OSS. For example, researchers can use qualitative ethnographic methods to discover how the significant components of nationality may affect software development.


5.4 Age Diversity in SE

The study of age diversity in SE development teams can help to understand the range of skills and talents that each generation can bring to the team. For example, younger developers might be more familiar with cutting-edge technology tools than their more mature counterparts, whereas mature SE developers might have exceptional interpersonal skills and experience leading software teams. However, previous social psychological studies have reported negative stereotypes toward older and younger employees across industries. While older employees are seen as poorer, less adaptive, and less innovative performers (Posthuma and Campion 2009), younger employees are seen as being disloyal to their employer, less supportive of their colleagues, and unwilling to work with older employees (Hertel et al. 2013; O’higgins 2001). These stereotypes might lead to issues in attracting or keeping highly qualified developers and, ultimately, harm the project outcome.

This SLR found only a few papers addressing age diversity in SE. These papers report that age diversity does not affect team efficiency (Altiner and Ayhan 2018) or the performance of developers [SLR(Murakami et al. 2017)]. However, some cultures may have stereotypes that damage both young and old developers in an industrial context [SLR(Schloegel et al. 2018)]. It was therefore very interesting to find a paper that demonstrates how to reduce age stereotype bias through two interventions [SLR(Schloegel et al. 2016)]. The first intervention was an awareness-based workshop with 56 employees, and the second was a cooperation-based workshop with 76 employees.

This SLR also indicates that the low number of older developers participating in FOSS or SO is worrisome [SLR(Robles et al. 2016; Morrison et al. 2016)]. Thus, researchers should further study how to engage and retain older developers in these communities so that the SE community can avoid the loss of their knowledge.


5.5 Recommended Papers From Prolific Authors in Perceived Diversity Aspects in SE

In this section, we briefly describe studies related to perceived diversity aspects in SE that were not included in this SLR because they did not meet the inclusion criteria, but that were suggested by prolific authors in perceived diversity in SE. The studies are discussed according to the following categories:

Disability

Although there are many dimensions of disability in SE, we briefly discuss two dimensions of disability: blind or visually impaired software developers and neurodiverse software developers. We believe that future work should analyze further what has been published about disability in SE.

Studies of the perceptions and experiences of blind and visually impaired software developers report that these developers face barriers when using screen readers to write code (Mealin and Murphy-Hill 2012). Their biggest challenges are related to workplace dynamics, project management dynamics, and tool accessibility (Huff et al. 2020). Although neurodiverse software engineers may desire some accommodations, most of them do not disclose their diagnosis to their company (Morris et al. 2015). Finally, a remote video game coding camp can improve the communication skills of autistic college students (Begel et al. 2021).

Intersectional studies

There is a need for intersectional studies, as they can help us obtain a more realistic picture of software engineering with regard to culture, gender, and ethnicity [SLR(Gren 2018)]. Recently, three Black women academics shared their lived intersectional experiences during a global pandemic. They also issued a call for action to “stand in solidarity with Blacks in computing; and acknowledge, disavow, and dismantle Whiteness and oppressive power structures in the field of computing” (Erete et al. 2021). Another study showed how the experiences of Black women are often different from the experiences of both Black men and non-Black women in the United States. In fact, Black women are less likely than other women to be introduced to computer science by their family during their school years, and they have fewer friends in computer science than both Black men and non-Black women (Ross et al. 2020). Finally, a study has shown how a virtual summer camp that focuses on informal computer science learning opportunities can increase the confidence of both Black and Latina girls (Braswel et al. 2021).

OSS projects for social good

A recent study by Huang et al. identified the motivations of software developers who contribute to OSS projects with a societal goal. They found that developers who participate in OSS projects for social good do not focus on their professional benefit but on leaving their mark on society. Developers in OSS projects for social good also evaluate the owners of these projects significantly more than other OSS developers do (Huang et al. 2021).

6 Threats to Validity

We present the threats to validity according to Wohlin et al. (2012), who discuss four main types of validity threats in software engineering: internal, construct, external, and conclusion.

Internal validity threats

Our SLR may present some internal validity threats due to our choice of Google Scholar and the IEEE database as the search engines for all publications, because other publications may exist that are not indexed by these engines. Another threat is that our search was based on keywords appearing in the title; therefore, relevant papers that do not use our keywords might be missing. To minimize this threat, we followed the procedures for performing SLRs described by Kitchenham and Charters (2007) and we offer a replication package so that third parties can inspect our sources. Another factor affecting internal validity may be the maturity of research on perceived diversity issues in SE, in the sense that our study may have been done at too early a stage to draw conclusions. However, the first study we found was published in 2003, so we believe that more than 15 years is enough time to allow us to extract valid lessons from the literature. Although we believe we have analyzed the most important papers on perceived diversity in SE so far, we may have missed some relevant studies due to language constraints, and thus may underestimate the extent of perceived diversity research in SE. Also, we decided to exclude papers published in non-CS venues. Hence, our results must be qualified as applying only to perceived diversity research published in mature international software engineering conferences or journals. Although we might have missed some important papers due to the search strings we used in Google Scholar and the IEEE database, we contacted field experts in the perceived diversity area to assess our SLR and minimize the number of important studies missed.

Construct validity threats

Our SLR may have construct validity threats, as we manually analyzed hundreds of papers for specific and sometimes very detailed aspects, so human errors may have occurred. Two of the authors independently selected the candidate studies and extracted the data. To mitigate human errors, the authors discussed any problematic paper about which they were unsure before including or excluding it. Also, the first author checked the data extracted by the second author, and they discussed each paper until they reached an agreement. The second author checked 25% of the data extracted by the first author and they agreed on the extraction. Our replication package does not remove the human errors that we may have made, but it offers the possibility for others to check and improve our work.

External validity threats

Although we have analyzed more than one hundred papers that addressed some perceived diversity issues in SE, we cannot claim that our results can be generalized outside the SE community. Our results present a general overview of the state of the art of perceived diversity in SE. We identify gaps that should be addressed, but we cannot claim that our results can be generalized to individual or specific SE communities. Thus, we believe that further investigation of perceived diversity issues is necessary for specific SE communities.

Conclusion validity threats

For the analyses of our SLR, we extracted as much data as possible from the 131 papers: publication year, authors, venue, type of publication, SE activity addressed in the paper, and purpose. However, there might be other valuable data in the papers that we did not extract. Not having used this data can be a threat to the conclusion validity of the study, since the results might have been different. This SLR summarizes the findings of previous SE papers on perceived diversity. Therefore, some results about gender diversity may not generalize, as previous studies might have assumed all developers to be cisgender.

7 Conclusion

Software Engineering is a collaborative field where a variety of diverse software developers work together to develop software. The higher the diversity in a software development team, the greater the possibility of understanding and representing the final user’s needs (Muller and Kuhn 1993). However, as developers are human beings, they can hold conscious or unconscious biases when working with their peers. Such biases can be triggered in person A when person A believes that they have perceived diversity aspects (i.e., gender, age, race, and nationality) from person B.

Hence, to better understand how these perceived diversity characteristics are related to SE, we have carried out this SLR. We first defined the four perceived diversity aspects (gender, age, race, and nationality) considered in this SLR and then analyzed what 131 previous studies have discovered and proposed to increase awareness of, and reduce bias related to, these four perceived diversity aspects in SE.

With this SLR we aim 1) to assist the SE community in understanding the different factors that may influence the engagement and permanence of diverse developers working in software engineering; 2) to identify different methods used to improve perceived diversity in teams; and 3) to raise awareness of the threats to validity and limitations of previous studies.

In conclusion, this study highlights the gaps in the current literature and issues a call for future action on perceived diversity in Software Engineering. We found that gender diversity has been widely studied. Previous studies demonstrate that women increase productivity, performance, and efficiency. Unfortunately, some developers still have a strong bias against women in OSS and industry. Nationality and age diversity have been the second and third most studied aspects. The papers indicate the differences that can arise when different cultures work together and that older developers are seldom found in OSS and SO communities. Finally, our SLR indicates that race has been the least studied perceived diversity aspect in SE and that it should be studied further.

7.1 Replication Package

The replication package can be found online (see Footnote 10).