Knowledge Flows Within Open Source Software Projects: A Social Network Perspective

Kerzazi, Noureddine; El Asri, Ikram

doi:10.1007/978-981-10-1627-1_19

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 397)

Included in the following conference series:

International Symposium on Ubiquitous Networking

Abstract

Developing software is a knowledge-intensive activity that requires extensive technical knowledge and awareness. The less tangible part of development is the social interaction that drives knowledge flows between contributors, especially in Open Source Software (OSS). This study investigates knowledge sharing and propagation from a social perspective using social network analysis (SNA). We mined and analyzed the issue and review histories of three OSS projects from GitHub, paying particular attention to the social interactions expressed through contributors' comments on reviews. We aim to explain the propagation and density of knowledge flows within contributor networks. The results show that (i) review requests flow from the core contributors toward peripheral contributors; (ii) comments on reviews form a continuous loop from the core team to the peripheral contributors and back; and (iii) core contributors leverage their awareness and technical knowledge to increase their reputation by acting as communication brokers, supported by comments on work items.

1 Introduction

Open source communities can be perceived as knowledge-sharing ecosystems in which contributors learn from the community and from each other [1]. They share both domain and technical knowledge by contributing to source code repositories or by reviewing one another's source code. Interactions between contributors, which can be made visible by looking at co-edited files [2], constitute the backbone of the socio-technical perspective, which has gained increasing attention over the past decade [3–5]. Social Network Analysis (SNA) has been used to capture and understand such information about relations among people [6], with the aim of enhancing team performance and software product quality.

Previous research has shown that expert reviewers of technical contributions are involved in most OSS projects [7]. However, this research does not explain how developers identify experienced contributors to review their code, nor how awareness and knowledge spread through the contributor community. Many OSS projects adopt code review to increase the quality of their software products [8]. Collaboration on code review aims not only to improve the quality of code changes made by contributors [9], but also to transfer knowledge and awareness [10, 11]. If we could explain how knowledge flows propagate within contributor networks through source code reviews, we could enhance the quality of the code and improve the signal-to-noise ratio of comments on commits, since a low ratio decreases team performance. One way of locating a reputed domain expert to ask for a code review is to build the contributor network and analyze it.

Historically, SNA has proven effective in many areas [12]. In this paper, we examine the socio-technical interactions of three OSS projects. Using version control histories, we constructed contributor networks based on which files are commonly modified by contributors. Network analysis allows us to uncover details of knowledge sharing and of the circulation of knowledge flows between core and peripheral contributors. Our research questions can be summarized as follows:

RQ1. Is there a relationship between contributors’ networks and knowledge sharing?
RQ2. Does the network position of contributors affect the review process and the number of comments on GitHub projects?
RQ3. Does socio-technical analysis make knowledge transfer an actionable concept?
RQ4. What kind of knowledge is transferred?

The main contributions of the paper are as follows:

A thorough Social Network Analysis of three OSS projects that provides insights into the social interactions of contributors and their knowledge sharing;
A view of knowledge circulation through code review practice, along with the kind of knowledge that is transferred;
An exploration of how SNA metrics can indicate whether, and to what extent, an open source community has sound underlying mechanisms for sharing knowledge and awareness;
An understanding of who requests code reviews, who comments on reviews, and who performs reviews, according to their network position and degree.

Paper organization. The remainder of the paper is organized as follows. Section 2 presents related work and background. Section 3 introduces our SNA-based method for identifying knowledge flows. Section 4 describes the projects selected from GitHub and the data collection process. Section 5 presents our study results. Section 6 discusses our findings and points out practical implications. Section 7 discloses the threats to validity. Section 8 concludes and outlines future work.

2 Related Work and Background

With people at the heart of OSS projects, SNA studies of software development teams show that social networks contain a wealth of information that can be leveraged for purposes such as defect prediction [3, 13], team organization and coordination [14], team productivity [15], and tools or techniques for studying developer communities [1, 4]. We first summarize related work according to the first three of these perspectives. Then we introduce previous work on knowledge sharing and propagation. Finally, we present what we know about SNA measures.

Defect Prediction. Rigby and Storey [16] manually examined hundreds of code reviews across five high-profile OSS projects to investigate the mechanisms and behaviours that developers use to find code changes they are competent to review. They found that the Apache project adopted a broadcast-based style of code review, which increases awareness of new changes but burdens the community with a large number of irrelevant notifications. Baysal et al. [9] studied the factors that influence the outcome of the review process and found that review positivity (i.e., the proportion of accepted patches) can be influenced by non-technical factors such as organization.

Furthermore, a recent qualitative study at Microsoft [10] showed that finding defects is not the only motivation for code review; sharing knowledge among team members is also considered a very important motivation for modern code review. This related work suggests that our findings are not specific to the open source community but may also apply within commercial organizations.

Organization and Coordination. Recently, there has been considerable interest in, and work on, improving coordination between the members of software teams [17]. Knowledge dependencies drive the need to coordinate software process activities. An SNA approach can therefore support the identification of coordination needs by revealing previous collaboration and communication. Social network metrics help answer questions such as who should do what, and when.

Productivity. It has been reported that higher socio-technical congruence (STC) usually correlates with higher developer productivity [15] and fewer integration failures [17]. Both researchers and OSS project leads could use STC to diagnose project members' collaboration and improve team coordination [18].

Social Knowledge Sharing. Prior work has shown that social networks contain plenty of information that can be leveraged for other purposes [14]. For instance, socio-cultural learning theories state that people learn from each other through observation, interaction, and communication [19]. Viewing learning through its social aspect is all the more relevant as OSS projects keep growing. Contributors are part of a community of practice or an organization, and belong to a group of people in which competence and knowledge are already established. Source code review and comments are seen as ideal vehicles for leveraging tacit knowledge and learning.

3 SNA-Based Knowledge Flows

According to SNA [20], a network consists of a set of nodes and a set of edges. We represent contributors as nodes, as shown in Fig. 1. Connections between those nodes are weighted by the number of files on which the pair of contributors has collaborated.
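The construction of such a weighted network can be sketched in Python as follows. This is a minimal illustration rather than the tooling used in the study: the input format (a list of contributor and file-path pairs extracted from commits), the sample data, and the function name build_coedit_network are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_coedit_network(edits):
    """Return {frozenset({a, b}): weight}, where weight is the number of
    distinct files that contributors a and b have both edited."""
    editors_by_file = defaultdict(set)
    for contributor, path in edits:
        editors_by_file[path].add(contributor)

    weights = defaultdict(int)
    for editors in editors_by_file.values():
        # Every pair of contributors who touched the same file gains one unit.
        for a, b in combinations(sorted(editors), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


# Hypothetical edit history: (contributor, file path) pairs.
edits = [
    ("C1", "src/core.js"), ("C2", "src/core.js"),
    ("C1", "src/http.js"), ("C2", "src/http.js"), ("C3", "src/http.js"),
    ("C3", "docs/guide.md"), ("C4", "docs/guide.md"),
    ("C1", "src/core.js"),  # repeated edits of the same file count once
]
network = build_coedit_network(edits)
for pair, weight in sorted(network.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

Counting each shared file once per pair is one possible reading of "the number of files the pair has collaborated on"; weighting by the number of commits or changed lines would be a different design choice.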

When two contributors are directly connected by an edge, they are adjacent. The number of adjacent connections of a given contributor is called the degree of that contributor. As illustrated in Fig. 1, C₃ has a degree of 3 and C₄ has a degree of 1.

The geodesic path is the shortest social distance between two contributors, expressed as a sequence of adjacent, unique connections. The network's diameter is the longest geodesic path between any two contributors.

3.1 Contributor Network Metrics

Connectivity metrics measure direct connections between nodes. SNA offers three distinct structural properties to measure the centrality of a given node: degree, which captures direct connections, and the two centrality metrics described below.

Centrality metrics measure how closely contributors are indirectly connected to each other in the network. SNA measures centrality with two metrics: closeness and betweenness.

Closeness refers to the average distance from a node to every other node in the network. For example, the closeness of C₁ is (1 + 1 + 2)/3 = 4/3, since the shortest paths from C₁ to C₂ and to C₃ each have length 1 and the shortest path from C₁ to C₄ has length 2. Regarding degree, Fig. 1 shows that C₃ has the maximum possible degree (3), meaning that it is central in this network, whereas C₁ and C₂ have a degree of 2 and C₄ has a degree of 1, meaning that C₄ is peripheral in this network.

Betweenness is another centrality metric, calculated for a given node as the number of shortest paths that include this node (endpoints included) divided by the total number of shortest paths in the network. In the example of Fig. 1, there are 6 shortest paths in total, one per pair of contributors. The betweenness of C₁ and of C₂ is therefore 3/6, while the betweenness of C₃ is 5/6.
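The three metrics can be reproduced on the small example with a few lines of Python. The sketch below is illustrative only: the edge list is an assumption reconstructed from the degrees and distances quoted for Fig. 1, the function names are hypothetical, and betweenness follows the definition given in this section (paths are counted for a node even when it is an endpoint), which differs from the variant implemented by most graph libraries.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Assumed edges, consistent with the degrees and distances quoted for Fig. 1.
EDGES = [("C1", "C2"), ("C1", "C3"), ("C2", "C3"), ("C3", "C4")]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_paths(graph, source, target):
    """All shortest paths between two nodes, each as a list of nodes."""
    dist = distances_from(graph, source)
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        if node == source:
            paths.append([source] + suffix)
            return
        for prev in graph[node]:
            if dist.get(prev) == dist[node] - 1:
                walk(prev, [node] + suffix)

    walk(target, [])
    return paths


def degree(graph, node):
    return len(graph[node])


def closeness(graph, node):
    """Average shortest-path distance from node to every other reachable node."""
    others = [d for n, d in distances_from(graph, node).items() if n != node]
    return Fraction(sum(others), len(others))


def betweenness(graph, node):
    """Share of all shortest paths that contain node, endpoints included."""
    all_paths = [p for a, b in combinations(sorted(graph), 2)
                 for p in shortest_paths(graph, a, b)]
    return Fraction(sum(node in p for p in all_paths), len(all_paths))


g = build_graph(EDGES)
print(degree(g, "C3"), degree(g, "C4"))  # 3 1
print(closeness(g, "C1"))                # 4/3
print(betweenness(g, "C1"))              # 1/2, i.e. 3/6
print(betweenness(g, "C3"))              # 5/6
```

Exact fractions are used so that the output can be compared directly with the hand-computed values above.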

4 Datasets

We focus our study on three large and rapidly evolving open-source systems that are among the highly starred projects on GitHub. Our choice of projects was based on the following criteria: the project should (i) be among the 100 most starred projects; (ii) still be under active development; and (iii) involve at least 250 contributors. Table 1 summarizes the characteristics of the selected projects, including the programming language, the total number of developers, the number of releases, the number of lines of code, the number of requested reviews, and the total number of commits. Table 2 shows the characteristics of each network.

Table 1 Overview of the studied systems

Table 2 Metrics of the networks

For each project, we queried the GitHub API at https://api.github.com/repos/<owner>/<repo>/commits?page=<n>, where <owner> is a GitHub user account, <repo> is the name of the repository, and <n> is the page number. We thus extracted the commit data for each project, including details such as the programming language used, the time period covered, the number of commits and developers, information about releases, and the number of edited files.
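Paging through this endpoint can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the study: the function names are hypothetical, the stop condition relies on the API returning an empty JSON array once the pages are exhausted, and the fetcher is passed in as a parameter so that the paging logic can be exercised without network access.

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def commits_url(owner, repo, page):
    """URL of one page of the commit list, as given in the text."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits?page={page}"


def fetch_json(url):
    """Default fetcher: one unauthenticated GET returning the decoded JSON."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def iter_commits(owner, repo, fetch=fetch_json, max_pages=None):
    """Yield commit objects page by page until an empty page is returned."""
    page = 1
    while max_pages is None or page <= max_pages:
        batch = fetch(commits_url(owner, repo, page))
        if not batch:  # an empty page marks the end of the history
            return
        yield from batch
        page += 1


# Offline demonstration with a stand-in fetcher and made-up commit data.
_fake_pages = {1: [{"sha": "a1"}, {"sha": "b2"}], 2: [{"sha": "c3"}]}


def fake_fetch(url):
    return _fake_pages.get(int(url.rsplit("=", 1)[1]), [])


shas = [c["sha"] for c in iter_commits("some-owner", "some-repo", fetch=fake_fetch)]
print(shas)
```

Unauthenticated requests to the GitHub API are rate-limited, so a crawl of a large project would in practice need an access token and some back-off logic, both of which are omitted here.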

Once the commits and edited files were linked, we turned to the review requests of each project. Since our study focuses on knowledge sharing between contributors, we also extracted all comments on each code review. Our query retrieves open and closed issues (i.e., state = all) that carry the label ‘need review’. Figure 2 summarizes the time intervals needed to close code review requests.
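A query of this kind, and the closing interval derived from its results, might look as follows in Python. The sketch assumes the issue list endpoint of the GitHub REST API with its state and labels parameters and the created_at and closed_at timestamps of issue objects; the exact label string used by each project, the function names, and the sample issue are assumptions for illustration.

```python
from datetime import datetime
from urllib.parse import urlencode

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC, as returned by the API


def review_issues_url(owner, repo, label="need review", page=1):
    """URL listing open and closed issues that carry the given label."""
    query = urlencode({"state": "all", "labels": label, "page": page})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def hours_to_close(issue):
    """Hours between creation and closing, or None if the issue is still open."""
    if issue.get("closed_at") is None:
        return None
    opened = datetime.strptime(issue["created_at"], TIMESTAMP_FORMAT)
    closed = datetime.strptime(issue["closed_at"], TIMESTAMP_FORMAT)
    return (closed - opened).total_seconds() / 3600


# Made-up issue object with only the fields used above.
sample = {"created_at": "2016-01-01T00:00:00Z", "closed_at": "2016-01-02T12:00:00Z"}
print(review_issues_url("some-owner", "some-repo"))
print(hours_to_close(sample))  # 36.0
```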

5 Study Results

RQ1. Is there a relationship between contributors’ networks and knowledge sharing?

We were interested in the position, within the contributors’ network, of the contributors who ask for code reviews, comment on code reviews, and carry out reviews. Figure 3 compares the Angular and Docker projects. The contributors’ network is drawn in red, and sub-networks are mapped on top of it. For instance, Fig. 3a1 illustrates the social network of Angular (red) and the sub-network of contributors who requested code reviews (blue). One can observe the differences between the two projects in terms of the density and position of code review requesters.

Figure 3b1–2 shows the mapping of the contributors who commented on code reviews (green). Finally, Fig. 3c compares the relative network position of the contributors who carried out the code reviews.

We found that core contributors act as knowledge brokers and boundary spanners across comment loops, not only from the periphery to the core but also from the core to the periphery. We paid close attention to how core contributors (experts) influence communication patterns through comments on code reviews and issues in OSS projects, and to how they transfer and spread knowledge to peripheral contributors.

RQ2. Does the network position of contributors affect the review process and the number of comments on GitHub projects?

We found a strong correlation between the position of contributors in the network (degree) and the number of comments on code reviews. Figure 4 illustrates the communication trend in the Angular project. One can also notice that the higher the degree, the more likely contributors are to actively transfer their knowledge and awareness to other contributors. For instance, in the Angular project, surprisingly, the 16.7 % of contributors identified as core developers (see Note 1) generate 81.6 % of the communication, whereas the remaining 83.3 % of peripheral developers generate only 18.4 % of the comment flow.
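The split between core and peripheral contributors, and the share of comments attributable to each group, can be computed along the following lines. This is a hedged sketch: it applies the degree threshold (> 500) stated in Note 1, while the function name, the dictionary-based input format, and the sample numbers are invented for illustration and do not come from the studied projects.

```python
def core_share(degree, comments, threshold=500):
    """Return (fraction of contributors that are core,
               fraction of all comments written by core contributors).

    degree:   {contributor: degree in the contributors' network}
    comments: {contributor: number of review comments written}
    """
    core = {c for c, d in degree.items() if d > threshold}
    total_comments = sum(comments.get(c, 0) for c in degree)
    core_comments = sum(comments.get(c, 0) for c in core)
    return len(core) / len(degree), core_comments / total_comments


# Made-up data: one highly connected contributor and four peripheral ones.
degree = {"ana": 820, "bo": 140, "cy": 35, "di": 12, "ed": 4}
comments = {"ana": 150, "bo": 30, "cy": 12, "di": 6, "ed": 2}

core_fraction, comment_fraction = core_share(degree, comments)
print(f"{core_fraction:.1%} of contributors write {comment_fraction:.1%} of comments")
```

A fixed threshold is simple to apply but depends on project size; a percentile-based cut-off would be an alternative when comparing networks of very different scale.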

RQ3. Does socio-technical analysis make knowledge transfer an actionable concept?

Core contributors are communication brokers who have awareness as well as technical and domain knowledge. SNA allows us to identify central core contributors: we segregate contributors according to their degree of centrality. Our analysis shows that SNA metrics and patterns can go further and support the study of knowledge flows in OSS communities, similar to previous studies that attempt to predict software failures from SNA metrics [13].

Figure 5 compares the betweenness centrality metric as well as the degree distribution of contributors.

Table 3 summarizes the SNA metrics of each network. These metrics help to understand the nature of the project as well as its architecture. For instance, we observed a high density for the Docker project, which probably reflects a cohesive architecture.
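For readers unfamiliar with network-level metrics, the sketch below computes two common ones, density and diameter, on a toy graph. It assumes the usual definition of density for an undirected graph (the number of edges divided by the number of possible edges, n(n - 1)/2) and the diameter defined in Section 3; whether Table 3 uses exactly these definitions is not stated in the text, and the graph and function names are illustrative.

```python
from collections import deque


def density(graph):
    """Edges present divided by edges possible in an undirected simple graph."""
    n = len(graph)
    edges = sum(len(neighbours) for neighbours in graph.values()) // 2
    return 2 * edges / (n * (n - 1))


def diameter(graph):
    """Longest shortest-path length; assumes the graph is connected."""
    longest = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from each node in turn
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


# Toy network of four contributors (adjacency sets), not data from the study.
toy = {
    "C1": {"C2", "C3"},
    "C2": {"C1", "C3"},
    "C3": {"C1", "C2", "C4"},
    "C4": {"C3"},
}
print(round(density(toy), 3), diameter(toy))
```

A density close to 1 means that almost every pair of contributors has co-edited at least one file, which is the sense in which a dense network may hint at a cohesive code base.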

Table 3 Network SNA metrics

Table 4 shows intrinsic SNA metrics that characterize the interactions between contributors, such as degree and the centrality metrics (betweenness and closeness).

Table 4 Contributors SNA metrics

6 Discussion

Review assignments are not sufficient to explain comment flows in a project. Figure 6 shows the distribution of contributors pre-assigned to code reviews (yellow) within the overall contributors’ network. Nevertheless, we observed quite a lot of code reviews carried out by other contributors with different degrees of centrality. This is not necessarily problematic, but it may indicate areas where the review assignment was not supported by awareness, by cross-functional knowledge, or by the distribution of domain knowledge in the core team. OSS team leads can use this information to optimize the team configuration when forming new teams, especially for the code review activity.

Core contributors are assumed to be structurally more central in the contributors’ network than other contributors. They have enough awareness or knowledge of the product to manage other developers’ contributions.

RQ4. What kind of knowledge is transferred?

We manually classified comments as conveying technical or domain knowledge. A third category emerged during the classification process: awareness. The majority of knowledge transfer concerns awareness (46.3 %), followed by technical knowledge (34.5 %), which generates long debates among contributors, and domain knowledge (19.1 %). For example, contributor 13286 in the Angular project commented on an implementation approach that he perceived as an anti-pattern:

[I’m not quite sure why you’re against this. The job of ‘inject’ is to inject a function, as its name implies. Not inject a function and eliminate its return value. I would argue, instead, that is an anti-pattern of function decorators. It’s confusing and unnecessary….] 13286.

7 Threats to Validity

Construct Validity. In this paper, we adopt co-edited files as a heuristic to build the graph of contributors’ networks. We do not consider a time frame, such as co-editing within one month or within a release. We could have relied on comments instead of co-edited files for the SNA; however, focusing only on comments would hide the bigger picture of all social interactions. Furthermore, our heuristic based on file co-editing does not consider the number of lines of code (LOC) that contributors change. However, file editing is considered by many studies to be a sufficiently fine-grained indicator of developers’ collaboration [2]. Finally, we assume that all communication occurs through either review requests or comments within the review process. We cannot rule out that developers on GitHub also use external social media or mailing lists to communicate.

Internal Validity. We are aware that we might miss transitive dependencies between technical elements. For instance, a change to a framework on which many files depend is not captured as a technical interaction. Moreover, software development is dynamic, and as contributions are made over time, the nature of the social interactions changes. We mitigated this threat by studying multiple open source projects, written in different languages, within the GitHub community. Furthermore, our analysis is time-agnostic. Since contributors change over time, the number of core developers may vary as well. We plan to conduct a temporal analysis of core contributors in future work to gain more insight into how those contributors reach their current position in the network.

External Validity. In this study, we chose three projects, which might limit the generalization of our results. However, we carefully chose mature and long-lived projects written in different languages, with the number of contributors ranging from 250 to 1403. We filtered out projects that have fewer than 250 contributors or fewer than 1,000 edited files, in order to remove projects that are immature or lack underlying socio-technical interaction, and thus alleviate potential bias.

8 Conclusion

In this paper, we performed a Social Network Analysis of three open source projects. We showed how knowledge is transferred between core and peripheral contributors through the code review activity. We built contributors’ networks based on co-edited files, and then built sub-networks for the contributors requesting code reviews, commenting on them, and performing them. SNA visualization makes it possible to identify and analyze the structural interactions within those networks. We found a strong correlation between the degree centrality of contributors and their involvement in knowledge and awareness transfer.

By understanding the knowledge flows between OSS collaborators and the structure of their socio-technical interactions, OSS communities gain an increased ability to facilitate code reviews in their projects. We hope this will lead to software projects with more efficient knowledge transfer, less review assignment overhead, and improved software quality and team performance.

Notes

1. We define a degree threshold (> 500) to select core developers, who are marked as central in our SNA analysis.

References

1. Von Hippel, E., Von Krogh, G.: Open source software and the “Private-Collective” innovation model: issues for organization science. Organ. Sci. 14(2), 209–223 (2003)
2. Dabbish, L., et al.: Social coding in GitHub: transparency and collaboration in an open software repository. In: The Conference on Computer Supported Cooperative Work. Seattle, WA, USA (2012)
3. Begel, A., DeLine, R., Zimmermann, T.: Social media for software engineering. In: FSE/SDP Workshop on Future of Software Engineering Research, pp. 33–38. Santa Fe, New Mexico, USA (2010)
4. Yang, X.: Social Network Analysis in Open Source Software Peer Review, pp. 820–822 (2014)
5. Yang, X., et al.: Understanding OSS Peer Review Roles in Peer Review Social Network (PeRSoN), pp. 709–712 (2012)
6. Bird, C., et al.: Latent social structure in open source projects. In: Proceedings of the 16th International Symposium on Foundations of Software Engineering (FSE’08). Atlanta, Georgia (2008)
7. Asundi, J., Jayant, R.: Patch review processes in open source software development communities: a comparative case study. In: The 40th Annual Hawaii International Conference on System Sciences (2007)
8. Bissyande, T.F., et al.: Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub. In: 24th International Symposium on Software Reliability Engineering (ISSRE) (2013)
9. Baysal, O., et al.: The influence of non-technical factors on code review. In: Proceedings of the 20th Working Conference on Reverse Engineering. Koblenz, Germany (2013)
10. Bacchelli, A., Bird, C.: Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 35th International Conference on Software Engineering (ICSE’13). San Francisco, CA, USA (2013)
11. Kilamo, T., et al.: Knowledge transfer in collaborative teams: experiences from a two-week code camp. In: 36th International Conference on Software Engineering (ICSE’14), pp. 264–271. Hyderabad, India (2014)
12. Yarosh, S., et al.: I need someone to help!: a taxonomy of helper-finding activities in the enterprise. In: Proceedings of the 27th International Conference on Computer Supported Cooperative Work (CSCW’13), pp. 1375–1386. Texas, USA (2013)
13. Meneely, A., et al.: Predicting failures with developer networks and social network analysis. In: International Symposium on Foundations of Software Engineering (FSE’11). Atlanta, Georgia (2011)
14. Hossain, L., Zhu, D.: Social networks and coordination performance of distributed software development teams. J. High Technol. Manage. Res. 20(1), 52–61 (2009)
15. Cataldo, M., Herbsleb, J.D.: Coordination breakdowns and their impact on development productivity and software failures. Trans. Softw. Eng. 39(3), 343–360 (2013)
16. Rigby, P.C., Storey, M.-A.: Understanding broadcast based peer review on open source software projects. In: Proceedings of the 33rd International Conference on Software Engineering (ICSE’11). Waikiki, Honolulu, USA (2011)
17. Kwan, I., Schroter, A., Damian, D.: Does socio-technical congruence have an effect on software build success? A study of coordination in a software project. Trans. Softw. Eng. 37(3), 307–324 (2011)
18. Cataldo, M., et al.: Identification of coordination requirements: implications for the design of collaboration and awareness tools. In: Proceedings of the 20th International Conference on Computer Supported Cooperative Work. Banff, Alberta, Canada (2006)
19. Nam, K.K., Ackerman, M.S., Adamic, L.A.: Questions in, knowledge in?: a study of Naver’s question answering community. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Boston, MA, USA (2009)
20. Kadushin, C.: Understanding Social Networks: Theories, Concepts, and Findings. Oxford University Press (2011)

Author information

Authors and Affiliations

National Higher School for Computer Science and System Analysis (ENSIAS), Rabat, Morocco
Noureddine Kerzazi & Ikram El Asri

Corresponding author

Correspondence to Ikram El Asri.

Editor information

Editors and Affiliations

Computer Science Laboratory (LIA), University of Avignon, Avignon, France
Rachid El-Azouzi
Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Daniel Sadoc Menasche
Hassan II University of Casablanca, ENSEM, Casablanca, Morocco
Essaïd Sabir
CREATE-NET, Trento, Italy
Francesco De Pellegrini
INPT, Rabat, Morocco
Mustapha Benjillali

About this paper

Cite this paper

Kerzazi, N., El Asri, I. (2017). Knowledge Flows Within Open Source Software Projects: A Social Network Perspective. In: El-Azouzi, R., Menasche, D.S., Sabir, E., De Pellegrini, F., Benjillali, M. (eds) Advances in Ubiquitous Networking 2. UNet 2016. Lecture Notes in Electrical Engineering, vol 397. Springer, Singapore. https://doi.org/10.1007/978-981-10-1627-1_19

DOI: https://doi.org/10.1007/978-981-10-1627-1_19
Published: 04 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1626-4
Online ISBN: 978-981-10-1627-1
eBook Packages: Engineering, Engineering (R0)
