An empirical study on the teams structures in social coding using GitHub projects

El Mezouar, Mariam; Zhang, Feng; Zou, Ying

doi:10.1007/s10664-019-09700-1

Published: 22 May 2019

Volume 24, pages 3790–3823, (2019)

Empirical Software Engineering

Abstract

Social coding enables collaborative software development in virtual and distributed communities. Social coding platforms (e.g., GitHub) provide the pull request feature that allows developers to clone a project, make code changes, and request the project owners to review and integrate the code changes to the main stream of a project. The pull request feature has been widely adopted by a large number of GitHub projects, as it minimizes the risk of exposing the projects to the open communities. The efficiency of the pull requests review process depends both on technical (e.g., the code quality) and social (e.g., the connection of a contributor to the project maintainer) factors. However, it is still unclear which social factors have the most impact on the efficiency of the review process. To identify the social factors, we study the team structures formed by the developers within the projects that adopt the pull-based development model. We build the pull-based networks, where two developers are linked if one has integrated a pull request submitted by the other. We investigate the 7,850 most popular projects on GitHub that are developed in ten programming languages. We identify the network metrics that have a significant association with the speed of processing the pull requests. Specifically, maintaining a strong core of contributors and denser interactions among the developers is associated with faster response and processing of the pull requests. We further find that more than 90% of the studied projects follow 8 dominant team structures out of 18 possible team structures. In the larger projects, only a set of developers is granted review and integration privileges of the pull requests, reflecting a strict decision making process. The small to medium projects are characterized by a small number of core contributors who maintain repeated interactions, and are able to process the incoming pull requests more efficiently. 
The evolution of the team structures of projects over time reveals that only a low percentage of the projects witnesses a change towards team structures associated with faster pull request processing (e.g., stronger centralization).

1 Introduction

Social coding websites (e.g., GitHub) provide a friendly platform for source code management, issue tracking, and networking among distributed communities (Dabbish et al. 2012). Open source software development benefits from social coding websites through improved collaboration (Dabbish et al. 2012). The core of many social coding websites is the pull request feature (a.k.a. the pull-based development model) (Barr et al. 2012). A developer (i.e., contributor) is free to create a local copy of the project repository, make code changes, and submit a pull request to the project owner. A project owner, maintainer, or integrator is responsible for responding to a pull request by reviewing the code changes and determining whether the pull request can be integrated into the main branch of the project.

The pull-based development model eliminates the need for a shared repository, lowers the upfront coordination, and decreases the barriers for first-time contributors (Gousios et al. 2014). As such, many projects adopt the pull-based development model as a substitute for past collaboration channels, such as submitting patches via issue tracking systems and/or mailing lists (Bird et al. 2007; Gharehyazie et al. 2015). In terms of popularity, a study by Gousios et al. (2014) reports that pull requests and shared repositories are equally used among GitHub projects (≈ 14% of the projects each), with the remaining projects being single-developer projects. The pull-based development model is particularly appreciated for separating the development effort from the decision making process about the submitted changes (Gousios et al. 2014).

The performance of the pull-based development model depends not only on the quality of the submissions made by the contributors but also on the processing of the pull requests by the project maintainers (particularly, the integrators of pull requests) (Gousios et al. 2015). Qualitative studies by Gousios et al. (2015, 2016) show that the integrators struggle to review, or to motivate other developers to review, the submitted changes. The lack of responsiveness of the project maintainers is a common complaint from the contributors (Gousios et al. 2016). Low responsiveness in processing pull requests delays the integration of new features and bug fixes, and therefore weakens the power of the pull-based development model.

In this paper, we study the types of team structures that possibly impact the performance of the pull-based development model. We build the pull-based networks of 7,850 GitHub projects. In a pull-based network, two developers are connected if one of them has merged at least one pull request that was submitted by the other. We describe the pull-based networks by a set of network metrics, such as the centralization, and the reciprocity. The network metrics capture the roles of developers as integrators or contributors or both (i.e., reciprocity). The network metrics can also identify the existence of core developers, or equally important participants (i.e., centralization). We use the network metrics of the pull-based networks to infer the team structures formed in the GitHub projects. A team structure reflects how a development team self-organizes as they submit and review the pull requests. We systematically identify the set of existing team structures based on a set of influential network metrics from the open source projects. Specifically, we investigate the following four research questions:

RQ1. What are the influential network metrics on the performance of the pull-based development model?
We compute the network metrics of the pull-based networks. Then, we compute four performance metrics that reflect the productivity and efficiency of a team in managing the pull requests. We build a regression model to identify the influential network metrics on the performance of a team. We find that three metrics (i.e., density, out-degree centralization and reciprocity) are significantly associated with all four performance metrics.
RQ2. What are the common team structures in the pull-based development model?
We capture the team structures using the three influential network metrics that are identified in RQ1. We define the possible team structures by discretizing the values of the three influential network metrics (e.g., the out-degree centralization metric is discretized into 3 levels based on its distribution). We obtain 18 (i.e., 3 × 3 × 2) structures in total. We observe that 8 dominant team structures are adopted by over 90% of the projects. More than a third of the projects follow a team structure characterized by developers taking dedicated roles, and disconnected sub-teams working on different parts of the project.
RQ3. Are there team structures that yield higher performance in processing the pull requests?
We attempt to rank the 8 dominant team structures based on the performance of the associated projects. The team structures describing well-connected teams with a small number of core contributors exhibit the highest performance.
RQ4. Does changing the team structure over time have an impact on the performance of the pull-based development model?
The team structures of projects evolve over time. We compute the team structures and the performance metrics of a project at different temporal snapshots. The adoption of the more desirable team structures that we identify in RQ3 is strongly associated with an improvement in the performance of the pull-based development model.

Paper organization

We describe the related work in Section 2. The background on the pull-based network is presented in Section 3. We present the experimental setup of the empirical study in Section 4, followed by our results in Section 5. We discuss the threats to validity in Section 7, and conclude in Section 8.

2 Related Work

In this section, we first present the related work on the governance of open source projects, followed by the evaluation of pull requests. We then discuss the developer social networks and their impact on the development process.

2.1 Governance in Open Source Projects

The governance of software projects is a process by which the projects are strategically managed to control the progress and continuous commitment of the developers (Capra et al. 2008). O’Mahony and Ferraro (2007) argue that although the online communities that form the open source projects are enabled by technology, they are not immune to the well-known general principles of organizing. Even if the technical contributions of the developers are a vital part of the progress of the open source projects, O’Mahony and Ferraro (2007) further argue that the process of coordinating the developers became vital to leadership, particularly as projects become mature. As such, a line of work emerged to investigate the social and informal structures of open source projects (Rigby et al. 2013; Crowston and Howison 2006; Dinh-Trong and Bieman 2005). In a study by Rigby et al. (2013), the relationship between open source project governance and distributed version control is investigated. Similar to our study, the relationship between two developers x and y is defined by the number of times x signed off or reviewed the code change of y and vice versa. Accordingly, it is found that large open source projects are oligarchies or dictatorships that have a large number of external contributors who do not have the sign-off authority. In an effort to examine the social structure of open source projects, Crowston and Howison (2006) look into the interactions related to the bug fixing process, through the issue tracking systems. The study reveals that although the project teams are highly hierarchical, the centralization levels tend to vary and are negatively correlated to project size, suggesting that large projects are more modular. Dinh-Trong and Bieman (2005) investigate the common characteristics in the development processes of successful open source projects. 
For instance, the FreeBSD project follows prescribed processes that determine developers’ responsibilities, deal with enhancements and defects, and manage releases. Both the FreeBSD and Apache projects have a small set of core developers who control the code base. Bird et al. (2008) study the latent sub-communities from the email social network of several projects to understand how successful open source projects can self-organize. It is revealed that a strong community structure existed within the communication patterns of the participants, and that the structure was more modular when the discussions in the emails focused directly on source code artifacts. Additionally, sub-communities within a project were also representative of the collaboration behavior of the developers. In terms of developers’ roles, Joblin et al. (2017) propose a relational perspective to classify developers into core and peripheral using network metrics. The authors further report that core developers exhibit upper positions in the hierarchy, high positional stability, and are at the centre of coordination with other developers.

In prior work, the social structures of open source projects have been captured using metrics such as hierarchy (Rigby et al. 2013), and centralization (Crowston and Howison 2006). Given the rich insights that could be obtained from network structure, we include in our study a more comprehensive set of network metrics (Butts et al. 2008) shown in Table 1, and we attempt to identify the most significant metrics in the context of the pull-based development model (RQ1) in order to capture the team structures (RQ2).

Table 1 A descriptive summary of the pull-based networks extracted from the GitHub projects

2.2 Evaluation of Pull Requests

In social coding, the evaluation of the pull requests made by external contributors plays a key role in the success of distributed software development. The evaluation of external contributions involves both social (Ducheneaut 2005; Marlow et al. 2013; von Krogh et al. 2003) and technical factors (Jiang et al. 2013; Mockus et al. 2002; Rigby and Storey 2011). Prior work has found that the social and technical impressions of external contributors influence the evaluation of their contributions (Tsay et al. 2014a, 2014b). Particularly, Tsay et al. (2014a) found that project integrators are likely to consider both the technical quality of the contribution and the social connection of the contributor to the project integrators. For instance, pull requests with many comments were less likely to be accepted, and their acceptance was dependent on the submitter’s prior interaction with the project. Moreover, Tsay et al. (2014a) report that well-established projects were more conservative in accepting pull requests. In addition to the technical factors, such as code quality (Gousios et al. 2015; Tsay et al. 2014a), adherence to project conventions (Gousios et al. 2015), and inclusion of test code in the pull request (Gousios et al. 2015; Tsay et al. 2014a), the study by Gousios et al. (2014) shows that the time it takes to accept and merge a pull request is also influenced by the previous track record of a developer. Yu et al. (2014a) propose to recommend a pull request reviewer based on comment networks of projects, since the review process is mostly embedded in the discussion section of the pull request. Vasilescu et al. (2015) study the effect of introducing continuous integration to the pull request process. Continuous integration is found to be associated with more pull requests being processed (i.e., either accepted and merged, or rejected), without compromising the quality of the source code.

Given the importance of the social aspects in the evaluation of the pull requests as shown by previous studies (Gousios et al. 2014; Tsay et al. 2014a), we complement the existing line of study by investigating the team structures formed within the pull-based development model (RQ2), and their association with the productivity and efficiency of processing the pull requests (RQ3).

2.3 Developer Social Networks

Distributed development is very common for OSS projects. A number of studies (Ehrlich and Cataldo 2012; Wolf et al. 2009; Zanetti et al. 2013) have investigated the social aspects of distributed development, and their relation to the collective and individual performance of a distributed team. Such studies examine the networks formed by the developers as they contribute, communicate, and possibly thrive in their respective communities. There are many ways to build a developer network. Two developers can be connected if they communicated in a discussion thread in the past, thus forming a communication network. Bettenburg and Hassan (2010) and Wolf et al. (2009) report that the structure of communication networks shows an association with future failures, in addition to the quality of bug reports, as revealed by Zanetti et al. (2013). In the communication network of a large software project, Ehrlich and Cataldo (2012) found that the centrality of developers in the network indicates their performance in fixing bugs. The follow networks of developers capture the follow behaviours among developers in social coding platforms, such as GitHub. Schall (2014) examines the follow network of developers on GitHub to recommend who to follow. The purpose is to help developers build a reputation and a strong network among their peers (Schall 2014). Yu et al. (2014b) mine the follow networks, and identify the behaviour patterns of developers from the networks (e.g., star, group, or hub shaped). Yu et al. (2014b) further claim that the identified behaviour patterns can inform the design of assistive tools for developers, such as recommendation systems.

Pull-based collaboration networks capture a different layer of interactions among developers in social coding platforms. In the context of the pull-based development, the collaboration happens when one developer reviews the pull request made by others, as studied by Rigby et al. (2013). However, the hierarchy of the networks is the only aspect studied by Rigby et al. (2013) in their investigation of the review networks of developers. In our paper, we perform a more comprehensive study and include other network metrics to capture the team structures of the projects in the pull-based development model. Additionally, we also look into the evolution of the pull-based networks over time (RQ4).

3 Pull-Based Networks

In this section, we provide a background of the pull-based software development, describe the pull-based networks inferred from the existing open source projects, and discuss the metrics used to capture the performance of the pull-based development model.

3.1 Pull-Based Software Development

The pull-based development model has become the de facto standard of collaboration within open source projects (Gousios et al. 2015). There are two types of roles for developers to participate in a pull-based model: 1) contributors who make the code changes and submit the pull requests; and 2) integrators who are responsible to review pull requests and decide whether to merge the pull requests to the main code base. A contributor can either be part of the project maintainers or an external developer to the project. An integrator is, on the other hand, necessarily part of the team that maintains the project. In some projects, the project maintainers can directly commit their changes to the code base; while external developers need to create pull requests to submit their changes. In other projects, the project maintainers and external developers can both solely use pull requests to submit code changes. In this case, pull requests are used to track, review, and discuss all the code changes (Gousios et al. 2015).

On social coding websites, such as GitHub and BitBucket, contextual and structured information is recorded for each pull request. For instance, a single pull request contains three tab pages on GitHub, as shown in Fig. 1: 1) the “Conversation” tab page is used to track the discussions and activities related to the pull request; 2) the “Commits” tab page shows all the commits associated with the pull request; and 3) the “Files changed” tab page lists all files changed in the pull request and records the differences resulting from each code change.

3.2 Pull-Based Network

We define a pull-based network as a directed and weighted graph. Each node represents a developer. An edge between two nodes signifies that two developers have engaged in a <contributor, integrator> collaboration. The edges of the network are weighted by the number of times that two developers collaborated in the past. We conjecture that the <contributor, integrator> relationship constitutes collaboration between two developers, because the review process of a pull request involves both a review of the code submitted, along with back-and-forth discussions to request changes if needed. The network includes all the developers who have either submitted a pull request, reviewed and integrated a pull request, or both (regardless of the developers’ level of participation in the project). For each project, we build a pull-based network, as shown in Fig. 2. We represent the pull-based networks as a set of vectors with each vector in the form of <Contributor, Integrator, Number_collaborations>. We show in Table 1 a descriptive summary of the pull-based networks of the 7,850 GitHub projects.
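The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' tooling: the record fields `submitted_by` and `merged_by` are hypothetical names, and skipping self-merged pull requests is an assumption that the text does not state.

```python
from collections import Counter

def build_pull_based_network(pull_requests):
    """Return {(contributor, integrator): number_of_collaborations}.

    Each key is a directed edge from the developer who submitted a pull
    request to the developer who merged it; the value is the edge weight.
    """
    edges = Counter()
    for pr in pull_requests:
        contributor = pr["submitted_by"]   # hypothetical field name
        integrator = pr.get("merged_by")   # hypothetical field name
        # Pull requests that were never merged add no edge.
        # Assumption: self-merged pull requests are skipped as well.
        if integrator is None or integrator == contributor:
            continue
        edges[(contributor, integrator)] += 1
    return dict(edges)

# Synthetic records, for illustration only.
pull_requests = [
    {"submitted_by": "alice", "merged_by": "bob"},
    {"submitted_by": "alice", "merged_by": "bob"},
    {"submitted_by": "bob", "merged_by": "alice"},
    {"submitted_by": "carol", "merged_by": None},
]
network = build_pull_based_network(pull_requests)
```

Each dictionary entry corresponds to one <Contributor, Integrator, Number_collaborations> vector.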

It is possible to mirror the network structure of development teams using other types of relationships, such as the co-editing of files or the participation in communication threads. However, in this paper, we purposefully investigate the high level structure of the development teams, as reflected by the review process of the pull-based development model (i.e., the <contributor, integrator> relationship). Our goal is to infer from the constructed networks the structures of the development teams in the pull-based development model.

We use the network metrics to describe the pull-based networks. From the network metrics, we can infer information such as the centralization of the developers, or how densely the developers in a network are connected. Specifically, network metrics are used to describe the structural properties of a network in its entirety (Anderson et al. 1999), in terms of centralization (Freeman 1977, 1978), informal organization (Krackhardt 1994), and general structure (Garlaschelli and Loffredo 2004). We compute 10 commonly used network metrics to understand the team structures in the context of the pull-based development model. We capture the team structures based on a discretization of the most influential network metrics that we identify in RQ1. As such, we are able to systematically define the different structures formed by the developers as they collaborate through the pull-based model. Table 2 shows the list of the network metrics and the corresponding descriptions.

Table 2 The extracted network metrics

3.3 Performance Metrics for Evaluating the Pull-Based Model

It is important to process the incoming pull requests in an efficient and productive manner in order to maximize the benefits of the pull-based model. In previous studies on the pull-based development model (Gousios et al. 2014; Yu et al. 2015), models are built to predict the decision to merge a pull request, and the time it takes to process it.

In our study, we focus on the responsiveness of the team in processing the pull requests. Since it is not our goal to identify the factors behind a pull request acceptance, we do not consider the decision to merge a pull request as an outcome metric, but we include the time to process a pull request. Moreover, we add three metrics, i.e., the ratio of long running pull requests, the number of pull requests closed daily, and the response time. We explain the performance metrics in more detail below.

3.3.1 Productivity

We compute the following two metrics to capture the productivity of a development team. The productivity metrics are designed to assess whether the developers are able to produce the intended results, i.e. closing the pull requests, within a time period.

The ratio of long running pull requests. GitHub defines a long running pull request as one that has lived for more than a month, with some activity (e.g., a comment) within the past month (Rick 2013). This metric helps us assess whether the team leaves pull requests lingering for an extended period of time. The higher the ratio of the long running pull requests, the lower the productivity of the team.
The average number of pull requests closed daily. The higher the average, the more productive the team is. As more pull requests are closed, more issues are fixed and more new features are introduced to the project.
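As a rough illustration of the two productivity metrics, the sketch below computes them from a list of pull request records. Several choices are assumptions and are not taken from the paper: a month is treated as 30 days, a long running pull request must still be open at the snapshot date, the ratio is taken over all pull requests of the project, and the field names `opened_at`, `closed_at`, and `last_activity_at` are hypothetical.

```python
from datetime import datetime, timedelta

MONTH = timedelta(days=30)  # assumption: "a month" is approximated as 30 days

def long_running_ratio(pull_requests, snapshot):
    """Share of pull requests that are open, older than a month, and recently active."""
    if not pull_requests:
        return 0.0
    long_running = sum(
        1 for pr in pull_requests
        if pr["closed_at"] is None
        and snapshot - pr["opened_at"] > MONTH
        and snapshot - pr["last_activity_at"] <= MONTH
    )
    return long_running / len(pull_requests)

def average_closed_per_day(pull_requests, start, end):
    """Average number of pull requests closed per day in the window [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("the window must span at least one day")
    closed = sum(
        1 for pr in pull_requests
        if pr["closed_at"] is not None and start <= pr["closed_at"] < end
    )
    return closed / days

# Synthetic records, for illustration only.
snapshot = datetime(2016, 11, 1)
prs = [
    {"opened_at": datetime(2016, 8, 1), "closed_at": None,
     "last_activity_at": datetime(2016, 10, 20)},
    {"opened_at": datetime(2016, 10, 25), "closed_at": None,
     "last_activity_at": datetime(2016, 10, 26)},
    {"opened_at": datetime(2016, 10, 1), "closed_at": datetime(2016, 10, 3),
     "last_activity_at": datetime(2016, 10, 3)},
    {"opened_at": datetime(2016, 10, 10), "closed_at": datetime(2016, 10, 12),
     "last_activity_at": datetime(2016, 10, 12)},
]
ratio = long_running_ratio(prs, snapshot)
closed_daily = average_closed_per_day(prs, datetime(2016, 10, 1), snapshot)
```

In this synthetic example, only the first record qualifies as long running, and two pull requests are closed within the 31-day window.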

3.3.2 Efficiency

We extract the following two metrics to quantify the efficiency of a development team. The efficiency metrics are meant to measure whether the developers process the pull requests using the least amount of resources, i.e., time.

The average response time. The time it takes project maintainers to provide a first response to the pull request. The sooner project maintainers provide initial feedback to the contributor, the more likely the contributor is to be motivated to work on the requested reviews to improve the quality of the code change.
The average processing time. The time it takes the team to process and close a pull request. The lower the processing time, the sooner the integrators can focus on processing other pull requests, and the sooner contributors can work on new code changes.
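Both efficiency metrics are averages of a time difference, so a single helper is enough for a sketch. The field names (`opened_at`, `first_response_at`, `closed_at`) and the choice of hours as the unit are assumptions made for illustration; pull requests that lack the end timestamp are simply left out of the average.

```python
from datetime import datetime
from statistics import mean

def average_hours(pull_requests, start_key, end_key):
    """Mean elapsed time in hours between two timestamps of each pull request.

    Pull requests without the end timestamp (e.g., never answered) are ignored.
    Returns None when no pull request has both timestamps.
    """
    durations = [
        (pr[end_key] - pr[start_key]).total_seconds() / 3600
        for pr in pull_requests
        if pr.get(end_key) is not None
    ]
    return mean(durations) if durations else None

# Synthetic records, for illustration only.
prs = [
    {"opened_at": datetime(2016, 1, 1, 0), "first_response_at": datetime(2016, 1, 1, 6),
     "closed_at": datetime(2016, 1, 2, 0)},
    {"opened_at": datetime(2016, 1, 10, 0), "first_response_at": datetime(2016, 1, 10, 2),
     "closed_at": datetime(2016, 1, 12, 0)},
]
avg_response = average_hours(prs, "opened_at", "first_response_at")
avg_processing = average_hours(prs, "opened_at", "closed_at")
```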

To ensure that the performance metrics can capture distinct information, we compute the pairwise correlation among the collected metrics using the Spearman’s rank coefficient. We choose Spearman’s rank correlation test over other non-rank correlation tests (e.g., Pearson’s coefficient) because rank correlation is more robust to data that is not normally distributed (Zar 2005). For each pair of metrics, we find that the value of the Spearman’s rank coefficient is always less than 0.7 (i.e., the recommended threshold by Zar (2005)). Therefore, we use all four metrics to measure the productivity and the efficiency of a development team.
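For readers unfamiliar with the rank coefficient, the following sketch computes Spearman's correlation as the Pearson correlation of the ranks, with tied values receiving their average rank. It is a plain standard-library illustration rather than the statistical routine used in the study, and the sample values are synthetic.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic values: the relation is monotone but far from linear.
response_time = [2, 5, 9, 30]
processing_time = [10, 40, 90, 1000]
rho = spearman(response_time, processing_time)
```

Because only the ordering matters, the skewed values in the example still yield a coefficient of 1 up to floating-point error, which illustrates why a rank-based test suits data that is not normally distributed.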

4 Experimental Setup

In this section, we provide details on collecting and processing the GitHub data. Figure 3 depicts our experimental setup including the overall approach.

4.1 Collecting the GitHub Data

GitHub is not only the largest code host (over 38 million repositories), but also a very popular social coding platform. GitHub provides issue tracking, pull requests, commits history, subscriptions to other users, and documentation. Developers can easily share their profile and their activities through GitHub.

To collect the GitHub data, we use GhTorrent (Gousios 2013), an off-line mirror of the GitHub data. GhTorrent has been collecting data since February 2012 and is updated periodically, i.e., every two to three weeks. We download eight temporal snapshots (i.e., 2014-01-02, 2014-08-18, 2015-01-04, 2015-08-07, 2016-02-16, 2016-03-01, 2016-06-01, and 2016-11-01) of the GitHub data dump. We intentionally keep approximately a 6-month interval between each two snapshots whenever possible. The multiple snapshots enable us to study the evolution of the pull-based networks. We apply the following three filters to select the subset of subject projects:

F1. Programming language filter. We choose the projects that are written in the ten most popular programming languages on GitHub: JavaScript, Java, Python, CSS, PHP, Ruby, C++, C, Shell, and C#.
F2. Type of project filter. We only extract the non-forked projects. A non-forked project is an original repository that was started from scratch, as opposed to forked projects, which are copies of other repositories. At this step, we obtain over six million projects.
F3. Activity level filter. An almost equal number of projects use pull requests and shared repositories for distributed collaboration (≈ 14% each) (Gousios et al. 2014). The remaining projects, which use neither collaboration approach (over 60%), are single-developer projects (Gousios et al. 2014). We focus on the most active projects in terms of the number of recorded pull requests, as we need to build pull-based networks. We select the projects at or above the 95th percentile, with over 100 recorded pull requests. In total, we obtain 7,850 projects with a total of 2,854,917 pull requests.

4.2 Computing and Normalizing the Network Metrics

We process the pull-based networks using the R package SNA (Social Network Analysis) developed by Butts (Butts et al. 2008). The SNA package transforms each network into a matrix, and provides a set of functions (e.g., grecip()) to compute the network metrics listed in Table 2. Moreover, it is important to include other project measures that have been shown to have strong predictive power in previous studies (Moser et al. 2008; Nagappan and Ball 2007). Therefore, we include the number of commits and the number of developers over time, to control for the impact of the activity level of a project and the size of the project team. To control for the impact of the number of developers, we use a normalized metric \(\frac{nodes}{edges}\), where the number of nodes is a simple count of the developers in a project, and the number of edges describes the sparsity of collaborations among the developers.
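To make the quantities concrete, the sketch below computes three simple graph-level values from a directed edge list: the density, a reciprocity value, and the \(\frac{nodes}{edges}\) ratio. It is a standard-library illustration and does not reproduce the SNA package. In particular, the edgewise definition of reciprocity used here (the share of edges whose reverse edge also exists) is an assumption, since reciprocity can be defined in several ways; edge weights and self-loops are ignored.

```python
def network_summary(edges):
    """Summarize a directed network given as (contributor, integrator) pairs."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop duplicates and self-loops
    nodes = {node for edge in edge_set for node in edge}
    n, m = len(nodes), len(edge_set)
    # Density: observed edges over the n * (n - 1) possible directed edges.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Edgewise reciprocity (assumed definition): share of edges that are returned.
    returned = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    reciprocity = returned / m if m else 0.0
    # Size control described in the text: number of nodes over number of edges.
    nodes_per_edge = n / m if m else float("inf")
    return {"density": density, "reciprocity": reciprocity,
            "nodes_per_edge": nodes_per_edge}

# Synthetic network, for illustration only.
summary = network_summary([("alice", "bob"), ("bob", "alice"), ("carol", "bob")])
```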

Reason for the normalization

When two networks have different sizes, it is not recommended to directly compare the values of their associated network metrics (Anderson et al. 1999; Butts et al. 2008; de Reus and van den Heuvel 2013). For instance, we assume that two networks N₁ and N₂ have the same centralization value C, but different sizes (N₁ > N₂). The centralization of a network measures the importance of the different nodes based on the number of edges. As a network grows in size, its centralization value inevitably changes as well. A centralization value equal to C is within the norm for N₁, compared to other networks of the same size. However, the same centralization C is larger than what is usual for the smaller network N₂. A prior study (Anderson et al. 1999) has shown that the interaction between network metrics and the size of a network cannot be ignored. Considering the intrinsic dependence on the size of a network, it is likely that the difference in the metric values can be partly explained by the difference in the network sizes. Therefore, it is important to normalize the network metrics by controlling for the effect of the network size. The normalized metrics allow for a sounder interpretation of the network metric values, and a fair comparison of graphs with different sizes (de Reus and van den Heuvel 2013).

The CUG test for normalization

To control for the effect of size, we perform the Conditional Uniform Graph (CUG) hypothesis test (Anderson et al. 1999), a simple model that fixes certain properties of a network (e.g., the number of nodes) at particular values, and treats all networks meeting the selected properties as equally probable. The CUG test is adequate for the task of controlling for the effect of size on the remaining network metrics, as the effect of size is the only substantial effect reported in the literature (Anderson et al. 1999; Butts et al. 2008; de Reus and van den Heuvel 2013). In the CUG test, a baseline model is built and used as the null hypothesis. Under the baseline model, a number of networks of the same size as the input network are simulated using Monte Carlo simulation (Handcock et al. 2008). Monte Carlo simulation shuffles edges while fixing the number of nodes to simulate the networks for the baseline model. The test generates the distribution of a network metric under the baseline model, and compares the observed network metric to the baseline distribution. To perform the CUG tests, we use Statnet, an R package developed by Handcock et al. (2008). For each network metric value, the CUG test returns the probability of the observed value being greater than or equal to the values under the baseline model (i.e., Prob_greater = Prob(X <= Observed)), and the probability of the observed value being less than or equal to the values under the baseline model (i.e., Prob_less = Prob(X >= Observed)).

Normalizing the metrics

To normalize the values of the network metrics, we choose to transform each metric value into Prob_greater, as we find it easier to interpret. When Prob_greater is closer to 1, the value of the network metric is unusually high for networks of the same size. The closer Prob_greater is to 0, the smaller is the observed value of the network metric compared to the baseline. For instance, assuming Prob_greater = 0.9 for the metric centralization in a network, we can conclude that the network is particularly centralized compared to other networks of the same size. Thus, the normalized metric values help us compare the strength of a network property across networks with different sizes.
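The normalization can be illustrated with a small Monte Carlo sketch. It is not the Statnet implementation: the baseline here conditions on the number of nodes only, so each possible directed edge is drawn with probability 0.5 (which makes all graphs of that size equally probable), and the metric, the number of simulations, and the seed are arbitrary choices made for the example.

```python
import random

def edgewise_reciprocity(edges):
    """Share of directed edges whose reverse edge also exists (assumed definition)."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    returned = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return returned / len(edge_set)

def random_digraph(n_nodes, rng):
    """Draw a directed graph uniformly at random among all graphs on n_nodes nodes."""
    return [(u, v) for u in range(n_nodes) for v in range(n_nodes)
            if u != v and rng.random() < 0.5]

def prob_greater(observed_edges, n_nodes, metric, simulations=500, seed=0):
    """Estimate Prob(X <= Observed), where X is the metric under the baseline model."""
    rng = random.Random(seed)
    observed = metric(observed_edges)
    hits = sum(1 for _ in range(simulations)
               if metric(random_digraph(n_nodes, rng)) <= observed)
    return hits / simulations

# Two synthetic 6-node networks: every edge returned vs. no edge returned.
mutual_ring = [(i, (i + 1) % 6) for i in range(6)] + [((i + 1) % 6, i) for i in range(6)]
one_way_ring = [(i, (i + 1) % 6) for i in range(6)]
p_mutual = prob_greater(mutual_ring, 6, edgewise_reciprocity)
p_one_way = prob_greater(one_way_ring, 6, edgewise_reciprocity)
```

The fully reciprocated ring obtains a normalized value of 1, since no simulated network can exceed a reciprocity of 1, whereas the one-way ring obtains a value close to 0.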

5 Results

In this section, we present the results of our experiments with respect to four research questions.

RQ1. What are the influential network metrics on the performance of the pull-based development model?

Motivation

In distributed software development, the social and organizational aspects have an impact on the individual and collective performance of the developers (Ehrlich and Cataldo 2012). As such, the performance of the pull-based development model is governed by both technical factors (e.g., the quality of the code changes), and social factors (e.g., the team structure). However, it is unclear which team structure properties have the highest impact on the performance of processing the pull requests. In this research question, we identify the network metrics (described in Section 3.2) that have a significant association with the performance metrics of the pull-based development model (listed in Section 3.3).

Approach

For each subject project, we first build a pull-based network. Second, we compute and normalize the network metrics to describe the structural properties of the pull-based network (see Section 3.2). Finally, we conduct the following steps to identify the influential network metrics.

Reduce highly-correlated metrics

In the presence of highly-correlated metrics, the estimate of the impact of one metric on the dependent variable tends to be less precise, thus weakening the classification model. Therefore, we use the R function cor() to generate the correlation matrix of the number of vertices and edges in the network, in addition to the ten network metrics. If the correlation between two metrics is more than 0.7 (i.e., the recommended threshold by Zar (2005)), we select the one which is easier to interpret in the context of the pull-based development model.
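
As an illustration of this filtering step, the following sketch flags the pairs of metrics whose correlation exceeds the 0.7 threshold. It assumes the Pearson correlation (the default method of the R function cor()) and compares the absolute value of the coefficient to the threshold; both choices are assumptions, as is the toy data. The helper names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def correlated_pairs(metrics, threshold=0.7):
    """List the metric pairs whose absolute correlation exceeds the threshold."""
    names = sorted(metrics)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(metrics[first], metrics[second])
            if abs(r) > threshold:  # assumption: the sign of r is ignored
                pairs.append((first, second, round(r, 3)))
    return pairs

# Hypothetical normalized metric values for five projects.
metrics = {
    "density": [0.1, 0.2, 0.3, 0.4, 0.5],
    "efficiency": [0.9, 0.8, 0.72, 0.6, 0.5],
    "reciprocity": [0.3, 0.1, 0.4, 0.2, 0.3],
}
pairs = correlated_pairs(metrics)
print(pairs)
```

For each flagged pair, one of the two metrics would then be kept based on how easily it can be interpreted; that choice is a manual judgment and is not automated in this sketch.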

Build a regression model

The purpose of the analysis is to model the relationship between the response variable (i.e., the performance metrics, such as the average response time) and the predictors (i.e., the network metrics, such as the density). Therefore, we use linear regression to determine which predictors are statistically significant and how changes in the predictors relate to changes in the response variable.

For each performance metric, we build a separate regression model and use the R² metric to assess the fit of the model. The R² measures the “variability explained” of the response variable that is analyzed (Steel and Torrie 1960). For instance, an R² of 0.5 indicates that 50% of the variability of the response variable is being modeled (i.e., “explained”) by the predictors. The remaining 50% of the variability may be due to external factors that are not being modeled or cannot be controlled. The interpretation of R² values depends on the analysis that is being performed. For example, when the main goal is prediction, the R² values should be very high (e.g., around 0.7 to 0.9) (Choi and Varian 2012). Low R² values (e.g., around 20%) may also generate interesting insights in fields such as social sciences or psychology (Bersani et al. 2016).
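
To make the meaning of R² concrete, the sketch below fits an ordinary least squares line with a single predictor and computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. It is a simplification: the models in this study use several predictors, and the data and function names here are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variability of ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: normalized density and processing time in seconds.
density_values = [0.1, 0.2, 0.3, 0.4, 0.5]
processing_time = [9000, 7000, 8000, 4000, 5000]

intercept, slope = fit_simple_ols(density_values, processing_time)
r2 = r_squared(density_values, processing_time, intercept, slope)
print(f"slope={slope:.1f} intercept={intercept:.1f} R2={r2:.3f}")
```

The slope plays the role of the regression coefficient: it is the expected change in the response variable when the predictor increases by one unit.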

Identify the influential network metrics

We identify the predictors (i.e., network metrics) that show the highest association with the response variables (performance metrics). The influential predictors can then be used to define the team structures. Therefore, we identify the significant predictors (p-value < 0.05). We also report the regression coefficients of the predictors, to assess the influence of each predictor on the response variable. The regression coefficient tells us how much the response variable (e.g., the response time) is expected to increase when the predictor variable (e.g., the density) increases by one, holding all the other predictors constant. The regression coefficients of different predictors are not always comparable because the predictors have different types of unit. For example, the response time is measured in seconds, while the number of pull requests closed daily is counted as units of pull requests.

Results

The correlation analysis leads to the removal of two network metrics (i.e., hierarchy and efficiency). We show in Fig. 4 the results of the correlation analysis. We retain the metric density (over the efficiency) because it reflects whether developers collaborate with the entire team or only a subgroup of developers in the team. The metric efficiency measures whether the network uses as few edges as possible to connect the developers (see Table 2). Low network efficiency means two developers are indirectly connected more than once, which is not as easy to interpret in this context as the density. Finally, we choose the reciprocity (over the hierarchy) because the reciprocity measures whether the developers take single or multiple roles in the team (i.e., contributors and integrators). The notion of hierarchy is not applicable in the pull-based development model because a directed edge from a contributor to an integrator does not indicate hierarchy levels, but rather collaboration.

The four most influential network metrics in terms of their association with the performance of the pull-based development model are: the vertices over edges, reciprocity type 2, out-degree centralization, and the density. To select the top influential network metrics, we identify the metrics that return a p-value < 0.05. We further report the regression coefficients of the selected metrics, to measure the influence of each predictor. Table 3 shows the regression coefficients of the significant predictors for each of the linear regression models (a model is built for each performance metric defined in Section 3.3). We also show in Table 3 the coefficient of determination R² of the trained models. The resulting R² values are low (i.e., 0.32 or less); the network metrics can therefore explain at most 32% of the variability of the performance metrics. In other words, the team structure properties (as measured by the network metrics) can only partly explain the productivity and efficiency of the development team in processing the pull requests. The remaining variability is likely due to other factors, such as the complexity of the code changes in the pull requests and the time availability of the developers. However, we can still infer interesting insights about the relationship between the network metrics and the performance metrics. For instance, an increase of one unit in the network density is associated with a decrease of 12465.85 seconds (3.46 hours) in the processing time of the pull requests. A denser network implies that developers collaborate at the pull request level with diverse developers, instead of a reduced number of developers. In other words, encouraging more collaboration links among the developers who process the pull requests could be associated with a shorter processing time of the pull requests. Unsurprisingly, the processing time tends to increase when a project receives more commits. Every additional commit is associated with an increase of 16983.65 seconds (4.72 hours) in the processing time. A higher reciprocity is possibly associated with a lower processing time. A reciprocal link between two developers indicates a prior connection between the two developers. Therefore, this result confirms a previous finding by Tsay et al. (2014a) regarding the role that a contributor’s prior connection to the project integrators has on the processing of the pull requests.
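
The unit conversions quoted above can be checked with a short calculation. The two coefficients are taken from the text; the last step is an assumption rather than a reported result: it supposes that the density enters the model as a normalized value between 0 and 1, so that a step of 0.1 is a more realistic change than a full unit. The variable names are hypothetical.

```python
SECONDS_PER_HOUR = 3600

def seconds_to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR

# Regression coefficients quoted in the text (seconds of processing time).
density_coef_s = -12465.85  # per one-unit increase in density
commit_coef_s = 16983.65    # per additional commit

density_hours = seconds_to_hours(abs(density_coef_s))
commit_hours = seconds_to_hours(commit_coef_s)
print(f"density: {density_hours:.2f} h, commit: {commit_hours:.2f} h")

# Assumption: density is a normalized value in [0, 1], so we also scale the
# coefficient to a 0.1 step under the same linear model.
step = 0.1
effect_of_step_s = density_coef_s * step
print(f"effect of a {step} increase in density: {effect_of_step_s:.1f} s")
```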

Table 3 Regression coefficients of the significant metrics from the linear regression models

RQ2. What are the common team structures in the pull-based development model?