1 Introduction

In modern social-coding projects (Dabbish et al. 2012) based on sites like GitHub and BitBucket, which favor the pull-request model, the emergence and growth of a particular type of socio-technical link, the @-mention, can be observed in task-oriented technical discussions. For example, in the rails project on GitHub (issue 31804), one of the head developers calls on another, explicitly stating trust in their expertise: “@kamipo can you take a look since you are our MySQL expert?” On GitHub, the @-mention in issue discussions is a type of directed social link; the @-mentioner “calls” the @-mentionee via a directed communication sent to the @-mentionee through GitHub’s interface. Thus, one can consider the network of @-mentions, specifically calls, as a sort of directed social network with a task-oriented purpose. These mentions are heavily used in social coding; in our data, a sample of the most followed and starred projects on GitHub, 52.46% of issues and 22.02% of pull requests contain at least one @-mention, with an average of 1.46 and 1.37 @-mentions per issue and pull request, respectively. On average, developers who are called (while not yet actively participating in the thread) respond 19% of the time; the number rises to 42.94% when excluding those who never respond. @-mention ubiquity reflects the central role they play in task-oriented social interactions on GitHub.

Since much of a developer’s behavior in OSS projects is recorded, a person who has expertise and/or is reliable across many different tasks will be visible to others. The decision to @-mention someone is based on visible attributes of that developer, including reliability, productivity, etc. Identifying a reliable and knowledgeable person to ask for help or action is key to addressing issues in a timely manner and keeping a project vibrant and alive. In fact, Yu et al. found that having @-mentions in a discussion decreases the time to resolve an issue (Yu et al. 2016); Zhang et al. found that more difficult issues (e.g., those with longer discussions) have more @-mentions (Zhang et al. 2017). Given these important outcomes, it would be beneficial to know which (observable) socio-technical attributes of developers contribute to being @-mentioned.

As @-mentions have an inherent social element, a global model describing the determinants of @-mention calls would suggest that project-specific social idiosyncrasies are less important than social elements common across GitHub. A global GitHub model for @-mentions may be seen as constructive, since shared social norms across the ecosystem can increase social mobility (Sato and Arita 2004); on GitHub, this may make the acculturation process easier for those who move between projects. In addition, the findings of Burke et al. (2009) suggest that those who perceive themselves as socially central contribute more as a result - this may extend to code contributions on GitHub. The findings of Kavaler et al. (2017) suggest that both global and project-specific social phenomena (apropos of language use) exist on GitHub; is this the case for @-mentions? Or does one phenomenon dominate?

The goal of this paper is to understand both the elements contributing to @-mentions in GitHub projects and the extent to which those elements are shared between projects across GitHub. @-mentioning is a complex, multidimensional phenomenon. Developers who are often @-mentioned can have outsized roles and responsibilities in the project network, and may be expected to handle a wide range of tasks. Thus, a frequently @-mentioned person could well be a strong, highly visible contributor, who might be a trustworthy collaborator on an active task. Whereas visibility can be operationalized more directly, based on a person’s aggregate presence in all aspects of the social coding process, both reliability and trust are more complex: we describe the theoretical background for these in the next section. Starting from those theories, and from data on @-mentions and comprehensive developer and project metrics from 200 GitHub projects, we seek a predictive, quantitative model of future @-mentions of a developer, using past observations of the developer’s visibility, expertise, productivity, and responsiveness in their projects. From our quantitative models, together with case studies aimed towards triangulating the model results:

  • We find that we can mine a reliable @-mention signal from GitHub data, in ways consistent with current theories in sociology, psychology, and management.

  • We see a net positive effect of visibility on @-mentions. We see that lower expertise (via, e.g., commits that need fixing, i.e., likely buggy commits) associates with fewer @-mentions when one has already been @-mentioned, and with more @-mentions if one has not yet been @-mentioned; this is perhaps explained by the idea that any contribution, even a defective one, associates with an initial @-mention, which is subsequently adjusted. We see positive effects for productivity, and none for responsiveness.

  • We find that cross-project model fits are generally good, suggesting a common model of @-mentions across GitHub. Similarities among the models are greater for additional @-mentions after the first @-mention than for the initial one.

  • We see indications of project-specific @-mentioning behavior; however, the high performance of cross-project prediction suggests the differences may matter little, especially for predicting @-mentions.

We present the theory and related work in Section 2, research questions in Section 3, data and methods in Section 4, results and discussion in Section 5, practical implications for practitioners in Section 6, threats to validity in Section 7, and conclusions in Section 8.

2 Theory and Related Work

To understand the notion of @-mentions in OSS projects, we build a theory drawing from diverse sources. First, we discuss @-mentions and their use on GitHub, supported by prior work. Then, we introduce theory behind GitHub @-mentioning drawn from work regarding reliability and trust in the fields of sociology, psychology, and management. We then discuss the importance of social exchange and interaction (and thus, the importance of @-mentions) on OSS project success. Finally, we compare our work to that in the field of expertise recommendation.

2.1 @-Mentions on GitHub

GitHub projects have issue trackers with a rich feature set, including ticket labeling, milestone tracking, and code tagging. In GitHub projects, individuals can open up an issue thread where others can comment and discuss a specific issue. In these discussions, developers can tag others using @-mentions; the mentioned developer receives a notification that they are being referenced in a discussion. When one decides to @-mention another developer, there is generally a specific reason, e.g., to reply to a single person in a discussion involving many others; or, to call the attention of someone who isn’t currently in the discussion. The latter aspect is what we wish to capture; calling upon another person is an implicit (and on GitHub, often explicit) statement of belief that the receiver could help address the task at hand. To validate the importance of modeling call @-mentions on GitHub, we perform a case study (Section 5.1) and also look to prior literature (below) for the reasons behind the use of @-mention.

Tsay et al. performed interviews with several developers of popular projects on GitHub, specifically related to the discussion and evaluation of contributions (Tsay et al. 2014). They found that both general submitters and core members use @-mentions to alert core developers to evaluate a given contribution or start the code review process. They further found that core members often @-mentioned other core members, specifically citing that the @-mentionee is more qualified to answer a particular question or review a given contribution. In nearly all cases, the @-mention seems to be used to draw the attention of a developer who may contribute to the task at hand. Kalliamvakou et al. surveyed and interviewed developers, mostly commercial, who use GitHub for development (Kalliamvakou et al. 2015). Of all interviewees, 54% stated that their first line of communication is the @-mention. In addition, they state that teams often use the @-mention to draw members’ attention to a problem.

2.2 @-Mentions and Personal Reliability

The ability to rely on others socio-technically is critical for cohesive workgroups. From a social perspective, Saavedra et al. argue that reliable interactions among group members are important for success, especially when tasks are interdependent (Saavedra et al. 1993). According to social learning theory, frequent interactions among group members increase the likelihood that some in the group will be raised to “role model” status (Bandura 1973; Bandura and Walters 1977). The importance of role models in social learning has been widely discussed (Bandura and Walters 1977; Burke et al. 2009; Dourish and Chalmers 1994). On GitHub, researchers have found that these role models (“rockstars”) are important influencers, allowing developers to learn from “rockstar” code contributions in order to improve their own work (Dabbish et al. 2012; Lee et al. 2013). In other words, developers rely on others within and outside their immediate working group in order to solve problems. In addition, peer code review (relying on team members other than the authors for manual inspection of source code) is recognized as a valuable tool in software projects (Ackerman et al. 1984). Thus, we argue that identifying these reliable developers, by means of the @-mention, is important for project success. We theorize that reliability will manifest itself on GitHub through responsiveness, measuring: if you are called, how often do you answer?

2.3 @-Mentions and Trust

Trust has a long-recognized, complex social component (McKnight et al. 2002; Gallivan 2001) and well-understood benefits to social and economic well-being (Inglehan 1999; Newton 2001), in both physical and virtual teams (Jarvenpaa et al. 1998). While individuals do have a personal notion of when to trust someone, in social settings those notions inherit from the communal sense of trust (Newton 2001; Jarvenpaa et al. 1998; Inglehan 1999). In socio-technical groups like software projects, contributors must be trusted as technically competent, and also as useful to the project. Gaining contributor status is a key indicator of trust, which has been extensively studied (Bird et al. 2007; Steinmacher et al. 2015; Casalnuovo et al. 2015; Gharehyazie et al. 2013; Ducheneaut 2005). In pull-request oriented models, with decentralized repositories, anyone can make changes in a fork and then submit the changes as a pull request. Here, social processes such as code review take a central role in deciding the fate of code contributions. Opinions from trusted people during the relevant discussions would be in great demand; thus, the social demand on a person is an indication of the trust placed upon them by the community. Since the pull-request model is more or less normative in GitHub projects, it is reasonable to posit that many projects in the GitHub community ecosystem may share the same determinants of @-mention extension, i.e., the reasons behind @-mention extension may be a global phenomenon.

We acknowledge that an @-mention does not necessarily arise purely from trust in the taggee; however, some form of trust likely plays a role. Thus, understanding theories of trust is important to understanding @-mentions on GitHub.

Oft-mentioned and widely discussed, the meaning and role of trust have been examined across many disciplines, including sociology, psychology, and philosophy (Zucker 1986; Brockner 1996; Kramer and Tyler 1996; Brenkert 1998; Husted 1998). Gallivan provides a succinct set of definitions for trust types as provided by prior work on organizational trust (Gallivan 2001); the types relevant for GitHub are: 1) Knowledge-based trust: trust based upon a prior history of transactions between two parties; 2) Characteristic-based trust: trust that is assumed, based on certain attributes of the other party; and 3) Swift trust: a “fragile” form of trust that emerges quickly in virtual workgroups and teams.

For our work, the idea of swift trust is important, as it is theoretically defined for virtual teams such as those on GitHub. Jones and Bowie (1998) state: “the efficiency of [virtual teams] depends on features - speed and flexibility - that require high levels of mutual trust and cooperation”; other researchers share and expand on this notion (O’Leary et al. 2002; Handy 1995). Though swift trust may initially appear most applicable, much of the founding work was done prior to the proliferation of socio-technical systems such as GitHub. More recently, Robert et al. redefine swift trust for modern systems as a combination of classical swift trust, knowledge-based trust, and parts of characteristic-based trust (Robert et al. 2009). We agree with this blended definition - a sweeping categorization of GitHub as having a swift trust system is likely incomplete; multiple trust regimes probably apply. We capture knowledge-based trust through our measures of visibility, i.e., functions of @-mention network degree. Characteristic-based trust is also likely at play; task characteristics can be easily seen on GitHub, and are captured by our measures of expertise and productivity.

2.4 @-Mentions and Social Exchange

On GitHub, the @-mention is a type of directed social link; the @-mentioner causes a notification to be sent to the @-mentionee through GitHub’s interface, a form of social communication. Thus, the network of @-mentions is a sort of social network, with a task-oriented purpose. Much work has been done in a variety of fields on identifying reasons behind social tagging and mentioning behavior, including on GitHub (Yu et al. 2014).

In the fields of psychology and sociology, many researchers have explored the phenomenon of social tagging on Facebook (Qiu et al. 2013; Burke et al. 2010; Oeldorf-Hirsch and Sundar 2015). In general, this research has shown that social tagging provides a sense of community and increases one’s social capital. These findings matter for GitHub as they elucidate the role of community social interaction, which is known to be important to OSS success (Gharehyazie et al. 2015; Gharehyazie et al. 2013). Of specific interest, Burke et al. found that those who receive feedback on their Facebook posts share more (Burke et al. 2009). It is reasonable to believe that this extends to task-oriented networks such as GitHub; those who feel that their contributions are important, socially or technically, are likely to contribute more.

McDonald et al. interviewed multiple GitHub developers and found that they rarely use product-related measures (e.g., release quality, bug fixes) to describe project success; rather, they use measures such as number of (new) contributors, pull requests, etc. (McDonald and Goggins 2013). As stated above, social exchange is important to both one’s own well-being and OSS success. As social measures have been shown to be important for OSS product success (Hossain and Zhu 2009), and given that developers generally use non-product measures to describe project success, fostering the use of @-mentions, and thus the exchange and gain of social capital, would be beneficial for both metrics of success. We capture social aspects in visibility - functions of @-mention network measures.

2.5 @-Mentions and Discourse/Dialogue

Discourse and dialogue have seen a resurgence of research interest with the advent of NLP computational methods. Stolcke et al. (2000) have most prominently defined discrete conversational speech categories into which @-mentions fit well, perhaps because they themselves are social link extensions. Stolcke et al.’s (2000) work and the other aforementioned prior work (Tsay et al. 2014; Kalliamvakou et al. 2015) helped us distill the following four categories of speech that use @-mentions (one is a slightly modified category compared to Stolcke’s work, marked by ⋆):

  1. Request (R): An explicit request towards the called person to perform some action.

  2. Request-Suggest (R-S): An implicit request towards the called person to perform some action.

  3. Inform (I): An indication that the issue or post is relevant to the called person.

  4. ⋆Credit Attribution (CA): An @-mention designed to attribute credit to the called person. This is similar to “Thank” by Stolcke et al. (2000), but explicitly directed at an individual.

We use these categories in a case study examining reasons behind call @-mentions in Section 5.1.

2.6 Expertise Recommendation

As our interest is in @-mentions, often used to call upon those with expertise relating to the task at hand, we compare our work with that in the field of expertise recommendation. Murphy and Cubranic (2004) used text classification on data from the Eclipse bug tracking system in order to identify developers relevant to a given bug; Matter et al. performed a similar study (Matter et al. 2009). Likewise, Mockus and Herbsleb developed and deployed a tool to identify expert developers working on proprietary software (Mockus and Herbsleb 2002). More directly relevant to our work, Ibrahim et al. developed a tool to identify which developers should contribute to a given discussion on mailing lists for three open source projects (Ibrahim et al. 2010). Though our work is similar in spirit to that in the field of expertise recommendation, our focus is specifically on the calling signal itself (the @-mention). This goal is distinct from expertise identification and recommendation, as we aim to identify reasons behind the signal extension itself, rather than the best receiver of said signal.

3 Research Questions

@-mentions signal a desire for a developer’s involvement in a task-oriented discussion. GitHub is a rich source of mine-able, potentially relevant, developer characteristics.

The theory above allowed us to identify relevant dimensions along which to model the phenomenon of @-mentions. We describe them briefly here, and operationalize them in the Methods section. Visibility measures the ability of others to know of a developer; if a developer is to be @-mentioned, others must know the network well enough to know whom they are capable of reaching. Expertise can be defined through task-related measures, e.g., number of likely buggy commits, which might influence how much a developer is @-mentioned. Productivity is defined by number of commits; prolific committers could be viewed as the “top brass” of a project, and commits are easy to see on GitHub. Finally, we are interested in responsiveness; if a mentionee is called to lend their talent, it is not farfetched that those who respond to the call are more likely to be @-mentioned in the future.

We explicitly model future @-mentions, i.e., @-mentions as measured 6 months beyond the “observation period”, described further in Section 4.6. Having an effective model that explicitly predicts future behavior has higher utility to potential future applications than an aggregate regression model over the whole history.

RQ 1:

Can we describe/predict future @-mentions in terms of developer visibility, expertise, productivity, and responsiveness?

Our second question relates to the utility of our model. If one wishes to use our model on their own projects, it would be helpful to be able to use the model pre-trained on some data, e.g., trained entirely on a separate project and applied to one’s own.

RQ 2:

Can models trained entirely on one project be reliably used to predict @-mentions on another project?

Our third question is more theoretical in nature. Specifically, we wish to describe the differences between projects in terms of our determinants of @-mentions and identify some potential reasons behind these differences. As GitHub is composed of subcommunities which may have their own idiosyncrasies, we believe that these differences may be reflected in our determinants of @-mentions.

RQ 3:

Is there evidence of project-specific @-mention culture? Or are the determinants of @-mentions a GitHub-wide phenomenon?

4 Data and Methodology

All data was collected by querying GitHub’s public API using the Python package PyGithub, with the exception of issue-fixing data, which was gathered by cloning individual repositories. Commits are found through the official GitHub API, including commits both within and outside pull requests. Developers for a given commit are identified automatically by inspecting the commit’s specified author within git, and querying GitHub for an existing user with the same name. If no GitHub user is found, the commit is not attributed to any user in our data. Data was gathered during the month of July 2017.
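To illustrate, a minimal PyGithub sketch of the kind of crawl described above (the token and repository name are placeholders; pagination handling, error handling, and rate-limit management are omitted):

```python
from github import Github  # PyGithub

g = Github("YOUR_API_TOKEN")           # hypothetical access token
repo = g.get_repo("rails/rails")       # example repository

# Walk all issues (GitHub's issues endpoint also returns pull requests)
# and collect comment bodies, which are later scanned for @-mentions.
for issue in repo.get_issues(state="all"):
    for comment in issue.get_comments():
        body = comment.body
        # ... extract @-mentions, authors, timestamps, etc.
```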

4.1 Filtering and Cleaning

For our data set, we looked at the top 900 most starred and followed projects, each of which is likely to contain enough issues and commits for us to model robustly. The numbers of stars and followers are proxies for project popularity. We noted that among the 900 projects there was a significant difference in popularity between the 1st and the 900th project. Some of the measures we used are expensive to calculate, so we had to limit our calculations to a smaller sample: a random subset of 200 of those 900 projects. The choice of a random sample, as opposed to, say, the very top 200, was made to ensure we captured a diverse mixture of projects in terms of popularity, and thus obtain more widely applicable results from the modeling.

Due to our described project selection method, it is possible that non-software projects are within our final sample. However, our research questions are not software specific; our findings are meant to reflect the whole of GitHub - primarily software, but also projects focused on other goals (e.g., books and link collections).

We ran multiple parallel crawlers on these 200 projects to gather commits, issues, pull requests, and associated metadata. Due to some internal issues with the PyGithub package, some projects failed to return complete data. We created a verification system (completely external to PyGithub) to determine which projects were incomplete, and removed them from consideration. Finally, we only consider developers with at least one commit to their given project, in order to avoid a proliferation of zeros in our covariates, as many developers participate in issue discussions but never contribute. This was done in order to focus on those who may become @-mentioned in the future; without any commits, we argue, one is unlikely to be @-mentioned. To support this claim, we note that in our data the average number of future @-mentions for those with zero commits is 0.382, with a standard deviation of 1.98; in contrast, the average number of future @-mentions for those with more than zero commits is 1.98, with a standard deviation of 10.62.

As we wish to explicitly model future @-mentions, we introduce a time split in our data. For each project, we define a time frame during which we “observe” the project and its participants, and calculate our response beyond that observation time frame - the “response” period. We set our response period to 6 months, i.e., from 6 months before the end of our data up to the end of our data. We also tested periods of 3 and 12 months; 3 months differed little from 6 months, and 12 months left us with too little data to model. We then filtered out each individual with project participation shorter than 3 months. This is because with 2 months of data there exists only one line which can be fit: the line with the first month as the start point and the second month as the end point. With 3 (or more) data points, there exists more between- and within-subject variance to be captured by the model, further reducing the worry of overfitting, as there are multiple lines which may fit ≥ 3 data points. Thus, we explicitly model future @-mentions, as our response period is disjoint from our observation period. In total, this yielded 154 unique projects comprising 17,171 project-developer pairs to test our hypotheses.
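To make the observation/response split concrete, a minimal pandas sketch on a toy table (the schema is an illustrative assumption, not our exact pipeline):

```python
import pandas as pd

# Toy stand-in for the crawled data: one row per call @-mention.
mentions = pd.DataFrame({
    "project": ["p1", "p1", "p1", "p2"],
    "dev": ["alice", "alice", "bob", "alice"],
    "timestamp": pd.to_datetime(
        ["2016-03-05", "2016-11-20", "2016-07-11", "2017-02-01"]),
})

# Response period: the final 6 months of available data.
cutoff = mentions["timestamp"].max() - pd.DateOffset(months=6)
observed = mentions[mentions["timestamp"] <= cutoff]  # covariate period
future = mentions[mentions["timestamp"] > cutoff]     # response period

# Response variable: future call @-mentions per project-developer pair.
response = future.groupby(["project", "dev"]).size()
```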

4.2 Issues and @-Mentions

For each project on GitHub, individuals can open an issue thread where others can comment on and discuss a specific issue. We constructed a social network for each project using @-mentions in their issue comment threads; Fig. 1 depicts this process. Similar to Zhang et al. (2015), every edge (u,v) represents developer u @-mentioning v somewhere in their post. This yields a directed multigraph; there can be multiple edges (u,v), depending on how many times u @-mentions v. We distinguish between two edge types: reply and call. A reply edge is defined by u @-mentioning v when v has already posted in the given thread. A call edge is defined by u @-mentioning v when v has not yet posted in the given thread. Thus, a call edge is representative of the phenomenon we wish to measure, described in Section 2; u calls upon v as u wishes for v’s input in the discussion at hand.
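The reply/call distinction can be computed with a single pass over each thread. A minimal sketch, assuming `thread` is a list of (author, body) comment pairs in posting order for one issue (the username pattern is simplified):

```python
import re
import networkx as nx

MENTION_RE = re.compile(r"@([A-Za-z0-9-]+)")

def mention_networks(thread):
    """Split one issue thread into reply and call multigraphs."""
    reply, call = nx.MultiDiGraph(), nx.MultiDiGraph()
    seen = set()  # developers who have already posted in the thread
    for author, body in thread:
        seen.add(author)
        for target in MENTION_RE.findall(body):
            # reply edge if the target already posted, call edge otherwise
            (reply if target in seen else call).add_edge(author, target)
    return reply, call

thread = [("alice", "we should fix this"),
          ("bob", "@alice agreed; @carol can you take a look?")]
reply_net, call_net = mention_networks(thread)
# reply: bob -> alice; call: bob -> carol
```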

Fig. 1 The network creation process. Shown is a discussion thread and the resulting reply and call networks. Note this can be a multigraph (not shown)

4.3 Focus

As a measure of visibility, we wished to capture phenomena more nuanced than merely raw indegree and outdegree, as raw degree counts do not take into consideration the larger, neighborhood view. Standard global measures used in social network analysis are often too expensive to calculate for our large @-mention networks. Thus, we require a measure that takes a more global view while remaining relatively inexpensive to calculate. Here, we introduce the idea of social focus in the @-mention network.

Theoretically, we believe that when given many choices on who should be contacted (@-mention), individuals must make a decision, based on their knowledge of the potential receiver’s characteristics (e.g., ability to help in a task) and who is more readily visible. In social networks, knowledge of others is propagated through existing links. Thus, if an individual is highly focused-on, it is likely that they will become more so in the future. This means that the more focused-on a developer is, the more visible they likely are. In addition, those who have lower social focus on others, i.e., they distribute their out-links widely among many others, are also more likely to be visible to others.

To represent focus, we adapt a metric described by Posnett et al. (2013). This metric is based on work by theoretical ecologists, who have long used Shannon’s entropy to measure diversity - and its dual, specialization - within a species (Good 1953), and can be derived from Kullback-Leibler divergence. For discrete probability distributions P and Q, Kullback-Leibler divergence (KL) is defined as:

$$ D_{KL}(P \| Q) = \sum\limits_{i} {P_{i} \ln{\frac{P_{i}}{Q_{i}}}} $$

Blüthgen et al. define a species diversity measure, \(\delta\), using \(D_{KL}\) (Blüthgen et al. 2006). This measure is calculated naturally in a bipartite graph formulation, where each species in the graph has its own diversity value \(\delta_{i}\). Posnett et al. use this metric, normalized by the theoretical maximum and minimum (i.e., so \(\delta_{i}\) ranges from 0 to 1), to measure developer attention focus (Posnett et al. 2013). When \(\delta_{i}\) (a row-wise measure) is high, developer i is more focused, committing to a smaller number of modules. Analogously, when \(\delta_{j}\) (a column-wise measure) is high, module j receives more focused attention from fewer developers. They call these quantities “developer attention focus” (\(\mathcal {DAF}_{i}\)) and “module attention focus” (\(\mathcal {MAF}_{j}\)).

In this work, we take these definitions and expand them to the social network of @-mentions. Recall that we distinguish between two types of @-mentions: reply and call. We can likewise represent our social network as a bipartite graph, where the rows and columns of the adjacency matrix both refer to developers, and each cell \(s_{uv}\) is the count of directed @-mentions from developer u to developer v for a given @-mention type. Thus, we analogously define \(\rho_{u}\) as the focus developer u gives in their reply @-mentions, and \(\rho_{v}\) as the focus developer v receives from others’ reply @-mentions. Similarly, we define \(\kappa_{u}\) as developer u’s focus in their call @-mentions, and \(\kappa_{v}\) as the focus developer v receives from others’ call @-mentions.

Recall that we can interpret these values equivalently as a measure of specialization or inverse uniformity. For example, if \(\rho_{u}\) is large, developer u specializes their replies to a select group of others; if \(\rho_{u}\) is small, developer u replies uniformly to all others. Likewise, if \(\kappa_{v}\) is large, developer v is called by a select group of others; if \(\kappa_{v}\) is small, developer v is called uniformly by all others. We believe this intuition is useful for answering our research questions. Thus, we define normalized outward social specialization and inward social specialization measures for both replies (\(\rho\)) and calls (\(\kappa\)):

$$ \mathcal{OSS}_{u,\rho} = \frac{\rho_{u} - \rho_{u,min}}{\rho_{u,max} - \rho_{u,min}} \qquad\qquad \mathcal{ISS}_{v,\rho} = \frac{\rho_{v} - \rho_{v,min}}{\rho_{v,max} - \rho_{v,min}} $$

where \(\mathcal {OSS}_{u,\kappa }\) and \(\mathcal {ISS}_{v,\kappa }\) are defined analogously.
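A minimal sketch of the computation on a toy @-mention count matrix follows; note that the naive min-max rescaling across developers at the end stands in for the per-developer theoretical normalization defined above, which is more involved (Blüthgen et al. 2006):

```python
import numpy as np

def row_specialization(S):
    """KL-based specialization (Bluthgen's d) for each row of an
    interaction matrix S, where S[u, v] counts @-mentions from u to v."""
    S = np.asarray(S, dtype=float)
    q = S.sum(axis=0) / S.sum()          # column marginals
    d = np.zeros(S.shape[0])
    for u in range(S.shape[0]):
        if S[u].sum() == 0:
            continue
        p = S[u] / S[u].sum()            # row-conditional distribution
        nz = p > 0
        d[u] = np.sum(p[nz] * np.log(p[nz] / q[nz]))
    return d

# Rows = callers, columns = callees, for the call network of one project.
S = np.array([[0, 5, 1],
              [2, 0, 2],
              [0, 3, 0]])
kappa_u = row_specialization(S)      # outward call focus per developer
kappa_v = row_specialization(S.T)    # inward call focus per developer
oss_k = (kappa_u - kappa_u.min()) / (kappa_u.max() - kappa_u.min())
```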

4.4 Attributing Commits That Need Changing

To identify commits that had to be changed in order to close an issue (i.e., likely buggy commits), we use the standard SZZ algorithm (Śliwerski et al. 2005), as expanded in Kim et al. (2006), with a few changes to accommodate GitHub nuances. GitHub has a built-in issue tracking system; developers close open issues by using a set of keywords in either the body of their pull request or the commit message. E.g., if a developer creates a fix which addresses issue #123, they can submit a pull request containing the phrase “closes #123”; when the corresponding fixing patch is merged into the repository, issue #123 is closed automatically. To identify likely bug-fixing commits, we search for associated issue-closing keywords in all pull requests and commits. We then “git blame” the respective fixing lines to identify the last commit(s) that changed those lines, i.e., the likely buggy lines. We assume the latest change to the fixing lines was the one that induced the issue, and refer to those changes as likely buggy, or buggy for short.
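A rough sketch of this GitHub-flavored SZZ step is below; the closing-keyword pattern and the line-range plumbing are simplified, and a full implementation must map fixed hunks to line ranges via the diff:

```python
import re
import subprocess

CLOSE_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE)

def issues_closed_by(message):
    """Issue numbers referenced via closing keywords in a commit/PR body."""
    return {int(n) for n in CLOSE_RE.findall(message)}

def likely_buggy_commits(repo, fixing_commit, path, start, end):
    """Blame the parent of the fixing commit for the fixed line range;
    the commits returned are the 'likely buggy' ones."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain",
         "-L", f"{start},{end}", f"{fixing_commit}^", "--", path],
        capture_output=True, text=True, check=True).stdout
    # In porcelain output, each hunk header starts with a 40-char SHA.
    return {line.split()[0] for line in out.splitlines()
            if re.match(r"^[0-9a-f]{40} ", line)}
```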

We note that an issue is a rather broad definition of a bug, as an issue can be brought up to, e.g., change the color of text in a system’s GUI; this may not be considered a bug by some definitions. However, as GitHub has the aforementioned automatic closing system, we believe that our identification of fixing commits (and therefore buggy commits) does not contain many false positives. Prior work has relied on commit message keyword search, which may introduce false positives due to project-level differences in commit message standards, i.e., what a commit message is expected to convey. These standards can vary widely (Bird et al. 2009).

4.5 Variables of Interest

We are interested in measuring and predicting @-mentions as a function of readily observable developer attributes, namely visibility, expertise, productivity, and responsiveness. We operationalize these attributes as follows:

We define visibility as the ability for developers to note a person’s existence; if developer A is not aware of the existence of developer B, it is unlikely that A would @-mention B. This is akin to knowledge-based trust. Here, we use our social specialization measures \(\mathcal {OSS_{\rho }}\), \(\mathcal {OSS_{\kappa }}\), and \({\mathcal {ISS}_{\kappa }}\), along with total social outdegree (the total number of @-mentions made by a developer in a given project), as measures of visibility. We believe these measures are reasonable as they capture one’s presence within the social network.

We define expertise as a developer’s ability to complete project tasks in accordance with team expectations, related to characteristic-based trust. To represent this, we use the number of issue-inducing commits made by a developer, the focus measure \(\mathcal {DAF}\), and a factor identifying whether or not the given developer is the top committer or project owner. A higher number of issue-inducing commits can indicate a lack of aptitude for programming according to the project’s goals. It has been shown that a higher \(\mathcal {DAF}\) (i.e., higher module specialization) is associated with fewer bugs in a developer’s code (Posnett et al. 2013). Thus, \(\mathcal {DAF}\) can represent developers’ expertise in code modules. The top committer or project owner factor indicates a certain level of prestige and expertise; one would expect the top contributor or project owner to be the most expert in matters concerning the project. The number of fixing commits was also calculated, but was not used due to collinearity with the number of likely buggy commits.

We measure productivity as the raw commit (authoring) count. Measures of productivity abound; most of those we computed (e.g., lines of code added or deleted) have been shown to correlate highly with commit count, especially in models where confounds are recognized. We choose commit count as it is the simplest.

We describe responsiveness as a measure to answer the question: when you are called, do you show up? One would expect that those who are responsive, and thus display their reliability, will be called upon again. This is precisely defined as the number of times a developer is called and responds to that call; e.g., if a developer is called in 10 unique issues and responds in 8 of those issues, their responsiveness value is 8.
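As a minimal illustration of this bookkeeping (the sets and names are toy assumptions):

```python
# Issue ids where each developer was called, and where they later commented.
called = {"alice": {1, 2, 3, 4}, "bob": {7, 9}}
answered = {"alice": {1, 2, 4}, "bob": set()}

# Responsiveness: number of called issues the developer responded in.
responsiveness = {dev: len(called[dev] & answered.get(dev, set()))
                  for dev in called}
# -> {'alice': 3, 'bob': 0}
```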

4.5.1 Extra-Project Controls

As stated, our interest is to identify readily observable attributes of potential @-mentionees (e.g., within-project social activity and commit activity), and functions thereof. This is in contrast to things that may be hard to observe, such as activity outside the project at hand (e.g., outside-project social activity, exact number of commits to other projects, etc.). However, such a control for outside experience is likely necessary as, e.g., a developer that is experienced outside the project may already be known due to outside channels, and thus have an inflated likelihood of being @-mentioned to begin with. We consider an outside-project attribute, developer’s GitHub age (in days), in order to control for experience outside the project which may lead to increased @-mentions when project contributions are relatively low. As GitHub age is readily observable through the profile interface on GitHub (e.g., by viewing the contribution heatmap), we believe this to be a reasonably observable control. Another outside-project control we considered was number of public repositories contributed to by a developer, as this is readily observable; however, this was highly correlated with age, and was thus dropped from the model.

4.6 Modeling Future @-Mentions

To answer our questions, we use count regression in a predictive model. This allows us to inspect the relationship between our response (dependent variable) and our explanatory variables (predictors or covariates, e.g., responsiveness) under the effects of various controls (e.g., project size).

There are many forms of count regression; the most popular are Poisson, quasi-Poisson, and negative binomial regression, all of which model a count response. In our work, we are interested in @-mentions as measured by the number of incoming @-mention calls per person - a count. In addition, as our data contain many zeros, we need a method that can accommodate them; the methods listed above all have moderate to severe problems with modeling zeros. Zero-inflated negative binomial regression and hurdle regression are two methods specifically designed to address this challenge by explicitly modeling the existence of excess zeros (Cameron and Trivedi 2013). It is common to fit both types of models, along with a negative binomial regression, and compare model fits to decide which structure is most appropriate. Standard analysis of model fit for these methods uses both Akaike’s Information Criterion (AIC) and Vuong’s test of non-nested model fit to determine which model works best (Vuong 1989).
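For intuition, a simplified two-part hurdle fit on toy data is sketched below using statsmodels; this is only an approximation of a proper hurdle estimation (e.g., R's pscl::hurdle), which uses a zero-truncated count component, whereas the plain negative binomial below is simply fit on the positive counts:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 3)))   # toy covariates
y = rng.negative_binomial(1, 0.6, size=500)      # zero-heavy toy counts

# Zero component: logistic regression on "any future @-mention?"
zero_fit = sm.Logit((y > 0).astype(int), X).fit(disp=False)

# Count component: negative binomial on the strictly positive counts.
pos = y > 0
count_fit = sm.NegativeBinomial(y[pos], X[pos]).fit(disp=False)

print(zero_fit.params, count_fit.params)
```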

We employ log transformations to stabilize coefficient estimation and improve model fit, where appropriate (Cohen et al. 2013). We remove non-control variables that introduce multicollinearity, as measured by a variance inflation factor (VIF) > 4 (e.g., we do not use \(\mathcal {ISS}_{\rho }\) due to high VIF), since multicollinearity reduces inferential ability; this cutoff is below the generally recommended maximum of 5 to 10 (Cohen et al. 2013). Keeping control variables with high VIF is acceptable, as collinearity affects standard error estimates; since control variables are not interpreted, we do not much care if their standard error estimates are off (Allison 2012). We model at the person-project level, i.e., each observation is a person within a project. We performed multiple hypothesis testing (p-value) correction by the Benjamini-Hochberg method (Benjamini and Hochberg 1995). A squared age term is present in the zero model to account for a quadratic shape in the residuals, along with its lower-order term, as is standard in regression (Faraway 2014).
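A small sketch of the VIF screen and the Benjamini-Hochberg correction, on toy data (the variable names are illustrative):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 2] + 0.05 * rng.normal(size=200)  # induce collinearity
names = ["outdegree", "commits", "buggy_commits", "fixing_commits"]

# Flag non-control covariates whose VIF exceeds the cutoff of 4.
vifs = {n: variance_inflation_factor(X, i) for i, n in enumerate(names)}
drop = {n for n, v in vifs.items() if v > 4}

# Benjamini-Hochberg correction of a vector of model p-values.
pvals = np.array([0.001, 0.03, 0.2, 0.8])
reject, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
```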

As noted in Section 3, we explicitly model future @-mentions; our response variable is the value 6 months after our “observed” (i.e., covariate) data. As such, we build a predictive model, not a fully retrospective regression model, i.e., one built on the entirety of available data. We note the difference is minor, but worth reiterating.

5 Results and Discussion

5.1 Case Study: Project-Level Reasons for Call @-Mentions

We are interested in empirically measuring the reasons behind the @-mention. To make sure our theoretical underpinnings are reasonable, we performed a random manual inspection of 100 call @-mentions from our data set, to qualitatively identify the primary reason behind each call. A sample of size 100 grants far above the recommended statistical power of 0.8 at an error rate of 5% for 5 pairwise comparisons (for our 5 qualitative groups, discussed below), based on 1-way ANOVA with 2-sided equality; thus, our results are considered statistically sound (Chow et al. 2017). Though call @-mention use might be more common in some projects than others, our sample of 100 call @-mentions was selected at random across all projects, and thus our results should not be biased.

This study was performed initially by one author, with qualitative codes defined by prior work (Stolcke et al. 2000) (discussed in Section 2.5) and in collaboration with another researcher (not credited as an author of this work). We first identified a set of general codes from the mentioned prior work, and consolidated these into the final set of 5 presented. The goal of this case study was to validate the theoretical reasoning discussed in Section 2, specifically to identify whether or not our definition of a call @-mention is viable, as well as to guide decisions regarding variables for our models. Thus, although our observed counts within each category are statistically robust as defined by recommended procedure, this study was performed primarily as a motivational study, rather than a strictly statistically robust undertaking.

The counts of each category found in our manual inspection are shown in Table 1. In the case of R, we argue that reliance and/or trust in the mentionee is clear: the mentioner explicitly requests that the mentionee perform some defined task; if the mentionee were deemed unreliable, the mentioner would be unlikely to trust them with an explicit task.

Table 1 Call @-mention categories, samples, and case study

For R-S, the mentionee is not explicitly called upon to perform some task. However, the mentioner seems to want the mentionee to respond (or perform a task), but does not wish to explicitly tell the mentionee to act, likely out of politeness. Though the call to action is not explicit, we argue this still represents mentionee reliability; like R, the mentioner wants the mentionee to perform an action, but does not explicitly state as much.

In the case of I, the call is meant to tag the @-mentionee in case they want to participate; not necessarily to respond to the thread or perform some action. However, the mentioner believes that the mentionee may be interested in the issue at hand. This is similar to R-S, albeit slightly weaker, as the mentioner may not have a particular task in mind for the mentionee. However, it still indicates that participation from the mentionee may be appreciated. E.g., in the I example in Table 1, @DavidGoll and @karelz are cc-ed in a message to @mconnew, to inform them of a new development in a discussion they were previously involved in.

In the case of CA, the mentioner is calling the mentionee in order to give credit, e.g., when the mentionee produced an important patch that is relevant to the discussed issue. Though this is not a clear reliance on the mentionee in description, in practice we find it is often used in a similar way to I; participation from the mentionee may be appreciated, but not necessary.

Across all 100 manually inspected cases, we found only 3 cases in which the call @-mention does not fall into the aforementioned categories (3%); one appears to be a misuse of the @-mention; the other two are due to users changing their GitHub display name after the @-mention is seen, thus throwing off our detection of the @-mention as a call rather than a reply. Thus, we argue that the call @-mention is consistently representative of reliance on the mentionee.

5.2 Future @-Mention Models

Figure 2 shows a selection of variables from our categories of interest and their paired relationship with future @-mentions. For all variables, we see a strong positive relationship with @-mentions; the largest correlation sits with developer responsiveness (78.90%).

Fig. 2 Future @-mentions vs. selected attributes of visibility, expertise, productivity, and responsiveness. Axes log scaled

Though paired scatter plots provide initial insight to determinants of potential power, we must model them in the presence of other variables, along with controls, to properly answer our questions.

RQ 1:

Can we describe/predict future @-mentions in terms of developer visibility, expertise, productivity, and responsiveness?

Table 2 shows our model of future @-mentions, with determinants of interest grouped and separated from one another. Our analysis points to a zero hurdle model as providing the best fit; it separately models the process of attaining one’s first call (“zero” model, logistic regression) and the process of attaining calls beyond the first (“count” model, negative binomial regression). We tested for the use of a negative binomial regression as opposed to a Poisson regression in the count model by fitting an additional quasi-Poisson model to test for issues of under- and over-dispersion, which can be an issue for Poisson models and may affect model quality. Quasi-Poisson models explicitly fit a dispersion parameter which can be used to evaluate under- and over-dispersion; negative binomial models can be used for over-dispersed data (Rodríguez 2013). Results from these tests suggested that over-dispersion was an issue for our data; in addition, a negative binomial count model had a better fit than both Poisson and quasi-Poisson models. Thus, we chose to model non-zeros using a negative binomial model in our hurdle regressions. Figure 3 depicts predicted and observed values along with a y = x line and a trend line.
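A rough illustration of such a dispersion check (a sketch on toy data, not our exact procedure):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(300, 2)))
y = rng.negative_binomial(1, 0.5, size=300)  # over-dispersed toy counts

# Fit a Poisson GLM; a Pearson chi-square / residual-df ratio well above 1
# signals over-dispersion, favoring a negative binomial count component.
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = (pois.resid_pearson ** 2).sum() / pois.df_resid
print(dispersion)
```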

Table 2 Future @-mention model; p-values corrected by BH method. User subscripts omitted; they refer to the developer under observation within the model
Fig. 3 Predicted vs. observed values

The mean absolute error and mean squared error are 0.910 and 15.769, respectively. To aid in interpreting how good this predictive model is, we note that the observed future @-mentions range from 0 to 136, with an average of 2.637; thus our mean absolute error with respect to the spread is 0.910/136 = 0.67%, and with respect to the average is 0.910/2.637 = 34.5%. In other words, our model differs from the observed value by an average of 0.910 call @-mentions. Thus, we say this indicates a good model fit.

Visibility

We see that \(\mathcal {OSS_{\rho }}\) and social outdegree are positive for both the count and zero components of our model. This suggests that a higher social focus (in replying to others) and a larger overall social outdegree associate with being @-mentioned in the future - be it in the transition from zero to greater than zero @-mentions, or in increasing @-mentions. However, we see a negative coefficient for \(\mathcal {ISS_{\kappa }}\), suggesting that when others focus their calls on the observed individual, that individual’s @-mention count decreases. This negative coefficient is not unexpected; \(\mathcal {ISS_{\kappa }}\) is derived from the Kullback-Leibler divergence, and when there are many cells (i.e., others that can be called), a higher focus is expected to correlate with a lower raw value. E.g., consider the case where 10 individuals can call on developer A. If each calls A once, the raw value for calls is 10 and \(\mathcal {ISS_{\kappa }}\) is low; if only one developer calls A, the raw value is 1 but \(\mathcal {ISS_{\kappa }}\) is high. In support of this intuition, Posnett et al. (2013) found that a higher value of \(\mathcal {DAF}\) associates with a lower raw cell count.

In sum, having a larger social presence (\(\mathcal {OSS_{\rho }}\), social outdegree) may associate with one’s future @-mention count. These values are much easier to increase for an individual than \(\mathcal {ISS_{\kappa }}\), as \(\mathcal {ISS_{\kappa }}\) is a function of indegree, and thus less in the individual’s control.

Expertise

The number of likely buggy commits a developer makes has a negative coefficient in the count component, suggesting that a larger number of likely buggy commits associates with a decrease in @-mentions. This is as expected: higher expertise should lead to more future @-mentions. However, we see a positive coefficient in the zero component. This is puzzling at first, but may be explained thusly: it is known that contributions are extremely important for gaining technical trust in OSS (Gharehyazie et al. 2015), supported also by the large coefficient for commits in the zero component (0.453). As the number of likely buggy commits is correlated with the overall number of commits by a developer, this positive coefficient indicates that contributing at all, regardless of whether one’s contribution is buggy, is important in getting the first call mention, and thus the first @-mention.

Productivity

In both the zero and count components, we see a positive coefficient for commits, indicating that increased productivity is associated with higher @-mentions. The zero model coefficient is very high. This is in support of productivity being important in receiving the first @-mention.

Responsiveness

Interestingly, and contrary to our hypothesis, we see an insignificant coefficient in the count component. Responsiveness is not considered in the zero component, as one must be called in order to reply, which means responsiveness is undefined for those with an @-mention count of 0.


5.3 Case Study: Attributes of Interest and Model Fit

To further examine RQ 1 and provide concrete reasoning behind our model’s fit, we performed case studies. Specifically, we looked at those with high observed future @-mentions but low model predictions, and those who transition from zero to nonzero @-mentions.

5.3.1 Sub-Case Study: High Observed @-Mentions, Low Predicted @-Mentions

For this study, we manually examined those with fewer than 50 and more than 15 observed future @-mentions, nonzero observed past @-mentions, and a predicted @-mention count of at most 1; i.e., those along the bottom of the x-axis of Fig. 3. In this region, all individuals have never explicitly replied to another developer (i.e., \(\mathcal {OSS_{\rho }}\) and social outdegree are both 0) and have a low number of commits (1 to 9); as these coefficients are positive in our model, these individuals should be pushed to higher counts. However, all developers in this region also have relatively high \(\mathcal {ISS_{\kappa }}\) (0.1 to 1.0), and have experience in other projects (indicated by a large developer age). As both \(\mathcal {ISS_{\kappa }}\) and developer age have a relatively large negative influence in our model, this explains why our predicted future @-mentions are low from a statistical standpoint.

To dig deeper, we consider the case of a particular developer in this region: developer arthurevans, for project google/WebFundamentals. In issue #4928 of the project, a discussion about PRPL patterns, the poster says: “I’ll defer to the grand master of all things PRPL, @arthurevans for what the final IA for this section might look like”. Although arthurevans has low observed activity in the project itself (e.g., low social outdegree and low commit count), this indicates that the poster greatly values arthurevans’s input. The story is similar for the others in this region; the issue poster values the opinion of the called-in person, indicating a level of outside-project expertise.

In summary, it appears this region consists of those who are actually expert, but this expertise is not reflected by their in-project contributions. Although we attempt to capture outside expertise through a developer’s overall GitHub age, we were unable to include other metrics of outside expertise (e.g., number of public repositories contributed to) due to high multicollinearity. Orthogonal metrics of outside expertise may exist that can better fit these individuals.

5.3.2 Sub-Case Study: Transitioning From Zero @-Mentions

For this study, we took a random sample of 10 individuals (out of 235) who had zero observed @-mentions, but transitioned to nonzero @-mentions in the next 6 months, i.e., our future period. In this region, we observe a combination of factors: project age and newcomers who wish to participate more. Some projects are relatively new or newly popular, which means that although they are rapidly gaining popularity on GitHub, their issue production rate hasn’t yet caught up. Though all individuals have contributed to the project, there has not been a chance for @-mentions to be observed; those transitioning from zero @-mentions to nonzero @-mentions would likely have nonzero @-mentions had the observation time split been later in the project.

Perhaps more interesting, we see some new individuals who have recently contributed commits and seem genuinely interested in participating more. For example, in pull request #2587 of the project prometheus/prometheus, we see the first call to developer mattbostock, causing a transition from zero to nonzero @-mentions. Prior to this, we see that mattbostock had been contributing to issue discussions (e.g., issues #1983 and #10), bringing up problems and providing potential solutions. Thus, due to signaling interest and participating in discussions (visibility), providing commits (productivity), and having no issues against these commits (expertise), we see them eventually recognized in an @-mention.

RQ 2:

Can models trained entirely on one project be reliably used to predict @-mentions on another project?

To answer this question, we require project-specific models of @-mentions. Due to the sparseness of data, adding a project factor to the existing model in Table 2 causes estimation to diverge. Thus, to avoid divergence, we fit simplified models with selected attributes of visibility (\(\mathcal {OSS_{\rho }}\), \(\mathcal {ISS_{\kappa }}\), social outdegree), expertise (likely buggy commits), productivity (commits), responsiveness, and the developer’s outside-project experience (GitHub age). A subset is required due to the smaller number of observations per project; too many variables for too little data can cause issues, as even mild multicollinearity can cause large problems in small data sets. Thus, we select only a few representative variables from each of our groups of interest. For consistency, we explicitly fit separate models for the transition from zero to nonzero (zero component) and for the nonzero count (count component), as is done implicitly by the hurdle model.

Figure 4 contains symmetric heatmaps of predictability for our project-specific models (count and zero, respectively). To measure predictability of the count component, we use the average of the mean absolute error (MAE) between each pair of models. For projects i and j, with data \(d_{i}\) and \(d_{j}\), and models \(y_{i}\) and \(y_{j}\), we compute predicted values \(\hat {y}_{i} = y_{i}(d_{j})\) and \(\hat {y}_{j} = y_{j}(d_{i})\); i.e., we predict using one model’s fit and the other model’s data, thus providing a measure of cross-project model fit. We then compute the average MAE between the two fits, i.e., \(\frac {MAE(\hat {y}_{i}) + MAE(\hat {y}_{j})}{2}\), and plot this value in each heatmap cell. For the zero component, we analogously compute fit by calculating the average area under the receiver operating characteristic curve (AUC) between two projects, i.e., \(\frac {AUC(\hat {y}_{i}) + AUC(\hat {y}_{j})}{2}\). For MAE, a lower value is better; for AUC, a higher value is better. We then plot a dendrogram, showing clusters of projects based on predictive ability.
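A compact sketch of the cross-prediction grid for the count component, with simple linear models standing in for our per-project count models (toy data; names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
# Toy per-project covariates and count responses.
data = [(rng.normal(size=(50, 3)), rng.poisson(2, 50)) for _ in range(4)]
models = [LinearRegression().fit(X, y) for X, y in data]

n = len(models)
grid = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Xi, yi = data[i]
        Xj, yj = data[j]
        mae_ij = mean_absolute_error(yj, models[i].predict(Xj))
        mae_ji = mean_absolute_error(yi, models[j].predict(Xi))
        grid[i, j] = (mae_ij + mae_ji) / 2  # symmetric average MAE
```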

Fig. 4 Cross-project predictive power heatmap for each project-specific model, count (a) and zero (b) components

For both the count and zero components, we generally see good fit across projects (lower average MAE, higher average AUC), with some outliers. For the count case, we see that uxsolutions/bootstrap-datepicker is an anomaly in having poor fit for many projects, being grouped in its own cluster. Otherwise, there are no immediately clear clustering relationships between projects, other than that the mean MAE is generally below 10, as noted in the density plot.

For the zero case, we also see one clear outlier: akka/akka. In general, cross-project fits for this project are relatively poor compared to the majority. The reason may be a difference in the importance of our determinants of interest as compared to other projects. Figure 5 shows our fitted coefficients for each project model. For the zero component, though akka/akka does not lie on its own according to hierarchical clustering, we see that its coefficients are very different from other projects’, with a negative coefficient for commits and near-zero coefficients for all other variables (except social outdegree). This explains the poor cross-project fit; in this project, a higher number of commits associates with a lower predicted chance of a first @-mention, while in the majority of other projects this coefficient is positive (or nearly zero).

Fig. 5 Heatmap of coefficients for project-specific models, (a) count and (b) zero components

In summary, we do see a general trend of good fit for both the count component and, to a lesser extent, the zero component.

RQ 3:

Is there evidence of project-specific @-mention culture? Or are the determinants of @-mentions a GitHub-wide phenomenon?

Figure 5 contains heatmaps of coefficients for the count and zero components of our project-specific models. Looking at each column, we see some coefficients that are almost uniformly the same, e.g., responsiveness for both components, commits for the count component, and likely buggy commits for the zero component. However, we do see differences; e.g., \(\mathcal {OSS_{\rho }}\) in both model components is negative for some projects and positive for others.

The fact that there are differences per column (i.e., per coefficient) for most coefficients lends credence to the idea that there are project-specific @-mention culture differences on a per-attribute basis. However, some things do not change across projects, e.g., the importance of commits in gaining more @-mentions. In addition, the generally high cross-project predictive power shown in Fig. 4 suggests that project-specific culture differences may not matter much. To identify some concrete reasoning behind these particular differences in variable importance, we turn to another case study.

5.4 Case Study: Project-Level Differences

Reflecting on Fig. 5, here we ask: why are some coefficients positive for a number of projects, and negative for others?

As \(\mathcal{OSS_{\rho}}\) exhibits this behavior in both our count and zero models, and significantly so in our global model, we choose it for our study. In the zero model, we see a negative coefficient for the projects uxsolutions/bootstrap-datepicker, pouchdb/pouchdb, and codemirror/CodeMirror, indicating that higher specialization in one’s replies is associated with fewer future @-mentions in these projects.

One explanation for this phenomenon could be a larger inner circle than in other projects; i.e., to gain @-mentions one must become visible to more people. For both uxsolutions/bootstrap-datepicker and pouchdb/pouchdb, this seems to be the case. Looking at the distribution of commits across contributors, in both projects the original top committer has largely reduced their commit rate, while in the meantime the second-largest committer has picked up the pace. In addition, the distribution of commits is comparatively more uniform across contributors, indicating a larger inner circle. For codemirror/CodeMirror, commits are highly concentrated in the top committer; however, when viewing issues, we see that multiple others contribute to review and discussion. This likewise indicates a larger inner circle that one must be visible to. For the count model, the story seems to be the same for projects with a negative coefficient: there is either a more uniform distribution of commits across the top contributors, or a larger number of individuals participating in issue discussions, again indicating a larger inner circle.
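The “inner circle” reasoning above can be made operational with simple concentration statistics over per-contributor commit counts. This is an illustrative sketch of the heuristic, not the exact inspection we performed for the case study.

```python
# Illustrative concentration measures over per-contributor commit counts.
import numpy as np

def top_committer_share(commit_counts):
    """Share of all commits held by the single largest committer."""
    counts = np.asarray(commit_counts, dtype=float)
    return counts.max() / counts.sum()       # near 1.0: one dominant committer

def gini(commit_counts):
    """Gini coefficient of the commit distribution; lower values mean a
    more uniform distribution, which we read as a larger inner circle."""
    x = np.sort(np.asarray(commit_counts, dtype=float))
    n = len(x)
    return (2 * np.arange(1, n + 1) - n - 1).dot(x) / (n * x.sum())
```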

For projects with positive coefficients, we see different behavior. In pull requests, the top project members appear more open to calling on others for input. E.g., in spotify/luigi pull request #2186, a top contributor asks the original poster to run git blame on the modified code to see who originally wrote it, admitting a lack of expertise in the associated module; we see similar behavior in pull request #2185. In addyosmani/backbone-fundamentals issue #517, the project owner calls on another contributor for input, stating “[I] would love to suggest your project to devs ...”. Recall that a positive coefficient for \(\mathcal{OSS_{\rho}}\) indicates specialization in reply behavior, i.e., more focus in one’s social behavior. As the top contributors of these projects seem to be the ones calling on others, it appears one may specialize their social behavior towards the top contributors to get noticed; hence, more social specialization may be associated with more future @-mentions.


6 Practical Implications

Understanding which observable attributes of a developer are most correlated with their status can inform developers about how others see them, and help them learn the community’s values. To that end, our models of future @-mentions based on past observable behavior can inform developers as to which of their external characteristics matter to others when deciding whom to call for help. Such an understanding can be a first step toward becoming part of a community, or belonging. In practice, this means examining the coefficients of our models and noting the attributes (i.e., metrics) with significant positive and significant negative coefficients. Those are the attributes that the data suggest matter most to the community. Since we model two separate phenomena, the count and the zero models in Table 2, different attributes matter to the community for attaining a higher count than for getting the first @-call, as discussed in the Results and Discussion section.
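In code, this inspection could look like the sketch below, which assumes a statsmodels-style results object exposing params and pvalues (an assumption for illustration; our models were fit with a different interface).

```python
# Sketch: list attributes with significant positive / negative coefficients.
def community_values(fit, alpha=0.05):
    positive, negative = [], []
    for name, coef, p in zip(fit.params.index, fit.params, fit.pvalues):
        if name == "Intercept" or p >= alpha:
            continue  # skip the intercept and insignificant terms
        (positive if coef > 0 else negative).append(name)
    return positive, negative  # what the community appears to reward / penalize
```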

But understanding community values can also lead to wanting to enhance those values, or perhaps modify them. Periodic re-fitting and examination of the models can reveal trajectory changes in the community’s values, e.g., increased emphasis on participating in discussions, or decreased emphasis on responsiveness over time. If the resulting trajectories show a departure from the community’s goals, then concerted, community-wide efforts, e.g., establishing guidelines or even rules, may be needed to modify those trajectories and bring them closer to the ideals.

As discussed in Theory and Related Work above (Section 2), the (call) @-mention represents a deeper underlying phenomenon than just a tool for getting attention. It also expresses a belief in the @-mentionee’s ability to provide valuable input (Tsay et al. 2014), often explicitly stated as greater than the caller’s, to accomplish a particular task at hand. Attaining social (or technical) status as the developer who “gets the job done” can be desirable, as these individuals may more easily enter the “circle of trust” within a given OSS project (Gharehyazie et al. 2015), potentially reducing the time it takes for their commits to be incorporated into the code base (Dabbish et al. 2012; Calefato et al. 2017). Higher status also increases one’s visibility, and with it the chances of participating (or furthering participation) in popular projects (Dabbish et al. 2012). In addition, social interaction between work group members has been found to be important for task success (Saavedra et al. 1993) and for the elevation of members to “role model” status (Bandura 1973; Bandura and Walters 1977). On GitHub, these role models (“rockstars”) have been found to be important influencers on the general community, providing examples of coding best practices and helping novice developers learn (Dabbish et al. 2012; Lee et al. 2013). Our models in Table 2 offer good predictive power and indicate which attributes of developer activity may be influential in attaining future @-mentions. For example, our model shows that committing more is associated with more future @-mentions, for the first mention and beyond. In addition, a larger outgoing social presence (social outdegree) is associated with more future @-mentions. Thus, developers who wish to attain the status roles mentioned above can use these model associations as guidelines to calibrate their behavior over time, exhibiting in the community the characteristics that can result in higher status.

Our finding that the models perform cross-project prediction well suggests that, on average, one’s activity pattern in a project, if emulated in other projects, can be expected to result in the same level of @-mentions. This also implies that by simply joining another project and continuing the same behavior as in the former project, one can expect similar @-mention levels there. However, in each project a developer needs to demonstrate sufficient levels of the predictors mentioned above in order to be called upon, which presumably takes some time.

Some of our results were less obvious than others, e.g., the insignificant effect of responsiveness. This may indicate that it is worth calling on and waiting for the high-status people to get involved, even if they are slow.

From a security perspective, trusting new people with the project’s code is associated with more maintenance and supervision, which is certainly a concern. Based on our results, increased effort could usefully go toward training new people in the specifics of the project’s code, e.g., by creating a portal for newcomers (Steinmacher et al. 2016). Future work may include building online tools to facilitate newcomer onboarding, e.g., “tag profiles” that suggest to new users how to increase their @-mentions, an indicator that they are important to a project’s success, thus benefiting the project as a whole. It might also (perhaps more controversially) be helpful to have tools that measure how often each developer’s changes induce future fixes.

7 Threats to Validity

There were challenges in all aspects of this work, largely due to the complex motivations behind @-mentions: being @-mentioned is not just a result of technical prowess; @-mentioning is also a social phenomenon. Many potential issues were anticipated and carefully addressed. Once we settled on the idea of using call @-mentions, we were able to connect our outcome with background theory on the multidimensionality of @-mentions. To define @-mentions precisely, we necessarily had to narrow our definition specifically to call mentions in issue discussions.

We acknowledge that considering only individuals with non-zero commits is a threat. However, the distribution of @-mentions for those with zero commits is highly concentrated at zero (median 0 @-mentions, mean 0.26). This restriction also helps alleviate the threat of bots: bots often must be @-mentioned to be activated, yet most bots do not show up in the “author” field of a commit (they may show up in the “committer” field, which we do not use in this work). In addition, as we have data from many projects, unless bots form the vast majority of our data points, they should not have a palpable effect on our model fit. We also manually inspected commits to try to filter out bots. Nevertheless, we cannot be certain that all bots are removed from our data; thus, we acknowledge bots as a threat.
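A name-pattern filter such as the hedged sketch below can catch many bots, though, as noted, it cannot guarantee completeness. The patterns are illustrative assumptions, not the exact list used in our manual inspection.

```python
# Heuristic bot filter over commit author names (patterns are assumptions).
import re

BOT_PATTERN = re.compile(r"\[bot\]|-bot$|^bot-|\bdependabot\b", re.IGNORECASE)

def filter_probable_bots(author_names):
    """Drop authors whose names match common bot-naming conventions."""
    return [a for a in author_names if not BOT_PATTERN.search(a)]
```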

Regarding our identification of “reply” and “call” @-mentions, it is possible that we misidentify a “reply” as a “call” if an individual uses multiple GitHub accounts to post within a project. E.g., if someone first posts as username A and later as username B, we would identify the mention of B as a “call” rather than a “reply”. Given the confusion that would likely ensue if this were common, we do not see it as a major threat.

We acknowledge that our identification of likely buggy commits may have issues. The method we use is supported by multiple prior works (Śliwerski et al. 2005; Kim et al. 2006) and is seen as a “standard method”. However, it lacks the robustness of a method that, e.g., uses an explicit bug tracker to identify buggy commits, and it has the issues described in prior work (da Costa et al. 2017; Rodríguez-Pérez et al. 2018). Due to the structure of GitHub, there is no guaranteed method of identifying buggy commits that works across all projects. Thus, we acknowledge this as a threat.
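For intuition, a heavily simplified, file-granularity SZZ-style sketch is shown below using GitPython. Real SZZ blames only the lines deleted by a fix, and our actual pipeline may differ in its keyword list and filtering; treat this as a sketch of the idea, not our implementation.

```python
# Simplified SZZ-style sketch (cf. Śliwerski et al. 2005), at file rather
# than line granularity; assumes a GitPython Repo object.
import re
from git import Repo

FIX_RE = re.compile(r"\b(fix(e[sd])?|bug|defect)\b", re.IGNORECASE)

def likely_buggy_commits(repo: Repo, rev="HEAD"):
    buggy = set()
    for fix in repo.iter_commits(rev):
        if not (FIX_RE.search(fix.message) and fix.parents):
            continue  # keep only fix commits that have a parent
        parent = fix.parents[0]
        for diff in parent.diff(fix):
            if diff.a_path is None:          # file added by the fix: skip
                continue
            # Every commit that last touched a file changed by a fix is a
            # candidate bug-introducing ("likely buggy") commit.
            for commit, _lines in repo.blame(parent.hexsha, diff.a_path):
                buggy.add(commit.hexsha)
    return buggy
```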

We note that our operationalization of productivity through commits has its own issues, including varying commit styles among developers and between projects; some projects may want commits to be as small as possible, while others may not care about or enforce such a rule. We also tested the inclusion of \(\mathcal{MAF}\) and \(\mathcal{DAF}\) in their original forms from prior work, which can be seen as productivity measures, as well as raw lines of code. However, these measures were highly correlated with commits. Thus, for the sake of model parsimony (Vandekerckhove et al. 2015), we chose to represent productivity with a more easily understood measure: commits.

Though we attempt to model developer expertise through their GitHub age (in days), we acknowledge a threat in this operationalization. While age is often an important indicator of expertise, expertise can exist apart from age; e.g., a contributor may have a specific, sought-after skill set and thus be called upon regardless of their age. We do have a semi-overlapping measurement of code-based skill through commits, though this is not guaranteed to adequately control for this phenomenon. And, as noted above, \(\mathcal{MAF}\) and \(\mathcal{DAF}\), which can be interpreted as more complex measures of expertise, were highly correlated with commits. It may be possible to measure skill-specific expertise by, e.g., examining a developer’s commit history for the subject domain of their contributions prior to being called. However, this measurement would be difficult to obtain, as it would require a labeling of subject domains, which GitHub does not reliably provide. Thus, though we attempt to control for developer expertise, we note this as a threat to validity.

We also acknowledge that our coefficient estimates are somewhat small, and that we do not report effect sizes (beyond interpreting the coefficient estimates themselves as effect sizes). This is because standard effect size calculations, e.g., Cohen’s d (Cohen 1988), are not well-behaved for non-Gaussian distributions, as is the case in our models.
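For reference, Cohen’s d standardizes a mean difference by a pooled standard deviation, a scale that is only meaningful when both groups are approximately Gaussian:

\[
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}.
\]

With zero-inflated count outcomes such as ours, neither the mean difference nor the pooled standard deviation summarizes the distributions well, hence our reliance on the coefficient estimates themselves.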

Our case studies would benefit from a larger amount of data. Their size was constrained by the regions of interest: our regions were small, and thus our case studies were necessarily small as well.

Our work is supported by prior qualitative research into @-mention usage. Still, we acknowledge that our study would likely benefit from further qualitative studies, e.g., a survey of developers on their use of the @-mention.

8 Conclusion

We performed a quantitative study of @-mentions in GitHub, as captured in calls to people in discussions. We supplemented those with case studies on samples of discussions, to help triangulate our findings. Our models fit the data well, suggesting that @-mentions, as we formulated them, are explained well by our predictors.

The idea that projects in an ecosystem share similar models of what it means to be worthy of an @-mention is appealing. We find that the good cross-project predictive power cannot simply be distilled down to productivity in our models, adding evidence for the multidimensional nature of @-mentions. It is also plausible that there are cliques of projects in which the sense of whom to @-mention is even more uniform than across the whole ecosystem, and our findings underscore this. Obvious open questions are: how do notions of @-mentions get in sync? And, to borrow from ecology, does the robustness of the @-mention models across GitHub convey any fitness benefit in the ecosystem? We can see a plausible mechanism that would answer the first: projects share people, and people cross-pollinate @-mentioning behavior across the projects in which they participate. We leave the validation of this, and other models, to future work. The robustness of the @-mention models likewise implies some preference for success, whether by design or emergent, across the ecosystem. This may be a function of people’s mobility in the ecosystem and their preference for, and vigilance in, participating in popular projects; we leave the answers to future work.