1 Introduction

More and more developers work collectively on software development projects on online platforms such as GitHub. When contributors of a project want to merge their code changes into the project repository, they issue pull requests (PRs), which must be reviewed and approved before the new code gets merged into the codebase (Fig. 1). To maintain the flow of project development, there is a pressing demand for timely, qualified PR reviews (Yu et al. 2016). According to a survey in 2014, 15% of contributors complain that their pull requests hardly get prompt feedback (Gousios et al. 2016a). Assigning pull requests to appropriate reviewers has always been a challenge (Balachandran 2013). On the one hand, although contributors can propose reviewers in their PRs, many of them, especially those new to a project, have little idea of who may be qualified and willing to review their code (Xia et al. 2015). On the other hand, reviewers have limited time and capacity to handle the large quantity of PRs (Gousios et al. 2015; Pham et al. 2013). To improve the efficiency of the collaborative code review process, some developers introduce bots that provide an automatic reviewer recommendation (ARR) service into their projects. For instance, Balachandran (2013) implements “ReviewBot”, which shortlists potential reviewers by code change history and then selects the most appropriate reviewer using code review history. Recently, major software development platforms have started to provide their own ARR services. For example, developers in GitHub with write access to a project’s repository can request reviews from suggested developers based on git blame data (i.e., information about who last modified each line of the files being revised) (Github 2017b).

Fig. 1
figure 1

GitHub branch-based workflow (GitHub 2013)

Existing research on ARR services mainly focuses on performance in terms of recommendation accuracy (Jiang et al. 2015; Yu et al. 2016; Zanjani et al. 2016; Fejzer et al. 2018). However, the user experience of, and the interaction among, the different stakeholders involved in the process are under-investigated (Fig. 1). In particular, little work has looked into (1) how developers perceive and work with ARR in practice, and (2) what the most critical needs are for the different types of users involved in ARR services. To fill this gap, we conduct a case study on Facebook mention bot, an ARR bot active in GitHub from October 2015 to April 2018, as a lens to gain insights into a better design of ARR services in collaborative software development platforms.

In this paper, we use a two-stage mixed-methods approach to address the two questions mentioned above. In Stage I (November 2015 to June 2016), we conduct an archival analysis on 155 GitHub projects that employed Facebook mention bot at that time. More specifically, for each project, we compare the response rate and response time of pull requests with and without reviewers suggested by mention bot, to assess the effectiveness and efficiency of this ARR bot in practice. We further analyze comments related to mention bot inside these projects and conduct a survey with 52 mention bot users to explore user needs. In Stage II (July 2016 to August 2017), we revisit these projects and analyze the new user comments that emerged within the year to see if the user needs identified in Stage I are met. To gain a more in-depth understanding of why developers use or do not use mention bot and what they expect from an ARR service, we divide ARR users into three groups: project owners, contributors, and reviewers. We then conduct an additional survey with 34 valid responses and interview six developers in GitHub to explore the needs of each group. Results of the two-stage investigation show that developers appreciate that mention bot saves their effort, but different user groups have different demands for ARR services: simplicity and stability are needed by project owners, transparency by contributors, and selectivity by reviewers. We summarize our findings into considerations for future ARR service design.

2 Background and related work

In this section, we first introduce the concepts of pull requests (PRs) and the review process in GitHub, and then summarize the mechanisms of some existing ARR services.

2.1 Pull request and review process in GitHub

Pull-based development is the latest model of distributed software development (Gousios et al. 2014). To receive external contributions, repositories are shared by forking (i.e., cloning) and modified through PRs. Normally, three kinds of developers are involved in the pull request process:

  • Project owners who manage the PRs submitted to their projects.

  • Contributors who submit PRs that need reviews.

  • Reviewers who help to review PRs.

The pull request process is described in Fig. 1. Contributors fork the repository and commit changes to their local branches (Gousios et al. 2014, 2016b, 2015). To make contributions to the master branch, contributors submit a set of changes by creating a PR. The project owners inspect the PR and decide whether or not to merge the changes. During this process, the project owners, reviewers and contributors usually need to discuss the proposed changes. In the end, the PR is closed.

After a pull request is opened, anyone with read access can review and comment on the changes it proposes. GitHub allows developers to comment on the changes proposed in pull requests, approve the changes, or request further changes before the pull request is merged.

When PRs are submitted, they are expected to be reviewed within a short period of time. However, in reality, owners of popular projects receive too many PRs and have difficulties reviewing these PRs by themselves or identifying other appropriate reviewers for them (Balachandran 2013; Gousios et al. 2015; Jiang et al. 2015; Thongtanunam et al. 2015; Tsay et al. 2014; Xia et al. 2015; Yu et al. 2015; Yu et al. 2016; Zanjani et al. 2016).

2.2 Automatic reviewer recommendation for pull requests

To reduce project owners’ efforts, researchers have proposed automatic reviewer recommendation (ARR) services (Balachandran 2013; Jiang et al. 2015; Thongtanunam et al. 2015; Xia et al. 2015; Yu et al. 2016; Zanjani et al. 2016). As the key to any review is understanding the context and the change (Bacchelli and Bird 2013), these ARR services intend to bring in reviewers who are qualified for the PRs and willing to help. They normally use historical information about code changes and reviews to identify appropriate reviewers (Balachandran 2013; Jiang et al. 2015; Thongtanunam et al. 2015; Xia et al. 2015; Yu et al. 2016; Zanjani et al. 2016). The “ReviewBot” proposed by Balachandran (2013) shortlists potential reviewers by blame information and then selects the most appropriate reviewer as the one who has modified the related code sections most. Thongtanunam et al. (2015) proposed “RevFinder”, which recommends reviewers based not only on code review history but also on the similarity of file paths. “Tie” was then proposed to enhance “RevFinder” by using different similarity measures for file paths and for the textual information in pull requests (Xia et al. 2015). Jiang et al. (2015) developed “CoreDevRec”, which trains a prediction model using a support vector machine on three features: file paths, social interactions between reviewers and contributors, and the activeness of reviewers. Developer profiles are also used for reviewer recommendation. For example, Rahman et al. (2016) proposed to determine a developer’s expertise as a potential code reviewer from their experience with the specialized technologies associated with a PR in addition to their cross-project experience; the experiment on their dataset shows that this technique can achieve over 85% recommendation accuracy. Fejzer et al. (2018) employed a similarity function between programmers’ profiles and the change proposals to be reviewed, and obtained improved results in terms of classification metrics and performance. A review of different PR reviewer recommendation techniques can be found in Badampudi et al. (2019).

While the above ARR services are external tools built on top of software development platforms, GitHub provides its own ARR services. The “CODEOWNERS file” (Github 2017a) is used to define individuals or teams that are responsible for the code in a repository; these developers are automatically requested for review when someone modifies the code they own. The “suggested reviewers” feature (Github 2017b) automatically suggests reviewers based on git blame data. Every time a PR is submitted, organization members, repository owners and collaborators can see the suggested reviewers in the right sidebar of the PR and decide whether or not to request reviews from them.

However, little work addresses how software developers perceive and work with these ARR services in practice. Many factors could affect the efficiency of these services, as suggested by works that explore the user experience of recommendation services in domains such as music and digital cameras (Chen et al. 2013; Ferwerda et al. 2015; Lee et al. 2015; Sinha and Swearingen 2002; Stolze and Nart 2004). For example, Sinha and Swearingen (2002) studied the role of transparency in music recommender systems. Stolze and Nart (2004) found that needs-oriented recommendation for digital cameras was more helpful than feature-oriented recommendation. Chen et al. (2013) studied how personality influences users’ need for recommendation diversity. Ferwerda et al. (2015) showed that users’ personality affects the way they choose music. However, each of the above studies addresses only one specific aspect or involves no more than two user groups. It remains a challenge to study user experience in the domain of ARR services for online software development platforms, which may involve three kinds of user groups (project owners, contributors and reviewers).


3 Research method overview

In this section, we first introduce the Facebook mention bot, and then present our two-stage mixed-methods approach.

3.1 Facebook mention bot

In this work, we use mention bot, developed by Facebook, as a lens to look into how developers work with an ARR service in practice and what the critical needs are for different stakeholders. Facebook mention bot can recommend any developer as a reviewer using two heuristics: (1) if a line was deleted or modified, the person who last touched that line is likely to care about this pull request; (2) if a person last touched many lines in the file where the change was made, they may want to be notified (Facebook 2015).
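To make the two heuristics concrete, the following is a minimal sketch in Python; the function name, the weighting, and the data layout are our own illustration and not mention bot’s actual implementation:

```python
from collections import Counter

def suggest_reviewers(diff, blame, max_reviewers=3):
    """Rank reviewer candidates for a pull request using two blame-based
    heuristics: (1) authors of deleted/modified lines, (2) authors who last
    touched many lines in the changed files.

    `diff`  maps file path -> set of line numbers deleted or modified by the PR.
    `blame` maps file path -> {line number: last author}, i.e., git blame data.
    """
    scores = Counter()
    for path, changed_lines in diff.items():
        authors_of_file = blame.get(path, {})
        # Heuristic 1: whoever last touched a changed line likely cares about it.
        for line in changed_lines:
            author = authors_of_file.get(line)
            if author:
                scores[author] += 2          # weight changed lines more heavily
        # Heuristic 2: whoever owns many lines of a changed file may want to know.
        for author in authors_of_file.values():
            scores[author] += 1
    return [author for author, _ in scores.most_common(max_reviewers)]

# Toy example with hypothetical blame data.
print(suggest_reviewers(
    diff={"src/app.py": {10, 11}},
    blame={"src/app.py": {10: "alice", 11: "alice", 12: "bob", 13: "bob", 14: "bob"}},
))  # ['alice', 'bob']
```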

Since its launch in October 2015, mention bot had served 205 GitHub projects and handled 12,060 pull requests up to June 2016. Owners of GitHub projects can deploy mention bot through a webhook service (Webhooks 2017) without any extra setting. Once mention bot is employed in a project, a recommendation comment is added to newly created pull requests, as shown in Fig. 2. By default, mention bot directly mentions its recommended reviewers right after a PR is created, but project owners can manually personalize the bot by adding a “.mention-bot” file to the base directory of the repository (Facebook 2015). For instance, they can configure recommendation and notification rules such as the maximum number of candidates to recommend, the message posted by mention bot, and a blacklist of reviewers.

Fig. 2
figure 2

An example of the mention bot comments (Facebook 2015)
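For illustration, a hypothetical “.mention-bot” file written from Python is sketched below. The option names follow those discussed in this paper (delayed, fileBlacklist, skipTitle, requiredOrgs, skipAlreadyMentionedPR); the remaining keys and all values are illustrative and should be checked against the bot’s documentation:

```python
import json

# Hypothetical ".mention-bot" configuration for a repository's base directory.
config = {
    "maxReviewers": 3,                # cap the number of recommended candidates
    "message": "Thanks! @reviewers, could you take a look?",  # custom comment text
    "userBlacklist": ["former-maintainer"],   # accounts that should never be pinged
    "delayed": True,                  # wait before mentioning reviewers
    "delayedUntil": "3d",
    "fileBlacklist": ["*.md"],        # ignore changes to documentation files
    "skipTitle": "WIP",               # skip PRs whose title contains this marker
    "requiredOrgs": ["example-org"],  # only recommend members of these organizations
    "skipAlreadyMentionedPR": True,   # do not comment twice on the same PR
}

with open(".mention-bot", "w") as f:
    json.dump(config, f, indent=2)
```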

3.2 Two-stage mixed-methods approach

To better explore how software developers perceive and work with Facebook mention bot over time, we carry out our research with archival data, a survey and interviews in two stages. In the first stage, we analyze 205 projects that employ mention bot, investigate 53 issue comments about mention bot, and conduct a survey with 52 mention bot users. In this stage, we focus on mention bot’s performance in practice during a certain period (from November 2015 to June 2016). By analyzing the pull requests in these projects, we measure the response rate and the response time of the recommended reviewers. We use the issue comments to investigate user needs for mention bot, while the survey is used to learn how users perceive its usefulness. We conclude Stage I with three potential features to improve mention bot and address user needs. In the second stage, we revisit these projects and analyze another 90 related comments that emerged within the following year to see if the user needs identified in Stage I are met. Furthermore, to gain a more in-depth understanding of why people use or do not use mention bot and what they expect from an ARR service, we conduct a survey and acquire 34 valid responses from three user groups, i.e., project owners, contributors and reviewers, and then interview six developers. We then explore factors critical to the user experience of ARR services for each user group. Since the above research methods might conflict with or support each other, we integrate the results of the two stages to discuss how to provide a better user experience in automatic reviewer recommendation services.

4 Stage I

4.1 Research setting

4.1.1 Data collection

In Stage I, we track the public activities of Facebook mention bot up to June 2016 and identify 205 projects in GitHub that employ this bot. We use the GitHub API (GitHub 2016) to gather their properties, pull requests and issues. Among these projects, we exclude those that have fewer than four reviewer candidates (i.e., the total number of contributors in the project), since mention bot normally recommends up to three candidates. We further exclude the projects that have not received any external contribution (i.e., pull requests made by external contributors), because, according to the literature, reviewer identification is particularly challenging for external contributions (Gousios et al. 2015; Tsay et al. 2014; Yu et al. 2015). Finally, we use 155 projects for our investigation.

We exclude the following pull requests: (1) pull requests made by project owners and merged into the master branch without any review, (2) pull requests made by other bots, and (3) pull requests that are not yet closed. In total, we identify 64,937 pull requests from the 155 projects. Among them, mention bot is called in 9413 pull requests and not used in the rest.
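The project and pull request filters described above can be summarized by the following sketch, written as plain Python over records already fetched through the GitHub API; the field names are hypothetical placeholders:

```python
# Each project/PR is assumed to be a dict of fields fetched via the GitHub API;
# the field names below are placeholders for illustration only.
all_projects = []  # populated from the GitHub API in practice

def keep_project(project):
    """Keep projects with enough reviewer candidates and external contributions."""
    return project["num_contributors"] >= 4 and project["num_external_prs"] > 0

def keep_pull_request(pr):
    """Exclude owner PRs merged without review, bot PRs, and still-open PRs."""
    if pr["author_is_owner"] and pr["merged"] and pr["num_reviews"] == 0:
        return False
    if pr["author_is_bot"]:
        return False
    if pr["state"] != "closed":
        return False
    return True

projects = [p for p in all_projects if keep_project(p)]            # 155 projects
pull_requests = [pr for p in projects for pr in p["pull_requests"]
                 if keep_pull_request(pr)]                          # 64,937 PRs
```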

Table 1 Properties of projects used in our archival analysis

Table 1 summarizes the properties of the 155 projects in our dataset up to June 30th, 2016. Their average development period is about 28 months (SD = 21.38). The latest revisions of the projects have approximately 58K lines of source code on average, excluding whitespace and comments (SD = 124.71K). The average numbers of commits and pull requests per project are around 2863 (SD = 7953.35) and 413 (SD = 1250.60), respectively.

Table 2 The number of issue comments that show positive, negative and neutral evaluations of the mention bot

We extract 258 issue comments that contain the keyword “mention bot” from the original 205 projects. To avoid bias, we exclude 15 issue comments from the two projects that develop and test the mention bot. Through manual inspection, we finally identify 53 issue comments that express an evaluation of mention bot (Table 2): 25 positive comments, 20 negative comments and eight neutral comments that give suggestions.

4.1.2 Survey

In Stage I, we identify 2467 developers in GitHub who make or review the pull requests that call Facebook mention bot. Among them, 1445 developers post their email addresses on their GitHub profiles or personal web pages. We advertise our survey to these developers by email. To get more responses, we also invite them to distribute the survey to their communities. In total, we receive 52 responses.

Our survey consists of five questions about the perceived usefulness and likeability of Facebook mention bot. The first question asks if mention bot recommendations are appropriate, measured on a 5-point Likert scale. In the next question, we measure the perceived reduction in response time and effort after deploying mention bot. We provide four statements regarding this aspect and ask the respondents about their level of agreement on a 5-point Likert scale. The first two statements ask whether participants receive or provide faster responses when mention bot is involved. The other two statements examine whether participants save the effort spent on identifying proper reviewers or exploring pull requests when using mention bot. The remaining three questions ask about the likeability of mention bot. Specifically, we use a 5-point Likert scale to measure how much the participants like mention bot. Then, we offer four options that correspond to the “Reviewer recommendation”, “Automatic notification”, “Enable/disable notification for certain PRs/people” and “Message customization” features of mention bot, and ask the participants to select one or multiple favorite features if they responded positively to the previous question. Finally, a yes–no question asks if they would continue using mention bot.

4.1.3 Performance evaluation methods

To understand what kind of benefits a reviewer recommendation service can provide, we first technically measure mention bot’s performance by response rate and response time of recommended reviewers in practice.

In our work, we measure response rate rather than top-k accuracy as in other works (Balachandran 2013; Jiang et al. 2015; Thongtanunam et al. 2015; Xia et al. 2015), because contributors care about whether there is any response from the recommended reviewers. If the ARR service correctly recommends a reviewer who is interested in working on the PR, even if he or she does not complete the review, it still does a good job. Response rate (Eq. 1) represents the percentage of pull requests whose actual reviewers are correctly recommended by Facebook mention bot. It is similar to top-k accuracy, but we count it as a hit if any of the recommended developers is observed in the review process.

$$\begin{aligned} \text {Response rate} = \frac{\sum _{r\in R}\text {Hit}(r,\text {Response})}{|R|} \times 100\%. \end{aligned}$$
(1)

The calculation of response rate for each project is straightforward: for each pull request (PR) that mention bot comments on, we count it as a successful response if at least one of the recommended reviewers shows up in the PR review process.
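A small sketch of how the response rate per project can be computed from this hit criterion (the data layout is hypothetical):

```python
def response_rate(bot_prs):
    """Percentage of mention-bot PRs in which at least one recommended
    reviewer shows up in the review process (Eq. 1).

    `bot_prs` is a list of (recommended_reviewers, responders) pairs,
    both given as sets of user names; the layout is illustrative.
    """
    if not bot_prs:
        return 0.0
    hits = sum(1 for recommended, responders in bot_prs
               if recommended & responders)   # any overlap counts as a hit
    return hits / len(bot_prs) * 100

# Example: two of the three PRs receive a response from a recommended reviewer.
prs = [({"alice", "bob"}, {"alice"}),
       ({"carol"}, {"dave"}),
       ({"bob"}, {"bob", "erin"})]
print(response_rate(prs))  # roughly 66.7
```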

Response time (Eq. 2) reflects whether mention bot reduces the time needed to involve reviewers in pull requests. It refers to the time difference between submitting a PR and the first response made by any developer other than the submitter (Yu et al. 2015; Yu et al. 2016).

$$\begin{aligned} \text {Response time} = T_{{{ First Response}}} - T_{{{ Submit PR}}} \end{aligned}$$
(2)

To measure the response time of recommended reviewers, for each project, we divide the pull requests into two groups by whether mention bot is called, and compare the average response time between the two groups. In detail, we put the pull requests that call mention bot into the \(PR_{{ Bot}}\) group and the rest into the \(PR_{{ Non-bot}}\) group. After excluding the responses made by bots, we calculate the average response time in each group. To calculate the average robustly, we exclude outliers using the interquartile range (IQR) as in Box-and-Whisker plots (Hoaglin et al. 1986).
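The per-group average response time with IQR-based outlier exclusion can be sketched as follows (hours as the unit; the timestamps and values are illustrative):

```python
import numpy as np

def mean_response_time(hours):
    """Average response time (Eq. 2) after dropping IQR outliers,
    following the Box-and-Whisker convention (1.5 * IQR fences)."""
    hours = np.asarray(hours, dtype=float)
    q1, q3 = np.percentile(hours, [25, 75])
    iqr = q3 - q1
    kept = hours[(hours >= q1 - 1.5 * iqr) & (hours <= q3 + 1.5 * iqr)]
    return kept.mean()

# Hypothetical response times (in hours) for the two groups of one project.
pr_bot = [0.5, 1.2, 3.0, 2.4, 48.0]      # 48 h falls outside the IQR fences
pr_non_bot = [4.0, 6.5, 9.0, 12.0]
print(mean_response_time(pr_bot), mean_response_time(pr_non_bot))
```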

4.2 Findings

4.2.1 Performance of mention bot

Overall, the average response rate across the 155 projects is about 75.37% (SD = 26.92%). For the response time, we find that it is reduced in 75 of the 155 projects (about 48.4%) when mention bot is deployed, whereas it increases in the rest of the projects.

Table 3 We compared the response time in the \(PR_{{ Bot}}\) and \(PR_{{ Non-bot}}\) groups using Mann–Whitney–Wilcoxon test

We further analyze the change in the response time using the Mann–Whitney–Wilcoxon test. Table 3 shows the comparison between the response time in the \(PR_{{ Bot}}\) and \(PR_{{ Non-bot}}\) groups. When mention bot is deployed, the response time in 25 projects is significantly reduced while that in six projects is significantly increased. In the rest of the projects, there is no significant difference between the two groups. We then randomly sample pull requests that do not employ mention bot from the six projects whose response time significantly increases (\(PR_{{ Non-bot}} < PR_{{ Bot}}\)); their average response time is about 1.7 h (SD = 5.14). However, in the 25 projects with a significant decrease in response time (\(PR_{{ Non-bot}} > PR_{{ Bot}}\)), the average response time is around 9.45 h (SD = 74.01). This result implies that mention bot is more likely to reduce the response time in less active projects.
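For each project, the significance test used above can be reproduced with SciPy as a sketch (the response time arrays below are hypothetical):

```python
from scipy.stats import mannwhitneyu

# Hypothetical response times (hours) for one project's two PR groups.
pr_bot = [0.4, 1.1, 2.3, 0.9, 3.5, 1.8]
pr_non_bot = [5.2, 7.9, 4.4, 10.1, 6.3, 8.8]

# Two-sided Mann-Whitney-Wilcoxon test; a small p-value indicates a
# significant difference in response time between the two groups.
statistic, p_value = mannwhitneyu(pr_bot, pr_non_bot, alternative="two-sided")
print(statistic, p_value)
```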

Fig. 3
figure 3

Participants indicate their level of agreement with following statements: a I think the recommendations made by the mention bot are appropriate. b I receive faster responses from reviewers using mention bot. c I respond faster to review request sent through the mention bot. d Using the mention bot saves my efforts to identify proper reviewers. e Using the mention bot saves my efforts to explore PRs

In our survey, we evaluate developers’ perceived usefulness and likeability of mention bot. Figure 3a shows the survey results of the first question, which asks about the appropriateness of Facebook mention bot recommendations. About 75% of the participants express positive responses (strongly agree or agree). In the second question, the first two statements ask whether the participants could save time when using mention bot. As shown in Fig. 3b, c, 50–52% of the participants strongly agree or agree with the statements while 36.5–38.5% neither agree nor disagree. The remaining 11.5% of the participants strongly disagree or disagree with the time benefit from mention bot.

The last two statements in the second question ask if the participants could reduce effort with mention bot. Overall, compared to the responses for time reduction, there are more positive and negative responses but fewer neutral ones. As described in Fig. 3d, e, 46.2–71.1% of the participants give positive responses (strongly agree or agree), 13.5–28.8% respond neutrally (neither agree nor disagree), and 15.4–25% give negative responses (strongly disagree or disagree).

Interestingly, about 20% of the participants respond that the mention bot is useful to save the efforts for identifying proper reviewers but not helpful to reduce the time spent in this process. These results may imply that the effort reduction in identifying reviewers is perceived as the key benefit that mention bot provides for developers.

In the survey, we ask the participants whether they like mention bot and whether they would continue using it. The results show that 73% of the participants strongly like or like the service and 84.6% of the participants would continue using it, which suggests that users are positive about mention bot.

4.2.2 User needs for mention bot

Fig. 4
figure 4

The favorite features of the mention bot. The participants can choose one or multiple features

In our survey, we ask the participants to select their favorite features of mention bot among “Reviewer recommendation”, “Automatic notification”, “Enable/disable notification for certain PRs/people” and “Message customization”. Figure 4 shows the results. The most favored feature is “Reviewer recommendation”, supported by about 81.3% of the participants. The second most favored feature is “Automatic notification”, which receives votes from approximately 60.4% of the respondents. The other two features, “Enable/disable notification for certain PRs/people” and “Message customization”, are the favorites of around 37.5% and 16.7% of the participants, respectively.

The 53 issue comments also show users’ preferences for these features. As shown in Table 2, 25 issue comments contain positive feedback on mention bot. The developers seem to like its core features, including “Reviewer recommendation” and “Automatic notification”. In particular, when mention bot was shut down (erlend sh 2016), we observe that developers feel inconvenienced and manually send notifications to the potential reviewers:

@YYY could you take a look at this and #2645 if you have time [...] Not sure what happened to our friend the mentionbot. facebook/mention-bot#134.

However, in the 20 comments with negative feedback on mention bot, developers dislike mention bot’s insensitivity to context and its unbalanced workload allocation. Some of them do not want to get further notifications because they no longer work on the projects:

Can someone please correct the blacklist for @mention-bot? I don’t want to receive any notifications for this repository as I’m not a collaborator here. PS: Just complaining because this is the 4th email I receive thanks to the bot.

While the context insensitivity problem bothers developers who no longer work on the projects, the unbalanced workload allocation problem increases some reviewers’ workloads and discourages others:

If a person is being recommended a lot, nominate a reviewer who wouldn’t have a super hard time.

It’s almost always recommending the same person in our project which is not really that helpful.

The main cause of these problems lies in mention bot’s manual configuration. In the current setting, project owners have to manually identify developers who do not want to be notified and then add them to the blacklist. The “Enable/disable notification for certain PRs/people” feature of mention bot is designed to mitigate the incorrect recommendation and unbalanced workload allocation problems above, but its unfriendly design discourages users (only 37.5% of the participants like it).

The results from the survey and comment analysis imply that “Reviewer recommendation” and “Automatic notification” are the key features of mention bot (favored by 81.3% and 60.4% of the participants, respectively). When these two features are broken, users feel inconvenienced. However, if mention bot keeps notifying the same reviewer, it increases that reviewer’s workload, and users need higher context sensitivity to avoid notifying developers who are no longer active in the projects. Given these results, we see that the user needs may come from three user groups. For example, the project owners and contributors need the “Reviewer recommendation” and “Automatic notification” features, while the reviewers need a more balanced workload allocation and higher context sensitivity.

4.3 Discussion

To address the above user needs, we propose three potential features to improve mention bot (a combined sketch of these rules follows the list):

  • A delay of 1–3 days before activating mention bot.

    User comments suggest that the immediate activation of mention bot may cause redundant notifications to developers. We explore the distribution of the response time in the archival data and find that about 80.34% and 89.52% of the pull requests are responded to within 24 and 72 h, respectively. Given this, we propose that a delay of 1–3 days before activating mention bot would help avoid the majority of redundant notifications.

  • Automatically disable notification for inactive developers

    We find that the notification feature may bother developers who no longer work on the projects. Mention bot does have a blacklist to avoid notifying certain developers, but project owners need to manually identify these developers and add them to the blacklist. We propose a feature that automatically turns off notifications if reviewer candidates are inactive. We suggest measuring the activeness of developers by checking their last contribution to the project and how many times they have failed to respond to previous recommendations. For example, if a reviewer candidate made their last contribution to a project six months ago and has failed to respond to notifications three times, it would be better to turn off the notification to this developer.

  • Limit the maximum number of review requests to one developer.

    Workload balancing among recommended reviewers can be critical. As the participants say, it is not realistic to ask one developer to review many pull requests at a time while others have nothing to work on. We therefore propose limiting the maximum number of review requests sent to one developer. For example, if a developer receives more than five review requests within a week, it is reasonable to lower the priority of this developer in future recommendations.
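The three features above can be combined into a single notification-gating rule. The following sketch is our own illustration; the thresholds (3-day delay, six months of inactivity, three missed responses, five requests per week) are taken from the examples in this section, and the data fields are hypothetical:

```python
from datetime import datetime, timedelta

def should_notify(candidate, pr_created_at, now):
    """Decide whether mention bot should notify a reviewer candidate.

    `candidate` is a dict with hypothetical fields describing the reviewer:
    last_contribution (datetime), missed_responses (int), and
    requests_this_week (int).
    """
    # Feature 1: delay activation for 1-3 days after the PR is created,
    # since most PRs already get a response within that window.
    if now - pr_created_at < timedelta(days=3):
        return False
    # Feature 2: skip inactive developers (no contribution for six months
    # and three or more unanswered recommendations).
    inactive = (now - candidate["last_contribution"] > timedelta(days=180)
                and candidate["missed_responses"] >= 3)
    if inactive:
        return False
    # Feature 3: cap the weekly number of review requests per developer.
    if candidate["requests_this_week"] >= 5:
        return False
    return True

# Example: an active candidate with a light workload on a 4-day-old PR.
now = datetime(2016, 6, 30)
candidate = {"last_contribution": datetime(2016, 6, 1),
             "missed_responses": 0, "requests_this_week": 2}
print(should_notify(candidate, pr_created_at=datetime(2016, 6, 26), now=now))  # True
```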

Overall, up to June 2016, Facebook mention bot performs quite well as a reviewer recommendation service. Its recommended reviewers respond actively to the PRs (75.37%) and it helps reduce the response time in less active projects. Our user study supports that mention bot recommends appropriate reviewers for the PRs (75%) and that developers perceive the effort reduction in identifying reviewers as the key benefit provided by mention bot. Moreover, we find that the identified user needs may come from different user groups, which motivates us to investigate the factors critical to each group’s experience of ARR services separately in the later stage.

5 Stage II

Over the year, Facebook mention bot has added more configuration options to benefit different users. For example, the “delayed” feature that we proposed in Stage I is added with a default “false” setting and a “delayUntil 3 days” configuration. In addition, to avoid redundant notifications to reviewers and provide better recommendation results for contributors, project owners can now filter developers and files via settings such as “requiredOrgs”, “skipAlreadyMentionedPR”, “fileBlacklist” and “skipTitle”. With so many attractive features added, it is interesting to know whether mention bot has attracted more projects, whether developers are satisfied with the improvements, and whether each of the three user groups has unmet needs and expectations for ARR services. To answer these questions, we conduct the second stage of our research, starting with re-analyzing the adoption of mention bot and then investigating factors critical to the three user groups of ARR services, i.e., project owners, contributors and reviewers.

5.1 Methods

5.1.1 Archival data collection

In Stage II, we find that the official “mention-bot” account has been removed from GitHub, so we can no longer track mention bot’s activities in the PRs as we did in Stage I. However, mention bot is still active. On the one hand, reviewers are still notified by mention bot. On the other hand, some developers configure mention bot by adding a “.mention-bot” file to the base directory of the repository but name it in different ways, such as “jimmibot” and “salt-jenkins” (rather than “mention-bot”). Therefore, we revisit the 205 projects to check whether they still use mention bot, using the following criteria: (1) removed: some issues explicitly claim that the project has removed mention bot; (2) still use: the “.mention-bot” configuration file still exists in the project or there are issues that imply the existence of mention bot; (3) disappeared: the project no longer exists in GitHub; (4) unclear: there is no “.mention-bot” configuration file inside the project and we cannot find any issues claiming that mention bot is still in use or has been removed. To understand the reasons for the usage and removal of mention bot in these projects, we further collect user comments about mention bot by searching related issues in GitHub using the keyword “mention bot”.

5.1.2 Survey and interview

To further investigate developers’ perceived usefulness of mention bot and explore factors critical to ARR user experience and adoption, we conduct a survey with three user groups: project owners, contributors and reviewers (see the Research Method Overview section). We design five different questionnaires: (1) project owner using mention bot; (2) project owner not using mention bot; (3) contributor using mention bot; (4) reviewer using mention bot; (5) contributor or reviewer not using mention bot. We design only one questionnaire for contributors and reviewers not using mention bot because we ask almost the same questions to investigate their needs and expectations for ARR services. Across all user groups, we ask respondents to rate the perceived usefulness and annoyance of the service as well as the efficacy of each feature of mention bot on a 5-point Likert scale (1 being the least of each measure). There are also customized questions for each user group. For example, we ask project owners why they do or do not deploy mention bot; we ask contributors what they would do before issuing a pull request and when mention bot comments on their pull requests; and we ask reviewers how they get pull requests to review and what they would do if notified by mention bot.

By searching the contributors of the projects that use mention bot now or used it before, we send emails to over 700 potential users to invite them to our survey and interview. Since software developers might act as project owners, contributors or reviewers under different circumstances, we ask the participants to fill out as many of the questionnaires as apply to them and invite them to join our interview. In total, we get 34 valid survey responses and interview six developers through email and Google Hangouts.

Table 4 Mention bot’s status in the projects in Stage II

5.2 Findings

5.2.1 Change on the adoption of mention bot

Through our manual check, we find that 22 projects have removed mention bot, 30 projects still use it, 11 projects have disappeared and the remaining 142 projects are unclear (Table 4). Among these 142 projects, 72 have no issues about mention bot, which is unusual if mention bot serves them well. In addition, we try to identify mention-bot-related activities in other GitHub projects between July 2016 and August 2017 by searching for “create mention bot” and “remove mention bot” in project commit logs in GitHub. After filtering out irrelevant results, we identify 22 projects that claim to employ mention bot and 19 projects that claim to remove it during this period. We can see that the adoption of mention bot in GitHub does not increase during this year.

Table 5 Contents shown in some comments

As for the searched comments, we finally identify 90 effective comments that emerged within the year (from July 2016 to August 2017) and express user attitudes towards mention bot. Among these comments, 33 show a positive attitude toward mention bot, 26 are negative, and the remaining 31 are neutral. We further classify these comments based on their contents (Table 5), and identify five comments specifying benefits of mention bot, nine complaining about unbalanced workload allocation, 12 reporting bugs, nine suggesting room for improvement, and eight proposing an alternative service. We suspect that the adoption of mention bot is greatly impaired by its bugs, its unbalanced workload allocation problem and the existence of alternatives.

5.2.2 Perception towards mention bot

We receive a total of 34 valid responses from our survey: seven project owners using mention bot, 10 project owners not using mention bot, 11 contributors or reviewers not using mention bot, five contributors using mention bot and one reviewer using mention bot. Overall, mention bot users “find it useful” (mean = 4.08, SD = 0.64). The project owners employ mention bot in their projects mostly for its “efficiency” (mean = 4.29, SD = 0.95) or “convenience” (mean = 3.71, SD = 0.95), but not for “fun” (mean = 2.29, SD = 1.11). With mention bot, project owners spend less effort in “managing the pull request process” (mean = 3.71, SD = 0.76) and can “engage developers more in the projects” (mean = 3.86, SD = 0.69). However, employing mention bot does not necessarily “boost the activeness of the projects” (mean = 3.00, SD = 1.00). After briefly explaining the concept of mention bot to the contributors and reviewers who have never heard of the service, 70% of them hope that the projects they participate in would employ it. Contributors who use mention bot do not think that they can always “get faster response from its recommended reviewers than from others” (mean = 3.00, SD = 0.71), or that “the suggested reviewers certainly provide better feedback” (mean = 3.20, SD = 1.10), or that “it improves their interaction with other developers” (mean = 3.20, SD = 1.10). However, they do agree that it saves their effort in looking for proper reviewers (mean = 4.00, SD = 1.22), which is consistent with our findings in Stage I.

Among the respondents, six software developers (I1–I6) express further interest and join our semi-structured online interviews. I1 is a project owner and a current user of mention bot; I2 is a reviewer and contributor who had not heard about mention bot before; the remaining four (I3–I6) are project owners who used it before but later removed it. We mainly ask them to share their user experience with or without mention bot during the interviews.

When we ask I3, 4, 5, 6 why they removed mention bot, surprisingly, their answers are quite similar:

We’re not using mention bot any longer because GitHub added the “suggested reviewers” feature which is enough for our needs, but we found mention bot very useful otherwise. (I3)

Compared with mention bot, the “suggested reviewers” feature in GitHub is less aggressive because it does not automatically notify the reviewers but only suggests potential reviewers to project owners who have write access to the PRs. It is built into the GitHub platform, so users do not need to configure it by themselves or worry about its stability. However, interviewees also comment that “suggested reviewers” is not flexible enough, as developers who only have read access to the PRs cannot send a request to the suggested reviewers on their own if the project owners are too busy to notify them. Our interviewee I1, the owner of a big project (with 1870 contributors, 84,888 commits and 26,576 closed PRs up to August 25, 2017), explains why he continues to use mention bot rather than “suggested reviewers”:

Our project is too big. The feature needs permission, the suggested reviewers should be the member of our project. But we have nearly 2000 contributors. We want them all in our project, and mention bot suits our need. (I1)

He stresses how mention bot contributes to his project:

Mention bot does improve the quality of software, because more people review the pull request before they are merged. A big improvement of the number of reviews that we get.

Overall, we find that mention bot’s performance does not meet some developers’ expectations, possibly because of its unstable settings, its unbalanced workload allocation and the existence of other ARR services, especially the better integrated “suggested reviewers” feature of GitHub. Still, many users value mention bot’s benefits in terms of extending the reviewer pool and reducing the effort of managing PRs. In the next subsection, we present the factors essential to the unique experience of each ARR user group.

5.2.3 Factors critical to ARR user experiences

Fig. 5
figure 5

The potential features of a reviewer recommendation service. Participants are asked to evaluate their usefulness

In our survey, we ask participants to indicate their perceived usefulness of a list of potential features of a PR reviewer recommendation service identified in Stage I: “Message customization”, “Explanation of the result”, “List of recommended reviewers”, “Delayed time” and “Blacklist”. As shown in Fig. 5, most respondents find “Explanation of the result” and “List of recommended reviewers” (extremely) useful features to have (78.6% and 71.5%, respectively). In comparison, respondents’ perception of the other three features, which already exist in mention bot, is rather neutral (50.0% for “Message customization”, 50.0% for “Delayed time”, and 27.3% for “Blacklist”). In fact, some comments from social media are negative about these features:

“how do I get myself blacklisted from this XXX mention bot thing?”

“If the delay feature is enabled, mention bot no longer works”

We further summarize features that matter most to project owners, contributors and reviewers.

  • Project owners.

    1. 1.

      Simplicity Although Facebook mention bot claims that it can be set up easily, it now has 22 configuration options. We find that most of the projects we visit just keep the default settings, which disables features that might be helpful for contributors and reviewers, such as fileBlacklist, skipTitle and requiredOrgs. In fact, some project owners removed mention bot because they could not configure it correctly:

      The configuration added is not working, so I just removed it since the benefit would be minor anyways (and might annoy some people?) Very funny. I was deleting the mention bot webhook and accidentally found out why it was not working. I forgot to check the events to be sent on ‘Labeling’. Oh well.

    2. 2.

      Stability Mention bot itself is a project under constant development, and thus may not function normally from time to time, which really affects the experience when it is under heavy usage. One of our interviewees (I4) removed it because “It stopped working a while ago so I’ve disabled it.”. Besides, as shown in Table 5, the bugs of mention bot reported in 12 of the 90 comments we collect also discourage its usage, e.g., “Seems the complete .mention-bot file is currently ignored”.

  • Contributors.

    1. 3.

      Transparency According to our survey, it is not a common practice for contributors to identify reviewers on their own, such as “manually search and add reviewers” (mean = 2.57, SD = 1.16) or “mention reviewers they know” (mean = 3.00, SD = 1.03). When mention bot comments on their PRs, although they are inclined to “trust its recommendation” (mean = 3.80, SD = 0.84), many contributors will still “check its recommendation” (mean = 4.60, SD = 0.55). But mention bot and GitHub’s “suggested reviewers” feature are not transparent enough, as their output is merely a few reviewer user names. Some contributors may want to know who is in charge of the part of the code that their PR modifies: “Normally I want my direct supervisor rather than those who had modified related files to review my PRs” (I2). The fact that “Explanation of the result” and “List of recommended reviewers” are the most preferred features in our survey also suggests that contributors want the decisions made by ARR services to be more transparent.

  • Reviewers.

    1. 4.

      Selectivity Our respondents are somewhat conservative about taking on PR reviews, whether they “look for pull requests interesting to me on my own” (mean = 3.33, SD = 1.12), are “mentioned by contributors” (mean = 3.00, SD = 1.12) or “want to be recommended by a bot” (mean = 3.33, SD = 1.00). This may be because they are already rather occupied: “I am too busy to look every email from GitHub because it sends all information about the update of pull requests, but actually I do not need to review all those pull requests” (I2). I2 says that he would like the bot to only notify him of the PRs that really need him. These results imply that reviewers would not actively take on ordinary PR reviews but want the selectivity to be notified only of certain kinds of PRs.

Overall, in Stage I we find that users need better reviewer recommendation with automatic notification as well as more balanced workload allocation and higher context sensitivity, while in Stage II we further find that simplicity, stability, transparency and selectivity are critical to the ARR experiences of different user groups. Taking all these user needs and factors into account, we propose our design considerations for ARR services in the next section.

6 Discussion

In this section, we present design considerations for improving ARR services, other insights and limitations of this work.

6.1 Design considerations for improving ARR services

Based on findings from both stages, we propose three design considerations that would possibly improve user experience of ARR services.

6.1.1 Easier configuration of ARR service for project owners

Users cannot customize the “suggested reviewers” feature provided by GitHub, and thus it cannot adequately meet different types of user needs. Mention bot does have many options to deal with different situations, but its unfriendly manual configuration process intimidates many project owners who are responsible for handling the service. Since project owners tend to “only care about the PRs and want the bot easily tells how its capacities are” (I1), we propose that a better ARR service should have an easier configuration process. For example, the service can offer shortcuts to easily switch modes to satisfy different needs: if the project needs more external contributions, the owner can use a shortcut that adjusts the relevant options to invite external reviewers to review the PRs. Besides, the service can keep a log so that project owners can easily reset it to a suitable and stable state.
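As an illustration of such shortcuts, mode presets could expand into full configurations; the mode names, option keys and values below are hypothetical:

```python
# Hypothetical presets that a project owner could pick instead of editing
# all configuration options by hand; keys and values are illustrative only.
MODE_PRESETS = {
    # Invite as many (including external) reviewers as possible.
    "open-contribution": {"maxReviewers": 5, "requiredOrgs": [], "delayed": False},
    # Keep reviews inside the core team and avoid noisy notifications.
    "core-team-only": {"maxReviewers": 2, "requiredOrgs": ["example-org"],
                       "delayed": True, "delayedUntil": "3d"},
}

def apply_mode(base_config, mode):
    """Return a new configuration with the chosen preset applied on top."""
    preset = MODE_PRESETS[mode]
    return {**base_config, **preset}

config = apply_mode({"userBlacklist": []}, "open-contribution")
print(config)
```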

6.1.2 Better transparency of recommendation for contributors

According to our survey in Stage II, contributors tend to check mention bot’s recommendation when it comments on their PRs, and they call for information that can improve their understanding of why a particular recommendation is made. Therefore, we propose that a better ARR service should keep its recommendations transparent to contributors, especially regarding the qualification and availability of the suggested reviewers. For each PR, in addition to directly naming the top few appropriate reviewers, an ARR service can provide a ranked list of all the potential reviewers for this PR, each with a brief profile summarizing their role in the project, specialty, recent activeness, current workload, etc. In case contributors would like to manually select reviewers, this list would be a good place to start.
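A sketch of the kind of transparent output we have in mind, combining a ranked list with brief reviewer profiles (all fields, names and scores are hypothetical):

```python
def explainable_recommendation(candidates, top_k=3):
    """Rank reviewer candidates and attach a short profile to each entry.

    `candidates` is a list of dicts with hypothetical fields:
    name, score, role, specialty, recent_commits, open_reviews.
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    lines = []
    for i, c in enumerate(ranked[:top_k], start=1):
        lines.append(
            f"{i}. @{c['name']} (score {c['score']:.2f}) - {c['role']}, "
            f"specialty: {c['specialty']}, {c['recent_commits']} commits in the "
            f"last month, {c['open_reviews']} reviews in progress"
        )
    return "\n".join(lines)

print(explainable_recommendation([
    {"name": "alice", "score": 0.92, "role": "maintainer", "specialty": "parser",
     "recent_commits": 14, "open_reviews": 2},
    {"name": "bob", "score": 0.75, "role": "contributor", "specialty": "CI",
     "recent_commits": 3, "open_reviews": 0},
]))
```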

6.1.3 More flexible notification preference setting for reviewers

Reviewers are bothered the most by irrelevant PR review notifications. For example, when reviewers are already overloaded with work on the project or in real life, they do not want to receive more review requests. Mention bot does have mechanisms to filter reviewers in the candidate pool, but only project owners have the access to set the rules: reviewers have to contact the project owners to adjust the pool if they would like to disengage from or reengage in review activities. While the feature of automatically filtering out inactive reviewers that we propose in Stage I is a potential way to avoid unnecessary notifications, it may not be able to respond instantly to urgent changes in availability. Hence, we propose that a better ARR service should allow reviewers to specify personal notification preferences on their side. Reviewers could change their status to “Do not disturb” when occupied, declare the types of PRs they are not interested in, and set a maximum quota of PRs. Furthermore, for reviewers (e.g., I2) who would love to help but are not sure of their qualification and/or availability, ARR services may instead recommend PRs to them according to their interests.
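A reviewer-side preference filter could be as simple as the following sketch (the preference fields and labels are hypothetical):

```python
def reviewer_accepts(preferences, pr_labels, pending_requests):
    """Apply a reviewer's personal notification preferences before pinging them.

    `preferences` is a dict with hypothetical fields: do_not_disturb (bool),
    uninterested_labels (set of PR labels), and max_open_requests (int).
    """
    if preferences["do_not_disturb"]:
        return False
    if preferences["uninterested_labels"] & set(pr_labels):
        return False
    if pending_requests >= preferences["max_open_requests"]:
        return False
    return True

prefs = {"do_not_disturb": False,
         "uninterested_labels": {"documentation"},
         "max_open_requests": 3}
print(reviewer_accepts(prefs, ["bug", "parser"], pending_requests=1))  # True
```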

6.2 Additional insights into bot usage in GitHub

In the interviews, our interviewees share their positive attitudes toward general bot usage in GitHub.

Really necessary, because there are many repeated work to do otherwise. (I1)

I feel most of the bots can solve actual problems. I am positive toward these bots and hope more and more useful bots come up. (I2)

Almost every big project that we visit for this research involves some bot(s) in its development, such as “facebook-github-bot” in the Facebook organization, “Microsoft Pull Request Bot” in the Microsoft organization and “greenkeeper bot”. Their functions are very specific, helping with small chores like adding labels to PRs or sending customized messages. Our interviewees hope to see a bot that can provide all these functions in the future.

6.3 Limitation and future work

Our work has some limitations. Some of our findings might be unique to GitHub, and we did not compare Facebook mention bot with other ARR services. Our survey results come from a small sample of developers due to the low response rate; therefore, our results may not represent the opinions of the entire user population (e.g., contributors with different levels of experience) about reviewer recommendation services. In the future, we plan to improve the coverage and generality of our research, develop a user-friendly ARR service based on the findings, and test its usefulness, usability, and user experience in the wild. In addition, the emergence of competitors of mention bot, such as GitHub’s “suggested reviewers” feature, indicates that a simple direct-manipulation feature rather than a chatbot could be enough. We will further explore this point in the future.

7 Conclusion

In this paper, we used Facebook mention bot, an automatic reviewer recommendation (ARR) bot in GitHub, as a lens to explore how developers work with ARR services. We used a two-stage mixed-methods approach to investigate the practical usefulness of mention bot and the critical needs of different types of users. Our Stage I investigation (June 2016) shows that mention bot performed quite well: it saves contributors’ effort in identifying proper reviewers and achieves a 75.37% response rate among suggested reviewers, although some reviewers expressed the need for a more balanced workload allocation. A year later in Stage II (August 2017), we do not see an obvious increase in mention bot’s adoption, perhaps due to its inherent problems and the existence of other ARR alternatives. Our survey and interviews with three user groups (project owners, contributors and reviewers) suggest that simplicity, stability, transparency and selectivity are critical to the user experiences of ARR services. Based on these findings, we propose a set of considerations for designing more user-friendly ARR services.