1 Introduction

Code review is a well-established method of software quality assurance. In recent years, change-based review has become the dominant style of code review in industry [5, 24]. Its main characteristics are the use of the code changes performed in a unit of work, e.g. a user story, to determine the scope of the review, and the reliance on conventions or rules instead of management intervention for many decisions [5]. Change-based review is supported by tools, in some cases specialized code review tools, in other cases general-purpose tools like “diff”. Judging by their widespread adoption, these tools seem to address the current needs of industry. Nevertheless, we believe there is still room for improvement, not least because these tools do not fully incorporate existing research results. The purpose of this article is to derive and collect promising ideas for improving code review effectiveness and efficiency through code review tools in the context of industrial software development. We take the point of view of researchers developing such tools and build on results published in the literature as well as on interviews with software development professionals. This article can be used to guide and direct future research as well as tool development efforts.

2 Methodology

The style of this article is largely deductive. We extract well-founded hypotheses from existing research and combine them to derive and evaluate improvement opportunities for industrial code review. While there are notable differences between classic Fagan inspection and modern change-based code review, many important aspects are similar [5, 24]. We therefore also include results from research on classic inspections as evidence, as long as we believe them to be applicable; further experiments could be conducted to test these assumptions. A limitation of this part of our analysis is that we did not perform a systematic literature review (SLR) in the narrow sense of the term. An SLR could have further reduced the risk of missing relevant publications.

We recently performed a study based on semi-structured interviews with 24 software professionals from 19 companies [5, 6]. To some extent, these interviews concerned the way in which reviewers work and the problems they perceive. These interviews form the second pillar of our argumentation, in addition to the literature. All interviews were recorded and later transcribed. Most interviews were conducted by the first author; some were performed by another researcher to reduce the risk of bias. Our sample focuses on small and medium-sized standard software development companies and in-house IT departments from Germany, but we included contrasting cases for all main factors. The interviewees are mostly software developers and team or project leads, as the development teams were responsible for code reviews in the sampled cases. The interviews were conducted between September 2014 and May 2015. Further methodological details on the interviews can be found in our related articles [5, 6]. The main study [6] followed “Grounded Theory” methodology, but the results presented in the current article are not a grounded theory. We quote many statements from the interviews to illustrate specific points; the subscripts attached to these quotations denote the interviewee ID from [6].

To assess the current state of code review tools, we combined information from our interviews with information from the websites of the respective tools. To a limited degree, we also installed and tried out some of the tools.

3 What Do We Know About Code Reviews?

A lot of research has been done on code reviews and inspections, yet many questions have not been answered conclusively. Some results, however, are relatively well supported, and a subset of these forms the foundation of our discussion:

The first such result concerns which factors have a major and which only a minor influence on the effectiveness and efficiency of reviews. When analyzing experimental data, Porter et al. “found that [reviewers, authors, and code units] were responsible for much more variation in defect detection than was process structure”, and they “conclude that better defect detection techniques, not better process structures, are the key to improving inspection effectiveness” [21]. A similar conclusion is reached by Sauer et al., who identify “individuals’ task expertise as the primary driver of review performance” based on theoretical considerations. Correlations between the (inspection) expertise of the reviewer and the number of found defects have also been reported by Rigby [23] and by Biffl and Halling [8], to name just a few. We conclude that the major factors influencing code review effectiveness and efficiency are the reviewer, their relation to the artifact under review, and the way in which they perform the checking.

The second important result is about the role of understanding the artifact under review. In their study based on interviews with developers at Microsoft, Bacchelli and Bird found that “[m]any interviewees eventually acknowledged that understanding is their main challenge when doing code reviews” [2], which confirmed earlier results from Tao et al. [26]. Further support for a positive correlation between code understanding and review effectiveness comes from experiments by Dunsmore, Roper and Wood [12]. Our interview results fully support these findings, e.g.: “I have to understand what the other developer thought at that time. And for that you look very closely at the code, and then things that should or could be done better somehow come up automatically”\(_{ 3 }\).

4 The Problem of Large Changes

In our interviews, we asked about problems hampering review effectiveness. One of the most common themes was the difficulty of understanding and reviewing large changesets: “Smaller commits are generally not a problem. But these monster commits are always ...not liked very much by the reviewers.”\(_{ 5 }\) “What sometimes impedes me is when the ticket is just too big.”\(_{ 7 }\) “When you have such a big pile to review the motivation is not very high and you probably don’t approach the review with the needed quality in mind.”\(_{ 12 }\)

The conclusion that large changesets are problematic can also be derived from other research results: There is evidence that review effectiveness decreases greatly when the review rate (checked lines of code per unit of checking time) is outside an optimal interval (see e.g. [15]). There is also evidence that concentration, and therefore review effectiveness, fades after some time of reviewing [19, 22]. Combining these two observations yields an upper limit on the size of an artifact that can be reviewed effectively in a single session.
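To make this bound concrete: with an optimal checking rate \(r_{\text{opt}}\) and a maximal concentration span \(t_{\max}\), the effectively reviewable size is roughly their product. The numbers below are purely illustrative placeholders, not values taken from the cited studies:

\[
\text{size}_{\max} \approx r_{\text{opt}} \cdot t_{\max}, \qquad \text{e.g. } 400\,\text{LOC/h} \times 1\,\text{h} = 400\,\text{LOC}.
\]

Any changeset substantially above this bound can no longer be checked effectively in a single session.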

Given the problems with the review of large changes, many teams resort to the frequent review of small changes [23]. Up to a certain point, this is a good thing to do, but there are also arguments in favor of larger changes and reviews: The change under review should be self-contained, it should fulfill certain quality criteria before central check-in (at the very least, it should compile), and reviewing very small changes can lead to high overhead and duplicate work [26]. So instead of forcing every change to be very small, we argue for making the review of larger changes more effective and letting changes stay at their “smallest natural size”.

5 Tool Support to the Rescue

We substantiated in the previous sections that, to increase the effectiveness and efficiency of code reviews for defect detection, we should focus on the reviewer and on helping her/him understand large code changes better. We believe that improved tool support offers many opportunities in this regard and give examples in the following subsections. An overview of our argumentation is shown in Fig. 1. The subsections correspond to the most important influencing factors, deduced from the results mentioned so far:

  • Choose the best reviewer for the job (Sect. 5.1)

  • Shrink the size of the changeset that has to be reviewed (Sect. 5.2)

  • Help the reviewer to understand large changesets (Sect. 5.3)

  • Decrease the need to understand the change (Sect. 5.4)

Fig. 1. Overview of argumentation and tool features

5.1 Reviewer Recommendation

In recent years, there have been a number of studies on “reviewer recommendation”, i.e. on finding the best reviewer(s) for a given change (e.g. [3, 29]). While this promises a large effect in theory, several problems reduce the benefit in practice, especially in smaller teams. The most obvious is that in a small team it is often fairly easy to see who would be a good reviewer for a change, so that computer support does not provide large gains. In other cases, the reviewer for a certain module is fixed [5], so there is no choice at all. Additionally, always choosing the best reviewer can lead to a high review load for experienced developers, and a high workload has a negative impact on review quality [7]. Reviewer recommendation therefore has to move from determining a local optimum for every single review to a more global optimization of reviewer assignment.
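As a minimal sketch of what such a global optimization could look like, the following toy heuristic balances each candidate’s expertise for a change against their already assigned review load; all names, scores and the weight LAMBDA are illustrative assumptions of ours, not part of any existing tool:

```python
# Toy sketch of load-balanced reviewer assignment (illustrative only).

LAMBDA = 0.5  # how strongly current workload counts against a candidate

def assign_reviewers(changes, expertise, workload):
    """Greedily assign each change the candidate with the best trade-off
    between expertise for that change and already assigned review load.

    changes:   list of change identifiers
    expertise: dict mapping (reviewer, change) -> score in [0, 1]
    workload:  dict mapping reviewer -> number of pending reviews
    """
    assignment = {}
    for change in changes:
        def utility(reviewer):
            return expertise[(reviewer, change)] - LAMBDA * workload[reviewer]
        best = max(workload, key=utility)  # iterates over the reviewers
        assignment[change] = best
        workload[best] += 1  # later assignments see the updated load
    return assignment
```

A production recommender would of course use richer signals (e.g. ownership history, past review comments) and optimize over all open reviews at once; the point here is only that workload enters the objective.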

5.2 Reducing Changeset Size

Given large changesets containing individual changes of varying relevance to the review goals, reviewers try to pick out the relevant subset manually. This is seen as hard and error-prone: “After some time you get a feeling which files are relevant and which are not, but it’s hard to filter them out. And when I don’t look at them there might be some change in there that was relevant, anyway. That’s problematic.”\(_{ 8 }\)

An important special case is that of systematic changes, especially rename and move refactorings, which has been studied for example by Thangthumachit, Hayashi and Saeki [28] and by Ge [14]. For the more general case, Kawrykow and Robillard [18] developed a method to identify “non-essential” differences. Zhang et al. [30] describe the tool “Critics”, which helps in inspecting systematic changes using generic templates. Tao and Kim [27] propose an approach to partition composite code changes. Further research could provide a better foundation for deciding which changes are low-risk, and it could look into distinguishing change fragments that are error-prone and need to be checked in detail from change fragments that only need to be read to aid understanding. Another research avenue is to include more data, such as test coverage information, to assess review relevance. Nevertheless, much could already be gained by bringing the promising existing results into wider use.
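To illustrate the flavor of such filtering, a review tool could pre-classify diff hunks and fold away those matching low-risk patterns. The rules below are a toy rule set of our own, far simpler than the published techniques cited above:

```python
import re

# Matches simple Java-style import lines; an illustrative assumption.
IMPORT_LINE = re.compile(r"^\s*import\s+[\w.]+;?\s*$")

def _strip_whitespace(lines):
    return ["".join(line.split()) for line in lines]

def is_probably_non_essential(removed_lines, added_lines):
    """Heuristically flag a diff hunk as low-risk for review."""
    # Whitespace-only change: identical content once whitespace is removed.
    if _strip_whitespace(removed_lines) == _strip_whitespace(added_lines):
        return True
    # Reordered imports: same import statements, different order.
    if (removed_lines and added_lines
            and all(IMPORT_LINE.match(l) for l in removed_lines + added_lines)
            and sorted(removed_lines) == sorted(added_lines)):
        return True
    return False
```

Published approaches analyze the code’s structure rather than raw text lines, which is what makes them far more robust than such line-based rules.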

5.3 Support for Understanding the Change

A theme that occurred throughout our interviews is that large changes are best reviewed with the search and hyperlinking support of an IDE (e.g. “I think reviewing code purely in ‘Crucible’ only works for trivialities. Because naturally many features are missing that you have in an IDE.”\(_{ 2 }\)). This improvement has already made its way into some widely used review tools, either by making IDE-like support available in a browser (e.g. “Upsource”) or by making the review tool available as an IDE plugin (e.g. “AgileReview” or “EGerrit”).

Many of our interviewees try to get a high-level understanding of the change at the start of the review (“at first an overview because otherwise the problem is that you lose sight of the interrelation of the changes”\(_{ 10 }\)). The current support for this activity is very limited, consisting mainly of a list of the commit messages of the individual commits belonging to the change. There is relatively little research on visualizing and summarizing code changes for better understanding: McNair, German and Weber-Jahnke propose an approach to visualize changesets [20], as do Gomez, Ducasse and D’Hondt [16]. In addition, several textual summarization techniques have been proposed (e.g. [9]). A related technique that can help to summarize the contents of a change is “change untangling” [4, 11, 27]. We believe that more research on these topics is needed to make change visualization effectively usable by reviewers.
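As a small illustration of the direction such summarization could take (a toy aggregation of our own, not one of the cited techniques), even grouping a changeset per top-level directory yields a more useful first overview than a flat list of commits:

```python
from collections import defaultdict
from pathlib import PurePath

def summarize_changeset(hunks):
    """Aggregate a changeset per top-level directory as a first overview.

    hunks: iterable of (file_path, lines_added, lines_removed) tuples;
           this input format is an illustrative assumption.
    """
    per_dir = defaultdict(lambda: [set(), 0, 0])  # files, added, removed
    for path, added, removed in hunks:
        top = PurePath(path).parts[0]
        stats = per_dir[top]
        stats[0].add(path)
        stats[1] += added
        stats[2] += removed
    for top, (files, added, removed) in sorted(per_dir.items()):
        print(f"{top}: {len(files)} file(s), +{added}/-{removed} lines")
```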

After gaining an overview, the reviewer needs to step through the change’s details in some order. Many reviewers try to find an order that helps their understanding, but often fall back to the order presented by their review tool: “The problem is you sometimes get lost and don’t find a good starting point.”\(_{ 10 }\) “If you don’t have that, you just step through the files in the commit one after another ...”\(_{ 10 }\). A similar finding resulted from a study by Dunsmore, Roper and Wood, in which participants suggested an “ordering of code” to improve inspections [13]. Guiding the reviewer in this way shares some similarities with the reading techniques studied intensively for inspections [1, 10]. The main difference is that these reading techniques try to change the way the reviewer works, whereas the proposed guidance moves some cognitive load from the human reviewer to the tool. In addition, most reading techniques proposed so far are not intended to be used with changesets, so that research opportunities abound in this area.
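One conceivable form of such guidance (a sketch under our own assumptions, not one of the published reading techniques) is to order the changed files so that callees come before their callers, letting the reviewer build up context bottom-up:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def review_order(changed_files, depends_on):
    """Suggest a bottom-up reading order for the files in a changeset.

    changed_files: set of changed file paths
    depends_on:    dict mapping a file to the files it calls into; how
                   this graph is obtained (e.g. from an IDE index) is
                   left open here.
    """
    sorter = TopologicalSorter()
    for f in changed_files:
        deps = [d for d in depends_on.get(f, ()) if d in changed_files]
        sorter.add(f, *deps)
    # static_order() yields dependencies first, i.e. callees before callers.
    return list(sorter.static_order())
```

Real dependency graphs contain cycles, which such a tool would have to break heuristically (graphlib raises CycleError in that case).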

5.4 Decrease the Need for Code Understanding

From a theoretical point of view, reducing the need to understand the code is another possibility to solve the stated problem. Essentially, this is a question of efficiency: Is in-depth code review the most efficient way to find a certain defect type, or are there more efficient ways, e.g. static code analysis or testing [15, 25]? As long as there are practically relevant defect types for which in-depth code review is the most efficient technique, understanding the code will still be needed. And should there one day be no such defect types anymore, for example after a breakthrough in static analysis research, code review in its current form will no longer be needed for defect detection. Therefore, we do not discuss this topic further in this article.
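For completeness, the efficiency criterion underlying this argument can be written down compactly; the following formulation is our own sketch, not taken from the cited works. For a defect type \(d\) and a set \(T\) of available techniques (in-depth review, static analysis, testing, ...), the most efficient technique is

\[
t^*(d) = \operatorname*{arg\,max}_{t \in T} \; \frac{P(\text{find } d \mid t)}{\text{cost}(t)},
\]

and in-depth code review remains necessary for defect detection exactly as long as there are practically relevant defect types \(d\) with \(t^*(d) = \text{review}\).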

6 A New Generation of Code Review Tools

About a decade ago, Henrik Hedberg proposed a classification of software inspection/review tools into generations [17]. He concluded that the coming fifth generation should provide flexibility with regard to the supported documents and processes and should comprehensively incorporate existing research results. This prediction has come true, with limitations: Current review tools like “Gerrit”, “Crucible” or “Collaborator” are flexible and commonly support the review of any kind of text file. In the preceding sections, we derived opportunities to reach a higher level of review effectiveness. To realize most of them, tools have to move from treating a change as generic text to treating it as a change in source code, trading some flexibility for better reviewer support. This leads us to expect the rise of a sixth generation of code review tools, the generation of “cognitive support review tools”.

7 Summary

We collected four findings on code review that we regard as well established: (1) Code review effectiveness and efficiency depend to a large degree on the reviewer, their style of work and their fit to the artifact under review. (2) Understanding the review artifact is the most important aspect of reviewing code. (3) Review in industry is commonly change-based. (4) The review of large changes is the most significant challenge in code review. Based on these findings, we derived leverage points to improve review effectiveness and efficiency through tool support. For each of these points, we surveyed existing research and stated open research questions. In our own work, we are currently looking into some of them. We believe there is an abundance of open questions for other researchers to join us in our effort to lay the foundation for the generation of “cognitive support review tools”.