
1 Introduction

For many years, information technology has become increasingly pervasive, and consequently, its importance has also risen [1]. Moreover, this trend is not expected to end soon [2]. This leads to an ongoing demand for new and improved software products to satisfy the emerging needs [3]. Yet, developing software is a highly challenging task, which is prone to errors and, in the case of complex systems, potentially susceptible to unintended side effects and compatibility issues [4, 5]. Furthermore, badly designed or erroneously implemented software solutions can have detrimental impacts on the organizations that use them [6]. Additionally, due to the increasing technical possibilities and the corresponding emergence of opportunities for new applications and use cases, the complexity of the developed solutions has also increased [7]. Thus, tools, techniques, paradigms, and strategies that facilitate the quality assurance of the developed applications are of high relevance for practitioners as well as for researchers [8] and are therefore heavily and broadly discussed in the scientific literature [9,10,11,12,13,14,15].

Test-driven development (TDD) is one of those strategies, even though, due to its holistic nature, it can be seen as a philosophy rather than just a testing approach [16]. While it is not a new concept and has already been scientifically discussed for a long time [17], it has not yet reached maturity, and therefore, the corresponding discourse is still ongoing and highly relevant. Because the approach is rather complex, since the developers now also have to fill the role of testers, guidance on its correct application is naturally one of the most relevant aspects for many practitioners. To provide such guidance to them, but also to researchers concerned with the domain, the publication at hand conducts a structured literature review to identify the implementation guidelines and best practices described in the existing literature, with the goal of providing a solid foundation that future endeavors can build upon. This culminates in the following research question (RQ):

RQ: Which guidelines for the application of test-driven development in software engineering can be identified in the scientific literature?

To answer the RQ, the ensuing text is structured as follows. After the introduction, an overview of the concept of TDD is given. Subsequently, the protocol of the conducted literature review is outlined. Afterward, the findings of the review are presented. Finally, a conclusion is given, and limitations as well as potential prospects for future research are discussed.

2 Test-Driven Development

When consulting the literature [18], the application of TDD is highlighted as a promising way of improving an implementation’s quality as long as the accompanying increase in development time and effort is deemed an acceptable trade-off.

The approach aims at improving the quality of the regarded product by mainly influencing two aspects. On the one hand, the test coverage shall be increased, helping in finding and subsequently remedying issues that occurred during the implementation of the respective artifact. On the other hand, TDD also influences the design process itself, leading to a more manageable and better preplanned structure that helps in avoiding mistakes and incompatibilities [19, 20]. While its main application area is the software development domain, it is also applicable to the special case of the implementation of big data applications [21], process modeling [22], or the development of ontologies [23, 24]. However, the main focus for this publication is the application of TDD in the development of software.

When following the “traditional” way of developing software, a feature or change that is supposed to be realized is first envisioned, then implemented, and subsequently tested. In contrast, in the test-driven approach, the order of implementation and testing is swapped. Therefore, after the desired change is conceived, it is partitioned into the smallest reasonable pieces [25]. For each of those, one or more tests are written to assure that the required functionality is provided. Afterward, the tests are executed and are expected to fail, since the actual functionality has not yet been implemented [26]. Only then is the productive code written to provide the new functionality. Factors like the code’s elegance are not yet regarded; instead, the simplest solution is sought. Once finished, the code has to pass the previously written tests [20]. When it does, the code is refactored to improve aspects like its readability or its adherence to standards and best practices [26]. While doing so, the functionality is constantly validated against the tests.
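The cycle described above can be illustrated with a minimal sketch in Python, using the standard `unittest` module. The leap-year example, the function names, and the test cases are assumptions chosen purely for illustration and do not stem from the reviewed literature.

```python
import unittest


# Step 1 (red): the tests are written first. If they are run before
# is_leap_year exists, they fail, which shows that they check something.
class LeapYearTest(unittest.TestCase):
    def test_year_divisible_by_four_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_year_divisible_by_400_is_leap(self):
        self.assertTrue(is_leap_year(2000))


# Step 2 (green): the simplest code that makes the tests pass.
# Elegance is deliberately ignored at this point.
def is_leap_year_first_version(year):
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


# Step 3 (refactor): the same behavior in a more compact form.
# The unchanged tests are rerun to confirm that nothing broke.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

Saved as a module, the tests can be executed with `python -m unittest`. In a real project, step 2 and step 3 would be successive versions of the same function; both are kept here side by side only to make the refactoring visible.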

As mentioned before, this approach affects not only the test coverage but, by having small work packages instead of large tasks, also the software’s design. Furthermore, this focus on incremental modifications [27], intertwining testing and implementation, provides the developer with more immediate feedback, since it results in short testing cycles [28]. While the majority of tests are usually written specifically for those small units, other tests such as integration, system, or acceptance tests can also be used in TDD [29]. Moreover, to fully harness the potential of TDD without occupying the developer’s attention by forcing them to run the tests manually, TDD is often used in conjunction with test automation in a continuous integration (CI) context [30, 31]. In doing so, to assure that the newest code addition or change does not negatively affect the already existing parts of the implementation, a CI server automatically reruns all applicable tests once a new code commit is registered by the versioning system.

3 The Literature Review

To answer the RQ, a structured literature review (SLR) is conducted that is based on recognized methodologies, namely the approaches of Webster and Watson [32], Levy and Ellis [33], and Okoli [34]. In the following, the whole process is described in as much detail as possible, allowing subsequent researchers to retrace and evaluate its steps. This, in turn, facilitates a better judgment regarding this contribution’s value for their own endeavors and also allows them to build upon it if desired [35]. To increase understandability, the process is also visualized in Fig. 1.

Fig. 1 Review process

Because Scopus is considered the most extensive scientific abstract and citation database while simultaneously providing the user with powerful tools for searching and filtering, it was chosen for the purpose of this literature review [36, 37].

Since the publication at hand aims to give guidance on how to apply the TDD methodology, a variety of relevant search terms and different spellings were used to cover the relevant literature. To assure the relevancy of the results to the TDD domain, the title had to contain either “test driven development”, “test-driven development”, or “testdriven development”.

Further, to identify the works that provide guidance on how TDD should (not) be conducted, the title, abstract, or keywords had to contain “best practice*”, “best-practice*”, “pattern*”, “guideline*”, “guidance”, “antipattern*”, or “anti-pattern*”.

The overall search term, therefore, looks as follows:

(TITLE (“test driven development” OR “test-driven development” OR “testdriven development”) AND TITLE-ABS-KEY (“best practice*” OR “best-practice*” OR “pattern*” OR “guideline*” OR “guidance” OR “antipattern*” OR “anti-pattern*”)).
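To make the composition of this search term easier to retrace, the following Python sketch assembles it from the two term lists given above. The helper names `quoted_or` and `build_query` are hypothetical, and straight quotation marks are used in place of the typographic ones shown in the text.

```python
# Title terms and guidance terms as listed in the search description.
TITLE_TERMS = [
    "test driven development",
    "test-driven development",
    "testdriven development",
]
GUIDANCE_TERMS = [
    "best practice*",
    "best-practice*",
    "pattern*",
    "guideline*",
    "guidance",
    "antipattern*",
    "anti-pattern*",
]


def quoted_or(terms):
    """Quote each term and join the terms with OR."""
    return " OR ".join(f'"{term}"' for term in terms)


def build_query(title_terms, guidance_terms):
    """Combine a title restriction with a title/abstract/keyword restriction."""
    return (
        f"(TITLE ({quoted_or(title_terms)}) "
        f"AND TITLE-ABS-KEY ({quoted_or(guidance_terms)}))"
    )


print(build_query(TITLE_TERMS, GUIDANCE_TERMS))
```

Keeping the two lists separate mirrors the two purposes named in the text: the first restricts the results to the TDD domain, and the second restricts them to works that offer guidance.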

By using this search term, 22 publications were identified (December 22, 2021) for further consideration. Subsequently, to determine those that are actually relevant to this work’s topic, a multi-step filter process was conducted. For this, the inclusion and exclusion criteria listed in Table 1 were applied.

Table 1 The inclusion and exclusion criteria

In the first phase of the filter process, the respective titles were read, and if it became apparent that a paper did not fit the scope, it was excluded. This reduced the number of contributions to 18. Afterward, for those that remained, the abstract and keywords were considered to narrow down the list even further. Again, if a paper was deemed clearly not fitting, it was excluded, while in cases of uncertainty, it was carried over to the next phase. This resulted in a set of eleven contributions. For these, as the next step in the filter process, the introduction and conclusion were read. Subsequently, nine articles remained. Ultimately, as the last step, these were skimmed in full, leading to a final set of five relevant publications, which are listed in Table 2.
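The number of publications excluded in each phase follows directly from the counts reported above. The short Python sketch below performs this arithmetic; the phase labels are shorthand introduced here for illustration.

```python
# Number of publications remaining after each phase, as reported in the text.
REMAINING = [
    ("initial search", 22),
    ("title screening", 18),
    ("abstract and keywords", 11),
    ("introduction and conclusion", 9),
    ("skimming the whole text", 5),
]


def excluded_per_phase(remaining):
    """Return (phase, number excluded in that phase) for each filter phase."""
    return [
        (name, before - after)
        for (_, before), (name, after) in zip(remaining, remaining[1:])
    ]


for phase, excluded in excluded_per_phase(REMAINING):
    print(f"{phase}: {excluded} excluded")
```

In total, 17 of the 22 initially identified publications were excluded, with the abstract and keyword phase removing the most (seven).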

Table 2 The final set of literature

4 Findings

In [38], which is the earliest paper considered in this review, a technique for the application of TDD to the development of graphical user interfaces (GUIs) is presented. It is based on the model-view-presenter pattern, which is a variation of the common model-view-controller pattern. The approach focuses on a strong decoupling of the data handling and application logic behind a GUI from its visual presentation. This, in turn, allows small changes to the GUI (e.g., placement of buttons, grouping of objects, display style for indicators) to be implemented without affecting the business logic, granting more flexibility and less implementation risk. At the same time, mock objects can be used for the testing, decreasing the complexity. Additionally, by starting with the implementation of the presenter, the requirements of the customer can be gathered and solidified, which avoids developing the model without comprehensive knowledge of what is actually needed. This also helps to avoid getting lost in rather insignificant visual details or fattening the view component with business logic, which would also increase the difficulty of testing. Overall, this paper presents a strong case for the decoupling of components in a TDD setting, while also proposing a technique for its utilization when it comes to the development of GUIs. Furthermore, to save time and avoid complications, it highlights the importance of structuring the developers’ work in a way that the requirements can be gathered before starting with the implementation of the actual logic.
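The decoupling described above can be sketched in Python as follows. This is not the implementation from [38]; the temperature example, the class names, and the view method `show_temperature` are assumptions introduced only to show how a presenter can be tested against a mocked view without any real GUI.

```python
import unittest
from unittest.mock import Mock


class TemperatureModel:
    """Holds the data; knows nothing about its presentation."""

    def __init__(self, celsius):
        self.celsius = celsius


class TemperaturePresenter:
    """Contains the application logic and pushes results to the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def refresh(self):
        fahrenheit = self.model.celsius * 9 / 5 + 32
        # The view only has to display a ready-made string.
        self.view.show_temperature(f"{fahrenheit:.1f} F")


class TemperaturePresenterTest(unittest.TestCase):
    def test_refresh_pushes_formatted_value_to_view(self):
        view = Mock()  # stands in for the real GUI widget
        presenter = TemperaturePresenter(TemperatureModel(celsius=20), view)

        presenter.refresh()

        view.show_temperature.assert_called_once_with("68.0 F")
```

Because the view is reduced to a single display method, its layout can change freely, while the conversion and formatting logic remains covered by a fast test that needs no GUI toolkit.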

During the same year, an analysis of two TDD endeavors was published [39], focusing on how such projects should be conducted. In this work, the importance of commitment to the approach and, if possible, guidance by experienced TDD practitioners was highlighted. Moreover, it was stressed that TDD is not an end in itself; therefore, the tests should be directed at parts that might actually break. This means that writing tests for every single component of the program, like “getter” or “setter” methods, is a waste of time that should be avoided. However, while unit tests are the backbone of TDD, it is also necessary to create integration tests to verify the interplay of those units. Accordingly, these two types of tests can be considered the minimum. Generally, the application of different types of automated tests is recommended to increase the coverage of the validation. Further, every test should bring business value. With respect to the costs, it is necessary to find a balance between manual tests and test automation, with the former also being necessary to assure the correct look and feel of the system and helping to increase its usability. In addition, the reusability of test components is emphasized as a way of reducing the effort and complexity of test creation. Again, the use of mock objects for testing purposes is highlighted as a way of using resources more efficiently. Besides the quality assurance aspect, it is also highlighted that TDD helped in accurately determining the application’s requirements based on the customer’s test data, emphasizing its influence on the software’s design.
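The distinction between unit and integration tests drawn above can be illustrated with a small Python sketch. The order and discount example is hypothetical and not taken from [39]; it only shows one test class aimed at logic that can realistically break and one that checks the interplay of two units.

```python
import unittest


def apply_discount(total, percent):
    """Business rule that can actually break: validation and rounding."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (100 - percent) / 100, 2)


class Order:
    def __init__(self):
        self.items = []

    def add(self, price, quantity=1):
        self.items.append((price, quantity))

    def total(self, discount_percent=0):
        subtotal = sum(price * quantity for price, quantity in self.items)
        return apply_discount(subtotal, discount_percent)


# Unit tests: one unit in isolation, focused on its risky behavior.
class ApplyDiscountUnitTest(unittest.TestCase):
    def test_ten_percent_discount(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_discount_above_100(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


# Integration test: verifies that Order and apply_discount work together.
class OrderIntegrationTest(unittest.TestCase):
    def test_total_combines_items_and_discount(self):
        order = Order()
        order.add(50.0, quantity=2)
        order.add(25.0)
        self.assertEqual(order.total(discount_percent=20), 100.0)
```

Note that no test is written for the trivial `items` attribute itself, in line with the recommendation to avoid tests that bring no business value.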

While [40] is more focused on exploring the consequences of applying TDD, it also provides some insights on its actual implementation and comprises experience reports of actual TDD projects. Besides the need for test automation, it especially highlights the necessity of support by domain experts and customers to determine the relevant user stories and requirements as well as to perform customer-centered acceptance tests. For the described projects, the user stories that the acceptance tests were based upon were required to stand alone, be discrete for implementation, be usable as a function, provide business value, and be testable. Further, the stories were heavily limited in scale to increase comprehensibility, manageability, and testability. Besides assuring the quality, the tests built for these user stories were also used to define and refine the requirements of the developed components, since big upfront design is prone to false assumptions and might be hard to implement, whereas a smaller-scale and more direct task (in the form of developing the tests) can help avoid those pitfalls. Consequently, in TDD, the tests are not just tests but become part of the system’s specification as well. However, the testing aspect is also not to be neglected. The developed tests should be actively utilized for frequent regression tests, and in case of failures, the corresponding issues must be fixed immediately. Furthermore, the work highlights the combination of TDD with other techniques, such as pair programming, as well as the use of design patterns. However, it is also emphasized that these patterns have to be used for the right reasons and to provide a benefit, not as an end in themselves. Regarding the system’s architecture, it is stated that a certain architectural foundation is necessary, but that its specifics only emerge over time as the development progresses.

The journey of the test-driven development of a framework is outlined in [41]. There, the use of mocks is also endorsed. However, this is qualified by the recommendation to avoid mocking external components or APIs. During the described development, an external API was used. Since it was rather extensive, creating a complete mock would have been extremely time consuming. Instead, the author “introduced a new class into the framework that encapsulated the use of the external API and only mocked this new element instead of the complicated API” [41], which reduced the complexity and increased the decoupling of the developed framework from the external API’s details. As an additional design challenge, not everything should be mocked. Only those parts that should be externally exposed should also be mocked, which makes this decision an important part of the system’s design. Another point highlighted in the work is that big refactorings are to be expected in the development of a complex system, since the requirements might exceed the possibilities of the previous iteration. This should, however, not be considered a mistake of the previous design, but a natural part of its evolution. Finally, it is highlighted that the developer should consciously drive the design through the tests in the desired direction, since good design does not magically appear just by writing tests.
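The encapsulation technique quoted above can be sketched in Python as follows. This is not the code from [41]; the weather scenario, the class `WeatherGateway`, the client method `fetch`, the endpoint path, and the response layout are all hypothetical and serve only to show the idea of mocking a self-owned wrapper instead of the external API.

```python
import unittest
from unittest.mock import Mock


class WeatherGateway:
    """Thin wrapper owned by the project; the only place that knows
    about the (hypothetical) external API."""

    def __init__(self, external_client):
        self._client = external_client

    def current_temperature(self, city):
        # 'fetch', the path, and the response keys are assumptions.
        response = self._client.fetch("/weather", params={"city": city})
        return response["temperature"]


def clothing_advice(gateway, city):
    """Application logic that depends on the wrapper, not on the API."""
    return "coat" if gateway.current_temperature(city) < 10 else "t-shirt"


class ClothingAdviceTest(unittest.TestCase):
    def test_cold_city_needs_coat(self):
        # Only the small wrapper is mocked, not the extensive external API.
        gateway = Mock(spec=WeatherGateway)
        gateway.current_temperature.return_value = 3

        self.assertEqual(clothing_advice(gateway, "Oslo"), "coat")
        gateway.current_temperature.assert_called_once_with("Oslo")
```

The mock needs a single method, regardless of how large the external API is, and a change in that API would affect only `WeatherGateway` rather than the tests of the application logic.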

The most extensive consideration of generally applicable guidelines for TDD in the studied literature set is given in [42]. It is also the only book chapter. There, it is emphasized that the scope of a TDD session as well as the tests should be clearly defined before its start and that test design should be cohesive, meaning that similar components should be tested in a similar fashion and that related tests should be grouped. Further, test routines should be reused when possible. Moreover, the significance of mock objects is once more highlighted, while it is also pointed out that they should only be used for external dependencies and not for (class-) internal ones. Another important point is to focus on fulfilling the current requirements, instead of trying to predict and preempt potential future necessities. Consequently, large refactorings are part of the design process and not an indicator of previous failures. It is also stated that TDD can (and probably should) be combined with other design techniques that focus on different aspects, such as domain-specific design. To solve design problems, the use of patterns should be considered. Additionally, since the big picture of architectural design is out of the scope of TDD, the use of reference architectures or otherwise predetermined structures is recommended.
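The recommendations on cohesive test design and reusable test routines can be illustrated with a brief Python sketch. The `slugify` function and the helper `assert_slug` are hypothetical and not taken from [42]; the point is only that related tests are grouped in one class and share one checking routine.

```python
import unittest


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated identifier."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    """All tests for one unit are grouped and follow the same pattern."""

    def assert_slug(self, title, expected):
        # Shared test routine, reused by every test in this group.
        self.assertEqual(slugify(title), expected)

    def test_lowercases_letters(self):
        self.assert_slug("Hello", "hello")

    def test_replaces_spaces_with_hyphens(self):
        self.assert_slug("Test Driven Development", "test-driven-development")

    def test_collapses_surrounding_and_repeated_whitespace(self):
        self.assert_slug("  a   b ", "a-b")
```

Each test covers only a current requirement of the unit; further cases would be added in the same style once a new requirement actually arises.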

Even though the number of papers that satisfied the review criteria was rather small and they all had slightly varying areas of focus, when aggregated, they yield valuable findings on how to conduct a TDD endeavor. To make those insights more accessible and thereby facilitate their utilization by practitioners as well as researchers, they are condensed, as the main contribution of this paper, into the set of guidelines that is depicted in Table 3.

Table 3 Guidelines for applying TDD in software engineering

5 Conclusion

By means of a structured literature review, the publication at hand has identified twenty guidelines for the application of the TDD approach for the development of software. Those are also the main contribution of this work. Practitioners can use them as guidance for their own projects, whereas researchers can build upon them when striving to advance software testing in general or TDD in particular. Overall, those guidelines are rather extensive and cover many aspects of the domain. However, despite Scopus, the database for the search, being the largest of its kind, it has to be noted that the found publications were still rather old, and that their number was quite low. Therefore, future work that aims to validate or amend the guidelines should pursue two directions: broadening the search and conducting additional practical experiments.