1 Introduction

Code smell detection can be defined as the task of identifying potential code or design problems in a system [1,2,3,4,5]. Code smells arise from programming and design mistakes made by software developers during the software design and programming stages [2]. They can also occur for other reasons, such as incorrect analysis, incorrect integration of new models into the system, ignoring software development principles, and writing code in an overly complex way [1, 2]. These smells may have a negative impact on the overall quality of the system, such as its maintainability and understandability [6,7,8,9,10]. Therefore, the problem of code smells has motivated many researchers to propose different methods to deal with their occurrence in systems. Refactoring is proposed to alleviate and overcome code-smell-related issues; it promotes higher quality, better performance, lower cost, reusability, and easier implementation and development of software [1, 11].

Undertaking the code smell detection process manually is considered to be subjective, and most manual techniques depend mainly on object-oriented metrics, which can yield varying outcomes [7]. Therefore, automated tools have been proposed [12,13,14,15,16,17,18,19,20]. Nowadays, machine learning techniques are utilized to address code smell issues with promising results. A machine learning classifier first needs to be trained using a set of code smell examples to generate a model. The generated model is then used to identify or detect code smells in unseen or new instances. The performance of the generated model depends on various factors related to the dataset, the machine learning classifier, the parameters of the classifier itself, etc.
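As a minimal sketch of this train-then-detect workflow (not drawn from any specific primary study), the example below fits a classifier on hypothetical metric vectors labeled as smelly or non-smelly and then applies it to an unseen class; the metric names, values, and the choice of a decision tree are assumptions made purely for illustration.

```python
# Minimal sketch of the train-then-detect workflow; metric names and values are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Each row describes one class of a system by three software metrics:
# lines of code, weighted methods per class, coupling between objects.
X_train = [
    [1200, 55, 18],   # large, complex, highly coupled class
    [  90,  4,  2],   # small, simple class
    [ 950, 40, 15],
    [ 120,  6,  3],
]
y_train = [1, 0, 1, 0]  # 1 = smelly (e.g., God Class), 0 = non-smelly

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)       # train on labeled code smell examples

X_new = [[1100, 48, 20]]          # an unseen class from a new system
print(model.predict(X_new))       # [1] -> predicted as smelly
```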

A number of systematic literature reviews (SLRs) have been conducted in the area of bad smells. Zhang et al. [21] conducted an SLR on refactoring and code smells and reviewed papers published during the period 2000–2009. They observed that most of the reviewed studies used a small number of code smells, and some of these smells were used by the participants in an inappropriate way (for example, message chains). Likewise, Singh and Kaur [22] performed an SLR by reviewing 238 papers. They concentrated on the methods used to detect code smells as well as the tools used for refactoring these code smells. Sharma and Spinellis [23] conducted another SLR that presented the existing knowledge associated with code smells, identified the challenges, and investigated the definitions of code smells, the reasons for their occurrence, their impact, and the available detection tools. Santos et al. [3] conducted an SLR to synthesize the existing knowledge on code smells. They concentrated on empirical studies that investigated how code smells affect software development. Mariani and Vergilio [24] presented an SLR of the existing research that motivates or applies search-based methods in the software refactoring activity; their SLR reviewed 71 primary studies. Several mechanisms have been introduced to detect code smells. Rasool and Arshad [25] reviewed the existing tools used to detect code smells and reported their associated challenges. Furthermore, Fernandes et al. [26] conducted a comparison between 84 smell detection tools. Fontana et al. [27] conducted a literature review focused on code smells and automatic tools; they identified seven code smell detection tools and evaluated four of them in terms of their detection results. Garcia et al. [28] presented code smells that frequently recur in software design and can have subtle but significant detrimental impacts on system lifecycle properties. A recent and related SLR was undertaken by Azeem et al. [29], who conducted a study to provide an overview of the use of machine learning approaches for code smell detection. They identified 15 primary studies that used machine learning approaches. Their study focused on four issues: (1) the code smells considered, (2) the setup of the approach, (3) the design of the evaluation strategies, and (4) an analysis of the performance. In this paper, however, we further provide a detailed analysis of the datasets used in code smell detection studies. Furthermore, we investigate the tools used to detect code smells, the tools used to extract and compute software metrics (features), and the tools used to implement the machine learning techniques applied to detect code smells. Moreover, we investigate the use of feature selection techniques and compare stand-alone and ensemble-based machine learning techniques.

Another recent SLR was developed by Caram et al. [30]; the authors conducted a systematic mapping study of the use of machine learning techniques for code smell identification, in which twenty-five primary studies were identified. Their SLR concentrated on studying code smell detection using machine learning techniques from different perspectives: (1) the detected code smells, (2) the machine learning techniques used, (3) the most used machine learning techniques for each code smell, and (4) the performance of each of these techniques on the code smells. These perspectives are also covered in our study; in addition, we present an extensive analysis of the datasets used in the literature from different viewpoints: (1) the size of each dataset (i.e., the number of included systems), (2) the dependent variables, (3) the independent variables, (4) the tools used to compute or extract the metrics (independent variables), (5) the tools used to assign the dependent variables (smelly or not smelly), and (6) a descriptive analysis of each available dataset (i.e., the number of features, instances, and smelly and non-smelly instances in each dataset). Further, we explore the setup of the techniques used, such as the evaluation metrics, the validation methods, and the classification type (i.e., binary or multi-class classification). Furthermore, we investigate the use of ensemble machine learning techniques and compare them with stand-alone machine learning techniques. Moreover, we show the tools used to implement the machine learning techniques. In addition, we investigate the use of feature selection techniques.

In this study, our main objective is to systematically review the studies carried out to detect code smells using machine learning algorithms from different perspectives. To achieve the study objectives, we carried out an SLR following the general guidelines defined by Kitchenham and Charters [31]. First, a wide literature search was conducted in five online databases to identify the relevant studies. Then, a set of inclusion and exclusion criteria and quality assessments were devised to obtain the primary studies. The selected studies were then analyzed, classified, and compared using our defined criteria including types of machine learning algorithms, prediction accuracy, detected code smells and datasets, resulting in the selection of 17 primary studies.

The results of this study provide knowledge for both practitioners and researchers about the most frequently detected code smells and the machine learning techniques used to detect them. The results also provide information about the accuracy measures used in the experimental studies, which practitioners and researchers can use for comparison with existing studies. The study also provides an analysis of the tools used for code smell detection and correction. Furthermore, the details of the datasets used in bad smell detection and correction studies are also reported.

The main contributions of this study are:

  1. The identification of 17 primary studies that use machine learning techniques to detect code smells.

  2. Different analyses of these primary studies to provide knowledge about: (1) the applied techniques, (2) the detected code smells, (3) the accuracy measures, (4) the used datasets, and (5) the most commonly used tools.

  3. Recommendations that can be used for future research.

The rest of this study is organized as follows. A background of bad smells and machine learning techniques is presented in Sect. 2. The research methodology used is discussed in Sect. 3. Section 4 presents the results. Section 5 discusses the main findings. In Sect. 6, we identify the potential threats to the validity of our study, while Sect. 7 presents the conclusion.

2 Background

In this section, we provide a brief background of code smells and machine learning techniques.

2.1 Code Bad Smells

Code smells indicate potential code or design problems in a system [1, 2]. Code smells, also known as design flaws, refer to design situations that negatively influence the maintainability of the software [27]; therefore, they may impact the maintenance processes [32]. A number of code smells were presented in Fowler’s book [33]. Fowler also suggested guidelines to eliminate these smells from the system [25].

It is helpful to identify a bad smell as early as possible in the development lifecycle [32, 34]. Detecting bad smells in the code or in the design and then performing the appropriate refactoring procedures when necessary is very useful for enhancing the quality of the code. These smells make the system more difficult to maintain and probably also increase its fault proneness [35]. Bad smells are unlikely to result in failure directly but might do so indirectly, which still negatively impacts software quality [36].

Code metrics are used by different code smell detection tools to detect code smells. These tools either compute the metrics from the code itself or use metrics extracted by third-party tools [7]. The bad smell detection process can be either manual or automatic, using detection strategies. Bad smells can be detected in the source code or in the system design.
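To make the notion of a metrics-based detection strategy concrete, the sketch below flags a God Class candidate when a few metric thresholds are exceeded; the metric names and threshold values are illustrative assumptions and are not prescribed by any particular tool or study.

```python
# Hypothetical metrics-based detection strategy for a God Class candidate.
# Metric names (LOC, WMC, ATFD, TCC) and thresholds are illustrative only.
def looks_like_god_class(loc: int, wmc: int, atfd: int, tcc: float) -> bool:
    """Flag a class that is large and complex, accesses many foreign
    attributes, and has low internal cohesion."""
    return loc > 500 and wmc > 47 and atfd > 5 and tcc < 0.33

print(looks_like_god_class(loc=820, wmc=60, atfd=9, tcc=0.10))  # True
print(looks_like_god_class(loc=120, wmc=10, atfd=1, tcc=0.80))  # False
```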

2.1.1 Code smell categories

There are several categories of bad smells, and each category contains several types. In our study, we selected the categorization proposed in [33, 37,38,39], as it includes the most common code smell types. The code smells within each category are closely related; consequently, we consider that this taxonomy makes the smells more recognizable and understandable. Figure 1 details these categories.

Fig. 1 Code smell categories

  • Bloaters In this category, the code or classes have expanded to such a large extent that they are difficult to work with. These smells do not manifest immediately; rather, they accumulate over time as the program evolves, particularly when no one endeavors to eliminate them. The first type of code smell in this category is the Long Method, which contains too many lines of code, making it difficult to reuse, change, and understand. The best solution for this smell is to divide the method into separate methods. The second type of code smell in this category is Large Class. This occurs when a single class attempts to do too much; it usually contains several instance variables and has various responsibilities. This smell makes the reusability and maintainability of the class more difficult. The best solution for this smell is to split the class by applying Extract Class. The third type of code smell in this category is Primitive Obsession. This smell occurs when primitive types are used in place of small objects for simple tasks, for example, special strings for phone numbers, ranges, and currency; in such situations, small classes should be used instead of primitives. The fourth type of code smell in this category is Long Parameter List. This smell occurs when a method has more than four parameters, making the parameter list difficult to understand, inconsistent, and hard to use (see the sketch after this list). The final type of code smell in this category is Data Clumps. Occasionally, various parts of the code contain identical groups of variables, for example, the parameters used to connect to a database. These clumps should be turned into their own classes.

  • Object-Orientation Abusers All the smells in this category involve the incomplete or incorrect application of object-oriented programming principles. The first type of bad smell in this category is Switch Statements. This smell appears when the code contains a sequence of if statements or a complex switch operator. The second type of bad smell in this category is Temporary Field. Usually, temporary fields are created for use in an algorithm that requires many parameters; therefore, instead of creating a large number of parameters, the programmer creates fields for these data in the class. These fields are used only by the algorithm and go unused the rest of the time. This kind of smell is difficult to discover, and removing it enhances code clarity and organization. The third type of bad smell in this category is Refused Bequest. This smell occurs when programmers create inheritance between two largely unrelated classes, and the subclass uses only a few of the methods and properties inherited from the superclass. The best way to treat this smell is to use delegation instead of inheritance. The fourth type of bad smell in this category is Alternative Classes with Different Interfaces. This code smell occurs when programmers create two classes with identical functionality but whose methods have different names.

  • Change Preventers These smells occur when you need to modify something in one place in your code and then have to make many modifications in other places too; therefore, the development process of the software becomes more complicated and costly. There are three types of bad smells in this category. The first is Divergent Change, which occurs when many kinds of changes are made to a single class. The best way to remove this smell is to split up the class’s behavior. For instance, where different classes have the same behavior, the classes should be combined through inheritance, which improves the organization of the code and reduces code duplication. The second type of bad smell in this category is Shotgun Surgery. This occurs when a single change requires modifying multiple classes simultaneously. This smell occurs because one responsibility has been split up among a large number of classes. The best way to remove this smell is to move the existing class behaviors into a single class, which improves the organization of the code, reduces code duplication, and makes it easier to maintain. The third type of bad smell in this category is Parallel Inheritance Hierarchies. This smell occurs when you create a subclass for a class and then discover that you need to create a subclass for another class.

  • Dispensables These smells occur when part of the code is not needed; were it to be removed, the code would be cleaner, more efficient, and easier to understand. There are six types of bad smells in this category. The first type of bad smell in this category is Comments. This smell occurs when the program is filled with explanatory comments. The second type of bad smell in this category is Duplicate Code. This smell occurs when the same or very similar code appears in several parts of the program, making the program code large. This bad smell can be removed by creating a new method that encapsulates the duplicated code. The third type of bad smell in this category is Lazy Class, which is a useless class; every class that is built takes effort and time to understand and maintain. The best way to remove this smell is to eliminate such classes. The fourth type of bad smell in this category is Data Class. This is a class that contains only fields and rarely any logic; it has getter and setter methods for its fields. The fifth type of bad smell in this category is Dead Code. This occurs when code is never executed. The sixth type of bad smell in this category is Speculative Generality. This occurs when there is an unused parameter, field, method, or class. This bad smell occurs because code is sometimes created to support anticipated future features that are never actually implemented; consequently, the code becomes difficult to understand and support.

  • Couplers All the smells in this category contribute to excessive coupling between classes or show what happens if coupling is replaced by excessive delegation. There are five types of bad smells in this category. The first type of bad smell in this category is Feature Envy. This bad smell occurs when a method accesses the data of another object more than its own data (illustrated in the sketch below). It generally occurs when fields are moved to a Data Class; if this happens, the operations on the data should be moved to this class as well. The second type of bad smell in this category is Inappropriate Intimacy. This occurs when one class uses the internal methods and fields of another class to do its work. The third type of bad smell in this category is Message Chains. This occurs when a client requests another object, that object requests yet another object, and so on. The fourth type of bad smell in this category is Middle Man. This occurs when a class performs only one action and delegates the work to another class, so there is little point in its existence. This smell can be the result of the overzealous elimination of Message Chains. The fifth type of bad smell in this category is Incomplete Library Class. This occurs when a library no longer meets user requirements.
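As a minimal, hedged illustration of two of the smells above (Long Parameter List with Data Clumps, and Feature Envy), the Python sketch below uses hypothetical class and function names invented purely for this example; the reviewed studies themselves target Java and other languages.

```python
from dataclasses import dataclass

# Bloater: Long Parameter List -- more than four parameters, hard to read and call.
def create_account(first_name, last_name, street, city, postal_code, phone):
    ...

# A common refactoring introduces a parameter object, which also removes the
# Data Clumps smell when the same group of fields recurs elsewhere.
@dataclass
class ContactDetails:
    first_name: str
    last_name: str
    street: str
    city: str
    postal_code: str
    phone: str

def create_account_refactored(details: ContactDetails):
    ...

# Coupler: Feature Envy -- a method that works mostly with another object's data.
class Order:
    def __init__(self, unit_price: float, quantity: int, discount: float):
        self.unit_price = unit_price
        self.quantity = quantity
        self.discount = discount

class Invoice:
    def total(self, order: Order) -> float:
        # This method "envies" Order: it uses only Order's fields, so the
        # computation should be moved into Order itself.
        return order.unit_price * order.quantity * (1 - order.discount)
```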

2.2 Machine learning

Machine learning is a discipline in which computer systems are able to learn and perform their work even if they were not explicitly programmed [40]. The learning paradigms most commonly used in the software quality prediction literature are supervised learning, unsupervised learning, and reinforcement learning. This subsection describes the investigated classification models; a small sketch instantiating several of them follows the list.

  • Multilayer Perceptron (MLP) [41, 42] This is an artificial neural network (ANN) model that consists of an input layer, at least one hidden layer, and an output layer. Every node is a neuron that utilizes a nonlinear activation function and is connected to the nodes in the next layer with specific weights. MLP typically uses the backpropagation technique for training.

  • Support Vector Machines (SVMs) [43, 44] These are supervised learning models with associated learning algorithms that can be used for regression or classification. SVM was defined by Vapnik [45] based on the principle of structural risk minimization. The main objective of SVM is empirical error minimization and geometric margin maximization. Commonly, the tolerance for deviations, the complexity parameter C, and the kernel are the parameters used to define the SVM model.

  • Radial Basis Function Networks (RBFs) [46, 47] This is a type of neural network with three layers: an input layer, a hidden layer, and a linear output layer. Three common types of radial basis functions are multiquadric, polyharmonic spline, and Gaussian. RBF networks are used for classification, function approximation, and system control.

  • Bayesian Belief Networks (BBNs) [48] This is a convenient graphical model for representing a collection of variables and their probabilistic independencies. In this model, a random variable is represented by a node in the graph, whereas the probabilistic dependencies among the corresponding random variables are represented by the edges connecting the nodes.

  • Naive Bayes (NB) [47, 49] This is a supervised learning algorithm that applies Bayes’ theorem with the “naive” assumption of conditional independence between each pair of attributes.

  • Random Forests (RF) is a supervised learning algorithm that builds many unpruned classification or regression trees, where every tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. RF can be employed for both classification and regression problems [50, 51].

  • Linear Regression (LR) is a modeling method applied to find the relationship between the target variable and the independent variables in the dataset by utilizing linear predictor functions [52,53,54].

  • Multinomial Naive Bayes (MNB) This is a version of NB introduced for text classification. In MNB, the data samples are assumed to follow a multinomial distribution [49, 55].

  • Decision Tree (DT) [56] This is one of the most successful supervised learning methods for regression and classification. The C4.5 algorithm is the most commonly used technique to generate decision trees [57].
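As a hedged sketch (not the setup of any particular primary study), the following snippet instantiates several of the classifiers described above with scikit-learn and compares them on a synthetic stand-in for a code smell dataset; all hyperparameters are library defaults chosen for illustration.

```python
# Instantiate several of the classifiers above and compare them on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a code smell dataset: 20 metric features, binary label.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "SVM": SVC(C=1.0, kernel="rbf", random_state=0),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```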

3 Research Methodology

The main objective of this study is to identify and analyze all relevant studies that use machine learning to detect code smells. As previously mentioned, we followed the SLR guidelines suggested by Kitchenham and Charters [31]. An SLR is a well-defined and systematic way of finding, assessing, and analyzing published primary research [58,59,60,61]. The SLR gives a strong basis on which to make claims on research questions, but it needs considerably more effort than a traditional literature review [62]. The SLR process involves six phases, followed in sequence as shown in Fig. 2:

Fig. 2 Systematic literature review process phases

  • Research questions

  • Search strategy design

  • Study selection

  • Quality assessment

  • Data extraction

  • Data synthesis

In the first phase, a number of research questions based on the goal of the current SLR are defined. Then, a search strategy is built to find all the studies relevant to the research questions. Next, the search string, the digital libraries, and the inclusion and exclusion criteria are identified. In the fourth phase, a set of quality assessment criteria is identified and applied to the selected papers. In the data extraction phase, data extraction cards are created and used to obtain data from the selected papers. Finally, in the data synthesis phase, appropriate methodologies are defined to synthesize the extracted data.

3.1 Research questions

Constructing the research questions is a significant step in the SLR process. Five research questions are defined to achieve our research objective:

  • RQ1: Which machine learning techniques have been applied to detect code smells?

The objective of RQ1 is to identify the machine learning techniques that have been applied to detect code smells. Researchers can use the outcomes of this question to identify the most applied machine learning techniques for code smells detection and to investigate the possibility of implementing unused techniques.

  • RQ2: Which code smells are most commonly detected using machine learning techniques?

Several studies have been conducted to address the issue of code smells, so our objective in relation to this question is to identify the code smells that have been detected using machine learning and why researchers have chosen these code smells. The findings of this question can be used to determine the code smells that have not yet been investigated or have received less attention in current studies; hence, researchers can address them in future work.

  • RQ3: What are the accuracy measures of the machine learning techniques that have been used for code smell detection?

To answer this question, we identify the performance metrics used to evaluate the machine learning techniques in terms of the detection of code smells. Then, based on these performance metrics, we identify the accuracy of the machine learning techniques that have been used for code smell detection. Next, we compare the machine learning techniques that have been used to detect code smells in order to find the most efficient one. The findings of this question can be useful for identifying the accuracy measures used to evaluate the performance of the applied machine learning techniques. Such knowledge enables researchers to use the most appropriate accuracy measure in their studies.

  • RQ4: What datasets have been used for code smell detection?

The objective of this question is to investigate the attributes of these datasets, such as the dataset name, size (the number of systems in each dataset and the size of each system), type (commercial, student, open source), availability (available online or not), the language of the selected systems, the inputs of the datasets, the tools used to obtain the values of the dataset inputs, and an analysis of the available datasets. Researchers can use the outcomes of this question to build their own datasets appropriately.

  • RQ5: What are the most commonly used machine learning tools in the context of bad smell detection?

Many tools are used to implement machine learning algorithms. To answer this question, we investigate the tools that are used to implement machine learning algorithms and explain why they have been selected. The findings of this question can be useful to identify the most used tool; thus, researchers can select the most suitable tool for their needs.

3.2 Search strategy

This involves three phases: search terms, online databases, and the search process. These are described in the following subsections.

3.2.1 Search Terms

The common strategies applied to construct the search terms are described in this subsection as follows:

  1. Obtaining the main terms from the research questions.

  2. Finding alternate synonyms and spellings for the main terms.

  3. Verifying the above steps by matching the keywords from relevant research papers.

  4. Using the Boolean operator “OR” to link the alternative synonyms and spellings, and “AND” to link the major terms.

The search terms are built based on population, intervention, outcome, and experimental design.

  • Population Code smells.

  • Intervention The existing machine learning techniques for detecting code smells.

  • Outcomes Improve software quality.

  • Experimental Design Empirical studies, case studies, and experimental studies.

After applying the previous steps along with several test runs, the following complete search string is employed in our research.

((((Code OR Bad) AND Smell*) OR Antipatterns OR Refactoring) AND (Detect* OR Predict* OR Estimat* OR Forecast*) AND ((“Machine learning” AND (Model OR Technique OR Algorithm OR Method OR approach)) OR “artificial intelligence” OR “Ensemble learning”)).

3.2.2 Research Resources

Five online databases are used to find relevant conference and journal papers using our defined search terms. Table 1 presents these online databases. These databases were selected as they are popular venues for publishing papers on machine learning and bad smell detection. Other researchers have also used these databases in their SLR studies [3, 63,64,65].

Table 1 Online databases

The search terms are modified to be compatible with each online database, since each has its own search engine syntax and the search grammars differ from one database to another. No start date restriction was applied to the search, which covered studies published up to December 2018.

In order to collect relevant material while avoiding bias, a broad range of research databases is considered, covering both journal papers and conference papers.

3.2.3 Search Process

In general, the SLR process involves a comprehensive search of all the studies that fit the selection criteria. The search process comprises the following two phases:

  • Phase 1 The five online databases are searched separately using the constructed search string. The identified studies are then collected to establish a set of candidate studies. Table 2 presents the results of this phase.

    Table 2 Number of collected studies
  • Phase 2 For each relevant study, the reference lists are scanned to find other relevant studies to be included into our candidate list.

Many references are collected during the whole SLR process, and EndNote is used to keep a record of all of them. In total, 1429 candidate studies are retrieved.

3.3 Study Selection

The inclusion and exclusion criteria are defined in this subsection. A total of 1429 candidate studies are retrieved, as described in Table 2 and Fig. 3. Most of them do not provide valuable information relevant to the defined research questions; therefore, further filtering is required to identify the relevant studies. The study selection process is illustrated in Fig. 3. The selection process is performed by one researcher, and two other researchers verify the selection at random. The selection process is carried out in two phases:

Fig. 3 Selection process phase

  • Initial Selection Studies found during the initial search are assessed for relevance. This is done by analyzing the title and abstract as recommended by [66] and by applying the inclusion and exclusion criteria (defined below) to identify the relevant studies, which provide information to answer the research questions.

  • Final Selection Studies found in the initial selection underwent further analysis (i.e., reading full text), and the quality assessment criteria were applied. This is done so that no important information or data with respect to the research question are missed. These studies will be eventually used for data extraction.

3.3.1 Inclusion Criteria

For a paper to be included in the SLR, it needs to meet various inclusion criteria.

  • Studies that propose and discuss the use of machine learning techniques to detect code smells.

  • Studies that motivate and describe the benefits of using machine learning techniques to detect code smells.

  • Studies that provide an empirical basis for their findings.

  • In the case of duplicate studies, only the most complete and recent will be selected.

  • Papers which have been published in a journal or in conference proceedings.

3.3.2 Exclusion Criteria

Several exclusion criteria were established to make certain studies ineligible for inclusion in the SLR.

  • Studies that are not applicable to the research questions.

  • Studies that do not use or propose machine learning techniques to detect code smells.

  • Studies that are not written in the English language.

  • Publications that do not have an empirical analysis or findings from applying machine learning models to detect code smells.

As a result, 56 relevant studies were obtained, as described in Table 2. Next, we scanned the references in these relevant studies, but we did not find any additional relevant studies.

3.4 Study Quality Assessments

After applying the selection criteria, we defined the quality assessment criteria for the selected papers according to the objectives of this research. These quality assessments are used to weight the selected studies. In this subsection, we list the quality assessment questions used for the quantitative assessment of the quality of the selected papers, where Yes = 1, No = 0, and Partly = 0.5. The quality assessment questions are listed in Table 3. The final score is computed by summing the values assigned to each question; the higher the score, the higher the quality of the study. In our research, we selected only those papers with a quality score greater than 4.5, using the same threshold (50%) as [67]. Two researchers conducted the quality assessment of the selected studies separately; any disagreement between them was discussed and resolved. Table 4 shows the quality scores of the selected studies. Seventeen studies were identified as the final papers for the data extraction process. These 17 studies are described in Table 5.

Table 3 Quality assessment questions
Table 4 Quality scores of selected studies
Table 5 Selected primary studies
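As a small, hedged illustration of this scoring scheme, the snippet below sums hypothetical answers to the quality assessment questions and applies the 4.5 threshold; the answers shown are invented for the example and do not correspond to any particular study.

```python
# Hypothetical quality assessment scoring: Yes = 1, No = 0, Partly = 0.5.
SCORES = {"yes": 1.0, "no": 0.0, "partly": 0.5}

answers = ["yes", "yes", "partly", "yes", "no", "yes", "partly", "yes", "yes"]  # invented answers

score = sum(SCORES[a] for a in answers)
print(score)          # 7.0
print(score > 4.5)    # True -> the study would pass the quality threshold
```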

3.5 Data Extraction

The selected papers are used to gather data that address our research questions. In order to make this process easier, cards are created with the form described in Table 6. This form is refined by pilot data extraction with many of the selected papers. To make the process of data synthesis as easy as possible, we group the items based on the association between the research questions, as shown in Table 6.

Table 6 Data extraction card

Table 7 demonstrates that the data extracted to address RQ3, RQ4, and RQ5 are associated with the experiments conducted in the chosen papers. Generally, in the software engineering field, the word experiment has diverse meanings. Consequently, to avoid confusion, we explicitly define the word “experiment” as a process in which an algorithm or a model is assessed with suitable performance metrics depending on a specific dataset.

Table 7 Addressed questions

The data extraction cards were employed to obtain data from the chosen papers. Three researchers undertook this process. One extracted the data and then inserted these data into the cards, while the others examined the collected data. If there was disagreement between the researchers about the results, a discussion was conducted to resolve the conflict. Finally, the examined extracted data are documented in a file, to be utilized in the next process.

As shown in Table 7, all our research questions were answered by the primary studies, except that five studies (S5, S11, S14, S16, and S17) did not address RQ5 because they did not mention the tools used to implement the machine learning techniques. To facilitate tracking of the extracted data, we marked every paper with an ID; Table 7 presents more detail.

3.6 Data Synthesis

Data synthesis aims at aggregating the extracted data from the selected papers to answer our research questions. In this study, the extracted data comprise qualitative data (e.g., a list of the employed machine learning techniques, the size of each dataset, and a list of the detected code smells) and quantitative data (e.g., the values of the prediction accuracy). As only 17 papers were selected, we did not employ canonical meta-analysis [31] as the synthesis methodology, despite its solid theoretical background, because employing standard meta-analysis with this number of studies would be inappropriate [31].

To synthesize the extracted data relating to the various research questions, we adopted a narrative synthesis strategy for RQ1, RQ2, and RQ5 and tabulated the data in a style compatible with the questions.

4 Results

In this section, we present the main results of this review. First, a summary of the selected papers is presented. Then, we document the main findings of our work based on the research questions.

4.1 Overview of Selected Studies

The issue of detecting code smells [81] has drawn the attention of many researchers over recent years [26]. The research literature can be categorized into two groups: empirical studies and prediction models. Empirical studies conduct research with the intention of understanding how code smells grow [82,83,84,85], how they are perceived [86,87,88], and how they influence the quality attributes of source code [89,90,91]. Studies on prediction models depend on the analysis of structural information from the code itself [14, 15, 92]. More recent studies focus on the analysis of other sources of information [93, 94] or the utilization of search-based software engineering techniques [21, 36]. In the context of our research, we concentrate on machine learning techniques for detecting bad smells; therefore, we review studies that utilize machine learning techniques to predict bad smells.

In our study, we identified 17 primary studies (PSs), listed in Table 4, that relate to bad smell detection using machine learning techniques, published between 2005 and 2018.

As shown in Table 8, five of the PSs were published as journal articles, and 12 were published in conference proceedings. The distribution of the publication venues is presented in Tables 4 and 8. All the selected studies are experiment papers. The number of PSs published per year is shown in Fig. 4. It can be observed that the number of studies published between 2009 and 2012, and between 2015 and 2017 remained stable with two studies published per year, whereas in 2005, 2013, and 2018, one study was published each year. Furthermore, there is a slight increase in the number of published studies in this area in recent years, which indicates that research attention in this area is growing.

Table 8 Distribution of publication venues
Fig. 4 Number of PSs published per year

4.2 Types of machine learning techniques used (RQ1)

Eighteen types of machine learning algorithms were employed to detect code smells. Table 9 presents our findings. We found that 12 of our PSs applied a single machine learning technique for code smell detection, while the remaining 5 studies applied more than one machine learning technique. The table presents the machine learning techniques used in descending order of frequency. Study S10 used seven machine learning techniques, the highest number.

Table 9 Distribution of machine learning techniques in the selected studies

We observe that some of the applied machine learning techniques belong to the same family; for example, J48 and Decision Trees C5.0 are from the same decision tree family. As shown in Table 10, the decision tree family is the most commonly applied to detect code smells. In addition, we found that the authors of studies S1, S4, and S7 used the same machine learning techniques. The authors of S4 and S7 are the same, whereas the authors of study S1 replicated study S4.

Table 10 Machine learning family

The process of identifying the machine learning methods used to detect code smells is based on the general name of each machine learning technique. For example, the studies [7, 68] used SVM and sequential minimal optimization (SMO) with different kernels, and they also applied boosting to all the techniques used, as shown in Table 11. Since we identify only the general name of each technique, we count six machine learning techniques for these studies: SVM, J48, Naive Bayes, Random Forest, SMO, and JRip.

Table 11 The ML techniques used

SVM was the most commonly used technique, being applied in 6 of the 17 studies (35.29%), and the second most commonly used machine learning techniques were J48, Naive Bayes, Bayesian Belief Networks, and Random Forest, each applied in 4 of the 17 studies (23.53%). Table 9 shows the detailed information.

We can observe that SVM, J48, Naive Bayes, Random Forest, SMO, and JRip received the most attention in 2013, 2016, and 2018, while in 2017, only neural networks received attention. We further observe that the authors of the primary studies applied the most commonly used machine learning techniques.

In general, machine learning techniques are categorized into two types: supervised and unsupervised. Figure 5 shows this taxonomy [95, 96]. According to our findings, all the techniques applied in the field of code smell detection were supervised machine learning techniques. This finding was expected since supervised techniques are the most often applied in the literature [97, 98].

Fig. 5 Machine learning taxonomy [95, 96]

4.3 Detected Smells (RQ2)

Twenty-eight types of code smells were used in the PSs. Table 12 presents our findings. We found that 12 of the PSs detected more than one code smell, while the other five studies detected one code smell. The table lists the used code smells in descending order from the highest frequency to the lowest.

Table 12 Distribution of detected code smells over studies

We found that the authors of studies S1, S3, S4, and S7 used the same code smells and the same number of code smells. The authors of S4 and S7 are the same, whereas the authors of study S1 replicated study S4. S5 used the highest number of code smells (11).

We observe that the God Class was the most used bad smell, being used in 13 studies (76.47%). The second most used bad smell was the Long Method which was used in nine studies (52.94%). The Feature Envy code smell was used in seven studies (41.17%), and the Data Class was used in five studies (29.41%). The remaining code smells were used in a smaller number of studies ranging from one to three.

4.4 The Accuracy of Machine Learning Techniques (RQ3)

To answer this question, we explore the evaluation metrics used in the PSs to evaluate the performance of the machine learning techniques used in code smell detection. Then, we compare the machine learning techniques that were used in code smell detection to identify the techniques that achieved the highest detection performance.

Several evaluation metrics were used by the PSs to evaluate the performance of the machine learning techniques used in code smell detection. These evaluation metrics are presented in Table 13 in descending order according to their frequency in the PSs. We notice that the recall metric was the most frequently used, appearing in 9 of the 17 studies, followed by the accuracy metric, which was used in 8 of the 17 studies. Different configurations were used in the PSs to validate the learning mechanism; we present these mechanisms in Table 14. We observe that 7 studies (S1, S4, S5, S7, S10, S15, and S16) adopted the K-fold cross-validation method, which, in its stratified form, ensures that each fold includes the same proportions of the smelly and non-smelly classes. Our findings show that only 2 studies (S1 and S4) used the grid-search algorithm.

Table 13 Performance evaluation methods
Table 14 Validation of the learning mechanism
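As a hedged sketch of the validation setup several PSs adopted (stratified 10-fold cross-validation) combined with a grid search over classifier parameters, the snippet below uses scikit-learn on synthetic data; the parameter grid and scoring choice are assumptions for illustration, not the configurations reported by the studies.

```python
# Stratified 10-fold cross-validation combined with a grid search over SVM
# parameters; data, grid values, and scoring are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # keeps class proportions per fold
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="f1")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated F1:", round(search.best_score_, 3))
```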

The strategy which was followed to answer this question is presented in Tables 15 and 16. The main attributes of our evaluation are:

Table 15 Standalone machine learning accuracy comparison
Table 16 Standalone machine learning accuracy comparison continued
  • The machine learning algorithm used

  • The considered code smells

  • The study source that mentions the evaluation

  • A lower or a higher prediction (the sign of “+” means higher, while “−” is lower)

For example, in S4 [7], the J48 algorithm performs better than Naive Bayes at detecting the Data Class, Long Method, and Feature Envy code smells. However, this study found that the Naive Bayes algorithm is better than the J48 algorithm at predicting the God Class code smell. Therefore, the algorithm in the row of the table is better than the algorithm in the column if the sign is “+,” and the opposite holds if the sign is “−.”

For further clarification, Table 17 compares the machine learning techniques used for each considered code smell. The best machine learning algorithm for detecting the Data Class smell is J48, followed by JRip. For the Long Method smell, JRip is the best, followed by J48. Random Forest is the best at detecting the Feature Envy code smell, whereas Naive Bayes is the best at detecting God Class. In conclusion, the standalone machine learning algorithm that consistently outperforms the other standalone algorithms is J48, whereas the worst algorithm is VFI. Table 17 sorts these algorithms for each code smell based on the reported accuracy.

Table 17 Suitable machine learning technique to detect a specific code smell

The remaining studies applied a single standalone machine learning technique to detect each type of code smell. Table 18 depicts the evaluations conducted by these studies. In study S3 [70], the authors applied SVM to detect God Class, Data Class, Long Method, and Feature Envy. The results of this study show that SVM achieved the highest accuracy in detecting the God Class code smell. Additionally, S8 [73] applied SVM to detect four types of code smells, namely God Class, Spaghetti Code, Functional Decomposition, and Swiss Army Knife. Their results support S3 [70] in the case of detecting God Class. Consequently, the SVM algorithm provides good detection of the God Class code smell. In S13 [76], the Decision Tree Forest is more accurate at detecting God Class than Long Method.

Table 18 Standalone machine learning accuracy comparison

Some studies used ensemble techniques; therefore, we compare the performance of these techniques to find the one that performs best, following the same strategy used to compare the standalone machine learning techniques. Table 19 shows the studies that used boosting techniques.

Table 19 Application of boosting techniques

Three of the 17 selected studies used boosting techniques (S1 [68], S4 [7], and S7 [72]). They applied the same algorithms and detected the same types of code smells, and there is also a high similarity in the datasets used, especially between S1 [68] and S4 [7]. The authors of study S1 [68] replicated the experiment conducted in study S4 [7] and made some modifications to the datasets built in S4 [7].

Table 20 shows which strategy was applied in each of the studies. The main attributes of our evaluation are the name of the machine learning algorithm used, the detected code smell, the study source that mentioned the evaluation, and a lower or a higher detection (“+” indicates higher while “−” indicates lower).

Table 20 Comparison of accuracy of boosting techniques

Table 21 presents, for each code smell, the machine learning algorithms that performed better than the others at detecting it. Table 21 shows that the best machine learning algorithm for detecting the Data Class smell is B-J48 pruned. The second most effective algorithm is B-JRip, followed by B-Random Forest. B-J48 unpruned was the best at detecting the Long Method smell, followed by B-JRip. The B-J48 and B-JRip algorithms were the two most effective at detecting all code smells except God Class, for which B-Random Forest was the second most effective algorithm. We conclude that the family of J48 techniques and B-Random Forest are the top techniques across all types of code smells, while the SVM techniques are the worst. Furthermore, we note that, in some cases, applying boosting to a model does not enhance its performance and can in fact make it worse. For instance, in S4 [7], Naive Bayes without boosting was the best algorithm for predicting the God Class code smell, but after applying boosting, its performance declined; this was also the case when using Random Forest with boosting to detect God Class.

Table 21 Suitable machine learning algorithm to detect a specific Code smell

We observe that the highest performance among all PSs was obtained by B-J48 unpruned, which achieved an F-measure of 99.63% when used to detect the Long Method code smell, followed by B-J48 pruned, which achieved an F-measure of 99.26% when used to detect the Data Class code smell.

4.5 The Used Datasets (RQ4)

In this section, we analyze the datasets that were used for code smell detection and identify several attributes of these datasets, such as the dataset name, size, type (commercial, student, open source), availability, and language.

Table 22 presents general information on the datasets used in the 17 studies. Fourteen studies used open-source systems, while three studies (S10, S13, and S15) used industrial systems. All studies used systems written in Java except for S14 and S15, which used systems written in C and C#, respectively. Most studies used software metrics as the independent variables, as shown in Table 23, except for S14, which used textual features obtained through string tokenization. Only two of the 17 studies made their datasets available. Table 22 presents the number of systems used to build each dataset, the dataset input type, the dataset type, the language, and the number of metrics used.

Table 22 A summary of previous studies
Table 23 Software metrics used in previous studies

Table 24 shows the size of each dataset in terms of three attributes: the number of systems used, the number of classes, and the number of lines of code. We note that only three studies used more than 20 systems; these datasets can therefore be considered the most reliable. Some studies, such as S3 and S6, did not mention the size of their datasets (the number of classes and the number of lines of code) but did mention the number of systems: S3 used 20 systems and S7 used 76 systems. The authors of S10 did not mention these attributes (the number of systems, classes, and lines of code) as they only collected 7 datasets from the previous literature. We observe that 14 PSs used open-source systems, which are freely available and can therefore be used to replicate the research; furthermore, using the same systems as other researchers allows comparison. Additionally, previous studies reported that commercial systems may provide higher code quality than open-source systems [99].

Table 24 Dataset size

Table 25 shows the four most frequently used systems in the 17 PSs, while Table 23 presents the software metrics used as independent variables in the datasets. We can see that only eight of the seventeen studies mentioned the software metrics. These metrics include the metrics proposed by Chidamber and Kemerer in 1991 [100, 101], the MOOD metrics proposed by Abreu and Carapua in 1994 [102,103,104], and the QMOOD metrics proposed by Bansiya and Davis in 2002 [105,106,107]. We observe that the C&K software metrics were used in five of the eight studies that mentioned software metrics, as the C&K metrics cover different object-oriented properties, while the QMOOD software metrics were used in two studies and the MOOD software metrics in one study. To obtain the values of these metrics from the systems, several tools were used; these tools are shown in Table 26. From Table 26, we observe that the tools POM and design features and metrics for Java (DFMC4J) were used in three of the seventeen studies. The DFMC4J tool was used in two studies, S4 and S7, which have the same authors, and it was also used in S1, which replicated the experiment in S4.

Table 25 Used systems
Table 26 Tools used to compute the metrics

Regarding the dependent variables in each dataset, 12 studies used binary classification (“0” or “1,” i.e., “smelly” or “non-smelly”), while only two studies (S1 and S16) used multi-class classification.

4.6 The Tools Used to Implement the Machine Learning Algorithms (RQ5)

In the PSs, we found that two kinds of tools were used to implement the machine learning algorithms to detect code smells: WEKA and TensorFlow (implemented in Python). Table 27 shows the tools used in the PSs to implement the machine learning algorithms. Eleven of the 17 studies used WEKA, and only one study (S2) utilized TensorFlow written in Python. Five studies (S5, S11, S14, S16, and S17) did not provide details on the tool that was used.

Table 27 Tools used to implement experiments

5 Discussion

In this section, we discuss the main results of this study based on the research questions. Furthermore, we report some recommendations that can be used as a starting point for future research.

5.1 Types of Machine Learning Techniques Used (RQ1)

Our findings show that 18 machine learning techniques, among the most commonly used in the literature, were applied to detect code smells. The reasons these techniques are used in many studies are the high performance they provide and their availability in tools. Only 5 out of 17 PSs used more than one machine learning algorithm, which reveals a gap in the application of machine learning to address the issue of smells in software; therefore, more studies are needed in this field. Future studies may need to use more than one machine learning technique to explore the capability of each technique to detect code smells on the same dataset.

Ensemble learning is proposed to improve the performance of weak classifiers by combining several single classifiers using different methods. However, most of the reviewed studies applied a single classifier, and only three of the seventeen studies used boosting-based ensemble techniques (homogeneous ensemble techniques). Ensemble techniques have been found to outperform single learning techniques in predicting software defects [108]; thus, more studies that use ensemble techniques to detect code smells should be conducted.
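As a hedged sketch of a homogeneous boosting ensemble of the kind contrasted with single classifiers above, the snippet below compares a single shallow decision tree with an AdaBoost ensemble of such trees on synthetic data; the dataset, estimator count, and scoring are illustrative assumptions only.

```python
# Single weak learner versus a homogeneous boosting ensemble (AdaBoost).
# Data and parameters are synthetic/illustrative, not taken from the studies.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

single = DecisionTreeClassifier(max_depth=1, random_state=0)    # a weak learner (decision stump)
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)  # boosts stumps by default

print("single stump F1:", round(cross_val_score(single, X, y, cv=5, scoring="f1").mean(), 3))
print("boosted      F1:", round(cross_val_score(boosted, X, y, cv=5, scoring="f1").mean(), 3))
```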

5.2 Detected Smells (RQ2)

Twenty-eight types of code smells were detected in the PSs. God Class, Long Method, Feature Envy, and Data Class were the most frequently detected smells; they have also received more attention in recent studies, as shown in Table 12. The possible reasons for selecting these code smells are:

  • They cover different design problems.

  • Their critical influence on software quality.

  • They are the most frequently occurring code smells.

  • These code smells are well known and easy to understand.

  • The availability of these code smells in the automated tools.

In terms of using machine learning to detect code smells, we observe that most of the PSs used a small number of code smells. Furthermore, not all the code smells proposed by Fowler [33] are covered by the previous studies, so some code smells that might affect software quality have not been studied yet; hence, more studies can be conducted to address these code smells.

5.3 The Accuracy of Machine Learning Techniques (RQ3)

We observe that the recall metric was the most frequently used, followed by the accuracy and precision metrics. However, some researchers (for instance, the authors of [7]) argue that accuracy, F-measure, and ROC area capture different views of the performance of predictive models and that no single one of these measures can reflect the whole performance; therefore, these metrics should also be used to evaluate the performance of the machine learning techniques applied for code smell detection, and recall and precision should be used together to adequately assess the performance of a model. In addition, they reported that it is not recommended to use the accuracy metric alone, especially in the case of unbalanced datasets (where the negative and positive classes are unbalanced). We observed that most of the performance metrics used are important measures for classification models: recall, accuracy, precision, F1-score, AUC, and specificity [109,110,111]. Only two of the metrics used, RMSE and MAE, are measures for regression problems; these two metrics were proposed for regression models [112, 113]. Furthermore, only two studies applied the grid-search algorithm, which searches for the parameters of a machine learning algorithm that yield the highest performance. Therefore, we believe that more studies are needed to address this issue in order to generate more efficient code smell detection models.
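As a hedged illustration of the classification metrics discussed above, the snippet below computes them with scikit-learn on invented predictions; the label and score values are made up for the example.

```python
# Classification metrics on hypothetical predictions (1 = smelly, 0 = non-smelly).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]                        # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]                        # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.95]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
```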

5.4 The Used Datasets (RQ4)

Based on our findings, the Java language received the most attention in the previous studies; most of the PSs tested their approaches using systems written in Java. For this reason, more studies are needed to investigate the capabilities of machine learning techniques to detect code smells in other programming languages such as C# and C++. Moreover, we found that five studies (S1, S2, S4, S6, and S7) built their datasets from different systems, while the other studies built their datasets from a single system. Building datasets from different systems is the best way to avoid the issue of bias. In addition, most of the PSs used software metrics as the independent variables, while recent studies [114, 115] have reported that software metrics alone are insufficient to optimize software quality. To address this issue, other sources of information, such as dynamic, historical, and textual aspects, should be used to identify the characteristics of smelly code. In addition, manual validation of the dataset is an important step in producing an effective dataset.

Feature selection is an essential preprocessing step in machine learning that aims to select a small subset of relevant features from the original dataset by eliminating redundant, irrelevant, or noisy features in order to achieve better learning performance, i.e., higher accuracy and lower computational cost and time [116]. Our findings show that only one study (S1) used the Gain Ratio feature evaluation technique for feature selection. Thus, more studies are needed to investigate the use of feature selection in code smell detection. On the other hand, only two studies applied multi-class classification; therefore, future studies should also consider multi-class classification.
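As a hedged sketch of filter-based feature selection in this setting, the snippet below ranks metric features by mutual information with the smell label and keeps the top k; mutual information is related to, but not identical to, the Gain Ratio criterion available in WEKA that S1 used, and the data and k are illustrative assumptions.

```python
# Filter-based feature selection: keep the k metric features most informative
# about the smell label. Data and k are illustrative only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_reduced.shape)   # (300, 5)
```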

5.4.1 Dataset Analysis

In this subsection, we analyze the available datasets that were used in S4 [7] and S8 [73]. The first dataset was built by Arcelli Fontana et al. [7], who collected systems from one of the large curated benchmark datasets, the Qualitas Corpus [117], release 20120401r. Seventy-four of the 111 Java systems in the corpus were selected; the remaining thirty-seven systems were ignored because they could not be compiled and, consequently, bad smell detection could not be performed on them. These 74 software systems comprise 6,785,568 lines of code, 3420 packages, 51,826 classes, and 404,316 methods.

The software metrics presented in Table 23 were used as independent variables in their machine learning approaches for detecting class-level and method-level smells. At the class level, there were two smells, Data Class and God Class, and at the method level there were also two smells, Feature Envy and Long Method. For each system, 82 software metrics were computed at the method level, while only 61 were computed at the class level. The metrics were computed using a tool developed by the authors called "design features and metrics for Java" (DFMC4J).

The authors of S4 built four datasets, one for each smell; each dataset included both smelly and non-smelly code elements. Table 28 shows the set of automatic detectors that the authors employed to detect each code smell in order to establish the dependent variable for the bad smell detection methods.

Table 28 Bad smell detection tools

Generally, achieving 100% recall is unusual in this field, meaning that an automatic detection approach might miss real bad smell instances even when multiple detectors are combined. To increase confidence in the validity of the dependent variable and to address false positives, a stratified random sampling of the methods/classes of the studied systems was used. This sampling produced 1,986 instances, as shown in Table 29, and the training set was balanced as 2/3 negative and 1/3 positive instances, where positive means the element is affected by a code smell and negative means it is not. A manual validation of these instances was then conducted by three master's students to check the detection results.

Table 29 Sampling procedure
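A minimal sketch of a stratified sampling step of this kind is given below; the column names ("system", "label") and the per-stratum sample size are hypothetical, and the PS authors' actual procedure is the one summarized in Table 29.

```python
# Illustrative sketch of stratified random sampling of candidate code elements.
import pandas as pd

def stratified_sample(df, strata_cols, n_per_stratum, seed=0):
    """Draw up to n_per_stratum rows from each stratum (e.g., system x label)."""
    return (df.groupby(strata_cols, group_keys=False)
              .apply(lambda g: g.sample(min(len(g), n_per_stratum),
                                        random_state=seed)))

# candidates: one row per class/method with its detector-based label.
# sample = stratified_sample(candidates, ["system", "label"], n_per_stratum=15)
```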

The authors then normalized the sampled dataset by removing some of the smelly and non-smelly elements and building four separate datasets, one for each kind of code smell, each containing 280 non-smelly and 140 smelly instances; hence, there were 420 instances in each dataset. These four datasets formed the training sets for the machine learning methods.

The second dataset was built by Maiga et al. [73, 118] who analyzed three open-source systems that are freely available and have been utilized in several studies: Azureus v2.3.0.6, Xerces v2.7.0, and ArgoUML v0.19.8.

The software metrics presented in Table 23 were used as independent variables in their machine learning technique for detecting four code smells: God Class, Swiss Army Knife (SAK), Functional Decomposition (FD), and Spaghetti Code (SC). The values of these metrics were computed using a tool called POM and served as training inputs. POM is an extensible framework that is based on the PADL meta-model [119] and implements more than 60 software metrics [120].

Maiga et al. built three datasets for every system and every code smell; each dataset comprises smelly and non-smelly classes in equal numbers. For example, for the Functional Decomposition smell in Azureus v2.3.0.6, they used its 30 functional decomposition classes and added 30 non-functional decomposition classes selected randomly from the remaining classes of the system. They then divided these 60 classes randomly into three datasets, Dataset1, Dataset2, and Dataset3, making sure that each dataset contains 10 functional decomposition instances and 10 non-functional decomposition instances; therefore, every dataset has 20 instances. Finally, they used practitioners' feedback on the results as input for the training dataset.
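For illustration only, the splitting procedure described above can be sketched as follows; the class names are hypothetical placeholders rather than the actual Azureus classes used by Maiga et al.

```python
# Illustrative sketch: 30 smelly and 30 non-smelly classes are shuffled and
# divided into three datasets of 20 classes each (10 smelly + 10 non-smelly).
import random

random.seed(0)
smelly = [f"FDClass{i}" for i in range(30)]    # functional decomposition classes
clean = [f"OtherClass{i}" for i in range(30)]  # randomly chosen non-FD classes
random.shuffle(smelly)
random.shuffle(clean)

datasets = [smelly[i * 10:(i + 1) * 10] + clean[i * 10:(i + 1) * 10]
            for i in range(3)]
assert all(len(d) == 20 for d in datasets)
```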

5.5 The Tools Used to Implement the Machine Learning Algorithms (RQ5)

We observe that most of the studies used Weka because it is the most common and widely used tool for implementing machine learning algorithms [7, 32]. In addition, this tool contains 37 algorithms for classification problems [32], is considered reliable [68], and is freely available. However, based on our findings, we claim that using only two tools in this field is not enough, since these tools do not provide all types of machine learning techniques. This is a clear opportunity for researchers to improve code smell detection by using other environments that rank among the top choices for implementing machine learning techniques and that offer numerous libraries implementing recent machine learning techniques, for example, Python, R, and Julia.

6 Threats to Validity

In this section, we identify some potential threats to validity that might have biased our systematic literature review. The main threats to the validity of our research concern the search for and selection of relevant studies and the data extraction process.

The authors in [31, 121] reported that a common threat to SLRs is the process of finding all the relevant research. One such threat is the selection of online databases. To mitigate this threat, we selected five well-known online databases that cover journal and conference papers: ACM, IEEE, Scopus, Springer, and ScienceDirect.

In order to mitigate the threat of excluding a relevant study or including an irrelevant study in the study selection phase, three researchers undertook the selection process and resolved all conflicts in the case of disagreement.

To mitigate bias in the data extraction phase, data extraction cards were constructed. Three researchers performed this process: one of them extracted the data and inserted it into the cards, while the others examined the collected data. In cases of disagreement, a discussion was held to resolve the conflicts.

7 Conclusion

In this study, we performed an SLR of empirical studies on the machine learning techniques used to detect code smells. The main objective was to systematically review and analyze the machine learning techniques applied to detect code smells from different perspectives: the code smells employed in the experiments, the types of machine learning techniques applied to detect code smells, a comparison between these models in terms of prediction accuracy and performance, the datasets used, and the tools utilized to implement the machine learning models.

A literature search was conducted using five online databases to retrieve the relevant studies. As a result, seventeen primary studies relating to the five research questions of this study were selected. The results show that God Class, Long Method, Feature Envy, and Data Class are the most frequently detected code smells. We also observed that the machine learning algorithms SVM, J48, Naive Bayes, Random Forest, BBN, SMO, and JRip are used the most, and that the standalone machine learning algorithms that consistently outperform the others are J48, Random Forest, JRip, Naive Bayes, IBk, IBL, SVM, SMO, and BLR. The algorithm with the worst performance is VFI.

Our review also indicated that the family of J48 techniques and B-Random Forest are the top techniques for detecting all types of code smells, while the SVM techniques are the worst. We also noticed that, in some cases, applying boosting techniques to the models does not enhance their performance. With respect to the datasets, most of the PSs used Java-based case studies, and most of them used software metrics as independent variables. Only two of the 17 studies made their dataset available. Finally, we found that Weka is the most commonly used machine learning tool in the PSs, while one study employed TensorFlow in Python. Even though R is popular for implementing ML algorithms and has excellent library support, none of the primary studies used it; hence, R could be used in future studies.

As a result of this study, we make the following observations to be considered when conducting new research that uses machine learning for bad smell detection: (1) B-J48 unpruned achieved the highest performance results among all PSs, with an F-measure of 99.63% when applied to Long Method detection, followed by B-J48 pruned, which achieved an F-measure of 99.26% when applied to Data Class detection; (2) some code smells have not yet been investigated; (3) the use of ensemble learning techniques for code smell detection has received little attention, as it was applied in only two studies; (4) only two studies used the grid-search algorithm to configure the parameters of the machine learning algorithms and find optimal values; (5) five studies built their datasets from different systems; (6) only two studies used multiclassification; and (7) only one study used a feature selection technique.

It can be seen that the application of machine learning techniques to detect code smells is still a new area that needs further investigation. Therefore, more research effort should be devoted to facilitating the use of machine learning techniques to address the problem of predicting code smells.