1 Introduction

Software complexity is continuously mounting because of complex requirements, growth in the number and size of modules, and the code smells present in modern software. Complex software is hard to examine and understand, and as a result its improvement becomes problematic. Complicated requirements are outside the designer's control, but code smells can be detected, and the software can then be made simpler and more comprehensible [1]. During the software development process, developers must follow both functional and non-functional quality requirements to secure software quality [2]. Developers frequently concentrate only on functional requirements and ignore non-functional ones, such as comprehensibility, verifiability, evolvability, maintainability, and reusability [3]. As module sizes, method sizes, and branching complexity grow, maintenance costs rise because of additional code smells, and software quality deteriorates as the number of lines of code increases. Fowler et al. [4] described refactoring, a method by which a poorly implemented program can be changed into a well-implemented one, and gave definitions of 22 code smells.

Many approaches have been used in the literature to detect different kinds of code smells.

Each method produces a different outcome depending on its category. Kessentini et al. [5] group code smell detection techniques into seven categories: cooperative-based [6], visualization-based [7], search-based [8], probabilistic [9], symptom-based, metrics-based, and manual [10] techniques. Manual techniques use inspection approaches [11], the development process, and process identification methods [12] to improve software quality. Symptom-based techniques use a specification algorithm to detect code smells [13].

Multiple studies have analyzed the influence of code smells on software and have shown their unwanted effects on software quality attributes [14, 15]. They also analyzed how code smells increase the risk of software failures and faults. They found that code smells hinder the software improvement process and suggested refactoring the software to eliminate them.

Deligiannis et al. [16], Olbrich et al. [17, 18], and Khomh et al. [19] studied the effect of code smells on software development by observing the frequency and size of changes in the software system. They also observed that classes affected by code smells have a significantly higher change rate and require additional maintenance work. Li and Shatnawi [20] investigated the relationship between bad smells and the probability of class errors in an object-oriented software system. According to their experiment, software elements infected by code smells have more class errors than other elements. Perez-Castillo and Piattini [21] examined the negative effect of God-class on energy consumption and found that eliminating God-class smells reduces the cyclomatic complexity of the source code.

The main contributions of this research work fall into two parts. In the first part, five classification techniques are applied to detect code smells in the datasets, and a feature selection technique is applied to select the best features from each dataset. The second part reports the performance measures obtained by the classification techniques under tenfold cross-validation.

In this research work, classification techniques for code smell detection are proposed. Four code smell datasets are considered. The class-level smells comprise the God-class and Data-class datasets, whereas the method-level smells comprise the Feature-envy and Long-method datasets. Five classification methods (random forest, SVM, naive Bayes, KNN, and logistic regression) are applied to classify each dataset.

This paper is organized into five sections. Section 2 presents the literature review, briefly surveying earlier work on code smell detection using classification techniques. Section 3 describes the proposed methodology. Section 4 presents the experimental results, and Sect. 5 concludes our work.

2 Related Work

Many researchers have used ML algorithms to detect code smells. One such study compared and experimented with existing supervised learning techniques for code smell detection. Its authors tested sixteen ML techniques on four code smell datasets, with training data drawn from seventy-four Java systems that were manually evaluated. Boosting approaches were also applied to the four code smell datasets.

Mhawish and Gupta [22, 23, 24] proposed tree-based and decision tree-based ML algorithms, together with software metrics, for differentiating and recognizing similar structural design patterns. To choose the most significant features from each dataset, they used two feature selection strategies based on a genetic algorithm (GA-CFS).

They also employed parameter tuning with a grid search to improve the accuracy of all machine learning methods. Guggulothu and Moiz [25, 26] suggested a multi-label classification strategy for code smell detection. They used multi-label classification to determine whether a given code element is affected by more than one smell. To achieve high accuracy, they made use of an unsupervised classification algorithm. Dewangan et al. [27] applied six ML algorithms and two feature selection techniques, chi-square and wrapper-based feature selection, to pick the best features from each dataset. A grid search was then used to improve model performance, and they obtained the highest accuracy, 100%, using logistic regression on the Long-method dataset. Kreimer [28] proposed a decision tree-based approach to detect the Long-method and Large-class code smells. The approach was evaluated on two small software systems: the WEKA software package and the IYC system. The prediction model was found to be helpful in detecting code smells. Amorim et al. [29] studied the usefulness of decision trees for identifying code smells. By putting Kreimer’s decision tree model to the test, they were able to corroborate his findings. Pritam et al. [30] used machine learning methods to predict class change proneness from code smells. They found that code smells influence the change proneness of a class in a production environment. They used six ML techniques to estimate change proneness based on code smells from 8200 Java modules across 14 software systems.

Draz et al. [31] proposed a search-based method that uses a classifier based on the whale optimization algorithm to enhance code smell prediction. They tested nine kinds of code smells on five open-source applications and obtained, on average, an accuracy of 94.24% and a recall of 93.4%.

For metric-based code smell detection, Pecorelli et al. [32] compared the performance of machine learning-based and heuristic-based strategies. They considered five types of code smells (God-class, Spaghetti Code, Class Data Should Be Private, Complex Class, and Long-method) and compared ML techniques with DECOR, a state-of-the-art heuristic-based approach. They found that DECOR consistently outperformed the ML baseline. Table 1 summarizes some essential related work.

Table 1 Summary of related work

3 Proposed Methodology

In this paper, a code smell detection framework is constructed using classification models. Code smell metrics play an essential part in determining the functional and non-functional qualities of the software and in recognizing its properties. Metrics capture static information about the software, such as classes, methods, and parameters, and measure coupling and cohesion between objects in the system. Figure 1 depicts the steps followed to build the code smell detection framework. First, four code smell datasets are collected. Preprocessing (normalization) is then applied to each dataset so that all features share a common range. The best features from each dataset are then selected using a wrapper-based feature selection technique (FST). Next, classification algorithms are trained on each dataset and their performance is determined. Tenfold cross-validation is used to assess the result of each experiment during training; it divides the dataset into ten parts and repeats training and validation ten times, each time holding out a different part. Finally, the results are evaluated.

Fig. 1 Proposed work

The code smell detection framework in this study is built on four code smell datasets taken from [33]: God-class, Data-class, Feature-envy, and Long-method. The data preparation methodology is briefly described below.

Because different datasets have attributes with different ranges, classification techniques cannot always be applied to them directly. Normalization is therefore required to bring the dataset’s features into a common range. ML models may train faster on a normalized dataset, and normalization can have a big impact when the model is sensitive to feature scale. Before applying the support vector machine algorithm, for example, normalization is needed so that attributes with larger numeric ranges do not dominate those with smaller ranges, and so that large attribute values do not cause numerical problems [34]. This article uses min–max normalization to map dataset values into the range 0 to 1. This step is part of data preparation, which readies the data for subsequent processing by machine learning algorithms such as SVM, NN, and others [35]. The following equation maps a value x of feature A from the range [min A, max A] to [0, 1].

$$x^{\prime} = \left( {x - \min A} \right)/\left( {\max A - \min A} \right)$$

All datasets were subjected to the min–max normalization approach, and the resulting new data was used as input into all classification systems.
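The min–max mapping can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the sample metric values are hypothetical, and the handling of a constant feature (mapping it to 0.0) is a choice made here because the equation is undefined when max A equals min A.

```python
def min_max_normalize(column):
    """Rescale a list of numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: the formula would divide by zero, so map
        # every value to 0.0 (an assumption, not stated in the paper).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def normalize_dataset(rows):
    """Apply min-max normalization to each feature (column) of a row-major dataset."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Hypothetical metric values per instance: [lines of code, number of methods]
sample = [[120, 4], [300, 10], [30, 1]]
normalized = normalize_dataset(sample)
```

Each feature is scaled independently, so a metric with a large range, such as lines of code, ends up on the same 0 to 1 scale as a metric with a small range.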

To choose the best features (metrics) from each dataset, this experiment uses a wrapper-based feature selection technique (FST). An FST selects the subset of features in the dataset that are most relevant to the target value [36]. In this experiment, the ten best features are selected from each dataset, and the classification algorithms are then applied to each dataset.
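A wrapper-based method scores candidate feature subsets by the performance of a classifier trained on them. The paper does not state which search strategy or which classifier drives the wrapper, so the sketch below makes two assumptions that should be read as illustrative only: a greedy forward search, and a simple nearest-centroid classifier scored by leave-one-out accuracy. All function names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the classifier restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        test_x = [X[i][f] for f in features]
        if nearest_centroid_predict(train_X, train_y, test_x) == y[i]:
            correct += 1
    return correct / len(X)


def forward_select(X, y, k):
    """Greedily add the feature that yields the highest accuracy until k are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical normalized data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.9, 0.8], [0.8, 0.2], [0.95, 0.4]]
y = [0, 0, 0, 1, 1, 1]
chosen = forward_select(X, y, k=1)
```

With `k=10`, the same loop would return ten feature indices, matching the number of features the experiment keeps per dataset.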

This paper applies five classification algorithms (random forest, SVM, naive Bayes, KNN, and logistic regression) to detect code smells in the code smell datasets. Each classification algorithm assigns every instance to one of the classes defined in the dataset.

In this study, tenfold cross-validation is used to assess the performance of each experiment. It partitions the dataset into ten folds and runs ten iterations. At each iteration, a different fold serves as the test set and the remaining folds serve as the training set. Finally, the trained models are tested on an unseen test dataset (a 10% split taken from the dataset before training). The unseen test dataset is used to check the model’s predictions and to avoid overestimating how well the model generalizes.
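The split described here, a 10% hold-out followed by tenfold cross-validation on the remainder, can be sketched with index lists. This is an illustrative sketch only: the function names, the random seed, and the dataset size of 100 are assumptions, since this section does not state how many instances each dataset contains or how the folds were drawn.

```python
import random


def holdout_split(n_samples, test_fraction=0.1, seed=0):
    """Shuffle indices and set aside a fraction as the unseen test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold is the validation set exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical dataset size, chosen only for illustration.
train_pool, test_set = holdout_split(n_samples=100)
splits = list(k_fold_indices(train_pool, k=10))
```

A model would be fitted on each `train` list and scored on the matching `validation` list, with `test_set` touched only once at the end.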

Four performance measures, precision, recall, F1-score, and accuracy, were examined to evaluate our classification approach. To calculate them, TP, TN, FP, and FN are obtained from the confusion matrix. True positive (TP) counts the instances of the positive class that the model predicts correctly. False positive (FP) counts the instances that the model incorrectly assigns to the positive class. True negative (TN) counts the instances of the negative class that the model predicts correctly. False negative (FN) counts the instances that the model incorrectly assigns to the negative class.
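The four measures follow from the confusion-matrix counts by their standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 · precision · recall / (precision + recall), and accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes them in Python; the function names and the label vectors are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def scores(tp, tn, fp, fn):
    """Compute precision, recall, F1-score, and accuracy from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = smelly, 0 = not smelly.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
result = scores(*counts)
```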

4 Experimental Results

In the experiments, four code smell datasets are used. Five classification algorithms are applied to identify the code smells in each dataset. Four performance measures, precision (P), recall (R), F1-score (F1), and accuracy (A), are considered for each dataset. The experimental results for each classification technique are shown in Table 2. The random forest algorithm attained the highest accuracy: an F1-score of 0.98 and an accuracy of 0.98 for Data-class, an F1-score of 0.98 and an accuracy of 0.97 for God-class, an F1-score of 0.98 and an accuracy of 0.9912 for Feature-envy, and an F1-score of 1.00 and an accuracy of 0.9952 for Long-method. Naive Bayes gave the worst performance (0.91 accuracy for Feature-envy).

Table 2 Experimental result of five classification techniques with four code smell datasets

4.1 Comparison of Our Techniques with Other Related Works

Table 3 presents a brief comparison of our techniques with other related works. On the Feature-envy dataset, our approach achieved 99.12% accuracy, while on the Data-class and God-class datasets, Mhawish and Gupta [22] achieved the highest accuracies of 99.70% and 98.48%, respectively. For the Long-method dataset, Dewangan et al. [27] achieved the highest accuracy of 100%.

Table 3 Comparison of our approach with other related work

5 Conclusion and Next Steps

This research presents a classification approach to identify code smells in software and to find the metrics that play an important part in the detection process. A wrapper-based feature selection approach is used to determine the key metrics that may improve accuracy. The findings are then evaluated using tenfold cross-validation. The random forest algorithm achieved the highest accuracy: 0.98 for Data-class, 0.97 for God-class, 0.9912 for Feature-envy, and 0.9952 for Long-method. In future work, other machine learning techniques and other metric selection techniques can be applied to improve the results.