Abstract
Software testing is a vital phase in the software development life cycle. It validates the developed software against input test cases by identifying the defects present in the system. Testing, however, is both time-consuming and costly. Although automated tools reduce the testing effort to some extent, their high maintenance cost adds to the overall expense. Early defect prediction significantly reduces effort and cost without violating project constraints by identifying the defect-prone modules that require more rigorous testing. A practical and effective defect prediction mechanism is needed because of two persistent challenges in software defect prediction: high dimensionality and class imbalance. Lately, machine learning (ML) has emerged as a powerful decision-making approach in this regard. This work presents an extensive study of ML techniques applied to software defect prediction, organized around two aspects: feature selection/reduction techniques and ensemble learning methods. It also discusses the software metrics and performance measures widely used in software defect prediction. This concise survey is intended to guide future researchers in this emerging research area. Further, this paper emphasizes the need to identify a suitable feature selection approach that could enhance the model's predictive performance when applied with ensemble learning.
1 Introduction
Nowadays, software has become an essential and integral part of our lives. Industry and society rely immensely on software-backed environments, as they significantly reduce human effort and time [1]. Any electronic device or system, from a modern household product to a spacecraft, revolves around software. Given the high demand for good-quality software products across different walks of life, it is crucial to develop software that is free from defects. However, good-quality, reliable software comes with complexities and challenges. In software development, a defect is the outcome of a programming mistake because of which the software system does not show the expected results. Software testing enables developers to deliver a quality, reliable software product by identifying and fixing defects. Defects present in a software module increase the development and maintenance costs and, at times, are the prime reason for the failure of the software product. At present, a significant part of the software development and maintenance budget goes into identifying and fixing defects [2]. This cost can come down significantly if defects are identified in the earlier stages of development. Mäntylä and Lassenius [3], in 2008, described two types of defects: functional defects and evolvability defects. Functional defects affect the system's behaviour, whereas evolvability defects degrade the software's evolvability by making it harder to understand and modify.
Software defect prediction (SDP) focuses testing effort on the defect-prone modules that require extensive testing, thereby facilitating efficient utilization of the available resources without violating the testing constraints. Timely defect prediction boosts the quality of a software product and gives project managers the flexibility to allocate resources optimally [4]. It also helps increase the quality of the developed software, leading to higher customer satisfaction and, subsequently, to the product's success. A practical and powerful defect prediction mechanism is vital because of two challenges: high dimensionality and class imbalance. The class imbalance problem is an extreme imbalance between the defect-prone (DP) and non-defect-prone (NDP) modules, which leaves the data set highly skewed. Usually, the DP modules are far fewer than the NDP modules, so a learner trained on the raw data is biased towards the majority NDP class. Hence, data balancing is required to resolve the skewness in the data set [5]. Another major challenge most defect prediction models face is the presence of irrelevant features in the data set. These features carry no significant information and are therefore considered noise. Like class imbalance, irrelevant features reduce the model's predictive performance.
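To make the class imbalance problem concrete, the sketch below balances a skewed module data set by random oversampling of the minority DP class. This is only one of several balancing strategies used in the literature (the surveyed papers also use techniques such as SMOTE); the data and helper function here are illustrative, not taken from any surveyed paper.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows until both classes are the same size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    minority_idx = np.where(y == minority)[0]
    extra = rng.choice(minority_idx, size=n_needed, replace=True)
    X_bal = np.vstack([X, X[extra]])
    y_bal = np.concatenate([y, y[extra]])
    return X_bal, y_bal

# Toy skewed data set: 90 NDP (label 0) vs 10 DP (label 1) modules
X = np.arange(100).reshape(100, 1).astype(float)
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # both classes now have 90 samples
```

Oversampling only duplicates existing DP rows, so it removes the skew without discarding any NDP information, at the cost of possible overfitting to the repeated samples.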
Organization of the paper: Sect. 1 discusses the formal requirement of a software defect prediction model and its challenges. Section 2 describes the basic concepts needed to understand this work. Section 3 presents the methodology followed in this work. Section 4 presents the literature review and raises three research questions. Section 5 discusses our findings and answers the research questions. Finally, Sect. 6 concludes the paper with a summary and future directions.
2 Software Defect Prediction
Here, we discuss the fundamental concepts of defect prediction in software modules and the applicability of machine learning. Figure 1 depicts the block diagram of software defect prediction [7]. The data set, taken from publicly available repositories such as NASA, PROMISE, ECLIPSE, and AEEEM, or from a real-life project, first undergoes pre-processing to remove noise and handle missing values. After pre-processing, the data set is split into training and testing data. The training set may still contain irrelevant attributes that do not contribute significantly to defect prediction but can lower the model's performance; a suitable feature selection technique addresses this issue. A machine learning classifier then uses the reduced training data to build the model, whose performance is finally measured on the test data.
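The pipeline of Fig. 1 can be approximated in code. The following scikit-learn sketch chains a train/test split, filter-based feature selection, and a classifier on synthetic data standing in for a defect data set; all parameter choices (feature counts, classifier, split ratio) are illustrative, not prescribed by the surveyed papers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a defect data set (rows = modules, columns = metrics),
# skewed 80/20 towards NDP modules as real defect data usually is
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Filter-based feature selection feeding a classifier, as in Fig. 1
model = Pipeline([
    ("select", SelectKBest(f_classif, k=8)),   # keep the 8 most relevant metrics
    ("clf", RandomForestClassifier(random_state=42)),
])
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC on held-out modules: {auc:.3f}")
```

Wrapping the selection step and the classifier in one `Pipeline` ensures the feature scores are computed on the training folds only, avoiding information leakage into the test data.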
Machine learning (ML) enables computers to learn automatically without human intervention and adjust their actions accordingly. Lately, ML has emerged as a powerful decision-making approach in software defect prediction. It helps identify the defective modules more effectively and reduces maintenance costs. Several defect prediction models have been proposed using classifiers such as Naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM), K-nearest neighbours (K-NN), and decision tree (DT). However, the performance of these models was not satisfactory, as they suffer from challenges like class imbalance and irrelevant features [6].
3 Methodology
This study intends to review and assess the experimental evidence gained from the existing work done in software defect prediction using machine learning techniques. The overall methodology followed in this review is as below:
3.1 Research Questions
The following research questions have been framed to guide this assessment:

- RQ1: Do feature selection/reduction techniques impact the model's predictive performance? What are the increasingly used feature selection/reduction techniques for defect prediction?
- RQ2: Do ensemble learning methods give better predictive performance in defect prediction models than individual classifiers?
- RQ3: What are the frequently used software and performance measure metrics in software defect prediction?
3.2 Review Protocol
The general process of identifying relevant works includes selecting appropriate digital repositories, determining search keywords, and compiling a list of existing works that match those keywords. This survey covers research papers published since 2010 in several databases/publishers, including Google Scholar, IEEE, Science Direct, and Springer Link. We used the search keyword “Defect Prediction in software modules using Machine Learning” to capture the relevant studies. The initial list contained 117 papers from the repositories mentioned above. A preliminary screening based on title, abstract, and conclusion reduced this number to 79 in our second list. The publisher-wise distribution of the studies and of the methods followed is shown in Figs. 2 and 3.
We went through these papers thoroughly and selected 15 final papers based on the inclusion criteria as mentioned below:
Inclusion criteria:

- Papers that used ML techniques in software defect prediction.
- Papers that compared the performance of different defect prediction models.
- Empirical papers.
We examined the final list of papers against characteristics like feature selection/reduction techniques, ensemble learning methods, software metrics, and performance measurement metrics to check whether they covered the research questions defined above. The results are presented in Table 1.
4 Literature Review
In this section, we present the findings of our literature survey, divided into two categories. The first category covers the conventional ML approach using feature selection/reduction techniques; Table 2 summarizes the findings of this category.
Though defect prediction models based on individual classifiers under the conventional machine learning approach showed good performance, they still faced challenges like class imbalance. This challenge paved the way for further research to strengthen defect prediction models. As a result, ensemble learning methods came into existence, which combine several individual classifiers and build prediction models using the resulting ensemble. Table 3 presents the findings of defect prediction using the ensemble learning methods.
5 Discussion on Our Findings
5.1 Answer to the Research Questions
Through a detailed study of the selected papers, we observed that though software defect prediction has made significant progress in recent times, it still faces challenges like irrelevant features and data imbalance. While feature selection and reduction techniques have helped to an extent in removing irrelevant features, ensemble learning techniques have significantly improved model performance compared to individual classifiers. The findings of the above study answer the research questions raised in this paper.
RQ1: Do feature selection/reduction techniques impact the model's predictive performance? What are the increasingly used feature selection/reduction techniques for defect prediction?
Answer: The above survey found that feature selection/reduction techniques significantly improved the model's performance [4, 13,14,15, 18,19,20,21,22,23, 26, 27]. The widely used feature selection techniques are filter-based, wrapper-based, correlation-based, and consistency-based. Similarly, the widely used feature reduction techniques are principal component analysis, FastMap, feature agglomeration, transfer component analysis (TCA) and TCA+, random projection, restricted Boltzmann machine, and autoencoder. The study also found that selection techniques outperformed reduction techniques in supervised learning, whereas neural network-based reduction techniques performed better in unsupervised learning [27].
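The distinction between selection (keeping a subset of the original metrics) and reduction (projecting them into a new space) can be seen in a short sketch. Here RFE stands in for the wrapper-based family and PCA for the reduction family; the data and parameter values are illustrative, not drawn from the surveyed experiments.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)

# Wrapper-based selection: RFE repeatedly refits the model and drops the
# weakest feature until only 5 of the original metrics remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("features kept by RFE:", int(rfe.support_.sum()))

# Feature reduction: PCA replaces the 15 metrics with 5 principal components,
# so the resulting columns are combinations of metrics, not metrics themselves
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)
print("shape after PCA:", X_pca.shape)
```

Selection preserves the interpretability of individual software metrics, while reduction can capture correlated structure at the cost of that interpretability.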
RQ2: Do ensemble learning methods give a better predictive performance in defect prediction models than the individual classifiers?
Answer: In an ensemble method, several core models are combined to produce one optimal predictive model, which improves predictive performance compared to individual classifiers [5, 13, 23, 24]. When combined with ensemble learning methods, feature selection mostly gave better performance than when it was not used [4, 13, 26, 27]. However, in some cases, performance decreased when ensemble learning methods were paired with specific feature selection methods [4]. Some widely used ensemble methods are bagging, boosting, and stacking.
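As an illustration of combining individual classifiers, the sketch below stacks two base learners under a logistic-regression meta-learner and compares the cross-validated AUC against a single decision tree. The classifiers and data are our own illustrative choices, not a replication of any surveyed experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2],
                           random_state=1)

# Stacking: the base learners' out-of-fold predictions become the input
# features of a logistic-regression meta-learner
stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=1))],
    final_estimator=LogisticRegression(max_iter=1000))

stack_auc = cross_val_score(stack, X, y, scoring="roc_auc").mean()
tree_auc = cross_val_score(DecisionTreeClassifier(random_state=1),
                           X, y, scoring="roc_auc").mean()
print(f"stacked ensemble AUC: {stack_auc:.3f}")
print(f"single decision tree AUC: {tree_auc:.3f}")
```

On most runs of data like this, the stacked model's AUC is higher than the lone tree's, matching the survey's observation that ensembles tend to outperform individual classifiers, though the margin depends on the data set.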
RQ3: What are the frequently used software and performance measure metrics in software defect prediction?
Answer: In this review, we extensively studied the different software metrics used in software defect prediction. Our analysis found that the majority of the studied works used McCabe metrics and Halstead base and derived metrics. In terms of performance measurement metrics, our study observed that the area under the receiver operating characteristic curve (AUC) is the most widely used measure, as the skewness of defect data does not affect AUC. AUC is followed by accuracy as the second most widely used measure. Other popular performance measures include precision, specificity, G-mean, F-measure, performance variance, and error rate.
5.2 Filter-Based Feature Subset Selection Technique
Feature subset selection techniques examine the importance of each feature and produce a subset of relevant features. Filter-based and wrapper-based feature subset selection techniques are prevalent in defect prediction. Past research has used filter-based feature subset selection extensively and found it very effective [15,16,17]. An overview of subset-based feature selection is depicted in Fig. 4.
Correlation subset-based techniques do not evaluate individual features; instead, they evaluate subsets of features. The best feature subset has low inter-feature correlation but high correlation with the class label [17]. Consistency subset-based techniques use consistency to estimate the relevance of a feature subset, providing a minimal feature subset whose consistency is equivalent to that of the full feature set [16]. Ghotra et al. [15] did an extensive study assessing the influence of feature selection techniques on the defect prediction model. From their experimental results, the authors established that correlation-based subset feature selection coupled with best-first search outperformed the other feature selection approaches over the different datasets used in their study. Balogun et al. [20] scrutinized the effect of diverse feature selection approaches on the predictive performance of the models and compared filter-based feature subset selection methods with filter-based feature ranking methods on fault prediction models. The authors concluded that although filter-based feature selection methods with the best-first search technique enhanced the performance of the fault prediction model, the filter-based feature ranking models gave a more stable predictive performance. Kondo et al. [21] applied different filter-based feature subset selection techniques and filter-based feature ranking techniques and observed the outcome of the defect prediction model in both supervised and unsupervised learning. They observed that filter-based feature subset selection gave the best performance for the supervised defect prediction model, while neural network-based feature reduction approaches (RBM and AE) worked better for unsupervised models.
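The correlation-subset idea (high relevance to the label, low redundancy among kept features) can be sketched with a toy greedy heuristic. This is a simplification for illustration only; it is not Hall's exact CFS merit function or any algorithm from the surveyed papers.

```python
import numpy as np

def correlation_subset(X, y, keep=5, redundancy_cap=0.8):
    """Greedy correlation-based selection: keep features that correlate with
    the class label but not with already-selected features (a simplified,
    CFS-style heuristic)."""
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1]
                        for j in range(X.shape[1])])
    selected = []
    for j in np.argsort(relevance)[::-1]:        # most label-relevant first
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < redundancy_cap
               for s in selected):
            selected.append(int(j))
        if len(selected) == keep:
            break
    return selected

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
signal = y + rng.normal(0, 0.5, 200)
X = np.column_stack([signal,                            # relevant feature
                     signal + rng.normal(0, 0.01, 200),  # near-duplicate of it
                     rng.normal(size=(200, 3))])         # pure noise
selected = correlation_subset(X, y, keep=2)
print("selected feature indices:", selected)
```

Because columns 0 and 1 are nearly identical, the heuristic keeps only one of them and fills the remaining slot from elsewhere, illustrating how redundancy is penalized even when a feature correlates well with the label.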
Balogun et al. [23] proposed an enhanced wrapper feature selection technique that selects features dynamically and iteratively and found that the proposed technique not only selected the subsets in less time but also returned an improved prediction rate.
5.3 Bagging Ensemble Learning
Bagging, an acronym for bootstrap aggregating, is a widely used ensemble learning method. It is a parallel method that fits several weak learners independently, making it possible to train them simultaneously. Bagging uses random sampling with replacement to generate additional training sets from the original data set; multiple models are trained in parallel on these bootstrapped sets, and their predictions are finally averaged. Bagging reduces variance and stabilizes the prediction. Figure 5 depicts an overview of the bagging ensemble learning method.
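The bagging procedure just described maps directly onto scikit-learn's `BaggingClassifier`; the sketch below is a minimal illustration on synthetic data, with estimator count and split chosen by us rather than by any surveyed paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=12, weights=[0.75, 0.25],
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

# 30 base learners (decision trees by default), each fit on a bootstrap
# sample drawn with replacement; their predicted probabilities are averaged
bag = BaggingClassifier(n_estimators=30, random_state=7)
bag.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, bag.predict_proba(X_te)[:, 1])
print(f"bagged AUC: {auc:.3f}")
```

Averaging many high-variance trees trained on different bootstrap samples is exactly the variance-reduction effect described above; random forest is this idea plus per-split feature subsampling.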
5.4 Boosting Ensemble Learning
Boosting is another type of ensemble method in which the weak learners learn in sequence and adaptively enhance the model's performance. Boosting increases the weight of wrongly predicted data points, and each resulting model is assigned a weight during training. Figure 6 shows the boosting ensemble method.
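A minimal AdaBoost sketch illustrates the sequential re-weighting just described; again the data is synthetic and the parameter choices are ours.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=12, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# 50 sequential weak learners; after each round, AdaBoost increases the
# weights of misclassified samples so the next learner focuses on them,
# and each learner is weighted by its accuracy in the final vote
boost = AdaBoostClassifier(n_estimators=50, random_state=3)
boost.fit(X_tr, y_tr)
acc = accuracy_score(y_te, boost.predict(X_te))
print(f"boosted accuracy: {acc:.3f}")
```

Unlike bagging's parallel, variance-reducing averaging, boosting trains its learners one after another and primarily reduces bias.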
Khan [24] explored a hybrid ensemble learning technique combining the AdaBoost and bagging ensemble approaches with Naïve Bayes, support vector machine, and random forest classifiers on the PROMISE datasets. They compared the outcomes of the studied models and concluded that AdaBoost-SVM and bagging-SVM gave the best performance among the methods.
From our study, we observed that most of the studied papers used bagging and boosting as ensemble learning methods in software defect prediction [4, 13, 24,25,26,27]. Mangla et al. [5] used a sequential ensemble model based on a neural network and compared the performance of their proposed model with other ensemble methods such as bagging, boosting, and stacking. Laradji et al. [13] used the average probability ensemble (APE) method in two variants to build a system more robust to data imbalance and feature redundancy. Figure 7 shows the distribution of the selected ensemble method-based papers.
5.5 Metrics Used in Software Defect Prediction
5.5.1 Software Metrics
Software metrics are features extracted from the static source code. These features are helpful, easy to obtain, and extensively used. The data are module-based and mainly comprise McCabe and Halstead features extracted from the source code. A practical defect prediction model considers only the best metrics and discards those that may hurt its predictive performance. McCabe [8] argued that code containing complex pathways is more prone to errors; hence, his metrics reflect the pathways within a code module. Commonly used McCabe metrics are LOC, cyclomatic complexity, essential complexity, and design complexity. Halstead [9] argued that hard-to-read code is more error-prone, so his metrics count the various constructs in a module to determine its complexity. Table 4 lists the software metrics.
The widely used Halstead metrics are the base Halstead metrics (number of unique operators, number of unique operands, total number of operators, total number of operands, length, vocabulary) and the derived Halstead metrics (volume, potential minimum volume, program level, difficulty, effort, and time) [10]. Our study noted that all the selected papers used the software metrics mentioned above in their defect prediction models.
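The derived Halstead metrics follow directly from the four base counts via the standard formulas (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n, difficulty D = (n1/2)(N2/n2), effort E = D·V, time T = E/18). The sketch below computes them for hypothetical counts; the input values are made up for illustration.

```python
import math

def halstead(n1, n2, N1, N2):
    """Derived Halstead metrics from the four base counts:
    n1/n2 = unique operators/operands, N1/N2 = their total occurrences."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)        # V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)              # D = (n1/2) * (N2/n2)
    effort = difficulty * volume                   # E = D * V
    time = effort / 18                             # classic 18 mental ops/sec
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "difficulty": difficulty, "effort": effort, "time": time}

# Hypothetical base counts for one module
m = halstead(n1=10, n2=6, N1=25, N2=15)
print({k: round(v, 2) for k, v in m.items()})
```

For these counts the module's volume is 160 and its difficulty 12.5, the kind of per-module values that populate the feature columns of a defect data set.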
5.5.2 Performance Measure Metrics
Performance measure metrics measure the model’s predictive performance. Table 5 furnishes some performance measure metrics used in software defect prediction.
Our study observed that the area under the receiver operating characteristic curve (AUC) is the most widely used performance measure, as the skewness of defect data does not affect AUC. AUC is followed by accuracy as the second most widely used measure. Other popular performance measures include precision, specificity, G-mean, F-measure, performance variance, and error rate. Figure 8 represents the distribution of studied works by the performance measure metrics used.
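These measures are straightforward to compute from a model's scores and thresholded predictions; the sketch below evaluates a tiny made-up result set (6 NDP, 4 DP modules) with scikit-learn, deriving specificity and G-mean from the confusion matrix since they have no built-in scorer.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Toy scores for 10 modules: 6 NDP (label 0) and 4 DP (label 1)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.55, 0.3])
y_pred = (y_score >= 0.5).astype(int)        # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)
g_mean = np.sqrt(recall_score(y_true, y_pred) * specificity)
auc = roc_auc_score(y_true, y_score)         # threshold-free, skew-insensitive
acc = accuracy_score(y_true, y_pred)

print("AUC:", auc, "accuracy:", acc,
      "precision:", precision_score(y_true, y_pred),
      "F-measure:", f1_score(y_true, y_pred),
      "G-mean:", round(float(g_mean), 3))
```

Note that AUC is computed from the raw scores rather than the thresholded predictions, which is why it is insensitive to the chosen cut-off and to class skew.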
5.6 Limitations of Existing Research
The research gap in the existing works lies in the lack of techniques that are universally acceptable across different datasets and classifiers, even though past research has tried several feature selection techniques. Therefore, an approach is needed that finds the right balance of features, using a suitable feature selection technique, to enhance the model's performance across different datasets and classifiers. In addition, the high complexity of ensemble-based defect prediction models remains an unaddressed challenge that is crucial to bringing down the maintenance cost.
6 Conclusion
This literature survey aimed to comprehensively study the different machine learning techniques used in software defect prediction. The study found that though defect prediction has come a long way, it still lacks a suitable approach for selecting appropriate features while discarding irrelevant ones, as no universally accepted feature selection technique is available. This work discussed the different feature selection/reduction techniques used to enhance the performance of defect prediction systems. It placed a strong emphasis on ensemble learning methods and observed that prediction models based on ensemble learning give better performance than individual classifier-based models. This work also discussed the software and performance measurement metrics widely used in defect prediction. This concise survey would guide future researchers in this emerging research area. In future, we aim to introduce a hybrid feature selection technique combining the filter and wrapper approaches into an ensemble learning-based defect prediction model to enhance its predictive performance.
References
Jena AK, Das H, Mohapatra DP (eds) (2020) Automated software testing: foundations, applications and challenges. Springer Nature
Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology. RTI Project, 7007(11):1–309
Mäntylä MV, Lassenius C (2008) What types of defects are discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448
Saifan AA, Abu-wardih L (2020) Software defect prediction based on feature subset selection and ensemble classification. ECTI Trans Comput Inform Technol (ECTI-CIT) 14(2):213–228. https://doi.org/10.37936/ecti-cit.2020142.224489
Mangla M, Sharma N, Mohanty SN (2021) A sequential ensemble model for software fault prediction. Innov Syst Softw Eng 1–8
Zhou T, Sun X, Xia X, Li B, Chen X (2019) Improving defect prediction with deep forest. Inf Softw Technol 114:204–216
Sharmin S, Arefin MR, Abdullah-Al Wadud M, Nower N, Shoyaib M (2015) SAL: an effective method for software defect prediction. In: 2015 18th International conference on computer and information technology (ICCIT). IEEE, Dec 2015, pp 184–189
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 4:308–320
Halstead MH (1977) Elements of software science (operating and programming systems series). Elsevier Science Inc.
Mall R (2018) Fundamentals of software engineering. PHI Learning Pvt. Ltd.
Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, May 2016, pp 309–320
Tantithamthavorn C, Hassan AE (2018) An experience report on defect modelling in practice: Pitfalls and challenges. In: Proceedings of the 40th international conference on software engineering: software engineering in practice, May 2018, pp 286–295
Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf Softw Technol 58:388–402
Afzal W, Torkar R (2016) Towards benchmarking feature subset selection methods for software fault prediction. In: Computational intelligence and quantitative software engineering. Springer, Cham, pp 33–58
Ghotra B, McIntosh S, Hassan AE (2017) A large-scale study of the impact of feature selection techniques on defect classification models. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, May 2017, pp 146–157
Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151(1–2):155–176
Hall MA (1999) Correlation-based feature selection for machine learning
Sabharwal S, Nagpal S, Malhotra N, Singh P, Seth K (2018) Analysis of feature ranking techniques for defect prediction in software systems. In: Quality, IT and business operations. Springer, Singapore, pp 45–56
Huda S, Alyahya S, Ali MM, Ahmad S, Abawajy J, Al-Dossari H, Yearwood J (2017) A framework for software defect prediction and metric selection. IEEE Access 6:2844–2858
Balogun AO, Basri S, Abdulkadir SJ, Hashim AS (2019) Performance analysis of feature selection methods in software defect prediction: a search method approach. Appl Sci 9(13):2764
Kondo M, Bezemer CP, Kamei Y, Hassan AE, Mizuno O (2019) The impact of feature reduction techniques on defect prediction models. Empir Softw Eng 24(4):1925–1963
Balogun AO, Basri S, Capretz LF, Mahamad S, Imam AA, Almomani MA, Adeyemo VE, Alazzawi AK, Bajeh AO, Kumar G (2021) Software defect prediction using wrapper feature selection based on dynamic re-ranking strategy. Symmetry 13(11):2166
Kumar A, Kumar Y, Kukkar A (2020) A feature selection model for prediction of software defects. Int J Embedded Syst 13(1):28–39
Khan MZ (2020) Hybrid ensemble learning technique for software defect prediction. Int J Modern Educ Comput Sci 12(1)
Alsawalqah H, Hijazi N, Eshtay M, Faris H, Radaideh AA, Aljarah I, Alshamaileh Y (2020) Software defect prediction using heterogeneous ensemble classification based on segmented patterns. Appl Sci 10(5):1745
Malhotra R, Jain J (2020) Handling imbalanced data using ensemble learning in software defect prediction. In: 2020 10th International conference on cloud computing, data science & engineering (confluence). IEEE, Jan 2020, pp 300–304
Mehta S, Patnaik KS (2021) Improved prediction of software defects using ensemble machine learning techniques. Neural Comput Appl 1–12
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Anand, K., Jena, A.K. (2023). Software Defect Prediction: An ML Approach-Based Comprehensive Study. In: Bhateja, V., Mohanty, J.R., Flores Fuentes, W., Maharatna, K. (eds) Communication, Software and Networks. Lecture Notes in Networks and Systems, vol 493. Springer, Singapore. https://doi.org/10.1007/978-981-19-4990-6_46