1 Introduction

The software industry has seen enormous growth because software is now used extensively in daily life. The size, and consequently the complexity, of software modules is increasing rapidly. This growth raises customer expectations that software be both reliable and secure. Creating error-free, fully reliable software is practically impossible under budget and time constraints, so the strategy for dealing with software faults must be planned carefully across the software life cycle. Faults that are not removed cause quality failures and cost escalation. Software Quality Assurance (SQA) is an important process for achieving the required software quality at minimal cost. It typically includes activities such as formal code inspections, code walkthroughs, software testing and software fault prediction. Software fault prediction has become an integral step in the software life cycle, identifying fault-prone modules in the early stages of development, before testing begins [1, 2].

The objective of software fault prediction is to identify software faults before the testing phase using software attributes or metrics. Fault prediction models are built from previous releases of similar projects and then used to predict faults in the current software. By predicting faulty components, software fault prediction helps allocate SQA resources economically and efficiently. Identifying fault-prone modules in the early phases of development improves the quality of the software system [3] and allows testers to focus on fault-prone components first. This requires estimating software quality indicators such as fault percentage, required effort, testability, maintainability and reliability at the initial development phase. Various approaches have been proposed for predicting fault-prone and non-fault-prone modules using Halstead and McCabe metrics [1].

Various software metrics and approaches are available for fault prediction [4]. A number of approaches have been proposed for software fault detection, many of which classify software modules into faulty and non-faulty categories. The aim is to exploit the success of machine learning classification techniques, including ANN, SVMs, Linear Regression (LR), Decision Tree, NB, Genetic Programming and Random Forest (RF), for software fault detection. Some researchers have used ensemble methods for classifying faulty modules. In most cases, however, the results of these classifiers do not generalize to large-sized software. The performance of the abovementioned classifiers is severely affected by data quality: biased datasets [5], the curse of dimensionality [6] and the class imbalance problem [7]. The dimensionality problem is caused by many unnecessary features (i.e., software metrics) and can be addressed by feature selection. The class imbalance problem arises when one class is heavily over-represented and is handled by instance sampling (or reduction), which samples a subset of the instances from the majority class. Although both techniques have proved effective, few researchers have combined feature selection and instance reduction to improve data quality in software fault prediction [8, 9].

The datasets used for software fault prediction contain two class categories: “fault-prone” and “non-fault-prone.” The fault-prone category is generally under-represented. Developing an efficient software quality model therefore requires that both classes be predicted reliably so that software quality can be improved. An effective classification model must detect both fault-prone and non-fault-prone classes with high precision and accuracy. In practice, however, non-fault-prone classes are predicted with high accuracy while fault-prone classes are predicted with much lower accuracy. This problem is generally caused by the imbalanced nature of the training data: because the datasets are heavily imbalanced, the poor recognition of fault-prone classes is easily masked and classifiers appear to achieve good overall accuracy. Such behavior can be misleading and may lead to wrong decisions, causing losses and damaging the reputation of a software organization. Low prediction accuracy on fault-prone classes is unacceptable because it leads to a poor-quality software product; fault-prone classes require sufficient resources to be accurately constructed, scrutinized, demonstrated and tested [10], and they must be handled properly to prevent faults in future versions of the software. Keeping these issues in mind, the objectives of this paper are (1) to study the effect of software fault prediction dataset size on the performance of the most widely used machine learning classifiers, (2) to handle the class imbalance problem for software fault prediction and (3) to apply an effective approach for feature selection and dimensionality reduction using Fisher linear discriminant analysis (FLDA).

In this paper, the aforementioned problems are addressed by proposing an efficient algorithm that handles both dimensionality reduction and the class imbalance problem. We explore the effect of data sampling techniques and a dimensionality reduction technique on the performance of machine learning classifiers. First, the most widely used machine learning classifiers are applied to a large number of publicly available small-, medium- and large-sized datasets. Two standard sampling techniques, SMOTE and Resample with replacement, are used to handle the class imbalance problem [11]; after sampling, the behavior of the machine learning approaches clearly improves. FLDA is then used to select the most discriminative software metrics and reduce dimensionality. FLDA is a supervised dimensionality reduction approach that selects the most discriminative features with respect to the class labels, and we exploit this property to improve performance. To the best of our knowledge, FLDA has not previously been used for software fault detection. Using FLDA significantly improves the performance of machine learning classifiers for datasets of all sizes, and the classifiers produce exceptional results when Resample with replacement and FLDA are used together. This paper makes the following contributions:

  • We analyze the performance of the most widely used machine learning classifiers on software fault prediction datasets of different sizes.

  • An FLDA-based dimensionality reduction approach is proposed to select the most discriminative software metrics for software fault prediction.

  • We explore the performance of two state-of-the-art sampling techniques in combination with FLDA; the resulting improvements are significant.

  • We verify the generalizability of the proposed method by evaluating it on 15 publicly available datasets covering small-, medium- and large-sized software.

  • The proposed method not only correctly identifies non-faulty modules but also correctly classifies faulty modules.

The rest of the paper is organized as follows: Sect. 2 reviews related work on classification, feature selection and the class imbalance problem for software fault prediction. Section 3 explains the proposed methodology for software fault prediction. Section 4 presents the experimental setup, and Sect. 5 discusses the results, followed by the conclusion.

2 Related work

A considerable amount of work has been done in the field of software fault prediction. The related work is divided into three sections according to the objectives of this paper: the first covers the literature on software fault prediction, the second covers the class imbalance problem, and the third covers feature selection for software fault prediction.

2.1 Software fault prediction

Regarding fault prediction techniques, studies have reported the use of generalized linear regression, Poisson regression, negative binomial regression, genetic programming, and neural networks.

Graves et al. [12] performed several experiments to predict the number of faults using generalized linear regression (GLR). The experiments were carried out for a large telecommunication company using various software change history metrics. They proposed two different models for predicting the number of faults: first, a stable model that used many “past faults” for fault prediction, and second, a GLR-based model built on change history metrics. A comparison of the two showed that the GLR-based model outperformed the stable model. The combination of “module age,” “module changes” and “lifespan of changes” metrics produced the best results for software fault prediction, whereas metrics such as module size and complexity performed poorly. The primary aim of the study was to evaluate different change history metrics and their combinations for software fault prediction using GLR; other machine learning prediction algorithms were not explored.

Another study critically analyzed the use of negative binomial regression (NBR) to predict fault density and the number of faults [13]. The experiments were performed on two industrial projects using Lines of Code (LOC) and various file characteristics, and the results suggested that NBR performed well for software fault prediction [14]. Later, a simpler fault prediction model based only on the LOC metric produced results comparable to NBR; it required less effort to build while still generating accurate predictions. Evaluation was based on the faults found in the top 20% of files predicted to be fault-prone [13, 15]. Further studies using NBR have been reported by Janes et al. [16], who designed an NBR model [15] for a telecommunication system using object-oriented metrics to predict fault counts and claimed that NBR produced comparatively better results. Yu [14] performed a comparative analysis of NBR and logistic regression for fault prediction: logistic regression performed better for predicting fault-prone software modules, whereas NBR performed better for predicting multiple faults in a module.

Ensemble methods have been widely used for software fault prediction in recent years, especially for binary classification. Misirli et al. [17] presented an ensemble method for software fault prediction that combined three techniques: Naïve Bayes, ANN, and Voting Feature Intervals. The results suggested that the ensemble classifier produced significantly better prediction accuracy than the Naïve Bayes classifier alone. In another study, Zheng [18] presented a comparative analysis of three cost-sensitive boosting neural networks for software fault prediction and reported that cost-sensitive neural networks achieved significantly more accurate defect prediction. Twala [19] assessed ensemble classification methods for predicting faults in a large space system, using five fault prediction approaches as base learners; the results suggested that the ensemble methods consistently improved prediction accuracy compared with the individual classifiers.

Wang et al. [20] presented a comparative analysis of various ensemble methods against Naïve Bayes for software defect prediction; according to their results, the ensemble methods RF and voting produced better prediction accuracy. Other studies, such as [4, 21], have proposed ensemble-based approaches for fault prediction and compared their performance with other fault prediction techniques. Aljaman and Alish [4] used bagging and boosting for software fault prediction and obtained better performance than the individual classifiers. Khoshgoftaar et al. [21] evaluated three combination techniques using ensemble methods for predicting software quality in 2003 and found that the combination approaches yielded more efficient prediction performance. Ensemble methods have also been investigated recently for predicting software maintenance and change effort [4]; that study was evaluated on two publicly available datasets using design-level software metrics, and the ensemble method again gave better prediction accuracy than the individual classifiers.

2.2 Class imbalance problem in software fault prediction

Handling the class imbalance problem is essential for developing an efficient model. A brief overview of the nature of imbalanced learning and its associated issues is given in [4]. According to that overview, accuracy-driven learning, skewed class distributions and unequal error costs are the primary causes of the poor performance of many learning methods on imbalanced data. The broad categories of methods suggested for handling class imbalance are: (1) sampling methods, (2) cost-sensitive methods, (3) active learning and kernel-based methods, (4) ensemble learners, (5) specific evaluation metrics, (6) incorporation of human knowledge, (7) data segmentation, (8) non-greedy search methods, (9) an effective inductive bias and (10) other methods such as one-class classification or novelty detection. It is worth mentioning that some classification methods do not assume a balanced class distribution and have been widely used to develop classification models for imbalanced datasets in quality engineering. However, these methods are rarely used in software engineering because of the difficulty of determining a suitable threshold for an efficient classification process.

In recent years, many applications have faced the class imbalance problem, notably sentiment analysis, fraud detection, video mining, text mining, churn prediction and various bioinformatics applications. Researchers have explored several methods to handle class imbalance for software defect prediction. Wang and Yao [22] conducted a study using threshold moving, resampling, and ensembles to predict software defects from imbalanced datasets [23]. Seiffert et al. [24] explored sampling methods to improve the performance of software fault prediction models. Seliya and Khoshgoftaar [6] explored cost-sensitive learning with decision trees to develop software defect prediction models, taking the misclassification cost as the key parameter for model training. Galar et al. [25] and Rodriguez et al. [26] compared cost-sensitive, sampling and ensemble learning methods for software defect prediction on imbalanced data. Gao et al. [6] suggested that combining feature selection with sampling techniques improves the performance of software fault prediction.

2.3 Feature selection approaches

In this section, we discuss the importance of feature selection in the context of software fault prediction and then review previous work on feature selection and dimensionality reduction. Software fault prediction is an important step for identifying faulty software modules, and many researchers use machine learning classification models to predict them. These models require training data collected from previous projects in which faulty modules have been identified. It is widely acknowledged that dataset quality plays an important role in increasing the prediction accuracy of a classification algorithm, and the performance of machine learning algorithms can be further improved by data preprocessing, which includes feature selection and instance reduction. Feature selection consists of identifying and discarding irrelevant features from a dataset so that only discriminative features are used to train classification models. The widely used feature selection methods are categorized as filter-based and wrapper-based: filter-based methods select the most relevant features based on the correlation between features and class labels, whereas wrapper-based methods require feedback from the classification model and select the feature vector iteratively, which may lead to high computational complexity. Researchers have compared filter- and wrapper-based feature selection for software fault prediction. Shivaji et al. [26] evaluated five feature selection methods, three filter-based ranking methods and two wrapper-based methods, using Naïve Bayes and SVM. All the feature selection methods improved the performance of software fault prediction, and the improvement was comparable across classifiers.

Gao et al. [6] compared feature selection methods for predicting faulty software modules in a large legacy telecommunication system, using seven filter-based methods and three wrapper-based methods with greedy search. Their results suggested that removing 85% of the software metrics did not hurt prediction accuracy and in some cases even improved it. Wang et al. [22] presented a comparative evaluation of seventeen ensembles of eighteen feature ranking methods; their results suggested that using only a few rankers (i.e., 2–4) improved the results. Dimensionality reduction is the process of selecting a subset of representative features (software metrics) to train a classification model [11], and researchers have recently applied various dimensionality reduction techniques to software fault prediction. Dimensionality reduction removes redundant features, while random sampling is an effective method for instance reduction that also helps minimize the impact of imbalanced class distributions [6].

Experimental results have suggested that a combination of random under-sampling and Naïve Bayes yields good performance on highly imbalanced data. Pelayo et al. [35] presented a comparative analysis of random under-sampling and oversampling on six software datasets; their statistical analysis suggested that under-sampling improved the prediction performance of classification algorithms. Khoshgoftaar et al. [6] discussed the effects of random sampling combined with other data preprocessing methods, including feature ranking, and their results also confirmed the effectiveness of random sampling for imbalanced datasets. Until recently, only a few researchers had combined feature selection with sampling for data preprocessing in software fault prediction. Liu et al. [37] combined feature selection methods with instance sampling for software fault prediction; however, the purpose of the instance sampling was to reduce the total number of instances rather than to handle class imbalance.

In this study, we address all the aforementioned issues related to software fault detection. The suitability of different machine learning classifiers is explored for datasets of various sizes, and the best four classifiers are selected based on their performance. SMOTE and Resample methods are used to handle the class imbalance issue, and FLDA is incorporated with these sampling methods to handle the feature selection problem.

3 Proposed methodology

The proposed methodology for software fault prediction using preprocessing and FLDA is outlined in Fig. 1. Each step is explained in the following sections.

Fig. 1 Proposed methodology for software fault prediction

3.1 Fault prediction techniques

This section presents the machine learning classifiers used in this study. Since one of the primary objectives is to explore the suitability of different machine learning classifiers for software fault prediction, nine of the most widely used classifiers are examined for software fault detection: Bayesian, Naïve Bayes (NB), ANN, SVM, KNN, AdaBoost, Bagging, Zero R, and RF. Details of these classifiers are presented in Table 1. The best four classifiers, selected based on performance, are used to evaluate the proposed system; their design details are given in this section.

Table 1 Details of classifiers used for software fault detection

3.1.1 Support vector machines (SVM)

In general, SVM produces excellent results for balanced datasets, but it is sensitive to imbalanced datasets, for which it produces sub-optimal results. It has been observed that the separating hyperplane learned by an SVM from imbalanced data is often skewed toward the minority class, which leads to sub-optimal results for that class. The SVM tries to maximize the margin while minimizing the penalty term associated with misclassification, and the same cost value, C, is used for both classes. To keep the penalty term low, the number of misclassifications must be reduced. For an imbalanced dataset, the density of the majority class is higher than that of the minority class, even around the separating hyperplane. Ideally, the hyperplane should lie midway between the two classes, but because the minority class has fewer objects, they tend to lie farther from the separating hyperplane, and the hyperplane shifts (skews) toward the minority class to reduce misclassifications. This shift increases the chance of false negative predictions, and for extreme class imbalance SVM can produce a high number of false negatives. SVM is a family of related supervised learning methods with the special property of simultaneously minimizing the empirical classification error and maximizing the geometric margin; for this reason it is also known as a maximum-margin classifier.

SVM can easily deal with a high-dimensional input space and is appropriate for problems with sparse instances. If binary classification is treated as a linearly separable problem, there can be many possible decision boundaries. In SVM, the decision boundary must be as far as possible from the data of both classes, and training focuses on finding the hyperplane that separates the positive and negative training examples. A binary SVM classifies a new vector \( d^{\prime} \) into a class by the rule:

$$ \mathop \sum \limits_{j = 1}^{{n_{sv} }} \alpha_{j} y_{j} d_{j}^{T} d^{{\prime }} + b, $$
(1)

Here, \( n_{sv} \) is the total number of support vectors, the factor \( \alpha_{j} \) weights the support vectors that define the decision boundary, \( y_{j} \in \{ +1, -1\} \) denotes the class labels and \( d_{j} \) denotes the training vectors. The new vector \( d^{{\prime }} \) is classified as class + 1 if the sum is positive and as − 1 otherwise. SVM can also handle a nonlinear decision boundary by using the kernel trick [27]: the input space is transformed into a higher-dimensional feature space, where a linear operation corresponds to a nonlinear operation in the original input space, which simplifies classification. This transformation is described as:

$$ \phi :X \to F $$

Here, X and F denote the input space and the feature space, respectively. An example of the degree-two polynomial kernel transformation is:

$$ \phi (x_{1} ,x_{2} ) \to \left( {x_{1}^{2} ,x_{2}^{2} ,\sqrt 2 x_{1} x_{2} ,\sqrt 2 x_{1} ,\sqrt 2 x_{2} ,1} \right) $$

Once the transformation is done, the above equation can be rewritten as:

$$ \sum\limits_{j = 1}^{{n_{sv} }} {\alpha_{j} y_{j} \phi^{T} (d_{j} )\,\phi (d^{{\prime }} ) + b} $$
(2)

where \( K\left( {d_{j} ,d^{{\prime }} } \right) = \phi^{T} \left( {d_{j} } \right)\phi \left( {d^{{\prime }} } \right) \) is a symmetric positive-definite function of two variables known as the kernel function. Commonly used kernel functions include the linear, polynomial and radial basis function kernels [33].
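To make the above concrete, the following sketch (our illustration, not part of the original study) trains a kernel SVM with scikit-learn on a synthetic imbalanced dataset; the class_weight option shown here is one common way to raise the misclassification cost for the minority class and counteract the hyperplane skew described above.

```python
# Minimal sketch: kernel SVM on an imbalanced two-class dataset (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Synthetic stand-in for a fault dataset: ~10% "fault-prone" (label 1), ~90% "non-fault-prone".
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" raises the penalty for minority-class errors,
# counteracting the hyperplane skew discussed above.
clf = SVC(kernel="rbf", C=1.0, class_weight="balanced")
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```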

3.1.2 Random forest

RF is one of the most popular machine learning algorithms for classification, owing to its good performance compared with other classifiers. It is an ensemble technique that grows many weak classifiers (decision trees) and classifies instances by aggregating their votes. RF takes advantage of two powerful techniques: bagging and random feature selection. In bagging, trees are built from bootstrap samples drawn from the training data, and a top-down induction procedure favors the diversity of the ensemble for each random tree; the prediction is then made by majority vote. To build each tree, only a subset of the original features is considered, d ≪ D, where d is the size of the subset and D is the length of the complete feature vector. RF randomly selects among these features at each node while building a tree. Each tree is grown to its full depth and no pruning is applied. Finally, classification is performed by majority vote.
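The sketch below (illustrative only, using scikit-learn rather than any implementation from the study) shows how the two ingredients described above, bootstrap sampling and random feature subsets of size d ≪ D, map onto standard random forest parameters.

```python
# Minimal sketch: random forest = bagged trees + random feature subsets + majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree is grown on a bootstrap sample of the training data (bagging) and
# considers only max_features candidate metrics at each split (d << D);
# trees are grown to full depth (no pruning) and the prediction is a majority vote.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", bootstrap=True, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:5]))
```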

3.1.3 Multi-layer perceptron

An ANN, or neural network, is a machine learning classifier inspired by the human nervous system. It is formed by interconnecting many groups of artificial neurons, which use a connectionist approach to compute information. An ANN is an adaptive classifier that adapts to the information passing through the network. The network is a collection of parallel elements called nodes and is trained by adjusting the weights of the connections between nodes based on the difference between the network output and the target class. Training stops when this difference is minimal or zero, meaning the output matches the target. ANN is a supervised learning classifier and is widely used in pattern recognition due to its ability to store knowledge. ANNs are categorized into single-layer perceptrons and multi-layer perceptrons (MLP): a single-layer perceptron uses a single layer of weights, so the input is directly connected to the output, whereas an MLP uses multiple layers, namely an input layer, hidden layers, and an output layer. The proposed work uses an MLP-based ANN trained with the backpropagation algorithm.
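As a hedged illustration of the MLP setup described above (the exact architecture and training parameters used in the study are not specified here), the following scikit-learn sketch trains a small backpropagation-based MLP on synthetic data.

```python
# Minimal sketch: multi-layer perceptron trained with backpropagation (illustrative only).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)  # MLPs are sensitive to feature scale

# One hidden layer; the weights are adjusted iteratively to shrink the gap
# between the network output and the target class, as described above.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```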

3.1.4 Naïve Bayes

The Naïve Bayes classifier is based on Bayes theorem with an independence assumption and is appropriate for high-dimensional inputs. Using Bayes theorem, the probability that a document d belongs to class \( C_{j} \) is calculated as:

$$ P(C_{j} |d) = \frac{{P(C_{j} )P(d|C_{j} )}}{{P(d)}} $$
(3)

Here \( P(C_{j} |d),P(C_{j} ),P(d|C_{j} ) \) and \( P(d) \) are called the posterior, prior, likelihood, and evidence, respectively.

The Naïve Bayes classifier assumes that the features are conditionally independent given the class. For example, the words (features) that appear in a document are assumed to be independent of one another [75]. For a set of features \( \omega_{1} , \ldots ,\omega_{h} \), the numerator of Eq. (3) can then be written as:

$$ \begin{aligned}P(C_{j} )P(d|C_{j} ) & = P(C_{j} )P(\omega_{1} , \ldots ,\omega_{h} |C_{j} )\\ & = P(C_{j} )P(\omega_{1} |C_{j} )P(\omega_{2} , \ldots ,\omega_{h} |C_{j} ,\omega_{1} )\\ & = P(C_{j} )P(\omega_{1} |C_{j} ) \cdots P(\omega_{h} |C_{j} ,\omega_{1} , \ldots ,\omega_{h - 1} )\end{aligned} $$
(4)

According to the naïve assumption, every feature \( \omega_{i} \) is conditionally independent of every other feature \( \omega_{j} \) for

$$ j \ne i,\;{\text{i}}.{\text{e}}.,\;P\left( {\omega_{i} |C_{j} ,\omega_{j} } \right) = P(\omega_{i} |C_{j} ) $$

We can simplify Eq. (4) this way:

$$ \begin{aligned}P(C_{j} )P(d|C_{j} ) & = P(C_{j} )P(\omega_{1} |C_{j} )P(\omega_{2} |C_{j} ) \cdots P(\omega_{h} |C_{j} )\\ & = P(C_{j} )\mathop \prod \limits_{x = 1}^{h} P(\omega_{x} |C_{j} )\end{aligned} $$
(5)

Substituting Eq. (5) into Eq. (3), we get

$$ P\left( {C_{j} |d} \right) = \frac{{P\left( {C_{j} } \right)\mathop \prod \nolimits_{x = 1}^{h} P(\omega_{x} |C_{j} )}}{{P\left( {\omega_{1} , \ldots ,\omega_{h} } \right)}} $$
(6)
$$ P\left( {C_{j} |i} \right) = \frac{{P\left( {C_{j} } \right)\mathop \prod \nolimits_{t = 1}^{H} \mathop \prod \nolimits_{x = 1}^{{\left| {d_{x} } \right|}} P(\omega_{tx} |C_{j} ,H_{t} )}}{{P(i)}} $$
(7)

Here, \( P\left( {\omega_{tx} |C_{j} ,H_{t} } \right) \) is the estimated probability of word \( \omega_{tx} \) (the xth word in slot t) given class \( C_{j} \) and slot type \( H_{t} \). To avoid zero probabilities, Laplace smoothing is used [75].
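The following sketch illustrates the Naïve Bayes prediction of Eq. (6) on numeric software metrics using scikit-learn's Gaussian variant; this is our illustrative choice and not necessarily the exact Naïve Bayes variant used in the study.

```python
# Minimal sketch: Naive Bayes classification under the conditional-independence
# assumption of Eq. (6) (illustrative only; Gaussian likelihoods for numeric metrics).
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

nb = GaussianNB()
nb.fit(X, y)
# predict_proba returns P(C_j | d) for each class, i.e., the posterior of Eq. (3).
print(nb.predict_proba(X[:3]))
```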

3.2 Data preprocessing

3.2.1 Resampling with replacement method

The resampling with replacement method is a bootstrapping-based approach for creating synthetic data. It draws a number of random samples, with or without replacement, and addresses the class imbalance problem by altering the original class distribution to produce a more uniform one. In software fault prediction, the fault-prone and non-fault-prone classes are not equally represented, so this method creates additional instances of the minority class by oversampling. By producing a more uniform ratio between the two classes, it helps reduce classifier bias.
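A minimal sketch of the idea, assuming scikit-learn's resample utility (the study itself does not name a library, and the exact Resample filter it uses may differ), is given below: the minority class is bootstrapped with replacement until the class distribution is uniform.

```python
# Minimal sketch: resampling with replacement to balance the class distribution
# (illustrative only; this mirrors the idea, not the exact filter used in the study).
import numpy as np
from sklearn.utils import resample

X = np.random.rand(100, 5)
y = np.array([0] * 90 + [1] * 10)   # 90 non-fault-prone, 10 fault-prone

X_min, X_maj = X[y == 1], X[y == 0]
# Draw bootstrap samples (with replacement) from the minority class until
# both classes have the same number of instances.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
print(np.bincount(y_bal))           # now 90 / 90
```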

3.2.2 Smote

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique used to handle the class imbalance problem. SMOTE oversamples the minority class to increase its number of instances, using a k-nearest-neighbor approach; in this work we use k = 5 for neighbor selection. Oversampling is performed by taking a minority-class sample and creating synthetic samples along the directions of its nearest neighbors. The number of neighbors used depends on the amount of oversampling required; for example, for 200% oversampling, only two of the five nearest neighbors are used. Generating a synthetic sample involves two steps: compute the difference between the sample under consideration and the selected neighbor, then multiply this difference by a random number between 0 and 1 and add the result to the sample under consideration. This places the synthetic sample at a random point on the line segment between the two samples and forces the decision boundary of the minority class to become more general. In this way, we handle the class imbalance problem in the software fault detection datasets.
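The following sketch reproduces the SMOTE step described above on synthetic minority samples (illustrative only, not the reference implementation): for each minority sample, one of its k = 5 nearest minority neighbors is chosen and a synthetic point is placed at a random position on the segment between them.

```python
# Minimal sketch of the SMOTE interpolation step (illustrative only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_min = rng.random((20, 5))                       # minority-class samples only

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1: first neighbor is the point itself
_, idx = nn.kneighbors(X_min)

synthetic = []
for i in range(len(X_min)):                        # 100% oversampling: one synthetic per sample
    j = rng.choice(idx[i][1:])                     # random neighbor among the k nearest
    diff = X_min[j] - X_min[i]                     # difference vector to the chosen neighbor
    synthetic.append(X_min[i] + rng.random() * diff)   # random point on the segment

X_synth = np.asarray(synthetic)
print(X_synth.shape)                               # (20, 5) new minority instances
```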

3.3 Feature selection method

3.3.1 Fisher linear discriminant analysis (FLDA)

Feature selection is the process of selecting the most discriminative features. Software fault prediction datasets contain many features, and the aim is to provide the most discriminative ones to the machine learning classifiers in order to improve their performance. For this purpose, we use a dimensionality reduction technique, which serves two purposes: it reduces the dimension of the feature vector and it selects discriminative features. In this study, we use FLDA, a supervised dimensionality reduction approach that uses the class labels to identify the most discriminative features [28]. Unlike unsupervised approaches such as principal component analysis (PCA), it selects only the features that best separate the class labels. In software fault prediction there are two classes, fault-prone and non-fault-prone, so FLDA selects only the features that help classify instances into these classes. FLDA projects high-dimensional data onto a lower-dimensional space by computing the within-class and between-class scatter matrices, denoted \( S_{W} \) and \( S_{B} \), respectively. The transformation matrix \( F_{\text{FLDA}} \) in the direction of \( W \in {\mathbb{R}}^{n} \) is obtained as:

$$ F_{\text{FLDA}} = \mathop {\text{argmax}}\limits_{w} \frac{{\left| {W^{T} S_{B} W} \right|}}{{\left| {W^{T} S_{W} W} \right|}} $$
(8)

The optimal maximizing solution can be calculated by solving this eigenvector problem:

$$ S_{B} W = \varGamma S_{W} W $$
(9)

Here Γ represents the diagonal eigenvalue matrix.
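As a minimal sketch of this step (our illustration using scikit-learn's LDA implementation, which solves the same eigenproblem as Eqs. (8)–(9)), the two-class case yields a single discriminative component onto which the software metrics are projected:

```python
# Minimal sketch: Fisher LDA as supervised dimensionality reduction (illustrative only).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

flda = LinearDiscriminantAnalysis(n_components=1)   # at most (n_classes - 1) components
X_reduced = flda.fit_transform(X, y)                # uses the class labels, unlike PCA
print(X.shape, "->", X_reduced.shape)               # (300, 20) -> (300, 1)
```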

4 Experimental setup

4.1 Datasets

In this study, we gathered datasets from the PROMISE repository, which hosts many publicly available datasets for software fault prediction from different open sources. We used sixteen publicly available datasets: Ar1, Ar3, AR4, AR5, AR6, CM1_req, jEdit-4.0_4.2, jEdit-4.2_4.3, Kc1, Kc2, Kc3, Mc2, Mw1, Pc1, Pc2, and Pc4 (http://promise.site.uottawa.ca/SERepository/datasets-page.html). As one of the objectives of this study is to handle class-imbalanced data, we intentionally selected datasets affected by class imbalance; most of the datasets used here suffer badly from this problem. We also covered different dataset sizes, in terms of lines of code and number of instances, to check the effectiveness of the proposed technique. Table 2 presents details of all datasets, and Table 3 lists the software metrics used in these datasets.

Table 2 Details for datasets used for software fault prediction
Table 3 Details for software metrics used for software fault prediction [34]

4.2 Evaluation metrics

To evaluate the performance of the fault prediction system, precision, recall, F1, and AUC are used as performance measures [29,30,31,32]. For computing these metrics, modules are first divided into two classes, correctly classified and misclassified (refer to Table 4).

Table 4 Confusion matrix: here each row represents actual class and each column represents predicted class

Precision determines whether a class detected by the system is relevant, and thus reflects the effectiveness of the fault detection system. Mathematically, it is defined as:

$$ Precision = \frac{{N_{ms} }}{{N_{s} }} \times 100{\text{\% }} $$
(10)

Recall determines the probability that a relevant class is detected. Mathematically, it is defined as:

$$ Recall = \frac{{N_{ms} }}{{N_{m} }} \times 100{\text{\% }} $$
(11)

In these equations, \( N_{ms} \), \( N_{s} \) and \( N_{m} \) denote the number of true faults spotted by the system, the total number of faults detected by the system and the number of actual faults labeled in the dataset, respectively. The F1 measure combines precision and recall into a single metric and is computed as:

$$ F1 = \frac{{2 \times Precision \times Recall}}{{Precision + Recall}} $$
(12)

The system is also evaluated using AUC, which measures the classifier's ability to distinguish correct from incorrect classes [33].
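A short sketch of how the four measures can be computed (illustrative values only, not results from the paper) is given below.

```python
# Minimal sketch: computing the four evaluation measures used in this study.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fault-prone, 0 = non-fault-prone
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]   # hard class predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities for class 1

print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
```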

5 Results and discussion

This section presents the results of all the experiments conducted in this study. First, we report results on the suitability of machine learning classifiers for software fault detection; the best four classifiers are then selected by critically analyzing the performance of all classifiers and are used in all subsequent experiments with the sampling techniques and FLDA. Several experiments are conducted to validate the proposed algorithm: experiments using only the sampling techniques SMOTE and Resample, followed by experiments combining FLDA-based feature selection and dimensionality reduction with each sampling technique. Results are compared in terms of precision, recall, f-measure and AUC to verify the effectiveness of the proposed method.

5.1 Results using different machine learning classifiers

In the first experiment, nine different machine learning classifiers are trained and tested on all datasets: Bayesian, Naïve Bayes, MLP, SVM, KNN, AdaBoost, Bagging, Zero R and RF. These classifiers were selected because of their diverse underlying assumptions. Bayesian and Naïve Bayes were chosen because both rest on strong probabilistic models, and Naïve Bayes generally works well for classification problems. SVM is used because of its good performance on binary classification and its generalizability across different problems. ANN is included due to its good performance on classification tasks, and the ensemble methods, i.e., Bagging and RF, are included to make the comparison among different types of classifiers fair. The performance of these classifiers is tested over eleven different software fault detection datasets.

The results of all these classifiers on all datasets are presented in Tables 5, 6, 7 and 8 for precision, recall, f-measure and AUC, respectively. Generally, the performance of these classifiers is good and the results are comparable to each other, with the exception of Zero R. Some classifiers, namely NB, MLP, and RF, show consistent performance across all datasets. The performance of all classifiers except KNN is unsatisfactory for some datasets, such as jEdit_4.0_4.2 and jEdit_4.2_4.3; KNN performs better on these datasets because it predicts the target class by selecting the nearest neighbors at runtime. The average classification results for Bayesian, NB, MLP, SVM, KNN, AdaBoost, Bagging, Zero R and RF are 0.81, 0.83, 0.82, 0.82, 0.81, 0.80, 0.81, 0.69 and 0.83, respectively. Since NB, MLP, and RF perform consistently well across all datasets, these three classifiers are used, along with SVM, in the further experiments; SVM is retained due to its suitability for binary classification problems.

Table 5 Precision results showing the suitability of different machine learning classifiers over various datasets
Table 6 Recall results showing the suitability of different machine learning classifiers over various datasets
Table 7 F-measure results showing the suitability of different machine learning classifiers over various datasets
Table 8 AUC results showing the suitability of different machine learning classifiers over various datasets

5.2 Results using SMOTE and Resample

In this experiment, two state-of-the-art sampling techniques are used to handle the imbalanced datasets. Many datasets used in this study are severely affected by the class imbalance problem, and SMOTE and Resample are the most widely used methods for handling imbalanced data in software fault prediction. The best four classifiers, NB, MLP, SVM, and RF, are used to evaluate the results after applying the SMOTE and Resample algorithms to all datasets. The results of these experiments are presented in Tables 9, 10, 11 and 12 for precision, recall, f-measure and AUC, respectively.

Table 9 Precision results for SMOTE and Resample using machine learning classifiers over various datasets
Table 10 Recall results for SMOTE and Resample using machine learning classifiers over various datasets
Table 11 F-measure results for SMOTE and Resample using machine learning classifiers over various datasets
Table 12 AUC results for SMOTE and Resample using machine learning classifiers over various datasets

According to the results, the performance of each classifier improved significantly with SMOTE and Resample. The average accuracies for NB, MLP, SVM and RF before applying any imbalance-handling method were 0.83, 0.82, 0.82 and 0.83, respectively. The average accuracy improved with SMOTE, although a decrease in precision was also observed in some cases for all four classifiers. Precision improved for datasets such as jEdit_4.0_4.2 and jEdit_4.2_4.3, which had performed poorly when no sampling technique was applied. MLP and RF showed the best performance among the four classifiers with SMOTE, and similar trends were observed for the recall and f-measure values.

The performance of each classifier also improved when the Resample method was used. For all datasets, the results of all four classifiers are much better with Resample than with SMOTE, and each classifier's fault prediction results are more consistent. Precision, recall, f-measure, and AUC all improved, with similar trends across the four measures. In some cases NB performs better than the remaining classifiers, although its results vary across datasets. Overall, MLP and RF give the best performance with the Resample method: MLP benefits from its ability to adjust input weights while computing outputs, and RF benefits from its ensemble nature. Surprisingly, the overall performance of the SVM classifier is not exceptional; the reason may be that SVM struggles with imbalanced data. Although the sampling methods perform well, no classifier consistently outperforms the others. To address this, further experiments were conducted that involve dimensionality-reduction-based feature selection.

5.3 Results for SMOTE-FLDA and Resample-FLDA

It has been observed that results are not consistent when only simple classification and sampling techniques are used, as shown in Tables 13, 14, 15 and 16 for precision, recall, f-measure, and AUC, respectively. One reason for this inconsistency is irrelevant and redundant features: the feature vector in each dataset contains many features that play no part in classifier training. FLDA transforms high-dimensional data into a lower-dimensional space by selecting the most discriminative features, which allows us to train the classifiers only on the important features. To make the comparison fair, we applied FLDA directly to the dataset without any sampling technique, and we also applied FLDA after each of the two sampling techniques, denoted SMOTE-FLDA and Resample-FLDA, respectively. The results suggest that Resample-FLDA gives exceptional results, whereas the performance of SMOTE-FLDA is not as good. The reason for the lower performance of SMOTE is that it creates and uses synthetic data; if the synthetic instances are inaccurate, the error propagates further.
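A hedged sketch of a SMOTE-FLDA style pipeline (our illustration; it assumes the imbalanced-learn package and scikit-learn, which the paper does not name, and applies the oversampling only to the training folds) is shown below.

```python
# Minimal sketch of a SMOTE-FLDA style pipeline (illustrative only, not the study's code).
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, weights=[0.85, 0.15], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(k_neighbors=5, random_state=0)),        # oversample the minority class first
    ("flda", LinearDiscriminantAnalysis(n_components=1)),   # then reduce dimensionality
    ("clf", RandomForestClassifier(random_state=0)),         # finally train a classifier
])
print(cross_val_score(pipe, X, y, scoring="roc_auc", cv=5).mean())
```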

Table 13 Precision results for FLDA, SMOTE-FLDA and Resample-FLDA using machine learning classifiers over various datasets
Table 14 Recall results for FLDA, SMOTE-FLDA and Resample-FLDA using machine learning classifiers over various datasets
Table 15 F-measure results for FLDA, SMOTE-FLDA and Resample-FLDA using machine learning classifiers over various datasets
Table 16 AUC results for FLDA, SMOTE-FLDA and Resample-FLDA using machine learning classifiers over various datasets

The results for Resample-FLDA show consistent improvements in precision, recall, f-measure, and AUC for all datasets. Precision for each classifier improved significantly on every dataset. In the previous experiments, the results for jEdit_4.0_4.2, jEdit_4.2_4.3, kc1, and kc2 were not very good; applying Resample-FLDA improved the results, especially for kc1 and kc2, and the results were much better than with simple SMOTE, Resample or FLDA. For kc1 and kc2, RF outperformed the remaining classifiers with both SMOTE-FLDA and Resample-FLDA, while the precision of the other three classifiers stayed below 0.80. With Resample-FLDA, the minimum precision was 0.83, obtained with NB and MLP, and the remaining classifiers performed even better.

The two proposed methods, SMOTE-FLDA and Resample-FLDA, produced strong results. Precision, recall, and f-measure improved remarkably compared with the previous methods. SMOTE-FLDA performed well for all datasets, showing that applying FLDA after SMOTE improves the performance of the system, and all four classifiers showed consistent performance. The results for Resample-FLDA are even better than those for SMOTE-FLDA: just as the simple Resample method performs better than SMOTE, Resample-FLDA performs better than SMOTE-FLDA when FLDA is applied after each sampling technique. The overall good performance of the proposed methods shows that handling imbalanced data is necessary but should be followed by dimensionality reduction and feature selection; this yields better and more consistent performance for many classifiers over datasets of different sizes.

The improvement in the performance of the machine learning classification algorithms is due to FLDA. Because the datasets are severely affected by the class imbalance problem, the classifiers are unable to correctly classify minority-class instances. FLDA helps to address this issue because it is well suited to problems where the number of instances is small relative to the number of features. Therefore, applying FLDA after SMOTE or Resample improves the performance of the classifiers. Nevertheless, the Resample method performs better than SMOTE, whether used alone or in combination with FLDA.

A comprehensive comparison with some state-of-the-art techniques is also presented in Table 17, based mostly on average AUC values; average precision, recall and f-measure are also reported. Most prior studies used a small number of datasets, and instance reduction was not handled properly in most of them. Some researchers have reported better results than the proposed system, but the size and number of datasets used were not large. In our proposed system, we handle instance reduction and the class imbalance problem efficiently and report good performance compared with individual classifiers, achieving better or comparable results even for large datasets. The performance of the proposed system over such a large number of diverse datasets is very good.

Table 17 A comparative analysis of the proposed work with state-of-the-art system

6 Conclusion

SQA is a vital yet expensive part of the software life cycle. Identifying software faults before the testing phase can save considerable maintenance cost and time, and software fault prediction helps identify fault-prone modules using software metrics. The main aim of this study was to explore the performance of machine learning classifiers for software fault prediction over small-, medium- and large-sized software. Nine of the most widely used machine learning classifiers were used for this purpose. Most classifiers perform poorly on large datasets, owing to the class imbalance problem and the irrelevant software metrics used to train them. To overcome both issues, FLDA is used in combination with two sampling techniques, SMOTE and Resample with replacement: the sampling techniques handle the class imbalance problem, and FLDA reduces the dimensionality while selecting the software metrics best suited to fault prediction. After applying these techniques, the top four classifiers are used to evaluate the effectiveness of the proposed methods over 15 publicly available datasets, with precision, recall, f-measure, and AUC as performance measures.

In this study, many experiments were conducted to evaluate the effectiveness of the proposed methods. The performance of the machine learning classifiers improved when the sampling techniques were used alone, with Resample with replacement performing better than SMOTE. The best results were achieved when FLDA was applied after each of the sampling techniques; Resample-FLDA produced the best results and outperformed all other methods (simple Resample, simple SMOTE, and SMOTE-FLDA). The reported results are better than those of many existing methods for software fault prediction. In the future, we plan to design an efficient method to predict the number of faults, which can help estimate the resources required to handle them.