1 Introduction

Software systems are becoming increasingly large and complex over time, and maintaining such systems is enormously challenging for software professionals. Some software organizations may be unable to begin new projects because most of their resources are devoted to maintaining old systems. Software maintainability prediction has therefore become an important area in software engineering. It focuses on the design and development of prediction models that forecast software maintainability while the software is still in the early stages of its development.

Knowing the high-maintainability effort classes in advance helps an organization allocate its limited resources optimally to these classes, resulting in good-quality, highly maintainable software delivered within time and budget. Over the years, there has been debate on how to measure software maintainability. Software maintainability has been viewed as a software quality attribute and defined according to numerous facets. According to Coleman et al. (1994), maintainability is defined as “the ease with which a software component can be modified to correct the existing faults after delivery.” Aggarwal et al. (2002) suggested that maintainability is an integrated measure of software characteristics such as the readability of source code, software understandability, and quality of documentation. According to Schneberger (1997), software maintainability is the degree of difficulty in understanding and performing changes in the software. Oman and Hagemeister (1994) proposed the maintainability index (MI) to assess and quantify maintainability. The MI is a polynomial equation computed from software metrics that describe characteristics such as the average Halstead volume of a program, the average cyclomatic complexity, the average number of lines of source code, and the average number of comment lines. Evaluating the polynomial yields a single number that indicates maintainability: the lower the MI value, the lower the maintainability of the software, and vice versa (Oman and Hagemeister 1994). Later, Ash et al. (1994) and Coleman et al. (1995) revised the MI proposed by Oman and Hagemeister (1994) and validated it on software written in procedural programming languages such as Pascal, C, Ada, and Fortran. However, the validity of MI for software systems implemented in object-oriented (OO) languages has not been established to the same extent. Li and Henry (1993) defined software maintainability as the number of lines of source code changed during the maintenance period to correct faults. Their study advocated that software maintainability has a strong correlation with OO metrics describing software characteristics such as inheritance, coupling, and cohesion. Later, various researchers (Van Koten and Gray 2006; Zhou and Leung 2007; Thwin and Quah 2005) measured maintainability as the number of lines of source code changed during maintenance: the more changes a class undergoes in the maintenance phase, the more maintenance effort it requires, and vice versa. The ultimate goal of developing these models is to accurately predict the software classes that require high maintainability effort. In the literature, various maintainability prediction models have been developed with statistical, ML, evolutionary, and hybridized techniques, using software metrics as predictors (Wang et al. 2009; Dagpinar and Jahnke 2003; Kumar and Rath 2015; Malhotra and Lata 2017). High-maintainability effort classes are critical for any project because they must be tested carefully to decrease the probability of faults. Such classes should also be well documented to improve understandability for future maintenance activities.
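For illustration only, one commonly cited later form of the MI (the exact coefficients vary between the original proposal and its revisions, so this equation is indicative rather than the one used in the cited studies) is

$$\mathrm{MI} = 171 - 5.2\ln(\overline{V}) - 0.23\,\overline{V(g)} - 16.2\ln(\overline{\mathrm{LOC}}) + 50\sin\!\left(\sqrt{2.4\,\overline{\mathrm{CM}}}\right)$$

where $\overline{V}$ is the average Halstead volume per module, $\overline{V(g)}$ the average cyclomatic complexity, $\overline{\mathrm{LOC}}$ the average lines of code, and $\overline{\mathrm{CM}}$ the average percentage of comment lines.
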
Software metrics ranging from procedural metrics such as the number of unique operators, the number of unique operands, and the cyclomatic complexity of a module (McCabe 1976; Halstead 1977; Schneidewind 1979) to OO metrics (Chidamber and Kemerer 1994; Henderson-Sellers 1996; Martin 2002) characterize and quantify various aspects of software systems and play a vital role in model development. The dataset used for training maintainability prediction models should contain sufficient instances of both high- and low-maintainability effort classes to train the model effectively. In reality, however, only a few software classes demand complex interventions during maintenance, resulting in more changes to the code lines, i.e., high maintainability effort. Therefore, there is an imbalance between the number of instances requiring high maintainability effort (minority class) and low maintainability effort (majority class), resulting in an imbalanced dataset. Using such imbalanced data, it is challenging to train prediction models that predict unseen data points of these classes with reasonable accuracy.

Therefore, this study is important because it deals with the development of effective SMP models that treat imbalanced datasets to predict high-maintainability effort classes accurately. Identification of high-maintainability effort classes is crucial because these classes need more attention during the software maintenance and testing phases, as they are likely to be sources of defects and future enhancements (Eski and Buzluca 2011). Appropriate distribution of resources to these classes helps enhance the quality of the software product. However, with imbalanced datasets, many ML techniques encounter enormous trouble (Chawla et al. 2004; Fawcett and Provost 1997; Kubat et al. 1998), and the prediction models obtain high prediction accuracy only for the majority class rather than for both classes. Software maintainability prediction models developed using imbalanced data have little practical significance because they may misclassify minority class (high-maintainability effort) instances. Such misclassification may lead to improper resource allocation to the misclassified classes, resulting in poor-quality software products. Accurate early prediction of high-maintainability effort classes before the software product is released helps software professionals test these classes critically. Software developers can also effectively refactor such classes to improve their maintainability.

In the software engineering domain, the imbalanced class problem has been addressed to build competent models for predicting faulty and change-prone classes (Malhotra and Khanna 2017; Choeikiwong and Vateekul 2015; Gao et al. 2015). However, no study has dealt with the imbalanced class problem in SMP. Therefore, to treat the imbalanced data problem in SMP, this study applies various data resampling methods, including oversampling, undersampling, and hybrid resampling, before learning the SMP models to improve their performance.

Data resampling: Data resampling methods modify the training dataset so that it includes a sufficient number of data points of both the minority and majority classes. These methods include oversampling, undersampling, and hybrid resampling. In oversampling techniques, new data points of the rare or minority class are produced so that the dataset contains a proportionate number of instances of the minority and majority classes. Undersampling methods work by removing some data points of the majority class to make the dataset proportionate. Hybrid resampling combines the oversampling and undersampling strategies (Kotsiantis et al. 2006).
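As an illustration of these three families, the following minimal Python sketch uses the imbalanced-learn package (the study itself used the KEEL tool); the toy dataset and its 9:1 class weights are assumptions chosen only for demonstration.

```python
# Minimal sketch of the three resampling families with imbalanced-learn.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE                 # oversampling
from imblearn.under_sampling import RandomUnderSampler   # undersampling
from imblearn.combine import SMOTEENN                    # hybrid resampling

# Toy imbalanced dataset standing in for OO-metric features (X) and the
# binary maintainability-effort label (y); roughly a 9:1 imbalance ratio.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("original:", Counter(y))

for name, sampler in [("SMOTE", SMOTE(random_state=42)),
                      ("RUS", RandomUnderSampler(random_state=42)),
                      ("SMOTE-ENN", SMOTEENN(random_state=42))]:
    X_res, y_res = sampler.fit_resample(X, y)
    print(name, Counter(y_res))
```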

The study has the following objectives:

  • To construct SMP models to predict high maintainability effort classes by treating the imbalanced datasets with data resampling techniques.

  • To assess the predictive performance of the developed SMP models and validate them statistically.

  • To investigate the improvement in the predicting performance of the built SMP models after data resampling.

We achieve the above-specified objectives by finding answers to the following research questions (RQs).

  • RQ1: What is the performance of SMP models developed using ML techniques on original imbalanced datasets?

  • RQ2a: What is the performance of SMP models developed using ML techniques after balancing the datasets with data resampling methods?

  • RQ2b: Which data resampling method improves the performance of the prediction models the most?

To answer the above research questions, we build SMP models that use OO metrics as predictors and software maintainability as the outcome. The datasets extracted from eight Apache open-source software packages are used to develop SMP models with the application of ML techniques. The stable evaluators Balance and G-mean are used in this study to evaluate the predictive performance of the SMP models. The study also conducts a statistical analysis of the constructed models to strengthen the conclusions. The organization of the remaining paper is given below:

Section 2 presents related work. Section 3 describes the research methodology. Section 4 describes the results of the study. Section 5 presents the threats to validity, and Section 6 describes the conclusions and future work.

2 Related work

We present the related work in two sections. The first section discusses the state of the art of SMP models, whereas the second section discusses studies that have addressed the class imbalance problem.

2.1 Literature work related to studies predicting software maintainability

This section discusses various studies that have proposed models to predict software maintainability. Learning techniques ranging from statistical to ML and hybridized techniques have been used to construct models by relating software metrics to maintainability. An empirical analysis of the dataset extracted from two software systems written in Java was conducted by Dagpinar and Jahnke (2003). The study revealed that coupling and size are strong maintainability predictors. The study by Elish and Elish (2009) proposed the TreeNet classifier; its outcome evidenced that OO metrics are good predictors of maintainability.

A non-linear model, projection pursuit regression, was proposed by Wang et al. (2009) to build SMP models. The study developed SMP models using OO metrics extracted from two commercial software systems. The study by Jin and Liu (2010) used clustering techniques to predict software maintainability and empirically validated OO metrics collected from software projects written in C++. A comprehensive statistical comparison of 27 different ML techniques for developing maintainability forecasting models was conducted by Kaur and Kaur (2013); the study revealed that an instance-based classifier performs best for predicting maintainability. The study by Olatunji and Ajasin (2013) proposed extreme learning machines to develop SMP models using OO metrics. The study by Zhang et al. (2015) suggested a framework, SMPlearner, in which 44 metrics collected at different hierarchy levels were employed to develop SMP models; SMPlearner was validated on eight datasets pertaining to open-source software systems. Kumar and Rath (2015) built SMP models by applying hybridized techniques; this study was carried out on two commercial datasets widely used in the literature. Wang et al. (2019) proposed a fuzzy network framework to predict software maintainability using two widely used commercial datasets, and the study advocated that the proposed framework improves the accuracy of SMP models compared with standard fuzzy-based models. Kumar et al. (2019) used class-level software metrics with three different types of neural networks to train SMP models; a genetic algorithm with a gradient descent approach was used to find the optimal weights of the neural networks. Schnappinger et al. (2019) extracted software metrics using static analysis tools and predicted software maintainability using diverse ML techniques. Thus, various models to predict software maintainability have been successfully developed and validated on software metrics. Like previous studies (Zhang et al. 2015; Kumar and Rath 2015; Wang et al. 2009; Kaur and Kaur 2013), this study predicts software maintainability using the internal characteristics of software systems. However, the imbalanced data problem has not been addressed in any of the published studies. This problem is taken care of in this investigation to develop effective prediction models to forecast software maintainability.

2.2 Literature work related to studies taking care of class imbalance problem

The imbalanced class problem arises when, in a particular dataset, the number of data points of one class is far larger than that of the other. With such datasets, many ML techniques encounter serious trouble (Laurikkala 2001). Often, with imbalanced datasets, ML techniques learn to predict only the dominant (majority) class instances, whereas instances of the class of interest (minority class) are discarded by being treated as noise (Maloof 2003). Researchers have proposed various solutions to the imbalanced class problem at the algorithm level and the data level. The data-level solutions employ various kinds of data resampling strategies to get rid of the imbalanced data issue, while the algorithmic solutions include regulating the costs of both classes to tackle the problem (Chawla et al. 2004). For predictive modeling in software engineering, the imbalanced class problem is encountered in the prediction of change-prone and defective classes, and it has been solved in different ways to uplift the performance of the predictive models. Choeikiwong and Vateekul (2015) proposed an algorithm-level solution to the class imbalance problem for software fault prediction. They implemented a classifier in which the separation hyperplane of the Threshold Adjustment Support Vector Machine (R-SVM) is adjusted to reduce the bias toward the dominant class. This study was performed on 12 datasets, and its findings showed that R-SVM improved the prediction rate of models for predicting faulty modules. Gao et al. (2015) examined four different scenarios of feature selection and data sampling to boost the predictive capability of defect prediction models developed with imbalanced datasets. This study confirmed that feature selection on resampled data improves the predictive capability of the models. Laradji et al. (2015) proposed an average probability ensemble (APE) incorporating several base classifiers to cope with the imbalanced class problem; to further improve prediction capability, feature selection was combined with APE in this study. Siers and Islam (2015) proposed a cost-sensitive classifier, an ensemble of decision trees, to tackle the problem. Pelayo and Dick (2007) investigated synthetic minority oversampling, which balances the proportion of defective and non-defective modules.

The paper by Sun et al. (2012) used ensemble and coding schemes to handle class imbalance for predicting defective classes. The study first converted the imbalanced binary-class data into balanced multiclass data using coding-based schemes and then trained the defect prediction models on this multiclass data. Tan et al. (2015) employed three data resampling approaches and updatable classification techniques to boost the predictive capability of fault prediction models learned from imbalanced class datasets. Wang and Yao (2013) investigated different methods, including resampling, ensembles, and threshold moving, to improve defect prediction models; the study also proposed a dynamic version of AdaBoost to handle the imbalanced class issue in defect prediction. A paper by Zheng (2010) proposed three cost-sensitive boosting techniques to improve the prediction rate of neural networks. Malhotra and Khanna (2017) developed software change prediction models from imbalanced data by employing three data resampling methods and meta-cost learners. In this way, various studies in the literature have dealt with class imbalance in predictive modeling in the software engineering domain to improve defect prediction and change prediction models. However, the imbalanced class issue remains unaddressed in the maintainability prediction literature; therefore, this is the first study to deal with the imbalanced class problem for SMP.

3 Research methodology

The research methodology comprises all the components of the study: the experimental design, the data resampling methods, and the ML techniques used for developing the SMP models.

3.1 Components of the empirical study

3.1.1 Predictor and response variable

Training a prediction model for a predictive task requires a dataset comprising predictor (independent) and response (dependent) variables. For software quality prediction models, the predictor variables are software metrics. Software metrics quantify various aspects of software systems and are used to predict and estimate different software characteristics (Chidamber and Kemerer 1991; Ebert and Dumke 2007; Fenton and Bieman 2014). Over the years, different software metrics (procedural and OO) have been proposed, and their relationship with software maintainability has been established.

3.1.2 Predictor variables

We use OO metrics as the independent variables to develop prediction models in this study, as the study is carried out on OO systems developed in the Java programming language. The OO metrics used in the study include the Chidamber and Kemerer (C&K) metric suite (Chidamber and Kemerer 1994), the Quality Model for Object-Oriented Design (QMOOD) metric suite (Bansiya and Davis 2002), and metrics proposed by Henderson-Sellers (Henderson-Sellers 1996) and Martin (Martin 2002). The C&K metric suite includes the metrics WMC: weighted methods per class, DIT: depth of inheritance tree, NOC: number of children of a class, CBO: coupling between objects, LCOM: lack of cohesion in methods, and RFC: response for class. The QMOOD metric suite includes the metrics MOA: measure of aggregation, DAM: data access metric, MFA: measure of functional abstraction, NPM: number of public methods, and CAM: cohesion among methods of a class. The Martin metrics are Ce: efferent coupling and Ca: afferent coupling. A few other metrics used in the study are IC: inheritance coupling, CBM: coupling between class methods, AMC: average method complexity, LOC: lines of source code, and LCOM3, a variation of LCOM given by Henderson-Sellers. These metrics describe different aspects of OO systems, namely, cohesion, coupling, size, inheritance, composition, and encapsulation.

The metrics WMC, NPM, LOC, DAM, and AMC are indicators of the size of a class. The metrics CBO, RFC, Ca, Ce, IC, and CBM measure coupling. The inheritance property is measured with the help of NOC, DIT, and MFA. The metrics LCOM, CAM, and LCOM3 are indicators of class cohesion, whereas MOA measures composition. These metrics, which quantify the different characteristics of a class, are regarded as internal quality attributes. Class qualities such as testability, reliability, reusability, and maintainability belong to a set of quality attributes called external quality attributes (Al-Dallal 2013). The present study is based on Morasca’s (2009) suggestion to predict software maintainability, an external quality attribute, by constructing probabilistic models. In this study, the prediction models use the above OO metrics as predictor variables to estimate the external quality attribute, namely, software maintainability. The internal quality attributes used have a significant relationship with software maintainability. For instance, the attributes WMC, NPM, LOC, DAM, and AMC measure the size of a class; if the size increases, the code is likely to be less maintainable, i.e., likely to require high maintainability effort (Al-Dallal 2013). Table 1 shows a brief explanation of the predictors used in this paper. Researchers have extensively used these metrics for predictive modeling in the software engineering domain (Singh et al. 2010; Kpodjedo et al. 2011; Giger et al. 2012; Gyimothy et al. 2005; Olague et al. 2007; Elish and Al-Rahman Al-Khiaty 2013), which motivates us to use them in our study. Radjenovic et al. (2013) conducted a review of 106 papers predicting defects in classes; this review revealed that C&K metrics have frequently been used for predicting faults in classes. Therefore, our study also takes C&K metrics and validates them for predicting software maintainability. The paper by Lu et al. (2012) assessed sixty-two OO metrics for estimating change-prone classes and discovered that LOC, CBO, LCOM, and CAM are significant metrics. The C&K metrics, combined with QMOOD metrics, were used by Eski and Buzluca (2011) to predict change-prone classes; the study advocated that the combination of C&K and QMOOD metrics is a competent set of predictors for classes that are likely to change in the future. Hence, our paper uses an effective combination of predictors that have been successfully validated in the literature for predictive modeling tasks in software engineering.

Table 1 OO metrics studied


Let us see how we can compute the values for a few of the OO metrics used in this study, for example, WMC and DIT. Suppose class A has three methods M1, M2, and M3 with complexities X1, X2, and X3, respectively. Then, WMC for class A is given as WMC = sum(X1, X2, X3). Now consider another metric. Take an OO program with six classes: CL1, CL2, CL3, CL4, CL5, and CL6. The classes CL2, CL3, and CL4 are derived from CL1, and CL5 and CL6 are derived from CL4. DIT computes the depth of a class in the inheritance hierarchy, so DIT(CL1) = 0, DIT(CL2) = 1, DIT(CL3) = 1, DIT(CL4) = 1, DIT(CL5) = 2, and DIT(CL6) = 2. Deeper classes add more design complexity. Table 1 presents a brief definition of the OO metrics used.
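The toy example above can be reproduced with a short sketch. This is an illustration only, not the CKJM tool used in the study, and the per-method complexities X1 = 2, X2 = 1, X3 = 4 are hypothetical values chosen for demonstration.

```python
# Illustrative computation of WMC and DIT for the toy example above.

def wmc(method_complexities):
    """WMC = sum of the complexities of a class's methods."""
    return sum(method_complexities)

def dit(cls, parent):
    """DIT = number of inheritance edges from the class up to the root."""
    depth = 0
    while parent.get(cls) is not None:
        cls = parent[cls]
        depth += 1
    return depth

# Class A with methods M1, M2, M3 of hypothetical complexities 2, 1, 4.
print(wmc([2, 1, 4]))                      # -> 7

# Inheritance hierarchy: CL2, CL3, CL4 derive from CL1; CL5, CL6 from CL4.
parent = {"CL1": None, "CL2": "CL1", "CL3": "CL1",
          "CL4": "CL1", "CL5": "CL4", "CL6": "CL4"}
print([dit(c, parent) for c in parent])    # -> [0, 1, 1, 1, 2, 2]
```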

3.2 Response variable

In this study, maintainability is measured using the change count (CC) metric, defined as CC = LOCadded + LOCdeleted + LOCmodified, where LOCadded is the number of lines of source code added during the maintenance phase, LOCdeleted is the number of lines of source code deleted during the maintenance period, and LOCmodified is the number of lines of code modified in the maintenance period. Many studies in the literature (Kumar and Rath 2015; Elish and Elish 2009; Kaur and Kaur 2013) measure maintainability in this way. The response variable in the study is maintainability, a binary variable obtained by discretizing the CC metric into two values: low maintainability effort and high maintainability effort. The low-maintainability effort classes are those that require few changes to the lines of code during the maintenance phase, whereas high-maintainability effort classes require more changes to the source code during maintenance. The study aims at developing efficient prediction models that predict high-maintainability effort classes with good accuracy. Early prediction of such classes helps to allocate resources to them optimally, thereby producing high-quality and maintainable software.
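A minimal sketch of computing CC and forming the binary response is given below; the LOC figures and the cut-off of 50 are hypothetical placeholders, since the actual discretization used in the study is described in Section 3.3.2.

```python
# Sketch of the CC metric and its binarization into the response variable.

def change_count(loc_added, loc_deleted, loc_modified):
    """CC = LOC added + LOC deleted + LOC modified during maintenance."""
    return loc_added + loc_deleted + loc_modified

def maintainability_label(cc, threshold):
    """Binary response: 'high' maintainability effort above the cut-off."""
    return "high" if cc > threshold else "low"

cc = change_count(loc_added=40, loc_deleted=10, loc_modified=25)   # -> 75
print(cc, maintainability_label(cc, threshold=50))                 # 75 high
```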

3.2.1 Software system studied

We undertook this study using OO metrics extracted from open-source software. A brief description of the software systems considered in the present study is as follows:

  • Apache Bcel (Byte Code Engineering Library) is intended to give users a convenient way to create, analyze, and manipulate Java class files (files ending with .class). An object representing a class contains all its information, such as fields, methods, and, in particular, the bytecode instructions of that class.

  • Apache Betwixt provides a way of turning beans into XML and producing digester rules automatically. Just as the BeanInfo mechanism can be used to customize the default introspection on a Java object, the digester rules can be customized on a per-type basis.

  • Apache Io is a Java library that includes numerous classes. The library enables developers to do various everyday tasks efficiently with much less effort. Different functions, such as input, output, utility classes, and comparators, are included in this library.

  • Apache Ivy is a robust tool for recording, tracking, resolving, and reporting various project dependencies. It is very flexible, configurable, and has strong integration with Apache Ant, a Java build tool.

  • Apache Jcs is a Java Caching System written in Java. It speeds up applications by managing cache data of different forms. In addition to cache data management, it provides various other features such as memory management, element grouping, and remote synchronization, to name a few.

  • Apache Lang provides various methods that the standard Java libraries do not deliver. These functions include basic numerical methods, string methods, and concurrency control.

  • Apache Log4j is a logging framework that can be configured through external configuration files dynamically. It provides a convenient way to direct logging information to various destinations such as console and database.

  • Apache Ode (Orchestration Director Engine) is a software component intended to execute BPEL (Business Process Execution Language) business processes. It sends and receives messages to and from web services, manipulates data, and takes care of error handling.

3.2.2 Performance metrics

Prediction of maintainability is treated as a classification problem in this study. Given training data points of classes labeled as low maintainability effort or high maintainability effort, a classifier can be learned from the data points and used to classify unknown classes as low maintainability effort or high maintainability effort. The classifier’s performance is assessed by examining the confusion matrix. Table 2 depicts the confusion matrix for a two-class classification task. The confusion matrix expresses the class values in the form of positives and negatives; for the present study, the positive class value corresponds to high-maintainability effort instances, and the negative class value corresponds to low-maintainability effort instances. Some widely used traditional performance metrics for evaluating a classifier, such as accuracy and error rate, are calculated from the confusion matrix. Accuracy and error rate assume a uniform class distribution of the positive and negative classes. Using these measures to evaluate a classifier can give rise to misleading conclusions on imbalanced data, as they are strongly inclined toward the majority class. Consider a case where 99% of the data points in a dataset belong to the majority class: a classifier that predicts the label of every unseen test data point as the majority class label still achieves 99% accuracy. For this reason, accuracy and error rate are not recommended for imbalanced data. The minority class is of primary interest in the imbalanced case, yet accuracy and error rate give equal importance to classification errors in either class, which makes them questionable evaluators for the class imbalance problem. The confusion matrix is used to derive performance evaluators that assess the classifier’s performance separately on positive and negative instances: sensitivity (true positive rate, TPR) and specificity (true negative rate, TNR). However, for imbalanced data, there is a trade-off between TPR and TNR. Therefore, many researchers have adopted more stable performance measures to evaluate prediction models in the class imbalance situation. The paper by He and Garcia (2009) advocated G-mean as a stable metric for assessing prediction models developed from imbalanced data, and Menzies et al. (2007) proposed Balance, a robust evaluator for prediction models developed on imbalanced datasets. Our study therefore assesses the maintainability prediction models using the stable and strong evaluators Balance and G-mean. The formulas for these performance evaluators are given in Table 3.
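The sketch below shows the widely used formulations of these two evaluators; it assumes the standard definitions of G-mean and Balance, and Table 3 of the paper remains authoritative for the exact formulas used in the study.

```python
# Widely used definitions of G-mean (He and Garcia 2009) and Balance
# (Menzies et al. 2007), computed from the confusion matrix counts.
import math

def gmean_and_balance(tp, fn, fp, tn):
    tpr = tp / (tp + fn)          # sensitivity: recall of the positive class
    tnr = tn / (tn + fp)          # specificity
    pf = fp / (fp + tn)           # false positive rate
    g_mean = math.sqrt(tpr * tnr)
    # Balance: rescaled distance of (pf, tpr) from the ideal point (0, 1).
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - tpr) ** 2) / math.sqrt(2)
    return g_mean, balance

# Example confusion matrix: positives = high-maintainability effort classes.
print(gmean_and_balance(tp=8, fn=2, fp=15, tn=85))
```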

Table 2 Confusion matrix
Table 3 Performance metrics

3.2.3 Statistical tests

In empirical research, deriving conclusions entirely from the empirical results without applying statistical analysis can be misleading (Lessmann et al. 2008). Statistical tests establish confidence in the outcomes of an empirical investigation and help to validate the hypotheses formed. We apply statistical analysis at different stages in this study and confirm the corresponding hypotheses, e.g., in the first stage, the null hypothesis tested is “the performance of the ML techniques does not differ significantly after applying data resampling methods compared to the situation when no resampling is employed.” This hypothesis is validated with the Friedman test, a distribution-free test that makes no assumptions about the distribution of the performance measures. If the test statistic obtained from the Friedman test is sufficiently substantial to reject the null hypothesis, the variance in the ML techniques’ performance after data resampling is non-random. In such a situation, we carry out a post hoc examination with the Wilcoxon signed-rank test, applied after Bonferroni correction, to find the pairwise differences between the best resampling method and the other resampling methods.
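As an illustration of this procedure, the following minimal Python sketch applies the Friedman test and the Bonferroni-corrected Wilcoxon post hoc comparisons with SciPy; the matrix `perf` of performance values, its dimensions, and the random placeholder numbers are hypothetical and do not represent the study’s data.

```python
# Sketch of the statistical procedure: Friedman test followed by
# Bonferroni-corrected Wilcoxon signed-rank post hoc comparisons.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
# Rows: dataset/ML-technique combinations; columns: resampling treatments
# (including a no-resampling column). Placeholder values only.
perf = rng.uniform(0.4, 0.9, size=(80, 4))

# Friedman test across the four treatments (columns).
stat, p = friedmanchisquare(*[perf[:, j] for j in range(perf.shape[1])])
print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

# Post hoc: compare the best-performing column against each of the others
# at a Bonferroni-corrected significance level.
best = perf.mean(axis=0).argmax()
alpha = 0.05 / (perf.shape[1] - 1)
for j in range(perf.shape[1]):
    if j != best:
        _, p_pair = wilcoxon(perf[:, best], perf[:, j])
        print(f"treatment {j}: p={p_pair:.4f}, significant={p_pair < alpha}")
```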

3.3 Experimental setting

3.3.1 Data collection

This investigation is carried out using eight Apache application packages: Apache Bcel, Betwixt, Io, Ivy, Jcs, Lang, Log4j, and Ode. We analyzed two subsequent versions of each of these software systems; Section 3.2.1 describes the systems investigated in this study. All of these are large-scale application packages and give us sufficient data points to develop prediction models. These software systems are OO and written in the Java programming language, and the prevalence of OO programming these days influenced the project selection for this study. We used the DCRS (Data Collection and Reporting System) tool (Malhotra et al. 2014) for data extraction from the software systems. Data extraction has been successfully carried out from various open-source repositories using this tool; the only prerequisite for using it is that the repository must use GIT as its version control system. The DCRS tool is therefore successfully employed for data extraction from the Apache application packages because Apache uses GIT.

Two successive versions of each software product are fed as input to the DCRS tool. The change log corresponding to the classes common to the two analyzed versions is extracted in the form of log records by the DCRS tool. The log records contain information such as the list of modified files, the CC (change count), and a description of the changes made. The CKJM tool (http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/), open-source software for the extraction of OO metrics, is embedded in the DCRS tool to collect the OO metrics corresponding to the Java classes. The OO metrics extracted by the CKJM tool are quantitative measures of the cohesion, coupling, inheritance, etc. of a class. Each class common to the two versions yields one data point for our analysis. Each such data point comprises a combination of OO metrics and the CC metric representing the lines of code changed. The OO metrics included in the data point are used as the predictors, and CC forms the response variable (maintainability). Each generated data point also contains a variable named “alter” with two values, “yes” and “no”: if there is a change in a class, alter is assigned the value yes; otherwise, no.

3.3.2 Data pre-processing

The data pre-processing steps carried out on the datasets before the analysis are as follows:

  • Removal of no-change classes: As discussed in Section 3.3.1, the OO metrics and CC metric corresponding to the classes common to two successive versions form the data points, and each data point also contains the alter variable with the value yes or no. For our analysis, we considered only those data points where the alter variable is yes, which means the class was changed at least once. In this way, we include only the data points of changed classes. The details of the software projects, with their names, versions analyzed, #common classes (number of common classes), and #common classes changed (number of changed common classes), are given in Table 4.

  • Outlier detection and removal: Outlier analysis is necessary to develop generalized prediction models. Outliers exhibit extreme variability relative to the remaining data points in the dataset (Rousseeuw and Leroy 2005), and their detection and removal are essential for building a competent prediction model. Outlier analysis was performed with the interquartile range filter of the WEKA (Waikato Environment for Knowledge Analysis) tool (http://www.cs.waikato.ac.nz/ml/weka/). In this way, 26, 34, 24, 58, 22, 47, 32, and 115 outliers were detected and removed from the Apache Bcel, Apache Betwixt, Apache Io, Apache Ivy, Apache Jcs, Apache Lang, Apache Log4j, and Apache Ode datasets, respectively.

  • Data discretization: After data extraction, the CC metric values ranged from tens of changed lines of code to thousands of modifications in all eight software projects; classes in which thousands of lines of code were changed are very few. The dependent variable (maintainability) used in this study is formed by discretizing the CC metric into two bins: low maintainability effort and high maintainability effort. The label low maintainability effort corresponds to those classes that require less maintenance effort (fewer changes in LOC), whereas the label high maintainability effort corresponds to the classes requiring more maintenance effort (more effort in the form of LOC changed). The details of the software systems after this step are shown in Table 5. The low-maintainability effort data points are regarded as majority class data points because they are larger in number compared with the high-maintainability effort (minority class) data points in all eight software systems used in the study. It is evident from Table 5 that the imbalance ratio (i.e., the number of low-maintainability effort data points divided by the number of high-maintainability effort data points) varies from 6.29 to 23.80 (a small sketch of these pre-processing steps follows this list).
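A rough pandas analogue of these pre-processing steps is sketched below; the column names, the 1.5 × IQR rule, and the cut-off of 30 changed lines are hypothetical illustrations only, since the study used the WEKA InterquartileRange filter and its own discretization of CC.

```python
# Rough pandas analogue of the pre-processing steps (illustration only).
import pandas as pd

df = pd.DataFrame({"wmc": [5, 7, 6, 90, 8],
                   "cc": [12, 300, 8, 15, 40],
                   "alter": ["yes", "yes", "no", "yes", "yes"]})

df = df[df["alter"] == "yes"]                      # keep changed classes only

q1, q3 = df["cc"].quantile([0.25, 0.75])           # interquartile-range filter
iqr = q3 - q1
df = df[df["cc"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Hypothetical cut-off of 30 changed lines for the binary response.
df["effort"] = (df["cc"] > 30).map({True: "high", False: "low"})
counts = df["effort"].value_counts()
print("imbalance ratio:", counts.get("low", 0) / max(counts.get("high", 1), 1))
```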

Table 4 Details of software projects
Table 5 Data discretization results

3.3.3 Applying data resampling methods

The next step in our experimental setup involves applying data resampling methods to balance the numbers of minority and majority examples. We use fourteen resampling methods comprising oversampling, undersampling, and hybrid resampling in this study.

3.3.4 Maintainability prediction model development and evaluation

After data resampling, the next step is the development of the SMP models. To develop the SMP models, we apply ML techniques commonly used in the literature; the details of the techniques are given in Section 3.5. A tenfold cross-validation strategy is employed during prediction model development: the available training instances are randomly separated into ten equal parts, nine of which are used for model development while the model is validated on the tenth partition. This process is carried out ten times to ensure a low bias from random partitioning. We appraise the performance of the SMP models through G-mean and Balance. Lastly, we conduct a statistical analysis to establish confidence in the results produced.
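A minimal sketch of this evaluation step is given below, assuming the Python imbalanced-learn pipeline (the study itself used the WEKA/KEEL implementations); placing the sampler inside the pipeline ensures that resampling is applied only to the training folds of each of the ten splits.

```python
# 10-fold cross-validation with resampling applied only to training folds.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import make_scorer
from sklearn.tree import DecisionTreeClassifier
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from imblearn.metrics import geometric_mean_score

# Toy imbalanced dataset standing in for an OO-metrics dataset.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=1)

model = Pipeline([("resample", SMOTE(random_state=1)),
                  ("clf", DecisionTreeClassifier(random_state=1))])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv,
                         scoring=make_scorer(geometric_mean_score))
print("mean G-mean over 10 folds:", scores.mean())
```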

3.4 Data resampling methods used

The study applies fourteen resampling techniques for handling imbalanced datasets using the KEEL (Knowledge Extraction based on Evolutionary Learning) tool with default parameter settings (https://www.keel.es).

  • Adaptive synthetic sampling (ADASYN) adaptively generates synthetic examples for minority class instances. It employs a weighted density distribution to decide the number of synthetic cases to be created for each minority class instance: more examples are created for harder-to-learn cases and fewer for easy-to-learn examples (He et al. 2008).

  • Synthetic minority oversampling technique (SMOTE) oversamples the rare class by introducing artificial instances into the dataset. The artificial instances are formed along the line joining a minority class example with its k-nearest neighbors, from which the requisite number of neighbors is taken. For instance, if 200% oversampling is needed, two out of the five nearest neighbors of a particular minority class instance are randomly picked, and an artificial example is generated corresponding to each. The formation of an artificial example consists of two steps: first, the difference between the selected example and its chosen nearest neighbor is computed; then, this difference is multiplied by a random number between 0 and 1 and the result is added to the initially selected instance (Chawla et al. 2002). (A bare-bones sketch of this interpolation step appears after this list.)

  • Borderline synthetic minority oversampling technique (Border-Line-SMOTE) oversamples the minority class instances that lie on the borderline by introducing synthetic samples using SMOTE. A minority class instance is said to lie on the borderline when most of its k-nearest neighbors belong to the majority class (Han et al. 2005).

  • Safe-Level-SMOTE: Unlike SMOTE, Safe-Level-SMOTE computes the safe level of each minority class instance before producing synthetic instances. The safe level of a minority class instance is the number of minority class instances among its k-nearest neighbors (Bunkhumpornpat et al. 2009).

  • Synthetic minority oversampling technique + edited nearest neighbor (SMOTE-ENN) is a hybridized technique based on SMOTE. In this technique, synthetic samples are generated using SMOTE, and Wilson’s edited nearest neighbor rule is then applied to filter out the noisy instances (Batista et al. 2004).

  • Synthetic minority oversampling technique + Tomek’s modification of condensed nearest neighbors (SMOTE-TL) is a hybrid method based on SMOTE. Synthetic examples are generated using SMOTE, after which Tomek links are detected and removed from the dataset. Two instances belonging to different classes form a Tomek link if the distance between them is smaller than the distance from either of them to any other sample in the dataset (Batista et al. 2004).

  • Random oversampling (ROS) randomly replicates rare class instances until the rare and dominant classes have the same number of examples (Batista et al. 2004).

  • Selective pre-processing of imbalanced data (SPIDER) involves oversampling the minority class and filtering instances of the majority class depending on whether they are safe or noisy, using kNN classification; the noisy samples are then removed (Stefanowski and Wilk 2008).

  • Selective pre-processing of imbalanced data II (SPIDER-II) involves two phases of pre-processing the minority and majority class examples. First, noisy instances of the dominant class are identified and either removed or relabeled according to the relabel option. Afterward, noisy instances of the rare class are identified and replicated according to the ampl option (Napierala et al. 2010).

  • Random undersampling (RUS) randomly removes instances from the dominant class until the rare and dominant classes have the same number of samples (Batista et al. 2004).

  • Condensed nearest neighbor (CNN) is based on the nearest-neighbor rule. It eliminates certain instances from the dataset without affecting the NN classification performance (Hart 1968).

  • Condensed nearest neighbors with Tomek’s modification (CNN-T) combines condensed nearest neighbors and Tomek links. Based on the information given by the Tomek links, certain instances are removed from the dataset without affecting the NN classification performance (Batista et al. 2004).

  • Class purity maximization clustering (CPM). In this technique, two random instances, one from the majority class and one from the minority class, are designated as the initial cluster centers. The remaining samples are partitioned into two subgroups according to the nearest center, with the precondition that one subgroup should have higher class purity. The process is repeated recursively until the two subsets can no longer be split into clusters such that the class purity of at least one subgroup exceeds that of the parent cluster (Yoon and Kwek 2005).
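To make the SMOTE interpolation step referred to above concrete, the following bare-bones Python sketch generates synthetic minority samples as new point = x + rand(0, 1) × (neighbor − x); it illustrates the idea only, not the KEEL implementation used in the study, and the feature vectors are random placeholders.

```python
# Bare-bones illustration of SMOTE-style interpolation (not the KEEL tool).
import numpy as np

rng = np.random.default_rng(7)

def smote_like(minority, n_new, k=5):
    """Generate n_new synthetic minority samples by interpolation."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # Distances to all other minority samples; pick one of the k nearest.
        d = np.linalg.norm(minority - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        nb = minority[rng.choice(neighbors)]
        # Difference scaled by a random factor in [0, 1) and added to x.
        synthetic.append(x + rng.random() * (nb - x))
    return np.array(synthetic)

minority = rng.normal(size=(20, 3))          # toy minority-class feature vectors
print(smote_like(minority, n_new=5).shape)   # -> (5, 3)
```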

3.5 ML techniques

In this study, we explored different categories of ML techniques, namely, neural networks, instance-based learners, ensembles, and decision trees; in addition, the study selected one statistical technique. The predictive capability of the chosen techniques is well established in software quality research for predictive modeling tasks. Neural networks (NNs) have proven their outstanding capability to derive meaningful insights and extract patterns from complex data, and many studies in the literature have used them to build successful models for predicting software maintainability (e.g., Thwin and Quah 2005; Zhou and Leung 2007; Ahmed and Al-Jamimi 2013). A study by Malhotra (2015) reviewed ML techniques for predicting defective classes in software; the review revealed that C4.5 was the most popular technique in the decision tree category and has an outstanding capability to predict defect-prone classes. The C4.5 decision tree is simple to implement and offers comprehensive predictive capability (Arisholm et al. 2007). In the category of instance-based learners, we selected two techniques, KStar (KS) and k-nearest neighbor. Kaur and Kaur (2013) statistically compared 27 different ML techniques for predicting maintainability and discovered the outstanding performance of KS. All of the above techniques are distribution-free and require no prior assumptions on the statistical distribution of the dataset. We also used two ensemble techniques in this study. Ensemble learners combine several classifiers whose predictions are aggregated to obtain a single consolidated decision (Oza and Tumer 2008). Several studies have advocated the remarkable applicability of ensembles for improved quality modeling in software engineering (Catolino and Ferrucci 2018; Xu et al. 2010; Peng et al. 2011). Given the vast number of techniques that could potentially be selected for developing SMP models, we had to strike a balance between curtailing the choice of techniques based on existing empirical studies and covering a wide diversity of ML techniques based on their properties. Hence, the primary reason for selecting these techniques was their establishment and popularity in the literature for predictive modeling in the software engineering domain. A brief description of all the techniques used is given below (a sketch instantiating roughly analogous open-source implementations appears after this list):

  • C4.5 is an enhancement of the ID3 decision tree. The ID3 algorithm has a few limitations: (i) it works only for nominal attributes, and (ii) the dataset should not have any missing values. Ross Quinlan, the inventor of ID3, improved ID3 to overcome these limitations and proposed a modified version called C4.5. This algorithm creates more generalized prediction models and can handle continuous-valued attributes as well as missing values (Quinlan 2014).

  • Multilayer perceptron with conjugate gradient learning (MLP-CG) is a feed-forward neural network for classification and prediction. MLP-CG uses conjugate gradient methods for training the weights of a multilayer perceptron. Conjugate gradient methods are characterized by low memory requirements and fast global and local convergence (Moller 1993).

  • Radial basis function neural network (RBFNN) is a type of non-linear feed-forward neural network with a single hidden layer. Its learning is guaranteed because the single layer of adjustable weights can be calculated by solving a linear optimization problem. This neural network can represent non-linear transformations (Broomhead and Lowe 1988).

  • Incremental radial basis function neural network (IRBFNN) is a type of NN that learns by allocating new units and adjusting the parameters of existing units. If the network performs inadequately on a pattern presented to it, a new unit is allocated that corrects the response to the presented pattern; if the network performs well, the existing parameters are updated (Platt 1991).

  • BG (bagging) is an ensemble technique that creates individual subsets of the training dataset randomly with replacement. A predictor is developed for each subset, and the results of the individual predictors are then either averaged or combined using majority voting (Breiman 1996).

  • AB (AdaBoost) is based on the boosting method. The boosting technique develops a powerful classification model from several weak classifiers; AdaBoost improves the predictive power of the weak classifiers. The weak classifiers are learned using weighted training data samples, and the misclassification rate of the individual classification models is determined. The algorithm includes a weight-updating process: correctly classified data points get small weights, and wrongly classified data points get large weights. In this manner, the AB technique focuses more on the difficult-to-learn data points (Quinlan 2014).

  • KNN (k-nearest neighbor) offers high interpretability and low calculation time. KNN classifies an instance using majority voting among its neighboring examples: an instance is assigned to the class that is most common among its k-nearest neighbors (Cover and Hart 1967).

  • KS belongs to the category of instance-based classifiers. In KS, the output label of a test example is determined based on the output labels of the training examples that are similar to the given test example. The KS technique uses entropy as a measure of dissimilarity (Cleary and Trigg 1995).

  • LR (logistic regression) with a ridge estimator is a prominent method for binary classification. Ridge estimators enhance the parameter estimates and lower the error when maximum-likelihood estimators cannot fit the data (Le Cessie and Van Houwelingen 1992).
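For readers who wish to experiment, the sketch below instantiates roughly analogous scikit-learn estimators; this mapping is an assumption made for illustration only, since the study used the WEKA/KEEL implementations, and KStar, RBFNN, and IRBFNN have no direct scikit-learn counterparts.

```python
# Roughly analogous scikit-learn estimators (illustrative mapping only).
from sklearn.tree import DecisionTreeClassifier          # ~ C4.5-style tree
from sklearn.neural_network import MLPClassifier         # ~ MLP-CG
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier       # KNN
from sklearn.linear_model import LogisticRegression      # LR (L2 ~ ridge)

classifiers = {
    "C4.5-like tree": DecisionTreeClassifier(),
    "MLP": MLPClassifier(max_iter=500),
    "Bagging (BG)": BaggingClassifier(),
    "AdaBoost (AB)": AdaBoostClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR (L2)": LogisticRegression(penalty="l2", max_iter=500),
}
for name, clf in classifiers.items():
    print(name, "->", type(clf).__name__)
```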

4 Results and analysis

We developed the SMP models using the original imbalanced datasets and again after applying data resampling methods. The performance of the SMP models is compared with respect to the performance measures G-mean and Balance.

4.1 RQ1: What is the performance of SMP models developed using ML techniques on original imbalanced datasets?

To answer this research question, we develop SMP models on the original imbalanced datasets using the ML techniques discussed in Section 3.5. Tables 6 and 7 show the predictive performance of the ML techniques for SMP models based on G-mean and Balance, respectively. In this experiment, we observe that the ML techniques, without applying resampling methods, have inferior performance in terms of both G-mean and Balance. On analyzing Table 6, we see that in 61% of the cases, the G-mean values are less than 50%; similarly, in 66.66% of the cases (Table 7), the Balance value is less than 50%. Figure 1 shows the boxplots of the performance of the maintainability prediction models in terms of Balance and G-mean in the imbalanced case. It is evident from Fig. 1 that the G-mean values drop to 0% for a few of the cases, whereas Balance has 29% as its lowest value. Also, the median of G-mean and Balance for all ML techniques is approximately 40%. These poor results of the ML techniques for maintainability prediction arise because the datasets are imbalanced in nature and the prediction models are unable to learn the minority class instances (class = high maintainability effort) properly, i.e., very few minority class instances are presented to the classifier during training. Therefore, such prediction models cannot be used to make future predictions for unknown instances.

Table 6 G-mean results of SMP model developed using the imbalanced datasets
Table 7 Balance results of SMP model developed using imbalanced datasets
Fig. 1
figure 1

Boxplots G-mean and Balance of original imbalanced data

4.2 RQ2a: What is the performance of SMP models developed using ML techniques after balancing the datasets with data resampling methods?

In this section, we assess the ML techniques’ performance for predicting software maintainability after applying various data resampling methods to balance the datasets. Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, and 23 show the SMP models’ performance in terms of the performance measures G-mean and Balance after applying the data resampling methods. It is evident from Tables 8, 9, 10, 11, 12, 13, 14, and 15 that the G-mean values are higher than 50% in 85%, 72%, 84%, 73%, 95%, 95%, 95%, and 83% of the cases for the Apache Bcel, Apache Betwixt, Apache Io, Apache Ivy, Apache Jcs, Apache Lang, Apache Log4j, and Apache Ode datasets, respectively. Similarly, the Balance values are higher than 50% in 81%, 68%, 83%, 66%, 95%, 95%, 94%, and 87% of the cases for the Apache Bcel, Apache Betwixt, Apache Io, Apache Ivy, Apache Jcs, Apache Lang, Apache Log4j, and Apache Ode datasets, respectively, after data resampling, as shown in Tables 16, 17, 18, 19, 20, 21, 22, and 23.

Table 8 Performance of SMP models based on G-mean for Apache Bcel dataset after resampling
Table 9 Performance of SMP models based on G-mean for Apache Betwixt dataset after resampling
Table 10 Performance of SMP models based on G-mean for Apache Io dataset after resampling
Table 11 Performance of SMP models based on G-mean for Apache Ivy dataset after resampling
Table 12 Performance of SMP models based on G-mean for Apache Jcs dataset after resampling
Table 13 Performance of SMP models based on G-mean for Apache Lang dataset after resampling
Table 14 Performance of SMP models based on G-mean for Apache Log4j dataset after resampling
Table 15 Performance of SMP models based on G-mean for Apache Ode dataset after resampling
Table 16 Performance of SMP models based on Balance for Apache Bcel dataset after resampling
Table 17 Performance of SMP models based on Balance for Apache Betwixt dataset after resampling
Table 18 Performance of SMP models based on Balance for Apache Io dataset after resampling
Table 19 Performance of SMP models based on Balance for Apache Ivy dataset after resampling
Table 20 Performance of SMP models based on Balance for Apache Jcs dataset after resampling
Table 21 Performance of SMP models based on Balance for Apache Lang dataset after resampling
Table 22 Performance of SMP models based on Balance for Apache Log4j dataset after resampling
Table 23 Performance of SMP models based on Balance for Apache Ode dataset after resampling

Figures 2 and 3 show the boxplots of the performance of the maintainability prediction models in terms of G-mean and Balance after data resampling. It is evident from Fig. 2 that G-mean reaches up to 80% in most of the datasets. Also, Balance reaches up to 70 to 80% in all eight datasets, as shown in Fig. 3. It is also quite evident from Figs. 2 and 3 that the medians of G-mean and Balance are considerably higher after data resampling than before.

Fig. 2
figure 2

Boxplots for G-mean results after data resampling

Fig. 3
figure 3

Boxplots for Balance results after data resampling

The median of G-mean is greater than 60% for the Apache Jcs, Apache Lang, and Apache Log4j datasets and approximately equal to 60% for the Apache Bcel, Apache Betwixt, Apache Io, Apache Ivy, and Apache Ode datasets (Fig. 2). Similarly, the median of Balance is higher than 60% for the Apache Jcs, Apache Lang, and Apache Log4j datasets and nearly 60% for the Apache Bcel, Apache Betwixt, Apache Io, Apache Ivy, and Apache Ode datasets after applying the data resampling methods (Fig. 3). To conclude, there is a vast improvement in the performance of the SMP models after applying data resampling methods compared with the situation when no data resampling is used.

4.3 RQ2b: Which data resampling method improves the performance of the prediction models the most?

To assess the performance of data resampling methods used in the study, we perform Friedman’s test concerning performance metrics G-mean and Balance for all eight datasets used in the study along with the scenario when no data resampling is used. In this direction, the following hypotheses are formed and tested.

H0: Null hypothesis—There is no significant difference in the predictive performance of SMP models developed with original imbalanced datasets and after applying data resampling methods concerning performance measures G-mean and Balance.

Ha: Alternate hypothesis—There is a significant difference in the performance of SMP models developed with original imbalanced datasets and after applying data resampling methods concerning performance measures G-mean and Balance.

The above-stated hypotheses are tested at a confidence level of 95% (α = 0.05) using the values of the performance metrics Balance and G-mean for all datasets used in the study. Tables 24 and 25 show the Friedman test results for G-mean and Balance, respectively. The mean rank attained by each data resampling method is shown in parentheses; the higher the rank obtained by a resampling method, the better that method.

Table 24 Friedman’s test results for G-mean
Table 25 Friedman’s test results for Balance

On conducting the Friedman test for the different data resampling methods on the G-mean measure over all eight datasets, the p value obtained is 0.00 (p < 0.05), which means the results of the Friedman test are significant. It is evident from Table 24 that Safe-Level-SMOTE achieves the best rank, whereas the worst rank is obtained for no resampling. Similarly, on conducting the Friedman test for the different data resampling methods on the Balance measure over all eight datasets, the p value obtained is 0.00 (p < 0.05), which means, again, the results of the Friedman test are significant. The mean ranks obtained for the different data resampling methods, along with the no-resampling scenario, with respect to Balance are shown in Table 25. Again, Safe-Level-SMOTE yields the best rank for the Balance measure, and the worst rank is obtained for the no-resampling situation. As the test statistics of the Friedman test are significant for both G-mean and Balance, the null hypothesis (H0) is rejected and the alternate hypothesis (Ha) is accepted. We therefore observe a significant improvement in the performance of SMP models developed after applying data resampling methods to imbalanced datasets. It is observed that the enhanced version of SMOTE, namely, Safe-Level-SMOTE, and the hybrid resampling methods SMOTE-TL and SMOTE-ENN are among the top-ranked methods according to the rankings obtained from the Friedman test for both the G-mean and Balance measures. The Safe-Level-SMOTE method emerges as the best technique for improving the performance of the prediction models. To extend our analysis further, i.e., to gain insight into whether the Safe-Level-SMOTE method is statistically better than the other resampling methods used in the study, we apply the Wilcoxon signed-rank test at the 95% level of confidence (α = 0.05) with Bonferroni correction. Using the Wilcoxon signed-rank test, a pairwise comparison between the Safe-Level-SMOTE method and each of the other resampling methods is computed on the G-mean and Balance measures of all ML techniques for all datasets. The test statistics of the Wilcoxon signed-rank test are reported in Table 26 for both G-mean and Balance. In Table 26, S+ indicates a significant difference in the performance of the corresponding pair of resampling methods, and NS signifies that there is no significant difference. The results depict that Safe-Level-SMOTE significantly outperforms ADASYN, SMOTE, Border-Line-SMOTE, SPIDER, SPIDER-II, ROS, CNN, CNN-T, CPM, NCL, and no resampling on both G-mean and Balance. The test results also depict that Safe-Level-SMOTE does not significantly outperform SMOTE-TL, SMOTE-ENN, and RUS; the performance of SMOTE-TL, SMOTE-ENN, and RUS is comparable with that of Safe-Level-SMOTE.

Table 26 Wilcoxon’s signed-rank test results

4.4 Discussion on results

For the imbalanced datasets, the performance of the SMP models is poor in terms of both performance measures, G-mean and Balance. An analysis of Table 6 indicates that for the Apache Bcel dataset, the G-mean values of SMP models developed with the ML techniques ranged from 0 to 51.86%, and except for IRBFNN and MLP-CG, the G-mean results of all techniques were less than 50%. A similar trend is observed for the imbalanced Apache Betwixt dataset, for which the SMP models’ G-mean values ranged from 0 to 55.30% in the imbalanced scenario. For the Apache Io dataset, the lowest G-mean value, 29.29%, was reported for the SMP models developed with the KS and BAGG techniques, and the G-mean results ranged from 29.29 to 57.56%; except for the KNN and RBFNN techniques, the G-mean values of the remaining techniques were far less than 50%. For the Apache Ivy dataset, the G-mean values ranged from 26.37 to 49.71%. It is worth noting that for the Ivy dataset, the SMP models’ performance is inferior in terms of G-mean, and none of the techniques could achieve a G-mean value of even 50%. For the Apache Log4j dataset, in the imbalanced scenario, the G-mean results of the SMP models developed with the different ML techniques ranged from 0 to 69.42%, and for the Apache Ode dataset, the G-mean values ranged from 0 to 24.56%. On analyzing the Apache Jcs results, the G-mean values were in the range of 0–76.25%. For the Apache Lang dataset, the SMP models developed with the different ML techniques gave G-mean values in the range of 0–81.56%. The performance of the SMP models developed from the imbalanced datasets in terms of the Balance performance measure is also very poor.

For the Apache Bcel dataset, the Balance values of SMP models developed with the ML techniques ranged from 29.29 to 48.88%, and for all techniques the Balance results were less than 50%. For the Apache Betwixt dataset, in the imbalanced scenario, the Balance values of the SMP models ranged from 29.29 to 52.50%, and for all techniques except MLP-CG, the Balance results were less than 50%. For the Apache Io dataset, the lowest Balance value, 29.00%, was reported for the SMP model developed with the C4.5 technique, and, mirroring the trend observed for G-mean on this dataset, the Balance values of all techniques except KNN and RBFNN were far less than 50%. For the Apache Ivy dataset, the Balance values ranged from 34.38 to 46.96%; the performance of the SMP models for this dataset is very poor in terms of Balance, and none of the techniques could achieve a Balance value of even 50%. For the Apache Log4j dataset, the Balance results of the SMP models developed using the different ML techniques ranged from 29.29 to 65.59%, and for the Apache Ode dataset, the Balance results ranged from 29.29 to 46.59%. On analyzing the Apache Jcs results, the Balance values were in the range of 29.29–73.71%, and for the Apache Lang dataset, the SMP models developed with the different ML techniques gave Balance values in the range of 29.29–78.66%.

Therefore, looking at the SMP models’ performance on the imbalanced datasets, the models perform poorly in terms of both G-mean and Balance. This low performance is due to the skewed distribution of high maintainability effort and low maintainability effort data points in the datasets.

As the datasets have insufficient data points for the high maintainability effort classes, the SMP models cannot learn to predict the high maintainability effort classes competently, resulting in low sensitivity (true positive rate) values and, consequently, low G-mean and Balance results.
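To make the link between sensitivity and the two measures explicit, the small helper below computes G-mean and Balance from a confusion matrix using the standard definitions (G-mean as the geometric mean of sensitivity and specificity; Balance as one minus the normalized distance from the ideal point of full sensitivity and zero false-positive rate). The example counts are hypothetical.

```python
# G-mean = sqrt(sensitivity * specificity)
# Balance = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2), where pd is the
# true positive rate and pf the false positive rate (standard definitions).
import math

def gmean_and_balance(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    sensitivity = tp / (tp + fn)      # pd, true positive rate
    specificity = tn / (tn + fp)
    pf = fp / (fp + tn)               # false positive rate
    g_mean = math.sqrt(sensitivity * specificity)
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - sensitivity) ** 2) / math.sqrt(2)
    return g_mean, balance

# Example: few true positives (low sensitivity) drags both measures down,
# even when specificity alone looks acceptable.
print(gmean_and_balance(tp=5, fn=45, tn=90, fp=10))   # ~ (0.30, 0.36)
```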

However, the use of data resampling techniques (RQ2) enhanced the performance of the ML techniques for building SMP models. For the Apache Bcel dataset, the G-mean values ranged from 50.32 to 80.14% and the Balance values from 50.65 to 79.31% for the majority of the cases after data resampling. For the Apache Betwixt dataset, the G-mean values ranged from 50.01 to 72.54% and the Balance values from 50.17 to 72.48% for most of the cases after data resampling. On analyzing the SMP models’ results for the Apache Io dataset after data resampling, we observed that for most of the cases, the G-mean and Balance values ranged from 50.96 to 87.82% and from 50.11 to 86.31%, respectively. For the Apache Ivy dataset, the G-mean and Balance values ranged from 50.16 to 73.44% and from 50.74 to 73.39%, respectively, for most of the cases after data resampling. In the case of the Apache Jcs dataset, the G-mean and Balance values ranged from 55.17 to 87.07% and from 60.54 to 86.97%, respectively, for the majority of the cases after data resampling. For the Apache Lang dataset, the G-mean and Balance values ranged from 60.70 to 85.41% and from 60.01 to 83.68%, respectively, for the majority of the cases after data resampling. For the Apache Log4j dataset, the G-mean values ranged from 60.07 to 78.73% and the Balance values from 60.02 to 78.39% for the majority of the cases after data resampling. For the Apache Ode dataset, the G-mean and Balance values ranged from 50.09 to 74.06% and from 50.11 to 72.69%, respectively, for most of the cases after data resampling.

Therefore, the G-mean and Balance results improved for all datasets when data resampling techniques were used. The improvement in G-mean and Balance after data resampling is due to an increase in sensitivity and specificity. When the datasets were imbalanced, the SMP models gave lower sensitivity values because they had too few instances of the high maintainability effort classes to learn the positive examples properly. After data resampling, sensitivity increased, which raised the G-mean of the SMP models, as G-mean is the geometric mean of sensitivity and specificity. The rise in sensitivity, without a corresponding rise in the false-positive rate, also improved the Balance.

On analyzing the results of the study, it was found that the models developed after resampling with Safe-Level-SMOTE performed well on all the datasets. The Safe-Level-SMOTE technique improved the performance of the models in terms of G-mean and Balance. The statistical analysis carried out with the Friedman test yields the same conclusion: the Safe-Level-SMOTE technique achieved the highest rank in terms of G-mean and Balance, whereas the no-resampling situation attained the worst rank. Therefore, these results support the use of Safe-Level-SMOTE. We also carried out a pairwise comparison of the performance of Safe-Level-SMOTE with all other resampling methods used in the study, and Safe-Level-SMOTE performed better than all other resampling methods except SMOTE-TL, SMOTE-ENN, and RUS, whose performance is comparable with that of the top-ranked technique. The Safe-Level-SMOTE technique does not create the same number of synthetic instances for each minority instance; instead, it emphasizes the instances that fall in the safe region and discounts the instances that are noise. The superiority of the Safe-Level-SMOTE technique indicates that a data resampling method should generate synthetic instances in a manner that avoids noise and redundancy.
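For illustration, a much-simplified sketch of this safe-level idea is given below. It is not the full Safe-Level-SMOTE algorithm of Bunkhumpornpat et al., and all names and parameters are illustrative: each minority instance is simply weighted by the number of minority neighbours among its k nearest neighbours, so synthetic points are generated mainly in safe regions while isolated (noisy) minority points are skipped.

```python
# Simplified sketch of the safe-level idea (not the full Safe-Level-SMOTE):
# weight minority instances by their number of minority neighbours and
# interpolate new points only around instances that lie in safe regions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def safe_level_oversample(X_min, X_maj, k=5, n_new=100, seed=None):
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    is_min = np.array([True] * len(X_min) + [False] * len(X_maj))

    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    _, idx = nn.kneighbors(X_min)                 # neighbours of each minority point
    safe_level = is_min[idx[:, 1:]].sum(axis=1)   # minority neighbours, excluding self

    candidates = np.where(safe_level > 0)[0]      # skip pure-noise minority points
    weights = safe_level[candidates] / safe_level[candidates].sum()

    synthetic = []
    for i in rng.choice(candidates, size=n_new, p=weights):
        # interpolate towards a random minority neighbour of the chosen point
        neigh_ids = idx[i, 1:][is_min[idx[i, 1:]]]
        j = rng.choice(neigh_ids)
        gap = rng.random()
        synthetic.append(X_min[i] + gap * (X_all[j] - X_min[i]))
    return np.array(synthetic)
```

In the actual algorithm, the safe levels of both the selected instance and its neighbour additionally control where along the connecting line the synthetic point is placed.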

The results of this work signify the importance of balanced data with an appropriate number of instances of low maintainability effort and high maintainability effort classes for constructing competent SMP models using ML techniques.

5 Threats to validity

The predictor variables used for the development of the prediction models in this study have already been analyzed and validated as useful in the software quality domain. The response variable used in this study is formed by discretizing a continuous variable, “change,” which is extracted by scanning the change logs with the help of the DCRS tool. The DCRS tool has been used successfully for data collection in many empirical studies. Therefore, a threat to construct validity concerning the predictors and the response variable is not present in this study. Threats to the generalizability of the results weaken the external validity of empirical research. All eight datasets used in this study are extracted from application packages of Apache open-source software. Therefore, there exists an external validity threat that the results may vary for proprietary software and for software written in a programming language other than Java. However, the ML techniques are used with their default parameter settings in this study for developing the prediction models, which minimizes the threat to the generalizability of the results. The degree to which the conclusions drawn after conducting research are believable or credible is called conclusion validity; it is also referred to as statistical validity. The threat to conclusion validity does not exist in this study, as the results are supported by appropriate statistical analysis.

6 Conclusions and future work

Early prediction of software classes needing high or low maintainability effort is an essential activity in software development, as it allows such classes to be designed in a better manner. In many software projects, classes requiring high maintainability effort are in the minority, resulting in an imbalanced dataset. Therefore, in this direction, the study assesses the impact of applying data resampling methods to balance the class distribution in the datasets and compares the performance of maintainability prediction models before and after applying the data resampling methods.

Fourteen data resampling methods, including oversampling, undersampling, and hybrid resampling methods, are used in the study. The SMP models are developed using nine ML techniques on the original imbalanced datasets and after employing the data resampling methods. Tenfold cross-validation is used to partition the data for training and testing the maintainability prediction models. The results of the developed prediction models are assessed with the stable and robust performance evaluators Balance and G-mean. The study uses statistical tests to strengthen the conclusions and enhance the credibility of the results.
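A hedged sketch of this evaluation setup is given below, assuming the class-level metrics and the binarized maintainability label are available as a feature matrix and a label vector. SMOTE-ENN from imbalanced-learn stands in for the resampling step (Safe-Level-SMOTE itself would require a custom sampler), and the decision tree is only a stand-in for the nine ML techniques used in the study; the synthetic dataset is illustrative.

```python
# Tenfold cross-validation with resampling applied to the training folds only;
# the sampler, learner, and toy data are illustrative stand-ins.
from imblearn.combine import SMOTEENN
from imblearn.metrics import geometric_mean_score
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: ~10% "high maintainability effort" (positive) instances.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                           random_state=42)

pipeline = Pipeline([
    ("resample", SMOTEENN(random_state=42)),         # applied inside each training fold
    ("clf", DecisionTreeClassifier(random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
gmean = cross_val_score(pipeline, X, y, cv=cv,
                        scoring=make_scorer(geometric_mean_score))
print(f"mean G-mean over 10 folds: {gmean.mean():.3f}")
```

Wrapping the sampler and the learner in an imbalanced-learn Pipeline ensures that resampling never touches the test folds, which would otherwise inflate the reported G-mean and Balance.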

The experimental results on all eight datasets showed that the performance of the ML techniques for predicting software maintainability improves significantly after employing data resampling methods. The Safe-Level-SMOTE method outperformed all the other data resampling methods used in this study in terms of the performance measures G-mean and Balance. Safe-Level-SMOTE is an enhanced version of SMOTE that determines the safe level of each minority class instance before generating the synthetic samples. The performance of two hybrid resampling techniques, SMOTE-ENN and SMOTE-TL, and of an undersampling method, RUS, is also comparable with that of Safe-Level-SMOTE, as indicated by the results of the Wilcoxon signed-rank test. The study advocates the use of the Safe-Level-SMOTE method to handle imbalanced data and improve the performance of ML techniques, so that efficient maintainability prediction models can forecast, at the early stages of software development, the high maintainability effort classes that are crucial for any software project.

Thus, the study results would help in the accurate identification of the classes that have low maintainability and involve a large share of the maintenance effort. The accurate identification of such classes would enable software practitioners to improve the design and code of these classes. Also, software developers can devote extra time in the testing phase to the low maintainability classes, which would lessen the chances of faults surfacing in these classes during software maintenance. Early prediction of low maintainability classes would also assist software developers in strategically utilizing their resources, enhancing process efficiency, and optimizing the associated maintenance costs. Lastly, software practitioners are encouraged to document low maintainability classes in a better manner in order to reduce the time needed to comprehend the code and carry out the essential modifications during the software maintenance phase.

We plan to replicate this work to examine the effectiveness of data resampling methods used in this study with evolutionary and search-based learning techniques to predict software maintainability.