1 Introduction

With the growth of the software industry, software evolution has become an important concern for software developers. The evolution of software over many years often leads to obscure systems: it is hard to predict where changes have to be applied in the code and what their effects on the system will be. This makes the maintenance and evolution of software increasingly complex and expensive. For efficient evolution and maintenance of code, the structure and design of a system have to be improved. To do this, it is first necessary to identify the areas of the code design that need improvement and the problems caused by the poor design, i.e. where bugs originate. Manual identification of poor code design is not consistent [12], so tool support is required for problem detection. Such a tool would accept source code as input and produce as output the estimated flaws in that code. Refactoring is one solution for improving the source code: it changes the internal structure of object-oriented code without affecting the overall behavior of the system. Refactoring plays an important role in reengineering and reverse engineering, since it makes the code easier for software developers to understand [22]. The refactoring process is applied in three stages: identification of the problematic area, selection of an appropriate refactoring technique and application of that technique [31]. Fowler and Beck [22] defined 22 bad smells that describe design problems and suggested a number of refactoring techniques that can help improve the design structure of code.

The agile community has adopted the principle of refactoring as a solution to bad code smells. Fowler [22], however, did not suggest any criteria for making the decision to refactor. Software metrics have often been used to assess the quality of software, and although a number of studies have examined the relationship between metrics and bad smells [8, 10, 11, 29, 30], these studies have not exploited threshold values of the metrics. Threshold values are necessary to generalize findings across different software applications. A proper threshold value gives the developer an orientation for the task of refactoring: with it, developers and testers can scrutinize classes to find refactoring candidates on the basis of smelly classes. A class whose metric value is greater than the threshold is a sign of poor design and needs inspection to improve the design. In this work, metrics threshold values for the identification of bad smells are derived using a logistic regression model. The model was first proposed by Bender [13] in the field of epidemiology; it can easily be reused to derive threshold values from the estimated coefficients, and it has previously been applied in the software metrics field [14]. The model is applied to three Firefox versions (1.5, 2.0 and 3.0) to identify the metrics threshold values for the identification of bad smells in classes. The set of metrics selected for the study is DIT [2], NOC [2], RFC [2], WMC [2], LCOM [2], Co [1], CBO [2], NOA [37], NOOM [37], NOAM, PuF [23] and EncF [23]. The bad smells considered are Large Method, Long Parameter List, Large Class, Temporary Fields, Shotgun Surgery, Lazy Class, Data Class, Speculative Generality, Middle Man, Feature Envy and Inappropriate Intimacy [22]. The practically effective threshold metrics values for the identification of smelly classes were selected on the basis of a high area under the receiver operating characteristic (ROC) curve. It is believed that a module's proneness to error is associated with bad design [9, 23, 50], so the threshold values of metrics associated with design principles (bad smells) can also be used to predict faults that will arise in the future. The derived threshold values are validated by identifying the faulty classes of subsequent Firefox versions. Finally, the results are compared with those of a previous study [16] that collected faults during the development stage.

2 Literature review

Applying refactoring manually is very difficult. Numerous tools have been developed by different communities (commercial, academic and open source) to detect bad smells in code, including Together [17], XRefactor [18] and JBuilder [19] (commercial), Eclipse [20] (open source) and Columbus [21] (academic). In all these cases refactoring is applied with the interactive participation of the user. The purpose of these tools is to identify the areas of code to which refactoring should be applied.

Webster [24] gives the conceptual, coding and quality assurance view of antipatterns (which are potential causes of bad smells) in programming languages. Later, Riel [25] defined heuristics for manually assessing the quality of object-oriented designs. Fowler and Beck [22] defined the 22 bad smells, which help in identifying the areas to which refactoring should be applied, and Mantyla [12] proposed a classification of these bad smells into six groups. These are all theoretical approaches to defining bad smells and do not define criteria for identifying them from code. Other studies (manual and automatic) have addressed identification from code. A manual approach to identifying the design flaws that lead to bad smells was proposed by Travassos et al. [28]; they use a reading technique to identify the antipatterns, which is not practical for large industrial projects. Ciupke [29] proposed an approach that identifies simple, frequently occurring design problems and locates them with queries applied to a model derived from the source code. However, he did not address the complex design problems that cause code smells. Marinescu [30] proposed a “detection strategy” for formulating metrics-based rules that check deviations from good design principles and heuristics. In this case human intervention is required because the detection process is uncertain, and the detection process therefore leaves uncertainty about the classes. Marinescu [30] did not give any justification for the selection of metrics, thresholds and combinations of metrics defined in the detection strategies, yet it is indispensable to know whether the chosen metrics encapsulate the design problem. Marinescu [31] also proposed an approach, but it did not cover the uncertainty issues, the quality analysts’ interpretations or the context of the programs; furthermore, the case studies were performed on small projects rather than industrial ones. Rao and Ready [32] use design change propagation probability (DCPP) matrices to detect two antipatterns (shotgun surgery and divergent change). A DCPP matrix is an N \(\times \) N matrix where N is the number of artifacts (classes); the value at column A, row B represents the probability that a design change in artifact A will require changes in B to preserve the overall functionality. Rao and Ready [32] do not use threshold values in the detection of the antipatterns, but compare candidate design defects (source code or classes, called artifacts in their study) under certain specified conditions. They check these conditions (listed in their study [32]) iteratively and correct the design defects with refactoring techniques corresponding to the bad smell (e.g. move method for shotgun surgery). Li et al. [9] inspected the relationship between faulty classes and a few bad smells. They checked the relationship in three versions of Eclipse and concluded that broader studies are needed to validate the results. The current study provides such a broader investigation by deriving threshold values for smelly classes (covering 11 bad smells) and validating the values against the probability of occurrence of faulty classes.

Many research studies have developed metrics models or used metric values to predict the quality of software, both for design deviance [10, 23, 30, 38, 50] and for error proneness [4, 23, 50]. A few of these studies have further focused on error proneness due to poor design [23, 38, 50], but only a few separate studies have examined the relationship between design deviance and faulty classes [9, 12].

Rosenberg [16] derived threshold values for some CK metrics using error data collected during the development phase; the threshold values derived in that study can be used for redesigning. Shatnawi et al. [14] tested Rosenberg’s [16] values and found them less effective for predicting faulty classes than their own values. Shatnawi et al. [14] derived threshold values of software metrics for predicting faulty classes and were the first to use the quantitative risk-level approach in software metrics, which had been used earlier by Bender [13] in the field of epidemiology.

As a compromise between manual and automatic techniques, Dhambri et al. [33] proposed a visual technique for the identification of smelly classes. However, this technique cannot decide when a smelly class is actually smelly and produces an unhelpfully long list of candidates. A fully automatic technique was also proposed by Lanza et al. [33], but it does not provide threshold values for the detection of smelly classes either, and it leaves uncertainty about which of the detected classes are really smelly. None of the above approaches, whether manual or automatic, validates its results against the bugs associated with smelly classes. In this paper, the first step is to derive the threshold values for determining smelly classes and the second is to validate them with the bug database. This shows that improving the understandability of code through refactoring should also reduce the probability of bugs. Very few studies have considered more than five or six bad smells; the current study, however, takes 11 bad smells into account and derives the metrics threshold values based on these bad smells.

3 Data collection

Metrics and a bad smell database of Mozilla Firefox were collected using the Columbus Wrapper Framework tool (academic command prompt version, obtained on special request) [21]. It has previously been used and validated in a number of research studies [23, 26, 40, 43, 44, 50] by the developers of the Columbus tool. Metrics were selected so as to cover as many properties of the object-oriented methodology as possible, with the additional criterion that they can be measured or extracted by the Columbus tool. The selected set of metrics is DIT [2], NOC [2], RFC [2], WMC [2], LCOM [2], Co [1], CBO [2], NOA [37], NOOM [37], NOAM, PuF [23] and EncF [23]. The metrics are characterized as information hiding (PuF), encapsulation (EncF), class complexity (WMC), inheritance (DIT, NOC), class size (NOA, NOOM, NOAM), cohesion (LCOM, Co) and coupling (RFC, CBO). The criteria for the selection of bad smells were the same as those for the metrics. The selected set of bad smells is Large Method, Long Parameter List, Large Class, Temporary Fields, Shotgun Surgery, Lazy Class, Data Class, Speculative Generality, Middle Man, Feature Envy and Inappropriate Intimacy [22]. The Columbus tool compiled the source code and produced the metrics and bad smells for each class; separate files of metrics and bad smells were generated.

3.1 Bad smell extraction

Bad smell identification in code helps to refactor the code. Deciding whether to refactor a module is not simple; a number of studies have addressed the refactoring decision [58] on the basis of metrics models. Research results show that there is a relationship between structural attributes (design metrics) and external quality metrics (bad smells) [7, 11, 12, 23, 50]. Bad smell findings help the testing team to anticipate faults that are likely to arise in the future because of these bad smells [9, 10]; they also help managers to plan refactoring and the long-term evolution of the project. Table 1 shows the bad smell count in the classes for the three versions. A brief description of how the Columbus tool extracts the 11 bad smells is available in the product's white paper [42]. For better statistical analysis, the bad smells are divided into six categories [12]; the distribution of the affected smelly classes in the first five categories is shown in Table 2.

Table 1 Bad smell-affected class count distribution in three versions
Table 2 Categorised bad smell count distribution for three Firefox versions

3.2 The bug data collection

A bug database was used to validate the selected threshold values. The bug database of the Firefox versions was collected from Bugzilla [39], as was done by Gyimothy et al. [40]. Bugzilla includes the bugs reported over the lifetime of the Firefox project. The current study, however, only included bugs that appear in the classes extracted by the Columbus framework; bugs that do not belong to C++ source code classes were ignored. Each bug was collected manually and its details were obtained from the patch file (attached to the bug description) to find the affected classes. The patch file contains the information about changes in the source code: how many lines were deleted from a given line number in a source file and how many lines were inserted at a given line number. By manually analyzing the patch file, the class associated with each bug was found. For each version, classes were sought whose source code lines overlapped with the code interval of a bug's patch file (a minimal sketch of this mapping is given after the severity list below). If such a class was found, the bug count was increased for that class; if one bug affected more than one class, the bug count of each affected class was incremented. The Bugzilla community divides bugs into the following seven categories:

  • Blocker: The error blocked development and/or testing work.

  • Critical: The error caused crashes, loss of data, or severe memory leak.

  • Major: The error caused a major loss of function.

  • Normal: The error was a run-of-the-mill bug, i.e. a non-major error.

  • Minor: The error caused minor loss of function or other problems that were fixed easily.

  • Trivial: The error caused a cosmetic problem like misspelled words or misaligned text.

  • Enhancement: A request for an enhancement.

To check the metrics threshold values for detecting faulty classes, each class was marked as faulty if at least one fault was found and as no-fault otherwise. To check the metrics threshold values for each severity level, each fault was grouped into one of three severity categories: High, Medium and Low. If one class had faults falling into more than one severity category, each fault was listed separately for that class. Table 2 shows the fault distribution of each category for the three Firefox versions.
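The interval-overlap mapping from patch files to classes can be illustrated with a short sketch. The fragment below is an illustration of the idea only, not the tooling used in the study: `class_ranges` (mapping a file/class pair to the start and end lines of the class body) and `patch_hunks` (a list of file, start-line and line-count tuples parsed from a Bugzilla patch) are hypothetical names introduced here.

```python
# Minimal sketch: associate one bug (one patch file) with the classes whose
# source line intervals overlap the patch hunks. Data structures are
# hypothetical illustrations, not the study's actual implementation.
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive line intervals intersect."""
    return a_start <= b_end and b_start <= a_end

def add_bug_to_classes(class_ranges, patch_hunks, bug_count=None):
    """class_ranges: {(file, class): (start_line, end_line)}
    patch_hunks: [(file, start_line, n_lines), ...] for a single bug's patch."""
    bug_count = bug_count if bug_count is not None else defaultdict(int)
    affected = set()
    for file_, hunk_start, n_lines in patch_hunks:
        hunk_end = hunk_start + max(n_lines - 1, 0)
        for (cls_file, cls_name), (c_start, c_end) in class_ranges.items():
            if cls_file == file_ and overlaps(c_start, c_end, hunk_start, hunk_end):
                affected.add(cls_name)
    for cls_name in affected:          # a bug touching several classes
        bug_count[cls_name] += 1       # increments each class's count once
    return bug_count
```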

4 Research methodology

In this study, data from three versions (1.5, 2.0 and 3.0) of Mozilla Firefox were taken for analysis. Databases of twelve metrics (DIT, NOC, CBO, RFC, WMC, NOA, NOOM, NOAM, Co, LCOM, PuF and EncF) and 11 bad smells (Large Method, Long Parameter List, Large Class, Temporary Fields, Shotgun Surgery, Lazy Class, Data Class, Speculative Generality, Middle Man, Feature Envy and Inappropriate Intimacy) were extracted for these three versions. Bugs for the three versions were then collected from the Bugzilla database. For the analysis, the regression-based method used by Bender [13] (in epidemiology) and Shatnawi [14] (in software metrics) was applied to find the threshold values. The work starts with a univariate binary logistic regression (UBR) analysis to find the relationship between bad smells and metric values. Logistic regression is well suited for testing the relationship between a categorical dependent variable and one or more categorical or continuous predictor variables; it was used here because the dependent variable in the study is dichotomous, and the logistic function maps input from negative infinity to positive infinity onto the range 0–1. Only the metrics significantly associated in the UBR analysis were considered for finding the threshold values; the significance of the association between metrics and bad smells was tested at a 95 % confidence level, i.e. p value \(<0.05\). The premise of the study is that, once metrics threshold values have been derived successfully from the bad smells, they can be validated by predicting faulty classes. The threshold values are validated first with binary analysis and then with multinomial analysis. In the binary analysis, the modules are divided into two categories (faulty and no-fault) using the shortlisted metrics threshold values from the above analysis; in the multinomial analysis, faulty classes are further divided into three categories (high, medium and low error). In binary logistic regression, the input x is a measure of the total contribution of all the independent variables used in the model (the logit), and the output f(x) is a probability. The general multivariate logistic regression model is as follows [27] (univariate binary regression is the special case where \(n=1\)):

$$\pi (Y=1\mid X_{1}, X_{2},\ldots ,X_{n})= \frac{e^{\beta _{0}+ \beta _{1}x_{1}+ \beta _{2}x_{2}+ \cdots + \beta _{n}x_{n}}}{1+e^{\beta _{0}+ \beta _{1}x_{1}+ \beta _{2}x_{2} +\cdots + \beta _{n}x_{n}}}$$
(1)

where, \(\beta _{0}\) is the intercept and \(\beta _{i}(1 \le i \le n)\) are the regression coefficients; \(\pi \) is the probability of a class being smelly; \(Y\) is the dependent variable (a binary variable); \(X_{i} (1 \le i \le n)\) are the independent variables i.e. object-oriented metrics.

In the binary analysis, the prediction accuracy of the derived threshold metrics values is checked for whether a class is faulty or not. In the multinomial analysis, the accuracy of predicting the fault category (high, medium and low) is checked using the threshold values derived for each version. The threshold value with the highest fault prediction accuracy is selected.
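As an illustration of the UBR step, the sketch below fits one logistic model per metric and records the coefficient, intercept and p value used for the 95 % significance test. The use of pandas and statsmodels is an assumption made purely for illustration; the study does not state which statistical package was used, and the variable names are hypothetical.

```python
# Minimal sketch of univariate binary logistic regression (UBR), assuming
# `metrics` is a pandas DataFrame (one column per OO metric) and `smelly`
# a 0/1 Series marking classes flagged with at least one bad smell.
import pandas as pd
import statsmodels.api as sm

def ubr_analysis(metrics: pd.DataFrame, smelly: pd.Series, alpha: float = 0.05):
    """Fit Eq. (1) with n = 1 for each metric and flag significant ones (p < alpha)."""
    rows = {}
    for name in metrics.columns:
        X = sm.add_constant(metrics[[name]])      # intercept beta_0 plus slope beta_1
        fit = sm.Logit(smelly, X).fit(disp=0)     # maximum-likelihood logistic fit
        rows[name] = {
            "intercept": fit.params["const"],
            "coefficient": fit.params[name],
            "p_value": fit.pvalues[name],
            "significant": fit.pvalues[name] < alpha,
        }
    return pd.DataFrame(rows).T
```

Only the metrics flagged as significant in such an analysis are carried forward to the VARL threshold derivation below.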

4.1 Threshold model

Logistic regression is used to select the metrics significantly associated with bad smells. A threshold model was then developed from this regression analysis, based on Bender's method [13] as used by Shatnawi [14]. Bender's method for risk analysis is known as the value of an acceptable risk level (VARL) and is formally defined as follows:

$${\mathrm{{VARL}}}=p^{-1}(p_{0})=\frac{1}{\beta }\left( \log \left( \frac{p_{0}}{1-p_{0}}\right) -\alpha \right)$$
(2)

where \(\beta \) is the metrics regression coefficient value; \(\alpha \) is the constant coefficient value; \(p_{0}\) is the probability of the class being smelly at an acceptable risk level (R.L.).

VARL values were computed for those metrics that were significant in the univariate regression analysis.
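Equation (2) inverts the fitted logistic curve at a chosen risk level. A minimal sketch, assuming the intercept and coefficient (`fit_alpha`, `fit_beta`, hypothetical names) come from the univariate fit above:

```python
import math

def varl(alpha: float, beta: float, p0: float) -> float:
    """Value of an Acceptable Risk Level, Eq. (2): the metric value at which
    the fitted univariate logistic model assigns probability p0 to a class
    being smelly. alpha is the intercept, beta the metric's coefficient."""
    return (math.log(p0 / (1.0 - p0)) - alpha) / beta

# Candidate thresholds at the five risk levels used in the study, e.g.:
# thresholds = {p0: varl(fit_alpha, fit_beta, p0)
#               for p0 in (0.05, 0.065, 0.075, 0.1, 0.125)}
```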

4.2 Model accuracy

ROC analysis is related to cost-benefit analysis of decision making. It was first developed by engineers in World War II to detect enemy objects and is also known as signal detection theory [47]. Since then, ROC analysis has been used in many fields, such as medicine, biometrics and psychology, as well as in machine learning and model accuracy assessment. The first application of ROC analysis in machine learning was by Spackman [48], who used ROC curves to compare different classification algorithms. ROC curves are two-dimensional graphs that visually depict the performance and performance trade-offs of a classification model [15]. They were originally designed as tools in communication theory to visually determine optimal operating points for signal discriminators [47]. An ROC curve plots the relationship between sensitivity and specificity, which are defined from the confusion matrix as follows:

$$\begin{aligned} \mathrm{Sensitivity}&= \mathrm{true\,positive\,rate\,(TPR)} =\frac{TP}{TP+FN}\\ \mathrm{Specificity}&= \mathrm{true\,negative\,rate\,(TNR)}= \frac{TN}{TN+FP} \end{aligned}$$
Fig. 1 Confusion matrix

The area under the ROC curve (AUC) ranges between 0 and 1. The general rules for interpreting the classification performance of an ROC curve in terms of AUC are as follows [27]:

  1. AUC \(<\) 0.5 means no discrimination classification;

  2. 0.5 \(<\) AUC \(<\) 0.6 means poor classification;

  3. 0.6 \(<\) AUC \(<\) 0.7 means good classification;

  4. 0.7 \(<\) AUC \(<\) 0.8 means acceptable classification;

  5. 0.8 \(<\) AUC \(<\) 0.9 means excellent classification;

  6. AUC \(>\) 0.9 means outstanding classification.

The area under the curve (AUC) is the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance [45]. The machine learning community mostly uses the ROC AUC for model comparison, to select the optimal model from suboptimal models [46]. To measure the accuracy of the metrics threshold values, ROC curve analysis was used: the AUC checks the performance of the threshold values. Threshold values are valid for an error prediction model if the AUC falls in the acceptable range; AUC values below this range are not acceptable and the corresponding threshold values are not valid for smelly or faulty class prediction. ROC curve analysis is suitable for skewed and unbalanced data, which matches the Mozilla Firefox data: it is skewed and unequally distributed, on a par with the Eclipse data used by Shatnawi [4]. If the ROC AUC falls in the acceptable range, then first the corresponding metrics value for each Firefox version is shortlisted as a threshold metrics value for predicting smelly classes; second, the shortlisted threshold values are used to predict faulty classes and fault categories, again by the criterion of an acceptable AUC range. Threshold values derived from the relationship between bad smells and metrics are used for faulty class prediction because earlier studies [9, 10, 23, 50] have shown that a relationship exists between bad smells and faulty classes. Thus, classes not properly checked for bad smells, or with improper design, may over time become faulty classes.
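To make the accuracy check concrete, the sketch below scores a single candidate threshold against binary labels (smelly/non-smelly or faulty/no-fault). scikit-learn is used here only for convenience, not because the study names it; note that a fixed threshold gives a single operating point, so the resulting AUC equals the mean of sensitivity and specificity.

```python
# Minimal sketch: classify a class as positive when its metric value exceeds
# the candidate threshold, then measure sensitivity, specificity and AUC.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def threshold_performance(metric_values, labels, threshold):
    predicted = (np.asarray(metric_values) > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(labels, predicted).ravel()
    sensitivity = tp / (tp + fn)            # true positive rate (TPR)
    specificity = tn / (tn + fp)            # true negative rate (TNR)
    auc = roc_auc_score(labels, predicted)  # one-point ROC: (TPR + TNR) / 2
    return sensitivity, specificity, auc
```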

5 Hypothesis of study

There are two main aims of this study: first, to derive the threshold metrics values for the identification of smelly classes, and second, to use the statistically significant threshold metrics derived in the first step to identify faults and their categories. The null hypotheses of the study are as follows:

  1. There are no practical and effective threshold values for the OO metrics that can predict smelly classes during the design.

  2. A practically significant threshold metrics value that predicts the smelly classes will not predict the faulty classes or separate the classes into two modules (faulty and no-fault).

  3. A practically significant threshold metrics value that predicts the smelly classes will not predict the fault categories (low, medium and high) and no fault.

6 Metrics UBR analysis

In the univariate binary logistic regression (UBR) analysis, significant metrics for predicting smelly classes were selected. Analysis was done at a 95 % (p value \(<0.05\)) confidence level. Metrics with a p value greater than 0.05 are not used for the next step of threshold analysis. Table 3 shows the UBR analysis of three versions of Mozilla Firefox.

Table 3 Univariate binary regression analysis

It was noticed from the UBR analysis that all the selected metrics are significant predictors of smelly classes in all the three versions. Therefore, threshold values of all these metrics can be calculated with VARL analysis.

Table 4 VARL metrics threshold values of Mozilla Firefox
Table 5 VARL metrics threshold values of Mozilla Firefox

7 Metrics threshold values analysis

Threshold values for all the metrics selected above are calculated using (2). Table 4 shows the threshold values at five risk levels (0.05, 0.065, 0.075, 0.1 and 0.125). In some cases no useful threshold value was obtained; such values are not listed in Table 4 and the corresponding AUC column is marked N/A. There were no threshold values for the LCOM and Co metrics at the five chosen risk levels (R.L.) because the values deduced with the VARL analysis are less than the metric's minimum value or greater than its maximum. Table 5 shows that many metrics have potential threshold values with an AUC of more than 0.700; it also shows the AUC for binary fault categories of the selected metrics. Because the threshold value differs between versions and risk levels, one threshold metrics value was selected for each version with the highest AUC in the acceptable range (\({\mathrm{{AUC}}}>0.700\)). The analysis did not find practical threshold values for PuF, NOC, DIT, NOA (for versions 2.0 and 3.0), NOOM and NOAM; for these metrics the AUC lies below the acceptable range (\({\mathrm{{AUC}}}<0.700\)). Shatnawi et al. [14] also found no practical threshold values for LCOM, NOC, DIT, NOA, NOOM and NOAM; they did not consider threshold values for the Co and PuF metrics. Therefore, the threshold values for CBO, RFC, WMC, NOA (for version 1.5) and EncF are practically useful, valid and in the acceptable range (i.e. \({\mathrm{{AUC}}}>0.7\)). The selected metrics with their threshold values for each version are shown in Table 6. From the table it can be observed that CBO and RFC have approximately the same threshold for each version, whereas the values for EncF and WMC differ between versions. Table 6 also shows that the AUC is higher with the threshold values derived in this study than with those designed by Shatnawi et al. [14]. To obtain more practical and useful threshold values for each selected metric, these values were checked for their ability to predict faulty classes. Positive prediction of faulty classes with these threshold values shows that a bad smell will affect the long-term evolution or maintenance of the software product. Threshold values derived from this analysis will help testing and development teams to identify classes for refactoring. A metrics threshold value conveys information such as: when a class has a metric value greater than the threshold, this is a sign of poor design and the class needs inspection to improve the design. These values will further help to predict faulty classes, as shown in previous studies [9, 23, 50]. Various techniques [22] can be applied to improve the code design once metrics reach their threshold values.
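The selection procedure described above can be summarized in a short sketch: compute a VARL candidate at each risk level, discard values outside the observed metric range (as happened for LCOM and Co), and keep the candidate with the highest AUC above 0.700. The fragment reuses the hypothetical `varl` and `threshold_performance` helpers sketched earlier and is not the study's actual implementation.

```python
# Sketch of picking one practical threshold per metric and version.
def select_threshold(metric_values, smelly, fit_alpha, fit_beta,
                     risk_levels=(0.05, 0.065, 0.075, 0.1, 0.125)):
    best = None
    lo, hi = min(metric_values), max(metric_values)
    for p0 in risk_levels:
        candidate = varl(fit_alpha, fit_beta, p0)
        if not (lo <= candidate <= hi):      # no useful value, e.g. LCOM, Co
            continue
        _, _, auc = threshold_performance(metric_values, smelly, candidate)
        if auc > 0.700 and (best is None or auc > best[1]):
            best = (candidate, auc, p0)
    return best    # None when no practical threshold exists (AUC below 0.700)
```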

Table 6 Selected threshold metrics value for highest AUC for bad smell identification

8 Effectiveness of metrics threshold values

The threshold values proposed in Table 6 need to be checked for their accuracy and effectiveness in predicting faulty classes. The metrics values were evaluated by predicting the faulty classes in three releases of Firefox (1.5, 2.0 and 3.0) with two validation criteria: binary categorization and multinomial categorization. First, the threshold metrics values were tested for whether a class was faulty or not; the results of the shortlisted metrics for this binary categorization are shown in Table 7. Second, the metrics values were used to predict the fault category (high, medium or low impact, or no fault). In this second case, the shortlisted threshold values (Table 6) were checked for each fault category against the no-fault category. Shatnawi [3] also dropped the low-impact fault category (combining the high and medium categories) before deriving threshold metrics values for detecting faulty classes; according to Shatnawi [3], low-impact faults were not able to predict faulty classes. It was therefore expected that threshold values derived from bad smell design would predict the high- or medium-impact error categories, which include errors caused by shortcomings during development. If these threshold values predict faulty classes or fault categories, this indicates that controlling bad smells might successfully control errors or bugs. If the AUC falls in the acceptable range, these categorized faults can be predicted using the metrics threshold values designed for smelly classes. Table 8 shows the AUC for the different fault categories: the High and Medium categories have practical threshold values with \({\mathrm{{AUC}}}>0.700\), and in some cases the Low category does as well. Only the new metric EncF does not show a practical value for version 3.0, so this value was excluded from further analysis. Finding practical threshold values for the High and Medium categories indicates that if the proposed methodology is able to predict bad smells, then it can also predict the bugs that are likely to appear eventually. One threshold value needs to be selected for each of the metrics in Table 8. Each value was selected according to its efficiency in correctly detecting faulty classes, which can be assessed by measuring the sensitivity and specificity of each metrics value against the faulty classes (Fig. 1).

Table 7 AUC for fault binary categorizations of classes
Table 8 AUC for fault categorization

There are two acceptable AUC fault categories, High and Medium, for each metric. These two error categories were combined into one and the AUC was recalculated for each of the selected metrics in Table 8, as sketched below.
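A minimal sketch of this merging step, assuming a hypothetical mapping from each class to the set of severity labels of its reported faults; the merged labels can then be scored with the `threshold_performance` helper above to recalculate the AUC.

```python
def combined_severity_labels(severity_per_class):
    """Classes with at least one High or Medium fault form the positive
    class (1); classes with no faults, or only Low faults, are labelled 0.
    The study's exact handling of Low-only classes may differ."""
    return {cls: int(bool(sev & {"High", "Medium"}))
            for cls, sev in severity_per_class.items()}
```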

Table 9 AUC after combining high and medium categories fault

Table 9 shows the resulting AUC after combining the fault categories. It can be seen that, after combining the categories, EncF for versions 2.0 and 3.0 still does not have an AUC in the acceptable range, although the sensitivity in these two cases is nearly 100 %. The sensitivity in all other cases is also 100 % (1.0), indicating that these threshold values identify the faulty classes accurately for this Firefox data. Consequently, the threshold metrics values cannot be selected on the basis of sensitivity alone, as Shatnawi et al. [3] did; instead, they are selected on the basis of higher AUC: the higher the AUC, the more effective and practical the threshold value.

Table 10 Shortlisted threshold values
Table 11 Sensitivity measure for metrics threshold values
Table 12 Comparison of AUC with Rosenberg threshold values

On the basis of the higher-AUC criterion, when the threshold value of EncF derived from version 1.5 (i.e. 0.030) is applied to versions 2.0 and 3.0, the AUC is in the acceptable range. For CBO and RFC the threshold values are nearly the same for each version; for WMC the values derived for versions 1.5 and 2.0 are nearly the same, with the only difference arising for version 3.0. Table 10 shows the final threshold values selected on the basis of the area under the ROC curve. Table 11 compares the sensitivity measures with Rosenberg's threshold values (applied to version 1.5); in each case the sensitivity is higher with the proposed threshold values. Comparisons with the other versions are shown in Table 12 and the corresponding ROC curves in Fig. 2.

Fig. 2 ROC curves of software metrics for predicting faulty classes (after combining high and medium category faults) in ver. 2.0 and ver. 3.0

9 Result and discussion

The threshold values derived in this study will help to identify smelly classes and, furthermore, to predict faulty classes. In addition, smelly classes can be improved with various refactoring techniques [22]. The metrics threshold values in this study will therefore help during maintenance to identify the areas or classes needing refactoring (which aids understanding [22]), and this will further reduce the maintenance cost. The results show practical threshold values only for the CBO, RFC, WMC, EncF and NOA metrics for predicting smelly classes in the three releases of Mozilla Firefox. On the other hand, the threshold metrics values derived to predict smelly classes were not able to predict the binary fault categorization of classes (i.e. whether a class is faulty or not). However, the threshold metrics values (CBO, RFC, WMC, EncF and NOA) present a more granular view by predicting the two fault categories with high and medium impact. Thus, the first and third hypotheses of the current study were rejected and the second hypothesis was accepted. The results are also discussed in comparison with the previous study by Rosenberg [16]. With the exception of CBO, the values proposed by [16] are larger than those proposed in this study. It was assumed that the probability of error occurrence is higher in the development phase than in the release phase, so the difference with Rosenberg's metrics threshold values was surprising, because he had collected the errors during the development phase. The current study compared its results with Rosenberg's by calculating the AUC. The comparison was conducted by applying the proposed threshold values and the threshold values of Rosenberg [16] to Firefox version 2.0, version 3.0 and Mozilla SeaMonkey 1.0.1 (Table 12 and Fig. 2). The AUC values with the proposed threshold metrics are higher in each case than with those from [16]. The effectiveness of the threshold values was validated by identifying the faulty classes (after combining high- and medium-impact bugs) of versions 2.0 and 3.0, because all the threshold values were derived from version 1.5; the results show accuracy with \({\mathrm{{AUC}}}>0.70\) (the acceptable range). The results were not checked against Shatnawi's [14] proposed values, which are valid only for the Eclipse system: he himself checked the validity of his metrics with other systems (Rhino and Mozilla) and found that the threshold values were not valid for them.

10 Threshold for other system

To generalize the proposed model, the current study checked the threshold effect on two other systems: one from within the same company (the Mozilla community, i.e. SeaMonkey 1.0.1) and one from a different company (Licq 1.3.8).Footnote 1 The bug database of SeaMonkey was collected from the Bugzilla community, whereas the Licq bug database was collected from GitHub.Footnote 2 Both systems are developed in C++. SeaMonkey has more than 3,000 classes, whereas Licq 1.3.8 is a smaller system with 280 classes. Results from various studies [51–54] show that within-company prediction models outperform cross-company ones. Furthermore, [9] and [14] also validated their models on cross-company projects and did not find practical threshold values; the models designed in their studies were validated on cross-company projects by designing new threshold metrics for those projects. Therefore, new metrics threshold values were designed for Licq with the VARL analysis, as in the case of Firefox. The significance of the traditional and proposed metrics as predictors of smelly classes in Licq was checked with UBR logistic regression, as was done for Firefox, and the VARL at the five risk levels was calculated for the significant metrics. The results for the shortlisted metrics (CBO, RFC, WMC and EncF), selected on the basis of maximum AUC for predicting smelly classes, are shown in Table 13. These selected values were checked for predicting smelly and faulty classes, and the results were also compared with the threshold values designed by [16] (Table 14).

Table 13 Selected metrics threshold values of Licq 1.3.8 after VARL analysis
Table 14 AUC of bad smell for Mozilla SeaMonkey 1.0.1. and Licq 1.3.8
Table 15 AUC of faulty classes for Mozilla SeaMonkey 1.0.1 and Licq 1.3.8

Threshold values were also tested for predicting whether a class is faulty or not in SeaMonkey and Licq. The results for these two case studies are tabulated in Table 15. In the case of SeaMonkey, the AUC was calculated after combining the high and medium category faults; because no severity-level database is available for Licq, only the binary categorization prediction accuracy was checked with AUC values. The results show an improvement in prediction accuracy with the proposed threshold values over the Rosenberg values, and the results for SeaMonkey and Licq are similar to the Firefox data shown in Table 7. As with the three Firefox versions, the prediction accuracy for the binary categorization of faults is not in the acceptable range with the Licq data. ROC graphs for the comparative study of threshold values on Mozilla SeaMonkey 1.0.1 and Licq are shown in Fig. 3. These results can be applied to similar types of object-oriented systems that are available as open source and contributed by many volunteer programmers.

Fig. 3 ROC curves of software metrics for predicting faulty classes in Mozilla SeaMonkey 1.0.1 and Licq 1.3.8

10.1 Threats to validity

The study found a significant association between metrics and bad smells, and further with the different categories of faulty classes after the systems were released. Some limitations of the research are as follows:

  1. Error data for Mozilla Firefox and Mozilla SeaMonkey were collected from the Bugzilla database only. Any error that was not logged was not considered, and the study makes no claim about errors that were not discovered or fixed during the life of these systems.

  2. The study does not generalize these results arbitrarily to any type of error classification. The classification was subjective and depended on how the developers originally used it.

11 Conclusion

The threshold values derived for bad smells are verified by predicting faulty classes. Tables 8 and 9 show that the derived threshold values are practically significant for detecting faulty classes. Even when applied to subsequent Firefox versions and to Mozilla SeaMonkey, these values predicted smelly and faulty classes significantly, whereas for Licq only the AUC for smelly class prediction is in the acceptable range. The results show that refactoring of modules, besides improving understandability [22, 49], helps to identify smelly areas during maintenance; further, by predicting the faulty classes, refactoring may also reduce the faults that appear after the release of the system. On the basis of these results, the present study does not suggest that refactoring always controls bugs after release: the AUC values obtained lie in the range 0.700 to 0.814, which is only the acceptable range, not the excellent range. These results can be used to predict errors while maintenance is ongoing. We recommend further study of these hypotheses with other statistical techniques, including power-law or machine learning models. To validate this research, more experiments need to be done with different object-oriented programming languages. Moreover, this study was carried out on open source systems and needs to be repeated on professionally built applications.