1 Introduction

A large number of bugs are reported on bug tracking systems by users, developers, and staff members located at different geographical locations. Bug priority (P1, the most important, to P5, the least important) is an important attribute that determines the importance of a bug and the order in which bugs are fixed in the presence of other bugs. To automate bug priority prediction, we need historical data to train the classifiers. In reality, this data is not easily available in all software projects, especially in new projects. Cross-project priority prediction works well in such situations: the classifiers are trained with historical data of projects other than the testing projects [1, 2].

Bug reports are filed by users having different levels of knowledge about the software, which results in uncertainty and noise in bug report data. "Without proper handling of these uncertainties and noise, the performance of learning strategies can be significantly reduced" [22]. An entropy-based measure has been used to calculate the uncertainty in bug summaries reported by different users. In the literature, researchers [1, 2] have made attempts at cross-project bug summary-based priority prediction, but no attempt has been made to handle uncertainty in the bug summary in a cross-project context for bug priority prediction. We have proposed summary entropy-based cross-project priority prediction models using Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), and Neural Network (NNET). In addition to the summary entropy, we have also considered bug severity and the derived summary weight attribute. Results show an improvement in performance over summary-based cross-project priority prediction models [2].

The rest of the paper is organized as follows: Sect. 2 deals with related work. Section 3 describes the datasets, bug attributes, and model building required to perform the analysis. Results are discussed in Sect. 4. Finally, the paper is concluded in Sect. 5.

2 Related Work

Bug priority assessment helps in correct resource allocation and bug fix scheduling. A bug priority recommender was proposed by Kanwal and Maqbool [3] using the SVM classification technique; the study was later extended to compare the performance of SVM and NB with different feature sets [4]. An attempt at bug priority prediction was made by Alenezi and Banitaan [5] using NB, Decision Tree (DT), and Random Forest (RF) on Firefox and Eclipse datasets. Yu et al. [6] proposed defect priority prediction using an Artificial Neural Network (ANN) technique; their results show that ANN performs better than the Bayes algorithm. Tian et al. [7] proposed a new framework called DRONE (PreDicting PRiority via Multi-Faceted FactOr ANalysEs) for Eclipse projects and compared it with SeverisPrio and SeverisPrio+ [8].

In literature, several studies have been conducted in cross-project context [9,10,11,12,13,14,15,16].

Bug summary-based cross-project priority prediction models have been proposed in [1, 2] using SVM, NB, k-NN, and NNET; the results show that cross-project bug priority prediction works well. The authors have also proposed bug summary-based cross-project severity prediction models [17].

Software evolves through source code changes made to fix different issues, namely bugs, new features, and feature improvements reported by different users. These source code changes introduce uncertainty and randomness into the system. In the literature, entropy-based measures have been used to quantify the code change process for defect prediction [18] and to predict the potential complexity of code changes [19]. A software reliability uncertainty analysis method has been proposed by Mierswa et al. [20].

To our knowledge, no work has been done on measuring the trustworthiness of bug summary data in bug repositories. The uncertainty/noise present in bug summary data can affect the performance of prediction models. In this paper, we have measured the uncertainty in bug summaries using entropy-based measures. In addition to summary entropy, bug severity and summary weight have been considered for bug priority prediction in a cross-project context. We have compared our proposed summary entropy-based bug priority prediction models with Sharma et al. [2] and found an improvement in the performance of the classifiers.

3 Description of Datasets, Bug Attributes, and Model Building

This section discusses the datasets and bug attributes used for validation, as well as the model building.

3.1 Description of Datasets

We have taken different products, namely Platform Version 2 (V2) and Platform Version 3 (V3) of the Eclipse project (http://bugs.eclipse.org/bugs/), and Database Access (DB), Spreadsheet (SST), and Presentation (PPT) of the OpenOffice project (http://bz.apache.org/ooo/). We have considered bug reports with status "verified," "resolved," and "closed." Table 1 shows the distribution of bug reports across the different priority levels.

Table 1 Priority-wise number of bug reports of different projects

3.2 Bug Attributes

To predict bug priority in a cross-project context, we have considered three attributes, namely severity, summary weight, and entropy of summary. Severity is a nominal attribute, whereas summary weight and entropy are continuous attributes. Bug severity gives the impact of a bug on the functionality of the software or its components. It is divided into seven levels, namely "Blocker, Critical, Major, Normal, Minor, Trivial, and Enhancement," where Blocker is the highest level and Enhancement the lowest. Bug priority determines the importance of a bug in the presence of others; bugs are prioritized from level P1, the most important, to level P5, the least important. The bug summary gives a textual description of the bug, and summary weight is extracted from this summary attribute, entered by the users.
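RapidMiner consumes the nominal severity attribute directly; when reproducing this setup with numeric classifiers, an ordinal encoding that preserves the ordering of the seven levels is one natural choice. The mapping below is a minimal sketch of such an encoding, assumed for illustration rather than taken from the paper:

```python
# Ordinal encoding of the seven severity levels, from lowest (Enhancement)
# to highest (Blocker). This mapping is illustrative; the original RapidMiner
# setup handles nominal attributes natively.
SEVERITY_LEVELS = ["Enhancement", "Trivial", "Minor", "Normal",
                   "Major", "Critical", "Blocker"]
SEVERITY_CODE = {level: code for code, level in enumerate(SEVERITY_LEVELS, start=1)}

print(SEVERITY_CODE["Blocker"])      # 7, the highest severity level
print(SEVERITY_CODE["Enhancement"])  # 1, the lowest severity level
```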

The bug summary has been preprocessed with the RapidMiner tool [21] to calculate the summary weight of a reported bug [2].
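The exact operator chain of [2] is not reproduced here; as a minimal sketch, assuming the standard pipeline of tokenization, stop-word removal, and TF-IDF term weighting, the summary weight of a report can be viewed as an aggregate of its term weights. The helper names and the choice of summing term weights below are our assumptions:

```python
import re
from math import log
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when"}

def tokenize(summary):
    """Lower-case a summary and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", summary.lower()) if t not in STOP_WORDS]

def tfidf(summaries):
    """Per-summary TF-IDF weights computed over the whole collection."""
    docs = [Counter(tokenize(s)) for s in summaries]
    df = Counter(term for doc in docs for term in doc)  # document frequency
    n = len(docs)
    return [{t: tf * log(n / df[t]) for t, tf in doc.items()} for doc in docs]

def summary_weight(term_weights):
    """Collapse one summary's term weights into a single attribute (assumed: sum)."""
    return sum(term_weights.values())

summaries = ["Crash when opening large spreadsheet",
             "Editor crash on startup in platform UI"]
for tw in tfidf(summaries):
    print(round(summary_weight(tw), 3))
```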

Different users report bugs on the bug tracking system, and the size of software repositories is increasing at an enormous rate, which adds noise and uncertainty to bug priority prediction. If these uncertainties are not handled properly, the performance of the learning strategy can be significantly reduced [22]. We have proposed an entropy-based measure to handle these uncertainties and to build the classifiers for bug priority prediction in a cross-project context. We have used Shannon entropy to build the classifier models.

Shannon entropy, S, is defined as

$$S = - \sum\limits_{i} p_{i} \log_{2} p_{i}$$

where \(p_{i} = \frac{{{\text{Total number of occurrences of terms in }} i{\text{th bug report}}}}{\text{Total number of terms}}\). The \(i\)th term, \(- p_{i} \log_{2} p_{i}\), gives the entropy contribution of the \(i\)th bug report.

The top 200 terms, ranked by weight, have been selected from all terms. To rationalize the effect of priority, we multiplied the entropy by 10 for P1 and P2 priority level bugs, by 3 for P3 priority level bugs, and by 1 for P4 and P5 priority level bugs [23].
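A minimal sketch of this per-report computation is given below; the function name and the choice of folding the priority-based scaling into it are ours, not from [2, 23]:

```python
from math import log2

# Priority-dependent scaling factors stated above: 10 for P1/P2, 3 for P3, 1 for P4/P5.
SCALE = {"P1": 10, "P2": 10, "P3": 3, "P4": 1, "P5": 1}

def summary_entropy(occurrences_in_report, total_terms, priority):
    """Scaled entropy contribution -p_i * log2(p_i) of a single bug report."""
    p_i = occurrences_in_report / total_terms
    if p_i <= 0:
        return 0.0
    return -p_i * log2(p_i) * SCALE[priority]

# Example: a report whose terms account for 12 of 5000 retained occurrences, at P2.
print(summary_entropy(12, 5000, "P2"))  # ~0.21
```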

3.3 Model Building

We have proposed summary entropy-based classifiers using SVM, k-NN, NNET, and NB for bug priority prediction in a cross-project context, taking the bug attributes severity and summary weight in addition to the summary entropy. We have taken the bug reports of two products of the Eclipse project and three products of the OpenOffice project. To obtain good performance, appropriate parameter values have been used: "For SVM, we have taken polynomial kernel with degree 3, the value of k as 5 in case of k-NN and for NNET the training cycle as 100" [2]. The number of validations is taken as 10, with stratified sampling, for the different classification techniques. The performance of the proposed models has been validated using different performance measures, namely Accuracy, Precision, Recall, and F-measure.
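The experiments were run in RapidMiner; as a rough equivalent, a scikit-learn sketch of the same classifier settings and validation scheme might look as follows. The feature matrix and labels are placeholders, and MLPClassifier stands in for RapidMiner's NNET operator:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Classifier settings quoted above: polynomial kernel of degree 3 for SVM,
# k = 5 for k-NN, and 100 training cycles for NNET.
classifiers = {
    "SVM": SVC(kernel="poly", degree=3),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "NNET": MLPClassifier(max_iter=100),  # may warn about convergence at 100 cycles
}

# Placeholder features (severity, summary weight, summary entropy) and P1-P5 labels.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = np.repeat(["P1", "P2", "P3", "P4", "P5"], 20)

# 10 validations with stratified sampling, as in the RapidMiner process.
cv = StratifiedKFold(n_splits=10)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"{name}: mean Accuracy {scores.mean():.3f}")
```

For the cross-project setting, each classifier would instead be fitted on the feature matrix of one product and evaluated with predict on that of another.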

Figure 1 shows the main process of cross-project priority prediction.

Fig. 1 RapidMiner process for bug priority prediction in cross-project context

4 Results and Discussion

We have validated the entropy-based classifiers of different machine learning techniques, namely SVM, k-Nearest Neighbors, Naïve Bayes, and Neural Network, using 10-fold cross-validation for predicting bug priority. We have compared the proposed entropy-based approach with Sharma et al. [2], taking the same datasets and techniques to predict bug priority in a cross-project context. Table 2 shows the Accuracy of the different machine learning techniques in predicting the priority of cross-validated projects.

Table 2 Accuracy (%) of cross-validated projects

Accuracy for Training Dataset V2

For testing dataset V3, our entropy-based approach improves the Accuracy by 3.46% and 91.93% for SVM and NB, respectively. Our entropy-based approach improves the Accuracy by 7.86%, 10.21%, 2.81%, and 82.85% for SVM, k-NN, NNET, and NB, respectively, for testing dataset DB. For testing dataset SST, our approach improves the Accuracy by 6.66%, 8.42%, 2.96%, and 82.08% for SVM, k-NN, NNET, and NB, respectively. Our entropy-based approach improves the Accuracy by 11.69%, 10.99%, 13.00%, and 85.19% for SVM, k-NN, NNET, and NB, respectively, for testing dataset PPT.

Accuracy for Training Dataset V3

Our entropy-based approach improves the Accuracy by 6.34%, 6.57%, 6.40%, and 82.10% for SVM, k-NN, NNET, and NB, respectively, for testing dataset DB. For testing dataset SST, our approach improves the Accuracy by 3.46% and 91.93% for SVM and NB, respectively. Our entropy-based approach improves the Accuracy by 9.39%, 8.16%, 7.41%, and 76.44% for SVM, k-NN, NNET, and NB, respectively, for testing dataset PPT.

Accuracy for Training Dataset DB

For testing dataset V2, our entropy-based approach improves the Accuracy by 3.43%, 10.29%, 9.09%, and 60.21% for SVM, k-NN, NNET, and NB, respectively. Our entropy-based approach improves the Accuracy by 1.04%, 0.11%, 2.70%, and 79.27% for SVM, k-NN, NNET, and NB, respectively, for testing dataset V3. For testing dataset SST, our approach improves the Accuracy by 5.46%, 13.35%, 16.66%, and 76.19% for SVM, k-NN, NNET, and NB, respectively. Our entropy-based approach improves the Accuracy by 8.77%, 5.66%, 11.59%, and 60.12% for SVM, k-NN, NNET, and NB, respectively, for testing dataset PPT.

Accuracy for Training Dataset SST

For testing dataset V2, our entropy-based approach improves the Accuracy by 3.14%, 1.08%, and 49.11% for SVM, k-NN, and NB, respectively. Our entropy-based approach improves the Accuracy by 0.92% and 75.93% for SVM and NB, respectively, for testing dataset V3. For testing dataset DB, our approach improves the Accuracy by 7.89%, 11.11%, 4.18%, and 75.83% for SVM, k-NN, NNET, and NB, respectively. Our entropy-based approach improves the Accuracy by 8.89%, 5.25%, 4.02%, and 52.07% for SVM, k-NN, NNET, and NB, respectively, for testing dataset PPT.

Accuracy for Training Dataset PPT

For testing dataset V2, our entropy-based approach improves the Accuracy by 4.33%, 5.85%, 8.59%, and 70.95% for SVM, k-NN, NNET, and NB, respectively. Our entropy-based approach improves the Accuracy by 1.42%, 1.64%, 0.38%, and 84.93% for SVM, k-NN, NNET, and NB, respectively, for testing dataset V3. For testing dataset DB, our approach improves the Accuracy by 9.23%, 5.62%, and 72.92% for SVM, k-NN, and NB, respectively. Our entropy-based approach improves the Accuracy by 7.46% and 75.73% for SVM and NB, respectively, for testing dataset SST.

Out of 19 combination cases, SVM, k-NN, NNET, and NB outperform in 19, 16, 14, and 19 cases, respectively, in comparison with Sharma et al. [2]. Our approach improves the Accuracy by 0.92–11.69% for SVM, 0.11–13.35% for k-NN, 0.38–16.66% for NNET, and 49.11–91.93% for NB across all the 19 combinations for bug priority prediction in cross-project context. SVM and NB outperform for bug priority prediction across all 19 combinations.

Table 3 shows the best training dataset with the highest Accuracy for different machine learning techniques. Across all the machine learning techniques, on the basis of Accuracy, DB is the best training dataset for the V2, V3, and SST testing datasets, SST is the best training dataset for the DB testing dataset, and V2 is the best training dataset for the PPT testing dataset.

Table 3 Classifier-wise best training candidate with highest accuracy

Avg. F-Measure for Training Dataset V2

From Table 4, we observed that the value of F-measure (avg.) lies in the range 34.32–48.49%, 30.69–40.52%, 31.63–40.04%, and 35.13–39.44% for testing datasets V3, DB, SST, and PPT, respectively, across all the machine learning techniques.

Table 4 Average precision (P), recall (R), and F-measure (F) for training dataset (V2 product)

Avg. F-Measure for Training Dataset V3

As given in Table 5, the value of F-measure (avg.) lies in the range 33.94–35.22%, 33.53–35.98%, and 32.04–35.96% for testing datasets DB, SST, and PPT, respectively, across all the machine learning techniques.

Table 5 Average precision (P), recall (R), and F-measure (F) for training dataset (V3 product)

Avg. F-Measure for Training Dataset DB

Table 6 shows that the value of F-measure (avg.) lies in the range 25.23–41.53%, 26.33–38.09%, 30.93–51.12%, and 32.30–42.75% for testing datasets V2, V3, SST, and PPT, respectively, across all the machine learning techniques for bug priority prediction.

Table 6 Average precision (P), recall (R), and F-measure (F) for training dataset (DB product)

Avg. F-Measure for Training Dataset SST

From Table 7, we observed that the value of F-measure (avg.) lies in the range 25.00–39.50%, 26.27–38.80%, 32.46–51.34%, and 31.84–46.00% for testing datasets V2, V3, DB, and PPT, respectively, across all the machine learning techniques.

Table 7 Average precision (P), recall (R), and F-measure (F) for training dataset (SST product)

Avg. F-Measure for Training Dataset PPT

Table 8 shows that the value of F-measure (avg.) lies in the range 26.71–40.29%, 29.10–39.88%, 27.88–40.53%, and 27.64–37.54% for testing datasets V2, V3, DB, and SST, respectively, across all the machine learning techniques.

Table 8 Average precision (P), recall (R), and F-measure (F) for training dataset (PPT product)

Table 9 shows the best training dataset with the highest F-measure (avg.) for different machine learning techniques. Across all the machine learning techniques, on the basis of F-measure, DB is the best training candidate for the V2 testing dataset, V2 is the best training candidate for the V3 testing dataset, SST is the best training candidate for the DB testing dataset, DB is the best training candidate for the SST testing dataset, and SST is the best training candidate for the PPT testing dataset.

Table 9 Classifier-wise best training candidate with highest F-measure (average)

Figure 2 shows the Accuracy comparison using SVM machine learning technique for cross-project bug priority prediction.

Fig. 2 SVM accuracy comparison (proposed work vs. Sharma et al., 2014 [2])

Figure 3 shows the Accuracy comparison using the k-NN machine learning technique for cross-project bug priority prediction.

Fig. 3 k-NN accuracy comparison (proposed work vs. Sharma et al., 2014 [2])

Figure 4 shows the Accuracy comparison using the NNET machine learning technique for cross-project bug priority prediction.

Fig. 4 NNET accuracy comparison (proposed work vs. Sharma et al., 2014 [2])

Figure 5 shows the Accuracy comparison using the NB machine learning technique for cross-project bug priority prediction.

Fig. 5 NB accuracy comparison (proposed work vs. Sharma et al., 2014 [2])

5 Conclusion

In the absence of data for building a classifier, a cross-project study provides a solution. In this paper, we have proposed an approach for cross-project bug priority prediction using three attributes: bug severity, summary weight, and summary entropy. To learn from the uncertainty in reported summaries, we have derived an attribute termed summary entropy using Shannon entropy. To build the classifiers, we have used machine learning techniques, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), and Neural Network (NNET). The classifiers built with these techniques predict the priority of a reported bug in a cross-project context with high accuracy and outperform the work available in the literature.