
1 Introduction

It is difficult for humans to develop software without introducing defects, so finding and fixing bugs is an important part of the software life cycle. People use bug tracking systems to collect errors discovered by developers, testers, and end-users. The most commonly used bug tracking system is Bugzilla, and many open-source projects rely on it to manage their projects; Eclipse, for example, receives a large number of bug reports every day. This volume cuts both ways: on the one hand, these reports help us improve the quality of software; on the other hand, handling them manually is a very time consuming task. If we cannot extract useful information from these bug reports, more data does not bring much value.

Because of the large number of bug reports, it is unrealistic to rely solely on people to deal with them, and researchers have gradually proposed more and more analytical methods for processing bug reports. In this article, we survey the different problems that researchers have studied on existing bug report data sets and summarize the techniques that have been used so far to solve them.

In order to better review existing work and look ahead to future work, we organize previous research along three dimensions: problem, data, and technique. From the problem perspective, some work focuses on bug detection: it analyzes existing bug reports, extracts the characteristics of bugs, and, when a new report arrives, lets the machine automatically determine whether it describes a real bug, whether it duplicates a bug already in the repository, and which existing bugs are similar to it, so that the problem can be solved faster. Other researchers try to automatically identify the severity and priority of new bugs from existing reports, so that developers can prioritize the most pressing issues and spend their limited time on more meaningful work. A bug usually exists because some source files were written incorrectly, so some researchers mine past fixes to find the relationship between bugs and source files, helping people locate and fix bugs faster. Still others observe that developers differ in their ability to solve bugs in different areas; they learn the distribution of existing bugs and automatically recommend the most suitable developer when a new bug arrives.

From the data perspective, most existing work is based on bug reports collected from Bugzilla, but different researchers pay attention to different fields. Some only use the textual information in the report, such as the description and summary fields. Others believe that structured information also carries important signals and consider fields such as priority and component in addition to the text. Still others argue that analyzing bug reports alone is not enough and combine them with external data. How to combine these different types of information reasonably has always been a difficult point. From the technical perspective, most work is based on traditional information retrieval techniques, machine learning, and deep learning. In recent years, with the development of big data and the significant improvement of computing power, "big data + complex model" has become an attractive choice. Neural networks and deep learning have therefore become increasingly popular and have demonstrated their capabilities in many fields, and more and more researchers are trying to replace traditional information retrieval and basic machine learning algorithms with neural networks.

2 Bug Report Problem Classification

In this survey, we mainly summarize the problems that researchers have studied on bug reports from the problem perspective and analyze the data and techniques they use. We divide the existing research related to bug reports into five categories: bug detection, bug level determination, bug localization, bug developer recommendation, and other issues.

In terms of bug detection, we mainly consider three sub-problems: bug report classification, which determines whether a report describes a bug; duplicate bug detection, which determines whether two bug reports describe the same problem; and similar bug detection, which determines whether two bugs are of the same type. Bug level determination is divided into two sub-problems: bug severity prediction and bug priority prediction. Bug localization and bug developer recommendation are two separate and widely studied issues. By dividing bug report research into these categories, it becomes easier to identify the research hot spots of recent years and to anticipate future trends.

3 Bug Detection

Bug detection involves three main problems. The first is whether the problem described in a bug report is actually a bug: users do not only submit bugs to the tracking system, they also raise other requests, such as ways to make the system more convenient, so we need to identify the real bugs before solving them. The second is duplication: many reports submitted by users refer to the same bug but with different descriptions, and we need to determine whether two bug reports are about the same problem. The third is identifying similar bugs: if two bugs are very similar, their causes may also be similar, so recommending similar bug reports to developers may help them locate and fix bugs faster.

3.1 Bug Report Classification

A large number of reports are actually misclassified, and manually classifying bug reports is a very time consuming task: [1] spent 90 days manually sorting over 7,000 bug reports. It is therefore necessary to classify reports automatically.

Researchers first used the text fields of bug reports to extract features and determine whether a new report describes a bug. [2] applied topic modeling to a corpus of pre-processed bug reports and then classified them using decision trees, naive Bayes classifiers, and logistic regression. Their experiments indicate that the topic-based model outperforms the word-based model and that the naive Bayes model classifies better than the other two.
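To make this kind of pipeline concrete, the following minimal sketch (Python with scikit-learn) projects a handful of hypothetical bug report texts into an LDA topic space and trains a classifier on the topic vectors. It illustrates the general topic-based idea rather than reproducing the exact setup of [2]; all reports, labels, and parameter values are placeholders.

```python
# A minimal sketch of topic-based bug report classification.
# Reports and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

reports = [
    "app crashes with null pointer exception when opening settings",
    "please add a dark theme option to the preferences dialog",
    "segmentation fault after clicking the export button twice",
    "feature request: support drag and drop in the file browser",
]
labels = [1, 0, 1, 0]  # 1 = real bug, 0 = non-bug (e.g. feature request)

# Bag-of-words, then project each report into a low-dimensional topic space.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(counts)

# Train a simple classifier on the topic vectors instead of raw words.
clf = LogisticRegression().fit(topic_vectors, labels)

new_report = ["null pointer crash when saving a file"]
new_topics = lda.transform(vectorizer.transform(new_report))
print(clf.predict(new_topics))  # predicted class (1 = bug) under this toy model
```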

Some researchers believe that the structured information in a bug report can also help judge whether it describes a bug. [3] used a hierarchical Dirichlet process (HDP) and clustering to classify bug reports: each report is projected into a topic vector space, and a clustering method then aggregates the reports and tags the categories. How to better combine structured and unstructured data has always been an important issue. [4] proposed a hybrid approach to classify bug reports: they used text mining to extract features from the report summary, combined them with structured features from the report using data mining methods, and finally used a Bayesian classifier for prediction.
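A rough sketch of such a hybrid pipeline is shown below: TF-IDF features from the summary are concatenated with one-hot encoded structured fields before training a naive Bayes classifier. The field values, labels, and feature choices are illustrative assumptions, not those of [4].

```python
# A minimal hybrid classifier: text features + structured fields.
# All summaries, field values and labels are hypothetical.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.naive_bayes import MultinomialNB

summaries = ["crash on startup", "add export to pdf", "memory leak in parser"]
structured = [["UI", "P1"], ["Export", "P3"], ["Core", "P2"]]  # component, priority
labels = [1, 0, 1]  # 1 = bug, 0 = non-bug

text_vec = TfidfVectorizer()
X_text = text_vec.fit_transform(summaries)

enc = OneHotEncoder(handle_unknown="ignore")
X_struct = enc.fit_transform(structured)

# Concatenate both feature blocks into one sparse matrix.
X = hstack([X_text, X_struct])
clf = MultinomialNB().fit(X, labels)

new = hstack([text_vec.transform(["crash when parsing file"]),
              enc.transform([["Core", "P1"]])])
print(clf.predict(new))  # prediction for the new report under this toy model
```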

3.2 Duplicate Bug Report

In 2018, [5] found that the proportion of duplicate bugs reached 20% in the bug repositories of Mozilla Core, Firefox, and others. Such duplicates may be solved by developers multiple times, wasting human resources, and so far bug tracking systems cannot detect duplicate bugs automatically when collecting them [6]. Work on detecting duplicate bugs falls into two categories: one applies natural language processing (NLP) techniques to unstructured textual information, such as the bug title and summary; the other focuses on the execution information in bug reports.

Duplicate Bug Detection Based on NLP. [7] used BM25 term weighting to transform bug reports into a vector space and found that the right term weighting is critical for detecting duplicate bug reports.
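The following self-contained sketch implements the standard BM25 scoring formula to rank candidate duplicates for a new report. It illustrates the role of term weighting in general rather than the exact weighting scheme of [7]; the toy corpus and the parameter values (k1, b) are assumptions.

```python
# A small, self-contained BM25 scorer for ranking candidate duplicate reports.
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Document frequency of each term in the corpus.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

corpus = ["crash when opening settings dialog".split(),
          "ui freeze on large file import".split(),
          "settings dialog crashes on open".split()]
query = "crash opening settings".split()
print(bm25_scores(query, corpus))  # the first report scores highest
```

Note that without stemming the third report scores lower even though it describes the same problem, which illustrates why purely lexical matching is often insufficient.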

There are many challenges in using textual information. Since people have different writing habits, different words may be used to express the same concept, so text matching alone is not enough; it is necessary to analyze the semantics of bug reports. [8] treated each bug report as a text document and used the corpus to train word embedding models [9]. Using the trained embeddings, they converted each report into a vector and trained a deep neural network on these vectors to learn the distribution of duplicate and non-duplicate bug reports. There have also been other attempts: [10] proposed combining Latent Dirichlet Allocation (LDA) with word embeddings to determine whether a report is a duplicate. The idea is to use LDA's higher recall first to exclude the most dissimilar bug reports, and then apply the word embedding model to the remaining reports to compute their similarity.

Duplicate Bug Detection Based on Execution Information. The execution information of a bug describes the context in which it occurred, and duplicate bugs share the same context. [11] proposed a duplicate detection method involving both execution information and natural language information. [12] considered domain knowledge and the software's context; in experiments on the Android bug tracking system, they found that duplicate detection can be improved by considering keywords from the Android domain. Some researchers believe that the more features of a bug report are used, the more precisely a bug can be characterized: [13] defined 25 features of a bug report as the basis for classification, most of them generated by the TakeLab system, and then trained an SVM model to classify the bugs. Topic models enable efficient semantic analysis and text mining, and [14] proposed a novel duplicate detection method based on them, combining the similarity of reports in the topic space with the similarity of their categorical information to predict duplication.

Deep learning methods are also involved in this area. [15] proposed a retrieval and classification model combining CNN and LSTM [16] to address duplicate bugs: a vanilla single-layer neural network handles the structured information in bug reports, an LSTM handles the short description, a CNN processes the long description, and the three are combined to learn bug report features. [5] proposed using word embeddings and convolutional neural networks to compute the similarity between bug reports, which captures semantic similarity rather than only textual similarity; their method also combines domain-specific features (e.g., component, creation time, and priority) to better detect duplicates. Over time, more new models have been proposed for this problem. [17] used stack traces and hidden Markov models to automatically detect duplicate bug reports and recognized the clear benefit of stack trace information, which they believe improves detection accuracy. They evaluated their model with recall and Mean Average Precision (MAP) on the Firefox and GNOME datasets and obtained better results than the baseline models.
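As an illustration of such multi-branch architectures, the Keras sketch below encodes one bug report with an LSTM branch for the short description, a CNN branch for the long description, and a small dense branch for structured fields. All dimensions, vocabulary sizes, and layer choices are placeholders rather than the settings of [15]; in a full system, two such report encodings would be compared (or fed to a classifier) to decide duplication.

```python
# A simplified multi-branch encoder for a single bug report (placeholder sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB = 10000, 64            # hypothetical vocabulary and embedding size
short_in = layers.Input(shape=(20,), name="short_desc")      # token ids
long_in = layers.Input(shape=(200,), name="long_desc")       # token ids
struct_in = layers.Input(shape=(8,), name="structured")      # e.g. one-hot fields

emb = layers.Embedding(VOCAB, EMB)                            # shared embedding
short_feat = layers.LSTM(64)(emb(short_in))                   # short description -> LSTM
long_feat = layers.GlobalMaxPooling1D()(
    layers.Conv1D(64, 3, activation="relu")(emb(long_in)))    # long description -> CNN
struct_feat = layers.Dense(16, activation="relu")(struct_in)  # structured fields -> MLP

merged = layers.concatenate([short_feat, long_feat, struct_feat])
report_vec = layers.Dense(128, activation="relu")(merged)     # learned report representation

encoder = Model([short_in, long_in, struct_in], report_vec)
encoder.summary()
# Two reports would each be encoded this way and their vectors compared
# (or concatenated into a classifier) to predict duplication.
```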

3.3 Similar Bug Report

Similar bugs are bugs whose reports are related to common code files. Unlike duplicate bug reports, two reports are generally considered similar when more than 50% of the files modified to fix them are shared. By recommending similar bug reports to developers, we can help them locate the cause of an error faster and solve new bugs efficiently.
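One possible formalization of this working definition uses the Jaccard ratio of the modified file sets; the threshold and fix data below are illustrative.

```python
# A tiny helper: two fixed bugs are "similar" when their modified files
# overlap by more than 50% (Jaccard ratio; values are illustrative).
def shared_file_ratio(files_a, files_b):
    """Fraction of modified files shared between two bug fixes."""
    a, b = set(files_a), set(files_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

fix1 = ["ui/Dialog.java", "core/Parser.java"]
fix2 = ["core/Parser.java", "ui/Dialog.java", "util/Log.java"]
print(shared_file_ratio(fix1, fix2) > 0.5)  # True -> considered similar
```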

[18] combined traditional information retrieval with word embedding techniques and considered the title, description, and other components of a bug report to recommend similar reports to developers; in their experiments, the approach outperformed NextBug [19]. Inspired by this, [20] proposed using a document embedding model to further improve performance: they added a document embedding component to the three existing components, which mines the latent relationships between two bug reports at the document level for better results.

4 Bug Level Determination

4.1 Bug Report Severity

A bug report generally includes a severity field, which helps developers resolve serious errors first; severity is typically divided into levels such as crash, error, low efficiency, and minor. Automatically detecting severity benefits bug report processing by letting high-severity reports be handled first. [21] extracted concept words from bug reports to construct concept profiles (CP); when a new bug report arrives, they compute its similarity to each severity concept profile to determine the bug's severity. Determining the profile for each severity level, however, requires manual effort. [22] used unsupervised methods to check whether the severity of a bug report has been assigned correctly: they used a Gaussian mixture model to group similar bug reports, assigned severity labels to the groups, and finally used supervised algorithms to predict the severity of unlabeled reports.
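A rough sketch of the unsupervised grouping step is given below: reports are embedded as TF-IDF vectors, reduced to a dense low-dimensional space, and grouped with a Gaussian mixture model, after which each group could be inspected and labeled with a severity by hand. The reports and parameters are hypothetical, and the sketch simplifies the pipeline of [22].

```python
# Unsupervised grouping of bug reports before severity labeling (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture

reports = [
    "application crashes and all unsaved work is lost",
    "crash with data corruption on save",
    "tooltip text slightly misaligned in the toolbar",
    "minor typo in the about dialog",
]

# Dense, low-dimensional representation (the Gaussian mixture needs dense input).
X = TfidfVectorizer().fit_transform(reports)
X_dense = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_dense)
print(gmm.predict(X_dense))  # e.g. crash-like reports vs. minor reports
```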

Considering only textual information is not enough. [23] proposed a nearest neighbor method based on information retrieval to predict bug severity: they used an extended BM25 document similarity function to select the k reports most similar to the new report and predicted its severity from the severities of those k reports. In addition to the textual information in the report, they also considered structured information such as product and component.
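The nearest-neighbour idea can be sketched as follows, using plain TF-IDF cosine similarity instead of the extended BM25 function of [23] and a hypothetical history of labeled reports: the severity of a new report is taken as the majority severity among its k most similar neighbours.

```python
# k-nearest-neighbour severity prediction over a toy report history.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = ["crash on startup with data loss",
           "segfault during export of a large project",
           "label overlaps button in settings",
           "wrong icon shown in dark theme"]
severities = ["critical", "critical", "minor", "minor"]

vec = TfidfVectorizer()
H = vec.fit_transform(history)
new = vec.transform(["application crash during export"])

sims = cosine_similarity(new, H).ravel()
k = 2
top_k = sims.argsort()[::-1][:k]          # indices of the k most similar reports
print(Counter(severities[i] for i in top_k).most_common(1)[0][0])  # "critical"
```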

4.2 Bug Report Prioritization

[24] proposed using different machine learning algorithms, such as naive Bayes, decision trees, and random forests, to predict the priority of reported bugs. They experimented with two feature sets: the first based on the textual description of the report, the second based on its predefined metadata. The experiments showed that random forests and decision trees classify better than naive Bayes and that the second feature set gives better results than the first. [25] argued that previous work did not take the reporter's sentiment into account when predicting priority: if the submitter's description sounds very anxious, the severity and priority of the bug may be high. They therefore extracted features from the bug report, computed a sentiment value for each report from the sentiment words in its summary, and combined the two to train a model that predicts priority. Rich, unstructured information in bug reports has also been exploited: [26] extracted temporal, textual, author, related-report, severity, and product information as features and used a linear regression model to determine report priority. They defined five priority levels and used a thresholding approach to address data imbalance.
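As a toy illustration of sentiment-augmented features in the spirit of [25], the sketch below appends a crude lexicon-based urgency score of the summary to TF-IDF features before training a priority classifier; the lexicon, reports, and priority labels are all made up.

```python
# Priority prediction with text features plus a simple urgency score (toy data).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

URGENT_WORDS = {"urgent", "immediately", "blocker", "critical", "asap"}

def urgency_score(text):
    tokens = text.lower().split()
    return sum(t in URGENT_WORDS for t in tokens) / max(len(tokens), 1)

summaries = ["urgent blocker app crashes immediately on login",
             "cosmetic issue with padding in footer",
             "critical data loss when syncing please fix asap",
             "minor typo in tooltip"]
priorities = ["P1", "P4", "P1", "P5"]

vec = TfidfVectorizer()
X_text = vec.fit_transform(summaries)
X_sent = csr_matrix(np.array([[urgency_score(s)] for s in summaries]))
X = hstack([X_text, X_sent])                  # text features + urgency column

clf = RandomForestClassifier(random_state=0).fit(X, priorities)
new = "blocker cannot log in fix immediately"
print(clf.predict(hstack([vec.transform([new]),
                          csr_matrix([[urgency_score(new)]])])))
```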

5 Bug Localization

To fix a bug, developers need to locate the source files that caused it, which is difficult, especially in a large system. The biggest challenge in automatic bug localization is the mismatch between the terms used in the bug report and those used in the source files. There are several ways to address this. One direction is to analyze the source code itself: [27] argued that structural information in the code, such as class names and method names, can help locate bugs better; their method only requires the source code and the bug report. However, the terms used in a bug report to describe an error may not match the terms used in the source files. [28] combined deep neural networks (DNN) and information retrieval (IR) to locate the files associated with a bug: IR techniques compute the textual similarity between bug reports and source files, while the DNN links specific terms in the report to terms in the source files. [29] found that when a bug report lacks rich, structured information, information retrieval often does not work well, and that excessive stack trace information does not help localization either.
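A bare-bones IR baseline for bug localization can be sketched as follows: source files are ranked by their textual similarity to the bug report. The file contents and report below are hypothetical, and real systems additionally exploit code structure, stack traces, and history.

```python
# Rank source files by TF-IDF cosine similarity to the bug report (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_files = {
    "ui/LoginDialog.java": "class LoginDialog handle login button click show error",
    "core/SessionManager.java": "class SessionManager create session token expire refresh",
    "util/Logger.java": "class Logger write log message to file",
}
bug_report = "login fails with error after clicking the login button"

names = list(source_files)
vec = TfidfVectorizer()
X = vec.fit_transform(list(source_files.values()) + [bug_report])
sims = cosine_similarity(X[-1:], X[:-1]).ravel()

ranking = sorted(zip(names, sims), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{score:.3f}  {name}")   # LoginDialog.java should rank first
```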

Some researchers believe that considering the version history can better reveal potential error locations. [30] proposed AmaLgam, a model that combines historical data, similar reports, and structural information to locate files related to bugs, and achieved good performance. To help developers better understand existing code, researchers also use information retrieval techniques to map bug reports to the associated code units. [31] proposed a composite model built from 15 tf-idf-based variants of the vector space model; they combined the VSM scores with AmaLgam [30] in a weighted sum of file suspiciousness to locate buggy files.

6 Bug Developer Recommendation

When a new bug arrives, which developer should be assigned to fix it? We could assign bugs to developers randomly, but this is inefficient. Developers generally have their own areas of expertise, so it is better to recommend to each developer the bug reports that fall within their area. Researchers have tried to address this by analyzing different kinds of data, such as textual information, structured information, and developer profiles.

[32] observed that most previous work focused only on open-source projects. They used convolutional neural networks and word embeddings to build an automatic developer recommender for bug fixing and applied the technique to both industrial and open-source projects. They identified two main challenges: first, in multinational companies whose working language is not English, bug reports often mix the native language and English; second, industrial projects differ from open-source projects and may contain many domain-specific terms. They mainly extracted the description and summary text fields of the bug report as features. They also proposed a workflow in which manual and automated classification cooperate and reported their experience in an industrial development environment.

Textual information in bug reports alone often does not yield satisfactory results. [33] introduced a highly scalable recommendation system for bug report assignment. In addition to the textual information of the report, they used structured information such as component id, product id, and bug severity as features, and employed convolutional and recurrent neural networks as deep learning classifiers. They also restricted the candidate developers: in their view, only developers who are still active in the project, that is, who have been fixing bugs for some time, should be assigned new bug fixes. They identified optimizing training speed and predictive performance as directions for further research.

Summarizing the bugs that each developer has fixed in the past is a good way to characterize developers. [34] proposed creating an activity profile from the history of all user activities in the bug tracking system, building a model from these profiles, and recommending the appropriate developer for a new bug through this model; from such a profile we can roughly know a developer's role in the system and their areas of expertise. Although profiles help distribute bugs to developers, mining each developer's historical data takes time. [35] proposed DevRec, which consists of two kinds of analysis: bug report based analysis and developer based analysis. The bug-based analysis finds bugs in the repository that are similar to the newly collected bug; by looking at who fixed them, potential fixers for the new bug can be found. They converted the features of bug reports into vectors to compute the similarity between two reports. The developer-based analysis measures the distance between a bug report and a developer, correlating the developer with the characteristics of the report. They combined these two analyses for the best performance. [36] proposed a unified model based on learning-to-rank that combines activity-based techniques, which find the developers who have solved similar bugs, with location-based techniques, which find the right developer for the bug's location.
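A simple activity-based scoring scheme, in the spirit of the approaches above but not any one paper's exact model, can be sketched as follows: each developer is scored by the textual similarity between the new report and the reports they fixed before. All reports and developer names are made up.

```python
# Score developers by similarity of the new bug to bugs they previously fixed.
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fixed_history = [
    ("alice", "crash in rendering engine when resizing window"),
    ("alice", "rendering glitch with transparent layers"),
    ("bob",   "memory leak in network connection pool"),
    ("bob",   "timeout handling in http client"),
]
new_bug = "window resize causes rendering crash"

texts = [t for _, t in fixed_history]
vec = TfidfVectorizer()
X = vec.fit_transform(texts + [new_bug])
sims = cosine_similarity(X[-1:], X[:-1]).ravel()

scores = defaultdict(float)
for (dev, _), s in zip(fixed_history, sims):
    scores[dev] += s                       # accumulate similarity per developer
print(max(scores, key=scores.get))         # "alice"
```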

7 Other Problems

7.1 Generating Bug Fixes

Although a bug report tells the developer that there is a defect in the system, the reporter may be unable to fix it, for example because they lack a development environment, and the defect remains in the system. [37] proposed R2Fix, a method that automatically generates bug fixes from bug reports. They chose buffer overflows, null pointer errors, and memory leaks to evaluate the method, because the repair patterns for these three types of errors are relatively simple and can be identified. When R2Fix receives a new bug report, it analyzes the report, determines whether the error belongs to one of the three types, and finally generates a candidate patch to fix the bug. In their evaluation, R2Fix automatically generated correct patches for 57 errors with an accuracy of 71.3%, and it also found potential errors that testers had missed. Given the difficulty of automatic fix generation, there is still a long way to go before automatic patch generation becomes general.

7.2 Automatic Vulnerability Recognition

Automatically identifying potential vulnerabilities will greatly improve the efficiency of software maintenance. To find unrecognized vulnerabilities in open-source libraries and fix them, [38] proposed an automatic vulnerability identification system. They used a variety of machine learning classifiers as basic classifiers, extracted features from both bug report commits and the reports themselves, and automatically discovered unrecognized vulnerabilities in submitted reports through natural language processing and machine learning. However, because of the complexity of this problem, existing methods do not perform well, and future work will continue to explore new ways to solve it.

7.3 Bug Report Summarization

To help developers quickly understand the information in a bug report, [39] proposed a two-layer semantic model (TSM) to extract the important information from a report. They first used an extended NR (ENR) model to retain the sentences with important semantics, then used a Bug Report Classifier (BRC) to extract text features from those sentences, and finally used a logistic regression model to select high-scoring sentences and generate a summary of the report. [40] explored deep neural networks for bug report summarization: they used bug report preprocessing, unsupervised network training, and summary generation to assign scores to the sentences of a report, and then dynamically selected the high-scoring sentences to form the summary.
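As a minimal illustration of extractive bug report summarization, the sketch below scores sentences by their average TF-IDF weight and keeps the top-scoring ones; this is a simple baseline, not the TSM or neural models of [39, 40], and the report text is hypothetical.

```python
# Extractive summarization of a bug report via TF-IDF sentence scoring (toy data).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

report_sentences = [
    "The application crashes when the user opens the settings dialog.",
    "I was just trying to change the font size.",
    "The stack trace points to a null pointer in SettingsController.load.",
    "Thanks for looking into this.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(report_sentences)
# Score each sentence by its mean TF-IDF weight over its non-zero terms.
scores = np.asarray(X.sum(axis=1)).ravel() / np.maximum(X.getnnz(axis=1), 1)
top = np.argsort(scores)[::-1][:2]          # keep the two highest-scoring sentences
for i in sorted(top):
    print(report_sentences[i])
```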

8 Conclusion - What’s the Outlook?

We have presented a comprehensive survey of bug reports, categorizing current bug report tasks by problem, data, and technology and summarizing the state of the art for each task. What is next for bug reports? We close with potential future directions drawn from past insights. First, different types of data will be used to better analyze bug reports; second, more advanced models will be designed to better address specific problems; last but not least, the efficiency and practicality of these methods will need to be considered.