1 Introduction

Nowadays, many big software organizations, such as Microsoft and Mozilla, embed automatic problem reporting tools in their software systems. Whenever the software crashes (i.e., terminates unexpectedly) in a user's environment, the automatic problem reporting tool collects information about the crash and sends a detailed crash report to the software vendor. A crash report usually contains the stack trace of the failing thread and other runtime information. A stack trace is an ordered set of frames, each referring to a method signature. Crash reports are used by several stakeholders, such as developers fixing crashes and product managers allocating development resources. Using crash reports, Microsoft developers were able to fix 29 % of the bugs found in Windows XP SP1, and more than 50 % of the Office XP SP2 bugs (Connecting with customers 2012). The automatic collection of crash reports helped Mozilla developers improve the reliability of Firefox by 40 % from November 2009 to March 2010 (Firefox Stability Improvement 2012).

Built-in automatic crash reporting tools often collect large numbers of crash reports. For example, Mozilla Firefox receives 2.5 million crash reports every day (Socorro: Mozilla's Crash Reporting Server 2012). To reduce the number of crash reports to handle, similar crash reports are identified and grouped together based on the similarity of their stack traces. We refer to a group of similar crash reports as a crash type. The signature of a crash type is usually the top method signature of its stack traces. The crash types are sorted by number of crash reports, and developers usually file bug reports for the top crash types, i.e., crash types with high numbers of crash reports. Developers then use the stack traces of the failing threads, contained in the crash reports, to diagnose and fix the bugs.

A bug frequently triggers crashes in different usage scenarios, causing different crash types to be linked to the same bug. A crash type can also be linked to multiple duplicate or correlated bug reports. A duplicate bug report describes a problem that has already been filed. Two bug reports are considered correlated if the occurrence of the bug in one report causes the bug in the other report to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group (CCG). A crash type can belong to one or several crash correlation groups. For example, if a crash type \(CT_1\) shares a bug report with a crash type \(CT_2\) and another bug report with a crash type \(CT_3\), then \(CT_1\) belongs to two crash correlation groups, i.e., \(\{CT_1, CT_2\}\) and \(\{CT_1, CT_3\}\).

The identification of crash correlation groups can help developers identify correlated crash types and fix bugs more efficiently; crash types in a crash correlation group should be analyzed together when fixing bugs. Crash correlation groups also provide a diversity of crashing scenarios that can help developers identify the root cause of the bugs.

Many studies have been performed on the use of stack traces in crash reports to locate and fix bugs. Schröter et al. (2010) examined stack traces in bug reports and found that bugs are fixed faster when their reports contain at least one stack trace. Brodie et al. (2005) proposed a method based on a comparison of stack traces to identify similar bugs using historical information on known bugs. Dhaliwal et al. (2011) examined the use of stack traces for bug fixing and identified some limitations in the crash grouping process of Mozilla Firefox. They proposed a grouping approach for crash reports, based on a comparison of failing stack traces using the Levenshtein distance (Kruskal 1983), to build sub-groups of the crash reports of a crash type. Their sub-grouping strategy improves the existing Mozilla crash reporting system and, based on their empirical study, can help reduce the bug fixing time by more than 5 %.

In our previous work, published at the 10th Working Conference on Mining Software Repositories (Wang et al. 2013), we proposed three rules to identify correlated crash types automatically, using structural information about the crash types (i.e., the crash signatures and stack traces).

In this paper, in addition to using structural information, we investigate the possibility of identifying correlated crash types using temporal and semantic information. The temporal information is related to the co-occurrence times of crash types, and the semantic information is related to the textual similarity between the user comments provided for the crash types. We also explore the possibility of using crash correlation groups to help development teams fix bugs and identify duplicate bug reports.

We conduct our study using Firefox crash reports and Eclipse bug reports. We address the following five research questions:

RQ1: Can we identify correlated crash types using crash type signatures and stack traces?

We strive to propose simple rules for the identification of crash correlation groups (i.e., correlated crash types) using the structural information of crash types. First, we examine the signatures of crash types and derive a rule to automatically identify crash correlation groups. The rule does not require a detailed analysis of failing stack traces and can identify crash correlation groups with a precision of 100 % and a recall of 68 % for Firefox. On Eclipse, the rule achieves a precision of 69 % and a recall of 46 %. To improve on these results, we examine failing stack traces and propose two additional rules to detect correlated crash types automatically. When executed together, our three rules identify crash correlation groups in Firefox with an average precision of 91 % and an average recall of 87 %. On Eclipse, the three rules achieve an average precision of 76 % and an average recall of 61 %. The average execution time of the three rules is on the order of 128 seconds, so scalability is preserved.

RQ2: Can we identify correlated crash types using the occurrence times of crash events?

Crash types that are frequently reported by the same users within a short time period can be correlated. We examine the co-occurrences of crash types and propose one additional rule to detect correlated crash types automatically. This rule can identify crash correlation groups in Firefox with an average precision of 52 % and an average recall of 58 %. The highest recall it can achieve is 84 %. This rule is not applicable to Eclipse, since the time at which user comments are posted in Eclipse's Bugzilla is not the actual time at which the exceptions occurred.

RQ3: Can we identify correlated crash types using the textual similarity between user comments about the crash events?

User comments describe the crashing scenarios of crash types. Correlated crash types could have similar user comments; therefore, we examine the similarity between the text mined from the user comments of crash types and propose one additional rule to detect correlated crash types automatically. This rule identifies crash correlation groups in Firefox with an average precision of 54 % and an average recall of 46 %. On Eclipse, the rule achieves an average precision of 42 % and an average recall of 30 %.

RQ4: Can the correlated crash types help identify buggy files?

We propose an algorithm that builds on our crash correlation group identification rules to locate and rank suspicious files using the stack traces of correlated crash types. When considering only the top three buggy file candidates, our algorithm achieves a recall of 62 % and a precision of 42 % on Firefox, and a recall of 52 % and a precision of 50 % on Eclipse. The top ten candidate files reported by our algorithm can recover up to 92 % of buggy files in Firefox and up to 90 % of buggy files in Eclipse.

RQ5: Can the correlated crash types help identify duplicate bug reports?

We investigate the possibility of using the correlated crash types to identify duplicate or related bug reports. Our proposed approach, which uses the relations among crash correlation groups for duplicate bug report identification, achieves a precision of 55 % and a recall of 50 % on Firefox, and a precision of 38 % and a recall of 47 % on Eclipse. This confirms that using correlations between crash types can help identify duplicate bug reports.

This paper is an extended version of our earlier work (Wang et al. 2013). The original work:

  • proposes one rule based on the comparison of crash type signatures and two rules based on stack traces to group correlated crash types;

  • conducts an empirical study of the effectiveness of the three rules on stack traces from Firefox crash reports and Eclipse bug reports;

  • proposes an approach, using the correlations between crash types within a crash correlation group, to help development teams locate buggy files;

  • conducts an empirical study of the effectiveness of our approach for locating buggy files on Firefox and Eclipse.

We extend the earlier work in the following aspects:

  1. We build two additional rules: one based on the co-occurrence times of crash types, the other based on the textual similarity between crash types.

  2. We conduct an empirical study on Firefox and Eclipse to verify the effectiveness of these two rules in identifying correlated crash types.

  3. We propose an approach using the relations between crash correlation groups to identify duplicate and related bugs.

  4. We conduct an empirical study of the effectiveness of our approach for identifying duplicate and related bug reports on Firefox and Eclipse.

The rest of this paper is organized as follows. Section 2 explains the process of crash reporting and introduces stack traces and crash types. Section 3 introduces the experimental setup. Section 4 presents the research questions of our study; for each research question, we present the motivation, introduce the analysis approach and discuss the results. Section 5 discusses threats to validity. Section 6 summarizes the related literature. Finally, Section 7 concludes the paper and outlines some avenues for future work.

2 Background

2.1 Crash Reporting

Many software organizations use a bug tracking system (e.g., Eclipse's Bugzilla) to store and track bugs. When a crash occurs on a user's machine, the software generates a failing stack trace that developers can use to fix bugs related to the crash. Users usually file bug reports in bug tracking systems to report crashes and include failing stack traces in comments made on the crashes. Other users can also share their failing stack traces by commenting on the filed bug reports. The failing stack traces in the comments of bug reports, as well as other information in the bug reports, can help developers reproduce and fix the bugs.

However, not all users file bug reports or report failing stack traces. To ensure that developers get the necessary information to fix bugs, more software organizations now ship their products to users with an embedded problem reporting tool that can collect failing stack traces automatically (e.g., the Mozilla Crash Reporter embedded in the Firefox browser). When a crash occurs, the failing stack trace is automatically collected by the problem reporting tool and a crash report containing information related to the crash is sent to a crash report repository (e.g., the Mozilla Socorro crash report server illustrated in Fig. 1) maintained by the software organization. A crash report usually contains a signature, the stack trace of the failing thread, some runtime information such as the crash time, and information about the user environment, e.g., the operating system, the version, and the install time. Some crash reports also contain user comments discussing the crashes. Crash reports are grouped into crash types and ranked based on their frequency of occurrence. We discuss the grouping of crash reports in Section 2.2. For the top crash types, bug reports are created in a bug tracking system and linked to their corresponding crash types. Multiple bug reports can be filed for a single crash type, and multiple crash types can be associated with the same bug report. A bug report contains detailed semantic information about a bug, such as the bug open date and the bug status. Moreover, users can comment on a bug in the filed bug reports, and some comments also contain stack traces (e.g., in Eclipse's bug reports). Bug reports are triaged and assigned to developers for fixing.

Fig. 1 Mozilla crash report system

2.2 Stack Traces, Crash Reports and Crash Types

A stack trace is an ordered set of frames \(\langle F_1, F_2, \ldots, F_n \rangle\). Each frame \(F_i\) is composed of a method signature, which we denote by methSign, and a fully qualified file name, which we denote by qfileName: \(F_i = methSign_i | qfileName_i\), where \(i \in \{1 \ldots n\}\) is the position of the frame \(F_i\) in the stack trace, and \(n\) is the total number of frames in the stack trace. \(F_1\) is the top frame of the stack trace. Figure 2 presents an example of a stack trace extracted from a crash report of Firefox.

Fig. 2 Example of a stack trace from Firefox

Each crash report contains a failing stack trace. On the Mozilla Socorro server, crash reports are grouped into crash types based on the similarity of the top frames (i.e., \(F_1\)) of their stack traces (Dhaliwal et al. 2011). The crash time of a crash type is the time of its first crash report received by the Socorro server. Usually, the top frames of all the stack traces in a crash type are identical. The method signature (i.e., methSign) from the common top frame is used as the crash type signature; for example, in Fig. 2, the method signature OnWriteSegment of frame \(F_1\) is used as a crash type signature. In the following, we refer to the top frame common to all the stack traces of a crash type as the top frame of the crash type. The subsequent frames in a stack trace, however, might differ across the crash reports of a crash type.

A crash type signature \(S\) can be represented by the following structure: \(S = P_1 | P_2 | \ldots | P_n\), where each element \(P_i\) is composed of \(\langle File \rangle \langle Op \rangle \langle Method \rangle \langle Parameter \rangle \langle MemoryLocation \rangle\). File, Op, Method, and Parameter are respectively the name of a file or class, an operator or separator, a method, and a parameter.

In a crash type signature, at least one \(P_i\) must be non-NULL. In a \(P_i\), the attributes File, Op, Method, and Parameter can be NULL. However, a \(P_i\) cannot be formed using only an operator (i.e., Op). The value of Op depends on the programming language and the approach used to compose a signature; e.g., for the Firefox browser, written in C++, Op is generally either the scope operator "::" or a separator "_". Figure 3 shows an example crash type signature from the Mozilla Socorro server. This signature is composed of two elements. The first element \(P_1\) contains File and MemoryLocation; its Op, Method, and Parameter are NULL. In the second element \(P_2\), the memory location is NULL.

Fig. 3 Example crash type signature from the Mozilla Socorro server

The format used in Eclipse's stack traces is different from the format used in Firefox's stack traces. Figure 4 presents an example of a stack trace extracted from Eclipse's bug reports, and Fig. 5 shows the structure of a frame in Eclipse stack traces.

Fig. 4 Example of a stack trace from Eclipse

Fig. 5 Structure of a frame used in an Eclipse stack trace

In Fig. 5, Exception is the name of a Java exception (e.g., org.eclipse.core.commands.ExecutionException, as shown in Frame 1 in Fig. 4), Message is the description of the exception (e.g., While undoing the operation, an exception occurred), qfilePath is the path, in the file directory structure, of the Method in which the exception was raised (e.g., org.eclipse.jface.text.projection.internalAdd as shown in Fig. 4), File is the name of the file that caused the exception (e.g., ProjectionDocument.java), and Line is the exact location in File where the exception was triggered. A stack trace from Eclipse is mapped to the format of Firefox's stack traces as follows: \(methSign = \langle Exception | Message | Method \rangle\) and \(qfileName = \langle qfilePath | File \rangle\). If \(Exception = NULL\), then \(methSign = Method\).

We regroup Eclipse stack traces with similar top frames into crash types using the concatenation \(\langle File | Method \rangle\) from their common top frame, as illustrated by the sketch below. This approach is similar to the grouping of Firefox's crash reports on the Mozilla Socorro server.
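To make this grouping concrete, here is a minimal sketch (in Python, with hypothetical frame dictionaries; not the actual Socorro implementation) that groups stack traces into crash types by the key built from their top frame:

```python
from collections import defaultdict

def eclipse_top_frame_key(frame):
    # Crash type key for an Eclipse stack trace: the <File|Method>
    # concatenation from its top frame, as described above.
    return frame["file"] + "|" + frame["method"]

def group_into_crash_types(stack_traces, key_fn):
    # Group stack traces whose top frames share the same key. Each trace
    # is a list of frames ordered from the top frame (index 0) downwards.
    crash_types = defaultdict(list)
    for trace in stack_traces:
        crash_types[key_fn(trace[0])].append(trace)
    return crash_types

# Two traces raising an exception in the same file and method end up
# in the same crash type.
traces = [
    [{"file": "ProjectionDocument.java", "method": "internalAdd"},
     {"file": "Other.java", "method": "run"}],
    [{"file": "ProjectionDocument.java", "method": "internalAdd"}],
]
groups = group_into_crash_types(traces, eclipse_top_frame_key)
assert len(groups) == 1
```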

3 Experimental Setup

This section discusses our data collection and processing.

3.1 Data Collection

We conduct our study on two software systems: Firefox (written mainly in C/C++) and Eclipse (written in Java). Firefox is an open-source Web browser developed by the Mozilla Corporation. It is currently the third most widely used browser, with approximately 24 % usage share worldwide (Web browsers). Eclipse is an open-source integrated development environment. It is a platform used both in the open-source community and in industry.

We analyze 7 beta versions of Firefox, i.e., Firefox-4.0b1 to Firefox-4.0b7. For each beta version, we download the summaries of all related crash types stored on the Socorro server. We select the crash types for which at least one bug report was filed. For each selected crash type, we download the Firefox crash reports from the Socorro server, ordered by crash time from latest to earliest. Table 1 reports the descriptive statistics of our dataset. In total, we obtained 1,256 crash types. For all the bug reports filed for our selected crash types, we retrieve the bug reports from Bugzilla. We also download the Firefox change logs to extract the list of files changed to fix each bug.

Table 1 Descriptive Statistics of Our Data Set on Firefox

To the best of our knowledge, only the Mozilla Foundation has opened the crash reports of its products to the public. To verify the replicability of our study on other systems, we downloaded the MSR Mining Challenge 2008 data set containing 213,000 Eclipse bug reports filed between October 2001 and December 2007.

3.2 Data Processing

Figure 6 shows an overview of our data processing approach. First, we process Firefox crash reports to extract failing stack traces, user comments, user environment information (e.g., crash time and operating system), and the IDs of bugs filed for the crashes. Second, we parse Eclipse bug reports to extract failing stack traces and their descriptions from user comments, and the IDs of bugs filed for the crashes. Third, we identify crash correlation groups (CCGs) defined by developers for the validation of our approach. Fourth, we use user environment information to identify the users who report crashes. Then, we conduct word normalization on the user comments of crashes. Next, we parse Firefox and Eclipse change logs to identify bug fix locations, and we map these bug fix locations to the stack traces.

Fig. 6 Overview of our approach to study correlations between crash types

The remainder of this section elaborates on each of these steps.

3.2.1 Data Extraction from Firefox and Eclipse

We now discuss in detail the data extraction for Firefox and Eclipse.

Firefox

For each crash type selected in our study, we extract the list of crash reports of the crash type and the failing stack traces contained in the crash reports by parsing the corresponding HTML pages. We extract any user comments in the crash reports of each selected crash type and maintain a mapping between a crash type and its user comments, which serve as the textual description of the crash type. We further extract user environment information, such as the operating system and crash time, and maintain a mapping between user environment information and each crash report. We also extract the IDs of all the bugs filed for the crash types. We obtain a mapping linking each crash type to the list of its crash reports and the list of bug IDs filed against the crash type. Furthermore, we download the bug reports using the extracted IDs of the bugs filed for the crash types, and mine groups of duplicate and related bug reports.

Eclipse

We parse the 213,000 bug reports contained in the 2008 MSR Mining Challenge data set and extract all of the comments posted by users (e.g., developers) for each bug. Unlike Firefox, the Eclipse stack traces are embedded in the comments of Eclipse bug reports. We process the comments using regular expressions to extract the failing stack traces of the bug reports, in a similar way as Bettenburg et al. (2008). We obtain 22,379 bug reports with comments containing at least one stack trace, and 29,874 stack traces that we link to their corresponding bug report IDs. A bug report ID is linked with a set of stack traces. We cleanse and verify all of the extracted stack traces manually to ensure that there is no noise (e.g., English words describing the crash scenario) in the extracted stack traces. In addition, we mine duplicate bug report relations among the bug reports. After the stack traces are extracted from the comments of a bug report, we keep the remaining words in the comments as the textual description of the stack traces. We group the extracted stack traces into crash types using the approach in Section 2.2 and maintain a mapping between a crash type (i.e., a set of stack traces) and its textual description.
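As an illustration of this extraction step, the sketch below uses a deliberately simplified regular expression for Java stack trace frames of the form at package.Class.method(File.java:123); the expressions used in the study, following Bettenburg et al. (2008), are more elaborate:

```python
import re

# Simplified pattern for one Java stack trace frame, e.g.:
# "at org.eclipse.jface.text.projection.ProjectionDocument.internalAdd(ProjectionDocument.java:350)"
FRAME_RE = re.compile(r"^\s*at\s+([\w.$]+)\.([\w$<>]+)\(([\w.]+):(\d+)\)\s*$")

def extract_stack_traces(comment):
    # Collect maximal runs of consecutive frame lines from a comment.
    # Returns a list of traces; each trace is a list of
    # (qualified_class, method, file, line) tuples.
    traces, current = [], []
    for line in comment.splitlines():
        match = FRAME_RE.match(line)
        if match:
            current.append((match.group(1), match.group(2),
                            match.group(3), int(match.group(4))))
        elif current:
            traces.append(current)  # a non-frame line ends the current trace
            current = []
    if current:
        traces.append(current)
    return traces
```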

3.2.2 Identification of Developer-defined Crash Correlation Groups

To validate our proposed rules for identifying correlated crash types, we build a gold standard by mining Developer-defined Crash Correlation Groups from our dataset. More specifically, we identify Developer-defined Crash Correlation Groups (CCGs) by grouping together crash types that are linked to the same bugs. We create groups containing at least two crash types. The links between crash types and bugs are established by developers during the triaging and debugging of crash types. These links are updated during the bug fixing process; therefore, we are confident that the crash types collectively linked to the same bug are correlated.

Overall, we obtain 144 Developer-defined CCGs containing a total of 792 crash types from the Firefox dataset, and 1306 Developer-defined CCGs containing 2837 crash types from the Eclipse dataset. In this study, we use the Developer-defined CCGs as our gold standard to evaluate the performance of our crash type correlation identification rules. For each Developer-defined CCG, we maintain the list of bugs filed for the group.

3.2.3 Identification of Users

The Firefox crash reports do not contain personal information identifying the users who reported the crashes, due to privacy concerns. To identify users reporting crashes, we therefore use heuristics, adopting the approach of Khomh et al. (2011). When we process the Firefox crash reports, we extract the following available information on the crash events:

  • the install age (in seconds) since the installation or the last update of the user’s system;

  • the date at which the crash was processed on the server;

  • the crash time on the user’s operating system when the crash occurred (this time can shift around with clock resets);

  • the uptime (in seconds) since the user’s operating system was launched;

  • the last crash of the user;

  • the other user’s environment information: operating system name, operating system version, architecture (e.g., ×86) and CPU family model and stepping.

For each crash report, we subtract the "install age" from the crash time to obtain the installation time at which the user who reported the crash installed Firefox. We use the installation time, the user's other environment information, and the last crash times to build a vector of unique profiles; each profile represents a user. We associate each unique profile with the list of crash types whose crash reports contain information corresponding to the profile. In this way, we obtain a mapping between each user and his or her crash reports. We sort the crash types of each user based on their crash times, from newest to oldest. In total, we identify 1,048,576 users (i.e., groups of crash types) from 1,322,385 crash reports.
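A minimal sketch of this profile-building heuristic is shown below, assuming each crash report is a dictionary with illustrative field names (the actual report fields differ):

```python
from collections import defaultdict

def profile_key(report):
    # The installation time is recovered by subtracting the install age
    # (in seconds) from the crash time; together with the remaining
    # environment fields it forms a unique user profile.
    install_time = report["crash_time"] - report["install_age"]
    return (install_time, report["os_name"], report["os_version"],
            report["architecture"], report["cpu_info"], report["last_crash"])

def group_reports_by_user(crash_reports):
    # Map each unique profile (i.e., presumed user) to its crash reports,
    # sorted by crash time from newest to oldest.
    users = defaultdict(list)
    for report in crash_reports:
        users[profile_key(report)].append(report)
    for reports in users.values():
        reports.sort(key=lambda r: r["crash_time"], reverse=True)
    return users
```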

3.2.4 Identification of Bug Fix Locations

We parse the Firefox and Eclipse change logs and apply the heuristics of Śliwerski et al. (2005) to identify bug fix locations. Precisely, we parse commit log messages using a Perl script and extract bug IDs and specific keywords, such as "fixed" or "bug", to identify bug fixing commits. For each bug fixing commit, we extract the list of files that were changed to fix the bug. In the following, we use the two lists of files obtained for Firefox and Eclipse as our gold standard to evaluate the performance of our bug localization algorithm and refer to them as the Bug Fixing Location Mapping.
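A condensed version of this heuristic, rewritten in Python rather than the Perl used in the study (the keyword pattern is an illustration, not the exact expression of Śliwerski et al.), might look like this:

```python
import re

# Match messages such as "Fixed bug 591599: crash on startup".
BUG_FIX_RE = re.compile(r"\b(?:bug|fix(?:ed|es)?)\b\D{0,10}(\d{4,7})", re.IGNORECASE)

def bug_fix_locations(commits):
    # Map each bug ID to the set of files changed by its fixing commits.
    # `commits` is an iterable of (log_message, changed_files) pairs
    # parsed from the change logs.
    mapping = {}
    for message, changed_files in commits:
        for bug_id in BUG_FIX_RE.findall(message):
            mapping.setdefault(bug_id, set()).update(changed_files)
    return mapping
```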

4 Research Questions

This section presents and discusses each research question. For each research question, we present the motivation behind the question, the analysis approach and a discussion of our findings.

  • RQ1.  Can we identify correlated crash types using crash type signatures and stack traces?

Motivation

Schröter et al. (2010) observed that when multiple failing stack traces are available, developers fix the bugs quickly. Therefore, the identification of crash correlation groups (i.e. correlated crash types) early in the debugging process will not only help developers fix groups of correlated crash types all together, but it will also help them fix the bugs faster. The identification of crash correlation groups can also help development teams to better manage their resources, for example, by assigning correlated bugs to experienced developers and increasing their priority. Crashes are reported continuously by users until they are fixed. Therefore, by fixing groups of correlated crash types early, development teams can reduce the amount of incoming crash reports.

In this research question, we aim to provide developers with simple rules that can be used to identify crash correlation groups automatically. First, we strive to build a rule that requires only an analysis of crash type signatures. In this way, development teams can process large numbers of crash types efficiently, since no deep analysis of the content of crash reports is required. Second, we investigate whether a detailed analysis of stack traces can improve the identification of crash correlation groups.

A higher recall will enable the discovery of more crash correlation groups, resulting in further improvements to the bug fixing process and the management of resources.

Analysis Approach

To answer RQ1, we introduce the following three rules for the identification of crash correlation groups.

These rules were derived from a manual analysis of 40 Firefox crash types selected randomly.

We define a contains relation between crash type signature elements as follows. Given a crash type signature \(S = P_1 | P_2 | \ldots | P_n\), for two elements \(P_i = \langle file_i \rangle \langle op_i \rangle \langle meth_i \rangle \langle param_i \rangle \langle memloc_i \rangle\) and \(P_j = \langle file_j \rangle \langle op_j \rangle \langle meth_j \rangle \langle param_j \rangle \langle memloc_j \rangle\) of \(S\), if \((file_i = file_j) \wedge \{op_i, meth_i, param_i\} \subseteq \{op_j, meth_j, param_j\}\), then \(P_j\) contains \(P_i\).

We define a binary relation \(\subset \) on the set of all crash type signatures \(\mathbb {S}\).

Let \(S_A\) and \(S_B\) be two crash type signatures, where \(S_A = {P^A_1} | {P^A_2} | \ldots | {P^A_n}\) and \(S_B = {P^B_1} | {P^B_2} | \ldots | {P^B_m}\), with \({P^A_i} = \langle file^A_i \rangle \langle op^A_i \rangle \langle meth^A_i \rangle \langle param^A_i \rangle \langle memloc^A_i \rangle\), \({P^B_j} = \langle file^B_j \rangle \langle op^B_j \rangle \langle meth^B_j \rangle \langle param^B_j \rangle \langle memloc^B_j \rangle\), \(i \in \{1 \ldots n\}\), \(j \in \{1 \ldots m\}\), and \(m \geq n\).

\(S_A \subset S_B\) if \(\forall {P^A_i}\), \(i \in \{1 \ldots n\}\), \(\exists j \in \{1 \ldots m\}\) such that \({P^B_j}\) contains \({P^A_i}\). Table 2 presents some examples of comparisons of crash type signatures using \(\subset\).

Rule 1 (definition box)
Table 2 Example of the comparison of crash type signatures

Rule 1 identifies similarities between the signatures of correlated crash types. More specifically, it compares the signatures of two crash types as strings and uses the contains relation to decide whether they are correlated.
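As an illustration of these definitions (not the authors' implementation), the contains relation and the \(\subset\) comparison could be coded as follows, modeling each signature element as a dictionary and treating NULL attributes as absent; a symmetric check then gives one plausible reading of Rule 1:

```python
def contains(p_j, p_i):
    # True if element p_j contains element p_i. Elements are dicts with
    # keys 'file', 'op', 'meth', 'param', 'memloc'; NULL attributes are
    # represented as None and ignored in the subset test (an assumption).
    if p_i["file"] != p_j["file"]:
        return False
    parts_i = {p_i["op"], p_i["meth"], p_i["param"]} - {None}
    parts_j = {p_j["op"], p_j["meth"], p_j["param"]} - {None}
    return parts_i <= parts_j

def signature_subset(sig_a, sig_b):
    # sig_a ⊂ sig_b: every element of sig_a is contained in some
    # element of sig_b (signatures are lists of elements).
    return all(any(contains(p_b, p_a) for p_b in sig_b) for p_a in sig_a)

def rule_1(sig_a, sig_b):
    # Two crash types are deemed correlated if one signature is
    # contained in the other.
    return signature_subset(sig_a, sig_b) or signature_subset(sig_b, sig_a)
```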

To investigate whether a detailed analysis of stack traces can improve the identification of crash correlation groups, we manually analyzed 400 stack traces extracted from 400 Firefox crash reports. The crash reports were selected randomly from our 40 randomly selected crash types. From this analysis, we derived the following two additional rules for the identification of crash correlation groups.

Rule 2 (definition box)

Rule 2 can be applied to the following example from Firefox 4.0b1. The top frames of the crash types js_GetGCThingTraceKind and js_IsAboutToBeFinalized are respectively js_GetGCThingTraceKind|js/src/jsgc.h and js_IsAboutToBeFinalized|js/src/jsgc.cpp. These two crash types are correlated and linked to bug 514819. As illustrated by this example, Rule 2 compares the fully qualified file names of the top frames of two crash types to verify whether the crash types are correlated: when two crash types have the same fully qualified file name in their top frames, the two crash types are correlated.

We also analyze the subsequent frames in the stack traces of a crash type to further improve the identification of crash type correlations. We introduce the concept of closed ordered sub-sets of frames for crash types.

Let \(ST\) be a set of stack traces \(\{T_1, T_2, \ldots, T_p\}\), where \(p\) is the number of stack traces in the set, \(T_i = \langle F^i_1, F^i_2, \ldots, F^i_{n_i} \rangle\), \(F^i_j = methSign^i_j | qfileName^i_j\), \(j \in \{1, \ldots, n_i\}\), \(n_i\) is the number of frames in \(T_i\), and \(i \in \{1, \ldots, p\}\).

Figure 2 shows an example of a stack trace. Each frame in the stack trace has a method signature (e.g., OnWriteSegment for \(F_1\)) and a fully qualified file name (e.g., http/nsHttpConnection.cpp for \(F_1\)).

Given an ordered set of frames \(SubF = \langle G_1, \ldots, G_m \rangle\), for each \(T_i\), \(i \in \{1, \ldots, p\}\), if \(\exists k, l\), with \(1 \leq k \leq l \leq n_i\), such that \((G_1 = qfileName^i_k) \wedge \ldots \wedge (G_m = qfileName^i_l)\), then \(SubF\) is an ordered sub-set of frames of \(T_i\). The value of each frame in \(SubF\) is a fully qualified file name.

Whenever \(\exists i \in \{1, \ldots, p\}\) such that \(SubF\) is an ordered sub-set of frames of \(T_i\), we call \(SubF\) an ordered sub-set of frames of \(ST\). \(SubF\) is a closed ordered sub-set of frames of \(ST\) if there is no other ordered sub-set of frames of \(ST\) containing \(SubF\).

The absolute support of \(SubF\) is the number of \(i \in \{1, \ldots, p\}\) such that \(SubF\) is an ordered sub-set of frames of \(T_i\). The relative support of \(SubF\) is the absolute support divided by \(p\); this relative support is the frequency of \(SubF\) in \(ST\). We consider an ordered sub-set of frames frequent if its relative support is \(> 0.5\).

We mine all the stack traces of each crash type and extract frequent closed ordered sub-sets of frames (FCSFs), using the BI-Directional Extension based frequent closed sequence mining (BIDE) algorithm proposed by Wang and Han (2004). We chose the BIDE algorithm because it scales very well with the number of frequent closed patterns. In fact, BIDE does not require the maintenance of a set of candidate closed patterns: it performs a strict depth-first search and can output frequent closed patterns on the fly.
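The support computations underlying these definitions can be sketched directly; the naive, order-preserving subsequence check below stands in for the optimized BIDE search and represents frames by their fully qualified file names:

```python
def is_ordered_subset(sub_f, trace_files):
    # True if sub_f occurs in trace_files as an order-preserving
    # subsequence (frames are fully qualified file names).
    it = iter(trace_files)
    return all(f in it for f in sub_f)

def relative_support(sub_f, stack_traces):
    # Fraction of the stack traces in which sub_f is an ordered sub-set.
    hits = sum(is_ordered_subset(sub_f, trace) for trace in stack_traces)
    return hits / len(stack_traces)

traces = [["a.cpp", "b.cpp", "c.cpp"], ["a.cpp", "c.cpp"], ["b.cpp"]]
# <a.cpp, c.cpp> appears (in order) in 2 of 3 traces: frequent (> 0.5).
assert relative_support(["a.cpp", "c.cpp"], traces) == 2 / 3
```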

Rule 3 (definition box)

Rule 3 examines the FCSFs of two crash types: if two crash types share a common FCSF, they are correlated. For example, consider two crash types from Firefox 4.0b7, RtlIntegerToUnicodeString and _SEH_prolog. Rule 3 mines the stack traces of both crash types to identify whether they share common closed ordered sub-sets of frames; the identified sub-set of frames is shown in Table 3. The frequency of this sub-set of frames is 0.96 in RtlIntegerToUnicodeString and 0.90 in _SEH_prolog. RtlIntegerToUnicodeString and _SEH_prolog are correlated and linked to bug report 591599.

Table 3 A frequent closed ordered sub-set of frames common to RtlIntegerToUnicodeString and _SEH_prolog

To assess the performance of Rule 1, Rule 2 and Rule 3, we proceed as follows. First, we filter out from our data set the 40 crash types that were used to discover the rules. Second, we rank the remaining Eclipse and Firefox crash types based on their creation date to mimic the current practice. The creation date of a crash type from Firefox is the date on which its first crash report was received; for Eclipse crash types, it is the date on which the oldest stack trace in the crash type was reported in a bug report. Next, we apply Rule 1, Rule 2 and Rule 3 successively to the crash types, one by one, to identify crash correlation groups. Older crash types are processed first. Every crash type is tested against all the other crash types to verify its membership in crash correlation groups. When the three rules are combined, two crash types are placed in a crash correlation group as long as they satisfy at least one of the three rules.

We compare the obtained crash correlation groups to the Developer-defined CCGs and compute the precision and the recall of the rules using (1) and (2), respectively. The precision value measures the fraction of retrieved crash correlation groups that are correct, while the recall value measures the fraction of correct crash correlation groups that are retrieved.

$$ precision = \frac{|\{correct\: CCGs\}\bigcap\{retrieved\: CCGs\}|}{|\{retrieved\: CCGs\}|} $$
(1)
$$ recall = \frac{|\{correct\: CCGs\}\bigcap\{retrieved\: CCGs\}|}{|\{correct\: CCGs\}|} $$
(2)
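Treating each crash correlation group as a set of crash types, (1) and (2) translate directly into code, for example:

```python
def precision_recall(retrieved, correct):
    # `retrieved` and `correct` are collections of CCGs; each CCG is a
    # frozenset of crash type signatures, so groups compare by content.
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall

retrieved = {frozenset({"CT1", "CT2"}), frozenset({"CT3", "CT4"})}
correct = {frozenset({"CT1", "CT2"}), frozenset({"CT5", "CT6"})}
assert precision_recall(retrieved, correct) == (0.5, 0.5)
```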

Rule 3 depends on the threshold of 0.5 used during the identification of frequent closed ordered sub-sets of frames. Therefore, we perform a sensitivity analysis to measure the impact of threshold selection on the results. Precisely, we repeat the evaluation of Rule 3 using thresholds from 0.1 to 1, in steps of 0.1, with the first 30 crash reports of each crash type. Rule 3 also depends on the number of stack traces that are processed for each crash type. We repeat the evaluation of Rule 3 using the first 10, 20, 30, 40, 50, and 100 crash reports of each crash type, with the threshold set to 0.5.

Findings

We obtain a precision of 100 % and a recall of 68 % for Firefox using Rule 1: all the crash correlation groups of Firefox retrieved using Rule 1 are correct. For Eclipse, Rule 1 achieves a precision of 69 % and a recall of 46 %. We attribute the lower recall observed for Eclipse to missing information in crash type signatures; indeed, Eclipse crash type signatures contain neither parameters nor memory location information. However, achieving a 69 % precision with a rule as simple as Rule 1 is already a good result. Moreover, Rule 1 identifies crash correlation groups very efficiently: we were able to process 752 Firefox crash types in 4.53 seconds and 2797 Eclipse crash types in 22.32 seconds on a Lenovo Thinkpad laptop with an Intel Core i7-2620M 2.7 GHz CPU and 8 GB of RAM.

We obtain a precision of 45 % and a recall of 48 % for Firefox using Rule 2, and a precision of 40 % and a recall of 52 % for Eclipse. When we apply Rule 1 and Rule 2 together, we obtain a precision of 89 % and a recall of 83 % on Firefox, and a precision of 75 % and a recall of 58 % on Eclipse. These results indicate that Rule 2 increases the recall obtained with Rule 1 by 15 % on Firefox and 12 % on Eclipse.

Table 4 shows that when the threshold of relative support used to identify frequent closed ordered sub-sets of frames is ≥0.5, Rule 2 and Rule 3 increase the recall obtained with Rule 1 without decreasing the precision. For both Firefox and Eclipse, the best precision and recall are obtained with a threshold value of 0.5.

Table 4 Precision and recall of using Rule 1, Rule 2 and Rule 3 together for different thresholds

Table 5 shows that our three rules do not require the analysis of a large number of crash reports. High precision and recall (i.e., ≥ 0.65) are achieved with as few as 10 stack traces per crash type on both Firefox and Eclipse. This result is particularly important since software organizations receive millions of incoming crash reports every day. Using our rules, they can identify crash correlation groups efficiently by analyzing only the first 10 incoming crash reports of every crash type.

Table 5 Precision and recall of using Rule 1, Rule 2 and Rule 3 together for different numbers of crash reports. NCR stands for number of crash reports.

Table 6 summarizes the results obtained by using different sets of rules. Rule 2 improves the recall of Rule 1 on both Firefox and Eclipse. However, Rule 2 decreases the precision of Rule 1 on Firefox by 11 % and increases the precision of Rule 1 on Eclipse by 6 %. Based on the results in Tables 4 and 5, when the threshold values of relative support and number of crash reports are set to 0.5 and 30, respectively, for Rule 3, Rule 3 improves the precision and recall obtained by using Rule 1 and Rule 2 together.

  • RQ2.  Can we identify correlated crash types using the occurrence times of crash events?

Table 6 Summarized results of using Rule 1, Rule 2 and Rule 3. The value in parentheses shows the percent difference in results caused by using one more rule on correlation group identification

Motivation

Two crash types that co-occur frequently within a short time period on the same user's machine are likely to be correlated. Besides studying the structural information (i.e., crash signatures and stack traces) of crash types in RQ1, in this research question we investigate the possibility of using the occurrence times of crash events (i.e., temporal information about crash types) to identify crash correlation groups. Specifically, we search for any set of crash types that frequently co-occur within a time period (e.g., one day, three days, one week or two weeks) and that originate from the same users' machines. The shorter the time period, the sooner the crash types can be linked and processed together for bug fixing.

Analysis Approach

To achieve the goal set in this research question, we introduce a rule to identify crash correlation groups using frequent patterns of co-occurrences of crash types on users’ machines.

Let \(U\) be a set of users \(\{U_1, U_2, \ldots, U_n\}\), where \(n\) is the number of users who reported a crash. For each user \(U_i\), the group of crash types reported by \(U_i\) is \(\langle C^i_1, C^i_2, \ldots, C^i_{n_i} \rangle\), where \(n_i\) is the number of crash types reported by the user \(U_i\), \(C^i_j\) is the \(j\)th crash type reported by the user \(U_i\), \(j \in \{1, \ldots, n_i\}\), \(i \in \{1, \ldots, n\}\).

Given a set of crash types \(SubC = \langle C_1, \ldots, C_m \rangle\), where \(m \geq 2\), for each user \(U_i\), \(i \in \{1, \ldots, n\}\), if \(\exists k, l\), with \(1 \leq k \leq l \leq n_i\), such that \((C_1 = {C^i_k}) \wedge \ldots \wedge (C_m = {C^i_l})\), then \(SubC\) is a sub-set of crash types of \(U_i\).

The absolute support of \(SubC\) is the number of \(i \in \{1, \ldots, n\}\) such that \(SubC\) is a sub-set of crash types of \(U_i\). We mine all the groups of crash types of users and extract frequent sub-sets of crash types using AprioriTID (Agrawal and Srikant 1994), an algorithm for discovering frequent item-sets (here, groups of crash types appearing frequently among users). To capture more sub-sets of crash types, we set the absolute support threshold of the algorithm to 2, i.e., as long as a sub-set appears twice among users and contains at least two crash types, we consider it frequent.

Once the sets of frequent sub-sets of crash types are identified, we use the crash times of crash types to validate these frequent sub-sets. Given a time window (e.g., one day or one week), if all crash types of a sub-set occur within the time window, we keep this sub-set as valid.
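A simplified version of this mining step, restricted to pairs of crash types for brevity (the study uses AprioriTID to mine item-sets of arbitrary size), could look like this:

```python
from collections import Counter
from itertools import combinations

def frequent_cooccurring_pairs(users, min_support=2):
    # Count pairs of crash types reported by the same user and keep those
    # whose absolute support reaches `min_support`. `users` maps each
    # user to the list of crash types he or she reported.
    counts = Counter()
    for crash_types in users.values():
        for pair in combinations(sorted(set(crash_types)), 2):
            counts[pair] += 1
    return {pair for pair, support in counts.items() if support >= min_support}

def within_window(pair, crash_times, window_seconds):
    # Validation step: keep a pair only if its two crash types
    # occurred within the given time window.
    t1, t2 = crash_times[pair[0]], crash_times[pair[1]]
    return abs(t1 - t2) <= window_seconds
```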

Rule 4 (definition box)

To assess the performance of Rule 4, similarly to RQ1, we rank the Firefox crash types based on their creation date and apply Rule 4 to the crash types one by one to mimic the current practice. Older crash types are processed first. We test each crash type against all the other crash types to identify its crash correlation group. The obtained crash correlation groups are compared to the Developer-defined CCGs, and precision and recall are computed using (1) and (2). Since Eclipse's stack traces are mined from comments contained in bug reports, and because the occurrence time of these stack traces (i.e., of the crash events) is not the same as the time at which the stack traces were posted on bug reports, we cannot apply Rule 4 to the Eclipse data: Rule 4 requires the exact occurrence time of crash events. We also apply Rule 1, Rule 2, Rule 3 and Rule 4 successively to the crash types, one by one, to identify crash correlation groups. Based on the results of RQ1, the threshold values of relative support and the number of crash reports are set to 0.5 and 30, respectively, for Rule 3. When the four rules are combined, two crash types are placed in a crash correlation group as long as they satisfy at least one of the four rules.

Also, Rule 4 depends on the threshold value of the time window used during the validation of frequent sub-sets of crash types. To measure the impact of threshold selection on our results, we perform a sensitivity analysis: precisely, we repeat the evaluation of Rule 4 using time windows of one day, three days, one week and two weeks.

Findings

Table 7 shows that the length of the time window affects the precision and recall of Rule 4: as the time window shrinks, the precision increases but the recall decreases. This result was expected, since a wider time window retains more frequent sub-sets of crash types, which increases recall but also introduces more false positives. Although the results indicate that the precision improves as the time window gets smaller, we could not test Rule 4 with a time window smaller than 1 day, such as 1 hour or 10 hours, due to the size of our dataset: when the time window is set to less than 1 day, very few sub-sets of crash types are returned. Table 8 shows that Rule 4 improves the recall obtained with Rule 1, Rule 2 and Rule 3, but decreases the precision, as more false positives are introduced.

  • RQ3.  Can we identify correlated crash types using the textual similarity between user comments about the crash events?

Table 7 Precision and recall of Rule 4 for different lengths of time windows on Firefox
Table 8 Precision and recall of using Rule 1, Rule 2, Rule 3 and Rule 4 together for different lengths of time windows on Firefox

Motivation

Comments posted by users on a crash report or a bug report provide valuable information describing the crashes and the varied scenarios in which they occurred. In this research question, we investigate the possibility of using the textual similarity between the user comments of crash types to identify crash correlation groups.

Analysis Approach

To answer this research question, we introduce a rule to identify crash correlation groups using the textual similarity between comments provided by users about the crash types.

Each crash type has a textual description, which is a set of user comments. We merge the set of user comments into a single document. Each document has a set of terms \(\{TM^i_1, TM^i_2, \ldots, TM^i_m\}\), where \(m\) is the total number of terms in the textual description. We thus have a mapping between a crash type and a set of terms.

We use the vector space model (Raghavan and Wong 1986), a widely used technique in traditional information retrieval, to calculate the textual similarity between crash types. In the vector space model, each document (i.e., a crash type in our case) is represented as an \(N\)-dimensional vector, where \(N\) is the number of unique terms appearing in all the documents and \(W_i\), \(1 \leq i \leq N\), is the weight of the \(i\)th term in the vector \(\langle W_1, \ldots, W_N \rangle\), defined by (3).

$$\small W_{i} = TF_{i} \times IDF_{i} $$
(3)

In (3), TF is the Term Frequency value and IDF is the Inverse Document Frequency value. The Term Frequency is the frequency of a term in a document. The Inverse Document Frequency diminishes the weight of terms that occur very frequently in the whole corpus and increases the weight of terms that occur rarely. We calculate \(TF_i\) as shown in (4) and \(IDF_i\) as shown in (5) for each term.

$$ TF_{i} = \frac{|\{occurrences\: of\: \textit{i}_{th}\: term\: in\: the\: document\}|}{|\{total\: terms\: in\: the\: document\}|} $$
(4)
$$ IDF_{i} = \log{\left(\frac{|\{total\: documents\: in\: the\: corpus\}|}{|\{documents\: having\: the\: \textit{i}_{th}\: term\}|}\right)} $$
(5)

After a vector is created for each document (i.e., each crash type in our case), we can calculate the similarity of a pair of documents through a formula defining the similarity of two vectors. For two vectors \(V_1 = \langle W_{11}, W_{12}, \ldots, W_{1N} \rangle\) and \(V_2 = \langle W_{21}, W_{22}, \ldots, W_{2N} \rangle\), the similarity of \(V_1\) and \(V_2\) is their cosine similarity, defined in (6).

$$ Sim = \frac{ \displaystyle\sum\limits_{i=1}^{N} W_{1i} \times W_{2i} }{\sqrt{ \sum\limits_{i=1}^{N} W_{1i}^{2}} \times \sqrt{ \sum\limits_{i=1}^{N} W_{2i}^{2}} } $$
(6)
Rule 5 (definition box)


To assess the performance of Rule 5, we first process all the Firefox crash reports and Eclipse bug reports for each crash type and rule out any crash types without user comments. Second, we construct a user comment document CM for each crash type by merging the user comments from its crash reports. Third, we turn the user comment documents into vectors and compute the similarity value for every pair of crash types.

To reduce the effect of word inflection on the textual similarity calculation over the user comments of crash types, we conduct word normalization on the user comments from Firefox crash reports and Eclipse bug reports. More specifically, we first conduct word tokenization to parse user comments into word tokens, splitting them on delimiters such as spaces and punctuation marks. Second, we remove non-English words using Wordnet, a large lexical database for English. Finally, we remove stop words and use the Morpha Stemmer to stem the words to their root form.
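Putting these pieces together, a minimal TF-IDF and cosine similarity computation, following Equations (3)–(6) but omitting the Wordnet filtering and stemming (which require external resources), could be sketched as:

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    # Turn tokenized documents (lists of terms) into sparse TF-IDF
    # vectors, following Equations (3)-(5).
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return [{term: (count / len(doc)) * math.log(n_docs / df[term])
             for term, count in Counter(doc).items()}
            for doc in documents]

def cosine(v1, v2):
    # Cosine similarity of two sparse TF-IDF vectors (Equation (6)).
    dot = sum(w * v2.get(term, 0.0) for term, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

docs = [["crash", "on", "startup"], ["crash", "when", "printing"]]
vectors = tf_idf_vectors(docs)
# "crash" occurs in both documents, so its IDF (and weight) is 0 here.
print(cosine(vectors[0], vectors[1]))
```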

Finally, we test each crash type against all the other crash types to identify crash correlation groups. We also apply Rule 1, Rule 2, Rule 3, Rule 4 and Rule 5 successively on Firefox crash types, and Rule 1, Rule 2, Rule 3 and Rule 5 on Eclipse crash types, one by one, to identify crash correlation groups. When the rules are combined, two crash types are placed in a crash correlation group as long as they satisfy at least one of the rules. We compare the obtained crash correlation groups to the Developer-defined CCGs and compute the precision and the recall of the rule using Equations (1) and (2), respectively.

Similar to Rule 3 and Rule 4, Rule 5 depends on a threshold value. Therefore, we perform a sensitivity analysis to measure the impact of threshold selection on the results. Precisely, we repeat the evaluation of Rule 5 using similarity threshold values of 0.7, 0.75, 0.8, 0.85, 0.9 and 0.95.

Rule 5 also depends on the number of crash reports of each crash type that are processed to extract user comments. A higher number of crash reports is likely to produce more user comments, which in turn will probably yield more meaningful terms. Therefore, we perform another sensitivity analysis to measure the impact of the number of processed crash reports on the results of Rule 5. Precisely, we repeat the evaluation of Rule 5 using the first 30, 50, and 100 crash reports, and all crash reports, respectively.

Findings

Table 9 shows that precision increases and recall decreases when the similarity threshold value is increased. This is an expected result, frequently observed in Information Retrieval (IR) studies: a high similarity threshold generally reduces the rate of false positives but increases the number of false negatives.

Table 9 Precision and Recall of Rule 5 for Different similarity values when all the crash reports of a crash type are processed (to extract user comments). SV stands for similarity value.

Table 10 shows that the number of crash reports processed to extract user comments affects the performance of Rule 5. The more a crash type is commented on, the higher the number of terms in its user comment document, which in turn increases the odds of identifying a similar crash type using Rule 5.

Table 10 Precision and Recall of Rule 5 for Different Number of Crash Reports when the similarity threshold is 0.75. NCR stands for Number of Crash Reports. All means all the crash reports of a crash type in our corpus.

Table 11 shows that Rule 5 cannot improve the results obtained by using the other rules, because all the correct crash correlation groups identified by Rule 5 can also be identified by the other rules together (i.e., Rule 1 + Rule 2 + Rule 3 on Eclipse). However, when we combine Rule 1 and Rule 5, we obtain a precision of 80 % and a recall of 74 % on Firefox, and a precision of 79 % and a recall of 68 % on Eclipse, when the similarity threshold is 0.95 and the first 50 crash reports of each crash type are used for mining user comments. Rule 5 can thus improve the results obtained with Rule 1.

Table 11 Precision and Recall of using Rule 1, Rule 2, Rule 3, Rule 4 and Rule 5 together for Different similarity values when all the crash reports of a crash type are processed (to extract user comments). SV stands for similarity value

Based on the results of RQ1, RQ2 and RQ3, the combination of Rule 1, Rule 2 and Rule 3 is the best. When the thresholds of Rule 3 are set to 0.5 for relative support and 30 for the number of crash reports, this combination achieves a precision of 94 % and a recall of 90 %.

4.1 RQ4. Can the correlated crash types help identify buggy files?

Motivation

With the growing complexity of software systems, the demand for efficient techniques to identify suspicious source code fragments that may contain bugs has increased. However, locating bugs in software systems is not an easily automatable process. Although many bug localization techniques have been proposed in the literature, no single technique is suitable for every software system (Eric Wong and Debroy 2009). Moreover, most techniques require both failing and successful test cases to be effective. Consequently, when only failing stack traces are available, developers usually apply only intuitive techniques, such as the inspection of the top 10 frames of failing stack traces. Previous work (Schröter et al. 2010) has shown that buggy files are often in the top 10 frames of failing stack traces.

In this research question, we explore the possibility of using correlated crash types for localizing buggy files. We aim to propose a technique to automatically locate buggy files that need to be corrected to fix bugs. We intend to build a technique that can rank suspicious buggy files effectively, reducing the effort required to examine the files. The proposed technique should also leverage knowledge of crash correlation groups in order to help debugging teams fix correlated crash types all together.

Analysis Approach

To answer this question, we randomly sampled 40 Firefox crash types with resolved fixes. For each Firefox crash type, we randomly selected 10 crash reports and extracted the stack traces they contain. In total, we obtained 400 stack traces. We manually examined these stack traces and derived the bug localization method Buggy Files Finder (BFFinder) presented below. BFFinder analyzes correlations between crash types and builds a Bayesian Belief Network (BBN) (Michie et al. 1994) to compute the probability that a file appearing in a failing stack trace is buggy. We apply BFFinder to Firefox and Eclipse separately. Figure 7 depicts the steps of BFFinder; in the following, we elaborate on each of these steps.

Fig. 7 Overview of the steps of BFFinder; CCG stands for Crash Correlation Group

Step 1. Extraction of frequent closed ordered sub-sets of frames. The BIDE pattern mining algorithm is applied on each crash type to extract its set of frequent closed ordered sub-sets of frames.

Step 2. Identification of crash correlation groups. In this paper, we propose five rules to identify crash correlation groups: Rule 1, Rule 2, and Rule 3 are applied to the signatures of crash types and their stack traces, Rule 4 is applied to the co-occurrences of crash types, and Rule 5 is applied to the user comments of crash types. In this step, we apply Rule 1, Rule 2 and Rule 3 together to identify crash correlation groups, given the promising results of using them together.

Step 3. Extraction of frequently failing files. For each crash correlation group, we create the list of files appearing in all the failing stack traces of the group. For crash types not involved in any correlation group, we create the list of files appearing in all the failing stack traces of the crash type instead. We refer to this list as the list of frequently failing files.

Step 4. Construction of vectors of characteristics for files. Each file appearing in a failing stack trace is mapped into a feature vector of four dimensions (a sketch of this construction follows the list below).

  • The first dimension captures the event of the file appearing in a frequent closed ordered sub-set of frames, i.e., it counts the number of times that the file appeared in a FCSF.

  • The second dimension captures the event of the file appearing in a closed ordered sub-set of frames common to all the stack traces of the crash types in a crash correlation group, i.e., it counts the number of times that the file appeared in a FCSF common to all the stack traces in a crash correlation group. If a file is not involved in a crash correlation group, this dimension captures the appearance of the file in a FCSF that is common to all the stack traces of its crash type.

  • The third dimension captures the failure frequency of the file, i.e., the number of appearances of the file in a list of frequently failing files.

  • The fourth dimension captures the number of times that the file appeared in the top ten frames of a stack trace.
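The construction of the four-dimensional vector described above can be sketched as follows (the input data structures are illustrative):

```python
def feature_vector(file_name, fcsfs, common_fcsfs, failing_lists, stack_traces):
    # `fcsfs` and `common_fcsfs` are collections of frequent closed
    # ordered sub-sets of frames (lists of fully qualified file names),
    # `failing_lists` are the lists of frequently failing files, and
    # `stack_traces` are the traces as lists of file names, top frame first.
    return [
        sum(file_name in fcsf for fcsf in fcsfs),                # dimension 1
        sum(file_name in fcsf for fcsf in common_fcsfs),         # dimension 2
        sum(file_name in files for files in failing_lists),      # dimension 3
        sum(file_name in trace[:10] for trace in stack_traces),  # dimension 4
    ]
```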

Step 5. Creation of a corpus to train the BBN. The vectors of characteristics of Firefox files extracted from the 400 Firefox stack traces examined manually are used to calibrate the BBN; we have knowledge of buggy files for these stack traces. Given the vector of characteristics of any other file, the trained BBN is executed to compute the probability that the file is buggy.

Step 6. Construction of a Bayesian Belief Network to rank files. The vectors of characteristics obtained in Step 4 are used to structure a BBN. The input nodes of this BBN correspond to the four dimensions of a vector of characteristics, while the output node is the probability of a file being buggy. This probability is used to rank the files.

Step 7. Ranking of files based on the probability of containing a bug. For each crash correlation group, the files extracted from all the stack traces are ranked based on the probability that they contain a bug; high rankings are assigned to files with high probabilities. Files appearing in the stack traces of crash types that are not involved in any crash correlation group are ranked using the same criteria.

The construction of BFFinder is guided by the following observations, made during the manual examination of the Firefox sample of 40 crash types and 400 stack traces:

  • Observation 1. 75 % of Firefox files changed to fix bugs related to a crash type (respectively a crash correlation group) appear in all the stack traces of the crash type (respectively the crash correlation group), i.e., they are frequently failing files.

  • Observation 2. Whenever there are FCSFs for a crash type, 80 % of files changed to fix bugs related to this crash type appear among the frames of a FCSF.

  • Observation 3. As reported by previous studies (e.g., on Eclipse stack traces (Schröter et al. 2010)), we found that approximately 65 % of the bugs in our Firefox sample were located in files from the top 10 frames of the failing stack traces.

To assess the performance of BFFinder, we proceed as follows. First, we filter out from our data set the 40 Firefox crash types that were used to derive BFFinder, and we also remove crash types associated with unfixed bugs. Then, we randomly select 40 Eclipse crash types to train BFFinder for Eclipse stack traces. Next, we execute Steps 1 to 4 of BFFinder to build the vectors of characteristics of all the files that appear in a stack trace of the remaining crash types; in Step 2, we apply Rule 1, Rule 2, and Rule 3 together to identify crash correlation groups. For each obtained vector, we run the BBN of BFFinder to compute the probability that the corresponding file is buggy, and we apply Step 7 to rank the Eclipse and Firefox files in our data set. Using the two lists of buggy files (from Eclipse and Firefox) extracted from change logs as our gold standard (see Section 3.2.4), we compute the k-precision and k-recall of BFFinder following Equations (7) and (8).

$$ k\text{-}precision = \frac{\#\: of \: buggy \: files\: in\:top\:k\:results}{k} $$
(7)
$$ k\text{-}recall = \frac{\#\: of \: buggy \: files\: in\:top\:k\:results}{|\{buggy \: files\}|} $$
(8)
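In code, Equations (7) and (8) amount to the following; the ranking and gold standard in the worked example are hypothetical:

    def k_precision_recall(ranked_files, buggy_files, k):
        """Equations (7) and (8): `ranked_files` is the ranking of Step 7,
        `buggy_files` the gold standard extracted from the change logs."""
        hits = sum(1 for f in ranked_files[:k] if f in buggy_files)
        return hits / k, hits / len(buggy_files)

    # 2 of the top-3 ranked files are truly buggy, out of 4 buggy files:
    ranked = ["a.cpp", "b.cpp", "c.cpp", "d.cpp"]
    buggy = {"a.cpp", "c.cpp", "e.cpp", "f.cpp"}
    print(k_precision_recall(ranked, buggy, 3))  # (0.666..., 0.5)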

Because the performance of machine learners, such as BBNs, is generally impacted by the quality of the training corpus, we perform a further evaluation to measure the impact of the size of our training corpus on the performance of BFFinder. Specifically, for each system (i.e., Eclipse and Firefox), we create different training corpora containing respectively 50 %, 60 %, 70 %, and 80 % of all crash types from the system, and compute the corresponding k-precisions and k-recalls. We use our Bug Fixing Location Mapping (see Section 3.2.4) to identify buggy files in the different training corpora and to evaluate the results of BFFinder.

Findings. On average, BFFinder achieves a recall of 72 % for Firefox and 84 % for Eclipse on the top 10 files reported as buggy. These high recall values suggest that BFFinder can be used effectively with only a short history of past bug locations, since the BBN was trained using only 40 crash types per system. When the training corpus is increased to 80 % of all crash types of each system, BFFinder achieves, on average, a recall of 92 % for Firefox and 90 % for Eclipse on the top 10 files reported as buggy. These top 10 files represent only 5.5 % of the Firefox files and 3.8 % of the Eclipse files contained in the failing stack traces. Therefore, using BFFinder, debugging teams can recover 92 % of the Firefox buggy files and 90 % of the Eclipse buggy files by examining only 5.5 % and 3.8 % of the potential buggy candidates, respectively.

Figures 8 and 9 show the precision and recall for the top 3, top 4, top 5, and top 10 file candidates, using different training corpora, for Firefox and Eclipse respectively. These results show that precision and recall increase with the size of the training corpus, meaning that the more information about the location of past bugs is available, the better BFFinder performs. Looking at the top 3 files, BFFinder achieves a recall of 62 % for Firefox and 52 % for Eclipse. Hence, by examining only the 3 files reported by BFFinder as most likely buggy, debugging teams can recover 62 % of the Firefox buggy files and 52 % of the Eclipse buggy files. Moreover, BFFinder allows them to fix correlated bugs all together.

Fig. 8

Precision and recall of the top 3, top 4, top 5, and top 10 file candidates reported by BFFinder for different training corpora on Firefox

Fig. 9

Precision and recall of the top 3, top 4, top 5, and top 10 file candidates reported by BFFinder for different training corpora on Eclipse

4.2 RQ5. Can the correlated crash types help identify duplicate bug reports?

Motivation

Due to the large number of existing bug reports, it is challenging for triaging teams to examine all of them to detect duplicate or related bug reports (i.e., bug reports having “blocks” or “depends on” relationshipsFootnote 6 among them). An efficient approach for detecting duplicate or related bug reports can reduce both the workload of triagers and the risk of passing duplicate bug reports on to bug fixers. In this research question, we explore the possibility of using the correlations between crash types (i.e., crash correlation groups) to help identify duplicate bug reports.

Analysis Approach

To answer this question, we introduce the following two relations on crash correlation groups.

Same group relation. If a set of bug reports is assigned to a crash correlation group, we consider these bug reports to be duplicates or related.

Contain relation. Given two crash correlation groups \(CCG_{1}=\{C{T^{1}_{1}}, C{T^{1}_{2}}, \ldots , C{T^{1}_{m}}\}\) and \(CCG_{2}=\{C{T^{2}_{1}}, C{T^{2}_{2}}, \ldots , C{T^{2}_{n}}\}\), where m is the number of crash types in \(CCG_{1}\) and n is the number of crash types in \(CCG_{2}\), if for every crash type \(C{T^{1}_{i}}\) (with 1 ≤ i ≤ m) there exists a crash type \(C{T^{2}_{j}}\) (with 1 ≤ j ≤ n) such that \(C{T^{1}_{i}}=C{T^{2}_{j}}\), \(C{T^{1}_{i}} \subset C{T^{2}_{j}}\), or \(C{T^{2}_{j}} \subset C{T^{1}_{i}}\), we consider that \(CCG_{2}\) contains \(CCG_{1}\) and that the bug reports associated with them are duplicates or related, where \(\subset\) (i.e., the contains relation between crash type signatures) is defined in Rule 1.

For example, consider two crash correlation groups: \(CCG_{1}\) = {nsQueryInterface::operator(), nsContentUtils::CanCallerAccess, nsContentUtils::CanCallerAccess(nsPIDOMWindow*)}, with bug report id 612383, and \(CCG_{2}\) = {nsQueryInterface::operator(), nsContentUtils::CanCallerAccess, nsDOMConstructor::Create(unsigned short const* nsDOMClassInfoData const* nsGlobalNameStruct const* nsPIDOMWindow* nsDOMConstructor**)}, with bug report id 606421.

Since nsContentUtils::CanCallerAccess (from \(CCG_{2}\)) \(\subset\) nsContentUtils::CanCallerAccess(nsPIDOMWindow*) (from \(CCG_{1}\)), the two groups have a contain relation (i.e., \(CCG_{2}\) contains \(CCG_{1}\)), and their assigned bug reports are duplicated. A sketch of this check follows.
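The check can be sketched as follows in Python. Rule 1's exact signature containment is defined earlier in the paper; here we approximate it by assuming that the shorter signature is a proper prefix of the longer one, as in the CanCallerAccess example:

    def contained_in(a, b):
        # Approximation of Rule 1's containment (a subset-of b): the shorter
        # signature is a proper prefix of the longer one (our assumption).
        return a != b and b.startswith(a)

    def matches(s1, s2):
        return s1 == s2 or contained_in(s1, s2) or contained_in(s2, s1)

    def group_contains(big, small):
        """Contain relation: every crash type signature of `small` is equal
        to, contains, or is contained in some signature of `big`."""
        return all(any(matches(s, t) for t in big) for s in small)

    ccg1 = {"nsQueryInterface::operator()",
            "nsContentUtils::CanCallerAccess",
            "nsContentUtils::CanCallerAccess(nsPIDOMWindow*)"}
    ccg2 = {"nsQueryInterface::operator()",
            "nsContentUtils::CanCallerAccess",
            "nsDOMConstructor::Create(unsigned short const* "
            "nsDOMClassInfoData const* nsGlobalNameStruct const* "
            "nsPIDOMWindow* nsDOMConstructor**)"}
    print(group_contains(ccg2, ccg1))  # True: reports 606421 and 612383 are flagged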

To assess the performance of using these two relations to identify duplicate bug reports and related bug reports, we perform two experiments:

Experiment 1: We identify these two relations from Developer-defined Crash Correlation Groups (CCGs) and use them to predict pairs of duplicate bug reports and pairs of related bug reports.

Experiment 2: We use Rule 1, Rule 2, and Rule 3 together to identify crash correlation groups, because this combination identifies more correct crash correlation groups than any other combination of rules; based on the results of RQ1, it recovers 90 % of the Developer-defined Crash Correlation Groups. We then identify the two relations between the resulting crash correlation groups to predict pairs of duplicate bug reports and pairs of related bug reports.

The obtained pairs of duplicate bug reports and related bug reports are compared with the ones mined from Firefox crash reports and Eclipse bug reports separately. The precision and recall are computed using Equations (9) and (10).

$$ precision = \frac{|\{correct\: pairs\}\bigcap\{retrieved\: pairs\}|}{|\{retrieved\: pairs\}|} $$
(9)
$$ recall = \frac{|\{correct\: pairs\}\bigcap\{retrieved\: pairs\}|}{|\{correct\: pairs\}|} $$
(10)
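Over sets of bug-report id pairs, Equations (9) and (10) can be computed as below; the pair values are hypothetical:

    def pair_precision_recall(retrieved, correct):
        """Equations (9) and (10), with pairs encoded as frozensets of ids."""
        tp = len(retrieved & correct)
        return tp / len(retrieved), tp / len(correct)

    retrieved = {frozenset({612383, 606421}), frozenset({111, 222})}
    correct = {frozenset({612383, 606421})}
    print(pair_precision_recall(retrieved, correct))  # (0.5, 1.0)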

Findings

Table 12 shows the results of identifying duplicate and related bug reports using the Developer-defined crash groups (i.e., our gold standard for validating our approach). It confirms that crash correlation groups can help identify duplicate and related bug reports. Our method achieves a better precision and recall on Firefox than on Eclipse.

Table 12 Precision and Recall of duplicate bugs and related bugs identification using Developer-defined crash correlation groups

Table 13 presents the results of identifying duplicate and related bugs using the crash correlation groups generated by applying our proposed Rule 1, Rule 2, and Rule 3 together. The results are lower than those in Table 12 because the crash correlation groups identified by the three rules contain false groups when compared with the Developer-defined crash groups; this confirms that the correctness of the identified crash correlation groups affects the results of our approach for identifying duplicate and related bugs.

Table 13 Precision and Recall of duplicate and related bug report identification using crash correlation groups generated by our rules: Rule 1 + Rule 2 + Rule 3

5 Threats to Validity

This section discusses the threats to validity of our study following the guidelines for case study research (Yin 2002).

Construct validity threats concern the relation between theory and observation. In this work, the construct validity threats are mainly due to measurement errors. We extract stack traces by parsing the HTML Firefox crash reports and analyzing the comments section of Eclipse bug reports. To identify bug fix locations, we mine Mercurial logs and CVS logs, and apply the heuristics of Śliwerski et al. (Śliwerski et al. 2005). We map bug fix locations to stack traces using string matching. Although this technique may not be fully accurate, it has been used satisfactorily in many previous studies, e.g., (Schröter et al. 2010; Dhaliwal et al. 2011; Śliwerski et al. 2005). We use a heuristic (Khomh et al. 2011) based on the “install age”, the “crash times”, and the configuration and architecture of crashing systems to identify the unique users of our studied versions of Firefox. Rule 4 of our study critically relies on this identification of the users reporting the crash types. The heuristic has been validated in (Khomh et al. 2011), but more validation is needed to strengthen the findings.

Threats to internal validity concern our selection of subject systems, tools, and analysis method. We use the stack traces posted by users in Eclipse bug reports and form Eclipse crash signatures following the same approach as the Mozilla Firefox team. These stack traces may be incomplete, and consequently the relationships identified between Eclipse crash types may also be incomplete.

Reliability validity threats concern the possibility of replicating this study. We attempt to provide all the necessary details to replicate our study. The Mercurial repository of Firefox is publicly available to obtain commit logs, and the Socorro crash server is also publicly available (Mozilla Crash Reporting Server), making it possible to obtain the same data for the same releases. The Eclipse bug reports from the 2008 MSR Mining Challenge are also publicly available.

6 Related Work

In this section, we summarize the related work on field crash reports, bug correlation and duplication, and analysis of stack traces.

6.1 Analysis of Field Crash Reports

Many techniques have been proposed to prioritize groups of similar crash reports during debugging activities. Podgurski et al. (Podgurski et al. 2003) introduced a failure clustering approach to group similar crash reports together in order to fix the larger groups first. Kim et al. (Kim et al. 2011) introduced a machine learning technique to predict which crash reports will become top crashers and which, they claim, should therefore be fixed first. Khomh et al. (Khomh et al. 2011) analyzed the entropy of field crashes and proposed an entropy-based approach for the triaging of field crash reports. The approach assigns high priorities to crashes with high entropies and high frequencies, i.e., crashes affecting a large number of users frequently. All of the above approaches focus on grouping field crash reports and prioritizing the groups of crash reports for bug fixing. In contrast, this paper identifies relations among crash types (i.e., a crash type is a group of similar crash reports) for bug fixing and for the identification of duplicate bug reports. Furthermore, the bug localization method presented in this paper (i.e., BFFinder) can be combined with the aforementioned techniques to help development teams correct high priority bugs efficiently.

6.2 Bug Correlation, Duplication and Localization

Bug correlation and bug localization have been researched extensively. Le and Soffa (Le and Soffa 2010) proposed a bug correlation algorithm to identify causal relationships among bugs in a software system. Liblit et al. (Liblit et al. 2005) studied predicate patterns in correct and incorrect execution traces and proposed an algorithm to identify the predictors of a bug. They claim that their algorithm can detect a variety of both anticipated and unanticipated causes of failures. Ball et al. (Ball et al. 2003) developed a localization technique for error traces from a model checker; this technique identifies transitions that appear only in failing traces (but not in correct traces). Jones et al. (Jones and Harrold 2005; Jones et al. 2002) proposed a visualization-based technique named Tarantula to help developers locate errors and bugs in software systems by diagnosing the execution traces of successful and failing test cases. Nessa et al. (Nessa et al. 2008) developed a fault localization algorithm based on N-gram analysis to rank the executable statements of a software system by their level of suspicion. The above techniques emphasize the importance of crashing threads for bug localization. However, they rely heavily on source code instrumentation, predicates, coverage reports, or successful traces, which limits their applicability when only crashing threads from crash reports are available. In contrast, our approach analyzes only crashing threads and does not require source code analysis.

Bug report duplication has also been researched extensively. Wang et al. (Wang et al. 2008) used both information retrieval techniques and execution traces to detect duplicate reports. However, due to the difficulty of obtaining execution traces for existing reports, Sun et al. (Sun et al. 2010) proposed a discriminative approach that compares the textual similarity of bug report descriptions. Sun et al. (Sun et al. 2011) later used not only text but also other features available in Bugzilla, e.g., the version of the product or the priority of the report, to identify duplicate bug reports, and extended BM25F, one of the latest textual similarity measures in information retrieval for structured documents. These studies all apply text mining techniques to measure textual similarity; in our study, by contrast, we explore the possibility of using correlations between crash types to help identify duplicate and related bug reports.

6.3 Analysis of Stack Traces

The use of stack traces by developers during bug fixing activities has been investigated to a great extent. Schröter et al. (Schröter et al. 2010) examined bug fixing activities in Eclipse and observed that when failing stack traces are available, developers fix the bugs faster; moreover, the bugs are fixed in files from the top 10 frames of the failing stack traces. Dhaliwal et al. (Dhaliwal et al. 2011) analyzed the use of stack traces by Firefox developers and outlined some limitations in the crash grouping process of Mozilla. They proposed a crash report grouping approach based on comparisons of failing stack traces using the Levenshtein distance (Kruskal JB 1983) within a crash type. Brodie et al. (Brodie et al. June 2005) proposed an approach to identify similar bugs using stack trace comparisons and historical data of previous bugs. Glerum et al. (Glerum et al. 2009) introduced the Windows Error Reporting (WER) system, which groups detailed crash reports using a bucketing algorithm; the algorithm uses multiple heuristics specific to the applications supported by WER and is updated manually by developers. Dang et al. (Dang et al. 2012) proposed ReBucket, a method for clustering crash reports based on call stack similarities to improve the accuracy of bucketing. Some visualization techniques have also been proposed by Chan et al. (Chan et al. 2009) and Kim et al. (Kim et al. 2011) to assist development teams in the identification of relations between crashes. Although many of these approaches have investigated similarities between stack traces, none has attempted to identify crash correlation groups for crash types. In this paper, we propose five rules to identify crash correlation groups through an analysis of failing stack traces.

7 Conclusion and Future Work

The analysis of crash reports for bug fixing is a very challenging task that requires a large amount of manual work from developers. In this study, we investigate three crash type properties, namely stack traces, time, and text, to derive rules that identify correlated crash types automatically. We propose five rules: Crash Type Signature Comparison (i.e., Rule 1), Top Frame Comparison (i.e., Rule 2), Frequent Closed Ordered Sub-Set Comparison (i.e., Rule 3), Time-based Co-occurrence of Crash Types Comparison (i.e., Rule 4), and Textual Similarity of Crash Types Comparison (i.e., Rule 5).

We also propose a bug localization method called Buggy Files Finder (BFFinder) to locate and rank buggy files from the stack traces in crash reports. BFFinder uses our rules to identify correlated crash types. Using a Bayesian Belief Network, BFFinder computes and ranks files from stack traces based on their probability to be buggy. Furthermore, we apply the relations between crash correlation groups to identify duplicate bugs and related bugs.

We conducted a case study on Firefox and Eclipse to verify our proposed rules and our methods for localizing bugs and identifying duplicate bugs. We found that, when applied together, the first three rules achieve a precision of 91 % and a recall of 87 % for Firefox, and a precision of 76 % and a recall of 61 % for Eclipse. The first three rules do not require the analysis of a large number of crash reports: high precision and recall are achieved with as few as 10 crash reports per crash type. The fourth rule, which identifies frequent sub-sets of crash types reported by users, can achieve a high recall (i.e., 84 %) when the crash times of these crash types are within a two-week time window. The fifth rule investigates the possibility of grouping crash types based on their textual similarity; the highest precision it obtains is 62 %, when the threshold value of the clustering algorithm is set to 0.95.

Our case study also shows that with a training corpus containing only 40 Firefox crash types, BFFinder achieves a recall of 72 % on the top 10 files reported as buggy. When trained on 80 % of the corpus, the recall of BFFinder is 92 % for Firefox and 90 % for Eclipse on the top 10 files reported as buggy. These results suggest that BFFinder can be used effectively with little information about the location of past bugs, and that its precision and recall improve when more information on the location of past bugs is available. Using BFFinder, debugging teams can recover 92 % of buggy files by examining only 5.5 % of all the files contained in Firefox’s stack traces, and 90 % of buggy files by examining only 3.8 % of all the files contained in Eclipse’s stack traces. BFFinder also allows debugging teams to locate and fix correlated bugs all together. Moreover, our method for identifying duplicate bugs achieves a precision of 55 % and a recall of 50 % on Firefox, and a precision of 35 % and a recall of 47 % on Eclipse.

In future work, we plan to implement our proposed rules and our bug localization method BFFinder into a tool to assist development teams during the triaging of crash reports and the fixing of bugs.