Keywords

1 Introduction

Our society, organizations and IT systems depend on processes. Products and services can only be delivered efficiently and effectively when processes are running as planned. Process mining aims to discover, monitor, and enhance processes by extracting knowledge from event data that can be extracted from almost all modern [1].

Process Mining is able to bridge the gap between Business Process Modeling (BPM) and data driven methods like data mining and machine learning [2]. Process mining is able to analyze the actual processes without relying on simplistic models. There are basically two main types of data-driven analysis [3]:

  • Predictive analysis: involving techniques that extract knowledge and rules to predict or classify samples, such as classification, regression and time series algorithms.

  • Descriptive analysis: involving techniques that discover interesting knowledge about samples and their attributes to explain the data (e.g. association rules).

In other words, descriptive analysis techniques extract patterns from the data with respect to properties and their values. For example, a manager wants to know in which situations customers have complaints. Descriptive analysis will not be able to predict the complaints; however, it will provide insights about various factors that may cause the complaints [5].

The lion’s share of process mining research has been devoted to descriptive forms of analysis. Next to process discovery techniques, there have been approaches to group traces. The approach presented in [4] clusters traces thereby characterizing each cluster. However, in this method class of samples cannot be used. The approach presented in [5] extracts interesting patterns based on a class attribute. In many applications, stakeholders prefer to analyze and know more about a subset of cases rather than all the cases. Examples of interesting subsets (or target group) include:

  • Deviating cases from the reference model

  • Cases with high or low performance

  • Cases with high profits for the company

  • Unfinished or canceled cases

  • Cases from a particular period

  • Cases that pertain to users complaints

  • Events related to particular products or services

Given such subsets of cases, it is of the utmost importance to see what kind of attributes they share. For example, discovering that deviating cases are caused by particular resources or limited to specific groups of customers. According to our knowledge, there is no research has been done to extract such information from event data. The main contribution of this paper is that we apply subgroup discovery techniques in the context of process mining domain, to discover the statistically most interesting patterns in a subset of cases called the target groups. The attributes and also the target group can be created based on features extracted using process mining algorithms (e.g., conformance checking or performance analysis). Moreover, our approach also produces insightful collections event logs that can be used as input for a range of existing process mining techniques (e.g., process discovery). In short, this approach will help process analyst to find what are distinctive attributes in a subgroup of cases. Such information assists further investigations like root cause analysis.

Table 1. Small fragment of the dataset provided by Telefonica, related to the ticket handling process.
Fig. 1.
figure 1

A normative process model that describes the ticket handling process. This model was designed by Telefonica (of course such models can also be discovered based on the event log).

To evaluate the possibility of using this method in the reality, we provide a case study where we applied our proposed method on the ticket handling process of Telefonica. Figure 1, shows the process model of this process. The ticket handling process in Telefonica consists of the following main steps. First, a ticket is created through the ‘New’ activity and then it should be activated by conducting the ‘Active’ activity. After this activation, a ticket should be handled appropriately and consequently closed through the ‘Solved’ and ‘Closed’ activities. It is possible to interrupt the handling of a ticket by the ‘Delayed’ activity. Also, a ticket could be restored to the customer via the ‘Restored’ activity. There is also another possibility, namely: the cancellation of tickets by the ‘Canceled’ activity. This can happen at any point in their lifetime. We consider ‘Canceled’ and ‘Closed’ as the possible final activities of a ticket.

Every process may be executed for multiple cases (also called process instances). Each case is composed of a set of events that are stored in the event log. The standard format for storing an event log which is supported by the majority of process mining tools is XES [6]. In Table 1, a simple example of the event log for the Fig. 1 is shown that contains 3 cases. Cases A1001 and A1003 have 4 events and A1002 has 3 events. By using the CaseID field we know which events are related to particular cases. Note that, case A1002 is not completely “fitting” in the process model (there is one so-called “move in log” showing an event that happened in reality but could not happen according to the model). Furthermore, both events and cases may have attributes that can be used. For example, in Table 1, Resource and Group are event attributes. These attributes indicate that who is handled each event and do it in which organizational part of the company. Also, Severity that is a case property, shows the importance of different tickets (cases) in the event log.

The remainder of this paper is organized as follows. In Sect. 2, subgroup discovery is explained. In Sect. 3, we describe how we map and use subgroup discovery in the process mining domain. Section 4 describes the implementation of our approach. Next, Sect. 5 illustrates the usefulness of our approach through the application of our techniques to a real life dataset obtained from Telefonica. Finally, Sect. 6 concludes the paper.

2 Subgroup Discovery

Subgroup discovery was originally proposed by [7, 8] and it is based on the idea of local exceptionality detection [9]. In contrast with most classification or prediction algorithms, subgroup discovery does not try to find rules that are used to decide or predict things for new instances of the problem. Also, unlike clustering methods, in this technique, we assume that we have a population of samples that have already a class label (e.g., deviating or not). As mentioned before, the aim of subgroup discovery algorithm is to discover patterns for particular class labels (target groups) [8]. In other words, we try to find the common characteristics in a subset of cases that are fewer happened in the other cases. For example, discovering cases that are delayed caused by particular resources or limited to specific type of tickets. Subgroup discovery is used in various domains including the filed of Bioinformatic, e-learning and medical domain [11]. Also in [12] this technique is extended to used multi class data.

We define a subgroup as \((ValueSet \rightarrow Target)\) where ValueSet is an ordered list of independent attributes having specific values. In addition, Target is the desired class of samples that we are interested in analyzing them like deviated cases. For example, \( S_{1} \), \( S_{2} \) and \( S_{3} \) are three examples of possible subgroups:

Using subgroup discovery we want to discover interesting subgroups. According to [8], a subgroup is interesting if it satisfies the following conditions:

  • it is of considerable size and

  • it has the most unusual statistical distribution characterization (distribution of different classes in the subgroup compared to their distribution in whole samples)

In Fig. 2, this concept is illustrated. Consider that the class feature is depicted by a red dash or a blue plus. In this figure, three subgroups are shown. Subgroup (a) is not an interesting one because there are too few samples included in it. In other words, this subgroup is too specific. In contrast, subgroup (b) has more samples, but the distribution of samples in it is not unusual, because it is the same as the whole population. Finally, the subgroup (c) has a substantial number of samples and has an atypical distribution at the same time, therefore it is considered as an interesting subgroup. It should be noted, it is not required that all samples included in a subgroup have the same class (see for example subgroup (c)). Also, one sample could be placed in more than one subgroup simultaneously (or not placed in any subgroup), i.e., subgroups are not a partitioning of the whole set.

In this approach, we consider only the standard definition of interestingness (based on size and statistical difference), however other definitions could be applied that incorporate domain or business knowledge.

Fig. 2.
figure 2

Three different subgroups. Subgroup (a) is very specific, subgroup (b) has a class distribution similar to the whole and thus not “unusual” enough. The subgroup (c) is an interesting subgroup because it has sufficient samples with a class distribution sufficiently different from the rest. (Color figure online)

Many measures have been proposed in the literature to quantify the quality of a subgroup and its interestingness. Table 2 summarizes several of the proposed metrics mentioned in papers like [3]. Many of these measures have also been applied in the association rules mining field. To illustrate them in a better way, we use the contingency table presented in Table 3. This table is a useful way to examine relations between categorical variables [10]. A sample matches a particular ValueSet if its attributes have values in the ranges defined by ValueSet. Similarly, a sample matches a particular Target if its class attributes has a value defined by Target. In this table, the number of samples that match the ValueSet and the number of samples that match the defined Target are indicated by \(n_{V}\) and \(n_{T}\) respectively. Also, the number of samples that match both the selected ValueSet and Target is indicated by \(n_{VT}\). In addition, \(n_S\) is the total number of samples. Note that \(n_{\overline{T}}\) and \(n_{\overline{V}}\) are define the number of samples do not match the Target and the selected ValueSet respectively. Therefore, \(n_S= n_T + n_{\overline{T}} = n_V + n_{\overline{V}} \).

A higher value of the coverage metric means that the subgroup has more samples. coverage = 1 indicates that the corresponding subgroup includes all the samples. Therefore, an interesting subgroup should have a coverage that is high enough. A value of 1 (or 0) in the support metric indicates all the samples (or none of them) are match both the ValueSet and Target class. If a subgroup has a value of 1 in its confidence metric, it indicates that if a sample match the selected ValueSet it should match the Target too. The lift metric computes how dependent (or independent) are the Valueset and Target. If Lift equals 1 then they are independent. However, a value higher than 1 suggests a positive correlation and a value lower than 1 indicates a negative correlation. If the added value metric has a value of 0, it suggests that the distribution of the classes are similar in both subgroup and total samples and consequently, the ValueSet has no influence on the Target distribution. In addition, a higher positive (or lower negative) value for this measure, suggests higher positive (or negative) effect on the distribution of the target feature.

Table 2. List of various measures used in subgroup discovery domain.
Table 3. Contingency table shows counts of all possible conjunctions of ValueSet and Target group.

Precision measures the quality of a subgroup by computing ratio of different classes when samples match the selected ValueSet. In its formula, g is the generalization parameter which is usually in the range [0.5, 100]. The unusualness value of a subgroup is computed based on both the coverage and added value of it. It could be proven that \(Unusualness(subgroup) = WRAcc(ValueSet\rightarrow Target) \) is equal to \(PS(Target\rightarrow ValueSet )\) which is widely used in field of association rules mining. Both of them equal to \(\frac{n_{V} \times n_{VT} }{ n_{S} \times n_{V} } - \frac{n_{V}\times n_{S} }{ {n_{S}}^{2}}\) (one of difference between association rule and subgroup discovery is in association rule we extract \(Target\rightarrow ValueSet\) pattern, but here we are interested in \(ValueSet\rightarrow Target\) patterns.) These measures account for coverage (size of a subgroup) and added value (unusual statistical distribution of subgroup) at the same time (\( Conf (Target \rightarrow ValueSet) \times Supp(Target) \)).

In this paper, we mainly use the unusualness measure and it’s range is in [−0.25, 0.25]. Unusualness equals 0, suggests that a subgroup would not be interesting; however, a higher positive value indicates that the ValueSet has higher effect on the Target compare to the whole samples. Also, lower negative value for this measure, shows that the samples match the selected ValueSet have lower fewer in the Target class compare to other class. In many applications, discovering subgroups with negative unusualness would be also valuable. Thus, we use the absolute value of unusualness (\(|WRAcc(subgroup)|\)) or RuleInterestVariant [15].

3 Applying Subgroup Discovery in Process Mining

In this section, we formally define how to apply subgroup discovery in the field of process mining. The architecture of proposed method is illustrated in Fig. 3. The starting point of our method is an event log. An event log may contain many cases and each case has a set of associated events. Most of the process mining techniques consider events as the starting point for process analysis. In this research, we focus on cases rather than events.

Table 4. Some process mining properties for the event log of Table 1. To compute alignment costs we use standard cost.

Therefore, in the next step we extract properties for all cases. There are three types of properties in process mining: properties that are related to (a) cases, (b)events, and (c) processes mining properties. In general, a case property is the same for all the events of a specific case. However, for event attributes, the values could be different (or simply missing) for individual events within a case. For example, in Table 1, CaseID, Severity, Type, and Creator are case properties and EventID, Operation, Resource, and Group are event properties. Properties of events can also be mapped to cases properties indirectly. In Fig. 4, an example of such mapping is shown. All possible values of each event property are mapped to a case property. If in any event of a case this value occurred, then the corresponding property of case equals 1, otherwise, it will be 0 (here we use existence function, but other functions like frequency could be used as well). To explain more, a resource of event 6 is “Tim” and because this event belongs to case “A1002”, the value of “R:Tim” for this case equals 1.

The third type of properties, the so-called process mining properties, are obtained by performing some kind of computation over the events within a case. Examples include performance metrics (sojourn time, waiting time, etc.) or conformance checking metrics (fitness, precision, counts of move on logs and model, etc.). To extract some of the mentioned features we can optionally provide a process model (that could be given as a reference model or discovered by some process discovery algorithm). Some examples of process mining properties for the event log of Table 1 are given in Table 4. Note that process mining techniques like conformance checking can be used as input for subgroup discovery. However, the very same techniques can be applied to the discovered subgroups in a later phase. This shows the close interaction between process mining techniques and subgroup discovery.

Fig. 3.
figure 3

The architecture of proposed method.

Fig. 4.
figure 4

Mapping the properties of events to case (trace) properties. Values of event attributes are transformed to a case property. For each case, it is computed whether the property is present. The values that are indicated with red color, explain that we use the existence of an attribute value. (Color figure online)

According to Fig. 3, the output of Property Extractor component will be a matrix where each row corresponds to a case and each of its columns refers to a property.

Definition 1 (Universes)

\( U_{S}= \mathcal {P}(U_{V})\) is the universe of value collections. \( U_{H} =\mathcal {P}(U_{S}) \) is the universe of sets of value collections (set of sets). Note that \(v \in U_{V} \) is a single value (e.g. \( v= Claim \)), \(V \in U_{S}\) is a value collection (e.g., \(V = \{ Claim , Order , Query \}\)).

Definition 2 (Case Base)

A case base \(CB=(C,P,\pi )\) defines a set of cases C, a set of properties P, and a function \(\pi \in (P \rightarrow (C \rightarrow U_V)) \). For any properties \(p \in P\), \(\pi (p) \) (denoted \(\pi _{p}\)) is a partial function mapping cases onto values. If \(\pi _{p}(c)=v\), then case \(c\in C\) has a property \(p \in P\) and the value of this property is \(v \in U_{V}\).

Therefore, P includes case, events and process properties and each property \(p\in P \) corresponds to column in the extracted matrix shown in Fig. 3. According to the method’s architecture (Fig. 3), for each case in the case base (that is created by the property extractor), a class attribute and intended properties are selected. The class attribute is a binary property that helps us to divide cases into two subsets (two classes). The first subset contains cases that we are particularly interested in analyze them (e.g., cases with delay or deviation). The rest of cases are placed in the other subset. By defining a class attribute, we specify which of the cases are interesting for analysis. In addition, not all of the properties in the case base may be noticeable and we should set aside them from properties that will be analyzed them in this subset. We name this subset target group and define it formally as follows:

Definition 3 (Target Group)

\(TG(CB, Att,\pi _{class})\) is a target group of \(CB=(C,P,\pi )\) where \(\pi _{class}\) is a membership function mapping cases to their relative classes. If a case \(c\in C\) belongs to our intended subset, then \(\pi _{class}(c)=1\) otherwise \(\pi _{class}(c)=0\). Attributes \(Att \subseteq P \) is a subset of the case base properties that we are interested to analyze their effect on the intended subset of cases.

Therefore, we could say that TG specifies a subset of properties in the case base that we want to analyze them and class of each case. Using this definition, we take a case base as an input and returning a subset of it’s properties and the class value of each case.

At last, by applying subgroup discovery on the target group we will discover many subgroups. Here we formally define a subgroup as the following definition.

Definition 4 (Subgroup)

S(TG,att,vs) is a subgroup of attribute \(att \in Att\) when \(\pi _{att} = vs\) on the target group TG. Each subgroup is a subset of cases in the TG that in these cases, the value of attribute \(att\) equals to \(vs\).

As an example, att can be \( Type \) and vs equals \( Claim \). The resulted subgroup is the subset of cases in the TG and the value of “Type” property for these case is claim.

Considering several properties in the target group, we will have many subgroups. However, the discovered subgroups are different based on their size, interestingness, distribution, and effects of them on the target group. We use unusualness measure to compute the interestingness of discovered subgroups on the target group. We name this measure Impact Effect and denote it by \(IE( subgroup )\). The higher value of IE suggests higher positive Influence of the subgroup. As mentioned before, we aim to discover subgroups with higher \( |IE| \) values. Using this definition we can compute the interestingness (or unusualness) of each subgroup on the target group.

Until now, we just considered one attribute in the ValueSet of a subgroup. However, it is possible to have a subgroup with multiple attributes. The complexity of a subgroup could be defined by the number of attributes in its ValueSet [3]. For example, is a subgroup with multiple attributes and its complexity equals 2. Note that in combination of properties, each property should not appear more than one time in a subgroup.

However, computing all possible subgroups would be very time-consuming. There are many methods proposed to overcome this issue [21]. Here, we use minimum coverage of subgroups. So, subgroups with \(Cov(subgroup) \) lower than the minimum threshold are not considered. Note that if the coverage value of (\(ValueSet_1\rightarrow Target\)) is \(Cov_1\), the coverage value of (\(ValueSet_1 \wedge \ ValueSet_2\rightarrow Target\)), by definition, is less than or equal to \(Cov_1\). Thus, if a subgroup does not contain sufficient samples to have the minimum coverage, no other subgroups included in this subgroup have a higher coverage and there is no need to consider them.

In the final step of our approach we apply process mining techniques to the subgroups created. For each subgroup, we could extract a sublog (i.e., a subset of the main event log). A wide range of process mining algorithms ranging from dotted chart [16] and process comparator [17] to the inductive miner [18] and various conformance checkers [19] could be applied on these sublogs for further analysis.

Fig. 5.
figure 5

An example of the output of the Subgroup Discovery plugin. (Color figure online)

4 Implementation

To make it possible to apply subgroup discovery approach in the process mining context, a Subgroup Discovery plugin has been developed in ProM framework. ProM is an open source tool that allows to use and implement lots of different techniques in the field of process mining [20]. This tool can be freely downloaded from www.promtools.org.

The Subgroup Discovery plugin takes two event logs as input, one contains all the case samples (Case Base) and the other one is related to the subset of cases that we want to characterize (i.e., target group). Therefore, the second event log should be a subset of the first event log. Furthermore, regarding the output range of unusualness metric (and also PS metric) is in [−0.25, 0.25], we use range bar chart (it is also called Tornado chart) to visualize the impact of each subgroup on the target group. A screenshot of an example output result of subgroup discovery obtained using our tool is shown in Fig. 5.

Our plugin provides four types of results. First and foremost, we provide the impact effect analysis that is shown in Fig. 5(a). Each subgroup is shown in one row and its effect on the target group is indicated by a bar. The blue bars indicate a positive influence and red ones depict a negative impacts on the target group. The result presented in Fig. 5(b) illustrates Added Value that suggests how the percentage of classes are changed in a subgroup compared to whole samples. The chart presented in Fig. 5(c) shows how many samples in each subgroup are placed in the target class or another class (red bars correspond to target class). At last, in Fig. 5(d) the plugin shows a table with measured values for coverage, support, and confidence for each subgroup.

5 Evaluation

To evaluate the usefulness of applying subgroup discovery in the field of process mining we applied our approach and implementation to a dataset of Telefonica. As mentioned before, this data relates to the ticket handling process of three services provided by Telefonica and its corresponding process model is shown in Fig. 1. Also, a few statistics for this dataset are shown in Table 5. Guided by our assumption about complete cases, we just consider cases that contain “Canceled” or “Closed” activities. All other cases are removed from dataset.

The business questions that will be answered in the remainder of this section are:

  1. 1.

    Which attribute values often appeared in cases that have a long duration (cases with delays)?

  2. 2.

    Is there any difference in the property values of different services? If yes, what is the difference and which attribute values have more impact on such differences?

To answer each question, we should first define the intended cases that make our target group. Our target group for Question 1 is defined by cases that take more than 80 days to finish. Also, for answering Question 2, we consider the Jasper service (i.e., one of the three provided services) as our target group. Some statistics for these target groups are shown in Table 6.

The results of applying our new ProM plugin on these target groups are shown in Figs. 6 and 7 respectively. In the remainder of this section, we explain some of our findings for each question.

Question 1: Figure 6 indicates 37 subgroups for the class of slow cases. It shows that in the class of slow cases there is an under representation of and an under representation of modification by (in fact there is no case with this service in the slow case class). In contrast, in this class , and are more represented and therefore, they have a higher positive effect. Therefore, if stakeholders want to collate with slowing cases they should pay more attention to these properties. For example, they should think about the relation of or with the slowness of cases. Also, in this class, the case with are more presented (for conformance checking we use “Replay a Log on Petri Net for Conformance Analysis” plugin.)

Question 2: In Fig. 7, again 37 subgroups for Jasper service class are illustrated (in our experiments accidentally the number of discovered subgroups be similar). This chart indicates that and have higher influence on this service. Also, the unfitted cases or cases with are more presented in this target class. Although, some of these subgroups may be obvious (like has negative impact and has positive impact, because we just consider service in this target group), the extracted rules indicate that this approach could extract interesting and correct patterns in subgroups that could not be uncovered by looking at the whole log.

Table 5. Statistical information of Telefonica dataset
Table 6. Statistical information of target groups. For each question, we have two classes.
Fig. 6.
figure 6

Interesting patterns discovered by subgroup discovery technique for cases with long duration. The red bars indicate negative effect and coloration and blue bars suggest positive influence and correlation. (Color figure online)

Fig. 7.
figure 7

Interesting patterns for Jasper service cases. Longer bars show higher influence for corresponding subgroup.

Fig. 8.
figure 8

The Dotted charts of two sublogs extracted from two discovered subgroups.

We also present these subgroups to Telefonica experts who have business knowledge. They confirmed that all of the discovered subgroups are correct, but not all of them were considered as surprising. They also recommended us to define other target groups and reapply our approach using these new target groups.

Even though other techniques like correlation, association rule mining and decision tree have similarities with subgroup discovery algorithm, they could not find these discovered subgroups. For example, when we apply correlations we typically do not consider the sizes of subgroups. Also, when applying decision trees, we aim to discover rules for predicting future samples not describing current ones. In association rule mining variations that consider a class feature, the focus is on coverage, support and confidence of a class and an item set and unusualness of a rule is of less relevant. In the other hand in subgroup discovery, at the same time coverage and unusualness of a pattern (not a rule) are intended to describe the model of samples. Also, subgroup discovery is focused on the target group rather than all the samples.

According to Fig. 3, each subgroup also is an event log that could be used for further process mining analysis. In Fig. 8, we compare the dotted charts of two sublogs related to subgroups of and . According to Fig. 8, it is indicated that the cases of the first subgroup take more time whereas cases in the second subgroup take less time. We also apply “Mine Petri Net with Inductive Miner” plugin on these sublogs. The discovered models using this plugin are shown in Fig. 9. According to this figure, there are difference in their process. For example, in the process model in Fig. 9(a) it is possible for a “Delayed” ticket to be “Active” again, but it is impossible in the in the process model in Fig. 9(b). These kinds of analysis could give valuable information to stakeholders for understanding the reasons for difference in behavior in subgroups of cases. It is also possible to apply any other process mining technique on the discovered subgroups.

Fig. 9.
figure 9

The process models of two sublogs discovered by “Mine Petri Net with Inductive Miner” plugin: (a) is the process model of subgroup and (b) is the process model of

6 Conclusion

Process mining can be used to extract knowledge from event logs. However, event logs may contain information on cases with very different characteristics. Analyzing these different group of cases together may conceal important phenomena. Delays and deviations may be linked to very particular subgroups that are not known beforehand.

To address this problem, we applied subgroup discovery technique to find the statistically interesting patterns in subsets of cases belonging to a predefined target class (e.g., cases that are delayed). In this regard, properties of the event log are extracted with their corresponding values. These properties could be related to the case, its events or computed by other process mining techniques. Afterwards, interesting subgroups of the target group can be extracted by applying well-known measures like Added Value and WRAcc. Interesting subgroups that contribute to the target group positively or negatively may be discovered. Importantly, any process mining algorithms can be applied to the discovered subgroups to extract surprising insights and behaviors.

To evaluate the proposed approach we developed a plugin in a ProM platform and applied it in a case study conducted together with Telefonica. Two target groups are defined for this purpose, one for slow cases and another for cases related to a specific service. This case study indicates that the proposed approach could is able to discover interesting patterns. However, not all of them were surprising for business experts.

In the current implementation we do not consider attributes with continues values. In ProM and other data mining tools there are techniques to make these attributes discrete. Not doing this up-front, but trying to integrate this in the approach itself may be very time consuming, especially for time and date attributes. Here, we also define target groups manually, however defining a suitable target group would be a challenging task for continues attributes.