
1 Introduction

Workflow discovery techniques [1] have gained attention in BPM applications, owing to their ability to extract, from historical execution data, a descriptive model of the behavior of a process, which can support key process analysis/design, process improvement, and strategic decision-making tasks.

However, two critical issues undermine the effectiveness of traditional workflow discovery methods when they are applied to the logs of lowly-structured processes: (i) the high level of detail that usually characterizes log events, which makes it difficult to provide the analyst with an easily interpretable description of the process in terms of relevant business activities, and (ii) the presence of various execution scenarios (a.k.a. “process variants”), which exhibit different business processing logics (often determined by key context factors) and cannot be captured effectively with a single workflow model. In fact, when applied to such logs, most current workflow discovery techniques tend to yield “spaghetti-like” models, suffering from both low readability and low fitness [7].

Two kinds of solution methods have been proposed in the literature to alleviate these problems: (i) turning raw events into high-level activities by way of automated abstraction techniques [6, 10], and (ii) partitioning the log into trace clusters [4, 5, 8, 9], capturing homogeneous execution groups, and then modeling each cluster separately with a simpler and better-fitting workflow. Since both kinds of methods assume that each log event refers to a predefined (possibly low-level) process task, they are of limited usefulness for many real-world flexible BPM applications (e.g., product management, or problem/issue tracking). In such cases, indeed, each log event takes the form of a tuple storing several data fields, none of which can be interpreted as a task label (capable of fully capturing the semantics of the performed activity). Moreover, the clusters produced by current event abstraction (resp., trace partitioning) approaches are not self-descriptive, and the analyst must carry out long and difficult interpretation/validation tasks to turn them into meaningful classes of activities (resp., process instances).

Contribution. In order to overcome these limitations, this work states a two-fold mining problem for a given low-level multi-dimensional event log. On the one hand, we want to discover an event clustering function that automatically abstracts log events into (not a-priori known) event types, each of which is meant to represent a distinct pattern for the execution of single work items. While exploiting such an event abstraction, we also want, where possible, to detect and model different process execution variants.

Our solution approach relies, first of all, on a specialized logic-based event clustering method, which finds a clustering function encoded in terms of decision rules (over event attributes), so as to provide the analyst with an interpretable description of the discovered activity types (i.e. event clusters). We also find a similar conceptual clustering function for partitioning the log traces into different execution classes, using their associated context data (e.g. case properties or environmental factors) as descriptive variables. To this end, an iterative trace-clustering approach is proposed, which greedily tries to maximize the (average) quality of the workflow schemas induced from the discovered trace clusters.

Expressing trace/event clusters via predictive (logical) rules is an important distinguishing feature of our approach, which makes it possible to support the implementation of advanced run-time services. This point is discussed in Sect. 6, which also provides a comparison with related work in the literature.

2 Preliminaries

Log Data. For each process case a trace is recorded, storing the sequence of events that happened during its enactment. Let \(E\) and \(\mathcal {T}\) be the universes of all possible events and traces, respectively, for the process under analysis. For each trace \(\tau \in \mathcal {T}\), \(len(\tau )\) denotes the number of events stored in \(\tau \), while \(\tau [i]\) is the \(i\)-th event of \(\tau \), for \(i \in \{1,..., len(\tau ) \}\).

For each event \(e \in E\), let \(prop(e)\) be a tuple of data properties (over some attribute space) associated with \(e\); each event also refers to a case ID and to a timestamp, but these are of no use for recognizing general activity patterns. Execution cases as well are often associated with a number of data properties: for each trace \(\tau \in \mathcal {T}\), let \(prop(\tau )\) be the data properties of \(\tau \). Finally, for a log \(L\), let \(traces(L)\) (resp., \(events(L)\)) denote the set of traces (resp., events) stored in \(L\).
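To make this notation concrete, the following minimal Python sketch (the class and function names are illustrative, not part of the formalism) shows one possible in-memory representation of events, traces and logs:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    case_id: str                 # reference to the enclosing case
    timestamp: float             # when the event happened
    props: Dict[str, str]        # prop(e): data properties over some attribute space

@dataclass
class Trace:
    case_id: str
    events: List[Event] = field(default_factory=list)
    props: Dict[str, str] = field(default_factory=dict)   # prop(tau): case-level data

    def __len__(self) -> int:                 # len(tau)
        return len(self.events)

    def __getitem__(self, i: int) -> Event:   # tau[i], 1-based as in the text
        return self.events[i - 1]

def traces(L: List[Trace]) -> List[Trace]:    # traces(L)
    return L

def events(L: List[Trace]) -> List[Event]:    # events(L)
    return [e for tau in L for e in tau.events]
```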

Workflow Schemas. Our final aim is to describe process behavior by way of flow-oriented models, like those commonly used to express the control-flow logic of a business process. For the sake of concreteness, we here focus on the language of heuristics nets [14], where a workflow schema is essentially a directed graph in which each node represents a process activity, and each edge \((x,y)\) encodes a dependency of \(y\) on \(x\). In addition, one can express cardinality-based fork (resp., join) constraints for nodes with multiple outgoing (resp., incoming) edges.

For any workflow schema \(W\), let \(\mathcal {A}(W)\) be the set of all activities featuring in \(W\). The definition of \(\mathcal {A}(W)\) is a crucial point in our setting, where it may well happen that no predefined business activities exist for the process. In such a case, our approach consists in partitioning log events into event classes, representing distinguished activity patterns, which can be eventually used to label the nodes of a workflow schema.

Fig. 1. Workflow schema induced from the Problem Management log, when using a status-based event abstraction.

Example 1

Let us consider a real-life Problem Management application (used as a case study in Sect. 5), where each log trace registers the history of a ticket. Each log event \(e\) is a tuple featuring 8 data attributes: the status (accepted, queued, completed, or closed) and substatus (unmatched, awaiting_assignment, assigned, in_progress, wait, cancelled, or closed) of the ticket when \(e\) happened; the resource who generated \(e\), and her nationality (res_country); the support team, functional division and organization line to which the resource was affiliated; and the country where the line was located (org_country). Since such a representation carries no information on what process task was performed in each event, we need to infer activity types from event tuples, in order to reach a suitable abstraction level over process behavior. A common approach consists in abstracting each event tuple into just one of its attributes, and interpreting its value as an activity label. Figure 1 shows the model discovered (by algorithm Heuristics Miner [14]) after replacing each event with the value of its status attribute. \(\lhd \)

Figure 1 shows that a one-attribute event abstraction is likely to produce a poorly informative (or even trivial) process model for a log storing fine-grained execution traces. On the other hand, using a combination of multiple attributes for the same purpose may well yield a cumbersome and overfitting workflow schema, as confirmed by our tests. The latter solution is, in fact, also unviable in many real application scenarios for scalability reasons, since the computation time of typical approaches (to both workflow discovery and log abstraction) is quadratic in the number of activities.

Behavioural Profiles. Behavioural profiles [13] are basic ordering relationships over activity pairs, which can summarize a workflow schema’s behavior, and can be computed efficiently for many classes of workflow specification languages.

Let \(W\) be a workflow schema, and \(\mathcal {A}(W)\) be its associated activities. Let \(\succ ^W\) be a “weak order” relation inferred from \(W\), such that, for any \(x,y \in \mathcal {A}(W)\), \(y \succ ^W x\) iff there is at least one trace admitted by \(W\) where \(y\) occurs after \(x\). Then, the behavioral profile matrix of \(W\), denoted by \(\mathcal {B}(W)\), is a function mapping each pair \((x,y) \in \mathcal {A}(W) \times \mathcal {A}(W)\) to an ordering relation in \(\{\rightsquigarrow ,+,\parallel \}\), as follows: (i) \(\mathcal {B}(W)[x,y] = \rightsquigarrow \), iff \(y \succ ^W x\) and \(x \nsucc ^W y\) (strict order); (ii) \(\mathcal {B}(W)[x,y] = +\), iff \(x \nsucc ^W y\) and \(y \nsucc ^W x\) (exclusiveness); (iii) \(\mathcal {B}(W)[x,y] = \parallel \), iff \(x\succ ^W y\) and \(y \succ ^W x\) (observation concurrency).

Let \(\tau \) be a trace in \(\mathcal {T}\) (with event universe \(E\)), \(x\) and \(y\) be two event classes (i.e. disjoint subsets of \(E\)), and \(\mathcal {B}\) be a behavioral profile matrix. Then we say that \(\tau \) violates (resp., satisfies) \(\mathcal {B}[x,y]\), denoted by \(\tau \not \!\vdash \mathcal {B}[x,y]\) (resp., \(\tau \vdash \mathcal {B}[x,y]\)), if the occurrences of \(x\) and \(y\) in \(\tau \) infringe (resp., fulfill) the ordering constraints in \(\mathcal {B}[x,y]\). Precisely, \(\tau \not \!\vdash \mathcal {B}[x,y]\) iff there exist \(i,j \in \{1,...,len(\tau )\}\) such that \(\tau [i] \in y\), \(\tau [j] \in x\), and either (i) \(\mathcal {B}[x,y] =+\), or (ii) \(\mathcal {B}[x,y] =\rightsquigarrow \) and \(i<j\).
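A direct transcription of this violation test into Python might read as follows; here a trace is assumed to be given directly as the sequence of event classes of its events, and the profile matrix \(\mathcal {B}\) as a dictionary over class pairs (both are representation choices made for illustration, not prescribed above):

```python
STRICT, EXCLUSIVE, PARALLEL = "~>", "+", "||"   # the three profile relations

def violates(trace_classes, B, x, y):
    """True iff the trace (as a sequence of event classes) violates B[x, y]."""
    rel = B[(x, y)]
    pos_x = [j for j, c in enumerate(trace_classes) if c == x]
    pos_y = [i for i, c in enumerate(trace_classes) if c == y]
    if not pos_x or not pos_y:
        return False               # x or y never occurs: nothing can be infringed
    if rel == EXCLUSIVE:
        return True                # exclusive activities must never co-occur
    if rel == STRICT:
        # strict order x ~> y is violated when some y occurs before some x
        return min(pos_y) < max(pos_x)
    return False                   # the parallel relation admits any ordering
```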

Behavioral profiles help us measure workflow conformance, as in what follows.

Definition 1

Let \(L\) be an event log, \(W\) be a workflow schema, and \(\sigma \in \mathbb {N}\) be a (lower) noise threshold. Then the compliance of \(W\) w.r.t. \(L\), denoted by \(compl(W,L)\), is defined as: \(compl(W,L) = \frac{1}{|\mathcal {A}(W)|^2} \times | \{ (x,y) \in \mathcal {A}(W) \times \mathcal {A}(W) \ | \ \lnot (\exists ^{\ge \sigma } \ \tau \in traces(L)\) s.t. \(\tau \not \!\vdash \mathcal {B}(W)[x,y] )\}|\). Moreover, the precision of \(W\) w.r.t. \(L\), denoted by \(prec(W,L)\), is defined as: \(prec(W,L) = \frac{1}{|\mathcal {A}(W)|^2} \times | \{ (x,y) \in \mathcal {A}(W) \times \mathcal {A}(W) \ | \ \mathcal {B}(W)[x,y] = \parallel \) and \(\lnot (\exists ^{\ge \sigma } \ \tau \in traces(L)\) s.t. \(|\{x,y\} \cap tasks(\tau )|=1 ) \} |\), where \(tasks(\tau )\) denotes the set of event classes occurring in \(\tau \), and \(\exists ^{\ge \sigma }\) is a counting quantifier asserting the existence of at least \(\sigma \) elements in a given set.    \(\Box \)
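Under the same assumptions, and reusing the violates function (and relation constants) sketched above, the two measures of Definition 1 can be computed roughly as follows:

```python
def compliance(activities, B, log_classes, sigma):
    """compl(W, L): fraction of activity pairs whose profile entry is violated
    by fewer than sigma traces; log_classes is the log as class-sequences."""
    pairs = [(x, y) for x in activities for y in activities]
    ok = sum(1 for x, y in pairs
             if sum(1 for t in log_classes if violates(t, B, x, y)) < sigma)
    return ok / len(pairs)

def precision(activities, B, log_classes, sigma):
    """prec(W, L): fraction of pairs that are declared parallel and are not
    contradicted by >= sigma traces containing exactly one of the two classes."""
    pairs = [(x, y) for x in activities for y in activities]
    ok = 0
    for x, y in pairs:
        if B[(x, y)] != PARALLEL:
            continue
        n_single = sum(1 for t in log_classes if (x in t) != (y in t))
        if n_single < sigma:
            ok += 1
    return ok / len(pairs)
```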

3 Problem Statement

As discussed above, we want to discover two interrelated clustering models: one for log events, and another for log traces (intended to recognize process variants).

For the sake of interpretability, in both cases we seek a clustering model that can be encoded by decision rules. Let us assume that each rule is a conjunctive boolean formula of the form \((A_{1} \in V_{1}) \wedge (A_{2} \in V_{2}) \wedge \ldots \wedge (A_{k} \in V_{k})\), where, for each \(i \in \{1,\ldots ,k\}\), \(A_{i}\) is a descriptive attribute defined on some given set \(Z\) of data instances, and \(V_i\) is a subset of \(A_i\)’s domain. For any \(I \subseteq Z\) and for any such rule \(r\), let \(cov(r,I)\) denote the set of all \(I\)’s instances that satisfy \(r\).

A conceptual clustering model for \(Z\) is a list \(\mathcal {C}=\langle r_1,..., r_n\rangle \) of conceptual clustering rules (for some positive integer \(n\)), which defines a partitioning of \(Z\) into \(n\) parts \(P_1, \ldots , P_n\), where \(P_i = cov(r_i, Z) \setminus \bigcup _{j=1}^{i-1} cov(r_j, Z)\).
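For illustration, a conjunctive rule can be encoded as a mapping from each attribute \(A_i\) to its admissible value set \(V_i\); the sketch below (with instances represented as plain dictionaries, an assumed encoding) then computes \(cov(r,I)\) and the ordered partition \(P_1, \ldots , P_n\):

```python
def cov(rule, instances):
    """cov(r, I): the instances of I satisfying the conjunctive rule r,
    where r maps each attribute A_i to its admissible value set V_i."""
    return [z for z in instances
            if all(z.get(attr) in values for attr, values in rule.items())]

def partition(rules, instances):
    """P_i = cov(r_i, Z) minus the coverage of all preceding rules."""
    parts, assigned = [], set()
    for rule in rules:
        part = [z for z in cov(rule, instances) if id(z) not in assigned]
        assigned.update(id(z) for z in part)
        parts.append(part)
    return parts
```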

Our ultimate goal is to find a multi-variant process model leveraging two such clustering functions (for events and traces, resp.), as it is formally defined next.

Definition 2

Let \(L\) be a log, and \(\mathcal {T}\) (resp., \(E\)) the associated trace (resp., event) universe. Then, a High-Level Process Model (\(\mathtt {HLPM} \)) for \(L\) is a triple \(\langle \mathcal {C}^E,\mathcal {C}^\mathcal {T},WS\rangle \) such that: (i) \(\mathcal {C}^E=\langle r^E_1,\ldots ,r^E_p\rangle \) is a conceptual clustering model for \(E\), where \(p \in \mathbb {N}\) is the number of event clusters, and \(r^E_i\) (for \(i=1,..,p\)) is the clustering rule of the \(i\)-th event cluster; (ii) \(\mathcal {C}^\mathcal {T}=\langle r^\mathcal {T}_1,\ldots ,r^\mathcal {T}_q\rangle \) is a conceptual clustering model for \(\mathcal {T}\), where \(q \in \mathbb {N}\) is the number of trace clusters, and \(r^\mathcal {T}_j\) (for \(j=1,..,q\)) is the clustering rule of the \(j\)-th trace cluster; and (iii) \(WS=\langle W_1,\ldots ,W_q\rangle \) is a list of workflow schemas, where \(W_k\) (for \(k=1,..,q\)) models the \(k\)-th trace cluster, using the classes yielded by \(\mathcal {C}^E\) as activities.    \(\Box \)

Clearly enough, sub-model \(\mathcal {C}^E\) acts as an event abstraction function, mapping each event \(e \in E\) to an event class based on logical clustering rules (expressed over \(e\)’s properties). Analogously, model \(\mathcal {C}^\mathcal {T}\) partitions historical execution traces into behaviorally homogeneous clusters (via logical clustering rules over traces’ properties). Each discovered trace cluster, regarded as a distinct process variant, is also equipped with a workflow schema, where some of the clusters of \(\mathcal {C}^E\) feature as (high-level) activity nodes.
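As an illustration of how \(\mathcal {C}^E\) is applied at abstraction time, the hedged snippet below maps an event’s properties to the first clustering rule it satisfies, mirroring the ordered semantics of conceptual clustering models (the cluster labels \(e_1, e_2, \ldots \) are only an assumed naming scheme):

```python
def assign_event_class(C_E, e_props):
    """Return the label of the first rule in C_E covering the event's properties.
    C_E is an ordered list of conjunctive rules, each a dict {attribute: value_set}."""
    for idx, rule in enumerate(C_E, start=1):
        if all(e_props.get(attr) in values for attr, values in rule.items()):
            return f"e_{idx}"
    return None   # an uncovered event would need a catch-all rule in practice
```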

Our discovery problem can be stated conceptually as the search for an optimal \(\mathtt {HLPM} \) that maximizes some associated quality measure, such as those defined in the previous section.

4 Solution Approach

Technically, we rephrase the discovery of conceptual clustering models (over either events or traces) as a predictive clustering problem. Basically, predictive clustering approaches [3] assume that each element \(z\) in a given instance space \(Z=X \times Y\) is characterized by two kinds of data attributes: descriptive attributes and target attributes, denoted by \(descr(z)\in X\) and \(targ(z)\in Y\), respectively. The goal of these approaches is to find a logical partitioning function (of the same nature as our conceptual clustering models) that minimizes \({\small \sum _{C_i} |C_i| \times Var (\{ targ(z) \ | \ z \in C_i \}) }\), where \(C_i\) ranges over the clusters, and \(Var(S)\) denotes the variance of set \(S\).
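For a single numeric target, the minimized quantity can be written as in the short sketch below; handling the multi-dimensional targets used later would typically require averaging per-dimension variances, which is an assumption about the learner rather than something stated here:

```python
from statistics import pvariance

def pc_objective(clusters, targ):
    """Sum over clusters C_i of |C_i| * Var({targ(z) | z in C_i}),
    assuming targ(z) returns a single number."""
    total = 0.0
    for C in clusters:
        values = [targ(z) for z in C]
        if len(values) > 1:                    # a singleton/empty cluster adds 0
            total += len(values) * pvariance(values)
    return total
```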

Different predictive clustering models exist in the literature. For scalability and readability reasons, we opted for Predictive Clustering Trees (PCTs), where the clustering function is a (propositional) decision tree.

We next introduce ad-hoc propositional encodings for events and traces, which allow a clustering model to be induced by reusing a PCT learner.

Definition 3

Let \(L\) be a log, over event (resp., trace) universe \(E\) (resp., \(\mathcal {T}\)). Then, the e-view of \(L\), denoted by \(\mathcal {V}_E(L)\), is a relation containing a tuple \(z_{i,j}\) for each \(\tau _i \in traces(L)\) and for each \(j\in \{1, \ldots , len(\tau _i)\}\) (i.e. for each event \(\tau _i[j]\in events(L)\)), such that: (i) \(descr(z_{i,j}) = prop(\tau _i[j])\), and (ii) \(targ(z_{i,j}) =\langle \frac{j}{len(\tau _i)}, \frac{j}{maxL}, \frac{len(\tau _i) - j}{maxL} \rangle \), where \(maxL= \max (\{ \ len(\tau ) \ | \ \tau \in traces(L) \})\).   \(\Box \)

In such a log view (acting as a training set for predictive clustering), the descriptive (input) attributes of each instance \(z_{i,j}\) are the data fields in \(prop(\tau _i[j])\) (see Sect. 2), while the target ones are three indicators derived from the relative/absolute position \(j\) of \(\tau _i[j]\) within its trace. Intuitively, we want events with similar intra-trace positions (and similar properties) to be grouped together.
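Reusing the Trace/Event sketch from Sect. 2, the e-view of Definition 3 could be materialized as follows (the dictionary layout of the rows is an illustrative choice):

```python
def e_view(log):
    """Build the e-view: one row per event, with prop(tau_i[j]) as descriptive
    attributes and the three positional indicators of Definition 3 as targets."""
    max_len = max(len(tau) for tau in log)
    rows = []
    for tau in log:
        n = len(tau)
        for j in range(1, n + 1):
            e = tau[j]                         # 1-based access, as in the text
            rows.append({
                "descr": dict(e.props),
                "targ": (j / n, j / max_len, (n - j) / max_len),
            })
    return rows
```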

For the clustering of traces, we introduce another log view, named t-view, where the context data and abstract activities of each trace act as descriptive features, while the target variables express local activity relationships.

Specifically, let \(\mathcal {B}\) be a behavioral profile matrix (derived from some workflow schema), and \(\alpha : E \rightarrow A\) be an event clustering function. Then, for each trace \(\tau \) and any activities \(a_i\) and \(a_j\) in \(A\), we define a target variable \(v_\mathcal {B}(\tau ,a_i,a_j)\) as follows: (i) \(v_\mathcal {B}(\tau , a_i,a_j) = 1\) if \(\tau \not \!\vdash \mathcal {B}[a_i,a_j]\); (ii) \(v_\mathcal {B}(\tau , a_i,a_j) = \frac{f(\tau , a_i,a_j)}{2 \times c(\tau , a_i,a_j)}\) if both \(a_i\) and \(a_j\) occur in \(\tau \), where \(c(\tau , a_i,a_j) = | \{ (i',j') \ | \ i',j' \in \{1,\ldots ,len(\tau )\} \ \wedge \ i' \ne j' \wedge \alpha (\tau [i']) = a_i \wedge \alpha (\tau [j']) = a_j \} |\), and \(f(\tau , a_i,a_j) = \mathtt{sum } (\{ \mathtt{sgn }(j'-i') \ | \ i',j' \in \{1,\ldots , len(\tau ) \} \wedge \alpha (\tau [i']) = a_i \wedge \alpha (\tau [j']) =a_j \} )\), where \(\mathtt{sgn }\) denotes the signum function; and (iii) \(v_\mathcal {B}(\tau , a_i,a_j) = \mathtt{null }\) if \(\tau \) does not contain both \(a_i\) and \(a_j\). This way, \(v_\mathcal {B}(\tau ,a_i,a_j)\) either captures a violation of a behavioral profile (case (i)), or retains information on the mutual positions of \(a_i\) and \(a_j\) when both occur in \(\tau \) (case (ii)).
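A hedged Python rendering of \(v_\mathcal {B}\) (again over a trace given as its sequence of event classes, and reusing the violates function sketched earlier) is:

```python
def v_B(trace_classes, B, a_i, a_j):
    """Target variable v_B(tau, a_i, a_j) of the t-view; trace_classes is the
    sequence alpha(tau[1]), ..., alpha(tau[len(tau)])."""
    if violates(trace_classes, B, a_i, a_j):
        return 1.0                                       # case (i): profile violated
    pos_i = [p for p, c in enumerate(trace_classes) if c == a_i]
    pos_j = [q for q, c in enumerate(trace_classes) if c == a_j]
    if not pos_i or not pos_j:
        return None                                      # case (iii): "null"
    c = sum(1 for p in pos_i for q in pos_j if p != q)
    f = sum((q > p) - (q < p) for p in pos_i for q in pos_j)   # sgn(j' - i')
    # guard for the degenerate a_i == a_j single-occurrence case (not specified above)
    return f / (2 * c) if c else 0.0                     # case (ii)
```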

Definition 4

Let \(L\) be a log over event and trace universes \(E\) and \(\mathcal {T}\), \(A=\{a_1,\ldots ,a_k\}\) be a set of event clusters (regarded as abstract activities), and \(\alpha : E \rightarrow A\) be an event clustering function. Let also \(\mathcal {B}: A \times A \rightarrow \{\rightsquigarrow ,+,\parallel \}\) be a behavioral profile matrix. Then, the t-view of \(L\) w.r.t. \(\alpha \) and \(\mathcal {B}\), denoted by \(\mathcal {V}_\mathcal {T}(L,\alpha ,\mathcal {B})\), is a relation containing, for each \(\tau \in traces(L)\), a tuple \(z_\tau \) such that: (i) \(descr(z_\tau ) = prop(\tau ) \oplus act_\alpha (\tau )\), where \(\oplus \) denotes tuple concatenation, and \(act_\alpha (\tau )\) is a vector in \(\{0,1\}^k\) such that, for \(i=1,..,k\), \(act_\alpha (\tau )[i] = 1\) iff activity \(a_i\) occurs in \(\tau \) (i.e. iff \(\exists j \in \{1,\ldots , len(\tau ) \}\) s.t. \(\alpha (\tau [j])=a_i\)); and (ii) \(targ(z_\tau ) = \langle v_\mathcal {B}(\tau , a_1,a_1), \ldots ,v_\mathcal {B}(\tau , a_1,a_k), v_\mathcal {B}(\tau , a_2,a_2), ..., v_\mathcal {B}(\tau , a_2,a_k), \ldots , v_\mathcal {B}(\tau , a_i,a_i), \ldots , v_\mathcal {B}(\tau , a_i,a_k), \) \(\ldots , v_\mathcal {B}(\tau , a_k,a_k)\rangle \).   \(\Box \)

Fig. 2. Algorithm \(\mathtt {HLPM} \mathtt {-mine} \)

Algorithm \(\mathtt {HLPM} \mathtt {-mine} \). Our approach to the discovery of a \(\mathtt {HLPM} \) is illustrated as an algorithm, named \(\mathtt {HLPM} \mathtt {-mine} \) and reported in Fig. 2. The algorithm follows a two-phase strategy: it first finds a conceptual clustering model for the events (Steps 1–3), and then computes a collection of trace clusters, along with their associated clustering rules and workflow schemas (Steps 4–24).

Conceptual clustering models are found through function minePCT, which leverages the PCT-learning method of [3]. In the algorithm, this function is applied both to an e-view (Step 2) and to a t-view (Step 12). Two further parameters make it possible to constrain a PCT’s growth: the minimal coverage a node must have in order to be split, and the maximal number of leaves, respectively.

The PCT found for log events is turned into a conceptual clustering model through an iterative bottom-up procedure, named extractRules, omitted for lack of space. Basically, starting from the rules of the tree leaves, this procedure replaces each rule \(r\) with that of its parent if \(|cov(r,\mathcal {V}_E(L))| < minCov \times | \mathcal {V}_E(L) |\) and the variance of \(cov(r,\mathcal {V}_E(L))\) is higher than, or nearly equal to, the parent’s variance (precisely, the former is lower than the latter by a fraction of at most \(\gamma \)). Notice that, in the current implementation, this test is performed on a separate pruning set. Whenever a rule \(r\) is removed, all the instances in \(cov(r,\mathcal {V}_E(L))\) are assigned to the parent rule (i.e. to the rule of the parent of \(r\)’s node), which is appended to the clustering model if it does not appear in it yet.
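The sketch below mimics, in a simplified single-pass form, the bottom-up merging just described; the node interface (.rule, .parent), the var_of callable, and the reading of the merge condition as a conjunction are all assumptions about the omitted pseudo-code:

```python
def extract_rules(leaves, pruning_set, min_cov, gamma, var_of):
    """Turn PCT leaf rules into a conceptual clustering model, merging a leaf
    into its parent when it is too small and barely reduces the target variance.
    cov() is the rule-coverage function sketched in Sect. 3."""
    n = len(pruning_set)
    model = []
    for node in leaves:
        covered = cov(node.rule, pruning_set)
        parent = node.parent
        too_small = len(covered) < min_cov * n
        little_gain = (parent is not None and
                       var_of(covered) >= (1 - gamma) * var_of(cov(parent.rule, pruning_set)))
        if too_small and little_gain:
            # drop the leaf rule: its instances fall back to the parent's rule
            if parent.rule not in model:
                model.append(parent.rule)
        elif node.rule not in model:
            model.append(node.rule)
    return model
```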

Trace clusters are computed through an iterative partitioning scheme (Steps 4–24), where \(TClust\) is the set of current trace clusters, and \(Q\) contains only the ones that may be further split. For any trace cluster \(c\), \(W(c)\) and \(rule(c)\) store the associated workflow schema and clustering rule, respectively.

Before the loop, \(TClust\) consists of just one cluster, gathering all of \(L\)’s traces. At each iteration, we try to split the cluster \(c^*\) in \(Q\) that received the lowest compliance score from the (profile-based) conformance measure \(compl\) (see Definition 1), among all those exceeding the coverage threshold \(minCov\). To this end, a PCT model \(T\) is induced from the propositional view of cluster \(c^*\), according to the discovered event clustering \(\mathcal {C}^E\) and the behavioral profiles of the schema associated with \(c^*\) (see Definition 4).

For each leaf cluster \(c\) in \(T\), we extract a workflow schema and the associated behavioral profiles, and merge the rule of \(c\) with that of the cluster (namely, \(c^*\)) from which \(T\) was induced (Steps 16–17).

At this point, the algorithm verifies whether the workflows discovered after splitting cluster \(c^*\) actually model the behavior of \(c^*\)’s traces more effectively. This check relies on an ad-hoc quality relationship, named \(\gamma \)-improve, which is defined next based on the conformance metrics \(compl\) and \(prec\) (see Definition 1).

Definition 5

Let \(c\) be a set of traces, \(\{c_1, .., c_k \}\) be a partition of \(c\), \(W\) be a workflow schema for \(c\), and \(\mathrm{WS}=\{W_1, .., W_k\}\) be a set of workflow schemas s.t. \(W_i\) models \(c_i\) and \(\mathcal {A}(W_i) \subseteq \mathcal {A}(W)\), for \(i=1,..,k\). Then, for \(\gamma \in [0,1]\), we say that \(WS\) \(\gamma \)-improves \(W\), denoted by \(WS \prec ^\gamma W\), iff (i) \(\sum _{i=1}^k \frac{|traces(c_i)| \cdot compl(W_i,c_i)}{|traces(c)|} > (1+ \gamma ) \cdot compl(W,c)\), or (ii) \(compl(W,c) \le \sum _{i=1}^k \frac{|traces(c_i)| \cdot compl(W_i,c_i) }{|traces(c)|} \le (1+\gamma ) \cdot compl(W,c)\) and \(\sum _{i=1}^k \frac{|traces(c_i)| \cdot prec(W_i,c_i) }{|traces(c)|} > (1+ \gamma ) \cdot prec(W,c)\).    \(\Box \)

We hence assume that the workflows of \(c^*\)’s children clusters model the behavior of \(traces(c^*)\) better than \(W(c^*)\) alone if they get, on average, a higher compliance (on their respective sub-clusters) than \(W(c^*)\) does (on \(c^*\) as a whole). When the compliance score remains essentially unchanged, we still prefer the new schemas if they are more precise than \(W(c^*)\).
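In code, the \(\gamma \)-improvement test boils down to comparing size-weighted averages; the sketch below assumes that compl and prec are available as callables over (schema, sub-log) pairs, mirroring Definition 1:

```python
def gamma_improves(children, parent, gamma, compl, prec):
    """Definition 5: children is a list of (sub_log, schema) pairs partitioning
    the parent (log, schema) pair; returns True iff the split gamma-improves W."""
    log, W = parent
    n = len(log)
    avg_compl = sum(len(l) * compl(w, l) for l, w in children) / n
    avg_prec = sum(len(l) * prec(w, l) for l, w in children) / n
    if avg_compl > (1 + gamma) * compl(W, log):
        return True                                     # condition (i)
    return (compl(W, log) <= avg_compl <= (1 + gamma) * compl(W, log)
            and avg_prec > (1 + gamma) * prec(W, log))  # condition (ii)
```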

The split of \(c^*\) is eventually kept only if the above improvement relationship holds. In any case, \(c^*\) is removed from the list of candidates for further refinement.

Steps 25–28 just build the list of trace clustering rules, and the list of workflow schemas, based on the set of trace clusters eventually left in \(TClust\).

5 Experiments

The approach described so far was implemented in a Java prototype system, and tested on the log of a real problem management system, encompassing 1487 traces recorded from January 2006 to May 2012. As explained in Example 1, each log event stores eight data attributes (i.e., status, substatus, resource, res_country, functional_division, org_line, support_team, and org_country). For each problem case \(p\), two attributes are associated with \(p\)’s trace: \(p\)’s impact (medium, low, or high), and the product affected by \(p\).

We enriched each trace \(\tau \) with further context-oriented attributes: (i) firstOrg, indicating the team associated with \(\tau \)’s first event; (ii) workload, quantifying the number of problems open at the time, say \(t_\tau \), when \(\tau \) started; and (iii) several time dimensions (namely, week-day, month and year) derived from \(t_\tau \).

In all the tests discussed next, we ran \(\mathtt {HLPM} \mathtt {-mine} \) with \(minCov = 0.01\) and \(\gamma = 0.1\), while using the FHM plugin [14] to instantiate function mineWF.

Table 1. Results (avg\(\pm \)stdDev) obtained with the event clustering method (no trace clustering) and with manual abstraction criteria. The best value of each column is in bold.
Table 2. Results (avg\(\pm \)stdDev) yielded by different trace clustering methods, after abstracting events with the clusters found by \(\mathtt {HLPM} \mathtt {-mine} \) (\(m=1\)). Best scores are in bold.

Evaluation Metrics. Different kinds of metrics (fitness, precision, generalization) exist for evaluating the quality of a workflow schema. In particular, fitness metrics quantify the capability to replay a given log, and represent the main evaluation criterion [7], while the other metrics serve finer-grain comparisons.

In our tests, we measured the fitness of each discovered heuristics net according to the Improved Continuous Semantics Fitness (named Fitness hereinafter) defined in [12]. Basically, the fitness of a schema \(W\) w.r.t. a log \(L\) (denoted by Fitness(W,L)) is the fraction of \(L\)’s events that \(W\) can parse exactly, with a punishment factor that favors schemas yielding fewer replay errors in fewer traces.

Moreover, the behavioral precision of schema \(W\) w.r.t. log \(L\), denoted by \(BehPrec(W,L)\), was simply measured as the average fraction of activities that are not enabled by a replay of \(L\) in \(W\).

As to schema’s complexity, we considered: the numbers of nodes (#nodes) and of edges (#edges), and the average number of edges per node (#edgePerNode).

Test Bed. As no existing approach combines automated event abstraction and trace clustering, we tested these two facets incrementally, for the sake of comparison.

Fig. 3. Some event clusters (left) and trace clusters (right) found by \(\mathtt {HLPM} \mathtt {-mine} \).

First, we assessed the ability of our event abstraction method to help discover higher-quality workflow schemas. To this end, we evaluated the workflow schemas extracted by algorithm FHM from an abstract version of the log, obtained by replacing the original events with the event clusters found by \(\mathtt {HLPM} \mathtt {-mine} \), run without any trace clustering (i.e. by setting \(m=1\)). As a term of comparison, we considered three “manual” event abstraction criteria: replacing each event with the associated value of (i) status or (ii) substatus, and (iii) using each distinct event 8-tuple as an activity type (all).

As to trace clustering, we considered four competitors: algorithm Actitrac [8]; the sequence-based and alphabet-based versions of the approach in [4] (TRMR and A-TRMR, resp.); and the approach in [5] (KGRAM), which exploits a \(k\)-gram representation. Since all of these competitors lack mechanisms for abstracting log events and for selecting the number of clusters, we provided each of them with the abstraction function and the number of trace clusters found by our approach.

Quantitative Results. Table 1 reports the quality scores obtained by our event clustering approach (without trace clustering), here viewed as an enhanced data-driven event-abstraction criterion, compared with the “manual” basic ones described above. To this end, we first partitioned all log events via Steps 1–3 of algorithm \(\mathtt {HLPM} \mathtt {-mine} \), and then replaced each event with the label of its respective cluster. All measures were computed by averaging the results of 10 trials, each performed in 10-fold cross-validation.

Interestingly, our event abstraction method achieves the best fitness score (0.815) and a satisfactory precision score, at the cost of a small increase in structural complexity (8 more nodes and 14 more edges) w.r.t. the substatus abstraction, which is the most effective among all 1-attribute abstractions. Notably, only the dummy abstraction all obtains a higher precision, but it returns overly complex and overfitting workflows.

The benefit of clustering log traces is made clear by Table 2, which reports the quality results obtained by our approach and by the competitors. More precisely, in each clustering test, we computed an overall Fitness (resp., BehPrec) measure for each method, as the weighted average (with cluster sizes as weights) of the Fitness (resp., BehPrec) scores received by the workflow schemas induced (with FHM) from all the trace clusters discovered in the test. Since in all cross-validation trials \(\mathtt {HLPM} \mathtt {-mine} \) (run with \(m=\infty \)) yielded 8 trace clusters, the same number of clusters was given as input to the competitors.

Fig. 4. Workflow schemas induced from trace clusters \(t_7\) (top) and \(t_8\) (bottom).

Notably, \(\mathtt {HLPM} \mathtt {-mine} \) managed to neatly improve the average fitness (0.851) w.r.t. the case where no trace clustering was performed (0.815), and, somewhat surprisingly, outperformed all competitors on this fundamental metric, even though it can only split the log by way of boolean formulas over trace properties. \(\mathtt {HLPM} \mathtt {-mine} \) also achieved good precision (0.742) compared to its competitors, and found quite readable workflow schemas, exhibiting the lowest average numbers of nodes (10), edges (14) and edges per node (2.6).

Qualitative Results. In order to give a more concrete idea of the models that our approach can extract, we ran algorithm \(\mathtt {HLPM} \mathtt {-mine} \) on the entire log (without cross-validation), while still setting \(m=\infty \). As a result, 8 trace clusters were found and equipped with separate workflow schemas, each describing a distinct problem-management scenario. For instance, Fig. 4 shows the models of two trace clusters; notice that each node is labelled with the identifier of an event cluster, whose clustering rule can be found in Fig. 3 (left).

Besides being more compact than the models discovered without trace clustering (see the last row in Table 1), these schemas show that problem cases follow different execution scenarios. Specifically, the schema of cluster \(t_8\) captures a particular scenario where two activities (i.e. event clusters \(e_4\) and \(e_5\)) are executed in sequence. Moreover, the trace clustering rules on the right of Fig. 3 show that these scenarios differ both in the values of context factors and in the occurrence of certain activity types (i.e. event clusters).

6 Discussion and Future Work

We have presented a clustering-based process discovery method for low-level multi-dimensional logs, where event and trace clusters are used to capture different activity types and process variants, respectively. Tests on a real-life log showed that the approach is able to find high-quality readable workflow models.

Novelty Points. Several features distinguish our proposal from current process mining solutions, in addition to the very idea of combining activity abstraction and trace clustering, which has only been explored in [4] so far. First of all, each event/trace clustering function discovered by our approach is natively encoded in terms of logical rules, which the analyst can easily interpret to distill a semantic view of process behavior and of its dependence on relevant properties of process events/cases (and on other context-related factors). Moreover, we removed the common assumption that each log event explicitly refers (or can easily be mapped) to some predefined process activity. In fact, even most current activity abstraction methods [6, 10] rely on the presence of activity labels (possibly defined at a high level of granularity) associated with log events when trying to aggregate them into higher-level activities (or sub-processes). We also point out that no high-level activity types are assumed to be known in advance, unlike in [2], where a method was proposed to map log events to a-priori given activity types. The problem faced here is, in fact, more challenging, in that event types have to be learnt inductively from scratch.

Owing to the predictive nature of our models, they can help implement advanced run-time services, besides supporting descriptive analyses. For example, one can dynamically assign any new process case, say \(c\), to one of the discovered process variants (i.e., trace clusters). The workflow schema associated with that cluster can then be presented as a customized (context-adaptive) process map, showing how \(c\) could proceed. Moreover, by continuously evaluating the degree of compliance between \(c\) and its reference schema, deviations can be detected.

Future Work. Implementing the advanced run-time services sketched above (useful in flexible BPM settings) is left for future work. Inspired by the proposal in [11], we will also try to extend our approach to deal with interleaved logs, which mix traces of different processes without keeping information on which process generated each trace. This would let us relax another common assumption of most process mining approaches, orthogonal to the one addressed in our work (i.e., the existence of a-priori knowledge on the mapping of log events to process activities). Finally, we plan to test our approach on a wider range of logs.