1 Introduction

Technologies for business process management have matured significantly since the early proposals of office automation systems and business process definition languages in the late 1970s [30, 84, 89]. Today, BPMN [29, 62, 88] has become a stable, de-facto standard notation for describing business processes. Users can choose from a number of commercial design tools and business process management systems, supporting the design and enactment of business processes. In recent years, we have even seen commercial process mining tools [44] that support the automated discovery of BPMN models from event logs [83, 85].

With the increased need to accommodate flexible, knowledge-intensive processes, notations focusing on essential rules rather than detailed procedures have seen increased attention from researchers [28, 55, 65, 70, 75]. This approach is often characterized as declarative and juxtaposed with imperative notations like BPMN [39, 52, 66].

Highly regulated workflows, for example governmental case work processes, are particularly challenging examples, since constantly changing legislation gives rise to changes in rules, and often an increase in complexity [24, 36]. Declarative notations are, by design, well-suited to translation from natural language rules while avoiding over-specification, making them suited to capturing regulatory constraints in workflows requiring some degree of flexibility.

As part of a broader national digitalization initiative, the challenge of modeling knowledge-intensive workflows has been tackled in several collaborative projects involving Danish universities, government institutions and firms in the private sector. One such project is the EcoKnow project, which builds upon the DCR Graphs formalism [40, 55, 75].

To support the local development and maintenance of the declarative DCR models, several modeling tools have been developed [16, 20, 52], supported by formal understandability studies [1,2,3]. Along with the tools, a methodology for modeling with DCR has been developed, advocating an iterative and incremental, scenario-driven approach with three main tasks. First, to identify key activities and roles. Second, to perform simulations of wanted and unwanted scenarios. Finally, the modeler may either go back to add missing activities and roles or move forward to the task of identifying rules that support the wanted scenarios and forbid the unwanted scenarios.

The iterative approach lends itself extremely well to being supported by process discovery: after the users define wanted and unwanted scenarios, discovery algorithms can be used to automatically make suggestions for which rules should be added. Such a discovery algorithm needs to be both efficient and accurate. On the one hand, users expect their modeling experience to be continuous, without long interruptions waiting for a discovery algorithm to compute possible rules. On the other hand, they are only helped by rule suggestions that are relevant and correct in terms of the suggested scenarios: poor suggestions will only confuse the users and reduce the quality of their modeling experience.

Recently, an efficient and accurate discovery algorithm was developed for DCR Graphs and implemented in a commercial design tool [59]. One advantage of the algorithm is that it can provide accurate suggestions even with small training sets, facilitating rule discovery from large historical event logs as well as fast recommendations based on few simulated scenarios carried out as part of the scenario-driven modeling approach.

This paper is part of a special issue of the journal in connection with the Process Discovery Contest 2019, which frames process discovery as a binary classification task. The DisCoveR algorithm secured second place in that year’s contest in terms of classification accuracy. The algorithm itself was first introduced by Nekrasaite et al. in [59], and the current paper expands on this initial introduction with: (1) a complete and thorough formalization of the algorithm that provides all details required for its implementation (Sect. 4); (2) a novel, open source and more efficient implementation based on bit vector operations (Sect. 5); (3) an evaluation of the algorithm against flagship academic miners based on the classification task provided by the Process Discovery Contest 2019 and a run time comparison suggesting the miner is competitive with its peers, along with a framing of process discovery in terms of computational learning theory which helps explain the key to its effectiveness in terms of regularization (Sect. 6); and (4) a case study showing how the algorithm has been swiftly transferred to industry through its integration in the dcrgraphs.net process modeling portal, leading to an enhanced modeling experience for its users (Sect. 7).

After surveying related work in Sect. 2 and introducing preliminaries in Sect. 3, we proceed as sketched above, concluding and proposing future directions of research in Sect. 8.

2 Related work

Many declarative process notations have been developed, several with corresponding discovery algorithms [76]. One of the first of these was Declare [66, 82, 86], which was inspired by property specification patterns for linear temporal logic (LTL) [31]. Declare identified a particular set of patterns relevant for business processes and gave them semantics through a mapping to LTL formulae relevant for describing the rules governing a business process. A Declare model is therefore a collection of such patterns, and the semantics of a model is defined as the traces that satisfy the conjunction of the formulae underlying the patterns. More recently, the same patterns have been formalized using colored automata [48], SCIFF [53, 54], and regular expressions [92]. Extensions to Declare include timed [90] and data [18] constraints, which were combined in MP-Declare [10] (Multi-Perspective Declare), and hierarchy [95]. The first miner for Declare was the Declare Maps Miner [49]. While it initially used a brute-force approach, it was extended with several improvements [46] inspired by the Apriori algorithm for association rule mining [6]. More recently, the miner was extended to allow for parallelization [47]. The second Declare miner to be developed was Minerful [14], which provided significant gains in efficiency. Since its introduction, it has been extended with support for target-branched constraints [27], removal of redundancies and inconsistencies [12] and removal of vacuously satisfied constraints [13].

Another prominent declarative approach is the Guard-Stage-Milestone (GSM) notation [41], inspired by earlier work on artifact-centric business processes [9]. GSM aims to effectively model case management and has been a primary contributor to the development of the Case Management Model And Notation (CMMN) [61]. CMMN has seen a relatively fast industrial and academic adoption through the development of tools and case studies [35, 43, 93]. Work on process discovery for GSM or CMMN on the other hand is still rather sparse, and only one discovery algorithm has been proposed to date [67] with no working implementation.

Process discovery has also been considered for the Declarative Process Intermediate Language (DPIL) [72, 94], which is a textual, multi-perspective, declarative modeling language. Process discovery for DPIL is supported through the DPIL Miner. In comparison with other declarative miners, which tend to focus on the control-flow perspective of processes, the DPIL Miner instead focuses more on mining the organizational perspective [71]. Interestingly, the miner has never been made publicly available and its effectiveness or accuracy cannot be independently ascertained.

In more recent work, it has been proposed to combine declarative and imperative discovery to produce the so-called hybrid [7, 21, 69, 79] or mixed [17, 19, 91] models that combine both paradigms. Hybrid miners include the Fusion miner [80], which produces an inter-mixed Petri net and Declare model, the Hybrid Miner [50] which produces a hierarchical Petri net and Declare model, and the Precision Optimization Hybrid Miner [73] which produces a process tree in which some nodes may be Declare models.

Approaches to workflow formalization based on Classical Linear Logic, a resource-aware logic, were implemented in WorkFlowFM [63, 64]  which guarantees optimally concurrent, correct-by-construction processes. The framework was applied to intra-hospital patient transfers in [51].

Temporal logics have also been used to model phenomena which would not be considered workflows, such as robot motion [32], naval traffic, and train network monitoring [42].

Finally, DCR Graphs were inspired by event structures [60] and developed after Declare was shown to not be sufficiently expressive in modeling industrial cases [57]. In contrast to Declare, the semantics of DCR Graphs are defined as transformations on the markings of the events. This allows modelers to straightforwardly reason about the execution semantics of a model by simulating it and observing the changes to the markings as events are executed [52]. Since their inception, DCR Graphs have been extended with nesting [37], time [38], data [16, 56, 78], and hierarchy [22].

Regarding evaluation, process mining has traditionally been framed as an inherently descriptive rather than predictive data mining problem, which precludes the use of standard evaluation metrics familiar in classification and regression tasks. This is largely due to the assumption that an event log represents only positive examples [33]. Some authors have addressed this by developing techniques to generate artificial negative examples [34].

3 Preliminaries

We present here the definitions of processes and event logs, necessary to give a formal presentation of the task of process discovery in terms of computational learning theory, as well as the DCR Graphs formalism.

Definition 1

(Processes and Event Logs)

  • An alphabet \(\Sigma \) is a finite set of symbols denoting activities. We denote by \(\Sigma _L\) activities present in log L.

  • \(\Sigma ^*\) and \(\Sigma ^\omega \) denote countably infinite sets of finite, respectively, infinite, sequences over \(\Sigma \).

  • A process is a pair \((P, {\mathbb {P}}_P)\) where P is a set of allowable sequences of activities along with an associated probability distribution \({\mathbb {P}}_P\) over P. The probabilistic framing is required for consistency with the statistical metrics (e.g., accuracy) used for evaluation in Sect. 6.

  • An event, denoted \(\varsigma \), is a particular occurrence of an activity.

  • A trace \(\sigma \in \Sigma ^*\cup \Sigma ^\omega = \langle \varsigma _1,\ldots , \varsigma _i, \ldots \rangle \) represents a sequence of activities, with \(i \in {\mathbb {N}}\). A trace can be seen as a partial mapping:

    $$\begin{aligned} \sigma (i): {\mathbb {N}} \hookrightarrow \Sigma \end{aligned}$$
  • A process model h defines a semantics such that the language \(\ell \) of h denotes the set of traces accepted by h. That is,

    $$\begin{aligned} \ell (h) \subseteq \Sigma ^* \cup \Sigma ^\omega \end{aligned}$$

    and for some process \((P, {\mathbb {P}}_P)\) we have \(P = \ell (h)\) if h is a perfect model of the process. Note that h may be agnostic regarding \({\mathbb {P}}_P\).

  • Finally, a log L is a multiset representing the number of occurrences of different traces:

    $$\begin{aligned} L = \left\{ \sigma _1^{m(\sigma _1)}, \ldots , \sigma _n^{m(\sigma _n)} \right\} \end{aligned}$$

    where \(m(\sigma _k) \in {\mathbb {N}}\) denotes the multiplicity of \(\sigma _k\). As L is essentially a sample from \((P, {\mathbb {P}}_P)\), it is necessary to consider trace multiplicities rather than collapsing the log to a set.
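To make the multiset view of a log concrete, the following is a minimal sketch, not taken from the DisCoveR source. The class `EventLog` and its method names are hypothetical, and activities are assumed to be represented as strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an event log stored as a multiset of traces.
final class EventLog {
    // Maps each distinct trace sigma to its multiplicity m(sigma).
    private final Map<List<String>, Integer> multiplicity = new LinkedHashMap<>();

    // Records one more observation of the given trace.
    void add(List<String> trace) {
        multiplicity.merge(List.copyOf(trace), 1, Integer::sum);
    }

    // Returns m(sigma), or 0 if the trace was never observed.
    int multiplicityOf(List<String> trace) {
        return multiplicity.getOrDefault(trace, 0);
    }

    // Total number of observed traces, counting duplicates.
    int size() {
        return multiplicity.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Number of distinct traces (the support of the multiset).
    int distinctTraces() {
        return multiplicity.size();
    }
}
```

Keeping the multiplicities, rather than collapsing to a set of traces, preserves the empirical frequencies on which statistical metrics such as accuracy depend.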

Note the assumption of strict monotonicity implied by this definition of traces. That is, for all \(i,j \in {\mathbb {N}}\) we have that

$$\begin{aligned} i < j \implies \sigma (i) \prec \sigma (j) \end{aligned}$$

where \(\prec \) denotes “precedes,” and also that

$$\begin{aligned} i = j \implies \sigma (i) = \sigma (j). \end{aligned}$$

The definition of a trace as a function mapping from a timestamp domain to the codomain of individual activities implies that no two events can share the exact same timestamp (otherwise, \(\sigma \) would not be a function). We note this, in part, due to the observation that shared timestamps are not uncommon in real data sets. Nonetheless, the present formalization of traces is widely accepted and sufficient for the study at hand.

Definition 2

(Process Discovery) Process discovery refers to a procedure that derives a process model from an event log. Let \({\mathcal {L}}\) denote the set of all valid event logs and \({\mathcal {H}}_F\) the set of process models encodable by some process modeling formalism F. A process discovery algorithm \(\gamma \) is a mapping from logs to models:

$$\begin{aligned} \gamma : {\mathcal {L}} \rightarrow {\mathcal {H}}_F \end{aligned}$$

Examples of F include Petri nets, sound Petri nets, WorkFlow nets, R/I-nets, Declare maps, and of course DCR Graphs. In other words, \({\mathcal {H}}_F\) is our hypothesis space to which our learning algorithm is restricted.

By extension, we can view the overall task as a mapping from a log to a language, i.e., a subset of all possible traces:

$$\begin{aligned} \ell (\gamma ): {\mathcal {L}} \rightarrow 2^{\Sigma ^*\cup \Sigma ^\omega } \end{aligned}$$

where \(2^{\mathcal {X}}\) denotes the powerset of set \({\mathcal {X}}\). To see this, consider that for some \(L \in {\mathcal {L}}\), we have \(\gamma (L) = h\) and \(\ell (h) \in 2^{\Sigma ^*\cup \Sigma ^\omega }\). That is, \(\ell (\gamma (L)) \subseteq \Sigma ^*\cup \Sigma ^\omega \). This view of process discovery will lead naturally to the classification task and reduce the choice of modeling formalism F to an intermediate step w.r.t. classification.
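The view of discovery as a mapping from a log to a language can be sketched with a deliberately trivial choice of \(\gamma \). The example below is an illustration only and is unrelated to DisCoveR: the hypothetical `FlowerMiner` returns the membership test of a "flower" model, which accepts every finite trace over the activities seen in the log. The log is assumed to be given as a list of traces, since multiplicities play no role for this particular miner.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration: gamma maps a log to a model, and the language of
// that model acts as a binary classifier over traces.
final class FlowerMiner {
    // Returns the membership test of l(gamma(L)) for the trivial "flower" model
    // that accepts every finite trace over the alphabet of the log.
    static Predicate<List<String>> discover(List<List<String>> log) {
        Set<String> alphabet = new HashSet<>();
        for (List<String> trace : log) {
            alphabet.addAll(trace);
        }
        return trace -> alphabet.containsAll(trace);
    }
}
```

Any miner, whatever its formalism F, can be wrapped in the same signature, which is what allows different miners to be compared on a classification task.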

Definition 3

(DCR Graphs) DCR Graphs consist of a set of events with three associated unary predicates: executed, pending, and included which together constitute the marking (i.e., state) of a DCR Graph. Moreover, four binary relations are defined between events. In order to be executed, an event must be included and satisfy any relevant relations.

Formally, a dynamic condition response graph is a tuple

$$\begin{aligned} g = ( {\mathcal {E}}, m, A, \bullet \rightarrow , \rightarrow \bullet , \rightarrow +, \rightarrow \%, l) \end{aligned}$$

where

  • \({\mathcal {E}}\) is a set of “events” (analogous to transitions in a Petri net, and not to be confused with events in a trace, see l).

  • \(m \in 2^{\mathcal {E}} \times 2^{\mathcal {E}} \times 2^{\mathcal {E}}\) is the marking

  • A is the set of activities.

  • \(\rightarrow \bullet \in {\mathcal {E}} \times {\mathcal {E}}\) is the set of condition relations.

  • \(\bullet \rightarrow \in {\mathcal {E}} \times {\mathcal {E}}\) is the set of response relations.

  • \(\rightarrow + \in {\mathcal {E}} \times {\mathcal {E}}\) is the set of includes relations.

  • \(\rightarrow \% \in {\mathcal {E}} \times {\mathcal {E}}\) is the set of excludes relations.

  • \(\rightarrow + ~\bigcap \rightarrow \% = \emptyset \).

  • \(l: {\mathcal {E}} \rightarrow A\) is a labeling function mapping every “event” to an activity.

Table 1 Relevant constraint templates from Declare

A DCR Graph marking \(m = ({\mathsf {E}}{\mathsf {x}}, {\mathsf {P}}{\mathsf {e}}, {\mathsf {I}}{\mathsf {n}})\) represents events which have previously been executed, pending events to be executed or excluded, and events currently included. For finite traces, a DCR Graph is defined to be accepting when \({\mathsf {P}}{\mathsf {e}} \cap {\mathsf {I}}{\mathsf {n}} = \emptyset \), i.e., no pending events are currently included. For infinite traces, accepting states are defined in the limit as with Büchi automata, to which DCR graphs can be translated [58].

The execution semantics of DCR Graphs requires that for an event e to be executed, it must fulfill the following criteria:

  • e must be included, i.e., \(e \in {\mathsf {I}}{\mathsf {n}}\)

  • If any condition relations exist s.t. \(e' \rightarrow \bullet e\), then all such \(e'\) must have been executed, or excluded, i.e., \(e' \in {\mathsf {E}}{\mathsf {x}}\) or \(e' \notin {\mathsf {I}}{\mathsf {n}}\). In this way, conditions can be nullified by excluding the source event. The latter is the “dynamic” aspect of DCR Graphs.

Furthermore, if e is executed, the marking m will change as follows:

  • If any response relations exist s.t. \(e \bullet \rightarrow e'\), then all such \(e'\) will become pending, i.e., \(e' \in {\mathsf {P}}{\mathsf {e}}\).

  • If any excludes relations exist s.t. \(e \rightarrow \% e'\), then any included \(e'\) will become excluded, i.e., \(e' \notin {\mathsf {I}}{\mathsf {n}}\).

  • If any includes relations exist s.t. \(e \rightarrow + e'\), then any excluded \(e'\) will become included, i.e., \(e' \in {\mathsf {I}}{\mathsf {n}}\).

An important point to note regards the labeling function l, which may map more than one event to the same activity (analogous to Petri nets with duplicate transitions). This can potentially result in a non-deterministic model. In the algorithm presented here, only bijective labeling functions are considered, so each event is mapped to exactly one activity and vice versa.

Example

Consider a DCR Graph consisting of 4 events with a one-to-one mapping to activities: a,b,c,d:

  • Initial marking:

    • Executed: \(\emptyset \)

    • Pending:  a

    • Included: a,c,d

  • Relations

    • \(a \rightarrow \bullet b\)

    • \(a \bullet \rightarrow b\)

    • \(b \rightarrow \bullet a\)

    • \(b \bullet \rightarrow a\)

    • \(c \rightarrow + b\)

    • \(d \rightarrow \% b\)

    • \(d \rightarrow \% d\)

Accepting run 1: \(\langle a \rangle \). The model begins in a non-accepting state since a is both pending and included. Since b is not included, \(b \rightarrow \bullet a\) does not come into effect. After a is executed, b becomes pending, but since it is not included, the model is in an accepting state.

Accepting run 2: \(\langle a, c, b, d, a \rangle \). After a is executed, b becomes pending. Executing c causes b to be included as well. Now, the model is in a non-accepting state. Executing b causes a to become pending. Executing d excludes b and d itself. Finally, a, which is still pending and included is executed, which causes b to become pending, but since it is not included, the model is in an accepting state.


Non-accepting run: \(\langle c, d, c \rangle \). When c is executed, b becomes included, but cannot be executed due to the condition relation \(a \rightarrow \bullet b\). Likewise, a is unable to execute due to \(b \rightarrow \bullet a\). Executing d excludes b and d itself, releasing a from \(b \rightarrow \bullet a\). At this point, executing a will lead to an accepting state. However, if instead c is executed again, b is included again and \(b \rightarrow \bullet a\) comes into effect and now d cannot be executed to exclude b. The graph is now locked in a permanently non-accepting state as neither a, nor b can ever be executed because of their mutual conditions, yet a remains forever included and pending.
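The three runs above can be checked mechanically. The following is a minimal set-based sketch of the execution semantics, hard-coded to the example graph. The class `DcrExample` and its member names are hypothetical. The order in which exclusions and inclusions are applied (exclusions first) is an assumption that does not affect this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical set-based sketch of the example DCR Graph (events a, b, c, d).
final class DcrExample {
    // Initial marking of the example: nothing executed, a pending, b excluded.
    final Set<String> executed = new HashSet<>();
    final Set<String> pending = new HashSet<>(Set.of("a"));
    final Set<String> included = new HashSet<>(Set.of("a", "c", "d"));

    // Conditions are keyed by target event; all other relations by source event.
    static final Map<String, Set<String>> CONDITIONS_FOR = Map.of("b", Set.of("a"), "a", Set.of("b"));
    static final Map<String, Set<String>> RESPONSES_TO = Map.of("a", Set.of("b"), "b", Set.of("a"));
    static final Map<String, Set<String>> INCLUDES_TO = Map.of("c", Set.of("b"));
    static final Map<String, Set<String>> EXCLUDES_TO = Map.of("d", Set.of("b", "d"));

    // An event is enabled if it is included and every condition source is
    // either already executed or currently excluded.
    boolean enabled(String e) {
        if (!included.contains(e)) return false;
        for (String source : CONDITIONS_FOR.getOrDefault(e, Set.of())) {
            if (included.contains(source) && !executed.contains(source)) return false;
        }
        return true;
    }

    // Updates the marking as described by the execution semantics.
    void execute(String e) {
        executed.add(e);
        pending.remove(e);
        pending.addAll(RESPONSES_TO.getOrDefault(e, Set.of()));
        included.removeAll(EXCLUDES_TO.getOrDefault(e, Set.of()));
        included.addAll(INCLUDES_TO.getOrDefault(e, Set.of()));
    }

    // Accepting for finite traces: no pending event is currently included.
    boolean accepting() {
        for (String e : pending) {
            if (included.contains(e)) return false;
        }
        return true;
    }

    // Replays a trace from the initial marking; true iff every event was
    // enabled when executed and the final marking is accepting.
    static boolean accepts(List<String> trace) {
        DcrExample g = new DcrExample();
        for (String e : trace) {
            if (!g.enabled(e)) return false;
            g.execute(e);
        }
        return g.accepting();
    }
}
```

Calling `DcrExample.accepts` on the two accepting runs yields true, and on \(\langle c, d, c \rangle \) it yields false, because a is still pending and included at the end of the replay.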

Table 2 Formal definitions of helper functions which return sets of relevant relations

4 Algorithm

In this section, we formally describe the ParNek algorithm underlying DisCoveR. Note the distinction we draw between the fundamental algorithm, ParNek, and the specific implementation, DisCoveR, presented in Sect. 5. This distinction is also reflected in the formal, functional description in this section which remains agnostic to concrete implementation details, e.g., for extracting the sets of relations defined in Table 2.

The algorithm always produces perfectly fitting models, i.e., all traces in the log will be replayable on the generated model. The algorithm proceeds in the following steps:

  1. A set of candidates for four relation patterns is constructed.

  2. Additional excludes relations are added based on predecessor and successor relations.

  3. Additional includes/excludes patterns are added analogous to NotChainSuccession relations.

  4. Redundant excludes relations are removed.

  5. Redundant condition and response relations are removed via transitive reduction.

  6. Additional condition relations are discovered using a limited replay strategy.

  7. A final transitive reduction is performed for condition relations.

We will refer to seven relation templates from the LTL-based modeling language Declare. The relations are described in words in Table 1 with analogous DCR relations. These particular Declare constraints have been selected based on their ability to be mapped to DCR Graph relations that can be composed orthogonally (thereby ensuring the perfect fitness requirement of the miner), the possibility to detect them in linear time and extensive experimentation to determine which combination of constraints yielded the best balance between precision and simplicity on real-life logs. Formal specifications of functions for identifying relations satisfied by the log are given in Table 2 (again, these are only specifications, not implementations). In the description that follows, we refer to lines in the high-level control flow pseudocode in Algorithm 1.

The first step of the ParNek algorithm is the initialization of a DCR Graph, after which we begin adding relations using a number of strategies.

Initialization (lines: 2–5) We begin by defining a set of events

$$\begin{aligned} E \equiv \{1,\ldots , |\Sigma _L|\} \end{aligned}$$

containing the same number of events as distinct activities present in the log, the latter defining our set of activities

$$\begin{aligned} A \equiv \Sigma _L. \end{aligned}$$

The labeling function

$$\begin{aligned} l : E \rightarrow \Sigma _L;~i \mapsto s_i \end{aligned}$$

is a bijective mapping between events and activities. For all intents and purposes, events and activities are therefore equivalent. Finally, we assign an initial marking

$$\begin{aligned} m \equiv (\emptyset , \emptyset , E) \end{aligned}$$

in which all events are included, none are pending, and none are executed. This marking does not change and is returned in the final graph.

Self-Exclusions – AtMostOne (line: 11): We begin with activities for which the log satisfies the AtMostOne relation. Any activity s satisfying this unary relation is mapped onto the binary self-exclusion relation \(s \rightarrow \% s\).

Responses – Response (line: 13): All pairs of distinct activities s and t for which the log satisfies the Response relation are mapped directly onto the response relation \(s \bullet \rightarrow t\).

Conditions – Precedence (line: 12): All pairs of distinct activities s and t for which the log satisfies the Precedence relation are mapped directly onto the condition relation \(s \rightarrow \bullet ~t\). While this forms the basis of the condition relation, more will be added in lines 32–34.
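The three templates used so far can be checked against a log directly from their definitions. The sketch below assumes the usual Declare readings of AtMostOne, Response and Precedence and represents a log as a list of traces over strings. The class `DeclareChecks` is hypothetical, and the code is a specification-level check that re-parses the log for every query, which is not how the DisCoveR implementation in Sect. 5 proceeds.

```java
import java.util.List;

// Hypothetical specification-level checks for three Declare templates.
final class DeclareChecks {
    // AtMostOne(s): s occurs at most once in every trace.
    static boolean atMostOne(List<List<String>> log, String s) {
        for (List<String> trace : log) {
            int count = 0;
            for (String a : trace) {
                if (a.equals(s) && ++count > 1) return false;
            }
        }
        return true;
    }

    // Response(s, t): every occurrence of s is eventually followed by t.
    // Assumes s and t are distinct.
    static boolean response(List<List<String>> log, String s, String t) {
        for (List<String> trace : log) {
            boolean owed = false; // true while some s still awaits a later t
            for (String a : trace) {
                if (a.equals(s)) owed = true;
                else if (a.equals(t)) owed = false;
            }
            if (owed) return false;
        }
        return true;
    }

    // Precedence(s, t): t occurs only after s has occurred at least once.
    // Assumes s and t are distinct.
    static boolean precedence(List<List<String>> log, String s, String t) {
        for (List<String> trace : log) {
            boolean seenS = false;
            for (String a : trace) {
                if (a.equals(s)) seenS = true;
                else if (a.equals(t) && !seenS) return false;
            }
        }
        return true;
    }
}
```

An activity s passing `atMostOne` would yield \(s \rightarrow \% s\), a pair passing `response` would yield \(s \bullet \rightarrow t\), and a pair passing `precedence` would yield \(s \rightarrow \bullet t\).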

Includes/Excludes – ChainPrecedence (lines: 14–15): The first step in populating \(\rightarrow +\) and adding further self-exclusions to \(\rightarrow \%\) is based on identifying ChainPrecedence relations. However, encoding ChainPrecedence in DCR Graphs is less straightforward than encoding AlternatePrecedence, which is (nearly) captured by an include and a self-exclude. Since AlternatePrecedence subsumes ChainPrecedence, it is safe to check for evidence of the more restricted ChainPrecedence, yet add AlternatePrecedence to the model.

Excludes—Predecessor/Successor (lines: 17–21): Further excludes relations are found by defining two relations:

$$\begin{aligned} Predecessor(L) \text { and } Successor(L) \end{aligned}$$

which return the sets of all possible predecessors and successors of an activity, respectively. Note that these relations are, in fact, each other’s dual:

$$\begin{aligned} (a,b) \in Predecessor(L) \implies (b,a) \in Successor(L) \end{aligned}$$

Nonetheless, to maintain consistency with the implementation described in Sect. 5, we distinguish between the two.

Based on the observation that a log in which activities s and t never co-occur in the same trace satisfies the NotCoExistence(s, t) relation, we add \(s \rightarrow \% t\) and \(t \rightarrow \% s\) (lines: 17–18). However, due to the subsequent removal of redundant exclusions (lines: 26–27), the NotCoExistence relation cannot be guaranteed to hold since one or both of the exclusions may be removed.

Furthermore, if s is observed to precede, but never succeed t, and if no self-exclusion \(s \rightarrow \% s\) has been found, we add \(t \rightarrow \% s\) (lines: 19–21).

In order to restrain model complexity, only one exclusion relation is included for each target activity by means of the ChooseOneRelation function. At present, this function is implemented in a naive (but fast and deterministic), first-come manner, with a more sophisticated approach being left for future work.
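The predecessor and successor relations and the two tests built on them can be sketched as follows. This is an illustration under stated assumptions: the class `PredSucc` and its method names are hypothetical, a log is a list of traces over strings, and the ChooseOneRelation step and the check for an existing self-exclusion are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Predecessor and Successor relations of a log.
final class PredSucc {
    // predecessors.get(t): activities seen before t in at least one trace.
    final Map<String, Set<String>> predecessors = new HashMap<>();
    // successors.get(s): activities seen after s in at least one trace.
    final Map<String, Set<String>> successors = new HashMap<>();

    PredSucc(List<List<String>> log) {
        for (List<String> trace : log) {
            Set<String> seen = new LinkedHashSet<>(); // activities earlier in this trace
            for (String a : trace) {
                predecessors.computeIfAbsent(a, k -> new HashSet<>()).addAll(seen);
                successors.computeIfAbsent(a, k -> new HashSet<>());
                for (String p : seen) {
                    successors.computeIfAbsent(p, k -> new HashSet<>()).add(a);
                }
                seen.add(a);
            }
        }
    }

    // True if distinct activities s and t never occur in the same trace,
    // i.e., the log satisfies the NotCoExistence pattern for them.
    boolean neverCoOccur(String s, String t) {
        return !predecessors.getOrDefault(s, Set.of()).contains(t)
            && !successors.getOrDefault(s, Set.of()).contains(t);
    }

    // True if s is observed before t in some trace but never after t.
    boolean precedesButNeverSucceeds(String s, String t) {
        return predecessors.getOrDefault(t, Set.of()).contains(s)
            && !successors.getOrDefault(t, Set.of()).contains(s);
    }
}
```

A pair passing `neverCoOccur` is a candidate for the mutual exclusions \(s \rightarrow \% t\) and \(t \rightarrow \% s\), while a pair passing `precedesButNeverSucceeds` is a candidate for \(t \rightarrow \% s\).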

Includes and Excludes – NotChainSuccession (lines: 23–24): To identify further includes and excludes relations, we rely on NotChainSuccession(L) as well as Between(L), which simply identifies activities occurring between two other activities in a log.

Put simply, if we never observe s followed immediately by t, we add an exclusion \(s \rightarrow \% t\) (NotChainSuccession). If, however, t occurs after s, with some sequence of intermediate activities s.t. we have \(\langle \ldots , s, u_1, \ldots , u_n, t, \ldots \rangle \), then we allow all intermediate events to re-include t. That is, for all \(1 \le i \le n\), we add \(u_i \rightarrow + t\).
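Both helper relations can be sketched directly over a log given as a list of traces. The class `ChainSuccession` is hypothetical. The text does not fix which occurrences of s and t delimit the intermediate activities, so the reading used for `between` (each occurrence of s paired with the next occurrence of t) is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of NotChainSuccession and Between.
final class ChainSuccession {
    // True if t is never observed immediately after s in any trace.
    static boolean notChainSuccession(List<List<String>> log, String s, String t) {
        for (List<String> trace : log) {
            for (int i = 0; i + 1 < trace.size(); i++) {
                if (trace.get(i).equals(s) && trace.get(i + 1).equals(t)) return false;
            }
        }
        return true;
    }

    // Activities observed strictly between an occurrence of s and the next
    // occurrence of t (one possible reading of Between; an assumption).
    static Set<String> between(List<List<String>> log, String s, String t) {
        Set<String> result = new HashSet<>();
        for (List<String> trace : log) {
            for (int i = 0; i < trace.size(); i++) {
                if (!trace.get(i).equals(s)) continue;
                for (int j = i + 1; j < trace.size(); j++) {
                    if (trace.get(j).equals(t)) {
                        result.addAll(trace.subList(i + 1, j));
                        break;
                    }
                }
            }
        }
        return result;
    }
}
```

If `notChainSuccession(log, s, t)` holds, the exclusion \(s \rightarrow \% t\) is a candidate, and every u in `between(log, s, t)` is a candidate source for \(u \rightarrow + t\).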

Remove Redundant Excludes (lines: 26–27): Here, we remove redundant excludes relations based on the observation that if activity r always precedes s, and if \(r\rightarrow \% t\), then adding \(s \rightarrow \% t\) is redundant. It should be noted that this redundancy does not hold if some u occurs between r and s and \(u \rightarrow + t\). Presently, this caveat is ignored, potentially leading to a decrease in model precision, but allowing for an enormous reduction in model complexity.

Limited Transitive Reduction (lines: 29–30 and 36): The condition and response relations satisfy the transitive property when seen in isolation. That is, if we have \(s \rightarrow \bullet t\) and \(t \rightarrow \bullet u\), then \(s \rightarrow \bullet u\) holds implicitly, and an explicit \(s \rightarrow \bullet u\) is superfluous. The caveat “seen in isolation” is crucial, however: if the same model has \(v \rightarrow \% t\) for some v, then t may become excluded, annulling the implicit \(s \rightarrow \bullet u\). Formally,

$$\begin{aligned} s \rightarrow \bullet t \wedge t \rightarrow \bullet u \wedge \not \exists v.~v \rightarrow \% t \models s \rightarrow \bullet u \end{aligned}$$

In fact, we can safely remove redundant \(s \rightarrow \bullet u\) despite the presence of an interfering excludes relation (that is, we ignore \(\not \exists v.~v \rightarrow \% t\)). The removal is safe in the sense that this can only result in a more permissive model, i.e., we do not risk arriving at a model on which the log cannot be replayed. The downside is a less precise model, which may permit behavior which ought to be forbidden.

A limited-horizon transitive reduction is performed which considers only relations between an activity and its neighbors’ neighbors, but not further, in order to constrain computational complexity. This is applied to all condition and response relations prior to the final step of discovering additional condition relations, and once again on condition relations afterward. In many models, the reduction in relations is very substantial. See Fig. 1 for a graphical illustration.

Fig. 1 Transitive reduction with a limited horizon: graph (a) has the same reachability/transitive closure as the reduced graph (b), but redundant edges within a 2-edge horizon have been removed
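A limited-horizon reduction of this kind can be sketched as follows. The sketch is an illustration, not the DisCoveR code: the class `LimitedReduction` is hypothetical, a relation is assumed to be stored as a map from each source to its set of targets, and the input is assumed to be acyclic so that removing all two-edge shortcuts at once preserves reachability.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a transitive reduction with a two-edge horizon.
final class LimitedReduction {
    // Removes every edge s -> u for which some t satisfies s -> t and t -> u.
    // Longer alternative paths are deliberately not inspected.
    static Map<String, Set<String>> reduce(Map<String, Set<String>> edges) {
        Map<String, Set<String>> reduced = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : edges.entrySet()) {
            String s = entry.getKey();
            Set<String> kept = new HashSet<>(entry.getValue());
            for (String t : entry.getValue()) {
                if (t.equals(s)) continue; // ignore self-loops as intermediates
                for (String u : edges.getOrDefault(t, Set.of())) {
                    if (!u.equals(t)) kept.remove(u); // s -> u is implied via t
                }
            }
            reduced.put(s, kept);
        }
        return reduced;
    }
}
```

For edges a→b, b→c and a→c, the shortcut a→c is removed. For a→b, b→c, c→d and a→d, the edge a→d is kept, since the only alternative path has three edges and lies beyond the horizon.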

Additional Conditions (lines: 32–34): The first set of conditions we added based on the \(\textsc {Precedence}\) relations were conservative in that this relation was observed to hold unconditionally across traces. We can now add less obvious condition relations, taking advantage of semantics added to our model by inclusion and exclusion relations.

We start by adding \(s \rightarrow \bullet t\) if s occurs before the first occurrence of t in some trace. For those traces in which s does not precede the first t, it may be the case that s is excluded at the time t is executed, e.g., if the relation \(u \rightarrow \% s\) is present, u is observed prior to t, and s has not been re-included. Recall that DCR Graphs semantics dictate that a relation does not apply when the source activity is excluded.

Since only includes and excludes relations are determinative for the validity of these candidate relations, we can utilize a limited replay strategy based on these relations alone. This approach is less computationally demanding than using the full model.
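The validity check for a single candidate condition can be sketched as a replay that tracks only which activities are excluded. This is a minimal illustration under stated assumptions: the class `AdditionalConditions` is hypothetical, includes and excludes are given as maps from source to targets, all activities start out included (as in the initial marking), and exclusions are applied before inclusions. How candidates are generated and in which order they are tested is not shown.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the limited replay used to validate a candidate condition.
final class AdditionalConditions {
    // True if adding the condition s -> t keeps every trace of the log replayable,
    // judged by replaying the includes/excludes relations alone.
    static boolean conditionHolds(List<List<String>> log, String s, String t,
                                  Map<String, Set<String>> includesTo,
                                  Map<String, Set<String>> excludesTo) {
        for (List<String> trace : log) {
            Set<String> excluded = new HashSet<>(); // everything is included initially
            boolean sExecuted = false;
            for (String a : trace) {
                // The condition blocks t only while s is included and unexecuted.
                if (a.equals(t) && !sExecuted && !excluded.contains(s)) return false;
                if (a.equals(s)) sExecuted = true;
                excluded.addAll(excludesTo.getOrDefault(a, Set.of()));
                excluded.removeAll(includesTo.getOrDefault(a, Set.of()));
            }
        }
        return true;
    }
}
```

With the log \(\{\langle s, t \rangle , \langle u, t \rangle \}\) and the relation \(u \rightarrow \% s\), the candidate \(s \rightarrow \bullet t\) is valid: in the second trace, s is excluded by the time t is executed. Without that exclusion, the candidate is rejected.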

5 The DisCoveR miner

In the previous section, we provided a formal, functional characterization of the ParNek algorithm. In the current section, we show how the algorithm was operationally implemented as the DisCoveR miner. The full Java source code is provided as open source (licensed under LGPL-3.0) at [81]. As the full source code is too large to include in this paper, we will at various times provide a skeleton of the code and refer to the repository for the full details. Note that this means that the listings below may at times omit some details of the actual source code, or include additional comments. When class names and line numbers are mentioned, they refer specifically to the release version 1.0.1.

The primary contribution of the implementation is its run time complexity, expressed in terms of the size of the event log (L) and in terms of the number of unique activities in the log (A). This is achieved through two primary means. First of all, instead of computing the various functions of Table 2 naively by continuously re-parsing the log, we first build an abstraction of the log, which allows us to afterward compute these functions in \({\mathcal {O}}(A^2)\), which in turn makes the main Algorithm 1 independent of the size of the log, except for the computation of additional conditions. Secondly, by using bit vector operations for (1) the building of the abstraction, (2) the computation of additional conditions and (3) the DCR Graph semantics, we reduce their complexity to be, respectively, \({\mathcal {O}}(L * A)\), \({\mathcal {O}}(L)\), and \({\mathcal {O}}(1)\). This means that the combined complexity of the miner is \({\mathcal {O}}((L * A) + A^2)\), with the log size usually dominating. The bit vector implementation of DCR Graphs was inspired by earlier work by Debois et al. [20, 45].

Why bit vectors? A bit vector (also bit array or bit set) is an array of bits (i.e., Booleans) that exposes bitwise operations. This allows the compiler to map the data structure directly to bitwise machine instructions, making computations on them extremely fast.

5.1 DCR graph semantics

We first show how we used bit vectors to improve the efficiency of replaying DCR Graphs. BitSets are Java’s implementation of bit vectors; the marking of a DCR Graph can then be represented as follows:

figure b

For example, let us assume that we have three activities with respective indices A (1), B (2), and C (3). If A and C have been previously executed, then their executed states can be represented as the bit vector:

figure c
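As a minimal sketch of such a representation (not the listing from the DisCoveR repository), the marking can be held in three `java.util.BitSet` fields. The class and field names below are assumptions, and bit index 0 is simply left unused so that the indices match A (1), B (2) and C (3).

```java
import java.util.BitSet;

// Hypothetical bit-vector marking; one bit per event in each vector.
final class BitMarking {
    final BitSet executed = new BitSet();
    final BitSet pending = new BitSet();
    final BitSet included = new BitSet();
}

final class BitMarkingDemo {
    // Builds the marking of the running example: A (1) and C (3) have been executed.
    static BitMarking example() {
        BitMarking m = new BitMarking();
        m.included.set(1, 4); // bits 1..3: A, B and C all included (an assumption)
        m.executed.set(1);    // A
        m.executed.set(3);    // C
        return m;
    }
}
```

Here `BitMarkingDemo.example().executed` has bits 1 and 3 set and bit 2 clear, so its `toString()` is `{1, 3}`.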

We can similarly represent relations as matrices, encoded in practice as hashmaps of bit vectors to allow fast lookup of the relations of a particular activity:

figure d

Continuing on the previous example, if we have a condition from A to B and from B to C, the data structure conditionsFor would be constructed as follows:

figure e

Given these definitions, the semantics of DCR Graphs can be expressed as a short list of bit vector operations. Note that the get() method retrieves the bit at a given index and that the intersects method applies an AND operation on two vectors and then checks whether the result is non-zero. Enabledness of events can be computed as follows:

figure f

First, we check the bit of the included bit vector corresponding to the event. Then, we check whether any of the conditions for the event is currently included and not executed; if so, the event is not enabled. The latter requires two bitwise operations: first, we subtract the executed from the included events, giving us a bit vector representing those events that are currently included but have not yet been executed; then, we check whether this bit vector intersects (i.e., whether the bitwise AND is non-zero) with the bit vector representing the conditions for the event. Note that a more straightforward, but less efficient, implementation of DCR Graphs would loop over a data structure containing all conditions to achieve a similar result.
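As a minimal sketch of the enabledness check described above, the following Java class keeps the marking in java.util.BitSet fields and the conditions in a hash map keyed by activity index. The class name EnablednessSketch, the field and method names, and the zero-based activity indices are illustrative assumptions and not the original listing.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and layout are assumptions, not the paper's listing.
final class EnablednessSketch {
    final BitSet executed = new BitSet();
    final BitSet included = new BitSet();
    // conditionsFor.get(e) has one bit set per activity that is a condition for e.
    final Map<Integer, BitSet> conditionsFor = new HashMap<>();

    void addCondition(int from, int to) {
        conditionsFor.computeIfAbsent(to, k -> new BitSet()).set(from);
    }

    boolean isEnabled(int event) {
        if (!included.get(event)) {
            return false; // an excluded event can never be executed
        }
        BitSet conditions = conditionsFor.get(event);
        if (conditions == null) {
            return true; // no conditions to satisfy
        }
        // Events that are included but not yet executed: included AND NOT executed.
        BitSet blocking = (BitSet) included.clone();
        blocking.andNot(executed);
        // Enabled only if none of the event's conditions is still blocking.
        return !blocking.intersects(conditions);
    }
}
```

With activities A, B, C mapped to indices 0, 1, 2, all included, and conditions from A to B and from B to C, only A is enabled initially. Executing A enables B, and excluding B enables C, since an excluded condition no longer blocks its target.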

Likewise, the execution of an event can be computed as follows:

figure g

Here, we first set the bit that corresponds to the executed event in the executed bit vector to true. We then set the bit that corresponds to the event in the pending bit vector to false. Afterward, we add any new pending responses through the bitwise OR operation on the pending bit vector and the responseTo bit vector for the executed event (which represents those events that are a response to the event). Then, we remove excluded events from the included bit vector by subtracting the excludesTo bit vector for the executed event. Finally, we add included events to the included bit vector through a bitwise OR with the includesTo bit vector (Table 3).
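The five steps above can be sketched in Java as follows. This is an illustration under stated assumptions rather than the original listing: the class name ExecuteSketch is hypothetical, and the relations are stored in plain arrays indexed by the executed event instead of hash maps, purely to keep the example short.

```java
import java.util.BitSet;

// Hypothetical sketch: arrays replace the hash maps described in the text.
final class ExecuteSketch {
    final BitSet executed = new BitSet();
    final BitSet included = new BitSet();
    final BitSet pending = new BitSet();
    final BitSet[] responsesTo; // responsesTo[e]: events made pending by e
    final BitSet[] excludesTo;  // excludesTo[e]: events excluded by e
    final BitSet[] includesTo;  // includesTo[e]: events included by e

    ExecuteSketch(int activities) {
        responsesTo = emptyRelation(activities);
        excludesTo = emptyRelation(activities);
        includesTo = emptyRelation(activities);
        included.set(0, activities); // assumption: all events start included
    }

    private static BitSet[] emptyRelation(int n) {
        BitSet[] relation = new BitSet[n];
        for (int i = 0; i < n; i++) {
            relation[i] = new BitSet(n);
        }
        return relation;
    }

    void execute(int event) {
        executed.set(event);                // mark the event as executed
        pending.clear(event);               // its own pending response is fulfilled
        pending.or(responsesTo[event]);     // add newly pending responses
        included.andNot(excludesTo[event]); // remove excluded events
        included.or(includesTo[event]);     // add included events
    }
}
```

The order of the last two steps matters: because inclusion is applied after exclusion, an event that is both excluded and included by the same execution ends up included.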

As before, because hashmap lookups and bit vector operations take constant time, a function that would usually loop over the sets of relations becomes a short list of constant-time operations.

This implementation of DCR Graphs allows for extremely fast replay of logs. This significantly reduces the duration of the Additional Conditions part of the algorithm, which requires a replay of the log on the graph that has been found up to that point. We will further address how we reduced the computation of Additional Conditions to linear time later in this section.

Table 3 Example of bit operations involved in execute method (see Listing 4)

5.2 Abstracting the log

To avoid repeating computations, we separate the mining process into two steps: first, we build a number of relevant abstractions of the log, which we then use afterward during the actual model building steps as described in Sect. 4. This separation of concerns ensures that there is a central part of the code where we parse the log, with all other parts of the algorithm working only on these abstractions, which are bounded by the number of activities (\({\mathcal {O}}(A^2)\)) and not the log size. To increase the efficiency of the log abstraction mechanism, we also store and compute these abstractions through bit vector operations. The listing below shows their definition:

figure i

Below we show how the abstractions are computed. The parseTrace method is called once for each trace in the log. Note that the method does not require a nested iteration over the log or current trace, only a single nested iteration over the activities to compute responses. Therefore, the complexity of computing the abstractions is \({\mathcal {O}}(L * A)\). For convenience, logs are transformed into lists of integers. This allows for a straightforward mapping of activities to the indices of the bit vectors and efficient storage of the log for later reuse.

figure j
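To illustrate the single-pass structure, the sketch below computes two example abstractions per trace: which activities have preceded an activity in some trace, and which activities followed an activity in every trace containing it. The choice of these two abstractions, the class name LogAbstractionSketch and all field names are assumptions made for illustration; the abstractions actually used are those defined by the functions of Table 2.

```java
import java.util.BitSet;

// Hypothetical sketch: the two abstractions below are illustrative assumptions.
final class LogAbstractionSketch {
    final int n;
    // predecessor[a]: activities observed before a in at least one trace.
    final BitSet[] predecessor;
    // responseCandidates[a]: activities that followed a in every trace containing a.
    final BitSet[] responseCandidates;

    LogAbstractionSketch(int activities) {
        n = activities;
        predecessor = new BitSet[n];
        responseCandidates = new BitSet[n];
        for (int a = 0; a < n; a++) {
            predecessor[a] = new BitSet(n);
            responseCandidates[a] = new BitSet(n);
            responseCandidates[a].set(0, n); // start fully constrained
        }
    }

    // Called once per trace; a trace is an array of activity indices.
    void parseTrace(int[] trace) {
        BitSet seen = new BitSet(n);
        // seenSince[b]: activities observed since the last occurrence of b.
        BitSet[] seenSince = new BitSet[n];
        for (int a : trace) {
            predecessor[a].or(seen); // one bit vector operation per event
            // The only nested loop runs over activities, not over the trace.
            for (int b = seen.nextSetBit(0); b >= 0; b = seen.nextSetBit(b + 1)) {
                seenSince[b].set(a);
            }
            seen.set(a);
            seenSince[a] = new BitSet(n); // restart the window for a
        }
        for (int b = seen.nextSetBit(0); b >= 0; b = seen.nextSetBit(b + 1)) {
            responseCandidates[b].and(seenSince[b]);
        }
    }
}
```

Each event triggers one bitwise OR and at most A bit updates, which gives the \({\mathcal {O}}(L * A)\) bound when bit vector operations are treated as constant time.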

To avoid unnecessary computations embedded in the main parsing of the log, we exploit the fact that the predecessor and successor functions are each other’s dual and compute the successor function after the log has been parsed:

figure k
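The duality can be sketched as a transposition of the predecessor relation: b is a predecessor of a exactly when a is a successor of b. The class and method names below are hypothetical, and the sketch assumes that every set bit in the input is a valid activity index.

```java
import java.util.BitSet;

// Hypothetical sketch: derives successors by transposing the predecessor relation.
final class SuccessorSketch {
    // successor[b] contains a exactly when predecessor[a] contains b.
    static BitSet[] successorsOf(BitSet[] predecessor) {
        int n = predecessor.length;
        BitSet[] successor = new BitSet[n];
        for (int b = 0; b < n; b++) {
            successor[b] = new BitSet(n);
        }
        for (int a = 0; a < n; a++) {
            BitSet preds = predecessor[a];
            for (int b = preds.nextSetBit(0); b >= 0; b = preds.nextSetBit(b + 1)) {
                successor[b].set(a);
            }
        }
        return successor;
    }
}
```

This step touches at most \(A^2\) bits and never revisits the log, so its cost does not depend on the log size.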

5.3 Mining from log abstractions

After creating the log abstractions, we start the discovery task. For the sake of brevity, we will not show source code here, but refer to [81]: BitParNeks, ln. 75–228. In short, the implementation follows largely the steps described in Algorithm 1. The key difference is in the additional condition step, where we avoid having nested loops over the traces by implementing this function as follows (we only show the most relevant parts, for the full method we refer to [81]):

figure l

Altogether, these optimizations provide us with an extremely efficient implementation of the ParNek algorithm. In the following section, we will show through experimentation that it is in fact nearly one order of magnitude faster than any other miner and two orders of magnitude faster than most of the state-of-the-art Declare miners.

6 Evaluation

To evaluate the performance of our algorithm, we frame the process discovery task as a binary classification task of identifying legal/illegal traces. For this, we take advantage of a labeled data set from the Process Discovery Contest 2019,Footnote 4 in which DisCoveR was among the top performing submissions, classifying \(96.1\%\) of traces correctly. This result was achieved despite the fact that DisCoveR considers only control-flow, ignoring auxiliary data associated with events. Nevertheless, the present evaluation should not be interpreted as a comprehensive benchmarking, but rather a preliminary, proof-of-concept evaluation.

For comparison, we report results for: (1) a miner based on the same formalism (DCR Graphs) developed by Debois et al. [25]; (2) two leading miners also based on the declarative paradigm: MINERful [15] and Declare Miner [47]; (3) the well-established Petri net miner, Inductive Miner; and finally (4) the winning miner for the PDC 2019, the Log Skeleton miner [87]. Note that DisCoveR achieves a higher accuracy than the Log Skeleton miner in our evaluation because we report the results of the algorithm’s classification alone, whereas the winning submission to the process discovery contest was a manually augmented model based on the output of the Log Skeleton miner.

Framing process discovery as a binary classification task is arguably an oversimplification of the aim of process discovery, since it does not capture the degree to which a model fails to capture an event log. Error measures that aim to capture this are usually based on model-log alignment techniques [5], or model specific measures such as token replay metrics for Petri nets [68]. The advantage of classification-based evaluation lies in the ease of interpretability and comparability. In a model-agnostic manner, we gain a view of the algorithm’s bias toward committing different classes of statistical errors (e.g., Type I/II) by analyzing true/false positives/negatives, and the corresponding precision, recall, \(F_1\)-score and MCC measures.

Before presenting the results, we briefly formalize the task of process discovery as binary classification in terms of computational learning theory. This clarifies our formulation of processes in probabilistic terms, a property which is implied by the statistical evaluation metrics we present, a subset of which were the basis for evaluation in the PDC 2019.

Through the appeal to learning theory, we aim to illustrate that a key reason our algorithm performs well is the (albeit heuristic) regularization (i.e., restriction on model complexity) performed at several steps in the algorithm.

6.1 The learning task

The goal of a supervised learning task is to learn an approximation h of a target function f which is assumed to generate the observed data [4]. The training data L are an i.i.d.Footnote 5 sample from the true probability distribution (\({\mathbb {P}}_P\)) associated with f. The aim is to maximize performance (e.g., minimize an error function) on out-of-sample data by means of optimizing performance on in-sample training data in such a way that the learned model avoids overfitting.

Formally, a learning algorithm \(\gamma \) is a mapping from a sampling L from the process \((P, {\mathbb {P}}_P)\) to a hypothesis space \({\mathcal {H}}\) s.t. the out-of-sample error \(E_\mathrm{out}\) is minimized:

$$\begin{aligned} \gamma : {\mathcal {L}} \rightarrow {\mathcal {H}}; ~L \mapsto \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{h\in {\mathcal {H}}}E_\mathrm{out}(h) \end{aligned}$$

To define our error function E, we can frame process discovery-based binary classification as the task of predicting the outcome of a random Bernoulli variable defined by

$$\begin{aligned} \mathbbm {1}(\sigma \in P) \end{aligned}$$

which returns 1 when a trace \(\sigma \) is a member of P (the set of traces associated with the true process) and 0 otherwise.

The most straightforward way of defining the in-sample error measure is simply the proportion of “successes” in this Bernoulli trial. If L contains only positive examples (i.e., \(L \subseteq P\)), the in-sample error can be formulated as the proportion of traces accepted by the learned model h (i.e., recall):

$$\begin{aligned} E^r_\mathrm{in}(h) = \sum _{\sigma \in L} \frac{\mathbbm {1}(\sigma \in \ell (h))}{|L|} \end{aligned}$$

If L contains both positive and negative examples, the in-sample error can be written as the proportion of examples on which the learned model and example agree (i.e., accuracy):

$$\begin{aligned} E^a_\mathrm{in}(h) = \sum _{\sigma \in L} \frac{\mathbbm {1}(\sigma \in \ell (h) \iff \sigma \in P)}{|L|} \end{aligned}$$

We include this formulation (\(E^a_\mathrm{in}\)) for clarity, but note that it is at odds with our formalization of a log L as a sample from P. For it to be consistent, we would need to consider L a sample of traces in P as well as not in P. In the evaluation based on PDC 2019 data, all training logs contain only positive examples.

In most learning tasks, minimizing \(E_\mathrm{in}(h)\) is trivial if the hypothesis set \({\mathcal {H}}\) is large enough. Indeed, \(E^r_\mathrm{in}\) can be trivially minimized by a flower process model which permits all behavior. The true challenge of the learning task lies in ensuring not only that in-sample error is small, but simultaneously that in-sample error is close to out-of-sample error.

Formally,

$$\begin{aligned} | E_\mathrm{in}(h) - E_\mathrm{out}(h) | < \epsilon \end{aligned}$$

for some tolerance threshold \(\epsilon \).

While a large enough hypothesis space \({\mathcal {H}}\) may indeed contain the target function f, the likelihood of our learning algorithm choosing f in such a large hypothesis space is vanishingly small. It is much more likely to settle on some other, very complex, function \(g \in {\mathcal {H}}\), leading to a high \(E_\mathrm{out}\). We therefore seek an approach to ensuring that

$$\begin{aligned} {\mathbb {P}}[~~| E_\mathrm{in}(h) - E_\mathrm{out}(h) | < \epsilon ~] > 1 - \delta \end{aligned}$$

where \(\delta \) is a desired confidence threshold. This is known as a “probably (\(\delta \)), approximately (\(\epsilon \)), correct” (PAC) bound.

While somewhat counter-intuitive, this formulation helps us understand why restricting \({\mathcal {H}}\) to a smaller set which does not include the target function f will often lead to a lower \(E_\mathrm{out}\).

Regularization Thus, a key component in the learning process is that of regularization: a process for controlling the complexity of a learned model, i.e.,  restricting the size of the hypothesis space, to improve generalization. This gives rise to the formulation of the learning process as a trade-off between inductive biasFootnote 6 of a hypothesis set and a penalty for the complexity of a hypothesis [74]. The sum of these terms gives an estimate of the out-of-sample error:

$$\begin{aligned} {\hat{E}}_\mathrm{out} = E_\mathrm{in} + \Omega (N, {\mathcal {H}}, \delta ). \end{aligned}$$

Here, N denotes the sample size, \({\mathcal {H}}\) the hypothesis space and \(\delta \) the desired confidence that \(E_\mathrm{out}\le {{\hat{E}}}_\mathrm{out}\).

So although we can achieve a very low in-sample error using a rich hypothesis set, we penalize complex models using a regularization function \(\Omega \). Explicitly incorporating this function into learning algorithms, such that they minimize \({\hat{E}}_\mathrm{out}\) rather than \(E_\mathrm{in}\), can greatly improve results.

ParNek does not currently attempt to explicitly minimize \({\hat{E}}_\mathrm{out}\), and \(\Omega \) is likewise not explicitly formulated. However, some form of regularization is achieved by effectively restricting the size of \({\mathcal {H}}\). This is done via a set of heuristics that attempt to control model complexity by removing constraints which are redundant w.r.t. the training data or add little to the precision of the model’s semantics. Indeed, ParNek cannot discover the entire set of DCR Graphs, thus

$$\begin{aligned} {\mathcal {H}}_\mathrm{ParNek} \subset {\mathcal {H}}_\mathrm{DCR} = \omega \text {-regular languages} \end{aligned}$$

Restricting the available hypothesis set is analogous to limiting a linear regression algorithm to third-order polynomials, for example, which corresponds to an \(\Omega \) which assigns a zero weight to all higher-order coefficients.

While heuristic in nature, the approach is effective, as is seen in comparison with miners which do little to control model complexity, such as Debois et al.’s miner. We intend to pursue more well-defined regularization procedures for DCR Graph mining algorithms in future work.

Metrics Aggregate evaluation metrics, such as precision, recall, and \(F_1\)-score, are commonly reported for classification tasks. Given a confusion matrix, we define precision (prec.) and recall as follows:

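$$\begin{array}{ll|cc|l} & & \text {Data} & & \\ & & + & - & \\ \hline \text {Prediction} & + & \text {True Pos. (TP)} & \text {False Pos. (FP)} & \text {prec.} \equiv \frac{\text {TP}}{{\text {TP}} + {\text {FP}}} \\ & - & \text {False Neg. (FN)} & \text {True Neg. (TN)} & \\ \hline & & \text {recall} \equiv \frac{\text {TP}}{{\text {TP}} + {\text {FN}}} & & \text {acc.} \equiv \frac{{\text {TP}}+{\text {TN}}}{{\text {TP}}+{\text {TN}}+{\text {FP}}+{\text {FN}}} \end{array}$$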

The \(F_\beta \)-score is then the weighted harmonic mean of precision and recall, where \(\beta \) determines the weighting of recall relative to precision (\(\beta = 1\) weights both equally):

$$\begin{aligned} F_{\beta } = \frac{(1+\beta ^2) \cdot \text {precision} \cdot \text {recall}}{\beta ^2 \cdot \text {precision} + \text {recall}} \end{aligned}$$

Originally stemming from information retrieval, these metrics have been criticized for giving weight to true positives and ignoring true negatives [11], and other metrics such as Matthews Correlation Coefficient (MCC) avoid assumptions regarding the target class.

Arguably, process mining can be seen as an information retrieval task, if the tool is used to “query” an event log for compliant/noncompliant traces. For completeness, we report precision, recall, and \(F_1\)-score for both the situation in which the target class is compliant behavior (true positive) and noncompliance (true negative), as well as Matthews Correlation Coefficient (MCC).
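The metrics discussed in this section can be computed directly from the four cells of a confusion matrix. The following Java sketch uses the standard definitions, with \(\beta ^2\) in the denominator of the \(F_\beta \)-score; the class and method names are hypothetical, and returning 0 for an undefined MCC is a common convention that we assume here.

```java
// Hypothetical helper: standard metric definitions over confusion-matrix counts.
final class ClassificationMetrics {
    // Note: precision and recall are NaN when their denominator is zero.
    static double precision(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    static double recall(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    static double accuracy(long tp, long tn, long fp, long fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    static double fBeta(double beta, double precision, double recall) {
        double b2 = beta * beta;
        return (1 + b2) * precision * recall / (b2 * precision + recall);
    }

    static double mcc(long tp, long tn, long fp, long fn) {
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator =
            Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        // Assumed convention: report 0 when any marginal count is zero.
        return denominator == 0 ? 0.0 : numerator / denominator;
    }
}
```

To report the metrics with noncompliant traces as the target class, the roles of the cells are swapped, e.g., precision(tn, fn) and recall(tn, fp). MCC is unchanged under this swap, which is why it avoids assumptions regarding the target class.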

6.2 Results

In addition to case studies, we present a controlled evaluation of the algorithm based on a labeled data set from the Process Discovery Contest 2019. The evaluation is bolstered by the truly blind nature of the process. After being presented with a training set with positive examples only, and submitting results for a partially blind validation round, the predictions on a separate test set were sent in to the contest administrators who independently evaluated their accuracy. This removes any potential for accidental data snooping.

See Table 4 for the complete results.

Dataset The data set essentially consists of 10 independent data sets stemming from 10 different processes. Participants were presented with an unlabeled training set from each process. Then, two validation sets were provided for which participants could submit their algorithm’s classification results. The organizers then returned a confusion matrix—but no details regarding which traces specifically were misclassified and how. Two rounds of submission for validation were permitted, though we only took advantage of the first.

Event logs for processes 1, 5, 7, 8, 9, and 10 contained auxiliary data associated with each event, sometimes more than one attribute. The version of our algorithm presented here considers only control-flow and is unable to take advantage of additional attributes, nor are the miners we present in the following comparison.

Comparison For comparison, we present the performance of five relevant mining algorithms: first, another DCR Graph mining algorithm designed by Debois et al. [25]; second, two miners based on Declare constraints, MINERful [15] and Declare Miner [47]; third, Inductive Miner, a flagship imperative miner which returns Petri net models; and finally Log Skeleton Miner, the winning submission to PDC 2019 [87].

Debois et al.’s DCR Graph miner takes a very greedy approach to identifying DCR relations which hold for an event log. Essentially, the algorithm begins with a fully constrained model over the set of activities in the log (mapped one-to-one to DCR events), then goes through the log and removes any constraints which are violated by observed behavior.

Due to the greedy strategy, the algorithm often finds thousands of constraints and clearly overfits the training data, leading to poor performance on test data.

MINERful is a miner for the Declare language which uses a number of user-defined parameters to determine which constraints to include in a model after mining the event log. The three core parameters are:

  1. Support

    The fraction of traces in which the constraints must hold.

  2. Confidence

    Support scaled by the fraction of traces in which a constraint is activated.

  3. Interest Factor

    Confidence scaled by the fraction of traces in which the target of a constraint is also present.

A constraint is considered to be activated when it becomes relevant in a trace. So, a succession constraint between s and t will only become activated in traces in which s is present. In addition, to count toward interest factor, the target t must also be present. Because each parameter is defined as a scaling of the previous one by a fraction of at most 1, the parameters are dependent on one another and obey the bounds: support \(\ge \) confidence \(\ge \) interest factor.
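As a small worked example of these definitions, the following Java sketch derives the three parameters from per-constraint trace counts. It follows the wording above literally; the class ConstraintStatistics and its parameter names are hypothetical, and MINERful’s own implementation may compute these quantities differently.

```java
// Hypothetical sketch: follows the textual definitions, not MINERful's source code.
final class ConstraintStatistics {
    final double support;
    final double confidence;
    final double interestFactor;

    ConstraintStatistics(int traces, int holds, int activated, int targetPresent) {
        // Fraction of traces in which the constraint holds.
        support = (double) holds / traces;
        // Support scaled by the fraction of traces that activate the constraint.
        confidence = support * activated / traces;
        // Confidence scaled by the fraction of traces that also contain the target.
        interestFactor = confidence * targetPresent / traces;
    }
}
```

For a log of 100 traces in which a constraint holds in 90, is activated in 60, and has its target present in 50, this yields support 0.9, confidence 0.54 and interest factor 0.27.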

MINERful also performs subsumption checks to eliminate redundant or meaningless constraints. For example, wherever a ChainSuccession constraint is found to hold, Succession will necessarily hold and adds no information. This procedure is akin to DisCoveR’s strategy of removing transitively redundant constraints in order to avoid unnecessarily complex models.

We employed an automated parametrization procedure originally developed for the evaluation in [8]. The procedure employs a binary search strategy to find values for confidence and threshold which result in a model whose number of constraints is as close as possible to, but does not exceed, some limit. We present results for models with between 89 and 200 constraints. Allowing larger models did not improve accuracy further.

Declare Miner was the first miner developed for the Declare language and uses a frequent itemset mining approach based on the Apriori algorithm, combined with subsequent pruning techniques. The user can set two threshold parameters: support, which measures the fraction of traces in which the constraints hold, and alpha, which measures how often a constraint is activated (the same as confidence for MINERful). Furthermore, the user can specify which constraint templates should be considered.

We consider models generated by Declare Miner with thresholds support = 100 and alpha = 100 and with either all constraint templates or only positive constraint templates (no Not- constraints). The parameter settings were settled upon after testing numerous settings from the range of thresholds, with 100/100 performing best.

Inductive Miner uses a divide-and-conquer approach to recursively partition the directly-follows graph (eventually-follows in the IMi variant of the miner) of a log such that the partitions correspond to one of four process tree operators: exclusive choice, sequential composition, parallel composition, and redo loop. The resulting process tree can be transformed into a corresponding Petri net.

We tested Inductive Miner (IMf) using a range of noise thresholds from 0.0 to 1.0, where a setting of 0.0 ensures perfectly fitting models w.r.t. the mined event log (training set). A noise threshold of 0.0 is equivalent to the original Inductive Miner (IM). We also investigated the variants known as IM-EKS, IMc, IMcpt, IMlc and IMflc, whose performance was nearly identical to standard IM (noise threshold 0.0). The largest difference was IMflc with 2 fewer correct classifications. We only report detailed results for settings 0.0, 0.5, 1.0 for readability, but note that intermediate noise thresholds between these values followed the same, roughly linear, relationship with the accuracy of the resulting model.

Log Skeleton miner was the basis for the winning submission to the PDC 2019 and builds on some basic Declare constraint templates: Precedence, Response, NotCoexistence, and adds NotPrecedence, and NotResponse. Furthermore, it employs the notion of equivalence classes for co-occurring activities. We report the results for the fully automated miner, but as noted, the final submission was manually extended, which is why the results we report are lower than the 99.78% accuracy achieved by the creator of Log Skeleton.

Results We report results for the classification task in a confusion matrix for each of the 10 processes, as well as aggregated across processes, in Table 4. Keep in mind that a user-defined error measure may choose to weigh false positives and false negatives differently (\(\alpha \) and \(\beta \) in our formalization).

We also report Matthews Correlation Coefficient (MCC), precision, recall, and \(F_1\)-score, both for the case in which the target class is permissible traces and for the case in which it is forbidden traces. The appropriate framing depends on the application.

6.2.1 Run time

We compared run time performance to the same miners as in our classification evaluation, finding that DisCoveR performs comparably with the fastest miners, and much faster than Declare-based miners, MINERful and Declare miner, even when multithreading is enabled. Note that for run time comparison, the linear-time IMd variant of Inductive Miner from the pm4pyFootnote 7 Python module was used.

Fig. 2
figure 2

Mean run times in milliseconds across 100 runs on Process Discovery Contest 2019 training logs. MINERful was run with the thresholds support = 1.0, confidence = 1.0, interest factor = 1.0, with and without a post-processing step to simplify models. Declare Miner was run with and without multithreading, with alpha = 1.0 and support = 1.0, and with either all constraint templates or all but the negative constraint templates. For run times, the IMd variant of Inductive Miner from the pm4py platform for Python was used, as this variant is significantly faster than other IM variants. Log Skeleton Miner was the winning submission in terms of classification accuracy; DisCoveR was the runner-up

Table 4 Confusion matrices for individual data sets, each generated by a separate ground truth model, in our formulation referred to as \((P_i, {\mathbb {P}}_{P_i})\)

Experimental setup Experiments were conducted on the set of 10 test logs from the Process Discovery Contest 2019 and were run on a Lenovo Thinkpad P50 with an Intel Xeon E3-1535M v5 2.90 GHz quad-core processor and 32 GB of RAM. We present mean run times over 100 runs of mining each log.

MINERful was parametrized with a support threshold of 1.0, a confidence threshold of 1.0 and an interest factor threshold of 1.0. Declare miner was parametrized with support = 1.0 and alpha = 1.0. The parameters for MINERful were chosen due to being the most “generous” in terms of run time. The parameters for Declare miner stem from the best performance in classification. We also report notable variants: for MINERful with and without an additional model simplification step, for Declare miner with/without negative constraints and with/without multithreading. We note that changes in parametrizations do not significantly alter performance, and certainly not relative to other miners. Note that we did not employ the parameter tuning procedure used to achieve the results for MINERful in Table 4, which requires re-running the miner many times.

The 10 logs all consist of 700 traces. Run time results can be seen in Fig. 2 as well as Table 5, where details regarding number of activities and mean trace length are also included.

Note that these results should be taken as a rough indication of performance subject to some variance. A number of factors that are out of our control may affect run times, especially for very low run times. These include the Java Virtual Machine’s garbage collection strategies, just-in-time compilation and optimization strategies, as well as background operating system processes. To determine a reasonable number of runs, we observed the convergence of run time estimates w.r.t. increases in runs, finding that estimates stabilized by 100 runs and clearly so by 1000 and 10,000 runs. We present results for 100 runs in part because higher numbers of runs were not feasible for some miners.

Table 5 Mean run times in milliseconds across 100 runs on Process Discovery Contest 2019 training logs, along with log statistics

6.2.2 Mined model

Finally, we show an example of what a mined DCR Graph actually looks like. In the listing below, we show the mined model for log 10 of the PDC 2019 data set. DCR Graphs can be represented either graphically (as nodes representing activities and edges representing relations) or as a language [26]. Here, we opted for the language format as it allows for a more concise representation of large models. The relations -->*, *-->, -->% and -->+ denote, respectively, the condition, response, exclusion and inclusion relations. By d -->* ae, we denote that d is a condition for ae. Activities can be grouped together as a shorthand for denoting multiple relations, e.g., i -->* (p, ai) denotes a condition from i to p and an additional condition from i to ai.

figure m

7 Case study: interactive model recommendation

In this section, we discuss how DisCoveR has been integrated in the dcrgraphs.net process portal as a means to provide modeling recommendations for the interactive modeling of declarative knowledge-intensive processes. We start by briefly describing the portal and its main functionalities. We then show how process discovery has been integrated in the portal and end with a discussion on how the model recommendation functionality is used in practice.

7.1 The DCR process portal

The dcrgraphs.net process portal is a cloud-based commercial modeling solution for declarative process models, offering an extensive range of functions including process modeling, simulation, analysis, maintenance, and a wide variety of collaboration features. The portal has been created and is maintained by DCR Solutions, in close collaboration with researchers from the University of Copenhagen, IT University of Copenhagen and Danish Technical University. The DCR notation, portal and DCR process engine have been applied in a range of application domains. Most notably, the engine was integrated into Workzone, a case management product used by over 70% of Danish central government institutionsFootnote 8 and the portal has become a cornerstone of the Ecoknow research project,Footnote 9 which proposes a novel digitalization strategy for Danish municipalities grounded in the declarative modeling of knowledge-intensive citizen processes.

Fig. 3
figure 3

DCR graphs modeling

The key component of the portal is the DCR modeling tool, shown in Fig. 3, which allows users to model and simulate DCR graphs. At the center of the screen is the modeling pane with the graphical representation of the DCR Graph, where activities are drawn as boxes and relations as colored arrows in a style similar to the formal syntax. Users can add and manipulate activities and relations between them directly in the modeling pane and change their details in an option panel on the right. The simulation screen is shown in Fig. 4. The upper right of the screen shows the current task list; here, the user can select which task to execute next. The middle of the screen shows recommendations for next steps and a simulation log. On the left, we have a number of advanced features, such as making time steps and a list of all users involved in the simulation (collaborative simulations are supported). In the bottom of the screen, the user can see a step-by-step flowchart representation of the current simulation, divided into swimlanes.

Fig. 4
figure 4

DCR graphs simulation

7.2 Interactive process modeling through model recommendation

In the declarative modeling approach advocated by DCR Solutions, modelers are encouraged to (1) identify the activities and roles of the process, (2) think about what common and uncommon scenarios (i.e., traces) should be supported by the process, (3) based on the scenarios determine what reasonable constraints for the process would be, and (4) ensure that the constraints do not conflict with any desired paths through the use of simulation and test-cases [77]. The identification of constraints in step 3 has been identified as the most challenging for users because it requires a firm grasp of the semantics of DCR Graphs. While test cases and simulation can be used to retroactively check that no conflicting constraints have been introduced, they are not helpful for identifying suitable constraints directly. As a result, novice users often use a fairly inefficient trial-and-error approach where they try a constraint, check how it behaves under simulation and then update their model accordingly.

We introduced process discovery as an alternative to this trial-and-error approach. In this new setting, the portal supports the user by having an algorithm automatically propose suitable relations based on an existing event log and/or the traces that were identified during step 2 of the previously sketched modeling method.

Figure 5 provides an overview of the adapted approach: we start by identifying the activities of the process and modeling these directly in the portal. In the next step, we run simulations on these activities (recall that following the declarative paradigm, these simulations are entirely unconstrained and any trace can be generated). We store the traces generated during the simulation and use these as input for the following step, where we use DisCoveR to identify constraints based on the generated traces. Finally, the user can improve on their model and potentially run more simulations which can be used for additional process discovery, possibly finding additional constraints that were not found for the initial traces.

Fig. 5
figure 5

Overview of the model recommendation approach

Fig. 6
figure 6

Model recommendation

The model recommendation screen is shown in Fig. 6 and is fairly straightforward: the user is shown which relations were found between which activities and can select those they wish to add through the box on the left. The user can also enter an explanation for the relation (i.e., why it was added or left out). This enables rationale management of the model and allows other users to follow the modeler’s reasoning. In addition, we plan to use this information in the future to improve upon the discovery algorithm. By clicking Add Relations, all selected relations are added to the model.

7.3 Discussion

Since the integration of DisCoveR into the DCR Graphs portal, DCR Solutions has been actively conducting workshops with users where the new methodology is demonstrated and used. The inclusion of process mining in the modeling task was embraced enthusiastically by users and has been (informally) observed to lower the complexity of the modeling task.

In the traditional modeling exercise, users who are more familiar with BPMN and/or flowcharts are often hampered by the novelty of the notation, e.g., they will be unclear on what the different relations mean and how to use them. In particular, the fact that arrows indicate not flow but logical relations between the activities can lead to confusion. Using model recommendations, on the other hand, has allowed DCR Solutions to ask the users questions based on the recommended relations, such as “Is it true that approval is a condition for providing documentation?” or “Is it true that approval removes the ability to reject?”.

In essence, model recommendation has managed to bridge an important gap between the consultant and the user: in the past, the users were new to the notation and the consultants were new to the process. This made building a common understanding of the process a time-intensive task. Model recommendation closes this divide by, on the one hand, helping the consultant better understand the process and, on the other, providing the user with examples of the notation drawn from their own domain.

The high accuracy of the algorithm has also been noted in practice: even for processes that include other perspectives than just control-flow (e.g., decisions depending on contextual data), the algorithm has been noted to be highly successful in recommending relevant relations that improved the users’ understanding of the process.

The integration of the algorithm in the commercial tools was relatively effortless: the front-end of the model recommendation was developed rapidly at DCR Solutions through existing plugin support for the portal. The algorithm itself was simply deployed as a cloud service by the researchers. Because of a long history of close collaboration between the two parties, the details of the interface between these two components and a general understanding of how the system should work were fleshed out quickly over two meetings and a few emails.

It should be noted that two variations of DisCoveR exist: the regular version used in the Process Discovery Contest prioritizes accuracy, whereas there also exists a light version that skips the step of finding additional inclusions and exclusions, thereby returning a less accurate but simpler model. It is this light version that is used within the DCR Graphs portal.

8 Conclusion

In this paper, we presented DisCoveR, a declarative miner for DCR Graphs based on the ParNek algorithm. We formally defined the underlying algorithm and how it has been implemented using an acute mapping to bit vector operations, yielding a highly efficient process discovery tool. We then prefaced the evaluation by framing process-discovery-as-classification in terms of computational learning theory in order to gain insight into the convincing performance of the algorithm on out-of-sample data. We evaluated the miner using a traditional classification task and computed the standard machine learning measures of accuracy (0.961), precision (0.94 on positive traces, 0.99 on negative traces), recall (0.99 on positive traces, 0.93 on negative traces), F1 (0.96 on each) and MCC (0.92).
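For readers less familiar with these measures, the following sketch shows how they are computed from the confusion matrix of a binary trace classifier. The counts are hypothetical and do not correspond to the evaluation data; the function name `classification_metrics` is likewise an assumption made for illustration. Per-class figures for negative traces are obtained by swapping the roles of the two classes.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient; symmetric in the two classes.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (NOT the results reported in this paper):
# tp = positive traces accepted, fn = positive traces rejected,
# tn = negative traces rejected, fp = negative traces accepted.
tp, fp, tn, fn = 45, 5, 40, 10

positive = classification_metrics(tp, fp, tn, fn)
# Measures for the negative class: treat "negative" as the class of interest.
negative = classification_metrics(tp=tn, fp=fn, tn=tp, fn=fp)
```

Accuracy and MCC are the same from either perspective, whereas precision, recall and F1 generally differ between the positive and the negative class, which is why they are reported separately above.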

The present evaluation suggests that DisCoveR is competitive with its peers. However, this should not be seen as a comprehensive benchmark: that would require a more extensive evaluation on a larger variety of data sets and against a more representative collection of miners. Where DisCoveR does appear to excel, in particular in comparison with other declarative miners, is in run time: it performs one order of magnitude faster than the state of the art in DCR Graphs discovery and nearly two orders of magnitude faster than the state of the art in Declare discovery. Finally, we showed how the tool has been integrated in a commercial modeling tool and discussed how its integration has significantly improved the modeling experience of its users.

Future Work. Several avenues exist for future work in mining DCR Graphs from event logs. So far, we have considered only the control flow of processes. Incorporating timing, data, and resource perspectives is very relevant for many real-world scenarios and one of the primary requests made by DCR Solutions. Furthermore, accounting for noisy data is an important point to address since this is common in real-world applications.

We restricted our hypothesis space to graphs with the same simple initial marking in which all events are enabled. This is due to the complicated interactions arising with other relations when excluding a source event. Considering different initial markings would enable the discovery of more complex models, but also enlarge the hypothesis space and increase the danger of overfitting.

In order to control more explicitly for overfitting and quantify the tradeoff between inductive bias and complexity, a formulation of regularization functions for classes of DCR Graphs is an important next step. This is not entirely straightforward due to the non-monotonic nature of DCR Graphs [23], rendering simple relation counting more or less meaningless for regularization purposes.

As described in the case study, users of the dcrgraphs.net portal are not only able to define positive scenarios, but also undesired scenarios. The use of negative input data in process discovery has so far been mostly ignored based on the assumption that such data are not available. Having negative scenarios provided by the portal offers a unique opportunity to develop new algorithms that take negative examples as input and thereby produce more relevant models. We observe that DisCoveR has a noticeably lower recall on negative than positive traces and hypothesize that the ability to analyze negative examples of traces will help us improve on this aspect of the accuracy of the tool.

Finally, there remain certain points in the ParNek algorithm at which choices are currently made in a naive manner (e.g., ChooseOneRelation). This decision point should be framed as a proper optimization problem. In fact, framing DCR Graph mining properly as an optimization task would open up a powerful set of tools from the general optimization literature.