Keywords

1 Introduction

A central goal of Business Process Management (BPM) is organizing work into several interrelated “units of work” to achieve shared goals. The formal foundations of BPM, as used in process modeling and mining, conceptualize such a unit of work as a task. Tasks are planned, scheduled, distributed to suitable actors such that the overall work can be performed by a collaborating workforce. Most Process-aware Information Systems (PAIS) support this goal by assuming that work is performed in the context of a business process that is executed as a sequence of task executions called a case. Each task is executed by a specific actor and the BPM system is responsible that the correct tasks are performed in the correct order. Thereby the actual work happens outside the PAIS itself which only schedules tasks and checks completion [9].

However, this concept of a task in process modeling and mining in BPM—a unit of work is a single step executed by a single actor in a single case—fails to capture many facets of work that occur in practice. In organizations research, a well-defined (atomic) step in a process is called an activity or action [2, 20]. In contrast, a task is considered a slightly larger “unit of work” that has to be carried out to achieve an objective within the process, e.g., review CVs. Completing or executing a task often requires to perform multiple actions (e.g., download, open, take notes), not necessarily limited to a single case (e.g., all CVs received); these actions may be grouped differently depending on the actor(s) the task is assigned to.

This also has been acknowledged in the BPM field from several perspectives. Robotic Process Automation (RPA) uses task mining to identify how individual actors perform tasks by recording their desktop interactions revealing tasks spanning more than a single case, e.g., data entry from a spreadsheet to an information system [16]. Individual actors may batch actions in multiple cases, e.g., a manager reviewing and approving requests in different cases, which is still poorly supported by many PAIS [23]. Finally, actors often do not act independently from each other, multiple actors may perform work together even across multiple cases, e.g., the collaborative grading of student reports, and across multiple actions, e.g., delivering and installing a new washing machine. Despite this acknowledgment and many years of BPM research, there is no generally agreed definition of a task that captures such aspects of work in a process and that is compatible with the established process modeling and process mining concepts. In other words, the existing process mining and process modeling concepts are too simplistic.

In this paper, we investigate how to conceptualize task executions beyond the basic definition of an action performed by a single actor in isolated cases with the goal of capturing the various facets of tasks. Our approach combines event data analysis with conceptually modeling behavior in processes and actors as two behavioral dimensions simultaneously [10]: (1) the sequence of events recorded in a process case and (2) the sequence of events, across multiple process cases, in which an actor is involved. We use event graphs as introduced in [10] as data structure to model relations between events, cases, and actors as paths along cases and along actors over the same events; thereby escaping limitations of classical event logs (Sect. 3). In such a graph, a task execution emerges when a path along an actor meets a path along a case over one or more events.

We then perform a systematic, theoretical analysis of the types of task executions that can be expressed in an event graph depending on (1) how many paths along cases meet (2) how many paths along actors for (3) how many events (Sect. 4.1). From this theoretical analysis we derive a taxonomy of task execution types characterized by 5 different parameters (Sect. 4.2); the taxonomy describes 23 task execution types, several have not been described in literature before. We present methods for querying these task execution types in event graphs (Sect. 5) and evaluate the existence of these task execution types in the BPIC’17 and BPIC’14 event logs (Sect. 6). We specifically found that non-trivial task execution patterns over multiple steps, multiple cases, and even multiple actors frequently occur in two real-life event logs, as well as occurrences of several previously unknown patterns. We also observe changes in frequency of task execution patterns over time due to changes in the way actors do their work.

2 Related Work

Related research that also accounts for the more complex aspects of work beyond isolated cases has been conducted from several perspectives.

Process modeling literature studies actors performing work in terms of “resources” required for a task. Of the workflow resource patterns [24], only “Simultaneous Execution” and “Additional Resources” consider joint work by multiple actors. Only recently, actor behavior across multiple cases came into focus under batching across individual cases [23] and instance-spanning constraints [12]. Current BPM systems poorly support these phenomena and existing notations (e.g., BPMN) require extensions [13, 23] to support them; but actor behavior is never modeled explicitly. Synchronous proclets [11] allow modeling individual actor behavior across individual cases in a network of Petri nets, each describing a process or an actor [6], that dynamically synchronize on single transition occurrences. The same synchronization principle has also been adopted for DCR graphs [4]. We contribute to this stream of modeling research by showing that actor-case interactions themselves form complex task execution patterns over multiple actions, cases, and actors, that should be supported in modeling.

In process mining, social network mining [27] analyzes actor interactions but excludes the control-flow perspective. Other approaches are mining of team composition and work assignment [25], resource skills, utilization, and productivity [22] and resource availability [17]; these works assume tasks to be single actions. Task executions by the same actor over multiple actions can be discovered as local process models [5]. Task mining analyzes behavior that may transcend multiple cases [16] by tapping into desktop interaction logs of actors. These works are limited to single actors in isolated cases due to the use of event logs. For analyzing instance spanning constraints [28], batch activities [18, 19] and scheduled processes [26] process mining methods have been extended to consider inter-case relations; in these works actor behavior is described/modeled implicitly, whereas we analyze actor behavior explicitly.

Routines research [14, 20, 21] studies “work” in terms of a narrative network of actors interacting to achieve organizational goals. A narrative [20] is a path in the narrative network, i.e., a “coherent, time-ordered sequence of actions or interactions [for] accomplishing an organizational task” [14, 21]. An action pattern that occurs repeatedly at an individual actor is called a habit; a recurrent action pattern involving multiple actors is called a routine [2]. Habits and routines capture how actors accomplish their tasks. A central question in routine dynamics research is to identify such patterns in the narrative network and how they change [14] and is approached through field studies.

We complement prior work by transforming an event log into an event graph which can be understood as data-based representation of a narrative network. We use graph theory to detect patterns of task executions (i.e., habits and routines) and their changes over time. Our taxonomy of task executions extends and generalizes existing notions of tasks in BPM and process mining that are tailored towards either isolate cases, e.g., [5], or towards specific aspects of work behavior across cases, e.g., [12, 13, 16, 18, 19, 23, 26, 28].

3 Preliminaries

We first discuss relevant concepts of the conventional, single-dimensional representation of event data. We then show how these concepts can be translated to a multi-dimensional representation using a general data model based on labeled property graphs [10], which we use as a foundation for our work.

Single-Dimensional Representation of Event Data. A PAIS can record an action execution as an event in an event log. Each event records at least the action that occurred, the time of occurrence, and an entity identifier indicating on which entity or case the action occurred. Often, the actor executing the action is recorded as resource. Table 1 shows an example event log containing 10 events occurring on the same day.

Process mining [1] analyzes event data by grouping events w.r.t. a chosen case identifier attribute, e.g., a loan application document or a patient in a hospital. Ordering all events of a case by time yields the trace as a sequence of events. Grouping the events in Table 1 by Case yields the traces \(\langle e1,e2,e3,e4,e5\rangle \) and \(\langle e6,e7,e8,e9,e10\rangle \). A set of such traces is a traditional single-dimensional event log along the case perspective [10].

Classically the Resource attribute in Table 1 is considered as event attribute describing the event further. However, the resource (the actor) is an entity in its own right and we can also study the sequence of events along each resource, defining a second behavioral dimension in the data. Choosing Resource as case identifier yields a second event log with traces \(\langle e1,e2,e6,e7,e8 \rangle \) (Resource 1), \(\langle e3,e4,e9,e10 \rangle \) (R. 5), and \(\langle e5 \rangle \) (R. 29).

Each event in Table 1 is related to 2 entity identifiers: a case identifier and resource identifier. Generally, an event can have multiple case identifiers and/or multiple resource identifiers [10]. The relation of events to multiple entities results in different behavioral dimensions between events that cannot be adequately represented or analyzed using a single-dimensional event log representation.

Table 1. Event table example
Fig. 1.
figure 1

Event graph containing the events and entities from Table 1

Multi-dimensional Representation of Event Data. We use a labeled property graph (LPG) to represent multiple behavioral dimensions together over a set of events.

Graph databases use LPGs [3, Chapter 2] for modeling various entities (as nodes) and various relationship (as edges) between them. An event graph [10] is a specific LPG, which can be obtained from an event table: each event and each entity (i.e., cases and resources) is represented by a node with label Event or Entity. Event and entity nodes are connected through directed binary relationships: a CORR relationship from e to n defines that event e is correlated to entity n. A DF relationship from e to \(e'\) defines that event \(e'\) directly follows e from the perspective of a specific entity n to which e and \(e'\) are correlated (i.e., e occurs before \(e'\) and there is no other event between them). Each node and relationship can hold a number of key-value pairs referred to as properties, e.g., whether an entity has \( Type = Case \) or \( Type = Resource \). As short-hand notation we write \((e,e')^x\) for a DF-relationship in G from e to \(e'\) of type \(x \in \{c,r\}\) (i.e., case or resource). See [10, 15] for formal details.

The example in Fig. 1 shows the event graph derived from Table 1: each square (white) node is an event node; each circle is an entity node of the corresponding type (blue for Case, red for Resource). CORR relationships are shown as dashed edges, e.g., e1, e2, e3, e4, e5 are correlated to case c3 and e1, e2, e6, e7, e8 are correlated to resource r1. DF relationships are shown as solid edges. The DF-relationships between the events correlated to the same entity form a DF-path for that entity; the graph in Fig. 1 defines 2 DF-paths for case entities, e.g., \(\sigma _{c3} = \langle (e1,e2)^c, (e2,e3)^c, (e3,e4)^c, (e4,e5)^c \rangle \) and 3 DF-paths for resource entities, e.g., \(\sigma _{r1} = \langle (e1,e2)^r, (e2,e6)^r, (e6,e7)^r, (e7,e8)^r\rangle \).

In the graph in Fig. 1, we observe which resource executed which action in which case, e.g., r1 performed A in c3 (at event e1). However, we can also see that DF-paths for case and resource “flow in parallel” over multiple actions, e.g., \(\sigma _{c3}\) and \(\sigma _{r1}\) both contain \((e1,e2)^x\) meaning r1 performed A and B consecutively in c3 (events e1 and e2) forming a larger unit of work captured in Fig. 1 as the connected subgraph of events \(\{e_1,e_2\}\) and the two DF-relationships between them. We can observe more such subgraphs of consecutive events along the same resource and case in Fig. 1, i.e., larger units of work.

4 Task Execution Patterns

We observed in Sect. 3 that event graphs reveal “units of work” that are not just individual events but are connected subgraphs where resources and case meet along several subsequent events. In this section, we conceptualize these connected subgraphs as task executions and explore in which forms they can manifest. We explain our approach in Sect. 4.1 and present a novel taxonomy of task execution patterns and their interpretation in Sect. 4.2. We thereby make use of standard graph theory concepts; see [15] for formal definitions.

4.1 Exploring Event Graphs for Forms of Task Executions

In the event graph in Fig. 1, we initially observe two ways in which a task execution manifests. (1) A resource follows a case over multiple events, e.g., e6, e7, e8; these event nodes form a subgraph induced by one DF-path \(\sigma _c\) of a case entity and one DF-path \(\sigma _r\) of a resource entity as follows: \(\sigma _c\) and \(\sigma _r\) enter the subgraph together (e.g., at e6) and leave the subgraph together (e.g., at e8) and are both continuous in this subgraph (all events of the DF-path are within the graph). (2) We also observe an execution of a classical task in the event graph in Fig. 1 consisting of only a single event, e.g., e5; the path of the resource and the path of the case synchronize for this step only, i.e., the subgraph is a single node.

We explored whether other subgraphs can be characterized by searching for different configurations of the following concepts in the subgraph: DF-path of a resource, DF-path of a case and their synchronization. We identified the following parameters and values:

  1. 1.

    The subgraphs in Fig. 1 contain at most one case DF-path. Are there (meaningful) execution patterns which have multiple case DF-paths?

  2. 2.

    If multiple case DF-paths are in the subgraph: are the case DF-paths disjoint (i.e., each event belongs to exactly one case) or can case DF-paths synchronize (i.e., have a shared event)?

  3. 3.

    The subgraphs in Fig. 1 contain at most one resource DF-path. Are there (meaningful) execution patterns which have multiple resource DF-paths?

  4. 4.

    If multiple resource DF-paths are in the subgraph: are the resource DF-paths disjoint (i.e., each event belongs to exactly one resource) or can resource DF-paths synchronize (i.e., have a shared event)?

  5. 5.

    In Fig. 1, all DF-paths are continuously in the subgraph (i.e., they only enter once and leave once). Are there (meaningful) execution patterns where a DF-path also temporarily leaves the subgraph and re-enters later?

4.2 Taxonomy of Task Execution Patterns

The above questions define a parameter space that allows for a set of different subgraph configurations that can be systematically described within the bounds of this space. We explored this parameter space by modeling abstract task execution patterns as subgraphs of event nodes of an event graph. We explain the patterns found, introduce a taxonomy to structure them systematically, and evaluate whether each pattern has real-world interpretation and whether it was already discussed in literature.

We considered subgraphs that emerge from multiple DF-paths synchronizing as some “unit of work”. We identified the following necessary conditions for \(n \ge 2\) DF-paths \(\varSigma = \{\sigma _1,\ldots ,\sigma _n\}\) to induce a subgraph G that describes a task execution: (T1) any two event nodes in G are connected via at least one DF-path \(\sigma \in \varSigma \), (T2) for each event node e in G exists a case DF-path \(\sigma _c \in \varSigma \) and a resource DF-path \(\sigma _r \in \varSigma \) that contain e (i.e., G is traversed by at least one case and one resource DF-path), (T3) there is at least one DF-path \(\sigma _1 \in \varSigma \) that is continuously in G (enters G once and leaves G once). We identified two stricter necessary conditions of task executions defining a spectrum:

  • Graph-structure based task execution: in the strictest form of task executions the subgraph G is induced by \(\varSigma = \{ \sigma _1,\ldots ,\sigma _n\}\) and satisfies (T1)–(T3) and additionally (T2’): each event node \(e \in G\) is in each DF-path \(\sigma \in \varSigma \) (i.e., all paths always synchronize in all events in G but some paths may leave in between). As a consequence, all paths converge at the first event of the continuous DF-path \(\sigma _1\) in G and diverge at the last event of \(\sigma _1\) in G, see (T3). All subgraphs in Fig. 2 have this property.

  • Domain-knowledge based task execution: the paths in G do not converge and diverge at the same start and end events of G, yet are coherent. In addition to (T1)–(T3) the following condition holds: (T3’) all DF-paths are continuously in G. All subgraphs in Fig. 3 have this property. T3’ requires domain-knowledge to decide whether DF-paths \(\varSigma \) form a valid subgraph G describing a task execution.

Next, we differentiate different types of subgraphs further by the following 4 parameters over a subgraph G (i) the number of case DF-paths, (ii) the number of resource DF-paths, and (iii) how often they enter and leave, and (iv) how they synchronize in G. We start with graph-structure based tasks executions.

Taxonomy of Graph-Structure-Based Task Execution Patterns. Figure 2 shows the graph-structure-based task execution patterns arranged according to the parameters identified in Sect. 4.1.

Fig. 2.
figure 2

Taxonomy of graph-structure-based task execution patterns

The taxonomy categorizes the patterns on the x-axis based on them containing a single DF-path from a single case (SC) or multiple DF-paths from multiple cases (MC). The patterns are categorized along the y-axis based on them containing a single DF-path from a single resource (SR) or multiple DF-paths from multiple resources (MR). Our taxonomy thus has four major quadrants: (SR,SC), (SR,MC), (MR,SC), (MR,MC).

Next, subgraphs within each of these quadrants are arranged based on the configuration of the paths they contain. A path is (1) single step (s) if it only contains a single event node within the subgraph, (2) continuous (c) if the path contains \(>1\) event node and is continuously within the subgraph, i.e., it only enters and leaves once, and (3) interrupted (i) if it contains \(>1\) event node and it leaves and enters the subgraph more than once. A single step path and an interrupted path are both non-continuous.

The set of resource DF-paths are configured separately from the set of case DF-paths. For each quadrant the subgraphs are arranged on the x-axis into columns for continuous and non-continuous case DF-paths and on the y-axis into rows for continuous and non-continuous resource DF-paths.

The bold letters in Fig. 2 indicate the short-hand notation we use in the following, e.g., MR\(_{\mathsf{i}}\),SC\(_{\mathsf{c}}\) denotes pattern P10: Multiple Resource interrupted, Single Case continuous.

Structural Properties. A fundamental property of graph-structure-based patterns is that all entities (cases and resources) are involved in each step of the task execution.

A fundamental property of our taxonomy is that adding/removing resource or case DF-paths results in a corresponding pattern in another quadrant, e.g., adding more resource DF-paths to P3 returns P11 and adding both more resource and case DF-paths returns P15.

A second fundamental property of the taxonomy is that it distinguishes elementary task execution patterns that cannot be decomposed further into subgraphs fitting the parameter space (1, 4, 5, 8, 9, 12, 13, 16) and non-elementary task execution patterns that are compositions thereof (2, 3, 6, 7, 10, 11, 14, 15). By combining instances of elementary patterns along either a continuous resource DF-path or a continuous case DF-path, we end up with a non-elementary instance belonging to the same quadrant. For instance, if we take three instances of P1 for the same case c and resource r and combine them along the case DF-path, i.e., the case is continuous in the composition (only leaves and enters once) while the DF-path for r can be arbitrary, we end up with P2.

The subgraphs in Fig. 2 that are not single-step are only one of the many variations possible within each specific cell. For instance, if we take two instances of P1 and one instance of P4 and again combine along a continuous case DF-path, we end up with a variation of P2.

Conceptual Evaluation. Within each quadrant, the top-left cell (1, 5, 9, 13) describes a single-step task. The bottom-right cell (4, 8, 12, 16) describes a task continuously involving all cases and resources over multiple steps. The top-right and bottom-left cell of a quadrant describes tasks interrupted by either the resource(s) or the case(s), respectively, e.g., a resource attending an urgent task in another case or a resource requiring input from another resource not involved in the task. While some single step and continuous patterns have been observed in other works [5, 6, 18], at present, no work has systematically studied interrupted patterns.

SR,SC patterns portray tasks that can only be executed for a single case by a single resource at a time, e.g., writing out a speeding ticket (single step, i.e., P1), or finalizing a loan offer: \(\langle \)call client about loan offer, create offer, send offer\(\rangle \) (multiple steps, i.e., P4). P2 describes an actor interrupting and returning to a larger unit of work in a case, e.g., compiling a report that is interrupted by other duties. P3 describes an actor continuously being concerned with the same case while other actors have to be involved as well, e.g., due checks based on a four-eyes principle. P1 has been extensively studied in traditional process analysis and P4 has been observed in [5].

SR,MC patterns portray tasks where a single resource handles multiple cases together, such as batch processing, e.g., lecturing a classroom of students (single step, i.e., P5) or analyzing a batch of blood samples, which involves transporting them to the lab, scanning them and analyzing them (multiple steps, i.e., P8). Only single step (P5) and multi-step batching (P8) have both been observed in [18].

Conversely, MR,SC patterns portray tasks where multiple resources work on a single case together. For instance in collaborative decision making, e.g., a master’s defense (single step, i.e., P9), or due to practical or technical requirements, e.g., delivering, carrying, and installing a washing machine requires two people (multiple steps, i.e., P12). Only P9 has been observed in [6], where queues and conveyor belts synchronize as distinct resources for the same (single step) events.

Finally, MR,MC patterns portray tasks executed by multiple resources on multiple cases together. While theoretically possible, it is very unlikely multiple cases and resources synchronize that strongly in real-life processes especially over multiple steps. A very relaxed interpretation of such a task would be the co-chairing of a panel for a conference (single step, i.e., P13). We discuss more realistic manifestations of MR,MC task configurations when discussing domain-knowledge based patterns (shown in Fig. 3). At present, no work has systematically studied patterns involving both multiple cases and multiple resources.

Taxonomy of Domain-Knowledge-Based Task Execution Patterns. So far we discussed our taxonomy for the strictly synchronized graph-structure based patterns shown in Fig. 2. We now discuss our taxonomy for the domain-knowledge based patterns at the other end of the spectrum shown in Fig. 3. The domain-knowledge based task execution patterns allow that only some case/resource paths synchronize per event but require all paths to be uninterrupted; this yields “units of work” that are more distributed. While there is a considerable amount of other subgraphs that fit Fig. 3, we limit ourselves to discuss those that require only basic domain knowledge.

Fig. 3.
figure 3

Taxonomy of domain-knowledge-based task execution patterns

Structural Properties. We first observe that we can derive all the configurations in Fig. 3 by composing multiple of the same elementary pattern from Fig. 2. For instance, P7’ and P10’ are essentially multiple instances of P1 composed along the resource and case DF-path, respectively. P8’ and P12’ can be composed similarly using instances of P4. P14’ can be composed of P5 instances along the case DF-path and P15’ with P9 instances along the resource DF-path. In Fig. 3, P16’ is composed of multiple instances of P1, both along the resource and case dimension. The pattern properties of this particular cell 16’ allow for basically every combination of elementary patterns composed along both the resource and case dimension. In contrast to graph-structure-based non-elementary patterns, the conditions for composing the patterns in Fig. 3 require domain knowledge.

Conceptual evaluation. In Fig. 3, P7’, P8’, P15’ portray different forms of sequential batching, i.e., the same step is executed for a sequence of cases one after the other, with a single resource (P7’, SR\(_{\mathsf{c}}\),MC\(_{\mathsf{s}}\)), with multiple resources (P15’, MR\(_{\mathsf{c}}\),MC\(_{\mathsf{s}}\)) and with one resource executing multiple steps per case (P8’, SR\(_{\mathsf{c}}\),MC\(_{\mathsf{c}}\)). Sequential batching that involves a single resource (P7’ and P8’) has been observed in [18].

We also identify a subset of patterns in which multiple resources are separately involved, each performing a set of steps for a case after which it moves to the next resource for the next step(s) in a pipe-lined fashion (e.g., P10’ and P12’, MR\(_{\mathsf{s,c}}\),SC\(_{\mathsf{c}}\)), resembling a factory/production type of setting. Such a setting could also be realized for (simultaneous) batches of cases (P14’, MR\(_{\mathsf{s}}\),MC\(_{\mathsf{c}}\)). At present, no work has systematically studied these patterns.

P16’ (MR\(_{\mathsf{c}}\),MC\(_{\mathsf{c}}\)) is a combination of the former two types; it portrays sequential batching being performed in a pipe-lined fashion. In Sect. 6 we show that we can identify an instance of P16’ in the BPIC’17 data.

5 Detecting Task Execution Patterns in Event Graphs

In this section, we present a technique for detecting instances of the task execution patterns of Sect. 4 as subgraphs in an event graph. We query the graph to retrieve all instances (subgraphs) of elementary task execution patterns (P1, P4), which we then materialize and store as new “task instance” nodes (Sect. 5.1). Later, we use these task instance nodes for querying elementary task instances (Sect. 5.2) and for detecting and querying non-elementary task instances (Sect. 5.3). All conceptual queries presented here are implemented in the graph query language Cypher on the graph database Neo4j; see Sect. 6 and [15].

5.1 Modeling Elementary Task Instances as High Level Events

We assume the data to be given in an event graph G (see Sect. 3). We describe how to detect in G subgraphs describing task instances (TIs) of elementary patterns P1 and P4 and how to materialize these as new nodes with label TI in the event graph. Finally, we lift the DF-edges from the Event nodes in the task instance subgraph to the corresponding TI node. Figure 4 shows the result of constructing the TI nodes and all corresponding relationships for the event graph from Fig. 1.

Fig. 4.
figure 4

Event graph containing the \(h_{ti}\) nodes constructed from the events from Fig. 1

We first search the graph for all pairs of events \((e_i,e_{i+1})\) that have both a case DF-edge \((e_i,e_{i+1})^c\) and a resource DF-edge \((e_i,e_{i+1})^r\) and create a new “joint” DF-edge \((e_i,e_{i+1})^j\). We then detect any task instance of elementary P4 as a maximal sequence of events \(ti= \langle e_m, ..., e_n \rangle \) where for each \(e_i, e_{i+1} \in ti\) there exists \((e_i,e_{i+1})^j\) and there exists no joint DF-edges \((e',e_m)^j\) or \((e_n,e')^j\). An instance of P1 is \(ti=e\) without \((e',e)^j\), \((e,e')^j\). We materialize ti as a new node \(h_{ti}\) with label TI and a contains relationship from \(h_{ti}\) to each \(e \in ti\). We treat \(h_{ti}\) as a “high-level” event and set properties \(h_{ti}.time_{start} = e_m.time\) and \(h_{ti}.time_{end} = e_n.time\) and correlate \(h_{ti}\) to each entity n to which \(e_m,...,e_n\) are correlated by adding CORR relationships. Finally, we sort all TI nodes \(h_1,\ldots ,h_k\) correlated to the same entity n of type x by \(time_{end}\) and introduce corresponding DF-edges \((h_i,h_{i+1})^x\) of type x, which lifts DF-paths from events to task instances. See [15] for the Cypher query. For example, we detect in Fig. 1 \(ti_1 = e_1,e_2\) and \(ti_4 = e_6, e_7, e_8\) resulting in nodes h1 and h4 and \((h1,h4)^r\) in Fig. 4. Instances of the other elementary patterns (5, 8, 9, 12, 13, 16) can be found by checking for multiple \((e_i,e_{i+1})^c\) and/or \((e_i,e_{i+1})^r\) relationships over all events in ti.

The elementary task execution T described by an elementary task instance \(ti = e_m,...,e_n\) is its sequence of action names \(T = e_m.action,...,e_n.action\); we set \(ti.task = T\) for easier querying.

5.2 Querying Elementary Task Instances

Having materialized all elementary task instances into TI nodes, we can query the graph of TI nodes and DF-relationships between them for insights. This allows for the following kinds of queries: (1a) Retrieve a subset of TIs based on a specific property, e.g., all TIs correlated to r1 (h1 and h4 in Fig. 4), or (1b) the subset of TIs of the most frequently executed tasks (h2 and h5 in Fig. 4). (2) Query for DF-paths between TI nodes, for instance the DF-path of TI nodes correlated to a specific case (\(\langle h4,h5 \rangle \) for c1 in Fig. 4) or to a specific resource (\(\langle h3 \rangle \) for r29 in Fig. 4). (3) Querying the DF-path of TIs of a specific resource on a specific day could give insight into habits [2] followed by this resource. Next, we query DF-paths between TI nodes along cases and resources to detect larger, non-elementary task execution patterns.

5.3 Querying Non-elementary Task Instances

We materialized elementary task instances as TI nodes connected through DF-edges in Sect. 5.1. We now show how to detect instances of non-elementary task execution patterns (NTI for short) as shown in Figs. 2 and 3 as compositions of TIs by querying for paths of TI nodes along DF-edges.

We detect any interrupted NTI (Fig. 2) involving resources \(r_1,...,r_l\) and cases \(c_1,...,c_m\) by querying for a maximal sequence of TI nodes \(h_1,...,h_k\) with \((h_i,h_{i+1})^{r_1},...,(h_i,h_{i+1})^{r_l}\) or \((h_i,h_{i+1})^{c_1},...,(h_i,h_{i+1})^{c_m}, 1\le i < k\) so that all underlying Event nodes are correlated to the same resource entities \(n_{r_1},...,n_{r_l}\) and case entities \(n_{c_1},...,n_{c_m}\). For detecting the domain-knowledge-based NTIs (Fig. 3), we also query a maximal sequence of TI nodes along either the resource-path or case-path, but this time require only one of the entity types (cases or resources) to be correlated to all TI nodes. For the patterns that describe batching behavior (7’, 8’, 15’, 16’), we additionally require all TI nodes \(h_1,\ldots ,h_k\) to describe the same elementary task, i.e. \(h_i.task = h_{i+1}.task\), and a maximum time difference \(\varDelta t_{batch}\) between two subsequent TIs, i.e. \(h_{i+1}.time_{start} - h_{i}.time_{end} < \varDelta t_{batch}\); see [15] for a Cypher query that detects NTIs of sequential batching P8’. Using such domain knowledge for a time gap is commonly done in batch identification [18]. For patterns 10’, 12’, 14’ and 16’ additional domain-knowledge is required to determine if multiple TIs along the case-path form a task execution. Examples of NTIs of P2, P3, P7’ and P8’ are shown in Fig. 5.

6 Evaluation

We performed an exploratory analysis to investigate the occurrence of task execution patterns in two real-life event logs BPIC’14 [7] and BPIC’17 [8]. We realized the approach of Sect. 5 in Cypher queries invoked via Python scripts on the Graph DB Neo4j; available at https://github.com/multi-dimensional-process-mining/event-graph-task-pattern-detection. Creating all TI constructs (Sect. 5.1) took 44.57s for the BPIC’14 log and 68.81s for the BPIC’17 log on an Intel i7 CPU @ 2.2 GHz machine with 32 GB RAM.

We applied queries for detecting all patterns that did not require specific domain knowledge, i.e., all patterns except 10’, 12’, 14’ and 16’, in the event graphs of the BPIC’14 and BPIC’17 data. We found TIs of patterns 1, 2, 3, 4, 7’ and 8’; TIs involving multiple resources and/or multiple cases per event simply do not occur in the data.

Fig. 5.
figure 5

Instances of task execution patterns 1, 2, 3, 4, 7’ and 8’ found in the BPIC’17 data

Figure 5 shows for the BPIC’17 data for each detected pattern type a task instance annotated with resource and case identifiers. The P1 instance in Fig. 5 shows actor r14 executing a single step in a case before moving to a different case and the P4 instance shows r95 executing four actions in a case before moving to the next. The P2 instance shows r3 executing ten actions in a case with an interruption after the first step \(W_2\), executing the same action \(W_2\) also in another case before completing the task in the former case. The P3 instance shows r107 continuously working on a case, performing the same task execution (\(\langle V_4,I_1,I_2,A_6 \rangle \)) twice while r128 performs other actions in between. The P7’ and P8’ instances show r35 and r126 performing the same actions \(W_2\) and \(\langle I_4,V_1,V_2,A_9 \rangle \), respectively, for the same five cases in a sequential batch. We observe a min/avg/max time difference of 0/12/613 min and 0/4.8/512 min between any two subsequent steps in a batch for BPIC’17 and BPIC’14, respectively.

Table 2. General task execution pattern measurements for BPIC’17 and BPIC’14 with <30 m between all TIs in NTI or >30 m between at least one TI in NTI

Table 2 shows the occurrence and other statistics of patterns 1, 2, 3, 4, 7’ and 8’ in the BPIC’14 and BPIC’17 data. We observe that P4 (multiple actions by same actor) makes up for the largest share of events in both logs (77.1% and 98.6% for BPIC’14 and BPIC’17, respectively) and has an average duration of 21.6 and 2.9 min and an average waiting time of 16.5 and 22.6 min between every pair of successive P4 instances for BPIC’14 and BPIC’17, respectively. It is therefore likely that these instances are composed of a single task execution as opposed to multiple tasks executions, rejecting the general assumption that a task execution is a single step executed by a single actor in a single case. We see P3 (actors interrupt a task execution and switch to another case) more often in BPIC’14 (1.6% + 8.6%) than in BPIC’17 (0.2%), showing that actors work differently in different processes. We observe that almost half of the interruptions P2 in BPIC’14 last more than 30 min, indicating that actors often switch context for long periods at a time. Task executions interrupted by waiting for another actor (P3) make up 10.2% of the BPIC’14 data, with a minor part lasting longer than 30 min, indicating either very long breaks or tasks in another process context not recorded in the data. Of these P3 interruptions, 60% lasted less than 10 s. TIs of P2 and P3 contain on average 2 other elementary TIs of P1 or P4 (last column). Executions of batch patterns P7’ and P8’ comprise 0.2+7.6% of the BPIC’17 data; although multi-step batches (P8’) have \(>5\) times as many steps as single-step batches (P7’) they only take twice as long in duration; indicating large deviations in executions of batching tasks in the BPIC’17 data. We observe a mean duration of 21.6 m for analyzing problems in IT components (BPIC’14) and 2.9 m for handling loan applications (BPIC’17) for elementary TIs, confirming our intuition that a task execution is short.

Fig. 6.
figure 6

Event graph of loan applications 1111458873, 1372864243, 206394826, 1877008365 and 1992048266 in BPIC’17 revealing five different task execution patterns

Fig. 7.
figure 7

Trends and subgraphs of four elem. task executions T in BPIC’17 showing concept drift

To investigate the manifestation of different task execution patterns on a case level, we visualized the events of five process executions in Fig. 6, revealing instances of five different task execution patterns P1, P4, P7’, P8’, and notably the most complex P16’. In Fig. 6, we observe six instances of P1 (one separately and five as part of P7’). All other task instances are of type P4, meaning that most actors perform tasks over multiple steps in the same case before handing the case over to the next actor. Existing process discovery techniques lack the resource perspective necessary to actually structure a trace into a sequence of P4 (and P1) instances, required to reveal these handovers of work. We observe r1 executing \(\langle A_4,A_8,H_1,H_3,W_1,A_3 \rangle \) in a batch for five cases in row (P8’) and r19 executes \(W_2\) for the same fives cases in a batch (P7’) directly afterwards. The instances of P8’ and P7’ together form an instance of P16’ along the cases. While domain knowledge is required to verify whether this instance of P16’ is intentional, we show that structured task executions involving multiple resources and cases exist in the data. This particular type of structured collaboration over multiple steps suggests a routine; confirming this requires further investigation.

We finally explored whether we find evidence for task executions changing over time, as stipulated in [14]. We queried the frequency of all task executions in BPIC’17 over time; Fig. 7 shows 4 selected task executions \(T_a\)\(T_d\) and their frequency. \(T_a\) changes into \(T_b\) in July 2016 by changing only the first action from \(V_5\) (lifecycle complete) to \(V_4\) (lifecycle abort), suggesting a minor change in directive but not a change in the way people work. We see similar lifecycle changes in other task executions of the BPIC’17 data (not shown here). Around the same time, \(T_c\) and \(T_d\) emerge, which both contain \(T_a\) at the end suggesting that also the way people worked changed. In total we found 6 more task executions with this characteristic that also show this change.

7 Conclusion

In this paper, we considered event data along both the case and actor dimension. Doing this in event graphs revealed different ways in which non-trivial tasks manifest as patterns in the data and uncovered dynamics that have not been described before in established process mining and modeling; these range from interruptions and batching to the more complex production-type settings.

This lays the foundation for a fundamentally different way of conceptualizing processes as the interplay of cases and actors engaging in recurrent patterns of work, i.e., routines and habits as studied in routines research [2]. We found evidence in existing real-life event logs that such patterns make up the larger share of events. We believe the taxonomy of task execution patterns aids in task mining and many other process analysis problems where actor and case perspectives meet. For example, the relation between certain actor behavior and process performance or process outcomes, the adherence to queuing policies [26] as well as questions related to the study of complete systems of processes, resources and queues together [6].

Our approach is limited in that our taxonomy does not cover the entire spectrum of possible task patterns; we currently do not account for graph-structure-based patterns with a less strict synchronization of paths, while real-life manifestations thereof do exist. This includes other possible patterns or aspects that may have been overlooked. Some patterns of the taxonomy involving multiple actors and cases were not found in the data as such data was not available. However, such patterns do exist, e.g., delivering and installing a washing machine by two actors does happen. A next step is finding and exploring other data for the existence of these patterns. Important to note is that our work does not identify a task itself but only the patterns of actions used to achieve a task. Finding out what these patterns mean and what real-life tasks they portray is future work.