
1 Introduction

In recent years, researchers have proposed a variety of predictive business process monitoring (PBPM) techniques based on machine learning (ML) [6] to improve the performance of operational business processes [4]. PBPM is a class of techniques that aim to predict future process characteristics of running process instances [12], such as next activities, next timestamps, or process-related performance indicators. PBPM techniques produce such predictions through predictive models, which are in turn constructed from historical event log data.

A current trend in PBPM is to apply deep neural networks (DNNs), which can learn more accurate predictive models from event log data than "traditional" ML algorithms such as probabilistic automata [7]. DNNs belong to the ML sub-field of deep learning (DL) and achieve this by identifying intricate structures in high-dimensional data through multiple levels of representation [11].

Existing DL-based PBPM techniques often rely on DNN architectures consisting of out-of-the-box constructs like layers with a “vanilla” long short-term memory (LSTM) cell [9] or state-of-the-art loss functions for parameter learning.

Event logs can be seen as sequences of events in continuous time with irregular intervals (i.e., varying elapsed time between consecutive events). We argue that these time intervals are informative in PBPM: intuitively, they reflect the human behavior of executing business processes. Thus, a time-aware PBPM technique that considers information on time intervals could achieve a higher predictive quality. Currently, time information is exploited only via hand-crafted control-flow features fed as inputs into "vanilla" LSTM cells [15]. To better account for the time information in event log data, we propose a new PBPM technique using time-aware LSTM (T-LSTM) cells. Inspired by the work of Baytas et al. [2], T-LSTM extends the "vanilla" LSTM cell by incorporating the elapsed time between consecutive events to adjust the memory state.

Furthermore, the problem of next activity prediction is commonly modeled as a supervised multi-class classification problem. Since the distribution of activities in event logs is commonly skewed, we additionally introduce cost-sensitive learning to address the inherent class imbalance.

The main contributions of this work are summarized below:

  • We introduce a time-aware LSTM model for the tasks of predicting next activities and timestamps in PBPM

  • We tackle the problem of skewed class distributions via cost-sensitive learning

We evaluate the effectiveness of our proposed techniques by conducting experiments for the next activity and timestamp prediction on publicly available benchmark event logs commonly used for PBPM.

The remainder of the paper is structured as follows: Sect. 2 presents related work on DL-based next activity and timestamp prediction. Section 3 introduces preliminaries and the concept of an LSTM. Sections 4 and 5 describe the T-LSTM architecture and our experimental setup, respectively. In Sects. 6 and 7, we present and discuss our results. Section 8 concludes the paper and outlines future research directions.

2 Related Work

Inspired by the field of natural language processing (NLP), Evermann et al. [7] applied recurrent neural network-based and LSTM-based DNN architectures to predict the next activity and the next sequence of activities in PBPM. They used word embeddings to encode the activities of an event log's process instances.

Navarin et al. [14] used a “vanilla” LSTM-based DNN architecture for predicting the completion time of running process instances. They one-hot encoded the activity attributes, computed temporal control-flow attributes, and considered additional real-valued or categorical context attributes.

Tax et al. [15] proposed a multitask learning approach using "vanilla" LSTM cells for next activity and next timestamp prediction. Like [14], they one-hot encoded the activities and computed temporal control-flow features; however, they did not consider additional data attributes. This work acts as a baseline for a variety of other techniques, such as [18].

Khan et al. [10] introduced memory-augmented neural networks (MANNs) in PBPM. MANNs reduce the number of trainable parameters. The network architecture consists of an externalized state memory and two "vanilla" LSTM cells that manipulate the memory: one LSTM cell acts as an encoder and the other as a decoder. In terms of predictive quality, their approach is comparable to the one presented in [15].

Camargo et al. [5] extended the implementation of [15] and additionally fed the resource attribute into the DNN model. Moreover, instead of one-hot encoding, they applied embeddings, as proposed by Evermann et al. [7].

Taymouri et al. [16] introduced generative adversarial networks (GANs) for next activity and timestamp prediction. The network architecture comprises two "vanilla" LSTM cells: one for the generator and one for the discriminator.

To date, several studies have investigated DNN-based PBPM techniques, but none of the related works proposes a DL architecture that explicitly models the elapsed time between two successive events. We address this gap by adapting time-aware LSTM cells [2]. Further, Mehdiyev et al. [13] tackle the class-imbalance problem in the context of DNN-based next activity prediction through a second neural network, namely a radial basis function network, which generates semi-artificial data of the minority class in the pre-processing phase. In contrast, we adapt cost-sensitive learning to investigate the class-imbalance problem for DL architectures comprising T-LSTM cells.

3 Background

3.1 Preliminaries

Definition 1 (Event, Trace, Event Log)

An event is a tuple \((c, a, ts)\) where c is the case id, a is the activity (label) and ts is the timestamp. A trace is a non-empty sequence \(\sigma = \langle e_{1}, \ldots , e_{\vert \sigma \vert } \rangle \) of events such that \(e_{i}.c = e_{j}.c\) and \(e_{i}.ts \le e_{j}.ts\) for all \(1 \le i < j \le \vert \sigma \vert \). An event log L is a set \(\{\sigma _{1}, \ldots , \sigma _{\vert L \vert }\}\) of traces. A trace can also be considered as a sequence of vectors containing derived control-flow information or features. Formally, \(\sigma =\left\langle \mathbf {x}^{(1)}, \mathbf {x}^{(2)}, \ldots , \mathbf {x}^{(\vert \sigma \vert )}\right\rangle \), where \(\mathbf {x}^{(t)} \in \mathbb {R}^{{n} \times 1}\) is a vector, the superscript indicates the time order in which the events happened, and n is the number of features derived for each event.

Definition 2 (Prefix and Label)

Given a trace \(\sigma =\left\langle e_{1},\dots , e_{k}, \dots , e_{\vert \sigma \vert }\right\rangle \), a prefix of length k is the non-empty sequence \(f_{p}^{(k)}(\sigma )=\langle e_{1},\dots , e_{k}\rangle \), with \(0< k < \vert \sigma \vert \). The next activity label for a prefix of length k is defined as \(f_{l,a}^{(k)}(\sigma )= e_{k+1}.a\), whereas the next timestamp label is defined as \(f_{l,ts}^{(k)}(\sigma )= e_{k+1}.ts\). This definition also holds for an input trace represented as a sequence of vectors. For example, for \(\sigma =\langle \mathbf {x}^{(1)}, \mathbf {x}^{(2)}, \mathbf {x}^{(3)} \rangle \), the tuple of all possible prefixes is \(\langle \langle \mathbf {x}^{(1)}\rangle , \langle \mathbf {x}^{(1)},\mathbf {x}^{(2)}\rangle \rangle \), the tuple of all possible next activity labels is \(\langle e_2.a, e_3.a \rangle \), and the tuple of all possible next timestamp labels is \(\langle e_2.ts, e_3.ts \rangle \).
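To make the prefix and label extraction concrete, the following Python sketch (not taken from our implementation; the event field names are placeholders) generates all prefixes of a single trace together with their next activity and next timestamp labels:

```python
# Minimal sketch of prefix and label generation following Definition 2.
# The event field names ("case", "activity", "ts") are assumptions.
from typing import List, Tuple

Event = dict  # e.g. {"case": "17", "activity": "Take in charge ticket", "ts": 1601894400.0}

def prefixes_and_labels(trace: List[Event]) -> List[Tuple[List[Event], str, float]]:
    """Return (prefix, next activity label, next timestamp label) for every 0 < k < |trace|."""
    samples = []
    for k in range(1, len(trace)):      # prefix lengths 1 .. |trace| - 1
        prefix = trace[:k]              # <e_1, ..., e_k>
        next_event = trace[k]           # e_{k+1}
        samples.append((prefix, next_event["activity"], next_event["ts"]))
    return samples
```

Applying this routine to every trace in the training portion of an event log yields the prefix samples used for both prediction tasks.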

3.2 Long Short-Term Memory Cells

Most of the DNN architectures proposed for the next activity and timestamp prediction in PBPM [17] use “vanilla” LSTM cells [9]. LSTMs belong to the class of recurrent neural networks [11] and are designed to handle temporal dependencies in sequential prediction problems [3].

Given a sequence of inputs \(\sigma =\langle \mathbf {x}^{(1)}, \mathbf {x}^{(2)}, \mathbf {x}^{(3)}, ..., \mathbf {x}^{(k)}\rangle \), an LSTM computes a sequence of outputs \(\langle \mathbf {h}^{(1)}, \mathbf {h}^{(2)}, \mathbf {h}^{(3)}, ..., \mathbf {h}^{(k)}\rangle \) via the following recurrent equations:

\(\begin{aligned} \mathbf {f}_{g}^{(t)} &= \sigma \big (\mathbf {W}_{f}\mathbf {x}^{(t)} + \mathbf {U}_{f}\mathbf {h}^{(t-1)} + \mathbf {b}_{f}\big ), \\ \mathbf {i}_{g}^{(t)} &= \sigma \big (\mathbf {W}_{i}\mathbf {x}^{(t)} + \mathbf {U}_{i}\mathbf {h}^{(t-1)} + \mathbf {b}_{i}\big ), \\ \mathbf {\tilde{c}}^{(t)} &= \tanh \big (\mathbf {W}_{g}\mathbf {x}^{(t)} + \mathbf {U}_{g}\mathbf {h}^{(t-1)} + \mathbf {b}_{g}\big ), \\ \mathbf {o}_{g}^{(t)} &= \sigma \big (\mathbf {W}_{o}\mathbf {x}^{(t)} + \mathbf {U}_{o}\mathbf {h}^{(t-1)} + \mathbf {b}_{o}\big ), \\ \mathbf {c}^{(t)} &= \mathbf {f}_{g}^{(t)} \circ \mathbf {c}^{(t-1)} + \mathbf {i}_{g}^{(t)} \circ \mathbf {\tilde{c}}^{(t)}, \\ \mathbf {h}^{(t)} &= \mathbf {o}_{g}^{(t)} \circ \tanh \big (\mathbf {c}^{(t)}\big ) \end{aligned}\)    (1)

\(\{\mathbf {U}_{f, i, g, o}, \mathbf {W}_{f, i, g, o}, \mathbf {b}_{f, i, g, o}\}\) are trainable parameters, \(\sigma \) denotes the logistic sigmoid function, \(\circ \) denotes the Hadamard product (element-wise product), and \(\mathbf {h}^{(t)}\) and \(\mathbf {c}^{(t)}\) are the hidden state and the cell memory of an LSTM cell. An LSTM cell uses its gates to manage these states over time and thereby avoids the problem of exploding/vanishing gradients for longer sequences [3]: \(\mathbf {f}_{g}^{(t)}\) (forget gate) determines how much of the previous memory is kept, \(\mathbf {i}_{g}^{(t)}\) (input gate) controls how much new information is stored into memory, \(\mathbf {\tilde{c}}^{(t)}\) (candidate memory) provides the candidate values to be stored, and \(\mathbf {o}_{g}^{(t)}\) (output gate) determines how much information is read out of the memory. The hidden state \(\mathbf {h}^{(t)}\) is commonly forwarded to a successive layer.
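For illustration, the following NumPy sketch implements a single LSTM step according to Eq. (1); parameter shapes and names are assumptions and this is not the TensorFlow implementation used in our experiments:

```python
# Illustrative NumPy sketch of one "vanilla" LSTM step as in Eq. (1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b are dicts keyed by 'f', 'i', 'g', 'o' (forget, input, candidate, output)."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate memory
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c = f * c_prev + i * g                                  # new cell memory
    h = o * np.tanh(c)                                      # new hidden state
    return h, c
```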

4 Methodology

4.1 Time-Aware Long Short-Term Memory Cells

"Vanilla" LSTM cells, as described in Sect. 3.2, implicitly assume that the elapsed time between consecutive events is uniformly distributed. This assumption does not hold for most event logs analyzed in PBPM (see Fig. 4). The elapsed time between consecutive events might have an impact on the next activity and timestamp prediction. Hence, an LSTM cell should be able to take irregular elapsed times into account when processing event logs.

Time-aware long short-term memory (T-LSTM) cells are an extension of the LSTM cell. Figure 1 depicts the T-LSTM cell and highlights its differences from the "vanilla" LSTM cell.

Fig. 1.

Illustration of a T-LSTM cell with its computational components at time step t. The dashed and blue components indicate the extensions to the “vanilla” LSTM cell. The previous cell memory \(\mathbf {c}_{S}^{(t-1)}\) is adjusted to \(\mathbf {c}_{*}^{(t-1)}\) (see Eq. (2)) and is then processed together with \(\mathbf {h}^{(t-1)}\) and \(\mathbf {x}^{(t)}\) via the LSTM computations, as formalized in Eq. (1).

The main idea behind T-LSTM is to perform a subspace decomposition of the previous cell memory \(\mathbf {c}^{(t-1)}\). First, a short-term memory component \(\mathbf {c}_{S}^{(t-1)}\) is extracted via a small network. Next, the short-term memory is discounted via a decay function of the elapsed time, yielding \(\mathbf {\hat{c}}_{S}^{(t-1)}\). Then, the long-term memory \(\mathbf {c}_{T}^{(t-1)}=\mathbf {c}^{(t-1)} - \mathbf {c}_{S}^{(t-1)}\) is calculated. Finally, the previous cell memory is adjusted to \(\mathbf {c}_{*}^{(t-1)}=\mathbf {c}_{T}^{(t-1)}+\mathbf {\hat{c}}_{S}^{(t-1)}\). The adjusted previous memory \(\mathbf {c}_{*}^{(t-1)}\) is then, together with \(\mathbf {h}^{(t-1)}\) and \(\mathbf {x}^{(t)}\), processed as in LSTM cells by substituting \(\mathbf {c}^{(t-1)}\) with \(\mathbf {c}_{*}^{(t-1)}\) in Eq. (1). The following equations summarize the T-LSTM-specific computations for the subspace decomposition and the adjustment of the previous memory:

\(\begin{aligned} \mathbf {c}_{S}^{(t-1)} &= \tanh \big (\mathbf {W}_{d}\mathbf {c}^{(t-1)} + \mathbf {b}_{d}\big ) &&\text {(short-term memory)} \\ \mathbf {\hat{c}}_{S}^{(t-1)} &= \mathbf {c}_{S}^{(t-1)} \circ decay\big (\mathbf {\Delta }^{(t)}\big ) &&\text {(discounted short-term memory)} \\ \mathbf {c}_{T}^{(t-1)} &= \mathbf {c}^{(t-1)} - \mathbf {c}_{S}^{(t-1)} &&\text {(long-term memory)} \\ \mathbf {c}_{*}^{(t-1)} &= \mathbf {c}_{T}^{(t-1)} + \mathbf {\hat{c}}_{S}^{(t-1)} &&\text {(adjusted previous cell memory)} \end{aligned}\)    (2)

Note that, compared to the LSTM cell, we only add \(\{\mathbf {W}_{d}, \mathbf {b}_{d}\}\) as trainable parameters. As recommended by Baytas et al. [2], we chose \(decay(\mathbf {\Delta }^{(t)}) = 1/\log (e+\mathbf {\Delta }^{(t)})\), since we input the elapsed times in seconds and therefore obtain large values for \(\mathbf {\Delta }^{(t)}\). Any other monotonically decreasing function and scale for \(\mathbf {\Delta }^{(t)}\) would be valid as well, but our initial choice proved to be effective. The intuition behind the subspace decomposition is that the short-term memory should be discounted if the elapsed time is very large, while the long-term memory should be maintained in the adjusted previous cell memory \(\mathbf {c}_{*}^{(t-1)}\). As for LSTMs, the hidden state \(\mathbf {h}^{(t)}\) is forwarded to a successive layer for further processing. Hence, it is straightforward to substitute LSTM with T-LSTM cells in a given DNN architecture.
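To illustrate how little changes compared to Eq. (1), the following sketch implements the memory adjustment of Eq. (2) on top of the LSTM step sketched in Sect. 3.2; it is a simplified NumPy illustration of the cell proposed by Baytas et al. [2], not their reference implementation:

```python
# Sketch of the T-LSTM memory adjustment (Eq. (2)); W_d and b_d are the only
# additional trainable parameters, delta_t is the elapsed time in seconds.
import numpy as np

def tlstm_adjust_memory(c_prev, delta_t, W_d, b_d):
    """Discount the short-term part of the previous cell memory by the elapsed time."""
    c_short = np.tanh(W_d @ c_prev + b_d)                       # short-term memory
    c_short_hat = c_short * (1.0 / np.log(np.e + delta_t))      # decayed short-term memory
    c_long = c_prev - c_short                                   # long-term memory
    return c_long + c_short_hat                                 # adjusted previous memory c_*

# The adjusted memory then simply replaces c_prev in the LSTM equations, e.g.:
# h_t, c_t = lstm_step(x_t, h_prev, tlstm_adjust_memory(c_prev, delta_t, W_d, b_d), W, U, b)
```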

4.2 Network Architecture

We adapted the multitask architecture proposed by Tax et al. [15] as a baseline (see Fig. 2). The predicted next activity \(\hat{e}_{k+1}.a\) is the output of a softmax activation after the last dense layer, whose output dimension equals the number of unique activity labels. \(\hat{e}_{k+1}.a\) is evaluated against the one-hot encoded ground-truth label \(e_{k+1}.a\) using the Cross-Entropy (CE) loss. The predicted next timestamp \(\hat{e}_{k+1}.ts\) is the scalar output of a dense layer. We do not apply any additional activation after the time-specific dense layer to be consistent with the implementationFootnote 1 of Tax et al. [15]. \(\hat{e}_{k+1}.ts\) is compared with the ground-truth timestamp \(e_{k+1}.ts\) using the Mean Absolute Error (MAE). The total loss is the sum of both losses, as implemented in Tax et al. [15]. Further, like Tax et al., we applied one-hot encoding for the activities and computed time-related control-flow features. We refer to the baseline architecture as "Tax". We performed an ablation study and made three modifications to the baseline DNN architecture:

  • We weighted the CE loss function based on the distribution of activity labels in the training set, so that the classification of under-represented activity classes has a larger influence during training (see the sketch after this list). We refer to this model as "Tax+CS".

  • We replaced all LSTM layers with T-LSTM layers and refer to this model as “Tax+T-LSTM”.

  • We added cost-sensitive learning and replaced all LSTM layers with T-LSTM layers. We call this model “Tax+CS+T-LSTM”.
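The cost-sensitive variants weight the CE loss according to the activity distribution of the training set. A minimal sketch of such a weighted loss is given below; the inverse-frequency weighting scheme shown here is one common choice and an assumption rather than a description of our exact implementation:

```python
# Hedged sketch of a class-weighted ("cost-sensitive") cross-entropy loss.
import numpy as np
import tensorflow as tf

def inverse_frequency_weights(y_train_labels, num_classes):
    """Weight each activity class inversely proportional to its training-set frequency."""
    counts = np.bincount(y_train_labels, minlength=num_classes).astype(np.float64)
    weights = counts.sum() / (num_classes * np.maximum(counts, 1.0))
    return tf.constant(weights, dtype=tf.float32)

def weighted_categorical_crossentropy(class_weights):
    def loss(y_true, y_pred):
        # y_true is one-hot encoded; pick the weight of the ground-truth class per sample
        per_sample_weight = tf.reduce_sum(y_true * class_weights, axis=-1)
        ce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        return per_sample_weight * ce
    return loss
```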

Fig. 2.

Network architecture for this work based on the multitask learning approach proposed by Tax et al. [15]. The dashed components are either LSTM or T-LSTM layers. The input of the network is a sequence of vectors representing a prefix \(\langle e_{1},\dots , e_{k}\rangle \), as in Tax et al. [15]. For the baseline architecture we applied one-hot encoding and LSTM layers as in [15]. The outputs of the model are the predicted next activity (\(\hat{e}_{k+1}.a\)) and timestamp (\(\hat{e}_{k+1}.ts\)). Each of the LSTM layers is followed by a batch normalization layer (BN) to speed up training, as used in Tax et al. [15].
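For orientation, the following Keras sketch outlines a multitask model in the spirit of Fig. 2; the layer sizes and the exact placement of dropout are assumptions, and in the T-LSTM variants the recurrent layers would be replaced by a custom T-LSTM layer:

```python
# Rough Keras sketch of the multitask architecture (shared recurrent layer plus
# an activity branch and a timestamp branch); not the authors' exact code.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(max_prefix_len, num_features, num_activities,
                units=100, dropout=0.2, lr=0.002):
    inp = layers.Input(shape=(max_prefix_len, num_features))
    shared = layers.LSTM(units, return_sequences=True)(inp)   # shared layer (LSTM or T-LSTM)
    shared = layers.BatchNormalization()(shared)
    act = layers.LSTM(units)(shared)                          # activity-specific layer
    act = layers.BatchNormalization()(act)
    act = layers.Dropout(dropout)(act)
    time = layers.LSTM(units)(shared)                         # time-specific layer
    time = layers.BatchNormalization()(time)
    time = layers.Dropout(dropout)(time)
    act_out = layers.Dense(num_activities, activation="softmax", name="activity")(act)
    time_out = layers.Dense(1, name="timestamp")(time)        # no extra activation, as in [15]
    model = Model(inp, [act_out, time_out])
    model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=lr),
                  loss={"activity": "categorical_crossentropy", "timestamp": "mae"})
    return model
```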

5 Experiments

5.1 Datasets

We performed our experiments on the same publicly available datasets as Tax et al. [15] to validate the effectiveness of our proposed techniques. Figure 3 shows the distribution of the activities (labels) for the two datasets; the distributions are clearly skewed for both event logs. Table 1 presents descriptive statistics of the datasets used in this work.

HelpdeskFootnote 2: This event log originates from a ticket management process of an Italian software company.

BPI’12 W SubprocessFootnote 3 (BPI12W): The Business Process Intelligence (BPI) 2012 challenge provided this event log from a Dutch financial institution. The data come from a loan application process; the 'W' refers to the subprocess containing the states of the work items of an application.

Fig. 3.

Activity distribution in training and test set for Helpdesk and BPI12W datasets. It is evident that the distributions of the activity labels are skewed.

Fig. 4.

Event duration distribution for the complete Helpdesk and BPI12W datasets. It can be observed that the majority of the events are completed within one day. However, there are also many events with longer durations. Note that we input the elapsed time between events (\(\mathbf {\Delta }^{(t)}\)) in seconds for T-LSTM.

Table 1. Descriptive statistics of the datasets used in this study.

5.2 Preprocessing

We used the cleaned and prepared datasets from Tax et al. [15], which can be found in the corresponding GitHub repositoryFootnote 4. The preprocessing steps include splitting the data into a training and a test set, calculating time divisors, ASCII-encoding the activities, and generating sequences. The datasets were split into 2/3 for training and 1/3 for testing while preserving the temporal order of cases. We additionally used the last \(20\%\) of the training data as a validation set to tune the hyperparameters. We adapted the sequence and feature generation methods of Tax et al. [15]. The features include the activity of the event, the position of the event in the case, the time since the last event, the time since the first event of the case, the time since midnight, and the day of the week. We create one-hot encoded versions of the ground-truth labels \({e}_{k+1}.a\) for the next activity prediction in order to compare them with the predicted next activity labels \(\hat{e}_{k+1}.a\).
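The following sketch illustrates how such a feature vector could be computed for a single event; it omits the time divisors (normalization constants) used by Tax et al. [15] and is therefore only an approximation of the actual preprocessing:

```python
# Hedged sketch of the per-event control-flow features described above.
from datetime import datetime

def event_features(event_ts: datetime, prev_ts: datetime, case_start_ts: datetime,
                   activity_index: int, num_activities: int, position: int):
    one_hot = [0.0] * num_activities            # one-hot encoded activity
    one_hot[activity_index] = 1.0
    time_since_last = (event_ts - prev_ts).total_seconds()
    time_since_start = (event_ts - case_start_ts).total_seconds()
    midnight = event_ts.replace(hour=0, minute=0, second=0, microsecond=0)
    time_since_midnight = (event_ts - midnight).total_seconds()
    day_of_week = float(event_ts.weekday())
    return one_hot + [float(position), time_since_last,
                      time_since_start, time_since_midnight, day_of_week]
```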

5.3 Training Setup

For hyperparameter tuning, we performed a grid search on the training set and chose the model with the lowest validation loss, i.e. the sum of the activity-related and the time-related validation loss. The number of LSTM or T-LSTM units was set to 64 or 100. For the dropout rate (of both dense layers), we tried the values 0.0 and 0.2. We chose Nadam as the optimization algorithm, as used in [15]: Nesterov accelerated gradient (NAG) calculates the update step using a 'lookahead' approximation of the next parameters, the Adam optimizer adapts per-parameter learning rates based on estimates of the first and second moments of the gradients, and Nadam combines both and is robust on noisy datasets. Furthermore, we tested a range of learning rates \(\{0.0001, 0.0002, 0.001, 0.002, 0.01\}\), since the learning rate is known to have a large impact on LSTMs [8]. We trained each model for at most 150 epochs with a batch size of 64 and applied early stopping with a patience of 25 epochs for regularization.
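A simplified version of this training setup is sketched below; `build_model` refers to the hypothetical constructor sketched in Sect. 4.2, and the arrays `x_train`, `y_act_train`, `y_time_train` (and their validation counterparts) are assumed to come from the preprocessing step in Sect. 5.2:

```python
# Illustrative sketch of the grid search and training loop described above.
import itertools
import tensorflow as tf

units_grid = [64, 100]
dropout_grid = [0.0, 0.2]
lr_grid = [0.0001, 0.0002, 0.001, 0.002, 0.01]

best_model, best_val_loss = None, float("inf")
for units, dropout, lr in itertools.product(units_grid, dropout_grid, lr_grid):
    model = build_model(max_prefix_len, num_features, num_activities,
                        units=units, dropout=dropout, lr=lr)
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=25)
    history = model.fit(x_train, [y_act_train, y_time_train],
                        validation_data=(x_val, [y_act_val, y_time_val]),
                        epochs=150, batch_size=64, callbacks=[early_stop], verbose=0)
    val_loss = min(history.history["val_loss"])   # activity loss + time loss
    if val_loss < best_val_loss:
        best_model, best_val_loss = model, val_loss
```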

5.4 Evaluation

We applied the same evaluation metrics as in [15]. We used the Accuracy metric to evaluate the next activity prediction. For the next timestamp prediction, we used the Mean Absolute Error (MAE) to evaluate our models.
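Both metrics are straightforward to compute; a minimal sketch (with timestamps assumed to be given in seconds, so that the MAE can be reported in days) is shown below:

```python
# Minimal sketch of the evaluation metrics used in this work.
import numpy as np

def accuracy(y_true_idx, y_pred_idx):
    """Fraction of correctly predicted next activity labels (as class indices)."""
    return float(np.mean(np.asarray(y_true_idx) == np.asarray(y_pred_idx)))

def mae_days(ts_true_seconds, ts_pred_seconds):
    """Mean absolute error of the predicted next timestamps, converted to days."""
    err = np.abs(np.asarray(ts_true_seconds) - np.asarray(ts_pred_seconds))
    return float(np.mean(err) / 86400.0)
```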

5.5 Implementation

We conducted all experiments on a workstation with 24 CPU cores, 748 GB RAM and a single NVIDIA Quadro RTX 6000 GPU. We implemented the experiments in Python 3.7, using the DL framework TensorFlow 2.1Footnote 5. The source code is available on GitHubFootnote 6.

6 Results

6.1 Next Activity Prediction

Table 2 shows the results for the next activity prediction in terms of Accuracy. For both Helpdesk and BPI12W, the approach Tax+CS+T-LSTM achieved the highest Accuracy among all approaches (0.724 and 0.778, respectively), improving over the baseline by 0.012 and 0.018. While the approaches Tax+CS and Tax+T-LSTM outperformed the baseline for Helpdesk, they achieved a lower Accuracy than the baseline for BPI12W.

Table 2. Results for the next activity prediction in terms of Accuracy. The best result for each dataset is highlighted (larger is better).

6.2 Next Timestamp Prediction

Table 3 shows the results for the next timestamp prediction task in terms of MAE in days. All three modified approaches clearly outperformed the baseline for both event logs. The approach Tax+CS achieved the lowest MAE of 2.87 days and 0.88 days for Helpdesk and BPI12W, respectively; compared to the baseline, this approach reduced the MAE by 0.88 days (Helpdesk) and 0.68 days (BPI12W). The other two approaches, Tax+T-LSTM and Tax+CS+T-LSTM, achieved slightly worse MAE values than Tax+CS for both event logs. It is worth noting that Tax+CS+T-LSTM (for Helpdesk) and Tax+T-LSTM (for BPI12W) yielded the second-best results, with MAE values close to Tax+CS.

Table 3. Results for next step time prediction in terms of MAE in days. The best result for each dataset is highlighted (lower is better).

7 Discussion

In this paper, we argued that the elapsed time between consecutive events carries valuable information on human behavior in running business processes. Therefore, we introduced T-LSTM cells for PBPM, which inherently model the elapsed time between consecutive events. Further, we introduced cost-sensitive learning to better cope with the problem of imbalanced data.

The obtained results indicate that the elapsed time between consecutive events is informative and that a DNN architecture relying on T-LSTM cells can yield more accurate models for PBPM. In particular, with the approach Tax+CS+T-LSTM, we outperformed the baseline (Tax) for both datasets (Helpdesk and BPI12W) and both prediction tasks (next activity and next timestamp prediction). We also observed that cost-sensitive learning plays a crucial role for the predictive quality of a DNN architecture using T-LSTM cells instead of "vanilla" LSTM cells. Interestingly, the effectiveness of the introduced techniques is more evident for the next timestamp prediction than for the next activity prediction.

Even though our results on DNN architectures using T-LSTM cells seem promising, there are a few limitations to our work. First, we need to verify our findings by performing experiments on more datasets. Second, a more thorough hyperparameter tuning approach such as Bayesian optimization [1] could be applied for all configurations to obtain a better estimate of their effectiveness. Further, several runs with random initialization should be performed to estimate the stability of the models.

8 Conclusion and Future Work

We proposed T-LSTM cells as an alternative to the commonly used "vanilla" LSTM cell to better exploit information on the elapsed time between consecutive events. Furthermore, we introduced cost-sensitive learning to account for the common class imbalance in event log data. Our results indicate the effectiveness of the introduced techniques for the next activity and timestamp prediction. This suggests that integrating mechanisms for event-log-specific characteristics into neural network layers is an interesting direction for future research. Here, we mainly demonstrated the benefit of replacing a "vanilla" LSTM cell with a time-aware LSTM cell in a given baseline approach [15].

An avenue for future research is to investigate whether T-LSTM cells can also improve other LSTM-based PBPM approaches, such as Camargo et al. [5], which involves resource attributes, or Taymouri et al. [16], which generates fake event logs. Another direction for future research is to further customize the LSTM cell specifically for PBPM. For example, a process-aware LSTM cell could consider not only time information but also resource information.