Abstract
Next-step prediction is an important problem in process analytics: it can be used in process monitoring to preempt failures in business processes. We use log files from a workflow system that record the sequential execution of business processes; each process execution results in a timestamped event. The main difficulty in analysing such event sequences is that they can be very diverse, so models are needed that handle diverse sequences without losing the sequential nature of the data. We propose an approach that first clusters the event sequences, so that each cluster consists of similar sequences, and then builds an individual predictive model for each cluster. The challenge is to identify a similarity measure that can cope with the sequential nature of the data. This strategy addresses both the sequential and the diverse characteristics of our data. We start from K-means and extend it into a categorical-sequential clustering algorithm by combining it with sequence alignment. Finally, we build Markov models of different orders for each resulting cluster, expecting them to capture the representative characteristics of that cluster.
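The pipeline described in the abstract (align, cluster, then fit one Markov model per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the similarity measure is a unit-cost global alignment (edit) distance, the cluster mean of K-means is replaced by a medoid because categorical sequences have no arithmetic mean, the first k sequences seed the clusters, and only a first-order Markov model is fitted. All function names and the toy event labels are hypothetical.

```python
from collections import Counter, defaultdict


def alignment_distance(a, b):
    """Unit-cost global alignment (edit) distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # gap in b
                            curr[j - 1] + 1,           # gap in a
                            prev[j - 1] + (x != y)))   # match or mismatch
        prev = curr
    return prev[-1]


def cluster_sequences(seqs, k, n_iter=10):
    """K-means-style loop in which a medoid stands in for the cluster mean."""
    medoids = list(range(k))  # assumption: the first k sequences seed the clusters
    labels = [0] * len(seqs)
    for _ in range(n_iter):
        # Assignment step: each sequence joins its closest medoid.
        labels = [min(range(k),
                      key=lambda c: alignment_distance(s, seqs[medoids[c]]))
                  for s in seqs]
        # Update step: the new medoid minimises total distance to its members.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda i: sum(alignment_distance(seqs[i], seqs[j])
                                  for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


def fit_markov(seqs):
    """First-order transition counts: current event -> Counter of next events."""
    counts = defaultdict(Counter)
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(model, current):
    """Most frequent successor of `current`, or None if it was never observed."""
    if current not in model:
        return None
    return model[current].most_common(1)[0][0]


# Toy event log (hypothetical labels): two kinds of process are mixed together.
log = [
    ("open", "check", "fix", "close"),
    ("order", "pay", "ship"),
    ("open", "check", "fix", "close"),
    ("order", "pay", "ship"),
    ("open", "check", "close"),
    ("order", "pay", "pay", "ship"),
]
labels, medoids = cluster_sequences(log, k=2)
models = {c: fit_markov([s for s, lab in zip(log, labels) if lab == c])
          for c in set(labels)}
```

To predict the next step of a running process under this sketch, the partial sequence is assigned to the nearest medoid and that cluster's model is queried, for example `predict_next(models[0], "check")`. A higher-order model would key the counts on tuples of the last n events instead of a single event.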
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Le, M., Nauck, D., Gabrys, B., Martin, T. (2014). Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 442. Springer, Cham. https://doi.org/10.1007/978-3-319-08795-5_18
DOI: https://doi.org/10.1007/978-3-319-08795-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08794-8
Online ISBN: 978-3-319-08795-5
eBook Packages: Computer Science, Computer Science (R0)