Abstract
Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns are episodes, patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also enforce in which order the events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adopting existing approaches for discovering traditional patterns, such as closed itemsets, to episodes is not straightforward. First of all, we cannot define a unique closure based on frequency because an episode may have several closed superepisodes. Moreover, to define a closedness concept for episodes we need a subset relationship between episodes, which is not trivial to define. We approach these problems by introducing strict episodes. We argue that this class is general enough, and at the same time we are able to define a natural subset relationship within it and use it efficiently. In order to mine closed episodes we define an auxiliary closure operator. We show that this closure satisfies the needed properties so that we can use the existing framework for mining closed patterns. Discovering the true closed episodes can be done as a post-processing step. We combine these observations into an efficient mining algorithm and demonstrate empirically its performance in practice.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases (VLDB 1994), pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns. In: 11th international conference on data engineering (ICDE 1995), pp 3–14
Calders T, Dexters N, Goethals B (2007) Mining frequent itemsets in a stream. In: Proceedings of the 7th IEEE international conference on data mining (ICDM 2007), pp 83–92
Casas-Garriga G (2003) Discovering unbounded episodes in sequential data. In: Knowledge discovery in databases: PKDD 2003, 7th European conference on principles and practice of knowledge discovery in databases, pp 83–94
Casas-Garriga G (2005) Summarizing sequential data with closed partial orders. In: Proceedings of the SIAM international conference on data mining (SDM 2005), pp 380–391
Cule B, Goethals B, Robardet C (2009) A new constraint for mining sets in sequences. In: Proceedings of the SIAM international conference on data mining (SDM 2009), pp 317–328
Garofalakis M, Rastogi R, Shim K (2002) Mining sequential patterns with regular expression constraints. IEEE Trans Knowl Data Eng 14(3): 530–552
Gwadera R, Atallah MJ, Szpankowski W (2005a) Markov models for identification of significant episodes. In: Proceedings of the SIAM international conference on data mining (SDM 2005), pp 404–414
Gwadera R, Atallah MJ, Szpankowski W (2005) Reliable detection of episodes in event sequences. Knowl Inf Syst 7(4): 415–437
Laxman S, Sastry PS (2006) A survey of temporal data mining. SADHANA Acad Proc Eng Sci 31(2): 173–198
Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2007), pp 410–419
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3): 259–289
Méger N, Rigotti C (2004) Constraint-based mining of episode rules and optimal window sizes. In: Knowledge discovery in databases: PKDD 2004, 8th European conference on principles and practice of knowledge discovery in databases, pp 313–324
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: ICDT ’99: proceedings of the 7th international conference on database theory, pp 398–416
Pei J, Wang H, Liu J, Wang K, Wang J, Yu PS (2006) Discovering frequent closed partial orders from strings. IEEE Trans Knowl Data Eng 18(11): 1467–1481
Tatti N (2009) Significance of episodes based on minimal windows. In: Proceedings of the 9th IEEE international conference on data mining (ICDM 2009), pp 513–522
Tatti N, Cule B (2010) Mining closed strict episodes. In: Proceedings of the 10th IEEE international conference on data mining (ICDM 2010)
Tzvetkov P, Yan X, Han J (2003) Tsp: mining top-k closed sequential patterns. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003), pp 347–354
Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: 20th international conference on data engineering (ICDE 2004), p 79
Wang JT-L, Chirn G-W, Marr TG, Shapiro B, Shasha D, Zhang K (1994) Combinatorial pattern discovery for scientific data: some preliminary results. ACM SIGMOD Rec 23(2): 115–125
Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining (SDM 2003), pp 166–177
Zhou W, Liu H, Cheng H (2010) Mining closed episodes from event sequences efficiently. In: Proceedings of the 14th Pacific-Asia conference on knowledge discovery and data mining, vol 1, pp 310–318
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: M. J. Zaki.
The research described in this paper builds upon and extends the work appearing in Proceedings of Tenth IEEE International Conference on Data Mining (ICDM 2010), 2010 (Tatti and Cule 2010).
Rights and permissions
About this article
Cite this article
Tatti, N., Cule, B. Mining closed strict episodes. Data Min Knowl Disc 25, 34–66 (2012). https://doi.org/10.1007/s10618-011-0232-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-011-0232-z