Abstract
In this paper we aim at extending the non-derivable condensed representation in frequent itemset mining to sequential pattern mining. We start by showing a negative example: in the context of frequent sequences, the notion of non-derivability is meaningless. Therefore, we extend our focus to the mining of conjunctions of sequences. Besides of being of practical importance, this class of patterns has some nice theoretical properties. Based on a new unexploited theoretical definition of equivalence classes for sequential patterns, we are able to extend the notion of a non-derivable itemset to the sequence domain. We present a new depth-first approach to mine non-derivable conjunctive sequential patterns and show its use in mining association rules for sequences. This approach is based on a well known combinatorial theorem: the Möbius inversion. A performance study using both synthetic and real datasets illustrates the efficiency of our mining algorithm. These new introduced patterns have a high-potential for real-life applications, especially for network monitoring and biomedical fields with the ability to get sequential association rules with all the classical statistical metrics such as confidence, conviction, lift etc.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering (ICDE 95), Tapei, Taiwan, pp 3–14
Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using bitmap representation. In: Proceedings of the 8th SIGKDD international conference on knowledge discovery and data mining (KDD 02), Alberta, Canada, pp 439–435
Balcázar JL, Garriga GC (2007) Horn axiomatizations for sequential data. Theor Comput Sci 371(3): 247–264
Boulicaut J-F, Bykowski A, Rigotti C (2003) Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Min Knowl Discov 7(1): 5–22
Calders T (2007) Itemset frequency satisfiability: complexity and axiomatization. Theor Comput Sci
Calders T, Goethals B (2002) Mining all non-derivable frequent itemsets. In: Elomaa T, Mannila H, Toivonen H (eds) PKDD, vol 2431 of lecture notes in computer science. Springer, pp 74–85
Calders T, Goethals B (2003) Minimal k-free representations of frequent sets. In: PKDD proceedings, pp 71–82
Calders T, Goethals B (2005) Depth-first non-derivable itemset mining. In: SDM
Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1): 171–206
Calders T, Rigotti C, Boulicaut JF (2006) A survey on condensed representations for frequent sets. Constraint based mining. Springer-Verlag, LNAI, 3848:64–80
Ireland K, Rosen M (1990) A classical introduction to modern number theory. Springer-Verlag
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology (EDBT 96), Avignon, France, pp 3–17
Xing Z, Pei J, Dong G, Yu PS (2008) Mining sequence classifiers for early prediction. In: Proceedings of the 2008 SIAM international conference on data mining (SDM’08), Atlanta, GA, 24–26 April
Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large databases. In: Barbará D, Kamath C (eds) SDM. SIAM
Zaki MJ (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42(1/2): 31–60
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution,and reproduction in any medium, provided the original author(s) and source are credited.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editors: Walter Daelemans, Bart Goethals, and Katharina Morik.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Raïssi, C., Calders, T. & Poncelet, P. Mining conjunctive sequential patterns. Data Min Knowl Disc 17, 77–93 (2008). https://doi.org/10.1007/s10618-008-0108-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-008-0108-z