Abstract
An active research topic in data mining is the discovery of sequential patterns, which finds all frequent subsequences in a sequence database. The generalized sequential pattern (GSP) algorithm was proposed to solve the mining of sequential patterns with time constraints, such as time gaps and sliding time windows. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. However, the capabilities to mine sequential patterns with time constraints were previously available only within the Apriori framework. Therefore, we propose the DELISP (delimited sequential pattern) approach to provide the capabilities within the pattern-growth methodology. DELISP features in reducing the size of projected databases by bounded and windowed projection techniques. Bounded projection keeps only time-gap valid subsequences and windowed projection saves nonredundant subsequences satisfying the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfactory patterns and speeds up the pattern growing process. The comprehensive experiments conducted show that DELISP has good scalability and outperforms the well-known GSP algorithm in the discovery of sequential patterns with time constraints.
Article PDF
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering. Taipei, Taiwan, pp 3–14
Ayres J, Gehrke JE, Yiu T, et al (2002) Sequential pattern mining using bitmaps. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining. Alberta, Canada
Bettini C, Wang XS, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. Data Eng Bull 21:32–38
Garofalakis MN, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. In: Proceedings of the 25th international conference on very large data bases. Edinburgh, Scotland, pp 223–234
Guralnik V, Garg N, Karypis G (2001) Parallel tree projection algorithm for sequence mining. In: Proceedings of the 7th international Euro-par conference on parallel processing, pp 310–320
Han J, Pei J, Mortazavi-Asl B, et al (2000) FreeSpan: Frequent pattern-projected sequential pattern mining. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 355–359
Lin MY, Lee SY (1998) Incremental update on sequential patterns in large databases. In: Proceedings of 10th IEEE international conference on tools with artificial intelligence. Taipei, Taiwan, pp 24–31
Lin MY, Lee SY (2002) Fast discovery of sequential patterns by memory indexing. In: Proceedings of the 4th international conference on data warehousing and knowledge discovery (DaWaK02). Aix-en-Provence, France, pp 150–160
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1:259–289
Masseglia F, Cathala F, Poncelet P (1998) The PSP approach for mining sequential patterns. In: Proceedings of the 2nd European symposium on principles of data mining and knowledge discovery, vol 1510. Nantes, France, pp 176–184
Oates T, Schmill MD, Jensen D, et al (1997) A family of algorithms for finding temporal structure in data. In: Proceedings of the 6th international workshop on AI and statistics. Fort Lauderdale, Florida, pp 371–378
Pei J, Han J (2002) Constrained frequent pattern mining: A pattern-growth view. SIGKDD Explor 4:31–39
Pei J, Han J, Pinto H, et al (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of 2001 international conference on data engineering, pp 215–224
Pei J, Han J, Wang W (2002) Mining sequential patterns with constraints in large databases. In: Proceedings of the 11th international conference on information and knowledge management
Pinto H, Han J, Pei J, et al (2001) Multi-dimensional sequential pattern mining. In: Proceedings of the 10th international conference on information and knowledge management, pp 81–88
Roddick JF, Spiliopoulou M (2002) A survey of temporal knowledge discovery paradigms and methods. IEEE Trans Knowl Data Eng 14:750–767
Rolland P (2001) FlExPat: Flexible extraction of sequential patterns. In: Proceedings of the IEEE international conference on data mining 2001, pp 481–488
Shintani T, Kitsuregawa M (1998) Mining algorithms for sequential patterns in parallel: Hash based approach. In: Proceedings of the 2nd Pacific–Asia conference on knowledge discovery and data mining, pp 283–294
Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology. Avignon, France, pp 3–17 (an extended version is the IBM Research Report RJ 9994)
Tsoukatos I, Gunopulos D (2001) Efficient mining of spatiotemporal patterns. In: Proceedings of the 7th international symposium of advances in spatial and temporal databases, pp 425–442
Wang K (1997) Discovering patterns from large and dynamic sequential data. J Intell Inf Syst 9:33–56
Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn J 42:31-60
Zaki MZ (2000) Sequence mining in categorical domains: Incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management. Washington, DC, pp 422–429
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lin, MY., Lee, SY. Efficient mining of sequential patterns with time constraints by delimited pattern growth. Knowl Inf Syst 7, 499–514 (2005). https://doi.org/10.1007/s10115-004-0182-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-004-0182-5