Mining sequential patterns: Generalizations and performance improvements

Srikant, Ramakrishnan; Agrawal, Rakesh

doi:10.1007/BFb0014140

Ramakrishnan Srikant¹,² & Rakesh Agrawal¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1057)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequential pattern is “5% of customers bought ‘Foundation’ and ‘Ringworld’ in one transaction, followed by ‘Second Foundation’ in a later transaction”. We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we relax the restriction that the items in an element of a sequential pattern must come from the same transaction, instead allowing the items to be present in a set of transactions whose transaction-times are within a user-specified time window. Third, given a user-defined taxonomy (is-a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy.
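
The definitions of containment and support above can be illustrated with a small sketch. It is not the GSP algorithm and is not taken from the paper: the names `contains` and `support`, the `(time, items)` data layout, and the exact inequalities used for the gaps (strictly greater than `min_gap`, at most `max_gap`) are assumptions made for illustration. It covers only the first generalization (time constraints between adjacent elements); the time window and the taxonomy are left out.

```python
from typing import List, Optional, Set, Tuple

Transaction = Tuple[int, Set[str]]   # (transaction-time, items)
DataSequence = List[Transaction]     # assumed sorted by transaction-time
Pattern = List[Set[str]]             # ordered list of elements (itemsets)


def contains(seq: DataSequence, pattern: Pattern,
             min_gap: int = 0, max_gap: Optional[int] = None) -> bool:
    """Return True if `seq` contains `pattern`.

    Each pattern element must be a subset of a single transaction, and the
    matched transactions must appear in order. The gap between adjacent
    matched transactions must be > min_gap and, if given, <= max_gap
    (these inequalities are an assumption of this sketch).
    """
    def match(elem_idx: int, start: int, prev_time: Optional[int]) -> bool:
        if elem_idx == len(pattern):
            return True  # every element has been matched
        for pos in range(start, len(seq)):
            time, items = seq[pos]
            if prev_time is not None:
                gap = time - prev_time
                if gap <= min_gap:
                    continue  # too close to the previous element
                if max_gap is not None and gap > max_gap:
                    break     # later transactions only widen the gap
            # Try this transaction; backtrack if the rest cannot be matched.
            if pattern[elem_idx] <= items and match(elem_idx + 1, pos + 1, time):
                return True
        return False

    return match(0, 0, None)


def support(db: List[DataSequence], pattern: Pattern,
            min_gap: int = 0, max_gap: Optional[int] = None) -> int:
    """Number of data-sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern, min_gap, max_gap) for seq in db)


# Hypothetical toy database: one data-sequence per customer.
db = [
    [(1, {"Foundation", "Ringworld"}), (5, {"Second Foundation"})],
    [(1, {"Foundation"}), (2, {"Ringworld"}), (3, {"Second Foundation"})],
    [(1, {"Foundation", "Ringworld"}), (40, {"Second Foundation"})],
    [(1, {"Second Foundation"}), (2, {"Foundation", "Ringworld"})],
]
pattern = [{"Foundation", "Ringworld"}, {"Second Foundation"}]

print(support(db, pattern))              # unconstrained support count
print(support(db, pattern, max_gap=30))  # support with a maximum gap
```

In this toy data the second customer bought 'Foundation' and 'Ringworld' in separate transactions, so the pattern is not contained under the original definition; the time-window generalization described in the abstract is what would allow such a match. Dividing the count by `len(db)` gives support as a fraction, as in the "5% of customers" example.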

We present GSP, a new algorithm that discovers these generalized sequential patterns. Empirical evaluation using synthetic and real-life data indicates that GSP is much faster than the AprioriAll algorithm presented in [3]. GSP scales linearly with the number of data-sequences, and has very good scale-up properties with respect to the average data-sequence size.

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207–216, Washington, D.C., May 1993.
[2] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, September 1994.
[3] R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proc. of the 11th Int'l Conference on Data Engineering, Taipei, Taiwan, March 1995.
[4] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995.
[5] H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering frequent episodes in sequences. In Proc. of the Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD-95), Montreal, Canada, August 1995.
[6] R. Srikant and R. Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995.
[7] R. Srikant and R. Agrawal. Mining Sequential Patterns: Generalizations and Performance Improvements. Research Report RJ 9994, IBM Almaden Research Center, San Jose, California, December 1995.
[8] J. T.-L. Wang, G.-W. Chirn, T. G. Marr, B. Shapiro, D. Shasha, and K. Zhang. Combinatorial pattern discovery for scientific data: Some preliminary results. In Proc. of the ACM SIGMOD Conference on Management of Data, Minneapolis, May 1994.

Author information

Authors and Affiliations

IBM Almaden Research Center, 650 Harry Road, 95120, San Jose, CA
Ramakrishnan Srikant & Rakesh Agrawal
Department of Computer Science, University of Wisconsin, Madison
Ramakrishnan Srikant

Editor information

Peter Apers, Mokrane Bouzeghoub, Georges Gardarin

Cite this paper

Srikant, R., Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In: Apers, P., Bouzeghoub, M., Gardarin, G. (eds) Advances in Database Technology — EDBT '96. EDBT 1996. Lecture Notes in Computer Science, vol 1057. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0014140

DOI: https://doi.org/10.1007/BFb0014140
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61057-1
Online ISBN: 978-3-540-49943-5
eBook Packages: Springer Book Archive
