Abstract
One problem in sequential pattern mining is to find the maximal frequent sequences in a database, given a support threshold β. In this paper, we propose a new algorithm that finds all the maximal frequent sequences in a text rather than in a database. Compared with typical sequential pattern mining algorithms, our algorithm avoids the joining, pruning and text-scanning steps. Experiments show that all the maximal frequent sequences of a medium-sized text can be obtained in a few seconds.
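To make the problem statement concrete, the following is a minimal brute-force sketch of what "maximal frequent sequence with support β" can mean for a text. It is not the algorithm proposed in the paper: it rescans the text once per sequence length, which is exactly the kind of work the paper's algorithm is said to avoid. It assumes that a sequence is a run of consecutive words, that support is the number of (possibly overlapping) occurrences, and that a frequent sequence is maximal when it is not contained in a longer frequent sequence. The function name maximal_frequent_sequences and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
from collections import Counter


def maximal_frequent_sequences(text, beta):
    """Brute-force illustration: contiguous word sequences occurring at
    least `beta` times that are not part of a longer frequent sequence.
    Assumption: sequences are runs of consecutive, lower-cased words."""
    words = text.lower().split()
    n = len(words)

    # Collect every frequent sequence, one length at a time. If no sequence
    # of length k is frequent, none of length k + 1 can be, so we stop.
    frequent = set()
    length = 1
    while length <= n:
        counts = Counter(
            tuple(words[i:i + length]) for i in range(n - length + 1)
        )
        level = {seq for seq, count in counts.items() if count >= beta}
        if not level:
            break
        frequent |= level
        length += 1

    def contains(longer, shorter):
        # True if `shorter` appears as a contiguous run inside `longer`.
        k = len(shorter)
        return any(
            longer[i:i + k] == shorter for i in range(len(longer) - k + 1)
        )

    # Keep only sequences not contained in a strictly longer frequent one.
    maximal = [
        s for s in frequent
        if not any(len(t) > len(s) and contains(t, s) for t in frequent)
    ]
    return sorted(maximal, key=lambda s: (-len(s), s))


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the rug"
    for seq in maximal_frequent_sequences(sample, beta=2):
        print(" ".join(seq))
```

Raising β shrinks the result: in the sample sentence, a threshold of 3 leaves only the single word that occurs at least three times.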
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
García-Hernández, R.A., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A. (2004). A Fast Algorithm to Find All the Maximal Frequent Sequences in a Text. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2004. Lecture Notes in Computer Science, vol 3287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30463-0_60
DOI: https://doi.org/10.1007/978-3-540-30463-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23527-9
Online ISBN: 978-3-540-30463-0
eBook Packages: Springer Book Archive