Abstract
Mining contrast sequential patterns, which are sequential patterns that characterize a given sequence class and distinguish that class from another given sequence class, has a wide range of applications, including medical informatics, computational finance and consumer behavior analysis. In previous studies on contrast sequential pattern mining, each element in a sequence is a single item or symbol. This paper considers a more general case, in which each element in a sequence is a set of items. The associated contrast sequential patterns are called itemset-based distinguishing sequential patterns (itemset-DSPs). After discussing the challenges of mining itemset-DSPs, we present iDSP-Miner, a mining method with various pruning techniques, for mining itemset-DSPs that satisfy given support and gap constraints. We also propose a concise border-like representation (with exclusive bounds) for sets of similar itemset-DSPs and use that representation to improve the efficiency of iDSP-Miner. Our empirical study on both real and synthetic data demonstrates that iDSP-Miner is effective and efficient.
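The abstract does not spell out the matching semantics, so the following is only a minimal Python sketch under assumed definitions, not the authors' iDSP-Miner. It assumes that a pattern element matches a sequence element when it is a subset of it, that a gap is the number of sequence elements skipped between consecutive matched positions and must lie in [min_gap, max_gap], that support is the fraction of sequences in a class containing the pattern, and that a pattern is distinguishing when its support is at least alpha in the positive class and at most beta in the negative class. All function names, the thresholds alpha and beta, and the toy data are hypothetical. Pruning, any minimality condition and the border-like representation are not modeled.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
ItemsetSeq = List[Itemset]


def make_seq(*elements: str) -> ItemsetSeq:
    """Hypothetical helper: make_seq("ab", "c") -> [{a, b}, {c}] (single-char items)."""
    return [frozenset(e) for e in elements]


def occurs_with_gap(pattern: Sequence[Itemset], seq: Sequence[Itemset],
                    min_gap: int, max_gap: int) -> bool:
    """Assumed semantics: each pattern itemset is a subset of a sequence element,
    in order, skipping between min_gap and max_gap elements between matches."""
    if not pattern:
        return True
    # Positions in seq where the pattern prefix matched so far can end.
    ends = {i for i, elem in enumerate(seq) if pattern[0] <= elem}
    for p_elem in pattern[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1 + min_gap, min(i + 1 + max_gap, len(seq) - 1) + 1)
            if p_elem <= seq[j]
        }
        if not ends:
            return False
    return bool(ends)


def support(pattern: Sequence[Itemset], db: Sequence[Sequence[Itemset]],
            min_gap: int, max_gap: int) -> float:
    """Fraction of sequences in db that contain the pattern under the gap constraint."""
    if not db:
        return 0.0
    hits = sum(occurs_with_gap(pattern, s, min_gap, max_gap) for s in db)
    return hits / len(db)


def is_distinguishing(pattern: Sequence[Itemset], pos_db, neg_db,
                      min_gap: int, max_gap: int, alpha: float, beta: float) -> bool:
    """Hypothetical DSP test: frequent in the positive class, rare in the negative one."""
    return (support(pattern, pos_db, min_gap, max_gap) >= alpha
            and support(pattern, neg_db, min_gap, max_gap) <= beta)


# Toy data (invented for illustration only).
pos_db = [make_seq("ab", "c", "cd"), make_seq("a", "b", "cd"), make_seq("ae", "cd")]
neg_db = [make_seq("a", "c", "d"), make_seq("a", "b", "e", "f", "cd")]
P = make_seq("a", "cd")

print(support(P, pos_db, 0, 1))  # 1.0
print(support(P, neg_db, 0, 1))  # 0.0
print(is_distinguishing(P, pos_db, neg_db, 0, 1, alpha=0.6, beta=0.2))  # True
print(is_distinguishing(P, pos_db, neg_db, 0, 3, alpha=0.6, beta=0.2))  # False
```

In this toy example the itemset {c, d} never matches the first negative sequence, where c and d appear in separate elements. Widening the gap bound to 3 lets the pattern match the second negative sequence, which pushes its negative support to 0.5 and disqualifies it. This shows why both the itemset structure and the gap constraint affect which patterns are distinguishing.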
This work was supported in part by NSFC 61103042, SKLSE2012-09-32, and China Postdoctoral Science Foundation 2014M552371. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
© 2015 Springer International Publishing Switzerland
Cite this paper
Yang, H., Duan, L., Dong, G., Nummenmaa, J., Tang, C., Li, X. (2015). Mining Itemset-based Distinguishing Sequential Patterns with Gap Constraint. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_3
DOI: https://doi.org/10.1007/978-3-319-18120-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science, Computer Science (R0)