Abstract
Emerging patterns (EPs) are itemsets whose supports change significantly from one class to another. They have been shown to be powerful discriminating features and are very useful for constructing accurate classifiers. Previous EP mining approaches often produce a large number of EPs, which makes it very difficult to choose the interesting ones manually. Usually, a post-processing filter step is applied to select interesting EPs based on some interestingness measures.
In this paper, we first generalize the interestingness measures for EPs, including the minimum support, the minimum growth rate, the subset relationship between EPs, and the correlation based on common statistical measures such as the chi-squared value. We then develop an efficient algorithm for mining only those interesting EPs, where the chi-squared test is used as a heuristic to prune the search space. The experimental results show that our algorithm maintains efficiency even at low supports on data that is large, dense, and high-dimensional. They also show that the heuristic is admissible, because only unimportant EPs with low supports are ignored. Our work on EP-based classification confirms that the discovered interesting EPs are excellent candidates for building accurate classifiers.
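As a rough illustration of the measures named in the abstract, the sketch below computes the support of an itemset in each class, its growth rate (the ratio of the two supports, following the usual definition in the EP literature), and a chi-squared value from the 2x2 table of class against itemset presence. The abstract does not say how the paper applies these measures, so the function names, the data layout, and the way the chi-squared table is built are assumptions made for illustration. This is not the mining algorithm itself.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def count(itemset: Transaction, data: Sequence[Transaction]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in data if itemset <= t)


def support(itemset: Transaction, data: Sequence[Transaction]) -> float:
    """Fraction of transactions in one class that contain the itemset."""
    return count(itemset, data) / len(data) if data else 0.0


def growth_rate(itemset: Transaction,
                background: Sequence[Transaction],
                target: Sequence[Transaction]) -> float:
    """Support in the target class divided by support in the background class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_tg == 0.0:
        return 0.0           # absent from the target class
    if s_bg == 0.0:
        return float("inf")  # occurs only in the target class
    return s_tg / s_bg


def chi_squared(itemset: Transaction,
                background: Sequence[Transaction],
                target: Sequence[Transaction]) -> float:
    """Chi-squared statistic of the 2x2 table (class x itemset present/absent).

    Assumption: a plain Pearson statistic without continuity correction.
    """
    a = count(itemset, target)        # target, contains itemset
    b = count(itemset, background)    # background, contains itemset
    c = len(target) - a               # target, does not contain
    d = len(background) - b           # background, does not contain
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                    # degenerate table, no evidence either way
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Toy data, invented for this example only.
    target = [frozenset("ab"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
    background = [frozenset("a"), frozenset("c"), frozenset("bc"), frozenset("c")]
    x = frozenset("a")
    print(support(x, target), support(x, background))
    print(growth_rate(x, background, target))
    print(chi_squared(x, background, target))
```

A filter for interesting EPs would compare these values against user-chosen thresholds (minimum support, minimum growth rate, minimum chi-squared value). The thresholds and the additional subset-based condition are defined in the paper, not here.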
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, H., Ramamohanarao, K. (2003). Efficiently Mining Interesting Emerging Patterns. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_19
DOI: https://doi.org/10.1007/978-3-540-45160-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40715-7
Online ISBN: 978-3-540-45160-0
eBook Packages: Springer Book Archive