Abstract
A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. Because running time grows exponentially with average row length, most conventional algorithms are impractical on such datasets. Unfortunately, large cardinality itemsets are likely to be more informative than small cardinality itemsets in this type of dataset. This paper proposes an approach, termed DisClose, to extract large cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large cardinality itemsets are enumerated first, followed by smaller ones. In addition, a minimum cardinality threshold is used to further reduce the search space. Experimental results show that DisClose can extract colossal closed itemsets in the discovered datasets, even for low support thresholds. The algorithm discovers closed itemsets directly, without needing to check whether each new closed itemset has previously been found.
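To make the terms in the abstract concrete, the following is a minimal sketch of closed itemset discovery by row enumeration with a support threshold and a minimum cardinality threshold. It is not the DisClose algorithm and does not use a Compact Row-Tree: it is a brute-force illustration, and the function name, parameter names, and toy data are all assumptions made for this example. It relies on one known fact: the intersection of any set of rows is a closed itemset, and every closed itemset is the intersection of the rows that contain it.

```python
from itertools import combinations


def closed_itemsets_by_row_enumeration(rows, min_support, min_cardinality):
    """Brute-force sketch (hypothetical helper, not the paper's method).

    rows: list of frozensets of items.
    Returns (itemset, support) pairs, largest itemsets first.
    """
    found = {}
    n = len(rows)
    # Small row sets are visited first: their intersections are the
    # largest itemsets, so large cardinality itemsets appear early.
    for k in range(max(min_support, 1), n + 1):
        for combo in combinations(range(n), k):
            itemset = frozenset.intersection(*(rows[i] for i in combo))
            # Minimum cardinality threshold: discard small itemsets.
            if len(itemset) < min_cardinality:
                continue
            # Support is the number of rows containing the itemset.
            support = sum(1 for row in rows if itemset <= row)
            # The dict removes duplicates; this is the kind of
            # "already found?" check the abstract says DisClose avoids.
            found[itemset] = support
    return sorted(found.items(), key=lambda kv: (-len(kv[0]), -kv[1]))


# Toy dataset (assumed for illustration): few rows, relatively long rows.
rows = [
    frozenset("abcde"),
    frozenset("abcdf"),
    frozenset("abcg"),
    frozenset("xyz"),
]
result = closed_itemsets_by_row_enumeration(rows, min_support=2, min_cardinality=3)
for itemset, support in result:
    print(sorted(itemset), support)
```

On this toy input the sketch reports the itemset {a, b, c, d} with support 2 and {a, b, c} with support 3. It enumerates every row combination, so it is exponential in the number of rows and useful only for understanding the definitions.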
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zulkurnain, N.F., Haglin, D.J., Keane, J.A. (2013). DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree. In: Washio, T., Luo, J. (eds) Emerging Trends in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science(), vol 7769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36778-6_12
DOI: https://doi.org/10.1007/978-3-642-36778-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36777-9
Online ISBN: 978-3-642-36778-6
eBook Packages: Computer Science (R0)