Abstract
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-of-the-art CHUD and CHUI-Miner algorithms, and that CLS-Miner scales linearly.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases. 1994, 487–499
Zaki MJ, Gouda K. Fast vertical mining using diffsets. In: Proceedings of the 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 326–335
Han J W, Pei J, Yin Y W. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53–87
Han J, Wang J, Lu Y, Tzvetkov P. Mining top-k frequent closed patterns without minimum support. In: Proceedings of IEEE International Conference on Data Mining. 2002, 211–218
Grahne G, Zhu J. Fast algorithms for frequent itemset mining using fptrees. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(10): 1347–1362
Deng Z H, Lv S L. PrePost+: an efficient N-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Systems with Applications, 2015, 42(13): 5424–5432
Dam T L, Li K, Fournier-Viger P, Duong Q H. An efficient algorithm for mining top-rank-k frequent patterns. Applied Intelligence, 2016, 45(1): 96–111
Liu Y, Liao WK, Choudhary A. A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2005, 689–695
Liu M, Qu J. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 55–64
Tseng V, Shie B E, Wu C W, Yu P. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1772–1786
Fournier-Viger P, Wu C W, Zida S, Tseng V. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of International Symposium on Methodologies for Intelligent Systems. 2014, 83–92
Song W, Liu Y, Li J. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. International Journal of Data Warehousing and Mining, 2014, 10(1): 1–15
Song W, Zhang Z, Li J. A high utility itemset mining algorithm based on subsume index. Knowledge and Information Systems, 2015, 1–26
Duong Q H, Liao B, Fournier-Viger P, Dam T L. An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowledge-Based Systems, 2016, 104: 106–122
Ahmed C, Tanbeer S, Jeong B S, Lee Y K. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708–1721
Yang Q. Three challenges in data mining. Frontiers of Computer Science in China, 2010, 4(3): 324–333
Chen J, Li K, Tang Z, Bilal K, Yu S, Weng C, Li K. A parallel random forest algorithm for big data in a spark cloud computing environment. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(4): 919–933
Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Applied Intelligence, 2014, 40(1): 29–43
Fournier-Viger P, Lin J C W, Duong Q H, Dam T L. In: Fujita H, Ali M, Selamat A, et al, eds. FHM+: Faster High-Utility Itemset Mining Using Length Upper-Bound Reduction. Cham: Springer International Publishing, 2016, 115–127
Tseng V S, Wu C W, Fournier-Viger P, Yu P S. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(3): 726–739
Pasquier N, Bastide Y, Taouil R, Lakhal L. Efficient mining of association rules using closed itemset lattices. Information Systems, 1999, 24(1): 25–46
Lucchese C, Orlando S, Perego R. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 21–36
Zaki M J, Hsiao C J. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 462–478
Fournier-Viger P. FHN: efficient mining of high-utility itemsets with negative unit profits. In: Proceedings of International Conference on Advanced Data Mining and Applications. 2014, 16–29
Tseng V, Wu CW, Fournier-Viger P, Yu P. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 54–67
Wu C W, Fournier-Viger P, Gu J Y, Tseng V S. Mining closed+ high utility itemsets without candidate generation. In: Proceedings of Conference on Technologies and Applications of Artificial Intelligence. 2015, 187–194
Chan R, Yang Q, Shen Y D. Mining high utility itemsets. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 19–26
Lan G C, Hong T P, Tseng V S. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems, 2014, 38(1): 85–107
Gouda K, Zaki M J. Genmax: An efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 2005, 11(3): 223–242
Uno T, Kiyomi M, Arimura H. LCMver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations. 2004
Szathmary L, Valtchev P, Napoli A, Godin R, Boc A, Makarenkov V. A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Annals of Mathematics and Artificial Intelligence, 2014, 70(1–2): 81–105
Fournier-Viger P, Wu C W, Tseng V S. Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the International Conference on Advanced Data Mining and Applications. 2014, 30–43
Shie B E, Philip S Y, Tseng V S. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 2012, 39(17): 12947–12960
Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Proceedings of the International Conference on Database Theory. 1999, 398–416
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C W, Tseng V S. SPMF: a Java open-source pattern mining library. Journal of Machine Learning Research, 2014, 15: 3569–3573
Pisharath J, Liu Y, Liao W K, Choudhary A, Memik G, Parhi J. NU-MineBench 2.0. CUCIS Technical Report CUCIS-2005-08-01. 2005
Acknowledgements
The research was funded by the National Natural Science Foundation of China (Grant Nos. 61133005, 61432005, 61370095, 61472124, 61202109, and 61472126), and the International Science and Technology Cooperation Program of China (2015DFA11240 and 2014DFBS0010). T.-L. Dam was also partially supported by the science research fund of Hanoi University of Industry, Hanoi, Vietnam.
Author information
Authors and Affiliations
Corresponding author
Additional information
Thu-Lan Dam received the BS and MS degrees in computer science from Hanoi University of Science and Technology, Vietnam in 2004 and 2009. She is currently working toward the PhD degree in computer science at College of Computer Science and Electronic Engineering, Hunan University, China. Her research interests are optimization, operational research, data mining and knowledge discovery.
Kenli Li received the PhD degree in computer science from Huazhong University of Science and Technology, China in 2003. He was a visiting scholar at University of Illinois at Urbana Champaign, USA from 2004 to 2005. He is currently a full professor of computer science and technology at Hunan University, China, and deputy director of National Supercomputing Center in Changsha. His major research areas include parallel and distributed processing, cloud computing, scheduling algorithms, computer simulation, biological computing, and big data management. He has published more than 100 research papers in international conferences and journals such as IEEE-TC, IEEE-TPDS, JPDC, ICPP, CCGrid. He is an outstanding member of CCF. He is a member of the IEEE and serves on the editorial boards of IEEE Transactions on Computers.
Philippe Fournier-Viger is a full professor at School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, China. He received his PhD degree in cognitive computer science from the University of Quebec in Montreal, Canada in 2010. He has published more than 85 research papers in refereed international conferences and journals. His research interests include data mining, pattern mining, text mining, intelligent tutoring systems, knowledge representation and cognitive modeling. He is the founder of the popular SPMF open-source data mining library.
Quang-Huy Duong received the MS degree in computer science from Hunan University, China in 2016. He is currently working toward the PhD degree in computer science at the Faculty of Information Technology, Mathematics and Electrical Engineering, Norwegian University of Science and Technology, Norway. His research interests are optimization, data stream mining and bioinformatics.
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Dam, TL., Li, K., Fournier-Viger, P. et al. CLS-Miner: efficient and effective closed high-utility itemset mining. Front. Comput. Sci. 13, 357–381 (2019). https://doi.org/10.1007/s11704-016-6245-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11704-016-6245-4