Abstract
In this paper, we study the incremental update of Frequent Closed Itemsets (FCIs) over a sliding window in a high-speed data stream. We propose the notion of semi-FCIs, which is to progressively increase the minimum support threshold for an itemset as it is retained longer in the window, thereby drastically reducing the number of itemsets that need to be maintained and processed. We explore the properties of semi-FCIs and observe that a majority of the subsets of a semi-FCI are not semi-FCIs and need not be updated. This finding allows us to devise an efficient algorithm, IncMine, that incrementally updates the set of semi-FCIs over a sliding window. We also develop an inverted index to facilitate the update process. Our empirical results show that IncMine achieves significantly higher throughput and consumes less memory than the state-of-the-art streaming algorithms for mining FCIs and FIs. IncMine also attains high accuracy of 100% precision and over 93% recall.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal, R., Imielinski, T., & Swami, A. N. (1993). Mining association rules between sets of items in large databases. In SIGMOD, (pp. 207–216).
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB, (pp. 487–499).
Chang, J. H., & Lee, W. S. (2003). Finding recent frequent itemsets adaptively over online data streams. In KDD, (pp. 487–492).
Chang, J. H., & Lee, W. S. (2004). A sliding window method for finding recently frequent itemsets over online data streams. Journal of Information Science and Engineering, 20(4), 753–762.
Chen, Y., Dong, G., Han, J., Wah, B. W., & Wang, J. (2002). Multi-dimensional regression analysis of time-series data streams. In VLDB, (pp. 323–334).
Chi, Y., Wang, H., Yu, P. S., & Muntz, R. R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In ICDM, (pp. 59–66).
FIMI Dataset Repository (2003). Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/.
Garofalakis, M. N., Gehrke, J., & Rastogi, R. (2002). Querying and mining data streams: You only get one look a tutorial. In SIGMOD, (p. 63).
Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. S. (2004). Mining frequent patterns in data streams at multiple time granularities. Cambridge, MA: MIT Press.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In SIGMOD, (pp. 1–12).
IBM Quest (1996). Ibm quest data mining project. frequent itemset mining dataset repository. http://www.almaden.ibm.com/software/quest/.
Jiang, N. & Gruenwald, L. (2006). CFI-Stream: Mining closed frequent itemsets in data streams. In KDD, (pp. 592–597).
Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. In VLDB, (pp. 180–191).
Lee, C.-H., Lin, C.-R., & Chen, M.-S. (2001). Sliding-window filtering: An efficient algorithm for incremental mining. In CIKM, (pp. 263–270).
Li, H., Lee, S., & Shan, M. (2004). Algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of First International Workshop on Knowledge Discovery in Data Streams.
Manku, G. S. & Motwani, R. (2002). Approximate frequency counts over data streams. In VLDB, pages 346–357.
Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In ICDT, (pp. 398–416).
Pei, J., Han, J., & Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, (pp. 21–30).
Wang, J., Han, J., & Pei, J. (2003). CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In KDD, (pp. 236–245).
Yu, J. X., Chong, Z., Lu, H., & Zhou, A. (2004). False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In VLDB, (pp. 204–215).
Zaki, M. J. (2000). Generating non-redundant association rules. In KDD, (pp. 34–43).
Zaki, M. J., & Hsiao, C.-J. (2002). Charm: An efficient algorithm for closed itemset mining. In SDM.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cheng, J., Ke, Y. & Ng, W. Maintaining frequent closed itemsets over a sliding window. J Intell Inf Syst 31, 191–215 (2008). https://doi.org/10.1007/s10844-007-0042-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10844-007-0042-3