Abstract
We study the problem of computing frequent elements in a data stream. Given a support threshold \(s \in [0, 1]\), an element is said to be frequent if it occurs more than \(sN\) times, where \(N\) denotes the current length of the stream. If we maintain a list of counters of the form ⟨element, count⟩, one counter per unique element encountered, we need \(N\) counters in the worst case. Many distributions are heavy-tailed in practice, so we would typically need far fewer than \(N\) counters. However, the number would still exceed \(1/s\), which is the maximum possible number of frequent elements. If we insist on identifying exact frequency counts, then \(\varOmega(N)\) space is necessary. This observation motivates the design of streaming techniques based on \(\epsilon\)-approximate frequency counts. We also discuss the extension of these ideas to the problem of mining frequent itemsets over streams, and relevant applications.
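To make the idea of \(\epsilon\)-approximate frequency counts concrete, the following is a minimal sketch of one well-known counter-based technique, the Misra-Gries summary. It is offered as an illustration only and is not necessarily the algorithm developed in the chapter. The function names `misra_gries` and `frequent_elements`, and the choice of \(k = \lceil 1/\epsilon \rceil\), are assumptions made for this example. The summary keeps at most \(k - 1\) counters, and each stored count underestimates the true count by at most \(N/k \le \epsilon N\).

```python
import math


def misra_gries(stream, epsilon):
    """Illustrative Misra-Gries summary (hypothetical helper, not from the chapter).

    Keeps at most k - 1 counters with k = ceil(1 / epsilon). Each stored
    count is at most the true count and at least (true count - N / k).
    Returns the counters and the stream length N.
    """
    k = math.ceil(1 / epsilon)
    counters = {}
    n = 0
    for x in stream:
        n += 1
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            # The incoming element is discarded along with them.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, n


def frequent_elements(stream, s, epsilon):
    """Report every element whose stored count exceeds (s - epsilon) * N.

    Under the bound above, every element occurring more than s * N times
    is reported, and every reported element occurs more than
    (s - epsilon) * N times.
    """
    counters, n = misra_gries(stream, epsilon)
    threshold = (s - epsilon) * n
    return {x for x, c in counters.items() if c > threshold}
```

For example, calling `frequent_elements(stream, s=0.2, epsilon=0.05)` uses at most 19 counters regardless of the stream length, in contrast to the one-counter-per-element approach described above.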
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Manku, G.S. (2016). Frequent Itemset Mining over Data Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_10
DOI: https://doi.org/10.1007/978-3-540-28608-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science (R0)