Frequent Itemset Mining over Data Streams

Manku, Gurmeet Singh

doi:10.1007/978-3-540-28608-0_10

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

We study the problem of computing frequent elements in a data stream. Given a support threshold \(s \in [0, 1]\), an element is said to be frequent if it occurs more than \(sN\) times, where \(N\) denotes the current length of the stream. If we maintain a list of counters of the form 〈element, count〉, one counter per unique element encountered, we need \(N\) counters in the worst case. Many distributions are heavy-tailed in practice, so we would typically need far fewer than \(N\) counters. However, the number would still exceed \(1/s\), which is the maximum possible number of frequent elements. If we insist on identifying exact frequency counts, then \(\varOmega(N)\) space is necessary. This observation motivates the design of streaming techniques based on \(\epsilon\)-approximate frequency counts. We also discuss the extension of these ideas to the problem of mining frequent itemsets over streams, and relevant applications.
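
To make the notion of \(\epsilon\)-approximate frequency counts concrete, here is a minimal sketch of one well-known counter-based summary, usually attributed to Misra and Gries. It is an illustration chosen for this page, not necessarily the algorithm the chapter develops. The function names `misra_gries` and `frequent_candidates` are hypothetical. The sketch keeps at most \(\lceil 1/\epsilon \rceil\) counters, and each stored count undercounts the true frequency by less than \(\epsilon N\). Reporting every element whose stored count exceeds \((s - \epsilon)N\) therefore misses no frequent element, at the cost of possibly including some elements whose true frequency lies between \((s - \epsilon)N\) and \(sN\).

```python
from math import ceil


def misra_gries(stream, epsilon):
    """Summarize a stream with at most ceil(1/epsilon) counters.

    Returns (counters, n), where n is the stream length and every
    element x with true frequency f(x) satisfies
        f(x) - epsilon * n < counters.get(x, 0) <= f(x).
    Illustrative sketch only; names and interface are assumptions.
    """
    capacity = ceil(1 / epsilon)
    counters = {}
    n = 0
    for x in stream:
        n += 1
        if x in counters:
            counters[x] += 1
        elif len(counters) < capacity:
            counters[x] = 1
        else:
            # Table is full: decrement every counter and discard the
            # new element. Each such step cancels capacity + 1
            # occurrences, so it happens at most n / (capacity + 1) times.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, n


def frequent_candidates(counters, n, s, epsilon):
    """Return elements whose stored count exceeds (s - epsilon) * n.

    Every element occurring more than s * n times is included; no
    element occurring at most (s - epsilon) * n times is included.
    """
    return {x for x, c in counters.items() if c > (s - epsilon) * n}


if __name__ == "__main__":
    # Tiny demonstration on a hand-made stream (assumed data).
    demo = list("abacabadabacabae")
    summary, length = misra_gries(demo, epsilon=0.25)
    print(frequent_candidates(summary, length, s=0.4, epsilon=0.25))
```

The space used depends only on \(\epsilon\), not on \(N\), which is the point of relaxing exact counts to approximate ones. A full decrement pass costs time proportional to the number of counters, which is acceptable for a sketch; practical implementations usually group counters by value to avoid it.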

Author information

Authors and Affiliations

Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA, USA
Gurmeet Singh Manku

Corresponding author

Correspondence to Gurmeet Singh Manku.

Editor information

Editors and Affiliations

School of ECE, Technical University of Crete, University Campus - Kounoupidiana, Chania, Greece
Minos Garofalakis
Microsoft Corporation, Redmond, Washington, USA
Johannes Gehrke
Amazon India, Bangalore, India
Rajeev Rastogi

About this chapter

Cite this chapter

Manku, G.S. (2016). Frequent Itemset Mining over Data Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_10

DOI: https://doi.org/10.1007/978-3-540-28608-0_10
Published: 12 July 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science, Computer Science (R0)
