Abstract
This paper addresses parallel aggregation over data streams on a shared-nothing architecture. A novel granularity-aware parallel aggregation model is proposed. It uses parallel sampling and linear regression to characterize the data volume in the query window and thereby determine the partition granularity of tuples, and it uses an equal-depth histogram to perform the partitioning. This method can avoid data skew and reduce communication cost. Experimental results on both synthetic and real data show that the proposed method is efficient, practical, and suitable for processing time-varying data streams.
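The abstract names three ingredients (regression on the window's data volume, a partition granularity derived from it, and equal-depth histogram partitioning) without giving their details. The following is a minimal sketch of how such pieces could fit together, not the paper's actual algorithm. All function names, the `tuples_per_partition` rule, and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def choose_granularity(predicted_tuples, tuples_per_partition, max_partitions):
    """Hypothetical rule: one partition per `tuples_per_partition` predicted
    tuples, clamped to the range [1, max_partitions]."""
    wanted = -(-int(predicted_tuples) // tuples_per_partition)  # ceiling division
    return max(1, min(max_partitions, wanted))


def equal_depth_boundaries(sample, num_partitions):
    """Split points chosen so each bucket holds about the same number of
    sampled keys (equal depth), rather than spanning equal key ranges."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def route(key, boundaries):
    """Index of the partition that a tuple with this key is sent to."""
    return bisect_right(boundaries, key)


# Assumed history: tuple counts observed in five consecutive query windows.
window_ids = [0, 1, 2, 3, 4]
window_counts = [100, 200, 300, 400, 500]
a, b = fit_line(window_ids, window_counts)
predicted = a + b * 5  # extrapolate to the next window

# Assumed tuning values: 200 tuples per partition, at most 8 nodes.
partitions = choose_granularity(predicted, tuples_per_partition=200, max_partitions=8)

# Assumed skewed sample of grouping keys gathered from the nodes.
sample = [1, 2, 3, 4, 5, 6, 50, 80, 100]
boundaries = equal_depth_boundaries(sample, partitions)

# Count how many sampled keys land in each partition.
loads = [0] * partitions
for key in sample:
    loads[route(key, boundaries)] += 1
```

In this sketch the bucket boundaries follow the sampled key distribution, so a skewed key range still yields partitions of similar size. That is the property the abstract attributes to equal-depth partitioning. A sample with many duplicate keys can produce repeated boundaries and would need extra handling that is omitted here.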
Additional information
Foundation item: Supported by the Foundation of High Technology Project of Jiangsu (BG2004034) and the Foundation of Graduate Creative Program of Jiangsu (xm04-36)
Biography: WANG Yong-li (1974-), male, Ph.D. candidate; research interests: data stream processing, knowledge discovery, hardware and software blending.
Cite this article
Yong-li, W., Hong-bing, X., Li-zhen, X. et al. A granularity-aware parallel aggregation method for data streams. Wuhan Univ. J. Nat. Sci. 11, 133–137 (2006). https://doi.org/10.1007/BF02831718
DOI: https://doi.org/10.1007/BF02831718