Abstract
We consider a distributed system where each node has a local count for each item (similar to elections where nodes are ballot boxes and items are candidates). A top-k query in such a system asks which k items have the largest sum of counts across all nodes in the system. In this paper we present a Monte-Carlo algorithm that outputs, with high probability, a set of k candidates that approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, its communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We also show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node.
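To make the query definition concrete, the following is a minimal Python sketch of the exact (non-approximate) top-k query, computed centrally by summing every node's local counts. It is a naive baseline for illustration only, not the Monte-Carlo algorithm of the paper; the function name `exact_top_k`, the tie-breaking rule, and the sample data are assumptions introduced here.

```python
from collections import Counter
from typing import Dict, List


def exact_top_k(node_counts: List[Dict[str, int]], k: int) -> List[str]:
    """Return the k items with the largest global score.

    node_counts holds one dict per node, mapping item -> local count.
    The global score of an item is the sum of its local counts.
    """
    totals: Counter = Counter()
    for local in node_counts:
        totals.update(local)  # adds each local count to the running total
    # Sort by descending global score; break ties by item name
    # (the tie-breaking rule is an assumption made for determinism).
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical data: three "ballot boxes" (nodes) and four candidates (items).
ballot_boxes = [
    {"alice": 5, "bob": 1, "carol": 2},
    {"alice": 1, "bob": 4, "carol": 2},
    {"alice": 2, "bob": 2, "dave": 1},
]
top2 = exact_top_k(ballot_boxes, 2)
```

In this sample, bob is the local winner at the second node, yet alice has the larger global score. The answer depends only on the summed counts, which is why a node cannot decide the result from its local view alone, and why this baseline requires every node to report all of its counts.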
This research was supported in part by Israel Ministry of Science and Technology contract 3-941.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Patt-Shamir, B., Shafrir, A. (2006). Approximate Top-k Queries in Sensor Networks. In: Flocchini, P., Gąsieniec, L. (eds) Structural Information and Communication Complexity. SIROCCO 2006. Lecture Notes in Computer Science, vol 4056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780823_25
DOI: https://doi.org/10.1007/11780823_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35474-1
Online ISBN: 978-3-540-35475-8