Abstract
In this paper we study the problem of estimating several types of spatial queries in a streaming environment. We propose a new approach, which we call Local Kernels, for computing density estimators by using local rather than global statistics on the data. The approach is easy to extend to an on-line setting by maintaining a small random sample with a kd-tree-like structure on top of it. Our structure dynamically adapts to changes in the locality of data and has small update time. Experimental results show that the proposed algorithm returns good approximate results for a variety of data and query distributions. We also show that it is useful in off-line computations.
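The general idea of estimating a range query from a small stream sample with locally chosen kernel bandwidths can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' Local Kernels algorithm: the names `ReservoirSample`, `local_bandwidths`, and `estimate_range_count` are hypothetical, the kernel is assumed to be a product of box (uniform) kernels, the "local statistic" is assumed to be the per-dimension spread of each sample point's k nearest sample neighbours, and a brute-force neighbour search stands in for the kd-tree-like structure mentioned in the abstract.

```python
import random


class ReservoirSample:
    """Fixed-size uniform random sample of a stream (reservoir sampling)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.points = []      # current sample
        self.seen = 0         # number of stream items observed so far
        self._rng = random.Random(seed)

    def add(self, point):
        self.seen += 1
        if len(self.points) < self.capacity:
            self.points.append(tuple(point))
        else:
            # Keep the new item with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.points[j] = tuple(point)


def local_bandwidths(points, k=5, min_h=1e-9):
    """Assumed local statistic: per-dimension spread of the k nearest
    sample neighbours of each point (brute force, for illustration only)."""
    bandwidths = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
        nearest = others[:k]
        h = tuple(
            max([abs(q[d] - p[d]) for q in nearest] + [min_h])
            for d in range(len(p))
        )
        bandwidths.append(h)
    return bandwidths


def overlap_fraction(center, h, lo, hi):
    """Fraction of the box kernel [center - h, center + h] inside [lo, hi]."""
    left = max(lo, center - h)
    right = min(hi, center + h)
    return max(0.0, right - left) / (2.0 * h)


def estimate_range_count(sample, bandwidths, query_lo, query_hi):
    """Estimate how many stream points fall in the axis-parallel query box."""
    if not sample.points:
        return 0.0
    total = 0.0
    for p, h in zip(sample.points, bandwidths):
        mass = 1.0
        for d in range(len(p)):
            mass *= overlap_fraction(p[d], h[d], query_lo[d], query_hi[d])
        total += mass
    # Scale the average kernel mass by the number of stream items seen.
    return sample.seen * total / len(sample.points)
```

In this sketch, each stream point would be passed to `add`, the bandwidths recomputed from the current sample, and `estimate_range_count` called with the lower and upper corners of a query rectangle. Points in dense regions get narrow kernels and points in sparse regions get wide ones, which is the sense in which the bandwidth here is local rather than global.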
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Procopiuc, C.M., Procopiuc, O. (2005). Density Estimation for Spatial Data Streams. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds) Advances in Spatial and Temporal Databases. SSTD 2005. Lecture Notes in Computer Science, vol 3633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11535331_7
DOI: https://doi.org/10.1007/11535331_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28127-6
Online ISBN: 978-3-540-31904-7
eBook Packages: Computer Science (R0)