Abstract
Range searches in metric spaces can be very difficult if the space is “high dimensional”, i.e. when the histogram of distances has a large mean and/or a small variance. This so-called “curse of dimensionality”, well known in vector spaces, is also observed in metric spaces. There are at least two reasons behind the curse of dimensionality: a large search radius and/or a high intrinsic dimension of the metric space. We present a general probabilistic framework based on stretching the triangle inequality, whose direct effect is a reduction of the effective search radius. The technique gets more effective as the dimension grows, and the basic principle can be applied to any search algorithm. In this paper we apply it to a particular class of indexing algorithms. We present an analysis which helps understand the process, as well as empirical evidence showing dramatic improvements in the search time at the cost of a very small error probability.
This work has been partially supported by CYTED VII.13 AMYRI Project (both authors), CONACyT grant R-28923A (first author) and FONDECYT Project 1-000929 (second author).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimension. In Proc. SODA’ 94, pages 573–583, 1994.
J. Bentley. Multidimensional binary search trees in database applications. IEEE Transactions on Software Engineering, 5(4):333–340, 1979.
S. Berchtold, D. Keim, and H. Kriegel. The X-tree: an index structure for high-dimensional data. In Proc. VLDB’96, pages 28–39, 1996.
E. Chávez, J. Marroquýn, and G. Navarro. Fixed queries array: a fast and economical data structure for proximity searching. Multimedia Tools and Applications, 2001. To appear.
E. Chávez and G. Navarro. Measuring the dimensionality of general metric spaces. Tech.Rep. TR/DCC-00-1, Dept. of Computer Science, Univ. of Chile, 2000. Submitted, ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/metricmodel.ps.gz
E. Chávez and G. Navarro. A probabilistic spell for the curse of dimensionality. Tech.Rep. TR/DCC-00-2, Dept. of Computer Science, Univ. of Chile, 2000. ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/probmetric.ps.gz
E. Chávez, G. Navarro, R. Baeza-Yates, and J. Marroquýn. Searching in metric spaces. ACM ComputingSurveys, 2001.Toappear. ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survmetric.ps.gz
B. Chazelle. Computational geometry: a retrospective. In Proc. ACM STOC’94, pages 75–94, 1994.
K. Clarkson. Nearest neighbor queries in metric spaces. Discrete Computational Geometry, 22(1):63–93, 1999.
A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. ACM SIGMOD’84, pages 47–57, 1984.
D. Harman. Overview of the Third Text REtrieval Conference. In Proc. TREC-3,pages 1–19, 1995. NIST Special Publication 500-207.
L. Micó, J. Oncina, and E. Vidal. A new version of the nearest-neighbor approximating and eliminating search (aesa) with linear preprocessing-time and memory requirements. Pattern Recognition Letters, 15:9–17, 1994.
D. White and R. Jain. Algorithms and strategies for similarity retrieval. Tech.Rep. VCL-96-101, Visual Comp. Lab., Univ. of California, La Jolla, CA, July 1996.
P. Yianilos. Excluded middle vantage point forests for nearest neighbor search. In DIMACS Implementation Challenge, ALENEX’99, Baltimore, MD, 1999.
Peter N. Yianilos. Locally lifting the curse of dimensionality for nearest neighbor search. Technical report, NEC Research Institute, Princeton, NJ, June 1999.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chávez, E., Navarro, G. (2001). A Probabilistic Spell for the Curse of Dimensionality. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_12
Download citation
DOI: https://doi.org/10.1007/3-540-44808-X_12
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42560-1
Online ISBN: 978-3-540-44808-2
eBook Packages: Springer Book Archive