Abstract
Nearest neighbour search (NNS) is an old problem that is of practical importance in a number of fields. It involves finding, for a given point q, called the query, one or more points from a given set of points that are nearest to the query q. Since the initial inception of the problem a great number of algorithms and techniques have been proposed for its solution. However, it remains the case that many of the proposed algorithms have not been compared against each other on a wide variety of datasets. This research attempts to fill this gap to some extent by presenting a detailed empirical comparison of three prominent data structures for exact NNS: KD-Trees, Metric Trees, and Cover Trees. Our results suggest that there is generally little gain in using Metric Trees or Cover Trees instead of KD-Trees for the standard NNS problem.
Chapter PDF
Similar content being viewed by others
References
Minsky, M., Papert, S.: Perceptrons, pp. 222–225. MIT Press, Cambridge (1969)
Aurenhammer, F.: Voronoi diagrams–A survey of a fundamental geometric data structure. ACM Computing Surveys 23(3), 345–405 (1991)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
Liu, T., Moore, A.W., Gray, A.G.: Efficient exact k-NN and nonparametric classification in high dimensions. In: Proc. of NIPS 2003, MIT Press, Cambridge (2004)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proc. 13th Annual ACM symposium on Theory of Computing, pp. 604–613. ACM Press, New York (1998)
Nene, S.A., Nayar, S.K.: A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans. Pattern Anal. Mach. Intell. 19(9), 989–1003 (1997)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proc. 20th Annual Symposium on Computational Geometry, pp. 253–262. ACM Press, New York (2004)
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proc. 23rd International Conference on Machine learning, pp. 97–104. ACM Press, New York (2006)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Mount, D.M., Arya, S.: ANN: A library for approximate nearest neighbor searching. In: CGC 2nd Annual Fall Workshop on Computational Geometry (1997), Available from http://www.cs.umd.edu/~mount/ANN
Kibriya, A.M.: Fast algorithms for nearest neighbour search. Master’s thesis, Department of Computer Science, University of Waikato, New Zealand (2007)
Omohundro, S.M.: Five balltree construction algorithms. Technical Report TR-89-063, International Computer Science Institute (December 1989)
Uhlmann, J.K.: Satisfying general proximity / similarity queries with metric trees. Information Processing Letters 40(4), 175–179 (1991)
Moore, A.W.: The anchors hierarchy: Using the triangle inequality to survive high dimensional data. In: Proc. 16th Conference on Uncertainty in Artificial Intelligence, pp. 397–405. Morgan Kaufmann, San Francisco (2000)
Bei, C.D., Gray, R.M.: An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Transactions on Communications 33(10), 1132–1133 (1985)
Nadeau, C., Bengio, Y.: Inference for the generalization error. Machine Learning 52(3), 239–281 (2003)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kibriya, A.M., Frank, E. (2007). An Empirical Comparison of Exact Nearest Neighbour Algorithms. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. PKDD 2007. Lecture Notes in Computer Science(), vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_16
Download citation
DOI: https://doi.org/10.1007/978-3-540-74976-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9
eBook Packages: Computer ScienceComputer Science (R0)