Abstract
Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is the key operation in many applications, such as multimedia databases, image retrieval and object recognition, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the property of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); these greatly affect the search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
An J, Chen H, Furuse K, Ohbo N (2005) CVA file: an index structure for high-dimensional datasets. Knowl Inform Syst 7: 337–357
Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Comm ACM 51(1): 117–122
Berchtold S, Keim D, Kriegel HP (1996) The X-tree: an index structure for high-dimensional data. In: Proceedings of VLDB
Beyer K et al (1999) When is “Nearest Neighbor” meaningful? In: Proceedings of IEEE ICDE
Bozkaya T, Ozsoyoglu M (1997) Distance-based indexing for high-dimensional metric spaces. In: Proceedings of ACM SIGMOD, pp 357–368
Brin S (1995) Near neighbor search in large metric spaces. In: Proceedings of VLDB, pp 574–584
Burkhard WA, Keller RM (1973) Some approaches to best-match file searching. Comm ACM 16(4): 230–236
Chavez E, Navarro G (2005) A compact space decomposition for effective metric indexing. Pattern Recognit Lett 26(9): 1363–1376
Ciaccia P, Patella M, Zezula P (1997) M-Tree: an efficient access method for similarity search in metric spaces. VLDB J 426–435
Data set is available at: http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html
Cui B, Ooi BC, Su J, Tan T (2005) Indexing high-dimensional data for efficient in-memory similarity search. IEEE Trans Knowl Data Eng 17(3): 339–353
Dohnal V et al (2003) D-index: distance searching index for metric data sets. Multimedia Tools Appl 21(19): 33
Filho RFS, Traina AJM, Traina C, Faloutsos C (2001) Similarity search without tears: the OMNI family of all-purpose access methods. In: Proceedings of IEEE ICDE, pp 623–630
Fu AW-C et al (2000) Dynamic VP-tree indexing for N-Nearest search given pair-wise distances. VLDB J
Source code is available at http://www.cs.mu.oz.au/~rui/code.htm
Jagadish HV, Ooi BC, Tan KL, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2): 364–397
Katamaya N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of ACM SIGMOD
Data set is available at: http://kodiak.cs.cornell.edu/kddcup/datasets.html
Kailing K, Kriegel H-P, Pfeifle M, Schonauer S (2006) Extending metric index structures for efficient range query processing. Knowl Inform Syst 10(2): 211–227
Kadiyala S, Shiri N (2008) A compact multi-resolution index for variable length queries in time series databases. Knowl Inform Syst 15: 131–147
Lowe DG (2004) Distinctive image features from scale invariant features. IJCV 60(2): 91–110
Nievergelt J, Hinterberger H, Sevcik KC (1984) The grid file: an adaptable, symmetric multikey file structure. ACM Trans Database Syst 9(1)
Pawlak Z (1991) Rough sets theoretical aspects of reasoning about data. Kluwer, Dordrecht
Seidl T, Kriegel HP (1998) Optimal multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD
Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In: Proceedings of ICDT, pp 158–172
Traina C, Traina AJM, Seeger B, Faloutsos C (2000) Slim-trees: highet performance metric trees minimizing overlap between nodes. In: Proceedings of EDBT, pp 51–65
Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inform Process Lett 40: 175–179
Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequences databases. In: Proceedings of VLDB
Vieira MR, Traina C, Chino FJT, Traina AJM (2004) DBM-tree: a dynamic metric access method sensitive to local density data. In: Proceedings of SBBD, pp 163–177
Source code is available at: http://www.cse.cuhk.edu.hk/~kdd/program.html
White DA, Jain R (1996) Similarity indexing with the ss-tree. In: Proceedings of IEEE ICDE
Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional space. In: Proceedings of VLDB
Yianilos P (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of ACM-SIAM symposium on discrete algorithms, pp 311–321
Yianilos P (1999) Excluded middle vantage point forests for nearest neighbor search. In: Implementation challenge, ALENEX’99
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, M., Alhajj, R. Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowl Inf Syst 22, 1–26 (2010). https://doi.org/10.1007/s10115-008-0190-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-008-0190-y