Abstract
The multidimensional k – NN (k nearest neighbors) query problem is relevant to a large variety of database applications, including information retrieval, natural language processing, and data mining. To solve it efficiently, the database needs an indexing structure that provides this kind of search. However, attempts to find an exact solution are hardly feasible in multidimensional space. In this paper, a novel indexing technique for the approximate solution of k – NN problem is described and analyzed. The construction of the indexing tree is based on clustering. Indexing structure is implemented on top of high-performance industrial DBMS.
This research is supported by Russian Foundation for Basic Research, grant 10-07-00156.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008), http://doi.acm.org/10.1145/1327452.1327494 , doi:10.1145/1327452.1327494
Barton, S., Gouet-Brunet, V., Rukoz, M.: Large scale disk-based metric indexing structure for approximate information retrieval by content. In: Proceedings of the 1st Workshop on New Trends in Similarity Search, NTSS 2011, pp. 2–7. ACM, New York (2011), http://doi.acm.org/10.1145/1966865.1966869 , doi:10.1145/1966865.1966869
Chen, J., Fang, H.R., Saad, Y.: Fast approximate knn graph construction for high dimensional data via recursive lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009), http://dl.acm.org/citation.cfm?id=1577069.1755852
Ciaccia, P., Patella, M.: Bulk loading the m-tree. In: Proceedings of the 9th Australasian Database Conference, ADC 1998, pp. 15–26 (1998)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB 1997, pp. 426–435. Morgan Kaufmann Publishers Inc., San Francisco (1997), http://dl.acm.org/citation.cfm?id=645923.671005
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG 2004, pp. 253–262. ACM, New York (2004), http://doi.acm.org/10.1145/997817.997857 , doi:10.1145/997817.997857
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999, pp. 518–529. Morgan Kaufmann Publishers Inc., San Francisco (1999), http://dl.acm.org/citation.cfm?id=645925.671516
Gudmundsson, G.T., Jónsson, B.T., Amsaleg, L.: A large-scale performance study of cluster-based high-dimensional indexing. In: Proceedings of the International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, VLS-MCMR 2010, pp. 31–36. ACM, New York (2010), http://doi.acm.org/10.1145/1878137.1878145 , doi:10.1145/1878137.1878145
Günnemann, S., Kremer, H., Lenhard, D., Seidl, T.: Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations. In: Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT 2011, pp. 237–248. ACM, New York (2011), http://doi.acm.org/10.1145/1951365.1951395 , doi:10.1145/1951365.1951395
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD 1984, pp. 47–57. ACM, New York (1984), http://doi.acm.org/10.1145/602259.602266 , doi:10.1145/602259.602266
Guttman, A.: R-trees: a dynamic index structure for spatial searching. SIGMOD Rec. 14(2), 47–57 (1984), http://doi.acm.org/10.1145/971697.602266 , doi:10.1145/971697.602266
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 604–613. ACM, New York (1998), http://doi.acm.org/10.1145/276698.276876 , doi:10.1145/276698.276876
Moreno-Seco, F., Milcó, L., Oncina, J.: A modification of the laesa algorithm for approximated k-nn classification. Pattern Recogn. Lett. 24(1-3), 47–53 (2003), http://dx.doi.org/10.1016/S0167-86550200187-3 , doi:10.1016/S0167-8655(02)00187-3
Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: Popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11, 2487–2531 (2010), http://dl.acm.org/citation.cfm?id=1756006.1953015
Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. Proc. VLDB Endow. 3(1-2), 1231–1242 (2010), http://dl.acm.org/citation.cfm?id=1920841.1920994
Shen, H.T., Zhou, X., Zhou, A.: An adaptive and dynamic dimensionality reduction method for high-dimensional indexing. The VLDB Journal 16(2), 219–234 (2007), http://dx.doi.org/10.1007/s00778-005-0167-3 , doi:10.1007/s00778-005-0167-3
Skopal, T.: Where are you heading, metric access methods?: a provocative survey. In: Proceedings of the Third International Conference on SImilarity Search and Applications, SISAP 2010, pp. 13–21. ACM, New York (2010), http://doi.acm.org/10.1145/1862344.1862347 , doi:10.1145/1862344.1862347
Skopal, T., Pokorný, J., Krátký, M., Snášel, V.: Revisiting M-Tree Building Principles. In: Kalinichenko, L.A., Manthey, R., Thalheim, B., Wloka, U. (eds.) ADBIS 2003. LNCS, vol. 2798, pp. 148–162. Springer, Heidelberg (2003)
Thomasian, A., Zhang, L.: Persistent clustered main memory index for accelerating k-nn queries on high dimensional datasets. Multimedia Tools Appl. 38(2), 253–270 (2008), http://dx.doi.org/10.1007/s11042-007-0179-7 , doi:10.1007/s11042-007-0179-7
Traina Jr., C., Traina, A.J.M., Seeger, B., Faloutsos, C.: Slim-Trees: High Performance Metric Trees Minimizing Overlap between Nodes. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 51–65. Springer, Heidelberg (2000), http://dl.acm.org/citation.cfm?id=645339.650146
Vidal, E.: New formulation and improvements of the nearest-neighbour approximating and eliminating search algorithm (aesa). Pattern Recognition Letters 15(1), 1–7 (1994)
Xu, J., Zheng, B., Lee, W.C., Lun Lee, D.: The d-tree: An index structure for planar point queries in location-based wireless services. IEEE Trans. on Knowl. and Data Eng. 16(12), 1526–1542 (2004), http://dx.doi.org/10.1109/TKDE.2004.97 , doi:10.1109/TKDE.2004.97
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mikhaylova, E., Novikov, B., Volokhov, A. (2013). An Indexing Structure for Dynamic Multidimensional Data in Vector Space. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32741-4_17
Download citation
DOI: https://doi.org/10.1007/978-3-642-32741-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32740-7
Online ISBN: 978-3-642-32741-4
eBook Packages: EngineeringEngineering (R0)