Dynamic Multi-probe LSH: An I/O Efficient Index Structure for Approximate Nearest Neighbor Search

Yin, Shaoyi; Badr, Mehdi; Vodislav, Dan

doi:10.1007/978-3-642-40285-2_7


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8055)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Locality-Sensitive Hashing (LSH) is widely used to solve approximate nearest neighbor search problems in high-dimensional spaces. The basic idea is to map “nearby” objects into the same hash bucket with high probability. A significant drawback is that LSH requires a large number of hash tables to achieve good search quality. Multi-probe LSH was proposed to reduce the number of hash tables by looking up multiple buckets in each table. While optimized for a main-memory database, it is not optimal when multi-dimensional vectors are stored in secondary storage, because the probed buckets may be randomly distributed across different physical pages. To optimize I/O efficiency, we propose a new method called Dynamic Multi-probe LSH, which groups small hash buckets into a single bucket by dynamically increasing the number of hash functions during index construction. Experimental results show that our method is significantly more I/O efficient.
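
The grouping idea in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given here; it is only one possible reading of "dynamically increasing the number of hash functions": every vector starts in a single bucket keyed by zero hash values, and a bucket receives one more hash function only when it grows beyond a page-sized capacity, so sparse regions stay grouped in one bucket. The hash form follows the p-stable scheme h(v) = floor((a·v + b) / w). The names `make_hash` and `DynamicBucketIndex`, the parameters `capacity`, `w`, and `max_funcs`, and the ±1 probing rule are all assumptions made for illustration.

```python
import random


def make_hash(dim, w, rng):
    """One p-stable style hash: h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        return int((sum(x * y for x, y in zip(a, v)) + b) // w)

    return h


class DynamicBucketIndex:
    """Illustrative index: a bucket key grows by one hash value per split."""

    def __init__(self, dim, max_funcs=8, capacity=4, w=4.0, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, w, rng) for _ in range(max_funcs)]
        self.capacity = capacity  # stand-in for "vectors per disk page"
        self.buckets = {}         # leaf key (tuple of hash values) -> vectors
        self.split = set()        # keys that were split and are no longer leaves

    def _leaf_key(self, v):
        # Extend the key with further hash values while it names a split bucket.
        key = ()
        while key in self.split:
            key = key + (self.hashes[len(key)](v),)
        return key

    def insert(self, v):
        key = self._leaf_key(v)
        bucket = self.buckets.setdefault(key, [])
        bucket.append(v)
        # Only an overflowing bucket receives an additional hash function.
        if len(bucket) > self.capacity and len(key) < len(self.hashes):
            self._split(key)

    def _split(self, key):
        h = self.hashes[len(key)]
        self.split.add(key)
        for u in self.buckets.pop(key):
            self.buckets.setdefault(key + (h(u),), []).append(u)
        # A child that still overflows is split on a later insert into it.

    def query(self, q, probes=(0, -1, 1)):
        """Return candidates from the query's bucket and simple neighbor probes."""
        key = self._leaf_key(q)
        if not key:
            return list(self.buckets.get(key, []))
        candidates = []
        for d in probes:
            # Perturb only the last hash value; a probe that lands on a
            # split (non-leaf) key simply contributes nothing in this sketch.
            probe_key = key[:-1] + (key[-1] + d,)
            candidates.extend(self.buckets.get(probe_key, []))
        return candidates


# Small demonstration on random 8-dimensional vectors.
data_rng = random.Random(1)
points = [[data_rng.uniform(0.0, 10.0) for _ in range(8)] for _ in range(200)]
index = DynamicBucketIndex(dim=8)
for p in points:
    index.insert(p)
candidates = index.query(points[0])
```

In this sketch the number of hash values in a bucket's key differs from bucket to bucket, so dense regions are subdivided while sparse ones remain in a single bucket that could be stored on one page. A real system would still rank the returned candidates by their true distance to the query.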



Author information

Authors and Affiliations

ETIS, Univ. of Cergy-Pontoise / CNRS, France
Shaoyi Yin, Mehdi Badr & Dan Vodislav


Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Department of Information Technologies, University of Economics, Winston Churchill Square 4, 130 67, Prague 3, Czech Republic
Josef Basl
Institute of Software Technology, Vienna University of Technology, Favoritenstraße 9-11 / 188, 1040, Vienna, Austria
A Min Tjoa


Cite this paper

Yin, S., Badr, M., Vodislav, D. (2013). Dynamic Multi-probe LSH: An I/O Efficient Index Structure for Approximate Nearest Neighbor Search. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40285-2_7


DOI: https://doi.org/10.1007/978-3-642-40285-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40284-5
Online ISBN: 978-3-642-40285-2
eBook Packages: Computer Science, Computer Science (R0)
