Index Structures for Fast Similarity Search for Real-Valued Vectors. I

Rachkovskij, D. A.

doi:10.1007/s10559-018-0016-1

Index Structures for Fast Similarity Search for Real-Valued Vectors. I

NEW MEANS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS
Published: 27 January 2018

Volume 54, pages 152–164, (2018)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Cybernetics and Systems Analysis Aims and scope

Index Structures for Fast Similarity Search for Real-Valued Vectors. I

Download PDF

D. A. Rachkovskij¹

100 Accesses
5 Citations
Explore all metrics

Abstract

This survey paper considers index structures for fast similarity search for objects represented by real-valued vectors. Index structures based on locality-sensitive hashing and their modifications are discussed. The ideas of specific algorithms, including the recently proposed ones, are stated. Their interrelations and some theoretical aspects are discussed.

Article PDF

Index Structures for Fast Similarity Search for Real Vectors. II^*

Article 01 March 2018

Index Structures for Fast Similarity Search for Symbol Strings

Article 01 September 2019

Index Structures for Fast Similarity Search for Binary Vectors

Article 26 September 2017

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

D. A. Rachkovskij, “Real-valued embeddings and sketches for fast distance and similarity estimation,” Cybernetics and Systems Analysis, Vol. 52, No. 6, 967–988 (2016).
Article MathSciNet MATH Google Scholar
D. A. Rachkovskij, “Binary vectors for fast distance and similarity estimation,” Cybernetics and Systems Analysis, Vol. 53, No. 1, 138–156 (2017).
Article MathSciNet MATH Google Scholar
D. A. Rachkovskij, “Distance-based index structures for fast similarity search,” Cybernetics and Systems Analysis, Vol. 53, No. 4, 636–658 (2017).
Article MathSciNet Google Scholar
D. A. Rachkovskij, “Index structures for fast similarity search for binary vectors,” Cybernetics and Systems Analysis, Vol. 53, No. 5, 799–820 (2017).
Article MathSciNet Google Scholar
C.Manning, P. Raghavan, and H. Schutze, Introduction to Information Retrieval, Cambridge University Press, New York (2008).
Book MATH Google Scholar
R. Datta, D. Joshi, J. Li, and J. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Computing Surveys, Vol. 40, No. 2, 1–60 (2008).
Article Google Scholar
Ì. Ì. Fouad, “Content-based search for image retrieval,” Int. J. Image, Graphics and Signal Processing, Vol. 5, No. 11, 46–52 (2013).
Article Google Scholar
F. A. Khalifa, N. A. Semary, H. M. El-Sayed, and M. M. Hadhoud, “Local detectors and descriptors for object class recognition,” Int. J. of Intelligent Systems and Applications, Vol. 7, No. 10, 12–18 (2015).
Article Google Scholar
A. Ziomek and M. Oszust, “Evaluation of interest point detectors in presence of noise,” Int. J. Intelligent Systems and Applications, Vol. 8, No. 3, 26–33 (2016).
Article Google Scholar
S. Fortune, “Voronoi diagrams and Delaunay triangulations,” in: Handbook of Discrete and Computational Geometry, Chap. 27, 3rd Edition, CRC Press, Boca Raton, USA (2017), pp. 705–721.
S. Meiser, “Point location in arrangements of hyperplanes,” Inform. and Comput., Vol. 106, No. 2, 286–303 (1993).
Article MathSciNet MATH Google Scholar
A. Andoni and P. Indyk, “Nearest neighbors in high-dimensional spaces,” in: Handbook of Discrete and Computational Geometry, Chap. 43, 3rd Edition, CRC Press, Boca Raton, USA (2017), pp. 1133–1153.
R. Weber, H. Schek, and S. Blott, “A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,” in: Proc. VLDB’98 (1998), pp. 194–205.
S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu, “An optimal algorithm for approximate nearest neighbor searching fixed dimensions,” Journal of the ACM, Vol. 45, No. 6, 891–923 (1998).
Article MathSciNet MATH Google Scholar
A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” Communications of the ACM, Vol. 51. No. 1, 117–122 (2008).
Article Google Scholar
S. Har-Peled, P. Indyk, and R. Motwani, “Approximate nearest neighbor: Towards removing the curse of dimensionality,” Theory Comput., Vol. 8, 321–350 (2012).
Article MathSciNet MATH Google Scholar
D. M. W. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation,” J. of Machine Learning Tech., Vol. 2, No. 1, 37–63 (2011).
MathSciNet Google Scholar
R. Das, S. Thepade, and S. Ghosh, “Content based image recognition by information fusion with multiview features. I,” J. Information Technology and Computer Science, Vol. 7, No. 10, 61–73 (2015).
Article Google Scholar
S. Ramaswamy and K. Rose, “Adaptive cluster distance bounding for high-dimensional indexing,” IEEE Trans. on KDE, Vol. 23, No. 6, 815–830 (2011).
Google Scholar
M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE TPAMI, Vol. 36, No. 11, 2227–2240 (2014).
Article Google Scholar
A. Shrivastava and P. Li, “Asymmetric minwise hashing for indexing binary inner products and set containment,” in: Proc. WWW’15 (2015), pp. 981–991.
M. Charikar, “Similarity estimation techniques from rounding algorithms,” in: Proc. STOC’02 (2002), pp. 380–388.
M. Aumuller, T. Christiani, R. Pagh, and F. Silvestr, Distance Sensitive Hashing. arXiv:1703.07867. 22 Mar 2017.
A. Andoni, M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing using stable distributions,” in: Nearest Neighbor Methods for Learning and Vision: Theory and Practice, MIT Press, Cambridge (2006), pp. 61–72.
N. Pham, “Hybrid LSH: Faster near neighbors reporting in high-dimensional space,” in: Proc. EDBT’17 (2017), pp. 454–457.
J. Wang, H. T. Shen, J. Song, and J. Ji, Hashing for Similarity Search: A Survey. arXiv:1408.2927. 13 Aug 2014.
J. Tang and Y. Tian, “A systematic review on minwise hashing algorithms,” Annals of Data Science, Vol. 3, No. 4, 445–468 (2016).
Article Google Scholar
B. Kulis and K. Grauman, “Kernelized locality-sensitive hashing,” IEEE Trans. PAMI, Vol. 34, No. 6, 1092–1104 (2012).
Article Google Scholar
Y. Mu and S. Yan, “Non-metric locality sensitive hashing,” in: Proc. AAAI’10 (2010), pp. 539–544.
A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” in: Proc. FOCS’06 (2006), pp. 459-468.
H. Jegou, L. Amsaleg, C. Schmid, and P. Gros, “Query-adaptive locality sensitive hashing,” in: Proc. ICASSP’08 (2008), pp. 825–828.
F. Chierichetti and R. Kumar, “Lsh-preserving functions and their applications,” J. ACM, Vol. 62, No. 5, 33:1–33:25 (2015).
F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli, “The distortion of locality sensitive hashing,” in: Proc. ITCS’17 (2017), p. 23.
A. Sokolov, “Investigation of accelerated search for close text sequences with the help of vector representations,” Cybernetics and Systems Analysis, Vol. 44, No. 4, 493–506 (2008).
Article MathSciNet MATH Google Scholar
A. Andoni, R. Krauthgamer, and I. P. Razenshteyn, “Sketching and embedding are equivalent for norms,” in: Proc. STOC’15 (2015), pp. 479–488.
A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt, “Practical and optimal LSH for angular distance,” in: Proc. NIPS’15 (2015), pp. 1225-1233.
K. Terasawa and Y. Tanaka, “Spherical lsh for approximate nearest neighbor search on unit hypersphere,” in: Proc. WADS’07 (2007), pp. 27–38.
K. Eshghi and S. Rajaram, “Locality sensitive hash functions based on concomitant rank order statistics,” in: Proc. KDD’08 (2008), pp. 221–229.
A. Andoni and I. Razenshteyn, “Optimal data-dependent hashing for approximate near neighbors,” in: Proc. STOC’15 (2015), pp. 793–801.
T. Laarhoven, “Hypercube LSH for approximate near neighbors,” in: Proc. MFCS’17 (2017).
C. Kennedy and R. Ward, “Fast cross-polytope locality-sensitive hashing,” in: Proc. ITCS’17 (2017).
R. Panigrahy, “Entropy based nearest neighbor search in high dimensions,” in: Proc. SODA’06 (2006), pp. 1186–1195.
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, “Multi-probe lsh: efficient indexing for high-dimensional similarity search,” in: Proc. VLDB’07 (2007), pp. 950–961.
A. Joly and O. Buisson, “A posteriori multi-probe locality sensitive hashing,” in: Proc. MM’08 (2008), pp. 209–218.
M. Slaney, Y. Lifshits, and J. He, “Optimal parameters for locality-sensitive hashing,” Proc. IEEE, Vol. 100, No. 9, 2604–2623 (2012).
Article Google Scholar
M. Kapralov, “Smooth tradeoffs between insert and query complexity in nearest neighbor search,” in: Proc. PODS’15 (2015), pp. 329–342.
T. D. Ahle, M. Aumuller, and R. Pagh, “Parameter-free locality sensitive hashing for spherical range reporting,” in: Proc. SODA’17 (2017), pp. 239–256.
A. Pacuk, P. Sankowski, K. Wegrzycki, and P. Wygocki, “Locality-sensitive hashing without false negatives for lp,” in: Proc. COCOON’16 (2016), pp. 105–118.
P. Wygocki, On Fast Bounded Locality Sensitive Hashing. arXiv:1704.05902. 19 Apr 2017.
W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li, “Modeling lsh for performance tuning,” in: Proc. CIKM’08 (2008), pp. 669–678.
P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier, “Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm,” in: Proc. AofA’07 (2007), pp. 127–146.
A. Chakrabarti, V. Satuluri, A. Srivathsan, and S. Parthasarathy, “A Bayesian perspective on locality sensitive hashing with extensions for kernel methods,” ACM TKDD, Vol. 10, No. 2, 19:1–19:32 (2015).
Google Scholar
M. Bawa, T. Condie, and P. Ganesan, “Lsh forest: Self-tuning indexes for similarity search,” in: Proc. WWW’05 (2005), pp. 651–660.
A. Andoni, I. Razenshteyn, and N. Shekel Nosatzki, “Lsh forest: Practical algorithms made theoretical,” in: Proc. SODA’17 (2017), pp. 67–78.
Y. Tao, K. Yi, C. Sheng, and P. Kalnis, “Efficient and accurate nearest neighbor and closest pair search in high dimensional space,” ACM TODS, Vol. 35, No. 3, 20:1–20:46 (2010).
Article Google Scholar
J. K. Lawder and P. J. H. King, “Querying multi-dimensional data indexed using the Hilbert space filling curve,” ACM SIGMOD Record, Vol. 30, No. 1, 19–24 (2001).
Article Google Scholar
D. Comer, “The ubiquitous B-tree,” ACM Comput. Surv., Vol. 11, 121–138 (1979).
Article MATH Google Scholar
Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen, “Sk-lsh: An efficient index structure for approximate nearest neighbor search,” in: Proc. VLDB Endowment, Vol. 7, No. 9, 745-756 (2014).
J. Chen, C. He, G. Hu, and J. Shao, “SELSH: A hashing scheme for approximate similarity search with early stop condition,” in: Proc. MMM’16, Vol. 2 (2016), pp. 104–115.
F. Hao, J. Daugman, and P. Zielinski, “A fast search algorithm for a large fuzzy database,” IEEE Trans. Information Forensics and Security, Vol. 3, No. 2, 203–212 (2008).
Article Google Scholar
K. Ling and G. Wu, “Frequency based locality sensitive hashing,” in: Proc. ICMT’11 (2011), pp. 4929–4932.
J. Gan, J. Feng, Q. Fang, and W. Ng, “Locality-sensitive hashing scheme based on dynamic collision counting,” in: Proc. SIGMOD’12 (2012), pp. 541–552.
Y. Zheng, Q. Guo, A. K. H. Tung, and S. Wu, “LazyLSH: Approximate nearest neighbor search for multiple distance functions with a single index,” in Proc. SIGMOD’16 (2016), pp. 2023–2037.
Q. Huang, J. Feng, Y. Zhang, Q. Fang, and W. Ng, “Query-aware locality-sensitive hashing for approximate nearest neighbor search,” Proc. VLDB Endowment, Vol 9, No. 1, 1–12 (2015).
Article Google Scholar
X. Zhang, M. Wang, and J. Cui, “Efficient indexing of binary LSH for high dimensional nearest neighbor,” Neurocomputing, Vol. 213, 24–33 (2016).
Article Google Scholar
M. Norouzi, A. Punjani, and D. J. Fleet, “Fast exact search in Hamming space with multi-index hashing,” IEEE Trans. PAMI, Vol. 36, No. 6, 1107–1119 (2014).
Article Google Scholar
J. Gao, H. V. Jagadish, B. C. Ooi, and S. Wang, “Selective hashing: Closing the gap between radius search and k-NN search,” in: Proc. SIGKDD’15 (2015), pp. 349–358.
A. Andoni, T. Laarhoven, I. Razenshteyn, and E. Waingarten, “Optimal hashing-based time-space trade-offs for approximate near neighbors” in: Proc. SODA’17 (2017), pp. 47–66.
A. Becker, L. Ducas, N. Gama, and T. Laarhoven, “New directions in nearest neighbor searching with applications to lattice sieving,” in: Proc. SODA’16 (2016), pp. 10–24.
T. Christiani, “A framework for similarity search with space-time tradeoffs using locality-sensitive filtering,” in: Proc. SODA’17 (2017), pp. 31–46.
D. A. Rachkovskij, I. S. Misuno, and S. V. Slipchenko, “Randomized projective methods for construction of binary sparse vector representations,” Cybernetics and Systems Analysis, Vol. 48, No. 1, 146–156 (2012).
Article MATH Google Scholar
D. A. Rachkovskij, “Formation of similarity-reflecting binary vectors with random binary projections,” Cybernetics and Systems Analysis, Vol. 51, No. 2, 313–323 (2015).
Article MATH Google Scholar
R. Donaldson, A. Gupta, Y. Plan, and T. Reimer, Random Mappings Designed for Commercial Search Engines. arXiv:1507.05929. 21 Jul 2015.
S. Ferdowsi, S. Voloshynovskiy, D. Kostadinov, and T. Holotyak, “Fast content identification in highdimensional feature spaces using sparse ternary codes,” in: Proc. WIFS’16 (2016), pp. 1–6.
G. Valiant, “Finding correlations in subquadratic time, with applications to learning parities and the closest pair problem,” J. ACM, Vol. 62, No. 2, 13:1–13:45 (2015).
Article MathSciNet MATH Google Scholar
H. L. Nguyen, Algorithms for High Dimensional Data, PhD Thesis, Princeton University (2014). URL: http://arks.princeton.edu/ark:/88435/dsp01b8515q61f.
A. Rahimi and B. Recht, “Random features for large-scale kernel machine,” in: Proc. NIPS’07 (2007), pp. 1177–1184.
R. O’Donnell, Y. Wu, and Y. Zhou, “Optimal lower bounds for locality sensitive hashing (except when q is tiny),” ACM TOCS, Vol. 6, No. 1, 5.1–5.13 (2014).
MathSciNet MATH Google Scholar
J. Wang, W. Liu, S. Kumar, and S.-F. Chang, “Learning to hash for indexing big data: A survey,” Proc. IEEE, Vol. 104, No. 1, 34–57 (2016).
Article Google Scholar
J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen, “A Survey on Learning to Hash,” IEEE Trans. PAMI. doi:https://doi.org/10.1109/TPAMI.2017.2699960.
L. Gao, J. Song, X. Liu, J. Shao, J. Liu, and J. Shao, “Learning in high-dimensional multimedia data: The state of the art,” Multimedia Systems, Vol. 23, No. 3, 303–313 (2017).
Article Google Scholar
W. Mou and L. Wang, “A refined analysis of lsh for well-dispersed data points,” in: Proc. ANALCO’17 (2017), pp. 174–182.
A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn, “Beyond locality-sensitive hashing,” in: Proc. SODA’14 (2014), pp. 1018–1028.
A. Andoni and I. Razenshteyn, “Tight lower bounds for data-dependent locality-sensitive hashing,” in: Proc. SoCG’16 (2016), pp. 9:1–9:11.
V. I. Gritsenko, D. A. Rachkovskij, A. A. Frolov, R. Gayler, D. Kleyko, and E. Osipov, “Neural distributed autoassociative memories: A survey,” Cybernetics and Computer Engineering, No. 2 (188), 5–35 (2017).
Y. Wang, A. Shrivastava, and J. Ryu, FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search. arXiv:1709.01190. 4 Sep 2017.
A. Shrivastava, “Optimal densification for fast and accurate minwise hashing,” in: Proc. ICML’17 (2017), pp. 3154–3163.

Download references

Author information

Authors and Affiliations

International Scientific-Educational Center of Information Technologies and Systems, NAS of Ukraine and MES of Ukraine, Kyiv, Ukraine
D. A. Rachkovskij

Authors

D. A. Rachkovskij
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to D. A. Rachkovskij.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 1, January–February, 2018, pp. 168–183.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rachkovskij, D.A. Index Structures for Fast Similarity Search for Real-Valued Vectors. I. Cybern Syst Anal 54, 152–164 (2018). https://doi.org/10.1007/s10559-018-0016-1

Download citation

Received: 23 August 2017
Published: 27 January 2018
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10559-018-0016-1

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Index Structures for Fast Similarity Search for Real-Valued Vectors. I

Abstract

Article PDF

Similar content being viewed by others

Index Structures for Fast Similarity Search for Real Vectors. II^*

Index Structures for Fast Similarity Search for Symbol Strings

Index Structures for Fast Similarity Search for Binary Vectors

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Index Structures for Fast Similarity Search for Real-Valued Vectors. I

Abstract

Article PDF

Similar content being viewed by others

Index Structures for Fast Similarity Search for Real Vectors. II*

Index Structures for Fast Similarity Search for Symbol Strings

Index Structures for Fast Similarity Search for Binary Vectors

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation

Index Structures for Fast Similarity Search for Real Vectors. II^*