Abstract
We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.
This work was partially supported by Grant 1930765 from Fondecyt.
Supported in part by NSF grants CCR-9002351 and CCR-9301129, and by the Advanced Research Projects Agency under contract number DABT63-93-C-0052. The information contained in this paper does not necessarily reflect the position or the policy of the U.S. Government or other sponsors of this research. No official endorsement should be inferred.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Altschul S.F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Molecular Biology 15 (1990), 403–410.
Baeza-Yates, R.A. and Cunto, W., “Unbalanced Multiway Trees Improved by Partial Expansions”, Acta Information, 29 (5), 1992, 443–460.
Baeza-Yates, R.A. and Gonnet, G.H., “All-against-all Sequence Matching”, Dept. of Computer Science, Universidad de Chile, 1990.
Bahl L. R., P. S. Gopalakrishnan, D. S. Kanevsky, and D.S. Nahamoo, “A fast admissible method for identifying a short list of candidate words,” IBM tech report RC 15874 (June 1990).
Bugnion, E. and Roos, T. and Shi, F. and Widmayer, P. and Widmer, F. “A Spatial Index for Approximate Multiple String Matching”, 1st South American Workshop on String Processing, Belo Horizonte, Sept 1993, 43–54.
Burkhard, W.A. and Keller, R.M. “Some Approaches to Best-Match File Searching”, Communications of the ACM 16 (4), April 1973, 230–236.
Chang W.L., and E.L. Lawler, “Approximate matching in sublinear expected time,” Proc. of the 31st IEEE Symp. on Foundations of Computer Science (1990) 116–124.
Friedman, J.H. and Bentley, J.L. and Finkel, R.A. “An Algorithm to find best matches in logarithmic expected time”, ACM Trans. on Math. Software 3(3), 1977.
Gonnet, G.H. and Baeza-Yates, R. Handbook of Algorithms and Data Structures, Addison-Wesley, second edition, 1991.
Gonnet, G.H., M.A. Cohen, and S.A. Benner, “Exhaustive matching of the entire protein sequence database,” Science 256, 1443.
Lipman D. J., and W.R. Pearson, “Rapid and sensitive protein similarity searches,” Science 227 (1985), 1435–1441.
Mahmoud, H. Evolution of Random Search Trees, John Wiley, New York, 1992.
Murtagh, F. “A Survey of Recent Advances in Hierarchical Clustering Algorithms”, IEEE Computer 26 (4), 1983, 354–359.
Myers, E. “Algorithmic Advances for Searching Biosequence Databases,” Proceedings of the International Symposium on Computational Methods in Genome Research (Heidelberg, 1992), to appear.
Myers, E. “A Sublinear Algorithm for Approximate Keyword Matching,” Algorithmica, in press.
Nevalainen, O. and Katajainen, J. “Experiments with a Closest Point Algorithm in Hamming Space”, Angewandte Informatik 5, 1982, 277–281.
Santana, O. and Diaz, M. and Duque, J.D. and Rodriguez, J.C. “Increasing radius search schemes for the most similar strings on the Burkhard-Keller tree”, International Workshop on Computer Aided Systems Theory, EUROCAST'89, 1989.
Shapiro, M. “The Choice of Reference Points in Best-Match File Searching”, Communications of the ACM 20 (5), May 1977, 339–343.
Shasha, D. and Wang, T-L. “New Techniques for Best-Match Retrieval”, ACM Transactions on Information Systems 8, 1990, 140–158.
Ukkonen, E., “Approximate string matching with q-grams and maximal matches,” Theoretical Computer Science (1992), 191–212.
Ukkonen, E., “Approximate string-matching over suffix trees,” 4th Annual Combinatorial Pattern Matching Symp., Padova, Italy (June 1993), 228–242.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R., Cunto, W., Manber, U., Wu, S. (1994). Proximity matching using fixed-queries trees. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_18
Download citation
DOI: https://doi.org/10.1007/3-540-58094-8_18
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive