Abstract
In the approximate dictionary matching problem, a dictionary containing a set of pattern strings is given. The user presents a text string and a tolerance k (a positive integer) and asks for all occurrences of dictionary patterns that appear in the text with at most k differences from the original patterns. We present two algorithms for the problem. The first assumes that all patterns in the dictionary have the same length. The second removes this assumption at the cost of somewhat more complicated preprocessing of the dictionary and slower query time. The basic idea behind both algorithms is to represent each dictionary pattern by one or two points in a |Σ|^q-dimensional real space under the L1 metric, where Σ is the underlying alphabet and q is a fixed integer, and then to organize these points in a spatial data structure so that subsequent searches, with texts of different lengths and different tolerance values, are fast. Although approximate dictionary matching would be of enormous importance in molecular biology applications, no previous results for the problem are known.
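The embedding idea can be illustrated with a minimal sketch. It is not the paper's algorithm, and it rests on assumptions the abstract does not confirm: each string is mapped to its q-gram profile (the count of every length-q substring, stored sparsely), "differences" are taken to be edit operations, and candidates are filtered with the known q-gram bound that strings within edit distance k have profiles within L1 distance 2kq. A linear scan stands in for the spatial data structure, and the function names are hypothetical.

```python
from collections import Counter


def qgram_profile(s, q):
    """Sparse q-gram profile: how often each length-q substring occurs in s.

    Conceptually a point in |alphabet|**q-dimensional space; only the
    nonzero coordinates are stored.
    """
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def l1_distance(p1, p2):
    """L1 distance between two sparse profiles (missing q-grams count as 0)."""
    return sum(abs(p1[g] - p2[g]) for g in p1.keys() | p2.keys())


def candidate_patterns(dictionary, window, k, q):
    """Return patterns that pass the q-gram filter against a text window.

    Assumption: edit distance <= k implies profile L1 distance <= 2*k*q,
    so patterns beyond that bound can be discarded. Survivors are only
    candidates and still need verification by an exact edit-distance check.
    """
    window_profile = qgram_profile(window, q)
    bound = 2 * k * q
    return [
        p for p in dictionary
        if l1_distance(qgram_profile(p, q), window_profile) <= bound
    ]
```

For example, `candidate_patterns(["ACGT", "TTTT"], "ACGA", k=1, q=2)` keeps only `"ACGT"`: its profile lies at L1 distance 2 from that of the window, within the bound of 4, while `"TTTT"` lies at distance 6. In a real index the profiles of the dictionary patterns would be computed once and stored in a structure that supports range queries under the L1 metric.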
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shi, F. (1995). Fast approximate dictionary matching. In: Staples, J., Eades, P., Katoh, N., Moffat, A. (eds) Algorithms and Computations. ISAAC 1995. Lecture Notes in Computer Science, vol 1004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015431
DOI: https://doi.org/10.1007/BFb0015431
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60573-7
Online ISBN: 978-3-540-47766-2
eBook Packages: Springer Book Archive