Faster String Matching with Super-Alphabets

Fredriksson, Kimmo

doi:10.1007/3-540-45735-6_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2476)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Given a text $ T[1 \ldots n] $ and a pattern $ P[1 \ldots m] $ over some alphabet $ \Sigma $ of size $ \sigma $, finding the exact occurrences of $ P $ in $ T $ requires at least $ \Omega(n \log_\sigma m / m) $ character comparisons on average, as shown in [19]. Consequently, it is believed that this lower bound also implies an $ \Omega(n \log_\sigma m / m) $ lower bound for the execution time of an optimal algorithm. However, in this paper we show how to obtain an $ \mathcal{O}(n/m) $ average time algorithm. This is achieved by slightly changing the model of computation and by modifying an existing algorithm. Our technique uses a super-alphabet to simulate the suffix automaton. The space usage of the algorithm is $ \mathcal{O}(\sigma m) $. The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time $ \mathcal{O}(n/m) $, and approximate matching allowing $ k $ edit operations (mismatches, insertions or deletions of characters). The latter is solved in expected time $ \mathcal{O}(nk/m) $ for $ k \leqslant \mathcal{O}(m/\log_\sigma m) $. The known lower bound for this problem is $ \Omega(n(k + \log_\sigma m)/m) $, given in [6]. Finally, we show how to adapt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor $ s $, where $ s $ is the number of characters processed simultaneously. Some of the algorithms are implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases works faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation, and that each symbol is coded in $ b = \lceil \log_2 \sigma \rceil $ bits. They work for larger $ b $ too, but the speed-up is decreased.
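
The shift-or variant mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `shift_or_super`, the tuple-keyed table `B_super`, and the handling of a leftover tail are assumptions made for illustration. The idea shown is that the per-character bit masks of plain shift-or are combined ahead of time into one mask per block of s characters (a table with σ^s entries), so the state vector is updated once per block instead of once per character. The state is widened by s - 1 bits so that matches ending inside a block are still visible after the combined update.

```python
from itertools import product


def shift_or_super(text, pattern, alphabet, s=2):
    """Return 0-based start positions of pattern in text.

    Illustrative sketch: reads s symbols per state update.
    Assumes every character of text and pattern occurs in alphabet.
    """
    m = len(pattern)
    if m == 0 or s < 1:
        raise ValueError("pattern must be non-empty and s must be >= 1")

    # Plain shift-or table: bit i of B[c] is 0 iff pattern[i] == c.
    B = {}
    for c in alphabet:
        mask = 0
        for i, p in enumerate(pattern):
            if p != c:
                mask |= 1 << i
        B[c] = mask

    # Super-alphabet table: one precomputed mask per block of s symbols.
    # The j-th symbol of a block is followed by s - 1 - j more shifts.
    B_super = {}
    for block in product(alphabet, repeat=s):
        v = 0
        for j, c in enumerate(block):
            v |= B[c] << (s - 1 - j)
        B_super[block] = v

    width = m + s - 1            # extra bits keep matches ending mid-block
    full = (1 << width) - 1
    D = full                     # all ones: no prefix matched yet
    hits = []

    n_blocks = len(text) // s
    for blk in range(n_blocks):
        i = blk * s
        D = ((D << s) | B_super[tuple(text[i:i + s])]) & full
        # A zero at bit m - 1 + (s - 1 - j) means a match ends at text[i + j].
        for j in range(s):
            if not (D >> (m - 1 + s - 1 - j)) & 1:
                hits.append(i + j - m + 1)

    # Leftover characters (fewer than s) use the plain one-symbol step.
    for i in range(n_blocks * s, len(text)):
        D = ((D << 1) | B[text[i]]) & full
        if not (D >> (m - 1)) & 1:
            hits.append(i - m + 1)
    return hits
```

For example, `shift_or_super("ACGTACGTTACG", "ACG", "ACGT", s=3)` returns `[0, 4, 9]`. The sketch keys the table by tuples for readability; with each symbol coded in b bits, a block of s symbols could instead be read as a single (s·b)-bit integer index, which is where a practical speed-up would come from.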

References

[1] A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18(6):333–340, 1975.
[2] R. A. Baeza-Yates. Improved string searching. Softw. Pract. Exp., 19(3):257–271, 1989.
[3] R. A. Baeza-Yates. String searching algorithms revisited. In F. Dehne, J. R. Sack, and N. Santoro, editors, Proceedings of the 1st Workshop on Algorithms and Data Structures, number 382 in Lecture Notes in Computer Science, pages 75–96, Ottawa, Canada, 1989. Springer-Verlag, Berlin.
[4] R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, 1992.
[5] R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, 1977.
[6] W. I. Chang and T. Marr. Approximate string matching with local similarity. In M. Crochemore and D. Gusfield, editors, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, number 807 in Lecture Notes in Computer Science, pages 259–273, Asilomar, CA, 1994. Springer-Verlag, Berlin.
[7] M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string matching algorithms. Algorithmica, 12(4/5):247–267, 1994.
[8] M. Crochemore, A. Czumaj, L. Gasieniec, T. Lecroq, W. Plandowski, and W. Rytter. Fast practical multi-pattern matching. Inf. Process. Lett., 71(3–4):107–113, 1999.
[9] R. N. Horspool. Practical fast searching in strings. Softw. Pract. Exp., 10(6):501–506, 1980.
[10] D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. I.R.E., 40:1098–1101, 1952.
[11] D. E. Knuth, J. H. Morris, Jr., and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
[12] W. J. Masek and M. S. Paterson. A faster algorithm for computing string edit distances. J. Comput. Syst. Sci., 20(1):18–31, 1980.
[13] M. Miyazaki, S. Fukamachi, M. Takeda, and T. Shinohara. Speeding up the pattern matching machine for compressed texts. Transactions of Information Processing Society of Japan, 39(9):2638–2648, 1998.
[14] E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 18(2):113–139, 2000.
[15] G. Navarro and M. Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. In M. Farach-Colton, editor, Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, number 1448 in Lecture Notes in Computer Science, pages 14–33, Piscataway, NJ, 1998. Springer-Verlag, Berlin.
[16] G. Navarro and J. Tarhio. Boyer-Moore string matching over Ziv-Lempel compressed text. In R. Giancarlo and D. Sankoff, editors, Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching, number 1848 in Lecture Notes in Computer Science, pages 166–180, Montréal, Canada, 2000. Springer-Verlag, Berlin.
[17] J. Tarhio and H. Peltola. String matching in the DNA alphabet. Softw. Pract. Exp., 27(7):851–861, 1997.
[18] S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, 1992.
[19] A. C. Yao. The complexity of pattern matching for a random string. SIAM J. Comput., 8(3):368–387, 1979.

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Finland
Kimmo Fredriksson

Editor information

Editors and Affiliations

Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, MG, Brazil
Alberto H. F. Laender
Instituto Superior Técnico, INESC-ID, R. Alves Redol 9, 1000-029, Lisboa, Portugal
Arlindo L. Oliveira

About this paper

Cite this paper

Fredriksson, K. (2002). Faster String Matching with Super-Alphabets. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_5

DOI: https://doi.org/10.1007/3-540-45735-6_5
Published: 18 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44158-8
Online ISBN: 978-3-540-45735-0
eBook Packages: Springer Book Archive
