Abstract
In this paper we address the basic problem of string pattern matching: preprocess one or several fixed strings over an alphabet of size σ so that all occurrences of the string(s) in a given text T of length n can be found efficiently. In our model, the text and the patterns are tightly packed, so that a single character occupies log σ bits and any sequence of k consecutive characters in the text or the pattern occupies exactly k log σ bits. We first show a data structure that requires O(m) words of space (more precisely, O(m log m) bits), where m is the total size of the patterns, and answers search queries in average-optimal O(n/y) time, where y is the length of the shortest pattern (y = m in the case of a single pattern). This first data structure, while optimal in time, still requires O(m log m) bits of space, which may be too much given that the patterns themselves occupy only m log σ bits. We then show that the data structure can be compressed to use only O(m log σ) bits of space while achieving query time O(n (log_σ m)^ε / y), for any constant ε with 0 < ε < 1. Finally, we show two other direct applications: average-optimal pattern matching with worst-case guarantees and average-optimal pattern matching with k differences. Along the way, we also obtain a slightly improved worst-case efficient multiple pattern matching algorithm.
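To make the packed model concrete, the following minimal sketch stores each character in a fixed number of bits and compares a whole pattern against a text window with a single integer comparison. It only illustrates the packed representation; it is a naive scan over all n - m + 1 windows and is not the data structure or the average-optimal algorithm described in the abstract. The helper names (bits_per_char, pack, find_all), the DNA encoding, and the use of a Python integer as a stand-in for a sequence of machine words are assumptions made for this illustration.

```python
def bits_per_char(sigma: int) -> int:
    """Bits per character for an alphabet of size sigma (ceil(log2 sigma), at least 1)."""
    return max(1, (sigma - 1).bit_length())


def pack(codes, sigma: int) -> int:
    """Pack character codes (each in range(sigma)) into one integer.

    The first character goes into the lowest-order bits, so k characters
    occupy exactly k * bits_per_char(sigma) bits.
    """
    b = bits_per_char(sigma)
    word = 0
    for i, c in enumerate(codes):
        if not 0 <= c < sigma:
            raise ValueError(f"code {c} outside alphabet of size {sigma}")
        word |= c << (i * b)
    return word


def find_all(text_codes, pattern_codes, sigma: int):
    """Return all start positions of the pattern in the text.

    Each window of m characters is extracted with a shift and a mask and
    compared to the packed pattern in one integer comparison.
    """
    b = bits_per_char(sigma)
    n, m = len(text_codes), len(pattern_codes)
    if m == 0 or m > n:
        return []
    packed_text = pack(text_codes, sigma)
    packed_pattern = pack(pattern_codes, sigma)
    mask = (1 << (m * b)) - 1  # selects m packed characters
    hits = []
    for i in range(n - m + 1):
        window = (packed_text >> (i * b)) & mask
        if window == packed_pattern:
            hits.append(i)
    return hits


# Hypothetical example alphabet: DNA with sigma = 4, i.e. 2 bits per character.
DNA = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(s: str):
    """Map a DNA string to its list of character codes."""
    return [DNA[ch] for ch in s]
```

For example, find_all(encode("ACGTACGA"), encode("ACG"), 4) compares 6-bit windows of the 16-bit packed text against the 6-bit packed pattern. On a real word-RAM the packed text would span several machine words, and extracting a window would take a constant number of word operations when the pattern fits in a word.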
Work partially supported by the French ANR-2010-COSI-004 MAPPI Project.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Belazzougui, D., Raffinot, M. (2013). Average Optimal String Matching in Packed Strings. In: Spirakis, P.G., Serna, M. (eds) Algorithms and Complexity. CIAC 2013. Lecture Notes in Computer Science, vol 7878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38233-8_4
DOI: https://doi.org/10.1007/978-3-642-38233-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38232-1
Online ISBN: 978-3-642-38233-8
eBook Packages: Computer Science, Computer Science (R0)