Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length \( \overline{m} \), and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time \( O\left( {\overline{m} n^2 \log n}\right)\). We improve the running time to \( O\left( {\overline{m} n^{1.5} }\right)\). Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time \( O\left( {\overline{m} n^{1.5} }\right)\); the same problem with a compressed pattern is known to be NP-hard. Bibliography: 22 titles.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley (1976).
C. E. R. Alves, E. N. Cáceres, and S. W. Song, “An all-substrings common subsequence algorithm,” Electr. Notes Discr. Math., 19, 133–139 (2005).
J. L. Bentley, “Multidimensional divide-and-conquer,” Comm. ACM, 23, No. 4, 214–229 (1980).
P. Cégielski, I. Guessarian, Y. Lifshits, and Y. Matiyasevich, “Window subsequence problems for compressed texts,” in: Proceedings of CSR, Lect. Notes Comp. Sci., 3967 (2006), pp. 127–136.
P. Cégielski, I. Guessarian, and Y. Matiyasevich, “Multiple serial episodes matching,” Inform. Process. Lett., 98, No. 6, 211–218 (2006).
M. Crochemore, G. M. Landau, and M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices,” SIAM J. Comp., 32, No. 6, 1654–1673 (2003).
M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press (1994).
G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos, and J. Kärkkäinen, “Episode matching,” in: Proceedings of CPM, Lect. Notes Comp. Sci., 1264 (1997), pp. 12–27.
D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press (1997).
J. Kärkkäinen, G. Navarro, and E. Ukkonen, “Approximate string matching on Ziv–Lempel compressed text,” J. Discr. Alg., 1, 313–338 (2003).
Y. Lifshits and M. Lohrey, “Querying and embedding compressed texts,” in: Proceedings of MFCS, Lect. Notes Comp. Sci., 4162 (2006), pp. 681–692.
W. J. Masek and M. S. Paterson, “A faster algorithm computing string edit distances,” J. Comp. System Sci., 20, 18–31 (1980).
G. Myers, “Approximately matching context-free languages,” Inform. Process. Lett., 54, 85–92 (1995).
G. Navarro, “A guided tour to approximate string matching,” ACM Comp. Surv., 33, No. 1, 31–88 (2001).
F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer (1985).
W. Rytter, “Application of Lempel–Ziv factorization to the approximation of grammar-based compression,” Theor. Comp. Sci., 302, No. 1–3, 211–222 (2003).
A. Tiskin, “Semi-local longest common subsequences in subquadratic time,” J. Discr. Alg., 6, No. 4, 570–581 (2008).
A. Tiskin, “Semi-local string comparison: Algorithmic techniques and applications,” Math. Comput. Sci., 1, No. 4, 571–603 (2008).
B. W. Watson and G. Zwaan, “A taxonomy of sublinear multiple keyword pattern matching algorithms,” Sci. Comput. Programm., 27, No. 2, 85–118 (1996).
T. A. Welch, “A technique for high-performance data compression,” Computer, 17, No. 6, 8–19 (1984).
G. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transact. Inform. Theory, 23, 337–343 (1977).
G. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Transact. Inform. Theory, 24, 530–536 (1978).
Author information
Authors and Affiliations
Corresponding author
Additional information
Published in Zapiski Nauchnykh Seminarov POMI, Vol. 358, 2008, pp. 282–300.
Rights and permissions
About this article
Cite this article
Tiskin, A. Faster subsequence recognition in compressed strings. J Math Sci 158, 759–769 (2009). https://doi.org/10.1007/s10958-009-9396-0
Received:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10958-009-9396-0