Approximate Matching of Run-Length Compressed Strings

Mäkinen; Ukkonen; Navarro

doi:10.1007/s00453-002-1005-2


Published: April 2003

Algorithmica, Volume 35, pages 347–369 (2003)


Abstract

We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n, compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements to a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected-case complexity. Experimental results are provided to support the conjecture.
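As a point of reference for the problems named in the abstract, the Python sketch below shows run-length encoding together with the plain dynamic-programming baseline over the decompressed strings. It is not the authors' algorithm: it expands both inputs and fills the full table in O(mn) time, whereas the O(m'n+n'm) bound stated above comes from exploiting the run structure. The function names and the (character, run length) pair representation are illustrative assumptions. With unit insertion and deletion costs and a substitution cost of 2, the weighted distance equals m + n − 2·LCS, which links the weighted model back to the LCS problem. The search function uses the standard Sellers-style formulation with unit costs, in which the first DP row is fixed at zero so an occurrence may start anywhere in the text.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical representation: a run-length encoded string is a list of
# (character, run length) pairs, e.g. "aaabbc" -> [("a", 3), ("b", 2), ("c", 1)].
Runs = List[Tuple[str, int]]


def rle_encode(s: str) -> Runs:
    """Compress s into (character, run length) pairs."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def rle_decode(runs: Runs) -> str:
    """Expand (character, run length) pairs back into a plain string."""
    return "".join(ch * k for ch, k in runs)


def weighted_edit_distance(a_runs: Runs, b_runs: Runs,
                           w_ins: int = 1, w_del: int = 1, w_sub: int = 1) -> int:
    """Baseline only: decompress, then fill the full (m+1) x (n+1) DP table
    row by row. Cost of transforming a into b under the given weights."""
    a, b = rle_decode(a_runs), rle_decode(b_runs)
    m, n = len(a), len(b)
    prev = [j * w_ins for j in range(n + 1)]  # row 0: insert b[:j]
    for i in range(1, m + 1):
        cur = [i * w_del] + [0] * n  # column 0: delete a[:i]
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else w_sub
            cur[j] = min(prev[j] + w_del,      # delete a[i-1]
                         cur[j - 1] + w_ins,   # insert b[j-1]
                         prev[j - 1] + sub)    # match or substitute
        prev = cur
    return prev[n]


def approximate_search(pattern_runs: Runs, text_runs: Runs, k: int) -> List[int]:
    """Baseline only: report 1-based end positions j in the decompressed text
    where some substring ending at j is within unit-cost edit distance k
    of the decompressed pattern (Sellers-style column-wise DP)."""
    p, t = rle_decode(pattern_runs), rle_decode(text_runs)
    m = len(p)
    col = list(range(m + 1))  # column for the empty text prefix
    hits = []
    for j, tc in enumerate(t, start=1):
        new = [0] * (m + 1)  # new[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            sub = 0 if p[i - 1] == tc else 1
            new[i] = min(col[i] + 1, new[i - 1] + 1, col[i - 1] + sub)
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

For example, `weighted_edit_distance(rle_encode("kitten"), rle_encode("sitting"))` gives 3, and with `w_sub=2` it gives 5 = 6 + 7 − 2·4, since the LCS of the two words ("ittn") has length 4. Searching for `"abb"` (2 runs) in `"aaabbbcab"` (5 runs) with `k=0` reports the single exact occurrence ending at position 5.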

Author information

Authors and Affiliations

Mäkinen & Ukkonen: Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. vmakinen@cs.helsinki.fi, ukkonen@cs.helsinki.fi. Supported by the Academy of Finland under Grant 22584.
Navarro: Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl. Supported by Millennium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile.



About this article

Cite this article

Mäkinen, Ukkonen & Navarro. Approximate Matching of Run-Length Compressed Strings. Algorithmica 35, 347–369 (2003). https://doi.org/10.1007/s00453-002-1005-2


Issue Date: April 2003
DOI: https://doi.org/10.1007/s00453-002-1005-2

Keywords

Compressed pattern matching, Run-length encoding, Levenshtein distance, Longest common subsequence, Weighted edit distance
