Abstract
Given a collection of strings S={s 1,...,s n } over an alphabet Σ, a superstring α of S is a string containing each s i as a substring; that is, for each i, 1 ≤ i ≤ n, α contains a block of ¦s i¦ consecutive characters that match s i exactly. The shortest superstring problem is the problem of finding a superstring α of minimum length. This problem is NP-hard [6] and has applications in computational biology and data compression. The first O(1)-approximation algorithms were given in [2]. We describe our 2 3/4-approximation algorithm, which is the best known. While our algorithm is not complex, our analysis requires some novel machinery to describe overlapping periodic strings. We then show how to combine our result with that of [11] to obtain a ratio of 2 50/69 ≈ 2.725. We describe an implementation of our algorithm which runs in O(¦S¦+n 3) time; this matches the running time of previous O(1)-approximations.
This work was done while the author was at Dartmouth College.
Research partly supported by NSF Award CCR-9308701, a Walter Burke Research Initiation Award and a Dartmouth College Research Initiation Award.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
C. Armen and C. Stein. Short supertrings and the structure of overlapping strings. To appear in J. of Computational Biology, 1995.
A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis. Linear approximation of shortest superstrings. Journal of the ACM, 41(4):630–647, July 1994.
A. Czumaj, L. Gasieniec, M. Piotrow, and W. Rytter. Parallel and sequential approxmations of shortest superstrings. In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory, pages 95–106, 1994.
A. Lesk (edited). Computational Molecular Biology, Sources and Methods for Sequence Analysis. Oxford University Press, 1988.
A.M. Frieze, G. Galbiati, and F. Maffoli. On the worst case performance of some algorithms for the asymmetric travelling salesman problem. Networks, 12:23–39, 1982.
J. Gallant, D. Maier, and J. Storer. On finding minimal length superstrings. Journal of Computer and System Sciences, 20:50–58, 1980.
D. Gusfield. Faster implementation of a shortest superstring approximation. Information Processing Letters, (51):271–274, 1994.
D. Gusfield, G. Landau, and B. Schieber. An efficient algorithm for the all pairs suffix-prefix problem. Information Processing Letters, (41):181–185, March 1992.
John D. Kececioglu. Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, University of Arizona, 1991.
D.E. Knuth, J.H.Morris, and V.B. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6:189–195, 1977.
R. Kosaraju, J. Park, and C. Stein. Long tours and short superstrings. In FOCS, November 1994.
M. Li. Towards a DNA sequencing theory (learning a string). In FOCS, pages 125–134, 1990.
L.J.Cummings. Strongly qth power-free strings. Annals of Discrete Mathematics, 17:247–252, 1983.
Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization, Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.
H. Peltola, H. Soderlund, J. Tarjio, and E. Ukkonen. Algorithms for some string matching problems arising in molecular genetics. In Proceedings of the IFIP Congress, pages 53–64, 1983.
Graham A. Stephen. String searching algorithms. World Scientific, 1994.
J. Storer. Data compression: methods and theory. Computer Science Press, 1988.
J. Tarhio and E. Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theoretical Computer Science, 57:131–145, 1988.
Shang-Hua Teng and Frances Yao. Approximating shortest superstrings. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, pages 158–165, November 1993.
J. Turner. Approximation algorithms for the shortest common superstring problem. Information and Computation, 83:1–20, 1989.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Armen, C., Stein, C. (1995). Improved length bounds for the shortest superstring problem. In: Akl, S.G., Dehne, F., Sack, JR., Santoro, N. (eds) Algorithms and Data Structures. WADS 1995. Lecture Notes in Computer Science, vol 955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60220-8_88
Download citation
DOI: https://doi.org/10.1007/3-540-60220-8_88
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60220-0
Online ISBN: 978-3-540-44747-4
eBook Packages: Springer Book Archive