Abstract
We consider efficient methods for computing a difference metric between two sequences of symbols, where the cost of an operation to insert or delete a block of symbols is a concave function of the block's length. Alternatively, sequences can be optimally aligned when gap penalties are a concave function of the gap length. Two algorithms based on the ‘candidate list paradigm’ first used by Waterman (1984) are presented. The first computes significantly more parsimonious candidate lists than Waterman's method. The second method refines the first to the point of guaranteeingO(N 2 lgN) worst-case time complexity, and under certain conditionsO(N 2). Experimental data show how various properties of the comparison problem affect the methods' relative performance. A number of extensions are discussed, among them a technique for constructing optimal alignments inO(N) space in expectation. This variation gives a practical method for comparing long amino sequences on a small computer.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Literature
Fickett, J. W. 1984. “Fast optimal alignment.”Nucleic Acids Res. 12, 175–179.
Fitch, W. M. and T. F. Smith. 1983. “Optimal sequence alignments.”Proc. natn. Acad. Sci. U.S.A. 80, 1382–1386.
Fredman, M. L. 1984. “Algorithms for computing evolutionary similarity measures with length independent gap penalties.”Bull math. Biol. 46, 553–566.
Gotoh, O. 1982. “An improved algorithm for matching biological sequences.”J. molec. Biol. 162, 705–708.
— 1986. “Alignment of three biological sequences with an efficient traceback procedure.”J. theor. Biol. 121, 327–337.
Hirschberg, D. S. 1975. “A linear space algorithm for computing maximal common subsequences.”Communications of ACM 18, 341–343.
— and L. L. Larmore. 1987. “The least weight subsequence problem.”SIAM J. on Computing 16, 628–638.
Miller, W. and E. W. Myers. 1986. “A simple row-replacement method.” TR 86-28 Dept. of Computer Science, University of Arizona, Tucson, AZ 85721 (submitted toSoftware—Practice and Experience).
Myers, E. W. and W. Miller. 1988. “Row replacement algorithms for screen editors.”Trans. on Prog. Lang. and Systems (in press).
Needleman, S. B. and C. D. Wunsch. 1970. “A general method applicable to the search for similarities in the amino acid sequences of two proteins.”J. molec. Biol. 48, 443–453.
Sankoff, D. and J. B. Kruskal. 1983.Time Warps, Strings Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.
Smith, T. F., M. S. Waterman. 1981. “Identification of common molecular subsequences.”J. molec. Biol. 147, 195–197.
—— and W. M. Fitch. 1981. “Comparative biosequence metrics.”J. molec. Evol. 18, 38–46.
Ukkonen, E. 1985. “Algorithms for approximate string matching.”Information and Control 64, 100–118.
Waterman, M. S. 1984. “Efficient sequence alignment algorithms.”J. theor. Biol. 108, 333–337.
-Waterman, M. S. and T. F. Smith. “Rapid dynamic programming algorithms for RNA secondary structure.”Adv. Appl. Math. 7, 455–464.
—— and W. A. Beyer. 1976. “Some biological sequence metrics.”Adv. Math. 20, 367–387.
Zuker, M. and D. Sankoff. 1984. “RNA structures and their prediction.”Bull. math. Biol. 44, 591–621.
Author information
Authors and Affiliations
Additional information
This work was supported in part by NSF Grant DCR-8511455.
Rights and permissions
About this article
Cite this article
Miller, W., Myers, E.W. Sequence comparison with concave weighting functions. Bltn Mathcal Biology 50, 97–120 (1988). https://doi.org/10.1007/BF02459948
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/BF02459948