Summary
A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random), a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions.
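The procedure summarised above can be illustrated with a minimal sketch. It is not the author's program: the function names (`gapped_scores`, `optimum_match`), the identity scoring (1 for identical residues, 0 otherwise), the treatment of terminal overhangs as free, and the reading of "maximum score with k gaps" as "at most k internal gaps" are all assumptions made for illustration. Sequences are assumed to be non-empty.

```python
import random

NEG = float("-inf")

def identity(x, y):
    # Assumed similarity measure: 1 for identical residues, else 0.
    return 1 if x == y else 0

def gapped_scores(a, b, max_gaps, score=identity):
    """Return s where s[k] is the best score using at most k internal gaps.

    A gap is one run of skipped residues in either sequence.
    Unaligned residues at either end are not counted as gaps (assumption).
    """
    n, m = len(a), len(b)
    out, prev = [], None
    for k in range(max_gaps + 1):
        if prev is not None:
            # Running maxima over the previous gap level:
            # row[i][j] = max(prev[i][0..j]), col[i][j] = max(prev[0..i][j])
            row = [r[:] for r in prev]
            col = [r[:] for r in prev]
            for i in range(n):
                for j in range(m):
                    if j:
                        row[i][j] = max(row[i][j], row[i][j - 1])
                    if i:
                        col[i][j] = max(col[i][j], col[i - 1][j])
        # cur[i][j]: best score of an alignment ending with a[i] paired
        # to b[j] and containing exactly k internal gaps.
        cur = [[NEG] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                cands = []
                if k == 0 and (i == 0 or j == 0):
                    cands.append(0)                       # start of alignment
                if i and j:
                    cands.append(cur[i - 1][j - 1])       # extend, no new gap
                    if prev is not None and j >= 2:
                        cands.append(row[i - 1][j - 2])   # skip residues of b
                    if prev is not None and i >= 2:
                        cands.append(col[i - 2][j - 1])   # skip residues of a
                best = max(cands, default=NEG)
                if best != NEG:
                    cur[i][j] = best + score(a[i], b[j])
        # An alignment must run to the end of at least one sequence.
        exact = max(cur[n - 1] + [r[m - 1] for r in cur])
        out.append(max(exact, out[-1]) if out else exact)
        prev = cur
    return out

def optimum_match(a, b, max_gaps=2, trials=100, seed=0):
    """Monte Carlo estimate of the chance probability of the real alignment."""
    rng = random.Random(seed)

    def scramble(s):
        s = list(s)
        rng.shuffle(s)          # same composition, random order
        return s

    real = gapped_scores(a, b, max_gaps)
    rand = [gapped_scores(scramble(a), scramble(b), max_gaps)
            for _ in range(trials)]

    def level(s):
        # Smallest, over gap counts, of the fraction of random
        # comparisons scoring higher at that gap count.
        return min(sum(r[k] > s[k] for r in rand) / trials
                   for k in range(max_gaps + 1))

    real_level = level(real)
    # Fraction of random comparisons whose optimum level lies below the real one.
    p = sum(level(r) < real_level for r in rand) / trials
    return real, real_level, p
```

For example, `gapped_scores("ABCDEF", "ABEF", 1)` returns `[2, 4]`: without a gap only two residues can be matched, while a single gap spanning `CD` allows all four. In `optimum_match`, an estimate of 0 means only that fewer than one in `trials` random comparisons was more extreme, so the number of trials bounds the resolution of the estimate.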
Application of this ‘optimum match’ method of sequence comparison shows that the sensitivity of the ‘maximum match’ method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length. The ‘maximum match’ method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The ‘constrained match’ method of Sankoff and Cedergren (1973) is seen to be misleading since large increments in the alignment score from added gaps do not necessarily result in a high total alignment score required to demonstrate sequence homology.
References
Ambler, R.P., Bartsch, R.G. (1975). Nature 253, 285–288
Barker, W.C., Dayhoff, M.O. (1972). In: Atlas of Protein Sequence and Structure (Dayhoff, M.O., ed.), vol. 5, pp. 101–110. National Biomedical Research Foundation, Washington, USA
Haën, C. de, Swanson, E., Teller, D.C. (1976). J. Mol. Biol. 106, 639–661
Dickerson, R.E. (1971). J. Mol. Biol. 57, 1–15
Dickerson, R.E. (1972). Scientific American, vol. 226, No. 4, pp. 58–72
Fitch, W.M. (1966). J. Mol. Biol. 16, 9–16
Fitch, W.M. (1970). J. Mol. Biol. 49, 1–14
Haber, J.E., Koshland, D.E. (1970). J. Mol. Biol. 50, 617–639
Johnson, N.L., Nixon, E., Amos, P.E. (1963). Biometrika 50, 459–498
Mathews, F.S., Levine, M., Argos, P. (1972). J. Mol. Biol. 64, 449–464
McLachlan, A.D. (1971). J. Mol. Biol. 61, 409–424
Needleman, S.B., Wunsch, C.D. (1970). J. Mol. Biol. 48, 443–453
Ozols, J., Strittmatter, P. (1967). Proc. Nat. Acad. Sci. Wash. 58, 264–267
Pettigrew, G.W. (1974). Biochem. J. 139, 449–459
Rossmann, M.G., Argos, P. (1975). J. Biol. Chem. 250, 7525–7532
Sankoff, D., Cedergren, R.J. (1973). J. Mol. Biol. 77, 159–164
Cite this article
Elleman, T.C. A method for detecting distant evolutionary relationships between protein or nucleic acid sequences in the presence of deletions or insertions. J Mol Evol 11, 143–161 (1978). https://doi.org/10.1007/BF01733890