Abstract
The comparison of several sequences is central to many problems of molecular biology. Examples include finding consensus patterns that define genetic control regions or that determine structural or functional themes. Previously proposed methods, such as dynamic programming, are not adequate for solving problems of realistic size. This paper gives a new and practical solution for finding unknown patterns that occur imperfectly above a preset frequency. Algorithms for finding the patterns are given, as well as estimates of statistical significance.
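To make the problem statement concrete, the following is a minimal brute-force sketch of what "unknown patterns that occur imperfectly above a preset frequency" could mean in practice. It is not the algorithm of the paper, which the abstract does not describe. The function names, the use of Hamming distance as the notion of "imperfect", and the parameters `k`, `max_mismatches`, and `min_fraction` are all assumptions chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(word, seq, max_mismatches):
    """True if some window of seq matches word with at most max_mismatches."""
    k = len(word)
    return any(
        hamming(word, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def consensus_words(seqs, k, max_mismatches, min_fraction, alphabet="ACGT"):
    """Illustrative brute force (assumed, not the paper's method):
    enumerate every k-letter word and keep those that occur, with at most
    max_mismatches mismatches, in at least min_fraction of the sequences."""
    hits = {}
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        count = sum(occurs_within(word, s, max_mismatches) for s in seqs)
        if count / len(seqs) >= min_fraction:
            hits[word] = count
    return hits


# Toy input (made up for this example): three sequences carry a
# TATAAT-like word with at most one mismatch, the fourth does not.
seqs = ["GGTATAATGG", "CCTATCATCC", "AATAGAATAA", "GCGCGCGCGC"]
hits = consensus_words(seqs, k=6, max_mismatches=1, min_fraction=0.75)
print(hits.get("TATAAT"))  # -> 3
```

Enumerating all words costs time proportional to the alphabet size raised to the power `k`, so this sketch is only usable for short words. It is meant to show the shape of the search, not a practical method for realistic problem sizes.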
Additional information
This author was supported by a grant from the System Development Foundation.
This author was supported by NSF grant MCS-8301960 and by a grant from the System Development Foundation.
This author was supported by NIH grant GM19036.
Cite this article
Waterman, M.S., Arratia, R. & Galas, D.J. Pattern recognition in several sequences: Consensus and alignment. Bulletin of Mathematical Biology 46, 515–527 (1984). https://doi.org/10.1007/BF02459500
DOI: https://doi.org/10.1007/BF02459500