Abstract
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. The goal of the present study was to investigate whether the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 “super-family” proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two independent exponential random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had independent, exponentially distributed random lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length.
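The relationship between the two density functions named above can be illustrated with a short sketch. It assumes that "gamma distribution with parameter 2" means shape parameter 2, and that the second form refers to two exponentials with unequal rates; in that reading the gamma form is the limiting case of equal rates. The function names and the rate values in the usage note are hypothetical choices for illustration, not quantities taken from the paper.

```python
import math
import random


def gamma2_pdf(x, rate):
    """Gamma density with shape 2: rate^2 * x * exp(-rate * x)."""
    if x < 0:
        return 0.0
    return rate * rate * x * math.exp(-rate * x)


def two_exp_sum_pdf(x, rate1, rate2):
    """Density of X1 + X2 for independent X1 ~ Exp(rate1), X2 ~ Exp(rate2).

    Valid only for rate1 != rate2; as the rates approach each other the
    result approaches gamma2_pdf.
    """
    if x < 0:
        return 0.0
    scale = rate1 * rate2 / (rate2 - rate1)
    return scale * (math.exp(-rate1 * x) - math.exp(-rate2 * x))


def sample_two_exp_sum(rate1, rate2, n, seed=0):
    """Draw n samples of the sum of two independent exponential variables."""
    rng = random.Random(seed)
    return [rng.expovariate(rate1) + rng.expovariate(rate2) for _ in range(n)]
```

For example, with illustrative rates of 0.01 and 0.02 per residue, `sample_two_exp_sum(0.01, 0.02, 20000)` has a theoretical mean of 1/0.01 + 1/0.02 = 150 residues. Both densities are zero at length zero and rise to a single peak before decaying, which is the qualitative shape the abstract attributes to protein length distributions.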
The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345), which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
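Why the termination signal shapes the length distribution can be seen in a minimal simulation. It assumes, purely for illustration, that each codon position of a random sequence is a termination signal with a fixed, independent probability `p_stop`; the chain length is then geometrically distributed, the discrete counterpart of the exponential lengths in assumption (1). This does not reproduce the paper's tetra-nucleotide derivation, and the function names and the value of `p_stop` are hypothetical.

```python
import random


def random_chain_length(p_stop, rng):
    """Count sense codons read before the first termination signal."""
    length = 0
    while rng.random() >= p_stop:
        length += 1
    return length


def mean_chain_length(p_stop, n, seed=0):
    """Average chain length over n simulated random sequences."""
    rng = random.Random(seed)
    return sum(random_chain_length(p_stop, rng) for _ in range(n)) / n
```

Under these assumptions the expected length is (1 - p_stop) / p_stop codons. With three stop codons among 64 equally likely triplets, `p_stop = 3/64` gives about 20 codons, and a rarer signal gives a proportionally longer mean, which is why the choice of termination code matters for the predicted distribution.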
References
Barker WC, George DG, Hunt LT, Garavelli JS (1991) The PIR protein sequence database. Nucleic Acids Res Suppl 19:2231–2236
Blake CCF (1983) Exons—present from the beginning? Nature 306:535–537
Bossi L, Roth JR (1980) The influence of codon context on genetic code translation. Nature 286:123–127
Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990a) The signal for termination of protein synthesis in prokaryotes. Nucleic Acids Res 18:2079–2086
Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990b) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18:6339–6345
Cavalier-Smith T (1985) Selfish DNA and the origin of introns. Nature 315:283–284
Chan HS, Dill KA (1990) Origins of structure in globular proteins. Proc Natl Acad Sci USA 87:6388–6392
Darnell JE (1978) Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257–1260
Dill KA (1985) Theory of the folding and stability of globular proteins. Biochemistry 24:1501–1509
Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118
Doolittle RF (1991) Counting and discounting the universe of exons. Science 253:677–679
Doolittle WF (1978) Genes in pieces: were they ever together? Nature 272:581–582
Doolittle WF (1990) Understanding introns: origins and functions. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 43–62
Dorit RL, Schoenbach L, Gilbert W (1990) How big is the universe of exons? Science 250:1377–1382
Dorit RL, Gilbert W (1991) The limited universe of exons. Curr Opin Struct Biol 1:973–977
Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152:363–366
Flory PJ (1953) Principles of polymer chemistry. Cornell University Press, Ithaca, NY, pp 1–672
Gilbert W (1978) Why genes in pieces? Nature 271:501
Hanyu N, Kuchino Y, Nishimura S (1986) Dramatic events in ciliate evolution: alteration of UAA and UAG termination codons to glutamine codons due to anticodon mutations in two Tetrahymena tRNAs(Gln). EMBO J 5:1307–1311
Hawkins JD (1988) A survey on intron and exon lengths. Nucleic Acids Res 16:9893–9908
Holland SK, Blake CCF (1990) Proteins, exons, and molecular evolution. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 10–42
Iranpour R, Chacon P (1991) Basic stochastic processes. Macmillan, New York, pp 1–258
Jukes TH (1982) Possible evolutionary steps in the genetic code. Biochem Biophys Res Comm 107:225–228
Jukes TH, Osawa S, Muto A, Lehman N (1987) Evolution of anticodons: variations in the genetic code. Cold Spring Harbor Symp Quant Biol 52:769–776
Lau KF, Dill KA (1990) Theory for protein mutability and biogenesis. Proc Natl Acad Sci USA 87:638–642
McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437
Monod J (1971) Chance and necessity. An essay on the natural philosophy of modern biology. Alfred A. Knopf, New York, pp 1–199
Naora H, Deacon NJ (1982) Relationship between total size of exons and introns in protein-coding genes of higher eukaryotes. Proc Natl Acad Sci USA 79:6196–6200
Nei M, Chakraborty R, Fuerst PA (1976) Infinite allele model with varying mutation rate. Proc Natl Acad Sci USA 73:4164–4168
Osawa S, Jukes TH (1988) Evolution of the genetic code as affected by anticodon content. Trends Genet 4:191–198
Patthy L (1991) Exons—original building blocks of proteins? BioEssays 13:187–192
Ross SM (1989) Introduction to probability models, 4th ed. Academic Press, San Diego, pp 1–544
Rossmann MG (1990) Introductory comments on the function of domains in protein structure. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 3–9
Senapathy P (1986) Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proc Natl Acad Sci USA 83:2133–2137
Senapathy P (1988) Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proc Natl Acad Sci USA 85:1129–1133
Shakhnovich EI, Gutin AM (1989) Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach. Biophys Chem 34:187–199
Shakhnovich EI, Gutin AM (1990) Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346:773–775
Sharp PA (1985) On the origin of RNA splicing and introns. Cell 42:397–400
Smith MW (1988) Structure of vertebrate genes: a statistical analysis implicating selection. J Mol Evol 27:45–55
Sommer SS, Cohen JE (1980) The size distributions of proteins, mRNA, and nuclear RNA. J Mol Evol 15:37–57
Tate WP, Brown CM (1992) Translational termination: “stop” for protein synthesis or “pause” for regulation of gene expression? Biochemistry 31:2443–2450
Traut TW (1988) Do exons code for structural or functional units in proteins? Proc Natl Acad Sci USA 85:2944–2948
White SH (1992) The amino acid preferences of small proteins: implications for protein stability and evolution. J Mol Biol 227:991–995
White SH, Jacobs RE (1990) Statistical distribution of hydrophobic residues along the length of protein chains—implications for protein folding and evolution. Biophys J 57:911–921
White SH, Jacobs RE (1993) The evolution of proteins from random amino acid sequences I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 36:79–95
Cite this article
White, S.H. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 38, 383–394 (1994). https://doi.org/10.1007/BF00163155
DOI: https://doi.org/10.1007/BF00163155