The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences

White, Stephen H.

doi:10.1007/BF00163155

Published: April 1994

Journal of Molecular Evolution, Volume 38, pages 383–394 (1994)

Abstract

This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. The goal of the present study was to investigate whether the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 “super-family” proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two independent exponential random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had independent, exponentially distributed random lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length.
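The two candidate densities named in the abstract can be written down and compared numerically. The following is a minimal sketch, not the paper's fitting procedure: it assumes that "parameter 2" is the gamma shape parameter and that the second density is the sum of two independent exponentials with possibly unequal means. The function names and the scale of 150 residues are illustrative assumptions, not values from the paper.

```python
import math
import random


def gamma2_pdf(x, scale):
    """Gamma density with shape parameter 2: x * exp(-x/scale) / scale**2."""
    return x * math.exp(-x / scale) / scale**2


def gamma2_cdf(x, scale):
    """Closed-form CDF of the shape-2 gamma: 1 - (1 + x/scale) * exp(-x/scale)."""
    return 1.0 - (1.0 + x / scale) * math.exp(-x / scale)


def sum_two_exp_pdf(x, mean_a, mean_b):
    """Density of X1 + X2 for independent exponentials with means mean_a, mean_b.

    When the two means coincide, the sum is exactly the shape-2 gamma.
    """
    if math.isclose(mean_a, mean_b):
        return gamma2_pdf(x, mean_a)
    return (math.exp(-x / mean_a) - math.exp(-x / mean_b)) / (mean_a - mean_b)


def simulate_lengths(n, mean_a, mean_b, seed=0):
    """Draw n lengths, each the sum of two independent exponential variates."""
    rng = random.Random(seed)
    return [
        rng.expovariate(1.0 / mean_a) + rng.expovariate(1.0 / mean_b)
        for _ in range(n)
    ]


if __name__ == "__main__":
    scale = 150.0  # hypothetical scale in residues, chosen only for illustration
    lengths = simulate_lengths(20000, scale, scale)
    empirical = sum(1 for v in lengths if v <= 2 * scale) / len(lengths)
    print("sample mean:", sum(lengths) / len(lengths))
    print("empirical vs. gamma CDF at 2*scale:", empirical, gamma2_cdf(2 * scale, scale))
```

With equal means the simulated sums follow the shape-2 gamma, which is one way to see why the abstract calls the two densities closely related. With unequal means the density keeps the same rise-and-decay form but has two length scales.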

The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345), which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
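The following sketch shows one way the termination signal could set the length scale of randomly terminated sequences. It is not the paper's derivation. It assumes uniform, independent nucleotide frequencies, the three standard stop codons, and a tetra-nucleotide signal modeled as a stop codon followed by one specific nucleotide. All function names are hypothetical.

```python
import math
import random

STOP_CODONS = ("UAA", "UAG", "UGA")  # standard stop codons


def termination_probability(signal_length, base_freq=0.25):
    """Per-codon chance of hitting a termination signal in random sequence.

    signal_length=3: any stop codon terminates.
    signal_length=4: a stop codon must also be followed by one specific
    nucleotide (an assumption made for this illustration).
    """
    p_triplet = len(STOP_CODONS) * base_freq**3
    if signal_length == 3:
        return p_triplet
    if signal_length == 4:
        return p_triplet * base_freq
    raise ValueError("signal_length must be 3 or 4")


def mean_length(p):
    """Mean number of sense codons read before termination (geometric law)."""
    return (1.0 - p) / p


def sample_length(rng, p):
    """Draw one geometric length (codons before termination) by inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p))


if __name__ == "__main__":
    for n in (3, 4):
        p = termination_probability(n)
        print(n, "nucleotide signal: p =", p, "mean length =", mean_length(p))
```

Under these assumptions the lengths are geometrically distributed, which is the discrete analogue of the exponential lengths assumed for protoproteins. Requiring a fourth nucleotide lowers the per-codon termination probability from 3/64 to 3/256 and raises the mean length roughly fourfold, which illustrates why length statistics are sensitive to the termination signal.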

References

Barker WC, George DG, Hunt LT, Garavelli JS (1991) The PIR protein sequence database. Nucleic Acids Res Suppl 19:2231–2236
Blake CCF (1983) Exons—present from the beginning? Nature 306:535–537
Bossi L, Roth JR (1980) The influence of codon context on genetic code translation. Nature 286:123–127
Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990a) The signal for termination of protein synthesis in prokaryotes. Nucleic Acids Res 18:2079–2086
Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990b) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18:6339–6345
Cavalier-Smith T (1985) Selfish DNA and the origin of introns. Nature 315:283–284
Chan HS, Dill KA (1990) Origins of structure in globular proteins. Proc Natl Acad Sci USA 87:6388–6392
Darnell JE (1978) Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257–1260
Dill KA (1985) Theory of the folding and stability of globular proteins. Biochemistry 24:1501–1509
Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118
Doolittle RF (1991) Counting and discounting the universe of exons. Science 253:677–679
Doolittle WF (1978) Genes in pieces: were they ever together? Nature 272:581–582
Doolittle WF (1990) Understanding introns: origins and functions. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 43–62
Dorit RL, Schoenbach L, Gilbert W (1990) How big is the universe of exons? Science 250:1377–1382
Dorit RL, Gilbert W (1991) The limited universe of exons. Cur Opinion Struc Biol 1:973–977
Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152:363–366
Flory PJ (1953) Principles of polymer chemistry. Cornell University Press, Ithaca, NY, pp 1–672
Gilbert W (1978) Why genes in pieces? Nature 271:501
Hanyu N, Kuchino Y, Nishimura S (1986) Dramatic events in ciliate evolution: alteration of UAA and UAG termination codons to glutamine codons due to anticodon mutations in two Tetrahymena tRNAs(Gln). EMBO J 5:1307–1311
Hawkins JD (1988) A survey on intron and exon lengths. Nucleic Acids Res 16:9893–9908
Holland SK, Blake CCF (1990) Proteins, exons, and molecular evolution. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 10–42
Iranpour R, Chacon P (1991) Basic stochastic processes. Macmillan, New York, pp 1–258
Jukes TH (1982) Possible evolutionary steps in the genetic code. Biochem Biophys Res Comm 107:225–228
Jukes TH, Osawa S, Muto A, Lehman N (1987) Evolution of anticodons: variations in the genetic code. Cold Spring Harbor Sympos Quant Biol 52:769–776
Lau KF, Dill KA (1990) Theory for protein mutability and biogenesis. Proc Natl Acad Sci USA 87:638–642
McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437
Monod J (1971) Chance and necessity. An essay on the natural philosophy of modern biology. Alfred A. Knopf, New York, pp 1–199
Naora H, Deacon NJ (1982) Relationship between total size of exons and introns in protein-coding genes of higher eukaryotes. Proc Natl Acad Sci USA 79:6196–6200
Nei M, Chakraborty R, Fuerst PA (1976) Infinite allele model with varying mutation rate. Proc Natl Acad Sci USA 73:4164–4168
Osawa S, Jukes TH (1988) Evolution of the genetic code as affected by anticodon content. Trends Genet 4:191–198
Patthy L (1991) Exons—original building blocks of proteins? BioEssays 13:187–192
Ross SM (1989) Introduction to probability models, 4th ed. Academic Press, San Diego, pp 1–544
Rossmann MG (1990) Introductory comments on the function of domains in protein structure. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 3–9
Senapathy P (1986) Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proc Natl Acad Sci USA 83:2133–2137
Senapathy P (1988) Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proc Natl Acad Sci USA 85:1129–1133
Shakhnovich EI, Gutin AM (1989) Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach. Biophys Chem 34:187–199
Shakhnovich EI, Gutin AM (1990) Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346:773–775
Sharp PA (1985) On the origin of RNA splicing and introns. Cell 42:397–400
Smith MW (1988) Structure of vertebrate genes: a statistical analysis implicating selection. J Mol Evol 27:45–55
Sommer SS, Cohen JE (1980) The size distributions of proteins, mRNA, and nuclear RNA. J Mol Evol 15:37–57
Tate WP, Brown CM (1992) Translational termination: “stop” for protein synthesis or “pause” for regulation of gene expression? Biochemistry 31:2443–2450
Traut TW (1988) Do exons code for structural or functional units in proteins? Proc Natl Acad Sci USA 85:2944–2948
White SH (1992) The amino acid preferences of small proteins: implications for protein stability and evolution. J Mol Biol 227:991–995
White SH, Jacobs RE (1990) Statistical distribution of hydrophobic residues along the length of protein chains—implications for protein folding and evolution. Biophys J 57:911–921
White SH, Jacobs RE (1993) The evolution of proteins from random amino acid sequences I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 36:79–95

Author information

Authors and Affiliations

Department of Physiology and Biophysics, University of California, Irvine, CA 92717, USA
Stephen H. White

About this article

Cite this article

White, S.H. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 38, 383–394 (1994). https://doi.org/10.1007/BF00163155

Received: 24 June 1992
Revised: 24 May 1993
Issue Date: April 1994
DOI: https://doi.org/10.1007/BF00163155
