Summary
This paper presents a statistical analysis of the size distributions of exons and six other gene parts [the transcription unit, introns, intervening DNA (sum of introns), mRNA (sum of exons), and the leader and trailer regions of mRNA], as well as the number of exons, the percentage of introns, the placement of introns within the gene, and the potential for frameshifts from coding exon shifts. The first seven variables, all measured in base pairs, fit lognormal distributions. There are significant correlations between the sizes of intervening DNA and mRNA, between the sizes of leader and trailer regions, and between the sizes of introns and their flanking exons. Introns occur at nonrandom frequencies within the codon frame, in untranslated regions, and relative to the frameshift potential from exon movement or duplication. These nonrandom patterns in gene structure demonstrate that models of gene evolution must incorporate selective processes.
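To make the lognormal claim concrete: a variable is lognormal exactly when its natural logarithm is normally distributed, so one informal check is that a strongly right-skewed size distribution becomes roughly symmetric after a log transform. The following is a minimal Python sketch of that check, not the procedure used in the paper. The function names, the synthetic sample, and its parameters are all assumptions made for illustration; no values from the paper are used.

```python
import math
import random
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def lognormal_summary(sizes_bp):
    """Summarize sizes (in base pairs) on the raw and log scales.

    If the sizes are lognormal, their natural logs are normal, so the
    skewness of the logs should be close to zero.
    """
    logs = [math.log(s) for s in sizes_bp]
    return {
        "log_mean": statistics.fmean(logs),
        "log_sd": statistics.pstdev(logs),
        "raw_skew": skewness(sizes_bp),
        "log_skew": skewness(logs),
    }


# Synthetic stand-in data (hypothetical, NOT measurements from the paper):
# 2000 draws from a lognormal with log-scale mean 5.0 and log-scale sd 1.0.
rng = random.Random(0)
synthetic_sizes = [rng.lognormvariate(5.0, 1.0) for _ in range(2000)]
summary = lognormal_summary(synthetic_sizes)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On real gene-part sizes, a large positive `raw_skew` together with a `log_skew` near zero would be consistent with a lognormal fit. A formal goodness-of-fit test on the log-transformed values would still be needed to support the claim.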
Cite this article
Smith, M.W. Structure of vertebrate genes: A statistical analysis implicating selection. J Mol Evol 27, 45–55 (1988). https://doi.org/10.1007/BF02099729