Summary
Sixty-four eucaryotic nuclear DNA sequences, half protein-coding and half noncoding, were examined as realizations of first-, second-, or third-order Markov chains. Standard statistical tests showed that most of the sequences required at least a second-order Markov chain for their representation, and some required a chain of third order. For all 64 sequences, the observed one-step second-order transition count matrices successfully predicted the two-step transition count matrices, and for 56 of the 64 they also predicted the three-step transition count matrices. Because the observed first- and second-order transition count matrices depart from random expectation, a considerable sample of eucaryotic nuclear DNA sequences, both protein-coding and noncoding, has significant local structure over subsequences of three to five contiguous bases, and this structure occurs throughout the entire length of each sequence. These results suggest that present-day DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
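The summary does not spell out the test statistics, so the following is only a minimal sketch under stated assumptions, not the author's procedure. It assumes sequences already reduced to the letters A, C, G, T, and it uses a standard likelihood-ratio (G²) test of Markov order k−1 against order k. For the prediction step, it treats a second-order chain as a 16-state chain on dinucleotides and squares its one-step matrix to predict two-step counts. The function names `transition_counts`, `g2_order_test`, and `two_step_check` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def _check(seq):
    # Assumption: the input has already been reduced to the four bases A, C, G, T.
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq


def transition_counts(seq, order):
    """Count (context, next base) pairs; context = the `order` bases preceding position i."""
    seq = _check(seq)
    counts = Counter()
    for i in range(order, len(seq)):
        counts[(seq[i - order:i], seq[i])] += 1
    return counts


def g2_order_test(seq, order):
    """Likelihood-ratio (G^2) statistic for H0: order `order - 1` vs H1: order `order`.

    Both models are fitted to the same order-`order` counts; the reduced model's
    counts are obtained by summing over the oldest base of each context.
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    full = transition_counts(seq, order)
    ctx_full, reduced, ctx_reduced = Counter(), Counter(), Counter()
    for (ctx, b), n in full.items():
        ctx_full[ctx] += n
        reduced[(ctx[1:], b)] += n  # drop the oldest base of the context
        ctx_reduced[ctx[1:]] += n
    g2 = 0.0
    for (ctx, b), n in full.items():
        p_full = n / ctx_full[ctx]
        p_reduced = reduced[(ctx[1:], b)] / ctx_reduced[ctx[1:]]
        g2 += 2.0 * n * math.log(p_full / p_reduced)
    # Nominal degrees of freedom 4^(order-1) * (4-1)^2; structural zeros would lower it.
    df = 4 ** (order - 1) * 9
    return g2, df


def two_step_check(seq):
    """Pearson X^2 between observed two-step dinucleotide-state transitions and the
    counts predicted by squaring the one-step second-order transition matrix."""
    seq = _check(seq)
    states = ["".join(p) for p in product(BASES, repeat=2)]
    idx = {s: k for k, s in enumerate(states)}
    one = transition_counts(seq, 2)  # n(xy -> z), i.e. state xy -> state yz
    row = Counter()
    for (ctx, _), n in one.items():
        row[ctx] += n
    P = [[0.0] * 16 for _ in range(16)]
    for (ctx, z), n in one.items():
        P[idx[ctx]][idx[ctx[1] + z]] = n / row[ctx]
    # Two-step transition probabilities: P2 = P * P
    P2 = [[sum(P[i][k] * P[k][j] for k in range(16)) for j in range(16)]
          for i in range(16)]
    observed = Counter()  # state at i -> state at i + 2
    for i in range(len(seq) - 3):
        observed[(seq[i:i + 2], seq[i + 2:i + 4])] += 1
    start = Counter()
    for (s, _), n in observed.items():
        start[s] += n
    x2 = 0.0
    for s in states:
        for t in states:
            expected = start[s] * P2[idx[s]][idx[t]]
            if expected > 0:
                x2 += (observed[(s, t)] - expected) ** 2 / expected
    return x2
```

In use, `g2` would be compared with a chi-square distribution with `df` degrees of freedom (for example via `scipy.stats.chi2.sf(g2, df)` if SciPy is available); a large value rejects the lower order. A small X² from `two_step_check` means the squared one-step matrix reproduces the observed two-step counts. No degrees of freedom are assigned to that X² here, which is a simplification.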
Cite this article
Blaisdell, B.E. Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding. J Mol Evol 21, 278–288 (1985). https://doi.org/10.1007/BF02102360
DOI: https://doi.org/10.1007/BF02102360