Summary
To examine whether certain short DNA sequences (putative splice signals) occurred in a particular region of an intron more often than would be expected by chance, intron data were first examined to see what structure they took. There were significant departures from equal nucleotide frequency, and successive nucleotides clearly did not occur independently in the rat and mouse introns examined. The nonindependence was mainly due to a CG shortage and a less marked TA shortage. However, the pairwise frequencies explained almost all the variability in triplet frequencies, so the data could be approximately modeled using nucleotide frequencies conditional on the previous nucleotide. Some coding DNA was also examined; the pairs in second and third positions, and in third and first positions, of a codon showed departures from independence similar to those of the intron data. Using the probability model derived for the intron data, expected frequencies of putative signals were derived and compared with the observed frequencies.
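The idea of modeling each nucleotide conditional on the previous one, and then deriving an expected frequency for a short signal, can be illustrated with a minimal Python sketch. This is not the author's procedure: the function names, the toy sequence, and the motif are hypothetical, and the sketch assumes that the overall base frequencies can stand in for the probability of the first nucleotide of the signal.

```python
from collections import Counter

BASES = "ACGT"

def fit_first_order(seqs):
    """Estimate base frequencies and P(next base | previous base) from sequences."""
    singles, pairs = Counter(), Counter()
    for s in seqs:
        singles.update(c for c in s if c in BASES)
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:
                pairs[a + b] += 1
    total = sum(singles.values())
    freq = {b: singles[b] / total for b in BASES}
    trans = {}
    for a in BASES:
        row = sum(pairs[a + b] for b in BASES)
        # Rows with no observed pairs get zero probabilities (toy handling).
        trans[a] = {b: (pairs[a + b] / row if row else 0.0) for b in BASES}
    return freq, trans

def motif_probability(motif, freq, trans):
    """Probability of the motif at a given position under the first-order model."""
    p = freq[motif[0]]  # assumption: overall base frequency for the first base
    for a, b in zip(motif, motif[1:]):
        p *= trans[a][b]
    return p

def expected_count(motif, seqs, freq, trans):
    """Expected number of (possibly overlapping) occurrences across all sequences."""
    positions = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
    return motif_probability(motif, freq, trans) * positions

def observed_count(motif, seqs):
    """Count overlapping occurrences of the motif in the sequences."""
    k = len(motif)
    return sum(1 for s in seqs for i in range(len(s) - k + 1) if s[i:i + k] == motif)

# Toy data only; real use would fit the model on intron sequences.
toy_seqs = ["ACGTACGT"]
freq, trans = fit_first_order(toy_seqs)
expected = expected_count("ACG", toy_seqs, freq, trans)
observed = observed_count("ACG", toy_seqs)
print(expected, observed)
```

A real comparison would also need a measure of sampling variability for the observed count, which this sketch leaves out.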
Additional information
Some of the work for this paper was done while the author was at the Department of Applied Statistics, University of Reading, England.
Cite this article
Avery, P.J. The analysis of intron data and their use in the detection of short signals. J Mol Evol 26, 335–340 (1987). https://doi.org/10.1007/BF02101152