Summary
The data from a genomic library can be sorted into the frequencies of every possible tetranucleotide in the sequence. This tabulation, a short sequence distribution, contains the frequency of occurrence of the 256 tetranucleotides and thus seems to serve as a vehicle for averaging sequence information. Two such distributions can be readily compared by correlation. Reported here are correlations (Spearman's rank correlation coefficient, r_s) of the distributions from all of the genomic libraries in GenBank 44.0 with sizes equal to or larger than that of Salmonella typhimurium, except for the data for mouse and humans. All of the organisms examined showed highly significant correlations between the two DNA strands (not the complementarity expected from base pairing). Of 155 comparisons between libraries, 132 showed significant correlations at the 99% confidence level. Application of the correlation coefficients as a similarity matrix clustered most organisms in a phenogram in a pattern consistent with other hypotheses. This suggests a highly conserved pattern underlying all other genetic information in cellular DNA and affecting both DNA strands, perhaps caused by interaction with conserved factors necessary for DNA packaging.
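The tabulation and comparison described in the summary can be illustrated with a minimal Python sketch. It is not the author's program: the function names, the use of overlapping windows, the skipping of windows containing ambiguous symbols, and the use of the reverse complement to stand in for the second strand are all assumptions made for illustration. Spearman's r_s is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
from itertools import product

BASES = "ACGT"
# All 4**4 = 256 tetranucleotides, in a fixed order shared by every distribution.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetranucleotide_counts(seq):
    """Count overlapping 4-mers (assumption: windows with non-ACGT symbols are skipped)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    return [counts[t] for t in TETRAMERS]


def reverse_complement(seq):
    """Return the opposite strand read 5' to 3' (assumed reading of 'the two DNA strands')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one distribution is constant
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy sequence; a real comparison would use whole genomic libraries.
    toy = "ATGCGTACGTTAGCCGATCGATCGGCTAAGCTTACGATCGTAGCTAGCTAACG"
    forward = tetranucleotide_counts(toy)
    reverse = tetranucleotide_counts(reverse_complement(toy))
    print("strand-to-strand r_s:", spearman(forward, reverse))
```

Computing `spearman` for every pair of libraries would give a matrix of coefficients that could then be passed to a clustering routine as a similarity matrix; the clustering step is not sketched here.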
Rogerson, A.C. There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes. J Mol Evol 32, 24–30 (1991). https://doi.org/10.1007/BF02099925