Abstract
E. coli genes that contain a high frequency of the tetranucleotide CTAG are also rich in the tetramers CTTG, CCTA, CCAA, TTGG, TAGG, and CAAG (group-I tetramers). Conversely, E. coli genes lacking CTAG are rich in the tetranucleotides CCTG, CCAG, CTGG, and CAGG (group-II tetramers). These two gene samples differ also in codon usage, amino acid composition, frequency of Dcm sites, and contrast vocabularies. Group-I tetramers have in common that they are depleted by very-short-patch repair (VSP), while group-II tetramers are favored by VSP activity. The VSP system repairs G:T mismatches to G:C, thereby increasing the overall G+C content of the genome; for this reason the CTAG-rich sample has a lower G+C content than the CTAG-poor sample. This compositional heterogeneity can be tentatively explained by a low level of VSP activity on the CTAG-rich sample. A negative correlation is found between the frequency of group-I tetramers and the level of gene expression, as measured by the Codon Adaptation Index (CAI). A possible link between the rate of VSP activity and the level of gene expression is considered.
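The quantities compared in the abstract (tetranucleotide frequencies and G+C content per gene) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the single-strand overlapping count, the normalization by number of tetramer positions, and the toy sequence are all assumptions made for illustration.

```python
# Tetramer groups as listed in the abstract.
GROUP_I = ("CTAG", "CTTG", "CCTA", "CCAA", "TTGG", "TAGG", "CAAG")
GROUP_II = ("CCTG", "CCAG", "CTGG", "CAGG")


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    n = len(word)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == word)


def tetramer_frequency(seq, words):
    """Fraction of tetramer positions in `seq` occupied by any word in `words`.

    Assumption: only the given strand is scanned; the paper may have
    used a different normalization.
    """
    seq = seq.upper()
    positions = len(seq) - 3
    if positions <= 0:
        return 0.0
    return sum(count_overlapping(seq, w) for w in words) / positions


def gc_content(seq):
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real E. coli gene.
    toy_gene = "ATGCTAGCCAAGGTTGGCTTGTAA"
    print("group-I frequency: ", tetramer_frequency(toy_gene, GROUP_I))
    print("group-II frequency:", tetramer_frequency(toy_gene, GROUP_II))
    print("G+C content:       ", gc_content(toy_gene))
```

Applied to each gene in a CTAG-rich and a CTAG-poor sample, values of this kind would allow the two samples to be compared in the way the abstract describes.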
Additional information
Correspondence to: A. Marine
Cite this article
Gutiérrez, G., Casadesús, J., Oliver, J.L. et al. Compositional heterogeneity of the Escherichia coli genome: A role for VSP repair?. J Mol Evol 39, 340–346 (1994). https://doi.org/10.1007/BF00160266
DOI: https://doi.org/10.1007/BF00160266