Abstract
All complete retrovirus sequences in the GenEMBL database were examined with the goal of assessing possible relationships between the nucleotide composition of retroviral genomes, the amino acid composition of retroviral proteins, and evolutionary strategies used by retroviruses. The results demonstrated that the genome of each viral lineage has a characteristic base composition and that the variations between groups are related to retroviral phylogeny. By analogy to microbial species, we suggest that the variations arise from group-specific patterns of directional mutations where the bias can be exerted on any of the four nucleotides. It is most likely that the mutational patterns are introduced during reverse transcription, and a direct participation of reverse transcriptase in the process is suspected.
A straightforward strategy was used to analyze the compositional relationship between nucleotides and encoded amino acids. The procedure entailed calculations of amino acid frequencies from nucleotide content and the comparison of the calculated values to the observed amino acid frequencies in retroviruses. The results revealed an excellent correspondence between variation in genomic base composition and variation in amino acid composition of proteins, with the compositional differences extending into all major coding regions of the viruses. Because of the magnitude and dispersion of these effects, and because of the nonconservative nature of many of the substitutions between groups with different genomic biases, we suggest that the variations in protein composition driven by biased nucleotide frequencies are an important factor in shaping the characteristic phenotypes of the different viral lineages.
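The calculation described above can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes that the three codon positions are filled independently according to the overall base frequencies, that the standard genetic code applies, and that stop codons are excluded before normalization. The function names and the example base frequencies are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def base_composition(seq):
    """Return the fraction of T, C, A and G in a nucleotide sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    counts = {b: seq.count(b) for b in BASES}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T/U")
    return {b: counts[b] / total for b in BASES}


def expected_aa_frequencies(base_freqs):
    """Amino acid frequencies expected from base composition alone.

    Assumption (not from the source): codon positions are independent,
    so P(codon) is the product of its three base frequencies.
    """
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not contribute to protein composition
        p = base_freqs[codon[0]] * base_freqs[codon[1]] * base_freqs[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    total = sum(freqs.values())
    return {aa: p / total for aa, p in freqs.items()}


def observed_aa_frequencies(protein):
    """Amino acid frequencies counted directly from a protein sequence."""
    valid = set(AMINO_ACIDS) - {"*"}
    residues = [r for r in protein.upper() if r in valid]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: residues.count(aa) / len(residues) for aa in sorted(valid)}


if __name__ == "__main__":
    # Hypothetical compositions for illustration only, not data from the paper.
    uniform = expected_aa_frequencies({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
    a_rich = expected_aa_frequencies({"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20})
    for aa in sorted(uniform):
        print(f"{aa}  uniform={uniform[aa]:.3f}  A-rich={a_rich[aa]:.3f}")
```

Under these assumptions, a genome enriched in one base is expected to encode more of the amino acids whose codons are rich in that base (for example, lysine, with codons AAA and AAG, in an A-rich genome). The expected values can then be compared with `observed_aa_frequencies` applied to the translated coding regions.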
A clue to the nature of the evolutionary forces that are responsible for the generation of nucleotide biases was provided by the observation that viruses with radically different base frequencies most often inhabit the same cell type. This observation, along with analysis of amino acid and nucleotide replacement patterns between and within reverse transcriptase sequences from the various groups, permitted us to advance a model for the evolution of retroviruses. According to the model, speciation could initiate when daughter virions from a single progenitor vary in the direction of their mutational bias. These variations would exert a pleiotropic effect on the frequencies of nucleotides in all viral genes and consequently on the frequencies of amino acids in the encoded proteins. The variants with the most extreme compositional differences would have a selective advantage because their different precursor requirements would enable them to occupy different ecological niches within a single cell. Once the viruses have adapted to different amino acid compositions, continued presence of the diverging viruses in the same cell would no longer be needed to maintain different phenotypes. Each virus would then possess a distinct mutational bias which would fix the patterns of amino acid substitution. These patterns would favor a degree of conservation of the phenotype in the viral progeny, thus promoting the concerted evolution of the species.
Correspondence to: J.N. Anderson
Cite this article
Bronson, E.C., Anderson, J.N. Nucleotide composition as a driving force in the evolution of retroviruses. J Mol Evol 38, 506–532 (1994). https://doi.org/10.1007/BF00178851
DOI: https://doi.org/10.1007/BF00178851