Abstract
The persistence of life requires populations to adapt at a rate commensurate with the dynamics of their environment. Successful populations that inhabit highly variable environments have evolved mechanisms to increase the likelihood of successful adaptation. We introduce a 64 × 64 matrix to quantify base-specific mutation potential, analyzing four different replicative systems, error-prone PCR, mouse antibodies, a nematode, and Drosophila. Mutational tendencies are correlated with the structural evolution of proteins. In systems under strong selective pressure, mutational biases are shown to favor the adaptive search of space, either by base mutation or by recombination. Such adaptability is discussed within the context of the genetic code at the levels of replication and codon usage.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Viable mutations can potentiate the emergence of new life forms and the adaptation of living organisms to new environmental constraints. Evolution occurs through a hierarchy of genetic events, including base substitution, homologous recombination, insertions, deletions, rearrangements, transpositions, and horizontal transfers (Lawrence 1997; Pennisi 1998). Systems such as the adaptive immune system use somatic hypermutation to rapidly search protein space to combat infectious agents. Likewise, error-prone PCR is used in molecular evolution protocols to search space in order to optimize protein function. In addition, pathogens and cancers have evolved effective dynamic mechanisms, often predicated on base substitution, to evade immune and therapeutic selection. In HIV, for example, the high rate of viral mutation makes the development of a vaccine difficult and results in the rapid onset of resistance to many current drugs. Indeed, there is a correspondence among the ability of HIV to evolve drug resistance, the drug regimen given, and the genetic makeup of the strains present in a patient (Lathrop and Pazzani 1999). Crucial to a thorough understanding of the base substitution process is a mathematically precise quantification of the various mutation rates.
The mutational machinery of hypermutation and recombination is under environment-dependent regulation (Bull et al. 2001). Studies have shown that regulation is possible both in the process of replication and error correction (Sutton and Walker 2001) and in the type of polymerases expressed (Friedberg et al. 2000; Storb 2001). The mechanism for maintenance of adaptability traits is population-based and requires a dynamic environment. That evolvability is a group selectable trait has been shown in simulations of digital organisms (Travis and Travis 2002; Peper 2003; Ofria et al. 1999; Thearling and Ray 1997; Wagner and Altenberg 1996; Altenberg 1994). Many of the biochemical events necessary to modify adaptability are known. At the simplest level, mutation of a single amino acid site in the Tag Pol I enzyme is sufficient to greatly modulate the accuracy of DNA replication (Patel et al. 2001).
The goal of this paper is to show how species can use evolution in populations as a space searching advantage in the context of the genetic code. Base-to-base rates of synonymous, conservative, and nonconservative mutation tendencies for each codon are described, thus allowing for the quantification of evolutionary potential. Base-specific mutation rates are dependent on the fidelity of the replication machinery, flanking sequences, and other environmental conditions. The base substitution rate is nonuniform because transitions are typically greatly favored over transversions, and purines are typically substituted at a greater frequency than pyrimidines. However, different replication systems have different base-specific mutation probabilities. It is argued here that within the context of genetic code, the emergence of replication variants that modulate not only the overall rate but also base-specific mutation rates allows populations to increase the probability of searching productive survival space under dynamic environmental constraints. Our theory complements previous observations for the immune system (Kepler 1997), recent observations of codon bias toward increased adaptability in influenza A (Plotkin and Doshoff 2003), and recent work on digital organisms showing how adaptability evolves within a population (Travis and Travis 2002; Peper 2003; Ofria et al. 1999; Thearling and Ray 1997; Wagner and Altenberg 1996; Altenberg 1994).
A codon mutation matrix that defines in a precise manner the probability per round of a codon mutating by base substitution into another codon is introduced. This matrix provides the rates of all possible 64 × 64 mutations. From these detailed rates, properties of the codons themselves can be calculated. For example, the codon mutation matrix allows the classification of codons according to their synonymous, conservative, or nonconservative mutabilities. Codons that tend to mutate in a more dramatic, nonconservative fashion are characterized as having a higher evolutionary potential, allowing for a more rapid short-term adaptation.
To describe mutabilities of codons, there currently exists the K S and K A notation (Li et al. 1985). The parameter K S describes the number of synonymous substitutions per site, and K A describes the number of nonsynonymous substitutions per site. Because of its average character, and because it is based on sequences that have undergone selection, the K S and K A description is limited to estimating the number of synonymous and nonsynonymous nucleotide substitutions between exons of homologous genes.
Some approaches that exist are indirect measures of intrinsic adaptability at the genetic level. The PAM and BLOSUM matrices, for example, describe mutabilities between amino acids rather than between codons (Dayhoff et al. 1978; Henikoff and Henikoff 1992; Durbin et al. 1998). Moreover, a matrix of pure mutational tendencies is ideally constructed from data gathered from nonselected genomic data, such as intron regions or pseudogenes. Yang and Kumar (1996) have developed what is known as the Q matrix. This matrix quantifies the underlying mutational pattern of nucleotide substitution. This 4 × 4 matrix, which deals with bases rather than codons, will be useful in our development. A codon mutation matrix based on the assumption that the ratio of transition-to-transversion mutation rates is constant and that the ratio of nonsynonymous-to-synonymous mutation rates is constant has been developed (Goldman and Yang 1994). This matrix can capture mutational data that are consistent with the assumption of equal transition-to-transversion and nonsynonymous-to-synonymous mutation rates. Our 64 × 64 matrix separates the species-specific mutation probabilities, and it additionally allows us to quantify the efficacy, type, and biases of subsequent codon mutation changes in the context of the genetic code.
In the context of pathogen and disease evolution, the mutation matrix can be a valuable tool to quantify mutation probabilities and to enable the design of therapeutics and vaccines that would most effectively target disease epitopes that have the lowest chance of evolutionary escape (Freire 2002). In the context of laboratory evolution of proteins, or protein molecular evolution (Patten et al. 1997; Lutz and Benkovic 2000; Petrounia and Arnold 2000), knowing the tendencies of codons to mutate synonymously, conservatively, or nonconservatively would be helpful in experiment design.
Methods
The Codon Mutation Matrix
As an approximation, it is initially assumed that each base in a codon mutates independently. This allows the 64 × 64 codon mutation matrix to be constructed from the 4 × 4 base mutation matrix. In particular,
where i is the number of the codon that will be mutated and j is the number of the codon that results after the mutation, with 1 ≤ i, j ≤ 64. The codon is denoted i 1 i 2 i 3, where i 1 is the first base in codon i, i 2 is the second base, and i 3 is the third base, with 1 ≤ i 1, i 2, i 3 ≤ 4. Similarly, j 1, j 2, j 3 is the base triplet for codon j. The probability of a mutation from codon i to codon j in one round of replication is given by the codon mutation matrix T ij . In this mathematical representation, the probability per round of no mutation is given by T ii . Since either a mutation occurs or no mutation occurs, this matrix satisfies the constraint
The probability per round of a mutation from one base to another is given by the base substitution matrix t. The base mutation matrix also satisfies conservation of probability \( \sum\nolimits_{j_1 }^4 { = \;1\;t_{i_1 \,j_1 } = } \;1\).
This definition leads to what is known mathematically as a discrete-time Markov process. The base mutation matrix t can be constructed from information about the mutation frequency for the four bases, A, C, G, and T. The nondiagonal elements of t are derived from the 12 different independent rates of mutation. Typically the nondiagonal elements are small, since the rate of mutation is of the order of 10−2–10−6 per base per replication. The diagonal elements of the base mutation matrix are computed from the conservation of probability constraint. The 64 × 64 codon mutation matrix is then constructed from the 4 × 4 base mutation matrix by Eq. (1). Each element of the 64 × 64 matrix thus gives the probability per round of one codon mutating to another codon. One round, or codon mutation step, can include zero, one, two, or three simultaneous base mutations.
The assumption that DNA bases mutate independently can be refined in the presence of additional experimental data. It is known, for example, that flanking bases affect the base mutation rate in the hypervariable region of mouse antibodies (Smith et al. 1996). Overall mutation rates have been measured for base triplets, and this information can be used to refine the codon mutation matrix. If ω i is the observed mutation rate for codon i, the improved codon mutation matrix T′ is defined as
where z is a constant chosen so that the average mutation rate of the codons remains unchanged by this operation:z = ∑ i≠j ω i T ij / ∑ i≠j T ij . Alternatively, the assumption of equal transition-to-transversion and synonymous-to-nonsynonymous mutation rates may be used to generate a refined codon mutation matrix (Goldman and Yang 1994), although this is not done in the present work.
The codon mutation matrix differs from organism to organism and is constructed here for several specific systems. Since comparative trends are of interest, the overall average base mutation rate is set to be the same in all species, 2 × 10−5 per replication. A different average mutation rate for each species would simply adjust the overall scale of the codon mutation matrix. In each case, the base mutation matrix is first constructed from available data, and then Eq. (1) is used to construct the full codon mutation matrix.
The 64 × 64 codon mutation matrix contains a total of 4096 elements, each element calculated from Eq. (1) or Eq. (3). For each codon a synonymous, conservative, and nonconservative mutability is defined. The synonymous mutability, for example, is the sum of all of the elements of the codon mutation matrix that change a codon by a synonymous mutation. Similarly, the conservative mutability is the sum of all of the elements of the codon mutation matrix that change a codon by a conservative mutation. A conservative mutation occurs when a codon mutates to a codon that codes for a different amino acid that is, however, similar to the amino acid originally encoded. Amino acids are similar if they are in the same group, and there are seven groups: neutral and polar, positive and polar, negative and polar, nonpolar with ring, nonpolar without ring, cysteine, and stop. Substitutions that change the amino acid to a different group are defined as nonconservative, and substitutions that retain the encoded amino acid are defined as synonymous. Finally, the nonconservative mutability is the sum of all of the elements of the codon mutation matrix that change a codon by a nonconservative mutation. These three mutability values express the probability that a specific codon will mutate synonymously, conservatively, or nonconservatively in one round of replication.
Systems Studied
The mutation frequencies of the Taq polymerase in error-prone PCR are available and can be extracted (Moore and Maranas 2000). In the context of protein molecular evolution (Patten et al. 1997; Lutz and Benkovic 2000; Petrounia and Arnold 2000), understanding the mutational process in error-prone PCR is especially important. The base mutation matrix for this, and the other systems, is available in the Supplementary Information. The three mutability values for each codon for this system are shown in Fig. 1A.
The codon mutation matrix is also constructed for mutations in the intronic V regions of mouse antibodies (Smith et al. 1996). Equation (3) is used to account for the effect of flanking bases in the mutation process, using JH/Jκ intronic data (Shapiro et al. 1999). The mutability values for this system are shown in Fig. 1B.
The data from non-long terminal repeat retrotransposable elements are used to construct the 4 × 4 base mutation matrix for Drosophila (Petrov and Hartl 1999). Only the data from the terminal branches, representing “dead-on-arrival,” nonfunctional copies that are unconstrained by selection were used. These copies evolve as pseudogenes.
The last system for which a codon mutation matrix is constructed is mitochondrial DNA from Haemonchus contortus (Blouin et al. 1998). This is a nematode in the same subclass Rhabditia as Caenorhabditis elegans. Coding regions of mtDNA were used to allow for comparison with codon usage data available in the literature. The base mutation matrix obtained from these data is treated as applicable to nuclear DNA, and so the standard genetic code is used. While use of intronic data from C. elegans would be preferable, such data are difficult to collect due to the extensive divergence between C. elegans and its near relative, C. briggsae (T. Blumenthal, personal communication, 2001). The mutation rate data estimated by the mtDNA mutation rates does not play an essential role in the analysis.
The No-Bias Codon Mutabilities
We are looking for biases in the underlying mutation rates of the replication machinery, not for biases in the genetic code itself. The genetic code biases—that hydrophobic residues tend to mutate to hydrophobic residues and that hydrophilic residues tend to mutate to hydrophilic residues—are well known (Woese 1965; Epstein 1966; Goldberg and Wittes 1966; Fitch 1966; Volkenstein 1994). To investigate biases other than those induced by the genetic code, a refinement to the codon mutability plots is made. This refinement subtracts from each mutability a value termed as the “no-bias” value. The no-bias value comes from a 64 × 64 matrix that is created by using a 4 × 4 matrix where each nondiagonal term has equal mutation frequencies, e.g., equal transition and transversion rates. In other words, the no-bias plots indicate which empirically derived mutabilities are above or below those expected if all base substitutions were equally likely. This matrix serves as a baseline for unbiased mutation rates within the context of the genetic code. This no-bias transformation is not a correction: it is a refined way to do the analysis. The overall mutation rate of the no-bias codon mutation matrix is made to be same as that of the original codon mutation matrix. Synonymous, conservative, and nonconservative mutabilities are calculated from this baseline 64 × 64 matrix and subtracted from the original mutabilities (Fig. 2).
Results
Modulation of Codon Mutation Rates
Error-prone PCR, while not a pure biological system, is a central tool and serves as an excellent example of the power of our approach. Figure 2A immediately reveals that for error-prone PCR, the codons that code for polar amino acids have low relative conservative and nonconservative mutabilities. That is, these mutabilities are much lower than what would be expected under unbiased conditions. For the codons that code for the nonpolar amino acids, on the other hand, a different pattern is observed. In this case, the conservative and nonconservative mutabilities are higher than the baseline values generated from equal mutation rates. Note that because of the factorization in Eq. (1), our theory describes the biasing effect of base mutations, and the “reading frame” of Taq does not matter. In Figs. 1A and 2A we are showing the effect of these biased base mutations when the ribosome reads the exons in frame.
To study the possible effects of mutability modulation in a natural population undergoing rapid, active evolution, the mouse V regions are examined with the 64 × 64 mutation matrix approach. Interestingly, higher conservative and nonconservative mutabilities are observed for the polar amino acids compared to the nonpolar amino acids (Fig. 2B). We quantify the statistical significance of these results by computing the probability per round that a random base mutation matrix would lead to a ratio of mutation rates between the polar groups and the nonpolar groups that is as great as or greater than that observed. That is, we take the ratio of the sum of the conservative and nonconservative mutabilities from Fig. 1 for these two groups. The probability by chance that this ratio is as large as or larger than that in Fig. 1B is 8.6%, From his extremely conservative statistic, it can be concluded that the pattern of increased mutability of polar amino acids is statistically significant to the level of 91%. We also perform this same calculation using another, independent estimate of the base mutation matrix for mouse V regions (Neuberger and Milstein 1995). The probability by chance that the ratio of conservative and nonconservative mutabilities for a random matrix is larger than that given by this new matrix is 5.3%. This result is, thus, significant to the level of 95%. It is interesting to note that if one assumed the experimentally measured base mutation matrices were random, i.e., dominated by experimental noise, the probability that two random such matrices would give a ratio as large as or larger than that observed in Fig. 1B is 0.0862 = 0.7%.
It is difficult to measure experimentally exact mutation rates. Thus, the sensitivity of the codon mutation matrix to changes in the base substitution rates is of interest. In order to test the robustness of our findings for Tag to experimental noise, a random number is added or subtracted from each of the 12 off-diagonal, independent values in the 4 × 4 base mutation matrix. This random number is generated from a Gaussian distribution with zero mean and a standard deviation that is equal to a given percentage of the average mutation rate. This procedure generates a new 4 × 4 base mutation matrix, from which a new 64 × 64 codon mutation matrix is calculated. To determine if the mutability bias patterns found in Fig. 2A is perturbed by the addition of noise, codon mutability plots are created with the new codon mutation matrix. This plot displays the pattern observed in Fig. 2A until the noise overwhelms the signal. The pattern from Fig. 2A is still evident up to noise levels of 50% of the average mutation rate, disappearing only when the noise reaches 60% (Tan 2002). An analogous calculation was performed for the mouse V region system, and again the pattern in Fig. 2B persisted up to noise levels of 50% of the average mutation rate, disappearing only when the noise reaches 60% (Tan 2002). Thus, the observed trends in Fig. 2 are rather robust to the presence of experimental noise.
One might wonder whether this pattern of increased nonsynonymous mutabilities of charged residues would survive in other mouse or mammalian genes. Figure 3 shows the no-bias codon mutability plot derived from non-immune-system gene mutation rates from human B cells. Data are from Shen et al. (2000). As expected, there is no overall pattern. A quantitative comparison to the polar-to-nonpolar ratio of conservative and nonconservative mutabilities calculated for Fig. 1B shows that in this case the probability that a random base mutation matrix has a value higher than that observed in Fig. 3 is 25%. Thus, the increase in the nonsynonymous mutability of the mammalian, immunoglubulin V region in Fig. 1B is unique and statistically significant.
Further analysis of the codon mutation matrices was done by combining mutability information with codon usage information. Codon usage is necessary to determine via the mutation matrix the average rate of mutation of a gene, since the total rate of mutation depends both on the mutation rate per codon and on which codons are present. By summing the product of the RSCU value (Sharp et al. 1986) and the synonymous mutability for all the codons that code for a given amino acid, the synonymous mutability of amino acid α is calculated:
where the synonymous mutability of codon i is taken from Fig. 1A,B, and the codon usage p α i is taken from the experimental RSCU values (Duret and Mouchiroud 1999). The synonymous mutability of amino acids is observed to be higher in the short genes than in the long genes for the nematode (Fig. 4). Indeed, of the amino acids, only arginine has a demonstrably lower synonymous mutability for the short genes, as shown in Fig. 4. We calculate the probability that the observed increase in synonymous mutability is due to chance. The probability of 17 or more of 18 amino acids showing this trend by chance is \( [({18\atop18})+({18\atop17})]2^{-18}=7.2\times10^{-5} \)
. Making the same plot for the nematode, one observes the pattern to be even more striking (Tan 2002; Blouin et al. 1998) (data not shown). Indeed, of the amino acids, only proline has a demonstrably lower synonymous mutability for the short genes, and only two other amino acids have roughly the same synonymous mutability in short and long genes. The probability of 15 or more of 16 amino acids showing this trend by chance is \( [({16\atop10})+({16\atop15})]2^{-16}=2.6\times10^{-4} \). While there are selective pressures on synonymous codon usage, such as preference for tRNAs at different levels of abundance, it seems unlikely that there would be a selection on the quantity synonymous mutation rate, in and of itself, that is significant enough to cause the observed correlation. In other words, there are known to be selective pressures on codon usage. What is not clear is why there should be selective pressure on synonymous mutation rate itself. There is selection pressure on the ability to adapt, however. In order for short genes to evolve at an overall rate comparable to that of long genes, the mutation rate per base would have to be higher in short genes. If one assumes that on average there are a certain number of mutations needed to effect functional adaptation of a protein, and that short proteins and long proteins need to evolve at roughly similar rates, this then implies that short proteins need a higher per base rate of evolution than long proteins—because they are shorter, and the evolution rate of a gene is the evolution rate per base times the number of bases. Thus, the evolution rate per base must be higher for shorter proteins. In contrast to Fig. 4, however, a correlation between conservative or nonconservative mutation rate and gene length was not observed for either Drosophila or the nematode (data not shown).
Modulation of Recombination Rates
An alternative means of evolution is recombination, and recombination rates are known to be correlated with codon usage bias (Comeron et al. 1999). Selection pressure on short genes for greater evolvability could favor a higher recombination rate per base, thus allowing short genes to evolve at a rate comparable to that of long genes. It would be unfavorable if evolution for higher recombination rates led to lower conservative or nonconservative mutation rates. C+G content is known to be a rough measure of recombination rate (Eyre-Walker 1993; Comeron and Kreitman 2000; Duret et al. 2000; Birdsell 2002). In other words, the correlation between C+G content and recombination rate is strong enough that C+G content is now felt to be a useful maker of local recombination rate (Fullerton et al. 2001; Birdsell 2002). Interestingly, we find that C+G is positively correlated with all three mutation rates and is most highly correlated with synonymous mutation rate. Moreover, as Fig. 5A shows, the codon usage of short genes is such that a higher per base rate of estimated recombination is favored. The recombination rate of amino acid α is estimated by
where the codon usage p α i is taken from the experimental RSCU values (Duret and Mouchiroud 1999). In Fig. 5A, only one exception, for proline, is found to the general pattern. As Fig. 5B shows, a similar correlation between codon usage and enhanced estimated recombination frequency is also observed in Drosophila. No exceptions to the general pattern are found in Fig. 5B. Finally, Fig. 5C shows the estimated recombination rate for A. thaliana. In Fig. 5C, only one exception, for glycine, is found to the general pattern. Considering all three species, the probability of 52 or more of 54 amino acids showing this trend by chance is \( [({54\atop 52}) +({54\atop52})\,+ ({54\atop54})]2^{-54}= 8.2\times10^{-14} \). The pattern is, thus, highly statistically significant. One explanation for the observed codon usage of short, high-expression genes is selective pressure on crossover frequency. On a long time scale, other factors such as neutral evolution and rearrangements become important, and this is likely the reason for the relatively modest shifts in the codon usage observed in Fig. 5.
Figure 6A shows the measured recombination rate versus protein length for genes in Drosophila at high expression levels (Hey and Kliman 2002) (EST > 50). In this species, codon bias is observable for genes at all recombination levels. The correlation between codon bias and recombination rate is seen, however, only when the latter is low rates (Hey and Kliman 2002; Marais and Piganeau 2002). Figure 6 is, therefore, made only for recombination rates less than 1 cM/Mb. A negative correlation between recombination rate and protein length is observed. In Fig. 6C, the measured recombination rate versus protein length is shown for C. elegans for genes at high expression levels (Marais and Piganeau 2002). A clear negative correlation between recombination rate and gene length is again observed.
Discussion
Selective Pressures on Codon Mutation Rates
It was found that for the Taq polymerase, nonpolar amino acids are mutated at an elevated rate. Nonpolar amino acids are more frequently present in the interior cores of proteins, and mutations of these amino acids more often lead to dramatic rearrangements of the protein structure. The pattern in the error-prone PCR mutation plot suggests that the mutations that occur will tend to cause larger changes in the structure of the encoded protein (Fig. 7). It is becoming more accepted that large mutation events such as transpositions, horizontal transfers, gene exchange, and nonconservative mutations are necessary for dramatic evolution. This was shown quantitatively in Bogarad and Deem (1999). Nonconservative mutations in the core of the protein would be one of the most dramatic amino acid substitution moves possible and can be considered to search the protein sequence space most broadly. In other words, under error-prone conditions, the Taq polymerase favors codons for the nonpolar amino acids that mutate nonconservatively. This property of error-prone PCR greatly enhances the ability of this method to improve protein function effectively by forcing the search of greater regions of tertiary fold space. Moreover, the average mutational tendencies of Taq can be modulated by codon usage. Table 1 defines codons by their tendencies to evolve under error-prone conditions. These data can be useful in the design of protein evolution experiments, especially when trying to evolve new motifs ab initio.
It was found that for V regions of mouse antibodies there is an increase in the mutation rate of the charge amino acids. These trends are not sensitive to whether Eq. (1) or Eq. (3) is used to model the mutation matrix or whether the mutation data are taken from (Smith et al. 1996; Shapiro et al. 1999) or from (Neuberger and Milstein 1995). Antibody V regions undergo DNA swapping of gene fragments in order to create the primary repertoire needed to develop resistance to disease. Therefore, base mutations that alter the framework of the proteins become less necessary. More significant are mutations that lead to a greater binding affinity. In protein–protein complexes, a positive correlation is observed between the binding affinity and the number of ionic interactions spanning an interface (Sheinerman et al. 2000; Xu et al. 1997). Thus, for the polar amino acids participating in binding, high conservative and nonconservative mutabilities would be most favorable, since such characteristics would enable more efficient searching of sequence space to optimize binding.
Selective Pressures on Recombination Rates
Previously, a correlation between codon usage bias and gene length had been observed in the species considered here (Duret and Mouchiroud 1999). Several mechanisms that might explain the increased codon bias in short genes were considered, including biased tRNA levels, but all predicted increased bias for longer genes, in contrast to the greater observed bias for shorter genes (Duret and Mouchiroud 1999). We suggest that codon usage in short genes in these species has evolved due to selection for increased recombination frequency (Fig. 6). This mechanism is consistent with previously observed positive correlations between recombination rate and codon usage bias and with previously observed negative correlations between gene length and codon usage bias (Comeron et al. 1999; Comeron and Kreitman 2000). The observed correlation between codon usage and synonymous mutation rate (Fig. 4) may be a by-product of selection on recombination rate, as synonymous mutation rate is positively correlated with C+G content (R = 0.62 for Drosophila, and R = 0.51 for the nematode).
In Duret and Mouchiroud (1999), the codon usage bias was highest for those genes at high expression levels, and Fig. 5 is based on those data. In fact, the expression level was estimated in Duret and Mouchiroud (1999) from the frequency at which those genes were observed in the EST database. It is possible that certain genes may be overrepresented in the EST database, in a way that is correlated with the gene length. If this unknown bias were the cause of the correlation in Fig. 5, then the opposite or no correlation would be expected to be observed for genes at low expression. In fact (data not shown), the same patterns observed in Fig. 5A are observed when codon usage for the genes at low (bottom one-third of genes with nonzero EST abundance) rather than high (top one-third of genes with nonzero EST abundance) expression levels are used: Among the 54 amino acids, only 3 have lower estimated recombination rates for the short genes at low expression levels than for the long genes at low expression levels.
It might be argued that to be fully consistent with our theory, the relevant recombination rate is that of the whole gene, divided by the coding length of the gene. This quantity is slightly different from the quantity plotted in Fig. 6A–C because the intron-to-exon composition of genes could vary systematically with length. This concern has been addressed in Fig. 6B, where recombination rate times gene length divided by coding length has been plotted. The same negative correlation between recombination rate and gene length is again observed.
For our explanation to be consistent, it must be the case that Drosophila and C. elegans are, in some sense, mutationally starved. The very existence of the Hill–Robertson effect in these species (Marais and Piganeau 2002) implies that this is the case, because it implies that point mutation is insufficient to evolve linked genes and that recombination is necessary to break the linkage. The existence of related effects, such as interference selection (Comeron and Kreitman 2002), provides additional evidence for the same reasons. Finally, the fact that codon bias is observable only for genes at low recombination rates in Drosophila, less than 1 (Marais and Piganeau 2002) or 1.5 (Hey and Kliman 2002) cM/Mb, provides additional indirect evidence that the selective pressure to increase evolution rates is strongest where evolution is the slowest.
Conclusion
Previous treatments of the evolutionary biology of codon usage have largely ignored the possibility that codon usage could affect mutation or recombination rates and have primarily focused on using codon usage as a measure of selection. We suggest here that not only can codon usage affect mutation and recombination rates but also codon usage has been selected to enhance functional gene adaptation within the context of the genetic code. This line of reasoning is in accord with strategies for optimized design of experimental protein molecular evolution protocols, where speed of evolution is an explicit goal (Bogarad and Deem 1999; Moore and Maranas 2002).
In nature there are numerous examples of exploiting codon potentials in ongoing evolutionary processes. In the V regions of encoded antibodies, high-potential serine codons such as AGC are found predominantly in the encoded CDR loops, while the encoded frameworks contain low-potential serine codons such as TCT (Wagner et al. 1995). Unfortunately, antibodies and drugs are often no match for the hydrophilic, high-potential codons of “error-prone” pathogens. The dramatic mutability of the HIV gp120 coat protein is one such example. One can envision a scheme for using codon potentials to target disease epitopes that mutate rarely (i.e., low-potential) and unproductively (i.e., become stop, low-potential, or structure-breaking codons). Such a therapeutic scheme should be generally useful against diseases that use error-prone replication to escape therapeutic treatments or vaccines.
References
L Altenberg (1994) The evolution of evolvability in genetic programming KE Kinnear (Eds) Advances in genetic programming MIT Press Cambridge, MA 47–74
JA Birdsell (2002) ArticleTitleIntegrating genomics, bioinformatics, and classical genetics to study to effects of recombination on genome evolution Mol Biol Evol 19 1181–1197 Occurrence Handle1:CAS:528:DC%2BD38Xlt1aktb4%3D Occurrence Handle12082137
MS Blouin CA Yowell CH Courtney JB Dame (1998) ArticleTitleSubstitution bias, rapid saturation, and the use of mtDNA for nematode systematics Mol Biol Evol 15 1719–1727 Occurrence Handle1:CAS:528:DyaK1MXjtVyg Occurrence Handle9866206
LD Bogarad MW Deem (1999) ArticleTitleA hierarchical approach to protein molecular evolution Proc Natl Acad Sci USA 96 2591–2595 Occurrence Handle10.1073/pnas.96.6.2591 Occurrence Handle1:CAS:528:DyaK1MXhvFyltL8%3D Occurrence Handle10077554
HJ Bull MJ Lombardo SM Rosenberg (2001) ArticleTitleStationary-phase mutation in the bacterial chromosome: Recombination protein and DNA polymerase IV dependence Proc Natl Acad Sci USA 98 8334–8341 Occurrence Handle10.1073/pnas.151009798 Occurrence Handle1:CAS:528:DC%2BD3MXls1Wisro%3D Occurrence Handle11459972
JM Comeron M Kreitman (2000) ArticleTitleThe correlation between intron length and recombination in Drosophila: Dynamic equilibrium between mutational and selective forces Genetics 156 1175–1190 Occurrence Handle1:CAS:528:DC%2BD3cXosFSnsbo%3D Occurrence Handle11063693
JM Comeron M Kreitman (2002) ArticleTitlePopulation, evolutionary and genomic consequences of interference selection Genetics 161 389–410 Occurrence Handle1:CAS:528:DC%2BD38XkslChu7o%3D Occurrence Handle12019253
JM Comeron M Kreitman M Aguade (1999) ArticleTitleNatural selection on synonymous sites is correlated with gene length and recombination in Drosophila Genetics 151 239–249 Occurrence Handle1:CAS:528:DyaK1MXovVCgsA%3D%3D Occurrence Handle9872963
Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. In: Atlas of protein sequence and structure. National Biomedical Research Foundation, Vol 5, pp 345–352
R Durbin S Eddy A Krogh G Mitchion (1998) Biological sequence analysis Cambridge Unirversity Press Cambridge, UK
L Duret D Mouchiroud (1999) ArticleTitleExpression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila and Arabidopsis Proc Natl Acad Sci USA 96 4482–4487 Occurrence Handle10.1073/pnas.96.8.4482 Occurrence Handle1:CAS:528:DyaK1MXjs1yls7Y%3D Occurrence Handle10200288
L Duret G Marais C Biémont (2000) ArticleTitleTransposons but not retrotransposons are located preferentially in regions of high recombination rate in C. elegans Genetics 156 1661–1669 Occurrence Handle1:CAS:528:DC%2BD3MXjsFGitg%3D%3D Occurrence Handle11102365
CJ Epstein (1966) ArticleTitleRole of the amino acid ‘code’ and of selection for conformation in the evolution of proteins Nature 210 25–28 Occurrence Handle1:CAS:528:DyaF28XpsVOksg%3D%3D Occurrence Handle5956344
A Eyre-Walker (1993) ArticleTitleRecombination and mammalian genome evolution Proc R Soc Lond Ser B Biol Sci 252 237–243 Occurrence Handle1:CAS:528:DyaK2cXht1Khs7w%3D
WM Fitch (1966) ArticleTitleThe relation between frequencies of amino acids and ordered trinucleotides J Mol Biol 16 1–16 Occurrence Handle1:CAS:528:DyaF28Xns12ksA%3D%3D Occurrence Handle5296819
E Freire (2002) ArticleTitleDesigning drugs against heterogeneous targets Nat Biotechnol 20 15–16 Occurrence Handle10.1038/nbt0102-15 Occurrence Handle1:CAS:528:DC%2BD38Xis12gsA%3D%3D Occurrence Handle11753347
EC Friedberg WJ Feaver VL Gerlach (2000) ArticleTitleThe many faces of DNA polymeraces: Strategies for mutagenesis and for mutational avoidance Proc Natl Acad Sci USA 97 5681–5683 Occurrence Handle10.1073/pnas.120152397 Occurrence Handle1:CAS:528:DC%2BD3cXjvFaksbs%3D Occurrence Handle10811923
SM Fullerton AB Carvalho AG Clark (2001) ArticleTitleLocal rates of recombination are positively correlated with GC content in the human genome Mol Biol Evol 18 1139–1142 Occurrence Handle1:CAS:528:DC%2BD3MXktFemu7w%3D Occurrence Handle11371603
AL Goldberg R Wittes (1966) ArticleTitleGenetic code: Aspects of organization Science 153 420–422 Occurrence Handle1:CAS:528:DyaF28XksVyqs7o%3D Occurrence Handle5328568
N Goldman Z Yang (1994) ArticleTitleA codon-based model of nucleotide substitution for protein-coding DNA sequences Mol Biol Evol 17 32–43
JG Henikoff S Henikoff (1992) ArticleTitleAmino acid substitution matrices from protein blocks Proc Natl Acad Sci USA 89 10915–10919 Occurrence Handle1:CAS:528:DyaK3sXjsFCgsQ%3D%3D Occurrence Handle1438297
J Hey RM Kliman (2002) ArticleTitleInteractions between natural selection, recombination, and gene density in the genes of Drosophila Genetics 160 595–608 Occurrence Handle1:CAS:528:DC%2BD38XitlWrs7w%3D Occurrence Handle11861564
TB Kepler (1997) ArticleTitleCodon bias and plasticity in immunoglobulins Mol Biol Evol 14 637–643 Occurrence Handle1:CAS:528:DyaK2sXjsVGhsLk%3D Occurrence Handle9190065
RH Lathrop MJ Pazzani (1999) ArticleTitleCombinatorial optimization in rapidly mutating drug-resistant viruses J Comb Optim 3 301–320 Occurrence Handle10.1023/A:1009846028730
JG Lawrence (1997) ArticleTitleSelfish operons and speciation by gene transfer Trends Microbiol 5 355–359 Occurrence Handle10.1016/S0966-842X(97)01110-4 Occurrence Handle1:STN:280:ByiH2c%2Fmslc%3D Occurrence Handle9294891
W Li C Wu C Luo (1985) ArticleTitleA new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes Mol Biol Evol 2 150–174 Occurrence Handle3916709
S Lutz SJ Benkovic (2000) ArticleTitleHomology-independent protein engineering Curr Opin Biotechnol 11 319–324 Occurrence Handle10.1016/S0958-1669(00)00106-3 Occurrence Handle1:CAS:528:DC%2BD3cXmtV2ltrc%3D Occurrence Handle10975450
G Marais G Piganeau (2002) ArticleTitleHill-Robertson interference is a minor determinant of variations in codon bias across Drosophilia Melanogaster and Caenorhabditis elegans genomes Mol Biol Evol 19 1399–1406 Occurrence Handle1:CAS:528:DC%2BD38XntVyhtrc%3D Occurrence Handle12200468
GL Moore CD Maranas (2000) ArticleTitleModeling DNA mutation and recombination for directed evolution experiments J Theor Biol 205 483–503 Occurrence Handle10.1006/jtbi.2000.2082 Occurrence Handle1:CAS:528:DC%2BD3cXksFeqt70%3D Occurrence Handle10882567
GL Moore CD Maranas (2002) ArticleTitleeCodonOpt: A systematic computational framework for optimizing codon usage in directed evolution experiments Nucleic Acids Res 30 2407–2416 Occurrence Handle10.1093/nar/30.11.2407 Occurrence Handle1:CAS:528:DC%2BD38XkvV2hsrk%3D Occurrence Handle12034828
MS Neuberger C Milstein (1995) ArticleTitleSomatic hypermutation Curr Opin Immunol 7 248–254 Occurrence Handle10.1016/0952-7915(95)80010-7 Occurrence Handle1:CAS:528:DyaK2MXls1agtL4%3D Occurrence Handle7546385
C Ofria C Adami TC Collier GK Hsu (1999) ArticleTitleEvolution of differentiated expression patterns in digital organisms Adv Artific Life 1674 129–138
PH Patel H Kawate E Adman M Ashbach LA Loeb (2001) ArticleTitleA single highly mutable catalytic site amino acid is critical for DNA polymerase fidelity J Biol Chem 276 5044–5051 Occurrence Handle10.1074/jbc.M008701200 Occurrence Handle1:CAS:528:DC%2BD3MXhtlOqu7w%3D Occurrence Handle11069916
PA Patten RJ Howard WPC Stemmer (1997) ArticleTitleApplications of DNA shuffling to pharmaceuticals and vaccines Curr Opin Biotech 8 724–733 Occurrence Handle10.1016/S0958-1669(97)80127-9 Occurrence Handle1:CAS:528:DyaK2sXotVOgsbc%3D Occurrence Handle9425664
E Pennisi (1998) ArticleTitleMolecular evolution—How the genome readies itself for evolution Science 281 1131–1134 Occurrence Handle10.1126/science.281.5380.1131 Occurrence Handle1:STN:280:DyaK1cvgtVWqtQ%3D%3D Occurrence Handle9735027
JW Peper (2003) ArticleTitleThe evolution of evolvability in genetic linkage patterns Biosystems 69 115–126 Occurrence Handle10.1016/S0303-2647(02)00134-X Occurrence Handle12689725
IP Petrounia FH Arnold (2000) ArticleTitleDesigned evolution of enzymatic properties Curr Opin Biotechnol 11 325–330 Occurrence Handle10.1016/S0958-1669(00)00107-5 Occurrence Handle1:CAS:528:DC%2BD3cXmtV2lt74%3D Occurrence Handle10975451
DA Petrov DL Hartl (1999) ArticleTitlePatterns of nucleotide substitution in Drosophila and mammalian genomes Proc Natl Acad Sci USA 96 1475–1479 Occurrence Handle10.1073/pnas.96.4.1475 Occurrence Handle1:CAS:528:DyaK1MXhsFSru70%3D Occurrence Handle9990048
JB Plotkin J Doshoff (2003) ArticleTitleCodon bias and frequency-dependent selection on the hemagglutinin epitopes of influenza A virus Proc Natl Acad Sci USA 100 7152–7157 Occurrence Handle10.1073/pnas.1132114100 Occurrence Handle1:CAS:528:DC%2BD3sXkslOnu7o%3D Occurrence Handle12748378
GS Shapiro K Aviszus D Ikle LJ Wysocki (1999) ArticleTitlePredicting regional mutability in antibody V genes based solely on di- and trinucleotide sequence composition J Immunol 163 259–268 Occurrence Handle1:CAS:528:DyaK1MXktF2ru7c%3D Occurrence Handle10384124
PM Sharp TMF Tuohy KR Mosurski (1986) ArticleTitleCodon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes Nucleic Acids Res 14 5125–5143 Occurrence Handle1:CAS:528:DyaL28Xlt1yksbc%3D Occurrence Handle3526280
FB Sheinerman R Norel B Honig (2000) ArticleTitleElectrostatic aspects of protein-protein interactions Curr Opin Struct Biol 10 153–159 Occurrence Handle10.1016/S0959-440X(00)00065-8 Occurrence Handle1:CAS:528:DC%2BD3cXivFWgtr0%3D Occurrence Handle10753808
HM Shen N Michael N Kim U Storb (2000) ArticleTitleThe TATA binding protein, c-Myc and survivin genes are not somatically hypermutated, while Ig and BCL6 genes are hypermutated in human memory B cells Int Immun 12 1085–1093 Occurrence Handle10.1093/intimm/12.7.1085 Occurrence Handle1:CAS:528:DC%2BD3cXls12jtLk%3D Occurrence Handle10882420
DS Smith G Creadon PK Jena JP Portanova BL Kotzin LJ Wysocki (1996) ArticleTitleDi- and trinucleotide target preference of somatic mutagenesis in normal and autoreactive B cells J Immunol 156 2642–2652 Occurrence Handle1:CAS:528:DyaK28XhvVGltb0%3D Occurrence Handle8786330
U Storb (2001) ArticleTitleDNA polymerases in immunity: Profiting from errors Nat Immunol 2 484–485 Occurrence Handle10.1038/88673 Occurrence Handle1:CAS:528:DC%2BD3MXkt1yksr0%3D Occurrence Handle11376332
MD Sutton GC Walker (2001) ArticleTitleManaging DNA polymerases: Coordinating DNA replication, DNA repair, and DNA recombination Proc Natl Acad Sci USA 98 8342–8349 Occurrence Handle10.1073/pnas.111036998 Occurrence Handle1:CAS:528:DC%2BD3MXls1Wisrs%3D Occurrence Handle11459973
T Tan (2002) The codon mutation matrix in the context of protein molecular evolution, PhD dissertation UCLA Los Angeles
K Thearling TS Ray (1997) ArticleTitleEvolving parallel computation Complex Sys 10 229–237
JMJ Travis ER Travis (2002) ArticleTitleMutator dynamics in fluctuating environments Proc R Soc Lond Ser B 269 591–597 Occurrence Handle10.1098/rspb.2001.1902 Occurrence Handle1:STN:280:DC%2BD387osVKhsQ%3D%3D
MV Volkenstein (1994) Physical approaches to biological evolution Springer-Verlag New York
GP Wagner L Altenberg (1996) ArticleTitlePerspective: Complex adaptations and the evolution of evolvability Evolution 50 967–976
SD Wagner C Milstein MS Neuberger (1995) ArticleTitleCodon bias targets mutation Nature 376 732 Occurrence Handle10.1038/376732a0 Occurrence Handle1:CAS:528:DyaK2MXnvFChsrs%3D Occurrence Handle7651532
CR Woese (1965) ArticleTitleOn the evolution of the genetic code Proc Natl Acad Sci USA 54 1546–1552 Occurrence Handle1:CAS:528:DyaF28Xltlagsg%3D%3D Occurrence Handle5218910
D Xu SL Lin R Nussinov (1997) ArticleTitleProtein binding versus protein folding: The role of hydrophobic bridges in protein association J Mol Biol 265 68–84 Occurrence Handle10.1006/jmbi.1996.0712 Occurrence Handle1:CAS:528:DyaK2sXnvVehug%3D%3D Occurrence Handle8995525
Z Yang S Kumar (1996) ArticleTitleApproximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rates among sites Mol Biol Evol 13 650–659 Occurrence Handle1:CAS:528:DyaK28Xis1Sms78%3D Occurrence Handle8676739
Acknowledgments
This research was supported by the National Institutes of Health and the National Science Foundation.
Author information
Authors and Affiliations
Corresponding author
Additional information
Reviewing Editor: Dr. Edward Trifonov
Electronic Supplementary Material
Rights and permissions
About this article
Cite this article
Tan, T., Bogarad, L.D. & Deem, M.W. Modulation of Base-Specific Mutation and Recombination Rates EnablesFunctional Adaptation Within the Context of the Genetic Code. J Mol Evol 59, 385–399 (2004). https://doi.org/10.1007/s00239-004-2633-8
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/s00239-004-2633-8