Abstract
Synonymous codon usage in yeast appears to be influenced by natural selection on gene expression, as well as regional variation in compositional bias. Because of the large number of potential targets of selection (i.e., most of the codons in the genome) and presumed small selection coefficients, codon usage is an excellent model for studying factors that limit the effectiveness of selection. We use factor analysis to identify major trends in codon usage for 5836 genes in Saccharomyces cerevisiae. The primary factor is strongly correlated with gene expression, consistent with the model that a subset of codons allows for more efficient translation. The secondary factor is very strongly correlated with third codon position GC content and probably reflects regional variation in compositional bias. We find that preferred codon usage decreases in the face of three potential limitations on the effectiveness of selection: reduced recombination rate, increased gene length, and reduced intergenic spacing. All three patterns are consistent with the Hill–Robertson effect (reduced effectiveness of selection among linked targets). A reduction in gene expression in closely spaced genes may also reflect selection conflicts due to antagonistic pleiotropy.
The DNA sequence corresponding to the amino acid sequence of a polypeptide is subject to a variety of forces that influence its evolution. Purifying, or negative, selection against mutations that change the amino acid sequence (i.e., replacement mutations) constrains evolution, while directional, or positive, selection can increase the frequency of beneficial mutations. Many mutations, however, are neutral or nearly neutral, and their fate is determined to a greater or lesser extent by stochastic processes. In many organisms, synonymous mutations that do not change the amino acid sequence are thought to be subject to weak selection that can influence a mutation's fate only when the population size is sufficiently large (Ohta 1973).
It has been known for some time that synonymous codon usage varies among genes within organisms. Studies on prokaryotes (Ikemura 1981, 1985; Gouy and Gautier 1982), single-celled eukaryotes (Bennetzen and Hall 1982; Ikemura 1985; Sharp and Devine 1989; Sharp and Cowe 1991; Coghlan and Wolfe 2000), and complex multicellular eukaryotes (Stenico et al. 1994; Duret and Mouchiroud 1999; Hey and Kliman 2002) have indicated a correlation between codon bias and gene expression. The prevailing view is that high levels of protein synthesis associated with high levels of gene expression generate a selective pressure to minimize the energetic cost of translation. In those organisms for which data are available, the codons used in the most biased genes (i.e., those inferred to be the “preferred” codons) tend to correspond to the most abundant tRNAs (Ikemura 1985; Moriyama and Powell 1997; Coghlan and Wolfe 2000), and this does affect translation speed (Berg and Kurland 1997). Other forms of selection on codon usage may include selection for accuracy of translation (Akashi 1994) and selection against certain mRNA secondary structures (Antezana and Kreitman 1999; Bernardi 2000).
While some of the among-gene variance in codon usage can be attributed to variation in selection pressure, regional variation in compositional bias also appears to play an important role. This is especially apparent in warm-blooded vertebrates (Bernardi et al. 1985; Bernardi 2000), where expression-associated selection on codon usage has not been supported. Kliman and Hey (1993, 1994) found a correlation between intron GC content and codon third position GC content (GC3) in Drosophila melanogaster, and it is clear that among-gene variance in expression-associated selection and compositional bias both contribute to variance in codon usage in this species.
For the past decade or so, several research groups have focused on codon bias, not simply to understand its causes, but as a model system for studying limitations in the effectiveness of natural selection. Kliman and Hey (1993) found that codon bias in D. melanogaster was reduced in regions of the genome with limited recombination (see also Comeron et al. 1999); Hey and Kliman (2002) recently confirmed this relationship using ~13,000 genes from the complete genome (Adams et al. 2000). The findings are consistent with the expected reduction in the effectiveness of selection on one target due to stochastic effects of selection on linked targets (Hill and Robertson 1966; Felsenstein 1974; Gillespie 2000; Comeron and Kreitman 2002)—the so-called Hill–Robertson effect. A second prediction, that codon bias would be reduced in closely spaced genes (due to reduced opportunity for recombination among targets), was partially confirmed by Hey and Kliman (2002); that is, the expected pattern was observed when gene spacing fell below ~1500–2000 bp. Another way in which selection conflicts can arise, akin to the Hill–Robertson effect, is if individual genes have more potential targets for selection. Li (1987) showed by computer simulation that longer genes will have lower usage of preferred codons, assuming complete linkage. Comeron et al. (1999) showed by simulation that codon bias would decrease with increasing gene length and, also, with decreasing recombination rate. McVean and Charlesworth (2000) also demonstrated a reduction in codon bias with decreasing recombination rate. Most recently, Comeron and Kreitman (2002) showed that weak selection (i.e., “interference selection”) on multiple targets can reduce codon bias and may even explain some of the codon usage patterns observed within Drosophila genes (Kliman and Eyre-Walker 1998).
In the present study, we use factor analysis to identify the major trends in codon usage in the S. cerevisiae genome. Using genomewide data on gene expression levels (DeRisi et al. 1997), we confirm that variance in codon usage is due, in part, to variance in expression levels, consistent with the model that certain preferred codons allow for more efficient translation of mRNA (see also Coghlan and Wolfe 2000). We then show that preferred codon usage covaries with recombination rate (Gerton et al. 2000) and gene length as predicted by selection conflict models. Finally, we find that, as in the case of D. melanogaster, preferred codon usage in yeast decreases as gene density increases. We propose that the latter is due to two forms of selection conflict: conflict due to Hill–Robertson effects and conflict due to antagonistic pleiotropy.
Materials and Methods
Yeast DNA Sequences
The 28 April 2002 annotations of the 16 completely sequenced yeast chromosomes (Goffeau et al. 1996) were downloaded from NCBI. A locus was initially selected for analysis if it met the following three criteria: (i) its start and stop codons were located at the positions indicated in the annotation, and there were no premature stop codons; (ii) if the region between the start and the stop codons overlapped the corresponding region of another locus, a third locus did not overlap either of the first two; and (iii) the locus was not the first or last listed, by location, for a chromosome. The latter were excluded from analysis because we cannot measure the distance to flanking genes on both sides for terminal loci (see gene spacing). A total of 6149 loci met all three criteria.
Measures of Gene Length, Gene Spacing, and Local Base Composition
Gene length was defined as the number of codons. As a measure of gene density, we used SBG (space between genes), previously defined by Hey and Kliman (2002). SBG is the mean of the distance, in base pairs, between the terminal bases of the coding sequence of the locus in question and the terminal bases of the coding sequence of the nearest loci to either side. SBG was calculated for all loci, regardless of whether they would be used later in the study.
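The SBG definition above can be sketched in a few lines. This is only an illustration: the function name, the argument names, and the 1-based inclusive coordinate convention are assumptions, and the treatment of overlapping genes (a negative gap) is a choice made here rather than one stated in the text.

```python
def space_between_genes(cds_start, cds_end, left_neighbor_end, right_neighbor_start):
    """Mean distance (bp) from a coding sequence to its nearest neighbor on each side.

    All arguments are positions on the same chromosome. The coordinate
    convention (1-based, inclusive) is an assumption made for this sketch.
    An overlapping neighbor yields a negative gap, which is left unclamped.
    """
    left_gap = cds_start - left_neighbor_end       # terminal base to terminal base
    right_gap = right_neighbor_start - cds_end
    return (left_gap + right_gap) / 2


# Hypothetical locus spanning 1000..2000 with neighbors ending at 700 and starting at 2500
sbg = space_between_genes(1000, 2000, 700, 2500)
```

Terminal loci have only one neighbor, which is why they cannot be scored this way.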
In many organisms, intron GC content can be used as a measure of local base composition. However, only 3–4% of yeast genes have introns. Instead, we use the GC content of 1000-bp regions flanking the terminal codons (GC_fl) on both sides of the gene. If either segment of flanking region extended into the coding region of an adjacent gene, we truncated the analyzed region to include only noncoding bases. On average, we analyzed 479 upstream flanking bases and 373 downstream flanking bases (with median values of 416 and 296 bases, respectively). Since noncoding regions include positions subject to functional constraint (e.g., transcription factor binding sites), it was possible that truncation would bias GC_fl. However, we found no correlation between the number of noncoding bases analyzed and GC_fl (r = −0.0173, p = 0.185). Regardless, some analyses were repeated using only flanking regions of 200 bases or more. [All statistical analyses presented here were performed using Statistica 6.0 (StatSoft, Inc. 2001).]
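A minimal sketch of the flanking GC calculation follows. It assumes the caller has already truncated each flanking segment at the coding region of the adjacent gene, and it pools the bases from both sides before computing the GC fraction; the text does not say whether the two sides were pooled or averaged, so that choice, like the function name, is an assumption.

```python
def flanking_gc(upstream, downstream, max_len=1000):
    """GC fraction of the noncoding bases flanking a gene (pooled over both sides).

    `upstream` and `downstream` are noncoding sequences, written 5' to 3' on
    the coding strand, already truncated at any adjacent coding region.
    Only the `max_len` bases nearest the gene are used on each side.
    Returns None when no unambiguous bases are available.
    """
    up = upstream[-max_len:].upper()     # bases nearest the start codon
    down = downstream[:max_len].upper()  # bases nearest the stop codon
    counted = [b for b in up + down if b in "ACGT"]  # skip ambiguous bases
    if not counted:
        return None
    return sum(b in "GC" for b in counted) / len(counted)


# Hypothetical short flanks: 6 of 8 bases are G or C
example_gc = flanking_gc("ATGC", "GGCC")
```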
Measures of Gene Expression and Recombination Rate
The expression data for seven time points during the S. cerevisiae diauxic shift (DeRisi et al. 1997) were obtained from http://cmgm.stanford.edu/pbrown/explore/array.txt . In their study, Cy5-labeled cDNA from each time point was applied to an ORF microarray, along with Cy3-labeled cDNA from a reference time point. The level of Cy5 fluorescence, corrected for background fluorescence and variation in the amount of RNA spotted on the array, serves as a measure of transcription. We calculated expression for each time point as E_t = Cy5_t × Cy3_avg / Cy3_t, where Cy3_avg is the mean across the seven time points of the fluorescence of the Cy3-labeled cDNA corrected for background, Cy3_t is the Cy3 fluorescence for a given time point corrected for background, and Cy5_t is the Cy5 fluorescence for a given time point. The assumption behind this approach is that variation in Cy5 fluorescence across time points due to inconsistent array loading will be revealed by variation in Cy3 fluorescence. Unfortunately, the approach cannot correct for systematic among-gene bias in the average amount spotted. However, as the results will show, the inferred measures of relative expression appear to be quite informative. We define a composite expression value, E_avg, as the mean of the seven E_t values for a given gene. We also define E_max as the maximum value of E_t calculated for the seven time points for a given gene. From the data provided by the authors, we were able to confidently match 5847 of their loci to the 6149 loci that met our earlier criteria for inclusion in the study.
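The normalization just described (Cy5 signal rescaled by the ratio of the mean Cy3 signal to the Cy3 signal at that time point) can be sketched as follows. The function names and the two-point example values are invented for illustration; the inputs are assumed to be background-corrected already.

```python
def normalized_expression(cy5, cy3):
    """Per-time-point expression: Cy5_t * Cy3_avg / Cy3_t.

    `cy5` and `cy3` are equal-length lists of background-corrected
    fluorescence values for one gene, one entry per time point.
    """
    cy3_avg = sum(cy3) / len(cy3)
    return [c5 * cy3_avg / c3 for c5, c3 in zip(cy5, cy3)]


def summarize_expression(cy5, cy3):
    """Composite values for one gene: the mean and the maximum across time points."""
    e_t = normalized_expression(cy5, cy3)
    return {"E_avg": sum(e_t) / len(e_t), "E_max": max(e_t)}


# Hypothetical gene measured at two time points (the study used seven)
summary = summarize_expression(cy5=[100, 200], cy3=[50, 200])
```

Because the reference signal enters only as a ratio to its own mean, a gene whose spot is consistently under-loaded is not corrected, which is the limitation noted above.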
Relative recombination rates have been estimated by Gerton et al. (2000) for most of the S. cerevisiae loci. The authors estimated relative recombination rate up to seven times for each locus; we defined relative recombination rate for a given locus as the mean of all available measures for that locus. We were able to assign confidently relative recombination rates to 5836 of the 5847 loci that met all prior criteria for inclusion in the study.
Indices of Codon Usage
Unequal usage of synonymous codons is often measured using the codon bias indices Chi/L (Shields et al. 1988) and ENC (Wright 1990). These indices do not explicitly distinguish the systematic overuse or underuse of specific codons; however, because many organisms consistently overuse a specific subset of codons, both Chi/L and ENC usually reflect usage of that subset. Alternative indices that reflect the usage of specific preferred, or optimal, codons have been developed (e.g., Sharp and Li 1987), but these indices require a priori identification of the preferred codons. Regardless, the correlations among codon bias indices tend to be strong.
Here, we use factor analysis to identify major trends in codon usage. The frequencies of synonymous codons within amino acid class were calculated for the 5836 loci that met all criteria for inclusion. These frequencies were entered into the factor analysis, and all factors with eigenvalues exceeding 1.0 were identified. The primary factor (Factor1) had an eigenvalue of 11.08, corresponding to 18.79% of the overall variance. The secondary factor (Factor2) had an eigenvalue of 4.73 (8.01% of the overall variance). All other factors had eigenvalues below 2.0. We used each gene's position along the Factor1 axis as a measure of the primary trend in codon usage. We used each gene's position along Factor2 as a measure of the secondary trend. Factor1 is highly correlated with Chi/L (r = 0.8206, p < 0.001) and ENC (r = −0.7770, p < 0.001); note that ENC decreases as codon usage becomes increasingly biased.
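As an illustration of the input to the factor analysis, the sketch below computes synonymous codon frequencies within each amino acid class for one coding sequence. It uses the standard genetic code and omits methionine, tryptophan, and stop codons, leaving 59 codons. The function name and the use of None for amino acids absent from a gene are assumptions; the text does not say how absent amino acids were handled, and the factor analysis itself is not reproduced here.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid classes with at least two synonymous codons (59 codons in total)
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)
SYNONYMS = {aa: codons for aa, codons in SYNONYMS.items() if len(codons) > 1}


def codon_frequencies(cds):
    """Frequency of each codon among the synonymous codons of its amino acid.

    Returns a dict with 59 entries; a value is None when the gene contains
    no codon for that amino acid (an assumption made for this sketch).
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    freqs = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            freqs[c] = counts[c] / total if total else None
    return freqs


# Hypothetical five-codon sequence: ATG GCT GCT GCC TAA
freqs = codon_frequencies("ATGGCTGCTGCCTAA")
```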
“Major” vs. “Preferred” Codons
We defined major codons as those that were present in the genome at a frequency that exceeded 1/degeneracy of the amino acid. That is, the major codons were the ones used disproportionately across the genome. All other codons were deemed minor.
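The major/minor rule can be expressed directly. In this sketch the function name and the input layout (genome-wide counts grouped by amino acid) are assumptions, and the counts in the example are invented rather than taken from Table 1.

```python
def major_codons(counts_by_amino_acid):
    """Codons whose within-class frequency exceeds 1/degeneracy.

    `counts_by_amino_acid` maps each amino acid to a dict of
    {codon: genome-wide count} covering all of its synonymous codons.
    """
    major = set()
    for codon_counts in counts_by_amino_acid.values():
        degeneracy = len(codon_counts)
        total = sum(codon_counts.values())
        if degeneracy < 2 or total == 0:
            continue  # no synonymous choice, or no data
        for codon, n in codon_counts.items():
            if n / total > 1 / degeneracy:
                major.add(codon)
    return major


# Invented counts for a twofold and a fourfold class
example = {
    "K": {"AAA": 60, "AAG": 40},
    "A": {"GCT": 40, "GCC": 20, "GCA": 30, "GCG": 10},
}
majors = major_codons(example)
```

Note that a class can have more than one major codon, as the fourfold example shows.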
Initial identification of preferred codons was based on the factor analysis. Since Factor1 is strongly positively correlated with Chi/L, the polarity of the primary factor is such that increased values of Factor1 correspond to greater codon bias. Thus, the 24 codons whose frequencies positively correlated with Factor1 were initially inferred to be preferred, and all others were inferred to be unpreferred. This relationship held up when we turned our attention to gene expression.
Statistical Analysis of Influences on Codon Usage
Correlation coefficients were calculated for several variables (gene length, GC_fl, recombination rate, gene expression, etc.) with Factor1 and Factor2. We then used multiple regression to calculate partial correlations and to calculate residual values of Factor1. The relationship between gene expression and Factor1 was evaluated using both linear and curvilinear models, and the relationship between Factor1 and SBG was evaluated using either Factor1 directly or the residuals of Factor1 following regression on other variables. Further details and elaboration are provided under Results and Discussion.
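The residual-based approach can be illustrated with a deliberately simplified sketch: one covariate is removed by ordinary least-squares regression, and the residuals are then correlated with a second variable. The analyses reported here used multiple regression in Statistica; this single-covariate version, its function names, and the toy data are assumptions meant only to show the logic.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Toy data: remove the effect of `covariate` from `response`,
# then ask whether what is left still tracks `predictor`.
covariate = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
predictor = [5, 3, 4, 1, 2]
r_resid = pearson_r(residuals(response, covariate), predictor)
```

By construction the residuals are uncorrelated with the covariate that was regressed out, so any remaining correlation with the predictor cannot be attributed to a linear effect of that covariate.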
Results
Identification of Major and Preferred Codons
Overall codon frequencies are listed in Table 1. All but two of the major codons (AGG and TTG) end in either A or T. This is consistent with an overall AT compositional bias of noncoding regions in the yeast genome: The mean GC_fl is 0.358, while the mean intron GC is 0.341. In contrast, of the 24 codons whose frequencies correlate positively with Factor1, 12 end in G or C (see Table 1). Third codon position GC content (GC3) correlates positively with Factor1 (r = 0.2177, p < 0.001), indicating a slight preference for G- and C-ending codons. All of the preferred codons initially identified by Bennetzen and Hall (1982) are included in our list, along with the addition of CGT (arginine). All of the preferred codons identified by Sharp and Cowe (1991) are included in our list, with the addition of CGT as well as GCC (alanine). While we will continue to use Factor1 as our measure of codon bias, we note that when frequency of optimal codons (F_op [Ikemura 1985]) is calculated using the 24 identified preferred codons, the correlation between F_op and Factor1 is 0.9788.
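For readers unfamiliar with the frequency of optimal codons, a minimal sketch is given below. It takes the index to be the fraction of a gene's codons that are preferred, counting only codons from amino acid classes with synonymous alternatives; that reading, the function name, and the two-amino-acid codon sets in the example are assumptions, and the sets shown are not the 24 preferred codons identified here.

```python
def frequency_of_optimal_codons(cds, preferred, scored):
    """Fraction of scored codons in `cds` that belong to `preferred`.

    `scored` is the set of codons from amino acid classes with synonymous
    alternatives; codons outside it (e.g., ATG, TGG, stops) are ignored.
    Returns None when the sequence contains no scored codons.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counted = [c for c in codons if c in scored]
    if not counted:
        return None
    return sum(c in preferred for c in counted) / len(counted)


# Hypothetical sets covering alanine and lysine only
scored = {"GCT", "GCC", "GCA", "GCG", "AAA", "AAG"}
preferred = {"GCT", "AAG"}
fop = frequency_of_optimal_codons("GCTGCCAAGAAAATG", preferred, scored)
```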
As expected, there is a strong positive correlation between Factor1 and E_avg (Table 2). This correlation holds across time points (though it is weaker for E_6 and E_7), as well as for E_max. Because E_avg is not normally distributed (there is a long tail of highly expressed genes), we also calculated the correlation for log-transformed expression levels (r = 0.5825, p < 0.001) as well as Spearman's rank correlation (r = 0.4867, p < 0.001). The frequencies of all 24 codons inferred as being preferred by their positive correlation with Factor1 were positively correlated with E_avg; likewise, the frequencies of all 35 presumed unpreferred codons were negatively correlated with E_avg. Figure 1 contrasts overall codon usage with codon usage of the top 5% of genes ranked by E_avg. All codons with frequencies positively correlated with Factor1 were used more often in the highly expressed genes than in all genes, and the converse is true for codons with frequencies negatively correlated with Factor1.
If codon usage reflects, in part, selection for efficient translation, we expect that preferred codons would correspond to the more abundant iso-accepting tRNAs (Ikemura 1985). This is relevant for threefold, fourfold, and sixfold redundant classes, where two or more iso-accepting tRNAs are used for a given amino acid. We find a strong association between codon preference and tRNA gene copy number (Percudani et al. 1997); the preferred codons always correspond to tRNAs with the highest gene copy number. However, a few unpreferred codons (arginine codon CGC, glycine codon GGC, and proline codon CCG) use the same tRNA as a preferred codon, indicating that correspondence to an abundant tRNA does not alone guarantee preference.
Factor1 and Factor2 vs. Other Variables
In addition to correlating strongly with gene expression, Factor1 is negatively correlated with gene length and positively correlated with relative recombination rate, as predicted by models of selection conflict. When all genes are included in the analysis, there is no correlation between Factor1 and GC_fl, consistent with the finding that the preferred codons are split evenly between AT- and GC-ending (see Table 3). This changes slightly if analyses are limited to genes with flanking regions of at least 200 bp; a weak positive correlation between flanking GC content and Factor1 emerges (r = 0.0386, N = 5639, p = 0.004). If analyses are limited only to upstream flanking regions of at least 200 bp, there is a somewhat stronger correlation between flanking GC content and Factor1 (r = 0.1360, N = 4914, p < 0.001), though a negative correlation of similar magnitude is seen if the analysis is limited to downstream regions of at least 200 bp (r = −0.1361, N = 4100, p < 0.001). Thus, there is no clear relationship between the primary trend in codon usage and local base composition.
While Factor1 correlates most strongly with gene expression and somewhat with recombination rate, Factor2 appears to reflect mainly compositional biases. Factor2 is very strongly correlated with GC3 (see Fig. 2) and more strongly correlated with GC_fl and recombination rate than is Factor1, though the correlation with GC_fl is weak (see Table 3). The correlations between Factor2 and GC_fl are considerably stronger, however, if we limit analysis to only flanking regions of at least 200 bp (upstream, r = 0.3248, p < 0.001; downstream, r = 0.3370, p < 0.001; combined, r = 0.3930, p < 0.001).
Given that GC_fl may reflect regional compositional bias and that GC-biased gene conversion can explain some of the correlation between recombination rate and base composition (Eyre-Walker 1993; Marais et al. 2001; Birdsell 2002; Marais and Piganeau 2002), the correlations between Factor2 and these two variables are not surprising. In fact, recombination rate is positively correlated with several measures of GC content in yeast (codon first positions, r = 0.1530, p < 0.001; codon second position, r = 0.1409, p < 0.001; GC3, r = 0.4732, p < 0.001; GC_fl, r = 0.0045, p = 0.733). Recombination rate is also positively correlated with GC content of upstream and downstream flanking regions of at least 200 bp (upstream, r = 0.2520, N = 4914, p < 0.001; downstream, r = 0.1746, N = 4100, p < 0.001; combined, r = 0.2775, N = 5836, p < 0.001), though the correlations involving upstream or downstream regions nearly disappear if shorter flanking regions are included.
Gene Expression and Codon Bias vs. Gene Spacing
Hey and Kliman (2002) found a curvilinear relationship between SBG and both gene expression and preferred codon usage in D. melanogaster. As SBG decreased, both gene expression and preferred codon usage increased; however, when genes were within ~1500–2000 bp of other genes, the relationships reversed.
In yeast, we observe a positive correlation between SBG and three variables: E_avg (r = 0.0385, p = 0.003), Factor1 (r = 0.1090, p < 0.001), and the residual values of Factor1 after multiple regression on gene length, GC_fl, E_avg, and recombination rate (r = 0.1123, p < 0.001). While there is a slight hint at a curvilinear relationship (see Fig. 3), the r² values for fitting the data to second-order polynomials are not much greater than those for the strict linear model. It should be noted, however, that most values of SBG in yeast fall below 2000 bp, and the clear curvilinear relationship observed in D. melanogaster requires the inclusion of more distantly spaced genes. Also, because Factor1 correlates with GC3, we plotted GC3 vs. SBG (see Fig. 3). The relationship between GC3 and SBG is clearly different and cannot account for the relationship between Factor1 and SBG. The relationship between codon bias and gene spacing is discussed later in more detail.
Discussion
One of the first things that stands out in a genomewide survey of yeast codon usage is that the most commonly used codons are not necessarily the preferred codons. Twenty-four codons are used at a frequency higher than 1/degeneracy, and all but two end in A or T. Twenty-four codons are also positively correlated with both Factor1 and gene expression and, therefore, inferred to be the preferred codons. However, half of the preferred codons are minor codons; 11 of 12 of these end in C or G. There appears to be a strong compositional bias toward A and T, such that many of the preferred codons are underrepresented. In fact, three of the preferred codons (GCC, alanine; GAC, aspartic acid; and CAC, histidine) remain at a frequency below 1/degeneracy even in the most highly expressed 5% of genes.
Factor1, which represents the primary trend in codon usage, is positively correlated with recombination rate and negatively correlated with gene length. Both of these findings are consistent with studies on other organisms, and both are consistent with a model whereby the effectiveness of natural selection on linked targets is reduced. This effect, first suggested by Hill and Robertson (1966), is due to the difficulty, in the absence of recombination, in assorting character states at multiple sites into haplotypes composed entirely of preferred character states. The correlation between recombination rate and preferred codon usage has been well documented in D. melanogaster (Kliman and Hey 1993; Comeron et al. 1999; Hey and Kliman 2002), but there is an important difference between fruit flies and yeast. The preferred codons in Drosophila predominantly end in C or G, such that GC3 and preferred codon usage are essentially synonyms. Among-gene variation in base composition will change GC3 and, consequently, preferred codon usage. Marais et al. (2001) argued that the correlation between recombination rate and preferred codon usage in Drosophila is due mainly to the influence of recombination on base composition, and not to Hill–Robertson effects. However, Hey and Kliman (2002) found that, even after correcting for local base composition, the correlation between recombination rate and preferred codon usage remained.
Regardless, the situation in yeast is clearly different. Because exactly half of the preferred codons end in G or C, variation in local base composition may have little effect on overall preferred codon usage. As GC content increases, the increased usage of preferred G- or C-ending codons will balance the decreased usage of preferred A- or T-ending codons. In fact, there is no clear correlation between flanking GC content and Factor1. Therefore, the correlation between recombination rate and Factor1 is consistent with Hill–Robertson effects. However, we recognize that there is a strong correlation between GC3 and recombination rate and that there is some correlation between GC3 and Factor1. Thus, some of the covariance between recombination rate and Factor1 may reflect a more direct effect of recombination rate (by GC-biased gene conversion) on GC3. It is interesting that the secondary trend in codon usage is very strongly correlated with GC3, and more strongly correlated than is Factor1 with recombination rate. Thus, we should be cautious when suggesting that reduced usage of preferred codons in regions of lower recombination reflects Hill–Robertson effects in yeast, though the possibility should not be dismissed.
The negative correlation between gene length and preferred codon usage is also consistent with among-gene variance in the effectiveness of natural selection. Using computer simulations, Li (1987) showed that preferred codon usage will be reduced in longer genes, even if the codons are independently assorting (though the effect is weaker). A negative correlation between gene length and preferred codon usage has been observed in Drosophila (Moriyama and Powell 1998; Comeron et al. 1999). Kliman and Eyre-Walker (1998) also found that preferred codon usage decreased toward the 3′ end of genes in Drosophila, and this would lead to decreased preferred codon usage in longer genes.
The strong correlation between gene length and the secondary trend in codon usage is intriguing, since Factor2 seems to reflect mainly local base composition. However, we cannot at this time offer a clear cause-and-effect explanation for the relationship between gene length and Factor2. Gene length and recombination rate are negatively correlated in yeast (r = −0.1775, p < 0.001), and gene length and upstream flanking GC are positively correlated (r = 0.0993, p < 0.001). (The correlation with overall flanking GC is similarly positive if regions of at least 200 bp are used.) Duret and colleagues (1995) have reported that longer genes tend to be found in low-GC isochores in vertebrates, suggesting a correlation between base composition and gene length. However, cause and effect are unclear, and the lower gene density of low GC isochores may contribute to this statistical association (Bernardi 2000). It may simply be the case that selection on codon usage is weaker on longer genes, such that the usage of A- and T-ending codons is more likely to increase due to the overall AT-biased mutation pressure in yeast.
The negative correlation we observe between gene length and codon bias contrasts with the positive correlation found by Coghlan and Wolfe (2000). Log-transformation of gene length does not appreciably change our value for the correlation coefficient; r changes from −0.0702 to −0.0758. Coghlan and Wolfe pointed out that a product–moment correlation may be inaccurate due to violation of the assumption of normality, and they chose to use Spearman's correlation. They also first removed the effect of gene expression, which they argued confounds the relationship between gene length and codon bias. To explore the discrepancy in our results, we regressed Factor1 on E_avg and then calculated Spearman's correlation between the residuals and gene length. The correlation coefficient was now positive but indistinguishable from zero (Spearman's r = 0.0082, p = 0.5311). If we use residuals from a curvilinear regression of Factor1 on E_avg (see below), Spearman's correlation is −0.0137. One assumption behind using residuals is, of course, that the independent variable is accurately measured, and there is certainly going to be error in estimating gene expression levels from microarrays. Still, our correlation between codon bias and gene expression is similar to that obtained by Coghlan and Wolfe (2000). Our gene data sets are, however, somewhat different. We used nearly 400 fewer genes, as our analyses were limited to those for which we had recombination rates, expression levels, and SBG values. This is admittedly an unsatisfying ad hoc explanation for the discrepancy, and further investigation may be warranted.
The association between gene density and preferred codon usage is of particular interest. Hey and Kliman (2002) predicted a negative correlation between gene density and preferred codon usage on the basis of Hill–Robertson effects. Specifically, by reducing the opportunity for recombination among targets of selection, decreased space between genes should interfere with purifying selection on synonymous codon usage. However, the overall trend observed was an increase in preferred codon usage as gene spacing decreased, though it was clear that the relationship was not linear. As gene spacing continued to decrease, the trend reversed, and preferred codon usage decreased. Hey and Kliman proposed that two factors were at play. Eukaryotic gene expression is influenced by chromatin structure (Gasser et al. 1998; Farkas et al. 2000; Bell et al. 2001), and the loosening of chromatin necessary for expression will affect a region that may include nearby genes. Thus, when high gene expression is selectively favored, chromosome rearrangements or deletions that place a highly expressed gene of interest near another highly expressed gene would be favored. However, at some point, placement of genes too near each other would interfere with expression. Here, there are two distinct models. Hill–Robertson effects could interfere with selection on codon usage and, therefore, the efficiency of translation. This, however, would not decrease the level of transcription. Alternatively, closely spaced genes will have a greater tendency to share control elements, so individual sites may have pleiotropic effects. Mutations that increase the expression of one gene will not necessarily do the same for the other, and such antagonistic pleiotropy will make it more difficult to optimize control regions for maximum transcription of both genes. Consequently, usage of preferred codons may decrease.
The gene expression data for yeast are of much better quality than the expression data available at the time for Drosophila. The very strong correlation between E_avg and Factor1 suggests that the former is a fairly reliable estimate of expression levels. Just as Hey and Kliman saw for the more closely spaced Drosophila genes, both gene expression and Factor1 decrease as the space between genes decreases in yeast. It is striking, however, that the decrease in Factor1 is nearly the same when we correct for variation in gene expression (as well as other variables) by multiple regression. Thus, a decrease in preferred codon usage in closely spaced genes does not appear simply to reflect a more direct effect on expression (i.e., that fitness effects of unpreferred codon usage are simply smaller because efficiency of protein synthesis is less important). Hill–Robertson effects may, therefore, play a role in the small, but significant, decrease in preferred codon usage in closely spaced yeast genes.
To explore this possibility further, we predicted that antagonistic pleiotropy should be more obvious if we measure SBG only using upstream regions of genes, and it should be less obvious if we measure SBG only using downstream regions of genes. Our reasoning was that the most important control elements of the promoter tend to be located upstream of the transcription start site, so selection conflicts associated with pleiotropic effects should be more evident there. However, this reassessment of gene density does not affect the results in an obvious way (see Fig. 4). Decreases in either upstream or downstream SBG are associated with decreases in Factor1 (corrected for gene expression and other variables). We should note, however, that the correlation between upstream SBG and Factor1 (r = 0.1141) is significantly greater than the correlation between downstream SBG and Factor1 (r = 0.0497). These correlations are essentially unchanged if we substitute for Factor1 the residuals of Factor1 following the multiple regression described earlier.
Because the relationship between gene expression and Factor1 may not be linear, simple regression may not entirely remove the effect of gene expression. We computed a second-order polynomial relating Factor1 to E_avg, and r² (0.4000) was slightly, but significantly, greater than that of the linear model (r² = 0.3534). [The second-order polynomial using a log transformation of gene expression had an intermediate value of r².] Thus, more of the effect of gene expression should be removed by saving the residuals from the fitting of the second-order polynomial. In line with our earlier analyses, we performed multiple regression of the residuals on gene length, GC_fl, and recombination rate. The residuals from this multiple regression were then saved. The correlations between these Factor1 residuals and SBG were essentially unchanged (SBG, r = 0.1104; upstream SBG, r = 0.1122; downstream SBG, r = 0.0539). Thus, it appears that the finding is robust and that codon bias decreases as genes become more closely spaced. The effect is somewhat stronger for upstream gene spacing than for downstream gene spacing, but the nonzero correlation between codon usage and the latter may indicate subtle Hill–Robertson interference.
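Comparing the fit of a linear model with that of a second-order polynomial can be sketched as below. The function solves the least-squares normal equations directly so that it needs no external libraries; the function name and the toy data are assumptions, x and y are placeholders for whichever pair of variables is being fitted, and the significance test for the improvement in fit is not reproduced.

```python
def polyfit_r2(x, y, degree):
    """r-squared of a least-squares polynomial fit of the given degree."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i)
    a = [[sum(v ** (i + j) for v in x) for j in range(n)] for i in range(n)]
    b = [sum(w * v ** i for v, w in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution for the coefficients (constant term first)
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    fitted = [sum(c * v ** i for i, c in enumerate(coef)) for v in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((w - f) ** 2 for w, f in zip(y, fitted))
    ss_tot = sum((w - mean_y) ** 2 for w in y)
    return 1 - ss_res / ss_tot


# Toy data with deliberate curvature
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]
r2_linear = polyfit_r2(xs, ys, 1)
r2_quadratic = polyfit_r2(xs, ys, 2)
```

The normal equations become poorly conditioned for widely ranging x or higher degrees, so a numerical library would be the safer choice for real data.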
Our analyses are consistent with multiple forms of selection conflict associated with the evolution of codon usage in yeast. Polymorphism data, as well as orthologous sequences from close relatives of S. cerevisiae, would allow us to better infer among-gene variation in the intensity and effectiveness of natural selection, as we could use the suite of analytical tools developed for analogous studies in Drosophila (Akashi 1995; Akashi and Schaeffer 1997; Kliman 1999).
References
MD Adams et al. (2000) The genome sequence of Drosophila melanogaster. Science 287:2185–2195. doi:10.1126/science.287.5461.2185
H Akashi (1994) Synonymous codon usage in Drosophila melanogaster. Natural selection and translational accuracy. Genetics 136:927–935
H Akashi (1995) Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics 139:1067–1076
H Akashi, SW Schaeffer (1997) Natural selection and the frequency distributions of “silent” DNA polymorphism in Drosophila. Genetics 146:295–307
MA Antezana, M Kreitman (1999) The nonrandom location of synonymous codons suggests that reading frame-independent forces have patterned codon preferences. J Mol Evol 49:36–43
AC Bell, AG West, G Felsenfeld (2001) Insulators and boundaries: Versatile regulatory elements in the eukaryotic genome. Science 291:447–450. doi:10.1126/science.291.5503.447
JL Bennetzen, BD Hall (1982) Codon selection in yeast. J Biol Chem 257:3026–3031
O Berg, CG Kurland (1997) Growth rate-optimised tRNA abundance and codon usage. J Mol Biol 270:544–550. doi:10.1006/jmbi.1997.1142
G Bernardi (2000) Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17. doi:10.1016/S0378-1119(99)00485-0
G Bernardi, B Olofsson, J Filipski, M Zerial, J Salinas, G Cuny, M Meunier-Rotival, F Rodier (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953–958
JA Birdsell (2002) Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol Biol Evol 19:1181–1197
A Coghlan, KH Wolfe (2000) Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16:1131–1145
JM Comeron, M Kreitman (2002) Population, evolutionary and genomic consequences of interference selection. Genetics 161:389–410
JM Comeron, M Kreitman, M Aguadé (1999) Natural selection on synonymous sites is correlated with gene length and recombination rate in Drosophila. Genetics 151:239–249
JL DeRisi, VR Iyer, PO Brown (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680–686
L Duret, D Mouchiroud (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96:4482–4487. doi:10.1073/pnas.96.8.4482
L Duret, D Mouchiroud, C Gautier (1995) Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 40:308–317
A Eyre-Walker (1993) Recombination and mammalian genome evolution. Proc R Soc Lond B 252:237–243
G Farkas, BA Leibovitch, SG Elgin (2000) Chromatin organization and transcriptional control of gene expression in Drosophila. Gene 253:117–136. doi:10.1016/S0378-1119(00)00240-7
J Felsenstein (1974) The evolutionary advantage of recombination. Genetics 78:737–756
SM Gasser, R Paro, F Stewart, R Aasland (1998) The genetics of epigenetics. Cell Mol Life Sci 54:1–5. doi:10.1007/s000180050120
JL Gerton, J DeRisi, R Shroff, M Lichten, PO Brown, TD Petes (2000) Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci USA 97:11383–11390
JH Gillespie (2000) Genetic drift in an infinite population. The pseudohitchhiking model. Genetics 155:909–919
A Goffeau et al. (1996) Life with 6000 genes. Science 274:563–567. doi:10.1126/science.274.5287.546
M Gouy, C Gautier (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055–7074
J Hey, RM Kliman (2002) Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics 160:595–608
WG Hill, A Robertson (1966) The effect of linkage on limits to artificial selection. Genet Res 8:269–294
T Ikemura (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: A proposal for a synonymous codon choice that is optimal for the E. coli translation system. J Mol Biol 151:389–409
T Ikemura (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–34
RM Kliman (1999) Recent selection on synonymous codon usage in Drosophila. J Mol Evol 49:343–351
RM Kliman, A Eyre-Walker (1998) Patterns of base composition within the genes of Drosophila melanogaster. J Mol Evol 46:534–541
RM Kliman, J Hey (1993) Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol Biol Evol 10:1239–1258
RM Kliman, J Hey (1994) The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049–1056
WH Li (1987) Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J Mol Evol 24:337–345
G Marais, G Piganeau (2002) Hill-Robertson interference is a minor determinant of variations in codon bias across Drosophila melanogaster and Caenorhabditis elegans genome. Mol Biol Evol 19:1399–1406
G Marais, D Mouchiroud, L Duret (2001) Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci USA 98:5688–5692. doi:10.1073/pnas.091427698
GAT McVean, B Charlesworth (2000) The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155:929–944
EN Moriyama, JR Powell (1997) Codon usage bias and tRNA abundance in Drosophila. J Mol Evol 45:514–523
EN Moriyama, JR Powell (1998) Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae, and Escherichia coli. Nucleic Acids Res 26:3188–3193. doi:10.1093/nar/26.13.3188
T Ohta (1973) Slightly deleterious mutant substitutions in evolution. Nature 246:96–98
R Percudani, A Pavesi, S Ottonello (1997) Transfer RNA gene redundancy and translation selection in Saccharomyces cerevisiae. J Mol Biol 268:322–330. doi:10.1006/jmbi.1997.0942
PM Sharp, E Cowe (1991) Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7:657–678
PM Sharp, KM Devine (1989) Codon usage and gene expression level in Dictyostelium discoideum: Highly expressed genes do prefer optimal codons. Nucleic Acids Res 17:5029–5039
PM Sharp, WH Li (1987) The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
DC Shields, PM Sharp, DG Higgins, F Wright (1988) “Silent” sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol Biol Evol 5:704–716
StatSoft Inc. (2001) STATISTICA (data analysis software system), version 6. http://www.statsoft.com
M Stenico, AT Lloyd, PM Sharp (1994) Codon usage in Caenorhabditis elegans: Delineation of translational selection and mutational biases. Nucleic Acids Res 22:2437–2446
F Wright (1990) The ‘effective number of codons’ used in a gene. Gene 87:23–29
Acknowledgements
This work was supported by NIH Grant R15HG02456 to R.M.K. We thank G. Bernardi and two anonymous reviewers for helpful comments.
Cite this article
Kliman, R.M., Irving, N. & Santiago, M. Selection Conflicts, Gene Expression, and Codon Usage Trends in Yeast. J Mol Evol 57, 98–109 (2003). https://doi.org/10.1007/s00239-003-2459-9
DOI: https://doi.org/10.1007/s00239-003-2459-9