Introduction

With the recent proliferation of sequence data there is much interest in determining those genes on which positive selection has acted (see, e.g., Clark et al. 2003). Similarly, it is desirable to know where in the sequence selection might have acted and relate that to the biology of the gene/protein in question. The common method for diagnosing the mode of sequence evolution considers the ratio of nonsynonymous substitutions per nonsynonymous site (K A), to the corresponding figure for synonymous substitutions (K S). A ratio (K A/K S) greater than unity is taken as evidence for positive selection, promoting change at the protein level. This interpretation need not, however, be correct. In principle, if purifying selection is stronger on synonymous mutations than is purifying selection on the protein, K A/K S >1 would also be found.

The latter possibility is typically considered to be so bizarre as to be effectively worth ignoring. In one case, however, BRCA1, a very high K A/K S intragenic peak, found using sliding window analysis, was observed in both human-dog and mouse-rat orthologous gene alignments at the same location (Hurst and Pal 2001). The peak was associated not with an increased rate of protein evolution in the critical domain but rather with a strikingly reduced rate of synonymous evolution. This was interpreted as being consistent with purifying selection on synonymous mutations. Indeed, while K A/K S ratios greater than unity can be owing to forces other than positive selection—they can, for example, be recovered in simulations of background selection when assuming that synonymous mutations are neutral (Palsson 2004)—we are aware of no alternative interpretation, save for sampling artifact, for domains of low K S but moderate K A. While Hurst and Pal (2001) were unable to identify a possible cause, subsequently it was noted that this critical region was one containing splice enhancer domains associated with alternative splicing (Orban and Olah 200l). It was thus suggested that the K A/K S peak was owing to highly regionalized selection against synonymous mutations that affect alternative splicing. Since then, much evidence for selection against mutations that affect splicing and splice enhancer domains has emerged (Carlini and Genut 2006; Cartegni et al. 2002; Chamary et al. 2006; Chamary and Hurst 2005a; Chen et al. 2006; Ermakova et al. 2006; Fairbrother et al. 2004a; Parmley et al. 2006; Plass and Eyras 2006; Willie and Majewski 2004; Xing and Lee 2005a, 2005b, 2006a, 2006b). Other forms of selection on synonymous mutations are, however, possible. For example, selection for mRNA folding (Chamary and Hurst 2005b; Duan et al. 2003; Nackley et al. 2006), micro-RNA/mRNA binding (Hurst 2006), and translational pausing by use of rare codons (Kimchi-Sarfaty et al. 2007), all have some empirical support and could potentially be regionalized within genes. More generally, large spans of intragenic domains associated with low synonymous rates of divergence have been found (Schattner and Diekhans 2006).

The above case history of BRCA1 tempts an obvious question: if one were to repeat the same form of sliding window analysis on very many genes, how commonly would one find K A/K S > 1 intragenic peaks best explained by localized selection on synonymous mutations? The problem, however, centers on what one means by “best explained”, which in the case of sliding window analysis poses multiple difficulties. First it is necessary to isolate K A/K S > 1 peaks and attempt to classify them according to the pattern of rate variation around the region and in the gene more generally: are they regional K S dips showing no increase in K A, K A peaks associated also with higher than average K S, or might they be undeterminable?

Having identified possible K S dips we cannot, however, be confident that we are witnessing selection on synonymous mutations. Most of the problems stem from the fact that sliding window analysis has no formal statistical basis and is difficult to defend rigorously. Most notably, as the method requires multiple windows to be examined, the probability of spurious peaks is acute, made more so by nonindependence between overlapping windows. Note too that the requirement for many windows and lower limit on window size for accurate estimation of K A and K S mean that the method is also applicable only to long genes. To control for the multiple testing and nonindependence problems we assemble random genes by shuffling the codons in each alignment and determine how often a given peak is expected to be observed given the underlying rates of evolution, applying the same sliding window protocol to all the randomised versions. Even after making allowance for such error there remains the possibility that any significant peak is still just spurious, especially as multiple genes are being tested. To be more confident that the low K S is associated with selection against synonymous mutations, it is best if, as in the case of BRCA1, some evidence for the same pattern is observed in an independent comparison.

Methods

Genes and Alignment

A file of 12634 orthologous mouse/rat genes was obtained from the Mouse Genome Informatics Web site (ftp://ftp.informatics.jax.org/pub/reports/index.html#orthology). The EntrezGene IDs were used to search the NCBI database for corresponding mouse and rat RefSeqs. Orthologous genes were discarded if the gene was only classed as predicted in either species. The mouse gene sequence and exon position data were obtained from Ensembl, whereas the rat sequence was obtained from NCBI. The orthologues were aligned using MUSCLE (Edgar 2004). Orthologues less than 1500 bp in length were discarded to allow for a sliding window analysis. Any alignments with indels of either high frequency (>5 indels per 1 kbp) or long length (>30 bp) were also discarded as potential poorly aligned sequence. For a list of the 1074 remaining genes and accession numbers of genes used in the study, see supplementary data 1.

Sliding Window

The synonymous and nonsynonymous rates of substitution were calculated by the Li method (Li 1993; Pamilo and Bianchi 1993) for windows of varying sizes moving three codons along the sequence for each “slide”. In our application of the sliding window, if either extracted sequence contained an indel, then another codon was included in the window until an effective codon window size was achieved. The substitution rates and hence the K A/K S ratio for each window were calculated and those that were >1 had their cause assessed to be due to either a peak in local K A or a dip in K S. A ratio peak is deemed to be a peak in K A under two conditions: first, if both K A and K S are higher than that of the gene average. This we refer to as the strict definition of a K A peak. However, this strict definition, although rigorously defendable, will miss cases where K A is much higher than the average but K S is just below the average. To attempt to ensure that what is true for the strict set might be more generally true, it is desirable to define a more generous definition. This we do by defining “much higher” and “little lower” by the deviation of the observed values from the genic mean in terms of the number of standard deviations (i.e., Z scores). The standard deviation in K A and K S was determined from the observed windows. If at the K A > K S peak, the ratio of the Z score for K A to the Z score for K S is >2, we consider this to be a “generous” K A peak. Likewise, the K A/K S peak is deemed due to a dip in K S under two conditions: if both K A and K S are below the gene average (a strict K S dip) or if K A is a little higher than the mean and K S is much lower than the mean, i.e., the ratio of the Z score for K A to the Z score for K S is <0.5 (a generous K S dip). Any other permutations are considered to be a combination of factors and therefore have an undefined cause. The figure for defining a generous peak (2 or 0.5 ratio of Z scores) is an arbitrary choice but by visual inspection (Supplementary Fig. 4) appears to capture the appropriate forms of peak. It does, for example, capture the previously discussed K S dip in BRCA1. Restricting analysis to only those peaks classified under the strict definition does not qualitatively affect conclusions (Supplementary Fig. 1).

Obtaining Exonic Splicing Enhancers

Exonic splicing enhancer sequences, for human and mouse, were downloaded from http://www.genes.mit.edu/burgelab/rescue-ese. These candidate exonic splicing enhancer (ESE) sequences were previously identified by the Burge group, by assaying whether oligonucleotide motifs exhibit splicing activity in vivo. The 238 human (Fairbrother et al. 2002) and 380 mouse (Yeo et al. 2004) ESE hexamers were determined using Relative Enhancer and Silencer Classification by Unanimous Enrichment (RESCUE), a computational approach followed by experimental validation. Briefly, the method identifies motifs that are (1) significantly enriched in exons relative to introns and (2) significantly more frequent in exons with weak nonconsensus splice sites than in exons with strong consensus splice sites (Fairbrother et al. 2004b). Motifs that match these criteria are then grouped into clusters, after which representatives from each cluster are tested for ESE activity in vivo using a splicing reporter system.

SNP Analysis

The locations within the gene of synonymous SNPs in mouse were obtained by screening all the genes in our full long gene data set at dbSNP (Sherry et al. 2001) using each gene’s unique unigene identification. The total number of SNPs in the full data set, along with the full length of all sequences, was then employed to define the expected SNP count within and outside the K S dip windows.

Results

At KA/KS > 1 Peaks, KS Dips Are More Common Than KA Peaks

To investigate the relative contribution from a decrease in K S and the increase in K A to locally observed peaks in the K A/K S ratio, a sliding window analysis was implemented and applied to 1074 mouse-rat genes longer then 1500 base pairs (bp). For a given window size we considered overlapping windows in the gene and identified those windows showing K A/K S > 1. A peak was defined as a maximal point in K A/K S with at least six windows on either side of the peak having a lower ratio. All peaks were categorized as either being K A peaks (when K A was unusually high and K S not unusually low), a K S dip (when K S was unusually low and K A not unusually high), or undetermined (see Methods). We applied both a strict and a more generous set of definitions (see Methods).

Any results are likely to be sensitive to choice of window size, as there exists a compromise between the calculation accuracy of the substitution rate and the dilution of the signal from potentially selected regions by neutrally evolving neighbouring sequence. As the calculation of K A/K S for sequences shorter than circa 100 codons is thought to be error-prone, we set this as a lower limit but repeated the analysis using multiple larger window sizes.

As window size increases, so the absolute number of K A/K S peaks is reduced, as one might expect as larger windows potentially dilute a weak signal produced by selection on a small area (Fig. 1). Depending on window size, a K A/K S > 1 peak is found at a rate 0.15–0.2 per gene, with approximately 10% of genes showing at least one K A/K S > 1 peak. With a mean coding sequence size of around 2300 bp, this approximates to one peak per 12–15 kb of exonic sequence. Unexpectedly, at all window sizes we observe the same trend, namely, that K S dips contribute to a larger proportion of peaks in K A/K S ratio than an increase in K A (for generous and strict definition results see Fig. 1; for strict definition alone see Supplementary Fig. 1). The relative proportion of peaks that are K S dips as opposed to K A peaks varies from ∼60% to just over 50% in the longer windows.

Fig. 1
figure 1

The frequency of K A/K S peaks per gene that are due to either a peak in K A, hence positive selection (dashedline), a dip in K S, hence synonymous purifying selection (solidline), or an ambiguous cause where both events are concomitant (dash-dot line), as a function of the size of the sliding window.

Allowing for False Positives

A problem with the sliding window analysis is the occurrence of spurious (“false-positive”) peaks, not least because, even in randomly generated genes with no force producing intragenic heterogeneity in K S, a high variance between windows is nonetheless possible. To minimize this possibility, and hence to control for spurious peaks owing to use of multiple windows in the same gene, a randomization was implemented. The codons of each gene were shuffled for 100 repetitions; the resulting sequences were assessed by the same sliding window analysis that determines the proportion of ratio peaks for a window size of 102 effective codons. A real peak was determined to be significant if <5% of the 100 simulants of each gene were able to produce a peak with a higher ratio. The sliding window analysis was repeated, as before, to identify the effects of increasing window sizes on the purged set of genes with no loss of magnitude in the contribution by K S. Of 103 genes with at least one intragenic K A / K S peak > 1, we find that 47 have at least one peak significantly higher than unity. Of the significant peaks, again, there are more that are K S dips rather than K A peaks (35 K S dip in 25 genes versus 7 K A dips in 6 genes). This qualitative finding is true under both the strict (Supplementary Fig. 2) and the more generous definitions (Supplementary Fig. 3). Sliding window plots of all genes with a K A/K S > 1 peak are shown in Supplementary Fig. 4.

Statistically Significant KS Dips Are for the Most Part Not Repeatable

Can we be confident that the few K A /K S peaks that are significant and associated with reduced K S are really the result of purifying selection on synonymous mutations? To examine this, we took the seven genes (Supplementary Table 1) in which we have K S dips that are strictly defined and statistically significant and found, via homologene at NCBI, the sequence of human and dog (or pig) orthologues. We then performed a four-way alignment with the mouse-rat sequence and reimplemented the sliding window analysis.

As can be seen (Figs. 2a and b) only two of the seven show both K A>K S classified as a K S dip at the same intragene location in mouse-rat as in human-dog (for the other five see Supplementary Fig. 5; for details of orthologues see Supplementary Table 1). The two genes with repeatable K S dips are retinoid X receptor interacting protein 110 (Rxrip110: dip at ∼600 nucleotides in alignment) and chloride channel CLIC-like 1 (Clccl: dip at ∼1200 nucleotides in alignment). Only in the former case is the K S dip in the human-dog comparison a strictly defined dip. These two constitute the strongest evidence that we have that some deterministic force is constraining K S in these localized subdomains.

Fig. 2
figure 2

Three genes with repeatable K S dips at K A/K S peaks. a Retinoid X receptor interacting protein 110 (dip at ∼600 nucleotides in alignment). b Chloride channel CLIC-like 1 (dip at ∼1200 nucleotides in alignment). c Nuclear autoantigenic sperm protein (Nasp; dip at ∼1300 nucleotides in alignment). Mouse/rat sequences are shown in gray; human/other, in black. K S is a solid line: K A, a dotted line.

If we extend this final analysis to include those K S dips that are significant and more generously defined (of which there are 18 individual dips, including the 7 strictly defined ones, for which we can obtain informative four-way alignments), we find one more instance of repeatability, this being in nuclear autoantigenic sperm protein (Nasp) (Fig. 2c). In another case, Ltbpl, the K S shows striking reduction in the same domain in both comparisons, but only in the mouse-rat analysis does K A exceed K S (Supplementary Fig. 6g). Thus, we find that only 3, possibly 4, of 18 K S dips show repeatability of the dip (see Supplementary Table 1, Supplementary Fig. 6). This finding suggests that K S dips at K A/K S peaks can at best be a weak guide to domains of interest, if without the support of independent confirmation. Note, however, that our dual criteria of both statistical significance and repeatability may be too stringent and provide false negatives. For example, the peak in Brca1, associated with splice control, while repeatable in at least two independent contrasts, is not statistically significant, owing to the high variance in K A and K S of this gene.

KS Dip Domains Are Not Associated with Low Synonymous SNP Counts

While the lack of repeatability of K S dips is strongly suggestive of spurious significance, perhaps those that are nonrepeatable may yet be under selection, but just in rodents? To address this possibility, we assessed the SNP density within our critical windows. If there is an element within our critical windows at which selection is strong enough to reduce K S, then we would expect the SNP density within this region also to be reduced.

SNP data for all genes in our sample were obtained from dbSNP at NCBI. This provides data on the location of synonymous SNPs within our genes (but not their frequency). We could then determine the number of synonymous SNPs within the critical K S dip windows. We then compare this number to the number expected in and out of the critical windows using a chi-square test, given the number of SNPs in the sample as a whole and the relative proportion of sequence contained within the K S dip windows. The test was repeated for the more stringent definitions of our critical windows. We also employed two different nulls, one employing the SNP density across all genes in the sample and a second applying the SNP density across only those genes within which we find K S dips. Given the possibility of between genes deterministic differences in SNP density, the second is probably the more stringent.

Although the observed numbers of SNPs within our windows is consistently lower than the predicted value, this value was only significant in the most liberally defined set (generous dip definition including nonsignificant peaks) when applying a null derived from the complete gene set. We would also expect that the chi-square value would increase with the stringency of the definition, but this was not observed (Table 1). Moreover, if we use the number of synonymous SNPs seen in the genes containing the K S dips as the basis for the null expectation, then we see no significant reduction in the K S dip domains. We conclude that we find no evidence that domains identified as K S dips have unusually low SNP rates, be they significant or not.

Table 1 Chi-square analysis of SNP density in genes associated with K S dips: (A) comparison of the grouped “critical” windows to a grouped set of their parent genes; (B) comparison of the grouped “critical” windows to the whole data set

Discussion

We find that K A /K S intragenic peaks are relatively common, occurring about once for every 10,000 bp of exonic sequence, this figure being dependent on the window size employed. Many of these, if not most, are not statistically significant and may be regarded as spurious. When all the peaks, be they significant or not, are divided into those that are likely owing to regional increases in rates of protein evolution or regional reduction in the rate of synonymous evolution, unexpectedly we find an excess of the latter. Employing stricter definitions of the form of the peak and requiring the peak to be significantly high does not alter this conclusion but does render the number of incidences much lower. Assuming that our sample of well-described, long genes is representative of genes more generally, this suggests that it is unwise to make the inference that a K A/K S > 1 peak is in and of itself indicative of a region of positive selection. The lack of evidence for repeatability and the lack of evidence for selection from SNP data also suggest that many, if not most, of those few K S dips that are statistically significant are also possibly spurious.

In sum, from over 1000 long genes we have found only three examples of K S dips for which we can provide prima facie evidence that the low synonymous substitution rate associated with the dip requires a special explanation (other than spurious noise). Can we say anything about the cause of the reduced K S in the three repeatable cases? An association between alternative splicing and high K A/K S (Chen et al. 2006; Ermakova et al. 2006; Plass and Eyras 2006; Xing and Lee 2005a, 2005b, 2006a, 2006b) and low K S (Parmley et al. 2006) has been suggested in several studies. Might there be a correlation here as well? Given the difficulties associated with providing any definitive statements as to the presence/absence of alternative transcripts, to investigate this we examined three resources: the alternative splicing database (http://www.ebi.ac.uk/asd/) (Stamm et al. 2006), the alternative transcript database (http://www.ebi.ac.uk/atd/) (Le Texier et al. 2006), and Ensembl. In two cases (Rxrip 110 and Nasp) we find that the repeatable K S dips are in “cleanly” described alternative exons, by which we mean that there are long transcripts having nearly all the same exons, but excluding the K S dip-containing one (Table 2, Supplementary Fig. 7). In the other case (Clcc1) we find evidence for a long transcript that omits the final two exons, the K S dip being in the last but one exon (Supplementary Fig. 7). Analysis of the orthologous human sequences, within the same reference databases, reveals evidence that this alternative splicing is maintained in the three human genes (data not shown).

Table 2 Association of the exon containing or spanned by the statistically significant K S dip domains with alternative exons

In rejecting the 15 statistically significant but nonrepeatable examples, might we have been too stringent? Might it be the case that the SNP data are too sparse to be informative and the repeatability assay unnecessarily restrictive, excluding real examples of selection unique to rodents? One way to address this is to test for particular mechanisms by which selection acts on synonymous sites. Such tests are by necessity weak, as they require the majority of dips to be associated with the same form of selection. We shall, however, examine the two dominant and related explanatory variables: alternative splicing and splice control. For all the genes with significant peaks we hence scrutinized the same alternative transcript resources as above to look for an association with alternative splicing (Table 2). While there is evidence that the three repeatable cases are associated with alternative exons, we find no similar evidence to imply that the remainder are. For a few of the genes there is evidence for very small transcripts missing most exons (including the one with the K S dip) but no evidence for any association with cleanly described alternative exons. To further check we also consulted the Hollywood database of alternative splicing (http://www.hollywood.mit.edu/Login.php) and, again, found no evidence for an association between the nonrepeatable dips and alternative exons (data not shown).

Perhaps the K S dips are not the result of alternative exons but are more generally owing to selection on synonymous mutations associated with splicing control, as evidenced by the reduced SNP density and synonymous rates of evolution in exonic splice enhancers (ESEs) near intron-exon boundaries (Carlini and Genut 2006; Fairbrother et al. 2004a; Parmley et al. 2006). Are, then, the K S dip domains enriched for splice enhancer elements, and are they associated with intron-exon boundaries? To deduce whether the windows defined by K S dips have significantly greater proportions of ESE than we would expect, we compared the windows in which the peak in K A/K S occurred, that are putative K S dips, with all other windows from the same gene. We find no evidence for enrichment of K S dip regions with splice enhancer hexamers (for significant K S dips, Z = 0.09 ± 1.08, p > 0.339). The same is true if we analyze all possible K S dips, be they significant or not (mean Z = 0.03 ± 1.08, p = 0.308). Similarly, for those K S dips in which most of the sequence is near an intron-exon junction (>90% within 70 bp), we see no evidence for enrichment in ESEs compared with other windows in the same gene equally close to junctions (mean Z = 0.04, p >> 0.05).

Are K S dip domains especially close to intron-exon junctions? We determined the proportion of the window containing the K A/K S peak, due to reduced K S, that was within 70 nucleotides of any intron-exon boundary, this being thought to be the approximate span of splice regulating elements. To determine whether there was any skew compared to that we would expect by random chance, a simulation was employed. Each critical window was compared to 100 randomly sampled windows from the same gene. A p-value was determined as the fraction of randomly sampled windows that had a greater than or equal proportion of sequence within 70 bp of an intron-exon boundary, compared to the critical window. Of the 128 dips under the broadest definition, seven reside closer to boundaries than expected by chance, at p=0.05. None of these are significant peaks. Note too that the proportion of p-values falling below 0.05 was 7/128 = 0.054: more or less as might be expected by chance (we confirmed this also by simulation using a randomly picked window as a pseudo-K S dip; data not shown).

From the above tests we surmise that there is little reason to suppose that the K S dips that we rejected as probably being spurious were falsely rejected. Employing the direct tests, however, we found one noteworthy aspect to the data: those K S dip windows with no sequence within 70 bp of an intron-exon boundary avoid the center of the exon (Fig. 3). While the dip test for bimodality (Hartigan and Hartigan 1985; Hartigan 1985) fails to reject the null of unimodality, there is a tendency for the K S dip windows to be non-randomly located. If we section the interior of the exon into quarters, we can confirm this skew by chi-square analysis. If peaks were to occur evenly throughout the exon (stochastic variance), then we would expect 6.25 ratio peaks per quarter; what we actually see is a distribution of 12:8:4:1, χ2 = 11.00, p < 0.001. In addition, there is a strong correlation observed between the position of the window within the exon and the position of the window within the gene (see Supplementary Fig. 8, Spearman rank correlation = −0.5523, p = 0.0042): exons near the beginning of the gene tend to have ratio peaks in the 3′ region, whereas those exons nearer the 3′ end of genes tend to have peaks at their 5′ ends. Why this might be is far from transparent. The windows away from boundaries are not especially enriched for exonic splice enhancers: of 42 K S dips, 22 have a higher ESE density than the other windows from the same gene, and the remaining 20 having a lower density (Sign test, p = 0.88).

Fig. 3
figure 3

The relative location of the center of the K S dip window as a function of proximity to intron-exon junctions for those windows with no part of the sequence within 70 bp of an intron-exon window.

From the above we conclude that, after attempting to make some allowance for problems inherent in sliding window analysis, domains for which we can find coherent evidence that synonymous mutations are under selection appear to be rare. One could equally well conclude that sliding window analysis, while a common method (see, e.g., Endo et al. 1996; Fares et al. 2002; Gissen et al. 2005; Huttley et al. 2000; Talbert et al. 2004) and featured in several computer applications (e.g., Fares 2004; Filatov 2002; Liang et al. 2006; Rozas and Rozas 1999), is both a weak and hard-to-defend mode of analysis. It is striking that for so many of the K A/K S > 1 peaks we cannot eliminate the possibility of spurious occurrence owing to multiple sampling. That there are more K A/K S > 1 peaks resulting from lowered K S (spurious or otherwise) than raised K A also suggests that it is very unwise to take K A/K S>1 peaks as prima facie evidence for positive selection.

Recently, there have also been a number of genome scans to identify positive selection using polymorphism data alone or in addition to divergence data (e.g., Carlson et al. 2005; Hanchard et al. 2006; Hutter et al. 2006; Voight et al. 2006). Whether the same false-positive problems apply in these instances remains to be seen. It is also unclear whether similar problems will affect more sophisticated sliding window analyses. For example, Liang et al. (2006) have developed a sliding window K A/K S procedure that allows windows to be defined by reference to the three-dimensional structure of the protein. Given our results, we would suggest that, to be conservative, even in this more directed approach, interpretation of an intragenic K A/K S ratio >1 is best treated with caution.