Background

The standard genetic code includes three stop codons, UAG, UAA and UGA, that signal the end of translation. In eukaryotes termination of translation involves two proteins, eRF1 and eRF3, termed release factors. When a stop codon is within the A site of the ribosome it is recognized by eRF1. The nascent polypeptide is then released from the ribosome in a process mediated by a ternary complex of eRF1, eRF3 and guanosine triphosphate (GTP; Hellen 2018). Although all three stop codons can be recognized by eRF1, the efficiency of translation termination differs significantly between them, ranging from UAA (highest fidelity of translation termination) to UGA (lowest), with UAG being intermediate (Dabrowski et al. 2015). The altered conformation of the ribosome accommodates the nucleotide immediately downstream of the stop codon (known as the +4 nucleotide) within the A site (Brown et al. 2015) and this nucleotide can also have a substantial impact on the efficiency of translation termination (Cridge et al. 2018; Dabrowski et al. 2015). Nucleotides further downstream of the stop codon and nucleotides upstream of the stop codon can also affect translation termination (Cridge et al. 2018). Failure to terminate translation at the stop codon is known as stop codon readthrough and some human genes have been found to have stop codon readthrough rates as high as 17% (Loughran et al. 2017).

Consistent with the higher efficiency of translation termination associated with purines at the +4 position, G is over-represented at this position in mammalian genes (McCaughan et al. 1995), particularly in highly expressed genes (Tate et al. 1995). By contrast, despite having the lowest efficiency of translation termination among the three, UGA is the most common stop codon in many multicellular eukaryotes (Sun et al. 2005). The relative frequencies of the three stop codons are dependent on multiple factors and strongly associated with variation in GC content of the coding sequence (Trotta 2016). The frequency of UAA is strongly negatively correlated with GC content, while the use of UAG and particularly of UGA increases with increasing GC content. This suggests a mutational origin for the variation in stop codon use, as mammalian GC content may be largely determined by mutational effects, reflecting variation in deoxynucleoside triphosphate (dNTP) abundance across S-phase that favors the incorporation of A and T nucleotides in late-replicating DNA (Kenigsberg et al. 2016). However, the relationship with GC content is less strong for stop codon use than for sense codons and this has been interpreted as evidence that selection also contributes to the choice of stop codon (Trotta 2016). This is further supported by the associations that have been reported between protein function and stop codon use (Trotta 2016).

Although a preference for efficient translational termination could explain variation in stop codon preference between different gene classes, it does not explain why UGA, the least efficient stop codon, remains the most frequent stop codon in many multicellular eukaryotes, including human, in which it occurs in close to 50% of protein-coding genes. This suggests either that the difference in fitness resulting from differences in the efficiency of translation termination are small or, alternatively, that stop codon readthrough may have useful functional consequences. Programmed readthrough, which refers to readthrough that contributes to biological function, was discovered in viruses and enables their typically compact genomes to increase their coding capacity (reviewed in Firth and Brierley 2012). Only a limited number of cases of functional readthrough have been confirmed in higher eukaryotes; however, the application of genome-scale techniques has recently suggested over a hundred human genes as candidates for functional readthrough (Schueren and Thoms 2016).

Stop codon readthrough may also have an important regulatory role, potentially involving mechanisms that degrade proteins resulting from readthrough translation (Arribere et al. 2016). Yordanova et al. (2018) recently proposed an intriguing model whereby low-level readthrough of a stop codon may play a role in gene regulation and mRNA quality control by placing a constraint on the total translational capacity of an mRNA. Under this model, ribosomes that continue past the stop codon form a queue, backing up from a downstream ribosome stalling site. The rate of stop codon readthrough together with the length of the interval between the stop codon and the stall site control the number of times the mRNA can be translated before the ribosome queue backs up as far as the stop codon, inhibiting translation. Currently it is not known how widespread this mechanism is; however, if it is common it may have an impact on stop codon use, as the different readthrough efficiencies of the different stop codons would have implications for the number of times the mRNA is translated. Termination of translation is a slower process than the addition of an amino acid to the nascent peptide; hence, stop codons themselves are often used as ribosome stalling sites. Such stalling may affect mRNA stability via NoGo decay (Doma and Parker 2006) (i.e., the decay of mRNA associated with stalled ribosomes), but may have other important functions as in the above example or in the example of the XBP1 gene where it is required for its unusual cytoplasmic mRNA splicing (Yanagitani et al. 2011). Yet another regulatory event that involves stop codons is programmed ribosomal frameshifting (Atkins et al. 2016) that often takes place at stop codons, e.g., +1 frameshifting in all three human antizyme genes (OAZ1, OAZ2 and OAZ3) takes place at stop codons and their identity is highly conserved. These, or as yet undiscovered functional implications of stop codon choice, may provide a selective explanation for the markedly high abundance of UGA stop codons in multicellular eukaryotes.

Here we set out to assess the extent of purifying selection affecting stop codon evolution in mammals. We extended models of codon sequence evolution that have previously been used to assess selection acting on coding sequences (Anisimova and Kosiol 2009) to encompass the stop codon and fitted mixture models to estimate both the strength of purifying selection and the proportion of genes affected by purifying selection acting on the stop codon. Our model enabled us to assess the statistical evidence for selection for individual genes. Genes with conserved stop codons showed striking characteristics, including longer \(3^{\prime }\) UTRs and shorter mRNA half-life compared to other genes. Stop codon conservation was more strongly associated with the selective constraints acting on the coding sequence than with the GC content of the gene, suggesting a selective, rather than a mutational origin for the variation in stop codon use with GC content.

Results

Model

Standard codon models are characterized by a \(61\times 61\) transition rate matrix, \(\mathbf{{Q}},\) that gives the instantaneous rate of transition between each pair of sense codons (Muse and Gaut 1994; Goldman and Yang 1994; Delport et al. 2009; Anisimova and Kosiol 2009). Typically, these models assume that codons evolve through single nucleotide substitutions, according to a continuous-time Markov process, so that the instantaneous rate of transition between codons differing at more than one position is zero. Here we extend models of codon evolution by proposing a full \(64\times 64\) rate matrix that includes the three stop codons. As implemented here, we set the rate of substitutions between sense and stop codons to zero. Although the point at which translation terminates may shift (resulting from mutations between sense and stop codons), we consider only aligned sequences with the stop codons positionally homologous to the end of the alignment and assume correctness of the sequence alignment. The model can be modified easily to allow mutations between stop and sense codons by the addition of parameters corresponding to the rates of these substitutions. Note that the stop codon UAA is accessible by a single base mutation from both of the other stop codons (UAG and UGA); however, the instantaneous rate of transition between UAG and UGA is zero, as it requires two single nucleotide substitutions. We note also that, unlike standard codon models, the transition probability matrix obtained by exponentiating our rate matrix is not irreducible and does not, therefore, have a unique stationary distribution (a chain starting with a sense/stop codon will remain a sense/stop codon). Several forms have been proposed for the generator matrix of codon substitution models, differing mainly in how differences in codon usage are modeled. The model of Muse and Gaut (1994) sets the rate of substitution from codon i to codon j (which differ at a single nucleotide position, k) to be proportional to \(\pi _{j_{k}},\) the equilibrium frequency of the nucleotide at position k of codon j. We follow this approach for all of the results presented in the manuscript, as it has been found to be less prone to detecting spurious context-dependent effects than the version of Goldman and Yang (1994), which sets the substitution rate to be proportional to the equilibrium frequency of codon j (Lindsay et al. 2008). The entries, \(q_{ij},\) of the rate matrix of our model are therefore:

$$q_{ij}=\left\{ \begin{array}{ll} \pi _{j_{k}} & {\text {synonymous transversion}}\,(i, j \in S) \\ \kappa \pi _{j_{k}} & {\text {synonymous transition}}\,(i, j \in S) \\ \omega \pi _{j_{k}} & {\text {nonsynonymous transversion}}\,(i, j \in S) \\ \omega \kappa \pi _{j_{k}} & {\text {nonsynonymous transition}}\,(i, j \in S) \\ 0 & {\text {if }}i{\text { and }}j{\text { differ at }}>1{\text { position}}\\ \phi \kappa \pi _{j_{k}} & {\text {synonymous transition}}\,(i, j \in N) \\ 0 & i \in N \oplus j \in N, \end{array}\right.$$
(1)

where S and N are the sets of sense and stop (or nonsense) codons, respectively (note that all mutations between stop codons are transitions). The last two conditions represent the extension of the model to accommodate stop codons. A parameter, \(\phi ,\) models the substitution rate between stop codons, relative to the rate of synonymous substitutions between sense codons. The last condition assigns a zero rate for substitutions between sense and stop codons (the exclusive OR means that a zero rate applies if exactly one of i and j is a stop codon). Although a parameter can easily be added to allow mutations between sense and stop codons (resulting in a shift in the stop codon position) as presented above the model consists of two subsets of intercommunicating states (corresponding to the sense and stop codons). This model could be fitted as a combination of a standard codon model and a \(3\times 3\) rate matrix for the stop codons; however, the model parameters would need to be estimated jointly and therefore we consider that this is best represented as a single reducible Markov process.

We implemented this model in R [45], optimizing parameters using the Nelder–Mead (1965) method. We simulated data under the model and found that we could recover simulated values of \(\phi ,\) the parameter of interest, with no evident bias (Fig. S1a; slope = 1.008, estimated with robust regression, using an M estimator). We also simulated data under the Goldman and Yang (1994) model, using empirical triplet frequencies which we obtained from intron sequences, to investigate if we could recover \(\phi ,\) when the simulated data were generated with a different version of the codon model to the one used for inference. We found that \(\phi\) could still be recovered from the simulated data with little bias (Fig. S1b; robust linear model slope = 1.04; see “Methods” section for details of the simulations).

The model we present above essentially leverages the rate of synonymous substitution in the coding region to infer the effects of selection on substitutions between synonymous stop codons. However, it is possible that the parameters of the model are influenced by selection acting on synonymous mutations within the coding region. This could result from selective codon use (reflecting differences in the abundance of different tRNA species) or selection against mutations that alter mRNA splicing, which can reduce the rate of synonymous substitution. Although the latter effect is likely to make our method conservative (see “Discussion” section), it is possible that the properties of the coding sequences distort our parameter estimates resulting in bias in our inference of selection affecting mutations between synonymous stop codons. Therefore, we developed an alternative approach to inferring selection acting on substitutions between synonymous stop codons based on alignments of intron sequences from the same gene (see “Methods” section for details). This model makes use of appropriately scaled branch lengths and model parameters (e.g., the transition to transversion rate ratio) estimated from the intron to model stop codon evolution. In this case \(\phi\) is the rate of synonymous substitutions between stop codons relative to the rate of transitions in the intron (accounting in both cases for unequal nucleotide frequencies). Although this method does not make any use of the coding sequence alignment (other than the stop codon) estimates of \(\phi\) obtained in this way were close to and well correlated with estimates obtained using the coding sequence alignment (Fig. S2). Because all mutations between synonymous stop codons are A \(\leftrightarrow\) G, it is possible that differences in rates between different synonymous substitution types could bias our results; however, we found that results obtained from the intron sequences using the GTR model (Tavaré 1986) were very similar to results obtained using the HKY model (Hasegawa et al. 1985), upon which the MG model and, by extension, our model are based (Fig. S3).

Proportion of Stop Codons Under Purifying Selection

To estimate the proportion of stop codons evolving under the influence of purifying selection, we fitted our stop-extended codon model to the codon-aware alignments of mammalian orthologues, obtained from the OrthoMaM database (Douzery et al. 2014), using a mixture distribution for the \(\phi\) parameter. The mixture distribution consisted of two point masses, one with \(\phi\) a free parameter (constrained to be \(< 1\)), corresponding to alignments with a stop codon evolving under purifying selection and another with \(\phi\) fixed at 1, corresponding to neutral evolution—i.e., substitutions between stop codons occurring at a rate consistent with the rate of synonymous substitutions in the coding region. We then used maximum likelihood to estimate the two free parameters of this mixture model (the \(\phi\) parameter for the constrained stop codons and the mixture weight parameter). Note that we are using the synonymous substitution rate to approximate the rate of selectively neutral substitution (see ““Discussion” section)” for caveats).

We estimated that 57% (Fig. 1) of mammalian stop codons are evolving under relatively strong (point mass for \(\phi\) at 0.24) purifying selection. Using simulation we found that this estimate is not strongly dependent on modeling assumptions (Fig. S4). To investigate differences in selection pressure between genes with each of the three stop codons, we separated the genes into three groups, depending on which stop codon was found in the human gene. When we analyzed these groups separately, we found that a substantially higher proportion of UGA stop codons are under selective constraint, compared to UAG and UAA stop codons. Bootstrapping over orthologue families suggests that the higher frequency of conservation in UGA stop codons is highly robust to sampling error (Fig. 1).

Fig. 1
figure 1

Mixture model results. Estimated proportions of stop codons under purifying selection for all genes (black) and for genes with UGA, UAG and UGA shown in brown, yellow and orange, respectively (or umber, amber and opal, according to the color nomenclature for stop codons). Each histogram was derived from 1000 bootstrap replicates of the data (Color figure online)

Identification of Genes with Conserved Stop Codon Use

We also fitted our extended codon model to each orthologue family independently, estimating a separate value of \(\phi\) for each gene. The rate of evolution of stop codons varied widely (Fig. S5). For 3642 orthologue families (29.5% of those included in the analysis) the stop codon was completely conserved across the mammalian phylogeny, resulting in maximum likelihood estimates of zero for \(\phi ,\) while for some other genes the point estimate of \(\phi\) was greater than one. An example of a gene in each category is provided in Fig. S6. Of the 3642 completely conserved stop codons, 2585, 667 and 390 were UGA, UAG and UAA stop codons, respectively.

To assess the evidence for selection acting on stop codon use at the level of individual genes we used a likelihood ratio test, comparing the likelihood of the Null model with \(\phi = 1\) to the maximum likelihood of the alternative model with \(\phi\) as a free parameter, bounded above by 1. For data simulated under the Null model the likelihood ratio test (LRT) statistic (twice the difference in the log likelihood between the null and alternative models) matched the expected \(\chi ^{2}\) distribution, with one degree of freedom (Fig. S7a). The fit with the \(\chi ^{2}\) distribution was less good when we simulated data using a model based on triplet nucleotide frequencies from introns (Fig. S7b; see “Methods” section for details); however, even in this case the proportion of simulations for which the LRT statistic exceeded the critical value and for which \(\phi\) was less than one was 0.058, approximately equal to the significance level (\(\alpha =0.05\)) applied. All genes for which the LRT statistic exceeded the critical value and for which the maximum likelihood estimate, \({\hat{\phi }},\) was less than one were considered as putatively under selective constraint. Using these criteria 27% of genes showed evidence of purifying selection acting on stop codon preference. We caution that some of these may be false positives, given the 5% significance threshold applied. The lower proportion (27%) of genes with conserved stop codons detected using the likelihood ratio test applied to individual genes, compared to the estimate from the mixture model, likely reflects limits of the power to reject the null hypothesis for individual alignments, as the null hypothesis was not rejected for some of the alignments with completely conserved stop codons. For 2881 of the 3642 genes with completely conserved stop codons, we reject the null hypothesis of neutral evolution (the null hypothesis was also rejected in favor of purifying selection for a further 447 genes with some stop codon substitutions). Failure to reject the null hypothesis even when the stop codon was completely conserved across the phylogeny was more likely to occur for UGA (neutral model rejected for 78% of completely conserved stop codons) or UAG (73%) stop codons than for UAA (95%) stop codons. The higher power for UAA results from the fact that UAA may mutate to UGA or UAG in a single point mutation, reducing the likelihood of passive conservation, relative to UGA and UAG, which can only mutate via single point mutations to UAA. Differences in power between stop codons means that the frequency of purifying selection cannot be compared between the stop codons on the basis of individual tests; however, this does not undermine the comparison based on the mixture model as described above, as the latter can accumulate evidence of frequent purifying selection across genes.

Properties of Genes with Conserved Stop Codons

We compared the following properties between genes with conserved and non-conserved stop codons: \(3^{\prime }\) UTR length, \(5^{\prime }\) UTR length, downstream open-reading frame length, coding sequence length, mRNA half-life, \({\text {d}}N/{\text {d}}S,\) GC content, gene expression breadth, gene expression level. These properties were compared between all genes and between groups of genes, defined by the stop codon found in humans. Genes with putatively conserved stop codons had several striking features. Notably, they had longer \(3^{\prime }\) UTRs than other genes (median of 1183 bp compared to a median of 877 bp for the remainder of the genes in the dataset; \(p = 1\times 10^{-39}\)). However, the lengths of \(3^{\prime }\) and \(5^{\prime }\) UTRs are strongly negatively correlated with GC content (Pesole et al. 2001). When we fitted a linear model to \(3^{\prime }\) UTR length as a function of stop codon conservation and mRNA GC content we found the effect of conservation remained positive and significant (effect size estimate = 300 bp; \(p = 7\times 10^{-14}\)). Interestingly, the mRNA half-life of genes with conserved stop codons was shorter than that of other genes (Fig. 2). A strong effect was observed only for UGA stop codons with weak effects in opposite directions observed for UAG and UAA stop codons. For genes with a UGA stop codon (in human) half-life remained significantly associated with stop codon conservation when we fitted a logistic regression model with GC content, \(3^{\prime }\) UTR length, coding sequence (CDS) length and the ratio of nonsynonymous to synonymous substitution rates, \(\omega ,\) as predictor variables (\(p = 0.0004\)). Both the expression level and breadth of the stop-conserved genes were significantly higher than that of other genes (\(p = 1\times 10^{-8}\) and \(2\times 10^{-10},\) respectively, using a Wilcoxon rank sum test). When these effects were investigated separately in genes with different stop codons (in human) they remained strongly statistically significant in genes with UGA and UAA stop codons but not in genes with UAG stop codons.

Fig. 2
figure 2

Relationship between stop codon conservation and mRNA half-life and coding sequence conservation. Estimated probability of stop codon conservation (and 95% confidence interval) as a function of a mRNA half-life and b \(\omega\) for each stop codon type. Conservation is based on model comparison in a and on complete sequence conservation across the alignment in b. Estimates are from a logistic regression model, which included the number of taxa for which the stop codon was positionally homologous with the end of the alignment as a covariate

Model-Free Analysis Supports a Major Role for Selection in Shaping Stop Codon Use

As an alternative to the model-based approach to defining conserved stop codons we investigated the characteristics of genes for which the stop codon was completely conserved across the entire alignment, regardless of whether there was sufficient evidence from the model to reject neutrality. We found that \(\omega ,\) the ratio of nonsynonymous to synonymous substitution rates, was strongly associated with stop codon conservation, with genes with low \(\omega\) values (consistent with strong purifying selection acting on the CDS) having a much higher probability of having a stop codon that was completely conserved across the alignment (Fig. 2; this effect was also evident for the model-based conserved set). We fitted a logistic regression model, treating complete conservation of the stop codon across the alignment as the outcome and with GC content, stop codon (in human) and \(\omega\) as predictors and including the interaction between stop codon and \(\omega\) (i.e., allowing for different \(\omega\) effects for different stop codons). There was a striking effect of \(\omega\) on the probability that a stop codon was conserved (Fig. 2). This suggests that conservation of the stop codon is influenced by selection, as genes with low values of \(\omega\) are under strong selective constraint. Moreover, when we fitted an equivalent model but with GC content and stop codon (in human) and included interaction terms, none of the interactions between stop codon and GC content were significant, revealing no difference in the relationship with GC content between the stop codons. Given the large variation in mutational patterns as a function of GC content (Kenigsberg et al. 2016) this suggests that variation in mutation patterns is not the cause of differences in stop codon conservation. Further evidence that stop codon use is influenced by selection, rather than mutational effects comes from analysis of the frequency of each stop-associated triplet in the \(3^{\prime }\) UTR. The frequency of UAG, UGA and UAA in the \(3^{\prime }\) UTR in human (across all three forward frames) was 23%, 38% and 38%, respectively, suggesting that the excess of UGA over UAA stop codons is not mutational in origin, although the low abundance (22% in human) of UAG stop codons may be a mutational effect.

Characteristics of the set of genes with stop codons conserved across the mammalian alignment (the sequence-based set) were similar to those of the genes identified using the model (the model-based set) and, indeed, the majority (70%) of the genes that occurred in either group occurred in both. The sequence-based conserved gene set also had significantly longer \(3^{\prime }\) UTRs. The mRNA half-life effect was more striking. In a logistic regression model with membership of this set as the outcome variable and including GC content, \(3^{\prime }\) UTR length, CDS length, mRNA half-life and \({\text {d}}N/{\text {d}}S\) ratio as predictors, the mRNA half-life coefficient was significantly different from zero for the complete set of genes (\(p = 1\times 10^{-8}\)) as well as for the genes with UGA or UAG stop codons in human (\(p = 2\times 10^{-6}\) and 0.01, respectively), but not for genes with UAA stop codons in human (\(p = 0.68\)).

Nonsynonymous But Not Synonymous Divergence Strongly Predicts Conservation of the Stop Codon

To investigate further whether conservation of the stop codon results from purifying selection or from a lower mutation rate or random chance we assessed the relationship between the probability of stop codon conservation and synonymous/nonsynonymous divergence. We obtained the number of nonsynonymous substitutions per nonsynonymous site (\({\text {d}}N\)) and the number of synonymous substitutions per synonymous site (\({\text {d}}S\)) for human–mouse orthologues from Aken et al. (2017). Although, we already report above that the \({\text {d}}N/{\text {d}}S\) ratio (i.e., \(\omega\)) is predictive of stop codon conservation, these data allowed us to investigate the relationship with \({\text {d}}N\) and \({\text {d}}S\) separately, using values calculated independently of the alignments analyzed here. We found that \({\text {d}}N\) was highly predictive of complete stop codon conservation but \({\text {d}}S\) was only weakly predictive (Fig. 3). On average synonymous substitutions are under much weaker selection than nonsynonymous substitutions and \({\text {d}}S\) is therefore much more reflective of the underlying mutation rate than is \({\text {d}}N.\) Our observation that \({\text {d}}S\) is a much weaker predictor of stop codon conservation than \({\text {d}}N\) suggests that a lower mutation rate is not sufficient to explain conservation of the stop codon across mammals. It is possible that weak relationship between \({\text {d}}S\) (human–mouse) and stop codon conservation is due to saturation of synonymous substitutions between human and mouse, given the relatively distant divergence of these species. Therefore, we also fitted the same model to complete stop codon conservation as a function of divergence values with macaque, which diverged from humans much more recently. Again \({\text {d}}N\) was strongly associated with stop codon conservation (at least for UGA and UAA stop codons), while \({\text {d}}S\) was not (Fig. 3).

Fig. 3
figure 3

Stop codon conservation and nonsynonymous and synonymous distance. Estimated probability of complete sequence conservation (and 95% confidence interval) as a function of a \({\text {d}}N\) between human and mouse (\(n = 12,266\)), b \({\text {d}}S\) between human and mouse (\(n = 12,266\)), c \({\text {d}}N\) between human and macaque (\(n = 12,083\)) and d \({\text {d}}S\) between human and macaque (\(n = 12,083\)). In all cases the x-axis is truncated to 1, as most of the divergence values are lower than this. The number of taxa for which the stop codon was positionally homologous with the end of the alignment was included as a covariate in the model

Discussion

We set out to determine the extent to which alternative stop codon use is affected by purifying selection in mammalian genes. By extending models that were developed to understand the selection pressures acting on amino acid encoding sequences, we estimated that the stop codon is subject to purifying selection in a large proportion (approximately 57%) of mammalian genes. The proportion under selection varies between the stop codons and is highest for genes with UGA stop codons (Fig. 1). UGA is by far the most common stop codon in human and many other multicellular eukaryotes (close to 50% of human protein-coding genes have a UGA stop codon). Given the high rate of purifying selection affecting UGA stop codons we propose that the predominance of UGA codons is a result of selection rather than mutation. Stop codon use is strongly associated with GC content (Trotta 2016) and large-scale variation in GC content across genomes has a mutational origin (Kenigsberg et al. 2016). However, if the high abundance of UGA codons was mutational in origin we would expect that UGA codons in regions with high GC content would be much more likely to be conserved than UGA codons in low GC regions, given the strong relationship between GC content and stop codon use. This was not the case, as we found only a weak relationship between the probability of complete sequence conservation and GC content for all three stop codons.

Stop codon conservation may result from gene regulatory mechanisms that have a preference for the use of a specific stop codon. These mechanisms may include the control of translation capacity of mRNA molecules (Yordanova et al. 2018) and translational pausing, which has previously been associated with localization of the translating ribosome (Yanagitani et al. 2011). Strong enrichment of UGA among conserved stop codons suggests that UGA may frequently be the preferred codon in such cases (71% of completely conserved stop codons were UGA, compared to 50% UGA among all human protein-coding genes). In principle, it is possible that some cases of purifying selection acting on stop codons result from missed examples of UGA codons that encode the amino acid selenocysteine, rather than signaling the end of translation. However, given the small number of genes encoding selenoproteins (Kryukov et al. 2003) and the large number of conserved UGA stop codons, this is very unlikely to explain a substantial proportion of the conservation we report.

The use of the synonymous substitution as a proxy for the rate of neutral evolution has been criticized, as it is known that synonymous substitutions may have functional consequences (Caceres and Hurst 2013; Chamary and Hurst 2005; Carlini and Genut 2006; Ngandu et al. 2008; Sauna and Kimchi-Sarfaty 2011). Codon models that include a variable synonymous substitution rate have been developed (Pond and Muse 2005; Rubinstein et al. 2011). By not incorporating variability in the synonymous substitution rate in the coding region we are effectively comparing the rate of synonymous stop codon substitutions to their expected rate, given the mean rate of synonymous substitution in the coding region. Given the extent of purifying selection acting on synonymous sites, the mean synonymous substitution rate is likely to be an underestimate of the neutral rate of evolution. However, because our objective here is to study purifying selection affecting synonymous stop codons underestimation of the neutral rate would make our method conservative (we would miss some genes under purifying selection, but the underestimate of the neutral rate should not cause false positives). We also observed many genes for which the maximum likelihood estimate of \(\phi\) was greater than one (including PARP1, shown in Fig. S6). It is possible that some of the genes with stop codons evolving at a rate greater than expected, given the synonymous rate, result from purifying selection acting on mutations between synonymous sense codons, but it is also possible that there is diversifying selection acting on stop codon use in some genes. However, the number of genes with \(\phi\)significantly greater than 1, was not more than we expected by chance (260 from a total of 12,336 genes at a significance threshold of \(\alpha =0.05\)). Using a very different method to ours, Belinky et al. (2018) recently carried out an analysis of stop codon substitutions in a wide range of taxa. Although the majority of the taxa studied were microbial, they included three mammalian species. The authors reported an excess of substitutions to UGA stop codons, which they attributed to positive selection. However, consistent with our findings, they also report widespread purifying selection acting on synonymous stop codon mutations in primates, particularly for UGA (Belinky et al. 2018).

Our method to infer purifying selection between synonymous stop codons using the synonymous substitution rate in the coding region does not consider the impact of selection acting on codon use. Differences in tRNA abundance can result in striking difference in the frequencies of synonymous codons, particularly for highly expressed genes in organisms with large effective population sizes (Gouy and Gautier 1982; Galtier et al. 2018). However, although translational selection on codon use has been reported in mammals (Doherty and McInerney 2013), it tends to be weak due to relatively small effective population sizes (Pouyet et al. 2017; Galtier et al. 2018). The use of the rate of synonymous substitutions in the coding region to approximate the rate of neutral evolution to detect the effect of purifying selection acting on substitutions between synonymous stop codons is supported by the fact that we obtained consistent results when we instead used the rate of substitution in introns (Fig. S2). Although the use of intron sequences circumvents concerns that the rate of synonymous substitutions in the coding region may not be close to the rate of neutral substitution, there are significant advantages to the use of coding sequences. In particular, not all genes have introns, coding regions can be aligned much more accurately and the large-scale datasets of aligned coding regions are more readily available. We caution, however, that at least in its current form our model should not be applied to infer purifying selection on synonymous stop codon substitutions in organisms in which codon use is likely to be shaped by translational selection.

Among the most striking properties of genes with conserved stop codon use was the decreased mRNA half-life for conserved genes with UGA stop codons (Fig. 2). The length of the \(3^{\prime }\) UTRs is known to be inversely correlated with mRNA half-life and the conserved genes had longer \(3^{\prime }\) UTRs; however, the reduced half-life in the UGA genes remained significant, even when we fitted a logistic regression model incorporating \(3^{\prime }\) UTR length and GC content as covariates. Although there may be many possible explanations for the lower half-life of these genes we note that the model proposed by Yordanova et al. (2018) consisting of a mechanism to limit the number of times an mRNA molecule is translated may result in lowered half-life of the mRNAs affected because stalled ribosomes trigger mRNA decay (Doma and Parker 2006). Arribere et al. found evidence of instability of proteins resulting from readthrough in Caenorhabditis elegans and human cells and noted that this has been reported to result in mRNA instability in the HBA2 gene (Arribere et al. 2016; Liebhaber and Kan 1981). Given the apparent ubiquity of protein instability resulting from readthrough, the higher readthrough rate for UGA codons and the shorter mRNA half-life of genes with conserved UGA stop codons, destabilization of mRNA, resulting from readthrough may be a common mechanism of controlling protein abundance. However, selection acting against synonymous mutations between stop codons may have reasons other than readthrough. In this regard we note a recent analysis of stop codon readthrough in Saccharomyces cerevisiae and Drosophila melanogaster (Li and Zhang 2019) that suggests that many stop codon readthrough events may be molecular errors rather than having a specific function.

Previous studies have investigated stop codon sequence conservation, for example as one of the strands of evidence of functional translational readthrough (Jungreis et al. 2011; Loughran et al. 2014; Jungreis et al. 2016). Our study is distinct in that we do not set out to investigate a specific function of conserved stop codons but, instead, to estimate the frequency and strength of selection acting on synonymous stop codon use and to investigate the properties of the associated genes, in general. In principle, inference of stop codon conservation using our extended codon model is preferable to inference based on sequence conservation alone, as the latter does not take into account differences in GC content and mutation rates between genomic regions. Our method explicitly models variation in codon usage, through incorporation of parameters (estimated empirically from the entire CDS alignment) for codon equilibrium frequencies. However, the total tree length and number of taxa in the mammalian orthologue alignments from Douzery et al. (2014) was only just sufficient in many cases and in some other cases insufficient to reject the neutral model, even in the presence of complete conservation of the stop codon across all taxa. As the size of the families of reliably aligned coding sequences increases, the power to estimate accurately the strength of purifying selection acting on synonymous stop codons will increase. This will allow much more accurate identification of the affected genes and represents an example of the value of continued efforts to sequence the genomes of an ever increasing range of organisms and of the curation and alignment of orthologue families.

Conclusions

Our manuscript describes a method to assess the selection pressure acting on mutations between stop codons using the observed rate of synonymous substitution in the coding region or the rate of comparable substitutions in intron sequences. Using mixture models to combine information across alignments allowed us to infer that a large proportion of stop codons are under purifying selection pressure that reduces the rate of substitutions between synonymous stop codons. We conclude that selection plays an important and largely overlooked role in determining stop codon use in mammalian protein-coding genes.

Methods

Model Optimization and Data

We downloaded 14,526 coding sequence alignments for mammalian orthologue families and the corresponding inferred phylogenetic trees from the OrthoMaM (version 8) database (Douzery et al. 2014). These included sequences from 43 completely sequenced mammalian genomes, spanning from platypus to placental mammals. We restricted to the 12,336 families with at least 20 taxa for which the stop codon was included in the sequence alignment and positionally homologous with the end of the alignment. For each sequence alignment, we first re-estimated branch lengths of the phylogenetic trees using a the Muse and Gaut model (1994) with the F1X4 model of codon equilibrium frequencies (MGF1X4), implemented in codonPhyml (Gil et al. 2013). Treating the tree topology and relative branch lengths estimated by codonPhyml as fixed, we then optimized the model in Eq. 1 over the parameters \(\kappa ,\)\(\omega ,\)\(\phi\) and a scaling factor, s,  by which we multiplied all branch lengths of the tree (in practice the scaling factor was typically close to 1—interquartile range 0.96–0.98). The model was implemented in (R Core Team, 2017), and optimized using the optim function with the Nelder and Mead (1965) method. Likelihood ratio tests were used to identify genes with evidence of conserved stop codons, with twice \(\Delta L\) (the difference in the log likelihood of a model with \(\phi\) fixed at 1 and a model with \(\phi\) set to its maximum likelihood value) compared to a \(\chi ^{2}\) distribution with one degree of freedom. Code and data to reproduce our results are available from https://github.com/cseoighe/StopEvol.

Inference of Selection from Intron Sequences

Sequence alignments in Multiple Alignment Format (MAF) for 37 mammals, inferred using the Ensembl Compara (Herrero et al. 2016) EPO pipeline, were retrieved by FTP from Ensembl Compara in October 2019. Using gene structure information obtained from BioMart (Kinsella et al. 2011) and custom Perl scripts we retrieved the genomic coordinates (in the human genome) of the \(3^{\prime }\) most intron of all human protein-coding genes (taking the longest protein-coding transcript for each gene). Based on these coordinates we retrieved alignments corresponding to the last introns of human genes from the MAF files. In each case we excluded the first and last 100 bp of the intron (to reduce the representation of sequences involved in mRNA splicing). Genes for which the last intron was shorter than 1200 bp were not considered. We then selected only the taxa that were also represented in the OrthoMaM (version 8) alignment corresponding to the same human gene. We further removed all positions from the alignment at which at least 10% of the sequences had a gap at that position. We retained only genes for which there remained at least 1000 bp of aligned sequence and trimmed to the central 1000 bp of each remaining intron alignment, so that the size of the remaining intron alignments was the same for each gene. We selected an arbitrary subset of 500 of these alignments, each including at least 20 taxa. For each alignment we pruned the phylogenetic tree obtained from OrthoMaM to remove species not present in the intron alignment. We then used PhyML (version 3.3) (Guindon et al. 2010), with the HKY and GTR models (for Figs. S2 and S3, respectively) to reestimate branch lengths and model parameters. We converted the resulting branch lengths (in units of substitutions per site) to codon branch lengths (with units of substitutions per codon) by multiplying by a factor of three. We then applied these branch lengths and rate parameters (\(\kappa\) or the A \(\leftrightarrow\) G rate for the HKY and GTR models, respectively) to model the data at the stop codon. In this case \(\phi\) models the rate of substitution between synonymous stop codons relative to the rate of the corresponding mutation type in the intron (transitions in the case of HKY and A \(\leftrightarrow\) G transitions in the case of GTR).

Simulations

We first produced simulated data under the model in Eq. 1 and optimized the parameters of the same model. Coding sequence alignments for 1000 genes (and the corresponding phylogenetic trees) were sampled at random from the OrthoMaM database. We re-estimated the branch lengths of the tree using a codon model (MGF1X4) implemented in codonPhyml (Gil et al. 2013). Codons at the root of the tree were sampled from their equilibrium frequency under the F1X4 model (the F1X4 model sets codon frequencies to the product of the frequencies of their constituent nucleotides). Coding sequences were evolved from the root node over the phylogeny using code written in R. Model parameters were estimated from the simulated data as described above. We also simulated data under a Goldman and Yang (1994) model (GY). The GY model differs from the MG model in that it uses triplet frequencies in place of the equilibrium frequency of the target nucleotide at the mutated codon site. We used empirical target triplet frequencies, derived from intron sequences of the same gene. Intron sequences from human protein-coding genes were derived from Ensembl Genes 94 (Aken et al. 2017). We downloaded cDNA sequences and exon coordinates using BioMart and subtracted the exonic regions from the cDNA sequences. The first 100 bp and last 100 bp of each intron were discarded to reduce the impact of splicing signals on triplet frequencies. Codons were sampled from the intronic triplet frequencies for the root node of each tree and coding sequences were again simulated over the phylogeny.

Mixture Model, Bootstrapping and Simulation

We used a mixture model to estimate the proportion, p,  of alignments for which the stop codon was under purifying selection pressure. Conditional on belonging to this set of alignments the value of \(\phi\) was treated as a free parameter, while \(\phi\) was equal to 1, otherwise. For tractability, we set \(\omega ,\)kappa and the tree scaling parameter to their maximum likelihood values. We performed a bootstrap over alignments to assess uncertainty in the estimates of p and \(\phi .\) To test the accuracy with which the proportion of stop codons under purifying selection could be recovered we performed additional simulations. We simulated 1000 genes with neutrally evolving stop codons (\(\phi = 1\)) and a further 1000 genes with \(\phi\) a normal random variable with mean 0.3 and standard deviation 0.1. Both sets of genes were simulated under a GY model with empirical triplet frequencies derived from intron sequences, as described above. We then performed 100 simulations. For each simulation we sampled 1000 genes from the two sets above, a random proportion (uniformly sampled from 0.1 to 0.8) of which were from the purifying selection set. We then applied our method (using the MGF1X4 model) to estimate the proportion of genes under purifying selection. Despite the substantial differences between simulation and the model the results are highly correlated (\(R^{2} = 0.996\)) and show only a very slight downward bias in the estimates (Fig. S4). We also applied the mixture model using the GY model with the F3X4 model of codon frequencies, but found that this yielded less accurate results, despite being closer to the model under which the simulation was performed (Fig. S8).

Gene Properties and Enrichment Analysis

Sequences of \(3^{\prime }\) untranslated regions (UTRs) for human and mouse were downloaded from Ensembl Genes 91 (Aken et al. 2017). Lengths of UTRs and coding regions were compared between genes that showed evidence of stop codon conservation (\(\phi < 1\) and \(p < 0.05\)) and the remaining genes using Wilcoxon rank sum tests. We performed tests of enrichment using DAVID (version 6.8) (Huang et al. 2009b, a).

Expression Level, Expression Breadth and mRNA Half-Life

Expression level and breadth were calculated using median values, per tissue, of gene transcripts per million (TPM) data from GTEx (version 7; Battle et al. 2017), downloaded on 8 February, 2018. We used the median of the per tissue median TPM values as a measure of expression level and the number of tissues in which the median TPM was greater than 10 as a measure of expression breadth. mRNA half-life data are from Tani et al. (2012).