Introduction

Synonymous substitution rates in angiosperm mitochondrial genes are about 10-fold lower than in nuclear genes (Drouin et al. 2008; Richardson et al. 2013; Wolfe et al. 1987) and approximately 100-fold lower than in animal mitochondria (Palmer and Herbon 1988). This low rate appears to be a derived trait in land plants (Smith 2015). Synonymous substitutions are often used to estimate mutation rates in genes under the assumption that they are selectively neutral (Nei et al. 2010). Mutations in non-coding or nonessential regions might also be expected to be neutral, which would provide a useful comparison with synonymous substitution rates. However, the non-genic regions of land plant mitochondrial genomes expand and rearrange so quickly, and to such an extent, that they are difficult to align outside of very closely related species (Christensen 2013, 2014; Darracq et al. 2010; Kubo and Newton 2008; Mower et al. 2007; Palmer and Herbon 1988; Richardson et al. 2013; Sloan et al. 2012; Smith and Keeling 2015). If the mutation rate in plant mitochondrial genomes is truly low, then why do the non-genic regions diverge so quickly? Part of the explanation may be that synonymous substitutions in angiosperm mitochondria are not selectively neutral and therefore underestimate the mutation rate. If so, resolving the paradox of low mutation rates in genes and high mutation rates in non-coding DNA may require an understanding not only of DNA repair and maintenance mechanisms, but also of the role of selection on synonymous substitutions.

Sloan and Taylor (2010) addressed this possibility using patterns of codon usage in mitochondrial genes. They concluded that synonymous sites were neutral or nearly neutral, and that selective effects on synonymous sites were too weak to explain the reduced substitution rates. They also identified a bias toward A-T bases and pyrimidines at synonymous sites, but the cause of this non-randomness is not fully understood. More recently, Sloan and Wu (2014) measured presumably neutral mutation rates in plastid DNA inserted into mitochondrial genomes, but these rates could not be directly compared with homologous mitochondrial sequences under selection. Thus, synonymous substitution rates have never been directly compared with truly neutral substitution rates, such as those of homologous sequences that are not under selection. Such a comparison would provide a direct test of whether synonymous substitutions are neutral; however, the high divergence of non-genic regions prevents reliable alignment among lineages, so there are very few opportunities for direct comparisons across diverse species.

The gene for ribosomal protein small subunit 14 (rps14) is co-transcribed with the genes for ribosomal protein large subunit 5 (rpl5) and cytochrome b (cob) in many plant mitochondrial genomes (Fig. 1; Hoffmann et al. 1999; Quinones et al. 1996). In some lineages, a copy of rps14 has been transferred to the nucleus and its protein product is imported into mitochondria. In these lineages, the mitochondrial copy of rps14 has become a pseudogene, hereafter ψrps14 (Aubert et al. 1992; Figueroa et al. 1999; Ong and Palmer 2006). These pseudogenes have accumulated frameshift mutations, so they are clearly non-functional and not under selection for protein-coding capacity. Because both rps14 genes and ψrps14 pseudogenes lie between rpl5 and cob and are co-transcribed with them, large rearrangements of this region should be selected against, as cob would lose its promoter. The ψrps14 pseudogenes are thus a rare example of a non-coding sequence that can still be aligned to homologous coding sequences across very diverse lineages, which makes rps14 an ideal candidate for measuring neutral mutation rates. In lineages with functional rps14 genes, the synonymous substitution rate can be measured, while in lineages with ψrps14 pseudogenes, the total substitution rate estimates the neutral mutation rate. Comparing these rates tests whether synonymous substitutions in plant mitochondrial genes are selectively neutral.

Fig. 1

A map of the three co-transcribed mitochondrial genes rpl5, rps14, and cob. These three genes are syntenic in most of the angiosperm species examined (see Methods). A single promoter, indicated at left, has been identified in several species (Forner et al. 2007; Hoffmann et al. 1999; Quinones et al. 1996)

Methods

Accession numbers of all sequences used are listed in Online Resource 1. In a few species, the synteny of cob with rpl5 and rps14 was disrupted, but it was still possible to identify rps14 or ψrps14 just downstream of rpl5. The ψrps14 pseudogenes were confirmed by the presence of internal stop codons or frameshifts. Four multiple alignments were used in this analysis: an alignment of the rps14 sequences in all species analyzed (Online Resource 2), an alignment of the concatenated sequences of atp4, rpl5, and cob in all species analyzed (Online Resource 3), an alignment of the functional rps14 sequences (Online Resource 4), and an alignment of the concatenated sequences of atp4, rpl5, and cob in only those species with a functional rps14 gene (Online Resource 5).
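
The pseudogene check described above can be sketched as a simple screen. This is a hypothetical helper, not the pipeline used in this study; it assumes the standard genetic code, that the sequence begins at the first position of the original rps14 reading frame, and it uses "ungapped length not divisible by 3" as a crude frameshift proxy.

```python
# Hypothetical screen for pseudogene signatures; not the authors' pipeline.
# Assumes the standard genetic code and that the sequence starts at the first
# position of the original rps14 reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(seq):
    """Return 0-based indices of in-frame stop codons, excluding the final codon."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps
    n_full = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, n_full, 3)]
    # The last complete codon may legitimately be the terminal stop.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def has_length_frameshift(seq):
    """Crude frameshift proxy: ungapped length is not a multiple of 3."""
    return len(seq.replace("-", "")) % 3 != 0


def looks_like_pseudogene(seq):
    return bool(internal_stop_positions(seq)) or has_length_frameshift(seq)


print(internal_stop_positions("ATGAAATAGCCCTAA"))  # [2]
print(looks_like_pseudogene("ATGAAACCCTAA"))       # False
print(looks_like_pseudogene("ATGAAACCTAA"))        # True
```

The length test misses compensating frameshifts (for example a +1 and a -1 indel in the same sequence), so in practice inspection against a functional rps14 alignment would still be needed.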

In several of these species, transcripts of the analyzed genes also undergo RNA editing directed by pentatricopeptide repeat (PPR) proteins (Uchida et al. 2011). A PPR protein binds to a specific sequence in the mRNA upstream of the target site, where a cytosine is edited to uracil. These edits may change the encoded amino acid. Consequently, a mutation at an edited site, or in the binding sequence of the PPR protein, may appear synonymous at the DNA level but change the final protein, or may appear non-synonymous at the DNA level but leave the protein sequence unchanged. To avoid confounding the analysis, edited codons and the 18 upstream nucleotides, representing potential PPR binding sites under selection, were removed from the analysis.
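
The masking of edited sites can be sketched as follows. This is a minimal illustration under stated assumptions: the alignment is a codon alignment whose column 0 is a first codon position, edited sites are given as 0-based alignment columns, and the 18 removed nucleotides are taken to be those immediately 5′ of the edited codon. The function names are hypothetical.

```python
# Hypothetical sketch of removing edited codons plus 18 upstream nucleotides.
# Assumes a codon alignment (column 0 = first codon position) and 0-based
# alignment column indices for the edited cytosines.
UPSTREAM_NT = 18  # potential PPR binding region, as described in the text


def columns_to_drop(edited_sites, aln_len, upstream=UPSTREAM_NT):
    drop = set()
    for site in edited_sites:
        codon_start = site - site % 3            # first base of the edited codon
        start = max(0, codon_start - upstream)   # 18 nt 5' of that codon
        end = min(aln_len, codon_start + 3)      # through the end of the codon
        drop.update(range(start, end))
    return drop


def mask_alignment(seqs, edited_sites):
    """Return a new {name: sequence} dict with the flagged columns removed."""
    aln_len = len(next(iter(seqs.values())))
    drop = columns_to_drop(edited_sites, aln_len)
    keep = [i for i in range(aln_len) if i not in drop]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


masked = mask_alignment({"sp1": "ATG" * 10, "sp2": "ATA" * 10}, edited_sites=[25])
print(len(masked["sp1"]))  # 9
```

Because 18 is a multiple of 3 and each window starts on a codon boundary, the removed columns are whole codons, so the remaining alignment stays in frame for codon-based analyses.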

Two phylogenetic trees were constructed: one from the concatenated atp4, rpl5, and cob sequences of all species analyzed, and one from the concatenated atp4, rpl5, and cob sequences of only those species with a functional rps14 gene. The atp4 gene was chosen because it is independently transcribed (Forner et al. 2007) rather than co-transcribed with rps14. All alignments and phylogenetic trees were constructed with MEGA5 (Tamura et al. 2011).
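
Building the concatenated atp4, rpl5, and cob data set amounts to joining per-gene alignments species by species. The sketch below is hypothetical: it assumes each gene alignment is stored as a dict mapping species name to aligned sequence, and it simply drops any species missing from one of the genes.

```python
# Hypothetical sketch: concatenating per-gene alignments (e.g. atp4, rpl5, cob),
# each given as {species: aligned_sequence}, into one supermatrix.
def concatenate(*gene_alignments):
    for aln in gene_alignments:
        if len({len(s) for s in aln.values()}) != 1:
            raise ValueError("sequences within a gene alignment differ in length")
    # Keep only species present in every gene alignment (an assumption here).
    shared = set.intersection(*(set(aln) for aln in gene_alignments))
    return {sp: "".join(aln[sp] for aln in gene_alignments) for sp in sorted(shared)}


atp4 = {"A": "AT", "B": "AC"}
rpl5 = {"A": "GGG", "B": "GGC", "C": "TTT"}
print(concatenate(atp4, rpl5))  # {'A': 'ATGGG', 'B': 'ACGGC'}
```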

Functional rps14 genes were analyzed with CodeML in PAML 4.8, run through the PAMLX interface (Yang 2007). Branch lengths were calculated using synonymous substitutions, and the phylogenetic tree of the concatenated atp4, rpl5, and cob sequences (Online Resource 6) was used to fix the topology. This was done separately for the alignment of rps14 sequences from species with functional rps14 genes (Fig. 2a) and for the alignment of concatenated atp4, rpl5, and cob sequences from the same species (Fig. 2b). Dividing the length of each terminal branch on the rps14 tree by the length of the same branch on the atp4, rpl5, and cob tree gives the ratio of the synonymous substitution rate in rps14 to the synonymous substitution rate in the other three genes.

Fig. 2

Phylogenetic trees with terminal branch lengths calculated using PAML as described in methods. a Phylogenetic tree of functional rps14 genes using synonymous substitutions. b Phylogenetic tree of atp4, rpl5, and cob using synonymous substitutions and including only species with functional rps14 genes. c Phylogenetic tree of both rps14 genes and ψrps14 pseudogenes using total substitutions. Species with ψrps14 pseudogenes are labeled with a ψ. d Phylogenetic tree of atp4, rpl5, and cob using synonymous substitutions and including all species. For all trees, topology is based on an initial tree of atp4, rpl5, and cob sequences

The ψrps14 pseudogenes were analyzed with BaseML in PAML 4.8, run through PAMLX (Yang 2007). Branch lengths were calculated using total substitutions, and the phylogenetic tree of the concatenated atp4, rpl5, and cob sequences (Online Resource 7) was used to fix the topology. This analysis used the alignment of rps14 sequences from all species (Fig. 2c). A second tree was built with CodeML, as described above, from the alignment of concatenated atp4, rpl5, and cob sequences from all species analyzed (Fig. 2d). Dividing the length of each terminal branch leading to a lineage with a ψrps14 pseudogene on the rps14 tree by the length of the same branch on the atp4, rpl5, and cob tree gives the ratio of the total substitution rate in the ψrps14 pseudogene to the synonymous substitution rate in the other three genes (Online Resource 8). Species with functional rps14 genes were included in these trees so that terminal branches leading to pseudogenes would include as little divergence predating pseudogenization as possible. Indels were counted in all rps14 and ψrps14 sequences, and indel rates per site were calculated.
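
The text does not specify how indels were delimited or which denominator was used for the per-site rate, so the following is only one plausible sketch. It assumes indels are counted between a focal sequence and a neighboring reference in their pairwise projection of the multiple alignment, treats each maximal run of gap columns in one sequence as a single event, and divides by the number of columns not gapped in both sequences.

```python
# Hypothetical indel counting for a pairwise-aligned focal/reference pair.
# An indel event is a maximal run of columns where exactly one sequence has a
# gap; a switch in which sequence carries the gap starts a new event.
def count_indels(focal, reference):
    if len(focal) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    events = 0
    previous = None  # which sequence carried the gap in the previous column
    for a, b in zip(focal, reference):
        if a == "-" and b == "-":
            continue  # column gapped in both: uninformative for this pair
        if a == "-":
            state = "focal"
        elif b == "-":
            state = "reference"
        else:
            state = None
        if state is not None and state != previous:
            events += 1
        previous = state
    return events


def indel_rate_per_site(focal, reference):
    """Indel events divided by columns not gapped in both sequences (assumed denominator)."""
    sites = sum(1 for a, b in zip(focal, reference) if not (a == "-" and b == "-"))
    return count_indels(focal, reference) / sites if sites else 0.0


print(count_indels("ATG---CCA", "ATGAAACCA"))  # 1
print(count_indels("AT-GC-A", "ATTGCCA"))      # 2
```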

Results

If synonymous substitutions in plant mitochondria are not neutral, then the synonymous substitution rate will underestimate the neutral mutation rate. In that case, we would expect functional rps14 genes to have a significantly lower synonymous substitution rate than the total substitution rate of ψrps14 pseudogenes. Alignments were made for the rps14 and ψrps14 sequences of the chosen species (Online Resources 2 and 4) and for the concatenated atp4, rpl5, and cob sequences of the same species (Online Resources 3 and 5), in order to generate the trees shown in Fig. 2. From these alignments, we calculated both rates.

Terminal branch lengths for the genes were calculated using PAML 4.8 (Yang 2007), and are shown in Fig. 2 and Online Resource 8. For rps14 genes, the normalized neutral mutation rate is calculated by dividing the terminal branch length of the rps14 tree by the terminal branch length of the atp4, rpl5, and cob tree, both calculated using synonymous substitutions per synonymous site. For ψrps14 pseudogenes, the normalized neutral mutation rate is calculated by dividing the terminal branch length of the rps14 tree (calculated using total substitutions per site) by the terminal branch length of the atp4, rpl5, and cob tree (calculated using synonymous substitutions per synonymous site).
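
The normalization above can be written as a small function. This is a sketch only: the branch lengths are assumed to have already been read from the PAML output, the function names are hypothetical, and the example numbers are invented. A lineage whose reference branch length is zero yields an undefined ratio and is dropped, matching the exclusion of such cases from the averages.

```python
# Sketch of the terminal-branch normalization. Branch lengths would come from
# PAML output; the values below are invented for illustration only.
def normalized_rate(rps14_branch, reference_ds_branch):
    """rps14_branch: dS for a functional rps14 gene (CodeML), or total
    substitutions per site for a ψrps14 pseudogene (BaseML).
    reference_ds_branch: dS of the same terminal branch on the atp4+rpl5+cob tree.
    Returns None when the ratio is undefined (reference branch length of zero)."""
    if reference_ds_branch == 0:
        return None
    return rps14_branch / reference_ds_branch


def normalize_all(branches):
    """branches: {species: (rps14_branch, reference_ds_branch)}; undefined ratios are dropped."""
    ratios = {sp: normalized_rate(a, b) for sp, (a, b) in branches.items()}
    return {sp: r for sp, r in ratios.items() if r is not None}


example = {"species_A": (0.375, 0.25), "species_B": (0.0, 0.0)}  # hypothetical
print(normalize_all(example))  # {'species_A': 1.5}
```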

The neutral mutation rates normalized to the atp4, rpl5, and cob genes are shown in Table 1 and Fig. 3. The average normalized rate is 0.276 for functional rps14 genes and 1.32 for ψrps14 pseudogenes, a significant difference by Student's t test (p = 0.0099). One species, Citrullus lanatus, had terminal branch lengths of zero for both ψrps14 and atp4, rpl5, and cob; because its ratio is undefined, it was excluded from the analysis. Despite having no lineage-specific substitutions relative to neighboring species, C. lanatus differed from them by several indels.
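
For readers reproducing the comparison, the pooled-variance (Student's) two-sample t statistic and the mean ± standard error plotted in Fig. 3 can be computed with the Python standard library. The values below are placeholders, not the data of Table 1, and whether a one- or two-sided p-value was reported is not stated in the text.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_and_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


genes = [1.0, 2.0, 3.0]        # hypothetical placeholder ratios, not Table 1
pseudogenes = [4.0, 5.0, 6.0]  # hypothetical placeholder ratios, not Table 1
print(mean_and_se(genes))
print(student_t(genes, pseudogenes))
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` (with its default `equal_var=True`) returns the same statistic together with a two-sided p-value.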

Table 1 Synonymous substitution rates in rps14 genes and substitution rates in ψrps14 pseudogenes, relative to synonymous substitution rates in atp4, rpl5, and cob in the same species
Fig. 3

Comparison of the neutral mutation rate of species with functional rps14 genes and species with ψrps14 pseudogenes. Rates are from Table 1 and Online Resource 8. Means ± standard errors are shown

In addition to substitutions, we measured indel rates. Indels should be strongly selected against in functional genes but neutral in pseudogenes. The ψrps14 pseudogenes had an average indel rate of 0.011 indels per site, whereas no indels were found in functional rps14 genes (0 indels per site). As expected, these rates are significantly different (p = 0.00043).

Discussion

Because there is no selective pressure on a non-functional pseudogene, substitutions in it should be neutral. The availability of both genes and alignable pseudogenes of rps14 allowed us to measure the neutral substitution rate directly and compare it with the synonymous substitution rate, which is often used as a proxy for the neutral rate. The normalized synonymous substitution rate of the rps14 genes is significantly lower than the neutral substitution rate of the ψrps14 pseudogenes (Fig. 3; Table 1). We therefore infer that fewer synonymous substitutions are observed in plant mitochondrial genes than would be expected in the absence of selection.

One possible explanation for the apparent selection on synonymous substitutions is their effect on RNA stability and translation efficiency. If synonymous substitutions affect the stability of mitochondrial RNA or its association with the translation machinery, they will be selected against even though the encoded protein is unchanged. Mutational processes, rather than selection, may also be responsible for the A-T and pyrimidine biases in codon usage observed by Sloan and Taylor (2010), as well as for the A-T bias in mutations of neutral plastid-derived insertions in mitochondrial genomes (Sloan and Wu 2014). In other systems, the rate of cytosine deamination, which causes G-C to A-T transitions, has been estimated to be at least 50-fold higher than the rate of deamination reactions that could cause A-T to G-C transitions (Friedberg et al. 2006). The oxidation of guanine to 8-oxo-guanine, which can result in G-C to T-A transversions, appears to occur in plant mitochondria as well (Christensen 2013; Markkanen et al. 2012; van Loon et al. 2010). Together, these two processes may skew the mutational spectrum toward A-T, producing the non-randomness previously observed (Sloan and Taylor 2010; Sloan and Wu 2014).
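
Why a mutational bias alone pushes neutral sites toward A-T can be illustrated with a toy two-state model (a standard mutation-pressure argument, not an analysis from this study). Assume each site mutates from A/T to G/C at rate u and from G/C to A/T at rate v, with no selection; the G+C fraction then converges to u / (u + v). Plugging in only the 50-fold deamination ratio quoted above is purely illustrative: it ignores every other mutation type, so it is not a prediction of G+C content in plant mitochondria.

```python
def equilibrium_gc(u, v):
    """Equilibrium G+C fraction under mutation pressure alone.
    u: per-site rate of A/T -> G/C changes; v: per-site rate of G/C -> A/T changes."""
    return u / (u + v)


def iterate_gc(p0, u, v, steps):
    """Iterate p' = p + u(1 - p) - v p; converges to u / (u + v) when 0 < u + v < 2."""
    p = p0
    for _ in range(steps):
        p = p + u * (1 - p) - v * p
    return p


print(equilibrium_gc(1, 50))              # 1/51, about 0.0196
print(iterate_gc(0.5, 1e-3, 5e-2, 2000))  # approaches the same value
```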

Another possible explanation for the apparent selection on synonymous substitutions is that synonymous mutations might be repaired together with nearby non-synonymous mutations through gene conversion, if gene conversion tracts are long enough. In genes, selection against deleterious mutations is strong, so repaired sequences should be strongly favored, and synonymous mutations within the same conversion tract would be removed along with them. In a pseudogene, there is no selection favoring repaired sequences, so nearby neutral mutations will not be removed by a selective sweep of the repaired sequence.

The low mutation rate in land plant mitochondrial genes compared with non-genic regions does not appear to be due to differences in the available repair processes, but is likely due to differences in selection on the repaired products (Christensen 2013, 2014). Gross rearrangements and even small indels would be strongly selected against in genes, but not in non-genic sequences, including pseudogenes. Such events appear to be common on evolutionary timescales, which explains the large divergence of non-coding sequences.

This study is the first direct comparison of plant mitochondrial synonymous substitution rates with a neutral substitution rate in homologous pseudogenes. Although we have found that synonymous substitutions are not completely neutral, we still concur with the conclusion of Sloan and Taylor (2010) that the non-neutrality is not sufficient to explain the large disparity between the low mutation rates in genes and the much higher mutation, rearrangement, and expansion rates of the non-coding sequences in plant mitochondria.