Introduction

In land plants, RNA editing manifests itself as targeted conversions of cytidines (C) into uridines (U) in organellar transcripts by deamination (Covello and Gray 1989; Gualberto et al. 1989; Hiesel et al. 1989; Hoch et al. 1991; Lamattina et al. 1989; Maier et al. 1992). This phenomenon of correcting genetic information is absent in algae and seems to have emerged concomitant with the water-to-land transition of embryophytes (Covello and Gray 1993; Jobson and Qiu 2008; Maier et al. 2008; Steinhauser et al. 1999). RNA editing occurs in all land plant clades (Freyer et al. 1997; Groth-Malonek et al. 2007; Hiesel et al. 1994; Malek et al. 1996; Sper-Whitis et al. 1994; Sper-Whitis et al. 1996) with one unique exception. In the subclass of complex thalloid, marchantiid liverworts no RNA editing has been found. This is evident from the completely sequenced mitochondrial genome of Marchantia polymorpha (Ohyama et al. 2009) and from investigations of the mitochondrial genes cox1, cox3, nad5, and nad7 in several other marchantiid liverwort species (Groth-Malonek et al. 2007; Malek et al. 1996; Sper-Whitis et al. 1996; Steinhauser et al. 1999). In flowering plants, mitochondrial transcriptomes contain some 300–500 RNA editing sites (Giegé and Brennicke 1999; Handa 2003; Mower and Palmer 2006; Notsu et al. 2002) and chloroplast transcriptomes contain approx. 20–40 editing sites (Inada et al. 2004; Sasaki et al. 2003; Tillich et al. 2005; Tsudzuki et al. 2001). In the extant lycophytes, which represent the most ancient surviving lineage of vascular plants, transcriptome analyses reveal enormous frequencies of RNA editing, with more than 1,700 editing sites in the quillwort Isoetes engelmannii (Grewe et al. 2011) and even more than 2,100 editing sites in the spike moss Selaginella moellendorffii (Hecht et al. 2011). Fern allies, ferns, and hornworts also display abundant “reverse” uridine-to-cytidine RNA editing (Kugita et al. 2003; Steinhauser et al. 1999; Vangerow et al. 1999; Wolf et al. 2004; Wolf et al. 2005; Yoshinaga et al. 1996).

In contrast to abundant RNA editing in vascular plants, the model moss Physcomitrella patens and its close relative Funaria hygrometrica show only 11 and 8 editing sites, respectively, in their entire mitochondrial transcriptomes (Rüdinger et al. 2009, 2011b). Studies in Haplomitrium mnioides, however, a member of the basal-most subclass of liverworts (Haplomitriidae), showed RNA editing at more than 20 positions in its nad7 gene alone (Groth-Malonek et al. 2007).

In the meantime, several specific recognition factors for RNA editing sites have been identified in chloroplasts and mitochondria of plant model species like Arabidopsis thaliana (reviewed in Fujii and Small 2011), Oryza sativa (Kim et al. 2009), or Physcomitrella patens (Ohtani et al. 2010; Rüdinger et al. 2011b; Tasaki et al. 2010). All of them belong to the large pentatricopeptide repeat (PPR) protein family, which was first described for the Arabidopsis thaliana nuclear genome where some 450 members are encoded (Lurin et al. 2004). PPR proteins are characterized by tandem repeats of a loosely conserved 35 amino acid motif (Small and Peeters 2000). Canonical PPR proteins of this type (the P subfamily) exist in numerous eukaryotes but plant genomes also encode unique PPR variants, referred to as the “PLS” type. Members of the PLS subfamily are characterized by long (L) and short (S) PPR motif length variants and many have optional C-terminal protein domain additions, the E, the E+ and the DYW domains, as successive extensions that, when present, always appear in this order (Lurin et al. 2004).

All RNA editing factors that have been characterized so far are members of the PLS subfamily of PPR proteins and carry at least the E/E+ or the complete suite of extensions including the DYW domain. The DYW domain in particular received attention given its weak similarity with cytidine deaminases (Salone et al. 2007). Moreover, the appearance of the DYW domain and RNA editing perfectly correlate, with DYW genes neither found in green algae nor in marchantiid liverworts where no RNA editing has been detected until now (Rüdinger et al. 2008; Salone et al. 2007). Intriguingly, all hitherto identified editing factors in Physcomitrella are DYW-type PLS proteins and no E/E+ type proteins lacking the DYW domain are encoded in the Physcomitrella genome (O’Toole et al. 2008) suggesting that the DYW domain indeed is intimately correlated with editing, at least in early plant evolution.

A robust molecular phylogeny of bryophytes has emerged over the recent years and extensive data sets are now available for a wide bryophyte taxon sampling of the mitochondrial nad5 gene plus widely sampled data sets for the nad2 gene specifically for mosses and for the nad4 gene recently compiled specifically for liverworts (Beckert et al. 1998; Volkmar et al. 2011). We used the recently developed PREPACT tool for prediction of editing sites in these genes, which we exemplarily confirmed on cDNA level for selected taxa. A new amendment to PREPACT, which allows for the use of multiple reference sequences for more conservative RNA editing site prediction, is introduced. To further investigate the correlation of RNA editing frequency and DYW-type gene diversity, we compared the DYW domain diversity for selected pairs of liverworts (Haplomitrium mnioides and Lejeunea cavifolia) and mosses (Takakia lepidozioides and Pogonatum urnigerum), which differ extremely in their RNA editing rates. Complementing the previous work, our new data emphasize a correlation between RNA editing frequency and DYW protein diversity in the two most ancient land plant clades. The altogether 184 E–E+–DYW domain sequences now available from liverworts and mosses allow for comparison of amino acid conservation with their vascular plant homologues.

Materials and Methods

Identification of RNA Editing Sites on DNA and cDNA Level

Total plant nucleic acids were extracted using either the CTAB method (Doyle and Doyle 1990) employing cetyltrimethyl-ammonium bromide as a detergent for cell lysis or the NucleoSpin Plant kit (Macherey–Nagel, Düren, Germany). RNA was prepared from plant material using the NucleoSpin® RNA Plant kit (Macherey–Nagel). RNA was additionally treated with DNase I (Fermentas Life Sciences, St. Leon-Rot, Germany) to remove potential vestiges of DNA. First strand cDNA was synthesized using the RevertAid™ M-MulV® Reverse Transcriptase kit (Fermentas Life Science) and the hexanucleotide random primer mix (10 μM per assay; Carl Roth, Karlsruhe, Germany).

Primers were designed to target conserved regions of the nad4 (nad4up: 5′-acagccaaatttcartttgtggaa-3′ and nad4do: 5′-tyaatsaaattttccatgttgcac-3′) and nad2 (nad2up: 5′-ggagttgtntttagtacctctaa-3′ and nad2do: 5′-agtagtaacgayttntcacgatccat-3′) genes. In some cases alternative primers (nad4dov2: 5′-tccatgttgcactaagttacttacggangtatgcat-3′; nad4up2: 5′-aaatttcartttgtggaaannnttcgatggcttcc-3′ or nad4up3: 5′-aggaagccttattattttggtgatcc-3′) were used to amplify the nad4 gene regions. In liverworts primers n5up (5′-gcaggntttttyggncgttttct-3′) and nad5do (5′-aacatnrcaaaggcataatgata-3′) were designed to amplify the coding region of nad5, whereas alternative primers (K: 5′-atatgtctgaggatccgcatag-3′, L: 5′-aactttggccaaggatcctacaaa-3′) were mostly used to amplify the gene region in mosses. PCR amplification assays in total contained 2.5 mM MgCl2, 0.2 mM of each dNTP, 0.3 μM of each primer and 0.5 U of GoTaq polymerase (Promega, Mannheim, Germany) or alternatively the PCR extender system (Taq-Pfu mixture, 5prime, Hamburg, Germany) using the respective buffers supplied by the manufacturers and double distilled water in a volume of 25 μl. The touchdown temperature profile used in the PCR assays included an initial denaturation at 94°C for 3 min, followed by 10 cycles, each with a denaturation step at 94°C for 30 s, 30 s annealing initially at 50°C, then decreased by 0.8 K in each cycle and a synthesis step at 72°C for 3 min. This was followed by 30 further amplification cycles of 30 s at 94°C, 30 s at 42°C, and 3 min at 72°C and a final elongation step of 7 min at 72°C to complete strand syntheses. PCR products were cloned into the pGEM-T Easy vector (Promega) before sequencing (Macrogen Inc., Seoul, South Korea). Sequences obtained by clone sequencing and sequences already available in the NCBI database (Annotations see Tables 1, 2) were assembled and aligned manually using MEGA 4.0.2 (Tamura et al. 2007). 
RNA editing sites were predicted using the RNA editing prediction and analysis computer tool PREPACT (Lenz et al. 2010) and verified by comparison of DNA and cDNA sequences for several species (Tables 1, 2). Different PREPACT options including the new feature “Commons” were used for prediction of editing sites.
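The touchdown profile described above can be written out cycle by cycle. The following is a minimal sketch that only restates the numbers given in the text (50°C start, 0.8 K decrement, 10 touchdown cycles, 30 cycles at 42°C); the function names are hypothetical, and ramp times between steps are ignored.

```python
def touchdown_profile(start_c=50.0, step_k=0.8, touchdown_cycles=10,
                      final_anneal_c=42.0, final_cycles=30):
    """Return the annealing temperature (in deg C) for every PCR cycle."""
    # Touchdown phase: the first cycle anneals at start_c, each later one is step_k lower.
    anneal = [round(start_c - step_k * i, 1) for i in range(touchdown_cycles)]
    # Constant phase: fixed annealing temperature for the remaining cycles.
    anneal += [final_anneal_c] * final_cycles
    return anneal


def total_hold_seconds(n_cycles, initial_denat=180, denat=30, anneal=30,
                       synthesis=180, final_elong=420):
    """Sum of programmed hold times in seconds (ramping not included)."""
    return initial_denat + n_cycles * (denat + anneal + synthesis) + final_elong


profile = touchdown_profile()
print(profile[:10])                      # annealing temperatures of the touchdown phase
print(total_hold_seconds(len(profile)))  # programmed hold time in seconds
```

Under these assumptions the last touchdown cycle anneals at 42.8°C, just above the 42°C used for the remaining 30 cycles, and the programmed hold times add up to 10,200 s (170 min).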

Table 1 Liverwort taxon sampling and RNA editing in nad5 and nad4 gene regions
Table 2 Moss taxon sampling and RNA editing in nad5 and nad2 gene regions

Phylogenetic Tree Construction

Phylogenetic tree construction was based on concatenated organelle genome DNA data sets: nad5 including group I intron nad5i753g1, nad2 including group II intron nad2i156g2, the nad5–nad4 intergenic spacer, the cobi420g1 group I intron locus, rbcL and rps4 for mosses and nad5 with nad5i753g1, nad4 including group II intron nad4i548g2, rbcL and rps4 for liverworts, aligned manually in MEGA 4.0.2 (Tamura et al. 2007) and divided into partitions (mitochondrial coding sequences, mitochondrial spacer sequences, group I intron, group II intron, chloroplast sequences). Maximum likelihood phylogenies were calculated with Treefinder (Jobb et al. 2004) under GTR+G+I substitution model selected with Modeltest (Posada and Crandall 1998) and node support was determined based on 1,000 bootstrap resampling replicates. Bayesian analyses were performed with MrBayes (Ronquist and Huelsenbeck 2003) for 1 million generations with every 100th tree stored. Trees sampled before log stationarity was reached were discarded as burn-in.

The New PREPACT Feature “Commons”

The prediction of RNA editing sites in multiple DNA sequences by the “Plant RNA Editing Prediction and Analysis Computer Tool” (PREPACT) was improved to allow batch prediction for multiple DNA sequences against multiple reference sequences included in the alignment. Results for each single prediction are displayed in tables using the LivePipe JavaScript framework (livepipe.net). In an additional table called “commons” the predictions for each query are summarized to show the number of overlapping predictions against all reference sequences. These tables also give the number of predicted sites per reference and the overall number of sites supported by a user-defined percentage of reference sequences. These data can be downloaded for spreadsheet analysis from the WWW interface (www.prepact.de).
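The idea behind “commons” can be illustrated as a filter that keeps only those sites predicted against a sufficient share of the references. This is a minimal sketch and not the PREPACT implementation; the function name `commons`, the `min_support` parameter and the toy prediction sets are assumptions made for illustration only.

```python
from collections import Counter


def commons(predictions, min_support=1.0):
    """Return sites predicted against at least min_support (0..1) of all references.

    predictions maps a reference name to the collection of editing site
    labels predicted for one query sequence against that reference.
    """
    counts = Counter(site for sites in predictions.values() for site in set(sites))
    needed = min_support * len(predictions)
    return sorted(site for site, n in counts.items() if n >= needed)


# Invented toy predictions for a single query (not data from this study).
toy = {
    "Marchantia": {"nad5eU598RC", "nad5eU730RW"},
    "Chara": {"nad5eU598RC", "nad5eU730RW", "nad4eU1394AV"},
    "Arabidopsis": {"nad5eU598RC", "nad5eU730RW", "nad4eU1394AV"},
    "Physcomitrella": {"nad5eU598RC"},
}
print(commons(toy))                   # sites supported by every reference
print(commons(toy, min_support=0.5))  # sites supported by at least half of them
```

With `min_support=1.0` the result is the strict intersection of all individual predictions; lowering the threshold trades stringency for sensitivity.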

Identification of E–DYW Domain Extensions of PPR Protein Genes on DNA Level and Consensus Creation

Degenerate primers F (5′-gshtaygtdytbhtrtcmaacatwta-3′) and R (5′-tyaccartartcnctacaagaaca-3′) were designed based on available partial carboxyterminal E/E+/DYW domain sequences from bryophytes (Rüdinger et al. 2008) and used to amplify the C-terminal part of DYW-type PPR genes. PCR amplification assays with ingredients as described above included 3 min initial denaturation at 94°C, followed by 10 cycles each with 30 s denaturation at 94°C, 30 s annealing at 45°C to 35°C (decreasing 1 K/cycle) and 3 min synthesis at 72°C, then 30 further cycles with annealing at 35°C, and a final synthesis step of 7 min at 72°C. PCR products were cloned into the pGEM-T Easy vector and commercially sequenced (Macrogen Inc., Seoul, South Korea). The cloning approach was tested with random sequencing of 30 clones of Funaria hygrometrica PCR products which revealed three of its nine known DYW genes (Rüdinger et al. 2011b).
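The ambiguity codes in primers F and R follow the standard IUPAC nucleotide nomenclature, so the number of distinct oligonucleotides in each primer mix can be counted directly. The sketch below is illustrative only; the helper names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.lower())


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.lower())):
        yield "".join(combo)


PRIMER_F = "gshtaygtdytbhtrtcmaacatwta"
PRIMER_R = "tyaccartartcnctacaagaaca"
print(degeneracy(PRIMER_F), degeneracy(PRIMER_R))
```

By this count primer F represents 5,184 sequence variants and primer R 32, so any single DYW gene is matched perfectly by only a small fraction of the primer F mix.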

Bioinformatic Work and Statistical Analyses

Deduced protein sequences were aligned with MEGA 4.0.2 (Tamura et al. 2007) using the ClustalW algorithm and manually adjusted. Consensus sequences of the E, E+, and DYW domains were created using sequences obtained by clone sequencing and sequences already available in the NCBI database (Tables 1, 2) and displayed using the weblogo server at http://weblogo.berkeley.edu (Crooks et al. 2004). Phylogenetic analyses were conducted in MEGA 4.0.2 (Tamura et al. 2007). Mathematical simulations for identification of differing numbers of different genes within a limited random clone sampling of a gene family were conducted using R (R Development Core Team 2011) and displayed with the graphic package ggplot2 as shown in Supplementary Fig. 1 (Wickham 2009). The Fisher’s exact test, which assesses the likelihood that two different subsets are equal, was used to test for statistical significance of different DYW population diversities.
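The clone sampling simulations were run in R; the underlying question can be sketched in a few lines of Python as well. The example below is not the authors' script. It assumes, as a simplification, that every gene of the family is amplified and cloned with equal probability, and the function names are hypothetical.

```python
import random


def expected_distinct(n_genes, n_clones):
    """Expected number of different genes seen among n_clones random clones."""
    return n_genes * (1 - (1 - 1 / n_genes) ** n_clones)


def simulate_distinct(n_genes, n_clones, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity (equal gene probabilities assumed)."""
    rng = random.Random(seed)
    counts = [
        len({rng.randrange(n_genes) for _ in range(n_clones)})
        for _ in range(trials)
    ]
    return sum(counts) / trials


# Compare the closed form with the simulation for a few family sizes and 30 clones.
for n in (9, 50, 200):
    print(n, round(expected_distinct(n, 30), 1), round(simulate_distinct(n, 30), 1))
```

The closed form makes the saturation effect explicit: once the family size greatly exceeds the number of clones, almost every clone is new and the observed count no longer distinguishes between large and very large families.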

Results

Amending PREPACT for Stringent Editing Site Predictions

RNA editing in plant mitochondrial genes can be predicted quite reliably by comparison with homologous genes in non-editing species like the marchantiid liverwort Marchantia polymorpha or green algae or with known cDNAs in editing taxa. Among several other features, the recently developed PREPACT allows automatic prediction of RNA editing sites in multiple sequence alignments or in full organelle genomes (Lenz et al. 2010). The latter option has recently been used successfully to identify candidate RNA editing sites even in the mtDNA of a phylogenetically distant protist, which could subsequently be confirmed (Knoop and Rüdinger 2010; Rüdinger et al. 2011a). However, RNA editing prediction with this strategy frequently proves to be too sensitive and identifies false positive candidate sites when based on individual reference sequences alone. Consequently, we have now added new options to PREPACT, which allow simultaneous inclusion of multiple reference sequences to identify intersections of independently predicted editing sites (“commons”) for more stringent prognoses.
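The principle of reference-based prediction can be sketched as follows: for every codon in which query and reference encode different amino acids, test whether C-to-U changes in the query codon restore the reference amino acid. This is a simplified illustration and not the PREPACT algorithm. It assumes the standard genetic code and ungapped, in-frame, pre-aligned coding sequences, ignores silent sites, and uses a hypothetical function name.

```python
from itertools import combinations

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def predict_edits(query, reference):
    """Return 1-based query positions where C-to-U editing restores the reference codon sense."""
    query = query.upper().replace("U", "T")
    reference = reference.upper().replace("U", "T")
    sites = []
    for start in range(0, min(len(query), len(reference)) - 2, 3):
        q_codon = query[start:start + 3]
        target = CODON_TABLE.get(reference[start:start + 3])
        if target is None or CODON_TABLE.get(q_codon) == target:
            continue  # ambiguous reference codon, or amino acid already identical
        c_positions = [i for i, base in enumerate(q_codon) if base == "C"]
        found = None
        # Prefer the smallest set of edits that restores the reference amino acid.
        for size in range(1, len(c_positions) + 1):
            for combo in combinations(c_positions, size):
                edited = "".join("T" if i in combo else b for i, b in enumerate(q_codon))
                if CODON_TABLE.get(edited) == target:
                    found = [start + i + 1 for i in combo]
                    break
            if found:
                break
        if found:
            sites.extend(found)
    return sites


print(predict_edits("ATGCGTCCA", "ATGTGTTCA"))  # one candidate site in each of codons 2 and 3
```

A predictor of this kind flags every C whose conversion would reconcile query and reference, including conservative exchanges that are tolerated without editing, which is why predictions against a single reference tend to be too sensitive.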

We here use the recently proposed nomenclature to label RNA editing sites (Rüdinger et al. 2009). Briefly, editing site labels are composed of the name of the respective gene followed by an ‘e’ (for editing), the respective nucleotide introduced by the editing event (U), the nucleotide position in the transcript (with position 1 corresponding to the A of the AUG start codon) followed by the resulting amino acid change, e.g., nad5eU598RC.
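Labels of this form are easy to generate and parse programmatically. The sketch below covers only the pattern of the example given above (gene, ‘e’, introduced nucleotide, transcript position, amino acid before and after editing in one-letter code); the regular expression and function names are assumptions for illustration, and label variants not described in the text are not handled.

```python
import re

LABEL_RE = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+?)e(?P<nucleotide>[UC])(?P<position>\d+)"
    r"(?P<before>[A-Z*])(?P<after>[A-Z*])$"
)


def make_label(gene, nucleotide, position, before, after):
    """Compose an editing site label such as nad5eU598RC."""
    return f"{gene}e{nucleotide}{position}{before}{after}"


def parse_label(label):
    """Split a label into its parts and add codon number and codon position."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognised editing site label: {label!r}")
    site = match.groupdict()
    site["position"] = int(site["position"])
    # Position 1 is the A of the AUG start codon, so codons are counted from there.
    site["codon"] = (site["position"] - 1) // 3 + 1
    site["codon_position"] = (site["position"] - 1) % 3 + 1
    return site


print(parse_label("nad5eU598RC"))
```

For nad5eU598RC this places the edit at the first position of codon 200, consistent with an arginine codon (CGN) being converted into a cysteine codon (UGY).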

RNA Editing Variability in Mosses and Liverworts

Mitochondrial gene sequences of nad4 and nad5 from 52 liverworts and gene sequences of nad2 and nad5 from 54 mosses were included in our analyses (Tables 1, 2; nad genes encode subunits of the NADH ubiquinone oxidoreductase, complex 1). RNA editing site numbers identified in the available mitochondrial transcriptomes of widely divergent plant species (Physcomitrella patens, Funaria hygrometrica, Selaginella moellendorffii, Oryza sativa, Silene noctiflora, Arabidopsis thaliana, Brassica napus) were compared with RNA editing frequencies in nad2, nad4, and nad5 transcripts alone to test for their use in extrapolation from a limited transcript sample (Supplementary Table S1). This revealed that the three genes allow reasonable extrapolation from the limited gene sampling to total mitochondrial RNA editing numbers over a wide range of RNA editing frequencies in the different taxa (i.e., edited/coding nucleotides), ranging from 0.02% in Funaria (0.04% estimated) to 10.2% in Selaginella (12.7% estimated).

RNA editing sites were predicted with PREPACT using alternative reference sequences (the homologous gene sequences of Marchantia polymorpha and the alga Chara vulgaris and the corresponding cDNA sequences of Arabidopsis thaliana and Physcomitrella patens). Both in liverworts (Table 1) and in mosses (Table 2) the numbers of predicted RNA editing sites differ widely between taxa. Notably, the restricted taxon sampling of Steinhauser et al. (1999) for nad5 of only seven marchantiid (complex thalloid) liverworts is now extended to 14 taxa and the data sampling now includes nad4 as an independent second locus. For none of the marchantiid liverworts was even a single site of RNA editing predicted using coding regions of nad5 and nad4 of Marchantia polymorpha as a reference, whereas up to four sites would be predicted using Chara vulgaris or the Physcomitrella or Arabidopsis cDNAs as references (Table 1). None of the ambiguously predicted sites using the phylogenetically more distant taxa was corroborated in exemplary cDNA analyses of nad5 in Corsinia, Lunularia, or Ricciocarpos (Steinhauser et al. 1999) or of nad4 in Lunularia (this study), in support of the new “commons” concept for more restrictive editing site prediction (Table 1).

In contrast to the marchantiid liverworts, editing sites were consistently predicted for the jungermanniid (i.e., leafy and simple thalloid) liverworts, even when using the new restrictive “commons” mode of prediction. Again, we wished to test predictions with exemplary cDNA sequencing, for which we selected six taxa. In particular, this included Pellia cf. endivifolia and Calycularia crispula with some 20 or more editing sites predicted for each gene (Fig. 1; Table 1). Sequencing on cDNA level for the investigated gene regions of nad5 and nad4 confirmed a total of 43 and 35 C-to-U editing events, respectively, for those two taxa. All sites predicted using the stringent “commons” prediction were confirmed and all additional sites were correctly predicted using the homologous liverwort Marchantia sequence as a reference. An example of editing prediction in nad5 based on the graphical output of PREPACT is shown in Fig. 1 for a sample of selected liverworts and mosses.

Fig. 1

RNA editing site prediction is shown for the mitochondrial nad5 gene amplicon encompassing 368 codons using the graphical output of PREPACT (with Marchantia as the reference sequence) for a selection of eight mosses (upper part) and 11 liverworts (lower part). The sampling includes all species for which RNA editing was checked on cDNA level and pairs of taxa with high versus low editing among the mosses (Takakia and Pogonatum) and the liverworts (Haplomitrium and Lejeunea) that were also investigated for DYW gene diversity (yellow shading). Blue circles indicate codon sense changes after single C-to-U editing and purple circles indicate codon changes after double C-to-U editing (multistep). The blue open circle indicates one editing site (confirmed) in Homalia trichomanoides (codon 398) that was predicted with Physcomitrella patens, but not with Marchantia polymorpha as reference. Conversely, one predicted editing site in Lepidogyna hodgsoniae (codon 292, crossed out) was not confirmed on cDNA level (Color figure online)

In general, false positive predictions using individual references mainly affect conservative amino acid exchanges (such as GCN alanine to GUN valine) potentially subject to editing (Supplementary Tables S2–S5). A potential editing event nad4eU1394AV erroneously predicted for nad4 in many jungermanniid taxa is a typical example that remained consistently unconfirmed in cDNAs.

Interestingly, Pellia and Calycularia are closely related genera (Fig. 2a) and share 13 editing sites in nad5 and 14 sites in nad4 (Fig. 1; Supplementary Tables S2, S3). Other closely related species like Metzgeria furcata, Apometzgeria frontipilis and the Symphyogyna species (Fig. 2a; Supplementary Tables S2, S3) also share the majority of their editing sites in both investigated gene regions. RNA editing frequencies differ widely among the jungermanniid liverworts, whereas the frequencies of RNA editing in the two analyzed genes of a given jungermanniid species are rather similar in most cases, confirming that RNA editing is much more taxon-dependent than locus-dependent. As a single exception, in Aneura pinguis the number of predicted editing sites in nad5 is considerably lower than in nad4. Overall, a decrease of RNA editing frequency is apparent in the diversification of the jungermanniid liverworts, with higher editing rates in the early-branching taxa (Fig. 2a).

Fig. 2

a Liverwort phylogeny based on a concatenated data set of nad5, nad4, rbcL, and rps4. Thickened internode lines indicate significant Bayesian probability support (>0.96). Numbers next to taxa indicate putative editing sites in the investigated gene regions (nad5 331–1434 bp, nad4 130–1440 bp of the coding regions), here shown with Marchantia polymorpha used as reference taxon. The ‘>’ symbol is added to numbers of RNA editing sites when derived from a smaller region than the regular amplicon. b For comparison of DYW gene diversity of Haplomitrium mnioides (highest predicted editing frequency) and Lejeunea cavifolia (low predicted editing frequency) 51 and 30 clones were sequenced, respectively. Nine different E–DYW domain sequences were identified in Lejeunea and 40 different sequences in Haplomitrium, which all cluster species-specifically in a simple Neighbor-Joining tree using uncorrected (p) distances

In the haplomitriid liverworts, the earliest diverging clade of liverworts (Fig. 2a), the genera Haplomitrium, Apotreubia, and Treubia show the most extreme discrepancies in RNA editing frequencies. Haplomitrium mnioides has the highest degree of RNA editing in all liverworts with 56 (confirmed) editing sites in nad5 (1104 bp), whereas in Treubia only single editing events are predicted in nad5 and nad4, respectively.

In mosses, RNA editing levels also vary between species, but two editing sites nad5eU598RC and nad5eU730RW seem to be highly conserved in nearly all arthrodontous mosses (Fig. 1; Supplementary Table S4). Overall editing frequencies are lower than in the jungermanniid liverworts (Table 2; Fig. 3a) with even fewer editing sites in nad2 than in nad5. A unique exception is Takakia lepidozioides, for which 27 editing sites in the nad5 gene and 22 editing sites in the nad2 gene are predicted using the commons feature of PREPACT. Very similar to Haplomitrium among the liverworts, Takakia represents an early branch in the phylogeny of its clade.

Fig. 3

a Moss phylogeny based on a concatenated data set of nad5, nad2, nad5-nad4 spacer, cobi420, rbcL and rps4. Thickened internode lines indicate significant Bayesian probability support (>0.96). Numbers next to taxa indicate putative editing sites in the investigated gene regions (nad5 331–1434 bp, nad2 99–1350 bp of the coding regions), here shown with Marchantia polymorpha used as reference taxon. The ‘>’ symbol is added to numbers of RNA editing sites when derived from a smaller region than the regular amplicon. b For comparison of DYW gene diversity of Takakia lepidozioides (highest predicted editing frequency) and Pogonatum urnigerum (low predicted editing frequency) 30 clones were sequenced for each species. Four different E–DYW domain sequences were identified in Pogonatum and 26 different sequences in Takakia, which all cluster species-specifically in a simple Neighbor-Joining tree using uncorrected (p) distances

The Detailed Inventory of RNA Editing Sites

A detailed listing of editing sites following the recently proposed nomenclature (Rüdinger et al. 2009) is given in Supplementary Tables S2–S5. The few editing sites which were not confirmed in cDNA analyses of related taxa were omitted from the listings. For the exemplary cDNA analyses (Tables 1, 2) at least three clones per gene region were sequenced to increase chances of identifying potentially partial and/or silent RNA editing events. In liverworts generally one to two editing sites per taxon were found to be partially edited. In mosses, except for one single editing site in Homalia trichomanoides, all editing sites were completely edited in all sequenced cDNA clones. Silent RNA editing sites which do not change the amino acid sequence were rarely observed. No single reverse U-to-C RNA editing site was detected in any of the moss or liverwort cDNAs.

RNA Editing Frequencies Correlate with DYW Domain Diversity in Selected Species

With the observation of highly variable editing frequencies both in liverworts and in mosses, the question arises whether the number of DYW-type PPR genes, recently identified as RNA editing recognition factors, varies correspondingly. In a previous study we already showed a high diversity of E–DYW domains in Haplomitrium mnioides, the taxon with apparently the highest RNA editing rate among liverworts (Rüdinger et al. 2008). Targeting a PCR amplicon encompassing the E–DYW domain continuity with degenerate primers, we now wished to check the diversity of the corresponding gene family in the liverwort Lejeunea cavifolia, for which we now found a particularly low number of predicted RNA editing sites. Indeed, in a total of 30 E–DYW amplicon clones only nine different sequences could be identified (Fig. 2b), in contrast to the 40 different sequences among 51 E–DYW clones previously identified in Haplomitrium.

A fully congruent picture emerged for the mosses. 30 E–DYW amplicon clones of Takakia lepidozioides, the moss with the highest degree of putative RNA editing sites in mitochondria and likely also in chloroplasts (Sugita et al. 2006; Yura et al. 2008), resulted in 26 different E–DYW domain sequences. Conversely, in a total of 30 E–DYW amplicon clones of Pogonatum urnigerum, with only two editing sites predicted in the investigated gene regions, only four different E–DYW domain sequences could be identified (Fig. 3b), again demonstrating clearly that numbers of RNA editing sites and DYW gene diversity correlate.

The restricted clone samplings naturally cover only a fraction of the true DYW gene diversity in a given genome. Firstly, degenerate primers will bind preferentially to a subset of DYW gene family members. This was tested with Funaria hygrometrica DNA as a control, where we identified three of the nine known DYW genes (Rüdinger et al. 2011b) among 30 clones. With the benefit of knowing the primer target sites in this species, these turned out to be the DYW genes that the primer sequences fitted best. Nevertheless, the Funaria result matches very well with the only four DYW sequences identified in Pogonatum given that both moss species show only two editing sites in the sampled genes. Secondly, even if primer binding were fully unbiased for DYW genes, theoretical mathematical modelling shows that a total genomic diversity of n different DYW genes requires a sampling of at least 2n clones for reliable estimates, which limits the approach for very high DYW gene numbers (see Supplementary Fig. 1). Conversely, the high diversity values of 26 or 40 different DYW domains in samplings of 30 or 51 clones in Takakia and Haplomitrium, respectively, would translate into minimally 50 or 60 and up to a few hundreds of DYW domains in their genomes within 95% confidence limits. To test whether our observations from the clone samplings for the two species pairs of mosses and liverworts reflect true differences, we used Fisher’s exact test, which gives the likelihood that the two different observations actually reflect equal true diversities. The likelihood of equal probabilities for the liverwort species pair Haplomitrium/Lejeunea (40/51 vs. 9/30) is 4 × 10⁻⁵ and for the moss species pair Takakia/Pogonatum (26/30 vs. 4/30) is 1.3 × 10⁻⁸, therefore strongly rejecting hypotheses of equal probabilities in both species pairs.
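The comparison of the two species pairs can be retraced with a small stand-alone implementation of the two-sided Fisher's exact test. The sketch below is an illustration rather than the analysis script used here. It assumes that each 2 × 2 table contrasts the number of different sequences with the number of remaining (redundant) clones per species, which is one reading of the ratios quoted above, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating point ties
    )


# Rows: species; columns: different sequences, redundant clones.
p_liverworts = fisher_exact_two_sided(40, 51 - 40, 9, 30 - 9)  # Haplomitrium vs. Lejeunea
p_mosses = fisher_exact_two_sided(26, 30 - 26, 4, 30 - 4)      # Takakia vs. Pogonatum
print(f"{p_liverworts:.1e}", f"{p_mosses:.1e}")
```

Under this reading of the tables the moss pair evaluates to about 1.3 × 10⁻⁸, and the liverwort pair likewise gives a very small probability.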

Conservation of C-terminal Domain Additions Among Land Plant Clades

The C-terminal domain additions E/E+/DYW in more than 100 PLS-type PPR proteins of this type each are highly conserved among the dicot Arabidopsis thaliana and the monocot Oryza sativa. The moss Physcomitrella patens encodes only ten DYW genes in its nuclear genome and it seemed interesting to check for conservation of the domains in the now available wide sequence samplings of liverworts and mosses. The domain sequences obtained in this study were combined with those from previous studies (O’Toole et al. 2008; Rüdinger et al. 2008) resulting in a total of 119 E–DYW domain sequences from 19 different liverworts and of 65 E–DYW domain sequences from 10 mosses, which were used to derive liverwort- and moss-specific consensus sequences of the C-terminal domain extensions (Tables 1, 2). The weblogo profiles (Crooks et al. 2004) and consensus sequences derived for liverworts and mosses did not show any characteristic differences (not shown), suggesting that no functional adaptation occurred after this earliest phylogenetic split of plant evolution. Therefore, liverwort and moss sequence alignments were combined to create a collective bryophyte data set, which was used to derive a joint weblogo conservation plot (Fig. 4). The comparison with the corresponding consensus sequence profiles of the E, E+, and DYW domains of Arabidopsis thaliana (Lurin et al. 2004) showed no single significant change in conservation patterns across all three C-terminal domains, suggesting no significant functional adaptation in land plant phylogeny. Only two conserved amino acid sequence stretches in the E domain and two in the DYW domain seem to be under relaxed conservation in Arabidopsis thaliana sequences (Fig. 4). 
Comparison with a consensus sequence (not shown) of 77 different E–DYW domains of the lycophyte Selaginella moellendorffii (http://wiki.genomics.purdue.edu/index.php/PPR_gene_family) likewise shows a comparably high conservation of these protein domains in the early-branching vascular plant lineage.

Fig. 4

a Typical motif structure of a PLS-type PPR protein, characterized by variable numbers (2–26) of alternating PPR motif repeats: canonical 35 aa P motifs (orange) and “long” L (brown) and “short” S (yellow) variants and optional carboxyterminal domain extensions E, E+, and DYW. Oligonucleotide primers F and R used in this study, which target the conserved E–DYW domain continuity, are indicated (black arrowheads). b Weblogo sequence conservation plot for 119 liverwort and 65 moss E–DYW amplicon sequences (conserved sequence motifs of primer binding sites indicated) obtained from their fused alignments using the WEBLOGO service at http://weblogo.berkeley.edu/logo.cgi. Blank positions indicate rare insertions of amino acids in individual sequences. Comparison with the corresponding conservation plot of the 87 Arabidopsis thaliana E/E+/DYW gene sequences shows largely identical conservation of highly conserved (bit score of at least 2) positions, with only four sequence stretches (AAA and RGV in the E domain and VLH and TAT in the DYW domain) showing relaxed conservation in Arabidopsis thaliana, marked with bold lines above (Color figure online)

Discussion

Considering the variability of RNA editing frequencies among the basal-most land plant clades and across the wide plant phylogeny at large, the mechanism to correct genetic information in organellar genomes on RNA level seems to be a highly dynamic evolutionary process. The most obvious and suggestive idea explaining the origin of RNA editing in land plants in the first place is the adaptation to higher exposure of mutational stress such as UV light in the new terrestrial environment (Fujii and Small 2011; Tillich et al. 2006). Strikingly, our analyses found extraordinarily high amounts of RNA editing in the respective early-branching taxa, Takakia and Haplomitrium, within their respective clades. A tendency to reduce RNA editing is then obvious both in the diversification of mosses and liverworts. It is certainly tempting to speculate that a potential reduction in RNA editing may reflect reductions of mutational pressure after other adaptations to land plant life have come into existence, such as improved protective plant surfaces or DNA repair mechanisms in the organelles (Maier et al. 2008). This would all the more emphasize the “living fossil” roles of the two enigmatic genera Takakia and Haplomitrium in their phylogenetically isolated positions, which have only recently been elucidated with molecular data (Crandall-Stotler et al. 2005; Davis 2004; Volkmar and Knoop 2010). However, it should not be overlooked that other early-branching taxa both in the mosses (e.g., Sphagnum) and in the liverworts (e.g., Apotreubia and Treubia) show massively lower or even extraordinarily low RNA editing rates leaving open that RNA editing in Takakia and Haplomitrium has instead increased independently in frequency. One clear point of evidence, however, in support for loss over independent gain scenarios is the striking overall absence of RNA editing in marchantiid liverworts, which we now find strongly supported from the much extended sequence and taxon sampling reported here. 
While the high numbers of RNA editing events in early-branching lineages among both mosses and liverworts might argue for highly frequent RNA editing as a plesiomorphic character state among land plants, the detailed RNA editing patterns support this idea only weakly. Out of 71 sites of editing in the nad5 gene, only 11 are shared between Haplomitrium and Takakia. Conceptually, the loss of editing sites can be expected to occur much more easily than the emergence of new sites, which requires appropriate novel editing factors. It has indeed been shown conclusively that losses of editing sites occur faster than gains in angiosperm mitochondria (Mower 2007; Shields and Wolfe 1997) and chloroplasts (Tillich et al. 2006). However, a different picture may emerge on larger evolutionary timescales, as reflected by the dramatic increase of RNA editing frequencies in ancient clades such as the hornworts and lycophytes (Duff 2006; Grewe et al. 2011; Hecht et al. 2011; Sper-Whitis et al. 1996). In those lineages, and possibly in yet more ancient isolated lineages like Haplomitrium and Takakia, evolutionary forces reshaping nuclear genomes by massive waves of gene duplications may allow for neo-functionalization of editing factors that address new editing sites, so that new sites may appear faster than others are lost.

Independent of the above gain or loss scenarios, the diversity of the nuclear DYW genes correlates well with the RNA editing frequencies. Extrapolating from our data, hundreds of such genes must be expected in the nuclear genomes of Takakia and Haplomitrium. Here we observed a clustering of the E–DYW domain sequences of one species (Figs. 2b, 3b) instead of a clustering of potential functional orthologues from different taxa. This is in stark contrast to the unequivocal identification of DYW protein orthologue pairs in the two closely related mosses Physcomitrella and Funaria that was recently reported (Rüdinger et al. 2011b), and it is evidently related to the much wider phylogenetic divergence between the respective pairs of high and low frequency editing liverworts and mosses investigated here (Figs. 2, 3). Clustering of DYW gene paralogs within one species would certainly be expected as the result of multiple gene duplications concomitant with vastly (and rapidly) increased frequencies of RNA editing in the organelles of taxa such as Haplomitrium, Takakia, the hornworts or the lycophytes (Duff 2006; Grewe et al. 2011; Hecht et al. 2011; Sper-Whitis et al. 1996).

The conservation of the C-terminal domain extensions E–DYW documented here across the entire land plant phylogeny is striking, with the domain sequences of liverworts and mosses being highly similar in their conservation patterns to the vascular plant homologues. This suggests highly conserved functions of these protein domains, which, however, remain elusive. The DYW domain in particular has been suggested to play an important role in RNA editing given its weak similarity to cytidine deaminases (Salone et al. 2007), and bioinformatic protein structure analyses have indeed found strong support for this (Iyer et al. 2011). However, this assumption could not be confirmed in vitro or in vivo (Nakamura and Sugita 2008; Okuda et al. 2009). In addition, besides several DYW-type proteins identified as RNA editing factors (Kim et al. 2009; Ohtani et al. 2010; Okuda et al. 2009; Robbins et al. 2009; Rüdinger et al. 2011b; Tasaki et al. 2010; Verbitskiy et al. 2011; Zehrmann et al. 2009; Zhou et al. 2008), numerous other PLS proteins of the E/E+ type lacking the carboxyterminal DYW domain were also identified as RNA editing recognition factors in chloroplasts (Chateigner-Boutin et al. 2011; Kotera et al. 2005; Okuda et al. 2007) and mitochondria (Sung et al. 2010; Takenaka 2010; Takenaka et al. 2010; Tang et al. 2010; Verbitskiy et al. 2010, 2011). One DYW-type PPR protein (Phypa_154890; PPR_43) in Physcomitrella patens has recently been shown to be relevant for splicing of the mitochondrial group II intron cox1i732g2 and not for RNA editing (Ichinose et al. 2011). Interestingly, however, this is the single P. patens DYW-type PPR protein lacking highly conserved amino acid positions assumed to be important for deaminase functionality (Rüdinger et al. 2011b). Moreover, the degenerated DYW domain and the preceding E and E+ domains could be deleted without loss of splicing function (Ichinose et al. 2011).
Aside from the observations reported here, the link between the DYW domain and the cytidine deamination type of RNA editing is further strengthened by recent findings in the heterolobosean protist Naegleria gruberi. After the first discovery of DYW-type PPR protein genes outside of land plants in the nuclear genome of Naegleria (Knoop and Rüdinger 2010), we recently identified two events of C-to-U RNA editing in its mitochondrial transcriptome (Rüdinger et al. 2011a). The huge phylogenetic distance of some 1.5 billion years separating Naegleria from land plants raises yet more questions about why and how RNA editing came into existence and is maintained in certain lineages of life.