Introduction

Group II introns are large ribozymes that catalyze their removal from precursor messenger (m) RNA, and a number of such introns in fungi, algae, and bacteria have been shown to self-splice from precursor transcripts in the absence of protein co-factors (Peebles et al. 1986; Schmelzer and Schweyen 1986; van der Veen et al. 1986; Schmidt et al. 1990; Ferat and Michel 1993; Costa et al. 1997; Robart and Zimmerly 2005; Mullineux et al. 2010). Ribozyme-catalyzed splicing follows the branching pathway, in which the intron is excised as a branched, or lariat, molecule with a characteristic 2′–5′ phosphodiester bond, and/or the hydrolytic pathway, in which the intron is released as a linear molecule (Daniels et al. 1996; Vogel and Börner 2002). Many group II introns are also mobile retroelements that insert site-specifically into cognate intron-minus alleles with the assistance of an intron-encoded protein (IEP, Moran et al. 1995). Typical group II IEPs are multifunctional proteins with reverse transcriptase (RT), maturase, and DNA endonuclease activities (reviewed in Lambowitz and Belfort 1993; Saldanha et al. 1993; Michel and Ferat 1995; Lambowitz et al. 1999; Lambowitz and Zimmerly 2004).

A novel type of group II intron containing an open reading frame (ORF) encoding a putative LAGLIDADG homing endonuclease (LHEase), rather than an RT-type ORF, was identified in the mitochondrial (mt) small subunit (rns) and large subunit (rnl) ribosomal (r) RNA genes of fungi belonging to the Ascomycota and Basidiomycota (Michel and Ferat 1995; Toor and Zimmerly 2002; Monteiro-Vitorello et al. 2009; Mullineux et al. 2010). However, the origin of this novel composite element and the mode of transmission among species and populations remain poorly understood.

RNA secondary structure models indicate that these introns belong to the group IIB1 subclass (Mullineux et al. 2010). The LAGLIDADG ORFs are inserted within domain (D) IV in the mS785 intron of the rns gene of Cryphonectria parasitica and in intron 5 (mL2059) of the rnl gene of Agrocybe aegerita (Toor and Zimmerly 2002). However, in the mS952 group II intron of Leptographium truncatum (Mullineux et al. 2010) and in the related mS952 introns identified in Cordyceps species (spp.) and C. parasitica (Monteiro-Vitorello et al. 2009), the ORF is inserted in a peripheral loop in DIII (Toor and Zimmerly 2002; Mullineux et al. 2010). DIII is a ribozyme component that acts as a catalytic effector in intron splicing (Lehmann and Schmidt 2003; Fedorova and Zingler 2007; Pyle 2010).

LAGLIDADG homing endonuclease genes (LHEGs) are widely associated with self-splicing elements, such as introns and inteins, or they may be present as free-standing ORFs, inserted outside of the intervening sequence (Dujon 1980; Dalgaard et al. 1993; Jurica and Stoddard 1999; Gibb and Hausner 2005; Bae et al. 2009; Singh et al. 2009). LAGLIDADG-type HEase proteins are named for their LAGLIDADG amino acid α-helical motifs that comprise part of the enzyme’s active site. LHEases bind to long, greater than 20 base pairs (bp), DNA target sites and exhibit flexibility in sequence recognition (reviewed in Chevalier et al. 2005). This class of meganuclease promotes homing by generating a double-stranded cut with 4 nucleotide (nt) 3′OH overhangs in DNA; the break is repaired by the host’s double-stranded break repair processes using the intron/LHEG-containing allele as a template (reviewed in Belfort and Roberts 1997; Belfort et al. 2002; Stoddard 2006; Edgell 2009). Some LHEases have been shown to function as maturases, promoting the splicing of their host group I intron and occasionally related introns (Lazowska et al. 1989; Ho et al. 1997; Ho and Waring 1999; Bassi et al. 2002; Bassi and Weeks 2003; Belfort 2003; Longo et al. 2005).

The mS952 intron is currently the best characterized example of group II introns that encode LAGLIDADG-type ORFs. Previously, we demonstrated that the mS952 intron, Lt.SSU/1, in L. truncatum strain CBS929.85 self-spliced from precursor transcripts in the absence of co-factors under moderate temperature (37°C) and ionic (6 mM Mg2+) conditions and that the presence of the ORF sequence in DIII did not inhibit the efficiency of autosplicing (Mullineux et al. 2010). We also showed that the LHEase, designated I-LtrII, acted solely as an endonuclease and cleaved the mt rns gene 2 nt upstream of the intron insertion site, strongly suggesting that I-LtrII potentially promotes the mobility of the group II intron/LHEG and that they function as a composite genetic element (Mullineux et al. 2010).

In this study, we describe the results of a PCR-based survey of introns within the mt rns gene of 47 strains belonging to the asexual genus Leptographium, phylogenetically allied to the sexual genus Grosmannia (Ascomycota), and the phylogenetic relationships of the host gene, intron, and the LHEase sequences. These fungi are economically important; they are commonly referred to as “blue-stain fungi,” as they impart stains on stored lumber, reducing its value, and some species of Leptographium and Grosmannia are also tree pathogens (reviewed in Hausner et al. 2005). The goals of the study were: (i) to identify group II intron/LHEG composite elements; (ii) to examine the evolutionary relationships of the intron and LHEG; and (iii) to gain an understanding about the transmission of this composite element among Leptographium spp.

Materials and Methods

Amplification, Cloning, and Sequencing of the mt rns Gene

The maintenance of fungal cultures and DNA extraction protocols employed in the present study are described in Hausner et al. (1992), and the strains of Leptographium used in the PCR screen are listed in Table 1. The oligonucleotide primers used for amplification of the mt rns gene, mtsr1 and mtsr2, the amplification conditions, and the purification and cloning of amplicons are described elsewhere (Mullineux et al. 2010). The cycle-sequencing of PCR products and plasmid DNA was carried out as previously described (Mullineux and Hausner 2009; Mullineux et al. 2010). Chromatograms were visualized using the program BioEdit version 7.0.9.0 (Hall 1999), and sequence data were aligned using GeneDoc V2.7.000 (Nicholas et al. 1997). The program ORF Finder (National Center for Biotechnology Information, NCBI) was used to identify putative ORF sequences. Introns are named based on the location of the insertion with respect to the small subunit (SSU) rRNA gene of Escherichia coli strain J01695 or the large subunit rRNA gene (AB035922) of E. coli, according to the proposed nomenclature by Johansen and Haugen (2001).
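The naming convention used for the introns in this study can be illustrated with a short sketch. It assumes that the prefix "m" denotes a mitochondrial gene, that "S" and "L" denote the small and large subunit rRNA genes, and that the number is the insertion position relative to the E. coli reference sequence; the function name and its arguments are hypothetical and serve only as an illustration.

```python
def intron_name(subunit, ecoli_position, mitochondrial=True):
    """Build an intron name such as 'mS952' from its components.

    subunit: 'small' or 'large' (rRNA subunit gene hosting the intron).
    ecoli_position: insertion position relative to the E. coli reference.
    mitochondrial: assumption that an 'm' prefix marks a mitochondrial gene.
    """
    letters = {"small": "S", "large": "L"}
    if subunit not in letters:
        raise ValueError("subunit must be 'small' or 'large'")
    prefix = "m" if mitochondrial else ""
    return f"{prefix}{letters[subunit]}{ecoli_position}"


# Names that appear in the text
rns_intron = intron_name("small", 952)
rnl_intron = intron_name("large", 2059)
```

Under these assumptions, the two calls reproduce the names mS952 and mL2059 used in the text.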

Table 1 List of strains and size of rns PCR amplicons

Phylogenetic Analyses of Sequence Data

Evolutionary relationships among strains of Leptographium and related taxa were previously inferred using the nuclear internal transcribed spacer (ITS) 1-5.8S rDNA-ITS2 region, as described (Mullineux and Hausner 2009). For inferring evolutionary relationships among the mt rns gene, intron, and LHEase sequences, sequence data were obtained from strains housed at the WIN(M) herbarium (University of Manitoba), and additional sequences were obtained from GenBank (NCBI) using those sequences as queries in blastn searches, employing the database corresponding to “Others (nr, etc.)” for the mt rns and group II intron data sets and in blastp searches for the amino acid data set. Identical sequences were identified using DAMBE (Xia 2000) and discarded for this study, leaving data sets of 65 sequences (mt rns exon), 18 (mt rns group II introns), and 39 (LHEases).
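The removal of identical sequences before phylogenetic analysis can be sketched as follows. This is a minimal illustration of the idea only and is not the procedure implemented in DAMBE; the function name and the toy records are hypothetical, and sequences are assumed to be compared as exact strings after upper-casing.

```python
def drop_identical(records):
    """Return records with duplicate sequences removed, keeping the first seen.

    records: list of (name, sequence) pairs.
    Sequences are compared after upper-casing, so 'acgt' equals 'ACGT'.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique.append((name, seq))
    return unique


# Hypothetical toy records, not sequences from this study
toy = [("strainA", "ATGCGT"), ("strainB", "atgcgt"), ("strainC", "ATGCGA")]
unique_toy = drop_identical(toy)
```

In this toy example, strainB is dropped because its sequence matches that of strainA.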

Strains used in the analysis of the mt rns gene are listed in Supplementary Table 1. DNA sequences corresponding to the mt rns exon (from which intronic sequences were removed) were first aligned using ClustalX 2.0.10 (Larkin et al. 2007), and the alignment was refined manually using GeneDoc V2.7.000 (Nicholas et al. 1997). Regions of the mt rns sequence in which the alignment was ambiguous were removed; the alignment used for phylogenetic analyses is provided in Supplementary Fig. 1. Programs contained within PHYLIP Version 3.68 (Felsenstein 2008), MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003), and Tree-Puzzle version 5.2 (Schmidt et al. 2002) were utilized for phylogenetic analyses. The mt rns gene sequence of Kluyveromyces thermotolerans was selected as the outgroup, and the data set was analyzed with DNAPARS (maximum parsimony) and DNADIST (F84 setting). From the latter, the distance matrix generated for each set was utilized in the NEIGHBOR program (NJ setting) for inferring a phylogenetic tree. Phylogenetic estimates were evaluated using the bootstrap procedure (SEQBOOT: NJ, 1,000 replicates; parsimony, 1,000 replicates and jumble 1 time) and CONSENSE in PHYLIP. Analysis with the Tree-Puzzle program used the following settings for the quartet puzzling algorithms: 25,000 puzzling steps; transition/transversion parameter estimated from the data sets; and HKY evolutionary model (Hasegawa et al. 1985). For Bayesian analysis, the data set comprised 65 taxa and 1,242 characters, lset nst was set to six, and the rate was set to gamma. The analyses were run for 10 million generations, and the sampling frequency was set to 1,000. To generate 50% majority rule consensus trees with posterior probability values, 50% of the trees were discarded. The phylogenetic tree presented was drawn with the TreeView program version 1.6.6 (Page 1996), using the Bayesian consensus outfile, and annotations were added to the figure using Corel Draw version 14.0.0.701 (Corel Corporation, Ottawa, Canada).
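The number of trees contributing to a Bayesian consensus follows from the run length, the sampling frequency, and the fraction discarded. The sketch below works through that arithmetic for the settings stated above; the function name is hypothetical, and the count ignores the initial state that some programs also record.

```python
def mcmc_samples(generations, sample_freq, discard_fraction=0.5):
    """Return (sampled, discarded, retained) tree counts for one run.

    generations: total number of MCMC generations.
    sample_freq: a tree is sampled every `sample_freq` generations.
    discard_fraction: fraction of sampled trees discarded before the
        consensus is built (0.5 matches the 50% stated in the text).
    """
    sampled = generations // sample_freq
    discarded = int(sampled * discard_fraction)
    return sampled, discarded, sampled - discarded


# 10 million generations sampled every 1,000 generations
rns_run = mcmc_samples(10_000_000, 1_000)
```

With these settings, 10,000 trees are sampled, of which 5,000 are discarded and 5,000 are retained for the consensus.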

Sequences used in the phylogenetic analysis of the group II intron are listed in Supplementary Table 2. DNA sequences corresponding to the group II intron (from which sequences between and including the putative ORF start codon to the stop codon were removed) were aligned as described for the mt rns gene sequence alignment; conserved helices and loops in RNA secondary structure models (Toor and Zimmerly 2002; Mullineux et al. 2010) were used as a guide to refine the alignment. Regions in DIII to DIV in which the alignment was ambiguous were removed; the alignment is provided in Supplementary Fig. 2. For phylogenetic analysis, the mS785 intron from C. parasitica (C.p.SSUi1) was selected as the outgroup. Phylogenetic estimates inferred using programs contained within PHYLIP were evaluated as described for the mt rns gene, except that the jumble number was set to 3. Analysis with the Tree-Puzzle program used the same settings as for the mt rns data set, except that the number of puzzling steps was 10,000. For Bayesian analysis, the data set comprised 18 taxa and 739 characters and was analyzed as for the mt rns exon data set. The phylogenetic tree was drawn as described for the mt rns gene.

Sequences used in the phylogenetic analysis of the LHEase dataset are listed in Supplementary Table 3. The amino acid sequences of putative LHEGs were automatically aligned with PRALINE (Heringa 1999, 2000, 2002; Simossis and Heringa 2003, 2005) using the default parameters: exchange weights matrix, BLOSUM62; open gap penalty, 12; extension, 1; progressive alignment strategy, PSI-BLAST pre-profile processing (homology-extended alignment); iterations, 3; e-value cut-off, 0.01; DSSP-defined secondary structure search; and secondary structure prediction, PSIPRED. The alignment was then refined manually, and ambiguous regions were ultimately removed; the alignment of the LHEase amino acid sequences is provided in Supplementary Fig. 3. For analysis of the amino acid sequence of the LHEases, the LHEase encoded within the fifth intron of the cox1 gene from Podospora anserina (cox1i5) was selected as the outgroup. For maximum parsimony, phylogenetic estimates were evaluated as for the mt rns exon data set. Analysis with the Tree-Puzzle program used the following settings for the quartet puzzling algorithms: 10,000 puzzling steps; uniform rate of heterogeneity; and the Mueller–Vingron Model (Müller and Vingron 2000). For Bayesian analysis, the data set comprised 39 taxa and 356 characters. The parameters were estimated by MrBayes, the amino acid model was Poisson, and a gamma rate was used. The analyses were run for 5 million generations and the sampling frequency was set to 1,000. To generate 50% majority rule consensus trees with posterior probability values, 50% of the trees were discarded. The phylogenetic tree was drawn as described for the phylogenetic tree of the mt rns gene.

To infer in greater detail the evolutionary relationships among the 16 mS952 LHEG sequences, the DNA sequence between (and including) the start and stop codons was aligned (Supplementary Fig. 4), and identical sequences were removed, leaving a data set of 14 taxa. For Bayesian analysis, the data set comprised 14 taxa and 1,174 characters. The parameters were estimated by MrBayes, the analyses were run for 10 million generations, and the sampling frequency was set to 1,000. To generate 50% majority rule consensus trees with posterior probability values, 50% of the trees were discarded. Phylogenetic estimates inferred using programs contained within PHYLIP were evaluated as described for the mt rns gene, except that the jumble number was set to 3. Analysis with the Tree-Puzzle program used the same settings as for the mt rns data set, except that the number of puzzling steps was 1,000. In all analyses, the ORF sequence of the mS952 intron of C. parasitica was used as the outgroup. The phylogenetic tree was drawn as described for the phylogenetic tree of the mt rns gene.

Results

Distribution of Introns Within the Mitochondrial rns Gene of Leptographium spp.

The mt rns gene in members of the fungal genus Leptographium was screened for the presence of introns using PCR. Amplification of the mt rns gene using primer pair mtsr1 and mtsr2 yielded an amplicon of either 1.2 kb, corresponding to the expected size of intron-minus alleles, or 3–4 kb, representing the intron-plus allele (Table 1); intron-plus alleles among strains of Leptographium correspond to an intron of 1.8–2.8 kb in size.

Amplicons of 1.2 kb were observed in all strains of L. procerum, indicating that introns were absent from the mt rns gene in members of this taxon (Fig. 1). Conversely, amplicons of 3–3.5 kb, corresponding to ORF-containing introns, were obtained for all strains of L. lundbergii. Mitochondrial heteroplasmy of the mt rns gene, that is, the presence of both intron-plus and intron-minus alleles, was detected in the following L. lundbergii strains: NFRI69-148, NFRI89-1040/1/3, NFRI1502/1, and CBS352.29. Strains NFRI1502/1 and CBS352.29, which share identical ITS sequences (indicated by the “=” sign separating taxa in Fig. 1) along with strains DAOM60397 and DAOM63692 yielded amplicons of different sizes (3 and 3.5 kb, respectively).
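The relationship between amplicon size, intron size, and allele status described above can be sketched as follows. The 1.2-kb intron-minus amplicon size is taken from the text; the tolerance value, the function names, and the category labels are hypothetical and serve only as an illustration.

```python
INTRON_MINUS_KB = 1.2  # amplicon size of the intron-minus allele (from the text)
TOLERANCE_KB = 0.1     # hypothetical sizing tolerance, an assumption


def intron_size_kb(amplicon_kb):
    """Estimate intron size as the amplicon size minus the intron-minus size."""
    return round(amplicon_kb - INTRON_MINUS_KB, 1)


def classify_strain(amplicons_kb):
    """Classify a strain from the list of amplicon sizes (kb) it yields."""
    minus = any(abs(a - INTRON_MINUS_KB) <= TOLERANCE_KB for a in amplicons_kb)
    plus = any(a - INTRON_MINUS_KB > TOLERANCE_KB for a in amplicons_kb)
    if minus and plus:
        return "heteroplasmic"
    if plus:
        return "intron-plus"
    if minus:
        return "intron-minus"
    return "unclassified"


# Amplicons of 3 and 4 kb correspond to introns of 1.8 and 2.8 kb
small_intron = intron_size_kb(3.0)
large_intron = intron_size_kb(4.0)
```

A strain yielding both a 1.2-kb and a 3-kb amplicon would be classified as heteroplasmic under this scheme.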

Fig. 1

Phylogenetic tree of strains of Leptographium and related fungal taxa, inferred using nuclear ITS1-5.8S-ITS2 rDNA sequences, showing the distribution of introns within the mt rns gene of strains of Leptographium. Intron-minus alleles produce an amplicon of 1.2 kb in size, while intron-plus alleles produce amplicons ranging in size from 3 to 4 kb. Some strains are heteroplasmic, containing an intron-minus allele (1.2 kb) and intron-plus allele (3 kb). Branch lengths were determined using the Bayesian consensus outfile generated from the phylogenetic analysis of nuclear ITS1-5.8S rDNA-ITS2 DNA sequences in strains of Leptographium and related fungal taxa. Values at the nodes were determined using algorithms implemented by the DNAPARS/Tree-Puzzle/MrBayes programs. The symbol “-” indicates the node is absent or the posterior probability or bootstrap value is not well supported (a posterior probability value of less than 0.95 and bootstrap values less than 70%). The “=” sign is used to indicate those strains that are identical at the ITS-5.8S sequence level. This phylogenetic tree is based on Fig. 3 in Mullineux and Hausner (2009)

Among strains of L. wingfieldii and L. terebrantis, amplicons were observed that ranged in size from 1.2 to 3 kb, and introns, when present, were 1.8 kb in size (Table 1). Within strains of L. wingfieldii, introns were absent in most isolates; in fact, L. wingfieldii strains TOM1.3 and TOM9.4, both collected in Ontario (Canada), are the sole isolates containing introns. Leptographium wingfieldii strains CBS948.89 and TOM9.4, which share identical ITS sequences, were differentiated based on the presence of an intron in the latter strain only. Leptographium terebrantis strains CBS337.70 and CBS298.85 contained introns of 1.8 kb, and heteroplasmy (intron-plus and intron-minus alleles) was observed in the latter strain. Introns were absent in L. terebrantis strains CBS408.61, UAMH690, and UAMH9722.

Among strains of L. truncatum, all European isolates contained an intron of 1.8 kb, with the exception of strain CBS647.89, for which a 4-kb amplicon was detected, corresponding to a 2.8-kb intron. Both isolates from New Zealand, however, contained only intron-minus alleles of the mt rns gene. Among the strains isolated from Ontario, strain TOM86.30 contained an intron of 2.8 kb, while no intron was found in strain TOM74.29. Leptographium truncatum strains CBS929.85, J.R.88-324, J.R.88-449, and CBS647.89 (indicated by the asterisk at the node in Fig. 1) are identical at the ITS-5.8S rDNA sequence level but exhibit a markedly different pattern of intron distribution. Strain CBS929.85 was heteroplasmic; it contained both an intron-minus allele and an intron-plus allele of the mt rns gene, and previous biochemical analysis revealed that this intron was self-splicing and its encoded LHEase cleaved the intron-minus allele 2 nt upstream of the intron insertion site (Mullineux et al. 2010). Strain CBS647.89 contained an intron-plus allele of 4 kb, while introns were absent in strains J.R.88-324 and J.R.88-449.

The Mitochondrial rns Gene Contains a Group II Intron/LHEG Composite Genetic Element

Sequence comparison of intron-plus and intron-minus alleles of the mt rns gene indicated that the introns were inserted at position 952 and corresponded to group II introns containing a putative LHEG, rather than the RT-type ORF typically associated with ORF-containing group II introns. No ORF-less introns, however, were found; that is, the intron and ORF sequences were found together as a composite element in all intron-plus alleles. The intron insertion sequence is conserved in members of Leptographium, as well as in the mS952 introns of Cordyceps spp. and C. parasitica, and is situated within the U5 region of the mt rns gene (Toor and Zimmerly 2002; Monteiro-Vitorello et al. 2009; Mullineux et al. 2010). Sequence characteristics of the group II introns and putative LHEase ORF for the remaining strains in this study are described in Table 2. The size of the intron ranged from 796 nt in Ophiocordyceps konnoana to 1,095 nt in Cordyceps sp. 97003. The intron sequence is AT-rich; GC content ranged from 27.2 to 34.0%, and the GC content of the intronic ORF sequences is similarly low, ranging from 23.9 to 32.2%. Putative ORF sequences were identified using ORF Finder (NCBI). Among Leptographium spp., the putative start codon occurs at either intron position 685 (L. lundbergii and L. truncatum) or 721 (L. wingfieldii and L. terebrantis). Where the ORF sequence appears to be intact, the putative gene encodes an LHEase of 304 amino acids that comprises two LAGLIDADG motifs: ICGLVDAEG and LAGFIEGEA.
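The GC content values reported for the intron and ORF sequences are simple proportions. A minimal sketch of the calculation is given below; the function name is hypothetical, the toy sequence is not taken from this study, and every character in the sequence is assumed to count toward the total length.

```python
def gc_percent(seq):
    """Return the percentage of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return round(100.0 * gc / len(seq), 1)


# Hypothetical toy sequence: 3 of 10 residues are G or C
toy_gc = gc_percent("ATTAGCATTC")
```

For the toy sequence the function returns 30.0, a value within the AT-rich range reported for the introns.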

Table 2 Features of the mS952 group II introns

There is evidence of degeneration in some of the intron ORFs (Table 2).

Insertion of a G at position 352 (based on the numbering of nucleotide positions in the ORF sequence, see Supplementary Fig. 4) in the LHEG of L. lundbergii strains DAOM60397, NFRI89-1040/1/3, and NFRI1502/1 results in a frame-shift that generates a premature UAA stop codon at ORF position 382. A subsequent 7-nt deletion after ORF position 480 regenerates the appropriate reading frame. In L. truncatum strain NFRI1813/1, a T-A transversion at ORF position 446 generates a premature UAA stop codon.
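The effect of a single-nucleotide insertion on the reading frame, and its restoration by a later 7-nt deletion, can be illustrated with a toy sequence. The sketch below assumes that only TAA and TAG act as stop codons, as in the mold mitochondrial genetic code, in which TGA encodes tryptophan; the sequence and the function names are hypothetical and do not represent the LHEG sequences of this study.

```python
STOP_CODONS = {"TAA", "TAG"}  # assumption: TGA encodes Trp in fungal mitochondria


def first_stop(orf):
    """Return the 1-based position of the first in-frame stop codon, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i + 1
    return None


def insert_base(orf, position, base):
    """Insert `base` so that it occupies 1-based `position` in the result."""
    return orf[:position - 1] + base + orf[position - 1:]


def restores_frame(*length_changes):
    """True if the net length change of a set of indels is a multiple of 3."""
    return sum(length_changes) % 3 == 0


# Hypothetical toy ORF: ATG GCT AAC GGA TAA
toy_orf = "ATGGCTAACGGATAA"
# Inserting a G at position 4 gives ATG GGC TAA ..., a premature stop
mutant = insert_base(toy_orf, 4, "G")
intact_stop = first_stop(toy_orf)
premature_stop = first_stop(mutant)
```

In the toy ORF the stop codon moves from position 13 to position 7 after the insertion. A +1 insertion followed by a 7-nt deletion gives a net change of -6 nt, a multiple of three, which is why the original reading frame is recovered downstream of the deletion.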

Phylogenetic Analyses of the Mitochondrial rns Gene, Intron, and LHEases Sequences

The mt rns sequences of intron-minus and intron-plus alleles of Leptographium strains were aligned with sequences from representatives of the Sordariomycetes, which include C. parasitica, Neurospora crassa, and Cordyceps spp. and members of the Saccharomycetales, which include Saccharomyces spp. and Kluyveromyces thermotolerans; the latter species was used as the outgroup in phylogenetic analysis (Monteiro-Vitorello et al. 2009). The evolutionary relationships of the mt rns gene of C. parasitica and Cordyceps spp. were previously examined (Monteiro-Vitorello et al. 2009); however, the study did not include representatives of Leptographium. The mt rns gene of Leptographium spp. groups with sequences obtained from members of teleomorphic (sexually reproducing) genera Grosmannia and Ophiostoma, forming a clade with numerous unresolved polytomies (Fig. 2). Within this complex, intron-minus alleles from L. truncatum strain CBS929.85 and L. terebrantis strain CBS337.70 are clustered within a single subclade, with strong support from Bayesian analysis (posterior probability value of 1.00) and moderate (88%) to strong (100%) support from maximum likelihood and NJ analyses, respectively. Intron-plus alleles from L. lundbergii strains DAOM60397, NFRI89-1040/1/3, and NFRI1502/1 also form a subclade with a strong posterior probability value (1.00) and strong support (98%) from maximum likelihood and NJ analyses. Intron-minus and intron-plus alleles from the remaining strains of L. terebrantis, L. truncatum, and L. wingfieldii group with members of Grosmannia, Ophiostoma, and Ceratocystis (it is worth noting, however, that Ceratocystis ossiformis should be transferred to the genus Ophiostoma; see Hausner et al. 1993). The most closely related clade is composed of C. parasitica, P. anserina, and N. crassa. The rns gene sequences from species of Cordyceps, of which some members contain an mS952 group II intron/LHEG composite element, are more distantly related.

Fig. 2

Phylogenetic analysis of the mt rns gene sequence in strains of Leptographium and related ascomycetous fungal taxa. Branch lengths were determined using the Bayesian consensus outfile. Values at the nodes were determined using algorithms in the NJ/DNAPARS/Tree-Puzzle/MrBayes programs. The symbol “-” indicates the node is absent or the posterior probability or bootstrap value is not well supported (a posterior probability value of less than 0.90 and bootstrap values less than 70%). The basal position of Ophiostoma microsporum among the tested ophiostomatoid fungi is expected based on a study by Zipfel et al. (2006) that suggests that the genus Ophiostoma could be paraphyletic. Based on rDNA data, Ceratocystis ossiformis should be transferred to the genus Ophiostoma (see Hausner et al. 1993)

For the phylogenetic analysis of the core intron sequences, the putative ORF sequences were removed. The phylogeny showed that introns inserted at position 952 are related (Fig. 3). The Leptographium introns form a distinct clade with 100% bootstrap support and a posterior probability value of 1.00. The topology of the tree shows that the arrangement of the intron sequences resembles that of the host organism (compare Fig. 3 with Fig. 1); that is, three separate subclades are formed comprising the introns found in the L. wingfieldii–L. terebrantis species complex, L. lundbergii, and L. truncatum. However, only the clade composed of the L. truncatum intron sequences received support from maximum likelihood (98%) and Bayesian (0.97) analyses. The Leptographium mt rns intron sequences are more closely related to the introns found within Cordyceps spp., rather than intron 3 (the mS952 intron) of C. parasitica, in contrast to the topology observed in the phylogenetic tree of the host gene (Fig. 2). Introns in the mt rns gene of the Cordyceps spp. are also related, although support for the clade is low (86% bootstrap support from maximum likelihood analysis only); only the subclade formed by intron 1 of Cordyceps sp. 97003 and of O. sobolifera received strong support from bootstrap (97–99%) and Bayesian (1.00) analyses.

Fig. 3

Phylogenetic analysis of the mt rns group II intron sequence in strains of Leptographium, Cordyceps spp., and C. parasitica. Sequences from the putative start codon to the putative stop codon were removed from the sequence alignment. Branch lengths were determined using the Bayesian consensus outfile. Values at the nodes were determined using algorithms in the NJ/DNAPARS/Tree-Puzzle/MrBayes programs. The symbol “-” indicates the node is absent or the posterior probability or bootstrap value is not well supported (a posterior probability value of less than 0.95 and bootstrap values less than 70%). Nomenclature of the intron names follows that used by Monteiro-Vitorello et al. (2009) and Mullineux et al. (2010)

Phylogenetic analysis of the LHEase amino acid sequences (Fig. 4a) indicates that the LHEases encoded within mS952 group II introns of C. parasitica, Cordyceps spp., and Leptographium spp. form a distinct clade with moderate bootstrap (83–89%) and posterior probability (1.00) support. The LHEases encoded by the Leptographium introns form a subclade with strong bootstrap (90–100%) and posterior probability (1.00) support. The topology of this clade reflects that observed with the phylogenetic trees of the intron (Fig. 3) and the ITS-5.8S rDNA (Fig. 1) sequences. In L. truncatum strain NFRI1813/1, a T-A transversion generates a premature UAA stop codon. In L. lundbergii strains DAOM60397, NFRI89-1040/1/3, and NFRI1502/1, a frame-shift mutation is generated by the insertion of a G residue, leading to a premature stop codon. A subsequent loss of 7 nt restores the reading frame. These observations suggest that these particular HEGs are degenerating. The term “(d)” refers to the edited sequence in which the N- and C-terminal fragments were “joined” by replacing with gaps (-) the amino acid sequences that were not identical to those of closely related taxa. The original alignment, showing both fragmented and “ligated” putative LHEases, is shown in Supplementary Fig. 5. LHEase sequences encoded by introns within the mt rns gene of Cordyceps spp. also form a distinct subclade; bootstrap support ranged from 71 to 100% and the posterior probability was 1.00, and these LHEases are more closely related to those of Leptographium spp. than to the mS952 intron-encoded LHEase of C. parasitica (Fig. 4a).

Fig. 4

a Phylogenetic analysis of LHEase amino acid sequences in strains of Leptographium and related ascomycetous fungal taxa. Branch lengths were determined using the Bayesian consensus outfile. Values at the nodes were determined using algorithms in the DNAPARS/Tree-Puzzle/MrBayes programs. The symbol “-” indicates the node is absent or the posterior probability or bootstrap value is not well supported (a posterior probability value of less than 0.95 and bootstrap values less than 70%). LHEase sequences encoded within Leptographium spp. are indicated in bold. The “(d)” indicates that the full-length amino acid sequence was obtained by ligating the N- and C-terminal fragments of the LHEase. Nomenclature of the intron names follows that used by Monteiro-Vitorello et al. (2009), Sethuraman et al. (2009), and Mullineux et al. (2010). b Phylogenetic tree based on the complete nucleotide sequences of the mS952 intron ORFs. The annotations of the nodes are the same as in Fig. 2

A second group II intron/LHEG composite element has been previously identified at position 785 of the mt rns gene of C. parasitica (Toor and Zimmerly 2002). This LHEase is distantly related to the mS952 intron ORFs. Instead, the mS785 ORF might share ancestry with LHEases associated with group I introns in the SSU and rnl genes of Ophiostoma spp. and C. parasitica (intron 2), as well as intron 1 of the cob gene of P. anserina, albeit with only moderate bootstrap support (87%) based on parsimony analysis. Another monophyletic set of LHEases, encoded within group I introns in the mt rns gene of C. parasitica (intron 4), O. sobolifera (intron 2), and Agrocybe aegerita (rnsi1) and defined by a node with strong bootstrap (98%, parsimony analysis) and posterior probability (1.00) support, is distantly related to the mS952 LHEases. LHEases encoded by group I introns inserted in the NADH dehydrogenase (ND4L and ND5) genes each form distinct clades with strong support (bootstrap, 90–100%, and posterior probability value, 1.00).

To resolve in greater detail the evolutionary relationships of the mS952 ORF sequences, the nucleotide sequences of the entire HEG, encompassing the start and stop codons, were analyzed (Fig. 4b). The LHEG sequences from Cordyceps spp. and Leptographium spp. each form a distinct clade with strong bootstrap support (99–100% and 97–100%, respectively) and a posterior probability of 1.00. The ORF sequences of L. terebrantis and L. wingfieldii form a subclade with moderate (87%) to strong (91–99%) bootstrap support and strong support from posterior probability analysis (1.00). Support for the clade encompassing the L. truncatum ORF is similarly strong (with bootstrap values of 89–96% and a posterior probability of 1.00). Support for the node grouping the L. truncatum and L. terebrantis–L. wingfieldii ORFs is lower, with only 89% bootstrap support from NJ analysis and a posterior probability value of 0.92. ORF sequences from L. lundbergii form an unresolved polytomy.

Discussion

Group II Introns Encoding LHEGs in the mt rns Gene

The demonstration that the mS952 group II intron of L. truncatum is an active ribozyme and the LHEase cleaves the mt rns gene in the proximity of the intron insertion sequence (Mullineux et al. 2010) led to the intriguing possibility that group II introns and LHEGs have evolved to form a novel type of composite mobile element. Sequence analysis of the mS952 composite element in other Leptographium spp. identified two potential subclasses, on the basis of sequence characteristics of the intron and the position of the putative start codon of the intronic ORF (Table 2). The intron identified in strains of L. lundbergii and L. truncatum is 925 nt and has a % G + C of 29.0–29.1, and the putative start codon of the ORF occurs after intron position 685. However, the intron within the mt rns gene of L. wingfieldii and L. terebrantis is 961 nt and has a % G + C of 30.4–30.5, and the putative start codon follows intron position 721. There is evidence indicating that several ORF sequences are in the process of degeneration (Table 2). Frame-shift mutations were identified in strains of L. lundbergii and L. truncatum strain NFRI1813/1. A mutation after the first LAGLIDADG motif generates a premature stop codon; this mutation effectively renders this HEG a pseudogene.

Phylogenetic analyses of the intron, LHEase, and LHEG sequences indicated that the mS952 introns and LHEases are related to each other; each element found in the mt rns gene of Leptographium spp. forms a distinct clade, the topology of which reflects that observed in the phylogenetic analysis of the host organism. Numerous unresolved polytomies in the phylogenetic tree of the mt rns gene prevent comparison of the evolutionary relationships of each element to the host gene. Putative LHEGs have been identified in group II introns inserted at positions 785 (mS785 intron) and 952 of the mt SSU gene and in a group II intron inserted within the rnl gene (mL2059) of A. aegerita (Toor and Zimmerly 2002). The amino acid sequence of the putative LHEase in A. aegerita contains numerous frame-shift mutations, and since the purpose of this work was to examine the evolution of group II introns and LHEGs in the SSU gene, specifically the mS952 introns, this sequence was not included in the phylogenetic analysis of the LHEases. However, the results show that LHEases within mS785 and mS952 are only distantly related, and taking into consideration the LHEase in the rnl group II intron, it is likely that group II introns were invaded by LHEGs on at least three separate occasions, with the LHEases associated with the rnl gene of A. aegerita and mS785 introns targeting intron DIV and those associated with mS952 introns targeting intron DIII. It is worth noting that, based on previous studies (Blackwell et al. 2006; Monteiro-Vitorello et al. 2009), as well as the mt rns analysis presented in this study (Fig. 2), Leptographium, Ophiostoma, and Grosmannia are more closely related to C. parasitica than to Cordyceps spp. However, phylogenetic analysis of the two components that make up the mS952 element suggests that the intron (Fig. 3) and the encoded ORFs (Fig. 4) for Cordyceps spp. are more closely related to those found within the Leptographium spp. Thus, the mS952 group II intron/LHEG composite element may be capable of horizontal transmission, as the phylogenetic relationship of these components does not reflect the evolutionary relationships of the host genomes.

Evolution of Group II Intron/LHEG Composite Elements in the mt rns Gene

The generation of mobile introns is described by the endonuclease gene invasion hypothesis (Belfort 2003), which argues that splicing and mobility functions originated independently. In addition, several models have been proposed describing a HEG life cycle of invasion, transmission, degeneration, loss, and re-invasion (Goddard and Burt 1999; Burt and Koufopanou 2004; Haugen et al. 2005; Gogarten and Hilario 2006; Yahara et al. 2009). Self-splicing introns represent phenotypically neutral sites for HEG invasion, and during the degeneration phase of the HEG life cycle, the HEG may evolve such that the HEase fortuitously targets intron-minus versions of the host gene. The HEG then benefits the intron by mobilizing it, allowing it to spread in the genome or to be transferred horizontally with the HEG (Loizos et al. 1994; Zeng et al. 2009).

To relate intron distribution to the evolutionary relationships of the host organism, data obtained from the intron survey were superimposed onto the phylogenetic tree of the fungal strains. Based on the intron survey presented in Table 2 and Fig. 1 and subsequent sequence analysis of selected amplicons, group II intron/LHEG composite elements are present in all tested strains of L. lundbergii; this is strongly indicative of transmission of the element by vertical descent. In contrast, the distribution of the element is random among strains of L. truncatum and among strains of L. wingfieldii and L. terebrantis. This observation suggests that the element was transmitted horizontally through lateral gene transfer, and/or that the composite element was randomly lost from a (predominantly) intron-plus population. Horizontal transmission throughout a population and heteroplasmy (intron-plus and intron-minus alleles within the same organism) can result from transient hyphal fusion events (anastomosis) that allow for the transfer of mitochondria. Mitochondria, in turn, can also fuse, allowing for genetic recombination events to occur (Basse 2010).

Taking into consideration the cycles proposed by Goddard and Burt (1999) and Burt and Koufopanou (2004), we propose the following model to describe the evolution of the composite element in Leptographium spp. (Fig. 5). All introns in the mt rns gene of Leptographium spp. that were sequenced have been invaded by LHEGs; that is, no ORF-less introns were found. Based on this observation, we suggest that the original homing site of the LHEase was, in fact, intron DIII and that the LHEG spread into DIII of all available group II introns. The more parsimonious possibility is that the LHEG invaded DIII in one group II intron and that the ORF sequence then evolved to target the rns exon sequence prior to intra- and inter-species transfer of the composite element. The absence of ORF-less mS952 introns or of free-standing LHEGs in the strains examined in this study suggests that these composite elements could be mobile as a unit.

Fig. 5

A model describing the possible evolution of mS952 group II intron/LHEG composite elements in the mt rns gene of Leptographium species. The ancestor of the mS952 group II intron/LHEG composite elements, marked by the question mark, is unknown. The model proposes that the ancestral LHEase targeted intronic sequences in group II intron DIII. The group II intron shown is ORF-less, but whether it was indeed ORF-less or contained an RT-type ORF in DIV is unknown (1). Cleavage by the LHEase, generating a double-strand break, and subsequent repair using the LHEG as a template led to the conversion of LHEG-minus to LHEG-plus introns. This represents the first homing phase of the HEG life cycle (2). Mutations within the LHEG led to the LHEase changing its recognition site for binding and cleavage, so that it cleaves the mt rns gene in the proximity of the intron insertion sequence. In the second homing phase, the LHEase is likely promoting the conversion of intron/LHEG-minus to intron/LHEG-plus alleles of the mt rns gene (3). The LHEG is inherited as part of a composite element by vertical transmission in L. lundbergii and by lateral gene transfer in strains of L. truncatum, L. wingfieldii, and L. terebrantis. It is possible that the composite element was then lost randomly from the mt rns gene of some members (4). A three-dimensional ribbon structural model of the I-LtrII LHEase encoded by the L. truncatum Lt.SSU/1 intron was obtained from the webserver 3D-JIGSAW using the automatic settings (Bates and Sternberg 1999; Bates et al. 2001; Contreras-Moreira and Bates 2002).

One could speculate that, after the initial phase of spreading into DIII of other group II introns, the LHEG began to accumulate mutations and that, rather than causing degeneration of the ORF, these mutations resulted in the recognition of a novel target site by the LHEase, namely the mt rns exon sequences near the intron insertion sequence. In terms of the Goddard and Burt (1999) model, we suggest that these mutations accumulated during the degeneration phase. This would have allowed the LHEG to escape the degeneration and loss phases of the cycle and to re-initiate homing into a novel target site, while simultaneously promoting the mobility of the intron into that new site. The following observations support the suggestion that the LHEG encoded by the group II intron is in a second phase of homing: (i) the I-LtrII HEase is active and efficiently cleaves the exon sequences (Mullineux et al. 2010) and (ii) the host organism is heteroplasmic; that is, intron-minus alleles are present alongside intron-plus alleles, suggesting that intact potential homing sites are still available.

An alternative explanation is that, in unrelated lineages, the rns group II intron was invaded by a member of the same LAGLIDADG family, although this explanation requires numerous evolutionary events and, as such, is less parsimonious than the model we have described. Due to the rapid evolution of intron sequences and the lack of biochemical/functional data on the mS952 composite element within C. parasitica and Cordyceps spp., one cannot say with certainty whether models such as collaborative homing (Zeng et al. 2009) could be applied to the evolution of the composite element. Also, one cannot discount the possibility that the group II intron/LHEG composite element originated in a different genetic location prior to its invasion of the rns gene. Determining whether the group II intron and LHEG components of similar mS952 composite elements in Cordyceps spp. and C. parasitica have evolved in a manner similar to that observed in Leptographium spp. awaits functional characterization of those LHEGs.

One puzzling observation is the lack of co-conversion tracts of flanking exon sequences on both sides of the mS952 intron insertion site. Such bi-directional gene conversion events tend to be associated with DNA-based intron mobility mechanisms initiated by LHEases (reviewed in Schäfer 2003). This feature distinguishes them from retrohoming events of RT-encoding group II introns, in which co-conversion is observed only in the upstream exon regions (Lazowska et al. 1994; Schäfer 2003). The best examples of such co-conversion tracts come from mobile introns that home into protein-coding genes (Cho and Palmer 1999; Cusimano et al. 2007), which tend to be more variable at the sequence level, with synonymous substitutions serving as markers that can be used to track intron movements. The region around S952 is highly conserved, and we could not find any markers that could have been “moved” along with the intron during the gene conversion event that accompanies DNA-based homing.

Conclusion

The observation that different LAGLIDADG ORFs exist within group II introns provides further evidence of the invasive nature of HEGs. The Leptographium rns group II introns and their ORFs are phylogenetically allied to similar group II introns inserted at the same rns position in C. parasitica and Cordyceps spp. The origin of this intron may have been the invasion of an ORF-less group II intron by an LHEG. The phylogenetic trees for these species based on rDNA data suggest that this intron has a stochastic distribution, consistent with some lineages gaining the intron/LHEG combination via horizontal gene transfer, whereas in other lineages, such as L. lundbergii, this composite intron was vertically transmitted. Overall, the mS952 group II intron appears to behave like a group I intron whose mobility is under the control of an LHEG.