
Characteristics of Group II Introns

Group II introns are mobile metalloribozymes that self-splice from precursor RNA to generate excised intron lariat RNAs, which invade new genomic DNA locations by reverse splicing. These retroelements also encode a reverse transcriptase that stabilizes the RNA structure for forward and reverse splicing and finally converts the inserted intron RNA back to DNA. For these reasons, group II introns, initially identified in the mitochondrial and chloroplast genomes of lower eukaryotes and plants and subsequently found in bacteria and archaea, are thought to be the ancestors of nuclear spliceosomal introns and non-long terminal repeat (non-LTR) retrotransposons [1–4]. Recently identified structural and functional similarities between group II introns and spliceosomal small nuclear RNAs (snRNAs) have suggested that group II introns may have played an important role at the very start of eukaryote evolution. It is now thought that their invasion of pre-eukaryotic genomes, and their proliferation within those genomes, may have driven the evolutionary separation of the nucleus and cytoplasm [5].

Typically, group II introns consist of a conserved RNA structure organized into six domains. Domain V is the most conserved of these domains and is considered an essential part of the catalytic core (Fig. 8.1) [6, 7]. In mobile introns, domain IV commonly encodes a protein, the intron-encoded protein (IEP), and contains a specific subdomain (DIVa) that binds the IEP with high affinity, an interaction that generates the ribonucleoprotein (RNP) particles involved in the invasion of new DNA targets [8]. The IEP has two conserved domains: an N-terminal reverse transcriptase (RT) domain and a maturase/thumb domain (also known as the X-domain). Some IEPs also have a C-terminal DNA-binding (D) region followed by a DNA endonuclease (En) domain (Fig. 8.1).

Fig. 8.1

Secondary structure of group II intron RNAs and domain structure of the intron-encoded proteins (IEP). Structure of a representative ribozyme (not to scale): domains (DI–DVI), the EBS/IBS elements, the bulged adenosine within DVI, and the sequences involved in the tertiary interactions (Greek letters) occurring during splicing, as described in the text. Major differences between group IIA and IIB introns are indicated in brackets. Note that group IIC introns lack the EBS2 element. The loop of DIV, which encodes the IEP, is depicted by dashed lines, with a box showing the location and structure of DIVa, a high-affinity binding site for the IEP. Diagrams (drawn to scale) of the Ll.LtrB (IIA), RmInt1 (IIB) and GBSi1 (IIC) IEPs encoded within intron DIV are boxed on the right, with the predicted reverse transcriptase (RT), maturase (X), variable DNA-binding region (D) and conserved DNA endonuclease (En) domains indicated. The C-tail denotes a C-terminal extension of 20 amino-acid residues conserved in the RmInt1 IEP and involved in maturase activity and DNA target recognition. The numbers above the RT domain identify conserved amino-acid sequence blocks characteristic of RTs

Group II Intron Ribozyme Sequences

Group II intron ribozymes are characterized by a conserved secondary structure, which varies in size from 100 nt up to about 3000 nt [9]. The first structural model was established by phylogenetic comparison, searching for potential base pairings that had been preserved during evolution despite primary sequence divergence [6, 10]. The only strongly conserved sequences of group II intron ribozymes are the intron boundaries (GUGYG at the 5′ end and AY at the 3′ end, where Y is a pyrimidine), which resemble those of spliceosomal introns (GU…AG), and a few nucleotides dispersed throughout the rest of the structure [11].
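As a rough illustration of how the boundary consensus could be used to screen candidate sequences, the Python sketch below checks only the two terminal motifs (GUGYG at the 5′ end and AY at the 3′ end, with Y = C or U). This is a deliberate simplification: group II intron identification relies mainly on conserved secondary structure, and the helper name `has_consensus_boundaries` is hypothetical.

```python
import re

# Y is the IUPAC code for a pyrimidine (C or U in RNA).
FIVE_PRIME_MOTIF = re.compile(r"GUG[CU]G")  # GUGYG at the intron 5' end
THREE_PRIME_MOTIF = re.compile(r"A[CU]")    # AY at the intron 3' end


def has_consensus_boundaries(intron: str) -> bool:
    """Return True if the sequence starts with GUGYG and ends with AY.

    Hypothetical helper: it checks the boundary motifs only, not the
    secondary structure. DNA input is converted to the RNA alphabet.
    """
    seq = intron.upper().replace("T", "U")
    if len(seq) < 7:  # too short to carry both motifs without overlap
        return False
    return bool(FIVE_PRIME_MOTIF.match(seq)) and bool(
        THREE_PRIME_MOTIF.fullmatch(seq[-2:])
    )


print(has_consensus_boundaries("GUGCG" + "A" * 20 + "AC"))  # True
print(has_consensus_boundaries("GUACG" + "A" * 20 + "AC"))  # False
```

A sequence passing this filter is, at best, a candidate that would still need structural analysis.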

Group II intron ribozymes are organized into six domains, DI–DVI, radiating from a central core (Fig. 8.1). These domains consist of a set of double helices formed by Watson-Crick and G·U wobble base pairing [12]. The six domains fold into a catalytically active tertiary structure with the assistance of a series of conserved motifs involved in long-range tertiary interactions surrounding a catalytic four-metal-ion center [13]. Some of these interactions involve Watson-Crick base pairs (α-α′, β-β′, γ-γ′, δ-δ′, ε-ε′, IBS1-EBS1, IBS2-EBS2 and IBS3-EBS3), whereas others are tetraloop-receptor interactions of known geometry (ζ-ζ′, η-η′ and θ-θ′) or less well-defined non-Watson-Crick interactions (λ-λ′, κ-κ′ and μ-μ′) [14, 15]. Two of the six domains are essential for catalysis: the largest domain (DI) and DV. Recent crystallographic studies have revealed that the shape of the RNA molecule is dictated by a set of tertiary interaction networks within domains I and III that create the scaffold responsible for binding and activating catalytic domain V [16]. DI is essential for exon recognition in forward and reverse splicing reactions. DV contains the catalytic triad (AGC) at its base, the G residue being invariant and critical for splicing. Another important catalytic motif is the AC bulge of DV. Tertiary contacts between conserved nucleotides in the linker region J2/3 and the fifth nucleotide of the intron (λ position) have been reported to bring these nucleotides together to form a metal ion-binding platform directly involved in catalysis [10, 17–20]. Domain VI (DVI) contains a highly conserved bulged adenosine residue that acts as the branch point for lariat formation during splicing [7]. DVI also takes part in the long-range η-η′ interaction between its terminal loop and an internal helix of DII, which has been reported to be important for transesterification at the 3′ splice site [9].

Group II introns can be classified into three structural subclasses, according to the way in which they recognize their flanking exons. Subclass IIA and IIB introns, which display strong DNA target specificity, recognize their two exons via three base-pairing interactions (IBS1/EBS1 and IBS2/EBS2 for the 5′ exon, and δ-δ′ (IIA) or IBS3/EBS3 (IIB) for the 3′ exon), whereas subclass IIC introns use only two of these interactions (IBS1/EBS1 and IBS3/EBS3) and also require, through an as yet unknown recognition mechanism, a stem-loop structure in single-stranded DNA that is generally derived from a transcription terminator [21]. Additional subclasses have also been defined on the basis of specific structural differences: A1, A2, B1, B2 and B3 [10, 22].

Group II Intron-Encoded Proteins

The IEP is a multifunctional protein containing a reverse transcriptase (RT) domain with subdomains conserved across other RT families (subdomains 0, 1, 2, 2a, 3, 4, 5, 6 and 7) [8, 23]. Downstream from the RT domain lies domain X, which functions as the thumb domain of the RT; its sequence is conserved among group II intron IEPs but not between these IEPs and other types of RT [4, 24]. Domain X is immediately followed by a DNA-binding domain (D), which is defined on the basis of its function and displays no sequence conservation. Finally, many group II IEPs have a C-terminal endonuclease domain (En) that is required for the retromobility of these introns (Fig. 8.1). However, most bacterial IEPs have no En domain. Instead, they have a C-terminal region that constitutes a characteristic signature of this ORF class and has probably been conserved throughout evolution. This group-specific region is functionally important, participating in both maturase function and intron mobility [25]. Recent studies of the consensus amino-acid sequences of the maturase and C-terminal domains have expanded our knowledge of the subclasses of intron ORFs with no recognizable D/En region [26].

The classification and phylogenetic analysis of group II introns on the basis of their IEPs have resulted in the definition of several main groups: A, B, C, D, E, F, CL1 (chloroplast-like 1), CL2 (chloroplast-like 2) and ML (mitochondrion-like) [4, 26, 27], although additional types of intron ORF have recently been identified [26]. The introns of classes A, C, D, E and F and the newly identified g1 introns encode proteins with no En domain [26]. The En domain seems to have been acquired only once in recent phylogenetic lineages of group II introns, by the common ancestor of classes B, CL and ML [26]. Intron RNA structures are generally congruent with ORF phylogeny [27].

Group II Intron Splicing

Group II introns have developed a series of splicing mechanisms to ensure their removal from precursor RNAs and, hence, their survival in the host genome [28]. Organellar group II introns mostly interrupt essential genes, whereas prokaryotic group II introns are found in mobile genetic elements or intergenic regions [29]. Some group II introns can self-splice in vitro [30], but excision in vivo is generally assisted by protein factors that facilitate either ribozyme folding or the splicing reaction itself [31].

Excision Mechanisms of Group II Introns

Several excision mechanisms have been described for group II introns, generating different intron splicing products and ligated exons [15, 32]. These mechanisms generally co-exist within the same host [33–36], but some group II introns have been reported to be excised exclusively via a specific pathway [37–39]. Major advances have been made towards understanding the mechanism and kinetics of group II intron splicing in studies based on self-splicing assays, in which the intron precursor is incubated in non-physiological conditions (high temperatures and salt concentrations) in vitro [30].

Group II intron catalysis generally involves two sequential transesterification reactions (Fig. 8.2a). Mg²⁺ ions ensure the correct folding of the catalytic core of the intron ribozyme and also orchestrate the rearrangement of the single active site between the first and second splicing reactions [19, 40, 41]. The branching reaction is the most common excision pathway among group II introns [15, 42, 43] (Fig. 8.2a [1]). The 2′-OH of a bulged adenosine located in ribozyme domain VI initiates a nucleophilic attack on the phosphodiester bond at the 5′ exon-intron junction, generating an intermediate in which the adenosine is covalently linked to the first intron nucleotide. In a second step, the free 3′-OH of the 5′ exon attacks the 3′ splice site, leading to exon ligation and release of the intron lariat. Alternatively, the first transesterification reaction can be initiated by a water molecule acting as the nucleophile (hydrolysis), generating a linear excised intron and ligated exons [44–46] (Fig. 8.2a [2]). This excision pathway has classically been associated with group II introns that have lost the bulged adenosine [37–39]. Both steps of the branching reaction are reversible [47], but the first reaction of the hydrolytic pathway seems to be irreversible [33, 48]. The most recently discovered intron excision mechanism releases the intron as a true circle [35, 36, 49–51] (Fig. 8.2a [3]). The mechanism underlying this reaction remains unknown, but it has been suggested that the 3′-OH of a free 5′ exon could attack the 3′ splice site, releasing ligated exons and a 5′ exon-intron linear intermediate. The 3′-OH at the end of the intron would then attack the first intron residue, releasing the 5′ exon and forming the circular intron. The maturase activity of the IEP modulates the balance between intron lariats and circles in vivo [52].
It has recently been suggested that the splicing of wheat mitochondrial group II introns under cold stress plays a regulatory role [53]. Introns with a conventional branchpoint structure display classical lariat-type splicing regardless of germination temperature. By contrast, the excision of non-conventional introns shifts from a predominantly hydrolytic pathway at room temperature to the production of circular molecules in the cold.

Fig. 8.2

Mechanisms of group II intron excision. (a) Intron lariat, linear and circular molecules may be generated, depending on the excision pathway: branching [1], hydrolysis [2] and circle formation [3], respectively. The intron RNA sequence is indicated by a dark blue line, and the exon sequences are represented by an empty blue box (5′ exon) or a light blue box (3′ exon). Orange bars correspond to the EBSs/IBSs interaction. The bulged adenosine in intron domain VI is represented as an A. Dotted red lines indicate the nucleophilic attacks occurring during each step of the reaction. (b) Structural similarities between the catalytic core of group II introns and activated spliceosomes. Dark blue lines represent the intron RNA, whereas light blue boxes and lines correspond to exon sequences (dotted lines indicating connecting regions of variable length omitted from the diagram for simplification). snRNA segments are shown in black or dark gray, and pink spheres represent Mg²⁺ ions. The catalytic triad is shown in red; the unpaired CA region in intron domain V and the equivalent segment in the U6 snRNA are shown in green and the bulged adenosine (A) region is shown in violet. The nucleophilic attack of the bulged adenosine on the 5′ exon-intron junction is indicated by a red arrow. Orange bars correspond to the EBSs/IBSs interaction. Other long-range contacts are identified by yellow and light gray bars

The 5′ and 3′ splice sites are recognized principally by base pairing. The EBS1-IBS1 interaction is crucial for 5′ splice site recognition during the splicing reaction, whereas the EBS2-IBS2 duplex is entirely dispensable for catalysis and splice site fidelity [36, 54]. Indeed, group IIC introns naturally lack the EBS2-IBS2 interaction, requiring instead the recognition of an upstream transcription terminator stem-loop [21]. Recognition of the 3′ splice site involves base pairing between the first nucleotide of the 3′ exon and either the nucleotide preceding the EBS1 sequence (δ) in group IIA introns or the EBS3 residue located in the coordination loop in group IIB and IIC introns [55, 56]. In addition to the long-range interactions underlying the catalytic conformation, tertiary contacts and structural constraints determine the efficiency and fidelity of recognition at both splice sites [11, 36, 57] (Fig. 8.2b, left panel).

Degeneration of both RNA structures and ORF motifs is common in organellar group II introns. The intron ribozyme may thus be encoded in two or more pieces located at different positions in the genome, with disruptions often observed within DIV [58]. The intron pieces and their flanking exons are transcribed separately and then spliced in trans, in a manner similar to that for cis-splicing introns, making it possible to ligate two distant exons. Trans-splicing is generally reported in higher plants, and this process requires the assistance of host-encoded protein factors [59]. A more dramatic process, intron fragmentation, has been observed in the chloroplast genome of Euglena gracilis, in which 155 small group II intron fragments are found [60].

Alternative splicing reactions have also been reported for bacterial and organellar group II introns [61–63]. These reactions occur at low frequency and result from misrecognition of the 3′ or 5′ splice site, leading to ORF truncations or small insertions. Alternative excision was initially thought to result in unproductive processing, but a recent study has revealed that, in the human pathogen Clostridium tetani, it is a constitutive and regulated process, involving an as yet unknown mechanism, that generates four functional surface-layer protein isoforms [64].

Proteins Assisting Intron Excision

The efficiency of group II intron splicing in vivo depends on two groups of proteins [8]: those encoded by intron domain IV (IEPs), which are involved in cis-splicing reactions and are found mostly in bacteria, and a group of host-encoded proteins with diverse functions, which mediate the trans-splicing of organellar group II introns in yeasts, algae and higher plants [31, 65]. IEPs promote group II intron splicing through their maturase domain. They are highly specific splicing factors, playing little or no role in the excision of any intron other than the one that encodes them [66–68]. They usually act in cis, but they can also promote the splicing of genomic copies of ORF-less introns [68, 69].

Organellar introns have diverged considerably from their bacterial ancestors, notably through a decrease in the number of maturase-encoding genes. A few intron-encoded ORFs are found in the mitochondria of lower eukaryotes and bryophytes (e.g., the liverwort Marchantia polymorpha), but in vascular plants only one organelle-encoded maturase has been retained in each organelle, MatR in mitochondria and MatK in chloroplasts, and these proteins have been reported to assist in the splicing of about 20 group II introns [70, 71]. Moreover, a series of maturase-related proteins (nMat1a, nMat1b, nMat2a and nMat2b) encoded in the nuclear genome of angiosperms have been identified. After translation, these proteins are imported into mitochondria and chloroplasts, where they mediate the excision of group II introns [72–74]. An extensive search of the genomic sequences of recently sequenced green algae and land-plant mitochondria identified a number of new genes potentially encoding maturases, the role of which in the maturation of group II introns remains to be elucidated [75].

A plethora of nuclear-encoded proteins from various families, with diverse functions, has been reported to contribute to the correct folding and excision of organellar group II introns [9, 31, 65, 76]. One of the largest and best-characterized groups is the DEAD-box proteins, which act as ATP-dependent RNA chaperones, ensuring correct folding of the ribozyme or resolving inactive kinetic traps and thereby promoting productive RNA folding (e.g., MSS116 and Ded1 in S. cerevisiae, CYT-19 in N. crassa, SrmB in E. coli and PMH2 in A. thaliana) [77–80]. Only a few of these proteins (MSS116, CYT-19 and Ded1) can stimulate group II intron splicing in vitro in near-physiological conditions, suggesting that additional cofactors are required to trigger splicing in vivo. Genetic and biochemical data have shown that organellar group II intron RNAs and multiple protein factors form functional high-molecular-weight, spliceosome-like complexes [81, 82] (Fig. 8.2b).

Group II Intron Mobility

In addition to acting as catalytic RNAs, some group II introns are also target-specific mobile genetic elements (recently reviewed in [8, 83]). Mobile group II introns can insert site-specifically into an intronless DNA sequence identical to the ligated exon junction (homing), at a frequency of up to 100 % [84–86], or more randomly, at low frequency, into ectopic sites (transposition) [87–89]. These mechanisms occur through an RNA intermediate and are referred to as retrohoming and retrotransposition, respectively [90, 91]. Both events occur via full reverse splicing, mediated by an RNP particle formed by the association of the IEP with DIVa and the catalytic core regions of the excised intron RNA [92]. The IEP is essential for maintaining the active intron RNA structure, ensuring that the intron can reverse splice into the DNA target site. Group II intron mobility mechanisms were first studied for the yeast mtDNA introns aI1 and aI2 [93–96], and have since been investigated in bacteria [66, 90, 97]. The main difference between yeast and bacterial mobile introns is that one or both exons may accompany the intron (co-conversion of flanking exons) in yeast, but not in bacteria [98].

The retrohoming of group II introns is highly site-specific, because the RNP recognizes a DNA target sequence of ≈20–25 bp through both the IEP (via domain D or other regions) and the intron RNA. The IEP recognizes the upstream (positions −23 to −1) and downstream (positions +4 to +9) exon DNA sequences (Fig. 8.3a). Thirteen nucleotides of the DNA target are recognized by base pairing between the intron RNA and exon sequences, through EBS2/IBS2, EBS1/IBS1 and either δ-δ′ (IIA introns) or EBS3/IBS3 (IIB and IIC introns) interactions (Fig. 8.3b; [12, 32, 99, 100]). The essential role of each of these base-pairing interactions has been demonstrated by mutating the DNA target site and observing the inhibition of reverse splicing in vitro or of intron mobility in vivo; these mutations can be rescued by compensatory mutations in the intron RNA. DNA target specificity is mostly controlled by the intron RNA, as the IEP recognizes only a few nucleotide positions. Initial recognition appears to involve major-groove interactions between the IEP and key bases in the distal region of the 5′ exon, on the strand into which the intron subsequently reverse splices [101]. These interactions, reinforced by contacts between the IEP and the phosphate backbone, lead to local unwinding of the DNA, allowing the intron RNA to base pair with the IBS and/or δ′ sequences and to reverse splice. DNA target sites have been defined experimentally for the yeast introns coxI-I1 [102] and coxI-I2 [103], the L. lactis Ll.LtrB intron [99], the S. meliloti RmInt1 intron [104], the B. halodurans B.h.I1 intron [21], the E. coli E.c.I5 intron [105], the T. elongatus T.e.I4h intron [106] and the Enterobacter cloacae group IIC E.cl.GOC intron [107]. Group IIA, IIB and IIC introns differ in their mode of DNA target site recognition, and these differences affect intron design and performance in biotechnological applications (Fig. 8.3b).
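The logic of the target-site mutagenesis and compensatory-mutation experiments described above can be illustrated with a minimal Python sketch. It assumes that an EBS and its IBS pair in antiparallel orientation and that G·U (G·T) wobble pairs are tolerated; the sequences are hypothetical, DNA targets are converted to the RNA alphabet for simplicity, and `pairs_antiparallel` is an illustrative helper, not part of any published tool.

```python
# Watson-Crick pairs plus G-U wobble; DNA T is treated as U for simplicity.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def pairs_antiparallel(ebs: str, ibs: str, allow_wobble: bool = True) -> bool:
    """True if every IBS base (read 5'->3') pairs with the facing EBS base.

    The EBS runs 3'->5' alongside the IBS, so reversing it aligns partners.
    """
    ebs, ibs = to_rna(ebs), to_rna(ibs)
    if len(ebs) != len(ibs):
        return False
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return all((i, e) in allowed for i, e in zip(ibs, reversed(ebs)))


# Hypothetical 6-nt IBS in the target DNA and a matching EBS in the intron RNA.
print(pairs_antiparallel("CCAUCA", "TGATGG"))  # True: wild-type pairing
print(pairs_antiparallel("CCAUCA", "TGCTGG"))  # False: target mutation (A -> C)
print(pairs_antiparallel("CCAGCA", "TGCTGG"))  # True: compensatory EBS mutation (U -> G)
```

The mutated target base and the compensating intron base face each other in the duplex, which is why a single change on each side restores pairing.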

Fig. 8.3

DNA target site recognition: (a) RNP complexes recognize the target site on double- or single-stranded DNA primarily through EBS-IBS pairing (and by δ–δ′ interactions in subgroup IIA), whereas the IEP also binds specifically to key nucleotide residues in distal 5′ and 3′ exon regions indicated by dotted lines in the diagram. (b) Comparison between the base-pairing interactions used by group IIA, IIB and IIC introns for DNA target site recognition. EBS exon binding site, IBS intron binding site

Group II Intron Mobility Pathways

After reverse splicing of the RNA into the DNA target site, at least three different mechanisms may complete the mobility of yeast group II introns [96, 108]. A minor pathway, through which a small proportion of intron mobility events occur in natural conditions without co-conversion of exons, probably involves the synthesis of a full-length intron cDNA, which is then integrated by DNA repair. A second, RT-independent pathway (≈40 %), in which intron integration occurs by homologous recombination with both the 5′ and 3′ exon sequences, involves repair of the nicks generated by RNPs in the DNA target by the double-strand break repair (DSBR) mechanism (Fig. 8.4c). Finally, the major pathway (≈60 %) entails co-conversion of the 5′ exon only, and involves cDNA synthesis by target-primed reverse transcription (TPRT) and integration of the intron by homologous recombination (DSBR) (Fig. 8.4a, b).

Fig. 8.4

Group II intron mobility pathways: Mechanisms a, b and c have been described only in yeast; all these mechanisms are dependent on homologous recombination. a and b are the major retrohoming pathways in yeast, whereas c is the minor pathway and is RT-independent. d is the major pathway in Ll.LtrB and a minor pathway in yeasts. This mechanism is independent of homologous recombination. e is the retrohoming pathway for introns lacking the En domain and the mechanism associated with retrotransposition. f is the retrohoming pathway for linear group II introns

The bacterial group II intron mobility pathway was first described for the L. lactis Ll.LtrB IIA intron, in studies involving in vivo plasmid-based genetic assays in both L. lactis and E. coli [66, 90]. Retrohoming was subsequently characterized biochemically, with RNPs reconstituted from the purified IEP (LtrA) and in vivo-excised lariat RNA [67, 109]. In vivo, the RNPs bind the DNA nonspecifically and then scan for the correct target site by facilitated diffusion [109]. Ll.LtrB RNPs recognize a relatively long target region through base pairing between three sequence motifs in DI of the intron RNA (EBS1, EBS2 and δ) and complementary sequences in the DNA target site (IBS1, IBS2 and δ′), and through interactions of the IEP with nucleotides located between positions −25 and +9 of the insertion site (Fig. 8.3a) [83, 99, 100, 110, 111]. Once the target has been recognized, retrohoming occurs by TPRT (Fig. 8.4d). The intron RNA cleaves the sense strand of the double-stranded DNA at the exon junction and integrates into the target site by reverse splicing. At the same time, LtrA cleaves the antisense strand at position +9, through its En activity. The RT domain of the IEP then uses the 3′ end of the cleaved antisense strand as a primer for reverse transcription of the inserted intron RNA. The resulting cDNA is integrated into the host DNA by homologous recombination-independent repair mechanisms [90].

Some mobile group IIB introns have an IEP with no En domain. These IEPs have RT activity, and their RNPs can mediate reverse splicing into double- or single-stranded DNA substrates but cannot carry out site-specific second-strand cleavage; these introns therefore require a variant of the TPRT retrohoming pathway (Fig. 8.4e) [91, 105, 112, 113]. Most group II introns lacking the En domain (En−) are inserted into the strand used as a template for synthesis of the lagging strand during replication [114]. They can therefore insert into single-stranded DNA only when the replication fork has passed the insertion site, and the IEP then uses the nascent lagging strand to prime reverse transcription [91]. Other mechanisms have also been suggested for the initiation of cDNA synthesis, including random nicks in the antisense strand (Schizosaccharomyces pombe cob-I1 intron [115]) or de novo initiation (RT encoded by the Mauriceville plasmid of Neurospora [116]). The best-studied IIB-like intron is RmInt1 [69, 91, 112], which recognizes a DNA target site extending 20 nt into the 5′ exon and 5 nt into the 3′ exon. Target recognition occurs primarily by base pairing between EBS1, EBS2 and EBS3 of the intron RNA and the corresponding IBS sequences in the DNA target. The RmInt1 RT recognizes two critical nucleotide residues, possibly with the contribution of additional sequences [104, 112].

Unlike group IIA and IIB introns, group IIC introns have limited specificity, because they recognize only short IBS1 and IBS3 sequences (Fig. 8.3b). Moreover, the IBS2/EBS2 pairing seems to be replaced by recognition of a palindromic Rho-independent transcription terminator motif or of palindromic attachment sites (e.g., the attC sites of integrons), through an as yet unidentified mechanism [21, 37, 116–118]. Group IIC IEPs also lack the En domain, but retain both domain Z (DNA binding) and domain X (maturase activity) [107]. Introns of this kind are found downstream of non-identical terminators, inserted into the top or bottom strand in the leading or inverse orientation, respectively [12, 107]. The integration of these introns resembles that of IIB introns, occurring through reverse splicing into single-stranded DNA at the replication fork or transcription bubble, with the nascent lagging strand preferentially used to prime reverse transcription of the intron.

It was thought that linear introns could not undergo reverse splicing [119–121], but recent studies have shown that yeast and bacterial linear group II introns can reverse splice efficiently (Fig. 8.4f) [122–124]. The retrohoming of the linear Ll.LtrB intron has been demonstrated in eukaryotes, by microinjecting RNPs into Xenopus laevis oocyte nuclei or Drosophila melanogaster embryos [123, 124]. The linear RNA undergoes the first step of reverse splicing, becoming attached to the 3′ exon but not to the 5′ exon. The IEP then reverse transcribes the RNA, and the cDNA is ligated to the 5′ exon by the non-homologous end-joining (NHEJ) factor DNA ligase IV (Lig4) and DNA repair polymerase θ (PolQ). Other DNA ligases and polymerases can also perform this function, but less efficiently [124]. This mechanism may also mediate the retrohoming of linear intron RNAs not only in eukaryotes, but also in many prokaryotes with homologous NHEJ machinery [8, 125].

Group II introns can also retrotranspose to ectopic DNA target sites, albeit at low frequency (10⁻⁴–10⁻⁵) [88, 114, 126–128]. The pattern of spread of Ll.LtrB within the L. lactis genome is consistent with intron retrotransposition into double- or single-stranded DNA through a homologous recombination-independent mechanism [114], similar to that described for mitochondrial introns and the bacterial RmInt1 intron [32]. In L. lactis, the retrotransposition of the Ll.LtrB intron is biased towards reverse splicing into transiently single-stranded DNA, with priming by the nascent lagging strand (Fig. 8.4e). By contrast, the retrotransposition of Ll.LtrB in E. coli is characterized by preferential use of double-stranded DNA targets, with or without En cleavage of the opposite strand [129], indicating that the host cell, in addition to the intron, plays a role in pathway selection [130].

Host Factors Influencing the Retrohoming Pathway of Group II Introns

Mobile group II introns are genetic elements with specific molecular characteristics favoring their retention and spread in the genome. However, their mobility depends on the genetic background of the host, and retrohoming requires host cell functions [111]. The replication machinery of the cell is required in the early stages, whereas the host repair machinery is essential during the late stages of retrohoming. The first experiments performed with the group II intron Ll.LtrB from L. lactis in the heterologous host E. coli [131] led to a model of retrohoming involving host factors that either increase or decrease the efficiency of mobility. Exonucleases (RecJ, MutD and PolI) that process the ends of the DNA, RNases (RNase H) that degrade the RNA template after cDNA synthesis, DNA replication and repair polymerases (PolII, PolIII, PolIV and PolV) that ensure correct synthesis of the second DNA strand, and DNA ligases all facilitate intron mobility. By contrast, degradative enzymes may decrease retrohoming levels. For example, RNases I and E may eliminate the intron RNA, and exonuclease III (XthA) may degrade the newly synthesized cDNA or the top strand of the upstream exon. Further studies revealed that some enzymes from the degradosome (RNase E) may affect retrohoming levels, depending on the physiological status of the cell [132]. It was subsequently shown that Ll.LtrB mobility is influenced by cell interactions and by responses to cellular or environmental stresses, through global regulators [133–136]. One recent study [137] confirmed previous findings and revealed, through genetic and biochemical analyses, a possible role for replication restart proteins in the retrohoming mechanism.

Use of Group II Introns in Biotechnology

Group II introns have a number of characteristics that render them suitable for use as biotechnological tools: (1) they integrate into their DNA targets highly efficiently, in a homologous recombination-independent manner; (2) they can mobilize foreign DNA inserted within the intron; (3) intron integration requires only minimal host functions, in the form of common cellular DNA repair mechanisms; and (4) group II introns recognize the target DNA mostly through base pairing with the intron RNA. This last characteristic makes it possible to change intron specificity simply by changing the EBS/δ sequences. Currently, Ll.LtrB [138], a group IIA intron from Lactococcus lactis, and the group IIB introns EcI5 [105] from Escherichia coli and RmInt1 from Sinorhizobium meliloti [139] are used as biotechnological tools. A chimeric intron based on the TeI3c ribozyme and the TeI4c IEP from Thermosynechococcus elongatus has also been used for gene targeting in thermophilic bacteria [140].
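Because retargeting relies mainly on base pairing, the core of an EBS/δ redesign can be sketched as taking reverse complements of target-site segments. The Python sketch below assumes that positions are numbered relative to the insertion site (…, −2, −1 | +1, +2, …) and uses a window layout, `HYPOTHETICAL_WINDOWS`, chosen purely for illustration; real IBS/δ′ coordinates differ between introns (IIB introns use EBS3 rather than δ), and practical targetron design also depends on intron-specific scoring rules and donor-plasmid modifications. All function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical element windows (inclusive), relative to the insertion site.
# Illustration only: real coordinates depend on the intron.
HYPOTHETICAL_WINDOWS: Dict[str, Tuple[int, int]] = {
    "EBS2": (-12, -7),  # pairs with IBS2 in the 5' exon
    "EBS1": (-6, -1),   # pairs with IBS1 in the 5' exon
    "delta": (1, 3),    # pairs with delta' in the 3' exon (IIA-type)
}

RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def revcomp_to_rna(dna: str) -> str:
    """Reverse complement of a DNA segment, written in the RNA alphabet."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def target_segment(target: str, site: int, start: int, end: int) -> str:
    """Slice positions start..end (there is no position 0).

    `site` is the index in `target` of the first 3'-exon base (+1).
    """
    def index(pos: int) -> int:
        return site + pos if pos < 0 else site + pos - 1
    return target[index(start): index(end) + 1]


def design_binding_sites(
    target: str, site: int,
    windows: Dict[str, Tuple[int, int]] = HYPOTHETICAL_WINDOWS,
) -> Dict[str, str]:
    """Return intron-side sequences complementary to each target segment."""
    return {name: revcomp_to_rna(target_segment(target, site, s, e))
            for name, (s, e) in windows.items()}


five_exon, three_exon = "TTGACGTAATCG", "AGCTT"  # hypothetical target
print(design_binding_sites(five_exon + three_exon, site=len(five_exon)))
# {'EBS2': 'CGUCAA', 'EBS1': 'CGAUUA', 'delta': 'GCU'}
```

Keeping the window layout as a parameter reflects the point made in the text that IIA, IIB and IIC introns use different sets of pairing interactions.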

Introns were initially retargeted to new sites by identifying target sites matching the requirements of the IEP and then modifying the EBS/δ sequences to ensure base pairing with the new IBS/δ′ sequences. The retargeted introns are known as targetrons. Several algorithms have been developed for the retargeting of the Ll.LtrB [138], EcI5 [105] and RmInt1 [141] introns. These algorithms are based on the nucleotide frequencies observed in invasion experiments using both randomized EBS/δ intron donor libraries and randomized IBS/δ′ target site libraries. Each algorithm scores a DNA sequence across a sliding window moved in 1-bp increments. The length of the sliding window depends on the intron: 45 bp for Ll.LtrB, 36 bp for EcI5 and 25 bp for RmInt1. A score is assigned to each potential target site identified, with higher scores associated with a greater probability of a high invasion frequency. For each algorithm, a threshold value has been defined, above which the insertion frequency of the retargeted intron is high enough for insertion into the selected new target site to be identified in a simple assay, such as colony PCR. Once the best potential target site has been identified, the EBS/δ sequences of the intron are modified to ensure base pairing with the new target site and inserted into the intron donor plasmid, in which the IBS/δ′ regions of the flanking exons are also modified to be complementary to the modified EBS/δ regions, for efficient RNA splicing. Intron donor plasmids also carry the IEP gene, expressed from a position outside DIV of the ribozyme (the intron itself being ΔORF), downstream from the intron RNA. This configuration has been shown to be more efficient for retrohoming than the wild-type configuration, in which the IEP is encoded within DIV of the intron RNA [69, 105, 110].
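The sliding-window scoring described above can be sketched in Python as a position-weight scan over both DNA strands. This is a minimal sketch under stated assumptions: the weight matrix, the threshold and the helper names (`scan_targets`, `revcomp`) are hypothetical placeholders, not the published Ll.LtrB, EcI5 or RmInt1 weights; only the window lengths (45, 36 and 25 bp) and the 1-bp step come from the text.

```python
from typing import Dict, List, Tuple

# One {nucleotide: weight} column per window position (hypothetical values).
WeightMatrix = List[Dict[str, float]]

# Window lengths reported in the text, for reference when building a matrix.
WINDOW_LENGTHS = {"Ll.LtrB": 45, "EcI5": 36, "RmInt1": 25}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def scan_targets(dna: str, matrix: WeightMatrix,
                 threshold: float) -> List[Tuple[str, int, float]]:
    """Score every window (1-bp step) on both strands.

    Returns (strand, start, score) tuples at or above `threshold`, best first;
    for the '-' strand, `start` indexes the reverse-complemented sequence.
    """
    width = len(matrix)
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", revcomp(dna.upper()))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(col.get(base, 0.0) for col, base in zip(matrix, window))
            if score >= threshold:
                hits.append((strand, start, score))
    return sorted(hits, key=lambda hit: (-hit[2], hit[0], hit[1]))


# Toy 3-column matrix rewarding "ACG"; a real matrix would have 25-45 columns.
toy_matrix = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}]
print(scan_targets("TTACGTT", toy_matrix, threshold=3.0))
# [('+', 2, 3.0), ('-', 1, 3.0)]
```

Both strands are scanned because, as noted later in this section, insertion into the sense or antisense strand has different consequences for the resulting mutation.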
Different promoters have been used to express the targetron and the associated IEP: constitutive promoters, such as the Km promoter used for the RmInt1 targetron [69]; the T7 promoter, recognized by T7 RNA polymerase, used to express the EcI5 and Ll.LtrB targetrons in E. coli; inducible promoters, such as the m-toluic acid-inducible promoter or the tac promoter [142, 143]; and endogenous promoters from the bacterial strain in which the targetron is used [144–146].

Intron integration can be detected by colony PCR or by using a selectable marker, such as an antibiotic resistance gene. A retrotransposition-activated marker (RAM) has been developed for this purpose [145, 147]. The RAM cassette consists of a selectable marker with its own promoter, inserted in reverse orientation into domain IV of the group II intron RNA. The marker is interrupted by the td group I intron, which is in the forward orientation relative to the group II intron. Because the td intron can splice out only from the group II intron transcript, in which it is in the sense orientation, an uninterrupted marker is regenerated only when the spliced intron RNA is reverse transcribed and integrated into the target site. The selectable marker is thus expressed only if retrohoming occurs. Subsequent modifications, in which the selectable marker is flanked by FRT sites recognized by the site-specific recombinase Flp, made it possible to remove the marker gene and adapted the system for multiple gene disruptions.
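The strand logic of the RAM cassette can be illustrated with a toy string model in Python. It is only a sketch under strong simplifying assumptions. The sequences are short hypothetical placeholders (a real td intron is far longer). `self_splice` stands in for td self-splicing by plain string removal. Transcription, reverse transcription and integration are reduced to reading one DNA strand or the other.

```python
# Toy model of a retrotransposition-activated marker (RAM); all sequences are
# hypothetical placeholders and splicing is modeled as string removal.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string (the opposite strand, read 5'->3')."""
    return dna.translate(_COMP)[::-1]


def self_splice(transcript: str, td: str) -> str:
    """Remove the td intron only if it is present in sense orientation."""
    return transcript.replace(td, "", 1)


M1, M2 = "ATGGCT", "AAAGCTTAA"   # hypothetical halves of the marker ORF
MARKER = M1 + M2                  # intact selectable-marker ORF
TD = "GTTTTTTTTTAG"               # placeholder for the td group I intron

# On the group II intron's sense strand the marker reads in reverse while td
# reads forward, so td sits inside the reversed marker.
cassette = revcomp(M2) + TD + revcomp(M1)

# Before retrohoming: the marker's own promoter reads the opposite strand,
# where td is antisense, cannot splice, and still interrupts the ORF.
donor_marker = self_splice(revcomp(cassette), TD)

# Retrohoming: td splices out of the group II intron transcript, and the
# spliced RNA is reverse transcribed into the new site; the marker is then
# read from the opposite strand of the integrated copy.
integrated = self_splice(cassette, TD)
integrated_marker = revcomp(integrated)

print(donor_marker == MARKER, integrated_marker == MARKER)  # -> False True
```

The model captures why selection enriches for retrohoming events. The marker becomes intact only in the copy that has passed through an RNA intermediate in which td is spliceable.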

Retargeted introns have been used in various species of the genera Agrobacterium [142], Azospirillum [148], Bacillus [149], Clostridium [145], Ehrlichia [150], Escherichia [105], Francisella [146], Lactococcus [151], Listeria [152], Paenibacillus [153], Pasteurella [154], Proteus [155], Pseudomonas [142], Ralstonia [143], Salmonella [111], Shewanella [156], Shigella [111], Sinorhizobium [139], Sodalis [157], Staphylococcus [144], Vibrio [158], and Yersinia [159]. In bacteria, targetrons are used primarily to obtain knockout mutants. In Clostridium, a genus in which transformation is difficult, retargeted intron derivatives of Ll.LtrB, known as ClosTron [145], have proved useful for studying the biology of several species. When retargeted ΔORF introns insert into the sense strand of a target gene, the resulting disruption is conditional, because the intron can be spliced out of the gene transcript if the IEP is expressed, even in trans. By contrast, intron insertion into the antisense strand leads to an unconditional mutation.

Targetrons can also be used to deliver foreign DNA into specific sequences [151, 160, 161]. The cargo sequence is inserted into the deleted region of DIV (ΔORF). For Ll.LtrB, fragments shorter than 100 bp have only a slight effect on intron insertion, but fragments longer than 1 kb greatly reduce mobility efficiency [162]. The secondary structure of the cargo sequence also affects intron mobility [156].

A method for bacterial genome editing that combines targetrons with the Cre/lox system has recently been developed [156]. It has been used for insertions of 12 kb and deletions of up to 120 kb in E. coli and S. aureus, for inversions in E. coli and Bacillus subtilis, and for one-step cut-and-paste translocation of 120 kb of genomic sequence to a site 1.5 Mb away.

Group II introns (Ll.LtrB) have also been used in eukaryotic cells [163]. This approach is less well developed in eukaryotes than in prokaryotes, and several hurdles have yet to be overcome. The principal problem is that the Mg²⁺ concentration in eukaryotic cells is below that required for Ll.LtrB mobility. Furthermore, the chromatinization of cellular DNA strongly inhibits intron integration. In eukaryotic cells, group II introns are microinjected into the cell nucleus as in vitro reconstituted ribonucleoproteins (RNPs). Xenopus laevis oocytes and embryos of Drosophila melanogaster and zebrafish have been used for this purpose. RNPs have been reconstituted with both lariat and linear intron RNAs. In addition to the RNPs, a mixture of 500 mM Mg²⁺ and 17–20 mM each of dATP, dCTP, dGTP and dTTP is injected into the nucleus to optimize intron insertion. The RNPs and Mg²⁺ must be injected separately, because RNPs precipitate at this Mg²⁺ concentration. Under these conditions, lariat RNPs injected into X. laevis oocyte nuclei can both insert into an injected plasmid target at high frequency and stimulate DNA integration by homologous recombination, by producing double-strand breaks at the target site. In D. melanogaster embryos, intron integration into the yellow gene has been achieved with introns retargeted against this gene. More knowledge of the behaviour of group II introns in eukaryotic cells is required for the development of tools for use in eukaryotes.

Evolutionary Aspects of Bacterial Group II Introns

Group II introns display structural, functional and mechanistic similarities to eukaryotic nuclear pre-mRNA introns [164–168]. Nuclear pre-mRNA introns [11] and non-long terminal repeat retrotransposons may have evolved from mobile group II introns [169]. It has been suggested [2, 168, 170] that, at an early stage in the evolution of eukaryotes, the ancestral group II intron structure was split into the non-catalytic spliceosomal introns and the catalytically active RNA component of the spliceosome. This transition was accompanied by the degradation of the reverse transcriptase ORF. In plants, maturases may have persisted during evolution through the acquisition of a targeting signal enabling them to function within organelles, where they support the splicing of organellar group II introns [75, 171]. The evolution of eukaryotic cell organization may also have been a defensive response to the deleterious effects of group II intron proliferation in the host genome [172, 173]. Nevertheless, a recent report [174] suggests that the compartmentalization of eukaryotic cells into nucleus and cytoplasm does not prevent group II intron invasion of the host genome, although it may limit intron proliferation through transient or stable nucleolar sequestration. Strikingly, when the IEP loses its maturase activity, the protein becomes localized in nuclear speckles, nuclear domains enriched in pre-mRNA splicing factors [175], including small nuclear ribonucleoproteins (snRNPs) and serine-arginine (SR) proteins, located in the interchromatin regions of the nucleoplasm. This observation is consistent with the hypothesis that eukaryotic spliceosomal introns may have evolved from group II introns.

Bacterial group II introns tend to evolve towards an inactive form by fragmentation, with loss of the 3′ terminus, including the IEP [176, 177]. The significance of fragmented introns within a particular genome remains unclear. It has recently been suggested that, as for transposable elements (TEs), the dispersal and dynamics of group II intron spread within a bacterial genome follow a selection-driven extinction model, which predicts the removal of highly colonized genomes from the population by purifying selection [178]. Only 25% of the bacterial genomes sequenced to date [8] harbor recognizable group II introns. This suggests that these introns have not acted as a major force with a broad effect in promoting evolutionary change. However, caution is required in interpreting these observations, because fragmented introns that retain only the 5′ region and lack the encoded ORF are unlikely to have been detected in sequenced bacterial genomes.

It is generally accepted that the “selfish” features of mobile elements underlie their acquisition and maintenance in bacterial genomes, but these elements may also be beneficial to the host. In bacteria, group II introns are thought to be tolerated to some extent because they self-splice and preferentially home to sites outside key functional genes, generally within intergenic regions or other mobile genetic elements [179]. Other studies have suggested that group II introns benefit the host by controlling other potentially harmful mobile genetic elements [180] and by contributing to the generation of diversity and the remodeling of genomes in times of stress [135]. These features may reduce the negative effects on the host organism, resulting in the maintenance of these retroelements for longer periods in bacterial populations. It also seems likely that the gradual eradication of group II introns by the host during evolution would not completely eliminate intron sequences, with some intron fragments remaining and continuing to evolve in the genome. It therefore remains possible that these fragments provide sequence variation on which selection can act, leading to their persistence and continued evolution in the genomes of some bacterial lineages [181].