11.1 Introduction

Transposable elements (TEs) are DNA fragments that can move from one site of the genome to another. Though ubiquitous in nature, they were first discovered in maize more than 60 years ago (McClintock 1947). This eventual Nobel-Prize-winning discovery began to be acknowledged broadly only three decades later and gained increasingly wider appreciation in the “omics” era (Craig et al. 2002). Today, TEs are considered to have played an intrinsic role in genome structure evolution through the multiple chromosome rearrangements that are brought about by the chromosome cutting properties noted by McClintock (1952). TEs have been proposed as a major driving force in the process of gene creation by providing the raw material needed for the evolution of new gene functions (Dooner and Weil 2012; Feschotte and Pritham 2007) and have turned out to be the major component of most sequenced eukaryotic genomes (Craig et al. 2002).

At the turn of the twenty-first century, the known classes of TEs (Feschotte et al. 2002) were expanded to include the newly hypothesized Helitron transposable elements. Unlike Class I elements (retrotransposons) that transpose through RNA, Class II elements (DNA transposons) transpose through DNA. Helitrons were postulated to transpose via a hypothetical rolling circle (RC) replication mechanism (Kapitonov and Jurka 2001) and, therefore, fall into the latter class. A more recent classification of eukaryotic transposons places them under a special Subclass 2 among DNA transposons (Wicker et al. 2007). In the past decade, a considerable effort has been made to better understand these elusive TEs from many angles. Our goal in this chapter is to summarize our current knowledge about these DNA transposons in the plant kingdom and to provide a personal view of further explorations in this emerging field.

11.2 Discovery of Helitrons

Shortly before their discovery as unique eukaryotic transposons, Helitrons had been described as repetitive sequences in Arabidopsis thaliana, one of the three genomes analyzed by Kapitonov and Jurka (2001) in their seminal paper. The first such repeat detected was Aie (Arabidopsis insertion element), a 527-bp element insertion present downstream of the polyadenylation site of AtRAD51 in the Columbia ecotype but absent in its Landsberg erecta counterpart (Doutriaux et al. 1998). Aie is AT-rich, contains no ORFs, has a stem-and-loop sequence on the 3′ side (5 unpaired bases in a 21-bp stem, with a 4-bp loop), and shows some short duplications around the insertion site. Because it lacked terminal inverted repeats (TIRs), Aie was taken to be a remnant of an imperfect transposition event, an interpretation supported by its multicopy presence in the two ecotypes.

Due to their abundance in the genome, elements closely related to Aie were readily uncovered in subsequent computational analyses of Arabidopsis repetitive sequences. AthE1 was the most abundant class of repetitive elements in the A. thaliana 1998 sequence database (Surzycki and Belknap 1999). Although they could be as long as 2 kb, these elements lacked any detectable coding capacity for known transposases. While the 5′ and 3′ ends of AthE1 family members were highly conserved, they did not represent either inverted or direct repeats. Direct repeats flanking transposons, also known as target site duplications (TSDs), are a common feature of retrotransposons and DNA transposons. Their absence in AthE1 elements suggested that these elements differed from most other known transposons in being unable to recombine into the genome by introducing staggered cuts in the target DNA.

In a comprehensive analysis of potential transposon sequences in chromosome 2 of Arabidopsis, sequences resembling AthE were found to make up 1.1 % of the chromosome. No detectable TSDs or TIRs flanked these unusual repeats, which were named ATREP1-10 and classified as ten families of nonautonomous DNA transposons (Kapitonov and Jurka 1999). Another analysis of transposon diversity in a much larger Arabidopsis dataset (≈17.2 Mb) grouped 179 AthE-like or ATREP-like elements into seven families based on common structural features and identified them as members of a novel superfamily of transposons, named Basho, that moved by an unknown transposition mechanism (Le et al. 2000). A Basho-like group was also identified in maize, supporting the concept of a new plant transposon superfamily. Completion of the whole genome sequence of Arabidopsis (Arabidopsis Genome Initiative 2000) revealed the existence of 1,265 Basho elements. In contrast with the class I elements that primarily occupy the centromere, but consistent with other class II transposons, Basho elements predominate on the periphery of pericentromeric domains. Novel elements resembling the structurally unusual Basho elements were also found in rice, suggesting a wide distribution of these elements in plants (Turcotte et al. 2001). Similar to Basho elements in Arabidopsis, the rice elements are small (<2 kb), lack coding capacity, TSDs, and TIRs, and are highly conserved at both termini. The big outstanding question after these studies was: by what mechanism does this new superfamily of transposons multiply and transpose in the host genome?

In 2001, this question was answered hypothetically when Kapitonov and Jurka (2001) carried out an in silico reconstruction of putative autonomous transposons from inactive copies accumulated in the three genomes analyzed, Arabidopsis thaliana, Caenorhabditis elegans, and Oryza sativa. Deletions, insertions, and premature stop codons were removed from the consensus sequences of the transposons by computational approaches, in a reconstruction process reminiscent of that of Sleeping Beauty (Ivics et al. 1997). Finally, rolling circle (RC) replication, a transposition mechanism until then restricted to prokaryotes, was proposed to explain movement of this previously unknown category of eukaryotic DNA transposons. The new elements were designated Helitrons because the protein encoded by the putative autonomous elements had a conserved DNA helicase domain.

11.3 Genomics of Helitrons

11.3.1 Molecular Structure of Putatively Autonomous and Nonautonomous Helitrons

Helitrons have been found in every plant genome where they have been carefully looked for (Table 11.1). As a consequence of their in silico detection, the majority of Helitrons identified in a given species share distinct structural features with other elements in the same species and in closely related species. The putative autonomous Helitrons reconstructed from nonautonomous ones in Arabidopsis thaliana (Helitron1 and Helitron2) and Caenorhabditis elegans (Helitron1_CE) encode a large protein denominated RepHel that contains a Rep domain homologous to RC replication initiators and a Hel domain homologous to DNA helicases (Kapitonov and Jurka 2001). Because the predicted RepHel proteins share motifs with the transposases of bacterial RC transposons, Helitrons were postulated to transpose by RC replication. The enzymatic core of the ~100-aa Rep domain contains three motifs that are conserved in a wide diversity of eukaryotes (Feschotte and Pritham 2007; Kapitonov and Jurka 2007). The larger, ~400-aa Hel domain contains eight universally conserved motifs in all putative autonomous Helitrons (Fig. 11.1a). Examples of these conserved motifs are shown in Fig. 11.1d. Conservation of the RepHel protein has been used as the criterion to identify hypothetical autonomous Helitrons in all plant host genomes (Table 11.1).

Table 11.1 Dynamic distribution of Helitron transposons in sequenced plant genomes
Fig. 11.1 Generic structure of identified Helitrons in different eukaryotes. (a) Hypothetical autonomous Helitron with coding capacity for a RepHel protein. Rep (Replication) motifs are in green, and Hel (Helicase) motifs are in blue. The conserved 5′ TC terminus is shown in light green. The conserved 3′ CTRR terminus is shown in red, with a stem-loop structure formed from a palindromic sequence in the 3′ subterminal region. The insertion is targeted to an AT dinucleotide shown in lowercase above a blue line representing the flanking sequence. The vast majority of Helitrons are nonautonomous elements with terminal structures similar to those of the autonomous copies. (b) Agenic nonautonomous Helitrons lack any known coding capacity. (c) Genic nonautonomous Helitrons carry fragments from a variable number of genes in the host genome (yellow, orange, and light blue boxes). (d) Multiple alignments of the conserved motifs of the Rep domain (two-His and KYK) and the PIF1 helicase domain in plant Helitrons. At, Arabidopsis thaliana; Bo, Brassica oleracea; Gm, Glycine max; Ma, Musa acuminata; Mt, Medicago truncatula; Os, Oryza sativa; Pe, Phyllostachys edulis; Sb, Sorghum bicolor; Zm, Zea mays

Shorter nonautonomous Helitrons are far more abundant and correspond to the non-TIR-, non-TSD-containing highly repetitive sequences that were noted earlier in Arabidopsis and rice. They have been grouped into multiple families based on the degree of sequence conservation at both 5′ and 3′ termini (Fig. 11.1b). Most of these elements are smaller than 2 kb and encode no detectable proteins. Longer elements with extra protein-coding capacity (Fig. 11.1c) occur in some species. For example, in Arabidopsis and rice, the putative autonomous Helitrons also encode subunits of RPA70, a single-stranded-DNA-binding protein. These are absent in C. elegans, making it unlikely that they are part of the transposition machinery (Kapitonov and Jurka 2001). Though RPA-like proteins have also been identified in some animal Helitrons (Feschotte and Pritham 2007; Kapitonov and Jurka 2007), their exact function remains unknown.

11.3.2 Biological and Computational Identification of Helitrons

Among the dozens of known eukaryotic DNA transposons (Feschotte and Pritham 2007; Kapitonov and Jurka 2008; Wicker et al. 2007), Helitrons stand out as a rare example of TEs discovered purely by computational, rather than genetic, studies. Though only recently identified, Helitrons are an ancient superfamily of eukaryotic DNA transposons, as evidenced by their cross-kingdom presence in plants (Table 11.1), fungi (Galagan et al. 2005), and animals (Cocca et al. 2011; Kapitonov and Jurka 2001; Pritham and Feschotte 2007). Helitrons are the only eukaryotic transposons that lack TIRs, do not generate TSDs upon integration in the host genome, and do not encode any known transposases. Furthermore, until their computational discovery, none had been found to be the causative agent of a mutation. These unusual features delayed their discovery, although Helitrons resemble other eukaryotic DNA transposons in terms of their impact on the host genome. Following their discovery, Helitrons have been identified by both biological and computational approaches.

11.3.2.1 Biological Identification of Helitrons

Helitrons have been detected biologically in only a handful of cases, either as insertional mutagens causing spontaneous mutations (Table 11.2) or as colinearity disruptors contributing to haplotypic diversity within a species.

Table 11.2 Characterized variants resulting from Helitron insertions

Molecular characterization of the spontaneous sh2-7057 mutant allele in maize (Lal et al. 2003) revealed that the mutation carried a large Helitron insertion in the 11th intron of the sh2 gene. This was the first case to demonstrate the mutagenicity of Helitron transposons. Though the insertion in this mutant was larger than 12 kb, it lacked coding capacity for known transposases and, instead, carried several gene fragments, including four exons with similarity to a plant DEAD box RNA helicase.

The strong terminal sequence similarity of the insertion in the spontaneous mutation ba1-ref (barren stalk-1) with the Helitron transposon in sh2-7057 led to the realization that this classical mutation, identified more than three quarters of a century ago, had been caused by a Helitron insertion. In contrast to the insertion in sh2-7057, the 6.5-kb Helitron element in ba1-ref inserted in the proximal promoter region of the ba1 gene (Gupta et al. 2005). Though the 6.5-kb insertion also carried multiple pseudogene fragments, these differed from those in the Helitron transposon of sh2-7057. The conserved 5′ and 3′ termini of these Helitrons were found to be repetitive in the maize genome, suggesting that they play an important role in Helitron amplification.

More strikingly, three independent ts4 mutations, which develop carpels in the florets of the tassel, were found to carry Helitron insertions in the promoter of the zma-MIR172e gene (Chuck et al. 2007). These mutations arose at different times in different genetic backgrounds. Since only the ends of the insertions were sequenced, it is not possible to speculate on the relationships among these elements. However, the similarity in size between the insertions in ts4-TP and ba1-ref (~6 kb) suggests that the former may also carry gene fragments.

Mutations caused by Helitron insertions have been identified in other plant genomes, as well (Table 11.2). Hel-It1, the first mutagenic Helitron described in dicots, interrupts the anthocyanin pigmentation gene DFR-B in the pearly-s mutant of Ipomoea tricolor (Choi et al. 2007). This 11.5-kb Helitron shows the structure predicted for a plant autonomous element, with conserved 5′ and 3′ termini and genes for Rep/Hel and RPA proteins. A frameshift mutation in the former and a nonsense mutation in the latter would render this element nonautonomous, but several related elements are found in the Ipomoea genome. In fact, RPA transcripts not containing the nonsense mutation of Hel-It1 were detected in the pearly-s mutant and were proposed to originate from a hypothetical autonomous element present in that line.

The 3′-UTR of genes appeared to be an underrepresented target for Helitron insertion until a recent study on the S-RNase-based gametophytic self-incompatibility system in the tetraploid sour cherry (Prunus cerasus). A 306-bp nonautonomous Helitron element was identified 38 bp downstream of the stop codon of the SFB gene in four nonfunctional (self-compatible) S36 variants (Tsukamoto et al. 2010). The vast majority of SFB transcripts in S36 do not have a poly(A) tail, suggesting that the presence of the Helitron element interferes with the polyadenylation process. Helitron elements have also been found associated with certain S haplotypes in the self-compatible species Arabidopsis thaliana (Liu et al. 2007; Sherman-Broyles et al. 2007), raising the intriguing prospect that they may have played a widespread role in the evolution of self-compatibility. However, further studies are needed to establish conclusively that the Helitron insertion was the real cause of the loss of function of the S36 variants in sour cherry.

Genome components other than genes, such as DNA transposons, can also be targeted by Helitrons. In OsES1, a rice homolog of the maize En/Spm transposon, a 1,280-bp nonautonomous Helitron transposon is located in the seventh intron of the gene encoding the TnpA transposase (Greco et al. 2005). The Helitron insertion seems to induce alternative splicing, as do many other transposon insertions in transcribed regions (Dooner and Weil 2012). Thus, Helitrons may play a role in the regulation of the transpositional activity of CACTA elements, the most abundant superfamily of DNA transposons in rice (Paterson et al. 2009).

Because many maize Helitrons carry segments of multiple genes, they have been identified much more frequently as disruptors of genetic colinearity among different maize inbred lines (Brunner et al. 2005a, b; Fu and Dooner 2002; Lai et al. 2005; Morgante et al. 2005; Song and Messing 2003; Wang and Dooner 2006). The so-called “intraspecific violation of genetic colinearity” (Fu and Dooner 2002) or “plus–minus variation” (Lai et al. 2005) resulting from Helitron insertions in maize led to community efforts to achieve a more detailed and precise identification and annotation of Helitrons (Du et al. 2008, 2009; Yang and Bennetzen 2009a). This effort was essential to a proper annotation of the actual gene content in the maize genome (Schnable et al. 2009) because of the gene-fragment-rich property of the widely prevalent nonautonomous elements (Lal et al. 2009a).

Recently, a maize-type of Helitron transposon was discovered in the Pooideae grass Lolium perenne (perennial ryegrass). Large (~7.5 kb) Helitron elements were identified that had trapped fragments, including exons and introns, from three genes: GIGANTEA (GI), succinate dehydrogenase, and ribosomal protein S7 (Langdon et al. 2009). All three fragmented genes shared the same transcription orientation as the Helitron elements. Highly similar Helitrons were detected in the closely related grass species Festuca pratensis (meadow fescue), indicating a likely common ancestral origin of these elements.

11.3.2.2 Computational Identification of Helitrons in Sequenced Organisms

The vast majority of Helitrons were identified from in silico studies of sequenced genomes either manually or via investigator-designed ad hoc mining programs, such as DomainOrganizer (Tempel et al. 2006), HelitronFinder (Du et al. 2008, 2009), HelSearch (Yang and Bennetzen 2009b), and Helitron_scan (Feschotte et al. 2009). The contribution of Helitrons to plant genomes varies widely, from none to as high as ~7 %. However, determining an exact figure for the Helitron content of any given host genome is difficult. Because sequence conservation among Helitrons is extremely limited, it is not surprising to find quite different figures in updated versions of the same genome sequence (e.g., Du et al. 2010; Schmutz et al. 2010).

The published programs for automated computational identification and classification of Helitrons utilize either a homology-based or a structure-based approach. The latter approach (Du et al. 2008; Yang and Bennetzen 2009b) has been applied only recently in the analysis of whole genomes (Du et al. 2009, 2010; Yang and Bennetzen 2009a).

Initially, the homology-based approach was used to compare sequences at both the nucleotide and amino acid levels, as demonstrated by Kapitonov and Jurka (2001) in their original paper. Helitron-like transposons in rice were classified as Helitrons based on their capacity to code for proteins homologous to Rep/helicase and RPA (Kapitonov and Jurka 2001) and their shared structure hallmarks with Arabidopsis Helitrons (AT insertion site, 5′-TC, and 3′-CTRR and the 15- to 20-nucleotide palindrome close to the 3′-end). In an analogous approach, 21 Helitron elements were identified in the model legume Lotus japonicus by using as queries the RC motif and domain-5 of the RepHelicase from Arabidopsis Helitrons. Altogether, Helitron elements made up 0.4 % of the 32.4 Mb examined sequences (Holligan et al. 2006).
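The structural hallmarks listed above (AT insertion site, 5′-TC, 3′-CTRR, and a palindrome close to the 3′-end) can be turned into a simple screening check. The following Python sketch is only an illustration of that idea and does not reproduce the algorithm of any published program such as HelitronFinder or HelSearch. The function names, the 40-base search window, the 6-base perfect stem, and the 3- to 8-base loop are all assumptions chosen for brevity; real Helitron hairpins are longer and often imperfect.

```python
import re

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_hallmarks(element: str, left_flank: str, right_flank: str) -> bool:
    """Check the 5'-TC start, the 3'-CTRR end (R = A or G), and the host AT site.

    `left_flank` and `right_flank` are the host sequences immediately
    5' and 3' of the candidate element (hypothetical inputs).
    """
    element = element.upper()
    starts_tc = element.startswith("TC")
    ends_ctrr = re.search(r"CT[AG][AG]$", element) is not None
    # The element sits between the A and the T of the host dinucleotide.
    at_site = left_flank.upper().endswith("A") and right_flank.upper().startswith("T")
    return starts_tc and ends_ctrr and at_site


def find_3prime_hairpin(element: str, window: int = 40, min_stem: int = 6,
                        min_loop: int = 3, max_loop: int = 8):
    """Return (stem, loop) for the first perfect hairpin near the 3' end, or None.

    Only perfect stems of exactly `min_stem` bases are considered, which is a
    deliberate simplification.
    """
    tail = element.upper()[-window:]
    for i in range(len(tail) - 2 * min_stem):
        stem = tail[i:i + min_stem]
        partner = reverse_complement(stem)
        for loop_len in range(min_loop, max_loop + 1):
            j = i + min_stem + loop_len
            if tail[j:j + min_stem] == partner:
                return stem, tail[i + min_stem:j]
    return None
```

A candidate would pass this screen only if `has_terminal_hallmarks` returns True and `find_3prime_hairpin` returns a stem and loop. Because the termini are so short, a screen of this kind produces many false positives on its own, which is one reason published tools combine it with similarity to known termini.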

Novel Helitrons were also identified by nucleotide similarity to whole Helitron elements or to just the termini (Du et al. 2008, 2009; Kapitonov and Jurka 2001; Sweredoski et al. 2008; Tempel et al. 2007; Yang and Bennetzen 2009a, b). Other prevalent criteria implemented in genome-wide annotations of Helitron transposons include nonallelic locations in a given host genome and presence/absence of polymorphisms revealed from vertical comparison of colinear regions in closely related genomes (Wicker et al. 2010).

In addition to the two model plant genomes where Helitrons were originally identified, Helitrons have been detected in many other flowering and nonflowering plants. Paralleling the 20-fold variation in genome size, Helitron content varies from 0.01 % in grape to 6.72 % in the latest annotation of the Arabidopsis thaliana genome (Table 11.1). The estimated contribution of Helitron elements to a particular host genome also varies in different databases analyzed by different researchers, as seen in Arabidopsis thaliana, rice, sorghum, and soybean.
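Part of the variation between estimates comes from how annotated elements are counted. As a minimal sketch, assuming that Helitron annotations are available as half-open (start, end) coordinates on a single assembly, the genome fraction can be computed after merging overlapping or nested annotations so that no base is counted twice. The function names and the input format are hypothetical and are not taken from any of the cited studies.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open coordinates: [start, end)


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def helitron_fraction(intervals: Iterable[Interval], assembly_length: int) -> float:
    """Fraction of the assembly covered by at least one Helitron annotation."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / assembly_length
```

Summing raw element lengths without merging would inflate the estimate wherever elements are nested or annotated twice, and the denominator (whole assembly versus the sampled sequence) changes the figure as well.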

Helitrons are poorly conserved among species, even of the same genus; this has made it hard to determine their presence systematically. Nevertheless, comparisons of the Helitron content of closely related species have been carried out in Arabidopsis and rice. The former involved the whole genomes of A. thaliana and A. lyrata (Hollister et al. 2011) and the latter, the partial genomes of 13 Oryza species (Gill et al. 2010).

As shown in a recent study on TE evolutionary dynamics in Arabidopsis employing the powerful transposon display method, Basho Helitrons were amplifiable in A. thaliana but were apparently absent from A. lyrata. This led to the suggestion of a recent burst of Basho insertions specifically within A. thaliana (Lockton and Gaut 2010). However, a subsequent sequence annotation effort revealed that Helitrons are actually the most abundant TEs in the fully sequenced A. lyrata genome (Hollister et al. 2011).

In an attempt to examine the relative abundance and distribution of TE classes across the genus Oryza, DNA transposons were identified by homology-based searches of BAC-end sequences from 13 species representing 8–17 % of each of the ten Oryza genome types. The Helitron content in the genus was found to vary greatly, from 0.29 % in O. australiensis to 3.15 % in O. glaberrima (Gill et al. 2010).

The identification of Helitrons from newly sequenced genomes remains a challenging endeavor despite the availability of several refined programs for detecting them. As shown in Table 11.1, Helitron-related sequences make up as much as 1.6 % of the Selaginella genome (Banks et al. 2011), but less than 0.2 % of the Brachypodium (International Brachypodium Initiative 2010) and Physcomitrella (Rensing et al. 2008) genomes. The lesson learned from other genomes, such as sorghum, suggests that the Helitron content of the latter two genomes will increase upon future careful annotation.

Glimpses of ongoing sequencing projects reveal that Helitrons are major components of some other plant genomes, as they are in sequenced model genomes. For example, Helitron transposons constitute ~1 % of 1.2 Mb of sequences from the tetraploid moso bamboo (Phyllostachys pubescens E. Mazel ex H. de Leh.) (Gui et al. 2010). In wheat (Triticum aestivum), 3,222 TEs have been annotated in 18.2 Mb of sequence from chromosome 3B. Only five families of agenic nonautonomous Helitrons were identified, representing just 0.07 % of the genomic sample sequences, in contrast to the 81.4 % contribution from all other TEs (Choulet et al. 2010). The only Helitron found so far in barley (Scherrer et al. 2005) is present in about 20–30 copies in the genome, based on 574 Mb of high-throughput sequences representing about 10 % of a genome equivalent (Wicker et al. 2008). Very recently, a putative Helitron sequence was first reported in sunflower and its insertion was dated to 1.14 million years ago (Buti et al. 2011).

In spite of the ever-growing numbers of identified Helitrons in newly sequenced genomes, a much more careful characterization of Helitron composition is necessary for sequenced plant genomes where Helitrons have not yet been identified, such as Carica papaya (Ming et al. 2008), Cucumis sativus (Huang et al. 2009), and Solanum tuberosum (The Potato Genome Sequencing Consortium 2011). Given the ubiquitous presence of these elements in all carefully annotated plant genomes, Helitron-free plant genomes are unlikely to exist.

11.3.3 Coding Capacity

The structure of the hypothetical autonomous Helitron proposed by Kapitonov and Jurka (2001) is fairly sound since elements with a similar structure continue to be found in an increasing number of genomes (Choi et al. 2007; Morgante et al. 2005). However, all of the Helitrons identified so far are nonautonomous and, oftentimes, bear gene fragments coding for proteins other than the REP-HEL transposase proposed for the RC transposition of Helitrons (Brunner et al. 2005a, b; Gupta et al. 2005; Lai et al. 2005; Lal et al. 2003; Morgante et al. 2005; Wang and Dooner 2006; Xu and Messing 2006).

In maize, two research groups have scanned the nearly complete genome sequence using similar computational approaches (Du et al. 2009; Yang and Bennetzen 2009a) and concluded that the majority of the ~2,000 genic Helitrons identified carried fragments from genes located in different chromosomes, with a few exceptions coming from neighboring genes. The tendency of Helitrons to capture gene fragments seen in maize may not be a general property of plant Helitrons. For instance, in A. thaliana, very few Helitron families were found to have acquired gene fragments (Hollister and Gaut 2007; Yang and Bennetzen 2009b). A similarly low propensity to capture genes was found among Helitrons from rice, sorghum, and Medicago (Yang and Bennetzen 2009b).

As is the case with most other transposon superfamilies (Levin and Moran 2011), small RNAs generated from endogenous Helitron sequences have the potential to inhibit TE mobility through the posttranscriptional degradation of transposon mRNA. As recently reported in Physcomitrella patens, 6 % of the nucleotides within 48 23-nucleotide RNA loci overlapped with regions similar to Helitron elements, which make up just 0.12 % of the genome (Cho et al. 2008).

11.3.4 Target Preference

The insertion site preference of Helitron transposons has been analyzed at the nucleotide level (target site sequence specificity), gene level (coding capacity of target sequence), and genome level (chromosomal distribution).

Plant Helitrons insert almost invariably in a 5′-AT-3′ dinucleotide (Brunner et al. 2005a, b; Choi et al. 2007; Gupta et al. 2005; Kapitonov and Jurka 2001; Lai et al. 2005; Lal et al. 2003; Morgante et al. 2005; Wang and Dooner 2006; Xu and Messing 2006) and, exceptionally, in a 5′-NT-3′ dinucleotide (Du et al. 2008, 2009; Morgante et al. 2005; Yang and Bennetzen 2009a). In addition, plant Helitron insertion sites are notably AT-enriched on either side of the insertion (Du et al. 2009; Yang and Bennetzen 2009a).
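The two observations above, a preferred 5′-AT-3′ (or, exceptionally, 5′-NT-3′) target dinucleotide and AT-enriched flanks, are straightforward to tabulate once insertion sites are known. The Python sketch below assumes that the host sequence on each side of an insertion has already been extracted; the function names and the three category labels are hypothetical conveniences, not terms used in the cited studies.

```python
def classify_target_site(left_flank: str, right_flank: str) -> str:
    """Classify the host dinucleotide that straddles an insertion.

    The dinucleotide is the last base of the 5' flank plus the first base
    of the 3' flank. Returns "AT", "NT" (any other base followed by T),
    or "other".
    """
    if not left_flank or not right_flank:
        raise ValueError("both flanks must be non-empty")
    dinucleotide = (left_flank[-1] + right_flank[0]).upper()
    if dinucleotide == "AT":
        return "AT"
    if dinucleotide[1] == "T":
        return "NT"
    return "other"


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)
```

Comparing `at_fraction` of the flanks with the genome-wide AT fraction, rather than reading the raw value on its own, is what would show whether the flanks are actually enriched.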

The discovery over the last decade that Helitron insertions have been the cause of spontaneous mutations in several plant species would suggest that Helitrons target genic regions (see Table 11.2), at least in these host genomes. Supporting this inference, maize Helitrons were found to be most abundant in gene-rich regions across the genome (Du et al. 2009; Yang and Bennetzen 2009a). However, this may not be a general pattern in plants.

In Arabidopsis, for example, Helitrons are enriched in gene-poor pericentromeric regions (Yang and Bennetzen 2009b), thus showing a pattern opposite to that of other DNA transposons, which are frequently associated with gene-rich regions. However, in a different study that compared the proximity of transposons of different ages to genes in A. thaliana, Helitrons and other recently active TE families, such as MITEs, tended to be closer to genes than ancient families, such as CACTA-like elements (Hollister and Gaut 2009). Moreover, nonautonomous Helitrons, many as small as MITEs, were unmethylated in higher proportions than most other TE families. These observations were explained by a model in which host silencing of TEs near genes has deleterious effects on neighboring gene expression, resulting in the preferential loss of methylated TEs from gene-rich chromosomal regions.

In rice, Helitron elements are more scattered along the chromosomes and not enriched in all pericentromeric regions (Yang and Bennetzen 2009b). As with other TEs, the distribution of Helitrons in present-day genomes probably reflects a combination of factors, such as continued mobility, insertion specificity, purifying selection against insertion in genes, and rates of DNA removal in gene-poor heterochromatic regions.

11.3.5 Differential Amplification and Contribution to Host Genome

The variable patterns of Helitron accumulation in sequenced plant genomes suggest different dynamics of Helitron proliferation across species and differential contributions to the present structure of their host genomes.

Helitrons make up a widely varying fraction of the plant genomes sequenced so far, from barely detectable to as much as 1/16 (Table 11.1). As has been well documented, TE proliferation and polyploidization are the two major processes that increase plant genome size (Bennetzen 2005). Cornucopious, the most abundant Helitron transposon subfamily in maize, consists of thousands of copies of ~1-kb agenic elements with variable sequence identity to the consensus (Du et al. 2009). These relatively small maize Helitrons may be actively transposing after a recent escape from transposition suppression, like the mPing MITEs suddenly amplified during rice domestication (Naito et al. 2006), whereas the amplification of the vast majority of Helitron families in maize, rice, and sorghum peaked about 0.25 million years ago (Yang and Bennetzen 2009a).

In the recent annotation of the A. thaliana genome (Ahmed et al. 2011), Helitron-related sequences made up 6.7 % of the genome, more than the sum of all other DNA transposons (Table 11.1). In agreement with earlier results (Hollister and Gaut 2009), elements from the Helitron and Tc1/mariner superfamilies had the highest proportion of unmethylated sequences, whereas those from the Gypsy and CACTA superfamilies had the lowest.

As with Helitron content, different numbers of Helitron families have been identified in the same organism (Table 11.1). In general, Helitrons with a smaller size tend to be amplified to a high degree (Ahmed et al. 2011; Du et al. 2009; Hollister and Gaut 2007). Moreover, as noted in Arabidopsis and maize, longer Helitrons are less likely to persist in the genome (Hollister and Gaut 2007; Yang and Bennetzen 2009a), presumably because they are selected against in order to avoid the deleterious effects of inter-Helitron ectopic recombination. However, other explanations may be possible because no recombination was detected within the heavily methylated gene fragments borne on maize Helitrons in a large-scale experiment specifically designed for that purpose (He and Dooner 2009).

In addition to their effect on genome size through massive amplification of agenic families, Helitrons contribute to haplotype variability through transposition and chromosome rearrangements (Ahmed et al. 2011; Brunner et al. 2005a; Lai et al. 2005; Morgante et al. 2005; Wang and Dooner 2006). The mechanism of gene movement that results in the erosion of colinearity between closely related species was recently investigated in a three-way comparison of the Brachypodium, rice, and sorghum genomes (Wicker et al. 2010). Gene capture by TEs, including Helitrons, was not found to have contributed significantly to gene movements within the grass family. On the other hand, TEs of many superfamilies, including Helitrons, were found at the borders of the noncolinear (i.e., mobilized) regions, suggesting that repair of TE-induced double strand breaks through synthesis-dependent strand annealing (SDSA) may have been involved in the change of position of genes in related genomes.

11.4 The Genetics of Helitrons

Because Helitrons belong to the rare group of transposons that were discovered computationally (Feschotte and Pritham 2007), it is not surprising that Helitron genetics trails Helitron genomics. Yet, a genetic approach will be needed to identify a functional autonomous Helitron transposon, discern the actual mode(s) of transposition, assess the regulation of and by captured gene fragments, and elucidate other aspects of basic Helitron biology.

11.4.1 Transposition Mechanism: Rolling Circle and/or Cut-and-Paste?

A rolling circle replication mechanism has been proposed for the amplification of this novel class of transposons (Kapitonov and Jurka 2001). The putative autonomous Helitrons from the three genomes originally examined shared two conserved domains: the cross-kingdom DNA helicase domain and a domain homologous to the replication initiator proteins of RC plasmids and certain ssDNA viruses (Fig. 11.1a). Though still a hypothetical mechanism, RC replication is supported by the conserved structure of putative autonomous copies from several sequenced model plant genomes (Table 11.1).

The genome-wide distribution of Helitron elements favors a dispersive transposition model, although occasional Helitron clusters have been reported in some plant genomes (Lai et al. 2005; Yang and Bennetzen 2009a). Some peculiar head-to-head, head-to-tail, and tail-to-tail Helitron configurations have been identified in the maize genome (Du et al. 2008; Yang and Bennetzen 2009a), but they are composed of dissimilar Helitrons with similar terminal sequences. They therefore differ from the perfect head-to-tail configurations expected from an RC replication mechanism, which have so far been found only in the Myotis lucifugus genome (Pritham and Feschotte 2007).

As discussed in Sect. 11.3.5, Helitrons have contributed to the frequent loss of genetic colinearity in related plant genomes. Many recently duplicated fragments in the grasses are bordered by TEs, including Helitrons (Wicker et al. 2010). Other chromosomal rearrangements, such as inversions, are also often associated with Helitron transposons. Of the 154 inversions identified between Arabidopsis thaliana and Arabidopsis lyrata, one-third are flanked by inverted repeats from Helitron elements (Hu et al. 2011).

In addition to RC replication, a Helitron cut-and-paste transposition mechanism, like the one used by most known DNA transposons, was recently proposed. Li and Dooner (2009) found that, unexpectedly, some maize Helitrons could excise somatically. The somatic excision products, or footprints, left by removal of a 6-kb Helitron consisted of a variable number of TA repeats at the prior insertion site, an unlikely consequence of an RC replication mechanism. Somatic excision products were also detected from other genic and agenic Helitron elements (Du et al. 2008; Li and Dooner 2009). This finding suggests that, like Tn7 (Craig 2002) and Mutator (Walbot and Rudenko 2002), Helitrons may exhibit both replicative and excisive modes of transposition.
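The kind of footprint described above can be scored with a few lines of Python. This is only a minimal sketch: the function name, the flanking sequences, and the requirement that the footprint consist of nothing but TA repeats are illustrative assumptions, not the procedure used by Li and Dooner (2009).

```python
import re
from typing import Optional


def ta_footprint(amplicon: str, left_flank: str, right_flank: str) -> Optional[int]:
    """Count TA repeats lying between two known flanks of an excision site.

    Returns the number of TA dinucleotides found between the flanks, or None
    if the flanks are not found or are separated by anything other than TA
    repeats. Flank sequences are supplied by the caller (hypothetical here).
    """
    pattern = re.escape(left_flank.upper()) + r"((?:TA)*)" + re.escape(right_flank.upper())
    match = re.search(pattern, amplicon.upper())
    if match is None:
        return None
    return len(match.group(1)) // 2


# Hypothetical excision products sharing the same flanks
print(ta_footprint("GGCCATATATAGGTT", "GGCCA", "GGTT"))
print(ta_footprint("GGCCAGGTT", "GGCCA", "GGTT"))
```

A variable count across independent excision products from the same site would correspond to the variable-length TA footprints reported for maize.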

11.4.2 Gene Capture

Transduplication, or the capture of host gene sequences, first reported for Mutator elements (Jiang et al. 2004; Talbert and Chandler 1988), is a common feature of several families of plant transposons (Dooner and Weil 2007). However, Helitrons may contribute the largest portion of transduplicated sequences in some plant genomes, like maize (Brunner et al. 2005b; Du et al. 2009; Lai et al. 2005; Morgante et al. 2005; Wicker et al. 2010; Yang and Bennetzen 2009a, b).

In contrast to the broad spectrum of captured genes in maize, only a few genes have been captured by Helitrons in A. thaliana (Hollister and Gaut 2007; Yang and Bennetzen 2009b). Gene capture by Helitrons is also a rare event in Medicago, Brachypodium, sorghum, and rice (Fan et al. 2008; Wicker et al. 2010; Yang and Bennetzen 2009b). No correlation has been found between the transcriptional orientation of the captured gene fragments and the orientation of the TE in which they are lodged. In fact, some Helitrons contain multiple genes with opposite transcriptional orientations (Lai et al. 2005; Lal et al. 2003; Wang and Dooner 2006; Wicker et al. 2010).

In spite of the well-documented transcriptional activities of genes captured by Helitrons from different plant species (Brunner et al. 2005b; Lai et al. 2005; Lal et al. 2003; Morgante et al. 2005; see also Sect. 11.4.3), no cases of functional full-length gene capture by Helitron elements have been reported. Although an almost intact cytidine deaminase gene missing only the first six amino acids was found embedded in a maize Helitron, no transcripts corresponding to it were detected in any tissue examined (Xu and Messing 2006).

The capture of gene fragments from various genomic locations by the same Helitron may give rise to complex networks regulating the donor genes (Brunner et al. 2005b; Lai et al. 2005). The extent to which the host genome could benefit from these potentially deleterious effects (Du et al. 2009) is unclear.

11.4.3 Coevolution with the Host Genome

The potential role of Helitrons and other TEs in gene creation in plants has been recently reviewed by Dooner and Weil (2012).

Gene fragments captured by Helitrons originate from nonadjacent loci in the genome, yet they tend to be in the same transcriptional orientation relative to each other and to the Helitron’s RepHel gene. A large collection of gene-fragment-bearing Helitrons in maize shows a notable bias in the orientation of gene fragments that is compatible with Helitron promoter-driven expression (Du et al. 2009; Yang and Bennetzen 2009a). Several chimeric transcripts containing exons from different genes (“exon shuffling”) have been detected for maize Helitrons (Brunner et al. 2005b; Lai et al. 2005; Morgante et al. 2005). Though many of these transcripts contain premature stop codons in all reading frames and are unlikely to encode functional proteins immediately, Helitrons could have contributed to gene creation over evolutionary time (Brunner et al. 2005b). Expression of chimeric transcripts can also be driven by the promoter of the disrupted gene. In maize, chimeric transcripts derived from genes captured by the inserted Helitron in the sh2-7057 mutant are produced from the sh2 promoter (Lal et al. 2003), rather than from a Helitron promoter.
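Whether an observed orientation bias of this kind exceeds what chance would produce can be checked with an exact binomial test against a 50:50 expectation. The sketch below is illustrative only: the counts are hypothetical placeholders, not data from Du et al. (2009) or Yang and Bennetzen (2009a), and the 50:50 null is an assumption about what "no bias" means.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k out of n trials under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that floating-point ties are included.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: gene fragments in the same orientation as RepHel
same_orientation, total_fragments = 70, 100
print(binomial_two_sided_p(same_orientation, total_fragments))
```

A small p-value would indicate only that orientations are nonrandom; it would not by itself show that a Helitron promoter drives expression.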

The idea that TEs have been co-opted by the host as regulatory sequences has received considerable experimental support. Many cis-regulatory elements involved in transcriptional regulation have characteristics of TEs and some of them are Helitrons. For example, the CArG motif essential for the transcriptional activation of LEAFY COTYLEDON2 (LEC2), a master regulator of seed development in A. thaliana, is located at the beginning of a Helitron element (Helitron3). This and other TE insertions located in the promoter region of LEC2 were speculated to control the gene’s specific expression pattern (Berger et al. 2011).

TE sequences are also found in transcripts, where they may play an unsuspected regulatory role. In Arabidopsis thaliana, more than 2,000 putative TE-gene chimeras, where a TE is found in at least one expressed exon, have been identified and compared to all TEs in a TE database (Lockton and Gaut 2009). Helitron-like sequences were strikingly underrepresented (2.4 %) in exons, contrasting with the high abundance (~20 %) of all other TEs. A similar pattern was found for the specific targets of the MOM1 (MORPHEUS’ MOLECULE1) regulator of transcriptional gene silencing in Arabidopsis (Numa et al. 2010). The majority of MOM1 targets carry sequences related to TEs of both classes and are clustered at pericentromeric regions, suggesting that MOM1 acts on regions of heterochromatin in the genome. Helitron remnants, on the other hand, were significantly underrepresented among MOM1-regulated transcripts. The authors suggested that, because Helitrons target active genes undergoing transcription, their low frequency among MOM1-target sequences may reflect exclusion of MOM1 from active chromatin environments. Because TEs are major contributors to the evolution of plant genomes, more in-depth analyses are required to decipher their contributions to annotated protein-coding regions, an essentially unexplored field (Lal et al. 2009b).

11.4.4 Epigenetic Regulation

There is growing evidence that the proliferation of TEs in plants is under epigenetic regulation and that their biological properties are strongly affected by cycles of methylation and demethylation (Lisch 2009).

The past couple of years have seen a considerable increase in experimental data, mainly from Arabidopsis, on the methylation status of TEs. As shown in two earlier bisulfite sequencing studies (Gehring et al. 2006; He and Dooner 2009), Helitrons are heavily methylated at CG sites. In the first study, a Helitron inserted 4 kb upstream of the start site of the Arabidopsis MEDEA gene was heavily methylated, yet did not contribute to the allele-specific DNA hypomethylation in the endosperm (Gehring et al. 2006). In the second study, two maize Helitrons shown to be nonrecombinogenic despite the presence of multiple gene fragments were much more methylated than the adjacent recombinogenic gene-rich region (He and Dooner 2009).

Transcriptional reactivation of TEs in the mature pollen of Arabidopsis has been detected in microarray assays of TE expression profiles during development (Slotkin et al. 2009). In most tissues and stages, the ORFs of Helitron2 and six other full-length TEs (including retrotransposons and DNA transposons) were either not expressed or expressed at a very low level, indicating that they are generally silenced. However, all seven full-length TEs examined were coordinately expressed in mature pollen. This TE expression coincided with loss of DNA methylation and downregulation of the chromatin remodeler DDM1.

A recent study analyzed the contribution of TEs and small RNAs to gene expression variation in A. thaliana and A. lyrata, a closely related congener with a two- to threefold higher copy number for every TE family examined, including Helitrons (Hollister et al. 2011). Reassessment of the TE content in the two species revealed that, unexpectedly, Helitrons were the DNA transposons with the highest copy number in both (Table 11.1). The 24-nt siRNA complements from the two species were compared in order to address the possible role of siRNA-guided transcriptional gene silencing in differential TE proliferation. Helitrons were found to be less often targeted by unique 24-nt siRNAs in A. lyrata than in A. thaliana, possibly explaining their higher copy number in the former. An almost concurrent reanalysis of DNA methylation, siRNA, and TE datasets from Arabidopsis thaliana concluded that Helitrons actually contribute ~7 % of the annotated genome (Table 11.1) and, along with the Tc1/mariner superfamily, have the largest fraction (40–50 %) of unmethylated TE sequences (Ahmed et al. 2011).

Around a dozen Arabidopsis genes are imprinted, i.e., expressed in a parent-of-origin-dependent manner in the endosperm during seed development (Kermicle 1970). In a couple of cases, Helitron insertions have been implicated in imprinting. In a study on the association of TE methylation with gene imprinting during seed development in A. thaliana, TE fragments were found to be extensively demethylated in the endosperm (Gehring et al. 2009). Two imprinted members of the class IV homeodomain transcription factors contain remnants of Helitron elements at the 5′ end. Although these genes showed reciprocal imprinting, i.e., predominant expression of the maternal allele in one and of the paternal allele in the other, methylation of the Helitron remnants was lost from the maternal alleles in both cases. Other imprinted genes are also neighbored by TEs. AGL36, a maternally expressed gene, contains remnants of Helitrons and other TE sequences within a 1.7-kb promoter fragment that is sufficient to confer parent-of-origin-specific expression of a reporter (Shirzadi et al. 2011). Paternally expressed genes, as well, are enriched for cis-proximal transposons, particularly for Helitrons (Wolff et al. 2011). It has been proposed that imprinting may have evolved from targeted methylation of TE insertions near genes followed by positive selection when the resulting expression change was advantageous (Gehring et al. 2009).

Whether a TE can exert a regulatory effect on a nearby gene obviously depends on the distance between the transposon and the gene. A methylated AtREP2 Helitron inserted 3.8 kb upstream of the imprinted MEA gene in the Col-0 and Ler-0 ecotypes of Arabidopsis thaliana was considered a candidate imprinting control element until ecotypes were found in which MEA was still imprinted even though they lacked the upstream Helitron (Spillane et al. 2004). In a recent study relating gene expression to distance from the nearest TE in A. thaliana, average gene expression increased with distance up to about 2.5 kb (Hollister et al. 2011).
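The distance measure underlying such an analysis can be sketched as a nearest-neighbor lookup on sorted coordinates. The simplifications here are assumptions made for brevity and are not taken from Hollister et al. (2011): genes and TEs are treated as single points on one chromosome, and all coordinates are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def distance_to_nearest_te(gene_pos: int, te_positions: Sequence[int]) -> int:
    """Return the distance (bp) from a gene to the closest TE.

    te_positions must be sorted in ascending order and refer to the same
    chromosome as gene_pos.
    """
    if not te_positions:
        raise ValueError("at least one TE position is required")
    i = bisect_left(te_positions, gene_pos)
    candidates = []
    if i < len(te_positions):
        candidates.append(te_positions[i] - gene_pos)      # nearest TE at or downstream
    if i > 0:
        candidates.append(gene_pos - te_positions[i - 1])  # nearest TE upstream
    return min(candidates)


# Hypothetical coordinates on a single chromosome
te_sites = [1000, 5000, 9000]
for gene in (100, 3000, 5200, 9500):
    print(gene, distance_to_nearest_te(gene, te_sites))
```

In a real analysis, each gene's distance would then be paired with its expression level and binned, with interval coordinates replacing the point positions used here.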

11.5 Perspective

The huge number of annotated Helitron transposons in plant genomes, including both putative autonomous elements and nonautonomous elements with and without gene fragments (Table 11.1), represents only the tip of the iceberg.

The molecular structure of the autonomous Helitron and the RC mechanism of transposition (Kapitonov and Jurka 2001) remain hypothetical, but are supported, respectively, by the conservation of structure of the putative autonomous element across evolutionarily widely divergent species and the identification of occasional head-to-tail configurations that make RC replication a credible transposition mechanism. Whether the RepHel protein is necessary and/or sufficient for RC transposition needs to be confirmed experimentally. The discovery of Helitron somatic excision products in maize (Li and Dooner 2009) suggests that Helitrons may transpose by both copy-and-paste and cut-and-paste mechanisms.

As is evident from successive sequence annotations of the same genome, determination of the overall Helitron contents in a given genome is a challenging and uncertain exercise (Feschotte and Pritham 2009). The conserved sequence and structure of the 3′ end of known Helitrons has served as the basis for the development of a number of ad hoc programs for specific genome-wide surveys of this highly divergent family of transposons. However, these programs are still inefficient at identifying Helitrons when applied to new species. Novel programs, possibly based on the recognition of conserved nucleotide patterns, are therefore desirable for the efficient de novo identification of Helitrons from all genome sequencing projects.
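The structure-based approach mentioned above can be illustrated with a deliberately simple scan for a hairpin followed closely by a short terminal motif. Every parameter here is an assumption chosen for illustration: the CTRR terminus (R = A or G) is taken from the general Helitron literature rather than from this section, and the stem length, loop size, and spacing are arbitrary defaults. The sketch also accepts only perfect stems, whereas real Helitron hairpins often contain unpaired bases, so it does not reproduce any published program.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_3prime_ends(
    seq: str, stem_len: int = 6, max_loop: int = 5, max_gap: int = 12
) -> List[Tuple[int, int]]:
    """Find CTRR motifs preceded by a perfect hairpin.

    Returns (hairpin_start, end) pairs in 0-based coordinates, where end is
    the position just past the CTRR motif. Loop sizes of 3..max_loop and
    hairpin-to-motif gaps of 0..max_gap are tried.
    """
    seq = seq.upper()
    hits = []
    # Lookahead so that overlapping CTRR occurrences are all reported.
    for m in re.finditer(r"(?=CT[AG][AG])", seq):
        found = None
        for gap in range(max_gap + 1):
            right_start = m.start() - gap - stem_len
            if right_start < 0:
                break
            right_arm = seq[right_start:right_start + stem_len]
            for loop in range(3, max_loop + 1):
                left_start = right_start - loop - stem_len
                if left_start < 0:
                    continue
                if seq[left_start:left_start + stem_len] == revcomp(right_arm):
                    found = left_start
                    break
            if found is not None:
                break
        if found is not None:
            hits.append((found, m.start() + 4))
    return hits


# Hypothetical sequence: stem GGCACC, loop AAAA, stem GGTGCC, spacer, CTAG
demo = "TTTT" + "GGCACC" + "AAAA" + "GGTGCC" + "ATATATATAT" + "CTAG" + "TTTT"
print(find_candidate_3prime_ends(demo))
```

A scan this permissive would return many false positives on a real genome, which is one reason the text above calls for pattern-recognition methods that generalize across species.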

Only a few cases of gene-fragment-bearing Helitrons have been identified in plants other than maize. The high frequency of gene fragment capture by maize Helitrons is enigmatic, but it has been suggested to result from a RepHel enzyme with a different replication/repair fidelity (Yang and Bennetzen 2009b). The identification and characterization of an autonomous Helitron in maize would be highly desirable because maize is an excellent experimental genetic system and has currently active elements, as is evident from several recently arisen mutations (Table 11.2).

The dynamic evolution of Helitrons is best exemplified by the discovery in maize of a new group of Helitron-like sequences, designated Heltir, which end in perfect 37-bp TIRs (Du et al. 2009). The sequence variability of Helitrons and the presence in the genome of other forms, like Heltirs, complicate the accurate estimation of the contribution of this transposon superfamily to plant genomes.