Introduction

McClintock’s cytogenetic and genetic experiments in the 1940s with maize, Zea mays, enabled her to identify a line that was prone to breakage and instability of chromosome 9 at a specific locus, Ds (Dissociator). She attributed instability of the Ds locus in the presence of Ac (Activator) to genetic elements (Ds and Ac) capable of transposing their chromosomal locations (McClintock 1948). However, the genetic community was skeptical of McClintock’s concept of ‘moveable’ genes; contemporary geneticists were steadfast in their acceptance of the ‘Chromosome Hypothesis of Heredity’, and Morgan’s legacy of genetic entities (genes) on chromosomes. Geneticists gradually became more accepting of mobile genetic elements after the discovery of similar elements in bacteria and other diverse eukaryotes. Initially transposable elements were labeled as ‘genomic parasites’ or ‘selfish DNAs’ because they were thought to propagate without obvious cellular functions in genomes (Doolittle and Sapienza 1980; Orgel and Crick 1980). Numerous findings have ascribed cellular functions to TEs (Volff 2006; Fedoroff 2012; Alzohairy et al. 2013) that have necessitated a revision of their moniker to ‘genomic gold’. However, the genomic role of TEs remains unclear, and is still being debated (Hua-Van et al. 2011; Abrusán et al. 2013; Brunet and Doolittle 2015). It is widely believed that TEs have been instrumental in evolution, and in shaping the genomes of many contemporary species of flora and fauna (Oliver and Greene 2012; Fedoroff 2012; Zhao et al. 2015). This review summarizes the system of classification of TEs, and evaluates their role as a major player in the evolution of plant genomes.

Classifications of TEs

TEs can be grouped into two classes based on their manner of transposition: class 1 elements, or retrotransposons, move in a ‘copy-and-paste’ manner, whereas class 2 elements move in a ‘cut-and-paste’ manner (Finnegan 1989) (Fig. 1). They can also be classified as autonomous or non-autonomous elements, depending on their ability to move. Autonomous TEs can move by themselves because they are equipped with the molecular features required for transposition, whereas non-autonomous TEs cannot move unless the proteins needed for their mobility are provided in trans. Non-autonomous elements are usually derived from autonomous elements through mutations. Genome sequencing projects have revealed many thousands to millions of copies of fossilized transposable elements in diverse eukaryotic genomes (Kapitonov and Jurka 2003; Muotri et al. 2007; Smith et al. 2012).

Fig. 1

Class 1 retrotransposon (top) and class 2 DNA transposon (bottom). Class 1 retrotransposons transpose by a semiconservative copy-and-paste mechanism via an RNA intermediate. Class 2 DNA transposons transpose by a cut-and-paste mechanism via a DNA intermediate. For illustration purposes, the retrotransposon is shown inserting into a nearby chromosomal location; in practice, retrotransposons more often transpose into different chromosomal sites, whereas class 2 DNA transposons tend to transpose into nearby chromosomal sites

A system of classification with appropriate annotations is necessary for the vast array of TEs emerging from newly sequenced genomes in various species, so that they can be assigned to orders, super-families, and families on the basis of their manner of transposition and sequence identity (Wicker et al. 2007). Class 1 retrotransposons have been classified into 5 orders: long terminal repeat elements (LTRs), Dictyostelium intermediate repeat sequence (DIRS), Penelope-like elements (PLEs), long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). DIRS, PLEs, LINEs, and SINEs are referred to as non-LTR retrotransposons because they lack the LTR sequences. While non-LTR retrotransposons are highly abundant in animal genomes, LTR retrotransposons are predominant in plant genomes (Wicker et al. 2007). LTR-retrotransposons carry two genes, gag and pol (Voytas and Boeke 2002). The gag gene encodes capsid proteins that are responsible for packaging the retrotransposon RNA and proteins. The pol gene encodes three ORFs, reverse transcriptase (RT), RNaseH (RH), and integrase (INT), which are responsible for the retrotransposition of the element into new chromosomal locations. Depending on the order of the ORFs in the pol gene, LTR-retrotransposons are further classified into the Gypsy and Copia superfamilies (Kumar and Bennetzen 1999). The order of the ORFs is RT-RH-INT in Gypsy and INT-RT-RH in Copia retrotransposons. Transcription starts in the left LTR and proceeds to the right LTR; the transcript is exported from the nucleus to the cytoplasm, where it serves as a template both for translation and for cDNA synthesis. The first step is translation of the RNA to produce the GAG and POL proteins. An aspartic protease (AP) encoded by the LTR-retrotransposon then cleaves the POL protein into three components: RT, RNaseH, and INT. 
The GAG protein forms virus-like particles (VLPs) into which the RNAs are packed together with the RT, RNaseH, and INT proteins. Within the VLPs, the RNAs are copied into cDNA, from one LTR to the other, and subsequently converted into double-stranded DNA. The double-stranded DNA re-enters the nucleus and integrates into a new chromosomal location (Voytas and Boeke 2002; Levin 2002). Upon retrotransposition, LTR-retrotransposons create a 4–6 bp target site duplication (TSD). Non-LTR retrotransposons are mostly present in metazoans but are scarce in plants. These non-LTR retrotransposons lack both LTRs and integrase; instead, they use the poly-A tail of the mRNA as a primer to start DNA strand synthesis directly at the point of integration (Schulman and Wicker 2013). DIRS retrotransposons do not produce TSDs; PLEs, LINEs and SINEs produce TSDs of variable size at the insertion site (Wicker et al. 2007).
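
The superfamily assignment described above depends only on the order of the three pol domains, so it can be illustrated with a simple lookup. The following Python sketch is a minimal illustration and not an annotation pipeline: the function name `classify_ltr_superfamily` and the table `POL_ORDER_TO_SUPERFAMILY` are hypothetical, and the input is assumed to be a list of domain labels that has already been ordered 5′ to 3′ by some upstream annotation step.

```python
# Hypothetical lookup table: pol domain order (5' to 3') -> superfamily,
# using the two orders stated in the text.
POL_ORDER_TO_SUPERFAMILY = {
    ("RT", "RH", "INT"): "Gypsy",
    ("INT", "RT", "RH"): "Copia",
}


def classify_ltr_superfamily(pol_domains):
    """Return 'Gypsy', 'Copia', or 'unclassified' for an ordered list of pol domains.

    pol_domains is assumed to list the domain labels in 5' to 3' order,
    e.g. ["RT", "RH", "INT"]. Labels are compared case-insensitively.
    """
    key = tuple(label.upper() for label in pol_domains)
    return POL_ORDER_TO_SUPERFAMILY.get(key, "unclassified")


if __name__ == "__main__":
    for order in (["RT", "RH", "INT"], ["INT", "RT", "RH"], ["RT", "INT"]):
        print("-".join(order), "->", classify_ltr_superfamily(order))
```

Any domain order other than the two listed is reported as unclassified rather than guessed, which mirrors the fact that the text defines only these two superfamilies by ORF order.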

Class 2 TEs are often called DNA transposons, and can be divided into two subclasses: subclass 1, terminal inverted repeat (TIR) transposons, and subclass 2, non-TIR transposons (Wicker et al. 2007). TIR transposons have a transposase (TPase) and a TIR of variable size at both ends. The TPase functions both as a DNA binding protein and as an endonuclease (Reznikoff 2003). The TPase of DNA transposons and the INT of LTR-retrotransposons share an aspartic acid (D)-aspartic acid (D)-glutamate (E) motif in the active site. Because this motif is present in both prokaryotes and eukaryotes, the divergence of DNA transposons and LTR-retrotransposons appears to be a very old evolutionary event that predates the divergence of prokaryotes and eukaryotes (Hickman et al. 2010; Schulman and Wicker 2013). Subclass 1 TIR transposons comprise 9 super-families. The size of the TIRs varies between super-families, and all of them produce a TSD a few bases long upon integration. The Crypton super-family also belongs to subclass 1, but Crypton transposons lack TIRs and TPase (Goodwin et al. 2003; Kojima and Jurka 2011). Crypton transposons have 4- or 6-bp direct repeats, and move as circular DNA produced by recombination between the direct repeats, catalyzed by a tyrosine recombinase (YR) that is encoded by the Crypton transposon itself. The circular DNA then reinserts into the target site by recombination, and no TSD is produced (Goodwin et al. 2003).

Two types of transposon have been identified in subclass 2: Helitrons (Du et al. 2009) and Mavericks (Feschotte and Pritham 2005; Kapitonov and Jurka 2006). These two transposons differ markedly in structure and mode of replication from subclass 1 DNA transposons. Helitrons are widespread in plants, metazoans, and fungi. They were the first transposons to be discovered by computational analysis of whole genome sequences. Helitrons transpose by a rolling circle replication mechanism via a single-stranded DNA intermediate, and do not produce a TSD (Li and Dooner 2009; and Pritham 2014). Helitrons frequently capture gene fragments and move them around the genome, which has greatly affected genetic diversity in maize (Lai et al. 2005; Morgante et al. 2005; Yang and Bennetzen 2009). This is discussed in detail in the section on genetic diversity below. Maverick (also called Polinton) transposons are unique in that a set of proteins is necessary for transposition, including DNA polymerase, retroviral integrase, cysteine protease, and ATPase. Maverick transposons have TIRs several hundred nucleotides long with 5′-AG and TC-3′ termini, and produce a 6 bp TSD upon transposition (Kapitonov and Jurka 2006). They are excised from and reinserted into a genome by their encoded integrase.

Genome size variations and TEs

C-value is a quantitative metric that is used to define genome size. It denotes the DNA content of a complete complement of chromosomes in an organism; thus, 1 C represents the amount of DNA in an unreplicated monoploid genome (Greilhuber et al. 2005). The parameter is species specific, and has been subject to strong selection during evolution (Lee and Kim 2014; Canapa et al. 2016). C-values generally correlate well with cellular and organismal genome complexities; the genome sizes of prokaryotes are smaller than those of eukaryotes while lower eukaryotes have smaller genomes than higher eukaryotes. Often however, high genome size variations are present among closely related species and C-values correlate poorly with organismal complexity, suggesting a C-value paradox (Thomas 1971; Gregory 2001, 2002).

The C-value is often stated in picograms (pg) or base pairs (bp), where 1 pg represents approximately 980 Mbp. Among plant taxa, there is a >2000-fold difference between the C-values of Genlisea margaretae (1C = 63.4 Mbp) and Paris japonica (1C = 148,880 Mbp), the smallest and largest members, respectively. In the genus Eleocharis, comprising more than 250 species, the roughly 20-fold difference in C-values between the smallest, E. acicularis (2n = 20, 2C = 0.5 pg), and the largest, E. palustris (2n = 16, 2C = 11.05 pg), is the highest genome size variation among flowering plant species analyzed for both ploidy and C-values (Zedek et al. 2010). C-values have been measured in >6000 plant species and are catalogued in the Kew Plant DNA C-value Database (http://data.kew.org/cvalues) (Bennett and Leitch 2011, 2012). Across the 6287 angiosperms (flowering plants) and 204 gymnosperms measured, the average 1C value was 5.809 Gbp in angiosperms and 18.157 Gbp in gymnosperms. The median values were 2.401 Gbp in angiosperms and 17.506 Gbp in gymnosperms; ~8.38% of the measured species have a C-value between 0.4 and 0.6 Gbp (Civán et al. 2011). Large genomes may have been derived from smaller genomes through one or more mechanisms of genome amplification. However, the rate of DNA loss is greater in small genomes than the rate of DNA gain by TE amplification, as evidenced in species of cotton and its relatives (Hawkins et al. 2006, 2009).
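
The unit conversion and fold differences quoted above can be checked with a few lines of arithmetic. The Python sketch below is only a worked example under stated assumptions: it uses the rounded factor of 980 Mbp per pg given in the text, and the helper names `pg_to_mbp` and `fold_difference` are hypothetical rather than taken from any database or library.

```python
# Rounded conversion factor quoted in the text (assumption: 1 pg ~ 980 Mbp).
MBP_PER_PG = 980.0


def pg_to_mbp(picograms):
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


def fold_difference(larger, smaller):
    """Return how many times larger one genome size is than another.

    Both values must be in the same unit (both pg or both Mbp).
    """
    if smaller <= 0:
        raise ValueError("genome size must be positive")
    return larger / smaller


if __name__ == "__main__":
    # Paris japonica vs Genlisea margaretae, 1C values in Mbp from the text.
    print(round(fold_difference(148_880, 63.4)))    # roughly 2348-fold
    # Eleocharis palustris vs E. acicularis, 2C values in pg from the text.
    print(round(fold_difference(11.05, 0.5), 1))    # roughly 22-fold
    # E. palustris 2C value expressed in Mbp.
    print(round(pg_to_mbp(11.05)))
```

With the values as quoted, the Eleocharis comparison comes out at about 22-fold, which is consistent with the rounded figure of roughly 20-fold given in the text.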

Whole genome duplication (WGD) or polyploidy is a plausible mechanism that can account for variations in genome size. Extant eukaryotes, specifically angiosperms, have undergone multiple episodic WGDs (Bowers et al. 2003; Kellis et al. 2004; Smith et al. 2013); the small genome of Arabidopsis thaliana (130 Mbp) has experienced three WGDs since its divergence from gymnosperms (Bowers et al. 2003). Almost all flowering plants are paleopolyploids or polyploids (Wendel et al. 2016). Although WGD (polyploidization) can alter gene content and increase genome size in a single generation, it is not the only source of genomic variation. For example, the remarkable variations in genome size in Eleocharis species have been attributed to the rapid amplification of copy numbers of a few families of transposable elements in these species (Vicient et al. 1999; Bennetzen et al. 2005; Hawkins et al. 2008; Civán et al. 2011). Table 1 shows the genome size and TE content of a few representative angiosperm species. Many reports are available on the linear correlation between genome size and the content of transposable elements in the genome (Bennetzen et al. 2005; Vitte and Panaud 2005; Tenaillon et al. 2010). Class 1 retrotransposons appear to have had the most impact on genome size variations because of their semiconservative ‘copy-and-paste’ manner of transposition. TE content in the small genome of A. thaliana is about 15%, of which only 4% are retrotransposons (Kaul et al. 2000). Large-genome species, on the other hand, have a high content of TEs, of which a large proportion are class 1 TEs. For example, TEs make up ~85% of the Zea mays genome (1C = 2.3 pg), of which 75% are class 1 LTR-retrotransposons and 8.6% are class 2 TEs (Schnable et al. 2009). Similarly, TEs occupy about 69% of the Secale cereale genome (1C = 8.093 pg), in which class 1 LTR-retrotransposons and class 2 TEs account for 64.3 and 5%, respectively (Bartoš et al. 2008). 
If amplification of LTR retrotransposons is the more plausible mechanism for ‘genome obesity’ (Hawkins et al. 2008; Civán et al. 2011), why are there large discrepancies in genome sizes among closely related species of a few angiosperms? Is a large genome adaptive to the host organism? The answer appears to be no, because a large genome may impose problems in cell biochemistry and physiology for replication and metabolism, as suggested by the ‘large genome constraint’ hypothesis (Knight et al. 2005). In an analysis of over 6000 plant species, genome sizes of angiosperms were found to be skewed toward smaller sizes, with only a few species having extremely large genomes (Civán et al. 2011). This implies that ‘genomic obesity’ through repeated sequences and TEs may have had a negative impact on the survival of lower, simple unicellular organisms. Genome size expansion is counter-balanced by illegitimate recombination between adjacent LTR-retrotransposons (Devos et al. 2002; Hawkins et al. 2009). Illegitimate recombination between the LTRs of different LTR-retrotransposons deletes the sequence between the LTRs and leaves a solo LTR. The abundance of solo LTRs might therefore indicate genome size reduction by illegitimate recombination between LTR-retrotransposons. Indeed, direct comparison between the Arabidopsis (130 Mbp) and rice (430 Mbp) genomes revealed that solo LTRs were more frequent in the small genome of Arabidopsis than in that of rice (Bennetzen et al. 2005).

Table 1 Genome sizes and TE contents in a few plant species

Polyploidy (WGD) and TEs might not act independently on genome size evolution; instead, they interact with each other. Hybridization can reactivate silent TEs in the parental genomes through genome shock (McClintock 1984; Levy 2013). During the episodic and cyclic polyploidization process, massive genome restructuring events occur, placing cells in a different environment from their diploid progenitors (Ågren and Wright 2011; Wendel et al. 2016). Since hyperactivity of TEs reduces host fitness, TEs are suppressed by host epigenetic silencing mechanisms such as methylation (Slotkin and Martienssen 2007). However, the new cellular environment in hybrids can erase methylation in TE sequences and reactivate the silenced TEs (Levy 2013). Genome shock (e.g., hybridization, WGD) is an internal stress to cells that induces TE activation; external stress (e.g., heat, cold, drought, pathogens) can also induce TE activation (Chadha and Sharma 2014; Makarevitch et al. 2015). Sunflower species in the genus Helianthus provide an example of retrotransposon proliferation following hybridization, and subsequent adaptation to a harsh environment (Ungerer et al. 2006). Three hybrid taxa, H. deserticola, H. anomalus and H. paradoxus, are the products of ancient hybridization between the diploid parental taxa H. annuus and H. petiolaris (Rieseberg 1997). The hybrids have adapted to either a desert environment (H. deserticola, H. anomalus) or a salty marshland (H. paradoxus). Interestingly, the hybrids have nuclear genomes that are at least 50% larger than that of either parental species. The genome size differences in the hybrids were attributed to proliferation of Gypsy retrotransposons caused by interspecific hybridization and abiotic stress (Ungerer et al. 2006; Staton et al. 2012). 
Here, not all TEs are activated; instead, a small number of TE families are activated and increase in copy number, as evidenced by the few lineages of TEs that are abundant in large genomes such as barley (Rostoks et al. 2002) and maize (Schnable et al. 2009). Estep et al. (2013) analyzed LTR retrotransposons in five panicoid grass genomes and found that hyperactivity of a small family of LTR retrotransposons doubled the genome size of Zea luxurians, probably within the last few million years. Moreover, in the genus Zea, one of the LTR retrotransposon amplification bursts was initiated by polyploidy, with most of the other TEs not being activated. If LTR retrotransposon bursts were solely responsible for genome expansion, larger genomes should be teeming with younger LTR retrotransposons compared to smaller genomes. Bennetzen et al. (2005) tested this by analyzing the ages of LTR retrotransposons in six plant species; they found that the average age of LTR retrotransposons was 2.5 million years in rice and 2.8 million years in barley, suggesting that the age of LTR retrotransposons is not related to genome size. The barley genome (4800 Mbp) is more than ten times larger than the rice genome (430 Mbp), implying that large genomes are due to amplification bursts of only a small number of families of LTR retrotransposons. This is consistent with the “TE-Thrust” hypothesis, which states that TE bursts can cause extraordinary genetic changes in a short period of time in a small population, driving genetic drift and eventually leading to genome evolution (Oliver and Greene 2009, 2012; Oliver et al. 2013).

Genetic diversity through TEs

Throughout evolution, waves of expansion or contraction of TE copies in the genome have resulted in the genomic variations that characterize eukaryotes. Genetic variations induced by TEs occur at both the genic and chromosomal levels. Genic level variation occurs either when a TE insertion disrupts the coding function of a gene or when TE excision restores its coding function. A classic example of genic mutation through TE insertion is the wrinkled seed trait in Mendel’s peas. Insertion of an Ac/Ds family TE into a starch synthesis gene resulted in an insufficient amount of starch in the developing seed, which becomes wrinkled upon drying (Bhattacharyya et al. 1990). Another well-known example is the insertion of the Hopscotch TE into the enhancer of teosinte branched1 (tb1), which differentiates domesticated corn from its ancestral species teosinte (Studer et al. 2011). TEs cause chromosomal level variation by ectopic pairing between related TE families, which leads to chromosome rearrangements such as deletions, translocations, and duplications (Lönnig and Saedler 2002; Zhang et al. 2011). Chromosomal level disturbances affect many genes, often with consequences that are fatal to the organism. TE insertions into functional genes likewise undermine host adaptability, so that the affected organisms are eliminated from the population. Excluding a few families of TEs, such as Helitrons in maize as explained below, TEs in contemporary genomes are confined to evolutionarily neutral or ‘safe-haven’ chromosomal regions. Through their analysis of >500 eukaryotic species, Serra et al. (2013) postulated that an essentially neutral process governs the evolution of the abundance and diversity of TEs in all analyzed genomes.

Since the pioneering work of McClintock, TEs have been thoroughly characterized in maize. Considering the relatively short evolutionary timespan of modern maize lines, their diversity is extraordinary (Fig. 2). TE insertions and recombination in the common gene space have created high genetic variation among maize lines. Wang and Dooner (2006) reported very high diversity of TE insertions in the bz locus among 8 inbred maize lines that have been used in breeding programs in the USA. In addition to other TE insertions, two of their 8 inbred lines carried Helitrons at different sites, accounting for haplotype diversity in the bz locus (Wang and Dooner 2006). Of the various TEs, Helitrons, which transpose in a rolling circle manner, best explain genetic diversity in maize (Morgante et al. 2005; Yang and Bennetzen 2009). Helitrons often capture different host gene fragments while they transpose, leading to intraspecific variation. Chimeric genes assembled from fragments captured at multiple chromosomal locations have been found inside individual Helitron elements (Lal et al. 2003; Gupta et al. 2005). Morgante et al. (2005) reported that as many as 10,000 genic content polymorphisms have occurred through Helitron insertions in the maize genome. The numbers were even higher in the report by Yang and Bennetzen (2009), who reported a total of 1,930 Helitrons from 8 families and >20,000 fragments, which can account for approximately 2.2% of the maize genome (B73). The prominent burst of Helitron amplification is dated to approximately 250,000 years ago, and elements in one of the eight families are still active in transposition.

Fig. 2

Genetic diversity in 20 different Korean maize landraces. The polymorphisms were detected by the retrotransposon-based SSAP (sequence-specific amplified polymorphism) technique

Genetic diversity driven by TE bursts can confer adaptability on the host in coping with stresses from a labile environment. This is more pronounced in the plant kingdom where, in contrast to the early separation of germ cells from somatic cells in animals, the boundaries between germ cells and somatic cells are unclear. Variations arising from TE activity in plant somatic cells can therefore be inherited by the progeny, thereby conferring greater genetic diversity among populations. If a variation confers host adaptability, progeny carrying it may drift from the main population, form a subpopulation that experiences a bottleneck effect because of its small size, and subsequently diverge from the population to give rise to a subspecies (Belyayev et al. 2010; Jurka et al. 2011; Levy 2013). This view is consistent with the hypothesis of punctuated evolution, which attributes speciation to sudden events rather than a gradual accumulation of small mutations (Gould and Eldredge 1977). Aegilops speltoides is a diploid, cross-pollinated wild grass species indigenous to the Middle East. In this species, Belyayev et al. (2010) demonstrated that TE proliferation promotes or intensifies karyological and morphological changes in marginal populations, some of which may be of potential importance for microevolution and may in turn allow species with plastic genomes to survive as new forms or species during periods of rapid climatic change.

An experimental test of TE-mediated speciation and genomic plasticity is difficult; Jurka et al. (2011) have proposed a theoretical approach to assessing TE-mediated speciation by testing the potential impact of emerging TE families on genetic diversification. Nevertheless, it is an interesting proposal that genome plasticity through TE diversity may have contributed to the formation of new taxa (Oliver and Greene 2009; Oliver et al. 2013). An apt illustration can be found in the comparison of the two groups of spermatophytes, angiosperms and gymnosperms. The number of extant angiosperm species is estimated at >350,000 (Soltis et al. 2008), making them the second largest group after the insects. The woody gymnosperms, on the other hand, comprise <1000 species, despite preceding the angiosperms evolutionarily (Magallón and Sanderson 2005). Both TE composition and TE diversity differ between the genomes of these two groups (Kovach et al. 2010; Nystedt et al. 2013). While the genomes of gymnosperms host a diverse set of TEs with low activity, those of angiosperms have accommodated TEs with much higher activity. The TEs in gymnosperms are ancient, whereas in angiosperms a few families of TEs are younger and have been repeatedly activated. In angiosperms, some lineages of TE families have been repeatedly amplified and others have been purged from the genome, whereas in gymnosperms TEs have accumulated steadily without efficient removal. While differences in TEs may not be the sole determinant of the high fecundity and species richness of angiosperms, it is not unreasonable to conclude that they contribute to the high species variability in plants.

Gene regulation and TEs

We now know that TEs regulate gene expression in many ways. This was first demonstrated by McClintock, who showed that TEs can cause both qualitative and quantitative changes in gene expression, visible as variegation in kernel color, and who accordingly named TEs “controlling elements” (McClintock 1956).

TE insertions into coding genes generate null alleles that often have severe consequences for the host. However, some null alleles caused by TE insertion can persist, as exemplified by the wrinkled pea trait, caused by an Ac/Ds family insertion into a gene for starch synthesis (Bhattacharyya et al. 1990). Another example is waxy foxtail millet (Setaria italica), which was derived from a TE insertion into the waxy locus, the granule-bound starch synthase (GBSS1) gene, which has 14 exons (Kawase et al. 2005). Multiple insertions or excisions of several TEs at various locations in the GBSS1 gene have resulted in foxtail millet types ranging from waxy (no amylose) to low amylose, implying that function of the encoded protein ranges from null to mildly impaired. TE-insertion-induced null phenotypes have also been used for gene discovery by reverse genetics approaches (Piffanelli et al. 2007; Settles et al. 2007). Because Tos17, a Ty1-copia retrotransposon in rice, is activated by tissue culture, Piffanelli et al. (2007) constructed a large library of Tos17-tagged mutants after tissue culture in rice, which was used in forward and reverse genetics strategies for novel gene discovery in rice.

Because most TE insertions into exons are deleterious and are eliminated from the population, TE insertions are often found near genes and in introns, often in regulatory regions (Lisch 2013a, b; Zhao et al. 2015). While TE insertion into an enhancer abolishes gene expression, insertion into a repressor facilitates gene expression (Lisch 2013a). Gene regulation by TE insertion into a regulatory region is often quantitative. Erucic acid is a non-edible fatty acid found in the seed oil of some plants. There are four alleles, E1, E2, E3, and e, of the Fatty Acid Elongation1 (FAE1) gene, which determines the content of erucic acid in seeds of yellow mustard (Javidfar and Cheng 2013). E1 is the wild type and e is a mutant in which Sal-PIF, a PIF/Harbinger-like TE, is inserted into the coding region of the FAE1 gene. In E2 and E3, a copia-type retrotransposon is inserted between the promoter and the transcription start site; the two alleles differ in that the promoter is methylated in E3. Erucic acid content in seeds was 53% in E1, whereas the null mutant e showed no erucic acid. While E2 showed 24% erucic acid, the promoter-methylated E3 showed only 1.4% erucic acid content in seeds (Zheng and Cheng 2014). Another example is lignin content in maize. Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis. Brown midrib1 (bm1) plants have reduced lignin content, which is highly advantageous for use in silage and biofuel. In maize, bm1-cad1 mutant plants produced lignin at levels as low as 24 to 30% compared to the wild type. The bm1-cad1 mutant has a Ds element in the first intron (Chen et al. 2012). Thus, TE insertion into genes or genic regions can attenuate gene expression.

TE insertions can also reprogram gene expression. A classic example of the reprogramming of gene expression by TE insertion is anthocyanin pigment gene expression in maize. Two regulatory genes, myc and myb, are required for pigment synthesis in maize. The myc gene has two alleles, r1 and b1; c1 and pl are alleles of the myb gene (Dooner and Robbins 1991). Alleles r1 and c1 are expressed in seeds while b1 and pl are expressed in plant parts other than seeds. However, two variant alleles of b1, B-Peru and B-Bolivia, are expressed in developing seeds instead of the plant body. Molecular analysis of these variants revealed TE insertions in the b1 locus, with the independent TE insertions causing ectopic expression of pigment synthesis genes (Selinger and Chandler 1999). Variation in grape skin color is also representative of TE insertion into the regulatory region of a gene (Kobayashi et al. 2004). Myb-related genes, VlmybA1-1, Vlbmyb1-2, and Vlbmyb1-3, regulate anthocyanin accumulation in grape fruit skin. A Ty3-gypsy insertion at the 5′-regulatory end of the Vlbmyb1-1 gene abolishes color accumulation, resulting in a colorless grape. LTR–LTR recombination left a solo LTR and partially restored Vlbmyb1-1 gene function, resulting in a pink grape (Lisch 2013a).

Methylated TE insertions in genes often result in reduced expression of neighboring genes. Hollister and Gaut (2009), in their analysis of genomic, epigenomic and population genetic data of A. thaliana, made three observations on the relationship between methylated TEs and neighboring gene expression: (i) gene expression is negatively correlated with the density of methylated TEs, (ii) purifying selection acts on methylated TEs but not on unmethylated TEs, and (iii) older, methylated TEs lie farther from genes than younger TEs. The genome of A. lyrata (207 Mb) carries 2–3 times higher TE copy numbers than the genome of A. thaliana (130 Mb) (Hu et al. 2011). Genome-wide comparison of 24-nt siRNAs of A. thaliana and A. lyrata revealed that siRNA-targeted TEs were associated with reduced gene expression in both species. However, the efficacy of RNA-directed DNA methylation silencing is lower in A. lyrata, suggesting differential TE proliferation between the two congeners (Hollister et al. 2011). So, although the global cost of TE proliferation is unavoidable, there is a trade-off between the benefit of TE silencing and its deleterious effect of reducing the expression of nearby genes.

Molecular exaptation of TE encoding genes

Exaptation is an evolutionary term that was coined to account for, “adaptive features originally built by natural selection for one role, or even non-adaptive features, that have since been co-opted for a new role” (Gould and Vrba 1982). TE encoded proteins mediate transposition of transposable elements, but sometimes they are exapted to perform different functions for host adaptability. These exapted TEs have been called domesticated TEs (Miller et al. 1992), and numerous cases of TE domestication have been reported for the proteins encoded in both class 1 and class 2 TEs (Feschotte and Pritham 2007; Hoen and Bureau 2012; Alzohairy et al. 2013). Table 2 lists several domesticated TEs in plants.

Table 2 A few examples of domesticated transposable elements in plants

TE activation can induce mutations that seriously hamper host fitness; in response, host genomes adopt epigenetic defense systems to control TE activities and mitigate their mutagenic potential (Blumenstiel 2011; Martienssen and Chandler 2013). TEs have two possible fates after transposition into a new site: either they vanish through accumulated mutations or they survive because the host acquires new function(s) (Rouzic et al. 2007; Hoen and Bureau 2012). Upon integration into new chromosomal sites, TEs are epigenetically silenced. The silenced TEs can accumulate mutations (sequence changes or deletions) and eventually can no longer be recognized as related to the ancestral elements. Alternatively, they might acquire a function(s) that enhances host fitness (Miller et al. 1992; Muehlbauer et al. 2006). The latter situation is exemplified by domesticated TEs, which have several features that distinguish them from their ancestral autonomous counterparts. Domesticated TEs are usually present in single or low copy numbers, and are often detectable at orthologous loci in other organisms. In contrast, ancestral TEs are present in multiple copies, and are often found at different chromosomal loci in divergent species. Domesticated TEs have lost their mobility through mutation, while ancestral TEs often retain their mobility (Alzohairy et al. 2013). Although comparison of non-synonymous (Ka) and synonymous (Ks) changes relative to the ancestral TE protein provides indirect evidence for domesticated TE proteins, unequivocally identifying the related ancestral TE protein(s) in the genome often poses challenges, because the ancestral TE(s) is either no longer present or, if present, is fragmented or represented by only a small fraction of its sequence due to mutations (Donoghue et al. 2011; Joly-Lopez et al. 2016). 
Integrative computational analysis of whole-genome sequences has enabled the genome-wide discovery of novel plant genes derived from transposable elements (Hoen and Doug 2015), and reverse genetics approaches can then directly verify their phenotypic functions (Joly-Lopez et al. 2012, 2016). This genome-wide integrative approach revealed that TE exaptation events have generated novel genes far more frequently than previously recognized, and that these events coincide with key evolutionary periods. In this respect, FAR1 (Far-Red Impaired Response 1) and FHY3 (Far-Red Elongated Hypocotyl 3), plant transcription factors derived from Mutator-like transposases, provide a good example from angiosperm evolution (Lin et al. 2007). FAR1 and FHY3 regulate the expression of genes involved in the far-red light response controlled by phytochrome A (Lin and Wang 2004). FAR1 and FHY3 also play roles in diverse plant developmental and physiological processes, including circadian rhythm (Li et al. 2011), chloroplast development (Gao et al. 2013), plant hormone signaling (Tang et al. 2013), and shoot branching and stress response (Stirnberg et al. 2012), all of which are critical adaptive traits in eudicots. Another notable example is MUSTANG (MUG), a domesticated plant gene that is also derived from a Mutator-like element (MULE) (Cowan et al. 2005). In Arabidopsis, the MUG genes have diverse roles in plant development, affecting traits such as chlorophyll production, flowering time, and seed yield (Joly-Lopez et al. 2012; 2015). In angiosperms, the MUG gene diverged into MUGA and MUGB, which are present in basal eudicots but absent in gymnosperms (Cowan et al. 2005; Joly-Lopez et al. 2012), implying that MUG genes might have arisen early in angiosperm evolution, after the gymnosperm-angiosperm divergence. Likewise, the Transib transposase-derived V(D)J recombinase in jawed vertebrates (Kapitonov and Jurka 2005) and the Ty3/Gypsy retrotransposon Gag protein-derived peg10 in placental mammals (Ono et al. 2006) provide examples of the roles of TE-encoded protein exaptation in animal adaptation during evolution. In summation, it would not be unreasonable to contend that molecular exaptation of TEs has been a significant determinant or turning point in the evolution of current forms of flora and fauna.

Concluding remarks

Genomic investigations have enabled researchers to probe the structural and organizational intricacies of an organism’s complete genetic make-up. It has been amply substantiated that genomes host a panoply of diverse genetic elements, of which TEs are the most prominent in terms of abundance and distribution. The notion of mobile elements was not readily accepted at the outset (Fedoroff 2012), and their “raison d’être” earned them such harsh epithets as “parasites”, and “junk” or “selfish” DNA (Doolittle and Sapienza 1980; Orgel and Crick 1980). However, the ubiquity of TEs in prokaryotes and eukaryotes necessitated a revision of their label, and supports the view that they are ancient and predate the divergence of all life forms (Schulman and Wicker 2013). It is now evident that they have been major drivers of evolutionary trajectories (Fedoroff 2012; Fedoroff and Bennetzen 2013). TEs generate genetic variations that are the raw materials for evolution. Without these TE-driven variations, the current taxonomic status of myriad species of flora and fauna would be completely different.

In plants, much of the work on TEs has deliberately centered on domesticated and laboratory-based model species rather than wild species in nature, primarily because of the wealth of relatively accurate and well-characterized trait data for the former group. The reported temporal fluctuation of TE copy numbers in marginal populations of the wild diploid wheat, Aegilops speltoides, may offer insight into the relationship between microevolutionary processes and TE-driven genome plasticity in times of rapid climate change (Belyayev et al. 2010; Belyayev 2014). The industrial melanism mutation in the British peppered moth was recently attributed to a microevolutionary change caused by a TE insertion in a gene involved in cell-cycle regulation (van’t Hof et al. 2016). Investigations utilizing environmental epigenomics are a new focus for further elucidating the role of TEs in the adaptability of unexplored wild species (Bossdorf et al. 2008; Lira-Medeiros et al. 2010; Mirouze and Paszkowski 2011). Although a complete understanding of the ubiquity and abundance of self-replicating TEs has yet to be achieved, it is clear that contemporary biology cannot be credibly and adequately explained without acknowledging the role of transposable elements in evolution.