Introduction

In the late 1940s, Barbara McClintock challenged the existing concepts of genome organization and functioning when she discovered genes prone to mobility (McClintock 1950), which were later called ‘transposable elements’ (TEs). Although the existence of TEs was accepted by the scientific community relatively soon afterwards, the biology and applications of mobile genetic elements took decades to be widely recognized. With the discovery that many of these sequences are able to self-reproduce and to induce mutations, the selfish or parasitic DNA hypothesis was born. It held that these sequences serve no function in the host organism, but are simply maintained by their ability to replicate and spread copies of themselves within and even between genomes (Doolittle and Sapienza 1980; Orgel and Crick 1980). In this ‘selfish’ way, TEs introduce genomic conflict by trying to maximize their own fitness at the expense of the host’s genes (Burt and Trivers 2006; Werren 2011). Although TEs are primarily selfish and have deleterious effects, their activities may occasionally and stochastically confer a fitness advantage on their hosts. Nowadays, with the improvement of molecular tools for genome analysis, including next-generation sequencing technologies, the majority of scientists recognize that mobile elements, even when behaving selfishly, play a significant biological role in the maintenance of genome integrity and the diversification of the genetic repertoire of their hosts. Nevertheless, scientists still underestimate, or do not fully appreciate, the opportunities that the study of TEs offers for resolving important research questions. The aim of this review is to highlight the significance of TEs as enhancers of genome dynamics and evolution, and to bring this research topic to the attention of the wider biological community. First, I will provide a short overview of TEs, their distribution among eukaryotes and their relation to genome size variation. Then, I will emphasize the evolutionary consequences of TEs for genome functioning and integrity through some examples from the plant and animal kingdoms. Finally, I will focus on the practical applications and perspectives of TEs for genome analysis and manipulation.

Structure and abundance of TEs

TEs are DNA sequences that can change their position within the genome through a mechanism called transposition. There are two major classes of TEs, distinguished by their mechanism of transposition (Wicker et al. 2007). Class II elements (or DNA transposons) move through a ‘cut and paste’ mechanism, which comprises excision of a TE copy from one place in the genome followed by its reinsertion into another place. DNA transposons are present in low to moderate copy numbers in almost all eukaryotes. Class I elements (retrotransposons) are transcribed from genomic copies into RNA intermediates, which are reverse-transcribed into double-stranded DNA and integrated at a new position. Several daughter copies can be produced from a mother copy and inserted throughout the genome; this proliferation has made retrotransposons a major fraction of large genomes. Eukaryotic genomes are populated with large fractions of TEs. In plants, the repeat sequences range from 3% of the small genome of Utricularia gibba (Ibarra-Laclette et al. 2013) to more than 85% of the large genomes of cereals and maize (Li et al. 2004; Schnable et al. 2009). About 65% of the human genome consists of TEs (de Koning et al. 2011).
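
The difference between the two transposition modes can be made concrete with a toy sketch. The Python below is illustrative only: it represents a genome as a set of insertion coordinates, and the function names and coordinates are hypothetical rather than taken from any TE annotation tool. Cut and paste leaves the copy number unchanged, whereas the RNA-mediated mode keeps the mother copy and adds every daughter copy.

```python
def cut_and_paste(insertions, source, target):
    """Class II (DNA transposon): excise the copy at `source`, reinsert at `target`."""
    if source not in insertions:
        raise ValueError("no element at the source position")
    moved = set(insertions)
    moved.remove(source)   # excision from the donor site
    moved.add(target)      # reinsertion at the new site
    return moved


def copy_and_paste(insertions, source, targets):
    """Class I (retrotransposon): the mother copy stays, daughter copies are added."""
    if source not in insertions:
        raise ValueError("no element at the source position")
    # Each target stands for one reverse-transcribed daughter copy.
    return set(insertions) | set(targets)


# Hypothetical genome with a single element at coordinate 1000.
genome = {1_000}

after_dna_transposon = cut_and_paste(genome, 1_000, 52_000)
after_retrotransposon = copy_and_paste(genome, 1_000, [7_500, 52_000, 90_000])

print("cut and paste copy number:", len(after_dna_transposon))
print("copy and paste copy number:", len(after_retrotransposon))
```

In this sketch the DNA transposon ends with one copy at a new position, while the retrotransposon ends with four copies, which is the sense in which proliferation of Class I elements inflates large genomes.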

C-value paradox, transposable elements and the concept of ‘junk’ DNA

With the availability of many sequenced genomes (Michael and Jackson 2013) and the recently updated information on whole-genome sequences, it became evident that there is striking variation in genome size in angiosperms, which is often poorly correlated with the number of protein-coding genes or the presumed evolutionary complexity of species (although we still do not understand how to measure an organism’s complexity meaningfully). This observation is the core of the C-value paradox, raised more than 30 years ago, where ‘C’ stands for the haploid genome size (Gall 1981). Genome sizes vary substantially between taxa: more than 7000-fold among animals (Gregory and DeSalle 2005) and 2400-fold across land plants, as evidenced in the recent update of the plant genome size database containing data for 8510 species (Bennett and Leitch 2012; Garcia et al. 2014). A similar trend is observed between closely related species at the same ploidy level and of apparently comparable complexity. A typical example is the plant genus Eleocharis, which comprises more than 250 species, in which the genome of E. acicularis (2n = 20, C = 0.25 pg) is roughly 20 times smaller than that of E. palustris (2n = 16, C = 5.5 pg) (Zedek et al. 2010). This genome size variation results from a substantial fraction of extra DNA, other than protein-coding genes and regulatory sequences, which comprises introns, pseudogenes, satellite sequences and TEs. This noncoding DNA has been considered ‘junk DNA’ that has no biological function and conveys little or no selective advantage to the organism (Orgel and Crick 1980).
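
As a worked illustration of the Eleocharis comparison, the short Python sketch below computes the fold difference from the two quoted C-values and converts them to base pairs. The conversion factor (1 pg of DNA ≈ 978 Mbp) is an assumption brought in for the example and is not stated in this review; the function names are likewise illustrative.

```python
# Assumption (not from this review): 1 pg of DNA corresponds to about 978 Mbp.
PG_TO_MBP = 978.0

# Haploid genome sizes (C-values, in pg) as quoted from Zedek et al. 2010.
c_values_pg = {"E. acicularis": 0.25, "E. palustris": 5.5}


def fold_difference(smaller_pg, larger_pg):
    """Ratio of the larger to the smaller haploid genome size."""
    return larger_pg / smaller_pg


def pg_to_mbp(c_value_pg):
    """Convert a C-value in picograms to megabase pairs."""
    return c_value_pg * PG_TO_MBP


fold = fold_difference(c_values_pg["E. acicularis"], c_values_pg["E. palustris"])
sizes_mbp = {species: pg_to_mbp(c) for species, c in c_values_pg.items()}

print(f"fold difference: {fold:.0f}")
for species, mbp in sizes_mbp.items():
    print(f"{species}: {mbp:.1f} Mbp")
```

With the quoted C-values the ratio comes out at 22, in line with the approximately 20-fold difference cited for these two species.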

The recent discovery that some regions of the noncoding DNA play important roles in genome and cell integrity has challenged the concept of useless ‘junk’ DNA. Noncoding DNA was found to be essential for maintaining chromosome structure, for the function of centromeres, for the binding of transcription factors and for coding RNAs involved in gene silencing. The ENCODE project represents a significant step towards understanding what proportion of the human genome is functional. A function was assigned to 80% of the genome (dominated by RNA transcription, which alone covered 62%), particularly outside of protein-coding genes (ENCODE Project Consortium 2012). ENCODE and other studies have indicated that 5–20% of the human genome is under detectable selective pressure. Interestingly, the genome fraction directly involved in gene regulation was found to be significantly larger (up to 8-fold) than that ascribed to protein-coding exons (1%). Soon after, whether the ENCODE-defined functional elements have a ‘function’ in any meaningful sense of the word was criticized and questioned (Eddy 2012; Doolittle 2013; Graur et al. 2013; Niu and Jiang 2013). These authors argued that DNA may still be ‘junk’ despite the fact that it binds transcription factors or contains regions of modified chromatin. The ENCODE project represents the first genomewide functional annotation of the human genome. However, it does not directly address the ‘junk’ DNA concept, which still remains viable.

TEs, particularly retrotransposons, comprise a major fraction of the ‘junk’ DNA and contribute to a large extent to genome size expansion and variation among species, even as gene numbers remain relatively constant (table 1). The Arabidopsis genome, for example, contains about 27,000 genes and 20 to 25 Mb of retrotransposons, whereas the maize genome contains about 40,000 genes but more than 1800 Mb of retrotransposon sequences (Liu and Bennetzen 2008; Baucom et al. 2009; Schnable et al. 2009). The ‘selfish’ nature of TEs, acting as molecular parasites and functioning for themselves rather than having an adaptive function for their host genome, was postulated by Doolittle and Sapienza (1980) and Orgel and Crick (1980). Since then, TEs have been a subject of tremendous interest because of their abundance, functionality and role in genome evolution. Do TEs fit the concept of useless and nonfunctional ‘junk’ DNA? Definitely not for the whole genome fraction of TEs. First of all, many TEs are functional. Their DNA is biochemically active: it encodes proteins, binds proteins and gives rise to regulatory RNAs, and thus meets the ENCODE criterion for functional elements. Second, many TEs are not ‘junk’, as there are plenty of ways (see below) in which they provide a benefit to their host genomes. There is, however, a substantial fraction of decaying, dead TEs derived from elements that were active in the evolutionary past. It has been suggested that different loads of such TE relics largely explain the C-value paradox, with larger genomes having a larger fraction of them. One should accept that there is a portion of DNA that seems to serve little useful purpose for the organism. However, I would like to point out the inappropriateness of referring to a DNA sequence as ‘junk’ unless we have characterized and completely understood it. DNA that appears useless today may provide a reservoir of sequences from which potentially advantageous new genes can emerge in the future (Burt and Trivers 2006). In this way, it may be an important genetic basis for evolutionary innovation. After all, there might be a small amount of DNA that is true junk. This DNA may just act as a protective buffer against the accumulation of harmful mutations.
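
The contrast between nearly constant gene numbers and strongly divergent retrotransposon content can be checked with simple arithmetic on the Arabidopsis and maize figures quoted in this section. The Python sketch below is only an illustration; the variable names are hypothetical, and the maize value is treated as a lower bound because the text gives ‘more than 1800 Mb’.

```python
# Figures as quoted in the text (gene counts; retrotransposon DNA in Mb).
arabidopsis_genes = 27_000
arabidopsis_retro_mb = (20.0, 25.0)   # quoted range: 20 to 25 Mb
maize_genes = 40_000
maize_retro_mb_min = 1800.0           # "more than 1800 Mb": used as a lower bound

# Maize has only about 1.5 times as many genes as Arabidopsis ...
gene_ratio = maize_genes / arabidopsis_genes

# ... but at least 72 to 90 times as much retrotransposon DNA.
retro_ratio_low = maize_retro_mb_min / arabidopsis_retro_mb[1]
retro_ratio_high = maize_retro_mb_min / arabidopsis_retro_mb[0]

print(f"gene number ratio (maize/Arabidopsis): {gene_ratio:.2f}")
print(f"retrotransposon DNA ratio: at least {retro_ratio_low:.0f} to {retro_ratio_high:.0f}")
```

On these numbers the gene count differs by a factor of about 1.5, whereas the retrotransposon load differs by nearly two orders of magnitude.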

Table 1 TE content (%) in representative flowering plant genomes.

In parallel to the interplay between genome size variation and TE accumulation, a recent study explored the link between TE diversity (types and number of TEs) and genome expansion (Elliot and Gregory 2015). The authors showed that this correlation is positive only up to a certain genome size (around 500 Mbp), beyond which there is either no relationship (in animals) or a negative correlation (in plants). The likely common scenario is that TE diversity and abundance increase as genomes expand up to a moderate size, whereas further genomic growth beyond this point is driven by the proliferation and divergence of a small subset of TE superfamilies. For example, 50% of the barley genome is made up of just 14 families, 12 of which are long terminal repeat (LTR) retrotransposons (Wicker et al. 2009). Also, the retrotransposons BARE-1, Wis and Angela account for more than 10% of the Triticeae genomes. Consistently, differences in BARE-1 abundance primarily explain genome size variation among Hordeum species (Vicient et al. 1999). One point that should be emphasized, however, is that the establishment of a more or less stable equilibrium between genome size and TE proliferation within and among taxa is influenced by several other factors, such as population size, mating system, polyploidization events and ecogeographical distribution. The complexity of the interacting factors means that many more comparative studies are needed before patterns of TE abundance and diversity in relation to taxon and genome size can be described and understood.

Epigenetic silencing drives the ups and downs of transposable elements

Why are TEs, apparently useless and potentially damaging, so widespread in higher organisms? Generally, the abundance and accumulation rates of TEs result from a balance between two main forces: on the one hand, TE transposition, leading to an increase in copy number, and, on the other hand, the elimination and inactivation of TEs mediated by mutations in their sequences or by ectopic recombination between TE copies at nonhomologous loci (Charlesworth and Charlesworth 1983). Recombination events between TEs at nonhomologous chromosome locations generate truncated, inactive elements, thus reducing their functional activity and accumulation rate. This balance is largely achieved by epigenetic silencing of TEs through siRNA-directed DNA methylation (Ito 2012). The mechanism consists of the synthesis of small TE-specific noncoding RNAs that guide DNA methylation and the silencing of homologous DNA sequences at posttranscriptional level (Lister et al. 2008). Differential expression of TEs influenced by siRNA-directed DNA methylation has been observed between different plant genomes (Alzohairy et al. 2014). For instance, differences in TE accumulation and genome size are largely influenced by the extent of TE silencing, as shown in a comparative study of Arabidopsis lyrata and A. thaliana (Hollister et al. 2011). Fedoroff (2012) argued that it is actually the evolution of epigenetic mechanisms that control homology-dependent recombination which drives the accumulation rate of TEs in the long term. Methylation in eukaryotes represents a nuclear defence system that limits the destructive potential of ‘parasitic sequences’, including TEs (Yoder et al. 1997). Consistently, epigenetic control is less efficient in prokaryotes and lower eukaryotes, where ectopic recombination among dispersed TEs would rapidly eliminate them, either directly by deletions or indirectly by creating nonviable chromosomes. As a result, these processes keep the genomes of prokaryotes and many lower eukaryotes small. In contrast, higher eukaryotes have larger genomes owing to more stringent epigenetic control and lower recombination rates, which allow the accumulation of TEs.
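
The balance between transposition and removal described in this section can be illustrated with a deliberately simple toy model. It is an assumption of this sketch, not a result from the cited studies, that transposition adds copies in proportion to the copy number n (rate u per copy per generation) while ectopic recombination removes them in proportion to n squared (coefficient k); all parameter values are arbitrary.

```python
def simulate_copy_number(n0, transposition_rate, removal_coeff, generations):
    """Iterate n -> n + u*n - k*n**2 and return the copy-number trajectory."""
    n = float(n0)
    trajectory = [n]
    for _ in range(generations):
        gain = transposition_rate * n      # new copies from transposition
        loss = removal_coeff * n * n       # copies lost to ectopic recombination
        n = max(n + gain - loss, 0.0)
        trajectory.append(n)
    return trajectory


def equilibrium_copy_number(transposition_rate, removal_coeff):
    """Copy number at which gain and loss cancel in the toy model: n* = u / k."""
    return transposition_rate / removal_coeff


# Arbitrary illustrative parameters (hypothetical, not estimated from data).
u = 0.01
k_frequent_recombination = 1e-5
k_rare_recombination = 1e-6

trajectory = simulate_copy_number(10, u, k_frequent_recombination, 5000)

print("final copy number:", round(trajectory[-1]))
print("equilibrium, frequent ectopic recombination:",
      round(equilibrium_copy_number(u, k_frequent_recombination)))
print("equilibrium, rare ectopic recombination:",
      round(equilibrium_copy_number(u, k_rare_recombination)))
```

In this toy setting the copy number settles near u/k, so a tenfold lower rate of ectopic recombination allows a tenfold larger TE load, which mirrors the qualitative argument made for higher eukaryotes.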

Transposable elements are capacitors of genome dynamics

The high frequency of TEs, their capacity to change location within the host genome and their ability to induce chromosome aberrations (e.g. deletions, duplications and insertions) can obviously have a negative impact on genome integrity. Transposable elements can affect genome dynamics, with possible effects on phenotypes, through multiple mechanisms, depending on the TE itself and its insertion site. Some examples of the effects of TE insertions on gene structure and function are shown in figure 1.

Figure 1

Effects of TE insertions on gene structure and function. (a) Insertion of an LTR retrotransposon upstream of the Ruby gene in the blood orange provides a new promoter controlling the expression of a gene for flesh colouration of the fruit (Butelli et al. 2012). (b) Insertion of a TE into the rugosa locus encoding a starch-branching enzyme in pea results in a wrinkled seed (Bhattacharyya et al. 1990). (c, d) Insertion of a TE into a gene that produces anthocyanin pigments leads to its inactivation and changes in colour: yellow seeds in maize (c) and white sectors in Petunia hybrida (d) (Kroon et al. 1994). Revertant sectors (dark-spotted seeds or blue stripes) appear when the transposon spontaneously excises from that gene in certain cells and pigment production is restored. (e) siRNA-controlled methylation of a TE insertion influences the expression of the FLC locus (a gene delaying flowering) and leads to earlier flowering in A. thaliana (Liu et al. 2004). (f) Alterations in the transcription levels of genes by antisense transcription from adjacent TE insertions, as observed for the agouti colour gene in mice (Morgan et al. 1999).

Transposable elements-mediated alterations in the structure and function of genes

The most obvious TE-induced change is gene disruption leading to an observable loss of function. A classical example is the mutations in the gene controlling the colour of maize kernels that are caused by the Ac element, detected by Barbara McClintock (figure 1c). Similarly, the wrinkled seed phenotype studied by Mendel in pea was found to result from the insertion of a TE into a locus encoding a starch-branching enzyme (Bhattacharyya et al. 1990). Beyond plant genomes, ∼0.3% of all human mutations are caused by TE insertions or rearrangements (Cordaux and Batzer 2009). The currently active nonLTR retrotransposons L1 (LINE 1), SVA and Alu (SINE) are reported to be causative factors in many genetic disorders, such as haemophilia A, Apert syndrome, familial hypercholesterolaemia, colon and breast cancer, and muscular dystrophy (for review see O’Donnell and Burns 2010; Ayarpadikannan and Kim 2014). Alu elements (named for the restriction enzyme used to identify them) are short sequences (∼300 bp) that occur about a million times in the human genome and comprise roughly 10% of the total DNA. Transposition of Alu elements to sites in and near genes, or Alu-mediated ectopic recombination events, can occasionally have deleterious effects on genes. Indeed, many cloned genes have been shown to harbour Alu elements in their sequences. On the other hand, Alu sequences are likely to have positive regulatory functions, as mutations within them have been associated with cancer. Transposable elements can also disrupt existing regulatory motifs (repressors or enhancers) or provide new regulatory information, thereby influencing gene expression and causing mutant phenotypes. Several examples of such effects of TEs have been reported in Drosophila (Lerman et al. 2003; Lerman and Feder 2005; McCart and Ffrench-Constant 2008) and plants (Salvi et al. 2007; Studer et al. 2011; Butelli et al. 2012).

Transposable elements affect the epigenetic regulation of genes

A substantial proportion of TEs are targeted by siRNA-directed DNA methylation, which represses their activity. This epigenetic silencing can spread to genes located in the vicinity of TE insertions and may generate stable epialleles of potential evolutionary relevance (Slotkin and Martienssen 2007; Wang et al. 2013). For instance, the insertion of an LTR retrotransposon near the agouti gene in mice alters the chromatin state and DNA methylation at this locus (Morgan et al. 1999). Variations in the epigenetic status of the retrotransposon ultimately influence the gene transcript level and the colour of the mouse coat (figure 1f). Similarly, the early flowering of the Ler ecotype of A. thaliana is controlled by a DNA transposon insertion in FLC, a gene responsible for delayed flowering. This insertion is targeted by TE-derived siRNAs, which results in the epigenetic silencing of the gene (Liu et al. 2004).

Transposable elements can mediate genome restructuring

Recombination between TE copies at nonhomologous sites in the genome (ectopic recombination), either on the same or on different chromosomes, is an important mechanism by which TEs mediate genome restructuring (deletions, insertions, inversions, translocations and duplications), thus promoting chromosomal instability (Gray 2000). In addition, TEs can enhance genome reshuffling by capturing and transferring genes within genomes (Jiang et al. 2004; Morgante et al. 2005; Schaack et al. 2010).

The number and variety of mutations induced by TEs are extraordinary and can hardly be surveyed in full. One can still argue that TEs have a predominantly negative impact and increase the genetic load, and so the question arises: do TEs, through the induction of mutations or through regulatory functions, provide some benefit to the organism itself, or are they purely ‘selfish’ and detrimental DNA?

TEs have important biological functions

During the last two decades, a major focus has been on the positive contribution of TEs to the evolution of gene regulation and the diversification of host genes. The major breakthrough in this area came with the advent of high-throughput sequencing technologies and software platforms for the annotation of TEs, which have tremendously enriched our knowledge of the biology of TEs and their interaction with other components of the host genome. One way in which TEs contribute to evolution is that their sequences (e.g. genes, binding sites and terminal repeats) can be coopted to perform functions beneficial to the host genome (Sinzelle et al. 2009). This genomic coopting of a molecular parasite is often referred to as molecular domestication or exaptation (Gould and Vrba 1982; Miller et al. 1999). Here I present examples of the different ways in which TEs have evolved from strictly parasitic elements into mutualistic sequences that benefit their host genomes (see also table 2).

Table 2 Examples of adaptive mutations and exaptations provided by transposable elements.

Transposable elements as a source of novel regulatory networks

The human genome provides the majority of examples of TE domestication, a process that has helped to spur remarkable evolutionary innovations. The initial analysis of the human genome revealed that ∼25% of human promoter regions and ∼4% of human exons contain sequences derived from TEs (Lander et al. 2001; Nekrutenko and Li 2001; Jordan et al. 2003; Kapitonov and Jurka 2005; Jurka et al. 2007).

V(D)J recombination is a unique mechanism of genetic recombination that occurs only in developing lymphocytes during the early stages of T and B cell maturation. The process results in a highly diverse repertoire of antigen receptors in these cells and is a distinguishing feature of the adaptive immune system in vertebrates. V(D)J recombination is mediated by the genes RAG1 and RAG2, which are evolutionarily derived from ancient insertions of Transib DNA transposons (Zhou et al. 2004; Ramsden et al. 2010). As expected from our knowledge of class II TE transposition, this process involves the generation of double-strand DNA breaks in a way that is mechanistically similar to the ‘cut’ component of ‘cut and paste’ transposition.

Repeat-induced gene silencing involving L1 retroelements has been hypothesized to underlie X-chromosome inactivation, which is necessary to maintain the proper gene dosage in females. Inactivation is initiated at the X-chromosome inactivation centre (XIC), from which the silencing signal spreads along the chromosome. According to one hypothesis, LINE retrotransposons might trigger heterochromatinization at the XIC and boost the efficient spread of silencing away from this centre (reviewed in Lyon 2006). In support of this idea, the X chromosomes of many mammals, including humans, are rich in LINE elements, except in regions that are prone to escaping X inactivation (Ross et al. 2005). It is still unknown whether LINEs function in the spread of heterochromatin on the X chromosome or whether their enrichment is simply a consequence of the heterochromatic nature of the inactive X. However, Cohen et al. (2007) have shown that short tandem repeats homologous to retrotransposons regulate X-chromosome inactivation by producing bidirectional transcripts in differentiating mouse cells, thus providing indirect evidence that TEs function in both the initiation and spread of X inactivation.

Transposable elements are also implicated in the proper functioning of the mammalian embryo at the earliest stages of its development (Macfarlan et al. 2012; Tomkins 2013). Along the same lines, it was recently reported that a family of TEs in mammals provides enhancer sequences that modulate gene expression in placental cells, thus regulating the interaction between the mother and the offspring (Chuong et al. 2013). In humans, the highly conserved centromere-binding protein CENP-B facilitates centromere formation and is derived from transposases of the pogo DNA transposon family (Casola et al. 2008).

Several examples of TE exaptation are available in the plant kingdom as well. In Arabidopsis, FHY3 and FAR1 are transcription factors related to transposases of the MuDR family; they bind to promoter regions and activate several genes involved in far-red light and circadian clock signalling (Hudson et al. 2003; Lin et al. 2007).

A recent study in Arabidopsis has shown that the COPIA-R7 transposon, inserted into the plant disease resistance gene RPP7, enhances its host’s immunity to a pathogenic microorganism from a large group of fungus-like parasites that cause a number of plant diseases (Tsuchiya and Eulgem 2013). The Rim2 gene, implicated in defence against fungal infection, appears to have been directly exapted from part of a CACTA DNA transposon (He et al. 2000). The rice blast disease resistance gene Pit was refunctionalized by the recruitment of a copia-like LTR element as a promoter (Hayashi and Yoshida 2009).

Relatively few data are available on the direct role of TEs in species domestication. An insertion of the TE Hopscotch in the regulatory region of the maize domestication gene teosinte branched1 (tb1) acts as an enhancer of its expression and confers the increased apical dominance of maize compared to its progenitor, teosinte (Studer et al. 2011). Insertion of a CACTA-like transposon into the promoter of the gene ZmCCT, which modulates photoperiod sensitivity, can suppress its expression, thus facilitating the spread of maize to long-day temperate regions (Yang et al. 2013). The Mustang and Sleeper gene families, present in flowering plants, have sequences derived from exapted transposases of Mutator-like and hAT DNA elements, respectively (Joly-Lopez et al. 2012; Knip et al. 2012). Mustang genes are present only in the angiosperm lineage and encode putative transcriptional regulators that play important roles in growth, flower development and reproduction.

Transposable elements maintain chromosome stability and functioning

The coexistence of TEs and the host genome has given rise to several regulatory pathways that combine various epigenetic mechanisms, i.e. DNA methylation, small RNAs and histone posttranslational modifications. In A. thaliana, methylated TEs promote epigenetic silencing and the formation of heterochromatin (Fagegaltier et al. 2009). TEs are abundant in heterochromatin-rich regions, centromeres (Casola et al. 2008; Mateo and González 2014) and telomeres (Levin and Moran 2011; Pardue and DeBaryshe 2011). Such patterns might be the consequence of: (i) insertional preferences of TEs for heterochromatin; (ii) positive selection for the maintenance of TEs in heterochromatin for genomic stability; or (iii) induction of heterochromatin by TE sequences. An example of a host using the movement of a retrotransposon to its advantage is telomere maintenance in Drosophila. Two retrotransposons, HeT-A and TART, are present in multiple copies at the telomeres and maintain their length in replicating cells (Abad et al. 2004; Shpiz et al. 2007). A link between TE activity and the maintenance of heterochromatin integrity was also demonstrated by the observation that an increase in TE expression and transposition leads to an age-related breakdown in heterochromatin structure and subsequent cellular dysfunction (Wood and Helfand 2013). St Laurent et al. (2010) argued that the stress-induced activation of LINE/L1 elements, particularly prevalent in mammals, leads to mutagenic insertions and DNA damage that accumulate with age and can cause genomic instability even in the absence of a successful transposition event. This suggests that LINE elements may play an important role in mammalian ageing and evolution. In support of these data, a study in budding yeast using a chronological ageing model found that the retrotransposon Ty1 showed increased mobility with age, and that this was correlated with chromosome rearrangements and other hallmarks of genomic instability (Maxwell et al. 2011).

Matrix attachment regions (MARs) are DNA sequences that bind to the nuclear matrix, forming functionally independent chromatin domains. Colocalization of TEs with MARs located in introns and 5′ flanking regions of genes has been observed, which suggests a putative role of mobile elements and/or their products (transposases) in the formation of these structures (Avramova et al. 1998; Tikhonov et al. 2000; Pathak et al. 2014). For instance, the insulator protein BEAF-32 in Drosophila is entirely derived from a hAT transposase and is involved in chromatin functioning through interactions with the nuclear matrix (Pathak et al. 2007). A cross talk between the distribution and genome organization of BEAF-32 contributes to new gene-expression profiles and distinct phenotypes with a putative role in evolution. Nevertheless, the functional link between MARs and retrotransposons remains to be comprehensively investigated.

Although several domesticated TEs have been reported so far, their total number may be much higher and many more may await discovery in the near future. Traditional genetic methods may be insufficient to find such TEs and to assess their evolutionary and functional impact, e.g. owing to functional redundancy. Systematic searches that exploit genomic signatures of natural selection have been employed to identify potential domesticated genes, but their predictions have yet to be experimentally verified.

TEs as capacitors of species adaptation to changing environments

Most TEs are usually in a transpositionally silent state but can occasionally be activated in response to different environmental stimuli (Grandbastien 1998), a phenomenon that Barbara McClintock called ‘genome shock’ (McClintock 1984). The major question is whether TE activity is just an undesired side effect of stress exposure or whether TE-induced genetic diversity contributes to microevolutionary processes such as rapid adaptive evolution and speciation in natural populations. A prevalent view among evolutionary biologists is that the vast majority of TE insertions are nearly neutral and unlikely to have a strong evolutionary impact. Host forces do indeed select against TE insertions (owing to the deleterious impact of the insertions themselves or of ectopic recombination between them) and efficiently purge deleterious insertions. However, selection on TE accumulation at the population level is affected by a complex of factors and the outcome is often difficult to predict. Nevertheless, several reports provide convincing evidence of a turnover of TEs that closely matches the ecogeographical distribution of gene pools. For instance, an increase in full-length copies of the retrotransposon BARE-1 was observed in barley populations in dry environments compared to those growing a few dozen metres away in less stressful habitats (Kalendar et al. 2000). A good example of the capacity of TEs to elicit mutational consequences potentially helpful in adapting to new environments is provided by three diploid sunflower species, Helianthus anomalus, H. deserticola and H. paradoxus. These three hybrid taxa, independently derived through hybridization events between the two parental taxa H. annuus and H. petiolaris, underwent a rapid, retrotransposon-mediated genome expansion, and all of them occupy habitats considered abiotically extreme relative to those of either parental species (Ungerer et al. 2006; Kawakami et al. 2010). Another example of a TE involved in adaptation is the early flowering of the Ler ecotype of A. thaliana, controlled by a DNA transposon insertion (Liu et al. 2004). In addition, Gonzalez et al. (2010) have shown that at least 32% of the putatively adaptive insertions screened in natural populations of D. melanogaster have a distribution consistent with selection by contrasting ecogeographical conditions. Most of these TE insertions, belonging to multiple TE families, were linked to functional genes with distinct phenotypic changes.

Insights offered by such reverse population genomic approaches pinpoint the importance of TEs as a source of adaptive variation. However, such surveys should be undertaken in a wider range of species to reliably estimate the impact of TEs on their evolutionary ecology.

Transposable elements are reliable DNA molecular markers

Their high copy number, chromosome coverage and variable arrangement, even among closely related species, give TEs an advantage as informative markers for assessing natural and stress-induced genetic diversity and for enhancing marker-assisted selection (MAS) in plant breeding programmes. Both DNA transposons and retrotransposons can be utilized for the generation of markers, the latter being much more efficient. Retrotransposons have been found to comprise the most common class of TEs in eukaryotes, and represent up to 90% of plant genomes (Feschotte et al. 2002). They constitute >50% of the maize and cereal genomes (Meyers et al. 2001; Brenchley et al. 2012) and 14% of the Arabidopsis genome (The Arabidopsis Genome Initiative 2000). Moreover, the presence of conserved domains at both ends (LTRs) can easily be exploited for the design of PCR primers (Kalendar et al. 2011).

Major retrotransposon-based markers and type of inheritance

Retrotransposon-based molecular analysis relies on PCR amplification of DNA sequences (markers) using a primer corresponding to a retrotransposon and a primer matching a section of the neighbouring genome (e.g. a microsatellite, a restriction site or another TE copy). Among the wide diversity of retrotransposon-based techniques, the four most frequently used as tools for diversity studies are sequence-specific amplified polymorphism (SSAP) (Waugh et al. 1997), interretrotransposon amplified polymorphism (IRAP) (Kalendar et al. 1999), retrotransposon-microsatellite amplified polymorphism (REMAP) (Kalendar et al. 1999) and retrotransposon-based insertion polymorphism (RBIP) (Flavell et al. 1998). A schematic representation of the main marker methods is presented in figure 2. Several molecular marker systems based on the sequence information available for transposable elements have been developed to measure diversity, similarity and cladistic relationships in plants: barley (Kalendar et al. 2000; Vicient et al. 2001), citrus (Breto et al. 2001), the genus Malus (Antonius-Klemola et al. 2006), rice (Branco et al. 2007), flax (Melnikova et al. 2014), pea (Ellis et al. 1998; Pearce et al. 2000; Smýkal 2006), wheat (Queen et al. 2004; Melnikova et al. 2011), pear (Kim et al. 2012) and others. Transposable elements find substantial application as genetic markers in the animal kingdom as well. For example, the transposon display method has recently been used to study the effect of interspecies hybridization on TE dynamics in Drosophila (Vela et al. 2014). In addition, mobile elements are active and provide polymorphism in human populations (Mills et al. 2007). Consistently, SINE (Alu) and LINE (L1) elements have been used to trace human roots to Africa (Witherspoon et al. 2006). Also, L1 insertion polymorphisms have potential use in forensic analyses (Ray et al. 2007).
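
Marker profiles of this kind are typically scored as presence (1) or absence (0) of each amplified band, and diversity or similarity is then computed from the resulting binary matrix. The Python sketch below is a minimal illustration under stated assumptions: the band scores and accession names are invented, and the Jaccard coefficient is only one commonly used similarity measure for presence/absence data, not necessarily the one applied in the studies cited above.

```python
def jaccard_similarity(profile_a, profile_b):
    """Shared bands divided by bands present in at least one of two profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # If neither accession shows any band, similarity is undefined; report 0.0.
    return shared / either if either else 0.0


def polymorphic_fraction(profiles):
    """Fraction of band positions that differ among the scored accessions."""
    n_bands = len(profiles[0])
    polymorphic = sum(
        1 for i in range(n_bands) if len({profile[i] for profile in profiles}) > 1
    )
    return polymorphic / n_bands


# Hypothetical band scores (1 = band present, 0 = band absent) for five bands.
accessions = {
    "accession_A": [1, 1, 0, 1, 0],
    "accession_B": [1, 0, 0, 1, 1],
    "accession_C": [1, 1, 0, 1, 0],
}

sim_ab = jaccard_similarity(accessions["accession_A"], accessions["accession_B"])
sim_ac = jaccard_similarity(accessions["accession_A"], accessions["accession_C"])
poly = polymorphic_fraction(list(accessions.values()))

print("Jaccard A vs B:", sim_ab)
print("Jaccard A vs C:", sim_ac)
print("polymorphic band fraction:", poly)
```

Jointly absent bands are ignored by this coefficient, which is one reason it is often preferred for markers where absence of a band can have several causes.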

Figure 2

Main transposon-based marker methods that track polymorphism in the insertion pattern of TEs. The techniques rely on PCR amplification of the sequence between a copy of a candidate TE and an adjacent genomic feature, which can be another TE copy (inter-retrotransposon amplified polymorphism, IRAP), a microsatellite locus (retrotransposon-microsatellite amplified polymorphism, REMAP) or a restriction site (sequence-specific amplified polymorphism, SSAP). Red arrowheads indicate the primers for PCR amplification.

RBIP is the only retrotransposon-based method that targets polymorphism in the integration of an element at a single-copy locus. It can distinguish between the heterozygous and homozygous states of the insertion and therefore shows codominant inheritance. The other marker techniques generally behave as dominant markers (i.e. they score the presence or absence of a TE insertion) and do not allow one to discriminate the allelic state of the locus. It may be possible to map two polymorphisms to the same TE integration site, but this is very tedious in practice and cannot be determined directly. The difficulty comes from the complexity of band profiles in multitarget PCR reactions and the insufficient resolution of the commonly used agarose gel and polyacrylamide gel electrophoresis (PAGE) methods of amplicon separation. The development of a high-throughput marker method for genetic studies, the fluorescent SSAP system, increased the number of scorable amplicons and the accuracy of their scoring, which makes it possible to discriminate alleles and to identify heterozygous loci at single-nucleotide resolution (Knox et al. 2009). Another way to overcome the dominant nature of the other marker systems is to use genetically homozygous material. For mapping populations, this can be achieved using doubled-haploid lines, recombinant inbred lines or haploid tissues such as the megagametophyte (often called endosperm) of gymnosperms. The efficacy of doubled-haploid populations for the mapping of retrotransposon markers and genes has been well established (Waugh et al. 1997; Manninen et al. 2000).

Advantages of retrotransposon-based markers

Several properties of TEs give them an advantage over other DNA molecular markers.

Transposable elements are prone to activation by stress

As mentioned earlier in this review, TEs are prone to activation by different stress factors. Besides the few examples of TE dynamics at the population level, a plethora of studies have demonstrated stress-induced escape from the silent state for many TE families under controlled experimental conditions. In plants, these activating events include artificial interspecific hybridization and allopolyploidization (for review see Parisod et al. 2009; Yaakov and Kashkush 2011), infection (Melayah et al. 2001; Grandbastien et al. 2005), abiotic stresses such as drought (Aprile et al. 2009) and high and low temperatures (Ivashuta et al. 2002; Young et al. 2005), as well as protoplast isolation, cell culture, wounding, methyl jasmonate, CuCl2 and salicylic acid (Hirochika 1993; Moreau-Mhiri et al. 1996; Mhiri et al. 1997; Takeda et al. 1998). The activation of wheat retrotransposons under light and salinity stresses has also been reported (Woodrow et al. 2010). The stress response may differ between host genotypes, possibly reflecting an adaptive response of ancestral populations to different stimuli. Taken together, these and other studies show that TEs can be used as reliable DNA markers to assess the genome response to stress factors at both the individual and population levels.

Although the exact process of transcriptional induction is not completely elucidated, it has been shown that TEs are likely to be activated by mechanisms similar to those employed by host defense genes. Indeed, the promoter sequences of TEs and host defense genes share nucleotide similarities and are likely to bind similar transcription factors (Casacuberta and Santiago 2003). For instance, Tnt1 retrotransposons from tobacco carry regulatory regions with specific motifs that are commonly observed in genes induced by drought, anaerobic conditions or oxidative stress (Grandbastien et al. 2005). Transposable elements could have captured promoters from normal stress-inducible genes or, conversely, could have provided their own stress-inducible promoters to some plant defense genes (Grandbastien et al. 1997; Takeda et al. 1999).

Transposable elements display a high level of insertional polymorphism

DNA marker techniques based on TEs are anonymous, producing fingerprints from multiple sites of retrotransposon insertion in the genome (Schulman et al. 2004). The outcome is a high degree of heterogeneity and insertional polymorphism, observed both within and between species. The genomic regions screened for TE polymorphism can nevertheless be narrowed by taking into account the biology and insertional pattern of the TE family used. For instance, high copy number families such as the active LTR retrotransposon BARE-1 tend to form clustered (i.e. adjacent insertions) and nested (i.e. insertions within one another) arrangements in the intergenic regions of large genomes. In contrast, insertions of short TEs such as Ac/Ds, MITEs, LINEs and SINEs, but also of low copy number LTR retrotransposons, seem to be overrepresented near or within genes in plants and in humans (i.e. mostly in introns, 5′ or 3′ UTRs and flanking regions; Wright et al. 2003; Majewski and Ott 2002). In addition, retrotransposon markers make it possible to detect large genome changes and seem to be more informative as a complement to conventional DNA markers such as amplified fragment length polymorphism (AFLP), microsatellites (SSR) and single-nucleotide polymorphism (SNP) (Waugh et al. 1997; Ellis et al. 1998; Yu and Wise 2000; Porceddu et al. 2002; Tam et al. 2005), which mainly detect single nucleotide changes.
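
As a rough illustration of how such anonymous, dominant fingerprints are typically summarized, the following Python sketch scores each band as present (1) or absent (0) and derives two simple statistics: the fraction of polymorphic bands and the pairwise Jaccard similarity between profiles. The accession names and band scores are invented for illustration, and the choice of the Jaccard coefficient is an assumption rather than a method prescribed by the studies cited above.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) scores for five bands in three
# accessions. All values are invented for illustration only.
band_scores = {
    "accession_A": [1, 1, 0, 1, 0],
    "accession_B": [1, 0, 0, 1, 1],
    "accession_C": [1, 1, 1, 0, 0],
}


def polymorphic_fraction(scores):
    """Fraction of bands present in some, but not all, accessions."""
    columns = list(zip(*scores.values()))
    polymorphic = [col for col in columns if 0 < sum(col) < len(col)]
    return len(polymorphic) / len(columns)


def jaccard_similarity(a, b):
    """Bands shared by both profiles divided by bands present in either."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def pairwise_similarity(scores):
    """Jaccard similarity for every pair of accessions."""
    return {
        (n1, n2): jaccard_similarity(scores[n1], scores[n2])
        for n1, n2 in combinations(scores, 2)
    }


if __name__ == "__main__":
    print(polymorphic_fraction(band_scores))
    for pair, value in pairwise_similarity(band_scores).items():
        print(pair, round(value, 2))
```

Because the markers are dominant, a shared absence carries no information about the allelic state of a locus, which is one reason a coefficient that ignores joint absences is used in this sketch.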

Retrotransposons are homoplasy-free phylogenetic markers

The sites of transposon insertion are thought to be more or less random. Thus, independent TE insertions at exactly the same location in homeologous chromosomes within and among individuals appear unlikely. Retrotransposon integrations are also assumed to be irreversible events unless the chromosomal segment containing the repeat is deleted. An insertion at a specific genomic location therefore represents a derived character state, and species that share an insertion at a particular locus are grouped together on the tree. The lack of an insertion at an orthologous locus is regarded as the ancestral state, and the corresponding individuals are considered basal in the phylogenetic tree. Thus, as opposed to reversible changes in DNA sequence composition, retrotransposon insertions have been claimed to generate homoplasy-free phylogenetic markers that provide an extremely accurate picture of evolutionary relationships, and they have proven very successful in elucidating problematic phylogenies. For example, SINE elements have been evaluated as reliable cladistic markers for resolving phylogenetic relationships among human and nonhuman primates, as well as tools for the clinical diagnostics of diseases and for forensic identification (for review see Schmitz et al. 2005; Ray 2007; Konkel et al. 2010; Ray and Batzer 2011).
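
The grouping logic described above can be sketched in a few lines of Python. Each insertion shared by two or more taxa is treated as a derived character that supports a group, and two insertions can sit on the same tree only if the sets of taxa carrying them are nested or disjoint. A dataset that is truly homoplasy-free should therefore show no conflicting pairs. The loci and taxon assignments below are hypothetical, and the function names are illustrative rather than taken from any published tool.

```python
from itertools import combinations

# Hypothetical data: locus -> set of taxa carrying the insertion.
# Absence is treated as the ancestral state, presence as the derived state.
insertions = {
    "locus_1": {"human", "chimpanzee", "gorilla"},
    "locus_2": {"human", "chimpanzee"},
    "locus_3": {"macaque", "baboon"},
}


def supported_clades(data):
    """Each insertion shared by two or more taxa supports grouping them."""
    return {
        locus: frozenset(taxa)
        for locus, taxa in data.items()
        if len(taxa) >= 2
    }


def are_compatible(a, b):
    """Two derived-character sets fit one tree if nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicting_loci(data):
    """Pairs of loci whose taxon sets overlap without being nested."""
    clades = supported_clades(data)
    return [
        (l1, l2)
        for (l1, c1), (l2, c2) in combinations(clades.items(), 2)
        if not are_compatible(c1, c2)
    ]


if __name__ == "__main__":
    print(conflicting_loci(insertions))
    # Adding an insertion shared only by human and gorilla would conflict
    # with locus_2 and would point to homoplasy or incomplete lineage sorting.
    with_conflict = dict(insertions, locus_4={"human", "gorilla"})
    print(conflicting_loci(with_conflict))
```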

Transposable elements as a tool for gene manipulation

Insertional mutagenesis

As discussed earlier, TEs can change their genomic location upon activation. If a TE inserts into the coding or regulatory sequence of a gene, the disruption can lead to a loss of gene function. This phenomenon provides the basis for the technique called transposon mutagenesis, which has been used to induce mutations and to identify and study the function of the affected gene. The standard mutagenesis platform consists of crossing genetic lines carrying inactive or nonautonomous TEs with lines containing an active (autonomous) element. In offspring carrying the autonomous transposon, the transposase it encodes can mobilize the nonautonomous transposon. The reactivated transposon can then insert randomly into new genomic sites, causing mutations in the offspring. Hundreds or thousands of individuals can then be screened for a new mutant phenotype of interest. If such a phenotype is found, it can be assumed that a TE insertion has inactivated the gene responsible for this phenotype. Because the sequence of the TE is known, the gene can be easily identified either by sequencing the whole genome and searching for the TE sequence or by using PCR to specifically amplify that gene. Several examples are available of genes successfully tagged using TEs in species such as tobacco (Fitzmaurice et al. 1999), maize (Howard et al. 2014) and tomato (van der Biezen et al. 1996).
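
The identification step can be illustrated with a deliberately simplified Python sketch: because the TE sequence is known, a sequenced fragment that spans the TE end can be searched for that sequence, and the adjacent genomic bases can then be matched against candidate genes. All sequences, gene names and the flank length below are invented for illustration. Real analyses would rely on sequence alignment, both strands and a full reference genome rather than exact substring matching.

```python
# Hypothetical sequences, invented for illustration only.
TE_END = "CAGGGATGAAA"  # assumed terminal sequence of the transposon
reference_genes = {
    "gene_X": "ATGGCTTACCGTAAGCTTGGATCCA",
    "gene_Y": "ATGCCCGGGTTTAAACGCGTACGTA",
}


def flanking_sequence(read, te_end, flank_length=12):
    """Return the bases immediately downstream of the TE end, or None.

    None is returned when the TE end is absent from the read or when the
    read is too short to supply a flank of the requested length.
    """
    position = read.find(te_end)
    if position == -1:
        return None
    start = position + len(te_end)
    flank = read[start:start + flank_length]
    return flank if len(flank) == flank_length else None


def tagged_genes(flank, genes):
    """Names of reference genes containing the flank as an exact substring."""
    return [name for name, sequence in genes.items() if flank in sequence]


if __name__ == "__main__":
    # A read that starts in the TE and runs into the disrupted gene.
    read = "TTGACC" + TE_END + "CGTAAGCTTGGATCCA"
    flank = flanking_sequence(read, TE_END)
    print(flank)
    print(tagged_genes(flank, reference_genes))
```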

Transposable elements as gene delivery vehicles

Virtually any DNA sequence of interest can be placed between the terminal inverted repeats (TIRs) of a TE and mobilized in trans by supplying the transposase gene in the form of an expression plasmid or as mRNA synthesized in vitro. This feature makes TEs natural and easily controllable DNA delivery vehicles for transferring genes into a host organism’s chromosomes, both to introduce new traits and to discover new genes. The use of TEs as nonviral vehicles for persistent gene delivery emerged with the discovery of the DNA transposon ‘Sleeping Beauty’ as a tool for genetic modification, and persistent gene expression has been demonstrated in a wide variety of vertebrate cell lines and species (Ammar et al. 2012). Several other synthetic DNA transposon vector systems, such as the piggyBac (PB) transposon with activity in mammalian cells, have been studied and tested (for review see Skipper et al. 2013). Transposon-based gene transfer is more efficient for the stable expression of foreign genes in vertebrates than classic approaches, which rely on physical methods of delivering gene constructs: transfection, electroporation, sonoporation or microinjection. The main drawbacks of these classic approaches are the low rates of genomic integration and the unstable expression of the chromosomally integrated gene construct. This is believed to be associated with concatemerization of the injected DNA before genomic integration and with repeat-induced gene silencing (Henikoff 1998).

Ongoing investigations will certainly prompt new ideas and new designs in the expanding universe of TE-based technologies for genetic and cell engineering. An innovative aspect of these studies is the evaluation of TE-based delivery systems as a potential approach for the therapy of human diseases. Despite the various examples of preclinical efficacy for in vivo gene therapy, the road to the clinic will wind through additional experimentation and evidence of therapeutic effects in large animal models.

Conclusions

Transposable elements occupy a significant portion of eukaryotic genomes and have long been considered noncoding ‘junk DNA’ with no beneficial functions for the host genome. With the advent of high-throughput sequencing technologies, however, scientists are beginning to find that intrinsic relationships exist between TEs and the rest of the genome. Perhaps only a small proportion of these relationships evolve to become mutually beneficial over the long term. Transposable elements have nevertheless become integrated into the complex functioning of the genome as important coordinators of many biological processes, such as the maintenance of chromosome integrity and the creation of novel regulatory networks. The body of evidence for the benefits of TEs is constantly increasing, yet this is perhaps only the tip of the iceberg. Ultimately, TEs are a main provider of genetic diversity, which is the raw material for eukaryotic genome flexibility and evolution. Transposable elements can therefore be viewed as important genomic symbionts, and their study should go ‘hand-in-hand’ with that of the rest of the genome. The TE fraction is often neglected in genomewide studies. For instance, it is common practice to mask repeat sequences in next generation sequencing data and to focus on the ‘good’ stuff: coding regions. One should appreciate, however, that by throwing out all data on TEs, we are turning our back on important findings on genome functioning and evolution. In short, the selfish, junk and beneficial DNA hypotheses are all relevant and by no means mutually exclusive, and a single label for these relationships is inappropriate and potentially misleading. One of the key take-home messages from the numerous sequenced plant and animal genomes is that we still have a lot to learn about the organization of genomes and the function of genes before we can comprehensively recognize the diverse functional importance of ‘junk’ DNA.