Introduction

In the beginning, until Darwin and Mendel laid the foundations for understanding evolution and heredity, our knowledge was a formless void and darkness covered the face of the deep. The term gene emerged as a name for a fundamental physical and functional unit of heredity at the beginning of the twentieth century [42]. Deciphering of the genetic code half a century later [13] strongly tied the concept of a gene to protein coding. However, this is only one of the many contexts in which the term gene is used [8], and the definition of a gene as a unit of heredity has been evolving ever since (reviewed in [29]). Importantly, genome and transcriptome sequencing during the last two decades revealed that protein-coding genes are a critical but relatively small world in the whole universe of heritable information, in terms of both genome content and the fraction of the genome transcribed into RNA. For example, human genome sequencing revealed that less than 2 % of its nucleotide sequence codes for proteins, while 55 % is composed of repetitive elements [40]. Next-generation sequencing (NGS) provided evidence that mobile elements are one of the major factors driving genome evolution [53]. Mobile elements can insert themselves into new genomic locations and recombine, thereby causing genetic alterations while multiplying their numbers in the genome.

In this review, we focus on the intersection of two large worlds in the universe of heritable information: mobile genetic elements (particularly retrotransposons) and long non-coding RNAs (lncRNAs). There is a plethora of literature covering mobile elements and lncRNAs separately (for example [12, 31, 36, 45, 62]). Here, we summarize essential features of mobile elements and lncRNAs and focus on how retrotransposons contribute to lncRNA evolution, structure, and function in the two main mammalian model organisms, mice and humans.

Retrotransposons

Based on the mode of transposition, mobile elements fall into two major classes: Class I includes “copy and paste” retrotransposons and Class II includes “cut and paste” DNA transposons (Fig. 1, reviewed in [36]). The latter class is characterized by terminal inverted repeats and the ability to release itself from the genome with a transposase (TPase) and insert elsewhere (hence cut and paste). However, DNA transposons do not seem to be currently active in mammalian genomes and their remnants are so-called DNA transposon fossils [24]. The human genome contains >100,000 copies of short (180–1200 bp) elements with 14- to 25-bp terminal inverted repeats generated by target site duplications [10, 40, 76]. Class I mobile elements, also called retrotransposons, transpose through an RNA intermediate. Retrotransposons can be further divided into four subclasses based on their retrotransposition competence (autonomous vs. non-autonomous) and the presence or absence of long terminal repeats (LTRs) at their 5′ and 3′ ends (LTR vs. non-LTR elements).

Fig. 1

Transposable elements. Overview of two major classes of transposable elements. Copy numbers of transposable elements per haploid genome (copies) and fraction of the genome occupied by each transposable element type (%) were obtained from the literature [10, 40]

Autonomous LTR retrotransposons evolve from retroviruses when their life cycle becomes confined to a host cell as they lose the ability to be released and infect another cell [43, 52, 80]. Accordingly, the structure of LTR retrotransposons closely resembles that of retroviruses. LTR retrotransposons are ∼5–12 kb long and have two long terminal repeats (LTRs) flanking a protein-encoding region, which carries an RNA-dependent DNA polymerase (POL, reverse transcriptase) but often lacks an envelope (ENV) protein-encoding gene (reviewed in [36]). In the mouse genome, there is currently one highly active LTR retrotransposon group (Intracisternal A Particle (IAP)) and several presumable LTR retrotransposon fossils, including Mouse Endogenous Retrovirus type-L (MuERV-L) insertions, which are transcribed during early development [78]. The human genome hosts a family of actively retrotransposing Human Endogenous Retroviruses (HERVs, reviewed in [48]). Autonomous LTR retrotransposons usually reach hundreds to several thousands of insertions before they die out because inserts accumulate mutations that abolish their coding capacity. At the same time, complementation allows mutant transcripts to compete with intact ones for retrotransposition. This eventually minimizes a retrotransposon’s chance of copying a functional element, while random mutagenesis continues to eliminate the remaining functional copies.

In non-autonomous LTR elements, the sequence flanked by LTRs does not contain open reading frames. They are significantly smaller than autonomous LTR elements, usually ranging between 1 and 1.5 kb. Because they lack coding capacity, their retrotransposition requires factors provided by autonomous retrotransposons. A representative example of non-autonomous LTR elements is Mammalian apparent LTR Retrotransposons (MaLR, reviewed in [75]). MaLRs include Mouse Transcript (MT) elements, which provide oocyte-specific promoters in mice [71].

Autonomous non-LTR elements are represented by long interspersed nuclear elements (LINEs), which are among the most abundant retrotransposons in mammalian genomes (868,000 insertions in the human genome (20 %) [40] and 660,000 insertions in the mouse genome (9 %) [10]). LINE elements are 6–7 kb long and carry two open reading frames but most of the genome insertions are truncated at the 5′ end. Importantly, LINE elements are resistant to the above-mentioned problem of integration of faulty retrotransposon copies because of a strong cis-preference of the retrotransposition machinery. In other words, proteins translated from a LINE RNA preferentially associate with and retrotranspose the RNA from which they were translated [89].

Non-autonomous non-LTR short interspersed nuclear elements (SINEs) are relatively short sequences (<0.5 kb) related to small RNAs transcribed by RNA polymerase III (Pol III) and do not encode any proteins. Except in rodents and primates, animal SINEs are usually related to tRNAs (reviewed in detail in [74]). There are ∼1.5 million SINEs in the human and mouse genomes, which occupy ∼11 and 8 % of the genomes, respectively [10, 11]. The most studied mammalian SINEs are human Alu elements, which are derived from the small cytoplasmic 7SL RNA and are the most abundant transposable elements in the human genome (∼1 million insertions [40]). SINE elements in mice are more heterogeneous. The most abundant murine SINE element is SINE B1, a ∼140 bp long element derived from a portion of 7SL RNA, which has ∼560,000 copies (2.66 %) in a haploid genome [10]. Another noteworthy murine SINE element is SINE B2, a tRNA-derived ∼190 bp long element, which has ∼350,000 copies (2.39 %) in the genome [10].
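The four retrotransposon subclasses described above follow from two yes/no properties, which can be written down as a small lookup. The sketch below is only an illustration: the class and function names are hypothetical, and the example elements are those named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrotransposon:
    """Illustrative record; field names are assumptions, not a standard schema."""
    name: str
    autonomous: bool  # encodes its own retrotransposition machinery
    has_ltr: bool     # flanked by long terminal repeats


def subclass(element: Retrotransposon) -> str:
    """Return one of the four subclasses from the two properties."""
    autonomy = "autonomous" if element.autonomous else "non-autonomous"
    ltr = "LTR" if element.has_ltr else "non-LTR"
    return f"{autonomy} {ltr}"


# Examples named in the text above.
EXAMPLES = [
    Retrotransposon("IAP", autonomous=True, has_ltr=True),
    Retrotransposon("MaLR", autonomous=False, has_ltr=True),
    Retrotransposon("LINE", autonomous=True, has_ltr=False),
    Retrotransposon("SINE B1", autonomous=False, has_ltr=False),
]

for element in EXAMPLES:
    print(f"{element.name}: {subclass(element)}")
```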

Since the 1980s, complex eukaryotic genomes have been considered loaded with selfish DNA (or “junk” DNA), which expands in a genome without contributing to (or even at the expense of) the fitness of the organism [17, 68]. Accordingly, retrotransposons were seen as harmful genomic parasites causing mutations and threatening genome integrity. This view was reinforced by their retroviral origin and the identification of disease-causing mutations [34, 45]. However, retrotransposons were also proposed to be one of the major contributors to genome evolution [15]. More detailed analyses of eukaryotic genomes, fueled by the NGS surge, painted a colorful picture in which mutated, transpositionally incompetent elements can still provide functional cis-elements regulating adjacent genes, such as alternative promoters, transcription factor binding sites, enhancers, exons, terminators, and splice junctions (Fig. 2a) [1, 22, 27, 33, 35, 44, 51, 71]. It has been estimated that 16 % of eutherian-specific conserved non-coding elements are derived from mobile elements, implying a major contribution to mammalian evolution [23, 63]. Furthermore, transposable elements are evolutionarily among the most lineage-specific sequence elements, especially in mammals [59, 82]. Mouse- and human-specific retrotransposons constitute 87.0 and 51.9 % of all mouse and human retrotransposons, respectively [10].

Fig. 2

Contribution of retrotransposon sequences to lncRNAs. a Retrotransposons can provide different functional sequences to lncRNAs. Retrotransposon sequences in a lncRNA gene scheme are indicated in black. pA, polyadenylation signal. The arrow depicts the transcription start site and direction of transcription. b Heureka, a kinetic statue by Jean Tinguely in Zurich, symbolizes here the stochastic building of a new functional structure from used material. Photo by Martin Moravec, 2016

Long non-coding RNAs

Large-scale genome analyses brought surprising findings, which changed the perspective on “junk DNA”, a traditional label for the part of a genome that does not encode proteins. Transcriptome and chromatin analyses revealed complex RNA production and typical gene-like chromatin signatures outside of protein-coding gene loci (areas traditionally, and questionably, termed intergenic regions) as well as much more complex RNA synthesis within protein-coding gene loci than previously appreciated [16, 32, 81]. Importantly, in the current view of genome organization, the term “junk DNA” does not need to be replaced; it is only the interpretation that should shift from “useless waste” towards “material for recycling”. When one explores RNA expression in intergenic regions and compares the same loci in different species, the image that comes to mind is a large scrap yard where old material is being recycled in a Tinguely-like fashion (Fig. 2b).

Large-scale genome analyses also revealed large numbers of lncRNAs that do not have an apparent protein-coding potential. A handful of lncRNAs, such as Xist, H19, and a few others, were already known before the NGS era [46]. However, NGS studies began reporting lncRNAs by the thousands [5, 16, 32, 85]. LncRNAs are generally >200 nt, have a bias towards two-exon transcripts, and have predominantly nuclear localization [16]. Their biogenesis resembles that of mRNAs: they are transcribed by Pol II, capped, usually spliced with a high degree of alternative splicing, and frequently polyadenylated, but they are not translated into proteins [32, 67, 81].

LncRNAs generally lack sequence conservation. In fact, lncRNA promoters are more conserved than lncRNA exons [16]. While lncRNA exons are more conserved than neutrally evolving sequences, their conservation is lower than that of untranslated regions in mRNAs but higher than that of introns of protein-coding genes [44]. DNA sequence conservation in genes is linked to non-coding features important for gene structure and expression (e.g., promoters, enhancers, intron boundaries, or polyadenylation sites) and to the functionally important information stored in the encoded RNA (e.g., encoded protein). However, a non-coding RNA function often depends more on the secondary structure than on the primary nucleotide sequence. Thus, a conserved secondary RNA structure of a functionally important module in a lncRNA can be maintained via compensating mutations, while a common primary sequence analysis of the entire lncRNA might show only weak conservation. For example, the imprinting-regulating lncRNA Airn exhibits low expression, conservation, and stability, yet it is involved in silencing Igf2r, as the process of transcription is more important than accumulation of stable transcripts [73]. Taken together, while conserved regions are assumed to have a function, it should not be assumed that function needs to be associated with sequence conservation [69].
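The point about compensating mutations can be made concrete with a toy stem. The sketch below uses invented six-nucleotide arms (not taken from any real lncRNA) and assumes Watson-Crick plus G-U wobble pairing; the function names are hypothetical. It shows a stem that stays fully paired although only a third of its positions are unchanged, and how a single uncompensated substitution breaks the pairing.

```python
# Allowed RNA base pairs: Watson-Crick plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_is_paired(arm5: str, arm3: str) -> bool:
    """True if every base of the 5' arm pairs with the 3' arm read antiparallel."""
    if len(arm5) != len(arm3):
        return False
    return all((a, b) in PAIRS for a, b in zip(arm5, reversed(arm3)))


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Toy stems (5' arm, 3' arm); all sequences are invented for illustration.
ANCESTRAL = ("GGCAUC", "GAUGCC")
DERIVED = ("CACAAG", "CUUGUG")   # four pairs replaced by compensating mutations
BROKEN = ("CGCAUC", "GAUGCC")    # one substitution without a compensating change

print(stem_is_paired(*ANCESTRAL))  # True
print(stem_is_paired(*DERIVED))    # True
print(stem_is_paired(*BROKEN))     # False
print(identity("".join(ANCESTRAL), "".join(DERIVED)))
```

A primary-sequence comparison of the ancestral and derived stems would report low identity even though the structure is intact, which is the situation described in the paragraph above.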

LncRNAs are usually categorized based on their localization relative to the nearest protein-coding gene (Fig. 3a). Categorization by genomic position and exonic structure is the most widely used method because current bioinformatics expertise is not sufficient to perform reasonable function prediction and classification based on lncRNA exonic sequences. This contrasts with protein-coding genes where one can classify protein-coding RNAs and make functional predictions based on identification of annotated functional domains encoded in nucleic acid sequences.
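Position-based categorization reduces to interval comparisons against the nearest protein-coding gene. The following is a minimal sketch under stated assumptions: the `Gene` record, the half-open coordinate convention, and the category labels are illustrative choices and are not the categories drawn in Fig. 3a.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene model used only for this illustration."""
    strand: str            # "+" or "-"
    start: int
    end: int
    exons: List[Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_lncrna(lnc: Gene, coding: Gene) -> str:
    """Label a lncRNA by locus overlap, orientation, and exon overlap."""
    if not overlaps((lnc.start, lnc.end), (coding.start, coding.end)):
        return "intergenic"
    orientation = "sense" if lnc.strand == coding.strand else "antisense"
    exon_overlap = any(
        overlaps(lnc_exon, coding_exon)
        for lnc_exon in lnc.exons
        for coding_exon in coding.exons
    )
    # "intronic" here only means that no exon of the lncRNA overlaps a coding exon.
    return f"{orientation} exonic" if exon_overlap else f"{orientation} intronic"


# Toy coordinates, invented for illustration.
coding_gene = Gene("+", 1000, 5000, [(1000, 1500), (4000, 5000)])
lnc_a = Gene("-", 2000, 3000, [(2000, 2400), (2800, 3000)])
lnc_b = Gene("+", 6000, 7000, [(6000, 7000)])

print(classify_lncrna(lnc_a, coding_gene))  # antisense intronic
print(classify_lncrna(lnc_b, coding_gene))  # intergenic
```

Real annotations would also need chromosome names and a rule for picking the nearest gene; both are left out here to keep the sketch short.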

Fig. 3

LncRNA classification. a LncRNAs can be classified based on their position relative to the nearest protein-coding genes. The scheme depicts the diversity of lncRNAs in terms of possible overlaps with exons and introns of the nearest protein-coding gene. b Four major effects mediated by lncRNAs. Shown are schemes of basic mechanisms by which lncRNAs act

LncRNAs have been linked to transcriptional regulation and chromatin modification, especially during pluripotency and differentiation [3, 16, 27, 47, 92]. However, their range of roles is much broader. In terms of functions, lncRNAs are a heterogeneous group that can be classified in many ways. According to the place of action relative to the encoding locus, lncRNAs are classified as cis- or trans-acting lncRNA. Another criterion can be binding partners (protein, DNA, RNA, or a combination) or cellular localization (nuclear/cytoplasmic). Here, we decided to combine classification of lncRNA effects described in the literature [31, 86] into four categories reflecting distinct modes of action: (i) signaling/allosteric effects, (ii) decoying, (iii) scaffolding, and (iv) guiding and tethering (Fig. 3b). Importantly, a specific lncRNA can exert a combination of these effects as its sequence can carry functionally different modules.

Retrotransposon sequences in lncRNAs

Retrotransposons make a strong contribution to lncRNA sequences. Over two thirds of mature lncRNA sequences (75 and 68 % in human and mouse, respectively) have at least a partial retrotransposon insertion in their sequence, which is more than in other types of RNA sequences, such as protein-coding sequences, small RNAs, or untranslated regions [44]. The high content of retrotransposon sequences is likely a contributing factor to the sequence diversification and high complexity of lncRNAs. At the same time, it was found that human lncRNAs rarely have extensive sequence similarity to each other outside of shared repetitive elements [16].

Retrotransposons overlap with various lncRNA elements: an internal part of an exon, a transcription start site (TSS), a polyadenylation (polyA) site, or a splice donor or acceptor (Fig. 4). The contribution of retrotransposons to functional features is much greater for lncRNAs than for protein-coding loci. Approximately 23 and 30 % of non-redundant TSS and polyA sites, respectively, used by lncRNA transcripts in the human GENCODE v13 set were found to be provided by retrotransposons [44]. This strongly contrasts with retrotransposon association with 1.7 % of TSS and 7.9 % of polyA sites of protein-coding genes. In total, 29,519 transposable element-derived functional features (TSS, polyA, and splice sites) were identified in GENCODE v13 [16, 44].
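Percentages of this kind come from intersecting feature positions with repeat annotations. The sketch below shows the basic calculation on invented coordinates; it is not the pipeline used in [44], the function name is hypothetical, and it assumes a single chromosome with non-overlapping repeat intervals.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def fraction_in_repeats(sites: List[int], repeats: List[Interval]) -> float:
    """Fraction of point features (e.g. TSS positions) inside any repeat interval.

    Assumes the repeat intervals do not overlap each other.
    """
    if not sites:
        return 0.0
    ordered = sorted(repeats)
    starts = [start for start, _ in ordered]
    hits = 0
    for pos in sites:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < ordered[i][1]:
            hits += 1
    return hits / len(sites)


# Toy data, invented for illustration only.
repeat_intervals = [(100, 400), (900, 1200)]
tss_positions = [150, 500, 950, 2000]

print(fraction_in_repeats(tss_positions, repeat_intervals))  # 0.5
```

Running the same function separately on lncRNA and protein-coding TSS lists would give the kind of contrast reported above, provided both lists are first made non-redundant.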

Fig. 4

Four mechanisms of new lncRNA evolution. While (i) requires duplication of an entire locus, (ii) and (iii) involve insertions and deletions. All three mechanisms rely on pre-existing transcriptional units. (iv) entails a sequence change (mutation, retrotransposon insertion) leading to the formation of a new promoter in a previously untranscribed region. The newly emerged lncRNA is depicted in black. Gray boxes connected with a dashed line represent exons. White rectangles depict protein-coding sequences. Dark gray rectangles represent two LTRs of a retrotransposon

Apart from mutated retrotransposons, which produce non-coding RNAs themselves, some lncRNAs are almost completely made of several different retrotransposon sequences. An example of such a lncRNA is UCA1. Its expression is enriched in bladder carcinomas and it is conserved only in a few primate species [87]. In addition, many annotated lncRNAs share a significant proportion of their sequence with retrotransposons, for example, XIST [19], lincRNA-RoR [56], BORG, UCA1 [87], HULC [70], and SLC7A2-IT1A/B [7]. Some of these mature lncRNA transcripts are almost entirely composed of transposable element sequences. For example, the first three exons of the mature transcript of the human lncRNA BANCR, which is involved in melanoma cell migration [25], are derived from a MER41 retrotransposon of the ERV1 LTR retrotransposon family [44]. The mouse lncRNA Borg, which is proposed to have a role in bone morphogenesis [79], has three of its splice sites overlapping with B4 SINE elements and MaLR family LTR elements, while its second exon is completely composed of an LTR sequence of an ERVL-MaLR family retrotransposon. A unique case of retrotransposon sequence-enriched lncRNAs are precursors of small PIWI-associated RNAs (piRNAs), in which accumulation of retrotransposon sequences is functionally desirable (discussed further in the section Retrotransposons and lncRNA function).

All four major retrotransposon types (Fig. 1) contribute to lncRNA exons approximately in proportion to their occurrence in the genome [44]. Relative to protein-coding genes, LTR/ERV elements were found to be the most enriched retrotransposon families in mouse and human lncRNAs, especially in the lncRNA exons and proximal to lncRNA genes [44]. Moreover, over 40 % of retrotransposon-derived TSSs in the GENCODE v13 set map within ERVs [16, 44]. In embryonic stem cells (ESCs), the class of non-coding ESC-specific non-annotated stem transcripts (NASTs) was strongly associated with LTR retrotransposons, particularly with the ERVK and MaLR LTR subfamilies in mice and with ERV1 in humans [27]. Consistent with this, ERVK and MaLR families appeared to be significantly more highly expressed in mouse ESCs; ERV1 and ERVK showed similar trends in human undifferentiated ESCs [22].

Retrotransposon contribution to tissue-specific lncRNA expression

Tissue-specific expression is one of the characteristic features of lncRNAs. According to the GENCODE v7 data, the majority of human protein-coding genes are expressed in multiple tissues, whereas expression of the majority of lncRNAs is restricted to single tissues [16]. Certain tissues also exhibit enriched lncRNA expression [38, 65]. In this context, it is worth noting that retrotransposons (especially LTRs) contain regulatory cis-acting elements, which may function as promoters or enhancers [27, 44, 51].

Retrotransposon expression is naturally selected for in germline cells because somatic retrotransposition in a sexually reproducing organism is not transmitted to the next generation. Therefore, to increase their copy number in the genome, retrotransposons must direct their activity to the germline. This rationale is consistent with the observation that testes exhibit higher expression of lncRNAs than other organs, with stronger specificity for young than for old lncRNAs [16, 65]. It is believed that chromatin remodeling during male germ cell development provides a window of opportunity for this extensive transcription and higher expression of lncRNAs [77]. This window is also exploited by retrotransposons, which may lead to the birth of new or younger retrotransposon-driven lncRNA transcripts. The contribution of retrotransposons to tissue-specific expression has also been well documented for mouse oocytes, where several non-autonomous LTR retrotransposons drive expression of oocyte-specific mRNAs [71]. Accordingly, a recent study reported high expression of lncRNAs from MaLR and ERVK family retrotransposons in oocytes [84].

Retrotransposons support expression not only in the germline. There are multiple examples showing that LTR sequences also function as enhancers or promoters in somatic cells. To name a few: the cap analysis of gene expression (CAGE) method revealed that MaLR elements provide promoters in murine adipose tissue, hippocampus, neuroblastoma, and hepatoma cells [22]. Murine VL30 retrotransposon LTRs were shown to function as promoter and enhancer elements in hepatocytes in vivo [37]. LTRs of the human ERV-9 endogenous retrovirus (2-4000 copies/genome) possess enhancer activities in embryonic and hematopoietic cells [54]. Importantly, whether a retrotransposon sequence functions as a promoter or enhancer depends on the chromatin context, while the bulk of retrotransposon sequences is silenced by heterochromatin formation during cell differentiation [60, 93]. This implies that expression of retrotransposon-driven lncRNAs would emerge under conditions favoring loss of heterochromatin marks.

LncRNAs in ESCs

A large volume of lncRNA data comes from ESCs, an artificial undifferentiated cell type derived from an early embryo, which can be propagated in cell culture while retaining pluripotency. Retrotransposons significantly contribute to ESC-specific expression of lncRNAs, which is conceivable given the reduced heterochromatin at repetitive elements observed in undifferentiated ESCs [60]. Human and mouse ESC lncRNA promoters are located more often in specific LTR retrotransposon families than in differentiated cells [27]. Approximately 30 % of transcripts (CAGE tags) derived from human embryonic tissues were found to be associated with repetitive elements (16 % retrotransposon, 10 % satellite, 5 % simple repeat), particularly in LINE subfamilies [22]. Among the above-mentioned NASTs (lncRNAs), those associated with LTR-associated promoters accumulate to higher levels than those expressed from promoters not associated with repeats [27]. A quarter of POU5F1, NANOG, and CTCF-bound regions in humans and mice were found to be within transposable elements [51]. In addition, enrichment for stem cell transcription factors bound at lncRNA (NAST) loci associated with mouse ERVK and MaLR and human ERV1 elements was greater than for the non-expressed elements [27, 51]. Several HERVH lncRNAs were found to be expressed at higher levels in ESCs than in any other tissue or cell line [47]. Likewise, the mouse ERVK family also manifested this kind of stem cell-specific expression [47].

Interestingly, ten human lncRNAs significantly upregulated in induced pluripotent stem cells (iPSCs) relative to ESCs were identified [56]. Among them, linc-RoR, which acts as an important modulator of iPSC reprogramming, is almost entirely composed of retrotransposon-derived sequence from seven different retrotransposon families and has an ERV1 LTR at its TSS [47, 56]. Accordingly, it was suggested that endogenous retroviruses shape pluripotency networks via lncRNA regulation in mammals [27, 47]. This notion was corroborated by another study, in which 9241 human and 981 mouse lncRNAs were found to be strongly associated with LTR elements; expressed lncRNAs (NASTs) were associated with mouse ERVK, mouse MaLR, and human ERV1 elements, which become silenced by heterochromatin upon differentiation [27].

LncRNA evolution and retrotransposon contributions

LncRNAs are poorly conserved through evolution. The primary lncRNA sequence is only loosely connected with functional conservation and importance, as exemplified by XIST, a lncRNA controlling X chromosome inactivation in mammals (reviewed for example in [28]). Mouse and human Xist/XIST transcripts show 49 % sequence identity, which is lower than that of 5′ and 3′ UTRs but slightly higher than that of introns. The homology is not continuous but consists of totally unrelated sequences alternating with seven gap-free regions (90–160 bp) of relatively high homology (68–86 %) [4, 66]. Mammalian Xist is also a good example of complex lncRNA evolution with a strong contribution of retrotransposons. It has been proposed that Xist evolved in early eutherians from a protein-coding gene, Lnx3, by integration of transposable elements [19]. The Xist gene promoter region and 4/10 exons retain homology to Lnx3 exons. The remaining six Xist exons, including those with simple tandem repeats, have similarity to different transposable elements. Furthermore, transposable elements in Xist exons are species-specific, hence contributing to diversification of Xist transcripts during eutherian evolution [18, 19].

Four possible mechanisms were proposed for new lncRNA origins (Fig. 4): (i) genomic duplication of another lncRNA, a mechanism that is also common for protein-coding genes, (ii) birth of a long non-coding RNA from a pseudogene or a protein-coding gene that loses its coding potential, (iii) derivation of a new lncRNA from retrotransposon sequences, and (iv) de novo emergence from a previously untranscribed genomic location [44, 72, 83]. Retrotransposons can contribute to the birth of lncRNAs from protein-coding genes by either disrupting the gene or producing a processed pseudogene by reverse transcribing and integrating its mRNA. De novo emergence of a lncRNA in a previously untranscribed location can be induced by a novel retrotransposon insertion, which provides a promoter. Thus, retrotransposons can play a major role in the origin and diversification of lncRNAs. This notion is supported by analysis of lineage-specific lncRNAs in mammals, whose emergence can be mainly credited to retrotransposon sequences (especially LTRs) [27, 59].

Retrotransposons and lncRNA function

As mentioned above, four basic mechanisms of action were proposed for lncRNAs: (i) signaling/allosteric effects, (ii) decoying, (iii) scaffolding, and (iv) guiding and tethering. LncRNAs have various biological functions, including regulation of chromatin structure and transcription, where a lncRNA can attract silencing or activating complexes to the locus. For example, the lncRNAs Air and KCNQ1ot1 recruit a chromatin-modifying complex to the site of their transcription and silence the locus [50, 64]. Although the molecular mechanisms through which lncRNAs act are still only partially understood, a few interesting examples have emerged concerning the contribution of retrotransposons (particularly of the SINE class) to lncRNA function.

In the first example, a non-coding RNA from a specific retrotransposon contributes to spatiotemporal control of gene expression. A specific SINE B2 element functions as a boundary element and its transcription is implicated in the control of growth hormone gene (GH) activity during embryonic development. Pituitary gland-specific expression of GH is repressed until embryonic day 17.5 (E17.5). A repressive H3K9me3 mark observed at the GH promoter until E12.5 is replaced by an H3K9me2 mark by E14.5, which is completely lost by E17.5 [57]. A specific SINE B2 element located ∼14 kb upstream of the promoter appears to regulate the temporal activation of the GH gene by bidirectional Pol II- and Pol III-transcribed non-coding RNAs, which are necessary and sufficient to enable repositioning of the GH locus between nuclear compartments. According to the model, Pol III transcription is implicated in the maintenance of the H3K9me3 repressive mark, while Pol II transcription correlates with the loss of heterochromatin and gene activation [57].

Nuclear SINE B2 RNA was also implicated in transcriptional repression under stress conditions. Pol III-transcribed SINE B2 RNA forms secondary structures that can bind Pol II and interfere with polymerase binding, hence causing transcriptional block [21]. Consequently, non-coding RNA transcripts from mouse SINE B2 lead to transcriptional repression during heat shock response [20]. A similar mechanism was reported for Alu RNA, which forms secondary structures similar to B2 SINE RNA, directly binds Pol II, and inhibits transcription during heat shock response in humans [59].

Alu elements are one of the most abundant (1.3 million copies) primate-specific repetitive elements in the human genome [2, 40]. Alu elements regulate gene expression by acting as silencers, promoters, or enhancers [35, 58]. They can also provide templates for A-to-I editing by the adenosine deaminase acting on RNA (ADAR) enzyme family. An Alu sequence in a lncRNA can function as a guide and induce Staufen 1 (STAU1)-mediated mRNA decay (SMD) by base pairing with a complementary Alu sequence harbored in the cognate mRNA [30]. STAU1 is a double-stranded RNA binding protein, which was shown to target mRNAs to SMD through binding a STAU1-binding site (SBS), a 19 nt stem-loop structure in the 3′ UTR [49]. However, some SMD-targeted mRNAs, such as Serpine1 and Ankrd57 transcripts, lack an SBS. Instead, their 3′ UTRs contain an Alu element sequence. This sequence can then base pair with an Alu-containing lncRNA, forming an imperfect double-stranded stem structure mimicking an SBS, which in turn leads to SMD. This mechanism might be much more common, as many mRNAs carry Alu elements in their 3′ UTRs and 23 % of lncRNAs carry an Alu sequence [30]. A similar mechanism of lncRNA and mRNA base pairing was also shown in mice for SINE elements [88].

Retrotransposons can also contribute to post-transcriptional control in the cytoplasm by selectively stimulating protein synthesis, as was demonstrated for Uchl1-AS, a lncRNA antisense to the ubiquitin carboxy-terminal hydrolase L1 (Uchl1) gene. UCHL1 is a dual-function protein with deubiquitinating and ubiquityl ligase activities, expressed mainly in neuronal cells [55]. UCHL1 has been associated with brain function and neurodegenerative diseases such as Parkinson’s and Alzheimer’s disease [9, 55]. Uchl1-AS transcripts, which overlap with the 5′ end of Uchl1 mRNAs, are initially retained in the nucleus, while Uchl1 mRNAs translocate to the cytoplasm. Upon cellular stress, Uchl1-AS transcripts move to the cytoplasm, which in turn accelerates Uchl1 mRNA translation. This stress-induced stimulation of protein synthesis requires a particular SINE B2 element at the 3′ end of Uchl1-AS along with the 5′ overlap region [6]. The same antisense and SINE B2 dependence was also reported for the chaperone protein UXT [6]. The mechanism by which the SINE B2 element exerts post-transcriptional regulation is not known. It is conceivable that, like the mechanisms above, it involves a secondary structure that either serves as a signaling cue for the assembly of translation enhancers or directly binds them.

A distinct case of the guiding function of lncRNA-embedded retrotransposon sequences is lncRNA substrates processed into piRNAs, small RNAs (24–32 nucleotides) guiding repressive ribonucleoprotein complexes. The piRNA pathways (reviewed in detail in [90]) suppress retrotransposons and protect genome integrity in the germline at both transcriptional and post-transcriptional levels. The complex biogenesis of piRNAs from lncRNA precursors involves a concerted action of PIWI proteins and other RNA nucleases. Precursor lncRNAs originate from distinct genomic regions (piRNA cluster regions), which harbor retrotransposon sequences and can be seen as checkpoints for screening retrotransposons expanding through the genome. Once a retrotransposon expanding in the genome integrates into such a checkpoint locus, it will be recognized by the piRNA system, and all transcripts of that retrotransposon will be targeted in the germline by complementary piRNAs [41, 90].

Finally, there are also examples linking retrotransposon-lncRNA function to pathophysiology. For example, the trans-acting lncRNA ANRIL, which maps to the atherosclerosis locus on chromosome 9p21 [26, 61, 91], contains an Alu motif implicated in binding genes with a similar Alu motif. Mutations of the Alu motif in ANRIL reverse its trans-effects and pro-atherogenic cellular properties [39]. A single point mutation, which has been linked to an encephalopathy, was found in a LINE1 sequence in the lncRNA SLC7A2-IT1A/B. This mutation results in an eightfold downregulation of SLC7A2 intronic lncRNA transcripts in patients’ brain tissue and in increased apoptosis [7]. In addition, high expression of LINE-1 chimeric non-coding transcripts, which contribute to tumor invasion and metastasis through antisense-mediated downregulation of the TFPI2 gene, has been observed in breast and colon cancers [14].

Summary

Retrotransposons are closely associated with the birth, evolution, expression, and function of lncRNAs. Retrotransposons provide mobile platforms giving rise to novel lncRNAs from protein-coding genes as well as from previously untranscribed regions. Thus, retrotransposons serve as a recycling system that randomly probes the potential of “junk DNA” and creates novel functions through lncRNAs. Retrotransposon-derived promoter and enhancer platforms offer synchronization and coordination of lncRNA expression. Retrotransposons also distribute complementary sequences across the genome, providing opportunities for guiding and tethering functions of lncRNAs. At the same time, lncRNAs are employed in genome defense, where they allow for surveying the retrotransposon content and mediating retrotransposon silencing.