Gene regulation

Gene expression is governed by epigenetic, transcriptional, and post-transcriptional regulation. In 1942, Conrad Hal Waddington coined the term “epigenotype” or “epigenetics” to summarize the complex developmental processes that lie between genotype and phenotype [1]. Today, epigenetics is defined as the study of heritable changes in gene activity that are not caused by changes in the DNA sequence. The genomic architecture and intra- or interchromosomal communication are key mechanisms for accurate gene regulation [2]. Post-transcriptional processes such as alternative splicing, RNA editing, and microRNA-mediated regulation are reviewed elsewhere [3, 4].

Genomic regulators that are located on one chromosome and act on that same chromosome are termed cis-regulatory elements. Elements that regulate in trans are interchromosomal regulators that communicate between either homologous or non-homologous chromosomes. Enhancer elements activate gene expression, whereas silencers suppress it. Insulators establish barriers between different chromatin states (Fig. 1a) and affect expression secondarily [5]. Gene-regulatory elements can be highly conserved; however, tissue-specific expression can differ greatly between species [6, 7]. The distance between an upstream or downstream regulator and its target gene ranges from several kilobases to 1.5 megabases (Mb) [8]. Functional protein complexes, namely transcription factors (TFs) and co-activators, bind together with chromatin-modifying proteins at DNA consensus motifs, and these motif-bound complexes influence gene expression. The nucleosome state is changed through ATP-dependent remodeling complexes of the SWitch/sucrose nonfermentable (SWI/SNF) families (first found in yeast) or through histone modifiers, such as histone acetyltransferases or methyltransferases [9, 10]. Tissue-specific expression depends on the chromatin state and the bound TFs. The chromatin state is characterized by the histones, the nucleoproteins around which the DNA is wrapped, and by histone marks. More than 60 different reversible and irreversible histone modifications are known. They vary between tissues and species and determine whether chromatin is in the transcriptionally active euchromatin state or the inactive heterochromatin state [11]. The histone modifications, especially during chromatin looping (Footnote 1), quantitatively determine gene regulation.
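The cis/trans distinction and the distance range given above can be illustrated with a minimal Python sketch. The names `Locus` and `classify_interaction`, as well as all coordinates, are hypothetical and chosen only for illustration. The sketch also makes one simplifying assumption: contacts between the two homologous copies of a chromosome, which the text counts as in trans, cannot be distinguished by chromosome name and are therefore reported as cis.

```python
from dataclasses import dataclass
from typing import Optional

# Upper end of the regulator-to-gene distance range quoted in the text (1.5 Mb).
MAX_TYPICAL_CIS_DISTANCE_BP = 1_500_000


@dataclass(frozen=True)
class Locus:
    """A genomic position; names and coordinates are illustrative only."""
    name: str
    chromosome: str
    position_bp: int


@dataclass(frozen=True)
class Interaction:
    mode: str                              # "cis" or "trans"
    distance_bp: Optional[int]             # undefined for trans contacts
    within_typical_range: Optional[bool]   # undefined for trans contacts


def classify_interaction(regulator: Locus, target: Locus) -> Interaction:
    """Label a regulator-target pair as cis or trans by chromosome name.

    Simplification (assumption): homologous copies of one chromosome share
    a name, so homolog-to-homolog contacts are reported as cis here.
    """
    if regulator.chromosome != target.chromosome:
        return Interaction("trans", None, None)
    distance = abs(regulator.position_bp - target.position_bp)
    return Interaction("cis", distance, distance <= MAX_TYPICAL_CIS_DISTANCE_BP)


# Hypothetical example loci.
enhancer = Locus("enhancer_A", "chr7", 1_000_000)
gene_same_chromosome = Locus("gene_B", "chr7", 1_800_000)
gene_other_chromosome = Locus("gene_C", "chr2", 5_000_000)

print(classify_interaction(enhancer, gene_same_chromosome))
print(classify_interaction(enhancer, gene_other_chromosome))
```

A linear distance of this kind says nothing about actual three-dimensional proximity, which depends on chromatin looping and has to be measured experimentally.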

Fig. 1

a Scheme of gene-regulatory elements influencing gene expression. Enhancer, silencer, and insulator elements can be localized up to 1.5 Mb upstream or downstream of the transcription start site. Response elements within gene promoters bind transcription factors and co-activators to maintain a tissue-specific gene expression. The regulation is dependent on the histone modifications. b Chromatin-looping between a cis-regulatory element (CRE) and its target gene promoter. Recruited modifying, mediating, activating, or repressing proteins build the transcription factor co-activator complex (TFCC) and interact in physical proximity with the promoter to regulate gene expression

Through chromatin looping, gene-regulatory elements (Fig. 1b) are brought into physical proximity with gene promoters to drive transcription [12]. The methylation of alleles or gene clusters on either the maternal or the paternal allele, as occurs in genomic imprinting and in X inactivation for gene-dosage compensation, becomes manifest in early embryogenesis [13]. The selection of activating or repressing histone marks was found to follow a sequential and combinatorial epigenetic code, or language, that depends on the histone modifications involved, the DNA-binding proteins, and non-coding RNAs (ncRNAs), thereby ensuring tissue-specific gene expression and epigenetic modifications [14, 15]. The combinatorial diversity is also due to the variety of functional gene regulators, gene promoters, gene homologs or pseudogenes, and embryonic and tissue-specific developmental stages. Whether inherited or environmentally induced, epigenetic conditions systemically shape each functional element that supports proper gene regulation. Alterations of these complex interactions often become clinically apparent in numerically or structurally aberrant karyotypes [16]. The physical dissociation of a regulator from its gene can cause position effects that lead to differential gene expression [17].

lncRNAs influence gene and genome regulation

The Encyclopedia of DNA Elements (ENCODE) consortium was founded in 2003 to characterize and annotate functional genomic elements of the human genome and of several transcriptomes. ENCODE determined that the genome is nearly fully transcribed and that protein-coding genes are not its only major units. Fewer than 3 % of the transcripts originate from protein-coding genes [18]. MicroRNAs (miRNAs), small interfering RNAs (siRNAs), and PIWI-interacting (Footnote 2) RNAs (piRNAs) constitute the class of short ncRNAs. In contrast, lncRNAs are longer than 200 nucleotides, are intragenic or intergenic, occur with or without a poly-A signal, often exhibit low expression levels, and can be conserved and expressed in a highly tissue-specific manner [19, 20]. Linear ncRNAs and circular RNAs (circRNAs) have no protein-coding potential and can exist as mono- or multi-exonic sense and antisense transcripts [21, 22]. The circRNAs are post-transcriptional regulators that are formed by head-to-tail splicing. The first detailed investigation of the circular cerebellar degeneration-related protein 1 antisense transcript (CDR1as) revealed antagonistic actions on miRNA [22]. Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) experiments revealed that H3K4me1, H3K36me3, H3K27ac, and p300 characterize gene-activating enhancers and also lncRNA loci [20, 23]. RNA sequencing (RNA-seq) of 24 human tissues identified more than 8,000 lncRNAs that are typically co-expressed with their neighboring genes [19]. lncRNAs are tools for gene and genome regulation within the nucleus [24]. Key roles have been attributed to lncRNAs in biological processes such as chromatin remodeling [25], X-chromosome inactivation [26], embryonic stem cell pluripotency [27], embryogenesis and development [28], and the imprinting of genomic loci [13].
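The length criterion that separates short ncRNAs from lncRNAs can be written down as a minimal Python sketch. The function name `classify_ncrna_by_length` and the example lengths are hypothetical; only the 200-nucleotide threshold comes from the text. Length alone does not establish that a transcript is non-coding, so the sketch assumes that coding potential has already been excluded by other means.

```python
# Threshold stated in the text: lncRNAs are longer than 200 nucleotides.
LNCRNA_MIN_LENGTH_NT = 200


def classify_ncrna_by_length(length_nt: int) -> str:
    """Assign a non-coding transcript to the short or long ncRNA class.

    Assumption: the transcript is already known to lack protein-coding
    potential; this function only applies the length cut-off.
    """
    if length_nt <= 0:
        raise ValueError("transcript length must be a positive number of nucleotides")
    return "long ncRNA" if length_nt > LNCRNA_MIN_LENGTH_NT else "short ncRNA"


# Hypothetical transcript lengths in nucleotides.
example_lengths = {"transcript_1": 22, "transcript_2": 200, "transcript_3": 2_300}
for name, length in example_lengths.items():
    print(name, length, classify_ncrna_by_length(length))
```

Treating exactly 200 nucleotides as short follows the wording "more than 200 nucleotides"; other annotation schemes may draw the boundary differently.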

Since 1991, the X-inactive specific transcript (Xist) has been investigated to gain insight into the mechanisms of X-chromosomal inactivation (XCI) [26]. In 2013, Xist was shown to exploit the three-dimensional structure of the X chromosome to spread from its transcription site to loci with high gene density and transcription that are in physical nuclear proximity. After recruiting the chromatin-modifying polycomb group (PcG) proteins, Xist pulls in further X-chromosomal regions located in cis and continues inactivation by forming a transcriptionally silent H3K27me3 nuclear compartment; dependent on its internal A-repeat domain, it spreads across the entire X chromosome [29]. Another project explored the spreading mechanism of Xist along the 150 Mb of the X chromosome [30]. Simon and colleagues described a two-step inactivation mechanism. During the establishment of XCI in early embryonic cells, Xist targets gene-rich domains before spreading to the intervening gene-poor domains. The mechanism seems to persist as an epigenetic memory that facilitates a more efficient XCI during somatic proliferation and maintenance [30]. The counterpart of Xist is Tsix, a gene that is transcribed antisense to Xist and supports the active X chromosome. Furthermore, the additional relationship between Xist and Jpx, a lncRNA that activates Xist, demonstrates that lncRNAs can themselves be regulated by lncRNAs and antisense transcripts [31, 32]. Also in 2013, Sun and colleagues showed that the lncRNA Jpx displaces the chromatin-remodeling and RNA-binding protein CTCF (Footnote 3) from one X chromosome to regulate Xist expression through a titration-dependent antagonistic mechanism. Prior to XCI, CTCF normally inhibits Xist expression. However, in the absence of CTCF, Jpx activates the Xist promoter [33]. In summary, these results broaden the classic understanding of how genes or chromosomes are regulated. Chromatin looping, as a process that establishes gene expression, functions inter alia through the actions of lncRNAs and their associated proteins.

A second well-studied, epigenetically acting lncRNA is the HOX antisense intergenic RNA (HOTAIR). HOTAIR is transcribed from the HOXC gene cluster on chromosome 12 and represses 40 kilobases of the HOXD cluster on chromosome 2 in trans through H3K27 trimethylation [34]. The lncRNAs Air and KCNQ1OT1 are both located on imprinted paternal alleles. They recruit the polycomb repressive complex 2 (PRC2) and a histone methyltransferase that mediates enrichment of the histone modification H3K9me3 to silence the genes KCNQ1 and IGF2R [35, 36]. In contrast to gene-silencing lncRNAs, several activating lncRNAs have been described that promote the expression of protein-coding genes at target loci in cis [37]. HOXA transcript at the distal tip (HOTTIP) directly interacts in cis with the WDR5/MLL complex, resulting in the activation of gene transcription from the HOXA gene cluster through enrichment of the euchromatin-characteristic H3K4me3 mark [38]. The cis- and trans-chromosomal interaction lncRNA (CISTR-ACT) is a chondrogenic regulator that interacts in cis and in trans with essential developmental genes that determine cartilage formation. The CISTR-ACT lncRNA is transcribed from an enhancer that loops to a position 24.4 Mb away on chromosome 12 to induce PTHLH expression. In addition, CISTR-ACT targets SOX9 on the non-homologous chromosome 17 in trans [39]. The knowledge about HOTAIR and CISTR-ACT extends the gene-regulatory background regarding in cis interactions. In in trans communication between homologous or non-homologous chromosomes, the nuclear architecture is a major participating element.

As far back as 1904, Theodor Boveri coined the term chromosomal territories [40]. Chromosomes are not randomly distributed within the nucleus, although conformational changes occur during mitosis [41]. Transcriptionally active euchromatin is predominantly located in the nuclear center, whereas the compact heterochromatin resides in the outer nucleus. The positioning of chromosomes and their interactions are not predetermined but rather result from a stochastic process that includes chromosome looping [42]. Among the molecular genetic high-throughput techniques, Hi-C, a modification of chromosome conformation capture (3C), and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) allow proximal chromosomal contacts to be identified. These experimental analyses determined the spatial proximity of gene-rich regions and chromosomes and showed the segregation of euchromatic and heterochromatic genomic areas [43, 44]. Topological nuclear domains, which differ in size and attributes, can determine tissue-specific gene regulation.

Some lncRNAs have been intensively investigated during the last two decades (Table 1). The data yielded insight into the highly organized structure of the nucleus. The complex interplay between chromosome territories, chromatin state, and lncRNAs affects tissue-specific gene regulation to control developmental stages or to maintain tissues [41]. The lncRNAs can act as nuclear addresses, working in a local, locus-specific, or allele-specific manner to control gene expression, genome organization, or regulation. The initial lncRNA transcription signals in cis or in trans and recruits chromatin-modifying complexes or basic factors of the transcription machinery. Moreover, lncRNA transcription can be involved in the formation of topological nuclear domains, thereby acting secondarily on gene regulation [45]. The lncRNA-mediated regulation scenarios described here are carried out by different classes of lncRNAs that belong to the families of competing endogenous RNAs (ceRNAs), activating or enhancer-like lncRNAs, natural antisense transcript (NAT) lncRNAs, and small nucleolar RNAs (snoRNAs; Table 1). Table 1 lists the lncRNAs with defined biological functions or with a proven association with human diseases.

Table 1 Biological functions and associations of lncRNAs in Mendelian diseases, cancers, and cardiovascular and neurological diseases.

Functional insights in lncRNA-mediated gene regulation

We are merely at the beginning of our understanding of the functional processes and the biological breadth of lncRNA-mediated gene regulation. One controversial question is how lncRNAs can specifically regulate their target genes. Do scaffolds between lncRNAs and mediating protein complexes exist to guarantee in cis and in trans regulation? Are lncRNA:DNA:DNA triple helices formed, or do lncRNAs bind transcription factors and mediate gene expression themselves? During chromatin looping, are lncRNAs released from their transcription site for specific and direct lncRNA:DNA binding at regulatory sites within a coding gene locus? Do lncRNAs bind the mRNA transcript for post-transcriptional modifications?

The features that suit lncRNAs to biologically relevant regulatory processes are the key to their success. Several lncRNAs function as decoys that trap regulatory proteins. DNA damage induces the lncRNA PANDA, which interacts with the transcription factor NF-YA and prevents apoptosis by titrating NF-YA away from its target genes. This control of pro-apoptotic gene expression may be a general feature of genes that drive mitosis and whose promoters harbor lncRNAs [46] (Fig. 2a). lncRNAs can also serve as scaffolding adaptors that bind protein complexes for further gene targeting (Fig. 2b). The lncRNA TERC serves as a scaffold to transport the telomerase complex [47]. Promoter-associated RNA (pRNA) associates with the chromatin-remodeling complex NoRC/TIP5 to induce transcriptional silencing through DNA methylation of rRNA genes [48]. The DNA methyltransferase DNMT3b then recognizes the triple helix that the lncRNA forms at the DNA binding site for the transcription factor TTF-1 [49]. For HOTTIP, Wang et al. showed that chromosomal looping, which creates spatial proximity within the HOXA gene cluster, is necessary to drive the transcription of several 5′ HOXA genes through direct binding of the co-activating HOTTIP lncRNA to the adaptor protein WDR5 [38]. Some lncRNAs seem to translate higher-order spatial chromosome structures and processes such as looping into defined chromatin modifications and domains to control gene expression. The same mechanism could apply to CISTR-ACT and other enhancer-encoded lncRNAs [37, 39] (Fig. 2c). The physical proximity in chromatin loops enables higher-order genome conformation to be transformed into biochemical histone modifications and transcription factor recruitment. The lincRNA-p21, HOTAIR, XIST, AIR, and other lncRNAs can bind RNA-binding or chromatin-remodeling proteins and thereby act as guides that direct further remodeling complexes, co-activators, or repressors to specific genomic loci [50, 51] (Table 1, Fig. 2d). Xist can bind the transcription factor YY1, which is able to bind both RNA and DNA, thereby attaching Xist to the X chromosome. Together with PRC2, they form the nucleation center and squelch gene expression by competing with the transcription machinery [52].

Fig. 2

a lncRNAs can serve as decoys to control the actions of DNA-binding proteins, e.g., PANDA. b Scaffold structures of lncRNAs with protein partners form functional units for gene regulation, e.g., TERC. c Enhancer-encoded lncRNAs act specifically on their target genes through chromatin loops, e.g., CISTR-ACT and HOTTIP. d Either DNA-bound adaptor proteins bind lncRNAs, or DNA-bound lncRNAs serve as guides for further functional processes, e.g., Xist or HOTAIR. Triple-helix formations of lncRNA with the DNA double helix are possible

lncRNAs in development and disease

In addition to their epigenetic functions in X-chromosome inactivation, imprinting, and the co-activation or repression of genes, lncRNAs have been assigned various functions in cellular homeostasis, during development, and in the pathogenesis of diseases. The half-STAU1-binding site RNAs (½-sbsRNAs) co-activate STAU1-mediated mRNA decay by dsRNA formation to regulate the degradation of translationally active mRNAs [53]. The terminal differentiation-induced lncRNA (TINCR) regulates somatic tissue differentiation by binding to differentiation-mediating mRNAs for proper translation [54]. TINCR directly binds to the STAU1 protein, thereby stabilizing differentiation mRNAs. Braveheart (Bvht) was identified as a lncRNA responsible for establishing cardiovascular lineage commitment and maintaining the cardiac fate [55]. Bvht exerts its functions through interaction with SUZ12, an important subunit of PRC2. EGO (eosinophil granule ontogeny) plays a role during eosinophil development [56]. Fendrr is a lateral mesoderm-specific lncRNA that controls the differentiation of the mesoderm and its developing derivatives, the heart and body wall, through binding to the histone-modifying complexes PRC2 and TrxG/MLL [57]. Fendrr-lacking embryos showed dysregulation of mesoderm-specific transcription factors and reduced PRC2 enrichment at their loci. Gomafu is involved in neuronal and retinal development. Its downregulation, together with its binding of splicing factors and the resulting altered splicing patterns, was associated with schizophrenia [58]. The muscle-specific lncRNA linc-MD1 is a competing endogenous RNA (ceRNA) that controls muscle differentiation by sponging miRNAs. Moreover, linc-MD1 seems to be involved in the pathogenesis of Duchenne muscular dystrophy [59]. The lncRNA FMR4 affects the ratio of proliferation to apoptosis and was silenced in patients with fragile X syndrome [60].
CISTR-ACT was dysregulated due to chromosomal translocations in two different families with the autosomal-dominant Mendelian chondrodysplasia brachydactyly type E (BDE). Chromosome 12 translocations physically separated CISTR-ACT from the major chondrogenic morphogene PTHLH and caused dysregulation of both the coding gene and the lncRNA [39]. These data underscore the important interface between genome conformation and gene-lncRNA regulation. The suppression of UBE3A-ATS can activate UBE3A in patients with the neurogenetic disorder Angelman syndrome [61, 62]. The severe phenotypes of leukemia, myelofibrosis, sarcoma, and vasculitis were detected in Xist-depleted mice and suggest that Xist represses cancer in vivo. The loss of Xist seemed to reactivate the X chromosome, ultimately leading to cancer through aberrant hematopoietic stem cells [63]. The metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), which is very abundantly expressed, co-localizes with splicing factors in nuclear speckles and regulates alternative splicing of pre-mRNAs [64]. Its association with migration, metastasis, and tumor growth in lung adenocarcinoma has been shown [65, 66]. MALAT1 is required for mitotic proliferation and seems to mediate its activity positively through activated p53 and B-MYB, an oncogenic transcription factor. Thus, the dysregulation of splicing factors and alternative splicing led to the dysregulation of cell-cycle-regulated transcription factors that promote cellular proliferation [66]. For the non-polyadenylated MALAT1, a 3′ triple helix was found that served as a translational enhancer, was inhibited by miRNAs, and was proposed to play a major role in the regulation and stabilization of MALAT1 [67]. HOTAIR was proposed as a diagnostic marker in breast and colorectal cancer. Its depletion resulted in reduced invasiveness, and its expression level correlated with differentially regulated genes of the PRC2 complex [25, 68].
Recently, upregulated HOTTIP and HOXA13 expression was associated with the prognosis and progression of hepatocellular carcinoma (HCC) [69]. The lncRNA HULC (highly upregulated in liver cancer) was found in the blood of HCC patients, which makes it a promising potential biomarker [70]. HULC sponges several miRNAs, such as miR-372, leading to transcriptional inhibition of target genes, e.g., the transcription factor CREB. The CREB motif within the HULC promoter supports CREB-mediated upregulation in liver cancer through an autoregulatory mechanism that blocks miR-372 function [71]. Moreover, HULC correlated with upregulated hepatitis B virus X protein (HBx), which contributes importantly to HCC and was able to promote HULC expression. The HULC-mediated downregulation of the tumor suppressor p18 supported HCC proliferation [72]. The expression of the BACE1 antisense transcript (BACE1-AS) was linked to increased amyloid-β 1–42 in patients with Alzheimer’s disease and suggested a stabilizing function of the lncRNA [73]. Aberrant ANRIL transcripts and mutations were associated with cardiovascular disease and cancer [74, 75]. Linear and circular ANRIL transcripts were found in patients with atherosclerosis [74]. The prostate cancer-associated ncRNA transcript 1 (PCAT-1) [76, 77], SChLAP1 (second chromosome locus associated with prostate-1) [78], and CTBP1-AS [79] indicate cancer cell invasiveness and metastasis in prostate cancer progression. SChLAP1 antagonizes the tumor-suppressing functions of the SWI/SNF chromatin-remodeling complex [78, 80], and CTBP1-AS not only represses CTBP1 by interacting with histone deacetylases and the transcriptional repressor PSF but also inhibits tumor-suppressor genes in a general manner [79].

Current research directions

The idea that lncRNAs are nuclear addresses, or addressors, plays a major role in gene and genome regulation. How the regulators themselves are regulated is not clear; the lncRNAs also need to be directed. One hypothesis is that the interplay between the marginally expressed lncRNAs, tissue-specific TFs, and histone modifications ensures tissue-specific gene expression. To what extent does stochastic coincidence play a role? In the human genome, 46 chromosomes containing ~3 × 10⁹ bp communicate while undergoing dramatic conformational changes during mitosis. Nevertheless, the cellular and nuclear infrastructures persist to fulfill the specialized cellular tasks.

The lncRNAs are often highly tissue specific. Although the mechanisms of their specific target-gene regulation are barely understood, lncRNAs have potential therapeutic value. To date, most therapeutic agents serve an inhibitory function. Blocking lncRNAs could lead to the upregulation of genes and thus have a stimulatory effect. The subclass of natural antisense transcripts (NATs), shown in Table 1, can be degraded or prevented from binding their target mRNA by single-stranded antagonistic oligonucleotides (antagoNATs) [81]. The endogenous de-repression of genes could be the key in various haploinsufficiencies. Moreover, lncRNAs that are upregulated in cancer and normally exhibit decoy functions could also be targets for antagoNATs [81]. Previously, antisense oligonucleotides (ASOs) were successfully applied to silence the RNA gain-of-function effect in the hereditary degenerative disease myotonic dystrophy type 1 (DM1), and a long-term Malat1 knockdown in muscle was effective [82]. Currently, siRNAs are being introduced therapeutically in patients.

Detailed systems biology approaches are needed to locate, annotate, and characterize lncRNAs in development and disease. Various genetic model systems have to be established to understand the functional roles of lncRNA:protein interactions that modulate chromatin-remodeling complexes and gene and genome regulation, and to investigate the lncRNA-associated pathogenesis of diseases or developmental defects. In two different projects that generated Malat1 knockout mice, no apparent phenotype or alteration of murine development was observed [83, 84]. Only genes located in cis to Malat1 were differentially expressed [84]. The lncRNA NEAT1 is highly expressed in the mammal-specific nuclear paraspeckles (Footnote 4). Interestingly, NEAT1-depleted mice had no phenotype, suggesting that these nuclear structures are environmentally provoked [85]. In Malat1-depleted mice, which again showed no phenotype, Neat1 was downregulated in several tissues lacking Malat1, indicating its dispensability in mouse paraspeckles [86]. These data indicate that human-specific lncRNAs may exist that do not exert their human functions in animal models. In contrast, a Hotair deletion in mice was associated with malformation of the spine and metacarpals and a general, non-selective de-repression of genes [87]. In the latest study of 18 knockout models for approved lncRNAs, only five displayed apparent phenotypes [88]. The results indicate that long-term and more precise phenotyping could reveal additional subtle and highly tissue-specific behavioral phenotypes. The lncRNAs that have been associated with clinically apparent phenotypes are only the tip of the iceberg. The detailed molecular analysis of lncRNAs will deepen the understanding of nuclear regulation networks and uncover more pathogenic lncRNAs or circRNAs that can serve as clinically relevant prognostic or diagnostic biomarkers or as therapeutic targets.