11.1 Introduction

Only 2% of the human genome consists of protein-coding genes, and the rest of the genome was once considered “junk.” However, over the past decades, advances in sequencing technologies and analytical methods have led to the discovery of tens of thousands of RNAs that are pervasively transcribed from mammalian genomes but are apparently not translated into proteins (Bertone et al. 2004; Carninci et al. 2005; Kapranov et al. 2007). The subset of these RNAs greater than 200 nucleotides in length is collectively termed long noncoding RNAs (lncRNAs), many of which have been shown to have functions just as rRNA, tRNA, and snRNA do (Cech and Steitz 2014). Moreover, lncRNAs have been recognized as key regulatory molecules in gene expression programs (Guttman and Rinn 2012; Geisler and Coller 2013; Quinn and Chang 2016), and recent studies have identified lncRNAs that play critical roles in cellular function, development, and disease (Ponting et al. 2009; Rinn and Chang 2012; Schmitt and Chang 2016). Different types of lncRNAs, such as circRNA and sno-lncRNA, are generated by distinct processing mechanisms (Chen 2016b; Quinn and Chang 2016).

lncRNAs are expressed in a more tissue- and development-specific manner than mRNAs are, and this characteristic makes them suitable as biomarkers for diagnostic and therapeutic applications (Ulitsky and Bartel 2013). Although the vast majority of lncRNAs remain uncharacterized, a recent CRISPRi screen of 16,401 lncRNA loci (using the CRiNCL guide RNA library, available to academic users via Addgene) in seven human cell lines identified 499 lncRNAs that affect cell viability (Liu et al. 2017). Interestingly, most of them (89%) affected cell growth in only one of the seven cell lines, suggesting that lncRNA functions are cell type-specific and demonstrating that this method could be used as a platform to study these functions. These observations support the utility of lncRNAs for therapeutic applications.

One important feature of lncRNAs is that they carry positional information within the nucleus (Batista and Chang 2013; Engreitz et al. 2016b). By contrast, protein-coding RNAs lose such information after transcription because they are exported to the cytoplasm before translation. Consistent with this, lncRNAs regulate expression of nearby genes in cis (Engreitz et al. 2016a). In addition, lncRNAs have been proposed to act as spatial amplifiers that control gene expression and three-dimensional genome architecture (Engreitz et al. 2016b). lncRNAs function in several biological processes, including epigenetics, histone modification, locus-specific gene regulation, enhancers, chromatin remodeling, transcriptional regulation, and posttranscriptional regulation (Quinn and Chang 2016). Mechanistically, lncRNAs can act as guides, scaffolds, architectures, decoys, or enhancers (Guttman and Rinn 2012; Hirose et al. 2014a). For example, several lncRNAs act as chromatin regulators by recruiting and integrating chromatin regulatory proteins at specific chromatin sites (Rinn and Chang 2012).

In some cases, the specific RNA sequences themselves are not necessary; instead, the process of transcription itself at a specific locus is an important regulatory cue for the expression of nearby genes (Engreitz et al. 2016a; Paralkar et al. 2016). Notably, some RNAs annotated as lncRNAs are in fact translated to produce small polypeptides that are biologically active (Anderson et al. 2015; Nelson et al. 2016; Matsumoto et al. 2017). In skeletal and heart muscle cells, small peptides conserved among species are produced from lncRNAs and modulate the activities of membrane-bound proteins.

The functional characterization of lncRNAs is ongoing, and one of the next important challenges is to classify lncRNAs according to their functional RNA elements, i.e., the RNA sequences or structures that mediate interactions with specific RNA-binding proteins (RBPs) (Hirose et al. 2014a) (Fig. 11.1). Understanding the RNA elements hidden in lncRNAs would open the door to new applications of lncRNAs, e.g., interference with specific functions of lncRNAs and engineering of artificial lncRNAs. Similar approaches using modular domains of RNAs and proteins have already been used to engineer ribozymes, RNPs, and proteins. Thus, identification and characterization of the modular domains of lncRNAs would expand the toolbox for a wide variety of experimental applications in molecular biology, biotechnology, and therapeutics.

Fig. 11.1

A concept for modular RNA domains and RNA motifs in lncRNAs. lncRNAs are surrounded by numerous RBPs, some of which recognize specific RNA sequence or structural motifs. Some RBP binding sites, and combinations of RBPs bound to specific sites (e.g., dashed boxes), form functional modular RNA domains that perform specific cellular functions.

11.2 Examples of lncRNAs with Applications

Here, I summarize several examples of lncRNAs, along with potential applications, focusing on their modular RNA functional domains, biogenesis, mechanisms of action, and in vivo functions (Table 11.1).

Table 11.1  Examples of lncRNAs with applications

The lncRNA called SINEUP (RNA containing SINE elements that UPregulate translation) is an antisense lncRNA that promotes translation of its target mRNAs (Takahashi and Carninci 2014; Zucchelli et al. 2015a). This class of lncRNAs was originally identified in mouse as an antisense transcript of the Uchl1 gene, AS Uchl1 (Carrieri et al. 2012). AS Uchl1 spans a 73 nt region overlapping the 5′ untranslated region (UTR) and translational start site of the Uchl1 mRNA and promotes translation of UCHL1 protein without affecting mRNA stability. Outside the overlapping region, AS Uchl1 consists of inverted SINEB2 (short interspersed nuclear element B2) elements. Promotion of translation depends on the overlapping region, called the binding domain (BD), and an inverted SINEB2 element called the effector domain (ED) (Zucchelli et al. 2015b). By changing the sequence of the BD into the antisense sequence of another mRNA, synthetic SINEUPs can be designed that target an mRNA of interest and function in trans (Zucchelli et al. 2015a). These synthetic SINEUPs can promote translation in multiple species and cell types, including human, monkey, hamster, and mouse cells in vitro and mice in vivo (Zucchelli et al. 2015a; Indrieri et al. 2016; Zucchelli et al. 2016). In general, this approach yields protein upregulation from 1.5- to 3-fold. miniSINEUPs that exclusively contain the BD and ED (~250 nt in length) are also active, and these shorter sequences are easier to deliver by viral vectors, or as naked synthetic SINEUPs, for therapeutic use (Zucchelli et al. 2015a, b). Among the available methods for increasing protein production, SINEUPs have two advantages: (1) they increase protein production without introducing stable genomic changes and (2) induction is typically moderate and within the physiological range (~2-fold). These features make SINEUPs appropriate for use in research, protein manufacturing, and therapeutics (Zucchelli et al. 2015a, b).
As an example of a therapeutic approach, some inborn diseases are caused by haploinsufficiency of the causative gene, and in such cases, synthetic SINEUPs could be used to upregulate the sole remaining wild-type allele. Alongside SINEUPs, another strategy for upregulating gene expression is being developed on the basis of natural antisense transcripts (NATs) (Wahlestedt 2013). NATs are transcribed in an antisense direction, in proximity to or overlapping with their sense mRNA partners, and usually repress the expression of the partner mRNAs. Therefore, inhibition of NATs can derepress their partner mRNAs (Katayama et al. 2005). Many overlapping NATs have been identified in human (Engström et al. 2006). Oligonucleotides targeting NATs, called antagoNATs, hold promise for therapeutic applications aimed at upregulating the expression of target mRNAs (Wahlestedt 2013).
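
The BD design rule for synthetic SINEUPs described earlier in this section (replace the BD with the antisense of a region of the target mRNA around its translational start site) can be illustrated with a short Python sketch. This is a toy illustration under stated assumptions, not the published design procedure: the function names, the window sizes, and the example sequence are hypothetical, and the first AUG in the sequence is used as a stand-in for the annotated start codon.

```python
# Toy sketch: derive a SINEUP-style binding domain (BD) as the antisense of a
# window around the start codon of a target mRNA.
# Assumptions: window sizes are arbitrary; the first AUG stands in for the
# annotated translational start site.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the antisense (reverse-complement) RNA sequence."""
    seq = rna.upper().replace("T", "U")
    return seq.translate(COMPLEMENT)[::-1]


def design_binding_domain(mrna: str, upstream: int, downstream: int) -> str:
    """Antisense of the mRNA window [AUG - upstream, AUG + downstream)."""
    seq = mrna.upper().replace("T", "U")
    start = seq.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon found")
    window = seq[max(0, start - upstream): start + downstream]
    return reverse_complement(window)


if __name__ == "__main__":
    toy_mrna = "GGCCAUUACGAUGGCUAAGCCU"  # hypothetical sequence
    print(design_binding_domain(toy_mrna, upstream=4, downstream=6))
    # prints AGCCAUCGUA
```

In a real design, the BD would be joined to an effector domain (the inverted SINEB2 element), and the window would be chosen from the annotated 5′ UTR and start site of the target mRNA rather than from a simple string search.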

Circular RNA (circRNA), another class of lncRNAs, is produced by back-splicing from pre-mRNAs (Chen 2016a; Salzman 2016). Because they cannot be targeted by exoribonucleases, circRNAs are generally quite stable. Thousands of circRNAs have been identified in several species, including human, mouse, and C. elegans. In functional terms, these RNAs act as molecular sponges for miRNAs and RBPs. For example, CDR1as sequesters miR-7, and circMBL sequesters RBP MBNL (Hansen et al. 2013; Memczak et al. 2013; Ashwal-Fluss et al. 2014). The sequences and structures of circRNAs are critical for this sequestration. Accordingly, understanding the rules underlying these interactions would lead to novel applications of circRNA for specific sequestration of RNAs and RBPs of interest. Over the course of efforts to understand the biogenesis of circRNA, systems have been developed for expression of circRNAs in cells, thus expanding their potential usage (Liang and Wilusz 2014). In addition to the noncoding features of circRNAs, they have recently been shown to be translated into proteins. For example, Circ-ZNF609 is a circRNA that can be translated into a protein with a function in myogenesis (Legnini et al. 2017). Furthermore, circRNAs are abundant in specific conditions and cell types, suggesting that they might be suitable as biomarkers or therapeutic targets (Chen 2016a). Moreover, fusion-circRNAs generated from oncogenic translocation contribute to cancer cell survival and oncogenic potential in vivo (Guarnerio et al. 2016).
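
As a rough illustration of how sponge capacity might be estimated from sequence alone, the following Python sketch counts sites on a circular RNA that are complementary to the seed of a miRNA (nucleotides 2 to 8), including sites that span the back-splice junction. It is a deliberate simplification: real miRNA targeting depends on more than perfect seed matches, and the function name and both sequences are hypothetical toy examples rather than CDR1as or miR-7.

```python
# Toy sketch: count perfect seed-match sites for a miRNA on a circular RNA.
# Assumptions: a "site" is the reverse complement of miRNA nucleotides 2-8;
# the sequences used below are made up for illustration.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def count_seed_sites(circ_rna: str, mirna: str) -> int:
    """Count seed-complementary sites, treating circ_rna as circular."""
    circ = circ_rna.upper().replace("T", "U")
    seed = mirna.upper().replace("T", "U")[1:8]   # nucleotides 2-8
    site = seed.translate(COMPLEMENT)[::-1]       # sequence expected on the target
    k = len(site)
    # Append the first k-1 nucleotides so that sites spanning the
    # back-splice junction are also detected.
    linearized = circ + circ[: k - 1]
    return sum(1 for i in range(len(circ)) if linearized[i: i + k] == site)


if __name__ == "__main__":
    toy_mirna = "UACGGAUCAGUCAGUCAGUCA"     # hypothetical miRNA
    toy_circ = "CCGUAAAAGAUCCGUAAAAGAU"     # hypothetical circRNA
    print(count_seed_sites(toy_circ, toy_mirna))  # prints 2
```

In this toy case one site lies inside the sequence and the other is formed only when the 3′ end is joined to the 5′ end, which is why the circular topology has to be taken into account when scanning circRNAs.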

lncRNAs such as HOTAIR and UPAT regulate the degradation of specific proteins. HOTAIR plays roles in target gene repression and cancer-induced ubiquitin-mediated proteolysis by providing scaffolding for E3 ubiquitin ligases such as Dzip3 and Mex3b and ubiquitination substrates such as Ataxin-1 and Snurportin-1 (Yoon et al. 2013). These E3 ligases and substrates associate with specific RNA domains of HOTAIR, suggesting its potential application in targeted degradation. On the other hand, UPAT lncRNA prevents proteolysis of epigenetic factor UHRF1 by blocking its association with the E3 ubiquitin ligase β-TrCP (Taniue et al. 2016). The specific RNA domains on UPAT responsible for its binding to partner proteins have not yet been identified.

1/2-sbsRNA (half STAU1-binding site) promotes the degradation of target mRNAs via STAU1-mediated mRNA decay (SMD) by providing binding sites for STAU1, an RBP that binds double-stranded RNAs (Gong and Maquat 2011). 1/2-sbsRNA forms imperfect base pairs with target mRNAs. In contrast to 1/2-sbsRNA, TINCR lncRNA directly associates with STAU1 and interacts with its target mRNAs via the 25 nt TINCR box motif, which is enriched in its target sequences (Kretz et al. 2013).

Enhancer RNAs (eRNAs), which are transcribed from enhancers, are important determinants for cell lineages (Kaikkonen et al. 2013). Consequently, inhibition of eRNAs influences specific genes in specific cell lineages, leading to the idea of “enhancer therapy.” Many lncRNAs play crucial roles in epigenetic regulation. XIST lncRNA is one of the most extensively studied lncRNAs, and its RNA domains and interacting proteins have been identified (Chu et al. 2015; McHugh et al. 2015; Chen et al. 2016). In addition, lncRNAs play major roles in cancer biology. Multiple lncRNAs, including PVT1, CCAT2, PCAT-1, SAMMSON, MALAT1, and NEAT1, are associated with cancer progression and repression (Schmitt and Chang 2016). As a further example, NORAD sequesters PUMILIO2 proteins in the cytoplasm and plays a critical role in genome stability (Lee et al. 2016). Therefore, elucidation of the in vivo cancer-related functions of lncRNAs could lead to therapeutic and diagnostic applications.

11.3 Architectural RNAs (arcRNAs) and Their Potential Applications

In this section, I will focus on architectural RNAs (arcRNAs), a class of lncRNAs that serve as architectural components of nuclear bodies, i.e., cellular bodies within the nucleus. Eukaryotic cells compartmentalize cellular materials to organize and promote essential cellular functions. Cells possess two types of compartments: cellular organelles, which are surrounded by lipid bilayers, and membraneless organelles, also known as cellular bodies (Courchaine et al. 2016; Banani et al. 2017). The latter compartments, which are typically composed of specific sets of proteins and RNAs, are fundamental cellular compartments required for specific biochemical reactions, RNP assembly, storage of proteins and RNAs, and sequestration of proteins and nucleic acids. Numerous cellular bodies have been identified to date, including the nucleolus, perinucleolar compartment, nuclear speckles, paraspeckle, Cajal body, gems, PML body, histone locus body, Sam68 nuclear body, stress granules, and P-bodies. Typically, cellular bodies exchange their constituents dynamically. Recent work showed that phase separation between distinct material states (i.e., liquid, hydrogel, and solid) is a key mechanism underlying the formation of these compartments (Wu 2013; Alberti and Hyman 2016; Banani et al. 2017). Proteins containing prion-like domains (PLDs) or low-complexity domains (LCDs), which are unstructured and prone to aggregate, play essential roles in the formation of phase-separated cellular bodies (Aguzzi and Altmeyer 2016; Uversky 2016).

Although many cellular bodies are proteinaceous, some nuclear bodies have RNAs as their architectural cores (Chujo et al. 2016). This concept was originally proposed based on the identification of NEAT1 (nuclear paraspeckle assembly transcript 1), a nuclear-retained lncRNA that is an essential component of the paraspeckle, a nuclear body (Clemson et al. 2009; Sasaki et al. 2009; Sunwoo et al. 2009). In addition to NEAT1, other lncRNAs play similar architectural roles in the construction of nuclear bodies in various species, suggesting that this is a general function of lncRNAs. For example, heat shock RNA (hsr) omega is the lncRNA for the omega speckle in Drosophila melanogaster, and meiRNA is the lncRNA for the Mei2 dot in S. pombe (Chujo et al. 2016). Accordingly, we refer to such RNAs as architectural RNAs (arcRNAs) (Chujo et al. 2016). An arcRNA can be defined as a lncRNA that localizes in a specific nuclear body and is essential for its integrity. In this section, I focus on the arcRNAs in mammals and describe the known arcRNAs, focusing on their biogenesis, biological functions, relationship to diseases, and potential applications (Table 11.2).

Table 11.2  Mammalian arcRNAs

11.3.1 NEAT1 lncRNA

Several groups identified NEAT1 lncRNA as an essential architectural component of the paraspeckle, which was originally identified as a distinct nuclear body localized adjacent to nuclear speckles (Clemson et al. 2009; Sasaki et al. 2009; Sunwoo et al. 2009). The paraspeckle is a massive (~360 nm diameter), highly ordered RNP structure comprising more than 60 kinds of proteins, several of which are required for paraspeckle biogenesis (Souquere et al. 2010; Naganuma et al. 2012; Fong et al. 2013; Yamazaki and Hirose 2015). The DBHS (Drosophila melanogaster behavior human splicing) family of proteins, SFPQ, NONO, and PSPC1, have coiled-coil structures that form homo- or heterodimers; two of these, SFPQ and NONO, are essential for expression of NEAT1 and formation of paraspeckles (Sasaki et al. 2009; Naganuma et al. 2012; Passon et al. 2012; Lee et al. 2015). In addition, several paraspeckle proteins, including FUS, DAZAP1, HNRNPH3, and the SWI/SNF complex components BRG1 and BRM, are essential for paraspeckle integrity (Naganuma et al. 2012; Kawaguchi et al. 2015). Many paraspeckle proteins are RBPs with a PLD or LCD, some of which are essential for paraspeckle formation (Naganuma et al. 2012; Yamazaki and Hirose 2015). In addition, the PLDs of FUS and RBM14 are essential for paraspeckle integrity (Hennig et al. 2015).

The NEAT1 gene is located on chromosome 11q13 in human and chromosome 19qA in mouse. In both species, the gene encoding another abundant nuclear lncRNA, MALAT1, is adjacent to NEAT1. The NEAT1 lncRNA has two isoforms, NEAT1_1 (~3.7 kb) and NEAT1_2 (~22.7 kb), which are produced from the same transcription start site under the control of the same promoter and then subjected to alternative 3′-end processing (Naganuma et al. 2012). The NEAT1_1 lncRNA has a poly(A) tail, whereas NEAT1_2 has a unique triple-helix structure at its 3′ end that stabilizes cognate RNAs (Wilusz et al. 2012). Similar cis-acting RNA structures, some of which are called ENE (element for nuclear expression), are found in MALAT1 lncRNA, genomic RNAs of diverse viruses including Kaposi’s sarcoma-associated herpesvirus, and ~200 transposable element RNAs in plants and fungi (Conrad and Steitz 2005; Brown et al. 2012; Tycowski et al. 2012; Wilusz et al. 2012; Tycowski et al. 2016). In addition to its role in the stabilization of RNAs, the triple-helix structure increases their translation rates (Wilusz et al. 2012). An important role in 3′-end processing of NEAT1_2 is played by a tRNA-like structure, located just after the triple-helix structure, that is required for 3′-end cleavage of the NEAT1_2 transcript (Wilusz et al. 2008). This tRNA-like structure is processed by RNase P, which is involved in tRNA maturation.

Importantly, the long isoform NEAT1_2 is essential for paraspeckle formation, whereas the short isoform NEAT1_1 is dispensable (Naganuma et al. 2012). RNA polymerase II inhibition rapidly disrupts the paraspeckle, suggesting that paraspeckles form co-transcriptionally and are highly dynamic in nature (Fox et al. 2002). This idea is supported by the direct observation of de novo formation of the paraspeckle during transcription of the NEAT1 locus (Mao et al. 2011). An elegant electron microscopic study demonstrated that NEAT1 is spatially organized within paraspeckles (Souquere et al. 2010). Specifically, the 5′ and 3′ ends of NEAT1_2 are located on the periphery of the paraspeckle, whereas the middle portion is located in the interior. These data indicate that NEAT1 is folded and arranged within the paraspeckle, suggesting that the paraspeckle has a highly ordered structure that may contribute to the formation and functions of this nuclear body. A recent super-resolution microscopic study showed that paraspeckles are typically spherical and that specific proteins are localized to specific domains within a paraspeckle, implying a core–shell spheroidal structure (West et al. 2016).

Several studies have described the molecular functions of NEAT1. For example, NEAT1 regulates several specific types of RNAs, including IRAlu (inverted repeated Alu elements)-containing RNAs, of which 333 are present in human. mRNAs that contain IRAlu in their 3′ UTR are thought to be retained in paraspeckles (Chen and Carmichael 2009). In mouse, CTN mRNAs are retained in a manner dependent upon the paraspeckle component NONO/p54nrb and are exported in response to certain stimuli (Prasanth et al. 2005). AG-rich RNAs are enriched at the surface of paraspeckles, although the biological importance of this phenomenon is unknown (West et al. 2016). In addition to regulating RNAs, the paraspeckle sequesters proteins and thus controls the free availability of these proteins in the nucleoplasm. Paraspeckle proteins such as SFPQ, which functions as a transcription activator or repressor dependent upon context, are sequestered in paraspeckles, thereby controlling expression of their target genes (Hirose et al. 2014b; Imamura et al. 2014). Together, paraspeckles function in gene regulation as molecular sponges for both RNAs and proteins. A study using the CHART (capture hybridization analysis of RNA targets) method showed that NEAT1 binds actively transcribed genes in specific chromosome loci, suggesting possible roles in direct regulation of these genes (West et al. 2014).

NEAT1 is induced by several stress-related, developmental, and pathological conditions. Proteasome inhibition by compounds such as MG132 and bortezomib induces NEAT1 expression, and NEAT1 knockout (KO) mouse embryonic fibroblasts are sensitive to MG132 treatment (Hirose et al. 2014b). NEAT1_2 is expressed in many cell lines, but not in ES cells. In mice, however, extensive investigation of NEAT1 expression by in situ hybridization revealed that NEAT1_2 is only expressed in a subset of cell types in tissues, whereas NEAT1_1 is expressed in the majority of cell types (Nakagawa et al. 2011, 2014). Consistent with this observation, paraspeckles are absent from most cells. Among the tissues that do have paraspeckles, NEAT1_2 is highly expressed in the corpus luteum. Consistent with this expression pattern, defects in pregnancy are observed in NEAT1 KO mice (Nakagawa et al. 2014). Also, paraspeckles are assembled in luminal epithelial cells in the mammary gland during development; accordingly, NEAT1 KO mice also exhibit defects in mammary gland development and lactation (Standaert et al. 2014).

In addition to the in vivo functions of NEAT1 under normal conditions, multiple studies have revealed its role in diseases. In particular, several reports have demonstrated the critical importance of NEAT1 in cancer. NEAT1 lncRNA is dysregulated (mainly upregulated) in many types of cancer. Moreover, NEAT1 is a prominent target gene of tumor suppressor p53 (Blume et al. 2015; Adriaens et al. 2016). NEAT1 expression is induced, and paraspeckles are assembled, when p53 is activated either pharmacologically or by oncogene-induced replication stress (Adriaens et al. 2016). In this context, NEAT1 prevents DNA damage resulting from replication stress (Adriaens et al. 2016). Paraspeckles form in tumor tissues, whereas normal tissues adjacent to cancer lack paraspeckles. Strikingly, NEAT1 KO mice exhibit impaired skin tumorigenesis, indicating that NEAT1 promotes tumorigenesis (Adriaens et al. 2016). Furthermore, depletion of NEAT1 sensitizes cancer cells to chemotherapy by modulating DNA damage responses such as the ATR–CHK1 signaling pathway, suggesting a synthetic lethal interaction between NEAT1 and chemotherapeutic agents (Adriaens et al. 2016). Together, these observations strongly suggest that the NEAT1 lncRNA is a prominent target for increasing the genotoxicity of cancer chemotherapeutics.

In addition to being regulated by p53, paraspeckle formation is also induced in response to tumor hypoxia via transcriptional upregulation of NEAT1_2 by HIF2-alpha, thereby promoting cancer cell survival (Choudhry et al. 2014, 2015). Consistent with this, NEAT1 is the most upregulated lncRNA in prostate cancer. NEAT1 expression is regulated by estrogen receptor alpha and is associated with prostate cancer progression (Chakravarty et al. 2014). NEAT1 has been proposed to drive oncogenic growth by promoting epigenetic changes in the promoters of its target genes (Chakravarty et al. 2014). The short isoform NEAT1_1, but not NEAT1_2, promotes gene expression by inducing active chromatin states in prostate cancer, indicating that the two isoforms have distinct functions in this context. Conversely, expression of NEAT1_2, but not NEAT1_1, predicts the response of ovarian cancer to platinum-based chemotherapy, providing further evidence that the isoforms play different roles (Adriaens et al. 2016). Very recently, NEAT1_1 was detected outside of the paraspeckles, where it forms numerous nucleoplasmic “microspeckles,” which were proposed to have paraspeckle-independent functions (Li et al. 2017). Accordingly, the precise dissection of NEAT1_1 and NEAT1_2 functions is essential for the development of applications of NEAT1. Beyond the dysregulation of its expression, NEAT1 is also highly mutated in several cancers, including liver cancer, although it remains unclear how these mutations affect NEAT1 functions (Fujimoto et al. 2016).

NEAT1 expression is also induced by viruses, including Japanese encephalitis virus, influenza virus, herpes simplex virus, measles, rabies virus, and hantavirus, as well as in HIV-infected cells (Saha et al. 2006; Zhang et al. 2013; Imamura et al. 2014; Ma et al. 2017). In the case of influenza virus, NEAT1 is induced via the Toll-like receptor 3 (TLR3) pathway, resulting in elongation of paraspeckles (Imamura et al. 2014). The resultant elongated paraspeckles regulate immune-responsive genes, including interleukin-8 (IL-8), by sequestering paraspeckle proteins including SFPQ away from their target gene promoters, as in the case of paraspeckles induced by proteasome inhibition. Taken together, these observations highlight the broad importance of NEAT1 in antiviral responses.

NEAT1 is also implicated in several neurodegenerative diseases. For example, NEAT1 is upregulated in the early stage of amyotrophic lateral sclerosis (ALS), but is not expressed in the same cells under normal conditions (Nishimoto et al. 2013). In addition, NEAT1 is significantly upregulated in the brains of frontotemporal dementia (FTD) patients (Tollervey et al. 2011; Tsuiji et al. 2013). NEAT1 is upregulated in Huntington’s disease, and NEAT1_1 expression prevents neuronal death in cell culture, suggesting that NEAT1 plays a protective role in Huntington’s disease (Sunwoo et al. 2017). By contrast, NEAT1 is acutely downregulated in response to neuronal activity (Barry et al. 2017). Many paraspeckle proteins are also linked to neurodegenerative diseases, including TDP-43, FUS, SS18L1, HNRNPA1, SFPQ, EWSR1, TAF15, and HNRNPH1 (Yamazaki and Hirose 2015; Taylor et al. 2016).

As described above, NEAT1 has pleiotropic functions in diseases and functions in a context-dependent manner. Precise dissection of the functions of NEAT1_1 and NEAT1_2 and their roles in physiological conditions is important for the development of applications. Notably in this regard, phosphorothioate-modified antisense oligonucleotides (ASOs) can induce NEAT1-free paraspeckle-like foci in the nucleus (Shen et al. 2014). Similar approaches might enable intervention in the formation of various nuclear bodies, including paraspeckles, in vivo.

11.3.2 rIGS lncRNAs

Stresses induce dramatic changes in silent genomic regions. The nucleolar intergenic spacer (IGS), where heterochromatin normally forms and which is therefore transcriptionally silent, produces ribosomal IGS (rIGS) lncRNAs in response to several stresses, including acidosis, heat shock, hypoxia, serum starvation, aspirin treatment, and DNA damage (Audas et al. 2012a, b, 2016; Jacob et al. 2012, 2013). Knockdown of rIGS lncRNAs disrupts the recruitment of target proteins, implying that rIGS lncRNAs are arcRNAs (Audas et al. 2012a; Jacob et al. 2013). Under stress, rIGS lncRNAs form large subnucleolar structures called amyloid bodies (A-bodies, also called nucleolar detention centers [DCs]) (Audas et al. 2016). Unlike other nuclear bodies, A-bodies sequester and immobilize key cellular proteins in the nucleolus, as demonstrated by the immobilization of protein components revealed by FRAP (fluorescence recovery after photobleaching), proteinase K insensitivity, and staining with amyloid dyes such as 8-anilino-1-naphthalenesulfonate, Congo red, and Amylo-Glo (Audas et al. 2012a, 2016; Jacob et al. 2013). Proteomic analysis revealed that A-bodies are characterized by physiological amyloids mediated by rIGS RNAs and the amyloid-converting motif (ACM), an arginine/histidine-rich sequence that is enriched in the A-body proteome (Audas et al. 2016). Proteomic analysis also showed that A-bodies have heterogeneous protein compositions and that their components include heat shock proteins such as HSP27, HSP70, and HSP90. The activity of heat shock proteins is required for disassembly of A-bodies, suggesting that the amyloid-like state of their protein components is reversible (Audas et al. 2016). Accordingly, it has been proposed that A-bodies serve to store large quantities of proteins in a dormant state.

There are several rIGS lncRNAs, including rIGS28RNA, rIGS16RNA, rIGS22RNA, and rIGS20RNA, which are transcribed from regions ~28, 16, 22, and 20 kilobases (kb), respectively, downstream of the rRNA gene loci under specific stress conditions (Audas et al. 2012a). A-bodies constructed by these rIGS lncRNAs are functionally and compositionally distinct. For instance, the stress-responsive transcription factor HIF1A is degraded by the von Hippel–Lindau (VHL) ubiquitin E3 ligase under normal conditions. However, under acidosis, rIGS28RNA is induced and sequesters VHL, along with other cellular proteins, in the nucleolus. Knockdown of rIGS28RNA prevents localization of VHL to the nucleolus under acidosis (Audas et al. 2012a). In addition to VHL, many nuclear proteins, including DNA methyltransferase 1 (DNMT1) and DNA polymerase delta 1 (POLD1), are also sequestered (Audas et al. 2012a). Under heat shock conditions, HSP70 is sequestered in the nucleolus in a rIGS16RNA- and rIGS22RNA-dependent manner (Audas et al. 2012a). Moreover, either heat shock or acidosis immobilizes RNA polymerase I subunits and rRNA processing proteins in A-bodies, thereby halting ribosome biogenesis. In addition, MDM2 is sequestered in the nucleolus by rIGS20RNA under transcriptional stress (Audas et al. 2012a; Jacob et al. 2013). Acidic and hypoxic conditions are thought to be prevalent in cancer microenvironments. Consistent with this, in human cancer tissues, rIGS28RNA is induced, and several A-body components have been detected (Audas et al. 2016). In addition, rIGS28RNA-depleted MCF7 and PC3 cells form larger tumor masses in nude mouse xenograft assays (Audas et al. 2016).

The ability of rIGS lncRNAs to convert proteins into physiological amyloids by inducing a phase transition to a solid state is highly intriguing. Further dissection of specific RNA elements in rIGS lncRNAs may lead to the development of a method to induce such phase transitions in cellular contexts.

11.3.3 Satellite III lncRNA

Satellite III (Sat III) lncRNAs are primate-specific and essential components of the nuclear stress body (nSB) (Cotto et al. 1997; Chiodi et al. 2000; Denegri et al. 2002; Biamonti 2004; Jolly and Lakhotia 2006; Biamonti and Vourch 2010). nSBs, which have a diameter of 0.3–3 micrometers, form in response to heat shock and several chemical stress conditions and are present in humans and monkeys, but not in rodents. Their formation is initiated by the transcription of Sat III lncRNAs, which are not expressed under non-stressed conditions, from pericentromeric heterochromatic regions (primarily 9q12 in humans) (Chiodi et al. 2000; Denegri et al. 2002; Jolly and Lakhotia 2006; Biamonti and Vourch 2010). Consistent with the conditions under which nSBs form, Sat III lncRNAs are strongly induced under various stress conditions, including heat shock (Valgardsdottir et al. 2008; Biamonti and Vourch 2010). They are thought to be polyadenylated RNAs containing repetitive sequences with multiple tandem GGAAU or GGAGU repeats connected by linker sequences (Choo et al. 1990; Biamonti 2004; Biamonti and Vourch 2010). The protein components of nSBs include several transcription factors, such as HSF1, HSF2, TonEBP, TDP-43, and Sam68, as well as several splicing factors, such as SAFB, SRSF1, SRSF7, and SRSF9 (Denegri et al. 2002; Biamonti 2004; Biamonti and Vourch 2010). Formation of nSBs is initiated through a direct interaction between HSF1 and Sat III lncRNA (Biamonti and Vourch 2010). Like paraspeckles, the nSB is a dynamic structure, as determined by FRAP measuring HSF1 dynamics (Audas et al. 2016). nSBs are thought to function in stress response and recovery by globally influencing gene expression via sequestration of transcription and splicing factors (Biamonti 2004; Biamonti and Vourch 2010). Our group reported that the SWI/SNF complex is required for the formation of nSBs, as well as paraspeckles, suggesting its general importance in the formation of nuclear bodies containing arcRNAs (Kawaguchi and Hirose 2015).
When Sat III lncRNAs are knocked down, SRSF1 and SAFB cannot localize to nSBs, although HSF1 still localizes in nSBs, suggesting that Sat III can act as a scaffold for RBPs (Valgardsdottir et al. 2008). Consistent with this, RRM2 of SRSF1 is required for its targeting to nSBs, suggesting that splicing factors are recruited to Sat III lncRNA via direct RNA binding (Chiodi et al. 2004). HSF1 itself is essential for the integrity of nSBs (Goenka et al. 2016). Furthermore, artificial tethering experiments showed that Sat III lncRNA can initiate de novo formation of nSBs, indicating that it plays an architectural role (Shevtsov and Dundr 2011). In addition to stress conditions, Sat III is upregulated in A431 epidermoid carcinoma cells, as well as in senescent cells, implying that it plays a role in cancer and senescence (Enukashvily et al. 2007).

11.3.4 HSATII RNA

In addition to the Sat III lncRNA, other satellite DNAs are transcribed into noncoding RNAs (ncRNAs) under specific conditions. High-copy satellite II (HSATII) DNA is normally methylated and silenced, but DNA methylation of HSATII is frequently lost in cancer (Ting et al. 2011). Hence, HSATII RNA is aberrantly expressed in many tumors. Small blocks of HSATII are found in the pericentromeres of 11 human chromosomes: chromosomes 2, 5, 7, 10, 13, 14, 15, 17, 21, 22, and Y (Hall et al. 2017). The transcribed HSATII RNA forms distinct large nuclear RNA foci, termed cancer-associated satellite transcript (CAST) bodies (Hall et al. 2017). These CAST bodies act as molecular sponges that sequester master epigenetic regulatory proteins, including MeCP2 (methyl-CpG-binding protein 2), and are thought to influence the epigenome in cancer cells. In that context, HSATII DNA and RNA form nuclear foci called cancer-associated Polycomb (CAP) bodies from the 1q12 mega-satellite, which also act as molecular sponges (Hall et al. 2017). CAP bodies sequester the Polycomb group complex PRC1. Sequestration by CAST and CAP bodies causes epigenetic instability, which is recognized as a hallmark of cancer (Hall et al. 2017). HSATII RNA is highly expressed in cancers, including pancreatic cancer, and is also expressed in preneoplastic pancreatic lesions, suggesting that it could serve as a biomarker for pancreatic cancer (Ting et al. 2011). HSATII RNA is also found in the circulating blood of cancer patients; accordingly, a sensitive detection system has been developed to identify this biomarker (Kishikawa et al. 2016). In general, cancer-specific nuclear bodies constructed by HSATII RNA and DNA may serve as biomarkers of epigenetic instability, with diagnostic utility in several cancers.

11.3.5 Other arcRNAs

Our group has screened for nuclear structures that are diminished by RNase treatment. This approach has identified known and novel nuclear bodies, including the Sam68 nuclear body and a novel structure called the DBC1 body (Mannen et al. 2016). These findings suggest that these nuclear bodies, which are present in a subset of cancer cells, are constructed by unidentified arcRNAs. Interestingly, the Sam68 and DBC1 nuclear bodies are joined by the adaptor protein HNRNPL, warranting investigation of the underlying mechanism for this feature (Mannen et al. 2016). The number of validated arcRNAs is still limited, but we recently reported a method for identifying arcRNA candidates on a genome-wide scale (Chujo et al. 2017). This technique takes advantage of a characteristic feature of NEAT1, namely, that it is difficult to extract by conventional RNA extraction methods using acid–guanidine–phenol–chloroform (AGPC) reagents (e.g., TRIzol, Thermo Scientific). Shearing with a needle or heating improves the extraction efficiency of NEAT1. We termed this feature “semi-extractability” and termed RNAs with this property “semi-extractable RNAs (seRNAs)” (Chujo et al. 2017). In addition to NEAT1, another arcRNA, rIGS16RNA, was semi-extractable, suggesting that this feature might help identify novel arcRNAs. A comparison of the expression levels of RNAs extracted by the conventional and improved methods, as determined by RNA-seq, revealed seRNAs throughout the genome, including 50 seRNAs in HeLa cells (Chujo et al. 2017) (Fig. 11.2). Some of the seRNAs formed distinct nuclear foci that were distant from the transcription sites, indicating the formation of nuclear bodies (Chujo et al. 2017). The list of seRNAs contains several putative arcRNAs, including LINE-1 and SPA, supporting the utility of this method for exploring arcRNAs on a genome-wide scale.

Fig. 11.2

An experimental procedure to search for semi-extractable RNAs. An example workflow for identifying semi-extractable RNAs from cultured cells is shown. The cells are suspended in an AGPC reagent such as TRIzol and divided into two groups for the conventional and improved extraction methods. In the conventional method, the samples are directly subjected to RNA extraction. In the improved method, the samples are subjected to needle shearing or heating prior to RNA extraction. Both RNA preparations are subjected to RNA-seq analysis, and the sequence reads from the two samples are then compared. Semi-extractable RNAs will be enriched in the RNAs extracted by the improved method.
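
The comparison step of this workflow can be illustrated with a minimal Python sketch. It assumes that each extraction method has already been summarized as one normalized expression value per transcript (for example, TPM), and it flags transcripts that are enriched in the improved extraction. The function names, the pseudocount, the twofold cutoff, the expression floor, and the toy values are all hypothetical choices made for illustration; they are not taken from Chujo et al. (2017).

```python
import math


def log2_extraction_ratio(conventional, improved, pseudocount=1.0):
    """Return log2((improved + p) / (conventional + p)) for each transcript
    present in both samples. The pseudocount p (an assumed value) avoids
    division by zero for transcripts undetected in one sample."""
    shared = conventional.keys() & improved.keys()
    return {
        name: math.log2(
            (improved[name] + pseudocount) / (conventional[name] + pseudocount)
        )
        for name in shared
    }


def call_semi_extractable(conventional, improved, min_log2_ratio=1.0,
                          min_expression=1.0, pseudocount=1.0):
    """Return (transcript, log2 ratio) pairs enriched by the improved method,
    sorted from most to least enriched. Both thresholds are hypothetical
    defaults, not values from the original study."""
    ratios = log2_extraction_ratio(conventional, improved, pseudocount)
    hits = [
        (name, ratio)
        for name, ratio in ratios.items()
        if ratio >= min_log2_ratio and improved[name] >= min_expression
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented expression values, for illustration only.
    conventional = {"NEAT1": 5.0, "control_mRNA_1": 200.0, "control_mRNA_2": 80.0}
    improved = {"NEAT1": 50.0, "control_mRNA_1": 210.0, "control_mRNA_2": 90.0}
    for name, ratio in call_semi_extractable(conventional, improved):
        print(f"{name}\tlog2 ratio = {ratio:.2f}")
```

In practice, a ratio computed from single samples would need replicates and a statistical test before any transcript is called semi-extractable; the sketch only shows the direction of the comparison.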

In addition to endogenous nuclear bodies, repeat RNAs form nuclear foci and play critical roles in the pathogenesis of microsatellite expansion diseases such as myotonic dystrophy, a genetic disorder with multisystemic symptoms (Wojciechowska and Krzyzosiak 2011; Belzil et al. 2013; Nelson et al. 2013; Ramaswami et al. 2013; Mohan et al. 2014). Typically, such RNA granules sequester proteins. For example, expanded CTG or CCTG repeats are present in the genomic DNA of patients with myotonic dystrophy type 1 or type 2, respectively (Wojciechowska and Krzyzosiak 2011). These repeats are transcribed into RNAs and form RNA foci that sequester RBPs such as MBNL; in mice, KOs of the corresponding genes cause myotonic dystrophy-like phenotypes (Kanadia et al. 2003). These data indicate that disease symptoms originate from toxic RNAs. Thus, understanding the mechanisms that dictate the biogenesis of these structures, which could be related to endogenous nuclear bodies, will help to develop therapeutic methods for these disorders.

11.3.6 Commonalities and Potential Applications of arcRNAs

In this section, we summarize the commonalities of arcRNAs and arcRNA-constructed nuclear bodies and discuss the reasons why RNA is used as a scaffold or platform for nuclear bodies. In addition, we address potential applications of arcRNAs.

First, most arcRNAs serve as molecular sponges for proteins and, in some cases, RNAs (Biamonti and Vourch 2010; Hirose et al. 2014b; Imamura et al. 2014; Audas et al. 2016; Hall et al. 2017). It is possible that biochemical reactions and RNP assembly can also take place in these compartments, as observed for proteinaceous nuclear bodies. The components of nuclear bodies are usually dynamic (Fox et al. 2002). In the case of rIGS lncRNAs, by contrast, the component proteins are completely detained within the bodies (Audas et al. 2012a, 2016). These data suggest that arcRNAs induce phase transitions among material states (i.e., liquid, hydrogel, and solid) and sequester different factors into distinct states.

Second, many protein components of arcRNA-constructed nuclear bodies harbor PLDs, LCDs, or intrinsically disordered domains, suggesting that aggregation-prone sequences play a role in the assembly of these bodies (Yamazaki and Hirose 2015). In fact, PLD-containing paraspeckle proteins are essential for the integrity of paraspeckles (Hennig et al. 2015). Whereas most proteins localizing in arcRNA-constructed nuclear bodies are dispensable for the integrity of the bodies, a few of them play an essential role in maintaining nuclear bodies, suggesting that they represent essential core factors.

Third, transcription of arcRNAs is essential for nucleation, which leads to the assembly of nuclear bodies. Shutdown of transcription quickly eliminates these bodies, indicating that arcRNA-regulated nuclear bodies are dynamically controlled and that sequestration by arcRNAs is reversible (Fox et al. 2002). Consistent with this, the arcRNAs characterized so far are all nuclear ncRNAs induced under stress and disease conditions (Chujo et al. 2016). The rIGS, Sat III, and HSATII lncRNAs are transcriptionally silenced under normal conditions but are dramatically induced in response to stress or disease (Biamonti and Vourch 2010; Audas et al. 2012a, 2016; Hall et al. 2017). Several arcRNAs are aberrantly expressed in diseases such as cancer and neurodegenerative disorders (Yamazaki and Hirose 2015; Hall et al. 2017). Therefore, it is conceivable that the primary physiological roles of arcRNAs are related to stress and disease, raising the possibility that identification of arcRNA candidates as seRNAs under various conditions might reveal important regulatory lncRNAs (Chujo et al. 2017).

Fourth, lncRNAs have an advantage as the architectural core of nuclear bodies: they do not require a reading frame for protein translation. Consequently, their sequences are presumably constrained only by the requirement to form RNA–protein interactions, and arcRNAs have more flexibility to add and change their sequences than proteins, in which changes might cause aggregation. This advantage allows lncRNAs to increase the diversity of their binding partners and the combinations of proteins that are incorporated into nuclear bodies, likely via interactions with unique sequences or structures that are independent of the proteins' protein–protein interaction potential (Chujo et al. 2016). This feature enables the formation of a wide range of nuclear bodies, allowing more rapid adaptation to changing circumstances. Human cells contain more than 1000 RBPs (Baltz et al. 2012; Castello et al. 2012, 2016), and the RBPs that bind to arcRNAs can in turn sequester proteins that interact with them. This type of scaffolding should be suitable for integrating diverse regulatory proteins into specific sites.

Fifth, various arcRNAs, including NEAT1 lncRNA and the arcRNAs derived from satellite regions, contain, or consist primarily of, repetitive sequences (Ulitsky and Bartel 2013; Chujo et al. 2016). Half of our genome comprises repeat sequences, including SINEs, LINEs, pseudogenes, endogenous viruses, and other repeats (Steitz 2015). Although these repetitive sequences are usually ignored in genomic and transcriptome analyses, arcRNAs derived from them could be important regulators of the genome.

Overall, many recent studies have demonstrated that nuclear bodies are phase-separated into liquid, hydrogel, or solid states in cells (Courchaine et al. 2016; Banani et al. 2017). RNA is a suitable molecule for nucleating nuclear bodies by inducing phase separation. Specifically, arcRNAs are thought to induce phase separation by increasing the local concentration of proteins containing PLDs or LCDs and/or via direct RNA–RNA interaction (Chujo et al. 2016). The arcRNAs validated to date induce phase separation into different material states, as mentioned above. These states are likely to be defined by the sequences of arcRNAs, which serve as scaffolds for specific RBPs. By understanding the principles underlying the regulation of nuclear bodies by RNA elements in arcRNAs, it might be possible to artificially control phase separation among several material states that have different molecular dynamics in cells. This could lead in turn to the development of biochemical micro-reactors, that is, cellular compartments that can achieve specific biochemical reactions by recruiting specific sets of proteins and nucleic acids. This strategy could enable intervention in cellular functions such as gene and epigenome regulation. In addition, as noted above, phosphorothioate-modified ASOs can induce NEAT1-free paraspeckle-like foci in the nucleus (Shen et al. 2014), and ASOs have recently been approved for therapeutic use in diseases such as spinal muscular atrophy (SMA). Accordingly, ASOs or related methods hold promise for the control of phase separation and nuclear body formation in vivo.

arcRNAs are an emerging class of lncRNAs with potential as therapeutic targets and in other applications. RNA-seq that takes advantage of semi-extractability could launch a new era in arcRNA research and lead to the identification of new biomarkers, therapeutic targets for diseases, and methods for intervening in cellular functions (Chujo et al. 2017).

11.4 Conclusion and Future Perspectives

Decoding RNA elements and their protein partners is analogous to understanding the domains or motifs of proteins (Hirose et al. 2014a). In the case of proteins, this understanding has led to the development of inhibitors or activators of specific functions of multifunctional proteins. Similarly, understanding lncRNA elements would help develop methods to repress or activate specific functions of lncRNAs, although most current approaches modulate all functions of a given lncRNA at once. As an example of domain-level targeting, locked nucleic acids (LNAs) have been used to target domains of XIST lncRNA, which is essential for X chromosome inactivation, and can repress domain-specific functions by blocking the interaction between XIST lncRNA and specific proteins (Sarma et al. 2010). Furthermore, RNA can dynamically change its secondary structure. Therefore, specific inhibition or activation of lncRNA domains could be achieved by modulating the corresponding RNA structures. It will be important to focus on the RNA elements and to accumulate knowledge about them.

We now have access to CRISPR interference (CRISPRi), a new method for exploring the functions of lncRNAs on a genome-wide scale, as described above (Liu et al. 2017). This is a very powerful tool for exploring functionally important disease-linked lncRNAs. Because lncRNAs are expressed in specific tissues and cells, they are suitable targets for therapeutics with minimal side effects. Many other new approaches have also been developed for investigating the biological importance of lncRNAs. One of them, CRISPR-mediated KO of lncRNAs in model organisms, is very useful for investigating the roles of these RNAs under physiological conditions. Although CRISPR is powerful, some lncRNAs are generated from multiple genomic loci, and therefore it might be difficult to repress their functions by this approach. In this case, an alternative strategy would be direct knockdown of lncRNAs. Because many lncRNAs are localized in the nucleus, siRNA-based methods are usually not capable of targeting them efficiently. However, knockdown using ASOs represents a powerful alternative approach for exploring the functions of nuclear-retained lncRNAs (Ideue et al. 2009). Several methods have also been designed to investigate the RNA elements in lncRNAs by elucidating the interactions between lncRNAs and proteins; these include RNA-centric technologies such as ChIRP (chromatin isolation by RNA purification), CHART (capture hybridization analysis of RNA targets), and RAP (RNA antisense purification) and protein-centric technologies such as CLIP (cross-linking immunoprecipitation) (Kashi et al. 2016). Technologies for probing RNA secondary structure would also be useful (Lu and Chang 2016). Together, these new strategies will help to expand our understanding of RNA elements in lncRNAs.