Introduction

The Encyclopedia of DNA Elements (ENCODE) project has revealed that at least 75% of the human genome is transcribed into RNAs, while protein-coding genes comprise only 3% of the human genome.1 Because of a long-held protein-centered bias, many of the genomic regions that are transcribed into non-coding RNAs (ncRNAs) had been viewed as ‘junk’ in the genome, and the associated transcription had been regarded as transcriptional ‘noise’ lacking biological meaning.2 The last decade has witnessed an explosive expansion in the understanding of biological function and clinical significance of ncRNA transcripts, exemplified by the large number of published reports linking microRNAs (miRNAs) and various human diseases including cancer.3 With the advancement of sequencing technology and bioinformatics, other types of short or long ncRNAs, such as endogenous small interfering RNAs, PIWI-interacting RNAs, small nucleolar RNAs, natural antisense transcripts (NATs), circular RNAs, long intergenic ncRNAs (lincRNAs), enhancer ncRNAs and transcribed ultraconserved regions (T-UCRs), have been characterized and classified.4, 5 Among these ncRNAs, long ncRNAs (lncRNAs), defined as being at least 200 nucleotides in length, have received much attention due to their abundant presence in the human genome, as well as their tissue-specific expression patterns and functional relevance in complex physiological and pathological processes.6 Distinct from the short miRNAs, the length of lncRNAs allows them to fold into more complex three-dimensional structures, likely to determine specific interactions of lncRNA with biomolecule partners such as transcription factors, histones or other chromatin-modifying proteins. 
Consequently, alterations in lncRNA expression levels could affect a broad spectrum of genes via their protein partners and as such cause profound phenotypic changes.7 LncRNAs could also have sequence-specific interactions with DNA or RNA in the form of duplex or triplex structures8, 9 and create complex regulatory networks composed of DNA, RNA and proteins.

The mapping of several lncRNAs to regulatory genomic regions such as promoters and enhancers1 indicates a possible involvement of these noncoding transcripts in gene regulation. In addition, genome-wide association studies (GWAS) revealed that <10% of the disease-related single-nucleotide polymorphisms (SNPs) are in exons of protein-coding genes, whereas nearly half of the disease-associated SNPs are outside protein-coding genes.10 Although lncRNA function remains largely unknown, recent studies have clearly demonstrated the functional importance of lncRNAs in embryonic development,11 cell differentiation12 and various human diseases including cancer.5, 13, 14 Mechanistically, lncRNAs that are transcribed from regulatory elements or cancer-associated genomic regions may cooperate with their genomic DNA elements to fine-tune the complex biological activities necessary for precise regulation. This might be of particular relevance for biological activities that do not obey ‘binary switch’ (on and off) regulation, but are instead regulated in a subtler, dosage-dependent manner.

The topic of lncRNA has been covered in several excellent in-depth review papers.13, 15, 16, 17, 18, 19, 20 Here, we focus on the interplay between DNA and lncRNA in the human genome and the relevance of these interactions in human cancer. We introduce various types of lncRNAs from regulatory genomic elements, summarize recently identified molecular mechanisms of DNA–RNA interaction in the context of cancer and discuss the clinical relevance of the findings.

The missing culprit genes

In many non-hypothesis-driven studies, large-scale genotyping of population-based samples is used to evaluate disease gene associations. Among these, GWAS have provided valuable information on the genetic variants involved in cancer risk, disease diagnosis, prognosis and treatment response.21 However, the molecular mechanisms underlying such links remain largely undefined, because many of these genetic variants (43%) are located in gene ‘desert’ regions that lack protein-coding genes13 (summarized in Table 1).

Table 1 Examples of cancer-associated lncRNAs containing cancer predisposition SNPs

Similarly, non-GWAS studies also point to the same observations that in many cases protein-coding genes are not the culprits responsible for disease phenotypes. This notion can be exemplified and supported by the role played by miRNA-15a/16-1 in chronic lymphocytic leukemia (CLL).22 A recurring pattern of 13q14.3 deletions was observed in CLL indicative of the presence of a tumor suppressor in this region. However, the protein-coding genes identified from this genomic region did not fulfill this tumor-suppressing function.22 Instead, two miRNAs were identified and subsequently proven by multiple studies to underlie the etiology of CLL.22 Following this initial finding, several studies identified numerous miRNAs involved in a broad spectrum of human malignancies. Nevertheless, miRNAs and protein-coding genes are not the only determining factors of disease phenotype. Other DNA regulatory elements may have an important role in causing morbid phenotypes by altering gene transcription modalities. Moreover, other types of ncRNAs are transcribed from cancer-associated genomic regions and participate in cancer pathogenesis.

‘Junk DNA’ encodes for lncRNAs

The protein-centered dogma had viewed genomic regions not coding for proteins as ‘junk’ DNA. We now understand that many lncRNAs are transcribed from ‘junk’ regions, and even those encompassing transposons, pseudogenes and simple repeats represent important functional regulators with biological relevance.23, 24 For the convenience of this review, we subdivided lncRNAs into several categories based on their genomic locus relative to protein-coding genes or their unique structural features. However, these classifications are not exclusive, and this grouping does not have any bearing on their biological activity or functional mechanisms.

Promoter-associated lncRNAs

Gene promoters interact with transcription factors and RNA polymerases to activate transcription.25 The recent identification of ncRNA transcripts located within the promoter region of several genes26, 27 has clearly indicated that more complex regulatory mechanisms should be envisaged. A tiling microarray study of ncRNAs mapping in the proximity of the transcription start site of 56 cell-cycle-related genes revealed extensive transcription activity, without protein-coding features, in the gene promoter regions. Among these lncRNAs, the non-spliced 1.5-kb ncRNA PANDA, transcribed from 5 kb upstream of the CDKN1A transcription start site, was proven to function in the DNA damage response.28 Interestingly, while CDKN1A mediates cell cycle arrest, PANDA promotes cell survival in response to DNA damage by preventing the transcription factor NF-YA from binding specific promoters of pro-apoptotic genes.28 This indicates that following DNA damage, both cell cycle arrest and anti-apoptotic functions (and possibly other functions) can be induced from the same locus, and a complex network will determine the biological phenotypes. In another study, a promoter-associated lncRNA complementary to the rRNA gene promoter was shown to bind the rRNA gene to form a lncRNA–DNA triplex. This RNA–DNA triplex prevents the binding of transcription termination factor 1 to the rRNA gene and recruits DNMT3b to silence it.8

Enhancer ncRNAs

Enhancers are defined as DNA elements that, independently of their proximity or orientation with respect to the gene transcription site, are able to enhance gene expression levels.29 Notably, many active enhancer regions are transcribed into lncRNAs.30 In mouse neurons, out of the 12 000 neuronal activity-regulated enhancers defined by p300/CBP occupation and histone H3 lysine 4 mono-methylation (H3K4me1), 2000 were found to bi-directionally express lncRNAs, termed ‘enhancer RNAs’ or eRNAs, that are predominantly non-polyadenylated.31 The positive association of eRNA expression at neuronal enhancers with the levels of nearby protein-coding genes suggests that eRNAs may regulate mRNA synthesis.31 In addition to eRNAs, polyadenylated ‘enhancer-like ncRNAs’ were identified from genomic enhancer regions and shown by RNA interference to activate neighboring protein-coding genes in cis.32 Furthermore, several T-lymphocyte-specific enhancers are bound by RNA polymerase II and general transcription factors, and express both polyadenylated and nonpolyadenylated lncRNAs.33

Evf2 lncRNA represents yet another example of enhancer RNA that regulates gene expression of the Dlx cluster through interaction with the transcription factor Dlx2.34 HOTTIP, an lncRNA expressed from the distal tip of the HoxA locus, drives expression of several HoxA genes.35 Using an engineered reporter plasmid, it was elegantly shown that HOTTIP activates HoxA genes by cis regulation.35 Recently, two lncRNAs highly expressed in aggressive prostate cancers, PRNCR1 and PCGEM1, were found to enhance transcription of ~2000 androgen receptor-responsive genes by binding to the androgen receptor.36 This study expanded the functional mechanisms of enhancer RNAs by demonstrating a sophisticated underlying mechanism of trans regulation.

T-UCRs

Ultraconserved regions (UCRs) are a subset of conserved genome sequences longer than 200 bp that show 100% identity between orthologous regions of the human, rat and mouse genomes. Although a high degree of genomic conservation usually indicates functional relevance, more than half of the 481 ultraconserved regions described by Bejerano et al.37 have no protein-coding potential. Microarray analysis showed that 93% of the UCRs have transcriptional activity in at least one tissue, and these are consequently referred to as T-UCRs. T-UCR profiling in a panel of 133 human leukemia and carcinoma samples and 40 corresponding normal tissues identified specific signatures associated with each cancer type. For instance, uc.349A and uc.352, both mapping to the familial CLL-associated fragile chromosomal region 13q21.33-q22.2, are differentially expressed in normal versus malignant B-CLL CD5-positive cells.38 Following the initial report, several studies have reported the importance of the role played by T-UCRs in cancer. For instance, uc.338, a T-UCR whose expression is markedly increased in human hepatocellular carcinoma compared with noncancerous adjacent tissues, promotes anchorage-dependent and anchorage-independent cell proliferation.39 Studies from our group showed that uc.475, a hypoxia-induced noncoding ultraconserved transcript, enhances cell proliferation specifically under hypoxic conditions.40 In addition, we identified a novel lncRNA, named CCAT2, transcribed from a highly conserved ‘gene-desert’ region, and encompassing the cancer-associated SNP rs6983267.
We showed that CCAT2 is an oncogenic lncRNA promoting chromosomal instability and colorectal cancer metastasis.41 More recently, the uc.283+A T-UCR was shown to interfere with miRNA processing by binding to primary miRNA-195 (pri-miR-195) via sequence complementarity.42 Despite these findings, the biological activities and functional mechanisms of the majority of T-UCRs still remain largely unexplored. It should be noted that for functional attribution of T-UCRs in human diseases, precise gene annotation is the key, and this requires rigorous analysis determining sense or antisense orientation of ncRNA, for instance, by northern blotting, strand-specific PCR and deep sequencing.
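The UCR definition given above (stretches longer than 200 bp with 100% identity across human, rat and mouse) can be expressed as a simple scan over a multiple alignment. The following is only a minimal sketch: the function name `ultraconserved_runs`, the `longer_than` parameter and the assumption that the orthologous sequences are already aligned column by column are ours for illustration, and the sketch is not the pipeline used by Bejerano et al.

```python
from typing import List, Tuple


def ultraconserved_runs(aligned: List[str], longer_than: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) alignment column ranges (end exclusive) in which
    every sequence carries the same base, with no gaps or Ns, and whose
    length is strictly greater than `longer_than`."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs: List[Tuple[int, int]] = []
    start = None  # start column of the current identical run, if any
    for i in range(width):
        column = {seq[i].upper() for seq in aligned}
        identical = len(column) == 1 and not (column & {"-", "N"})
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start > longer_than:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and width - start > longer_than:
        runs.append((start, width))
    return runs


# Toy alignment (made-up sequences); a small threshold keeps the example short.
human = "ACGTACGTAAGG"
rat = "ACGTACGTATGG"
mouse = "ACGTACGTAAGG"
print(ultraconserved_runs([human, rat, mouse], longer_than=5))
```

With real genomes the alignment step itself is the hard part; the scan only illustrates how strict the 100% identity criterion is, since a single mismatch in any one species ends a run.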

NATs

NATs are endogenous RNA molecules that are partially or fully complementary to protein-coding transcripts. According to their genomic origin, NATs can be separated into cis-NATs, which are transcribed from the same genomic loci as their sense transcripts but from the opposite DNA strand, and trans-NATs, which are transcribed from genomic regions that are distinct from those encoding their sense counterpart.43, 44 Although generally expressed at relatively low levels compared with the sense transcripts, NATs have been shown to effectively regulate the expression levels of their protein-coding targets.45 Systematic global transcriptome analysis suggested that ~70% of transcripts have antisense partners, and that perturbation of antisense RNA can alter the expression of the sense gene.46 NATs activate or inactivate sense gene transcription by mechanisms including epigenetic modifications.45 ANRIL is a NAT transcribed from the INK4A-INK4B gene-cluster locus encoding the tumor suppressor genes CDKN2A and CDKN2B.47 Through interaction with CBX7, a component of polycomb repressive complex 1 (PRC1) able to recognize H3K27me3-repressive marks, ANRIL recruits the protein complex to its locus for sustained repression of the INK4A-INK4B gene cluster.48 NATs also affect gene expression through post-transcriptional regulation such as splicing. During epithelial–mesenchymal transition, a NAT at the ZEB2 locus is transcriptionally activated. This ZEB2 NAT inhibits splicing of an internal ribosome entry site-containing intron, and positively regulates ZEB2 protein expression.49 The regulation of sense transcripts by NATs thus provides a natural way of increasing or reducing protein expression.
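The cis/trans distinction described above reduces to a coordinate comparison once a transcript is known to be complementary to its sense partner. The sketch below is illustrative only: the `Transcript` record, the `classify_nat` function and all coordinates are hypothetical, and complementarity itself is assumed to have been established beforehand (for example, by sequence alignment).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if the two transcripts share at least one genomic position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_nat(antisense: Transcript, sense: Transcript) -> str:
    """Classify a transcript already known to be complementary to `sense`.

    cis-NAT: same locus (overlapping coordinates), opposite strand.
    trans-NAT: any other genomic origin.
    """
    if overlaps(antisense, sense) and antisense.strand != sense.strand:
        return "cis-NAT"
    return "trans-NAT"


# Made-up coordinates, for illustration only.
sense = Transcript("geneA", "chr1", 1_000, 5_000, "+")
nat_cis = Transcript("natA_cis", "chr1", 4_000, 6_000, "-")
nat_trans = Transcript("natA_trans", "chr7", 200, 900, "+")
print(classify_nat(nat_cis, sense), classify_nat(nat_trans, sense))
```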

LincRNAs

Initially identified using histone marker signatures associated with RNA polymerase II, lincRNAs have received much attention because of their lack of overlap with protein-coding genes. Therefore, their effect can be characterized without ambiguity in the attribution of biological functions.19 HOTAIR is among the first lincRNAs to be functionally and mechanistically elucidated.50 Transcribed from the HOXC gene cluster, HOTAIR controls gene expression via a trans-effect, that is, by affecting transcription on chromosomes other than the one from which it is transcribed.50 This is achieved by interaction of HOTAIR with polycomb repressive complex 2 (PRC2) and LSD1, which promotes repressive histone marks (such as H3K27me3) to silence the HOXD locus.51 LincRNA-p21, a polyadenylated RNA transcribed from the upstream opposite strand to p21, is induced by DNA damage and acts as a downstream regulator of the p53 transcriptional response.52 LincRNA-p21 physically associates with hnRNP K through its 5′ end and represses p53-responsive apoptotic genes.52

The DNA-RNA twist in cancer genetics

The elucidation of the mechanisms underlying lncRNA function lags far behind the pace at which new lncRNAs are discovered. Although lncRNAs can be easily classified into different types according to their genomic locus or other features, this classification does not shed light on the mechanisms. Instead, lncRNAs from different classes may share similar molecular mechanisms. Generally, the mode of action of lncRNAs can be classified into cis and trans regulation, depending on whether the lncRNA regulates neighboring genes in the same chromosomal region where it is located or distant genes on other chromosomes, respectively (see Figure 1). In both cases, lncRNAs need to interact directly or indirectly with genomic DNA elements, in most cases with the assistance of proteins, to perform specific biological functions. In addition, SNP variants inside a lncRNA sequence may affect not only the function of the DNA element, but also the primary sequence, and possibly the higher-order structure, and consequently the activities of the lncRNA.

Figure 1

LncRNA functioning mechanisms via DNA–RNA interaction in cis or trans. (a) XIST loads onto its own genomic locus via YY1 and recruits the PRC2 complex to maintain repressive chromatin marked by H3K27me3 on the same X chromosome. (b) Enhancer RNAs transcribed from enhancer regions maintain enhancer-promoter looping by recruiting mediators and transcription factors, and enhance transcription of neighboring mRNA genes. (c) PRNCR1, transcribed from 8q24, and PCGEM1, produced from 2q32, bind to androgen receptor (AR) to promote the chromatin status H3K4me3 and activate the AR-regulated genes located distant from their genomic loci. (d) HOTAIR recruits PRC2 and loads onto distant genomic loci to initiate repressive chromatin marked by H3K27me3 and block HOX gene transcription.

Cis regulation within the genomic context

LncRNAs have several unique properties as cis-acting molecules.53 First, unlike proteins, lncRNAs are in close proximity to their genomic locus during transcription and are thus able to direct locus- and allele-specific regulation. Second, the length of lncRNAs allows them to bind multiple epigenetic complexes and to act as initiators or mediators of the genomic looping necessary for transcriptionally active chromatin. Third, their length allows lncRNAs to function while still being transcribed, and degradation signals acting immediately after transcriptional termination might prevent diffuse action at other genomic sites. Many lncRNAs mediate local functions in cis, interacting with chromatin-modifying proteins to regulate their neighboring genes. These include several previously mentioned enhancer RNAs and NATs. For instance, HOTTIP recruits the WD repeat domain 5 (WDR5)/mixed lineage leukemia (MLL) complex to drive the H3K4me3 signature and gene transcription of HoxA distal genes.35 Chromosomal looping brings HOTTIP close to its target genes.35 This mechanism was elegantly demonstrated with a luciferase reporter artificially tethered with HOTTIP.35 The lncRNA Mistral employs a similar mechanism, interacting with MLL to recruit it to, and activate, the Hoxa6 and Hoxa7 genes.54 The lncRNA ecCEBPA uses a different mechanism, binding to DNMT1 to prevent methylation of the CEBPA gene.55

The cis regulation could also elicit broader epigenetic changes, as in the cases of Xist, an lncRNA silencing an entire female X chromosome, and of several other lncRNAs regulating gene imprinting. Xist is transcribed exclusively from the inactive X chromosome in females, and tethered to the X inactivation center by the transcription factor Yin Yang 1 (YY1).56 Xist RNA coats the X chromosome and serves as a scaffold for recruitment of silencing factors such as PRC2.57 Interestingly, a repeated motif named ‘Repeat A’ within the Xist RNA encompassing a stem-loop structure was shown to be responsible for the recruitment of the PRC2 complex to the inactive X chromosome.58 As an example of regulating gene imprinting, the lncRNA Air, transcribed from the paternal allele, recruits G9a to methylate H3K9 residues over an adjacent 300-kb genomic region, thus silencing the expression of distantly located genes including Igf2r, Slc22a2 and Slc22a3 on the paternal chromosome.59

LncRNAs not only regulate protein-coding genes, but can also regulate neighboring lncRNAs. An example of this is the regulation of Xist by Tsix, a lncRNA transcribed in the antisense orientation in relation to Xist from the active X chromosome.60 Tsix recruits PRC2 and the methyltransferase DNMT3A to the Xist promoter, thus maintaining a repressive chromatin domain for long-term silencing of the Xist gene.61 In addition, Tsix and Xist can form RNA duplex structures, which are subsequently processed by the RNA interference machinery into small regulatory RNAs.62

Although it has been shown that the in cis mechanism employs genomic looping to exert a regulatory effect, whether lncRNAs are necessary to maintain the loop still remains to be determined. Lai et al.63 demonstrated that knockdown of either lncRNAs or mediator (coactivator complex bridging regulatory information from enhancers to the promoter) abolished the chromatin interactions, supporting a participation of both the mediator and the lncRNA in looping enhancer–promoter interactions. Further, the lncRNA–mediator interaction regulates the kinase activity of the mediator protein, and subsequently promotes phosphorylation of serine 10 on histone H3, a chromatin mark for transcriptional activation.63 However, the role of lncRNA in maintaining chromatin looping was not observed in other studies. For instance, depletion of HOTTIP did not disrupt looping chromatin architecture, as determined by high-throughput chromosome conformation capture.35 A recent study similarly suggests that chromatin looping linking p53-binding sites and their targets does not depend on the lncRNAs transcribed from the p53-binding sites.64

Trans regulation at distant genomic loci

The property of interaction with proteins such as transcription factors or chromatin modifiers suggests the possibility of trans regulation by lncRNAs able to act outside the genomic locus they map to. About 20% of all lincRNAs have PRC2 as an interaction partner to regulate gene expression, thus suggesting widespread trans-regulated chromatin remodeling, as previously characterized for HOTAIR.65 Similarly, a cross-linking immunoprecipitation followed by sequencing (CLIP-seq) study of RNAs associated with the SFRS1 splicing factor identified more than 6000 spliced ncRNAs.66 Although not yet experimentally proven, it can be envisioned that a single ncRNA could affect a wide range of genes regulated by SFRS1. A more recent study showing regulation of androgen receptor-responsive genes by PRNCR1 and PCGEM1 also represents a trans mechanism through which more than 2000 genes are regulated by lncRNAs.36

While it is clear that lncRNAs target proteins to exert their in trans effects, the factors determining the RNA–protein interaction are not well defined. Interestingly, several studies suggest that the secondary structure, rather than the primary lncRNA sequence, dictates a specific interaction. For instance, the tumor suppressor function of the MEG3 lncRNA depended on conservation of its secondary structure, but not of its primary sequence.67, 68 In addition, repetitive sequences were found to contribute to the interaction with protein partners. In the case of Xist, although it is a well-established example of cis regulation, it still illustrates the importance of higher-order structures in RNA–protein interaction. A cluster of nine repetitive elements within Xist was found to form stem-loop structures essential for the interaction with PRC2 and for H3K27 trimethylation, while another region encompassing repetitive elements was shown to bind to YY1 through a stem-loop structure tethering Xist onto the X chromosome.56, 69 Studies on short interspersed elements that are derived from transposons have also shown that repetitive sequences are the recognition domains for RNA polymerase II binding, and that such interactions lead to repression of mammalian heat-shock genes.70, 71

Another puzzling question regarding the mechanisms underlying in trans regulation is how lncRNAs recognize specific genomic loci. One possibility is that the primary or secondary structure of lncRNAs defines their preferred interaction with certain genomic regions. Using a technique named chromatin isolation by RNA purification (ChIRP), in combination with deep sequencing of genomic binding sites, an enriched binding motif was identified for HOTAIR.9 The exact structure responsible for such RNA–DNA interaction remains to be determined. Notably, a promoter-associated lncRNA forms a triplex with the transcription termination factor 1 binding site, and subsequently recruits DNMT3b to silence the rRNA gene.8 The specific recognition of genomic loci could also be achieved by the relay of protein partners, as illustrated by the activation of androgen receptor-responsive genes by PRNCR1 and PCGEM1 via interaction with the androgen receptor.36

Linking SNPs, lncRNAs and cancer

The fact that ~90% of disease-associated SNPs are in genomic regions not coding for proteins10 suggests that these ‘gene-poor’ regions may represent a ‘gold mine’ for the identification and characterization of novel lncRNAs. To facilitate such an effort, a lincSNP database has been established to link lncRNAs with disease-related SNPs.72 Although association does not necessarily mean a causal relationship between specific lncRNAs and disease phenotypes, the possibility of finding long-sought lncRNA culprits is a very attractive one. In addition, a disease predisposition SNP may flag the existence of a regulatory element of a gene whose function is only weakly affected by the SNP variant(s). These ‘disease predisposing’ SNPs could be located upstream, within, or downstream of the lncRNAs. Here, we only review the cancer-related lncRNAs that also encompass cancer-risk SNPs (see Table 1). ANRIL was found to be a risk-locus hotspot for gliomas and basal cell carcinomas in GWAS.73 The rs2151280 SNP variants located within the ANRIL gene were significantly associated with susceptibility to neurofibromas.74 Moreover, the T allele of rs2151280 was correlated with lower ANRIL levels, suggesting that this SNP variant could affect ANRIL expression.74 The rs2839698 and rs2107425 SNPs located within H19, a lncRNA with both oncogenic75 and tumor suppressive activity,76 were reported to be associated with bladder cancer risk.77 Rs2107425 was also found to confer increased breast cancer risk in a different study.78 HULC, a lncRNA involved in hepatocellular carcinoma, encompasses the rs7763881 SNP that determines susceptibility to hepatocellular carcinoma in HBV patients.79 The same group also found that the rs619586 variants, located within the MALAT1 gene, were associated with hepatocellular carcinoma risk, though with marginal significance.79
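Whether a predisposition SNP lies upstream, within or downstream of a lncRNA depends on the transcript's strand as well as on its coordinates. The following minimal sketch makes that explicit; the function name `snp_location` and the coordinates in the usage lines are hypothetical and do not correspond to any SNP or lncRNA discussed here.

```python
def snp_location(snp_pos: int, start: int, end: int, strand: str) -> str:
    """Locate a SNP relative to a transcript spanning [start, end).

    Upstream and downstream are defined relative to the direction of
    transcription, so they swap for transcripts on the minus strand.
    """
    if strand not in {"+", "-"}:
        raise ValueError("strand must be '+' or '-'")
    if start <= snp_pos < end:
        return "within"
    before_on_plus_axis = snp_pos < start
    if strand == "+":
        return "upstream" if before_on_plus_axis else "downstream"
    return "downstream" if before_on_plus_axis else "upstream"


# Made-up positions, for illustration only.
print(snp_location(150, 100, 200, "+"))  # inside the transcript
print(snp_location(50, 100, 200, "+"))   # 5' side of a plus-strand transcript
print(snp_location(50, 100, 200, "-"))   # 3' side of a minus-strand transcript
```

A SNP classified as ‘within’ can alter the lncRNA sequence itself, whereas upstream or downstream SNPs are more likely to act through regulatory DNA elements, which is why the distinction matters for interpreting Table 1.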

A twisted 8q24 genomic region

The 8q24 genomic region is frequently altered by amplification, deletion, viral integration or translocation in many types of human cancers.80 A large-scale study identified the 8q24 region as the most frequently (14%) amplified region among human cancers.81 In addition, GWAS point to 8q24 as a hotspot for cancer-associated SNPs owing to the density, strength and high allele frequency of these SNPs.82 However, the 2 Mb SNP-rich 8q24 region has nevertheless been considered a ‘gene desert’, largely because of the absence of functionally annotated genes, with the only notable exception of the MYC proto-oncogene.83 Several 8q24 loci have demonstrated enhancer activity, and it has been proposed that these enhancer activities might regulate MYC expression through looping with its promoter.84 Recently, several reports revealed that lncRNAs including CCAT1,85 CCAT2,41 CARLo-5,86 PVT1,87 PCAT1,88 and PRNCR1 36 are transcribed from these regions (Figure 2). Among these, CCAT2, PCAT1 and PRNCR1 encompass cancer predisposition SNPs (Table 1).41, 89, 90, 91, 92 Several of these lncRNAs (for example, CCAT1 and CCAT2) regulate MYC expression,41, 85 while the rs6983267 SNP that resides within the CCAT2 gene shows an allele-specific effect on the expression levels of the lncRNA CARLo-5.86 Recently, MYC copy-number gains were found to depend on PVT1 in chromosome-engineered mice.87

Figure 2

LncRNA and cancer predisposition SNPs on 8q24 genomic region. The 8q24.21 genomic region contains multiple lncRNA genes, located either upstream or downstream of the proto-oncogene MYC. Most of them have shown functional involvement in cancer, and some regulate MYC expression levels. The same region also features multiple cancer predisposition SNPs, either within or outside of the noncoding gene, suggesting a complex regulation network linking SNPs, lncRNAs and MYC.

The CCAT2 gene is located in a very special region. First, this genomic region has shown enhancer activity affected by the SNP variants.93, 94 Second, the rs6983267 SNP it encompasses is one of the most consistently identified predisposition SNPs in multiple types of cancer including colorectal cancer, prostate cancer, ovarian cancer, head and neck cancer and inflammatory breast cancer.95, 96 Third, its genomic sequence is highly conserved among mammals, supporting a functional role for this element.41 Deletion of the 8q24 region encompassing rs6983267 was found to reduce intestinal tumor multiplicity in ApcMin/+ mice.97 However, the genetic deletion removes not only the DNA enhancer elements, but also the CCAT2 gene, thus allowing for different explanations for the observed phenotypic changes. Our study showing MYC regulation via knockdown approaches suggests that CCAT2 could independently regulate MYC transcription. Analysis of colorectal cancer samples showed a correlation of MYC and CCAT2 at the transcriptional level, further providing experimental support for the causal relationship. Most interestingly, overexpression of CCAT2 transforms a chromosomally stable cell line with near-diploid status into a chromosomally unstable one, with a marked increase in polyploidy. This is in good agreement with the high CCAT2 expression levels found in microsatellite stable tumors, often characterized by aneuploidy, when compared with the near-diploid MSI-High colon tumors.41 Although we proved the oncogenic nature of CCAT2 in promoting chromosomal instability and colorectal cancer, whether the rs6983267 SNP variants affect CCAT2 function remains to be elucidated. In this regard, we reported a significant positive correlation between CCAT2 and MYC expression in GG samples but not in TT samples of CRCs.41
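A genotype-stratified comparison of the kind mentioned above (CCAT2 versus MYC expression in GG and in TT samples) can be sketched as a per-group Pearson correlation. This is an illustration only: the function names are ours, the numbers below are synthetic values chosen to show the mechanics and are not data from any study, and a real analysis would also report significance and handle heterozygous samples.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant values")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_genotype(
    samples: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Group (genotype, lncRNA level, target level) records by genotype
    and compute the Pearson correlation within each group."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for genotype, lnc_level, target_level in samples:
        groups[genotype][0].append(lnc_level)
        groups[genotype][1].append(target_level)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}


# SYNTHETIC values, not experimental data.
toy = [
    ("GG", 1.0, 2.0), ("GG", 2.0, 4.0), ("GG", 3.0, 6.0), ("GG", 4.0, 8.0),
    ("TT", 1.0, 3.0), ("TT", 2.0, 5.0), ("TT", 3.0, 5.0), ("TT", 4.0, 3.0),
]
print(correlation_by_genotype(toy))
```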

As MYC and its regulatory networks have been proposed as one of the most important drivers in colon cancer development (as implicated by the large-scale TCGA project),98 we hypothesize that a complex regulatory network containing DNA elements (enhancers) and RNA transcripts (lncRNAs) for the MYC gene is active in the 8q24 region and acts to fine-tune the expression and function of this critical gene. The concept of super enhancers, defined as large clusters of transcriptional enhancers driving gene expression, has also recently surfaced, and points to MYC regulation in the 8q24 region as a typical example.99

It is also possible that lncRNAs may have fundamental biological effects, independent of MYC transcription, and that these factors together initiate or promote cancer pathogenesis. A genome-wide association approach identified that 75% of the disease-associated SNPs affect expression of lncRNA, but not that of neighboring protein-coding genes.100 In addition, such effects are tissue-dependent, reflecting regulation of a complex trait.100 As we learned from PCAT1 and CCAT2, lncRNAs transcribed from the 8q24 locus may affect double-stranded DNA break repair101 and chromosome instability,41 which consequently exert a broader biological effect in promoting cancer pathogenesis.

The clinical relevance of the DNA-RNA twist in cancer

Many lincRNAs such as ANRIL,48 HOTAIR,102 PCAT-1,88 PRNCR1,36 PCGEM1,36 CCAT2,41 and MALAT1 103 have been shown to be associated with human cancer. Recently, XIST, a lncRNA for X-chromosome inactivation, was also shown to suppress hematologic cancer.104 The abnormal expression profiles and functional importance of lncRNAs in cancer suggest that this knowledge has translational potential for clinical applications in cancer patients.

LncRNAs are generally more tissue-specific than protein-coding genes and thus may be more specifically associated with certain cancer subtypes.6 This tissue-specific expression pattern can possibly enhance the utility of lncRNAs as biomarkers for the early diagnosis of localized cancers from different body fluids, for the detection of cancer metastasis, for the prediction of clinical outcome and/or for revealing the origin of metastatic cancers. For instance, increased MALAT1 expression levels predict metastasis and poor survival in early-stage NSCLC.105 Likewise, elevated HOTAIR levels are associated with poor prognosis in several cancer types including breast,102 liver,106 colorectal,107 gastrointestinal108 and pancreatic109 cancers. A mouse study demonstrated that HOTAIR initiates breast cancer metastases.102, 103 Also, CCAT2 levels in primary tumors showed an inverse correlation with metastasis-free survival of breast cancer patients.41, 110 Furthermore, a bioinformatics study identified 120 individual lncRNAs that are significantly associated with progression-free survival in prostate cancer.111

An ideal lncRNA biomarker must be robustly detectable in plasma and other biofluids such as urine. Although lncRNA stability in such environments remains largely unknown, several studies have suggested the potential of lncRNAs as biomarkers. MALAT1 fragment levels in patient plasma were found to significantly differentiate human subjects with prostate cancer from those without.112 The specific association of PCA3 with prostate cancer has been developed into the FDA-approved commercial Progensa PCA3 assay, which aids decisions on whether to recommend repeat prostate biopsies.113 The finding of lncRNA germline and somatic mutations in leukemia and colorectal cancer114 suggests that a combined strategy of genotyping the DNA sequence and measuring lncRNA expression levels may strengthen the disease connection.

The DNA–RNA coordination in determining a specific activity indicates that disruption of either component could have functional consequences. LncRNAs may therefore represent ideal therapeutic targets. Another attractive feature of lncRNA therapeutics is the capacity to increase protein output in a more natural way, for instance, by targeting NATs. The effect of cis-acting NATs may be more focused on a local gene, so such therapy could potentially have fewer off-target effects. Here, a clear understanding of the mechanism of a lncRNA within its genomic context is key to such therapeutic development.

Conclusion

‘One man’s junk is another man’s treasure’. The recent advances in lncRNA research have revealed transcriptional treasures from the once derided ‘junk’ DNA regions. Although currently only a small fraction of lncRNAs have been functionally characterized, we believe that the reservoir of functional lncRNAs will quickly expand as a result of many emerging technologies for high-throughput screening and functional validation. For instance, studies on protein interactions coupled with transcriptome data can be greatly facilitated by photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP);115 genomic occupancy sites of lncRNAs can be profiled by ChIRP and subsequent DNA sequencing;9 functional motifs within RNA can be detected by RNA–mechanically induced trapping of molecular interactions (RNA-MITOMI);116 and RNA movement can be traced by live imaging using engineered fluorescent RNAs.117 However, because of the extremely large number of lncRNAs in the human genome, it may be more practical to first focus on the disease-associated lncRNAs suggested by other studies, such as expression analyses and GWAS findings. Disease-related SNPs identified by GWAS can be useful markers to flag functional lncRNAs. In addition, lncRNAs identified in such regions, either functionally affected or altered in their expression levels by specific SNP variants, may be the culprits underlying the mechanisms of disease predisposition. Elucidating such mechanisms requires a detailed understanding of lncRNA structure and structure–function relationships, as well as a suitable experimental system to distinguish the subtle differences.

Owing to the tissue-specific expression patterns and site-specific action of lncRNAs, drugs targeting lncRNAs could achieve a more selective therapeutic effect than conventional drugs. In addition, the allele-specific regulatory mechanisms of lncRNAs may be exploited for precise control of gene expression, presumably with fewer side effects. Synthetic oligonucleotides with high affinity and specificity, such as those with locked nucleic acid modifications, allow for targeted regulation of lncRNA expression. Small-molecule compounds showing specificity towards a lncRNA could also be tested as candidates to disrupt lncRNA–protein interactions or to interfere with the loading of the lncRNA onto its target genomic regions.

The regulatory scheme in human cells is complicated, and it is rare that a single molecule can explain an entire disease phenotype. It can be envisioned that a specific genomic locus contains intertwined transcripts of many kinds, including protein-coding genes and overlapping intronic and non-coding RNAs in the sense or antisense orientation relative to the protein-coding genes, further complicated by the various isoforms produced by alternative splicing. Thus, a loss or gain of a genomic region, as frequently seen in cancer, will affect not only DNA regulatory elements but also the transcriptional landscape. This concept can be further expanded to include regulatory circuitry at several genomic loci containing both coding and non-coding genes, with reciprocal interactions and feedback loops that determine a disease phenotype. Hence, it is of critical importance to consider the genetic context, including gene locus, neighboring genes, chromatin status and target genomic regions, for comprehensive functional annotation or therapeutic manipulation in the battle against cancer.