Introduction

In the past decade, benefiting from improvements in large-scale sequencing technologies and bioinformatics methods, the completion of the Human Genome Project and the work of the ENCODE (Encyclopedia of DNA Elements) [1, 2] and FANTOM (Functional Annotation of the Mammalian Genome) consortia [3] have highlighted the prevalence of non-protein-coding functional elements in the human genome. Following the sequencing of the whole human genome, GENCODE, using next-generation sequencing and the study of genetic landmarks indicative of transcription, has annotated 60,498 genes in total (Version 23, March 2015 freeze, GRCh38). Of these 60,498 genes, GENCODE defines only 19,797 as protein-coding genes, and almost all of the others are classified as long non-coding RNA (lncRNA) genes, small non-coding RNA genes, or pseudogenes (http://www.gencodegenes.org/stats/current.html). Recently, accumulating evidence has revealed that lncRNAs play important roles in carcinogenesis and tumor progression through chromatin remodeling, epigenetic modification, and sponging of microRNAs (miRNAs). Since their discovery in 1977, pseudogenes have been considered non-functional genomic fossils or biologically inconsequential [4]. Generally, pseudogenes are derived from imperfect gene duplications or from retrotransposition of processed mRNAs back into the genome, and they are divided into three main categories based on how they were generated from their ancestral gene: processed pseudogenes, unprocessed pseudogenes, and unitary pseudogenes [5]. Unprocessed pseudogenes are generated by segmental duplication and disabled by mutations; they typically retain a promoter, introns, and exons, although these elements may have lost their function. The second major class, processed pseudogenes, is derived from reverse transcription of a parental gene’s mRNA and is therefore limited to a single exon in structure [6].
Unitary pseudogenes constitute only a small fraction of all pseudogenes and are a subclass of unprocessed pseudogenes; they have no identifiable parental gene in the genome in which they reside [7, 8]. One of them, the XIST gene, has been widely investigated for its key role in X-chromosome inactivation [9–11].

Although pseudogenes were once regarded as “genomic fossils” or “junk genes” [12, 13], recent studies have revealed multilayered biological functions of some pseudogenes in multiple cellular processes [14], especially in human cancers [15]. In addition, increasing evidence shows that pseudogenes play important roles in the transcriptional and post-transcriptional regulation of gene expression by functioning as antisense RNAs, endogenous small interfering RNAs (endo-siRNAs or esiRNAs), or endogenous competitors for miRNAs, or by interacting with RNA-binding proteins (RBPs) or the translational machinery (Fig. 1) [7, 16]. Moreover, non-coding RNAs longer than 200 nt that are transcribed from pseudogenes can be classified as lncRNAs. Nevertheless, pseudogene function remained little investigated for a long time; only recent improvements in genome-wide platforms have revealed that many pseudogenes are transcriptionally active and that their dysregulated transcripts contribute to several human diseases, including cancer. In this review, we discuss the functions and clinical relevance of pseudogene-expressed RNAs in cancer, with the aim of improving our understanding of the regulatory functions and mechanisms of pseudogenes in cancer development.

Fig. 1

Mechanisms of pseudogene-mediated regulation in cancer cells. a The pseudogene acts as a competing endogenous RNA, sponging microRNAs. b The pseudogene acts as a decoy for mRNA. c The pseudogene generates endogenous siRNAs. d The pseudogene acts as a decoy for protein. e The pseudogene recruits the histone-modifying proteins EZH2 and G9a to the target gene promoter, thereby regulating its transcription

Dysregulated pseudogene-expressed RNAs in cancers

Previous studies have shown that pseudogenes participate in competing endogenous RNA (ceRNA) interactions and contribute to tumorigenesis; however, their high homology to their parental genes makes pseudogene transcripts difficult to distinguish, which has hindered attempts to study pseudogenes on a large scale. Recently, Shanker et al. developed a bioinformatics pipeline for identifying transcribed pseudogenes from next-generation sequencing data of cancer samples and identified 2082 pseudogene transcripts. Of these, 218 were expressed only in cancer samples, and one breast cancer-specific pseudogene, ATP8A2-ψ, was further validated and found to be restricted to breast tumors with luminal histology. Subsequent gain- and loss-of-function assays in vitro revealed an oncogenic role for ATP8A2-ψ in breast cancer cells [17]. Meanwhile, Han et al. developed a similar computational pipeline, detected 9925 pseudogene transcripts in 7 cancer types from The Cancer Genome Atlas (TCGA) RNA-seq data, and found that many pseudogene transcripts are tissue- and/or cancer-specific. Importantly, this study was the first to systematically reveal the potential of pseudogenes as subtype and prognostic biomarkers in cancers [18]. In addition, Joshua and colleagues identified a set of 440 pseudogenes that are transcribed in breast cancer, 309 of which exhibit significant differential expression among breast cancer subtypes [19]. Improvements in sequencing accuracy and coverage depth, together with the increasing detection efficiency of these RNA-based bioinformatics pipelines, are facilitating research in this area. Below, we discuss the roles of several important pseudogene-expressed RNAs (Table 1) and their underlying mechanisms or regulated pathways in cancer development and progression.

Table 1 Cancer-related pseudogenes

The PTEN pseudogene PTENP1

In 2010, Poliseno et al. revealed that RNA molecules sharing microRNA response elements (MREs) can regulate each other by competing for microRNA binding, and it was hypothesized that protein-coding mRNAs and non-coding RNAs communicate with each other in a microRNA-dependent manner through an MRE “language” [40, 41]. Specifically, they identified several pseudogene transcripts that exert regulatory control over the expression of their ancestral cancer genes by competing for microRNAs; one of these is PTENP1 [20]. PTENP1, a processed pseudogene residing at 9p13.3, is highly homologous to the tumor suppressor gene PTEN, with only 18 mismatches throughout the coding sequence, and a missense mutation of the initiator methionine codon prevents its translation [42]. There is evidence that PTEN is post-transcriptionally regulated by numerous miRNAs in cancer cells, and conserved seed matches for the PTEN-targeting miR-17, miR-21, miR-214, miR-19, and miR-26 families are also found within the high-homology region of PTENP1. Furthermore, PTENP1 was found to regulate PTEN by acting as a decoy that competes for these PTEN-targeting miRNAs, whereas loss of PTENP1 expression released these miRNAs, which then targeted PTEN and reduced its protein levels [20]. In addition, PTENP1 over-expression repressed the tumorigenic properties of hepatocellular carcinoma (HCC) cells by decoying miR-17, miR-19b, and miR-20a, which would otherwise target PTEN, PHLPP, and autophagy genes such as ULK1, ATG7, and p62 [22]. Meanwhile, PTENP1 was found to be downregulated by methylation in clear-cell renal cell carcinoma tissues and cells, and PTENP1 suppressed cancer progression by functioning as a competing endogenous RNA that decoys miR-21 [23]. However, deletion of the PTENP1 and PTEN loci has been found in melanoma, suggesting that the roles of PTENP1 may go beyond acting as a decoy for PTEN-targeting miRNAs [43]. Interestingly, Johnsson et al. found that PTENP1 also encodes two antisense RNA (asRNA) isoforms, PTENP1 asRNA alpha and beta, indicating that the regulatory function of PTENP1 is more complex. The alpha isoform shares the most sequence with PTEN and recruits DNMT3a and EZH2 to the PTEN promoter to suppress its transcription, whereas the beta isoform forms RNA–RNA interactions with the PTENP1 sense transcript and stabilizes it, thereby affecting its ability to sponge PTEN-targeting miRNAs [44]. Although the involvement of PTENP1 asRNAs in human cancers has not yet been investigated, it is likely that PTENP1 plays both transcriptional and post-transcriptional regulatory roles in cancer cells.
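The shared-MRE reasoning applied to PTEN and PTENP1 can be illustrated with a short sketch of seed matching. It assumes the canonical 7mer-m8 rule, in which a target site is the reverse complement of miRNA nucleotides 2–8, and it ignores the site context, conservation, and relative expression levels that real target-prediction tools and ceRNA studies take into account. The function names and all sequences are hypothetical toy examples, not real PTEN, PTENP1, or miRNA sequences.

```python
# Minimal sketch of canonical seed-match scanning (7mer-m8 sites).
# Assumption: a site is the reverse complement of miRNA nucleotides 2-8.
# All function names and sequences here are hypothetical illustrations.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Normalize a DNA or RNA string to uppercase RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Return the 7-nt target motif complementary to miRNA positions 2-8."""
    seed = to_rna(mirna)[1:8]  # 0-based slice 1..7 == positions 2..8
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(target: str, mirna: str) -> list[int]:
    """Return 0-based start positions of 7mer-m8 seed matches in the target."""
    site = seed_site(mirna)
    rna = to_rna(target)
    return [i for i in range(len(rna) - len(site) + 1) if rna.startswith(site, i)]


def shared_mres(parent: str, pseudogene: str, mirnas: dict[str, str]) -> list[str]:
    """List miRNAs with at least one seed match in both transcripts.

    Sharing sites is only a prerequisite for ceRNA crosstalk; it does not
    show that the pseudogene actually sequesters the miRNA in cells.
    """
    return [
        name
        for name, seq in mirnas.items()
        if find_sites(parent, seq) and find_sites(pseudogene, seq)
    ]


if __name__ == "__main__":
    # Hypothetical toy miRNAs and transcripts (not real sequences).
    toy_mirnas = {
        "miR-toy-1": "UCCAGUCAAGGAUGUACGAUCA",
        "miR-toy-2": "UAAAAAAACCCCCC",
    }
    parent_3utr = "GGUGACUGGCC"
    pseudo_3utr = "AATGACTGGAA"  # DNA input is normalized to RNA
    print(seed_site(toy_mirnas["miR-toy-1"]))
    print(shared_mres(parent_3utr, pseudo_3utr, toy_mirnas))
```

Running the script prints the seed-site motif UGACUGG and then ['miR-toy-1']: only the first toy miRNA has a site in both toy transcripts, so only that pair would be flagged as a candidate for ceRNA crosstalk, which would still require experimental validation such as reporter or knockdown assays.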

The BRAF pseudogene

Although in vitro experiments over the past few years have demonstrated that pseudogenes contribute to cell transformation and tumorigenesis through several mechanisms, in vivo evidence for a role of pseudogenes in cancer development has been lacking. Recently, Karreth et al. generated a transgenic allele containing the murine B-Raf pseudogene Braf-rs1 and found that mice engineered to over-express it develop an aggressive malignancy resembling human diffuse large B cell lymphoma. Moreover, they revealed that the in vivo oncogenic function of this pseudogene depends partly on its regulation of B-Raf through decoying miR-134, miR-543, and miR-653. Similarly, its human ortholog, BRAFP1, also exerts oncogenic activity, at least in part as a ceRNA that elevates BRAF expression and MAPK activation. BRAFP1 over-expression increased BRAF and pERK levels as well as the proliferation of human cells, while BRAFP1 silencing reduced the proliferation of OCI-Ly18, H1299, and PC9 cells and significantly affected BRAF expression. Moreover, luciferase reporter assays showed that four human miRNAs (miR-30a, miR-182, miR-876, and miR-590) were able to repress both BRAFP1- and BRAF-luciferase reporters, indicating that BRAFP1 may act as an oncogenic ceRNA in human cancer [24].

The OCT4 pseudogenes

In contrast to the PTEN and BRAF pseudogenes discussed above, the number of known OCT4 pseudogenes has grown to six since the first OCT4-related pseudogene was reported [45]. OCT4 (POU5F1) is a transcription factor that plays critical roles in maintaining the pluripotency and self-renewal of stem cells [46]. Recent transcriptional investigations have revealed that several of its pseudogenes are transcribed; one of them, OCT4pg1, was initially reported by Kastler et al. to be a putative cancer susceptibility gene that is over-expressed in prostatic carcinoma [47]. OCT4pg1 (POU5F1B) is also reported to be amplified and highly expressed in gastric cancer (GC), where its amplification is associated with poor patient prognosis and confers an aggressive phenotype on GC cells. Importantly, over-expression of OCT4pg1 promoted GC cell colony formation in vitro as well as tumorigenicity and tumor growth in vivo, and knockdown of OCT4pg1 expression confirmed its role in promoting cancer cell and tumor growth. Moreover, OCT4pg1 over-expression upregulated various growth factors in GC cells and exerted angiogenic, mitogenic, and antiapoptotic effects in GC xenografts [34]. Another OCT4 pseudogene, OCT4-pg4, is also involved in carcinogenesis. OCT4-pg4 is abnormally activated in hepatocellular carcinoma (HCC), and its expression level is positively correlated with that of OCT4. Survival analysis suggests that a high OCT4-pg4 level is significantly correlated with poor prognosis in HCC patients. Mechanistic investigation revealed that OCT4-pg4 functions as a competing endogenous RNA that protects the OCT4 transcript from inhibition by miR-145, thus promoting HCC cell growth and tumorigenicity [35].
Meanwhile, the mouse Oct4P4 lncRNA can form a complex with the SUV39H1 histone methyltransferase (HMTase) to direct H3K9me3 deposition and HP1α recruitment to the promoter of the ancestral Oct4 gene, leading to its silencing and reduced self-renewal of mouse embryonic stem cells (mESCs) [48]. Unlike OCT4-pg1 and OCT4-pg4, OCT4-pg5 can encode an antisense RNA that negatively regulates the expression of OCT4 and of Oct4 pseudogenes 4 and 5. The Oct4-pg5 antisense RNA can recruit Ezh2 and G9a to the Oct4 promoter, leading to trimethylation of histone H3 lysine 27 (H3K27me3) and silencing of Oct4 transcription. Moreover, knockdown of PURA and NCL reduced Oct4 mRNA levels, possibly because these proteins interact with the Oct4-pg5 asRNA and sequester it away from its target loci [36].

The HMGA1 pseudogenes

HMGA1 belongs to the high-mobility group A (HMGA) family of nuclear proteins, which participate in the organization of nucleoprotein complexes and contribute to chromatin structure, replication, and gene transcription. The HMGA1 gene encodes two proteins, HMGA1a and HMGA1b, which bind to DNA, organize chromatin architecture, interact with several transcription factors, and regulate gene transcription. HMGA1 over-expression is a feature of human cancer, and high HMGA1 expression levels indicate poor prognosis in cancer patients [49]. Recently, Esposito et al. used bioinformatics analysis to investigate HMGA1 pseudogenes in cancer and identified and characterized two processed pseudogenes, HMGA1P6 and HMGA1P7, located at 13q12.12 and 6q23.2, respectively. HMGA1P6 and HMGA1P7 were over-expressed in highly aggressive human anaplastic thyroid carcinomas but not in less aggressive differentiated papillary carcinomas. Consistently, over-expression of HMGA1P6 and HMGA1P7 promoted the proliferation, cell cycle progression, migration, and invasion of 8505c cells, whereas knockdown of their expression impaired cell growth, increased the number of cells in the sub-G1 phase, inhibited cell migration and invasion, and induced apoptosis. Bioinformatic analysis revealed that HMGA1P6 and HMGA1P7 contain sequences that can be targeted by HMGA1-targeting miRNAs (miR-15, miR-16, miR-26a, miR-214, miR-548c-3p, and miR-761), and luciferase reporter assays revealed that miR-15, miR-16, miR-214, and miR-761 directly bind to HMGA1P6 and HMGA1P7 sequences. Importantly, over-expression of HMGA1P6 or HMGA1P7 markedly reduced the effects of these miRNAs on both HMGA1 transcript and protein levels, indicating that HMGA1P6 and HMGA1P7 act as decoys for HMGA1-targeting miRNAs to regulate HMGA1 levels.
Finally, the expression of HMGA1P6 or HMGA1P7 was significantly correlated with HMGA1 protein levels, implicating their over-expression in cancer progression [28].

The TUSC2P pseudogene

Tumor suppressor candidate 2 (TUSC2), also known as Fus-1, is a novel tumor suppressor gene that functions as a “gatekeeper” in the molecular pathogenesis of cancer; it may act as a proapoptotic factor and is involved in the release of cytochrome c from mitochondria. Analysis of the TUSC2 3′UTR sequence identified a TUSC2 pseudogene, TUSC2P, that shares 89% homology with the TUSC2 3′UTR. Interestingly, many miRNAs have binding sites in both TUSC2 and TUSC2P, including miR-661, miR-299-3p, miR-93, miR-17, miR-608, and miR-502. TUSC2P can bind to and antagonize these endogenous miRNAs, thereby modulating TUSC2, TIMP2, and TIMP3 expression; in the absence of TUSC2P, these miRNAs can bind to TUSC2, TIMP2, and TIMP3 mRNAs and inhibit their translation through the RNA-induced silencing complex. Moreover, ectopic over-expression of TUSC2P and the TUSC2 3′UTR inhibits cancer cell proliferation, migration, and invasion and induces cell death, suggesting that TUSC2P might be used as a combinatorial miRNA inhibitor with potential clinical applications [25].

The INTS6P1 and VEGFR-1 pseudogenes

As the ceRNA paradigm has refocused attention on pseudogenes, Peng et al. identified the putative tumor suppressor INTS6 and its pseudogene INTS6P1 in HCC through whole-genome microarray expression profiling. INTS6 and INTS6P1 were downregulated in HCC tissues compared with normal tissues, whereas miR-17-5p was upregulated in the same HCC tissues. INTS6 and INTS6P1 over-expression impaired HCC cell proliferation in vitro and tumor growth in vivo, inhibited cell migration, and induced apoptosis. In addition, increased miR-17-5p expression downregulated both INTS6 and INTS6P1, whereas inhibition of miR-17-5p de-repressed and thereby upregulated both transcripts. Finally, mechanistic experiments revealed that INTS6P1 and INTS6 are reciprocally regulated through competition for miR-17-5p in HCC cells [31]. Interestingly, plasma INTS6P1 levels were also significantly decreased in HCC patients compared with non-HCC patients, indicating that INTS6P1 may serve as a novel plasma-based biomarker that could improve the accuracy of HCC screening [50].

Similarly, Ye et al. reported an actively transcribed VEGFR1/FLT1 pseudogene (FLT1P1) that is transcribed bidirectionally in human colorectal cancer (CRC) cells. Knockdown of FLT1P1 by RNA interference (RNAi) markedly inhibited CRC cell proliferation and xenograft tumor growth in vivo. Furthermore, mechanistic investigation showed that the FLT1P1 antisense transcript inhibited the expression not only of VEGFR1 but also of non-cognate VEGF-A, through interacting with miR-520a in CRC cells [39].

The PPM1K and AOC4P pseudogenes

As mentioned above, most of the pseudogenes discussed so far regulate target gene expression by functioning as ceRNAs or otherwise interacting with microRNAs. However, Wang et al. identified a novel tumor-suppressive pseudogene, amine oxidase, copper containing 4, pseudogene (AOC4P), whose expression was significantly downregulated in HCC samples and negatively correlated with advanced clinical stage, capsular invasion, and vascular invasion. Decreased AOC4P expression was correlated with poor prognostic outcomes and may serve as an independent prognostic factor for HCC patients. Functional assays showed that AOC4P over-expression significantly reduced cell proliferation, migration, and invasion by inhibiting epithelial-mesenchymal transition (EMT), and in vivo experiments confirmed the ability of AOC4P to inhibit tumor growth and metastasis. RNA immunoprecipitation assays demonstrated that AOC4P could bind to vimentin protein and promote its degradation [32].

Interestingly, Chan et al. conducted a genome-wide bioinformatic survey to identify pseudogene-derived esiRNAs and identified a partial retrotranscript pseudogene, PPM1KP, containing inverted repeats capable of folding into hairpin structures that can be processed into two esiRNAs. These esiRNAs were significantly downregulated in HCC tumor tissues, and over-expression of PPM1KP decreased cell growth and clonogenic activity. Bioinformatics analysis predicted that these esiRNAs regulate the cognate gene PPM1K as well as NEK8 through multiple target sites. Consistently, PPM1K and NEK8 were downregulated in PPM1KP-overexpressing cells, and expression of NEK8 could counteract the growth-inhibitory effects of PPM1KP. These findings suggest that PPM1KP can exert tumor suppressor activity independently of its parental gene by generating esiRNAs that regulate human cell growth [38].

Others

In addition to the pseudogenes above, several other pseudogenes contribute to cancer development. Zheng et al. reported that ectopic expression of the CYP4Z2P 3′UTR in breast cancer cells increased VEGF-A expression without affecting cancer cell proliferation in vitro, but enhanced the proliferation and tube formation of HUVECs and promoted angiogenesis in in vivo models [27]. Further study indicated that increased CYP4Z2P 3′UTR expression may promote tumor angiogenesis in breast cancer partly via miRNA-dependent activation of PI3K/Akt and ERK1/2 [26]. Moreover, downregulation of the pseudogene TPTE2P1 inhibits migration and invasion of gallbladder cancer cells [37], while upregulation of SUMO1 pseudogene 3 (SUMO1P3) in gastric cancer is associated with poor patient prognosis [33].

Conclusion

Although pseudogenes have long been considered non-functional relics littering the genome, it is now clear that many pseudogenes are transcribed. Recently, a handful of investigations have highlighted their involvement in the pathogenesis of diseases such as cancer. A growing body of evidence indicates that thousands of pseudogenes are transcribed as sense transcripts, but only a few have been shown to regulate gene expression by acting as sponges or decoys for miRNAs and proteins or through pseudogene asRNA-mediated regulation. To date, only a few intriguing reports have revealed the involvement of pseudogenes in human cancers and their underlying mechanisms, while the biological functions of the great majority of the tens of thousands of annotated pseudogenes remain unknown, and investigating them has proved challenging.

In this review, we highlighted the critical roles of some important pseudogenes in human cancer and noted that some of them may be potential therapeutic targets. Although only a small number of pseudogenes have been well characterized in human cancers, research on pseudogenes is expanding quickly. More functional investigations are therefore needed to better understand their exact roles in tumorigenesis and to elucidate the underlying molecular mechanisms and pathways. Despite accumulating evidence supporting the potential therapeutic value of pseudogenes in cancer, the regulators that drive pseudogene dysregulation and the underlying mechanisms remain poorly understood. Additionally, how pseudogenes cross-talk with the epigenetic machinery in the pathogenesis of cancer needs further investigation. Integrating pseudogenes into cancer biology will deepen our understanding of the mechanisms of this deadly disease, and some specific pseudogenes may be translated into clinical applications for the diagnosis, prognosis, or treatment of cancer patients.