1 Introduction

Somatic and/or germline mutations are universal characteristics of cancer. Discovery of mutations that activate oncogenes (e.g., RAS) or inactivate tumor suppressors (e.g., TP53, BRCA1) has been pivotal in our understanding of cancer biology, with great relevance to current clinical practices. Large-scale initiatives such as the International Cancer Genome Consortium (ICGC), The Cancer Genome Atlas (TCGA), and Genomics Evidence Neoplasia Exchange (GENIE) have revealed common genomic alterations (e.g., mutation hotspots and DNA copy number variations) in more than 30 different cancer types, generating novel insights into cancer biology. Currently, tumor-specific genetic defects can be identified and used for treatment decisions (reviewed in [1]). Despite all these exciting developments, protein-coding sequences account for only 1.5–2% of mammalian genomes, whereas 80–90% of mammalian genomes are transcribed into non-coding (nc) RNAs, including microRNAs, long ncRNAs, siRNAs, piRNAs, tRNAs, rRNAs, snRNAs, and snoRNAs [2]. ncRNA genes are distributed throughout the genome, between genes or within genes, and either have their own regulatory elements or piggy-back on their host genes for expression. Many ncRNAs are further processed by splicing or enzymatic cleavage into smaller, functional products that regulate other RNAs and/or proteins. Hence, ncRNAs add an unprecedented functional richness and complexity to the concept of the central dogma. With advances in RNA-sequencing methods, the discovery rate of ncRNAs and our understanding of the non-coding part of our genomes are increasing, providing novel insights into normal physiology and disease mechanisms including cancer (reviewed in [3, 4]).

Overall, together with protein-coding regions and ncRNAs, we have a more comprehensive understanding of the cancer cell. However, there is yet another overlooked group of sequences in mammalian genomes that represents both worlds: 3′ untranslated regions (3′UTRs), which are the non-coding sequences of protein-coding mRNAs. For a long time, 3′UTRs were known for their obscure roles in regulating mRNA transport and mRNA stability, but were generally dismissed as merely accessory sequences next to the open reading frames. Now, accumulating evidence shows 3′UTRs to be much more. This review addresses the diverse roles of 3′UTRs, describes how 3′UTR variants form, and discusses how advances in RNA biology are likely to shape our understanding of cancers from a 3′UTR perspective.

2 Roles of 3′UTRs

The coding region of an mRNA is the recipe to make a peptide; however, the supporting role of the 3′UTR in this process is no less significant. 3′UTRs regulate multiple aspects of mRNA metabolism, including mRNA stability, mRNA localization, and mRNA translation efficiency, all of which control how much and when the coding sequence will be translated into protein. The following sections describe these aspects of mRNA metabolism from a 3′UTR perspective.

2.1 mRNA half-life and stability

mRNA half-lives range from minutes to hours. Pharmacological inhibition of transcription and more recently, metabolic labeling/sequencing approaches showed grouping of short and long-lived mRNAs according to the function of corresponding proteins. mRNAs of transcription factors and cell cycle regulators have short half-lives (less than 2 h), whereas mRNAs for housekeeping and biosynthetic proteins have much longer half-lives [5, 6]. This correlation is also present in yeast [7, 8], suggesting evolutionarily conserved mechanisms to control mRNA stability, which is regulated at the 3′UTRs.
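To make the notion of a half-life concrete, the following minimal Python sketch assumes simple first-order (exponential) decay after transcription is blocked, which is a common simplification and not a claim about any specific study cited here. The function names and the example half-lives of 1 h and 10 h are hypothetical illustrations.

```python
import math

def fraction_remaining(t_hours, half_life_hours):
    """Fraction of an mRNA pool left after t_hours without new transcription,
    assuming first-order (exponential) decay."""
    return 0.5 ** (t_hours / half_life_hours)

def estimate_half_life(times_hours, abundances):
    """Least-squares fit of ln(abundance) against time.
    The slope equals -k (decay constant), and half-life = ln(2) / k."""
    logs = [math.log(a) for a in abundances]
    n = len(times_hours)
    mean_t = sum(times_hours) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_hours, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_hours)
    return math.log(2) / -(sxy / sxx)

# Hypothetical half-lives: 1 h (short-lived) and 10 h (long-lived).
for name, t_half in [("short-lived", 1.0), ("long-lived", 10.0)]:
    print(name, round(fraction_remaining(4.0, t_half), 3))
```

Under these assumptions, a transcript with a 1 h half-life is almost entirely gone 4 h after a transcription block, whereas most of a transcript with a 10 h half-life remains, which is why the two groups separate clearly in time-course experiments.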

3′UTRs harbor cis-elements that are recognized by trans-factors including RNA-binding proteins and microRNAs. Approximately 7.5% of all protein-coding genes are directly involved in RNA metabolism by binding to and/or processing RNA, and these genes are generally ubiquitously expressed [9]. RNA-binding proteins generally bind to 3′UTR AU-rich elements through their RRM (RNA recognition motif), KH (hnRNP K homology), DEAD box helicase, and zinc finger domains [10]. There are also non-canonical RNA-binding domains, including intrinsically disordered protein regions (reviewed in [11]). MicroRNAs, on the other hand, bind to 3′UTRs through imperfect base pairing and negatively regulate gene expression at the posttranscriptional level [12]. Overall, binding of proteins and/or microRNAs to specific recognition sequences or secondary structures within mRNAs modulates translation or mRNA decay rates by recruiting specific enzyme complexes that carry out degradation. Indeed, the availability of structural regions (e.g., stem loops) and/or cis-elements (e.g., AU-rich elements) correlates negatively with mRNA half-lives [6]. However, the presence of such cis-elements on a 3′UTR does not always enforce a specific biological fate, as mRNA stability can be modulated during biological processes including inflammatory response [13], cellular stress [14], and differentiation [15]. mRNA stability can be regulated through different modes of trans-factor binding; these modes are not necessarily mutually exclusive (Fig. 1).

Fig. 1
Modes of binding to cis-elements on 3′UTRs. mRNA cis-elements are recognized by RNA-binding proteins and/or microRNAs. These trans-factors regulate the mRNA fate by binding to their target sequences such as the AU-rich elements. Binding of a specific trans-factor may prevent the binding of others (competitive binding, A) or may occur together with others (collaborative binding, B). Binding of trans-factors can be activated or inhibited (e.g., through phosphorylation) by upstream proteins (regulated binding, C). Binding of trans-factors to a given mRNA may deplete the trans-factor availability and indirectly affect other mRNAs (sponge effect, D).

2.1.1 Regulated binding for stability

Posttranslational modifications (PTMs) regulate the activity and/or subcellular localization of RNA-binding proteins. Lysine acetylation and tyrosine phosphorylation are among the known PTMs, suggesting context-dependent regulation of RNA-binding proteins through various signaling routes. One of these signaling routes is the p38/MK2 pathway, implicated in a variety of pathological processes including metastasis, apoptosis, and inflammation. In addition to its known downstream effects, the genotoxic stress-activated p38/MK2 pathway globally regulates interactions between RNA-binding proteins and 3′UTRs, as detected by a combination of transcriptomic and proteomic methods. For example, activated MK2 increases interleukin-6 (IL-6) and TNF-α production by stabilizing their mRNAs through direct phosphorylation of TTP (tristetraprolin), a destabilizing RNA-binding protein that otherwise binds to the 3′UTRs of these mRNAs [16]. The activated p38/MK2 pathway also leads to phosphorylation of HuR and regulates its cytoplasmic availability and binding to target mRNAs [17], one of which is COX-2 [18]. COX-2 is a prostaglandin synthase implicated in inflammation, cell growth, and tumorigenesis. Anti-inflammatory butyric acid produced in the gut indirectly decreases the stability of the inflammatory COX-2 mRNA: butyrate decreases the activity of p38/MK2, leading to less HuR binding to the COX-2 3′UTR and reduced mRNA and protein levels of this inflammatory gene [19].

Another target of p38/MK2 is hnRNPA0, as part of an axis that is linked to TP53 and chemotherapy response. TP53 has key decisive roles in maintaining the genomic integrity of cells by controlling cell cycle arrest and apoptosis, and is one of the most frequently mutated tumor suppressor genes in cancers [20,21,22]. Upon DNA damage, p53 transcriptionally upregulates p21 and GADD45α to induce cell cycle arrest. However, in the case of TP53 mutations, tumor cells become dependent on the p38/MK2 pathway to survive treatment with topoisomerase inhibitors or platinum-based compounds [23, 24]. In response to DNA damage, MK2-phosphorylated hnRNPA0 binds to the p21 and GADD45α 3′UTRs, leading to the stabilization of their mRNAs and inducing cell cycle arrest. This arrest provides time for DNA repair and thus allows development of resistance to cisplatin therapy in lung cancer cells [14].

2.1.2 Competitive binding for stability

More than one type of RNA-binding protein and/or microRNA can recognize the same or overlapping cis-elements on 3′UTRs. In certain instances, recognition of these cis-elements by one specific trans-factor rather than another determines the fate of the mRNA. For example, RNA-binding proteins including hnRNPD, hnRNPA/B, ZMAT3 (a.k.a. wild-type p53-induced gene 1; WIG1), and HuR, as well as miR-125b, interact with the 3′UTR of TP53 mRNA and regulate TP53 protein levels [25, 26]. Binding of WIG1 and HuR to AU-rich elements on the TP53 3′UTR stabilizes the mRNA, leading to higher TP53 protein levels [26, 27]. On the other hand, binding of miR-125a and miR-125b to the 3′UTR negatively regulates TP53 translation. HuR and miR-125b competitively and antagonistically control TP53 mRNA translation in response to genotoxic stress to fine-tune TP53 activity and hence modulate cell survival [28].

Another competition among trans-factors is for the 3′UTR of MYC. The destabilizer CELF1 (CUGBP Elav-like family member 1, also referred to as CUG-binding protein 1 (CUGBP1)) competes with the stabilizer HuR to modulate MYC mRNA stability and translation during intestinal epithelial homeostasis [29]. Similarly, HuR and TTP bind to overlapping sites, with TTP acting as a destabilizing factor for various other transcripts encoding proteins related to immune function and cancer [30].

Accordingly, cell behavior and phenotypes are altered, as in the case of apoptosis. B cell CLL/lymphoma 2 (BCL2), an anti-apoptotic protein, has several mRNA isoforms. BCL2α mRNA contains the full-length 3′UTR. TRA2β, an SR-like protein, regulates apoptosis by binding to the BCL2α 3′UTR at the consensus binding motif (GAA), which is also the binding site for miR-204. TRA2β binding antagonizes the negative effects of miR-204 and causes stabilization of the BCL2α transcript and upregulation of BCL2 protein. Interestingly, BCL2 mRNA has another isoform, BCL2β, that contains an alternatively spliced 3′UTR whose sequence differs from that of the full-length 3′UTR and lacks these cis-elements. TRA2β binding to BCL2 mRNA isoforms is functionally important, as knockdown of TRA2β increases sensitivity to anticancer drugs [31]. As seen in the cases of TP53, MYC, and BCL2, transcripts can be recognized antagonistically by RNA-binding proteins and/or microRNAs. Hence, any shift in this balance of positive and negative regulators is likely to affect the fate of the mRNAs.

2.1.3 Collaborative binding for stability

mRNA cis-elements are often short, and cooperative binding of multiple RNA-binding proteins is required for specificity [32]. For example, RNPC1, an RNA-binding protein and a transcriptional target of TP53, regulates p21 mRNA stability together with HuR. RNPC1 directly interacts with HuR and enhances its RNA-binding activity. Then, both RNPC1 and HuR bind to upstream and downstream AU-rich elements in the p21 3′UTR and stabilize p21 mRNA [33].

On the other hand, the mRNA repressor protein TRIM71 recognizes a structural RNA stem-loop motif on the 3′UTR of p21. Next, NMD (nonsense-mediated decay) factors such as UPF1 assist TRIM71-mediated degradation of p21 mRNA [34], suggesting a new interplay between RNA-binding proteins and non-canonical NMD. Canonical NMD is generally activated when there is a premature stop codon (PTC) upstream of 3′UTR exon junction complexes (EJCs). EJCs are deposited on the mRNA by the spliceosome in the nucleus, approximately 20–24 nucleotides upstream of the splice junctions, thereby retaining the memory of the former location of the excised introns. In the case of an upstream PTC, the ribosome cannot strip off the downstream EJCs during the first round of translation. These remaining EJCs initiate NMD, which is regulated by three main and highly conserved factors: UPF1, UPF2, and UPF3. NMD was initially recognized as a quality control mechanism that degrades mRNAs with PTCs, which may otherwise be translated into truncated proteins. However, it is now clear that NMD is also active in embryogenesis and in normal cells as part of gene expression regulation, independent of any nonsense mutations. Within this context, NMD degrades wild-type mRNAs to facilitate cellular responses [35]. Moreover, UPF1 is involved in additional mRNA decay pathways mediated by other RNA-binding proteins (e.g., staufen, stem-loop-binding protein, glucocorticoid receptor) and by microRNAs [36, 37]. Therefore, recruitment of UPF1 and/or other decay-stimulating proteins to numerous target mRNAs will be of interest within the context of transformed cells, as deregulated expression or activity of these trans-factors is likely to globally alter the stabilities of cancer-related mRNAs.
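The canonical EJC-dependent rule described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a validated NMD predictor: the function names, the 0-based mRNA coordinates, the fixed 24-nucleotide offset (taken from the upper end of the 20–24 nucleotide range in the text), and the example exon lengths are all assumptions made for this sketch.

```python
def ejc_positions(exon_lengths, offset=24):
    """Approximate EJC positions (0-based mRNA coordinates), one per
    exon-exon junction, placed `offset` nucleotides upstream of it."""
    positions = []
    junction = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        junction += length
        positions.append(junction - offset)
    return positions

def predicts_nmd(exon_lengths, stop_codon_start, offset=24):
    """True if at least one EJC lies downstream of the stop codon, i.e. the
    terminating ribosome would leave that EJC on the mRNA."""
    stop_end = stop_codon_start + 3
    return any(pos >= stop_end for pos in ejc_positions(exon_lengths, offset))

# Hypothetical transcript with three exons of 300, 200, and 400 nt.
exons = [300, 200, 400]
print(ejc_positions(exons))      # EJCs upstream of the two junctions
print(predicts_nmd(exons, 600))  # normal stop codon in the last exon
print(predicts_nmd(exons, 150))  # premature stop codon in the first exon
```

In this toy transcript, a stop codon in the last exon leaves no EJC behind, whereas a premature stop codon in the first exon leaves both EJCs in place. Real NMD decisions depend on additional factors, including the EJC-independent routes mentioned in the text.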

2.1.4 Sponge effect of 3′UTRs

Since their initial discovery, mounting evidence has illustrated the key regulatory roles of microRNAs in various developmental, differentiation, cell proliferation, and apoptosis pathways [38,39,40,41]. Deregulated expression of microRNAs and their roles in cancer have been investigated extensively and reviewed elsewhere [42, 43]. In addition to direct binding of microRNAs to their target mRNAs, there are intricate and indirect relationships that affect how microRNAs recognize their targets on 3′UTRs.

As a well-examined example, the miR-34 family was discovered as a direct transcriptional target of TP53 (reviewed in [44]), mediating induction of apoptosis, cell cycle arrest, EMT (epithelial–mesenchymal transition), and senescence [45]. This family is frequently down-regulated in various cancers and has tumor-suppressor roles through its mRNA targets, including MYC, CD44, MET, CDK4/6, NOTCH1, BCL2, and SNAIL (reviewed in [46]). Among the numerous key targets of miR-34, CD44 and MYC represent a special case. Overexpression of MYC indirectly causes upregulation of CD44 expression because the over-abundant MYC 3′UTR acts as a miRNA sponge, decreasing the availability of the miR-34a pool to bind to the 3′UTR of CD44 mRNA [47]. Since microRNAs recognize multiple mRNAs and a single mRNA can be targeted by more than one microRNA, similar sponge effects are likely to exist in a cancer type-specific manner, further fine-tuning gene expression regulation.
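The sponge effect can be illustrated with a toy calculation. The Python sketch below assumes a fixed, limiting microRNA pool that distributes evenly over all available binding sites with equal affinity; these assumptions, the function name, and all copy numbers are hypothetical and are not taken from the cited study.

```python
def site_occupancy(mirna_pool, transcripts_by_gene, sites_per_transcript=1):
    """Average fraction of binding sites occupied by the microRNA when a
    fixed pool is shared evenly across all sites (equal affinity assumed)."""
    total_sites = sum(transcripts_by_gene.values()) * sites_per_transcript
    return min(1.0, mirna_pool / total_sites)

# Hypothetical copy numbers per cell; the miR-34a pool is kept constant.
baseline = {"MYC": 100, "CD44": 100}
myc_high = {"MYC": 900, "CD44": 100}

print(site_occupancy(100, baseline))  # occupancy of each site, including CD44
print(site_occupancy(100, myc_high))  # lower occupancy once MYC is abundant
```

Under these assumptions, raising the MYC transcript number alone lowers the average occupancy of CD44 sites, which is the indirect de-repression that the sponge model proposes.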

Overall, the role of 3′UTRs in regulating mRNA stability is probably the most investigated and best understood aspect of RNA metabolism in cancer cells. It is clear that interactions of trans-factors with mRNA 3′UTRs are complex, and cancer-specific changes in the expression or regulation of trans-factors are likely to alter mRNA stabilities and, subsequently, protein levels of cancer-related genes that are not altered at the DNA level.

2.2 Translation efficiency

Translation initiation is the rate-limiting step of protein synthesis. During initiation, 5′capped (m7GpppG) mRNAs associate with the small ribosomal subunit (40S) together with initiation factors (eIF1, eIF2, eIF3, and eIF4 complexes) to recruit the large ribosomal subunit. During this assembly process, the eIF4 complex (consisting of eIF4A, eIF4B, eIF4E, and eIF4G) binds to the 5′cap of the mRNA. Next, the 5′cap and the 3′UTR poly(A) tail are brought together, circularizing the mRNA, by eIF4G, which interacts with PABP (poly(A) binding protein). This circularization helps translation activation and possibly promotes ribosome recycling [48]. Given the significance of this mRNA conformation during translation initiation, most control elements that regulate the efficiency of translation are located within the 5′ and 3′UTRs. An interesting example is how translation of specific mRNAs is regulated under hypoxic conditions. Oxygen-regulated hypoxia-inducible factor 2α (HIF-2α) forms a complex with the 5′cap-binding protein eIF4E2 (an eIF4E homolog) and RBM4 (RNA binding motif protein 4), which binds to the 3′UTR of the EGFR mRNA. This complex, along with other initiation factors, directs EGFR mRNA to polysomes for translation under hypoxia, when general protein synthesis is repressed [49]. Hence, binding of RBM4 to the 3′UTR of EGFR mRNA is pivotal in HIF-2α-mediated selective translation of EGFR mRNA under hypoxia.

eIF3, another component of the 43S pre-initiation complex, associates with the eIF4F complex at the 5′cap of the mRNA and helps recognition of the AUG initiation codon [50]. Surprisingly, 3′UTR length can modulate eIF3 function, as demonstrated for the PTBP1 (polypyrimidine tract-binding protein 1) isoforms. eIF3 binds to both the 5′ and 3′UTRs of PTBP1 mRNA. However, eIF3 binds differently to the 3′UTR isoforms of PTBP1 transcripts, leading to translational regulation of PTBP1 in a cell cycle-specific manner, which affects downstream regulation of alternative splicing events mediated by PTBP1 [51]. Another example is a multiprotein complex called GAIT (gamma interferon-activated inhibitor of translation) that recognizes a bipartite stem-loop structure, the GAIT element, in 3′UTRs. In myeloid cells, IFN-γ induces binding of GAIT proteins to the 3′UTRs of multiple inflammation-related mRNAs, including CP (ceruloplasmin) and VEGFA, and represses their translation [52, 53]. A member of this complex, ribosomal protein L13a, binds to eIF4G to block ribosome recruitment, possibly through mRNA circularization that brings the 3′UTR-bound GAIT complex close to the 5′-end, where it can effectively block 43S joining and translation initiation. RNA secondary structure prediction algorithms predict the presence of GAIT elements in the 3′UTRs of other mRNAs in addition to CP and VEGFA [54]. Other RBPs also interfere with ribosome recruitment and regulate the translation efficiency of mRNAs. HNRNPK and HNRNPE1 bind to the 3′UTR of LOX mRNA and inhibit ribosome assembly and translation during erythroid cell differentiation [55].

While these examples depict the role of 3′UTRs in ribosome assembly and translation initiation, 3′UTRs can modulate the elongation step of translation as well. For example, PUF (PUMILIO and FBF)/AGO complexes bound to cis-elements on 3′UTRs inhibit GTPase activity of the elongation factor, eEF1A, and consequently inhibit peptide synthesis [56].

Another example is highly relevant to cancer-related cellular processes. The EMT-related ILEI (interleukin-like EMT inducer) and DAB2 (Disabled-2) mRNAs are transcribed but are not translated unless the cells are activated by TGF-β [57, 58]. The underlying mechanism depends on an mRNA-protein complex that binds to the 3′UTRs of these transcripts. The so-called BAT (TGF-β activated translational) complex, which contains hnRNPE1, inhibits the translation of these EMT-related mRNAs by blocking eEF1A1 release during translation elongation. TGF-β signaling leads to phosphorylation of hnRNPE1, which disrupts the hnRNPE1-eEF1A1 interaction, allowing translation and triggering EMT in breast cells [59]. Based on these observations, mRNA circularization is an important step of ribosome function, and 3′UTR cis-elements recognized by RNA-binding proteins have the potential to alter translation initiation and elongation dynamics.

2.3 mRNA localization and local protein synthesis

Subcellular enrichment of mRNAs enables accumulation of proteins at specific regions within cells. Polarized cells, such as neurons, are the best-known examples of this type of local protein synthesis. mRNA 3′UTRs can facilitate local enrichment of mRNAs and consequently proteins. For example, BDNF (brain-derived neurotrophic factor) mRNA isoforms with different 3′UTR lengths have different localizations. The short 3′UTR isoform of BDNF localizes to the soma of neurons, whereas the longer isoform localizes to dendrites. These isoforms have different translational rates and different roles in the morphogenesis of dendritic spines [60]. Another case is Cdc42 (cell division cycle 42), a Rho-GTPase that regulates the actin cytoskeleton and cellular morphology [61]. Two mRNA and protein isoforms of Cdc42 with distinct functions in neuronal polarity (i.e., axonogenesis and formation of dendritic spines) are differentially localized at the neurites and soma of neurons [62]. These two isoforms differ in their C-terminal ten amino acids and in their 3′UTR lengths. Interestingly, it is not the different amino acids but the proteins bound to the alternative 3′UTRs that determine the differential protein localization between soma and neurites [62].

Similar cases are described in less polarized cells, where localized mRNAs regulate cell behavior, including cell-cell adhesion and directed cell migration. For example, β-actin mRNA localization and local protein synthesis are enriched at focal adhesions, dynamically regulating the adhesive properties of migrating cells [63]. Deleting the 3′UTR from the β-actin mRNA results in delocalization of β-actin monomer synthesis and perturbs adherens junctions [64, 65]. Zipcode-binding protein 1 (ZBP1, also called IMP1 (IGF2 mRNA-binding protein 1)) binds to the 3′UTR of β-actin to regulate mRNA localization [66]. IMP1 first associates with β-actin mRNA in the nucleus and transports it to the cell edge in a translationally suppressed form [67]. Later, when an extracellular signal causes IMP1 phosphorylation by Src, β-actin mRNA is released from IMP1 and is translated at the cell edge [67]. IMP1 further facilitates the localization of E-cadherin, α-actinin, and actin-related protein 2/3 (ARP2/3) complex mRNAs, which are involved in cell-cell connections and focal adhesions. Interestingly, non-metastatic breast tumors express IMP1, but it is downregulated in metastatic cells [68]. Decreased expression of IMP1 causes delocalization of cell motility-related mRNAs, increasing the growth and cell motility of metastatic breast cancer cells [69, 70].

Given these potential implications for protein function, localization, and cellular phenotypes, the question is, how are mRNAs localized or enriched at specific sites? Current evidence suggests two general mechanisms: guided mRNA transport and spatial mRNA degradation. These processes are both dependent on mRNA 3′UTRs.

For mRNA transport, motor proteins and/or RBPs bound to 3′UTRs can facilitate transportation through the cytoskeletal system. Beyond examples in yeast and Drosophila [71, 72], only a few cases have been reported in mammalian cells. For example, a group of mRNAs is anchored to the granule regions at the plus ends of microtubules in mammalian fibroblasts [73, 74]. Of further interest, these mRNAs, which are involved in migration and metastatic progression, require APC (adenomatous polyposis coli) for their localization to cell protrusions in NIH/3T3 cells spread on a fibronectin-coated surface [73]. Later, other APC-interacting RNAs were identified in the brain. Interestingly, over 200 mRNA targets were highly enriched for APC-related functions, including microtubule organization, cell motility, and cancer [75]. This unanticipated function of APC may have implications for the motility and migration of cancer cells, where APC function is deregulated.

Alternative to active transport, spatial degradation is another way to enrich mRNAs at unique locations in cells. For this, 3′UTR cis-elements have a role in differentially regulating the stability of the mRNA. In Drosophila, Nanos mRNA is localized specifically at the posterior pole of the cytoplasm as it is targeted by 3′UTR-bound Smaug for deadenylation and degradation at other sites [76, 77]. Despite these interesting examples in different cell types, the extent of deregulated mRNA localization and consequences in cancers are not fully known. One reason could be that locally enriched mRNA and protein concentrations are technically difficult to quantify with conventional methods, especially in less polarized cells. Therefore, it is likely that local mRNA enrichment cases are under-estimated and currently, we have an insufficient view of spatial gene expression regulation in cancer cells. Hence, a better understanding of local protein synthesis guided by mRNA subcellular localization may be helpful to understand complex behaviors of cancer cells including focal adhesion dynamics, cell motility, EMT, and metastasis.

2.4 3′UTRs as hubs for protein-protein interactions

3′UTR-bound RBPs can facilitate protein-protein interactions, as in the CD47 example, where protein localization of CD47 is regulated independently of its mRNA localization. CD47 has two mRNA isoforms that differ in their 3′UTR lengths. Only the mRNA harboring the long 3′UTR can interact with HuR to recruit SET to its 3′UTR [78]. Next, SET interacts with the CD47 protein synthesized from the longer 3′UTR isoform and enhances plasma membrane localization of CD47, which is a “Don’t eat me” signal [78, 79]. SET transfers from the CD47 mRNA to the CD47 protein during synthesis. This transfer occurs within so-called TIS granules, which are reticular structures intertwined with the endoplasmic reticulum (ER). TIS granules are enriched for the RNA-binding protein TIS11B and for membrane protein-encoding mRNAs. Here, 3′UTR-mediated interaction of SET with membrane proteins occurs, allowing increased surface expression [80].

3 Ending the 3′-end: means to diversity

3′UTRs have pivotal roles in regulating protein abundance and function. With advancements in RNA-sequencing methods tailored to detect the 3′-ends of isoforms, we are just beginning to gain a perspective on 3′-end isoform diversity in cancers and how it is generated by different mechanisms.

3.1 Splicing and polyadenylation

The majority of eukaryotic genes contain long intervening sequences (introns) that disrupt the coding sequences (exons). The spliceosome carries out the removal of introns during pre-mRNA maturation [81]. Since the initial discovery of splicing more than 40 years ago [82, 83], mechanistic understanding of splicing has greatly improved through genetic and biochemical studies, including cryo-EM of the spliceosome structure at atomic and near-atomic resolution (for a review, please see [84]). In addition to the spliceosome, numerous splicing regulators including splice enhancer and repressor proteins (e.g., SR proteins) contribute to splicing decisions in a tissue- and developmental stage-specific manner. With the advancements in RNA-sequencing methods, it is becoming clear that alternative processing of mRNAs through splicing is the norm and not the exception for gene expression. As for cancer, given that genomic instability is a hallmark, it is not surprising to find a comparable deregulation at the transcriptome level. One aspect of this complexity is aberrant splicing in many cancers, caused by mutations in and/or deregulated expression of spliceosome components, or by mutations in splicing-regulating cis-elements such as splice acceptor and donor sites (reviewed in [85, 86]). For example, the RNA splicing machinery is frequently mutated in myeloid malignancies, whereas DNA copy number changes and deregulated expression of splicing factors are more widespread in solid tumors [87].

In fact, defects in the splicing machinery are just the tip of the iceberg, with substantial downstream effects on many target transcripts, as was shown for the splicing factor SRSF1 (serine and arginine-rich splicing factor 1). SRSF1 is amplified and upregulated in breast, lung, colon, and bladder cancers and generates potentially hundreds of pro-proliferative splice variants through either exon inclusion or skipping, enhancing tumorigenesis [88, 89]. For example, SRSF1 acts as an oncogene by regulating alternative splicing of the RON proto-oncogene (MST1R), a tyrosine kinase receptor for the macrophage-stimulating protein [90]. SRSF1-regulated skipping of exon 11 generates the RONΔ11 isoform, which promotes cell motility and invasion [90]. SRSF1 overexpression also promotes the inclusion of exon 12a of BIN1 (bridging integrator 1) to generate the BIN1+12a isoform, which can no longer bind and inhibit MYC [91]. In addition, inclusion of exon 13b of the kinase MKNK2 (MAP kinase-interacting serine/threonine-protein kinase 2) generates an isoform that enhances eIF4E phosphorylation [92]. Inclusion of exon 6 of DBF4B (a regulatory subunit of CDC7 with a central role in DNA replication and cell proliferation) by SRSF1 promotes tumorigenesis of colon cancer cells [93]. As the SRSF1 case exemplifies such diverse downstream effects, mounting evidence points to similar aberrant splicing cases and multilayered consequences in cancer cells.

While it has become clear that aberrant splicing is widespread in cancers (aberrant splicing in cancers is discussed in detail in recent and excellent review articles [94,95,96]), the general focus is on cassette exons and not so much on 3′-terminal exons. Nevertheless, alternative splicing generates new terminal exons that alter the 3′-end of the coding sequence and/or the 3′UTR (Fig. 2). Alternative 3′-terminal exons are generally less conserved, and isoforms carrying them are expressed at lower levels than isoforms with the canonical last exons [97]; however, functionally they may be highly relevant to cancer biology.

Fig. 2
Splicing leading to 3′UTR isoforms in a hypothetical model. a Removal of the last intronic sequence and use of a polyadenylation signal at the terminal exon marks the polyadenylation site and determines the 3′UTR length. b Alternative splicing with retention of a distal intron may change the 3′-end of the coding sequence and include the exonized intronic sequence as part of the 3′UTR. c Splicing causes inclusion of an alternative terminal exon (red), leading to changes in the 3′-coding sequence and the 3′UTR sequence.

A new terminal exon can be generated by intron retention, which is coupled to activation of an intronic polyadenylation site (Fig. 2b). Alternatively, an exon that is usually spliced out can be included. This new terminal exon may harbor a new stop codon and an exonic polyadenylation signal that is used to add the poly(A) tail (Fig. 2c). In both cases, use of a new terminal exon can potentially lead to mRNA variants that encode C-terminally truncated protein products, along with alternate 3′UTR sequences. In the case of C-terminal truncation, proteins may be nonfunctional, gain a new function, and/or even modulate the function of the wild-type protein in a dominant-negative manner. Therefore, the consequences would be unique to each transcript and its corresponding protein function. For example, dominant-negative and secreted variant isoforms of receptor tyrosine kinases (RTKs) arise from intronic polyadenylation site usage [98], an observation with potential implications that needs to be explored in more depth in patient samples. In leukemia, intronic polyadenylation site usage causes inactivation of tumor suppressors including FOXN3, a transcriptional repressor involved in cell cycle regulation and tumorigenesis [99]. Conversely, intronic polyadenylation in other genes, including CARD11, MGA, and CHST11, promotes oncogenic activity [99].

Other intriguing cases have been linked to DNA damage. Initial evidence came from budding yeast, where DNA damage induces targeted, genome-wide variation of polyadenylation site usage [100]. Later, in breast cancer cells, doxorubicin (DOXO), a TOP-II inhibitor, was shown to modulate the use of alternative 3′-terminal exons, which use intronic polyadenylation sites, and to globally mediate genotoxic stress responses [97]. More evidence points to the significance of terminal exon shifts in DNA repair pathways. For example, CDK12 phosphorylates the C-terminal domain of RNA polymerase II to regulate transcription elongation, splicing, and polyadenylation [101]. CDK12 suppresses intronic polyadenylation and favors distal polyadenylation in mouse embryonic stem cells, enabling the production of full-length mRNAs, including those of the homologous recombination repair genes. Consequently, depletion of CDK12 attenuates DNA damage repair [102]. The functional importance of this finding was further extended with the use of a selective inhibitor (SR-4835) of CDK12 (and CDK13). SR-4835 triggered the use of intronic polyadenylation sites (a total of 1824 sites), some of which impaired the expression/activity of DNA damage response proteins (BAP1, ATR, FANCL, WRN, BRCA1, BRCA2, FANCM, BRIP1, FANCD2, FANCI, BLM, FANCA, ATM), provoking a “BRCAness” phenotype that synergizes with DNA-damaging chemotherapy and PARP inhibitors in triple-negative breast cancer cells [103]. Hence, these isoforms, individually or collectively, can affect treatment responses.

The current understanding is that at least ~20% of human genes have one or more intronic polyadenylation events. Hence, it would be important to understand whether these isoforms are functional in cancer-related phenotypes [104]. Of note, many cryptic poly(A) sites within proximal introns are suppressed by a mechanism called telescripting, based on the splicing-independent role of U1 snRNP in full-length transcription [105]. Any role of deregulated telescripting in cancers is as yet unknown.

In addition to splicing-induced changes to 3′UTRs, alternative polyadenylation also generates 3′-isoform diversity using polyadenylation sites within terminal exons, which are generally located within 3′UTRs. A multi-protein machinery that recognizes polyadenylation signals and neighboring conserved elements carries out endonucleolytic cleavage of the pre-mRNA [106]. Alternative polyadenylation is the selection of different polyadenylation signals in a tissue-specific way and is globally regulated in response to cell proliferation and differentiation. Splicing-independent alternative polyadenylation results in isoforms that have short or long 3′UTRs with identical coding sequences, as opposed to splicing-coupled intronic polyadenylation (Fig. 3).

Fig. 3 Polyadenylation sites. mRNA 3′UTRs may harbor more than one polyadenylation site (I, II, and III). Activation of a proximal polyadenylation site (II) shortens the 3′UTR, with loss of cis-elements. The protein products of the short and long 3′UTR isoforms would be identical. However, in the case of splicing-coupled alternative polyadenylation (III), loss of amino acids and addition of a few others encoded by the exonized intron can change the protein C-terminus. Alternative terminal exon usage would produce similar changes at the C-terminus of the protein.

Global changes in 3′UTR lengths due to alternative polyadenylation were first demonstrated in activated T cells and cancer cells [107, 108]. In fact, 3′UTR shortening of growth-promoting mRNA transcripts was observed much earlier to correlate with cancer-related phenotypes. For example, CCND1 (cyclin D1) mRNA 3′UTR shortening prevents microRNA-mediated repression and leads to an additional increase in cyclin D1 protein levels in lymphoma cells and decreased overall survival of patients [109].
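The consequence of proximal polyadenylation site usage for regulatory elements can be illustrated with a small sketch. The Python snippet below is a toy model, not an analysis method from the cited studies: the element names, coordinates, and the helper `retained_elements` are all hypothetical. It assumes that an element survives only if it lies entirely upstream of the cleavage site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CisElement:
    name: str   # hypothetical label, e.g. a microRNA seed match
    start: int  # 0-based offset from the first 3'UTR nucleotide
    end: int    # exclusive end offset


def retained_elements(elements, cleavage_site):
    """Return the elements that lie entirely upstream of the cleavage site."""
    return [e for e in elements if e.end <= cleavage_site]


# Toy 3'UTR: every coordinate here is invented for illustration only.
elements = [
    CisElement("miRNA_site_A", 120, 127),
    CisElement("AU_rich_element", 900, 930),
    CisElement("miRNA_site_B", 2400, 2407),
]
poly_a_sites = {"proximal": 300, "distal": 3000}

for label, site in poly_a_sites.items():
    kept = [e.name for e in retained_elements(elements, site)]
    print(label, kept)
```

In this toy case the proximal isoform keeps only the first microRNA site, whereas the distal isoform keeps all three elements, mirroring the idea that a shortened 3′UTR can escape repression while encoding the same protein.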

Recent 3′-end tailored transcriptomic methods and analysis tools reveal a global picture of alternative polyadenylation patterns that lead to 3′UTR shortening or lengthening in cancers [110, 111]. Moreover, investigation of individual 3′UTR isoforms provides functional relevance to increased proliferation, migration, and metastasis of cancer cells. For example, the shorter 3′UTR isoform of insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1) mRNA contributes more to oncogenic transformation than the longer mRNA isoform [108]. Estradiol upregulates the mRNA of CDC6, a DNA replication regulator. The shorter 3′UTR isoform of CDC6 is increased more than the longer 3′UTR isoform, enhancing S phase entry [112]. EGF (epidermal growth factor) induces similar 3′UTR shortening events in triple-negative breast cancers, with significant impact on relapse-free survival [113]. Alternative polyadenylation also allows NRAS and c-JUN to avoid posttranscriptional repression by PUMILIO in triple-negative breast tumors [114].
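One simple way to call shortening or lengthening from 3′-end read counts is to compare the fraction of reads supporting the distal site between two conditions. The sketch below is an illustrative assumption, not the procedure of any tool cited here: the function names, the read counts, and the `min_delta` threshold of 0.2 are all hypothetical.

```python
def distal_usage(proximal_reads, distal_reads):
    """Fraction of reads supporting the distal site; None if no reads."""
    total = proximal_reads + distal_reads
    if total == 0:
        return None
    return distal_reads / total


def classify_shift(control, tumor, min_delta=0.2):
    """Classify a gene from (proximal, distal) read counts per condition.

    min_delta is an arbitrary illustrative threshold on the change in
    distal usage, not a published cutoff.
    """
    c = distal_usage(*control)
    t = distal_usage(*tumor)
    if c is None or t is None:
        return "no_call"
    delta = t - c
    if delta <= -min_delta:
        return "shortening"
    if delta >= min_delta:
        return "lengthening"
    return "unchanged"


# Invented counts: distal usage drops from 0.8 to 0.3 in the tumor sample.
print(classify_shift(control=(20, 80), tumor=(70, 30)))
```

A real analysis would also need replicate-aware statistics and annotated polyadenylation sites; the sketch only shows the direction-of-change logic.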

On the other hand, lengthening of 3′UTRs and reduced gene expression are seen in senescent cells. For example, RRAS2, a member of the Ras superfamily, is expressed as a longer 3′UTR isoform, leading to decreased expression in senescent cells [115]. Similarly, the 3′UTR of HN1 (hematological and neurological expressed 1) is lengthened in senescent cells, as opposed to cancer cells, where expression of a shorter 3′UTR isoform is favored [116].

Overall, the dynamic relationship between splicing and polyadenylation adds a layer of transcriptome complexity at the 3′-ends of transcripts. While studies showing functional relevance are accumulating, it is clear that the diversity of transcripts at the 3′-end is not fully known or appreciated. Unfortunately, we are not fully aware of these variants even for well-known cancer genes. For example, BRAF codes for a protein kinase functioning in the highly oncogenic RAS/RAF/MEK/ERK signaling pathway [117] and is mutated in about 50% of melanomas [118] as well as in other cancers, including colon cancers [119]. Despite BRAF being an attractive therapy target, information on BRAF isoforms is just beginning to surface. RNA-seq data from 4800 patients with different cancers, including melanomas, revealed the existence of BRAF mRNA isoforms that differ in the last part of their coding sequences as well as in the length and sequence of their 3′UTRs (BRAF: 76 nt; BRAF-X1 and BRAF-X2: up to 7 kb). Expression of these isoforms varies among cancers, contributing differently to BRAF protein levels [120]. Because the 3′UTR sequences of these isoforms differ, regulation by microRNAs and/or RNA-binding proteins is very likely to differ in cancers [121]. Hence, differential regulation by trans-factors that bind to these BRAF isoforms may contribute to overactivation of the BRAF kinase in a subset of patients who do not have DNA-level alterations of BRAF.

3.2 Structural variations and 3′UTR diversity

In addition to splicing and polyadenylation, structural variations identified by long-read technologies can explain certain cases of altered 3′UTRs. Structural variations generally arise from chromosomal translocations that may partially or completely alter 3′UTR sequences. A breakpoint at the 3′UTR of PDL1 is an intriguing example. PDL1 functions as a co-inhibitory molecule expressed on the surface of tumor cells to evade the immune system [122]. Increased expression of PDL1 aids immune evasion and is associated with worse prognosis in many tumors. Hence, immunotherapy targeting the PDL1/PD1 pathway is a promising strategy. Overexpression of PDL1 is a critical factor for blockade treatments; however, the current understanding of PDL1 deregulation in tumors is as yet incomplete. Overexpression of PDL1 occurs through gene amplification and through utilization of an ectopic promoter by translocation, as reported in a subset of Hodgkin and other B cell lymphomas and stomach adenocarcinomas [123]. Recent evidence is now drawing attention to the 3′UTR of PDL1. The first evidence came from structural variation-associated breakpoints that disrupt the 3′UTR of PDL1. These breakpoints included deletions, duplications, inversions, and translocations, detected from whole-genome sequences of 49 adult T cell leukemia patients. Interestingly, all of the aberrant PDL1 transcripts were elevated; they retained the extracellular receptor-binding and transmembrane domains of PDL1 but not the 3′UTRs [124]. A further 32 cases (out of 10,210 samples) across 33 tumor panels from The Cancer Genome Atlas had an aberrant PDL1 3′UTR. This aberrant 3′UTR structure was also associated with elevated transcript expression levels in patients [124]. More recently, two groups simultaneously reported a secreted splice variant of PDL1 (secPDL1) that can negatively regulate T lymphocyte function [125, 126]. The secPDL1 mRNA has a different 3′UTR that harbors fewer AU-rich motifs than the full-length PDL1 mRNA. It remains to be seen whether these PDL1 mRNA 3′UTR isoforms have altered mRNA stabilities and/or are targeted differently by microRNAs, such as miR-17-5p [127] and miR-140 [128], and by RNA-binding proteins. Based on these reports, 3′UTR cis-elements lost or gained through splicing or genomic alterations of the PDL1 locus may have significant implications for immune checkpoint inhibitor treatments and possibly contribute to patient stratification efforts.

3.3 Mutations

A protein-focused approach to identifying mutations sometimes hinders discovery of less obvious but potentially important mutations within non-coding parts of genes. For example, a frequent recurrent non-coding mutation in the 3′UTR of NOTCH1 (chr9: 139390152T > C) leads to a novel splicing event within the last exon of NOTCH1 in CLL patients. This isoform lacks the negative regulatory domain of NOTCH1, resulting in increased protein stability. Moreover, the prognosis of patients with 3′UTR NOTCH1 mutations is similar to that of patients with coding mutations in NOTCH1, suggesting functionality [129]. A similar and frequent somatic mutation was reported in gastric cancer: a guanine-to-cytosine mutation in the PDL1 3′UTR leads to protein overexpression by disrupting binding of miR-570 [130]. Synonymous mutations can also alter 3′UTRs. For a long time, synonymous mutations within coding sequences were considered silent, and hence insignificant, changes due to the degeneracy of codons. However, given that both DNA and RNA sequences are changed by synonymous mutations, altered splicing patterns or misfolded RNA secondary structures are potentially significant outcomes of these less obvious mutations.

3.4 RNA modifications

N6-methyladenosine (m6A) is a posttranscriptional RNA modification [131], most often found in proximity to stop codons or within 3′UTRs [132]. While the functional significance of these modifications is not completely understood, new evidence is beginning to suggest causality. In yeast, Rme1 (regulator of meiosis 1) is a DNA-binding protein that prevents meiosis in haploids. A functional m6A modification within the 3′UTR of Rme1 downregulates Rme1 mRNA levels to enable meiotic progression [133]. Considering the high homology among writer (METTL3/14, WTAP, RBM15/15B, and KIAA1429), reader (YTHDF1/2/3, IGF2BP1, and HNRNPA2B1), and eraser (FTO and ALKBH5) proteins, we are likely to discover functional consequences (e.g., altered RNA–protein interactions or RNA functions) of m6A modifications within 3′UTRs. In cancer, functional evidence for RNA modifications within 3′UTRs is also starting to surface. For example, ALKBH5 demethylates an m6A residue on the 3′UTR of the NANOG mRNA, which encodes a pluripotency factor. Consequently, increased NANOG mRNA and protein expression is linked to the breast cancer stem cell phenotype [134].

4 Conclusion

Recent omics approaches are paving the way to a more comprehensive understanding of cancer cells at the genomic, transcriptomic, and proteomic levels. Searching for mutations in protein-coding regions through high-throughput sequencing is the general basis of today’s personalized medicine efforts. However, whole exome sequencing and targeted gene sequencing are limited to coding sequences and/or known clinically relevant genes. Therefore, discovery of other “not so obvious” mutations altering 3′UTRs is generally not the primary concern. In addition, conventional gene expression approaches are not tailored to reach 3′UTR ends and detect 3′-isoforms unless 3′-specific sequencing and analysis tools are used (reviewed in [135, 136]). This also hinders discovery of potentially important 3′-end isoforms.

Fortunately, some novel single-cell RNA sequencing (scRNA-seq) techniques (e.g., Drop-seq, Seq-Well, DroNC-seq, and SPLiT-seq) specifically sequence the 3′ end of transcripts, bringing exceptional depth to transcriptomic studies at the level of a single cancer cell. Accumulating evidence from these studies strengthens the earlier findings, showing widespread expression of 3′-isoforms, including intronic polyadenylation events [137].

Recently, the worm 3′UTRome v2 resource became available for C. elegans, curated from high-quality 3′UTR data at ultradeep coverage from 1088 transcriptome datasets. This resource contains data for 23,084 3′UTR isoform variants corresponding to 14,788 protein-coding genes and is a novel resource for investigating alternative splicing and polyadenylation patterns as well as the trans-factors regulating these isoforms [138]. For humans, a tissue- and cancer-specific 3′UTRome is very much needed to identify “not so obvious” changes related to 3′UTRs that have substantial impact on protein function and cell phenotypes. Overall, discovery and characterization of cancer-specific 3′UTR isoforms create new possibilities for diagnosis and pharmacological intervention. Consequently, in the near future, findings in the RNA biology field are likely to provide truly novel insights into normal physiological events and disease mechanisms, including cancer.