Introduction

One of the most striking characteristics of eukaryotic genomes is the discontinuous arrangement of genes, in which protein-coding segments, known as exons, are interrupted by non-coding sequences, known as introns. Through a highly regulated process known as splicing, the introns present in the precursor messenger RNA (pre-mRNA) are removed and the exons are joined together, forming the mature mRNA. While some genes are constitutively spliced, meaning that a given pre-mRNA always produces the same mature mRNA, others can generate different mRNA isoforms from a single pre-mRNA through a process known as alternative splicing (AS), in which the exon and intron sequences of a gene are assembled in different combinations. AS can give rise to a wide variety of transcript isoforms from a small number of genes, increasing both transcriptome and proteome diversity [1, 2].

With the recent development of faster and more accurate sequencing technologies and the ever-decreasing cost of sequencing whole transcriptomes, it has become apparent that AS has a significant role in organism development. It is estimated that around 95% of all human protein-coding genes undergo AS [3, 4]. After the sequencing of the human genome and that of other model organisms, it became clear that although humans are significantly more complex both morphologically and behaviorally than mice, fruit flies (Drosophila melanogaster), or nematodes (Caenorhabditis elegans), the number of genes does not account for these differences. This meant that something other than the number of protein-coding genes was responsible for the increasing level of complexity among different organisms. Evidence from the characterization of expressed cDNA clone sequence tags [5, 6], as well as bioinformatic analysis, indicates that AS events increase with organism complexity. These findings suggest that AS has been used throughout evolutionary history as a mechanism through which higher levels of molecular complexity can be achieved. In that regard, humans (Homo sapiens) have the highest percentage of multiexon protein-coding genes undergoing AS (around 88%), with at least two mRNA isoforms per gene; by comparison, mouse sits at around 63%, fruit fly at 45%, and nematode at 25%. These figures are based on current genome annotations that do not consider recent RNA-seq data. Recent studies based on RNA-seq of the human transcriptome indicate that 95–100% of human genes generate at least two alternative mRNA isoforms (with an average of seven mRNA isoforms per gene) [3, 4]. These observations support the idea that an increase in the occurrence of AS events is directly correlated with an increase in organism complexity [7].

Mechanisms of Pre-mRNA Splicing

The underlying molecular mechanisms by which introns are removed from the pre-mRNA seem to be very well conserved among all eukaryotic groups [8]. Intron excision from the pre-mRNA and subsequent exon ligation is mediated by a complex and highly regulated molecular machinery known as the spliceosome. The spliceosome is a ribonucleoprotein (RNA-protein complex) which acts in a dynamic cycle of stepwise reactions that is assembled and disassembled for every single splicing reaction [9]. Several efforts have been made to characterize and determine the structure and function of the different components of the spliceosome throughout the different steps of the cycle. Biochemical and structural analyses have demonstrated that the catalytic center of the spliceosome is composed of RNA [10]. The actual biochemical reactions that take place during a splicing event are two consecutive SN2-type transesterification reactions involving functional groups from three reactive regions present in the pre-mRNA. Two of these regions are present at the 5′ and 3′ end of introns and are known as either 5′ or 3′ splice sites (SS). The other region involved is known as the branch point site (BPS) located near the 3′ end of the intron, around 15–50 nucleotides upstream of the 3′ SS [11]. The sequences that define the 5′ and 3′ SS, as well as the BPS, are very small consensus elements [9].
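The two catalytic steps described above can be illustrated with a minimal Python sketch that treats a pre-mRNA as a plain string. This is only a bookkeeping illustration under stated assumptions: the toy sequences, the function names, and the coordinates are hypothetical, and the branched lariat structure formed at the BPS is not modeled.

```python
from typing import NamedTuple, Tuple


class StepOneProducts(NamedTuple):
    exon1: str                # free 5' exon released by the first step
    lariat_intermediate: str  # intron plus exon 2 (the branch itself is not modeled)


def first_step(pre_mrna: str, five_ss: int) -> StepOneProducts:
    """Cut at the 5' SS; five_ss is the index of the first intron nucleotide."""
    return StepOneProducts(pre_mrna[:five_ss], pre_mrna[five_ss:])


def second_step(products: StepOneProducts, intron_len: int) -> Tuple[str, str]:
    """Cut at the 3' SS and join the exons; returns (spliced RNA, excised intron)."""
    intron = products.lariat_intermediate[:intron_len]
    exon2 = products.lariat_intermediate[intron_len:]
    return products.exon1 + exon2, intron


# Hypothetical toy sequences, not taken from any real gene.
EXON1, INTRON, EXON2 = "AUGGCC", "GUAAGUCUAACUUUCAG", "GAAUAA"
pre_mrna = EXON1 + INTRON + EXON2

step_one = first_step(pre_mrna, five_ss=len(EXON1))
mrna, excised_intron = second_step(step_one, intron_len=len(INTRON))
print(mrna)
```

The intermediate returned by the first step corresponds to the intron-exon 2 species, and the second step yields the joined exons together with the excised intron.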

The primary assembly process of the spliceosome requires the recruitment of several small nuclear ribonucleoprotein (snRNP) subunits, namely, snRNPs U1, U2, U4, U5, and U6, as well as other non-snRNP splicing factors. Each of these snRNP subunits is composed of a specific small nuclear RNA (snRNA), as well as other proteins. Through base pairing, these snRNAs recognize the specific sequences within the pre-mRNA (5′ and 3′ SS, BPS) that differentiate exons from introns. First, the U1 snRNP recognizes the 5′ SS of the pre-mRNA, forming the early complex (complex E); meanwhile, the 3′ SS is recognized by the U2 snRNP as well as other associated factors, such as splicing factor 1 (SF1) and U2 auxiliary factors (U2AFs), which are also components of complex E. U2 then recognizes sequences at the BPS and interacts with U1 to form the pre-spliceosome complex (complex A). After the assembly of complex A, the U4, U5, and U6 snRNPs are recruited as a pre-formed tri-snRNP, forming complex B. The resulting complex B is still catalytically inactive and goes through several changes to form the catalytically active complex B∗. The activation of complex B results in the release of U1 and U4. Complex B∗ can complete the first catalytic step of splicing, generating complex C, which contains exon 1 and the intron-exon 2 lariat intermediate. Complex C undergoes a series of rearrangements to carry out the second catalytic step, which results in a post-spliceosomal complex that contains the lariat intron and the spliced exons. In the last step of the cycle, the U2, U5, and U6 snRNPs are released alongside the lariat intron and recycled for additional rounds of splicing.
As mentioned before, the splicing process is largely RNA-based, mainly through the U2-U5-U6 snRNP complex [12], which seems to be the active structure that catalyzes both steps of the splicing reaction, but it seems that some proteins are also necessary for the formation of the catalytic site [13].
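The ordered assembly and disassembly described above can be summarized as a small state table. The following Python sketch is a simplification under stated assumptions: it tracks only the five snRNPs (no SF1, U2AFs, or other proteins), it counts U2 as joining at complex A, where it engages the BPS, and the names CYCLE and trace_cycle are hypothetical.

```python
from typing import List, Set, Tuple

# (complex name, snRNPs joining, snRNPs released), in the order given in the text.
CYCLE: List[Tuple[str, Set[str], Set[str]]] = [
    ("E", {"U1"}, set()),
    ("A", {"U2"}, set()),
    ("B", {"U4", "U5", "U6"}, set()),       # recruited as a pre-formed tri-snRNP
    ("B*", set(), {"U1", "U4"}),            # activation releases U1 and U4
    ("C", set(), set()),                    # after the first catalytic step
    ("post-spliceosomal", set(), set()),    # after the second catalytic step
    ("disassembled", set(), {"U2", "U5", "U6"}),
]


def trace_cycle(cycle: List[Tuple[str, Set[str], Set[str]]]) -> List[Tuple[str, List[str]]]:
    """Return the snRNPs bound at each stage, checking that nothing unbound is released."""
    bound: Set[str] = set()
    history = []
    for name, joining, leaving in cycle:
        present = bound | joining
        missing = leaving - present
        if missing:
            raise ValueError(f"{name}: cannot release unbound snRNPs {sorted(missing)}")
        bound = present - leaving
        history.append((name, sorted(bound)))
    return history


history = trace_cycle(CYCLE)
for stage, snrnps in history:
    print(f"{stage:18s} {', '.join(snrnps) or '(none)'}")
```

In this bookkeeping view, the catalytically active stages retain only U2, U5, and U6, which matches the statement that the U2-U5-U6 complex is the active structure.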

Mechanisms and Regulation of Alternative Splicing

The occurrence of different types of AS events is influenced by a wide variety of factors, including splicing regulatory sequences present in the pre-mRNA, differential activities of splicing factors that either activate or repress AS, chromatin density and structure, as well as transcription elongation rates [14]. The different types of AS events include exon skipping, in which an exon is spliced out of the mature mRNA, along with its flanking introns. Intron retention occurs when an intron is included in the mature transcript; alternative 3′ SS and 5′ SS selection events occur when multiple SS are recognized at the same time at either end of an exon. A mutually exclusive AS event can occur when two exons are included in the mature transcript in a mutually exclusive way (either one of the two exons is included); another type of AS event is known as alternative polyadenylation in which the region where the poly(A) tract is localized can vary. In vertebrates, and specifically in humans, the most prevalent AS event is exon skipping [6, 15]. All these events can be combined into even more complex types of AS, giving rise to a wide variety of isoforms from a single transcript.
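How a handful of event types combine into many isoforms can be sketched by enumerating the choices at each position of a gene. The example below is a minimal sketch under stated assumptions: the gene model, the exon names, and the function enumerate_isoforms are hypothetical, and only constitutive exons, cassette exons (exon skipping), and mutually exclusive exons are modeled.

```python
from itertools import product
from typing import List, Sequence, Tuple

GeneModel = Sequence[Tuple[str, Sequence[str]]]


def enumerate_isoforms(gene_model: GeneModel) -> List[Tuple[str, ...]]:
    """List every exon combination allowed by a simple gene model."""
    choices = []
    for kind, exons in gene_model:
        if kind == "constitutive":
            choices.append([(exons[0],)])          # always included
        elif kind == "cassette":
            choices.append([(exons[0],), ()])      # included or skipped
        elif kind == "mutually_exclusive":
            choices.append([(e,) for e in exons])  # exactly one is included
        else:
            raise ValueError(f"unknown slot type: {kind}")
    return [tuple(e for part in combo for e in part) for combo in product(*choices)]


# Hypothetical four-position gene: one cassette exon, one mutually exclusive pair.
GENE = [
    ("constitutive", ["E1"]),
    ("cassette", ["E2"]),
    ("mutually_exclusive", ["E3a", "E3b"]),
    ("constitutive", ["E4"]),
]

isoforms = enumerate_isoforms(GENE)
for iso in isoforms:
    print("-".join(iso))
```

Because the choices multiply, each additional independent cassette exon doubles the number of possible isoforms in this simplified model.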

The SS selection that eventually gives rise to different transcript isoforms from a single gene is determined by a wide variety of elements which can be both cis- and trans-acting. As mentioned earlier, because most SS consensus sequences are poorly conserved, SS by themselves cannot efficiently direct the spliceosome assembly process. For instance, the cis-acting elements include regulatory sequences which can be found both within exons and introns and act as splicing enhancers or silencers, depending on their position as well as their effect on the usage of a SS [14], meaning that within a specific region of the genome, both intronic splicing enhancers (ISE) and silencers (ISS) can be found, as well as exonic splicing enhancers (ESE) and silencers (ESS). These regulatory sequences function mainly as binding sites for trans-acting factors that in turn recruit the snRNP subunits of the spliceosome to a specific site.

The trans-acting elements function by binding to splicing enhancer or silencer sequences and include both the serine-arginine (SR)-rich and the heterogeneous nuclear ribonucleoprotein (hnRNP) families of proteins. It is generally considered that SR proteins act as splicing promoters while hnRNPs act as repressors, although there is evidence indicating that the activities of some of these splicing regulators are determined by the region of the pre-mRNA to which they bind. A recent study has shown that SR proteins can act both as enhancers and silencers of splicing [16]. Interestingly, these proteins can recognize short RNA sequence motifs and can function as splicing enhancers when bound to exons and as repressors of splicing when bound to introns [17]. hnRNPs can also recognize specific binding sequences in RNA, and depending on their binding position, they can also act as both enhancers and silencers of splicing [18].

Alterations of pre-mRNA splicing can give rise to several physiological abnormalities that often lead to disease [19]. Even though the process of splicing itself consists of relatively simple steps, the recognition of correct SS is a daunting task for the splicing machinery given the highly complex organization of the genome. Mutations of these (and other) regulatory regions of splicing can sometimes generate faulty transcripts harboring premature termination codons (PTC) that if translated could lead to defective proteins. The nonsense-mediated mRNA decay (NMD) pathway was first described solely as a post-transcriptional surveillance and quality control mechanism responsible for the degradation of transcripts harboring a PTC that if translated could lead to the production of truncated proteins with deleterious effects for the organism. Evidence has also shown the importance of the NMD pathway as a regulatory mechanism that controls the expression of several naturally occurring transcripts [20]. Recently, it has been shown that not all PTC-containing transcripts trigger the activation of the NMD pathway. Moreover, other transcripts that do not contain a PTC are also targets of NMD [20], which indicates that further studies are needed to understand all the factors involved in the activation of the NMD pathway.

AS can generate transcripts harboring PTCs, and canonically these transcripts should be subject to degradation by NMD; however, this degradation is not always simple quality control. Instead, it seems that some of these transcripts were able to “hijack” the NMD pathway to serve as a regulatory mechanism for their own expression [21]. This phenomenon, known as AS-NMD regulation, was first reported as a widespread mechanism of regulation of a variety of transcripts, both naturally occurring and disease-related [22]. More recent evidence seems to suggest that AS-NMD regulation is not as widespread as initially thought [23], but there is evidence supporting an important role of AS-NMD regulation in specific gene families, including well-known regulators of splicing and AS [24,25,26]. It would seem that AS-NMD acts mainly as a repressive regulatory mechanism, but there is evidence that it can also be used as a developmental switch [27].

Coupling of Splicing with Transcription

There has been accumulating evidence showing the coupling of transcription and splicing [14]. The fact that splicing occurs in a cotranscriptional manner is the basis for inferring that transcription elongation rate, chromatin structure, and modifications are coupled with splicing.

One of the key elements that link transcription and splicing is RNA polymerase II (RNAPII). Transcription elongation rate can affect SS selection and thus the products of AS. A faster elongation rate encourages skipping of exons with “weak” 3′ SS, while slower elongation rates encourage the inclusion of exons with weak 3′ SS [28]. One possible mechanism is that nucleosome occupancy alters the elongation rate of RNAPII to facilitate inclusion of exons with weak SS [29]. Nucleosomes can act as barriers for elongating RNAPII, altering its elongation rate, and exons flanked by weak SS are more enriched in nucleosomes than those flanked by strong SS [30].
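The dependence of exon inclusion on elongation rate can be illustrated with a toy kinetic calculation. The sketch below is not a model taken from the cited studies: it assumes, purely for illustration, that a weak 3′ SS is committed with a constant rate during the time RNAPII needs to transcribe the distance to a competing downstream site, so that the inclusion probability is 1 − exp(−k·d/v). The function name and all parameter values are hypothetical.

```python
import math


def inclusion_probability(distance_nt: float,
                          elongation_nt_per_s: float,
                          commit_rate_per_s: float) -> float:
    """Toy 'window of opportunity' estimate for inclusion of a weak exon.

    distance_nt: distance RNAPII must travel before a competing site is transcribed
    elongation_nt_per_s: RNAPII elongation rate
    commit_rate_per_s: assumed first-order rate of commitment to the weak 3' SS
    """
    if distance_nt < 0 or elongation_nt_per_s <= 0 or commit_rate_per_s < 0:
        raise ValueError("distance and rate must be non-negative, elongation rate positive")
    window_s = distance_nt / elongation_nt_per_s
    return 1.0 - math.exp(-commit_rate_per_s * window_s)


# Hypothetical values: same gene architecture, slow versus fast polymerase.
slow = inclusion_probability(distance_nt=2000, elongation_nt_per_s=10, commit_rate_per_s=0.01)
fast = inclusion_probability(distance_nt=2000, elongation_nt_per_s=50, commit_rate_per_s=0.01)
print(f"slow RNAPII: {slow:.2f}, fast RNAPII: {fast:.2f}")
```

Under these assumptions, anything that slows RNAPII, such as a nucleosome acting as a barrier, lengthens the window and raises the predicted inclusion of the weak exon.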

One of the main characteristics of RNAPII is the presence of a C-terminal domain (CTD). Post-translational modifications of this CTD, particularly phosphorylation, have been shown to play a crucial role in the regulation of its transcriptional activity. The CTD of RNAPII serves as a scaffold for the recruitment of a wide variety of splicing factors [31], and mutations of the CTD lead to splicing alterations [32]. These post-translational modifications may regulate the physical interactions between the CTD and splicing components, creating a binding platform for splicing factors. Apart from nucleosome occupancy, there is evidence that chromatin modifications also have a role in coupling transcription and splicing [33]. Transcriptional elongation rate is regulated by a dynamic cycle of histone acetylation/deacetylation, which is very important for nucleosome dynamics during transcription and is coordinated by the CTD of RNAPII [34]. A wide variety of histone acetyltransferase (HAT) and deacetylase (HDAC) proteins mediate the addition and removal of acetyl groups that modify the interactions between nucleosomes and DNA. During transcription, HATs must acetylate the nucleosome downstream of the elongation complex in order to destabilize the interactions between histones and DNA. RNAPII elongation displaces histones, which are subsequently placed back onto the DNA behind RNAPII. These deposited nucleosomes are momentarily hyperacetylated. HDAC complexes then remove acetylation marks from the chromatin to maintain a stable configuration. This acetylation/deacetylation dynamic can influence the selection of SS in many genes [33].

Chromatin-modifying proteins that recognize and bind specific histone marks can also affect splicing patterns by recruiting several splicing factors to sites of active transcription to modulate the inclusion or exclusion of alternative exons. Specific patterns of histone marks can correlate with particular splicing patterns in many genes [35]. Further evidence is that modulation of histone marks by inhibitors, overexpression of histone modifiers, or knockdown experiments induces changes in splicing patterns [36].

Connections Between Alternative Splicing and Human Disease

Recently, a vast amount of evidence supporting pre-mRNA splicing, both constitutive and alternative, as an important regulatory mechanism of organismic complexity in humans has prompted further research into the relevance that pre-mRNA splicing alterations could have in disease [37, 38].

The complex arrangement of eukaryotic genomes implies that every intron-containing gene must undergo splicing, meaning that the processing of pre-mRNA into mature transcripts needs to be tightly regulated; the fact that splicing occurs cotranscriptionally adds another layer of complexity to this process. The downside of the functional versatility that comes with pre-mRNA splicing is that this process can be disrupted in a wide variety of ways, and these alterations can end up being the cause of several pathological conditions [37].

According to data from the Human Gene Mutation Database (HGMD), of all the single-nucleotide polymorphisms (SNPs) that are the cause of a disease, around 15% are located within SS sequences and 22% of disease alleles are located within splicing elements, meaning that more than one third of all disease-causing SNPs can alter splicing [39, 40]. Also, evidence from the HGMD suggests that around 10% of human inherited diseases are due to single base-pair substitutions located in SS [41]. However, these data only take into account mutations located at the relatively well-conserved SS sequences, and not those at other cis-acting splicing elements or at loci encoding trans-acting splicing factors. Alterations in pre-mRNA splicing can arise from a wide variety of mutations, which are grouped into the following categories: mutations of canonical 5′ and 3′ SS, mutations of the BPS, mutations of cis-acting regulatory elements, mutations of trans-acting splicing elements, and mutations of the splicing machinery.

Mutations in Splice Sites and Regulatory Sequences

The most common types of splicing-related mutations are those of cis-acting elements, such as the core consensus sequences (both 5′ and 3′ SS, as well as the BPS) and other splicing regulatory sequences [19]. Among the resulting diseases, familial dysautonomia is a rare recessive disorder in the Ashkenazi Jewish population that affects both the autonomic nervous system and the somatic sensory neurons [42]. It is caused by a point mutation (T→C) in intron 20 of the IKBKAP gene that alters a 5′ SS and weakens the binding of the U1 spliceosome subunit. This leads to the skipping of exon 20, which introduces a PTC in exon 21 and makes the mRNA susceptible to degradation by the NMD pathway [43].
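Why skipping an exon can introduce a PTC in the following exon can be shown with a short reading-frame calculation: if the skipped exon's length is not a multiple of three, every downstream codon is shifted. The Python sketch below uses hypothetical toy exons written in the DNA alphabet, not IKBKAP sequence, and the function names are assumptions made for illustration.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def join_exons(exons: List[str], skip: Optional[int] = None) -> str:
    """Concatenate exons in order, optionally leaving one out (exon skipping)."""
    return "".join(exon for idx, exon in enumerate(exons) if idx != skip)


# Hypothetical toy exons; the middle exon is 4 nt long, so skipping it shifts the frame.
EXONS = ["ATGGCTGCA", "GGTC", "TGATCCGGCATTAA"]

normal = join_exons(EXONS)
skipped = join_exons(EXONS, skip=1)

normal_stop = first_stop_codon(normal)
skipped_stop = first_stop_codon(skipped)
print("codons before stop, normal transcript: ", normal_stop)
print("codons before stop, exon-skipped transcript:", skipped_stop)
```

In this toy case the stop codon of the exon-skipped transcript falls at the very start of the last exon, well before the normal stop, which is the kind of transcript the text describes as susceptible to NMD.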

One of the best-studied cases of splicing alterations of cis-acting elements that cause a pathological condition is spinal muscular atrophy (SMA). SMA is a prevalent recessive disorder associated with infant mortality; more than 90% of all cases of SMA are the result of mutations of the Survival Motor Neuron 1 (SMN1) gene. Humans carry two copies of the SMN gene: SMN1 and SMN2. SMN is required for proper snRNP synthesis, and its absence leads to degeneration of motor neurons, particularly those of the spinal cord. The main difference between the two copies is that in SMN2 exon 7 is predominantly skipped in most tissues. SMN2 therefore codes mainly for SMNΔ7, a partially functional and unstable protein. Loss of SMN1 leads to a deficit of the SMN protein and the consequent death of motor neurons. SMN2 presents a C→T change at position 6 of exon 7. This single change has two consequences: first, it disrupts an ESE, and second, it creates an ESS, which in turn promotes the skipping of exon 7 [44].

Another well-documented case is Duchenne muscular dystrophy (DMD). While genomic deletions of the dystrophin gene cause the most severe forms of DMD, some mild forms of DMD are caused by point mutations that affect the splicing patterns [37]. Particularly, a T→A substitution in exon 31 simultaneously generates an ESS resulting in exon skipping and introduces a PTC [45]. These splicing alterations produce a partially functioning form of the protein, which explains the mild phenotype of the disease.

Mutations of Splicing Trans-acting and Core Spliceosome Factors

Mutations in trans-acting splicing factors, as well as core spliceosome components, can simultaneously affect a significant number of genes. Unlike mutations in cis-acting elements, there are relatively few examples of genetic disorders caused by alterations of trans-acting factors, including spliceosomal factors, maybe because many of these mutations result in lethality during embryonic development [46].

Amyotrophic lateral sclerosis (ALS) is an adult-onset neurodegenerative disease characterized by the degeneration of upper and lower motor neurons. It is a fatal disorder that causes death within 2–5 years following diagnosis. Around 10% of cases seem to follow a pattern of Mendelian inheritance with high penetrance [47]. Studies have demonstrated that RNA processing plays an essential role in the onset of the disease. Specifically, the product of one of the most important genes for ALS, TDP-43 (TARDBP), an RNA-binding protein, seems to be a major component of cytoplasmic inclusions in motor neurons of ALS patients [48]. TDP-43 is normally localized in the nucleus, where it is involved in RNA processing and AS, and neurons with cytoplasmic aggregates show a depletion of nuclear TDP-43. Recent studies have demonstrated that TDP-43 seems to be a direct regulator of several AS events in the brain [49, 50]. This evidence suggests that the loss of function of TDP-43 is a major determinant in ALS.

Retinitis pigmentosa is a disorder characterized by the progressive degeneration of the retina. Mutations in the core spliceosomal factors PRPF3, PRPF8, PRPF31, and SNRNP200 are associated with the onset of the disease [51]. These factors are important for the assembly of the tri-snRNP complex of U4/U6/U5 from the spliceosome [11]. Even though the specific pathological defects seem to indicate a functional role in the retina, the specific splicing abnormalities responsible for the disease remain to be discovered.

Alternative Splicing and Aging

Aging is characterized by a general decline in the homeostatic capacity of the organism to return to its normal state, leading to a general decline in physiological and social functions that is highly associated with mortality. In this context, aging has been associated with several diseases such as cardiovascular and metabolic disorders, many types of cancer, and neurodegenerative diseases such as Alzheimer’s and Parkinson’s diseases, among others [52]. Such age-related decline is associated with tissue damage and several inflammatory processes (inflammaging [53] and immunosenescence [54]); in this context, aging seems to be tissue-specific and more closely related to gene expression than to genotype itself. AS is therefore highly relevant, since around 95% of human protein-coding genes produce multiple transcripts through this process [3], and, as previously stated, aberrant splicing events are correlated with age-associated diseases, since most of them lead to aberrant protein formation and, therefore, to misfolded protein accumulation, a process closely related to the molecular pillars of aging [55].

Alternative Splicing, Progeroid, and Age-Related Diseases

Werner syndrome (WS)

WS is a well-characterized autosomal recessive progeroid syndrome in which patients develop normally until they reach puberty and then begin to show abnormal physiological development, including skin and gonadal atrophy, cataracts, type 2 diabetes, osteoporosis, and loss and graying of hair [56]. Most WS patients live to about 54 years of age and die of myocardial infarction; mutations in the WRN gene (located at chromosome 8p12 and comprising 34 exons that encode a DNA helicase protein) are responsible for the disease [57]. Splicing is highly relevant in WS because most of the pathogenic variants in the WRN gene result in truncation of the helicase protein through exon skipping associated with stop codons, indels, or other mutations; additionally, pathogenic variants have also been reported in intronic regions [56].

Hutchinson-Gilford syndrome (HGS)

HGS is a genetic disease characterized by a dramatic aging phenotype early in childhood [58]. Most HGS patients die from heart attacks and stroke early in their teens [59]. Interestingly, most cases are characterized by a silent mutation within a single codon of the LMNA gene, which enhances the use of an internal 5′ SS in exon 11 and leads to the production of a truncated protein (progerin). Its accumulation could lead to abnormally shaped nuclei, loss of heterochromatin distribution, changes in methylation patterns, and misregulation of nuclear proteins, among other changes [60]. Although the relationship between HGS and normal aging is not clear, evidence suggests that progerin may be involved in the aging process, since the number of progerin-positive cells increases with age in samples from healthy individuals [61,62,63].

Bloom syndrome (BS)

BS is an autosomal recessive progeroid disease mainly characterized by early growth deficiency, photosensitive skin changes, immune deficiency, insulin resistance, and a substantially increased risk for the development of multiple cancers [64]. BS is caused by mutations in the BLM gene, which encodes a RecQ helicase. Interestingly, such mutations have been found to induce aberrant splicing in which the extra exon 3 s are skipped, and a mutation at a p53 splice acceptor site has been found in skin fibroblasts derived from a BS patient [65].

Diabetes

Diabetes is a chronic disease characterized by hyperglycemia, which results either from the pancreas not making enough insulin or from insulin resistance. In this context, insulin sensitivity is associated with the insulin receptor isoform, and studies have demonstrated that the insulin receptor type B mRNA variant increases in response to bariatric surgery [66] and to weight loss through a low-calorie diet [67], as does the expression of several splicing factors involved in insulin receptor splicing [68]. Particularly in the pathogenesis of type 1 diabetes, splicing in T cells and node stromal cells seems to be involved in the modulation of the immunological response against β cells; interestingly, β cells exposed to cytokines activate AS networks that modulate their viability and susceptibility to immune-induced stress [69, 70].

Cancer

Cancer is a broad term for several diseases characterized by abnormal cells with a particular pattern of gene expression related to cell survival, accelerated growth, and spreading across the body. Since the incidence of most cancers increases as a person ages, it is considered an age-related disease. Several genome-wide association studies suggest a strong relationship between AS and cancer due to the plasticity offered by this process [71]. Cancer-related splicing aberrations can be divided into four main categories: alterations in tumor suppressor genes and oncogenes, aberrations in spliceosomal components, mutations in splicing factors, and changes in the signaling pathways that regulate splicing; some of these changes are highly relevant to the hallmarks of cancer [72].

Alzheimer’s disease (AD)

AD is characterized by a progressive decline in normal cognitive functions, diminishing memory, attention, language, visuospatial skills, and the ability to execute tasks [73]. The neuropathology of AD includes the deposition of β-amyloid and the accumulation of hyperphosphorylated Tau proteins [74]. Mutations occurring in the intronic regions of presenilin 1 and 2 cause missplicing and lead to abnormal expression of β-amyloid [75]. A variant including exon 7 is the dominant splice form of the amyloid precursor protein gene in AD patients and contributes to β-amyloid accumulation; interestingly, RBFox is a trans-acting regulator that leads to the inclusion or exclusion of exon 7 [76]. ApoE4, a major cholesterol transporter in the brain, and cholesterol-rich membrane domains increase β-amyloid production by affecting the β- and γ-secretase complexes; the proteolysis of apoE4 may lead to a loss of function in its ability to remove β-amyloid. Interestingly, the E4 isoform is more prone to proteolysis than other APOE isoforms [77]. The inclusion of exon 10 in the Tau gene generates an isoform susceptible to microtubule binding that is involved in the assembly of Tau into paired helical filaments [74].

Parkinson’s disease (PD)

PD is the second most common neurodegenerative disease, characterized by resting tremor, bradykinesia, stiffness of movement, and postural instability; its pathophysiology involves protein aggregation in Lewy bodies and loss of dopamine-containing neurons in the substantia nigra of the midbrain [78]. Six genes, namely PARK2, SNCAIP, LRRK2, SNCA, SRRM2, and MAPT, are involved in aberrant AS events in PD patients [79]. For instance, several point mutations in PARK2 splice acceptor or donor sites have been identified in PD patients [80]. In addition, mutations in LRRK2 are the most common genetic cause of familial late-onset parkinsonism; most of them are in intronic regions highly susceptible to splicing [81]. Interestingly, oxidant-induced AS of SNCA plays a central role in dopamine neuron cell death [79]. The MAPT gene, which encodes Tau, also carries several mutations within coding regions that reduce its binding activity and decrease the ability of Tau to promote microtubule (MT) assembly [79].

Conclusions

AS constitutes one of the most important mechanisms for the plasticity of the transcriptome and proteome; it is not only regulated in space and time, but its study also helps to identify critical cellular processes. Growing evidence demonstrates that dysregulation of AS events is strongly implicated in several diseases, including age-related diseases. In this context, AS events could serve as biomarkers for such diseases or as targets for therapeutic agents. Further research must be performed to improve our understanding of this complicated process.