Abstract
Alternative splicing of pre-mRNA is a complex process whose outcome depends on elements reviewed in the previous chapters, such as the core spliceosome units, the interactions of these units with one another and with splicing enhancers and repressors, primary sequence motifs, and local RNA secondary structure. Connections between RNA splicing, transcription, and other processes have also been reviewed in the previous chapters. Splicing is inherently a stochastic process: some defective transcripts are produced and handled by mechanisms such as nonsense-mediated decay (NMD), and studies report high transcript-level variability between cells that are presumably in similar states. Nonetheless, splicing is clearly not a random process: many determinants of splicing regulation have been identified, and experimental measurements detect highly robust and conserved splicing changes between developmental stages and tissues. These observations naturally lead to the following questions: Can we devise a method that, given a cellular context and a primary transcript, predicts the splicing outcome? What can such a method tell us about the underlying mechanisms that govern alternative splicing?
This chapter describes how these questions can be framed and addressed using machine-learning methodology. We describe how to extract putative RNA regulatory features from the genomic sequence of exons and proximal introns, how to define target values based on experimental measurements of exon inclusion, how to learn a simple splicing model that optimizes the prediction of observed exon inclusion levels from the identified RNA features, and how to evaluate the accuracy of the learned model.
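The four steps listed above (feature extraction, target definition, model learning, evaluation) can be illustrated end to end in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the normalized k-mer frequency features, the simplified PSI (percent spliced in) target computed from inclusion and exclusion read counts, the 0.5 labeling threshold, the L2-regularized logistic-regression model and the AUC metric are all choices made here for illustration, and the function names (`kmer_features`, `psi`, `label_from_psi`, `train_logistic`, `predict`, `auc`) are hypothetical. The sequences and read counts are invented toy values.

```python
import math
from itertools import product

# --- Step 1: features (assumption: normalized k-mer frequencies) ---
def kmer_features(seq, k=3):
    """Frequencies of all 4**k DNA k-mers in seq, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # k-mers containing N or other symbols are skipped
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# --- Step 2: targets (simplified PSI; real pipelines normalize per junction) ---
def psi(inclusion_reads, exclusion_reads):
    """Fraction of informative reads supporting exon inclusion (NaN if no reads)."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total > 0 else float("nan")

def label_from_psi(value, threshold=0.5):
    """Binarize PSI into included (1) or skipped (0); the threshold is an assumption."""
    return 1 if value >= threshold else 0

# --- Step 3: model (L2-regularized logistic regression, batch gradient descent) ---
def sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=300, l2=1e-3):
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average gradient plus L2 penalty on the weights (not on the bias)
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predicted probability that the exon is included."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# --- Step 4: evaluation (area under the ROC curve, ties count one half) ---
def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy, invented data: (exon sequence, inclusion reads, exclusion reads)
train = [
    ("GAAGAAGAAGAA", 90, 10),
    ("AGAAGAGGAAGA", 80, 20),
    ("TTCTTCTTCTTC", 5, 95),
    ("CTTCTTTCTTCT", 15, 85),
]
X = [kmer_features(s) for s, _, _ in train]
y = [label_from_psi(psi(inc, exc)) for _, inc, exc in train]
w, b = train_logistic(X, y)

held_out = [("GAAGGAAGAAGA", 1), ("TTCTCTTCTTCT", 0)]
scores = [predict(w, b, kmer_features(s)) for s, _ in held_out]
test_auc = auc(scores, [label for _, label in held_out])
print(f"held-out AUC: {test_auc:.2f}")
```

Plain Python keeps the sketch dependency-free; in practice a machine-learning library such as scikit-learn would typically replace the hand-written training loop and AUC computation, and evaluation would use many held-out exons rather than a single toy pair.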
Copyright information
© 2014 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Barash, Y., Garcia, J.V. (2014). Predicting Alternative Splicing. In: Hertel, K. (eds) Spliceosomal Pre-mRNA Splicing. Methods in Molecular Biology, vol 1126. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-980-2_28
DOI: https://doi.org/10.1007/978-1-62703-980-2_28
Published:
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-979-6
Online ISBN: 978-1-62703-980-2
eBook Packages: Springer Protocols