Abstract
RNA secondary structure prediction is widely used to understand RNA function. Existing dynamic programming-based algorithms, both the classical minimum free energy (MFE) methods and partition function methods, suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Inspired by incremental parsing for context-free grammars in computational linguistics, we designed linear-time heuristic algorithms, LinearFold and LinearPartition, to approximate the MFE structure, partition function and base pairing probabilities. These programs are orders of magnitude faster than Vienna RNAfold and CONTRAfold on long sequences. More interestingly, LinearFold and LinearPartition lead to more accurate predictions on the longest sequence families for which the structures are well established (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500 + nucleotides apart). This chapter provides protocols for using LinearFold and LinearPartition for secondary structure prediction.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Petrov AI, Kay SJ, Kalvari I, Howe KL, Gray KA, Bruford EA, Kersey PJ, Cochrane G, Finn RD, Bateman A, Kozomara A, Griffiths-Jones S, Frankish A, Zwieb CW, Lau BY, Williams KP, Chan PP, Lowe TM, Cannone JJ, Gutell R, Machnicka MA, Bujnicki JM, Yoshihama M, Kenmochi N, Chai B, Cole JR, Szymanski M, Karlowski WM, Wood V, Huala E, Berardini TZ, Zhao Y, Chen R, Zhu W, Paraskevopoulou MD, Vlachos IS, Hatzigeorgiou AG, Ma L, Zhang Z, Puetz J, Stadler PF, McDonald D, Basu S, Fey P, Engel SR, Cherry JM, Volders PJ, Mestdagh P, Wower J, Clark MB, Quek XC, Dinger ME (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res 45(D1):D128–D134
Bachellerie JP, Cavaillé J, Hüttenhofer A (2002) The expanding snoRNA world. Biochimie 84(8):775–790
Bellaousov S, Mathews DH (2010) ProbKnot: fast prediction of RNA secondary structure including pseudoknots. RNA 16(10):1870–1880
Bernhart SH, Hofacker IL, Stadler PF (2006) Local RNA base pairing probabilities in large sequences. Bioinformatics 22(5):614–615
Clote P, Ponty Y, Steyaert J (2012) Expected distance between terminal nucleotides of RNA secondary structures. J Math Biol 65(3):581–599
Ding Y, Lawrence CE (2003) A statistical sampling algorithm for RNA secondary. Nucleic Acids Res 31(24):7280–7301
Do C, Woods D, Batzoglou S (2006) CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14):e90–e98
Doudna JA, Cech TR (2002) The chemical repertoire of natural ribozymes. Nature 418(6894):222–228
Eddy SR (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2(12):919–929
Hofacker IL, Lorenz R (2014) Predicting RNA structure: advances and limitations. In: RNA folding: methods and protocols, pp 1–19
Huang L, Sagae K (2010) Dynamic programming for linear-time incremental parsing. In: Proceedings of ACL 2010. ACL, Uppsala, pp 1077–1086
Huang L, Zhang H, Deng D, Zhao K, Liu K, Hendrix DA, Mathews DH (2019) LinearFold: linear-time approximate RNA folding by 5′-to-3′ dynamic programming and beam search. Bioinformatics 35(14):i295–i304
Kerpedjiev P, Hammer S, Hofacker IL (2015) Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics 31(20):3377–3379
Kiryu H, Kin T, Asai K (2008) Rfold: an exact algorithm for computing local base pairing probabilities. Bioinformatics 24(3):367–373
Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428
Lai W-JC, Kayedkhordeh M, Cornell EV, Farah E, Bellaousov S, Rietmeijer R, Salsi E, Mathews DH, Ermolenko DN (2018) mRNAs and lncRNAs intrinsically form secondary structures with short end-to-end distances. Nat Commun 9(1):1–11
Lange SJ, Maticzka D, Mohl M, Gagnon JN, Brown CM, Backofen R (2012) Global or local? predicting secondary structure and accessibility in mRNAs. Nucleic Acids Res 40(12):5215–5226
Leija-Martínez N, Casas-Flores S, Cadena-Nava RD, Roca JA, Mendez-Cabañas JA, Gomez E, Ruiz-Garcia J (2014) The separation between the 5′-3′ ends in long RNA molecules is short and nearly constant. Nucleic Acids Res 42(22):13963–13968
Li TJ, Reidys CM (2018) The rainbow spectrum of RNA secondary structures. Bull Math Biol 80(6):1514–1538
Liu B, Mathews DH, Turner DH (2010) RNA pseudoknots: folding and finding. F1000 Biology Reports 2(8)
Lorenz R, Bernhart SH, Zu Siederdissen CH, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA package 2.0. Algorithms Mol Biol 6(1):1
Lu ZJ, Gloor JW, Mathews DH (2009) Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15(10):1805–1813
Lyumkis D (2019) Challenges and opportunities in cryo-EM single-particle analysis. J Biol Chem 294(13):5181–5197
Mathews DH (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10(8):1178–1190
Mathews DH (2006) Revolutions in RNA secondary structure prediction. J Mol Biol 359(3):526–532
Mathews DH (2006) RNA secondary structure analysis using RNAstructure. Curr Protoc Bioinform 13(1):12–6
Mathews DH, Turner DH (2006) Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol 16(3):270–278
McCaskill JS (1990) The equilibrium partition function and base pair probabilities for RNA secondary structure. Biopolymers 29:11105–1119
Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Nat Acad Sci USA 77(11):6309–6313
Reuter JS, Mathews DH (2010) RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinform 11(1):1–9
Sato K, Kato Y, Hamada M, Akutsu T, Asai K (2011) IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics 27(13):i85–i93
Seetin MG, Mathews DH (2012) RNA structure prediction: an overview of methods. In: Bacterial regulatory RNA: methods and protocols, pp 99–122
Sloma M, Mathews D (2016) Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 22:1808–1818
Sperschneider J, Datta A (2010) DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model. Nucleic Acids Res 38(7):e103–e114
Turner DH, Mathews DH (2009) NNDB: The nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res 38:D280–D282
Waterman MS, Smith TF (1986) Rapid dynamic programming algorithms for RNA secondary structure. Adv Appl Math 7(4):455–464
Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1):292–299
Zhang H, Keane S (2019) Advances that facilitate the study of large RNA structure and dynamics by nuclear magnetic resonance spectroscopy. Wiley Interdiscip Rev RNA 10:e1541
Zhang H, Zhang L, Mathews DH, Huang L (2020) LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 36:i258–i267
Zhang J, Ferré-D’Amaré AR (2014) New molecular engineering approaches for crystallographic studies of large RNAs. Curr Opin Struct Biol 26:9–15
Zhang L, Zhang H, Mathews DH, Huang L (2019) ThreshKnot: thresholded probknot for improved RNA secondary structure prediction. bioRxiv
Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244(4900):48–52
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9(1):133–148
Acknowledgements
This work is supported in part by National Institutes of Health (R35GM145283 to D.H.M.)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zhang, H., Zhang, L., Liu, K., Li, S., Mathews, D.H., Huang, L. (2023). Linear-Time Algorithms for RNA Structure Prediction. In: Kawaguchi, R.K., Iwakiri, J. (eds) RNA Structure Prediction. Methods in Molecular Biology, vol 2586. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2768-6_2
Download citation
DOI: https://doi.org/10.1007/978-1-0716-2768-6_2
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2767-9
Online ISBN: 978-1-0716-2768-6
eBook Packages: Springer Protocols