Abstract
A growing number of structural homology search tools, mostly based on profile stochastic context-free grammars (SCFGs), have recently been developed for non-coding RNA gene identification. SCFGs can capture the statistical biases that often occur in RNA sequences, which is necessary for profiling specific RNA structures in structural homology search. This paper introduces a succinct stochastic grammar model for RNA that has competitive search effectiveness. More importantly, the profiling model can be easily extended to include pseudoknots, structures that are beyond the capability of profile SCFGs. In addition, the model allows heuristics to be exploited, resulting in a significant speed-up of the CYK algorithm-based search.
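As a rough illustration of what CYK-based scoring with an SCFG involves, the sketch below runs a Viterbi-style CYK recursion over a toy, non-profile RNA grammar. It is not the model introduced in this paper: the grammar (S → x S y | x S | S S | ε), the rule and emission probabilities, and the function name `cyk_score` are all assumptions chosen for the example.

```python
import math

# Toy SCFG (illustrative assumption, not the paper's model):
#   S -> x S y   base pair        (P_PAIR)
#   S -> x S     unpaired base    (P_LEFT)
#   S -> S S     bifurcation      (P_BIF)
#   S -> epsilon end              (P_END)
P_PAIR, P_LEFT, P_BIF, P_END = 0.4, 0.3, 0.1, 0.2

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_emission(x: str, y: str) -> float:
    # 6 canonical pairs share 0.9 of the mass, the other 10 pairs share 0.1.
    return 0.15 if (x, y) in CANONICAL else 0.01


def single_emission(x: str) -> float:
    # Uniform emission for unpaired bases.
    return 0.25


def cyk_score(seq: str) -> float:
    """Log-probability of the best parse of seq under the toy grammar."""
    n = len(seq)
    # score[i][j] is the best log-probability of S deriving seq[i:j].
    score = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        score[i][i] = math.log(P_END)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S : seq[i] is unpaired.
            best = math.log(P_LEFT * single_emission(seq[i])) + score[i + 1][j]
            # S -> x S y : seq[i] pairs with seq[j - 1].
            if length >= 2:
                pair = math.log(P_PAIR * pair_emission(seq[i], seq[j - 1]))
                best = max(best, pair + score[i + 1][j - 1])
            # S -> S S : split into two non-empty subsequences.
            for k in range(i + 1, j):
                best = max(best, math.log(P_BIF) + score[i][k] + score[k][j])
            score[i][j] = best
    return score[0][n]
```

The triple loop over `i`, `j`, and the split point `k` gives the cubic running time in sequence length that makes heuristic speed-ups attractive for database-scale search. Under these toy parameters, a sequence that can form a stem, such as `GGGAAACCC`, scores higher than an unstructured one of the same length, such as `AAAAAAAAA`.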
Author information
Ying-Lei Song received his B.S. degree in physics from Tsinghua University in 1998, and his M.S. degree in computer science from Ohio University in 2003. He is at present a Ph.D. candidate in the Department of Computer Science at the University of Georgia. His research interests concentrate on designing efficient algorithms for predicting and studying secondary and tertiary structures of RNAs and proteins.
Ji-Zhen Zhao received his M.S. degree in biology from Peking University in 1997. He is currently a Ph.D. candidate in the Department of Computer Science at the University of Georgia. His research interests focus on modeling of RNA secondary structures and biological networks.
Chun-Mei Liu received her B.E. and M.E. degrees in computer science and engineering from Anhui University in 1999 and 2002, respectively. She is currently a Ph.D. candidate in the Department of Computer Science at the University of Georgia. Her research interests include secondary and tertiary structures of RNAs and proteins, graph theory, and theory of computation.
Kan Liu received his B.S. degree in engineering mechanics from Beijing Institute of Technology, China, in 1995 and his M.S. degree in computer science from the University of Georgia in 2004. He is a Ph.D. candidate at the University of California, Riverside. His research focuses on developing efficient algorithms and software for computational problems in molecular biology and genomics.
Russell L. Malmberg is a professor in the Plant Biology Department, University of Georgia, USA. He received his Ph.D. degree in genetics from the University of Wisconsin, then did postdoctoral work at Michigan State University and Cold Spring Harbor Laboratory, before moving to the University of Georgia. His current research interests are in bioinformatics and evolutionary genetics.
Li-Ming Cai is an associate professor in the Department of Computer Science at the University of Georgia. He received his Ph.D. degree in computer science from Texas A&M University in 1994. He also holds B.S. and M.S. degrees in computer science awarded by Tsinghua University. His current research interests include algorithms, computational biology, and theory of computation.
Cite this article
Song, YL., Zhao, JZ., Liu, CM. et al. RNA Structural Homology Search with a Succinct Stochastic Grammar Model. J Comput Sci Technol 20, 454–464 (2005). https://doi.org/10.1007/s11390-005-0454-x