Abstract
Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity but also spatially conserved conformations caused by residue interactions, and consequently is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases.
This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where in general the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity O(k t N 2) for the structure of N residues and its tree decomposition of width t. The parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. The paper demonstrates a successful application of the algorithm to developing a fast program for RNA structural homology search.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Arnborg, S., Proskurowski, A.: Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Applied Mathematics 23, 11–24 (1989)
Arnborg, S., Lagergren, J., Seese, D.: Easy problems for tree-decomposable graphs. Journal of Algorithms 12, 308–340 (1991)
Bowie, J., Luthy, R., Eisenberg, D.: A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164–170 (1991)
Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing 25, 1305–1317 (1996)
Bryant, S.H., Altschul, S.F.: Statistics of sequence-structure threading. Curr. Opinion Struct. Biol. 5, 236–244 (1995)
Brown, M., Wilson, C.: RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. In: Pacific Symposium on Biocomputing, pp. 109–125 (1995)
Cai, L., Malmberg, R., Wu, Y.: Stochastic Modeling of Pseudoknot Structures: A Grammatical Approach. Bioinformatics 19, i66–i73 (2003)
Dandekar, T., Schuster, S., Snel, B., Huynen, M., Bork, P.: Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochemical Journal. 1, 115–124 (1999)
Dandjinou, A.T., Lévesque, N., Larose, S., Lucier, J., Elela, S.A., Wellinger, R.J.: A phylogenetically based secondary structure for the yeast telomerase RNA. Current Biology 14, 1148–1158 (2004)
Doudna, J.A.: Structural genomics of RNA. Nature Structural Biology 7(11) supp., 954–956 (2000)
Downey, R., Fellows, M.: Parameterized Complexity. Springer, Heidelberg (1999)
Eddy, S.R.: Computational genomics of non-coding RNA genes. Cell 109, 137–140 (2002)
Eddy, S., Durbin, R.: RNA sequence analysis using covariance models. Nucleic Acids Research 22, 2079–2088 (1994)
Eppstein, D.: Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms and Applications 3.3, 1–27 (1999)
Geobel, S.J., Hsue, B., Dombrowski, T.F., Masters, P.S.: Characterization of the RNA components of a Putative Molecular Switch in the 3’ Untranslated Region of the Murine Coronavirus Genome. Journal of Virology 78, 669–682 (2004)
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R.: Rfam: an RNA family database. Nucleic Acids Research 31, 439–441 (2003)
Klein, R.J., Eddy, S.R.: RSEARCH: Finding Homologs of Single Structured RNA Sequences. BMC Bioinformatics 4, 44 (2003)
Lathrop, R.H.: The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineering 7, 1069–1068 (1994)
Lathrop, R.H., Rogers Jr., R.G., Bienkowska, J., Bryant, B.K.M., Buturovic, L.J., Gaitatzes, C., Nambudripad, R., White, J.V., Smith, T.F.: Analysis and algorithms for protein sequence-structure alignment. In: Salzberg, Searls, Kasif (eds.) Computational Methods in Molecular Biology. Elsevier, Amsterdam (1998)
Lenhof, H.-P., Reinert, K., Vingron, M.: A polyhedral approach to RNA sequence structure alignment. Journal of Computational Biology 5(3), 517–530 (1998)
Liu, C., Song, Y., Malmberg, R., Cai, L.: Profiling and searching for RNA pseudoknot structures in genomes. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3515, pp. 968–975. Springer, Heidelberg (2005)
Lowe, T.M., Eddy, S.R.: tRNAscan-SE: A Program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964 (1997)
Lyngso, S.B., Pedersen, C.N.: RNA pseudoknot prediction in energy-based models. Journal of Computational Biololgy 7(3), 409–427 (2000)
Matousek, J., Thomas, R.: On the complexity of finding iso- and other morphisms for partial k-trees. Discrete Mathematics 108, 343–364 (1992)
Marcotte, E.M., Matteo, P., Ng, H.L., Rice, D.W., Yeates, T.O., Eisenberg, D.: Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753 (1999)
Nameki, N., Felden, B., Atkins, J.F., Gesteland, R.F., Himeno, H., Muto, A.: Functional and structural analysis of a pseudoknot upstream of the tag-encoded sequence in E. coli tmRNA. Journal of Molecular Biology 286(3), 733–744 (1999)
Pervouchine, D.D.: IRIS: Intermolecular RNA Interaction Search. Genome Informatics 15(2), 92–101 (2004)
Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., Kececioglu, J.D.: A branch-and-cut algorithm for multiple sequence alignment. In: Proceedings of the first annual international conference on Computational molecular biology, pp. 241–250 (1997)
Rivas, E., Eddy, S.R.: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2, 8 (2001)
Robertson, N., Seymour, P.D.: Graph Minors II. Algorithmic aspects of tree-width. Journal of Algorithms 7, 309–322 (1986)
Song, Y., Liu, C., Malmberg, R., Pan, F., Cai, L.: Tree decomposition based fast search of RNA secondary structures in genomes. In: Proceedings of 2005 IEEE Computational Systems Biology Conference (2005), (in press)
Uemura, Y., Hasegawa, A., Kobayashi, Y., Yokomori, T.: Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210, 277–303 (1999)
Wang, G., Dunbrack Jr., R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
Xu, D., Unseren, M.A., Xu, Y., Uberbacher, E.C.: Sequence-structure specificity of a knowledge based energy function at the secondary structure level. Bioinformatics 16, 257–268 (2000)
Xu, J.: Rapid side-chain packing via tree decomposition. In: Proceedings of 2005 International Conference on Research in Computational Biology (to appear)
Xu, J., Li, M., Kim, D., Xu, Y.: RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology 1(1), 95–113 (2003)
Xu, Y., Xu, D., Uberbacher, E.C.: An efficient computational method for globally optimal threading. Journal of Computational Biology 5(3), 597–614
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, Y., Liu, C., Huang, X., Malmberg, R.L., Xu, Y., Cai, L. (2005). Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science(), vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_31
Download citation
DOI: https://doi.org/10.1007/11557067_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer ScienceComputer Science (R0)