Abstract
In this paper, we study the Transcription Factor Binding Sites (TFBS) prediction problem in bioinformatics. We develop a novel parameterized approach that can efficiently explore the space of all possible locations of TFBSs in a set of homologous sequences with high accuracy. The exploration is performed by an ensemble of a few Hidden Markov Models (HMM), where the size of the ensemble is the parameter of the algorithm. The ensemble is initially constructed through the local alignments between two sequences that have the lowest similarity value in the sequence set, the parameters of each HMM in the ensemble are revised when the remaining sequences in the set are scanned through by it one by one. A list of possible TFBSs are generated when all sequences in the set have been processed by the ensemble. Testing results showed that this approach can accurately handle the cases where a single sequence may contain multiple binding sites and thus has advantages over most of the existing approaches when a sequence may contain multiple binding sites.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
References
Bailey, T.L., Elkan, C.: Unsupervised Learning of Multiple Motifs In Biopolymers Using Expectation Maximization. Technical Report CS93-302, Department of Computer Science, University of California, San Diego (August 1993)
Bailey, T.L., Elkan, C.: Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers. In: Proceedings of the Second International Conference on Intelligient Systems for Molecular Biology, pp. 28–36 (1994)
Brent, M.M., Anand, R., Marmorstein, R.: Structural Basis for DNA Recognition by Foxo1 and its Regulation by Posttranslational Modification. Structure 16, 1407–1416 (2008)
Che, D., Song, Y., Rasheed, K.: MDGA: Motif Discovery Using a Genetic Algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 447–452 (2005)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)
Galas, D.J., Schmitz, A.: A DNA Footprinting: A Simple Method for the Detection of Protein-DNA Binding Specificity. Nucleic Acids Research 5(9), 3157–3170 (1978)
Garner, M.M., Revzin, A.: A Gel Electrophoresis Method for Quantifying He Binding of Proteins to Specific DNA Regions: Application to Components of the Escherichia Coli Lactose Operon Regulatory Systems. Nucleic Acids Research 9(13), 3047–3060 (1981)
Hertz, G.Z., Stormo, G.D.: Identifying DNA and Protein Patterns with Statistically Significant Alignments of Multiple Sequences. Bioinformatcs 15(7), 53–577 (1999)
Hu, T.C., et al.: Snail Associates with EGR-1 and SP-1 to Upregulate Transcriptional Activation of P15ink4b. The FEBS Journal 277, 1202–1218 (2010)
Liu, C., Song, Y., Shapiro, L.W.: RNA Folding Including Pseudoknots: A New Parameterized Algorithm and Improved Upper Bound. In: Proceedings of the 7th Workshop on Algorithms in Bioinformatics, pp. 310–322 (2007)
Liu, C., Song, Y., Burge III, L.L.: Parameterized Lower Bound and Inapproximability of Polylogarithmic String Barcoding. Journal of Combinatorial Optimization 16(1), 39–49 (2008)
Liu, C., Song, Y.: Parameterized Dominating Set Problem in Chordal Graphs: Complexity and Lower Bound. Journal of Combinatorial Optimization 18(1), 87–97 (2009)
Liu, C., Song, Y.: Parameterized Complexity and Inapproximability of Dominating Set Problem in Chordal and Near Chordal Graphs. Journal of Combinatorial Optimization 22(4), 684–698 (2011)
Liu, F.F.M., Tsai, J.J.P., Chen, R.M., Chen, S.N., Shih, S.H.: FMGA: Finding Motifs by Genetic Algorithm. In: IEEE Fourth Symposium on Bioinformatics And Bioengineering, pp. 459–466 (2004)
Liu, J.S., Neuwald, A.F., Lawrence, C.E.: Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies. J. Am. Stat. Assoc. 90(432), 1156–1170 (1995)
Liu, X., Brutlag, D.L., Liu, J.S.: Bioprospector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes. In: Pacific Symposium of Biocomputing, vol. 6, pp. 127–1138 (2001)
Quigley, M., et al.: Transcriptional Analysis of HIV-Specific CD8+ T Cells Shows That PD-1 Inhibits T Cell Function by Upregulating BATF. Nature Medicine 16, 1147–1151 (2010)
Rigbolt, K.T., et al.: System-Wide Temporal Characterization of the Proteome and Phosphoproteome of Human Embryonic Stem Cell Differentiation. Science Signaling 4, RS3–RS3 (2011)
Roth, F.R., Hughes, J.D., Estep, P.E., Church, G.M.: Finding DNA Regulatory Motifs Within Unaligned Non-Coding Sequences Clustered by Whole-Genome Mrna Quantitation. Nature Biotechnology 16(10), 939–945 (1998)
Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. Journal of Molecular Biology 147, 195–197 (1981)
Song, J., Liu, C., Song, Y., Qu, J., Hura, G.: Alignment of Multiple Proteins With an Ensemble of Hidden Markov Models. International Journal of Bioinformatics and Data Mining 4(1), 60–71 (2010)
Song, Y., Liu, C., Huang, X., Malmberg, R.L., Xu, Y., Cai, L.: Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment. In: Casadio, R., Myers, G. (eds.) WABI 2005. LNCS (LNBI), vol. 3692, pp. 376–388. Springer, Heidelberg (2005)
Song, Y., Liu, C., Malmberg, R.L., Pan, F., Cai, L.: Tree Decomposition Based Fast Search of RNA Structures Including Pseudoknots in Genomes. In: Proceedings of IEEE 2005 Computational Systems Bioinformatics Conference, pp. 223–234 (2005)
Song, Y., Zhao, J., Liu, C., Liu, K., Malmberg, R.L., Cai, L.: RNA Structural Homology Search with a Succinct Stochastic Grammar Model. Journal of Computer Science and Technology 20(4), 454–464 (2005)
Song, Y., Liu, C., Huang, X., Malmberg, R.L., Xu, Y., Cai, L.: Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3(4), 423–432 (2006)
Song, Y., Liu, C., Malmberg, R.L., He, C., Cai, L.: Memory Efficient Alignment Between RNA Sequences and Stochastic Grammar Models of Pseudoknots. International Journal on Bioinformatics Research and Applications 2(3), 289–304 (2006)
Song, Y.: A New Parameterized Algorithm for Rapid Peptide Sequencing. Plos ONE 9(2), E87476 (2014)
Song, Y., Chi, A.Y.: A New Approach for Parameter Estimation in the Sequence-Structure Alignment of Non-Coding Rnas. Journal of Information Science and Engineering (in press, 2014)
Song, Y.: An Improved Parameterized Algorithm for the Independent Feedback Vertex Set Problem. Theoretical Computer Science (2014), doi:10.1016/J.Tcs.2014.03.031
Stormo, G.D.: Computer Methods for Analyzing Sequence Recognition of Nucleic Acids. Annu. Rev. Biochem. 17, 241–263 (1988)
Stormo, G.D., Hartzell, G.W.: Identifying Protein-Binding Sites from Unaligned DNA Fragments. Proc. of Nat. Acad. Sci. 86(4), 1183–1187 (1989)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Song, Y., Wang, C., Qu, J. (2014). A Parameterized Algorithm for Predicting Transcription Factor Binding Sites. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science(), vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_41
Download citation
DOI: https://doi.org/10.1007/978-3-319-09330-7_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09329-1
Online ISBN: 978-3-319-09330-7
eBook Packages: Computer ScienceComputer Science (R0)