A Compact Mathematical Programming Formulation for DNA Motif Finding

Kingsford, Carl; Zaslavsky, Elena; Singh, Mona

doi:10.1007/11780441_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

In the motif finding problem one seeks a set of mutually similar subsequences within a collection of biological sequences. This is an important and widely studied problem, as such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find subsequences of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem, which uses the fact that distances between subsequences come from a limited set of possibilities. We show how to tighten its linear programming relaxation by adding an exponential set of constraints and give an efficient separation algorithm that can find violated constraints, thereby showing that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions for the motif finding problem and show that it is effective in practice in uncovering known transcription factor binding sites.
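
The objective described in the abstract can be made concrete with a small brute-force reference. The sketch below is not the authors' integer linear program. It assumes Hamming distance between equal-length subsequences and that exactly one subsequence of the given length is chosen from each input sequence; all function names are hypothetical. It also illustrates why distances come from a limited set: for subsequences of length l, the Hamming distance can only take the values 0, 1, ..., l, so the candidate positions in another sequence fall into at most l + 1 distance classes.

```python
from itertools import product

# Hypothetical helpers for illustration only. Assumes Hamming distance and
# one window per sequence; this exhaustive search is NOT the paper's ILP.


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_windows(seq: str, length: int) -> list[str]:
    """All substrings of seq with the given length, ordered by start position."""
    return [seq[p:p + length] for p in range(len(seq) - length + 1)]


def sum_of_pairs(choice: tuple[str, ...]) -> int:
    """Sum of Hamming distances over all unordered pairs of chosen windows."""
    return sum(
        hamming(choice[i], choice[j])
        for i in range(len(choice))
        for j in range(i + 1, len(choice))
    )


def brute_force_motif(seqs: list[str], length: int) -> tuple[int, tuple[int, ...]]:
    """Pick one window per sequence minimizing the sum of pairwise distances.

    Returns (best cost, start positions). Exponential in the number of
    sequences, so it is only usable on tiny inputs.
    """
    windows = [candidate_windows(s, length) for s in seqs]
    if any(not w for w in windows):
        raise ValueError("every sequence must be at least `length` long")
    best_cost, best_starts = None, None
    for starts in product(*(range(len(w)) for w in windows)):
        choice = tuple(windows[i][p] for i, p in enumerate(starts))
        cost = sum_of_pairs(choice)
        if best_cost is None or cost < best_cost:
            best_cost, best_starts = cost, starts
    return best_cost, best_starts


def distance_classes(window: str, other_seq: str, length: int) -> dict[int, list[int]]:
    """Group start positions in other_seq by Hamming distance to `window`.

    Under the Hamming-distance assumption the distance lies in {0, ..., length},
    so there are at most length + 1 classes.
    """
    classes: dict[int, list[int]] = {d: [] for d in range(length + 1)}
    for p, w in enumerate(candidate_windows(other_seq, length)):
        classes[hamming(window, w)].append(p)
    return classes
```

For example, brute_force_motif(["ACGTTGCA", "TTACGTAA", "GGACGTGG"], 4) finds the shared ACGT at start positions (0, 2, 2) with cost 0. Because exhaustive search grows exponentially with the number of sequences, a sketch like this is only useful for checking results on very small inputs.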

Author information

Authors and Affiliations

Center for Bioinformatics & Computational Biology, University of Maryland, College Park, MD
Carl Kingsford
Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ
Elena Zaslavsky & Mona Singh

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Moshe Lewenstein
Department of Software, Technical University of Catalonia, 08034, Barcelona, Spain
Gabriel Valiente

About this paper

Cite this paper

Kingsford, C., Zaslavsky, E., Singh, M. (2006). A Compact Mathematical Programming Formulation for DNA Motif Finding. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_22

DOI: https://doi.org/10.1007/11780441_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)
