Abstract
Sequence assembly from short reads is an important problem in biology. Solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is known to be intractable. However, finding a Shortest Double-stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic for getting close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying un-weighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in \(O(|E|^2 \log^2(|V|))\) time.
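To make "all the k-long words in the reads" concrete for double-stranded DNA, the following is a minimal sketch that collects k-mers while treating a word and its reverse complement as the same word. The function names `reverse_complement` and `canonical_kmers`, and the choice of the lexicographically smaller strand as the representative, are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only: names and the "canonical" convention are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word over the alphabet ACGT."""
    return word.translate(_COMPLEMENT)[::-1]


def canonical_kmers(reads, k):
    """Collect every k-long word in the reads, merging a word with its
    reverse complement by keeping the lexicographically smaller of the two."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            kmers.add(min(word, reverse_complement(word)))
    return kmers
```

With this convention, a read and its reverse complement contribute the same set of words, which is the property a double-stranded model needs.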
In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a \(\Theta(p(|V|+|E|)\log(|V|) + (d_{max}p)^3)\) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where \(p = \max\{|\{v \mid d_{in}(v) - d_{out}(v) > 0\}|, |\{v \mid d_{in}(v) - d_{out}(v) < 0\}|\}\) and \(d_{max} = \max\{|d_{in}(v) - d_{out}(v)|\}\). Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. In our experiments on various datasets, the value of p/|V| lies between 0.08% and 0.13% with 95% probability.
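The quantities p and d_max in the bound can be read directly off the node degrees. The sketch below assumes a hypothetical edge encoding, `(u, end_u, v, end_v)`, in which each edge end is labelled `"in"` or `"out"` depending on whether the arrow at that end points into or away from its node; this encoding and the name `imbalance_stats` are assumptions made for illustration, not the paper's data structure.

```python
from collections import defaultdict


def imbalance_stats(edges):
    """Return (p, d_max, diff) for a bi-directed graph.

    edges: iterable of (u, end_u, v, end_v), each end being "in" or "out"
    (an assumed encoding). diff maps each node to d_in(v) - d_out(v).
    """
    d_in = defaultdict(int)
    d_out = defaultdict(int)
    for u, end_u, v, end_v in edges:
        for node, end in ((u, end_u), (v, end_v)):
            if end == "in":
                d_in[node] += 1
            elif end == "out":
                d_out[node] += 1
            else:
                raise ValueError(f"unknown edge end label: {end!r}")

    nodes = set(d_in) | set(d_out)
    diff = {node: d_in[node] - d_out[node] for node in nodes}

    surplus = sum(1 for x in diff.values() if x > 0)   # d_in(v) - d_out(v) > 0
    deficit = sum(1 for x in diff.values() if x < 0)   # d_in(v) - d_out(v) < 0
    p = max(surplus, deficit)
    d_max = max((abs(x) for x in diff.values()), default=0)
    return p, d_max, diff
```

For example, the three edges `a→b`, `b→c`, `a→c` (each written as `(tail, "out", head, "in")`) leave `b` balanced, give `a` an imbalance of -2 and `c` an imbalance of +2, so p = 1 and d_max = 2.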
Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases, it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. We also present a \(\Theta((|V| + |E|)\log(|V|))\) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.
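As a rough illustration of single source shortest paths on a bi-directed graph, the sketch below runs Dijkstra's algorithm over (node, arrival end) states. It is not the paper's algorithm. It assumes non-negative weights, the hypothetical edge encoding `(u, end_u, v, end_v, weight)` with ends labelled `"in"` or `"out"`, and the usual bi-directed walk rule that a walk must leave a node through an edge end whose label is opposite to the one it arrived through.

```python
import heapq
from collections import defaultdict
from itertools import count


def bidirected_sssp(edges, source):
    """Shortest valid-walk distance from source to every reachable node.

    edges: iterable of (u, end_u, v, end_v, weight), weight >= 0 (assumed).
    A walk arriving at a node through an "in" end must leave through an
    "out" end, and vice versa; the source may be left through any end.
    """
    adj = defaultdict(list)  # node -> [(end_here, neighbour, end_there, weight)]
    for u, end_u, v, end_v, w in edges:
        adj[u].append((end_u, v, end_v, w))
        adj[v].append((end_v, u, end_u, w))  # edges can be walked both ways

    tie = count()            # tie-breaker so the heap never compares states
    dist = {}                # (node, arrival end) -> settled distance
    heap = [(0, next(tie), source, None)]
    while heap:
        d, _, node, arrived = heapq.heappop(heap)
        if (node, arrived) in dist:
            continue
        dist[(node, arrived)] = d
        for end_here, nbr, end_there, w in adj[node]:
            if arrived is not None and end_here == arrived:
                continue     # would violate the opposite-end rule
            if (nbr, end_there) not in dist:
                heapq.heappush(heap, (d + w, next(tie), nbr, end_there))

    best = {}                # collapse states back to plain nodes
    for (node, _), d in dist.items():
        if node not in best or d < best[node]:
            best[node] = d
    return best
```

Each node contributes at most three states, so the number of states and relaxations stays linear in |V| + |E| and the heap gives the logarithmic factor, in line with the bound quoted above.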
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kundeti, V., Rajasekaran, S., Dinh, H. (2010). An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs. In: Wu, W., Daescu, O. (eds) Combinatorial Optimization and Applications. COCOA 2010. Lecture Notes in Computer Science, vol 6508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17458-2_16
DOI: https://doi.org/10.1007/978-3-642-17458-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17457-5
Online ISBN: 978-3-642-17458-2
eBook Packages: Computer Science, Computer Science (R0)