Abstract
We describe an efficient method for assembling short reads into long sequences. The method uses a hashing technique to compute overlaps between short reads, allowing base mismatches within the overlaps. An overlap graph is then constructed in which each vertex represents a read and each edge represents an overlap. Graph algorithms explore the overlap graph to find unique paths of reads, each of which represents a contig. The consensus sequence of each contig is constructed by computing ungapped alignments of multiple reads. This strategy has been implemented as a short read assembly program called PCAP.Solexa. We also describe how to use PCAP.Solexa to assemble short reads.
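The steps named in the abstract (hashed overlap detection with mismatches, an overlap graph, unique paths, ungapped consensus) can be illustrated with a toy sketch. This is not the PCAP.Solexa implementation. The function names, the plain k-mer hash index, the forward-strand-only restriction, the simple transitive-edge reduction, and the default parameter values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def find_overlaps(reads, k=4, min_overlap=6, max_mismatches=1):
    """Return {(a, b): offset}: a suffix of reads[a] overlaps a prefix of reads[b]."""
    # Hash table (assumed design): k-mer -> list of (read index, position).
    index = defaultdict(list)
    for r, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((r, pos))

    overlaps = {}
    for hits in index.values():
        for a, i in hits:
            for b, j in hits:
                offset = i - j  # start of read b relative to read a
                if a == b or offset <= 0 or (a, b) in overlaps:
                    continue
                sa, sb = reads[a], reads[b]
                if offset + len(sb) <= len(sa):
                    continue  # b lies inside a: not a suffix-prefix overlap
                if len(sa) - offset < min_overlap:
                    continue
                # Ungapped comparison; zip stops at the end of read a.
                mismatches = sum(x != y for x, y in zip(sa[offset:], sb))
                if mismatches <= max_mismatches:
                    overlaps[(a, b)] = offset
    return overlaps


def reduce_transitive(overlaps):
    """Drop edge a->c when edges a->b and b->c already connect the two reads."""
    succ = defaultdict(set)
    for a, b in overlaps:
        succ[a].add(b)
    return {
        (a, c): off
        for (a, c), off in overlaps.items()
        if not any(c in succ.get(b, ()) for b in succ[a] if b != c)
    }


def unique_paths(n_reads, edges):
    """Chain reads while the out-degree and the successor's in-degree are both 1."""
    out_edges, in_deg = defaultdict(list), Counter()
    for a, b in edges:
        out_edges[a].append(b)
        in_deg[b] += 1

    def linked(a):  # unambiguous successor of read a, or None
        nxt = out_edges[a]
        return nxt[0] if len(nxt) == 1 and in_deg[nxt[0]] == 1 else None

    has_pred = {linked(a) for a in range(n_reads)} - {None}
    paths = []
    for start in range(n_reads):  # reads on a pure cycle are skipped in this toy
        if start in has_pred:
            continue
        path, nxt = [start], linked(start)
        while nxt is not None:
            path.append(nxt)
            nxt = linked(nxt)
        paths.append(path)
    return paths


def consensus(reads, path, edges):
    """Majority vote per column; reads are placed by offset, with no gaps."""
    columns, start = defaultdict(Counter), 0
    for prev, cur in zip([None] + path, path):
        if prev is not None:
            start += edges[(prev, cur)]
        for i, base in enumerate(reads[cur]):
            columns[start + i][base] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))


if __name__ == "__main__":
    # Four 12-base reads tiled every 4 bases; the second read carries one error.
    demo = ["ACGTTGCAAGGC", "TGCAAGTCTTAC", "AGGCTTACGATC", "TTACGATCCGTA"]
    graph = reduce_transitive(find_overlaps(demo))
    for contig in unique_paths(len(demo), graph):
        print(contig, consensus(demo, contig, graph))
```

In this sketch a single shared k-mer proposes an offset, and the full overlap is then checked base by base, so a mismatch is tolerated as long as some k-mer in the overlap is error-free. A production assembler would also need to handle reverse complements, repeats, and quality values, none of which are modeled here.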
Acknowledgements
The author thanks Shiaw-Pyng Yang for invaluable feedback on using PCAP.Solexa on various datasets of short reads.
Copyright information
© 2017 Springer Science+Business Media New York
About this protocol
Cite this protocol
Huang, X. (2017). Sequence Assembly. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_2
DOI: https://doi.org/10.1007/978-1-4939-6622-6_2
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6620-2
Online ISBN: 978-1-4939-6622-6
eBook Packages: Springer Protocols