Sequence Assembly

Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1525))

Abstract

We describe an efficient method for assembling short reads into long sequences. The method uses a hashing technique to compute overlaps between short reads while allowing base mismatches in the overlaps. An overlap graph is then constructed in which each vertex represents a read and each edge represents an overlap. Graph algorithms explore the overlap graph to find unique paths of reads, each of which represents a contig. The consensus sequence of each contig is constructed by computing gap-free alignments of multiple reads. This strategy has been implemented as a short read assembly program called PCAP.Solexa. We also describe how to use PCAP.Solexa to assemble short reads.
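The pipeline outlined in the abstract (hashed overlap detection, overlap graph, unique paths, gap-free consensus) can be illustrated with a small toy sketch. This is not the PCAP.Solexa implementation: the function names, the parameters `k`, `min_len` and `max_mismatches`, and the choice to seed each overlap on the first `k` bases of the downstream read are all assumptions made for illustration. A real assembler would also handle reverse complements, contained reads, transitive edges and cycles, which this sketch ignores.

```python
from collections import Counter, defaultdict


def find_overlaps(reads, k=4, min_len=5, max_mismatches=1):
    """Return {(i, j): offset}: a suffix of reads[i] overlaps a prefix of reads[j].

    Assumption: candidates are seeded by an exact match on the first k bases
    of read j, so mismatches are tolerated only after the seed.
    """
    seed_index = defaultdict(list)          # hash table: prefix k-mer -> read ids
    for j, read in enumerate(reads):
        seed_index[read[:k]].append(j)

    overlaps = {}
    for i, read in enumerate(reads):
        for pos in range(1, len(read) - k + 1):
            for j in seed_index.get(read[pos:pos + k], ()):
                overlap_len = len(read) - pos
                if i == j or overlap_len < min_len or overlap_len > len(reads[j]):
                    continue
                # Gap-free comparison of the suffix of read i with the prefix of read j.
                mismatches = sum(a != b for a, b in zip(read[pos:], reads[j]))
                if mismatches <= max_mismatches and (i, j) not in overlaps:
                    overlaps[(i, j)] = pos  # smallest pos = longest overlap
    return overlaps


def unique_paths(n_reads, overlaps):
    """Chain reads along edges where the successor and predecessor are both unique."""
    succ, pred = defaultdict(list), defaultdict(list)
    for i, j in overlaps:
        succ[i].append(j)
        pred[j].append(i)

    def unique_next(i):
        if len(succ[i]) == 1 and len(pred[succ[i][0]]) == 1:
            return succ[i][0]
        return None

    def unique_prev(j):
        if len(pred[j]) == 1 and len(succ[pred[j][0]]) == 1:
            return pred[j][0]
        return None

    paths, seen = [], set()
    for start in range(n_reads):
        if start in seen or unique_prev(start) is not None:
            continue                        # not the first read of a path
        path = [start]
        seen.add(start)
        nxt = unique_next(start)
        while nxt is not None and nxt not in seen:
            path.append(nxt)
            seen.add(nxt)
            nxt = unique_next(nxt)
        paths.append(path)
    return paths


def consensus(reads, path, overlaps):
    """Stack the reads of one path at their offsets and take a per-column majority."""
    columns = defaultdict(Counter)
    offset = 0
    for idx, read_id in enumerate(path):
        if idx > 0:
            offset += overlaps[(path[idx - 1], read_id)]
        for p, base in enumerate(reads[read_id]):
            columns[offset + p][base] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))


# Three reads from a made-up 16-base sequence; the first read has one base error.
reads = ["ACGTTGCAAG", "TTGCATGCCG", "CATGCCGATA"]
overlaps = find_overlaps(reads)
contigs = [consensus(reads, path, overlaps) for path in unique_paths(len(reads), overlaps)]
print(contigs)
```

In this toy input the erroneous base in the first read falls in a column covered by all three reads, so the majority vote in `consensus` corrects it. Setting `min_len=5` also keeps the short 4-base overlap between the first and third reads out of the graph, which is why the three reads form a single unique path here.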

Acknowledgements

The author thanks Shiaw-Pyng Yang for invaluable feedback on using PCAP.Solexa on various datasets of short reads.

Author information

Corresponding author

Correspondence to Xiaoqiu Huang.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Huang, X. (2017). Sequence Assembly. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_2

  • DOI: https://doi.org/10.1007/978-1-4939-6622-6_2

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6620-2

  • Online ISBN: 978-1-4939-6622-6

  • eBook Packages: Springer Protocols
