Abstract
Whole genome sequencing provides the most comprehensive catalog of an individual's genetic variation. As the cost of sequencing technology falls, we envision a paradigm shift from microarray-based genotyping studies to whole genome sequencing. We review methodologies for whole genome sequencing. There are two approaches for assembling short shotgun sequence reads into longer contiguous genomic sequences. In the de novo assembly approach, sequence reads are compared with one another, and overlapping reads are merged to build longer contiguous sequences. In the reference-based assembly approach, each read is mapped to a reference genome sequence. We discuss methods for identifying genetic variation (single nucleotide polymorphisms, small indels, and copy number variants) and for building haplotypes from genome assemblies, along with potential pitfalls. We expect methodologies to evolve rapidly as sequencing technologies improve and more human genomes are sequenced.
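The two assembly approaches named in the abstract can be illustrated with a toy sketch. It is not the chapter's method: the function names (`overlap`, `greedy_assemble`, `map_reads`), the minimum-overlap threshold, and the example reads are all assumptions made for illustration. It also assumes error-free reads from a single haploid sequence, whereas real assemblers and read mappers must handle sequencing errors, repeats, and diploidy.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """De novo sketch: repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


def map_reads(reads, reference):
    """Reference-based sketch: exact-match start position of each read, or -1."""
    return {read: reference.find(read) for read in reads}


# Hypothetical toy data: three error-free reads drawn from a short reference.
reference = "ACGTTGCAAGCT"
reads = ["ACGTTG", "TTGCAA", "CAAGCT"]

contigs = greedy_assemble(reads)
positions = map_reads(reads, reference)
print(contigs)
print(positions)
```

The de novo path needs only the reads themselves but compares every pair, while the reference-based path is a simple lookup that depends on a reference being available and close to the sample.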
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Ng, P.C., Kirkness, E.F. (2010). Whole Genome Sequencing. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_12
DOI: https://doi.org/10.1007/978-1-60327-367-1_12
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-366-4
Online ISBN: 978-1-60327-367-1
eBook Packages: Springer Protocols