Abstract
Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture [1]. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples, an approach commonly referred to as “metagenomics” or “community genomics”. In the literature, however, the term metagenomics is applied loosely to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun (“random”) sequencing of the genomic DNA of a sample taken directly from the environment. The resulting metagenome can be thought of as a sample of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.
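The idea of a metagenome as a random sample of a community's collective genome can be illustrated with a small simulation. The sketch below is not taken from the chapter: the function name `shotgun_sample`, the toy genomes, and the abundances are all hypothetical. It assumes an idealized process in which the chance that a read originates from a given organism is proportional to its relative abundance multiplied by its genome length, and it ignores extraction, amplification, and sequencing biases.

```python
import random
from collections import Counter


def shotgun_sample(genomes, abundances, n_reads, read_len, seed=0):
    """Draw random fixed-length reads from a toy microbial community.

    genomes    -- dict mapping organism name to its genome sequence (str)
    abundances -- dict mapping organism name to relative cell abundance
    Returns a list of (organism name, read sequence) tuples.
    """
    rng = random.Random(seed)
    names = list(genomes)
    # Idealized assumption: the amount of DNA contributed by each organism
    # scales with (relative abundance) x (genome length).
    weights = [abundances[name] * len(genomes[name]) for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        # Pick a uniformly random start so the read fits inside the genome.
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((name, genome[start:start + read_len]))
    return reads


if __name__ == "__main__":
    # Hypothetical two-member community with random toy genomes.
    rng = random.Random(42)
    community = {
        "dominant_sp": "".join(rng.choice("ACGT") for _ in range(5000)),
        "rare_sp": "".join(rng.choice("ACGT") for _ in range(3000)),
    }
    reads = shotgun_sample(
        community,
        {"dominant_sp": 0.95, "rare_sp": 0.05},
        n_reads=1000,
        read_len=100,
    )
    # Tally how many reads each organism contributed.
    print(Counter(name for name, _ in reads))
```

In a simulation like this, rare community members contribute few reads, which is one reason sequencing depth matters when planning a metagenomic project.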
References
Amann R, Ludwig W, Schleifer K (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59:143–169
Breitbart M et al (2002) Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A 99:14250–14255
Venter JC et al (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66–74
Breitbart M et al (2003) Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol 185:6220–6223
Hallam SJ et al (2004) Reverse methanogenesis: testing the hypothesis with environmental genomics. Science 305:1457–1462
Gill SR et al (2006) Metagenomic analysis of the human distal gut microbiome. Science 312:1355–1359
Warnecke F et al (2007) Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450:560–565
Tringe SG et al (2005) Comparative metagenomics of microbial communities. Science 308:554–557
Tyson GW et al (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37–43
Béjà O et al (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289:1902–1906
Hess M et al (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–467
Hemme CL et al (2010) Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672
Pagani I et al (2012) The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 40:D571–D579
Peterson J et al (2009) The NIH human microbiome project. Genome Res 19:2317–2323
Kroeber M et al (2009) Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing. J Biotechnol 142:38–49
Boetius A et al (2000) A marine microbial consortium apparently mediating anaerobic oxidation of methane. Nature 407:623–626
DeAngelis KM et al (2011) Characterization of trapped lignin-degrading microbes in tropical forest soil. PLoS ONE 6:e19306
Ding H, Valentine DL (2008) Methanotrophic bacteria occupy benthic microbial mats in shallow marine hydrocarbon seeps, Coal Oil Point, California. J Geophys Res 113:G01015
Edwards R et al (2006) Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7:57
Havelsrud O et al (2011) A metagenomic study of methanotrophic microorganisms in Coal Oil Point seep sediments. BMC Microbiol 11:221
Poinar HN et al (2006) Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311:392–394
Turnbaugh PJ et al (2006) An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444:1027–1031
Coetzee B et al (2010) Deep sequencing analysis of viruses infecting grapevines: virome of a vineyard. Virology 400:157–163
Lazarevic V et al (2009) Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J Microbiol Meth 79:266–271
Qin J et al (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65
Sorek R et al (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318:1449–1452
Huse SM et al (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143
Gilles A et al (2011) Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12:245
Bordoni R et al (2008) Evaluation of human gene variant detection in amplicon pools by the GS-FLX parallel pyrosequencer. BMC Genomics 9:464
Moore M et al (2006) Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol 6:17
Hornshøj H et al (2009) Transcriptomic and proteomic profiling of two porcine tissues using high-throughput technologies. BMC Genomics 10:30
Jiménez DJ et al (2012) Structural and functional insights from the metagenome of an acidic hot spring microbial planktonic community in the Colombian Andes. PLoS ONE 7(12):e50269
Kunin V et al (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123
Dohm JC et al (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36:e105
Hillier LW et al (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat Meth 5:183–188
Aird D et al (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 12:R18
Quail MA et al (2008) A large genome center’s improvements to the Illumina sequencing system. Nat Meth 5:1005–1010
Kozarewa I et al (2009) Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G + C)-biased genomes. Nat Meth 6:291–295
Dohm JC et al (2007) SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 17:1697–1706
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829
DiGuistini S et al (2009) De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biol 10:R94
Reinhardt JA et al (2009) De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 19:294–305
Whiteford N et al (2005) An analysis of the feasibility of short read sequencing. Nucleic Acids Res 33:e171
Kassai-Jáger E et al (2008) Distribution and evolution of short tandem repeats in closely related bacterial genomes. Gene 410:18–25
Rothberg JM et al (2011) An integrated semiconductor device enabling non-optical genome sequencing. Nature 475:348–352
Bragg LM et al (2013) Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput Biol 9(4):e1003031
Quail MA et al (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13:341
Loman NJ et al (2012) Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotech 30(5):434–439
Jünemann S et al (2012) Bacterial community shift in treated periodontitis patients revealed by Ion Torrent 16S rRNA gene amplicon sequencing. PLoS ONE 7(8):e41606
Yergeau E et al (2012) Next-generation sequencing of microbial communities in the Athabasca river and its tributaries in relation to oil sands mining activities. Appl Environ Microbiol 78(21):7626–7637
Solonenko SA et al (2013) Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics 14:320
Whiteley AS et al (2012) Microbial 16S rRNA Ion Tag and community metagenome sequencing using the Ion Torrent (PGM) Platform. J Microbiol Meth 91:80–88
Seshadri R et al (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5:e75
Markowitz VM et al (2006) An experimental metagenome data management and analysis system. Bioinformatics 22:e359–e367
Meyer F et al (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386
The Hannon Lab FASTX toolkit. http://hannonlab.cshl.edu/fastx_toolkit/index.html
Babraham Bioinformatics FastQC. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Blanca J et al (2011) ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics 12:285
Quinlan AR et al (2008) Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Meth 5:179–181
Ossowski S et al (2008) Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18:2024–2033
Balzer S et al (2010) Characteristics of 454 pyrosequencing data-enabling realistic simulation with flowsim. Bioinformatics 26:i420–i425
Quince C et al (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Meth 6:639–641
Bragg LM et al (2012) Fast, accurate error-correction of amplicon pyrosequences using Acacia. Nat Meth 9(5):425–426
Salzberg SL et al (2008) Gene-boosted assembly of a novel bacterial genome from very short reads. PLoS Comput Biol 4:e1000186
Simpson JT et al (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123
MacCallum I et al (2009) ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 10:R103
Chaisson MJ, Pevzner PA (2008) Short read fragment assembly of bacterial genomes. Genome Res 18:324–330
Pop M et al (2004) Comparative genome assembly. Brief Bioinform 5(3):237–248
Peng Y et al (2011) Meta-IDBA: a De Novo assembler for metagenomic data. Bioinformatics 27(13):i94–i101
Ye Y, Tang H (2009) An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 7:455–471
Namiki T et al (2012) Metavelvet: an extension of Velvet Assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 40(20):e155
Treangen TJ et al (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinform 33:11.8.1–11.8.18
Chevreux B, Wetter T, Suhai S (1999) Genome sequence assembly using trace signals and additional sequence information. Computer Sci Biol 99:45–56
Boisvert S et al (2012) Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol 13:R122
Morowitz MJ et al (2011) Strain-resolved community genomic analysis of gut microbial colonization in a premature infant. Proc Natl Acad Sci U S A 108:1128–1133
Bonfield JK, Whitwham A (2010) Gap5—editing the billion fragment sequence assembly. Bioinformatics 26:1699–1703
Boetzer M et al (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–579
Salmela L et al (2011) Fast scaffolding with small independent mixed integer programs. Bioinformatics 27:3259–3265
Koren S, Treangen TJ, Pop M (2011) Bambus 2: scaffolding metagenomes. Bioinformatics 27:2964–2971
Eppley J et al (2007) Strainer: software for analysis of population variation in community genomic datasets. BMC Bioinformatics 8:398
Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26:589–595
Langmead B et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
Cole JR et al (2009) The ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141–D145
DeSantis TZ et al (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069–5072
Pruesse E et al (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–7196
Huang Y, Gilna P, Li W (2009) Identification of ribosomal RNA genes in metagenomic fragments. Bioinformatics 25:1338–1340
Brady A, Salzberg SL (2009) Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Meth 6:673–676
Teeling H et al (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5:163
McHardy AC et al (2007) Accurate phylogenetic classification of variable-length DNA fragments. Nat Meth 4:63–72
Mrázek J (2009) Phylogenetic signals in DNA composition: limitations and prospects. Mol Biol Evol 26:1163–1169
Albertsen M et al (2013) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotech 31:533–538
Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res 39:e91
Huson DH et al (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21:1552–1560
Chatterji S et al (2008) CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads. Res Comput Mol Biol Proc 4955:17–28
Patil KR et al (2011) Taxonomic metagenome sequence assignment with structured output models. Nat Meth 8:191–192
Chan C-K et al (2008) Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics 9:215
Diaz N et al (2009) TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 10:56
Weber M et al (2011) Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME J 5:918–928
Meinicke P, Aßhauer KP, Lingner T (2011) Mixture models for analysis of the taxonomic composition of metagenomes. Bioinformatics 27:1618–1624
Schreiber F et al (2010) Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics 26:960–961
Besemer J, Borodovsky M (1999) Heuristic approach to deriving models for gene finding. Nucleic Acids Res 27:3911–3920
Noguchi H, Park J, Takagi T (2006) MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res 34:5623–5630
Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191
Hoff K et al (2008) Gene prediction in metagenomic fragments: a large scale machine learning approach. BMC Bioinformatics 9:217
Kanehisa M, Goto S (2000) KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
Karp PD et al (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33:6083–6089
Overbeek R et al (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702
Finn RD et al (2010) The Pfam protein families database. Nucleic Acids Res 38:D211–D222
Tatusov R et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41
Ye Y, Doak TG (2009) A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 5:e1000465
Huson DH et al (2007) MEGAN analysis of metagenomic data. Genome Res 17:377–386
Lozupone C, Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228–8235
Kristiansson E, Hugenholtz P, Dalevi D (2009) ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics 25:2737–2738
Rodriguez-Brito B, Rohwer F, Edwards RA (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7:162
Segata N et al (2011) Metagenomic biomarker discovery and explanation. Genome Biol 12:R60
Parks DH, Beiko RG (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26:715–721
Copyright information
© 2014 Springer Science+Business Media, LLC
Cite this protocol
Bragg, L., Tyson, G.W. (2014). Metagenomics Using Next-Generation Sequencing. In: Paulsen, I., Holmes, A. (eds) Environmental Microbiology. Methods in Molecular Biology, vol 1096. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-712-9_15
DOI: https://doi.org/10.1007/978-1-62703-712-9_15
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-711-2
Online ISBN: 978-1-62703-712-9
eBook Packages: Springer Protocols