Abstract
The development of long-read nucleic acid sequencing is beginning to have a substantial impact on metagenome analysis, particularly on the problem of recovering the genomes of member species of complex microbial communities. Here we outline bioinformatics workflows for the recovery and characterization of complete genomes from long-read metagenome data, along with complementary procedures for comparing cognate draft genomes, and their gene quality, obtained from short-read and long-read sequencing.
Acknowledgments
This research was supported by the Singapore National Research Foundation and Ministry of Education under the Research Centre of Excellence Programme and by program grants 1102-IRIS-10-02 and 1301-IRIS-59 from the National Research Foundation (NRF), and in part by the Life Sciences Institute (LSI), National University of Singapore, and the National Supercomputing Centre (NSCC), Singapore, supported by Project 11000984. We thank our colleagues Xianghui Liu, Rogelio E. Zuniga-Montanez, Samarpita Roy, Guanglei Qiu, Ying Yu Law, Stefan Wuertz, Daniela I. Drautz-Moses, Federico M. Lauro, Daniel H. Huson, Peerada Prommeenate, Benjaphon Suraraksa, Varunee Kongduan, Adeline Chua, and Yuguang Ipsen for excellent collaboration in relation to sample and/or data provision, data analysis, and code provision.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Arumugam, K., Bessarab, I., Haryono, M.A.S., Williams, R.B.H. (2023). Recovery and Analysis of Long-Read Metagenome-Assembled Genomes. In: Mitra, S. (eds) Metagenomic Data Analysis. Methods in Molecular Biology, vol 2649. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3072-3_12
DOI: https://doi.org/10.1007/978-1-0716-3072-3_12
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3071-6
Online ISBN: 978-1-0716-3072-3
eBook Packages: Springer Protocols