Recovery and Analysis of Long-Read Metagenome-Assembled Genomes

Arumugam, Krithika; Bessarab, Irina; Haryono, Mindia A. S.; Williams, Rohan B. H.

doi:10.1007/978-1-0716-3072-3_12

Part of the book series: Methods in Molecular Biology (MIMB, volume 2649)

Abstract

The development of long-read nucleic acid sequencing is beginning to have a substantial impact on metagenome analysis, particularly on the problem of recovering the genomes of member species of complex microbial communities. Here we outline bioinformatics workflows for the recovery and characterization of complete genomes from long-read metagenome data, along with complementary procedures for comparing cognate draft genomes and gene quality obtained from short-read and long-read sequencing.

Acknowledgments

This research was supported by the Singapore National Research Foundation and Ministry of Education under the Research Centre of Excellence Programme and by program grants 1102-IRIS-10-02 and 1301-IRIS-59 from the National Research Foundation (NRF), and in part by the Life Sciences Institute (LSI), National University of Singapore, and the National Supercomputing Centre (NSCC), Singapore, supported by Project 11000984. We thank our colleagues Xianghui Liu, Rogelio E. Zuniga-Montanez, Samarpita Roy, Guanglei Qiu, Ying Yu Law, Stefan Wuertz, Daniela I. Drautz-Moses, Federico M. Lauro, Daniel H. Huson, Peerada Prommeenate, Benjaphon Suraraksa, Varunee Kongduan, Adeline Chua, and Yuguang Ipsen for excellent collaboration in relation to sample and/or data provision, data analysis, and code provision.

Author information

Authors and Affiliations

Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
Krithika Arumugam
Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
Irina Bessarab, Mindia A. S. Haryono & Rohan B. H. Williams

Corresponding author

Correspondence to Rohan B. H. Williams.

Editor information

Editors and Affiliations

Leeds Institute of Medical Research, University of Leeds, Leeds, UK
Suparna Mitra

About this protocol

Cite this protocol

Arumugam, K., Bessarab, I., Haryono, M.A.S., Williams, R.B.H. (2023). Recovery and Analysis of Long-Read Metagenome-Assembled Genomes. In: Mitra, S. (eds) Metagenomic Data Analysis. Methods in Molecular Biology, vol 2649. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3072-3_12

DOI: https://doi.org/10.1007/978-1-0716-3072-3_12
Published: 01 June 2023
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3071-6
Online ISBN: 978-1-0716-3072-3
eBook Packages: Springer Protocols
