Recovery and Analysis of Long-Read Metagenome-Assembled Genomes

  • Protocol
Metagenomic Data Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2649))

Abstract

The development of long-read nucleic acid sequencing is beginning to have a substantial impact on metagenome analysis, particularly on the problem of recovering the genomes of member species from complex microbial communities. Here we outline bioinformatics workflows for recovering and characterizing complete genomes from long-read metagenome data, along with complementary procedures for comparing cognate draft genomes and gene quality between short-read and long-read sequencing.

Acknowledgments

This research was supported by the Singapore National Research Foundation and Ministry of Education under the Research Centre of Excellence Programme and by program grants 1102-IRIS-10-02 and 1301-IRIS-59 from the National Research Foundation (NRF), and in part by the Life Sciences Institute (LSI), National University of Singapore, and the National Supercomputing Centre (NSCC), Singapore, supported by Project 11000984. We thank our colleagues Xianghui Liu, Rogelio E. Zuniga-Montanez, Samarpita Roy, Guanglei Qiu, Ying Yu Law, Stefan Wuertz, Daniela I. Drautz-Moses, Federico M. Lauro, Daniel H. Huson, Peerada Prommeenate, Benjaphon Suraraksa, Varunee Kongduan, Adeline Chua, and Yuguang Ipsen for excellent collaboration in relation to sample and/or data provision, data analysis, and code provision.

Corresponding author

Correspondence to Rohan B. H. Williams.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Arumugam, K., Bessarab, I., Haryono, M.A.S., Williams, R.B.H. (2023). Recovery and Analysis of Long-Read Metagenome-Assembled Genomes. In: Mitra, S. (eds) Metagenomic Data Analysis. Methods in Molecular Biology, vol 2649. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3072-3_12

  • DOI: https://doi.org/10.1007/978-1-0716-3072-3_12

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3071-6

  • Online ISBN: 978-1-0716-3072-3

  • eBook Packages: Springer Protocols
