Statistical and Computational Methods for Microbial Strain Analysis

  • Protocol
Statistical Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2629)

Abstract

A microbial strain is interpreted as a lineage that derives from a recent ancestor and has not experienced “too many” recombination events; such lineages can be retrieved with culture-independent techniques using metagenomic sequencing. Strain-level variability has increasingly been shown to produce phenotypic heterogeneity that affects host health, including differences in virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomic data, using either reference genome sequences or metagenome-assembled genomes (MAGs). Here, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, which deconvolution methods are used, and whether they can be applied to multiple samples. We conclude with areas that require further research.

Short title: Resolving Microbial Strain Mixtures

This research is supported by NIH grants GM123056 and P30DK050306.

Acknowledgements

The authors would like to thank Andrew Ghazi, Phillip Münch, Di Chen, Jordan Jensen, Yancong Zhang, Curtis Huttenhower, and Christopher Quince for their helpful input and discussions on this review.

Corresponding author

Correspondence to Hongzhe Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Ma, S., Li, H. (2023). Statistical and Computational Methods for Microbial Strain Analysis. In: Fridley, B., Wang, X. (eds) Statistical Genomics. Methods in Molecular Biology, vol 2629. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2986-4_11

  • DOI: https://doi.org/10.1007/978-1-0716-2986-4_11

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2985-7

  • Online ISBN: 978-1-0716-2986-4

  • eBook Packages: Springer Protocols
