Analyses of Nuclear Reads Obtained Using Genome Skimming

  • Protocol
DNA Barcoding

Part of the book series: Methods in Molecular Biology (MIMB, volume 2744)

Abstract

In this protocol paper, we review a set of methods developed in recent years for analyzing nuclear reads obtained from genome skimming. As the cost of sequencing drops, genome skimming (low-coverage shotgun sequencing of a sample) is becoming an increasingly cost-effective method of measuring biodiversity at high resolution. While most practitioners use only the over-represented organelle reads assembled from a genome skim, the vast majority of the reads are nuclear. Using the assembly-free and alignment-free methods described in this protocol, we can compare samples to each other and to reference genomes in order to compute distances, characterize the underlying genomes, and infer evolutionary relationships.
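
As a rough illustration of what an assembly-free, alignment-free comparison can look like, the sketch below computes the Jaccard index between the k-mer sets of two sequences and converts it to a distance with the Mash-style transform D = -(1/k) ln(2J / (1 + J)). It is a simplified sketch, not the method of this protocol: the function names are hypothetical, the inputs are assumed to be plain DNA strings, and it ignores the low coverage and sequencing error that tools designed for genome skims have to account for.

```python
import math


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key.
    """
    complement = str.maketrans("ACGT", "TGCA")
    kmers = set()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            reverse_complement = kmer.translate(complement)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return kmers


def jaccard(a, b):
    """Jaccard index of two sets: shared elements over all elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_style_distance(j, k):
    """Convert a Jaccard index j into a per-site distance for k-mer length k.

    Assumption: no correction for coverage or sequencing error is applied.
    """
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return math.log((1.0 + j) / (2.0 * j)) / k


if __name__ == "__main__":
    k = 4  # toy value; real analyses use much longer k-mers
    sample_a = canonical_kmers("ACGTTGCAAGGCTTAACG", k)
    sample_b = canonical_kmers("ACGTTGCATGGCTTAACG", k)
    j = jaccard(sample_a, sample_b)
    print(f"Jaccard = {j:.3f}, distance = {mash_style_distance(j, k):.4f}")
```

With real genome skims, the observed Jaccard index depends on coverage as well as on genomic distance, so an uncorrected estimate like this one would be biased; the sketch only conveys the general idea of comparing samples without assembling or aligning reads.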

Author information

Corresponding author

Correspondence to Siavash Mirarab.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Mirarab, S., Bafna, V. (2024). Analyses of Nuclear Reads Obtained Using Genome Skimming. In: DeSalle, R. (eds) DNA Barcoding. Methods in Molecular Biology, vol 2744. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3581-0_16

  • DOI: https://doi.org/10.1007/978-1-0716-3581-0_16

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3580-3

  • Online ISBN: 978-1-0716-3581-0

  • eBook Packages: Springer Protocols
