
Population Genomics Analysis with RAD, Reprised: Stacks 2

Protocol
Marine Genomics

Abstract

Restriction enzymes have been among the primary tools in the population genetics toolkit for 50 years. They have been coupled with each new generation of technology to provide a more detailed view of the genetics of natural populations. Restriction site-associated DNA (RAD) protocols, which joined restriction enzymes with short-read sequencing technology, have democratized the field of population genomics by providing a means to assay the underlying alleles in scores of populations. More than 10 years on, the technique has been widely applied across the tree of life and has served as the basis for many different analysis techniques. Here, we provide a detailed protocol for conducting a RAD analysis from experimental design to de novo analysis (including parameter optimization), as well as reference-based analysis. All analyses use Stacks version 2, which is designed to work with paired-end reads to assemble RAD loci up to 1000 nucleotides in length. The protocol focuses on major points of friction in the molecular approaches and downstream analysis, with special attention given to validating experimental analyses. Finally, the protocol provides several points of departure for further analysis.



Author information

Correspondence to Julian Catchen.


Appendix

Working example scripts referenced in this protocol can be found online in a Git repository on Bitbucket: https://bitbucket.org/CatchenLab/mimb-stacks2-protocol. The available scripts are listed below.

Plot Library Demultiplexing and Processing

The file 01_process_radtags/process_radtags_stats.R uses the log reported by process_radtags to calculate several per-sample summary statistics: each sample's proportion of the library, its total number of reads, and the percentage of its reads that were kept. It also generates several plots of the resulting distributions. The input file can be obtained using the following command:

stacks-dist-extract process_radtags.raw.log per_barcode_raw_read_counts > per_sample_counts.tsv
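The per-sample statistics described above can also be approximated without R. The shell sketch below is not the repository's process_radtags_stats.R. It is a minimal illustration that assumes the following:

- the extracted table is tab-separated;
- its header names the sample, raw-read and kept-read columns Filename, Total and Retained (check these names against your Stacks version);
- the function name summarize_radtags is hypothetical.

For each sample, it reports the total number of reads, that sample's percentage of all reads in the library, and the percentage of its reads that were retained.

```shell
#!/bin/sh
# Sketch: per-sample summary of the table produced by
#   stacks-dist-extract process_radtags.raw.log per_barcode_raw_read_counts
# ASSUMPTION: the header contains columns named "Filename" (sample name),
# "Total" (raw reads) and "Retained" (reads kept); verify for your version.
summarize_radtags() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }              # skip comment and blank lines
        !hdr {                                # first remaining line is the header
            for (i = 1; i <= NF; i++) col[$i] = i
            f = col["Filename"]; t = col["Total"]; r = col["Retained"]
            if (!f || !t || !r) { bad = 1; exit 1 }
            hdr = 1
            next
        }
        {
            n++
            name[n] = $f; total[n] = $t; kept[n] = $r
            lib += $t                         # total reads across all samples
        }
        END {
            if (bad || n == 0 || lib == 0) exit 1
            print "sample\ttotal_reads\tpct_of_library\tpct_retained"
            for (i = 1; i <= n; i++) {
                pct_kept = total[i] > 0 ? 100 * kept[i] / total[i] : 0
                printf "%s\t%.0f\t%.2f\t%.2f\n", name[i], total[i], 100 * total[i] / lib, pct_kept
            }
        }' "$1"
}

# Run only when the extracted table is present in the working directory.
if [ -f per_sample_counts.tsv ]; then
    summarize_radtags per_sample_counts.tsv
fi
```

The sketch locates columns by header name rather than by position, so it keeps working if a release adds columns to the log. If an expected name is missing, the function exits with a non-zero status instead of reporting misleading numbers.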

De Novo Parameter Optimization Shell Loop

The file 02_param_opt/param_opt_loop.sh loops over the values 1 through 12. In each iteration, it uses the current value for both ustacks -M and cstacks -n in the Stacks de novo pipeline.
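A loop of this kind could look like the following minimal sketch. It is not the repository's param_opt_loop.sh. It assumes:

- the wrapper denovo_map.pl is used, which passes -M to ustacks and -n to cstacks;
- the samples are paired-end, hence --paired;
- SAMPLES_DIR, POPMAP, OUT_ROOT and THREADS, and the helper functions run and param_sweep, are hypothetical names with example defaults.

By default (DRY_RUN=1), the sketch only prints the twelve commands, so the sweep can be reviewed before any jobs are launched.

```shell
#!/bin/sh
# Sketch of a de novo parameter sweep: M = 1..12, with the same value passed
# to ustacks (-M) and cstacks (-n) through denovo_map.pl.
# ASSUMPTION: the directory and file names below are hypothetical defaults.
SAMPLES_DIR=${SAMPLES_DIR:-./samples}
POPMAP=${POPMAP:-./popmap.tsv}
OUT_ROOT=${OUT_ROOT:-./param_opt}
THREADS=${THREADS:-8}
DRY_RUN=${DRY_RUN:-1}        # 1 = print commands only; 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"            # show the command instead of running it
    else
        "$@"
    fi
}

param_sweep() {
    M=1
    while [ "$M" -le 12 ]; do
        out="$OUT_ROOT/M$M"
        [ "$DRY_RUN" = 1 ] || mkdir -p "$out"
        # One complete, independent de novo run per value of M.
        run denovo_map.pl --samples "$SAMPLES_DIR" --popmap "$POPMAP" \
            -o "$out" -M "$M" -n "$M" -T "$THREADS" --paired
        M=$((M + 1))
    done
}

param_sweep
```

Each run writes to its own directory (M1 through M12). This keeps the outputs of the twelve runs separate, so their results can be compared side by side once all runs have finished.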

Gstacks Coverage R Script

The file 03_denovo/gstacks_stats.R uses the log reported by gstacks to calculate several summary statistics on per-sample coverage and PCR duplicates. It also generates several plots of the resulting distributions. The input file can be obtained using the following command:

stacks-dist-extract gstacks.log.distribs effective_coverages_per_sample | grep -v '^#' > effective_coverages_per_sample.tsv
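As a lightweight complement to gstacks_stats.R, the sketch below reads the extracted table and flags samples whose mean coverage falls below a cutoff. It assumes the following:

- the header names the sample and coverage columns sample and mean_cov (verify this in your own gstacks.log.distribs);
- the function name coverage_summary is hypothetical;
- the default cutoff of 10x is an illustrative choice, not a recommendation from the protocol.

```shell
#!/bin/sh
# Sketch: summarize per-sample effective coverage from the table extracted
# from gstacks.log.distribs (effective_coverages_per_sample).
# ASSUMPTION: the header names the columns "sample" and "mean_cov".
# The cutoff (second argument, default 10x) is a hypothetical example value.
coverage_summary() {
    awk -F'\t' -v min_cov="${2:-10}" '
        /^#/ || NF == 0 { next }              # tolerate leftover comment lines
        !hdr {                                # header line: map names to columns
            for (i = 1; i <= NF; i++) col[$i] = i
            s = col["sample"]; c = col["mean_cov"]
            if (!s || !c) { bad = 1; exit 1 }
            hdr = 1
            next
        }
        {
            n++
            sum += $c
            if ($c + 0 < min_cov + 0)         # flag samples below the cutoff
                printf "low_coverage\t%s\t%.2f\n", $s, $c
        }
        END {
            if (bad || n == 0) exit 1
            printf "n_samples\t%d\n", n
            printf "mean_of_mean_cov\t%.2f\n", sum / n
        }' "$1"
}

# Run only when the extracted table is present in the working directory.
if [ -f effective_coverages_per_sample.tsv ]; then
    coverage_summary effective_coverages_per_sample.tsv 10
fi
```

Flagged samples are not automatically bad. The flags simply point to the samples worth checking in the plots produced by the R script.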

BWA Alignment Shell Loop

The file 04_refmap/bwa_samples_loop.sh loops over each sample in the provided population map file. For each sample, it aligns the paired reads with bwa mem and processes the alignments with samtools view and samtools sort. In each iteration, the sample name listed in the popmap is used as the prefix from which the paths to the input reads and the output BAM file are constructed.
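The loop described above might be sketched as follows. This is an illustration rather than the repository's bwa_samples_loop.sh. It assumes:

- a genome already indexed with bwa index;
- read files named <sample>.1.fq.gz and <sample>.2.fq.gz;
- the hypothetical variables POPMAP, READS_DIR, BWA_INDEX, OUT_DIR and THREADS, and the hypothetical function align_samples.

With DRY_RUN=1 (the default), it only prints one bwa mem | samtools view | samtools sort pipeline per sample.

```shell
#!/bin/sh
# Sketch of a per-sample alignment loop: bwa mem -> samtools view -> samtools sort.
# ASSUMPTIONS (hypothetical, adjust to your project): reads are named
# <sample>.1.fq.gz / <sample>.2.fq.gz, the genome has already been indexed
# with `bwa index`, and all paths below are example defaults.
POPMAP=${POPMAP:-./popmap.tsv}
READS_DIR=${READS_DIR:-./samples}
BWA_INDEX=${BWA_INDEX:-./genome/genome.fa}
OUT_DIR=${OUT_DIR:-./aligned_samples}
THREADS=${THREADS:-8}
DRY_RUN=${DRY_RUN:-1}        # 1 = print the pipelines only; 0 = execute them

align_samples() {
    # Column 1 of the tab-separated popmap holds the sample name.
    cut -f1 "$POPMAP" | while IFS= read -r sample; do
        [ -n "$sample" ] || continue
        r1="$READS_DIR/$sample.1.fq.gz"
        r2="$READS_DIR/$sample.2.fq.gz"
        bam="$OUT_DIR/$sample.bam"
        if [ "$DRY_RUN" = 1 ]; then
            echo "bwa mem -t $THREADS $BWA_INDEX $r1 $r2 | samtools view -b - | samtools sort -o $bam -"
        else
            mkdir -p "$OUT_DIR"
            bwa mem -t "$THREADS" "$BWA_INDEX" "$r1" "$r2" \
                | samtools view -b - \
                | samtools sort -o "$bam" - \
                || echo "alignment failed for $sample" >&2
        fi
    done
}

# Run only when the population map is present.
if [ -f "$POPMAP" ]; then
    align_samples
fi
```

Each BAM file is named after its popmap entry. This keeps sample names consistent between the alignments and the population map used in later Stacks steps.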


Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Rivera-Colón, A.G., Catchen, J. (2022). Population Genomics Analysis with RAD, Reprised: Stacks 2. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_7


  • DOI: https://doi.org/10.1007/978-1-0716-2313-8_7


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2312-1

  • Online ISBN: 978-1-0716-2313-8

  • eBook Packages: Springer Protocols
