Abstract
Restriction enzymes have been one of the primary tools in the population genetics toolkit for 50 years, being coupled with each new generation of technology to provide a more detailed view into the genetics of natural populations. Restriction site-Associated DNA protocols, which joined enzymes with short-read sequencing technology, have democratized the field of population genomics, providing a means to assay the underlying alleles in scores of populations. More than 10 years on, the technique has been widely applied across the tree of life and served as the basis for many different analysis techniques. Here, we provide a detailed protocol to conduct a RAD analysis from experimental design to de novo analysis—including parameter optimization—as well as reference-based analysis, all in Stacks version 2, which is designed to work with paired-end reads to assemble RAD loci up to 1000 nucleotides in length. The protocol focuses on major points of friction in the molecular approaches and downstream analysis, with special attention given to validating experimental analyses. Finally, the protocol provides several points of departure for further analysis.
Appendix
Working example scripts referenced in this protocol are available online in a Git repository on Bitbucket: https://bitbucket.org/CatchenLab/mimb-stacks2-protocol. The available scripts are listed below.
Plot Library Demultiplexing and Processing
The file 01_process_radtags/process_radtags_stats.R reads the log reported by process_radtags and calculates several per-sample summary statistics: the proportion of the library's reads belonging to each sample, the total number of reads, and the percentage of reads retained. It also plots the resulting distributions. The input file can be generated with the following command:
stacks-dist-extract process_radtags.raw.log per_barcode_raw_read_counts > per_sample_counts.tsv
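As a rough illustration of the kind of per-sample summary described above, the following shell sketch computes, for each sample, the percentage of reads retained and the sample's share of all reads in the library. It is not the R script from the repository. The function name pct_retained is hypothetical, and the column positions passed to it are assumptions: they should be checked against the header of per_sample_counts.tsv, since the layout of the process_radtags log may differ between Stacks versions.

```shell
#!/bin/sh
# Hypothetical helper, not part of Stacks or of the protocol repository.
# Usage: pct_retained FILE NAME_COL TOTAL_COL RETAINED_COL
# FILE is assumed to be tab-separated with a single header line.
# The three column numbers are assumptions; confirm them against the header.
pct_retained() {
    awk -F '\t' -v name_col="$2" -v total_col="$3" -v kept_col="$4" '
        NR == 1 { next }                      # skip the header line
        {
            name[NR]  = $name_col
            total[NR] = $total_col
            kept[NR]  = $kept_col
            lib_total += $total_col           # reads across the whole library
        }
        END {
            for (i = 2; i <= NR; i++) {
                pct   = (total[i] > 0)  ? 100 * kept[i] / total[i]  : 0
                share = (lib_total > 0) ? 100 * total[i] / lib_total : 0
                # sample name, percent retained, percent of library reads
                printf "%s\t%.1f\t%.1f\n", name[i], pct, share
            }
        }
    ' "$1"
}
```

If, for instance, the sample name, total reads, and retained reads sat in columns 2, 3, and 6, the call would be pct_retained per_sample_counts.tsv 2 3 6. Samples with a very low share of the library or a low retained percentage are the ones worth inspecting first.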
De Novo Parameter Optimization Shell Loop
The file 02_param_opt/param_opt_loop.sh loops over the values 1 through 12, using the value from each iteration for both ustacks -M and cstacks -n in the Stacks de novo pipeline.
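A loop of this kind might look like the sketch below, which is not the script from the repository. It assumes the pipeline is launched through the denovo_map.pl wrapper, which passes -M to ustacks and -n to cstacks, and that the data are paired-end. All paths, the thread count, and the helper names run and run_param_sweep are hypothetical placeholders. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical paths and settings; override them through the environment.
samples_dir="${SAMPLES_DIR:-./samples}"    # assumed: cleaned, demultiplexed reads
popmap="${POPMAP:-./popmap.tsv}"           # assumed: population map file
out_root="${OUT_ROOT:-./param_opt}"        # assumed: parent directory for the runs
threads="${THREADS:-4}"
dry_run="${DRY_RUN:-1}"                    # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$dry_run" = 1 ]; then echo "$*"; else "$@"; fi
}

# One de novo run per value, with M and n set to the same value (1 through 12).
run_param_sweep() {
    i=1
    while [ "$i" -le 12 ]; do
        out_dir="${out_root}/M${i}_n${i}"
        run mkdir -p "$out_dir"
        run denovo_map.pl --samples "$samples_dir" --popmap "$popmap" \
            -o "$out_dir" -M "$i" -n "$i" -T "$threads" --paired
        i=$((i + 1))
    done
}

run_param_sweep
```

Writing each run to its own directory keeps the outputs of the twelve parameter values separate, so they can be compared afterwards.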
Gstacks Coverage R Script
The file 03_denovo/gstacks_stats.R reads the log reported by gstacks and calculates several summary statistics on per-sample coverage and PCR duplicates. It also plots the resulting distributions. The input file can be generated with the following command:
stacks-dist-extract gstacks.log.distribs effective_coverages_per_sample | grep -v '^#' > effective_coverages_per_sample.tsv
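For a quick check before running the R script, one column of the extracted table can be summarized directly in the shell. The sketch below is a minimal example under stated assumptions: the function name summarize_column is hypothetical, the file is assumed to be tab-separated with one header line remaining after the comment lines are removed, and the column number holding the coverage values must be read from that header rather than taken for granted.

```shell
#!/bin/sh
# Hypothetical helper, not part of Stacks or of the protocol repository.
# Usage: summarize_column FILE COLUMN
# Prints the number of samples and the mean, minimum, and maximum of COLUMN.
summarize_column() {
    awk -F '\t' -v col="$2" '
        NR == 1 { next }                       # skip the header line
        {
            v = $col + 0                       # force numeric interpretation
            if (count == 0 || v < min) min = v
            if (count == 0 || v > max) max = v
            sum += v
            count++
        }
        END {
            if (count > 0)
                printf "n=%d mean=%.2f min=%.2f max=%.2f\n", count, sum / count, min, max
        }
    ' "$1"
}
```

A call such as summarize_column effective_coverages_per_sample.tsv 4 would summarize the fourth column, assuming that is where the coverage values of interest are found.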
Bwa Alignment Shell Loop
The file 04_refmap/bwa_samples_loop.sh loops over each sample in the provided population map file. For each sample, it aligns the paired reads with bwa mem and processes the alignments using samtools view and samtools sort. In each iteration of the loop, the sample name from the popmap is used as the prefix from which the paths to the input reads and the output BAM file are constructed.
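The loop described above can be sketched as follows. This is an illustration, not the script from the repository. It assumes a tab-separated popmap with the sample name in the first column, paired reads named SAMPLE.1.fq.gz and SAMPLE.2.fq.gz, and a reference that has already been indexed with bwa index. All paths, the thread count, and the function name align_samples are hypothetical. By default the sketch only prints the pipelines it would run.

```shell
#!/bin/sh
# Hypothetical paths and settings; override them through the environment.
popmap="${POPMAP:-./popmap.tsv}"        # assumed: sample name in column 1
reads_dir="${READS_DIR:-./samples}"     # assumed: SAMPLE.1.fq.gz / SAMPLE.2.fq.gz
ref="${REF:-./genome/ref.fa}"           # assumed: already indexed with bwa index
bam_dir="${BAM_DIR:-./alignments}"
threads="${THREADS:-4}"
dry_run="${DRY_RUN:-1}"                 # 1 = print commands, 0 = execute them

# Usage: align_samples POPMAP_FILE
align_samples() {
    cut -f 1 "$1" | while read -r sample || [ -n "$sample" ]; do
        # Skip blank lines and comment lines in the popmap.
        case "$sample" in ''|'#'*) continue ;; esac
        # The sample name is the prefix for both the input and output paths.
        fq1="${reads_dir}/${sample}.1.fq.gz"
        fq2="${reads_dir}/${sample}.2.fq.gz"
        bam="${bam_dir}/${sample}.bam"
        if [ "$dry_run" = 1 ]; then
            echo "bwa mem -t $threads $ref $fq1 $fq2 | samtools view -b -h - | samtools sort -o $bam -"
        else
            # Align, convert to BAM, and coordinate-sort in a single pipeline.
            bwa mem -t "$threads" "$ref" "$fq1" "$fq2" \
                | samtools view -b -h - \
                | samtools sort -o "$bam" -
        fi
    done
}

# Run only if the popmap exists, so the sketch is safe to execute as-is.
if [ -f "$popmap" ]; then
    align_samples "$popmap"
fi
```

Piping bwa mem straight into samtools avoids writing an intermediate SAM file for each sample.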
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
Cite this protocol
Rivera-Colón, A.G., Catchen, J. (2022). Population Genomics Analysis with RAD, Reprised: Stacks 2. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_7
DOI: https://doi.org/10.1007/978-1-0716-2313-8_7
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2312-1
Online ISBN: 978-1-0716-2313-8
eBook Packages: Springer Protocols