Abstract
As complete genomes become easier to obtain, even from species that were previously difficult to sequence, and as genomic resequencing becomes more routine, it is increasingly clear that genomic structural variation is more widespread than originally thought and plays an important role in maintaining genetic variation in populations. Structural variants (SVs) and the associated gene presence–absence variation (PAV) can be important players in local adaptation and can take part in other evolutionarily relevant phenomena. While recent studies have highlighted the importance of structural variation in Mollusca, the prevalence of this phenomenon across marine organisms more broadly remains to be fully investigated.
Here, we describe a straightforward and broadly applicable method for identifying SVs in fully assembled diploid genomes, using the same reads that were used for assembly. We also describe a gene PAV analysis protocol that can be applied to any species for which a fully sequenced reference genome is available. Although these approaches have been tested and proven in marine invertebrates, which tend to have high levels of heterozygosity, possibly due to their lifestyle traits, they are also applicable to other species across the tree of life, providing a ready means to begin investigating this potentially widespread phenomenon.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Sollitto, M., Kenny, N.J., Greco, S., Tucci, C.F., Calcino, A.D., Gerdol, M. (2022). Detecting Structural Variants and Associated Gene Presence–Absence Variation Phenomena in the Genomes of Marine Organisms. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_4
DOI: https://doi.org/10.1007/978-1-0716-2313-8_4
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2312-1
Online ISBN: 978-1-0716-2313-8
eBook Packages: Springer Protocols