Introduction

The genetic basis of receptor-mediated sensation in vertebrates ranges from deceptively simple to highly complicated (Baldwin and Ko 2020). Infrared heat-sensing in pitvipers largely depends on a single ion channel, TRPA1, which is activated by heating of the pit organ (Gracheva et al. 2010). In contrast, vertebrate chemoperception (e.g., olfaction) depends on hundreds of receptor genes (Bear et al. 2016) that enable detection of a diverse set of scent molecules. Evolution of sensory receptors underlies diverse adaptive strategies and natural history traits in vertebrates (Baldwin and Ko 2020). A well-characterized example is the vertebrate visual system, in which the complement of photoreceptor paralogs and their evolutionary spectral tuning enable vertebrates to exploit diverse light environments (Davies et al. 2012). A wealth of experimental molecular studies has linked specific amino acid substitutions to shifts in receptor spectral sensitivity (Bowmaker 2008; Davies et al. 2012; Van Hazel et al. 2013; Hunt et al. 2009; Schweikert et al. 2018; Yokoyama 2008; Yokoyama et al. 2014, 2016). Recently, this experimental work has enabled functionally predictive computational modeling of receptor structures (Patel et al. 2018). A full genetic accounting of sensory receptors, together with characterization of the evolutionary forces that shape their distributions and sequences, is therefore critical to elucidating the mechanistic basis of sensory perception.

Chemoperception is the sensory detection of chemicals originating outside the body, including airborne, dissolved, and surface-bound compounds. At the molecular level, chemoperception depends on the physical interaction of these compounds with membrane-bound receptor proteins in the sensory tissue (Mombaerts 2004). Vertebrate chemoreceptors involved in chemoperception belong to the large family of G-protein-coupled receptors (GPCRs) (Buck and Axel 1991). GPCRs share a highly conserved structure and functional mechanism (Gether and Kobilka 1998; Rosenbaum et al. 2009). Chemical ligands bind to the variable extracellular domain, triggering a conformational change that propagates through the highly conserved seven-transmembrane domain. This conformational change alters the interaction of the GPCR intracellular domain with the G-protein. G-protein activation then initiates a signaling cascade and a change in cell state, which, in the case of vertebrate chemoreceptors, is the depolarization of afferent sensory neurons (Rosenbaum et al. 2009).

Vertebrates use several types of GPCRs for chemoperception, including the large groups of olfactory receptors (ORs), vomeronasal type-1 receptors (V1Rs), vomeronasal type-2 receptors (V2Rs), and trace amine-associated receptors (TAARs). These chemoreceptor families typically contain many gene copies, and their large numbers of pseudogenes indicate evolution through a birth–death process (Dong et al. 2009; Hughes et al. 2018; Nei and Rooney 2005; Niimura and Nei 2007). The lineage-specific presence or absence of these families, and their relative abundance, are evolutionarily responsive to ecology (Grus et al. 2005; Hayden et al. 2010; Khan et al. 2015; Silva and Antunes 2017). For example, several marine mammals (Kishida et al. 2007; Liu et al. 2019) and sea snakes (Kishida and Hikida 2010) show convergent loss of OR genes following adaptation to aquatic environments. Chemoreceptor abundance can also respond to the evolution of other sensory systems, as in the decline of functional ORs in primates with greater reliance on vision (Niimura et al. 2018). Lineage-specific expansions are also associated with behavior, as in the expansion of V2R genes in ostariophysian fish, which exhibit a pheromone-stimulated fright reaction (Yang et al. 2019).

A shift in abundance from V2Rs to V1Rs has been postulated to correlate with the transition from aquatic to terrestrial lifestyles during early tetrapod evolution (Shi and Zhang 2007). However, squamate reptiles, including snakes and lizards, apparently contradict this trend: they have few V1Rs and abundant V2Rs (Brykczynska et al. 2013). Squamates have distinct chemosensory morphologies (Baeckens et al. 2017), and their olfactory bulb and vomeronasal organ are anatomically separate (Graves 1993; Halpern 1987; Halpern and Martínez-Marcos 2003; Keverne 1999). Airborne molecules enter the nares and pass over the main olfactory bulb (MOB) (Schwenk 1993, 1995) for detection by ORs. Snakes and some lizards use elongated tongues that bifurcate into microscopic points. These tongues are optimized in form and kinematic function for bidirectional scent collection, and distinctive oscillatory tongue-flicking behaviors deliver the collected scents to the vomeronasal organ (VNO) (Cooper 1995; Daghfous et al. 2012; Filoramo and Schwenk 2009), where they are detected by V2Rs (Cooper 1997a). This dual-system arrangement (Fig. 1a) gives squamates a unique chemoperception system that can detect volatile, airborne chemicals via the olfactory bulb and ORs as well as water-soluble chemicals via the VNO and V2Rs (Cooper 1997b). Previous work on the genomic and chromosomal context of genes in the five-pacer viper (Yin et al. 2016) reported signatures of adaptive evolution in ORs. The same study found significantly more OR genes on the Z chromosome than on any other chromosome. These findings set the stage for a precise genomic accounting of snake chemoperception that also considers V2Rs in a similar way.

Fig. 1

Graphical summary of ORs and V2Rs found in Crotalus adamanteus. a Chemical cues (shown as red circles) are collected from the environment and processed by chemoreceptors localized in the MOB (above) and VNO (below). b 2D representations of membrane-bound ORs and V2Rs with extracellular regions colored dark green and the intracellular (cytoplasmic and transmembrane) regions colored purple; V2R dimerization is represented by a duplicated, faded and mirrored structure with the dimer-interface highlighted by a red shaded oval; red circles with red arrows conceptualize ligand-binding potential in the extracellular regions of ORs and V2Rs. c Chromosomal distributions of 362 OR paralogs and 430 V2R paralogs; genes occurring on contigs which failed to map to the reference Crotalus viridis assembly are labeled “Unplaced”; genes occurring on contigs that mapped to sub-chromosome resolution regions of the reference assembly are labeled “Unplaced contiguous”

Rattlesnakes display chemosensory-based behaviors. These include distinguishing envenomated versus non-envenomated prey (Saviola et al. 2013), and males detecting and tracking potential mates over harsh terrain from distances of up to a kilometer (Weldon et al. 1992; Duvall et al. 1992). They also include maternal care behaviors between females and their offspring (as reviewed in Weldon et al. 1992). These distinct behaviors make rattlesnakes pertinent study systems for understanding the mechanistic basis and biological relevance of chemoperception. Additionally, their reliance on thermal sensing and venom for predation provides an opportunity to study the evolution of phenotypic integration and to identify genetic signatures of prey specificity (Holding et al. 2015).

As a first step to understanding the rattlesnake chemosensory system, we used genomic and transcriptomic approaches to generate a genetic framework of the complete chemoreceptor repertoire of the eastern diamondback rattlesnake (Crotalus adamanteus). We characterized chromosomal localizations and gene family distributions within Crotalus adamanteus and among other squamates. We also analyzed protein structure and selection signatures across chemoreceptor phylogenies. Our combined bioinformatic approach suggests genomic mechanisms underlying sex-biased chemoperception and evolutionary mechanisms underlying chemoreceptor diversification. It also creates a framework for elucidating the roles of chemoreceptors in rattlesnake ecology and evolution.

Results and Discussion

Highly Diverse Complement of Rattlesnake ORs and V2Rs

To characterize rattlesnake chemosensory gene repertoires, we sequenced and assembled a high-coverage reference genome for Crotalus adamanteus from an adult female. The final assembly represents the first full-length genome for this species. Our combined Illumina, Nanopore, and PacBio sequencing resulted in 51.6× coverage based on the final assembly size (see Materials and Methods for sequencing details). The final assembly is 1.6 Gbp in size and consists of 11,369 scaffolded contigs with an N50 of 338 Kbp. Gaps comprise < 0.0% of the assembly, and contigs > 50 Kbp comprise 93.07% of assembled bases. Annotations of individual chemoreceptor genes were hand-curated, guided by genome-wide, splice-aware mapping of mRNA transcripts from olfactory bulb and VNO tissues. Altogether, we annotated 796 full-length chemosensory genes, 164 pseudogenes, and 51 truncated genes (chromosomal distributions summarized in Table 1). Our gene annotations delineate exonic structures, splice sites, protein-coding sequences (CDSs), mRNAs, UTRs, and poly-A tail signals, and they assign unique IDs to individual genes. Pseudogenes and truncated genes occurring on contigs containing full-length genes were annotated to provide context for future synteny analyses.
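The contig N50 and the share of bases in contigs > 50 Kbp are standard assembly summaries. The short Python sketch below shows how both can be computed from a list of contig lengths. The contig lengths are invented toy values, not the C. adamanteus assembly, and the function names are illustrative.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def fraction_in_long_contigs(lengths, min_length=50_000):
    """Fraction of assembled bases contained in contigs longer than min_length."""
    total = sum(lengths)
    return sum(length for length in lengths if length > min_length) / total


# Toy contig lengths in bp (hypothetical, for illustration only).
contigs = [400_000, 300_000, 200_000, 60_000, 40_000]
print(n50(contigs))                                  # 300000
print(round(fraction_in_long_contigs(contigs), 3))   # 0.96
```

Here the two longest contigs (700,000 bp) already exceed half of the 1,000,000 bp total, so the N50 is the length of the second contig.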

Table 1 Chromosomal distributions of all chemosensory genes, pseudogenes, and truncated genes identified from the Crotalus adamanteus genome assembly

Rattlesnake chemoperception integrates olfactory and vomeronasal acuity through comparable OR and V2R genetic contributions. C. adamanteus possesses hundreds of OR and V2R genes (n = 362 and 430, respectively), in addition to two V1R genes, one TAAR2-like gene, and one TAAR5-like gene. High numbers of V2R paralogs were previously described in the corn snake using joint transcriptomics and mRNA in situ hybridization (Brykczynska et al. 2013). More recently, Kishida et al. (2019) reported high numbers of paralogs for both ORs and V2Rs in sea snakes. RNA-seq alone limits the ability to assemble large, multi-exon genes such as V2Rs, as shown by the many partial coding sequences recovered from the corn snake (Brykczynska et al. 2013). The approach used here, mapping RNA-seq reads to a reference genome, provides a more complete assembly of full-length OR and V2R genes, as Kishida et al. (2019) demonstrated in sea snakes. The hand-curated, full-length gene annotations generated here further distinguish putatively functional chemoreceptors from pseudogenes, which would otherwise confound downstream assignment of chemoreceptor orthology. Our identification of chemoreceptor genes enables future research that expands multi-species chemoreceptor orthology and recovers clade-specific gene expansions.

Our annotations show that rattlesnake OR and V2R genes generally occur as large tandem-repeat arrays spread across the genome. The largest V2R array, comprising 64 putatively functional genes, spans ∼4 Mbp on chromosome 2 (Fig. 2). This array includes V2R genes in both orientations, pseudogenes mid-array, and truncated genes (i.e., potentially functional genes only partially recovered in the genome assembly). Conservative estimates suggest that ∼25% of all gene duplication events in vertebrates are tied to tandemly arrayed genes (Pan and Zhang 2008). Array duplications can dramatically increase gene copy numbers in a single event and influence gene expression phenotypes (Stranger et al. 2007), suggesting an efficient mechanism for OR and V2R gene expansion. High OR diversity and genomic architecture appear consistent across terrestrial vertebrates (Niimura and Nei 2006; Niimura 2009, 2012) and have previously been documented in non-viperid snakes. For example, Perry et al. (2018) reported high numbers of OR paralogs from the garter snake genome and further confirmed an early expansion of snake ORs relative to non-varanid lizards. In a comparison of V2Rs from the Komodo dragon and other squamates, Lind et al. (2019) demonstrated lineage-specific V2R expansions. Additionally, Komodo dragon V2Rs occur in tandem-repeat clusters, similar to mammalian VNO receptors (Yang et al. 2005) and ORs (Niimura and Nei 2006). This pattern suggests that gene duplication via tandem arrays is a common mechanism for OR and V2R gene family expansion across vertebrates.
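The text does not state how array boundaries were defined. The sketch below is a hedged illustration of one simple approach. It chains neighboring genes on the same chromosome while the intergenic distance stays below a threshold. The function name `tandem_arrays`, the 50 kbp `max_gap`, and the two-gene minimum are assumptions for illustration only, not the criteria used in this study.

```python
from itertools import groupby


def tandem_arrays(genes, max_gap=50_000, min_size=2):
    """Group genes into putative tandem arrays (hypothetical criteria).

    genes: iterable of (gene_id, chromosome, start, end) tuples, in bp.
    Neighboring genes on the same chromosome are chained while the gap between
    the end of the current chain and the next gene's start is <= max_gap.
    Chains with fewer than min_size genes are discarded.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))  # by chromosome, then start
    arrays = []
    for _, chrom_genes in groupby(ordered, key=lambda g: g[1]):
        current, last_end = [], 0
        for gene_id, _chrom, start, end in chrom_genes:
            if current and start - last_end > max_gap:
                if len(current) >= min_size:
                    arrays.append(current)
                current, last_end = [], 0
            current.append(gene_id)
            last_end = max(last_end, end)  # tolerate overlapping genes
        if len(current) >= min_size:
            arrays.append(current)
    return arrays


# Toy coordinates, not taken from the C. adamanteus annotation.
genes = [
    ("V2R_a", "chr2", 100_000, 103_000),
    ("V2R_b", "chr2", 120_000, 124_000),
    ("V2R_c", "chr2", 150_000, 155_000),
    ("V2R_d", "chr2", 900_000, 905_000),
    ("OR_a", "chrZ", 10_000, 11_000),
    ("OR_b", "chrZ", 30_000, 31_000),
]
print(tandem_arrays(genes))
```

In this toy input, V2R_d lies ∼745 kbp from its nearest neighbor, so it is left out as a singleton rather than joined to the chromosome 2 array.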

Fig. 2

Chemosensory genomic architecture for Crotalus adamanteus. Gene annotations are overlaid on the chromosome-level RaGOO scaffolding to visualize high-density regions (*chromosome W is included for size only). OR genes are colored blue and V2Rs are colored green. The yellow and orange zoomed-in regions illustrate the breakdown of a large V2R gene array present on chromosome 2; pseudogenes and truncated genes (dark green) are also shown in the orange ∼600 Kb close-up

Using a combined synteny and phylogenetic analysis, we traced the evolution of a distinguishable orthologous V2R array among five squamate species with available genomes (Fig. 3). Based on our V2R exonic structure template (Fig. 3b), we identified 47 full-length V2R genes for C. adamanteus, 59 for Crotalus tigris, 82 for Thamnophis elegans, 18 for Varanus komodoensis, and 6 for Anolis carolinensis. We also identified V2R pseudogenes for C. tigris and T. elegans, in addition to C. adamanteus. There are clear V2R array expansions and contractions, potentially corresponding to life-history changes during squamate evolution. For example, V2R genes expanded greatly among snake lineages but contracted among pitvipers, which may correlate with the evolution of the heat-sensing pit organ. Additionally, lineage-specific changes in V2R distributions are evident between the two rattlesnake species, which may correlate with differences in diet and habitat. Disparities in genome completeness and annotation among these species prevent a genome-wide analysis of V2R arrays. However, our approach of combining long- and short-read genomic data with transcriptomics provides a guide for annotating and analyzing large copy-number gene families.

Fig. 3

Combined synteny and phylogenetic assessment of a shared squamate V2R gene array. a Genomic schematics illustrating V2R gene numbers, order, and orientations identified from the genomes of five species. Missing sequences are shown as red “Ns” with the number of bases underneath in red, and manual scaffold locations are shown as double grey dashes. This array is distinguished by several conserved regions containing immunoglobulin type-V subunits (light-blue arrows). Collectively, the array highlights a substantial V2R gene expansion in snakes, with a potential reduction following the evolution of the pit organ. Red and orange shading connects snake V2Rs that grouped with lizard genes on the gene phylogeny. b Generalized exonic structure shared across all V2R genes shown in a. Genes that violated this structure, for example through missing exons or premature stop codons, were identified as pseudogenes and excluded for clarity. c Gene phylogeny of all V2R genes shown in a. The tree represents a maximum likelihood reconstruction of full-length gene sequences spanning non-coding regions (i.e., introns and UTRs) for accurate recovery of recent duplication events. The red clade includes V2Rs from all five species. The orange clade includes V2Rs from the three snakes and Varanus komodoensis, suggesting a potential gene expansion linked to a forked-tongue common ancestor. Image of V. komodoensis courtesy of Daniel Dashevsky

To further confirm our CDS annotations for C. adamanteus and to characterize receptor structural features, we iteratively assessed translated chemosensory genes for conserved elements during genome annotation. Extracellular ligand-binding domains were identified with NCBI's Conserved Domains Database (CDD; Supp_file_01). We used SignalP (Supp_file_02) to identify signal sequences in our set of receptors. Most V2R genes contained signal sequences (327/430, ∼76%), whereas only one OR gene (1/362) did. This scarcity is consistent with the enigmatic trafficking of ORs in mature olfactory neurons (McClintock and Sammeta 2003). Additionally, structural homology searches indicate that C. adamanteus ORs are monomeric (Fig. 1b), consistent with the one allele–one neuron rule found across vertebrate ORs (Chess et al. 1994). In contrast, V2Rs are likely dimeric (Fig. 1b), based on homology with the metabotropic glutamate receptors (Wu et al. 2014). These structural predictions lack the precision needed to resolve whether V2Rs form homo- or heterodimers. However, VNO neurons have been shown to contain transcripts from at least two V2R subfamilies (Martini et al. 2001; Francia et al. 2014; Akiyoshi et al. 2018). Such combinatorial pairing would greatly expand the spectrum of ligands the VNO could detect. Regardless, the large number of OR (362) and V2R (430) genes in C. adamanteus suggests a complex genetic basis for the chemosensory phenotype.

Chromosomal Distribution of ORs and V2Rs Suggests Sex-Biased Chemoperception

Male rattlesnakes are the homogametic sex with two copies of the Z chromosome, and females are heterogametic with Z and W chromosomes. To assign genomic contigs to sex chromosomes for our C. adamanteus assembly, we first mapped short genomic reads from a male and a female to the assembly. Based on male:female scaled mean read depth ratios, we assigned 977 contigs to chromosome W (42 Mbp; 2.63% genomic contribution), 1484 contigs to chromosome Z (91 Mbp; 5.68% genomic contribution), and the remaining 8908 contigs as autosomal (1.46 Gbp; 91.68% genomic contribution). The Z chromosome genomic contribution of 5.68% for our female rattlesnake is consistent with the male Crotalus viridis Z contribution of 8.5% (Schield et al. 2019). We further assigned 7387 of the 8908 autosomal contigs to specific chromosomes via whole genome scaffolding using RaGOO (Alonge et al. 2019) and the chromosome-level C. viridis genome assembly as a reference (Schield et al. 2019). Of the remaining 1521 contigs, 1041 were considered contiguous but unplaced and the remaining 480 were considered unplaced. Inferred chromosome placements of chemoreceptor genes, pseudogenes, and truncated genes are shown in Table 1. Unplaced contiguous chemosensory gene groupings can be found in the Supplementary Materials online (Supp_file_03).
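The logic of assigning contigs from male:female read depth is as follows. Males (ZZ) lack a W, so W contigs should have a male:female ratio near 0. Males carry two Z copies to the female's one, so Z contigs should have a ratio near 2. Autosomes should sit near 1. The minimal Python sketch below illustrates that logic. It assumes per-contig mean depths have already been computed from the read alignments. Scaling each sample by its median contig depth is an assumption, standing in for the "scaled" depths mentioned above. The cutoffs `w_max` and `z_min` are hypothetical, not the values used in this study.

```python
from statistics import median


def classify_contigs(male_depth, female_depth, w_max=0.2, z_min=1.6):
    """Assign contigs to W, Z, or autosome from male:female scaled read depth.

    male_depth / female_depth: dicts mapping contig ID -> mean read depth.
    Each sample is scaled by its own median contig depth, assumed here to
    reflect autosomal coverage because autosomes dominate the assembly.
    Thresholds are illustrative assumptions.
    """
    m_scale = median(male_depth.values())
    f_scale = median(female_depth.values())
    calls = {}
    for contig, f_raw in female_depth.items():
        f = f_raw / f_scale
        m = male_depth.get(contig, 0.0) / m_scale
        if f == 0:
            calls[contig] = "unassigned"   # no female coverage: ratio undefined
            continue
        ratio = m / f
        if ratio <= w_max:
            calls[contig] = "W"            # males (ZZ) carry no W copy
        elif ratio >= z_min:
            calls[contig] = "Z"            # two Z copies in males, one in females
        else:
            calls[contig] = "autosome"
    return calls


# Toy depths (hypothetical): ctg3 behaves like a Z contig, ctg4 like a W contig.
male = {"ctg1": 30, "ctg2": 31, "ctg3": 30, "ctg4": 1, "ctg5": 29}
female = {"ctg1": 40, "ctg2": 42, "ctg3": 20, "ctg4": 21, "ctg5": 39}
print(classify_contigs(male, female))
```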

The presence of ORs and V2Rs on the Z chromosome, and of V2Rs on the W chromosome, suggests a potential route for sex-biased chemoperception. The top three chromosomes ranked by OR copy number were chromosomes Z, 1, and 4, and no ORs were detected on the W chromosome (Fig. 1c). Viperids, including rattlesnakes, lack a mechanism to compensate for reduced expression of Z-linked genes in the heterogametic sex (i.e., they lack dosage compensation; Vicoso et al. 2013). The chromosomal distribution of ORs therefore suggests that sex differences in the expression of Z-linked ORs could underlie observed sex-biased olfactory behaviors (Duvall et al. 1992). Similarly, 17 V2R genes mapped to the Z chromosome (Fig. 1c), suggesting that a similar mechanism could modulate V2R-based chemoperception. Additionally, the five V2R genes on the W chromosome represent a potential genetic basis for female-biased chemoperception, such as detection of offspring (Weldon et al. 1992; Duvall et al. 1992).

Diversification of Extracellular, Ligand Binding Domains Driven by Transient Positive Selection

We evaluated selection signatures across ORs and V2Rs found in the C. adamanteus genome. We generated maximum likelihood gene phylogenies with IQ-TREE based on full-length CDS alignments of putatively functional paralogs. Consensus trees and codon alignments were tested for directional selection within the HyPhy suite v2.5.8 (Pond et al. 2019). Generally, the methods implemented in HyPhy estimate nonsynonymous (dN) and synonymous (dS) substitution rates. Their ratio (dN/dS) describes the evolution of each codon: dN/dS > 1 indicates positive selection, and dN/dS < 1 indicates purifying (negative) selection. Estimates of pervasive selection assume a constant dN/dS across the entire phylogeny for a single codon site. We tested for pervasive selection using FEL v2.1 (Pond and Frost 2005) with a significance threshold of 0.05. With FEL, we detected pervasive positive selection at 3 sites for ORs and 22 sites for V2Rs. We detected pervasive purifying selection at 295 sites for ORs and 629 sites for V2Rs.
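The site-level decision rule behind these FEL counts can be sketched in Python. FEL fits a synonymous rate (alpha) and a nonsynonymous rate (beta) at each codon site. It then uses a likelihood-ratio test of alpha = beta, and significant sites are labeled positive when beta > alpha and negative when beta < alpha. The sketch assumes those per-site estimates have already been extracted from HyPhy's output, a step not shown here. The toy values and the function name are hypothetical.

```python
from collections import Counter


def classify_fel_sites(sites, p_cutoff=0.05):
    """Label codon sites as positive, negative, or neutral, FEL-style.

    sites: iterable of (alpha, beta, p_value) tuples, where alpha is the
    site's synonymous rate (dS), beta its nonsynonymous rate (dN), and
    p_value comes from the likelihood-ratio test of alpha == beta.
    """
    labels = []
    for alpha, beta, p in sites:
        if p <= p_cutoff and beta > alpha:
            labels.append("positive")   # dN/dS > 1
        elif p <= p_cutoff and beta < alpha:
            labels.append("negative")   # dN/dS < 1 (purifying)
        else:
            labels.append("neutral")    # not significantly different from 1
    return labels


# Hypothetical per-site estimates: (alpha, beta, p-value).
toy = [(1.0, 3.2, 0.01), (2.0, 0.1, 0.001), (1.0, 1.1, 0.60), (0.5, 0.0, 0.20)]
print(Counter(classify_fel_sites(toy)))
```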

We tested for episodic selection in chemoreceptor coding sequences using MEME (v2.1.2) in the HyPhy suite, again with a significance threshold of 0.05. This method allows the dN/dS ratio at a codon site to vary across the phylogeny, revealing periods of positive and negative selection along lineages (Murrell et al. 2012). The analysis also reports the number of branches under selection at each codon site. We refer to each such branch–site combination as a "branch-codon" (total branch-codons = number of codon sites in the alignment × number of branches in the phylogeny). We detected episodic selection at 41 sites and 197 branch-codons in ORs and at 445 sites and 3442 branch-codons in V2Rs. These results suggest that transient periods of positive selection drove chemoreceptor diversification across the coding sequences.
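The branch-codon denominator defined above can be made concrete in a few lines of Python. The sketch assumes a fully resolved (bifurcating) unrooted tree, which has 2n − 3 branches for n tips. It also assumes the per-site counts of branches flagged by MEME are already available as a list. The function names and the toy numbers are hypothetical.

```python
def total_branch_codons(n_codon_sites, n_tips):
    """Number of codon sites times number of branches.

    Assumes a fully resolved unrooted tree, which has 2n - 3 branches for n tips.
    """
    n_branches = 2 * n_tips - 3
    return n_codon_sites * n_branches


def branch_codon_fraction(branches_per_site, n_tips):
    """Fraction of all branch-codons inferred to be under episodic selection.

    branches_per_site: one entry per codon site in the alignment, giving the
    number of branches flagged as under selection at that site.
    """
    selected = sum(branches_per_site)
    return selected / total_branch_codons(len(branches_per_site), n_tips)


# Toy example: a 10-codon alignment on a 5-tip tree (7 branches, 70 branch-codons).
flagged = [0, 2, 0, 1, 0, 0, 0, 0, 0, 1]
print(total_branch_codons(len(flagged), 5), branch_codon_fraction(flagged, 5))
```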

The ligand-binding domains of chemoreceptors that provide scent specificity are generally located in the extracellular region. The highly conserved seven-transmembrane domain and intracellular region provide the machinery for canonical GPCR signaling. We hypothesized that codon sites undergoing periods of positive selection would be biased toward the extracellular domains (Fig. 1b). For rattlesnake ORs and V2Rs, we confirmed that all ligand-binding and related features detected via NCBI's CDD mapped to extracellular regions predicted by TMHMM (Lu et al. 2019; Krogh et al. 2001). We ran proportion z-tests to evaluate whether the proportion of sites experiencing selection was significantly higher or lower in extracellular or intracellular regions. The proportions used for the z-tests were based on the total number of sites per region detected by FEL and MEME at a significance threshold of 0.05. For FEL, we tested whether the proportion of positively selected sites was significantly higher or lower in extracellular or intracellular regions. We then repeated this test for the proportion of negatively selected sites (p < 0.05; Table 2). Signals of negative selection were significantly higher in the intracellular regions of both ORs and V2Rs, consistent with these regions performing conserved GPCR functions. For episodic selection, we tested whether the proportion of sites identified by MEME was significantly higher or lower in extracellular or intracellular regions, and we repeated this test for branch-codons (p < 0.05; Table 2). Both ORs and V2Rs had proportionally more sites with episodic selection in their extracellular regions; however, only the V2R difference was significant in the site-only analysis. In the branch-codon analysis, both ORs and V2Rs had significantly higher proportions of positively selected branch-codons in extracellular regions, a pattern that is apparent in both gene phylogenies (Figs. 4 and 5).
We shaded branches green according to positive selection detected at extracellular sites (empirical Bayes factor > 100). A yellow star on the OR gene tree (Fig. 4) marks a Z-linked clade with widespread extracellular episodic selection, suggesting an evolutionary mechanism underlying potential sex-biased chemoperception. Among the autosomal V2Rs, some lineages show long branches extending from subclades experiencing positive selection in extracellular regions (Fig. 5). These rapidly evolving branches may currently be undergoing adaptive evolution, potentially for prey detection. This suggests that the chemoreceptor repertoire is highly dynamic in both paralog number and sequence.
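The text does not specify which form of the proportion z-test was used. The minimal Python sketch below uses the common pooled-variance two-proportion z-test, which is an assumption. It compares, for example, the fraction of extracellular sites under selection with the fraction of intracellular sites; per Fig. 1b, "intracellular" here includes cytoplasmic and transmembrane residues. The counts in the usage line are hypothetical, not the values in Table 2.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    x1 of n1 sites (e.g., extracellular) versus x2 of n2 sites (e.g.,
    intracellular) are under selection. Returns (z, two-sided p-value);
    z > 0 means the first proportion is higher.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 30 of 200 extracellular vs 25 of 400 intracellular sites.
z, p = two_proportion_z_test(30, 200, 25, 400)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Swapping the two regions flips the sign of z but leaves the two-sided p-value unchanged, so the direction of any bias is read from the sign of z.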

Table 2 Summary of proportion z-tests comparing total extracellular and intracellular sites experiencing selection for ORs and V2Rs
Fig. 4

OR gene tree for Crotalus adamanteus colored by chromosome placement and relative signals of extracellular selection. The gene phylogeny represents a maximum likelihood reconstruction of full-length nucleotide coding sequences. Branches are shaded green based on signatures of episodic positive selection detected at extracellular sites (MEME analysis; empirical Bayes factor > 100): dark green indicates fewer sites detected per branch and light green indicates more sites detected per branch. Branches with no extracellular sites detected are colored black. The outside ring is colored according to the chromosomal placement of individual genes (i.e., tips or leaves) on the tree. A yellow star on the tree marks a particular clade with candidate genes potentially evolving sex-biased function. Gene labels and bootstrap values were excluded for clarity; fully labeled consensus trees with gene names and bootstrap values can be found in the Supplementary Materials online (Supp_file_07a and Supp_file_07b). Tree figures were generated using iTOL via its online server (Letunic and Bork 2019)

Fig. 5

V2R gene tree for Crotalus adamanteus, colored, labeled, and generated in the same way as the OR gene tree (Fig. 4). Fully labeled consensus trees with gene names and bootstrap values can be found in the Supplementary Materials online (Supp_file_07a and Supp_file_07b)

Conclusion

Rattlesnake sensory systems encompass phenotypic extremes, representing a refined suite of predatory features involved in both the perception of prey (Saviola et al. 2013; Katti et al. 2018; Geng et al. 2011; Schraft et al. 2019) and prey capture (Margres et al. 2013, 2015; Rokyta et al. 2012, 2013). Most previous genomic studies of venomous snakes, including rattlesnakes, have focused on venom (Schield et al. 2019; Vonk et al. 2013; Yin et al. 2016). The eastern diamondback (Crotalus adamanteus) possesses one of the most thoroughly characterized venom systems (Rokyta et al. 2012; Margres et al. 2013). Studies report intraspecific venom variation tied to regional adaptation to sympatric prey (Margres et al. 2016, 2017) and phenotypic integration with fang and head morphologies (Margres et al. 2015). They also report ontogenetic shifts in venom composition as individual rattlesnakes mature (Rokyta et al. 2017). Although venom is a crucial part of the rattlesnake predatory arsenal, the full ecological context in which venom is deployed includes the chemosensory perception of prey-derived stimuli. Hence, characterizing the sensory repertoire of rattlesnakes directly complements our current understanding of their venoms by providing a more complete, multi-trait picture of their predatory arsenal. Additionally, other behaviors directly relevant to fitness, including mate and offspring detection, depend on chemoperception.

We generated a comprehensive characterization of the chemoreceptor repertoire of a representative species, the eastern diamondback rattlesnake (Crotalus adamanteus), using genomic and transcriptomic approaches. We found that OR and V2R genes occur in large tandem-repeat arrays. This suggests a genomic mechanism in which array duplications provide the extensive gene duplication needed to generate the large chemoreceptor complement. Additionally, for V2Rs, changes in this complement may correspond with major changes in life-history traits and show lineage-specific expansions and contractions. Together, these patterns illustrate the potential ecological and evolutionary responsiveness of the chemoreceptor repertoire. We determined the chromosomal distribution of chemoreceptors, which points to sex-biased chemoperception as a potential mechanism for observed male- and female-specific behaviors. Our selection analyses revealed signatures of episodic selection in the extracellular domains of chemoreceptors, an evolutionary mechanism for the diversification of ORs and V2Rs. Our combined results provide a basis for understanding the origins of a complex phenotype, chemoperception. They also offer a stepping-off point for future comparative studies on the role of chemoperception in rattlesnake ecology and evolution.

Materials and Methods

Sampling, Sequencing, and Genome Assembly

Two Crotalus adamanteus specimens were used for genomic sequencing, and two additional specimens were used for chemosensory tissue RNA-seq. Animal ID numbers, capture localities, and basic morphometric information are shown in Table 3. All specimens were collected from the wild under Florida Fish and Wildlife Conservation Commission permit #LSSC-13-00004A-C, and all procedures were approved by the Florida State University Animal Care and Use Committee (IACUC) under protocols #1529 and #1836.

Table 3 General morphological information for specimens used in this study

For the genome assembly, we used an adult female eastern diamondback rattlesnake (ID = KW1264) collected from the Apalachicola National Forest in northern Florida (Leon County). Genomic reads were also generated, for later chromosomal assignments rather than for the assembly itself, from an adult male rattlesnake (ID = KW0944) collected in southern Florida (Miami-Dade County). We extracted genomic DNA using a standard phenol–chloroform–isoamyl alcohol extraction from blood that was frozen in 95% EtOH upon collection. High molecular weight DNA aliquots were set aside for long-read sequencing. The remaining genomic DNA was sheared using a Covaris S220 Focused-ultrasonicator with a target average fragment size of 500 bp. DNA libraries were prepared from the sheared DNA using the Illumina TruSeq DNA PCR-Free library prep kit and quality checked on a 1% agarose gel. KAPA PCR was performed to determine the amplifiable concentration prior to 250 bp paired-end sequencing. The complete genome was assembled de novo with MaSuRCA version 3.2.8 (Zimin et al. 2013) using a hybrid approach. Long reads came from Oxford Nanopore (MinION; 195,747 total reads; read N50 value of 12,913) and Pacific Biosciences (performed by GENEWIZ in South Plainfield, NJ, on a Sequel II; 6,111,768 total reads; read N50 value of 1,468,549), together providing 25× coverage. Illumina 250 bp paired-end data (Illumina HiSeq 2500; 71,531,898 total read pairs) provided 60× coverage. The final assembly was checked for duplicate sequences resulting from haplotype variants or assembly artifacts using purge_dups v1.2.5 (https://github.com/dfguan/purge_dups.git). No duplicate or chimeric sequences were detected, and the genome was treated as a haploid assembly thereafter. Illumina short reads for the male snake were generated using the same protocol as for the female (Illumina HiSeq 2500; 35,572,858 total read pairs).

Transcriptomic data were used to identify chemosensory genes and outline exonic structures for genome annotation. We used two Crotalus adamanteus specimens (IDs = DRR0044 and MM0114; additional information in Table 3) for chemosensory tissue transcriptomics. Tissue samples were dissected postmortem from chemosensory regions through the roof of the mouth (buccal cavity). For DRR0044, we targeted olfactory epithelial tissues representing both the vomeronasal organ and the main olfactory bulb; for MM0114, we targeted only the main olfactory bulb. Tissue samples were buffered and stored in RNAlater at −80 °C until RNA extraction. Total RNA was extracted using a TRIzol–chloroform extraction as previously described (Rokyta et al. 2012). Buffered chemosensory tissue was added to 500 µL of TRIzol (Invitrogen) and homogenized with a 20-gauge syringe. The TRIzol–tissue mixture was then thoroughly mixed with 20% chloroform, followed by an additional 500 µL of TRIzol. Next, the mixture was transferred to a 5PRIME Phase Lock Gel Heavy tube and centrifuged at 20 °C. The supernatant was collected and mixed with glycogen before precipitation with isopropyl alcohol and centrifugation. The resulting total RNA pellets were washed with 75% ethanol and reconstituted in RNase-free water. Extracted RNA was quantified with a Qubit RNA Broad-Range kit (Thermo Fisher Scientific) and quality checked on an Agilent Bioanalyzer using an RNA 6000 Pico chip, per the manufacturer's instructions. Next, mRNA was isolated using an NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs). The isolated mRNA was fragmented for exactly 13 min to generate an average fragment size of 370 bp. cDNA libraries were prepared from the isolated mRNA using the NEBNext Ultra RNA Library Prep kit with the High-Fidelity 2X Hot Start PCR Master Mix, Agencourt AMPure XP PCR Purification Beads, and Multiplex Oligos for Illumina (New England Biolabs).
The resulting libraries were quantified and quality controlled on an Agilent Bioanalyzer using a High Sensitivity DNA chip. The Molecular Cloning Facility at Florida State University performed KAPA PCR on the purified cDNA libraries to determine amplifiable concentrations for sequencing. cDNA libraries were diluted to ∼5 nM and pooled with other libraries to make efficient use of sequencing capacity. The pooled cDNA libraries were quality checked on another High Sensitivity DNA Bioanalyzer chip, and their amplifiable concentration was determined with KAPA PCR. The pool was then sequenced on an Illumina HiSeq 2500 to generate 150 bp paired-end reads at the Florida State University College of Medicine Translational Laboratory. Chemosensory tissue RNA sequencing yielded 14,021,239 read pairs for MM0114 (olfactory bulb only) and 22,067,312 read pairs for DRR0044 (olfactory bulb and VNO). The percentages of contaminated and trimmed reads were 0.014% and 0.5%, respectively, for MM0114, and 0.008% and 0.5% for DRR0044.
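Diluting each library to ∼5 nM before pooling is a C1V1 = C2V2 calculation. The following is a hypothetical helper, not the sequencing facility's protocol. It assumes the stock concentration is the amplifiable molarity measured by KAPA PCR, and the final volume in the example is chosen arbitrarily.

```python
def dilution_volumes(stock_nM, target_nM, final_uL):
    """Return (library_uL, diluent_uL) such that library_uL of stock brought
    to final_uL total volume yields target_nM, using C1 * V1 = C2 * V2."""
    if stock_nM <= 0 or target_nM <= 0 or final_uL <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nM > stock_nM:
        raise ValueError("cannot dilute to a concentration above the stock")
    library_uL = target_nM * final_uL / stock_nM
    return library_uL, final_uL - library_uL


# Hypothetical example: a 20 nM library diluted to 5 nM in 20 uL total.
print(dilution_volumes(20.0, 5.0, 20.0))  # (5.0, 15.0)
```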

Chemosensory Gene Identification and Annotation

We took a conservative approach to chemosensory gene identification and annotation. Gene annotations and downstream analyses were visualized in Geneious v9.1.5 (Kearse et al. 2012). Genomic contigs potentially containing chemosensory genes were identified with BLAST (Camacho et al. 2009) and a custom reference gene database. This database contained full-length nucleotide coding sequences, meaning sequences with confirmed start and stop codons and intact, complete seven-transmembrane domains. The sequences were compiled from three snake species and two lizard species (genera Anolis, Gekko, Protobothrops, Thamnophis, and Python) and were downloaded from the NR database in October 2018. The finalized database included coding sequences for all genes identified or predicted to be GPCRs. Every genomic contig with one or more BLAST hits from this database was isolated from the full genome assembly. Temporary BLAST-hit annotations were overlaid on these contigs to highlight target regions for downstream assessment and gene annotation.
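Isolating every contig with at least one hit and overlaying the hit regions can be sketched from BLAST tabular output. This sketch assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assumes the genome served as the BLAST database, so contig IDs appear in the second column. The example lines and IDs are hypothetical.

```python
from collections import defaultdict


def hits_by_contig(blast_lines):
    """Group BLAST -outfmt 6 hits by subject (contig) ID.

    Returns {contig_id: [(start, end, query_id), ...]} with 1-based,
    inclusive subject coordinates ordered so that start <= end.
    """
    regions = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, contig = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits report sstart > send; normalize the interval.
        regions[contig].append((min(sstart, send), max(sstart, send), query))
    return dict(regions)


# Hypothetical BLAST output lines.
lines = [
    "OR1\tctg7\t98.1\t900\t5\t1\t1\t900\t5000\t4101\t1e-200\t1500",
    "V2R3\tctg2\t90.0\t300\t10\t2\t1\t300\t100\t399\t1e-50\t400",
]
print(sorted(hits_by_contig(lines)))  # ['ctg2', 'ctg7']
```

The keys of the returned dictionary are the contigs to extract from the assembly, and the intervals are the regions to highlight as temporary annotations. No e-value filter is applied, which mirrors the "one or more BLAST hits" criterion.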

Chemoreceptor gene annotations were hand-curated and guided by RNA-seq transcriptome data aligned to the genome (transcriptome sequencing is described in the previous section). To identify and reduce cross-contamination between samples, raw reads were filtered and quality checked with custom scripts and FastQC v0.11.5. Adapters and low-quality reads were trimmed from raw reads with Trim Galore! v0.4.5 (Krueger 2015; accessed October 2018). Cleaned read pairs were mapped to the genome with the splice-aware aligner HISAT2 v2.1.0 (Kim et al. 2015), and indexed BAM alignments were generated with SAMtools v1.6 (Li et al. 2009). These alignments outlined the exonic structures of genes expressed in the mRNA transcripts. Annotations for putative chemoreceptors were first generated for highly expressed genes with > 100 × coverage overlapping the BLAST-hit regions on the isolated contigs. These initial high-confidence annotations then served as references for predicting structures (e.g., splice sites, starts, stops, and UTRs) for genes with low or partial coverage. Stop codons occurring within highly conserved GPCR regions (e.g., the seven-transmembrane domain) were taken as evidence of pseudogenization (Hilger et al. 2018). Some unexpressed genes (0 × coverage) had intact protein domains and were similar to expressed chemoreceptors. We also annotated these genes, using the expressed genes as references together with BLAST and the gene prediction tools in Geneious (Kearse et al. 2012). As a final check that all recognizable paralogs had been annotated, annotated genes were searched back against the genome with BLAST. The finalized gene annotations include gene IDs, exons, coding sequences, mRNAs, UTRs, and poly-A tail signals. This study did not explicitly target pseudogenes or truncated genes. However, any that occurred on contigs containing full-length genes were annotated to provide context for downstream synteny analyses.
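The pseudogenization criterion, an in-frame stop codon inside a conserved region, can be made concrete. The following is a minimal sketch, not the curation procedure used here, which was manual. It assumes the CDS starts in frame at its first base and that conserved regions (e.g., transmembrane helices) are supplied as hypothetical 0-based, inclusive codon-index ranges.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds):
    """Return 0-based codon indices of in-frame stop codons, excluding the
    final codon (the expected terminal stop). Assumes frame 0."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [i for i in range(n_codons - 1)
            if cds[3 * i:3 * i + 3] in STOP_CODONS]


def stops_in_regions(cds, regions):
    """Internal stops that fall inside any (start, end) codon range.

    `regions` is a hypothetical list of conserved-domain codon ranges,
    0-based and inclusive, e.g. transmembrane helix boundaries."""
    return [i for i in internal_stops(cds)
            if any(start <= i <= end for start, end in regions)]


print(internal_stops("ATGTAAGGGTGA"))            # [1]
print(stops_in_regions("ATGTAAGGGTGA", [(2, 3)]))  # []
```

A non-empty result from `stops_in_regions` corresponds to a candidate pseudogene under the criterion described above. An internal stop outside the listed regions would need manual review.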
Because ORs and V2Rs are extremely diverse and have high potential for novelty in snakes, we named these genes OR_### and V2R_###, respectively. Numbers increase with CDS length; for example, the shortest olfactory receptor was named “OR_001”. Finalized gene coding sequences were isolated for downstream phylogenetic analyses. Top BLASTp hits and the corresponding conventional names for the translated coding sequences of the final consensus chemosensory genes are provided in the Supplementary Materials online (Supp_file_04; final BLASTp search completed July 2020).
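The naming rule, where numbering increases with CDS length, can be written as a short function. The following is a hypothetical sketch. The tie-breaking rule for equal-length sequences (falling back to the original identifier) is an assumption, because the text does not specify one.

```python
def name_by_length(cds_by_id, prefix):
    """Assign names like OR_001, OR_002, ... in order of increasing CDS length.

    Ties in length are broken by the original identifier (an assumption).
    Names widen beyond three digits automatically if there are > 999 genes."""
    ordered = sorted(cds_by_id, key=lambda gid: (len(cds_by_id[gid]), gid))
    return {gid: f"{prefix}_{rank:03d}"
            for rank, gid in enumerate(ordered, start=1)}


# Hypothetical annotation IDs and sequences.
genes = {"a": "ATGTAA", "b": "ATG", "c": "ATGAAATAA"}
print(name_by_length(genes, "OR"))
```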

Qualitative protein assessments of the translated chemosensory gene coding sequences were performed iteratively throughout the genome annotation process. We used the online server for NCBI’s Conserved Domain Database to check translated coding sequences for complete conserved GPCR protein features (Marchler-Bauer et al. 2014; Lu et al. 2019; searches completed between October 2017 and June 2020). We also checked all annotated coding sequences for signal peptides with the SignalP v5.0 online server (Armenteros et al. 2019; Emanuelsson et al. 2007; searches completed August 2019). Finally, we considered structural homology and oligomerization predictions based on highly similar amino acid sequences (i.e., above 30% identity), obtained from the SWISS-MODEL online server on ExPASy (Waterhouse et al. 2018; Guex et al. 2009; Bertoni et al. 2017).

Chromosomal Inferences

To identify candidate sex-chromosome-linked scaffolds, we mapped genomic reads from an adult male (ID = KW0944) and an adult female (ID = KW1264) to the genome assembly and evaluated read depth and coverage. Male and female short-read data (Illumina sequencing described above) were aligned separately with BWA v0.7.12, and read depth and coverage were calculated with SAMtools v1.10 (Li et al. 2009). Sex chromosomes were assigned following the methods of Yin et al. (2016), with several adjustments. Specifically, we adjusted the depth criteria and coverage filtering to improve predictions. These adjustments account for noise from potentially chimeric contigs that have sex-chromosome identity over more than half of their length. Snakes have ZZ/ZW sex determination. Males are the homogametic sex with two copies of the Z chromosome, and the W chromosome occurs only in females. The expected male-to-female (M:F) read-mapping ratio for W chromosome contigs should therefore be close to zero. We defined W chromosome contigs as those with an M:F scaled mean read depth ratio below 0.3, at least 50% female coverage, and less than 50% male coverage. The expected M:F scaled ratio for a Z chromosome contig is 2:1. We defined Z chromosome contigs as those with an M:F scaled mean read depth ratio above 1.6 and at least 50% coverage in both the male and the female. All contigs not identified as Z or W were considered autosomal. We assigned autosomal contigs using RaGOO v1.1 with the chromosome-level Crotalus viridis genome assembly as the reference (Alonge et al. 2019; Schield et al. 2019). The reference Z chromosome and the previously assigned sex-linked contigs were excluded from RaGOO scaffolding to prevent misplacement of sex-linked contigs. Genomic contigs that failed to map to the reference were considered “Unplaced”. Contigs that mapped to sub-chromosome-resolution regions of the reference assembly were considered “Unplaced contiguous”.
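The W/Z/autosome thresholds above can be expressed as a small classifier. The following is a minimal sketch. It assumes "scaled" means that each sample's per-contig mean depth is divided by that sample's genome-wide mean depth, which is a common normalization but is not stated explicitly in the text. It also assumes coverage is given as a fraction of contig length (0 to 1).

```python
import math


def scaled_ratio(male_depth, female_depth, male_genome_mean, female_genome_mean):
    """M:F ratio of per-contig mean depths, each normalized by its sample's
    genome-wide mean depth (assumed scaling)."""
    if female_depth == 0:
        # No female reads: infinite ratio if males have reads, undefined otherwise.
        return math.inf if male_depth > 0 else math.nan
    return (male_depth / male_genome_mean) / (female_depth / female_genome_mean)


def classify_contig(ratio, male_cov, female_cov):
    """Apply the thresholds from the text; coverages are fractions of contig length."""
    if ratio < 0.3 and female_cov >= 0.5 and male_cov < 0.5:
        return "W"          # female-specific: few or no male reads
    if ratio > 1.6 and male_cov >= 0.5 and female_cov >= 0.5:
        return "Z"          # ~2x depth in ZZ males relative to ZW females
    return "autosomal"      # NaN ratios fail both comparisons and land here


# Hypothetical contig: male depth 20, female depth 10, genome means both 20.
r = scaled_ratio(20, 10, 20, 20)
print(r, classify_contig(r, 0.9, 0.9))  # 2.0 Z
```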
Chimeric contigs identified and split by RaGOO were assigned to the chromosome that accounted for the larger percentage of the contig. Genes were assigned to chromosomes according to the final placements of their genomic contigs. These assignments were included in the phylogenetic gene trees for Crotalus adamanteus ORs and V2Rs (Figs. 4 and 5), where color rings around each tree indicate the chromosomal placement of the tips. The final gene–chromosome assignments were also used to generate the proportional donut plots in Fig. 1c.
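The majority rule for split contigs and the inheritance of placements by genes can be sketched as two lookups. The following is hypothetical. It assumes the split segments of each contig are available as (chromosome, length in bp) pairs. Defaulting genes on unknown contigs to "Unplaced" is also an assumption.

```python
from collections import Counter


def majority_chromosome(segments):
    """Chromosome covering the most bases of one split contig.

    `segments` is a list of (chromosome, length_bp) pairs for the pieces
    RaGOO placed. On an exact tie, the first chromosome encountered wins."""
    totals = Counter()
    for chrom, length in segments:
        totals[chrom] += length
    return max(totals, key=totals.get)


def assign_genes(gene_to_contig, contig_to_chrom):
    """Genes inherit the placement of their contig ("Unplaced" if unknown)."""
    return {gene: contig_to_chrom.get(contig, "Unplaced")
            for gene, contig in gene_to_contig.items()}


# Hypothetical split contig: 400 bp on chr1 in total, 500 bp on chr2.
print(majority_chromosome([("chr1", 300), ("chr2", 500), ("chr1", 100)]))  # chr2
```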

Phylogenetics and Selection Analyses

We limited phylogenetic inferences and selection analyses to coding sequences extracted from the C. adamanteus genome. We therefore interpreted gene family diversity and relationships only among paralogs within a single species. To produce the most functionally informative alignments for membrane-bound chemosensory genes, we aligned translated OR and V2R coding sequences with the PRALINE online server (Pirovano and Feenstra, 2008). We used default parameters plus the TMHMM v2.0 transmembrane prediction option (Krogh et al. 2001; alignments completed June 2020). Detailed reports for these alignments and the extracted coding sequences are available in the Supplementary Materials online (Supp_file_05 and Supp_file_06). The resulting amino acid alignments were back-translated to codon alignments with the PAL2NAL v14.0 online server (Suyama et al. 2006). These back-translated alignments were used for downstream phylogenetic reconstructions and selection analyses.
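Back-translation threads each unaligned CDS onto its aligned protein, turning every amino acid into its codon and every gap into "---". The following is a simplified sketch of that idea, not PAL2NAL itself. Unlike PAL2NAL, it does not verify that each codon translates to the aligned residue. It assumes that a single trailing stop codon absent from the protein should be dropped.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(aligned_protein, cds):
    """Return a codon alignment row for one sequence.

    Each non-gap residue consumes the next codon of `cds`; each '-' becomes
    '---'. A single leftover terminal stop codon is dropped; any other
    length mismatch raises ValueError."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    columns = []
    k = 0
    for residue in aligned_protein:
        if residue == "-":
            columns.append("---")
            continue
        if k >= len(codons):
            raise ValueError("protein is longer than the CDS")
        columns.append(codons[k])
        k += 1
    leftover = codons[k:]
    if leftover and not (len(leftover) == 1 and leftover[0].upper() in STOP_CODONS):
        raise ValueError("CDS has codons not represented in the protein")
    return "".join(columns)


print(back_translate("M-K", "ATGAAATAA"))  # ATG---AAA
```

Aligning at the protein level and then back-translating keeps codons intact, so the resulting nucleotide alignment can be used directly as codon data in IQ-TREE and HyPhy.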

Maximum likelihood trees were inferred for each alignment with IQ-TREE v2.0.5 (Nguyen et al. 2014; Chernomor et al. 2016; Kalyaanamoorthy et al. 2017; Hoang et al. 2017). Input alignments were specified as codon data with the -st CODON option, and the best substitution models were selected automatically with ModelFinder (Kalyaanamoorthy et al. 2017). Branch support was assessed with ultrafast bootstrap approximation and with single-branch SH-like approximate likelihood ratio tests, using the -B and -alrt options with 1000 replicates each (Hoang et al. 2017; Guindon et al. 2010). Tree figures were generated with the iTOL online server (Letunic and Bork 2019). The full IQ-TREE outputs with gene names and bootstrap values are available in the Supplementary Materials online as Supp_file_07a and Supp_file_07b. Consensus trees and back-translated alignments were combined into NEXUS files and tested for directional selection with the HyPhy suite v2.5.8 (Pond et al. 2019). Specifically, we tested sites for pervasive selection with FEL v2.1 (Pond and Frost, 2005) and for episodic selection with MEME v2.1.2 (Murrell et al., 2012). The compiled raw outputs of FEL and MEME are available in the Supplementary Materials online (Supp_file_08). To test whether selected sites were associated with structural regions, we ran two-proportion z-tests comparing selected sites in extracellular and intracellular regions. For each region, the proportion was the number of selected sites divided by the total number of sites designated to that region. We first used the FEL and MEME default significance cutoff of 0.1 and then repeated the tests with a cutoff of 0.05. The significant findings were identical between the two analyses, so only results from the 0.05 cutoff are shown in Table 2.
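The IQ-TREE options named above can be collected into a single command line. The following is a minimal sketch that only builds the argument list. Several details are assumptions: the executable name `iqtree2` (the binary name used by IQ-TREE 2 releases), the input file name, and the explicit `-m MFP` request for ModelFinder model selection. The `codon` switch also covers the later V2R-array tree, which used the same options without -st CODON.

```python
import shlex


def iqtree_command(alignment, codon=True, replicates=1000, executable="iqtree2"):
    """Build an IQ-TREE 2 argument list for ML inference with ModelFinder,
    ultrafast bootstrap (-B) and SH-aLRT (-alrt) support."""
    cmd = [executable, "-s", alignment]
    if codon:
        cmd += ["-st", "CODON"]  # treat the alignment as codon data
    cmd += ["-m", "MFP",         # ModelFinder selects the best-fit model
            "-B", str(replicates),
            "-alrt", str(replicates)]
    return cmd


# Hypothetical file name; pass the list to subprocess.run(cmd, check=True) to execute.
print(shlex.join(iqtree_command("OR_codon_alignment.fasta")))
```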
For FEL, we tested whether the proportion of positively selected sites was significantly higher or lower in extracellular or intracellular regions, and we repeated this test for the proportion of negatively selected sites (p < 0.05). For MEME, we tested whether the proportion of episodically diversifying sites was significantly higher or lower in extracellular or intracellular regions, and we repeated this test for branch-codons (p < 0.05). Test result summaries are given in Table 2. Full outputs from the selection analyses and statistical tests are available in the Supplementary Materials online (Supp_file_08).
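The region comparison is a standard two-proportion z-test with a pooled variance estimate. The following is a minimal sketch using only the standard library. The site counts in the example are hypothetical and are not the counts in Table 2. The one-sided p-value for the alternative "proportion 1 is higher" would be `0.5 * math.erfc(z / math.sqrt(2))`.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    x1 of n1 sites selected in region 1 (e.g. extracellular) versus
    x2 of n2 in region 2 (e.g. intracellular). Returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = (p1 - p2) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical counts: 30/100 extracellular vs 10/100 intracellular sites selected.
z, p = two_proportion_ztest(30, 100, 10, 100)
print(round(z, 3), p < 0.05)  # 3.536 True
```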

Comparative Analysis of V2R Arrays

To demonstrate the utility of our thorough chemosensory characterization of C. adamanteus, we traced the evolution of a distinguishable V2R array across five squamate species with available genomes (Fig. 3). In addition to our C. adamanteus assembly, we downloaded four genomes from GenBank (accessed January 18, 2021) and assessed them in Geneious Prime v2021.0.3: Crotalus tigris (GenBank accession GCA_016545835.1), Thamnophis elegans (GCA_009769535.1), Varanus komodoensis (GCA_004798865.1), and Anolis carolinensis (GCA_000090745.2). Full-length V2R gene sequences were extracted from the C. adamanteus genome and used as queries for BLAST searches of the other genomes. We then selected a V2R array with tractable homology across the five genomes for further analysis. This array is distinguished by several conserved immunoglobulin type-V (IG-V) subunit-containing regions that flank and intersperse the tandemly repeated V2R genes (Fig. 3a). The full array was recovered from a single contiguous genomic region in Crotalus tigris (contig ID = VORL01000046.1), Thamnophis elegans (NC_045558.1, reversed), and Anolis carolinensis (NW_003339038.1, reversed). Manual scaffolding was necessary to recover the complete array in the correct orientation from C. adamanteus (contig IDs = Cadam-autosomal_5021_of_8908; Cadam-autosomal_949_of_8908; Cadam-autosomal_239_of_8908, reversed; Cadam-autosomal_7936_of_8908, reversed; and Cadam-autosomal_8446_of_8908). It was also necessary for V. komodoensis (SJPD01000131.1, reversed; SJPD01000180.1, reversed; SJPD01000374.1, reversed; and SJPD01000152.1). The manually scaffolded arrays were simple concatenations of full-length contigs with no trimming.
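Manual scaffolding, as described, amounts to reverse-complementing the contigs marked "reversed" and concatenating the full-length contigs in order. The following is a minimal sketch. It assumes no gap characters (e.g., runs of N) were inserted between contigs, because the text mentions none. Its complement table covers only A, C, G, T, and N, so other IUPAC codes pass through uncomplemented. The contig IDs and sequences in the example are hypothetical.

```python
# Complement table for A/C/G/T/N in either case; other characters are unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def concatenate_contigs(contigs, order):
    """Join full-length contigs end to end without trimming.

    `order` is a list of (contig_id, is_reversed) pairs; reversed contigs
    are reverse-complemented before joining."""
    return "".join(reverse_complement(contigs[cid]) if rev else contigs[cid]
                   for cid, rev in order)


# Hypothetical contigs.
contigs = {"ctgA": "AAC", "ctgB": "GGT"}
print(concatenate_contigs(contigs, [("ctgA", False), ("ctgB", True)]))  # AACACC
```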

To identify potential homology between the lizard and snake V2Rs, we assessed synteny and generated a reduced V2R gene phylogeny using only the genes identified from this array in the five genomes. Most V2R gene annotations in the downloaded genomes were either incomplete or missing from our BLAST-matched regions. For consistency, we manually annotated these regions using the approach described above for C. adamanteus. Full-length gene sequences spanning the predicted exons, introns, and UTRs (Fig. 3b) were extracted and combined. They were then aligned with MUSCLE v3.8.425 using default parameters and two iterations, as recommended for larger sequences (Edgar, 2004). We manually trimmed the nucleotide alignment at each end to the shortest sequence to minimize end-gap biases. The final alignment used to generate the phylogeny is available in the Supplementary Materials online (Supp_file_9). We generated a maximum likelihood gene phylogeny for this array with IQ-TREE v2.0.5 (Nguyen et al. 2014; Chernomor et al. 2016; Kalyaanamoorthy et al. 2017; Hoang et al. 2017), using the same options described above except for -st CODON. The resulting IQ-TREE output with gene names and bootstrap values is available in the Supplementary Materials online (Supp_file_9). A handful of snake V2R genes grouped with lizard V2R genes in the phylogeny, and these genes are colored red and orange in Fig. 3c. To show this relationship on the syntenic representation of the gene array, red and orange shading in Fig. 3a connects these "lizard-like" V2R regions. General information on the additional genomes is available in the Supplementary Materials online (Supp_file_08).
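Trimming the alignment at each end to the shortest sequence keeps only the columns between the latest first residue and the earliest last residue across all sequences. The following is a sketch of that interpretation, not the manual procedure used. It assumes the aligned sequences have equal length and that '-' is the only gap character. The example alignment is hypothetical.

```python
def trim_to_shared_span(alignment):
    """Trim aligned sequences to the column range covered by every sequence.

    `alignment` maps sequence IDs to equal-length aligned strings.
    Columns before the latest first residue, or after the earliest last
    residue, are removed so that no sequence carries end gaps."""
    starts, ends = [], []
    for sid, seq in alignment.items():
        residues = [i for i, c in enumerate(seq) if c != "-"]
        if not residues:
            raise ValueError(f"sequence {sid} is all gaps")
        starts.append(residues[0])
        ends.append(residues[-1])
    left, right = max(starts), min(ends)
    if left > right:
        raise ValueError("no column range is shared by all sequences")
    return {sid: seq[left:right + 1] for sid, seq in alignment.items()}


# Hypothetical three-sequence alignment.
aln = {"a": "--ACGT-", "b": "AACG---", "c": "-AACGTT"}
print(trim_to_shared_span(aln))  # {'a': 'AC', 'b': 'CG', 'c': 'AC'}
```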